Guard against intermediate x*x overflow in Student's t pdf/cdf#1402
Guard against intermediate x*x overflow in Student's t pdf/cdf#1402a-leontyev wants to merge 7 commits into
Conversation
pdf() and cdf() form x*x before normalising by the degrees of freedom. For finite but large |x| this overflows to infinity, after which pdf() silently returns 0 and cdf() silently returns the tail, bypassing the error-handling policy. Add an explicit guard that invokes raise_overflow_error<> via the active Policy, mirroring quantile(). Adds test_students_t_overflow.cpp covering throw_on_error, errno_on_error and a no-regression case for float/double/long double.
Address review feedback: - Revert std::fabs to an ADL fabs in the pdf/cdf guards and add BOOST_MATH_STD_USING to cdf, so the call resolves for non-standard types (real_concept, multiprecision). - Extend test_students_t_overflow with real_concept and cpp_bin_float_50, exercising the throw policy and the no-regression path. The test's overflowing_deviate() now also uses an ADL sqrt for the same reason.
|
I've approved the CI run for this now since the changes look good. |
|
Looks like you have a single failure when running with ISAN: |
Fixed. ISAN failures were the BOOST_CHECK_EQUAL(errno, 0) assertions on the success paths in test_no_regression(). Under UBSan the diagnostic printer calls isatty() on the redirected stderr, which sets errno to ENOTTY (25) right between our reset and the check - and strictly speaking C11 7.5/3 allows any library call to modify errno on success anyway, so those assertions were over-strict. I've removed them; the policy-driven errno semantics are still covered by test_errno() (reset → call → ERANGE). |
|
Many thanks for looking into this, however, I'm not sure this is the correct fix: overflow_errors are for when the result is non-finite, in this case we have an intermediate calculation which just happens to spuriously overflow, but the result is expected to be finite. ie 0 for the PDF and 1 or 0 for the CDF and CDF complement respectively. So we should prevent the overflow in the intermediate calculation so that FPU exceptions don't get clobbered, but IMO the result should be either 0 or 1 depending on the function, and there's no need to call the error handlers for this. |
Thanks. I agreed, that's the cleaner semantic. I've reworked the guards to return the exact saturated values directly. 0 for the pdf, and 0/1 for the cdf depending on sign (the complement gets the mirrored values via the negation overload). No error handlers are involved any more. The test now checks the exact tail values under a throwing policy so any error-handler invocation would fail it and verifies via fetestexcept that FE_OVERFLOW is not raised for built-in types. |
Closes #1401.
Problem
pdf() and cdf() of the Student's t distribution form
x * xbefore anydivision by the degrees of freedom. For finite but large |x| this overflows
to infinity, after which pdf() silently returns 0 and cdf() silently returns
the tail (0 or 1). The error-handling policy is bypassed entirely, so an
overflow policy of throw_on_error / errno_on_error has no effect.
Fix
Add an explicit guard in both functions: when
fabs(x) > sqrt(max_value)the squared deviate would overflow, so invoke
raise_overflow_error<>withthe active Policy. This mirrors the existing use of raise_overflow_error in
quantile() for the p==0 / p==1 cases, so the style is consistent with the
surrounding code.
Behaviour
deviates that would otherwise overflow xx. The existing isinf(x) fast-path
is untouched (the guard sits below it, on the finite-but-huge path). The
internal edgeworth-expansion helper, which also forms xx but only on a
bounded normal deviate, is intentionally left unguarded.
Tests
Adds test/test_students_t_overflow.cpp covering, for float/double/long double:
but safe deviate (1e6) evaluates without firing the
guard.
Both the new test and the existing test_students_t pass locally
(clang-17, arm64, C++14).
Notes for reviewers
The numerical core is intentionally left untouched — this is a guard, not a
reformulation of x*x, precisely to keep accuracy on in-range inputs identical.
Threshold sqrt(max_value) is conservative: any |x| above it overflows when
squared regardless of df.