[AutoSparkUT] Fix std variance floating overflow coverage #14762
Conversation
Signed-off-by: Allen Xu <allxu@nvidia.com>
Greptile Summary: This PR fixes a NaN-producing overflow bug in the host-side M2 reduction merge path (`CudfMergeM2.reductionAggregate`).
Confidence Score: 5/5. Safe to merge: the host-side merge fix is minimal, well-scoped, and the test suite was run end-to-end with 75 passing tests. The Scala change is a targeted fix adding an identity branch that short-circuits the first non-empty partial, eliminating the Inf×0→NaN path. The code path is only reached in the host-side global reduction merge, not in GPU kernels or per-row expressions. No files require special attention.
Reviews (7). Last reviewed commit: "Drop -Infinity sentinel and revert unrel..."
Pull request overview
This PR addresses intermittent stddev/variance test failures caused by floating-point overflow differences (NaN vs ±Inf) by aligning the host-side M2 merge logic with Spark’s merge order and refining integration test coverage/compare semantics.
Changes:
- Update the `CudfMergeM2` host-side `reductionAggregate` merge to follow Spark's central-moment merge order (including an identity path for the first non-empty partial).
- Split std/variance integration tests into (a) common finite floating-point coverage with strict comparisons and (b) extreme floating-point coverage with a scoped NaN/Inf overflow equivalence allowance.
- Add an assertion-option plumb-through in the integration test framework to treat NaN vs ±Inf as equivalent for documented overflow cases.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| sql-plugin/src/main/scala/org/apache/spark/sql/rapids/aggregate/aggregateFunctions.scala | Adjust host-side merge of partial (n, mean, m2) to mirror Spark merge order and reduce order-dependent overflow artifacts. |
| integration_tests/src/main/python/hash_aggregate_test.py | Split std/variance coverage into common vs extreme floating-point inputs; extreme path enables NaN/Inf-overflow-equivalence comparisons. |
| integration_tests/src/main/python/asserts.py | Plumb a new comparison option through equality helpers to allow NaN vs ±Inf equivalence for documented overflow tests. |
Signed-off-by: Allen Xu <allxu@nvidia.com>
build
Signed-off-by: Allen Xu <allxu@nvidia.com>
```diff
         cpu_items = list(cpu.items()).sort(key=_RowCmp)
         gpu_items = list(gpu.items()).sort(key=_RowCmp)
-        _assert_equal(cpu_items, gpu_items, float_check, path + ["map"])
+        _assert_equal(cpu_items, gpu_items, float_check, path + ["map"],
+                      nan_inf_equivalent_for_overflow)
```
The `sorted(...)` fix mentioned in the PR description was not applied. `list.sort()` is an in-place method that returns `None`, so `cpu_items` and `gpu_items` are both `None`. When `_assert_equal(None, None, ...)` is called, it falls through to the `elif (cpu == None): assert cpu == gpu` branch, which always passes. Any dict comparison therefore silently succeeds regardless of content: a false green that masks real CPU/GPU divergence in map-type columns.
```diff
-        cpu_items = list(cpu.items()).sort(key=_RowCmp)
-        gpu_items = list(gpu.items()).sort(key=_RowCmp)
-        _assert_equal(cpu_items, gpu_items, float_check, path + ["map"],
-                      nan_inf_equivalent_for_overflow)
+        cpu_items = sorted(list(cpu.items()), key=_RowCmp)
+        gpu_items = sorted(list(gpu.items()), key=_RowCmp)
+        _assert_equal(cpu_items, gpu_items, float_check, path + ["map"],
+                      nan_inf_equivalent_for_overflow)
```
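The pitfall behind this suggestion is easy to reproduce in isolation: `list.sort()` sorts in place and returns `None`, while `sorted(...)` returns a new list. A minimal standalone demonstration (not the plugin's test code):

```python
# list.sort() mutates in place and returns None, so binding its
# result silently yields None instead of a sorted list.
items = list({"b": 2, "a": 1}.items())

broken = list(items).sort()   # returns None
fixed = sorted(items)         # returns a new sorted list

assert broken is None
assert fixed == [("a", 1), ("b", 2)]

# A naive equality check on the broken results always "passes",
# which is exactly the false green the review comment describes.
cpu_items = gpu_items = None
assert cpu_items == gpu_items  # vacuously true
```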
Addressed in e964fed: this PR no longer changes the dict/map comparison branch. The list.sort() behavior is a pre-existing issue, but applying the suggested sorted(...) cleanup in this PR exposed unrelated existing map/JSON/ORC differences in Blossom #13006, so that cleanup should be handled separately instead of being bundled with the std/variance fix.
Local validation after removing this PR's dict-branch change passed: `hash_aggregate_test.py::test_std_variance_extreme_floating_point` plus `json_matrix_test.py::test_from_json_map_string_string[int_formatted.json]`: 11 passed, 3 warnings in 116.69s.
Signed-off-by: Allen Xu <allxu@nvidia.com>
build
build
Signed-off-by: Allen Xu <allxu@nvidia.com>
build
2 similar comments
build
build
Signed-off-by: Allen Xu <allxu@nvidia.com>
```python
def _canonicalize_std_variance_overflow_value(value):
    if isinstance(value, float) and (math.isnan(value) or math.isinf(value)):
```
nit: should we avoid treating -Infinity as an acceptable overflow sentinel here? For stddev/variance over finite inputs, NaN vs +Infinity can be explained by overflow and merge order, but -Infinity would likely indicate a different issue, such as a negative M2/sign bug. It may be safer to canonicalize only NaN and +Infinity, and update the compatibility note accordingly.
Addressed in f402c5f. The canonicalizer now treats only NaN and +Infinity as accepted overflow sentinels; -Infinity is left as-is so a negative-M2/sign bug would surface as a real diff. The compatibility note in docs/compatibility.md and the inline test comment are updated to match.
Re-ran hash_aggregate_test.py::test_std_variance and ::test_std_variance_extreme_floating_point locally with Python 3.10 + Spark 3.3.0: 265 passed, 39510 deselected, 8 warnings in 1425.14s.
```diff
     return (from_cpu, from_gpu)

-def assert_gpu_and_cpu_are_equal_collect(func, conf={}, is_cpu_first=True, result_canonicalize_func_before_compare=None):
+def assert_gpu_and_cpu_are_equal_collect(func, conf={}, is_cpu_first=True,
```
nit: unnecessary change?
Reverted in f402c5f. The assert_gpu_and_cpu_are_equal_collect signature is back to a single line; this PR no longer changes that line.
| """ | ||
| _assert_gpu_and_cpu_are_equal(func, 'COLLECT', conf=conf, is_cpu_first=is_cpu_first, result_canonicalize_func_before_compare=result_canonicalize_func_before_compare) | ||
| _assert_gpu_and_cpu_are_equal(func, 'COLLECT', conf=conf, is_cpu_first=is_cpu_first, | ||
| result_canonicalize_func_before_compare=result_canonicalize_func_before_compare) |
nit: unnecessary change?
Reverted in f402c5f. The _assert_gpu_and_cpu_are_equal call is back to a single line; this PR no longer changes that line.
Address PR review nits on NVIDIA#14762:

- `hash_aggregate_test.py`: the std/variance overflow canonicalizer no longer accepts `-Infinity`. Over finite inputs, NaN and `+Infinity` can be explained by overflow plus partial-merge order; `-Infinity` would indicate a different issue (e.g. a negative M2 / sign bug) and must not be hidden by sentinel canonicalization.
- `docs/compatibility.md`: align the std/variance overflow note with the test scope (NaN and `+Infinity` only).
- `asserts.py`: revert two unnecessary line wraps on the `assert_gpu_and_cpu_are_equal_collect` signature and its call to `_assert_gpu_and_cpu_are_equal`; the new parameter plumbing through `assert_gpu_and_cpu_are_equal_sql` is kept.

Validation (Python 3.10, Spark 3.3.0): `TESTS='hash_aggregate_test.py::test_std_variance hash_aggregate_test.py::test_std_variance_extreme_floating_point'`: 265 passed, 39510 deselected, 8 warnings in 1425.14s.

Signed-off-by: Allen Xu <allxu@nvidia.com>
build
Closes #14681
This follows the discussion in this issue comment to keep strict std/variance coverage for common floating-point inputs and split the overflow-oriented NaN/Inf cases into a smaller documented test set.
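The NaN-producing merge-order issue can be reproduced in a few lines of Python (a simplified model of the `(n, mean, m2)` central-moment merge, not the plugin's Scala code; the empty accumulator is assumed to be `(0, 0.0, 0.0)`): applying the generic merge formula to an empty accumulator evaluates `delta * delta` for an extreme mean, which overflows to `Inf`, and `Inf * 0` then becomes `NaN`, while an identity branch for the first non-empty partial stays finite.

```python
import math

def merge_naive(n1, mean1, m2_1, n2, mean2, m2_2):
    # Generic merge formula, applied even when one side is empty.
    n = n1 + n2
    delta = mean2 - mean1
    mean = mean1 + delta * n2 / n
    # For an extreme mean, delta * delta overflows to Inf, and
    # Inf * n1 with n1 == 0 becomes NaN.
    m2 = m2_1 + m2_2 + delta * delta * n1 * n2 / n
    return n, mean, m2

def merge_spark_order(n1, mean1, m2_1, n2, mean2, m2_2):
    # Identity branch: the first non-empty partial is taken as-is.
    if n1 == 0:
        return n2, mean2, m2_2
    if n2 == 0:
        return n1, mean1, m2_1
    n = n1 + n2
    delta = mean2 - mean1
    delta_n = delta / n  # Spark-style order: divide before multiplying
    mean = mean1 + delta_n * n2
    m2 = m2_1 + m2_2 + delta * delta_n * n1 * n2
    return n, mean, m2
```

Merging the partial `(4, 1e200, 0.0)` into an empty accumulator gives `m2 = NaN` under `merge_naive` but the correct `(4, 1e200, 0.0)` under `merge_spark_order`; for ordinary non-empty partials the two orders agree up to floating-point rounding.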
Upstream dependency:
- cuDF commit `0a1620e5b3`, which adds the first non-empty partial identity path for extreme finite means. `spark-rapids-jni` `origin/main` already points `thirdparty/cudf` at this fix, covering the GPU `GroupByAggregation.mergeM2()` path.
- The Scala change in this PR is still needed for `CudfMergeM2.reductionAggregate`, the host-side/global reduction merge path, because that path copies partial buffers to the JVM and does not call cuDF `MERGE_M2`. Without the Scala identity branch, the first non-empty partial can still evaluate the generic formula with `mergeN == 0`, causing `delta * delta` to overflow to `Inf` and then `Inf * 0` to become `NaN`.
- The merge now follows Spark's `CentralMomentAgg` order (`deltaN = delta / newN`; `m2 += delta * deltaN * n1 * n2`), reducing remaining CPU/GPU differences that are purely caused by merge-order arithmetic.

Changes:
- Plumb the `result_canonicalize_func_before_compare` hook through the SQL assertion helper and keep std/variance overflow canonicalization local to the extreme-input test.
- Generate the extreme inputs with `no_nans=True`, so NaN/Inf output is attributable to overflow rather than input special cases.
- The broader dict/map comparison cleanup was kept out of this PR after Blossom exposed unrelated existing map/JSON/ORC differences.

Performance impact:
The Scala change runs only in the host-side `reductionAggregate` merge over partial aggregate rows. Complexity and allocation behavior are unchanged; the first non-empty partial now takes an identity branch and later partials use scalar double arithmetic equivalent to Spark's merge order. No GPU kernel or per-row expression hot path is changed.

Validation:
- `python -m py_compile integration_tests/src/main/python/asserts.py integration_tests/src/main/python/hash_aggregate_test.py`
- `git diff --check`
- `mvn package -DskipTests -pl dist,integration_tests -am -Dbuildver=330 -Dmaven.repo.local=./.mvn-repo -s jenkins/settings.xml -P mirror-apache-to-urm`: BUILD SUCCESS
- `TESTS='hash_aggregate_test.py::test_std_variance hash_aggregate_test.py::test_std_variance_extreme_floating_point' TEST_PARALLEL=1 DATAGEN_SEED=1777180076 ./integration_tests/run_pyspark_from_build.sh --tb=short`: 75 passed, 3 warnings in 985.02s (0:16:25)
- `spark-rapids-jni` `origin/main` has the cuDF fix via `thirdparty/cudf` -> `0a1620e5b3 Fix MERGE_M2 for extreme finite partial means (#22393)`; rebuilt against `c12348ed68e05227139a54e19d35374530c23e7b` and confirmed the resulting `rapids-4-spark_2.12-26.06.0-SNAPSHOT-cuda12.jar` embeds that cuDF revision.
- Re-ran the same std/variance test command against that build: 75 passed, 3 warnings in 670.56s (0:11:10)
- `TESTS="hash_aggregate_test.py::test_std_variance_extreme_floating_point" TEST_PARALLEL=1 ./integration_tests/run_pyspark_from_build.sh -s`: 10 passed, 3 warnings in 101.80s (0:01:41)
- `TESTS="hash_aggregate_test.py::test_std_variance_extreme_floating_point json_matrix_test.py::test_from_json_map_string_string[int_formatted.json]" TEST_PARALLEL=1 ./integration_tests/run_pyspark_from_build.sh --tb=short -q`: 11 passed, 3 warnings in 95.10s (0:01:35); after the dict-branch revert: 11 passed, 3 warnings in 116.69s (0:01:56)
- `TESTS='hash_aggregate_test.py::test_std_variance hash_aggregate_test.py::test_std_variance_extreme_floating_point' TEST_PARALLEL=1 DATAGEN_SEED=1777180076 ./integration_tests/run_pyspark_from_build.sh --tb=short -q`: 75 passed, 3 warnings in 656.13s (0:10:56)
- After the latest nit round (drop `-Infinity` from overflow sentinels; revert two unrelated wraps in `asserts.py`), Python 3.10 + Spark 3.3.0, `f402c5fb5`: `TEST='test_std_variance_extreme_floating_point or (test_std_variance and not extreme)' TEST_PARALLEL=0 DATAGEN_SEED=1777180076 ./integration_tests/run_pyspark_from_build.sh --tb=short -q`: 265 passed, 39510 deselected, 8 warnings in 1425.14s (0:23:45)

Documentation
Testing
(Please provide the names of the existing tests in the PR description.)
Performance