Skip to content

[AutoSparkUT] Fix std variance floating overflow coverage#14762

Merged
wjxiz1992 merged 7 commits into
NVIDIA:mainfrom
wjxiz1992:fix/14681-std-variance-revans
May 15, 2026
Merged

[AutoSparkUT] Fix std variance floating overflow coverage#14762
wjxiz1992 merged 7 commits into
NVIDIA:mainfrom
wjxiz1992:fix/14681-std-variance-revans

Conversation

@wjxiz1992
Copy link
Copy Markdown
Collaborator

@wjxiz1992 wjxiz1992 commented May 9, 2026

Closes #14681

This follows the discussion in this issue comment to keep strict std/variance coverage for common floating-point inputs and split the overflow-oriented NaN/Inf cases into a smaller documented test set.

Upstream dependency:

  • cuDF MERGE_M2 fix: rapidsai/cudf#22393 (0a1620e5b3), which adds the first non-empty partial identity path for extreme finite means. spark-rapids-jni origin/main already points thirdparty/cudf at this fix.
  • Scope split: the cuDF fix covers the native GroupByAggregation.mergeM2() path. The Scala change in this PR is still needed for CudfMergeM2.reductionAggregate, the host-side/global reduction merge path, because that path copies partial buffers to the JVM and does not call cuDF MERGE_M2. Without the Scala identity branch, the first non-empty partial can still evaluate the generic formula with mergeN == 0, causing delta * delta to overflow to Inf and then Inf * 0 to become NaN.
  • The Scala merge formula is also reordered to match Spark CPU CentralMomentAgg (deltaN = delta / newN; m2 += delta * deltaN * n1 * n2), reducing remaining CPU/GPU differences that are purely caused by merge-order arithmetic.

Changes:

  • Match Spark's central-moment merge order in the host-side M2 reduction merge, including an identity path for the first non-empty partial.
  • Split std/variance integration coverage into common finite floating-point cases with strict comparison and extreme floating-point cases with a scoped NaN/Inf overflow allowance.
  • Pass the existing generic result_canonicalize_func_before_compare hook through the SQL assertion helper and keep std/variance overflow canonicalization local to the extreme-input test.
  • Tighten post-review test plumbing by making the extreme FP generators finite-only with no_nans=True, so NaN/Inf output is attributable to overflow rather than input special cases. The broader dict/map comparison cleanup was kept out of this PR after Blossom exposed unrelated existing map/JSON/ORC differences.

Performance impact:
The Scala change runs only in the host-side reductionAggregate merge over partial aggregate rows. Complexity and allocation behavior are unchanged; the first non-empty partial now takes an identity branch and later partials use scalar double arithmetic equivalent to Spark's merge order. No GPU kernel or per-row expression hot path is changed.

Validation:

  • python -m py_compile integration_tests/src/main/python/asserts.py integration_tests/src/main/python/hash_aggregate_test.py
  • git diff --check
  • mvn package -DskipTests -pl dist,integration_tests -am -Dbuildver=330 -Dmaven.repo.local=./.mvn-repo -s jenkins/settings.xml -P mirror-apache-to-urm: BUILD SUCCESS
  • TESTS='hash_aggregate_test.py::test_std_variance hash_aggregate_test.py::test_std_variance_extreme_floating_point' TEST_PARALLEL=1 DATAGEN_SEED=1777180076 ./integration_tests/run_pyspark_from_build.sh --tb=short: 75 passed, 3 warnings in 985.02s (0:16:25)
  • Confirmed spark-rapids-jni origin/main has the cuDF fix via thirdparty/cudf -> 0a1620e5b3 Fix MERGE_M2 for extreme finite partial means (#22393).
  • Rebuilt local spark-rapids dist with a JNI jar containing the equivalent cuDF fix revision c12348ed68e05227139a54e19d35374530c23e7b; confirmed the resulting rapids-4-spark_2.12-26.06.0-SNAPSHOT-cuda12.jar embeds that cuDF revision.
  • Re-ran targeted IT with that rebuilt dist jar: TESTS='hash_aggregate_test.py::test_std_variance hash_aggregate_test.py::test_std_variance_extreme_floating_point' TEST_PARALLEL=1 DATAGEN_SEED=1777180076 ./integration_tests/run_pyspark_from_build.sh --tb=short: 75 passed, 3 warnings in 670.56s (0:11:10)
  • Post-review fix validation: TESTS="hash_aggregate_test.py::test_std_variance_extreme_floating_point" TEST_PARALLEL=1 ./integration_tests/run_pyspark_from_build.sh -s: 10 passed, 3 warnings in 101.80s (0:01:41)
  • Blossom blocker fix validation: TESTS="hash_aggregate_test.py::test_std_variance_extreme_floating_point json_matrix_test.py::test_from_json_map_string_string[int_formatted.json]" TEST_PARALLEL=1 ./integration_tests/run_pyspark_from_build.sh --tb=short -q: 11 passed, 3 warnings in 95.10s (0:01:35)
  • New review-comment follow-up validation after removing this PR's dict/map assertion diff: TESTS="hash_aggregate_test.py::test_std_variance_extreme_floating_point json_matrix_test.py::test_from_json_map_string_string[int_formatted.json]" TEST_PARALLEL=1 ./integration_tests/run_pyspark_from_build.sh --tb=short -q: 11 passed, 3 warnings in 116.69s (0:01:56)
  • Review fix validation after moving overflow handling out of the generic assert framework, using Python 3.10 and Spark 3.3.0: TESTS='hash_aggregate_test.py::test_std_variance hash_aggregate_test.py::test_std_variance_extreme_floating_point' TEST_PARALLEL=1 DATAGEN_SEED=1777180076 ./integration_tests/run_pyspark_from_build.sh --tb=short -q: 75 passed, 3 warnings in 656.13s (0:10:56)
  • Review nit follow-up (drop -Infinity from overflow sentinels; revert two unrelated wraps in asserts.py), Python 3.10 + Spark 3.3.0, f402c5fb5: TEST='test_std_variance_extreme_floating_point or (test_std_variance and not extreme)' TEST_PARALLEL=0 DATAGEN_SEED=1777180076 ./integration_tests/run_pyspark_from_build.sh --tb=short -q: 265 passed, 39510 deselected, 8 warnings in 1425.14s (0:23:45).

Documentation

  • Updated for new or modified user-facing features or behaviors
  • No user-facing change

Testing

  • Added or modified tests to cover new code paths
  • Covered by existing tests
    (Please provide the names of the existing tests in the PR description.)
  • Not required

Performance

  • Tests ran and results are added in the PR description
  • Issue filed with a link in the PR description
  • Not required

Signed-off-by: Allen Xu <allxu@nvidia.com>
Copilot AI review requested due to automatic review settings May 9, 2026 03:22
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented May 9, 2026

Greptile Summary

This PR fixes a NaN-producing overflow bug in the host-side M2 reduction merge path (CudfMergeM2.reductionAggregate) and splits the std/variance integration test coverage into a strict common-case suite and a loosely-compared extreme-input suite.

  • Scala fix: Adds an identity branch for mergeN == 0.0 in the host-side merge loop. Without it, the first non-empty partial evaluates the generic formula with mergeN == 0, producing mean * meanInf (overflow), then Inf * 0NaN per IEEE 754. The merge order is also reordered to match Spark CPU's CentralMomentAgg (deltaN = delta / newN; m2 += delta * deltaN * n1 * n2), narrowing remaining CPU/GPU merge-arithmetic differences.
  • Python test split: test_std_variance now uses DoubleGen(min_exp=-200, max_exp=200, no_nans=True) for strict equality, while test_std_variance_extreme_floating_point uses the full-exponent generator with a canonicalize hook that maps NaN/+Inf overflow sentinels to a common placeholder.
  • assert_gpu_and_cpu_are_equal_sql plumbing: The result_canonicalize_func_before_compare parameter is now threaded through, keeping the canonicalization local to the extreme-input test.

Confidence Score: 5/5

Safe to merge — the host-side merge fix is minimal, well-scoped, and the test suite was run end-to-end with 75 passing tests.

The Scala change is a targeted fix adding an identity branch that short-circuits the first non-empty partial, eliminating the Inf×0→NaN path. The code path is only reached in the host-side global reduction merge, not in GPU kernels or per-row expressions.

No files require special attention.

Important Files Changed

Filename Overview
sql-plugin/src/main/scala/org/apache/spark/sql/rapids/aggregate/aggregateFunctions.scala Adds identity branch for first non-empty partial in host-side M2 merge, and reorders merge arithmetic to match Spark CPU CentralMomentAgg.
integration_tests/src/main/python/hash_aggregate_test.py Splits std/variance coverage into common-finite (strict) and extreme-FP (loose sentinel) suites.
integration_tests/src/main/python/asserts.py Threads result_canonicalize_func_before_compare through assert_gpu_and_cpu_are_equal_sql.
docs/compatibility.md Adds documentation for stddev/variance overflow behavior on extreme finite inputs.

Reviews (7): Last reviewed commit: "Drop -Infinity sentinel and revert unrel..." | Re-trigger Greptile

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR addresses intermittent stddev/variance test failures caused by floating-point overflow differences (NaN vs ±Inf) by aligning the host-side M2 merge logic with Spark’s merge order and refining integration test coverage/compare semantics.

Changes:

  • Update CudfMergeM2 host-side reductionAggregate merge to follow Spark’s central-moment merge order (including an identity path for the first non-empty partial).
  • Split std/variance integration tests into (a) common finite floating-point coverage with strict comparisons and (b) extreme floating-point coverage with a scoped NaN/Inf overflow equivalence allowance.
  • Add an assertion-option plumb-through in the integration test framework to treat NaN vs ±Inf as equivalent for documented overflow cases.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

File Description
sql-plugin/src/main/scala/org/apache/spark/sql/rapids/aggregate/aggregateFunctions.scala Adjust host-side merge of partial (n, mean, m2) to mirror Spark merge order and reduce order-dependent overflow artifacts.
integration_tests/src/main/python/hash_aggregate_test.py Split std/variance coverage into common vs extreme floating-point inputs; extreme path enables NaN/Inf-overflow-equivalence comparisons.
integration_tests/src/main/python/asserts.py Plumb a new comparison option through equality helpers to allow NaN vs ±Inf equivalence for documented overflow tests.

Comment thread integration_tests/src/main/python/asserts.py
Comment thread integration_tests/src/main/python/hash_aggregate_test.py Outdated
Signed-off-by: Allen Xu <allxu@nvidia.com>
@wjxiz1992 wjxiz1992 requested a review from revans2 May 9, 2026 04:10
@wjxiz1992
Copy link
Copy Markdown
Collaborator Author

build

Signed-off-by: Allen Xu <allxu@nvidia.com>
Comment on lines +86 to +89
cpu_items = list(cpu.items()).sort(key=_RowCmp)
gpu_items = list(gpu.items()).sort(key=_RowCmp)
_assert_equal(cpu_items, gpu_items, float_check, path + ["map"])
_assert_equal(cpu_items, gpu_items, float_check, path + ["map"],
nan_inf_equivalent_for_overflow)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 The sorted(...) fix mentioned in the PR description was not applied. list.sort() is an in-place method that returns None, so cpu_items and gpu_items are both None. When _assert_equal(None, None, ...) is called, it falls through to the cpu == None branch (elif (cpu == None): assert cpu == gpu) which always passes. Any dict comparison therefore silently succeeds regardless of content — a false green that masks real CPU/GPU divergence in map-type columns.

Suggested change
cpu_items = list(cpu.items()).sort(key=_RowCmp)
gpu_items = list(gpu.items()).sort(key=_RowCmp)
_assert_equal(cpu_items, gpu_items, float_check, path + ["map"])
_assert_equal(cpu_items, gpu_items, float_check, path + ["map"],
nan_inf_equivalent_for_overflow)
cpu_items = sorted(list(cpu.items()), key=_RowCmp)
gpu_items = sorted(list(gpu.items()), key=_RowCmp)
_assert_equal(cpu_items, gpu_items, float_check, path + ["map"],
nan_inf_equivalent_for_overflow)

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in e964fed: this PR no longer changes the dict/map comparison branch. The list.sort() behavior is a pre-existing issue, but applying the suggested sorted(...) cleanup in this PR exposed unrelated existing map/JSON/ORC differences in Blossom #13006, so that cleanup should be handled separately instead of being bundled with the std/variance fix.

Local validation after removing this PR's dict-branch change passed:
hash_aggregate_test.py::test_std_variance_extreme_floating_point plus json_matrix_test.py::test_from_json_map_string_string[int_formatted.json]: 11 passed, 3 warnings in 116.69s.

Signed-off-by: Allen Xu <allxu@nvidia.com>
@wjxiz1992
Copy link
Copy Markdown
Collaborator Author

build

@wjxiz1992
Copy link
Copy Markdown
Collaborator Author

build

Signed-off-by: Allen Xu <allxu@nvidia.com>
@wjxiz1992
Copy link
Copy Markdown
Collaborator Author

build

2 similar comments
@wjxiz1992
Copy link
Copy Markdown
Collaborator Author

build

@wjxiz1992
Copy link
Copy Markdown
Collaborator Author

build

@wjxiz1992 wjxiz1992 requested a review from thirtiseven May 13, 2026 06:36
Comment thread integration_tests/src/main/python/asserts.py Outdated
Signed-off-by: Allen Xu <allxu@nvidia.com>
}

def _canonicalize_std_variance_overflow_value(value):
if isinstance(value, float) and (math.isnan(value) or math.isinf(value)):
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: should we avoid treating -Infinity as an acceptable overflow sentinel here? For stddev/variance over finite inputs, NaN vs +Infinity can be explained by overflow and merge order, but -Infinity would likely indicate a different issue, such as a negative M2/sign bug. It may be safer to canonicalize only NaN and +Infinity, and update the compatibility note accordingly.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in f402c5f. The canonicalizer now treats only NaN and +Infinity as accepted overflow sentinels; -Infinity is left as-is so a negative-M2/sign bug would surface as a real diff. The compatibility note in docs/compatibility.md and the inline test comment are updated to match.

Re-ran hash_aggregate_test.py::test_std_variance and ::test_std_variance_extreme_floating_point locally with Python 3.10 + Spark 3.3.0: 265 passed, 39510 deselected, 8 warnings in 1425.14s.

return (from_cpu, from_gpu)

def assert_gpu_and_cpu_are_equal_collect(func, conf={}, is_cpu_first=True, result_canonicalize_func_before_compare=None):
def assert_gpu_and_cpu_are_equal_collect(func, conf={}, is_cpu_first=True,
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: unnecessary change?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reverted in f402c5f. The assert_gpu_and_cpu_are_equal_collect signature is back to a single line; this PR no longer changes that line.

"""
_assert_gpu_and_cpu_are_equal(func, 'COLLECT', conf=conf, is_cpu_first=is_cpu_first, result_canonicalize_func_before_compare=result_canonicalize_func_before_compare)
_assert_gpu_and_cpu_are_equal(func, 'COLLECT', conf=conf, is_cpu_first=is_cpu_first,
result_canonicalize_func_before_compare=result_canonicalize_func_before_compare)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: unnecessary change?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reverted in f402c5f. The _assert_gpu_and_cpu_are_equal call is back to a single line; this PR no longer changes that line.

Address PR review nits on NVIDIA#14762:

- hash_aggregate_test.py: the std/variance overflow canonicalizer no
  longer accepts -Infinity. Over finite inputs, NaN and +Infinity can
  be explained by overflow plus partial-merge order; -Infinity would
  indicate a different issue (e.g. negative M2 / sign bug) and must
  not be hidden by sentinel canonicalization.
- docs/compatibility.md: align the std/variance overflow note with
  the test scope (NaN and +Infinity only).
- asserts.py: revert two unnecessary line wraps on
  assert_gpu_and_cpu_are_equal_collect signature and its call to
  _assert_gpu_and_cpu_are_equal; the new parameter plumbing through
  assert_gpu_and_cpu_are_equal_sql is kept.

Validation (Python 3.10, Spark 3.3.0):
TESTS='hash_aggregate_test.py::test_std_variance hash_aggregate_test.py::test_std_variance_extreme_floating_point':
265 passed, 39510 deselected, 8 warnings in 1425.14s.

Signed-off-by: Allen Xu <allxu@nvidia.com>
@wjxiz1992
Copy link
Copy Markdown
Collaborator Author

build

@wjxiz1992 wjxiz1992 requested a review from thirtiseven May 14, 2026 09:34
Copy link
Copy Markdown
Collaborator

@thirtiseven thirtiseven left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@wjxiz1992 wjxiz1992 merged commit ca3c8f1 into NVIDIA:main May 15, 2026
49 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[BUG] test_std_variance fails with GPU nan vs CPU inf on Double data with small batchSizeBytes intermittently

4 participants