Fix MERGE_M2 for extreme finite partial means by wjxiz1992 · Pull Request #22393 · rapidsai/cudf

wjxiz1992 · 2026-05-06T04:21:55Z

Description

MERGE_M2 now treats merging the first non-empty partial into an empty accumulator as an identity operation. This avoids evaluating the generic merge formula with n == 0, where an extreme finite mean can make delta * delta overflow to inf and then produce NaN via inf * 0.

For non-empty merges, the update now uses the central-moment form with delta_n = delta / new_n, matching the numerically safer order used by Spark's CPU implementation.
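The difference between the two paths can be sketched on the host. This is a minimal illustration, not the cudf kernel: `M2State`, `merge_generic`, and `merge_identity_first` are hypothetical names, the pre-fix formula is written the way this description characterizes it (a `delta * delta` term evaluated with `n == 0`), and `std::numeric_limits<double>::max()` stands in for an extreme finite mean.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical partial state (count, mean, M2); not a cudf type.
struct M2State {
  double n;
  double avg;
  double m2;
};

// Generic merge applied unconditionally, including when acc.n == 0.
inline M2State merge_generic(M2State acc, M2State p)
{
  if (p.n == 0) { return acc; }
  double const new_n = acc.n + p.n;
  double const delta = p.avg - acc.avg;
  // With acc.n == 0 and a huge delta this evaluates inf * 0, i.e. NaN.
  acc.m2 += p.m2 + delta * delta * acc.n * p.n / new_n;
  acc.avg = (acc.avg * acc.n + p.avg * p.n) / new_n;
  acc.n   = new_n;
  return acc;
}

// Identity for the first non-empty partial, central-moment form otherwise.
inline M2State merge_identity_first(M2State acc, M2State p)
{
  if (p.n == 0) { return acc; }
  if (acc.n == 0) { return p; }
  double const new_n   = acc.n + p.n;
  double const delta   = p.avg - acc.avg;
  double const delta_n = delta / new_n;
  acc.avg += delta_n * p.n;
  acc.m2 += p.m2 + delta * delta_n * acc.n * p.n;
  acc.n = new_n;
  return acc;
}

// Usage: one extreme finite partial, then a second finite partial.
inline void check_extreme_finite()
{
  double const big = std::numeric_limits<double>::max();
  M2State const empty{0.0, 0.0, 0.0};
  M2State const extreme{1.0, big, 0.0};
  M2State const finite{1.0, 0.0, 0.0};

  assert(std::isnan(merge_generic(empty, extreme).m2));  // inf * 0
  M2State const first = merge_identity_first(empty, extreme);
  assert(first.m2 == 0.0 && first.avg == big);  // partial preserved
  M2State const both = merge_identity_first(first, finite);
  assert(std::isinf(both.m2) && both.m2 > 0.0);  // m2 = +inf
}
```

Calling `check_extreme_finite()` exercises the same two expectations as the tests listed in this description: a single extreme partial keeps `m2 = 0.0`, and merging it with another finite partial yields `m2 = +inf`.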

Added groupby tests for:

- a single extreme finite partial, which should preserve m2 = 0.0
- merging an extreme finite partial with another finite partial, which should produce m2 = +inf

Local validation:

cmake -S cpp -B cpp/build \
  -DCMAKE_INSTALL_PREFIX=/home/allxu/work/spark-set/cudf-14681-merge-m2/cpp/build/install \
  -DCMAKE_CUDA_ARCHITECTURES=NATIVE \
  -DUSE_NVTX=ON \
  -DBUILD_TESTS=ON \
  -DBUILD_BENCHMARKS=OFF \
  -DDISABLE_DEPRECATION_WARNINGS=ON \
  -DCMAKE_BUILD_TYPE=Release \
  -DZLIB_INCLUDE_DIR=/home/allxu/.local/lib/python3.12/site-packages/lxml/includes/extlibs \
  -DZLIB_LIBRARY=/usr/lib/x86_64-linux-gnu/libz.so.1

cmake --build cpp/build --target GROUPBY_TEST -j12
[100%] Built target GROUPBY_TEST

./cpp/build/gtests/GROUPBY_TEST --gtest_filter='GroupbyMergeM2*' --gtest_color=no
[==========] 44 tests from 7 test suites ran. (134 ms total)
[  PASSED  ] 44 tests.

cmake --build cpp/build --target generate_ctest_json -j12
cmake --build cpp/build --target cudf_identify_stream_usage_mode_cudf -j12
ctest --test-dir cpp/build -R '^GROUPBY_TEST$' --output-on-failure
100% tests passed, 0 tests failed out of 2

Checklist

I am familiar with the Contributing Guidelines.
New or existing tests cover these changes.
The documentation is up to date with these changes.

copy-pr-bot · 2026-05-06T04:21:59Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.


Copilot

Pull request overview

This PR fixes numerical edge cases in the MERGE_M2 groupby aggregation when merging partial states with extreme (but finite) means, preventing NaN production when the accumulator is still empty and aligning the merge update with a numerically safer formulation.

Changes:

- Treat merging the first non-empty partial into an empty accumulator as an identity operation to avoid inf * 0 -> NaN.
- Update non-empty merge math to use a central-moment form (delta_n = delta / new_n) for improved numerical stability.
- Add groupby regression tests covering extreme finite partials (identity case) and extreme+finite merges (expected m2 = +inf).

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| `cpp/src/groupby/sort/group_merge_m2.cu` | Adds an early identity-path for empty accumulators and updates the merge formula to a safer central-moment form. |
| `cpp/tests/groupby/merge_m2_tests.cpp` | Adds regression tests for extreme finite means to ensure `MERGE_M2` does not produce `NaN` and behaves as expected. |


Signed-off-by: Allen Xu <allxu@nvidia.com>

…l struct

Cover both int64_t and double count columns for the MERGE_M2 extreme-finite cases. Spark stores the count as FLOAT64, which the original two TEST_F variants did not exercise. Strengthen the assertion to compare the full result struct (counts, means, m2) rather than only the m2 child column.

Signed-off-by: Allen Xu <allxu@nvidia.com>

davidwendt · 2026-05-06T18:49:52Z

/ok to test 8b781d9

wjxiz1992 · 2026-05-07T03:12:38Z

/ok to test c331016

davidwendt · 2026-05-07T12:48:57Z

@wjxiz1992 Are you planning to resolve the Copilot review comments?

pmattione-nvidia · 2026-05-07T15:10:43Z

+      // Merging an empty accumulator with a non-empty partial is an identity operation. Running
+      // the generic formula for this case can evaluate inf * 0 and turn extreme finite partials
+      // into NaN.
+      if (n == 0) {


Yes, but what if the input mean is literally infinity, or a NaN? Then it should return NaN, right? You should also check std::isfinite() here. Or am I misunderstanding what merge m2 is trying to do?

wjxiz1992 · 2026-05-08

Walking through the cases:

partial_avg = +Inf (with partial_n > 0): the identity branch propagates avg = +Inf and m2 = partial_m2 as-is. The old generic path produced NaN here via delta * delta_n * n * partial_n = +Inf * +Inf * 0 * partial_n, which contains inf * 0 = NaN, the same side effect this PR is fixing. Propagating +Inf preserves the upstream "overflowed" signal; coercing to NaN would discard it.

partial_avg = NaN: identity sets avg = NaN; any subsequent merge step propagates NaN through the generic formula (NaN ⊕ anything = NaN). Final result is NaN regardless of partial position, as expected.
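These two cases can be traced with a small host-side sketch. It is an illustration under assumptions rather than the cudf kernel: `Partial` and `merge_all` are hypothetical names, and the loop mirrors the identity branch plus the `delta_n` form described in this PR.

```cpp
#include <cassert>
#include <cmath>
#include <limits>
#include <vector>

// Hypothetical partial state (count, mean, M2); not a cudf type.
struct Partial {
  double n;
  double avg;
  double m2;
};

// Fold partials left to right: identity for the first non-empty one,
// central-moment update for the rest.
inline Partial merge_all(std::vector<Partial> const& partials)
{
  Partial acc{0.0, 0.0, 0.0};
  for (auto const& p : partials) {
    if (p.n == 0) { continue; }
    if (acc.n == 0) {
      acc = p;  // non-finite means are copied as-is
      continue;
    }
    double const new_n   = acc.n + p.n;
    double const delta   = p.avg - acc.avg;
    double const delta_n = delta / new_n;
    acc.avg += delta_n * p.n;
    acc.m2 += p.m2 + delta * delta_n * acc.n * p.n;
    acc.n = new_n;
  }
  return acc;
}

inline void check_non_finite_means()
{
  double const inf = std::numeric_limits<double>::infinity();
  double const nan = std::numeric_limits<double>::quiet_NaN();

  // A single +Inf-mean partial keeps avg = +Inf and its own m2.
  Partial const a = merge_all({Partial{2.0, inf, 3.0}});
  assert(std::isinf(a.avg) && a.avg > 0.0 && a.m2 == 3.0);

  // A NaN-mean partial makes the result NaN in either position.
  Partial const b = merge_all({Partial{1.0, nan, 0.0}, Partial{1.0, 1.0, 0.0}});
  Partial const c = merge_all({Partial{1.0, 1.0, 0.0}, Partial{1.0, nan, 0.0}});
  assert(std::isnan(b.avg) && std::isnan(b.m2));
  assert(std::isnan(c.avg) && std::isnan(c.m2));
}
```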

In practice Spark's CentralMomentAgg doesn't emit (count, +Inf, m2_finite) partials — Welford hits +Inf - +Inf = NaN on the first overflowing row, so the partial becomes (count, NaN, NaN). So the "+Inf avg" case really only shows up for direct callers of MERGE_M2 with hand-crafted partials, and for those propagation is strictly more informative than coercion.
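The overflow path mentioned here can be reproduced with a textbook Welford update. This is a sketch under assumptions: `Welford` and `update` are hypothetical names, and the expression order is the common streaming form, which may not match Spark's `CentralMomentAgg` line for line. The inputs are chosen so that `delta` itself overflows.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical running state for a streaming variance; not Spark's class.
struct Welford {
  double n   = 0.0;
  double avg = 0.0;
  double m2  = 0.0;
};

// Textbook Welford step (assumed form).
inline void update(Welford& s, double x)
{
  s.n += 1.0;
  double const delta   = x - s.avg;
  double const delta_n = delta / s.n;
  s.avg += delta_n;
  s.m2 += delta * (delta - delta_n);
}

inline void check_overflowing_row()
{
  double const big = std::numeric_limits<double>::max();
  Welford s;
  update(s, big);  // finite so far: avg = big, m2 = 0
  assert(s.avg == big && s.m2 == 0.0);
  update(s, -big);  // delta = -big - big overflows to -inf
  // delta - delta_n is (-inf) - (-inf), so m2 is NaN from this row on.
  assert(s.n == 2.0 && std::isnan(s.m2));
}
```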

I pushed 5d917711 (now 071266d after rebase) with regression tests pinning these semantics: NanMeanFirstPartial, InfMeanFirstPartial, and NanMeanMergedWithFinite for both INT64 and FLOAT64 count types. Let me know if there's a Spark scenario where NaN coercion is actually wanted — I'm not seeing one.

Add regression tests showing the identity branch propagates non-finite partial means as-is, instead of coercing them to NaN. Covers single NaN-mean partial, single +Inf-mean partial, and NaN-mean merged with a finite partial. Both INT64 and FLOAT64 count types are covered.

Signed-off-by: Allen Xu <allxu@nvidia.com>

wjxiz1992 · 2026-05-08T02:20:38Z

@davidwendt the two Copilot review comments are now resolved (replied inline) — both were already addressed in 8b781d9 (full struct comparison, plus FLOAT64-count variants for the Spark path). Also pushed 071266d with regression tests for NaN/Inf partial means in response to @pmattione-nvidia.

wjxiz1992 · 2026-05-08T02:20:40Z

/ok to test 071266d

copy-pr-bot · 2026-05-08T02:20:43Z

/ok to test 071266d

@wjxiz1992, there was an error processing your request: E2

See the following link for more information: https://docs.gha-runners.nvidia.com/cpr/e/2/

coderabbitai · 2026-05-08T02:20:47Z

No actionable comments were generated in the recent review. 🎉


📥 Commits

Reviewing files that changed from the base of the PR and between 65df106 and 071266d.

📒 Files selected for processing (2)

cpp/src/groupby/sort/group_merge_m2.cu
cpp/tests/groupby/merge_m2_tests.cpp

📝 Walkthrough

Summary by CodeRabbit

Bug Fixes
- Fixed merge logic for group-by aggregations to correctly handle empty accumulators. Previously, merging empty accumulators with non-empty ones could produce NaN values; values are now assigned directly from the first non-empty accumulator.
Tests
- Added test coverage for extreme and non-finite (NaN/Inf) values in M2 aggregation merge operations.

Walkthrough

The PR fixes a bug in the MERGE_M2 aggregation where merging M2 partial states could produce NaN. A special case was added to the merge logic to directly assign the first non-empty partial's values when the accumulator is empty, so the generic formula no longer evaluates inf * 0 = NaN. Tests validate extreme finite means, NaN, and Inf propagation across single and merged partials.

Changes

M2 Merge Extreme Values

| Layer / File(s) | Summary |
| --- | --- |
| Core Implementation `cpp/src/groupby/sort/group_merge_m2.cu` | Added a special-case branch in `merge_fn::operator()` that directly copies `partial_n`, `partial_avg`, and `partial_m2` into the accumulator when the running count `n` is zero, preventing the generic formula from computing `inf * 0 = NaN`. |
| Test Validation `cpp/tests/groupby/merge_m2_tests.cpp` | Added `#include <limits>`, templated helper functions (`test_extreme_finite_first_partial`, `test_extreme_finite_merged_partials`, `test_nan_mean_first_partial`, `test_inf_mean_first_partial`, `test_nan_mean_merged_with_finite`), new fixture `GroupbyMergeM2ExtremeTest`, and concrete `TEST_F` cases for `int64_t` and `double` types covering extreme finite, NaN, and Inf mean propagation. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 10.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly and specifically identifies the main fix: addressing MERGE_M2 behavior for extreme finite partial means, which is the core issue from `#22391`. |
| Description check | ✅ Passed | The description provides comprehensive context including the issue link, technical explanation of the problem and solution, and validation evidence through test execution. |
| Linked Issues check | ✅ Passed | The PR implements all key requirements from `#22391`: short-circuiting when n==0 to avoid NaN, using central-moment form for safety, and adding comprehensive tests for extreme finite values. |
| Out of Scope Changes check | ✅ Passed | All changes are directly scoped to fixing MERGE_M2 behavior and adding corresponding tests; no unrelated modifications are present. |


wjxiz1992 · 2026-05-08T02:42:42Z

/ok to test f49c069

davidwendt · 2026-05-08T21:06:28Z

/merge

Closes rapidsai#22391.

`MERGE_M2` now treats merging the first non-empty partial into an empty accumulator as an identity operation. This avoids evaluating the generic merge formula with `n == 0`, where an extreme finite mean can make `delta * delta` overflow to `inf` and then produce `NaN` via `inf * 0`.

For non-empty merges, the update now uses the central-moment form with `delta_n = delta / new_n`, matching the numerically safer order used by Spark's CPU implementation.

Added groupby tests for:
- a single extreme finite partial, which should preserve `m2 = 0.0`
- merging an extreme finite partial with another finite partial, which should produce `m2 = +inf`

Local validation:

```text
cmake -S cpp -B cpp/build \
  -DCMAKE_INSTALL_PREFIX=/home/allxu/work/spark-set/cudf-14681-merge-m2/cpp/build/install \
  -DCMAKE_CUDA_ARCHITECTURES=NATIVE \
  -DUSE_NVTX=ON \
  -DBUILD_TESTS=ON \
  -DBUILD_BENCHMARKS=OFF \
  -DDISABLE_DEPRECATION_WARNINGS=ON \
  -DCMAKE_BUILD_TYPE=Release \
  -DZLIB_INCLUDE_DIR=/home/allxu/.local/lib/python3.12/site-packages/lxml/includes/extlibs \
  -DZLIB_LIBRARY=/usr/lib/x86_64-linux-gnu/libz.so.1
```

```text
cmake --build cpp/build --target GROUPBY_TEST -j12
[100%] Built target GROUPBY_TEST
```

```text
./cpp/build/gtests/GROUPBY_TEST --gtest_filter='GroupbyMergeM2*' --gtest_color=no
[==========] 44 tests from 7 test suites ran. (134 ms total)
[  PASSED  ] 44 tests.
```

```text
cmake --build cpp/build --target generate_ctest_json -j12
cmake --build cpp/build --target cudf_identify_stream_usage_mode_cudf -j12
ctest --test-dir cpp/build -R '^GROUPBY_TEST$' --output-on-failure
100% tests passed, 0 tests failed out of 2
```

Authors:
- Allen Xu (https://github.com/wjxiz1992)

Approvers:
- Paul Mattione (https://github.com/pmattione-nvidia)
- David Wendt (https://github.com/davidwendt)

URL: rapidsai#22393

Copilot AI review requested due to automatic review settings May 6, 2026 04:21

wjxiz1992 requested a review from a team as a code owner May 6, 2026 04:21

wjxiz1992 requested review from davidwendt and mythrocks May 6, 2026 04:21

github-actions Bot assigned wjxiz1992 May 6, 2026

github-actions Bot added the libcudf label (Affects libcudf (C++/CUDA) code.) May 6, 2026

Copilot started reviewing on behalf of wjxiz1992 May 6, 2026 04:22

Copilot AI reviewed May 6, 2026


Comment thread cpp/tests/groupby/merge_m2_tests.cpp Outdated

Comment thread cpp/tests/groupby/merge_m2_tests.cpp Outdated

wjxiz1992 added the bug (Something isn't working) and non-breaking (Non-breaking change) labels May 6, 2026

Fix merge M2 with extreme finite partial means

c12348e

Signed-off-by: Allen Xu <allxu@nvidia.com>

wjxiz1992 force-pushed the fix/14681-merge-m2-extreme branch from ad5ba04 to c12348e May 6, 2026 05:01

Merge branch 'main' into fix/14681-merge-m2-extreme

c331016

wjxiz1992 mentioned this pull request May 7, 2026

[BUG] test_std_variance fails with GPU nan vs CPU inf on Double data with small batchSizeBytes intermittently NVIDIA/spark-rapids#14681

Closed

pmattione-nvidia reviewed May 7, 2026


Merge branch 'main' into fix/14681-merge-m2-extreme

f49c069

pmattione-nvidia approved these changes May 8, 2026


davidwendt approved these changes May 8, 2026


rapids-bot Bot merged commit 0a1620e into main May 8, 2026
465 of 495 checks passed

wjxiz1992 mentioned this pull request May 9, 2026

[AutoSparkUT] Fix std variance floating overflow coverage NVIDIA/spark-rapids#14762

Merged

8 tasks
