Skip to content

[DO NOT MERGE] PDS DS ALL#22469

Draft
quasiben wants to merge 97 commits into
rapidsai:mainfrom
quasiben:pds-ds-all
Draft

[DO NOT MERGE] PDS DS ALL#22469
quasiben wants to merge 97 commits into
rapidsai:mainfrom
quasiben:pds-ds-all

Conversation

@quasiben
Copy link
Copy Markdown
Member

Matt711 and others added 30 commits April 17, 2026 01:58
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
python/cudf_polars/tests/testing/test_asserts.py (1)

479-508: ⚡ Quick win

Expand this new grouped-sort test with edge-case fixtures

Please add parameterized cases for empty, all-null, and single-element frames (same assertion flow) to harden this path.

As per coding guidelines, tests should “include tests for empty, all-null, single-element, and mixed type cases.”

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@python/cudf_polars/tests/testing/test_asserts.py` around lines 479 - 508, Add
parameterization to test_assert_tpch_result_equal_grouped_float_sort to exercise
empty, all-null, and single-element frames: introduce a new
pytest.mark.parametrize("fixture_type", ["normal","empty","all_null","single"])
and for each branch construct left/right/right_different accordingly (e.g.,
empty -> pl.DataFrame({"a": [], "b": [], "c": []}), all_null -> c column all
None/np.nan, single -> one-row frames with matching and mismatching float
values); keep the same drop_columns logic and sort_by adjustment, call
assert_tpch_result_equal for the matching pair (abs_tol=0.01, check_exact=False)
and expect ValidationError for right_different, and ensure you reuse the
existing symbols test_assert_tpch_result_equal_grouped_float_sort and
assert_tpch_result_equal so the new cases follow the same assertion flow.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@python/cudf_polars/cudf_polars/experimental/benchmarks/asserts.py`:
- Around line 370-396: The current else-branch always reorders rows via
grouped_sort_columns before comparing, which neutralizes check_row_order; update
the logic around left_sorted/right_sorted so that when check_row_order is True
you do NOT sort (keep left/right as-is) and only perform the grouped_sort when
check_row_order is False (preserving existing behavior when sort_by is empty);
adjust the variables used in the call to polars.testing.assert_frame_equal
(left_sorted/right_sorted) accordingly so the comparison respects the
check_row_order flag.

---

Nitpick comments:
In `@python/cudf_polars/tests/testing/test_asserts.py`:
- Around line 479-508: Add parameterization to
test_assert_tpch_result_equal_grouped_float_sort to exercise empty, all-null,
and single-element frames: introduce a new
pytest.mark.parametrize("fixture_type", ["normal","empty","all_null","single"])
and for each branch construct left/right/right_different accordingly (e.g.,
empty -> pl.DataFrame({"a": [], "b": [], "c": []}), all_null -> c column all
None/np.nan, single -> one-row frames with matching and mismatching float
values); keep the same drop_columns logic and sort_by adjustment, call
assert_tpch_result_equal for the matching pair (abs_tol=0.01, check_exact=False)
and expect ValidationError for right_different, and ensure you reuse the
existing symbols test_assert_tpch_result_equal_grouped_float_sort and
assert_tpch_result_equal so the new cases follow the same assertion flow.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: cd585b0b-15b2-4184-a4a6-011913ce4283

📥 Commits

Reviewing files that changed from the base of the PR and between b1e57de and 4f69818.

📒 Files selected for processing (4)
  • python/cudf_polars/cudf_polars/experimental/benchmarks/asserts.py
  • python/cudf_polars/cudf_polars/experimental/rapidsmpf/core.py
  • python/cudf_polars/cudf_polars/experimental/rapidsmpf/over.py
  • python/cudf_polars/tests/testing/test_asserts.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • python/cudf_polars/cudf_polars/experimental/rapidsmpf/core.py
  • python/cudf_polars/cudf_polars/experimental/rapidsmpf/over.py

Comment on lines 370 to 396
else:
# no sort_by, just a straight comparison.
non_float_columns = [
col
for col in left.columns
if left.schema[col] not in (pl.Float32, pl.Float64)
]
float_columns = [
col for col in left.columns if left.schema[col] in (pl.Float32, pl.Float64)
]
grouped_sort_columns = [*non_float_columns, *float_columns]
left_sorted = (
left.sort(by=grouped_sort_columns, nulls_last=nulls_last)
if grouped_sort_columns
else left
)
right_sorted = (
right.sort(by=grouped_sort_columns, nulls_last=nulls_last)
if grouped_sort_columns
else right
)

# no sort_by, compare after grouped sort to ignore nondeterministic row order.
try:
polars.testing.assert_frame_equal(
left,
right,
left_sorted,
right_sorted,
**polars_kwargs, # type: ignore[arg-type]
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

check_row_order is neutralized when sort_by is empty

This branch always canonicalizes row order before comparison, so check_row_order=True no longer enforces original order and can hide ordering regressions.

Suggested fix
 else:
-    non_float_columns = [
-        col
-        for col in left.columns
-        if left.schema[col] not in (pl.Float32, pl.Float64)
-    ]
-    float_columns = [
-        col for col in left.columns if left.schema[col] in (pl.Float32, pl.Float64)
-    ]
-    grouped_sort_columns = [*non_float_columns, *float_columns]
-    left_sorted = (
-        left.sort(by=grouped_sort_columns, nulls_last=nulls_last)
-        if grouped_sort_columns
-        else left
-    )
-    right_sorted = (
-        right.sort(by=grouped_sort_columns, nulls_last=nulls_last)
-        if grouped_sort_columns
-        else right
-    )
+    if check_row_order:
+        left_sorted = left
+        right_sorted = right
+    else:
+        non_float_columns = [
+            col
+            for col in left.columns
+            if left.schema[col] not in (pl.Float32, pl.Float64)
+        ]
+        float_columns = [
+            col for col in left.columns if left.schema[col] in (pl.Float32, pl.Float64)
+        ]
+        grouped_sort_columns = [*non_float_columns, *float_columns]
+        left_sorted = (
+            left.sort(by=grouped_sort_columns, nulls_last=nulls_last)
+            if grouped_sort_columns
+            else left
+        )
+        right_sorted = (
+            right.sort(by=grouped_sort_columns, nulls_last=nulls_last)
+            if grouped_sort_columns
+            else right
+        )
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@python/cudf_polars/cudf_polars/experimental/benchmarks/asserts.py` around
lines 370 - 396, The current else-branch always reorders rows via
grouped_sort_columns before comparing, which neutralizes check_row_order; update
the logic around left_sorted/right_sorted so that when check_row_order is True
you do NOT sort (keep left/right as-is) and only perform the grouped_sort when
check_row_order is False (preserving existing behavior when sort_by is empty);
adjust the variables used in the call to polars.testing.assert_frame_equal
(left_sorted/right_sorted) accordingly so the comparison respects the
check_row_order flag.

@wence- wence- marked this pull request as draft May 12, 2026 17:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cudf-polars Issues specific to cudf-polars DO NOT MERGE Hold off on merging; see PR for details Python Affects Python cuDF API.

Projects

Status: In Progress

Development

Successfully merging this pull request may close these issues.

5 participants