[DO NOT MERGE] PDS DS ALL#22469
Conversation
…rs/streaming-over
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
python/cudf_polars/tests/testing/test_asserts.py (1)
479-508: ⚡ Quick winExpand this new grouped-sort test with edge-case fixtures
Please add parameterized cases for empty, all-null, and single-element frames (same assertion flow) to harden this path.
As per coding guidelines, tests should “include tests for empty, all-null, single-element, and mixed type cases.”
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@python/cudf_polars/tests/testing/test_asserts.py` around lines 479 - 508, Add parameterization to test_assert_tpch_result_equal_grouped_float_sort to exercise empty, all-null, and single-element frames: introduce a new pytest.mark.parametrize("fixture_type", ["normal","empty","all_null","single"]) and for each branch construct left/right/right_different accordingly (e.g., empty -> pl.DataFrame({"a": [], "b": [], "c": []}), all_null -> c column all None/np.nan, single -> one-row frames with matching and mismatching float values); keep the same drop_columns logic and sort_by adjustment, call assert_tpch_result_equal for the matching pair (abs_tol=0.01, check_exact=False) and expect ValidationError for right_different, and ensure you reuse the existing symbols test_assert_tpch_result_equal_grouped_float_sort and assert_tpch_result_equal so the new cases follow the same assertion flow.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@python/cudf_polars/cudf_polars/experimental/benchmarks/asserts.py`:
- Around line 370-396: The current else-branch always reorders rows via
grouped_sort_columns before comparing, which neutralizes check_row_order; update
the logic around left_sorted/right_sorted so that when check_row_order is True
you do NOT sort (keep left/right as-is) and only perform the grouped_sort when
check_row_order is False (preserving existing behavior when sort_by is empty);
adjust the variables used in the call to polars.testing.assert_frame_equal
(left_sorted/right_sorted) accordingly so the comparison respects the
check_row_order flag.
---
Nitpick comments:
In `@python/cudf_polars/tests/testing/test_asserts.py`:
- Around line 479-508: Add parameterization to
test_assert_tpch_result_equal_grouped_float_sort to exercise empty, all-null,
and single-element frames: introduce a new
pytest.mark.parametrize("fixture_type", ["normal","empty","all_null","single"])
and for each branch construct left/right/right_different accordingly (e.g.,
empty -> pl.DataFrame({"a": [], "b": [], "c": []}), all_null -> c column all
None/np.nan, single -> one-row frames with matching and mismatching float
values); keep the same drop_columns logic and sort_by adjustment, call
assert_tpch_result_equal for the matching pair (abs_tol=0.01, check_exact=False)
and expect ValidationError for right_different, and ensure you reuse the
existing symbols test_assert_tpch_result_equal_grouped_float_sort and
assert_tpch_result_equal so the new cases follow the same assertion flow.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: cd585b0b-15b2-4184-a4a6-011913ce4283
📒 Files selected for processing (4)
python/cudf_polars/cudf_polars/experimental/benchmarks/asserts.pypython/cudf_polars/cudf_polars/experimental/rapidsmpf/core.pypython/cudf_polars/cudf_polars/experimental/rapidsmpf/over.pypython/cudf_polars/tests/testing/test_asserts.py
🚧 Files skipped from review as they are similar to previous changes (2)
- python/cudf_polars/cudf_polars/experimental/rapidsmpf/core.py
- python/cudf_polars/cudf_polars/experimental/rapidsmpf/over.py
| else: | ||
| # no sort_by, just a straight comparison. | ||
| non_float_columns = [ | ||
| col | ||
| for col in left.columns | ||
| if left.schema[col] not in (pl.Float32, pl.Float64) | ||
| ] | ||
| float_columns = [ | ||
| col for col in left.columns if left.schema[col] in (pl.Float32, pl.Float64) | ||
| ] | ||
| grouped_sort_columns = [*non_float_columns, *float_columns] | ||
| left_sorted = ( | ||
| left.sort(by=grouped_sort_columns, nulls_last=nulls_last) | ||
| if grouped_sort_columns | ||
| else left | ||
| ) | ||
| right_sorted = ( | ||
| right.sort(by=grouped_sort_columns, nulls_last=nulls_last) | ||
| if grouped_sort_columns | ||
| else right | ||
| ) | ||
|
|
||
| # no sort_by, compare after grouped sort to ignore nondeterministic row order. | ||
| try: | ||
| polars.testing.assert_frame_equal( | ||
| left, | ||
| right, | ||
| left_sorted, | ||
| right_sorted, | ||
| **polars_kwargs, # type: ignore[arg-type] |
There was a problem hiding this comment.
check_row_order is neutralized when sort_by is empty
This branch always canonicalizes row order before comparison, so check_row_order=True no longer enforces original order and can hide ordering regressions.
Suggested fix
else:
- non_float_columns = [
- col
- for col in left.columns
- if left.schema[col] not in (pl.Float32, pl.Float64)
- ]
- float_columns = [
- col for col in left.columns if left.schema[col] in (pl.Float32, pl.Float64)
- ]
- grouped_sort_columns = [*non_float_columns, *float_columns]
- left_sorted = (
- left.sort(by=grouped_sort_columns, nulls_last=nulls_last)
- if grouped_sort_columns
- else left
- )
- right_sorted = (
- right.sort(by=grouped_sort_columns, nulls_last=nulls_last)
- if grouped_sort_columns
- else right
- )
+ if check_row_order:
+ left_sorted = left
+ right_sorted = right
+ else:
+ non_float_columns = [
+ col
+ for col in left.columns
+ if left.schema[col] not in (pl.Float32, pl.Float64)
+ ]
+ float_columns = [
+ col for col in left.columns if left.schema[col] in (pl.Float32, pl.Float64)
+ ]
+ grouped_sort_columns = [*non_float_columns, *float_columns]
+ left_sorted = (
+ left.sort(by=grouped_sort_columns, nulls_last=nulls_last)
+ if grouped_sort_columns
+ else left
+ )
+ right_sorted = (
+ right.sort(by=grouped_sort_columns, nulls_last=nulls_last)
+ if grouped_sort_columns
+ else right
+ )🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@python/cudf_polars/cudf_polars/experimental/benchmarks/asserts.py` around
lines 370 - 396, The current else-branch always reorders rows via
grouped_sort_columns before comparing, which neutralizes check_row_order; update
the logic around left_sorted/right_sorted so that when check_row_order is True
you do NOT sort (keep left/right as-is) and only perform the grouped_sort when
check_row_order is False (preserving existing behavior when sort_by is empty);
adjust the variables used in the call to polars.testing.assert_frame_equal
(left_sorted/right_sorted) accordingly so the comparison respects the
check_row_order flag.
# Conflicts: # python/cudf_polars/cudf_polars/experimental/benchmarks/pdsds_queries/q8.py
Branch merges final PRs for completing PDS-DS:
assert_tpch_result_equaldue to float sort ambiguity #22378