Merge release/26.06 into main #22590
Closed
KyleFromNVIDIA wants to merge 10 commits into
Updated cudf-polars to support Polars 1.39. Summary:
* **Dependency pin** updated across conda envs, the recipe, `dependencies.yaml`, and `pyproject.toml`. New `POLARS_VERSION_LT_139` flag gates version specific code.
* **Rolling expressions:** polars 1.39 makes `pl.col(...).rolling(...)` accessible again via `AExpr::Rolling`. A new `_translate_rolling` handles it, registered only when the node type exists. Rolling tests use a single `skip_rolling_expr_136_to_138` marker.
* **HConcat strict mode:** added a `strict` slot on the `HConcat` IR that raises `pl.exceptions.ShapeError` on height mismatch, threaded through every construction site.
* **IsBetween Decimal vs Float:** new `_align_decimal_float_for_comparison` casts Decimal to Float64 on 1.39+, since polars no longer inserts that cast and libcudf would otherwise give wrong results.
* **set_sorted:** options shape changed from `(asc_str,)` to `(descending_bool, ...)`; translator branches on type.
* **Dynamic predicates:** new `_is_dynamic_pred` helper makes Scan and Filter skip predicates that raise `"dynamic_pred"`.
* **IR version ceiling** raised from `(12, 1)` to `(12, 2)`. Sink format check now includes `"Json"`, and a precedence bug in `_sink_to_file` is fixed.
Authors:
- Matthew Murray (https://github.com/Matt711)
- Matthew Roeschke (https://github.com/mroeschke)
Approvers:
- James Lamb (https://github.com/jameslamb)
- Matthew Roeschke (https://github.com/mroeschke)
URL: rapidsai#22048
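To make the version gate and the `set_sorted` branching concrete, here is a minimal stdlib-only sketch. It is not the cudf-polars implementation: `parse_version`, `version_lt`, and `is_descending` are hypothetical helper names, and the string spelling `"descending"` for the older options shape is an assumption, since the commit message only describes it as `asc_str`.

```python
from __future__ import annotations


def parse_version(text: str) -> tuple[int, ...]:
    """Turn '1.39.0' into (1, 39, 0), keeping only the leading digits of each part."""
    numbers = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        numbers.append(int(digits))
    return tuple(numbers)


def version_lt(installed: str, bound: tuple[int, ...]) -> bool:
    """Hypothetical stand-in for how a flag like POLARS_VERSION_LT_139 could be computed."""
    return parse_version(installed) < bound


def is_descending(options: tuple) -> bool:
    """Branch on the type of the first option, as the summary describes.

    Newer shape: (descending_bool, ...). Older shape: (asc_str,).
    The string value "descending" is an assumption made for illustration.
    """
    first = options[0]
    if isinstance(first, bool):
        return first
    return first == "descending"


# Flag evaluated once against a version string (hard-coded here for the sketch).
POLARS_VERSION_LT_139 = version_lt("1.38.1", (1, 39))
```

In the real package the installed version would come from `polars.__version__` rather than a literal; the point is only that version-specific code paths key off a single boolean computed once.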
rapidsai#22558)
PR rapidsai#22048 (merged today) added the new `test_hconcat_strict_different_heights` test, which imports `assert_collect_raises`. However, PR rapidsai#22535 (also merged today) removed that helper. The two PRs landed on `release/26.06` without the conflict being noticed. On `main`, `test_hconcat.py` does not contain the strict-mode test, so the issue is limited to `release/26.06`.
Authors:
- Mads R. B. Kristensen (https://github.com/madsbk)
Approvers:
- Matthew Murray (https://github.com/Matt711)
URL: rapidsai#22558
…ai#22529)
This PR fixes the use-after-destroy and stream-ordering (with PTDS input) issues with host buffer sources in the `fetch_byte_ranges_to_device_async` IO utility used by parquet and hybrid scan. See follow-up PR rapidsai#22550, which reduces the locked region size by moving all `host_read_async` calls outside it.
Authors:
- Muhammad Haseeb (https://github.com/mhaseeb123)
Approvers:
- Bradley Dice (https://github.com/bdice)
- Amin Aramoon (https://github.com/aminaramoon)
- Vukasin Milovanovic (https://github.com/vuule)
URL: rapidsai#22529
This PR is pure moving/renaming.
### New layout
```
cudf_polars/
    callback.py
    containers/
    dsl/
    engine/             ← user-facing GPU engine classes (Streaming/Ray/Dask/SPMD/DefaultSingleton)
    streaming/          ← multi-partition execution layer (formerly "experimental")
        actor_graph/    ← RapidsMPF-backed runtime
        collectives/    ← RapidsMPF collective communication primitives
        benchmarks/
            utils.py    ← consolidated benchmark utilities (formerly split between utils.py shim and utils_new_frontends.py)
            pdsds.py
            ...
        base.py
        dispatch.py
        parallel.py
        groupby.py
        io.py
        join.py
        ...
    testing/
    typing/
    utils/
```
Engine entry points move from deeply nested experimental paths to top-level imports:
```
cudf_polars.experimental.rapidsmpf.frontend.options → cudf_polars.engine.options
cudf_polars.experimental.rapidsmpf.frontend.spmd → cudf_polars.engine.spmd
cudf_polars.experimental.rapidsmpf.frontend.ray → cudf_polars.engine.ray
cudf_polars.experimental.rapidsmpf.frontend.dask → cudf_polars.engine.dask
cudf_polars.experimental.rapidsmpf.frontend.core → cudf_polars.engine.core
```
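For downstream code that still references the old paths, the mapping above can be applied mechanically. The following is a small sketch of such a rewrite; `new_import_path` is a hypothetical helper written for illustration and is not part of cudf-polars. It only covers the five modules listed above.

```python
OLD_PREFIX = "cudf_polars.experimental.rapidsmpf.frontend"
NEW_PREFIX = "cudf_polars.engine"
# Only the submodules named in the mapping above are rewritten.
MOVED = ("options", "spmd", "ray", "dask", "core")


def new_import_path(old: str) -> str:
    """Map an old dotted path to its new location; return it unchanged if not covered."""
    prefix = OLD_PREFIX + "."
    if not old.startswith(prefix):
        return old
    head, _, rest = old[len(prefix):].partition(".")
    if head not in MOVED:
        return old
    return f"{NEW_PREFIX}.{head}" + (f".{rest}" if rest else "")


for name in MOVED:
    print(f"{OLD_PREFIX}.{name} -> {new_import_path(f'{OLD_PREFIX}.{name}')}")
```

Paths outside the listed set are returned untouched rather than guessed, since this PR description does not spell out every other rename.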
Benchmarks are now under `streaming`:
```
python -m cudf_polars.streaming.benchmarks.pdsh
```
Authors:
- Mads R. B. Kristensen (https://github.com/madsbk)
Approvers:
- Lawrence Mitchell (https://github.com/wence-)
- Peter Andreas Entschev (https://github.com/pentschev)
- Bradley Dice (https://github.com/bdice)
- Matthew Murray (https://github.com/Matt711)
URL: rapidsai#22491
This backports a pair of commits for the cudf-polars benchmarking CLI. We're currently running benchmarks against both release/26.06 and main.
Authors:
- Tom Augspurger (https://github.com/TomAugspurger)
- Lawrence Mitchell (https://github.com/wence-)
Approvers:
- Bradley Dice (https://github.com/bdice)
URL: rapidsai#22572
Fixes a memcheck error introduced by rapidsai#22452 where an atomic operation on a bool variable is reported by compute-sanitizer as an out-of-bounds access. Changing the variable to an `int32_t` resolves the error.
Closes rapidsai#22570
Authors:
- David Wendt (https://github.com/davidwendt)
Approvers:
- Bradley Dice (https://github.com/bdice)
- Yunsong Wang (https://github.com/PointKernel)
URL: rapidsai#22571
- Follow up to rapidsai#22491
- Moves the `collectives` module under `actor_graph` to break circular dependencies. The "collectives" are **mostly** used to build the actor graph anyway.

**Note**: Before merging this, I'd like to get confirmation that others see circular-import errors locally. E.g.
```
pytest -v python/cudf_polars/tests/streaming/test_groupby.py
...
E ImportError: cannot import name 'ShuffleManager' from partially initialized module 'cudf_polars.streaming.collectives.shuffle' (most likely due to a circular import) (/raid/rzamora/rapids-26.06/cudf/python/cudf_polars/cudf_polars/streaming/collectives/shuffle.py)
```
Authors:
- Richard (Rick) Zamora (https://github.com/rjzamora)
Approvers:
- Matthew Murray (https://github.com/Matt711)
- Mads R. B. Kristensen (https://github.com/madsbk)
URL: rapidsai#22578
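For readers unfamiliar with this failure mode, the sketch below reproduces the same class of `ImportError` with two throwaway modules written to a temporary directory. The module names `demo_shuffle` and `demo_graph` are invented for illustration and do not correspond to the actual cudf-polars import chain, which this PR description does not spell out.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def demo_circular_import() -> str:
    """Import two mutually dependent modules and return the resulting error text."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        # demo_shuffle needs demo_graph before it has defined ShuffleManager ...
        (root / "demo_shuffle.py").write_text(
            "from demo_graph import build_graph\n"
            "class ShuffleManager:\n"
            "    pass\n"
        )
        # ... while demo_graph needs ShuffleManager from the half-loaded demo_shuffle.
        (root / "demo_graph.py").write_text(
            "from demo_shuffle import ShuffleManager\n"
            "def build_graph():\n"
            "    return ShuffleManager()\n"
        )
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            importlib.import_module("demo_shuffle")
        except ImportError as exc:
            return str(exc)
        finally:
            # Leave the interpreter state as we found it.
            sys.path.remove(tmp)
            for name in ("demo_shuffle", "demo_graph"):
                sys.modules.pop(name, None)
    return ""


message = demo_circular_import()
print(message)
```

Relocating one side of such a cycle so that the dependency runs in a single direction is the general remedy, which is consistent with moving `collectives` under `actor_graph` here.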
Hand-tune `polars_impl` for 19 TPC-DS benchmark queries in `python/cudf_polars/cudf_polars/experimental/benchmarks/pdsds_queries/`. Each rewrite preserves query semantics and only changes how the polars LazyFrame is constructed; `duckdb_impl` is unchanged.

The optimizations apply a small set of recurring patterns that the polars optimizer does not (yet) perform automatically:
- **Predicate pushdown on dimension tables**: pre-filter `date_dim`, `item`, `store`, etc. by literal predicates (year, quarter, month window, category/class/brand) before any join, so the join builds smaller hash tables.
- **Semi-join fact-table pre-filtering**: use selective dimension keys (and in some cases `store_returns` (customer, item) pairs) as semi-join probes against the fact tables, shrinking them before the expensive joins.
- **Projection pushdown**: `select(...)` only the columns each table contributes before joining, instead of relying on the planner to prune them later.
- **Condition-join → equi-join**: replace cross-join + filter and CONDITIONALJOIN-style patterns with constant-key equi-joins where the predicate is equivalent.
- **Single-pass bucket aggregation**: collapse multiple independent global-sum group-bys over the same fact table into one pass that emits the values in a single aggregation, replacing N scans with 1.
- **Join reordering**: defer non-selective joins (e.g. customer) until after the selective filter chain so the row count entering the deferred join is much smaller.

## Test plan
- [ ] Run TPC-DS validation against DuckDB on the 19 modified queries
- [ ] Run benchmark sweep and confirm no regressions vs. main on unmodified queries
- [ ] Confirm result equality (sorted output) matches DuckDB reference

Authors:
- Matthew Murray (https://github.com/Matt711)
Approvers:
- Tom Augspurger (https://github.com/TomAugspurger)
URL: rapidsai#22395
…ai#22582)
Provide a patch for apache/arrow#48801
Fixes rapidsai#22540
Authors:
- Kyle Edwards (https://github.com/KyleFromNVIDIA)
Approvers:
- Bradley Dice (https://github.com/bdice)
- Paul Mattione (https://github.com/pmattione-nvidia)
- MithunR (https://github.com/mythrocks)
URL: rapidsai#22582
Closing in favor of #22585
Walkthrough: Mass migration from experimental to streaming/engine modules, Polars dependency upper bound to <1.40, DSL/IR/version-gate updates, Parquet async read refactor with batched device copies, Arrow/RapidJSON CMake override, plus extensive benchmark and test updates to new APIs.