
Forward-merge release/26.06 into main#22585

Merged
rockhowse merged 10 commits into
rapidsai:mainfrom
madsbk:main-merge-release/26.06
May 20, 2026

Conversation

@madsbk
Member

@madsbk madsbk commented May 19, 2026

Description

Resolves the merge conflicts in #22555. Remember to use `/merge nosquash`.

Matt711 and others added 6 commits May 18, 2026 19:00
Updated cudf-polars to support Polars 1.39.

Summary:

* **Dependency pin** updated across conda envs, the recipe, `dependencies.yaml`, and `pyproject.toml`. New `POLARS_VERSION_LT_139` flag gates version-specific code.
* **Rolling expressions:** polars 1.39 makes `pl.col(...).rolling(...)` accessible again via `AExpr::Rolling`. A new `_translate_rolling` handles it, registered only when the node type exists. Rolling tests use a single `skip_rolling_expr_136_to_138` marker.
* **HConcat strict mode:** added a `strict` slot on the `HConcat` IR that raises `pl.exceptions.ShapeError` on height mismatch, threaded through every construction site.
* **IsBetween Decimal vs Float:** new `_align_decimal_float_for_comparison` casts Decimal to Float64 on 1.39+, since polars no longer inserts that cast and libcudf would otherwise give wrong results.
* **set_sorted:** options shape changed from `(asc_str,)` to `(descending_bool, ...)`; translator branches on type.
* **Dynamic predicates:** new `_is_dynamic_pred` helper makes Scan and Filter skip predicates that raise `"dynamic_pred"`.
* **IR version ceiling** raised from `(12, 1)` to `(12, 2)`. Sink format check now includes `"Json"`, and a precedence bug in `_sink_to_file` is fixed.
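
The version gating described above can be sketched as follows. This is a minimal, dependency-free illustration and not the cudf-polars implementation: `parse_version`, `polars_version_lt_139`, and `align_is_between_dtypes` are hypothetical names, dtypes are modeled as plain strings, and the exact condition under which the real `_align_decimal_float_for_comparison` casts is an assumption.

```python
from itertools import takewhile


def parse_version(version: str) -> tuple[int, ...]:
    """Parse the leading numeric components of a version string."""
    parts = []
    for piece in version.split("."):
        digits = "".join(takewhile(str.isdigit, piece))
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def polars_version_lt_139(version: str) -> bool:
    """Hypothetical stand-in for a flag like POLARS_VERSION_LT_139."""
    # Compare numerically so that 1.4 sorts before 1.39.
    return parse_version(version)[:2] < (1, 39)


def align_is_between_dtypes(dtypes: list[str], version: str) -> list[str]:
    """Assumed behavior: on 1.39+, cast Decimal to Float64 when mixed with a float."""
    if polars_version_lt_139(version):
        # Older polars is described as inserting the cast itself.
        return list(dtypes)
    has_float = any(d.startswith("Float") for d in dtypes)
    has_decimal = any(d.startswith("Decimal") for d in dtypes)
    if has_float and has_decimal:
        return ["Float64" if d.startswith("Decimal") else d for d in dtypes]
    return list(dtypes)
```

Keeping the version check in a single flag means each version-specific branch stays a one-line condition that is easy to find and remove once older polars versions are dropped.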

Authors:
  - Matthew Murray (https://github.com/Matt711)
  - Matthew Roeschke (https://github.com/mroeschke)

Approvers:
  - James Lamb (https://github.com/jameslamb)
  - Matthew Roeschke (https://github.com/mroeschke)

URL: rapidsai#22048
rapidsai#22558)

PR rapidsai#22048 (merged today) added the new `test_hconcat_strict_different_heights` test, which imports `assert_collect_raises`. However, PR rapidsai#22535 (also merged today) removed that helper.

The two PRs landed on `release/26.06` without the conflict being noticed.

On `main`, `test_hconcat.py` does not contain the strict-mode test, so the issue is limited to `release/26.06`.

Authors:
  - Mads R. B. Kristensen (https://github.com/madsbk)

Approvers:
  - Matthew Murray (https://github.com/Matt711)

URL: rapidsai#22558
…ai#22529)

This PR fixes use-after-destroy and stream-ordering (with PTDS input) issues that occur with a host buffer source in the `fetch_byte_ranges_to_device_async` IO utility used by parquet and hybrid scan.

See the follow-up PR rapidsai#22550, which reduces the locked region size by moving all `host_read_async` calls outside it.

Authors:
  - Muhammad Haseeb (https://github.com/mhaseeb123)

Approvers:
  - Bradley Dice (https://github.com/bdice)
  - Amin Aramoon (https://github.com/aminaramoon)
  - Vukasin Milovanovic (https://github.com/vuule)

URL: rapidsai#22529
This PR only moves and renames modules.

### New layout
```
cudf_polars/
  callback.py
  containers/
  dsl/
  engine/         ← user-facing GPU engine classes (Streaming/Ray/Dask/SPMD/DefaultSingleton)
  streaming/      ← multi-partition execution layer (formerly "experimental")
    actor_graph/  ← RapidsMPF-backed runtime
    collectives/  ← RapidsMPF collective communication primitives
    benchmarks/
      utils.py    ← consolidated benchmark utilities (formerly split between utils.py shim and utils_new_frontends.py)
      pdsds.py
      ...
    base.py
    dispatch.py
    parallel.py
    groupby.py
    io.py
    join.py
    ...
  testing/
  typing/
  utils/
```

Engine entry points move from deeply nested experimental paths to top-level imports:
```
cudf_polars.experimental.rapidsmpf.frontend.options → cudf_polars.engine.options
cudf_polars.experimental.rapidsmpf.frontend.spmd    → cudf_polars.engine.spmd
cudf_polars.experimental.rapidsmpf.frontend.ray     → cudf_polars.engine.ray
cudf_polars.experimental.rapidsmpf.frontend.dask    → cudf_polars.engine.dask
cudf_polars.experimental.rapidsmpf.frontend.core    → cudf_polars.engine.core
```
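
For code that still refers to the old module paths, the mapping above could be applied mechanically. The helper below is a hypothetical sketch (`migrate_module_path` is not part of cudf-polars); it only rewrites the five prefixes listed above and leaves every other path untouched.

```python
# Old and new package prefixes, taken from the mapping listed above.
_OLD_PREFIX = "cudf_polars.experimental.rapidsmpf.frontend"
_NEW_PREFIX = "cudf_polars.engine"
_MOVED = ("options", "spmd", "ray", "dask", "core")


def migrate_module_path(path: str) -> str:
    """Rewrite an old dotted module path to its new location, if it moved."""
    for name in _MOVED:
        old = f"{_OLD_PREFIX}.{name}"
        # Match the module itself or anything nested below it,
        # but not unrelated names that merely share the prefix.
        if path == old or path.startswith(old + "."):
            return f"{_NEW_PREFIX}.{name}" + path[len(old):]
    return path
```

For example, `migrate_module_path("cudf_polars.experimental.rapidsmpf.frontend.ray")` returns `"cudf_polars.engine.ray"`.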

Benchmarks are now under `streaming`:
```
python -m cudf_polars.streaming.benchmarks.pdsh
```

Authors:
  - Mads R. B. Kristensen (https://github.com/madsbk)

Approvers:
  - Lawrence Mitchell (https://github.com/wence-)
  - Peter Andreas Entschev (https://github.com/pentschev)
  - Bradley Dice (https://github.com/bdice)
  - Matthew Murray (https://github.com/Matt711)

URL: rapidsai#22491
This backports a pair of commits for the cudf-polars benchmarking CLI. We're currently running benchmarks against both release/26.06 and main.

Authors:
  - Tom Augspurger (https://github.com/TomAugspurger)
  - Lawrence Mitchell (https://github.com/wence-)

Approvers:
  - Bradley Dice (https://github.com/bdice)

URL: rapidsai#22572
Fixes a memcheck error introduced by rapidsai#22452 where an atomic operation on a bool variable is reported by compute-sanitizer as an out-of-bounds access. Changing the variable to an `int32_t` resolves the error.

Closes rapidsai#22570

Authors:
  - David Wendt (https://github.com/davidwendt)

Approvers:
  - Bradley Dice (https://github.com/bdice)
  - Yunsong Wang (https://github.com/PointKernel)

URL: rapidsai#22571
@madsbk madsbk self-assigned this May 19, 2026
@madsbk madsbk requested review from a team as code owners May 19, 2026 19:03
@madsbk madsbk requested a review from bdice May 19, 2026 19:03
@madsbk madsbk added improvement Improvement / enhancement to an existing function breaking Breaking change labels May 19, 2026
@madsbk madsbk requested a review from ttnghia May 19, 2026 19:03
@madsbk madsbk force-pushed the main-merge-release/26.06 branch from 65496f7 to f0ffbb9 Compare May 19, 2026 19:08
@GPUtester GPUtester moved this to In Progress in cuDF Python May 19, 2026
- Follow up to rapidsai#22491
- Moves the `collectives` module under `actor_graph` to break circular dependencies. The "collectives" are **mostly** used to build the actor graph anyway.

**Note**: Before merging this, I'd like to get confirmation that others see circular-import errors locally. E.g.

```
pytest -v python/cudf_polars/tests/streaming/test_groupby.py

...

E   ImportError: cannot import name 'ShuffleManager' from partially initialized module 'cudf_polars.streaming.collectives.shuffle' (most likely due to a circular import) (/raid/rzamora/rapids-26.06/cudf/python/cudf_polars/cudf_polars/streaming/collectives/shuffle.py)
```

Authors:
  - Richard (Rick) Zamora (https://github.com/rjzamora)

Approvers:
  - Matthew Murray (https://github.com/Matt711)
  - Mads R. B. Kristensen (https://github.com/madsbk)

URL: rapidsai#22578
Hand-tune `polars_impl` for 19 TPC-DS benchmark queries in `python/cudf_polars/cudf_polars/experimental/benchmarks/pdsds_queries/`. Each rewrite preserves query semantics and only changes how the polars LazyFrame is constructed; `duckdb_impl` is unchanged.

The optimizations apply a small set of recurring patterns that the polars optimizer does not (yet) perform automatically:

- **Predicate pushdown on dimension tables** — pre-filter `date_dim`, `item`, `store`, etc. by literal predicates (year, quarter, month window, category/class/brand) before any join, so the join builds smaller hash tables.
- **Semi-join fact-table pre-filtering** — use selective dimension keys (and in some cases `store_returns` (customer, item) pairs) as semi-join probes against the fact tables, shrinking them before the expensive joins.
- **Projection pushdown** — `select(...)` only the columns each table contributes before joining, instead of relying on the planner to prune them later.
- **Condition-join → equi-join** — replace cross-join + filter and CONDITIONALJOIN-style patterns with constant-key equi-joins where the predicate is equivalent.
- **Single-pass bucket aggregation** — collapse multiple independent global-sum group-bys over the same fact table into one pass that emits the values in a single aggregation, replacing N scans with 1.
- **Join reordering** — defer non-selective joins (e.g. customer) until after the selective filter chain so the row count entering the deferred join is much smaller.
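
The first three patterns can be illustrated without any dataframe library. The sketch below uses plain Python rather than polars to stay dependency-free, and the rows are made up for illustration; it is not taken from the rewritten queries. It shows that filtering the dimension table first, and then using its keys as a semi-join probe against the fact table, gives the same aggregate while building a smaller hash table.

```python
# Tiny stand-ins for two TPC-DS tables; the values are invented.
date_dim = [
    {"d_date_sk": 1, "d_year": 2000},
    {"d_date_sk": 2, "d_year": 2001},
    {"d_date_sk": 3, "d_year": 2001},
]
store_sales = [
    {"ss_sold_date_sk": 1, "ss_net_paid": 10.0},
    {"ss_sold_date_sk": 2, "ss_net_paid": 20.0},
    {"ss_sold_date_sk": 3, "ss_net_paid": 30.0},
    {"ss_sold_date_sk": 2, "ss_net_paid": 5.0},
]


def join_then_filter(year: int) -> tuple[float, int]:
    """Baseline: join against the full dimension table, then filter."""
    by_key = {d["d_date_sk"]: d for d in date_dim}  # hash table over every row
    joined = [
        {**s, **by_key[s["ss_sold_date_sk"]]}
        for s in store_sales
        if s["ss_sold_date_sk"] in by_key
    ]
    total = sum(r["ss_net_paid"] for r in joined if r["d_year"] == year)
    return total, len(by_key)


def filter_then_join(year: int) -> tuple[float, int]:
    """Pre-filter the dimension, keep only its key, then semi-join the fact table."""
    # Predicate and projection pushdown: only matching keys survive.
    keys = {d["d_date_sk"] for d in date_dim if d["d_year"] == year}
    # Semi-join: shrink the fact table before any further work.
    kept = [s for s in store_sales if s["ss_sold_date_sk"] in keys]
    return sum(s["ss_net_paid"] for s in kept), len(keys)
```

Both functions return the same total for a given year; the second element of each result is the number of entries in the hash table that was built, which is smaller in the pre-filtered version.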

## Test plan

- [ ] Run TPC-DS validation against DuckDB on the 19 modified queries
- [ ] Run benchmark sweep and confirm no regressions vs. main on unmodified queries
- [ ] Confirm result equality (sorted output) matches DuckDB reference

Authors:
  - Matthew Murray (https://github.com/Matt711)

Approvers:
  - Tom Augspurger (https://github.com/TomAugspurger)

URL: rapidsai#22395
@KyleFromNVIDIA KyleFromNVIDIA changed the title Main merge release/26.06 Merge release/26.06 into main May 19, 2026
@KyleFromNVIDIA KyleFromNVIDIA changed the title Merge release/26.06 into main Forward-merge release/26.06 into main May 19, 2026
Contributor

@davidwendt davidwendt left a comment


Approving C++

@madsbk madsbk force-pushed the main-merge-release/26.06 branch from f0ffbb9 to 752499e Compare May 20, 2026 06:09
@madsbk madsbk requested review from a team as code owners May 20, 2026 06:09
Member

@KyleFromNVIDIA KyleFromNVIDIA left a comment


Additionally approving the CMake changes that I made in #22582.

@davidwendt
Contributor

/merge nosquash

@rockhowse rockhowse merged commit 648d028 into rapidsai:main May 20, 2026
217 of 221 checks passed
@github-project-automation github-project-automation Bot moved this from In Progress to Done in cuDF Python May 20, 2026
@madsbk madsbk deleted the main-merge-release/26.06 branch May 21, 2026 06:10

Labels

breaking Breaking change CMake CMake build issue cudf-polars Issues specific to cudf-polars improvement Improvement / enhancement to an existing function Java Affects Java cuDF API. libcudf Affects libcudf (C++/CUDA) code. Python Affects Python cuDF API.

Projects

Status: Done
