
[cudf_polars] Reorganize package layout #22491

Merged: rapids-bot[bot] merged 5 commits into rapidsai:release/26.06 from madsbk:reorganize-package-layout on May 19, 2026.



@madsbk (Member) commented May 13, 2026

Description

This PR consists purely of moving and renaming.

New layout

cudf_polars/
  callback.py
  containers/
  dsl/
  engine/         ← user-facing GPU engine classes (Streaming/Ray/Dask/SPMD/DefaultSingleton)
  streaming/      ← multi-partition execution layer (formerly "experimental")
    actor_graph/  ← RapidsMPF-backed runtime
    collectives/  ← RapidsMPF collective communication primitives
    benchmarks/
      utils.py    ← consolidated benchmark utilities (formerly split between utils.py shim and utils_new_frontends.py)
      pdsds.py
      ...
    base.py
    dispatch.py
    parallel.py
    groupby.py
    io.py
    join.py
    ...
  testing/
  typing/
  utils/

Engine entry points move from deeply nested experimental paths to top-level imports:

cudf_polars.experimental.rapidsmpf.frontend.options → cudf_polars.engine.options
cudf_polars.experimental.rapidsmpf.frontend.spmd    → cudf_polars.engine.spmd
cudf_polars.experimental.rapidsmpf.frontend.ray     → cudf_polars.engine.ray
cudf_polars.experimental.rapidsmpf.frontend.dask    → cudf_polars.engine.dask
cudf_polars.experimental.rapidsmpf.frontend.core    → cudf_polars.engine.core
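Because the PR carries the breaking label, downstream code that imports the old experimental frontend paths needs updating. A minimal sketch of a mechanical rewrite, assuming textual substitution of these five module paths is all that is required. The `migrate_imports` helper is hypothetical and not part of cudf_polars, and it does not check whether names inside those modules were renamed as well.

```python
import re

# Old -> new engine module paths, taken from the mapping in this PR description.
ENGINE_MODULE_RENAMES = {
    "cudf_polars.experimental.rapidsmpf.frontend.options": "cudf_polars.engine.options",
    "cudf_polars.experimental.rapidsmpf.frontend.spmd": "cudf_polars.engine.spmd",
    "cudf_polars.experimental.rapidsmpf.frontend.ray": "cudf_polars.engine.ray",
    "cudf_polars.experimental.rapidsmpf.frontend.dask": "cudf_polars.engine.dask",
    "cudf_polars.experimental.rapidsmpf.frontend.core": "cudf_polars.engine.core",
}

# One alternation over all old paths; \b stops a match inside a longer identifier.
_OLD_PATH = re.compile(
    r"\b(" + "|".join(re.escape(old) for old in ENGINE_MODULE_RENAMES) + r")\b"
)


def migrate_imports(source: str) -> str:
    """Hypothetical helper: rewrite old engine module paths in `source`."""
    return _OLD_PATH.sub(lambda m: ENGINE_MODULE_RENAMES[m.group(1)], source)


print(migrate_imports("import cudf_polars.experimental.rapidsmpf.frontend.spmd as spmd"))
# prints: import cudf_polars.engine.spmd as spmd
```

A single precompiled alternation keeps the rewrite to one pass per file; it will not catch module paths assembled dynamically from strings.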

The benchmarks now live under streaming:

python -m cudf_polars.streaming.benchmarks.pdsh

@madsbk madsbk self-assigned this May 13, 2026
@madsbk added the improvement and breaking labels May 13, 2026
@copy-pr-bot commented May 13, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@github-actions added the Python and cudf-polars labels May 13, 2026
@coderabbitai commented May 13, 2026

Note

Reviews paused


@rapidsai rapidsai deleted a comment from coderabbitai Bot May 13, 2026
@GPUtester GPUtester moved this to In Progress in cuDF Python May 13, 2026
@madsbk madsbk marked this pull request as ready for review May 13, 2026 19:59
@madsbk madsbk requested a review from a team as a code owner May 13, 2026 19:59
Comment on lines -132 to -147
def lower_dataframescan_rapidsmpf(
    ir: DataFrameScan, rec: LowerIRTransformer
) -> tuple[IR, MutableMapping[IR, PartitionInfo]]:
    """Lower a DataFrameScan node for the RapidsMPF streaming runtime."""
    config_options = rec.state["config_options"]

    # NOTE: We calculate the expected partition count
    # to help trigger fallback warnings in lower_ir_graph.
    # The generate_ir_sub_network logic is NOT required
    # to obey this partition count. However, the count
    # WILL match after an IO operation (for now).
    rows_per_partition = config_options.executor.max_rows_per_partition
    nrows = max(ir.df.shape()[0], 1)
    count = math.ceil(nrows / rows_per_partition)

    return ir, {ir: PartitionInfo(count=count)}
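The partition-count arithmetic in the removed helper is small enough to check by hand. A standalone restatement, assuming a plain integer row count in place of `ir.df.shape()[0]` and a plain integer in place of `config_options.executor.max_rows_per_partition`; per the NOTE comment, this count is only an expectation used to trigger fallback warnings. The function name is hypothetical.

```python
import math


def expected_dataframescan_partitions(nrows: int, max_rows_per_partition: int) -> int:
    """Sketch: restate the count computed by lower_dataframescan_rapidsmpf."""
    # An empty frame is treated as one row, so it still maps to one partition.
    nrows = max(nrows, 1)
    # Round up: a final, partially filled partition still counts.
    return math.ceil(nrows / max_rows_per_partition)


for nrows in (0, 1_000, 1_001, 2_500_000):
    print(nrows, expected_dataframescan_partitions(nrows, 1_000))
# prints one line per input: "0 1", "1000 1", "1001 2", "2500000 2500"
```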
Member Author


Removed lower_dataframescan_rapidsmpf from streaming/actor_graph/io.py (formerly experimental/rapidsmpf/io.py).

Its body is inlined verbatim into the existing @lower_ir_node.register(DataFrameScan) implementation in streaming/io.py (formerly experimental/io.py).
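The `@lower_ir_node.register(DataFrameScan)` decorator suggests type-based dispatch in the style of Python's `functools.singledispatch`. A toy sketch of that pattern, assuming singledispatch semantics. The `IR`, `DataFrameScan`, and `Scan` dataclasses and the `lower_ir_node` function here are hypothetical stand-ins that reuse the PR's names, not the cudf_polars classes. The sketch illustrates where the inlined body now lives: directly in the implementation registered for the node type, rather than in a separate runtime-specific helper.

```python
from dataclasses import dataclass
from functools import singledispatch


@dataclass(frozen=True)
class IR:
    """Toy base class standing in for a cudf_polars IR node."""


@dataclass(frozen=True)
class DataFrameScan(IR):
    nrows: int


@dataclass(frozen=True)
class Scan(IR):
    paths: tuple[str, ...]


@singledispatch
def lower_ir_node(ir: IR, rows_per_partition: int) -> dict[IR, int]:
    """Fallback used for node types without a registered implementation."""
    return {ir: 1}


@lower_ir_node.register(DataFrameScan)
def _(ir: DataFrameScan, rows_per_partition: int) -> dict[IR, int]:
    # The lowering logic sits directly in the registered implementation
    # instead of being delegated to a separate runtime-specific function.
    count = -(-max(ir.nrows, 1) // rows_per_partition)  # ceiling division
    return {ir: count}


print(lower_ir_node(DataFrameScan(nrows=2_500), 1_000))  # dispatches on DataFrameScan
print(lower_ir_node(Scan(paths=("a.parquet",)), 1_000))  # no Scan impl: uses the fallback
```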

Comment on lines -320 to -348
def lower_scan_rapidsmpf(
    ir: Scan, rec: LowerIRTransformer
) -> tuple[IR, MutableMapping[IR, PartitionInfo]]:
    """Lower a Scan node for the RapidsMPF streaming runtime."""
    config_options = rec.state["config_options"]
    if (
        ir.typ in ("csv", "parquet", "ndjson")
        and ir.n_rows == -1
        and ir.skip_rows == 0
        and ir.row_index is None
    ):
        # NOTE: We calculate the expected partition count
        # to help trigger fallback warnings in lower_ir_graph.
        # The generate_ir_sub_network logic is NOT required
        # to obey this partition count. However, the count
        # WILL match after an IO operation (for now).
        plan = scan_partition_plan(ir, rec.state["stats"], config_options)
        paths = list(ir.paths)
        if plan.flavor == IOPartitionFlavor.SPLIT_FILES:
            count = plan.factor * len(paths)
        else:
            count = math.ceil(len(paths) / plan.factor)

        return ir, {ir: PartitionInfo(count=count, io_plan=plan)}
    else:
        plan = IOPartitionPlan(
            flavor=IOPartitionFlavor.SINGLE_READ, factor=len(ir.paths)
        )
        return ir, {ir: PartitionInfo(count=1, io_plan=plan)}
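The Scan lowering adds one more decision: only unrestricted csv, parquet, and ndjson scans get a multi-partition estimate, and the arithmetic depends on whether the plan splits files or groups them. A standalone sketch of just that counting logic, assuming the plan's flavor and factor are already known (in the original they come from `scan_partition_plan`). The function name is hypothetical, the local `Flavor` enum is a hypothetical stand-in for `IOPartitionFlavor`, and `FUSED_FILES` is only an illustrative name for a non-split flavor.

```python
import math
from enum import Enum, auto


class Flavor(Enum):
    """Hypothetical stand-in for IOPartitionFlavor."""

    SPLIT_FILES = auto()  # each file is split into `factor` pieces
    FUSED_FILES = auto()  # illustrative: `factor` files are read together
    SINGLE_READ = auto()  # everything in one read


def expected_scan_partitions(
    typ: str,
    n_rows: int,
    skip_rows: int,
    row_index: object,
    n_paths: int,
    flavor: Flavor,
    factor: int,
) -> int:
    """Sketch: restate the partition count computed by lower_scan_rapidsmpf."""
    eligible = (
        typ in ("csv", "parquet", "ndjson")
        and n_rows == -1  # no row limit requested
        and skip_rows == 0
        and row_index is None
    )
    if not eligible:
        # The original falls back to a SINGLE_READ plan with one partition.
        return 1
    if flavor is Flavor.SPLIT_FILES:
        return factor * n_paths
    # Any other flavor groups files together, rounding up.
    return math.ceil(n_paths / factor)


print(expected_scan_partitions("parquet", -1, 0, None, 4, Flavor.SPLIT_FILES, 3))   # 12
print(expected_scan_partitions("parquet", -1, 0, None, 10, Flavor.FUSED_FILES, 4))  # 3
print(expected_scan_partitions("parquet", 100, 0, None, 10, Flavor.SPLIT_FILES, 3)) # 1
```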
Member Author


Removed lower_scan_rapidsmpf from streaming/actor_graph/io.py (formerly experimental/rapidsmpf/io.py).

Its body is inlined verbatim into the existing @lower_ir_node.register(Scan) implementation in streaming/io.py (formerly experimental/io.py).

@madsbk madsbk force-pushed the reorganize-package-layout branch 2 times, most recently from 9f3df89 to 019cc0d May 19, 2026 06:10
@madsbk madsbk requested a review from a team as a code owner May 19, 2026 06:10
@madsbk madsbk requested a review from jameslamb May 19, 2026 06:10
@madsbk madsbk changed the base branch from main to release/26.06 May 19, 2026 06:10
@madsbk madsbk force-pushed the reorganize-package-layout branch from 019cc0d to 774de38 May 19, 2026 06:16
coderabbitai[bot]

This comment was marked as outdated.

@pentschev (Member) left a comment

Thanks Mads!

@madsbk (Member Author) commented May 19, 2026

/merge

@rapids-bot rapids-bot Bot merged commit d48b4af into rapidsai:release/26.06 May 19, 2026
87 checks passed
@github-project-automation github-project-automation Bot moved this from In Progress to Done in cuDF Python May 19, 2026
madsbk added a commit to madsbk/cudf that referenced this pull request May 19, 2026
Authors:
  - Mads R. B. Kristensen (https://github.com/madsbk)

Approvers:
  - Lawrence Mitchell (https://github.com/wence-)
  - Peter Andreas Entschev (https://github.com/pentschev)
  - Bradley Dice (https://github.com/bdice)
  - Matthew Murray (https://github.com/Matt711)

URL: rapidsai#22491
@rjzamora rjzamora mentioned this pull request May 19, 2026
@madsbk madsbk deleted the reorganize-package-layout branch May 19, 2026 16:43
rapids-bot Bot pushed a commit that referenced this pull request May 19, 2026
- Follow up to #22491
- Moves the `collectives` module under `actor_graph` to break circular dependencies. The "collectives" are **mostly** used to build the actor graph anyway.

**Note**: Before merging this, I'd like to get confirmation that others see circular-import errors locally. E.g.

```
pytest -v python/cudf_polars/tests/streaming/test_groupby.py

...

E   ImportError: cannot import name 'ShuffleManager' from partially initialized module 'cudf_polars.streaming.collectives.shuffle' (most likely due to a circular import) (/raid/rzamora/rapids-26.06/cudf/python/cudf_polars/cudf_polars/streaming/collectives/shuffle.py)
```
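The error quoted above is Python's generic report for an import cycle: a module is registered in `sys.modules` before its body finishes executing, so another module that imports a name from it mid-initialization finds the name missing. A self-contained sketch that reproduces the same message with a throwaway package; `toy_pkg` and its modules are hypothetical and only loosely mirror `collectives.shuffle` and `actor_graph`. The fix shown, deferring the back-import into a function body, is a generic technique for breaking such cycles and is not how the follow-up PR restructures the code (it moves `collectives` under `actor_graph` instead).

```python
import importlib
import sys
import tempfile
import textwrap
from pathlib import Path

PKG = "toy_pkg"  # hypothetical package name used only for this demo


def import_layout(files: dict[str, str], module: str) -> str:
    """Write `files` into a temporary package, import `module`, report the outcome."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        for rel, src in files.items():
            path = root / rel
            path.parent.mkdir(parents=True, exist_ok=True)
            path.write_text(textwrap.dedent(src))
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            importlib.import_module(module)
            return "ok"
        except ImportError as exc:
            return f"ImportError: {exc}"
        finally:
            sys.path.remove(tmp)
            # Forget the toy modules so the next layout is imported from scratch.
            for name in [m for m in sys.modules if m.split(".")[0] == PKG]:
                del sys.modules[name]


COMMON = {
    f"{PKG}/__init__.py": "",
    f"{PKG}/collectives/__init__.py": "",
    f"{PKG}/actor_graph/__init__.py": f"""
        from {PKG}.collectives.shuffle import ShuffleManager

        def build_graph():
            return ShuffleManager()
    """,
}

# shuffle.py imports actor_graph at module level, and actor_graph imports
# ShuffleManager back from shuffle.py before that class has been defined.
CYCLIC = {
    **COMMON,
    f"{PKG}/collectives/shuffle.py": f"""
        from {PKG}.actor_graph import build_graph

        class ShuffleManager:
            pass
    """,
}

# Same modules, but the back-import now runs only when the method is called.
DEFERRED = {
    **COMMON,
    f"{PKG}/collectives/shuffle.py": f"""
        class ShuffleManager:
            def graph_builder(self):
                from {PKG}.actor_graph import build_graph
                return build_graph
    """,
}

print(import_layout(CYCLIC, f"{PKG}.collectives.shuffle"))
print(import_layout(DEFERRED, f"{PKG}.collectives.shuffle"))
```

On Python 3.8 or later, the first call reports `cannot import name 'ShuffleManager' from partially initialized module 'toy_pkg.collectives.shuffle' (most likely due to a circular import)`, matching the shape of the error above, and the second call prints `ok`.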

Authors:
  - Richard (Rick) Zamora (https://github.com/rjzamora)

Approvers:
  - Matthew Murray (https://github.com/Matt711)
  - Mads R. B. Kristensen (https://github.com/madsbk)

URL: #22578

Labels

breaking (Breaking change), cudf-polars (Issues specific to cudf-polars), improvement (Improvement / enhancement to an existing function), Python (Affects Python cuDF API)

Projects

Status: Done


6 participants