Remove cudf-polars[rapidsmpf] pip extra & numpy as a [test] dependency; add [dask] pip extra #22480

Merged
rapids-bot[bot] merged 10 commits into rapidsai:main from mroeschke:ref/cudf_polars/rapidsmpf_optional_dep
May 13, 2026

Conversation

Contributor

@mroeschke mroeschke commented May 12, 2026

Description

  • Follow-up to Make RapidsMPF the default runtime for cudf_polars streaming executor #22281, which made rapidsmpf a required dependency of cudf-polars: the cudf-polars[rapidsmpf] pip extra is no longer needed
  • Removes numpy as a testing dependency as it was only used for np.random and np.full
  • Removes "nvidia-ml-py>=12" from the experimental extra as it's already a required dependency of cudf_polars
  • Adds cudf-polars[dask] as an alias for cudf-polars[experimental]
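
As a rough illustration of the second bullet, the two NumPy calls mentioned (np.full and np.random) have simple pure-Python stand-ins. This is a minimal sketch using only the standard library; the helper names `full` and `random_floats` are hypothetical and do not come from the PR, which (per the review walkthrough) wraps such values in polars.Series instead.

```python
import random


def full(n, value):
    """Hypothetical stand-in for np.full(n, value): n copies of value."""
    return [value] * n


def random_floats(n, rng):
    """Hypothetical stand-in for np.random.random(n): n floats in [0, 1)."""
    return [rng.random() for _ in range(n)]


# A seeded generator keeps the test data reproducible between runs.
rng = random.Random(42)
filled = full(4, 7)
samples = random_floats(4, rng)
```

Lists like `filled` and `samples` can then be handed to whatever column constructor the test needs, so NumPy is not required just to build inputs.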

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@mroeschke mroeschke self-assigned this May 12, 2026
@mroeschke mroeschke requested a review from a team as a code owner May 12, 2026 18:24
@mroeschke mroeschke requested a review from jameslamb May 12, 2026 18:24
@mroeschke mroeschke added the improvement (Improvement / enhancement to an existing function) and breaking (Breaking change) labels May 12, 2026
@github-actions github-actions Bot added the Python (Affects Python cuDF API.) and cudf-polars (Issues specific to cudf-polars) labels May 12, 2026
@GPUtester GPUtester moved this to In Progress in cuDF Python May 12, 2026

coderabbitai Bot commented May 12, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Walkthrough

Removed the rapidsmpf optional dependency group and the numpy>=1.26,<3.0 test pin from python/cudf_polars/pyproject.toml; updated dependencies.yaml to add/redirect cudf-polars dask wiring and removed numpy_run; changed CI to install the wheel with [test,dask] extras; tests migrated from NumPy to Polars/Arrow and use a seeded RNG for deterministic floats.

Changes

Optional dependency & CI cleanup

| Layer / File(s) | Summary |
| --- | --- |
| pyproject optional-dependencies cleanup (python/cudf_polars/pyproject.toml) | Removed numpy>=1.26,<3.0 from the test optional-dependencies and deleted the rapidsmpf optional-dependency group, so that the ray group now follows trace. |
| dependencies and CI wiring (dependencies.yaml, ci/test_wheel_cudf_polars.sh) | Added py_run_cudf_polars_dask, redirected experimental files to include run_cudf_polars_dask, removed numpy_run from files.py_test_dask_cudf.includes, replaced run_cudf_polars_experimental with run_cudf_polars_dask (omitting *nvidia_ml_py), and changed wheel install extras to [test,dask]. |
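
Since the CI script now installs the wheel with the [test,dask] extras, one way to check which extras an installed distribution actually declares is to read its metadata. The sketch below uses only the standard library's importlib.metadata. `declared_extras` is a hypothetical helper, and the distribution name "cudf-polars" is taken from the PR title; published wheel names may carry a CUDA suffix, which is an assumption to verify locally.

```python
from importlib import metadata


def declared_extras(dist_name):
    """Return the extras an installed distribution declares.

    Returns an empty list if the distribution is not installed.
    """
    try:
        meta = metadata.metadata(dist_name)
    except metadata.PackageNotFoundError:
        return []
    # "Provides-Extra" is a repeatable core-metadata field; get_all
    # returns None when the field is absent.
    return sorted(meta.get_all("Provides-Extra") or [])


# Distribution name assumed from the PR title; adjust for your install.
extras = declared_extras("cudf-polars")
print(extras)
```

On an environment built from this PR, one would expect the list to include dask and no longer include rapidsmpf, but that depends on the installed version.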

Tests: NumPy → Polars/Arrow

| Layer / File(s) | Summary |
| --- | --- |
| Allgather test input migration (python/cudf_polars/tests/experimental/test_allgather.py) | Replaced NumPy-based input generation with polars.Series, converted to pylibcudf columns via Column.from_arrow when building tables. |
| Spilling test input migration (python/cudf_polars/tests/experimental/test_spilling.py) | Created deterministic Float32 data using random.Random(42) in a polars.Series, then constructed pylibcudf.Table columns via from_arrow; adjusted imports accordingly. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Suggested reviewers

  • pentschev
  • jameslamb
  • msarahan
🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title accurately describes the main changes: removing the rapidsmpf pip extra, removing numpy as a test dependency, and adding a dask pip extra. |
| Description check | ✅ Passed | The description clearly relates to the changeset, explaining the rationale for removing rapidsmpf and numpy, and adding the dask extra as an alias. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00%, which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

Comment @coderabbitai help to get the list of available commands and usage tips.

@mroeschke mroeschke requested a review from a team as a code owner May 12, 2026 18:43
@mroeschke mroeschke requested a review from pentschev May 12, 2026 18:43
@mroeschke mroeschke changed the title from "Remove cudf-polars[rapidsmpf] pip extra" to "Remove cudf-polars[rapidsmpf] pip extra, numpy as a test dependency" May 12, 2026

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@python/cudf_polars/tests/experimental/test_allgather.py`:
- Around line 28-34: The test currently only asserts shape/type after converting
pl.Series -> plc.Column.from_arrow -> plc.Table and calling allgather; add a
value-level assertion that the gathered table's column values equal the expected
Polars result (e.g., the concatenated pl.Series values) by extracting the column
from the result of plc.allgather and comparing its values to the CPU-side
pl.concat(...) result; locate the conversion usage of plc.Column.from_arrow and
the allgather call in test_allgather.py and assert element-wise equality (or use
to_list()/to_numpy() on both sides) to ensure GPU results match Polars CPU
results.

In `@python/cudf_polars/tests/experimental/test_spilling.py`:
- Around line 36-38: The current table builder reseeds the RNG on every call
(rng = random.Random(42)) producing identical pl.Series payloads; remove the
per-call reseed and instead use a persistent RNG (e.g., a module-level
random.Random instance or pass an rng parameter) so pl_data =
pl.Series([rng.random() ...], dtype=pl.Float32()) yields varied values across
calls; update both occurrences (the one creating pl_data and the similar
occurrence around line 90) and ensure plc.Column.from_arrow/plc.Table creation
uses the new non-reseeded rng.
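
The reseeding concern raised for test_spilling.py can be illustrated with the standard library alone. This is a sketch under assumptions: the builder names are hypothetical, plain lists stand in for the polars.Series payloads, and it does not show which option the PR finally adopted.

```python
import random


def build_payload_reseeded(n):
    """Reseeds on every call, so every call returns the same values."""
    rng = random.Random(42)
    return [rng.random() for _ in range(n)]


# One generator shared across calls: still deterministic for the test
# run as a whole, but each call draws fresh values.
_SHARED_RNG = random.Random(42)


def build_payload_shared(n, rng=_SHARED_RNG):
    """Draws from a persistent generator, so calls return different values."""
    return [rng.random() for _ in range(n)]


a = build_payload_reseeded(3)
b = build_payload_reseeded(3)
c = build_payload_shared(3)
d = build_payload_shared(3)

print(a == b)  # identical payloads from the reseeded builder
print(c == d)  # distinct payloads from the shared generator
```

Whether identical payloads matter depends on what the test measures: if only the size of the data drives spilling, reseeding may be harmless, while varied values guard against accidental deduplication or caching effects.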

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 12d4f352-6d66-4d38-ba75-43fa9a4d7b64

📥 Commits

Reviewing files that changed from the base of the PR and between 71a53ef and d35ff78.

📒 Files selected for processing (4)
  • dependencies.yaml
  • python/cudf_polars/pyproject.toml
  • python/cudf_polars/tests/experimental/test_allgather.py
  • python/cudf_polars/tests/experimental/test_spilling.py
💤 Files with no reviewable changes (2)
  • dependencies.yaml
  • python/cudf_polars/pyproject.toml

Comment thread python/cudf_polars/tests/experimental/test_allgather.py
Comment thread python/cudf_polars/tests/experimental/test_spilling.py
@mroeschke mroeschke changed the title from "Remove cudf-polars[rapidsmpf] pip extra, numpy as a test dependency" to "Remove cudf-polars[rapidsmpf] pip extra & numpy as a [test] dependency; add [dask] pip extra" May 12, 2026
Comment thread dependencies.yaml Outdated
- depends_on_rapidsmpf
- depends_on_pylibcudf
- depends_on_cuda_python
# TODO: Eventually remove this alias in favor of dask
Contributor

I recommend picking a timeline and including that in the comment. Remove in 26.08? 26.10?

@mroeschke mroeschke requested a review from a team as a code owner May 13, 2026 17:30
@gforsyth gforsyth removed the request for review from jameslamb May 13, 2026 17:37
@mroeschke
Contributor Author

/merge

@rapids-bot rapids-bot Bot merged commit 7d5637a into rapidsai:main May 13, 2026
117 checks passed
@github-project-automation github-project-automation Bot moved this from In Progress to Done in cuDF Python May 13, 2026
@mroeschke mroeschke deleted the ref/cudf_polars/rapidsmpf_optional_dep branch May 13, 2026 23:33
Labels

  • breaking: Breaking change
  • cudf-polars: Issues specific to cudf-polars
  • improvement: Improvement / enhancement to an existing function
  • Python: Affects Python cuDF API.

Projects

Status: Done

6 participants