Overhaul cudf-polars docs for new streaming multi-GPU engines#22252

Merged: rapids-bot[bot] merged 8 commits into rapidsai:release/26.06 from madsbk:docs-overhaul on May 20, 2026

@madsbk madsbk commented Apr 22, 2026

Restructure the docs around the new streaming multi-GPU engines and unified configuration model, replacing the legacy execution narrative. Add a set of user-facing guides covering usage, engines, configuration, profiling, and legacy workflows.

@madsbk madsbk self-assigned this Apr 22, 2026
@madsbk madsbk added the doc label Apr 22, 2026
@madsbk madsbk added the non-breaking label Apr 22, 2026
@madsbk madsbk force-pushed the docs-overhaul branch 3 times, most recently from 3670e3e to 5d31663 Compare April 23, 2026 12:48
@madsbk madsbk marked this pull request as ready for review April 23, 2026 14:11
@rapidsai rapidsai deleted a comment from copy-pr-bot Bot Apr 23, 2026
@TomAugspurger TomAugspurger left a comment

Looks great! Partial review.

Comment thread docs/cudf/source/cudf_polars/usage.md Outdated
Comment thread docs/cudf/source/cudf_polars/usage.md Outdated
Comment thread docs/cudf/source/cudf_polars/usage.md Outdated
Comment thread docs/cudf/source/cudf_polars/usage.md Outdated
Comment thread docs/cudf/source/cudf_polars/usage.md Outdated
Comment thread docs/cudf/source/cudf_polars/options.md Outdated
Comment thread docs/cudf/source/cudf_polars/options.md Outdated
Comment thread docs/cudf/source/cudf_polars/profiling.md Outdated
Comment thread docs/cudf/source/cudf_polars/profiling.md Outdated
Comment thread docs/cudf/source/cudf_polars/profiling.md Outdated
@rapidsai rapidsai deleted a comment from copy-pr-bot Bot Apr 23, 2026
@madsbk madsbk requested a review from TomAugspurger April 23, 2026 18:50
Comment thread docs/cudf/source/cudf_polars/api.md Outdated
Comment thread docs/cudf/source/cudf_polars/legacy.md Outdated
Comment thread docs/cudf/source/conf.py Outdated
Comment thread docs/cudf/source/cudf_polars/dask_engine.md Outdated
Comment thread docs/cudf/source/cudf_polars/dask_engine.md Outdated
Comment thread docs/cudf/source/cudf_polars/dask_engine.md Outdated
Comment thread docs/cudf/source/cudf_polars/dask_engine.md Outdated
Comment thread docs/cudf/source/cudf_polars/dask_engine.md Outdated
Comment thread docs/cudf/source/cudf_polars/spmd_engine.md Outdated
Comment thread docs/cudf/source/cudf_polars/spmd_engine.md Outdated
Comment thread docs/cudf/source/cudf_polars/spmd_engine.md Outdated
Comment thread docs/cudf/source/cudf_polars/spmd_engine.md
Comment thread docs/cudf/source/cudf_polars/spmd_engine.md Outdated
Comment thread docs/cudf/source/cudf_polars/legacy.md Outdated
Comment thread docs/cudf/source/cudf_polars/engines.md Outdated
coderabbitai Bot commented May 9, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Walkthrough

Reorganizes and expands cudf-polars documentation: adds API reference, engine-specific guides (Ray, Dask, SPMD, default singleton, in-memory), StreamingOptions reference, profiling/tracing docs, Sphinx nitpick ignores, dependencies update, and minor docstring cross-reference fixes. No runtime code changes.

Changes

cuDF Polars Documentation Restructuring

| Layer | File(s) | Summary |
|---|---|---|
| Sphinx Build Configuration | `docs/cudf/source/conf.py`, `dependencies.yaml` | Extended `nitpick_ignore` and `nitpick_ignore_regex` to suppress missing-reference warnings for `rapidsmpf`, `ray`, `distributed`, and `dask_cuda`; added `depends_on_ray` to docs dependencies. |
| API Reference Restructuring | `docs/cudf/source/cudf_polars/api.md` | Rewrote into an API Reference that auto-documents streaming engines (`RayEngine`, `DaskEngine`, `SPMDEngine`, `DefaultSingletonEngine`), shared types (`StreamingEngine`, `ClusterInfo`), configuration entities (`StreamingOptions`, `UNSPECIFIED`, `HardwareBindingPolicy`, `bind_to_gpu`), SPMD helpers, and updated internal config members. |
| Landing Page and Index | `docs/cudf/source/cudf_polars/index.md`, `docs/cudf/source/cudf_polars/index.rst` | Added markdown landing page with installation, quick-start, benchmarks, and toctree; removed the older reStructuredText `index.rst`. |
| Engines Abstraction and Selection | `docs/cudf/source/cudf_polars/engines.md` | New page explaining `engine=` selection (streaming vs in-memory), comparing Ray/Dask/SPMD/default singleton, and documenting result collection semantics and recommended patterns. |
| Configuration Options Reference | `docs/cudf/source/cudf_polars/options.md` | New `StreamingOptions` reference: categories (executor, engine, rapidsmpf), `from_dict` behavior, env-var precedence, passthrough options, and option tables. |
| RayEngine Usage Guide | `docs/cudf/source/cudf_polars/usage.md` | Rewritten Ray-centric usage guide: first GPU query, `RayEngine.from_options()`, attach to Ray cluster, manual lifetime control, sink semantics, and `gather_cluster_info` diagnostics. |
| DaskEngine Distributed Guide | `docs/cudf/source/cudf_polars/dask_engine.md` | New DaskEngine guide: one-worker-per-GPU model, startup/teardown, `StreamingOptions` usage, attach semantics, GPU-binding and `dask-cuda-worker` coordination guidance, and diagnostics. |
| SPMDEngine SPMD Execution Guide | `docs/cudf/source/cudf_polars/spmd_engine.md` | New SPMD guide: one-script-per-GPU semantics, `rrun` multi-GPU behavior, UCXX communicator bootstrap/reuse, query symmetry constraints, allgather helpers, and diagnostics. |
| DefaultSingletonEngine and In-Memory Engine | `docs/cudf/source/cudf_polars/default_singleton_engine.md`, `docs/cudf/source/cudf_polars/in_memory_engine.md` | Added `default_singleton_engine.md` (lazy process-wide singleton semantics, `get_or_create()`/`shutdown()`, mutual exclusion) and `in_memory_engine.md` (non-streaming single-GPU usage and examples). |
| Engine Selection Discovery Page | `docs/cudf/source/cudf_polars/other_engines.md` | New page listing alternative execution backends and updating the toctree and external link targets. |
| Profiling, Tracing, and Statistics | `docs/cudf/source/cudf_polars/profiling.md` | New profiling/tracing guide: RapidsMPF statistics, GPU profiling differences, `CUDF_POLARS_LOG_TRACES` scopes, selective metric disabling, and structlog examples. |
| Python Docstring Updates | `python/cudf_polars/cudf_polars/engine/dask.py`, `.../default_singleton_engine.py`, `.../ray.py`, `.../spmd.py` | Docstring cross-reference fixes to fully-qualified Sphinx roles for `StreamingOptions`, `ClusterInfo`, `reserve_op_id`, and small formatting tweaks. No runtime changes. |
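
The Sphinx Build Configuration entry above refers to the `nitpick_ignore` and `nitpick_ignore_regex` settings. As a rough illustration of how such entries behave, here is a minimal sketch of a `conf.py`-style fragment. The specific entries are assumptions chosen for illustration (including the hypothetical `some_pkg.SomeType`), not the entries added in this PR, and `is_ignored` is a hypothetical helper that only mimics Sphinx's whole-string matching.

```python
import re

# Exact (role type, target) pairs to silence. Illustrative entry only;
# `some_pkg.SomeType` is a hypothetical name, not taken from the PR.
nitpick_ignore = [
    ("py:class", "some_pkg.SomeType"),
]

# (type regex, target regex) pairs. Sphinx requires each regex to match
# the whole string, so `rapidsmpf\..*` covers everything under that package.
nitpick_ignore_regex = [
    (r"py:.*", r"rapidsmpf\..*"),
    (r"py:.*", r"ray\..*"),
    (r"py:.*", r"distributed\..*"),
    (r"py:.*", r"dask_cuda\..*"),
]


def is_ignored(ref_type: str, target: str) -> bool:
    """Hypothetical helper: approximate Sphinx's suppression decision."""
    if (ref_type, target) in nitpick_ignore:
        return True
    return any(
        re.fullmatch(type_re, ref_type) and re.fullmatch(target_re, target)
        for type_re, target_re in nitpick_ignore_regex
    )
```

The regex form is convenient when a whole third-party package lacks an intersphinx inventory, while the exact-pair form keeps the suppression narrow when only one or two targets are missing.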

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested labels

improvement, 3 - Ready for Review

Suggested reviewers

  • bdice
  • Matt711
  • mroeschke

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title accurately summarizes the main change: documentation restructuring focused on new streaming multi-GPU engines, which aligns with the substantial doc rewrites across 15+ files. |
| Description check | ✅ Passed | The description clearly relates to the changeset, explaining the documentation restructuring around streaming multi-GPU engines, configuration models, and user guides that match the actual changes. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 90.00%, which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

Comment @coderabbitai help to get the list of available commands and usage tips.

coderabbitai[bot]

This comment was marked as off-topic.

@madsbk madsbk requested a review from a team as a code owner May 9, 2026 14:42
@madsbk madsbk requested a review from Matt711 May 9, 2026 14:42
@github-actions github-actions Bot added the Python and cudf-polars labels May 9, 2026
@GPUtester GPUtester moved this to In Progress in cuDF Python May 9, 2026
coderabbitai[bot]

This comment was marked as off-topic.

@madsbk madsbk requested a review from a team as a code owner May 19, 2026 13:24
@madsbk madsbk requested a review from davidwendt May 19, 2026 13:24
@github-actions github-actions Bot added the libcudf label May 19, 2026
@madsbk madsbk removed the request for review from davidwendt May 19, 2026 13:30
@madsbk madsbk marked this pull request as draft May 19, 2026 13:30
copy-pr-bot Bot commented May 19, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@madsbk madsbk force-pushed the docs-overhaul branch 2 times, most recently from a64f7f5 to caf77d6 Compare May 19, 2026 15:09
Comment thread docs/cudf/source/cudf_polars/dask_engine.md
@rjzamora rjzamora left a comment

This is really nice @madsbk - Thanks for doing this! The only blocking (I think) issue I found was related to engine='in-memory'.

Comment thread docs/cudf/source/cudf_polars/engines.md Outdated
Comment thread docs/cudf/source/cudf_polars/in_memory_engine.md Outdated
Comment thread docs/cudf/source/cudf_polars/in_memory_engine.md Outdated
@wence- wence- left a comment

I have some small suggestions but I think overall this is in a decent place.

Comment thread docs/cudf/source/cudf_polars/usage.md Outdated
Comment thread docs/cudf/source/cudf_polars/usage.md Outdated
Comment thread docs/cudf/source/cudf_polars/engines.md Outdated
Comment thread docs/cudf/source/cudf_polars/engines.md Outdated
Comment thread docs/cudf/source/cudf_polars/default_singleton_engine.md
Comment thread docs/cudf/source/cudf_polars/options.md Outdated
@madsbk madsbk marked this pull request as ready for review May 19, 2026 16:54
@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 6

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/cudf/source/cudf_polars/api.md`:
- Line 28: The sentence "The three engines share a common base class:" is
inconsistent with the listed classes; change the wording to match the four
listed engine classes (RayEngine, DaskEngine, SPMDEngine,
DefaultSingletonEngine) — e.g., replace with "The streaming engine classes share
a common base class:" or "The engine classes share a common base class:" so the
text and the list (RayEngine, DaskEngine, SPMDEngine, DefaultSingletonEngine)
are consistent and unambiguous.

In `@docs/cudf/source/cudf_polars/profiling.md`:
- Line 145: Replace the phrase "memory related metrics" with the hyphenated form
"memory-related metrics" in the documentation (search for the exact string
"memory related metrics") to follow the clarity guideline; ensure any other
occurrences in the same document use the hyphenated form consistently.
- Around line 47-52: The snippet uses undefined names (pl, RayEngine, opts) so
make it self-contained by adding the necessary imports and creating a
StreamingOptions instance before using RayEngine; specifically import polars as
pl, import StreamingOptions from cudf_polars.engine.options and RayEngine from
cudf_polars.engine.ray, set opts = StreamingOptions(statistics=True) (or other
desired settings), then use RayEngine.from_options(opts) and call
engine.global_statistics(clear=True) as shown.

In `@docs/cudf/source/cudf_polars/usage.md`:
- Line 8: Change the plural pronoun "them" to the singular "it" in the sentence
recommending constructing an engine object and using it in a context manager so
the pronoun agrees with "an engine object"; update the sentence near the phrase
"constructing an engine object and using them in a context manager" to read
"constructing an engine object and using it in a context manager" to improve
clarity.
- Around line 104-106: Rewrite the opening fragment into a full sentence by
merging it with the following sentence so it reads something like: "When you
need to control the engine lifetime explicitly — for example, in a Jupyter
notebook where a `with` block cannot span multiple cells — construct a
`RayEngine` once and reuse it, then call `engine.shutdown()` when you are done."
Update the text around `RayEngine` and `engine.shutdown()` to ensure punctuation
and flow are correct.
- Line 34: Typo: update the link display text "Attaching to an existing
Raycluster" to "Attaching to an existing Ray cluster" so the word "Raycluster"
is split; locate the string containing "Attaching to an existing Raycluster" in
the docs (the link text shown next to the anchor
"`#attaching-to-an-existing-ray-cluster`") and replace it with "Attaching to an
existing Ray cluster" to satisfy codespell checks.
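
Two of the `usage.md` findings above concern engine lifetime: using an engine object in a context manager, versus constructing it once, reusing it, and calling `engine.shutdown()` when done. The sketch below contrasts the two patterns. `StubEngine` is a hypothetical stand-in so the example runs without a GPU or Ray; it is not the cudf-polars API, and only the `shutdown()` name is taken from the review text.

```python
class StubEngine:
    """Hypothetical stand-in for an engine object; not cudf-polars code."""

    def __init__(self) -> None:
        self.alive = True

    def shutdown(self) -> None:
        # Release resources; calling this more than once is harmless here.
        self.alive = False

    def __enter__(self) -> "StubEngine":
        return self

    def __exit__(self, exc_type, exc, tb) -> None:
        self.shutdown()


# Pattern 1: context manager. Shutdown happens when the block exits,
# even if the body raises.
with StubEngine() as scoped:
    alive_inside_block = scoped.alive
alive_after_block = scoped.alive

# Pattern 2: explicit lifetime, e.g. across notebook cells where a
# `with` block cannot span multiple cells.
engine = StubEngine()
try:
    first_use = engine.alive   # stands in for one query
    second_use = engine.alive  # the same engine is reused
finally:
    engine.shutdown()
```

The context-manager form is harder to get wrong because cleanup is tied to scope; the explicit form trades that safety for the ability to keep one engine alive across separate pieces of code.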

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 342c2fa1-feb7-42be-9b6b-ed0d44b6313c

📥 Commits

Reviewing files that changed from the base of the PR and between bf25952 and 75cfe03.

📒 Files selected for processing (20)
  • dependencies.yaml
  • docs/cudf/source/conf.py
  • docs/cudf/source/cudf_polars/api.md
  • docs/cudf/source/cudf_polars/dask_engine.md
  • docs/cudf/source/cudf_polars/default_singleton_engine.md
  • docs/cudf/source/cudf_polars/engine_options.md
  • docs/cudf/source/cudf_polars/engines.md
  • docs/cudf/source/cudf_polars/in_memory_engine.md
  • docs/cudf/source/cudf_polars/index.md
  • docs/cudf/source/cudf_polars/index.rst
  • docs/cudf/source/cudf_polars/options.md
  • docs/cudf/source/cudf_polars/other_engines.md
  • docs/cudf/source/cudf_polars/profiling.md
  • docs/cudf/source/cudf_polars/spmd_engine.md
  • docs/cudf/source/cudf_polars/streaming_execution.md
  • docs/cudf/source/cudf_polars/usage.md
  • python/cudf_polars/cudf_polars/engine/dask.py
  • python/cudf_polars/cudf_polars/engine/default_singleton_engine.py
  • python/cudf_polars/cudf_polars/engine/ray.py
  • python/cudf_polars/cudf_polars/engine/spmd.py
💤 Files with no reviewable changes (3)
  • docs/cudf/source/cudf_polars/index.rst
  • docs/cudf/source/cudf_polars/streaming_execution.md
  • docs/cudf/source/cudf_polars/engine_options.md
✅ Files skipped from review due to trivial changes (9)
  • docs/cudf/source/cudf_polars/in_memory_engine.md
  • python/cudf_polars/cudf_polars/engine/dask.py
  • python/cudf_polars/cudf_polars/engine/spmd.py
  • python/cudf_polars/cudf_polars/engine/default_singleton_engine.py
  • python/cudf_polars/cudf_polars/engine/ray.py
  • docs/cudf/source/cudf_polars/options.md
  • docs/cudf/source/cudf_polars/spmd_engine.md
  • docs/cudf/source/cudf_polars/dask_engine.md
  • docs/cudf/source/cudf_polars/default_singleton_engine.md

Comment thread docs/cudf/source/cudf_polars/api.md Outdated
Comment thread docs/cudf/source/cudf_polars/profiling.md
Comment thread docs/cudf/source/cudf_polars/profiling.md Outdated
Comment thread docs/cudf/source/cudf_polars/usage.md Outdated
Comment thread docs/cudf/source/cudf_polars/usage.md Outdated
Comment thread docs/cudf/source/cudf_polars/usage.md Outdated
@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/cudf/source/cudf_polars/index.md`:
- Line 68: Replace the unhyphenated compound adjective "top performing queries"
with the hyphenated form "top-performing queries" in the sentence "complex
aggregation and join operations. Below are the speedups for the top performing
queries:" so the compound adjective correctly modifies "queries".

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 14954f80-ccf2-4de0-a3fb-6fdaad298d71

📥 Commits

Reviewing files that changed from the base of the PR and between bf25952 and 2a25d35.

📒 Files selected for processing (20)
  • dependencies.yaml
  • docs/cudf/source/conf.py
  • docs/cudf/source/cudf_polars/api.md
  • docs/cudf/source/cudf_polars/dask_engine.md
  • docs/cudf/source/cudf_polars/default_singleton_engine.md
  • docs/cudf/source/cudf_polars/engine_options.md
  • docs/cudf/source/cudf_polars/engines.md
  • docs/cudf/source/cudf_polars/in_memory_engine.md
  • docs/cudf/source/cudf_polars/index.md
  • docs/cudf/source/cudf_polars/index.rst
  • docs/cudf/source/cudf_polars/options.md
  • docs/cudf/source/cudf_polars/other_engines.md
  • docs/cudf/source/cudf_polars/profiling.md
  • docs/cudf/source/cudf_polars/spmd_engine.md
  • docs/cudf/source/cudf_polars/streaming_execution.md
  • docs/cudf/source/cudf_polars/usage.md
  • python/cudf_polars/cudf_polars/engine/dask.py
  • python/cudf_polars/cudf_polars/engine/default_singleton_engine.py
  • python/cudf_polars/cudf_polars/engine/ray.py
  • python/cudf_polars/cudf_polars/engine/spmd.py
💤 Files with no reviewable changes (3)
  • docs/cudf/source/cudf_polars/engine_options.md
  • docs/cudf/source/cudf_polars/streaming_execution.md
  • docs/cudf/source/cudf_polars/index.rst
✅ Files skipped from review due to trivial changes (10)
  • docs/cudf/source/cudf_polars/in_memory_engine.md
  • python/cudf_polars/cudf_polars/engine/ray.py
  • python/cudf_polars/cudf_polars/engine/dask.py
  • docs/cudf/source/cudf_polars/usage.md
  • docs/cudf/source/cudf_polars/engines.md
  • python/cudf_polars/cudf_polars/engine/spmd.py
  • python/cudf_polars/cudf_polars/engine/default_singleton_engine.py
  • docs/cudf/source/cudf_polars/spmd_engine.md
  • docs/cudf/source/cudf_polars/dask_engine.md
  • docs/cudf/source/cudf_polars/profiling.md
🚧 Files skipped from review as they are similar to previous changes (2)
  • docs/cudf/source/cudf_polars/default_singleton_engine.md
  • docs/cudf/source/cudf_polars/options.md

Comment thread docs/cudf/source/cudf_polars/index.md
madsbk commented May 20, 2026

/merge

@rapids-bot rapids-bot Bot merged commit c17a215 into rapidsai:release/26.06 May 20, 2026
219 of 221 checks passed
@github-project-automation github-project-automation Bot moved this from In Progress to Done in cuDF Python May 20, 2026
@madsbk madsbk deleted the docs-overhaul branch May 20, 2026 11:59

Labels

cudf-polars, doc, libcudf, non-breaking, Python

Projects

Status: Done

7 participants