Overhaul cudf-polars docs for new streaming multi-GPU engines #22252
Force-pushed from 3670e3e to 5d31663
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review.
📝 Walkthrough
Reorganizes and expands cudf-polars documentation: adds API reference, engine-specific guides (Ray, Dask, SPMD, default singleton, in-memory), StreamingOptions reference, profiling/tracing docs, Sphinx nitpick ignores, dependencies update, and minor docstring cross-reference fixes. No runtime code changes.
Changes: cuDF Polars Documentation Restructuring
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks: ✅ 5 passed
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually. Contributors can view more details about this message here.
Force-pushed from a64f7f5 to caf77d6
wence- left a comment:
I have some small suggestions but I think overall this is in a decent place.
Actionable comments posted: 6
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@docs/cudf/source/cudf_polars/api.md`:
- Line 28: The sentence "The three engines share a common base class:" is
inconsistent with the four engine classes listed below it (RayEngine,
DaskEngine, SPMDEngine, DefaultSingletonEngine). Reword it, e.g. "The streaming
engine classes share a common base class:" or "The engine classes share a
common base class:", so that the text matches the list.
In `@docs/cudf/source/cudf_polars/profiling.md`:
- Line 145: Replace the phrase "memory related metrics" with the hyphenated form
"memory-related metrics" in the documentation (search for the exact string
"memory related metrics") to follow the clarity guideline; ensure any other
occurrences in the same document use the hyphenated form consistently.
- Around line 47-52: The snippet uses undefined names (pl, RayEngine, opts) so
make it self-contained by adding the necessary imports and creating a
StreamingOptions instance before using RayEngine; specifically import polars as
pl, import StreamingOptions from cudf_polars.engine.options and RayEngine from
cudf_polars.engine.ray, set opts = StreamingOptions(statistics=True) (or other
desired settings), then use RayEngine.from_options(opts) and call
engine.global_statistics(clear=True) as shown.
In `@docs/cudf/source/cudf_polars/usage.md`:
- Line 8: Change the plural pronoun "them" to the singular "it" in the sentence
recommending constructing an engine object and using it in a context manager so
the pronoun agrees with "an engine object"; update the sentence near the phrase
"constructing an engine object and using them in a context manager" to read
"constructing an engine object and using it in a context manager" to improve
clarity.
- Around line 104-106: Rewrite the opening fragment into a full sentence by
merging it with the following sentence so it reads something like: "When you
need to control the engine lifetime explicitly — for example, in a Jupyter
notebook where a `with` block cannot span multiple cells — construct a
`RayEngine` once and reuse it, then call `engine.shutdown()` when you are done."
Update the text around `RayEngine` and `engine.shutdown()` to ensure punctuation
and flow are correct.
- Line 34: Typo: update the link display text "Attaching to an existing
Raycluster" to "Attaching to an existing Ray cluster" so the word "Raycluster"
is split; locate the string containing "Attaching to an existing Raycluster" in
the docs (the link text shown next to the anchor
"`#attaching-to-an-existing-ray-cluster`") and replace it with "Attaching to an
existing Ray cluster" to satisfy codespell checks.
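For the `profiling.md` snippet and the explicit-lifetime paragraph in `usage.md`, the fixes described above might look roughly like the sketch below. It is not the text that landed in the docs. The import paths, `StreamingOptions(statistics=True)`, `RayEngine.from_options`, `engine.global_statistics(clear=True)`, and `engine.shutdown()` are taken from the review comments and are not verified against the library here. Using the engine as a context manager and passing it as `collect(engine=...)` are assumptions, and the two helper function names are hypothetical.

```python
def collect_with_statistics(query):
    """Run a Polars lazy query on a Ray engine with statistics enabled."""
    # Imports are deferred so the module can be imported without a GPU stack.
    # Paths and names follow the review comment and are otherwise unverified.
    from cudf_polars.engine.options import StreamingOptions
    from cudf_polars.engine.ray import RayEngine

    opts = StreamingOptions(statistics=True)  # option name as given in the comment
    # Assumption: the engine can be used as a context manager.
    with RayEngine.from_options(opts) as engine:
        result = query.collect(engine=engine)  # assumption: engine passed via `engine=`
        stats = engine.global_statistics(clear=True)
    return result, stats


def run_with_explicit_lifetime(engine, queries):
    """Reuse one already-constructed engine for several queries, then shut it down."""
    try:
        return [q.collect(engine=engine) for q in queries]
    finally:
        # Release the engine's resources even if one of the queries fails.
        engine.shutdown()
```

In a notebook, the second pattern would be split across cells: construct the engine once, pass it to each `collect` call, and call `engine.shutdown()` in the last cell.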
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 342c2fa1-feb7-42be-9b6b-ed0d44b6313c
📒 Files selected for processing (20)
- dependencies.yaml
- docs/cudf/source/conf.py
- docs/cudf/source/cudf_polars/api.md
- docs/cudf/source/cudf_polars/dask_engine.md
- docs/cudf/source/cudf_polars/default_singleton_engine.md
- docs/cudf/source/cudf_polars/engine_options.md
- docs/cudf/source/cudf_polars/engines.md
- docs/cudf/source/cudf_polars/in_memory_engine.md
- docs/cudf/source/cudf_polars/index.md
- docs/cudf/source/cudf_polars/index.rst
- docs/cudf/source/cudf_polars/options.md
- docs/cudf/source/cudf_polars/other_engines.md
- docs/cudf/source/cudf_polars/profiling.md
- docs/cudf/source/cudf_polars/spmd_engine.md
- docs/cudf/source/cudf_polars/streaming_execution.md
- docs/cudf/source/cudf_polars/usage.md
- python/cudf_polars/cudf_polars/engine/dask.py
- python/cudf_polars/cudf_polars/engine/default_singleton_engine.py
- python/cudf_polars/cudf_polars/engine/ray.py
- python/cudf_polars/cudf_polars/engine/spmd.py
💤 Files with no reviewable changes (3)
- docs/cudf/source/cudf_polars/index.rst
- docs/cudf/source/cudf_polars/streaming_execution.md
- docs/cudf/source/cudf_polars/engine_options.md
✅ Files skipped from review due to trivial changes (9)
- docs/cudf/source/cudf_polars/in_memory_engine.md
- python/cudf_polars/cudf_polars/engine/dask.py
- python/cudf_polars/cudf_polars/engine/spmd.py
- python/cudf_polars/cudf_polars/engine/default_singleton_engine.py
- python/cudf_polars/cudf_polars/engine/ray.py
- docs/cudf/source/cudf_polars/options.md
- docs/cudf/source/cudf_polars/spmd_engine.md
- docs/cudf/source/cudf_polars/dask_engine.md
- docs/cudf/source/cudf_polars/default_singleton_engine.md
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@docs/cudf/source/cudf_polars/index.md`:
- Line 68: Replace the unhyphenated compound adjective "top performing queries"
with the hyphenated form "top-performing queries" in the sentence "complex
aggregation and join operations. Below are the speedups for the top performing
queries:" so the compound adjective correctly modifies "queries".
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 14954f80-ccf2-4de0-a3fb-6fdaad298d71
💤 Files with no reviewable changes (3)
- docs/cudf/source/cudf_polars/engine_options.md
- docs/cudf/source/cudf_polars/streaming_execution.md
- docs/cudf/source/cudf_polars/index.rst
✅ Files skipped from review due to trivial changes (10)
- docs/cudf/source/cudf_polars/in_memory_engine.md
- python/cudf_polars/cudf_polars/engine/ray.py
- python/cudf_polars/cudf_polars/engine/dask.py
- docs/cudf/source/cudf_polars/usage.md
- docs/cudf/source/cudf_polars/engines.md
- python/cudf_polars/cudf_polars/engine/spmd.py
- python/cudf_polars/cudf_polars/engine/default_singleton_engine.py
- docs/cudf/source/cudf_polars/spmd_engine.md
- docs/cudf/source/cudf_polars/dask_engine.md
- docs/cudf/source/cudf_polars/profiling.md
🚧 Files skipped from review as they are similar to previous changes (2)
- docs/cudf/source/cudf_polars/default_singleton_engine.md
- docs/cudf/source/cudf_polars/options.md
/merge
Merged commit c17a215 into rapidsai:release/26.06
Restructure the docs around the new streaming multi-GPU engines and unified configuration model, replacing the legacy execution narrative. Add a set of user-facing guides covering usage, engines, configuration, profiling, and legacy workflows.