Skip to content

Cleanup the legacy engines code path#22488

Closed
madsbk wants to merge 8 commits into
rapidsai:mainfrom
madsbk:cleanup_legacy_code_path
Closed

Cleanup the legacy engines code path#22488
madsbk wants to merge 8 commits into
rapidsai:mainfrom
madsbk:cleanup_legacy_code_path

Conversation

@madsbk
Copy link
Copy Markdown
Member

@madsbk madsbk commented May 13, 2026

Test helpers (cudf_polars/testing/asserts.py)

  • assert_collect_raises

Tests

  • test_collect_assert_raises
  • assert_collect_raises imports (in test_asserts.py,
    test_stringfunction.py, test_casting.py,
    test_datetime_basic.py)

Benchmark CLI (benchmarks/utils.py)

  • Removed the --executor flag and folded its cpu and
    in-memory choices into --frontend, which now accepts:
    cpu, in-memory, spmd, ray, dask, and duckdb.
  • ExecutorType alias
  • RunConfig.executor
  • Executor serialization / summary output
  • assert_never(run_config.executor) branch
  • Unused assert_never import

Benchmark shim

  • benchmarks/utils_new_frontends.py
    (contents merged into benchmarks/utils.py)

@madsbk madsbk self-assigned this May 13, 2026
@madsbk madsbk added improvement Improvement / enhancement to an existing function breaking Breaking change labels May 13, 2026
@github-actions github-actions Bot added Python Affects Python cuDF API. cudf-polars Issues specific to cudf-polars labels May 13, 2026
@GPUtester GPUtester moved this to In Progress in cuDF Python May 13, 2026
@madsbk madsbk marked this pull request as ready for review May 13, 2026 14:15
@madsbk madsbk requested a review from a team as a code owner May 13, 2026 14:15
@madsbk madsbk requested a review from TomAugspurger May 13, 2026 14:15
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 13, 2026

Note

Reviews paused

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

This PR eliminates global default executor configuration from the cudf_polars testing infrastructure by making engine specification explicit throughout. It requires all assertion functions to accept explicit GPUEngine instances, removes the --executor CLI option and helper functions that provided defaults, updates the entire test suite to pass explicit engines, consolidates benchmark utilities from a separate module into the main utils module, and adds comprehensive CLI and execution infrastructure for benchmark runners.

Changes

Testing and Benchmark Engine Parameterization

Layer / File(s) Summary
Assertion function API refactoring
python/cudf_polars/cudf_polars/testing/asserts.py
assert_gpu_result_equal and assert_sink_result_equal now require an explicit engine: GPUEngine keyword-only parameter. Removed DEFAULT_EXECUTOR, get_default_engine(), and assert_collect_raises() to eliminate optional engine fallbacks.
Pytest configuration and test fixture cleanup
python/cudf_polars/tests/conftest.py
Removed the pytest_addoption() hook that registered --executor CLI flag and the wiring in pytest_configure() that set DEFAULT_EXECUTOR from it. Configuration now only handles marker registration and global ResourceWarning suppression.
Test suite updates for explicit engines
python/cudf_polars/tests/expressions/test_casting.py, python/cudf_polars/tests/expressions/test_datetime_basic.py, python/cudf_polars/tests/expressions/test_stringfunction.py, python/cudf_polars/tests/test_config.py, python/cudf_polars/tests/test_executors.py, python/cudf_polars/tests/test_sink.py, python/cudf_polars/tests/test_tracing.py, python/cudf_polars/tests/testing/test_asserts.py
All test modules replaced assert_collect_raises() calls and executor="..." string arguments with explicit pytest.raises(...) blocks or instantiated pl.GPUEngine(executor="in-memory", raise_on_fail=True) objects. Updated subprocess environment to use explicit CUDF_POLARS__EXECUTOR setting instead of relying on test defaults.
Benchmark utilities consolidation and CLI infrastructure
python/cudf_polars/cudf_polars/experimental/benchmarks/utils.py (previously utils_new_frontends.py removed)
Merged benchmarking datamodels (QueryResult, RunConfig, validation records), hardware/version tracking (HardwareInfo, PackageVersions), frontend execution runners (CPU, in-memory, SPMD, Ray, Dask, DuckDB), NVTX-annotated timing, structured tracing collection, and comprehensive CLI argument parsing (build_parser, parse_args, run_polars) into a single, self-contained module. Includes dataset loading, scale-factor inference, validation golden-file discovery, and result summarization.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

The review involves understanding the systematic removal of a global executor pattern across multiple test files and assertion utilities, verifying that all test branches correctly specify engines, and validating the large consolidation of benchmark utilities that introduces comprehensive CLI infrastructure and multi-frontend execution orchestration.

Possibly related PRs

  • rapidsai/cudf#22410: Related refactoring of default engine/executor handling in asserts.py with similar removal of fallback engine derivation patterns.
  • rapidsai/cudf#22381: Extends test fixtures in conftest.py with Ray/Dask-aware streaming engine markers and fixtures that work in tandem with the explicit engine requirement pattern introduced here.

Suggested reviewers

  • TomAugspurger
  • mroeschke
  • bdice
  • Matt711
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 59.09% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title 'Cleanup the legacy engines code path' accurately describes the main objective of removing deprecated engine-related code and making engine arguments mandatory in assertion functions.
Linked Issues check ✅ Passed The PR successfully implements all coding requirements from issue #22346: makes engine argument mandatory for assert_gpu_result_equal and assert_sink_result_equal, removes DEFAULT_EXECUTOR and get_default_engine, eliminates assert_collect_raises, and removes the --executor CLI option to prevent implicit defaults.
Out of Scope Changes check ✅ Passed All changes are directly aligned with the linked issue #22346 objectives. The refactoring of benchmarks/utils.py consolidating frontend selection and removal of utils_new_frontends.py are supporting changes to achieve explicit engine handling throughout the codebase.
Description check ✅ Passed The PR description clearly outlines the removal of legacy engine code paths, test helpers, and CLI flag consolidation, matching the changeset which removes deprecated test utilities and refactors benchmark/testing infrastructure.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Warning

Review ran into problems

🔥 Problems

Stopped waiting for pipeline failures after 30000ms. One of your pipelines takes longer than our 30000ms fetch window to run, so review may not consider pipeline-failure results for inline comments if any failures occurred after the fetch window. Increase the timeout if you want to wait longer or run a @coderabbit review after the pipeline has finished.


Comment @coderabbitai help to get the list of available commands and usage tips.

coderabbitai[bot]

This comment was marked as off-topic.

@rapidsai rapidsai deleted a comment from copy-pr-bot Bot May 13, 2026
@rapidsai rapidsai deleted a comment from coderabbitai Bot May 13, 2026
@@ -1,19 +1,33 @@
# SPDX-FileCopyrightText: Copyright (c) 2026, NVIDIA CORPORATION & AFFILIATES.
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This diff is hard to read on GitHub. To better isolate the changes in
utils.py, use something like:

git diff \
  main:python/cudf_polars/cudf_polars/experimental/benchmarks/utils_new_frontends.py \
  HEAD:python/cudf_polars/cudf_polars/experimental/benchmarks/utils.py

@wence-
Copy link
Copy Markdown
Contributor

wence- commented May 14, 2026

  • --executor CLI option

    • pytest_addoption

    • DEFAULT_EXECUTOR = config.getoption("--executor")

This stuff has already gone, no?

@wence-
Copy link
Copy Markdown
Contributor

wence- commented May 14, 2026

Closes #22346.

This is closed by refusing to do it, AFAICT, rather than shoveling an engine argument into assert_ir_translation_raises.

Copy link
Copy Markdown
Contributor

@wence- wence- left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please update the PR description for what is now actually going on.

TBH, I would prefer just changing the benchmark rather than the benchmark and the tests in this PR

return final_polars_collect_kwargs, final_cudf_collect_kwargs


def assert_collect_raises(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inlining it instead of calling assert_collect_raises is simpler IMO.

@madsbk
Copy link
Copy Markdown
Member Author

madsbk commented May 14, 2026

Closes #22346.

This is closed by refusing to do it, AFAICT, rather than shoveling an engine argument into assert_ir_translation_raises.

Removed the closing, you are right that this PR only addresses the first part of #22346.
I am not sure assert_ir_translation_raises is really an issue, though.

@madsbk madsbk marked this pull request as draft May 14, 2026 14:12
@copy-pr-bot
Copy link
Copy Markdown

copy-pr-bot Bot commented May 14, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@madsbk
Copy link
Copy Markdown
Member Author

madsbk commented May 14, 2026

TBH, I would prefer just changing the benchmark rather than the benchmark and the tests in this PR

Fair point, let's do one at a time: #22504

@madsbk
Copy link
Copy Markdown
Member Author

madsbk commented May 16, 2026

close in favor of #22535

@madsbk madsbk closed this May 16, 2026
@github-project-automation github-project-automation Bot moved this from In Progress to Done in cuDF Python May 16, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

breaking Breaking change cudf-polars Issues specific to cudf-polars improvement Improvement / enhancement to an existing function Python Affects Python cuDF API.

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

3 participants