Implement groupby all/any via bool-coercion + min/max by galipremsagar · Pull Request #22371 · rapidsai/cudf

galipremsagar · 2026-05-04T20:05:51Z

Summary

Split out from #22289. GroupBy.all and GroupBy.any previously raised NotImplementedError. This PR implements them by reducing to min/max on a bool-coerced copy of the value columns.

Implementation (`python/cudf/cudf/core/groupby/groupby.py`)

A new _bool_reduce helper:

- Coerces strings as count_characters > 0, so empty strings become False and nulls remain null (preserved through the aggregation).
- Coerces numerics as != 0 with the same null preservation.
- For skipna=False, fills nulls with True before aggregation, so a null never flips all to False and always makes any True.
- Empty groups (skipna=True with all-NA values) yield NA from min/max; pandas treats those as vacuously True for all and False for any, so the result is filled accordingly.
- Applies min_count by counting per-group non-nulls and masking groups whose count is below the threshold.
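The reduction described above can be modeled on the CPU in plain Python. This is only a sketch of the idea, not the cuDF `_bool_reduce` code: `coerce_to_bool` and `bool_reduce` are hypothetical names, `None` stands in for a null, and checking `min_count` against the non-null count taken before any null filling is an assumption.

```python
from collections import defaultdict


def coerce_to_bool(value):
    """Map one value to True/False, keeping None (the null stand-in) as None."""
    if value is None:
        return None
    if isinstance(value, str):
        return len(value) > 0  # analogue of count_characters > 0
    return value != 0  # numerics (and bools)


def bool_reduce(keys, values, op, skipna=True, min_count=0):
    """Toy model of groupby all/any as min/max over bool-coerced values."""
    if op not in ("all", "any"):
        raise ValueError(f"unsupported op: {op!r}")
    groups = defaultdict(list)
    for key, value in zip(keys, values):
        groups[key].append(coerce_to_bool(value))

    reducer = min if op == "all" else max  # all -> min, any -> max
    result = {}
    for key, flags in groups.items():
        non_null = [flag for flag in flags if flag is not None]
        if len(non_null) < min_count:
            # Assumption: min_count is checked against the original non-null count.
            result[key] = None
            continue
        if skipna:
            flags = non_null  # min/max skip nulls
        else:
            # Nulls are replaced with True before reducing.
            flags = [True if flag is None else flag for flag in flags]
        # An all-null group has nothing to reduce; use the vacuous value.
        result[key] = reducer(flags) if flags else (op == "all")
    return result


keys = ["a", "a", "b", "b", "c", "c"]
values = [1, 0, None, None, "x", ""]
all_result = bool_reduce(keys, values, "all")
any_result = bool_reduce(keys, values, "any")
```

Group `"b"` holds only nulls, so with `skipna=True` it comes out `True` for `all` and `False` for `any`; with `skipna=False` both become `True` because the nulls are filled with `True` first.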

The new GroupBy is constructed with by=self.grouping (passing the existing _Grouping object) so key columns match the bool-coerced value columns exactly, avoiding label-based lookup when the original key column was excluded.

Tests

python/cudf/cudf/tests/groupby/test_reductions.py:

- test_groupby_all_any over bool/int/float data.
- test_groupby_all_any_string for string columns.
- test_groupby_all_any_empty for empty-group behavior.

Conftest

Removes 32 test_string_dtype_all_na[*-all-*] and [*-any-*] entries.

Relationship to #22289

One of the four split PRs requested in the review on #22289. The DataFrame-case test_string_dtype_all_na[*-{all,any}-*] parametrizations (df.groupby(df["a"]).all()) also rely on identity-based grouping-key column exclusion in #22369; both must merge before the 32 conftest removals stop xpassing.

Both methods previously raised ``NotImplementedError``. Reduce ``all``/``any`` to ``min``/``max`` on a bool-coerced copy of the value columns:

- Strings coerce as ``count_characters > 0`` so empty strings become ``False`` and nulls remain null (preserving them through the agg).
- Numerics coerce as ``!= 0`` with the same null preservation.
- ``skipna=False`` replaces nulls with ``True`` before the aggregation so that nulls don't flip ``all`` to ``False`` and trivially make ``any`` ``True``.
- Empty groups (all-NA values, skipna=True) yield NA from min/max; pandas treats those as vacuously ``True`` for ``all`` and ``False`` for ``any``, so the result is filled accordingly.
- ``min_count`` masks groups whose non-null count is below the threshold.

Conftest update for ``test_string_dtype_all_na[*-all-*]`` and ``[*-any-*]`` (32 entries). The string-key DataFrame cases additionally rely on identity-based grouping-key column exclusion, which lands in a sibling PR; both must merge before the entries can be removed without xpassing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

copy-pr-bot · 2026-05-04T20:05:55Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

galipremsagar · 2026-05-04T20:33:49Z

/okay to test b288bbc

mroeschke · 2026-05-05T02:30:02Z

+        # Empty groups (skipna=True with all-NA values) yield NA from
+        # min/max — pandas treats these as ``True`` for ``all`` and
+        # ``False`` for ``any``.
+        bool_np = np.dtype(np.bool_)


Just confirming, is np.dtype(np.bool_) returned regardless of the pandas string type?

Yes:

In [9]: df = pd.DataFrame({
   ...:     "k": [1, 1, 2, 2],
   ...:     "s": pd.array(["a", "b", pd.NA, "c"], dtype="string"),
   ...: })

In [10]: df
Out[10]:
   k     s
0  1     a
1  1     b
2  2  <NA>
3  2     c

In [11]: df.groupby("k").all()
Out[11]:
      s
k
1  True
2  True

In [12]: df.groupby("k").all().dtypes
Out[12]:
s    bool
dtype: object

Co-authored-by: Matthew Roeschke <10647082+mroeschke@users.noreply.github.com>

…ai#22295) ## Description In pandas-compatible mode, reject casting nullable string columns that use `pd.NA` as their missing-value sentinel to numpy `object` dtype. This came from a pandas 3 compatibility issue in `cudf.pandas`: pandas preserves `pd.NA` when `StringDtype(na_value=pd.NA)` is cast to `object`, while cuDF's string-to-object path materializes nulls as Python `None`. Preserving that sentinel would require carrying source dtype metadata after the result has become plain `object`, which the review pointed out is not a good fit for the current column model. Instead, when `mode.pandas_compatible` is enabled, this PR now raises in `StringColumn.as_string_column` for: - `pd.StringDtype(..., na_value=pd.NA)` -> `object` - string `pd.ArrowDtype` -> `object` Outside pandas-compatible mode, the existing string-to-object cast behavior is unchanged. String dtypes that use `np.nan` as their missing-value sentinel and ordinary object string columns also keep the existing behavior. ## Changes - Add an explicit pandas-compatible-mode `NotImplementedError` for nullable `pd.NA` string-to-object casts in `python/cudf/cudf/core/column/string.py`. - Add focused coverage in `python/cudf/cudf/tests/series/methods/test_astype.py` for both pandas-compatible and non-pandas-compatible behavior. - Remove the previous per-instance `_PANDAS_NA_VALUE` override path. ## Checklist - [x] I am familiar with the [Contributing Guidelines](https://github.com/rapidsai/cudf/blob/HEAD/CONTRIBUTING.md). - [x] New or existing tests cover these changes. - [x] The documentation is up to date with these changes.

Drops the legacy `Cluster.DISTRIBUTED` cluster and the entire `rapidsmpf.integrations.dask` execution path. The new `DaskEngine` (`Cluster.DASK`) is unaffected. Note: all removed components were under `experimental`, so no deprecation period is required. **What’s removed** * `Cluster.DISTRIBUTED` enum value and all dispatch paths (`rapidsmpf/core.py`, `parallel.py:get_scheduler`) * `experimental/dask_registers.py`, `experimental/spilling.py`, `experimental/rapidsmpf/dask.py` * `rapidsmpf_distributed_available()`, `StreamingExecutor.rapidsmpf_spill`, and `cluster_kind` plumbing in `shuffle.py` and `sort.py` * Legacy benchmark harness (`benchmarks/utils_legacy.py`) and the `utils.py` dispatch shim * Legacy test suite (`tests/experimental/legacy/`) and Dask registration test files **What stays** * `Cluster.DASK` / `DaskEngine` (`frontend/dask.py`), the supported Dask backend * `Cluster.SINGLE`, `SPMD`, and `RAY` streaming frontends * The task-graph backend (`Runtime.TASKS`). Authors: - Mads R. B. Kristensen (https://github.com/madsbk) Approvers: - Peter Andreas Entschev (https://github.com/pentschev) - Matthew Murray (https://github.com/Matt711) - Bradley Dice (https://github.com/bdice) URL: rapidsai#22358

…identity (rapidsai#22366) Uses node directly as the dict key instead of `id(node)`, so nodes reconstructed on workers (introduced in rapidsai#22287) are found correctly by value rather than failing with a `KeyError`. Authors: - Matthew Murray (https://github.com/Matt711) Approvers: - Tom Augspurger (https://github.com/TomAugspurger) - Mads R. B. Kristensen (https://github.com/madsbk) URL: rapidsai#22366
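The difference between keying a dict by `id(node)` and keying it by the node itself can be sketched in plain Python. `Node` below is a hypothetical frozen dataclass used only for illustration, not the cudf-polars IR class, and building a second equal instance stands in for a node being reconstructed on a worker.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Hypothetical hashable node: equality and hash come from the fields."""

    name: str
    children: tuple = ()


original = Node("scan")
# Stand-in for reconstruction on a worker: equal by value, but a new object.
rebuilt = Node("scan")

by_id = {id(original): "partition-info"}
by_value = {original: "partition-info"}

found_by_id = id(rebuilt) in by_id  # identity differs, so the lookup misses
found_by_value = rebuilt in by_value  # equal fields hash and compare the same
```

Keying by value only works when the node type defines consistent `__eq__` and `__hash__`, which a frozen dataclass provides by default.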

…dsai#22344) Pass the managed-pool MR directly into each `cudf::datagen::generate_*` call instead of swapping it in as the current device resource and restoring on exit. Also fixes forwarding of the mr parameter down the datagen stack. There are still a few tiny allocations (KBs) that use the default mr because switching would require a copy. These should not cause OOM errors. Authors: - Vukasin Milovanovic (https://github.com/vuule) Approvers: - Bradley Dice (https://github.com/bdice) - Tianyu Liu (https://github.com/kingcrimsontianyu) URL: rapidsai#22344

Fixes some compile warnings in the libcudf tests. These are deprecation warnings about the missing alignment parameter for the custom allocators in the `hybrid_scan_io` and `parquet_io` examples. ``` /cudf/cpp/examples/parquet_io/io_source.hpp:61:66: warning: 'void cuda::mr::__4::__ibasic_async_resource< <template-parameter-1-1> >::deallocate(cuda::__4::stream_ref, void*, size_t) [with <template-parameter-1-1> = {cuda::__4::__ireference<cuda::__4::__iset_<cuda::mr::__4::__ibasic_async_resource<>, cuda::mr::__4::__ibasic_resource<>, cuda::mr::__4::__with_property<cuda::mr::__4::dynamic_accessibility_property>::__iproperty<>, cuda::mr::__4::__with_property<cuda::mr::__4::host_accessible>::__iproperty<>, cuda::__4::__icopyable<>, cuda::__4::__iequality_comparable<> > >}; size_t = long unsigned int]' is deprecated: Specify an explicit alignment argument. The default alignment will be removed in a future release. [-Wdeprecated-declarations] 61 | void deallocate(T* ptr, std::size_t n) noexcept { mr.deallocate(stream, ptr, n * sizeof(T)); } ``` Authors: - David Wendt (https://github.com/davidwendt) Approvers: - Yunsong Wang (https://github.com/PointKernel) - Bradley Dice (https://github.com/bdice) URL: rapidsai#22335

This PR updates the join benchmarks to include a skip axis, allowing users to optionally include large table sizes, which is not possible in the current setup due to its unconditional skip of those sizes. Authors: - Yunsong Wang (https://github.com/PointKernel) Approvers: - Vyas Ramasubramani (https://github.com/vyasr) - Shruti Shivakumar (https://github.com/shrshi) URL: rapidsai#22241

## Summary Pandas' `BaseMaskedDtype` defines `__from_arrow__` for converting a `pyarrow.Array` (including `NullArray`/`ChunkedArray` of nulls) into the matching `BaseMaskedArray`. The cudf.pandas final proxy types for `BooleanDtype`, `Int{8,16,32,64}Dtype`, `UInt{8,16,32,64}Dtype`, and `Float{32,64}Dtype` did not list `__from_arrow__` in their `additional_attributes`, so the proxy `__getattr__` raised `AttributeError` even though the slow object has it. ## Change Add `"__from_arrow__": _FastSlowAttribute("__from_arrow__")` to all eleven masked dtype proxy declarations in `python/cudf/cudf/pandas/_wrappers/pandas.py`, mirroring the existing pattern on `ArrowDtype`. ## Tests / Conftest Removes 25 entries from `conftest-patch.py` that were xfailed only because of the missing attribute: - 22 parametrizations of `tests/arrays/masked/test_arrow_compat.py::test_from_arrow_null` (all four masked dtype families × two arrow array shapes). - `tests/arrays/masked/test_arrow_compat.py::test_arrow_from_arrow_uint`. - `tests/arrays/masked/test_arrow_compat.py::test_dataframe_from_arrow_types_mapper`. - `tests/indexes/multi/test_constructors.py::test_from_frame_missing_values_multiIndex`. All 22 `test_from_arrow_null` cases pass, the full `test_arrow_compat.py` file passes (69 passed, 22 unrelated xfails), and the cudf-side `cudf_pandas_tests/` suite is clean (435 passed). Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

## Description In pandas, empty datetime inputs default to `s` resolution. This PR fixes the inconsistency so that `cudf` matches `pandas`. This PR also fixes `freq` preservation in `Groupby.size`. ## Checklist - [x] I am familiar with the [Contributing Guidelines](https://github.com/rapidsai/cudf/blob/HEAD/CONTRIBUTING.md). - [x] New or existing tests cover these changes. - [x] The documentation is up to date with these changes.

…thodProxy (rapidsai#22374) ## Summary Three pandas-tests xfail entries surfaced `AttributeError` failures that were just missing entries in the proxy `additional_attributes` (or on `_MethodProxy` itself). ## Changes - **`IntervalArray` proxy** now exposes `_left` and `_right` (private), matching the existing `_data`/`_mask` plumbing. Fixes `test_series_from_temporary_intervalindex_readonly_data`. - **`Styler` proxy** now exposes `_compute`, `_display_funcs_column_names`, and `_display_funcs_index_names` (all private). Fixes `test_format_index_names_clear[_display_funcs_column_names-kwargs1]` and `[_display_funcs_index_names-kwargs0]`. - **`_MethodProxy`** now exposes `__func__` (forwarded to the slow underlying method), mirroring the existing `__name__` and `__doc__` properties. This is required for callers that introspect classmethod descriptors via `type(x).method.__func__`. ## Conftest Removed three `NODEIDS_THAT_FAIL` entries whose underlying tests now pass. ## Notes on remaining `AttributeError` xfails Audited the remaining 17 `AttributeError` xfail entries; they fall into a few buckets that need deeper changes (out of scope for this PR): - **Slow-side `isinstance` failures** (`Styler._compute`, `'DataFrame'/'SubclassedDataFrame' object has no attribute 'dtype'`): the slow-side function's `__globals__` was bound at import time before the proxy classes were installed, so `isinstance(proxy_df, real_DataFrame)` is `False` inside the slow module. Needs a different mechanism than `additional_attributes`. - **Mixed-type Series limitations** (`quantile_box`, `quantile_box_nat`, `quantile_date_range`, `quantile_ea_scalar`): cuDF documents that it returns a `DataFrame` instead of a `Series` when the result would be mixed-type — the proxy preserves that type, breaking downstream `assert_series_equal`. - **`.values` returning ndarray for nullable dtypes** (`test_construct_from_dict_ea_series`): pure pandas returns `IntegerArray`; cuDF returns `ndarray`. 
- **Other one-offs** (`SparseArray.reshape`, abstract `_from_sequence_of_strings`, custom accessor `xyz`, loc setitem datetime parsing, `_fsproxy_slow` proxy-conversion failure): each needs its own targeted fix. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Fix bugs that appear when running with `num_ranks > 1`, where client-side `pl.concat(per_rank_outputs)` exposes assumptions that do not hold under single-rank execution. These were all discovered while working on multi-rank tests. **NB:** Please take a close look during review, as I’m still a bit unfamiliar with the IR part of cudf-polars. Authors: - Mads R. B. Kristensen (https://github.com/madsbk) Approvers: - Matthew Murray (https://github.com/Matt711) - Lawrence Mitchell (https://github.com/wence-) URL: rapidsai#22361

Fixes a regression in rapidsai#22237 where reading a CSV larger than the internal 64 MiB chunk size dropped all rows past the first chunk. Root cause is a misuse of a clamped value to determine the EOF state. This PR fixes the EOF transition so it only happens in the last chunk. Also added a large test - all previous CSV tests were below the chunk threshold. Authors: - Vukasin Milovanovic (https://github.com/vuule) Approvers: - Yunsong Wang (https://github.com/PointKernel) - Basit Ayantunde (https://github.com/lamarrr) - Bradley Dice (https://github.com/bdice) URL: rapidsai#22375

Closes rapidsai#22154 This PR adds decimal128 values to the groupby_max_cardinality benchmark. Authors: - Yunsong Wang (https://github.com/PointKernel) Approvers: - Muhammad Haseeb (https://github.com/mhaseeb123) - David Wendt (https://github.com/davidwendt) URL: rapidsai#22162

…apidsai#22384) The `cudf-polars-ir-signatures` pre-commit hook uses `language: python` but is just a local script (`./ci/check_cudf_polars_ir.py`) that only depends on stdlib modules (`ast`, `argparse`, `sys`, `typing`) and has a `#!/usr/bin/env python3` shebang. With `language: python`, pre-commit unnecessarily creates a virtualenv for this hook. `language: script` is the correct setting — it runs the entry point directly as an executable, relying on the shebang for interpreter selection, with no virtualenv overhead. Authors: - Vyas Ramasubramani (https://github.com/vyasr) Approvers: - James Lamb (https://github.com/jameslamb) URL: rapidsai#22384

This PR fixes a potential infinite loop in parquet page header count/decode kernels in case of malformed input. Authors: - Muhammad Haseeb (https://github.com/mhaseeb123) Approvers: - Vukasin Milovanovic (https://github.com/vuule) - Paul Mattione (https://github.com/pmattione-nvidia) URL: rapidsai#22274

…rapidsai#22281) closes rapidsai#21466 closes rapidsai#21767 Waiting for rapidsai#22212 * Makes rapidsmpf a required dependency of cudf_polars * Removes the following `StreamingExecutor` options as they were "experimental" with associated code paths * `StreamingExecutor.runtime` * `StreamingExecutor.shuffle_method` * `StreamingExecutor.unique_fraction` * `StreamingExecutor.groupby_n_ary` * `StreamingExecutor.rapidsmpf_spill` * Removes the task runtime and associated tests * Some tests were modified to cover only one specific configuration because of rapidsai#22346, so that they pass for now. Planning on revisiting this once rapidsmpf becomes the default. Ops-Bot-Merge-Barrier: true Authors: - Matthew Roeschke (https://github.com/mroeschke) Approvers: - Mads R. B. Kristensen (https://github.com/madsbk) - Bradley Dice (https://github.com/bdice) - Matthew Murray (https://github.com/Matt711) - Lawrence Mitchell (https://github.com/wence-) URL: rapidsai#22281

This PR uses the host worker pool to submit hybrid scan's host-read IO tasks so that the mutex can be safely released after submission. Authors: - Muhammad Haseeb (https://github.com/mhaseeb123) Approvers: - Tianyu Liu (https://github.com/kingcrimsontianyu) - Shruti Shivakumar (https://github.com/shrshi) URL: rapidsai#21992

…#22145) Follow up rapidsai#22144 Adds Python bindings for the `cudf::apply_deletion_mask` API and adds pytests for stream compaction. Authors: - Muhammad Haseeb (https://github.com/mhaseeb123) - Matthew Murray (https://github.com/Matt711) Approvers: - Matthew Roeschke (https://github.com/mroeschke) - Bradley Dice (https://github.com/bdice) - Matthew Murray (https://github.com/Matt711) URL: rapidsai#22145

…sai#22350) - Follow up to rapidsai#22315 - Further revises `sort_actor` in preparation for rapidsai/rapidsmpf#853 - Part of rapidsai#22128 - Breaks apart `sort_actor` logic into modular steps, so we can avoid collecting boundaries when we already know the boundaries (future work). Authors: - Richard (Rick) Zamora (https://github.com/rjzamora) Approvers: - Matthew Murray (https://github.com/Matt711) - Matthew Roeschke (https://github.com/mroeschke) URL: rapidsai#22350

…apidsai#22381) Builds on the cached `streaming_engines` fixture from rapidsai#22364, which amortizes SPMD bootstrap via `_reset()`, and extends the same pattern to Dask and Ray. With this change, the test matrix runs against: `["in-memory", "spmd", "spmd-small", "dask", "ray"]` subject to package availability and `rrun` gating. We might change the different setups later, but for now CI runs:

| Engine | Block Size(s) | GPU Configuration |
|----------------|-----------------------|-------------------|
| `SPMDEngine` | `"medium"`, `"small"` | Single GPU |
| `DaskEngine` | `"medium"` | Single GPU |
| `RayEngine` | `"medium"` | Two GPUs |

Authors: - Mads R. B. Kristensen (https://github.com/madsbk) - Peter Andreas Entschev (https://github.com/pentschev) Approvers: - Matthew Murray (https://github.com/Matt711) - Bradley Dice (https://github.com/bdice) - Peter Andreas Entschev (https://github.com/pentschev) - Matthew Roeschke (https://github.com/mroeschke) URL: rapidsai#22381

…22289) ## Summary `get_dtype_of_same_kind` was silently downgrading `StringDtype` results when the source and target were both `StringDtype` but with different storage/`na_value`. When the source had `na_value=np.nan`, it returned the bare target dtype; when the source had `pyarrow` storage, it converted to `large_string[pyarrow]` unless the source equaled the target exactly. This caused groupby `min`/`max`/`first`/`last` on `StringDtype` value columns to return the wrong dtype (e.g., `str[python]` would come back as `StringDtype(na_value=nan)` with pyarrow storage; `string[pyarrow]` would come back as `large_string[pyarrow]`). ## Change If both the source and target dtypes are `pd.StringDtype`, return the source unchanged. This preserves storage and `na_value` for all four storage/`na_value` combinations. ## Tests `test_groupby_string_min_max_preserves_dtype` covers `min`/`max`/`first`/`last` over the four `StringDtype` storage/`na_value` combinations and asserts that the result dtype matches pandas. ## Conftest Removes 24 `test_string_dtype_all_na[*-{min,max,first,last}-{True,False}-True-0]` entries (the `Series.groupby(df[\"a\"]).<op>()` parametrizations with `min_count=0`) that now produce the correct dtype on the first try. ## Relationship to other split PRs This was originally part of a larger #22289 covering string sum, bool any/all, min_count, and several dtype-preservation pieces. Per [the review request](#22289 (review)) this branch now contains only the `get_dtype_of_same_kind` change. The remaining work is split into: - #22369 — extension-type preservation in groupby reductions and identity-based grouping-key column exclusion - #22370 — string sum - #22371 — bool any/all - #22372 — min_count support Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

galipremsagar · 2026-05-12T15:11:41Z

@mroeschke This one is ready for review.

galipremsagar · 2026-05-12T16:29:23Z

/okay to test 48c4ccd

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Matthew Roeschke <10647082+mroeschke@users.noreply.github.com>
Co-authored-by: Mads R. B. Kristensen <madsbk@gmail.com>
Co-authored-by: Matthew Murray <41342305+Matt711@users.noreply.github.com>
Co-authored-by: Vukasin Milovanovic <vmilovanovic@nvidia.com>
Co-authored-by: David Wendt <45795991+davidwendt@users.noreply.github.com>
Co-authored-by: Yunsong Wang <12716979+PointKernel@users.noreply.github.com>
Co-authored-by: Richard (Rick) Zamora <rzamora217@gmail.com>
Co-authored-by: Kyle Edwards <kyedwards@nvidia.com>
Co-authored-by: Bradley Dice <bdice@bradleydice.com>
Co-authored-by: Paul Taylor <178183+trxcllnt@users.noreply.github.com>
Co-authored-by: Vyas Ramasubramani <vyasr@nvidia.com>
Co-authored-by: Muhammad Haseeb <14217455+mhaseeb123@users.noreply.github.com>

galipremsagar requested a review from a team as a code owner May 4, 2026 20:05

galipremsagar requested review from TomAugspurger and brandon-b-miller and removed request for a team May 4, 2026 20:05

github-actions Bot assigned galipremsagar May 4, 2026

github-actions Bot added Python Affects Python cuDF API. cudf.pandas Issues specific to cudf.pandas labels May 4, 2026

github-project-automation Bot added this to cuDF Python May 4, 2026

galipremsagar mentioned this pull request May 4, 2026

Preserve StringDtype storage and na_value in get_dtype_of_same_kind #22289

Merged

GPUtester moved this to In Progress in cuDF Python May 4, 2026

galipremsagar added bug Something isn't working non-breaking Non-breaking change labels May 4, 2026

galipremsagar requested a review from mroeschke May 4, 2026 20:33

mroeschke reviewed May 5, 2026


galipremsagar and others added 13 commits May 6, 2026 15:49

Apply suggestions from code review

8992d39

Co-authored-by: Matthew Roeschke <10647082+mroeschke@users.noreply.github.com>

vyasr and others added 8 commits May 8, 2026 02:49

Address reviews

7a120b7

galipremsagar requested a review from a team as a code owner May 8, 2026 02:55

github-actions Bot added Java Affects Java cuDF API. cudf-polars Issues specific to cudf-polars pylibcudf Issues specific to the pylibcudf package labels May 8, 2026

Merge branch 'pandas3' into groupby_bool_reduce

4dcb025

galipremsagar requested review from mroeschke and removed request for a team May 8, 2026 02:56

galipremsagar removed libcudf Affects libcudf (C++/CUDA) code. CMake CMake build issue Java Affects Java cuDF API. labels May 8, 2026

Merge

48c4ccd

mroeschke approved these changes May 13, 2026


galipremsagar added 5 - Ready to Merge Testing and reviews complete, ready to merge and removed 3 - Ready for Review Ready for review by team labels May 13, 2026

Merge

3d9f864

galipremsagar merged commit 0c1b66a into rapidsai:pandas3 May 13, 2026
6 of 8 checks passed

github-project-automation Bot moved this from In Progress to Done in cuDF Python May 13, 2026

GPUtester moved this from Done to In Progress in cuDF Python May 13, 2026

galipremsagar merged 41 commits into rapidsai:pandas3 from galipremsagar:groupby_bool_reduce


14 participants
