Implement groupby all/any via bool-coercion + min/max#22371
Merged
galipremsagar merged 41 commits on May 13, 2026
Conversation
Both methods previously raised ``NotImplementedError``. Reduce ``all``/``any`` to ``min``/``max`` on a bool-coerced copy of the value columns:

- Strings coerce as ``count_characters > 0`` so empty strings become ``False`` and nulls remain null (preserving them through the agg).
- Numerics coerce as ``!= 0`` with the same null preservation.
- ``skipna=False`` replaces nulls with ``True`` before the aggregation so that nulls don't flip ``all`` to ``False`` and trivially make ``any`` ``True``.
- Empty groups (all-NA values, skipna=True) yield NA from min/max; pandas treats those as vacuously ``True`` for ``all`` and ``False`` for ``any``, so the result is filled accordingly.
- ``min_count`` masks groups whose non-null count is below the threshold.

Conftest update for ``test_string_dtype_all_na[*-all-*]`` and ``[*-any-*]`` (32 entries). The string-key DataFrame cases additionally rely on identity-based grouping-key column exclusion, which lands in a sibling PR; both must merge before the entries can be removed without xpassing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
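The coercion and fill rules above can be modeled in pure Python. This is a hypothetical sketch of the semantics (with `None` standing in for nulls), not the cudf implementation, which operates on whole columns:

```python
# Hypothetical model of the all/any-to-min/max reduction described above.
# Not cudf code: cudf applies these rules columnwise per group.

def bool_reduce(values, op, skipna=True):
    # Strings coerce as len > 0, numerics as != 0; None stays null.
    coerced = [None if v is None
               else (len(v) > 0 if isinstance(v, str) else v != 0)
               for v in values]
    if not skipna:
        # Fill nulls with True so they don't flip ``all`` to False
        # and trivially make ``any`` True.
        coerced = [True if v is None else v for v in coerced]
    non_null = [v for v in coerced if v is not None]
    if not non_null:
        # Empty group: vacuously True for ``all``, False for ``any``.
        return op == "all"
    agg = min if op == "all" else max  # over bools, min == all, max == any
    return bool(agg(non_null))

assert bool_reduce(["a", "", "b"], "all") is False  # "" coerces to False
assert bool_reduce([0, None, 2], "any") is True     # null skipped
assert bool_reduce([None, None], "all") is True     # vacuous truth
```

The equivalence `min == all` / `max == any` holds because `False < True` in Python's (and libcudf's) boolean ordering.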
Contributor
Author
/okay to test b288bbc
mroeschke
reviewed
May 5, 2026
# Empty groups (skipna=True with all-NA values) yield NA from
# min/max — pandas treats these as ``True`` for ``all`` and
# ``False`` for ``any``.
bool_np = np.dtype(np.bool_)
Contributor
There was a problem hiding this comment.
Just confirming: is `np.dtype(np.bool_)` returned regardless of the pandas string type?
Contributor
Author
There was a problem hiding this comment.
Yes:
In [9]: df = pd.DataFrame({
...: "k": [1, 1, 2, 2],
...: "s": pd.array(["a", "b", pd.NA, "c"], dtype="string"),
...: })
In [10]: df
Out[10]:
k s
0 1 a
1 1 b
2 2 <NA>
3 2 c
In [11]: df.groupby("k").all()
Out[11]:
s
k
1 True
2 True
In [12]: df.groupby("k").all().dtypes
Out[12]:
s bool
dtype: objectCo-authored-by: Matthew Roeschke <10647082+mroeschke@users.noreply.github.com>
…ai#22295)

## Description

In pandas-compatible mode, reject casting nullable string columns that use `pd.NA` as their missing-value sentinel to numpy `object` dtype.

This came from a pandas 3 compatibility issue in `cudf.pandas`: pandas preserves `pd.NA` when `StringDtype(na_value=pd.NA)` is cast to `object`, while cuDF's string-to-object path materializes nulls as Python `None`. Preserving that sentinel would require carrying source dtype metadata after the result has become plain `object`, which the review pointed out is not a good fit for the current column model. Instead, when `mode.pandas_compatible` is enabled, this PR now raises in `StringColumn.as_string_column` for:

- `pd.StringDtype(..., na_value=pd.NA)` -> `object`
- string `pd.ArrowDtype` -> `object`

Outside pandas-compatible mode, the existing string-to-object cast behavior is unchanged. String dtypes that use `np.nan` as their missing-value sentinel and ordinary object string columns also keep the existing behavior.

## Changes

- Add an explicit pandas-compatible-mode `NotImplementedError` for nullable `pd.NA` string-to-object casts in `python/cudf/cudf/core/column/string.py`.
- Add focused coverage in `python/cudf/cudf/tests/series/methods/test_astype.py` for both pandas-compatible and non-pandas-compatible behavior.
- Remove the previous per-instance `_PANDAS_NA_VALUE` override path.

## Checklist

- [x] I am familiar with the [Contributing Guidelines](https://github.com/rapidsai/cudf/blob/HEAD/CONTRIBUTING.md).
- [x] New or existing tests cover these changes.
- [x] The documentation is up to date with these changes.
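A hedged sketch of the guard's shape (the function name and string sentinels below are invented for illustration; the real check lives in `StringColumn.as_string_column` and inspects actual dtype objects):

```python
# Illustrative only: models "raise in pandas-compatible mode for
# pd.NA-backed string -> object casts, leave every other path alone".

def check_object_cast(na_sentinel: str, pandas_compatible: bool) -> None:
    if pandas_compatible and na_sentinel == "pd.NA":
        raise NotImplementedError(
            "casting a pd.NA-backed string column to object is not "
            "supported in pandas-compatible mode"
        )

# np.nan-backed strings keep the existing cast behavior in both modes.
check_object_cast("np.nan", pandas_compatible=True)

rejected = False
try:
    check_object_cast("pd.NA", pandas_compatible=True)
except NotImplementedError:
    rejected = True
assert rejected
```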
Drops the legacy `Cluster.DISTRIBUTED` cluster and the entire `rapidsmpf.integrations.dask` execution path. The new `DaskEngine` (`Cluster.DASK`) is unaffected. Note: all removed components were under `experimental`, so no deprecation period is required.

**What’s removed**

* `Cluster.DISTRIBUTED` enum value and all dispatch paths (`rapidsmpf/core.py`, `parallel.py:get_scheduler`)
* `experimental/dask_registers.py`, `experimental/spilling.py`, `experimental/rapidsmpf/dask.py`
* `rapidsmpf_distributed_available()`, `StreamingExecutor.rapidsmpf_spill`, and `cluster_kind` plumbing in `shuffle.py` and `sort.py`
* Legacy benchmark harness (`benchmarks/utils_legacy.py`) and the `utils.py` dispatch shim
* Legacy test suite (`tests/experimental/legacy/`) and Dask registration test files

**What stays**

* `Cluster.DASK` / `DaskEngine` (`frontend/dask.py`), the supported Dask backend
* `Cluster.SINGLE`, `SPMD`, and `RAY` streaming frontends
* The task-graph backend (`Runtime.TASKS`)

Authors:
- Mads R. B. Kristensen (https://github.com/madsbk)

Approvers:
- Peter Andreas Entschev (https://github.com/pentschev)
- Matthew Murray (https://github.com/Matt711)
- Bradley Dice (https://github.com/bdice)

URL: rapidsai#22358
…identity (rapidsai#22366) Uses node directly as the dict key instead of `id(node)`, so nodes reconstructed on workers (introduced in rapidsai#22287) are found correctly by value rather than failing with a `KeyError`. Authors: - Matthew Murray (https://github.com/Matt711) Approvers: - Tom Augspurger (https://github.com/TomAugspurger) - Mads R. B. Kristensen (https://github.com/madsbk) URL: rapidsai#22366
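A minimal illustration of the difference, using a hypothetical `Node` class rather than the actual cudf-polars IR nodes: keying by `id(node)` fails once an equal node is reconstructed elsewhere, while keying by the node itself succeeds via hash/equality.

```python
from dataclasses import dataclass

@dataclass(frozen=True)  # frozen dataclass: hashable, compares by value
class Node:              # hypothetical stand-in for an IR node
    name: str

a = Node("scan")
cache = {id(a): "result"}        # identity-based key (the old behavior)
b = Node("scan")                 # equal node "reconstructed" on a worker
assert id(b) not in cache        # identity lookup misses -> KeyError path

cache = {a: "result"}            # value-based key (the fix)
assert b in cache                # found by __hash__/__eq__
assert cache[b] == "result"
```

This works because IR nodes are hashable and compare structurally, so two reconstructions of the same node are interchangeable as dict keys.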
…dsai#22344) Pass the managed-pool MR directly into each `cudf::datagen::generate_*` call instead of swapping it in as the current device resource and restoring on exit. Also fixes forwarding of the mr parameter down the datagen stack. There are still a few tiny allocations (KBs) that use the default mr because switching would require a copy. These should not cause OOM errors. Authors: - Vukasin Milovanovic (https://github.com/vuule) Approvers: - Bradley Dice (https://github.com/bdice) - Tianyu Liu (https://github.com/kingcrimsontianyu) URL: rapidsai#22344
Fixes some compile warnings in the libcudf examples. These are deprecation warnings about the missing alignment argument for the custom allocators in the `hybrid_scan_io` and `parquet_io` examples.
```
/cudf/cpp/examples/parquet_io/io_source.hpp:61:66: warning: 'void cuda::mr::__4::__ibasic_async_resource< <template-parameter-1-1> >::deallocate(cuda::__4::stream_ref, void*, size_t) [with <template-parameter-1-1> = {cuda::__4::__ireference<cuda::__4::__iset_<cuda::mr::__4::__ibasic_async_resource<>, cuda::mr::__4::__ibasic_resource<>, cuda::mr::__4::__with_property<cuda::mr::__4::dynamic_accessibility_property>::__iproperty<>, cuda::mr::__4::__with_property<cuda::mr::__4::host_accessible>::__iproperty<>, cuda::__4::__icopyable<>, cuda::__4::__iequality_comparable<> > >}; size_t = long unsigned int]' is deprecated: Specify an explicit alignment argument. The default alignment will be removed in a future release. [-Wdeprecated-declarations]
61 | void deallocate(T* ptr, std::size_t n) noexcept { mr.deallocate(stream, ptr, n * sizeof(T)); }
```
Authors:
- David Wendt (https://github.com/davidwendt)
Approvers:
- Yunsong Wang (https://github.com/PointKernel)
- Bradley Dice (https://github.com/bdice)
URL: rapidsai#22335
This PR updates the join benchmarks to include a skip axis, allowing users to optionally include large table sizes, which is not possible in the current setup due to its unconditional skip of those sizes. Authors: - Yunsong Wang (https://github.com/PointKernel) Approvers: - Vyas Ramasubramani (https://github.com/vyasr) - Shruti Shivakumar (https://github.com/shrshi) URL: rapidsai#22241
## Summary
Pandas' `BaseMaskedDtype` defines `__from_arrow__` for converting a
`pyarrow.Array` (including `NullArray`/`ChunkedArray` of nulls) into the
matching `BaseMaskedArray`. The cudf.pandas final proxy types for
`BooleanDtype`, `Int{8,16,32,64}Dtype`, `UInt{8,16,32,64}Dtype`, and
`Float{32,64}Dtype` did not list `__from_arrow__` in their
`additional_attributes`, so the proxy `__getattr__` raised
`AttributeError` even though the slow object has it.
## Change
Add `"__from_arrow__": _FastSlowAttribute("__from_arrow__")` to all
eleven masked dtype proxy declarations in
`python/cudf/cudf/pandas/_wrappers/pandas.py`, mirroring the existing
pattern on `ArrowDtype`.
## Tests / Conftest
Removes 25 entries from `conftest-patch.py` that were xfailed only
because of the missing attribute:
- 22 parametrizations of
`tests/arrays/masked/test_arrow_compat.py::test_from_arrow_null` (all
four masked dtype families × two arrow array shapes).
-
`tests/arrays/masked/test_arrow_compat.py::test_arrow_from_arrow_uint`.
-
`tests/arrays/masked/test_arrow_compat.py::test_dataframe_from_arrow_types_mapper`.
-
`tests/indexes/multi/test_constructors.py::test_from_frame_missing_values_multiIndex`.
All 22 `test_from_arrow_null` cases pass, the full
`test_arrow_compat.py` file passes (69 passed, 22 unrelated xfails), and
the cudf-side `cudf_pandas_tests/` suite is clean (435 passed).
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
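The failure mode can be reproduced with a toy version of the fast/slow proxy pattern (hypothetical classes, not the actual cudf.pandas machinery): a name is only forwarded to the slow object if it appears in `additional_attributes`.

```python
# Toy model of the proxy-attribute forwarding bug and fix. All names
# are invented for illustration.

class SlowDtype:
    """Stands in for a pandas BaseMaskedDtype with __from_arrow__."""
    def __from_arrow__(self, arr):
        return list(arr)

class ProxyDtype:
    # The fix: list the dunder so __getattr__ forwards it.
    additional_attributes = {"__from_arrow__"}

    def __init__(self, slow):
        self._slow = slow

    def __getattr__(self, name):
        if name in type(self).additional_attributes:
            return getattr(self._slow, name)
        raise AttributeError(name)  # the pre-fix behavior for __from_arrow__

p = ProxyDtype(SlowDtype())
assert p.__from_arrow__([1, None]) == [1, None]  # forwarded after the fix

forwarded = True
try:
    p.not_listed  # any name missing from additional_attributes still raises
except AttributeError:
    forwarded = False
assert not forwarded
```

Explicit attribute access like `dtype.__from_arrow__(arr)` goes through `__getattr__`, which is why listing the name is sufficient here.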
## Description

In pandas, empty datetime inputs default to `s` resolution; this PR fixes that inconsistency and matches cuDF with pandas. It also fixes `freq` preservation in `GroupBy.size`.

## Checklist

- [x] I am familiar with the [Contributing Guidelines](https://github.com/rapidsai/cudf/blob/HEAD/CONTRIBUTING.md).
- [x] New or existing tests cover these changes.
- [x] The documentation is up to date with these changes.
…thodProxy (rapidsai#22374)

## Summary

Three pandas-tests xfail entries surfaced `AttributeError` failures that were just missing entries in the proxy `additional_attributes` (or on `_MethodProxy` itself).

## Changes

- **`IntervalArray` proxy** now exposes `_left` and `_right` (private), matching the existing `_data`/`_mask` plumbing. Fixes `test_series_from_temporary_intervalindex_readonly_data`.
- **`Styler` proxy** now exposes `_compute`, `_display_funcs_column_names`, and `_display_funcs_index_names` (all private). Fixes `test_format_index_names_clear[_display_funcs_column_names-kwargs1]` and `[_display_funcs_index_names-kwargs0]`.
- **`_MethodProxy`** now exposes `__func__` (forwarded to the slow underlying method), mirroring the existing `__name__` and `__doc__` properties. This is required for callers that introspect classmethod descriptors via `type(x).method.__func__`.

## Conftest

Removed three `NODEIDS_THAT_FAIL` entries whose underlying tests now pass.

## Notes on remaining `AttributeError` xfails

Audited the remaining 17 `AttributeError` xfail entries; they fall into a few buckets that need deeper changes (out of scope for this PR):

- **Slow-side `isinstance` failures** (`Styler._compute`, `'DataFrame'/'SubclassedDataFrame' object has no attribute 'dtype'`): the slow-side function's `__globals__` was bound at import time before the proxy classes were installed, so `isinstance(proxy_df, real_DataFrame)` is `False` inside the slow module. Needs a different mechanism than `additional_attributes`.
- **Mixed-type Series limitations** (`quantile_box`, `quantile_box_nat`, `quantile_date_range`, `quantile_ea_scalar`): cuDF documents that it returns a `DataFrame` instead of a `Series` when the result would be mixed-type; the proxy preserves that type, breaking downstream `assert_series_equal`.
- **`.values` returning ndarray for nullable dtypes** (`test_construct_from_dict_ea_series`): pure pandas returns `IntegerArray`; cuDF returns `ndarray`.
- **Other one-offs** (`SparseArray.reshape`, abstract `_from_sequence_of_strings`, custom accessor `xyz`, loc setitem datetime parsing, `_fsproxy_slow` proxy-conversion failure): each needs its own targeted fix.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Fix bugs that appear when running with `num_ranks > 1`, where client-side `pl.concat(per_rank_outputs)` exposes assumptions that do not hold under single-rank execution. These were all discovered while working on multi-rank tests. **NB:** Please take a close look during review, as I’m still a bit unfamiliar with the IR part of cudf-polars. Authors: - Mads R. B. Kristensen (https://github.com/madsbk) Approvers: - Matthew Murray (https://github.com/Matt711) - Lawrence Mitchell (https://github.com/wence-) URL: rapidsai#22361
Fixes a regression in rapidsai#22237 where reading a CSV larger than the internal 64 MiB chunk size dropped all rows past the first chunk. The root cause is a misuse of a clamped value to determine the EOF state. This PR fixes the EOF transition so it only happens in the last chunk. Also adds a large-input test; all previous CSV tests were below the chunk threshold.

Authors:
- Vukasin Milovanovic (https://github.com/vuule)

Approvers:
- Yunsong Wang (https://github.com/PointKernel)
- Basit Ayantunde (https://github.com/lamarrr)
- Bradley Dice (https://github.com/bdice)

URL: rapidsai#22375
Closes rapidsai#22154 This PR adds decimal128 values to the groupby_max_cardinality benchmark. Authors: - Yunsong Wang (https://github.com/PointKernel) Approvers: - Muhammad Haseeb (https://github.com/mhaseeb123) - David Wendt (https://github.com/davidwendt) URL: rapidsai#22162
…apidsai#22384) The `cudf-polars-ir-signatures` pre-commit hook uses `language: python` but is just a local script (`./ci/check_cudf_polars_ir.py`) that only depends on stdlib modules (`ast`, `argparse`, `sys`, `typing`) and has a `#!/usr/bin/env python3` shebang. With `language: python`, pre-commit unnecessarily creates a virtualenv for this hook. `language: script` is the correct setting — it runs the entry point directly as an executable, relying on the shebang for interpreter selection, with no virtualenv overhead. Authors: - Vyas Ramasubramani (https://github.com/vyasr) Approvers: - James Lamb (https://github.com/jameslamb) URL: rapidsai#22384
This PR fixes a potential infinite loop in the parquet page header count/decode kernels in case of malformed input.

Authors:
- Muhammad Haseeb (https://github.com/mhaseeb123)

Approvers:
- Vukasin Milovanovic (https://github.com/vuule)
- Paul Mattione (https://github.com/pmattione-nvidia)

URL: rapidsai#22274
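The general shape of such a fix can be sketched in pure Python (this is an illustrative model of bounding a header-decode loop, not the actual CUDA kernel code; all names and the byte layout are invented):

```python
# Toy header walk: each "header" starts with a size byte. Malformed input
# (e.g. a zero size that never advances the cursor) must not spin forever.

def count_headers(buf: bytes, max_iters: int) -> int:
    pos, count = 0, 0
    for _ in range(max_iters):   # hard iteration bound instead of while True
        if pos >= len(buf):
            return count
        size = buf[pos]
        if size == 0:            # zero-length header would never advance pos;
            break                # bail out instead of looping forever
        pos += size
        count += 1
    return count

assert count_headers(bytes([2, 0, 3, 0, 0, 1, 0]), 10) == 3
assert count_headers(bytes([0, 0, 0]), 10) == 0  # malformed: terminates
```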
…rapidsai#22281)

closes rapidsai#21466
closes rapidsai#21767

Waiting for rapidsai#22212

* Makes rapidsmpf a required dependency of cudf_polars
* Removes the following `StreamingExecutor` options, which were "experimental", along with their associated code paths:
  * `StreamingExecutor.runtime`
  * `StreamingExecutor.shuffle_method`
  * `StreamingExecutor.unique_fraction`
  * `StreamingExecutor.groupby_n_ary`
  * `StreamingExecutor.rapidsmpf_spill`
* Removes the task runtime and associated tests
* Some tests were modified to exercise only one specific configuration (because of rapidsai#22346) so they pass for now; we plan to revisit this once rapidsmpf becomes the default.

Ops-Bot-Merge-Barrier: true

Authors:
- Matthew Roeschke (https://github.com/mroeschke)

Approvers:
- Mads R. B. Kristensen (https://github.com/madsbk)
- Bradley Dice (https://github.com/bdice)
- Matthew Murray (https://github.com/Matt711)
- Lawrence Mitchell (https://github.com/wence-)

URL: rapidsai#22281
This PR uses the host worker pool to submit hybrid scan's host-read IO tasks so that the mutex can be safely released after submission. Authors: - Muhammad Haseeb (https://github.com/mhaseeb123) Approvers: - Tianyu Liu (https://github.com/kingcrimsontianyu) - Shruti Shivakumar (https://github.com/shrshi) URL: rapidsai#21992
…#22145) Follow up rapidsai#22144 Adds Python bindings for the `cudf::apply_deletion_mask` API and adds pytests for stream compaction. Authors: - Muhammad Haseeb (https://github.com/mhaseeb123) - Matthew Murray (https://github.com/Matt711) Approvers: - Matthew Roeschke (https://github.com/mroeschke) - Bradley Dice (https://github.com/bdice) - Matthew Murray (https://github.com/Matt711) URL: rapidsai#22145
…sai#22350) - Follow up to rapidsai#22315 - Further revises `sort_actor` in preparation for rapidsai/rapidsmpf#853 - Part of rapidsai#22128 - Breaks apart `sort_actor` logic into modular steps, so we can avoid collecting boundaries when we already know the boundaries (future work). Authors: - Richard (Rick) Zamora (https://github.com/rjzamora) Approvers: - Matthew Murray (https://github.com/Matt711) - Matthew Roeschke (https://github.com/mroeschke) URL: rapidsai#22350
…apidsai#22381) Builds on the cached `streaming_engines` fixture from rapidsai#22364, which amortizes SPMD bootstrap via `_reset()`, and extends the same pattern to Dask and Ray. With this change, the test matrix runs against: `["in-memory", "spmd", "spmd-small", "dask", "ray"]` subject to package availability and `rrun` gating. We might change the different setups later, but for now CI runs: | Engine | Block Size(s) | GPU Configuration | |----------------|-----------------------|-------------------| | `SPMDEngine` | `"medium"`, `"small"` | Single GPU | | `DaskEngine` | `"medium"` | Single GPU | | `RayEngine` | `"medium"` | Two GPUs | Authors: - Mads R. B. Kristensen (https://github.com/madsbk) - Peter Andreas Entschev (https://github.com/pentschev) Approvers: - Matthew Murray (https://github.com/Matt711) - Bradley Dice (https://github.com/bdice) - Peter Andreas Entschev (https://github.com/pentschev) - Matthew Roeschke (https://github.com/mroeschke) URL: rapidsai#22381
galipremsagar
added a commit
that referenced
this pull request
May 11, 2026
…22289)

## Summary

`get_dtype_of_same_kind` was silently downgrading `StringDtype` results when the source and target were both `StringDtype` but with different storage/`na_value`. When the source had `na_value=np.nan`, it returned the bare target dtype; when the source had `pyarrow` storage, it converted to `large_string[pyarrow]` unless the source equaled the target exactly. This caused groupby `min`/`max`/`first`/`last` on `StringDtype` value columns to return the wrong dtype (e.g., `str[python]` would come back as `StringDtype(na_value=nan)` with pyarrow storage; `string[pyarrow]` would come back as `large_string[pyarrow]`).

## Change

If both the source and target dtypes are `pd.StringDtype`, return the source unchanged. This preserves storage and `na_value` for all four storage/`na_value` combinations.

## Tests

`test_groupby_string_min_max_preserves_dtype` covers `min`/`max`/`first`/`last` over the four `StringDtype` storage/`na_value` combinations and asserts that the result dtype matches pandas.

## Conftest

Removes 24 `test_string_dtype_all_na[*-{min,max,first,last}-{True,False}-True-0]` entries (the `Series.groupby(df["a"]).<op>()` parametrizations with `min_count=0`) that now produce the correct dtype on the first try.

## Relationship to other split PRs

This was originally part of a larger #22289 covering string sum, bool any/all, min_count, and several dtype-preservation pieces. Per [the review request](#22289 (review)) this branch now contains only the `get_dtype_of_same_kind` change. The remaining work is split into:

- #22369 — extension-type preservation in groupby reductions and identity-based grouping-key column exclusion
- #22370 — string sum
- #22371 — bool any/all
- #22372 — min_count support

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
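The fixed rule is small enough to sketch with a stand-in dtype (a toy `StringDtype` dataclass, not pandas' actual class): when both source and target are string dtypes, the source wins, so storage and `na_value` survive the round trip.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class StringDtype:
    """Toy stand-in for pd.StringDtype: just storage + na_value."""
    storage: str   # "python" or "pyarrow"
    na_value: str  # "nan" or "NA"

def get_dtype_of_same_kind(source, target):
    # The fix: both string dtypes -> keep the source unchanged, rather
    # than downgrading to the bare target (or large_string[pyarrow]).
    if isinstance(source, StringDtype) and isinstance(target, StringDtype):
        return source
    return target

src = StringDtype("python", "nan")
tgt = StringDtype("pyarrow", "NA")
assert get_dtype_of_same_kind(src, tgt) == src   # storage/na_value preserved
```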
Contributor
Author
@mroeschke This one is ready for review.
Contributor
Author
/okay to test 48c4ccd
mroeschke
approved these changes
May 13, 2026
galipremsagar
added a commit
to galipremsagar/cudf
that referenced
this pull request
May 13, 2026
## Summary

Split out from rapidsai#22289.

`GroupBy.all` and `GroupBy.any` previously raised `NotImplementedError`. This PR implements them by reducing to `min`/`max` on a bool-coerced copy of the value columns.

## Implementation (`python/cudf/cudf/core/groupby/groupby.py`)

A new `_bool_reduce` helper:

- Coerces strings as `count_characters > 0` so empty strings become `False` and nulls remain null (preserved through the aggregation).
- Coerces numerics as `!= 0` with the same null preservation.
- For `skipna=False`, fills nulls with `True` before aggregation so they don't flip `all` to `False` and trivially make `any` `True`.
- Empty groups (skipna=True with all-NA values) yield NA from min/max; pandas treats those as vacuously `True` for `all` and `False` for `any`, so the result is filled accordingly.
- Applies `min_count` by counting per-group non-nulls and masking groups whose count is below the threshold.

The new GroupBy is constructed with `by=self.grouping` (passing the existing `_Grouping` object) so key columns match the bool-coerced value columns exactly, avoiding label-based lookup when the original key column was excluded.

## Tests

`python/cudf/cudf/tests/groupby/test_reductions.py`:

- `test_groupby_all_any` over bool/int/float data.
- `test_groupby_all_any_string` for string columns.
- `test_groupby_all_any_empty` for empty-group behavior.

## Conftest

Removes 32 `test_string_dtype_all_na[*-all-*]` and `[*-any-*]` entries.

## Relationship to rapidsai#22289

One of the four split PRs requested in [the review on rapidsai#22289](rapidsai#22289 (review)). The DataFrame-case `test_string_dtype_all_na[*-{all,any}-*]` parametrizations (`df.groupby(df["a"]).all()`) also rely on identity-based grouping-key column exclusion in rapidsai#22369; both must merge before the 32 conftest removals stop xpassing.
---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Matthew Roeschke <10647082+mroeschke@users.noreply.github.com>
Co-authored-by: Mads R. B. Kristensen <madsbk@gmail.com>
Co-authored-by: Matthew Murray <41342305+Matt711@users.noreply.github.com>
Co-authored-by: Vukasin Milovanovic <vmilovanovic@nvidia.com>
Co-authored-by: David Wendt <45795991+davidwendt@users.noreply.github.com>
Co-authored-by: Yunsong Wang <12716979+PointKernel@users.noreply.github.com>
Co-authored-by: Richard (Rick) Zamora <rzamora217@gmail.com>
Co-authored-by: Kyle Edwards <kyedwards@nvidia.com>
Co-authored-by: Bradley Dice <bdice@bradleydice.com>
Co-authored-by: Paul Taylor <178183+trxcllnt@users.noreply.github.com>
Co-authored-by: Vyas Ramasubramani <vyasr@nvidia.com>
Co-authored-by: Muhammad Haseeb <14217455+mhaseeb123@users.noreply.github.com>
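The `min_count` masking step described above reduces to a small post-processing pass over the aggregated results; a pure-Python sketch (dicts standing in for per-group results, not the cudf implementation):

```python
# Illustrative model of min_count masking: groups whose non-null count
# falls below the threshold get a null (None) result.

def apply_min_count(result, non_null_counts, min_count):
    return {key: (None if non_null_counts[key] < min_count else value)
            for key, value in result.items()}

result = {1: True, 2: False}     # per-group all/any output
counts = {1: 3, 2: 1}            # per-group non-null counts
assert apply_min_count(result, counts, 2) == {1: True, 2: None}
assert apply_min_count(result, counts, 0) == result  # min_count=0: no-op
```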
Summary

Split out from #22289.

`GroupBy.all` and `GroupBy.any` previously raised `NotImplementedError`. This PR implements them by reducing to `min`/`max` on a bool-coerced copy of the value columns.

Implementation (`python/cudf/cudf/core/groupby/groupby.py`)

A new `_bool_reduce` helper:

- Coerces strings as `count_characters > 0` so empty strings become `False` and nulls remain null (preserved through the aggregation).
- Coerces numerics as `!= 0` with the same null preservation.
- For `skipna=False`, fills nulls with `True` before aggregation so they don't flip `all` to `False` and trivially make `any` `True`.
- Empty groups yield NA from min/max; pandas treats those as vacuously `True` for `all` and `False` for `any`, so the result is filled accordingly.
- Applies `min_count` by counting per-group non-nulls and masking groups whose count is below the threshold.

The new GroupBy is constructed with `by=self.grouping` (passing the existing `_Grouping` object) so key columns match the bool-coerced value columns exactly, avoiding label-based lookup when the original key column was excluded.

Tests

`python/cudf/cudf/tests/groupby/test_reductions.py`:

- `test_groupby_all_any` over bool/int/float data.
- `test_groupby_all_any_string` for string columns.
- `test_groupby_all_any_empty` for empty-group behavior.

Conftest

Removes 32 `test_string_dtype_all_na[*-all-*]` and `[*-any-*]` entries.

Relationship to #22289

One of the four split PRs requested in the review on #22289. The DataFrame-case `test_string_dtype_all_na[*-{all,any}-*]` parametrizations (`df.groupby(df["a"]).all()`) also rely on identity-based grouping-key column exclusion in #22369; both must merge before the 32 conftest removals stop xpassing.