PERF: short-circuit sentinel scans on integer indexers by jbrockmendel · Pull Request #65298 · pandas-dev/pandas

jbrockmendel · 2026-04-19T15:58:16Z

Summary

Add `lib.has_sentinel(arr, sentinel)`, a short-circuiting Cython helper for the common `(arr == sentinel).any()` pattern on integer indexers, with an 8x-unrolled inner loop over a fused int8/16/32/64 memoryview.
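To make the control flow concrete, here is a minimal pure-Python sketch of a short-circuiting scan with an 8x-unrolled main loop and a tail loop. It is an illustration only: the name `has_sentinel_sketch` is hypothetical, `array.array` stands in for the typed memoryview, and none of this is the Cython code in the PR.

```python
from array import array


def has_sentinel_sketch(values, sentinel):
    """Return True as soon as any element equals `sentinel`.

    Hypothetical pure-Python illustration, not the PR's Cython helper.
    """
    n = len(values)
    main_end = n - (n % 8)  # largest multiple of 8 not exceeding n

    # Main loop: eight comparisons per iteration, returning on the first hit.
    i = 0
    while i < main_end:
        if (
            values[i] == sentinel
            or values[i + 1] == sentinel
            or values[i + 2] == sentinel
            or values[i + 3] == sentinel
            or values[i + 4] == sentinel
            or values[i + 5] == sentinel
            or values[i + 6] == sentinel
            or values[i + 7] == sentinel
        ):
            return True
        i += 8

    # Tail loop: the remaining 0 to 7 elements.
    for j in range(main_end, n):
        if values[j] == sentinel:
            return True
    return False


indexer = array("q", [0, 3, 1, -1, 2, 5, 4, 7, 6, 8])  # int64 buffer with one -1
print(has_sentinel_sketch(indexer, -1))                 # True
print(has_sentinel_sketch(array("q", range(20)), -1))  # False
```

In Python the unrolling buys nothing; it is shown only to mirror the loop structure described above, where the benefit comes from fewer loop-control branches per element in compiled code.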

Wire up the clearest (indexer == -1).any() call sites:

- `_MergeOperation._maybe_add_join_keys` (non-inner merges)
- `_Unstacker.new_index` (single-level unstack)
- `sorting.get_group_index` / `decons_obs_group_ids` (groupby key lifting)
- `MultiIndex._get_indexer_strict` NaN-key path
- `DataFrame.__setitem__` non-unique columns path
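For readers unfamiliar with the pattern being replaced, the snippet below shows where a `-1` sentinel comes from and how the existing check reads. It uses only the public `Index.get_indexer` API; the replacement call is mentioned in a comment because `lib.has_sentinel` exists only in this PR.

```python
import pandas as pd

index = pd.Index(["a", "b", "c"])
indexer = index.get_indexer(["a", "z", "c"])  # "z" is absent, so its slot is -1

# Existing pattern: builds a boolean array as long as the indexer, then reduces it.
any_missing = bool((indexer == -1).any())

# Per the PR description, the wired-up call sites would instead call
# lib.has_sentinel(indexer, -1), which can return at the first -1.
print(indexer.tolist(), any_missing)  # [0, -1, 2] True
```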

The idxmin/idxmax sites were intentionally left alone — they can receive an ExtensionArray (e.g. int64[pyarrow]) rather than a numpy array, and the fused memoryview helper doesn't accept those.

Benchmarks

Best-of-9 repeats × ~200 iters each, ARM64 (Apple clang). Full 39-case sweep; significant signals only (|z|>2 and |Δmean|>3%):

| Case | Δbest | Δmean |
|---|---|---|
| `DataFrame.__setitem__` non-unique cols | −5% | −30% |
| `groupby.count` n=100K | −22% | −16% |
| `unstack` sparse 100×500 (gaps) | −21% | −14% |
| `groupby.sum` n=10K | −12% | −13% |
| `groupby.sum` n=100K | −12% | −13% |
| `groupby.sum` 2 keys n=10K | −13% | −10% |
| `crosstab` n=100K | +1% | −7% |
| `groupby.sum` n=1M | −5% | −6% |
| `unstack` dense int16 codes 500×500 | −5% | −4% |
| `crosstab` n=10K | −2% | −4% |
| `MultiIndex.loc` NaN key | −2% | −3% |
| `merge(how='outer')` 50% overlap n=100K | +1% | +2% |
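The significance filter quoted above (|z|>2 and |Δmean|>3%) can be sketched as follows. The PR does not state how z is computed, so a Welch-style statistic on the per-repeat timings is assumed here; the function names are hypothetical and the timings in the usage lines are made-up numbers, not measurements from the PR.

```python
import math
import statistics


def compare_runs(base, new):
    """Compare per-repeat timings (seconds) for baseline and patched builds.

    Returns (delta_best, delta_mean, z). The z formula is an assumption.
    """
    delta_best = min(new) / min(base) - 1.0
    mean_base, mean_new = statistics.mean(base), statistics.mean(new)
    delta_mean = mean_new / mean_base - 1.0
    # Standard error of the difference between the two sample means.
    se = math.sqrt(
        statistics.variance(base) / len(base) + statistics.variance(new) / len(new)
    )
    z = (mean_new - mean_base) / se if se > 0 else 0.0
    return delta_best, delta_mean, z


def is_significant(delta_mean, z):
    # Report a case only if both thresholds are exceeded.
    return abs(z) > 2 and abs(delta_mean) > 0.03


# Synthetic timings for nine repeats each (illustrative only).
base = [1.00, 1.02, 0.99, 1.01, 1.00, 1.03, 0.98, 1.01, 1.00]
new = [0.88, 0.90, 0.87, 0.89, 0.88, 0.91, 0.86, 0.89, 0.88]
delta_best, delta_mean, z = compare_runs(base, new)
print(f"{delta_best:+.1%} {delta_mean:+.1%} significant={is_significant(delta_mean, z)}")
```

Reporting both Δbest and Δmean is useful because the best repeat approximates the noise-free cost while the mean reflects typical behavior; the two can disagree, as in the `crosstab` n=100K row.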

Only one significant regression: merge(how='outer') 50% overlap n=100K ≈ +2%. Root cause is the known mid-size SIMD gap — numpy's (arr == -1).any() does 2 int64 lanes/cycle; our 8-wide scalar unroll can't match when the array fits in L2 and the first sentinel isn't near the start. Disappears at 1M+ (memory-bound) and small sizes (Python overhead dominates).

Test plan

- `pandas/tests/libs/test_lib.py` + targeted correctness tests across int8/16/32/64, including all tail-positioning edge cases
- Full regression sweep across `frame/`, `series/`, `indexing/`, `reshape/`, `indexes/`, `libs/`, `groupby/` (~78K tests)
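One way the tail-positioning cases could be exercised is sketched below: for each integer dtype and each length around the unroll width, the sentinel is placed at every position and the result is compared with `(arr == sentinel).any()`. This is not the test code from the PR; `check_tail_positions` and `naive_scan` are hypothetical names, and `naive_scan` is only a stand-in for the function under test.

```python
import numpy as np


def check_tail_positions(scan, max_len=24):
    """Check `scan(arr, sentinel)` against `(arr == sentinel).any()`.

    For every dtype and every length up to `max_len`, the sentinel is placed
    at each position in turn, plus one case with no sentinel at all.
    Returns the number of cases checked.
    """
    checked = 0
    for dtype in (np.int8, np.int16, np.int32, np.int64):
        for n in range(max_len + 1):
            clean = np.arange(n, dtype=dtype)  # values 0..n-1, never -1
            assert scan(clean, -1) == bool((clean == -1).any())
            checked += 1
            for pos in range(n):
                arr = clean.copy()
                arr[pos] = -1
                assert scan(arr, -1) == bool((arr == -1).any())
                checked += 1
    return checked


def naive_scan(arr, sentinel):
    # Stand-in for the scan under test; replace with the real helper.
    return sentinel in arr.tolist()


print(check_tail_positions(naive_scan))  # 1300
```

Lengths 0 through 24 cover an empty array, arrays shorter than one unrolled block, exact multiples of 8, and every possible tail length, which is where an off-by-one in an unrolled loop would show up.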

Notes

- Kept the existing `(ilocs < 0).any()` form where values can also be < -1, and the `(indices == -1).any()` form on EA-typed `res._values` paths.
- The `.all()` analog would require a companion `all_sentinel` helper; could follow up if useful, but the majority cluster in the codebase is `.any()`.

🤖 Generated with Claude Code

Add `lib.has_sentinel(arr, sentinel)`, a short-circuiting Cython helper for the common `(arr == sentinel).any()` pattern on integer indexers, with an 8x-unrolled inner loop over a fused int8/16/32/64 memoryview.

Wire up the clearest `(indexer == -1).any()` call sites:

- `merge._MergeOperation._maybe_add_join_keys` (outer/left/right merges)
- `reshape._Unstacker.new_index` (single-level unstack)
- `sorting.get_group_index` / `decons_obs_group_ids` (groupby)
- `MultiIndex._get_indexer_strict` NaN-key path
- `DataFrame.__setitem__` non-unique columns path

User-visible impact (best-of-9 repeats × ~200 iters, ARM64):

- groupby.sum / groupby.count at n >= 10K: -5% to -16%
- DataFrame[cols] = value with non-unique columns: ~-30%
- unstack with NaN-introducing gaps (100x500): -14%
- crosstab: -4% to -7%
- multi_loc_nan_key: -3%
- merge outer with no overlap (short-circuit fires immediately): -3% to -5%

One known regression: merge(how='outer') with ~50% overlap and n ~100K sees ~+2% because the scan (length ~150K, first sentinel near the middle) fits in L2 cache, where numpy's SIMD (arr == -1).any() beats our scalar unroll. Unchanged at 1M+ (memory-bound) and small sizes (Python overhead dominates).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

jbrockmendel added the Performance (Memory or execution speed performance) label Apr 19, 2026

jbrockmendel wants to merge 1 commit into pandas-dev:main from jbrockmendel:perf-has-sentinel
