perf: short-circuit COO.reshape when -1 resolves to self.shape#935

Merged
hameerabbasi merged 2 commits into pydata:main from thodson-usgs:perf/reshape-resolve-neg1-before-shortcircuit
Apr 22, 2026

Conversation

Contributor

@thodson-usgs thodson-usgs commented Apr 21, 2026

Summary

Two small changes to COO.reshape:

  1. Resolve any -1 in the target shape before the self.shape == shape short-circuit, rather than after.
  2. Replace any(d == -1 for d in shape) with -1 in shape (C-level tuple containment).
-        if self.shape == shape:
-            return self
-        if any(d == -1 for d in shape):
+        if -1 in shape:
             extra = int(self.size / np.prod([d for d in shape if d != -1]))
             shape = tuple([d if d != -1 else extra for d in shape])
+        if self.shape == shape:
+            return self

Why

sparse.tensordot reshapes its operands to 2D with calls like a.reshape((-1, K)). When a is already 2D with trailing dim K, the target shape equals a.shape — but the equality check runs before -1 is resolved, so it doesn't match. The full reshape path (linear_loc(), coord rebuild, new COO allocation) then runs and produces a copy identical to the input.

Resolving -1 first lets the short-circuit catch this case and return self. The -1 in shape swap is a readability / micro-perf nit that matters here because unconditionally checking for -1 before the equality short-circuit (which the first change requires) would otherwise add ~200 ns to every reshape — including the exact-shape case that was already hitting the short-circuit.
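The resolution step itself is small; here is a sketch in plain Python (`resolve_neg1` is a hypothetical standalone helper mirroring the resolution logic in the diff above, not the library's actual method):

```python
import numpy as np

def resolve_neg1(shape, size):
    """Hypothetical helper: replace a single -1 with the dim implied by size."""
    if -1 in shape:
        extra = int(size / np.prod([d for d in shape if d != -1]))
        shape = tuple(d if d != -1 else extra for d in shape)
    return shape

# The hot tensordot pattern: for a 2D array with trailing dim K,
# (-1, K) resolves back to the original shape, so an equality check
# placed *after* resolution can return self.
print(resolve_neg1((-1, 300), 200 * 300))   # (200, 300)
print(resolve_neg1((200, 300), 200 * 300))  # no -1 present: unchanged
```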

Semantics

Strictly fewer copies:

  • Any call that previously returned self still does (exact-shape input has no -1, so the resolution step is a no-op and the equality check sees the same shape).
  • reshape((-1, ...)) that resolves to the current shape now also returns self. This is consistent with the documented behavior (test_reshape_same codifies s.reshape(s.shape) is s) and matches NumPy, which similarly does not promise a copy from ndarray.reshape when the target is equivalent.
  • Error paths are unchanged — the size-mismatch ValueError below still runs with the resolved shape.
  • -1 in shape is equivalent to any(d == -1 for d in shape) because shape is already a tuple of ints by this point.
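The NumPy comparison in the second bullet can be checked directly: for a contiguous array, ndarray.reshape with a target that resolves to the current shape hands back a view, not a copy, so callers relying on a fresh buffer were already on shaky ground. A minimal check:

```python
import numpy as np

a = np.zeros((200, 300))
b = a.reshape((-1, 300))  # (-1, 300) resolves to a.shape

# NumPy returns a view here: writes through b are visible in a.
b[0, 0] = 1.0
assert a[0, 0] == 1.0
assert b.base is a
```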

Minimal repro

Run with the project's pixi env (`pixi run -e test python <file>`), or standalone with `uv run --with sparse --with numpy python <file>`:

import time
import numpy as np
import sparse

# Moderately sparse matrix; shape picked so (-1, 300) resolves to self.shape
a = sparse.random((200, 300), density=0.02, random_state=0)
N = 20_000

t0 = time.perf_counter()
for _ in range(N):
    a.reshape((-1, 300))          # the hot tensordot pattern
dt_neg1 = time.perf_counter() - t0

t0 = time.perf_counter()
for _ in range(N):
    a.reshape(a.shape)            # already-short-circuited path for comparison
dt_exact = time.perf_counter() - t0

print(f"reshape((-1, 300)): {dt_neg1*1e6/N:6.1f} us/call")
print(f"reshape(a.shape):   {dt_exact*1e6/N:6.1f} us/call")

# Correctness: -1 resolution returning self is consistent with exact-shape
assert a.reshape((-1, 300)) is a
assert a.reshape(a.shape) is a

Measured on this script (macOS / Python 3.13, median of 3 runs):

             reshape((-1, 300))   reshape(a.shape)
main         22.7 μs/call         0.2 μs/call
this PR       2.3 μs/call         0.2 μs/call

~10× faster on the (-1, K) pattern, exact-shape path unchanged. All 6050 existing numba-backend tests pass.

COO.reshape returns self when self.shape equals the requested shape,
but only checks before resolving any -1 in the target. sparse.tensordot
passes shapes like (-1, N) even for 2D x 2D matmul that doesn't actually
change shape, so the short-circuit never fires and a full reshape runs
(linear_loc + coord rebuild + new COO allocation).

Moving the -1 resolution before the equality check avoids that work for
callers that pass a -1 factorization of the current shape. Behavior is
a strict subset ("fewer copies"): any reshape that already returned
self before still does, and reshape((-1, ...)) that resolves to the
current shape now also returns self, matching the documented contract.

Measured ~16% speedup on a warm conservative-regrid loop (xarray-regrid
ConservativeRegridder.regrid) whose tensordot call sits on the hot path;
bit-identical output. All 6050 existing numba-backend tests pass.
hameerabbasi previously approved these changes Apr 21, 2026
@hameerabbasi
Collaborator

Thanks, @thodson-usgs!

Replace `any(d == -1 for d in shape)` with `-1 in shape`. The latter is
a C-level tuple containment, the former a Python-level generator.

On this machine (micro): 221 ns -> 45 ns per check.

End-to-end on the PR's repro (median of 3):
  reshape(a.shape):  0.4 us -> 0.2 us  (matches main; erases the regression
                                        introduced by running the -1 check
                                        unconditionally)
  reshape((-1, K)):  2.7 us -> 2.3 us  (small incremental win)

Pure readability / perf nit; semantics are identical since shape is
already a tuple of ints by the line above.
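The micro-numbers above are machine-dependent, but the check is easy to reproduce with the stdlib (`timeit`; the equivalence assertion holds because shape is a plain tuple of ints at this point):

```python
import timeit

shape = (200, 300)  # a plain tuple of ints, as in COO.reshape at this point

# Both checks give the same answer; only the dispatch differs:
# `in` is a C-level tuple containment scan, any(...) drives a Python generator.
assert (-1 in shape) == any(d == -1 for d in shape)

n = 1_000_000
t_any = timeit.timeit(lambda: any(d == -1 for d in shape), number=n)
t_in = timeit.timeit(lambda: -1 in shape, number=n)
print(f"any(d == -1 ...): {t_any / n * 1e9:5.0f} ns/check")
print(f"-1 in shape:      {t_in / n * 1e9:5.0f} ns/check")
```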
@thodson-usgs
Contributor Author

Sorry @hameerabbasi, I noticed a tiny performance regression for reshape(a.shape): 0.2 -> 0.4 μs/call. Claude found a simplification that got this back to 0.2 μs/call.

@thodson-usgs thodson-usgs marked this pull request as ready for review April 21, 2026 20:24

codspeed-hq Bot commented Apr 21, 2026

Merging this PR will degrade performance by 18.04%

⚡ 20 improved benchmarks
❌ 2 regressed benchmarks
✅ 318 untouched benchmarks

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Performance Changes

Benchmark BASE HEAD Efficiency
test_gcxs_dot_ndarray[coo-m=200-n=200-p=200] 1.8 ms 1.6 ms +15.33%
test_gcxs_dot_ndarray[coo-m=200-n=500-p=200] 2.4 ms 2.2 ms +11.14%
test_gcxs_dot_ndarray[coo-m=500-n=200-p=200] 2.5 ms 2.2 ms +10.82%
test_index_fancy[side=100-rank=1-format='coo'] 1.2 ms 1.4 ms -16.97%
test_index_slice[side=100-rank=2-format='gcxs'] 2.2 ms 2.7 ms -18.04%
test_matmul[m=1000-n=1000-p=200-format='coo'] 9.6 ms 8.3 ms +15.34%
test_matmul[m=200-n=1000-p=200-format='coo'] 3.7 ms 3 ms +22.93%
test_matmul[m=1000-n=200-p=200-format='coo'] 3.6 ms 3 ms +19.29%
test_matmul[m=200-n=1000-p=1000-format='coo'] 10.4 ms 9.2 ms +13.17%
test_matmul[m=1000-n=500-p=200-format='coo'] 5.8 ms 5 ms +16.51%
test_matmul[m=200-n=200-p=1000-format='coo'] 4.4 ms 3.8 ms +14.74%
test_matmul[m=200-n=200-p=500-format='coo'] 3.2 ms 2.7 ms +18.49%
test_matmul[m=200-n=500-p=200-format='coo'] 2.8 ms 2.3 ms +22.87%
test_matmul[m=200-n=1000-p=500-format='coo'] 6.3 ms 5.4 ms +16.9%
test_matmul[m=200-n=200-p=200-format='coo'] 2.3 ms 1.9 ms +24.31%
test_matmul[m=200-n=500-p=1000-format='coo'] 6.7 ms 5.9 ms +14.02%
test_matmul[m=200-n=500-p=500-format='coo'] 4.4 ms 3.7 ms +17.54%
test_matmul[m=500-n=1000-p=200-format='coo'] 5.9 ms 5 ms +18.39%
test_matmul[m=500-n=200-p=200-format='coo'] 2.8 ms 2.3 ms +21.29%
test_matmul[m=500-n=500-p=200-format='coo'] 4 ms 3.3 ms +19.23%
... ... ... ... ...

ℹ️ Only the first 20 benchmarks are displayed. Go to the app to view all benchmarks.


Comparing thodson-usgs:perf/reshape-resolve-neg1-before-shortcircuit (487809e) with main (f65a764)

Open in CodSpeed

@hameerabbasi hameerabbasi enabled auto-merge (squash) April 22, 2026 07:20
@hameerabbasi
Collaborator

> Sorry @hameerabbasi, I noticed a tiny performance regression for reshape(a.shape): 0.2 -> 0.4 μs/call. Claude found a simplification that got this back to 0.2 μs/call.

Thanks for being thorough! I've kicked off CI and auto-merge once more.

@hameerabbasi hameerabbasi merged commit fa58c7e into pydata:main Apr 22, 2026
14 of 16 checks passed
