perf: short-circuit COO.reshape when -1 resolves to self.shape #935
Merged
hameerabbasi merged 2 commits into pydata:main on Apr 22, 2026
Conversation
COO.reshape returns self when self.shape equals the requested shape,
but the check runs before any -1 in the target is resolved.
sparse.tensordot passes shapes like (-1, N) even for a 2D × 2D matmul
that doesn't actually change shape, so the short-circuit never fires
and a full reshape runs (linear_loc + coord rebuild + new COO allocation).
Moving the -1 resolution before the equality check avoids that work for
callers that pass a -1 factorization of the current shape. Behavior is
a strict subset ("fewer copies"): any reshape that already returned
self before still does, and reshape((-1, ...)) that resolves to the
current shape now also returns self, matching the documented contract.
Measured ~16% speedup on a warm conservative-regrid loop (xarray-regrid
ConservativeRegridder.regrid) whose tensordot call sits on the hot path;
bit-identical output. All 6050 existing numba-backend tests pass.
hameerabbasi
approved these changes
Apr 21, 2026
hameerabbasi
previously approved these changes
Apr 21, 2026
Collaborator
Thanks, @thodson-usgs!
Replace `any(d == -1 for d in shape)` with `-1 in shape`. The latter is
a C-level tuple containment, the former a Python-level generator.
On this machine (micro): 221 ns -> 45 ns per check.
End-to-end on the PR's repro (median of 3):
reshape(a.shape): 0.4 us -> 0.2 us (matches main; erases the regression introduced by running the -1 check unconditionally)
reshape((-1, K)): 2.7 us -> 2.3 us (small incremental win)
Pure readability / perf nit; semantics are identical since shape is
already a tuple of ints by the line above.
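The gap is easy to confirm with a `timeit` micro-benchmark (absolute numbers vary by machine; the ratio between the two checks is the point):

```python
import timeit

shape = (100, 300)  # by this point in reshape, `shape` is a tuple of ints

n = 1_000_000
t_any = timeit.timeit(lambda: any(d == -1 for d in shape), number=n)
t_in = timeit.timeit(lambda: -1 in shape, number=n)

print(f"any(d == -1 for d in shape): {t_any / n * 1e9:.0f} ns/check")
print(f"-1 in shape:                 {t_in / n * 1e9:.0f} ns/check")

# Semantics are identical for tuples of ints.
for s in [(100, 300), (-1, 300), (100, -1, 3), ()]:
    assert (-1 in s) == any(d == -1 for d in s)
```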
Contributor
Author
Sorry @hameerabbasi, I noticed a tiny performance regression for the exact-shape path introduced by the unconditional -1 check, so I pushed a follow-up fix.
hameerabbasi
approved these changes
Apr 22, 2026
Collaborator
Thanks for being thorough! I've kicked off CI and auto-merge once more.
Summary
Two small changes to `COO.reshape`:
- Resolve `-1` in the target shape before the `self.shape == shape` short-circuit, rather than after.
- Replace `any(d == -1 for d in shape)` with `-1 in shape` (C-level tuple containment).

Why

`sparse.tensordot` reshapes its operands to 2D with calls like `a.reshape((-1, K))`. When `a` is already 2D with trailing dim `K`, the target shape equals `a.shape`, but the equality check runs before `-1` is resolved, so it doesn't match. The full reshape path (`linear_loc()`, coord rebuild, new `COO` allocation) then runs and produces a copy identical to the input.

Resolving `-1` first lets the short-circuit catch this case and return `self`. The `-1 in shape` swap is a readability / micro-perf nit that matters here because unconditionally checking for `-1` before the equality short-circuit (which the first change requires) would otherwise add ~200 ns to every reshape, including the exact-shape case that was already hitting the short-circuit.

Semantics
Strictly fewer copies:
- Any reshape that already returned `self` still does: exact-shape input has no `-1`, so the resolution step is a no-op and the equality check sees the same `shape`.
- A `reshape((-1, ...))` that resolves to the current shape now also returns `self`. This is consistent with the documented behavior (`test_reshape_same` codifies `s.reshape(s.shape) is s`) and matches NumPy, which similarly does not promise a copy from `ndarray.reshape` when the target is equivalent.
- The size-mismatch `ValueError` check below still runs with the resolved shape.
- `-1 in shape` is equivalent to `any(d == -1 for d in shape)` because `shape` is already a tuple of ints by this point.

Minimal repro
Run with the project's pixi env (`pixi run -e test python <file>`), or standalone with `uv run --with sparse --with numpy python <file>`. Measured on this script (macOS / Python 3.13, median of 3 runs):
[table: per-call timings for `reshape((-1, 300))` and `reshape(a.shape)`, main vs. this branch]

~10× faster on the `(-1, K)` pattern, exact-shape path unchanged. All 6050 existing numba-backend tests pass.