feat: add SupportsSetRange protocol and store implementations#3907

Open
d-v-b wants to merge 15 commits into
zarr-developers:mainfrom
d-v-b:feat/byte-range-setter

Conversation

@d-v-b
Contributor

@d-v-b d-v-b commented Apr 15, 2026

Adds a protocol for stores that support synchronously and asynchronously writing bytes into a range of the target object. Only MemoryStore and LocalStore implement this.

this behavior is necessary to enable an in-place writing mode for shards, e.g. where a single subchunk is written without re-writing the entire shard.
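
A minimal sketch of what such a protocol could look like. The method names set_range and set_range_sync come from the commit message; the parameter list (key, value, start) and the ToyMemoryStore class are assumptions made for illustration and are not taken from the PR's diff.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class SupportsSetRange(Protocol):
    # Hypothetical signatures: the PR names the methods, not their parameters.
    async def set_range(self, key: str, value: bytes, start: int) -> None: ...

    def set_range_sync(self, key: str, value: bytes, start: int) -> None: ...


class ToyMemoryStore:
    """Illustrative stand-in for an in-memory store; not zarr's MemoryStore."""

    def __init__(self) -> None:
        self._data: dict[str, bytearray] = {}

    def set_sync(self, key: str, value: bytes) -> None:
        self._data[key] = bytearray(value)

    def get_sync(self, key: str) -> bytes:
        return bytes(self._data[key])

    def set_range_sync(self, key: str, value: bytes, start: int) -> None:
        buf = self._data[key]  # raises KeyError if the key does not exist yet
        if start < 0 or start + len(value) > len(buf):
            raise ValueError("range must lie within the existing value")
        # Overwrite the bytes in place without changing the value's length.
        buf[start : start + len(value)] = value

    async def set_range(self, key: str, value: bytes, start: int) -> None:
        self.set_range_sync(key, value, start)


store = ToyMemoryStore()
store.set_sync("shard", b"AAAABBBBCCCC")
store.set_range_sync("shard", b"xxxx", start=4)
result = store.get_sync("shard")
```

In this sketch the middle four bytes are replaced while the rest of the value is left untouched, which is the behavior an in-place shard write would rely on.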

Add SupportsSetRange protocol for stores that support writing to a byte
range within an existing value (set_range/set_range_sync). Implement
in MemoryStore and LocalStore, both explicitly subclassing the protocol.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions github-actions Bot added the needs release notes label Apr 15, 2026
@codecov

codecov Bot commented Apr 15, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 93.37%. Comparing base (b740cf2) to head (f3b8afe).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3907      +/-   ##
==========================================
+ Coverage   93.32%   93.37%   +0.05%     
==========================================
  Files          88       88              
  Lines       11828    11862      +34     
==========================================
+ Hits        11038    11076      +38     
+ Misses        790      786       -4     
Files with missing lines Coverage Δ
src/zarr/abc/store.py 96.47% <100.00%> (+0.04%) ⬆️
src/zarr/storage/_local.py 97.42% <100.00%> (+2.01%) ⬆️
src/zarr/storage/_memory.py 96.95% <100.00%> (+0.21%) ⬆️

d-v-b and others added 2 commits April 15, 2026 11:03
Tests cover isinstance check, async set_range, sync set_range_sync,
and edge case (writing at end of value).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions github-actions Bot removed the needs release notes label Apr 15, 2026
@d-v-b d-v-b requested a review from maxrjones April 21, 2026 19:12
@maxrjones
Member

should this get the same design pivot as #3925 (comment) started in #3925, regarding protocols vs. abc methods?

Do you remember where you were previously pointed to using protocols over methods despite the weight of our existing store API? It might be helpful to quickly jot down our decisions here (use methods for now, plan for a better protocol-based store API in the future) in either https://zarr.readthedocs.io/en/stable/contributing/ or a CLAUDE/AGENTS.md for future reference.

@d-v-b
Contributor Author

d-v-b commented May 15, 2026

byte-range writes are an optional behavior that only a handful of "niche" stores support (local and memory). There's not really a sensible fallback or default implementation (unlike get_ranges), so it makes sense for stores to opt in rather than opt out.

And if we made this a method on the Store ABC, callers would need to check for NotImplementedError to figure out if the store really supports it, and the method would clutter the signatures of most stores that will never support it (cloud storage).
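
A rough sketch of the opt-in pattern from the caller's side, assuming a runtime-checkable protocol. The store classes, the write_patch helper, and the set_range_sync parameters are hypothetical names for illustration, not code from this PR.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class SupportsSetRange(Protocol):
    # Assumed signature; only the method name comes from the PR.
    def set_range_sync(self, key: str, value: bytes, start: int) -> None: ...


class WholeObjectStore:
    """Hypothetical store that can only read and write whole values."""

    def __init__(self) -> None:
        self.data: dict[str, bytes] = {}

    def get_sync(self, key: str) -> bytes:
        return self.data[key]

    def set_sync(self, key: str, value: bytes) -> None:
        self.data[key] = bytes(value)


class RangeStore(WholeObjectStore):
    """Hypothetical store that opts in by defining set_range_sync."""

    def set_range_sync(self, key: str, value: bytes, start: int) -> None:
        buf = bytearray(self.data[key])
        buf[start : start + len(value)] = value
        self.data[key] = bytes(buf)


def write_patch(store: WholeObjectStore, key: str, patch: bytes, start: int) -> str:
    # Capability is checked up front instead of catching NotImplementedError.
    if isinstance(store, SupportsSetRange):
        store.set_range_sync(key, patch, start)
        return "in-place"
    # Caller-side fallback: read, patch, and rewrite the whole value.
    whole = bytearray(store.get_sync(key))
    whole[start : start + len(patch)] = patch
    store.set_sync(key, bytes(whole))
    return "rewrite"


plain, ranged = WholeObjectStore(), RangeStore()
for s in (plain, ranged):
    s.set_sync("k", b"00000000")
plain_path = write_patch(plain, "k", b"11", 2)
ranged_path = write_patch(ranged, "k", b"11", 2)
```

Stores that never define set_range_sync keep an uncluttered interface, and the caller decides what to do when the capability is absent.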

I don't think we can categorically say "no" to adding functionality to stores or codecs via protocols. There's already a precedent for defining extra functionality with semi-structural mixins: see class ArrayBytesCodecPartialEncodeMixin. Arguably this should have been a protocol from the start.

@maxrjones
Member

I'd prefer someone whose work is more oriented towards local/HPC filesystems review this PR if they're available and willing (@LDeakin and @ilan-gold come to mind).

I'm not fully prepared to discuss tradeoffs, but the lack of a concurrency/atomicity contract in the docstring raised a few questions for me:

  1. Is parallel set_range to disjoint ranges of the same key supposed to be safe? The motivating sharded-write use case suggests yes, but LocalStore doesn't seem to have locking.
  2. Is set_range racing against set defined?
  3. Should crash-mid-write atomicity be a protocol requirement?

@d-v-b
Contributor Author

d-v-b commented May 15, 2026

> Is parallel set_range to disjoint ranges of the same key supposed to be safe? The motivating sharded-write use case suggests yes, but LocalStore doesn't seem to have locking.

yes, in the two target stores (local and memory), disjoint range writes should be safe. overlapping range writes will have order-dependent behavior.
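
A small sketch of the disjoint-range case on a local file, assuming that independent file handles writing non-overlapping byte ranges do not interfere with each other. The write_range helper is hypothetical and is not how LocalStore implements set_range.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def write_range(path: str, start: int, data: bytes) -> None:
    # Each call opens its own handle, so seek positions are never shared
    # between threads.
    with open(path, "r+b") as f:
        f.seek(start)
        f.write(data)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "shard")
    # Pre-allocate the value; range writes only touch existing bytes.
    with open(path, "wb") as f:
        f.write(bytes(16))

    # Four non-overlapping 4-byte ranges written concurrently.
    jobs = [(0, b"aaaa"), (4, b"bbbb"), (8, b"cccc"), (12, b"dddd")]
    with ThreadPoolExecutor(max_workers=4) as pool:
        list(pool.map(lambda job: write_range(path, *job), jobs))

    with open(path, "rb") as f:
        contents = f.read()
```

If two jobs targeted overlapping ranges instead, the final bytes in the overlap would depend on which write landed last.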

> Is set_range racing against set defined?

set + set_range is a race condition, but so are concurrent sets.

> Should crash-mid-write atomicity be a protocol requirement?

probably. zarr-python 2.x wrote to a temporary file and then renamed it for atomicity. we don't do that now, but we should!
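
The temporary-file-plus-rename approach can be sketched as follows. This is a generic illustration using the standard library, not zarr-python's 2.x code; the atomic_write name is hypothetical.

```python
import contextlib
import os
import tempfile


def atomic_write(path: str, data: bytes) -> None:
    # Create the temporary file in the target directory so the final rename
    # stays on one filesystem.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())
        # Readers see either the old file or the new one, never a partial write.
        os.replace(tmp_path, path)
    except BaseException:
        with contextlib.suppress(FileNotFoundError):
            os.remove(tmp_path)
        raise


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "chunk")
    atomic_write(target, b"old")
    atomic_write(target, b"new")
    with open(target, "rb") as f:
        final = f.read()
    leftovers = sorted(os.listdir(tmp))
```

This pattern replaces the whole object, so applying it to a range write would mean copying the full value first, which is the cost an in-place write is meant to avoid.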

It's worth keeping in mind that there is just 1 intended caller of this method, and only under very special circumstances: the sharding codec when the inner chunks have deterministic compressed sizes. I don't know when this method would be called outside that context.
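
To illustrate why deterministic sizes matter: when every encoded inner chunk has the same known byte length, its offset inside the shard can be computed without reading anything. The sketch below uses a plain bytearray as a stand-in for a shard and ignores the shard index that a real shard also carries; all names are hypothetical.

```python
CHUNK_NBYTES = 8  # assumed fixed encoded size of every inner chunk
N_CHUNKS = 4


def chunk_offset(index: int, chunk_nbytes: int = CHUNK_NBYTES) -> int:
    # With fixed-size chunks, the byte offset is a simple multiple of the index.
    return index * chunk_nbytes


def write_inner_chunk(shard: bytearray, index: int, encoded: bytes) -> None:
    if len(encoded) != CHUNK_NBYTES:
        raise ValueError("encoded chunk does not have the expected fixed size")
    start = chunk_offset(index)
    # Stand-in for a store-level range write: overwrite only this chunk's bytes.
    shard[start : start + CHUNK_NBYTES] = encoded


shard = bytearray(CHUNK_NBYTES * N_CHUNKS)
write_inner_chunk(shard, 2, b"\x01" * CHUNK_NBYTES)
```

With variable-length compressed chunks the new bytes might not fit in the old slot, so this shortcut would not apply.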
