feat: add SupportsSetRange protocol and store implementations #3907
d-v-b wants to merge 15 commits into
Add SupportsSetRange protocol for stores that support writing to a byte range within an existing value (set_range/set_range_sync). Implement in MemoryStore and LocalStore, both explicitly subclassing the protocol. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
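A minimal sketch of what such a protocol and an explicitly subclassing store could look like. The method signatures (`key`, `value`, `start`), the use of plain `bytes`, the bounds check, and the `ToyMemoryStore` class are all assumptions made for illustration; they are not taken from this PR's diff, and the real `MemoryStore` differs.

```python
import asyncio
from typing import Protocol, runtime_checkable


@runtime_checkable
class SupportsSetRange(Protocol):
    """Hypothetical shape of the protocol; the real signatures may differ."""

    async def set_range(self, key: str, value: bytes, start: int) -> None: ...

    def set_range_sync(self, key: str, value: bytes, start: int) -> None: ...


class ToyMemoryStore(SupportsSetRange):
    """Toy in-memory store (not zarr's MemoryStore) subclassing the protocol."""

    def __init__(self) -> None:
        self._data: dict[str, bytearray] = {}

    def set(self, key: str, value: bytes) -> None:
        self._data[key] = bytearray(value)

    def get(self, key: str) -> bytes:
        return bytes(self._data[key])

    def set_range_sync(self, key: str, value: bytes, start: int) -> None:
        buf = self._data[key]  # raises KeyError if the key does not exist yet
        end = start + len(value)
        # Assumption: a range write may not grow the value.
        if start < 0 or end > len(buf):
            raise ValueError("range must lie within the existing value")
        buf[start:end] = value  # equal-length slice assignment keeps the size

    async def set_range(self, key: str, value: bytes, start: int) -> None:
        # Nothing to await for an in-memory dict; delegate to the sync path.
        self.set_range_sync(key, value, start)


store = ToyMemoryStore()
store.set("k", b"0123456789")
store.set_range_sync("k", b"AB", start=8)         # write at the end of the value
asyncio.run(store.set_range("k", b"xy", start=0))  # async variant
```

Because the protocol is decorated with `runtime_checkable`, `isinstance(store, SupportsSetRange)` is true here, which is the kind of check a caller could use to decide whether in-place writes are available.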
Codecov Report
✅ All modified and coverable lines are covered by tests.
@@ Coverage Diff @@
## main #3907 +/- ##
==========================================
+ Coverage 93.32% 93.37% +0.05%
==========================================
Files 88 88
Lines 11828 11862 +34
==========================================
+ Hits 11038 11076 +38
+ Misses 790 786 -4
|
Tests cover isinstance check, async set_range, sync set_range_sync, and edge case (writing at end of value). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…r-python into feat/byte-range-setter
|
should this get the same design pivot as the one started in #3925 (comment), regarding protocols vs. ABC methods? Do you remember where you were previously pointed to using protocols over methods despite the weight of our existing store API? It might be helpful to quickly jot down our decisions here (use methods for now, plan for a better protocol-based store API in the future) in either https://zarr.readthedocs.io/en/stable/contributing/ or a CLAUDE/AGENTS.md for future reference. |
|
byte-range writes are an optional behavior that only a handful of "niche" stores support (local and memory). There's not really a sensible fallback or default implementation, (unlike
And if we made this a method on the
I don't think we can categorically say "no" to adding functionality to stores or codecs via protocols. There's already a precedent for defining extra functionality with semi-structural mixins: see zarr-python/src/zarr/abc/codec.py Line 256 in 7e58df0 |
|
I'd prefer that someone whose work is more oriented towards local/HPC filesystems review this PR if they're available and willing (@LDeakin and @ilan-gold come to mind). I'm not fully prepared to discuss tradeoffs, but the lack of a concurrency/atomicity contract in the docstring raised a few questions for me:
|
yes, in the two target stores (local and memory), disjoint range writes should be safe. overlapping range writes will have order-dependent behavior.
set + set_range is a race condition, but so are concurrent sets.
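The ordering claims above can be illustrated with a small model. This sketch only considers whole writes applied one after another in every possible order to a `bytearray`; it does not model torn or partially applied writes, and the helper names are hypothetical.

```python
from itertools import permutations

Write = tuple[int, bytes]  # (start offset, payload)


def apply_writes(initial: bytes, writes: list[Write]) -> bytes:
    """Apply range writes one after another to a copy of the initial value."""
    buf = bytearray(initial)
    for start, value in writes:
        buf[start:start + len(value)] = value
    return bytes(buf)


def outcomes(initial: bytes, writes: list[Write]) -> set[bytes]:
    """Collect the final values over every ordering of the writes."""
    return {apply_writes(initial, list(order)) for order in permutations(writes)}


initial = b"........"
disjoint = [(0, b"AA"), (4, b"BB")]       # byte ranges [0, 2) and [4, 6)
overlapping = [(0, b"AAA"), (2, b"BBB")]  # byte ranges [0, 3) and [2, 5)

disjoint_results = outcomes(initial, disjoint)
overlapping_results = outcomes(initial, overlapping)
```

Under this model the disjoint writes yield a single final value regardless of order, while the overlapping writes yield two, depending on which write lands last on the shared byte.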
probably, zarr-python 2.x used a write to a temporary file + a rename for atomicity. we don't do that now, but we should!
It's worth keeping in mind that there is just 1 intended caller of this method, and only under very special circumstances: the sharding codec when the inner chunks have deterministic compressed sizes. I don't know when this method would be called outside that context. |
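For reference, the temporary-file-plus-rename pattern mentioned above can be sketched as follows. This is a generic illustration using only the standard library, not the zarr-python 2.x code; `atomic_write` is a hypothetical helper name.

```python
import contextlib
import os
import tempfile
from pathlib import Path


def atomic_write(path: Path, data: bytes) -> None:
    """Write data to a temp file in the same directory, then rename into place."""
    # The temp file must live on the same filesystem as the target so that
    # the final rename is a single atomic step.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=path.name + ".", suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # push the bytes to disk before renaming
        os.replace(tmp_name, path)  # readers see the old or the new file, never a mix
    except BaseException:
        # Clean up the temp file if anything failed before the rename.
        with contextlib.suppress(FileNotFoundError):
            os.unlink(tmp_name)
        raise
```

Note that this pattern replaces the whole file, so applying it to a byte-range write would mean copying the entire object first, which is the cost an in-place range write is meant to avoid.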
Adds a protocol for stores that support synchronously and asynchronously writing bytes into a range in the target object. Only MemoryStore and LocalStore implement this.
This behavior is necessary to enable an in-place writing mode for shards, e.g. where a single subchunk is written without re-writing the entire shard.
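As a rough illustration of the in-place mode for a local file, the sketch below overwrites one fixed-size subchunk inside a shard-like file. It assumes every subchunk has the same known byte size and ignores the shard index that a real shard carries, so the offset arithmetic is a simplification; `write_range` and `subchunk_offset` are hypothetical helpers, not LocalStore methods.

```python
import tempfile
from pathlib import Path


def write_range(path: Path, start: int, value: bytes) -> None:
    """Overwrite len(value) bytes at offset start without rewriting the file."""
    with open(path, "r+b") as f:  # "r+b": read/write, no truncation
        f.seek(start)
        f.write(value)


def subchunk_offset(index: int, subchunk_nbytes: int) -> int:
    """Byte offset of a subchunk, assuming equal-sized subchunks and no index."""
    return index * subchunk_nbytes


SUBCHUNK_NBYTES = 4  # assumed fixed (deterministic) encoded size

with tempfile.TemporaryDirectory() as tmp:
    shard = Path(tmp) / "shard"
    shard.write_bytes(b"aaaabbbbccccdddd")  # four 4-byte subchunks
    # Replace only the third subchunk (index 2); the rest is left untouched.
    write_range(shard, subchunk_offset(2, SUBCHUNK_NBYTES), b"XXXX")
    result = shard.read_bytes()
```

The file keeps its length and the neighbouring subchunks are never read or rewritten, which is the point of writing a single subchunk in place.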