perf(encoding): initialize only the page metadata a read will touch #7465
Draft
Ali2Arslan wants to merge 2 commits into
Threads the requested top-level row ranges through `StructuralFieldScheduler::initialize` and `DecodeBatchScheduler::try_new` so the structural path initializes only the pages those ranges overlap (`pages_overlapping_ranges`) instead of every page in the column.

Page metadata is now cached per page (`PageDataCacheKey`, which keeps the existing `view_tag` alongside a new `page_index`), and the misses for a read coalesce into a single shared request via the split `init_ranges` / `init_from_buffers` page-scheduler API. A cold point read's metadata IO is thus invariant to the column's page count -- a BTree-leaf read (one leaf out of thousands of large, non-coalescing pages) no longer pays to initialize untouched pages.

Nested schedulers forward ranges consistently with their `schedule_ranges`: lists/maps/structs pass the top-level ranges through unchanged, and fixed-size-list scales them by `dimension`. `None` preserves the prior eager behavior (initialize every page).

Tests: the round-trip encoding harness now drives both the eager (full scan) and lazy (range/take) paths, plus a unit test asserting an N-page miss issues exactly one coalesced request.

Co-authored-by: Cursor <cursoragent@cursor.com>
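To make the range-scoped selection concrete, here is a minimal sketch of how pages might be matched against requested row ranges, and how a fixed-size-list child could scale those ranges by `dimension`. The helper names `overlapping_pages` and `scale_ranges`, and the representation of pages as a slice of per-page row counts, are assumptions for illustration; they are not the signatures used by `pages_overlapping_ranges` in this PR.

```rust
use std::ops::Range;

/// Hypothetical helper: `page_rows[i]` is the number of top-level rows in
/// page `i` (pages are laid out back to back). Returns the indices of the
/// pages that overlap at least one requested half-open row range.
fn overlapping_pages(page_rows: &[u64], requested: &[Range<u64>]) -> Vec<usize> {
    let mut hits = Vec::new();
    let mut page_start = 0u64;
    for (idx, &rows) in page_rows.iter().enumerate() {
        let page_end = page_start + rows;
        // Two half-open intervals overlap when each starts before the other
        // ends. Empty pages and empty ranges never overlap anything.
        let overlaps = rows > 0
            && requested
                .iter()
                .any(|r| r.start < r.end && r.start < page_end && page_start < r.end);
        if overlaps {
            hits.push(idx);
        }
        page_start = page_end;
    }
    hits
}

/// Hypothetical helper: a fixed-size list stores `dimension` child items per
/// top-level row, so row range `a..b` maps to item range `a*d..b*d`.
fn scale_ranges(ranges: &[Range<u64>], dimension: u64) -> Vec<Range<u64>> {
    ranges
        .iter()
        .map(|r| r.start * dimension..r.end * dimension)
        .collect()
}
```

With three 100-row pages, a request for rows `150..160` selects only page 1, so only that page's metadata would need to be initialized; a request for `99..101` straddles a boundary and selects pages 0 and 1.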
Summary
The structural decoder's `StructuralFieldScheduler::initialize` eagerly initializes every page in a column before scheduling, even when the read touches only one page. For a column with thousands of large, non-coalescing pages (e.g. a BTree leaf scan that reads one leaf), this means paying metadata IO proportional to the page count regardless of how few rows are actually requested.

This PR makes page-metadata initialization range-scoped and per-page:

- `DecodeBatchScheduler::try_new` and `StructuralFieldScheduler::initialize` now take an optional `requested_ranges: Option<Arc<[Range<u64>]>>`: the top-level row ranges that will later be scheduled. The structural path initializes only the pages those ranges overlap (`pages_overlapping_ranges`). `None` preserves the prior eager behavior.
- The page-scheduler `initialize` is split into `init_ranges()` (declare the byte ranges the page needs) and `init_from_buffers(buffers, io)` (finish initialization from those bytes). The field scheduler concatenates the misses' ranges into a single `submit_request` so adjacent ranges coalesce into shared GETs instead of one request per page.
- Page metadata is cached per page under `PageDataCacheKey { column_index, page_index, view_tag }` (replacing the per-column `FieldDataCacheKey`). `view_tag` is retained so a column decoded under two shapes (blob descriptor `Struct<pos,size>` vs raw `LargeBinary`) can't collide on cached state.
- Nested schedulers forward ranges consistently with their `schedule_ranges`: lists/maps/structs pass the top-level ranges through unchanged; fixed-size-list scales by `dimension`. The requested ranges are always a superset of what `schedule_ranges` / `schedule_take` later touch, so any page a read schedules was initialized here.

Net effect: a cold point/range read's metadata IO is invariant to the column's total page count.
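The role of `view_tag` and `page_index` in the cache key can be illustrated with a small sketch. The `PageKey` struct below only mirrors the three field names listed for `PageDataCacheKey`; the field types, the string view tags, the `Vec<u8>` cache value, and the `missing_pages` helper are all assumptions made for this example and are not taken from the PR.

```rust
use std::collections::HashMap;

/// Stand-in for a per-page cache key. Field types are assumed.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
struct PageKey {
    column_index: u32,
    page_index: u32,
    view_tag: &'static str,
}

/// Hypothetical helper: of the pages a read touches, return those whose
/// metadata is not yet cached for this column *and* this decoder view.
fn missing_pages(
    cache: &HashMap<PageKey, Vec<u8>>,
    column_index: u32,
    view_tag: &'static str,
    touched: &[u32],
) -> Vec<u32> {
    touched
        .iter()
        .copied()
        .filter(|&page_index| {
            // A hit requires all three components to match, so state cached
            // under one view is never reused for another view.
            !cache.contains_key(&PageKey { column_index, page_index, view_tag })
        })
        .collect()
}
```

If page 3 of column 0 is cached under a blob-descriptor view, a later read of pages 3 and 4 under that view misses only page 4, while the same read under a raw-binary view misses both.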
Test plan
- `cargo test -p lance-encoding --lib` (393 passed): the round-trip encoding harness now exercises both the eager path (full scan, `None`) and the lazy path (range reads and scattered takes pass their `requested_ranges`).
- `test_initialize_coalesces_missed_page_metadata`: an N-page cache miss issues exactly one `submit_request` carrying every page's metadata range (not one request per page).
- `init_ranges` / `init_from_buffers` split.
- `cargo test -p lance-file --lib` (76 passed).
- `cargo test -p lance --lib dataset::blob::tests::test_blob_cache_key_distinguishes_views`: confirms the per-page key still distinguishes blob decoder views.
- `cargo fmt --all` and `cargo clippy -p lance-encoding -p lance-file --tests --benches -- -D warnings` clean.

Made with Cursor
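The property the coalescing test checks can be sketched with toy types. Everything below is hypothetical: `CountingIo` and `ToyPage` stand in for the real IO layer and page schedulers, and this `init_from_buffers` takes only the buffers, whereas the PR's version also receives `io`. The sketch shows the two-phase pattern only: gather every missed page's ranges, submit them once, then hand each page its own buffers.

```rust
use std::ops::Range;

/// Toy IO layer (assumed) that records every submitted request.
struct CountingIo {
    data: Vec<u8>,
    requests: Vec<Vec<Range<u64>>>,
}

impl CountingIo {
    /// Returns one buffer per requested byte range, in request order.
    fn submit_request(&mut self, ranges: Vec<Range<u64>>) -> Vec<Vec<u8>> {
        let buffers = ranges
            .iter()
            .map(|r| self.data[r.start as usize..r.end as usize].to_vec())
            .collect();
        self.requests.push(ranges);
        buffers
    }
}

/// Toy page scheduler (assumed) with the split initialization API.
struct ToyPage {
    meta_ranges: Vec<Range<u64>>,
    meta: Option<Vec<u8>>,
}

impl ToyPage {
    /// Phase 1: declare the byte ranges this page needs.
    fn init_ranges(&self) -> Vec<Range<u64>> {
        self.meta_ranges.clone()
    }

    /// Phase 2: finish initialization from the fetched bytes.
    fn init_from_buffers(&mut self, buffers: Vec<Vec<u8>>) {
        self.meta = Some(buffers.concat());
    }
}

/// Initialize all missed pages with at most one IO request.
fn initialize_missed(pages: &mut [ToyPage], io: &mut CountingIo) {
    let mut all_ranges = Vec::new();
    let mut counts = Vec::with_capacity(pages.len());
    for page in pages.iter() {
        let ranges = page.init_ranges();
        counts.push(ranges.len());
        all_ranges.extend(ranges);
    }
    let fetched = if all_ranges.is_empty() {
        Vec::new()
    } else {
        io.submit_request(all_ranges)
    };
    // Buffers come back in request order, so each page takes the next
    // `n` buffers, where `n` is the number of ranges it declared.
    let mut buffers = fetched.into_iter();
    for (page, n) in pages.iter_mut().zip(counts) {
        let mine: Vec<Vec<u8>> = buffers.by_ref().take(n).collect();
        page.init_from_buffers(mine);
    }
}
```

Initializing three toy pages this way leaves exactly one entry in `io.requests`, carrying three ranges, which is the shape of assertion the unit test above describes.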