
perf(encoding): initialize only the page metadata a read will touch #7465

Draft
Ali2Arslan wants to merge 2 commits into lance-format:main from Ali2Arslan:feat/lazy-page-metadata-init


Summary

The structural decoder's StructuralFieldScheduler::initialize eagerly initializes every page in a column before scheduling, even when the read only touches one page. For a column with thousands of large, non-coalescing pages (e.g. a BTree leaf scan that reads one leaf), this means paying metadata IO proportional to the page count regardless of how few rows are actually requested.

This PR makes page-metadata initialization range-scoped and per-page:

  • DecodeBatchScheduler::try_new and StructuralFieldScheduler::initialize now take an optional requested_ranges: Option<Arc<[Range<u64>]>> — the top-level row ranges that will later be scheduled. The structural path initializes only the pages those ranges overlap (pages_overlapping_ranges). None preserves the prior eager behavior.
  • The page-scheduler initialize is split into init_ranges() (declare the byte ranges the page needs) and init_from_buffers(buffers, io) (finish initialization from those bytes). The field scheduler concatenates the misses' ranges into a single submit_request so adjacent ranges coalesce into shared GETs instead of one request per page.
  • Page metadata is cached per page via PageDataCacheKey { column_index, page_index, view_tag } (replacing the per-column FieldDataCacheKey). view_tag is retained so a column decoded under two shapes (blob descriptor Struct<pos,size> vs raw LargeBinary) can't collide on cached state.
  • Nested schedulers forward ranges consistently with their existing schedule_ranges: lists, maps, and structs pass the top-level ranges through unchanged; fixed-size-list scales them by dimension. The requested ranges are always a superset of what schedule_ranges/schedule_take later touch, so every page a read schedules has already been initialized here.
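
To make the first bullet concrete, here is a minimal sketch of the overlap test that a helper like pages_overlapping_ranges has to perform. It is not the code in this PR: the function name `pages_overlapping`, its signature, and the assumption that pages are laid out back to back starting at row 0 are all illustrative.

```rust
use std::ops::Range;

/// Returns the indices of pages whose row span overlaps any requested range.
///
/// `page_rows[i]` is the number of top-level rows in page `i`; pages are
/// assumed to be contiguous and to start at row 0. `requested` may be
/// unsorted or overlapping. Hypothetical helper, not the PR's signature.
fn pages_overlapping(page_rows: &[u64], requested: &[Range<u64>]) -> Vec<usize> {
    let mut hits = Vec::new();
    let mut page_start = 0u64;
    for (idx, &rows) in page_rows.iter().enumerate() {
        let page_end = page_start + rows;
        // Two non-empty half-open ranges overlap when each one starts
        // before the other one ends.
        let touched = rows > 0
            && requested
                .iter()
                .any(|r| r.start < r.end && r.start < page_end && page_start < r.end);
        if touched {
            hits.push(idx);
        }
        page_start = page_end;
    }
    hits
}
```

This version costs O(pages × ranges), which keeps the sketch short; with sorted ranges a binary search or a merge-style walk would avoid the inner scan.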

Net effect: a cold point/range read's metadata IO scales with the number of pages the read overlaps and is invariant to the column's total page count.
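
The per-page cache key from the third bullet can be pictured as follows. This is a sketch under assumptions: the field types, the string-valued `view_tag`, and the `PageCache` wrapper with its `get_or_load` method are hypothetical and only illustrate why both `page_index` and `view_tag` belong in the key.

```rust
use std::collections::HashMap;

/// Illustrative mirror of the key described above; field types are assumed.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
struct PageDataCacheKey {
    column_index: u32,
    page_index: u32,
    view_tag: &'static str,
}

/// Hypothetical cache that counts how often page metadata is actually loaded.
#[derive(Default)]
struct PageCache {
    entries: HashMap<PageDataCacheKey, Vec<u8>>,
    loads: usize,
}

impl PageCache {
    /// Returns the cached metadata for `key`, calling `load` only on a miss.
    fn get_or_load(&mut self, key: PageDataCacheKey, load: impl FnOnce() -> Vec<u8>) -> &[u8] {
        if !self.entries.contains_key(&key) {
            self.loads += 1;
            let value = load();
            self.entries.insert(key.clone(), value);
        }
        &self.entries[&key]
    }
}
```

With this shape, the same column and page decoded under two views occupy two entries, while a repeated lookup under the same view is served from the cache without another load.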

Test plan

  • cargo test -p lance-encoding --lib (393 passed) — the round-trip encoding harness now exercises both the eager path (full scan, None) and the lazy path (range reads and scattered takes pass their requested_ranges).
  • New unit test test_initialize_coalesces_missed_page_metadata: an N-page cache miss issues exactly one submit_request carrying every page's metadata range (not one request per page).
  • Updated FullZip page-scheduler tests to the init_ranges / init_from_buffers split.
  • cargo test -p lance-file --lib (76 passed).
  • cargo test -p lance --lib dataset::blob::tests::test_blob_cache_key_distinguishes_views — confirms the per-page key still distinguishes blob decoder views.
  • cargo fmt --all and cargo clippy -p lance-encoding -p lance-file --tests --benches -- -D warnings clean.
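
For readers who want the shape of the coalescing assertion without opening the diff, the following is a simplified, synchronous sketch of the init_ranges / init_from_buffers split. `CountingIo`, `Page`, and `initialize_pages` are hypothetical stand-ins; the real scheduler is asynchronous and init_from_buffers also receives the IO handle.

```rust
use std::ops::Range;

/// Hypothetical IO layer: counts requests and returns one buffer per range.
struct CountingIo {
    data: Vec<u8>,
    requests: usize,
}

impl CountingIo {
    fn submit_request(&mut self, ranges: &[Range<usize>]) -> Vec<Vec<u8>> {
        self.requests += 1;
        ranges.iter().map(|r| self.data[r.clone()].to_vec()).collect()
    }
}

/// Hypothetical page scheduler with two-phase initialization.
struct Page {
    meta_range: Range<usize>,
    meta: Option<Vec<u8>>,
}

impl Page {
    /// Phase 1: declare the byte ranges this page needs.
    fn init_ranges(&self) -> Vec<Range<usize>> {
        vec![self.meta_range.clone()]
    }

    /// Phase 2: finish initialization from the fetched bytes.
    fn init_from_buffers(&mut self, mut buffers: Vec<Vec<u8>>) {
        self.meta = buffers.pop();
    }
}

/// Initializes every uninitialized page with a single request.
fn initialize_pages(pages: &mut [Page], io: &mut CountingIo) {
    let mut all_ranges = Vec::new();
    let mut misses = Vec::new(); // (page index, number of ranges it declared)
    for (idx, page) in pages.iter().enumerate() {
        if page.meta.is_none() {
            let ranges = page.init_ranges();
            misses.push((idx, ranges.len()));
            all_ranges.extend(ranges);
        }
    }
    if all_ranges.is_empty() {
        return; // everything was already initialized; no IO needed
    }
    // One request carries every missed page's ranges.
    let mut buffers = io.submit_request(&all_ranges).into_iter();
    for (idx, count) in misses {
        // Hand each page exactly the buffers it asked for, in order.
        let page_buffers: Vec<Vec<u8>> = buffers.by_ref().take(count).collect();
        pages[idx].init_from_buffers(page_buffers);
    }
}
```

Calling `initialize_pages` on N uninitialized pages increments `requests` once, and a second call on the same pages issues no request at all, which is the property the new unit test checks against the real scheduler.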


Threads the requested top-level row ranges through
`StructuralFieldScheduler::initialize` and `DecodeBatchScheduler::try_new`
so the structural path initializes only the pages those ranges overlap
(`pages_overlapping_ranges`) instead of every page in the column.

Page metadata is now cached per page (`PageDataCacheKey`, which keeps the
existing `view_tag` alongside a new `page_index`) and the misses for a
read coalesce into a single shared request via the split
`init_ranges` / `init_from_buffers` page-scheduler API. A cold point read's
metadata IO is thus invariant to the column's page count -- a BTree-leaf
read (one leaf out of thousands of large, non-coalescing pages) no longer
pays to initialize untouched pages.

Nested schedulers forward ranges consistently with their `schedule_ranges`:
lists/maps/structs pass the top-level ranges through unchanged, and
fixed-size-list scales them by `dimension`. `None` preserves the prior
eager behavior (initialize every page).
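
As an illustration of the fixed-size-list case, row `i` of a list with `dimension` items per row covers child items `i * dimension .. (i + 1) * dimension`, so a row range maps to an item range by multiplying both ends. The helper name `scale_ranges` is hypothetical and this is only a sketch of that arithmetic:

```rust
use std::ops::Range;

/// Maps row ranges of a fixed-size-list column to item ranges of its child.
/// Hypothetical helper; not the function used in the PR.
fn scale_ranges(ranges: &[Range<u64>], dimension: u64) -> Vec<Range<u64>> {
    ranges
        .iter()
        .map(|r| r.start * dimension..r.end * dimension)
        .collect()
}
```

For example, rows 2..5 of a list with dimension 4 correspond to child items 8..20.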

Tests: the round-trip encoding harness now drives both the eager (full
scan) and lazy (range/take) paths, plus a unit test asserting an N-page
miss issues exactly one coalesced request.

Co-authored-by: Cursor <cursoragent@cursor.com>
github-actions bot added the A-encoding (Encoding, IO, file reader/writer) and performance labels on Jun 25, 2026