perf: merge half-open range queries on the same BTree index by xloya · Pull Request #7477 · lance-format/lance

xloya · 2026-06-25T09:07:08Z

Problem

A filter like fqdn = x AND log_time >= A AND log_time <= B AND channel = y is compiled into two half-open range queries on the same BTree index (log_time >= A with an unbounded upper bound, and log_time <= B with an unbounded lower bound). Each half-open range matches nearly all BTree pages, so the entire index is loaded even though only the pages inside [A, B] are needed.

Fix

Add a ScalarIndexExpr::optimize() pass that:

1. flattens the AND tree to collect leaf queries,
2. merges Range queries on the same index into a single closed range,
3. rebuilds the AND tree.

It is called from ScalarIndexExec::new() before execution.
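
The three steps can be sketched in a few dozen lines. This is a minimal illustration, not the code in this PR: the `Expr` enum, the `Other` leaf, and the helpers `tighter_lo`, `tighter_hi`, and `flatten_and` are hypothetical stand-ins for the real `ScalarIndexExpr` types, and the sketch assumes `i64` keys, leaves OR subtrees untouched, and does not detect empty ranges.

```rust
use std::ops::Bound;

/// Toy stand-in for a scalar index expression (hypothetical, not Lance's type).
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    /// Range query against the named index.
    Range { index: String, lo: Bound<i64>, hi: Bound<i64> },
    /// Any other leaf query (e.g. an equality lookup), kept opaque.
    Other(String),
    And(Box<Expr>, Box<Expr>),
    Or(Box<Expr>, Box<Expr>),
}

/// Pick the more restrictive of two lower bounds.
fn tighter_lo(a: Bound<i64>, b: Bound<i64>) -> Bound<i64> {
    use Bound::*;
    match (a, b) {
        (Unbounded, x) | (x, Unbounded) => x,
        (Included(x), Included(y)) => Included(x.max(y)),
        (Excluded(x), Excluded(y)) => Excluded(x.max(y)),
        // `> e` is at least as strict as `>= i` whenever e >= i.
        (Included(i), Excluded(e)) | (Excluded(e), Included(i)) => {
            if e >= i { Excluded(e) } else { Included(i) }
        }
    }
}

/// Pick the more restrictive of two upper bounds.
fn tighter_hi(a: Bound<i64>, b: Bound<i64>) -> Bound<i64> {
    use Bound::*;
    match (a, b) {
        (Unbounded, x) | (x, Unbounded) => x,
        (Included(x), Included(y)) => Included(x.min(y)),
        (Excluded(x), Excluded(y)) => Excluded(x.min(y)),
        // `< e` is at least as strict as `<= i` whenever e <= i.
        (Included(i), Excluded(e)) | (Excluded(e), Included(i)) => {
            if e <= i { Excluded(e) } else { Included(i) }
        }
    }
}

/// Step 1: collect the leaves of a nested AND tree. Anything that is not an
/// AND node (including a whole OR subtree) is treated as a single leaf.
fn flatten_and(e: Expr, out: &mut Vec<Expr>) {
    match e {
        Expr::And(l, r) => {
            flatten_and(*l, out);
            flatten_and(*r, out);
        }
        leaf => out.push(leaf),
    }
}

fn optimize(e: Expr) -> Expr {
    let mut leaves = Vec::new();
    flatten_and(e, &mut leaves);

    // Step 2: fold each Range leaf into an earlier Range on the same index.
    let mut merged: Vec<Expr> = Vec::new();
    for leaf in leaves {
        let mut absorbed = false;
        if let Expr::Range { index, lo, hi } = &leaf {
            for m in merged.iter_mut() {
                if let Expr::Range { index: i, lo: l, hi: h } = m {
                    if *i == *index {
                        *l = tighter_lo(*l, *lo);
                        *h = tighter_hi(*h, *hi);
                        absorbed = true;
                        break;
                    }
                }
            }
        }
        if !absorbed {
            merged.push(leaf);
        }
    }

    // Step 3: rebuild a left-deep AND tree from the remaining leaves.
    let mut it = merged.into_iter();
    let first = it.next().expect("flatten_and always yields at least one leaf");
    it.fold(first, |acc, leaf| Expr::And(Box::new(acc), Box::new(leaf)))
}
```

Applied to the filter from the Problem section, the two `log_time` leaves collapse into one `Range` with both bounds set, while the `fqdn` and `channel` leaves pass through unchanged.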

Benchmark

Method: build a real BTree index over 100,000 sorted unique i32 values, split into ~100 pages (batch_size = 1000). Query a narrow 50-value window in the middle of the range. Measure both (a) the number of BTree pages loaded (via LocalMetricsCollector.parts_loaded) and (b) query wall-clock time, for:

Before — the two half-open range searches the AND tree issues separately: value >= a (unbounded upper) and value <= b (unbounded lower);
After — the single closed range [a, b] produced by optimize().

Pages loaded:

Execution	BTree pages loaded
Before (two half-open ranges)	101 (≈ whole index)
After (merged closed range)	1

≈ 101x fewer index pages loaded for this query shape.
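
To see where a count like 101 can come from, here is a simplified page-pruning model. It is not the benchmark code: the `Page` struct and the helpers `overlaps`, `pages_touched`, and `build_pages` are hypothetical, the index is assumed to be exactly 100 pages of 1000 consecutive values, and the 50-value window is assumed to sit inside a single page. Under those assumptions, the lower-bounded search touches the window's page and everything above it, the upper-bounded search touches the window's page and everything below it, and the merged range touches only the window's page.

```rust
use std::ops::Bound;

/// Smallest and largest value stored in one BTree page (toy model).
struct Page {
    min: i32,
    max: i32,
}

/// True if the page could contain a value inside the given range.
fn overlaps(p: &Page, lo: Bound<i32>, hi: Bound<i32>) -> bool {
    let lo_ok = match lo {
        Bound::Unbounded => true,
        Bound::Included(v) => p.max >= v,
        Bound::Excluded(v) => p.max > v,
    };
    let hi_ok = match hi {
        Bound::Unbounded => true,
        Bound::Included(v) => p.min <= v,
        Bound::Excluded(v) => p.min < v,
    };
    lo_ok && hi_ok
}

/// Number of pages a single range search would have to load.
fn pages_touched(pages: &[Page], lo: Bound<i32>, hi: Bound<i32>) -> usize {
    pages.iter().filter(|p| overlaps(p, lo, hi)).count()
}

/// Split the sorted values 0..n_values into pages of `page_size` values each.
fn build_pages(n_values: i32, page_size: i32) -> Vec<Page> {
    (0..n_values / page_size)
        .map(|i| Page { min: i * page_size, max: (i + 1) * page_size - 1 })
        .collect()
}
```

With `build_pages(100_000, 1000)` and an assumed window of `a = 50_000`, `b = 50_049`, the search `value >= a` touches 50 pages and `value <= b` touches 51, for 101 page loads in total (the window's page is loaded by both searches), whereas the closed range `[a, b]` touches 1 page.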

Query latency:

Scenario	Before (two half-open)	After (merged)	Speedup
In-memory (CPU/decode bound)	18.6 ms/q	0.28 ms/q	~67x
Per-page latency (2 ms/GET)	76.7 ms/q	10.8 ms/q	~7x

In-memory, the ~67x reflects the decode/scan cost of touching the whole index vs a single page. On remote storage the wall-clock gain depends on whether the workload is latency-bound (page fetches are parallelized, so the gain is smaller) or bandwidth-bound (scales with the ~100x reduction in bytes read).

Test

test_optimize_* suite in expression.rs covers the merge logic (nested AND trees, exclusive bounds, no-merge cases for different indices / non-range queries, OR/NOT preservation, recheck propagation).

…index load

When a filter like 'fqdn = x AND log_time >= A AND log_time <= B AND channel = y' is evaluated, the expression compiler splits the log_time range into two separate half-open range queries (>= A with Unbounded upper, <= B with Unbounded lower). Each half-open range matches nearly all BTree pages, causing the entire index to be loaded (~433 MB, ~100s) even though only a few pages (~5 MB) are actually needed.

This fix adds a ScalarIndexExpr::optimize() pass that:

1. Flattens the AND expression tree to collect all leaf queries
2. Identifies Range queries on the same index
3. Merges them into a single closed-range query with tighter bounds
4. Rebuilds the AND tree with the merged result

The optimize() pass is called in ScalarIndexExec::new() before execution.

Expected improvement: queries combining BTree range filters with other index filters (Bitmap) should see ~20x reduction in index I/O on first query.

codecov · 2026-06-25T09:43:05Z

Codecov Report

❌ Patch coverage is 93.75951% with 41 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
rust/lance-index/src/scalar/expression.rs	93.75%	38 Missing and 3 partials ⚠️


github-actions Bot added A-index Vector index, linalg, tokenizer performance labels Jun 25, 2026

xloya wants to merge 1 commit into lance-format:main from xloya:upstream-pr/btree-range-merge
