perf: merge half-open range queries on the same BTree index #7477
Open
xloya wants to merge 1 commit into
…index load

When a filter like 'fqdn = x AND log_time >= A AND log_time <= B AND channel = y' is evaluated, the expression compiler splits the log_time range into two separate half-open range queries (>= A with Unbounded upper, <= B with Unbounded lower). Each half-open range matches nearly all BTree pages, causing the entire index to be loaded (~433 MB, ~100 s) even though only a few pages (~5 MB) are actually needed.

This fix adds a ScalarIndexExpr::optimize() pass that:

1. Flattens the AND expression tree to collect all leaf queries
2. Identifies Range queries on the same index
3. Merges them into a single closed-range query with tighter bounds
4. Rebuilds the AND tree with the merged result

The optimize() pass is called in ScalarIndexExec::new() before execution.

Expected improvement: queries combining BTree range filters with other index filters (Bitmap) should see ~20x reduction in index I/O on first query.
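The four steps in the commit message above could be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the `Query` and `Expr` types, the `tighter_lower` / `tighter_upper` helpers, and the `i64` key type are all hypothetical stand-ins, and the real `ScalarIndexExpr` carries more state (for example recheck flags) that is ignored here.

```rust
use std::collections::HashMap;
use std::ops::Bound;

// Hypothetical, simplified stand-ins for the real query and expression types.
#[derive(Debug, Clone, PartialEq)]
enum Query {
    Range(Bound<i64>, Bound<i64>), // (lower, upper)
    Equals(String),
}

#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Leaf { index: String, query: Query },
    And(Box<Expr>, Box<Expr>),
    Or(Box<Expr>, Box<Expr>),
    Not(Box<Expr>),
}

// More restrictive of two lower bounds; on equal values the exclusive one wins.
fn tighter_lower(a: Bound<i64>, b: Bound<i64>) -> Bound<i64> {
    use std::ops::Bound::*;
    match (a, b) {
        (Unbounded, x) | (x, Unbounded) => x,
        (Included(x), Included(y)) => Included(x.max(y)),
        (Excluded(x), Excluded(y)) => Excluded(x.max(y)),
        (Included(i), Excluded(e)) | (Excluded(e), Included(i)) => {
            if e >= i { Excluded(e) } else { Included(i) }
        }
    }
}

// More restrictive of two upper bounds; on equal values the exclusive one wins.
fn tighter_upper(a: Bound<i64>, b: Bound<i64>) -> Bound<i64> {
    use std::ops::Bound::*;
    match (a, b) {
        (Unbounded, x) | (x, Unbounded) => x,
        (Included(x), Included(y)) => Included(x.min(y)),
        (Excluded(x), Excluded(y)) => Excluded(x.min(y)),
        (Included(i), Excluded(e)) | (Excluded(e), Included(i)) => {
            if e <= i { Excluded(e) } else { Included(i) }
        }
    }
}

// Step 1: collect the conjuncts of a (possibly nested) AND tree, in order.
fn flatten_and(expr: Expr, out: &mut Vec<Expr>) {
    match expr {
        Expr::And(l, r) => {
            flatten_and(*l, out);
            flatten_and(*r, out);
        }
        other => out.push(optimize(other)),
    }
}

fn optimize(expr: Expr) -> Expr {
    match expr {
        Expr::And(..) => {
            let mut conjuncts = Vec::new();
            flatten_and(expr, &mut conjuncts);
            let mut merged: Vec<Expr> = Vec::new();
            // Index name -> position in `merged` of that index's first Range leaf.
            let mut first_range: HashMap<String, usize> = HashMap::new();
            for c in conjuncts {
                // Step 2: look for Range leaves that share an index.
                if let Expr::Leaf { index, query: Query::Range(lo, hi) } = &c {
                    if let Some(&pos) = first_range.get(index) {
                        // Step 3: intersect into the range already recorded.
                        if let Expr::Leaf { query: Query::Range(l0, h0), .. } = &mut merged[pos] {
                            *l0 = tighter_lower(*l0, *lo);
                            *h0 = tighter_upper(*h0, *hi);
                        }
                        continue;
                    }
                    first_range.insert(index.clone(), merged.len());
                }
                merged.push(c);
            }
            // Step 4: rebuild a left-deep AND tree from what remains.
            merged
                .into_iter()
                .reduce(|l, r| Expr::And(Box::new(l), Box::new(r)))
                .expect("an AND has at least one conjunct")
        }
        // OR and NOT are preserved; only their children are optimized.
        Expr::Or(l, r) => Expr::Or(Box::new(optimize(*l)), Box::new(optimize(*r))),
        Expr::Not(e) => Expr::Not(Box::new(optimize(*e))),
        leaf => leaf,
    }
}
```

In this sketch, calling `optimize` on the filter from the commit message turns the two `log_time` leaves into one `Range(Included(A), Included(B))` leaf and leaves the `fqdn` and `channel` leaves untouched. Ranges under an OR are deliberately not merged, since intersecting bounds is only valid across conjuncts.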
Problem
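A filter like `fqdn = x AND log_time >= A AND log_time <= B AND channel = y` is compiled into two half-open range queries on the same BTree index (`log_time >= A` with an unbounded upper bound, and `log_time <= B` with an unbounded lower bound). Each half-open range matches nearly all BTree pages, so the entire index is loaded even though only the pages inside `[A, B]` are needed.
Fix
Add a `ScalarIndexExpr::optimize()` pass that:
1. Flattens the AND expression tree to collect all leaf queries.
2. Identifies Range queries on the same index.
3. Merges them into a single closed-range query with tighter bounds.
4. Rebuilds the AND tree with the merged result.
It is called from `ScalarIndexExec::new()` before execution.
Benchmark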
Method: build a real BTree index over 100,000 sorted unique `i32` values, split into ~100 pages (`batch_size = 1000`). Query a narrow 50-value window in the middle of the range. Measure both (a) the number of BTree pages loaded (via `LocalMetricsCollector.parts_loaded`) and (b) query wall-clock time, for:
1. the two half-open ranges `value >= a` (unbounded upper) and `value <= b` (unbounded lower);
2. the single closed range `[a, b]` produced by `optimize()`.
Pages loaded:
≈ 101x fewer index pages loaded for this query shape.
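To see where a ratio of this size can come from, here is a back-of-the-envelope model of the page counts. It is not the benchmark harness: the `pages_touched` function, the assumption that the values are exactly `0..100_000`, and the window `[50_000, 50_049]` are all made up for illustration, and a page counts as loaded whenever its value span overlaps the queried range.

```rust
/// Count pages whose [min, max] value span overlaps the inclusive range
/// [lo, hi]. `None` means unbounded on that side. Assumes the index holds
/// the sorted values 0..num_values, packed `page_size` per page.
fn pages_touched(num_values: i32, page_size: i32, lo: Option<i32>, hi: Option<i32>) -> usize {
    (0..num_values)
        .step_by(page_size as usize)
        .filter(|&page_min| {
            let page_max = (page_min + page_size - 1).min(num_values - 1);
            lo.map_or(true, |lo| page_max >= lo) && hi.map_or(true, |hi| page_min <= hi)
        })
        .count()
}

/// Page loads for the unoptimized plan (two half-open ranges evaluated
/// separately) and for the merged closed range.
fn compare(num_values: i32, page_size: i32, a: i32, b: i32) -> (usize, usize) {
    let half_open = pages_touched(num_values, page_size, Some(a), None)
        + pages_touched(num_values, page_size, None, Some(b));
    let closed = pages_touched(num_values, page_size, Some(a), Some(b));
    (half_open, closed)
}
```

Under these assumptions, `compare(100_000, 1000, 50_000, 50_049)` gives 50 + 51 = 101 page loads for the two half-open ranges against 1 for the closed range. The page containing the window is counted by both half-open queries, which is why the total exceeds the 100 pages in the model.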
Query latency:
In memory, the ~67x latency difference reflects the decode/scan cost of touching the whole index versus a single page. On remote storage, the wall-clock gain depends on whether the workload is latency-bound (page fetches are parallelized, so the gain is smaller) or bandwidth-bound (the gain scales with the ~100x reduction in bytes read).
Test
The `test_optimize_*` suite in `expression.rs` covers the merge logic (nested AND trees, exclusive bounds, no-merge cases for different indices / non-range queries, OR/NOT preservation, recheck propagation).