perf(fts): prewarm larger chunks concurrently by BubbleCal · Pull Request #7436 · lance-format/lance

BubbleCal · 2026-06-24T04:00:31Z

Performance Improvement

What is the performance issue or bottleneck?

FTS prewarm was reading posting lists in very small chunks. For large FTS indexes this can create thousands of range reads, so remote-read scheduling and per-chunk build overhead dominate prewarm time.

How does this PR improve performance?

This PR uses larger bounded chunks and bounded read/build concurrency:

Increase the default prewarm chunk target from 32 MiB to 128 MiB.
Increase the token cap from 4,096 tokens to 256K tokens.
Read/build chunks concurrently within the current posting partition, using the store I/O parallelism as the concurrency limit.
Prewarm posting partitions serially to avoid multiplying partition-level and chunk-level fanout.
Preserve group-aligned chunk boundaries and with_position=true cache behavior.
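
As a rough illustration of how a byte target and a token cap can bound each chunk together, here is a minimal sketch. It is not the code from this PR: `Chunk`, `plan_chunks`, and both constants are hypothetical names, "256K" is read as 262,144, and the group-aligned boundary handling mentioned above is left out.

```rust
/// Hypothetical defaults mirroring the numbers in the PR description.
const TARGET_CHUNK_BYTES: u64 = 128 * 1024 * 1024; // 128 MiB
const MAX_TOKENS_PER_CHUNK: usize = 256 * 1024; // "256K", assumed to mean 262,144

/// Half-open range of token indices `[start, end)` covered by one chunk.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Chunk {
    start: usize,
    end: usize,
    bytes: u64,
}

/// Greedily packs consecutive posting lists into chunks so that each chunk
/// stays within `target_bytes` and `max_tokens`. A single posting list that
/// is larger than `target_bytes` still gets a chunk of its own.
fn plan_chunks(posting_sizes: &[u64], target_bytes: u64, max_tokens: usize) -> Vec<Chunk> {
    let mut chunks = Vec::new();
    let mut start = 0usize;
    let mut bytes = 0u64;
    for (i, &size) in posting_sizes.iter().enumerate() {
        let tokens = i - start;
        // Close the current chunk if adding this list would break either bound.
        if tokens > 0 && (bytes + size > target_bytes || tokens >= max_tokens) {
            chunks.push(Chunk { start, end: i, bytes });
            start = i;
            bytes = 0;
        }
        bytes += size;
    }
    if start < posting_sizes.len() {
        chunks.push(Chunk { start, end: posting_sizes.len(), bytes });
    }
    chunks
}

/// Plans chunks with the hypothetical defaults above.
fn plan_default_chunks(posting_sizes: &[u64]) -> Vec<Chunk> {
    plan_chunks(posting_sizes, TARGET_CHUNK_BYTES, MAX_TOKENS_PER_CHUNK)
}
```

With larger bounds, the same set of posting lists collapses into fewer chunks, which is the effect the first two items aim for: fewer range reads and less per-chunk build overhead.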

This keeps prewarm bounded without using a whole-file fast path or runtime memory-budget probing.
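
The scheduling shape described above (partitions one at a time, chunks of the current partition in parallel up to a fixed limit) can be sketched with the standard library alone. This is an assumption-laden illustration, not the PR's implementation: `prewarm_partition`, `prewarm_all`, and the `load` callback are hypothetical, and OS threads stand in for whatever async machinery the index code actually uses.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

/// Runs `load(chunk_index)` for every chunk of one partition, with at most
/// `io_parallelism` calls in flight. Returns the sum of the values `load`
/// returns (for example, the number of chunks loaded).
fn prewarm_partition<F>(num_chunks: usize, io_parallelism: usize, load: &F) -> usize
where
    F: Fn(usize) -> usize + Sync,
{
    let next = AtomicUsize::new(0); // next chunk index to claim
    let total = AtomicUsize::new(0);
    // The worker count is the concurrency bound; always spawn at least one.
    let workers = io_parallelism.max(1).min(num_chunks.max(1));
    thread::scope(|s| {
        for _ in 0..workers {
            s.spawn(|| loop {
                let i = next.fetch_add(1, Ordering::Relaxed);
                if i >= num_chunks {
                    break;
                }
                total.fetch_add(load(i), Ordering::Relaxed);
            });
        }
    });
    total.load(Ordering::Relaxed)
}

/// Prewarms partitions serially so that partition-level fanout does not
/// multiply chunk-level fanout. `chunks_per_partition[p]` is the number of
/// chunks in partition `p`.
fn prewarm_all<F>(chunks_per_partition: &[usize], io_parallelism: usize, load: &F) -> usize
where
    F: Fn(usize, usize) -> usize + Sync,
{
    let mut done = 0;
    for (p, &n) in chunks_per_partition.iter().enumerate() {
        done += prewarm_partition(n, io_parallelism, &|c| load(p, c));
    }
    done
}
```

Because only one partition is active at a time, the number of chunk reads in flight never exceeds `io_parallelism`, and peak memory scales with that limit times the chunk size rather than with the number of partitions.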

Benchmark or measurement results:

On the large FTS prewarm benchmark used for validation:

Old main: 434.200s.
This PR: 35.216s and 35.165s, mean 35.191s.
About 12.34x faster than old main.
Prewarm wall time reduced by about 91.9%.
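
For readers who want to re-derive the summary figures from the raw timings, a small sketch follows. The helper names are hypothetical; the inputs are the three wall-clock times listed above.

```rust
/// Arithmetic mean of a non-empty slice of timings, in seconds.
fn mean(xs: &[f64]) -> f64 {
    xs.iter().sum::<f64>() / xs.len() as f64
}

/// How many times faster `new` is than `baseline`.
fn speedup(baseline: f64, new: f64) -> f64 {
    baseline / new
}

/// Percentage of wall time removed relative to `baseline`.
fn reduction_pct(baseline: f64, new: f64) -> f64 {
    (1.0 - new / baseline) * 100.0
}

/// Recomputes (mean, speedup, reduction %) from the timings in the PR text.
fn benchmark_summary() -> (f64, f64, f64) {
    let old_main = 434.200;
    let this_pr = mean(&[35.216, 35.165]);
    (this_pr, speedup(old_main, this_pr), reduction_pct(old_main, this_pr))
}
```

The two runs average 35.1905 s, which the description rounds to 35.191 s; 434.200 / 35.1905 rounds to 12.34, and the reduction rounds to 91.9%, so the reported figures are consistent with each other.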

Validation

cargo fmt --all
git diff --check
cargo test -p lance-index prewarm -- --nocapture
cargo clippy --all --tests --benches -- -D warnings

codecov · 2026-06-24T04:41:03Z

Codecov Report

❌ Patch coverage is 96.63866% with 4 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
rust/lance-index/src/scalar/inverted/index.rs	96.63%	3 Missing and 1 partial ⚠️

perf(fts): avoid repeated group scans in prewarm

51357b7

github-actions Bot added the performance and A-index (Vector index, linalg, tokenizer) labels and removed the performance label Jun 24, 2026

perf(fts): restore budgeted whole-file prewarm

a8157f8

github-actions Bot added the performance label Jun 24, 2026

BubbleCal changed the title ~~perf(fts): avoid repeated group scans in prewarm~~ perf(fts): restore budgeted whole-file prewarm Jun 24, 2026

perf(fts): prewarm larger chunks concurrently

4eaaac1

BubbleCal changed the title ~~perf(fts): restore budgeted whole-file prewarm~~ perf(fts): prewarm larger chunks concurrently Jun 24, 2026

perf(fts): avoid collecting prewarm chunk results

1675891

BubbleCal force-pushed the yang/fix-fts-prewarm-group-scan branch from 822b129 to 1675891 June 24, 2026 15:18

BubbleCal added 3 commits June 25, 2026 14:01

perf(fts): prewarm partitions serially

42b94b5

perf(fts): increase prewarm chunk concurrency

d5cb8d4

perf(fts): use store io parallelism for prewarm chunks

6fe64e1

BubbleCal marked this pull request as ready for review June 25, 2026 06:50

Xuanwo approved these changes Jun 25, 2026

BubbleCal merged commit ae8725e into main Jun 25, 2026
38 of 39 checks passed

BubbleCal deleted the yang/fix-fts-prewarm-group-scan branch June 25, 2026 08:23

BubbleCal merged 7 commits into main from yang/fix-fts-prewarm-group-scan


2 participants
