perf(fts): prewarm larger chunks concurrently #7436

Merged
BubbleCal merged 7 commits into main from yang/fix-fts-prewarm-group-scan on Jun 25, 2026

@BubbleCal BubbleCal commented Jun 24, 2026

Contributor

Performance Improvement

What is the performance issue or bottleneck?

FTS prewarm was reading posting lists in very small chunks. For large FTS indexes this can create thousands of range reads, so remote-read scheduling and per-chunk build overhead dominate prewarm time.

How does this PR improve performance?

This PR uses larger bounded chunks and bounded read/build concurrency:

  • Increase the default prewarm chunk target from 32 MiB to 128 MiB.
  • Increase the token cap from 4,096 tokens to 256K tokens.
  • Read/build chunks concurrently within the current posting partition, using the store I/O parallelism as the concurrency limit.
  • Prewarm posting partitions serially to avoid multiplying partition-level and chunk-level fanout.
  • Preserve group-aligned chunk boundaries and with_position=true cache behavior.
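
The chunking rules in the list above can be illustrated with a minimal sketch. It assumes a greedy planner over consecutive posting groups; `GroupInfo`, `plan_chunks`, and both constants are hypothetical names for illustration, not the identifiers used in `rust/lance-index/src/scalar/inverted/index.rs`. "256K tokens" is read here as 262,144, which is also an assumption.

```rust
use std::ops::Range;

/// Hypothetical summary of one posting group: bytes on disk and token count.
#[derive(Debug, Clone, Copy)]
pub struct GroupInfo {
    pub bytes: u64,
    pub tokens: u64,
}

/// 128 MiB chunk target, as stated in the PR description.
pub const TARGET_CHUNK_BYTES: u64 = 128 * 1024 * 1024;
/// "256K tokens", interpreted as 256 * 1024 (assumption).
pub const MAX_CHUNK_TOKENS: u64 = 256 * 1024;

/// Greedily packs consecutive groups into chunks of group indices.
/// A chunk is closed before a group would push it past either limit.
/// Groups are never split, so chunk boundaries stay group-aligned and a
/// single oversized group simply becomes its own chunk.
pub fn plan_chunks(
    groups: &[GroupInfo],
    target_bytes: u64,
    max_tokens: u64,
) -> Vec<Range<usize>> {
    let mut chunks = Vec::new();
    let mut start = 0usize;
    let mut bytes = 0u64;
    let mut tokens = 0u64;

    for (i, g) in groups.iter().enumerate() {
        let chunk_is_non_empty = i > start;
        let over_budget =
            bytes + g.bytes > target_bytes || tokens + g.tokens > max_tokens;
        if chunk_is_non_empty && over_budget {
            chunks.push(start..i);
            start = i;
            bytes = 0;
            tokens = 0;
        }
        bytes += g.bytes;
        tokens += g.tokens;
    }
    if start < groups.len() {
        chunks.push(start..groups.len());
    }
    chunks
}
```

Calling `plan_chunks(&groups, TARGET_CHUNK_BYTES, MAX_CHUNK_TOKENS)` returns index ranges into `groups`; each range would map to one range read. Larger limits mean fewer ranges, which is the effect the PR relies on.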

This keeps prewarm bounded without using a whole-file fast path or runtime memory-budget probing.
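
For reviewers who want to see the fanout shape, here is a rough sketch of "serial partitions, bounded chunk concurrency". It swaps in scoped OS threads from the standard library so the example is self-contained; the actual change presumably uses async I/O, and `prewarm_partitions`, `load`, and the `u64` chunk ids are all hypothetical stand-ins.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Mutex;
use std::thread;

/// Runs `load` on every chunk of every partition and returns the results
/// in input order.
///
/// Partitions are processed one after another. Within a partition, at most
/// `io_parallelism` chunks are in flight, so total fanout is bounded by the
/// chunk-level limit alone rather than partitions * chunks.
pub fn prewarm_partitions<F>(
    partitions: &[Vec<u64>],
    io_parallelism: usize,
    load: F,
) -> Vec<Vec<u64>>
where
    F: Fn(u64) -> u64 + Sync,
{
    let workers = io_parallelism.max(1);
    let mut out = Vec::with_capacity(partitions.len());

    // Serial loop over partitions: the next one starts only after the
    // scope below has joined all workers of the current one.
    for chunks in partitions {
        let next = AtomicUsize::new(0);
        let results = Mutex::new(vec![0u64; chunks.len()]);

        thread::scope(|s| {
            for _ in 0..workers.min(chunks.len()) {
                s.spawn(|| loop {
                    // Each worker claims the next unprocessed chunk index.
                    let i = next.fetch_add(1, Ordering::Relaxed);
                    if i >= chunks.len() {
                        break;
                    }
                    let built = load(chunks[i]);
                    results.lock().unwrap()[i] = built;
                });
            }
        });

        out.push(results.into_inner().unwrap());
    }
    out
}
```

The worker count is the only knob: passing the store's I/O parallelism caps the number of concurrent read/build steps regardless of how many partitions the index has.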

Benchmark or measurement results:

On the large FTS prewarm benchmark used for validation:

  • Old main: 434.200s.
  • This PR: 35.216s and 35.165s, mean 35.191s.
  • About 12.34x faster than old main.
  • Prewarm wall time reduced by about 91.9%.
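
The derived figures above follow directly from the three timings. A small sketch of the arithmetic, using hypothetical helper names, in case anyone wants to recompute them with additional runs:

```rust
/// Arithmetic mean of a non-empty slice of timings (seconds).
pub fn mean(xs: &[f64]) -> f64 {
    xs.iter().sum::<f64>() / xs.len() as f64
}

/// How many times faster `new_secs` is compared with `old_secs`.
pub fn speedup(old_secs: f64, new_secs: f64) -> f64 {
    old_secs / new_secs
}

/// Wall-time reduction as a percentage of `old_secs`.
pub fn reduction_pct(old_secs: f64, new_secs: f64) -> f64 {
    (1.0 - new_secs / old_secs) * 100.0
}
```

With `old = 434.200` and `new = mean(&[35.216, 35.165])`, `speedup(old, new)` and `reduction_pct(old, new)` agree with the reported 12.34x and 91.9% after rounding.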

Validation

  • cargo fmt --all
  • git diff --check
  • cargo test -p lance-index prewarm -- --nocapture
  • cargo clippy --all --tests --benches -- -D warnings

@github-actions github-actions Bot added the performance and A-index (Vector index, linalg, tokenizer) labels and removed the performance label Jun 24, 2026

codecov Bot commented Jun 24, 2026


Codecov Report

❌ Patch coverage is 96.63866% with 4 lines in your changes missing coverage. Please review.

Files with missing lines | Patch % | Lines
rust/lance-index/src/scalar/inverted/index.rs | 96.63% | 3 Missing and 1 partial ⚠️


@BubbleCal BubbleCal changed the title from "perf(fts): avoid repeated group scans in prewarm" to "perf(fts): restore budgeted whole-file prewarm" Jun 24, 2026
@BubbleCal BubbleCal changed the title from "perf(fts): restore budgeted whole-file prewarm" to "perf(fts): prewarm larger chunks concurrently" Jun 24, 2026
@BubbleCal BubbleCal force-pushed the yang/fix-fts-prewarm-group-scan branch from 822b129 to 1675891 June 24, 2026 15:18
@BubbleCal BubbleCal marked this pull request as ready for review June 25, 2026 06:50
@BubbleCal BubbleCal merged commit ae8725e into main Jun 25, 2026
38 of 39 checks passed
@BubbleCal BubbleCal deleted the yang/fix-fts-prewarm-group-scan branch June 25, 2026 08:23
Labels

A-index (Vector index, linalg, tokenizer), performance

2 participants