
feat(fts): add configurable posting block size#7466

Draft
BubbleCal wants to merge 3 commits into main from yang/oss-1344-make-fts-index-block-size-configurable

@BubbleCal BubbleCal commented Jun 25, 2026

Contributor

Feature

Linear: OSS-1344

What is the new feature?

FTS inverted index creation now accepts a block_size parameter for compressed posting blocks. Supported values are 128 and 256.

Why do we need this feature?

The posting block size was previously fixed at 128, which made the block-max granularity impossible to tune for different datasets and query profiles.
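To make "block-max granularity" concrete, here is a minimal sketch in Python of how the block size changes the per-block upper bounds that a WAND-style scorer can use for skipping. It is an illustration on synthetic scores, not the Lance implementation; the function names `block_max_scores` and `skippable_postings` are hypothetical.

```python
from typing import List


def block_max_scores(scores: List[float], block_size: int) -> List[float]:
    """Split per-posting scores into consecutive blocks and keep each block's maximum."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return [
        max(scores[start:start + block_size])
        for start in range(0, len(scores), block_size)
    ]


def skippable_postings(scores: List[float], block_size: int, threshold: float) -> int:
    """Count postings that sit in blocks whose maximum score is below the threshold."""
    skipped = 0
    for start in range(0, len(scores), block_size):
        block = scores[start:start + block_size]
        if max(block) < threshold:
            skipped += len(block)
    return skipped


# Synthetic posting list: 512 low-scoring postings with one high-scoring outlier.
scores = [0.5] * 512
scores[200] = 2.0

for size in (128, 256):
    maxima = block_max_scores(scores, size)
    skipped = skippable_postings(scores, size, threshold=1.0)
    print(f"block_size={size}: {len(maxima)} block maxima, {skipped} postings skippable")
```

On this synthetic input, a block size of 128 stores four block maxima and lets 384 postings be skipped, while 256 stores two maxima and lets 256 be skipped. Smaller blocks give tighter bounds at the cost of more per-block metadata; which side wins on real data depends on the dataset and query profile, which is why the PR makes the value tunable.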

How does it work?

  • Adds block_size to InvertedIndexParams, protobuf details, posting-list schema metadata, and cache headers.
  • Uses 256 as the default for newly created indexes.
  • Treats older serialized params, schema metadata, and cache entries that omit block_size as legacy 128.
  • Rejects unsupported values, including 512, with a clear validation error.
  • Threads the configured block size through FTS build, read, iterator, WAND, cache, and MemWAL flush paths.
  • Exposes the parameter in Python and Java FTS index creation APIs, with docs and focused tests.
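The defaulting and validation rules listed above can be sketched as follows. This is a minimal illustration in Python, not the PR's Rust code; the names `validate_block_size`, `block_size_for_new_index`, and `block_size_from_metadata`, the constant names, and the use of a plain string-to-string mapping to stand in for serialized metadata are all assumptions made for the example.

```python
from typing import Mapping, Optional

# Values come from the PR description; the constant names are hypothetical.
SUPPORTED_BLOCK_SIZES = (128, 256)
LEGACY_BLOCK_SIZE = 128   # implied when stored metadata omits block_size
DEFAULT_BLOCK_SIZE = 256  # used for newly created indexes


def validate_block_size(block_size: int) -> int:
    """Return block_size unchanged, or raise ValueError if it is unsupported."""
    if block_size not in SUPPORTED_BLOCK_SIZES:
        raise ValueError(
            f"unsupported block_size {block_size}; "
            f"supported values are {list(SUPPORTED_BLOCK_SIZES)}"
        )
    return block_size


def block_size_for_new_index(requested: Optional[int] = None) -> int:
    """Pick the block size for a new index: the new default unless the caller asks."""
    if requested is None:
        return DEFAULT_BLOCK_SIZE
    return validate_block_size(requested)


def block_size_from_metadata(metadata: Mapping[str, str]) -> int:
    """Read the block size of an existing index; a missing key means a legacy index."""
    raw = metadata.get("block_size")
    if raw is None:
        return LEGACY_BLOCK_SIZE
    return validate_block_size(int(raw))
```

Keeping the "new index" default (256) separate from the "missing on read" fallback (128) is the point of the sketch: an index written before this change carries no `block_size` and must keep decoding with 128-entry blocks, even though new indexes default to 256.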

Validation

  • cargo fmt --all
  • cargo fmt --all --check
  • git diff --check
  • CARGO_TARGET_DIR=/tmp/lance-target-a479-no512 cargo test -p lance-index block_size -- --nocapture
  • CARGO_TARGET_DIR=/tmp/lance-target-a479-no512 cargo clippy -p lance-index --tests -- -D warnings
  • uv run make build (run from python/)
  • uv run pytest python/tests/test_scalar_index.py::test_create_scalar_index_fts_block_size (run from python/)
  • uv run ruff format --check python/tests/test_scalar_index.py python/lance/dataset.py (run from python/)
  • uv run ruff check python/tests/test_scalar_index.py python/lance/dataset.py (run from python/)

Not run locally: the focused Java test and the spotless check, because this machine has no Java runtime installed ("Unable to locate a Java Runtime").

@github-actions

Contributor

Important

This PR touches the Lance format specification.

Substantive changes to the format specification — the .proto definitions
and the spec docs under docs/src/format/ — require a PMC vote before merge.
Minor edits such as typo fixes, wording, or formatting are excluded; use your
judgment.

If this is a meaningful format change:

  • Start a vote following the Lance community voting process.
    Format specification modifications need 3 binding +1 votes (excluding the
    proposer), held on GitHub Discussions, with a minimum voting period of 1 week.
  • Once the vote passes, link the completed vote in this PR. It should not be
    merged until the vote is linked.

@github-actions (bot) added the following labels on Jun 25, 2026:

  • A-python: Python bindings
  • A-index: Vector index, linalg, tokenizer
  • A-java: Java bindings + JNI
  • A-format: On-disk format: protos and format spec docs
  • enhancement: New feature or request

@BubbleCal force-pushed the yang/oss-1344-make-fts-index-block-size-configurable branch from dd4ac88 to d9f0acb on June 25, 2026 07:09
@BubbleCal force-pushed the yang/oss-1344-make-fts-index-block-size-configurable branch from d9f0acb to 23e4810 on June 25, 2026 09:13
@BubbleCal force-pushed the yang/oss-1344-make-fts-index-block-size-configurable branch from 23e4810 to 059ae90 on June 25, 2026 09:23