fix(index): preserve schema metadata when re-serializing loaded HNSW #7476
Open
yanghua wants to merge 1 commit into
Summary
`HNSW::to_batch()` on a disk-loaded graph dropped all schema metadata except the HNSW key, breaking the IVF partition-cache round-trip with `target schema is not superset of current schema`.

When an HNSW partition is loaded from disk, its `RecordBatch` schema inherits metadata keys from the index file (`INDEX_METADATA_SCHEMA_KEY`, `IVF_METADATA_KEY`, …). The `Loaded` branch of `to_batch()` replaced the schema metadata with a single-key map (`HNSW_METADATA_KEY` only), so when `RecordBatch::with_schema` ran its superset check, the new schema was not a superset of the original and the call errored out. This affects any `PartitionEntry<HNSW, *>` (all quantizers) that re-serializes a loaded index through the partition cache.

The fix clones the existing schema metadata and inserts/updates the HNSW key instead of building a fresh single-key map, which guarantees the new metadata is always a superset.
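
The difference between replacing and merging can be sketched with plain `HashMap`s. This is a simplified model, not the code in `builder.rs`: the `Metadata` alias, the `HNSW_KEY` value, and the functions `is_superset`, `replace_metadata`, and `merge_metadata` are hypothetical names introduced for illustration, and the check is assumed to compare both keys and values.

```rust
use std::collections::HashMap;

/// Simplified stand-in for schema-level metadata (string key/value pairs).
type Metadata = HashMap<String, String>;

/// Hypothetical key name, for illustration only.
const HNSW_KEY: &str = "hnsw";

/// Models a superset check: every key/value pair in `current`
/// must also be present, with the same value, in `target`.
fn is_superset(target: &Metadata, current: &Metadata) -> bool {
    current.iter().all(|(k, v)| target.get(k) == Some(v))
}

/// Models the old behaviour: build a fresh single-key map,
/// discarding whatever metadata the loaded batch carried.
fn replace_metadata(_existing: &Metadata, hnsw_value: &str) -> Metadata {
    HashMap::from([(HNSW_KEY.to_string(), hnsw_value.to_string())])
}

/// Models the fixed behaviour: clone the existing metadata,
/// then insert or update only the HNSW key.
fn merge_metadata(existing: &Metadata, hnsw_value: &str) -> Metadata {
    let mut merged = existing.clone();
    merged.insert(HNSW_KEY.to_string(), hnsw_value.to_string());
    merged
}
```

With an input that carries two extra keys, `replace_metadata` yields a map that fails `is_superset` against the original, while `merge_metadata` yields one that passes and has one additional entry.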
Changes
- `rust/lance-index/src/vector/hnsw/builder.rs`: in `to_batch()`'s `Loaded` branch, merge `HNSW_METADATA_KEY` into the retained batch's schema metadata rather than replacing it wholesale.
- `test_to_batch_loaded_preserves_extra_schema_metadata`: simulates the disk-load path by seeding extra schema metadata keys, then asserts `to_batch()` round-trips and preserves them.
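
The shape of such a test can be sketched without the real index types. Everything below is a stand-in: `LoadedBatch`, `to_batch`, `seeded_batch`, the `HNSW_KEY` value, and the seeded key names are hypothetical, and only the schema metadata is modelled.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a loaded batch; only schema metadata is modelled.
#[derive(Clone, Debug)]
struct LoadedBatch {
    metadata: HashMap<String, String>,
}

/// Hypothetical key name; the real constant's value is not shown in this PR.
const HNSW_KEY: &str = "hnsw";

/// Models re-serialization: merge the HNSW key into the retained metadata,
/// then check that every original key/value pair is still present.
fn to_batch(loaded: &LoadedBatch, hnsw_value: &str) -> Result<LoadedBatch, String> {
    let mut metadata = loaded.metadata.clone();
    metadata.insert(HNSW_KEY.to_string(), hnsw_value.to_string());
    let is_superset = loaded
        .metadata
        .iter()
        .all(|(k, v)| metadata.get(k) == Some(v));
    if !is_superset {
        return Err("target schema is not superset of current schema".to_string());
    }
    Ok(LoadedBatch { metadata })
}

/// Seeds extra keys, imitating metadata inherited from an index file.
/// The key names here are made up for the sketch.
fn seeded_batch() -> LoadedBatch {
    LoadedBatch {
        metadata: HashMap::from([
            ("extra_a".to_string(), "1".to_string()),
            ("extra_b".to_string(), "2".to_string()),
        ]),
    }
}
```

A test built on this would call `to_batch(&seeded_batch(), ...)`, assert that it returns `Ok`, and assert that both seeded keys survive alongside the HNSW key.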