feat(perf): content-keyed embedding cache to skip redundant per-turn embeds by github-actions[bot] · Pull Request #1748 · amd/gaia

github-actions · 2026-06-18T19:26:45Z

Every chat turn re-embedded the query from scratch, so identical text — the same recall(query=…) across turns, or hybrid search re-embedding input a tool call already embedded that turn — paid the Lemonade embed cost twice, adding latency and avoidable backend calls. This adds a content-keyed LRU cache so an identical embed is served from memory and makes zero backend calls.

The cache key is the content — (model_id, dim, sha256(text)) — so a hit is never stale and swapping the embedding model invalidates by construction. It's wired into the two per-turn embed sites (MemoryMixin._embed_text and RAG query encoding); stored memories and doc chunks already persist their vectors, so this targets repeated query embeds only and leaves indexing untouched.
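
To illustrate the keying scheme described above, here is a minimal sketch of a content-keyed LRU cache. It is not the code from this PR: the class name `EmbeddingCache`, the `get_or_embed` method, the `max_entries` default, and the `fake_embed` stand-in for the Lemonade backend are all hypothetical, chosen only to show how a `(model_id, dim, sha256(text))` key behaves.

```python
import hashlib
from collections import OrderedDict
from typing import Callable, List, Tuple

CacheKey = Tuple[str, int, str]


class EmbeddingCache:
    """Hypothetical content-keyed LRU cache (illustrative, not the PR's class)."""

    def __init__(self, max_entries: int = 256) -> None:
        self._max_entries = max_entries
        self._entries: "OrderedDict[CacheKey, List[float]]" = OrderedDict()

    @staticmethod
    def make_key(model_id: str, dim: int, text: str) -> CacheKey:
        # The key is derived only from the content and the model identity,
        # so the same text under a different model or dimension never collides.
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        return (model_id, dim, digest)

    def get_or_embed(
        self,
        model_id: str,
        dim: int,
        text: str,
        embed_fn: Callable[[str], List[float]],
    ) -> List[float]:
        key = self.make_key(model_id, dim, text)
        if key in self._entries:
            # Hit: mark as most recently used and skip the backend entirely.
            self._entries.move_to_end(key)
            return self._entries[key]
        # Miss: call the backend once, store the vector, evict the oldest entry.
        vector = embed_fn(text)
        self._entries[key] = vector
        if len(self._entries) > self._max_entries:
            self._entries.popitem(last=False)
        return vector


# Usage with a stand-in backend that records every call it receives.
calls: List[str] = []


def fake_embed(text: str) -> List[float]:
    calls.append(text)
    return [float(len(text))] * 4


cache = EmbeddingCache(max_entries=2)
first = cache.get_or_embed("model-a", 4, "hello", fake_embed)
second = cache.get_or_embed("model-a", 4, "hello", fake_embed)  # served from memory
third = cache.get_or_embed("model-b", 4, "hello", fake_embed)   # new model, new key
```

Because the model id and dimension are part of the key, no explicit invalidation step is needed when the embedding model changes; entries for the old model simply stop being looked up and age out of the LRU.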

Note: this does not by itself fix the NPU load loop (#1746) — a genuinely new query still embeds once.

Closes #1743

Test plan

`python -m pytest tests/unit/test_embedding_cache.py tests/unit/test_memory_mixin.py tests/unit/rag/ -q` passes
`python util/lint.py --all` passes (Black, isort, Pylint, Flake8 clean on the changed files)
New tests assert a second identical embed makes zero backend calls and that a model/dim change invalidates the entry
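
The two assertions in the test plan could be expressed roughly as follows. This is a sketch under stated assumptions, not the contents of `tests/unit/test_embedding_cache.py`: the helper `make_cached_embed`, the test names, and the use of a `Mock` as the backend are hypothetical.

```python
import hashlib
from unittest.mock import Mock


def make_cached_embed(backend, model_id, dim, store):
    """Hypothetical helper: wrap `backend` with a content-keyed lookup in `store`."""

    def embed(text):
        key = (model_id, dim, hashlib.sha256(text.encode("utf-8")).hexdigest())
        if key not in store:
            store[key] = backend(text)
        return store[key]

    return embed


def test_second_identical_embed_makes_zero_backend_calls():
    backend = Mock(return_value=[0.1, 0.2, 0.3])
    embed = make_cached_embed(backend, "model-a", 3, store={})
    embed("what did I say about deadlines?")
    backend.reset_mock()  # forget the first, expected call
    embed("what did I say about deadlines?")
    backend.assert_not_called()


def test_model_or_dim_change_invalidates_entry():
    store = {}
    backend = Mock(return_value=[0.1, 0.2, 0.3])
    make_cached_embed(backend, "model-a", 3, store)("query")
    make_cached_embed(backend, "model-b", 3, store)("query")  # model changed
    make_cached_embed(backend, "model-a", 4, store)("query")  # dim changed
    # Same text, but three distinct keys, so the backend is hit three times.
    assert backend.call_count == 3
```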

…embeds

Adds an LRU cache keyed on (model_id, dim, sha256(text)) so identical query text re-embedded across turns pays the Lemonade embed cost once. The key is the content, so a hit is never stale and a model swap invalidates by construction. Wired into MemoryMixin._embed_text and RAGSDK query encoding; stored memories and doc chunks already persist their vectors, so this targets repeated query embeds only.

Closes #1743

github-actions Bot requested a review from kovtcharov-amd as a code owner June 18, 2026 19:26

github-actions Bot mentioned this pull request Jun 18, 2026

feat(perf): content-keyed embedding cache to skip redundant per-turn embeds #1743

Open

2 tasks

kovtcharov-amd approved these changes Jun 18, 2026

Merge branch 'main' into autofix/issue-1743

f59467c

github-actions Bot added the rag, llm, tests, performance, and agents labels Jun 23, 2026

github-actions[bot] wants to merge 2 commits into main from autofix/issue-1743
