fix: pin Lemonade back to 10.2.0 (embedding regression on >= b6524)#1872

Closed
kovtcharov wants to merge 1 commit into main from fix/revert-lemonade-10.2.0

@kovtcharov

Collaborator

Why this matters

RAG embeddings are broken on the AMD NPU/GPU path. Lemonade 10.7.0/10.8.x bundle a llama.cpp build ≥ b6524, which crashes loading the embedding model nomic-embed-text-v2-moe on the Vulkan backend (llama-server failed to start). Test Lemonade Embeddings has been red since the 10.x bumps; 10.2.0 is the last version where it passes.

Upstream is unfixed and there's no working workaround:

  • llama.cpp #16301 — b6524 Vulkan regression, open, deprioritized by maintainers as niche.
  • lemonade #612 / #941 (open, downstream).
  • The maintainer's GGML_VK_DISABLE_COOPMAT=1 workaround is already applied in our embeddings CI job (test_embeddings.yml) and still fails on our Strix/Windows runners — so a downgrade is the only fix available to us.

10.2.0 is GAIA's documented min_lemonade_version floor (src/gaia/installer/init_command.py), so this pins to the floor, not below it. Port 13305 (introduced in 10.1.0) still applies, and no per-version checksums are pinned. version.py is the single source of truth; the C++ setup docs are updated in lock-step.
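
The "pin to the floor, not below it" claim can be checked mechanically. The following is a minimal sketch, assuming plain dotted numeric versions; `parse_version` and `pin_respects_floor` are hypothetical helpers written for illustration, and the two constants only mirror the values quoted in this PR rather than the actual contents of version.py or init_command.py.

```python
from __future__ import annotations

# Assumption: these constants mirror the values quoted in this PR. The real
# version.py / init_command.py may store them differently.
LEMONADE_VERSION = "10.2.0"       # the pin proposed here
MIN_LEMONADE_VERSION = "10.2.0"   # the documented floor


def parse_version(text: str) -> tuple[int, ...]:
    """Turn '10.2.0' (or 'v10.2.0') into (10, 2, 0) for numeric comparison."""
    return tuple(int(part) for part in text.strip().lstrip("v").split("."))


def pin_respects_floor(pinned: str, floor: str) -> bool:
    """Return True when the pinned version is at or above the floor."""
    return parse_version(pinned) >= parse_version(floor)


if __name__ == "__main__":
    print(pin_respects_floor(LEMONADE_VERSION, MIN_LEMONADE_VERSION))
```

Comparing integer tuples instead of raw strings avoids the lexicographic trap where "10.10.0" would sort below "10.2.0".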

Test plan

  • version.py + all hardcoded doc references moved 10.8.1 → 10.2.0 (no 10.8.1 remains outside lockfiles)
  • Test Lemonade Embeddings passes on this branch — the decisive check; manually dispatched against this branch to confirm 10.2.0 loads nomic-embed-text-v2-moe
  • RAG / API / agent Lemonade checks green
  • Follow-up (separate): once upstream llama.cpp/Lemonade ships a fix, bump forward again. This pairs with the CI-trigger gap fix (ci: run Lemonade-dependent checks when LEMONADE_VERSION changes #1871), so the next bump actually runs these checks.

Related: #1871 (makes a LEMONADE_VERSION bump trigger the full Lemonade test surface, so this class of regression is caught on the bump PR).

Lemonade 10.7.0/10.8.x bundle a llama.cpp build >= b6524, which crashes
loading the embedding model nomic-embed-text-v2-moe on the Vulkan backend
(AMD) — "llama-server failed to start". This breaks RAG embeddings on the
NPU/GPU path and has been red in CI since the 10.x bumps.

Upstream is unfixed: llama.cpp #16301 (b6524 Vulkan regression, open,
deprioritized) and lemonade #612 / #941 (open). The maintainer's
GGML_VK_DISABLE_COOPMAT=1 workaround is already applied in the embeddings
CI job and still fails on our Strix/Windows runners, so there is no
effective workaround — revert to the last known-good version.

10.2.0 is GAIA's documented min_lemonade_version floor, so this is a pin to
the floor, not below it. version.py is the single source of truth; the cpp
setup docs are updated in lock-step.
github-actions bot added the documentation and cpp labels on Jun 26, 2026
@github-actions

Contributor

Verdict: Approve — pending the one CI check that actually proves the fix.

This pins Lemonade back from 10.8.1 to 10.2.0 to fix RAG embeddings, which crash on the NPU/GPU Vulkan path with the llama.cpp build (≥ b6524) that 10.7.0/10.8.x bundle. It's a clean, well-scoped revert: version.py (the single source of truth) plus the three docs that hardcode an install version, moved in lockstep. The rationale and upstream-unfixed status are documented thoroughly.

The bottom line: the code change is correct, but the decisive proof — Test Lemonade Embeddings passing on this branch with 10.2.0 actually loading nomic-embed-text-v2-moe — is still an unchecked box in the test plan. That's the real merge gate here, not the diff. Hold the merge until that check is green; everything else verifies.

🔍 Technical details

Verification performed (claims hold up):

  • No 10.8.1 remains anywhere outside lockfiles — confirmed by repo-wide grep across .py/.mdx/.md/.yml/.toml/.json.
  • 10.2.0 is the documented floor: src/gaia/installer/init_command.py sets min_lemonade_version: "10.2.0" for all the relevant profiles, so this pins to the floor, not below it.
  • Single-source-of-truth intact: every runtime reference flows through LEMONADE_VERSION (version.py:12), including the NSIS installer define at version.py:79. Only the docs that hardcode a literal needed manual edits, and all three were updated.
  • docs/guides/npu.mdx:15 correctly reads v10.2.0+ (matches the floor semantics).
  • Other versioned doc refs (10.7.0, 10.1.0, 10.0.0, 10.0.2) are in docs/releases/v0.21.2.mdx and docs/plans/* — frozen changelog/planning content that should NOT track the pin. Correctly left untouched.
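
The "no 10.8.1 remains outside lockfiles" check above could be reproduced with a small script. This is a sketch under stated assumptions, not the command the reviewer actually ran: `find_stale_version` is a hypothetical helper, the suffix list copies the file types named in the first bullet, and the lockfile names are guesses at what a repository might contain.

```python
from __future__ import annotations

from pathlib import Path

# File types named in the verification bullet above.
TEXT_SUFFIXES = {".py", ".mdx", ".md", ".yml", ".toml", ".json"}
# Assumption: lockfiles are identified by file name; adjust to the real repo.
LOCKFILE_NAMES = {"package-lock.json", "uv.lock", "poetry.lock"}


def find_stale_version(root: Path, literal: str) -> list[tuple[Path, int]]:
    """Return (relative path, line number) for every line containing literal."""
    hits: list[tuple[Path, int]] = []
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix not in TEXT_SUFFIXES:
            continue
        relative = path.relative_to(root)
        if path.name in LOCKFILE_NAMES or ".git" in relative.parts:
            continue
        try:
            lines = path.read_text(encoding="utf-8").splitlines()
        except (UnicodeDecodeError, OSError):
            continue  # skip unreadable or non-UTF-8 files
        for number, line in enumerate(lines, start=1):
            if literal in line:
                hits.append((relative, number))
    return hits


if __name__ == "__main__":
    for rel_path, line_no in find_stale_version(Path("."), "10.8.1"):
        print(f"{rel_path}:{line_no}")
```

An empty result for "10.8.1" would match the reviewer's finding. Frozen release notes and plans mention other versions (10.7.0, 10.1.0, and so on), so a scan for those literals is expected to report hits that should stay untouched.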

🟢 Process note (not blocking): the test plan's decisive item (Test Lemonade Embeddings green on this branch) is unchecked. A version pin has no unit-testable logic, so that dispatched CI run is the only real evidence the regression is fixed — worth confirming green before merge rather than after.

Strengths:

  • Respects the LEMONADE_VERSION single source of truth and keeps the hardcoded doc references in lockstep — no drift introduced.
  • Correctly distinguishes install-instruction versions (updated) from historical release notes / plans (left alone).
  • Description names the upstream issues, the already-applied GGML_VK_DISABLE_COOPMAT=1 workaround that still fails, and links the follow-up (ci: run Lemonade-dependent checks when LEMONADE_VERSION changes #1871) that makes future bumps trigger this test surface — exactly the context a reviewer needs.

@kovtcharov

Collaborator Author

Closing — the premise is invalidated. The embedding-load failure is not a Lemonade version regression: 10.2.0 fails today with the identical llama-server failed to start (it passed 3 days ago on a different runner, stx-3, vs the failing stx-1), and the model loads fine on 10.8.1 in isolation. So the downgrade would not fix CI. Root cause is runner/harness-side, not the Lemonade version — tracking that separately.
