component search v2 by Mbeaulne · Pull Request #2308 · TangleML/tangle-ui

Mbeaulne · 2026-05-25T16:53:08Z

Description

Adds an experimental Components V2 page with natural-language search over the component library, behind the component-search-v2 beta flag. Currently the Components page has no search — finding the right component in a large library is painful. This is the start of a real fix.

Architecture: two layers, not one

The search uses a lexical index + optional LLM rerank pattern rather than sending every query to an LLM. This keeps the common case fast and cheap.

Lexical search (componentSearchIndex.ts) runs entirely in the browser. Tokenizes component name, description, input/output names, and container command/args (image, args, flags). Sub-10ms for hundreds of components. No API call, no key needed. Works for code-style queries (pandas, train_test_split, --epochs) and partial names.
AI rerank (naturalLanguageComponentSearchService.ts) is opt-in via a ✨ button. Takes the top 20 lexical hits and asks an LLM to reorder them by intent and write a one-sentence reason per match. Reranking 20 candidates is cheap and fast; the model never sees the whole library.
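To make the lexical layer concrete, here is a minimal sketch of a field-weighted token index. It is not the code in componentSearchIndex.ts: the type names, the field weights, and the prefix-matching rule are all assumptions chosen for illustration. Only the indexed fields and the default limit of 20 come from the description above.

```typescript
// Hypothetical shapes; the real types in componentSearchIndex.ts may differ.
interface ComponentDoc {
  id: string;
  name: string;
  description: string;
  ioNames: string[]; // input and output names
  command: string[]; // container image, command, and args
}

type Field = "name" | "description" | "inputs/outputs" | "command";

interface IndexedDoc {
  id: string;
  tokens: Record<Field, string[]>;
}

interface LexicalHit {
  id: string;
  score: number;
  matched: Field[]; // which fields produced the hit (the "matched:" badge)
}

// Assumed weights: a hit in the name counts more than a hit in the command.
const FIELD_WEIGHTS: Record<Field, number> = {
  name: 4,
  "inputs/outputs": 2,
  description: 1.5,
  command: 1,
};
const FIELDS = Object.keys(FIELD_WEIGHTS) as Field[];

// Lowercase and split on anything that is not an ASCII letter or digit,
// so "train_test_split" and "--epochs" become plain word tokens.
function tokenize(text: string): string[] {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t.length > 0);
}

// Tokenize every field once, up front, so each search only compares tokens.
function buildIndex(docs: ComponentDoc[]): IndexedDoc[] {
  return docs.map((doc) => ({
    id: doc.id,
    tokens: {
      name: tokenize(doc.name),
      description: tokenize(doc.description),
      "inputs/outputs": tokenize(doc.ioNames.join(" ")),
      command: tokenize(doc.command.join(" ")),
    },
  }));
}

function lexicalSearch(index: IndexedDoc[], query: string, limit = 20): LexicalHit[] {
  const queryTokens = tokenize(query);
  if (queryTokens.length === 0) return [];

  const hits: LexicalHit[] = [];
  for (const doc of index) {
    let score = 0;
    const matched: Field[] = [];
    for (const field of FIELDS) {
      // Prefix match, so a partial name such as "trai" still hits "train".
      const count = queryTokens.filter((q) =>
        doc.tokens[field].some((t) => t.startsWith(q)),
      ).length;
      if (count > 0) {
        score += count * FIELD_WEIGHTS[field];
        matched.push(field);
      }
    }
    if (score > 0) hits.push({ id: doc.id, score, matched });
  }
  // Highest score first, capped at the limit handed to the reranker.
  return hits.sort((a, b) => b.score - a.score).slice(0, limit);
}
```

A linear scan like this is enough for a few hundred components; an inverted index would only start to matter for a much larger library.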

BYOK

AI rerank requires the user's own OpenAI-compatible API key (any provider — OpenAI, Anthropic via gateway, Gemini, Shopify LLM proxy, local Ollama, etc.). Stored in localStorage only — no shared key bundled in the app, no proxying through Tangle. Configured at Settings → Agent Configuration. Lexical search works with no key.
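As a rough illustration of the BYOK flow, the sketch below keeps the settings in a store with the same getItem/setItem shape as localStorage and builds a request for an OpenAI-compatible chat completions endpoint. The storage key, the apiBase and apiKey field names, and the helper names are assumptions; only thinkingModel is a field name mentioned in this PR. The request is built as plain data so that nothing is sent anywhere except the base URL the user configured.

```typescript
// Minimal subset of the Web Storage API, so the sketch also runs outside a browser.
// In the app, window.localStorage satisfies this interface.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

// Hypothetical config shape; apart from thinkingModel, the field names are assumed.
interface AgentConfig {
  apiBase: string; // e.g. "https://api.openai.com/v1"
  apiKey: string;
  thinkingModel: string;
}

const STORAGE_KEY = "agent-config"; // assumed key name

function saveAgentConfig(store: KeyValueStore, config: AgentConfig): void {
  store.setItem(STORAGE_KEY, JSON.stringify(config));
}

// Returns null when nothing usable is stored, which is the state in which
// the rerank button would stay disabled.
function loadAgentConfig(store: KeyValueStore): AgentConfig | null {
  const raw = store.getItem(STORAGE_KEY);
  if (raw === null) return null;
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return null; // corrupted entry
  }
  if (typeof parsed !== "object" || parsed === null) return null;
  const c = parsed as Partial<AgentConfig>;
  if (
    typeof c.apiBase !== "string" || c.apiBase === "" ||
    typeof c.apiKey !== "string" || c.apiKey === "" ||
    typeof c.thinkingModel !== "string" || c.thinkingModel === ""
  ) {
    return null;
  }
  return { apiBase: c.apiBase, apiKey: c.apiKey, thinkingModel: c.thinkingModel };
}

interface HttpRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Builds a chat completions request against the user's own provider.
function buildChatRequest(config: AgentConfig, prompt: string): HttpRequest {
  const base = config.apiBase.replace(/\/+$/, ""); // tolerate a trailing slash
  return {
    url: `${base}/chat/completions`,
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${config.apiKey}`,
    },
    body: JSON.stringify({
      model: config.thinkingModel,
      messages: [{ role: "user", content: prompt }],
    }),
  };
}
```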

What's intentionally NOT in this PR

Static embedding index — would catch fully-semantic queries with zero word overlap (e.g. reduce dimensionality → pca_decomposition). Requires build-step changes; deferred to a follow-up. The architecture has a clean hook point: add semanticSearch() next to lexicalSearch() and merge.
"Insert into editor" action on result cards. Discovery without follow-through is half the value, but editor integration is its own PR.
Telemetry on query / click / rerank usage. Needs analytics-provider wiring; own PR.
Server-side LLM endpoint with a managed key. BYOK is the v0; server-side comes when we know users want this.
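For the deferred embedding index, the "merge" step at the hook point could be as small as the sketch below. It combines two ranked id lists with reciprocal rank fusion, which is one common way to merge rankings and is an assumption here, not something this PR implements. The function name mergeRanked and the constant k = 60 are illustrative.

```typescript
// Reciprocal rank fusion: each list contributes 1 / (k + rank) per id,
// with rank starting at 1. Ids found by both searches accumulate a higher score.
function mergeRanked(lists: string[][], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const list of lists) {
    list.forEach((id, index) => {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + index + 1));
    });
  }
  return Array.from(scores.entries())
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}
```

Called as mergeRanked([lexicalIds, semanticIds]), this needs only the ordering from each search, so the lexical scores and the embedding similarities never have to be put on a common scale.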

Notable design decisions

AI rerank is a useMutation, not a useQuery. Reranking is an explicit user action. Auto-firing on keystrokes would burn tokens for nothing.
Implementation/command text goes into the lexical index, not the LLM prompt. Was in the LLM prompt in an earlier iteration; cut because LLMs are bad at exact-string matching and slow for retrieval. Lexical does code-search properly.
useDeferredValue (React 19) instead of a debounce timer. Input stays snappy.
Route is flag-gated via beforeLoad redirect, not just hidden in the sidebar. Direct URL navigation to /components-v2 redirects to /components if the flag is off.
Hallucinated ids from the LLM are filtered against the candidate set. Candidates the LLM drops are appended after the reranked ones, so lexical hits the model disagreed with stay visible rather than being silently hidden, which builds trust.
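The last decision above, filtering hallucinated ids and appending dropped candidates, can be sketched as a small pure function. The names and types are hypothetical, and the duplicate handling is an extra assumption beyond what the description states.

```typescript
// Hypothetical shapes for the model's answer and the rendered result.
interface RerankedItem {
  id: string;
  reason: string; // the one-sentence "Why:" text
}

interface DisplayResult {
  id: string;
  reason?: string; // absent for candidates the model left out
}

function applyRerank(candidateIds: string[], reranked: RerankedItem[]): DisplayResult[] {
  const allowed = new Set(candidateIds);
  const seen = new Set<string>();
  const results: DisplayResult[] = [];

  for (const item of reranked) {
    // Skip ids that were never sent to the model (hallucinations).
    // Also skip repeats, which is an assumption of this sketch.
    if (!allowed.has(item.id) || seen.has(item.id)) continue;
    seen.add(item.id);
    results.push({ id: item.id, reason: item.reason });
  }

  // Append the lexical hits the model dropped, in their original lexical order.
  for (const id of candidateIds) {
    if (!seen.has(id)) results.push({ id });
  }
  return results;
}
```

Because every candidate id appears exactly once in the output, the reranked list always has the same length as the lexical list it was built from.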

Related Issue and Pull requests

N/A — experimental beta feature.

Type of Change

New feature

Checklist

I have tested this does not break current pipelines / runs functionality
I have tested the changes on staging

Screenshots

TODO: add screenshots of the search page (empty / lexical results / AI rerank) and the Agent Configuration settings page.

Test Instructions

Setup

Enable the flag at Settings → Beta Features → Component Search V2.
A new Components V2 entry appears in the dashboard sidebar.

Test lexical search (no key required)

Navigate to Components V2.
Confirm the empty state shows "N components indexed. Start typing to search."
Try these queries (results should appear within a frame, well under 50ms):
- Exact name: type train → components with "train" in name surface first. Each result has a matched: name badge.
- Code / library: type pandas → components whose container command imports pandas surface, with matched: command badge.
- Multi-token: type train test split → train_test_split (or similar) ranks at the top.
- Input/output names: type the name of a known I/O parameter (e.g. dataset) → matches show matched: inputs/outputs.
- Unknown: type asdfqwer → "No components matched" message.
Clear the input → returns to the empty state with component count.

Test AI rerank (BYOK required)

Without configuration: type any query that returns lexical hits. The ✨ button is disabled, and a panel below it links to Settings → Agent Configuration.
Navigate to Settings → Agent Configuration.
Enter an OpenAI-compatible API base URL and key. Example for OpenAI direct: https://api.openai.com/v1 + your sk-... key.
Click Test connection → toast confirms "Connected. Provider exposes N model(s)."
Click Save → toast confirms "Agent settings saved."
Return to Components V2. Type a semantic query that lexical handles poorly, e.g. clean up my data (assuming you have components like dedupe_rows, drop_nulls).
Click the ✨ button. Within ~1–3 seconds, results reorder and each card gains a Why: ... line explaining the match.
Verify the loading state shows a spinner inside the icon button while the request is in flight.
Edit the query → rerank state clears, lexical results return.

Test flag gating

Disable the flag at Settings → Beta Features.
Confirm the Components V2 sidebar entry disappears.
Manually navigate to /components-v2 in the URL bar → redirects to /components.

Test error handling

In Agent Configuration, set an invalid API base (e.g. https://example.com/v1) and save.
Run an AI search → "AI search failed: ..." appears below the results. Lexical results above remain visible.
Click Clear in Agent Configuration → settings reset, ✨ button disables again.

Regression

Confirm the regular Components page still works (it's untouched).
Confirm /runs, /pipelines, /favorites, and the editor still work.

Additional Comments

This is intentionally scoped as an MVP behind a flag. Treat the lexical layer as the load-bearing piece — it works without any API key, and the architecture is set up so embeddings can slot in next to it without restructuring. The LLM rerank is the optional cherry on top, not the core mechanic.

Open questions for reviewers:

Default lexical limit (20) — feels right for the rerank input size, but if libraries grow we may want to raise it for display while keeping rerank capped.
Two model fields in ComponentSearchConfig (model + thinkingModel) — currently only thinkingModel is used (for rerank). The model field is vestigial from the prior "fast vs thinking" toggle. Worth simplifying to one field in a follow-up, with a tiny localStorage migration.
AI search availability when lexical returns 0 hits. Currently the ✨ button is disabled in that case. A query with zero literal overlap can't be rescued by AI rerank — that's the embedding-index gap. We could fall back to "send the whole library to the LLM" as a safety net (~5 LOC) but it changes the cost story. Leaving it out for now; will revisit after dogfooding.

github-actions · 2026-05-25T16:53:17Z

🎩 Preview

A preview build has been created at: 05-25-component_search_v2/22e4748

Mbeaulne · 2026-05-25T16:53:22Z

This stack of pull requests is managed by Graphite. Learn more about stacking.

Mbeaulne force-pushed the 05-25-component_search_v2 branch 3 times, most recently from 376b12b to fccd552, on May 25, 2026 at 19:40

component search v2

22e4748

Mbeaulne force-pushed the 05-25-component_search_v2 branch from fccd552 to 22e4748 on May 26, 2026 at 13:39

Mbeaulne mentioned this pull request May 26, 2026

component search v2 - unify component search across standard, user, published, and registered sources #2311

Draft

8 tasks

Mbeaulne wants to merge 1 commit into master from 05-25-component_search_v2
