Semantic search over Claude Code conversation history. Persistent HTTP server with hybrid vector + keyword search, real-time transcript indexing, and multi-project support.
SessionFlow watches your Claude Code session transcripts, embeds them with a local MLX model (EmbeddingGemma-300M on Apple Silicon), and makes them searchable via MCP tools. Every conversation turn is indexed — search by keyword, semantic meaning, date range, project, or git branch.
Claude Code terminals (6-8 concurrent)
↓ POST /mcp (MCP StreamableHTTP)
SessionFlow HTTP server (port 7102)
├── Embedding: EmbeddingGemma-300M via MLX Metal (local, ~600 MB)
├── Vectors: Milvus Standalone (remote, HNSW index)
├── Keywords: SQLite FTS5 (local sidecar)
├── Search: Hybrid RRF merge (vector + keyword)
└── Watcher: FSEvents → debounce → incremental parse → index
- Embedding model runs locally on Apple Silicon GPU (Metal) — no API calls
- Vector storage on Milvus Standalone (Portainer) via `SESSIONFLOW_MILVUS_URI`, or embedded Milvus Lite fallback
- One server process serves all terminals: model loaded once, connections pooled
- Real-time indexing via file watcher on `~/.claude/projects/`
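
The "Hybrid RRF merge" step in the diagram above can be illustrated with a minimal Reciprocal Rank Fusion sketch. This is not SessionFlow's actual code: the function name `rrf_merge`, the turn IDs, and the constant `k=60` (the value commonly used in the RRF literature) are assumptions for illustration.

```python
from collections import defaultdict


def rrf_merge(ranked_lists, k=60, limit=10):
    """Fuse several ranked lists of turn IDs with Reciprocal Rank Fusion.

    Each list contributes 1 / (k + rank) per item, so a turn that ranks
    well in both the vector and keyword results rises to the top.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, turn_id in enumerate(ranking, start=1):
            scores[turn_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so output is deterministic.
    return sorted(scores, key=lambda t: (-scores[t], t))[:limit]


# Hypothetical result lists from the vector and keyword backends.
vector_hits = ["t3", "t1", "t7"]
keyword_hits = ["t1", "t9", "t3"]
merged = rrf_merge([vector_hits, keyword_hits])
print(merged)
```

RRF only needs ranks, not raw scores, which is why it suits merging cosine similarities with FTS5 relevance values that live on different scales.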
| Tool | Description |
|---|---|
| `search_session` | Search current session with recency bias |
| `search_all_sessions` | Cross-session semantic search. Optional `git_branch`, `date_from`, `date_to`, `project_root` filters |
| `get_turns` | Retrieve context around a specific turn |
| `get_session_stats` | Index statistics: turn count, session count, branches |
| `cleanup_sessions` | Delete session data by age, session ID, or branch |
search_all_sessions("deployment decisions", date_from="2026-04-08", date_to="2026-04-08")
search_all_sessions("what broke in production", date_from="2026-04-01")
search_session("the milvus migration", session_id="<CLAUDE_SESSION_ID>")
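
For clients that talk to the server directly rather than through Claude Code, the calls above travel as MCP JSON-RPC `tools/call` requests to `POST /mcp`. The sketch below only builds the request body. The argument name `query` is an assumption (the README does not name the positional parameter), and a real client must complete the MCP initialize handshake before sending it.

```python
import json


def build_tool_call(name, arguments, request_id=1):
    """Build an MCP JSON-RPC 2.0 `tools/call` request body."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


payload = build_tool_call(
    "search_all_sessions",
    {
        "query": "deployment decisions",  # assumed parameter name
        "date_from": "2026-04-08",
        "date_to": "2026-04-08",
    },
)
body = json.dumps(payload)
print(body)
```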
./setup.sh # venv, deps, model download, hooks
# Restart Claude Code to activate

| Variable | Default | Description |
|---|---|---|
| `SESSIONFLOW_MILVUS_URI` | `~/.sessionflow/milvus.db` | Milvus URI: `http://host:port` for Standalone, file path for Lite |
| `SESSIONFLOW_HOST` | `127.0.0.1` | HTTP server bind address |
| `SESSIONFLOW_PORT` | `7102` | HTTP server port |
| `SESSIONFLOW_MODEL` | `embeddinggemma` | Embedding model (`embeddinggemma` or `modernbert`) |
| `SESSIONFLOW_EXPIRE_DAYS` | `365` | Auto-prune turns older than N days |
| `SESSIONFLOW_WATCH` | `true` | Enable real-time file watcher |
| `SESSIONFLOW_URL` | `http://127.0.0.1:7102` | Server URL (used by hooks) |
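
As a rough illustration of how these variables and defaults fit together, here is a minimal loader sketch. The `Settings` class, `load_settings`, and the set of strings treated as truthy are hypothetical; only the variable names and defaults come from the table above.

```python
import os
from dataclasses import dataclass

ALLOWED_MODELS = {"embeddinggemma", "modernbert"}


@dataclass(frozen=True)
class Settings:
    milvus_uri: str
    host: str
    port: int
    model: str
    expire_days: int
    watch: bool
    url: str

    @property
    def uses_standalone(self):
        # An http(s) URI means Milvus Standalone; a file path means Milvus Lite.
        return self.milvus_uri.startswith(("http://", "https://"))


def _as_bool(value):
    # Assumed truthy spellings; the real parser may differ.
    return value.strip().lower() in {"1", "true", "yes", "on"}


def load_settings(env=None):
    """Read SESSIONFLOW_* variables, falling back to the documented defaults."""
    env = os.environ if env is None else env
    model = env.get("SESSIONFLOW_MODEL", "embeddinggemma")
    if model not in ALLOWED_MODELS:
        raise ValueError(f"unsupported SESSIONFLOW_MODEL: {model!r}")
    return Settings(
        milvus_uri=env.get(
            "SESSIONFLOW_MILVUS_URI",
            os.path.expanduser("~/.sessionflow/milvus.db"),
        ),
        host=env.get("SESSIONFLOW_HOST", "127.0.0.1"),
        port=int(env.get("SESSIONFLOW_PORT", "7102")),
        model=model,
        expire_days=int(env.get("SESSIONFLOW_EXPIRE_DAYS", "365")),
        watch=_as_bool(env.get("SESSIONFLOW_WATCH", "true")),
        url=env.get("SESSIONFLOW_URL", "http://127.0.0.1:7102"),
    )


defaults = load_settings({})
print(defaults.port, defaults.uses_standalone)
```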
./sessionflow-server.sh start # Start server + watchdog
./sessionflow-server.sh stop # Stop server
./sessionflow-server.sh restart # Restart
./sessionflow-server.sh status # Check health
~/.sessionflow/sessionflow-launcher.sh start # Hook-safe start via LaunchAgent
curl http://127.0.0.1:7102/health # Health check

When several harnesses (Claude Code, Codex, OpenCode, Antigravity CLI) launch
at the same time, their SessionStart hooks race to start the server. The
optional user LaunchAgent starts SessionFlow at login before any hook fires,
so every harness simply attaches to an already-running server.
./sessionflow-server.sh install-agent # write & bootstrap ~/Library/LaunchAgents/cc.lbruton.sessionflow.plist
./sessionflow-server.sh agent-status # show plist + launchctl state
./sessionflow-server.sh uninstall-agent # bootout + remove plist

The LaunchAgent is OPTIONAL: start/stop/restart/status behavior is
unchanged whether it is installed or not. `setup.sh` will offer to install it
interactively, or non-interactively when `SESSIONFLOW_INSTALL_AGENT=1`.
SessionFlow ingests sessions from multiple coding agents. Native structured sources are preferred; terminal/log fallbacks are not used in this release.
| Provider | Status | Source kind | Notes |
|---|---|---|---|
| `claude_code_cli` | Searchable | `claude_code_jsonl` (`~/.claude/projects/**/*.jsonl`) | Original SessionFlow source. Watcher + backfill both supported. |
| `claude_desktop_cowork` | Probe only | `claude_desktop_sessions` (`~/Library/Application Support/Claude/claude-code-sessions/**/local_*.json`) | Discovery surfaces files in health/status output. Full turn-content ingestion deferred pending a parser spike. |
| `codex` | Searchable | `codex_rollout_jsonl` | Native rollout JSONL; provider-tagged. |
| `opencode` | Searchable | `opencode_storage` | Native storage; provider-tagged. |
| `antigravity_cli` | Searchable | `antigravity_cli_transcript_jsonl` (`brain/<id>/.system_generated/logs/transcript.jsonl`) | Authoritative source per discovery. Sibling `.pb` artifacts are NOT parsed in this release. |
| `antigravity_desktop` | Searchable | `antigravity_desktop_transcript_jsonl` | Desktop/IDE transcript JSONL. Source kind is distinguished from `antigravity_cli` in diagnostics. |
Antigravity is the successor to Gemini CLI. SessionFlow treats legacy Gemini CLI artifacts as migration context only:
- `antigravity_cli` ingests `brain/<id>/.system_generated/logs/transcript.jsonl`. Sibling `.pb` (protobuf) artifacts are opaque without a stable schema and are not parsed in this release.
- `antigravity_desktop` ingests desktop/IDE transcript JSONL.
- Legacy Gemini CLI history (`legacy_gemini_history`) is recognized as a source-kind constant for one-time import work but is not auto-ingested.
~/Library/Application Support/Claude/claude-code-sessions/**/local_*.json
files are discovered and reported in health/status output, but full turn
content is not yet indexed — a parser spike is required before claiming
searchable support. Treat it as probe-only for now.
All embedding work in SessionFlow is local MLX (EmbeddingGemma-300M). Backfill
respects a configurable budget so long historical imports cannot saturate the
GPU. See embedding_control.py for the authoritative list.
| Variable | Default | Purpose |
|---|---|---|
| `SESSIONFLOW_EMBED_BATCH_SIZE` | `16` | Max turns embedded per batch. |
| `SESSIONFLOW_EMBED_COOLDOWN_MS` | `200` | Sleep between batches. Hard floor of 200 ms: the MLX Metal driver SIGSEGVs under sustained load below this. |
| `SESSIONFLOW_BACKFILL_MAX_TURNS_PER_RUN` | `200` | Max turns embedded per backfill invocation. |
| `SESSIONFLOW_BACKFILL_MAX_FILES_PER_RUN` | `100` | Max source files visited per backfill invocation. |
| `SESSIONFLOW_BACKFILL_RECENT_DAYS` | `14` | Window for `recent` mode. |
| `SESSIONFLOW_BACKFILL_MODE` | `recent` | One of `recent`, `incremental`, `full`. |
| `SESSIONFLOW_BACKFILL_PAUSED` | unset | If truthy at startup, backfill begins paused. |
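
The batch size and cooldown floor in the table can be pictured with the sketch below. It is not the contents of `embedding_control.py`: the helper names are hypothetical, and clamping a too-low cooldown up to 200 ms (rather than rejecting it) is an assumption about how the hard floor is enforced.

```python
import os
import time

MIN_COOLDOWN_MS = 200  # documented hard floor


def read_int(env, name, default):
    """Parse an integer environment variable, falling back to the default."""
    raw = env.get(name)
    return default if raw is None or raw.strip() == "" else int(raw)


def effective_cooldown_ms(env):
    # Values below the floor are raised to it (assumed behavior).
    requested = read_int(env, "SESSIONFLOW_EMBED_COOLDOWN_MS", 200)
    return max(MIN_COOLDOWN_MS, requested)


def batches(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def embed_with_budget(turns, embed_fn, env=None, sleep=time.sleep):
    """Embed turns batch by batch, sleeping between batches."""
    env = os.environ if env is None else env
    size = max(1, read_int(env, "SESSIONFLOW_EMBED_BATCH_SIZE", 16))
    cooldown_s = effective_cooldown_ms(env) / 1000.0
    vectors = []
    for index, batch in enumerate(batches(turns, size)):
        if index:  # no sleep before the first batch
            sleep(cooldown_s)
        vectors.extend(embed_fn(batch))
    return vectors


# Stand-in embedder: one fake vector per turn.
fake_embed = lambda batch: [[float(len(text))] for text in batch]
print(len(embed_with_budget(["a", "bb", "ccc"], fake_embed, env={})))
```

Passing `sleep` as a parameter keeps the pacing logic testable without actually waiting.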
Backfill is provider-aware and durable across restarts (queue state lives in SessionFlow's index state directory). Modes:
- `recent`: only sources modified in the last `SESSIONFLOW_BACKFILL_RECENT_DAYS` days. This is the default so the most useful recall lands first.
- `incremental`: pick up from the last cursor for each source; no rescans.
- `full`: exhaustive backfill across all sources for the provider. Use sparingly; runs are still bounded by the per-run turn/file caps above.
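
One way to read the three modes is as different filters over the same list of sources, followed by the per-run file cap. The sketch below is an interpretation, not the project's scheduler: the `Source` fields, the newest-first ordering, and treating `incremental` as "sources whose cursor is behind their size" are all assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Source:
    path: str
    modified: datetime
    size: int        # current size in bytes
    cursor: int = 0  # bytes already indexed


def select_sources(sources, mode, now, recent_days=14, max_files=100):
    """Pick the sources one backfill run would visit under the given mode."""
    if mode == "recent":
        cutoff = now - timedelta(days=recent_days)
        picked = [s for s in sources if s.modified >= cutoff]
    elif mode == "incremental":
        # Only sources with unread bytes past the stored cursor.
        picked = [s for s in sources if s.cursor < s.size]
    elif mode == "full":
        picked = list(sources)
    else:
        raise ValueError(f"unknown backfill mode: {mode!r}")
    # Newest first (assumed), then apply the per-run file cap.
    picked.sort(key=lambda s: s.modified, reverse=True)
    return picked[:max_files]


now = datetime(2026, 5, 20)
sources = [
    Source("a.jsonl", now - timedelta(days=1), size=100, cursor=100),
    Source("b.jsonl", now - timedelta(days=30), size=50, cursor=10),
    Source("c.jsonl", now - timedelta(days=3), size=10, cursor=0),
]
print([s.path for s in select_sources(sources, "recent", now)])
```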
Maintenance commands (see cleanup.py):
python cleanup.py status # provider + embedding + backfill snapshot
python cleanup.py status --provider codex # per-provider view
python cleanup.py backfill status # queue status
python cleanup.py backfill pause # pause all providers
python cleanup.py backfill pause --provider codex # pause one provider
python cleanup.py backfill resume # resume all
python cleanup.py backfill enqueue --provider antigravity_cli --mode recent

Pause state and queued jobs persist on disk, so a restart (or LaunchAgent re-launch) resumes the same plan.
If a secret was indexed before the ingestion-time redaction guard caught it,
cleanup.py sanitize finds and removes it from the Milvus document field, the
FTS5 content column, and the embedding vector derived from them. Detection
reuses the same engine as the ingestion guard, so what the sanitizer flags is
exactly what live ingestion would now redact.
Dry-run is the default — it reports per-rule counts, the affected-turn count, and an audit path, and writes nothing to the index:
python cleanup.py sanitize # dry-run over the whole index
python cleanup.py sanitize --provider claude_code_cli # scope by provider
python cleanup.py sanitize --project /path/to/repo --since 2026-05-01

`--apply` rewrites the affected turns (redact the text, re-embed, overwrite the
row). With `--drop` it deletes the affected turns instead. Both require an
explicit `--yes`; there is no interactive prompt. `--apply` without `--yes`
refuses before any read or write and exits non-zero:
python cleanup.py sanitize --apply --yes # redact + re-embed in place
python cleanup.py sanitize --apply --yes --drop # delete affected turns

Scope flags (`--project`, `--provider`, `--session`, `--since`) apply to both
dry-run and apply. `--drop` is only valid with `--apply`.
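
The flag rules above (dry-run by default, `--apply` needs `--yes`, `--drop` needs `--apply`) could be enforced with a small `argparse` guard like the one below. This is a sketch of the rules as described, not the actual parser in `cleanup.py`; the function name and error messages are hypothetical.

```python
import argparse


def parse_sanitize_args(argv):
    """Parse sanitize flags and reject unsafe combinations before any I/O."""
    parser = argparse.ArgumentParser(prog="cleanup.py sanitize")
    parser.add_argument("--apply", action="store_true")
    parser.add_argument("--drop", action="store_true")
    parser.add_argument("--yes", action="store_true")
    # Scope flags are valid for both dry-run and apply.
    parser.add_argument("--project")
    parser.add_argument("--provider")
    parser.add_argument("--session")
    parser.add_argument("--since")
    args = parser.parse_args(argv)

    if args.drop and not args.apply:
        parser.error("--drop is only valid with --apply")
    if args.apply and not args.yes:
        # parser.error prints to stderr and exits with status 2.
        parser.error("--apply requires an explicit --yes")
    return args


dry_run = parse_sanitize_args(["--provider", "codex"])
print("dry-run" if not dry_run.apply else "apply")
```

Validating before touching the index matches the documented guarantee that a refused run performs no reads or writes.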
Every run writes a value-free JSONL audit trail under ~/.sessionflow/audit/
(0600) — rule names, tiers, integer offsets, and pre-masked snippets only, never
a raw secret value. Output to stdout is likewise counts-only.
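
A value-free audit record of the kind described might be appended as follows. The file name `sanitize.jsonl` and the record keys are hypothetical, and the sketch assumes the `0600` mode applies to the audit file itself; the example writes to a temporary directory rather than `~/.sessionflow/audit/`.

```python
import json
import os
import tempfile


def append_audit_record(audit_dir, record):
    """Append one JSON line to an owner-only (0600) audit file."""
    os.makedirs(audit_dir, mode=0o700, exist_ok=True)
    path = os.path.join(audit_dir, "sanitize.jsonl")  # hypothetical file name
    fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o600)
    with os.fdopen(fd, "a", encoding="utf-8") as handle:
        handle.write(json.dumps(record, sort_keys=True) + "\n")
    os.chmod(path, 0o600)  # enforce the mode even if the file already existed
    return path


# Only rule name, tier, offsets, and a pre-masked snippet: never the raw value.
record = {
    "rule": "example_api_key",  # hypothetical rule name
    "tier": 1,
    "start": 120,
    "end": 160,
    "snippet": "sk-****",
}
audit_path = append_audit_record(tempfile.mkdtemp(), record)
print(oct(os.stat(audit_path).st_mode & 0o777))
```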
Redaction is not safety — rotate the key. Removing a secret from the index does not un-expose it. Once a credential has been written anywhere, treat it as compromised and rotate it at the source; the sanitizer warns about this on every apply but cannot perform the rotation for you.
Hosted/OpenAI embeddings are deferred and not implemented in SESF-6. SessionFlow remains self-hosted: all embedding runs through local MLX. The provider/identity layer leaves room for a future opt-in hosted path, but no hosted setup steps, credentials, or collections are created by this release. If local resource controls prove insufficient for your workload, hosted embeddings will be tracked as a separate future issue.
- Hybrid search: vector similarity (Milvus) + keyword matching (FTS5), merged with Reciprocal Rank Fusion
- HNSW index on Milvus Standalone: approximately O(log n) search over 21K+ turns
- Pure Python git root resolver: no subprocess forks, cached lookups (0.4ms for 10K calls)
- Non-blocking startup: FTS backfill and transcript backfill run as background tasks
- Multi-project: 30+ projects indexed, searchable by `project_root` filter
- Date-range filtering: `date_from`/`date_to` on all search tools
- Incremental indexing: byte-offset tracking per transcript, checkpoint every 100 files
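
Byte-offset tracking, as mentioned in the incremental indexing item, can be sketched like this: remember how many bytes of each transcript were consumed and read only what was appended since. The function name is hypothetical, and leaving a partially written last line for the next pass is an assumption about how in-progress writes are handled.

```python
import json
import os
import tempfile


def read_new_turns(path, offset):
    """Return (turns, new_offset) for complete JSONL lines after `offset`."""
    turns = []
    with open(path, "rb") as handle:
        handle.seek(offset)
        while True:
            line = handle.readline()
            if not line.endswith(b"\n"):
                break  # EOF or a half-written line: retry on the next event
            offset += len(line)
            if line.strip():
                turns.append(json.loads(line))
    return turns, offset


# Demo with a transcript that is still being written.
demo_path = os.path.join(tempfile.mkdtemp(), "session.jsonl")
with open(demo_path, "wb") as handle:
    handle.write(b'{"n": 1}\n{"n": 2}\n{"n"')  # last line incomplete

first, cursor = read_new_turns(demo_path, 0)
with open(demo_path, "ab") as handle:
    handle.write(b": 3}\n")  # the writer finishes the line
second, cursor_after = read_new_turns(demo_path, cursor)
print(len(first), len(second))
```

Persisting the returned offset per transcript is what lets a restart resume without rescanning whole files.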
Originally forked from mwgreen/claude-code-session-rag. Diverged significantly with Milvus Standalone migration, subprocess fork elimination, HNSW indexing, background startup, and multi-terminal hardening. Now an independent project.
MIT