feat(SESF-42): retroactive secret sanitizer (CLI + MCP)#40
Conversation
Add an operator-driven sanitizer that removes secrets already indexed in Milvus (document), FTS5 (content), and the embedding vector. - secret_redaction.scan_spans: per-occurrence, value-free audit spans (rule/tier/offset/length/masked_snippet); masked_snippet masks every detected span in its window (no raw value, any tier). - rag_engine.upsert_document / delete_by_doc_id: Milvus upsert-by-PK + FTS delete-then-insert (redact) / dual delete (drop); FTS failure surfaced distinctly so a row stays retryable. - sanitize.py: scan (dry-run) / apply (redact|drop) orchestrator with worklist+done checkpoint, throttled re-embed (200ms floor), and 0600 value-free audit JSONL. - cleanup.py sanitize subcommand (refuse without --yes) + tools.py sanitize_index MCP tool (refuse apply without confirm). - Tests: 44 new (scan_spans, orchestrator, primitives, CLI, MCP). Live dry-run baseline: 1296 affected turns (HIGH_ENTROPY 6136, ASSIGNMENT 882); audit verified value-free on real data. Refs SESF-42
- upsert_document: new_vector now Optional; None preserves the stored vector (resume/FTS-only converge no longer corrupts the 768-dim row). - upsert_document: FTS rewrite rebuilds the full metadata record from the fetched row (was dropping ~13 columns → degraded filtered/BM25 search). - delete_by_doc_id now returns DeleteResult(deleted, fts_ok); drop path marks a turn done only when both Milvus and FTS deletes succeed, mirroring the redact FTS-incomplete contract. - _build_worklist caches scanned rows so the resume/no-spans path is reachable. - tools.py refusal wording reflects drop; sanitize.apply pause_event typed. - +5 tests for the previously-mocked vector/metadata/drop-FTS paths. Refs SESF-42
|
Warning Review limit reached
More reviews will be available in 28 minutes and 39 seconds. Learn how PR review limits work. Your organization has used up its prepaid credits, and credit purchases are no longer available. Enable the review add-on in the billing tab to keep reviews running — you're only billed for reviews past your plan's rate limits ($0.25/file). ⌛ How to resolve this issue?After more reviews become available, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available. Please see our Fair Usage Limits Policy for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (6)
WalkthroughImplements SESF-42: a retroactive secret sanitizer that detects previously-indexed secrets in Milvus and FTS5. Adds ChangesSESF-42 Retroactive Secret Sanitizer
Sequence Diagram(s)sequenceDiagram
participant Operator
participant cleanup.py / sanitize_index
participant sanitize.py
participant secret_redaction
participant rag_engine
participant Milvus
participant FTS5
rect rgba(100, 100, 200, 0.5)
note over Operator,FTS5: Dry-run (scan)
Operator->>cleanup.py / sanitize_index: sanitize (no --apply / apply=false)
cleanup.py / sanitize_index->>sanitize.py: scan(scope)
sanitize.py->>Milvus: keyset-batch query in-scope rows
Milvus-->>sanitize.py: rows
sanitize.py->>secret_redaction: scan_spans(document, mode="report")
secret_redaction-->>sanitize.py: (text, spans)
sanitize.py-->>cleanup.py / sanitize_index: SanitizeReport(mode="dry-run", audit_path)
cleanup.py / sanitize_index-->>Operator: counts + audit path (no secret values)
end
rect rgba(100, 200, 100, 0.5)
note over Operator,FTS5: Apply (redact or drop)
Operator->>cleanup.py / sanitize_index: sanitize --apply --yes / apply=true, confirm=true
cleanup.py / sanitize_index->>sanitize.py: apply(scope, drop, confirmed=True)
loop per doc_id in worklist
alt drop=False
sanitize.py->>secret_redaction: scan_spans(document, mode="enforce")
secret_redaction-->>sanitize.py: (redacted_text, spans)
sanitize.py->>rag_engine: upsert_document(doc_id, redacted_text, new_vector)
rag_engine->>Milvus: upsert row
rag_engine->>FTS5: delete then insert
rag_engine-->>sanitize.py: UpsertResult(milvus_ok, fts_ok)
else drop=True
sanitize.py->>rag_engine: delete_by_doc_id(doc_id)
rag_engine->>Milvus: delete row
rag_engine->>FTS5: delete row
rag_engine-->>sanitize.py: DeleteResult(deleted, fts_ok)
end
sanitize.py->>sanitize.py: checkpoint after each turn
end
sanitize.py-->>cleanup.py / sanitize_index: SanitizeReport(rotate_warning=True)
cleanup.py / sanitize_index-->>Operator: report (no secret values)
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
Up to standards ✅🟢 Issues
|
| Metric | Results |
|---|---|
| Complexity | ✅ 315 (≤ 500 complexity) |
| Duplication | ✅ 4 (≤ 5 duplication) |
AI Reviewer: first review requested successfully. AI can make mistakes. Always validate suggestions.
TIP This summary will be updated as you push new changes.
There was a problem hiding this comment.
Pull Request Overview
The PR is currently not up to standards due to critical scalability risks and a failure to correctly implement hardware-constrained throttling. While the core functionality for retroactive sanitization is present, the implementation contains a high-severity risk of Out-Of-Memory (OOM) errors when processing large indices because full document payloads are cached in memory.
Additionally, the re-embedding throttle contains a logic flaw that allows it to bypass safety waits, violating the requirement for hardware-aligned budget management. Finally, there is significant logic duplication between real-time and retroactive scanning paths, which must be refactored to ensure consistent secret prioritization and maintainability.
About this PR
- The orchestration logic for scanning rows is not scalable. Caching full document payloads for every turn in a repository will lead to OOM crashes during large-scale index cleaning. The system should be refactored to process payloads just-in-time or only cache lightweight metadata.
- There is a systemic duplication of secret candidate aggregation and filtering logic between the live redaction module and this new retroactive scanner. This creates a high risk of 'behavioral drift' where real-time and retroactive scanning apply different rules to the same content.
Test suggestions
- Verify that a dry-run identifies secrets and generates an audit log without modifying Milvus or FTS.
- Verify that apply mode refuses to execute and returns an error/status when confirmation is missing.
- Verify that the audit log and CLI/MCP output contain only rule names, counts, and masked snippets, never raw secret values.
- Verify that a resumed run skips doc_ids already marked 'done' in the checkpoint.
- Verify that re-embedding calls the embedding budget's before_batch/after_batch for throttling.
- Verify that FTS failures during a redact or drop operation leave the turn as retryable (not marked done).
- Verify that upsert_document preserves all Milvus metadata fields while updating the document and vector.
TIP Improve review quality by adding custom instructions
TIP How was this review? Give us feedback
There was a problem hiding this comment.
Pull Request Overview
The pull request is currently not up to standards due to high-risk performance issues and security vulnerabilities. Specifically, the _build_worklist function in sanitize.py poses a significant risk of OutOfMemoryError when operating on large Milvus indices because it attempts to cache full document content in memory.
From a security and integrity perspective, the audit logging mechanism contains a race condition in permission handling and incorrectly truncates existing logs when a sanitization run is resumed. These issues must be addressed to ensure a secure and reliable audit trail. While the implementation successfully meets most acceptance criteria regarding dry-run defaults and confirmation prompts, the code duplication identified in redaction logic should also be resolved to prevent policy drift.
Test suggestions
- CLI 'sanitize' defaults to dry-run and reports detections without writing
- CLI 'sanitize --apply' refuses to proceed without the '--yes' confirmation flag
- MCP 'sanitize_index' tool refuses to apply without 'confirm=true'
- Redaction mode re-embeds the masked text using the throttled embedding budget
- Drop mode deletes rows from both Milvus and FTS stores
- FTS rewrite failure results in 'incomplete' status and leaves the turn in the worklist
- Audit logs use 0600 permissions and do not leak raw secret values in snippets
- FTS metadata is preserved during an in-place update
- Sanitizer can resume an interrupted apply run from a durable checkpoint
- Snippet generation for forced entropy tokens is value-free even if the keyword is outside the window
TIP Improve review quality by adding custom instructions
TIP How was this review? Give us feedback
There was a problem hiding this comment.
Pull request overview
Adds SESF-42 “retroactive secret sanitizer” capability to remove secrets that were already indexed (Milvus document, FTS5 content, and derived embeddings), exposed via both the cleanup.py sanitize CLI and a new sanitize_index MCP tool. This extends the existing SESF-41 ingestion-time guard by enabling operator-driven cleanup of historical data while preserving the “no secret value emitted” invariant via span-aware, value-free auditing.
Changes:
- Introduces
secret_redaction.scan_spansand span/snippet data model to support per-occurrence, value-free auditing. - Adds sanitizer orchestrator (
sanitize.py) plus CLI/MCP adapters (confirmation-gated apply; dry-run default) and extensive tests. - Adds
rag_engine.upsert_document/delete_by_doc_idprimitives to rewrite/delete across Milvus + FTS with distinct FTS success reporting.
Reviewed changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| tools.py | Registers sanitize_index MCP tool and adds markdown formatting for sanitize reports. |
| sanitize.py | Implements scan/apply orchestration, audit JSONL writing, checkpointing, and embedding throttling. |
| secret_redaction.py | Adds Span + scan_spans() for per-occurrence, value-free audit spans/snippets. |
| rag_engine.py | Adds upsert_document / delete_by_doc_id primitives and result types for dual-store operations. |
| cleanup.py | Adds cleanup.py sanitize subcommand and value-free report printing with confirmation gate. |
| README.md | Documents retroactive sanitizer usage, flags, and audit/no-leak guarantees. |
| CHANGELOG.md | Notes the new SESF-42 sanitizer feature set and operator warnings. |
| tests/test_tools_sanitize.py | Tests MCP adapter wiring, confirmation gating, and no-leak output contract. |
| tests/test_cleanup_sanitize.py | Tests CLI wiring, refusal-before-reads/writes semantics, and no-leak output. |
| tests/test_secret_redaction.py | Adds regression coverage for span/snippet masking invariants in scan_spans. |
| tests/test_sanitize.py | Adds orchestrator + rag_engine primitive tests for scan/apply/drop/resume/budget behavior. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Actionable comments posted: 3
♻️ Duplicate comments (3)
sanitize.py (3)
243-252:⚠️ Potential issue | 🟠 Major | ⚡ Quick winAudit file truncates existing logs on resume and has a TOCTOU race on permissions.
Opening with
"w"mode destroys any prior audit entries for this run_id (e.g. on resume). Thechmodafteropenleaves a window where the file exists with default umask permissions. Use atomic open with mode:- self._handle = open(self.path, "w", encoding="utf-8") - try: - os.chmod(self.path, _FILE_MODE) - except OSError: - pass + fd = os.open(self.path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, _FILE_MODE) + self._handle = os.fdopen(fd, "w", encoding="utf-8")For resume scenarios, consider append mode (
O_APPEND) instead of truncate, or use a fresh run_id.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@sanitize.py` around lines 243 - 252, Update the audit file initialization in sanitize.py’s __init__ so it does not truncate existing run logs on resume; switch the file open behavior away from "w" to an atomic create/open strategy with the intended permissions, and use append semantics if the same run_id must preserve prior entries. Keep the permission-setting logic tied to the open path to avoid the post-open TOCTOU window, and adjust the _audit_path/run_id handling only if needed to support a fresh run_id for truncate-on-new-run behavior.
307-318:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winO(N) membership check on list creates O(N²) scan complexity.
doc_id not in worklistis O(N) for lists. With large indices this becomes a bottleneck. Use a set for tracking seen doc_ids:counts: dict = {} - worklist: list[str] = [] + worklist: list[str] = [] + worklist_set: set[str] = set() audit = _AuditWriter(run_id) try: for row in _iter_rows(scope, db_path): ... doc_id = row.get("doc_id") - if doc_id not in worklist: + if doc_id not in worklist_set: + worklist_set.add(doc_id) worklist.append(doc_id)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@sanitize.py` around lines 307 - 318, The worklist variable is initialized as a list but used with the membership check operator (not in), which performs an O(N) search. This creates O(N²) overall complexity when looping through rows. Change the worklist initialization from list[str] = [] to set[str] = set(), and replace the worklist.append(doc_id) call with worklist.add(doc_id). This will reduce the membership check to O(1) and overall scan complexity to O(N).
368-375:⚠️ Potential issue | 🟠 Major | ⚡ Quick winThrottle loop exits without enforcing the 200ms cooldown floor when budget denies but provides no delay.
When
allowed=Falseandretry_after_secondsis 0/None, the loop breaks immediately without any wait. Per CLAUDE.md and retrieved learnings, the embedding backfill must never reduce cooldown below 100ms due to MLX Metal SIGSEGV risk.delay = getattr(decision, "retry_after_seconds", 0.0) or 0.0 - if delay <= 0: - break + if delay <= 0: + delay = 0.2 # enforce 200ms floor per CLAUDE.md time.sleep(delay)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@sanitize.py` around lines 368 - 375, The throttle loop in sanitize.py should always enforce a minimum cooldown instead of breaking immediately when budget.before_batch() returns allowed=False with no retry delay. Update the backfill wait logic in the while True block so a denied decision with retry_after_seconds missing or zero still sleeps for the required floor (at least 200ms, and never below the documented 100ms minimum), then retries rather than exiting the loop.Sources: Coding guidelines, Learnings
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@rag_engine.py`:
- Around line 2041-2046: Update rag_engine.upsert_document so the FTS rewrite
uses the same UTF-8 truncated text stored in row["document"] rather than
new_document; keep the Milvus upsert and metadata handling unchanged, but pass
the truncated document value into the FTS insert path at the anchor site and the
sibling site (rag_engine.py 2041-2046 and 2061-2063) to preserve dual-write
alignment.
- Around line 2057-2085: The FTS rewrite logic in rag_engine.py only closes the
ephemeral SQLite connection on the success path; update both rewrite blocks
around _fts.connection(), _fts.delete(), and _fts.insert() so
_fts.close_ephemeral(conn) runs in a finally block (or equivalent) whenever conn
is created, including when delete/insert raises. Keep fts_ok handling and
logging intact, but ensure the connection is always released on failure paths.
In `@tests/test_sanitize.py`:
- Around line 306-317: In the test function
test_apply_without_confirmation_makes_no_calls, add an assertion that verifies
no reads were performed on the stubbed engine when confirmed=False is passed to
sanitize.apply(). Currently the test only asserts that writes (upserts, deletes,
embed_inputs) are blocked, but does not verify that the code skips reading from
the index altogether. Add an assertion checking the appropriate field in the cap
object that tracks read/search operations to ensure that reads are also blocked
when confirmation is not provided, consistent with the security requirement that
destructive sanitize operations must refuse unless explicitly confirmed.
---
Duplicate comments:
In `@sanitize.py`:
- Around line 243-252: Update the audit file initialization in sanitize.py’s
__init__ so it does not truncate existing run logs on resume; switch the file
open behavior away from "w" to an atomic create/open strategy with the intended
permissions, and use append semantics if the same run_id must preserve prior
entries. Keep the permission-setting logic tied to the open path to avoid the
post-open TOCTOU window, and adjust the _audit_path/run_id handling only if
needed to support a fresh run_id for truncate-on-new-run behavior.
- Around line 307-318: The worklist variable is initialized as a list but used
with the membership check operator (not in), which performs an O(N) search. This
creates O(N²) overall complexity when looping through rows. Change the worklist
initialization from list[str] = [] to set[str] = set(), and replace the
worklist.append(doc_id) call with worklist.add(doc_id). This will reduce the
membership check to O(1) and overall scan complexity to O(N).
- Around line 368-375: The throttle loop in sanitize.py should always enforce a
minimum cooldown instead of breaking immediately when budget.before_batch()
returns allowed=False with no retry delay. Update the backfill wait logic in the
while True block so a denied decision with retry_after_seconds missing or zero
still sleeps for the required floor (at least 200ms, and never below the
documented 100ms minimum), then retries rather than exiting the loop.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: e1bb2cf4-19c6-4e77-82ab-a0619221b3b5
📒 Files selected for processing (12)
CHANGELOG.mdCLAUDE.mdREADME.mdcleanup.pyrag_engine.pysanitize.pysecret_redaction.pytests/test_cleanup_sanitize.pytests/test_sanitize.pytests/test_secret_redaction.pytests/test_tools_sanitize.pytools.py
…edup - sanitize.py: worklist holds doc_ids + value-free metadata only (no document payloads cached) → JIT row fetch via rag_engine.get_row_by_doc_id; set-based membership (O(N^2)→O(N)); resume loads worklist+run_id from the checkpoint without re-scanning. Fixes OOM risk on large indices. - sanitize.py _throttled_embed: a hard budget deny (allowed=False, no retry) now aborts the run (status=paused, checkpoint) instead of embedding — no longer bypasses the pause/cap gate. - sanitize.py _AuditWriter: O_CREAT|O_APPEND|0o600 at open — no truncation on resume, no perms-after-create race; resumed runs reuse the run_id. - rag_engine.py upsert_document/delete_by_doc_id: try/finally so the ephemeral FTS connection is always closed (was leaking on exception). - secret_redaction.py: extract _aggregate_maskable_candidates shared by redact() + _maskable_values() — removes the duplication (Codacy metric) and guarantees no policy drift between live + retroactive paths. - tests: parameterize FTS-failure cases, fix stale docstring, +coverage for JIT fetch / resume / throttle-abort / audit append. Full suite 356 passed; live dry-run re-validated (audit value-free). Refs SESF-42
There was a problem hiding this comment.
Pull Request Overview
This PR successfully implements the retroactive secret sanitizer with the required dry-run, redaction, and deletion modes, supported by a value-free audit system. All acceptance criteria for functional behavior, including mandatory confirmation and throttled re-embedding, are satisfied. However, two critical issues prevent immediate approval: a likely runtime NameError in rag_engine.py regarding text truncation and a blocking synchronous operation in the MCP tool handler that would make the server unresponsive. Furthermore, the Milvus integration should be optimized to use integer primary keys rather than string identifiers to ensure performance and prevent logic desynchronization.
About this PR
- There is a systemic pattern of duplicating primary key derivation and document fetching logic in
rag_engine.py. Centralizing these into private helpers will reduce the risk of desynchronization between the ingestion, sanitization, and deletion paths.
Test suggestions
- Sanitize scan (dry-run) identifies affected turns and generates a value-free report without writes.
- Sanitize apply (redact) successfully updates Milvus, re-embeds text via throttled budget, and rewrites FTS metadata.
- Sanitize apply (drop) successfully deletes turns from both Milvus and FTS stores.
- Sanitizer refuses to perform destructive operations (apply/drop) without explicit confirmation.
- Sanitizer audit trail and stdout/MCP responses never contain raw secret values (masking verification).
- Sanitizer resumes from a checkpoint correctly after an interruption or budget-pause.
- FTS failure during apply leaves the turn on the worklist for retry/convergence.
TIP Improve review quality by adding custom instructions
TIP How was this review? Give us feedback
A redacted payload can expand past Milvus's 65535-byte VARCHAR cap; the FTS insert must index the same truncated text Milvus stores or the two stores diverge. +regression test for the >64KB case. (CodeRabbit) Refs SESF-42
… lookup - tools.py: sanitize_index offloads scan/apply via asyncio.to_thread so a long-running apply no longer blocks the MCP event loop. - rag_engine.py: extract _pk_from_doc_id (centralizes the sha256[:15] PK derivation across insert/delete/get); get_row_by_doc_id + upsert_document now look up by integer PK (id == <pk>) for O(1) instead of a VARCHAR scan. - tests: assert apply(confirmed=False) performs no reads (spies _query_batches + get_row_by_doc_id), not just no writes. Refs SESF-42
Summary
Adds an operator-driven sanitizer that removes secrets already indexed in the Milvus
documentfield, the FTS5contentcolumn, and the embedding vector — the retroactive half of the SESF-35 research (SESF-41 shipped the ingestion guard). Exposed via acleanup.py sanitizeCLI subcommand and a dedicatedsanitize_indexMCP tool.0600audit JSONL (~/.sessionflow/audit/); zero writes to any store.--apply --yesredacts in place and re-embeds the redacted text (throttled through the embedding budget, 200ms floor; checkpointed/resumable).--apply --yes --dropdeletes affected turns instead.New primitives
secret_redaction.scan_spans— per-occurrence, value-free audit spans (reuses the SESF-41 detector); snippets mask every detected span in-window.rag_engine.upsert_document(Milvus upsert-by-PK + FTS metadata-preserving rewrite) anddelete_by_doc_id(DeleteResultsurfaces FTS outcome). FTS-rewrite/-delete failure keeps a turn retryable, never silently "done".Linked issue
Tests
pytestfull suite: 349 passed / 0 failed (300 baseline + 49 new). ruff clean.Live exposure baseline (dry-run, no writes)
Ran against the live Standalone index: 1296 affected turns (HIGH_ENTROPY 6136, ASSIGNMENT 882; codex 5628 / claude_code_cli 980 / antigravity_desktop 203 / opencode 118 / antigravity_cli 89). Audit verified value-free on real data (7018 records, 0 raw-value fields, 0 leaking snippets,
0600).Out of scope
--applyon the live index (a separate, deliberate operator action).Version
No version bump — SessionFlow has no version-lock system.
Summary by CodeRabbit