Acting layer: optimize --apply, session guard, realized savings (opt-in) by iamtoruk · Pull Request #610 · getagentseal/codeburn

iamtoruk · 2026-07-03T11:40:44Z

What

The opt-in acting layer from the epic: CodeBurn can now execute the fixes it recommends, watch sessions for runaway spend, and measure what applied fixes actually saved. Everything is off by default, journaled, and undoable. Core commands are unchanged for users who never opt in.

Closes #603, closes #604, closes #605, closes #606. (#607 stays design-gated per the epic #602.)

codeburn act (#603)

Action journal with per-file backups under the config dir. Every mutation the acting layer makes goes through one framework: snapshot first, mutate, hash, journal. act list shows what was done; act undo <id> restores each target byte-identical and refuses (without --force) when a target has drifted since apply. The framework handles files and directories, move collisions, mid-apply rollback, and a stale-plan guard: a plan whose target changed between preview and confirm refuses to apply instead of clobbering.

codeburn optimize --apply (#604)

Findings now carry stable ids, and the config-class ones become one-keystroke fixes: remove or project-scope MCP servers (scoped precisely; other projects' entries are never touched and the preview names exactly which entries change), archive unused skills/agents/commands, write CLAUDE.md rule blocks between markers (idempotent), and set the bash output cap. Fixes are selected through an interactive picker or with --yes, --dry-run, and --only. CLAUDE.md edits are excluded from blanket --yes as a safety rule.
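To illustrate what "between markers (idempotent)" can mean in practice, here is a minimal sketch. The marker strings and the function name are hypothetical, since the PR text does not show the real ones; the point is only that re-applying the same rules leaves the file unchanged.

```typescript
// Hypothetical marker strings; the real ones are not shown in this PR.
const BEGIN_MARKER = "<!-- codeburn:rules:begin -->";
const END_MARKER = "<!-- codeburn:rules:end -->";

/** Insert or replace the managed block; applying the same rules twice is a no-op. */
function upsertMarkedBlock(doc: string, rules: string): string {
  const block = `${BEGIN_MARKER}\n${rules.trim()}\n${END_MARKER}`;
  const start = doc.indexOf(BEGIN_MARKER);
  const end = doc.indexOf(END_MARKER);
  if (start !== -1 && end > start) {
    // Replace only the managed span; text outside the markers is untouched.
    return doc.slice(0, start) + block + doc.slice(end + END_MARKER.length);
  }
  // No managed block yet: append one, separated from existing text by a blank line.
  const separator = doc.length === 0 ? "" : doc.endsWith("\n") ? "\n" : "\n\n";
  return doc + separator + block + "\n";
}
```

Because the second application finds the markers and rewrites the same span with the same bytes, the output of one application is a fixed point of the function.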

codeburn guard (#605)

Session-time hook pack for Claude Code, verified against the live hooks docs: budget caps (soft warn once, hard block until guard allow), session openers injected only into projects the detectors flag, an end-of-session yield nudge, optional statusline. Handlers fail open (any error exits 0 silently) and parse transcripts incrementally with per-message-id dedup that matches the shipped cost path exactly; warm hook overhead is ~10ms on a 90MB transcript. Install/uninstall edit settings.json through the journal and restore byte-identical.
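The fail-open rule can be sketched as a thin wrapper around each handler. This is an illustration under assumptions, not the code in hooks.ts: the wrapper name and the input shape are hypothetical, and the handler is assumed to return the process exit code.

```typescript
// Hypothetical input shape; only the fields this sketch needs.
type HookInput = { session_id?: string; transcript_path?: string };

/** Run a handler so that malformed stdin or any thrown error yields exit code 0. */
function runFailOpen(rawStdin: string, handler: (input: HookInput) => number): number {
  try {
    const input = JSON.parse(rawStdin) as HookInput;
    return handler(input);
  } catch {
    // Fail open: a broken guard must never block the session, so stay silent.
    return 0;
  }
}
```

A caller would pass the returned value to `process.exit`, so only a handler that completes normally can produce a non-zero (blocking) exit code.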

codeburn act report (#606)

Realized savings for applied fixes: baselines captured at apply time, post-window re-measurement per kind, estimates window-scaled so the comparison cannot over-claim, reverts detected, low-confidence rows excluded from the optimize header, and honest-accounting rules printed in the footer. Optimize gains a one-line header only when measured actions exist.

How it was built and reviewed

Each issue went through an implement, adversarial-review, fix cycle before the next started, with a final line-by-line review on top. The review passes caught and fixed, among others: silent data destruction in move-op undo, a project-scope change that would have removed servers from unrelated projects, and a critical guard bug where streaming transcript duplicates inflated measured session cost ~3x (validated fixed to the cent against real transcripts). Test mutations that survived the suites were turned into new tests.

Testing

93 new tests across five suites (act journal/undo 20, optimize-apply 25, guard 30, act-report 23, plus CLI-level coverage); full suite 1493 passing
Real-machine smokes are read-only by design: --dry-run verified byte-clean against the live config repeatedly; guard hooks exercised against copies of real transcripts in temp dirs; no real apply or install was run on this machine
npm run build and tsc --noEmit clean; zero new dependencies

Add src/act, a dependency-free framework for journaling and reverting any file CodeBurn modifies. runAction is the single mutation path: it snapshots every target, applies the changes, then appends a JSONL record, rolling back completed steps and journaling nothing if a mutation throws midway. Undo checks each file against its post-apply sha256 and refuses on drift unless forced, restoring edits, deletes (created files), and moves. A pid plus timestamp lockfile (stale after 60s) guards apply and undo, and the journal reader tolerates corrupt lines with last-line-wins status updates. Wire up `codeburn act list` (table or --json) and `codeburn act undo <id|--last> [--force]`. Storage lives under the existing config home via the config.ts resolver. Tests cover apply with backups and afterHash, byte-identical undo per op type, drift refusal and --force, mid-apply rollback, and corrupt-journal tolerance.
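The snapshot, mutate, hash, journal sequence can be sketched as follows. This is a simplified illustration that handles file edits only; the function names, record shape, and backup naming are assumptions, not the actual src/act API.

```typescript
import * as crypto from "node:crypto";
import * as fs from "node:fs";
import * as path from "node:path";

// Hypothetical shapes for illustration; the real types live in src/act.
type Edit = { path: string; content: string };
type FileRecord = { path: string; backup: string | null; afterHash: string };
type JournalRecord = { id: string; at: string; status: "applied"; files: FileRecord[] };

const sha256 = (data: Buffer | string): string =>
  crypto.createHash("sha256").update(data).digest("hex");

function runActionSketch(id: string, edits: Edit[], backupDir: string, journalPath: string): JournalRecord {
  fs.mkdirSync(backupDir, { recursive: true });
  // 1. Snapshot every existing target before anything is mutated.
  const backups = edits.map((edit, i) => {
    if (!fs.existsSync(edit.path)) return null; // the edit creates this file
    const dest = path.join(backupDir, `${i}.bak`);
    fs.copyFileSync(edit.path, dest);
    return dest;
  });
  const applied: number[] = [];
  try {
    // 2. Mutate.
    edits.forEach((edit, i) => {
      fs.writeFileSync(edit.path, edit.content);
      applied.push(i);
    });
    // 3. Hash the final state, 4. then append one JSONL record.
    const record: JournalRecord = {
      id,
      at: new Date().toISOString(),
      status: "applied",
      files: edits.map((edit, i) => ({
        path: edit.path,
        backup: backups[i],
        afterHash: sha256(fs.readFileSync(edit.path)),
      })),
    };
    fs.appendFileSync(journalPath, JSON.stringify(record) + "\n");
    return record;
  } catch (err) {
    // Roll back completed steps in reverse order; nothing was journaled.
    for (const i of [...applied].reverse()) {
      const backup = backups[i];
      if (backup) fs.copyFileSync(backup, edits[i].path);
      else fs.rmSync(edits[i].path, { force: true });
    }
    throw err;
  }
}

/** Undo-side drift check: an unreadable target counts as drift. */
function hasDrifted(file: string, afterHash: string): boolean {
  try {
    return sha256(fs.readFileSync(file)) !== afterHash;
  } catch {
    return true;
  }
}
```

An undo command would call `hasDrifted` for each journaled file and refuse unless forced when any of them returns true.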

Undo no longer clobbers files it did not create: an occupied original path counts as drift for moves (--force removes then renames back), a move destination that already exists is snapshotted (destBackup) and restored after the file moves back, and a missing moved file falls back to the source snapshot so forced undo cannot die mid-loop. Non-move reverts now key on backup presence instead of the op label, which also restores files that a create overwrote. Apply snapshots once per unique path, hashes after all mutations so overlapping changes carry the final state, and journals inside the rollback region so a failed append reverts the mutations. The lock is taken with a single wx write and goes stale by mtime only, so a fresh lock can never be stolen while empty. Drift reads treat any unreadable target as drift with its error code, ambiguous id prefixes report the match count, undo --last skips already-undone records ("Nothing to undo."), and readRecords only swallows ENOENT. Tests cover each new behavior plus two mutation probes (forward-order revert and removed locking both fail the suite), and a CLI-level check of `act list --json` output shape and ordering.
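A lock taken with a single exclusive write and aged by mtime might look like the sketch below. The function names and the retry shape are assumptions; the 60s threshold is the figure from the earlier commit. Note that clearing a stale lock and retrying still leaves a small window in which two contenders could both try to steal, which this sketch does not close.

```typescript
import * as fs from "node:fs";

const STALE_AFTER_MS = 60_000; // stale threshold stated in the first act commit

/** Age of the lock file by mtime, or null if it vanished in the meantime. */
function lockAgeMs(lockPath: string): number | null {
  try {
    return Date.now() - fs.statSync(lockPath).mtimeMs;
  } catch {
    return null;
  }
}

/** Try to take the lock; returns false while a fresh lock is held by someone else. */
function acquireLock(lockPath: string): boolean {
  const body = JSON.stringify({ pid: process.pid, at: Date.now() });
  for (let attempt = 0; attempt < 2; attempt++) {
    try {
      // "wx" creates the file and fails with EEXIST if it already exists,
      // so creation and content arrive in one call.
      fs.writeFileSync(lockPath, body, { flag: "wx" });
      return true;
    } catch (err) {
      if ((err as NodeJS.ErrnoException).code !== "EEXIST") throw err;
    }
    const age = lockAgeMs(lockPath);
    if (age === null) continue; // holder released it; try again
    if (age < STALE_AFTER_MS) return false; // fresh lock: never steal, even if empty
    fs.rmSync(lockPath, { force: true }); // stale by mtime: clear and retry once
  }
  return false;
}
```

Staleness depends only on the file's mtime, never on parsing its content, so a lock that exists but has not been written yet cannot be mistaken for an abandoned one.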

The archive actions in the acting epic move whole skill/agent directories, which the framework could not handle: snapshotFile used copyFile (EISDIR on a directory) and the afterHash pass used readFile. Snapshots now branch on lstat, copying directory trees with fs.cp recursive. Directories get an empty afterHash ('' means no content hash) and drift detection skips the hash comparison for them; the occupied-original-path check still applies and a missing movedTo still falls back to the backup. Backup restore likewise branches: a directory snapshot replaces the target (rm then cp recursive). Apply-side rename cannot replace a directory destination (ENOTEMPTY), so a move retries once after clearing the already-snapshotted destination, rethrowing other codes before any destination damage. Tests: archive a directory tree and restore it byte-identical, move a directory onto an existing destination directory (destBackup taken, both trees restored), and dir-move undo with an occupied original path refusing without --force and overwriting with it.
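The lstat branch described above can be sketched with Node's built-in fs calls. The function names here are hypothetical; `fs.cpSync` with `recursive: true` is the synchronous counterpart of the `fs.cp` call the commit mentions.

```typescript
import * as fs from "node:fs";

/** Snapshot a file or a whole directory tree; reports which kind was copied. */
function snapshotPath(target: string, backup: string): "file" | "dir" {
  if (fs.lstatSync(target).isDirectory()) {
    fs.cpSync(target, backup, { recursive: true }); // copyFile would fail with EISDIR
    return "dir";
  }
  fs.copyFileSync(target, backup);
  return "file";
}

/** Restore from a snapshot; a directory snapshot replaces the target wholesale. */
function restoreSnapshot(backup: string, target: string): void {
  if (fs.lstatSync(backup).isDirectory()) {
    fs.rmSync(target, { recursive: true, force: true }); // rm, then cp recursive
    fs.cpSync(backup, target, { recursive: true });
  } else {
    fs.copyFileSync(backup, target);
  }
}
```

Removing the target before copying is what makes the directory restore exact: files added after the snapshot do not survive it.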

Add stable kebab-case finding ids to every optimize detector and to the JSON report, then route the config-class findings through the action journal so they can be applied and undone.

- optimize.ts: id on every WasteFinding and OptimizeJsonReport entry; appliable findings also carry a machine-readable apply payload (mcp server list, project-scope keepers, archive names).
- act/plans.ts: planFor(finding) builds concrete, journaled file mutations for mcp-remove, mcp-project-scope, skill/agent/command archives, CLAUDE.md rule blocks, and the bash output cap. JSON edits preserve the rest of the document (2-space indent, trailing newline); unparseable config files are reported and skipped, not fatal.
- act/optimize-apply.ts: codeburn optimize --apply with a plain-readline confirm, plus --yes, --dry-run, and --only; prints each journal id and the undo hint. --apply with --json exits 2.

Tests cover mcp remove/undo, project-scope global-to-project move, unparseable-file skip, archive collision suffixing, CLAUDE.md marker idempotency, a byte-identical dry-run tree hash, and a finding-id guard.

…on (#604)

Review fixes plus one coordinator amendment on top of the initial optimize --apply implementation.

- mcp-project-scope no longer strips a server from every projects[*] container in ~/.claude.json: only the top-level entry and the finding's cold projects lose it, and the cwd's own config files count as cold only when the cwd is in the cold list. The plan preview annotates ~/.claude.json with the project entries that lose the server.
- unused-mcp findings are now appliable (remove-everywhere mcp-remove plans, same as low-coverage).
- Notes are rendered under manual findings too, so an all-unparseable config surfaces its parse error instead of a bare "manual".
- ConfigDocs strips a leading UTF-8 BOM before JSON.parse.
- claude-md plans are excluded from --apply --yes (they write to cwd/CLAUDE.md); they apply via the interactive picker or an explicit --only selection, and --yes prints them as skipped with the reason.
- --only with an unknown or not-appliable id errors to stderr with the run's valid appliable ids and exit code 2.
- EOF at the interactive prompt prints "Nothing applied." and exits 0; an answer that arrives together with EOF is still honored.

Adds end-to-end tests driving runOptimizeApply over injected stdio and a fixture home: --yes output with journal ids and undo hints, picker parsing, --only filtering and validation, EOF, claude-md skip, the projects[*] over-deletion regression, manual-note rendering, and BOM configs.
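The scoping rule from the first bullet, together with the BOM strip, can be sketched as below. The config layout (a top-level `mcpServers` map plus `projects[path].mcpServers`) and all function names are assumptions made for illustration; the sketch is not the plans.ts implementation.

```typescript
// Assumed layout of ~/.claude.json, for illustration only.
type McpMap = Record<string, unknown>;
type ClaudeConfig = {
  mcpServers?: McpMap;
  projects?: Record<string, { mcpServers?: McpMap }>;
};

/** Parse a config document, stripping a leading BOM; null means "unparseable, skip". */
function parseConfig(raw: string): ClaudeConfig | null {
  const text = raw.charCodeAt(0) === 0xfeff ? raw.slice(1) : raw;
  try {
    return JSON.parse(text) as ClaudeConfig;
  } catch {
    return null;
  }
}

/**
 * Remove `server` from the top level and from the listed cold projects only.
 * Returns the entries that lost the server, which a preview could print.
 */
function scopeServer(config: ClaudeConfig, server: string, coldProjects: string[]): string[] {
  const touched: string[] = [];
  if (config.mcpServers && server in config.mcpServers) {
    delete config.mcpServers[server];
    touched.push("(top level)");
  }
  for (const project of coldProjects) {
    const servers = config.projects?.[project]?.mcpServers;
    if (servers && server in servers) {
      delete servers[server];
      touched.push(project);
    }
  }
  return touched;
}
```

Iterating over the cold list rather than over every `projects` key is the design point: a project that is not named in the finding can never be modified.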

Plans serialize the full post-edit file content at build time, so a target edited between the preview and the interactive confirm would be silently overwritten with stale content (recoverable via backup, but a silent violation of the dry-run-shows-exactly-what-changes principle).

- PlannedChange edit/create variants gain optional expectedHash: sha256 of the raw on-disk bytes the plan was built from (hashed before the BOM strip), null when the plan expects the file to be absent, undefined to skip validation (framework back-compat).
- plans.ts sets it everywhere a target is read: ConfigDocs hashes the raw buffer it parses, and the marker builders hash the file behind a shared markerChange helper.
- runAction validates all expected hashes after snapshotting and before the first mutation; a mismatch throws "<path> changed since the plan was built; re-run codeburn optimize --apply", removes the backup dir, and journals nothing. No rollback is involved since nothing mutated.

Tests: stale edit target rejected with no journal record and no backup dir left, expects-absent plan rejected when the file appeared, matching hash still applies, and a hash-less change still applies unchecked.
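The three-state expectedHash check can be sketched as a pre-mutation pass. The type below is a hypothetical reduction of PlannedChange to the two fields the check needs, and the function names are assumptions; the error message is the one quoted in the commit.

```typescript
import * as crypto from "node:crypto";
import * as fs from "node:fs";

// Hypothetical reduction of PlannedChange to the fields this check uses.
type PlannedChangeSketch = { path: string; expectedHash?: string | null };

/** sha256 of the raw on-disk bytes, or null when the file does not exist. */
function rawHashOrNull(file: string): string | null {
  try {
    return crypto.createHash("sha256").update(fs.readFileSync(file)).digest("hex");
  } catch (err) {
    if ((err as NodeJS.ErrnoException).code === "ENOENT") return null;
    throw err;
  }
}

/** Throw before any mutation if a target no longer matches what the plan saw. */
function assertPlanIsFresh(changes: PlannedChangeSketch[]): void {
  for (const change of changes) {
    if (change.expectedHash === undefined) continue; // back-compat: unchecked
    // string: bytes must match; null: the file must still be absent.
    if (rawHashOrNull(change.path) !== change.expectedHash) {
      throw new Error(`${change.path} changed since the plan was built; re-run codeburn optimize --apply`);
    }
  }
}
```

Mapping "absent" to null on both sides lets a single comparison cover the expects-absent case and the hash-mismatch case.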

…rs to constants

Guard (#605) reuses both as a single source of truth: parseApiCall is the narrowest per-entry cost+tools extractor for the incremental transcript parse, and the two session-opener strings become exported constants so the optimize findings and the guard SessionStart hook can never drift.

codeburn guard install|uninstall|status|refresh|allow plus the internal hook/statusline handlers. Off by default, fully local, cleanly removable.

- Settings edits go through the action journal (guard-install / guard-uninstall) with expectedHash, appending our entries and removing exactly ours by command prefix; a byte-identical uninstall is asserted by test.
- PreToolUse budget cap (soft warn once, hard block with a per-session allow override), Stop yield checkpoint (expensive with no edits and no commit, once), and a flagged-project SessionStart opener built from the optimize detectors.
- Incremental per-session cache keyed by session id: resumes the transcript parse from the last complete-line byte offset via readSessionLines, folding only the tail into running totals. Warm invocation ~0.28s against a 90MB transcript, dominated by CLI startup; the tail parse itself is negligible.
- All handlers fail open: any error, malformed stdin, or missing transcript exits 0 with no output so a broken guard can never block a session.
- Hook protocol verified against the live docs (dated block at the top of hooks.ts); zero new dependencies.
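Resuming from the last complete-line byte offset can be sketched as follows. This is not readSessionLines itself: the function name and return shape are hypothetical, and this simplified version leaves an unterminated trailing line entirely for the next call.

```typescript
import * as fs from "node:fs";

/** Read only the newline-terminated lines that appeared after `fromOffset`. */
function readCompleteLinesFrom(file: string, fromOffset: number): { lines: string[]; nextOffset: number } {
  const size = fs.statSync(file).size;
  if (size <= fromOffset) return { lines: [], nextOffset: fromOffset };
  const fd = fs.openSync(file, "r");
  try {
    const buf = Buffer.alloc(size - fromOffset);
    const bytesRead = fs.readSync(fd, buf, 0, buf.length, fromOffset);
    const chunk = buf.subarray(0, bytesRead);
    const lastNewline = chunk.lastIndexOf(0x0a);
    // No complete line yet: keep the offset so the partial line is re-read later.
    if (lastNewline === -1) return { lines: [], nextOffset: fromOffset };
    const lines = chunk
      .subarray(0, lastNewline)
      .toString("utf8")
      .split("\n")
      .filter((line) => line.length > 0);
    // The offset is in bytes, so multi-byte characters cannot skew it.
    return { lines, nextOffset: fromOffset + lastNewline + 1 };
  } finally {
    fs.closeSync(fd);
  }
}
```

A caller would persist `nextOffset` in the per-session cache and pass it back on the next hook invocation, so each run touches only the bytes appended since the previous one.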

Claude Code rewrites each assistant message several times as it streams, every copy carrying the full final usage; the shipped parser dedupes these last-wins (dedupeStreamingMessageIds) but the guard fold summed every line, measuring real sessions at 2.5-2.8x their true cost and false-blocking the hard cap at roughly 40% of the configured spend.

- The session cache now maps message id -> that id's cost contribution and each id-carrying line replaces its previous contribution; id-less lines keep plain adds. Validated against two real transcripts (90MB and 116MB): guard totals now equal the shipped deduped totals exactly.
- Replace semantics also self-heal the trailing-line case: a complete final line without its newline is folded but byteOffset stops before it, so the next invocation re-reads it as a replace, not a double add.
- editCount becomes a set-once sawEdit boolean so duplicate copies of an edit tool_use cannot inflate it; cache schema bumped to v2 (old caches cold-reparse once).
- Per-session state moves to guard/sessions/ so a session id can never collide with the shared flags.json, dropping the doAllow special case.
- The git-commit detector now requires commit as the git subcommand at a command boundary (start of string or line, or ; & |), with intra-command gaps that never cross newlines: 'git log --grep commit' and 'git diff && echo commit' no longer match, while newline-separated 'git add ...\ngit commit' in multi-line Bash calls now does (verified as a real false negative on a live transcript).
- Corrected the statusline protocol note: each stdout line renders as its own row; we emit exactly one.
- New tests: streaming-duplicate fixtures (3x identical, growing last-wins, incremental replace) asserted equal to a cold shipped-parser computation, the trailing-partial-line scenario, the commit-detector matrix, and a stale-plan test proving guard-install plans carry expectedHash (a concurrent settings edit aborts the apply and survives).
- The act list CLI spawn test now anchors to the repo root from the test file location.
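The replace-versus-add fold from the first bullet can be sketched like this. The entry and cache shapes are hypothetical simplifications (the real cache also tracks the byte offset and the sawEdit flag); the sketch only shows why keying contributions by message id makes repeated copies harmless.

```typescript
// Hypothetical, simplified shapes for illustration.
type CostEntry = { messageId?: string; costUsd: number };
type SessionCostCache = { byMessageId: Record<string, number>; idlessTotal: number };

/** Fold parsed entries into the cache: id-carrying lines replace, id-less lines add. */
function foldEntries(cache: SessionCostCache, entries: CostEntry[]): SessionCostCache {
  for (const entry of entries) {
    if (entry.messageId) {
      // Last copy wins, so streaming duplicates and re-read lines cannot double count.
      cache.byMessageId[entry.messageId] = entry.costUsd;
    } else {
      cache.idlessTotal += entry.costUsd;
    }
  }
  return cache;
}

/** Session total: one contribution per message id plus all id-less adds. */
function sessionTotal(cache: SessionCostCache): number {
  let total = cache.idlessTotal;
  for (const cost of Object.values(cache.byMessageId)) total += cost;
  return total;
}
```

The same property covers the trailing-line case in the second bullet: folding a line that was already folded just rewrites its own slot.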

Capture a trailing-14-day before-baseline when a fix is applied and re-measure it against the post-apply window so optimize can show realized numbers next to estimates.

- ActionBaseline (windowDays, capturedAt, estimatedTokens, sessions, metrics) persisted by runAction; captured in the optimize --apply flow and at guard install time.
- codeburn act report [--json]: applied, not-undone actions older than 3 days, re-running the detectors over apply-date-to-now (capped 30 days). Per-kind realized deltas: MCP/archive tokens-per-session times saved sessions with reverted-by-user detection; read-edit deficit reduction; guard yield split labeled correlation. Bash cap is marked not measurable (result sizes are not retained). Low confidence under 20 post-window sessions or past a 2x volume shift. Realized numbers rounded down, estimate kept visible.
- optimize gains one header line only when a measured action exists, and appends "(previously applied <date>, re-flagged)" to re-triggered findings. No change for users with no applied actions.

Reuses scanAndDetect helpers over a date-bounded range; exports the token constants and read/edit tool sets rather than duplicating the math.

…606)

Bug-hunt follow-ups on the realized-savings report:

- Scale the displayed estimate to the measured window so the Estimated and Realized columns are comparable: mcp/archive use the per-session baseline times the post-window session count, read-edit uses deficitThen times the same edits denominator as realized (so realized never exceeds it). The at-apply estimate stays in --json as estimatedAtApply next to estimatedForWindow; the footer states the scaling and that mcp/archive realized figures are derived from session counts, not independently measured.
- Never crash on a corrupt journal: records without a string status or a parseable string `at` are skipped and surfaced ("N malformed records skipped"); the optimize header computation is additionally wrapped so any error just drops the header.
- Zero post-window sessions now reads "not measurable: no sessions in the window yet" instead of measured-zero.
- The optimize header sums only normal-confidence measured rows (under-claim); low-confidence rows stay visible in act report only.
- Tests: floor discipline on non-integer mcp and read-edit products (a ceil mutation fails), the project-scope keeper subtraction, the archive estimate==realized tautology, malformed-journal robustness, and the low-confidence header exclusion.
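The accounting rules for an mcp/archive row can be sketched as below. Everything here is a hypothetical simplification: the baseline shape is reduced, the names are invented, and reading "2x volume shift" as a sessions-per-day ratio above 2 or below 0.5 is an assumption about how that rule is computed.

```typescript
// Hypothetical, reduced baseline; the real ActionBaseline carries more fields.
type BaselineSketch = { windowDays: number; sessions: number; tokensPerSession: number };
type ReportRow =
  | { status: "not-measurable"; reason: string }
  | { status: "measured"; estimatedForWindow: number; realized: number; confidence: "normal" | "low" };

function measureMcpRow(b: BaselineSketch, postDays: number, postSessions: number, savedSessions: number): ReportRow {
  if (postSessions === 0) {
    return { status: "not-measurable", reason: "no sessions in the window yet" };
  }
  // Estimate scaled to the measured window; both figures are rounded down.
  const estimatedForWindow = Math.floor(b.tokensPerSession * postSessions);
  const realized = Math.floor(b.tokensPerSession * Math.min(savedSessions, postSessions));
  // Assumed reading of "2x volume shift": sessions-per-day ratio outside [0.5, 2].
  const baseRate = b.sessions / b.windowDays;
  const shift = baseRate > 0 ? postSessions / postDays / baseRate : Infinity;
  const low = postSessions < 20 || shift > 2 || shift < 0.5;
  return { status: "measured", estimatedForWindow, realized, confidence: low ? "low" : "normal" };
}

/** Header total: only normal-confidence measured rows count, so it under-claims. */
function headerTotal(rows: ReportRow[]): number {
  let total = 0;
  for (const row of rows) {
    if (row.status === "measured" && row.confidence === "normal") total += row.realized;
  }
  return total;
}
```

Clamping saved sessions to the post-window count keeps the realized figure at or below the window-scaled estimate, which is the comparison property the first bullet asks for.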

iamtoruk added 11 commits July 3, 2026 09:39

iamtoruk merged commit 4b8e744 into main Jul 3, 2026
3 checks passed

iamtoruk merged 11 commits into main from feat/act-core
