Acting layer: optimize --apply, session guard, realized savings (opt-in) #610

Merged: iamtoruk merged 11 commits into `main` from `feat/act-core` on Jul 3, 2026

iamtoruk commented Jul 3, 2026

What

The opt-in acting layer from the epic: CodeBurn can now execute the fixes it recommends, watch sessions for runaway spend, and measure what applied fixes actually saved. Everything is off by default, journaled, and undoable. Core commands are unchanged for users who never opt in.

Closes #603, closes #604, closes #605, closes #606. (#607 stays design-gated per the epic #602.)

codeburn act (#603)

Action journal with per-file backups under the config dir. Every mutation the acting layer makes goes through one framework: snapshot first, mutate, hash, journal. `act list` shows what was done; `act undo <id>` restores the targets byte-identical and refuses (unless `--force` is given) when a target has drifted since apply. It handles files and directories, move collisions, mid-apply rollback, and a stale-plan guard: a plan whose target changed between preview and confirm refuses to apply instead of clobbering it.
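
The snapshot-then-drift-check cycle for a single edited file can be sketched with Node's `fs` and `crypto` alone. This is a minimal illustration under stated assumptions: `JournalRecord`, `applyEdit`, and `undoEdit` are hypothetical names, not CodeBurn's API, and the real framework also covers creates, moves, directories, locking, and the JSONL append itself.

```typescript
import { createHash } from "node:crypto";
import * as fs from "node:fs";
import * as path from "node:path";

// Hypothetical record for one edited file; not CodeBurn's actual journal schema.
interface JournalRecord {
  id: string;
  target: string;
  backup: string;    // path of the pre-apply snapshot
  afterHash: string; // sha256 of the target right after the mutation
  status: "applied" | "undone";
}

const sha256 = (buf: Buffer): string => createHash("sha256").update(buf).digest("hex");

// Snapshot first, mutate, hash; the caller would append the returned record to the journal.
function applyEdit(id: string, target: string, newContent: string, backupDir: string): JournalRecord {
  fs.mkdirSync(backupDir, { recursive: true });
  const backup = path.join(backupDir, `${id}.bak`);
  fs.copyFileSync(target, backup);
  fs.writeFileSync(target, newContent);
  return { id, target, backup, afterHash: sha256(fs.readFileSync(target)), status: "applied" };
}

// Restore the snapshot byte-identical, refusing when the target drifted since apply.
function undoEdit(rec: JournalRecord, force = false): JournalRecord {
  let current: string;
  try {
    current = sha256(fs.readFileSync(rec.target));
  } catch {
    current = ""; // an unreadable target counts as drift
  }
  if (current !== rec.afterHash && !force) {
    throw new Error(`${rec.target} changed since apply; use --force to overwrite`);
  }
  fs.copyFileSync(rec.backup, rec.target);
  return { ...rec, status: "undone" };
}
```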

codeburn optimize --apply (#604)

Findings now carry stable ids, and the config-class ones become one-keystroke fixes: remove or project-scope MCP servers (scoped precisely: other projects' entries are never touched, and the preview names exactly which entries change), archive unused skills/agents/commands, insert CLAUDE.md rule blocks between markers (idempotent), and cap bash output. Supports an interactive picker, `--yes`, `--dry-run`, and `--only`. As a safety rule, CLAUDE.md edits are excluded from a blanket `--yes`.
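
Idempotent rule blocks between markers reduce to "replace the block if present, append it otherwise". A minimal sketch, assuming hypothetical marker strings (the PR does not show CodeBurn's real markers) and a hypothetical `upsertRuleBlock` helper:

```typescript
// Hypothetical marker strings; CodeBurn's real markers are not shown in this PR.
const START = "<!-- codeburn:rules:start -->";
const END = "<!-- codeburn:rules:end -->";

// Insert the managed block, or replace it in place if it already exists.
// Running it twice with the same rules yields the same document (idempotent).
function upsertRuleBlock(doc: string, rules: string): string {
  const block = `${START}\n${rules.trimEnd()}\n${END}`;
  const s = doc.indexOf(START);
  if (s !== -1) {
    const e = doc.indexOf(END, s);
    // A start marker without an end marker is ambiguous: refuse rather than guess.
    if (e === -1) throw new Error("unterminated rule block; refusing to edit");
    return doc.slice(0, s) + block + doc.slice(e + END.length);
  }
  // No block yet: append after a blank line, keeping a trailing newline.
  if (doc.length === 0) return block + "\n";
  const base = doc.endsWith("\n") ? doc : doc + "\n";
  return base + "\n" + block + "\n";
}
```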

codeburn guard (#605)

Session-time hook pack for Claude Code, verified against the live hooks docs: budget caps (soft warn once; hard block until `codeburn guard allow`), session openers injected only into projects the detectors flag, an end-of-session yield nudge, and an optional statusline. Handlers fail open (any error exits 0 silently) and parse transcripts incrementally, with per-message-id dedup that matches the shipped cost path exactly; warm hook overhead is ~10ms on a 90MB transcript. Install and uninstall edit settings.json through the journal, and uninstall restores it byte-identical.
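
Per-message-id dedup can be expressed as a fold in which an id-carrying line replaces that id's previous contribution while id-less lines add. A minimal sketch, assuming each transcript line has already been reduced to a hypothetical `{ messageId?, cost }` pair; `CostLine`, `SessionTotals`, `fold`, and `total` are illustrative names, not CodeBurn's code.

```typescript
// Hypothetical per-line shape after cost extraction (not the real transcript schema).
interface CostLine {
  messageId?: string;
  cost: number;
}

interface SessionTotals {
  byId: Map<string, number>; // message id -> its latest cost contribution
  idless: number;            // lines without an id are plain adds
}

function fold(state: SessionTotals, line: CostLine): void {
  if (line.messageId === undefined) {
    state.idless += line.cost;
  } else {
    // Streaming rewrites each carry the full usage: replace, never add.
    state.byId.set(line.messageId, line.cost);
  }
}

function total(state: SessionTotals): number {
  let sum = state.idless;
  for (const c of state.byId.values()) sum += c;
  return sum;
}
```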

codeburn act report (#606)

Realized savings for applied fixes: baselines captured at apply time, post-window re-measurement per kind, estimates window-scaled so the comparison cannot over-claim, reverts detected, low-confidence rows excluded from the optimize header, and honest-accounting rules printed in the footer. Optimize gains a one-line header only when measured actions exist.

How it was built and reviewed

Each issue went through an implement, adversarial-review, fix cycle before the next started, with a final line-by-line review on top. Among other things, the review passes caught and fixed silent data destruction in move-op undo, a project-scope change that would have removed servers from unrelated projects, and a critical guard bug where streaming transcript duplicates inflated measured session cost ~3x (the fix was validated to the cent against real transcripts). Mutations that survived the test suites were turned into new tests.

Testing

  • 93 new tests across five suites (act journal/undo 20, optimize-apply 25, guard 30, act-report 23, plus CLI-level coverage); full suite 1493 passing
  • Real-machine smoke tests are read-only by design: `--dry-run` was repeatedly verified byte-clean against the live config; guard hooks were exercised against copies of real transcripts in temp dirs; no real apply or install was run on this machine
  • npm run build and tsc --noEmit clean; zero new dependencies

iamtoruk added 11 commits July 3, 2026 09:39

Add src/act, a dependency-free framework for journaling and reverting any
file CodeBurn modifies. runAction is the single mutation path: it snapshots
every target, applies the changes, then appends a JSONL record, rolling back
completed steps and journaling nothing if a mutation throws midway. Undo
checks each file against its post-apply sha256 and refuses on drift unless
forced, restoring edits, deletes (created files), and moves. A pid plus
timestamp lockfile (stale after 60s) guards apply and undo, and the journal
reader tolerates corrupt lines with last-line-wins status updates.

Wire up `codeburn act list` (table or --json) and
`codeburn act undo <id|--last> [--force]`. Storage lives under the existing
config home via the config.ts resolver. Tests cover apply with backups and
afterHash, byte-identical undo per op type, drift refusal and --force,
mid-apply rollback, and corrupt-journal tolerance.
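
A journal reader that skips corrupt lines and lets a later line for the same id override earlier ones might look like the following. It is a sketch only: the JSONL record shape and `readRecords` signature here are assumptions, and the ENOENT-only swallow mirrors the follow-up commit's description.

```typescript
import * as fs from "node:fs";

// Hypothetical JSONL record shape: later status lines reuse the apply record's id.
interface JournalLine {
  id: string;
  status: string;
  [key: string]: unknown;
}

// Skip unparseable lines; a later line for the same id overrides earlier fields.
function readRecords(file: string): Map<string, JournalLine> {
  let text: string;
  try {
    text = fs.readFileSync(file, "utf8");
  } catch (err) {
    // Only a missing journal means "no records"; other errors propagate.
    if ((err as NodeJS.ErrnoException).code === "ENOENT") return new Map();
    throw err;
  }
  const records = new Map<string, JournalLine>();
  for (const raw of text.split("\n")) {
    if (raw.trim() === "") continue;
    let parsed: unknown;
    try {
      parsed = JSON.parse(raw);
    } catch {
      continue; // corrupt line, e.g. a torn write: skip it
    }
    if (typeof parsed !== "object" || parsed === null) continue;
    const rec = parsed as Partial<JournalLine>;
    if (typeof rec.id !== "string" || typeof rec.status !== "string") continue;
    const prev = records.get(rec.id);
    records.set(rec.id, { ...prev, ...rec } as JournalLine); // last line wins
  }
  return records;
}
```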

Undo no longer clobbers files it did not create: an occupied original path
counts as drift for moves (--force removes then renames back), a move
destination that already exists is snapshotted (destBackup) and restored
after the file moves back, and a missing moved file falls back to the source
snapshot so forced undo cannot die mid-loop. Non-move reverts now key on
backup presence instead of the op label, which also restores files that a
create overwrote.

Apply snapshots once per unique path, hashes after all mutations so
overlapping changes carry the final state, and journals inside the rollback
region so a failed append reverts the mutations. The lock is taken with a
single wx write and goes stale by mtime only, so a fresh lock can never be
stolen while empty. Drift reads treat any unreadable target as drift with
its error code, ambiguous id prefixes report the match count, undo --last
skips already-undone records ("Nothing to undo."), and readRecords only
swallows ENOENT.

Tests cover each new behavior plus two mutation probes (forward-order revert
and removed locking both fail the suite), and a CLI-level check of
`act list --json` output shape and ordering.
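
The single-`wx`-write lock with mtime-only staleness can be sketched as below. `tryLock` and `unlock` are hypothetical names; the 60-second window comes from the first commit. The sketch does not make breaking a stale lock atomic against a second concurrent breaker; that simplification is called out in a comment.

```typescript
import * as fs from "node:fs";

const STALE_MS = 60_000; // staleness window stated in the first commit

// Take the lock with one exclusive-create write of "pid timestamp".
// Returns false when another, non-stale holder owns it.
function tryLock(lockPath: string, now: number = Date.now(), retried = false): boolean {
  try {
    fs.writeFileSync(lockPath, `${process.pid} ${now}\n`, { flag: "wx" });
    return true;
  } catch (err) {
    if ((err as NodeJS.ErrnoException).code !== "EEXIST") throw err;
  }
  if (retried) return false;
  let mtimeMs: number;
  try {
    mtimeMs = fs.statSync(lockPath).mtimeMs;
  } catch (err) {
    if ((err as NodeJS.ErrnoException).code !== "ENOENT") throw err;
    return tryLock(lockPath, now, true); // released between our two calls
  }
  // Staleness is judged by mtime only, so a fresh lock is never stolen, even while empty.
  if (now - mtimeMs < STALE_MS) return false;
  // Stale: remove and retry once. Not atomic if two processes break the same
  // stale lock at the same moment; a sketch-level simplification.
  fs.rmSync(lockPath, { force: true });
  return tryLock(lockPath, now, true);
}

function unlock(lockPath: string): void {
  fs.rmSync(lockPath, { force: true });
}
```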

The archive actions in the acting epic move whole skill/agent directories,
which the framework could not handle: snapshotFile used copyFile (EISDIR on
a directory) and the afterHash pass used readFile. Snapshots now branch on
lstat, copying directory trees with fs.cp recursive. Directories get an
empty afterHash ('' means no content hash) and drift detection skips the
hash comparison for them; the occupied-original-path check still applies and
a missing movedTo still falls back to the backup. Backup restore likewise
branches: a directory snapshot replaces the target (rm then cp recursive).

Apply-side rename cannot replace a directory destination (ENOTEMPTY), so a
move retries once after clearing the already-snapshotted destination,
rethrowing other codes before any destination damage.

Tests: archive a directory tree and restore it byte-identical, move a
directory onto an existing destination directory (destBackup taken, both
trees restored), and dir-move undo with an occupied original path refusing
without --force and overwriting with it.
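
The directory-aware snapshot, restore, and move-with-retry steps look roughly like this in Node. `snapshot`, `restore`, and `moveInto` are hypothetical helpers; the retry also accepts `EEXIST` because POSIX `rename` may report either code for a non-empty directory destination.

```typescript
import * as fs from "node:fs";

// Snapshot a file or a whole directory tree, branching on lstat.
function snapshot(target: string, backup: string): void {
  if (fs.lstatSync(target).isDirectory()) {
    fs.cpSync(target, backup, { recursive: true });
  } else {
    fs.copyFileSync(target, backup);
  }
}

// Restore a snapshot: a directory snapshot replaces the target (rm, then recursive copy).
function restore(backup: string, target: string): void {
  if (fs.lstatSync(backup).isDirectory()) {
    fs.rmSync(target, { recursive: true, force: true });
    fs.cpSync(backup, target, { recursive: true });
  } else {
    fs.copyFileSync(backup, target);
  }
}

// rename() cannot replace a non-empty directory. The caller must already have
// snapshotted `dest`; only then is it cleared and the rename retried once.
function moveInto(src: string, dest: string): void {
  try {
    fs.renameSync(src, dest);
  } catch (err) {
    const code = (err as NodeJS.ErrnoException).code;
    if (code !== "ENOTEMPTY" && code !== "EEXIST") throw err; // no destination damage yet
    fs.rmSync(dest, { recursive: true, force: true });
    fs.renameSync(src, dest);
  }
}
```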

Add stable kebab-case finding ids to every optimize detector and to the
JSON report, then route the config-class findings through the action
journal so they can be applied and undone.

- optimize.ts: id on every WasteFinding and OptimizeJsonReport entry;
  appliable findings also carry a machine-readable apply payload (mcp
  server list, project-scope keepers, archive names).
- act/plans.ts: planFor(finding) builds concrete, journaled file
  mutations for mcp-remove, mcp-project-scope, skill/agent/command
  archives, CLAUDE.md rule blocks, and the bash output cap. JSON edits
  preserve the rest of the document (2-space indent, trailing newline);
  unparseable config files are reported and skipped, not fatal.
- act/optimize-apply.ts: codeburn optimize --apply with a plain-readline
  confirm, plus --yes, --dry-run, and --only; prints each journal id and
  the undo hint. --apply with --json exits 2.

Tests cover mcp remove/undo, project-scope global-to-project move,
unparseable-file skip, archive collision suffixing, CLAUDE.md marker
idempotency, a byte-identical dry-run tree hash, and a finding-id guard.
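
A JSON edit that keeps the rest of the document, re-serializes with a 2-space indent and trailing newline, and reports an unparseable file instead of failing can be sketched as follows. `removeMcpServers` and `EditResult` are hypothetical; the key-order preservation relies on `JSON.parse`/`JSON.stringify` keeping insertion order for non-integer-like keys.

```typescript
// Either the new file text, or a note explaining why the file was skipped.
type EditResult = { ok: true; content: string } | { ok: false; note: string };

// Remove named servers from a top-level "mcpServers" object (hypothetical helper).
function removeMcpServers(raw: string, names: string[]): EditResult {
  let doc: unknown;
  try {
    doc = JSON.parse(raw);
  } catch (err) {
    return { ok: false, note: `unparseable: ${(err as Error).message}` };
  }
  if (typeof doc !== "object" || doc === null || Array.isArray(doc)) {
    return { ok: false, note: "not a JSON object" };
  }
  const servers = (doc as Record<string, unknown>).mcpServers;
  if (typeof servers === "object" && servers !== null) {
    for (const name of names) delete (servers as Record<string, unknown>)[name];
  }
  // Other keys keep their insertion order; write 2-space indent plus trailing newline.
  return { ok: true, content: JSON.stringify(doc, null, 2) + "\n" };
}
```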

…on (#604)

Review fixes plus one coordinator amendment on top of the initial
optimize --apply implementation.

- mcp-project-scope no longer strips a server from every projects[*]
  container in ~/.claude.json: only the top-level entry and the
  finding's cold projects lose it, and the cwd's own config files count
  as cold only when the cwd is in the cold list. The plan preview
  annotates ~/.claude.json with the project entries that lose the
  server.
- unused-mcp findings are now appliable (remove-everywhere mcp-remove
  plans, same as low-coverage).
- Notes are rendered under manual findings too, so an all-unparseable
  config surfaces its parse error instead of a bare "manual".
- ConfigDocs strips a leading UTF-8 BOM before JSON.parse.
- claude-md plans are excluded from --apply --yes (they write to
  cwd/CLAUDE.md); they apply via the interactive picker or an explicit
  --only selection, and --yes prints them as skipped with the reason.
- --only with an unknown or not-appliable id errors to stderr with the
  run's valid appliable ids and exit code 2.
- EOF at the interactive prompt prints "Nothing applied." and exits 0;
  an answer that arrives together with EOF is still honored.

Adds end-to-end tests driving runOptimizeApply over injected stdio and
a fixture home: --yes output with journal ids and undo hints, picker
parsing, --only filtering and validation, EOF, claude-md skip, the
projects[*] over-deletion regression, manual-note rendering, and BOM
configs.
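
The scoping fix (only the top-level entry and the finding's cold projects lose the server) can be illustrated with the removal half of a project-scope move; adding the server to keeper projects is omitted. `ClaudeJson`, `parseConfig`, and `scopeServer` are hypothetical names, and the config shape is a simplified assumption about `~/.claude.json`.

```typescript
// Simplified, assumed shape of the relevant parts of ~/.claude.json.
interface ClaudeJson {
  mcpServers?: Record<string, unknown>;
  projects?: Record<string, { mcpServers?: Record<string, unknown> }>;
  [key: string]: unknown;
}

// Strip a leading UTF-8 BOM before JSON.parse.
function parseConfig(raw: string): ClaudeJson {
  return JSON.parse(raw.charCodeAt(0) === 0xfeff ? raw.slice(1) : raw) as ClaudeJson;
}

// Remove `server` from the top-level entry and from the cold projects only;
// every other project entry is left alone. Returns the project paths that
// lost the server so a preview can name them.
function scopeServer(doc: ClaudeJson, server: string, coldProjects: string[]): string[] {
  if (doc.mcpServers) delete doc.mcpServers[server];
  const changed: string[] = [];
  for (const project of coldProjects) {
    const entry = doc.projects?.[project];
    if (entry?.mcpServers && server in entry.mcpServers) {
      delete entry.mcpServers[server];
      changed.push(project);
    }
  }
  return changed;
}
```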

Plans serialize the full post-edit file content at build time, so a
target edited between the preview and the interactive confirm would be
silently overwritten with stale content (recoverable via backup, but a
silent violation of the dry-run-shows-exactly-what-changes principle).

- PlannedChange edit/create variants gain optional expectedHash: sha256
  of the raw on-disk bytes the plan was built from (hashed before the
  BOM strip), null when the plan expects the file to be absent,
  undefined to skip validation (framework back-compat).
- plans.ts sets it everywhere a target is read: ConfigDocs hashes the
  raw buffer it parses, and the marker builders hash the file behind a
  shared markerChange helper.
- runAction validates all expected hashes after snapshotting and before
  the first mutation; a mismatch throws "<path> changed since the plan
  was built; re-run codeburn optimize --apply", removes the backup dir,
  and journals nothing. No rollback is involved since nothing mutated.

Tests: stale edit target rejected with no journal record and no backup
dir left, expects-absent plan rejected when the file appeared, matching
hash still applies, and a hash-less change still applies unchecked.
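
The `expectedHash` check runs over every change before the first mutation. A sketch of that validation step, with hypothetical `currentHash` and `validatePlan` helpers; the three-state convention (string, `null`, `undefined`) and the error text follow the commit message.

```typescript
import { createHash } from "node:crypto";
import * as fs from "node:fs";

// expectedHash: sha256 string = file must match, null = file must be absent,
// undefined = skip validation (back-compat).
interface PlannedChange {
  path: string;
  content: string;
  expectedHash?: string | null;
}

// sha256 of the raw on-disk bytes, or null when the file does not exist.
function currentHash(p: string): string | null {
  try {
    return createHash("sha256").update(fs.readFileSync(p)).digest("hex");
  } catch (err) {
    if ((err as NodeJS.ErrnoException).code === "ENOENT") return null;
    throw err;
  }
}

// Validate every change up front, so a stale plan aborts with nothing written.
function validatePlan(changes: PlannedChange[]): void {
  for (const c of changes) {
    if (c.expectedHash === undefined) continue;
    if (currentHash(c.path) !== c.expectedHash) {
      throw new Error(`${c.path} changed since the plan was built; re-run codeburn optimize --apply`);
    }
  }
}
```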

…rs to constants

Guard (#605) reuses both as a single source of truth: parseApiCall is the
narrowest per-entry cost+tools extractor for the incremental transcript parse,
and the two session-opener strings become exported constants so the optimize
findings and the guard SessionStart hook can never drift.

codeburn guard install|uninstall|status|refresh|allow plus the internal
hook/statusline handlers. Off by default, fully local, cleanly removable.

- Settings edits go through the action journal (guard-install / guard-uninstall)
  with expectedHash, appending our entries and removing exactly ours by command
  prefix; a byte-identical uninstall is asserted by test.
- PreToolUse budget cap (soft warn once, hard block with a per-session allow
  override), Stop yield checkpoint (expensive with no edits and no commit, once),
  and a flagged-project SessionStart opener built from the optimize detectors.
- Incremental per-session cache keyed by session id: resumes the transcript
  parse from the last complete-line byte offset via readSessionLines, folding
  only the tail into running totals. Warm invocation ~0.28s against a 90MB
  transcript, dominated by CLI startup; the tail parse itself is negligible.
- All handlers fail open: any error, malformed stdin, or missing transcript
  exits 0 with no output so a broken guard can never block a session.
- Hook protocol verified against the live docs (dated block at the top of
  hooks.ts); zero new dependencies.
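
Resuming a transcript parse from the last complete-line byte offset comes down to reading only the appended bytes and stopping after the final newline. The sketch below uses a hypothetical `readNewLines`, not the PR's `readSessionLines`. It leaves an unterminated final line for the next call rather than folding it now, and it does not handle a transcript that shrinks.

```typescript
import { Buffer } from "node:buffer";
import * as fs from "node:fs";

// Read bytes appended since `offset`; return complete lines and the offset just
// past the last newline, so a half-written trailing line is re-read next time.
function readNewLines(file: string, offset: number): { lines: string[]; nextOffset: number } {
  const size = fs.statSync(file).size;
  if (size <= offset) return { lines: [], nextOffset: offset }; // nothing new
  const fd = fs.openSync(file, "r");
  try {
    const buf = Buffer.alloc(size - offset);
    const n = fs.readSync(fd, buf, 0, buf.length, offset);
    const chunk = buf.subarray(0, n);
    const lastNl = chunk.lastIndexOf(0x0a);
    if (lastNl === -1) return { lines: [], nextOffset: offset };
    // Splitting on byte 0x0a is safe in UTF-8: it never occurs inside a multibyte sequence.
    const text = chunk.subarray(0, lastNl).toString("utf8");
    return { lines: text.split("\n").filter((l) => l.length > 0), nextOffset: offset + lastNl + 1 };
  } finally {
    fs.closeSync(fd);
  }
}
```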

Claude Code rewrites each assistant message several times as it streams,
every copy carrying the full final usage; the shipped parser dedupes these
last-wins (dedupeStreamingMessageIds) but the guard fold summed every line,
measuring real sessions at 2.5-2.8x their true cost and false-blocking the
hard cap at roughly 40% of the configured spend.

- The session cache now maps message id -> that id's cost contribution and
  each id-carrying line replaces its previous contribution; id-less lines
  keep plain adds. Validated against two real transcripts (90MB and 116MB):
  guard totals now equal the shipped deduped totals exactly.
- Replace semantics also self-heal the trailing-line case: a complete final
  line without its newline is folded but byteOffset stops before it, so the
  next invocation re-reads it as a replace, not a double add.
- editCount becomes a set-once sawEdit boolean so duplicate copies of an
  edit tool_use cannot inflate it; cache schema bumped to v2 (old caches
  cold-reparse once).
- Per-session state moves to guard/sessions/ so a session id can never
  collide with the shared flags.json, dropping the doAllow special case.
- The git-commit detector now requires commit as the git subcommand at a
  command boundary (start of string or line, or ; & |), with intra-command
  gaps that never cross newlines: 'git log --grep commit' and
  'git diff && echo commit' no longer match, while newline-separated
  'git add ...\ngit commit' in multi-line Bash calls now does (verified as
  a real false negative on a live transcript).
- Corrected the statusline protocol note: each stdout line renders as its
  own row; we emit exactly one.
- New tests: streaming-duplicate fixtures (3x identical, growing last-wins,
  incremental replace) asserted equal to a cold shipped-parser computation,
  the trailing-partial-line scenario, the commit-detector matrix, and a
  stale-plan test proving guard-install plans carry expectedHash (a
  concurrent settings edit aborts the apply and survives). The act list CLI
  spawn test now anchors to the repo root from the test file location.
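
The command-boundary rule for the commit detector fits in one multiline regex. This is a simplified, hypothetical version (`GIT_COMMIT`, `runsGitCommit`): it requires `git` and `commit` to be adjacent tokens separated only by spaces or tabs, so it never matches across a newline, and it ignores git global options such as `git -C <dir> commit`.

```typescript
// `git commit` at a command boundary: start of string or line (m flag), or after ; & |.
// Only spaces/tabs may separate tokens, so a match never spans a newline.
// The lookahead rejects plumbing such as `git commit-tree`.
const GIT_COMMIT = /(?:^|[;&|])[ \t]*git[ \t]+commit(?=[ \t;&|]|$)/m;

function runsGitCommit(command: string): boolean {
  return GIT_COMMIT.test(command);
}
```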

Capture a trailing-14-day before-baseline when a fix is applied and
re-measure it against the post-apply window so optimize can show realized
numbers next to estimates.

- ActionBaseline (windowDays, capturedAt, estimatedTokens, sessions, metrics)
  persisted by runAction; captured in the optimize --apply flow and at guard
  install time.
- codeburn act report [--json]: applied, not-undone actions older than 3 days,
  re-running the detectors over apply-date-to-now (capped 30 days). Per-kind
  realized deltas: MCP/archive tokens-per-session times saved sessions with
  reverted-by-user detection; read-edit deficit reduction; guard yield split
  labeled correlation. Bash cap is marked not measurable (result sizes are not
  retained). Low confidence under 20 post-window sessions or past a 2x volume
  shift. Realized numbers rounded down, estimate kept visible.
- optimize gains one header line only when a measured action exists, and
  appends "(previously applied <date>, re-flagged)" to re-triggered findings.
  No change for users with no applied actions.

Reuses scanAndDetect helpers over a date-bounded range; exports the token
constants and read/edit tool sets rather than duplicating the math.
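
The report window and the confidence rules can be sketched as two small functions. The thresholds (older than 3 days, capped at 30, under 20 sessions, 2x shift) come from the commit message. Measuring the "volume shift" as sessions per day before vs. after is an assumption, as are the names `confidenceOf` and `reportWindowDays`.

```typescript
const MIN_SESSIONS = 20;
const MAX_VOLUME_SHIFT = 2;
const DAY_MS = 86_400_000;

interface WindowStats {
  sessionsBefore: number; // sessions in the pre-apply baseline
  sessionsAfter: number;  // sessions in the post-apply window
  baselineDays: number;
  windowDays: number;
}

type Confidence = "normal" | "low" | "not-measurable";

function confidenceOf(s: WindowStats): Confidence {
  if (s.sessionsAfter === 0) return "not-measurable";
  if (s.sessionsAfter < MIN_SESSIONS) return "low";
  // Assumed: compare sessions per day so windows of different lengths are comparable.
  const before = s.sessionsBefore / s.baselineDays;
  const after = s.sessionsAfter / s.windowDays;
  const ratio = before === 0 ? Infinity : after / before;
  return ratio > MAX_VOLUME_SHIFT || ratio < 1 / MAX_VOLUME_SHIFT ? "low" : "normal";
}

// Post-apply window length in days, or null while the action is too recent to report.
function reportWindowDays(appliedAt: Date, now: Date): number | null {
  const days = (now.getTime() - appliedAt.getTime()) / DAY_MS;
  if (days <= 3) return null;
  return Math.min(days, 30);
}
```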

…606)

Bug-hunt follow-ups on the realized-savings report:

- Scale the displayed estimate to the measured window so the Estimated and
  Realized columns are comparable: mcp/archive use the per-session baseline
  times the post-window session count, read-edit uses deficitThen times the
  same edits denominator as realized (so realized never exceeds it). The
  at-apply estimate stays in --json as estimatedAtApply next to
  estimatedForWindow; the footer states the scaling and that mcp/archive
  realized figures are derived from session counts, not independently
  measured.
- Never crash on a corrupt journal: records without a string status or a
  parseable string `at` are skipped and surfaced ("N malformed records
  skipped"); the optimize header computation is additionally wrapped so any
  error just drops the header.
- Zero post-window sessions now reads "not measurable: no sessions in the
  window yet" instead of measured-zero.
- The optimize header sums only normal-confidence measured rows (under-claim);
  low-confidence rows stay visible in act report only.
- Tests: floor discipline on non-integer mcp and read-edit products (a ceil
  mutation fails), the project-scope keeper subtraction, the archive
  estimate==realized tautology, malformed-journal robustness, and the
  low-confidence header exclusion.
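
Window scaling for an MCP row: the estimate and the realized figure share the same post-window session count, so realized can never exceed the displayed estimate, and realized is floored to under-claim. A sketch with hypothetical names (`McpRow`, `scaleMcpRow`); treating `keeperSessions` as the project-scope keeper subtraction is an assumption about how the PR models it.

```typescript
interface McpRow {
  tokensPerSession: number;   // per-session overhead measured in the baseline
  postWindowSessions: number; // sessions in the measured post-apply window
  keeperSessions: number;     // assumed: post-window sessions in projects that kept the server
}

function scaleMcpRow(r: McpRow): { estimatedForWindow: number; realized: number } {
  // Estimate scaled to the measured window, not the at-apply projection.
  const estimatedForWindow = r.tokensPerSession * r.postWindowSessions;
  const savedSessions = Math.max(0, r.postWindowSessions - r.keeperSessions);
  // Realized savings are rounded down so the report never over-claims.
  const realized = Math.floor(r.tokensPerSession * savedSessions);
  return { estimatedForWindow, realized };
}
```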

iamtoruk merged commit 4b8e744 into main on Jul 3, 2026
3 checks passed