atomantic · atomantic · Jul 4, 2026 · Jul 3, 2026 · Jul 3, 2026 · Jul 3, 2026
diff --git a/.changelog/NEXT.md b/.changelog/NEXT.md
@@ -13,3 +13,4 @@
 ## Fixed
 
 - **PM2 standardization no longer overwrites a PM2 config PortOS didn't generate.** Standardizing an app used to always regenerate `ecosystem.config.cjs`, silently replacing a hand-written one — losing its custom ports and settings. A config PortOS didn't generate is now preserved by default (and its ports are left untouched in `.env`/Vite too); PortOS only regenerates its own previously-generated config. To deliberately replace a custom config, opt in with the overwrite flag when standardizing.
+- **CoS TUI agents running `/do:release`/`/do:pr`/`/do:rpr` no longer get prematurely reaped as "done" while a reviewer is still working** — the idle-timeout reaper only special-cased the `/do:next --swarm` merge-queue wait (#2074); a slow multi-reviewer pass (e.g. codex reading a large diff) could go silent past the default 3-minute idle window and get finalized as a false `idle-complete` success before the release ever reached the merge gate, leaving the PR open with the task dashboard showing "completed" (observed on agent-61508f36, PR #2084, which sat open+unmerged for hours). Added a `createReviewLoopTracker` (`server/lib/tuiHandshake.js`) that latches on multi-reviewer-loop chrome — the review-plan status line (agent list in brackets), a review-pass counter (N of M), or the aggregate report heading — and extends the idle reaper's grace to 15 minutes while active — mirroring the merge-queue fix exactly, including surfacing a timeout as a needs-manual-finish failure (`reason: 'review-loop-idle-timeout'`) instead of a silent success. (Deliberately worded here without reproducing the literal trigger strings — this changelog entry itself would otherwise false-latch the tracker once a future release run reads it.)
diff --git a/server/lib/tuiHandshake.js b/server/lib/tuiHandshake.js
@@ -378,6 +378,106 @@ export function createMergeQueueTracker() {
   };
 }
 
+// Extended idle threshold applied while a `/do:release`, `/do:pr`, or `/do:rpr`
+// multi-reviewer loop is waiting on a slow external reviewer (a Copilot cloud
+// review, a headless codex/agy/claude review pass, an Ollama pass, or an
+// arbitrary @<login> human reviewer). Observed 2026-07-02 (agent-61508f36): the
+// review loop correctly backgrounds the reviewer and polls for it rather than
+// blocking — but the reviewer itself can go silent in the wrapped TUI for well
+// over the 3-minute default while it works (e.g. codex reading a large diff),
+// and the runner reaped the still-waiting release agent as `idle-complete` (a
+// false SUCCESS) before it ever reached the merge gate, leaving the release PR
+// open and unmerged. Mirrors the merge-queue grace (#2074) exactly: 15 minutes
+// comfortably covers one reviewer's silent working stretch while still
+// bounding a genuinely-dead agent's reap.
+export const REVIEW_LOOP_IDLE_TIMEOUT_MS = 900000;
+
+// Distinctive markers the multi-reviewer loop (do:release/do:pr/do:rpr) prints
+// once it starts waiting on a reviewer pass. Detection is deliberately
+// conservative, same rationale as MERGE_QUEUE_MARKERS above: a false POSITIVE
+// only extends the (bounded) idle window, and a false NEGATIVE just preserves
+// prior behavior. Matched against ANSI-stripped output.
+//
+// Both patterns are anchored to the literal RENDERED shape rather than a bare
+// substring, because this repo bundles the slashdo docs that DESCRIBE these
+// banners — `lib/slashdo/lib/multi-reviewer-loop.md` alone contains the word
+// "multi-reviewer" dozens of times, the literal phrase "Review plan:" once
+// (inside its own instruction text), AND a fully-rendered example of its own
+// aggregate-report heading ("## Multi-Reviewer Summary", in the doc's sample
+// output block) — so a THIRD marker keyed on that heading alone would latch
+// on any agent reading/quoting that one doc file (codex review flagged this
+// exact collision; verified via `grep -rn "multi-reviewer summary"
+// lib/slashdo/` — it's the only bundled doc containing that literal string).
+// This project's CLAUDE.md convention text is also "run a simplify/
+// self-review pass before committing", whose substring "review pass" would
+// otherwise latch on ANY CoS agent's ordinary narration — not just an actual
+// do:release/do:pr/do:rpr run. Anchoring on the shape only the runtime output
+// actually has (a rendered `[...]` agent list, or a digit/slash pass counter)
+// — verified clean against every bundled slashdo doc — keeps the
+// false-positive rate low without weakening true-positive detection: the
+// review-plan banner alone is a complete, sufficient signal (it prints once,
+// unconditionally, before ANY reviewer pass begins) and the tracker latches
+// permanently once set, so the pass-banner pattern only needs to catch the
+// (rare) case where the plan banner itself was missed.
+const REVIEW_PLAN_PATTERN = /review plan:\s*\[/i;
+const REVIEW_PASS_BANNER_PATTERN = /review pass\s+\d+\s*\/\s*\d+/i;
+
+/**
+ * True when a chunk of ANSI-stripped TUI output shows the multi-reviewer loop
+ * (do:release/do:pr/do:rpr) has started a reviewer pass. Callers MUST pass
+ * stripped output. Non-string / empty input yields false.
+ *
+ * @param {string} strippedText — ANSI-stripped output (a chunk or accumulator).
+ * @returns {boolean}
+ */
+export function isReviewLoopSignal(strippedText) {
+  if (typeof strippedText !== 'string' || !strippedText) return false;
+  return REVIEW_PLAN_PATTERN.test(strippedText) || REVIEW_PASS_BANNER_PATTERN.test(strippedText);
+}
+
+// Rolling tail cap for createReviewLoopTracker's cross-chunk buffer (below).
+// The banner text itself is well under 100 chars (e.g. "Review plan: [claude,
+// codex] (mode: series, stop-mode: all)" is ~62), so this is generous
+// headroom for intervening chrome without letting the buffer grow unbounded
+// over a long-running session.
+const REVIEW_LOOP_TAIL_CAP = 512;
+
+/**
+ * Latching tracker for "this agent is waiting inside a multi-reviewer loop"
+ * (do:release/do:pr/do:rpr). Feed it each ANSI-stripped post-submit chunk via
+ * `observe(text)`; it becomes `active` the first time a review-loop marker
+ * appears and STAYS active thereafter (same latching rationale as
+ * createMergeQueueTracker — the failure mode is a silent external-reviewer
+ * wait, so a recency window would age the flag out exactly when the extended
+ * grace is needed). Once latched, the idle reaper uses
+ * REVIEW_LOOP_IDLE_TIMEOUT_MS instead of the 3-minute default.
+ *
+ * Keeps a small rolling buffer of the most recent REVIEW_LOOP_TAIL_CAP
+ * characters (codex review finding, iteration 2): a real TUI can deliver the
+ * one-shot `Review plan: [` / `Review pass N/M` banner split across two
+ * `onData` chunks — plausible during token-by-token streaming — so checking
+ * only the current chunk in isolation would miss it if the split lands
+ * mid-marker. Concatenating each new chunk onto the tail before testing means
+ * a marker split across a chunk boundary still appears whole on the very next
+ * observation.
+ *
+ * @returns {{ observe: (strippedText: string) => boolean, readonly active: boolean }}
+ */
+export function createReviewLoopTracker() {
+  let active = false;
+  let tail = '';
+  return {
+    observe(strippedText) {
+      if (active) return true;
+      if (typeof strippedText !== 'string' || !strippedText) return active;
+      tail = (tail + strippedText).slice(-REVIEW_LOOP_TAIL_CAP);
+      if (isReviewLoopSignal(tail)) active = true;
+      return active;
+    },
+    get active() { return active; },
+  };
+}
+
 // ─── Buffer caps (defensive RAM bounds) ───────────────────────────────────
 //
 // RAW caps stay small — the raw PTY stream is only used for paste-marker

diff --git a/server/lib/tuiHandshake.test.js b/server/lib/tuiHandshake.test.js
@@ -15,6 +15,9 @@ import {
   MERGE_QUEUE_IDLE_TIMEOUT_MS,
   isMergeQueueSignal,
   createMergeQueueTracker,
+  REVIEW_LOOP_IDLE_TIMEOUT_MS,
+  isReviewLoopSignal,
+  createReviewLoopTracker,
   rendersWorkCounter,
   PASTE_TO_ENTER_MIN_DELAY_MS,
   PASTE_TO_ENTER_FALLBACK_MS,
@@ -290,6 +293,80 @@ describe('tuiHandshake — merge-queue idle suppression (#2074)', () => {
   });
 });
 
+// Observed 2026-07-02 (agent-61508f36) — a do:release run's multi-reviewer
+// loop correctly backgrounded a slow codex review and polled for it rather
+// than blocking, but the reviewer's silent working stretch exceeded the
+// 3-minute default and the runner reaped the still-waiting release as a false
+// `idle-complete` success before it reached the merge gate, leaving PR #2084
+// open. Mirrors the merge-queue suppression above.
+describe('tuiHandshake — review-loop idle suppression', () => {
+  it('extends the idle timeout well past the default 3-minute window', () => {
+    expect(REVIEW_LOOP_IDLE_TIMEOUT_MS).toBe(900000);
+    expect(REVIEW_LOOP_IDLE_TIMEOUT_MS).toBeGreaterThan(DEFAULT_TUI_IDLE_TIMEOUT_MS);
+  });
+
+  it('isReviewLoopSignal matches multi-reviewer-loop chrome (case-insensitive)', () => {
+    expect(isReviewLoopSignal('Review plan: [claude, codex] (mode: series, stop-mode: all)')).toBe(true);
+    expect(isReviewLoopSignal('--- Review pass 1/2: codex ---')).toBe(true);
+  });
+
+  it('isReviewLoopSignal ignores ordinary implementation/output chrome', () => {
+    expect(isReviewLoopSignal('Editing server/services/agentTuiSpawning.js')).toBe(false);
+    expect(isReviewLoopSignal('● high · (12s · running tests)')).toBe(false);
+    expect(isReviewLoopSignal('')).toBe(false);
+    expect(isReviewLoopSignal(null)).toBe(false);
+    expect(isReviewLoopSignal(undefined)).toBe(false);
+  });
+
+  // Regression for false-positive latches found across three rounds of local
+  // review: bare substrings ('review pass', 'review loop', 'multi-reviewer',
+  // 'multi-reviewer summary') would match this project's own CLAUDE.md
+  // convention ("run a simplify/self-review pass before committing") and this
+  // repo's bundled slashdo docs — which say "review loop"/"multi-reviewer"
+  // dozens of times, include the literal instruction text "Review plan:
+  // {REVIEW_AGENTS}...", AND (codex review's finding) render a fully-formed
+  // example of their own "## Multi-Reviewer Summary" aggregate-report heading
+  // in a sample output block — latching the tracker for ANY CoS agent's
+  // ordinary narration or docs-editing, not just an actual do:release/do:pr/
+  // do:rpr run.
+  it('isReviewLoopSignal does NOT match ordinary self-review narration or doc prose', () => {
+    expect(isReviewLoopSignal('running the self-review pass before committing')).toBe(false);
+    expect(isReviewLoopSignal('## Local Agent Code Review Loop')).toBe(false);
+    expect(isReviewLoopSignal('You are a Copilot review loop agent.')).toBe(false);
+    expect(isReviewLoopSignal('the multi-reviewer wrapper dispatches each listed agent')).toBe(false);
+    expect(isReviewLoopSignal('Print the resolved plan before starting: `Review plan: {REVIEW_AGENTS} (mode: ...)`')).toBe(false);
+    expect(isReviewLoopSignal('## Multi-Reviewer Summary')).toBe(false);
+  });
+
+  it('createReviewLoopTracker latches on first signal and stays active through silence', () => {
+    const tracker = createReviewLoopTracker();
+    expect(tracker.active).toBe(false);
+    tracker.observe('implementing the fix, running the suite');
+    expect(tracker.active).toBe(false);
+    // Enters the multi-reviewer loop — latches.
+    expect(tracker.observe('Review plan: [claude, codex] (mode: series, stop-mode: all)')).toBe(true);
+    expect(tracker.active).toBe(true);
+    // Subsequent quiet reviewer-wait chunks (no marker) must NOT un-latch it —
+    // the whole point is that the silent gap is when the grace is needed.
+    tracker.observe('waiting for codex...');
+    tracker.observe('');
+    expect(tracker.active).toBe(true);
+  });
+
+  // Regression for codex review [P2] (iteration 2): a real TUI can deliver
+  // the banner split across two onData chunks (token-by-token streaming),
+  // so checking only the current chunk in isolation would miss it.
+  it('createReviewLoopTracker latches on a marker split across two chunks', () => {
+    const tracker = createReviewLoopTracker();
+    // Split right before the '[' that anchors the pattern — neither half
+    // alone contains "review plan: [".
+    tracker.observe('Now starting the review loop. Review plan:');
+    expect(tracker.active).toBe(false);
+    expect(tracker.observe(' [claude, codex] (mode: series, stop-mode: all)')).toBe(true);
+    expect(tracker.active).toBe(true);
+  });
+});
+
 describe('tuiHandshake.inferTuiCommand', () => {
   // Catch-all default also returns claude; the claude rows just confirm
   // an explicit match isn't accidentally tagged codex/antigravity/gemini.

diff --git a/server/services/agentTuiSpawning.js b/server/services/agentTuiSpawning.js
@@ -26,12 +26,14 @@ import {
   DEFAULT_TUI_PROMPT_DELAY_MS,
   DEFAULT_TUI_IDLE_TIMEOUT_MS,
   MERGE_QUEUE_IDLE_TIMEOUT_MS,
+  REVIEW_LOOP_IDLE_TIMEOUT_MS,
   READY_POLL_INTERVAL_MS,
   READY_IDLE_THRESHOLD_MS,
   PASTE_MARKER_POLL_MS,
   countPasteMarkers,
   createWorkActivityTracker,
   createMergeQueueTracker,
+  createReviewLoopTracker,
   createInputReadyTracker,
   rendersWorkCounter,
   PASTE_TO_ENTER_MIN_DELAY_MS,
@@ -283,6 +285,14 @@ export async function spawnTuiAgent({
   // latched, the idle reaper uses the extended MERGE_QUEUE_IDLE_TIMEOUT_MS so a
   // still-working orchestrator isn't reaped mid-merge (issue #2074).
   const mergeQueue = createMergeQueueTracker();
+  // Latches once the agent enters a do:release/do:pr/do:rpr multi-reviewer
+  // loop, whose external reviewer passes (codex reading a large diff, a
+  // Copilot cloud review, a human @<login> review) can go silent in the TUI
+  // for well over the default idle window. While latched, the idle reaper
+  // uses the extended REVIEW_LOOP_IDLE_TIMEOUT_MS so a still-waiting release
+  // isn't reaped as a false `idle-complete` success before it reaches the
+  // merge gate (issue observed on agent-61508f36, PR #2084).
+  const reviewLoop = createReviewLoopTracker();
   // Tracks claude's interactive input-readiness (footer chrome) and its first-run
   // folder-trust gate. Gates the prompt paste for the claude TUI so we never
   // paste into a startup banner, a trust menu, or a returned shell prompt.
@@ -639,6 +649,13 @@ export async function spawnTuiAgent({
         emitLog('info', `TUI agent ${agentId} entered merge queue — idle reaper extended to ${Math.round(MERGE_QUEUE_IDLE_TIMEOUT_MS / 60000)}min`, { agentId, phase: 'merge-queue' });
         await updateAgent(agentId, { metadata: { phase: 'merge-queue' } });
       }
+      // Detect entry into a do:release/do:pr/do:rpr multi-reviewer loop so the
+      // idle reaper can extend its grace across a slow reviewer's silent
+      // working stretch (see reviewLoop declaration above for the incident).
+      if (promptSubmittedAt && stripped && !reviewLoop.active && reviewLoop.observe(stripped)) {
+        emitLog('info', `TUI agent ${agentId} entered review loop — idle reaper extended to ${Math.round(REVIEW_LOOP_IDLE_TIMEOUT_MS / 60000)}min`, { agentId, phase: 'review-loop' });
+        await updateAgent(agentId, { metadata: { phase: 'review-loop' } });
+      }
       lastOutputAt = now;
       if (firstOutputAt === null) firstOutputAt = lastOutputAt;
 
@@ -901,7 +918,9 @@ export async function spawnTuiAgent({
     // extended window still bounds a genuinely-dead orchestrator's reap.
     const effectiveIdleTimeoutMs = mergeQueue.active
       ? Math.max(tuiConfig.idleTimeoutMs, MERGE_QUEUE_IDLE_TIMEOUT_MS)
-      : tuiConfig.idleTimeoutMs;
+      : reviewLoop.active
+        ? Math.max(tuiConfig.idleTimeoutMs, REVIEW_LOOP_IDLE_TIMEOUT_MS)
+        : tuiConfig.idleTimeoutMs;
     if (idle >= effectiveIdleTimeoutMs) {
       // Reaped AFTER the extended merge-queue grace elapsed: the orchestrator
       // almost certainly died mid-merge with PRs opened/merged-but-uncleaned.
@@ -918,6 +937,22 @@ export async function spawnTuiAgent({
         });
         return;
       }
+      // Reaped AFTER the extended review-loop grace elapsed: a reviewer
+      // (copilot/codex/agy/claude/ollama/@<login>) likely hung, or the wait
+      // simply exceeded budget. Surface it as a needs-manual-finish FAILURE
+      // rather than the silent `status: completed` that let PR #2084 sit
+      // open+unmerged for hours while agent-61508f36 looked "done".
+      if (reviewLoop.active) {
+        finish({
+          success: false,
+          exitCode: 1,
+          error: `TUI agent idled out after ${Math.round(effectiveIdleTimeoutMs / 60000)}min waiting inside the multi-reviewer loop — a reviewer may have hung or the wait exceeded budget; check the PR's review/merge state and finish manually.`,
+          reason: 'review-loop-idle-timeout',
+        }).catch(err => {
+          emitLog('error', `Failed to finalize TUI agent ${agentId}: ${err.message}`, { agentId });
+        });
+        return;
+      }
       // Distinguish a real (sentinel-less) completion from a never-submitted
       // prompt that just idled out. `lastOutputAt > promptSentAt` only proves
       // the TUI repainted SOMETHING — banner/status chrome churns even with the
Original file line number	Diff line number	Diff line change
Expand Up		@@ -13,3 +13,4 @@
		## Fixed

		- PM2 standardization no longer overwrites a PM2 config PortOS didn't generate. Standardizing an app used to always regenerate `ecosystem.config.cjs`, silently replacing a hand-written one — losing its custom ports and settings. A config PortOS didn't generate is now preserved by default (and its ports are left untouched in `.env`/Vite too); PortOS only regenerates its own previously-generated config. To deliberately replace a custom config, opt in with the overwrite flag when standardizing.
		- CoS TUI agents running `/do:release`/`/do:pr`/`/do:rpr` no longer get prematurely reaped as "done" while a reviewer is still working — the idle-timeout reaper only special-cased the `/do:next --swarm` merge-queue wait (#2074); a slow multi-reviewer pass (e.g. codex reading a large diff) could go silent past the default 3-minute idle window and get finalized as a false `idle-complete` success before the release ever reached the merge gate, leaving the PR open with the task dashboard showing "completed" (observed on agent-61508f36, PR #2084, which sat open+unmerged for hours). Added a `createReviewLoopTracker` (`server/lib/tuiHandshake.js`) that latches on multi-reviewer-loop chrome — the review-plan status line (agent list in brackets), a review-pass counter (N of M), or the aggregate report heading — and extends the idle reaper's grace to 15 minutes while active — mirroring the merge-queue fix exactly, including surfacing a timeout as a needs-manual-finish failure (`reason: 'review-loop-idle-timeout'`) instead of a silent success. (Deliberately worded here without reproducing the literal trigger strings — this changelog entry itself would otherwise false-latch the tracker once a future release run reads it.)