Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions .changelog/NEXT.md
Original file line number Diff line number Diff line change
Expand Up @@ -13,3 +13,4 @@
## Fixed

- **PM2 standardization no longer overwrites a PM2 config PortOS didn't generate.** Standardizing an app used to always regenerate `ecosystem.config.cjs`, silently replacing a hand-written one — losing its custom ports and settings. A config PortOS didn't generate is now preserved by default (and its ports are left untouched in `.env`/Vite too); PortOS only regenerates its own previously-generated config. To deliberately replace a custom config, opt in with the overwrite flag when standardizing.
- **CoS TUI agents running `/do:release`/`/do:pr`/`/do:rpr` no longer get prematurely reaped as "done" while a reviewer is still working** — the idle-timeout reaper only special-cased the `/do:next --swarm` merge-queue wait (#2074); a slow multi-reviewer pass (e.g. codex reading a large diff) could go silent past the default 3-minute idle window and get finalized as a false `idle-complete` success before the release ever reached the merge gate, leaving the PR open with the task dashboard showing "completed" (observed on agent-61508f36, PR #2084, which sat open+unmerged for hours). Added a `createReviewLoopTracker` (`server/lib/tuiHandshake.js`) that latches on multi-reviewer-loop chrome — the review-plan status line (agent list in brackets), a review-pass counter (N of M), or the aggregate report heading — and extends the idle reaper's grace to 15 minutes while active — mirroring the merge-queue fix exactly, including surfacing a timeout as a needs-manual-finish failure (`reason: 'review-loop-idle-timeout'`) instead of a silent success. (Deliberately worded here without reproducing the literal trigger strings — this changelog entry itself would otherwise false-latch the tracker once a future release run reads it.)
100 changes: 100 additions & 0 deletions server/lib/tuiHandshake.js
Original file line number Diff line number Diff line change
Expand Up @@ -378,6 +378,106 @@ export function createMergeQueueTracker() {
};
}

// Extended idle threshold applied while a `/do:release`, `/do:pr`, or `/do:rpr`
// multi-reviewer loop is waiting on a slow external reviewer (a Copilot cloud
// review, a headless codex/agy/claude review pass, an Ollama pass, or an
// arbitrary @<login> human reviewer). Observed 2026-07-02 (agent-61508f36): the
// review loop correctly backgrounds the reviewer and polls for it rather than
// blocking — but the reviewer itself can go silent in the wrapped TUI for well
// over the 3-minute default while it works (e.g. codex reading a large diff),
// and the runner reaped the still-waiting release agent as `idle-complete` (a
// false SUCCESS) before it ever reached the merge gate, leaving the release PR
// open and unmerged. Mirrors the merge-queue grace (#2074) exactly: 15 minutes
// comfortably covers one reviewer's silent working stretch while still
// bounding a genuinely-dead agent's reap.
export const REVIEW_LOOP_IDLE_TIMEOUT_MS = 900000;

// Distinctive markers the multi-reviewer loop (do:release/do:pr/do:rpr) prints
// once it starts waiting on a reviewer pass. Detection is deliberately
// conservative, same rationale as MERGE_QUEUE_MARKERS above: a false POSITIVE
// only extends the (bounded) idle window, and a false NEGATIVE just preserves
// prior behavior. Matched against ANSI-stripped output.
//
// Both patterns are anchored to the literal RENDERED shape rather than a bare
// substring, because this repo bundles the slashdo docs that DESCRIBE these
// banners — `lib/slashdo/lib/multi-reviewer-loop.md` alone contains the word
// "multi-reviewer" dozens of times, the literal phrase "Review plan:" once
// (inside its own instruction text), AND a fully-rendered example of its own
// aggregate-report heading ("## Multi-Reviewer Summary", in the doc's sample
// output block) — so a THIRD marker keyed on that heading alone would latch
// on any agent reading/quoting that one doc file (codex review flagged this
// exact collision; verified via `grep -rn "multi-reviewer summary"
// lib/slashdo/` — it's the only bundled doc containing that literal string).
// This project's CLAUDE.md convention text is also "run a simplify/
// self-review pass before committing", whose substring "review pass" would
// otherwise latch on ANY CoS agent's ordinary narration — not just an actual
// do:release/do:pr/do:rpr run. Anchoring on the shape only the runtime output
// actually has (a rendered `[...]` agent list, or a digit/slash pass counter)
// — verified clean against every bundled slashdo doc — keeps the
// false-positive rate low without weakening true-positive detection: the
// review-plan banner alone is a complete, sufficient signal (it prints once,
// unconditionally, before ANY reviewer pass begins) and the tracker latches
// permanently once set, so the pass-banner pattern only needs to catch the
// (rare) case where the plan banner itself was missed.
const REVIEW_PLAN_PATTERN = /review plan:\s*\[/i;
const REVIEW_PASS_BANNER_PATTERN = /review pass\s+\d+\s*\/\s*\d+/i;

/**
* True when a chunk of ANSI-stripped TUI output shows the multi-reviewer loop
* (do:release/do:pr/do:rpr) has started a reviewer pass. Callers MUST pass
* stripped output. Non-string / empty input yields false.
*
* @param {string} strippedText — ANSI-stripped output (a chunk or accumulator).
* @returns {boolean}
*/
export function isReviewLoopSignal(strippedText) {
if (typeof strippedText !== 'string' || !strippedText) return false;
return REVIEW_PLAN_PATTERN.test(strippedText) || REVIEW_PASS_BANNER_PATTERN.test(strippedText);
}

// Rolling tail cap for createReviewLoopTracker's cross-chunk buffer (below).
// The banner text itself is well under 100 chars (e.g. "Review plan: [claude,
// codex] (mode: series, stop-mode: all)" is ~62), so this is generous
// headroom for intervening chrome without letting the buffer grow unbounded
// over a long-running session.
const REVIEW_LOOP_TAIL_CAP = 512;

/**
* Latching tracker for "this agent is waiting inside a multi-reviewer loop"
* (do:release/do:pr/do:rpr). Feed it each ANSI-stripped post-submit chunk via
* `observe(text)`; it becomes `active` the first time a review-loop marker
* appears and STAYS active thereafter (same latching rationale as
* createMergeQueueTracker — the failure mode is a silent external-reviewer
* wait, so a recency window would age the flag out exactly when the extended
* grace is needed). Once latched, the idle reaper uses
* REVIEW_LOOP_IDLE_TIMEOUT_MS instead of the 3-minute default.
*
* Keeps a small rolling buffer of the most recent REVIEW_LOOP_TAIL_CAP
* characters (codex review finding, iteration 2): a real TUI can deliver the
* one-shot `Review plan: [` / `Review pass N/M` banner split across two
* `onData` chunks — plausible during token-by-token streaming — so checking
* only the current chunk in isolation would miss it if the split lands
* mid-marker. Concatenating each new chunk onto the tail before testing means
* a marker split across a chunk boundary still appears whole on the very next
* observation.
*
* @returns {{ observe: (strippedText: string) => boolean, readonly active: boolean }}
*/
export function createReviewLoopTracker() {
let active = false;
let tail = '';
return {
observe(strippedText) {
if (active) return true;
if (typeof strippedText !== 'string' || !strippedText) return active;
tail = (tail + strippedText).slice(-REVIEW_LOOP_TAIL_CAP);
if (isReviewLoopSignal(tail)) active = true;
return active;
},
get active() { return active; },
};
}

// ─── Buffer caps (defensive RAM bounds) ───────────────────────────────────
//
// RAW caps stay small — the raw PTY stream is only used for paste-marker
Expand Down
77 changes: 77 additions & 0 deletions server/lib/tuiHandshake.test.js
Original file line number Diff line number Diff line change
Expand Up @@ -15,6 +15,9 @@ import {
MERGE_QUEUE_IDLE_TIMEOUT_MS,
isMergeQueueSignal,
createMergeQueueTracker,
REVIEW_LOOP_IDLE_TIMEOUT_MS,
isReviewLoopSignal,
createReviewLoopTracker,
rendersWorkCounter,
PASTE_TO_ENTER_MIN_DELAY_MS,
PASTE_TO_ENTER_FALLBACK_MS,
Expand Down Expand Up @@ -290,6 +293,80 @@ describe('tuiHandshake — merge-queue idle suppression (#2074)', () => {
});
});

// Observed 2026-07-02 (agent-61508f36) — a do:release run's multi-reviewer
// loop correctly backgrounded a slow codex review and polled for it rather
// than blocking, but the reviewer's silent working stretch exceeded the
// 3-minute default and the runner reaped the still-waiting release as a false
// `idle-complete` success before it reached the merge gate, leaving PR #2084
// open. Mirrors the merge-queue suppression above.
describe('tuiHandshake — review-loop idle suppression', () => {
it('extends the idle timeout well past the default 3-minute window', () => {
expect(REVIEW_LOOP_IDLE_TIMEOUT_MS).toBe(900000);
expect(REVIEW_LOOP_IDLE_TIMEOUT_MS).toBeGreaterThan(DEFAULT_TUI_IDLE_TIMEOUT_MS);
});

it('isReviewLoopSignal matches multi-reviewer-loop chrome (case-insensitive)', () => {
expect(isReviewLoopSignal('Review plan: [claude, codex] (mode: series, stop-mode: all)')).toBe(true);
expect(isReviewLoopSignal('--- Review pass 1/2: codex ---')).toBe(true);
});

it('isReviewLoopSignal ignores ordinary implementation/output chrome', () => {
expect(isReviewLoopSignal('Editing server/services/agentTuiSpawning.js')).toBe(false);
expect(isReviewLoopSignal('● high · (12s · running tests)')).toBe(false);
expect(isReviewLoopSignal('')).toBe(false);
expect(isReviewLoopSignal(null)).toBe(false);
expect(isReviewLoopSignal(undefined)).toBe(false);
});

// Regression for false-positive latches found across three rounds of local
// review: bare substrings ('review pass', 'review loop', 'multi-reviewer',
// 'multi-reviewer summary') would match this project's own CLAUDE.md
// convention ("run a simplify/self-review pass before committing") and this
// repo's bundled slashdo docs — which say "review loop"/"multi-reviewer"
// dozens of times, include the literal instruction text "Review plan:
// {REVIEW_AGENTS}...", AND (codex review's finding) render a fully-formed
// example of their own "## Multi-Reviewer Summary" aggregate-report heading
// in a sample output block — latching the tracker for ANY CoS agent's
// ordinary narration or docs-editing, not just an actual do:release/do:pr/
// do:rpr run.
it('isReviewLoopSignal does NOT match ordinary self-review narration or doc prose', () => {
expect(isReviewLoopSignal('running the self-review pass before committing')).toBe(false);
expect(isReviewLoopSignal('## Local Agent Code Review Loop')).toBe(false);
expect(isReviewLoopSignal('You are a Copilot review loop agent.')).toBe(false);
expect(isReviewLoopSignal('the multi-reviewer wrapper dispatches each listed agent')).toBe(false);
expect(isReviewLoopSignal('Print the resolved plan before starting: `Review plan: {REVIEW_AGENTS} (mode: ...)`')).toBe(false);
expect(isReviewLoopSignal('## Multi-Reviewer Summary')).toBe(false);
});

it('createReviewLoopTracker latches on first signal and stays active through silence', () => {
const tracker = createReviewLoopTracker();
expect(tracker.active).toBe(false);
tracker.observe('implementing the fix, running the suite');
expect(tracker.active).toBe(false);
// Enters the multi-reviewer loop — latches.
expect(tracker.observe('Review plan: [claude, codex] (mode: series, stop-mode: all)')).toBe(true);
expect(tracker.active).toBe(true);
// Subsequent quiet reviewer-wait chunks (no marker) must NOT un-latch it —
// the whole point is that the silent gap is when the grace is needed.
tracker.observe('waiting for codex...');
tracker.observe('');
expect(tracker.active).toBe(true);
});

// Regression for codex review [P2] (iteration 2): a real TUI can deliver
// the banner split across two onData chunks (token-by-token streaming),
// so checking only the current chunk in isolation would miss it.
it('createReviewLoopTracker latches on a marker split across two chunks', () => {
const tracker = createReviewLoopTracker();
// Split right before the '[' that anchors the pattern — neither half
// alone contains "review plan: [".
tracker.observe('Now starting the review loop. Review plan:');
expect(tracker.active).toBe(false);
expect(tracker.observe(' [claude, codex] (mode: series, stop-mode: all)')).toBe(true);
expect(tracker.active).toBe(true);
});
});

describe('tuiHandshake.inferTuiCommand', () => {
// Catch-all default also returns claude; the claude rows just confirm
// an explicit match isn't accidentally tagged codex/antigravity/gemini.
Expand Down
37 changes: 36 additions & 1 deletion server/services/agentTuiSpawning.js
Original file line number Diff line number Diff line change
Expand Up @@ -26,12 +26,14 @@ import {
DEFAULT_TUI_PROMPT_DELAY_MS,
DEFAULT_TUI_IDLE_TIMEOUT_MS,
MERGE_QUEUE_IDLE_TIMEOUT_MS,
REVIEW_LOOP_IDLE_TIMEOUT_MS,
READY_POLL_INTERVAL_MS,
READY_IDLE_THRESHOLD_MS,
PASTE_MARKER_POLL_MS,
countPasteMarkers,
createWorkActivityTracker,
createMergeQueueTracker,
createReviewLoopTracker,
createInputReadyTracker,
rendersWorkCounter,
PASTE_TO_ENTER_MIN_DELAY_MS,
Expand Down Expand Up @@ -283,6 +285,14 @@ export async function spawnTuiAgent({
// latched, the idle reaper uses the extended MERGE_QUEUE_IDLE_TIMEOUT_MS so a
// still-working orchestrator isn't reaped mid-merge (issue #2074).
const mergeQueue = createMergeQueueTracker();
// Latches once the agent enters a do:release/do:pr/do:rpr multi-reviewer
// loop, whose external reviewer passes (codex reading a large diff, a
// Copilot cloud review, a human @<login> review) can go silent in the TUI
// for well over the default idle window. While latched, the idle reaper
// uses the extended REVIEW_LOOP_IDLE_TIMEOUT_MS so a still-waiting release
// isn't reaped as a false `idle-complete` success before it reaches the
// merge gate (issue observed on agent-61508f36, PR #2084).
const reviewLoop = createReviewLoopTracker();
// Tracks claude's interactive input-readiness (footer chrome) and its first-run
// folder-trust gate. Gates the prompt paste for the claude TUI so we never
// paste into a startup banner, a trust menu, or a returned shell prompt.
Expand Down Expand Up @@ -639,6 +649,13 @@ export async function spawnTuiAgent({
emitLog('info', `TUI agent ${agentId} entered merge queue — idle reaper extended to ${Math.round(MERGE_QUEUE_IDLE_TIMEOUT_MS / 60000)}min`, { agentId, phase: 'merge-queue' });
await updateAgent(agentId, { metadata: { phase: 'merge-queue' } });
}
// Detect entry into a do:release/do:pr/do:rpr multi-reviewer loop so the
// idle reaper can extend its grace across a slow reviewer's silent
// working stretch (see reviewLoop declaration above for the incident).
if (promptSubmittedAt && stripped && !reviewLoop.active && reviewLoop.observe(stripped)) {
emitLog('info', `TUI agent ${agentId} entered review loop — idle reaper extended to ${Math.round(REVIEW_LOOP_IDLE_TIMEOUT_MS / 60000)}min`, { agentId, phase: 'review-loop' });
await updateAgent(agentId, { metadata: { phase: 'review-loop' } });
}
lastOutputAt = now;
if (firstOutputAt === null) firstOutputAt = lastOutputAt;

Expand Down Expand Up @@ -901,7 +918,9 @@ export async function spawnTuiAgent({
// extended window still bounds a genuinely-dead orchestrator's reap.
const effectiveIdleTimeoutMs = mergeQueue.active
? Math.max(tuiConfig.idleTimeoutMs, MERGE_QUEUE_IDLE_TIMEOUT_MS)
: tuiConfig.idleTimeoutMs;
: reviewLoop.active
? Math.max(tuiConfig.idleTimeoutMs, REVIEW_LOOP_IDLE_TIMEOUT_MS)
: tuiConfig.idleTimeoutMs;
if (idle >= effectiveIdleTimeoutMs) {
// Reaped AFTER the extended merge-queue grace elapsed: the orchestrator
// almost certainly died mid-merge with PRs opened/merged-but-uncleaned.
Expand All @@ -918,6 +937,22 @@ export async function spawnTuiAgent({
});
return;
}
// Reaped AFTER the extended review-loop grace elapsed: a reviewer
// (copilot/codex/agy/claude/ollama/@<login>) likely hung, or the wait
// simply exceeded budget. Surface it as a needs-manual-finish FAILURE
// rather than the silent `status: completed` that let PR #2084 sit
// open+unmerged for hours while agent-61508f36 looked "done".
if (reviewLoop.active) {
finish({
success: false,
exitCode: 1,
error: `TUI agent idled out after ${Math.round(effectiveIdleTimeoutMs / 60000)}min waiting inside the multi-reviewer loop — a reviewer may have hung or the wait exceeded budget; check the PR's review/merge state and finish manually.`,
reason: 'review-loop-idle-timeout',
}).catch(err => {
emitLog('error', `Failed to finalize TUI agent ${agentId}: ${err.message}`, { agentId });
});
return;
}
// Distinguish a real (sentinel-less) completion from a never-submitted
// prompt that just idled out. `lastOutputAt > promptSentAt` only proves
// the TUI repainted SOMETHING — banner/status chrome churns even with the
Expand Down
Loading