Summary
deploybot react is the privileged orchestrator the GitHub Action runs on every delivery event. It already computes a rich result object (promoted, drain, integrations, release, top-level state) and prints it as JSON, but only to stdout inside one of many interleaved Actions runs, and it always exits 0. As a result, the worst operational failure mode is invisible: when the pipeline is paused, every triggered react no-ops, returns exit 0, and CI stays green, so a stuck pipeline can go unnoticed indefinitely.
This issue proposes rendering the result react already produces into the surfaces humans and agents actually look at — a GitHub Actions step summary, a sticky PR/commit status, and a non-green CI signal when paused or timed-out — plus a run_id to correlate the event fan-in. This is additive, needs no schema or architecture change, and strengthens the existing --json / notify / comment-marker patterns.
Problem statement
command_react returns/prints a structured object but only to stdout, and the top-level main() returns 0 for every non-exception path. (cli.py command_react, main)
When pipeline_control() is paused, react returns {"state": "paused", "reason": ...} and exits 0; the reason lives only in a deploybot-control:v1 comment marker, and nothing surfaces it proactively. (records.py control_body)
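The example workflow sets concurrency: group=deploybot-${{ github.repository }}, cancel-in-progress: false and fans in 6 event families, so debugging "why didn't PR #42 merge?" means diffing multiple interleaved runs with no correlation id. (examples/github-workflow.yml)
follow_release can return timed-out, but only to stdout with exit 0, so a stuck deploy looks successful. (pipeline.py follow_release)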
The Action invokes react and currently writes no $GITHUB_STEP_SUMMARY. (action.yml)
There is no place an operator can glance at to see what the latest react pass did or why the pipeline is stuck.
Proposed behavior
Render the existing command_react result into operator-visible surfaces:
Actions step summary — when GITHUB_STEP_SUMMARY is set, append a Markdown table each pass: state, promoted, merged, waiting [{number, reason}], integration PRs, release/timeout. Built from the result object react already returns.
Visible paused / timed-out signal — when state == "paused" or release.state == "timed-out", make the run non-green (documented non-zero exit and/or a failing check-run). A normal empty pass stays exit 0. The JSON state field remains authoritative for agents.
Sticky status surface — upsert a single "DeployBot status" comment (or commit status) summarizing the latest pass, including, when paused, the reason and the deploybot control unpause remedy. Reuse the marker-upsert machinery in records.py.
run_id correlation — compute once per pass (sha256(repo:utc_now)[:12], mirroring the intent_id pattern in command_request) and include it in the result JSON and in every notify() payload emitted during that pass.
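A minimal sketch of the step-summary surface, assuming a result dict shaped like {"state", "reason", "promoted", "drain": {"merged", "waiting"}, "integrations", "release": {"state"}, "run_id"}. Those keys and the names render_summary and write_step_summary are hypothetical, chosen for illustration; they are not the actual command_react output or deploybot API. GitHub Actions exposes the summary file path in the GITHUB_STEP_SUMMARY environment variable, and appending Markdown to that file is how a step adds to the job summary.

```python
import os
from typing import Any, Mapping


def _cell(value: Any) -> str:
    """Format one table cell: empty values become "none", pipes are escaped."""
    if value is None or value == "":
        return "none"
    return str(value).replace("|", "\\|").replace("\n", " ")


def _prs(numbers: Any) -> str:
    """Render a list of PR numbers as "#1, #2"."""
    return ", ".join(f"#{n}" for n in (numbers or []))


def render_summary(result: Mapping[str, Any]) -> str:
    """Build a Markdown table for one react pass from its result dict."""
    # ASSUMPTION: illustrative result shape; the real command_react keys may differ.
    drain = result.get("drain") or {}
    release = result.get("release") or {}
    state = result.get("state", "unknown")
    if state == "paused":
        reason = result.get("reason") or "no reason recorded"
        state = f"⏸ paused: {reason}; run deploybot control unpause"
    waiting = ", ".join(
        f"#{w['number']} ({w['reason']})" for w in (drain.get("waiting") or [])
    )
    rows = [
        ("state", state),
        ("promoted", _prs(result.get("promoted"))),
        ("merged", _prs(drain.get("merged"))),
        ("waiting", waiting),
        ("integration PRs", _prs(result.get("integrations"))),
        ("release", release.get("state")),
    ]
    title = f"### DeployBot react pass {result.get('run_id', '')}".rstrip()
    lines = [title, "", "| field | value |", "| --- | --- |"]
    lines += [f"| {name} | {_cell(value)} |" for name, value in rows]
    return "\n".join(lines) + "\n\n"


def write_step_summary(result: Mapping[str, Any]) -> bool:
    """Append the table to $GITHUB_STEP_SUMMARY; return False (no-op) when unset."""
    path = os.environ.get("GITHUB_STEP_SUMMARY")
    if not path:
        return False
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(render_summary(result))
    return True
```

Appending (mode "a") rather than overwriting lets several react passes in the same job each leave their own table, and the early return keeps local runs, where the variable is unset, unchanged.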
CLI / API / config changes
No new subcommand required; behavior attaches to the existing react flow.
Add run_id to command_react's result dict and thread it into notify() payloads.
Step-summary writing is handled in the composite Action's final step (or, in the CLI, behind a GITHUB_STEP_SUMMARY check).
Optional: a non-zero exit policy for react when paused/timed-out.
No .mergequeue.toml schema changes. No marker schema changes.
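The optional exit-code policy can be expressed as a pure function of the result dict. This is a sketch under assumptions: react_exit_code and the specific codes 3 and 4 are hypothetical, and the issue only asks that the non-zero exit be documented. The values avoid 1, which Python uses for an uncaught exception, and 2, which argparse uses for usage errors, so a stuck pipeline is distinguishable from a crash.

```python
from typing import Any, Mapping

# Hypothetical, documented exit codes for the two "stuck" outcomes.
EXIT_OK = 0
EXIT_PAUSED = 3
EXIT_TIMED_OUT = 4


def react_exit_code(result: Mapping[str, Any]) -> int:
    """Map a react result to a process exit code.

    Only paused and timed-out passes are non-zero; a normal pass, including
    an empty one, stays 0. The JSON "state" field remains authoritative.
    """
    if result.get("state") == "paused":
        return EXIT_PAUSED
    release = result.get("release") or {}
    if release.get("state") == "timed-out":
        return EXIT_TIMED_OUT
    return EXIT_OK
```

main() would keep printing the JSON exactly as today and then return react_exit_code(result), so agents still read state from stdout while the Actions run turns non-green only in the two stuck cases.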
Backward compatibility
Purely additive. Default text/JSON stdout is unchanged aside from the new run_id field; the step summary, sticky status, and exit-code policy are additive and auto-detected (e.g., only when GITHUB_STEP_SUMMARY is present). Commit-pinned workflows are unaffected. Safe to ship in a minor release.
Telemetry / logging needs
No external telemetry.
Reuse the existing notify() webhook for run_id-stamped events.
Reuse the existing comment-marker upsert for the sticky status surface.
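A minimal sketch of run_id correlation. It assumes utc_now is rendered with datetime.isoformat() and joined as "repo:timestamp" before hashing; the exact string command_request hashes for intent_id is not shown in this issue, so treat that format as an assumption. It also assumes notify() takes a single payload dict; make_run_id and stamped_notify are hypothetical helpers.

```python
import hashlib
from datetime import datetime, timezone
from typing import Any, Callable, Dict, Mapping, Optional

Payload = Dict[str, Any]


def make_run_id(repo: str, now: Optional[datetime] = None) -> str:
    """Compute sha256("<repo>:<utc_now>")[:12]; call once per react pass."""
    now = now or datetime.now(timezone.utc)
    digest = hashlib.sha256(f"{repo}:{now.isoformat()}".encode("utf-8"))
    return digest.hexdigest()[:12]


def stamped_notify(
    notify: Callable[[Payload], None], run_id: str
) -> Callable[[Mapping[str, Any]], None]:
    """Wrap notify(payload) so every payload sent during the pass carries run_id."""

    def _notify(payload: Mapping[str, Any]) -> None:
        # Copy rather than mutate, so callers' dicts are left untouched.
        notify({**payload, "run_id": run_id})

    return _notify
```

Passing the wrapped callable through the pass, instead of adding run_id at each call site, makes it hard for a new notify() call to forget the id; the same value also goes into the result dict under "run_id".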
Acceptance criteria
Every react pass writes a Markdown table to $GITHUB_STEP_SUMMARY when set; behavior is unchanged when unset.
A paused pass renders "⏸ paused: <reason> — run deploybot control unpause" in the summary/status surface and makes the Actions run non-green; a normal empty pass stays green / exit 0.
A follow timeout renders as a visible non-green signal and appears in the summary, distinguishable from verified.
react's JSON includes a run_id, and the same id appears in any notify() payloads emitted during that pass.
waiting[] entries carry the existing classify() reason strings (e.g., "CI is not complete", "head changed after it was queued").
A single sticky "DeployBot status" comment/status is upserted (not duplicated) per pass.
Default stdout (text and --json) is byte-for-byte unchanged aside from the additive run_id field.
Unit tests cover: summary file writing, paused→non-zero exit, timed-out→non-zero exit, run_id propagation into notify, and sticky-comment upsert — following the existing unittest / patch("agent_merge_queue...") style in tests/test_cli.py.
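The sticky-status criterion (one comment, upserted in place, not rewritten when nothing changed) can be sketched as a marker lookup. The issue says to reuse the marker-upsert machinery in records.py, whose API is not shown here, so this sketch substitutes a hypothetical in-memory FakeComments store and a hypothetical STATUS_MARKER string; upsert_status is likewise an illustrative name.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Hypothetical marker string, loosely modeled on the deploybot-control:v1 naming.
STATUS_MARKER = "<!-- deploybot-status:v1 -->"


@dataclass
class FakeComments:
    """In-memory stand-in for a PR/commit comment API (hypothetical)."""

    bodies: Dict[int, str] = field(default_factory=dict)
    next_id: int = 1
    writes: int = 0  # counts create/update calls, to show skipped no-op passes

    def items(self) -> List[Tuple[int, str]]:
        return list(self.bodies.items())

    def create(self, body: str) -> int:
        cid = self.next_id
        self.next_id += 1
        self.bodies[cid] = body
        self.writes += 1
        return cid

    def update(self, cid: int, body: str) -> None:
        self.bodies[cid] = body
        self.writes += 1


def upsert_status(comments: FakeComments, summary_md: str) -> Optional[int]:
    """Create or update the single marked "DeployBot status" comment.

    Returns the id written, or None when the body is unchanged.
    """
    body = f"{STATUS_MARKER}\n**DeployBot status**\n\n{summary_md}"
    for cid, existing in comments.items():
        if existing.startswith(STATUS_MARKER):
            if existing == body:
                return None  # unchanged: skip the write to avoid notification noise
            comments.update(cid, body)
            return cid
    return comments.create(body)
```

A unit test in the tests/test_cli.py style would patch the real comment client and assert the same three properties: first pass creates, an identical pass writes nothing, and a changed pass updates the existing comment.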
Risks & mitigations
Noisy sticky comment → upsert a single comment in place rather than posting per pass; only update when content changes.
Non-zero exit policy → apply it only to paused and timed-out; document it; keep empty/normal passes exit 0.
Step summary → gate on GITHUB_STEP_SUMMARY presence; no-op locally.
Scope creep into concurrency/idempotency fixes → explicitly out of scope here; this issue only surfaces existing state (duplicates become visible first, then fixable).
Out of scope (possible follow-ons)
Idempotency keys (per run_id + batch fingerprint) to prevent duplicate integration PRs / CI dispatches under event bursts.
Emitting paused and follow timeouts as notify() events for alerting.
react orchestration tests (promote→drain→overlap-integrate→follow, timed-out branch).