
fix(wta): surface first-ever copilot sessions live in session management#182

Open
yeelam-gordon wants to merge 3 commits into main from dev/yeelam/fix-wta-listen-respawn

Conversation

yeelam-gordon (Collaborator) commented Jun 2, 2026

Symptom

The session management view stays empty after a user starts their first-ever copilot (or other agent) session inside Intelligent Terminal, even though the session uuid dir is being written to ~/.copilot/session-state/<uuid>/. The row only appears after IT is restarted (which triggers the cold-start history_loader::load_all scan from master boot).

Reproduces in a fresh new tab too, confirming that master's registry is empty rather than a per-helper UI staleness issue.

Root cause

Two independent gaps in the live-update pipeline:

  1. Hook drop at synthetic-key gate. Copilot's SessionStart hook observably fires with an empty session_id on first run. route_agent_event_to_registry_with_hook_sink (app.rs:704) maps that to a synthetic pane:<guid> key, which is intentionally local-only and never published to master. Master's registry therefore never learns about the new uuid.
  2. wtcli --json listen is one-shot. start_reader spawns the child once and returns silently on spawn failure or child exit. If that channel dies, the helper goes deaf to every subsequent WT broadcast (autofix classification, agent.tool.starting, etc.) with no log and no recovery until helper restart.
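
The first gap can be pictured with a minimal sketch. This is not the code in app.rs: `RegistryKey`, `key_for_hook`, and `should_publish_to_master` are hypothetical names used only to illustrate how an empty session_id ends up as a local-only pane:<guid> key that master never sees.

```rust
/// Registry key derived from a hook payload (hypothetical type, for illustration only).
#[derive(Debug, PartialEq, Eq)]
pub enum RegistryKey {
    /// A real agent session uuid; safe to publish to master.
    Session(String),
    /// Placeholder keyed by the pane guid; local to the helper.
    SyntheticPane(String),
}

/// Map a hook's session_id to a registry key.
/// An empty (or whitespace-only) id falls back to a synthetic `pane:<guid>` key.
pub fn key_for_hook(session_id: &str, pane_guid: &str) -> RegistryKey {
    if session_id.trim().is_empty() {
        RegistryKey::SyntheticPane(format!("pane:{}", pane_guid))
    } else {
        RegistryKey::Session(session_id.to_string())
    }
}

/// Only real session keys cross the helper edge. Synthetic keys stay local,
/// which is why master never hears about a first-run session.
pub fn should_publish_to_master(key: &RegistryKey) -> bool {
    matches!(key, RegistryKey::Session(_))
}
```

Under these assumptions, a SessionStart hook with an empty session_id yields a key for which `should_publish_to_master` returns false, so nothing on the live path informs master about the session.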

Neither path self-heals, so the only currently-shipping workaround is "restart IT".

Fixes (two commits, independently reviewable)

1. fix(wta): retry wtcli listen once on unexpected exit (cli_channel.rs)

  • start_reader now retries the wtcli --json listen child exactly once if it exits unexpectedly, then gives up with a loud error log.
  • Deliberately not an infinite respawn loop or backoff schedule: if the first retry also dies, something is fundamentally broken and a tight respawn loop would just spam logs. Per-helper push-event freshness is not load-bearing for the user's reported symptom; fix #2 covers that path independently.
  • Tradeoff explicitly accepted: after a permanent failure, autofix and agent.tool.* push events go dark for the lifetime of that helper. Session management still recovers via the master-side 30 s disk rescan below.
  • kill_on_drop(true) + start_kill before wait so the dying child is reaped, not orphaned.
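
The retry-once policy described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in cli_channel.rs: `listen_with_one_retry`, `ChildEnd`, and `ListenOutcome` are hypothetical names, the child process is abstracted as a closure so the sketch needs only the standard library, and the kill_on_drop / start_kill reaping is left out.

```rust
use std::io;

/// How one run of the listen child ended (hypothetical type).
pub enum ChildEnd {
    /// The reader was asked to stop; not a failure.
    Requested,
    /// The child exited on its own while we still wanted events.
    Unexpected,
}

/// Final result of the reader (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub enum ListenOutcome {
    Shutdown,
    GaveUp { attempts: u32 },
}

/// Run the listen child, retrying exactly once after a spawn failure or an
/// unexpected exit. `run_child` receives the 1-based attempt number and
/// blocks until that child is gone.
pub fn listen_with_one_retry<F>(mut run_child: F) -> ListenOutcome
where
    F: FnMut(u32) -> io::Result<ChildEnd>,
{
    const MAX_ATTEMPTS: u32 = 2; // first run + one retry, no backoff schedule
    for attempt in 1..=MAX_ATTEMPTS {
        match run_child(attempt) {
            Ok(ChildEnd::Requested) => return ListenOutcome::Shutdown,
            Ok(ChildEnd::Unexpected) => {
                eprintln!("listen child exited unexpectedly (attempt {}/{})", attempt, MAX_ATTEMPTS)
            }
            Err(e) => {
                eprintln!("listen child failed to spawn (attempt {}/{}): {}", attempt, MAX_ATTEMPTS, e)
            }
        }
    }
    // Loud, final log: push events stay dark for this helper's lifetime.
    eprintln!("ERROR: listen child died {} times; giving up", MAX_ATTEMPTS);
    ListenOutcome::GaveUp { attempts: MAX_ATTEMPTS }
}
```

Bounding the loop at two attempts keeps the failure mode explicit: a transient exit is absorbed by the retry, while a persistent fault produces a single error log instead of a tight respawn loop.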

2. fix(wta-master): periodic disk rescan to surface missed hook sessions (master/mod.rs + session_registry.rs)

  • New 30 s rescan task in serve_master that re-runs history_loader::load_all and inserts any uuid dir not already in master's registry, then broadcasts sessions/changed only if added > 0.
  • SessionRegistry::insert_if_absent: a race-free check+insert under one lock guard, so the rescan never clobbers a live row with disk-inferred state (history_loader can't observe status=Working/Attention/current_tool).
  • Worst-case latency for a session to surface in session management: ~35 s (30 s rescan tick + 5 s helper session/list tick). This is the load-bearing fix for the user's reported symptom.
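
A minimal sketch of the non-clobbering insert and the rescan that uses it is shown below. The method name insert_if_absent and the type name InMemoryRegistry come from this PR, but the fields, the `Status` and `SessionRow` types, `upsert`, `get`, and the `rescan` helper are assumptions made for illustration; the real trait and row shape are likely different.

```rust
use std::collections::hash_map::Entry;
use std::collections::HashMap;
use std::sync::Mutex;

/// Simplified session status (assumed; the real enum has more variants).
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Status {
    Idle,
    Working,
}

/// Simplified registry row (assumed shape).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct SessionRow {
    pub status: Status,
}

#[derive(Default)]
pub struct InMemoryRegistry {
    rows: Mutex<HashMap<String, SessionRow>>,
}

impl InMemoryRegistry {
    /// Live hook path: always overwrites with the freshest state.
    pub fn upsert(&self, id: &str, row: SessionRow) {
        self.rows.lock().unwrap().insert(id.to_string(), row);
    }

    /// Rescan path: check and insert under ONE lock guard, so a concurrent
    /// hook delivery cannot slip in between the check and the insert.
    /// Returns true only if the row was newly added.
    pub fn insert_if_absent(&self, id: &str, row: SessionRow) -> bool {
        let mut rows = self.rows.lock().unwrap();
        match rows.entry(id.to_string()) {
            Entry::Vacant(slot) => {
                slot.insert(row);
                true
            }
            Entry::Occupied(_) => false, // keep the live row untouched
        }
    }

    pub fn get(&self, id: &str) -> Option<SessionRow> {
        self.rows.lock().unwrap().get(id).cloned()
    }
}

/// One rescan tick: back-fill ids found on disk and return how many were
/// added. The caller would broadcast sessions/changed only when this is > 0.
pub fn rescan(registry: &InMemoryRegistry, on_disk: &[String]) -> usize {
    on_disk
        .iter()
        .filter(|id| registry.insert_if_absent(id.as_str(), SessionRow { status: Status::Idle }))
        .count()
}
```

In this sketch a row that the hook path already marked as Working survives a rescan unchanged, and a second rescan over the same directory listing adds nothing, so no redundant broadcast would be sent.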

Copilot AI review requested due to automatic review settings June 2, 2026 06:11

Copilot AI (Contributor) left a comment

Pull request overview

Fixes session-management UI failing to surface a user's first-ever agent CLI session by closing two live-update gaps: a one-shot wtcli listen reader that goes silent on any transient failure, and a master-side registry that only scans disk once at boot (missing sessions whose hook payload was dropped at the synthetic-key gate).

Changes:

  • Wrap wtcli listen in a respawn loop with exponential backoff, stderr draining, healthy-lifetime gating, and kill_on_drop to keep WT event subscription alive across child exits.
  • Add a 30 s periodic history_loader::load_all rescan in serve_master that broadcasts sessions/changed whenever previously-unseen on-disk sessions are surfaced.
  • Introduce SessionRegistry::insert_if_absent so the rescan never clobbers live hook-tracked state (status=Working, current_tool) under a single lock guard.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| tools/wta/src/shell/wt_channel/cli_channel.rs | Reworks start_reader into a self-healing respawn loop with backoff, stderr logging, and child reaping. |
| tools/wta/src/session_registry.rs | Adds race-free insert_if_absent trait method + InMemoryRegistry impl for non-clobbering inserts. |
| tools/wta/src/master/mod.rs | Adds 30 s periodic disk rescan task that uses insert_if_absent and broadcasts sessions/changed on new finds. |

@yeelam-gordon yeelam-gordon force-pushed the dev/yeelam/fix-wta-listen-respawn branch 2 times, most recently from e8576eb to a4c1c3b Compare June 2, 2026 06:33
Copilot AI review requested due to automatic review settings June 2, 2026 07:00

Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated no new comments.

yeelam-gordon and others added 3 commits June 4, 2026 09:15
wtcli --json listen was a one-shot subprocess: any silent exit (COM
blip, transient RPC, WT restart) permanently silenced the helper's
push-event path until IT restart, so the first copilot/agent session
never surfaced in session-mgmt.

Respawn on exit, but only ONCE. If the retry also exits, give up
(loud error log) — at that point something is fundamentally broken
and a tight respawn loop would just spam logs. Session-management
freshness is independently covered by the master-side 30s disk
rescan, so the worst-case tradeoff is autofix and agent_event hooks
going dark for that helper's lifetime.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Hooks fire SessionStart with empty session_id on the very first run
of a copilot/agent session (the agent hasn't allocated one yet),
so the resulting synthetic pane:<guid> key was dropped at the
helper edge and master never learned about the session.

Add a 30s ticker in serve_master that rescans the session-discovery
dirs and back-fills any sessions registry doesn't already know about.
Also add InMemoryRegistry::insert_if_absent for a race-free
check+insert under one lock, so the rescan and a concurrent hook
delivery can't double-insert.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@yeelam-gordon yeelam-gordon force-pushed the dev/yeelam/fix-wta-listen-respawn branch from 13fb665 to 655cb9b Compare June 4, 2026 01:18

2 participants