wta(prompts): tighten autofix + terminal-agent classification by yeelam-gordon · Pull Request #154 · microsoft/intelligent-terminal

yeelam-gordon · 2026-05-31T13:02:26Z

What this fixes

Two prompt files used by WTA classify the user's intent before deciding how to respond. Two classifications were misrouting in ways that gave users a worse experience:

Bug 1 — Autofix: missing-package errors got an explanation instead of a fix

When a user ran a command and hit a "missing package" error such as:

ModuleNotFoundError: No module named 'requests'

or

Error: Cannot find module 'express'

…autofix would respond with an explanation of the error rather than a one-line fix (pip install requests, npm install express). The fix is unambiguous in these cases — the error itself tells you the package manager — so making the user read a paragraph is the wrong call.

Root cause: tools/wta/prompts/auto-fix.md told the model to route "tool not installed" cases to the explain action because system CLIs (psql, docker, gh, etc.) are genuinely ambiguous (apt vs brew vs winget vs scoop vs chocolatey). That blanket rule swept up language-level packages too.

Fix: Two sentence edits in auto-fix.md:

- The fix description now explicitly includes language-level packages when the package manager is unambiguous (ModuleNotFoundError → pip install, Cannot find module 'X' → npm install, Rust unresolved import → cargo add).
- The explain description narrows "tool not installed" to system CLIs where the install path is genuinely ambiguous.
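
The split between the two descriptions can be pictured as a small rule table. This is a minimal Python sketch for illustration only: in WTA the decision is made by the model following auto-fix.md, and `RULES`, `SYSTEM_CLIS`, and `classify_missing_dependency` are hypothetical names that do not appear in the PR.

```python
import re

# Hypothetical rule table: error pattern -> install command template.
# The three mappings are the ones named in the PR description.
RULES = [
    # Captures only the top-level module name ('a.b' -> 'a').
    (re.compile(r"ModuleNotFoundError: No module named '(\w+)"), "pip install {pkg}"),
    # Skips relative or absolute paths such as './local', which are not packages.
    (re.compile(r"Cannot find module '([^'./][^']*)'"), "npm install {pkg}"),
    (re.compile(r"unresolved import `(\w+)"), "cargo add {pkg}"),
]

# System CLIs the PR lists as ambiguous (apt vs brew vs winget vs scoop vs chocolatey).
SYSTEM_CLIS = {"psql", "docker", "gh"}


def classify_missing_dependency(error_text, missing_tool=None):
    """Return a fix/explain decision, or None if neither rule applies."""
    for pattern, template in RULES:
        match = pattern.search(error_text)
        if match:
            return {"action": "fix", "command": template.format(pkg=match.group(1))}
    if missing_tool in SYSTEM_CLIS:
        # Install path depends on the platform's package manager, so explain.
        return {"action": "explain", "command": None}
    return None
```

The sketch assumes the import name equals the installable package name. That holds for `requests` and `express` but not for every package, which is one reason the real decision is left to the model.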

Bug 2 — Terminal-agent chat: follow-up to a failed command got a generic chat reply

When the user ran a command that failed, then asked a short follow-up like "why?", "explain", or "help", the model often took the question as a generic chat prompt (Chat mode → prose answer) rather than a request to fix the command shown in the buffer (Mode A → run-this-command card).

Fix: Three edits in tools/wta/prompts/terminal-agent.md:

- Chat mode now requires the buffer to be clean: if the buffer has a recent error, even a bare "why?"/"explain"/"help" is treated as a follow-up about that error.
- Mode A description spells out that follow-ups to a failed command in the buffer always land in Mode A.
- A tiebreaker: if the model is about to emit prose followed by a powershell/bash fix-command code fence, stop and emit a Mode A card instead.
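
The clean-buffer rule and the tiebreaker can be expressed as two small checks. The following Python sketch is an illustration under stated assumptions, not WTA code: the error markers are a hypothetical stand-in for the model's reading of the buffer, and `route_follow_up` and `violates_tiebreaker` are invented names.

```python
import re

FENCE = "`" * 3  # built at runtime so the sketch contains no literal code fence

# Prose followed by a powershell/bash fence: the shape the tiebreaker rejects.
PROSE_THEN_FIX = re.compile(r"\S.*?" + FENCE + r"(powershell|bash)\b", re.DOTALL)

# Hypothetical error markers, for illustration only.
ERROR_MARKERS = re.compile(r"not recognized|command not found|error", re.IGNORECASE)


def route_follow_up(buffer: str) -> str:
    """Two-way Chat vs Mode A decision for a short follow-up such as 'why?'."""
    # Chat requires a clean buffer; otherwise the follow-up inherits the error.
    return "Mode A" if ERROR_MARKERS.search(buffer) else "Chat"


def violates_tiebreaker(draft: str) -> bool:
    """True if a draft reply is prose followed by a powershell/bash code fence."""
    return PROSE_THEN_FIX.search(draft) is not None
```

A draft that starts directly with a fence is not flagged, because the tiebreaker as described targets only prose that precedes a fix command.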

Evidence

Both edits were validated by an offline A/B harness that sends a set of failing-terminal scenarios to the live LLM under two prompt variants, baseline (the prompt before this PR) and the edited prompt, with multiple trials each, then tallies how often the model's output matches the expected action (fix vs explain for autofix; Chat vs Mode A for terminal-agent). The harness itself is kept out of this PR because it relies on a live model and per-developer API config.
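
The tallying loop described above has roughly the following shape. This is a hedged Python sketch, not the harness used for this PR: `run_ab` and the `ask_model` callable are hypothetical, and a real run would call a live model instead of a stub.

```python
from collections import Counter


def run_ab(scenarios, variants, ask_model, trials=3):
    """Tally pass counts per prompt variant.

    scenarios: list of (terminal_text, expected_action) pairs
    variants:  dict mapping variant name -> prompt text
    ask_model: callable (prompt, terminal_text) -> action string (assumed interface)
    Returns {variant_name: (passes, total_calls)}.
    """
    passes = Counter()
    total = Counter()
    for name, prompt in variants.items():
        for terminal_text, expected in scenarios:
            for _ in range(trials):
                total[name] += 1
                if ask_model(prompt, terminal_text) == expected:
                    passes[name] += 1
    return {name: (passes[name], total[name]) for name in variants}
```

With 12 scenarios and 3 trials this loop makes 36 calls per variant, which matches the call counts reported below.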

Autofix — 12 scenarios × 3 trials × 2 model backends (36 calls per variant per backend)

| Backend | Baseline | After fix |
|---|---|---|
| Qwen (Qwen Code CLI) | 31/36 (86.1%) | 35/36 (97.2%) |
| Copilot (OpenAI-compatible API) | 35/36 (97.2%) | 35/36 (97.2%) |

The aggregate understates the change — only two scenarios were broken, and they were the ones targeted:

| Scenario | Qwen baseline | Qwen after | Copilot baseline | Copilot after |
|---|---|---|---|---|
| `ModuleNotFoundError: requests` | 1/3 ✗ | 3/3 ✓ | 3/3 | 3/3 |
| `Cannot find module 'express'` | 0/3 ✗✗✗ | 3/3 ✓ | 2/3 ✗ | 3/3 ✓ |

The remaining 1/36 miss on the "after" side is the model occasionally forgetting the json fence — a parser flake, not a classification regression.
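
One way a scorer could tolerate a missing json fence is to accept either a fenced or a bare JSON object. This Python sketch is an assumption about how such a parser might look, not the harness's actual parser; `parse_action` is a hypothetical name, and the `action` field is taken from the JSON example quoted later in this PR.

```python
import json
import re

FENCE = "`" * 3  # built at runtime so the sketch contains no literal code fence
FENCED_JSON = re.compile(FENCE + r"(?:json)?\s*(\{.*?\})\s*" + FENCE, re.DOTALL)


def parse_action(response: str):
    """Return the 'action' field from fenced or bare JSON, or None if unparseable."""
    match = FENCED_JSON.search(response)
    candidate = match.group(1) if match else response.strip()
    try:
        obj = json.loads(candidate)
    except json.JSONDecodeError:
        return None
    return obj.get("action") if isinstance(obj, dict) else None
```

Whether to be this lenient is a design choice: a strict parser surfaces format drift in the model's output, while a lenient one keeps the tally focused on classification.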

Terminal-agent

Same harness pattern: scenarios where the user runs a typo-d command (gti status, pythn --version, npm run buld, etc.) and then asks "why?" / "help" / "?" / "fix?". Baseline frequently routed these to Chat (generic prose answer); after the edits they consistently route to Mode A (a card with the corrected command).

Files

tools/wta/prompts/auto-fix.md (+2 / −2)
tools/wta/prompts/terminal-agent.md (+3 / −1)

Why this is a separate PR from #123

These two prompt files travelled together with the custom-agent-save settings fix on the original branch by accident. They have nothing to do with custom-agent-save, so they're extracted here as a standalone change. PR #123 has been force-pushed to drop the three commits that landed here.

Copilot

Pull request overview

This PR tightens WTA prompt routing to reduce misclassification between “auto-fix” vs “explain” and “Chat” vs “Mode A”, and adds a PowerShell A/B harness (plus recorded CSVs) to reproduce and validate those prompt changes.

Changes:

Update auto-fix.md to treat unambiguous language-level missing packages as fix, while keeping ambiguous system CLI installs as explain.
Update terminal-agent.md to route follow-ups after a failed command (as seen in buffer) to Mode A, and add a tie-breaker discouraging prose+fix-command fences.
Add prompt evaluation harness scripts + checked-in run summaries under tools/wta/prompts/tests/.

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 7 comments.

| File | Description |
|---|---|
| tools/wta/prompts/auto-fix.md | Refines fix vs explain decision criteria for missing packages vs system tools. |
| tools/wta/prompts/terminal-agent.md | Tightens Chat eligibility based on runtime buffer errors; adds Mode A follow-up and tie-breaker guidance. |
| tools/wta/prompts/tests/runner-autofix.ps1 | Adds A/B harness for `auto-fix.md` with scenario parsing/scoring. |
| tools/wta/prompts/tests/runner-terminal-agent.ps1 | Adds A/B harness for Chat vs Mode A classification using runtime buffer scenarios. |
| tools/wta/prompts/tests/runner-terminal-agent-copilot-cli.ps1 | Adds Copilot CLI-driven harness for a reduced scenario set. |
| tools/wta/prompts/tests/README.md | Documents harness layout, requirements, usage, and references to checked-in results. |
| tools/wta/prompts/tests/results/autofix-min-qwen.csv | Recorded Qwen track results for autofix A/B. |
| tools/wta/prompts/tests/results/autofix-min-copilot.csv | Recorded Copilot track results for autofix A/B. |
| tools/wta/prompts/tests/results/terminal-agent-min-qwen.csv | Recorded Qwen track results for terminal-agent A/B. |
| tools/wta/prompts/tests/results/terminal-agent-min-copilot.csv | Recorded Copilot track results for terminal-agent A/B. |
| tools/wta/prompts/tests/results/terminal-agent-min-copilot-cli.csv | Recorded Copilot CLI track results for terminal-agent A/B. |

+function Build-Variant {
+    param([string]$variant)
+    switch ($variant) {
+        'baseline' { return $basePrompt }
+        'MIN2'     { return $minPrompt }
+        default    { throw "Unknown variant: $variant" }
+    }


+# Robust A/B harness for the Terminal Agent prompt against qwen-code default system prompt.
+# Variants:
+#   baseline   = current prompt (line 9 unchanged)
+#   VC         = baseline + 15-word chat-mode qualifier on line 9
+#   PRE        = baseline + Step 0 binary gate inserted BEFORE the numbered modes
+#   PRE+VC     = both (belt and suspenders)


+$ErrorActionPreference = 'Stop'
+$root = $PSScriptRoot
+if (-not $root) { $root = Join-Path $PSScriptRoot 'results' }
+


+function Invoke-CopilotCli {
+    param([string]$systemAndUser)
+    $tmp = New-TemporaryFile
+    Set-Content -Path $tmp -Value $systemAndUser -Encoding UTF8 -NoNewline
+    try {
+        # Pipe prompt via stdin would be cleaner but -p reads arg; use file-based via @  not supported.
+        # Use -p with the raw content; PowerShell will pass as single arg.
+        $out = & copilot -p $systemAndUser --allow-all-tools 2>&1 | Out-String
+        return $out
+    } finally { Remove-Item $tmp -ErrorAction SilentlyContinue }
+}


+Each runner sends the same scenarios to the live LLM under two prompt
+variants (`baseline` = pre-fix, `MIN` = post-fix) across multiple
+trials, parses the response, and tallies pass/fail. This is the
+evidence trail for the prompt edits — re-run any time the prompts
+change to catch regressions.
+
+## Layout
+
+| File | Purpose |
+|---|---|
+| `runner-autofix.ps1` | A/B harness for `auto-fix.md` (Qwen + Copilot OpenAI-compatible API). 12 scenarios (F1–F8 expect `fix`, E1–E4 expect `explain`). |
+| `runner-terminal-agent.ps1` | A/B harness for `terminal-agent.md` (Qwen + Copilot OpenAI-compatible API). Chat-vs-Mode-A classification scenarios. |
+| `runner-terminal-agent-copilot-cli.ps1` | Same scenarios as `runner-terminal-agent.ps1` but driven through the real `copilot -p` CLI (closer to production wta path). |
+| `results/` | CSV summaries from the runs that justified the prompt edits in this PR. |
+
+## Requirements
+
+- Windows PowerShell 5+ or PowerShell 7+
+- A Qwen Code CLI config at `~/.qwen/settings.json` containing the
+  OpenAI-compatible endpoint, API key env var, and model id
+  (`modelProviders.openai[0].{baseUrl,envKey,id}`).
+- For the Qwen track only: a `qwen-default-sys.txt` next to the runner
+  containing the Qwen CLI's default system prompt (so the harness
+  mirrors production exactly). If missing, pass `-Copilot` to skip the
+  Qwen track.
+- For `runner-terminal-agent-copilot-cli.ps1`: the `copilot` CLI on PATH.
+
+## Usage
+
+```powershell
+cd tools/wta/prompts/tests
+
+# 3-trial A/B on auto-fix.md, Qwen track:
+.\runner-autofix.ps1 -Trials 3 -Variants @('baseline','MIN') -OutSuffix '-qwen'
+
+# Same, Copilot track (no qwen-default-sys.txt needed):
+.\runner-autofix.ps1 -Trials 3 -Variants @('baseline','MIN') -Copilot -OutSuffix '-copilot'
+
+# terminal-agent.md, both variants:
+.\runner-terminal-agent.ps1 -Trials 3 -Variants @('baseline','MIN') -OutSuffix '-qwen'
+.\runner-terminal-agent.ps1 -Trials 3 -Variants @('baseline','MIN') -Copilot -OutSuffix '-copilot'
+```
+
+Outputs land in this folder (or `-OutFile` if you set it):
+`{autofix,results}-summary{$OutSuffix}.csv` (per-trial pass/fail) and
+`{autofix,results}-full{$OutSuffix}.json` (full prompts + raw model
+responses for debugging).


github-advanced-security

check-spelling found more than 20 potential problems in the proposed changes. Check the Files changed tab for more details.

Copilot

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

 Read the runtime context (cwd, profile, activeTarget, buffer, supported delegate agents) and the user's input. Then walk this decision tree top-to-bottom and stop at the FIRST match:

-1. **Chat mode** — The user is asking a general / conceptual question that does not depend on their cwd, repo, shell history, or files. Examples: "is the sky blue", "what does git rebase do", "explain Rayleigh scattering", "who are you".
+1. **Chat mode** — The user is asking a general / conceptual question that does not depend on their cwd, repo, shell history, or files, AND the runtime `buffer` shows no recent error / failed command. If the buffer shows an error, the request is never Chat — it inherits that error as context: route information-seeking words ("why?", "what does this mean", "explain") to **Mode A** (explain the error), and action-seeking words ("help", "fix it", "make it work") to **Mode B** (run a command). Chat examples (no buffer error): "is the sky blue", "what does git rebase do", "explain Rayleigh scattering", "who are you".


 ### `fix` — one deterministic command resolves it

-Use when you can write a single shell command (including in-place file edits) that fixes the error with certainty: typos, wrong flags, made-up commands with obvious intent (`listdir` → shell-native equivalent), source edits the compiler pinpoints, single-file renames, missing imports.
+**This is the strong default. Pick `fix` whenever a single shell command can plausibly resolve what the user was trying to do** — typos, wrong flags, made-up commands with obvious intent (`listdir` → shell-native equivalent), source edits the compiler pinpoints, single-file renames, missing imports, missing language-level packages where the package manager is unambiguous from the project (`ModuleNotFoundError` → `pip install`, `Cannot find module 'X'` → `npm install`, `unresolved import` in Rust → `cargo add`), bare words that look like a non-existent command but match an idiomatic one in this shell (`datetime` in PowerShell → `Get-Date`; `ll` on Windows PowerShell → `Get-ChildItem`).
+
+If multiple shell commands are plausible interpretations, **commit to the single most likely one** for the current shell and mention the alternative in `rationale` ("Did you mean X? — Y is also possible.") rather than escalating to `explain`. The user can dismiss the suggestion if it's wrong; an unhelpful "intent is unclear" essay is worse than a best-guess fix.


Copilot

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

+1. **Chat mode** — The user is seeking information or asking a question. They want an answer, not an action. Answer in prose. A recent error in the buffer is fine context to draw on when explaining; don't try to fix it unless the user explicitly asked for action.
   → Answer in prose. No tool calls. No JSON.


 ### `fix` — one deterministic command resolves it

-Use when you can write a single shell command (including in-place file edits) that fixes the error with certainty: typos, wrong flags, made-up commands with obvious intent (`listdir` → shell-native equivalent), source edits the compiler pinpoints, single-file renames, missing imports.
+**The strong default.** Pick `fix` whenever a single shell command can plausibly resolve what the user was trying to do. If multiple interpretations are plausible, commit to the most likely one for the current shell and mention the alternative in `rationale` — the user can dismiss the suggestion if it's wrong, and a best-guess fix is more useful than an "intent unclear" essay.



Copilot

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

yeelam-gordon · 2026-06-01T12:50:19Z

+1. **Chat mode** — The user is seeking information or asking a question. They want an answer, not an action. Answer in prose. A recent error in the buffer is fine context to draw on when explaining; don't try to fix it unless the user explicitly asked for action.
   → Answer in prose. No tool calls. No JSON.


Should we distinguish between the user asking for an action and asking for information? A request for action should still trigger Mode A or Mode B explicitly.

Applied in 1d2f88d. Reframed both Chat and Mode A around the info-vs-action distinction: Chat = wants prose information and is not asking you to do anything, suggest a command, or address an error; Mode A = asking for an action — a recommended command, an operation on the system, or a fix for an error visible in the buffer. Tested: copilot 48/48, qwen 45/48 (only bare-word help still flips to Chat on qwen — defensible, that one is genuinely ambiguous).


yeelam-gordon · 2026-06-01T12:47:30Z

-```json
-{"action": "fix", "title": "Use println! instead of printf!", "command": "(Get-Content src\\main.rs) -replace 'printf!', 'println!' | Set-Content src\\main.rs", "rationale": "Rust uses println!; compiler suggested the same."}
-```
-


I don't see that you restored this?

Restored in 1d2f88d — missed it when I reverted, sorry. Back to matching main.

`auto-fix.md`
- `fix` desc: add missing language-level packages where the package manager is unambiguous (`ModuleNotFoundError` -> `pip install`, `Cannot find module 'X'` -> `npm install`, Rust `unresolved import` -> `cargo add`).
- `explain` desc: narrow "tool not installed" to *system* CLIs where the install path is ambiguous (`psql` / `docker` / `gh`).

`terminal-agent.md`
- Chat-mode line: a non-empty buffer with an error disqualifies Chat. Even a bare "why?" / "explain" / "help" inherits that error as context and routes to Mode A or B.
- Mode A description: explicit "follow-up to a failed command in buffer always lands in Mode A — user wants the fix command, not prose."
- Tiebreaker: if you would emit prose followed by a code fence with a fix command, stop and emit a Mode A card instead.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

github-actions · 2026-06-01T13:27:41Z

@check-spelling-bot Report

⚠️ Dictionary not found

Problems were encountered retrieving check dictionaries (cspell:ada/dict/ada.txt cspell:software-terms/dict/softwareTerms.txt cspell:cpp/src/lang-jargon.txt cspell:public-licenses/src/generated/public-licenses.txt cspell:php/dict/php.txt cspell:java/src/java.txt cspell:node/dict/node.txt cspell:k8s/dict/k8s.txt cspell:public-licenses/src/additional-licenses.txt cspell:cpp/src/stdlib-cpp.txt cspell:r/src/r.txt cspell:dart/src/dart.txt cspell:shell/dict/shell-all-words.txt cspell:cpp/src/stdlib-c.txt cspell:python/src/additional_words.txt cspell:sql/src/sql.txt cspell:elixir/dict/elixir.txt cspell:gaming-terms/dict/gaming-terms.txt cspell:python/src/common/extra.txt cspell:typescript/dict/typescript.txt cspell:powershell/dict/powershell.txt cspell:cpp/src/stdlib-cmath.txt cspell:npm/dict/npm.txt cspell:cpp/src/compiler-clang-attributes.txt cspell:python/src/python/python-lib.txt cspell:svelte/dict/svelte.txt cspell:redis/dict/redis.txt cspell:cpp/src/template-strings.txt cspell:dotnet/dict/dotnet.txt cspell:scala/dict/scala.txt cspell:swift/src/swift.txt cspell:cpp/src/ecosystem.txt cspell:golang/dict/go.txt cspell:cpp/src/compiler-gcc.txt cspell:software-terms/dict/webServices.txt cspell:cpp/src/compiler-msvc.txt cspell:java/src/java-terms.txt cspell:cpp/src/lang-keywords.txt cspell:python/src/python/python.txt cspell:css/dict/css.txt cspell:cpp/src/stdlib-cerrno.txt cspell:clojure/src/clojure.txt cspell:sql/src/tsql.txt cspell:latex/dict/latex.txt cspell:monkeyc/src/monkeyc_keywords.txt cspell:haskell/dict/haskell.txt cspell:django/dict/django.txt cspell:fullstack/dict/fullstack.txt cspell:html/dict/html.txt cspell:ruby/dict/ruby.txt cspell:cpp/src/people.txt cspell:rust/dict/rust.txt cspell:lua/dict/lua.txt cspell:docker/src/docker-words.txt).

⚠️ For more information, see check-dictionary-not-found.

🔴 Please review

See the 📂 files view, the 📜action log, 👼 SARIF report, or 📝 job summary for details.

Unrecognized words (1)

wcsnicmp

These words are not needed and should be removed

Backgrounder Ccc cplusplus ctl Debian dotnet drv endptr EOFs evt Fullwidth gitlab hdr idl IME inbox intelligentterminal Ioctl KVM lbl lld lsb NONINFRINGEMENT notif oss outdir Podcast pri prioritization PSobject rcv segfault Signtool sourced SWP Tbl testname transitioning unk unparseable unregisters Virt VMs VTE webpage websites WTCLI xsi

To accept these unrecognized words as correct and remove the previously acknowledged and now absent words, you could run the following commands

... in a clone of the git@github.com:microsoft/intelligent-terminal.git repository
on the dev/yeelam/wta-prompt-improvements branch (ℹ️ how do I use this?):

curl -s -S -L 'https://raw.githubusercontent.com/check-spelling/check-spelling/cfb6f7e75bbfc89c71eaa30366d0c166f1bd9c8c/apply.pl' |
perl - 'https://github.com/microsoft/intelligent-terminal/actions/runs/26757481883/attempts/1' &&
git commit -m 'Update check-spelling metadata'

Available 📚 dictionaries could cover words (expected and unrecognized) not in the 📘 dictionary

This includes both expected items (2064) from .github/actions/spelling/expect/alphabet.txt .github/actions/spelling/expect/expect.txt .github/actions/spelling/expect/web.txt and unrecognized words (1)

| Dictionary | Entries | Covers | Uniquely |
|---|---|---|---|
| cspell:csharp/csharp.txt | 32 | 2 | 2 |
| cspell:aws/aws.txt | 232 | 2 | 2 |
| cspell:fonts/fonts.txt | 536 | 1 | 1 |

Consider adding to the extra_dictionaries array (in the .github/actions/spelling/config.json file):

    "cspell:csharp/csharp.txt",
    "cspell:aws/aws.txt",
    "cspell:fonts/fonts.txt",

To stop checking additional dictionaries, put (in the .github/actions/spelling/config.json file):

"check_extra_dictionaries": []

Warnings ⚠️ (1)

See the 📂 files view, the 📜action log, 👼 SARIF report, or 📝 job summary for details.

| ⚠️ Warnings | Count |
|---|---|
| ⚠️ check-dictionary-not-found | 54 |

See ⚠️ Event descriptions for more information.


yeelam-gordon force-pushed the dev/yeelam/wta-prompt-improvements branch 2 times, most recently from 5a38d83 to d916600 · June 1, 2026 09:28

yeelam-gordon force-pushed the dev/yeelam/wta-prompt-improvements branch from d916600 to cafe581 · June 1, 2026 09:51

yeelam-gordon force-pushed the dev/yeelam/wta-prompt-improvements branch from cafe581 to 8808b96 · June 1, 2026 09:59

yeelam-gordon force-pushed the dev/yeelam/wta-prompt-improvements branch from 8808b96 to ca20f9f · June 1, 2026 10:19

yeelam-gordon force-pushed the dev/yeelam/wta-prompt-improvements branch from ca20f9f to d7c4cf5 · June 1, 2026 10:21

yeelam-gordon force-pushed the dev/yeelam/wta-prompt-improvements branch from d7c4cf5 to 1d2f88d · June 1, 2026 13:20
