fix(core): press CUA keypress combinations as a single chord (#2266) #2298

Merged
seanmcguire12 merged 2 commits into main from evals/external-contrib-cua-keypress-yawbtng
Jul 1, 2026
seanmcguire12 (Member) commented Jul 1, 2026

thanks @yawbtng for the contribution here!!

## why

CUA `keypress` actions describe a single key **chord** (modifiers held
down while the main key is pressed), but
`V3CuaAgentHandler.executeAction` pressed each key in the array
**separately**. `page.keyPress(modifier)` presses and *releases* the
modifier, so by the time the main key was pressed the modifier was
already up.

The concrete failure: a `["Control", "A"]` keypress sends `Control` on
its own (a no-op) and then `A` through the plain typing path — so
instead of select-all, the agent **types a literal `a` into the focused
field**. Any select-all / copy / paste / cut / shortcut pattern silently
fails *and* corrupts input. Because the agent-replay cache recorded the
broken per-key sequence, replays reproduced the bug too.
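
The failure mode can be reproduced with a toy model. The sketch below is not Stagehand code: `FakeKeyboard`, `pressSeparately`, and `pressAsChord` are hypothetical names, and the model only assumes what the description above states, namely that a standalone modifier press is released before the next key arrives and that a lone printable key goes down a plain typing path.

```typescript
// Toy keyboard model (hypothetical, for illustration only).
const MODIFIERS = new Set(["Control", "Shift", "Alt", "Meta"]);

class FakeKeyboard {
  private held = new Set<string>();
  text = "hello"; // contents of the focused field
  selectedAll = false;

  down(key: string): void {
    this.held.add(key);
  }

  up(key: string): void {
    this.held.delete(key);
  }

  // Press and release a single key.
  press(key: string): void {
    if (MODIFIERS.has(key)) {
      this.down(key);
      this.up(key); // released immediately, so it cannot modify the next key
      return;
    }
    if (this.held.has("Control") && key.toLowerCase() === "a") {
      this.selectedAll = true; // Control+A shortcut fires
      return;
    }
    if (key.length === 1) {
      this.text += key.toLowerCase(); // plain typing path
    }
  }
}

// Old behavior: every key in the array is pressed and released on its own.
function pressSeparately(kb: FakeKeyboard, keys: string[]): void {
  for (const key of keys) kb.press(key);
}

// Chord behavior: hold all leading keys, press the last one, then release.
function pressAsChord(kb: FakeKeyboard, keys: string[]): void {
  if (keys.length === 0) return;
  const modifiers = keys.slice(0, -1);
  const main = keys[keys.length - 1];
  for (const m of modifiers) kb.down(m);
  kb.press(main);
  for (const m of [...modifiers].reverse()) kb.up(m);
}
```

In this model, `pressSeparately(kb, ["Control", "A"])` leaves `kb.text` as `"helloa"` with nothing selected, while `pressAsChord(kb, ["Control", "A"])` leaves the text untouched and sets `kb.selectedAll`.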

This is provider-dependent, based on the shape each client emits:

| Provider | emits for a combo | old behavior | status |
| --- | --- | --- | --- |
| OpenAI | `keys: ["CTRL", "A"]` | `Ctrl` then literal `a` | ❌ broken |
| Google (`key_combination`) | `.split("+")` → `["Control", "A"]` | `Ctrl` then literal `a` | ❌ broken |
| Microsoft (`fara-7b`) | `keys: string[]` (per-key) | `Ctrl` then literal `a` | ❌ broken |
| Anthropic | `keys: ["ctrl+s"]` (single `+`-joined string) | chorded correctly | ✅ unaffected |

Anthropic only worked by accident — it pre-joins with `+`, which
`page.keyPress` already chords internally.
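
A pre-joined string works because a `+`-delimited combination can be split back into modifiers and a final key. The following is a minimal sketch of such a split, not the actual `page.keyPress` internals: `splitCombo` is a hypothetical helper, and treating a trailing `++` as "separator followed by the literal `+` key" is an assumption about one reasonable way to handle that case.

```typescript
interface Chord {
  modifiers: string[]; // keys to hold down
  key: string; // final key pressed while the modifiers are held
}

// Hypothetical parser for a "+"-delimited combination string.
function splitCombo(combo: string): Chord {
  // A bare "+" is the literal plus key with no modifiers.
  if (combo === "+") return { modifiers: [], key: "+" };

  // "Ctrl++" means: hold Ctrl, press the literal "+" key.
  if (combo.endsWith("++")) {
    return { modifiers: combo.slice(0, -2).split("+"), key: "+" };
  }

  // General case: everything before the last "+" is held, the rest is pressed.
  const parts = combo.split("+");
  const key = parts.pop() ?? "";
  return { modifiers: parts, key };
}
```

Under these assumptions, `splitCombo("ctrl+s")` yields modifiers `["ctrl"]` and key `"s"`, and a single key such as `"Enter"` yields no modifiers at all.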

## what changed

`packages/core/lib/v3/handlers/v3CuaAgentHandler.ts` — in the `keypress`
case, map each key and **join into one `+`-delimited combination**, then
call `page.keyPress` once. `page.keyPress` already holds modifiers down
for the final key and already special-cases the literal `+` key, so
single keys, already-combined strings, and `Ctrl++`-style inputs all
stay correct. `mapKeyToPlaywright` is idempotent (`CTRL`/`Control` →
`Control`), so Google's pre-mapped arrays and Anthropic's combined
string are unchanged. The recorded replay step is now a single `press
Control+A` instead of the broken `press Control, press A`.
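
The shape of the change can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in `v3CuaAgentHandler.ts`: `KeyPressPage`, `KEY_ALIASES`, `mapKey`, `buildChord`, and `pressChord` are hypothetical names, and `mapKey` only stands in for `mapKeyToPlaywright`, whose real alias table is not shown in this PR.

```typescript
// Minimal stand-in for the page object; only the one method used here.
interface KeyPressPage {
  keyPress(combo: string): Promise<void>;
}

// Assumed alias table (illustrative, not the real mapping).
const KEY_ALIASES = new Map<string, string>([
  ["CTRL", "Control"],
  ["CONTROL", "Control"],
  ["ALT", "Alt"],
  ["SHIFT", "Shift"],
  ["CMD", "Meta"],
]);

// Idempotent: mapped names map to themselves, unknown keys pass through.
function mapKey(key: string): string {
  return KEY_ALIASES.get(key.toUpperCase()) ?? key;
}

// Build one "+"-delimited combination, or null when there is nothing to press.
function buildChord(keys: string | string[]): string | null {
  const list = Array.isArray(keys) ? keys : [keys];
  if (list.length === 0) return null;
  return list.map(mapKey).join("+");
}

// Press the whole combination once and return the step text to record.
async function pressChord(
  page: KeyPressPage,
  keys: string | string[],
): Promise<string | null> {
  const combo = buildChord(keys);
  if (combo === null) return null; // empty input: no keyPress call
  await page.keyPress(combo);
  return `press ${combo}`;
}
```

Keeping the combination building in a pure function makes the five cases in the test plan easy to check without a browser: `["CTRL", "A"]` and `["Control", "A"]` both become `"Control+A"`, while `["Enter"]` and `["ctrl+s"]` pass through unchanged.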

## test plan

New `packages/core/tests/unit/cua-keypress-chord.test.ts` (5 cases, all
passing):
- `["Control", "A"]` → single `keyPress("Control+A")`
- alias normalization: `["CTRL", "A"]` → `keyPress("Control+A")`
- single key `["Enter"]` → `keyPress("Enter")` (unchanged)
- already-combined `["ctrl+s"]` → `keyPress("ctrl+s")` (Anthropic shape,
unchanged)
- empty `[]` → no `keyPress` call

Existing CUA suites (`anthropic-cua-triple-click`, `openai-cua-client`,
`microsoft-cua-client`, `anthropic-cua-adaptive-thinking`) — 25 tests
still green.

---

Related: this is exactly the class of provider-specific CUA regression
that #2188 proposes catching with a deterministic bench task.


---
## Summary by cubic
Fixes CUA keypress combos by pressing them as one chord, not as separate
keys. Shortcuts like Ctrl+A now work across OpenAI, Google
`key_combination`, and Microsoft clients instead of typing letters.

- **Bug Fixes**
  - Map keys and join with "+" before one `page.keyPress` call; supports
    arrays, already-joined strings, and the literal "+" key.
  - Added unit tests for combos, alias normalization, single key,
    already-combined, and empty input.

Written for commit 4f921ee.

changeset-bot Bot commented Jul 1, 2026

🦋 Changeset detected

Latest commit: c966475

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 3 packages:

| Name | Type |
| --- | --- |
| @browserbasehq/stagehand | Patch |
| @browserbasehq/stagehand-evals | Patch |
| @browserbasehq/stagehand-server-v3 | Patch |

@cubic-dev-ai cubic-dev-ai Bot left a comment

No issues found across 3 files

Confidence score: 5/5

  • Automated review surfaced no issues in the provided summaries.
  • No files require special attention.
Architecture diagram
sequenceDiagram
    participant CUA as CUA Client
    participant Handler as V3CuaAgentHandler
    participant Replay as Agent Replay Cache
    participant Playwright as Playwright page
    
    Note over CUA,Playwright: Keypress Action Flow
    
    CUA->>Handler: executeAction({ type: "keypress", keys })
    
    alt keys is array (OpenAI, Google, Microsoft)
        alt keys.length > 0
            Handler->>Handler: map each key via mapKeyToPlaywright()
            Handler->>Handler: join() mapped keys with "+"
            Note over Handler: e.g., ["Control","A"] → "Control+A"
            Handler->>Playwright: page.keyPress("Control+A")
            Playwright-->>Handler: done
            opt recording enabled
                Handler->>Replay: recordCuaActStep("press Control+A")
            end
        else keys.length === 0
            Note over Handler: No-op, skip keyPress
        end
    else keys is single string (Anthropic)
        Handler->>Handler: wrap in array → ["ctrl+s"]
        Handler->>Handler: map via mapKeyToPlaywright() → ["ctrl+s"]
        Handler->>Handler: join() → "ctrl+s"
        Handler->>Playwright: page.keyPress("ctrl+s")
        Playwright-->>Handler: done
        opt recording enabled
            Handler->>Replay: recordCuaActStep("press ctrl+s")
        end
    end
    
    Handler-->>CUA: { success: true }

@seanmcguire12 seanmcguire12 merged commit 892701a into main Jul 1, 2026
237 checks passed
@github-actions github-actions Bot mentioned this pull request Jul 1, 2026