Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -19,6 +19,7 @@ This plugin includes the following skills (see `skills/` for details):
| [fetch](skills/fetch/SKILL.md) | Fetch HTML or JSON from static pages without a browser session — inspect status codes, headers, follow redirects |
| [search](skills/search/SKILL.md) | Search the web and return structured results (titles, URLs, metadata) without a browser session |
| [ui-test](skills/ui-test/SKILL.md) | AI-powered adversarial UI testing — analyzes git diffs to test changes, or explores the full app to find bugs |
| [ui-debugging](skills/ui-debugging/SKILL.md) | Fix user-reported UI bugs with exact browser checks — reproduce with a source-blind probe, patch, verify with a page-side check whose JSON failure feeds the next fix attempt |

## Installation

Expand Down
21 changes: 21 additions & 0 deletions skills/ui-debugging/LICENSE.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2026 Browserbase, Inc.

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
121 changes: 121 additions & 0 deletions skills/ui-debugging/SKILL.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,121 @@
---
name: ui-debugging
description: Debug and fix user-reported UI bugs using exact browser checks instead of repeated browsing. Use when a bug report describes user-visible misbehavior in a web app you have code access to - wrong state after a click, missing or overlapping elements, validation accepting bad input, stale async results, broken mobile layout, focus or accessibility failures. The workflow - reproduce in a browser with a source-blind probe, fix, then verify with an exact post-fix check whose JSON output names the failing element and the pass condition - measurably outperforms one-shot fixing (39% -> 67% fix rate on an 18-bug benchmark, with zero re-browsing during repair).
license: MIT
metadata:
author: browserbase
version: "0.1.0"
allowed-tools: Bash Read Edit Glob Grep
compatibility: "Works with any browser tool that can navigate and evaluate JS in the page (browse CLI, Stagehand/Playwright page.evaluate, chrome-devtools). Local apps: browse CLI local mode. Deployed/staging sites: browse --remote (Browserbase session)."
---

# UI Debugging — fix bugs with exact browser checks

Fix UI bugs by treating the browser as a **diagnostic instrument, not an explorer**. The core finding from benchmarking this workflow: when a fix attempt fails, more browsing barely helps — a *sharper assertion* does. One precise DOM check that names the exact wrong element recovers bugs that narrative re-investigation cannot.

## When to use

A user-visible bug in a web app whose code you can edit: state bugs (counter goes down instead of up, draft lost on tab switch), validation bugs (bad input accepted), async bugs (stale results, spinner never clears), auth/visibility bugs (protected UI still shown), layout bugs (overflow, hidden menus, overlap), a11y bugs (focus not moving into a dialog, missing accessible names).

For *finding* unknown bugs in changed UI, use the `ui-test` skill; this skill is for *fixing* a reported bug and proving the fix.

## The loop

```
1. REPRODUCE browser probe, source-blind -> evidence
2. FIX smallest credible code change
3. CHECK exact check: passed / measurements -> if passed, done
4. FEED BACK the check's JSON (not prose!) into the next fix attempt -> goto 2
```

With the `browse` CLI:

```bash
browse open http://127.0.0.1:5173/route --wait networkidle
browse viewport 375 700 # only for mobile-width bugs
browse eval '<check expression>'
```

For a deployed or staging site, add `--remote` to run in a Browserbase cloud session (you also get a session replay to attach to the bug report). Any tool with `page.evaluate` works the same way.

### Step 1 — Reproduce with a source-blind probe

Before reading any code, write a probe that follows the *user's* reproduction path using only what is visible on the page: query visible controls, set input values, dispatch events, click, wait, then return the before/after state. Do not let what you expect from the source bias what you measure.

```js
(async () => {
const sleep = (ms) => new Promise((r) => setTimeout(r, ms));
// For React apps, set inputs via the native setter so state actually updates:
const setValue = (el, value) => {
Object.getOwnPropertyDescriptor(HTMLInputElement.prototype, "value")?.set?.call(el, value);
el.dispatchEvent(new Event("input", { bubbles: true }));
};
const before = document.querySelector("output")?.textContent;
[...document.querySelectorAll("button")].find((b) => b.textContent === "+")?.click();
await sleep(100);
return { before, after: document.querySelector("output")?.textContent };
})()
```

### Step 2 — Fix

Read the code, make the smallest change that explains *all* of the probe evidence. Prefer 1–2 files. Don't refactor.

### Step 3 — Write an exact check (the part that wins)

Write the verification as a **check contract** — a page-side expression returning:

```js
{
passed: <boolean>, // the assertion, computed in the page
...measurements, // named evidence: elements, computed styles, rects, text
passCondition: "<one sentence: what passing means>",
instructionToFixer: "<when failing: exactly what must change>"
}
```

Three rules make checks effective:

1. **Name the exact element the assertion inspects.** Don't just assert "menu is visible" — return *which* element your selector matched, its `display`/`visibility`, its rect, and all other candidates. The classic failure: the selector matches a hidden desktop `<nav>` that appears before the mobile menu, so the fix targeted the wrong surface. The check that finally fixed it returned `querySelectorMatched: {dataProbe: "desktop-nav", display: "none"}, candidates: [...]` — instantly telling the fixer the *first match* was the problem.
2. **Measure, don't summarize.** `getComputedStyle`, `getBoundingClientRect`, `document.activeElement`, `scrollWidth > clientWidth`, regex over `document.body.innerText` — concrete numbers and booleans, not impressions.
3. **Compute `passed` inside the page.** The assertion runs where the truth lives; nothing is lost in translation.

Worked example — "focus doesn't move into the delete dialog":

```js
(async () => {
const sleep = (ms) => new Promise((r) => setTimeout(r, ms));
const describe = (el) => ({
tag: el?.tagName || "", role: el?.getAttribute?.("role") || "",
text: (el?.textContent || "").replace(/\s+/g, " ").trim().slice(0, 80)
});
document.querySelector('[role="menuitem"], button')?.click();
await sleep(120);
document.querySelector('button[data-variant="destructive"]')?.click();
await sleep(150);
const dialog = document.querySelector('[role="dialog"]');
const active = document.activeElement;
return {
passed: !!dialog && dialog.contains(active),
dialog: describe(dialog), active: describe(active),
passCondition: "after the delete confirmation opens, document.activeElement is inside the dialog",
instructionToFixer: "Focus a real button/input inside the dialog when it opens (autoFocus or a ref/effect on the dialog content), not the trigger behind the modal."
};
})()
```

### Step 4 — Feed the JSON back, not a summary

When the check fails after a fix, the next fix attempt gets the **raw check result** plus what the previous patch changed. Resist re-browsing for more narrative evidence — benchmark data shows the check payload is what converts failures, not fresh exploration. Re-run the *same* check after the next fix; never "eyeball" verification.

## Anti-patterns

- **Vague evidence**: "the mobile menu does not appear correctly" — true, but the fixer can patch the wrong nav surface. Name the element.
- **More browsing on failure**: a second narrative investigation mostly restates the first. Convert the failure into a sharper check instead.
- **Asserting outside the page**: extracting text and judging it in your head loses the measurements the next fix needs.
- **Trusting the click**: `el.click()` on React inputs without the native value setter silently does nothing; verify state actually changed in the probe output.
- **Declaring victory without re-running the check**: the check is the definition of done.

## Provenance

This workflow was extracted from ui-debug-bench, a benchmarked debugger→fixer harness: a browser-only debugger agent reproduces a planted bug, a code-only fixer patches it, and a deterministic check verifies. Measured on its 18-bug suite (6 realistic vibe-coded React apps; browse CLI + claude-opus-4-8): one-shot fixing solved 7/18; feeding the check's exact failure back into the fixer reached 11–12/18 — with zero re-browsing during repair. An earlier prototype showed the same ladder (6/18 one-shot, 8/18 with re-browse repair, 10/18 with exact probes). Sharper assertions beat more browsing.