browser-record: turn a recorded human browser flow into a reusable, intent-level task skill by shubh24 · Pull Request #141 · browserbase/skills

shubh24 · 2026-06-27T06:54:23Z

What we're building

Turn a one-time human browser demonstration into a durable, reusable, parameterized agent skill.

A human performs a flow once in a live cloud browser; out the other end comes a skills/<task>/SKILL.md that any agent can invoke — parameterized, self-verifying, and resilient to the page changing underneath it. "Show, don't prompt."

The core thesis: replay intent, not mechanics

A naive recording captures mechanics — "typed n-e-w-y-o, clicked #c307, clicked li[1]". Those rot instantly: dynamic ids regenerate per page load, deep DOM paths drift, and keystrokes aren't the point. We started with a deterministic selector-replay engine (with a healing ladder) and hit the wall you'd expect — it could be made to work, but it was the wrong abstraction.

What you actually want is intent — "destination = New York". And recovering intent is a judgment call:

The committed value lives in the outcome (the chosen suggestion's accessible name), not the input.
A human who typed San Francisco, erased it, and chose Los Angeles meant Los Angeles — the recorder must drop the correction.
A filter applied then removed is net-zero — drop it entirely.
The values the human supplied (cities, dates) are parameters, not constants.

None of that can be reduced to deterministic rules. So the distiller is an agent, not a script: the same shape as the autobrowse teacher loop, re-seeded on a human's trace instead of an agent's own run. (Intended to merge with autobrowse later.)
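For contrast, here is a hypothetical pair of records for the New York example: the mechanics a naive recorder keeps, and the kind of intent step the distiller aims to write. The field names (`goal`, `slot`, `hint`) and the `{param}` placeholder syntax are illustrative assumptions, not the PR's schema. `bindIntent` only shows that a human-supplied value becomes a parameter filled in at invocation time; deciding *which* values are parameters remains the distiller's judgment, not something this code does.

```javascript
// Hypothetical shapes for illustration only; not the browser-record schema.

// What a naive recorder keeps: mechanics tied to one particular page load.
const mechanics = [
  { action: 'type', selector: '#c307', text: 'newyo' },
  { action: 'click', selector: 'li[1]' },
];

// What the distiller aims for: the committed outcome, with the human's
// value lifted into a named parameter.
const intent = {
  goal: 'Set the destination',
  slot: 'destination',
  hint: { role: 'option', name: '{destination}' }, // recorded target, a hint only
};

// Fill {placeholders} from the caller's parameters; fail loudly if one is missing.
function bindIntent(step, params) {
  const fill = (s) =>
    s.replace(/\{(\w+)\}/g, (_, key) => {
      if (!(key in params)) throw new Error(`missing parameter: ${key}`);
      return String(params[key]);
    });
  return { ...step, hint: { ...step.hint, name: fill(step.hint.name) } };
}

console.log(bindIntent(intent, { destination: 'New York' }).hint.name); // → New York
```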

How it works: capture wide, reason narrow

record (interaction stream + per-step screenshots)        ← semantic spine
  + browser-trace (CDP firehose: network/console/DOM)     ← full observability
  → distill = teacher agent reconstructs INTENT           ← collapses corrections,
  → skills/<task>/ (SKILL.md + screenshots/ + recording)    drops abandoned actions

Capture — record.mjs injects a listener (inject.js) that records each click and typed value along with the acted-on element's accessible name and role, plus a screenshot per step. Name and role are captured for every element rather than only for certain tags; this is the fix that captures an autocomplete suggestion's name even when its only selector is a dynamic id. RR_CONNECT_URL lets the recorder attach to a browser-trace keep-alive session, so the firehose and the interaction stream observe the same session.
Distill — an agent (per references/distill.md) reads the stream + screenshots, queries the bisected trace on demand (progressive disclosure, not firehose-in-prompt), and writes the smallest set of intents that explains the session.
Replay — just invoke the generated skill. The agent realizes each intent via browse, uses the per-step screenshots as the visual oracle, and verifies committed values.
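The Capture step can be sketched roughly as follows: an ungated listener that logs every click and change with a role and an approximate accessible name, whatever the element's tag. This is an assumption, not the PR's inject.js. `approxAccessibleName` is a deliberately simplified stand-in (aria-label, then aria-labelledby, then text content) for the full accessible-name computation, the implicit-role table is tiny, and only the `window.__rr_events` buffer name is borrowed from later in this thread.

```javascript
// Simplified sketch of an ungated capture listener (an assumption, not the
// PR's inject.js). Every click/change is logged with a role and an
// approximate accessible name regardless of tag, so an option whose only
// selector is a dynamic id (e.g. <li id="c307">New York</li>) is still
// recorded by name. Value capture is omitted here.

// Tiny implicit-role table; ignores input types, nesting, and most tags.
const IMPLICIT_ROLES = { A: 'link', BUTTON: 'button', INPUT: 'textbox', SELECT: 'combobox', LI: 'listitem' };

function approxRole(el) {
  return el.getAttribute('role') || IMPLICIT_ROLES[el.tagName] || 'generic';
}

// Rough approximation: aria-label, then aria-labelledby, then text content.
// The real accessible-name computation has many more steps.
function approxAccessibleName(el, doc) {
  const label = el.getAttribute('aria-label');
  if (label) return label.trim();
  const ids = el.getAttribute('aria-labelledby');
  if (ids && doc) {
    const text = ids
      .split(/\s+/)
      .map((id) => doc.getElementById(id))
      .filter(Boolean)
      .map((node) => node.textContent.trim())
      .join(' ');
    if (text) return text;
  }
  return (el.textContent || '').replace(/\s+/g, ' ').trim().slice(0, 120);
}

function toEvent(type, el, doc) {
  return {
    type,
    role: approxRole(el),
    name: approxAccessibleName(el, doc),
    id: el.id || null, // weak hint only; may be regenerated on every load
    t: Date.now(),
  };
}

// In the page: buffer events so the recorder can drain them later.
function install(win) {
  win.__rr_events = win.__rr_events || [];
  for (const type of ['click', 'change']) {
    win.document.addEventListener(
      type,
      (e) => {
        if (e.target && e.target.getAttribute) {
          win.__rr_events.push(toEvent(type, e.target, win.document));
        }
      },
      true // capture phase: runs before handlers on the target itself
    );
  }
}
```

The design point is that the name is recorded even when the id is useless, so the distiller sees "option: New York" rather than just `#c307`.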

The generated task skill bundles a curated screenshots/ folder (referenced per step) and names each step's recorded target as a hint, while leaving the agent free to use whatever live element achieves the intent.
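A single step in a generated SKILL.md might pair the intent, its recorded target, and its screenshot along these lines. The markdown layout, the `renderStep` helper, and the `screenshots/NN.png` naming are assumptions for illustration; the PR's actual task-skill shape is defined in its own SKILL.md.

```javascript
// Hypothetical renderer for one step of a generated task skill.
// The recorded target is framed as a hint; the screenshot is the oracle.
function renderStep(n, step) {
  const shot = `screenshots/${String(n).padStart(2, '0')}.png`;
  return [
    `${n}. ${step.goal}`,
    `   - Recorded target (hint only): ${step.hint.role} "${step.hint.name}"`,
    `   - Use any live element that achieves this; the hint may be stale.`,
    `   - Expected state: see ${shot}`,
    step.verify ? `   - Verify: ${step.verify}` : null,
  ]
    .filter(Boolean)
    .join('\n');
}

console.log(
  renderStep(2, {
    goal: 'Set the destination to {destination}',
    hint: { role: 'option', name: 'San Francisco International Airport' },
    verify: 'destination field shows {destination}',
  })
);
```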

Why Browserbase (not local Chrome)

Capture works over plain CDP, so it runs locally, but the loop is materially better on Browserbase:
The live-view URL makes the human demo remote and shareable.
Server-side observability (session recording, proxy network, downloads, logs via bb-finalize) gives the teacher agent far more to reason over.
Clean, isolated sessions avoid recording your local profile, cookies, and extensions.
Recording and replaying in the same environment keeps record ≈ replay.

What's in this PR

skills/browser-record/ — capture scripts (record.mjs, inject.js), the teacher-agent distill procedure + prompt (references/distill.md), the task-skill shape (SKILL.md), evals, package.json, LICENSE.
Pairs with the existing browser-trace skill for the firehose.
Deterministic replay engine removed — replay is invoking the generated skill.
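On the browser-trace side, the distiller's on-demand queries (progressive disclosure rather than firehose-in-prompt) might look roughly like this. It assumes a trace already flattened into an array of events sorted by a millisecond `ts` field, each with a `type`; that shape, the ±2 s default window, the result cap, and the `traceAround` name are all hypothetical, not browser-trace's real output format.

```javascript
// Hypothetical helper: return only the trace events near one recorded step,
// optionally filtered by type, capped so a single query can't flood the prompt.
// Assumes `trace` is sorted ascending by `ts`.
function traceAround(trace, stepTs, { windowMs = 2000, types = null, limit = 50 } = {}) {
  const lo = stepTs - windowMs;
  const hi = stepTs + windowMs;
  // Binary search for the first event at or after `lo`.
  let left = 0;
  let right = trace.length;
  while (left < right) {
    const mid = (left + right) >> 1;
    if (trace[mid].ts < lo) left = mid + 1;
    else right = mid;
  }
  // Scan forward until the window closes or the cap is hit.
  const out = [];
  for (let i = left; i < trace.length && trace[i].ts <= hi && out.length < limit; i++) {
    if (!types || types.includes(trace[i].type)) out.push(trace[i]);
  }
  return out;
}
```

The agent asks for, say, network events around the step where a filter was applied, instead of reading every request in the session.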

Status / validation

Draft. node scripts/validate-skills.mjs --skill browser-record → passes (0 errors/warnings).
Demonstrated end-to-end on Google Flights: enriched capture recorded the suggestion name "San Francisco International Airport" (the old tag-gated capture dropped it); the teacher agent distilled a one-way SAN→SFO/Jul-3 flow into a parameterized google-flights-search skill, collapsing the nameless filter-UI noise. That generated skill is kept local for now, not in this PR.
Pre-existing validator failures in cookie-sync / pitch-prep are unrelated and untouched.

🤖 Generated with Claude Code

Record a human browser flow on a Browserbase cloud session and replay it deterministically through the browse CLI, with accessibility-snapshot selector healing.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Shift from deterministic selector-replay to "capture wide, reason narrow":
- inject.js: capture each step's accessible name + role (ungated), so an autocomplete suggestion ("New York") is recorded even when its only selector is a dynamic id; this is the intent signal.
- record.mjs: per-step screenshots (intent evidence + replay oracle) and an RR_CONNECT_URL attach mode so the recorder can join a browser-trace keep-alive session and share the full CDP firehose.
- Distillation is now an agent, not a script (removed distill.mjs). The teacher agent reads the interaction stream + screenshots + trace and reconstructs intent (collapsing self-corrections, dropping abandoned actions, parameterizing inputs), then authors a task skill. See references/distill.md.
- SKILL.md rewritten around record -> trace -> distill -> task skill; deterministic replay.mjs demoted to an optional CI fast path.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

- Rename the skill to browser-record (it's a recorder that emits a task skill; "replay" is now just invoking that skill).
- Remove replay.mjs: replay is agentic (invoke the generated skill), so the deterministic engine is no longer part of the product.
- Generated task skills now bundle a curated screenshots/ folder (the visual oracle) referenced per step, and each step names its recorded target (accessible name/role) as a hint while granting the agent agency to use whatever live element achieves the intent.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

shrey150 · 2026-06-27T20:48:30Z

Played with this end-to-end — recorded a couple of real flows and distilled them into skills. Two learnings worth baking in:

1. Default the capture step to local headed Chrome, not a cloud session

I tried swapping the recorder to launch a local headed Chrome — chromium.launchPersistentContext(userDataDir, { headless: false, channel: 'chrome' }) — instead of creating a Browserbase session + connectOverCDP. The capture engine (inject.js → window.__rr_events → drain + per-step screenshots) is browser-agnostic, so only the connection setup changes. Local won out as the better default for the record step because:

It uses the human's real persistent profile, so logins survive between recordings. Big for auth'd flows — with a warm profile the SSO/2FA step is often already satisfied and you land straight on the destination.
No API key, no session cost, no cloud round-trip latency.
You interact with a real window directly — no live-view URL hand-off needed.

Keeping RR_CONNECT_URL as an optional attach path preserves the browser-trace firehose pairing (and Browserbase itself, when you want the shareable live view / clean isolated session). Suggestion: make local headed the default, cloud/attach opt-in.
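A minimal sketch of that default, assuming Playwright. `chromium.launchPersistentContext` and `chromium.connectOverCDP` are real Playwright APIs (and `channel: 'chrome'` requires a locally installed Google Chrome), but the `RR_PROFILE_DIR` variable, the `.rr-profile` fallback directory, and the `pickConnection`/`openRecordingContext` names are hypothetical, not what record.mjs does today.

```javascript
// Sketch: local headed Chrome by default, attach over CDP only when asked.
// RR_CONNECT_URL comes from the PR; RR_PROFILE_DIR is a hypothetical knob.
function pickConnection(env) {
  if (env.RR_CONNECT_URL) return { mode: 'attach', endpoint: env.RR_CONNECT_URL };
  return { mode: 'local', userDataDir: env.RR_PROFILE_DIR || '.rr-profile' };
}

async function openRecordingContext(env = process.env) {
  // Loaded lazily so the connection choice can be tested without Playwright.
  const { chromium } = await import('playwright');
  const conn = pickConnection(env);
  if (conn.mode === 'attach') {
    // Join an existing session (e.g. a browser-trace keep-alive) so the
    // CDP firehose and the interaction stream observe the same browser.
    const browser = await chromium.connectOverCDP(conn.endpoint);
    return browser.contexts()[0] ?? (await browser.newContext());
  }
  // Persistent profile directory: logins survive between recordings.
  return chromium.launchPersistentContext(conn.userDataDir, {
    headless: false,
    channel: 'chrome',
  });
}
```

Keeping the decision in a pure `pickConnection` function makes the default easy to test without launching a browser; the capture engine itself is unchanged either way.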

2. Security posture on captured credentials

Recording any flow that includes a login currently persists the password to disk in plaintext. inject.js logs every change event's .value with no special-casing:

// scripts/inject.js
const value = ('value' in el) ? el.value : '';   // includes type="password"

I hit this for real while recording a GitHub login: my email and password ended up in the output recording.json in /tmp, and per-step screenshots can capture sensitive fields visually too. For a skill whose whole purpose is "record human flows and persist them as reusable files," that's a meaningful gap. Suggested fixes:

Redact type="password" / autocomplete="*-password" inputs — store a [REDACTED] sentinel instead of the value.
Consider the same for other obvious secrets (OTP, card number), and add a note that screenshots may capture sensitive UI.
A scrub pass in the distiller so any generated recording.json fallback never carries secrets.
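Both suggestions can be sketched together: redaction at capture time plus a scrub pass on the distiller side as defence in depth. The sensitive-field rules here (`type="password"`, any `*-password` autocomplete token, `one-time-code`, and `cc-*` card tokens) are a starting assumption rather than an exhaustive list, and the recording shape (`{ steps: [{ value, inputType, sensitive }] }`) is hypothetical. Screenshots still need separate handling.

```javascript
const REDACTED = '[REDACTED]';

// True for inputs whose value should never be written to disk.
// autocomplete may hold several space-separated tokens, so check each one.
function isSensitive(el) {
  const type = (el.getAttribute('type') || '').toLowerCase();
  const tokens = (el.getAttribute('autocomplete') || '').toLowerCase().split(/\s+/);
  return (
    type === 'password' ||
    tokens.some((t) => t.endsWith('-password') || t === 'one-time-code' || t.startsWith('cc-'))
  );
}

// Capture side: replaces the unconditional `el.value` read.
function capturedValue(el) {
  if (!('value' in el)) return '';
  return isSensitive(el) ? REDACTED : el.value;
}

// Distiller side: scrub flagged steps in case an older or buggy recorder
// let a secret through. Returns a new object; the input is not mutated.
function scrubRecording(rec) {
  return {
    ...rec,
    steps: rec.steps.map((s) =>
      s.sensitive || s.inputType === 'password' ? { ...s, value: REDACTED } : s
    ),
  };
}
```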

Happy to send a PR with both changes if useful.

shubh24 and others added 3 commits June 26, 2026 23:54

Add record-and-replay skill

1cd473d


shubh24 changed the title ~~Add record-and-replay skill~~ Add browser-record skill (record a flow → distill into a task skill) Jun 27, 2026

shubh24 changed the title ~~Add browser-record skill (record a flow → distill into a task skill)~~ browser-record: turn a recorded human browser flow into a reusable, intent-level task skill Jun 27, 2026

shubh24 wants to merge 3 commits into main from record-and-replay-skill