diff --git a/README.md b/README.md
index 6b06aea..62adde3 100644
--- a/README.md
+++ b/README.md
@@ -25,6 +25,7 @@ This plugin includes the following skills (see `skills/` for details):
 | [company-research](skills/company-research/SKILL.md) | Discover target companies matching your ICP using the Browserbase Search API, deep-research each one, and score fit into a research report and CSV |
 | [event-prospecting](skills/event-prospecting/SKILL.md) | Extract speakers from a conference page, filter their companies against your ICP, and deep-research the best-fit people into a person-first prospecting report |
 | [competitor-analysis](skills/competitor-analysis/SKILL.md) | Auto-discover a company's competitors via the Browserbase Search API, deep-research each across marketing, signal, benchmark, and strategic-diff lanes, and compile a browsable HTML report with an overview, per-competitor deep dives, a feature/pricing matrix, and a mentions feed |
+| [browser-record](skills/browser-record/SKILL.md) | Record a human browser flow (clicks, typing, screenshots, full CDP trace) on a Browserbase session, then let an agent distill what the human *meant* — collapsing corrections, dropping abandoned actions — into a reusable, parameterized task skill that replays against the live page |
 
 ## Installation
 
diff --git a/skills/browser-record/LICENSE.txt b/skills/browser-record/LICENSE.txt
new file mode 100644
index 0000000..f2f4397
--- /dev/null
+++ b/skills/browser-record/LICENSE.txt
@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) 2026 Browserbase, Inc.
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
diff --git a/skills/browser-record/SKILL.md b/skills/browser-record/SKILL.md
new file mode 100644
index 0000000..33fcade
--- /dev/null
+++ b/skills/browser-record/SKILL.md
@@ -0,0 +1,130 @@
+---
+name: browser-record
+description: Record a human browser flow on a Browserbase session and distill it into a reusable, parameterized task skill. Captures clicks/typing/screenshots (plus an optional full CDP trace), then an agent reasons about what the human *meant* — collapsing corrections, dropping abandoned actions — and writes an intent-level SKILL.md that replays against the live page. Use for "show, don't prompt": record a flow once and turn it into a skill. Triggers on "record this flow", "turn this into a skill", "record a browser workflow", "browser record".
+compatibility: "Requires Node 18+ and the browse CLI (`npm install -g browse`), plus `BROWSERBASE_API_KEY` and `BROWSERBASE_PROJECT_ID`.
Record uses `@browserbasehq/sdk` + `playwright-core` — run `npm install` in this skill dir. Pairs with the `browser-trace` skill for the full CDP firehose."
+license: MIT
+allowed-tools: Bash, Read, Grep
+---
+
+# Browser Record
+
+"Show the bug instead of prompting it." Record a human flow once, then turn it
+into a **reusable, parameterized task skill** an agent can replay against the live
+page.
+
+The pipeline is **capture wide, reason narrow**:
+
+```
+record (interaction stream + screenshots)              ← semantic spine
+  + browser-trace (CDP firehose: network/console/DOM)  ← full observability
+    → distill = teacher agent reasons about INTENT     ← collapses corrections,
+    → skills//SKILL.md                                   drops abandoned actions
+```
+
+The key idea: a recording is **mechanics** ("typed 'new yo', clicked `#c307`").
+What you want is **intent** ("destination = New York"). Recovering intent —
+including spotting that the user typed San Francisco, erased it, and chose Los
+Angeles, or applied a filter then removed it — is a judgment, so the distiller is
+**an agent, not a script** (see `references/distill.md`).
+
+## 1. Capture
+
+Record produces the **semantic spine**: each click/type with the acted element's
+accessible `name` + `role` + committed value, plus a screenshot per step.
+
+```bash
+RR_URL="https://www.saucedemo.com" RR_OUT=/tmp/rec.json RR_TITLE="login flow" \
+  node --env-file=.env scripts/record.mjs
+```
+
+Open the printed **live view URL**, perform the flow, then stop with ENTER,
+`touch /tmp/rr-stop`, or `RR_SECONDS=30`. Output: `RR_OUT` + `-shots/`.
+
+**For full observability**, attach `browser-trace` so the teacher agent can also
+query network/console/DOM.
Create one keep-alive session, point both at it:
+
+```bash
+node ../browser-trace/scripts/bb-capture.mjs --new myflow  # session + CDP firehose
+SID=$(jq -r .browserbase.session_id .o11y/myflow/manifest.json)
+CONNECT_URL=$(browse cloud sessions get "$SID" | jq -r .connectUrl)
+RR_CONNECT_URL="$CONNECT_URL" RR_URL="https://site.com" RR_OUT=/tmp/rec.json \
+  node --env-file=.env scripts/record.mjs  # attaches to same session
+# after stopping the recording:
+node ../browser-trace/scripts/stop-capture.mjs myflow && node ../browser-trace/scripts/bisect-cdp.mjs myflow
+```
+
+| Var | Default | Meaning |
+|-----|---------|---------|
+| `RR_URL` | `https://example.com` | start URL |
+| `RR_OUT` | `/tmp/recording-.json` | output recording path |
+| `RR_CONNECT_URL` | _(none)_ | attach to an existing session (e.g. browser-trace's) instead of creating one |
+| `RR_TITLE` / `RR_STOP` / `RR_SECONDS` | — | title / stop-file / auto-stop |
+
+## 2. Distill (the agent does this)
+
+Read `references/distill.md`, then **act as the teacher agent**: read
+`recording.json` + the screenshots, query the `browser-trace` buckets as needed,
+and reconstruct the *smallest set of intents that explains the session* —
+collapsing corrections, dropping abandoned/undone actions, parameterizing the
+values the user supplied. Write the result as `skills//`.
+
+Each step's headline is the value the field **committed to** (the acted element's
+`name`), never the keystrokes or a dynamic selector. The committed value is also
+the step's verification check.
+
+### What the generated task skill must contain
+
+- `SKILL.md` — intent steps (shape below).
+- `screenshots/NN-