diff --git a/skills/autobrowse/SKILL.md b/skills/autobrowse/SKILL.md
index ba7ca24..e2b91df 100644
--- a/skills/autobrowse/SKILL.md
+++ b/skills/autobrowse/SKILL.md
@@ -225,47 +225,65 @@ Read the new summary. Did it pass? Make clear progress?
 
 ### Generate a runnable script (optional)
 
-Once the task has converged, you can produce a deterministic, runnable script
-in one or more frameworks via `scripts/codegen.mjs`. This is one shot of an
-LLM call per framework, cached by content hash, with optional verify-against-
-fresh-session and rewrite-on-failure.
-
-```bash
-node ${CLAUDE_SKILL_DIR}/scripts/codegen.mjs \
-  --task <name> \
-  --workspace ./autobrowse \
-  --frameworks playwright,stagehand \
-  --verify
-```
-
-Each framework gets its own subdirectory under `tasks/<name>/<framework>/`
-with the emitted script and a self-contained scaffold (`package.json`,
-`tsconfig.json`). The directory is runnable standalone with
-`cd tasks/<name>/playwright && npm install && npx tsx <name>.ts` — the only
-runtime requirement is `BROWSERBASE_API_KEY` (plus `ANTHROPIC_API_KEY` for
-the Stagehand target).
-
-Builtin frameworks: `playwright`, `stagehand`. Add a custom framework with
-`--prompt-template <path> --frameworks custom` (and provide your own runner
-or pass `--no-verify`).
-
-Common flags:
-
-| Flag | Purpose |
-|---|---|
-| `--frameworks a,b,...` | Comma-separated; default `playwright` |
-| `--verify` / `--no-verify` | Run the produced script against a fresh BB session; default `--verify` |
-| `--max-retries N` | Rewrite-on-verify-failure cap; default 2 |
-| `--cache-only` | Error if cache miss (CI-friendly) |
-| `--force` | Bust the cache |
-| `--dry-run` | Estimate prompt size + cost; don't call the LLM |
-| `--run <id>` | Force a specific `run-NNN` (default: latest passing) |
-
-Output is one JSON line per framework on stdout. Non-zero exit if any
-selected framework's final state is `passed: false`.
-
-See `references/playwright-cdp-bridge.md` for the canonical
-`connectOverCDP` patterns the emitted scripts follow.
+Once the task has converged, you can produce a runnable script in one or
+more frameworks (Playwright, Stagehand) directly using your own `Write` and
+`Bash` tools — autobrowse no longer ships a separate `codegen.mjs`
+sub-process. The framework-specific specs live as reference docs you read
+on demand:
+
+- `references/codegen/playwright.md` — script shape, scaffold, verify
+  contract, locator priorities, HTTP-only variant
+- `references/codegen/stagehand.md` — Stagehand v3 constructor, `act` /
+  `extract` patterns, when NOT to ship Stagehand
+- `references/playwright-cdp-bridge.md` — canonical `connectOverCDP`
+  create-session / release dance
+
+Before starting the loop, settle the output layout:
+
+Pick an **output directory** for each run and keep all of steps 2-4 inside
+it. The two common shapes:
+
+- **Per-framework subdir** (standalone autobrowse use, no host): one
+  directory per framework, scripts named after the task —
+  `tasks/<task>/playwright/<task>.ts`, `tasks/<task>/stagehand/<task>.ts`.
+  Each subdir gets its own `package.json` + `node_modules`.
+- **Flattened upload root** (what browse.sh's skill-generator uses): all
+  frameworks share one output dir at the upload root, scripts named after
+  the framework — `/tmp/skill/{domain}/{task}/playwright.ts`,
+  `.../stagehand.ts`. One merged `package.json` covers both.
+
+Step 4's verify command and step 5's "delete the broken script" path must
+match step 2's filename. Pick one shape per task and stick with it.
+
+The loop is:
+
+1. `Read` the converged trace at
+   `./autobrowse/traces/<task>/run-NNN/{trace.json,unified-events.jsonl}`
+   (zero-padded run number — autobrowse also maintains a `latest` symlink
+   to the most recent run if you'd rather use that), the task's
+   `strategy.md` at `./autobrowse/tasks/<task>/strategy.md`, and the
+   framework reference doc at `references/codegen/<framework>.md`.
+2. `Write` the script into the output directory you picked:
+   `<output-dir>/<task>.ts` (per-framework subdir) or
+   `<output-dir>/<framework>.ts` (flattened root).
+3. `Write` the scaffold's `package.json` + `tsconfig.json` per the
+   reference. When multiple frameworks share an output directory, merge
+   the `dependencies` across frameworks into a single `package.json`.
+4. `Bash` `PLAYWRIGHT_SKIP_BROWSER_DOWNLOAD=1 npm install --silent --no-audit --no-fund`
+   then `npx tsx <the-script-you-just-wrote>` against a fresh Browserbase
+   session. Use the same filename you wrote in step 2.
+5. Parse the trailing `{"success":boolean,...}` JSON line from stdout. If
+   it failed, read the stderr tail and iterate — up to ~3 attempts is
+   reasonable. If still failing, delete the broken script so it isn't
+   uploaded (the upload glob ships whatever's on disk).
+
+The agent does this directly because it already has the context, the
+tools, and the judgment for "this stderr means …, try X". A sub-process
+LLM call (the old `codegen.mjs`) couldn't see why a script was failing
+beyond the stderr tail, and tended to bleed natural-language preamble
+into the `.ts` file via the completion API's message channel — both
+problems disappear when the outer agent writes the file through the
+`Write` tool's structured argument.
 
 ### After all iterations — publish if ready
 
diff --git a/skills/autobrowse/codegen/prompts/playwright.md b/skills/autobrowse/codegen/prompts/playwright.md
deleted file mode 100644
index 51c7de7..0000000
--- a/skills/autobrowse/codegen/prompts/playwright.md
+++ /dev/null
@@ -1,55 +0,0 @@
-# Playwright codegen — system prompt
-
-You are converting a converged autobrowse trace into a runnable Playwright
-script. Your output is the **complete contents of a `.ts` file**, nothing
-else: no preamble, no closing remarks, no markdown fences.
-
-## Constraints
-
-- **Self-contained.** The script must run with only `BROWSERBASE_API_KEY` in
-  the environment. No reliance on autobrowse state, no reading from
-  workspace files.
-- **CDP attach, never `chromium.launch()`.** Follow the
-  `Playwright ↔ Browserbase bridge` reference verbatim for the
-  create-session / connectOverCDP / release dance.
-- **No `browser.close()`.** Release the session via
-  `browse cloud sessions update <id> --status REQUEST_RELEASE` in `finally`.
-- **Final stdout line is JSON.** `{"success":true,"data":...}` on success
-  or `{"success":false,"error":"..."}` on failure. The runner parses this
-  line — don't emit any other JSON-looking lines after it.
-- **Snap on errors.** Wrap `main()` in `try { … } catch (err) { await snap(page, '99-error'); throw err; }`. Honor `process.env.SCREENSHOT_DIR` for snap output.
-- **Locator preferences in order:** `data-testid` attribute → role + name →
-  id → text → xpath. Prefer Playwright's auto-waiting (`locator.click()`,
-  `locator.fill()`) over explicit waits when possible.
-- **Use the descriptor data when available.** Each `descriptors.ndjson` entry
-  describes the actual DOM target the agent interacted with — pick locators
-  from those `attributes` / `role` / `accessibleName` fields rather than
-  inventing them.
-- **Use the trace's network signals.** Where the unified events show a slow
-  XHR after an action, insert `page.waitForResponse(...)` rather than
-  arbitrary sleeps.
-
-## Output schema
-
-The script must define a Zod schema that mirrors the `# Output` section of
-the task.md provided in context, and validate the extracted data through
-that schema before printing the final `success: true` line.
-
-## Imports / runtime
-
-```typescript
-import { chromium, type Browser, type Page } from "playwright";
-import { execFileSync } from "node:child_process";
-import { join } from "node:path";
-import { z } from "zod";
-import "dotenv/config";
-```
-
-`playwright` and `zod` are already in the scaffolded `package.json`. Do not
-add other dependencies.
-
-## What to emit
-
-Output the complete `.ts` file content. Start with imports, end with a call
-to `main()`. Nothing before the first import, nothing after the last
-closing brace. No markdown fences.
diff --git a/skills/autobrowse/codegen/prompts/stagehand.md b/skills/autobrowse/codegen/prompts/stagehand.md
deleted file mode 100644
index 085f7d3..0000000
--- a/skills/autobrowse/codegen/prompts/stagehand.md
+++ /dev/null
@@ -1,80 +0,0 @@
-# Stagehand codegen — system prompt
-
-You are converting a converged autobrowse trace into a runnable Stagehand
-script. Your output is the **complete contents of a `.ts` file**, nothing
-else: no preamble, no closing remarks, no markdown fences.
-
-This targets **Stagehand v3** (`@browserbasehq/stagehand` 3.x). The v3 API
-differs from older examples — follow the patterns below exactly.
-
-## Constraints
-
-- **Self-contained.** The script must run with `BROWSERBASE_API_KEY` and
-  `ANTHROPIC_API_KEY` in the environment.
-- **Stagehand owns its own Browserbase session.** Construct it with
-  `env: "BROWSERBASE"` and let it create the session — do NOT pre-create a
-  session via the `browse` CLI and do NOT pass `browserbaseSessionID`. The
-  constructor shape is:
-  ```typescript
-  const stagehand = new Stagehand({
-    env: "BROWSERBASE",
-    apiKey: process.env.BROWSERBASE_API_KEY,        // ← BROWSERBASE key (NOT the Anthropic key); project inferred from it
-    model: {                                        // ← LLM config lives here, not at top level
-      modelName: "anthropic/claude-sonnet-4-6",     // ← provider-prefixed; do not invent model names
-      apiKey: process.env.ANTHROPIC_API_KEY,
-    },
-  });
-  await stagehand.init();
-  ```
-  The top-level `apiKey` is the **Browserbase** API key (the project is
-  inferred from it — no `projectId` needed). There is no `browserbaseAPIKey`
-  field and no top-level `modelName` — using the Anthropic key as `apiKey`
-  makes session lookup fail with a 404.
-- **Get the page from the context, not `stagehand.page`.**
-  ```typescript
-  const page = stagehand.context.pages()[0] ?? (await stagehand.context.newPage());
-  await page.goto(url, { waitUntil: "domcontentloaded" });
-  ```
-  `page` supports `goto`, `waitForTimeout`, `waitForSelector`, `screenshot`.
-- **`act` and `extract` are methods on the `stagehand` instance, not the page.**
-  - Actions: `await stagehand.act("click the Continue button")`
-  - Data: `await stagehand.extract("<instruction>", zodSchema)` — pass the Zod
-    schema as the second argument; it returns the parsed object.
-  Prefer natural-language intent strings — the whole point of Stagehand is the
-  LLM picks the locator at runtime.
-- **One natural-language action per `act` call.** Don't compound
-  ("click X and fill Y"); chain individual `act` calls so each is retryable.
-- **Schema-backed extract.** Define Zod schemas mirroring the `# Output`
-  section of task.md and validate before emitting the final `success: true`
-  line.
-- **Use the descriptors as natural-language hints.** Where a descriptor shows
-  `accessibleName: "Continue"`, the corresponding `act` should say
-  `"click the Continue button"`. Specific locators aren't required.
-- **Snap on errors.** Wrap the body in
-  `try { … } catch (err) { await snap(page, '99-error'); … }`, honoring
-  `process.env.SCREENSHOT_DIR`. `snap` should be a no-op when the dir is unset.
-- **Final stdout line is JSON.** `{"success":true,"data":...}` on success,
-  `{"success":false,"error":"..."}` on failure. The runner parses this — emit
-  no other JSON-looking lines after it.
-- **Tear down with `await stagehand.close()` in `finally`.** Since Stagehand
-  created and owns the session, `close()` is the correct teardown — do NOT use
-  `browse cloud sessions update … REQUEST_RELEASE` (that's only for the
-  CDP-attach pattern where you created the session yourself).
-
-## Imports / runtime
-
-```typescript
-import { Stagehand } from "@browserbasehq/stagehand";
-import { join } from "node:path";
-import { z } from "zod";
-import "dotenv/config";
-```
-
-`@browserbasehq/stagehand` and `zod` are already in the scaffolded
-`package.json`. Do not add other dependencies.
-
-## What to emit
-
-Output the complete `.ts` file content. Start with imports, end with a call
-to `main()`. Nothing before the first import, nothing after the last
-closing brace. No markdown fences.
diff --git a/skills/autobrowse/codegen/runners/lib/tsx-runner.mjs b/skills/autobrowse/codegen/runners/lib/tsx-runner.mjs
deleted file mode 100644
index c90407c..0000000
--- a/skills/autobrowse/codegen/runners/lib/tsx-runner.mjs
+++ /dev/null
@@ -1,136 +0,0 @@
-// tsx-runner.mjs — shared logic for codegen target runners that boot a tsx
-// script in a scaffolded output dir and parse its trailing JSON line.
-//
-// Playwright and Stagehand runners (and any future TS target that follows the
-// same {"success":boolean,"data":...} contract) call runTsxTarget with their
-// per-framework tweaks: a label for stderr prefix, extra env (e.g.
-// PLAYWRIGHT_SKIP_BROWSER_DOWNLOAD=1), and an optional preflight check (e.g.
-// "ANTHROPIC_API_KEY required for Stagehand").
-
-import * as fs from "node:fs";
-import * as path from "node:path";
-import * as crypto from "node:crypto";
-import { spawnSync } from "node:child_process";
-
-export function getArg(name) {
-  const i = process.argv.indexOf(`--${name}`);
-  return i !== -1 && process.argv[i + 1] ? process.argv[i + 1] : null;
-}
-
-// Emit a JSON result line on stdout and exit. Centralized so the contract
-// (single {passed:bool,...} JSON line, exit 0/2) is consistent across runners.
-function emitAndExit(result) {
-  console.log(JSON.stringify(result));
-  process.exit(result.passed ? 0 : 2);
-}
-
-/**
- * Run a tsx target script against a fresh BB session.
- *
- * @param {object} opts
- * @param {string} opts.label                 stderr prefix, e.g. "playwright"
- * @param {Record<string,string>} [opts.extraEnv]  merged into the run's env
- * @param {Record<string,string>} [opts.installEnv] merged into npm install's env
- * @param {() => string|null} [opts.preflight]  return error message to fail fast
- */
-export function runTsxTarget(opts) {
-  const { label, extraEnv = {}, installEnv = {}, preflight } = opts;
-  const outDir = getArg("out-dir");
-  const script = getArg("script");
-
-  if (!outDir || !script) {
-    emitAndExit({ passed: false, error: "runner missing --out-dir or --script" });
-  }
-
-  const scriptPath = path.join(outDir, script);
-  if (!fs.existsSync(scriptPath)) {
-    emitAndExit({ passed: false, error: `script not found at ${scriptPath}` });
-  }
-
-  if (preflight) {
-    const err = preflight();
-    if (err) emitAndExit({ passed: false, error: err });
-  }
-
-  // Install deps when package.json changes. Gating purely on node_modules
-  // existing is wrong when two frameworks share an --out dir: framework #2's
-  // dropScaffold merges its deps into the existing package.json, but the
-  // node_modules from framework #1's install is still missing them. We hash
-  // package.json and compare against a stamp under node_modules/ to detect
-  // that and re-install.
-  const pkgPath = path.join(outDir, "package.json");
-  const stampPath = path.join(outDir, "node_modules", ".codegen-pkg-hash");
-  const pkgHash = fs.existsSync(pkgPath)
-    ? crypto.createHash("sha256").update(fs.readFileSync(pkgPath)).digest("hex")
-    : null;
-  const stampedHash = fs.existsSync(stampPath)
-    ? fs.readFileSync(stampPath, "utf-8").trim()
-    : null;
-  if (pkgHash && pkgHash !== stampedHash) {
-    process.stderr.write(`[runner.${label}] installing deps in ${outDir}\n`);
-    // Always set PLAYWRIGHT_SKIP_BROWSER_DOWNLOAD=1 here, regardless of which
-    // runner we are. In shared --out mode, framework #2 (e.g. stagehand) gets
-    // playwright merged into its package.json by dropScaffold, so even runners
-    // that don't list playwright in installEnv would still trigger its
-    // postinstall and try to fetch hundreds of MB of chromium — exhausting
-    // the 3min install budget. We never need bundled browsers (always CDP).
-    const install = spawnSync("npm", ["install", "--silent", "--no-audit", "--no-fund"], {
-      cwd: outDir,
-      stdio: ["ignore", "inherit", "inherit"],
-      env: { ...process.env, PLAYWRIGHT_SKIP_BROWSER_DOWNLOAD: "1", ...installEnv },
-      timeout: 3 * 60 * 1000,
-    });
-    if (install.status !== 0) {
-      emitAndExit({ passed: false, error: `npm install exited ${install.status}` });
-    }
-    try {
-      fs.mkdirSync(path.dirname(stampPath), { recursive: true });
-      fs.writeFileSync(stampPath, pkgHash);
-    } catch {}
-  }
-
-  // Per-run screenshot dir, exposed to the script via SCREENSHOT_DIR so its
-  // snap() helper can write progress / failure shots somewhere we can find.
-  const screenshotDir = path.join(outDir, "screenshots", `verify-${Date.now()}`);
-  fs.mkdirSync(screenshotDir, { recursive: true });
-
-  process.stderr.write(`[runner.${label}] running ${scriptPath}\n`);
-  const run = spawnSync("npx", ["tsx", script], {
-    cwd: outDir,
-    encoding: "utf-8",
-    stdio: ["ignore", "pipe", "pipe"],
-    env: { ...process.env, ...extraEnv, SCREENSHOT_DIR: screenshotDir },
-    timeout: 5 * 60 * 1000,
-  });
-
-  const stdout = run.stdout ?? "";
-  const stderr = run.stderr ?? "";
-
-  // Parse the script's trailing JSON line — walk backward through lines and
-  // take the last one that parses as JSON with a boolean `success` field.
-  let parsed = null;
-  const lines = stdout.trim().split("\n").filter(Boolean);
-  for (let i = lines.length - 1; i >= 0; i--) {
-    try {
-      const candidate = JSON.parse(lines[i]);
-      if (typeof candidate?.success === "boolean") {
-        parsed = candidate;
-        break;
-      }
-    } catch {}
-  }
-
-  const passed = run.status === 0 && parsed?.success === true;
-  const result = {
-    passed,
-    exit_code: run.status,
-    script_output: parsed,
-    screenshot_dir: screenshotDir,
-    stderr_tail: stderr.slice(-2000),
-  };
-  if (!passed) {
-    result.error = parsed?.error
-      || (run.status !== 0 ? `script exited ${run.status}` : "script did not emit success:true");
-  }
-  emitAndExit(result);
-}
diff --git a/skills/autobrowse/codegen/runners/playwright.mjs b/skills/autobrowse/codegen/runners/playwright.mjs
deleted file mode 100755
index dbc2f4e..0000000
--- a/skills/autobrowse/codegen/runners/playwright.mjs
+++ /dev/null
@@ -1,29 +0,0 @@
-#!/usr/bin/env node
-
-/**
- * playwright.mjs — Runner for the Playwright codegen target.
- *
- * Invoked by codegen.mjs's verify step. Installs the scaffolded deps if
- * needed, spawns `npx tsx <script>` against a fresh BB session, and emits
- * a single {"passed":boolean, ...} JSON line on stdout.
- *
- * Contract:
- *   --out-dir <path>      the scaffolded output dir
- *   --script <basename>   file inside --out-dir to run (e.g. acme.ts)
- *
- * Shared with stagehand.mjs via lib/tsx-runner.mjs — only differences are
- * the label and the PLAYWRIGHT_SKIP_BROWSER_DOWNLOAD trick (so playwright's
- * postinstall doesn't try to fetch chromium; we use connectOverCDP).
- */
-
-import { runTsxTarget } from "./lib/tsx-runner.mjs";
-
-runTsxTarget({
-  label: "playwright",
-  // PLAYWRIGHT_SKIP_BROWSER_DOWNLOAD=1 is required at install time too,
-  // otherwise the playwright postinstall pulls hundreds of MB of browser
-  // binaries that we never use (we always connectOverCDP to a remote BB
-  // session). Set it for both install and run.
-  installEnv: { PLAYWRIGHT_SKIP_BROWSER_DOWNLOAD: "1" },
-  extraEnv: { PLAYWRIGHT_SKIP_BROWSER_DOWNLOAD: "1" },
-});
diff --git a/skills/autobrowse/codegen/runners/stagehand.mjs b/skills/autobrowse/codegen/runners/stagehand.mjs
deleted file mode 100755
index 5e2a3aa..0000000
--- a/skills/autobrowse/codegen/runners/stagehand.mjs
+++ /dev/null
@@ -1,24 +0,0 @@
-#!/usr/bin/env node
-
-/**
- * stagehand.mjs — Runner for the Stagehand codegen target.
- *
- * Same contract and shared logic as playwright.mjs (see lib/tsx-runner.mjs).
- * Differences:
- *   - No PLAYWRIGHT_SKIP_BROWSER_DOWNLOAD trick (Stagehand uses
- *     connectOverCDP without bundling a local chromium).
- *   - Requires ANTHROPIC_API_KEY (or ANTHROPIC_AUTH_TOKEN) — Stagehand's
- *     act/extract are LLM-driven.
- */
-
-import { runTsxTarget } from "./lib/tsx-runner.mjs";
-
-runTsxTarget({
-  label: "stagehand",
-  preflight: () => {
-    if (!process.env.ANTHROPIC_API_KEY && !process.env.ANTHROPIC_AUTH_TOKEN) {
-      return "ANTHROPIC_API_KEY required for Stagehand verify";
-    }
-    return null;
-  },
-});
diff --git a/skills/autobrowse/codegen/scaffolds/playwright/package.json b/skills/autobrowse/codegen/scaffolds/playwright/package.json
deleted file mode 100644
index 081ed66..0000000
--- a/skills/autobrowse/codegen/scaffolds/playwright/package.json
+++ /dev/null
@@ -1,15 +0,0 @@
-{
-  "name": "{{TASK}}-playwright",
-  "version": "0.1.0",
-  "private": true,
-  "type": "module",
-  "scripts": {
-    "start": "tsx {{SCRIPT}}"
-  },
-  "dependencies": {
-    "dotenv": "{{DOTENV_VERSION}}",
-    "playwright": "{{PLAYWRIGHT_VERSION}}",
-    "tsx": "{{TSX_VERSION}}",
-    "zod": "{{ZOD_VERSION}}"
-  }
-}
diff --git a/skills/autobrowse/codegen/scaffolds/playwright/tsconfig.json b/skills/autobrowse/codegen/scaffolds/playwright/tsconfig.json
deleted file mode 100644
index b7b1f75..0000000
--- a/skills/autobrowse/codegen/scaffolds/playwright/tsconfig.json
+++ /dev/null
@@ -1,13 +0,0 @@
-{
-  "compilerOptions": {
-    "target": "ES2022",
-    "module": "ESNext",
-    "moduleResolution": "Bundler",
-    "strict": true,
-    "esModuleInterop": true,
-    "skipLibCheck": true,
-    "noEmit": true,
-    "resolveJsonModule": true
-  },
-  "include": ["*.ts"]
-}
diff --git a/skills/autobrowse/codegen/scaffolds/stagehand/package.json b/skills/autobrowse/codegen/scaffolds/stagehand/package.json
deleted file mode 100644
index bee503c..0000000
--- a/skills/autobrowse/codegen/scaffolds/stagehand/package.json
+++ /dev/null
@@ -1,15 +0,0 @@
-{
-  "name": "{{TASK}}-stagehand",
-  "version": "0.1.0",
-  "private": true,
-  "type": "module",
-  "scripts": {
-    "start": "tsx {{SCRIPT}}"
-  },
-  "dependencies": {
-    "@browserbasehq/stagehand": "{{STAGEHAND_VERSION}}",
-    "dotenv": "{{DOTENV_VERSION}}",
-    "tsx": "{{TSX_VERSION}}",
-    "zod": "{{ZOD_VERSION}}"
-  }
-}
diff --git a/skills/autobrowse/codegen/scaffolds/stagehand/tsconfig.json b/skills/autobrowse/codegen/scaffolds/stagehand/tsconfig.json
deleted file mode 100644
index b7b1f75..0000000
--- a/skills/autobrowse/codegen/scaffolds/stagehand/tsconfig.json
+++ /dev/null
@@ -1,13 +0,0 @@
-{
-  "compilerOptions": {
-    "target": "ES2022",
-    "module": "ESNext",
-    "moduleResolution": "Bundler",
-    "strict": true,
-    "esModuleInterop": true,
-    "skipLibCheck": true,
-    "noEmit": true,
-    "resolveJsonModule": true
-  },
-  "include": ["*.ts"]
-}
diff --git a/skills/autobrowse/references/codegen/playwright.md b/skills/autobrowse/references/codegen/playwright.md
new file mode 100644
index 0000000..8aa5782
--- /dev/null
+++ b/skills/autobrowse/references/codegen/playwright.md
@@ -0,0 +1,128 @@
+# Playwright codegen reference
+
+Spec for the `playwright.ts` file an outer agent writes when codegenning a
+runnable script from a converged autobrowse trace. The outer agent should
+read this file, draft the script with the `Write` tool, then verify it with
+`Bash` (`npm install && npx tsx playwright.ts`) against a fresh Browserbase
+session — iterating on failure using its own judgment.
+
+The companion file is `references/playwright-cdp-bridge.md`, which has the
+canonical create-session / connectOverCDP / release dance. Read that too.
+
+## Hard constraints
+
+- **Self-contained.** Runs with only `BROWSERBASE_API_KEY` in the env. No
+  reliance on autobrowse state, no reading from workspace files.
+- **CDP attach, never `chromium.launch()`.** Follow the cdp-bridge reference
+  verbatim for create-session / `connectOverCDP` / release.
+- **No `browser.close()`.** Release the session via
+  `browse cloud sessions update <id> --status REQUEST_RELEASE` in `finally`.
+  `browser.close()` on a `connectOverCDP` attachment tears down the remote
+  session prematurely.
+- **Final stdout line is JSON.** Emit `{"success":true,"data":...}` on
+  success or `{"success":false,"error":"..."}` on failure as the last line
+  on stdout. The verify command parses the trailing JSON line — don't emit
+  any other JSON-looking lines after it.
+- **Snap on errors.** Wrap `main()` in
+  `try { … } catch (err) { await snap(page, '99-error'); throw err; }`.
+  `snap` honors `process.env.SCREENSHOT_DIR` and is a no-op when unset.
+- **Locator priority:** `data-testid` → role + accessible name → id → text
+  → xpath. Prefer Playwright's auto-waiting (`locator.click()`,
+  `locator.fill()`) over explicit sleeps.
+- **Use the descriptor data when available.** Each `descriptors.ndjson`
+  entry from the trace describes the actual DOM target the agent interacted
+  with — pick locators from those `attributes` / `role` / `accessibleName`
+  fields rather than inventing them.
+- **Use the trace's network signals.** Where the unified events show a slow
+  XHR after an action, insert `page.waitForResponse(...)` rather than
+  arbitrary sleeps.
+
+## Output schema
+
+Define a Zod schema mirroring the `# Output` section of `task.md`, and
+validate the extracted data through it before printing the final
+`success: true` line.
+
+## Imports
+
+```typescript
+import { chromium, type Browser, type Page } from "playwright";
+import { execFileSync } from "node:child_process";
+import { join } from "node:path";
+import { z } from "zod";
+import "dotenv/config";
+```
+
+Only `playwright`, `zod`, `dotenv`, and `tsx` should appear in
+`package.json`. Don't add other runtime deps.
+
+## Scaffold
+
+Write `package.json` alongside `playwright.ts` (in the same directory):
+
+```json
+{
+  "name": "<task>-playwright",
+  "version": "0.1.0",
+  "private": true,
+  "type": "module",
+  "scripts": { "start": "tsx playwright.ts" },
+  "dependencies": {
+    "dotenv": "16.4.5",
+    "playwright": "1.50.0",
+    "tsx": "4.22.3",
+    "zod": "4.4.3"
+  }
+}
+```
+
+And `tsconfig.json`:
+
+```json
+{
+  "compilerOptions": {
+    "target": "ES2022",
+    "module": "ES2022",
+    "moduleResolution": "Bundler",
+    "strict": true,
+    "esModuleInterop": true,
+    "skipLibCheck": true
+  }
+}
+```
+
+**Install with `PLAYWRIGHT_SKIP_BROWSER_DOWNLOAD=1`** — we always connect
+over CDP to a remote Browserbase session, so the bundled chromium download
+is pure waste (and the sandbox's network allowlist blocks the CDN anyway).
+
+```bash
+PLAYWRIGHT_SKIP_BROWSER_DOWNLOAD=1 npm install --silent --no-audit --no-fund
+PLAYWRIGHT_SKIP_BROWSER_DOWNLOAD=1 npx tsx playwright.ts
+```
+
+## Verify contract
+
+Run the script against a fresh Browserbase session and read the trailing
+JSON line on stdout. Pass if `success === true`; fail otherwise. On
+failure, feed the stderr tail back into your next attempt and iterate.
+
+## When the workflow is HTTP-only
+
+If the trace shows the task can be accomplished via HTTP requests with no
+DOM interaction (api / fetch / url-param `recommended_method`), use
+Playwright's `request` API instead of opening a browser:
+
+```typescript
+import { request, type APIRequestContext } from "playwright";
+
+async function main() {
+  const ctx = await request.newContext({
+    extraHTTPHeaders: { "user-agent": "..." },
+  });
+  const res = await ctx.get("https://example.com/api/foo");
+  // ... parse, validate via Zod, emit success line ...
+}
+```
+
+You still emit the same trailing JSON success/failure line. No
+`connectOverCDP`, no session, no `snap`.
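
Note on the verify contract in `references/codegen/playwright.md`: the following is a minimal sketch of reading the trailing JSON line from a script's stdout. It mirrors the backward walk that the removed `tsx-runner.mjs` performed (take the last line that parses as JSON and has a boolean `success` field). The names `ScriptResult` and `parseTrailingResult` are hypothetical and are not part of autobrowse.

```typescript
// Hypothetical shape of the script's final stdout line.
type ScriptResult = { success: boolean; [key: string]: unknown };

// Walk stdout from the last line upward and return the first line that
// parses as a JSON object with a boolean `success` field, or null if none does.
function parseTrailingResult(stdout: string): ScriptResult | null {
  const lines = stdout.trim().split("\n").filter(Boolean);
  for (let i = lines.length - 1; i >= 0; i--) {
    try {
      const candidate: unknown = JSON.parse(lines[i]);
      if (
        typeof candidate === "object" &&
        candidate !== null &&
        typeof (candidate as { success?: unknown }).success === "boolean"
      ) {
        return candidate as ScriptResult;
      }
    } catch {
      // Not JSON (ordinary log output); keep walking backward.
    }
  }
  return null;
}

// Example: log noise before the result line is skipped.
const sample = 'navigating...\n{"success":true,"data":{"title":"Example"}}\n';
const parsed = parseTrailingResult(sample);
console.log(parsed?.success === true ? "pass" : "fail");
```

A verify step built on this would treat `null` the same as `success: false`, since a script that never emits the line has not met the contract.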
diff --git a/skills/autobrowse/references/codegen/stagehand.md b/skills/autobrowse/references/codegen/stagehand.md
new file mode 100644
index 0000000..0cc29e4
--- /dev/null
+++ b/skills/autobrowse/references/codegen/stagehand.md
@@ -0,0 +1,145 @@
+# Stagehand codegen reference
+
+Spec for the `stagehand.ts` file an outer agent writes when codegenning a
+runnable script from a converged autobrowse trace. The outer agent should
+read this file, draft the script with the `Write` tool, then verify it with
+`Bash` (`npm install && npx tsx stagehand.ts`) against a fresh Browserbase
+session — iterating on failure using its own judgment.
+
+This targets **Stagehand v3** (`@browserbasehq/stagehand` 3.x). The v3 API
+differs from older examples — follow the patterns below exactly.
+
+## When NOT to write a Stagehand script
+
+Stagehand fundamentally needs a browser session, so it doesn't fit
+HTTP-only workflows. If `recommended_method` in `metadata.json` is `api`,
+`mcp`, `fetch`, `url-param`, or `cli`, skip Stagehand and ship only the
+Playwright variant.
+
+## Hard constraints
+
+- **Self-contained.** Runs with `BROWSERBASE_API_KEY` and `ANTHROPIC_API_KEY`
+  in the env.
+- **Stagehand owns its own Browserbase session.** Construct it with
+  `env: "BROWSERBASE"` and let it create the session — do NOT pre-create a
+  session via the `browse` CLI and do NOT pass `browserbaseSessionID`.
+- **Top-level `apiKey` is the Browserbase key, not the Anthropic key.** The
+  project is inferred from it. There is no `browserbaseAPIKey` field. Using
+  the Anthropic key as `apiKey` makes session lookup fail with a 404.
+- **Get the page from `stagehand.context`, not `stagehand.page`.**
+- **`act` and `extract` are methods on the `stagehand` instance, not the page.**
+- **One natural-language action per `act` call.** Don't compound
+  ("click X and fill Y"); chain individual `act` calls so each is retryable.
+- **Schema-backed extract.** Define Zod schemas mirroring the `# Output`
+  section of task.md and validate before emitting the final `success: true`
+  line.
+- **Tear down with `await stagehand.close()` in `finally`.** Since Stagehand
+  created and owns the session, `close()` is the correct teardown — do NOT
+  use `browse cloud sessions update … REQUEST_RELEASE` (that's only for the
+  CDP-attach pattern in `playwright.ts`).
+- **Snap on errors.** Wrap the body in
+  `try { … } catch (err) { await snap(page, '99-error'); throw err; }`,
+  honoring `process.env.SCREENSHOT_DIR`. `snap` is a no-op when the dir is
+  unset.
+- **Final stdout line is JSON.** Emit `{"success":true,"data":...}` on
+  success or `{"success":false,"error":"..."}` on failure as the last line
+  on stdout.
+
+## Constructor shape
+
+```typescript
+const stagehand = new Stagehand({
+  env: "BROWSERBASE",
+  apiKey: process.env.BROWSERBASE_API_KEY,        // ← BROWSERBASE key; project inferred from it
+  model: {                                        // ← LLM config lives here, not at top level
+    modelName: "anthropic/claude-sonnet-4-6",     // ← provider-prefixed; do not invent model names
+    apiKey: process.env.ANTHROPIC_API_KEY,
+  },
+});
+await stagehand.init();
+const page = stagehand.context.pages()[0] ?? (await stagehand.context.newPage());
+await page.goto(url, { waitUntil: "domcontentloaded" });
+
+// Actions:
+await stagehand.act("click the Continue button");
+
+// Data:
+const data = await stagehand.extract("<instruction>", zodSchema);
+```
+
+Use the descriptors from the trace as natural-language hints: where a
+descriptor shows `accessibleName: "Continue"`, the corresponding `act`
+should say `"click the Continue button"`. Specific locators aren't
+required — Stagehand picks them at runtime.
+
+## Imports
+
+```typescript
+import { Stagehand } from "@browserbasehq/stagehand";
+import { join } from "node:path";
+import { z } from "zod";
+import "dotenv/config";
+```
+
+Only `@browserbasehq/stagehand`, `zod`, `dotenv`, and `tsx` should appear
+in `package.json`. Don't add other runtime deps.
+
+## Scaffold
+
+Write `package.json` alongside `stagehand.ts` (in the same directory):
+
+```json
+{
+  "name": "<task>-stagehand",
+  "version": "0.1.0",
+  "private": true,
+  "type": "module",
+  "scripts": { "start": "tsx stagehand.ts" },
+  "dependencies": {
+    "@browserbasehq/stagehand": "3.4.0",
+    "dotenv": "16.4.5",
+    "tsx": "4.22.3",
+    "zod": "4.4.3"
+  }
+}
+```
+
+If `playwright.ts` is being written into the same directory, **merge** the
+dependencies rather than overwriting the existing `package.json` —
+otherwise the second framework's deps get lost. Use a single combined
+`package.json` with both `playwright` and `@browserbasehq/stagehand` listed.
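+
+A minimal sketch of such a merge, assuming a plain Node script does the
+writing (`mergePackageJson` and `writeMergedPackageJson` are hypothetical
+names, not part of any tool mentioned here):
+
+```typescript
+import { existsSync, readFileSync, writeFileSync } from "node:fs";
+
+type DepMap = Record<string, string>;
+
+interface PackageJson {
+  dependencies?: DepMap;
+  [key: string]: unknown;
+}
+
+// Keeps every field of the existing file and unions the dependency maps.
+// On a version conflict the incoming pin wins.
+function mergePackageJson(existing: PackageJson, incoming: PackageJson): PackageJson {
+  return {
+    ...existing,
+    dependencies: { ...existing.dependencies, ...incoming.dependencies },
+  };
+}
+
+// Writes `incoming` as-is when no file exists yet, otherwise merges into it.
+function writeMergedPackageJson(file: string, incoming: PackageJson): void {
+  const merged = existsSync(file)
+    ? mergePackageJson(JSON.parse(readFileSync(file, "utf-8")) as PackageJson, incoming)
+    : incoming;
+  writeFileSync(file, JSON.stringify(merged, null, 2) + "\n");
+}
+```
+
+Because the existing file's top-level fields are preserved, the first
+framework's `name` and `scripts` survive the second write.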
+
+Write `tsconfig.json` in the same directory:
+
+```json
+{
+  "compilerOptions": {
+    "target": "ES2022",
+    "module": "ES2022",
+    "moduleResolution": "Bundler",
+    "strict": true,
+    "esModuleInterop": true,
+    "skipLibCheck": true
+  }
+}
+```
+
+**Install with `PLAYWRIGHT_SKIP_BROWSER_DOWNLOAD=1`** if Playwright is also
+in the same `package.json` (its postinstall would otherwise try to fetch
+Chromium binaries that the sandbox can't reach).
+
+```bash
+PLAYWRIGHT_SKIP_BROWSER_DOWNLOAD=1 npm install --silent --no-audit --no-fund
+npx tsx stagehand.ts
+```
+
+## Verify contract
+
+Run the script against a fresh Browserbase session and read the trailing
+JSON line on stdout. Pass if `success === true`; fail otherwise. On
+failure, feed the stderr tail back into your next attempt and iterate.
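+
+The pass/fail check could be sketched as below. `readVerdict` and the
+`Verdict` type are hypothetical names for illustration, and the fallback
+error strings are assumptions rather than part of the contract.
+
+```typescript
+type Verdict =
+  | { success: true; data: unknown }
+  | { success: false; error: string };
+
+// Parses the last non-empty line of stdout; anything but `success === true` is a fail.
+function readVerdict(stdout: string): Verdict {
+  const lastLine = stdout.trim().split("\n").pop() ?? "";
+  try {
+    const parsed: unknown = JSON.parse(lastLine);
+    if (
+      typeof parsed === "object" &&
+      parsed !== null &&
+      (parsed as { success?: unknown }).success === true
+    ) {
+      return { success: true, data: (parsed as { data?: unknown }).data };
+    }
+    const error = (parsed as { error?: unknown } | null)?.error;
+    return {
+      success: false,
+      error: typeof error === "string" ? error : "trailing JSON did not report success",
+    };
+  } catch {
+    return { success: false, error: "no trailing JSON line on stdout" };
+  }
+}
+```
+
+Treating an unparseable last line as a failure keeps a crashed script from
+being mistaken for a pass.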
+
+If verify still fails after 2-3 retries, **delete `stagehand.ts`** before
+the upload — the upload script globs for `playwright.ts stagehand.ts` and
+will ship whatever is on disk. Shipping a broken Stagehand variant is
+worse than shipping just Playwright.
diff --git a/skills/autobrowse/scripts/codegen.mjs b/skills/autobrowse/scripts/codegen.mjs
deleted file mode 100755
index e4970a3..0000000
--- a/skills/autobrowse/scripts/codegen.mjs
+++ /dev/null
@@ -1,515 +0,0 @@
-#!/usr/bin/env node
-
-/**
- * codegen.mjs — Convert a converged autobrowse trace into a runnable script in
- * one or more frameworks (Playwright, Stagehand, …).
- *
- * Pipeline per framework:
- *   1. Compose the context: task.md + unified-events.jsonl (when present) +
- *      descriptors.ndjson (when present) + strategy.md + the framework's
- *      cdp-bridge reference doc.
- *   2. Compute a cache key over (framework, prompt-template, task, trace,
- *      descriptors). Cache hit short-circuits with zero LLM cost.
- *   3. Single Anthropic completion against the framework's prompt template.
- *      Emit `<task>.<ext>` to the output dir.
- *   4. Drop the framework's scaffold files (package.json, tsconfig, …).
- *   5. If --verify: invoke the framework's runner against a fresh Browserbase
- *      session. On failure, feed the error back into a rewrite call up to
- *      --max-retries times.
- *
- * One JSON status line per framework on stdout. Non-zero exit if any selected
- * framework's final state is fail.
- *
- * Usage:
- *   node scripts/codegen.mjs --task <name> [options]
- *
- * Options:
- *   --task <name>                  task name under <workspace>/tasks/ (required)
- *   --workspace <dir>              default ./autobrowse
- *   --run <id>                     default: latest run-NNN with success: true
- *   --frameworks <a,b,...>         default: playwright
- *   --verify | --no-verify         default: --verify
- *   --max-retries <N>              rewrite-on-verify-failure cap (default: 2)
- *   --cache-dir <dir>              default <workspace>/codegen-cache
- *   --out <dir>                    default <workspace>/tasks/<name>/<framework>
- *   --prompt-template <path>       custom framework prompt (pair with --frameworks custom)
- *   --force                        bust cache
- *   --dry-run                      estimate cost without LLM call
- *   --cache-only                   error if cache miss (no LLM call)
- *   --model <name>                 override Claude model
- *   --help
- */
-
-import "dotenv/config";
-import Anthropic from "@anthropic-ai/sdk";
-import * as fs from "node:fs";
-import * as path from "node:path";
-import { execFileSync, spawnSync } from "node:child_process";
-import { fileURLToPath } from "node:url";
-import crypto from "node:crypto";
-
-const __dirname = path.dirname(fileURLToPath(import.meta.url));
-const SKILL_DIR = path.resolve(__dirname, "..");
-const PROMPT_TEMPLATE_VERSION = "2"; // bump to invalidate cache after prompt edits or scaffold/runner contract changes
-
-const DEFAULT_MODEL = "claude-sonnet-4-6";
-const DEFAULT_MAX_TOKENS = 8192;
-
-// ── CLI ────────────────────────────────────────────────────────────
-
-function getArg(name, fallback) {
-  const i = process.argv.indexOf(`--${name}`);
-  return i !== -1 && process.argv[i + 1] ? process.argv[i + 1] : fallback;
-}
-const hasFlag = (n) => process.argv.includes(`--${n}`);
-
-if (hasFlag("help") || hasFlag("h")) {
-  console.log(`autobrowse codegen — produce runnable scripts from a converged trace
-
-Usage: node scripts/codegen.mjs --task <name> [options]
-
-Options:
-  --task <name>                  task name under <workspace>/tasks/ (required)
-  --workspace <dir>              default: ./autobrowse
-  --run <id>                     specific run-NNN (default: newest passing)
-  --frameworks <a,b,...>         comma list; default: playwright
-                                 builtins: playwright, stagehand
-  --verify | --no-verify         run the script in a fresh BB session (default: --verify)
-  --max-retries <N>              cap rewrite-on-verify-fail loop (default: 2)
-  --cache-dir <dir>              default: <workspace>/codegen-cache
-  --out <dir>                    default: <workspace>/tasks/<name>/<framework>
-  --prompt-template <path>       custom prompt template (pair with --frameworks=custom)
-  --force                        ignore cache, regenerate
-  --dry-run                      estimate cost; don't call the LLM
-  --cache-only                   error if cache miss
-  --model <name>                 default: ${DEFAULT_MODEL}
-
-Env:
-  ANTHROPIC_API_KEY              required for LLM call
-  BROWSERBASE_API_KEY            required for --verify
-
-Exits 0 if all selected frameworks ended in pass (or --no-verify), 2 if any
-failed, 1 on harness error.`);
-  process.exit(0);
-}
-
-const TASK = getArg("task");
-if (!TASK) {
-  console.error("ERROR: --task <name> is required. Pass --help for usage.");
-  process.exit(1);
-}
-const WORKSPACE = path.resolve(getArg("workspace", "autobrowse"));
-const FORCED_RUN = getArg("run");
-const FRAMEWORKS = getArg("frameworks", "playwright").split(",").map((s) => s.trim()).filter(Boolean);
-const VERIFY = !hasFlag("no-verify");
-const MAX_RETRIES = parseInt(getArg("max-retries", "2"), 10);
-const CACHE_DIR = path.resolve(getArg("cache-dir", path.join(WORKSPACE, "codegen-cache")));
-const OUT_OVERRIDE = getArg("out");
-const PROMPT_TEMPLATE_OVERRIDE = getArg("prompt-template");
-const FORCE = hasFlag("force");
-const DRY_RUN = hasFlag("dry-run");
-const CACHE_ONLY = hasFlag("cache-only");
-const MODEL = getArg("model", DEFAULT_MODEL);
-
-// ── Inputs ─────────────────────────────────────────────────────────
-
-const taskDir = path.join(WORKSPACE, "tasks", TASK);
-const tracesDir = path.join(WORKSPACE, "traces", TASK);
-const taskFile = path.join(taskDir, "task.md");
-
-for (const [label, file] of [["task.md", taskFile]]) {
-  if (!fs.existsSync(file)) {
-    console.error(`ERROR: ${label} not found at ${file}. Run autobrowse first.`);
-    process.exit(1);
-  }
-}
-
-function pickRun() {
-  if (FORCED_RUN) {
-    // --run was passed; still confirm the directory exists. Without this we'd
-    // happily call codegen with empty trace/events/descriptors and the LLM
-    // would invent a script from just task.md + strategy.md, while logs
-    // still report the forced run id as if it were a real input.
-    const forcedDir = path.join(tracesDir, FORCED_RUN);
-    if (!fs.existsSync(forcedDir)) return null;
-    return FORCED_RUN;
-  }
-  if (!fs.existsSync(tracesDir)) return null;
-  const runs = fs.readdirSync(tracesDir)
-    .filter((d) => /^run-\d+$/.test(d))
-    .sort()
-    .reverse();
-  for (const r of runs) {
-    const summary = path.join(tracesDir, r, "summary.md");
-    if (!fs.existsSync(summary)) continue;
-    const text = fs.readFileSync(summary, "utf-8");
-    if (/success:\s*true/.test(text) || /"success"\s*:\s*true/.test(text)) return r;
-  }
-  return null;
-}
-
-const RUN_ID = pickRun();
-if (!RUN_ID) {
-  if (FORCED_RUN) {
-    console.error(`ERROR: --run ${FORCED_RUN} not found at ${path.join(tracesDir, FORCED_RUN)}.`);
-  } else {
-    console.error(`ERROR: no passing run found under ${tracesDir}. Pass --run <id> to force, or run autobrowse first.`);
-  }
-  process.exit(1);
-}
-const runDir = path.join(tracesDir, RUN_ID);
-
-// Try multiple candidate paths for each input — autobrowse layouts have
-// shifted over time and we want this to be robust to both modern and legacy.
-function readFirstExisting(...candidates) {
-  for (const p of candidates) {
-    if (p && fs.existsSync(p)) return { path: p, content: fs.readFileSync(p, "utf-8") };
-  }
-  return null;
-}
-
-const taskMd = fs.readFileSync(taskFile, "utf-8");
-const strategyMd = readFirstExisting(path.join(taskDir, "strategy.md"))?.content || "";
-const traceJson = readFirstExisting(path.join(runDir, "trace.json"))?.content || "";
-const unifiedEvents = readFirstExisting(path.join(runDir, "unified-events.jsonl"))?.content || "";
-const descriptors = readFirstExisting(
-  path.join(runDir, ".o11y", RUN_ID, "cdp", "descriptors.ndjson"),
-  path.join(runDir, "cdp", "descriptors.ndjson"),
-)?.content || "";
-
-// ── Framework registry ────────────────────────────────────────────
-
-const CODEGEN_DIR = path.join(SKILL_DIR, "codegen");
-const REFERENCES_DIR = path.join(SKILL_DIR, "references");
-
-function frameworkConfig(framework) {
-  const promptPath = PROMPT_TEMPLATE_OVERRIDE && framework === "custom"
-    ? path.resolve(PROMPT_TEMPLATE_OVERRIDE)
-    : path.join(CODEGEN_DIR, "prompts", `${framework}.md`);
-  const scaffoldDir = path.join(CODEGEN_DIR, "scaffolds", framework);
-  const runnerPath = path.join(CODEGEN_DIR, "runners", `${framework}.mjs`);
-  const extByFramework = { playwright: "ts", stagehand: "ts", puppeteer: "js", selenium: "py" };
-  const ext = extByFramework[framework] || "ts";
-  return { promptPath, scaffoldDir, runnerPath, ext };
-}
-
-// ── Context builder ───────────────────────────────────────────────
-
-// Trim a stringified blob to a budget while keeping head + tail.
-function clip(text, maxBytes) {
-  if (text.length <= maxBytes) return text;
-  const head = Math.floor(maxBytes * 0.7);
-  const tail = maxBytes - head - 64;
-  return text.slice(0, head) + `\n\n…[truncated ${text.length - head - tail} bytes]…\n\n` + text.slice(-tail);
-}
-
-function buildContext({ promptTemplate, cdpBridgeDoc, previousAttempt, verifyFailure }) {
-  const parts = [];
-  parts.push("# Task\n\n" + taskMd.trim());
-  if (strategyMd.trim()) parts.push("# Strategy notes\n\n" + strategyMd.trim());
-  if (cdpBridgeDoc) parts.push("# Reference: Playwright ↔ Browserbase bridge\n\n" + cdpBridgeDoc.trim());
-  if (unifiedEvents.trim()) {
-    parts.push("# Unified events (agent + browser, time-ordered)\n\n```\n" + clip(unifiedEvents, 32_000) + "\n```");
-  } else if (traceJson.trim()) {
-    parts.push("# Trace (agent turns)\n\n```json\n" + clip(traceJson, 32_000) + "\n```");
-  }
-  if (descriptors.trim()) {
-    parts.push("# Descriptors (per-command DOM target)\n\n```\n" + clip(descriptors, 16_000) + "\n```");
-  }
-  if (previousAttempt && verifyFailure) {
-    parts.push(
-      "# Previous attempt and the verify failure\n\nYour previous attempt was:\n\n```\n" +
-      clip(previousAttempt, 12_000) +
-      "\n```\n\nIt failed verification with:\n\n```\n" +
-      clip(verifyFailure, 4_000) +
-      "\n```\n\nFix the issue and emit a complete corrected script.",
-    );
-  }
-  return promptTemplate.trim() + "\n\n" + parts.join("\n\n");
-}
-
-// ── Cache ─────────────────────────────────────────────────────────
-
-function hashContent(s) {
-  return crypto.createHash("sha256").update(s).digest("hex").slice(0, 16);
-}
-function cacheKey(framework, promptTemplate) {
-  return hashContent([
-    "v" + PROMPT_TEMPLATE_VERSION,
-    framework,
-    hashContent(promptTemplate),
-    hashContent(taskMd),
-    hashContent(traceJson),
-    hashContent(unifiedEvents),
-    hashContent(descriptors),
-    hashContent(strategyMd),
-  ].join("|"));
-}
-function readCache(key) {
-  const p = path.join(CACHE_DIR, `${key}.txt`);
-  return fs.existsSync(p) ? fs.readFileSync(p, "utf-8") : null;
-}
-function writeCache(key, content) {
-  fs.mkdirSync(CACHE_DIR, { recursive: true });
-  fs.writeFileSync(path.join(CACHE_DIR, `${key}.txt`), content);
-}
-
-// ── LLM call ──────────────────────────────────────────────────────
-
-let _anthropic = null;
-function anthropic() {
-  if (!_anthropic) {
-    if (!process.env.ANTHROPIC_API_KEY && !process.env.ANTHROPIC_AUTH_TOKEN) {
-      throw new Error("ANTHROPIC_API_KEY (or ANTHROPIC_AUTH_TOKEN) is required for codegen.");
-    }
-    _anthropic = new Anthropic();
-  }
-  return _anthropic;
-}
-
-async function callLlm(systemPrompt, userMessage) {
-  const res = await anthropic().messages.create({
-    model: MODEL,
-    max_tokens: DEFAULT_MAX_TOKENS,
-    system: systemPrompt,
-    messages: [{ role: "user", content: userMessage }],
-  });
-  const text = res.content
-    .filter((b) => b.type === "text")
-    .map((b) => b.text)
-    .join("\n");
-  // The agent might emit fences anyway; strip a single outer code block.
-  const fenced = text.match(/^```[\w-]*\n([\s\S]*?)\n```\s*$/);
-  const code = fenced ? fenced[1] : text.trim();
-  const cost = (res.usage?.input_tokens ?? 0) * 3e-6 + (res.usage?.output_tokens ?? 0) * 15e-6;
-  return { code, cost, tokens: res.usage };
-}
-
-// ── Scaffold + write output ───────────────────────────────────────
-
-// Scaffold version pins. Each framework's scaffold/package.json references
-// these via {{PLAYWRIGHT_VERSION}} / {{STAGEHAND_VERSION}} / etc. so callers
-// can canary a new release without forking — set the corresponding env var.
-// Loose semver guard rejects shell-injection shapes before they hit npm.
-const VERSION_RE = /^\d+\.\d+\.\d+(?:-[A-Za-z0-9.-]+)?$/;
-function resolveVersion(envName, fallback) {
-  const raw = process.env[envName];
-  if (!raw) return fallback;
-  if (!VERSION_RE.test(raw)) {
-    throw new Error(`${envName}="${raw}" is not a valid X.Y.Z[-tag] version`);
-  }
-  return raw;
-}
-const SCAFFOLD_VERSIONS = {
-  PLAYWRIGHT_VERSION: resolveVersion("PLAYWRIGHT_VERSION", "1.50.0"),
-  STAGEHAND_VERSION: resolveVersion("STAGEHAND_VERSION", "3.4.0"),
-  TSX_VERSION: resolveVersion("TSX_VERSION", "4.22.3"),
-  ZOD_VERSION: resolveVersion("ZOD_VERSION", "4.4.3"),
-  DOTENV_VERSION: resolveVersion("DOTENV_VERSION", "16.4.5"),
-};
-
-function templateInterpolate(content, vars) {
-  return Object.entries(vars).reduce(
-    (acc, [k, v]) => acc.replaceAll(`{{${k}}}`, v),
-    content,
-  );
-}
-
-function dropScaffold(scaffoldDir, outDir, taskName, scriptBasename) {
-  if (!fs.existsSync(scaffoldDir)) return;
-  // Two distinct template vars: TASK is the slug (used in package name),
-  // SCRIPT is the actual filename (used in the start script). They diverge
-  // in --out mode where files are named <framework>.ts but TASK is the
-  // task slug — without SCRIPT, `npm start` would invoke a missing file.
-  const vars = { TASK: taskName, SCRIPT: scriptBasename, ...SCAFFOLD_VERSIONS };
-  for (const entry of fs.readdirSync(scaffoldDir)) {
-    const src = path.join(scaffoldDir, entry);
-    const dst = path.join(outDir, entry);
-    const content = templateInterpolate(fs.readFileSync(src, "utf-8"), vars);
-    // Special-case package.json: when --out is shared across frameworks (e.g.
-    // browse.sh passes one dir for playwright+stagehand), the first framework
-    // writes its package.json and the second must MERGE its dependencies in,
-    // not skip. Otherwise the second framework's `node_modules` lacks its own
-    // runtime deps (e.g. @browserbasehq/stagehand) and verify can never pass.
-    if (entry === "package.json" && fs.existsSync(dst)) {
-      try {
-        const existing = JSON.parse(fs.readFileSync(dst, "utf-8"));
-        const incoming = JSON.parse(content);
-        existing.dependencies = {
-          ...(existing.dependencies || {}),
-          ...(incoming.dependencies || {}),
-        };
-        existing.devDependencies = {
-          ...(existing.devDependencies || {}),
-          ...(incoming.devDependencies || {}),
-        };
-        fs.writeFileSync(dst, JSON.stringify(existing, null, 2) + "\n");
-        continue;
-      } catch {
-        // Fall through to never-overwrite policy if either side is malformed.
-      }
-    }
-    if (fs.existsSync(dst)) continue; // never overwrite a user's file
-    fs.writeFileSync(dst, content);
-  }
-}
-
-// ── Verify ────────────────────────────────────────────────────────
-
-function verify(framework, outDir, scriptBasename) {
-  const { runnerPath } = frameworkConfig(framework);
-  if (!fs.existsSync(runnerPath)) {
-    return { passed: false, error: `no runner for framework "${framework}" at ${runnerPath}`, runner_missing: true };
-  }
-  // The parent timeout must exceed the runner's worst case: tsx-runner allows
-  // up to 3min for npm install + 5min for the tsx run = 8min, plus slack for
-  // process startup and the trailing-JSON parse. 10min keeps us safely above
-  // that so a healthy slow run isn't killed mid-flight.
-  const res = spawnSync("node", [runnerPath, "--out-dir", outDir, "--script", scriptBasename], {
-    encoding: "utf-8",
-    stdio: ["ignore", "pipe", "pipe"],
-    env: process.env,
-    timeout: 10 * 60 * 1000,
-  });
-  const stdout = res.stdout || "";
-  const stderr = res.stderr || "";
-  // Runners must emit a final JSON line: {"passed":true,...} or {"passed":false,...}
-  const lastLine = stdout.trim().split("\n").pop() || "";
-  let parsed = null;
-  try { parsed = JSON.parse(lastLine); } catch {}
-  if (parsed && typeof parsed.passed === "boolean") {
-    return { ...parsed, stdout, stderr };
-  }
-  return { passed: false, error: `runner did not emit a {passed:boolean} JSON line; exit=${res.status}`, stdout, stderr };
-}
-
-// ── Per-framework pipeline ────────────────────────────────────────
-
-async function generateOne(framework) {
-  const cfg = frameworkConfig(framework);
-  if (!fs.existsSync(cfg.promptPath)) {
-    return { framework, passed: false, error: `no prompt template for "${framework}" at ${cfg.promptPath}` };
-  }
-  const promptTemplate = fs.readFileSync(cfg.promptPath, "utf-8");
-  const cdpBridgeDoc = fs.existsSync(path.join(REFERENCES_DIR, "playwright-cdp-bridge.md"))
-    ? fs.readFileSync(path.join(REFERENCES_DIR, "playwright-cdp-bridge.md"), "utf-8")
-    : "";
-
-  // Filename + outDir convention:
-  //  - default mode (--out unset): per-framework subdir, file named after the
-  //    task, so the dir feels like a standalone project — e.g.
-  //    tasks/<task>/playwright/<task>.ts  with its own package.json.
-  //  - --out mode: caller is flattening into someone else's tree (e.g.
-  //    browse.sh's /tmp/skill/{domain}/{task}/), so we use the framework
-  //    name as the filename — playwright.ts + stagehand.ts in the same dir,
-  //    no collision.
-  const outDir = OUT_OVERRIDE ? path.resolve(OUT_OVERRIDE) : path.join(taskDir, framework);
-  const scriptBasename = OUT_OVERRIDE ? `${framework}.${cfg.ext}` : `${TASK}.${cfg.ext}`;
-  fs.mkdirSync(outDir, { recursive: true });
-  const scriptPath = path.join(outDir, scriptBasename);
-
-  // Cache lookup
-  const key = cacheKey(framework, promptTemplate);
-  let cached = !FORCE ? readCache(key) : null;
-  if (CACHE_ONLY && !cached) {
-    return { framework, passed: false, error: `--cache-only set but no cached output for key ${key}` };
-  }
-
-  if (DRY_RUN) {
-    const ctx = buildContext({ promptTemplate, cdpBridgeDoc });
-    const bytes = ctx.length;
-    const estCost = (bytes / 4) * 3e-6; // ~4 chars/token, $3/M in
-    return { framework, dryRun: true, prompt_bytes: bytes, estimated_cost_usd: Number(estCost.toFixed(4)) };
-  }
-
-  // `attempts` counts emitted-script-versions. Cached and uncached both start
-  // at 1 (the script-on-disk is one version, whether the LLM just wrote it or
-  // we restored it from cache). The retry loop below then increments per
-  // rewrite, bounded by --max-retries. Initializing to 0 on a cache hit gave
-  // cached runs one extra rewrite vs uncached — caught by Bugbot.
-  let code, cost = 0, attempts = 1;
-  if (cached) {
-    code = cached;
-  } else {
-    const ctx = buildContext({ promptTemplate, cdpBridgeDoc });
-    const { code: c, cost: k } = await callLlm(
-      "You are an expert browser-automation engineer. Output ONLY the contents of the script file — no preamble, no explanation, no markdown fences. The script must be runnable as-is.",
-      ctx,
-    );
-    code = c;
-    cost += k;
-    writeCache(key, code);
-  }
-
-  fs.writeFileSync(scriptPath, code);
-  dropScaffold(cfg.scaffoldDir, outDir, TASK, scriptBasename);
-
-  if (!VERIFY) {
-    return { framework, passed: true, scriptPath, cached: !!cached, verify_skipped: true, cost_usd: cost };
-  }
-
-  // Verify loop with rewrite-on-failure
-  let lastVerify = verify(framework, outDir, scriptBasename);
-  while (!lastVerify.passed && attempts < MAX_RETRIES + 1) {
-    if (lastVerify.runner_missing) break;
-    // --cache-only forbids ANY LLM call, including the rewrite path. Without
-    // this guard a cached script that fails verify would still burn quota
-    // through the rewrite loop, contradicting the documented "no LLM call"
-    // CI behavior.
-    if (CACHE_ONLY) break;
-    attempts++;
-    const previousCode = code;
-    const failureContext =
-      (lastVerify.error || "") +
-      "\n\nstderr:\n" + (lastVerify.stderr || "").slice(-2000) +
-      "\nstdout:\n" + (lastVerify.stdout || "").slice(-2000);
-    const ctx = buildContext({
-      promptTemplate,
-      cdpBridgeDoc,
-      previousAttempt: previousCode,
-      verifyFailure: failureContext,
-    });
-    const { code: c, cost: k } = await callLlm(
-      "You are an expert browser-automation engineer. Output ONLY the corrected script file — no preamble, no explanation, no markdown fences.",
-      ctx,
-    );
-    code = c;
-    cost += k;
-    fs.writeFileSync(scriptPath, code);
-    writeCache(key, code); // overwrite cache with the latest attempt
-    lastVerify = verify(framework, outDir, scriptBasename);
-  }
-
-  return {
-    framework,
-    passed: lastVerify.passed,
-    scriptPath,
-    cached: !!cached && cost === 0,
-    verify_attempts: attempts,
-    last_error: lastVerify.passed ? null : (lastVerify.error || lastVerify.stderr?.slice(-200) || null),
-    cost_usd: Number(cost.toFixed(4)),
-  };
-}
-
-// ── Main ──────────────────────────────────────────────────────────
-
-async function main() {
-  console.error(`[codegen] task=${TASK} run=${RUN_ID} frameworks=[${FRAMEWORKS.join(",")}] verify=${VERIFY}`);
-  let anyFailed = false;
-  for (const framework of FRAMEWORKS) {
-    try {
-      const result = await generateOne(framework);
-      console.log(JSON.stringify(result));
-      if (result.passed === false) anyFailed = true;
-    } catch (err) {
-      console.log(JSON.stringify({ framework, passed: false, error: err.message }));
-      anyFailed = true;
-    }
-  }
-  process.exit(anyFailed ? 2 : 0);
-}
-
-main().catch((err) => {
-  console.error("FATAL:", err.stack || err.message);
-  process.exit(1);
-});