[STG-2419] feat: add playwright-to-stagehand skill by shrey150 · Pull Request #140 · browserbase/skills

shrey150 · 2026-06-26T21:14:56Z

Summary

Adds playwright-to-stagehand — a skill that migrates Playwright automation scripts (TypeScript/JavaScript or Python) to Stagehand v3 (TypeScript) on Browserbase. It's the Playwright counterpart to the merged browser-use-to-stagehand skill and follows the same structure (lean SKILL.md → references/) and the same live-eval verification discipline.

Linear: STG-2419

Opened as draft — first spike; validated (see E2E below) but worth a review pass before ready.

The core design decision

The migrations pull in opposite directions, and that's the whole point:

- browser-use is agentic-by-default → its migration removes AI where the flow is known.
- Playwright is deterministic-by-default but brittle → its migration keeps the deterministic skeleton and selectively upgrades only the fragile parts.

Stagehand v3 does not run Playwright — it uses the understudy CDP engine, whose page API is Playwright-flavored but only partially compatible (verified against @browserbasehq/stagehand 3.6.0 source). So a naive transpile is wrong. Every step is one of three moves:

1. Port the compatible subset (`page.goto`, `page.locator(css/xpath).fill/click`, `evaluate`, `screenshot`, frames, `waitForSelector`/`waitForLoadState`).
2. Rewrite the different-shape constructs: `page.click(sel)` → `page.locator(sel).click()` (page-level click is coordinate-based), stable-selector `$$eval` → `page.evaluate(...)` (deterministic, zero AI), `getByTestId` → `[data-testid]`, positional `setViewportSize`, `waitForURL` → poll.
3. Upgrade or flag the rest: brittle/variable selectors and reads → `act`/`extract`/`observe`; semantic `getByRole`/`getByText`/`getByLabel` → `act` or CSS; `route`/`waitForResponse`/`page.on(event)`/`expect()`/downloads/multi-context → needs-human-review.
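
The three moves above can be read as a lookup table. The sketch below is one way to encode that idea in TypeScript. The `TRIAGE` map, the `triage` helper, and the fallback to needs-human-review for unlisted calls are illustrative assumptions, not the skill's actual implementation, and the entries cover only calls named in the list.

```typescript
// Hypothetical triage table; each entry mirrors a call named in the list above.
type Move = "port" | "rewrite" | "upgrade" | "needs-human-review";

interface Triage {
  move: Move;
  note: string;
}

const TRIAGE = new Map<string, Triage>([
  ["page.goto", { move: "port", note: "compatible as-is" }],
  ["page.evaluate", { move: "port", note: "compatible as-is" }],
  ["page.screenshot", { move: "port", note: "compatible as-is" }],
  ["page.waitForSelector", { move: "port", note: "compatible as-is" }],
  ["page.waitForLoadState", { move: "port", note: "compatible as-is" }],
  ["page.click", { move: "rewrite", note: "page.locator(sel).click(); page-level click is coordinate-based" }],
  ["page.$$eval", { move: "rewrite", note: "page.evaluate(...) when the selector is stable" }],
  ["page.getByTestId", { move: "rewrite", note: "[data-testid] CSS locator" }],
  ["page.setViewportSize", { move: "rewrite", note: "positional arguments" }],
  ["page.waitForURL", { move: "rewrite", note: "poll the URL" }],
  ["page.getByRole", { move: "upgrade", note: "act, or a stable CSS locator" }],
  ["page.getByText", { move: "upgrade", note: "act, or a stable CSS locator" }],
  ["page.getByLabel", { move: "upgrade", note: "act, or a stable CSS locator" }],
  ["page.route", { move: "needs-human-review", note: "no direct mapping" }],
  ["page.waitForResponse", { move: "needs-human-review", note: "no direct mapping" }],
  ["page.on", { move: "needs-human-review", note: "no direct mapping" }],
  ["expect", { move: "needs-human-review", note: "map to read-and-throw" }],
]);

// Assumption: anything not in the table is flagged rather than guessed at.
function triage(call: string): Triage {
  return TRIAGE.get(call) ?? {
    move: "needs-human-review",
    note: "not in the table; check the API mapping",
  };
}
```

Defaulting unknown calls to needs-human-review keeps the table from silently "porting" an API that may not exist in the target engine.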

Deterministic-by-default is load-bearing. The skill is explicit that you should not over-AI-ify: a stable-selector scrape stays a page.evaluate, a stable #id click stays a locator, and a secret goes into a deterministic locator("#password").fill(process.env.PASS!) (no LLM call, and the secret never enters a prompt) — act/extract are reserved for genuinely brittle/semantic/variable steps or when you explicitly want DOM-drift resilience. It also gates scope: @playwright/test files are out of scope (Stagehand isn't a test runner) — lift only the browser logic, map expect() to read-and-throw.
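
As a rough illustration of "read-and-throw" and of keeping secrets out of prompts, the helpers below are a minimal sketch. `assertIncludes` and `requireEnv` are hypothetical names introduced here, and the `#flash` selector in the usage comment is an assumed example, not something the skill prescribes.

```typescript
// Read-and-throw stand-in for an expect()-style text assertion (hypothetical helper).
function assertIncludes(actual: string, expected: string, label: string): void {
  if (!actual.includes(expected)) {
    throw new Error(`${label}: expected text to include "${expected}", got "${actual}"`);
  }
}

// Read a secret from an environment-like map and fail loudly when it is missing,
// so the value goes straight into a deterministic fill and never into a prompt.
function requireEnv(name: string, env: Record<string, string | undefined>): string {
  const value = env[name];
  if (value === undefined || value === "") {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// Usage inside a migrated script (not executed here; `page` is the Stagehand page
// and "#flash" is an assumed selector for the banner):
//   await page.locator("#password").fill(requireEnv("PASS", process.env));
//   const banner = await page.evaluate(
//     () => document.querySelector("#flash")?.textContent ?? "",
//   );
//   assertIncludes(banner, "You logged into a secure area!", "login banner");
```

Both helpers are plain functions with no LLM call, which is the point of the deterministic default.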

What's in it

- SKILL.md: scope gate, source detection (TS/JS vs Python; plain vs @playwright/test), inventory, the port/rewrite/upgrade triage, the v3 rewrite, a migration summary.
- references/api-mapping.md: the full page-API compatibility table (Port / Rewrite / Upgrade-or-flag for every common Playwright call), verified against Stagehand 3.6.0 source, plus the Python→TS cross-language mapping and the gap list.
- references/determinism.md: the keep/rewrite/upgrade decision tree (reads default to deterministic page.evaluate) and the failure modes (over-AI-ify, under-migrate, copy-what-doesn't-exist).
- references/guide.md, references/prompt.md (tool-agnostic), references/trace-assisted.md, EXAMPLES.md (TS + Python before/after, incl. test-file and network-interception gaps), LICENSE.txt.
- README.md row added; passes node scripts/validate-skills.mjs (18/18, 0 errors/0 warnings).

E2E Test Matrix

Verified with a live eval: real Playwright scripts (TS + Python, plain + @playwright/test, brittle scrapes, login, semantic locators, network interception, screenshot) → converted by skill-only subagents (each loads only the skill, not Stagehand prior knowledge) → tsc --noEmit → run live on Browserbase → graded vs ground truth.

| Command / flow | Observed output | Confidence / sufficiency |
| --- | --- | --- |
| `node scripts/validate-skills.mjs` | `18 passed, 0 failed, 0 error(s), 0 warning(s)` | Frontmatter/license/README-row/structure pass repo CI. |
| `tsc --noEmit` on all converted scripts | `9/9 tsc OK` | Skill's API mapping emits type-correct v3 (no nonexistent APIs). |
| Case 01: TS `$$eval` scrape (live) | first 5 quotes w/ tags, exact | Read upgrade path. |
| Case 02: TS books grid + pagination (live) | `scraped 11 books`, £ prices exact | Scrape + relative→absolute nav rewrite. Single-page category, so 2nd-page hop not traversed. |
| Case 03: TS login + secrets + `expect` (live) | `You logged into a secure area!` | `#id` ported via locator; `expect`→throw. |
| Case 04: TS semantic `getBy*` → CSS locator (live) | banner + logout, zero AI | Rewrote `getBy*` to stable CSS + polyfilled missing `waitForURL`; no over-AI-ifying. |
| Case 05: Python sync → TS (live) | first 5 authors, exact | Cross-language path. |
| Case 06: Python async → TS (live) | first 5 books+prices, exact | Cross-language async path. |
| Case 07: `@playwright/test` spec (live) | valid + invalid credential paths both correct | Scope gate: flagged scaffold, lifted logic, `expect`→read+throw. |
| Case 08: `route` + `waitForResponse` gap (live) | first 5 quotes via DOM read | Flagged needs-review, dropped incidental `route`, restructured XHR-sniff. |
| Case 09: full-page `screenshot` (live) | `669151` bytes, `20` books | Platform mapping: `screenshot`→Buffer, positional viewport. |

Result: 9/9 compile, 9/9 run live, 9/9 outcomes match ground truth, no skill-attributable failures.
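
Case 04 polyfills the missing `waitForURL` by polling. The sketch below shows one way such a polyfill could look. It assumes the page object exposes a synchronous `url()` method, typed here through the hypothetical `HasUrl` interface. The `waitForUrl` name, the default timeout, and the default interval are illustrative choices, not the converter's actual output.

```typescript
// Minimal structural type: anything that can report its current URL.
// Assumption: the migrated script's page object exposes url().
interface HasUrl {
  url(): string;
}

interface PollOptions {
  timeoutMs?: number;
  intervalMs?: number;
}

// Poll the page URL until the predicate matches or the timeout elapses.
async function waitForUrl(
  page: HasUrl,
  matches: (url: string) => boolean,
  { timeoutMs = 10000, intervalMs = 100 }: PollOptions = {},
): Promise<string> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const current = page.url();
    if (matches(current)) return current;
    if (Date.now() >= deadline) {
      throw new Error(`Timed out after ${timeoutMs}ms waiting for URL; last seen: ${current}`);
    }
    // Sleep between checks so the loop does not spin.
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
}

// Usage inside a migrated script (not executed here):
//   await waitForUrl(page, (u) => u.endsWith("/secure"));
```

Throwing on timeout, with the last seen URL in the message, keeps the failure as visible as a Playwright timeout would have been.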

Correction during review (deterministic-by-default)

Review caught an over-AI-ify lean in the first eval pass: the converters sent stable-selector scrapes to extract() and a stable #password to act(), when a deterministic page.evaluate / locator.fill is better (free, instant, secret never reaches a prompt). Root cause was the decision tree routing every read to extract. Fixed the skill to default reads/secret-fills to deterministic, then re-verified with fresh skill-only reconversions on the corrected skill:

| Re-verified flow | Result |
| --- | --- |
| Case 01 reconverted | now emits `page.evaluate` (0 `extract`, 0 `act`); tsc-clean, correct quotes live |
| Case 03 reconverted | password via `locator("#password").fill(process.env…)` (0 `act`); tsc-clean, "secure area" live |
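
The corrected default can be summarized as a small decision function. This is a sketch of the rule as described in this PR, not a copy of the tree in references/determinism.md. The `StepTraits` shape and both function names are hypothetical.

```typescript
// Hypothetical description of a read step taken from the source script.
interface StepTraits {
  selectorStable: boolean;       // stable #id, data attribute, or CSS path
  markupVariable: boolean;       // markup varies between pages or runs
  wantsDriftResilience: boolean; // author explicitly opts in to DOM-drift resilience
}

type ReadStrategy = "page.evaluate" | "extract";
type FillStrategy = "locator.fill" | "act+variables";

// Reads default to deterministic page.evaluate; extract is the exception.
function chooseReadStrategy(step: StepTraits): ReadStrategy {
  if (step.wantsDriftResilience) return "extract";
  if (!step.selectorStable || step.markupVariable) return "extract";
  return "page.evaluate";
}

// Secrets fill deterministically unless the field itself needs AI resolution.
function chooseSecretFill(fieldSelectorStable: boolean): FillStrategy {
  return fieldSelectorStable ? "locator.fill" : "act+variables";
}
```

Under this rule, Case 01 (a stable-selector scrape) maps to `page.evaluate` and Case 03 (a stable `#password` field) maps to `locator.fill`, matching the reconverted outputs above.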

The eval harness (skill-only conversion → tsc → live Browserbase → grade) doubles as a drift detector to re-run on each Stagehand/Playwright release.
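
To make the drift-detector idea concrete, here is a minimal sketch of how per-case results might be tallied and compared across runs. The `CaseResult` and `Summary` shapes and both function names are hypothetical. The real harness's schema is not shown in this PR.

```typescript
// Hypothetical per-case record; field names are illustrative only.
interface CaseResult {
  id: string;
  compiled: boolean;           // tsc --noEmit succeeded
  ranLive: boolean;            // script completed on a live session
  matchedGroundTruth: boolean; // graded output matched the expected result
}

interface Summary {
  total: number;
  compiled: number;
  ranLive: number;
  matched: number;
  failing: string[]; // ids of cases that failed any stage
}

// Tally each stage and collect the ids of cases that failed at least one.
function summarize(results: CaseResult[]): Summary {
  const count = (pick: (r: CaseResult) => boolean): number => results.filter(pick).length;
  return {
    total: results.length,
    compiled: count((r) => r.compiled),
    ranLive: count((r) => r.ranLive),
    matched: count((r) => r.matchedGroundTruth),
    failing: results
      .filter((r) => !(r.compiled && r.ranLive && r.matchedGroundTruth))
      .map((r) => r.id),
  };
}

// Drift check: cases that fail now but did not fail in the stored baseline.
function newFailures(baseline: Summary, current: Summary): string[] {
  return current.failing.filter((id) => !baseline.failing.includes(id));
}
```

Re-running the eval after a Stagehand or Playwright release and calling `newFailures` against the previous summary would list only the cases that regressed.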

Notes

- No changeset: this repo has no .changeset/config.json (skills marketplace, not a published package).
- Not added to .claude-plugin/marketplace.json, consistent with the sibling browser-use-to-stagehand, which isn't listed there either.

🤖 Generated with Claude Code

…n) to Stagehand v3 on Browserbase

Converts Playwright automation scripts to Stagehand v3 (TypeScript) on Browserbase. Stagehand v3's understudy page API is Playwright-flavored but only partially compatible, so the skill frames every step as one of three moves: Port the compatible subset, Rewrite the different-shape constructs (page.click(sel) -> locator(sel).click(), $$eval -> evaluate, getByTestId -> [data-testid], positional setViewportSize), and Upgrade-or-flag the rest (brittle selectors/list scrapes -> act/extract; getByRole/Text/Label -> act; route/waitForResponse/expect/downloads -> needs-human-review). Handles TS/JS and Python sources; flags @playwright/test files as out of scope (Stagehand is not a test runner).

- SKILL.md: scope gate, source detection, inventory, port/rewrite/upgrade triage, v3 rewrite, migration summary
- references/: api-mapping (full page-API compatibility table verified vs Stagehand 3.6.0), determinism (keep/rewrite/upgrade decision tree), guide, prompt (tool-agnostic), trace-assisted
- EXAMPLES.md: before/after pairs (TS + Python, plus the test-file and network-interception gaps)
- README row added; passes validate-skills.mjs

Validated with a live eval (9 real Playwright scripts converted via skill-only subagents -> tsc -> run on live Browserbase): 9/9 compile, 9/9 run, outcomes match ground truth.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…stic, not AI

The decision tree routed every read to extract() and showed the password fill via act()+variables; both over-AI-ify deterministic code. Corrected:

- Reads: stable-selector scrapes default to a deterministic page.evaluate(...) (zero AI/zero cost); extract() reserved for brittle/variable markup or wanted DOM-drift resilience. ($$eval has no understudy equivalent, but evaluate does.)
- Secrets: stable fields fill deterministically via locator(sel).fill(process.env...), with no LLM call, and the secret never enters a prompt; act()+variables only when the field needs AI resolution.
- Updated SKILL.md (triage, checklist, mistakes), determinism.md (read decision + failure modes), api-mapping.md (§4.1, §4.4), EXAMPLES.md (#1, #2), prompt.md.

Re-verified with skill-only reconversions on the corrected skill: case 01 (scrape) now emits page.evaluate (0 extract), case 03 (login) fills the password deterministically (0 act); both are tsc-clean and run live with correct output.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

shrey150 and others added 2 commits June 26, 2026 14:14

shrey150 wants to merge 2 commits into main from shrey/playwright-to-stagehand
