
[STG-2419] feat: add playwright-to-stagehand skill#140

Draft
shrey150 wants to merge 2 commits into main from shrey/playwright-to-stagehand


@shrey150 shrey150 commented Jun 26, 2026

Contributor

Summary

Adds playwright-to-stagehand, a skill that migrates Playwright automation scripts (TypeScript/JavaScript or Python) to Stagehand v3 (TypeScript) on Browserbase. It's the Playwright counterpart to the merged browser-use-to-stagehand skill and follows the same structure (lean SKILL.md plus references/) and the same live-eval verification discipline.

Linear: STG-2419

Opened as draft — first spike; validated (see E2E below) but worth a review pass before ready.

The core design decision

The migrations pull in opposite directions, and that's the whole point:

  • browser-use is agentic-by-default → its migration removes AI where the flow is known.
  • Playwright is deterministic-by-default but brittle → its migration keeps the deterministic skeleton and selectively upgrades only the fragile parts.

Stagehand v3 does not run Playwright — it uses the understudy CDP engine, whose page API is Playwright-flavored but only partially compatible (verified against @browserbasehq/stagehand 3.6.0 source). So a naive transpile is wrong. Every step is one of three moves:

  • Port the compatible subset (page.goto, page.locator(css/xpath).fill/click, evaluate, screenshot, frames, waitForSelector/LoadState).
  • Rewrite the different-shape constructs: page.click(sel) → page.locator(sel).click() (page-level click is coordinate-based), stable-selector $$eval → page.evaluate(...) (deterministic, zero AI), getByTestId → [data-testid], positional setViewportSize, waitForURL → poll.
  • Upgrade or flag the rest: brittle/variable selectors & reads → act/extract/observe; semantic getByRole/Text/Label → act or CSS; route/waitForResponse/page.on(event)/expect()/downloads/multi-context → needs-human-review.
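As an illustration of the "waitForURL → poll" rewrite listed above, here is a minimal sketch of a polling helper. It is not taken from the skill itself: the names `waitForUrl`, `UrlSource`, and `PollOptions` are hypothetical, and it assumes the page object exposes a synchronous `url()` accessor.

```typescript
// Structural type covering only what this helper needs.
// Assumption: the page object exposes a synchronous url() accessor.
interface UrlSource {
  url(): string;
}

interface PollOptions {
  timeoutMs?: number; // give up after this long
  intervalMs?: number; // delay between checks
}

const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical stand-in for Playwright's page.waitForURL(...):
// re-reads the URL until the predicate passes or the timeout elapses.
async function waitForUrl(
  page: UrlSource,
  matches: (url: string) => boolean,
  { timeoutMs = 10000, intervalMs = 100 }: PollOptions = {},
): Promise<string> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const current = page.url();
    if (matches(current)) return current;
    if (Date.now() >= deadline) {
      throw new Error(`Timed out after ${timeoutMs} ms; last URL was ${current}`);
    }
    await sleep(intervalMs);
  }
}
```

A caller would pass the migrated page and a predicate, for example `await waitForUrl(page, (u) => u.endsWith("/secure"))`. Polling keeps the step deterministic and free of LLM calls, at the cost of a small fixed delay per check.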

Deterministic-by-default is load-bearing. The skill is explicit that you should not over-AI-ify: a stable-selector scrape stays a page.evaluate, a stable #id click stays a locator, and a secret goes into a deterministic locator("#password").fill(process.env.PASS!) (no LLM call, and the secret never enters a prompt) — act/extract are reserved for genuinely brittle/semantic/variable steps or when you explicitly want DOM-drift resilience. It also gates scope: @playwright/test files are out of scope (Stagehand isn't a test runner) — lift only the browser logic, map expect() to read-and-throw.
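The two deterministic patterns named above, a secret filled through a plain locator and `expect()` mapped to read-and-throw, might look roughly like the sketch below. The helper names (`fillSecret`, `assertContains`) and the structural types are hypothetical, not part of the skill or of Stagehand; the only page calls assumed are `locator(sel)` and `fill(value)`, which the description lists in the compatible subset.

```typescript
// Structural types covering only the calls used here; the caller passes in
// the real page object.
interface FillableLocator {
  fill(value: string): Promise<void>;
}
interface LocatorPage {
  locator(selector: string): FillableLocator;
}

// Hypothetical helper: fill a stable field from an environment map.
// No LLM call is involved, so the secret never appears in a prompt.
async function fillSecret(
  page: LocatorPage,
  selector: string,
  envName: string,
  env: Record<string, string | undefined>,
): Promise<void> {
  const value = env[envName];
  if (value === undefined || value === "") {
    throw new Error(`Missing environment variable ${envName}`);
  }
  await page.locator(selector).fill(value);
}

// Hypothetical replacement for an expect(...) assertion outside a test
// runner: compare a value that was already read, and throw on mismatch.
function assertContains(actual: string | null, expected: string, label: string): void {
  if (actual === null || !actual.includes(expected)) {
    throw new Error(
      `${label}: expected text containing "${expected}", got ${JSON.stringify(actual)}`,
    );
  }
}
```

In a migrated script this would be called as `await fillSecret(page, "#password", "PASS", process.env)`, followed by a deterministic read of the banner text and `assertContains(text, "secure area", "login banner")`.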

What's in it

  • SKILL.md — scope gate, source detection (TS/JS vs Python; plain vs @playwright/test), inventory, the port/rewrite/upgrade triage, the v3 rewrite, a migration summary.
  • references/api-mapping.md — the full page-API compatibility table (Port / Rewrite / Upgrade-or-flag for every common Playwright call), verified against Stagehand 3.6.0 source, plus the Python→TS cross-language mapping and the gap list.
  • references/determinism.md — the keep/rewrite/upgrade decision tree (reads default to deterministic page.evaluate) and the failure modes (over-AI-ify, under-migrate, copy-what-doesn't-exist).
  • references/guide.md, references/prompt.md (tool-agnostic), references/trace-assisted.md, EXAMPLES.md (TS + Python before/after, incl. test-file and network-interception gaps), LICENSE.txt.
  • README.md row added; passes node scripts/validate-skills.mjs (18/18, 0 errors/0 warnings).
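To make the Port / Rewrite / Upgrade-or-flag triage concrete, the sketch below encodes it as a lookup over only the Playwright calls named in this description. It is an illustration, not the contents of references/api-mapping.md: the `Move` type, the `TRIAGE` map, and the `"unlisted"` fallback are hypothetical.

```typescript
// The moves named in this description, plus a fallback for calls it does not mention.
type Move = "port" | "rewrite" | "upgrade" | "needs-human-review" | "unlisted";

// Only calls named in the PR description are listed; the real table covers more.
const TRIAGE = new Map<string, Move>([
  // Port: compatible subset.
  ["goto", "port"],
  ["locator", "port"],
  ["evaluate", "port"],
  ["screenshot", "port"],
  ["frames", "port"],
  ["waitForSelector", "port"],
  ["waitForLoadState", "port"],
  // Rewrite: same intent, different shape.
  ["click", "rewrite"], // page.click(sel) -> page.locator(sel).click()
  ["$$eval", "rewrite"], // -> page.evaluate(...)
  ["getByTestId", "rewrite"], // -> [data-testid] selector
  ["setViewportSize", "rewrite"], // positional arguments
  ["waitForURL", "rewrite"], // -> poll
  // Upgrade: semantic locators become act or a stable CSS selector.
  ["getByRole", "upgrade"],
  ["getByText", "upgrade"],
  ["getByLabel", "upgrade"],
  // Flag: no direct equivalent, so a human decides.
  ["route", "needs-human-review"],
  ["waitForResponse", "needs-human-review"],
  ["on", "needs-human-review"],
  ["expect", "needs-human-review"],
]);

// Classify a page-level call name; anything not listed is reported as such
// rather than guessed at.
function triage(callName: string): Move {
  return TRIAGE.get(callName) ?? "unlisted";
}
```

For example, `triage("waitForURL")` yields `"rewrite"` and `triage("route")` yields `"needs-human-review"`. Returning `"unlisted"` for unknown names avoids silently treating an unverified call as portable.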

E2E Test Matrix

Verified with a live eval: real Playwright scripts (TS + Python, plain + @playwright/test, brittle scrapes, login, semantic locators, network interception, screenshot) → converted by skill-only subagents (each loads only the skill, not Stagehand prior knowledge) → tsc --noEmit → run live on Browserbase → graded vs ground truth.

| Command / flow | Observed output | Confidence / sufficiency |
| --- | --- | --- |
| node scripts/validate-skills.mjs | 18 passed, 0 failed, 0 error(s), 0 warning(s) | Frontmatter/license/README-row/structure pass repo CI. |
| tsc --noEmit on all converted scripts | 9/9 tsc OK | Skill's API mapping emits type-correct v3 (no nonexistent APIs). |
| Case 01: TS $$eval scrape (live) | first 5 quotes w/ tags, exact | Read upgrade path. |
| Case 02: TS books grid + pagination (live) | scraped 11 books, £ prices exact | Scrape + relative→absolute nav rewrite. Single-page category, so 2nd-page hop not traversed. |
| Case 03: TS login + secrets + expect (live) | You logged into a secure area! | #id ported via locator; expect→throw. |
| Case 04: TS semantic getBy* → CSS locator (live) | banner + logout, zero AI | Rewrote getBy* to stable CSS + polyfilled missing waitForURL; no over-AI-ifying. |
| Case 05: Python sync → TS (live) | first 5 authors, exact | Cross-language path. |
| Case 06: Python async → TS (live) | first 5 books+prices, exact | Cross-language async path. |
| Case 07: @playwright/test spec (live) | valid + invalid credential paths both correct | Scope gate: flagged scaffold, lifted logic, expect→read+throw. |
| Case 08: route + waitForResponse gap (live) | first 5 quotes via DOM read | Flagged needs-review, dropped incidental route, restructured XHR-sniff. |
| Case 09: full-page screenshot (live) | 669151 bytes, 20 books | Platform mapping: screenshot→Buffer, positional viewport. |

Result: 9/9 compile, 9/9 run live, 9/9 outcomes match ground truth, no skill-attributable failures.
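A result line of this shape can be derived mechanically from per-case outcomes. The following is a rough sketch of how a grading step might tally them; the `CaseResult` shape and both function names are hypothetical and do not describe the actual eval harness.

```typescript
// Hypothetical per-case record produced by an eval run.
interface CaseResult {
  id: string;
  compiled: boolean; // tsc --noEmit succeeded
  ranLive: boolean; // script completed on a live browser session
  matchedGroundTruth: boolean; // output equals the expected output
}

// Tally each stage over all cases and format a one-line summary.
function summarize(results: CaseResult[]): string {
  const total = results.length;
  const count = (pick: (r: CaseResult) => boolean): number =>
    results.filter(pick).length;
  return [
    `${count((r) => r.compiled)}/${total} compile`,
    `${count((r) => r.ranLive)}/${total} run live`,
    `${count((r) => r.matchedGroundTruth)}/${total} outcomes match ground truth`,
  ].join(", ");
}

// Ids of cases that failed at any stage, useful when re-running after a
// dependency release to see what drifted.
function failingCases(results: CaseResult[]): string[] {
  return results
    .filter((r) => !(r.compiled && r.ranLive && r.matchedGroundTruth))
    .map((r) => r.id);
}
```

Keeping the three stages as separate counts makes it visible whether a regression is a type error, a runtime failure, or a wrong answer.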

Correction during review (deterministic-by-default)

Review caught an over-AI-ify lean in the first eval pass: the converters sent stable-selector scrapes to extract() and a stable #password to act(), when a deterministic page.evaluate / locator.fill is better (free, instant, secret never reaches a prompt). Root cause was the decision tree routing every read to extract. Fixed the skill to default reads/secret-fills to deterministic, then re-verified with fresh skill-only reconversions on the corrected skill:

| Re-verified flow | Result |
| --- | --- |
| Case 01 reconverted | now emits page.evaluate; 0 extract, 0 act; tsc-clean, correct quotes live |
| Case 03 reconverted | password via locator("#password").fill(process.env…); 0 act; tsc-clean, "secure area" live |
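The corrected routing can be pictured as two small decision functions: reads and secret fills stay deterministic unless a step is brittle or drift resilience is explicitly wanted. This is a sketch of that rule as described in this PR, not the text of references/determinism.md; the field and function names are hypothetical.

```typescript
type ReadStrategy = "page.evaluate" | "extract";
type FillStrategy = "locator.fill" | "act+variables";

// Hypothetical description of a read step found in the source script.
interface ReadStep {
  selectorStable: boolean; // e.g. an #id or a dedicated class
  markupVariable: boolean; // structure differs between pages or runs
  wantDriftResilience: boolean; // the author explicitly asks for it
}

// Default to the deterministic read; use extract only when the step is
// brittle or variable, or when resilience to DOM drift is requested.
function chooseReadStrategy(step: ReadStep): ReadStrategy {
  if (step.wantDriftResilience) return "extract";
  if (!step.selectorStable || step.markupVariable) return "extract";
  return "page.evaluate";
}

// Secrets follow the same default: a stable field is filled through a
// locator, and act with variables is used only when the field needs AI
// resolution.
function chooseFillStrategy(fieldNeedsAiResolution: boolean): FillStrategy {
  return fieldNeedsAiResolution ? "act+variables" : "locator.fill";
}
```

Under this rule a stable-selector scrape with fixed markup maps to `"page.evaluate"`, and a stable `#password` field maps to `"locator.fill"`, which matches the two reconverted cases above.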

The eval harness (skill-only conversion → tsc → live Browserbase → grade) doubles as a drift detector to re-run on each Stagehand/Playwright release.

Notes

  • No changeset: this repo has no .changeset/config.json (skills marketplace, not a published package).
  • Not added to .claude-plugin/marketplace.json — consistent with the sibling browser-use-to-stagehand, which isn't listed there either.

🤖 Generated with Claude Code

shrey150 and others added 2 commits June 26, 2026 14:14
…n) to Stagehand v3 on Browserbase

Converts Playwright automation scripts to Stagehand v3 (TypeScript) on Browserbase.
Stagehand v3's understudy page API is Playwright-flavored but only partially
compatible, so the skill frames every step as one of three moves — Port the
compatible subset, Rewrite the different-shape constructs (page.click(sel) ->
locator(sel).click(), $$eval -> evaluate, getByTestId -> [data-testid], positional
setViewportSize), and Upgrade-or-flag the rest (brittle selectors/list scrapes ->
act/extract; getByRole/Text/Label -> act; route/waitForResponse/expect/downloads ->
needs-human-review). Handles TS/JS and Python sources; flags @playwright/test files
as out of scope (Stagehand is not a test runner).

- SKILL.md: scope gate, source detection, inventory, port/rewrite/upgrade triage, v3 rewrite, migration summary
- references/: api-mapping (full page-API compatibility table verified vs Stagehand 3.6.0), determinism (keep/rewrite/upgrade decision tree), guide, prompt (tool-agnostic), trace-assisted
- EXAMPLES.md: before/after pairs (TS + Python, plus the test-file and network-interception gaps)
- README row added; passes validate-skills.mjs

Validated with a live eval (9 real Playwright scripts converted via skill-only
subagents -> tsc -> run on live Browserbase): 9/9 compile, 9/9 run, outcomes match
ground truth.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…stic, not AI

The decision tree routed every read to extract() and showed the password fill via
act()+variables — both over-AI-ify deterministic code. Corrected:
- Reads: stable-selector scrapes default to a deterministic page.evaluate(...) (zero
  AI/zero cost); extract() reserved for brittle/variable markup or wanted DOM-drift
  resilience. ($$eval has no understudy equivalent, but evaluate does.)
- Secrets: stable fields fill deterministically via locator(sel).fill(process.env...)
  — no LLM call, and the secret never enters a prompt; act()+variables only when the
  field needs AI resolution.
- Updated SKILL.md (triage, checklist, mistakes), determinism.md (read decision +
  failure modes), api-mapping.md (§4.1, §4.4), EXAMPLES.md (#1, #2), prompt.md.

Re-verified with skill-only reconversions on the corrected skill: case 01 (scrape)
now emits page.evaluate (0 extract), case 03 (login) fills the password
deterministically (0 act) — both tsc-clean and run live with correct output.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>