spike(wasm): edge-runtime portability — Cloudflare Workers / workerd / Deno by linyiru · Pull Request #137 · linyiru/rubyrs

linyiru · 2026-05-26T22:22:38Z

What this is

A spike, not a feature PR. Do not merge. Opening as a draft so the empirical findings + scaffolding are discoverable from the repo, not just sitting on a local branch.

The spike validates one claim and one disclaimer:

Claim: a single Wizer-pre-initialised rubyrs_worker.wasm (1.68 MB) runs unchanged across three V8-based edge runtimes — Cloudflare Workers (managed), workerd (self-host), Deno (self-host) — plus wasmtime as a non-V8 baseline.
Disclaimer: 18 ms cold-start (workerd local) is the V8-wasm floor for this build shape, confirmed by two negative experiments documented in docs/BENCHMARKS.md. To go lower would require either a function-count refactor inside rubyrs, a component-model + AOT move (wasmtime serve), or CF exposing a generic snapshot API.

Files

poc/cf-worker/
├── src/worker.js                  CF Workers / workerd entry (workers-wasi shim)
├── src/rubyrs_worker.wasm         build artifact (gitignored)
├── build.sh                       cargo → optional wasm-opt → wizer → src/
├── wrangler.toml                  CF Workers deploy config
├── package.json                   workerd + wrangler + workers-wasi + esbuild
├── workerd/
│   ├── config.capnp               self-host workerd config
│   ├── bundle.sh                  esbuild bundler for worker.mjs
│   └── bench-js/                  JS-only worker for baseline cold-start
├── deno/
│   ├── server.ts                  Deno.serve + browser_wasi_shim
│   └── deno.json                  scoped to keep parent npm tree out
└── README.md                      run instructions per target

crates/rubyrs/src/bin/wasm_worker.rs  stdin → Runtime::eval → stdout
docs/BENCHMARKS.md                    +"Edge runtimes" section, +cold-start floor notes

Headline numbers (n=5 each, Apple M-series)

puts 1+1 workload, wizer-only wasm (no wasm-opt — counter-intuitive finding):

| Runtime | Engine | Self-host? | Cold-start | Warm tiny |
| --- | --- | --- | --- | --- |
| Deno 2.8 + browser_wasi_shim | V8 | ✅ | 25 ms | 1.5 ms |
| workerd + workers-wasi | V8 | ✅ | 18 ms | 2.5 ms |
| CF Workers edge | V8 | ❌ | ~149 ms wall / ~80 ms cpu cold; 7 ms cpu warm | (see edge bucket) |
| wasmtime 45 (CLI, no HTTP) | wasmtime | ✅ | 12.7 ms raw / ~7 ms AOT | n/a |

Surprises worth flagging

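wasm-opt is net-negative on V8 cold-start at every level tested (-Oz, -O2) once Wizer is in the pipeline: both opt + Wizer builds (27 ms for -Oz, 23 ms for -O2 on workerd local) trail Wizer-only (18 ms). Smaller wasm doesn't translate into faster instantiate; V8's wasm parser appears to bottleneck on IR construction, not byte count. Build default is now WASM_OPT_LEVEL=skip.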
Wizer pre-init is the big win. −69 % cold-start on workerd local (57 → 18 ms). Same trick CF uses internally for Python Workers via make_snapshots.py.
CF edge variance is pool-warming, not build choice. First-pass conclusion that "opt+wizer regresses edge perf" was debunked once requests were bucketed by per-isolate invocation count (see x-rubyrs-invocation header in src/worker.js).
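Deno beats workerd on warm by ~40 % (1.5 vs 2.5 ms tiny) despite trailing on cold. Two plausible reasons: browser_wasi_shim's pure-JS stdin/stdout path avoids the bundled memfs.wasm that workers-wasi proxies through, and Deno.serve's hyper-based Rust HTTP server cuts out workerd's JS-shim ↔ kj layer.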
@cloudflare/workers-wasi cannot be used on Deno (internal memfs.wasm confuses Deno's loader); jsr:@std/wasi doesn't exist any more (removed Nov 2023). The Deno path uses @bjorn3/browser_wasi_shim.
A bug surfaced and was fixed during the spike: SyntaxError messages were leaking ruby_prism::Diagnostic Debug format (raw pointers + PhantomData markers). Shipped separately as fix(vm): SyntaxError message no longer leaks Prism Diagnostic Debug format #130.
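
The cold/warm bucketing mentioned in the list above can be sketched roughly as follows. This is a minimal TypeScript illustration, not the spike's actual harness: the `Sample` shape, the function names, and the nearest-rank percentile method are assumptions made for the example. Only the `x-rubyrs-invocation` header and the idea of partitioning by per-isolate invocation count come from this PR.

```typescript
// One measured request. Field names are hypothetical.
interface Sample {
  invocation: number; // value of the x-rubyrs-invocation response header
  cpuMs: number;      // CPU time reported for that request
}

interface Summary {
  n: number;
  p10: number;
  median: number;
  p90: number;
  max: number;
}

// Nearest-rank percentile on an ascending-sorted, non-empty array.
// (Assumed method; the PR does not say how its percentiles were computed.)
function percentile(sorted: number[], p: number): number {
  const rank = Math.ceil((p / 100) * sorted.length);
  const index = Math.min(sorted.length - 1, Math.max(0, rank - 1));
  return sorted[index];
}

function summarise(values: number[]): Summary | null {
  if (values.length === 0) return null;
  const sorted = [...values].sort((a, b) => a - b);
  return {
    n: sorted.length,
    p10: percentile(sorted, 10),
    median: percentile(sorted, 50),
    p90: percentile(sorted, 90),
    max: sorted[sorted.length - 1],
  };
}

// Cold = the first request an isolate served (invocation == 1);
// warm = every later request on the same isolate.
function bucketByInvocation(samples: Sample[]): { cold: Summary | null; warm: Summary | null } {
  const cold = samples.filter((s) => s.invocation === 1).map((s) => s.cpuMs);
  const warm = samples.filter((s) => s.invocation > 1).map((s) => s.cpuMs);
  return { cold: summarise(cold), warm: summarise(warm) };
}
```

Summarising the two buckets separately keeps a handful of cold-isolate hits from dominating the warm statistics, which is the misreading the bullet above describes.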

What this is NOT

Not a proposal to ship anything inside the rubyrs crate. The only main-crate change is src/bin/wasm_worker.rs — a 60-line stdin → eval → stdout bin that's incidental to the PoC and self-contained.
Not a recommendation for production deployment on CF Workers Free. The 10 ms CPU cap is below our cold-start floor.
Not a Spin / wasmtime-serve / Fastly target — those require component-model adoption, which is a separable rubyrs-internal decision (likely tied to a future wasi-preview-2 migration).
Not benchmarked against AWS Lambda / Vercel / Cloud Run. The spike's thesis is portability across V8-based runtimes, not absolute fastest hosting.

How to reproduce

brew install deno binaryen
cargo install wizer-cli
rustup target add wasm32-wasip1
# Set WASI_SDK_PATH per docs/DEVELOPMENT.md.

cd poc/cf-worker
npm install
./build.sh                                # → src/rubyrs_worker.wasm (Wizer'd)

# CF Workers managed:
npx wrangler dev                          # local V8 (workerd via Miniflare)
# npx wrangler deploy                     # real edge (needs your CF account)

# workerd self-host:
./workerd/bundle.sh
npx workerd serve workerd/config.capnp    # listens on :8080

# Deno self-host:
cd deno && deno run --allow-net --allow-read server.ts  # listens on :8000

# Smoke against any of them:
curl -X POST --data-binary 'puts (1..5).sum' http://localhost:PORT
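
For scripting the smoke check rather than running curl by hand, a client along these lines might work. It is a hedged TypeScript sketch: `smoke` and `SmokeResult` are hypothetical names, and the injectable `fetchFn` parameter exists only so the function can be exercised without a running server. The POST-body-in, stdout-out contract and the `x-rubyrs-invocation` header are taken from this PR.

```typescript
// Result of one smoke request. Names are illustrative, not from the repo.
interface SmokeResult {
  status: number;
  stdout: string;             // response body: the Ruby script's stdout
  invocation: number | null;  // x-rubyrs-invocation header, if present
}

// POST Ruby source to a running target and collect the response.
// fetchFn defaults to the global fetch; pass a stub to test offline.
async function smoke(
  baseUrl: string,
  rubySource: string,
  fetchFn: typeof fetch = fetch,
): Promise<SmokeResult> {
  const response = await fetchFn(baseUrl, { method: "POST", body: rubySource });
  const header = response.headers.get("x-rubyrs-invocation");
  return {
    status: response.status,
    stdout: await response.text(),
    invocation: header === null ? null : Number(header),
  };
}
```

Against a local workerd instance this would be called as `await smoke("http://localhost:8080", "puts (1..5).sum")`; swap the port for whichever target is running.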

Branch retention

Keep this branch + draft PR open as a stable reference for the runtime-portability claim. If a future rubyrs feature wants to revisit (e.g. component model migration, lazy-load preamble follow-up), this is the starting point + measurement harness.

Minimum-viable end-to-end:

HTTP POST body (Ruby source)
→ worker.js pipes body as stdin via @cloudflare/workers-wasi
→ wasm_worker.wasm reads stdin → Runtime::eval → stdout
→ HTTP 200 body=Ruby script's stdout

Local `wrangler dev` (Miniflare v3 → workerd, same runtime as production) round-trips smoke.rb in ~12 ms wall and a single-statement `puts (1..5).sum` cold request in ~38 ms. CPU time specifically (which is what Workers Free's 10 ms cap measures) needs a real edge deploy to know; Miniflare doesn't enforce those caps locally.

Design choices, locked in the README:

- New `wasm_worker` bin reads Ruby from stdin rather than argv, because `@cloudflare/workers-wasi` has no public API to write files into its in-isolate littlefs before `wasi.start()`. Stdin is the documented input channel for command-shape wasm.
- No `[[rules]] CompiledWasm` in `wrangler.toml`: wrangler v3 ships a default rule for `**/*.wasm` with `fallthrough = false`, so a re-declaration fails the build.
- `build.sh` prepends `~/.cargo/bin` to PATH to dodge Homebrew's rust (lacking the `wasm32-wasip1` rust-std component) shadowing rustup's. Same WASI_SDK_PATH requirement as `tests/wasm/smoke.sh`.
- wizer is best-effort: skipped when the bin lacks the `wizer.initialize` export (`wasm_worker` currently doesn't carry it, since the export only lives on the `rubyrs` CLI bin). A follow-up could add the same pattern to `wasm_worker` to skip the ~3-6 ms classes-and-preamble cold-start on each isolate.

Knobs / follow-ups documented in README: streaming response, RUBYRS_DEADLINE_MS env wiring, real-edge cold-start measurement, static-script (embedded `include_str!`) mode.

This is a spike, not a finished feature; the `poc/cf-worker/` directory is meant to stay outside the workspace build graph until/unless a real product target emerges.

Adds a self-host deployment path that complements the existing wrangler/CF-edge target: it runs the exact same rubyrs.wasm under `workerd serve config.capnp` with no Cloudflare account, no CPU caps, and no isolate-eviction noise. workerd is CF's own runtime binary (Apache 2.0) so the wasm + JS surface is identical to the managed edge, just hosted ourselves.

Measurement findings driving the build.sh changes:

- workerd local cold-start, n=5 each (`puts 1+1`, restart between):

| Build | Size | Cold-start (median) | Notes |
| --- | --- | --- | --- |
| baseline raw | 1.54 MB | 57 ms | |
| wasm-opt -Oz only | 1.22 MB | 53 ms | size −21%, time −7% |
| wasm-opt -Oz + wizer | 1.37 MB | 27 ms | |
| wasm-opt -O2 + wizer | 1.42 MB | 23 ms | |
| wizer only | 1.68 MB | 18 ms | best |

- wasm-opt is consistently net-negative on V8 cold-start at every optimisation level tested. Smaller wasm doesn't translate into faster instantiate; the V8 wasm parser appears to be bottlenecked on something other than raw byte count (likely IR construction / module setup).
- Default `WASM_OPT_LEVEL=skip` in build.sh; opt-in by env when benchmarking other levels.

CF edge cold/warm bucketing (new `x-rubyrs-invocation` header in worker.js lets the harness partition requests by per-isolate hit count; tail captures cpuTime). 60-request bursts:

| Build | Bucket | n | cpu median | p10 | p90 | max |
| --- | --- | --- | --- | --- | --- | --- |
| baseline raw | warm | 51 | 10 ms | 7 | 16 | 86 |
| wizer only | warm | 58 | 7 ms | 6 | 12 | 13 |

The wizer win on edge is smaller than local (−30% vs −69%) but the cpu-max collapse (86 → 13 ms) is the load-bearing improvement: it eliminates the cold-isolate spike that pushes individual requests over the Free 10 ms CPU cap.

Earlier "edge appears to regress with opt+wizer" reading was debunked once cold/warm were bucketed properly; that was post-deploy pool warming, not a wasm-choice regression. The Pyodide-on-Workers internal docs confirm CF's published mean cold-start is a blended pool-hit + pool-miss number with the same noise profile.

build.sh changes:

- WASM_OPT_LEVEL env knob (skip | -O2 | -O3 | -Oz)
- wasm-opt → wizer ordering (wizer snapshots the post-shape memory; reversed order would invalidate the snapshot's function-index refs)
- `grep -q` over a tempfile not a pipe: `set -o pipefail` was turning successful "wizer.initialize present" detection into pipe-failure because grep -q closes its stdin early and SIGPIPEs upstream
- `--wasm-bulk-memory true` (wizer's flag wants an explicit bool)

workerd self-host:

- workerd/config.capnp: HTTP socket on :8080, modules list aliases worker.js's `./rubyrs_worker.wasm` import to the build output
- workerd/bundle.sh: esbuild bundles src/worker.js + workers-wasi into a single .mjs (workerd's capnp modules need explicit deps; `--external:*.wasm` keeps wasm imports out of the bundle so capnp's `wasm = embed` resolves them)
- workerd/bench-js: minimal JS-only worker for cold-start baselining (workerd's own boot cost is <1 ms; the rest of our 18 ms is wasm + Ruby setup)

src/worker.js:

- Co-located wasm import (`./rubyrs_worker.wasm`): works for both wrangler (default CompiledWasm rule) and workerd (workerd rejects `..` paths; bare specifiers don't resolve). Same artifact, two hosts.
- `x-rubyrs-invocation` header from a module-scope counter: lets the measurement harness bucket cold (invocation == 1) vs warm per V8 isolate.

build artifact moved from `wasm/` to `src/` to satisfy workerd's relative-import resolution; .gitignore updated.
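
The module-scope counter behind the `x-rubyrs-invocation` header can be illustrated with a short TypeScript sketch. It is not the contents of src/worker.js: `runRuby` is a hypothetical stand-in for the real wasm call, and the handler shape is only assumed to resemble a Workers-style `fetch` handler.

```typescript
// Module scope is evaluated once per isolate, so this counter persists across
// requests served by the same isolate and starts again from zero in a new one.
let invocation = 0;

// Hypothetical placeholder for the real path (stdin → Runtime::eval → stdout).
async function runRuby(source: string): Promise<string> {
  return `echo: ${source}`;
}

// In a Workers-style module this object would be the default export.
const worker = {
  async fetch(request: Request): Promise<Response> {
    invocation += 1;
    const stdout = await runRuby(await request.text());
    return new Response(stdout, {
      // invocation == 1 marks the cold request for this isolate.
      headers: { "x-rubyrs-invocation": String(invocation) },
    });
  },
};
```

Because the counter lives in the module rather than in the request, a harness needs no server-side state to tell a cold hit from a warm one; it only reads the header.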

Adds a Deno + browser_wasi_shim path that runs the EXACT SAME `src/rubyrs_worker.wasm` bytes as the CF Workers / workerd targets. Demonstrates the broader thesis of the spike: one wasm artifact, multiple V8-host runtimes, no vendor lock-in. Deno plays the same role for "self-host JS edge runtime" that workerd plays for "self-host workers-compatible runtime"; Deno Deploy is the managed counterpart, mirroring the workerd → CF Workers duality.

Measurement on the wizer-only wasm (1.68 MB), n=5 each:

| metric | Deno | workerd | CF edge (warm) |
| --- | --- | --- | --- |
| cold-start | 25 ms | 18 ms | ~149 ms wall |
| warm tiny | 1.5 ms | 2.5 ms | 7 ms cpu |
| warm smoke.rb | 1.7 ms | 4.0 ms | 7 ms cpu |
| 1M iter each | 124 ms | 135 ms | 173 ms cpu |

Deno actually edges out workerd on warm (1.5 vs 2.5 ms tiny, ~40 % faster) despite being behind on cold (25 vs 18 ms). Two plausible reasons: (1) browser_wasi_shim's stdin/stdout path is a pure-JS callback fed by a single `Uint8Array` buffer, whereas workers-wasi proxies through its own bundled `memfs.wasm`; (2) `Deno.serve` is a hyper-based Rust HTTP server cutting out the JS-shim ↔ kj layer that workerd's HTTP path goes through. Heavy compute converges to within ~10 % because V8's wasm engine dominates that regime.

Stack we landed on, after the dead-ends documented in the server.ts header:

- @cloudflare/workers-wasi does NOT work in Deno: its bundled memfs.wasm uses `import "./memfs.wasm"` which Deno's module loader eagerly walks looking for JS deps, hitting an unresolvable `wasi_snapshot_preview1` import.
- jsr:@std/wasi does NOT exist any more: Deno deprecated std/wasi in Oct 2023 (PR #3732) and removed it in Nov 2023 (PR #3733) before the JSR cutover. Never published to JSR.
- npm:@bjorn3/browser_wasi_shim: pure-JS Preview 1 shim with no internal wasm deps. File / OpenFile / ConsoleStdout fd adapters; buffer-shaped stdin from request body.

deno.json + lock are scoped to the deno/ subdirectory so the parent package.json's npm tree (wrangler / workerd / esbuild and their transitive vitest/rolldown/turbo footprints) does NOT get pulled into Deno's node_modules. Running `deno run` from deno/ keeps the dependency surface to just browser_wasi_shim.

`x-rubyrs-invocation` header is reused identically across all three host wrappers so the cold/warm bucketing harness works unchanged regardless of which target is being measured.

New section sandwiched between the P2-A pivot (rubyrs.wasm vs ruby.wasm size/speed thesis) and Throughput. Documents what the spike/cf-worker-poc branch actually validated: same Wizer'd `rubyrs_worker.wasm` (1.68 MB) running unchanged across three V8-based edge runtimes (Deno self-host, workerd self-host, CF Workers managed) plus wasmtime as the non-V8 baseline.

Two tables:

1. Per-runtime cold-start / warm-tiny / warm-smoke / heavy. Deno wins warm (1.5 ms tiny, ~40% faster than workerd), workerd wins cold (18 ms vs Deno's 25 ms), CF edge cpu settles at 7 ms p50 warm but eats ~149 ms wall on cold-isolate hits. Each cell sourced from the n=5 PoC bench.
2. wasm-opt vs Wizer build-pipeline ablation (workerd local cold-start). Counter-intuitive finding worth flagging in the public BENCHMARKS: every `wasm-opt` level we measured (-Oz, -O2) was net-negative on V8 cold-start; Wizer-only (1.68 MB, no opt) is the fastest at 18 ms vs 57 ms baseline (−69%). Smaller wasm doesn't translate into faster instantiate; V8's wasm parser appears to bottleneck on IR construction, not byte count.

Notes call out the methodology gotcha (CF edge variance is pool-warming, not wasm-choice; bucket by `x-rubyrs-invocation` header) and the Pyodide-on-Workers precedent (CF's `make_snapshots.py` is the same trick we're applying via Wizer, externally). Also notes wasmtime is HTTP-less by default so it sits alongside the V8 hosts as a CLI-shape baseline rather than in the apples-to-apples row.

Documents why the PoC stops optimising at 18 ms (workerd local cold-start with Wizer-only build). Two independent angles attempted, both produced negative or zero results, both for the same root cause:

1. Lazy-loading Tier 1 stdlib preambles (Random + SecureRandom): −40 KB wasm, n=5 cold-start MEDIAN 19.5 ms vs 18.2 ms baseline. Variance, no improvement.
2. `opt-level = "z"` + LTO=fat + codegen-units=1: repo's own [profile.release-min] history note records 3–19 % SLOWER cold start despite 56 % smaller binary, measured across three hosts.

Both confirm: V8 wasm cold-start is dominated by the per-function IR build (1798 functions in our wasm), NOT byte count. The 18 ms floor is the V8 parser + module-instantiate fixed cost for this build shape, and reducing further requires either a function-count refactor inside rubyrs, a component-model + AOT move (wasmtime serve), or CF exposing a generic snapshot API (currently privileged-only for their bundled Python runtime).

Worth flagging publicly in BENCHMARKS because the natural follow-up reading ("ok, why didn't you wasm-opt -Oz? Why didn't you trim stdlib?") has an empirical answer documented inline.
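
Since the argument above rests on function count rather than byte count, it may help to see one way of reading that number off a wasm binary. The TypeScript sketch below walks the section headers of a core wasm module and returns the entry count of the function section (section id 3). The function names are hypothetical, it counts module-defined functions only (imports are excluded), and whether the 1798 figure quoted above was obtained this way is not stated in the PR.

```typescript
// Read an unsigned LEB128 integer, the encoding wasm uses for sizes and counts.
function readU32Leb(bytes: Uint8Array, offset: number): { value: number; next: number } {
  let value = 0;
  let shift = 0;
  let pos = offset;
  for (;;) {
    if (pos >= bytes.length) throw new Error("truncated LEB128");
    const byte = bytes[pos];
    pos += 1;
    value += (byte & 0x7f) * 2 ** shift;
    if ((byte & 0x80) === 0) return { value, next: pos };
    shift += 7;
    if (shift > 28) throw new Error("LEB128 too long for u32");
  }
}

const FUNCTION_SECTION_ID = 3;

// Count functions defined in a core wasm module by reading the function
// section's vector length. Returns 0 when the module has no function section.
function countDefinedFunctions(bytes: Uint8Array): number {
  const magic = [0x00, 0x61, 0x73, 0x6d]; // "\0asm"
  if (bytes.length < 8 || magic.some((b, i) => bytes[i] !== b)) {
    throw new Error("not a wasm binary");
  }
  let pos = 8; // skip 4-byte magic + 4-byte version
  while (pos < bytes.length) {
    const id = bytes[pos];
    const size = readU32Leb(bytes, pos + 1);
    if (id === FUNCTION_SECTION_ID) {
      return readU32Leb(bytes, size.next).value;
    }
    pos = size.next + size.value; // jump over this section's payload
  }
  return 0;
}
```

Under Deno, this could be pointed at the build artifact with `countDefinedFunctions(await Deno.readFile("src/rubyrs_worker.wasm"))`, which gives a cheap way to track whether a refactor actually moves the function count.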

github-actions · 2026-05-26T22:24:30Z

gapscan PR diff

Both binaries produced identical histograms across the 10 canonical scan targets. (If the classifier changed for node classes that none of these targets exercise, this view won't catch it — the data-file diff would.)

See docs/gap-reports/ for the dataset and methodology.

linyiru added 5 commits May 26, 2026 16:16

linyiru wants to merge 5 commits into master from spike/cf-worker-poc
