56 changes: 56 additions & 0 deletions crates/rubyrs/src/bin/wasm_worker.rs
@@ -0,0 +1,56 @@
// Entry point for the Cloudflare Workers PoC.
//
// Shape: read Ruby source from stdin, evaluate, write result /
// runtime stdout to stdout, exit non-zero on trap. The Worker
// pipes the HTTP request body as stdin and captures stdout as
// the HTTP response — see `poc/cf-worker/`.
//
// Why a separate bin (not main.rs): the CLI reads a path from
// argv, which is awkward to coordinate from workers-wasi since
// its public API does not expose pre-populating the in-isolate
// FS. Stdin is a `ReadableStream` in workers-wasi's option
// shape, which IS easy to drive from a Worker.
//
// Intentionally NOT a feature flag — keeping it as a separate
// bin avoids adding any conditional compilation to the
// well-trodden CLI / library paths. Build with:
//   cargo build --release --target wasm32-wasip1 \
//     --bin wasm_worker --no-default-features -p rubyrs

use std::io::Read;

use rubyrs::{Config, Runtime};

fn main() {
    let mut src = String::new();
    if let Err(e) = std::io::stdin().read_to_string(&mut src) {
        eprintln!("wasm_worker: stdin read failed: {e}");
        std::process::exit(2);
    }
    // PoC: defaults only. Once cold-start + execution numbers
    // land, the right per-request caps (RUBYRS_DEADLINE_MS to
    // back-stop Workers' 30s CPU cap, max_value_bytes to keep a
    // runaway response from filling the 128MB isolate budget)
    // are an obvious follow-up.
    let cfg = Config {
        // wasi has no PID concept; CLI uses None for the same
        // reason. `$$` surfaces as 0 in Ruby-land.
        pid: None,
        ..Config::default()
    };
    // `take_wizer_runtime` only exists under `target_os = "wasi"`
    // (see lib.rs); on host targets we skip the fast path so this
    // bin still `cargo check`s without `--target wasm32-wasip1`.
    #[cfg(target_os = "wasi")]
    let mut rt = match rubyrs::take_wizer_runtime() {
        Some(mut rt) => { rt.apply_config(cfg); rt }
        None => Runtime::with_config(cfg),
    };
    #[cfg(not(target_os = "wasi"))]
    let mut rt = Runtime::with_config(cfg);
    rt.set_stdout(Box::new(std::io::stdout()));
    if let Err(trap) = rt.eval(&src, "(worker)") {
        eprint!("{}", rt.format_trap(&trap));
        std::process::exit(1);
    }
}
101 changes: 101 additions & 0 deletions docs/BENCHMARKS.md
@@ -88,6 +88,107 @@ or native MRI wins on throughput (see "Throughput" below where
rubyrs still trails CRuby's interpreter ~1.76× on a 1M-iteration
loop). The two niches don't overlap.

## Edge runtimes: cross-host portability

Validates the "one wasm artifact, many edge runtimes" thesis. Same
`src/rubyrs_worker.wasm` (1.68 MB after Wizer pre-init, no
wasm-opt — see [PoC details](#wasm-opt-vs-wizer-notes) below)
runs unchanged under three different V8-based runtimes and one
non-V8 baseline. Spike branch:
[`spike/cf-worker-poc`](../poc/cf-worker/).

`puts 1+1` workload, n=5 each, Apple M-series:

| Runtime | Engine | Self-host? | Cold-start | Warm tiny | Warm smoke.rb | 1M `each` |
|---------|:------:|:---------:|:----------:|:---------:|:-------------:|:---------:|
| **Deno** 2.8 + browser_wasi_shim | V8 14.9 | ✅ | 25 ms | **1.5 ms** | **1.7 ms** | 124 ms |
| **workerd** 2026-05-26 + workers-wasi | V8 | ✅ | **18 ms** | 2.5 ms | 4.0 ms | 135 ms |
| **CF Workers edge** (managed) | V8 (= workerd) | ❌ | ~149 ms wall | 7 ms cpu | 7 ms cpu | 173 ms cpu |
| wasmtime 45 (CLI, no HTTP) | wasmtime | ✅ | 12.7 ms (raw) / ~7 ms (AOT) | — | — | — |

Notes:

- **CF edge numbers are CPU time from `wrangler tail`** bucketed
by per-isolate invocation count (a header `x-rubyrs-invocation`
emitted by [worker.js](../poc/cf-worker/src/worker.js)). Cold
isolate (invocation == 1) wall is 149 ms / cpu ~80 ms; warm
(invocation > 1) settles to 7 ms cpu p50, p90 12 ms, max 13 ms.
The earlier reading that "wizer regresses edge perf" turned out to
be pool-warming noise from measuring immediately after deploy, not a
real regression — [Pyodide-on-Workers' published 1027 ms
mean](https://blog.cloudflare.com/python-workers-advancements/)
is similarly a pool-hit + pool-miss blend.
- **Deno beats workerd on warm by ~40 %** (1.5 vs 2.5 ms tiny)
despite trailing on cold (25 vs 18 ms). Plausible reasons: (1)
`browser_wasi_shim`'s stdin/stdout is a pure-JS callback on a
single `Uint8Array`, vs `workers-wasi`'s extra `memfs.wasm`
proxy step; (2) `Deno.serve` is hyper-based Rust HTTP cutting
out workerd's JS-shim ↔ kj layer. Heavy compute converges to
within ~10 % because V8's wasm engine dominates that regime.
- **wasmtime cold-start (7-13 ms)** beats every V8 host on
cold but provides no HTTP layer of its own — listed for
baseline only; HTTP-serving wasmtime would require either
wasi-http (component model, not Preview 1) or a custom Rust
HTTP loop. Not part of the V8-host comparison.
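
The cold/warm bucketing described in the first note can be sketched as follows. This is a minimal illustration, not the PoC's actual tooling: `Sample`, `split_cold_warm`, and `percentile` are hypothetical names, the sample values in `main` are invented for illustration only, and the nearest-rank percentile definition is an assumption (the source does not say which definition produced its p50/p90 figures).

```rust
/// One tail-log record: the per-isolate invocation counter (as carried by
/// the `x-rubyrs-invocation` header) plus the reported CPU time in ms.
#[derive(Clone, Copy)]
struct Sample {
    invocation: u32,
    cpu_ms: f64,
}

/// Splits samples into (cold, warm): invocation == 1 is a fresh isolate,
/// anything above 1 reuses a warm one.
fn split_cold_warm(samples: &[Sample]) -> (Vec<f64>, Vec<f64>) {
    let mut cold = Vec::new();
    let mut warm = Vec::new();
    for s in samples {
        if s.invocation == 1 {
            cold.push(s.cpu_ms);
        } else {
            warm.push(s.cpu_ms);
        }
    }
    (cold, warm)
}

/// Nearest-rank percentile for p in 1..=100 (assumed definition).
fn percentile(values: &[f64], p: usize) -> Option<f64> {
    if values.is_empty() || p == 0 || p > 100 {
        return None;
    }
    let mut sorted = values.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    // rank = ceil(p / 100 * n), computed in integers.
    let rank = (p * sorted.len() + 99) / 100;
    Some(sorted[rank - 1])
}

fn main() {
    // Invented sample data, for illustration only.
    let samples = [
        Sample { invocation: 1, cpu_ms: 80.0 },
        Sample { invocation: 2, cpu_ms: 7.0 },
        Sample { invocation: 3, cpu_ms: 6.0 },
        Sample { invocation: 4, cpu_ms: 12.0 },
        Sample { invocation: 5, cpu_ms: 7.0 },
        Sample { invocation: 6, cpu_ms: 13.0 },
    ];
    let (cold, warm) = split_cold_warm(&samples);
    println!("cold p50 = {:?} ms", percentile(&cold, 50));
    println!("warm p50 = {:?} ms", percentile(&warm, 50));
    println!("warm p90 = {:?} ms", percentile(&warm, 90));
}
```

Keeping the two buckets separate is what prevents a single cold isolate from dominating a blended mean, which is the effect the note attributes to the pool-hit + pool-miss figure.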

#### wasm-opt vs Wizer notes

Counter-intuitive PoC finding: **once Wizer is applied, adding
`wasm-opt` is net-negative on V8 cold-start at every optimisation
level tested**, even when its size reductions are large. On its own,
`-Oz` buys only 7 % (57 → 53 ms) for a 21 % smaller file. Smaller
wasm doesn't translate into faster instantiation; the V8 wasm parser
appears to bottleneck on IR construction / module setup rather than
byte count. Wizer pre-init is the win (n=5 each, workerd local):

| Build pipeline | Wasm size | Cold-start (median) |
|----------------|----------:|--------------------:|
| baseline (raw cargo output) | 1.54 MB | 57 ms |
| wasm-opt -Oz only | 1.22 MB (−21 %) | 53 ms (−7 %) |
| wasm-opt -Oz + Wizer | 1.37 MB | 27 ms (−53 %) |
| wasm-opt -O2 + Wizer | 1.42 MB | 23 ms (−60 %) |
| **Wizer only** (no wasm-opt) | **1.68 MB** | **18 ms (−69 %)** |

The Wizer win matches what
[`workerd/src/pyodide/make_snapshots.py`](https://github.com/cloudflare/workerd/tree/main/src/pyodide)
does for Python Workers — snapshot the post-init linear memory
so cold-start skips re-running the interpreter's bootstrap. We
cannot match CF's *baseline-preloaded-in-isolate-pool* trick
(that requires the runtime to be linked into workerd itself),
but the per-Worker snapshot equivalent is exactly what the PoC's
`build.sh` produces.

#### Cold-start floor — two negative experiments

At ~18 ms (workerd local), further wasm shrinkage yields no
measurable cold-start gain. Two independent attempts confirmed this:

1. **Lazy-loading the Tier 1 stdlib preambles** (Random + SecureRandom,
`src/lib.rs::load_preamble` calls these unconditionally today).
Cuts ~40 KB from the Wizer'd wasm. Cold-start n=5: 17.7, 19.5,
22.1, 22.6, 19.3 ms — **median 19.5 ms, marginally SLOWER than
the 18.2 ms baseline**, well within variance.

2. **`opt-level = "z"` + LTO=fat + codegen-units=1**. Repo's own
`[profile.release-min]` history note records this combination as
**3–19 % SLOWER at cold start despite producing a 56 %-smaller
binary**, measured on three hosts (macOS arm64, Linux arm64,
Linux x86_64). The reason is the same one wasm-opt -Oz hits:
aggressive size shrinkage suppresses inlining and substitutes
shorter call sequences, which V8's wasm tier-up engine takes
longer to fix up than it would have to compile the original.
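
The median quoted in the first experiment can be checked directly from the five listed samples. The sketch below is only a worked-arithmetic aid; `median` is a hypothetical helper, and the 18.2 ms baseline is the figure stated in the text above.

```rust
/// Median of a slice of measurements; averages the two middle values
/// when the sample count is even. Returns None for an empty slice.
fn median(samples: &[f64]) -> Option<f64> {
    if samples.is_empty() {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    let mid = sorted.len() / 2;
    if sorted.len() % 2 == 1 {
        Some(sorted[mid])
    } else {
        Some((sorted[mid - 1] + sorted[mid]) / 2.0)
    }
}

fn main() {
    // Cold-start samples from the lazy-loading experiment, in ms.
    let lazy = [17.7, 19.5, 22.1, 22.6, 19.3];
    let baseline_ms = 18.2;

    let m = median(&lazy).expect("non-empty sample set");
    let min = lazy.iter().copied().fold(f64::INFINITY, f64::min);
    let max = lazy.iter().copied().fold(f64::NEG_INFINITY, f64::max);

    println!("median          = {m} ms");
    println!("delta vs. base  = {:+.1} ms", m - baseline_ms);
    println!("sample spread   = {:.1} ms", max - min);
}
```

The spread across the five runs is several times larger than the gap between the median and the baseline, which is the sense in which the difference is "well within variance".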

Two independent angles, same negative result — the 18 ms floor
is V8's wasm parser + module-instantiate fixed cost, NOT a
function of our byte count. To reduce further the project would
have to either (a) reduce the function count (1798 today) so V8
has fewer IRs to build, requiring rubyrs-internal refactoring; (b)
move to component model + AOT (wasmtime serve), bypassing V8's
parse path entirely; or (c) get CF to expose a generic
`--save-wasm-snapshot` user-wasm equivalent of their privileged
Python preload (currently not on offer). The 18 ms cold-start
is best treated as the public-API floor for this build shape and
the PoC is now operating at that floor.

## Throughput

1M iteration loop computing fizzbuzz string lengths.
4 changes: 4 additions & 0 deletions poc/cf-worker/.gitignore
@@ -0,0 +1,4 @@
node_modules/
.wrangler/
src/*.wasm
workerd/dist/
94 changes: 94 additions & 0 deletions poc/cf-worker/README.md
@@ -0,0 +1,94 @@
# rubyrs on Cloudflare Workers — PoC

Goal: prove rubyrs.wasm runs on Cloudflare Workers (V8 isolate +
WASI Preview 1 polyfill), end-to-end, locally via `wrangler dev`.

## Shape

```
HTTP POST body=Ruby source
Worker fetch handler (src/worker.js)
↓ pipes body as stdin
@cloudflare/workers-wasi (WASI preview1 shim in JS)
rubyrs_worker.wasm (wasm32-wasip1 bin reading stdin via Runtime::eval)
↓ captures stdout
HTTP 200 body=Ruby script output
```

The worker bin (`crates/rubyrs/src/bin/wasm_worker.rs`) reads
stdin → `Runtime::eval` → stdout. The Worker pipes
`request.body` straight in; it does not touch the in-isolate
filesystem (workers-wasi's littlefs has no public pre-population
API, see [research notes](#research-notes)).
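
The stdin → eval → stdout shape can also be exercised on a host target without any wasm tooling. The sketch below is an assumption-laden illustration, not code from the PoC: `run_once` and `toy_eval` are hypothetical names, and `toy_eval` merely stands in for `Runtime::eval` so the block stays self-contained. The exit codes mirror the bin: 0 on success, 1 on trap, 2 when stdin cannot be read.

```rust
use std::io::{Read, Write};

/// Runs one request: reads all of `input` as source text, evaluates it,
/// writes the result to `out` and any diagnostics to `err`.
/// Returns the exit code the worker bin would use.
fn run_once<R: Read, W: Write, E: Write>(
    mut input: R,
    mut out: W,
    mut err: E,
    eval: impl Fn(&str) -> Result<String, String>,
) -> i32 {
    let mut src = String::new();
    if let Err(e) = input.read_to_string(&mut src) {
        // Invalid UTF-8 or an I/O failure both land here.
        let _ = writeln!(err, "wasm_worker: stdin read failed: {e}");
        return 2;
    }
    match eval(&src) {
        Ok(output) => {
            let _ = out.write_all(output.as_bytes());
            0
        }
        Err(trap) => {
            let _ = write!(err, "{trap}");
            1
        }
    }
}

/// Hypothetical stand-in for `Runtime::eval`: echoes the source back,
/// and "traps" on any script that starts with `raise`.
fn toy_eval(src: &str) -> Result<String, String> {
    let trimmed = src.trim();
    if trimmed.starts_with("raise") {
        Err(format!("trap: {trimmed}\n"))
    } else {
        Ok(format!("{trimmed}\n"))
    }
}

fn main() {
    let mut out = Vec::new();
    let mut err = Vec::new();
    let code = run_once("puts 1+1".as_bytes(), &mut out, &mut err, toy_eval);
    println!("exit code: {code}");
    println!("stdout: {}", String::from_utf8_lossy(&out));
}
```

Taking the reader and writers as generic parameters is a design choice for testability only; the real bin talks to `std::io::stdin()` and `std::io::stdout()` directly.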

## Prerequisites

- Rust toolchain matching `rust-toolchain.toml`
- `rustup target add wasm32-wasip1`
- `WASI_SDK_PATH` pointing at a wasi-sdk install (same as
`tests/wasm/smoke.sh` — needed for the wasi_stub.c compile in
build.rs). Download from
https://github.com/WebAssembly/wasi-sdk/releases.
- `node` + `npm`
- `wizer` is optional; included in the build path when present
(`cargo install wizer-cli`).

## Quick start

```sh
# From this directory.
npm install # @cloudflare/workers-wasi + wrangler
./build.sh # cargo → (optional) wizer → src/rubyrs_worker.wasm
npx wrangler dev # local V8 (workerd) on http://localhost:8787

# In another terminal:
curl -X POST --data-binary 'puts (1..5).sum' http://localhost:8787
# → 15
```

## Layout

```
poc/cf-worker/
├── wrangler.toml # Worker config + CompiledWasm rule
├── package.json # workers-wasi + wrangler
├── build.sh # cargo build → wizer → copy artifact
├── src/worker.js # fetch handler
├── src/rubyrs_worker.wasm # build artifact written by build.sh (gitignored)
└── README.md
```

## Knobs / next steps

- **Streaming response**: replace the buffered stdout capture in
`worker.js` with a `TransformStream` whose readable side is
the `Response` body. Lets long-running Ruby see incremental
output.
- **CPU / memory caps**: surface `RUBYRS_DEADLINE_MS` etc. via
WASI `env`. Worker fetch handler can set a deadline below the
Worker's own 30 s CPU cap so traps come from rubyrs with
context rather than from the edge with `Error 1102`.
- **Wizer cold-start measurement**: only meaningful on the real
edge — Miniflare/`wrangler dev` does not reproduce isolate
cold-start. Deploy + `wrangler tail` to measure.
- **Static-script mode**: for a fixed-DSL deployment, replace
`request.body` with an embedded `include_str!`'d script and
pin the wasm at build time. Removes the per-request stdin
plumbing and lets the response be streamed.
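
As a rough sketch of the "CPU / memory caps" item, the bin side of a `RUBYRS_DEADLINE_MS` knob might parse and clamp the value as shown below. Everything here is an assumption for illustration: `parse_deadline`, `WORKER_CPU_CAP`, and `SAFETY_MARGIN` are hypothetical names, the 500 ms margin is arbitrary, and how rubyrs actually consumes the variable is not shown in this PR.

```rust
use std::time::Duration;

/// The Worker's own CPU cap, per the note above (30 s).
const WORKER_CPU_CAP: Duration = Duration::from_secs(30);
/// Hypothetical head-room so rubyrs traps before the edge does.
const SAFETY_MARGIN: Duration = Duration::from_millis(500);

/// Parses a millisecond deadline and clamps it below `cap`.
/// Returns None when the value is absent, unparsable, or zero,
/// meaning "no deadline configured".
fn parse_deadline(raw: Option<&str>, cap: Duration) -> Option<Duration> {
    let ms: u64 = raw?.trim().parse().ok()?;
    if ms == 0 {
        return None;
    }
    let requested = Duration::from_millis(ms);
    // Never allow a deadline at or above the platform cap.
    Some(requested.min(cap.saturating_sub(SAFETY_MARGIN)))
}

fn main() {
    // Under WASI the Worker would pass this through the `env` option.
    let raw = std::env::var("RUBYRS_DEADLINE_MS").ok();
    match parse_deadline(raw.as_deref(), WORKER_CPU_CAP) {
        Some(d) => println!("deadline: {} ms", d.as_millis()),
        None => println!("no deadline configured"),
    }
}
```

Clamping inside the bin rather than trusting the caller means a misconfigured Worker still gets a rubyrs trap with context instead of an edge-side `Error 1102`.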

## Research notes

- `@cloudflare/workers-wasi` does not expose a way to write into
the FS before instantiation; `preopens` is a `string[]` of
names only. Stdin is the documented input channel for
command-shape wasm — hence the bin reads from stdin.
- Local dev (`wrangler dev`) uses Miniflare v3 → `workerd`, the
same runtime as production. Module loading and the
`wasi_snapshot_preview1` shim behave identically. Cold-start
timing and the 10 ms / 30 s CPU caps are **not** enforced
locally — only on the deployed edge.
- `_start` is the entry; workers-wasi's `wasi.start(instance)`
drives it. Re-instantiating per request is the documented
pattern; V8 caches the compiled `WebAssembly.Module`.
118 changes: 118 additions & 0 deletions poc/cf-worker/build.sh
@@ -0,0 +1,118 @@
#!/usr/bin/env bash
# Build rubyrs.wasm for the CF Workers PoC.
#
# Pipeline:
#   1. cargo build wasm_worker bin for wasm32-wasip1, --no-default-features
#      (cext requires dlopen which wasi has no equivalent for).
#   2. (Optional) wizer pre-init pass: snapshots classes + preamble
#      bytecode into the wasm so cold-start on Workers doesn't burn
#      the 1s top-level CPU budget re-doing that work. Skipped when
#      `wizer` is not on PATH so first-time PoC contributors don't
#      need to install it before seeing the round-trip work.
#   3. Copy the artifact to poc/cf-worker/src/, next to worker.js,
#      so both wrangler and workerd can resolve the relative import.
set -euo pipefail

SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
WORKSPACE_ROOT="$(cd "$SCRIPT_DIR/../.." && pwd)"
cd "$WORKSPACE_ROOT"

if [ -z "${WASI_SDK_PATH:-}" ]; then
  echo "build.sh: WASI_SDK_PATH not set (needed for wasi_stub.c compile in build.rs)." >&2
  echo " Install wasi-sdk from https://github.com/WebAssembly/wasi-sdk/releases" >&2
  echo " and export WASI_SDK_PATH=/path/to/wasi-sdk-XX.0" >&2
  exit 1
fi

# Prefer rustup's shim over any Homebrew (or other) rustc that may
# shadow PATH — those distributions usually lack the wasm32-wasip1
# rust-std component, and cargo errors with a misleading "target
# may not be installed" even when `rustup target add wasm32-wasip1`
# succeeded for the rustup toolchain.
if [ -x "$HOME/.cargo/bin/cargo" ]; then
  export PATH="$HOME/.cargo/bin:$PATH"
fi

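# Capture the list first: piping straight into `grep -q` can SIGPIPE
# rustup under `set -o pipefail` and misreport an installed target.
INSTALLED_TARGETS="$(rustup target list --installed 2>/dev/null || true)"
if ! grep -qx wasm32-wasip1 <<<"$INSTALLED_TARGETS"; then
  echo "build.sh: wasm32-wasip1 target missing — \`rustup target add wasm32-wasip1\`" >&2
  exit 1
fi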

echo "[build.sh] cargo build --release --target wasm32-wasip1 --bin wasm_worker --no-default-features"
cargo build --release --target wasm32-wasip1 \
  --bin wasm_worker -p rubyrs --no-default-features

RAW="$WORKSPACE_ROOT/target/wasm32-wasip1/release/wasm_worker.wasm"
# Final artifact lands NEXT TO src/worker.js so both wrangler and
# workerd can resolve `import "./rubyrs_worker.wasm"`. A historical
# poc/cf-worker/wasm/ location worked for wrangler (default
# CompiledWasm glob walks the project) but not for workerd, which
# rejects `..`-containing module specifiers. Co-locating is the
# minimum-friction shape that satisfies both runtimes.
OUT_DIR="$SCRIPT_DIR/src"
mkdir -p "$OUT_DIR"
OUT="$OUT_DIR/rubyrs_worker.wasm"

# Optional wasm-opt pass. Pick the level via `WASM_OPT_LEVEL`:
# skip → no wasm-opt (default — see note below)
# -O2 → balanced speed/size
# -O3 → aggressive speed
# -Oz → aggressive size
#
# Why default = skip: rubyrs PoC measurement found that `-Oz` on
# the wasm32-wasip1 binary improves *workerd local* cold-start
# (57→27 ms with wizer) but REGRESSES V8 execution perf on
# Cloudflare Workers' edge (heavy loop 173 ms → 416 ms,
# `puts 1+1` 8 ms → 60 ms). Working hypothesis: `-Oz`'s
# aggressive size shrinks (function-deduplication, inlining
# inhibition, instruction substitution) break V8's wasm
# tier-up heuristics. Until that's debugged the conservative
# default is no opt; the env var lets benchmarks opt in.
#
# Order: wasm-opt FIRST, then wizer. wasm-opt restructures code
# (function indices, instruction layout); wizer snapshots linear
# memory at init time AFTER seeing the final code shape, so
# running it the other way around would have wasm-opt
# invalidate the snapshot's function-index references.
WIZER_IN="$RAW"
WASM_OPT_LEVEL="${WASM_OPT_LEVEL:-skip}"
if [ "$WASM_OPT_LEVEL" = "skip" ]; then
  echo "[build.sh] wasm-opt skipped (WASM_OPT_LEVEL=skip)"
elif command -v wasm-opt >/dev/null 2>&1; then
  OPT="$WORKSPACE_ROOT/target/wasm32-wasip1/release/wasm_worker.opt.wasm"
  echo "[build.sh] wasm-opt $WASM_OPT_LEVEL"
  wasm-opt "$WASM_OPT_LEVEL" --enable-bulk-memory "$RAW" -o "$OPT"
  WIZER_IN="$OPT"
  echo "[build.sh] $(wc -c < "$RAW") → $(wc -c < "$OPT") bytes"
else
  echo "[build.sh] wasm-opt not on PATH — skipping size pass (\`brew install binaryen\`)"
fi

if command -v wizer >/dev/null 2>&1; then
  # Wizer needs --allow-wasi --wasm-bulk-memory + the binary's
  # `wizer.initialize` export (lib.rs exports this). Skip if the
  # export is absent so we don't fail on bins without it.
  #
  # Stage the objdump output to a tempfile rather than piping
  # straight into `grep -q`. `grep -q` closes its stdin after
  # the first match, which sends SIGPIPE upstream — under
  # `set -o pipefail` (which we want everywhere else in this
  # script) that turns the successful detection into a
  # failure-coded pipe, and we'd silently fall through to the
  # "wizer skipped" branch even when the export is present.
  DUMP="$(mktemp -t rubyrs-wasm-dump.XXXXXX)"
  trap 'rm -f "$DUMP"' EXIT
  wasm-objdump -x "$WIZER_IN" > "$DUMP" 2>/dev/null || true
  if grep -q "wizer.initialize" "$DUMP"; then
    echo "[build.sh] wizer pre-init pass"
    wizer --allow-wasi --wasm-bulk-memory true "$WIZER_IN" -o "$OUT"
  else
    echo "[build.sh] wizer skipped (no wizer.initialize export in this bin)"
    cp "$WIZER_IN" "$OUT"
  fi
else
  echo "[build.sh] wizer not on PATH — skipping pre-init pass (\`cargo install wizer-cli\`)"
  cp "$WIZER_IN" "$OUT"
fi

echo "[build.sh] $OUT ($(wc -c < "$OUT") bytes)"
14 changes: 14 additions & 0 deletions poc/cf-worker/deno/deno.json
@@ -0,0 +1,14 @@
{
  "_comment": [
    "Scoped Deno config so deno commands run from this subdir do not",
    "pick up the parent package.json (workerd, wrangler, esbuild, ...).",
    "The parent's npm tree pulls a vitest/turbo/rolldown forest that",
    "deno tries to materialise under --node-modules-dir=auto. An empty",
    "imports block plus the local nodeModulesDir setting keeps Deno",
    "self-contained to this subdir."
  ],
  "imports": {
    "@bjorn3/browser_wasi_shim": "npm:@bjorn3/browser_wasi_shim@^0.4.2"
  },
  "nodeModulesDir": "auto"
}