
Testing: add perf regression orchestrator and PR-comment CI gate #33433

Open
RenaudRohlinger wants to merge 8 commits into mrdoob:dev from RenaudRohlinger:feat/perf-regression-ci

RenaudRohlinger (Collaborator) commented Apr 21, 2026

See commit message

This contribution is funded by Spawn

Introduces a multi-iteration A/B perf harness that compares the live tree
against a baseline ref (default: dev). It uses a median + MAD statistical
gate, an adaptive iteration count with fast exit on catastrophic
regressions, persistent browsers and in-process servers for throughput,
and disabled vsync so FPS reflects renderer throughput.
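
A minimal sketch of what a median + MAD gate can look like, assuming per-iteration samples for one metric on each side. The helper names (median, mad, gateMetric), the choice of the larger of the two MADs as the noise scale, and the "improved" label are assumptions for illustration, not the harness's actual code; the adaptive iteration count and fast exit are not shown.

```javascript
// Hypothetical median + MAD gate for one metric (not the PR's real code).
// Assumption: noise scale = the larger of the two groups' raw MADs
// (no 1.4826 normal-consistency factor).

function median(values) {
  if (values.length === 0) throw new Error("median of empty array");
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1
    ? sorted[mid]
    : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Median absolute deviation: median of |x - median(x)|.
function mad(values) {
  const m = median(values);
  return median(values.map((v) => Math.abs(v - m)));
}

// higherIsBetter: true for FPS, false for frame time, heap, VRAM, ...
function gateMetric(baseline, candidate, { k = 3, higherIsBetter }) {
  const baseMed = median(baseline);
  const candMed = median(candidate);
  const delta = candMed - baseMed;
  const threshold = k * Math.max(mad(baseline), mad(candidate));
  if (Math.abs(delta) <= threshold) {
    return { verdict: "stable", baseMed, candMed, delta, threshold };
  }
  // Outside the noise band: decide direction from the metric's polarity.
  const worse = higherIsBetter ? delta < 0 : delta > 0;
  return { verdict: worse ? "REGRESS" : "improved", baseMed, candMed, delta, threshold };
}
```

With a zero MAD (e.g. a constant VRAM figure) any change at all leaves the band, so a practical gate would likely add a small noise floor; that refinement is left out here.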

Two workflows mirror the existing bundle-size pattern: read-perf runs
on PR with read-only perms and uploads a summary artifact; report-perf
triggers on workflow_run with pull-requests:write and posts/updates a
markdown comment on the PR.
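
The "posts/updates" behaviour of report-perf is usually an upsert: find the bot's earlier comment and edit it, otherwise create a new one. The sketch below shows only that decision, assuming a hidden HTML marker (PERF_MARKER, hypothetical) and comments shaped like GitHub's "list issue comments" REST response ({ id, body, user: { login } }). The resulting plan would then map onto octokit's issues.updateComment or issues.createComment.

```javascript
// Hypothetical marker that identifies the perf report comment.
const PERF_MARKER = "<!-- perf-regression-report -->";

// Decide whether to update the bot's existing report or create a new one.
function planCommentUpdate(comments, botLogin, newBody) {
  const body = `${PERF_MARKER}\n${newBody}`;
  const existing = comments.find(
    (c) =>
      c.user && c.user.login === botLogin && (c.body || "").includes(PERF_MARKER)
  );
  return existing
    ? { action: "update", commentId: existing.id, body }
    : { action: "create", body };
}
```

Matching on both the author and the marker keeps a human who quotes the report from being mistaken for the bot.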

Gated metrics: fps, frame p50/p95/p99, jsHeap mean, WebGPU VRAM,
submits/frame, WebGPU errors. Heap growth + GC counters are shown for
debugging but never block CI (too noisy for gating).
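
One way to encode the gated-versus-informational split, as a sketch: the metric keys are hypothetical, and only the rule "informational metrics never fail CI" comes from the commit message.

```javascript
// Hypothetical metric registry; gated: false means "display only".
const METRICS = {
  fps: { gated: true },
  frameP50: { gated: true },
  frameP95: { gated: true },
  frameP99: { gated: true },
  jsHeapMean: { gated: true },
  webgpuVram: { gated: true },
  submitsPerFrame: { gated: true },
  webgpuErrors: { gated: true },
  jsHeapGrowth: { gated: false }, // shown for debugging only
  gcEvents: { gated: false }, // shown for debugging only
};

// verdicts: { [metricKey]: "stable" | "REGRESS" | ... }
// Only a REGRESS on a gated metric fails the run; unknown keys never gate.
function ciShouldFail(verdicts) {
  return Object.entries(verdicts).some(
    ([key, verdict]) => verdict === "REGRESS" && METRICS[key]?.gated === true
  );
}
```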

Stop absorbing the orchestrator exit code via `|| echo`: real crashes
were silently passing and confusing the downstream "Attach PR number"
step. Use `continue-on-error: true` instead, so exit 2 (regression) still
posts a comment while exit 1 (crash) surfaces as a red check.
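
The exit-code contract above (0 = clean, 2 = regression, 1 = crash) can be sketched as a small wrapper. runPerfComparison is a hypothetical stand-in for the real orchestrator, assumed to resolve to { regressed } or throw on a crash; how the workflow reads the code back is not shown.

```javascript
// Map orchestrator outcomes to distinct exit codes so CI can tell
// "perf got worse" (2) apart from "the harness itself broke" (1).
async function main(runPerfComparison) {
  try {
    const { regressed } = await runPerfComparison();
    return regressed ? 2 : 0;
  } catch (err) {
    console.error("perf orchestrator crashed:", err);
    return 1;
  }
}

// Usage (Node): main(run).then((code) => { process.exitCode = code; });
```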

Pass the base SHA rather than the ref name to `--baseline` so there's
no ref-resolution ambiguity on shallow/detached CI checkouts.
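
A sketch of deriving the baseline from the pull_request event payload, whose `pull_request.base.sha` field is real; the `["--baseline", sha]` argument layout is an assumption about the orchestrator's CLI. On a shallow checkout the commit must still be fetched (e.g. actions/checkout with a suitable fetch-depth).

```javascript
// Build the --baseline argument from the PR event's base commit SHA.
function baselineArgs(event) {
  const sha = event?.pull_request?.base?.sha;
  // Accept full SHA-1 (40 hex) or SHA-256 (64 hex) object names only.
  if (typeof sha !== "string" || !/^(?:[0-9a-f]{40}|[0-9a-f]{64})$/.test(sha)) {
    throw new Error("pull_request.base.sha missing or not a full commit SHA");
  }
  return ["--baseline", sha];
}

// Usage in CI (GITHUB_EVENT_PATH points at the event JSON):
// const event = JSON.parse(fs.readFileSync(process.env.GITHUB_EVENT_PATH, "utf8"));
// const args = baselineArgs(event);
```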

Guard "Attach PR number" and the artifact upload behind `if: always()`
with a missing-file check so they handle both regression and crash
cases gracefully.
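
A sketch of an "Attach PR number" step that tolerates a missing summary, assuming (hypothetically) that the summary is a JSON object and that a crash leaves no file behind. A missing file becomes a stub marked `crashed`, so the upload and report steps still have something to work with.

```javascript
// summaryText is null when the summary file does not exist.
function attachPrNumber(summaryText, prNumber) {
  const summary =
    summaryText === null
      ? { crashed: true, metrics: {} } // hypothetical stub shape
      : JSON.parse(summaryText);
  return JSON.stringify({ ...summary, prNumber }, null, 2);
}

// Usage (Node; PR_NUMBER is a hypothetical env var):
// const text = fs.existsSync(p) ? fs.readFileSync(p, "utf8") : null;
// fs.writeFileSync(p, attachPrNumber(text, Number(process.env.PR_NUMBER)));
```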

On Linux CI, puppeteer's `headless: 'new'` mode has a broken GPU path: the
WebGPU adapter returns null and WebGL context creation also fails (hence
the "Cannot read properties of null (reading 'getSupportedExtensions')"
page errors seen on the first CI run). The existing test/e2e/puppeteer.js
already works around this by using `headless: false` when `CI` is set,
relying on the workflow's xvfb-run wrapper to provide a virtual display.
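
A sketch of CI-aware launch options following that pattern. The Chromium switches are an assumption about how WebGPU and uncapped frame rates could be enabled, not a copy of the real harness; the result would be passed to `puppeteer.launch()`.

```javascript
// Headful under CI (xvfb-run provides the display), headless elsewhere.
function launchOptions(env) {
  const onCI = Boolean(env.CI);
  return {
    headless: onCI ? false : "new",
    args: [
      "--enable-unsafe-webgpu", // expose WebGPU on Linux
      "--disable-gpu-vsync", // do not tie presentation to display refresh
      "--disable-frame-rate-limit", // assumed: lift the frame-rate cap
    ],
  };
}

// Usage: const browser = await puppeteer.launch(launchOptions(process.env));
```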

Match that pattern so WebGPU via lavapipe actually initializes. Also
add a collectIteration guard that throws if zero frames were recorded
during the sample window; previously the harness silently produced a
summary of zeros, with phantom REGRESS verdicts on jsHeap.
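
A sketch of the zero-frame guard. summarizeIteration is a hypothetical stand-in for part of collectIteration, assuming it receives per-frame durations in milliseconds; FPS is taken as frames over the sample window and p50 uses a lower-median index, both illustrative choices.

```javascript
function summarizeIteration(frameTimesMs, sampleWindowMs) {
  if (frameTimesMs.length === 0) {
    // Fail loudly instead of emitting an all-zero summary that later
    // shows up as phantom regressions.
    throw new Error(`no frames recorded during ${sampleWindowMs} ms sample window`);
  }
  const sorted = [...frameTimesMs].sort((a, b) => a - b);
  return {
    frames: frameTimesMs.length,
    fps: (frameTimesMs.length * 1000) / sampleWindowMs,
    frameP50: sorted[Math.floor((sorted.length - 1) / 2)],
  };
}
```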

RenaudRohlinger (Collaborator, Author) commented Apr 21, 2026

🟢 Perf regression (webgpu_backdrop_water)

Median across 6 measured iterations (1 warmup dropped), gated at k=3·MAD. Lavapipe (software WebGPU), vsync disabled — FPS reflects renderer throughput, not display refresh.

| Metric | Baseline | Candidate | Δ | Verdict |
|---|---|---|---|---|
| FPS (uncapped) | 30.22 | 30.56 | +1.1% | stable |
| Frame time (median) | 2.95 ms | 2.60 ms | -11.9% | stable |
| JS heap (mean) | 18.17 MB | 17.87 MB | -1.6% | stable |
| WebGPU VRAM | 50.29 MB | 50.29 MB | · | stable |
| Submits/frame | 4 | 3.99 | -0.3% | stable |

Baseline: 3fbe9eabb995af7e22a73d3fda97eb1664ab1ef9 @ 3fbe9eab · Candidate: 34e0d1bf (uncommitted) · Duration: 8000ms · Warmup: 3000ms

From CI run 24702529757. This is a manual preview of what the report-perf workflow will post automatically on future PRs; on this PR it is blocked by the workflow_run bootstrap (a workflow_run workflow only triggers once its file exists on the default branch).

Drop from the PR comment: p95/p99 (p99 is dominated by one-off
shader-compile stalls under short windows), WebGPU errors (not a perf
concern), and JS heap growth and GC heap freed (ill-defined % deltas,
since growth can be negative). Keep only the rows that reviewers actually
act on: FPS, median frame time, JS heap mean, VRAM, submits/frame, and GC
event count (info-only).

Also render an exact-zero Δ as "·" instead of "+0.0%"; this removes visual
noise on stable metrics like VRAM and submits.
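
A sketch of the Δ formatting rule: an exact zero renders as "·", anything else as a signed percentage with one decimal. formatDelta and the "n/a" fallback for a zero baseline are hypothetical.

```javascript
function formatDelta(baseline, candidate) {
  const diff = candidate - baseline;
  if (diff === 0) return "·"; // exact zero only, per the rule above
  if (baseline === 0) return "n/a"; // percentage undefined
  const pct = (diff / baseline) * 100;
  return `${pct > 0 ? "+" : ""}${pct.toFixed(1)}%`;
}
```

A tiny non-zero change still prints as "+0.0%" or "-0.0%"; only a bit-identical value collapses to "·".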

Full metric set stays in the summary JSON artifact for anyone digging
into a specific PR; the comment just leads with the useful columns.
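
A sketch of keeping the full metric set in the artifact while rendering only a whitelist in the comment. The metric keys and the "GC events" label are hypothetical; the other labels follow the report table.

```javascript
// Rows rendered in the PR comment, in display order.
const COMMENT_ROWS = [
  { key: "fps", label: "FPS (uncapped)" },
  { key: "frameP50", label: "Frame time (median)" },
  { key: "jsHeapMean", label: "JS heap (mean)" },
  { key: "webgpuVram", label: "WebGPU VRAM" },
  { key: "submitsPerFrame", label: "Submits/frame" },
  { key: "gcEvents", label: "GC events", infoOnly: true },
];

// summaryMetrics: full artifact metrics, e.g. { fps: { baseline, candidate } }.
function commentRows(summaryMetrics) {
  return COMMENT_ROWS
    .filter(({ key }) => key in summaryMetrics)
    .map(({ key, label, infoOnly = false }) => ({
      label,
      infoOnly,
      ...summaryMetrics[key],
    }));
}
```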