Testing: add perf regression orchestrator and PR-comment CI gate #33433
Open
RenaudRohlinger wants to merge 8 commits into mrdoob:dev
Introduces a multi-iteration A/B perf harness that compares the live tree against a baseline ref (default: dev). The gate is statistical (median + MAD), the iteration count is adaptive with a fast exit on catastrophic regressions, browsers are persistent and servers run in-process for throughput, and vsync is disabled so FPS reflects renderer throughput.

Two workflows mirror the existing bundle-size pattern: read-perf runs on the PR with read-only permissions and uploads a summary artifact; report-perf triggers on workflow_run with pull-requests: write and posts or updates a markdown comment on the PR.

Gated metrics: fps, frame p50/p95/p99, jsHeap mean, WebGPU VRAM, submits/frame, WebGPU errors. Heap growth and GC counters are shown for debugging but never block CI (too noisy for gating).
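A minimal sketch of what a median + MAD gate can look like. The function names, the `higherIsBetter` option, and the choice to use the larger of the two MADs as the noise estimate are assumptions for illustration; the orchestrator's actual code is not shown in this conversation.

```javascript
// Hypothetical helpers illustrating a median + MAD gate; not the PR's actual code.

// Median of a numeric array (does not mutate the input).
function median( values ) {

	const sorted = [ ...values ].sort( ( a, b ) => a - b );
	const mid = Math.floor( sorted.length / 2 );
	return sorted.length % 2 === 0 ? ( sorted[ mid - 1 ] + sorted[ mid ] ) / 2 : sorted[ mid ];

}

// Median absolute deviation: median of |x - median(x)|.
function mad( values ) {

	const m = median( values );
	return median( values.map( ( v ) => Math.abs( v - m ) ) );

}

// Flags a regression when the candidate median is worse than the baseline
// median by more than k times the noise estimate (assumed: the larger MAD).
function gate( baseline, candidate, { k = 3, higherIsBetter = false } = {} ) {

	const base = median( baseline );
	const cand = median( candidate );
	const noise = Math.max( mad( baseline ), mad( candidate ) );
	const worseBy = higherIsBetter ? base - cand : cand - base; // positive means worse
	return { base, cand, noise, regressed: worseBy > k * noise };

}

// FPS samples: higher is better, so a drop from ~60 to ~50 trips the gate.
const fpsVerdict = gate( [ 60, 61, 59, 60, 62, 60 ], [ 50, 51, 49, 50, 52, 50 ], { higherIsBetter: true } );
console.log( fpsVerdict.regressed );
```

Note that when both sample sets are perfectly stable the MAD is zero, so any nonzero worsening would trip this sketch; a real gate would likely also want a minimum absolute or relative threshold.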
Stop absorbing the orchestrator exit code via `|| echo`: real crashes were silently passing and confusing the downstream "Attach PR number" step. Use `continue-on-error: true` instead, so exit 2 (regression) still posts a comment while exit 1 (crash) surfaces as a red check. Pass the base SHA rather than the ref name to `--baseline` to avoid ref-resolution ambiguity on shallow or detached CI checkouts. Guard "Attach PR number" and the artifact upload with `if: always()` plus a missing-file check so both steps handle the regression and crash cases gracefully.
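The exit-code contract described above could be expressed roughly as follows. The codes 1 (crash) and 2 (regression) come from the commit message; treating 0 as a pass, the `run` object shape, and the function name are assumptions for illustration.

```javascript
// Exit codes: 1 and 2 are from the commit message; 0 = pass is assumed.
const EXIT_OK = 0;
const EXIT_CRASH = 1;
const EXIT_REGRESSION = 2;

// `run` is a hypothetical shape: { error?: Error, verdicts: Array<{ metric, regressed }> }
function exitCodeFor( run ) {

	// A crash takes precedence: partial verdicts from a crashed run are not trustworthy.
	if ( run.error ) return EXIT_CRASH;

	return run.verdicts.some( ( v ) => v.regressed ) ? EXIT_REGRESSION : EXIT_OK;

}

console.log( exitCodeFor( { verdicts: [ { metric: 'fps', regressed: true } ] } ) );
```

Keeping crash and regression on distinct codes is what lets the workflow treat them differently once the `|| echo` fallback no longer flattens every outcome to success.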
On Linux CI, puppeteer's `headless: 'new'` has a broken GPU path: the WebGPU adapter is null and WebGL context creation also fails (hence the "Cannot read properties of null (reading 'getSupportedExtensions')" page errors seen on the first CI run). The existing test/e2e/puppeteer.js already works around this by using `headless: false` when CI is set, relying on the workflow's xvfb-run wrapper to provide a virtual display. Match that pattern so WebGPU via lavapipe actually initializes. Also add a collectIteration guard that throws if zero frames were recorded during the sample window; previously this case silently produced a summary of zeros with phantom REGRESS verdicts on jsHeap.
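A rough sketch of the two fixes described above, written as pure helpers so they can be read without a browser. Only `collectIteration` is a name from the commit message; `headlessMode`, `assertFramesRecorded`, and the non-CI fallback of `'new'` are assumptions for illustration.

```javascript
// Hypothetical helpers; the PR's actual implementation is not shown here.

// Headed browser on CI (the workflow's xvfb-run wrapper supplies the display),
// new headless mode elsewhere. The non-CI value is an assumption.
function headlessMode( env ) {

	return env.CI ? false : 'new';

}

// Guard for a collectIteration-style sampler: refuse to summarize an
// iteration in which no frames were sampled, instead of reporting zeros.
function assertFramesRecorded( frameTimes, label ) {

	if ( frameTimes.length === 0 ) {

		throw new Error( `${ label }: zero frames recorded during the sample window` );

	}

	return frameTimes;

}

console.log( headlessMode( { CI: 'true' } ) );
```

Throwing here turns a silent all-zero summary into a crash, which the exit-code handling in the workflow can then surface as a failed check rather than a spurious regression.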
🟢 Perf regression (webgpu_backdrop_water)
Median across 6 measured iterations (1 warmup dropped), gated at k=3·MAD. Lavapipe (software WebGPU), vsync disabled; FPS reflects renderer throughput, not display refresh.
Baseline: From CI run 24702529757 — this is a manual preview of what the |
Drop p95/p99 (p99 is dominated by one-off shader-compile stalls under short windows), WebGPU errors (not a perf concern), and JS heap growth and GC heap freed (ill-defined % deltas, since growth can be negative) from the PR comment. Keep only the rows that reviewers actually act on: FPS, median frame time, JS heap mean, VRAM, submits/frame, and GC event count (info-only). Also render an exact-zero Δ as "·" instead of "+0.0%", which removes visual noise on stable metrics like VRAM and submits. The full metric set stays in the summary JSON artifact for anyone digging into a specific PR; the comment just leads with the useful columns.
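The Δ rendering rule could look something like this. The function name and the `'n/a'` fallback for a zero baseline are assumptions for illustration; only the "·" for an exact-zero delta and the signed one-decimal percentage format come from the commit message.

```javascript
// Hypothetical formatter for the Δ column of the PR comment.
function formatDelta( baseline, candidate ) {

	// A percentage is undefined against a zero baseline (fallback text is an assumption).
	if ( baseline === 0 ) return candidate === 0 ? '·' : 'n/a';

	const pct = ( candidate - baseline ) / baseline * 100;

	// Only an exact zero collapses to the dot; tiny nonzero deltas still print.
	if ( pct === 0 ) return '·';

	const sign = pct > 0 ? '+' : '';
	return `${ sign }${ pct.toFixed( 1 ) }%`;

}

console.log( formatDelta( 100, 100 ), formatDelta( 100, 90 ) );
```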
See commit message
This contribution is funded by Spawn