WebGPURenderer: Reduce per-frame allocations in render & compute paths#33419
Open
RenaudRohlinger wants to merge 17 commits into mrdoob:dev from
Conversation
- beginCompute: cache pass descriptor and label; avoid per-dispatch map/join.
- clear: drop unused colorAttachments array literal.
- beginRender: cache occlusionQuerySet descriptor across frames.
- _getRenderPassDescriptor: reuse scratch view descriptors instead of fresh literals.
- _createDepthLayerDescriptors: reuse per-layer descriptors in place; truncate stale depth views and layer descriptors when cameras.length shrinks.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
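The scratch-descriptor pattern behind the `_getRenderPassDescriptor` change can be sketched as follows. The names here are illustrative, not the exact three.js source; the key point is that `texture.createView()` reads its descriptor synchronously and does not retain it, so one module-scoped object can be mutated and handed out on every call instead of allocating a fresh literal:

```javascript
// Module-scoped scratch descriptor, reused across frames (hypothetical fields).
const _viewDescriptor = {
	dimension: '2d',
	baseMipLevel: 0,
	mipLevelCount: 1
};

// Mutate in place and return the same object every call — safe because the
// consumer (e.g. createView) reads it synchronously and never keeps a reference.
function getScratchViewDescriptor( mipLevel ) {

	_viewDescriptor.baseMipLevel = mipLevel;
	return _viewDescriptor;

}
```

Every call returns the same object identity, so the per-call allocation disappears entirely after module load.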
- Renderer.compute: cache a singleton array for single-node dispatches instead of allocating [ computeNode ] per call.
- WebGPUTimestampQueryPool._resolveQueries: reuse scratch submit array.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- test/e2e/perf-regression.js: run a WebGPU example and capture JS heap, GC, frame timing, and WebGPU resource deltas (buffers, textures, bind groups, pipelines, submits/render passes/compute passes, uncaptured errors).
- test/e2e/perf-regression-compare.js: diff two runs and print a clean A/B table.
- Self-contained — relies only on puppeteer plus the existing utils/server.js.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
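The A/B diffing step in a compare script like this can be approximated by the sketch below. `diffRuns` is a hypothetical name; the actual script formats the result as a printed table, but the core is just a per-metric numeric delta between two snapshot objects (e.g. from Puppeteer's `page.metrics()`):

```javascript
// Diff two metric snapshots, keeping only numeric fields present in both runs.
function diffRuns( before, after ) {

	const rows = [];

	for ( const key of Object.keys( before ) ) {

		if ( typeof before[ key ] !== 'number' || typeof after[ key ] !== 'number' ) continue;

		rows.push( {
			metric: key,
			before: before[ key ],
			after: after[ key ],
			delta: after[ key ] - before[ key ]
		} );

	}

	return rows;

}
```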
📦 Bundle size — Full ESM build, minified and gzipped.
🌳 Bundle size after tree-shaking — Minimal build including a renderer, camera, empty scene, and dependencies.
These changes belong to mrdoob#33418 and were bundled into this branch by mistake.

- Renderer.js: drop `onError` / `_onError` hook.
- WebGPUBackend.js: drop `device.onuncapturederror` → `renderer.onError` bridge.
- WebGPUPipelineUtils.js: revert to dev (removes pipeline-label error messages and `_reportShaderDiagnostics`).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Mugen87
reviewed
Apr 19, 2026
- RenderContexts: drop the attachment-state cache; rebuild the template literal every call. Benchmarked three variants (instance cache, WeakMap cache, no cache) and the cache provided no net benefit. Addresses review comments about monkey-patching render targets and avoiding comparison complexity for a short string.
- WebGPUBackend: extract `_setTexelCopyInfo` and `_submit` module helpers to collapse repeated field-by-field mutation patterns. `_submit` replaces the three-line submit-array scratch pattern at 5 call sites; `_setTexelCopyInfo` collapses 10 field assignments in `copyTextureToTexture` into 2 helper calls. Addresses the readability comment on line 2641.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
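The `_submit` helper described above can be sketched like this (a minimal sketch under the assumptions in the commit message — one module-scoped scratch array replaces the fill/submit/clear triple at each call site):

```javascript
// One reusable single-slot array for queue.submit(), shared by all call sites.
const _scratchCmdBuffers = [ null ];

function _submit( queue, commandBuffer ) {

	_scratchCmdBuffers[ 0 ] = commandBuffer;
	queue.submit( _scratchCmdBuffers );
	_scratchCmdBuffers[ 0 ] = null; // don't retain the command buffer after submit

}
```

Clearing the slot after submitting matters: otherwise the scratch array would keep the last GPUCommandBuffer alive between frames.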
Replaces the `_setTexelCopyInfo` helper function with a `TexelCopyTextureInfo` class that exposes chainable `setTexture`/`setMipLevel`/`setOrigin` setters, matching the reviewer's suggestion on PR mrdoob#33419 and mirroring three.js's Matrix4/Vector3 conventions. The four `copyTextureToTexture` / `copyFramebufferToTexture` scratch descriptors are now instances of this class, so call sites document themselves (`.setTexture(...).setOrigin(...)`) rather than opaque positional arguments to an ad-hoc helper. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
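A sketch of that chainable wrapper, with field names following the `GPUTexelCopyTextureInfo` IDL dictionary (the exact three.js implementation may differ in details):

```javascript
// Reusable, chainable descriptor for copyTextureToTexture-style calls.
class TexelCopyTextureInfo {

	constructor() {

		this.texture = null;
		this.mipLevel = 0;
		this.origin = { x: 0, y: 0, z: 0 };

	}

	setTexture( texture ) {

		this.texture = texture;
		return this;

	}

	setMipLevel( mipLevel ) {

		this.mipLevel = mipLevel;
		return this;

	}

	setOrigin( x, y, z ) {

		this.origin.x = x;
		this.origin.y = y;
		this.origin.z = z;
		return this;

	}

}
```

A call site then reads as `_copyDst.setTexture( dstGPU ).setMipLevel( level ).setOrigin( x, y, z )` instead of positional arguments to an ad-hoc helper, while the instance itself is still a long-lived scratch object.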
The wrapper class for `GPUTexelCopyTextureInfo` belongs with the other texture utilities. Export it from `WebGPUTextureUtils.js` and import it into `WebGPUBackend.js`. No behavior change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The wrapper class for `GPUTexelCopyTextureInfo` is a standalone helper — move it out of `WebGPUTextureUtils.js` (which is about texture management) into `src/renderers/webgpu/utils/TexelCopyTextureInfo.js`, matching three.js's "one class per file" convention. No behavior change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The perf-regression scripts belong in a separate PR (or stay as personal tooling); this PR should be scoped to the renderer allocation fixes only. Drops `test/e2e/perf-regression.js`, `test/e2e/perf-regression-compare.js`, their README section, and the corresponding `.gitignore` entry. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…dup.
`WebGPUPipelineUtils.setPipeline` tracked the active pipeline per pass
encoder via a WeakMap. Pass encoders are created fresh every frame, so
each pipeline change fired `_activePipelines.set( pass, pipeline )` —
allocating a new entry hundreds of times per second. Sampled heap
profile attributed ~64 KB / 10 s to this one line on webgpu_backdrop_water.
We already track the active pipeline in `currentSets.pipeline` (render)
and can do the same on the compute group's data entry. Inline the
dedup at both call sites, delete the WeakMap, delete the method.
Bonus: use an Array for `currentSets.attributes` instead of `{}` and
reset with `.length = 0` — avoids the `for…in / delete` loop in
`_resetCurrentSets` that caused hidden-class transitions.
Measured on webgpu_backdrop_water (Inspector stripped, 10 s, 1 KB
sampling): `update @ Animation.js:70` subtree 402 KB → 341 KB (−15%).
`animate` child subtree 135 KB → 61 KB (−55%).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
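The inline dedup that replaced the WeakMap can be sketched as below. Names approximate the commit message, not the exact three.js source; the point is that an equality check against state the render loop already owns replaces a per-change `WeakMap.set` allocation, and that resetting the attributes Array with `.length = 0` avoids the `for…in / delete` loop:

```javascript
// State the render loop already tracks per pass (Array instead of {} for attributes).
const currentSets = { pipeline: null, attributes: [] };

// Inline dedup: only forward the call when the pipeline actually changed.
function bindPipeline( passEncoder, pipeline ) {

	if ( currentSets.pipeline === pipeline ) return; // already bound on this pass

	passEncoder.setPipeline( pipeline );
	currentSets.pipeline = pipeline;

}

// Reset for the next pass: no per-key delete, hidden class stays stable.
function _resetCurrentSets() {

	currentSets.pipeline = null;
	currentSets.attributes.length = 0;

}
```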
`getLights` allocated a fresh `{ renderId, lightsData }` object every
time `renderId` changed (i.e. once per frame per lightsNode). Mutate
the cached entry's fields in place instead, so the cache-miss path
doesn't allocate after the first frame.
`getLightsData` also allocated a fresh `const lights = []` per call;
change the signature to accept the output array so it can be reused.
Measured on webgpu_backdrop_water (Inspector stripped, 1 KB sampling):
site no longer appears in the top 30 allocators.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
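The two changes in this commit can be sketched together (hypothetical helper names and a simplified `lights` shape — the real `NodeMaterialObserver` code carries more fields): the cache entry's fields are mutated in place, and `getLightsData` fills a caller-owned array instead of allocating one:

```javascript
// Fill a caller-owned target array in place instead of allocating a fresh one.
function getLightsData( lights, target ) {

	target.length = 0;

	for ( let i = 0; i < lights.length; i ++ ) {

		target.push( lights[ i ].id );

	}

	return target;

}

// Cache-miss path: mutate the existing entry rather than building a new
// { renderId, lightsData } object every frame.
function updateCacheEntry( entry, renderId, lights ) {

	entry.renderId = renderId;
	getLightsData( lights, entry.lightsData );
	return entry;

}
```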
`addUniformUpdateRange` allocated a fresh `{start, count}` object every
frame, for every uniform whose value changed. On an animated scene with
dozens of per-object uniforms this was hundreds of allocations per
frame — the single largest remaining source of sampled-heap traffic
after the earlier backend fixes.
Keep the range objects in `_updateRangeCache` across frames; mark each
as `added: false` on `clearUpdateRanges()` so the next frame pushes the
same cached object back onto `updateRanges` instead of allocating a
new one.
Measured on webgpu_backdrop_water (Inspector stripped, 10 s, 1 KB
sampling, 3-run average):
total sampled heap 97.0 KB → 66.5 KB (-31.5%)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
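The pooling scheme described above can be sketched as follows. Property names follow the commit message; the real `UniformsGroup` code may differ, and this sketch keys the pool on the uniform object itself:

```javascript
// Pool of { start, count, added } range objects, kept alive across frames.
const _updateRangeCache = new Map();
const updateRanges = [];

function addUniformUpdateRange( uniform, start, count ) {

	let range = _updateRangeCache.get( uniform );

	if ( range === undefined ) {

		// Allocated once per uniform, then reused every frame.
		range = { start: 0, count: 0, added: false };
		_updateRangeCache.set( uniform, range );

	}

	range.start = start;
	range.count = count;

	if ( range.added === false ) {

		updateRanges.push( range );
		range.added = true;

	}

}

function clearUpdateRanges() {

	// Mark entries re-addable; the pooled objects themselves survive.
	for ( const range of updateRanges ) range.added = false;

	updateRanges.length = 0;

}
```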
…eagerly.

Two small per-frame allocation reductions in the rAF bucket:

- `update()` now calls `performance.now()` once instead of 2-3 times. Each call returns a non-Smi double that V8 boxes into a `HeapNumber` when stored on a property; fewer calls means fewer boxes per frame.
- `lastTime` is initialized to `performance.now()` in the constructor, so the property starts as a double rather than transitioning from `undefined`. Keeps V8's hidden class stable and allows the property to stay in double-unboxed storage across frames.

Measured on webgpu_backdrop_water (Inspector stripped, 1 KB sampling, 3-run average): `update @ Animation.js:70` attribution 15.98 KB → 11.80 KB (−26%). Total sampled heap unchanged within noise — V8 re-attributes the remaining bytes to other sites — but the `update` row specifically is the frontier that V8 inlining fuzz lets us touch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
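Both points reduce to a small shape, sketched here (this is an illustration of the pattern, not the actual `Animation.js` source):

```javascript
// Sketch: one performance.now() read per update, and lastTime seeded with a
// double in the constructor so the property never transitions from undefined.
class Animation {

	constructor() {

		this.lastTime = performance.now(); // property starts life as a double

	}

	update() {

		const time = performance.now(); // single read per frame

		const delta = time - this.lastTime;
		this.lastTime = time;

		return delta;

	}

}
```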
Mugen87
reviewed
Apr 20, 2026
Moving UniformsGroup range pooling and NodeMaterialObserver cache reuse into separate PRs so each can be reviewed independently by the relevant subsystem owners. The NodeFrame change is dropped entirely.

Reverts:
- 598a225 NodeFrame: performance.now() consolidation — dropped
- 47bcec4 UniformsGroup: Pool per-uniform update-range — moved to own PR
- dbad1d4 NodeMaterialObserver: Reuse lightsData cache entry — moved to own PR

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Collaborator
Author
Everything should be ready for review. Overall a pretty big win, with a total sampled-heap reduction of about 14–17% on the same workload. Related PR:
…er starts.

Fixes 'No pipeline set' on compute dispatches in scenes with repeating compute calls (e.g. webgpu_compute_texture_pingpong).

The inline pipeline dedup introduced in 2e164b2 tracked the active pipeline on the compute group's data entry, but each `beginCompute` creates a new pass encoder — so the tracker would still match last frame's pipeline and skip the `setPipeline` call on the fresh encoder.

Reset `groupGPU.currentPipeline = null` whenever we recreate the command encoder in `beginCompute`, so the first `compute()` call after each `beginCompute` always calls `setPipeline` on the new encoder. The render path already does this via `_resetCurrentSets` in `beginRender`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
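The bug and fix can be reduced to this sketch (names follow the commit message, not the exact backend source): the dedup state lives on the group, but its validity is scoped to one encoder, so it must be cleared whenever the encoder is recreated:

```javascript
// The fix: a fresh encoder invalidates the tracked pipeline, forcing the
// first dispatch after beginCompute to call setPipeline again.
function beginCompute( groupGPU, device ) {

	groupGPU.cmdEncoderGPU = device.createCommandEncoder();
	groupGPU.currentPipeline = null;

}

// Inline dedup on the compute path, mirroring the render path's currentSets.
function compute( groupGPU, passEncoder, pipeline ) {

	if ( groupGPU.currentPipeline !== pipeline ) {

		passEncoder.setPipeline( pipeline );
		groupGPU.currentPipeline = pipeline;

	}

}
```

Without the `currentPipeline = null` reset, the second `beginCompute` would leave last frame's pipeline in the tracker and the fresh encoder would never receive a `setPipeline` call.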
This was referenced Apr 20, 2026
Mugen87
reviewed
Apr 20, 2026
Mugen87
reviewed
Apr 21, 2026
const _copyDst = new TexelCopyTextureInfo();
const _copySize = [ 0, 0 ];

const _texCopyEncoderOptions = { label: '' };
Collaborator
I think the code gets clearer if we introduce more descriptor classes. For that, I would suggest creating a new directory src/renderers/webgpu/descriptors and putting the class definitions (including TexelCopyTextureInfo) there.
Here is a list with possible types for the new module scope objects:
- _texCopyEncoderOptions: GPUCommandEncoderDescriptor
- _clearEncoderOptions: GPUCommandEncoderDescriptor
- _viewDescriptor: GPUTextureViewDescriptor
- _depthViewOptions: GPUTextureViewDescriptor
- _clearPassDescriptor: GPURenderPassDescriptor
- _timestampWrites: GPURenderPassTimestampWrites
Related:
- NodeMaterialObserver: Reuse `lightsData` cache entry per frame. #33425

Pre-allocates WebGPU descriptors, state-tracking objects, and scratch arrays on the render/compute hot paths so they're reused each frame instead of re-created per call. Also removes a `WeakMap.set` dedup pattern (`_activePipelines`) in favor of inline state the render loop already tracks.

Introduces a `TexelCopyTextureInfo` class wrapping the WebGPU IDL type, used as a reusable scratch for `copyTextureToTexture` / `copyFramebufferToTexture`.

Before / after — webgpu_backdrop_water, 10 s window, Inspector stripped, 1 KB sampling, 3-run average:

Total sampled heap

Net of sites this PR actually touches: ~−65 KB. The remaining ~6 KB of the total delta comes from V8's sampler re-attributing hits across the render chain after the descriptor / submit-array allocations disappear — not individually causal, only visible in aggregate.

WebGPU resource deltas (interceptor-counted)

Combined impact with follow-up PRs

Two smaller allocation-reduction changes were spun off into their own PRs:
- perf/uniforms-group-range-pooling — pool per-uniform `{ start, count }` objects in UniformsGroup
- perf/node-material-observer-lights-cache — reuse cached `lightsData` entry in NodeMaterialObserver

Applied together with this PR, total sampled-heap reduction is ~14-17% on the same workload (exact figure drifts ±3% between sessions due to sampler noise). Most of the win is from this PR's `_activePipelines` change alone; the follow-ups are structural correctness improvements that don't produce a measurable additional drop in isolation.

This contribution is funded by Spawn