
WebGPURenderer: Reduce per-frame allocations in render & compute paths#33419

Open
RenaudRohlinger wants to merge 17 commits into mrdoob:dev from RenaudRohlinger:fix-webgpu-memory-leaks

RenaudRohlinger (Collaborator) commented Apr 19, 2026

Related:

Pre-allocates WebGPU descriptors, state-tracking objects, and scratch arrays on the render/compute hot paths so they're reused each frame instead of re-created per call. Also removes a WeakMap.set dedup pattern (_activePipelines) in favor of inline state the render loop already tracks.
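The reuse pattern comes down to mutating one module-scope object per call instead of building a fresh literal. Here is a minimal sketch that assumes a compute-pass descriptor carrying only a `label`. `_computePassDescriptor` and this `beginComputePass` wrapper are hypothetical names and do not come from the PR. The pattern is safe because WebIDL converts dictionary arguments at call time, so the browser does not hold a reference to the JS object afterwards.

```js
// Hypothetical sketch: one module-scope descriptor reused for every compute pass.
// WebIDL converts the dictionary during the call, so mutating it later is safe.
const _computePassDescriptor = { label: '' };

function beginComputePass( encoder, label ) {

	// Mutate the scratch object instead of allocating `{ label }` per dispatch.
	_computePassDescriptor.label = label;

	return encoder.beginComputePass( _computePassDescriptor );

}
```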

Introduces a TexelCopyTextureInfo class that wraps the WebGPU GPUTexelCopyTextureInfo dictionary. It serves as a reusable scratch descriptor for copyTextureToTexture / copyFramebufferToTexture.
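Below is one possible shape for such a wrapper. The fields follow the `GPUTexelCopyTextureInfo` dictionary in the WebGPU spec (`texture`, `mipLevel`, `origin`, `aspect`). The chainable `setTexture` / `setMipLevel` / `setOrigin` setters are the ones named later in the commit history. The method bodies and the module-scope `_copySrc` are assumptions, not the PR's code.

```js
// Sketch only: mirrors the fields of WebGPU's GPUTexelCopyTextureInfo dictionary.
class TexelCopyTextureInfo {

	constructor() {

		this.texture = null;
		this.mipLevel = 0;
		this.origin = { x: 0, y: 0, z: 0 }; // GPUOrigin3DDict
		this.aspect = 'all';

	}

	setTexture( texture ) {

		this.texture = texture;
		return this;

	}

	setMipLevel( mipLevel ) {

		this.mipLevel = mipLevel;
		return this;

	}

	// Mutates the existing origin object so no new literal is allocated per copy.
	setOrigin( x = 0, y = 0, z = 0 ) {

		this.origin.x = x;
		this.origin.y = y;
		this.origin.z = z;
		return this;

	}

}

// Module-scope scratch instances, reused for every copy.
const _copySrc = new TexelCopyTextureInfo();
const _copyDst = new TexelCopyTextureInfo();
```

A call site could then read `encoder.copyTextureToTexture( _copySrc.setTexture( srcTexture ).setOrigin( x, y ), _copyDst.setTexture( dstTexture ), copySize )`, where `srcTexture`, `dstTexture` and `copySize` are placeholders.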

Before / after — webgpu_backdrop_water, 10 s window, Inspector stripped, 1 KB sampling, 3-run average

Total sampled heap

| Branch | Sampled heap |
|---|---|
| dev | 411.06 KB |
| this PR | 340.51 KB |
| Δ | −70.55 KB (−17.2%) |

Counting only the sites this PR touches, the reduction is ~−65 KB. The remaining ~6 KB of the total delta comes from V8's sampler re-attributing hits across the render chain once the descriptor / submit-array allocations disappear. That part is not individually causal and only shows up in aggregate.

WebGPU resource deltas (interceptor-counted)

| Metric | dev | this PR |
|---|---|---|
| Live buffers, Δ over 10 s | +16 | +16 (preexisting, see #33413) |
| Bind groups, Δ over 10 s | +32 | +32 (preexisting, see #33413) |

Combined impact with follow-up PRs

Two smaller allocation-reduction changes were spun off into their own PRs:

  • perf/uniforms-group-range-pooling — pool per-uniform { start, count } objects in UniformsGroup
  • perf/node-material-observer-lights-cache — reuse cached lightsData entry in NodeMaterialObserver

Applied together with this PR, total sampled-heap reduction is ~14-17% on the same workload (exact figure drifts ±3% between sessions due to sampler noise). Most of the win is from this PR's _activePipelines change alone; the follow-ups are structural correctness improvements that don't produce a measurable additional drop in isolation.

This contribution is funded by Spawn

RenaudRohlinger and others added 4 commits April 19, 2026 15:08
- beginCompute: cache pass descriptor and label; avoid per-dispatch map/join.
- clear: drop unused colorAttachments array literal.
- beginRender: cache occlusionQuerySet descriptor across frames.
- _getRenderPassDescriptor: reuse scratch view descriptors instead of fresh literals.
- _createDepthLayerDescriptors: reuse per-layer descriptors in place; truncate stale depth views and layer descriptors when cameras.length shrinks.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
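The `_createDepthLayerDescriptors` item above (reuse in place, truncate on shrink) can be sketched like this. `syncLayerDescriptors` and its `fill` callback are hypothetical helpers written for illustration. Only the pattern is taken from the commit.

```js
// Hypothetical sketch: one descriptor per layer, updated in place, truncated on shrink.
function syncLayerDescriptors( descriptors, layerCount, fill ) {

	for ( let i = 0; i < layerCount; i ++ ) {

		// Allocate only the first time a layer index is seen.
		if ( descriptors[ i ] === undefined ) descriptors[ i ] = {};

		fill( descriptors[ i ], i );

	}

	// Drop stale entries when the layer count (e.g. cameras.length) shrinks.
	if ( descriptors.length > layerCount ) descriptors.length = layerCount;

	return descriptors;

}
```

For example, `fill` could set `baseArrayLayer` and `arrayLayerCount` (real `GPUTextureViewDescriptor` members) for each layer.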
- Renderer.compute: cache a singleton array for single-node dispatches instead of allocating [ computeNode ] per call.
- WebGPUTimestampQueryPool._resolveQueries: reuse scratch submit array.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
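The singleton-array change to `Renderer.compute` might look like the minimal sketch below. `toNodeList` and `_singleNodeArray` are hypothetical names. Because the returned array is shared, a caller must consume it before the next call and never store it.

```js
// Hypothetical sketch: reuse one single-element array instead of allocating [ computeNode ] per call.
const _singleNodeArray = [ null ];

function toNodeList( computeNodes ) {

	// Arrays are passed through untouched.
	if ( Array.isArray( computeNodes ) ) return computeNodes;

	// A single node is written into the shared scratch array.
	_singleNodeArray[ 0 ] = computeNodes;
	return _singleNodeArray;

}
```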
- test/e2e/perf-regression.js: run a WebGPU example and capture JS heap, GC, frame timing, and WebGPU resource deltas (buffers, textures, bind groups, pipelines, submits/render passes/compute passes, uncaptured errors).
- test/e2e/perf-regression-compare.js: diff two runs and print a clean A/B table.
- Self-contained — relies only on puppeteer plus the existing utils/server.js.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
github-actions (bot) commented Apr 19, 2026

📦 Bundle size

Full ESM build, minified and gzipped.

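| Build | Before (min / gzip, kB) | After (min / gzip, kB) | Diff (min / gzip) |
|---|---|---|---|
| WebGL | 365.3 / 86.77 | 365.3 / 86.77 | +0 B / +0 B |
| WebGPU | 638.99 / 177.4 | 642.25 / 178.29 | +3.26 kB / +893 B |
| WebGPU Nodes | 637.11 / 177.11 | 640.36 / 178 | +3.26 kB / +896 B |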

🌳 Bundle size after tree-shaking

Minimal build including a renderer, camera, empty scene, and dependencies.

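| Build | Before (min / gzip, kB) | After (min / gzip, kB) | Diff (min / gzip) |
|---|---|---|---|
| WebGL | 497.82 / 121.45 | 497.82 / 121.45 | +0 B / +0 B |
| WebGPU | 710.84 / 192.25 | 714.1 / 193.14 | +3.26 kB / +892 B |
| WebGPU Nodes | 660.06 / 179.57 | 663.31 / 180.48 | +3.26 kB / +912 B |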

RenaudRohlinger added this to the r185 milestone Apr 19, 2026
These changes belong to mrdoob#33418 and were bundled into this branch by mistake.

- Renderer.js: drop `onError` / `_onError` hook.
- WebGPUBackend.js: drop `device.onuncapturederror` → `renderer.onError` bridge.
- WebGPUPipelineUtils.js: revert to dev (removes pipeline-label error messages and `_reportShaderDiagnostics`).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Review thread on src/renderers/common/RenderContexts.js (outdated)
Review thread on src/renderers/webgpu/WebGPUBackend.js
RenaudRohlinger and others added 9 commits April 20, 2026 16:17
- RenderContexts: drop attachment-state cache; rebuild the template literal
  every call. Benchmarked three variants (instance cache, WeakMap cache,
  no cache) and the cache provided no net benefit. Addresses review
  comments about monkey-patching render targets and avoiding comparison
  complexity for a short string.
- WebGPUBackend: extract `_setTexelCopyInfo` and `_submit` module helpers
  to collapse repeated field-by-field mutation patterns. `_submit`
  replaces the three-line submit-array scratch pattern at 5 call sites;
  `_setTexelCopyInfo` collapses 10 field assignments in
  `copyTextureToTexture` into 2 helper calls. Addresses the readability
  comment on line 2641.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
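Here is a sketch of a `_submit` helper built on a scratch array. The helper name comes from the commit above. Its signature, and the guess that the "three-line pattern" is set / submit / clear, are assumptions rather than the PR's code.

```js
// Hypothetical sketch of a submit helper over one reusable single-element array.
const _submitArray = [ null ];

function _submit( queue, commandBuffer ) {

	_submitArray[ 0 ] = commandBuffer;
	queue.submit( _submitArray );

	// Clear the slot so the scratch array does not keep the command buffer alive.
	_submitArray[ 0 ] = null;

}
```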
Replaces the `_setTexelCopyInfo` helper function with a `TexelCopyTextureInfo`
class that exposes chainable `setTexture`/`setMipLevel`/`setOrigin` setters,
matching the reviewer's suggestion on PR mrdoob#33419 and mirroring three.js's
Matrix4/Vector3 conventions. The four `copyTextureToTexture` /
`copyFramebufferToTexture` scratch descriptors are now instances of this
class, so call sites document themselves (`.setTexture(...).setOrigin(...)`)
rather than opaque positional arguments to an ad-hoc helper.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The wrapper class for `GPUTexelCopyTextureInfo` belongs with the other
texture utilities. Export it from `WebGPUTextureUtils.js` and import it
into `WebGPUBackend.js`. No behavior change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The wrapper class for `GPUTexelCopyTextureInfo` is a standalone helper — move
it out of `WebGPUTextureUtils.js` (which is about texture management) into
`src/renderers/webgpu/utils/TexelCopyTextureInfo.js`, matching three.js's
"one class per file" convention. No behavior change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The perf-regression scripts belong in a separate PR (or stay as personal
tooling); this PR should be scoped to the renderer allocation fixes only.
Drops `test/e2e/perf-regression.js`, `test/e2e/perf-regression-compare.js`,
their README section, and the corresponding `.gitignore` entry.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…dup.

`WebGPUPipelineUtils.setPipeline` tracked the active pipeline per pass
encoder via a WeakMap. Pass encoders are created fresh every frame, so
each pipeline change fired `_activePipelines.set( pass, pipeline )` —
allocating a new entry hundreds of times per second. Sampled heap
profile attributed ~64 KB / 10 s to this one line on webgpu_backdrop_water.

We already track the active pipeline in `currentSets.pipeline` (render)
and can do the same on the compute group's data entry. Inline the
dedup at both call sites, delete the WeakMap, delete the method.

Bonus: use an Array for `currentSets.attributes` instead of `{}` and
reset with `.length = 0` — avoids the `for…in / delete` loop in
`_resetCurrentSets` that caused hidden-class transitions.

Measured on webgpu_backdrop_water (Inspector stripped, 10 s, 1 KB
sampling): `update @ Animation.js:70` subtree 402 KB → 341 KB (−15%).
`animate` child subtree 135 KB → 61 KB (−55%).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
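The inline dedup and the array-based attribute tracking can be sketched as follows. The sketch assumes a single `currentSets` object that is reset at the start of every render pass, which is what `_resetCurrentSets` does in `beginRender`. `setPipelineIfChanged` and `resetCurrentSets` are hypothetical helpers. In the PR itself, the check is inlined at the call sites.

```js
// Hypothetical sketch: per-pass state object instead of a WeakMap keyed by the pass encoder.
const currentSets = { pipeline: null, attributes: [] };

function resetCurrentSets() {

	currentSets.pipeline = null;

	// Truncating keeps the same array and avoids a for...in / delete loop on an object.
	currentSets.attributes.length = 0;

}

function setPipelineIfChanged( pass, pipeline ) {

	// Only call into WebGPU when the bound pipeline actually changes.
	if ( currentSets.pipeline !== pipeline ) {

		pass.setPipeline( pipeline );
		currentSets.pipeline = pipeline;

	}

}
```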
`getLights` allocated a fresh `{ renderId, lightsData }` object every
time `renderId` changed (i.e. once per frame per lightsNode). Mutate
the cached entry's fields in place instead, so the cache-miss path
doesn't allocate after the first frame.

`getLightsData` also allocated a fresh `const lights = []` per call;
change the signature to accept the output array so it can be reused.

Measured on webgpu_backdrop_water (Inspector stripped, 1 KB sampling):
site no longer appears in the top 30 allocators.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
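Here is a sketch of both changes: the cache entry is mutated in place, and the output array is passed in by the caller. `getLights` and `getLightsData` are named in the commit. The WeakMap keyed by lights node, the `lights` array on the node, and the `id` values collected are assumptions for illustration only.

```js
// Hypothetical sketch: a per-node cache entry that is updated in place on each new renderId.
const _cache = new WeakMap();

function getLights( lightsNode, renderId ) {

	let entry = _cache.get( lightsNode );

	if ( entry === undefined ) {

		// Only the first call for a given lightsNode allocates.
		entry = { renderId: - 1, lightsData: [] };
		_cache.set( lightsNode, entry );

	}

	if ( entry.renderId !== renderId ) {

		// New frame: mutate the existing entry instead of replacing it.
		entry.renderId = renderId;
		getLightsData( lightsNode.lights, entry.lightsData );

	}

	return entry.lightsData;

}

function getLightsData( lights, target ) {

	// Write into the caller-provided array instead of allocating a new one.
	target.length = 0;

	for ( const light of lights ) target.push( light.id );

	return target;

}
```

Because the cached array is refilled in place, a caller must not keep the previous frame's result and expect it to stay unchanged.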
`addUniformUpdateRange` allocated a fresh `{start, count}` object every
frame, for every uniform whose value changed. On an animated scene with
dozens of per-object uniforms this was hundreds of allocations per
frame — the single largest remaining source of sampled-heap traffic
after the earlier backend fixes.

Keep the range objects in `_updateRangeCache` across frames; mark each
as `added: false` on `clearUpdateRanges()` so the next frame pushes the
same cached object back onto `updateRanges` instead of allocating a
new one.

Measured on webgpu_backdrop_water (Inspector stripped, 10 s, 1 KB
sampling, 3-run average):
  total sampled heap  97.0 KB → 66.5 KB  (-31.5%)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
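The pooling scheme can be sketched as shown below. `addUniformUpdateRange`, `clearUpdateRanges`, `updateRanges`, `_updateRangeCache` and the `added` flag are the names used in the commit. Three things are simplifying assumptions: the `UpdateRangePool` class, keying the cache by uniform object, and overwriting (rather than merging) a range that was already added this frame.

```js
// Hypothetical sketch of pooled { start, count } update ranges.
class UpdateRangePool {

	constructor() {

		this.updateRanges = [];
		this._updateRangeCache = new Map(); // uniform -> cached range object

	}

	addUniformUpdateRange( uniform, start, count ) {

		let range = this._updateRangeCache.get( uniform );

		if ( range === undefined ) {

			// Allocated once per uniform, then reused every frame.
			range = { start: 0, count: 0, added: false };
			this._updateRangeCache.set( uniform, range );

		}

		range.start = start;
		range.count = count;

		// Push the cached object at most once per frame.
		if ( range.added === false ) {

			range.added = true;
			this.updateRanges.push( range );

		}

	}

	clearUpdateRanges() {

		for ( const range of this.updateRanges ) range.added = false;

		this.updateRanges.length = 0;

	}

}
```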
…eagerly.

Two small per-frame allocation reductions in the rAF bucket:

- `update()` now calls `performance.now()` once instead of 2-3 times.
  Each call returns a non-Smi double that V8 boxes into a `HeapNumber`
  when stored on a property; fewer calls means fewer boxes per frame.

- `lastTime` is initialized to `performance.now()` in the constructor,
  so the property starts as a double rather than transitioning from
  `undefined`. Keeps V8's hidden class stable and allows the property
  to stay in double-unboxed storage across frames.

Measured on webgpu_backdrop_water (Inspector stripped, 1 KB sampling,
3-run average): `update @ Animation.js:70` attribution 15.98 KB → 11.80 KB
(−26%). Total sampled heap is unchanged within noise, because V8 re-attributes
the remaining bytes to other sites. Given how V8 inlining blurs attribution,
the `update` row is the closest site whose cost we can reliably measure and affect.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
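Here is a minimal sketch of the two ideas. It uses a hypothetical `FrameClock` class rather than the actual NodeFrame code. `lastTime` is seeded with a number in the constructor, and `update()` reads the clock once and derives every value from that reading.

```js
// Hypothetical sketch: read the clock once per update and start lastTime as a number.
class FrameClock {

	constructor() {

		// Initialised to a double up front, so the property never transitions from undefined.
		this.lastTime = performance.now();
		this.time = 0;
		this.deltaTime = 0;

	}

	update() {

		// One performance.now() call per frame, reused for every derived value.
		const now = performance.now();

		this.deltaTime = ( now - this.lastTime ) / 1000;
		this.time += this.deltaTime;
		this.lastTime = now;

	}

}
```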
Review thread on src/nodes/core/NodeFrame.js (outdated)
Review thread on src/materials/nodes/manager/NodeMaterialObserver.js (outdated)
RenaudRohlinger marked this pull request as draft April 20, 2026 11:33
Moving UniformsGroup range pooling and NodeMaterialObserver cache
reuse into separate PRs so each can be reviewed independently by the
relevant subsystem owners. NodeFrame change is dropped entirely.

Reverts:
  598a225 NodeFrame: performance.now() consolidation — dropped
  47bcec4 UniformsGroup: Pool per-uniform update-range — moved to own PR
  dbad1d4 NodeMaterialObserver: Reuse lightsData cache entry — moved to own PR

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
RenaudRohlinger marked this pull request as ready for review April 20, 2026 12:14

RenaudRohlinger (Collaborator, Author) commented:

@Mugen87 @sunag

Everything should be ready for review. Overall, this is a pretty big win: together with the follow-up PRs, total sampled heap drops by ~14-17% on the same workload.

Related PR:

…er starts.

Fixes 'No pipeline set' on compute dispatches in scenes with repeating
compute calls (e.g. webgpu_compute_texture_pingpong). The inline pipeline
dedup introduced in 2e164b2 tracked the active pipeline on the compute
group's data entry, but each `beginCompute` creates a new pass encoder —
so the tracker would still match last frame's pipeline and skip the
`setPipeline` call on the fresh encoder.

Reset `groupGPU.currentPipeline = null` whenever we recreate the command
encoder in `beginCompute`, so the first `compute()` call after each
`beginCompute` always calls `setPipeline` on the new encoder. Render
path already does this via `_resetCurrentSets` in `beginRender`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
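Here is a sketch of the bug and its fix. `currentPipeline` comes from the commit. `cmdEncoderGPU` and `passEncoderGPU` on a plain `groupGPU` object are assumed names. Without the `currentPipeline = null` line, the first `compute()` on the second encoder would see a matching pipeline and skip `setPipeline`, which produces the "No pipeline set" failure described above.

```js
// Hypothetical sketch: clear the tracked pipeline whenever a fresh encoder is created.
function beginCompute( groupGPU, device ) {

	groupGPU.cmdEncoderGPU = device.createCommandEncoder();
	groupGPU.passEncoderGPU = groupGPU.cmdEncoderGPU.beginComputePass();

	// The new pass has no pipeline bound yet, so last frame's tracker must not match.
	groupGPU.currentPipeline = null;

}

function compute( groupGPU, pipeline ) {

	const pass = groupGPU.passEncoderGPU;

	// Inline dedup: only bind when the pipeline differs from the one tracked for this pass.
	if ( groupGPU.currentPipeline !== pipeline ) {

		pass.setPipeline( pipeline );
		groupGPU.currentPipeline = pipeline;

	}

	pass.dispatchWorkgroups( 1 );

}
```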
Review thread on src/renderers/common/Renderer.js
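```js
// Module-scope scratch objects, reused across calls instead of fresh literals.
// TexelCopyTextureInfo is defined in src/renderers/webgpu/utils/TexelCopyTextureInfo.js.
const _copyDst = new TexelCopyTextureInfo();
const _copySize = [ 0, 0 ]; // copy extent as [ width, height ]

const _texCopyEncoderOptions = { label: '' }; // reusable encoder descriptor; label is updated in place
```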
Mugen87 (Collaborator) commented Apr 21, 2026

I think the code gets clearer if we introduce more descriptor classes. For that, I would suggest creating a new directory src/renderers/webgpu/descriptors and putting the class definitions (including TexelCopyTextureInfo) there.

Here is a list with possible types for the new module scope objects:
