perf(web): replace softbuffer with direct put_image_data canvas present#1374
Conversation
The render path converted each dirty region RGBA -> u32 `0RGB`, then let softbuffer repack u32 -> RGBA into a freshly allocated buffer every frame — two pixel passes over the whole surface plus a per-frame allocation. Replace it with the canvas's own 2D context: one copy of the region into a reused RGBA scratch (alpha forced opaque) followed by put_image_data at the region origin. softbuffer is dropped from ironrdp-web (still used by ironrdp-viewer). Mirrors the same fix in IronVNC. Measured with a record/replay draw bench (dev wasm, headless Chromium), draw-stage median: 4K 1706ms -> 83ms (~20x), 1080p 705ms -> 14ms (~50x), with byte-identical canvas output and unchanged framebuffer checksums.
|
Same as VNC, remove softbuffer |
There was a problem hiding this comment.
Pull request overview
This PR updates the ironrdp-web rendering path to remove the softbuffer dependency and present updated regions by uploading RGBA buffers directly to an HTML canvas via CanvasRenderingContext2d::put_image_data, aiming to reduce per-frame work and allocations.
Changes:
- Replaces the softbuffer-based canvas present path with direct
ImageData+put_image_datablits for dirty regions. - Removes the
softbufferdependency fromironrdp-web. - Enables additional
web-sysfeatures needed for 2D canvas rendering (CanvasRenderingContext2d,ImageData).
Reviewed changes
Copilot reviewed 2 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| crates/ironrdp-web/src/canvas.rs | Implements the new 2D-context put_image_data rendering path and removes the softbuffer surface logic. |
| crates/ironrdp-web/Cargo.toml | Drops softbuffer dependency; enables required web-sys features for 2D canvas + ImageData. |
| Cargo.lock | Updates lockfile dependency edges to reflect removal of softbuffer from ironrdp-web. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Address review on the canvas present path: the buffer-grows branch initialized the scratch via spare_capacity_mut + set_len and then unconditionally re-copied the whole buffer, writing every byte twice on the grow path behind an unsafe block that bought nothing. Replace it with clear() + extend_from_slice: a single copy that writes straight into spare capacity (no zero-fill), reuses the allocation in steady state, and removes the unsafe entirely. Also fix doc comments that referenced a non-existent in-crate benchmark and reword the resize and context_2d docs to be accurate.
|
Alex Yusiuk (@RRRadicalEdward) Benoît Cortier (@CBenoit) after a closer look, I realized that this rgba buffer might not be needed to begin with, so removed it, no allocation/initialization issue anymore. It's even 20% faster than what we have in the first draft! |
extract_partial_image already returns an owned, region-sized RGBA copy, so the reusable rgba scratch in Canvas was a redundant second buffer. Force opaque alpha directly on that buffer and hand it straight to put_image_data; Canvas no longer keeps any per-frame state. This removes the scratch allocation, its copy, and the whole zero-fill / set_len question along with it. draw takes &mut [u8] now (the buffer is the caller's throwaway copy, safe to mutate). Output is byte-identical (replay bench: framebuffer CRC and canvas FNV-1a unchanged) and the present path is ~20-30% faster than the buffered version.
Benoît Cortier (CBenoit)
left a comment
There was a problem hiding this comment.
We’re getting at a good place!
| /// This replaced a softbuffer-backed path that converted RGBA -> u32 `0RGB` (our pass) and then let | ||
| /// softbuffer repack u32 -> RGBA per frame into a freshly allocated buffer — two pixel passes over | ||
| /// the whole surface plus a per-frame allocation. The direct path drops the u32 round-trip and the | ||
| /// per-frame allocation, measuring an order of magnitude faster present at 4K with byte-identical | ||
| /// canvas output. Mirrors the same fix in IronVNC. |
There was a problem hiding this comment.
Keeping some historical documentation on how we ended up converging may be okay in order to justify the current design, but it currently sounds like a "prompt artifact": "Mirrors the same fix in IronVNC" is completely irrelevant outside of this PR.
| for pixel in buffer.chunks_exact_mut(4) { | ||
| pixel[3] = 0xFF; | ||
| } |
There was a problem hiding this comment.
question: Just challenging the established pattern here: do we really need to force opaque alpha? I assume that if we don’t do that we end up with visual artifacts? My understanding is that we are just extracting a sub-image from an otherwise fully rendered image, and that we don’t really need any extra cleaning step.
There was a problem hiding this comment.
Benoît Cortier (@CBenoit) yeah, works for most cases but it's not guaranteed. Framebuffer starts zero-filled (here) and the QOI-RGBA path keeps source alpha (here) — plus a tall update gets widened to full width, so early / post-resize frames can upload not-yet-painted columns as transparent.
Best case we'd guarantee opaque upstream (init the framebuffer to 0xff + clamp apply_rgba32), but for the scope of this PR I think it's better to keep the force and fix it in a follow-up. Sound good?
There was a problem hiding this comment.
Sounds good to me! My only suggestion then is to track this with an issue for visibility
Alex Yusiuk (RRRadicalEdward)
left a comment
There was a problem hiding this comment.
Good job 👍.
Benoit already pointed out good things. I have nothing to add
Could you apply this method to IronVNC as well? |
extract_partial_image now fills a caller-owned WriteBuf (unfilled_to + advance) instead of allocating a fresh Vec on every call. session.rs keeps one buffer across frames and clears it per region, so steady-state draws don't allocate. Adds WriteBuf::filled_mut for the in-place alpha fixup. Also inline the single-call-site blit helper into draw, and reword the canvas/extract docs to describe the types and their contracts rather than the calling code.
Cut the verbose doc blocks down to the non-obvious rationale (why force alpha, the whole-rows widening, the WriteBuf clear-between-regions contract), 2 lines max each.
|
Updated, we now use WriteBuf to achive effectively |
| region: InclusiveRectangle, | ||
| buffer: &mut WriteBuf, | ||
| ) -> InclusiveRectangle { | ||
| // PERF: needs actual benchmark to find a better heuristic |
There was a problem hiding this comment.
Thought: Since you have the tools for benchmarking, I think it may be worth to look into this comment as well. As a follow up 🙂
Benoît Cortier (CBenoit)
left a comment
There was a problem hiding this comment.
LGTM! Good job! Before we merge, let's measure the final gains and clean up the final commit body so it matches the result we ended up having 🙂
d3705af
into
master
Summary
The web client presented frames through
softbuffer, whose web backend repacksthe whole surface (RGBA → u32 → RGBA into a fresh buffer) on every present.
This replaces it with a direct
put_image_datathat uploads only the dirtyregion, and drops the
softbufferdependency.Same idea as the IronVNC change.
What changed
softbufferdependency; present each dirty region withput_image_dataat its origin.extract_partial_imagefillsa single
WriteBufreused across frames, so steady-state draws don't allocate.WriteBuf::filled_muttoironrdp-core(mutable counterpart offilled).web-sys: addCanvasRenderingContext2d+ImageData, drop the softbuffer-onlyfeatures.
Performance
Draw-stage time on a 1080p replay (595 frames / 110 dirty regions), headless
Chromium, 8 measured passes × 3 runs, median. Both rows are reproducible branches
off the replay-bench harness; the only difference is the render path.
present_with_damagebench/draw-softbufferWriteBuf)bench/draw-zerocopysurface every present.
WriteBuf(vs a per-frame allocation) keeps the steady-state drawallocation-free; the remaining cost is the unavoidable
ImageDataJS copy.2d8e1b79matches the recordedground truth and the rendered-canvas FNV-1a is unchanged.
ratio held across runs.
Reproduce:
Correctness
put_image_datastores alpha verbatim, and the decoded framebuffer isn'tguaranteed opaque — it's zero-initialised, a widened whole-rows region can cover
not-yet-painted columns (alpha 0), and the QOI-RGBA path copies source alpha. So
we force alpha opaque before upload. A scan-then-conditionally-force was tried and
is slower than just forcing (the check touches the same bytes), so the
unconditional force stays.
Follow-up (separate PR)
Guarantee framebuffer opacity upstream in
ironrdp-session(init alpha to0xff+clamp
apply_rgba32); after that the web side can drop the alpha force entirely.