Skip to content

perf(web): replace softbuffer with direct put_image_data canvas present#1374

Merged
irvingouj@Devolutions (irvingoujAtDevolution) merged 7 commits into
masterfrom
perf/web-remove-softbuffer
Jun 25, 2026
Merged

perf(web): replace softbuffer with direct put_image_data canvas present#1374
irvingouj@Devolutions (irvingoujAtDevolution) merged 7 commits into
masterfrom
perf/web-remove-softbuffer

Conversation

@irvingoujAtDevolution

@irvingoujAtDevolution irvingouj@Devolutions (irvingoujAtDevolution) commented Jun 12, 2026

Copy link
Copy Markdown
Contributor

Summary

The web client presented frames through softbuffer, whose web backend repacks
the whole surface (RGBA → u32 → RGBA into a fresh buffer) on every present.
This replaces it with a direct put_image_data that uploads only the dirty
region, and drops the softbuffer dependency.

Same idea as the IronVNC change.

What changed

  • Remove the softbuffer dependency; present each dirty region with
    put_image_data at its origin.
  • No full-surface buffer and no per-region scratch. extract_partial_image fills
    a single WriteBuf reused across frames, so steady-state draws don't allocate.
  • Force opaque alpha before upload (kept — see Correctness).
  • Add WriteBuf::filled_mut to ironrdp-core (mutable counterpart of filled).
  • web-sys: add CanvasRenderingContext2d + ImageData, drop the softbuffer-only
    features.

Performance

Draw-stage time on a 1080p replay (595 frames / 110 dirty regions), headless
Chromium, 8 measured passes × 3 runs, median. Both rows are reproducible branches
off the replay-bench harness; the only difference is the render path.

Render path draw (ms) vs softbuffer branch
softbuffer present_with_damage ~1031 bench/draw-softbuffer
this PR (direct upload, reused WriteBuf) ~97 ~10.6× bench/draw-zerocopy
  • The win is structural: upload the dirty region instead of repacking the whole
    surface every present.
  • Reusing one WriteBuf (vs a per-frame allocation) keeps the steady-state draw
    allocation-free; the remaining cost is the unavoidable ImageData JS copy.
  • Output is byte-identical: framebuffer CRC32 2d8e1b79 matches the recorded
    ground truth and the rendered-canvas FNV-1a is unchanged.
  • Absolute ms carry ~±15% noise from machine load (decode drifted 1.5–1.9 s); the
    ratio held across runs.

Reproduce:

git checkout bench/draw-softbuffer   # or bench/draw-zerocopy
cd crates/ironrdp-web && wasm-pack build --target web --release -- --features bench
cd bench-harness && node run.mjs --capture /bench-corpus/<your>.irdprec --passes 8

Correctness

put_image_data stores alpha verbatim, and the decoded framebuffer isn't
guaranteed opaque — it's zero-initialised, a widened whole-rows region can cover
not-yet-painted columns (alpha 0), and the QOI-RGBA path copies source alpha. So
we force alpha opaque before upload. A scan-then-conditionally-force was tried and
is slower than just forcing (the check touches the same bytes), so the
unconditional force stays.

Follow-up (separate PR)

Guarantee framebuffer opacity upstream in ironrdp-session (init alpha to 0xff +
clamp apply_rgba32); after that the web side can drop the alpha force entirely.

The render path converted each dirty region RGBA -> u32 `0RGB`, then let
softbuffer repack u32 -> RGBA into a freshly allocated buffer every frame —
two pixel passes over the whole surface plus a per-frame allocation. Replace
it with the canvas's own 2D context: one copy of the region into a reused
RGBA scratch (alpha forced opaque) followed by put_image_data at the region
origin. softbuffer is dropped from ironrdp-web (still used by ironrdp-viewer).
Mirrors the same fix in IronVNC.

Measured with a record/replay draw bench (dev wasm, headless Chromium),
draw-stage median: 4K 1706ms -> 83ms (~20x), 1080p 705ms -> 14ms (~50x),
with byte-identical canvas output and unchanged framebuffer checksums.
@irvingoujAtDevolution

Copy link
Copy Markdown
Contributor Author

Same as VNC, remove softbuffer

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates the ironrdp-web rendering path to remove the softbuffer dependency and present updated regions by uploading RGBA buffers directly to an HTML canvas via CanvasRenderingContext2d::put_image_data, aiming to reduce per-frame work and allocations.

Changes:

  • Replaces the softbuffer-based canvas present path with direct ImageData + put_image_data blits for dirty regions.
  • Removes the softbuffer dependency from ironrdp-web.
  • Enables additional web-sys features needed for 2D canvas rendering (CanvasRenderingContext2d, ImageData).

Reviewed changes

Copilot reviewed 2 out of 3 changed files in this pull request and generated 2 comments.

File Description
crates/ironrdp-web/src/canvas.rs Implements the new 2D-context put_image_data rendering path and removes the softbuffer surface logic.
crates/ironrdp-web/Cargo.toml Drops softbuffer dependency; enables required web-sys features for 2D canvas + ImageData.
Cargo.lock Updates lockfile dependency edges to reflect removal of softbuffer from ironrdp-web.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread crates/ironrdp-web/src/canvas.rs Outdated
Comment thread crates/ironrdp-web/src/canvas.rs
Comment thread crates/ironrdp-web/src/canvas.rs Outdated
Comment thread crates/ironrdp-web/src/canvas.rs Outdated

@CBenoit Benoît Cortier (CBenoit) left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

Address review on the canvas present path: the buffer-grows branch
initialized the scratch via spare_capacity_mut + set_len and then
unconditionally re-copied the whole buffer, writing every byte twice on
the grow path behind an unsafe block that bought nothing.

Replace it with clear() + extend_from_slice: a single copy that writes
straight into spare capacity (no zero-fill), reuses the allocation in
steady state, and removes the unsafe entirely. Also fix doc comments
that referenced a non-existent in-crate benchmark and reword the resize
and context_2d docs to be accurate.
@irvingoujAtDevolution

irvingouj@Devolutions (irvingoujAtDevolution) commented Jun 25, 2026

Copy link
Copy Markdown
Contributor Author

Alex Yusiuk (@RRRadicalEdward) Benoît Cortier (@CBenoit) after a closer look, I realized that this rgba buffer might not be needed to begin with, so removed it, no allocation/initialization issue anymore. It's even 20% faster than what we have in the first draft!

extract_partial_image already returns an owned, region-sized RGBA copy,
so the reusable rgba scratch in Canvas was a redundant second buffer.
Force opaque alpha directly on that buffer and hand it straight to
put_image_data; Canvas no longer keeps any per-frame state.

This removes the scratch allocation, its copy, and the whole zero-fill /
set_len question along with it. draw takes &mut [u8] now (the buffer is
the caller's throwaway copy, safe to mutate). Output is byte-identical
(replay bench: framebuffer CRC and canvas FNV-1a unchanged) and the
present path is ~20-30% faster than the buffered version.

@CBenoit Benoît Cortier (CBenoit) left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We’re getting at a good place!

Comment thread crates/ironrdp-web/src/canvas.rs Outdated
Comment thread crates/ironrdp-web/src/canvas.rs Outdated
Comment on lines +18 to +22
/// This replaced a softbuffer-backed path that converted RGBA -> u32 `0RGB` (our pass) and then let
/// softbuffer repack u32 -> RGBA per frame into a freshly allocated buffer — two pixel passes over
/// the whole surface plus a per-frame allocation. The direct path drops the u32 round-trip and the
/// per-frame allocation, measuring an order of magnitude faster present at 4K with byte-identical
/// canvas output. Mirrors the same fix in IronVNC.

@CBenoit Benoît Cortier (CBenoit) Jun 25, 2026

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Keeping some historical documentation on how we ended up converging may be okay in order to justify the current design, but it currently sounds like a "prompt artifact": "Mirrors the same fix in IronVNC" is completely irrelevant outside of this PR.

Comment thread crates/ironrdp-web/src/canvas.rs Outdated
Comment on lines +54 to 56
for pixel in buffer.chunks_exact_mut(4) {
pixel[3] = 0xFF;
}

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: Just challenging the established pattern here: do we really need to force opaque alpha? I assume that if we don’t do that we end up with visual artifacts? My understanding is that we are just extracting a sub-image from an otherwise fully rendered image, and that we don’t really need any extra cleaning step.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Benoît Cortier (@CBenoit) yeah, works for most cases but it's not guaranteed. Framebuffer starts zero-filled (here) and the QOI-RGBA path keeps source alpha (here) — plus a tall update gets widened to full width, so early / post-resize frames can upload not-yet-painted columns as transparent.

Best case we'd guarantee opaque upstream (init the framebuffer to 0xff + clamp apply_rgba32), but for the scope of this PR I think it's better to keep the force and fix it in a follow-up. Sound good?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good to me! My only suggestion then is to track this with an issue for visibility

Comment thread crates/ironrdp-web/src/session.rs Outdated

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good job 👍.
Benoit already pointed out good things. I have nothing to add

@RRRadicalEdward

Copy link
Copy Markdown
Collaborator

Alex Yusiuk (Alex Yusiuk (@RRRadicalEdward)) Benoît Cortier (Benoît Cortier (@CBenoit)) after a closer look, I realized that this rgba buffer might not be needed to begin with, so removed it, no allocation/initialization issue anymore. It's even 20% faster than what we have in the first draft!

Could you apply this method to IronVNC as well?

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 3 out of 4 changed files in this pull request and generated no new comments.

extract_partial_image now fills a caller-owned WriteBuf (unfilled_to +
advance) instead of allocating a fresh Vec on every call. session.rs
keeps one buffer across frames and clears it per region, so steady-state
draws don't allocate. Adds WriteBuf::filled_mut for the in-place alpha
fixup.

Also inline the single-call-site blit helper into draw, and reword the
canvas/extract docs to describe the types and their contracts rather
than the calling code.
Cut the verbose doc blocks down to the non-obvious rationale (why force
alpha, the whole-rows widening, the WriteBuf clear-between-regions
contract), 2 lines max each.
@irvingoujAtDevolution

Copy link
Copy Markdown
Contributor Author

Updated, we now use WriteBuf to achive effectively O(1) allocation per draw on the Rust side rendering path.

region: InclusiveRectangle,
buffer: &mut WriteBuf,
) -> InclusiveRectangle {
// PERF: needs actual benchmark to find a better heuristic

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thought: Since you have the tools for benchmarking, I think it may be worth to look into this comment as well. As a follow up 🙂

@CBenoit Benoît Cortier (CBenoit) left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Good job! Before we merge, let's measure the final gains and clean up the final commit body so it matches the result we ended up having 🙂

@irvingoujAtDevolution irvingouj@Devolutions (irvingoujAtDevolution) changed the title perf(web): replace softbuffer canvas present with direct put_image_data perf(web): replace softbuffer with direct put_image_data canvas present Jun 25, 2026
@irvingoujAtDevolution irvingouj@Devolutions (irvingoujAtDevolution) merged commit d3705af into master Jun 25, 2026
21 checks passed
@irvingoujAtDevolution irvingouj@Devolutions (irvingoujAtDevolution) deleted the perf/web-remove-softbuffer branch June 25, 2026 18:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

4 participants