Skip to content

fix: element_index clicks should produce click.png in cua-driver-rs trajectory recorder (Windows)#1737

Open
mvanhorn wants to merge 1 commit into
trycua:mainfrom
mvanhorn:fix/1725-fix-elementindex-clicks-should-produce-c
Open

fix: element_index clicks should produce click.png in cua-driver-rs trajectory recorder (Windows)#1737
mvanhorn wants to merge 1 commit into
trycua:mainfrom
mvanhorn:fix/1725-fix-elementindex-clicks-should-produce-c

Conversation

@mvanhorn
Copy link
Copy Markdown

@mvanhorn mvanhorn commented May 27, 2026

Summary

fix: element_index clicks should produce click.png in cua-driver-rs trajectory recorder (Windows)

Closes #1725


AI was used for assistance.

Summary by CodeRabbit

  • Bug Fixes

    • Improved click coordinate conversion for click, double-click, and right-click actions to accurately map screen positions to window-local coordinates, enhancing click accuracy and reliability.
  • Tests

    • Added test coverage for coordinate conversion and click marker validation behavior.

Review Change Stack

@vercel
Copy link
Copy Markdown
Contributor

vercel Bot commented May 27, 2026

@mvanhorn is attempting to deploy a commit to the Cua Team on Vercel.

A member of the Team first needs to authorize it.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 27, 2026

📝 Walkthrough

Walkthrough

This PR adds click.png markers for element_index-based clicks on Windows by parsing screen coordinates from result_summary and converting them to screenshot-local coordinates using a new platform callback that resolves window bitmap origins via DwmGetWindowAttribute.

Changes

Click.png markers for element_index clicks with window bitmap coordinate conversion

Layer / File(s) Summary
Click-point resolution with element_index support
libs/cua-driver/rust/crates/cua-driver-core/src/recording.rs
Added set_window_bitmap_origin_fn callback registration backed by OnceLock to map (window_id, pid) to bitmap origin. Added helpers to parse screen coordinates from result_summary and convert to window-local. Updated write_turn to derive click points from result_summary for element_index clicks and convert them via the platform callback before falling back to element bounds. Includes unit tests covering both derived screen-to-local conversion and click.png marker writing behavior.
Recording loader visibility change
libs/cua-driver/rust/crates/cua-driver-core/src/recording_loader.rs
Changed parse_screen_from_summary from file-private to pub(crate) to enable reuse in core recording logic for extracting coordinates from result text.
Windows window bitmap origin implementation
libs/cua-driver/rust/crates/platform-windows/src/recording_hooks.rs
Added window_bitmap_origin_xy function that determines screenshot bitmap top-left in screen coordinates using DwmGetWindowAttribute(DWMWA_EXTENDED_FRAME_BOUNDS) with crop inset, falling back to GetWindowRect if needed. Updated element_window_local_xy to subtract bitmap origin instead of window screen origin. Extended non-Windows API with window_bitmap_origin_xy stub returning None.
Registry callback wiring for Windows
libs/cua-driver/rust/crates/cua-driver/src/main.rs
Registered set_window_bitmap_origin_fn callback in both build_registry and build_registry_no_cursor Windows branches to wire core recording to platform_windows::recording_hooks::window_bitmap_origin_xy for daemon/server and CLI subcommand modes.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • trycua/cua#1708: Both PRs adjust Windows coordinate-space conversion for click/drag by deriving the window/screenshot bitmap origin from DWMWA_EXTENDED_FRAME_BOUNDS (with the crop inset) and using it to map between bitmap-local and screen coordinates, so the changes overlap at the same coordinate-origin logic.

  • trycua/cua#1720: Both PRs modify click.png coordinate conversion in the recording pipeline—feat(cua-driver-rs)(recording): native ScreenCaptureKit on macOS + app_state.json/click.png regressions #1720 adds element_index→window-local screenshot coordinate support via Windows recording hooks, while this PR further changes Windows element_window_local_xy/recording core to use a new window_bitmap_origin_xy origin resolver (and extends write_turn to derive x/y from result_summary before converting).

Poem

🐰 A rabbit hops through clicks so fine,
From screen-space threads to local line—
Result summaries yield their (x, y) prize,
DwmGetWindowAttribute brings window eyes,
Now every element's marked where it lies! 🖱️✨

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately captures the main fix: enabling click.png marker generation for element_index clicks on Windows, which was the core problem described in #1725.
Linked Issues check ✅ Passed The PR implementation fulfills all #1725 coding requirements: recovers click coordinates from result_summary, converts screen-absolute to window-local coordinates using window bounds, produces click.png for element_index clicks, and preserves existing pixel-click behavior.
Out of Scope Changes check ✅ Passed All changes are directly scoped to the #1725 fix: core recording callback registration, coordinate parsing and conversion helpers, platform-specific window bitmap origin computation, and unit test coverage for the new functionality.
Docstring Coverage ✅ Passed Docstring coverage is 94.44% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
libs/cua-driver/rust/crates/platform-windows/src/recording_hooks.rs (1)

68-70: 💤 Low value

Consider centralizing DWM_CROP_INSET_PX.

The constant is duplicated with a sync comment. A shared constant in a common module would eliminate the risk of divergence during future maintenance.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@libs/cua-driver/rust/crates/platform-windows/src/recording_hooks.rs` around
lines 68 - 70, Multiple copies of DWM_CROP_INSET_PX are maintained separately
(here and in capture::DWM_CROP_INSET_PX and tools::impl_::bitmap_to_screen);
create a single shared constant and reference it everywhere to avoid drift. Add
a single public constant (e.g., pub(crate) const DWM_CROP_INSET_PX: i32 = 1) in
a common module (like a new crate-level constants module or existing capture
module), replace the local const DWM_CROP_INSET_PX in recording_hooks.rs with an
import/use of that shared constant, and update the other places (capture and
tools::impl_::bitmap_to_screen) to reference the same symbol instead of defining
their own copies; keep the type as i32 and ensure visibility is appropriate for
all callers.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@libs/cua-driver/rust/crates/platform-windows/src/recording_hooks.rs`:
- Around line 68-70: Multiple copies of DWM_CROP_INSET_PX are maintained
separately (here and in capture::DWM_CROP_INSET_PX and
tools::impl_::bitmap_to_screen); create a single shared constant and reference
it everywhere to avoid drift. Add a single public constant (e.g., pub(crate)
const DWM_CROP_INSET_PX: i32 = 1) in a common module (like a new crate-level
constants module or existing capture module), replace the local const
DWM_CROP_INSET_PX in recording_hooks.rs with an import/use of that shared
constant, and update the other places (capture and
tools::impl_::bitmap_to_screen) to reference the same symbol instead of defining
their own copies; keep the type as i32 and ensure visibility is appropriate for
all callers.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: e47c1614-78fa-469a-bccb-4d795843b2cb

📥 Commits

Reviewing files that changed from the base of the PR and between 7d893ce and 0ee9178.

📒 Files selected for processing (4)
  • libs/cua-driver/rust/crates/cua-driver-core/src/recording.rs
  • libs/cua-driver/rust/crates/cua-driver-core/src/recording_loader.rs
  • libs/cua-driver/rust/crates/cua-driver/src/main.rs
  • libs/cua-driver/rust/crates/platform-windows/src/recording_hooks.rs

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

cua-driver-rs (Windows): trajectory recorder writes no click.png for element_index clicks (only pixel-clicks get the marker)

1 participant