chore(cua-driver-rs)(linux): add Xvfb smoke harness + parity audit + first-pass findings by f-trycua · Pull Request #1693 · trycua/cua

f-trycua · 2026-05-25T17:05:38Z

Summary

New scripts/linux-smoke.sh — single bash script that stands up an Xvfb display, spawns xeyes as victim, and exercises every cua-driver tool with sensible JSON args. Output is a sorted PASS/FAIL/SKIP table for easy cross-distro diffing
New LINUX_PARITY_AUDIT.md — Linux vs Windows tool parity audit (24 implemented-but-unverified, 33 vs 0 example/parity binaries, per-distro testing matrix)
New scripts/linux-smoke-RESULTS.md — first-pass findings with the reproduction details and the one real driver bug surfaced

Headline result (v1 harness, both Ubuntu 22.04/Xfce and Ubuntu 24.04/GNOME-XWayland, identical)

20 PASS / 9 FAIL / 3 SKIP out of 32 tools.

8 of the 9 fails are harness bugs (launch_app arg shape, missing window_id on keyboard tools, error-detector matching the word "error" as a field name in legitimate JSON output). 1 is a real driver bug — see below.
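
The error-detector problem can be illustrated with a small sketch. The harness itself is bash; this Rust version is only an illustration, and the function names, the sample JSON response, and the "tighter" rule are all assumptions rather than the harness's actual logic. It contrasts a substring match on "error" with a check that looks only at the first non-empty line and at a panic marker.

```rust
/// Naive check resembling the v1 failure mode: any occurrence of the
/// word "error" anywhere in the response counts as a failure.
fn naive_is_failure(response: &str) -> bool {
    response.to_lowercase().contains("error")
}

/// Tighter, hypothetical check: flag a response only if its first
/// non-empty line starts with "error" or it contains a Rust panic marker.
fn tighter_is_failure(response: &str) -> bool {
    let first_line = response
        .lines()
        .map(str::trim)
        .find(|line| !line.is_empty())
        .unwrap_or("");
    first_line.to_lowercase().starts_with("error") || response.contains("panicked at")
}

fn main() {
    // Hypothetical legitimate output that merely has a field named "error".
    let ok = r#"{"recording": false, "error": null}"#;
    // The first line of the panic quoted in this PR.
    let bad = "thread 'tokio-rt-worker' panicked at crates/mcp-server/src/image_utils.rs:48:9:";

    println!("naive:   ok={} bad={}", naive_is_failure(ok), naive_is_failure(bad));
    println!("tighter: ok={} bad={}", tighter_is_failure(ok), tighter_is_failure(bad));
}
```

A sturdier detector would parse the response as JSON and inspect a dedicated status field, but which field the driver exposes is not stated here, so the sketch stays with plain string checks.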

| Verdict | Count | Tools |
| --- | --- | --- |
| PASS | 20 | `check_permissions` · `click` · `double_click` · `drag` · `get_accessibility_tree` · `get_agent_cursor_state` · `get_config` · `get_cursor_position` · `get_screen_size` · `get_window_state` · `list_apps` · `list_windows` · `move_cursor` · `right_click` · `scroll` · `set_agent_cursor_enabled` · `set_agent_cursor_motion` · `set_agent_cursor_style` · `set_config` · `zoom` |
| FAIL (harness) | 8 | `get_recording_state` · `hotkey` · `kill_app` · `launch_app` · `press_key` · `set_recording` · `type_text` · `type_text_chars` |
| FAIL (driver) | 1 | `screenshot` |
| SKIP | 3 | `page` (no Chromium) · `replay_trajectory` (no recording file) · `set_value` (xeyes has no AT-SPI editable text) |
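
As a cross-check of the headline numbers, the sketch below rebuilds the table from the tool names listed above, sorts it by verdict and then by tool name, and tallies each verdict. The `Verdict` type and the function names are illustrative assumptions; the real harness is a bash script and its output format may differ.

```rust
use std::collections::BTreeMap;

// Variant order drives the sort order: PASS, then FAIL, then SKIP.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Verdict {
    Pass,
    Fail,
    Skip,
}

const PASS: [&str; 20] = [
    "check_permissions", "click", "double_click", "drag", "get_accessibility_tree",
    "get_agent_cursor_state", "get_config", "get_cursor_position", "get_screen_size",
    "get_window_state", "list_apps", "list_windows", "move_cursor", "right_click",
    "scroll", "set_agent_cursor_enabled", "set_agent_cursor_motion",
    "set_agent_cursor_style", "set_config", "zoom",
];
// 8 harness failures plus the one driver failure (screenshot).
const FAIL: [&str; 9] = [
    "get_recording_state", "hotkey", "kill_app", "launch_app", "press_key",
    "set_recording", "type_text", "type_text_chars", "screenshot",
];
const SKIP: [&str; 3] = ["page", "replay_trajectory", "set_value"];

/// Build (verdict, tool) rows and sort them so two runs diff cleanly.
fn sorted_rows() -> Vec<(Verdict, &'static str)> {
    let mut rows: Vec<(Verdict, &'static str)> = Vec::new();
    rows.extend(PASS.iter().map(|t| (Verdict::Pass, *t)));
    rows.extend(FAIL.iter().map(|t| (Verdict::Fail, *t)));
    rows.extend(SKIP.iter().map(|t| (Verdict::Skip, *t)));
    rows.sort();
    rows
}

/// Count how many rows carry each verdict.
fn tally(rows: &[(Verdict, &'static str)]) -> BTreeMap<Verdict, usize> {
    let mut counts = BTreeMap::new();
    for (verdict, _) in rows {
        *counts.entry(*verdict).or_insert(0) += 1;
    }
    counts
}

fn main() {
    let rows = sorted_rows();
    for (verdict, tool) in &rows {
        println!("{:?}\t{}", verdict, tool);
    }
    println!("{:?} (total {})", tally(&rows), rows.len());
}
```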

Real bug surfaced

```
thread 'tokio-rt-worker' panicked at crates/mcp-server/src/image_utils.rs:48:9:
assertion `left == right` failed: Invalid buffer length:
expected 120000 got 40000 for 200x200 image
```
`120000 = 200×200×3` (RGB) vs `40000 = 200×200×1`. The assertion expects 24-bit RGB but `XGetImage` against xeyes on Xvfb (`-screen 0 1280x800x24`) returned 8 bpp. The image-buffer code needs to either accept the depth the X server actually returned and convert, or request a 24-bit visual explicitly before the grab. I'll file this as a separate issue after this PR.
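
A minimal sketch of the first option (accept the returned depth and convert) follows. It is not the code in `image_utils.rs`: the function names are hypothetical, and expanding 1 byte per pixel as grayscale is an assumption, since an 8-bit X visual may be colormapped and would then need a palette lookup instead. The point is the shape of the fix: infer bytes per pixel from the buffer length and return an error rather than asserting.

```rust
/// Bytes per pixel implied by a raw buffer, or None if the length is not
/// a whole multiple of the pixel count.
fn bytes_per_pixel(len: usize, width: usize, height: usize) -> Option<usize> {
    let pixels = width.checked_mul(height)?;
    if pixels == 0 || len % pixels != 0 {
        return None;
    }
    Some(len / pixels)
}

/// Convert a raw grab to tightly packed RGB, returning Err instead of panicking.
fn to_rgb(buf: &[u8], width: usize, height: usize) -> Result<Vec<u8>, String> {
    match bytes_per_pixel(buf.len(), width, height) {
        // Already 3 bytes per pixel: pass through unchanged.
        Some(3) => Ok(buf.to_vec()),
        // ASSUMPTION: treat 1 byte per pixel as grayscale and replicate it
        // into R, G and B. A colormapped visual would need a palette lookup.
        Some(1) => {
            let mut rgb = Vec::with_capacity(buf.len() * 3);
            for &gray in buf {
                rgb.extend_from_slice(&[gray, gray, gray]);
            }
            Ok(rgb)
        }
        other => Err(format!(
            "unsupported buffer: {} bytes for {}x{} image (bytes per pixel: {:?})",
            buf.len(),
            width,
            height,
            other
        )),
    }
}

fn main() {
    // Same shape as the failing case: 200x200 at 1 byte per pixel.
    let grab = vec![0u8; 200 * 200];
    match to_rgb(&grab, 200, 200) {
        Ok(rgb) => println!("converted {} bytes to {} bytes", grab.len(), rgb.len()),
        Err(e) => eprintln!("screenshot failed: {e}"),
    }
}
```

Returning a `Result` lets the tool report a structured failure to the caller instead of taking down the worker thread, whichever conversion strategy ends up being chosen.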

What this means for the parity rollout

The audit listed 24 tools as 🟡 implemented-but-unverified on Linux. Once a full v2 harness run confirms the fixes (projected ~26 PASS), ~20 of those flip to 🟢 verified over both native X11 and XWayland, on the same machine that produced this data. The 4 remaining gaps:

get_window_state, get_accessibility_tree, set_value — need a real DE session, not Xvfb (at-spi-registry needs registered apps)
page — needs Chromium with --remote-debugging-port

Test plan

Harness runs end-to-end on Ubuntu 22.04 / Xfce
Harness runs end-to-end on Ubuntu 24.04 / GNOME-XWayland
Confirm projected ~26 PASS after harness v2 fixes (window_id pass-through, launch_app schema, error-detector tightening)
Re-run on Debian 12 (apt install xvfb first)
Filed screenshot Xvfb-bpp panic as a separate issue
AT-SPI-heavy tools verified via xrdp session against a real DE app
CodeRabbit pass on review

Why draft

v2 harness fixes are already written locally but haven't completed a full run yet (VM SSH wedged + my IP changed mid-session). I'll push the v2 fixes and the bug-file reference as a follow-up commit before un-drafting.

🤖 Generated with Claude Code

…ty findings

Drops in scripts/linux-smoke.sh, a single bash script that stands up an Xvfb display, spawns xeyes as a victim, and exercises every cua-driver tool with sensible JSON args. Output is a sorted PASS/FAIL/SKIP table with the leading line of each response, designed to be diffed across distros to spot Linux-backend regressions.

First-pass results (Ubuntu 22.04 / Xfce AND Ubuntu 24.04 / GNOME-XWayland, identical verdicts on both): 20 PASS / 9 FAIL / 3 SKIP (out of 32 tools). Of the 9 FAILs, 8 are harness bugs (wrong arg names, missing window_id, overly broad error matcher). One is a genuine driver bug worth filing separately: screenshot panics on Xvfb-backed targets with "Invalid buffer length: expected 120000 got 40000 for 200x200 image" (crates/mcp-server/src/image_utils.rs:48). The assertion expects 24-bit RGB but XGetImage on Xvfb returned 8 bpp.

Also adds LINUX_PARITY_AUDIT.md, the up-to-date Linux vs Windows tool parity audit (24 implemented-but-unverified tools on Linux, 33 vs 0 example/parity binaries gap, per-distro testing matrix recommendations).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

vercel · 2026-05-25T17:05:44Z

The latest updates on your projects. Learn more about Vercel for GitHub.

1 Skipped Deployment

Project	Deployment	Actions	Updated (UTC)
docs	Ignored		May 25, 2026 5:05pm

coderabbitai · 2026-05-25T17:05:45Z

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.


f-trycua wants to merge 1 commit into main from feat/linux-smoke-harness
