
chore(cua-driver-rs)(linux): add Xvfb smoke harness + parity audit + first-pass findings #1693

Draft
f-trycua wants to merge 1 commit into main from feat/linux-smoke-harness


@f-trycua
Collaborator

Summary

  • New scripts/linux-smoke.sh: a single bash script that stands up an Xvfb display, spawns xeyes as the victim app, and exercises every cua-driver tool with sensible JSON args. Output is a sorted PASS/FAIL/SKIP table, for easy cross-distro diffing.
  • New LINUX_PARITY_AUDIT.md: a Linux vs Windows tool parity audit (24 tools implemented but unverified, a 33 vs 0 gap in example/parity binaries, and a per-distro testing matrix).
  • New scripts/linux-smoke-RESULTS.md: first-pass findings with reproduction details, including the one real driver bug surfaced.
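The harness itself is bash. Purely as an illustration, here is a Rust sketch of the setup step described above: pick a free display, start Xvfb on it, and point a victim app at it. `first_free_display` and `smoke_commands` are hypothetical names, not taken from `linux-smoke.sh`. The lock-file check assumes the usual `/tmp/.X<n>-lock` X server convention. The screen spec matches the `-screen 0 1280x800x24` reported for this run.

```rust
use std::path::Path;
use std::process::Command;

/// Hypothetical helper: first X display number >= `start` whose lock file
/// (`/tmp/.X<n>-lock`, the usual X server convention) does not exist.
/// A missing lock file is a strong hint, not a guarantee, that `:n` is free.
fn first_free_display(start: u32) -> u32 {
    (start..)
        .find(|n| !Path::new(&format!("/tmp/.X{n}-lock")).exists())
        .expect("ran out of display numbers")
}

/// Hypothetical helper: builds (without spawning) the Xvfb server command and
/// the xeyes "victim" command pointed at that display.
fn smoke_commands(display: u32) -> (Command, Command) {
    let disp = format!(":{display}");

    // Same geometry and depth as the reported run: 1280x800 at depth 24.
    let mut xvfb = Command::new("Xvfb");
    xvfb.args([disp.as_str(), "-screen", "0", "1280x800x24"]);

    // The victim app only needs DISPLAY to find the virtual server.
    let mut victim = Command::new("xeyes");
    victim.env("DISPLAY", &disp);

    (xvfb, victim)
}
```

A caller would `spawn()` Xvfb first, wait until the server accepts connections, and then spawn xeyes. That wait is omitted from the sketch.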

Headline result (v1 harness; identical on Ubuntu 22.04/Xfce and Ubuntu 24.04/GNOME-XWayland)

20 PASS / 9 FAIL / 3 SKIP out of 32 tools.

Eight of the 9 FAILs are harness bugs. The harness passes the wrong argument shape to launch_app and omits window_id on the keyboard tools. Its error detector also matches the word "error" when it appears as a field name in legitimate JSON output. The remaining FAIL is a real driver bug, described below.
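Here is one way to tighten the error detector, sketched in Rust. The sketch assumes the harness sees MCP `tools/call` results, which carry an `isError` flag when a tool fails. The idea is to base the verdict on that flag rather than on any occurrence of the substring "error". Both functions are hypothetical. `naive_is_failure` only reconstructs the failure mode described above.

```rust
/// Hypothetical reconstruction of the v1 failure mode: any "error" substring
/// fails, so a payload with a field named "error" is misread as a failure.
fn naive_is_failure(response: &str) -> bool {
    response.to_ascii_lowercase().contains("error")
}

/// Tightened sketch: fail only when the result explicitly sets
/// `"isError": true`, as MCP tool results do on tool failure.
/// Whitespace is stripped so `"isError": true` and `"isError":true` both
/// match. An escaped key inside a string value (`\"isError\"`) does not,
/// because a backslash then precedes the closing quote.
fn tightened_is_failure(response: &str) -> bool {
    let compact: String = response.chars().filter(|c| !c.is_whitespace()).collect();
    compact.contains("\"isError\":true")
}
```

Substring matching is still only a heuristic. A JSON-RPC-level failure arrives as a top-level `error` member, which this check ignores. A robust harness would parse each response with a real JSON parser.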

| Verdict | Count | Tools |
|---|---|---|
| PASS | 20 | check_permissions · click · double_click · drag · get_accessibility_tree · get_agent_cursor_state · get_config · get_cursor_position · get_screen_size · get_window_state · list_apps · list_windows · move_cursor · right_click · scroll · set_agent_cursor_enabled · set_agent_cursor_motion · set_agent_cursor_style · set_config · zoom |
| FAIL (harness) | 8 | get_recording_state · hotkey · kill_app · launch_app · press_key · set_recording · type_text · type_text_chars |
| FAIL (driver) | 1 | screenshot |
| SKIP | 3 | page (no Chromium) · replay_trajectory (no recording file) · set_value (xeyes has no AT-SPI editable text) |
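The verdict table is sorted so that runs on different distros diff line by line. Below is a minimal Rust sketch of that ordering plus the summary line. It assumes verdicts sort PASS, FAIL, SKIP and that ties break on tool name; the script's actual sort key is not shown in this PR. `Verdict` and `render` are illustrative names.

```rust
use std::fmt;

/// Illustrative verdict type; derive order (Pass < Fail < Skip) is the sort order.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Verdict {
    Pass,
    Fail,
    Skip,
}

impl fmt::Display for Verdict {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(match self {
            Verdict::Pass => "PASS",
            Verdict::Fail => "FAIL",
            Verdict::Skip => "SKIP",
        })
    }
}

/// Sorts rows by (verdict, tool name) so runs diff cleanly across distros,
/// and returns the rendered table plus a one-line tally.
fn render(mut rows: Vec<(Verdict, String)>) -> (String, String) {
    rows.sort();
    let (mut pass, mut fail, mut skip) = (0, 0, 0);
    let mut table = String::new();
    for (verdict, tool) in &rows {
        match verdict {
            Verdict::Pass => pass += 1,
            Verdict::Fail => fail += 1,
            Verdict::Skip => skip += 1,
        }
        table.push_str(&format!("{verdict}\t{tool}\n"));
    }
    let summary = format!(
        "{pass} PASS / {fail} FAIL / {skip} SKIP out of {} tools",
        rows.len()
    );
    (table, summary)
}
```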

Real bug surfaced

```
thread 'tokio-rt-worker' panicked at crates/mcp-server/src/image_utils.rs:48:9:
assertion `left == right` failed: Invalid buffer length:
expected 120000 got 40000 for 200x200 image
```
`120000 = 200×200×3` (RGB) vs `40000 = 200×200×1`. The assertion expects 24-bit RGB, but `XGetImage` against xeyes on Xvfb (`-screen 0 1280x800x24`) returned 8 bpp (one byte per pixel). The image-buffer code needs to do one of two things. It can accept the depth the X server actually returned and convert it, or it can explicitly request a 24-bit visual before the grab. I'll file this as a separate issue after this PR.
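Here is a minimal sketch of the first option: accept whatever the grab returned and convert it, returning an error instead of panicking. This is not the code in `image_utils.rs`. The name `to_rgb24` and its parameters are hypothetical. The sketch makes four assumptions:

- `stride` is the XImage `bytes_per_line`.
- 4-byte pixels are little-endian BGRX, a common ZPixmap layout on depth-24 TrueColor visuals.
- 3-byte pixels are already R, G, B.
- 1-byte pixels are colormap indices, resolved through a caller-supplied palette (for example, one filled via `XQueryColors`).

Whether the 8 bpp data seen here really is colormap indices depends on the visual the grab used.

```rust
/// Hypothetical replacement for a panicking length assertion: converts a raw
/// X11 grab to tightly packed 24-bit RGB or explains why it cannot.
///
/// Assumptions (not taken from image_utils.rs):
/// - `stride` is the XImage `bytes_per_line` (rows may be padded);
/// - 3-byte pixels are already R, G, B;
/// - 4-byte pixels are little-endian BGRX, common on depth-24 TrueColor;
/// - 1-byte pixels are colormap indices looked up in `palette`.
fn to_rgb24(
    buf: &[u8],
    width: usize,
    height: usize,
    stride: usize,
    bytes_per_pixel: usize,
    palette: Option<&[[u8; 3]; 256]>,
) -> Result<Vec<u8>, String> {
    if !matches!(bytes_per_pixel, 1 | 3 | 4) {
        return Err(format!("unsupported {bytes_per_pixel} bytes per pixel"));
    }
    if width == 0 || height == 0 {
        return Ok(Vec::new());
    }
    if stride < width * bytes_per_pixel {
        return Err(format!("stride {stride} too small for {width} px at {bytes_per_pixel} B/px"));
    }
    let needed = stride * height;
    if buf.len() < needed {
        return Err(format!(
            "invalid buffer length: need {needed}, got {} for {width}x{height} image",
            buf.len()
        ));
    }

    let mut out = Vec::with_capacity(width * height * 3);
    // stride >= width * bytes_per_pixel >= 1 here, so chunks_exact cannot panic.
    for row in buf[..needed].chunks_exact(stride) {
        // Ignore any per-row padding beyond the visible pixels.
        let pixels = &row[..width * bytes_per_pixel];
        match bytes_per_pixel {
            3 => out.extend_from_slice(pixels),
            4 => {
                for px in pixels.chunks_exact(4) {
                    out.extend_from_slice(&[px[2], px[1], px[0]]); // B,G,R,X -> R,G,B
                }
            }
            _ => {
                // 1 byte per pixel: colors can only be recovered via the colormap.
                let pal = palette.ok_or("8 bpp image needs a colormap palette")?;
                for &idx in pixels {
                    out.extend_from_slice(&pal[usize::from(idx)]);
                }
            }
        }
    }
    Ok(out)
}
```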

What this means for the parity rollout

The audit listed 24 tools as 🟡 implemented-but-unverified on Linux. Once the v2 harness fixes are confirmed (projected ~26 PASS), ~20 of those flip to 🟢 verified over native X11 and XWayland, on the same machine that produced this data. The 4 remaining gaps:

  • get_window_state, get_accessibility_tree, set_value — need a real DE session, not Xvfb (at-spi-registry needs registered apps)
  • page — needs Chromium with --remote-debugging-port
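Until Chromium is part of the run, `page` stays SKIP. Below is a Rust sketch of how the harness could detect that precondition: attempt a TCP connect to the DevTools port and SKIP when nothing is listening. The function name, the 500 ms timeout, and the port are assumptions. 9222 is a common `--remote-debugging-port` value, not one this PR specifies.

```rust
use std::net::{SocketAddr, TcpStream};
use std::time::Duration;

/// Hypothetical precondition check for the `page` tool: true when something
/// accepts TCP connections on the local DevTools port (e.g. 9222, assumed).
/// The 500 ms timeout is an arbitrary choice for a smoke harness.
fn devtools_reachable(port: u16) -> bool {
    let addr = SocketAddr::from(([127, 0, 0, 1], port));
    TcpStream::connect_timeout(&addr, Duration::from_millis(500)).is_ok()
}
```

A successful connect only shows that something is listening. A stricter check would fetch the DevTools `/json/version` endpoint and confirm that it answers.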

Test plan

  • Harness runs end-to-end on Ubuntu 22.04 / Xfce
  • Harness runs end-to-end on Ubuntu 24.04 / GNOME-XWayland
  • Confirm projected ~26 PASS after harness v2 fixes (window_id pass-through, launch_app schema, error-detector tightening)
  • Re-run on Debian 12 (apt install xvfb first)
  • Filed screenshot Xvfb-bpp panic as a separate issue
  • AT-SPI-heavy tools verified via xrdp session against a real DE app
  • CodeRabbit pass on review

Why draft

The v2 harness fixes are already written locally but haven't completed a full run yet (the VM's SSH session wedged and my IP changed mid-session). I'll push the v2 fixes and a reference to the filed bug as a follow-up commit before un-drafting.

🤖 Generated with Claude Code

…ty findings

Drops in scripts/linux-smoke.sh — a single bash script that stands up an
Xvfb display, spawns xeyes as a victim, and exercises every cua-driver
tool with sensible JSON args. Output is a sorted PASS/FAIL/SKIP table
with the leading line of each response, designed to be diffed across
distros to spot Linux-backend regressions.

First-pass results (Ubuntu 22.04 / Xfce AND Ubuntu 24.04 / GNOME-XWayland,
identical verdicts on both):

  20 PASS / 9 FAIL / 3 SKIP (out of 32 tools).

Of the 9 FAILs, 8 are harness bugs (wrong arg names, missing window_id,
overly broad error matcher). One is a genuine driver bug worth filing
separately:

  screenshot panics on Xvfb-backed targets with
    "Invalid buffer length: expected 120000 got 40000 for 200x200 image"
  (crates/mcp-server/src/image_utils.rs:48) — the assertion expects
  24-bit RGB but XGetImage on Xvfb returned 8 bpp.

Also adds LINUX_PARITY_AUDIT.md — the up-to-date Linux vs Windows tool
parity audit (24 implemented-but-unverified tools on Linux, 33 vs 0
example/parity binaries gap, per-distro testing matrix recommendations).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

@vercel
Contributor

vercel Bot commented May 25, 2026

1 Skipped Deployment

| Project | Deployment | Actions | Updated (UTC) |
|---|---|---|---|
| docs | Ignored | Ignored | May 25, 2026 5:05pm |


@coderabbitai
Contributor

coderabbitai Bot commented May 25, 2026

Review skipped: draft detected. To trigger a single review, invoke the @coderabbitai review command.
