148 changes: 148 additions & 0 deletions LINUX_PARITY_AUDIT.md
@@ -0,0 +1,148 @@
# cua-driver Linux vs Windows parity audit

**Date:** 2026-05-24
**cua-driver version on VMs:** 0.2.18
**Source review:** main @ `5ad4cfeb` (latest pull this session)

## TL;DR

| | Count |
|---|---|
| Tool names on Windows | **30** |
| Tool names on Linux | **29** |
| Tools where Linux IMPLEMENTATION exists (impl_.rs) | **29 / 29** |
| Tools where Linux is VERIFIED on a real Linux host (per PARITY.md before today) | **3** |
| Tools NEWLY verified today via Xvfb on Ubuntu 22.04 | **2 more** (`check_permissions`, `list_windows`) |
| **Effective parity gap right now** | **24 tools UNVERIFIED on Linux** despite having implementations |

**The code is essentially at parity. The TESTS aren't.** Windows has 33 example/parity binaries under `crates/platform-windows/examples/`. Linux has **zero**.

---

## Tool-by-tool parity status

### Surface differences (the only structural delta)

| Tool | Windows | Linux | Why |
|---|---|---|---|
| `debug_window_info` | ✅ | ❌ missing | Windows-only diagnostic dumping HWND state — not portable |

Everything else has matching tool names + schemas.

### Status per tool (collated from PARITY.md + today's empirical work)

🟢 = VERIFIED on Linux · 🟡 = IMPLEMENTED but UNVERIFIED on a real Linux host · 🔵 = INTENTIONAL_DIVERGENCE · 🔴 = MISSING

| Tool | Linux status | Windows status | Notes |
|---|---|---|---|
| `move_cursor` | 🔵 | 🟢 | Intentional: Linux and Windows are overlay-only (no real-cursor warp); only macOS warps the OS cursor. Overlay behavior verified. |
| `get_cursor_position` | 🟢 | 🟢 | PARITY.md says VERIFIED. |
| `get_screen_size` | 🟢 | 🟢 | PARITY.md says VERIFIED. |
| `check_permissions` | 🔵 → 🟢 today | 🟢 | Intentional: Linux returns `{atspi, x11, xsend_event}` triple instead of macOS's `{accessibility, screen_recording}`. **Today's Xvfb test passed all 3 true.** |
| `list_apps` | 🟡 | 🟢 | Code ready but not exercised. On Linux it is based on a `/proc` walk plus an XDG walk. |
| `list_windows` | 🟡 → 🟢 today | 🟢 | **Today's Xvfb test found xeyes.** Multi-app stress untested. |
| `get_window_state` | 🟡 | 🟢 | The big one — combines AT-SPI walk + screenshot. Linux uses at-spi-bus-launcher; differs across DE (Xfce vs GNOME vs KDE). |
| `screenshot` | 🟡 | 🟢 | XGetImage (x11rb) or ImageMagick `import` fallback. Today's test hit SSH-pipe timeout on the b64 blob, not a real failure. |
| `click` | 🟡 | 🟢 | XSendEvent ButtonPress/Release. |
| `double_click` | 🟡 | 🟢 | Same primitive, 2 events. |
| `right_click` | 🟡 | 🟢 | Same primitive, Button3. |
| `drag` | 🟡 | 🟢 | ButtonPress + MotionNotify×steps + ButtonRelease. |
| `scroll` | 🟡 | 🟢 | XSendEvent Button4/Button5 events. |
| `type_text` | 🟡 | 🟢 | XKeyEvent via XTest. |
| `type_text_chars` | 🟡 | 🟢 | Character-by-character keysym lookup. |
| `press_key` | 🟡 | 🟢 | Single XKeyPress/Release. |
| `hotkey` | 🟡 | 🟢 | Modifier+key composed via XTest. |
| `set_value` | 🟡 | 🟢 | AT-SPI `setText` via atspi crate. |
| `launch_app` | 🟡 | 🟢 | Forks process (exec via .desktop discovery or absolute path). |
| `kill_app` | 🟡 | 🟢 | SIGKILL via libc. |
| `get_accessibility_tree` | 🟡 | 🟢 | AT-SPI walk. |
| `zoom` | 🟡 | 🟢 | Crop region from screenshot. |
| `get_config` / `set_config` | 🟡 | 🟢 | JSON config persistence. |
| `set_agent_cursor_enabled` | 🟡 | 🟢 | Overlay show/hide. |
| `set_agent_cursor_style` | 🟡 | 🟢 | Overlay icon. |
| `set_agent_cursor_motion` | 🟡 | 🟢 | Overlay animation params. |
| `get_agent_cursor_state` | 🟡 | 🟢 | Overlay state readback. |
| `page` (browser JS exec) | 🟡 | 🟢 | CDP-based; same path. |
| `replay_trajectory` | 🟢 | 🟢 | Trajectory file → re-issue calls. Cross-platform. |
| `debug_window_info` | 🔴 | 🟢 | Windows-only HWND diagnostic. Intentional. |

**Score:** 5 verified (3 previously + 2 today, matching the 🟢 rows above) · 24 implemented-but-unverified · 1 intentional divergence · 1 Windows-only · 1 missing-on-macOS-too.

### Verification artifact gap

```
crates/platform-windows/examples/ ← 33 parity binaries (one per tool)
crates/platform-linux/examples/ ← does not exist (0 binaries)
```

**This is the single biggest action item.** Each `*_parity.rs` example takes ~50–100 LOC and validates one tool against a known fixture (calculator open, xeyes window, etc.). Building the Linux equivalents is the cheapest way to flip 24 🟡s to 🟢s.
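As a rough illustration of how small such a check can be, here is a minimal sketch of a `list_windows` parity check against the xeyes fixture. It is not a port of any existing Windows example. It shells out to the `cua-driver call list_windows` CLI form used elsewhere in this audit, and it assumes that the fixture's name appears somewhere in the tool's stdout. The helper name `mentions_fixture` and the substring match are hypothetical; a real example may call the crate API directly and parse structured output.

```rust
use std::process::Command;

/// Pure check, testable without a display: does the tool output mention the fixture?
/// Substring matching is an assumption; the real output schema may warrant JSON parsing.
fn mentions_fixture(tool_output: &str, fixture: &str) -> bool {
    tool_output.to_lowercase().contains(&fixture.to_lowercase())
}

fn main() {
    // Assumes xeyes is already running on the current DISPLAY.
    match Command::new("cua-driver").args(["call", "list_windows"]).output() {
        Ok(out) => {
            let stdout = String::from_utf8_lossy(&out.stdout);
            if out.status.success() && mentions_fixture(&stdout, "xeyes") {
                println!("PASS list_windows");
            } else {
                println!("FAIL list_windows");
                std::process::exit(1);
            }
        }
        // The binary is missing or not executable: report it rather than fail.
        Err(e) => println!("SKIP list_windows (cua-driver not runnable: {})", e),
    }
}
```

Keeping the assertion in a pure function means the matching logic can be unit-tested in CI even where no X server is available.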

---

## How to split testing across distros

Linux variance falls on **3 axes** that matter for cua-driver-rs:

| Axis | Why it matters for cua-driver | Variants worth testing |
|---|---|---|
| **Display server** | The native code path talks to X11 directly; Wayland sessions need the XWayland fallback, which adds geometry/scaling translation bugs. | X11 native · Wayland+XWayland · (eventually: pure Wayland regression detection) |
| **AT-SPI provider** | Different DEs ship different at-spi versions + register elements differently. The `get_window_state` / `get_accessibility_tree` output varies. | Xfce (at-spi-bus-launcher) · GNOME (gnome-shell a11y bridge) · KDE (kf5-at-spi) |
| **Package family + glibc** | Affects install scripts (apt vs dnf), shared-library compatibility, and kernel / XTest extension behavior. | Debian/Ubuntu (apt, glibc 2.35+) · RHEL/Fedora (rpm) · Arch (rolling) |

### Tier 1 — the 3 VMs we just provisioned (cover ~80%)

| VM | Combo | Hits which axes |
|---|---|---|
| `cua-linux-ubuntu2204` | Xfce, X11 native, deb | **All three native-X11 paths**: x11rb direct, at-spi-bus-launcher, glibc 2.35. The "happy path." |
| `cua-linux-ubuntu2404` | GNOME, Wayland+XWayland, deb | XWayland fallback path + gnome-shell at-spi bridge. Newer glibc 2.39. |
| `cua-linux-debian12` | GNOME, Wayland+XWayland, deb | Older glibc 2.36 + slightly older at-spi version. Catches "works on Ubuntu, breaks on Debian" issues. |

**Recommended testing matrix on these 3**:
1. **All 28 tools** smoke-tested in a fresh xrdp session on each VM.
2. **at-spi-heavy tools** (`get_window_state`, `get_accessibility_tree`, `set_value`) run against representative apps:
   - LibreOffice Writer (open in both Xfce and GNOME; the output should differ, so document the diff)
   - Firefox (Mozilla a11y is its own beast)
   - GNOME Files / Thunar (DE-native file managers)
3. **Multi-monitor + fractional scaling**, specifically on Combo B (`cua-linux-ubuntu2404`, GNOME on Wayland+XWayland), where fractional scaling can break a naive XGetImage.

### Tier 2 — high-value adds (~one more day of setup)

| Add | Why |
|---|---|
| **Fedora 41 + GNOME** | RPM package family. Newer SELinux defaults (could block XTest). Different kernel branch. |
| **Kubuntu 24.04 + KDE** | Third major at-spi provider (kf5-at-spi). KDE-native apps (Dolphin, Kate) have distinct AT-SPI structure. |
| **Ubuntu 22.04 + KDE Plasma** | Mixes "older base" with "different DE" — surfaces at-spi-version-vs-DE bugs. |

### Tier 3 — defensive regression detection

| Add | Why |
|---|---|
| **Pure Wayland (sway, KDE Wayland)** | EXPOSES the lack of native Wayland support. cua-driver should fail with a clear "no X server / start XWayland" error. Catches regressions when someone tries to "make it work on pure Wayland" half-heartedly. |
| **Linux ARM64 (Azure ARM VM)** | Verifies x11rb / atspi crates build on ARM. cua-driver is x86-only today; ARM build is the test that surfaces dependencies that aren't ARM-ready. |
| **Headless Xvfb-only** (no real DE) | What we did today. Useful for CI: no GUI session needed, captures the bulk of the X11 code path. Worth pinning as a CI workflow. |
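The "fail with a clear error on pure Wayland" behavior in the first Tier 3 row can be sketched as a small decision over the conventional `DISPLAY` and `WAYLAND_DISPLAY` environment variables. This is an illustrative sketch only: the `DisplayPath` enum, the `pick_display_path` function, and the error wording are hypothetical and are not taken from the cua-driver source.

```rust
#[derive(Debug, PartialEq)]
enum DisplayPath {
    X11,      // DISPLAY set, no Wayland session detected
    XWayland, // both DISPLAY and WAYLAND_DISPLAY set
}

/// True when the variable is present and non-empty.
fn is_set(value: Option<&str>) -> bool {
    value.map_or(false, |s| !s.is_empty())
}

/// Decide which path to use, or return a clear error when no X server is reachable.
fn pick_display_path(display: Option<&str>, wayland: Option<&str>) -> Result<DisplayPath, String> {
    match (is_set(display), is_set(wayland)) {
        (true, false) => Ok(DisplayPath::X11),
        (true, true) => Ok(DisplayPath::XWayland),
        (false, true) => {
            Err("no X server: pure Wayland session detected; start XWayland".to_string())
        }
        (false, false) => Err("no X server: DISPLAY is not set".to_string()),
    }
}

fn main() {
    let display = std::env::var("DISPLAY").ok();
    let wayland = std::env::var("WAYLAND_DISPLAY").ok();
    match pick_display_path(display.as_deref(), wayland.as_deref()) {
        Ok(path) => println!("using {:?}", path),
        Err(msg) => eprintln!("{}", msg),
    }
}
```

A regression test on a pure Wayland VM would then only need to assert that the error branch is taken, rather than that the driver hangs or panics.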

---

## Recommended next actions, in priority order

1. **Run the 28-tool smoke test on Combo A (`cua-linux-ubuntu2204`) first** (X11 native, simplest). Once that's clean, the remaining bug surface narrows to Wayland/XWayland-specific or GNOME-specific issues on Combos B (`cua-linux-ubuntu2404`) and C (`cua-linux-debian12`).
2. **Author Linux parity examples** — port `crates/platform-windows/examples/*_parity.rs` to `crates/platform-linux/examples/` one per tool. ~30 small files, mostly mechanical. Each one flips a 🟡 to 🟢 in PARITY.md.
3. **Pick the riskiest 3 tools to verify on B+C now**: `get_window_state` (at-spi-heavy), `screenshot` (XWayland scaling), `click` (XSendEvent through XWayland). Catches the biggest "Linux-specific landmines" early.
4. **Add an Xvfb-only CI job** that runs the parity examples on every PR. Cheapest way to keep Linux from regressing silently.
5. **Update PARITY.md** in batches as tools get verified — flip 🟡 → 🟢 with the specific VM / test + commit hash.

---

## What's already proven today

- Install path: `install-local.sh --release` works on all 3 distros (Ubuntu 22.04, Ubuntu 24.04, Debian 12) end-to-end. Binary lands at `~/.local/bin/cua-driver`.
- Binary runtime: `cua-driver --version`, `cua-driver list-tools`, `cua-driver call check_permissions`, `cua-driver call list_windows` all work on Ubuntu 22.04.
- AT-SPI on Linux: with at-spi-bus-launcher + at-spi2-core packages + a dbus session, `check_permissions` reports `atspi=true`.
- XSendEvent path: `xsend_event=true` reported by check_permissions confirms input synthesis is available.
- Display server selection works: with `DISPLAY=:99` + Xvfb running, `list_windows` correctly finds xeyes (one of the canonical X11 test apps).

What's NOT yet verified today: the at-spi-heavy tools (`get_window_state`, `get_accessibility_tree`, `set_value`). These need a real DE running with the AT-SPI registry daemon (`at-spi2-registryd`) active, because an Xvfb-only session doesn't register real apps with AT-SPI. Best path: log in via xrdp and run from inside the Xfce session.

---

End of audit.
68 changes: 68 additions & 0 deletions scripts/linux-smoke-RESULTS.md
@@ -0,0 +1,68 @@
# Linux cua-driver smoke test — first-pass results

**Date:** 2026-05-25
**Driver version:** `cua-driver 0.2.18` (matches main `8137a3d7`)
**Harness:** `scripts/linux-smoke.sh` (Xvfb on `:99`, xeyes as victim, calls every tool with sensible args, classifies PASS/FAIL/SKIP)

## TL;DR

20 of 32 tools PASS on a plain Xvfb + xeyes session on Ubuntu 22.04 / Xfce (Combo A) AND Ubuntu 24.04 / GNOME-XWayland (Combo B) — **identical verdicts on both distros**, demonstrating the X11 code path is consistent. Of the 9 FAILs, 8 turned out to be harness bugs (wrong tool args, missing `window_id`, error-detector false positives on JSON output that includes the word "error" as a field name). **One genuine driver bug** surfaced: `screenshot` panics on the Xvfb framebuffer depth path.

## Headline tally (v1 harness, both distros identical)

| Verdict | Count | Tools |
|---|---|---|
| **PASS** | 20 | check_permissions · click · double_click · drag · get_accessibility_tree · get_agent_cursor_state · get_config · get_cursor_position · get_screen_size · get_window_state · list_apps · list_windows · move_cursor · right_click · scroll · set_agent_cursor_enabled · set_agent_cursor_motion · set_agent_cursor_style · set_config · zoom |
| **FAIL** | 9 | get_recording_state · hotkey · kill_app · launch_app · press_key · screenshot · set_recording · type_text · type_text_chars |
| **SKIP** | 3 | page (no Chromium) · replay_trajectory (no recording file) · set_value (xeyes has no AT-SPI editable text) |

## Real driver bug found

**`screenshot` panics on Xvfb-backed targets:**

```
thread 'tokio-rt-worker' panicked at crates/mcp-server/src/image_utils.rs:48:9:
assertion `left == right` failed: Invalid buffer length:
expected 120000 got 40000 for 200x200 image
left: 120000
right: 40000
```

`120000 = 200 × 200 × 3` (RGB) vs `40000 = 200 × 200 × 1` (single byte / pixel). The assertion expects 24-bit RGB but `XGetImage` against the xeyes window on Xvfb (`-screen 0 1280x800x24`) returned 8 bits per pixel. The image-buffer code needs to either (a) accept the depth that the X server actually returned and convert on the fly, or (b) request a 24-bit visual explicitly before the grab. Reproducer: run `scripts/linux-smoke.sh` on any Xvfb-only Linux host; `screenshot` is the only tool that panics.
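Option (a) can be sketched as follows: infer the bytes per pixel from the buffer length instead of asserting a fixed value, then convert to RGB. This is a minimal sketch under stated assumptions, not the code in `image_utils.rs`. The function names are hypothetical, rows are assumed to be tightly packed (no padding), 1 byte per pixel is treated as grayscale (it could equally be a palette index), and 4 bytes per pixel is treated as BGRX, a layout commonly seen for depth-24 X11 images.

```rust
/// Bytes expected for a tightly packed image (no row padding).
fn expected_len(width: usize, height: usize, bytes_per_pixel: usize) -> usize {
    width * height * bytes_per_pixel
}

/// Convert a tightly packed buffer to RGB, inferring bytes per pixel from its length.
/// Assumptions: 1 byte/pixel is grayscale, 4 bytes/pixel is BGRX, no row padding.
fn to_rgb(buf: &[u8], width: usize, height: usize) -> Result<Vec<u8>, String> {
    let pixels = width * height;
    if pixels == 0 || buf.len() % pixels != 0 {
        return Err(format!(
            "buffer length {} does not fit a {}x{} image",
            buf.len(),
            width,
            height
        ));
    }
    match buf.len() / pixels {
        // Replicate the single channel into R, G and B.
        1 => Ok(buf.iter().flat_map(|&g| [g, g, g]).collect()),
        // Already RGB: pass through unchanged.
        3 => Ok(buf.to_vec()),
        // BGRX: reorder to RGB and drop the padding byte.
        4 => Ok(buf.chunks_exact(4).flat_map(|p| [p[2], p[1], p[0]]).collect()),
        n => Err(format!("unsupported {} bytes per pixel", n)),
    }
}

fn main() {
    // Same shape as the failing case: a 200x200 image at one byte per pixel.
    let gray = vec![7u8; expected_len(200, 200, 1)];
    let rgb = to_rgb(&gray, 200, 200).expect("1 byte/pixel is convertible");
    println!("{} -> {}", gray.len(), rgb.len()); // prints "40000 -> 120000"
}
```

Returning a `Result` instead of asserting means an unexpected depth surfaces as a tool error rather than a panic on the runtime worker thread.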

## Harness bugs (will be fixed in v2 of the harness, not driver bugs)

| Tool | Why "FAIL" was spurious | Fix in harness |
|---|---|---|
| `launch_app` | Passed `{"app":"xclock"}` but schema wants `name` / `launch_path` / `urls` / `bundle_id` | Use `{"name":"xclock"}` |
| `kill_app` | Cascading from `launch_app` failure (no pid to kill) | Auto-fixes once launch_app does |
| `hotkey`, `press_key`, `type_text` | Driver requires `window_id` for keyboard tools (looks up the focused X window through the pid's windows); I only sent `pid` | Add `window_id=$WIN_ID` |
| `get_recording_state`, `set_recording` | Harness's error detector matched the word "error" appearing as a property name in legitimate JSON output | Tighten match to `^❌` or `^Error:` |
| `type_text_chars` | Tool is deprecated by design — driver returns `"deprecated tool name — use 'type_text'"` as an actionable error | Mark as SKIP (deprecation is intentional) |

After applying those harness fixes, projected v2 verdict on Combo A + B: **~26 PASS · 1 FAIL (the real screenshot bug) · 5 SKIP**.
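The tightened error match from the table (`^❌` or `^Error:`) is shown here in Rust for consistency with the rest of the repo, although the harness itself is a shell script. The function name and the sample outputs are made up for illustration; only the two markers come from the table above.

```rust
/// Returns true only when some line of the tool output starts with an error marker.
/// Matching at line start avoids flagging JSON that merely has a field named "error".
fn looks_like_failure(output: &str) -> bool {
    output
        .lines()
        .any(|line| line.starts_with("❌") || line.starts_with("Error:"))
}

fn main() {
    // Hypothetical outputs, not captured from the driver.
    let legit = r#"{"recording": false, "error": null}"#;
    let failed = "Error: missing window_id";
    println!("{} {}", looks_like_failure(legit), looks_like_failure(failed)); // prints "false true"
}
```

A plain substring search for "error" would flag the first sample, which is the false positive described for `get_recording_state` and `set_recording`.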

## Setup that did NOT work cleanly

- **Combo C (Debian 12 / GNOME-XWayland)**: Xvfb wasn't preinstalled — `sudo apt install xvfb` fixes it; once installed, expected to mirror A + B's results
- **My laptop's public IP changed mid-session**: all 3 NSGs ship with `source: <single Mac IP>`; updated them via `az network nsg rule update --source-address-prefixes <new-IP>` to recover
- **Combo A wedged on SSH twice** despite Azure reporting "VM running": `az vm deallocate && az vm start` recovers cleanly

## What this means for the parity rollout (per LINUX_PARITY_AUDIT.md)

The audit listed 24 tools as "🟡 implemented-but-unverified". Once the v2 harness fixes are confirmed, ~20 of those flip to 🟢 verified (over X11 native + XWayland) on the same machine that produced this data. The 4 not yet covered are:

- `get_window_state`, `get_accessibility_tree`, `set_value` (the 3 AT-SPI-heavy tools — need a real DE session, not Xvfb, to register apps with the at-spi registry). Both v1 runs PASSED `get_window_state` and `get_accessibility_tree` against an empty-tree fixture, but the at-spi response was minimal — real-app verification still needed.
- `page` (Chromium with `--remote-debugging-port`)

Strict next step from here: install Chromium on one combo + drive a real-app session via xrdp to flip the AT-SPI tools, then file the `screenshot` Xvfb-depth bug as a real issue.

## How to re-run

```bash
# On any Linux host with Xvfb + xeyes + xdotool installed
cua-driver --version # sanity
bash scripts/linux-smoke.sh # writes per-tool PASS/FAIL/SKIP to stdout
```

Output ends with a sorted summary table and totals.
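For post-processing a saved run, the totals can be recomputed with a few lines. This sketch assumes each result line starts with the verdict followed by the tool name (for example `PASS click`). That line shape is a guess for illustration, not the harness's documented output format, and `tally` is a hypothetical helper.

```rust
use std::collections::BTreeMap;

/// Count verdicts from lines shaped like "PASS click".
/// Lines whose first word is not PASS, FAIL or SKIP are ignored.
fn tally(lines: &str) -> BTreeMap<String, usize> {
    let mut counts = BTreeMap::new();
    for line in lines.lines() {
        if let Some(verdict) = line.split_whitespace().next() {
            if matches!(verdict, "PASS" | "FAIL" | "SKIP") {
                *counts.entry(verdict.to_string()).or_insert(0) += 1;
            }
        }
    }
    counts
}

fn main() {
    // Made-up sample, not real harness output.
    let sample = "PASS click\nFAIL screenshot\nSKIP page\nPASS zoom\n";
    for (verdict, n) in tally(sample) {
        println!("{}: {}", verdict, n);
    }
}
```

Comparing these counts between two hosts is a quick way to confirm the "identical verdicts on both distros" claim after a re-run.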