fix(agent-ui): keep app alive on GPU-process crash in GPU-less envs #1801

Draft
github-actions[bot] wants to merge 1 commit into main from autofix/issue-1800

@github-actions

The Agent UI desktop app crashed on launch in any GPU-less environment (Windows Sandbox, headless VMs, some RDP sessions). In those environments the Electron GPU child process crashes, and GAIA's child-process-gone safety net treated that as fatal: it showed "GAIA crashed" and exited before the window ever rendered. GPU-process crashes are recoverable (Chromium relaunches the GPU process and, after repeated failures, falls back to software rendering), so the app should ride them out rather than die. After this change a GPU crash is logged and ignored; the app keeps running and renders via software fallback. Non-GPU child-process crashes stay fatal, so unrelated failures are still surfaced.
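
For reviewers, the behavior described above can be sketched roughly as follows. This is an illustrative sketch, not the code in this PR: `RECOVERABLE_TYPES`, `isFatalChildProcessGone`, and `makeChildProcessGoneHandler` are hypothetical names, and the injected `log` / `onFatal` callbacks are an assumption made so the logic can run without Electron. Only the `child-process-gone` event and the `details.type` / `details.reason` fields are taken from Electron's documented `app` API.

```javascript
// Hypothetical helper names; not the actual GAIA implementation.

// Child-process types whose loss Chromium can recover from on its own.
const RECOVERABLE_TYPES = new Set(['GPU']);

// Returns true when a child-process-gone event should terminate the app.
function isFatalChildProcessGone(details) {
  return !RECOVERABLE_TYPES.has(details.type);
}

// Builds a listener with the (event, details) signature that Electron
// passes to `child-process-gone` handlers.
function makeChildProcessGoneHandler({ log, onFatal }) {
  return (_event, details) => {
    const summary =
      `child-process-gone type=${details.type} reason=${details.reason}`;
    if (isFatalChildProcessGone(details)) {
      // Non-GPU crash: surface it and let the caller exit.
      onFatal(summary);
    } else {
      // GPU crash: record it and keep running.
      log(`${summary} (ignored, leaving recovery to Chromium)`);
    }
  };
}

// Wiring in an Electron main process would look roughly like:
//   const { app } = require('electron');
//   app.on('child-process-gone', makeChildProcessGoneHandler({
//     log: console.warn,
//     onFatal: (msg) => { console.error(msg); app.exit(1); },
//   }));
```

Keeping the fatal/non-fatal decision in a pure function is one way to make it testable without launching Electron; the real handler may be structured differently.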

Closes #1800

Test plan

  • cd tests/electron && npm test -- test_main_error_handling.js passes (GPU crash → no exit; non-GPU crash → still fatal)
  • python util/lint.py --all passes
  • Launch the packaged Agent UI in Windows Sandbox (no GPU): app gets past "Loading ML libraries" and renders the chat window instead of showing "GAIA crashed — child-process-gone type=GPU"
  • On a normal machine with a working GPU, the app still launches and renders as before (GPU acceleration unchanged)

⚠️ Needs manual validation — the automated checks confirm no Python regression and that the safety-net logic behaves correctly, but they can't exercise a real Electron GPU-process crash. A maintainer should verify on Windows Sandbox (and a normal GPU machine) per the steps above before merging.
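
As a rough illustration of what the automated check can and cannot cover, the two scenarios from the test plan can be simulated with a Node `EventEmitter` standing in for Electron's `app`. Everything here is a hypothetical test double (`createFakeApp`, `installSafetyNet`, `runScenarios`), not the contents of test_main_error_handling.js, and the exit code of 1 is an assumption. It checks only the branching logic; no real GPU process is involved.

```javascript
const { EventEmitter } = require('node:events');

// Stand-in for Electron's `app`: an emitter that records exit() calls
// instead of terminating the process. (Hypothetical test double.)
function createFakeApp() {
  const fakeApp = new EventEmitter();
  fakeApp.exitCodes = [];
  fakeApp.exit = (code) => fakeApp.exitCodes.push(code);
  return fakeApp;
}

// Assumed shape of the safety net under test: GPU crashes are logged,
// any other child-process crash exits with code 1.
function installSafetyNet(app, log) {
  app.on('child-process-gone', (_event, details) => {
    if (details.type === 'GPU') {
      log(`GPU process gone (${details.reason}); continuing`);
      return;
    }
    app.exit(1);
  });
}

// Emits one GPU crash and one non-GPU crash, each against a fresh fake
// app, and returns what was observed.
function runScenarios() {
  const warnings = [];
  const record = (msg) => warnings.push(msg);

  const gpuApp = createFakeApp();
  installSafetyNet(gpuApp, record);
  gpuApp.emit('child-process-gone', {}, { type: 'GPU', reason: 'crashed' });

  const utilityApp = createFakeApp();
  installSafetyNet(utilityApp, record);
  utilityApp.emit('child-process-gone', {}, { type: 'Utility', reason: 'crashed' });

  return {
    gpuExits: gpuApp.exitCodes,
    utilityExits: utilityApp.exitCodes,
    warnings,
  };
}
```

A simulation like this confirms "GPU crash → no exit; non-GPU crash → still fatal" at the logic level, which is why the Windows Sandbox run remains a manual step.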

A GPU-process crash routed through the child-process-gone safety net was
treated as fatal, killing the whole Electron app in GPU-less environments
(Windows Sandbox, headless VMs). GPU crashes are recoverable — Chromium
relaunches the GPU process and falls back to software rendering — so log
them and let Chromium recover instead of exiting. Other child-process
crashes stay fatal.

Closes #1800

Development

Successfully merging this pull request may close these issues.

[Bug]: Agent UI (Electron) crashes in GPU-less environments (Windows Sandbox / VM) — child-process-gone type=GPU