agent-sh/computer-use-linux: Linux desktop control over MCP (AT-SPI, GNOME Shell, Wayland portals, ydotool)

Control a real Linux desktop from any MCP host.


computer-use-linux reads accessibility trees, takes screenshots, and drives clicks, scrolls, and keystrokes across GNOME, KDE/KWin, Hyprland, i3, and COSMIC — Wayland-first, X11 best-effort.

npm install -g @agent-sh/computer-use-linux
computer-use-linux doctor | jq .readiness

The Rust crate is published as computer-use-linux and the npm wrapper as @agent-sh/computer-use-linux. Prebuilt binaries ship with the latest release.

What this is

computer-use-linux is a Rust MCP server and CLI for Linux desktop control. The crate ships the main computer-use-linux binary plus a small computer-use-linux-cosmic helper used only for COSMIC Wayland window management. Any MCP host — Codex Desktop's Linux build, Claude Desktop, Hermes Agent, or your own client — can spawn it and gain full control of the local Linux desktop: read accessibility trees, list and focus windows, take screenshots, click, drag, scroll, type, and invoke semantic accessibility actions.

Most computer-use MCP servers are macOS-only (they lean on AppKit, AXUIElement, CGEvent). The few that target Linux either drive xdotool against an X11 root window or shell out to OCR over screenshots. Four things set this one apart:

The crate was extracted from codex-desktop-linux (the Linux distribution of Codex Desktop), which still bundles this binary as a built-in plugin. This standalone repo is the upstream.

Features

MCP tools exposed by the server:

Diagnostics

Discovery

Screenshot payloads are size-bounded by default before they are returned to the MCP host: max 1920 px width/height and 2 MiB image bytes, with hard caps even when callers request more. Agents that need more detail can pass max_width, max_height, max_bytes, scale, format: "jpeg", or quality, preferably with a window target or crop. PNG remains the default; JPEG lets callers trade lossless pixels for a smaller payload before the byte cap forces further resizing. Returned screenshot metadata includes coordinate_width, coordinate_height, scale, format, and quality so callers can convert from a downscaled preview to desktop coordinate pixels.
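The coordinate conversion mentioned above can be sketched as follows. This is a minimal illustration, not the server's own code: it assumes `coordinate_width` and `coordinate_height` describe the desktop coordinate space, and that the caller already knows the pixel size of the returned image (the hypothetical `preview_size` argument). The `scale` field is deliberately not relied on, since its exact semantics are not spelled out here.

```python
def preview_to_desktop(px, py, preview_size, meta):
    """Map a pixel in the returned (possibly downscaled) image to desktop coordinates.

    preview_size: (width, height) of the decoded image, supplied by the caller
                  (assumption: it is not read from the metadata).
    meta:         screenshot metadata; only coordinate_width and
                  coordinate_height are used.
    """
    preview_w, preview_h = preview_size
    cw, ch = meta["coordinate_width"], meta["coordinate_height"]
    # Scale each axis independently, then clamp to the last valid desktop pixel.
    x = min(int(px * cw / preview_w), cw - 1)
    y = min(int(py * ch / preview_h), ch - 1)
    return x, y


# Example metadata for a 3840x2160 desktop returned as a 1920x1080 preview.
meta = {"coordinate_width": 3840, "coordinate_height": 2160, "format": "png"}
print(preview_to_desktop(960, 540, (1920, 1080), meta))  # (1920, 1080)
```

Scaling per axis keeps the mapping correct even if a crop or byte cap produced a preview whose aspect ratio differs slightly from the desktop's.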

Input

Semantic actions

Navigation

MCP safety contract

computer-use-linux is not a read-only data source. It can observe the local desktop and, when a mutating tool is called, can change real application state. The tools/list response includes MCP ToolAnnotations so hosts can surface this distinction before invocation:

| Class | Tools | Contract |
| --- | --- | --- |
| Read-only observation | doctor, list_apps, list_windows, focused_window, get_app_state | readOnlyHint=true; may reveal app, window, accessibility, and screenshot contents. get_app_state may trigger the desktop screenshot portal prompt. |
| Local setup mutators | setup_accessibility, setup_window_targeting | readOnlyHint=false, destructiveHint=false, idempotentHint=true; modifies user desktop configuration by enabling accessibility or installing/enabling the GNOME window-targeting extension. |
| UI state mutators | activate_window, scroll, screenshot | readOnlyHint=false, destructiveHint=false; changes focus or scroll position in the live desktop, or raises a window to capture it. |
| Desktop action mutators | click, drag, press_key, type_text, perform_action, set_value | readOnlyHint=false, destructiveHint=true, openWorldHint=true; can trigger arbitrary actions in whatever local application is targeted. |

Annotations are safety hints, not an authorization system. MCP hosts should still ask the user before calls that could submit, delete, send, purchase, overwrite, or otherwise commit state.
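As an illustration of how a host might act on these hints, here is a small sketch that maps one `tools/list` entry to a confirmation level. The three levels (`allow`, `notify`, `confirm`) and the function name are hypothetical policy choices for this example, not something the server defines. The only outside fact relied on is that MCP treats a missing `destructiveHint` as true.

```python
def confirmation_level(tool):
    """Return 'allow', 'notify', or 'confirm' for one tools/list entry."""
    ann = tool.get("annotations") or {}
    if ann.get("readOnlyHint", False):
        return "allow"
    # A missing destructiveHint counts as destructive, so default to the strict path.
    if ann.get("destructiveHint", True):
        return "confirm"
    return "notify"


# Sample entries mirroring the annotation classes listed above.
tools = [
    {"name": "list_windows", "annotations": {"readOnlyHint": True}},
    {"name": "scroll", "annotations": {"readOnlyHint": False, "destructiveHint": False}},
    {"name": "click", "annotations": {"readOnlyHint": False, "destructiveHint": True}},
]
for tool in tools:
    print(tool["name"], confirmation_level(tool))
```

A gate like this is a floor, not a ceiling: a host can still ask the user before any call that could commit state, regardless of what the hints say.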

The binary also exposes the same capabilities from the CLI for scripting and debugging:

computer-use-linux mcp                                  # stdio MCP server
computer-use-linux doctor                               # JSON readiness report
computer-use-linux setup                                # enable AT-SPI
computer-use-linux setup-window-targeting               # install GNOME Shell extension
computer-use-linux apps
computer-use-linux state [APP_NAME]
computer-use-linux screenshot                           # JSON screenshot summary
computer-use-linux windows

Support matrix

Validated manually on Ubuntu 25.10 (GNOME Shell 50.1, Wayland). Other compositor backends are implemented and covered by parser / contract tests, but real desktop behavior still depends on each session exposing its expected control API.

| Desktop/session | Window backend | Notes |
| --- | --- | --- |
| GNOME Wayland | GNOME Shell extension first, org.gnome.Shell.Introspect fallback | Full target. The extension provides exact window activation when GNOME blocks native introspection; Introspect can list windows and focus apps by app_id when allowed. |
| GNOME X11 | org.gnome.Shell.Introspect when allowed | AT-SPI and ydotool work; the bundled GNOME Shell extension is only needed for GNOME Wayland. Exact per-window focus may be unavailable without the extension backend. |
| KDE Plasma / KWin | temporary KWin DBus scripting | Lists and focuses windows through org.kde.KWin scripting when the session bus exposes it. |
| Hyprland | hyprctl clients -j and hyprctl dispatch focuswindow | Requires hyprctl in the desktop session. |
| i3 | i3-msg; optional xprop for PID hydration | Lists and focuses i3 windows over the active i3 IPC socket. |
| COSMIC Wayland | computer-use-linux-cosmic helper | Installed automatically by ./install.sh, cargo install, and npm. For custom/manual layouts, put the helper next to the main binary, on PATH, or point COMPUTER_USE_LINUX_COSMIC_HELPER at it. |
| Sway / generic wlroots | no dedicated backend yet | AT-SPI, screenshots, and global ydotool input can still work; exact window list/focus is currently unavailable unless another backend applies. |
| Generic X11 / XFCE / other WMs | no dedicated backend yet | AT-SPI plus ydotool global input only, unless running under i3. |

If you run on a desktop not covered above, or a covered backend does not come up cleanly, please open an issue with the output of computer-use-linux doctor so we can extend the matrix honestly.

Install

COSMIC users do not need a second package or a separate helper install when using ./install.sh, cargo install, or the npm wrapper. Those paths install computer-use-linux-cosmic alongside the main binary automatically. Only manual prebuilt-binary installs need you to copy both release assets.

Option A — ./install.sh from a clone

Installs system packages on Debian/Ubuntu, Fedora/RHEL-like, or Arch-like distros; installs Rust if needed; builds both release binaries; installs them to ~/.local/bin; enables ydotoold as a user service; enables GNOME AT-SPI settings when running under GNOME; and installs the bundled GNOME Shell extension on GNOME Wayland.

git clone https://github.com/agent-sh/computer-use-linux
cd computer-use-linux
./install.sh
# log out and back in if the GNOME extension was newly installed
computer-use-linux doctor | jq .readiness

Option B — cargo install (Rust binaries, no system setup)

Installs the Rust binaries from crates.io. You still handle the system-level pieces yourself: ydotoold, AT-SPI, desktop portals, and the GNOME extension if you need the GNOME Wayland exact-focus backend.

cargo install computer-use-linux
computer-use-linux doctor

For unreleased changes from main, install directly from Git:

cargo install --git https://github.com/agent-sh/computer-use-linux

Then, as needed:

sudo apt install ydotool at-spi2-core      # or your distro's equivalent
systemctl --user enable --now ydotoold
computer-use-linux setup                   # gsettings AT-SPI bridge
computer-use-linux setup-window-targeting  # GNOME Shell extension

Option C — npm wrapper (binary download)

Good for users who already have Node.js and want a no-Rust install. The npm package downloads and verifies the matching main and COSMIC helper binaries during install, then the wrapper sets COMPUTER_USE_LINUX_COSMIC_HELPER to the bundled helper automatically.

npm install -g @agent-sh/computer-use-linux
computer-use-linux doctor

You will still need ydotoold running and AT-SPI enabled (run computer-use-linux setup and the systemd commands above).

Option D — prebuilt binaries

Linux x86_64 / aarch64 builds are published with each tag. Each binary ships a .sha256 next to it.

target=x86_64-unknown-linux-gnu
base=https://github.com/agent-sh/computer-use-linux/releases/latest/download
for binary in computer-use-linux computer-use-linux-cosmic; do
  asset="$binary-$target"
  curl -L -O "$base/$asset"
  curl -L -O "$base/$asset.sha256"
  sha256sum -c "$asset.sha256"
  install -m 0755 "$asset" "$HOME/.local/bin/$binary"
done

You will still need ydotoold running and AT-SPI enabled (run computer-use-linux setup and the systemd commands above).
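If you script the Option D download from a language other than shell, the checksum step can be reproduced with the standard library. The sketch below assumes the `.sha256` file uses the usual `sha256sum` layout (hex digest first, then the file name), which is what `sha256sum -c` expects. The function name is made up for this example.

```python
import hashlib


def sha256_matches(asset_path, checksum_text):
    """Compare a file's SHA-256 with the first field of a sha256sum-style line."""
    expected = checksum_text.split()[0].lower()
    digest = hashlib.sha256()
    with open(asset_path, "rb") as fh:
        # Read in 1 MiB chunks so large binaries are not loaded into memory at once.
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected


# Usage (paths are illustrative):
# ok = sha256_matches("computer-use-linux-x86_64-unknown-linux-gnu",
#                     open("computer-use-linux-x86_64-unknown-linux-gnu.sha256").read())
```

Only install the binary when the function returns True; a mismatch usually means a truncated download or an error page saved in place of the asset.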

Wire it into your MCP host

The binary speaks the rmcp 2024-11-05 stdio protocol. Pass mcp as the only argument; everything else is configured through MCP tool calls.

Codex Desktop (Linux build)

The Linux build of Codex Desktop already bundles this binary as a plugin. You don't need to wire it up manually — the plugin definition lives in codex-desktop-linux under its plugins/ directory and is enabled by default. To upgrade the plugin in place, replace the binary it ships with the one from this repo's release assets.

Claude Code (CLI)

Use the claude mcp add command to register the binary as a stdio MCP server. Pick a scope:

# User-wide install (recommended for desktop control)
claude mcp add --scope user computer-use-linux -- computer-use-linux mcp

# Verify the server is registered and reachable
claude mcp list

If computer-use-linux is not on PATH, pass the absolute path (e.g. ~/.local/bin/computer-use-linux). Inside a Claude Code session, run /mcp to confirm the tools are loaded.

Claude Desktop

Edit ~/.config/Claude/claude_desktop_config.json:

{
  "mcpServers": {
    "computer-use-linux": {
      "command": "computer-use-linux",
      "args": ["mcp"]
    }
  }
}

Restart Claude Desktop. The tools should appear in the tools list.
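If the config file already lists other servers, the new entry has to be merged in rather than pasted over them. A minimal sketch of that merge follows; the helper name is hypothetical, and the file is handled as plain JSON without assuming anything about keys other than `mcpServers`.

```python
import json


def add_server(config, name="computer-use-linux", command="computer-use-linux"):
    """Return a copy of the config with the stdio server entry added or replaced."""
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers[name] = {"command": command, "args": ["mcp"]}
    merged["mcpServers"] = servers
    return merged


# Example: an existing config with one unrelated server stays intact.
existing = {"mcpServers": {"other": {"command": "other-server", "args": []}}}
print(json.dumps(add_server(existing), indent=2))
```

Pass an absolute path as `command` when the binary is not on the PATH that Claude Desktop inherits.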

Hermes Agent

Install the companion Hermes skill so Hermes has the desktop-specific runbook:

hermes skills tap add agent-sh/computer-use-linux
hermes skills install agent-sh/computer-use-linux/computer-use-linux

The skill is optional but recommended for Hermes users. It teaches Hermes how to install, configure, verify, and call the Linux desktop MCP safely. It follows the same skills/<name>/SKILL.md tap layout used by Hermes community skills.

Then add the stdio MCP server:

hermes mcp add computer-use-linux --command computer-use-linux --args mcp
hermes mcp test computer-use-linux
hermes mcp configure computer-use-linux

configure opens Hermes' tool-selection UI for the server. The generated config should look like this:

mcp_servers:
  computer-use-linux:
    command: computer-use-linux
    args: ["mcp"]
    timeout: 120
    connect_timeout: 30

Optional: expose the tools to subagents as well.

inherit_mcp_toolsets: true

If you installed the binary somewhere that is not on PATH, pass the absolute path as --command.

Restart Hermes after editing the config. Hermes registers the tools as mcp_computer_use_linux_<tool> and creates the mcp-computer-use-linux runtime toolset.

You can verify both sides before asking Hermes to use the desktop:

computer-use-linux doctor | jq .readiness
hermes skills inspect agent-sh/computer-use-linux/computer-use-linux
hermes chat --toolsets mcp-computer-use-linux -q "List the current desktop windows."

For one-off installs without adding the tap first, Hermes also accepts hermes skills install agent-sh/computer-use-linux/skills/computer-use-linux.

Generic MCP client

Spawn the binary with ["mcp"] as the argv tail. It speaks JSON-RPC over stdio per the rmcp 2024-11-05 protocol; capability discovery happens through tools/list and the doctor tool. The server normally needs no MCP-specific configuration, but desktop runtime environment still matters (DBUS_SESSION_BUS_ADDRESS, XDG_RUNTIME_DIR, portals, AT-SPI, ydotoold, and optionally COMPUTER_USE_LINUX_COSMIC_HELPER).
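For a hand-rolled client, the opening exchange can be sketched as below. It builds the standard MCP messages (`initialize`, the `notifications/initialized` notification, then `tools/list`) and frames each as one line of JSON, which is how the MCP stdio transport delimits messages. The helper names and the client name/version are placeholders for this example. A real client should read the `initialize` response before sending the next message.

```python
import json

PROTOCOL_VERSION = "2024-11-05"


def request(msg_id, method, params=None):
    """Build a JSON-RPC 2.0 request."""
    msg = {"jsonrpc": "2.0", "id": msg_id, "method": method}
    if params is not None:
        msg["params"] = params
    return msg


def notification(method):
    """Build a JSON-RPC 2.0 notification (no id, so no response is expected)."""
    return {"jsonrpc": "2.0", "method": method}


def frame(msg):
    """Serialize one message as a single newline-terminated line for stdio."""
    return (json.dumps(msg, separators=(",", ":")) + "\n").encode("utf-8")


def handshake(client_name="example-client", client_version="0.0.0"):
    """Messages a client sends, in order, to reach tool discovery."""
    return [
        request(1, "initialize", {
            "protocolVersion": PROTOCOL_VERSION,
            "capabilities": {},
            "clientInfo": {"name": client_name, "version": client_version},
        }),
        notification("notifications/initialized"),
        request(2, "tools/list"),
    ]


# Usage sketch (requires the binary and a desktop session, so left as comments):
# import subprocess
# proc = subprocess.Popen(["computer-use-linux", "mcp"],
#                         stdin=subprocess.PIPE, stdout=subprocess.PIPE)
# proc.stdin.write(frame(handshake()[0])); proc.stdin.flush()
# reply = json.loads(proc.stdout.readline())
```

The child process inherits the environment of whatever spawns it, so launch the client from inside the desktop session (or pass `DBUS_SESSION_BUS_ADDRESS` and `XDG_RUNTIME_DIR` through explicitly).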

First-run checklist

  1. Run doctor.
    computer-use-linux doctor | jq .readiness
    Aim for can_register_mcp_tools, can_build_accessibility_tree, can_send_development_input, and can_query_windows all true. The blockers array should be empty.
  2. If accessibility.at_spi_bus.ok = false — run computer-use-linux setup (or call the setup_accessibility MCP tool). This sets:
    • org.gnome.desktop.interface toolkit-accessibility true
      You may need to restart toolkit-using apps for the change to take effect.
  3. If windowing.can_list_windows = false — inspect doctor.windowing.backends. On GNOME Wayland, run computer-use-linux setup-window-targeting (or call setup_window_targeting) to install the bundled computer-use-linux@avifenesh.dev Shell extension, then log out and back in so GNOME Shell loads it. On KDE, Hyprland, i3, or COSMIC, install or expose the matching compositor tool/helper shown in the backend details.
  4. Grant the screencast portal on first screenshot. The first time get_app_state or any screenshot subcommand runs, GNOME will pop a portal dialog asking to share the screen. Accept once and tick "remember" to make it sticky for the session.
  5. Confirm ydotoold is running.
    systemctl --user status ydotoold
    Its socket should appear at /run/user/$UID/.ydotool_socket.
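Steps 1 and 5 of the checklist are easy to automate. The sketch below checks a parsed `readiness` object for the four flags named in step 1 and builds the expected ydotoold socket path from step 5. It assumes `blockers` sits inside the `readiness` object; adjust the lookup if your `doctor` output places it elsewhere. Both function names are made up for this example.

```python
import os

REQUIRED_FLAGS = (
    "can_register_mcp_tools",
    "can_build_accessibility_tree",
    "can_send_development_input",
    "can_query_windows",
)


def readiness_problems(readiness):
    """List every required flag that is not true, plus any reported blockers."""
    problems = [flag for flag in REQUIRED_FLAGS if readiness.get(flag) is not True]
    # Assumption: blockers is a list under the readiness object.
    problems += [f"blocker: {item}" for item in readiness.get("blockers", [])]
    return problems


def ydotool_socket_path(uid=None):
    """Expected ydotoold socket location for the given (or current) user id."""
    uid = os.getuid() if uid is None else uid
    return f"/run/user/{uid}/.ydotool_socket"


# Example with a hand-written readiness object (not real doctor output).
sample = {
    "can_register_mcp_tools": True,
    "can_build_accessibility_tree": True,
    "can_send_development_input": True,
    "can_query_windows": False,
    "blockers": [],
}
print(readiness_problems(sample))  # ['can_query_windows']
```

An empty list means the session matches the target state in step 1. Anything else points at which of steps 2 to 5 to revisit.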

Environment variables

Most setups need none of these — doctor and the installers pick sensible defaults. They exist for overriding auto-detected paths and input backends.

Server runtime (set in the MCP host's environment):

| Variable | Effect |
| --- | --- |
| COMPUTER_USE_LINUX_COSMIC_HELPER | Path to the computer-use-linux-cosmic helper when it isn't next to the binary or on PATH. |
| CU_DISABLE_ABS_POINTER | Disable the uinput absolute pointer and click through ydotool instead for setups where the abs-pointer device misbehaves. |
| COMPUTER_USE_LINUX_FORCE_PORTAL_POINTER / …_KEYBOARD | Always route pointer / keyboard through the RemoteDesktop portal on Wayland, skipping auto-detection. |
| COMPUTER_USE_LINUX_FORCE_YDOTOOL_POINTER / …_KEYBOARD | Always route pointer / keyboard through ydotool, skipping the portal and KDE clipboard paths. |
| COMPUTER_USE_LINUX_SCREENSHOT_BACKEND | Force a single screenshot backend, skipping the fallback chain. Accepts gnome-shell, portal, or gnome-screenshot. Pin gnome-screenshot for background/systemd contexts where the GNOME Shell and portal DBus paths are denied. |
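To make the COMPUTER_USE_LINUX_COSMIC_HELPER entry concrete, here is a sketch of a lookup over the three documented locations: the environment variable, the directory next to the main binary, and PATH. The order shown (override first) is an assumption made for this example, and the server's real lookup order may differ. The function is illustrative, not the crate's code.

```python
import os
import shutil
from pathlib import Path

HELPER_NAME = "computer-use-linux-cosmic"


def find_cosmic_helper(main_binary, env=None):
    """Return a path to the COSMIC helper, or None if no location has it."""
    env = os.environ if env is None else env
    # 1. Explicit override (assumed to take priority in this sketch).
    override = env.get("COMPUTER_USE_LINUX_COSMIC_HELPER")
    if override:
        return override
    # 2. Executable sitting next to the main binary.
    sibling = Path(main_binary).resolve().parent / HELPER_NAME
    if sibling.is_file() and os.access(sibling, os.X_OK):
        return str(sibling)
    # 3. Anything on PATH.
    return shutil.which(HELPER_NAME, path=env.get("PATH"))
```

A debugging script can call this with the path of the installed main binary to see which helper a manual layout would pick up under these assumptions.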

Build-time identity overrides (set while compiling a downstream embedded bundle): CUL_GNOME_EXTENSION_UUID, CUL_DBUS_SERVICE, and CUL_DBUS_OBJECT_PATH replace the default standalone GNOME Shell extension UUID and DBus endpoint in both the Rust probes and the generated extension files.

npm wrapper (set during npm install, or before running):

| Variable | Effect |
| --- | --- |
| COMPUTER_USE_LINUX_BIN | Run this binary instead of the one bundled by the npm package. |
| COMPUTER_USE_LINUX_DOWNLOAD_BASE | Override the GitHub release base URL the installer downloads from (mirrors, air-gapped hosts). |
| COMPUTER_USE_LINUX_SKIP_DOWNLOAD=1 | Skip the post-install binary download entirely. |
| COMPUTER_USE_LINUX_LOCAL_BINARY / …_LOCAL_COSMIC_HELPER | Install from a local build instead of downloading (used by CI and local testing). |

Architecture

Security

Computer-use tooling is, by definition, a privilege-escalation surface. The threat model:

If you're running this on a shared workstation, set ydotoold's socket permissions to 0600 (the default) and audit which processes on your user can connect() to it.
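The permission half of that advice can be checked with a few lines of standard-library code. This sketch only inspects ownership and mode bits on the socket path. It does not cover the second half (auditing which of your own processes can connect), and the function name is made up for this example.

```python
import os
import stat


def socket_is_private(path):
    """True if the path is owned by the current user and closed to group/other."""
    st = os.stat(path)
    mode = stat.S_IMODE(st.st_mode)
    # 0o077 masks every group and other permission bit; all must be clear.
    return st.st_uid == os.getuid() and (mode & 0o077) == 0


# Usage (path taken from the first-run checklist):
# socket_is_private(f"/run/user/{os.getuid()}/.ydotool_socket")
```

A False result on a shared machine means another account may be able to inject input through ydotoold.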

Troubleshooting

computer-use-linux doctor is the source of truth. Common failure modes and fixes:

If doctor is green and a specific tool still misbehaves, file an issue with the JSON output of doctor and the failing tool's request payload.

Contributing

Contributions are welcome. See CONTRIBUTING.md for the local development workflow, CI gates, and PR expectations. Report security vulnerabilities through SECURITY.md, not public issues.

Credits

Extracted from codex-desktop-linux, the Linux distribution of Codex Desktop, which continues to ship this same binary as a bundled plugin. Maintained by Avi Fenesh.

Built on top of:

Publishing

Publishing is tag-driven from GitHub Actions. The repository needs these Actions secrets:

gh secret set CARGO_REGISTRY_TOKEN -R agent-sh/computer-use-linux
gh secret set NPM_TOKEN -R agent-sh/computer-use-linux

Then bump Cargo.toml and package.json together, update CHANGELOG.md, and push a vX.Y.Z tag. CI runs the full Rust and MCP safety gates, builds release assets for both architectures, publishes computer-use-linux to crates.io, and publishes the npm wrapper after the GitHub release binaries are available.

License

MIT — see LICENSE.