Ubuntu Desktop Control MCP Server

Control Ubuntu desktops through screenshots, mouse clicks, and keyboard interactions

An MCP (Model Context Protocol) server that lets LLMs control your Ubuntu desktop by taking screenshots and sending mouse and keyboard input. This allows AI assistants to interact visually with your desktop applications.

⚡ NEW: Optimized Production Workflow

5x faster, 5x more accurate! Now using the same optimization techniques as Anthropic's Computer Use API:

  • 📸 Smart Screenshots: Auto-downsampled to 1280x720 (5x smaller)
  • 🎯 Numbered Elements: See what's clickable at a glance with overlaid IDs
  • 🤖 AT-SPI Integration: Automatic UI element detection using accessibility API
  • 📐 Percentage Coords: Resolution-agnostic positioning (no more pixel hunting!)
  • ⚡ Workflow Batching: Execute multiple actions in one MCP call
  • 🎪 Element Cache: Direct element interaction - "click element #5"
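To illustrate why percentage coordinates are resolution-agnostic, here is a minimal sketch of the conversion to pixels. It assumes percentages run from 0 to 100 and that the result is clamped to the last valid pixel; the function name `percent_to_pixels` is hypothetical and not part of the server's API.

```python
from typing import Tuple


def percent_to_pixels(x_pct: float, y_pct: float, width: int, height: int) -> Tuple[int, int]:
    """Map a (0-100, 0-100) percentage position onto a width x height screen.

    Hypothetical helper: the server's real conversion may differ.
    """
    if not (0.0 <= x_pct <= 100.0 and 0.0 <= y_pct <= 100.0):
        raise ValueError("percentages must be within 0..100")
    # Clamp so that 100% lands on the last pixel instead of one past it.
    x = min(int(round(x_pct / 100.0 * width)), width - 1)
    y = min(int(round(y_pct / 100.0 * height)), height - 1)
    return x, y


if __name__ == "__main__":
    # The same percentage position on two different resolutions.
    print(percent_to_pixels(50, 50, 1920, 1080))
    print(percent_to_pixels(50, 50, 1280, 720))
```

The same `(50, 50)` input targets the screen center at any resolution, which is the property the feature list describes.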

Example - Old way (8+ calls, ~15s):

take_screenshot() → analyze → grid overlay → zoom quadrant → find pixel → click → miss

Example - New way (1 call, ~3s):

take_screenshot() → "I see Pinta is element #5" → click_screen(element_id=5) → ✓

See README.md for full details.
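The "click element #5" flow above relies on remembering where each numbered element was when the screenshot was taken. A rough sketch of such a cache follows; the `Element` and `ElementCache` names and their fields are assumptions made for illustration, not the server's actual data model.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class Element:
    """One detected UI element (hypothetical shape)."""
    element_id: int
    label: str
    x: int       # left edge in pixels
    y: int       # top edge in pixels
    width: int
    height: int

    def center(self) -> Tuple[int, int]:
        return (self.x + self.width // 2, self.y + self.height // 2)


class ElementCache:
    """Holds the elements found by the most recent screenshot."""

    def __init__(self) -> None:
        self._elements: Dict[int, Element] = {}

    def replace(self, elements: Iterable[Element]) -> None:
        # A new screenshot invalidates every previously numbered element.
        self._elements = {e.element_id: e for e in elements}

    def resolve(self, element_id: int) -> Tuple[int, int]:
        """Return the click point for an element ID from the last screenshot."""
        try:
            return self._elements[element_id].center()
        except KeyError:
            raise KeyError(f"element #{element_id} is not in the latest screenshot") from None


if __name__ == "__main__":
    cache = ElementCache()
    cache.replace([Element(5, "Pinta", x=100, y=200, width=40, height=20)])
    print(cache.resolve(5))
```

Replacing the whole cache on each screenshot keeps stale IDs from pointing at elements that have since moved or closed.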

Features

  • 📸 Screenshot Capture: Annotated screenshots with automatic element detection
  • 🔢 Element Detection: AT-SPI + CV fallback for robust UI element identification
  • 🖱️ Smart Clicking: Click by element ID or percentage coordinates
  • ⌨️ Keyboard Control: Type text and press keys/hotkeys
  • 🎯 Mouse Movement: Smooth cursor positioning with animation
  • 🚀 Workflow Batching: Execute multi-step tasks in single MCP call
  • 📊 Diagnostics: Display scaling detection, warnings, and recommendations
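Workflow batching can be pictured as a small dispatcher that runs a list of actions in order and stops at the first failure. The sketch below is illustrative only: the action dictionary shape (`{"action": ..., ...params}`), the `run_workflow` name, and the handler mapping are all assumptions, and the server's real `execute_workflow` schema may look different.

```python
import time
from typing import Any, Callable, Dict, List

Action = Dict[str, Any]
Handler = Callable[..., Any]


def run_workflow(actions: List[Action], handlers: Dict[str, Handler]) -> List[Dict[str, Any]]:
    """Run actions in order; stop at the first unknown or failing action."""
    results: List[Dict[str, Any]] = []
    for index, action in enumerate(actions):
        kind = action.get("action")
        params = {k: v for k, v in action.items() if k != "action"}
        if kind == "wait":
            # Built-in pause so the UI can settle between steps.
            time.sleep(float(params.get("seconds", 0)))
            results.append({"index": index, "action": kind, "ok": True})
            continue
        handler = handlers.get(kind)
        if handler is None:
            results.append({"index": index, "action": kind, "ok": False,
                            "error": f"unknown action: {kind}"})
            break
        try:
            handler(**params)
        except Exception as exc:  # report the failure instead of crashing the batch
            results.append({"index": index, "action": kind, "ok": False, "error": str(exc)})
            break
        results.append({"index": index, "action": kind, "ok": True})
    return results


if __name__ == "__main__":
    # Stand-in handlers that only print; a real server would drive the desktop.
    demo_handlers = {
        "click": lambda **kw: print("click", kw),
        "type": lambda **kw: print("type", kw),
    }
    steps = [
        {"action": "click", "element_id": 5},
        {"action": "wait", "seconds": 0.1},
        {"action": "type", "text": "Hello World"},
    ]
    print(run_workflow(steps, demo_handlers))
```

Stopping at the first failure is a design choice in this sketch: later steps usually depend on earlier ones having changed the screen as expected.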

Quick Start

1. Prerequisites

  • Ubuntu Linux (X11 required, Wayland not fully supported)
  • Python 3.9+
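A quick way to check both prerequisites before installing is sketched below. It assumes the session type is exposed through the standard `XDG_SESSION_TYPE` environment variable, which most Ubuntu desktop sessions set; the `check_prerequisites` helper is hypothetical and not shipped with the package.

```python
import os
import sys
from typing import List, Mapping, Optional, Tuple


def check_prerequisites(
    environ: Optional[Mapping[str, str]] = None,
    version: Optional[Tuple[int, int]] = None,
) -> List[str]:
    """Return a list of human-readable problems; empty means all good."""
    environ = os.environ if environ is None else environ
    version = tuple(sys.version_info[:2]) if version is None else version
    problems: List[str] = []
    if version < (3, 9):
        problems.append(f"Python 3.9+ is required, found {version[0]}.{version[1]}")
    session = environ.get("XDG_SESSION_TYPE", "").lower()
    if session != "x11":
        # An unset variable is reported as "unknown" rather than guessed.
        problems.append(f"X11 session required, found: {session or 'unknown'}")
    return problems


if __name__ == "__main__":
    issues = check_prerequisites()
    print("OK" if not issues else "\n".join(issues))
```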

2. Installation

From PyPI (Recommended)

pip install ubuntu-desktop-control

From Source

# Clone repository
git clone https://github.com/charettep/ubuntu-desktop-control-mcp.git
cd ubuntu-desktop-control-mcp

# Install system dependencies (requires sudo)
chmod +x scripts/install.sh
./scripts/install.sh

# Install Python dependencies
pip install -e .

Configuration

Claude Code

Installation Methods

Method 1: CLI (Recommended)

claude mcp add --transport stdio ubuntu-desktop-control -- \
  ubuntu-desktop-control

Method 2: Manual Config

Create or edit .mcp.json in your project root:

{
  "mcpServers": {
    "ubuntu-desktop-control": {
      "command": "ubuntu-desktop-control",
      "args": []
    }
  }
}
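If the config file already lists other servers, hand-editing risks overwriting them. The sketch below merges the entry shown above into an existing JSON file using only the standard library. The `add_server` helper is hypothetical, and the path you pass is whichever config file your client actually reads.

```python
import json
from pathlib import Path
from typing import Any, Dict, List, Optional


def add_server(config_path: Path, name: str, command: str,
               args: Optional[List[str]] = None) -> Dict[str, Any]:
    """Add or update one entry under "mcpServers", keeping other entries."""
    text = config_path.read_text() if config_path.exists() else ""
    config: Dict[str, Any] = json.loads(text) if text.strip() else {}
    servers = config.setdefault("mcpServers", {})
    servers[name] = {"command": command, "args": list(args or [])}
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n")
    return config


if __name__ == "__main__":
    import tempfile

    # Demonstrate against a throwaway file rather than a real client config.
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "config.json"
        add_server(path, "ubuntu-desktop-control", "ubuntu-desktop-control")
        print(path.read_text())
```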

VS Code Insiders

Installation Methods

Method 1: MCP Command

  1. Open Command Palette (Ctrl+Shift+P)
  2. Run MCP: Open Workspace Folder Configuration
  3. Add the server configuration below.

Method 2: Manual Config

Create .vscode/mcp.json in your workspace:

{
  "servers": {
    "ubuntu-desktop-control": {
      "type": "stdio",
      "command": "ubuntu-desktop-control",
      "args": []
    }
  }
}

Codex CLI

Installation Methods

Method 1: CLI

codex mcp add ubuntu-desktop-control -- \
  ubuntu-desktop-control

Method 2: Manual Config

Edit ~/.codex/config.toml:

[mcp_servers.ubuntu-desktop-control]
type = "stdio"
command = "ubuntu-desktop-control"
args = []

Tools

Core Capabilities

Tool                            Description
take_screenshot                 Capture the desktop (optionally per-monitor) with annotated elements.
click_screen                    Click by element ID or percentage coordinates (supports per-monitor).
move_mouse                      Move the cursor by element ID or percentage coordinates (supports per-monitor).
drag_mouse                      Drag the cursor to coordinates while holding a mouse button.
type_text                       Type text using the keyboard.
press_key                       Press a specific key (e.g., 'enter', 'esc').
press_hotkey                    Press a combination of keys simultaneously (e.g., Ctrl+Shift+C).
get_screen_info                 Get screen dimensions and display server type (X11/Wayland).
get_display_diagnostics         Troubleshoot scaling and coordinate mismatches.
map_GUI_elements_location       Detect and map UI elements (hitboxes) using Computer Vision.
convert_screenshot_coordinates  Convert pixels from a screenshot to logical click coordinates.
list_prompt_templates           List available prompt templates (for clients without native prompt support).
execute_workflow                Execute a batch of actions (screenshot/click/move/type/wait).
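Because screenshots are downsampled, a pixel seen in the image is not the pixel to click on the real screen. The sketch below shows the kind of scaling that a tool like convert_screenshot_coordinates has to perform. It assumes a plain linear scale between screenshot size and logical screen size; the `screenshot_to_screen` name and signature are hypothetical, and the real tool may also account for display scaling or monitor offsets.

```python
from typing import Tuple

Size = Tuple[int, int]


def screenshot_to_screen(x: int, y: int, shot_size: Size, screen_size: Size) -> Tuple[int, int]:
    """Scale a screenshot pixel to the matching logical screen coordinate."""
    shot_w, shot_h = shot_size
    screen_w, screen_h = screen_size
    if not (0 <= x < shot_w and 0 <= y < shot_h):
        raise ValueError("point lies outside the screenshot")
    # Linear scale per axis; int() truncates toward the top-left.
    return int(x * screen_w / shot_w), int(y * screen_h / shot_h)


if __name__ == "__main__":
    # A point in a 1280x720 screenshot of a 1920x1080 desktop.
    print(screenshot_to_screen(640, 360, (1280, 720), (1920, 1080)))
```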

Prompt Rendering Tools

These tools allow clients without native prompt support (like Codex CLI) to render prompt templates as text.

Tool                                  Description
render_prompt_baseline_display_check  Render the baseline display check prompt.
render_prompt_capture_full_desktop    Render the full desktop capture prompt.

Configuration

claude_desktop_config.json
{"mcpServers": {"ubuntu-desktop-control": {"command": "ubuntu-desktop-control", "args": []}}}

Try it

Take a screenshot of my desktop and tell me what applications are currently open.
Open the Pinta application, find the brush tool, and click on it.
Type 'Hello World' into the active text editor and then press the Enter key.
Use a hotkey to copy the selected text and then paste it into a new document.
Run a diagnostic check on my display scaling to ensure coordinates are accurate.

Frequently Asked Questions

What are the key features of Ubuntu Desktop Control?

  • Annotated screenshot capture with automatic element detection and ID overlays.
  • Hybrid element detection using the AT-SPI accessibility API with a Computer Vision fallback.
  • Resolution-agnostic positioning using percentage coordinates and element IDs.
  • Workflow batching to execute multiple desktop actions in a single MCP call.
  • Full keyboard and mouse control, including hotkeys, dragging, and smooth movement.

What can I use Ubuntu Desktop Control for?

  • Automating repetitive GUI tasks in Linux applications that lack APIs.
  • Assisting users with visual impairments by identifying and interacting with UI elements.
  • Performing automated UI testing and verification on Ubuntu desktops.
  • Providing AI-driven remote technical support by visually navigating the OS.
  • Streamlining complex workflows across multiple desktop applications.

How do I install Ubuntu Desktop Control?

Install Ubuntu Desktop Control by running: pip install ubuntu-desktop-control

What MCP clients work with Ubuntu Desktop Control?

Ubuntu Desktop Control works with any MCP-compatible client including Claude Desktop, Claude Code, Cursor, and other editors with MCP support.
