Delegate desktop automation tasks to an autonomous vision-based agent.
CUA MCP Server
An agentic Model Context Protocol (MCP) server for CUA Cloud that delegates desktop automation tasks to an autonomous vision-based agent. Images never leave the server; only text summaries are returned.
Production URL: https://cua-mcp-server.vercel.app/mcp
What is CUA?
CUA (Computer Use Agent) provides cloud-based virtual machine sandboxes that AI agents can control. This MCP server exposes CUA's capabilities through a clean task-delegation API:
- Create and manage VMs (Linux, Windows, macOS)
- Delegate tasks - "Open Chrome and navigate to google.com"
- Get text summaries - No images in your context window
- Query screen state - Vision-based descriptions without taking action
Architecture
Claude Code (Orchestrator)
│
│ run_task("Open Chrome and go to google.com")
▼
┌─────────────────────────────────────────────────────────────┐
│ CUA MCP Server (Agentic) │
│ ┌───────────────────────────────────────────────────────┐ │
│ │ Internal Agent Loop │ │
│ │ 1. screenshot() → CUA sandbox │ │
│ │ 2. screenshot → Claude API (computer_use tool) │ │
│ │ 3. Claude returns: click(x,y) / type("text") / done │ │
│ │ 4. Execute action on sandbox │ │
│ │ 5. Loop until complete │ │
│ └───────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────┘
│
▼
{ success: true, summary: "Opened Chrome...", steps_taken: 5 }
(TEXT ONLY - no images)
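The loop in the diagram can be sketched roughly as follows. This is a minimal illustration, not the server's actual code: the `Action`, `Sandbox`, `Planner`, and `TaskResult` types, the `runAgentLoop` name, and the `maxSteps` cap are all assumptions, and `plan` stands in for the Claude API call.

```typescript
// Hypothetical shapes: the real definitions live in lib/agent/types.ts and may differ.
type Action =
  | { kind: "click"; x: number; y: number }
  | { kind: "type"; text: string }
  | { kind: "done"; summary: string };

interface Sandbox {
  screenshot(): Promise<string>; // encoded image, kept on the server side
  execute(action: Action): Promise<void>;
}

// Stands in for the Claude API call that uses the computer_use tool.
type Planner = (task: string, screenshot: string) => Promise<Action>;

interface TaskResult {
  success: boolean;
  summary: string;
  steps_taken: number;
}

async function runAgentLoop(
  task: string,
  sandbox: Sandbox,
  plan: Planner,
  maxSteps = 20, // assumed safety cap, not taken from the source
): Promise<TaskResult> {
  for (let step = 0; step < maxSteps; step++) {
    const image = await sandbox.screenshot(); // 1. capture the screen
    const action = await plan(task, image); // 2-3. ask the model for the next action
    if (action.kind === "done") {
      // Only text leaves the loop; the screenshot is never returned.
      return { success: true, summary: action.summary, steps_taken: step };
    }
    await sandbox.execute(action); // 4. act on the sandbox, then 5. loop
  }
  return {
    success: false,
    summary: `Stopped after ${maxSteps} steps without finishing`,
    steps_taken: maxSteps,
  };
}
```

Injecting the sandbox and planner keeps the loop testable with stubs and makes the text-only return value explicit.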
Project Structure
api/mcp.ts # MCP protocol handler
lib/
├── agent/ # Modular agent architecture
│ ├── index.ts # Public exports
│ ├── types.ts # Type definitions
│ ├── config.ts # Model configurations
│ ├── validation.ts # Coordinate validation helpers
│ ├── execute.ts # Main agent loop
│ ├── describe.ts # Screen description
│ ├── progress.ts # Progress tracking
│ ├── utils.ts # Utilities (sleep, generateTaskId)
│ └── actions/ # Action handler registry (16 handlers)
├── cua-client.ts # CUA Cloud API client
└── tool-schemas.ts # MCP tool definitions
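As a rough idea of how an action handler registry and coordinate validation might fit together, here is a minimal sketch. Every name in it (`ActionHandler`, `validateCoordinates`, `dispatch`, the handler bodies) is hypothetical and is not taken from `lib/agent/actions/` or `validation.ts`; only two of the 16 handlers are imagined.

```typescript
// Hypothetical shapes for illustration only.
interface Screen {
  width: number;
  height: number;
}

interface ActionInput {
  x?: number;
  y?: number;
  text?: string;
}

type ActionHandler = (input: ActionInput, screen: Screen) => string;

// Reject missing, non-integer, or off-screen coordinates before acting on them.
function validateCoordinates(
  x: number | undefined,
  y: number | undefined,
  screen: Screen,
): [number, number] {
  if (x === undefined || y === undefined || !Number.isInteger(x) || !Number.isInteger(y)) {
    throw new Error("Coordinates must be integers");
  }
  if (x < 0 || y < 0 || x >= screen.width || y >= screen.height) {
    throw new Error(`Coordinates (${x}, ${y}) are outside ${screen.width}x${screen.height}`);
  }
  return [x, y];
}

// Registry: action name -> handler. Each handler returns a text description of what it did.
const handlers: Record<string, ActionHandler> = {
  click: (input, screen) => {
    const [x, y] = validateCoordinates(input.x, input.y, screen);
    return `click at (${x}, ${y})`;
  },
  type: (input) => `type "${input.text ?? ""}"`,
};

function dispatch(name: string, input: ActionInput, screen: Screen): string {
  const handler = handlers[name];
  if (!handler) throw new Error(`Unknown action: ${name}`);
  return handler(input, screen);
}
```

A lookup table like this keeps the main loop free of per-action branching; adding an action means adding one entry.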
Available Tools (9 total)
Sandbox Management (5 tools)
| Tool | Description |
|---|---|
| list_sandboxes | List all CUA cloud sandboxes with their current status |
| get_sandbox | Get details of a specific sandbox including API URLs |
| start_sandbox | Start a stopped sandbox |
| stop_sandbox | Stop a running sandbox |
| restart_sandbox | Restart a sandbox |
Note: Create and delete sandboxes via the CUA Dashboard - the Cloud API doesn't expose these operations.
Agentic Tools (4 tools)
| Tool | Description |
|---|---|
| describe_screen | Get a text description of current screen state using vision AI. No actions taken. |
| run_task | Execute a computer task autonomously. Returns immediately with task_id for polling. |
| get_task_progress | Poll progress of running tasks. Returns current step, last action, and reasoning. |
| get_task_history | Retrieve results of a previously executed task by ID. |
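Because run_task returns a task_id right away, a caller is expected to poll. The sketch below shows one way that flow could look. It is only an illustration under assumptions: `callTool` is a hypothetical stand-in for whatever MCP client invokes the tools, the `status` field and its values are invented for the example, and the argument name `task_id` is assumed (any sandbox argument is omitted).

```typescript
// Hypothetical progress shape; the real response fields may differ.
interface TaskProgress {
  status: "running" | "completed" | "failed"; // assumed field
  current_step?: number;
  last_action?: string;
}

// Hypothetical stand-in for an MCP client's tool invocation.
type CallTool = (name: string, args: Record<string, unknown>) => Promise<unknown>;

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function runAndWait(
  callTool: CallTool,
  task: string,
  intervalMs = 2000, // assumed polling interval
  maxPolls = 150, // assumed upper bound on polls
): Promise<unknown> {
  // run_task returns immediately with an identifier for the background task.
  const started = (await callTool("run_task", { task })) as { task_id: string };

  for (let i = 0; i < maxPolls; i++) {
    const progress = (await callTool("get_task_progress", {
      task_id: started.task_id,
    })) as TaskProgress;

    if (progress.status !== "running") {
      // Once the task has stopped, fetch its stored result by ID.
      return callTool("get_task_history", { task_id: started.task_id });
    }
    await sleep(intervalMs);
  }
  throw new Error(`Task ${started.task_id} still running after ${maxPolls} polls`);
}
```

In Claude Code the model performs this polling itself through tool calls; the function only makes the order of calls explicit.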
Quick Start
1. Get a CUA API Key
- Go to cua.ai/signin
- Navigate to Dashboard > API Keys > New API Key
- Copy your API key (starts with sk_cua-api01_...)
2. Configure Claude Code
Add to your ~/.claude.json:
{
  "mcpServers": {
    "cua": {
      "command": "npx",
      "args": ["-y", "mcp-remote", "https://cua-mcp-server.vercel.app/mcp"]
    }
  }
}
3. Use with Claude Code
You: "List my CUA sandboxes"
Claude: [Uses list_sandboxes tool]
You: "Start my-sandbox"
Claude: [Uses start_sandbox tool]
You: "Open Firefox and go to google.com on my-sandbox"
Claude: [Uses run_task with task="Open Firefox and navigate to google.com"]
→ Returns: { success: true, summary: "Opened Firefox, navigated to google.com", steps_taken: 4 }
You: "What's currently on the screen?"
Claude: [Uses describe_screen tool]
→ Returns: { description: "Firefox browser showing Google homepage with search box..." }
Usage Examples
Automate a Web Task
You: "On my-sandbox, open Chrome, go to github.com, and search for 'mcp server'"
Claude uses run_task:
- task: "Open Chrome browser, navigate to github.com, find the search box, type 'mcp server' and press Enter"
- Returns summary of what happened (no screenshots in your context)
Check Screen State
You: "What's on the screen right now?"
Claude uses describe_screen:
- focus: "ui" (or "text" or "full")
- Returns text description of UI elements, buttons, text content
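Under the hood, an MCP client sends a tool invocation as a JSON-RPC 2.0 request with the method `tools/call`. The sketch below builds such a request for describe_screen and checks the focus value against the three options listed above. The helper name `describeScreenRequest` and the `sandbox_name` argument key are assumptions for illustration; only `focus` and its values come from this document.

```typescript
type Focus = "ui" | "text" | "full";

interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

const FOCUS_VALUES: readonly string[] = ["ui", "text", "full"];

// Narrow an arbitrary string to one of the documented focus values.
function isFocus(value: string): value is Focus {
  return FOCUS_VALUES.indexOf(value) !== -1;
}

// `sandbox_name` is a hypothetical argument key, not confirmed by the tool schema.
function describeScreenRequest(id: number, sandboxName: string, focus: string): ToolCallRequest {
  if (!isFocus(focus)) {
    throw new Error(`focus must be one of: ${FOCUS_VALUES.join(", ")}`);
  }
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: {
      name: "describe_screen",
      arguments: { sandbox_name: sandboxName, focus },
    },
  };
}
```

Validating `focus` before sending avoids a round trip for a value the server would presumably reject.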
Ask Specific Questions
Yo
Environment Variables
- CUA_API_KEY (required): API key for authenticating with CUA Cloud services