# ScreenHand
Let AI control your desktop — click buttons, fill forms, automate workflows in ~50ms with zero extra AI calls.
An open-source MCP server for macOS and Windows. Works with Claude, Cursor, Codex CLI, and any MCP-compatible client.
Quick Start | What It Does | Example | All 111 Tools | Architecture | Website
## The Problem
AI assistants can write code but can't use your computer. Every click requires a screenshot → LLM interpretation → coordinate guess — 3-5 seconds and an API call per action.
ScreenHand gives AI direct access to native OS APIs. No screenshots needed for clicks. No AI calls for button presses.
|  | Without ScreenHand | With ScreenHand |
|---|---|---|
| Click a button | Screenshot → LLM → coordinate click (~3-5s) | Native Accessibility API (~50ms) |
| Cost per action | 1 LLM API call | 0 LLM calls |
| Accuracy | Coordinate guessing — misses on layout shift | Exact element targeting by role/name |
| Browser control | Needs focus, screenshot per action | CDP in background (~10ms), no focus needed |
| Works across apps | One app at a time | Cross-app workflows, multi-agent coordination |
## Quick Start

### 1. Add to your AI client (one step)

**Claude Code** (recommended)

```sh
claude mcp add screenhand -- npx -y screenhand
```

Done. That's it.
**Claude Desktop**

Add to `~/Library/Application Support/Claude/claude_desktop_config.json`:

```json
{
  "mcpServers": {
    "screenhand": {
      "command": "npx",
      "args": ["-y", "screenhand"]
    }
  }
}
```
**Cursor**

Add to `.cursor/mcp.json`:

```json
{
  "mcpServers": {
    "screenhand": {
      "command": "npx",
      "args": ["-y", "screenhand"]
    }
  }
}
```
**OpenAI Codex CLI**

Add to `~/.codex/config.toml`:

```toml
[mcp.screenhand]
command = "npx"
args = ["-y", "screenhand"]
transport = "stdio"
```
**Any MCP Client**

ScreenHand is a standard MCP server over stdio. Run it with `npx -y screenhand`.
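Because the transport is plain stdio, any client can drive the server with newline-delimited JSON-RPC. A minimal sketch of the `initialize` request a client writes to the server's stdin, assuming the 2024-11-05 MCP protocol revision (the client name and version are placeholders):

```typescript
// Sketch of the first JSON-RPC message an MCP client sends over stdio.
// protocolVersion follows the 2024-11-05 MCP spec revision; clientInfo
// values here are illustrative.
interface InitializeRequest {
  jsonrpc: "2.0";
  id: number;
  method: "initialize";
  params: {
    protocolVersion: string;
    capabilities: Record<string, unknown>;
    clientInfo: { name: string; version: string };
  };
}

function buildInitialize(id: number): string {
  const req: InitializeRequest = {
    jsonrpc: "2.0",
    id,
    method: "initialize",
    params: {
      protocolVersion: "2024-11-05",
      capabilities: {},
      clientInfo: { name: "example-client", version: "0.1.0" },
    },
  };
  // MCP stdio transport frames messages as newline-delimited JSON.
  return JSON.stringify(req) + "\n";
}
```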
### 2. Grant permissions

**macOS:** System Settings > Privacy & Security > Accessibility > enable your terminal app.

**Windows:** No special permissions needed.

### 3. Browser control (optional)

Launch Chrome with remote debugging to enable the browser tools:

```sh
open -a "Google Chrome" --args --remote-debugging-port=9222
```

(On Windows, start `chrome.exe` with the same `--remote-debugging-port=9222` flag.)
That's it. Your AI client now has 111 tools for desktop automation.
## Building from source (contributors only)

```sh
git clone https://github.com/manushi4/screenhand.git
cd screenhand && npm install && npm run build:native
```

On Windows, use `npm run build:native:windows` instead.
## What It Does
ScreenHand gives AI agents seven capabilities:
### Desktop Control — 19 tools
Click buttons, type text, read UI trees, navigate menus, drag, scroll — all via native Accessibility APIs in ~50ms. Works with any app: Finder, Notes, VS Code, Xcode, System Settings, etc.
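To see why role/name targeting beats coordinate guessing, here is an illustrative sketch (not ScreenHand's actual API): a UI tree is searched for an exact role + name match, so the element is found even after a layout shift. The `AXButton`/`AXWindow` roles mirror macOS Accessibility conventions; the dialog tree is hypothetical.

```typescript
// Illustrative sketch of element targeting by role/name over a UI tree.
interface UINode {
  role: string;          // e.g. "AXButton" on macOS
  name: string;          // accessible label
  children: UINode[];
}

// Depth-first search for an exact role + name match.
function findElement(root: UINode, role: string, name: string): UINode | null {
  if (root.role === role && root.name === name) return root;
  for (const child of root.children) {
    const hit = findElement(child, role, name);
    if (hit) return hit;
  }
  return null;
}

// Hypothetical tree for a simple dialog:
const dialog: UINode = {
  role: "AXWindow",
  name: "Save Changes",
  children: [
    { role: "AXButton", name: "Cancel", children: [] },
    { role: "AXButton", name: "Save", children: [] },
  ],
};
```

Moving the buttons around changes their coordinates but not their role/name pair, so the lookup stays stable.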
### Browser Automation — 15 tools
Full Chrome control via DevTools Protocol. Navigate, click, type, run JavaScript, fill forms — all in the background at ~10ms. Built-in anti-detection (browser_stealth, browser_human_click) for sites with bot protection.
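Under the hood, DevTools Protocol traffic is just JSON commands over a WebSocket. A sketch of the message behind a navigate action — `Page.navigate` is a real CDP method; the helper around it is illustrative, and the WebSocket plumbing is omitted:

```typescript
// Sketch of a raw Chrome DevTools Protocol command.
// CDP commands are JSON objects with an id, a method, and params,
// sent over the browser's debugging WebSocket.
let nextId = 0;

function cdpCommand(method: string, params: Record<string, unknown>): string {
  return JSON.stringify({ id: ++nextId, method, params });
}

// What a background navigation reduces to on the wire:
const navigate = cdpCommand("Page.navigate", { url: "https://example.com" });
```

Because the command travels over a WebSocket rather than synthesized input events, the browser window never needs focus.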
### Smart Fallbacks — 8 tools

`click_with_fallback`, `type_with_fallback`, and the other fallback tools automatically try Accessibility → CDP → OCR → coordinates. You don't have to pick the right method — ScreenHand figures it out.
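The fallback pattern these tools describe can be sketched as a chain that stops at the first method that succeeds. The backends below are stubs standing in for the real ones; the names mirror the README but the implementation is illustrative:

```typescript
// Minimal sketch of the fallback chain: try each click method in order
// and return the name of the first one that succeeds.
type ClickMethod = (target: string) => boolean;

function clickWithFallback(
  target: string,
  methods: [string, ClickMethod][],
): string {
  for (const [name, method] of methods) {
    if (method(target)) return name; // first success wins
  }
  throw new Error(`all methods failed for ${target}`);
}

// Stubs standing in for the real backends:
const accessibility: ClickMethod = () => false; // element not exposed
const cdp: ClickMethod = () => true;            // found via DevTools
const ocr: ClickMethod = () => true;
const coordinates: ClickMethod = () => true;

const used = clickWithFallback("Submit", [
  ["accessibility", accessibility],
  ["cdp", cdp],
  ["ocr", ocr],
  ["coordinates", coordinates],
]);
// used === "cdp": the chain stopped at the first method that worked
```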
### Memory & Learning — 14 tools
Gets smarter every session. Logs tool calls, saves winning strategies, tracks error patterns with fixes. Zero config, zero latency overhead (in-memory cache, async disk writes). Ships with 12 seed strategies for common macOS workflows. 6 learning policies: locator stability, sensor effectiveness, recovery ranking, pattern recognition, adaptive timing, and topology (navigation edge reliability).
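A simplified sketch of the "saves winning strategies" idea: an in-memory tracker that records outcomes per strategy and ranks them by success rate. Persistence (the async disk writes) is omitted, and these class and method names are illustrative, not ScreenHand's internals:

```typescript
// Sketch of strategy memory: record win/loss per strategy in memory,
// then pick the strategy with the best observed success rate.
class StrategyMemory {
  private stats = new Map<string, { wins: number; tries: number }>();

  record(strategy: string, success: boolean): void {
    const s = this.stats.get(strategy) ?? { wins: 0, tries: 0 };
    s.tries++;
    if (success) s.wins++;
    this.stats.set(strategy, s);
  }

  best(): string | null {
    let top: string | null = null;
    let topRate = -1;
    this.stats.forEach((s, name) => {
      const rate = s.wins / s.tries;
      if (rate > topRate) {
        top = name;
        topRate = rate;
      }
    });
    return top;
  }
}

const mem = new StrategyMemory();
mem.record("accessibility", false);
mem.record("accessibility", true); // 1/2 success rate
mem.record("cdp", true);           // 1/1 success rate
// mem.best() === "cdp"
```

The real system layers recovery ranking, adaptive timing, and the other learning policies on top of this kind of bookkeeping.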
### App Mastery Map — automatic per-app spatial understanding
Builds a persistent reverse-engineered blueprint of every app from normal tool usage. 8 features record automatically: page zones, navigation g