ScreenHand MCP Server

Add it to Claude Code

Run this in a terminal.
claude mcp add screenhand -- npx -y screenhand

ScreenHand

Let AI control your desktop — click buttons, fill forms, automate workflows in ~50ms with zero extra AI calls.

An open-source MCP server for macOS and Windows. Works with Claude, Cursor, Codex CLI, and any MCP-compatible client.


The Problem

AI assistants can write code, but they struggle to operate your computer. With screenshot-based control, every click requires a screenshot, an LLM interpretation, and a coordinate guess: 3-5 seconds and one API call per action.

ScreenHand gives AI direct access to native OS APIs. No screenshots needed for clicks. No AI calls for button presses.

| | Without ScreenHand | With ScreenHand |
|---|---|---|
| Click a button | Screenshot → LLM → coordinate click (~3-5s) | Native Accessibility API (~50ms) |
| Cost per action | 1 LLM API call | 0 LLM calls |
| Accuracy | Coordinate guessing; misses on layout shift | Exact element targeting by role/name |
| Browser control | Needs focus, screenshot per action | CDP in background (~10ms), no focus needed |
| Works across apps | One app at a time | Cross-app workflows, multi-agent coordination |

Quick Start

1. Add to your AI client (one step)

Claude Code (recommended)
claude mcp add screenhand -- npx -y screenhand

Done. That's it.

Claude Desktop

Add to ~/Library/Application Support/Claude/claude_desktop_config.json (macOS) or %APPDATA%\Claude\claude_desktop_config.json (Windows):

{
  "mcpServers": {
    "screenhand": {
      "command": "npx",
      "args": ["-y", "screenhand"]
    }
  }
}
Cursor

Add to .cursor/mcp.json:

{
  "mcpServers": {
    "screenhand": {
      "command": "npx",
      "args": ["-y", "screenhand"]
    }
  }
}
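
If one of these JSON files already lists other servers, pasting the snippet over it would drop them. The sketch below shows one way to merge the entry instead. It assumes the file follows the mcpServers shape shown above; the helper names add_screenhand and update_config_file are made up for this example and are not part of ScreenHand.

```python
import copy
import json
from pathlib import Path

# Entry copied from the Claude Desktop / Cursor snippets above.
SCREENHAND_ENTRY = {"command": "npx", "args": ["-y", "screenhand"]}


def add_screenhand(config: dict) -> dict:
    """Return a copy of `config` with a screenhand entry under mcpServers."""
    merged = copy.deepcopy(config)
    servers = merged.setdefault("mcpServers", {})
    servers["screenhand"] = copy.deepcopy(SCREENHAND_ENTRY)
    return merged


def update_config_file(path: Path) -> None:
    """Read the JSON file at `path` (if it exists), merge the entry, write it back."""
    existing = json.loads(path.read_text()) if path.exists() else {}
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(add_screenhand(existing), indent=2) + "\n")


if __name__ == "__main__":
    # Preview only: merge into a sample config that already has one server.
    sample = {"mcpServers": {"other": {"command": "node", "args": ["server.js"]}}}
    print(json.dumps(add_screenhand(sample), indent=2))
```

Calling update_config_file(Path(".cursor/mcp.json")) would apply the same merge to the Cursor file; back the file up first if it matters.
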
OpenAI Codex CLI

Add to ~/.codex/config.toml:

[mcp_servers.screenhand]
command = "npx"
args = ["-y", "screenhand"]
transport = "stdio"
Any MCP Client

ScreenHand is a standard MCP server over stdio. Run with npx -y screenhand.
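
For readers writing their own client, here is a rough sketch of what "MCP over stdio" means on the wire: each message is a JSON-RPC 2.0 object written as a single newline-terminated line. The protocolVersion string and the clientInfo values below are assumptions for illustration; use whatever your client actually supports. Nothing here is specific to ScreenHand.

```python
import json


def encode_message(message: dict) -> bytes:
    """Serialize one JSON-RPC message as a single newline-terminated line."""
    # json.dumps escapes newlines inside strings, so the payload stays on one line.
    return (json.dumps(message, separators=(",", ":")) + "\n").encode("utf-8")


def initialize_request(request_id: int = 1) -> dict:
    """First message a client sends to open an MCP session."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "initialize",
        "params": {
            "protocolVersion": "2024-11-05",  # assumed version string
            "capabilities": {},
            "clientInfo": {"name": "example-client", "version": "0.0.1"},  # hypothetical
        },
    }


# Sent by the client after it receives the initialize response.
INITIALIZED_NOTIFICATION = {"jsonrpc": "2.0", "method": "notifications/initialized"}


def list_tools_request(request_id: int = 2) -> dict:
    """Ask the server which tools it exposes."""
    return {"jsonrpc": "2.0", "id": request_id, "method": "tools/list"}


if __name__ == "__main__":
    for msg in (initialize_request(), INITIALIZED_NOTIFICATION, list_tools_request()):
        print(encode_message(msg).decode("utf-8"), end="")
```

A real client would spawn npx -y screenhand as a subprocess, write these lines to its stdin, and read one JSON response per line from its stdout.
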

2. Grant permissions

macOS: System Settings > Privacy & Security > Accessibility > enable your terminal app.

Windows: No special permissions needed.

3. Browser control (optional)

On macOS, quit Chrome if it is already running (otherwise the extra arguments are ignored), then launch it with remote debugging to enable the browser tools:

open -a "Google Chrome" --args --remote-debugging-port=9222
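
To confirm that Chrome is actually listening before you use the browser tools, you can query the DevTools HTTP endpoint at /json/version. The sketch below assumes the port 9222 from the command above; the function names are made up for this example.

```python
import json
import urllib.request


def cdp_version_url(port: int = 9222, host: str = "127.0.0.1") -> str:
    """Build the URL of Chrome's DevTools version endpoint."""
    return f"http://{host}:{port}/json/version"


def cdp_version(port: int = 9222, timeout: float = 1.0):
    """Return Chrome's version info as a dict, or None if nothing usable answers."""
    try:
        with urllib.request.urlopen(cdp_version_url(port), timeout=timeout) as resp:
            return json.load(resp)
    except (OSError, ValueError):
        # OSError covers refused connections and timeouts; ValueError covers bad JSON.
        return None


if __name__ == "__main__":
    info = cdp_version()
    if info is None:
        print("No DevTools endpoint found on port 9222")
    else:
        print(info.get("Browser"), info.get("webSocketDebuggerUrl"))
```
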

That's it. Your AI client now has 111 tools for desktop automation.

Building from source (contributors only)
git clone https://github.com/manushi4/screenhand.git
cd screenhand && npm install && npm run build:native

On Windows, use npm run build:native:windows instead.


What It Does

ScreenHand gives AI agents seven capabilities:

Desktop Control — 19 tools

Click buttons, type text, read UI trees, navigate menus, drag, scroll — all via native Accessibility APIs in ~50ms. Works with any app: Finder, Notes, VS Code, Xcode, System Settings, etc.
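
To illustrate what targeting an element by role and name looks like, as opposed to guessing coordinates, here is a small sketch that searches a UI tree. The tree shape (dicts with role, name, and children keys) is an assumption made for this example and is not ScreenHand's actual output format; the role strings follow the macOS Accessibility naming style.

```python
from typing import Iterator, Optional

# Hypothetical UI tree, shaped like a simplified accessibility hierarchy.
SAMPLE_TREE = {
    "role": "AXWindow",
    "name": "Untitled",
    "children": [
        {"role": "AXGroup", "name": "Toolbar", "children": [
            {"role": "AXButton", "name": "Save", "children": []},
            {"role": "AXButton", "name": "Share", "children": []},
        ]},
        {"role": "AXTextField", "name": "Title", "children": []},
    ],
}


def walk(node: dict) -> Iterator[dict]:
    """Yield the node and all of its descendants, depth first."""
    yield node
    for child in node.get("children", []):
        yield from walk(child)


def find_element(tree: dict, role: str, name: str) -> Optional[dict]:
    """Return the first node whose role and name both match, or None."""
    for node in walk(tree):
        if node.get("role") == role and node.get("name") == name:
            return node
    return None


if __name__ == "__main__":
    print(find_element(SAMPLE_TREE, "AXButton", "Save"))
```

Because the lookup keys on role and name, it keeps working when the button moves on screen, which is the failure case for coordinate-based clicks.
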

Browser Automation — 15 tools

Full Chrome control via DevTools Protocol. Navigate, click, type, run JavaScript, fill forms — all in the background at ~10ms. Built-in anti-detection (browser_stealth, browser_human_click) for sites with bot protection.

Smart Fallbacks — 8 tools

click_with_fallback, type_with_fallback, etc. automatically try Accessibility → CDP → OCR → coordinates. You don't have to pick the right method — ScreenHand figures it out.
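
The fallback order described above can be sketched as a simple chain: try each method in turn and stop at the first one that reports success. This is an illustration of the idea only, not ScreenHand's implementation; the function first_working_method and the stub methods are hypothetical.

```python
from typing import Callable, Optional, Sequence, Tuple

# A method is a (name, callable) pair; the callable returns True on success.
Method = Tuple[str, Callable[[str], bool]]


def first_working_method(target: str, methods: Sequence[Method]) -> Optional[str]:
    """Try each method in order; return the name of the first that succeeds."""
    for name, attempt in methods:
        try:
            if attempt(target):
                return name
        except Exception:
            # An exception means this method could not handle the target; move on.
            continue
    return None


# Stub methods standing in for the real interaction layers.
def accessibility_click(target: str) -> bool:
    return False  # pretend the element is not exposed via Accessibility


def cdp_click(target: str) -> bool:
    raise RuntimeError("no browser attached")


def ocr_click(target: str) -> bool:
    return True  # pretend OCR located the label on screen


def coordinate_click(target: str) -> bool:
    return True


METHODS = [
    ("accessibility", accessibility_click),
    ("cdp", cdp_click),
    ("ocr", ocr_click),
    ("coordinates", coordinate_click),
]

if __name__ == "__main__":
    print(first_working_method("Save", METHODS))
```
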

Memory & Learning — 14 tools

Improves across sessions: it logs tool calls, saves winning strategies, and tracks error patterns along with their fixes. Needs no configuration and adds no latency to tool calls (in-memory cache, async disk writes). Ships with 12 seed strategies for common macOS workflows and six learning policies: locator stability, sensor effectiveness, recovery ranking, pattern recognition, adaptive timing, and topology (navigation edge reliability).
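
The "in-memory cache, async disk writes" pattern mentioned above can be sketched as follows: reads and writes hit a dictionary immediately, while a background thread appends records to disk. This is a generic illustration under assumed names (StrategyMemory, remember, recall, and a JSON Lines file); it does not describe ScreenHand's real storage format.

```python
import json
import queue
import tempfile
import threading
from pathlib import Path
from typing import Dict, List, Optional


class StrategyMemory:
    """Keeps strategies in memory and persists them on a background thread."""

    def __init__(self, path: Path) -> None:
        self._path = path
        self._cache: Dict[str, List[str]] = {}
        self._pending: "queue.Queue[dict]" = queue.Queue()
        threading.Thread(target=self._writer, daemon=True).start()

    def remember(self, task: str, steps: List[str]) -> None:
        self._cache[task] = steps  # visible to readers immediately
        self._pending.put({"task": task, "steps": steps})  # written to disk later

    def recall(self, task: str) -> Optional[List[str]]:
        return self._cache.get(task)

    def flush(self) -> None:
        """Block until every queued record has been written."""
        self._pending.join()

    def _writer(self) -> None:
        while True:
            record = self._pending.get()
            try:
                with self._path.open("a", encoding="utf-8") as fh:
                    fh.write(json.dumps(record) + "\n")
            finally:
                self._pending.task_done()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        memory = StrategyMemory(Path(tmp) / "strategies.jsonl")
        memory.remember("export pdf", ["File", "Export as PDF"])
        print(memory.recall("export pdf"))
        memory.flush()
```

The caller never waits on the disk: remember returns as soon as the record is queued, and flush is only needed before shutdown or in tests.
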

App Mastery Map — automatic per-app spatial understanding

Builds a persistent reverse-engineered blueprint of every app from normal tool usage. 8 features record automatically: page zones, navigation g

Tools (3)

desktop_control: Provides 19 tools to click buttons, type text, read UI trees, and navigate menus via native Accessibility APIs.
browser_automation: Provides 15 tools for full Chrome control via DevTools Protocol, including navigation, form filling, and stealth clicks.
smart_fallbacks: Provides 8 tools that automatically cycle through Accessibility, CDP, OCR, and coordinate methods for robust interaction.

Configuration

claude_desktop_config.json
{"mcpServers": {"screenhand": {"command": "npx", "args": ["-y", "screenhand"]}}}

Try it

Open the System Settings app and navigate to the Accessibility menu.
Fill out the login form on the open Chrome tab using my saved credentials.
Click the 'Save' button in the current application window.
Read the UI tree of the active window and summarize the available menu options.
Scroll down the current webpage by 500 pixels.

Frequently Asked Questions

What are the key features of ScreenHand?

Native desktop control via Accessibility APIs for macOS and Windows.
Full Chrome browser automation using the DevTools Protocol.
Smart fallback mechanisms that cycle through multiple interaction methods.
In-memory learning system that logs tool calls and saves winning strategies.
Automatic per-app spatial understanding and blueprint generation.

What can I use ScreenHand for?

Automating repetitive data entry tasks across multiple desktop applications.
Navigating complex software interfaces that lack keyboard shortcuts.
Performing background browser tasks without needing to focus the window.
Building cross-app workflows that require interaction with both native and web apps.
Creating self-improving automation scripts that learn from previous execution patterns.

How do I install ScreenHand?

Install ScreenHand by running: claude mcp add screenhand -- npx -y screenhand

What MCP clients work with ScreenHand?

ScreenHand works with any MCP-compatible client including Claude Desktop, Claude Code, Cursor, and other editors with MCP support.
