About
An MCP and HTTP server wrapper for PyAutoGUI, enabling LLMs to control your mouse and keyboard.
Architecture
The service supports two deployment architectures:
Architecture 1: LLM -> MCP -> TOOL (Remote Tool Service)
This architecture separates the MCP server from the tool service, allowing the MCP server to connect to a remote tool service via HTTP.
graph LR
LLM[LLM Client] -->|MCP Protocol| MCP[MCP Server<br/>main.py<br/>Client-based Tools]
MCP -->|HTTP API<br/>with API Key| TOOLA[Tool Service<br/>tool.py<br/>HTTP API Server]
MCP -->|HTTP API<br/>with API Key| TOOLB[Tool Service<br/>tool.py<br/>HTTP API Server]
TOOLA -->|PyAutoGUI| COMPUTERA[Computer Control]
TOOLB -->|PyAutoGUI| COMPUTERB[Computer Control]
style LLM fill:#e1f5ff
style MCP fill:#fff4e1
style TOOLA fill:#ffe1f5
style COMPUTERA fill:#e1ffe1
style COMPUTERB fill:#e1ffe1
Characteristics:
- MCP server uses client-based tools (register_computer_tools_with_client)
- MCP server forwards requests to the remote tool service via HTTP
- Tool service performs the actual computer control operations
- Suitable for distributed deployments where the MCP server and tool service run on different machines
- Requires an endpoint parameter in MCP tool calls
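As a rough illustration of the characteristics above, the sketch below shows how a client-based tool might take an endpoint argument and forward the call to whichever tool service that endpoint names. It is not the project's implementation: make_remote_click_tool, the "click" action name, the x/y payload fields, and the node-a/node-b hosts are all assumptions made for the example, and the transport is a stand-in that records calls instead of sending HTTP requests.

```python
from typing import Any, Callable

# Hypothetical transport signature: (url, payload) -> response dict.
Send = Callable[[str, dict[str, Any]], dict[str, Any]]


def make_remote_click_tool(send: Send) -> Callable[..., dict[str, Any]]:
    """Build a client-based tool that forwards clicks to a remote tool service."""

    def click(endpoint: str, x: int, y: int) -> dict[str, Any]:
        # `endpoint` selects which tool service (and so which machine) acts.
        # "click" is an assumed action name; the payload fields are assumed too.
        url = f"{endpoint.rstrip('/')}/api/computer/click"
        return send(url, {"x": x, "y": y})

    return click


# Stand-in transport that records calls instead of performing HTTP requests.
calls: list[tuple[str, dict[str, Any]]] = []


def fake_send(url: str, payload: dict[str, Any]) -> dict[str, Any]:
    calls.append((url, payload))
    return {"ok": True}


click = make_remote_click_tool(fake_send)
click("http://node-a:8000", 10, 20)   # hypothetical host for one tool service
click("http://node-b:8000/", 30, 40)  # hypothetical host for a second one
```

Because the endpoint is passed per call, a single MCP server can drive several machines, which matches the two tool services shown in the diagram.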
Architecture 2: LLM -> MCP (Direct Tools)
In this architecture, the MCP server performs computer control operations itself, using direct tools.
graph LR
LLM[LLM Client] -->|MCP Protocol<br/>stdio/http| MCP[MCP Server<br/>mcp_local.py<br/>Direct Tools]
MCP -->|PyAutoGUI| COMPUTER[Computer Control]
style LLM fill:#e1f5ff
style MCP fill:#fff4e1
style COMPUTER fill:#e1ffe1
Characteristics:
- MCP server uses direct tools (register_computer_tools)
- MCP server directly executes computer control operations
- No separate tool service required
- Suitable for local deployments where everything runs on the same machine
- No endpoint parameter needed in MCP tool calls
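For contrast with the remote setup, here is a minimal sketch of what a direct tool could look like: the action runs in-process and no endpoint argument is involved. run_direct_tool and RecordingBackend are hypothetical names, and the action and parameter names are assumptions. The backend mirrors three real PyAutoGUI function names (moveTo, click, write) but only records calls, so the sketch runs without a display.

```python
from typing import Any


class RecordingBackend:
    """Stand-in exposing a few PyAutoGUI-style functions; records calls only."""

    def __init__(self) -> None:
        self.log: list[tuple[str, tuple[Any, ...]]] = []

    def moveTo(self, x: int, y: int) -> None:
        self.log.append(("moveTo", (x, y)))

    def click(self, x: int, y: int) -> None:
        self.log.append(("click", (x, y)))

    def write(self, text: str) -> None:
        self.log.append(("write", (text,)))


def run_direct_tool(backend: Any, action: str, **params: Any) -> str:
    """Execute an action in-process; no `endpoint` argument is involved."""
    # Action names and parameter names here are illustrative assumptions.
    if action == "move":
        backend.moveTo(params["x"], params["y"])
    elif action == "click":
        backend.click(params["x"], params["y"])
    elif action == "type":
        backend.write(params["text"])
    else:
        raise ValueError(f"unknown action: {action}")
    return f"{action} done"


backend = RecordingBackend()  # on a desktop session, the pyautogui module could be passed instead
result = run_direct_tool(backend, "click", x=100, y=200)
```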
Features
- Dual Protocol Support: HTTP REST API and MCP (Model Context Protocol)
- API Key Authentication: Optional API key authentication for service-to-service communication
- Multiple MCP Transports: Support for both HTTP and stdio (Standard Input/Output) transport modes
- Mouse Control: Move, click, drag, scroll operations
- Keyboard Control: Press keys, type text, key combinations
- Screenshot: Capture screen and get base64-encoded images
- Screen Info: Get cursor position and screen resolution
- Configuration Management: Pydantic Settings with environment variable support
- Auto Documentation: Swagger UI for HTTP API
- Flexible Deployment: Run HTTP server or MCP server independently
- Request Tracing: Request ID middleware for request tracking
- Structured Logging: Loguru-based logging with request ID integration
- Remote MCP Support: Optional HTTP client for remote tool server integration
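Since screenshots come back base64-encoded, a caller needs to decode them before saving or displaying the image. The sketch below shows one way to do that with the standard library. It assumes the image is a PNG, which this README does not state, and decode_screenshot is a hypothetical helper name. The sample input is fabricated filler, not a real screenshot.

```python
import base64

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def decode_screenshot(image_b64: str) -> bytes:
    """Decode a base64 string into raw image bytes, checking for a PNG header."""
    raw = base64.b64decode(image_b64, validate=True)
    if not raw.startswith(PNG_SIGNATURE):
        # Assumes PNG output; adjust if the service returns another format.
        raise ValueError("payload is not a PNG image")
    return raw


# Fabricated sample: only the PNG signature plus filler, not a real screenshot.
sample_b64 = base64.b64encode(PNG_SIGNATURE + b"demo").decode("ascii")
image_bytes = decode_screenshot(sample_b64)
```

The decoded bytes can then be written to a file opened in binary mode.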
Quick Start
Prerequisites
- Python >= 3.12
- uv package manager (recommended)
Installation
- Clone the repository:
git clone https://github.com/stonehill-2345/mcp-autogui-multinode.git
cd mcp-autogui-multinode
- Install dependencies based on your deployment scenario:
Local Full Development
For local development with all features (GUI control + testing):
uv sync --group gui --group dev
Deploy MCP Server Only
To deploy only the MCP server, which connects to a remote tool service (no GUI dependencies needed):
uv sync --no-group gui
Deploy Tool Service Only
To deploy only the HTTP tool service, which performs the actual computer control (requires GUI):
uv sync --group gui
Running the Service
The service supports two independent servers:
1. Run Tool Service (HTTP API)
Starts the HTTP API server for computer control:
uv run python tool.py
2. Run MCP Server
Starts the MCP server that can connect to remote tool services. The server supports two transport modes:
HTTP Transport Mode:
uv run python mcp_local.py http
stdio Transport Mode (default):
uv run python mcp_local.py stdio
After starting, you can access:
- HTTP API Documentation: http://localhost:8000/docs
- Health Check: http://localhost:8000/health
- MCP Endpoint: http://localhost:8001/mcp (if using HTTP transport)
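To confirm from a script that the tool service is up, the health check URL listed above can be polled. The following is a small sketch using only the standard library. is_healthy is a hypothetical helper, and it assumes the endpoint answers with HTTP 200 when the service is healthy. The opener parameter exists only so the function can be exercised without a running server.

```python
import urllib.error
import urllib.request
from typing import Any, Callable


def is_healthy(base_url: str, timeout: float = 2.0,
               opener: Callable[..., Any] = urllib.request.urlopen) -> bool:
    """Return True when GET {base_url}/health answers with HTTP 200."""
    url = f"{base_url.rstrip('/')}/health"
    try:
        with opener(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Covers refused connections, timeouts, and error statuses (HTTPError).
        return False
```

With the tool service running locally, `is_healthy("http://localhost:8000")` would be the call to try.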
API Endpoints
Base Endpoints
- GET / - Root path, returns API information
- GET /health - Health check endpoint
Computer Control Endpoints
All computer control actions are available at:
- POST /api/computer/{action} - Execute a computer control action
Tools
- computer_control - Executes mouse and keyboard operations including clicking, typing, and screen interaction.
Environment Variables
- API_KEY - Optional API key for service-to-service authentication