Native Windows MCP server that gives AI agents full desktop control.
Orbination AI Desktop Vision & Control
Give AI assistants eyes and hands. A native Windows MCP server that lets AI see the screen, read UI elements, click buttons, type text, and control any application — with built-in OCR, dark theme support, window occlusion detection, and batch action sequencing.
Built for Claude Code by Leia Enterprise Solutions for the Orbination project.
AI coding assistants are blind. They generate code but can never see the result. They can't compare a design mockup to a running app. They can't click through a UI to test it. This server fixes that.
What It Does
This MCP server bridges the gap between AI and your desktop. Instead of working blind with just text, the AI can:
- See — Take screenshots, run OCR on any window (auto-enhances dark themes), detect window occlusion
- Read — Detect every UI element (buttons, inputs, text, tabs, checkboxes) with exact positions via Windows UIAutomation
- Interact — Click elements by text (UIAutomation + OCR fallback), navigate menus, fill forms, type and paste text
- Navigate — Open apps, switch windows, focus tabs, navigate browser URLs
- Understand — Scan the entire desktop: window visibility %, occlusion detection, uncovered desktop regions
- Batch — Execute multi-step UI workflows in a single call with run_sequence
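On the wire, an MCP client drives these capabilities through JSON-RPC 2.0 tools/call requests. The minimal Python sketch below builds one such request for run_sequence and serializes it as a single newline-terminated line, which is what the MCP stdio transport requires. The steps argument and the per-step fields (action, text, keys, ms) are hypothetical placeholders. The server's real input schema is whatever it reports via tools/list.

```python
import json

# Hypothetical step schema: "steps", "action", "text", "keys" and "ms" are
# illustrative assumptions, not the server's documented input format.
steps = [
    {"action": "focus", "text": "Notepad"},
    {"action": "hotkey", "keys": "ctrl+n"},
    {"action": "type", "text": "Hello from the agent"},
    {"action": "wait", "ms": 500},
]


def make_tools_call(request_id: int, tool: str, arguments: dict) -> str:
    """Build one MCP tools/call request as a single line of JSON-RPC 2.0.

    The stdio transport delimits messages with newlines, so the JSON body
    must not contain embedded newlines; json.dumps without indent never
    emits them (newlines inside strings are escaped as \\n).
    """
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return json.dumps(message) + "\n"


line = make_tools_call(1, "run_sequence", {"steps": steps})
print(line, end="")
```

A real client must complete MCP's initialize handshake before sending any tools/call request.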
What's New in v2.0
- Window Occlusion Detection — Grid-based analysis showing which windows are truly visible (visibility %) and which are hidden behind others
- Desktop Region Detection — Flood-fill algorithm to find uncovered screen areas
- Shared OcrService — Centralized OCR with automatic dark theme enhancement (invert + contrast boost), applied in a single OCR pass instead of two
- PrintWindow API — Capture window content even when obscured by other windows
- click_element OCR Fallback — UIAutomation first, then OCR for dark themes, web apps, and iframes
- run_sequence — Batch multiple UI actions (click, type, paste, hotkey, wait, focus, OCR click) in a single MCP call
- click_menu_item — Navigate parent > child menus with smooth mouse movement to keep submenus open
- DPI Awareness — Per-monitor DPI for correct coordinates on multi-monitor setups with mixed scaling
- Embedded AI Instructions — Server sends tool usage guidelines on MCP connection, teaching AI to prefer OCR over screenshots
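The dark-theme step of the shared OcrService can be approximated as follows. Detect a dark capture by its mean luminance, invert it, then stretch its contrast. This produces one enhanced image, so OCR runs once. This is a minimal pure-Python sketch on a grayscale pixel grid, not the server's .NET code. The function name enhance_for_ocr, the threshold of 128, and the min/max linear stretch are assumptions, not the server's documented parameters.

```python
def enhance_for_ocr(gray: list[list[int]], dark_threshold: float = 128.0) -> list[list[int]]:
    """Return an OCR-friendly copy of a grayscale image (rows of 0..255 values).

    Hypothetical helper: threshold and stretch method are illustrative choices.
    """
    pixels = [p for row in gray for p in row]
    if sum(pixels) / len(pixels) < dark_threshold:
        # Dark theme detected: invert so light-on-dark text becomes dark-on-light.
        gray = [[255 - p for p in row] for row in gray]
        pixels = [255 - p for p in pixels]
    lo, hi = min(pixels), max(pixels)
    if hi == lo:
        return [row[:] for row in gray]  # flat image: nothing to stretch
    # Linear contrast stretch: map [lo, hi] onto the full [0, 255] range.
    scale = 255 / (hi - lo)
    return [[round((p - lo) * scale) for p in row] for row in gray]


# Light text on a dark background, as in a dark-themed editor.
dark_capture = [[20, 20, 200], [20, 30, 20]]
print(enhance_for_ocr(dark_capture))
```

Deciding once, up front, whether to invert is what keeps this to a single OCR call. The alternative of running OCR on both the original and an inverted copy doubles the work.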
Architecture
AI Client (Claude Code / Claude Desktop)
      │
      │ MCP / stdio
      ▼
MCP Server (ServerInstructions)
      │
      ▼
Mouse Tools · Keyboard Tools · Screen Tools · Vision Tools · Composite Tools
      │
      ▼
Win32 Native · UIAutomation · OcrService (dark theme)
      │
      ▼
DesktopScanner (occlusion, regions) · NativeInput (SendInput, clipboard)
      │
      ▼
Windows OS (Desktop, Windows, Apps)
Single native .NET 8 executable. No Python. No Node.js. No browser drivers. Direct Windows API access.
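The DesktopScanner's grid-based occlusion analysis and flood-fill region detection can be sketched in a few dozen lines. The sketch assumes each window is an axis-aligned rectangle and that the list is in z-order, topmost first. Each grid cell is sampled at its center and credited to the first window that contains it. A window's visibility % is then the share of its cells where it is on top. A breadth-first flood fill groups the cells no window covers into desktop regions. The function name scan_desktop, the 10-pixel cell size, and the result shapes are illustrative assumptions. This is a Python sketch, not the server's .NET implementation.

```python
from collections import deque

Rect = tuple[int, int, int, int]  # (left, top, right, bottom) in screen pixels


def scan_desktop(windows: list[tuple[str, Rect]], screen_w: int, screen_h: int, cell: int = 10):
    """Hypothetical grid scan: returns (visibility % per window, uncovered regions).

    windows must be in z-order, topmost first, with unique names.
    """
    cols, rows = screen_w // cell, screen_h // cell
    owner = [[None] * cols for _ in range(rows)]  # topmost window per cell
    covered = {name: 0 for name, _ in windows}    # cells inside each window
    on_top = {name: 0 for name, _ in windows}     # cells where it is topmost

    for r in range(rows):
        for c in range(cols):
            x, y = c * cell + cell / 2, r * cell + cell / 2  # sample the cell center
            for name, (left, top, right, bottom) in windows:
                if left <= x < right and top <= y < bottom:
                    covered[name] += 1
                    if owner[r][c] is None:  # first hit in z-order wins the cell
                        owner[r][c] = name
                        on_top[name] += 1

    visibility = {n: 100.0 * on_top[n] / covered[n] if covered[n] else 0.0 for n in covered}

    # Flood-fill (BFS, 4-connected) the cells no window covers into regions.
    regions = []
    seen = [[False] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if owner[r][c] is not None or seen[r][c]:
                continue
            seen[r][c] = True
            queue = deque([(r, c)])
            count, min_r, max_r, min_c, max_c = 0, r, r, c, c
            while queue:
                cr, cc = queue.popleft()
                count += 1
                min_r, max_r = min(min_r, cr), max(max_r, cr)
                min_c, max_c = min(min_c, cc), max(max_c, cc)
                for nr, nc in ((cr + 1, cc), (cr - 1, cc), (cr, cc + 1), (cr, cc - 1)):
                    if 0 <= nr < rows and 0 <= nc < cols and not seen[nr][nc] and owner[nr][nc] is None:
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            # Pixel bounding box of the region: (left, top, right, bottom).
            bounds = (min_c * cell, min_r * cell, (max_c + 1) * cell, (max_r + 1) * cell)
            regions.append({"cells": count, "bounds": bounds})
    return visibility, regions
```

Consider a 100×100 screen with Editor at (0, 0, 60, 100) stacked above Browser at (40, 0, 100, 50). Editor is 100% visible. Browser is about 66.7% visible, since 20 of its 30 cells are on top. The uncovered area is one 20-cell region bounded by (60, 50, 100, 100).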
Requirements
- Windows 10/11
- .NET 8 SDK
Build
cd DesktopControlMcp
dotnet build -c Release
Or publish as a single file:
dotnet publish -c Release -r win-x64 --self-contained false -p:PublishSingleFile=true
Setup with Claude Code
Add the MCP server to your Claude Code configuration.
Tools (3)
- run_sequence: Execute multiple UI actions in a single call.
- click_element: Click a UI element identified by text or OCR.
- click_menu_item: Navigate and click items in application menus.
Configuration
{"mcpServers": {"orbination": {"command": "path/to/DesktopControlMcp.exe"}}}
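When you replace path/to/DesktopControlMcp.exe with a real Windows path, every backslash must be escaped in JSON (C:\\tools\\... rather than C:\tools\...). Generating the file with a JSON serializer avoids hand-escaping mistakes. Below is a minimal sketch in which the install path is a hypothetical example. Where the resulting JSON is saved depends on your MCP client.

```python
import json

# Hypothetical install location: replace with the path to your own build output.
exe_path = r"C:\tools\DesktopControlMcp\DesktopControlMcp.exe"

config = {"mcpServers": {"orbination": {"command": exe_path}}}

# json.dumps escapes each backslash as \\, producing valid JSON for Windows paths.
text = json.dumps(config, indent=2)
print(text)
```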