Orbination AI Desktop Vision & Control MCP Server

Local setup required. This server must be cloned and prepared on your machine before you register it in Claude Code.
1. Set the server up locally

Run this once to clone and prepare the server before adding it to Claude Code.

Run in terminal
git clone https://github.com/amichail-1/Orbination-AI-Desktop-Vision-Control
cd Orbination-AI-Desktop-Vision-Control

Then follow the repository README for any remaining dependency or build steps before continuing.

2. Register it in Claude Code

After the local setup is done, run this command to point Claude Code at the built server.

Run in terminal
claude mcp add orbination-desktop-control -- node "<FULL_PATH_TO_ORBINATION_AI_DESKTOP_VISION_CONTROL>/dist/index.js"

Replace <FULL_PATH_TO_ORBINATION_AI_DESKTOP_VISION_CONTROL> with the absolute path to the folder you prepared in step 1.

README.md

Native Windows MCP server that gives AI agents full desktop control.

Orbination AI Desktop Vision & Control

Give AI assistants eyes and hands. A native Windows MCP server that lets AI see the screen, read UI elements, click buttons, type text, and control any application — with built-in OCR, dark theme support, window occlusion detection, and batch action sequencing.

Built for Claude Code by Leia Enterprise Solutions for the Orbination project.

AI coding assistants are blind. They generate code but can never see the result. They can't compare a design mockup to a running app. They can't click through a UI to test it. This server fixes that.

What It Does

This MCP server bridges the gap between AI and your desktop. Instead of working blind with just text, the AI can:

  • See — Take screenshots, run OCR on any window (auto-enhances dark themes), detect window occlusion
  • Read — Detect every UI element (buttons, inputs, text, tabs, checkboxes) with exact positions via Windows UIAutomation
  • Interact — Click elements by text (UIAutomation + OCR fallback), navigate menus, fill forms, type and paste text
  • Navigate — Open apps, switch windows, focus tabs, navigate browser URLs
  • Understand — Scan the entire desktop: window visibility %, occlusion detection, uncovered desktop regions
  • Batch — Execute multi-step UI workflows in a single call with run_sequence
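
As an illustration of what a batched call might carry, a run_sequence request could look roughly like this (the action names and field layout below are assumptions for illustration; the authoritative schema is whatever the server advertises over MCP):

```json
{
  "name": "run_sequence",
  "arguments": {
    "steps": [
      { "action": "focus",  "window": "Untitled - Notepad" },
      { "action": "click",  "text": "File" },
      { "action": "type",   "text": "quarterly-report" },
      { "action": "hotkey", "keys": "ctrl+s" },
      { "action": "wait",   "ms": 500 }
    ]
  }
}
```

Batching like this saves one MCP round trip per UI action, which matters when each individual step is cheap but latency-bound.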

What's New in v2.0

  • Window Occlusion Detection — Grid-based analysis showing which windows are truly visible (visibility %) and which are hidden behind others
  • Desktop Region Detection — Flood-fill algorithm to find uncovered screen areas
  • Shared OcrService — Centralized OCR with automatic dark theme enhancement (invert + contrast boost) applied in a single pass instead of two
  • PrintWindow API — Capture window content even when obscured by other windows
  • click_element OCR Fallback — UIAutomation first, then OCR for dark themes, web apps, iframes
  • run_sequence — Batch multiple UI actions (click, type, paste, hotkey, wait, focus, OCR click) in a single MCP call
  • click_menu_item — Navigate parent > child menus with smooth mouse movement to keep submenus open
  • DPI Awareness — Per-monitor DPI for correct coordinates on multi-monitor setups with mixed scaling
  • Embedded AI Instructions — Server sends tool usage guidelines on MCP connection, teaching AI to prefer OCR over screenshots

Architecture

AI Client (Claude Code / Claude Desktop)
         │
         │  MCP / stdio
         ▼
    ┌─────────────────────────────┐
    │       MCP Server            │
    │   (ServerInstructions)      │
    └─────────┬───────────────────┘
              │
    ┌─────────┼──────────────────────────────────────┐
    │         │         │          │          │       │
    ▼         ▼         ▼          ▼          ▼       │
 Mouse    Keyboard   Screen    Vision    Composite   │
 Tools     Tools     Tools     Tools      Tools      │
                       │          │          │       │
              ┌────────┼──────────┼──────────┘       │
              ▼        ▼          ▼                  │
          Win32     UIAuto-    OcrService            │
          Native    mation     (dark theme)          │
              │        │                             │
              ▼        ▼                             │
         DesktopScanner    NativeInput               │
         (occlusion,       (SendInput,               │
          regions)          clipboard)               │
              │               │                      │
              └───────┬───────┘                      │
                      ▼                              │
               Windows OS                            │
               (Desktop, Windows, Apps)              │
    └────────────────────────────────────────────────┘

Single native .NET 8 executable. No Python. No Node.js. No browser drivers. Direct Windows API access.

Requirements

Build

cd DesktopControlMcp
dotnet build -c Release

Or publish as a single file:

dotnet publish -c Release -r win-x64 --self-contained false

Setup with Claude Code

Add the MCP server to your Claude Code configuration.

Tools (3)

  • run_sequence — Execute multiple UI actions in a single call.
  • click_element — Click a UI element identified by text or OCR.
  • click_menu_item — Navigate and click items in application menus.
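
A hypothetical click_element call, sketched against the description above (the field names here are illustrative, not taken from the server's published schema), might look like:

```json
{
  "name": "click_element",
  "arguments": {
    "text": "Save",
    "window": "Settings"
  }
}
```

Per the feature list, the server would try UIAutomation first and fall back to OCR-based matching for dark themes, web apps, and iframes.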

Configuration

claude_desktop_config.json
{"mcpServers": {"orbination": {"command": "path/to/DesktopControlMcp.exe"}}}

Try it

Find the 'Save' button in the current window and click it.
Navigate to the File menu and select 'Export' then 'PDF'.
Perform a sequence: click the search bar, type 'Project Alpha', and press Enter.
Analyze the current screen and tell me which windows are visible.

Frequently Asked Questions

What are the key features of Orbination AI Desktop Vision & Control?

Native Windows UIAutomation and OCR integration. Window occlusion detection and visibility analysis. Batch action sequencing for complex workflows. Automatic dark theme enhancement for OCR. PrintWindow API for capturing obscured windows.

What can I use Orbination AI Desktop Vision & Control for?

Automating repetitive data entry across multiple desktop applications. Testing UI responsiveness and element visibility in Windows apps. Enabling AI agents to interact with legacy software lacking APIs. Performing multi-step desktop navigation tasks via natural language.

How do I install Orbination AI Desktop Vision & Control?

Clone the repository, run dotnet build -c Release inside the DesktopControlMcp directory (see the Build section above), then register the resulting executable with your MCP client.

What MCP clients work with Orbination AI Desktop Vision & Control?

Orbination AI Desktop Vision & Control works with any MCP-compatible client including Claude Desktop, Claude Code, Cursor, and other editors with MCP support.
