What are the requirements for Orbination AI Desktop Vision & Control?

Orbination AI Desktop Vision & Control requires Windows 10/11, the .NET 8 SDK to build it, and a compatible MCP client such as Claude Desktop, Claude Code, or Cursor. No additional environment variables are needed for basic setup.

Is Orbination AI Desktop Vision & Control free to use?

Yes, Orbination AI Desktop Vision & Control is open source and free to use. You can find the source code on GitHub.

What MCP clients support Orbination AI Desktop Vision & Control?

Orbination AI Desktop Vision & Control works with any MCP-compatible client including Claude Desktop (Anthropic's official desktop app), Claude Code (CLI tool), Cursor, and other editors with MCP support.

How do I configure Orbination AI Desktop Vision & Control?

Configure Orbination AI Desktop Vision & Control by adding it to your MCP client's config file. The setup block at the top of this page generates a ready-to-paste config for Claude Code, Cursor, Codex, Windsurf, and Claude Desktop.
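
For clients that read a JSON config file, the entry can be generated rather than typed by hand. The sketch below is only an illustration: it assumes the client uses the common `mcpServers` layout (as Claude Desktop does), it reuses the server name and placeholder path from the `claude mcp add` command on this page, and the helper name `build_mcp_entry` is made up for the example. Check your client's documentation for the exact file location and schema.

```python
import json

# Placeholder path copied from this page; replace it with the real location
# of the built executable on your machine.
EXE_PATH = "path/to/DesktopControlMcp.exe"


def build_mcp_entry(name: str, exe_path: str) -> dict:
    """Return a config fragment in the assumed `mcpServers` layout."""
    return {"mcpServers": {name: {"command": exe_path, "args": []}}}


config = build_mcp_entry("orbination-desktop-control", EXE_PATH)
print(json.dumps(config, indent=2))
```

The printed JSON can be merged into the client's existing config file; if the file already has an `mcpServers` object, add the new key to it instead of replacing the whole object.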


Orbination AI Desktop Vision & Control MCP Server

What tools does Orbination AI Desktop Vision & Control provide?

run_sequence: Execute multiple UI actions in a single call.
click_element: Click a UI element identified by text or OCR.
click_menu_item: Navigate and click items in application menus.

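How do I install Orbination AI Desktop Vision & Control?

Build Orbination AI Desktop Vision & Control from source with the .NET 8 SDK by running dotnet build -c Release inside the DesktopControlMcp directory, then register the resulting DesktopControlMcp.exe with your MCP client.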

Native Windows MCP server that gives AI agents full desktop control.

amichail-1/Orbination-AI-Desktop-Vision-Control (★ 4), by amichail-1, updated Mar 22, 2026

Add it to Claude Code

claude mcp add orbination-desktop-control -- path/to/DesktopControlMcp.exe

What it does

Native Windows UIAutomation and OCR integration
Window occlusion detection and visibility analysis
Batch action sequencing for complex workflows
Automatic dark theme enhancement for OCR
PrintWindow API for capturing obscured windows

Tools (3)

run_sequence: Execute multiple UI actions in a single call.

click_element: Click a UI element identified by text or OCR.

click_menu_item: Navigate and click items in application menus.
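
To make the idea of batching concrete, here is a rough sketch of how a sequence of UI actions can be dispatched from a single call. It is not the server's implementation or schema: the step format (`{"action": ..., ...}`), the field names `target`, `text`, and `keys`, and the stand-in handlers are all hypothetical, and the handlers only describe what they would do instead of touching the desktop.

```python
from typing import Callable

# Hypothetical step format: {"action": <name>, ...arguments}.
# The real run_sequence schema is not documented on this page.
Step = dict


def run_sequence(steps: list[Step], handlers: dict[str, Callable[[Step], str]]) -> list[str]:
    """Run steps in order and stop at the first unknown action."""
    results = []
    for index, step in enumerate(steps):
        handler = handlers.get(step["action"])
        if handler is None:
            results.append(f"step {index}: unknown action {step['action']!r}")
            break
        results.append(f"step {index}: {handler(step)}")
    return results


# Stand-in handlers that only report what they would do.
handlers = {
    "click": lambda s: f"click {s['target']}",
    "type": lambda s: f"type {s['text']}",
    "hotkey": lambda s: f"press {s['keys']}",
}

steps = [
    {"action": "click", "target": "search bar"},
    {"action": "type", "text": "Project Alpha"},
    {"action": "hotkey", "keys": "Enter"},
]
log = run_sequence(steps, handlers)
for line in log:
    print(line)
```

The point of the pattern is that the client sends one request and gets one combined result back, instead of paying a round trip per click or keystroke.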

Try it

→ Find the 'Save' button in the current window and click it.

→ Navigate to the File menu and select 'Export' then 'PDF'.

→ Perform a sequence: click the search bar, type 'Project Alpha', and press Enter.

→ Analyze the current screen and tell me which windows are visible.

Original README from amichail-1/Orbination-AI-Desktop-Vision-Control

Orbination AI Desktop Vision & Control

Give AI assistants eyes and hands. A native Windows MCP server that lets AI see the screen, read UI elements, click buttons, type text, and control any application — with built-in OCR, dark theme support, window occlusion detection, and batch action sequencing.

Built for Claude Code by Leia Enterprise Solutions for the Orbination project.

AI coding assistants are blind. They generate code but can never see the result. They can't compare a design mockup to a running app. They can't click through a UI to test it. This server fixes that.

What It Does

This MCP server bridges the gap between AI and your desktop. Instead of working blind with just text, the AI can:

See — Take screenshots, run OCR on any window (auto-enhances dark themes), detect window occlusion
Read — Detect every UI element (buttons, inputs, text, tabs, checkboxes) with exact positions via Windows UIAutomation
Interact — Click elements by text (UIAutomation + OCR fallback), navigate menus, fill forms, type and paste text
Navigate — Open apps, switch windows, focus tabs, navigate browser URLs
Understand — Scan the entire desktop: window visibility %, occlusion detection, uncovered desktop regions
Batch — Execute multi-step UI workflows in a single call with run_sequence
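
The window visibility percentage mentioned above can be illustrated with a small grid-based calculation. This is a minimal sketch under stated assumptions, not the project's C# code: windows are plain rectangles listed front to back, the grid cell size of 10 pixels is arbitrary, window names are assumed to be unique, and the `Window` class and `visibility` function are names invented for the example.

```python
from dataclasses import dataclass


@dataclass
class Window:
    name: str
    left: int
    top: int
    right: int   # exclusive
    bottom: int  # exclusive


def visibility(windows: list[Window], cell: int = 10) -> dict[str, float]:
    """Return the visible percentage per window; windows are listed front to back."""
    owner: dict[tuple[int, int], str] = {}  # topmost window at each grid cell
    total: dict[str, int] = {}              # grid cells covered by each window
    for win in windows:
        count = 0
        # -(-a // b) is ceiling division, so partially covered cells are included.
        for gx in range(win.left // cell, -(-win.right // cell)):
            for gy in range(win.top // cell, -(-win.bottom // cell)):
                count += 1
                owner.setdefault((gx, gy), win.name)  # first claim wins (front-most)
        total[win.name] = count
    visible = {name: 0 for name in total}
    for name in owner.values():
        visible[name] += 1
    return {name: 100.0 * visible[name] / total[name] for name in total if total[name]}


stack = [
    Window("editor", 0, 0, 100, 100),    # front
    Window("browser", 50, 0, 150, 100),  # half covered by the editor
    Window("notes", 10, 10, 40, 40),     # fully behind the editor
]
result = visibility(stack)
print(result)
```

A window whose percentage is 0 is fully occluded, so clicking at its coordinates would hit whatever sits in front of it; that is the situation occlusion detection is meant to flag.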

What's New in v2.0

Window Occlusion Detection — Grid-based analysis showing which windows are truly visible (visibility %) and which are hidden behind others
Desktop Region Detection — Flood-fill algorithm to find uncovered screen areas
Shared OcrService — Centralized OCR with automatic dark theme enhancement (invert + contrast boost) — single-pass, not two
PrintWindow API — Capture window content even when obscured by other windows
click_element OCR Fallback — UIAutomation first, then OCR for dark themes, web apps, iframes
run_sequence — Batch multiple UI actions (click, type, paste, hotkey, wait, focus, OCR click) in a single MCP call
click_menu_item — Navigate parent > child menus with smooth mouse movement to keep submenus open
DPI Awareness — Per-monitor DPI for correct coordinates on multi-monitor setups with mixed scaling
Embedded AI Instructions — Server sends tool usage guidelines on MCP connection, teaching AI to prefer OCR over screenshots
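
The dark theme enhancement listed above (invert plus contrast boost) can be pictured with a few lines of pure Python. This is an illustrative sketch only: it works on a grayscale image given as nested lists of 0-255 values, the mean-brightness threshold of 128 is an assumption, the contrast boost is a simple linear stretch, and it makes no attempt to reproduce the single-pass behaviour of the project's OcrService. The function name `enhance_for_ocr` is made up for the example.

```python
def enhance_for_ocr(gray: list[list[int]], dark_threshold: int = 128) -> list[list[int]]:
    """Invert images that look dark, then stretch contrast to the full 0-255 range."""
    pixels = [v for row in gray for v in row]
    if not pixels:
        return []
    # Treat the image as a dark theme when its mean brightness is low.
    if sum(pixels) / len(pixels) < dark_threshold:
        gray = [[255 - v for v in row] for row in gray]
        pixels = [255 - v for v in pixels]
    lo, hi = min(pixels), max(pixels)
    if hi == lo:
        return [row[:] for row in gray]  # flat image, nothing to stretch
    scale = 255 / (hi - lo)
    return [[round((v - lo) * scale) for v in row] for row in gray]


# Light text (200) on a dark background (20).
dark = [[20, 20, 200], [20, 200, 20]]
enhanced = enhance_for_ocr(dark)
print(enhanced)
```

After enhancement the text pixels are dark on a light background, which is the form OCR engines generally handle best.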

Architecture

AI Client (Claude Code / Claude Desktop)
         │
         │  MCP / stdio
         ▼
    ┌─────────────────────────────┐
    │       MCP Server            │
    │   (ServerInstructions)      │
    └─────────┬───────────────────┘
              │
    ┌─────────┼──────────────────────────────────────┐
    │         │         │          │          │       │
    ▼         ▼         ▼          ▼          ▼       │
 Mouse    Keyboard   Screen    Vision    Composite   │
 Tools     Tools     Tools     Tools      Tools      │
                       │          │          │       │
              ┌────────┼──────────┼──────────┘       │
              ▼        ▼          ▼                  │
          Win32     UIAuto-    OcrService            │
          Native    mation     (dark theme)          │
              │        │                             │
              ▼        ▼                             │
         DesktopScanner    NativeInput               │
         (occlusion,       (SendInput,               │
          regions)          clipboard)               │
              │               │                      │
              └───────┬───────┘                      │
                      ▼                              │
               Windows OS                            │
               (Desktop, Windows, Apps)              │
    └────────────────────────────────────────────────┘

Single native .NET 8 executable. No Python. No Node.js. No browser drivers. Direct Windows API access.

Requirements

Windows 10/11
.NET 8 SDK

Build

cd DesktopControlMcp
dotnet build -c Release

Or publish as a single file:

dotnet publish -c Release -r win-x64 --self-contained false

Setup with Claude Code

Add the MCP server to your Claude Code conf

Frequently Asked Questions

What are the key features of Orbination AI Desktop Vision & Control?

Native Windows UIAutomation and OCR integration. Window occlusion detection and visibility analysis. Batch action sequencing for complex workflows. Automatic dark theme enhancement for OCR. PrintWindow API for capturing obscured windows.

What can I use Orbination AI Desktop Vision & Control for?

Automating repetitive data entry across multiple desktop applications. Testing UI responsiveness and element visibility in Windows apps. Enabling AI agents to interact with legacy software lacking APIs. Performing multi-step desktop navigation tasks via natural language.
