Vision MCP Server

Free, unlimited vision capabilities for your AI coding assistant

README.md

Vision MCP Server

Free, unlimited vision capabilities for your AI coding assistant using Groq API and Meta Llama 4 Vision model.

Features

  • Image Analysis - Understand and describe images
  • Text Extraction (OCR) - Extract text from screenshots, documents, photos
  • UI Analysis - Describe UI components, layouts, and design
  • Error Diagnosis - Analyze error screenshots and suggest fixes
  • Diagram Understanding - Interpret flowcharts, UML, architecture diagrams
  • Chart Analysis - Read charts and dashboards for insights
  • Image Comparison - Compare two images for differences
  • Code Extraction - Extract code from IDE screenshots

Installation

Prerequisites

  • Python 3.10 or higher
  • Free Groq API key

Get Groq API Key (Free)

  1. Visit https://console.groq.com/keys
  2. Sign up (free)
  3. Create a new API key

Install Dependencies

cd vision-mcp-server

# Option 1: Using install script (recommended)
./install.sh

# Option 2: Manual installation
pip3 install mcp groq pillow aiofiles

Configuration

Claude Desktop

Add to ~/.claude/config.json:

{
  "mcpServers": {
    "vision-mcp-server": {
      "command": "python",
      "args": ["-m", "vision_mcp_server.server"],
      "env": {
        "GROQ_API_KEY": "your-groq-api-key-here"
      }
    }
  }
}

OpenCode

Add to OpenCode settings:

{
  "$schema": "https://opencode.ai/config.json",
  "mcp": {
    "vision-mcp-server": {
      "type": "local",
      "command": ["python", "-m", "vision_mcp_server.server"],
      "environment": {
        "GROQ_API_KEY": "your-groq-api-key-here"
      }
    }
  }
}

Cline (VS Code)

Add to Cline settings:

{
  "mcpServers": {
    "vision-mcp-server": {
      "command": "python",
      "args": ["-m", "vision_mcp_server.server"],
      "env": {
        "GROQ_API_KEY": "your-groq-api-key-here"
      }
    }
  }
}

Usage

Analyze Image

Describe this image: screenshot.png

Extract Text

Extract text from this document: scan.jpg

Diagnose Error

What's wrong with this error screenshot: error.png

Understand Diagram

Explain this architecture diagram: system-diagram.png

Compare Images

Compare these two UI screenshots: old-ui.png vs new-ui.png

Available Tools

  • analyze_image - General image analysis
  • extract_text - OCR text extraction
  • describe_ui - UI component analysis
  • diagnose_error - Error screenshot analysis
  • understand_diagram - Diagram interpretation
  • analyze_chart - Chart and dashboard analysis
  • compare_images - Image comparison
  • code_from_screenshot - Code extraction from screenshots

Models Used

  • meta-llama/llama-4-scout-17b-16e-instruct - Latest Meta Llama 4 vision model
  • Available for free via Groq API
  • No quotas, no limits
  • Superior vision capabilities and multimodal performance

Testing

Run locally:

export GROQ_API_KEY=your-api-key
python -m vision_mcp_server.server

License

MIT

Tools 8

analyze_imageGeneral image analysis
extract_textOCR text extraction
describe_uiUI component analysis
diagnose_errorError screenshot analysis
understand_diagramDiagram interpretation
analyze_chartChart and dashboard analysis
compare_imagesImage comparison
code_from_screenshotCode extraction from screenshots

Environment Variables

GROQ_API_KEYrequiredAPI key for accessing Groq services

Try it

Describe this image: screenshot.png
Extract text from this document: scan.jpg
What's wrong with this error screenshot: error.png
Explain this architecture diagram: system-diagram.png
Compare these two UI screenshots: old-ui.png vs new-ui.png

Frequently Asked Questions

What are the key features of Vision MCP Server?

Image analysis and description. OCR text extraction from documents and photos. UI component and layout analysis. Error screenshot diagnosis and fix suggestions. Diagram and chart interpretation.

What can I use Vision MCP Server for?

Extracting text from scanned documents or screenshots for editing. Analyzing UI screenshots to generate frontend code or design feedback. Diagnosing software errors by analyzing screenshots of error messages. Interpreting complex architecture diagrams or flowcharts. Comparing visual differences between two versions of a UI.

How do I install Vision MCP Server?

Install Vision MCP Server by running: pip3 install mcp groq pillow aiofiles

What MCP clients work with Vision MCP Server?

Vision MCP Server works with any MCP-compatible client including Claude Desktop, Claude Code, Cursor, and other editors with MCP support.

Turn this server into reusable context

Keep Vision MCP Server docs, env vars, and workflow notes in Conare so your agent carries them across sessions.

Open Conare