MCP Vision Server MCP Server

Local setup required. This server has to be cloned and prepared on your machine before you register it in Claude Code.
1

Set the server up locally

Run this once to clone and prepare the server before adding it to Claude Code.

Run in terminal
pip install -e .
2

Register it in Claude Code

After the local setup is done, run this command to point Claude Code at the built server.

Run in terminal
claude mcp add -e "VISION_API_KEY=${VISION_API_KEY}" -e "VISION_BASE_URL=${VISION_BASE_URL}" -e "VISION_MODEL=${VISION_MODEL}" mcp-vision-server -- node "<FULL_PATH_TO_MCP_VISION_SERVER>/dist/index.js"

Replace <FULL_PATH_TO_MCP_VISION_SERVER>/dist/index.js with the actual folder you prepared in step 1.

Required:VISION_API_KEYVISION_BASE_URLVISION_MODEL+ 8 optional
README.md

Provides image analysis, OCR text extraction, and multi-turn visual dialogues.

MCP Vision Server - 图像识别 MCP 服务器

提供图像分析能力的 MCP 服务器,支持图像识别、文字提取、多轮对话等功能。

特性

  • 图像分析 - 支持各种图像内容识别与描述
  • 多轮对话 - 基于图像的连续问答
  • 灵活输入 - 支持本地文件路径和 Base64 编码
  • OpenAI 兼容 - 使用 OpenAI 兼容 API,支持多种视觉模型
  • 会话持久化 - 对话历史可持久化存储

安装

# 克隆仓库
git clone https://github.com/YOUR_USERNAME/mcp-vision-server.git
cd mcp-vision-server

# 创建虚拟环境
python -m venv venv
source venv/Scripts/activate  # Windows Git Bash

# 安装依赖
pip install -e .

配置

  1. 复制环境变量模板:
cp .env.example .env
  1. 编辑 .env 文件,填入您的 API 配置:
# 必填配置
VISION_API_KEY=your-api-key-here
VISION_BASE_URL=https://open.bigmodel.cn/api/paas/v4/
VISION_MODEL=glm-4v

使用方法

启动服务器

mcp-vision-server

或直接运行:

python -m mcp_vision.server

Web 配置工具

启动 Web 配置界面,支持热加载配置:

mcp-vision-config

或指定端口:

mcp-vision-config --host 127.0.0.1 --port 8080

访问 http://127.0.0.1:7860 即可打开配置界面。

功能特性

  • 📝 可视化编辑所有配置项
  • 🔄 保存后自动热加载,无需重启服务
  • 🔒 API Key 密码隐藏显示
  • 📋 实时查看当前运行配置

MCP 工具

1. analyze_image - 图像分析

分析图像内容并返回详细描述。

# 基础用法
analyze_image(
    image="C:/path/to/image.png",
    prompt="详细描述这张图片"
)

# OCR 文字提取
analyze_image(
    image="C:/docs/scan.png",
    prompt="提取图片中的所有文字"
)

# 代码识别
analyze_image(
    image="C:/code/snippet.png",
    prompt="识别并转录图片中的代码,保持格式"
)
2. chat_vision - 两轮对话

基于图像进行两轮问答。

# 第一轮对话
result1 = chat_vision(
    image="C:/chart.png",
    question="这个图表显示什么数据?"
)
session_id = result1["session_id"]
# remaining_turns = 1, can_continue = True

# 第二轮对话(追问细节,对话结束后无法继续)
if result1["remaining_turns"] > 0:
    result2 = chat_vision(
        image="C:/chart.png",
        question="数据有什么趋势?",
        session_id=session_id
    )
    # remaining_turns = 0, can_continue = False

# 开始新对话
result3 = chat_vision(
    image="C:/another.png",
    question="描述这张图",
    is_new_conversation=True
)
3. get_status - 状态查询

获取服务器运行状态。

status = get_status()
# 返回: 服务器名称、模型信息、会话状态等

输入格式

支持两种图像输入格式:

1. 本地文件路径

image="C:/Users/name/Pictures/screenshot.png"
image="/home/user/images/photo.jpg"

2. Base64 编码

# 纯 Base64
image="iVBORw0KGgoAAAANSUhEUgAA..."

# Data URL 格式
image="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAA..."

环境变量

变量名 说明 默认值
VISION_API_KEY API 密钥 -
VISION_BASE_URL API 基础 URL -
VISION_MODEL 模型名称 glm-4v
VISION_MAX_IMAGE_SIZE 最大图像大小(字节) 20971520 (20MB)
VISION_TIMEOUT 请求超时(秒) 120
VISION_TEMPERATURE 温度参数 0.7
VISION_MAX_TOKENS 最大输出 tokens 4096
VISION_LOG_LEVEL 日志级别 INFO
VISION_MAX_HISTORY 对话历史最大保存数 50
VISION_ENABLE_PERSISTENCE 启用持久化 true
VISION_HISTORY_PATH 历史文件路径 ~/.mcp-vision/history.json

支持的图像格式

  • PNG
  • JPEG / JPG
  • GIF
  • WebP
  • BMP
  • TIFF

项目结构

mcp-vision-server/
├── src/mcp_vision/
│   ├── __init__.py           # 包初始化
│   ├── server.py             # MCP 服务器主文件
│   ├── config.py             # 配置管理
│   ├── vision_client.py      # 视觉 API 客户端
│   ├── image_processor.py    # 图像处理
│   ├── chat_manager.py       # 对话管理器
│   ├── web_config.py         # Web 配置工具
│   └── utils.py              # 工具函数
├── tests/
├── .env.example
├── pyproject.toml
└── README.md

在 Claude Code 中配置

编辑 Claude Code 配置文件,添加 MCP 服务器:

{
  "mcpServers": {
    "vision": {
      "command": "mcp-vision-server",
      "env": {
        "VISION_API_KEY": "your-api-key",
        "VISION_BASE_URL": "https://open.bigmodel.cn/api/paas/v4/",
        "VISION_MODEL": "glm-4v"
      }
    }
  }
}

许可证

MIT License

Tools (3)

analyze_imageAnalyzes image content and returns a detailed description, including OCR and code recognition.
chat_visionConducts multi-turn visual dialogues based on an image.
get_statusRetrieves the current server running status, model information, and session state.

Environment Variables

VISION_API_KEYrequiredAPI key for the vision service
VISION_BASE_URLrequiredBase URL for the vision API
VISION_MODELrequiredModel name to use
VISION_MAX_IMAGE_SIZEMaximum image size in bytes
VISION_TIMEOUTRequest timeout in seconds
VISION_TEMPERATURETemperature parameter for generation
VISION_MAX_TOKENSMaximum output tokens
VISION_LOG_LEVELLogging level
VISION_MAX_HISTORYMaximum number of conversation history entries
VISION_ENABLE_PERSISTENCEWhether to enable conversation persistence
VISION_HISTORY_PATHFile path for storing conversation history

Configuration

claude_desktop_config.json
{"mcpServers": {"vision": {"command": "mcp-vision-server", "env": {"VISION_API_KEY": "your-api-key", "VISION_BASE_URL": "https://open.bigmodel.cn/api/paas/v4/", "VISION_MODEL": "glm-4v"}}}}

Try it

Analyze the image at C:/path/to/screenshot.png and describe the UI elements.
Extract all the text from the scanned document located at C:/docs/invoice.png.
Identify the code snippet in this image and transcribe it into a markdown block.
Start a new conversation about this chart and explain what the data trends represent.

Frequently Asked Questions

What are the key features of MCP Vision Server?

Advanced image content recognition and description. Multi-turn visual dialogue capabilities. Support for local file paths and Base64 encoded images. OpenAI-compatible API integration. Persistent conversation history storage.

What can I use MCP Vision Server for?

Automating data extraction from scanned documents or invoices. Analyzing UI screenshots for accessibility or design feedback. Transcribing code snippets from images into editable text. Interpreting complex charts and graphs through conversational AI.

How do I install MCP Vision Server?

Install MCP Vision Server by running: pip install -e .

What MCP clients work with MCP Vision Server?

MCP Vision Server works with any MCP-compatible client including Claude Desktop, Claude Code, Cursor, and other editors with MCP support.

Turn this server into reusable context

Keep MCP Vision Server docs, env vars, and workflow notes in Conare so your agent carries them across sessions.

Need the old visual installer? Open Conare IDE.
Open Conare