GLM OCR MCP Server

MCP server for extracting text from images and PDFs using ZhipuAI GLM-OCR.

Usage

Add the server to your MCP client configuration (e.g. claude_desktop_config.json):

{
  "mcpServers": {
    "glm-ocr": {
      "command": "uvx",
      "args": ["glm-ocr-mcp"],
      "env": {
        "ZHIPU_API_KEY": "your_api_key_here",
        "ZHIPU_OCR_API_URL": "https://open.bigmodel.cn/api/paas/v4/layout_parsing"
      }
    }
  }
}

Using with Claude Code

claude mcp add --scope user glm-ocr \
  --env ZHIPU_API_KEY=your_api_key_here \
  --env ZHIPU_OCR_API_URL=https://open.bigmodel.cn/api/paas/v4/layout_parsing \
  -- uvx glm-ocr-mcp

Using with Codex

Add the MCP server with:

codex mcp add glm-ocr \
  --env ZHIPU_API_KEY=your_api_key_here \
  --env ZHIPU_OCR_API_URL=https://open.bigmodel.cn/api/paas/v4/layout_parsing \
  -- uvx glm-ocr-mcp

Tools

The server provides one tool:

  • extract_text: Extract text from a local file or URL (png, jpg/jpeg, pdf)
    • By default, returns Markdown text
    • Set return_json=true to return the structured JSON response without md_results; it contains page parsing details such as bbox_2d, content, label, etc.
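With return_json=true, the result carries per-page layout items with fields like bbox_2d, content, and label. Only those field names come from the description above; the surrounding payload shape in this sketch is an assumption, shown just to illustrate flattening such a result back into plain text:

```python
# Hypothetical payload shape; only the field names bbox_2d, content,
# and label come from the tool description above.
sample_result = {
    "pages": [
        {
            "items": [
                {"bbox_2d": [10, 10, 200, 40], "label": "title", "content": "Invoice #42"},
                {"bbox_2d": [10, 60, 200, 90], "label": "text", "content": "Total: $99.00"},
            ]
        }
    ]
}

def flatten_text(result: dict) -> str:
    """Join the 'content' of every layout item, page by page."""
    lines = []
    for page in result.get("pages", []):
        for item in page.get("items", []):
            lines.append(item["content"])
    return "\n".join(lines)
```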

Parameters:

  • file_path: Local file path or URL for png, jpg/jpeg, or pdf
  • base64_data: Optional data URL/base64 payload (use when file_path is unavailable)
  • start_page_id: Optional PDF start page (1-based, only effective for PDF)
  • end_page_id: Optional PDF end page (1-based, only effective for PDF)
  • return_json: Optional boolean, default false. true returns JSON; false returns Markdown.
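The rules above (exactly one input source, 1-based page range meaningful only for PDFs) can be sketched as a small validator. The helper name normalize_args and its error messages are illustrative, not taken from the server's source:

```python
def normalize_args(file_path=None, base64_data=None,
                   start_page_id=None, end_page_id=None,
                   return_json=False):
    """Validate extract_text-style arguments (illustrative helper).

    Exactly one of file_path / base64_data must be given; page ids are
    1-based and only meaningful together with a PDF input.
    """
    if (file_path is None) == (base64_data is None):
        raise ValueError("provide exactly one of file_path or base64_data")
    for name, page in (("start_page_id", start_page_id),
                       ("end_page_id", end_page_id)):
        if page is not None and page < 1:
            raise ValueError(f"{name} is 1-based and must be >= 1")
    if start_page_id and end_page_id and end_page_id < start_page_id:
        raise ValueError("end_page_id must not precede start_page_id")
    return {"file_path": file_path, "base64_data": base64_data,
            "start_page_id": start_page_id, "end_page_id": end_page_id,
            "return_json": bool(return_json)}
```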

Examples

# Extract text from local image
extract_text(file_path="./screenshot.png")

# Extract text from local PDF
extract_text(file_path="./document.pdf")

# Extract text from URL image
extract_text(file_path="https://example.com/test.jpg")

# Use base64/data URL
extract_text(base64_data="data:image/png;base64,iVBORw0KGgo...")

# Extract structured layout JSON
extract_text(file_path="https://example.com/test.png", return_json=True)
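For the base64_data form, a data URL can be built from a local file like this (the image/png media type is an assumption inferred from the file extension):

```python
import base64
from pathlib import Path

def to_data_url(path: str, media_type: str = "image/png") -> str:
    """Read a local file and wrap it as a data URL for base64_data."""
    payload = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{media_type};base64,{payload}"
```

You could then call, for example, extract_text(base64_data=to_data_url("./screenshot.png")).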

Development

# Create virtual environment
uv venv
source .venv/bin/activate

# Sync dependencies and install current project
uv sync

# Run server for testing
python -m glm_ocr_mcp.server

Windows PowerShell activation:

.venv\Scripts\Activate.ps1

Project Structure

glm-ocr-mcp/
├── pyproject.toml         # Project configuration
├── README.md              # Documentation
├── .env.example           # Environment variable template
├── src/
│   └── glm_ocr_mcp/
│       ├── __init__.py
│       ├── __main__.py    # Entry point
│       ├── ocr.py         # OCR client
│       └── server.py      # MCP server


Environment Variables

  • ZHIPU_API_KEY (required): API key for accessing ZhipuAI services
  • ZHIPU_OCR_API_URL (required): Endpoint URL for the GLM-OCR layout parsing API
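Both variables must be set before the server starts. A minimal startup check might look like the following; the helper name require_env is illustrative, not part of the server:

```python
import os

def require_env(name: str) -> str:
    """Return a required environment variable's value, or fail loudly."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"{name} is not set; see Environment Variables")
    return value

# Illustrative usage at startup:
# api_key = require_env("ZHIPU_API_KEY")
# api_url = require_env("ZHIPU_OCR_API_URL")
```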


Try it

  • Extract the text from the image located at ./invoice.png.
  • Read the first 5 pages of the document at ./manual.pdf and summarize the content.
  • Extract the text from this image URL: https://example.com/document.jpg
  • Perform OCR on ./report.pdf and return the result as structured JSON.

