MCP Mistral OCR Optimized MCP Server

Add it to Claude Code

Run this in a terminal.

claude mcp add -e "MISTRAL_API_KEY=${MISTRAL_API_KEY}" -e "OCR_DIR=${OCR_DIR}" mistral-ocr-opt -- uv run --directory D:/dev/mcp_mistral_ocr_opt -m src.mcp_mistral_ocr_opt.main
Required: MISTRAL_API_KEY, OCR_DIR (+ 1 optional)
MCP Mistral OCR Optimized

Optimized MCP server for OCR processing using Mistral AI with batch processing and async connection pooling.

🚀 Key Optimizations

| Feature | Benefit |
| --- | --- |
| Batch Processing API | Up to 50% cost reduction for large file sets |
| Async Connection Pooling | 20-30% faster processing for multiple files |
| Token-Efficient Defaults | include_images=False, table_format=markdown saves 30-40% tokens |
| Concurrent Processing | Process up to 5 files simultaneously |
| Cross-Platform Paths | Works on Windows, macOS, Linux, and Docker |
| Configurable Parameters | Fine-tune OCR output with table_format, headers, footers |
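
The "up to 5 files simultaneously" limit is the kind of bound usually enforced with a semaphore. The following is a minimal sketch of that pattern, not the server's actual code: `fake_ocr` and `process_all` are hypothetical names, and the sleep stands in for a real OCR request.

```python
import asyncio

MAX_CONCURRENT_REQUESTS = 5  # mirrors the documented default


async def fake_ocr(name: str) -> str:
    """Hypothetical stand-in for one OCR request."""
    await asyncio.sleep(0.01)
    return f"ocr:{name}"


async def process_all(names, limit=MAX_CONCURRENT_REQUESTS):
    """Run fake_ocr over all names with at most `limit` calls in flight.

    Returns the results (in input order) and the peak concurrency observed.
    """
    semaphore = asyncio.Semaphore(limit)
    state = {"active": 0, "peak": 0}

    async def bounded(name):
        async with semaphore:
            state["active"] += 1
            state["peak"] = max(state["peak"], state["active"])
            try:
                return await fake_ocr(name)
            finally:
                state["active"] -= 1

    # gather returns results in the same order as its arguments
    results = await asyncio.gather(*(bounded(n) for n in names))
    return results, state["peak"]


results, peak = asyncio.run(process_all([f"page_{i}.pdf" for i in range(12)]))
```

All twelve tasks are scheduled at once, but the semaphore keeps the number of in-flight requests at or below the limit.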

📦 Installation

Using UV (Recommended)

# Navigate to project directory
cd D:/dev/mcp_mistral_ocr_opt

# Create and activate virtual environment
uv venv
# Windows
.venv\Scripts\activate
# Unix
source .venv/bin/activate

# Install dependencies
uv pip install .

Using Docker

# Build image
docker build -t mcp-mistral-ocr-opt .

# Run container
docker run -e MISTRAL_API_KEY=your_api_key \
           -v /path/to/your/files:/data/ocr \
           mcp-mistral-ocr-opt:latest

⚙️ Configuration

Environment Variables

Create or edit a .env file in the project directory:

# Required
MISTRAL_API_KEY=your_api_key_here
OCR_DIR=D:/dev/mcp_mistral_ocr_opt/data/ocr

# Optional - Batch Processing
BATCH_MODE=auto                  # auto, always, never
BATCH_MIN_FILES=5                # Use batch processing for 5+ files in auto mode
INLINE_BATCH_THRESHOLD=10        # Use inline batch for <10 files
MAX_CONCURRENT_REQUESTS=5        # Max concurrent API requests

# Optional - OCR Defaults (token optimization)
DEFAULT_TABLE_FORMAT=markdown    # null, markdown, or html
INCLUDE_IMAGES=false             # Default false for token efficiency
EXTRACT_HEADER=false             # Extract document headers
EXTRACT_FOOTER=false             # Extract document footers
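
As an illustration of how these variables and their defaults fit together, here is a minimal sketch of a settings loader. It is an assumption about how such a loader could look, not the server's implementation: `OcrSettings`, `load_settings`, and `as_bool` are hypothetical names, and the accepted truthy spellings are a guess.

```python
import os
from dataclasses import dataclass
from typing import Mapping


def as_bool(value: str) -> bool:
    """Interpret common truthy spellings; anything else is False."""
    return value.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class OcrSettings:  # hypothetical container, not the server's actual class
    api_key: str
    ocr_dir: str
    batch_mode: str
    batch_min_files: int
    inline_batch_threshold: int
    max_concurrent_requests: int
    default_table_format: str
    include_images: bool
    extract_header: bool
    extract_footer: bool


def load_settings(env: Mapping[str, str] = os.environ) -> OcrSettings:
    """Read the documented variables, applying the documented defaults."""
    missing = [k for k in ("MISTRAL_API_KEY", "OCR_DIR") if not env.get(k)]
    if missing:
        raise ValueError("missing required variables: " + ", ".join(missing))
    batch_mode = env.get("BATCH_MODE", "auto").lower()
    if batch_mode not in {"auto", "always", "never"}:
        raise ValueError(f"BATCH_MODE must be auto, always, or never, got {batch_mode!r}")
    return OcrSettings(
        api_key=env["MISTRAL_API_KEY"],
        ocr_dir=env["OCR_DIR"],
        batch_mode=batch_mode,
        batch_min_files=int(env.get("BATCH_MIN_FILES", "5")),
        inline_batch_threshold=int(env.get("INLINE_BATCH_THRESHOLD", "10")),
        max_concurrent_requests=int(env.get("MAX_CONCURRENT_REQUESTS", "5")),
        default_table_format=env.get("DEFAULT_TABLE_FORMAT", "markdown"),
        include_images=as_bool(env.get("INCLUDE_IMAGES", "false")),
        extract_header=as_bool(env.get("EXTRACT_HEADER", "false")),
        extract_footer=as_bool(env.get("EXTRACT_FOOTER", "false")),
    )
```

Passing a plain dict instead of `os.environ` makes the loader easy to exercise without touching the real environment.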

Claude Desktop Configuration

Add to claude_desktop_config.json:

{
  "mcpServers": {
    "mistral-ocr-opt": {
      "command": "uv",
      "args": [
        "run",
        "--directory",
        "D:/dev/mcp_mistral_ocr_opt",
        "-m",
        "src.mcp_mistral_ocr_opt.main"
      ],
      "env": {
        "MISTRAL_API_KEY": "your_api_key_here",
        "OCR_DIR": "D:/dev/mcp_mistral_ocr_opt/data/ocr",
        "BATCH_MODE": "auto"
      }
    }
  }
}

🛠️ Available Tools

1. `process_local_file` - Process a single file

Process a single local file from OCR_DIR.

{
  "name": "process_local_file",
  "arguments": {
    "filename": "document.pdf",
    "table_format": "markdown",
    "extract_header": false,
    "extract_footer": false,
    "include_images": false
  }
}

Parameters:

  • filename (required): Name of file relative to OCR_DIR
  • table_format (optional): null, markdown, or html - default: markdown
  • extract_header (optional): Extract document headers - default: false
  • extract_footer (optional): Extract document footers - default: false
  • include_images (optional): Include base64 images - default: false (token efficient)
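
MCP clients normally build the wire message for you, but it can help to see how the name/arguments pair above is wrapped. The sketch below builds a JSON-RPC 2.0 `tools/call` request as defined by the MCP specification; `tools_call_request` is a hypothetical helper name, and the incrementing id scheme is an assumption.

```python
import json
from itertools import count

_request_ids = count(1)  # simple incrementing ids; any unique id works


def tools_call_request(name: str, arguments: dict) -> str:
    """Serialize an MCP tools/call request for the given tool."""
    return json.dumps(
        {
            "jsonrpc": "2.0",
            "id": next(_request_ids),
            "method": "tools/call",
            "params": {"name": name, "arguments": arguments},
        }
    )


message = tools_call_request(
    "process_local_file",
    {"filename": "document.pdf", "table_format": "markdown", "include_images": False},
)
```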

Supported local file types:

  • PDFs: .pdf
  • Images: .jpg, .jpeg, .png, .gif, .webp, .bmp, .avif
  • Other formats (docx/xlsx/pptx) are not supported
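
A quick pre-check against the list above can avoid sending unsupported files to the tool. This is a minimal sketch: `is_supported` is a hypothetical helper, and treating extensions case-insensitively is an assumption rather than documented behavior.

```python
from pathlib import PurePath

# Extensions taken from the supported-types list above
SUPPORTED_EXTENSIONS = {
    ".pdf", ".jpg", ".jpeg", ".png", ".gif", ".webp", ".bmp", ".avif",
}


def is_supported(filename: str) -> bool:
    """Return True if the file extension is in the supported set."""
    return PurePath(filename).suffix.lower() in SUPPORTED_EXTENSIONS
```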

2. `process_batch_local_files` - Process multiple files concurrently

Process multiple files with concurrent or batch processing (auto-selected).

{
  "name": "process_batch_local_files",
  "arguments": {
    "patterns": ["*.pdf", "scanned_*.jpg"],
    "max_files": 100,
    "table_format": "markdown",
    "include_images": false
  }
}

Parameters:

  • patterns (required): Array of glob patterns (e.g., ["*.pdf", "*.jpg"])
  • max_files (optional): Maximum files to process
  • Other parameters same as process_local_file
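
To preview which files a set of patterns would select, something like the following can be run locally. It is a sketch under assumptions: `expand_patterns` is a hypothetical helper, and the ordering, de-duplication, and the way `max_files` truncates the list may differ from what the server does.

```python
from pathlib import Path


def expand_patterns(ocr_dir, patterns, max_files=None):
    """Expand glob patterns relative to ocr_dir into a de-duplicated file list."""
    root = Path(ocr_dir)
    seen = set()
    matches = []
    for pattern in patterns:
        # sorted() gives a stable order within each pattern
        for path in sorted(root.glob(pattern)):
            if path.is_file() and path not in seen:
                seen.add(path)
                matches.append(path)
    return matches if max_files is None else matches[:max_files]
```

For example, `expand_patterns("/data/ocr", ["*.pdf", "scanned_*.jpg"], max_files=100)` returns at most 100 paths, with PDF matches listed before the scanned images.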

Auto-selection Logic:

  • < 5 files: Concurrent processing
  • 5-9 files: Inline batch (if BATCH_MODE=auto)
  • 10+ files: File batch (saves up to 50% cost)
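
The thresholds above can be expressed as a small decision function. This is a sketch, not the server's code: `select_mode` and its return labels are hypothetical, and the handling of BATCH_MODE=always and BATCH_MODE=never is an assumption, since only the auto behavior is described here.

```python
def select_mode(
    file_count: int,
    batch_mode: str = "auto",
    batch_min_files: int = 5,
    inline_batch_threshold: int = 10,
) -> str:
    """Pick a processing strategy from the file count and BATCH_MODE."""
    if batch_mode == "never":
        # Assumed: never means batching is skipped entirely
        return "concurrent"
    if batch_mode == "auto" and file_count < batch_min_files:
        return "concurrent"
    # "always" (assumed), or "auto" with enough files to batch
    if file_count < inline_batch_threshold:
        return "inline_batch"
    return "file_batch"
```

With the default settings, 4 files run concurrently, 5 to 9 files use an inline batch, and 10 or more use a file batch.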

3. `process_url_file` - Process file from URL

Process a file from a public URL.

{
  "name": "process_url_file",
  "arguments": {
    "url": "https://example.com/document.pdf",
    "file_type": "pdf",
    "table_format": "html"
  }
}

4. `create_batch_job` - Create explicit batch job

Create a batch processing job (for large file sets, cost savings up to 50%).

{
  "name": "create_batch_job",
  "arguments": {
    "patterns": ["documents/*.pdf"],
    "use_inline": false,
    "table_format": "markdown"
  }
}

Returns:

{
  "batch_type": "file",
  "job_id": "job_abc123",
  "batch_file_id": "file_xyz789",
  "files_queued": 50,
  "message": "Batch job created with 50 files. Use check_batch_status to monitor progress."
}

5. `check_batch_status` - Monitor batch job

{
  "name": "check_batch_status",
  "arguments": {
    "job_id": "job_abc123"
  }
}

Returns:

Tools (5)

  • process_local_file: Process a single local file from OCR_DIR.
  • process_batch_local_files: Process multiple files with concurrent or batch processing.
  • process_url_file: Process a file from a public URL.
  • create_batch_job: Create a batch processing job for large file sets.
  • check_batch_status: Monitor the status of a batch job.

Environment Variables

  • MISTRAL_API_KEY (required): API key for Mistral AI services
  • OCR_DIR (required): Directory path for local OCR files
  • BATCH_MODE (optional): Batch processing mode (auto, always, never)

Try it

Process the file 'invoice_001.pdf' in my OCR directory and extract the data as a markdown table.
Batch process all PDF files in my OCR directory using batch mode.
Extract text from this URL: https://example.com/document.pdf and format it as HTML.
Check the status of my batch job with ID job_abc123.
Process all scanned images matching 'scanned_*.jpg' and include headers in the output.

Frequently Asked Questions

What are the key features of MCP Mistral OCR Optimized?

  • Batch processing API for up to 50% cost reduction
  • Async connection pooling for faster multi-file processing
  • Token-efficient defaults, including image exclusion and markdown tables
  • Concurrent processing of up to 5 files simultaneously
  • Cross-platform support for Windows, macOS, Linux, and Docker

What can I use MCP Mistral OCR Optimized for?

  • Automating the extraction of data from large sets of invoices or receipts
  • Converting scanned document archives into structured markdown for knowledge bases
  • Processing public web-hosted PDFs into machine-readable text formats
  • Reducing API costs for high-volume document digitization tasks

How do I install MCP Mistral OCR Optimized?

From the project directory, create and activate a virtual environment with uv venv, then install MCP Mistral OCR Optimized by running: uv pip install .

What MCP clients work with MCP Mistral OCR Optimized?

MCP Mistral OCR Optimized works with any MCP-compatible client including Claude Desktop, Claude Code, Cursor, and other editors with MCP support.
