MCP Mistral OCR Optimized
Optimized MCP server for OCR processing using Mistral AI with batch processing and async connection pooling.
🚀 Key Optimizations
| Feature | Benefit |
|---|---|
| Batch Processing API | Up to 50% cost reduction for large file sets |
| Async Connection Pooling | 20-30% faster processing for multiple files |
| Token-Efficient Defaults | include_images=False and table_format=markdown save 30-40% of tokens |
| Concurrent Processing | Process up to 5 files simultaneously |
| Cross-Platform Paths | Works on Windows, macOS, Linux, and Docker |
| Configurable Parameters | Fine-tune OCR output with table_format, headers, footers |
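The concurrency cap in the table can be illustrated with a semaphore. This is a minimal sketch, not the server's actual code: `fake_ocr` and `process_all` are hypothetical names, and the sleep stands in for a real OCR request.

```python
import asyncio

MAX_CONCURRENT_REQUESTS = 5  # mirrors the documented default


async def fake_ocr(name: str) -> str:
    """Hypothetical stand-in for a real OCR request."""
    await asyncio.sleep(0.01)
    return f"text of {name}"


async def process_all(names: list[str], limit: int = MAX_CONCURRENT_REQUESTS) -> list[str]:
    semaphore = asyncio.Semaphore(limit)

    async def worker(name: str) -> str:
        # At most `limit` workers hold the semaphore at the same time.
        async with semaphore:
            return await fake_ocr(name)

    # gather returns results in the same order as the inputs.
    return await asyncio.gather(*(worker(n) for n in names))


if __name__ == "__main__":
    results = asyncio.run(process_all([f"page_{i}.pdf" for i in range(12)]))
    print(len(results))
```

Passing a different `limit` is the sketch's equivalent of changing MAX_CONCURRENT_REQUESTS.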
📦 Installation
Using UV (Recommended)
# Navigate to project directory
cd D:/dev/mcp_mistral_ocr_opt
# Create and activate virtual environment
uv venv
# Windows
.venv\Scripts\activate
# Unix
source .venv/bin/activate
# Install dependencies
uv pip install .
Using Docker
# Build image
docker build -t mcp-mistral-ocr-opt .
# Run container
docker run -e MISTRAL_API_KEY=your_api_key \
-v /path/to/your/files:/data/ocr \
mcp-mistral-ocr-opt:latest
⚙️ Configuration
Environment Variables
Create or edit the .env file:
# Required
MISTRAL_API_KEY=your_api_key_here
OCR_DIR=D:/dev/mcp_mistral_ocr_opt/data/ocr
# Optional - Batch Processing
BATCH_MODE=auto # auto, always, never
BATCH_MIN_FILES=5 # Use batch processing for 5+ files in auto mode
INLINE_BATCH_THRESHOLD=10 # Use inline batch for <10 files
MAX_CONCURRENT_REQUESTS=5 # Max concurrent API requests
# Optional - OCR Defaults (token optimization)
DEFAULT_TABLE_FORMAT=markdown # null, markdown, or html
INCLUDE_IMAGES=false # Default false for token efficiency
EXTRACT_HEADER=false # Extract document headers
EXTRACT_FOOTER=false # Extract document footers
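For illustration, the variables above could be read into a typed settings object as follows. This is a sketch under assumptions: `OcrSettings`, `load_settings`, and the accepted boolean spellings are hypothetical and may differ from how the server parses its environment.

```python
import os
from dataclasses import dataclass
from typing import Mapping


def _as_bool(value: str) -> bool:
    # Assumed set of truthy spellings; everything else counts as false.
    return value.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class OcrSettings:
    api_key: str
    ocr_dir: str
    batch_mode: str = "auto"
    batch_min_files: int = 5
    inline_batch_threshold: int = 10
    max_concurrent_requests: int = 5
    default_table_format: str = "markdown"
    include_images: bool = False
    extract_header: bool = False
    extract_footer: bool = False


def load_settings(env: Mapping[str, str] = os.environ) -> OcrSettings:
    # The two required variables must be present and non-empty.
    missing = [k for k in ("MISTRAL_API_KEY", "OCR_DIR") if not env.get(k)]
    if missing:
        raise ValueError(f"missing required variables: {', '.join(missing)}")
    batch_mode = env.get("BATCH_MODE", "auto")
    if batch_mode not in {"auto", "always", "never"}:
        raise ValueError(f"BATCH_MODE must be auto, always, or never, got {batch_mode!r}")
    return OcrSettings(
        api_key=env["MISTRAL_API_KEY"],
        ocr_dir=env["OCR_DIR"],
        batch_mode=batch_mode,
        batch_min_files=int(env.get("BATCH_MIN_FILES", "5")),
        inline_batch_threshold=int(env.get("INLINE_BATCH_THRESHOLD", "10")),
        max_concurrent_requests=int(env.get("MAX_CONCURRENT_REQUESTS", "5")),
        default_table_format=env.get("DEFAULT_TABLE_FORMAT", "markdown"),
        include_images=_as_bool(env.get("INCLUDE_IMAGES", "false")),
        extract_header=_as_bool(env.get("EXTRACT_HEADER", "false")),
        extract_footer=_as_bool(env.get("EXTRACT_FOOTER", "false")),
    )
```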
Claude Desktop Configuration
Add to claude_desktop_config.json:
{
"mcpServers": {
"mistral-ocr-opt": {
"command": "uv",
"args": [
"run",
"--directory",
"D:/dev/mcp_mistral_ocr_opt",
"-m",
"src.mcp_mistral_ocr_opt.main"
],
"env": {
"MISTRAL_API_KEY": "your_api_key_here",
"OCR_DIR": "D:/dev/mcp_mistral_ocr_opt/data/ocr",
"BATCH_MODE": "auto"
}
}
}
}
🛠️ Available Tools
1. `process_local_file` - Process a single file
Process a single local file from OCR_DIR.
{
"name": "process_local_file",
"arguments": {
"filename": "document.pdf",
"table_format": "markdown",
"extract_header": false,
"extract_footer": false,
"include_images": false
}
}
Parameters:
- `filename` (required): Name of file relative to OCR_DIR
- `table_format` (optional): `null`, `markdown`, or `html` - default: `markdown`
- `extract_header` (optional): Extract document headers - default: `false`
- `extract_footer` (optional): Extract document footers - default: `false`
- `include_images` (optional): Include base64 images - default: `false` (token efficient)
Supported local file types:
- PDFs: .pdf
- Images: .jpg, .jpeg, .png, .gif, .webp, .bmp, .avif
- Other formats (docx/xlsx/pptx) are not supported
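A client can pre-check a filename against the supported types before calling the tool. The sketch below is hypothetical: `is_supported` and `resolve_in_ocr_dir` are illustrative names, and the check that the path stays inside OCR_DIR is an assumption about sensible behavior, not something this README documents.

```python
from pathlib import Path

# Extensions taken from the supported-types list above.
SUPPORTED_EXTENSIONS = {".pdf", ".jpg", ".jpeg", ".png", ".gif", ".webp", ".bmp", ".avif"}


def is_supported(filename: str) -> bool:
    """Case-insensitive extension check."""
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS


def resolve_in_ocr_dir(ocr_dir: str, filename: str) -> Path:
    """Resolve `filename` relative to `ocr_dir` and reject unsupported input."""
    base = Path(ocr_dir).resolve()
    candidate = (base / filename).resolve()
    # Assumption: files outside OCR_DIR should be refused (requires Python 3.9+).
    if not candidate.is_relative_to(base):
        raise ValueError(f"{filename!r} is outside {ocr_dir!r}")
    if not is_supported(filename):
        raise ValueError(f"unsupported file type: {candidate.suffix or '(none)'}")
    return candidate
```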
2. `process_batch_local_files` - Process multiple files concurrently
Process multiple files with concurrent or batch processing (auto-selected).
{
"name": "process_batch_local_files",
"arguments": {
"patterns": ["*.pdf", "scanned_*.jpg"],
"max_files": 100,
"table_format": "markdown",
"include_images": false
}
}
Parameters:
- `patterns` (required): Array of glob patterns (e.g., `["*.pdf", "*.jpg"]`)
- `max_files` (optional): Maximum files to process
- Other parameters same as `process_local_file`
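To make the `patterns` and `max_files` semantics concrete, here is one way glob patterns could be expanded against OCR_DIR. It is a sketch only: `expand_patterns` is a hypothetical helper, and the de-duplication and sorted ordering are assumptions, not documented behavior.

```python
import tempfile
from pathlib import Path
from typing import Optional


def expand_patterns(ocr_dir: str, patterns: list[str], max_files: Optional[int] = None) -> list[Path]:
    """Collect files matching any pattern, without duplicates, capped at max_files."""
    base = Path(ocr_dir)
    seen: set[Path] = set()
    matches: list[Path] = []
    for pattern in patterns:
        # Sort each pattern's hits so the result is deterministic.
        for path in sorted(base.glob(pattern)):
            if path.is_file() and path not in seen:
                seen.add(path)
                matches.append(path)
    return matches if max_files is None else matches[:max_files]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for name in ("a.pdf", "scanned_1.jpg", "notes.txt"):
            (Path(tmp) / name).write_text("placeholder")
        print([p.name for p in expand_patterns(tmp, ["*.pdf", "scanned_*.jpg"])])
```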
Auto-selection Logic:
- < 5 files: Concurrent processing
- 5-9 files: Inline batch (if BATCH_MODE=auto)
- 10+ files: File batch (saves up to 50% cost)
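The thresholds above can be written as a small decision function. This is a sketch of the documented rules, with `choose_strategy` as a hypothetical name. The handling of BATCH_MODE=always and BATCH_MODE=never is an assumption: never forces concurrent processing, and always skips the minimum-file check.

```python
def choose_strategy(
    file_count: int,
    batch_mode: str = "auto",
    batch_min_files: int = 5,
    inline_batch_threshold: int = 10,
) -> str:
    """Return "concurrent", "inline_batch", or "file_batch"."""
    if batch_mode not in {"auto", "always", "never"}:
        raise ValueError(f"unknown batch mode: {batch_mode!r}")
    if batch_mode == "never":
        return "concurrent"  # assumed: batching disabled entirely
    if batch_mode == "auto" and file_count < batch_min_files:
        return "concurrent"  # small sets are not worth batching
    # Batch path: inline below the threshold, file batch at or above it.
    return "inline_batch" if file_count < inline_batch_threshold else "file_batch"
```

With the default settings, 4 files run concurrently, 5 to 9 files use an inline batch, and 10 or more use a file batch.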
3. `process_url_file` - Process file from URL
Process a file from a public URL.
{
"name": "process_url_file",
"arguments": {
"url": "https://example.com/document.pdf",
"file_type": "pdf",
"table_format": "html"
}
}
4. `create_batch_job` - Create explicit batch job
Create a batch processing job (for large file sets, cost savings up to 50%).
{
"name": "create_batch_job",
"arguments": {
"patterns": ["documents/*.pdf"],
"use_inline": false,
"table_format": "markdown"
}
}
Returns:
{
"batch_type": "file",
"job_id": "job_abc123",
"batch_file_id": "file_xyz789",
"files_queued": 50,
"message": "Batch job created with 50 files. Use check_batch_status to monitor progress."
}
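Since the job runs asynchronously, a client would typically poll until it finishes. The loop below is a generic sketch: `wait_for_batch` and `fetch_status` are hypothetical, and the terminal status names are assumptions, because this README does not show the status payload.

```python
import time
from typing import Callable

# Assumed terminal status names; adjust to whatever check_batch_status reports.
TERMINAL_STATUSES = {"SUCCESS", "FAILED", "TIMEOUT_EXCEEDED", "CANCELLED"}


def wait_for_batch(
    fetch_status: Callable[[], str],
    interval: float = 5.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call fetch_status until it returns a terminal status or the timeout passes."""
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status()
        if status in TERMINAL_STATUSES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"batch job still {status!r} after {timeout} seconds")
        sleep(interval)
```

Here `fetch_status` would wrap a `check_batch_status` call for the returned `job_id` and extract the status string from its response.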
5. `check_batch_status` - Monitor batch job
{
"name": "check_batch_status",
"arguments": {
"job_id": "job_abc123"
}
}
Returns: