McPlex MCP Server

Local setup required. This server must be installed on your machine before you register it in Claude Code.
1. Set the server up locally

Run this once to install the server before adding it to Claude Code.

Run in terminal
pip install mcplex
2. Register it in Claude Code

After installation is done, run this command to point Claude Code at the installed server.

Run in terminal
claude mcp add mcplex -- mcplex

The mcplex command is the console entry point installed by pip in step 1. If Claude Code cannot find it, use the full path reported by `which mcplex`.

README.md

Bridge local Ollama models and ChromaDB vector memory to MCP clients

McPlex

MCP server that bridges local Ollama models to Claude Code and other MCP clients -- text generation, embeddings, vision, and vector memory, all running locally.

Why I Built This

Claude Code is powerful but cloud-only. Local models via Ollama are private and free but disconnected from MCP tooling. I needed a bridge: expose local models as MCP tools so Claude Code can delegate tasks to local inference (summarization, embedding, image analysis) without API costs or data leaving my machine. Any MCP-compatible client (Claude Code, Cursor, etc.) gets access with zero custom integration.

What It Does

  • 9 MCP tools -- generate, chat, embed, list_models, analyze_image, ocr_image, memory_store, memory_search, memory_list_collections
  • Zero cloud dependency -- all inference runs locally via Ollama; no API keys needed
  • ChromaDB vector memory -- store and semantically search text with persistent local storage
  • Vision and OCR -- analyze images and extract text using local vision models (LLaVA)
  • Drop-in MCP config -- add a few lines to Claude Code's MCP config and local models are available immediately

Key Technical Decisions

  • MCP protocol over custom API -- standard protocol means any MCP client works without custom integration code. When a new MCP client launches, McPlex works with it automatically.
  • Ollama over vLLM -- simpler setup, built-in model management (ollama pull), runs on consumer hardware. vLLM is faster at scale but requires manual model configuration and more VRAM.
  • Lazy ChromaDB loading -- memory tools are optional. Core text/vision tools work without ChromaDB installed. pip install mcplex[memory] adds vector storage only when needed.
  • Async HTTP via httpx -- non-blocking Ollama API calls. Multiple tools can query different models concurrently without blocking the MCP event loop.

Quick Start

pip install mcplex                # Core (text + vision)
pip install "mcplex[memory]"      # With ChromaDB vector memory (quotes needed in zsh)

# Requires Ollama running locally
ollama pull qwen3:8b            # Pull a model

Add to Claude Code MCP config:

{
  "mcpServers": {
    "mcplex": {
      "command": "mcplex",
      "args": []
    }
  }
}

Then ask Claude Code: "Use the generate tool to summarize this file with qwen3:8b"

Configuration via environment variables:

MCPLEX_OLLAMA_URL=http://localhost:11434
MCPLEX_DEFAULT_MODEL=qwen3:8b
MCPLEX_CHROMA_PATH=./mcplex_data/chroma

Lessons Learned

MCP tool schema design matters more than implementation quality. Overly flexible schemas (e.g., a single query tool that accepts model, prompt, temperature, max_tokens, format, and system prompt) confuse LLM clients -- they don't know which parameters to set. Specific, well-documented tool signatures with sensible defaults (generate takes a prompt and optional model) produce much better tool-calling accuracy. I went through three schema iterations before landing on the current 9-tool design, and each simplification improved Claude Code's ability to use the tools correctly.
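The contrast described above can be made concrete with two MCP-style tool declarations. Both are illustrative sketches in the shape of MCP's `inputSchema` (JSON Schema); the exact field descriptions are assumptions, not McPlex's actual schemas.

```python
# An overly flexible schema: the client must guess which of six knobs matter.
overly_flexible = {
    "name": "query",
    "inputSchema": {
        "type": "object",
        "properties": {
            "model": {"type": "string"},
            "prompt": {"type": "string"},
            "temperature": {"type": "number"},
            "max_tokens": {"type": "integer"},
            "format": {"type": "string"},
            "system": {"type": "string"},
        },
    },
}

# A focused schema: one required field, one optional field with a documented
# default. LLM clients call this correctly far more often.
focused = {
    "name": "generate",
    "description": "Generate text from a prompt using a local Ollama model.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "prompt": {"type": "string", "description": "Text to send to the model."},
            "model": {
                "type": "string",
                "description": "Optional; falls back to MCPLEX_DEFAULT_MODEL.",
            },
        },
        "required": ["prompt"],
    },
}
```

The design choice is essentially interface minimalism: every parameter removed from a schema is one fewer decision the calling model can get wrong.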

Tests

pip install -e ".[memory,dev]"
pytest tests/ -v    # 24 tests

MIT License. See LICENSE.

Tools (9)

generate -- Generates text using a local Ollama model.
chat -- Engages in a chat session with a local Ollama model.
embed -- Creates embeddings for text using a local Ollama model.
list_models -- Lists all available local Ollama models.
analyze_image -- Analyzes an image using a local vision model.
ocr_image -- Extracts text from an image using local OCR capabilities.
memory_store -- Stores text in local ChromaDB vector memory.
memory_search -- Searches for semantically similar text in ChromaDB.
memory_list_collections -- Lists all available ChromaDB memory collections.

Environment Variables

MCPLEX_OLLAMA_URL -- The URL for the local Ollama instance.
MCPLEX_DEFAULT_MODEL -- The default model to use for generation tasks.
MCPLEX_CHROMA_PATH -- The file system path for persistent ChromaDB storage.
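A minimal sketch of how these variables might be read with fallbacks. The defaults shown match the values in the Quick Start; the loading code itself is an assumption, not McPlex's actual source.

```python
import os

# Each variable is optional; unset variables fall back to the documented defaults.
OLLAMA_URL = os.environ.get("MCPLEX_OLLAMA_URL", "http://localhost:11434")
DEFAULT_MODEL = os.environ.get("MCPLEX_DEFAULT_MODEL", "qwen3:8b")
CHROMA_PATH = os.environ.get("MCPLEX_CHROMA_PATH", "./mcplex_data/chroma")
```

Because everything has a working default, a bare `mcplex` invocation against a stock local Ollama install needs no configuration at all.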

Configuration

claude_desktop_config.json
{
  "mcpServers": {
    "mcplex": {
      "command": "mcplex",
      "args": []
    }
  }
}

Try it

Use the generate tool to summarize this file with qwen3:8b
Analyze the image at ./screenshot.png and describe the UI elements
Store the contents of this documentation in my local memory collection
Search my memory for previous notes regarding the project architecture
List all my currently available local Ollama models

Frequently Asked Questions

What are the key features of McPlex?

Integrates local Ollama models with MCP-compatible clients. Provides persistent vector memory via ChromaDB. Supports vision-based image analysis and OCR. Zero cloud dependency with no external API keys required. Asynchronous HTTP communication for concurrent tool execution.

What can I use McPlex for?

Summarizing local files using private, offline LLMs. Building a local knowledge base that Claude can query via vector search. Extracting text from local screenshots or diagrams without uploading to the cloud. Managing local model inference tasks directly from the Claude Code interface.

How do I install McPlex?

Install McPlex by running: pip install mcplex

What MCP clients work with McPlex?

McPlex works with any MCP-compatible client including Claude Desktop, Claude Code, Cursor, and other editors with MCP support.
