Ollama MCP Server

Exposes local Ollama instances as tools for Claude Code

ollama-mcp

An MCP (Model Context Protocol) server that exposes local Ollama instances as tools for Claude Code.

Lets Claude offload code generation, drafts, embeddings, and quick questions to your local GPUs.

Setup

  1. Run the setup script:

    bash setup.sh
    

    This creates a venv, installs dependencies, generates a machine-specific config.json, and registers the MCP server with Claude Code.

    Note: setup.sh uses cygpath and targets Windows (Git Bash / MSYS2). On Linux/macOS, replace the cygpath -w calls with the paths directly, or register manually:

    claude mcp add ollama -s user -- /path/to/.venv/bin/python /path/to/src/ollama_mcp/server.py
    
  2. Restart Claude Code.
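For Linux/macOS users registering manually, the command from the note above can be assembled from a checkout path. This is only a sketch: `build_register_command` is a hypothetical helper, not part of the project, and it assumes the venv lives at `.venv` in the repository root with a POSIX layout (`bin/python`). Adjust the paths if your checkout differs.

```python
import shlex
from pathlib import Path


def build_register_command(repo_root: str) -> list[str]:
    """Assemble the `claude mcp add` arguments for a checkout at repo_root.

    Hypothetical helper: assumes the venv is at <repo_root>/.venv.
    """
    root = Path(repo_root).expanduser().resolve()
    python_bin = root / ".venv" / "bin" / "python"  # POSIX venv layout
    server_py = root / "src" / "ollama_mcp" / "server.py"
    return [
        "claude", "mcp", "add", "ollama", "-s", "user", "--",
        str(python_bin), str(server_py),
    ]


if __name__ == "__main__":
    # Placeholder checkout location; replace with your own path.
    print(shlex.join(build_register_command("~/src/ollama-mcp")))
```

Running the script prints a single shell line that can be pasted into a terminal; `shlex.join` quotes any path that contains spaces.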

Tools

Tool                 Description
ollama_generate      Single-turn prompt → response
ollama_chat          Multi-turn conversation
ollama_embed         Generate embedding vectors
ollama_list_models   List models on your Ollama instances
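The README does not show how these tools talk to Ollama. As a rough illustration of the kind of request a tool like ollama_generate presumably wraps, the sketch below calls Ollama's REST endpoint `/api/generate` directly using only the standard library. The function names, the default timeout, and the base URL are assumptions made for this example; the server's actual implementation may differ.

```python
import json
import urllib.request


def build_generate_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming POST to Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(base_url: str, model: str, prompt: str, timeout: float = 120.0) -> str:
    """Send the request and return the generated text.

    Illustrative only: the timeout default is an assumption, not the
    project's configured value.
    """
    req = build_generate_request(base_url, model, prompt)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read())["response"]
```

With Ollama running locally, `generate("http://localhost:11434", "llama3", "Say hello")` would return the model's reply as a string, provided that model has been pulled.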

Configuration

Copy config.example.json to config.json and fill in your machine details, or let setup.sh generate it interactively.
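The schema of config.json is not documented here, so the following is only a sketch of how a loader might fail early with an actionable message when the file is missing. `load_config` is a hypothetical helper, and the check that the file holds a JSON object at the top level is an assumption; consult config.example.json for the real fields.

```python
import json
from pathlib import Path


def load_config(path: str = "config.json") -> dict:
    """Load config.json, pointing the user at setup.sh if it is missing.

    Hypothetical helper: assumes the file contains a top-level JSON object.
    """
    config_path = Path(path)
    if not config_path.is_file():
        raise FileNotFoundError(
            f"{config_path} not found: run 'bash setup.sh' "
            "or copy config.example.json to config.json"
        )
    with config_path.open(encoding="utf-8") as fh:
        data = json.load(fh)
    if not isinstance(data, dict):
        raise ValueError(f"{config_path} must contain a JSON object at the top level")
    return data
```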

Requirements

  • Python 3.10+
  • Ollama 0.4.0+ running on at least one machine
  • Claude Code with MCP support

Development

    pip install -e ".[dev]"
    pytest tests/ -v

About

Extracted from a private developer infrastructure repo and published as a standalone tool. This server runs daily as part of a multi-project AI development workflow spanning game engines, RAG pipelines, and task orchestration — see mcp-rag and orchestration-engine for projects that use it.

Troubleshooting

Problem                      Cause                               Fix
config.json not found        Setup not run                       Run bash setup.sh
404 on embed calls           Ollama < 0.4.0                      Upgrade Ollama to 0.4.0 or later
Cannot connect to...         Ollama not running on target host   Start Ollama (ollama serve) or check Docker
Request timed out            Large model / slow hardware         Increase timeout in config.json, or pass the timeout parameter
OFFLINE in list_models       Host unreachable                    Check network, firewall, and Ollama port 11434
cygpath: command not found   Running setup.sh on Linux/macOS     See the note under Setup
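For the connection and OFFLINE cases, a quick way to separate network problems from Ollama problems is to test whether the port accepts TCP connections at all. The sketch below is a standalone check, not part of the server; `ollama_port_open` is a hypothetical name, and 11434 is the port named in the table above.

```python
import socket


def ollama_port_open(host: str, port: int = 11434, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False


if __name__ == "__main__":
    # "localhost" is a placeholder; substitute the host from your config.json.
    print("reachable" if ollama_port_open("localhost") else "unreachable")
```

A result of "unreachable" points at the network, firewall, or a stopped Ollama process; "reachable" with requests still failing suggests looking at the Ollama version or the model name instead.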

License

MIT

Tools (4)

ollama_generate      Performs a single-turn prompt-to-response generation.
ollama_chat          Facilitates a multi-turn conversation with the model.
ollama_embed         Generates embedding vectors for provided text.
ollama_list_models   Lists all models currently available on your Ollama instances.

Try it

List all the models I currently have installed in my local Ollama instance.
Generate a draft for a README file for my new project using the llama3 model.
Create an embedding vector for the following text snippet to use in my RAG pipeline.
Continue our conversation about this code snippet using the mistral model.

Frequently Asked Questions

What are the key features of Ollama MCP?

  • Exposes local Ollama instances as tools for Claude Code.
  • Supports multi-turn conversations with local LLMs.
  • Enables local generation of embedding vectors.
  • Provides model management and discovery for local Ollama instances.

What can I use Ollama MCP for?

  • Offloading code generation tasks to local GPUs to save API costs.
  • Drafting text and documentation locally for privacy-sensitive projects.
  • Building RAG pipelines by generating embeddings locally.
  • Orchestrating complex AI development workflows across multiple local models.

How do I install Ollama MCP?

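Run bash setup.sh, which creates the venv, installs dependencies, generates config.json, and registers the server with Claude Code. If you have already set up the venv and config.json yourself, register the server manually:

    claude mcp add ollama -s user -- /path/to/.venv/bin/python /path/to/src/ollama_mcp/server.py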

What MCP clients work with Ollama MCP?

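Ollama MCP is set up for Claude Code, which is where setup.sh registers it. As an MCP server it should also work with other MCP-compatible clients, including Claude Desktop, Cursor, and other editors with MCP support, but those need to be configured manually.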
