Knowledge RAG MCP Server

Local setup required. This server must be installed on your machine before you register it in Claude Code.
Step 1: Set the server up locally

Run this once to install the server before adding it to Claude Code.

Run in terminal
pip install knowledge-rag
Step 2: Register it in Claude Code

After the local setup is done, run this command to register the server with Claude Code.

Run in terminal
claude mcp add knowledge-rag -- python -m knowledge_rag

Run this in the same environment where you installed the package in step 1 so the knowledge_rag module resolves.

README.md

Local RAG system for Claude Code with hybrid search and cross-encoder reranking

Knowledge RAG

LLMs don't know your docs. Every conversation starts from zero.

Your notes, writeups, internal procedures, PDFs — none of it exists to your AI assistant. Cloud RAG solutions leak your private data. Local ones require Docker, Ollama, and 15 minutes of setup before a single query.

Knowledge RAG fixes this. One pip install, zero external servers. Your documents become instantly searchable inside Claude Code — with reranking precision that actually finds what you need.

clone → pip install → restart Claude Code → done.


12 MCP Tools | Hybrid Search + Cross-Encoder Reranking | Markdown-Aware Chunking | 100% Local, Zero Cloud

What's New | Installation | API Reference | Architecture


Breaking Changes (v2.x → v3.0)

v3.0 is a major release. If you are upgrading from v2.x, read this section first.

| Change | v2.x | v3.0 |
| --- | --- | --- |
| Embedding engine | Ollama (external server) | FastEmbed (ONNX in-process) |
| Embedding model | nomic-embed-text (768D) | BAAI/bge-small-en-v1.5 (384D) |
| Embedding dimensions | 768 | 384 |
| Dependencies | ollama>=0.6.0 | fastembed>=0.4.0, requests, beautifulsoup4 |
| MCP tools | 6 tools | 12 tools |
| Default hybrid_alpha | 0.3 | 0.3 |

Migration Steps

# 1. Pull the latest code
git pull origin main

# 2. Activate your virtual environment
# Windows:
.\venv\Scripts\activate
# Linux/macOS:
source venv/bin/activate

# 3. Install new dependencies
pip install -r requirements.txt

# 4. Restart Claude Code — the server auto-detects dimension mismatch
#    and triggers a nuclear rebuild on first startup (re-embeds everything)

The first startup after upgrading will take longer than usual because:

  1. FastEmbed downloads the BAAI/bge-small-en-v1.5 model (~50MB, cached in ~/.cache/fastembed/)
  2. All documents are re-embedded with the new 384-dim model
  3. The cross-encoder reranker model is downloaded on first query (~25MB)

After the initial rebuild, startup and queries are faster than v2.x because there is no Ollama server dependency.
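The dimension-mismatch check in step 4 can be sketched as follows. This is a hypothetical illustration of the behavior described above; the constant and function names are not the server's actual internals.

```python
# Hypothetical sketch of the startup check that triggers the nuclear
# rebuild; EXPECTED_DIM and needs_nuclear_rebuild are illustrative names.
EXPECTED_DIM = 384  # BAAI/bge-small-en-v1.5 output size in v3.0


def needs_nuclear_rebuild(stored_dim):
    """True when the on-disk index was embedded at a different size,
    e.g. 768 left over from the v2.x nomic-embed-text model."""
    return stored_dim is not None and stored_dim != EXPECTED_DIM
```

A fresh install (no stored dimension) skips the rebuild; a v2.x index (768) triggers it.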


What's New in v3.1.0

Office Document Support (DOCX, XLSX, PPTX, CSV)

9 formats supported. DOCX headings preserved as markdown structure, Excel sheets extracted as text tables, PowerPoint slides extracted per-slide, CSV natively parsed. All new formats integrate with markdown-aware chunking.
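As one concrete example of the CSV path, here is a minimal stdlib-only sketch of flattening rows into a searchable text table before chunking; the function name and separator are assumptions, not the server's actual code.

```python
import csv
import io

# Illustrative only: one way a CSV file could be flattened into plain
# text ahead of chunking; not the actual knowledge-rag implementation.
def csv_to_text_table(raw):
    rows = list(csv.reader(io.StringIO(raw)))
    return "\n".join(" | ".join(cell.strip() for cell in row) for row in rows)


sample = "name,role\nalice,admin\nbob,analyst\n"
print(csv_to_text_table(sample))
```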

File Watcher — Auto-Reindex on Changes

The documents directory is monitored in real time via watchdog. When you add, modify, or delete a file, the system auto-reindexes after a 5-second debounce. No manual reindex_documents call needed.
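The debounce behavior can be sketched in pure stdlib Python. The real server wires this to watchdog events; the class and callback names here are hypothetical.

```python
import threading

# Minimal sketch of the 5-second debounce described above.
# DebouncedReindexer and the reindex callback are illustrative names.
class DebouncedReindexer:
    def __init__(self, reindex, delay=5.0):
        self._reindex = reindex
        self._delay = delay
        self._timer = None
        self._lock = threading.Lock()

    def on_fs_event(self, *_args):
        """Called for every create/modify/delete event; each call restarts
        the timer, so a burst of changes triggers exactly one reindex."""
        with self._lock:
            if self._timer is not None:
                self._timer.cancel()
            self._timer = threading.Timer(self._delay, self._reindex)
            self._timer.daemon = True
            self._timer.start()
```

Saving a file ten times in quick succession therefore causes one reindex, not ten.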

MMR Result Diversification

Maximal Marginal Relevance applied after reranking to reduce redundant results. Balances relevance vs diversity (lambda=0.7). If your top 5 results were all from the same document, MMR pushes varied sources up.
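The MMR step above can be sketched as a greedy loop. The candidate tuples and tiny vectors below are illustrative; the server applies this to reranked chunks with real embeddings.

```python
import math

# Sketch of Maximal Marginal Relevance with lambda = 0.7, as described
# above. Candidate format (doc_id, relevance, embedding) is an assumption.
def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def mmr(candidates, k, lam=0.7):
    """candidates: (doc_id, relevance, embedding) tuples, best-first.
    Greedily trades relevance against similarity to already-picked docs."""
    selected, remaining = [], list(candidates)
    while remaining and len(selected) < k:
        def score(c):
            redundancy = max((cosine(c[2], s[2]) for s in selected),
                             default=0.0)
            return lam * c[1] - (1 - lam) * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return [doc_id for doc_id, _, _ in selected]
```

With two near-duplicate chunks from one document, the redundancy penalty lets a less similar but relevant chunk from another document take the second slot.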


What's New in v3.0.0

Ollama Removed — Zero External Dependencies

FastEmbed replaces Ollama entirely. Embeddings and reranking run in-process via ONNX Runtime. No server to start, no port to check, no process to manage. The embedding model downloads automatically on first run and is cached locally.

Cross-Encoder Reranking

After hybrid RRF fusion produces initial candidates, a cross-encoder (Xenova/ms-marco-MiniLM-L-6-v2) re-scores query-document pairs jointly. This dramatically improves precision for ambiguous queries where bi-encoder similarity alone is insufficient.
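The RRF fusion step that feeds the reranker can be sketched as below. The constant k=60 is the conventional RRF default, not a value documented for this server.

```python
# Sketch of reciprocal-rank fusion (RRF): merges the semantic and BM25
# rankings into one candidate list for the cross-encoder to re-score.
# k=60 is the common default from the RRF literature; the server's exact
# constant may differ.
def rrf_fuse(semantic_ranking, bm25_ranking, k=60):
    """Each argument is an ordered list of doc ids, best first."""
    scores = {}
    for ranking in (semantic_ranking, bm25_ranking):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

A document ranked well by both retrievers outscores one ranked first by only a single retriever, which is exactly why fusion helps before reranking.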

Markdown-Aware Chunking

.md files are now split by ## and ### header boundaries instead of fixed 1000-character windows. Each section becomes a semantically coherent chunk. Sections larger than chunk_size are sub-chunked with overlap. Non-markdown files still use the standard fixed-size chunker.
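A minimal sketch of that header-boundary split, assuming a regex-based approach; the function name, pattern, and sub-chunking policy are illustrative, not the server's exact implementation.

```python
import re

# Illustrative header-boundary chunker for .md files. Splits before every
# ## or ### heading; oversized sections fall back to fixed windows with
# overlap, mirroring the behavior described above.
def split_markdown(text, chunk_size=1000, overlap=100):
    # Zero-width split immediately before each ## or ### heading line.
    sections = [s.strip() for s in re.split(r"(?m)^(?=#{2,3} )", text)]
    chunks = []
    for section in sections:
        if not section:
            continue
        if len(section) <= chunk_size:
            chunks.append(section)
        else:
            # Sub-chunk oversized sections with overlap.
            step = chunk_size - overlap
            for i in range(0, len(section), step):
                chunks.append(section[i:i + chunk_size])
    return chunks
```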

Query Expansion

54 security-term synonym mappings expand abbreviated queries before BM25 search. Searching for "sqli" automatically includes "sql injection"; "privesc" includes "privilege escalation"; "pth" includes "pass-the-hash".
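The expansion step can be sketched as a simple synonym lookup before the BM25 query is issued. The table below is a hypothetical three-entry sample of the 54 mappings; the full map lives in the server.

```python
# Toy sketch of pre-BM25 query expansion; SYNONYMS here is a small
# illustrative sample, not the server's actual 54-entry table.
SYNONYMS = {
    "sqli": ["sql injection"],
    "privesc": ["privilege escalation"],
    "pth": ["pass-the-hash"],
}


def expand_query(query):
    terms = query.lower().split()
    expanded = list(terms)
    for term in terms:
        expanded.extend(SYNONYMS.get(term, []))
    return " ".join(expanded)
```

For example, expand_query("sqli in auth") yields "sqli in auth sql injection", so BM25 matches documents that spell the attack out in full.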

Tools (2)

reindex_documents: Manually triggers a re-indexing of the documents directory.
search_knowledge: Performs a hybrid search (semantic + BM25) across indexed documents with reranking.

Configuration

claude_desktop_config.json
{
  "mcpServers": {
    "knowledge-rag": {
      "command": "python",
      "args": ["-m", "knowledge_rag"]
    }
  }
}

Try it

Search my local documentation for the procedure on how to handle database migrations.
Find information in my notes regarding the new API authentication requirements.
What does our internal documentation say about the deployment process for the staging environment?
Search for 'sqli' in my project files to see if there are any relevant security notes.

Frequently Asked Questions

What are the key features of Knowledge RAG?

Hybrid search combining semantic embeddings and BM25 keyword matching. Cross-encoder reranking for improved precision on ambiguous queries. Markdown-aware chunking that respects header boundaries. Real-time file watching with auto-reindexing. Zero external dependencies using in-process ONNX runtime.

What can I use Knowledge RAG for?

Querying internal technical documentation and procedures directly from Claude Code. Searching through local project notes and PDFs without leaking data to the cloud. Maintaining an up-to-date knowledge base that automatically reflects file changes. Improving AI responses by providing context from local project-specific knowledge.

How do I install Knowledge RAG?

Install Knowledge RAG by running: pip install knowledge-rag

What MCP clients work with Knowledge RAG?

Knowledge RAG works with any MCP-compatible client including Claude Desktop, Claude Code, Cursor, and other editors with MCP support.

Turn this server into reusable context

Keep Knowledge RAG docs, env vars, and workflow notes in Conare so your agent carries them across sessions.

Need the old visual installer? Open Conare IDE.