Knowledge RAG

Local RAG system for Claude Code with hybrid search and cross-encoder reranking.
LLMs don't know your docs. Every conversation starts from zero.
Your notes, writeups, internal procedures, PDFs — none of it is visible to your AI assistant. Cloud RAG solutions leak your private data. Local ones require Docker, Ollama, and 15 minutes of setup before a single query.
Knowledge RAG fixes this. One pip install, zero external servers.
Your documents become instantly searchable inside Claude Code — with reranking precision that actually finds what you need.
clone → pip install → restart Claude Code → done.
12 MCP Tools | Hybrid Search + Cross-Encoder Reranking | Markdown-Aware Chunking | 100% Local, Zero Cloud
What's New | Installation | API Reference | Architecture
Breaking Changes (v2.x → v3.0)
v3.0 is a major release. If you are upgrading from v2.x, read this section first.
| Change | v2.x | v3.0 |
|---|---|---|
| Embedding engine | Ollama (external server) | FastEmbed (ONNX in-process) |
| Embedding model | nomic-embed-text (768D) | BAAI/bge-small-en-v1.5 (384D) |
| Embedding dimensions | 768 | 384 |
| Dependencies | ollama>=0.6.0 | fastembed>=0.4.0, requests, beautifulsoup4 |
| MCP tools | 6 tools | 12 tools |
| Default hybrid_alpha | 0.3 | 0.3 (unchanged) |
Migration Steps
```bash
# 1. Pull the latest code
git pull origin main

# 2. Activate your virtual environment
# Windows:
.\venv\Scripts\activate
# Linux/macOS:
source venv/bin/activate

# 3. Install new dependencies
pip install -r requirements.txt

# 4. Restart Claude Code; the server auto-detects the dimension mismatch
#    and triggers a nuclear rebuild on first startup (re-embeds everything)
```
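The "nuclear rebuild" decision can be sketched as a simple metadata check. This is an illustrative sketch, not the actual knowledge_rag internals; the names `needs_rebuild` and `embedding_dim` are hypothetical:

```python
EXPECTED_DIM = 384  # BAAI/bge-small-en-v1.5 output size (v2.x stored 768-dim vectors)

def needs_rebuild(index_meta: dict) -> bool:
    """True when the stored index was embedded at a different dimensionality."""
    return index_meta.get("embedding_dim") != EXPECTED_DIM

# A v2.x index (768-dim nomic-embed-text vectors) triggers a full re-embed:
print(needs_rebuild({"embedding_dim": 768}))  # True
print(needs_rebuild({"embedding_dim": 384}))  # False
```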
The first startup after upgrading will take longer than usual because:
- FastEmbed downloads the BAAI/bge-small-en-v1.5 model (~50MB, cached in ~/.cache/fastembed/)
- All documents are re-embedded with the new 384-dim model
- The cross-encoder reranker model is downloaded on first query (~25MB)
After the initial rebuild, startup and queries are faster than v2.x because there is no Ollama server dependency.
What's New in v3.1.0
Office Document Support (DOCX, XLSX, PPTX, CSV)
Nine formats are now supported: DOCX headings are preserved as markdown structure, Excel sheets are extracted as text tables, PowerPoint slides are extracted per slide, and CSV is parsed natively. All new formats integrate with markdown-aware chunking.
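As a minimal sketch of one extraction path: real DOCX/XLSX/PPTX handling would sit on format-specific libraries (python-docx, openpyxl, python-pptx), but the "CSV as text table" step can be shown with the standard library alone. This is an assumption about the rendering, not the project's exact output format:

```python
import csv
import io

def csv_to_text_table(raw: str) -> str:
    """Render CSV content as a pipe-delimited text table for indexing."""
    rows = csv.reader(io.StringIO(raw))
    return "\n".join(" | ".join(row) for row in rows)

print(csv_to_text_table("host,port\nlocalhost,8080"))
# host | port
# localhost | 8080
```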
File Watcher — Auto-Reindex on Changes
The documents directory is monitored in real time via watchdog. When you add, modify, or delete a file, the system auto-reindexes after a 5-second debounce. No manual reindex_documents call needed.
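The debounce behavior can be sketched as a timer that resets on every event, so a burst of file saves yields a single reindex. A minimal sketch, not the project's actual watcher code; wiring it to watchdog's `FileSystemEventHandler` is indicated in the comment:

```python
import threading

class Debouncer:
    """Coalesce bursts of events into one callback after `delay` quiet seconds."""

    def __init__(self, delay: float, callback):
        self.delay = delay
        self.callback = callback
        self._timer = None
        self._lock = threading.Lock()

    def trigger(self):
        """Restart the countdown; the callback fires only after a quiet period."""
        with self._lock:
            if self._timer is not None:
                self._timer.cancel()
            self._timer = threading.Timer(self.delay, self.callback)
            self._timer.start()

# In the real server this would be called from watchdog's
# FileSystemEventHandler.on_any_event with delay=5.0, so one batch of
# file changes produces one reindex rather than one per write.
```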
MMR Result Diversification
Maximal Marginal Relevance is applied after reranking to reduce redundant results, balancing relevance against diversity (lambda=0.7). If your top 5 results would otherwise all come from the same document, MMR promotes varied sources.
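The greedy MMR selection can be sketched as follows. This is an illustrative implementation, not the project's exact code; the default `lam=0.7` mirrors the lambda stated above:

```python
from math import sqrt

def _cos(a, b):
    """Cosine similarity between two plain-list vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(x * x for x in b))
    return dot / (na * nb + 1e-9)

def mmr(query_vec, doc_vecs, k=5, lam=0.7):
    """Greedily pick k docs maximizing lam*relevance - (1-lam)*similarity-to-picked."""
    relevance = [_cos(query_vec, d) for d in doc_vecs]
    selected = []
    remaining = list(range(len(doc_vecs)))
    while remaining and len(selected) < k:
        best = max(
            remaining,
            key=lambda i: lam * relevance[i]
            - (1 - lam) * max((_cos(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0),
        )
        selected.append(best)
        remaining.remove(best)
    return selected
```

With lambda near 1.0 the selection is pure relevance order; lowering lambda penalizes candidates similar to results already picked, which is what pushes near-duplicate chunks from the same document down the list.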
What's New in v3.0.0
Ollama Removed — Zero External Dependencies
FastEmbed replaces Ollama entirely. Embeddings and reranking run in-process via ONNX Runtime. No server to start, no port to check, no process to manage. The embedding model downloads automatically on first run and is cached locally.
Cross-Encoder Reranking
After hybrid RRF fusion produces initial candidates, a cross-encoder (Xenova/ms-marco-MiniLM-L-6-v2) re-scores query-document pairs jointly. This dramatically improves precision for ambiguous queries where bi-encoder similarity alone is insufficient.
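The RRF fusion step that produces the candidate list can be sketched in a few lines. The constant `k=60` is the conventional RRF default, not necessarily the project's setting, and the cross-encoder stage is only indicated in a comment since it needs the ONNX model:

```python
def rrf_fuse(rankings, k=60):
    """Reciprocal Rank Fusion: merge ranked lists of doc ids into one ranking."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

semantic = ["d3", "d1", "d2"]  # ranking from vector similarity
bm25 = ["d1", "d4", "d3"]      # ranking from keyword search
candidates = rrf_fuse([semantic, bm25])
print(candidates)  # ['d1', 'd3', 'd4', 'd2']

# The cross-encoder would then jointly score (query, doc_text) pairs for
# these candidates and re-sort by that score.
```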
Markdown-Aware Chunking
.md files are now split by ## and ### header boundaries instead of fixed 1000-character windows. Each section becomes a semantically coherent chunk. Sections larger than chunk_size are sub-chunked with overlap. Non-markdown files still use the standard fixed-size chunker.
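The header-boundary split plus overlap sub-chunking described above can be sketched like this. An illustrative version, not the project's chunker; the defaults mirror the 1000-character window mentioned above, while the overlap value is an assumption:

```python
import re

def chunk_markdown(text: str, chunk_size: int = 1000, overlap: int = 100):
    """Split markdown at ## / ### boundaries; sub-chunk oversized sections with overlap."""
    # Lookahead split keeps each header attached to its own section.
    sections = re.split(r"(?m)^(?=#{2,3} )", text)
    chunks = []
    for section in sections:
        section = section.strip()
        if not section:
            continue
        if len(section) <= chunk_size:
            chunks.append(section)          # one semantically coherent chunk
        else:
            step = chunk_size - overlap     # fixed-size windows with overlap
            for start in range(0, len(section), step):
                chunks.append(section[start:start + chunk_size])
    return chunks

doc = "intro\n## Setup\ninstall steps\n## Usage\nrun it"
print(chunk_markdown(doc))
# ['intro', '## Setup\ninstall steps', '## Usage\nrun it']
```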
Query Expansion
54 security-term synonym mappings expand abbreviated queries before BM25 search. Searching for "sqli" automatically includes "sql injection"; "privesc" includes "privilege escalation"; "pth" includes "pass-the-hash".
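The expansion is essentially a dictionary lookup appended to the original query. A minimal sketch with an illustrative subset of the mappings (the full 54-entry table lives in the project):

```python
SYNONYMS = {
    # illustrative subset; the project ships 54 mappings
    "sqli": ["sql injection"],
    "privesc": ["privilege escalation"],
}

def expand_query(query: str) -> str:
    """Append known synonyms so BM25 matches both the abbreviation and the phrase."""
    terms = query.lower().split()
    expanded = list(terms)
    for term in terms:
        expanded.extend(SYNONYMS.get(term, []))
    return " ".join(expanded)

print(expand_query("sqli in login form"))
# sqli in login form sql injection
```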
Tools (2)

- reindex_documents: Manually triggers a re-indexing of the documents directory.
- search_knowledge: Performs a hybrid search (semantic + BM25) across indexed documents with reranking.

Configuration
```json
{
  "mcpServers": {
    "knowledge-rag": {
      "command": "python",
      "args": ["-m", "knowledge_rag"]
    }
  }
}
```