A generalized RAG system with hybrid search capabilities for any documents.
Hybrid RAG Project
A generalized Retrieval-Augmented Generation (RAG) system with hybrid search capabilities that works with any documents you provide. It combines semantic (dense vector) search and keyword (sparse BM25) search to improve document retrieval, and provides an MCP server API for easy integration.
Key Features: Multi-format support • Local LLM • Claude Desktop integration • Structured data queries • Document-type-aware retrieval
Quick Start (No MCP Required!)
You don't need Claude Desktop or MCP to use this project! Just run:
# 1. Make sure Ollama is running (e.g. in a separate terminal)
ollama serve
# 2. Activate virtual environment
source .venv/bin/activate
# 3. Start conversational demo (recommended)
python scripts/demos/conversational.py
# Or use the shortcut
./scripts/bin/ask.sh
That's it! Ask questions about the 43,835 document chunks in the sample dataset.
See the Quick Start Guide for complete usage instructions. Browse all documentation in the docs/ folder, or start with docs/README.md.
Overview
This project implements a hybrid RAG system that combines:
- Semantic Search: Dense vector embeddings for understanding meaning and context
- Keyword Search: BM25 sparse retrieval for exact keyword matching
- Hybrid Fusion: Reciprocal Rank Fusion (RRF) to combine results from both methods
- MCP Server: Both a REST API and a Model Context Protocol server for Claude integration
- Multi-format Support: Automatically loads documents from various file formats
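The hybrid approach improves retrieval accuracy by combining the strengths of both methods: semantic search matches paraphrases and related concepts, while BM25 matches exact terms such as names, codes, and identifiers that embeddings may not capture precisely.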
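As a rough illustration of the fusion step, the sketch below implements plain Reciprocal Rank Fusion over ranked lists of document IDs: each document scores the sum of 1 / (k + rank) across every list it appears in. The constant k=60 and the document IDs are assumptions for the example; the project's ensemble retriever may weight the two retrievers or use a different constant.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse several ranked lists of document IDs with Reciprocal Rank Fusion.

    Each document scores sum(1 / (k + rank)) over every list it appears in,
    where rank is 1-based. k=60 is an assumed default, not a project setting.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Hypothetical results from the vector and BM25 retrievers for one query.
    vector_hits = ["doc_a", "doc_b", "doc_c"]
    bm25_hits = ["doc_c", "doc_a", "doc_d"]
    for doc_id, score in reciprocal_rank_fusion([vector_hits, bm25_hits]):
        print(f"{doc_id}: {score:.5f}")
```

Here doc_a ranks first because it sits near the top of both lists, and doc_c beats doc_b because appearing in both lists outweighs a single second-place finish. Because RRF uses only ranks, it needs no calibration between cosine similarities and BM25 scores.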
Features
- Vector-based semantic search using Chroma and Ollama embeddings
- BM25 keyword search for exact term matching
- Ensemble retriever with Reciprocal Rank Fusion (RRF)
- Integration with local Ollama LLM for answer generation
- Support for multiple document formats (TXT, PDF, MD, DOCX, CSV)
- Automated document loading from data directory
- RESTful API server with /ingest and /query endpoints
- Model Context Protocol (MCP) server for Claude Desktop/API integration
- Configuration-driven architecture (no hardcoded values)
- Persistent vector store for faster subsequent queries
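For the keyword side, here is a self-contained Okapi BM25 scorer written from scratch to show what "exact term matching" means in practice. The whitespace tokenizer, the k1=1.5 and b=0.75 defaults, and the Lucene-style non-negative IDF are assumptions; the project presumably relies on a library BM25 retriever rather than code like this.

```python
import math
from collections import Counter


def bm25_scores(query, documents, k1=1.5, b=0.75):
    """Score every document against the query with Okapi BM25.

    Assumptions for illustration: lowercase whitespace tokenization, common
    default parameters, and Lucene's non-negative IDF variant.
    """
    if not documents:
        return []
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))

    scores = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            tf = term_freq[term]
            if tf == 0:
                continue  # BM25 only rewards exact token matches
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            # Term-frequency saturation (k1) and document-length normalization (b).
            length_norm = 1 - b + b * len(tokens) / avg_len
            score += idf * tf * (k1 + 1) / (tf + k1 * length_norm)
        scores.append(score)
    return scores


if __name__ == "__main__":
    docs = [
        "error E1234 when starting the pump",
        "the pump starts slowly in cold weather",
        "maintenance schedule for valves",
    ]
    print(bm25_scores("E1234 pump", docs))
```

With these sample documents the first one scores highest because it alone contains the exact token e1234 (after lowercasing), the rarer and therefore higher-IDF query term; the third document shares no query token and scores zero.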
Architecture
User Documents → data/ directory
        ↓
Document Loader
        ↓
Query → Hybrid Retriever → [Vector Retriever + BM25 Retriever]
        ↓ RRF Fusion
        ↓ Retrieved Context
        ↓ LLM (Ollama)
        ↓ Final Answer
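The last two stages of the diagram (Retrieved Context → LLM → Final Answer) can be sketched against Ollama's /api/generate endpoint on its default local port, 11434. The prompt template, the function names, and the timeout are hypothetical choices for illustration; the project's own prompt and client code may differ.

```python
import json
import urllib.request

# Ollama's default local generate endpoint.
OLLAMA_GENERATE_URL = "http://localhost:11434/api/generate"


def build_prompt(question, chunks):
    """Number the retrieved chunks and wrap them in a grounding instruction.

    The template wording is hypothetical, not the project's actual prompt.
    """
    context = "\n\n".join(f"[{i}] {chunk}" for i, chunk in enumerate(chunks, start=1))
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


def generate_answer(question, chunks, model="llama3.1:latest"):
    """Send one non-streaming request to a local Ollama server and return the text."""
    body = json.dumps(
        {"model": model, "prompt": build_prompt(question, chunks), "stream": False}
    ).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_GENERATE_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=300) as response:
        # With "stream": false, Ollama replies with a single JSON object whose
        # "response" field holds the generated answer.
        return json.loads(response.read())["response"]
```

Calling generate_answer(question, fused_chunks) requires Ollama to be running with the model pulled; fused_chunks would be the text of the top results from the RRF step. Numbering the chunks makes it easy to ask the model to cite which context passage it used.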
Prerequisites
- Python 3.9+
- Ollama installed and running locally
- Required Ollama models:
  - llama3.1:latest (or another LLM model)
  - nomic-embed-text (or another embedding model)
Installing Ollama
Visit ollama.ai to download and install Ollama for your platform.
After installation, pull the required models:
ollama pull llama3.1:latest
ollama pull nomic-embed-text
Verify Ollama is running:
curl http://localhost:11434/api/tags
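The curl check above only confirms the server is up. A small Python sketch can also confirm that both required models are pulled by reading the same /api/tags response. The helper names are hypothetical; the sketch assumes Ollama reports installed models with an explicit tag (such as "nomic-embed-text:latest").

```python
import json
import urllib.request

REQUIRED_MODELS = ["llama3.1:latest", "nomic-embed-text"]


def missing_models(tags, required):
    """Return the required models that are absent from an /api/tags response.

    An untagged name such as "nomic-embed-text" is compared as ":latest",
    since installed models are listed with an explicit tag.
    """
    installed = {model.get("name") for model in tags.get("models", [])}
    missing = []
    for name in required:
        full_name = name if ":" in name else f"{name}:latest"
        if full_name not in installed:
            missing.append(name)
    return missing


def fetch_tags(base_url="http://localhost:11434"):
    """Fetch the list of locally installed models from Ollama."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as response:
        return json.loads(response.read())


if __name__ == "__main__":
    try:
        absent = missing_models(fetch_tags(), REQUIRED_MODELS)
    except OSError as exc:  # URLError and timeouts are both OSError subclasses
        print(f"Ollama is not reachable: {exc}")
    else:
        print("All models present." if not absent else f"Missing: {', '.join(absent)}")
```

Run ollama pull for any model the script reports as missing.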
Installation
- Clone the repository:
git clone <your-repo-url>
cd hybrid-rag-project
- Create a virtual environment:
python -m venv .venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
- Install dependencies:
pip install -r requirements.txt
Project Structure
hybrid-rag-project/
├── src/
│   └── hybrid_rag/                 # Core application package
│       ├── __init__.py             # Package initialization
│       ├── document_loader.py      # Document loading utility
│       ├── structured_query.py     # CSV query engine
│       └── utils.py                # Logging and utility functions
├── scripts/
│   ├── run_demo.py                 # Main demonstration script
│   ├── mcp_server.py               # REST API server
│   └── mcp_server_claude.py        # MCP server for Claude integration
├── config/
│   ├── config.yaml                 # Configuration file
│   └── claude_desktop_config.json  # Sample Claude Desktop MCP config
├── docs/
│   ├── INSTALLATION.md             # Detailed installation guide
│   ├── STRUCTURED_QUERIES.md       # CSV query documentation
│   ├── ASYNC_INGESTION.md          # Async ingestion guide
│   └── SHUTDOWN.md                 # Shutdown handling guide
├── data/                           # Sample data files (13 files included)
│   ├── *.csv                       # 7 CSV files (structured data)
│   ├── *.md                        # 5 Markdown files (unstructured)
│   └── *.txt                       # 1 Text file (technical specs)
└── chroma_db/
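To make the "automated document loading from data directory" step concrete, here is a hypothetical sketch of the discovery half of a loader: it walks data/ recursively and groups files by the supported extensions (TXT, PDF, MD, DOCX, CSV), skipping everything else. The function names are illustrative; the real document_loader.py may scan differently and parses each format with its own loader.

```python
from collections import defaultdict
from pathlib import Path

# Extensions from the supported-format list (TXT, PDF, MD, DOCX, CSV).
SUPPORTED_EXTENSIONS = {".txt", ".pdf", ".md", ".docx", ".csv"}


def group_by_format(paths):
    """Group paths by lowercase extension, dropping unsupported formats."""
    grouped = defaultdict(list)
    for path in paths:
        suffix = path.suffix.lower()
        if suffix in SUPPORTED_EXTENSIONS:
            grouped[suffix].append(path)
    return dict(grouped)


def discover_documents(data_dir="data"):
    """Recursively list regular files under data_dir, grouped by format."""
    files = sorted(p for p in Path(data_dir).rglob("*") if p.is_file())
    return group_by_format(files)


if __name__ == "__main__":
    for suffix, paths in discover_documents().items():
        print(f"{suffix}: {len(paths)} file(s)")
```

Keeping the grouping separate from the filesystem walk means each format can be routed to its own parser; CSV files in particular can then go to the structured query engine as well as the text index.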
MCP Tools
- query: Performs a hybrid search across the document store using both semantic and keyword matching.
- ingest: Ingests new documents from the data directory into the vector store.
Configuration
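Add the server to Claude Desktop's claude_desktop_config.json (a sample is provided in config/claude_desktop_config.json), replacing /path/to with the location of your clone:
{
  "mcpServers": {
    "hybrid-rag": {
      "command": "python",
      "args": ["/path/to/hybrid-rag-project/scripts/mcp_server_claude.py"]
    }
  }
}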
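Because Claude Desktop launches the server itself, a bare "python" may resolve to an interpreter that lacks this project's dependencies. One way around that, sketched here as an assumption rather than the project's documented setup, is to generate the entry with absolute paths and point "command" at the .venv interpreter created during installation. The helper name claude_desktop_entry is hypothetical.

```python
import json
from pathlib import Path


def claude_desktop_entry(project_dir):
    """Build an "mcpServers" block with absolute paths for Claude Desktop.

    Assumption: using the project's .venv interpreter instead of a bare
    "python" so the server's dependencies are importable.
    """
    project = Path(project_dir).resolve()
    # POSIX layout; on Windows the interpreter is .venv\Scripts\python.exe.
    python = project / ".venv" / "bin" / "python"
    server = project / "scripts" / "mcp_server_claude.py"
    return {
        "mcpServers": {
            "hybrid-rag": {"command": str(python), "args": [str(server)]}
        }
    }


if __name__ == "__main__":
    # Run from the project root and paste the output into the config file.
    print(json.dumps(claude_desktop_entry("."), indent=2))
```

Restart Claude Desktop after editing its configuration so it picks up the new server entry.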