# EyeLevel RAG MCP Server

A local Retrieval-Augmented Generation (RAG) system implemented as an MCP (Model Context Protocol) server. This server allows you to ingest markdown files into a local knowledge base and perform semantic search to retrieve relevant context for LLM queries.
## Features
- Local RAG Implementation: No external dependencies or paid services required
- Markdown File Support: Ingest and search through `.md` files
- Semantic Search: Uses sentence transformers for embedding-based similarity search
- Persistent Storage: Automatically saves and loads the vector index using FAISS
- Chunk Management: Intelligently splits documents into searchable chunks
- Multiple Documents: Support for ingesting and searching across multiple markdown files
## Installation

- Clone this repository
- Install dependencies using uv:

  ```bash
  uv sync
  ```
## Dependencies

- `sentence-transformers`: For creating text embeddings
- `faiss-cpu`: For efficient vector similarity search
- `numpy`: For numerical operations
- `mcp[cli]`: For the MCP server framework
## Available Tools

### 1. `search_doc_for_rag_context(query: str)`
Searches the knowledge base for relevant context based on a user query.
Parameters:
- `query` (str): The search query
Returns:
- Relevant text chunks with relevance scores
### 2. `ingest_markdown_file(local_file_path: str)`
Ingests a markdown file into the knowledge base.
Parameters:
- `local_file_path` (str): Path to the markdown file to ingest
Returns:
- Status message indicating success or failure
### 3. `list_indexed_documents()`
Lists all documents currently in the knowledge base.
Returns:
- Summary of indexed files and chunk counts
### 4. `clear_knowledge_base()`
Clears all documents from the knowledge base.
Returns:
- Confirmation message
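These tools are served through the MCP Python SDK (`mcp[cli]`). As a rough sketch of how tools like these can be registered with its FastMCP helper (the placeholder tool bodies below are illustrative, not the actual logic in `main.py`):

```python
# Sketch only: registering MCP tools with FastMCP from the mcp[cli] package.
# The tool bodies are placeholders; main.py contains the real RAG logic.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("eyelevel-rag")

@mcp.tool()
def search_doc_for_rag_context(query: str) -> str:
    """Search the knowledge base for chunks relevant to the query."""
    # Placeholder: the real implementation embeds the query and searches FAISS.
    return f"(no index loaded) nothing found for: {query}"

@mcp.tool()
def ingest_markdown_file(local_file_path: str) -> str:
    """Ingest a markdown file into the knowledge base."""
    # Placeholder: the real implementation chunks, embeds, and indexes the file.
    return f"ingested {local_file_path}"

if __name__ == "__main__":
    mcp.run()  # serves the tools over stdio
```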
## Usage

1. Start the server:

   ```bash
   python main.py
   ```

2. Ingest markdown files: Use the `ingest_markdown_file` tool to add your `.md` files to the knowledge base.
3. Search for context: Use the `search_doc_for_rag_context` tool to find relevant information for your queries.
## How It Works

- Document Processing: Markdown files are split into chunks based on paragraphs and sentence boundaries
- Embedding Creation: Text chunks are converted to embeddings using the `all-MiniLM-L6-v2` model
- Vector Storage: Embeddings are stored in a FAISS index for fast similarity search
- Retrieval: User queries are embedded and matched against the stored vectors to find relevant content
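To make this pipeline concrete, here is a small self-contained sketch of the chunk -> embed -> index -> search flow using the libraries listed under Dependencies. It is not the code from `main.py`; the paragraph-only splitting and the chunk size are simplifying assumptions:

```python
# Minimal sketch of the chunk -> embed -> index -> search flow described above.
# Illustration only, not the main.py implementation.
import faiss
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

def chunk_markdown(text: str, max_chars: int = 1000) -> list[str]:
    """Split on blank lines, then pack paragraphs into chunks of ~max_chars."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks: list[str] = []
    current = ""
    for para in paragraphs:
        if current and len(current) + len(para) > max_chars:
            chunks.append(current)
            current = ""
        current = f"{current}\n\n{para}".strip()
    if current:
        chunks.append(current)
    return chunks

# Index a toy document (small max_chars so it splits into more than one chunk).
doc = "# Setup\n\nRun `uv sync` to install dependencies.\n\n# Search\n\nQueries are embedded and matched against stored chunks."
chunks = chunk_markdown(doc, max_chars=60)
embeddings = model.encode(chunks).astype("float32")
index = faiss.IndexFlatL2(embeddings.shape[1])
index.add(embeddings)

# Retrieval: embed the query and return the closest chunks.
query_vec = model.encode(["How do I install the dependencies?"]).astype("float32")
top_k = min(3, index.ntotal)
distances, ids = index.search(query_vec, top_k)
for rank, chunk_id in enumerate(ids[0]):
    print(f"#{rank + 1} (distance {distances[0][rank]:.3f}): {chunks[chunk_id][:60]}")
```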
## File Structure

- `main.py`: Main server implementation with RAG functionality
- `pyproject.toml`: Project dependencies and configuration
- `rag_index.faiss`: FAISS vector index (created automatically)
- `rag_documents.pkl`: Serialized documents and metadata (created automatically)
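The two generated files are what make the index persistent between sessions. A rough sketch of how they might be written and reloaded with FAISS and pickle (the exact layout of the pickled metadata is an assumption, not taken from `main.py`):

```python
# Sketch: persisting and reloading the vector index and document metadata.
# The structure of the pickled payload is an assumption about what main.py stores.
import os
import pickle

import faiss

INDEX_PATH = "rag_index.faiss"
DOCS_PATH = "rag_documents.pkl"

def save_state(index: "faiss.Index", chunks: list[str], sources: list[str]) -> None:
    """Write the FAISS index and chunk/source metadata to disk."""
    faiss.write_index(index, INDEX_PATH)
    with open(DOCS_PATH, "wb") as f:
        pickle.dump({"chunks": chunks, "sources": sources}, f)

def load_state():
    """Reload a previously saved index, or return an empty state."""
    if not (os.path.exists(INDEX_PATH) and os.path.exists(DOCS_PATH)):
        return None, [], []
    index = faiss.read_index(INDEX_PATH)
    with open(DOCS_PATH, "rb") as f:
        data = pickle.load(f)
    return index, data["chunks"], data["sources"]
```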
## Configuration

The RAG system uses the `all-MiniLM-L6-v2` sentence transformer model by default. This model provides a good balance between speed and quality for semantic search tasks.
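Other sentence-transformers models can be swapped in if you want a different speed/quality trade-off. A hedged sketch (whether `main.py` exposes the model name as an editable constant is an assumption; both names below are real sentence-transformers checkpoints):

```python
# Sketch: choosing the embedding model. How main.py stores this choice is an
# assumption; both names are real sentence-transformers checkpoints.
from sentence_transformers import SentenceTransformer

# Default: small and fast, 384-dimensional embeddings.
model = SentenceTransformer("all-MiniLM-L6-v2")

# Higher-quality but slower alternative (768-dimensional embeddings):
# model = SentenceTransformer("all-mpnet-base-v2")
```

Note that changing the model changes the embedding dimension, so the existing FAISS index would need to be cleared (for example with `clear_knowledge_base`) and the documents re-ingested.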
Example Workflow
- Prepare your markdown files with the content you want to search
- Use
ingest_markdown_fileto add each file to the knowledge base - Use
search_doc_for_rag_contextto find relevant context for your questions - The retrieved context can be used by an LLM to provide informed answers
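If you want to drive this workflow programmatically rather than through an LLM client, the MCP Python SDK can call the same tools over stdio. A minimal sketch (`notes.md` and the server path are placeholders):

```python
# Sketch: calling the server's tools from a Python MCP client over stdio.
# "notes.md" and "/path/to/main.py" are placeholders.
import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def main() -> None:
    params = StdioServerParameters(command="python", args=["/path/to/main.py"])
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            # Add a file to the knowledge base, then query it.
            await session.call_tool("ingest_markdown_file",
                                    {"local_file_path": "notes.md"})
            result = await session.call_tool("search_doc_for_rag_context",
                                             {"query": "How is chunking done?"})
            print(result.content)

asyncio.run(main())
```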
## Notes
- The first time you run the server, it will download the sentence transformer model
- The vector index is automatically saved and loaded between sessions
- Long documents are automatically chunked to optimize search performance
- The system supports multiple markdown files and maintains source file metadata
## MCP Client Configuration

Add the server to your MCP client configuration:
```json
{
  "mcpServers": {
    "eyelevel-rag": {
      "command": "python",
      "args": ["/path/to/main.py"]
    }
  }
}
```