A local RAG-powered documentation search system using vector embeddings
local_lense
A production-ready RAG (Retrieval-Augmented Generation) system that enables semantic search across local documentation using vector embeddings and similarity search. Built with TypeScript, this tool demonstrates modern AI integration patterns including vector databases, embedding generation, and MCP (Model Context Protocol) tooling.
Perfect for: Engineering teams needing intelligent documentation search, knowledge bases, or RAG system implementations.
What is local_lense?
local_lense is a RAG (Retrieval-Augmented Generation) powered documentation search tool that:
- Indexes your local documentation - Processes markdown, HTML, JSON, YAML, and text files to create a searchable vector index
- Semantic search - Uses vector embeddings to find relevant content based on meaning, not just keywords
- Cursor integration - Exposes search capabilities via MCP so Cursor AI can search your docs
- Fast and local - Everything runs locally with Qdrant vector database
- Extensible - Supports custom source processors for indexing content from web, databases, or other sources
How it works
local_lense uses a RAG (Retrieval-Augmented Generation) architecture:
Indexing Phase:
- Scans your configured documentation directory
- Splits documents into chunks
- Generates vector embeddings using transformer models
- Stores embeddings in Qdrant vector database
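The chunking step above can be sketched as follows. This is a minimal illustration that assumes fixed-size character windows with a small overlap. The function name `splitIntoChunks` and the `chunkSize` and `overlap` parameters are hypothetical; this README does not describe the strategy local_lense actually uses, which may split on headings or tokens instead.

```typescript
// Hypothetical helper (not from the local_lense source): split text into
// overlapping fixed-size character windows.
function splitIntoChunks(text: string, chunkSize = 500, overlap = 50): string[] {
  if (chunkSize <= 0) throw new Error("chunkSize must be positive");
  if (overlap < 0 || overlap >= chunkSize) {
    throw new Error("overlap must be in [0, chunkSize)");
  }
  const chunks: string[] = [];
  const step = chunkSize - overlap; // how far each window advances
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    // Stop once the current window has reached the end of the text.
    if (start + chunkSize >= text.length) break;
  }
  return chunks;
}
```

Overlap keeps a sentence that straddles a chunk boundary retrievable from at least one of the two neighbouring chunks, at the cost of storing some text twice.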
Search Phase:
- Takes a natural language query
- Generates an embedding for the query
- Searches Qdrant for similar document chunks
- Returns relevant sections with relevance scores
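To make the search phase concrete, here is a brute-force sketch of similarity ranking. It assumes cosine similarity as the metric and keeps all vectors in memory; in local_lense the ranking is delegated to Qdrant, so `IndexedChunk`, `SearchHit`, `cosineSimilarity`, and `searchChunks` are illustrative names only.

```typescript
// Illustrative types; these are not the project's actual data structures.
interface IndexedChunk {
  path: string;
  text: string;
  vector: number[];
}

interface SearchHit {
  path: string;
  text: string;
  score: number;
}

// Cosine similarity: dot product divided by the product of the vector norms.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("vector dimensions differ");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom; // treat zero vectors as unrelated
}

// Score every chunk against the query vector and keep the best `limit` hits.
function searchChunks(queryVector: number[], chunks: IndexedChunk[], limit = 3): SearchHit[] {
  return chunks
    .map((c) => ({ path: c.path, text: c.text, score: cosineSimilarity(queryVector, c.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, limit);
}
```

The `limit` parameter plays the role of `searchResultLimit` from configs.json. A vector database avoids the linear scan shown here by using an approximate nearest-neighbour index.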
Refresh Mechanism:
- Uses a single "docs" collection that is dropped and re-indexed on initialization
- Simple and straightforward approach for reliable indexing
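The drop-and-re-index behaviour can be sketched against a small in-memory stand-in for the vector store. The `VectorStore` interface, `InMemoryStore` class, and `refreshIndex` function are hypothetical; a real implementation would wrap a Qdrant client, whose API is not reproduced here.

```typescript
type Point = { id: number; vector: number[] };

// Hypothetical minimal store interface, used only for this illustration.
interface VectorStore {
  hasCollection(name: string): boolean;
  dropCollection(name: string): void;
  createCollection(name: string): void;
  upsert(name: string, points: Point[]): void;
}

// In-memory stand-in so the sketch runs without a database.
class InMemoryStore implements VectorStore {
  readonly collections = new Map<string, Point[]>();
  hasCollection(name: string): boolean {
    return this.collections.has(name);
  }
  dropCollection(name: string): void {
    this.collections.delete(name);
  }
  createCollection(name: string): void {
    this.collections.set(name, []);
  }
  upsert(name: string, points: Point[]): void {
    const existing = this.collections.get(name);
    if (!existing) throw new Error(`collection ${name} does not exist`);
    existing.push(...points);
  }
}

// Drop the collection if present, recreate it, then index everything again.
function refreshIndex(store: VectorStore, points: Point[], name = "docs"): void {
  if (store.hasCollection(name)) store.dropCollection(name);
  store.createCollection(name);
  store.upsert(name, points);
}
```

Rebuilding from scratch means stale chunks from deleted or renamed files never linger in the index; the trade-off is that every initialization pays the full embedding cost.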
MCP Integration (Future):
- Exposes search as MCP tools
- Cursor AI can query your docs directly
- Seamless integration with your workflow
Prerequisites
- Node.js (v18 or higher)
- Docker and Docker Compose (for Qdrant vector database)
- TypeScript (installed as dev dependency)
Quick Start
1. Clone the repository
git clone <repository-url>
cd local_lense
2. Install dependencies
npm install
3. Start Qdrant vector database
docker-compose up -d
This starts a Qdrant container on localhost:6333. The data persists in a Docker volume.
4. Configure your documentation path
Edit configs.json:
{
"sourcePath": "~/Documents/my-docs",
"searchResultLimit": 3
}
- sourcePath: Path to your documentation directory (~ is accepted for the home directory, but absolute paths are more reliable; see Configuration)
- searchResultLimit: Maximum number of search results to return
5. Build the project
npm run build
6. Run indexing and search
Currently, the tool runs as a test script. Edit src/main.ts to configure your search query, then:
npm run dev
Configuration
configs.json
The main user configuration file located in the project root:
sourcePath (string, required): Path to your documentation directory
- Important: Use full absolute paths and avoid ~ (tilde) for home directory expansion
- Example: Use "/Users/username/Documents/my-docs" instead of "~/Documents/my-docs"
- Full paths ensure reliable operation across different contexts and environments
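One reason tilde paths are fragile is that `~` is expanded by the shell, not by Node.js file APIs, so a literal `~` in a JSON file has to be resolved by the application itself. The sketch below shows one way to do that; `expandTilde` is a hypothetical helper and is not claimed to match how local_lense handles the path. The home directory is passed in explicitly (in Node.js it could come from `os.homedir()`).

```typescript
// Hypothetical helper: turn a "~"-prefixed path into an absolute one.
// Only "~" and "~/..." are handled; "~user/..." forms are returned unchanged.
function expandTilde(sourcePath: string, homeDir: string): string {
  if (sourcePath === "~") return homeDir;
  if (sourcePath.startsWith("~/")) {
    // Strip trailing slashes from the home directory before joining.
    return homeDir.replace(/\/+$/, "") + sourcePath.slice(1);
  }
  return sourcePath;
}
```

For example, `expandTilde("~/Documents/my-docs", "/Users/username")` yields `"/Users/username/Documents/my-docs"`, the same absolute form recommended above.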
searchResultLimit (number, optional): Maximum number of results per search
- Default: 3
keywordBoost (boolean, optional): Enable keyword-based score boosting to improve relevance with local embedding models
- Boosts scores when query keywords appear in document content or file paths
- Default: true
keywordBoostWeight (number, optional): Controls the strength of keyword boosting (0.0 to 1.0)
- Higher values increase the boost effect
- Default: 0.2 (20% boost weight)
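To illustrate how keywordBoost and keywordBoostWeight might interact, the sketch below scales a vector score by the fraction of query keywords found in the content or file path. The formula and the name `applyKeywordBoost` are assumptions made for illustration; the README does not specify the actual boosting formula.

```typescript
// Hypothetical boost: score * (1 + weight * fraction of query keywords matched).
// With weight 0.2, a chunk matching every keyword gains at most 20%.
function applyKeywordBoost(
  score: number,
  query: string,
  content: string,
  filePath: string,
  weight = 0.2,
): number {
  const keywords = query
    .toLowerCase()
    .split(/\W+/)
    .filter((w) => w.length > 0);
  if (keywords.length === 0) return score;
  // Match case-insensitively against both the chunk text and its path.
  const haystack = (content + " " + filePath).toLowerCase();
  const matched = keywords.filter((k) => haystack.includes(k)).length;
  return score * (1 + weight * (matched / keywords.length));
}
```

Under this assumed formula a weight of 0 disables the boost entirely, which has the same effect as setting keywordBoost to false.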
Note: Collection management is handled automatically by the system. The system uses a single "docs" collection that is always dropped and re-indexed on initialization.
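The options and defaults listed above can be summarised as a small parsing sketch. The `LocalLenseConfig` interface and `parseConfig` function are hypothetical names; only the field names, the required status of sourcePath, and the default values come from this README.

```typescript
// Shape of configs.json as documented above (interface name is illustrative).
interface LocalLenseConfig {
  sourcePath: string;
  searchResultLimit: number;
  keywordBoost: boolean;
  keywordBoostWeight: number;
}

// Parse a configs.json string, apply documented defaults, and validate ranges.
function parseConfig(json: string): LocalLenseConfig {
  const raw = JSON.parse(json) as Partial<LocalLenseConfig>;
  if (typeof raw.sourcePath !== "string" || raw.sourcePath.length === 0) {
    throw new Error("configs.json: sourcePath is required");
  }
  const weight = raw.keywordBoostWeight ?? 0.2;
  if (weight < 0 || weight > 1) {
    throw new Error("configs.json: keywordBoostWeight must be between 0.0 and 1.0");
  }
  return {
    sourcePath: raw.sourcePath,
    searchResultLimit: raw.searchResultLimit ?? 3,
    keywordBoost: raw.keywordBoost ?? true,
    keywordBoostWeight: weight,
  };
}
```

Using `??` rather than `||` keeps explicit values such as `keywordBoost: false` or `keywordBoostWeight: 0` from being overwritten by the defaults.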
Docker Compose
The docker-compose.yaml file configures Qdrant:
- Port: 6333 (Qdrant HTTP API)
- Storage: Persistent volume qdrant_storage
- Health checks: Automatic container health monitoring
Supported File Types
The default FileSourceProcessor (see `src/ragIndexer/implementations/fileSourceProcessor.ts`) supports the following file types:
Fully Supported
- Markdown: .md, .markdown
- HTML: .html, .htm
- JSON: .json
- YAML: .yaml, .yml
- Text: .txt, .text
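The extension list above amounts to a lookup table. The sketch below shows one way to express it; `DocKind`, `EXTENSION_KINDS`, and `classifyFile` are hypothetical names, the `"other"` label is a placeholder rather than the project's `ContentType` enum, and case-insensitive matching is an assumption.

```typescript
// Illustrative labels only; these are not the project's ContentType values.
type DocKind = "markdown" | "html" | "json" | "yaml" | "text" | "other";

const EXTENSION_KINDS: Record<string, DocKind> = {
  ".md": "markdown",
  ".markdown": "markdown",
  ".html": "html",
  ".htm": "html",
  ".json": "json",
  ".yaml": "yaml",
  ".yml": "yaml",
  ".txt": "text",
  ".text": "text",
};

// Classify a file by its extension, falling back to "other".
function classifyFile(fileName: string): DocKind {
  const dot = fileName.lastIndexOf(".");
  // No dot at all, or a name that starts with its only dot (e.g. ".gitignore").
  if (dot <= 0) return "other";
  const ext = fileName.slice(dot).toLowerCase();
  return EXTENSION_KINDS[ext] ?? "other";
}
```

Keeping the mapping in one table means adding a format is a one-line change, which fits the extensible source-processor design described earlier.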
Other Files
Files with unsupported extensions are processed as `ContentType.OT
Environment Variables
- sourcePath (required): Path to your documentation directory
- searchResultLimit: Maximum number of results per search
- keywordBoost: Enable keyword-based score boosting
- keywordBoostWeight: Strength of keyword boosting (0.0 to 1.0)
Configuration
{"sourcePath": "/Users/username/Documents/my-docs", "searchResultLimit": 3}