Srclight
Deep code indexing for AI agents. SQLite FTS5 + tree-sitter + embeddings + MCP.
Srclight builds a rich, searchable index of your codebase that AI coding agents can query instantly — replacing dozens of grep/glob calls with precise, structured lookups. It is the most comprehensive code intelligence MCP server available: 29 tools covering symbol search, relationship graphs, git change intelligence, semantic search, build system awareness, and document extraction — capabilities no other single MCP server combines. Fully local and private: your code never leaves your machine.
Why?
AI coding agents (Claude Code, Cursor, etc.) spend 40-60% of their tokens on orientation — searching for files, reading code to understand structure, hunting for callers and callees. Srclight eliminates this waste.
| Without Srclight | With Srclight |
|---|---|
| 8-12 grep rounds to find callers | get_callers("lookup") — one call |
| Read 5 files to understand module | codebase_map() — instant overview |
| "Find code that does X" → 20 greps | semantic_search("dictionary lookup") — one call |
| 15-25 tool calls per bug fix | 5-8 tool calls per bug fix |
Features
- Minimal dependencies — single SQLite file per repo, no Docker/Redis/vector DB
- Fully offline — no API calls, works air-gapped (Ollama local embeddings)
- Incremental — only re-indexes changed files (content hash detection)
- 12 languages — Python, C, C++, C#, JavaScript, TypeScript, PHP, Dart, Swift, Kotlin, Java, Go
- 10 document formats — PDF, DOCX, XLSX, HTML, CSV/TSV, email (.eml), images (PNG/JPG/SVG/etc.), plain text, RST, Markdown
- Optional OCR — PaddleOCR for scanned/image-only PDF pages; pytesseract for images
- 4 search modes — symbol names, source code (trigram), documentation (stemmed), semantic (embeddings)
- Hybrid search — Reciprocal Rank Fusion (RRF) merges keyword and semantic result rankings into a single list
- Multi-repo workspaces — search across all your repos simultaneously via SQLite ATTACH+UNION
- MCP server — works with Claude Code, Cursor, and any MCP client
- CLI — index, search, and inspect from the terminal
- Auto-reindex — git post-commit/post-checkout hooks keep indexes fresh
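Incremental indexing by content hash means a file is re-parsed only when its bytes differ from the last run, regardless of timestamps. A minimal sketch of that check, assuming SHA-256 digests kept in an in-memory dict (`files_to_reindex` and `known` are hypothetical names); Srclight's actual hash function and where it stores digests are not described here.

```python
import hashlib
import tempfile
from pathlib import Path


def file_digest(path: Path) -> str:
    """SHA-256 of the file's bytes; file name and mtime are ignored."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def files_to_reindex(paths: list[Path], known: dict[str, str]) -> list[Path]:
    """Return the files whose content changed, updating `known` in place."""
    changed = []
    for path in paths:
        digest = file_digest(path)
        if known.get(str(path)) != digest:
            changed.append(path)
            known[str(path)] = digest
    return changed


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "main.py"
    src.write_text("def lookup(): ...\n")
    known: dict[str, str] = {}
    print(files_to_reindex([src], known))  # first run: file is new
    print(files_to_reindex([src], known))  # same bytes: nothing to do
    src.write_text("def lookup(key): ...\n")
    print(files_to_reindex([src], known))  # content changed
```

Deleted files (whose index rows would need dropping) are left out of this sketch.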
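Reciprocal Rank Fusion gives each result a score of 1/(k + rank) from every ranked list it appears in and sorts by the sum, so items that both keyword and semantic search rank well rise to the top. A minimal sketch, assuming plain lists of symbol names as input and the commonly used constant k = 60; `rrf_fuse` is a hypothetical helper, and Srclight's own inputs and constant may differ.

```python
def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse several ranked result lists with Reciprocal Rank Fusion.

    Each item earns 1 / (k + rank) from every list it appears in
    (rank is 1-based); items are returned by descending fused score.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):
            scores[item] = scores.get(item, 0.0) + 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)


# Hypothetical keyword (full-text) and semantic (embedding) hits for one query.
keyword_hits = ["lookup", "lookup_cache", "parse_entry"]
semantic_hits = ["dict_lookup", "lookup", "resolve_key"]

for name, score in rrf_fuse([keyword_hits, semantic_hits]):
    print(f"{name:15s} {score:.4f}")
```

Because `lookup` appears in both lists, it ranks first even though it is not the top semantic hit. RRF only uses ranks, so keyword and embedding scores never need to be put on a common scale.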
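SQLite lets one connection ATTACH several database files and query them together, so per-repo indexes can be searched with a single UNION ALL. A minimal sketch using Python's built-in sqlite3 module, assuming a hypothetical simplified `symbols(name, kind, path)` table in each repo database (`make_repo_db` and `search_workspace` are illustrative names); Srclight's real schema and workspace queries are not shown here.

```python
import os
import sqlite3
import tempfile


def make_repo_db(path: str, rows: list[tuple[str, str, str]]) -> None:
    """Create a toy per-repo index with a hypothetical symbols table."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE symbols (name TEXT, kind TEXT, path TEXT)")
    conn.executemany("INSERT INTO symbols VALUES (?, ?, ?)", rows)
    conn.commit()
    conn.close()


def search_workspace(repo_dbs: dict[str, str], pattern: str) -> list[tuple[str, str, str]]:
    """Attach every repo index and search them all with one UNION ALL query."""
    conn = sqlite3.connect(":memory:")
    try:
        selects = []
        for alias, db_path in repo_dbs.items():
            # The file name can be a bound parameter; the schema alias cannot.
            conn.execute(f"ATTACH DATABASE ? AS {alias}", (db_path,))
            selects.append(
                f"SELECT '{alias}' AS repo, name, path FROM {alias}.symbols WHERE name LIKE ?"
            )
        sql = " UNION ALL ".join(selects) + " ORDER BY repo, name"
        return conn.execute(sql, [pattern] * len(selects)).fetchall()
    finally:
        conn.close()


with tempfile.TemporaryDirectory() as tmp:
    app_db = os.path.join(tmp, "app.db")
    lib_db = os.path.join(tmp, "lib.db")
    make_repo_db(app_db, [("lookup_user", "function", "app/users.py")])
    make_repo_db(lib_db, [("lookup", "function", "lib/dict.c"), ("parse", "function", "lib/parse.c")])
    for row in search_workspace({"app": app_db, "lib": lib_db}, "lookup%"):
        print(row)
```

The aliases here are trusted identifiers interpolated into SQL. SQLite also limits how many databases one connection may attach (10 by default).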
Requirements
- Python 3.11+
- Git (for change intelligence and auto-reindex hooks)
- Ollama (optional, for semantic search / embeddings) — ollama.com
- NVIDIA GPU + cupy (optional, for GPU-accelerated vector search)
- Poppler (optional, for PaddleOCR scanned-PDF support) — apt install poppler-utils / brew install poppler
Quick Start
```bash
# Install from PyPI
pip install srclight

# Or install from source
git clone https://github.com/srclight/srclight.git
cd srclight
pip install -e .

# Optional: document format support (PDF, DOCX, XLSX, HTML, images)
pip install 'srclight[docs,pdf]'

# Optional: OCR for scanned PDFs (also needs poppler-utils on your system)
pip install 'srclight[pdf,paddleocr]'

# Optional: OCR for images (needs tesseract on your system)
pip install 'srclight[docs,ocr]'

# Optional: GPU-accelerated vector search (requires CUDA 12.x)
pip install 'srclight[gpu]'

# Everything (docs + pdf + ocr + paddleocr + gpu)
pip install 'srclight[all]'

# Index your project
cd /path/to/your/project
srclight index

# Index with embeddings (requires Ollama running)
srclight index --embed qwen3-embedding

# Search
srclight search "lookup"
srclight search --kind function "parse"
srclight symbols src/main.py

# Start MCP server (for Claude Code / Cursor)
srclight serve
```
Note: `srclight index` automatically adds `.srclight/` to your `.gitignore`. Index databases and embedding files can be large and should never be committed.
Semantic Search (Embeddings)
Srclight supports embedding-based semantic search for natural language queries like "find code that handles authentication" or "where is the database connection pool".
Setup
```bash
# Install Ollama (https://ollama.com)

# Pull an embedding model
ollama pull qwen3-embedding   # Best quality (8B params, needs ~6GB VRAM)
ollama pull nomic-embed-text  # Lighter alternative (137M params)

# Index with embeddings
srclight index --embed qwen3-embedding

# Or index a workspace with embeddings
srclight workspace index -w myworkspace --embed qwen3-embedding
```
How It Works
- Each symbol's name + signature + docstring + content is embedded as a float vector
- Vectors are stored as BLOBs in the `symbol_embeddings` table (SQLite)
- After indexing, a `.npy` sidecar snapshot is built and loaded into GPU VRAM (CuPy) or CPU RAM (NumPy) for fast search
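The storage and search steps above can be sketched with Python's sqlite3 module and NumPy: vectors go into SQLite as raw float32 bytes, are read back into one matrix, snapshotted to a `.npy` file, and ranked by cosine similarity. This assumes a hypothetical two-column `symbol_embeddings(symbol_id, vector)` layout, a hypothetical sidecar file name, and toy 4-dimensional vectors in place of real model output; Srclight's actual schema, sidecar format, and CuPy path are not reproduced here.

```python
import sqlite3
import tempfile
from pathlib import Path

import numpy as np


def to_blob(vec) -> bytes:
    """Serialize a vector as little-endian float32 bytes for a BLOB column."""
    return np.asarray(vec, dtype="<f4").tobytes()


def load_matrix(conn: sqlite3.Connection) -> tuple[list[int], np.ndarray]:
    """Read every stored vector back into one (n, dim) float32 matrix."""
    rows = conn.execute(
        "SELECT symbol_id, vector FROM symbol_embeddings ORDER BY symbol_id"
    ).fetchall()
    ids = [row[0] for row in rows]
    matrix = np.vstack([np.frombuffer(row[1], dtype="<f4") for row in rows])
    return ids, matrix


def top_k(query, ids: list[int], matrix: np.ndarray, k: int = 3) -> list[tuple[int, float]]:
    """Rank stored vectors by cosine similarity to the query vector."""
    q = np.asarray(query, dtype=np.float32)
    norms = np.linalg.norm(matrix, axis=1) * np.linalg.norm(q)
    scores = (matrix @ q) / np.maximum(norms, 1e-12)  # guard against zero vectors
    order = np.argsort(-scores)[:k]
    return [(ids[i], float(scores[i])) for i in order]


# Hypothetical table layout; toy 4-d vectors stand in for embedding-model output.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE symbol_embeddings (symbol_id INTEGER PRIMARY KEY, vector BLOB)")
toy = {1: [1, 0, 0, 0], 2: [0.9, 0.1, 0, 0], 3: [0, 0, 1, 0]}
conn.executemany(
    "INSERT INTO symbol_embeddings VALUES (?, ?)",
    [(sid, to_blob(vec)) for sid, vec in toy.items()],
)
conn.commit()
ids, matrix = load_matrix(conn)
conn.close()

# Snapshot the matrix to a .npy sidecar and load it back into CPU RAM.
with tempfile.TemporaryDirectory() as tmp:
    sidecar = Path(tmp) / "embeddings.npy"  # hypothetical file name
    np.save(sidecar, matrix)
    matrix = np.load(sidecar)

for symbol_id, score in top_k([1, 0, 0, 0], ids, matrix, k=2):
    print(symbol_id, round(score, 3))
```

Keeping one dense matrix in memory turns each query into a single matrix-vector product. CuPy mirrors much of the NumPy API, which is why the same math can run on a GPU.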
Configuration
```json
{
  "mcpServers": {
    "srclight": {
      "command": "srclight",
      "args": ["serve"]
    }
  }
}
```