CodeSight MCP Server

Local setup required. This server has to be cloned and prepared on your machine before you register it in Claude Code.
1. Set the server up locally

Clone the repository, then run this once from its root directory to install the server (in editable mode, with the dev extras) before adding it to Claude Code.

Run in terminal
pip install -e ".[dev]"
2. Register it in Claude Code

After the local setup is done, run this command to point Claude Code at the built server.

Run in terminal
claude mcp add -e "ANTHROPIC_API_KEY=${ANTHROPIC_API_KEY}" codesight -- python -m codesight.mcp

This launches the same python -m codesight.mcp entry point used in the claude_desktop_config.json example further down. Run it with the Python environment in which you installed CodeSight in step 1.

Required: ANTHROPIC_API_KEY (+ 3 optional)
README.md

AI-powered document search engine with hybrid BM25 + vector retrieval

codesight

AI-powered document search engine — hybrid BM25 + vector + RRF retrieval with Claude answer synthesis.

Quick Start

# Install
pip install -e ".[dev]"

# Index a folder of documents
python -m codesight index /path/to/documents

# Search
python -m codesight search "payment terms" /path/to/documents

# Ask a question (requires ANTHROPIC_API_KEY)
python -m codesight ask "What are the payment terms?" /path/to/documents

# Launch the web chat UI
pip install -e ".[demo]"
python -m codesight demo

Python API

from codesight import CodeSight

engine = CodeSight("/path/to/documents")
engine.index()                                     # Index all files
results = engine.search("payment terms")           # Hybrid search
answer = engine.ask("What are the payment terms?") # Search + Claude answer
status = engine.status()                           # Index freshness check

Supported Formats

Format     | Extension                     | Parser
PDF        | .pdf                          | pymupdf
Word       | .docx                         | python-docx
PowerPoint | .pptx                         | python-pptx
Code       | .py, .js, .ts, .go, .rs, etc. | Built-in (10 languages)
Text       | .md, .txt, .csv               | Built-in
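
The table above amounts to a lookup from file extension to parser. A minimal sketch of that dispatch follows. The names PARSERS, BUILT_IN and parser_for are hypothetical and are not CodeSight's actual internals, and the built-in set lists only the extensions named in the table.

```python
from pathlib import Path
from typing import Optional

# Extension -> parser name, copied from the table above. The mapping and the
# helper are illustrative assumptions, not CodeSight's real internals.
PARSERS = {
    ".pdf": "pymupdf",
    ".docx": "python-docx",
    ".pptx": "python-pptx",
}
# Only the extensions the table names explicitly; the real list is longer.
BUILT_IN = {".py", ".js", ".ts", ".go", ".rs", ".md", ".txt", ".csv"}


def parser_for(path: str) -> Optional[str]:
    """Return the parser name for a file, or None if the format is not listed."""
    suffix = Path(path).suffix.lower()  # case-insensitive match on the extension
    if suffix in PARSERS:
        return PARSERS[suffix]
    if suffix in BUILT_IN:
        return "built-in"
    return None
```

Returning None for unknown extensions lets an indexer skip unsupported files instead of failing on them; whether CodeSight behaves that way is not stated in the README.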

Architecture

  • Document Parsing: PDF, DOCX, PPTX text extraction with page/section metadata
  • Chunking: Language-aware regex splitting (code) + paragraph-aware splitting (documents)
  • Embeddings: all-MiniLM-L6-v2 via sentence-transformers (local, no API key)
  • Vector Store: LanceDB (serverless, file-based)
  • Keyword Search: SQLite FTS5 sidecar
  • Retrieval: Hybrid BM25 + vector with RRF merge
  • Answer Synthesis: Claude API generates answers with source citations
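
The RRF merge named in the Retrieval bullet can be sketched as follows. This is a generic Reciprocal Rank Fusion over two ranked lists of document IDs, not CodeSight's own code. The function name rrf_merge, the constant k=60 (a commonly used default) and the sample rankings are assumptions for illustration.

```python
from collections import defaultdict


def rrf_merge(rankings, k=60):
    """Fuse ranked lists of document IDs with Reciprocal Rank Fusion.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is an assumed, commonly used default.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc_id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


bm25_hits = ["a", "b", "c"]    # hypothetical keyword (BM25) ranking
vector_hits = ["b", "d", "a"]  # hypothetical vector-similarity ranking
merged = rrf_merge([bm25_hits, vector_hits])
print(merged)  # ['b', 'a', 'd', 'c']
```

Because RRF uses only rank positions, the BM25 and vector scores never need to be put on a common scale, which is the usual reason for choosing it in hybrid retrieval.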

See ARCHITECTURE.md for the full system tour.

Configuration

Variable                  | Default                  | Description
ANTHROPIC_API_KEY         | (none)                   | Required for ask() / Claude answer synthesis
CODESIGHT_DATA_DIR        | ~/.codesight/data        | Where indexes are stored
CODESIGHT_EMBEDDING_MODEL | all-MiniLM-L6-v2         | Embedding model
CODESIGHT_LLM_MODEL       | claude-sonnet-4-20250514 | Claude model for answers
CODESIGHT_STALE_MINUTES   | 60                       | Index freshness threshold
LOG_LEVEL                 | INFO                     | Logging verbosity

See .env.example for all options.
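
As an illustration of how the variables above could be consumed, here is a minimal sketch that reads them with the documented defaults and applies the freshness threshold. Settings, load_settings and is_stale are hypothetical names, and the staleness rule (older than the threshold means stale) is an assumption, not a description of CodeSight's code.

```python
import os
from dataclasses import dataclass
from datetime import datetime, timedelta
from pathlib import Path


@dataclass(frozen=True)
class Settings:
    data_dir: Path
    embedding_model: str
    llm_model: str
    stale_minutes: int


def load_settings(env=None) -> Settings:
    """Read the documented variables, falling back to the documented defaults."""
    env = os.environ if env is None else env
    return Settings(
        # os.path.expanduser resolves the leading "~" in the default path.
        data_dir=Path(os.path.expanduser(env.get("CODESIGHT_DATA_DIR", "~/.codesight/data"))),
        embedding_model=env.get("CODESIGHT_EMBEDDING_MODEL", "all-MiniLM-L6-v2"),
        llm_model=env.get("CODESIGHT_LLM_MODEL", "claude-sonnet-4-20250514"),
        stale_minutes=int(env.get("CODESIGHT_STALE_MINUTES", "60")),
    )


def is_stale(last_indexed: datetime, now: datetime, stale_minutes: int) -> bool:
    """Assumed rule: the index is stale once it is older than the threshold."""
    return now - last_indexed > timedelta(minutes=stale_minutes)
```

Passing a plain dict as env makes the loader easy to exercise without touching the real process environment.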

Workflow: Explore → Plan → Execute → Review

Opus, running in VS Code, plans the work and launches autonomous CLI agents in the background, so the user never leaves the conversation. Each agent is started with env -u CLAUDECODE claude --dangerously-skip-permissions --model [model] -p '...', with its output redirected to a file. The work runs in multiple cycles to ensure quality: Sonnet implements, Opus reviews. See .claude/rules/workflow.md for full details.
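
The background launch described above could be scripted roughly as follows. This sketch uses only the flags quoted in the paragraph; the helper names, the log path argument and the choice to merge stderr into the same file are assumptions. It also assumes the claude CLI is on PATH.

```python
import os
import subprocess


def build_agent_command(model, prompt):
    """Argument list for the headless agent run quoted in the workflow text."""
    return ["claude", "--dangerously-skip-permissions", "--model", model, "-p", prompt]


def agent_env(base=None):
    """Copy of the environment without CLAUDECODE (the effect of `env -u CLAUDECODE`)."""
    env = dict(os.environ if base is None else base)
    env.pop("CLAUDECODE", None)
    return env


def launch_agent(model, prompt, log_path):
    """Start the agent in the background, writing its output to log_path."""
    with open(log_path, "w") as log:
        # The child process gets its own copy of the file descriptor, so
        # closing the file in the parent when the block exits is safe.
        return subprocess.Popen(
            build_agent_command(model, prompt),
            env=agent_env(),
            stdout=log,
            stderr=subprocess.STDOUT,  # assumption: stderr goes to the same file
        )
```

launch_agent returns immediately with a Popen handle, so a caller can start several agents and later call wait() on each before reading the log files.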

Stack

  • Python 3.11+
  • LanceDB + SQLite FTS5
  • sentence-transformers
  • Anthropic Claude API
  • Streamlit (web chat UI)
  • pymupdf, python-docx, python-pptx (document parsing)

Tools (3)

index: Indexes a folder of documents for search
search: Performs hybrid BM25 and vector search on indexed documents
ask: Answers questions based on indexed documents using Claude

Environment Variables

ANTHROPIC_API_KEY (required): Required for ask() and Claude answer synthesis
CODESIGHT_DATA_DIR: Directory where indexes are stored
CODESIGHT_EMBEDDING_MODEL: Embedding model to use
CODESIGHT_LLM_MODEL: Claude model for answers

Configuration

claude_desktop_config.json
{
  "mcpServers": {
    "codesight": {
      "command": "python",
      "args": ["-m", "codesight.mcp"],
      "env": {
        "ANTHROPIC_API_KEY": "your-key-here"
      }
    }
  }
}
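
If the config file already lists other servers, the entry above has to be merged in rather than pasted over the file. A minimal sketch of that merge, assuming a hypothetical helper add_codesight_server and a config path supplied by the caller (the file's location depends on the client and operating system):

```python
import json
from pathlib import Path


def add_codesight_server(config_path, api_key):
    """Merge the codesight entry into an MCP client config, keeping other servers."""
    path = Path(config_path)
    # Start from the existing file if there is one, otherwise from an empty config.
    config = json.loads(path.read_text()) if path.exists() else {}
    servers = config.setdefault("mcpServers", {})
    servers["codesight"] = {
        "command": "python",
        "args": ["-m", "codesight.mcp"],
        "env": {"ANTHROPIC_API_KEY": api_key},
    }
    path.write_text(json.dumps(config, indent=2) + "\n")
    return config
```

The helper overwrites any existing codesight entry and leaves every other key in the file untouched.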

Try it

Search for 'payment terms' in my local documentation folder.
What are the payment terms defined in the project documents?
Index the contents of the /docs folder for future searching.
Find all references to authentication logic in the codebase.

Frequently Asked Questions

What are the key features of CodeSight?

  • Hybrid retrieval combining BM25 keyword search and vector embeddings
  • Language-aware chunking for code and paragraph-aware splitting for documents
  • Local embedding generation using all-MiniLM-L6-v2
  • Answer synthesis with source citations via the Claude API
  • Support for PDF, DOCX, PPTX, and various code formats

What can I use CodeSight for?

  • Searching through large local codebases for specific implementation patterns
  • Querying technical documentation and project requirements stored in PDF or Word files
  • Building a local RAG system for private project knowledge bases
  • Automating documentation retrieval for development workflows

How do I install CodeSight?

Install CodeSight by running the following from the root of the cloned repository: pip install -e ".[dev]"

What MCP clients work with CodeSight?

CodeSight works with any MCP-compatible client including Claude Desktop, Claude Code, Cursor, and other editors with MCP support.
