Fast semantic code search for AI agents
codeix
codeix.dev · Fast semantic code search for AI agents — find symbols, references, and callers across any codebase.
codeix # start MCP server, watch for changes
codeix build # parse source files, write .codeindex
codeix -r ~/project build # build from a specific directory
Why
AI coding agents spend most of their token budget finding code before they can work on it. They grep, read files, grep again, backtrack. On a large codebase the agent might burn thousands of tokens just locating the right function — or worse, miss it entirely and hallucinate.
Codeix gives the agent a pre-built map of your codebase. One structured query returns the symbol name, file, line range, signature, and parent — no scanning, no guessing.
What existing tools get wrong
| Problem | What happens today |
|---|---|
| No structure | grep finds text matches, not symbols. The agent can't distinguish a function definition from a comment mentioning it. |
| Slow re-parsing | Python-based indexers re-parse everything on startup. On large codebases, you wait. |
| Not shareable | Indexes are local caches — ephemeral, per-machine. A new developer or CI runner starts from scratch. |
| No composition | Monorepo with 10 packages? Dependencies with useful APIs? No way to query across boundaries. |
| Prose is invisible | TODOs, docstrings, error messages — searchable by grep but not selectively. You can't search only comments without also matching code. |
What codeix does differently
- Committed to git: the index is a `.codeindex` directory you commit with your code. Clone the repo and the index is already there. No re-indexing.
- Shareable: library authors can ship `.codeindex` in their npm/PyPI/crates.io package. Consumers get instant navigation of dependencies.
- Composable: the MCP server auto-discovers dependency indexes and mounts them. Query your code and your dependencies in one place.
- Structured for LLMs: symbols have kinds, signatures, parent relationships, and line ranges. The agent gets exactly what it needs in one tool call instead of piecing it together from raw text.
- Prose search: `search --scope text` targets comments, docstrings, and string literals specifically. Find TODOs, find the error message a user reported, find what a function's docstring says, all without noise from code.
- Fast: builds in seconds, queries in milliseconds. Rust + tree-sitter + in-memory SQLite FTS5 under the hood.
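To make the prose-search idea concrete, here is a minimal sketch of scoped full-text search over an in-memory SQLite FTS5 table, using Python's standard `sqlite3` module. It is not codeix's actual schema or query path: the table layout, the `kind` values, and the sample rows are all assumptions made for illustration, and it requires a SQLite build with FTS5 enabled.

```python
import sqlite3

# Hypothetical rows standing in for texts.jsonl entries; the real field
# names and kinds are not documented in this README.
ROWS = [
    ("src/net.py", "comment", "TODO: handle timeout when the server is slow"),
    ("src/net.py", "string", "connection timeout after 30 seconds"),
    ("src/cli.py", "docstring", "Parse command line arguments."),
]


def build_index(rows):
    """Load rows into an in-memory FTS5 table; only `body` is tokenized."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE VIRTUAL TABLE texts "
        "USING fts5(body, file UNINDEXED, kind UNINDEXED)"
    )
    db.executemany(
        "INSERT INTO texts (file, kind, body) VALUES (?, ?, ?)", rows
    )
    return db


def search_text(db, query, kind=None):
    """Return (file, kind, body) matches, most relevant first."""
    sql = "SELECT file, kind, body FROM texts WHERE texts MATCH ?"
    params = [query]
    if kind is not None:
        # Narrow the match to one kind of prose, e.g. comments only.
        sql += " AND kind = ?"
        params.append(kind)
    # bm25() returns lower values for better matches, so ascending order
    # puts the most relevant row first.
    sql += " ORDER BY bm25(texts)"
    return db.execute(sql, params).fetchall()


db = build_index(ROWS)
comment_hits = search_text(db, "timeout", kind="comment")
all_hits = search_text(db, "timeout")
```

Here `comment_hits` contains only the TODO comment, while `all_hits` also includes the string literal. Marking `file` and `kind` as `UNINDEXED` keeps them out of the full-text match, so a query term never matches a path or a kind name by accident.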
The `.codeindex` format
An open, portable format for structured code indexing. Plain JSONL files you commit alongside your code — git-friendly diffs, human-readable with grep and jq, no binary blobs.
.codeindex/
index.json # manifest: version, name, languages
files.jsonl # one line per source file (path, lang, hash, line count)
symbols.jsonl # one line per symbol (functions, classes, imports, with signatures)
texts.jsonl # one line per comment, docstring, string literal
Any tool that can parse JSON can consume a .codeindex. Codeix builds it using tree-sitter, and AI agents query it through MCP (Model Context Protocol).
Example — symbols.jsonl:
{"file":"src/main.py","name":"os","kind":"import","line":[1,1]}
{"file":"src/main.py","name":"Config","kind":"class","line":[22,45]}
{"file":"src/main.py","name":"Config.__init__","kind":"method","line":[23,30],"parent":"Config","sig":"def __init__(self, path: str, debug: bool = False)"}
{"file":"src/main.py","name":"main","kind":"function","line":[48,60],"sig":"def main(args: list[str]) -> int"}
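Because the format is plain JSONL, a few lines of Python are enough to consume it. The sketch below parses the four example records above and answers two simple questions: which symbols belong to a parent, and which symbol encloses a given line. The helper names are hypothetical, and treating `line` as an inclusive `[start, end]` pair is an assumption based on the example data.

```python
import json

# The example records from symbols.jsonl, inlined so the sketch is
# self-contained. In practice you would read .codeindex/symbols.jsonl.
SYMBOLS_JSONL = """\
{"file":"src/main.py","name":"os","kind":"import","line":[1,1]}
{"file":"src/main.py","name":"Config","kind":"class","line":[22,45]}
{"file":"src/main.py","name":"Config.__init__","kind":"method","line":[23,30],"parent":"Config","sig":"def __init__(self, path: str, debug: bool = False)"}
{"file":"src/main.py","name":"main","kind":"function","line":[48,60],"sig":"def main(args: list[str]) -> int"}
"""


def load_symbols(text):
    """Parse one JSON object per non-empty line."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def children_of(symbols, parent):
    """Symbols whose optional `parent` field names the given symbol."""
    return [s for s in symbols if s.get("parent") == parent]


def symbol_at(symbols, file, line_no):
    """Innermost symbol in `file` whose line range contains `line_no`.

    Assumes `line` is an inclusive [start, end] pair.
    """
    hits = [
        s for s in symbols
        if s["file"] == file and s["line"][0] <= line_no <= s["line"][1]
    ]
    # The narrowest enclosing range is the innermost symbol.
    return min(hits, key=lambda s: s["line"][1] - s["line"][0], default=None)


symbols = load_symbols(SYMBOLS_JSONL)
config_children = [s["name"] for s in children_of(symbols, "Config")]
enclosing = symbol_at(symbols, "src/main.py", 25)
```

With the example data, `config_children` holds `Config.__init__`, and line 25 resolves to the same method rather than to the surrounding `Config` class.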
Ship your index with your package
Include .codeindex in your package and every developer who depends on you gets instant navigation of your API — no setup, no re-indexing.
Works with Git repos, npm, PyPI, and crates.io.
MCP tools
Seven tools, zero setup. The agent queries immediately — no init, no config, no refresh.
| Tool | What it does |
|---|---|
| `explore` | Explore project structure: metadata, subprojects, files grouped by directory |
| `search` | Unified full-text search across symbols, files, and texts (FTS5, BM25-ranked) with scope/kind/path/project filters |
| `get_file_symbols` | List all symbols in a file |
| `get_children` | Get children of a class/module |
| `get_callers` | Find all places that call or reference a symbol |
| `get_callees` | Find all symbols that a function/method calls |
| `flush_index` | Flush pending index changes to disk |
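Agents normally issue these calls for you, but it can help to see roughly what goes over the wire. MCP clients invoke tools with a JSON-RPC 2.0 `tools/call` request. The sketch below builds one for the `search` tool. The argument names `query` and `scope` are assumptions for illustration, since this README does not list the tool's input schema. The `tool_call` helper is likewise hypothetical.

```python
import json
from itertools import count

_ids = count(1)  # JSON-RPC request ids only need to be unique per session


def tool_call(name, arguments):
    """Build a JSON-RPC 2.0 request for MCP's tools/call method."""
    return {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


# Hypothetical argument names; check the schema the server reports via
# tools/list before relying on them.
request = tool_call("search", {"query": "TODO", "scope": "text"})

# Serialize on a single line, with no embedded newlines.
wire = json.dumps(request)
```

A real client would first complete the MCP initialization handshake and read the tool schemas from `tools/list`. This snippet only shows the shape of a single request.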
Project discovery
Launch codeix from any directory. It walks downward and treats every directory containing .git/ as a separate project — each gets its own .codeindex.
Works uniformly for single repos, monorepos, sibling repos, and git submodules. No config needed.
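The discovery rule is easy to picture as a directory walk. The following is a rough sketch of the idea, not codeix's implementation: `discover_projects` is a hypothetical helper, and accepting `.git` as a file as well as a directory is an assumption made so that submodule checkouts also match.

```python
import os
import tempfile
from pathlib import Path


def discover_projects(root):
    """Return every directory at or below `root` that contains `.git`."""
    projects = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Submodule checkouts store `.git` as a file, so accept either
        # form (an assumption, see the note above).
        if ".git" in dirnames or ".git" in filenames:
            projects.append(Path(dirpath))
        # Never descend into git metadata itself.
        if ".git" in dirnames:
            dirnames.remove(".git")
    return sorted(projects)


# Build a throwaway tree: two repos, one submodule, one plain directory.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for rel in ("app/.git", "app/vendor", "libs/core/.git", "libs/docs"):
        (root / rel).mkdir(parents=True)
    (root / "app/vendor/.git").write_text("gitdir: ../.git/modules/vendor\n")
    found = [p.relative_to(root).as_posix() for p in discover_projects(root)]
```

In this tree, `found` lists `app`, `app/vendor`, and `libs/core`. `libs/docs` has no `.git` entry, so it is not treated as a project of its own.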
Languages
Tree-sitter grammars, feat
Configuration
{"mcpServers": {"codeix": {"command": "codeix"}}}