RTFM
Retrieve The Forgotten Memory
The open retrieval layer for AI agents
Index your entire project — code, docs, legal, research, data — and serve your AI agent exactly the context it needs.
Why?
Your AI agent is blind. It greps through thousands of files, loses context every session, hallucinates modules that don't exist. The fix isn't a smarter model — it's smarter retrieval.
Augment, Sourcegraph, and Cursor index code. RTFM indexes everything.
pip install rtfm-ai[mcp] && cd your-project && rtfm init
30 seconds. Claude Code now searches your indexed knowledge base before grepping.
Features
Search & Retrieval
- FTS5 full-text search — instant, zero-config, works out of the box
- Semantic search — optional embeddings (FastEmbed/ONNX, no GPU needed)
- Metadata-first — search returns file paths + scores (~300 tokens), not content dumps
- Progressive disclosure: the agent reads only what it needs via Read(file_path)
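The metadata-first idea can be sketched with nothing but Python's standard library. This is a minimal illustration, assuming an SQLite build with FTS5 enabled; the table name `chunks`, its columns, and the helper functions are hypothetical and are not RTFM's actual schema or API.

```python
import sqlite3


def build_index(docs: dict[str, str]) -> sqlite3.Connection:
    """Create an in-memory FTS5 table holding one row per file (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    # `path` is stored but not tokenized; only `content` is searchable.
    conn.execute("CREATE VIRTUAL TABLE chunks USING fts5(path UNINDEXED, content)")
    conn.executemany(
        "INSERT INTO chunks (path, content) VALUES (?, ?)", list(docs.items())
    )
    return conn


def search_metadata(conn: sqlite3.Connection, query: str, limit: int = 5) -> list[dict]:
    """Return file paths and relevance scores only, never the matched content."""
    rows = conn.execute(
        "SELECT path, bm25(chunks) FROM chunks WHERE chunks MATCH ? "
        "ORDER BY bm25(chunks) LIMIT ?",
        (query, limit),
    ).fetchall()
    # FTS5's bm25() is lower-is-better, so negate it to get higher-is-better scores.
    return [{"path": path, "score": -score} for path, score in rows]


docs = {
    "docs/auth.md": "The authentication flow issues a session token after login.",
    "src/billing.py": "def charge(customer): return invoice(customer)",
    "README.md": "Project overview and setup notes.",
}
conn = build_index(docs)
hits = search_metadata(conn, "authentication")
```

The result carries only a path and a score per hit, so the agent decides which files are worth opening instead of receiving every matching passage up front.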
Indexing
- 10 parsers built-in — Markdown, Python (AST), LaTeX, YAML, JSON, Shell, PDF, XML, HTML, plain text
- Extensible — add any format in ~50 lines of Python
- Incremental sync — only re-indexes what changed
- Auto-sync — hooks keep the index fresh every prompt, zero manual work
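One common way to re-index only what changed is to compare content hashes against the state saved by the previous sync. The sketch below assumes that approach; the README does not say how RTFM detects changes, and `file_digest` and `plan_sync` are hypothetical helpers written for illustration.

```python
import hashlib
import tempfile
from pathlib import Path


def file_digest(path: Path) -> str:
    """Hash a file's bytes so a change in content changes the digest."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def plan_sync(root: Path, known: dict[str, str]):
    """Compare files under `root` with the digests recorded by the last sync.

    Returns (changed_or_new, removed, new_state).
    """
    current = {
        str(p.relative_to(root)): file_digest(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }
    changed = [rel for rel, digest in current.items() if known.get(rel) != digest]
    removed = [rel for rel in known if rel not in current]
    return changed, removed, current


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "a.md").write_text("alpha")
    (root / "b.md").write_text("beta")
    first_changed, _, state = plan_sync(root, {})      # first run: everything is new

    (root / "b.md").write_text("beta v2")
    second_changed, second_removed, state = plan_sync(root, state)  # only b.md differs
```

Only the paths in the changed list would need to be parsed again; unchanged files keep their existing index entries.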
Integration
- MCP server — works with Claude Code, Cursor, Codex, any MCP client
- CLI: rtfm search, rtfm sync, rtfm status, ...
- Python API: Library, SearchResults, custom parsers
- Non-invasive: doesn't touch your code, doesn't replace your workflow tools
Quick Start
Install
pip install rtfm-ai[mcp]
Initialize in your project
cd /path/to/your-project
rtfm init
This creates .rtfm/library.db, registers the MCP server, injects search instructions into CLAUDE.md, and installs auto-sync hooks. Done.
Then say to Claude Code: "Search for authentication flow" — it uses rtfm_search instead of grepping.
Optional extras
pip install rtfm-ai[embeddings] # Semantic search (FastEmbed ONNX)
pip install rtfm-ai[pdf] # PDF parsing (pdftext + marker)
pip install rtfm-ai[mcp,embeddings,pdf] # Everything
MCP Tools
| Tool | What it does |
|---|---|
| rtfm_search | Search the index (FTS, semantic, or hybrid) |
| rtfm_context | Get relevant context for a subject (metadata-only) |
| rtfm_expand | Show all chunks of a source with full content |
| rtfm_discover | Fast project structure scan (~1s, no indexing needed) |
| rtfm_books | List indexed documents |
| rtfm_stats | Library statistics |
| rtfm_sync | Sync a directory (incremental) |
| rtfm_ingest | Ingest a single file |
| rtfm_tags | List all tags |
| rtfm_tag_chunks | Add tags to specific chunks |
| rtfm_remove | Remove a file from the index |
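MCP clients invoke these tools through JSON-RPC `tools/call` requests. The sketch below builds one such request for rtfm_search; the envelope follows the MCP specification, but the `query` argument name is an assumption, since the tool's input schema is not listed here.

```python
import json


def tool_call_request(request_id: int, tool: str, arguments: dict) -> dict:
    """Build the JSON-RPC 2.0 envelope an MCP client sends to call a tool."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }


# "query" is a hypothetical argument name; check the tool's schema via tools/list.
request = tool_call_request(1, "rtfm_search", {"query": "authentication flow"})
payload = json.dumps(request)
```

In practice the MCP client (Claude Code, Cursor, and so on) builds and sends this message for you; the sketch only shows what travels over the wire.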
The Parser Architecture
This is what makes RTFM different. Need to index a format nobody supports?
import json
from uuid import uuid4

from rtfm.parsers.base import BaseParser, ParserRegistry
from rtfm.core.models import Chunk


@ParserRegistry.register
class FHIRParser(BaseParser):
    """Parse HL7 FHIR medical records."""

    extensions = ['.fhir.json']
    name = "fhir"

    def parse(self, path, metadata=None):
        # A FHIR bundle lists its resources under "entry"; emit one chunk per resource.
        data = json.loads(path.read_text())
        for entry in data.get('entry', []):
            resource = entry.get('resource', {})
            yield Chunk(
                id=resource.get('id', str(uuid4())),
                content=json.dumps(resource, indent=2),
                book_title=f"FHIR {resource.get('resourceType', 'Unknown')}",
                book_slug=resource.get('id', 'unknown'),
                page_start=1,
                page_end=1,
            )
50 lines. Now your medical AI agent understands FHIR records.
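To see why a decorator is enough to plug in a new format, here is a self-contained sketch of the registry pattern. `MiniRegistry` is a hypothetical stand-in written for illustration, not RTFM's ParserRegistry; it assumes dispatch by file extension, with the longest matching extension winning so that `.fhir.json` takes priority over `.json`.

```python
from pathlib import Path


class MiniRegistry:
    """Hypothetical stand-in for a parser registry (not RTFM's implementation)."""

    _parsers: dict[str, type] = {}

    @classmethod
    def register(cls, parser_cls):
        # Map every declared extension to the parser class, then return the
        # class unchanged so `register` can be used as a decorator.
        for ext in parser_cls.extensions:
            cls._parsers[ext.lower()] = parser_cls
        return parser_cls

    @classmethod
    def for_path(cls, path):
        # Compare against the full file name: Path.suffix would only see ".json".
        name = Path(path).name.lower()
        matches = [ext for ext in cls._parsers if name.endswith(ext)]
        if not matches:
            return None
        return cls._parsers[max(matches, key=len)]  # most specific extension wins


@MiniRegistry.register
class JsonParser:
    extensions = [".json"]


@MiniRegistry.register
class FhirParser:
    extensions = [".fhir.json"]


chosen = MiniRegistry.for_path("records/patient.fhir.json")
```

Registration happens as a side effect of defining the class, so adding a format means adding one module and never editing a central dispatch table.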
Built-in parsers
| Parser | Extensions