Grimoire
Your LLM re-reads the same reference docs every conversation. Grimoire indexes them once.
The Problem
Your LLM agent needs to reference CWE-89 during a code review. Without Grimoire, it either hallucinates the details, or you paste 50 pages of NIST docs into the context window and hope it finds the right paragraph. Every conversation. Every time.
The Solution
Grimoire indexes security reference material once — CVEs, CWEs, OWASP, audit findings, your internal standards — into a single SQLite file with both FTS5 keyword search and semantic embeddings. Your LLM agent searches it mid-conversation via MCP. Exact matches when you need "CWE-89". Conceptual recall when you need "authentication bypass techniques". Both in one query.
One SQLite file. Zero cloud. Instant retrieval via MCP.
            +------------------+
            |   Data Sources   |
            | CVE  MD  CSV  .. |
            +--------+---------+
                     |
                  ingest()
                     |
            +--------v---------+
            |    SQLite DB     |
            |  +------------+  |
            |  | documents  |  |
            |  +------------+  |
            |  | docs_fts5  |  |  <-- FTS5 keyword index
            |  +------------+  |
            |  | embeddings |  |  <-- semantic vectors
            |  +------------+  |
            +--------+---------+
                     |
            +--------v---------+
            |  Search Engine   |
            |                  |
            |  keyword (BM25)  |
            |  semantic (cos)  |
            |  hybrid (both)   |
            +--------+---------+
                     |
       +-------------+-------------+
       |                           |
+------v------+           +--------v--------+
| Python API  |           |   MCP Server    |
|             |           |                 |
| Grimoire()  |           | grimoire_search |
| .search()   |           | grimoire_status |
| .add_doc()  |           | grimoire_quality|
+-------------+           +-----------------+
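To make the keyword half of this design concrete, here is a minimal sketch of an FTS5 index queried with BM25 ranking through Python's built-in sqlite3 module. The table name `docs_fts`, its columns, the sample rows, and the `keyword_search` helper are assumptions for illustration, not Grimoire's actual schema or API. It also assumes your Python's SQLite build includes FTS5, which most do.

```python
import sqlite3

# Hypothetical schema for illustration; Grimoire's real tables may differ.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE docs_fts USING fts5(title, body)")
conn.executemany(
    "INSERT INTO docs_fts (title, body) VALUES (?, ?)",
    [
        ("CWE-89", "Improper neutralization of special elements used in an SQL command (SQL injection)."),
        ("CWE-79", "Improper neutralization of input during web page generation (cross-site scripting)."),
        ("CWE-287", "Improper authentication lets an actor claim an identity without proof."),
    ],
)


def keyword_search(query, limit=5):
    """Return (title, bm25_score) rows, best match first.

    The query is wrapped in double quotes so FTS5 treats it as a phrase;
    otherwise the hyphen in "CWE-89" would be parsed as query syntax.
    """
    phrase = '"' + query.replace('"', '""') + '"'
    return conn.execute(
        "SELECT title, bm25(docs_fts) AS score FROM docs_fts "
        "WHERE docs_fts MATCH ? ORDER BY score LIMIT ?",
        (phrase, limit),
    ).fetchall()


results = keyword_search("CWE-89")
```

SQLite's `bm25()` returns lower (more negative) values for better matches, which is why the query sorts ascending.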
Quick Start
git clone https://github.com/tannernicol/grimoire.git
cd grimoire
pip install -e .
# Fetch and index real security data (NVD CVEs + CWE catalog + OWASP Top 10)
python scripts/fetch_sources.py all
# Search
python examples/search_demo.py "SQL injection"
python examples/search_demo.py "access control" --severity critical
python examples/search_demo.py --status
Auto-Fetch Security Data
Grimoire fetches from reputable public sources — no manual downloads:
# Everything: NVD + CWE + OWASP
python scripts/fetch_sources.py all
# Recent CVEs from NIST NVD (last 90 days, critical only)
python scripts/fetch_sources.py nvd --days 90 --severity CRITICAL
# Full CWE catalog from MITRE
python scripts/fetch_sources.py cwe
# With embeddings for semantic search (requires Ollama)
python scripts/fetch_sources.py all --embeddings
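For a sense of what a "last 90 days, critical only" fetch could look like on the wire, here is a rough sketch that builds a query URL for the public NVD CVE API 2.0. How `fetch_sources.py` actually queries NVD is not shown here, so the `build_nvd_url` helper and its defaults are assumptions. Only the endpoint and the `pubStartDate`, `pubEndDate`, and `cvssV3Severity` parameters come from the NVD API itself.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlencode

NVD_CVE_API = "https://services.nvd.nist.gov/rest/json/cves/2.0"


def build_nvd_url(days=90, severity="CRITICAL", now=None):
    """Build a URL for CVEs published in the last `days` days (hypothetical helper)."""
    end = now or datetime.now(timezone.utc)
    start = end - timedelta(days=days)
    fmt = "%Y-%m-%dT%H:%M:%S.000"  # extended ISO-8601, as in the NVD examples
    params = {
        "pubStartDate": start.strftime(fmt),
        "pubEndDate": end.strftime(fmt),
        "cvssV3Severity": severity.upper(),
    }
    return f"{NVD_CVE_API}?{urlencode(params)}"


url = build_nvd_url(days=90, severity="critical")
```

The sketch only builds the URL. Sending the request, paging through results, and respecting NVD rate limits are left out.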
Enable Semantic Search
Requires Ollama with nomic-embed-text:
ollama pull nomic-embed-text
python scripts/fetch_sources.py all --embeddings
python examples/search_demo.py "authentication bypass" --mode hybrid
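If you are curious what an embedding call involves, the sketch below requests a vector for one piece of text from a local Ollama server over its HTTP API. It assumes Ollama is listening on its default port (11434) and uses the `/api/embeddings` endpoint. The `build_embedding_request` and `embed` helpers are hypothetical names for illustration and may not match how Grimoire talks to Ollama internally.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/embeddings"  # assumes Ollama's default port


def build_embedding_request(text, model="nomic-embed-text"):
    """Prepare the POST request without sending it (hypothetical helper)."""
    payload = json.dumps({"model": model, "prompt": text}).encode("utf-8")
    return urllib.request.Request(
        OLLAMA_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def embed(text, model="nomic-embed-text"):
    """Return the embedding vector for `text`; needs a running Ollama server."""
    with urllib.request.urlopen(build_embedding_request(text, model)) as resp:
        return json.loads(resp.read())["embedding"]


request = build_embedding_request("authentication bypass")
```

Calling `embed("authentication bypass")` only works once `ollama pull nomic-embed-text` has completed and the server is running.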
Why Not Just Use RAG?
Most RAG setups do one thing: chunk documents, embed them, vector search. That works until you need an exact CVE number, a specific NIST control ID, or a CWE by name. Vector search alone misses exact matches.
Grimoire runs both:
- FTS5 (BM25) for keyword precision — finds "CWE-89" when you search "CWE-89"
- Semantic embeddings (cosine similarity) for conceptual recall — finds SQL injection variants when you search "database manipulation"
- Hybrid mode combines both with configurable weighting (default 40/60 keyword/semantic)
Everything lives in a single SQLite file.
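As a rough illustration of how hybrid mode might blend the two signals, the sketch below min-max normalizes each score set and combines them with the default 40/60 keyword/semantic weighting. Grimoire's actual normalization and scoring may differ. The function names, the toy vectors, and the sample scores are made up for the example, and keyword scores are assumed to be "higher is better" (SQLite's `bm25()` would need to be negated first).

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def min_max(scores):
    """Rescale a {doc_id: score} map to [0, 1]; a constant map becomes all 1.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def hybrid_rank(keyword_scores, semantic_scores, kw_weight=0.4, sem_weight=0.6):
    """Weighted sum of normalized scores; a doc missing from one side scores 0 there."""
    kw, sem = min_max(keyword_scores), min_max(semantic_scores)
    combined = {
        doc: kw_weight * kw.get(doc, 0.0) + sem_weight * sem.get(doc, 0.0)
        for doc in set(kw) | set(sem)
    }
    return sorted(combined.items(), key=lambda item: item[1], reverse=True)


# Toy data: keyword hits for two docs, 2-D stand-ins for real embeddings.
keyword_scores = {"CWE-89": 5.0, "CWE-564": 1.0}
query_vec = [1.0, 0.0]
doc_vecs = {"CWE-89": [0.9, 0.1], "CWE-564": [0.8, 0.6], "CWE-943": [0.7, 0.7]}
semantic_scores = {doc: cosine(query_vec, vec) for doc, vec in doc_vecs.items()}

ranked = hybrid_rank(keyword_scores, semantic_scores)
```

A document found by only one of the two searches still appears in the ranking, which is the point of running both: the exact-ID hit and the conceptual neighbor end up in one list.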
Tools (3)
- grimoire_search: Performs hybrid keyword and semantic search against the indexed security documentation.
- grimoire_status: Returns the current status and statistics of the indexed security database.
- grimoire_quality: Checks the quality or integrity of the indexed data.
Configuration
{
  "mcpServers": {
    "grimoire": {
      "command": "python",
      "args": ["/path/to/grimoire/mcp_server.py"]
    }
  }
}