Notebook Library MCP Server
Token-efficient document retrieval for substrate AI agents. Drop PDFs, text files, and markdown into notebook folders, and they are chunked, embedded, and indexed for semantic search. Queries return only the most relevant passages (~2,500 tokens) instead of loading entire documents (50,000+ tokens).
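The retrieval idea described above can be sketched as follows: rank chunks by cosine similarity to the query embedding, then keep the best ones until a token budget is reached. This is a plain-Python illustration only. The server itself stores and searches vectors with ChromaDB, and `select_passages`, the greedy cut-off, and the word-count token estimate are all assumptions made for the example.

```python
import math
from typing import List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_passages(
    query_vec: List[float],
    chunks: List[Tuple[str, List[float]]],
    token_budget: int = 2500,
) -> List[str]:
    """Return chunk texts, most similar first, until the token budget is spent.

    Hypothetical helper: token counts are approximated by whitespace-separated
    words, a rough stand-in for a real tokenizer.
    """
    ranked = sorted(chunks, key=lambda c: cosine_similarity(query_vec, c[1]), reverse=True)
    selected, used = [], 0
    for text, _vec in ranked:
        cost = len(text.split())
        if used + cost > token_budget:
            break  # stop at the first passage that would overflow the budget
        selected.append(text)
        used += cost
    return selected
```

Stopping at the first passage that overflows keeps the result in strict relevance order. Skipping it and trying smaller passages instead would fill the budget more fully, at the cost of that ordering.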
What It Does
Your AI agent gets a notebook_library tool with these actions:
| Action | Description |
|---|---|
| `list_notebooks` | See all available notebooks |
| `create_notebook` | Create a new notebook collection |
| `query_notebook` | Semantic search within a notebook (the main one!) |
| `browse_notebook` | List documents in a notebook |
| `read_document` | Deep-read a specific document chunk by chunk |
| `notebook_stats` | Get statistics about a notebook |
| `sync_notebook` | Re-sync after adding/changing files |
| `remove_document` | Remove a document from the search index |
Supported file formats: .pdf, .txt, .md, .text, .markdown
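A minimal sketch of how ingestion might decide which files in a notebook folder to pick up, assuming matching is by case-insensitive file extension. `SUPPORTED_EXTENSIONS`, `is_supported`, and `iter_supported_files` are hypothetical names, not the server's API.

```python
from pathlib import Path
from typing import Iterator

# Extensions listed as supported in this README.
SUPPORTED_EXTENSIONS = {".pdf", ".txt", ".md", ".text", ".markdown"}


def is_supported(path: Path) -> bool:
    """True if the file's extension is one the library can ingest (case-insensitive)."""
    return path.suffix.lower() in SUPPORTED_EXTENSIONS


def iter_supported_files(notebook_dir: Path) -> Iterator[Path]:
    """Yield supported files directly inside one notebook folder, in sorted order."""
    for path in sorted(notebook_dir.iterdir()):
        if path.is_file() and is_supported(path):
            yield path
```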
Architecture
```text
data/
├── notebooks/                 # Your document folders
│   ├── Research_Papers/       # Each subfolder = one notebook
│   │   ├── paper1.pdf
│   │   └── notes.md
│   └── Business_Docs/
│       └── plan.txt
└── notebook_chromadb/         # Vector database (auto-created)
    └── manifests/             # File change tracking

mcp_servers/
└── notebook_library/
    ├── server.py              # MCP server (if running standalone)
    ├── notebook_manager.py    # Core: ChromaDB ingestion + search
    ├── document_processor.py  # Text extraction + chunking
    ├── file_watcher.py        # Auto-ingestion on file changes
    └── requirements.txt

backend/tools/
├── notebook_library_tool.py           # Tool wrapper for consciousness loop
└── notebook_library_tool_schema.json  # Tool schema definition
```
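`document_processor.py` splits extracted text into chunks before embedding. This README does not state the chunk size or overlap, so the sketch below assumes word-based chunks with a hypothetical `chunk_size` and a small `overlap`, so that text near a chunk boundary appears in both neighbouring chunks. `chunk_words` is an illustrative name only.

```python
from typing import List


def chunk_words(text: str, chunk_size: int = 200, overlap: int = 40) -> List[str]:
    """Split text into chunks of up to chunk_size words.

    Consecutive chunks share `overlap` words. The default sizes are
    illustrative assumptions, not the values used by document_processor.py.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this chunk already reaches the end of the text
    return chunks
```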
Embedding strategy (multi-tier fallback):
- Hugging Face (`jinaai/jina-embeddings-v2-base-de`): local, free, multilingual
- Ollama (`nomic-embed-text`): local fallback if HF fails
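The fallback order above amounts to trying each embedding backend in turn and returning the first result that succeeds. In this sketch the backends are plain callables. Wiring them to the Hugging Face model and to Ollama is left out, and `embed_with_fallback` is a hypothetical name.

```python
from typing import Callable, List, Sequence, Tuple

# A backend takes a batch of texts and returns one vector per text.
Embedder = Callable[[List[str]], List[List[float]]]


def embed_with_fallback(
    texts: List[str],
    backends: Sequence[Tuple[str, Embedder]],
) -> Tuple[str, List[List[float]]]:
    """Return (backend_name, embeddings) from the first backend that succeeds.

    Backends are tried in order, e.g. Hugging Face first, then Ollama.
    Raises RuntimeError listing every failure if all backends fail.
    """
    errors = []
    for name, embed in backends:
        try:
            return name, embed(texts)
        except Exception as exc:  # any backend failure moves on to the next tier
            errors.append(f"{name}: {exc}")
    raise RuntimeError("All embedding backends failed: " + "; ".join(errors))
```

One caveat: vectors from different models are not comparable, so a collection should be queried with the same model that embedded it. Returning the backend name makes it possible to record which model was used.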
No external API keys needed. Everything runs locally.
Setup Guide
1. Install Dependencies
From your substrate root:
```bash
pip install -r mcp_servers/notebook_library/requirements.txt
```
Key dependencies:
- `chromadb==0.4.18`: vector database
- `transformers` + `torch`: Hugging Face embeddings (primary)
- `ollama`: embedding fallback
- `PyMuPDF`: PDF text extraction
- `watchdog`: file system monitoring
Note: First run will download the Hugging Face embedding model (~270MB). This is a one-time download.
2. Create Data Directories
```bash
mkdir -p data/notebooks
mkdir -p data/notebook_chromadb
```
3. Copy the MCP Server Files
Copy the entire mcp_servers/notebook_library/ directory into your substrate:
```text
your_substrate/
└── mcp_servers/
    └── notebook_library/
        ├── __init__.py
        ├── server.py
        ├── notebook_manager.py
        ├── document_processor.py
        ├── file_watcher.py
        └── requirements.txt
```
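After copying, a quick check that every expected file landed in place can save a confusing import error later. This is a hypothetical helper, not part of the server; its file list mirrors the tree above.

```python
from pathlib import Path
from typing import List

# Files this README lists under mcp_servers/notebook_library/.
EXPECTED_FILES = [
    "__init__.py",
    "server.py",
    "notebook_manager.py",
    "document_processor.py",
    "file_watcher.py",
    "requirements.txt",
]


def missing_server_files(substrate_root: Path) -> List[str]:
    """Return the expected server files that are absent under substrate_root."""
    package_dir = substrate_root / "mcp_servers" / "notebook_library"
    return [name for name in EXPECTED_FILES if not (package_dir / name).is_file()]


if __name__ == "__main__":
    # Run from your substrate root.
    missing = missing_server_files(Path("."))
    print("All files present" if not missing else f"Missing: {', '.join(missing)}")
```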
4. Copy the Tool Wrapper
Copy these two files into your backend/tools/ directory:
- `backend/tools/notebook_library_tool.py`: the tool function your consciousness loop calls. It imports `NotebookManager` directly (no subprocess).
- `backend/tools/notebook_library_tool_schema.json`: the tool schema, so your agent knows how to call it.
5. Register the Tool in Your Consciousness Loop
Three integration points:
a) Import in `integration_tools.py`
Add to your imports:
```python
from tools.notebook_library_tool import notebook_library_tool as _notebook_library_tool
```
Add the wrapper method to your IntegrationTools class:
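```python
# Requires `from typing import Any, Dict` at the top of integration_tools.py.
def notebook_library(self, **kwargs) -> Dict[str, Any]:
    """Notebook Library: token-efficient document retrieval."""
    try:
        return _notebook_library_tool(**kwargs)
    except Exception as e:
        # Return a structured error instead of crashing the consciousness loop.
        return {
            "status": "error",
            "message": f"Notebook library error: {e}",
        }
```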
Add 'notebook_library_tool' to your tool schema loading list so the JSON schema gets picked up.
b) Add tool call handler in `consciousness_loop.py`
In your tool execution block (where you handle elif tool_name == "..." cases), add:
```python
elif tool_name == "notebook_library":
    result = self.tools.notebook_library(**arguments)
```
c) Verify schema loading
The tool schema file (notebook_library_tool_schema.json) must be in backend/tools/ alongside your other tool schemas. The schema loader should pick it up automatically if it follows the same pattern as your other tools.
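To confirm the schema is discoverable, a check like the one below loads `<name>_schema.json` from `backend/tools/` for each name in your schema loading list and fails loudly if a file is missing or is not valid JSON. The file naming matches this README, but `load_tool_schemas` is a hypothetical stand-in for whatever loader your substrate actually uses.

```python
import json
from pathlib import Path
from typing import Any, Dict, Iterable


def load_tool_schemas(tools_dir: Path, tool_names: Iterable[str]) -> Dict[str, Any]:
    """Load '<tool>_schema.json' for each name; raise if any file is missing or invalid."""
    schemas = {}
    for name in tool_names:
        schema_path = tools_dir / f"{name}_schema.json"
        if not schema_path.is_file():
            raise FileNotFoundError(f"Schema not found: {schema_path}")
        with schema_path.open(encoding="utf-8") as fh:
            schemas[name] = json.load(fh)  # raises json.JSONDecodeError on bad JSON
    return schemas
```

For example, `load_tool_schemas(Path("backend/tools"), ["notebook_library_tool"])` run from the substrate root should return without raising once the schema file is in place.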
6. Add Documents
Create notebook folders and drop files in:
```bash
mkdir -p data/notebooks/My_Research
cp ~/some_paper.pdf data/notebooks/My_Research/
cp ~/notes.md data/notebooks/My_Research/
```
Documents are auto-ingested when your agent first queries the notebook, or you can trigger a manual sync via the sync_notebook action.
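A sync only needs to touch files that changed. The `manifests/` directory is described as file change tracking, but its format is not documented here, so this sketch assumes a manifest mapping each filename to a SHA-256 content hash and compares it with the folder's current state. `file_hash` and `diff_manifest` are hypothetical helpers.

```python
import hashlib
from pathlib import Path
from typing import Dict, List, Tuple


def file_hash(path: Path) -> str:
    """SHA-256 hex digest of a file's bytes, read in 64 KiB blocks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for block in iter(lambda: fh.read(65536), b""):
            digest.update(block)
    return digest.hexdigest()


def diff_manifest(
    old: Dict[str, str], current: Dict[str, str]
) -> Tuple[List[str], List[str], List[str]]:
    """Compare {filename: hash} maps and return (added, changed, removed), each sorted."""
    added = sorted(name for name in current if name not in old)
    removed = sorted(name for name in old if name not in current)
    changed = sorted(
        name for name in current if name in old and current[name] != old[name]
    )
    return added, changed, removed
```

Under this scheme, added and changed files would be re-chunked and re-embedded, while removed files would be dropped from the index, much as `remove_document` does for a single file.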
Configuration
```json
{
  "mcpServers": {
    "notebook_library": {
      "command": "python",
      "args": ["path/to/mcp_servers/notebook_library/server.py"]
    }
  }
}
```