What are the requirements for Zotero Chunk RAG?

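Zotero Chunk RAG needs Python 3.10+ and a Zotero installation with PDFs in storage/. Two environment variables are optional: GEMINI_API_KEY (for Gemini embeddings; not needed when embedding_provider is set to "local") and ANTHROPIC_API_KEY (for vision-based table extraction). You'll also need a compatible MCP client such as Claude Desktop or Claude Code.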

Is Zotero Chunk RAG free to use?

Yes, Zotero Chunk RAG is open source and free to use. You can find the source code on GitHub.

What MCP clients support Zotero Chunk RAG?

Zotero Chunk RAG works with any MCP-compatible client including Claude Desktop (Anthropic's official desktop app), Claude Code (CLI tool), Cursor, and other editors with MCP support.

How do I configure Zotero Chunk RAG?

Configure Zotero Chunk RAG by adding it to your MCP client's config file. Supported clients include Claude Code, Cursor, Codex, Windsurf, and Claude Desktop.

Zotero Chunk RAG MCP Server

Semantic search over a Zotero library using Gemini and ChromaDB.

★ 3 · ccam80/deep-zotero · by ccam80 · updated Apr 22, 2026

Add it to Claude Code

claude mcp add zotero-chunk-rag -- python -m deep_zotero.server

What it does

Section-aware text chunking with overlap
Vision-based table extraction using Claude Haiku 4.5
Figure detection and caption-based indexing
Incremental indexing of Zotero SQLite database
Support for both Gemini and local embedding providers

Tools

index_library: Triggers the indexing of the Zotero library to extract text, tables, and figures. (Only this tool is listed here; the README below describes 13 tools in total.)

Environment Variables

GEMINI_API_KEY: API key for Gemini embeddings

ANTHROPIC_API_KEY: API key for vision-based table extraction

Try it

→ Search my Zotero library for recent papers discussing transformer architecture.

→ Find the table in my saved PDFs that compares model performance metrics.

→ Retrieve the specific passage from my research notes regarding the methodology of the 2023 study.

→ Index my Zotero library to ensure all new PDFs are searchable.

Original README from ccam80/deep-zotero

DeepZotero

Semantic search over a Zotero library. PDFs are extracted (text, tables, figures), chunked, embedded, and stored in ChromaDB. An MCP server exposes the index to Claude Code (or any MCP client) as 13 tools for semantic search, boolean search, table/figure search, context expansion, citation graph lookup, indexing, and cost tracking.

What it extracts

Text — section-aware chunks with overlap, classified by document section (abstract, methods, results, etc.)
Tables — vision-based extraction via Claude Haiku 4.5. Each table is rendered to PNG and transcribed to structured markdown (headers, rows, footnotes). Falls back to PyMuPDF heuristics if vision is disabled.
Figures — detected with captions, extracted as PNGs, searchable by caption text.
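As an illustration of what "section-aware chunks with overlap" can mean in practice, here is a minimal Python sketch. It is not DeepZotero's actual chunker: the function name `chunk_section`, the character-based window sizes, and the dictionary layout are all assumptions made for this example.

```python
def chunk_section(section: str, text: str, chunk_size: int = 400, overlap: int = 100) -> list[dict]:
    """Split one section's text into overlapping character windows.

    Each chunk carries the section label so a search hit can be traced
    back to the part of the paper (abstract, methods, results, ...).
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    start = 0
    while start < len(text):
        chunks.append({
            "section": section,
            "start": start,
            "text": text[start:start + chunk_size],
        })
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
        start += step
    return chunks
```

With `chunk_size=4` and `overlap=2`, the string `"abcdefghij"` yields the windows `abcd`, `cdef`, `efgh`, `ghij`: each chunk repeats the last two characters of the previous one, so a sentence that straddles a boundary still appears whole in at least one chunk when the overlap is large enough.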

Requirements

Python 3.10+
A Gemini API key for embeddings (unless using embedding_provider: "local")
An Anthropic API key for vision-based table extraction (optional but recommended)
A Zotero installation with PDFs in storage/

Install

python -m venv .venv
# Windows layout shown; on macOS/Linux use .venv/bin/python instead.
.venv/Scripts/python.exe -m pip install -e .

For vision table extraction:

.venv/Scripts/python.exe -m pip install -e ".[vision]"

Setup

1. Configuration

mkdir -p ~/.config/deep-zotero
cp config.example.json ~/.config/deep-zotero/config.json

Edit ~/.config/deep-zotero/config.json:

{
    "zotero_data_dir": "~/Zotero",
    "chroma_db_path": "~/.local/share/deep-zotero/chroma",
    "gemini_api_key": "YOUR_GEMINI_KEY",
    "anthropic_api_key": "YOUR_ANTHROPIC_KEY"
}

All other fields have sensible defaults. You can also set GEMINI_API_KEY and ANTHROPIC_API_KEY as environment variables instead.
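The relationship between the config file and the two environment variables can be sketched as follows. This is an assumption-laden illustration, not DeepZotero's loader: the function `load_config`, the `ENV_FALLBACKS` mapping, and the precedence (a value in the file wins over the environment) are hypothetical; the README only states that either source may be used.

```python
import json
import os
from pathlib import Path

# Hypothetical mapping from config field to environment variable name.
ENV_FALLBACKS = {
    "gemini_api_key": "GEMINI_API_KEY",
    "anthropic_api_key": "ANTHROPIC_API_KEY",
}


def load_config(path, environ=None):
    """Read a JSON config and fill missing API keys from the environment."""
    environ = os.environ if environ is None else environ
    config = json.loads(Path(path).expanduser().read_text(encoding="utf-8"))

    # Assumed precedence: keep a key from the file, otherwise use the env var.
    for field, env_name in ENV_FALLBACKS.items():
        if not config.get(field) and environ.get(env_name):
            config[field] = environ[env_name]

    # Expand "~" in path-like fields so later code sees real paths.
    for field in ("zotero_data_dir", "chroma_db_path"):
        if field in config:
            config[field] = str(Path(config[field]).expanduser())
    return config
```

Calling `load_config("~/.config/deep-zotero/config.json")` would then return a dictionary in which both keys are populated if they exist in either place.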

2. API keys

Gemini (required for default embeddings): Get a key at aistudio.google.com/app/apikey. Set it as gemini_api_key in config or GEMINI_API_KEY env var. If you don't want to use Gemini, set "embedding_provider": "local" to use ChromaDB's built-in all-MiniLM-L6-v2 model (no API key needed, lower quality).

Anthropic (required for vision table extraction): Get a key at console.anthropic.com. Set it as anthropic_api_key in config or ANTHROPIC_API_KEY env var. Without this key, tables are still extracted via PyMuPDF heuristics but accuracy on complex tables is lower. Vision extraction uses the Anthropic Batch API with Claude Haiku 4.5 — cost is roughly $0.016 per table, with prompt caching reducing cost on large batches.
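For budgeting, the per-table figure above can be turned into a rough estimate. The sketch below is simple arithmetic on the quoted rate of roughly $0.016 per table; the function name is hypothetical, and the result ignores any savings from prompt caching, so treat it as an approximate upper bound rather than a billing prediction.

```python
def estimate_vision_cost(num_tables: int, cost_per_table: float = 0.016) -> float:
    """Return an approximate cost in USD for vision table extraction."""
    if num_tables < 0:
        raise ValueError("num_tables must be non-negative")
    # Round to cents; the quoted rate is itself only approximate.
    return round(num_tables * cost_per_table, 2)


# A library with 500 tables comes to about 8 USD at the quoted rate.
print(estimate_vision_cost(500))
```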

To disable vision extraction entirely:

{
    "vision_enabled": false
}

3. Index your library

deep-zotero-index -v

To test with a subset first:

deep-zotero-index --limit 10 -v

This reads the Zotero SQLite database (read-only, safe while Zotero is open), extracts text/tables/figures from each PDF, chunks the text, embeds via Gemini, and stores everything in ChromaDB.
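One way to read an SQLite file without risking writes is to open it through a `mode=ro` URI, which Python's standard `sqlite3` module supports. The sketch below shows that technique in isolation. Whether DeepZotero opens `zotero.sqlite` exactly this way is an assumption, and the helper names are hypothetical; the example only lists table names and makes no claims about Zotero's schema.

```python
import sqlite3
from pathlib import Path


def open_read_only(db_path):
    """Open an SQLite file so that writes through this connection fail."""
    # as_uri() produces a file: URI; "?mode=ro" asks SQLite for read-only access.
    uri = Path(db_path).expanduser().resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)


def list_tables(conn):
    """Return the names of all tables in the database, sorted."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [name for (name,) in rows]
```

Any `INSERT`, `UPDATE`, or `DELETE` issued on a connection opened this way raises `sqlite3.OperationalError`, which guards against accidental modification of the library.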

CLI options:

Flag	Description
`--force`	Delete and rebuild index for all matching items
`--limit N`	Only index N items
`--item-key KEY`	Index a single Zotero item
`--title PATTERN`	Regex filter on title (case-insensitive)
`--no-vision`	Skip vision table extraction for this run
`--config PATH`	Use a different config file
`-v`	Debug logging
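The `--title PATTERN` flag is described as a case-insensitive regex filter. A minimal Python sketch of that behaviour is shown below. The function name is hypothetical, and matching anywhere in the title (`search` rather than `match`) is an assumption about how the filter is applied.

```python
import re


def filter_by_title(titles, pattern):
    """Keep titles that match the regex, ignoring case."""
    regex = re.compile(pattern, re.IGNORECASE)
    return [title for title in titles if regex.search(title)]


titles = [
    "Attention Is All You Need",
    "Deep Residual Learning",
    "Graph attention networks",
]
print(filter_by_title(titles, "attention"))
```

Under these assumptions, the pattern `attention` selects the first and third titles, while an anchored pattern such as `^deep` selects only the second.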

The indexer is incremental — it only processes items not already in the index. Use --force after changing chunk_size, embedding_dimensions, or ocr_language.
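The incremental behaviour can be pictured as a set difference between the items in the library and the items already indexed, with `--force` bypassing the check. The following is a sketch under that assumption; the function name and the example item keys are hypothetical and not taken from DeepZotero's code.

```python
def select_items_to_index(library_keys, indexed_keys, force=False):
    """Return the item keys that still need processing, in library order."""
    if force:
        # A forced run rebuilds everything, regardless of what is indexed.
        return list(library_keys)
    already_indexed = set(indexed_keys)
    return [key for key in library_keys if key not in already_indexed]


library = ["AAAA1111", "BBBB2222", "CCCC3333"]
indexed = ["AAAA1111"]
print(select_items_to_index(library, indexed))
print(select_items_to_index(library, indexed, force=True))
```

This also shows why a forced rebuild is needed after changing settings such as `chunk_size`: items already in the index would otherwise be skipped and keep their old chunks and embeddings.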

You can also trigger indexing from the MCP client via the index_library tool.

4. Register the MCP server

Add to your Claude Code settings (~/.claude/settings.json):

{
    "mcpServers": {
        "deep-zotero": {
            "command": "/path/to/.venv/bin/python",
            "args": ["-m", "deep_zotero.server"]
        }
    }
}

On Windows:

{
    "mcpServers": {
        "deep-zotero": {
            "command": "C:\\path\\to\\.venv\\Scripts\\python.exe",
            "args": ["-m", "deep_zotero.server"]
        }
    }
}
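The two JSON snippets differ only in where the virtual environment keeps its interpreter. A small Python sketch can generate either variant from the location of the virtual environment. The function name and the example directories are hypothetical; the server name, module name, and JSON shape are copied from the snippets above.

```python
import json
from pathlib import PurePosixPath, PureWindowsPath


def mcp_server_entry(venv_dir: str, windows: bool = False) -> dict:
    """Build the mcpServers entry for a given virtual environment."""
    if windows:
        # Windows venvs place the interpreter under Scripts\python.exe.
        python = PureWindowsPath(venv_dir) / "Scripts" / "python.exe"
    else:
        # POSIX venvs place the interpreter under bin/python.
        python = PurePosixPath(venv_dir) / "bin" / "python"
    return {
        "mcpServers": {
            "deep-zotero": {
                "command": str(python),
                "args": ["-m", "deep_zotero.server"],
            }
        }
    }


# Example directory is a placeholder; substitute your own checkout.
print(json.dumps(mcp_server_entry("/home/me/deep-zotero/.venv"), indent=4))
```

`json.dumps` takes care of escaping the backslashes in Windows paths, which is easy to get wrong when editing the file by hand.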

Restart Claude Code. All 13 tools will be available.

Configuration reference

Zotero

Field	Default	Description
`zotero_data_dir`	`~/Zotero`	Path to Zotero's data directory (contains `zotero.sqlite` and `storage/`)
`chroma_db_path`	`~/.local/share/deep-zotero/chroma`	Where the ChromaDB index is stored on disk

Embedding

Field	Default	Description
`embedding_provider`	`"gemini"`	`"gemini"` for Gemini API, `"local"` for ChromaDB's built-in all-MiniLM-L6-v2 (no key needed)
`embedding_model`	`"gemini-embedding-001"`	Gemini model name (only used when provider is `"gemini"`)
`embedding_dimensions`	`768`	Output vector dimensions. `gemini-embedding-001` supports 64-3072. Changing requires `--force` re-indexing.

Frequently Asked Questions

What are the key features of Zotero Chunk RAG?

Section-aware text chunking with overlap. Vision-based table extraction using Claude Haiku 4.5. Figure detection and caption-based indexing. Incremental indexing of Zotero SQLite database. Support for both Gemini and local embedding providers.

What can I use Zotero Chunk RAG for?

Quickly locating specific data points within a large collection of academic PDFs. Extracting structured markdown tables from complex research papers. Performing semantic searches across a personal library of scientific literature. Automating the indexing of new research materials as they are added to Zotero.

How do I install Zotero Chunk RAG?

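Install Zotero Chunk RAG from a checkout of the repository by creating a virtual environment and installing the package in editable mode: python -m venv .venv && .venv/Scripts/python.exe -m pip install -e . (Windows path shown; on macOS or Linux the interpreter is at .venv/bin/python.)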
