# WET - Web Extended Toolkit MCP Server

mcp-name: io.github.n24q02m/wet-mcp

Open-source MCP server for web search, content extraction, library docs indexing with hybrid search, and multimodal analysis.
## Features
- Web Search -- Embedded SearXNG metasearch (Google, Bing, DuckDuckGo, Brave) with filters, semantic reranking, query expansion, and snippet enrichment
- Academic Research -- Search Google Scholar, Semantic Scholar, arXiv, PubMed, CrossRef, BASE
- Library Docs -- Auto-discover and index documentation with FTS5 hybrid search, HyDE-enhanced retrieval, and version-specific docs
- Content Extraction -- Clean content extraction (Markdown/Text), structured data extraction (LLM + JSON Schema), batch processing (up to 50 URLs), deep crawling, site mapping
- Local File Conversion -- Convert PDF, DOCX, XLSX, CSV, HTML, EPUB, PPTX to Markdown
- Media -- List, download, and analyze images, videos, audio files
- Anti-bot -- Stealth mode bypasses bot protection on Cloudflare-guarded pages and sites such as Medium, LinkedIn, and Twitter
- Zero Config -- Built-in local Qwen3 embedding + reranking, no API keys needed. Optional cloud providers (Jina AI, Gemini, OpenAI, Cohere)
- Sync -- Cross-machine sync of indexed docs via rclone (Google Drive, S3, Dropbox)
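The FTS5 hybrid search mentioned above can be pictured with a minimal sketch of its keyword half: a SQLite FTS5 index queried with bm25 ranking. This is illustrative only -- the table name, columns, and scoring below are assumptions, not wet-mcp's actual schema.

```python
# Minimal sketch of FTS5 keyword retrieval, the kind of index that
# hybrid docs search builds on (illustrative only, not wet-mcp's code).
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE docs USING fts5(title, body)")
conn.executemany(
    "INSERT INTO docs VALUES (?, ?)",
    [
        ("requests quickstart", "Send HTTP requests with the requests library."),
        ("asyncio intro", "Write concurrent code with async and await syntax."),
    ],
)
# bm25() returns a relevance score where lower is better, so ordering by
# it ascending puts the best keyword match first; a hybrid retriever
# would blend this score with embedding similarity before reranking.
rows = conn.execute(
    "SELECT title, bm25(docs) FROM docs WHERE docs MATCH ? ORDER BY bm25(docs)",
    ("http",),
).fetchall()
print(rows[0][0])  # prints "requests quickstart"
```

The FTS5 tokenizer lowercases by default, so the query `http` matches the stored text `HTTP`.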
## Quick Start

### Claude Code Plugin (Recommended)

```shell
claude plugin add n24q02m/wet-mcp
```
### MCP Server

Python 3.13 is required -- Python 3.14+ is not supported due to a SearXNG incompatibility, so you must pass `--python 3.13` when using `uvx`.

On first run, the server automatically installs SearXNG and Playwright Chromium, then starts the embedded search engine.
#### Option 1: uvx

```jsonc
{
  "mcpServers": {
    "wet": {
      "command": "uvx",
      "args": ["--python", "3.13", "wet-mcp@latest"],
      "env": {
        // optional: cloud embedding + reranking (Jina AI recommended)
        "API_KEYS": "JINA_AI_API_KEY:jina_...",
        // or: "API_KEYS": "GOOGLE_API_KEY:AIza...,COHERE_API_KEY:co-...",
        // without API_KEYS, uses the built-in local Qwen3 ONNX models (CPU, ~570MB first download)
        // optional: LiteLLM Proxy (production, self-hosted gateway)
        // "LITELLM_PROXY_URL": "http://10.0.0.20:4000",
        // "LITELLM_PROXY_KEY": "sk-your-virtual-key",
        // optional: higher rate limits for docs discovery (60 -> 5000 req/hr)
        "GITHUB_TOKEN": "ghp_...",
        // optional: restrict local file conversion to specific directories
        // "CONVERT_ALLOWED_DIRS": "/home/user/docs,/tmp/uploads",
        // optional: sync indexed docs across machines via rclone
        "SYNC_ENABLED": "true",  // default: false
        "SYNC_INTERVAL": "300"   // auto-sync every 5 min (0 = manual only)
      }
    }
  }
}
```
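As the config comments show, `API_KEYS` packs one or more provider keys into a single string of comma-separated `PROVIDER_NAME:value` pairs. A hypothetical parser for that format (an illustration of the format only, not wet-mcp's internal code):

```python
# Illustrative parser for the API_KEYS format:
# "JINA_AI_API_KEY:jina_...,GOOGLE_API_KEY:AIza..."
def parse_api_keys(raw: str) -> dict[str, str]:
    keys: dict[str, str] = {}
    for pair in raw.split(","):
        pair = pair.strip()
        if not pair:
            continue
        # split on the first colon only, so values may contain colons
        name, _, value = pair.partition(":")
        keys[name] = value
    return keys

print(parse_api_keys("JINA_AI_API_KEY:jina_x,COHERE_API_KEY:co-y"))
# {'JINA_AI_API_KEY': 'jina_x', 'COHERE_API_KEY': 'co-y'}
```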
#### Option 2: Docker

```json
{
  "mcpServers": {
    "wet": {
      "command": "docker",
      "args": [
        "run", "-i", "--rm",
        "--name", "mcp-wet",
        "-v", "wet-data:/data",
        "-e", "API_KEYS",
        "-e", "GITHUB_TOKEN",
        "-e", "SYNC_ENABLED",
        "-e", "SYNC_INTERVAL",
        "n24q02m/wet-mcp:latest"
      ],
      "env": {
        "API_KEYS": "JINA_AI_API_KEY:jina_...",
        "GITHUB_TOKEN": "ghp_...",
        "SYNC_ENABLED": "true",
        "SYNC_INTERVAL": "300"
      }
    }
  }
}
```
#### Pre-install (optional)

```shell
# Pre-download SearXNG, Playwright, embedding model (~570MB), and reranker model (~570MB)
uvx --python 3.13 wet-mcp warmup

# With cloud embedding (validates the API key, skips the local download if cloud works)
API_KEYS="GOOGLE_API_KEY:AIza..." uvx --python 3.13 wet-mcp warmup
```
## Tools

- `web_search` -- Performs web searches using embedded SearXNG with support for filters and semantic reranking.
- `extract_content` -- Cleans and extracts content from URLs into Markdown or structured JSON.
- `convert_file` -- Converts local files like PDF, DOCX, or XLSX to Markdown.

## Environment Variables

- `API_KEYS` -- Optional cloud provider keys for embedding and reranking (e.g., Jina AI, Google, Cohere).
- `GITHUB_TOKEN` -- Optional token to increase rate limits for documentation discovery.
- `SYNC_ENABLED` -- Enable cross-machine sync of indexed docs via rclone.
- `SYNC_INTERVAL` -- Auto-sync interval in seconds.

## Configuration
```json
{
  "mcpServers": {
    "wet": {
      "command": "uvx",
      "args": ["--python", "3.13", "wet-mcp@latest"],
      "env": {
        "SYNC_ENABLED": "true",
        "SYNC_INTERVAL": "300"
      }
    }
  }
}
```
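Once a client is connected, the tools listed earlier are invoked with standard MCP `tools/call` requests. A sketch for `web_search` (the `query` argument name is an assumption -- check the tool's actual schema via `tools/list`):

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "web_search",
    "arguments": { "query": "sqlite fts5 bm25 ranking" }
  }
}
```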