WET - Web Extended Toolkit MCP Server

Add it to Claude Code

Run this in a terminal.

claude mcp add wet-mcp -- uvx --python 3.13 wet-mcp@latest

Web search, content extraction, and library docs indexing with hybrid search.

mcp-name: io.github.n24q02m/wet-mcp

Open-source MCP Server for web search, content extraction, library docs & multimodal analysis.

Features

  • Web Search -- Embedded SearXNG metasearch (Google, Bing, DuckDuckGo, Brave) with filters, semantic reranking, query expansion, and snippet enrichment
  • Academic Research -- Search Google Scholar, Semantic Scholar, arXiv, PubMed, CrossRef, BASE
  • Library Docs -- Auto-discover and index documentation with FTS5 hybrid search, HyDE-enhanced retrieval, and version-specific docs
  • Content Extract -- Clean content extraction (Markdown/Text), structured data extraction (LLM + JSON Schema), batch processing (up to 50 URLs), deep crawling, site mapping
  • Local File Conversion -- Convert PDF, DOCX, XLSX, CSV, HTML, EPUB, PPTX to Markdown
  • Media -- List, download, and analyze images, videos, audio files
  • Anti-bot -- Stealth mode bypasses Cloudflare, Medium, LinkedIn, Twitter
  • Zero Config -- Built-in local Qwen3 embedding + reranking, no API keys needed. Optional cloud providers (Jina AI, Gemini, OpenAI, Cohere)
  • Sync -- Cross-machine sync of indexed docs via rclone (Google Drive, S3, Dropbox)

Quick Start

Claude Code Plugin (Recommended)

claude plugin add n24q02m/wet-mcp

MCP Server

Python 3.13 required -- Python 3.14+ is not supported due to SearXNG incompatibility. You must specify --python 3.13 when using uvx.

On first run, the server automatically installs SearXNG and Playwright Chromium, then starts the embedded search engine.

Option 1: uvx
{
  "mcpServers": {
    "wet": {
      "command": "uvx",
      "args": ["--python", "3.13", "wet-mcp@latest"],
      "env": {
        // -- optional: cloud embedding + reranking (Jina AI recommended)
        "API_KEYS": "JINA_AI_API_KEY:jina_...",
        // -- or: "API_KEYS": "GOOGLE_API_KEY:AIza...,COHERE_API_KEY:co-...",
        // -- without API_KEYS, uses built-in local Qwen3 ONNX models (CPU, ~570MB first download)
        // -- optional: LiteLLM Proxy (production, self-hosted gateway)
        // "LITELLM_PROXY_URL": "http://10.0.0.20:4000",
        // "LITELLM_PROXY_KEY": "sk-your-virtual-key",
        // -- optional: higher rate limits for docs discovery (60 -> 5000 req/hr)
        "GITHUB_TOKEN": "ghp_...",
        // -- optional: restrict local file conversion to specific directories
        // "CONVERT_ALLOWED_DIRS": "/home/user/docs,/tmp/uploads",
        // -- optional: sync indexed docs across machines via rclone
        "SYNC_ENABLED": "true",                    // default: false
        "SYNC_INTERVAL": "300"                     // auto-sync every 5min (0 = manual only)
      }
    }
  }
}
Option 2: Docker
{
  "mcpServers": {
    "wet": {
      "command": "docker",
      "args": [
        "run", "-i", "--rm",
        "--name", "mcp-wet",
        "-v", "wet-data:/data",
        "-e", "API_KEYS",
        "-e", "GITHUB_TOKEN",
        "-e", "SYNC_ENABLED",
        "-e", "SYNC_INTERVAL",
        "n24q02m/wet-mcp:latest"
      ],
      "env": {
        "API_KEYS": "JINA_AI_API_KEY:jina_...",
        "GITHUB_TOKEN": "ghp_...",
        "SYNC_ENABLED": "true",
        "SYNC_INTERVAL": "300"
      }
    }
  }
}

Pre-install (optional)

# Pre-download SearXNG, Playwright, embedding model (~570MB), and reranker model (~570MB)
uvx --python 3.13 wet-mcp warmup

# With cloud embedding (validates API key, skips local download if cloud works)
API_KEYS="GOOGLE_API_K

Tools (3)

web_search -- Performs web searches using embedded SearXNG, with support for filters and semantic reranking.
extract_content -- Cleans and extracts content from URLs into Markdown or structured JSON.
convert_file -- Converts local files such as PDF, DOCX, or XLSX to Markdown.

Environment Variables

API_KEYS -- Optional cloud provider keys for embedding and reranking (e.g., Jina AI, Google, Cohere).
GITHUB_TOKEN -- Optional token to increase rate limits for documentation discovery.
SYNC_ENABLED -- Enable cross-machine sync of indexed docs via rclone.
SYNC_INTERVAL -- Auto-sync interval in seconds.
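
The config examples above write API_KEYS as comma-separated NAME:value pairs. The following is a minimal Python sketch of how such a string could be split, useful for sanity-checking a value before pasting it into a config. The function name parse_api_keys is hypothetical and this is not the server's own parser; the placeholder key values are made up.

```python
def parse_api_keys(raw: str) -> dict[str, str]:
    """Split a 'NAME:value,NAME:value' string into a dict.

    Illustrative only: this mirrors the format shown in the config
    examples, not the parsing code used inside wet-mcp.
    """
    keys: dict[str, str] = {}
    for entry in raw.split(","):
        entry = entry.strip()
        if not entry:
            continue  # tolerate empty input and trailing commas
        # Split on the first colon only, so values may contain colons.
        name, sep, value = entry.partition(":")
        if not sep or not name.strip() or not value.strip():
            raise ValueError(f"expected NAME:value, got {entry!r}")
        keys[name.strip()] = value.strip()
    return keys


if __name__ == "__main__":
    # Placeholder values, not real keys.
    print(parse_api_keys("GOOGLE_API_KEY:example-1,COHERE_API_KEY:example-2"))
```

A malformed entry (one without a colon, or with an empty name or value) raises ValueError, so a typo surfaces before the server starts rather than as a silent fallback to the local models.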

Configuration

claude_desktop_config.json
{"mcpServers": {"wet": {"command": "uvx", "args": ["--python", "3.13", "wet-mcp@latest"], "env": {"SYNC_ENABLED": "true", "SYNC_INTERVAL": "300"}}}}

Try it

Search for the latest documentation on the MCP protocol and summarize the key features.
Extract the content from this URL and convert it into a clean Markdown format.
Convert the attached PDF report into Markdown so I can analyze the text.
Find recent academic papers on LLM efficiency using Google Scholar.

Frequently Asked Questions

What are the key features of WET - Web Extended Toolkit?

Embedded SearXNG metasearch with semantic reranking. Academic research search across Google Scholar, arXiv, and PubMed. Clean content extraction to Markdown or structured JSON. Local file conversion for PDF, DOCX, XLSX, and more. Cross-machine sync of indexed documentation via rclone.

What can I use WET - Web Extended Toolkit for?

Automating research by aggregating results from multiple search engines and academic databases. Converting complex document formats like PDF or Excel into LLM-readable Markdown. Indexing library documentation for local RAG workflows without needing external API keys. Bypassing anti-bot protections on sites like Medium or Twitter for data extraction.

How do I install WET - Web Extended Toolkit?

Install WET - Web Extended Toolkit by running: claude plugin add n24q02m/wet-mcp

What MCP clients work with WET - Web Extended Toolkit?

WET - Web Extended Toolkit works with any MCP-compatible client including Claude Desktop, Claude Code, Cursor, and other editors with MCP support.
