🚀 Production MCP Web Scraper Server

A modular, production-ready MCP server built with the official MCP Python SDK. Optimized for Render deployment with clean separation of concerns.

šŸ“ Project Structure

mcp-web-scraper/
├── server.py             # Main server entry point
├── tools/
│   ├── __init__.py       # Tools package initialization
│   ├── search.py         # Search tools (web_search, news_search, etc.)
│   └── scraping.py       # Scraping tools (scrape_html, extract_article, etc.)
├── utils/
│   ├── __init__.py       # Utils package initialization
│   └── helpers.py        # Helper functions (clean_text, validate_url)
├── requirements.txt      # Python dependencies
├── render.yaml           # Render deployment configuration
├── .gitignore            # Git ignore rules
├── README.md             # This file
└── config.example.json   # Claude Desktop config example
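
The helper functions named in `utils/helpers.py` are used by the tool modules; a minimal sketch of what they might look like, using only the standard library (the actual implementations in the repository may differ):

```python
# Hypothetical sketch of utils/helpers.py — illustrative, not the repo's code.
import re
from urllib.parse import urlparse


def clean_text(text: str) -> str:
    """Collapse runs of whitespace and strip the result."""
    return re.sub(r"\s+", " ", text).strip()


def validate_url(url: str) -> bool:
    """Accept only absolute http(s) URLs with a host component."""
    parsed = urlparse(url)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)
```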

✨ Features

šŸ” Search Tools (`tools/search.py`)

  • web_search - DuckDuckGo web search
  • news_search - News articles with metadata
  • search_and_scrape - Search + content extraction
  • smart_search - Adaptive search (quick/standard/comprehensive)

📄 Scraping Tools (`tools/scraping.py`)

  • scrape_html - HTML scraping with CSS selectors
  • extract_article - Clean article extraction
  • extract_links - Link extraction with filtering
  • extract_metadata - Page metadata & Open Graph
  • scrape_table - Table data extraction
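
To give a feel for what a tool like extract_links does, here is a self-contained sketch using only the standard library's `html.parser` (the server itself likely uses a richer HTML parser; this is illustrative, not the repo's code):

```python
# Illustrative link extraction with filtering — not the server's implementation.
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags as the parser walks the document."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def extract_links(html: str, contains: str = "") -> list[str]:
    """Return all hrefs in the page, optionally filtered by substring."""
    collector = LinkCollector()
    collector.feed(html)
    return [link for link in collector.links if contains in link]
```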

🚀 Quick Deploy to Render

Step 1: Create Project Structure

mkdir mcp-web-scraper
cd mcp-web-scraper

# Create directory structure
mkdir -p tools utils

# Create all files (copy from artifacts above):
# - server.py
# - tools/__init__.py
# - tools/search.py
# - tools/scraping.py
# - utils/__init__.py
# - utils/helpers.py
# - requirements.txt
# - render.yaml
# - .gitignore
# - README.md

Step 2: Push to GitHub

git init
git add .
git commit -m "Initial commit: Modular MCP Web Scraper"
git remote add origin https://github.com/YOUR_USERNAME/mcp-web-scraper.git
git push -u origin main

Step 3: Deploy on Render

  1. Go to render.com
  2. Click "New +" → "Web Service"
  3. Connect your GitHub repository
  4. Render auto-detects render.yaml
  5. Click "Create Web Service"
  6. Wait 2-3 minutes ✨

Step 4: Get Your URL

Your service: https://your-app.onrender.com
MCP endpoint: https://your-app.onrender.com/mcp

🔌 Connect to Claude Desktop

Config Location

  • macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
  • Windows: %APPDATA%\Claude\claude_desktop_config.json

Configuration

{
  "mcpServers": {
    "web-scraper": {
      "type": "streamable-http",
      "url": "https://your-app.onrender.com/mcp"
    }
  }
}

Restart Claude Desktop after updating config!

💻 Local Development

# Clone and setup
git clone https://github.com/YOUR_USERNAME/mcp-web-scraper.git
cd mcp-web-scraper

# Create virtual environment
python3 -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Run server
python server.py

Server runs at http://localhost:8000/mcp

Test Locally

# List tools
curl -X POST http://localhost:8000/mcp \
  -H "Content-Type: application/json" \
  -d '{"jsonrpc":"2.0","id":1,"method":"tools/list"}'

# Test web search
curl -X POST http://localhost:8000/mcp \
  -H "Content-Type: application/json" \
  -d '{
    "jsonrpc":"2.0",
    "id":2,
    "method":"tools/call",
    "params":{
      "name":"web_search",
      "arguments":{"query":"AI news","max_results":3}
    }
  }'

šŸ› ļø Adding New Tools

1. Search Tool Example

Edit tools/search.py:

@mcp.tool()
def my_custom_search(query: str) -> dict:
    """Your custom search tool"""
    # Implementation here
    return {"success": True, "data": []}

2. Scraping Tool Example

Edit tools/scraping.py:

@mcp.tool()
def my_custom_scraper(url: str) -> dict:
    """Your custom scraper"""
    # Implementation here
    return {"success": True, "content": ""}

3. Deploy Changes

git add .
git commit -m "Add new tools"
git push origin main
# Render auto-deploys!

📊 Monitoring

View Logs

  1. Render Dashboard → Your Service
  2. Click "Logs" tab
  3. View real-time logs

Health Check

curl https://your-app.onrender.com/health

🎯 Architecture Benefits

✅ Modular Design

  • Separation of concerns - Each file has one responsibility
  • Easy to maintain - Find and update code quickly
  • Scalable - Add new tools without touching existing code

✅ Clean Code

  • Type hints - Better IDE support and error catching
  • Logging - Track all operations
  • Error handling - Graceful failures with detailed errors
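
The error-handling convention — tools that fail return a structured payload instead of raising — can be sketched as a small decorator (hypothetical; the server's actual pattern may differ, but the `{"success": ..., "error": ...}` shape matches the tool examples above):

```python
# Hypothetical error wrapper mirroring the {"success": ..., "error": ...}
# payload shape used by the tools — not the server's actual code.
import functools
import logging

logger = logging.getLogger("mcp-web-scraper")


def graceful(func):
    """Catch exceptions and return them as structured error payloads."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception as exc:
            logger.exception("tool %s failed", func.__name__)
            return {"success": False, "error": f"{type(exc).__name__}: {exc}"}

    return wrapper


@graceful
def fetch_title(url: str) -> dict:
    """Toy tool used to demonstrate the wrapper."""
    if not url.startswith("http"):
        raise ValueError("invalid URL")
    return {"success": True, "title": "example"}
```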

✅ Production Ready

  • Official MCP SDK - FastMCP framework
  • Streamable HTTP - Single-endpoint communication

Tools (9)

  • web_search - Performs a DuckDuckGo web search.
  • news_search - Searches for news articles with metadata.
  • search_and_scrape - Performs a search and extracts content from results.
  • smart_search - Adaptive search with quick, standard, or comprehensive modes.
  • scrape_html - Scrapes HTML content using CSS selectors.
  • extract_article - Cleans and extracts article content from a webpage.
  • extract_links - Extracts links from a webpage with filtering options.
  • extract_metadata - Extracts page metadata and Open Graph tags.
  • scrape_table - Extracts table data from a webpage.

Try it

  → Search for the latest news about AI regulations and summarize the key points.
  → Extract the main article content from this URL: https://example.com/article.
  → Find all the links on this webpage and filter for those pointing to documentation.
  → Scrape the data table from this URL and format it as a CSV.
  → Perform a comprehensive search for 'best practices for Python web scraping' and extract the metadata from the top result.

Frequently Asked Questions

What are the key features of MCP Web Scraper Server?

DuckDuckGo web and news search integration. Advanced HTML scraping with CSS selector support. Clean article and metadata extraction. Table data extraction capabilities. Modular, production-ready architecture optimized for Render.

What can I use MCP Web Scraper Server for?

Automating news discovery and summarization workflows. Extracting structured data from websites for research or analysis. Gathering metadata and Open Graph information for SEO auditing. Building custom AI-driven research agents that browse the live web.

How do I install MCP Web Scraper Server?

Install MCP Web Scraper Server by cloning the repository and running it locally:

git clone https://github.com/Aniruddha1202/mcp-web-scraper.git
cd mcp-web-scraper
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
python server.py

What MCP clients work with MCP Web Scraper Server?

MCP Web Scraper Server works with any MCP-compatible client including Claude Desktop, Claude Code, Cursor, and other editors with MCP support.
