Scrape and transcribe podcast episodes from YouTube or RSS feeds
MCP Podcast Scraper
An MCP (Model Context Protocol) server that scrapes and transcribes podcast episodes. Designed to work with Claude Code or Claude Desktop - you provide the podcast, the MCP transcribes it, and Claude summarizes it.
What It Does
- Scrapes podcasts from YouTube videos or RSS feeds
- Transcribes audio using Deepgram's fast Nova-2 model
- Organizes files by podcast name and episode date
- Tracks podcasts for new episodes
- Skips duplicates - won't re-scrape already processed episodes
- Finds incomplete work - lists episodes that need summarization
- Custom summary prompts - customize how Claude summarizes for your needs
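The tracking feature in the list above could work roughly as sketched here. The `TrackedPodcast` and `FeedItem` shapes, including the `lastEpisodeDate` field, are assumptions made for illustration; the actual format of tracking.json is not documented in this README.

```typescript
// Hypothetical shape of one tracking entry (not the real tracking.json schema).
interface TrackedPodcast {
  name: string;
  feedUrl: string;
  lastEpisodeDate: string; // ISO 8601 date of the newest episode already scraped
}

// Hypothetical, already-parsed RSS item.
interface FeedItem {
  title: string;
  published: string; // ISO 8601 date
}

// Return feed items published after the last scraped episode, oldest first.
function newEpisodes(tracked: TrackedPodcast, items: FeedItem[]): FeedItem[] {
  const cutoff = Date.parse(tracked.lastEpisodeDate);
  return items
    .filter((item) => Date.parse(item.published) > cutoff)
    .sort((a, b) => Date.parse(a.published) - Date.parse(b.published));
}
```

Comparing against a stored date keeps the check cheap, but it would miss episodes that a feed back-dates; comparing episode identifiers would be an alternative design.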
How It Works
You: "Check for new episodes and summarize them"
↓
Claude: Calls check_new_episodes() → Finds new episodes
↓
Claude: Calls scrape_podcast() → Downloads & transcribes
↓
Claude: Calls get_summary_prompt() → Reads your custom instructions
↓
Claude: Calls get_transcript() → Reads the transcript
↓
Claude: Summarizes following your prompt
↓
Claude: Calls save_summary() → Saves the .md file
↓
Done! transcript.md + summary.md saved
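The sequence above can be expressed as a small driver loop. This is only a sketch of the call order: `callTool` stands in for whatever MCP client is in use, and the argument names (`url`, `path`, `summary`) and return shapes are guesses, not the server's documented schema.

```typescript
// Hypothetical stand-in for an MCP client's tool-call function.
type CallTool = (name: string, args?: Record<string, unknown>) => Promise<unknown>;

// Runs the tool sequence from the diagram for every new episode.
// `summarize` represents the step Claude performs itself.
async function processNewEpisodes(
  callTool: CallTool,
  summarize: (prompt: string, transcript: string) => Promise<string>,
): Promise<number> {
  // Assumed return shape: a list of episodes, each with a URL.
  const episodes = (await callTool("check_new_episodes")) as { url: string }[];
  for (const ep of episodes) {
    // Assumed return shape: an object carrying the saved file path.
    const scraped = (await callTool("scrape_podcast", { url: ep.url })) as { path: string };
    const prompt = (await callTool("get_summary_prompt")) as string;
    const transcript = (await callTool("get_transcript", { path: scraped.path })) as string;
    const summary = await summarize(prompt, transcript);
    await callTool("save_summary", { path: scraped.path, summary });
  }
  return episodes.length; // number of episodes processed
}
```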
Installation Guide
Step 1: Prerequisites
Install required system tools (macOS):
# Install Homebrew if you don't have it
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
# Install yt-dlp (for YouTube) and ffmpeg (for audio)
brew install yt-dlp ffmpeg
Step 2: Clone & Build
# Clone the repository
git clone https://github.com/wkoleilat-happytitan/mcp-podcast-scraper.git
cd mcp-podcast-scraper
# Install dependencies
npm install
# Build
npm run build
Step 3: Get a Deepgram API Key
- Go to https://console.deepgram.com/
- Sign up (free tier includes $200 credit - enough for ~300 hours of audio)
- Create an API key
- Copy the key
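For orientation, this is roughly how such a key is used. The sketch builds (but does not send) a request to Deepgram's pre-recorded transcription endpoint with the Nova-2 model; `buildTranscribeRequest` and the `TranscribeRequest` type are illustrative names, and the server's own request code may differ.

```typescript
// Describes an HTTP request without sending it.
interface TranscribeRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
}

// Build a request for Deepgram's /v1/listen endpoint using the Nova-2 model.
// The caller would attach the audio bytes as the request body.
function buildTranscribeRequest(apiKey: string, mimeType = "audio/mpeg"): TranscribeRequest {
  if (apiKey.trim() === "") {
    throw new Error("Deepgram API key is empty");
  }
  return {
    url: "https://api.deepgram.com/v1/listen?model=nova-2&smart_format=true",
    method: "POST",
    headers: {
      Authorization: `Token ${apiKey}`, // Deepgram uses the "Token" scheme
      "Content-Type": mimeType,
    },
  };
}
```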
Step 4: Configure
Copy the example config file and add your API key:
# Copy the example config
cp config.example.json config.json
# Edit config.json and add your Deepgram API key
Your config.json should look like:
{
"outputDirectory": "./podcasts",
"deepgramApiKey": "YOUR_ACTUAL_DEEPGRAM_API_KEY",
"tempDirectory": "./temp"
}
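A quick way to catch a malformed config before starting the server is to validate the three keys shown above. This is a minimal sketch; `parseConfig` and `ScraperConfig` are hypothetical names, and the server may perform its own, different checks.

```typescript
// Mirrors the three keys in the example config.json above.
interface ScraperConfig {
  outputDirectory: string;
  deepgramApiKey: string;
  tempDirectory: string;
}

// Parse config.json text and verify each key is a non-empty string.
function parseConfig(jsonText: string): ScraperConfig {
  const raw: unknown = JSON.parse(jsonText);
  if (typeof raw !== "object" || raw === null) {
    throw new Error("config.json must contain a JSON object");
  }
  const obj = raw as Record<string, unknown>;
  const read = (key: string): string => {
    const value = obj[key];
    if (typeof value !== "string" || value.trim() === "") {
      throw new Error(`config.json: "${key}" must be a non-empty string`);
    }
    return value;
  };
  return {
    outputDirectory: read("outputDirectory"),
    deepgramApiKey: read("deepgramApiKey"),
    tempDirectory: read("tempDirectory"),
  };
}
```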
Important: Never commit config.json to git - it contains your API key! The .gitignore already excludes it.
Step 5: Add to Claude Code
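Add this to your MCP settings. The path ~/.cursor/mcp.json (also reachable via Settings → MCP) is Cursor's global MCP config; for Claude Code, the same block can go in a .mcp.json file at the project root: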
{
"mcpServers": {
"podcast-scraper": {
"command": "node",
"args": ["/FULL/PATH/TO/mcp-podcast-scraper/dist/index.js"]
}
}
}
Important: Replace /FULL/PATH/TO/ with the actual path to your installation.
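To avoid hand-editing the path, the block can be generated from the directory where the repository was cloned. This is a sketch for POSIX-style paths (as on macOS); `buildMcpConfig` is a hypothetical helper, not part of the project.

```typescript
// Produce the mcpServers JSON for a given absolute clone directory.
function buildMcpConfig(repoDir: string): string {
  if (!repoDir.startsWith("/")) {
    throw new Error("repoDir must be an absolute path");
  }
  // Drop trailing slashes so the joined path has exactly one separator.
  const entryPoint = `${repoDir.replace(/\/+$/, "")}/dist/index.js`;
  const config = {
    mcpServers: {
      "podcast-scraper": { command: "node", args: [entryPoint] },
    },
  };
  return JSON.stringify(config, null, 2);
}
```

Calling `buildMcpConfig` with the output of `pwd` from inside the cloned repository yields text that can be pasted into the settings file.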
Step 5 (Alternative): Add to Claude Desktop
Edit ~/Library/Application Support/Claude/claude_desktop_config.json:
{
"mcpServers": {
"podcast-scraper": {
"command": "node",
"args": ["/FULL/PATH/TO/mcp-podcast-scraper/dist/index.js"]
}
}
}
Then restart Claude Desktop.
File Structure
mcp-podcast-scraper/
├── config.example.json     # Template - copy to config.json
├── config.json             # Your config (git-ignored, contains API key)
├── tracking.example.json   # Example tracking file
├── tracking.json           # Your tracked podcasts (git-ignored)
├── prompts/
│   └── summary-prompt.md   # Customize how Claude summarizes (editable)
├── podcasts/               # Your transcripts & summaries (git-ignored)
├── src/                    # Source code
├── dist/                   # Compiled code (git-ignored)
└── node_modules/           # Dependencies (git-ignored)
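Inside podcasts/, files are organized by podcast name and episode date, and already processed episodes are skipped. One way that could look is sketched below; the exact layout (`<podcast-slug>/<YYYY-MM-DD>-<episode-slug>`), the `slugify` helper, and the `Episode` shape are assumptions for illustration, not the server's actual naming scheme.

```typescript
// Hypothetical episode shape; the real server's types may differ.
interface Episode {
  podcast: string;
  title: string;
  published: Date;
}

// Turn a free-form name into a filesystem-friendly slug.
function slugify(name: string): string {
  return name
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // collapse runs of other characters into "-"
    .replace(/^-+|-+$/g, ""); // trim leading and trailing "-"
}

// Assumed layout: <podcast-slug>/<YYYY-MM-DD>-<episode-slug> (date in UTC).
function episodeDir(ep: Episode): string {
  const date = ep.published.toISOString().slice(0, 10);
  return `${slugify(ep.podcast)}/${date}-${slugify(ep.title)}`;
}

// Keep only episodes whose directory has not been processed yet.
function skipDuplicates(episodes: Episode[], processed: Set<string>): Episode[] {
  return episodes.filter((ep) => !processed.has(episodeDir(ep)));
}
```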
Usage Examples
Scrape a Specific Episode
"Scrape this YouTube podcast: https://youtube.com/watch?v=..."
"Find and scrape the latest Lex Fridman episode"
Track Podcasts for New Episodes
"Track the Huberman Lab podcast: https://feeds.megaphone.fm/hubermanlab"
"Check my tracked podcasts for new episodes"
"List all podcasts I'm tracking"
Find Incomplete Work
"Show me episodes that need summaries"
"List incomplete episodes"
MCP Tools Reference
| Tool | Description |
|---|---|
| scrape_podcast | Scrape & transcribe an episode. Returns file path and preview. |
| get_transcript | Read the full transcript of a scraped episode. |
| get_summary_prompt | Get your custom summarization instructions. |
| save_summary | Save your generated summary to a markdown file. |
| check_new_episodes | Check tracked podcasts for new (unscraped) episodes. |
| list_incomplete | Find episodes with transcripts but no summaries. |
| search_podcast | Search YouTube or parse RSS feeds to find episodes. |
| add_tracking | Add a podcast RSS feed to your tracking list. |
| list_tracking | List all podcasts you're tracking. |
| remove_tracking | Remove a podcast from your tracking list. |
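As one illustration of the tools above, the check behind list_incomplete amounts to finding episode folders that contain transcript.md but no summary.md. The sketch below works on an in-memory listing; `listIncomplete` and its input shape are hypothetical, and the real tool presumably reads the output directory itself.

```typescript
// Input: episode directory -> names of the files it contains.
// Output: directories with a transcript but no summary, sorted by name.
function listIncomplete(episodeFiles: Record<string, string[]>): string[] {
  return Object.entries(episodeFiles)
    .filter(([, files]) => files.includes("transcript.md") && !files.includes("summary.md"))
    .map(([dir]) => dir)
    .sort();
}
```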