# mcp-video-analyzer

Featured in awesome-mcp-servers.

MCP server for video analysis — extracts transcripts, key frames, and metadata from video URLs. Supports Loom, direct video files (`.mp4`, `.webm`), and more.
No existing video MCP combines transcripts + visual frames + metadata in one tool. This one does.
## Installation

### Prerequisites
- **Node.js 18+** — required to run the server via `npx`
- **yt-dlp** (optional) — enables frame extraction via ffmpeg. Install with `pip install yt-dlp`
- **Chrome/Chromium** (optional) — fallback for frame extraction if yt-dlp is unavailable
Without yt-dlp or Chrome, the server still works — you'll get transcripts, metadata, and comments, just no frames.
### Claude Code (CLI)

```shell
claude mcp add video-analyzer -- npx mcp-video-analyzer@latest
```
Then restart Claude Code or start a new conversation.
### VS Code / Cursor
Add to your MCP settings file:
- **VS Code:** File → Preferences → Settings → search "MCP", or edit `~/.vscode/mcp.json` (`%APPDATA%\Code\User\mcp.json` on Windows)
- **Cursor:** Settings → MCP Servers → Add
```json
{
  "servers": {
    "mcp-video-analyzer": {
      "type": "stdio",
      "command": "npx",
      "args": ["mcp-video-analyzer@latest"]
    }
  }
}
```
Then reload the window (Ctrl+Shift+P → "Developer: Reload Window").
### Claude Desktop
Add to your Claude Desktop config file:
- **macOS:** `~/Library/Application Support/Claude/claude_desktop_config.json`
- **Windows:** `%APPDATA%\Claude\claude_desktop_config.json`
```json
{
  "mcpServers": {
    "video-analyzer": {
      "command": "npx",
      "args": ["mcp-video-analyzer@latest"]
    }
  }
}
```
Then restart Claude Desktop.
### Verify it works
Once installed, ask your AI assistant:
> Analyze this video: https://www.loom.com/share/bdebdfe44b294225ac718bad241a94fe

If the server is connected, it will automatically call the `analyze_video` tool.
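Behind that prompt, the client sends a standard MCP `tools/call` request to the server over stdio. The JSON-RPC message looks roughly like this (a sketch — the `url` argument name is an assumption, not confirmed against the server's published schema):

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "analyze_video",
    "arguments": {
      "url": "https://www.loom.com/share/bdebdfe44b294225ac718bad241a94fe"
    }
  }
}
```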
## Tools
### `analyze_video` — Full video analysis
Extracts everything from a video URL in one call:
> Analyze this video: https://www.loom.com/share/abc123...
Returns:
- Transcript with timestamps and speakers
- Key frames extracted via scene-change detection (automatically deduplicated)
- OCR text extracted from frames (code, error messages, UI text visible on screen)
- Annotated timeline merging transcript + frames + OCR into a unified "what happened when" view
- Metadata (title, duration, platform)
- Comments from viewers
- Chapters and AI summary (when available)
The AI will automatically call this tool when it sees a video URL — no need to ask.
Options:
- `detail` — analysis depth: `"brief"` (metadata + truncated transcript, no frames), `"standard"` (default), `"detailed"` (dense sampling, more frames)
- `fields` — array of specific fields to return, e.g. `["metadata", "transcript"]`. Available: `metadata`, `transcript`, `frames`, `comments`, `chapters`, `ocrResults`, `timeline`, `aiSummary`
- `maxFrames` (1-60, default depends on detail level) — cap on extracted frames
- `threshold` (0.0-1.0, default 0.1) — scene-change sensitivity
- `forceRefresh` — bypass cache and re-analyze
- `skipFrames` — skip frame extraction for transcript-only analysis
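Putting the options together, a hypothetical `arguments` payload for a targeted analysis might look like this (the `url` key is assumed; the option names come from the list above):

```json
{
  "url": "https://www.loom.com/share/bdebdfe44b294225ac718bad241a94fe",
  "detail": "standard",
  "fields": ["metadata", "transcript", "frames"],
  "maxFrames": 20,
  "threshold": 0.1
}
```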
### `get_transcript` — Transcript only
> Get the transcript from this video
Quick transcript extraction. Falls back to Whisper transcription when no native transcript is available.
### `get_metadata` — Metadata only
> What's this video about?
Returns metadata, comments, chapters, and AI summary without downloading the video.
### `get_frames` — Frames only
> Extract frames from this video with dense sampling
Two modes:
- Scene-change detection (default) — captures visual transitions
- Dense sampling (`dense: true`) — 1 frame/sec for full coverage
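To make the interaction between dense sampling and the `maxFrames` cap concrete, here is a minimal Python sketch. This is illustrative only — `dense_sample_timestamps` is a hypothetical helper, and the server's real spacing strategy may differ:

```python
def dense_sample_timestamps(duration_s: float, max_frames: int = 60) -> list[float]:
    """Timestamps for 1-frame/sec dense sampling, capped at max_frames.

    When the video is longer than max_frames seconds, the frames are
    spread evenly across the full duration instead of stopping early.
    (Sketch only; not the server's actual implementation.)
    """
    n = min(int(duration_s), max_frames)
    if n <= 0:
        return []
    step = duration_s / n
    return [round(i * step, 2) for i in range(n)]

# A 30 s clip yields one frame per second:
print(len(dense_sample_timestamps(30)))   # 30
# A 5-minute video hits the cap and is spread across the duration:
print(len(dense_sample_timestamps(300)))  # 60
```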
### `analyze_moment` — Deep-dive on a time range
> Analyze what happens between 1:30 and 2:00 in this video
Combines burst frame extraction + filtered transcript + OCR + annotated timeline for a focused segment. Use when you need to understand exactly what happens at a specific moment.
### `get_frame_at` — Single frame at a timestamp
> Show me the frame at 1:23 in this video
The AI reads the transcript, spots a critical moment, and requests the exact frame to see what's on screen.
### `get_frame_burst` — N frames in a time range
> Show me 10 frames between 0:15 and 0:17 of this video
For motion, vibration, animations, or fast scrolling — burst mode captures N frames in a narrow window so the AI can see frame-by-frame changes.
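Burst capture can be pictured as N evenly spaced capture times across the requested window. A small Python sketch (`burst_timestamps` is a hypothetical helper, not the server's implementation):

```python
def burst_timestamps(start_s: float, end_s: float, n: int) -> list[float]:
    """N evenly spaced timestamps across [start_s, end_s], endpoints included.

    Illustrative only — the server may space its burst frames differently.
    """
    if n <= 1:
        return [start_s]
    step = (end_s - start_s) / (n - 1)
    return [round(start_s + i * step, 3) for i in range(n)]

# 10 frames between 0:15 and 0:17 — roughly one every 0.22 s:
print(burst_timestamps(15.0, 17.0, 10))
```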
## Detail Levels
| Level | Frames | Transcript | OCR | Timeline | Use case |
|-------|--------|------------|-----|----------|----------|