Unified web layer for AI agents
Spectrawl
The unified web layer for AI agents. Search, browse, crawl, extract, and act on platforms — one package, self-hosted.
5,000 free searches/month via Gemini Grounded Search. Full page scraping, stealth browsing, multi-page crawling, structured extraction, AI browser agent, 24 platform adapters.
What It Does
AI agents need to interact with the web — searching, browsing pages, crawling sites, logging into platforms, posting content. Today you wire together Playwright + a search API + cookie managers + platform-specific scripts. Spectrawl is one package that does all of it.
npm install spectrawl
How It Works
Spectrawl searches via Gemini Grounded Search (Google-quality results), scrapes the top pages for full content, and returns everything to your agent. Your agent's LLM reads the actual sources and forms its own answer — no pre-chewed summaries.
Quick Start
npm install spectrawl
export GEMINI_API_KEY=your-free-key # Get one at aistudio.google.com
const { Spectrawl } = require('spectrawl')
const web = new Spectrawl()
// Deep search — returns sources for your agent/LLM to process
const result = await web.deepSearch('how to build an MCP server in Node.js')
console.log(result.sources) // [{ title, url, content, score }]
// With AI summary (opt-in — uses extra Gemini call)
const withAnswer = await web.deepSearch('query', { summarize: true })
console.log(withAnswer.answer) // AI-generated answer with [1] [2] citations
// Fast mode — snippets only, skip scraping
const fast = await web.deepSearch('query', { mode: 'fast' })
// Basic search — raw results
const basic = await web.search('query')
Why no summary by default? Your agent already has an LLM. If we summarize AND your agent summarizes, you're paying two LLMs for one answer. We return rich sources — your agent does the rest.
Spectrawl vs Others
| Tavily | Crawl4AI | Firecrawl | Stagehand | Spectrawl | |
|---|---|---|---|---|---|
| Speed | ~2s | ~5s | ~3s | ~3s | ~6-10s |
| Free tier | 1,000/mo | Unlimited | 500/mo | None | 5,000/mo |
| Returns | Snippets + AI | Markdown | Markdown/JSON | Structured | Full page + structured |
| Self-hosted | No | Yes | Yes | Yes | Yes |
| Anti-detect | No | No | No | No | Yes (Camoufox) |
| Block detection | No | No | No | No | 8 services |
| CAPTCHA solving | No | No | No | No | Yes (Gemini Vision) |
| Structured extraction | No | No | No | Yes | Yes |
| NL browser agent | No | No | No | Yes | Yes |
| Network capturing | No | Yes | No | No | Yes |
| Multi-page crawl | No | Yes | Yes | No | Yes (+ sitemap) |
| Platform posting | No | No | No | No | 24 adapters |
| Auth management | No | No | No | No | Cookie store + refresh |
Search
Two modes: basic search and deep search.
Basic Search
const results = await web.search('query')
Returns raw search results from the engine cascade. Fast, lightweight.
Deep Search
const results = await web.deepSearch('query', { summarize: true })
Full pipeline: query expansion → parallel search → merge/dedup → rerank → scrape top N → optional AI summary with citations.
Search Engine Cascade
Default cascade: Gemini Grounded → Tavily → Brave
Gemini Grounded Search gives you Google-quality results through the Gemini API. Free tier: 5,000 grounded queries/month.
| Engine | Free Tier | Key Required | Default |
|---|---|---|---|
| Gemini Grounded | 5,000/month | GEMINI_API_KEY |
✅ Primary |
| Tavily | 1,000/month | TAVILY_API_KEY |
✅ 1st fallback |
| Brave | 2,000/month | BRAVE_API_KEY |
✅ 2nd fallback |
| DuckDuckGo | Unlimited | None | Available |
| Bing | Unlimited | None | Available |
| Serper | 2,500 trial | SERPER_API_KEY |
Available |
| Google CSE | 100/day | GOOGLE_CSE_KEY |
Available |
| Jina Reader | Unlimited | None | Available |
| SearXNG | Unlimited | Self-hosted | Available |
Deep Search Pipeline
Query → Gemini Grounded + DDG (parallel)
→ Merge & deduplicate (12-19 results)
→ Source quality ranking (boost GitHub/SO/Reddit, penalize SEO spam)
→ Parallel scraping (Jina → readability → Playwright fallback)
→ Returns sources to your agent (AI summary opt-in with summarize: true)
What you get without any keys
DDG-only search, raw results, no AI answer. Works from home IPs. Datacenter IPs get rate-limited by DDG — recommend at minimum a free Gemini key.
Browse
Stealth browsing with anti-detection. Three tiers (auto-escalation):
- Playwright + stealth plugin — default, works immediately
- Camoufox binary — engine-level anti-fingerprint (
npx spectrawl install-stealth) - Remote Camoufox — for existing deployments
If tier 1 gets blocked, Spectrawl automatically escalates to tier 2 (if installed) or tier 3 (if configured). No manual intervention needed.
Browse Options
const page = await web.browse('https://example.com
Tools (3)
searchPerforms a basic search using the engine cascade.deepSearchPerforms a full pipeline search including query expansion, scraping, and optional AI summarization.browseNavigates to a URL using stealth browsing techniques.Environment Variables
GEMINI_API_KEYAPI key for Gemini Grounded Search and AI summarization.TAVILY_API_KEYAPI key for Tavily search fallback.BRAVE_API_KEYAPI key for Brave search fallback.SERPER_API_KEYAPI key for Serper search fallback.Configuration
{"mcpServers": {"spectrawl": {"command": "npx", "args": ["-y", "spectrawl"]}}}