BrowseAI Dev MCP Server


Add it to Claude Code

Run this in a terminal:

claude mcp add -e "OPENROUTER_API_KEY=${OPENROUTER_API_KEY}" browse-ai -- npx -y browseai-dev

Required: OPENROUTER_API_KEY

BrowseAI Dev

Research infrastructure for AI agents — real-time web search, evidence extraction, and structured citations. Every claim is backed by a URL. Every answer has a confidence score.

Agent → BrowseAI Dev → Internet → Verified answers + sources

Website · Playground · API Docs · Alternatives · Discord

Package names: npm: `browseai-dev` · PyPI: `browseaidev` · LangChain: `langchain-browseaidev` — Previously browse-ai and browseai. Old names still work and redirect automatically.


How It Works

search → fetch pages → neural rerank → extract claims → verify → cited answer (streamed)

Every answer goes through a multi-step verification pipeline. No hallucination. Every claim is backed by a real source.

Verification & Confidence Scoring

Confidence scores are evidence-based — not LLM self-assessed. After the LLM extracts claims and sources, a post-extraction verification engine checks every claim against the actual source page text:

  1. Atomic claim decomposition — Compound claims are auto-split into individual verifiable facts. "Tesla had $96B revenue and 1.8M deliveries" becomes two atomic claims, each verified independently.
  2. Hybrid retrieval (BM25 + dense embeddings) — For each claim, BM25 finds lexical matches and OpenAI text-embedding-3-small (via OpenRouter) finds semantic matches from source text. Rankings are fused using Reciprocal Rank Fusion (RRF) — a rank-based method that avoids score normalization issues. This catches paraphrased evidence that BM25 alone misses (e.g., "prevents fabricated answers" matching "reduces hallucinations"). Premium tier only, with graceful BM25 fallback.
  3. NLI evidence reranking — Top-3 RRF-fused candidates per claim are reranked by a DeBERTa-v3 NLI model for semantic entailment. Final hybrid score: 30% BM25 + 70% NLI, with contradiction penalties and paraphrase boosts.
  4. Multi-provider search — Parallel search across multiple providers for broader source diversity. More independent sources = stronger cross-reference = higher confidence.
  5. Domain authority scoring — 10,000+ domains across 5 tiers (institutional .gov/.edu → major news → tech journalism → community → low-quality), stored in Supabase with Majestic Million bulk import. Self-improving via Bayesian cold-start smoothing.
  6. Source quote verification — LLM-extracted quotes verified against actual page text using hybrid matching (exact substring → BM25 fallback).
  7. Cross-source consensus — Each claim verified against all available page texts. Claims supported by 3+ independent domains get "strong consensus". Single-source claims flagged as "weak".
  8. Contradiction detection — Claim pairs analyzed for semantic conflicts using topic overlap + NLI contradiction classification. Detected contradictions surfaced in the response and penalize confidence.
  9. Multi-pass consistency — In thorough mode, claims are cross-checked across independent extraction passes. Claims confirmed by both passes get boosted; inconsistent claims are penalized (SelfCheckGPT-inspired).
  10. Auto-calibrated confidence — 7-factor confidence formula auto-adjusts from user feedback using isotonic calibration curves. Predicted confidence aligns with actual accuracy over time. Factors: verification rate (25%), domain authority (20%), source count (15%), consensus (15%), domain diversity (10%), claim grounding (10%), citation depth (5%).
  11. Per-claim evidence retrieval — Weak claims get targeted search queries generated by LLM, then searched individually across all providers. Each claim gets its own evidence pool instead of sharing the same corpus (SAFE-inspired, from Google DeepMind's fact-checking research).
  12. Counter-query verification — Verified claims are stress-tested with adversarial "what would disprove this?" search queries. If counter-evidence is found, claim confidence is penalized (SANCTUARY-inspired).
  13. Iterative confidence-gated retrieval — Thorough mode uses a FIRE-inspired loop: verify → if weak claims remain → generate targeted follow-up queries → retrieve new evidence → re-verify, repeating until confidence clears the gate or the iteration budget is exhausted.
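The Reciprocal Rank Fusion in step 2 can be sketched in a few lines. This is a minimal illustration rather than the server's actual implementation; the function name is ours, and k = 60 is the conventional RRF constant:

```typescript
// Reciprocal Rank Fusion: fuse ranked lists by summing 1/(k + rank),
// so no per-retriever score normalization is needed.
function rrfFuse(rankings: string[][], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((doc, i) => {
      // ranks are 1-based: the top result contributes 1/(k + 1)
      scores.set(doc, (scores.get(doc) ?? 0) + 1 / (k + i + 1));
    });
  }
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([doc]) => doc);
}

// A document ranked 1st by dense retrieval and 2nd by BM25 outranks
// one ranked 1st by BM25 but 3rd by dense retrieval:
rrfFuse([["d1", "d2", "d3"], ["d2", "d3", "d1"]]); // → ["d2", "d1", "d3"]
```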
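The 30% BM25 + 70% NLI blend in step 3 reduces to a weighted sum with adjustments. A sketch under stated assumptions: the field names, penalty, and boost magnitudes below are illustrative placeholders, not the server's actual values; only the 30/70 weights come from the description above:

```typescript
interface EvidenceScores {
  bm25: number;          // lexical match score, normalized to [0, 1]
  nli: number;           // NLI entailment probability
  contradicts?: boolean; // NLI classified the pair as a contradiction
  paraphrase?: boolean;  // evidence is a close paraphrase of the claim
}

// Hybrid evidence score: 30% BM25 + 70% NLI, with a contradiction
// penalty and paraphrase boost, clamped to [0, 1].
function hybridScore(s: EvidenceScores): number {
  let score = 0.3 * s.bm25 + 0.7 * s.nli;
  if (s.contradicts) score -= 0.3; // penalty magnitude is illustrative
  if (s.paraphrase) score += 0.1;  // boost magnitude is illustrative
  return Math.min(1, Math.max(0, score));
}
```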
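Before calibration, the 7-factor confidence formula in step 10 is a weighted sum. The sketch below assumes each factor is pre-normalized to [0, 1]; the normalization itself and the isotonic calibration step are omitted, and the interface names are ours:

```typescript
// The seven confidence factors, each assumed pre-normalized to [0, 1].
interface ConfidenceFactors {
  verificationRate: number; // share of claims verified against source text
  domainAuthority: number;  // tier-weighted authority of cited domains
  sourceCount: number;      // number of independent sources (capped)
  consensus: number;        // cross-source agreement strength
  domainDiversity: number;  // spread across distinct domains
  claimGrounding: number;   // how directly claims map to source quotes
  citationDepth: number;    // citations per claim
}

// Weights as stated in the description above; they sum to 1.
const WEIGHTS: ConfidenceFactors = {
  verificationRate: 0.25,
  domainAuthority: 0.20,
  sourceCount: 0.15,
  consensus: 0.15,
  domainDiversity: 0.10,
  claimGrounding: 0.10,
  citationDepth: 0.05,
};

function confidence(f: ConfidenceFactors): number {
  return (Object.keys(WEIGHTS) as (keyof ConfidenceFactors)[])
    .reduce((sum, key) => sum + WEIGHTS[key] * f[key], 0);
}
```

Isotonic calibration would then map this raw score onto accuracy observed from user feedback.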

Tools (1)

search — Performs real-time web search with evidence extraction and confidence scoring.

Environment Variables

OPENROUTER_API_KEY (required) — API key for accessing the LLM models used in verification and extraction.

Configuration

claude_desktop_config.json
{
  "mcpServers": {
    "browse-ai": {
      "command": "npx",
      "args": ["-y", "browseai-dev"]
    }
  }
}

Try it

Search for the latest developments in quantum computing and provide evidence-backed citations.
Find recent news about AI regulations and verify the claims with confidence scores.
Research the current market share of electric vehicles and extract structured facts with sources.

Frequently Asked Questions

What are the key features of BrowseAI Dev?

Real-time web search with evidence extraction. Atomic claim decomposition for verifiable facts. Confidence scoring based on domain authority and consensus. Multi-provider search for source diversity. Contradiction detection and adversarial verification.

What can I use BrowseAI Dev for?

Fact-checking claims in AI-generated reports. Gathering evidence for academic or market research. Building AI agents that require verified, citation-backed information. Monitoring news with high-confidence source verification.

How do I install BrowseAI Dev?

Install BrowseAI Dev by running: npx -y browseai-dev

What MCP clients work with BrowseAI Dev?

BrowseAI Dev works with any MCP-compatible client including Claude Desktop, Claude Code, Cursor, and other editors with MCP support.
