# atlas-browser-mcp
Visual web browsing for AI agents via Model Context Protocol (MCP).
## Features

- **Visual-First**: Navigate the web through screenshots, not DOM parsing
- **Set-of-Mark**: Interactive elements labeled with clickable `[0]`, `[1]`, `[2]`... markers
- **Humanized**: Bezier-curve mouse movements, natural typing rhythms
- **CAPTCHA-Ready**: Multi-click support for image selection challenges
- **Anti-Detection**: Built-in measures to avoid bot detection
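To illustrate the humanization idea, here is a minimal sketch of a cubic Bezier mouse path with randomly jittered control points. This is an illustrative example, not the package's actual implementation; the function name and parameters are invented for the sketch.

```python
import random

def bezier_path(start, end, steps=20):
    """Interpolate a cubic Bezier curve between two points.

    The two control points are jittered randomly so every path
    bends a little differently, mimicking natural hand movement
    instead of a straight robotic line.
    """
    (x0, y0), (x3, y3) = start, end
    # Control points roughly a third and two-thirds along, with random offsets
    x1 = x0 + (x3 - x0) * 0.3 + random.uniform(-40, 40)
    y1 = y0 + (y3 - y0) * 0.3 + random.uniform(-40, 40)
    x2 = x0 + (x3 - x0) * 0.7 + random.uniform(-40, 40)
    y2 = y0 + (y3 - y0) * 0.7 + random.uniform(-40, 40)

    points = []
    for i in range(steps + 1):
        t = i / steps
        u = 1 - t
        # Standard cubic Bezier formula
        x = u**3 * x0 + 3 * u**2 * t * x1 + 3 * u * t**2 * x2 + t**3 * x3
        y = u**3 * y0 + 3 * u**2 * t * y1 + 3 * u * t**2 * y2 + t**3 * y3
        points.append((x, y))
    return points

# Path always starts exactly at `start` and ends exactly at `end`
path = bezier_path((100, 100), (500, 300))
```

Each intermediate point would then be fed to the browser's mouse-move API with small randomized delays between steps.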
## Quick Start

### Installation

```bash
pip install atlas-browser-mcp
playwright install chromium
```
### Use with Claude Desktop

Add to your Claude Desktop config (`claude_desktop_config.json`):

```json
{
  "mcpServers": {
    "browser": {
      "command": "atlas-browser-mcp"
    }
  }
}
```
Then ask Claude:

> "Navigate to https://news.ycombinator.com and tell me the top 3 stories"
## Available Tools

| Tool | Description |
|---|---|
| `navigate` | Go to a URL; returns a labeled screenshot |
| `screenshot` | Capture the current page with labels |
| `click` | Click an element by label ID `[N]` |
| `multi_click` | Click multiple elements (for CAPTCHAs) |
| `type` | Type text, optionally pressing Enter |
| `scroll` | Scroll the page up or down |
## Usage Examples

### Basic Navigation

```
User: Go to google.com
AI: [calls navigate(url="https://google.com")]
AI: I see the Google homepage. The search box is labeled [3].

User: Search for "MCP protocol"
AI: [calls click(label_id=3)]
AI: [calls type(text="MCP protocol", submit=true)]
AI: Here are the search results...
```
### CAPTCHA Handling

```
User: Select all images with traffic lights
AI: [Looking at the CAPTCHA grid]
AI: I can see traffic lights in images [2], [5], and [8].
AI: [calls multi_click(label_ids=[2, 5, 8])]
```
## Configuration

### Headless Mode

For servers without a display:

```python
from atlas_browser_mcp.browser import VisualBrowser

browser = VisualBrowser(
    headless=True,   # No visible browser window
    humanize=False,  # Faster, less human-like
)
```
### Custom Viewport

```python
browser = VisualBrowser()
browser.VIEWPORT = {"width": 1920, "height": 1080}
```
## How It Works

1. **Navigate**: The browser loads the page
2. **Inject SoM**: JavaScript labels all interactive elements
3. **Screenshot**: Capture the labeled page
4. **AI Sees**: The screenshot shows `[0]`, `[1]`, `[2]`... on buttons, links, inputs
5. **AI Acts**: "Click `[5]`" → the browser clicks the element at that position
6. **Repeat**: New screenshot with updated labels
```
┌─────────────────────────────────────┐
│ [0] Logo   [1] Search   [2] Menu    │
│                                     │
│ [3] Article Title                   │
│ [4] Read More                       │
│                                     │
│ [5] Subscribe   [6] Share           │
└─────────────────────────────────────┘
```
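At the core of this loop is a mapping from label IDs to on-screen coordinates, so that "click `[5]`" can be turned into a mouse action. A minimal sketch of that idea (the function, data shapes, and names here are illustrative assumptions, not the package's internals):

```python
def build_label_map(elements):
    """Assign sequential [N] labels to interactive elements.

    `elements` is a list of dicts with bounding boxes, as a
    Set-of-Mark injection script might report them back from the page.
    Returns {label_id: (x, y)} where (x, y) is the click target.
    """
    label_map = {}
    for label_id, el in enumerate(elements):
        box = el["box"]  # {"x", "y", "width", "height"} in page pixels
        # Click target: the center of the element's bounding box
        center = (box["x"] + box["width"] / 2, box["y"] + box["height"] / 2)
        label_map[label_id] = center
    return label_map

elements = [
    {"tag": "a", "box": {"x": 10, "y": 20, "width": 80, "height": 30}},
    {"tag": "button", "box": {"x": 200, "y": 20, "width": 60, "height": 30}},
]
labels = build_label_map(elements)
# labels[1] is the center of the button: (230.0, 35.0)
```

Because labels are reassigned on every screenshot, a label ID is only valid until the next navigation or scroll, which is why each action returns a fresh labeled screenshot.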
## Integration

### With Cline (VS Code)

```json
{
  "mcpServers": {
    "browser": {
      "command": "atlas-browser-mcp"
    }
  }
}
```
### Programmatic Use

```python
from atlas_browser_mcp.browser import VisualBrowser

browser = VisualBrowser()

# Navigate
result = browser.execute("navigate", url="https://example.com")
print(f"Page title: {result.data['title']}")
print(f"Found {result.data['element_count']} interactive elements")

# Click element [0]
result = browser.execute("click", label_id=0)

# Type in the focused field
result = browser.execute("type", text="Hello world", submit=True)

# Cleanup
browser.execute("close")
```
## Requirements

- Python 3.10+
- Playwright with Chromium
## Troubleshooting

### "Playwright not installed"

```bash
pip install playwright
playwright install chromium
```

### "Browser closed unexpectedly"

Try running with `headless=False` to see what's happening:

```python
browser = VisualBrowser(headless=False)
```
### Elements not being detected

Some dynamic pages need more wait time. The browser waits 1.5s after navigation, but complex SPAs may need longer.
## License

MIT License - see LICENSE

## Credits

Built for Atlas, an autonomous AI agent.

Inspired by:

- anthropic/mcp - Model Context Protocol
- AskUI - Visual testing approach
- Set-of-Mark prompting - Visual grounding technique