MCP Playwright Browser Server
A production-grade Model Context Protocol (MCP) server that gives AI assistants full browser control through Playwright — using a hybrid DOM + Accessibility Tree + Visual approach. Built for real-world agentic automation: job applications, web scraping, form filling, and complex multi-tab workflows.
v2.0 is a complete rewrite. The server grew from 680 lines and 23 tools to nearly 5,000 lines and 71 tools, with a modular architecture, token-optimized capture profiles, hard payload budgets, and a full test suite.
What's New in v2.0
The Problem v1 Had
v1 was a working proof of concept. It could browse pages and extract jobs. But when used with Gemini CLI for real tasks — filling application forms, navigating multi-tab flows, handling downloads — it hit hard limits:
- Token waste: Every tool response dumped everything it found. One browser.snapshot on a complex page could push 50KB+ into Gemini's context window in a single call, rapidly exhausting the budget.
- No multi-tab support: If a link opened a new tab (very common in job applications), Gemini was stuck with no way to switch to it.
- No form intelligence: Filling a form required manual click-by-click instructions. There was no way to ask "what fields are still empty?" or "fill all required fields."
- Brittle DOM-only navigation: Shadow DOM, iframes, and obfuscated element IDs caused failures with no fallback.
- No session persistence: Every run started fresh. Logging in again and again wasted time and triggered bot detection.
- No safety rails: The AI could write files anywhere on disk, run arbitrary JS, or create its own automation scripts — unguarded.
- Monolithic: One 680-line file with no tests.
What v2.0 Solves
Every one of those problems has a specific solution in v2.0:
| Problem | v2.0 Solution |
|---|---|
| Token waste | Capture Profile System (light/balanced/full) + 280KB hard payload ceiling |
| Multi-tab stuck | Page Manager with stable pageIds, browser.list_pages, browser.select_page |
| Dumb form filling | browser.form_audit + browser.fill_form + Google Forms specialist tools |
| Shadow DOM / obfuscated IDs | A11y tree via CDP Accessibility.getFullAXTree with stable ax- UIDs |
| Session loss | Cookie export/import, browser.export_storage_state / browser.import_storage_state |
| No safety | Path allowlist in src/security/paths.js, MCP_ALLOW_EVALUATE guard |
| Monolithic | 10 focused modules in src/browser/ + src/security/ + 18-test suite |
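As a rough illustration of the token-waste fix in the table above, the sketch below combines a capture profile with a hard payload ceiling. Only the profile names (light/balanced/full) and the 280KB figure come from this README; the per-profile limits, the node shape, and the function names `buildResponse` and `byteLength` are hypothetical and are not the server's actual code.

```javascript
// Hypothetical per-profile limits; the real profile settings are not documented here.
const PROFILES = {
  light: { maxNodes: 50, includeText: false },
  balanced: { maxNodes: 200, includeText: true },
  full: { maxNodes: 1000, includeText: true },
};

// 280KB ceiling from the table above, assuming 1KB = 1024 bytes.
const MAX_PAYLOAD_BYTES = 280 * 1024;

// Size of the serialized response in bytes.
function byteLength(value) {
  return Buffer.byteLength(JSON.stringify(value), "utf8");
}

function buildResponse(nodes, profileName = "balanced") {
  const profile = PROFILES[profileName];
  if (!profile) throw new Error(`Unknown capture profile: ${profileName}`);

  // Step 1: the profile decides how many nodes are kept and how much detail each carries.
  let kept = nodes
    .slice(0, profile.maxNodes)
    .map((n) => (profile.includeText ? n : { id: n.id, role: n.role }));

  let payload = { profile: profileName, truncated: kept.length < nodes.length, nodes: kept };

  // Step 2: the hard ceiling applies regardless of profile; halve until the payload fits.
  while (byteLength(payload) > MAX_PAYLOAD_BYTES && kept.length > 0) {
    kept = kept.slice(0, Math.floor(kept.length / 2));
    payload = { profile: profileName, truncated: true, nodes: kept };
  }
  return payload;
}
```

The `truncated` flag is one way to tell the assistant that more content exists, so it can ask for a narrower capture instead of assuming the page is fully described.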
v1 vs v2 Comparison
| Dimension | v1.0 | v2.0 |
|---|---|---|
| Total MCP tools | 23 | 71 |
| Server size | 680 lines, 1 file | 4,966 lines, 11 modules |
| Token efficiency | Uncontrolled dumps | Capture profiles + 280KB hard ceiling |
| Multi-tab support | Single tab only | Full page manager (list, select, close) |
| Form automation | Manual click-by-click | form_audit + fill_form + Google Forms specialist |
| A11y / Shadow DOM | DOM-only, brittle | CDP Accessibility tree with stable UIDs |
| Scroll handling | Saw first viewport only | Scroll awareness + container scrolling |
| Session persistence | None | Cookie/storage export-import |
| Popup & dialog handling | None | Dialog accept/dismiss, popup pageId capture |
| Download management | None | Wait-for-download, save to path |
| File reading (CV/PDF) | None | files.read_text, files.read_pdf_text |
| Security | No restrictions | Allowlist-enforced read/write paths |
| Observability | None | Console log capture, network request log |
| Test coverage | 2 tests | 18 tests |
| Profiles | 3 | 5 (+ persistent variants) |
| Batch scripts | 5 .bat launchers | 7 .bat launchers |
| Error handling | Raw exceptions to AI | Normalized, structured, budgeted |
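The allowlist-enforced paths in the Security row could work roughly as sketched below. This is a minimal illustration and not the contents of src/security/paths.js: the root directories and the function names `isPathAllowed` and `assertPathAllowed` are assumptions made for the example.

```javascript
const path = require("node:path");

// Hypothetical allowlisted roots; the actual list in src/security/paths.js is not shown here.
const ALLOWED_ROOTS = [path.resolve("downloads"), path.resolve("exports")];

// True when `candidate` resolves to an allowed root or to something inside one.
function isPathAllowed(candidate, roots = ALLOWED_ROOTS) {
  const resolved = path.resolve(candidate);
  return roots.some((root) => {
    const rel = path.relative(root, resolved);
    // A relative path that starts by climbing up ("..") lies outside the root.
    const escapes = rel === ".." || rel.startsWith(".." + path.sep);
    return !escapes && !path.isAbsolute(rel);
  });
}

// Returns the resolved path, or throws before any file I/O happens.
function assertPathAllowed(candidate, roots = ALLOWED_ROOTS) {
  if (!isPathAllowed(candidate, roots)) {
    throw new Error(`Path not in allowlist: ${candidate}`);
  }
  return path.resolve(candidate);
}
```

Resolving the path before comparing it is what blocks traversal such as `downloads/../secrets.txt`. This sketch does not resolve symlinks, so a stricter check would likely compare real paths as well.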
What stayed the same
- Indeed job extractor (production-grade, multi-selector, deduplication)
- Google search extractor (consent handling, URL deobfuscation)
- Stealth mode (webdriver hiding, user agent spoofing)
- CDP connection to real Chrome
- Visual snapshot + coordinate-based clicking
How It Works
You / Gemini CLI
│
│ natural language prompt
▼
Gemini CLI ──── loads MCP conf
Tools
- browser.list_pages: Lists all currently open browser pages.
- browser.select_page: Switches focus to a specific browser page by ID.
- browser.form_audit: Audits a form to identify fields and their requirements.
- browser.fill_form: Fills out form fields based on provided data.
- browser.export_storage_state: Exports current cookies and local storage for session persistence.
Environment Variables
- MCP_ALLOW_EVALUATE: Enables or disables the ability to evaluate arbitrary JavaScript.
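
A guard around MCP_ALLOW_EVALUATE might look like the sketch below. The accepted values ("1" or "true"), the default of disabled, and the names `isEvaluateAllowed` and `guardEvaluate` are assumptions for illustration; this README does not document the flag's exact semantics.

```javascript
// Assumption: the flag is on only for "1" or "true"; anything else, including unset, means off.
function isEvaluateAllowed(env = process.env) {
  const raw = String(env.MCP_ALLOW_EVALUATE ?? "").trim().toLowerCase();
  return raw === "1" || raw === "true";
}

// Hypothetical wrapper: runs the evaluate handler only when the flag is on,
// otherwise returns a structured error instead of throwing.
function guardEvaluate(handler, env = process.env) {
  return async (args) => {
    if (!isEvaluateAllowed(env)) {
      return { ok: false, error: "evaluate is disabled; set MCP_ALLOW_EVALUATE to enable it" };
    }
    return { ok: true, result: await handler(args) };
  };
}
```

Defaulting to disabled means a missing or misspelled variable fails closed, which fits the README's goal of not letting the assistant run arbitrary JS unguarded.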