Plasmate
The browser engine for agents. 25K lines of Rust. 230 tests. Apache-2.0. Now at v0.5.
I built Plasmate because I was tired of shoving raw HTML into LLMs and watching them choke on 50,000 tokens of layout divs and tracking scripts just to answer "what are the top stories on Hacker News?"
The core idea: instead of returning raw HTML, Plasmate compiles pages into a Semantic Object Model (SOM). That's a structured JSON representation of what's actually on the page: what you can click, what you can type into, what the content says. Everything an agent doesn't need gets stripped: CSS classes, inline styles, SVGs, tracking pixels, ad containers, layout divs.
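The following is a minimal Rust sketch of that kind of compilation over a toy tree. The `Node` and `SomElement` types, the `compile` function, and the tag list are hypothetical stand-ins. They are not Plasmate's actual types or the SOM spec's field names. A real compiler works on a full parsed DOM after JavaScript has run.

```rust
// Hypothetical sketch: Node/SomElement are illustrative, not Plasmate's types.

/// Simplified DOM node (a real HTML parser produces far more detail).
struct Node {
    tag: &'static str,
    attrs: Vec<(&'static str, &'static str)>,
    text: Option<&'static str>,
    children: Vec<Node>,
}

/// Hypothetical SOM-style element: only what an agent can read or act on.
#[derive(Debug, PartialEq)]
enum SomElement {
    Link { text: String, href: String },
    Button { text: String },
    Input { name: String },
    Text(String),
}

fn el(tag: &'static str, attrs: Vec<(&'static str, &'static str)>,
      text: Option<&'static str>, children: Vec<Node>) -> Node {
    Node { tag, attrs, text, children }
}

/// Looks up an attribute value, returning "" when it is absent.
fn attr(node: &Node, key: &str) -> String {
    node.attrs.iter()
        .find(|(k, _)| *k == key)
        .map(|(_, v)| v.to_string())
        .unwrap_or_default()
}

/// Subtrees that never carry agent-relevant content are dropped whole.
fn is_noise(tag: &str) -> bool {
    matches!(tag, "script" | "style" | "svg" | "noscript")
}

/// Depth-first walk: interactive elements become SOM entries, noise is
/// skipped, and layout wrappers (div, span, ...) are flattened away.
/// Attributes such as class and style are never copied to the output.
fn compile(node: &Node, out: &mut Vec<SomElement>) {
    if is_noise(node.tag) {
        return;
    }
    let text = node.text.unwrap_or("").trim().to_string();
    match node.tag {
        "a" => out.push(SomElement::Link { text, href: attr(node, "href") }),
        "button" => out.push(SomElement::Button { text }),
        "input" => out.push(SomElement::Input { name: attr(node, "name") }),
        _ => {
            if !text.is_empty() {
                out.push(SomElement::Text(text));
            }
            for child in &node.children {
                compile(child, out);
            }
        }
    }
}

/// A toy page: nested layout divs, a stylesheet, a tracker, and content.
fn sample_page() -> Node {
    el("div", vec![("class", "wrapper flex")], None, vec![
        el("style", vec![], Some(".wrapper{display:flex}"), vec![]),
        el("script", vec![("src", "tracker.js")], None, vec![]),
        el("div", vec![("class", "col")], None, vec![
            el("h1", vec![], Some("Top stories"), vec![]),
            el("a", vec![("href", "/item?id=1"), ("class", "title")], Some("Story one"), vec![]),
        ]),
        el("input", vec![("name", "q"), ("style", "width:100%")], None, vec![]),
        el("button", vec![], Some("Search"), vec![]),
    ])
}

fn main() {
    let mut som = Vec::new();
    compile(&sample_page(), &mut som);
    for element in &som {
        println!("{:?}", element);
    }
}
```

In the output, the layout wrappers are gone and `class` and `style` never appear. The stylesheet and tracker script are dropped whole. Four elements remain: a heading text, a link, an input, and a button.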
We run automated coverage tests against the top 100 websites - both with and without JavaScript. The JS-enabled scorecard parses 94 out of 98 sites (95.9%) with a median 9x compression ratio. A separate cost analysis across 49 URLs measured 94% average token savings. Google Cloud's pages compress 116x; Reddit's compress 103x.
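Compression ratios and token savings are two views of the same arithmetic. Compressing a page N-fold removes 1 - 1/N of its tokens. This small sketch converts between the two. The function names are illustrative, and no Plasmate API is involved.

```rust
// Pure arithmetic: compression ratio = HTML tokens / SOM tokens.

/// Fraction of tokens saved when a page compresses `ratio`-fold.
fn savings_from_ratio(ratio: f64) -> f64 {
    1.0 - 1.0 / ratio
}

/// Compression ratio needed to save `savings` (between 0.0 and 1.0) of the tokens.
fn ratio_from_savings(savings: f64) -> f64 {
    1.0 / (1.0 - savings)
}

fn main() {
    // Ratios quoted in the text above.
    for (label, ratio) in [("median site", 9.0), ("Reddit", 103.0), ("Google Cloud", 116.0)] {
        println!("{label}: {ratio}x -> {:.1}% fewer tokens", 100.0 * savings_from_ratio(ratio));
    }
    // A single page saving exactly 94% of its tokens compresses about 16.7x.
    println!("94% savings -> {:.1}x", ratio_from_savings(0.94));
}
```

By this arithmetic, a median 9x page sheds about 89% of its tokens, and the 100x+ sites shed over 99%. The 94% figure comes from a different URL set and a different statistic (an average), so it need not match the median ratio.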
What shipped (v0.5)
v0.5 adds direct compilation, richer semantic extraction, and the WebTaskBench evaluation suite. SOM now captures ARIA widget states and disclosure widgets, handles tables better, and strips cookie/consent banners automatically.
- plasmate compile - Compile HTML to SOM from files or stdin. No browser, no network. Perfect for publisher build pipelines.
- html_id field - SOM elements preserve original HTML id attributes for DOM resolution and agent interaction.
- ARIA state capture - aria-expanded, aria-selected, aria-checked, aria-disabled, aria-current, aria-pressed, aria-hidden.
- details/summary support - Disclosure widgets as first-class interactive elements with toggle action and open state.
- Better tables - Up to 12 columns (was 8) and 30 rows (was 20), colspan handling, caption extraction.
- GDPR/cookie banner stripping - Consent overlays detected and removed automatically.
- ICU/Intl support - Full ICU data for Intl.NumberFormat and related APIs in JS-heavy SPAs.
- Raised script limits - 3MB per script, 10MB total. Handles large SPA bundles.
- WebTaskBench - 100 agent tasks across 50 real websites. On both GPT-4o and Claude Sonnet 4, SOM is 4x more token-efficient than raw HTML and also faster.
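The table improvements amount to expanding `colspan` and then capping the grid. This is a minimal sketch under stated assumptions. Spanned columns repeat the cell's text, `rowspan` is ignored, and the `Cell` type and `normalize` function are hypothetical. Plasmate's actual extraction may fill spanned cells differently.

```rust
// Hypothetical sketch of table flattening; the limits match the text above.
const MAX_COLS: usize = 12;
const MAX_ROWS: usize = 30;

/// One <td>/<th>: its text and colspan attribute.
struct Cell {
    text: String,
    colspan: usize,
}

fn cell(text: &str, colspan: usize) -> Cell {
    Cell { text: text.to_string(), colspan }
}

/// Expands each cell across `colspan` columns (repeating its text; a
/// colspan of 0 is treated as 1 here), then truncates to MAX_COLS columns
/// and MAX_ROWS rows. rowspan is not handled in this sketch.
fn normalize(rows: &[Vec<Cell>]) -> Vec<Vec<String>> {
    rows.iter()
        .take(MAX_ROWS)
        .map(|row| {
            row.iter()
                .flat_map(|c| std::iter::repeat(c.text.clone()).take(c.colspan.max(1)))
                .take(MAX_COLS)
                .collect()
        })
        .collect()
}

fn main() {
    // Hypothetical table: the header's second cell spans two columns.
    let rows = vec![
        vec![cell("Site", 1), cell("Tokens", 2)],
        vec![cell("", 1), cell("HTML", 1), cell("SOM", 1)],
        vec![cell("example.com", 1), cell("1200", 1), cell("150", 1)],
    ];
    for row in normalize(&rows) {
        println!("{}", row.join(" | "));
    }
}
```

Repeating the spanned text keeps every row the same width, so an agent can read any column by index. Leaving the extra slots empty would be an equally valid choice.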
Previous: v0.4
- Full SPA hydration - insertBefore, replaceChild, classList, cloneNode, MutationObserver
- Interaction APIs - page.click(), page.type(), page.waitForSelector()
- Screenshots - Page.captureScreenshot delegates to Chrome for pixel-perfect results
- CDP compatibility - Puppeteer connects out of the box
- Network interception - block, modify, or mock responses
- 50 concurrent sessions per instance
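Network interception of this kind usually comes down to a first-match rule table consulted before each request. The sketch below is hypothetical: `Rule`, `Action`, and `handle` are not Plasmate's API, and matching on a URL substring is a simplifying assumption. It shows the three behaviours listed above: block, modify, and mock.

```rust
// Hypothetical interception rules; not Plasmate's actual API.

/// What to do with a matching request.
enum Action {
    Block,
    Mock { status: u16, body: &'static str },
    Rewrite { from: &'static str, to: &'static str },
}

struct Rule {
    url_contains: &'static str,
    action: Action,
}

/// Applies the first rule whose pattern occurs in `url`; unmatched requests
/// go straight to `fetch`. Returns None for blocked requests.
fn handle<F>(url: &str, rules: &[Rule], fetch: F) -> Option<(u16, String)>
where
    F: Fn(&str) -> (u16, String),
{
    let rule = rules.iter().find(|r| url.contains(r.url_contains));
    match rule.map(|r| &r.action) {
        Some(Action::Block) => None,
        Some(Action::Mock { status, body }) => Some((*status, body.to_string())),
        Some(Action::Rewrite { from, to }) => {
            // Fetch for real, then modify the response body.
            let (status, body) = fetch(url);
            Some((status, body.replace(*from, *to)))
        }
        None => Some(fetch(url)),
    }
}

fn main() {
    let rules = vec![
        Rule { url_contains: "tracker", action: Action::Block },
        Rule { url_contains: "/api/user", action: Action::Mock { status: 200, body: r#"{"name":"test"}"# } },
        Rule { url_contains: "news.example", action: Action::Rewrite { from: "Sponsored ", to: "" } },
    ];
    // Stand-in for a real network fetch.
    let fetch = |url: &str| (200u16, format!("Sponsored story from {url}"));
    for url in ["https://cdn.example/tracker.js", "https://app.example/api/user", "https://news.example/"] {
        println!("{url} -> {:?}", handle(url, &rules, fetch));
    }
}
```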
How it works
Plasmate is a single Rust binary. Give it a URL, get structured JSON back.
Under the hood: html5ever parses the HTML, V8 runs JavaScript (with a full DOM shim for SPA frameworks), then the SOM compiler extracts semantic structure. It also speaks CDP (Puppeteer/Playwright), AWP (our agent-native protocol), and MCP (for Claude, GPT, and any MCP-compatible agent).
What this saves you
At 1,000 page loads of each URL in our 49-URL benchmark (49,000 loads total), input-token costs come out as follows:
| Model | HTML Cost | SOM Cost | Savings |
|---|---|---|---|
| GPT-4 ($10/M) | $50,397 | $3,042 | $47,355 (94%) |
| GPT-4o ($2.50/M) | $12,599 | $761 | $11,839 (94%) |
| Claude Sonnet ($3/M) | $15,119 | $913 | $14,207 (94%) |
At 1M pages/month, that works out to roughly $966K/month saved on GPT-4 alone (about $0.97 per page). See the full cost analysis (49 sites) or the JS coverage scorecard (98 sites).
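The table can be reproduced from two token totals. Those totals are back-calculated from the GPT-4 row ($50,397 and $3,042 at $10 per 1M tokens); the code itself does not measure them. Costs are computed in integer cents to avoid floating-point rounding surprises. The monthly projection rests on an assumption: that the table covers 1,000 loads of each of the 49 URLs.

```rust
// Token totals back-calculated from the GPT-4 row ($10 per 1M tokens).
const HTML_TOKENS: u64 = 5_039_700_000; // $50,397 at $10/M
const SOM_TOKENS: u64 = 304_200_000; // $3,042 at $10/M

/// Cost in cents for `tokens` at a price given in cents per 1M tokens.
fn cost_cents(tokens: u64, cents_per_million: u64) -> u64 {
    tokens * cents_per_million / 1_000_000
}

/// Rounds cents to whole dollars, with halves rounded up.
fn dollars(cents: u64) -> u64 {
    (cents + 50) / 100
}

fn main() {
    // Per-1M-token prices from the table, in cents.
    for (model, price) in [("GPT-4", 1_000), ("GPT-4o", 250), ("Claude Sonnet", 300)] {
        let html = cost_cents(HTML_TOKENS, price);
        let som = cost_cents(SOM_TOKENS, price);
        let saved = html - som;
        println!("{model}: HTML ${} | SOM ${} | saved ${} ({:.0}%)",
            dollars(html), dollars(som), dollars(saved), 100.0 * saved as f64 / html as f64);
    }
    // Assumption: the table covers 1,000 loads of each of 49 URLs.
    let loads: u64 = 49_000;
    let saved = cost_cents(HTML_TOKENS, 1_000) - cost_cents(SOM_TOKENS, 1_000);
    println!("GPT-4 savings per 1M pages: ~${}", dollars(saved * 1_000_000 / loads));
}
```

Under that assumption, the GPT-4 savings scale to about $966K per 1M pages.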
SDKs and integrations
SOM itself is an open spec (v1.0) with JSON Schema validation. We publish interactive HTML and JS coverage scorecards that test the top 100 sites automatically.
MCP tools (for AI agents)
Run plasmate mcp and any MCP-compatible agent gets these tools:
vs. the alternatives
| | Plasmate | Lightpanda | Stagehand | Chrome |
|---|---|---|---|---|
| Speed | 4-5ms | 23ms | 252ms+ | 252ms |
| Memory (100 pages) | ~30MB | ~2.4GB | ~20GB | ~20GB |
| Output | SOM (JSON) | HTML | HTML | HTML |
| Token savings | 94% | 0% | 0% | 0% |
| Puppeteer support | Yes (CDP) | Partial | Yes (Chrome) | Yes |
| License | Apache-2.0 | AGPL | MIT | Chromium |
What's next
Next up: daemon mode for persistent warm instances (already landing post-v0.5), 500+ concurrent sessions per instance, proxy rotation, iframe and shadow DOM support, full ES module support, and the Chrome extension on the Web Store. The full roadmap is public.
Why Apache-2.0
The closest alternative (Lightpanda) is AGPL, which means that if you build it into your product, including one you only offer as a network service, you can be required to release your own source under the AGPL too. That's a non-starter for a lot of companies. Plasmate is Apache-2.0. Use it however you want. Embed it, wrap it, sell it, fork it. I don't care. I just want agents to have a better browser than Chrome.
Get started
For Claude Desktop or any MCP client: