Every AI Browser Tool Is Broken Except One
I tested Playwright, playwright-cli, OpenClaw's browser tool, and our own tappi on real tasks. Only one went 3/3 with correct data at a reasonable token cost, and it wasn't close.
Playwright couldn't log into Gmail. playwright-cli got CAPTCHA'd by Reddit on the first page. OpenClaw's browser tool burned 252K tokens doing what tappi did in 59K. And Playwright "scripted" its way to wrong answers on 4 out of 5 Reddit posts without even knowing.
4 AI agents. 4 browser tools. 3 real-world tasks. Same model (Claude Sonnet 4.6), same thinking level, same instructions.
The Scorecard
| | 🔹 tappi | 🔸 Browser Tool | 🔷 Playwright | 🔶 playwright-cli |
|---|---|---|---|---|
| Success Rate | 🟢 3/3 | 🟢 3/3 | 🟡 1/3* | 🔴 1/3 |
| Total Context | 59K | 252K | 44K | 52K |
| Total Time | 4m 13s | 8m 38s | 3m 42s | 3m 36s |
| Auth Tasks | ✅ | ✅ | ❌ | ❌ |
| Bot Detection | ✅ | ✅ | ✅ | ❌ |
| Shadow DOM | ✅ | ⚠️ Workaround | N/A | N/A |
| Data Quality | ⭐ High | ⭐ High | ⚠️ Low | N/A |
| Verdict | 🏆 Best overall | Reliable but heavy | Cheap but brittle | Too limited |
*Playwright's Reddit "success" returned automod bot comments instead of actual top comments on 4/5 posts, so the result is functionally incorrect.
Task 1: Reddit Data Extraction
Navigate to r/LocalLLaMA, find top 5 posts from the past week, extract title, upvotes, and top comment for each.
- tappi opened the subreddit, ran JavaScript to pull all titles and upvotes in one shot, visited each post, evaluated comment scores via DOM, and deliberately skipped automod bot comments. 8 tool calls. Done in under 2 minutes.
- Browser tool followed the same strategy, but each page produced a full ARIA tree of tens of thousands of tokens. Same quality, 5.6x the cost.
- Playwright wrote a script using old.reddit.com but blindly grabbed the first comment on each post, which was the automod bot on 4 of 5. No way to inspect and adjust.
- playwright-cli never got past the front door. Reddit detected headless Chrome and served a visual reCAPTCHA.
| Tool | Context | Time | Result |
|---|---|---|---|
| 🔹 tappi | 21K | 1m 52s | ✅ Correct data |
| 🔸 Browser tool | 118K | 3m 00s | ✅ Correct, massive token cost |
| 🔷 Playwright | 14K | 1m 02s | ⚠️ Wrong data (bot comments) |
| 🔶 playwright-cli | 21K | 2m 22s | ❌ CAPTCHA blocked |
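The gap between "first comment" and "top comment" in the results above can be sketched in a few lines. This is a minimal illustration, not tappi's or Playwright's actual code: the comment dictionaries, the `BOT_AUTHORS` set, and both function names are assumptions made up for the example.

```python
from typing import Optional

# Assumed list of bot accounts to ignore; AutoModerator is Reddit's moderation bot.
BOT_AUTHORS = {"automoderator"}


def first_comment(comments: list[dict]) -> Optional[dict]:
    """Naive strategy: take whatever comment appears first on the page."""
    return comments[0] if comments else None


def top_human_comment(comments: list[dict]) -> Optional[dict]:
    """Skip known bot authors, then pick the highest-scored remaining comment."""
    human = [c for c in comments if c["author"].lower() not in BOT_AUTHORS]
    if not human:
        return None
    return max(human, key=lambda c: c["score"])


# Hypothetical comment data in page order (a pinned bot comment comes first).
comments = [
    {"author": "AutoModerator", "score": 1, "body": "Please read the rules."},
    {"author": "alice", "score": 42, "body": "Great benchmark."},
    {"author": "bob", "score": 17, "body": "Which quantization?"},
]

print(first_comment(comments)["author"])      # AutoModerator
print(top_human_comment(comments)["author"])  # alice
```

A one-shot script has to encode the filter up front, while an agent that inspects the page between steps can notice the bot comment and adjust.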
Task 2: Google Maps Lead Generation
Search for "plumbers in Houston TX" and extract top 5 results with name, rating, phone, address.
All four tools succeeded here. Google Maps is the great equalizer: single-page extraction on a site that doesn't aggressively block bots.
| Tool | Context | Time | Result |
|---|---|---|---|
| 🔹 tappi | 16K | 59s | ✅ 3 commands |
| 🔸 Browser tool | 21K | 38s | ✅ Single snapshot |
| 🔷 Playwright | 18K | 2m 34s | ✅ Works, slow |
| 🔶 playwright-cli | 20K | 42s | ✅ Elegant |
The insight: When everything's on one page, tool differences shrink. The real differentiation happens on multi-step, interactive tasks, which is most real-world agent work.
Task 3: Gmail (Send an Email)
Navigate to Gmail, compose, add two recipients, fill subject/body, send.
- tappi navigated to Gmail (already signed in), clicked Compose, typed recipients, filled subject/body, clicked Send. Shadow DOM compose dialog? Pierced right through. 8 tool calls, 82 seconds.
- Browser tool hit a wall: Gmail's floating compose dialog is invisible to the ARIA tree. After 5 minutes and 113K tokens of workarounds, it found Gmail's URL-based compose form. Email sent, but painfully.
- Playwright and playwright-cli both launched fresh browsers. Google redirected to sign-in. No cookies. No session. Done in 30 seconds. Failed.
| Tool | Context | Time | Result |
|---|---|---|---|
| 🔹 tappi | 22K | 1m 22s | ✅ Email sent |
| 🔸 Browser tool | 113K | 5m 35s | ✅ Workaround needed |
| 🔷 Playwright | 12K | 26s | ❌ No auth |
| 🔶 playwright-cli | 11K | 32s | ❌ No auth |
The Big Picture
Tappi: the only tool to complete every task, with correct data, at reasonable token cost.
59K total tokens vs. 252K for the next-closest successful tool. That's 4.3x more efficient, and tappi didn't need any workarounds.
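For anyone who wants to check the arithmetic, the totals and ratios follow directly from the per-task context numbers in the tables above. The dictionary layout and key names here are just a convenient assumption for the sketch.

```python
# Context sizes in thousands of tokens, copied from the per-task tables in this post.
tokens_k = {
    "tappi": {"reddit": 21, "maps": 16, "gmail": 22},
    "browser_tool": {"reddit": 118, "maps": 21, "gmail": 113},
    "playwright": {"reddit": 14, "maps": 18, "gmail": 12},
    "playwright_cli": {"reddit": 21, "maps": 20, "gmail": 11},
}

# Sum the three tasks for each tool.
totals = {tool: sum(per_task.values()) for tool, per_task in tokens_k.items()}

# Browser tool vs. tappi, overall and on the Reddit task alone.
overall_ratio = totals["browser_tool"] / totals["tappi"]
reddit_ratio = tokens_k["browser_tool"]["reddit"] / tokens_k["tappi"]["reddit"]

print(totals)
print(f"{overall_ratio:.1f}x overall, {reddit_ratio:.1f}x on Reddit")
# 4.3x overall, 5.6x on Reddit
```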
Two fault lines exposed:
- Persistent sessions are non-negotiable. Without them, you can't access any authenticated service.
- Shadow DOM piercing matters. Gmail's compose dialog is invisible to accessibility-tree-based tools.
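To make the second point concrete, here is a toy model of why a tree walk that stops at shadow boundaries never sees the compose controls. It is a simplified sketch under stated assumptions: the `Node` class, the `find` helper, and the element names are all hypothetical, and none of this reflects Gmail's real markup or how tappi is implemented.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    name: str
    children: list["Node"] = field(default_factory=list)
    # Content attached behind a shadow boundary, if any (toy stand-in for a shadow root).
    shadow_root: Optional["Node"] = None


def find(node: Node, name: str, pierce: bool) -> Optional[Node]:
    """Depth-first search; only descends into shadow roots when pierce is True."""
    if node.name == name:
        return node
    candidates = list(node.children)
    if pierce and node.shadow_root is not None:
        candidates.append(node.shadow_root)
    for child in candidates:
        hit = find(child, name, pierce)
        if hit is not None:
            return hit
    return None


# Hypothetical page: the compose controls live behind a shadow boundary.
page = Node("body", children=[
    Node("inbox"),
    Node("compose-host", shadow_root=Node("shadow", children=[
        Node("to-field"),
        Node("send-button"),
    ])),
])

print(find(page, "send-button", pierce=False))              # None
print(find(page, "send-button", pierce=True) is not None)   # True
```

The non-piercing walk still finds `inbox` and `compose-host`; it only misses what sits behind the boundary, which matches the "dialog is invisible" symptom described above.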
Try It
pip install tappi
Full benchmark breakdown on dev.to · GitHub · tappi.synthworx.com