Browser Control Skill

This is not some URL fetcher that grabs raw HTML. Your ClawStart agent gets a real headless Chromium browser powered by Playwright. It can open pages, wait for JavaScript to finish rendering, click buttons, fill out forms, scroll around, take screenshots, and even run custom JS on the page. Basically everything you would do yourself in a browser window, except the agent does it while you do something else.

Why does this matter? Because many modern websites need JavaScript to show their content. If you send a simple HTTP request to a client-rendered React app, you often get back little more than an empty HTML shell. The browser skill sees the same thing a real visitor would see.
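To make the difference concrete, here is a minimal sketch using only the Python standard library. It compares the readable text in a raw HTML response with the text in the same page after rendering. Both HTML snippets are made-up examples, not output from a real site, and the helper names are hypothetical.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collects text nodes, ignoring <script>, <style>, and <title> contents."""

    SKIPPED = ("script", "style", "title")

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def visible_text(html: str) -> str:
    """Return the page's readable text as one space-joined string."""
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Made-up response body of a client-rendered app: a mount point and a script tag.
SHELL = """<!doctype html>
<html><head><title>Shop</title></head>
<body><div id="root"></div><script src="/bundle.js"></script></body></html>"""

# Made-up DOM after the script has run in a real browser.
RENDERED = """<html><head><title>Shop</title></head>
<body><div id="root"><h1>Today's deals</h1><p>Keyboard, $49</p></div></body></html>"""

print(repr(visible_text(SHELL)))     # nothing readable in the raw response
print(repr(visible_text(RENDERED)))  # the content a visitor actually sees
```

A plain HTTP client only ever receives something like the first snippet. A real browser ends up with the second one.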

What is going on under the hood

Every ClawStart agent connects to a shared headless Chrome instance that runs in a separate container. When the agent needs to visit a website, it opens a new tab through the Chrome DevTools Protocol, does its thing, and closes the tab when it is done. Playwright, Microsoft's browser automation framework, handles the connection.
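If you are curious what that tab lifecycle looks like in plain Playwright, the following is a rough sketch and not ClawStart's actual code. It assumes a Chrome instance that already exposes a DevTools endpoint. The host name "chrome", port 9222, and both function names are illustrative assumptions.

```python
def cdp_endpoint(host: str, port: int = 9222) -> str:
    """Build the HTTP address of a Chrome DevTools endpoint (hypothetical helper)."""
    return f"http://{host}:{port}"


def fetch_rendered_text(endpoint: str, url: str) -> str:
    """Open a tab on an already-running Chrome, read the page text, close the tab."""
    # Imported here so this module still loads where Playwright is not installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        # Attach to the existing browser instead of launching a new one.
        browser = p.chromium.connect_over_cdp(endpoint)
        # Reuse the browser's default context if there is one, else create one.
        context = browser.contexts[0] if browser.contexts else browser.new_context()
        page = context.new_page()  # the new tab
        try:
            # Wait until network activity settles so JS-rendered content is present.
            page.goto(url, wait_until="networkidle")
            return page.inner_text("body")
        finally:
            page.close()  # always release the tab, even on errors


# Example call (needs Playwright and a reachable Chrome instance):
# text = fetch_rendered_text(cdp_endpoint("chrome"), "https://example.com")
```

Closing the tab in a finally block matters on a shared browser, because a leaked tab keeps using memory that other agents need.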

You do not write any code. You just describe what you want in plain language. "Go to producthunt.com and get me the top 5 products launched today." The agent figures out the navigation, waits for the page to load, pulls out the text, and formats the results for you.

Because the connection goes through the DevTools Protocol, the agent has pretty fine-grained control over what happens in the browser. It can intercept network requests, run arbitrary JavaScript on the page, capture console output, even tweak the DOM before extracting data. This is the same protocol that Chrome DevTools itself uses, so the agent can do pretty much anything a developer would do with the console open.
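For a sense of what that control looks like, here is a small sketch built on Playwright's page API. It shows one possible way to intercept requests, capture console output, and run JavaScript on the page. The choice of blocked resource types and all function names are assumptions made for the example. The agent's real behavior is not documented here.

```python
# Resource types to skip while extracting text (an illustrative choice).
BLOCKED_TYPES = {"image", "media", "font"}


def should_block(resource_type: str) -> bool:
    """Decide whether a request of this type is dropped."""
    return resource_type in BLOCKED_TYPES


def instrument(page, console_log: list) -> None:
    """Attach a request filter and a console listener to a Playwright page."""

    def on_route(route):
        # Abort heavy assets, let everything else through unchanged.
        if should_block(route.request.resource_type):
            route.abort()
        else:
            route.continue_()

    page.route("**/*", on_route)
    # Record every console message the page emits, e.g. "error: ...".
    page.on("console", lambda msg: console_log.append(f"{msg.type}: {msg.text}"))


def count_links(page) -> int:
    """Run a JavaScript snippet in the page and return its result."""
    return page.evaluate("() => document.querySelectorAll('a').length")
```

Call instrument(page, log) before page.goto(...) so the filter and listener are in place for the very first request.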

What it can do

The agent navigates to any URL, follows links, handles redirects, and waits for the page to fully load including all the JavaScript content. It fills out forms of any kind: text fields, dropdowns, checkboxes, even file uploads. Multi-step forms that span several pages work too.

For data extraction, it reads text from pages, pulls structured data out of tables, scrapes product listings, grabs contact information. It can take screenshots of the whole page or specific elements when you need visual proof of something. And when you need to interact with the page, it clicks buttons, opens modals, scrolls through infinite feeds, and expands dropdowns.

Example conversation

You:

"Go to producthunt.com, find today's top 5 products, and give me a summary of each."

Agent:

"Opening Product Hunt... Page loaded. Here are today's top 5: (1) CodeReview AI, automated PR reviewer, 342 upvotes. (2) MailMerge Pro, bulk email personalization, 281 upvotes. (3) DesignTokens, Figma-to-code sync, 267 upvotes. (4) APIBench, load testing tool, 198 upvotes. (5) TaskBot, Slack task manager, 154 upvotes. Want me to open any of these for more details?"

You:

"Tell me more about CodeReview AI. What is their pricing?"

Agent:

"Going to their product page now... They have three plans: Free for up to 5 repos, Pro at $29 a month for unlimited repos and priority reviews, Enterprise with custom pricing. The free tier includes 50 reviews per month. Want a screenshot of the pricing page?"

You:

"No need. Check if they have a GitHub integration and what it takes to set up."

Agent:

"Found their docs. Setup means installing a GitHub App on your org, granting read access to pull requests, and adding a webhook URL. Their guide says it takes about 2 minutes. No API key needed on the free tier."

What people use it for

Price monitoring. Have your agent check e-commerce sites for prices on products you are watching. If you combine this with the Gmail skill, it can email you when a price drops below your target. The agent easily handles 10 to 20 product pages in one session, comparing prices across different retailers.
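The "below your target" check is simple once the price text is off the page. The sketch below shows one way to do it and assumes US-style price formatting such as "$1,299.99". The function names and sample data are made up, and the agent's own logic may differ.

```python
import re
from decimal import Decimal
from typing import Optional

# Matches "1,299.99", "49", or "49.5" (US-style thousands and decimals).
_PRICE = re.compile(r"(\d{1,3}(?:,\d{3})+|\d+)(?:\.(\d{1,2}))?")


def parse_price(text: str) -> Optional[Decimal]:
    """Return the first price-like number in the text, or None if there is none."""
    match = _PRICE.search(text)
    if match is None:
        return None
    whole = match.group(1).replace(",", "")
    cents = match.group(2) or "0"
    return Decimal(f"{whole}.{cents}")


def price_drops(observed: dict, targets: dict) -> list:
    """List (product, price) pairs whose scraped price is below the target."""
    hits = []
    for product, text in observed.items():
        price = parse_price(text)
        target = targets.get(product)
        # Skip products with no readable price or no target set.
        if price is not None and target is not None and price < target:
            hits.append((product, price))
    return hits


# Made-up price strings as they might appear on product pages.
observed = {"keyboard": "$49.00", "monitor": "$1,299.99", "mouse": "Out of stock"}
targets = {"keyboard": Decimal("50"), "monitor": Decimal("1200"), "mouse": Decimal("20")}
print(price_drops(observed, targets))
```

Decimal is used instead of float so that comparisons against a target like 49.99 are not thrown off by rounding.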

Lead generation. The agent visits company directories, industry sites, conference speaker pages, and extracts names, titles, emails, company info. It structures everything into a clean format you can drop into a CRM. A common approach is to first use web search to find the right directories and then switch to the browser to pull the actual data.

Competitor research. Point the agent at competitor websites and ask it to check their pricing, feature lists, and positioning. It visits the pages, reads through them, and writes up a comparison. You can save the results to Notion or send them to Slack.

Filling out forms. If you have the same information to enter on multiple platforms, the agent handles the repetitive part. Job applications, government forms, registration processes. Give it your data and it fills in the fields on each site.

How it compares to writing Selenium scripts

Anyone who has written browser automation with Selenium, Puppeteer, or Playwright directly knows the pain: selectors break when the site changes its layout, timing is always tricky, and you spend more time debugging than writing actual logic. With ClawStart, you skip all of that. The agent understands the page content and figures out what to click and where to look on its own, with no CSS selectors needed.

There is a trade-off though. A hand-written Playwright script gives you deterministic automation that runs the exact same way every time. The agent is more flexible but also less predictable. For one-off research or quick data collection, the agent is faster to use. For production scraping that runs on a schedule, you might want both: use the agent to explore and prototype, then write a proper script for the recurring job.

Goes well with other skills

Browser control and web search are a natural pair. The agent searches for relevant pages first, then visits them with the browser to get detailed data that search snippets cannot provide.

Connect Notion or Slack through MCP to save what the agent finds or share it with your team automatically.

What it cannot do

CAPTCHAs are a no-go. Sites with aggressive anti-bot measures may block the agent entirely. The browser runs headless so you cannot watch it navigate in real time, though you can request screenshots at any step. Two-factor authentication is a problem unless you provide an active session cookie. Some very complex single-page apps like Figma or Google Sheets may not work right because they depend on specific browser features or WebGL that headless Chrome handles differently. And the agent goes through pages one at a time, so scraping a hundred pages will take a while. For that kind of volume a dedicated scraping tool is more practical.

Getting started

Browser control works on all ClawStart plans. Create your agent and ask it to visit a website. The agent uses the browser automatically whenever it needs to. Nothing to configure.

See what else your agent can do: all skills and MCP servers.

Frequently asked questions

Can the agent solve CAPTCHAs?

No. CAPTCHAs exist specifically to stop bots and the agent does not try to get around them. If a site puts up a CAPTCHA, the agent tells you about it and moves on. If the CAPTCHA only appears on the login page, you can provide session cookies from a browser where you are already logged in, so the agent skips the login flow entirely.

Can I watch the browser in real time?

The browser runs headless so there is no visible window. But you can ask the agent to take a screenshot at any point and it will send you the image. That way you can see exactly what the browser sees.

How fast is it compared to doing it manually?

For a single page, the agent takes about 3 to 10 seconds including navigation, waiting for JavaScript, and pulling out the content. That is roughly what the same page would take you by hand. The advantage kicks in with repetitive stuff, like checking 20 product pages or filling the same form on multiple sites.

Can the agent run multiple browser tabs at once?

The agent works through pages one at a time. Opens a tab, does what it needs, closes it, opens the next one. This keeps things predictable and prevents resource spikes.
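Since pages are handled strictly in sequence, total time is just the per-page time multiplied by the page count. Here is a back-of-the-envelope sketch using the 3 to 10 second figure from the previous answer. The function name is made up, and the estimate ignores any extra time the agent spends reasoning between pages.

```python
def sequential_estimate(pages: int, min_s: int = 3, max_s: int = 10) -> tuple:
    """Return (low, high) total seconds for visiting pages one after another."""
    return pages * min_s, pages * max_s


for count in (20, 100):
    low, high = sequential_estimate(count)
    print(f"{count} pages: {low} to {high} seconds")
```

For 20 pages that works out to 60 to 200 seconds. For 100 pages it is 300 to 1000 seconds, or roughly 5 to 17 minutes, which is why a dedicated scraping tool makes more sense at that volume.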

Does it work with sites that require login?

Yes, if you give it session cookies or credentials. The agent can fill in login forms but cannot handle two-factor authentication unless you provide an active session. For services like Gmail or GitHub, you are better off using the dedicated skills and MCP servers.
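How ClawStart accepts cookies is not covered here. For reference, this is roughly how a session cookie gets attached in plain Playwright before opening a page. The cookie name, domain, and function names below are placeholders, not values from any real service.

```python
def to_playwright_cookies(domain: str, values: dict) -> list:
    """Turn name/value pairs into the cookie dicts that Playwright expects."""
    return [
        {
            "name": name,
            "value": value,
            "domain": domain,
            "path": "/",
            "secure": True,    # only send over HTTPS
            "httpOnly": True,  # typical for session cookies
        }
        for name, value in values.items()
    ]


def open_logged_in_page(browser, url: str, cookies: list):
    """Create a fresh context, add the cookies, and open the URL."""
    context = browser.new_context()
    context.add_cookies(cookies)  # must happen before navigation
    page = context.new_page()
    page.goto(url)
    return page


# Placeholder cookie copied from a browser where you are already logged in.
cookies = to_playwright_cookies(".example.com", {"session_id": "abc123"})
print(cookies[0]["name"], cookies[0]["domain"])
```

A session cookie grants the same access as your password for as long as it is valid, so treat it with the same care.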
