Browser Use Plugin

Poor Performance - Use Agent Plugin Instead

The Browser Use plugin performs poorly and is not recommended. For better performance and reliability, use the Agent plugin with its browser capability instead.

The plugin drives a browser from natural-language task descriptions, using GPT-4o or Claude as the underlying model.

Quick Start

- name: "Verify page content"
  plugin: browser_use
  config:
    task: "Verify the page has a heading 'Example Domain'"
    max_steps: 3
    use_vision: true
    llm:
      provider: "openai"
      model: "gpt-4o"

Prerequisites

# Install dependencies
pip install playwright browser-use langchain-openai langchain-anthropic
playwright install chromium

# Set API key
export OPENAI_API_KEY=sk-your-key-here
# or
export ANTHROPIC_API_KEY=sk-ant-your-key-here

Configuration

Required Fields

Field	Description	Example
`task`	Natural language task	`"Click login button and verify redirect"`
`llm.provider`	LLM provider	`"openai"` or `"anthropic"`

Optional Fields

Field	Description	Default
`max_steps`	Maximum agent steps	`10`
`use_vision`	Enable vision capabilities	`false`
`timeout`	Task execution timeout	`5m`
`llm.model`	LLM model name	`gpt-4o` (OpenAI)
`llm.config`	LLM configuration (API keys)	Auto-detected from env
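
As a rough sketch, a step that sets every optional field explicitly might look like the following. The step name, task, and values are illustrative only. In particular, `timeout: "10m"` assumes the field accepts the same duration-string format as its documented default of `5m`.

```yaml
- name: "Check dashboard chart"   # hypothetical step name
  plugin: browser_use
  config:
    task: "Verify the revenue chart on the dashboard shows data for the current month"
    max_steps: 5        # default is 10
    use_vision: true    # default is false; enabled because the task depends on a chart
    timeout: "10m"      # assumed duration format, mirroring the 5m default
    llm:
      provider: "openai"
      model: "gpt-4o"
      config:
        OPENAI_API_KEY: "{{ .env.OPENAI_API_KEY }}"
```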

LLM Configuration

OpenAI

llm:
  provider: "openai"
  model: "gpt-4o"
  config:
    OPENAI_API_KEY: "{{ .env.OPENAI_API_KEY }}"

Or auto-detect from environment:

export OPENAI_API_KEY=sk-your-key-here

llm:
  provider: "openai" # Automatically uses OPENAI_API_KEY from env

Anthropic

llm:
  provider: "anthropic"
  config:
    ANTHROPIC_API_KEY: "{{ .env.ANTHROPIC_API_KEY }}"

Common Patterns

Login Flow

- name: "Complete login"
  plugin: browser_use
  config:
    task: |
      Navigate to {{ .env.FRONTEND_URL }}/login and:
      - Fill email: {{ .env.TEST_EMAIL }}
      - Fill password: {{ .env.TEST_PASSWORD }}
      - Click login
      - Verify dashboard appears
    max_steps: 5
    llm:
      provider: "openai"

Data Extraction

- name: "Extract pricing"
  plugin: browser_use
  config:
    task: "Extract all plan names and monthly prices"
    use_vision: true
    max_steps: 3
    llm:
      provider: "openai"
      model: "gpt-4o"
  save:
    - json_path: ".result"
      as: "pricing_data"
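
A saved value can presumably feed a later step in the same way `{{ test_email }}` is used in the With HTTP example. The sketch below assumes `pricing_data` interpolates into the task as text; the step name and task wording are hypothetical.

```yaml
- name: "Cross-check pricing at checkout"   # hypothetical follow-up step
  plugin: browser_use
  config:
    # Assumes the saved variable is rendered as text inside the task string
    task: "Open the checkout page and verify the plans and prices match: {{ pricing_data }}"
    max_steps: 3
    llm:
      provider: "openai"
```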

Form Validation

- name: "Verify success message"
  plugin: browser_use
  config:
    task: "Check if success message appears after form submission"
    max_steps: 2
    llm:
      provider: "openai"

Combining with Other Plugins

With Playwright

- name: "Navigate with Playwright"
  plugin: playwright
  config:
    script: |
      page.goto("https://example.com")
      page.fill("#api-key", "test-key-123")

- name: "Verify with AI"
  plugin: browser_use
  config:
    task: "Verify the API key field shows 'test-key-123'"
    llm:
      provider: "openai" # llm.provider is a required field

With HTTP

- name: "Get test data from API"
  plugin: http
  config:
    method: GET
    url: "{{ .env.API_URL }}/test-data"
  save:
    - json_path: ".user_email"
      as: "test_email"

- name: "Use in browser"
  plugin: browser_use
  config:
    task: "Login with email {{ test_email }}"
    llm:
      provider: "openai" # llm.provider is a required field

Response Format

browser_use returns structured results:

{
  "status": "pass",
  "message": "Task completed successfully",
  "result": "Found heading 'Example Domain'"
}
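
Since the Data Extraction pattern saves `.result` with `json_path`, the other fields can presumably be captured the same way. The sketch below assumes `.status` is addressable like `.result`; the variable names are hypothetical.

```yaml
- name: "Verify heading and keep the outcome"
  plugin: browser_use
  config:
    task: "Verify the page has a heading 'Example Domain'"
    max_steps: 3
    llm:
      provider: "openai"
  save:
    - json_path: ".status"   # assumed to resolve to "pass" on success
      as: "heading_status"
    - json_path: ".result"   # e.g. "Found heading 'Example Domain'"
      as: "heading_result"
```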

Vision Mode

Enable vision for tasks that depend on visual elements such as charts, images, or complex layouts:

use_vision: true

Best Practices

Be specific: Write "Find a product under $50 and add it to the cart", not "Check website".
Set max_steps: Limit agent iterations; 2-5 steps is enough for most tasks.
Use vision selectively: Enable it only for visual content, since it is slower and uses more tokens.
Simplify tasks: Keep each task focused and achievable.
Combine with playwright: Use playwright for setup and browser_use for verification.
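
Applied together, these guidelines might produce a step like the one below: a specific task, a tight step budget, and vision left off because the check is text-only. The step name and task wording are illustrative, not taken from a real suite.

```yaml
- name: "Add budget product to cart"   # hypothetical step name
  plugin: browser_use
  config:
    task: "Find a product under $50, add it to the cart, and verify the cart shows 1 item"
    max_steps: 5       # upper end of the suggested 2-5 range
    use_vision: false  # text-only check, so vision would only add latency and tokens
    llm:
      provider: "openai"
```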

Troubleshooting

Issue	Solution
Timeout errors	Increase `timeout` or reduce `max_steps`
API errors	Check API key and quota limits
Browser won't start	Run `playwright install chromium`
Task fails	Simplify the task, add more specific instructions
High cost	Reduce `max_steps`, disable `use_vision` when not needed

Agent Plugin vs browser_use

Feature	browser_use	agent (MCP)
Model	GPT-4o or Claude	Claude Sonnet 4.5
Tools	Browser only	Browser + Files + APIs + Database
Architecture	Integrated library	MCP server protocol
Use Case	Simple browser automation	Complex multi-tool workflows

Recommendation: Use the Agent plugin for new projects. It runs over the MCP server protocol, supports file, API, and database tools in addition to the browser, and uses Claude Sonnet 4.5.