Skip to content

Browser Use Plugin

Poor Performance - Use Agent Plugin Instead

The Browser Use plugin has poor performance and is not recommended. Use the Agent plugin with browser capability instead for better performance and reliability.

AI-driven browser automation using natural language tasks with GPT-4o or Claude.

Quick Start

- name: "Verify page content"
  plugin: browser_use
  config:
    task: "Verify the page has a heading 'Example Domain'"
    max_steps: 3
    use_vision: true
    llm:
      provider: "openai"
      model: "gpt-4o"

Prerequisites

# Install dependencies
pip install playwright browser-use langchain-openai langchain-anthropic
playwright install chromium

# Set API key
export OPENAI_API_KEY=sk-your-key-here
# or
export ANTHROPIC_API_KEY=sk-ant-your-key-here

Configuration

Required Fields

Field Description Example
task Natural language task "Click login button and verify redirect"
llm.provider LLM provider "openai" or "anthropic"

Optional Fields

Field Description Default
max_steps Maximum agent steps 10
use_vision Enable vision capabilities false
timeout Task execution timeout 5m
llm.model LLM model name gpt-4o (OpenAI)
llm.config LLM configuration (API keys) Auto-detected from env

LLM Configuration

OpenAI

llm:
  provider: "openai"
  model: "gpt-4o"
  config:
    OPENAI_API_KEY: "{{ .env.OPENAI_API_KEY }}"

Or auto-detect from environment:

export OPENAI_API_KEY=sk-your-key-here
llm:
  provider: "openai" # Automatically uses OPENAI_API_KEY from env

Anthropic

llm:
  provider: "anthropic"
  config:
    ANTHROPIC_API_KEY: "{{ .env.ANTHROPIC_API_KEY }}"

Common Patterns

Login Flow

- name: "Complete login"
  plugin: browser_use
  config:
    task: |
      Navigate to {{ .env.FRONTEND_URL }}/login and:
      - Fill email: {{ .env.TEST_EMAIL }}
      - Fill password: {{ .env.TEST_PASSWORD }}
      - Click login
      - Verify dashboard appears
    max_steps: 5
    llm:
      provider: "openai"

Data Extraction

- name: "Extract pricing"
  plugin: browser_use
  config:
    task: "Extract all plan names and monthly prices"
    use_vision: true
    max_steps: 3
    llm:
      provider: "openai"
      model: "gpt-4o"
  save:
    - json_path: ".result"
      as: "pricing_data"

Form Validation

- name: "Verify success message"
  plugin: browser_use
  config:
    task: "Check if success message appears after form submission"
    max_steps: 2
    llm:
      provider: "openai"

Combining with Other Plugins

With Playwright

- name: "Navigate with Playwright"
  plugin: playwright
  config:
    script: |
      page.goto("https://example.com")
      page.fill("#api-key", "test-key-123")

- name: "Verify with AI"
  plugin: browser_use
  config:
    task: "Verify the API key field shows 'test-key-123'"

With HTTP

- name: "Get test data from API"
  plugin: http
  config:
    method: GET
    url: "{{ .env.API_URL }}/test-data"
  save:
    - json_path: ".user_email"
      as: "test_email"

- name: "Use in browser"
  plugin: browser_use
  config:
    task: "Login with email {{ test_email }}"

Response Format

browser_use returns structured results:

{
  "status": "pass",
  "message": "Task completed successfully",
  "result": "Found heading 'Example Domain'"
}

Vision Mode

Enable for visual elements like charts, images, or complex layouts:

use_vision: true

Best Practices

  • Be specific: "Find product under $50 and add to cart" not "Check website"
  • Set max_steps: Limit agent iterations (2-5 steps for most tasks)
  • Use vision selectively: Only enable for visual content (slower, more tokens)
  • Simplify tasks: Keep tasks focused and achievable
  • Combine with playwright: Use playwright for setup, browser_use for verification

Troubleshooting

Issue Solution
Timeout errors Increase timeout or reduce max_steps
API errors Check API key and quota limits
Browser won't start Run playwright install chromium
Task fails Simplify the task, add more specific instructions
High cost Reduce max_steps, disable use_vision when not needed

Agent Plugin vs browser_use

Feature browser_use agent (MCP)
Model GPT-4o or Claude Claude Sonnet 4.5
Tools Browser only Browser + Files + APIs + Database
Architecture Integrated library MCP server protocol
Use Case Simple browser automation Complex multi-tool workflows

Recommendation: Use the Agent plugin for new projects. It offers better integration, multi-tool support, and Claude Sonnet 4.5's advanced capabilities.

See Also