Browser Plugin - AI-Powered Web Automation
Test web applications using AI-driven browser automation. The browser plugin uses natural language instructions to navigate websites, extract data, and validate interfaces.
Prerequisites
# Install dependencies (automatic on first use)
pip install browser-use playwright
playwright install chromium
# Set API key (OpenAI or Anthropic)
export OPENAI_API_KEY=sk-your-key-here
# OR
export ANTHROPIC_API_KEY=sk-ant-your-key-here
Basic Usage
- name: "Check website content"
  plugin: browser
  config:
    task: "Navigate to https://example.com and extract the main heading"
    llm:
      provider: "openai" # or "anthropic"
      model: "gpt-4o"
      config:
        OPENAI_API_KEY: "{{ .env.OPENAI_API_KEY }}"
    headless: true
    timeout: "1m"
  save:
    - json_path: ".result"
      as: "page_content"
  assertions:
    - type: "json_path"
      path: ".success"
      expected: true
Configuration
Required Fields
config:
  task: "Natural language instruction" # What the browser should do
  llm: # LLM provider configuration
    provider: "openai" # "openai" or "anthropic"
    model: "gpt-4o" # Model name
    config:
      OPENAI_API_KEY: "{{ .env.OPENAI_API_KEY }}"
Optional Fields
config:
  headless: true # Run without visible browser (default: true)
  timeout: "2m" # Max execution time (default: 2m)
  use_vision: false # Enable visual analysis (default: false)
  max_actions_per_step: 10 # Action limit per step (default: 10)
  allowed_domains: # Restrict navigation (optional)
    - "example.com"
    - "api.example.com"
  viewport: # Custom browser size
    width: 1920
    height: 1080
LLM Providers
OpenAI
llm:
  provider: "openai"
  model: "gpt-4o" # or "gpt-4", "gpt-3.5-turbo"
  config:
    OPENAI_API_KEY: "{{ .env.OPENAI_API_KEY }}"
Anthropic
llm:
  provider: "anthropic"
  model: "claude-3-5-sonnet-20241022" # or other Claude models
  config:
    ANTHROPIC_API_KEY: "{{ .env.ANTHROPIC_API_KEY }}"
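The full examples in this guide all use OpenAI. As a minimal sketch, a complete step using Anthropic would presumably differ only in the llm block; the step name and task here are illustrative, and every field is taken from the examples above.

```yaml
# Illustrative step: same shape as the Basic Usage example,
# with only the llm block switched to the Anthropic provider.
- name: "Check website content (Anthropic)"
  plugin: browser
  config:
    task: "Navigate to https://example.com and extract the main heading"
    llm:
      provider: "anthropic"
      model: "claude-3-5-sonnet-20241022"
      config:
        # Read from the environment variable set in Prerequisites
        ANTHROPIC_API_KEY: "{{ .env.ANTHROPIC_API_KEY }}"
    headless: true
    timeout: "1m"
  assertions:
    - type: "json_path"
      path: ".success"
      expected: true
```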
Save & Assert
Extract Data
- name: "Scrape product info"
  plugin: browser
  config:
    task: "Go to https://example.com/product and extract the price and title"
    llm:
      provider: "openai"
      model: "gpt-4o"
      config:
        OPENAI_API_KEY: "{{ .env.OPENAI_API_KEY }}"
  save:
    - json_path: ".result"
      as: "product_data"
    - json_path: ".actions_taken"
      as: "action_count"
Assert Success
assertions:
  - type: "json_path"
    path: ".success"
    expected: true
  - type: "json_path"
    path: ".actions_taken"
    exists: true
Common Use Cases
Web Application Testing
- name: "Test login flow"
  plugin: browser
  config:
    task: |
      1. Go to https://app.example.com/login
      2. Enter email: test@example.com
      3. Enter password: testpass123
      4. Click login button
      5. Verify you see the dashboard
    llm:
      provider: "openai"
      model: "gpt-4o"
      config:
        OPENAI_API_KEY: "{{ .env.OPENAI_API_KEY }}"
    timeout: "3m"
Data Extraction
- name: "Scrape pricing table"
  plugin: browser
  config:
    task: "Navigate to https://example.com/pricing and extract all plan names and prices"
    llm:
      provider: "openai"
      model: "gpt-4o"
      config:
        OPENAI_API_KEY: "{{ .env.OPENAI_API_KEY }}"
    use_vision: true # Better for visual elements
  save:
    - json_path: ".result"
      as: "pricing_data"
Form Submission
- name: "Fill contact form"
  plugin: browser
  config:
    task: |
      Go to https://example.com/contact
      Fill in:
      - Name: Test User
      - Email: test@example.com
      - Message: Automated test message
      Click submit
      Verify success message appears
    llm:
      provider: "openai"
      model: "gpt-4o"
      config:
        OPENAI_API_KEY: "{{ .env.OPENAI_API_KEY }}"
    headless: false # Watch it work
Multi-Step Workflows
tests:
  - name: "Complete purchase flow"
    steps:
      - name: "Browse products"
        plugin: browser
        config:
          # Ask for the URL explicitly so the saved value can be reused below
          task: "Go to https://shop.example.com, find the first product, and return its URL"
          llm:
            provider: "openai"
            model: "gpt-4o"
            config:
              OPENAI_API_KEY: "{{ .env.OPENAI_API_KEY }}"
        save:
          - json_path: ".result"
            as: "product_url"
      - name: "Add to cart"
        plugin: browser
        config:
          task: "Go to {{ product_url }} and click add to cart"
          llm:
            provider: "openai"
            model: "gpt-4o"
            config:
              OPENAI_API_KEY: "{{ .env.OPENAI_API_KEY }}"
      - name: "Checkout"
        plugin: browser
        config:
          task: "Navigate to cart and proceed to checkout"
          llm:
            provider: "openai"
            model: "gpt-4o"
            config:
              OPENAI_API_KEY: "{{ .env.OPENAI_API_KEY }}"
Best Practices
Clear Instructions: Be specific about what the browser should do
# ❌ Vague
task: "Check the website"
# ✅ Specific
task: "Navigate to https://example.com/products, click on the first product, and extract its price"
Appropriate Timeouts: Complex, multi-step tasks need more time than the 2m default; raise timeout for them (the login flow example uses 3m)
Headless vs Headful:
- Use headless: true for CI/CD and faster execution
- Use headless: false for debugging and watching the browser
Vision Mode: Enable use_vision: true for visual elements (charts, images)
Restrict Domains: Set allowed_domains to prevent navigation to unexpected sites
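As a rough sketch, the practices above might combine into a single step like the following. The step name, task, and domain are illustrative; all fields come from the Configuration section, and the values are assumptions to tune for your own site.

```yaml
# Illustrative step applying the practices above; values are examples only.
- name: "Extract pricing chart data"
  plugin: browser
  config:
    # Specific instruction: names the page, the target, and the output
    task: "Navigate to https://example.com/pricing, read the plan comparison chart, and extract each plan name and price"
    llm:
      provider: "openai"
      model: "gpt-4o"
      config:
        OPENAI_API_KEY: "{{ .env.OPENAI_API_KEY }}"
    headless: true        # CI-friendly; set to false when debugging
    timeout: "3m"         # above the 2m default for a longer task
    use_vision: true      # the chart is a visual element
    allowed_domains:      # keep the browser on the expected site
      - "example.com"
  assertions:
    - type: "json_path"
      path: ".success"
      expected: true
```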
Troubleshooting
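Browser won't start: Install the Playwright browsers (playwright install chromium)
Timeout errors: Increase timeout or simplify the task
Actions not working: Add debug logging, or set headless: false to watch the browser
Navigation fails: Check that the target site is listed in allowed_domains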
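When narrowing down one of these problems, a stripped-down step can help isolate the cause. The sketch below is one possible debugging setup, not a prescribed procedure: the step name and task are illustrative, and it uses only fields documented above.

```yaml
# Illustrative debugging step: one small task, visible browser, generous timeout.
- name: "Debug: open login page"
  plugin: browser
  config:
    task: "Go to https://app.example.com/login and report the page title"
    llm:
      provider: "openai"
      model: "gpt-4o"
      config:
        OPENAI_API_KEY: "{{ .env.OPENAI_API_KEY }}"
    headless: false       # watch what the browser actually does
    timeout: "5m"         # rule out timeouts while investigating
    allowed_domains:      # confirm the target site is permitted
      - "app.example.com"
  assertions:
    - type: "json_path"
      path: ".actions_taken"
      exists: true
```

Once this minimal step passes, the original task can be restored piece by piece to find the instruction that fails.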