Browser Plugin - AI-Powered Web Automation
The browser plugin enables AI-driven browser automation within your test workflows, using browser-use to intelligently navigate websites, extract data, and validate web interfaces. Use it to test web applications, scrape data, monitor website functionality, or automate complex browser interactions.
Key Features Demonstrated
- AI-Driven Navigation: Autonomous web browsing with natural language instructions
- Visual Processing: Optional visual analysis for enhanced accuracy
- Viewport Control: Custom browser window and viewport dimensions
- Session Management: Persistent browser sessions across test steps
- Multi-Domain Support: Control which domains the agent can navigate
- Headless/Headful Modes: Run with or without visible browser window
- LLM Flexibility: Support for OpenAI and Anthropic models
- Data Extraction: Extract structured data from web pages
Prerequisites
Before using the browser plugin, you need:
- Browser-use installed: The plugin will attempt to install it automatically
- LLM API Key: Either OpenAI or Anthropic API key
- Chrome/Chromium: Installed on your system (handled by Playwright)
# For OpenAI
export OPENAI_API_KEY=sk-your-key-here
# For Anthropic
export ANTHROPIC_API_KEY=sk-ant-your-key-here
Basic Configuration
plugin: browser
config:
  task: "Your browser task here" # Required: instruction for the AI agent
  llm: # Required: LLM configuration
    provider: "openai" # or "anthropic"
    model: "gpt-4o" # or other supported models
    config:
      OPENAI_API_KEY: "{{ .env.OPENAI_API_KEY }}"
  headless: true # Optional: run without visible browser
  timeout: "2m" # Optional: execution timeout
Simple Example
name: "Basic Browser Test"
version: "v1.0.0"
tests:
  - name: "Check website content"
    steps:
      - name: "Visit and analyze homepage"
        plugin: browser
        config:
          task: "Navigate to https://example.com and tell me what the main heading says"
          llm:
            provider: "openai"
            model: "gpt-4o"
            config:
              OPENAI_API_KEY: "{{ .env.OPENAI_API_KEY }}"
          headless: true
          timeout: "1m"
        save:
          - json_path: ".result"
            as: "page_analysis"
          - json_path: ".success"
            as: "success"
        assertions:
          - type: "json_path"
            path: ".success"
            expected: true
      - name: "Log results"
        plugin: log
        config:
          message: "Page analysis: {{ page_analysis }}"
Comprehensive Configuration Example
name: "Browser Plugin Feature Demo"
version: "v1.0.0"
vars:
  target_site: "https://docs.rocketship.sh"
  search_term: "installation"
tests:
  - name: "Complete browser automation workflow"
    steps:
      # Basic navigation with custom viewport
      - name: "Mobile viewport test"
        plugin: browser
        config:
          task: |
            Navigate to {{ .vars.target_site }} and describe:
            1. The main navigation menu
            2. How the layout appears on mobile
            3. Any responsive design elements
          llm:
            provider: "openai"
            model: "gpt-4o"
            config:
              OPENAI_API_KEY: "{{ .env.OPENAI_API_KEY }}"
          executor_type: "python"
          headless: false
          timeout: "3m"
          max_steps: 5
          use_vision: true
          viewport:
            width: 375
            height: 667
          browser_type: "chromium"
        save:
          - json_path: ".result"
            as: "mobile_analysis"
          - json_path: ".success"
            as: "mobile_success"
          - json_path: ".session_id"
            as: "session_id"
      # Desktop viewport with domain restrictions
      - name: "Desktop search test"
        plugin: browser
        config:
          task: |
            1. Navigate to {{ .vars.target_site }}
            2. Find and use the search functionality
            3. Search for "{{ .vars.search_term }}"
            4. Tell me what results you found
          llm:
            provider: "openai"
            model: "gpt-4o"
            config:
              OPENAI_API_KEY: "{{ .env.OPENAI_API_KEY }}"
          headless: false
          timeout: "2m"
          max_steps: 10
          use_vision: true
          viewport:
            width: 1920
            height: 1080
          allowed_domains:
            - "docs.rocketship.sh"
            - "rocketship.sh"
        save:
          - json_path: ".result"
            as: "search_results"
          - json_path: ".extracted_data"
            as: "extracted_info"
      # Data extraction example
      - name: "Extract structured data"
        plugin: browser
        config:
          task: |
            Navigate to https://jsonplaceholder.typicode.com/users
            Extract the first 3 users' information including:
            - Name
            - Email
            - Company name
            Return as structured JSON data
          llm:
            provider: "openai"
            model: "gpt-4o"
            config:
              OPENAI_API_KEY: "{{ .env.OPENAI_API_KEY }}"
          headless: true
          timeout: "2m"
          use_vision: false # Text extraction doesn't need vision
        save:
          - json_path: ".result"
            as: "user_data"
          - json_path: ".success"
            as: "extraction_success"
      # Log comprehensive results
      - name: "Log all results"
        plugin: log
        config:
          message: |
            🌐 Browser Automation Results:
            📱 Mobile Analysis:
            {{ mobile_analysis }}
            🔍 Search Results:
            {{ search_results }}
            📊 Extracted Data:
            {{ user_data }}
            ✅ Success Status: Mobile={{ mobile_success }}, Extraction={{ extraction_success }}
Configuration Options
Required Fields
Field | Description | Example |
---|---|---|
task | Natural language instruction for the browser agent | "Navigate to site and click login" |
llm | LLM configuration object | See LLM Configuration section |
Optional Fields
Field | Type | Default | Description |
---|---|---|---|
executor_type | string | "python" | Executor type (only python supported) |
timeout | string | "5m" | Execution timeout (e.g., "30s", "2m", "1h") |
max_steps | integer | 50 | Maximum browser automation steps |
browser_type | string | "chromium" | Browser to use: chromium, chrome, edge |
headless | boolean | true | Run without visible browser window |
use_vision | boolean | true | Enable visual processing for better accuracy |
session_id | string | - | Browser session ID for persistence |
save_screenshots | boolean | false | Save screenshots during execution |
allowed_domains | array | [] | Restrict navigation to specific domains |
viewport | object | 1920x1080 | Browser viewport dimensions |
LLM Configuration
The llm object configures which AI model to use:
llm:
  provider: "openai" # or "anthropic"
  model: "gpt-4o" # Model name
  config:
    # Provider-specific API keys
    OPENAI_API_KEY: "{{ .env.OPENAI_API_KEY }}"
    # OR for Anthropic:
    # ANTHROPIC_API_KEY: "{{ .env.ANTHROPIC_API_KEY }}"
Supported models:
- OpenAI: gpt-4o, gpt-4, gpt-3.5-turbo
- Anthropic: claude-3-opus, claude-3-sonnet, claude-3-haiku
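For comparison, here is a minimal sketch of the same block configured for Anthropic. It uses only the provider name, key name, and a model name that appear on this page. Whether the plugin accepts that model string verbatim is an assumption.

```yaml
llm:
  provider: "anthropic"
  model: "claude-3-sonnet" # assumption: name taken from the supported list above
  config:
    ANTHROPIC_API_KEY: "{{ .env.ANTHROPIC_API_KEY }}"
```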
Viewport Configuration
Control browser window and viewport dimensions with the viewport option, which takes an integer width and height in pixels.
Common viewport sizes:
- Mobile: 375x667 (iPhone), 360x640 (Android)
- Tablet: 768x1024 (iPad)
- Desktop: 1920x1080, 1366x768, 1280x720
Note: In headless mode, only the viewport affects rendering. In headful mode, both the browser window and viewport are set to these dimensions.
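The viewport setting described above can be sketched as follows, using the `viewport`, `width`, and `height` keys from the examples earlier on this page. The mobile dimensions are one of the common sizes listed. `headless: false` is included only to illustrate the headful case from the note.

```yaml
config:
  headless: false # headful: browser window and viewport both use these dimensions
  viewport:
    width: 375 # integers, not quoted strings
    height: 667
```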
Response Structure
The browser plugin returns a JSON response with:
{
"success": true,
"result": "AI agent's description of what it found/did",
"session_id": "unique-session-identifier",
"steps": [...], // Detailed automation steps
"screenshots": [...], // If save_screenshots is true
"extracted_data": {...}, // Any structured data extracted
"error": "Error message if success is false"
}
Save Operations
Basic Data Extraction
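A minimal sketch of a basic save block, mirroring the Simple Example earlier on this page. It stores the agent's answer and the success flag. The variable names after `as:` are arbitrary placeholders.

```yaml
save:
  - json_path: ".result"
    as: "page_analysis"
  - json_path: ".success"
    as: "success"
```

Saved values can then be referenced in later steps, for example as `{{ page_analysis }}`.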
Complete Response Capture
save:
  - json_path: ".result"
    as: "ai_analysis"
  - json_path: ".session_id"
    as: "browser_session"
  - json_path: ".extracted_data"
    as: "structured_data"
  - json_path: ".steps"
    as: "automation_steps"
  - json_path: ".screenshots"
    as: "screenshots_list"
Optional Fields
save:
  - json_path: ".result"
    as: "required_result"
  - json_path: ".optional_metadata"
    as: "extra_info"
    required: false # Won't fail if missing
Assertions
Validate browser automation results:
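As a minimal sketch, the assertion below reuses the `json_path` assertion type from the Simple Example to require that the run reported success. Other assertion types may exist, but none are shown on this page, so none are assumed here.

```yaml
assertions:
  - type: "json_path"
    path: ".success" # field from the response structure above
    expected: true
```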
Template Variables
Use data from previous steps in browser tasks:
# From previous HTTP responses
task: "Navigate to {{ api_base_url }} and login with {{ test_credentials }}"
# From configuration variables
task: "Go to {{ .vars.target_site }} and search for {{ .vars.search_query }}"
# Multi-line tasks with context
task: |
  Previous analysis found: {{ previous_result }}
  Now navigate to the contact page and:
  1. Fill out the form with this data
  2. Submit and capture the confirmation
Use Cases
Web Application Testing
- name: "Test login flow"
  plugin: browser
  config:
    task: |
      1. Navigate to {{ app_url }}/login
      2. Enter username: testuser@example.com
      3. Enter password: testpass123
      4. Click login button
      5. Verify you see the dashboard
    llm:
      provider: "openai"
      model: "gpt-4o"
      config:
        OPENAI_API_KEY: "{{ .env.OPENAI_API_KEY }}"
    headless: false
    timeout: "2m"
Data Scraping
- name: "Extract product information"
  plugin: browser
  config:
    task: |
      Navigate to {{ product_url }}
      Extract:
      - Product name
      - Price
      - Availability
      - Description
      Return as JSON: {"name": "", "price": "", "available": true/false, "description": ""}
    llm:
      provider: "openai"
      model: "gpt-4o"
      config:
        OPENAI_API_KEY: "{{ .env.OPENAI_API_KEY }}"
    use_vision: true # Helps with complex layouts
Visual Regression Testing
- name: "Check page layout"
  plugin: browser
  config:
    task: |
      Navigate to {{ page_url }}
      Analyze the visual layout and report:
      1. Any broken elements
      2. Missing images
      3. Layout issues
      4. Text overflow problems
    llm:
      provider: "openai"
      model: "gpt-4o"
      config:
        OPENAI_API_KEY: "{{ .env.OPENAI_API_KEY }}"
    use_vision: true
    save_screenshots: true
    viewport:
      width: 1920
      height: 1080
Multi-Step Form Testing
- name: "Complete multi-page form"
  plugin: browser
  config:
    task: |
      1. Go to {{ form_url }}
      2. Page 1: Fill personal information (John Doe, john@example.com)
      3. Click Next
      4. Page 2: Select "Premium" plan
      5. Click Next
      6. Page 3: Enter payment details (use test card 4111111111111111)
      7. Submit form
      8. Capture confirmation number
    llm:
      provider: "openai"
      model: "gpt-4o"
      config:
        OPENAI_API_KEY: "{{ .env.OPENAI_API_KEY }}"
    max_steps: 20
    timeout: "5m"
Responsive Design Testing
- name: "Test responsive design"
  plugin: browser
  config:
    task: "Navigate to {{ site_url }} and describe how the navigation menu behaves"
    llm:
      provider: "openai"
      model: "gpt-4o"
      config:
        OPENAI_API_KEY: "{{ .env.OPENAI_API_KEY }}"
    headless: false
    viewport:
      width: 375 # Mobile width
      height: 667
Running Examples
# Run basic browser test
rocketship run -af examples/browser-testing/rocketship.yaml
# Run with OpenAI API key
OPENAI_API_KEY=your-key rocketship run -af examples/browser-testing/rocketship.yaml
# Run viewport comparison test
OPENAI_API_KEY=your-key rocketship run -af examples/browser-testing/viewport-test.yaml
# Run in debug mode to see browser interactions
OPENAI_API_KEY=your-key ROCKETSHIP_LOG=DEBUG rocketship run -af examples/browser-testing/rocketship.yaml
Best Practices
1. Clear, Specific Instructions
# Good: Specific, step-by-step instructions
task: |
  1. Navigate to https://example.com/products
  2. Click on "Electronics" category
  3. Find the first laptop listing
  4. Extract the price and product name
# Avoid: Vague instructions
task: "Go to the website and find laptop prices"
2. Use Appropriate Timeouts
# Quick navigation tasks
timeout: "30s"
# Complex multi-step processes
timeout: "5m"
# Data extraction from large pages
timeout: "2m"
3. Choose Headless Mode Wisely
# Use headless for:
# - CI/CD pipelines
# - Data extraction
# - High-volume testing
headless: true
# Use headful for:
# - Debugging
# - Visual testing
# - Development
headless: false
4. Optimize Vision Usage
# Enable for visual tasks
use_vision: true # Layout testing, visual elements
# Disable for text-only tasks
use_vision: false # Form filling, text extraction
5. Control Navigation Scope
# Restrict to specific domains for security
allowed_domains:
- "myapp.com"
- "api.myapp.com"
- "cdn.myapp.com"
6. Handle Dynamic Content
task: |
  1. Navigate to {{ dynamic_url }}
  2. Wait for the loading spinner to disappear
  3. Wait for the content section to be visible
  4. Then extract the data
max_steps: 10 # Allow enough steps for waiting
Troubleshooting
Common Issues
"Browser automation failed: Python executor not available"
- The plugin will attempt to install browser-use automatically
- Ensure Python 3.8+ is installed on your system
- Check logs for specific installation errors
"Failed to launch browser"
- Ensure Chrome/Chromium is installed
- For headless mode on servers, install: apt-get install chromium-browser
- Check if another browser instance is using the same profile
"Viewport not changing"
- The browser window is resized only when headless: false
- The viewport itself always affects page rendering, in both modes
- Check that viewport values are integers, not strings
"Navigation blocked"
- Check allowed_domains configuration
- Some sites block automation - try with use_vision: true
- Increase max_steps for complex navigation
"Empty or unclear results"
- Make browser task instructions more specific
- Enable vision with use_vision: true for better accuracy
- Check if the page loaded completely before extraction
- Increase timeout for slow-loading pages
"Session not persisting"
- Save and reuse session_id between steps
- Browser sessions are cleared after each test by default
- Use the same browser_type across steps
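The first point can be sketched as two consecutive steps: the first saves `.session_id` from the response, and the second passes it back through the `session_id` option. This assumes a saved variable can be templated into `session_id` the same way it is templated into `task`. The step names and task text are hypothetical.

```yaml
- name: "Open site" # hypothetical step
  plugin: browser
  config:
    task: "Navigate to https://example.com"
    llm:
      provider: "openai"
      model: "gpt-4o"
      config:
        OPENAI_API_KEY: "{{ .env.OPENAI_API_KEY }}"
    browser_type: "chromium"
  save:
    - json_path: ".session_id"
      as: "session_id"
- name: "Continue in the same session" # hypothetical step
  plugin: browser
  config:
    task: "Describe the page that is currently open"
    session_id: "{{ session_id }}" # assumption: reuses the ID saved above
    llm:
      provider: "openai"
      model: "gpt-4o"
      config:
        OPENAI_API_KEY: "{{ .env.OPENAI_API_KEY }}"
    browser_type: "chromium" # same browser_type as the first step
```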
Debug Mode
Run with debug logging to see detailed browser interactions by setting ROCKETSHIP_LOG=DEBUG, as in the last command under Running Examples.
This will show:
- Browser launch commands
- Page navigation details
- AI agent decisions
- Screenshot captures (if enabled)
The browser plugin enables sophisticated web testing scenarios that combine the flexibility of AI-driven automation with the reliability of structured test frameworks, making it ideal for testing modern web applications with dynamic content and complex interactions.