agent-browser
Browser automation CLI for AI agents. Use when the user needs to interact with websites, including navigating pages, filling forms, clicking buttons, taking screenshots, extracting data, testing web apps, or automating any browser task. Triggers include requests to "open a website", "fill out a form", "click a button", "take a screenshot", "scrape data from a page", "test this web app", "login to a site", "automate browser actions", or any task requiring programmatic web interaction.
What this skill does
# Browser Automation with agent-browser The CLI uses Chrome/Chromium via CDP directly. Install via `npm i -g agent-browser`, `brew install agent-browser`, or `cargo install agent-browser`. Run `agent-browser install` to download Chrome. ## Core Workflow Every browser automation follows this pattern: 1. **Navigate**: `agent-browser open <url>` 2. **Snapshot**: `agent-browser snapshot -i` (get element refs like `@e1`, `@e2`) 3. **Interact**: Use refs to click, fill, select 4. **Re-snapshot**: After navigation or DOM changes, get fresh refs ```bash agent-browser open https://example.com/form agent-browser snapshot -i # Output: @e1 [input type="email"], @e2 [input type="password"], @e3 [button] "Submit" agent-browser fill @e1 "[email protected]" agent-browser fill @e2 "password123" agent-browser click @e3 agent-browser wait --load networkidle agent-browser snapshot -i # Check result ``` ## Command Chaining Commands can be chained with `&&` in a single shell invocation. The browser persists between commands via a background daemon, so chaining is safe and more efficient than separate calls. ```bash # Chain open + wait + snapshot in one call agent-browser open https://example.com && agent-browser wait --load networkidle && agent-browser snapshot -i # Chain multiple interactions agent-browser fill @e1 "[email protected]" && agent-browser fill @e2 "password123" && agent-browser click @e3 # Navigate and capture agent-browser open https://example.com && agent-browser wait --load networkidle && agent-browser screenshot page.png ``` **When to chain:** Use `&&` when you don't need to read the output of an intermediate command before proceeding (e.g., open + wait + screenshot). Run commands separately when you need to parse the output first (e.g., snapshot to discover refs, then interact using those refs). ## Handling Authentication When automating a site that requires login, choose the approach that fits: **Option 1: Import auth from the user's browser (fastest for one-off tasks)** ```bash # Connect to the user's running Chrome (they're already logged in) agent-browser --auto-connect state save ./auth.json # Use that auth state agent-browser --state ./auth.json open https://app.example.com/dashboard ``` State files contain session tokens in plaintext -- add to `.gitignore` and delete when no longer needed. Set `AGENT_BROWSER_ENCRYPTION_KEY` for encryption at rest. **Option 2: Persistent profile (simplest for recurring tasks)** ```bash # First run: login manually or via automation agent-browser --profile ~/.myapp open https://app.example.com/login # ... fill credentials, submit ... # All future runs: already authenticated agent-browser --profile ~/.myapp open https://app.example.com/dashboard ``` **Option 3: Session name (auto-save/restore cookies + localStorage)** ```bash agent-browser --session-name myapp open https://app.example.com/login # ... login flow ... agent-browser close # State auto-saved # Next time: state auto-restored agent-browser --session-name myapp open https://app.example.com/dashboard ``` **Option 4: Auth vault (credentials stored encrypted, login by name)** ```bash echo "$PASSWORD" | agent-browser auth save myapp --url https://app.example.com/login --username user --password-stdin agent-browser auth login myapp ``` **Option 5: State file (manual save/load)** ```bash # After logging in: agent-browser state save ./auth.json # In a future session: agent-browser state load ./auth.json agent-browser open https://app.example.com/dashboard ``` See [references/authentication.md](references/authentication.md) for OAuth, 2FA, cookie-based auth, and token refresh patterns. ## Essential Commands ```bash # Navigation agent-browser open <url> # Navigate (aliases: goto, navigate) agent-browser close # Close browser # Snapshot agent-browser snapshot -i # Interactive elements with refs (recommended) agent-browser snapshot -i -C # Include cursor-interactive elements (divs with onclick, cursor:pointer) agent-browser snapshot -s "#selector" # Scope to CSS selector # Interaction (use @refs from snapshot) agent-browser click @e1 # Click element agent-browser click @e1 --new-tab # Click and open in new tab agent-browser fill @e2 "text" # Clear and type text agent-browser type @e2 "text" # Type without clearing agent-browser select @e1 "option" # Select dropdown option agent-browser check @e1 # Check checkbox agent-browser press Enter # Press key agent-browser keyboard type "text" # Type at current focus (no selector) agent-browser keyboard inserttext "text" # Insert without key events agent-browser scroll down 500 # Scroll page agent-browser scroll down 500 --selector "div.content" # Scroll within a specific container # Get information agent-browser get text @e1 # Get element text agent-browser get url # Get current URL agent-browser get title # Get page title agent-browser get cdp-url # Get CDP WebSocket URL # Wait agent-browser wait @e1 # Wait for element agent-browser wait --load networkidle # Wait for network idle agent-browser wait --url "**/page" # Wait for URL pattern agent-browser wait 2000 # Wait milliseconds agent-browser wait --text "Welcome" # Wait for text to appear (substring match) agent-browser wait --fn "!document.body.innerText.includes('Loading...')" # Wait for text to disappear agent-browser wait "#spinner" --state hidden # Wait for element to disappear # Downloads agent-browser download @e1 ./file.pdf # Click element to trigger download agent-browser wait --download ./output.zip # Wait for any download to complete agent-browser --download-path ./downloads open <url> # Set default download directory # Viewport & Device Emulation agent-browser set viewport 1920 1080 # Set viewport size (default: 1280x720) agent-browser set viewport 1920 1080 2 # 2x retina (same CSS size, higher res screenshots) agent-browser set device "iPhone 14" # Emulate device (viewport + user agent) # Capture agent-browser screenshot # Screenshot to temp dir agent-browser screenshot --full # Full page screenshot agent-browser screenshot --annotate # Annotated screenshot with numbered element labels agent-browser screenshot --screenshot-dir ./shots # Save to custom directory agent-browser screenshot --screenshot-format jpeg --screenshot-quality 80 agent-browser pdf output.pdf # Save as PDF # Clipboard agent-browser clipboard read # Read text from clipboard agent-browser clipboard write "Hello, World!" # Write text to clipboard agent-browser clipboard copy # Copy current selection agent-browser clipboard paste # Paste from clipboard # Diff (compare page states) agent-browser diff snapshot # Compare current vs last snapshot agent-browser diff snapshot --baseline before.txt # Compare current vs saved file agent-browser diff screenshot --baseline before.png # Visual pixel diff agent-browser diff url <url1> <url2> # Compare two pages agent-browser diff url <url1> <url2> --wait-until networkidle # Custom wait strategy agent-browser diff url <url1> <url2> --selector "#main" # Scope to element ``` ## Common Patterns ### Form Submission ```bash agent-browser open https://example.com/signup agent-browser snapshot -i agent-browser fill @e1 "Jane Doe" agent-browser fill @e2 "[email protected]" agent-browser select @e3 "California" agent-browser check @e4 agent-browser click @e5 agent-browser wait --load networkidle ``` ### Authentication with Auth Vault (Recommended) ```bash # Save credentials once (encrypted with AGENT_BROWSER_ENCRYPTION_KEY) # Recommended: pipe password via stdin to avoid shell history exposure echo "pass" | agent-browser auth sa
Related in Web Dev
generating-lwc-components
IncludedLightning Web Components with PICKLES methodology and 165-point scoring. Use this skill when the user creates or edits LWC components, builds wire service patterns, or writes Jest tests for LWC. TRIGGER when: user creates/edits LWC components, touches lwc/**/*.js, .html, .css, .js-meta.xml files, or asks about wire service, SLDS, or Jest LWC tests. DO NOT TRIGGER when: Apex classes (use generating-apex), Aura components, or Visualforce.
tanstack-query
IncludedManage server state in React with TanStack Query v5. Set up queries with useQuery, mutations with useMutation, configure QueryClient caching strategies, implement optimistic updates, and handle infinite scroll with useInfiniteQuery. Use when: setting up data fetching in React projects, migrating from v4 to v5, or fixing object syntax required errors, query callbacks removed issues, cacheTime renamed to gcTime, isPending vs isLoading confusion, keepPreviousData removed problems.
document-processor-api
IncludedProcess documents with Nutrient DWS. Use when the user wants to generate PDFs from HTML or URLs, convert Office/images/PDFs, assemble or split packets, OCR scans, extract text/tables/key-value pairs, redact PII, watermark, sign, fill forms, optimize PDFs, or produce compliance outputs like PDF/A or PDF/UA. Triggers include convert to PDF, merge these PDFs, OCR this scan, extract tables, redact PII, sign this PDF, make this PDF/A, or linearize for web delivery.
nutrient-document-processing
IncludedProcess documents with Nutrient DWS. Use when the user wants to generate PDFs from HTML or URLs, convert Office/images/PDFs, assemble or split packets, OCR scans, extract text/tables/key-value pairs, redact PII, watermark, sign, fill forms, optimize PDFs, or produce compliance outputs like PDF/A or PDF/UA. Triggers include convert to PDF, merge these PDFs, OCR this scan, extract tables, redact PII, sign this PDF, make this PDF/A, or linearize for web delivery.
tanstack-query
IncludedManage server state in React with TanStack Query v5. Covers useMutationState, simplified optimistic updates, throwOnError, network mode (offline/PWA), and infiniteQueryOptions. Use when setting up data fetching, fixing v4→v5 migration errors (object syntax, gcTime, isPending, keepPreviousData), or debugging SSR/hydration issues with streaming server components.
accelint-nextjs-best-practices
IncludedNext.js performance optimization and best practices. Use when writing Next.js code (App Router or Pages Router); implementing Server Components, Server Actions, or API routes; optimizing RSC serialization, data fetching, or server-side rendering; reviewing Next.js code for performance issues; fixing authentication in Server Actions; or implementing Suspense boundaries, parallel data fetching, or request deduplication.