browser
Browser automation with persistent page state. Use when users ask to navigate websites, fill forms, take screenshots, extract web data, test web apps, or automate browser workflows. Trigger phrases include "go to [url]", "click on", "fill out the form", "take a screenshot", "scrape", "automate", "test the website", "log into", or any browser interaction request.
# Browser Automation
Browser automation that maintains page state across command executions. Write small, focused commands to accomplish tasks incrementally.
## Choosing Your Approach
- **Local/source-available sites**: Read the source code first to write selectors directly
- **Unknown page layouts**: Use `snapshot` to discover elements, then `select-ref` to interact
- **Visual debugging**: Take `screenshot` to see current page state
## Prerequisites
```bash
# Check that the browser server is running (Max must be open).
# Group the curl so `||` tests curl's exit status, not head's.
(curl -s http://localhost:9222/ || echo "SERVER_NOT_RUNNING") | head -1
```
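You can run the same check from Python, for example before a script sends its first command. This is a minimal sketch that uses only the standard library. `server_is_running` is a hypothetical helper and is not part of `client.py`. The sketch assumes the server answers plain HTTP at `http://localhost:9222/`. Any HTTP reply, even an error status, counts as running. Only a failed connection counts as down.

```python
import urllib.error
import urllib.request


def server_is_running(url: str = "http://localhost:9222/", timeout: float = 2.0) -> bool:
    """Return True if anything answers HTTP at `url` (hypothetical helper)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server replied with a non-2xx status: it is still up.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure or timeout: nothing is listening.
        return False
```

A caller can print `SERVER_NOT_RUNNING` when `server_is_running()` returns `False`. This mirrors the `curl` check.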
## Running Commands
All commands use `client.py` from the skill directory:
```bash
uv run skills/browser/client.py <command> [arguments]
```
> ⚠️ **IMPORTANT**: Always pass the script path straight to `uv run` (e.g. `uv run skills/browser/client.py`), **NOT** `uv run python client.py`. The `uv run` command automatically handles Python and dependencies from `pyproject.toml`. Adding `python` breaks dependency resolution.
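To drive the client from another Python program, pass the arguments as a list so that no shell quoting is involved. This is a minimal sketch. `client_cmd` and `run_client` are hypothetical helpers. The script path assumes you run from the repository root, as in the examples above.

```python
import subprocess

CLIENT = "skills/browser/client.py"  # assumed path, relative to the repo root


def client_cmd(command: str, *args: str) -> list[str]:
    """Build the argv for one client.py command (no `python` after `uv run`)."""
    return ["uv", "run", CLIENT, command, *args]


def run_client(command: str, *args: str) -> str:
    """Run one command and return its stdout; raise CalledProcessError on failure."""
    result = subprocess.run(
        client_cmd(command, *args), capture_output=True, text=True, check=True
    )
    return result.stdout
```

For example, `run_client("info", "main")` returns whatever `client.py info main` prints.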
## Workflow Loop
Follow this pattern for complex tasks:
1. **Run a command** to perform one action
2. **Observe** the output
3. **Evaluate** - did it work? What's the current state?
4. **Decide** - is the task complete or do we need another command?
5. **Repeat** until task is done
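The loop above can be sketched as a small driver. It is an illustration and not part of the skill. `workflow_loop`, `run` and `decide` are hypothetical names:

- `run` would typically shell out to `client.py` and return its output.
- `decide` inspects that output and returns the next command, or `None` when the task is done.
- `max_steps` is an assumed safety limit, so a confused loop cannot run forever.

```python
from typing import Callable, Optional

Command = list[str]


def workflow_loop(
    run: Callable[[Command], str],
    decide: Callable[[str], Optional[Command]],
    first: Command,
    max_steps: int = 10,
) -> list[tuple[Command, str]]:
    """Run -> observe -> evaluate/decide, repeated until `decide` returns None."""
    history: list[tuple[Command, str]] = []
    command = first
    for _ in range(max_steps):
        output = run(command)              # 1. run one command
        history.append((command, output))  # 2. observe and keep a record
        next_command = decide(output)      # 3-4. evaluate and decide
        if next_command is None:           # task complete
            return history
        command = next_command             # 5. repeat with the next command
    raise RuntimeError(f"task not finished after {max_steps} steps")
```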
## No TypeScript in Browser Context
Code passed to `page.evaluate()` runs in the browser, which doesn't understand TypeScript:
```typescript
// ✅ Correct: plain JavaScript inside the callback
const text = await page.evaluate(() => {
  return document.body.innerText;
});

// ❌ Wrong: TypeScript syntax inside the callback will fail at runtime
const badText = await page.evaluate(() => {
  const el: HTMLElement = document.body; // Type annotation breaks in the browser!
  return el.innerText;
});
```
## Waiting
```bash
uv run skills/browser/client.py wait-load main # After navigation
uv run skills/browser/client.py wait-selector main ".results" # For specific elements
uv run skills/browser/client.py wait-url main "**/success" # For specific URL
```
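`wait-url` takes a glob pattern such as `**/success`. This sketch assumes the client forwards that pattern to Playwright's `wait_for_url`. Playwright documents two rules for such globs:

- `*` matches any characters except `/`.
- `**` matches any characters including `/`.

The hypothetical `url_matches` helper mirrors just those two rules, which is enough to sanity-check a pattern before waiting on it. Playwright's real matcher supports more syntax, so this is a simplified approximation and not a drop-in replacement.

```python
import re


def glob_to_regex(pattern: str) -> re.Pattern:
    """Translate a URL glob: `**` -> any chars, `*` -> any chars except '/'."""
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            parts.append(".*")
            i += 2
        elif pattern[i] == "*":
            parts.append("[^/]*")
            i += 1
        else:
            parts.append(re.escape(pattern[i]))  # everything else is literal
            i += 1
    return re.compile("".join(parts))


def url_matches(pattern: str, url: str) -> bool:
    """True if the whole URL matches the glob (simplified, hypothetical helper)."""
    return glob_to_regex(pattern).fullmatch(url) is not None
```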
## Scraping Data
For large datasets, **intercept and replay API requests** rather than scrolling DOM. See [refs/scraping.md](refs/scraping.md) for the complete guide covering request capture, schema discovery, and paginated API replay.
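As a rough illustration of the replay step, the loop below follows a cursor until the API stops returning one. This sketch makes several assumptions:

- The response shape (`items` plus `next_cursor`) is assumed; real APIs differ.
- `fetch_page` is a hypothetical callable. In practice it might issue the captured request, for example through Playwright's `page.request.get(...)`.
- refs/scraping.md remains the guide for discovering the actual schema.

```python
from typing import Any, Callable, Optional


def replay_pages(
    fetch_page: Callable[[Optional[str]], dict[str, Any]],
    max_pages: int = 100,
) -> list[Any]:
    """Collect items from a cursor-paginated API (assumed `items`/`next_cursor` shape)."""
    items: list[Any] = []
    cursor: Optional[str] = None  # first request has no cursor
    for _ in range(max_pages):
        page = fetch_page(cursor)  # one replayed API request
        items.extend(page.get("items", []))
        cursor = page.get("next_cursor")
        if not cursor:  # no cursor means this was the last page
            break
    return items
```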
## Inspecting Page State
### Screenshots
```bash
uv run skills/browser/client.py screenshot main screenshot.png
uv run skills/browser/client.py screenshot main full.png --full-page # Capture entire scrollable page
```
### ARIA Snapshot (Element Discovery)
Use `snapshot` to discover page elements. It returns a YAML-formatted accessibility tree:
```yaml
- banner:
  - link "Hacker News" [ref=e1]
- navigation:
  - link "new" [ref=e2]
- main:
  - heading "Products" [ref=e3] [level=1]
  - list:
    - listitem:
      - link "Article Title" [ref=e4]
      - button "Add to Cart" [ref=e5]
    - listitem:
      - link "Another Article" [ref=e6]
      - button "Add to Cart" [ref=e7] [nth=1]
- contentinfo:
  - textbox [ref=e8]
    - /placeholder: "Search"
```
**Interpreting refs:**
- `[ref=eN]` - Element reference for interaction
- `[nth=N]` - Nth duplicate element with same role+name (0-indexed, first one omitted)
- `[checked]`, `[disabled]`, `[expanded]` - Element states
- `[level=N]` - Heading level
- `/url:`, `/placeholder:` - Element properties
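To work with a snapshot programmatically, for example to find every "Add to Cart" button, you can parse the element lines with a regular expression. This is a minimal sketch. It assumes each element line has the shape shown in the example (`- role "name" [attr=value] ...`) and does not handle escaped quotes inside names. `parse_refs` is a hypothetical helper and is not part of `client.py`.

```python
import re

# `- role "optional name" [key=value] [flag] ...` (assumed line shape)
LINE = re.compile(r'^\s*- (?P<role>[\w-]+)(?: "(?P<name>[^"]*)")?(?P<rest>.*)$')
ATTR = re.compile(r"\[(\w+)(?:=([^\]]+))?\]")


def parse_refs(snapshot: str) -> list[dict]:
    """Return one dict per element line that carries a [ref=...] attribute."""
    elements = []
    for line in snapshot.splitlines():
        m = LINE.match(line)
        if not m:
            continue  # property lines such as `- /placeholder: ...` are skipped
        attrs = dict(ATTR.findall(m["rest"]))  # flags like [checked] map to ""
        if "ref" not in attrs:
            continue  # containers such as `- main:` have no ref
        elements.append({
            "role": m["role"],
            "name": m["name"],
            "ref": attrs["ref"],
            # [nth=N] is omitted on the first duplicate, so default to 0
            "nth": int(attrs.get("nth", 0)),
        })
    return elements
```

With the snapshot text above, `[e["ref"] for e in parse_refs(text) if e["name"] == "Add to Cart"]` would pick out both buttons.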
**Interacting with refs:**
```bash
# Get snapshot to find refs
uv run skills/browser/client.py snapshot main
# Only show interactive elements (buttons, links, inputs, etc.)
uv run skills/browser/client.py snapshot main -i
# Use ref to interact
uv run skills/browser/client.py select-ref main e2 click
uv run skills/browser/client.py select-ref main e7 click # Click second "Add to Cart"
uv run skills/browser/client.py select-ref main e8 fill "search term"
```
## Error Recovery
Page state persists after failures. Debug with:
```bash
# Take screenshot to see current state
uv run skills/browser/client.py screenshot main debug.png
# Get page info
uv run skills/browser/client.py info main
# Get text content
uv run skills/browser/client.py text main "body"
```
## Command Reference
### Page Management
```bash
uv run skills/browser/client.py list # List all pages
uv run skills/browser/client.py create main # Create a new page
uv run skills/browser/client.py create main "https://..." # Create and navigate
uv run skills/browser/client.py goto main "https://..." # Navigate existing page
uv run skills/browser/client.py close main # Close a page
uv run skills/browser/client.py info main # Get page URL and title
```
### Element Interaction
```bash
uv run skills/browser/client.py click main "button.submit" # Click element
uv run skills/browser/client.py fill main "input#email" "user@example.com" # Fill input
uv run skills/browser/client.py hover main ".dropdown" # Hover over element
uv run skills/browser/client.py keyboard main "Enter" # Press key
uv run skills/browser/client.py text main "h1" # Get element text
```
### JavaScript Execution
```bash
uv run skills/browser/client.py evaluate main "document.title"
uv run skills/browser/client.py evaluate main "document.querySelectorAll('.item').length"
```
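Expressions that mix single and double quotes are easy to break in the shell. One option is to let Python's standard `shlex.join` produce a correctly quoted command line. This is a minimal sketch, and `evaluate_cmdline` is a hypothetical helper.

```python
import shlex


def evaluate_cmdline(page: str, expression: str) -> str:
    """Return a shell-safe `evaluate` command line for an arbitrary JS expression."""
    return shlex.join(
        ["uv", "run", "skills/browser/client.py", "evaluate", page, expression]
    )
```

You can paste the returned string into a shell unchanged. The shell hands `client.py` the expression exactly as written.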
## Python Script (Advanced)
For complex tasks requiring loops or `page.on()` event handlers, use a heredoc with `BrowserClient`. The `cd` into the skill directory lets `from client import BrowserClient` find `client.py`:
```bash
cd skills/browser && uv run python <<'EOF'
from client import BrowserClient

client = BrowserClient()
page = client.get_playwright_page("main")

# Register event handlers (e.g. for request interception) before navigating;
# otherwise responses triggered by goto() are missed
page.on("response", lambda r: print(r.url))

# Full Playwright API available
page.goto("https://example.com")
page.click("button")
EOF
```
The `page` object is a standard Playwright Page.
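A common use of `page.on("response", ...)` is collecting JSON from API calls while the page loads. This is a minimal sketch, and `make_json_collector` is a hypothetical helper. It assumes the endpoint of interest contains `/api/`, which is a placeholder you would replace after inspecting traffic. It relies only on the Playwright `Response` members `url`, `headers` and `json()`.

```python
from typing import Any, Callable


def make_json_collector(url_fragment: str) -> tuple[Callable[[Any], None], list[Any]]:
    """Return (handler, store); the handler saves JSON bodies of matching responses."""
    store: list[Any] = []

    def handler(response: Any) -> None:
        content_type = response.headers.get("content-type", "")
        if url_fragment in response.url and "application/json" in content_type:
            try:
                store.append(response.json())
            except Exception:
                pass  # body unavailable or not valid JSON: skip this response

    return handler, store
```

Register the handler before navigating. For example, call `handler, captured = make_json_collector("/api/")`, then `page.on("response", handler)`, then `page.goto(...)`. Afterwards, read the parsed bodies from `captured`.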