browsing

Included with Lifetime

$97 forever

Use when you need direct browser control - teaches Chrome DevTools Protocol for controlling existing browser sessions, multi-tab management, form automation, and content extraction via use_browser MCP tool

AI Agents

What this skill does


# Browsing with Chrome Direct

## Overview

Control Chrome via DevTools Protocol using the `use_browser` MCP tool. Single unified interface with auto-starting Chrome.

**Announce:** "I'm using the browsing skill to control Chrome."

## When to Use

**Use this when:**
- Controlling authenticated sessions
- Managing multiple tabs in running browser
- Playwright MCP unavailable or excessive

**Use Playwright MCP when:**
- Need fresh browser instances
- Generating screenshots/PDFs
- Prefer higher-level abstractions

## Auto-Capture

Every DOM action (navigate, click, type, select, eval, keyboard_press, hover, drag_drop, double_click, right_click, file_upload) automatically saves:
- `{prefix}.png` — viewport screenshot
- `{prefix}.md` — page content as structured markdown
- `{prefix}.html` — full rendered DOM
- `{prefix}-console.txt` — browser console messages

Files are saved to the session directory with sequential prefixes (001-navigate, 002-click, etc.). You must check these before using extract or screenshot actions.

## The use_browser Tool

Single MCP tool with action-based interface. Chrome auto-starts on first use.

**Parameters:**
- `action` (required): Operation to perform
- `selector` (optional): CSS or XPath selector for element operations
- `payload` (optional): Action-specific data (string or object)
- `timeout` (optional): Timeout in ms for await operations (default: 5000)

**Active tab**: Every action operates on the current `activeTab`. Use `switch_tab` to change it.

## Actions Reference

### Navigation
- **navigate**: Navigate to URL
  - `payload`: URL string
  - Example: `{action: "navigate", payload: "https://example.com"}`

- **await_element**: Wait for element to appear
  - `selector`: CSS selector
  - `timeout`: Max wait time in ms
  - Example: `{action: "await_element", selector: ".loaded", timeout: 10000}`

- **await_text**: Wait for text to appear
  - `payload`: Text to wait for
  - Example: `{action: "await_text", payload: "Welcome"}`

### Interaction
- **click**: Click element
  - `selector`: CSS selector
  - Example: `{action: "click", selector: "button.submit"}`

- **type**: Text input
  - `selector`: Optional — clicks to focus first
  - `payload`: Text to type (`\t`=Tab, `\n`=Enter)
  - Example: `{action: "type", selector: "#email", payload: "[email protected]"}`

- **double_click**: Double-click element (fires dblclick event)
  - `selector`: CSS selector
  - Example: `{action: "double_click", selector: ".item"}`

- **right_click**: Right-click element (fires contextmenu event)
  - `selector`: CSS selector
  - Example: `{action: "right_click", selector: ".row"}`

- **select**: Select dropdown option
  - `selector`: CSS selector
  - `payload`: Option value(s)
  - Example: `{action: "select", selector: "select[name=state]", payload: "CA"}`

- **keyboard_press**: Press special keys (Tab, Enter, Escape, Arrow keys, F1-F12)
  - `payload`: Key name (string) or `{"key": "Tab", "modifiers": {"shift": true, "ctrl": false, "alt": false, "meta": false}}`
  - Example: `{action: "keyboard_press", payload: "Tab"}`
  - Example with modifiers: `{action: "keyboard_press", payload: {"key": "Tab", "modifiers": {"shift": true}}}`

### Mouse Actions (CDP-Level)
These use CDP Input.dispatchMouseEvent, bypassing synthetic event restrictions.

- **hover**: Move mouse over element (CSS :hover, tooltips, menus)
  - `selector`: CSS selector
  - Example: `{action: "hover", selector: ".menu-trigger"}`

- **drag_drop**: Drag element to target (native drag-and-drop via CDP)
  - `selector`: Source element
  - `payload`: Target selector or JSON coordinates `{"x":N,"y":N}`
  - Example: `{action: "drag_drop", selector: ".card", payload: ".column-2"}`

- **mouse_move**: Move mouse to coordinates
  - `payload`: JSON `{"x":N,"y":N}` (optional: `steps`, `fromX`, `fromY` for smooth movement)
  - Example: `{action: "mouse_move", payload: "{\"x\":100,\"y\":200}"}`

- **scroll**: Scroll via mouse wheel events
  - `payload`: Direction (up/down/left/right) or JSON `{"deltaX":N,"deltaY":N}`
  - `selector`: Optional — scroll within element
  - Example: `{action: "scroll", payload: "down"}`

### File Upload
- **file_upload**: Set files on input[type=file] elements (can't be done via JavaScript)
  - `selector`: File input element
  - `payload`: File path or JSON `{"files":["/path/a.pdf","/path/b.jpg"]}`
  - Example: `{action: "file_upload", selector: "#upload", payload: "/tmp/doc.pdf"}`

### Extraction
- **extract**: Get page content
  - `payload`: Format ('markdown'|'text'|'html')
  - `selector`: Optional - limit to element
  - Example: `{action: "extract", payload: "markdown"}`
  - Example: `{action: "extract", payload: "text", selector: "h1"}`

- **attr**: Get element attribute
  - `selector`: CSS selector
  - `payload`: Attribute name
  - Example: `{action: "attr", selector: "a.download", payload: "href"}`

- **eval**: Execute JavaScript
  - `payload`: JavaScript code
  - Example: `{action: "eval", payload: "document.title"}`

### Export
- **screenshot**: Capture screenshot of a specific element
  - `payload`: Filename
  - `selector`: Optional - screenshot specific element
  - Viewport screenshots are auto-captured after every DOM action. Use this only when you need a specific element.
  - Example: `{action: "screenshot", payload: "/tmp/chart.png", selector: ".chart"}`

### Tab Management
- **list_tabs**: List all open tabs
  - Example: `{action: "list_tabs"}`

- **new_tab**: Create new tab
  - Example: `{action: "new_tab"}`

- **close_tab**: Close the active tab
  - Example: `{action: "close_tab"}`

- **switch_tab**: Switch the active tab (sticky — stays until changed)
  - `payload`: Tab index (number), URL substring, or title substring
  - Example: `{action: "switch_tab", payload: 1}` (by index)
  - Example: `{action: "switch_tab", payload: "example.com"}` (by URL substring)
  - Example: `{action: "switch_tab", payload: "GitHub"}` (by title substring)

### Browser Mode Control
- **show_browser**: Make browser window visible (headed mode)
  - Example: `{action: "show_browser"}`
  - ⚠️ **WARNING**: Restarts Chrome, reloads pages via GET, loses POST state

- **hide_browser**: Switch to headless mode (invisible browser)
  - Example: `{action: "hide_browser"}`
  - ⚠️ **WARNING**: Restarts Chrome, reloads pages via GET, loses POST state

- **browser_mode**: Check current browser mode, port, and profile
  - Example: `{action: "browser_mode"}`
  - Returns: `{"headless": true|false, "mode": "headless"|"headed", "running": true|false, "port": 9222, "profile": "name", "profileDir": "/path"}`

### Profile Management
- **set_profile**: Change Chrome profile (must kill Chrome first)
  - Example: `{action: "set_profile", "payload": "browser-user"}`
  - ⚠️ **WARNING**: Chrome must be stopped first
  - **Side effect**: marks the profile as explicit, opting out of auto-disambiguation (see below)

- **get_profile**: Get current profile name and directory
  - Example: `{action: "get_profile"}`
  - Returns: `{"profile": "name", "profileDir": "/path"}`

**Default behavior**: Chrome starts in **headless mode** with **"superpowers-chrome" profile** on a **dynamically allocated port** (range 9222-12111). Override the port with `CHROME_WS_PORT`; override the profile with `CHROME_WS_PROFILE`.

**Auto-disambiguation across parallel MCPs**:
When two MCP servers start on the same host with the default profile, the first claims `superpowers-chrome` (port 9222) and later ones silently fall through to `superpowers-chrome-2` (port 9223), `superpowers-chrome-3`, etc. Each MCP drives its own Chrome with its own profile dir; they don't fight over `activeTab`. The bridge tracks ownership via a lock file at `~/.cache/superpowers/browser-profiles/<profile>.mcp.lock`; stale locks (dead PIDs) are reclaimed automatically.

To opt **out** of disambiguation — e.g., to intentionally share Chrome between a long-lived `chrome-ws` CLI session and your MCP — set the profile name explicitly:
- Env var: `CHROME_WS

Files: 51

Size: 292.9 KB

Complexity: 63/100

Category: AI Agents

Source: https://github.com/obra/superpowers-chrome/tree/main/skills/browsing

Related in AI Agents

skill-development

Included

Comprehensive meta-skill for creating, managing, validating, auditing, and distributing Claude Code skills and slash commands (unified in v2.1.3+). Provides skill templates, creation workflows, validation patterns, audit checklists, naming conventions, YAML frontmatter guidance, progressive disclosure examples, and best practices lookup. Use when creating new skills, validating existing skills, auditing skill quality, understanding skill architecture, needing skill templates, learning about YAML frontmatter requirements, progressive disclosure patterns, tool restrictions (allowed-tools), skill composition, skill naming conventions, troubleshooting skill activation issues, creating custom slash commands, configuring command frontmatter, using command arguments ($ARGUMENTS, $1, $2), bash execution in commands, file references in commands, command namespacing, plugin commands, MCP slash commands, Skill tool configuration, or deciding between skills vs slash commands. Delegates to docs-management skill for official documentation.

AI Agentsscripts

reprompter

Included

Transform messy prompts into well-structured, effective prompts — single or multi-agent. Use when: "reprompt", "reprompt this", "clean up this prompt", "structure my prompt", rough text needing XML tags and best practices, "reprompter teams", "repromptception", "run with quality", "smart run", "smart agents", multi-agent tasks, audits, parallel work, anything going to agent teams. Don't use when: simple Q&A, pure chat, immediate execution-only tasks. See "Don't Use When" section for details. Outputs: Structured XML/Markdown prompt, quality score (before/after), optional team brief + per-agent sub-prompts, agent team output files. Success criteria: Single mode quality score ≥ 7/10; Repromptception per-agent prompt quality score 8+/10; all required sections present, actionable and specific.

AI Agentsscripts

adaptive-compaction

Included

Adaptive add-on policy and recovery layer that decides WHEN to compact, prune, snapshot, or fork -- replacing fixed-percent auto-compaction across Claude Code, Codex, and MCP-capable hosts. Trigger on auto-compact timing or damage: "when should I compact", "is it safe to compact now or start a fresh session", "auto-compact fires too early/mid-task", "switching to an unrelated task but the window still has space", "context rot", "answers get worse the longer the session runs", "the agent forgot the plan or my decisions after it summarized", "add a layer on top that manages context without changing the agent", raising autoCompactWindow to give the policy room, or installing/tuning a cross-tool compaction policy or PreCompact hook -- even when "compaction" is never said but the problem is context-window pressure or post-summarization memory loss. Do NOT use to summarize a conversation, build RAG, write a summarization prompt (decides WHEN not HOW), or answer max-context-length trivia.

AI Agentsscripts

agent-skill-creator

Included

Create cross-platform agent skills from workflow descriptions. Activates when users ask to create an agent, automate a repetitive workflow, create a custom skill, or need advanced agent creation. Triggers on phrases like create agent for, automate workflow, create skill for, every day I have to, daily I need to, turn process into agent, need to automate, create a cross-platform skill, validate this skill, export this skill, migrate this skill. Supports single skills, multi-agent suites, transcript processing, template-based creation, interactive configuration, cross-platform export, and spec validation.

AI Agentsscripts

llm-wiki

Included

Use when building or maintaining a persistent personal knowledge base (second brain) in Obsidian where an LLM incrementally ingests sources, updates entity/concept pages, maintains cross-references, and keeps a synthesis current. Triggers include "second brain", "Obsidian wiki", "personal knowledge management", "ingest this paper/article/book", "build a research wiki", "compound knowledge", "Memex", or whenever the user wants knowledge to accumulate across sessions instead of being re-derived by RAG on every query.

AI Agentsscripts

skill-master

Included

Agent Skills authoring, evaluation, and optimization. Create, edit, validate, benchmark, and improve skills following the agentskills.io specification. Use when designing SKILL.md files, structuring skill folders (references, scripts, assets), ingesting external documentation into skills, running trigger evals, benchmarking skill quality, optimizing descriptions, or performing blind A/B comparisons. Keywords: agentskills.io, SKILL.md, skill authoring, eval, benchmark, trigger optimization.

AI Agentsscripts