skyvern-browser-automation

Included with Lifetime

$97 forever

AI-powered browser automation — navigate sites, fill forms, extract structured data, log in with stored credentials, and build reusable workflows.

browser-automationbrowser-automationmcpweb-scrapingform-fillingai-agentsworkflow-automation

What this skill does


# Skyvern Browser Automation -- CLI Judgment Procedure

Skyvern uses AI to navigate and interact with websites. Every command below is a runnable `skyvern <command>` invocation.

## When to Use This Skill

- Use when you need AI-assisted browser automation for navigation, extraction, form filling, login flows, or reusable website workflows.
- Use when deterministic selectors are unavailable and Skyvern's visual/a11y reasoning can identify page controls.
- Use when a one-off browser task should become a repeatable workflow with run history and verification.

## Step 1: Classify Your Task (ALWAYS do this first)

| Classification | Signal | CLI Command | Cost | What Happens |
|---|---|---|---|---|
| Quick check (yes/no) | "is the user logged in?" | `skyvern browser validate` | 1 LLM + screenshots | Lightweight validation (2 steps max), returns boolean. Cheapest AI option. |
| Quick inspection | "what does the page show?" | `skyvern browser extract` | 1 LLM + screenshots | Dedicated extraction LLM + schema validation + caching. |
| Single action (known target) | "click #submit" | `skyvern browser click/type` | 0 LLM | Deterministic Playwright. No AI. Fastest. |
| Single action (unknown target) | "click the submit button" | `skyvern browser act` | 2-3 LLM, no screenshots | No screenshots in reasoning. Economy a11y tree. For visual targets, use hybrid mode (selector + intent). |
| Same-page multi-step | "fill the form and submit" | `skyvern browser act` or primitive chain | 2-3 LLM or 0 LLM | Use `act` when labels are clear. Use click/type/select directly when you know selectors. |
| Throwaway autonomous trial | "try this once", "see if this works" | `skyvern browser run-task` | Higher | One-off autonomous agent for exploration. Do not use for recurring or multi-page production automations. |
| Multi-page or reusable automation | "navigate a multi-page wizard", "set this up", "automate this weekly" | `skyvern workflow create` + `run` | N LLM + screenshots | Build a workflow with one block per step. Each block gets visual reasoning, verification, and reusable run history. |

**MCP note:** if you are using the Skyvern MCP instead of the CLI, prefer `observe + execute` for same-page multi-step UI work. The CLI does not expose that pair directly.

## Step 2: Apply These Decision Rules

1. If the prompt includes a selector, id, XPath, or exact field target, use browser primitives -- not `act`.
2. If you only need a yes/no answer, use `validate` -- not `extract` or `act`.
3. If the work stays on one page and labels are clear, use `act` or a primitive chain.
4. If the user says `try this once`, `see if this works`, or clearly wants a one-off exploratory trial, use `run-task`.
5. If the task spans multiple pages and is meant to be reusable, scheduled, repeatable, or explicitly `set up` as automation, use `workflow create`.
6. Never type passwords. Always use stored credentials with `skyvern browser login`.

## Step 3: Create a Session

Every browser command needs a session. Create one first:

```bash
# Cloud session (default -- works for public URLs)
skyvern browser session create --timeout 30

# Local session (for localhost URLs or self-hosted mode)
skyvern browser session create --local --timeout 30

# Connect to existing browser via CDP
skyvern browser session connect --cdp "ws://localhost:9222"
```

Session state persists between commands. After `session create`, subsequent commands auto-attach.
Override with `--session pbs_...`. Close when done: `skyvern browser session close`.

## Step 4: Execute by Classification

### Quick check (yes/no)

```bash
skyvern browser validate --prompt "Is the user logged in? Look for a dashboard or avatar."
```

Returns true/false. Cheapest AI option -- prefer over extract or act for boolean checks.

### Quick inspection

```bash
skyvern browser extract \
  --prompt "Extract all product names and prices" \
  --schema '{"type":"object","properties":{"items":{"type":"array","items":{"type":"object","properties":{"name":{"type":"string"},"price":{"type":"string"}}}}}}'
```

Uses screenshots + dedicated extraction LLM. Better than screenshot+read because Skyvern's LLM interprets the page.

### Single action (known target)

```bash
skyvern browser click --selector "#submit-btn"
skyvern browser type --text "[email protected]" --selector "#email"
skyvern browser select --value "US" --intent "the country dropdown"
```

Deterministic. No AI. Three targeting modes:
1. **Intent**: `--intent "the Submit button"` (AI finds element)
2. **Selector**: `--selector "#submit-btn"` (CSS/XPath, deterministic)
3. **Hybrid**: both (selector narrows, AI confirms)

### Single action (unknown target)

```bash
skyvern browser act --prompt "Click the Sign In button"
skyvern browser act --prompt "Close the cookie banner, then click Sign In"
```

**Warning:** act has NO screenshots in its LLM reasoning. It uses an economy accessibility tree.
Fine for well-labeled elements. For visually complex targets, use MCP observe+click or hybrid mode.

### Same-page multi-step

```bash
skyvern browser act --prompt "Fill the shipping form and click Continue"
```

Use `act` when the fields and buttons are clearly labeled and the flow stays on one page.
If you need tighter control, break the work into `click`, `type`, `select`, `press-key`, and `wait`.

### Throwaway autonomous trial

```bash
skyvern browser run-task \
  --url "https://example.com" \
  --prompt "Check whether the checkout flow works end to end and extract the confirmation number"
```

Use `run-task` to prove feasibility or do one-off exploration. If the task becomes important enough
to rerun, debug, or share, convert it to a workflow.

### Multi-page or reusable automation — build a workflow with one block per step

```bash
skyvern workflow create --definition @checkout-workflow.yaml
skyvern workflow run --id wpid_123 --wait
skyvern workflow status --run-id wr_789
```

Each navigation block runs with visual reasoning + verification. Split complex flows into
multiple blocks (one per page/step). First run uses AI; subsequent runs replay cached scripts.

### Repeated/production

```bash
skyvern workflow create --definition @workflow.yaml
skyvern workflow run --id wpid_123 --params '{"email":"[email protected]"}'
skyvern workflow status --run-id wr_789
```

Split into one block per step. Use **navigation** blocks for actions, **extraction** for data.
First run uses AI; subsequent runs replay a cached script (10-100x faster).
Set `--run-with agent` to force AI mode for debugging.

## Step 5: Verify

Always verify after page-changing actions:

```bash
skyvern browser screenshot                          # visual check
skyvern browser validate --prompt "Was the form submitted successfully?"  # boolean assertion
skyvern browser evaluate --expression "document.title"                    # JS state check
```

## Step 6: Error Recovery

| Problem | Fix |
|---------|-----|
| Action clicked wrong element | Add context to prompt. Use hybrid mode (selector + intent). |
| Extraction returns empty | Wait for content. Relax required fields. Check row count first. |
| Login passes but next step fails | Ensure same session. Add post-login validate check. |
| Element not found | Add wait: `skyvern browser wait --selector "#el" --state visible` |
| Overloaded prompt | Split into smaller goals -- one intent per command. |

## Credentials

NEVER type passwords through `skyvern browser type` or `act`. Always use stored credentials:

```bash
skyvern credentials add --name "my-login" --type password --username "[email protected]"
skyvern credential list                          # find the credential ID
skyvern browser login --url "https://login.example.com" --credential-id cred_123
```

Types: `password`, `credit_card`, `secret`. Also supports bitwarden, 1password, and azure_vault providers.

## Workflow Quick Reference

```bash
skyvern workflow create --definition @workflow.yaml   # create
skyvern workflow run --id wpid_123 --wait             # ru

Files: 1

Size: 13.8 KB

Complexity: 22/100

Category: browser-automation

Source: https://github.com/sickn33/antigravity-awesome-skills/tree/main/plugins/antigravity-awesome-skills/skills/skyvern-browser-automation