agent-rdp

Controls Windows Remote Desktop sessions for automation, testing, and remote administration. Use when the user needs to connect to Windows machines via RDP, take screenshots, click, type, or interact with remote Windows desktops.

# Windows Remote Desktop Control with agent-rdp

## Quick start

```bash
agent-rdp connect --host <ip> -u <user> -p <pass> --enable-win-automation
agent-rdp automate snapshot -i              # See interactive elements
agent-rdp automate click "@e5"              # Click button by ref
agent-rdp automate fill "@e7" "Hello"       # Type into field
agent-rdp disconnect
```

## Core workflow

1. Connect with automation: `agent-rdp connect --host <ip> -u <user> -p <pass> --enable-win-automation`
2. Snapshot: `agent-rdp automate snapshot -i` (get accessibility tree with refs)
3. Act: `agent-rdp automate click @e5` or `agent-rdp automate fill @e7 "text"`
4. Repeat: snapshot → act → snapshot → act...

## Troubleshooting

### Element not in snapshot with `-i`

Try without the `-i` flag. Some elements aren't marked as interactive but are still actionable:
```bash
agent-rdp automate snapshot              # Full tree, no filtering
agent-rdp automate snapshot -d 5         # Limit depth if too large
```

### Element not in accessibility tree at all

Some UI elements (WebView content, certain dialogs, toast notifications) don't appear in the accessibility tree. Use OCR as a last resort:

1. Take a screenshot to identify what you need: `agent-rdp screenshot -o screen.png`
2. Use `locate` to find the coordinates: `agent-rdp locate "Button Text"`
3. Click using the returned coordinates: `agent-rdp mouse click <x> <y>`

## Commands

### Connection
```bash
agent-rdp connect --host 192.168.1.100 -u Admin -p secret
agent-rdp connect --host 192.168.1.100 -u Admin --password-stdin  # Read password from stdin
agent-rdp connect --host 192.168.1.100 --width 1920 --height 1080
agent-rdp connect --host 192.168.1.100 --drive /tmp/share:Share   # Map local directory
agent-rdp disconnect
```
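
Passing `-p` on the command line can expose the password in shell history and the process list, so the `--password-stdin` form may be preferable in scripts. A minimal sketch, assuming the password is accepted from stdin without a trailing newline (not confirmed here); `rdp_connect`, `RDP_PASSWORD`, and `AGENT_RDP` are hypothetical names introduced only for illustration:

```bash
# Hypothetical wrapper: connect with a password taken from $RDP_PASSWORD.
# AGENT_RDP (hypothetical) overrides the binary to run; it defaults to agent-rdp.
rdp_connect() {
  host=$1
  user=$2
  # printf is a builtin in bash and dash, so the password is not placed in any argv.
  printf '%s' "$RDP_PASSWORD" |
    "${AGENT_RDP:-agent-rdp}" connect --host "$host" -u "$user" --password-stdin
}

# Usage (not run here), with RDP_PASSWORD already exported by your secret store:
#   rdp_connect 192.168.1.100 Admin
```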

### Screenshot
```bash
agent-rdp screenshot                      # Save to ./screenshot.png
agent-rdp screenshot -o desktop.png       # Save to specific file
agent-rdp screenshot --format jpeg        # JPEG format
```

### Mouse
```bash
agent-rdp mouse click 500 300             # Left click at (500, 300)
agent-rdp mouse right-click 500 300       # Right click
agent-rdp mouse double-click 500 300      # Double click
agent-rdp mouse move 100 200              # Move cursor
agent-rdp mouse drag 100 100 500 500      # Drag from (100,100) to (500,500)
```

### Keyboard
```bash
agent-rdp keyboard type "Hello World"     # Type text (supports Unicode)
agent-rdp keyboard press "ctrl+c"         # Key combination
agent-rdp keyboard press "alt+tab"        # Switch windows
agent-rdp keyboard press "ctrl+shift+esc" # Task manager
agent-rdp keyboard press "win+r"          # Run dialog
agent-rdp keyboard press enter            # Single key (use press, not key)
agent-rdp keyboard press escape
agent-rdp keyboard press f5
```

### Scroll
```bash
agent-rdp scroll up --amount 3            # Scroll up 3 notches
agent-rdp scroll down --amount 5          # Scroll down 5 notches
agent-rdp scroll left
agent-rdp scroll right
```

### Clipboard
```bash
agent-rdp clipboard set "Text to paste"   # Set clipboard (paste on Windows)
agent-rdp clipboard get                   # Get clipboard (after copy on Windows)
```

### Drive mapping
```bash
# Map at connect time
agent-rdp connect --host <ip> -u <user> -p <pass> --drive /local/path:DriveName

# List mapped drives
agent-rdp drive list
```

### Session management
```bash
agent-rdp session list                    # List active sessions
agent-rdp session info                    # Current session info
agent-rdp --session work connect ...      # Named session
agent-rdp --session work screenshot       # Use named session
```

### Wait
```bash
agent-rdp wait 2000                       # Wait 2 seconds
```

### Locate (OCR)
```bash
agent-rdp locate "Cancel"                 # Find lines containing "Cancel"
agent-rdp locate "Save*" --pattern        # Glob pattern matching
agent-rdp locate --all                    # Get all text on screen
agent-rdp locate "OK" --json              # JSON output with coordinates
```

Returns text lines with bounding boxes and center coordinates for clicking:
```
Found 1 line(s) containing 'Cancel':
  'Cancel' at (650, 420) size 45x14 - center: (672, 427)

To click the first match: agent-rdp mouse click 672 427
```
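
The center coordinates in the text output above can be extracted with a small filter and handed to `mouse click`. This is a sketch that assumes the `center: (x, y)` layout shown in the sample; `locate_center` is a hypothetical helper name. The `--json` form is probably more robust for scripting, but its schema is not shown here, so the sketch sticks to the text format.

```bash
# Print "x y" for the first match in `agent-rdp locate` text output read from stdin.
# Assumes each match line ends with "center: (x, y)" as in the sample output.
locate_center() {
  awk '
    !found && match($0, /center: \([0-9]+, *[0-9]+\)/) {
      # Skip the 9 characters of "center: (" and drop the closing parenthesis.
      coords = substr($0, RSTART + 9, RLENGTH - 10)
      gsub(/, */, " ", coords)
      print coords
      found = 1
    }
  '
}

# Usage against a live session (not run here):
#   coords=$(agent-rdp locate "Cancel" | locate_center)
#   [ -n "$coords" ] && agent-rdp mouse click $coords   # unquoted on purpose: two arguments
```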

### UI Automation
```bash
# Connect with automation enabled
agent-rdp connect --host 192.168.1.100 -u Admin -p secret --enable-win-automation

# Snapshot - get accessibility tree (refs always included)
agent-rdp automate snapshot                # Full desktop tree
agent-rdp automate snapshot -i             # Interactive elements only
agent-rdp automate snapshot -c             # Compact (remove empty elements)
agent-rdp automate snapshot -d 5           # Limit depth to 5 levels
agent-rdp automate snapshot -s "~*Notepad*" # Scope to a window/element
agent-rdp automate snapshot -f             # Start from focused element
agent-rdp automate snapshot -i -c -d 3     # Combine options

# Pattern-based element operations (use selectors: @eN, #automationId, .className, or name)
agent-rdp automate click "#SaveButton"    # Click button
agent-rdp automate click "@e5"            # Click by ref number
agent-rdp automate click "@e5" -d         # Double-click (for file list items)
agent-rdp automate select "@e10"          # Select item (SelectionItemPattern)
agent-rdp automate select "@e5" --item "Option 1"  # Select item by name in container
agent-rdp automate toggle "@e7"           # Toggle checkbox (TogglePattern)
agent-rdp automate toggle "@e7" --state on  # Set specific state
agent-rdp automate expand "@e3"           # Expand menu/tree (ExpandCollapsePattern)
agent-rdp automate collapse "@e3"         # Collapse menu/tree
agent-rdp automate context-menu "@e5"     # Open context menu (Shift+F10)
agent-rdp automate focus <selector>       # Focus element
agent-rdp automate get <selector>         # Get element properties

# Text input
agent-rdp automate fill <selector> "text" # Clear and fill text (ValuePattern)
agent-rdp automate clear <selector>       # Just clear

# Scrolling
agent-rdp automate scroll <selector> --direction down --amount 3

# Window operations
agent-rdp automate window list
agent-rdp automate window focus "~*Notepad*"
agent-rdp automate window maximize
agent-rdp automate window minimize
agent-rdp automate window restore
agent-rdp automate window close "~*Notepad*"

# Run commands/apps (best way to open apps)
agent-rdp automate run "notepad.exe"                                        # Open Notepad
agent-rdp automate run "Start-Process ms-settings:" --wait                  # Open Settings
agent-rdp automate run "calc.exe"                                           # Open Calculator
agent-rdp automate run "Get-Process" --wait --process-timeout 5000          # With 5s timeout

# Wait for element
agent-rdp automate wait-for <selector> --timeout 5000
agent-rdp automate wait-for <selector> --state visible

# Status
agent-rdp automate status
```

**Selector syntax:**
- `@e5` or `@5` - Reference number from snapshot (e prefix recommended)
- `#SaveButton` - Automation ID
- `.Edit` - Win32 class name
- `~*pattern*` - Name with wildcard
- `File` - Element name (exact match)
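
The prefix rules above can be restated as a small shell classifier. It only mirrors the list as documented and is not the tool's actual parser; `selector_kind` is a hypothetical name.

```bash
# Classify a selector string by its prefix, following the documented list.
# This is an illustration of the rules, not agent-rdp's own parsing logic.
selector_kind() {
  case "$1" in
    @e[0-9]*|@[0-9]*) echo "ref" ;;            # @e5 or @5
    \#*)              echo "automation-id" ;;  # #SaveButton
    .*)               echo "class-name" ;;     # .Edit
    \~*)              echo "name-wildcard" ;;  # ~*pattern*
    *)                echo "name-exact" ;;     # File
  esac
}
```

Always quote selectors on the command line: an unquoted `#SaveButton` starts a shell comment, and an unquoted `~*Notepad*` is subject to tilde and glob expansion.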

**Snapshot output format:**
```
- Window "Notepad" [ref=e1, id=Notepad]
  - MenuBar "Application" [ref=e2]
    - MenuItem "File" [ref=e3]
  - Edit "Text Editor" [ref=e5, value="Hello"]
```
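
When a script needs the ref for a named element (for example, to log it or reuse it across several commands before the next snapshot), it can be read from the text format above. A sketch, assuming the `- Role "Name" [ref=eN, ...]` layout shown in the sample; `snapshot_ref` is a hypothetical helper name.

```bash
# Print the @eN selector of the first snapshot line whose quoted name equals $1.
# Reads snapshot text on stdin; names containing backslashes are not handled.
snapshot_ref() {
  awk -v name="$1" '
    !found && index($0, "\"" name "\" [ref=") && match($0, /\[ref=e[0-9]+/) {
      # Skip the 5 characters of "[ref=" to keep just "eN".
      print "@" substr($0, RSTART + 5, RLENGTH - 5)
      found = 1
    }
  '
}

# Usage against a live session (not run here):
#   ref=$(agent-rdp automate snapshot -i | snapshot_ref "File")
#   [ -n "$ref" ] && agent-rdp automate click "$ref"
```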

## JSON output

Add `--json` for machine-readable output:
```bash
agent-rdp --json clipboard get
agent-rdp --json session info
agent-rdp --json automate snapshot
```

## Example: Open PowerShell and run command

```bash
agent-rdp connect --host 192.168.1.100 -u Admin -p secret
agent-rdp wait 3000                       # Wait for desktop
agent-rdp keyboard press "win+r"          # Open Run dialog
agent-rdp wait 1000
agent-rdp keyboard type "powershell"
agent-rdp keyboard press enter
agent-rdp wait 2000                       # Wait for PowerShell
agent-rdp keyboard type "Get-Process"
agent-rdp keyboard press enter
```

Source: https://github.com/thisnick/agent-rdp/tree/main/skills/agent-rdp
