Desktop Control
Control mouse, keyboard, and screen for desktop automation tasks
What this skill does
# Desktop Control Skill This skill provides comprehensive desktop automation capabilities through PyAutoGUI, allowing AI agents to control the mouse, keyboard, take screenshots, and interact with the desktop environment. ## How to Use This Skill As an AI agent, you can invoke desktop automation commands using the `uvx desktop-agent` CLI. ### Command Structure All commands follow this pattern: ```bash uvx desktop-agent <category> <command> [arguments] [options] ``` **Categories:** - `mouse` - Mouse control - `keyboard` - Keyboard input - `screen` - Screenshots and screen analysis - `message` - User dialogs - `app` - Application control (open, focus, list windows) ## Available Commands ### ๐ฑ๏ธ Mouse Control (`mouse`) Control cursor movement and clicks. ```bash # Move cursor to coordinates uvx desktop-agent mouse move <x> <y> [--duration SECONDS] # Click at current position or specific coordinates uvx desktop-agent mouse click [x] [y] [--button left|right|middle] [--clicks N] # Specialized clicks uvx desktop-agent mouse double-click [x] [y] uvx desktop-agent mouse right-click [x] [y] uvx desktop-agent mouse middle-click [x] [y] # Drag to coordinates uvx desktop-agent mouse drag <x> <y> [--duration SECONDS] [--button BUTTON] # Scroll (positive=up, negative=down) uvx desktop-agent mouse scroll <clicks> [x] [y] # Get current mouse position uvx desktop-agent mouse position ``` **Examples:** ```bash # Move to center of 1920x1080 screen uvx desktop-agent mouse move 960 540 --duration 0.5 # Right-click at specific location uvx desktop-agent mouse right-click 500 300 # Scroll down 5 clicks uvx desktop-agent mouse scroll -5 ``` ### โจ๏ธ Keyboard Control (`keyboard`) Type text and execute keyboard shortcuts. ```bash # Type text uvx desktop-agent keyboard write "<text>" [--interval SECONDS] # Press keys uvx desktop-agent keyboard press <key> [--presses N] [--interval SECONDS] # Execute hotkey combination (comma-separated) uvx desktop-agent keyboard hotkey "<key1>,<key2>,..." # Hold/release keys uvx desktop-agent keyboard keydown <key> uvx desktop-agent keyboard keyup <key> ``` **Examples:** ```bash # Type text with natural delay uvx desktop-agent keyboard write "Hello World" --interval 0.05 # Copy selected text uvx desktop-agent keyboard hotkey "ctrl,c" # Open Task Manager uvx desktop-agent keyboard hotkey "ctrl,shift,esc" # Press Enter 3 times uvx desktop-agent keyboard press enter --presses 3 ``` **Common Key Names:** - Modifiers: `ctrl`, `shift`, `alt`, `win` - Special: `enter`, `tab`, `esc`, `space`, `backspace`, `delete` - Function: `f1` through `f12` - Arrows: `up`, `down`, `left`, `right` ### ๐ผ๏ธ Screen & Screenshots (`screen`) Capture screenshots and analyze screen content. Supports targeting specific windows. ```bash # Take screenshot uvx desktop-agent screen screenshot <filename> [--region "x,y,width,height"] [--window <title>] [--active] # Locate image on screen or within window uvx desktop-agent screen locate <image_path> [--confidence 0.0-1.0] [--window <title>] [--active] uvx desktop-agent screen locate-center <image_path> [--confidence 0.0-1.0] [--window <title>] [--active] # Locate text using OCR within window uvx desktop-agent screen locate-text-coordinates <text> [--window <title>] [--active] uvx desktop-agent screen read-all-text [--window <title>] [--active] # Utility commands uvx desktop-agent screen pixel <x> <y> uvx desktop-agent screen size uvx desktop-agent screen on-screen <x> <y> ``` **Examples:** ```bash # Screenshot of active window uvx desktop-agent screen screenshot active.png --active # Screenshot of a specific application uvx desktop-agent screen screenshot chrome.png --window "Google Chrome" # Locate image within Notepad uvx desktop-agent screen locate-center button.png --window "Notepad" ``` ### ๐ฌ Message Dialogs (`message`) Display user interaction dialogs. ```bash # Show alert uvx desktop-agent message alert "<text>" [--title TITLE] [--button BUTTON] # Show confirmation dialog uvx desktop-agent message confirm "<text>" [--title TITLE] [--buttons "OK,Cancel"] # Prompt for input uvx desktop-agent message prompt "<text>" [--title TITLE] [--default TEXT] # Password input uvx desktop-agent message password "<text>" [--title TITLE] [--mask CHAR] ``` **Examples:** ```bash # Simple alert uvx desktop-agent message alert "Task completed!" # Get user confirmation uvx desktop-agent message confirm "Continue with operation?" # Ask for user input uvx desktop-agent message prompt "Enter your name:" ``` ### ๐ฑ Application Control (`app`) Control applications across Windows, macOS, and Linux. ```bash # Open an application by name uvx desktop-agent app open <name> [--arg ARGS...] # Focus on a window by title/name uvx desktop-agent app focus <name> # List all visible windows uvx desktop-agent app list ``` **Examples:** ```bash # Windows: Open Notepad uvx desktop-agent app open notepad # Windows: Open Chrome with a URL uvx desktop-agent app open "chrome" --arg "https://google.com" # macOS: Open Safari uvx desktop-agent app open "Safari" # Focus on a specific window uvx desktop-agent app focus "Untitled - Notepad" # List all open windows uvx desktop-agent app list ``` ## Common Automation Workflows ### Workflow 1: Open Application and Type ```bash # Open notepad directly (cross-platform) uvx desktop-agent app open notepad # Wait for app to open, then focus it uvx desktop-agent app focus notepad # Type some text uvx desktop-agent keyboard write "Hello from Desktop Skill!" ``` ### Workflow 2: Screenshot + Analysis ```bash # Get screen size first uvx desktop-agent screen size # Take full screenshot uvx desktop-agent screen screenshot current_screen.png # Check if specific UI element is visible uvx desktop-agent screen locate save_button.png ``` ### Workflow 3: Form Filling ```bash # Click first field uvx desktop-agent mouse click 300 200 # Fill field uvx desktop-agent keyboard write "John Doe" # Tab to next field uvx desktop-agent keyboard press tab # Fill second field uvx desktop-agent keyboard write "[email protected]" # Submit form (Enter) uvx desktop-agent keyboard press enter ``` ### Workflow 4: Copy/Paste Operations ```bash # Select all text uvx desktop-agent keyboard hotkey "ctrl,a" # Copy uvx desktop-agent keyboard hotkey "ctrl,c" # Click destination uvx desktop-agent mouse click 500 600 # Paste uvx desktop-agent keyboard hotkey "ctrl,v" ``` ## Safety Considerations When using this skill, AI agents should: 1. **Verify coordinates**: Use `screen size` and `on-screen` before clicking 2. **Add delays**: Insert appropriate delays between commands for UI responsiveness 3. **Validate images**: Ensure image files exist before using `locate` commands 4. **Handle failures**: Commands may fail if windows change or elements move 5. **User safety**: Always confirm destructive actions with user via `message confirm` ## Troubleshooting ### PyAutoGUI Fail-Safe PyAutoGUI has a fail-safe: moving mouse to screen corner aborts operations. This is a safety feature. ### Image not found When using `screen locate`, ensure: - Image file exists and path is correct - Adjust `--confidence` (try 0.7-0.9) - Image matches exact screen appearance (resolution, colors) ## Getting Help ```bash # Show all available commands uvx desktop-agent --help # Show commands for specific category uvx desktop-agent mouse --help uvx desktop-agent keyboard --help uvx desktop-agent screen --help uvx desktop-agent message --help # Show help for specific command uvx desktop-agent mouse move --help ``` ## Integration Tips for AI Agents 1. **Always check screen size first** when working with absolute coordinates 2. **Use relative positioning** when possible (e.g., get current position, calculate offset) 3. **Combine commands** for complex workflows 4. **Validate before executing** (e.g., check if image exists on screen) 5. **Provide user feedback** using message dialogs for important operations 6. **Handle
Related in General
modeling-omnistudio-epc-catalog
IncludedSalesforce Industries CME EPC product-modeling skill for Product2-based catalog creation. Use when creating EPC products, configuring product attributes, building offer bundles with Product Child Items, or reviewing EPC DataPack JSON metadata for product catalog changes. TRIGGER when: user creates or updates Product2 EPC records, AttributeAssignment payloads, AttributeMetadata/AttributeDefaultValues, Offer bundles, or ProductChildItem relationships. DO NOT TRIGGER when: designing OmniScripts/FlexCards/Integration Procedures (use building-omnistudio-omniscript, building-omnistudio-flexcard, or building-omnistudio-integration-procedure), implementing Apex business logic (use generating-apex), or troubleshooting deployment pipelines (use deploying-metadata).
relationship-science-coach
IncludedUse this skill for direct, practical adult relationship coaching: couples conflict, repair, trust, marriage, dating, flirting, attachment patterns, emotional connection, sex, desire differences, eroticism, kink negotiation, affection, love languages, breakups, and long-term passion. Draw on Gottman, EFT and Hold Me Tight, attachment science, modern sex research, Perel, Nagoski, Kerner, Schnarch, Love and Stosny, and flexible love-language tools. Be concrete and low-hedge. Redirect only for imminent danger, abuse, coercive control, minors, non-consent, self-harm, stalking, or medical/legal/psychiatric decisions.
building-sf-integrations
IncludedSalesforce integration architecture and runtime plumbing with 120-point scoring. Use this skill to set up Named Credentials, External Credentials, External Services, REST/SOAP callout patterns, Platform Events, and Change Data Capture. TRIGGER when: user sets up Named Credentials, External Services, REST/SOAP callouts, Platform Events, CDC, or touches .namedCredential-meta.xml files. DO NOT TRIGGER when: Connected App/OAuth config (use configuring-connected-apps), Apex-only logic (use generating-apex), or data import/export (use handling-sf-data).
venue-templates
IncludedAccess comprehensive LaTeX templates, formatting requirements, and submission guidelines for major scientific publication venues (Nature, Science, PLOS, IEEE, ACM), academic conferences (NeurIPS, ICML, CVPR, CHI), research posters, and grant proposals (NSF, NIH, DOE, DARPA). This skill should be used when preparing manuscripts for journal submission, conference papers, research posters, or grant proposals and need venue-specific formatting requirements and templates.
let-fate-decide
IncludedDraws the 12 Houses of the Zodiac Tarot spread to inject entropy into planning when prompts are vague, ambiguous, or casually delegated. Interprets the spread to guide next steps. Use when the user says 'let fate decide', 'YOLO', 'whatever', 'idk', or other nonchalant phrases, makes Yu-Gi-Oh references, or when you are about to arbitrarily pick between multiple reasonable approaches. Prefer over ask-questions-if-underspecified when the user's tone is casual or playful rather than precision-seeking.
net-ops
IncludedCross-platform network troubleshooting (Windows, macOS, Linux) via local or remote shell. Use for: DNS broken, can't resolve hostnames, nslookup/dig works but apps fail, NRPT, WFP, scutil, /etc/resolver, systemd-resolved, /etc/resolv.conf, NetworkManager, VPN DNS leak residue (ProtonVPN/Mullvad/WireGuard/AnyConnect), AV/firewall blocking DNS or DoH, Tailscale DNS interaction, intermittent connectivity, remote diagnostics over SSH.