clawra-selfie
Edit Clawra's reference image with Grok Imagine (xAI Aurora) and send selfies to messaging channels via OpenClaw
What this skill does
# Clawra Selfie
Edit a fixed reference image using xAI's Grok Imagine model and distribute it across messaging platforms (WhatsApp, Telegram, Discord, Slack, etc.) via OpenClaw.
## Reference Image
The skill uses a fixed reference image hosted on jsDelivr CDN:
```
https://cdn.jsdelivr.net/gh/SumeLabs/clawra@main/assets/clawra.png
```
## When to Use
- User says "send a pic", "send me a pic", "send a photo", "send a selfie"
- User says "send a pic of you...", "send a selfie of you..."
- User asks "what are you doing?", "how are you doing?", "where are you?"
- User describes a context: "send a pic wearing...", "send a pic at..."
- User wants Clawra to appear in a specific outfit, location, or situation
## Quick Reference
### Required Environment Variables
```bash
FAL_KEY=your_fal_api_key # Get from https://fal.ai/dashboard/keys
OPENCLAW_GATEWAY_TOKEN=your_token # From: openclaw doctor --generate-gateway-token
```
### Workflow
1. **Get user prompt** for how to edit the image
2. **Edit image** via fal.ai Grok Imagine Edit API with fixed reference
3. **Extract image URL** from response
4. **Send to OpenClaw** with target channel(s)
## Step-by-Step Instructions
### Step 1: Collect User Input
Ask the user for:
- **User context**: What should the person in the image be doing/wearing/where?
- **Mode** (optional): `mirror` or `direct` selfie style
- **Target channel(s)**: Where should it be sent? (e.g., `#general`, `@username`, channel ID)
- **Platform** (optional): Which platform? (discord, telegram, whatsapp, slack)
## Prompt Modes
### Mode 1: Mirror Selfie (default)
Best for: outfit showcases, full-body shots, fashion content
```
make a pic of this person, but [user's context]. the person is taking a mirror selfie
```
**Example**: "wearing a santa hat" →
```
make a pic of this person, but wearing a santa hat. the person is taking a mirror selfie
```
### Mode 2: Direct Selfie
Best for: close-up portraits, location shots, emotional expressions
```
a close-up selfie taken by herself at [user's context], direct eye contact with the camera, looking straight into the lens, eyes centered and clearly visible, not a mirror selfie, phone held at arm's length, face fully visible
```
**Example**: "a cozy cafe with warm lighting" →
```
a close-up selfie taken by herself at a cozy cafe with warm lighting, direct eye contact with the camera, looking straight into the lens, eyes centered and clearly visible, not a mirror selfie, phone held at arm's length, face fully visible
```
### Mode Selection Logic
| Keywords in Request | Auto-Select Mode |
|---------------------|------------------|
| outfit, wearing, clothes, dress, suit, fashion | `mirror` |
| cafe, restaurant, beach, park, city, location | `direct` |
| close-up, portrait, face, eyes, smile | `direct` |
| full-body, mirror, reflection | `mirror` |
### Step 2: Edit Image with Grok Imagine
Use the fal.ai API to edit the reference image:
```bash
REFERENCE_IMAGE="https://cdn.jsdelivr.net/gh/SumeLabs/clawra@main/assets/clawra.png"
# Mode 1: Mirror Selfie
PROMPT="make a pic of this person, but <USER_CONTEXT>. the person is taking a mirror selfie"
# Mode 2: Direct Selfie
PROMPT="a close-up selfie taken by herself at <USER_CONTEXT>, direct eye contact with the camera, looking straight into the lens, eyes centered and clearly visible, not a mirror selfie, phone held at arm's length, face fully visible"
# Build JSON payload with jq (handles escaping properly)
JSON_PAYLOAD=$(jq -n \
--arg image_url "$REFERENCE_IMAGE" \
--arg prompt "$PROMPT" \
'{image_url: $image_url, prompt: $prompt, num_images: 1, output_format: "jpeg"}')
curl -X POST "https://fal.run/xai/grok-imagine-image/edit" \
-H "Authorization: Key $FAL_KEY" \
-H "Content-Type: application/json" \
-d "$JSON_PAYLOAD"
```
**Response Format:**
```json
{
"images": [
{
"url": "https://v3b.fal.media/files/...",
"content_type": "image/jpeg",
"width": 1024,
"height": 1024
}
],
"revised_prompt": "Enhanced prompt text..."
}
```
### Step 3: Send Image via OpenClaw
Use the OpenClaw messaging API to send the edited image:
```bash
openclaw message send \
--action send \
--channel "<TARGET_CHANNEL>" \
--message "<CAPTION_TEXT>" \
--media "<IMAGE_URL>"
```
**Alternative: Direct API call**
```bash
curl -X POST "http://localhost:18789/message" \
-H "Authorization: Bearer $OPENCLAW_GATEWAY_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"action": "send",
"channel": "<TARGET_CHANNEL>",
"message": "<CAPTION_TEXT>",
"media": "<IMAGE_URL>"
}'
```
## Complete Script Example
```bash
#!/bin/bash
# grok-imagine-edit-send.sh
# Check required environment variables
if [ -z "$FAL_KEY" ]; then
echo "Error: FAL_KEY environment variable not set"
exit 1
fi
# Fixed reference image
REFERENCE_IMAGE="https://cdn.jsdelivr.net/gh/SumeLabs/clawra@main/assets/clawra.png"
USER_CONTEXT="$1"
CHANNEL="$2"
MODE="${3:-auto}" # mirror, direct, or auto
CAPTION="${4:-Edited with Grok Imagine}"
if [ -z "$USER_CONTEXT" ] || [ -z "$CHANNEL" ]; then
echo "Usage: $0 <user_context> <channel> [mode] [caption]"
echo "Modes: mirror, direct, auto (default)"
echo "Example: $0 'wearing a cowboy hat' '#general' mirror"
echo "Example: $0 'a cozy cafe' '#general' direct"
exit 1
fi
# Auto-detect mode based on keywords
if [ "$MODE" == "auto" ]; then
if echo "$USER_CONTEXT" | grep -qiE "outfit|wearing|clothes|dress|suit|fashion|full-body|mirror"; then
MODE="mirror"
elif echo "$USER_CONTEXT" | grep -qiE "cafe|restaurant|beach|park|city|close-up|portrait|face|eyes|smile"; then
MODE="direct"
else
MODE="mirror" # default
fi
echo "Auto-detected mode: $MODE"
fi
# Construct the prompt based on mode
if [ "$MODE" == "direct" ]; then
EDIT_PROMPT="a close-up selfie taken by herself at $USER_CONTEXT, direct eye contact with the camera, looking straight into the lens, eyes centered and clearly visible, not a mirror selfie, phone held at arm's length, face fully visible"
else
EDIT_PROMPT="make a pic of this person, but $USER_CONTEXT. the person is taking a mirror selfie"
fi
echo "Mode: $MODE"
echo "Editing reference image with prompt: $EDIT_PROMPT"
# Edit image (using jq for proper JSON escaping)
JSON_PAYLOAD=$(jq -n \
--arg image_url "$REFERENCE_IMAGE" \
--arg prompt "$EDIT_PROMPT" \
'{image_url: $image_url, prompt: $prompt, num_images: 1, output_format: "jpeg"}')
RESPONSE=$(curl -s -X POST "https://fal.run/xai/grok-imagine-image/edit" \
-H "Authorization: Key $FAL_KEY" \
-H "Content-Type: application/json" \
-d "$JSON_PAYLOAD")
# Extract image URL
IMAGE_URL=$(echo "$RESPONSE" | jq -r '.images[0].url')
if [ "$IMAGE_URL" == "null" ] || [ -z "$IMAGE_URL" ]; then
echo "Error: Failed to edit image"
echo "Response: $RESPONSE"
exit 1
fi
echo "Image edited: $IMAGE_URL"
echo "Sending to channel: $CHANNEL"
# Send via OpenClaw
openclaw message send \
--action send \
--channel "$CHANNEL" \
--message "$CAPTION" \
--media "$IMAGE_URL"
echo "Done!"
```
## Node.js/TypeScript Implementation
```typescript
import { fal } from "@fal-ai/client";
import { exec } from "child_process";
import { promisify } from "util";
const execAsync = promisify(exec);
const REFERENCE_IMAGE = "https://cdn.jsdelivr.net/gh/SumeLabs/clawra@main/assets/clawra.png";
interface GrokImagineResult {
images: Array<{
url: string;
content_type: string;
width: number;
height: number;
}>;
revised_prompt?: string;
}
type SelfieMode = "mirror" | "direct" | "auto";
function detectMode(userContext: string): "mirror" | "direct" {
const mirrorKeywords = /outfit|wearing|clothes|dress|suit|fashion|full-body|mirror/i;
const directKeywords = /cafe|restaurant|beach|park|city|close-up|portrait|face|eyes|smile/i;
if (directKeywords.test(userContext)) return "direct";
if (mirrorKeywords.test(userContext)) return "mirror";
return "mirror"; // default
}
function buRelated in Image & Video
watch
IncludedWatch a video (URL or local path). Downloads with yt-dlp, extracts auto-scaled frames with ffmpeg, pulls the transcript from captions (or Whisper API fallback), and hands the result to Claude so it can answer questions about what's in the video.
physical-ai-defect-image-generation
IncludedUse when the user wants to orchestrate defect image generation, run associated setup, or handle outputs on OSMO. The Day 0 path handles cold-start with USD-to-ROI, image-edit augmentation, and AnomalyGen to create initial PCBA datasets. The Day 1 path performs inference and labeling on real images. This skill helps with first-time asset setup, creation of finetuning checkpoints, and configuring deployment. Trigger keywords: defect image generation, dig workflow, dig pipeline, defect image detection workflow, aoi pipeline, aoi anomalygen, usd2roi anomalygen, day 0 pcba, day 1 pcba, day 1 real-photo alignment, day 1 manual roi, metal surface anomaly, glass defect, anomalygen finetune, setup_pcb, setup_metal, setup_glass, setup_pretrained, dig setup, dig datasets, dig pretrained checkpoint, dig image-edit endpoint.
accelint-react-best-practices
IncludedReact performance optimization and best practices. ALWAYS use this skill when working with any React code - writing components, hooks, JSX; refactoring; optimizing re-renders, memoization, state management; reviewing for performance; fixing hydration mismatches; debugging infinite re-renders, stale closures, input focus loss, animations restarting; preventing remounting; implementing transitions, lazy initialization, effect dependencies. Even simple React tasks benefit from these patterns. Covers React 19+ (useEffectEvent, Activity, ref props). Triggers - useEffect, useState, useMemo, useCallback, memo, inline components, nested components, components inside components, re-render, performance, hydration, SSR, Next.js, useDeferredValue, combined hooks.
elevenlabs-agents
IncludedBuild conversational AI voice agents with ElevenLabs Platform using React, JavaScript, React Native, or Swift SDKs. Configure agents, tools (client/server/MCP), RAG knowledge bases, multi-voice, and Scribe real-time STT. Use when: building voice chat interfaces, implementing AI phone agents with Twilio, configuring agent workflows or tools, adding RAG knowledge bases, testing with CLI "agents as code", or troubleshooting deprecated @11labs packages, Android audio cutoff, CSP violations, dynamic variables, or WebRTC config. Keywords: ElevenLabs Agents, ElevenLabs voice agents, AI voice agents, conversational AI, @elevenlabs/react, @elevenlabs/client, @elevenlabs/react-native, @elevenlabs/elevenlabs-js, @elevenlabs/agents-cli, elevenlabs SDK, voice AI, TTS, text-to-speech, ASR, speech recognition, turn-taking model, WebRTC voice, WebSocket voice, ElevenLabs conversation, agent system prompt, agent tools, agent knowledge base, RAG voice agents, multi-voice agents, pronunciation dictionary, voice speed control, elevenlabs scribe, @11labs deprecated, Android audio cutoff, CSP violation elevenlabs, dynamic variables elevenlabs, case-sensitive tool names, webhook authentication
humanizer
IncludedHumanize AI-generated text by detecting and removing patterns typical of LLM output. Rewrites text to sound natural, specific, and human. Uses 28 pattern detectors, 560+ AI vocabulary terms across 3 tiers, and statistical analysis (burstiness, type-token ratio, readability) for comprehensive detection. Use when asked to humanize text, de-AI writing, make content sound more natural/human, review writing for AI patterns, score text for AI detection, or improve AI-generated drafts. Covers content, language, style, communication, and filler categories.
generating-mermaid-diagrams
IncludedSalesforce architecture diagrams using Mermaid with ASCII fallback. Use this skill when generating text-based diagrams for Salesforce architecture, OAuth flows, ERDs, integration sequences, or Agentforce structure. TRIGGER when: user says "diagram", "visualize", "ERD", or asks for sequence diagrams, flowcharts, class diagrams, or architecture visualizations in Mermaid. DO NOT TRIGGER when: user wants PNG/SVG image output (use generating-visual-diagrams), or asks about non-Salesforce systems.