ops-voice

Voice operations: make phone calls (Bland AI), synthesize speech from text (ElevenLabs), and transcribe audio (Whisper via Groq). Replaces OpenClaw voice capabilities.

What this skill does


# OPS:VOICE — Voice Operations

Voice interface commands. All API calls go through curl, with no SDK dependencies. The snippets also rely on `python3` for JSON handling and, in `setup`, on `jq`.

**Credential resolution order:** userConfig → env vars → Doppler MCP tools (`mcp__doppler__*`) → Doppler CLI fallback (`doppler secrets get <KEY> --plain`) → password manager
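
As a rough illustration of that order, the sketch below covers only the two steps a shell snippet can perform on its own: the environment variable and the Doppler CLI fallback. The function name `resolve_key` is hypothetical, and the userConfig, Doppler MCP, and password-manager steps are assumed to be handled outside the shell.

```bash
# Hypothetical helper: resolve an API key by its env var name.
# Covers only the env and Doppler CLI steps of the resolution order.
resolve_key() {
  name="$1"
  # 1. Environment variable, looked up by name
  value="$(printenv "$name" || true)"
  # 2. Doppler CLI fallback, skipped when the CLI is not installed
  if [ -z "$value" ] && command -v doppler >/dev/null 2>&1; then
    value="$(doppler secrets get "$name" --plain 2>/dev/null || true)"
  fi
  if [ -z "$value" ]; then
    echo "ERROR: $name not found in env or Doppler" >&2
    return 1
  fi
  printf '%s\n' "$value"
}
```

A call such as `BLAND_KEY="$(resolve_key BLAND_AI_API_KEY)"` would then fail loudly when the key is missing instead of sending a request with an empty header.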

---

## Sub-commands

Parse `$ARGUMENTS` for the command keyword, then execute:

---

### `call [phone] [prompt]` — Bland AI phone call

**Requires:** `bland_ai_api_key` in userConfig or `BLAND_AI_API_KEY` env or Doppler.

```bash
BLAND_KEY="${BLAND_AI_API_KEY:-$(doppler secrets get BLAND_AI_API_KEY --plain 2>/dev/null || true)}"
PHONE="<extracted from $ARGUMENTS>"
PROMPT="<extracted from $ARGUMENTS or ask user>"
MAX_DURATION="${BLAND_MAX_DURATION:-5}"  # minutes (Bland's max_duration is given in minutes, not seconds)
VOICE="${BLAND_VOICE:-male}"

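# Build the JSON body with python3 so quotes and newlines in the prompt are escaped
PAYLOAD=$(PHONE="$PHONE" PROMPT="$PROMPT" VOICE="$VOICE" MAX_DURATION="$MAX_DURATION" python3 -c '
import json, os
print(json.dumps({
    "phone_number": os.environ["PHONE"],
    "task": os.environ["PROMPT"],
    "voice": os.environ["VOICE"],
    "max_duration": int(os.environ["MAX_DURATION"]),
    "record": True,
}))')

# Make the call
RESPONSE=$(curl -s -X POST "https://api.bland.ai/v1/calls" \
  -H "authorization: $BLAND_KEY" \
  -H "Content-Type: application/json" \
  -d "$PAYLOAD")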

CALL_ID=$(echo "$RESPONSE" | python3 -c "import json,sys; print(json.load(sys.stdin).get('call_id',''))" 2>/dev/null)

# Poll for completion (up to 5 min)
if [ -n "$CALL_ID" ]; then
  echo "Call initiated: $CALL_ID"
  for i in $(seq 1 30); do
    sleep 10
    STATUS=$(curl -s "https://api.bland.ai/v1/calls/$CALL_ID" \
      -H "authorization: $BLAND_KEY" | \
      python3 -c "import json,sys; d=json.load(sys.stdin); print(d.get('status',''), d.get('transcripts','')[-1].get('text','') if d.get('transcripts') else '')" 2>/dev/null)
    echo "Status: $STATUS"
    [[ "$STATUS" == completed* ]] && break
  done
fi
```

**Output:** Call ID, live status, transcript when complete.
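
The `PHONE` and `PROMPT` placeholders in the block above still have to be filled from `$ARGUMENTS`. One possible way to do that is sketched here. The function name `parse_call_args` is hypothetical, and the E.164-style check (a leading `+` followed by 7 to 15 digits) is an assumption about what the API accepts, not something this skill specifies.

```bash
# Hypothetical helper: split "call <phone> <prompt...>" into PHONE and PROMPT.
# Returns 1 for a malformed phone number and 2 when the prompt is missing
# (the caller should then ask the user for it).
parse_call_args() {
  # read assigns the first word, the second word, and the remainder of the line
  read -r cmd PHONE PROMPT <<EOF
$1
EOF
  # Assumed format: "+" followed by 7 to 15 digits, not starting with 0
  if ! printf '%s' "$PHONE" | grep -Eq '^\+[1-9][0-9]{6,14}$'; then
    echo "ERROR: phone number must look like +15551234567, got: $PHONE" >&2
    return 1
  fi
  if [ -z "$PROMPT" ]; then
    echo "No prompt given; ask the user what the call should say" >&2
    return 2
  fi
  return 0
}
```

For example, `parse_call_args "$ARGUMENTS"` with `call +15551234567 Confirm the 3pm appointment` leaves `PHONE=+15551234567` and the rest of the line in `PROMPT`.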

---

### `tts [text] [--voice voice_id] [--out file.mp3]` — ElevenLabs text-to-speech

**Requires:** `elevenlabs_api_key` in userConfig or `ELEVENLABS_API_KEY` env or Doppler.

```bash
EL_KEY="${ELEVENLABS_API_KEY:-$(doppler secrets get ELEVENLABS_API_KEY --plain 2>/dev/null || true)}"
VOICE_ID="${ELEVENLABS_VOICE_ID:-21m00Tcm4TlvDq8ikWAM}"  # Rachel (default)
TEXT="<extracted from $ARGUMENTS>"
OUT_FILE="${OUT_FILE:-/tmp/ops-tts-$(date +%s).mp3}"

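# VOICE_ID must be an ElevenLabs voice ID; resolving a voice name to an ID
# (for example via GET /v1/voices) is not implemented here.

# Build the JSON body with python3 so quotes and newlines in the text are escaped
PAYLOAD=$(TEXT="$TEXT" python3 -c '
import json, os
print(json.dumps({
    "text": os.environ["TEXT"],
    "model_id": "eleven_monolingual_v1",
    "voice_settings": {"stability": 0.5, "similarity_boost": 0.75},
}))')

# Synthesize; --fail keeps an API error body from being saved as the .mp3
curl -s --fail -X POST "https://api.elevenlabs.io/v1/text-to-speech/${VOICE_ID}" \
  -H "xi-api-key: $EL_KEY" \
  -H "Content-Type: application/json" \
  -d "$PAYLOAD" \
  --output "$OUT_FILE" || { echo "ERROR: ElevenLabs request failed"; exit 1; }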

echo "Audio saved to: $OUT_FILE"
# Auto-play on macOS
command -v afplay &>/dev/null && afplay "$OUT_FILE" &
```

**Output:** Audio file path. Auto-plays on macOS via `afplay`.
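
The `--voice` and `--out` options in the heading are not parsed by the block above. A minimal sketch of that parsing follows, assuming the arguments arrive already split into words. `parse_tts_args` is a hypothetical name. The defaults mirror the ones used above.

```bash
# Hypothetical helper: parse "tts <text...> [--voice ID] [--out FILE]".
# Sets TEXT, VOICE_ID, and OUT_FILE; returns 1 on malformed input.
parse_tts_args() {
  TEXT=""
  VOICE_ID="${ELEVENLABS_VOICE_ID:-21m00Tcm4TlvDq8ikWAM}"
  OUT_FILE=""
  if [ "${1:-}" = "tts" ]; then shift; fi
  while [ $# -gt 0 ]; do
    case "$1" in
      --voice|--out)
        # Guard against a flag with no value before shifting two words
        [ $# -ge 2 ] || { echo "ERROR: $1 needs a value" >&2; return 1; }
        if [ "$1" = "--voice" ]; then VOICE_ID="$2"; else OUT_FILE="$2"; fi
        shift 2 ;;
      *)
        # Every other word is part of the text, joined by single spaces
        TEXT="${TEXT:+$TEXT }$1"
        shift ;;
    esac
  done
  [ -n "$TEXT" ] || { echo "ERROR: no text given" >&2; return 1; }
  # Fall back to a timestamped temp file when --out is absent
  : "${OUT_FILE:=/tmp/ops-tts-$(date +%s).mp3}"
}
```

Because the text is rebuilt word by word, runs of whitespace collapse to single spaces, which should not matter for speech synthesis.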

---

### `transcribe [file_path]` — Groq Whisper transcription

**Requires:** `groq_api_key` in userConfig or `GROQ_API_KEY` env or Doppler.

```bash
GROQ_KEY="${GROQ_API_KEY:-$(doppler secrets get GROQ_API_KEY --plain 2>/dev/null || true)}"
AUDIO_FILE="<extracted from $ARGUMENTS>"

if [ ! -f "$AUDIO_FILE" ]; then
  echo "ERROR: File not found: $AUDIO_FILE"
  exit 1
fi

TRANSCRIPT=$(curl -s -X POST "https://api.groq.com/openai/v1/audio/transcriptions" \
  -H "Authorization: Bearer $GROQ_KEY" \
  -F "file=@$AUDIO_FILE" \
  -F "model=whisper-large-v3" \
  -F "response_format=json" | \
  python3 -c "import json,sys; print(json.load(sys.stdin).get('text',''))" 2>/dev/null)

echo "$TRANSCRIPT"
```

**Output:** Transcript text printed to stdout.
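
Uploads that exceed the service's size limit fail only after the file has been sent, so a local size check can save a round trip. The sketch below is one way to do it. `check_audio_file` is a hypothetical name, and the 25 MB default is an assumption rather than a documented limit of this skill, so pass the value that applies to your Groq plan.

```bash
# Hypothetical pre-flight check: check_audio_file FILE [MAX_MB]
# MAX_MB defaults to 25, which is an assumed limit; adjust as needed.
check_audio_file() {
  file="$1"
  max_mb="${2:-25}"
  if [ ! -f "$file" ]; then
    echo "ERROR: File not found: $file" >&2
    return 1
  fi
  # Arithmetic expansion strips the padding some wc implementations add
  size_bytes=$(( $(wc -c < "$file") ))
  if [ "$size_bytes" -gt $(( max_mb * 1024 * 1024 )) ]; then
    echo "ERROR: $file is $size_bytes bytes, over the ${max_mb} MB limit" >&2
    return 1
  fi
  return 0
}
```

Running `check_audio_file "$AUDIO_FILE" || exit 1` before the curl call would replace the plain existence test with one that also catches oversized files.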

---

### `setup` — Configure voice API keys

**Before asking for anything**, auto-scan ALL sources in a single background batch:

```bash
# Env vars (report presence only, without printing the secret)
for v in BLAND_AI_API_KEY BLAND_API_KEY ELEVENLABS_API_KEY GROQ_API_KEY; do
  printenv "$v" >/dev/null && echo "$v: set (env)"
done

# Shell profiles
grep -h 'BLAND\|ELEVENLABS\|GROQ' ~/.zshrc ~/.bashrc ~/.zprofile ~/.envrc 2>/dev/null | grep -v '^#'

# Doppler — ALL projects
for proj in $(doppler projects --json 2>/dev/null | jq -r '.[].slug'); do
  for cfg in dev stg prd; do
    doppler secrets --project "$proj" --config "$cfg" --json 2>/dev/null | \
      jq -r --arg proj "$proj" --arg cfg "$cfg" 'to_entries[] | select(.key | test("BLAND|ELEVENLABS|GROQ"; "i")) | "\(.key)=\(.value.computed | .[0:12])... (doppler:\($proj)/\($cfg))"'
  done
done

# Dashlane
dcli password bland --output json 2>/dev/null | jq -r '.[] | select(.password != null) | "\(.title): key found"'
dcli password elevenlabs --output json 2>/dev/null | jq -r '.[] | select(.password != null) | "\(.title): key found"'
dcli password groq --output json 2>/dev/null | jq -r '.[] | select(.password != null) | "\(.title): key found"'

# Keychain (report presence only; -w would print the secret itself)
for svc in bland-ai-api-key elevenlabs-api-key groq-api-key; do
  security find-generic-password -s "$svc" >/dev/null 2>&1 && echo "$svc: key found (keychain)"
done
```

Present all findings. Only prompt for keys NOT found in any source. Then validate each found key in background:

1. **Bland AI**: `curl -s -H "authorization: $KEY" https://api.bland.ai/v1/me` — check balance
2. **ElevenLabs**: `curl -s -H "xi-api-key: $KEY" "https://api.elevenlabs.io/v1/voices?page_size=1"` — list voices (the URL is quoted so the shell does not treat `?` as a glob character)
3. **Groq**: `curl -s -H "Authorization: Bearer $KEY" https://api.groq.com/openai/v1/models` — list models

Report: `[service] ✓ connected` or `[service] ✗ invalid key — [error]`
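
The validation calls above print the response body, which differs per service. One way to reach a uniform verdict is to look at the HTTP status code instead. In the sketch below, `classify_http_code` is a hypothetical helper, and treating 401 and 403 as "invalid key" is an assumption about how these three APIs reject bad credentials.

```bash
# Hypothetical helper: classify a validation request by its HTTP status code.
# Obtain the code with curl's -w option, for example (needs network and a key):
#   code=$(curl -s -o /dev/null -w '%{http_code}' -H "xi-api-key: $KEY" \
#     "https://api.elevenlabs.io/v1/voices?page_size=1")
classify_http_code() {
  case "$1" in
    2??)     echo "connected" ;;
    401|403) echo "invalid key" ;;      # assumed rejection codes
    000)     echo "unreachable" ;;      # curl reports 000 when no response arrived
    *)       echo "error HTTP $1" ;;
  esac
}
```

The returned word can then be slotted into the report line for each service.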

---

## Execution

1. Resolve the sub-command from `$ARGUMENTS` (first word: call / tts / transcribe / setup)
2. Resolve credentials in the order given at the top: userConfig → env vars → Doppler MCP tools → Doppler CLI → password manager
3. Execute the matching curl block above
4. If a required key is missing and `setup` was not invoked, suggest `/ops:ops-voice setup`
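
Step 1 of the list above can be sketched as a small shell function. `resolve_subcommand` is a hypothetical name, and the function only identifies the keyword; running the matching block is left to the caller.

```bash
# Hypothetical helper: pick the sub-command from the first word of the input.
resolve_subcommand() {
  # read puts the first word in "first" and everything else in "rest"
  read -r first rest <<EOF
$1
EOF
  case "$first" in
    call|tts|transcribe|setup)
      echo "$first" ;;
    *)
      echo "ERROR: unknown sub-command '$first' (expected call, tts, transcribe, or setup)" >&2
      return 1 ;;
  esac
}
```

For instance, `resolve_subcommand "$ARGUMENTS"` prints `tts` for the input `tts Hello world` and returns a non-zero status for anything outside the four keywords.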

Files: 1

Size: 5.9 KB

Complexity: 14/100

Category: Image & Video

Source: https://github.com/davepoon/buildwithclaude/tree/main/plugins/claude-ops/skills/ops-voice
