quick-voice

Spin up an instant browser voice session (OpenAI Realtime `gpt-realtime-2`) to close a topic in a short conversation instead of working through documents. Generic and white-label: it works for any process. Supports live mode (read and update files and JSON, run commands during the call) and distill mode (no data tools; the session ends with a structured deliverable). A generic canvas can display images, markdown, code, HTML, JSON, video, and audio, which suits "let's go over X" flows where the agent shows you items one by one and you react in real time. Use when the user says "let's close this in a voice call", "run a quick voice session about X", "תפעיל שיחה קולית" ("start a voice call"), "let's go over the [images/leads/PRs/files/notes]", or when a task is faster as a 3-minute conversation than as a document edit.

# quick-voice

White-label voice session generator. Each invocation produces a per-session web app that opens a WebRTC voice channel to OpenAI Realtime (`gpt-realtime-2`) with **context-specific instructions, tools, and canvas behavior**.

## When to use

- "תפעיל איתי שיחה קולית על X" / "let's have a quick voice call about X"
- Going through a list of items where speaking is faster than reading + clicking
- Closing a topic that needs a few decisions + updates (not a long-form plan)
- Producing a structured output (decisions / action items / notes) from a free-form discussion

## Two modes

| Mode | Tools | Output |
|---|---|---|
| **live** | File / JSON / bash tools available — agent updates real data during the call | A summary of changes made (in `output.md`) |
| **distill** | No data-mutation tools; canvas + `save_note` + `end_session` only | A long structured deliverable (decisions, notes, action items) in `output.md` |

The canvas is available in **both** modes.

## How to run a session

### Step 1 — figure out context

Look at the conversation. The user said one of:
- **Explicit:** "let's close the freelancer reviews" → topic = freelancer reviews
- **Implicit:** earlier in the conversation we generated 10 images → topic = "go over generated images"
- **Vague:** "תפעיל שיחה קולית" ("start a voice call") → call AskUserQuestion (a single question) to clarify topic + mode

### Step 2 — pick a runtime directory + generate the session config

Each session has its own **runtime directory** holding `config.json`, `output.md`,
`server.log`, `done.flag`. The runtime dir lives **outside the skill** so the
skill itself stays code-only and project data stays with the project.

Pick a session id: `id=$(date +%Y%m%d-%H%M%S)`.

Choose the runtime directory:
- **Inside a project** (git repo / codebase you're working in):
  put it at `<project-root>/.quick-voice/<id>/`. Add `.quick-voice/` to the
  project's `.gitignore` so session data never gets committed.
- **No project context:**
  use `/tmp/quick-voice-$USER/<id>/`.
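
The choice above can be sketched in Node.js, the runtime the launcher itself uses. This is a minimal sketch, not the skill's own code: `sessionId`, `findProjectRoot`, and `runtimeDirFor` are hypothetical helper names, and treating "inside a project" as "some parent directory contains a `.git` entry" is an assumption.

```javascript
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");

// Format a Date like `date +%Y%m%d-%H%M%S` (local time).
function sessionId(now = new Date()) {
  const p = (n) => String(n).padStart(2, "0");
  return (
    `${now.getFullYear()}${p(now.getMonth() + 1)}${p(now.getDate())}` +
    `-${p(now.getHours())}${p(now.getMinutes())}${p(now.getSeconds())}`
  );
}

// Walk upward from startDir looking for a `.git` entry.
// Returns the directory that contains it, or null if none is found.
function findProjectRoot(startDir) {
  let dir = path.resolve(startDir);
  while (true) {
    if (fs.existsSync(path.join(dir, ".git"))) return dir;
    const parent = path.dirname(dir);
    if (parent === dir) return null; // reached the filesystem root
    dir = parent;
  }
}

// Project context: <project-root>/.quick-voice/<id>/
// No project context: /tmp/quick-voice-$USER/<id>/
function runtimeDirFor(startDir, id, user = os.userInfo().username) {
  const root = findProjectRoot(startDir);
  return root
    ? path.join(root, ".quick-voice", id)
    : path.join("/tmp", `quick-voice-${user}`, id);
}
```

A caller would then run `fs.mkdirSync(runtimeDirFor(process.cwd(), sessionId()), { recursive: true })` before writing `config.json`.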

Create the directory and write `config.json` into it:

```json
{
  "mode": "live",
  "topic": "Short Hebrew topic title",
  "instructions": "Full Hebrew system prompt for the realtime agent. Tell it what to do, what to ask, when to use the canvas, when to call save_note, when to call end_session. Be specific about the flow.",
  "voice": "ash",
  "cwd": "/absolute/path/used/as/root/for/relative/file/ops",
  "tools": ["canvas_show", "canvas_clear", "save_note", "end_session", "read_file", "list_dir", "update_json"],
  "canvas_hints": [
    { "type": "image", "source": "/abs/path/to/image1.png", "title": "Image 1" },
    { "type": "image", "source": "/abs/path/to/image2.png", "title": "Image 2" }
  ],
  "output_template": "# Session output\n\n## Decisions\n\n## Action items\n\n## Notes\n"
}
```

**Fields:**

- `mode`: `"live"` or `"distill"`.
- `topic`: shown in the page header.
- `instructions`: the system prompt. Write it in Hebrew (Aviz prefers Hebrew). Be specific — describe the flow you want the agent to follow.
- `voice`: `"ash"` (default), `"alloy"`, `"cedar"`, etc.
- `cwd`: directory the file tools are scoped to. **Required** if any file tool is enabled.
- `tools`: whitelist of tool names from the full set (see `lib/tool-defs.js`). For **distill mode** use only: `canvas_show`, `canvas_clear`, `save_note`, `end_session`. For **live mode** add file / JSON / bash tools as needed.
- `canvas_hints`: optional. If you pre-load items the agent should walk through, list them here. Otherwise the agent decides what to show.
- `output_template`: seeds `output.md` so the agent has a structure to fill in via `save_note`.
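
The field rules above lend themselves to a quick check before launching. The sketch below is illustrative only: `validateConfig` is a hypothetical helper, the tool lists are copied from this README rather than from `lib/tool-defs.js`, and counting `run_bash` as a tool that needs `cwd` is an assumption.

```javascript
// Tool names as documented in this README; lib/tool-defs.js remains the source of truth.
const CORE_TOOLS = ["canvas_show", "canvas_clear", "save_note", "end_session"];
const DATA_TOOLS = ["read_file", "write_file", "append_file", "update_json", "list_dir", "run_bash"];

// Return a list of human-readable problems; an empty list means these checks passed.
function validateConfig(config) {
  const problems = [];
  const tools = Array.isArray(config.tools) ? config.tools : [];

  if (config.mode !== "live" && config.mode !== "distill") {
    problems.push(`mode must be "live" or "distill", got ${JSON.stringify(config.mode)}`);
  }
  for (const name of tools) {
    if (!CORE_TOOLS.includes(name) && !DATA_TOOLS.includes(name)) {
      problems.push(`unknown tool: ${name}`);
    }
  }
  const dataTools = tools.filter((name) => DATA_TOOLS.includes(name));
  if (config.mode === "distill" && dataTools.length > 0) {
    problems.push(`distill mode does not allow data tools: ${dataTools.join(", ")}`);
  }
  // Assumption: run_bash is treated like a file tool because it runs inside cwd.
  if (dataTools.length > 0 && !config.cwd) {
    problems.push("cwd is required when any file tool is enabled");
  }
  return problems;
}
```

Returning a list instead of throwing on the first problem lets the caller report every issue in one pass.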

### Step 3 — launch

```bash
node ~/.claude/skills/quick-voice/scripts/launch.js <runtime-dir>
```

`<runtime-dir>` is the absolute path to the directory you created in Step 2.
The launcher reads `<runtime-dir>/config.json` and writes `output.md`,
`server.log`, `done.flag` back into the same directory.

The launcher is cross-platform (macOS / Linux / Windows). It:
1. Verifies `OPENAI_API_KEY` (from env or `~/.claude/skills/quick-voice/.env`)
2. Runs `npm install` once if `node_modules` is missing
3. Finds a free port in 3031-3040 (uses `net.createServer` — no shell needed)
4. Spawns `server.js`, polls `/config` until ready
5. Opens the default browser at `http://localhost:<port>` (`open` on macOS, `xdg-open` on Linux, `start` on Windows)
6. Waits for the user to end the session (close browser → `/done` is hit, or the agent calls `end_session`)
7. Prints `output.md` and exits
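
Steps 3 and 5 can be sketched as follows. This is not the launcher's actual source: `isPortFree`, `findFreePort`, and `browserCommand` are hypothetical names, and probing on `127.0.0.1` is an assumption about which interface the server binds.

```javascript
const net = require("node:net");

// Resolve true if `port` can be bound on `host`, false otherwise.
function isPortFree(port, host = "127.0.0.1") {
  return new Promise((resolve) => {
    const probe = net.createServer();
    probe.once("error", () => resolve(false)); // e.g. EADDRINUSE
    probe.once("listening", () => probe.close(() => resolve(true)));
    probe.listen(port, host);
  });
}

// Try each port in the inclusive range and return the first free one.
async function findFreePort(from = 3031, to = 3040) {
  for (let port = from; port <= to; port++) {
    if (await isPortFree(port)) return port;
  }
  throw new Error(`no free port in ${from}-${to}`);
}

// Map a Node platform string to the opener named in step 5.
// `start` is a cmd.exe builtin, so on Windows it has to be run through a shell.
function browserCommand(platform = process.platform) {
  if (platform === "darwin") return "open";
  if (platform === "win32") return "start";
  return "xdg-open";
}
```

The probe server is closed before the port is reported, so there is a short window in which another process could take it; a launcher built this way would still need to handle a bind failure when the real server starts.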

### Step 4 — surface the output

After the launcher returns, read `<runtime-dir>/output.md` and present it to the user. Do NOT delete the runtime dir automatically, because the user may want to re-open or audit it. The session log is in `<runtime-dir>/server.log`.

## Available tools (full set)

See `lib/tool-defs.js` for OpenAI Realtime tool definitions and `lib/tools.js` for implementations. Whitelist via `config.json.tools`.

**Canvas (both modes):**
- `canvas_show({ type, source, title?, content? })` — display in canvas. `type` ∈ `image|markdown|html|code|json|video|audio|text|url`.
  - For **media** (`image`, `video`, `audio`): pass `source` (file path or URL).
  - For **text-like** (`markdown`, `html`, `code`, `json`, `text`): pass either `content` (inline string) OR `source` (file path — the client fetches the file via `/file` and renders it). If you have a long block already on disk, prefer `source`; if you're generating short content inline, use `content`.
  - For `url`: pass `source` (iframe src).
- `canvas_clear()` — clear canvas.
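
The argument rules for `canvas_show` can be expressed as a small check. This is a sketch under assumptions: `checkCanvasArgs` is a hypothetical helper, and accepting a text-like call that passes both `content` and `source` is a guess, since the rules above only say to pass one or the other.

```javascript
const MEDIA_TYPES = new Set(["image", "video", "audio"]);
const TEXT_TYPES = new Set(["markdown", "html", "code", "json", "text"]);

// Return null when the arguments satisfy the documented rules,
// otherwise a short message describing what is missing.
function checkCanvasArgs({ type, source, content }) {
  if (MEDIA_TYPES.has(type) || type === "url") {
    return source ? null : `${type} requires source`;
  }
  if (TEXT_TYPES.has(type)) {
    return source || content ? null : `${type} requires content or source`;
  }
  return `unknown canvas type: ${type}`;
}
```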

**Output / control (both modes):**
- `save_note({ heading, content })` — append a section to `output.md`.
- `end_session({ summary? })` — finalize and close. `summary` is appended to output.md.
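
As an illustration of "append a section", the sketch below shows one way `save_note` could write to `output.md`. The real layout is defined in `lib/tools.js`; `formatNote` and `saveNote` are hypothetical names, and rendering each note as a level-2 markdown section is an assumption.

```javascript
const fs = require("node:fs");

// Assumption: a note becomes a level-2 markdown section; the real layout may differ.
function formatNote(heading, content) {
  return `\n## ${heading}\n\n${content.trim()}\n`;
}

// Append the formatted section to the session's output file.
function saveNote(outputPath, { heading, content }) {
  fs.appendFileSync(outputPath, formatNote(heading, content), "utf8");
}
```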

**Data (live mode only):**
- `read_file({ path })` — read file under `cwd`.
- `write_file({ path, content })` — write file under `cwd`.
- `append_file({ path, content })` — append.
- `update_json({ path, patch })` — shallow-merge `patch` into a JSON file (object root only).
- `list_dir({ path })` — list directory contents.
- `run_bash({ cmd })` — run a shell command in `cwd`. Use sparingly.
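
Two behaviours described above are worth seeing concretely: file paths resolved under `cwd`, and the shallow merge performed by `update_json`. The sketch below is illustrative, not the code in `lib/tools.js`; `resolveUnderCwd` and `updateJson` are hypothetical names, and rejecting paths that escape `cwd` is an assumption about how the scoping is enforced.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Resolve relPath against cwd and refuse anything that lands outside it.
function resolveUnderCwd(cwd, relPath) {
  const root = path.resolve(cwd);
  const target = path.resolve(root, relPath);
  const rel = path.relative(root, target);
  if (rel === ".." || rel.startsWith(".." + path.sep) || path.isAbsolute(rel)) {
    throw new Error(`path escapes cwd: ${relPath}`);
  }
  return target;
}

// Shallow-merge `patch` into a JSON file whose root is an object.
// Top-level keys in `patch` replace existing values wholesale, nested objects included.
function updateJson(cwd, relPath, patch) {
  const file = resolveUnderCwd(cwd, relPath);
  const current = JSON.parse(fs.readFileSync(file, "utf8"));
  if (current === null || typeof current !== "object" || Array.isArray(current)) {
    throw new Error("update_json only supports an object at the JSON root");
  }
  const next = { ...current, ...patch };
  fs.writeFileSync(file, JSON.stringify(next, null, 2) + "\n", "utf8");
  return next;
}
```

Because the merge is shallow, patching `{ "nested": { "x": 9 } }` into `{ "nested": { "x": 1, "y": 2 } }` drops `y`; to change a single nested field, the agent has to send the whole nested object.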

## Examples

### Example 1 — "let's go over the images you just created" (distill)

```json
{
  "mode": "distill",
  "topic": "סקירת תמונות",
  "instructions": "אתה מציג לאביץ תמונות אחת אחת. עבור כל תמונה: 1) קרא ל-canvas_show עם הנתיב מ-canvas_hints, 2) שאל 'מה דעתך?', 3) הקשב לתגובה, 4) קרא ל-save_note עם heading='[שם תמונה]' ו-content=[התגובה של אביץ]. כשמסיימים את כל התמונות — קרא ל-end_session.",
  "voice": "ash",
  "tools": ["canvas_show", "canvas_clear", "save_note", "end_session"],
  "canvas_hints": [
    { "type": "image", "source": "/Users/aviz/aviz-crm/output/img-001.png", "title": "1" },
    { "type": "image", "source": "/Users/aviz/aviz-crm/output/img-002.png", "title": "2" }
  ],
  "output_template": "# פידבק על תמונות\n\n"
}
```

In English, the Hebrew strings above read as follows. `topic`: "Image review". `instructions`: "You present images to Aviz one at a time. For each image: 1) call canvas_show with the path from canvas_hints, 2) ask 'What do you think?', 3) listen to the response, 4) call save_note with heading='[image name]' and content=[Aviz's response]. When all images are done, call end_session." `output_template`: "# Feedback on images".

### Example 2 — "review pending freelancer scores" (live)

```json
{
  "mode": "live",
  "topic": "סקירת פרילנסרים",
  "instructions": "פתח בקריאה ל-read_file({path: 'data/freelancers.json'}). הצג כל פרילנסר בקנבס (canvas_show type=json). שאל את אביץ לעדכון ציון. עדכן עם update_json. תעד ב-save_note. סיים עם end_session.",
  "voice": "ash",
  "cwd": "/Users/aviz/aviz-crm",
  "tools": ["canvas_show", "canvas_clear", "save_note", "end_session", "read_file", "update_json", "list_dir"],
  "output_template": "# Freelancer review session\n\n## Updates made\n\n"
}
```

In English, the Hebrew strings above read as follows. `topic`: "Freelancer review". `instructions`: "Start by calling read_file({path: 'data/freelancers.json'}). Show each freelancer in the canvas (canvas_show type=json). Ask Aviz for the score update. Update with update_json. Record it with save_note. Finish with end_session."

### Example 3 — vague invocation

User says only "תפעיל שיחה קולית" ("start a voice call"). Call AskUserQuestion once:

> Question: "על מה השיחה?" ("What is the call about?")
> Options: [explicit topic the user types in via 'Other'], "סקירה חופשית — distill בלבד" ("Free-form review, distill only")

Then build the config from the answer.

## Anti-patterns

- **Don't bake secrets.** `OPENAI_API_KEY` comes from `~/.claude/skills/quick-voice/.env` (or the project's `.env`). Never inline.
- **Don't generate huge instructions.** Keep `instructions` ≤ 2KB. The agent needs to act fast.

Files: 11

Size: 56.5 KB

Complexity: 72/100

Category: Image & Video

Source: https://github.com/aviz85/claude-skills-library/tree/main/skills/quick-voice
