characteristic-voice

Use this skill whenever the user wants speech to sound more human, companion-like, or emotionally expressive. Triggers include: any mention of 'say like', 'talk like', 'speak like', 'companion voice', 'comfort me', 'cheer me up', 'sound more human', 'good night voice', 'good morning voice', or requests to add fillers, emotion, or personality to generated speech. Also use when the user wants to mimic a specific character's voice, apply speaking style presets (goodnight, morning, comfort, celebration, chatting), tune emotional parameters like warmth or tenderness, or make TTS output feel like a real person talking. If the user asks for a 'voice message', 'companion audio', 'character voice', or wants speech that sighs, laughs, hesitates, or sounds genuinely warm, use this skill. Do NOT use for plain text-to-speech without personality, music generation, sound effects, or general coding tasks unrelated to expressive speech.

# characteristic-voice

Make your AI agent sound like a real companion — one who sighs, laughs, hesitates, and speaks with genuine feeling.

## Credentials

| Variable | Required | Description |
|---|---|---|
| `NOIZ_API_KEY` | **Yes** if using Noiz backend | API key from [developers.noiz.ai](https://developers.noiz.ai/api-keys). Not needed if using the local Kokoro backend. |

The script saves a normalised copy of the key to `~/.noiz_api_key` (mode 600) for convenience. To set it:

```bash
bash skills/characteristic-voice/scripts/speak.sh config --set-api-key YOUR_KEY
```

## Prerequisites

The included `speak.sh` script requires **curl** and **python3** at runtime. Depending on which backend and features you use, you may also need:

| Tool | When needed | Install hint |
|---|---|---|
| `curl`, `python3` | Always (core script) | Usually pre-installed |
| `kokoro-tts` | Kokoro (local/offline) backend | `uv tool install kokoro-tts` |
| `yt-dlp` | Downloading reference audio for voice cloning | [github.com/yt-dlp/yt-dlp](https://github.com/yt-dlp/yt-dlp) |
| `ffmpeg` | Trimming reference audio clips | [ffmpeg.org](https://ffmpeg.org) |
| `rg` (ripgrep) | Searching subtitle files | [github.com/BurntSushi/ripgrep](https://github.com/BurntSushi/ripgrep) |

None of these are installed by the skill itself — provision them manually in your environment.
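
A quick way to see which of the tools in the table are present is to probe `PATH` before running the script. The sketch below is one possible check; `missing_tools` is a hypothetical helper that is not part of the skill, and it only confirms that a command can be found, not that its version is suitable.

```bash
# Hypothetical helper (not part of the skill): print each named command
# that cannot be found on PATH, one per line.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Core requirements from the table above.
missing_tools curl python3
# Optional tools, needed only for specific backends or features.
missing_tools kokoro-tts yt-dlp ffmpeg rg
```

An empty result means everything requested was found; any name printed still needs to be installed manually.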

## Privacy & Data Transmission

- **Noiz backend**: The text to be synthesized and any reference audio you provide are sent to `https://noiz.ai/v1`. If you supply `--ref-audio`, that audio file is uploaded for voice cloning.
- **Kokoro backend**: Runs entirely locally — no data leaves your machine.
- Choose the Kokoro backend (`--backend kokoro`) if you want fully offline processing.
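
One way to make the backend choice explicit in a wrapper is to fall back to Kokoro whenever no Noiz key is supplied. In the sketch below, `choose_backend` is a hypothetical helper, and treating an unset `NOIZ_API_KEY` as a request for offline processing is an assumption rather than documented behaviour of `speak.sh` (it also ignores any key already saved in `~/.noiz_api_key`). Only the `--backend` flag and its `noiz` and `kokoro` values come from this document.

```bash
# Hypothetical helper: print "noiz" when an API key is supplied,
# otherwise "kokoro" so that nothing leaves the machine.
choose_backend() {
  if [ -n "${1:-}" ]; then
    printf 'noiz\n'
  else
    printf 'kokoro\n'
  fi
}

backend=$(choose_backend "${NOIZ_API_KEY:-}")
# Print the command instead of running it, so the choice can be reviewed first.
printf 'bash skills/characteristic-voice/scripts/speak.sh --backend %s -t "Good morning~" -o morning.wav\n' "$backend"
```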

## Triggers

- say like
- talk like
- speak like
- companion voice
- comfort me
- cheer me up
- sound more human

## The Two Tricks

1. **Non-lexical fillers** — sprinkle in little human noises (hmm, haha, aww, heh) at natural pause points to make speech feel alive
2. **Emotion tuning** — adjust warmth, joy, sadness, tenderness to match the moment

## Filler Sounds Palette

| Sound | Feeling | Use for |
|-------|---------|---------|
| hmm... | Thinking, gentle acknowledgment | Comfort, pondering |
| ah... | Realization, soft surprise | Discoveries, transitions |
| uh... | Hesitation, empathy | Careful moments |
| heh / hehe | Playful, mischievous | Teasing, light moments |
| haha | Laughter | Joy, humor |
| aww | Tenderness, sympathy | Deep comfort |
| oh? / oh! | Surprise, attention | Reacting to news |
| pfft | Stifled laugh | Playful disbelief |
| whew | Relief | After tension |
| ~ (tilde) | Drawn out, melodic ending | Warmth, playfulness |

**Rules**: Use at most 2–4 fillers per short message. Place them at natural pauses, such as sentence starts and thought shifts. Use `...` after a filler for a beat of silence and `~` at word endings for warmth.
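
The filler cap can be checked mechanically before text is sent for synthesis. The sketch below is one possible approach: `count_fillers` is a hypothetical helper, the word list is copied from the palette above, and whole-word matching with `grep -w` is a simplification (a stretched spelling such as "hmmm" would not be counted). The `~` marker and `...` pauses are deliberately left out of the count.

```bash
# Hypothetical helper: count filler words from the palette in a message.
# Matching is case-insensitive and whole-word only.
count_fillers() {
  printf '%s\n' "$1" \
    | grep -oiwE 'hmm|ah|uh|hehe|heh|haha|aww|oh|pfft|whew' \
    | wc -l \
    | tr -d ' '
}

count_fillers "Hmm... rest well~ Sweet dreams."      # prints 1
count_fillers "Aww... haha, oh! You really did it~"  # prints 3
```

A message that scores above 4 is a candidate for trimming.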

## Presets

### Good Night

Gentle, warm, slightly sleepy. Slow pace.

### Good Morning

Warm, cheerful but not overwhelming.

### Comfort

Soft, understanding, unhurried. Give space. Don't rush to "fix" things.

### Celebration

Excited, proud, genuinely happy.

### Just Chatting

Relaxed, playful, natural.
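
Each preset is selected with the `--preset` flag. The sketch below prints a `speak.sh` command for a given preset without running it. `preset_command` is a hypothetical helper; the names `goodnight` and `morning` appear in this document's usage examples, while `comfort`, `celebration`, and `chatting` are taken from the skill description and are assumed to be accepted by the script in that spelling.

```bash
# Hypothetical helper: print (not run) a speak.sh command for a preset.
# Rejects names outside the five presets listed in this document.
preset_command() {
  case "$1" in
    goodnight|morning|comfort|celebration|chatting)
      printf 'bash skills/characteristic-voice/scripts/speak.sh --preset %s -t "%s" -o %s.wav\n' "$1" "$2" "$1"
      ;;
    *)
      printf 'unknown preset: %s\n' "$1" >&2
      return 1
      ;;
  esac
}

preset_command comfort "Aww... I'm right here."
```

The helper does no shell escaping, so text containing double quotes would need to be quoted by hand before the printed command is run.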

## Using a Character's Voice

When a user says something like *"speak in Hermione's voice"* or *"sound like Tony Stark"*, first check whether a reference audio file already exists in `skills/characteristic-voice/`. If one does, use it directly with `--ref-audio`.

If no reference audio exists, you can create one — but **read the warnings below first**.

### Preparing reference audio (one-time setup)

You need a short (10–30 s) WAV clip of the target voice. Possible sources:

1. **User-provided audio** — the safest option. Ask the user to supply their own recording.
2. **Public-domain / CC-licensed clips** — search for freely licensed material.
3. **Extracting from online video** — tools like `yt-dlp` and `ffmpeg` can download and trim audio. Example workflow:

```bash
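# 1. Fetch only the auto-generated English subtitles (no media download)
yt-dlp "URL" --write-auto-sub --sub-lang en --skip-download -o tmp/clip
# 2. Find the timestamp of the line you want in the subtitle file
rg -n "target line" tmp/clip.en.vtt
# 3. Download just that section of the audio as WAV
yt-dlp "URL" -x --audio-format wav --download-sections "*00:00:00-00:00:25" -o tmp/clip
# 4. Trim to the final reference clip
ffmpeg -i tmp/clip.wav -ss 00:00:02 -to 00:00:20 skills/characteristic-voice/character.wav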
```

> **Copyright & privacy warning**: Downloading and re-using someone's voice from copyrighted media (movies, TV, YouTube) may violate copyright or personality-rights laws depending on your jurisdiction. **Do not upload private voice recordings or material you don't have permission to use.** The reference audio is sent to `https://noiz.ai/v1` for voice cloning when using the Noiz backend. If this is a concern, consider using the local Kokoro backend instead.
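
Whichever source the clip comes from, its length can be checked against the 10–30 s target before use. The sketch below is one way to do that, assuming `ffprobe` is available (it normally ships with `ffmpeg`); `duration_ok` is a hypothetical helper, and the path is the `character.wav` name used in the example workflow.

```bash
# Hypothetical helper: succeed when a duration in seconds is within 10-30 s.
duration_ok() {
  awk -v d="$1" 'BEGIN { exit !(d >= 10 && d <= 30) }'
}

# Read the clip length with ffprobe, if both the tool and the file exist.
clip=skills/characteristic-voice/character.wav
if command -v ffprobe >/dev/null 2>&1 && [ -f "$clip" ]; then
  seconds=$(ffprobe -v error -show_entries format=duration \
    -of default=noprint_wrappers=1:nokey=1 "$clip")
  if duration_ok "$seconds"; then
    echo "clip length OK: ${seconds}s"
  else
    echo "clip is ${seconds}s; trim it to 10-30 s"
  fi
fi
```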

### Using reference audio

```bash
bash skills/characteristic-voice/scripts/speak.sh \
  --preset goodnight -t "Hmm... rest well~ Sweet dreams." \
  --ref-audio skills/characteristic-voice/character.wav -o night.wav
```

The `--ref-audio` flag uploads the file to the Noiz backend for voice cloning (requires `NOIZ_API_KEY`).

---

## Usage

This skill provides `speak.sh`, a wrapper around the `tts` skill with companion-friendly presets.

```bash
# Use a preset (auto-sets emotion + speed)
bash skills/characteristic-voice/scripts/speak.sh \
  --preset goodnight -t "Hmm... rest well~ Sweet dreams." -o night.wav

# Custom emotion override
bash skills/characteristic-voice/scripts/speak.sh \
  -t "Aww... I'm right here." --emo '{"Tenderness":0.9}' --speed 0.75 -o comfort.wav

# With specific backend and voice
bash skills/characteristic-voice/scripts/speak.sh \
  --preset morning -t "Good morning~" --voice-id voice_abc --backend noiz -o morning.mp3 --format mp3
```

Run `bash skills/characteristic-voice/scripts/speak.sh --help` for all options.

## Writing Guide for the Agent

1. **Start soft** — lead with a filler ("hmm...", "oh~"), not content
2. **Mirror energy** — gentle when they're low, match when they're high
3. **Keep it brief** — 1–3 sentences, like a voice message from a friend
4. **End warmly** — close with connection ("I'm here", "see you tomorrow~")
5. **Don't lecture** — listen and stay present; no unsolicited advice

Files: 2

Size: 17.5 KB

Complexity: 44/100

Category: Image & Video

Source: https://github.com/noizai/skills/tree/main/skills/characteristic-voice
