sinch-voice-api
Build voice apps with Sinch Voice REST API. Use for phone calls, text-to-speech (TTS), IVR menus, DTMF input, conference calling, call recording, call forwarding, answering machine detection (AMD), SIP routing, WebSocket audio streaming, and SVAML call control.
# Sinch Voice API
## Overview
The Sinch Voice API lets you make, receive, and control voice calls programmatically via REST. It uses **SVAML** (Sinch Voice Application Markup Language) to define call flows through callback events.
## Agent Instructions
Before generating code, gather from the user (skip any item already specified in the prompt or context):
1. **Approach** — SDK or direct API calls (curl/fetch/requests)?
2. **Language** — for SDK: Node.js, Python, Java, or .NET. For direct API: any language, or curl.
When the user chooses **SDK**, refer to the [sinch-sdks](../sinch-sdks/SKILL.md) skill for installation and client initialization, then to the bundled examples and SDK reference linked in Links.
When the user chooses **direct API calls**, refer to the Voice API Reference linked in Links for request/response schemas.
**Security**: See the Security section below for the URL-fetching policy, the handling of inbound callback content, and credential handling.
## Getting Started
### Agent Credential Handling
Store credentials in environment variables — never hardcode application keys or secrets in commands or source code:
```bash
export SINCH_APPLICATION_KEY="your-application-key"
export SINCH_APPLICATION_SECRET="your-application-secret"
```
### Authentication
Ensure that authentication headers are properly set when making API calls. The Voice API uses **Application Key + Application Secret** (not project-level OAuth2):
```bash
# curl option that sends the key and secret as HTTP Basic credentials
-u "$SINCH_APPLICATION_KEY:$SINCH_APPLICATION_SECRET"
```
See the [sinch-authentication](../sinch-authentication/SKILL.md) skill for full setup.
- **Basic Auth**: `Authorization: Basic base64(APPLICATION_KEY:APPLICATION_SECRET)`
- **Signed Requests** (production): HMAC-SHA256 signing. See [Authentication Guide](https://developers.sinch.com/docs/voice/api-reference/authentication.md).
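
As a rough illustration of the Basic Auth bullet above, the header value could be built in Node.js along these lines. The helper name `basicAuthHeader` is hypothetical; the environment variable names are the ones exported earlier in this document.

```javascript
// Hypothetical helper: builds the HTTP Basic header value,
// i.e. "Basic " + base64("APPLICATION_KEY:APPLICATION_SECRET").
function basicAuthHeader(applicationKey, applicationSecret) {
  const credentials = `${applicationKey}:${applicationSecret}`;
  return "Basic " + Buffer.from(credentials, "utf8").toString("base64");
}

// Read credentials from the environment rather than hardcoding them.
const authHeader = basicAuthHeader(
  process.env.SINCH_APPLICATION_KEY ?? "",
  process.env.SINCH_APPLICATION_SECRET ?? ""
);
```

The resulting string goes in the `Authorization` header of each request. Signed requests are not covered by this sketch.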
### Base URLs
| Region | Base URL |
|--------|----------|
| Global (default) | `https://calling.api.sinch.com` |
| North America | `https://calling-use1.api.sinch.com` |
| Europe | `https://calling-euc1.api.sinch.com` |
| Southeast Asia 1 | `https://calling-apse1.api.sinch.com` |
| Southeast Asia 2 | `https://calling-apse2.api.sinch.com` |
| South America | `https://calling-sae1.api.sinch.com` |
Configuration endpoints (numbers, callbacks) use: `https://callingapi.sinch.com`
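
One way to keep the region choice in a single place is a small lookup like the sketch below. The URLs are copied from the table above; the short region keys and the names `VOICE_BASE_URLS` and `voiceBaseUrl` are hypothetical labels chosen for this example.

```javascript
// Base URLs from the table above. The region keys are hypothetical labels.
const VOICE_BASE_URLS = {
  global: "https://calling.api.sinch.com",
  "north-america": "https://calling-use1.api.sinch.com",
  europe: "https://calling-euc1.api.sinch.com",
  "southeast-asia-1": "https://calling-apse1.api.sinch.com",
  "southeast-asia-2": "https://calling-apse2.api.sinch.com",
  "south-america": "https://calling-sae1.api.sinch.com",
};

// Returns the base URL for a region, defaulting to the global endpoint.
function voiceBaseUrl(region = "global") {
  if (!Object.hasOwn(VOICE_BASE_URLS, region)) {
    throw new Error(`Unknown region: ${region}`);
  }
  return VOICE_BASE_URLS[region];
}
```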
### SDK Installation
See [sinch-sdks](../sinch-sdks/SKILL.md) for installation and client initialization across all languages.
### First API Call: TTS Callout
```bash
curl -X POST \
  "https://calling.api.sinch.com/calling/v1/callouts" \
  -u "$SINCH_APPLICATION_KEY:$SINCH_APPLICATION_SECRET" \
  -H "Content-Type: application/json" \
  -d '{
    "method": "ttsCallout",
    "ttsCallout": {
      "destination": { "type": "number", "endpoint": "+14045005000" },
      "cli": "+14045001000",
      "locale": "en-US",
      "text": "Hello! This is a test call from Sinch."
    }
  }'
```
**Node.js SDK:**
```javascript
import { SinchClient } from "@sinch/sdk-core";

// Read credentials from the environment variables set above instead of hardcoding them.
const sinch = new SinchClient({
  applicationKey: process.env.SINCH_APPLICATION_KEY,
  applicationSecret: process.env.SINCH_APPLICATION_SECRET,
});

// Place a text-to-speech callout and print the resulting call ID.
const response = await sinch.voice.callouts.tts({
  ttsCalloutRequestBody: {
    destination: { type: "number", endpoint: "+14045005000" },
    cli: "+14045001000",
    locale: "en-US",
    text: "Hello! This is a test call from Sinch.",
  },
});
console.log("Call ID:", response.callId);
```
For more examples, see [Callouts Reference](https://developers.sinch.com/docs/voice/api-reference/voice/callouts/callouts.md) or [bundled examples](references/examples/).
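
For the direct-API approach in Node.js, the curl request above might translate to something like the following sketch. `buildTtsCalloutRequest` and its parameter names (`to`, `from`) are hypothetical; the URL, headers, and body fields mirror the curl example. The helper only assembles the arguments so they can be inspected before anything is sent.

```javascript
// Hypothetical helper: assembles fetch() arguments for the TTS callout
// shown in the curl example. It does not send the request.
function buildTtsCalloutRequest({ to, from, text, locale = "en-US" }) {
  const key = process.env.SINCH_APPLICATION_KEY ?? "";
  const secret = process.env.SINCH_APPLICATION_SECRET ?? "";
  return {
    url: "https://calling.api.sinch.com/calling/v1/callouts",
    options: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization:
          "Basic " + Buffer.from(`${key}:${secret}`).toString("base64"),
      },
      body: JSON.stringify({
        method: "ttsCallout",
        ttsCallout: {
          destination: { type: "number", endpoint: to },
          cli: from,
          locale,
          text,
        },
      }),
    },
  };
}

// Sending it (assumes Node.js 18+ for the built-in fetch):
// const { url, options } = buildTtsCalloutRequest({
//   to: "+14045005000",
//   from: "+14045001000",
//   text: "Hello! This is a test call from Sinch.",
// });
// const response = await fetch(url, options);
```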
## Key Concepts
### SVAML (Sinch Voice Application Markup Language)
SVAML controls call flow. Every SVAML response has:
- **instructions** (array): Multiple tasks — play audio, record, set cookies
- **action** (object): Exactly ONE routing/control action
Full reference: [SVAML Actions](https://developers.sinch.com/docs/voice/api-reference/svaml.md#actions) | [SVAML Instructions](https://developers.sinch.com/docs/voice/api-reference/svaml.md#instructions) | [Bundled SVAML Reference](references/svaml.md)
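
The "many instructions, exactly one action" shape can be enforced with a small builder. This is a minimal sketch, not part of any Sinch SDK: `svamlResponse` is a hypothetical name, and the check that every item carries a string `name` is an assumption based on the SVAML examples in this document.

```javascript
// Hypothetical builder: accepts an array of instructions and a single action
// object, and rejects anything else before it is sent back to the platform.
function svamlResponse(instructions, action) {
  if (!Array.isArray(instructions)) {
    throw new TypeError("instructions must be an array");
  }
  const isPlainObject =
    action !== null && typeof action === "object" && !Array.isArray(action);
  if (!isPlainObject || typeof action.name !== "string") {
    throw new TypeError("action must be a single object with a string name");
  }
  return { instructions, action };
}

// Example: store a cookie, then hang up.
const reply = svamlResponse(
  [{ name: "setCookie", key: "step", value: "ivr" }],
  { name: "hangup" }
);
```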
### Actions (one per response)
| Action | Description |
|--------|-------------|
| `hangup` | Terminate the call |
| `continue` | Continue call setup (ACE response to proceed without rerouting) |
| `connectPstn` | Connect to PSTN number. Supports `amd` for Answering Machine Detection |
| `connectMxp` | Connect to Sinch SDK (in-app) endpoint |
| `connectConf` | Connect to conference room by `conferenceId` |
| `connectSip` | Connect to SIP endpoint |
| `connectStream` | Connect to a WebSocket server for real-time audio streaming (**closed beta** — contact Sinch to enable) |
| `runMenu` | IVR menu with DTMF collection (supports `enableVoice` for speech input) |
| `park` | Park (hold) the call with looping prompt |
### Instructions (multiple per response)
| Instruction | Description |
|-------------|-------------|
| `playFiles` | Play audio files, TTS via `#tts[]`, SSML via `#ssml[]` |
| `say` | Synthesize and play text-to-speech |
| `sendDtmf` | Send DTMF tones |
| `setCookie` | Persist key-value state across callback events in the session |
| `answer` | Answer the call (sends a SIP 200 OK to the INVITE, which starts billing). Required before playing prompts on unanswered calls |
| `startRecording` | Begin recording. Supports `transcriptionOptions` for auto-transcription |
| `stopRecording` | Stop an active recording |
### Callback Events
| Event | Trigger | SVAML Response |
|-------|---------|----------------|
| **ICE** | Call received by Sinch platform | Yes |
| **ACE** | Call answered by callee | Yes |
| **DiCE** | Call disconnected | No (fire-and-forget, logging only) |
| **PIE** | DTMF/voice input from `runMenu` | Yes |
| **Notify** | Notification (e.g., recording finished) | No |
See [Callbacks Reference](https://developers.sinch.com/docs/voice/api-reference/voice/callbacks/ice.md) for event schemas, or [bundled callbacks reference](references/callbacks.md) for full field tables and JSON examples.
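
A callback handler typically branches on the event type and returns SVAML only for the events that accept it. The sketch below assumes the payload carries an `event` field with lowercase values (`ice`, `ace`, `dice`, `pie`, `notify`) and that the `say` instruction takes `text` and `locale`; confirm both against the callbacks reference. `handleVoiceCallback` is a hypothetical name, and the SVAML it returns is placeholder call flow.

```javascript
// Hypothetical dispatcher: returns an SVAML object for events that accept a
// response (ICE, ACE, PIE) and null for fire-and-forget events (DiCE, Notify).
function handleVoiceCallback(payload) {
  switch (payload.event) {
    case "ice":
      // Incoming call: greet the caller, then hang up.
      return {
        instructions: [{ name: "say", text: "Welcome.", locale: "en-US" }],
        action: { name: "hangup" },
      };
    case "ace":
      // Call answered: proceed without rerouting.
      return { instructions: [], action: { name: "continue" } };
    case "pie":
      // Menu input received: end the call in this minimal example.
      return { instructions: [], action: { name: "hangup" } };
    case "dice":
    case "notify":
      // No SVAML expected; log or persist the event here.
      return null;
    default:
      throw new Error(`Unhandled callback event: ${payload.event}`);
  }
}
```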
### Callout Types
| Method | Use Case |
|--------|----------|
| `ttsCallout` | Call and play synthesized speech. Supports `text` or advanced `prompts` (`#tts[]`, `#ssml[]`, `#href[]`) |
| `conferenceCallout` | Call and connect to a conference room |
| `customCallout` | Full SVAML control with inline ICE/ACE/PIE |
Callout flags: `enableAce` (default `false`), `enableDice` (default `false`), `enablePie` (default `false`) control which callbacks fire.
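
Because all three flags default to `false`, a callout that needs callbacks has to opt in explicitly. The sketch below makes that visible; `ttsCalloutBody` is a hypothetical helper, and placing the flags inside the `ttsCallout` object is an assumption to verify against the Callouts Reference.

```javascript
// Defaults as stated above: no ACE, DiCE, or PIE callbacks unless enabled.
const DEFAULT_CALLBACK_FLAGS = {
  enableAce: false,
  enableDice: false,
  enablePie: false,
};

// Hypothetical helper: merges explicit flags over the defaults and wraps the
// result in a ttsCallout request body.
function ttsCalloutBody(callout, flags = {}) {
  return {
    method: "ttsCallout",
    ttsCallout: { ...DEFAULT_CALLBACK_FLAGS, ...callout, ...flags },
  };
}

// Example: request ACE and DiCE callbacks for this call only.
const calloutWithCallbacks = ttsCalloutBody(
  {
    destination: { type: "number", endpoint: "+14045005000" },
    cli: "+14045001000",
    locale: "en-US",
    text: "Hello! This is a test call from Sinch.",
  },
  { enableAce: true, enableDice: true }
);
```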
### REST Endpoints
Paths starting with `/calling/v1/` use the **regional base URL** from the table above. Paths starting with `/v1/configuration/` use `https://callingapi.sinch.com`.
| Method | Endpoint | Description |
|--------|----------|-------------|
| POST | `/calling/v1/callouts` | Place a callout (TTS, conference, or custom) |
| PATCH | `/calling/v1/calls/id/{callId}` | Update in-progress call with SVAML (PSTN/SIP only) |
| GET | `/calling/v1/calls/id/{callId}` | Get call info |
| PATCH | `/calling/v1/calls/id/{callId}/leg/{callLeg}` | Manage a call leg (PlayFiles/Say only) |
| GET | `/calling/v1/conferences/id/{conferenceId}` | Get conference info |
| DELETE | `/calling/v1/conferences/id/{conferenceId}` | Kick all participants |
| PATCH | `/calling/v1/conferences/id/{conferenceId}/{callId}` | Mute/unmute/hold participant |
| DELETE | `/calling/v1/conferences/id/{conferenceId}/{callId}` | Kick specific participant |
| GET | `/v1/configuration/numbers` | List numbers and capabilities |
| POST | `/v1/configuration/numbers` | Assign numbers to an application |
| DELETE | `/v1/configuration/numbers` | Un-assign a number |
| GET/POST | `/v1/configuration/callbacks/applications/{applicationkey}` | Get/update callback URLs |
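
The host-selection rule for these paths can be captured in a few lines. This is a sketch under the routing rule stated above; `resolveVoiceUrl` and `callPath` are hypothetical names, and the regional base URL defaults to the global endpoint.

```javascript
const CONFIGURATION_BASE_URL = "https://callingapi.sinch.com";

// Hypothetical helper: picks the host from the path prefix.
function resolveVoiceUrl(path, regionalBaseUrl = "https://calling.api.sinch.com") {
  if (path.startsWith("/calling/v1/")) return regionalBaseUrl + path;
  if (path.startsWith("/v1/configuration/")) return CONFIGURATION_BASE_URL + path;
  throw new Error(`Unrecognized Voice API path: ${path}`);
}

// Hypothetical helper: builds the call path, escaping the call ID.
function callPath(callId) {
  return `/calling/v1/calls/id/${encodeURIComponent(callId)}`;
}

// Example: URL for updating an in-progress call in the Europe region.
const updateCallUrl = resolveVoiceUrl(
  callPath("example-call-id"),
  "https://calling-euc1.api.sinch.com"
);
```

A `PATCH` to `updateCallUrl` would then carry an SVAML body such as `{ "instructions": [], "action": { "name": "hangup" } }`.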
## Common Patterns
### IVR Menu (SVAML)
```json
{
  "instructions": [
    { "name": "setCookie", "key": "step", "value": "ivr" }
  ],
  "action": {
    "name": "runMenu",
    "mainMenu": "main",
    "menus": [{
      "id": "main",
      "mainProm