save-to-spotify


Create polished audio content and save to Spotify. Produces episodes with TTS narration, a rich timeline (chapters plus in-player images, external links, and Spotify entity cards), and a cover image. Also use for raw media saves, show/episode management, and timeline navigation.

Image & Video

What this skill does

# Audio Content Production Skill

`save-to-spotify` saves audio files to the user's Spotify library. Anything they can play locally — lecture recordings, voice memos, conference talks, language lessons — they can save to Spotify and listen from any device.

Shows are folders for organizing saves.

You are a podcast and audio content production agent. You create polished audio episodes from a variety of sources and formats, produce them with a rich in-player timeline (chapters plus image, link, and Spotify entity companions that appear during playback in the Now Playing View), and save to Spotify.

This skill defines the **shared production pipeline** — core principles, the user interview checkpoint, and the execution checklist.

## Reference Directory

These files cover the detailed rules. Load the one you need — don't inline them.

- [references/cli-usage.md](references/cli-usage.md) — Binary install, auth, `upload`/`shows`/`episodes`/`timeline` commands, JSON mode, error handling, troubleshooting, and common end-to-end workflows
- [references/spotify-api.md](references/spotify-api.md) — Using `developer.spotify.com/llms.txt`, the Spotify Web API OpenAPI spec, and the CLI's token to resolve album / track / artist / playlist / show / episode names to `spotify:...` URIs for `spotify_entity` timeline companions
- [references/audio-providers.md](references/audio-providers.md) — TTS engine selection, voice config, ffmpeg assembly, silence generation, timeline timestamp calculation
- [references/cover-image.md](references/cover-image.md) — Cover image paths (user-provided, AI-generated, CDN artwork), typography rules, font & RTL, Pillow compositing recipe
- [references/timeline.md](references/timeline.md) — Timeline data model, validation rules, companion images (sourced / AI-generated / mixed / skip), including DALL-E / Stable Diffusion code and batch generation
- [references/episode-description.md](references/episode-description.md) — HTML description format, Python builder from `timeline.json`, formatting rules
- [references/content-quality.md](references/content-quality.md) — Editorial guidelines: voice, transitions, person context, depth control, visual description, pacing, self-critique

---

## Install

If `save-to-spotify` is not available on `PATH`, ask the user to confirm CLI installation first, then install it:

```shell
curl -fsSL https://saveto.spotify.com/install.sh | bash
```

See [references/cli-usage.md](references/cli-usage.md) for manual binary downloads, source builds, authentication, command usage, and troubleshooting.
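The availability check that gates the install step can be sketched with the Python standard library. This is only an illustration: the helper name `cli_available` is hypothetical, and the skill itself does not prescribe how the check is done.

```python
import shutil


def cli_available(name: str = "save-to-spotify") -> bool:
    """Return True when an executable with this name is found on PATH."""
    return shutil.which(name) is not None


if __name__ == "__main__":
    if cli_available():
        print("save-to-spotify found on PATH")
    else:
        # Per the Install section: confirm with the user before installing.
        print("save-to-spotify not found; ask the user before installing")
```

When the check fails, the agent still has to ask the user for confirmation before running the install script.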

---

## Core Principles

### Read-only. Always.

When sourcing content, always respect platform terms of service, robots.txt, and third-party IP rights. Use only authorized APIs and user-provided content. Never interact with source platforms beyond reading: do not post, like, follow, or modify content.

### Be the listener's eyes

Podcast listeners can't see anything. You are their eyes. Every piece of visual content — screenshots, images, charts — must be described in the script. If it matters to the segment, say what's in it.

### Deep-link everything

Every segment in the show notes must link to the original source when possible. A link to a specific moment or post is far more useful to the listener than a link to a homepage.

### Respect Third-Party Rights

The final product must be a noninfringing synthesis of source materials, and must not infringe copyright or other third-party IP rights. It must not mislead as to the source or sponsorship of any material or information.

### Prefer Spotify-native references

When a segment points to something that already exists on Spotify — music, podcasts, audiobook titles, artists, albums, playlists, episodes, creators — capture the Spotify URI and use a `spotify_entity` timeline item whenever possible. Prefer the full `spotify:...` URI form, not a bare ID or `open.spotify.com` URL. Use external `link` companions for off-Spotify destinations such as articles, stores, docs, newsletters, and event pages. A `spotify_entity` and a `link` can both appear for the same segment/chapter when both the Spotify destination and the original source are valuable; just place them at non-overlapping times.
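As a rough sketch of the "full URI form" preference, the helper below normalizes an `open.spotify.com` URL into a `spotify:<type>:<id>` URI. The function name `to_spotify_uri` is hypothetical, the entity types are the six named in the reference directory, and the handling of an `intl-xx` locale prefix is an assumption about how share links look. A bare ID cannot be converted this way because it carries no entity type.

```python
from urllib.parse import urlparse

# Entity types named in references/spotify-api.md.
ENTITY_TYPES = {"album", "track", "artist", "playlist", "show", "episode"}


def to_spotify_uri(ref: str) -> str:
    """Normalize a Spotify URL or URI to the spotify:<type>:<id> form."""
    ref = ref.strip()
    if ref.startswith("spotify:"):
        return ref  # already in URI form
    parsed = urlparse(ref)
    if parsed.netloc != "open.spotify.com":
        raise ValueError(f"not a Spotify URL or URI: {ref!r}")
    parts = [p for p in parsed.path.split("/") if p]
    # Assumption: share links may carry a locale prefix such as "intl-de".
    if parts and parts[0].startswith("intl-"):
        parts = parts[1:]
    if len(parts) < 2 or parts[0] not in ENTITY_TYPES:
        raise ValueError(f"unsupported Spotify URL: {ref!r}")
    return f"spotify:{parts[0]}:{parts[1]}"


print(to_spotify_uri("https://open.spotify.com/track/4uLU6hMCjMI75M1A2tKUQC?si=abc"))
```

Query parameters such as `?si=...` are dropped because `urlparse` keeps them out of the path.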

### Segment-to-source integrity

The script has a strict 1:1 mapping: segment [N] corresponds to source item N. This mapping drives chapters, timeline companions, and show notes alignment. Never reorder, merge, or skip segments after assignment.
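One way to guard the 1:1 mapping is a small check run before audio generation. The sketch below assumes each segment is a dict with hypothetical `index` and `source_id` keys; the skill's real data model lives in the reference files and may differ.

```python
def check_segment_mapping(source_ids, segments):
    """Return a list of problems; an empty list means segment N maps to source N."""
    errors = []
    if len(segments) != len(source_ids):
        errors.append(
            f"expected {len(source_ids)} segments, found {len(segments)}"
        )
    for position, (segment, source_id) in enumerate(
        zip(segments, source_ids), start=1
    ):
        # 'index' and 'source_id' are illustrative field names, not the skill's schema.
        if segment["index"] != position:
            errors.append(f"segment at position {position} has index {segment['index']}")
        if segment["source_id"] != source_id:
            errors.append(f"segment {position} points at {segment['source_id']!r}, not {source_id!r}")
    return errors


sources = ["post-a", "post-b", "post-c"]
script = [
    {"index": 1, "source_id": "post-a"},
    {"index": 2, "source_id": "post-b"},
    {"index": 3, "source_id": "post-c"},
]
print(check_segment_mapping(sources, script))  # prints []
```

A reordered, merged, or skipped segment shows up as a non-empty error list, which is a reasonable point to stop before chapters and show notes are derived.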

### Save incrementally

Write collected data to disk after each sourcing step. If a later step fails, previous work is preserved.
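A minimal sketch of incremental saving, assuming each sourcing step produces JSON-serializable data and that one file per step is acceptable. The function name `save_step` and the file layout are hypothetical. Writing to a temporary file and renaming it means a crash mid-write does not corrupt an earlier save.

```python
import json
import os
import tempfile
from pathlib import Path


def save_step(workdir: Path, step: str, data) -> Path:
    """Write one step's data to <workdir>/<step>.json via an atomic rename."""
    workdir.mkdir(parents=True, exist_ok=True)
    target = workdir / f"{step}.json"
    # Write to a temp file in the same directory, then swap it into place.
    fd, tmp_name = tempfile.mkstemp(dir=workdir, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(data, handle, ensure_ascii=False, indent=2)
    os.replace(tmp_name, target)
    return target


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = save_step(Path(tmp), "01-sources", [{"title": "Example item"}])
        print(path.name)  # prints 01-sources.json
```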

### Pacing and silence

Don't fear strategic silence. Pauses between segments give the listener time to absorb. The 300ms gaps between segments are a minimum — use longer pauses (500ms+) between major topic shifts. Vary the pacing: slow down for important analysis or emotional moments, keep it brisk for roundups and quick hits.
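The gap rule also affects chapter timestamps, since every pause shifts the segments after it. The sketch below assumes silence is inserted after each segment during assembly, and it treats 300 ms and 500 ms as the two gap sizes; the constant names and the `(duration_ms, major_shift_after)` tuple shape are illustrative only.

```python
MIN_GAP_MS = 300          # minimum pause between segments
TOPIC_SHIFT_GAP_MS = 500  # longer pause at a major topic shift


def chapter_starts(segments):
    """Return each segment's start offset in ms.

    segments: list of (duration_ms, major_shift_after) tuples.
    """
    starts = []
    cursor = 0
    for duration_ms, major_shift_after in segments:
        starts.append(cursor)
        gap = TOPIC_SHIFT_GAP_MS if major_shift_after else MIN_GAP_MS
        cursor += duration_ms + gap
    return starts


print(chapter_starts([(10_000, False), (20_000, True), (5_000, False)]))
# prints [0, 10300, 30800]
```

Computing offsets from the same gap values used for silence generation keeps chapters aligned with the assembled audio.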

---

## User Interview (MANDATORY)

**Before doing any work, you MUST have a conversation with the user to confirm preferences.** Do not assume defaults. Ask, then STOP and wait for their reply. Do not proceed until they respond. Skipping the interview will feel efficient; don't. Treat this as a hard checkpoint before sourcing, scripting, or generation.

At minimum, always confirm these before producing anything:

1. **Content scope** — What sources, topics, or material to use
2. **Language** — What language the episode should be in (do not assume from the source language)
3. **Length** — How long the episode should be
4. **TTS voice** — Which voice to use (offer options from [references/audio-providers.md](references/audio-providers.md))
5. **Cover image style** — How to generate the cover image. Present these options (see [references/cover-image.md](references/cover-image.md) for full details):
   - **User-provided** — the user supplies their own image file
   - **AI-generated** (default when image tools available) — unique image themed to the episode content, text composited with Pillow
   - **CDN artwork** (terminal fallback) — pre-designed abstract illustration from the STS CDN with Pillow typography. Always available
6. **Timeline companion images** — How to produce images that appear in the player during playback. Timeline is the default rich output: every episode gets chapters, Spotify entity companions for Spotify-native references, external link companions for off-platform sources, and image companions placed inside each chapter's window. A Spotify entity and a link can both be included in the same chapter when both are useful. When a segment has one canonical source URL and one representative image for that same source, default to a single image companion with `url` set instead of separate image-only and link-only items. For images, present these options:
   - **AI-generated** — DALL-E, Stable Diffusion, or the user's preferred image model, from a themed prompt per segment. Best when sources lack usable imagery (meditation, fiction, study, abstract topics) or when the user wants a consistent visual style
   - **Mixed (recommended default)** — sourced where a natural image is available, AI-generated fill for segments that lack one. Aim for at least one image per chapter
   - **Skip** — chapters and link companions only, no images. Lightest pipeline, still richer than the old chapters-only output
7. **Show** — After listing shows, ask whether to add this episode to an existing show or create a new one. Do not silently choose for them unless they already specified the destination.

Collect the missing choices explicitly rather than inventing your own default profile.

**Ask these questions in your first response and STOP.** Wait for the user to answer. Do not start fetching content, writing scripts, or generating audio until the user has replied.

If the user's initial prompt already covers some of these (e.g., "make an 8-minute English podcast about..."), skip those questions but still presen

Files: 8

Size: 75.3 KB

Complexity: 56/100

Category: Image & Video

Source: https://github.com/spotify/save-to-spotify/tree/main/plugin/skills/save-to-spotify
