resemble-detect
Deepfake detection and media safety — detect AI-generated audio, images, video, and text, trace synthesis sources, apply watermarks, verify speaker identity, and analyze media intelligence using Resemble AI
# Resemble Detect — Deepfake Detection & Media Safety
Analyze audio, image, video, and text for synthetic manipulation, AI-generated content, watermarks, speaker identity, and media intelligence using the Resemble AI platform.
## Core Principle — THE IRON LAW
**"NEVER DECLARE MEDIA AS REAL OR FAKE WITHOUT A COMPLETED DETECTION RESULT."**
Do not guess, infer, or speculate about media authenticity. Every authenticity claim must be backed by a completed Resemble detect job with a returned `label`, `score`, and `status: "completed"`. If the detection is still `processing`, wait. If it `failed`, say so — do not substitute your own judgment.
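A minimal Python sketch of the Iron Law as a guard: it refuses to return a verdict unless the job is `completed` and carries both a label and a score. It assumes a flattened dict with `status`, `label`, and `score` keys. In the real response these values sit inside media-specific objects (see Reading Results), so treat this shape as an assumption. `require_completed_verdict` and `VerdictUnavailable` are hypothetical names.

```python
from typing import Any, Mapping, Tuple


class VerdictUnavailable(Exception):
    """Raised when a detection result cannot back an authenticity claim."""


def require_completed_verdict(job: Mapping[str, Any]) -> Tuple[str, float]:
    """Return (label, score) only for a completed job; otherwise raise.

    ASSUMPTION: `job` is a flattened view with "status", "label", "score".
    """
    status = job.get("status")
    if status == "processing":
        raise VerdictUnavailable("Detection is still processing; wait and poll again.")
    if status == "failed":
        raise VerdictUnavailable("Detection failed; report the failure instead of guessing.")
    if status != "completed":
        raise VerdictUnavailable(f"Unexpected status {status!r}; no verdict can be given.")
    label, score = job.get("label"), job.get("score")
    if label is None or score is None:
        raise VerdictUnavailable("Completed job lacks a label or score.")
    return str(label), float(score)
```

Raising instead of returning a default keeps a caller from accidentally presenting a missing result as "real".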
## When to Use
Use this skill whenever the user's request involves any of these:
- Checking if audio, video, image, or text is AI-generated or manipulated
- Detecting deepfakes in any media format
- Verifying media authenticity or provenance
- Identifying which AI platform synthesized audio (source tracing)
- Applying or detecting watermarks on media
- Analyzing media for speaker info, emotion, transcription, or misinformation
- Asking natural-language questions about detection results
- Matching or verifying speaker identity against known voice profiles
- Detecting AI-generated or machine-written text
- Any mention of: "deepfake", "fake detection", "synthetic media", "voice verification", "watermark", "media forensics", "authenticity check", "source tracing", "is this real", "AI-written text", "text detection"
**Do NOT use** for text-to-speech generation, voice cloning, or speech-to-text transcription — those are separate Resemble capabilities.
## Capability Decision Tree
| User wants to... | Use this | API endpoint |
|-------------------------------------------------------|---------------------------|---------------------------------------|
| Check if media is AI-generated / deepfake | **Deepfake Detection** | `POST /detect` |
| Know *which AI platform* made fake audio | **Audio Source Tracing** | `POST /detect` with flag |
| Get speaker info, emotion, transcription from media | **Intelligence** | `POST /intelligence` |
| Ask questions about a completed detection | **Detect Intelligence** | `POST /detects/{uuid}/intelligence` |
| Apply an invisible watermark to media | **Watermark Apply** | `POST /watermark/apply` |
| Check if media contains a watermark | **Watermark Detect** | `POST /watermark/detect` |
| Verify a speaker's identity against known profiles | **Identity Search** | `POST /identity/search` |
| Check if text is AI-generated | **Text Detection** | `POST /text_detect` |
| Create a voice identity profile for future matching | **Identity Create** | `POST /identity` |
When multiple capabilities apply (e.g., user wants deepfake detection AND intelligence), combine them in a single `POST /detect` call using the `intelligence: true` flag rather than making separate requests.
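The combine-rather-than-split rule can be sketched as a small request planner. The capability keys (`"deepfake"`, `"source_tracing"`, `"watermark_detect"`, and so on) and the helper `plan_requests` are hypothetical; only the endpoint paths and the `intelligence` / `audio_source_tracing` flag names come from this document.

```python
from typing import Dict, List, Set, Tuple

# Endpoints for capabilities that are not folded into POST /detect.
# The capability keys are hypothetical names for this sketch.
STANDALONE_ENDPOINTS = {
    "intelligence": "/intelligence",
    "watermark_apply": "/watermark/apply",
    "watermark_detect": "/watermark/detect",
    "identity_search": "/identity/search",
    "identity_create": "/identity",
    "text": "/text_detect",
}


def plan_requests(wanted: Set[str]) -> List[Tuple[str, Dict[str, bool]]]:
    """Return (endpoint, flags) pairs, merging detection-related asks into one call."""
    remaining = set(wanted)
    plans: List[Tuple[str, Dict[str, bool]]] = []
    if remaining & {"deepfake", "source_tracing"}:
        flags: Dict[str, bool] = {}
        if "intelligence" in remaining:
            # Run intelligence alongside detection instead of a second request.
            flags["intelligence"] = True
        if "source_tracing" in remaining:
            flags["audio_source_tracing"] = True
        plans.append(("POST /detect", flags))
        remaining -= {"deepfake", "source_tracing", "intelligence"}
    for capability in sorted(remaining):
        plans.append(("POST " + STANDALONE_ENDPOINTS[capability], {}))
    return plans
```

Intelligence only goes to the standalone `POST /intelligence` endpoint when no deepfake verdict is requested.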
## Required Setup
- **API Key**: Bearer token from the Resemble AI dashboard (set as `RESEMBLE_API_KEY`)
- **Base URL**: `https://app.resemble.ai/api/v2`
- **Auth Header**: `Authorization: Bearer <RESEMBLE_API_KEY>`
- **Media Requirement**: All media must be at a publicly accessible HTTPS URL
If the user provides a local file path instead of a URL, inform them the file must be hosted at a public HTTPS URL first. Do not attempt to upload local files to the API. (Exception: `POST /text_detect` accepts text content inline.)
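A stdlib-only sketch of these setup requirements: it attaches the Bearer header from `RESEMBLE_API_KEY` and rejects anything that is not an HTTPS URL, such as a local path. The helper names are hypothetical and the `Content-Type: application/json` header is an assumption. The scheme check cannot confirm that a URL is actually publicly reachable.

```python
import json
import os
import urllib.request
from typing import Any, Dict, Optional
from urllib.parse import urlparse

BASE_URL = "https://app.resemble.ai/api/v2"


def require_https_url(media: str) -> str:
    """Reject local paths and non-HTTPS URLs (public reachability is not checked)."""
    parsed = urlparse(media)
    if parsed.scheme != "https" or not parsed.netloc:
        raise ValueError(f"{media!r} is not an HTTPS URL; host the file publicly first.")
    return media


def build_request(path: str, body: Dict[str, Any], api_key: Optional[str] = None) -> urllib.request.Request:
    """Build an authenticated JSON POST to the v2 API without sending it."""
    key = api_key or os.environ.get("RESEMBLE_API_KEY")
    if not key:
        raise RuntimeError("RESEMBLE_API_KEY is not set.")
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {key}", "Content-Type": "application/json"},
        method="POST",
    )
```

Sending the request is then `urllib.request.urlopen(req)`; keeping construction separate makes the auth and URL checks testable offline.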
## MCP Tools Available
When the Resemble MCP server is connected, use these tools instead of raw API calls:
| Tool | Purpose |
|---------------------------|---------------------------------------------------|
| `resemble_docs_lookup` | Get comprehensive docs for any detect sub-topic |
| `resemble_search` | Search across all documentation |
| `resemble_api_endpoint` | Get exact OpenAPI spec for any endpoint |
| `resemble_api_search` | Find endpoints by keyword |
| `resemble_get_page` | Read specific documentation pages |
| `resemble_list_topics` | List all available topics |
**Tool usage pattern**: Use `resemble_docs_lookup` with topic `"detect"` to get the full picture, then `resemble_api_endpoint` for exact request/response schemas before making API calls.
## Full API Reference
Detailed request/response schemas for every endpoint are in **[references/api-reference.md](references/api-reference.md)**. Consult it before making any API call to verify exact parameter names and response shapes. The sections below cover decision-making; the reference covers exact field formats.
---
## Phase 1: Deepfake Detection
The core capability. Submit audio, image, or video for AI-generated content analysis via `POST /detect`.
**Key flags to consider:**
- `visualize: true` — generate heatmap/visualization artifacts
- `intelligence: true` — run multimodal intelligence alongside detection (saves a round-trip)
- `audio_source_tracing: true` — identify which AI platform synthesized fake audio (only fires on `"fake"` audio)
- `use_reverse_search: true` — enable reverse image search (image only)
- `zero_retention_mode: true` — auto-delete media after analysis (for sensitive content)
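A sketch of assembling a `POST /detect` body from the flags above. The media field name `url` is an assumption (verify it against references/api-reference.md), and `build_detect_payload` is a hypothetical helper that includes only the flags you turn on.

```python
from typing import Any, Dict


def build_detect_payload(
    media_url: str,
    media_type: str,
    *,
    visualize: bool = False,
    intelligence: bool = False,
    audio_source_tracing: bool = False,
    use_reverse_search: bool = False,
    zero_retention_mode: bool = False,
) -> Dict[str, Any]:
    """Build a POST /detect body; only flags set to True are included."""
    if use_reverse_search and media_type != "image":
        # Reverse image search applies to images only.
        raise ValueError("use_reverse_search applies to images only")
    payload: Dict[str, Any] = {"url": media_url}  # ASSUMPTION: media field is named "url"
    flags = {
        "visualize": visualize,
        "intelligence": intelligence,
        "audio_source_tracing": audio_source_tracing,
        "use_reverse_search": use_reverse_search,
        "zero_retention_mode": zero_retention_mode,
    }
    payload.update({name: True for name, on in flags.items() if on})
    return payload
```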
Detection is asynchronous. Poll `GET /detect/{uuid}` at 2s → 5s → 10s intervals until `status` is `"completed"` or `"failed"`. Most jobs complete in 10–60 seconds.
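The 2s, 5s, 10s schedule as a polling loop. `fetch` is any callable that performs `GET /detect/{uuid}` and returns the parsed job; it and `sleep` are injected so the loop can be tested offline. The 300-second cap is an assumption, not a documented limit, and `poll_detect` is a hypothetical name.

```python
import time
from typing import Any, Callable, Dict, Iterator


def backoff_delays() -> Iterator[float]:
    """Yield 2s, then 5s, then 10s forever."""
    yield 2.0
    yield 5.0
    while True:
        yield 10.0


def poll_detect(
    fetch: Callable[[], Dict[str, Any]],
    sleep: Callable[[float], None] = time.sleep,
    max_wait: float = 300.0,
) -> Dict[str, Any]:
    """Poll until status is "completed" or "failed"; raise TimeoutError past max_wait."""
    waited = 0.0
    for delay in backoff_delays():
        if waited + delay > max_wait:
            raise TimeoutError(f"Detect job still unfinished after {waited:.0f}s.")
        sleep(delay)
        waited += delay
        job = fetch()
        if job.get("status") in ("completed", "failed"):
            return job
    raise AssertionError("unreachable")  # backoff_delays() never ends
```

A `"failed"` job is returned rather than raised so the caller can report the failure explicitly, as the Iron Law requires.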
**Supported formats:** Audio (WAV, MP3, OGG, M4A, FLAC) · Video (MP4, MOV, AVI, WMV) · Image (JPG, PNG, GIF, WEBP)
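A small local heuristic that classifies a URL by file extension using the format list above, useful for choosing media-specific flags before submitting. Treating `.jpeg` as JPG is an assumption, and `media_type_for` is a hypothetical helper.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Extensions from the supported-formats list; ".jpeg" is an assumed alias of JPG.
SUPPORTED_EXTENSIONS = {
    "audio": {".wav", ".mp3", ".ogg", ".m4a", ".flac"},
    "video": {".mp4", ".mov", ".avi", ".wmv"},
    "image": {".jpg", ".jpeg", ".png", ".gif", ".webp"},
}


def media_type_for(url: str) -> str:
    """Classify a media URL as audio, video, or image by its path extension."""
    # Parse first so query strings like "?sig=..." do not affect the suffix.
    suffix = PurePosixPath(urlparse(url).path).suffix.lower()
    for media_type, extensions in SUPPORTED_EXTENSIONS.items():
        if suffix in extensions:
            return media_type
    raise ValueError(f"Unsupported or missing extension {suffix!r} in {url!r}")
```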
### Reading Results
- **Audio** — verdict in `metrics` — use `label` and `aggregated_score`
- **Image** — verdict in `image_metrics` — use `label` and `score`; `ifl` has an Invisible Frequency Layer heatmap
- **Video** — verdict in `video_metrics` — hierarchical tree of frame/segment results; video-with-audio returns both `metrics` and `video_metrics`
See [references/api-reference.md](references/api-reference.md#reading-results-by-media-type) for full response schemas.
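A sketch of pulling verdicts out of a completed result by media type. The field names match the list above; where these objects sit in the full response, and the internal shape of the `video_metrics` tree, are assumptions, so the video tree is returned unparsed. `extract_verdicts` is a hypothetical helper.

```python
from typing import Any, Dict


def extract_verdicts(result: Dict[str, Any]) -> Dict[str, Any]:
    """Collect per-modality verdicts from a completed detect result."""
    verdicts: Dict[str, Any] = {}
    if "metrics" in result:  # audio (also present for video with audio)
        m = result["metrics"]
        verdicts["audio"] = (m["label"], float(m["aggregated_score"]))
    if "image_metrics" in result:
        m = result["image_metrics"]
        verdicts["image"] = (m["label"], float(m["score"]))
    if "video_metrics" in result:
        # Hierarchical frame/segment tree; its exact shape is not assumed here.
        verdicts["video"] = result["video_metrics"]
    return verdicts
```

For a video with an audio track both the `"audio"` and `"video"` entries are populated, so report each one rather than collapsing them into a single verdict.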
### Interpreting Scores
| Score Range | Interpretation |
|-------------|-----------------------------------------------------|
| 0.0 – 0.3 | Strong indication of authentic/real media |
| 0.3 – 0.5 | Inconclusive — recommend additional analysis |
| 0.5 – 0.7 | Likely synthetic — flag for review |
| 0.7 – 1.0 | High confidence synthetic/AI-generated |
**Always present scores with context.** Say "The detection returned a score of 0.87, indicating high confidence that this audio is AI-generated" — never just "it's fake."
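A sketch that turns a score into the kind of sentence recommended above. The table's ranges share their endpoints; this code assumes half-open bands (so 0.3 counts as inconclusive and 0.7 as high confidence), which is an interpretation rather than documented behavior. `describe_score` is a hypothetical name.

```python
# Half-open bands [lower, upper); the final band also includes 1.0.
# Treating the shared endpoints this way is an assumption.
SCORE_BANDS = [
    (0.3, "a strong likelihood that this {kind} is authentic"),
    (0.5, "an inconclusive result; additional analysis is recommended"),
    (0.7, "that this {kind} is likely synthetic and should be flagged for review"),
    (1.0, "high confidence that this {kind} is AI-generated"),
]


def describe_score(score: float, kind: str) -> str:
    """Phrase a detection score with context instead of a bare 'fake' or 'real'."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score {score} is outside [0, 1]")
    template = next(t for upper, t in SCORE_BANDS if score < upper or upper == 1.0)
    return f"The detection returned a score of {score:.2f}, indicating {template.format(kind=kind)}."
```

For example, `describe_score(0.87, "audio")` yields the sentence quoted above.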
---
## Phase 2: Intelligence — Media Analysis
Intelligence returns rich structured insights about media: speaker info, emotion, transcription, translation, misinformation, and abnormalities.
Two ways to run Intelligence:
1. **Combined with detection** — add `intelligence: true` to `POST /detect` (preferred; one call)
2. **Standalone** — `POST /intelligence` with a URL (when you only need analysis, not a deepfake verdict)
**Audio/video structured fields include:** `speaker_info`, `language`, `dialect`, `emotion`, `speaking_style`, `context`, `message`, `abnormalities`, `transcription`, `translation`, `misinformation`.
**Image structured fields include:** `scene_description`, `subjects`, `authenticity_analysis`, `context_and_setting`, `abnormalities`, `misinformation`.
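A sketch that keeps only the structured fields listed above for a given media type, which makes it easy to check which ones a response actually populated. Where the structured object lives in the Intelligence response is an assumption, and `pick_intelligence_fields` is a hypothetical helper.

```python
from typing import Any, Dict

AUDIO_VIDEO_FIELDS = (
    "speaker_info", "language", "dialect", "emotion", "speaking_style", "context",
    "message", "abnormalities", "transcription", "translation", "misinformation",
)
IMAGE_FIELDS = (
    "scene_description", "subjects", "authenticity_analysis",
    "context_and_setting", "abnormalities", "misinformation",
)


def pick_intelligence_fields(structured: Dict[str, Any], media_type: str) -> Dict[str, Any]:
    """Keep only the documented structured fields for this media type, in listed order."""
    fields = IMAGE_FIELDS if media_type == "image" else AUDIO_VIDEO_FIELDS
    return {name: structured[name] for name in fields if name in structured}
```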
### Detect Intelligence — Ask Questions About Results
After a detection completes, ask natural-language questions via `POST /detects/{detect_uuid}/intelligence` with `{ "query": "..." }`. The call returns a question UUID; poll it until the answer is ready.
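A stdlib sketch of building the Detect Intelligence question request. The path and the `query` body come from this section; the JSON content type and the helper name `build_question_request` are assumptions. Parsing the returned question UUID is left out because its field name is not documented here.

```python
import json
import urllib.request
from urllib.parse import quote

BASE_URL = "https://app.resemble.ai/api/v2"


def build_question_request(detect_uuid: str, query: str, api_key: str) -> urllib.request.Request:
    """Build POST /detects/{uuid}/intelligence with a natural-language query."""
    if not query.strip():
        raise ValueError("query must not be empty")
    return urllib.request.Request(
        # Percent-encode the UUID so it cannot alter the path.
        f"{BASE_URL}/detects/{quote(detect_uuid, safe='')}/intelligence",
        data=json.dumps({"query": query}).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"},
        method="POST",
    )
```

Ask questions only after the detect job reports `"completed"`; the answers elaborate on that result and do not replace it.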