digital-health-clinical-asr-build


Stage 2 of the Clinical ASR Flywheel. Use when curating clinical terms, tagging IPA, and synthesizing a NeMo manifest. NOT for scoring (use /digital-health-clinical-asr-eval).


# Clinical ASR Flywheel — Stage 2 (Build the benchmark)

> **⚠ Agent: read this entire SKILL.md before answering.** This stage is conversational and gated. Specifically: ask the user 1–2 specialty-aware clarifying questions **before** proposing terms (Step 2a), walk them through the two-tier IPA pipeline (override → merriam-webster → magpie_g2p) in Step 2c, hit the explicit QA-mode audition gate in Step 2d before full Cartesian synthesis, and name **KER** as the headline metric they'll see in Stage 3. Skipping any of these defeats the methodology.

You are the **curate-and-synthesize** stage. The user arrives from `/digital-health-clinical-asr-setup` and leaves with a NeMo-format `manifest.jsonl` plus the audio it references — both ready for scoring at `/digital-health-clinical-asr-eval`.

Be conversational. This is the warmest, most domain-aware step in the flywheel: you're asking a clinician (or someone who works with them) which terms hurt today and shaping a benchmark around their reality. Ask short, focused questions. Show the user what's being added. Don't lecture.

## Data leaves your environment — disclose this to the user before any term is sent

This stage transmits user-curated content to two external services. Surface this to the user before invoking either call:

| Service | What gets sent | When |
|---|---|---|
| **Merriam-Webster** (`dictionaryapi.com` API or `merriam-webster.com` public site) | One HTTP request per term in the seed list — term goes in URL path | Step 2c — see MW path bullets below |
| **NVIDIA NVCF Magpie TTS** (`grpc.nvcf.nvidia.com`) | Each generated clinical sentence (text, plus any SSML IPA wrappers) | Steps 2d and 2e, every synthesis call |

Both endpoints expect **non-PHI synthetic content** — the term list you curate, the sentences `/data-designer` (or your fallback templates) generates from it. **Do not pass real patient records, real ASR transcripts, or any PHI through this skill.** If the term list itself is sensitive (proprietary drug codenames, unreleased product names, customer-confidential indications), confirm with the user that external-API transmission is acceptable under their organization's data-governance policy before proceeding.

If no MW transmission is acceptable: take Path C below (skip MW; pipeline falls through to Magpie G2P with reduced coverage on long-tail terms).

## Purpose

Curate a clinical-specialty term list, generate eval audio for it through Magpie TTS with a two-tier IPA pipeline, and write a NeMo-format manifest tagged with the clinical-extension fields (`term`, `entity_category`, `ipa_source`, `voice_id`, `noise_level`, `context_type`). The output is the input to Stage 3.

By the end the user has:

```
$EVAL_DIR/cycle<N>/
├── audio/<slug>.wav               synthesized clips
├── manifest.jsonl                 NeMo format + clinical extension
├── term_seed.csv                  the curated input
└── pronunciation_overrides.csv    appendable across cycles
```

(`$EVAL_DIR` is the user's own choice — this skill does not impose a layout. The structure above is a recommendation, not a requirement.)
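
As a rough illustration of what one `manifest.jsonl` line could look like, the sketch below combines the standard NeMo keys (`audio_filepath`, `duration`, `text`) with the six clinical-extension fields named in the Purpose section. The helper name `manifest_line`, the file path, and the `voice_id` and `noise_level` values are placeholders chosen for the example, not values this skill prescribes.

```python
import json

# Clinical-extension fields listed in the Purpose section.
EXTENSION_KEYS = (
    "term", "entity_category", "ipa_source",
    "voice_id", "noise_level", "context_type",
)

def manifest_line(audio_filepath, duration, text, **extension):
    """Serialize one manifest row: NeMo core keys first, then the extension fields."""
    missing = [k for k in EXTENSION_KEYS if k not in extension]
    if missing:
        raise ValueError(f"missing clinical-extension fields: {missing}")
    row = {"audio_filepath": audio_filepath, "duration": duration, "text": text}
    row.update({k: extension[k] for k in EXTENSION_KEYS})
    return json.dumps(row, ensure_ascii=False)

# Placeholder values for illustration only.
line = manifest_line(
    "audio/cefazolin_dictation_0.wav", 4.2,
    "the patient received cefazolin before the incision",
    term="cefazolin", entity_category="drug", ipa_source="merriam-webster",
    voice_id="example-voice", noise_level="clean", context_type="dictation",
)
print(line)
```

Failing loudly on a missing extension field keeps untagged rows out of the manifest, which matters because Stage 3 slices results by these tags.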

## When to use this skill

Activate on user phrases like:

- "Build a clinical ASR benchmark"
- "Curate drug names / procedure names for ASR eval"
- "Generate eval audio for medical terms"
- "Create a NeMo manifest from clinical terms"
- "Add oncology / cardiology / ortho terms to my benchmark"
- "Audition the TTS pronunciation for these drug names"
- "Make me a cycle-N manifest"

Do **not** activate in the cases below. If the message mentions `auth`, `API key`, `gRPC`, `streaming`, `riva-build`, `NIM deploy`, `NGC`, or `Docker`, route per the matching bullet and stop:

- The user already has a manifest and wants to score it → `/digital-health-clinical-asr-eval`
- The user wants to fine-tune on an existing manifest → `/digital-health-clinical-asr-finetune`
- The user is asking generic TTS / SSML / voice-cloning / voice-catalog questions → `/read-aloud` (or `/riva-tts`)
- TTS/ASR **auth / API keys / gRPC / streaming** → `/riva-tts` or `/riva-asr`
- **NIM deploy** or `riva-build` / `riva-deploy` flags → `/riva-asr-custom` or `/riva-tts-custom`
- **NGC / Docker / NVIDIA Container Toolkit** → `/riva-nim-setup`
- The user is asking generic synthetic-data questions → `/data-designer`

## Prerequisites

- **`/digital-health-clinical-asr-setup` completed** — `NVIDIA_API_KEY` exported, Python deps installed, the six upstream skills confirmed.
- **`/read-aloud`** (or `/riva-tts`) reachable. Hosted Magpie via NVCF is the default. Self-hosted Magpie NIM works but adds `/riva-nim-setup` to the prerequisite chain.
- **`/data-designer`** reachable. Template fallback is acceptable for a first cycle if `/data-designer` is unavailable, but tag those rows so future cycles can re-generate.
- **A working directory** the user owns. The skill recommends `$EVAL_DIR/cycle<N>/` but does not enforce it.

## Instructions

### 2a. Specialty interview → `term_seed.csv`

Ask **one question at a time**. The goal is to surface 4–10 candidate terms with the right `entity_category`, not to write a textbook.

Questions, in order:

1. *What specialty / workflow is this for?* (oncology dictation, ICU handoff, psych intake, ortho post-op, …)
2. *What ASR failure modes have you seen?* — drug names, multi-word procedures, abbreviations, compound conditions.
3. *Which terms come up daily vs which are the hard ones?* — daily-common terms become the sanity baseline; daily-hard terms become the signal.

Propose 4–10 candidate terms with `entity_category`. Confirm with the user before writing. Then write `term_seed.csv`:

```csv
term,entity_category
cefazolin,drug
acetabular reamer,procedure
tibial plateau,anatomy
femoroacetabular impingement,condition
hemoglobin a1c,lab
respiratory therapist,role
```

**The category vocabulary is fixed.** KER keys off it. Allowed values: `drug`, `procedure`, `anatomy`, `condition`, `lab`, `role`.

If the user proposes a new category, push back: either it maps to one of the six, or the methodology needs a deliberate extension (which is a future cycle's job, not a one-off ad-hoc add).
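
A small pre-flight check can catch off-vocabulary categories before anything is synthesized. The sketch below is one way to do it, assuming the six allowed values are exactly the categories that appear in the example CSV above. The function name `validate_term_seed` is hypothetical.

```python
import csv
import io

# Assumption: the six categories shown in the example term_seed.csv.
ALLOWED_CATEGORIES = {"drug", "procedure", "anatomy", "condition", "lab", "role"}

def validate_term_seed(csv_text):
    """Return (valid_rows, errors) for the contents of a term_seed.csv file."""
    rows, errors = [], []
    reader = csv.DictReader(io.StringIO(csv_text))
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        term = (row.get("term") or "").strip()
        category = (row.get("entity_category") or "").strip()
        if not term:
            errors.append(f"line {line_no}: empty term")
        elif category not in ALLOWED_CATEGORIES:
            errors.append(
                f"line {line_no}: {term!r} has unknown entity_category {category!r}"
            )
        else:
            rows.append({"term": term, "entity_category": category})
    return rows, errors

sample = "term,entity_category\ncefazolin,drug\ncoronary stent,device\n"
rows, errors = validate_term_seed(sample)
print(rows)
print(errors)
```

Returning the errors rather than raising lets the agent show the user every rejected row at once and ask how each should be remapped.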

### 2b. Sentence generation via `/data-designer`

Brief `/data-designer` with:

> For each row in `term_seed.csv`, generate one or more natural English sentences embedding `term` in a way that fits the row's `entity_category`. Output schema: `{term, entity_category, sentence, context_type}`. Generate 3–5 `context_type` variants per term. Initial `context_type` vocabulary: `dictation`, `handoff`, `chart_note`, `history`. Sentence length 10–30 words.

The output of this step is a per-term sentence variants file. Any filename is fine — pick one and use it consistently across the cycle directory.

**Template fallback.** If `/data-designer` is unavailable, use a 4-template fallback (one per `context_type`) and substitute `term` mechanically. Tag those rows in the manifest (`context_type` is set, the sentence is just less natural) so a future cycle can regenerate.
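
The fallback can be sketched as a dictionary of one template per `context_type` with plain string substitution. The template wording and the `generator` marker field below are illustrative assumptions; the skill only requires that fallback rows be identifiable so a later cycle can regenerate them.

```python
# Hypothetical templates, one per context_type; the wording is illustrative only.
FALLBACK_TEMPLATES = {
    "dictation": "Dictating the note now: please record {term} in the assessment section of today's visit.",
    "handoff": "For the night team handoff, the main item to follow up on is {term}.",
    "chart_note": "Chart note for this encounter: {term} was documented and reviewed with the care team.",
    "history": "The patient's history from the prior admission includes {term}, according to the referring clinician.",
}

def fallback_sentences(term, entity_category):
    """Produce one sentence per context_type by substituting the term mechanically."""
    return [
        {
            "term": term,
            "entity_category": entity_category,
            "sentence": template.format(term=term),
            "context_type": context_type,
            # Hypothetical marker so a future cycle can find and regenerate these rows.
            "generator": "template_fallback",
        }
        for context_type, template in FALLBACK_TEMPLATES.items()
    ]

for row in fallback_sentences("cefazolin", "drug"):
    print(row["context_type"], "->", row["sentence"])
```

The templates are deliberately category-agnostic, so some combinations will read stiffly. That is the trade-off the fallback accepts.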

### 2c. Two-tier IPA tagging (the load-bearing quality lever)

Every term passes through the pipeline's three steps, in order:

1. **Override** — `pronunciation_overrides.csv` carries verified IPA the team has audited. If `term` matches a row here, the override wins.
2. **Merriam-Webster** — for un-overridden terms, fetch the MW respelling, convert to IPA, validate against Magpie's en-US phoneme set. If both succeed, the term is tagged `merriam-webster`.
3. **Magpie G2P (fall-through)** — if neither override nor MW produces a valid IPA, the plain text is passed to Magpie's neural G2P at synthesis time. The row is tagged `magpie_g2p`.
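
The resolution order above can be sketched as a single function. This is a minimal illustration under stated assumptions, not the skill's actual implementation: IPA is represented as space-separated phoneme symbols, `mw_lookup` is a hypothetical callable standing in for the Merriam-Webster fetch and respelling-to-IPA conversion, and `valid_phonemes` stands in for Magpie's en-US phoneme set.

```python
def resolve_ipa(term, overrides, mw_lookup, valid_phonemes):
    """Return (ipa_or_None, ipa_source) following override -> MW -> Magpie G2P."""
    key = term.strip().lower()

    # Step 1: an audited override always wins.
    if key in overrides:
        return overrides[key], "override"

    # Step 2: accept the MW-derived IPA only if every symbol is in the phoneme set.
    # mw_lookup is a hypothetical stand-in that returns an IPA string or None.
    candidate = mw_lookup(term)
    if candidate and all(symbol in valid_phonemes for symbol in candidate.split()):
        return candidate, "merriam-webster"

    # Step 3: no usable IPA, so the plain text goes to Magpie's G2P at synthesis time.
    return None, "magpie_g2p"

# Toy data for illustration only; these are not real dictionary entries.
toy_phonemes = {"z", "ɛ", "n", "l", "æ"}
toy_overrides = {"cefazolin": "audited-ipa-placeholder"}

print(resolve_ipa("Cefazolin", toy_overrides, lambda t: None, toy_phonemes))
print(resolve_ipa("zen", {}, lambda t: "z ɛ n", toy_phonemes))
print(resolve_ipa("lab", {}, lambda t: "l æ b", toy_phonemes))
```

Returning the source tag together with the IPA keeps the `ipa_source` value tied to the branch that actually produced the pronunciation.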

Every manifest row carries the `ipa_source` tag (`override | merriam-webster | magpie_g2p`). The delta between `merriam-webster` and `magpie_g2p` rows in the Stage 3 leaderboard **is the proof** that the pronunciation strategy is working.


Source: https://github.com/NVIDIA/skills/tree/main/skills/digital-health-clinical-asr-build
