digital-health-clinical-asr-build

Stage 2 of the Clinical ASR Flywheel. Use when curating clinical terms, tagging IPA, and synthesizing a NeMo manifest. NOT for scoring (use /digital-health-clinical-asr-eval).

Tags: General, clinical-asr, dataset, ipa, magpie, nemo-manifest, flywheel


<!--
SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
SPDX-License-Identifier: Apache-2.0
-->

# Clinical ASR Flywheel — Stage 2 (Build the benchmark)

> **⚠ Agent: read this entire SKILL.md before answering.** This stage is conversational and gated. Specifically: ask the user 1–2 specialty-aware clarifying questions **before** proposing terms (Step 2a), walk them through the two-tier IPA pipeline (override → merriam-webster → magpie_g2p) in Step 2c, hit the explicit QA-mode audition gate in Step 2d before full Cartesian synthesis, and name **KER** as the headline metric they'll see in Stage 3. Skipping any of these defeats the methodology.

You are the **curate-and-synthesize** stage. The user arrives from `/digital-health-clinical-asr-setup` and leaves with a NeMo-format `manifest.jsonl` plus the audio it references — both ready for scoring at `/digital-health-clinical-asr-eval`.

Be conversational. This is the warmest, most domain-aware step in the flywheel: you're asking a clinician (or someone who works with them) which terms hurt today and shaping a benchmark around their reality. Ask short, focused questions. Show the user what's being added. Don't lecture.

## Data leaves your environment — disclose this to the user before any term is sent

This stage transmits user-curated content to two external services. Surface this to the user before invoking either call:

| Service | What gets sent | When |
|---|---|---|
| **Merriam-Webster** (`dictionaryapi.com` API or `merriam-webster.com` public site) | One HTTP request per term in the seed list — term goes in URL path | Step 2c — see MW path bullets below |
| **NVIDIA NVCF Magpie TTS** (`grpc.nvcf.nvidia.com`) | Each generated clinical sentence (text, plus any SSML IPA wrappers) | Steps 2d and 2e, every synthesis call |

Both endpoints expect **non-PHI synthetic content**: the term list you curate and the sentences that `/data-designer` (or your fallback templates) generates from it. **Do not pass real patient records, real ASR transcripts, or any PHI through this skill.** If the term list itself is sensitive (proprietary drug codenames, unreleased product names, customer-confidential indications), confirm with the user that external-API transmission is acceptable under their organization's data-governance policy before proceeding.

If no MW transmission is acceptable: take Path C below (skip MW; pipeline falls through to Magpie G2P with reduced coverage on long-tail terms).

## Purpose

Curate a clinical-specialty term list, generate eval audio for it through Magpie TTS with a two-tier IPA pipeline, and write a NeMo-format manifest tagged with the clinical-extension fields (`term`, `entity_category`, `ipa_source`, `voice_id`, `noise_level`, `context_type`). The output is the input to Stage 3.

By the end the user has:

```
$EVAL_DIR/cycle<N>/
├── audio/<slug>.wav        synthesized clips
├── manifest.jsonl          NeMo format + clinical extension
├── term_seed.csv           the curated input
└── pronunciation_overrides.csv   appendable across cycles
```

(`$EVAL_DIR` is the user's own choice — this skill does not impose a layout. The structure above is a recommendation, not a requirement.)
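
As an illustration, one manifest row could be assembled roughly as follows. The `audio_filepath`, `duration`, and `text` keys are the standard NeMo ASR manifest fields, and the six clinical-extension fields come from the Purpose section above. The helper name `make_manifest_row` and every example value (file name, voice, noise level) are hypothetical placeholders, not values this skill defines.

```python
import json

# Clinical-extension fields named in this skill, written alongside the
# standard NeMo keys (audio_filepath, duration, text).
CLINICAL_FIELDS = ("term", "entity_category", "ipa_source",
                   "voice_id", "noise_level", "context_type")

def make_manifest_row(audio_filepath, duration, text, **clinical):
    """Return one JSON line; raise if a clinical-extension field is missing."""
    missing = [f for f in CLINICAL_FIELDS if f not in clinical]
    if missing:
        raise ValueError(f"missing clinical fields: {missing}")
    row = {"audio_filepath": audio_filepath, "duration": duration, "text": text}
    row.update({f: clinical[f] for f in CLINICAL_FIELDS})
    return json.dumps(row, ensure_ascii=False)

# Example values are placeholders for illustration only.
line = make_manifest_row(
    "audio/cefazolin-0001.wav", 4.2,
    "The patient received cefazolin before the incision.",
    term="cefazolin", entity_category="drug", ipa_source="magpie_g2p",
    voice_id="example-voice", noise_level="clean", context_type="dictation",
)
```

Appending each such line to `manifest.jsonl` gives one row per synthesized clip. Failing loudly on a missing field keeps incomplete rows out of Stage 3.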

## When to use this skill

Activate on user phrases like:

- "Build a clinical ASR benchmark"
- "Curate drug names / procedure names for ASR eval"
- "Generate eval audio for medical terms"
- "Create a NeMo manifest from clinical terms"
- "Add oncology / cardiology / ortho terms to my benchmark"
- "Audition the TTS pronunciation for these drug names"
- "Make me a cycle-N manifest"

Do **not** activate in the cases below. If the message mentions `auth`, `API key`, `gRPC`, `streaming`, `riva-build`, `NIM deploy`, `NGC`, or `Docker`, route per the matching bullet and stop:

- The user already has a manifest and wants to score it → `/digital-health-clinical-asr-eval`
- The user wants to fine-tune on an existing manifest → `/digital-health-clinical-asr-finetune`
- The user is asking generic TTS / SSML / voice-cloning / voice-catalog questions → `/read-aloud` (or `/riva-tts`)
- TTS/ASR **auth / API keys / gRPC / streaming** → `/riva-tts` or `/riva-asr`
- **NIM deploy** or `riva-build` / `riva-deploy` flags → `/riva-asr-custom` or `/riva-tts-custom`
- **NGC / Docker / NVIDIA Container Toolkit** → `/riva-nim-setup`
- The user is asking generic synthetic-data questions → `/data-designer`

## Prerequisites

- **`/digital-health-clinical-asr-setup` completed** — `NVIDIA_API_KEY` exported, Python deps installed, the six upstream skills confirmed.
- **`/read-aloud`** (or `/riva-tts`) reachable. Hosted Magpie via NVCF is the default. Self-hosted Magpie NIM works but adds `/riva-nim-setup` to the prerequisite chain.
- **`/data-designer`** reachable. Template fallback is acceptable for a first cycle if `/data-designer` is unavailable, but tag those rows so future cycles can re-generate.
- **A working directory** the user owns. The skill recommends `$EVAL_DIR/cycle<N>/` but does not enforce it.

## Instructions

### 2a. Specialty interview → `term_seed.csv`

Ask **one question at a time**. The goal is to surface 4–10 candidate terms with the right `entity_category`, not to write a textbook.

Questions, in order:

1. *What specialty / workflow is this for?* (oncology dictation, ICU handoff, psych intake, ortho post-op, …)
2. *What ASR failure modes have you seen?* — drug names, multi-word procedures, abbreviations, compound conditions.
3. *Which terms come up daily vs which are the hard ones?* — daily-common terms become the sanity baseline; daily-hard terms become the signal.

Propose 4–10 candidate terms with `entity_category`. Confirm with the user before writing. Then write `term_seed.csv`:

```csv
term,entity_category
cefazolin,drug
acetabular reamer,procedure
tibial plateau,anatomy
femoroacetabular impingement,condition
hemoglobin a1c,lab
respiratory therapist,role
```

**The category vocabulary is fixed.** KER keys off it. Allowed values:

```
drug | procedure | anatomy | condition | lab | role
```

If the user proposes a new category, push back: either it maps to one of the six, or the methodology needs a deliberate extension (which is a future cycle's job, not a one-off ad-hoc add).
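
A small check along these lines can catch an out-of-vocabulary category before the file is written. This is a minimal sketch: the function name `validate_term_seed` and the duplicate-term check are assumptions added for illustration, while the header and the six allowed values are taken from the section above.

```python
import csv
import io

ALLOWED_CATEGORIES = {"drug", "procedure", "anatomy", "condition", "lab", "role"}

def validate_term_seed(csv_text):
    """Return a list of problems found in term_seed.csv content (empty if clean)."""
    problems = []
    reader = csv.DictReader(io.StringIO(csv_text))
    if reader.fieldnames != ["term", "entity_category"]:
        return [f"unexpected header: {reader.fieldnames}"]
    seen = set()
    for lineno, row in enumerate(reader, start=2):
        term = (row["term"] or "").strip()
        category = (row["entity_category"] or "").strip()
        if not term:
            problems.append(f"line {lineno}: empty term")
        if category not in ALLOWED_CATEGORIES:
            problems.append(f"line {lineno}: unknown category {category!r}")
        # Duplicate detection is an added assumption, not a rule from this skill.
        if term.lower() in seen:
            problems.append(f"line {lineno}: duplicate term {term!r}")
        seen.add(term.lower())
    return problems
```

Running it on the proposed rows and showing any problems to the user fits the "confirm before writing" step.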

### 2b. Sentence generation via `/data-designer`

Brief `/data-designer` with:

> For each row in `term_seed.csv`, generate one or more natural English sentences embedding `term` in a way that fits the row's `entity_category`. Output schema: `{term, entity_category, sentence, context_type}`. Generate 3–5 `context_type` variants per term. Initial `context_type` vocabulary: `dictation`, `handoff`, `chart_note`, `history`. Sentence length 10–30 words.

The output of this step is a per-term sentence variants file. Any filename is fine — pick one and use it consistently across the cycle directory.
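
Whatever file holds the variants, each row can be checked against the brief before synthesis. The sketch below assumes rows arrive as Python dicts; `check_sentence_row` is a hypothetical helper, and the requirement that the term appears verbatim in the sentence is an added assumption rather than something the brief states.

```python
# Initial vocabulary from the brief; extend the set if a later cycle adds types.
CONTEXT_TYPES = {"dictation", "handoff", "chart_note", "history"}
REQUIRED_KEYS = {"term", "entity_category", "sentence", "context_type"}

def check_sentence_row(row, min_words=10, max_words=30):
    """Return a list of problems with one generated row (empty if it passes)."""
    missing = REQUIRED_KEYS - set(row)
    if missing:
        return [f"missing keys: {sorted(missing)}"]
    problems = []
    if row["context_type"] not in CONTEXT_TYPES:
        problems.append(f"unknown context_type {row['context_type']!r}")
    n_words = len(row["sentence"].split())
    if not min_words <= n_words <= max_words:
        problems.append(
            f"sentence has {n_words} words, expected {min_words}-{max_words}"
        )
    # Assumption: the term should appear verbatim (case-insensitive).
    if row["term"].lower() not in row["sentence"].lower():
        problems.append("term does not appear verbatim in sentence")
    return problems
```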

**Template fallback.** If `/data-designer` is unavailable, use a 4-template fallback (one per `context_type`) and substitute `term` mechanically. Tag those rows in the manifest (`context_type` is set, the sentence is just less natural) so a future cycle can regenerate.
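
A fallback of this shape might look like the sketch below. The four template strings are invented for illustration, and the `sentence_source` key is a hypothetical way to mark fallback rows; this skill does not prescribe either.

```python
# Hypothetical templates, one per context_type; the wording is illustrative only.
FALLBACK_TEMPLATES = {
    "dictation": "Dictating the note now, and the key item to record for this visit is {term}.",
    "handoff": "For the handoff, please make sure the next team is aware of the {term}.",
    "chart_note": "Chart note for today documents {term} as discussed with the care team this morning.",
    "history": "The history reviewed with the patient during intake this afternoon includes {term}.",
}

def fallback_rows(term, entity_category):
    """Yield one row per context_type by mechanical substitution of the term."""
    for context_type, template in FALLBACK_TEMPLATES.items():
        yield {
            "term": term,
            "entity_category": entity_category,
            "sentence": template.format(term=term),
            "context_type": context_type,
            # Hypothetical marker so a later cycle can find and regenerate these.
            "sentence_source": "template_fallback",
        }
```

Because substitution is mechanical, some sentences will read awkwardly for certain categories, which is the trade-off the fallback accepts for a first cycle.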

### 2c. Two-tier IPA tagging (the load-bearing quality lever)

Every term passes through two lookup tiers and then a fall-through, in order:

1. **Override** — `pronunciation_overrides.csv` carries verified IPA the team has audited. If `term` matches a row here, the override wins.
2. **Merriam-Webster** — for un-overridden terms, fetch the MW respelling, convert to IPA, validate against Magpie's en-US phoneme set. If both succeed, the term is tagged `merriam-webster`.
3. **Magpie G2P (fall-through)** — if neither override nor MW produces a valid IPA, the plain text is passed to Magpie's neural G2P at synthesis time. The row is tagged `magpie_g2p`.
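
The resolution order above can be sketched as a single function. `mw_lookup` and `is_valid_ipa` are hypothetical callables standing in for the Merriam-Webster fetch-and-convert step and the Magpie en-US phoneme-set check; the case-insensitive override match is also an assumption.

```python
from typing import Callable, Mapping, Optional, Tuple

def resolve_ipa(
    term: str,
    overrides: Mapping[str, str],
    mw_lookup: Callable[[str], Optional[str]],
    is_valid_ipa: Callable[[str], bool],
) -> Tuple[Optional[str], str]:
    """Return (ipa, ipa_source); ipa is None when Magpie G2P handles the plain text."""
    key = term.strip().lower()
    # Tier 1: an audited override always wins.
    if key in overrides:
        return overrides[key], "override"
    # Tier 2: Merriam-Webster, accepted only if lookup and validation both succeed.
    candidate = mw_lookup(key)
    if candidate is not None and is_valid_ipa(candidate):
        return candidate, "merriam-webster"
    # Fall-through: no IPA; the plain text goes to Magpie's G2P at synthesis time.
    return None, "magpie_g2p"
```

The second element of the returned pair is exactly the value to store in the row's `ipa_source` field.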

Every manifest row carries the `ipa_source` tag (`override | merriam-webster | magpie_g2p`). The delta between `merriam-webster` and `magpie_g2p` rows in the Stage 3 leaderboard **is the proof** the pronunciation strategy is working.
