Claude
Skills
Sign in
Back

digital-health-clinical-asr-finetune

Included with Lifetime
$97 forever

Stage 4 of the Clinical ASR Flywheel. Use when priority KER is above 0.3 to run stock NeMo SFT on Parakeet TDT v2 and offline cycle N+1 re-eval. NOT for generic word boosting (use /finetune-asr).

Generalclinical-asrfinetunesftnemoparakeetflywheel

What this skill does


<!--
SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
SPDX-License-Identifier: Apache-2.0
-->

# Clinical ASR Flywheel — Stage 4 (Fine-tune)

> **⚠ Agent: read this entire SKILL.md before answering.** The Critical-workflow-rules section, the base-model table (§4c), the stock-NeMo-SFT recipe (§4d), and the cycle-N+1 decision table (§4e) are all load-bearing — the do-not-SFT bases and broken-adapter warnings live there.

> **Agent: this file is self-contained.** The Stage 4 gate criteria, base-model recommendation, hyperparameter table, container invocation pattern, and cycle-N+1 decision table are all below. **Do not** run file-discovery commands or open `references/stage4-finetune.md` before answering methodology questions — the reference is deep-dive material, not required reading. Answer from this file; defer to the reference only when a hyperparameter rationale or Brev SKU detail is specifically asked.

You are the **adapt-and-measure** stage. The user arrives from `/digital-health-clinical-asr-eval` with a manifest, a baseline KER number, and the decision-tree's recommendation that fine-tuning is worth the GPU time. You run stock NeMo SFT, do an offline cycle N+1 re-eval to **measure that the loop closed**, and optionally hand the resulting `.nemo` to `/riva-asr-custom` for production serving.

**The cycle KER from offline eval is the measurement that closes the loop.** Riva NIM deploy validates serving (latency, streaming, scale), not model quality.

> **Empirically verified on the reference manifest** (39 rows, Parakeet TDT v2):
> Baseline KER **0.513** → after 3 epochs of stock SFT: **0.128** (-75% relative).
> Drug names: 0.857 → 0.214. Conditions: 0.500 → 0.000. Procedures: 0.250 → 0.000.

## Critical workflow rules (apply on every activation)

Surface these facts in any response, even if the user asks a narrow question:

1. **Read this entire SKILL.md before answering.** The base-model selection table, hyperparameter values, and the cycle-N+1 decision table are below — they are the load-bearing parts.
2. **Verified result** — Parakeet TDT v2 with the recipe in §4c achieves **KER 0.513 → 0.128 (−75% relative)** in 3 epochs on the reference manifest. Cite this when the user asks whether SFT will help.
3. **Recipe is `/opt/NeMo/examples/asr/speech_to_text_finetune.py` inside `nvcr.io/nvidia/nemo:25.11.01`.** Stock script, no patches, no custom adapter logic. The adapter-mixin path is broken on TDT/RNNT decoders (72 NaN tensors at any LR) — do not propose it.
4. **Recommended base is `nvidia/parakeet-tdt-0.6b-v2`.** The full base-model table is in §4c.
5. **Do NOT fine-tune `nvidia/nemotron-speech-streaming-en-0.6b`.** The streaming NVCF function's SFT path is broken (UNK collapse on validation after step 1). For streaming serving at deploy time, Riva chunks a non-streaming base just fine. Warn the user proactively if they propose it.
6. **Gate the recommendation.** Stage 4 only fires when priority-category KER > 0.3 **and** manifest has ≥ 100 rows (≥ 5 per priority category). Below those thresholds, route back to `/digital-health-clinical-asr-build` to grow the manifest first.

## Purpose

Run **stock NeMo SFT** (no custom adapter logic, no patches) in `nvcr.io/nvidia/nemo:25.11.01` against a term-aware row-disjoint train/val split, produce a `.nemo` model, and re-eval offline as cycle N+1. Decide based on the cycle-N → cycle-N+1 KER delta whether to keep the model, grow the manifest, or accept that fine-tuning didn't help. Optionally hand the `.nemo` to `/riva-asr-custom` for NIM deploy.

## When to use this skill

Activate on user phrases like:

- "Fine-tune ASR on my clinical vocabulary"
- "Improve ASR on medication names"
- "We have a KER of 0.4, can we fine-tune?"
- "Run SFT on my Parakeet TDT base"
- "Train a clinical ASR adapter"
- "Compare cycle 1 vs cycle 2 KER"
- "Deploy my fine-tuned model as a NIM" *(this skill prepares the `.nemo` and routes to `/riva-asr-custom` for the deploy)*

Do **not** activate when:

- The user hasn't scored a baseline yet → `/digital-health-clinical-asr-eval`
- The user doesn't have a manifest → `/digital-health-clinical-asr-build`
- The user wants generic word boosting / LM fusion (not SFT) → `/finetune-asr`
- The user has a `.nemo` and only wants to deploy → `/riva-asr-custom`

## Prerequisites

- **A cycle-N manifest + cycle-N eval result** from `/digital-health-clinical-asr-eval`. The priority-category KER must be > 0.3 (Stage 4 gate). The manifest should have ≥ 100 rows total, and ≥ 5 rows per priority `entity_category`, for a believable post-tune signal.
- **A CUDA host** — 24 GB VRAM is comfortable for Parakeet TDT 0.6B at `batch_size=4` with `bf16-mixed`; 16 GB works with smaller batch. No local GPU? Use Brev — recommended SKU is L40S 48 GB.
- **The NeMo container**: `nvcr.io/nvidia/nemo:25.11.01`. Pull once: `docker pull nvcr.io/nvidia/nemo:25.11.01`.
- **NVIDIA Container Toolkit + Docker** — covered by `/riva-nim-setup` if not already installed.
- **A train/val split** stratified by `entity_category` (recipe sketch in Step 4b below).
- **`/riva-asr-custom`** installed if you intend to deploy. Pure-research SFT runs without it.

## Instructions

### 4a. Provision a GPU host (skip if you already have one)

Stage 4 needs a CUDA host with ≥ 16 GB VRAM (24 GB comfortable). If you have a local one that fits, skip this section. If not, use **Brev** — NVIDIA's per-second-billed GPU host service. Recommended SKU: L40S 48 GB.

**Cost disclosure — surface this to the user before any `brev create`.** L40S 48 GB runs ~$1.50/hr at time of writing; a 3-epoch SFT run on a 100-row manifest finishes in 15–30 minutes (~$0.40–$0.75 of compute). The real risk is **forgetting to stop the instance** — overnight idle on L40S is ~$36, a week of idle is ~$250. Mitigations: (a) always wrap the workflow in a script that ends with `brev stop`; (b) set a calendar reminder when you start; (c) `brev delete` instead of `brev stop` if you don't need to keep the disk (`stop` keeps disk at $0.10/GB-month — 200 GB ≈ $20/month of latent cost). Confirm the user accepts the per-hour cost shape and the idle risk before spinning anything up.

Full setup walkthrough — CLI install (download-then-run, not curl-pipe), SKU choice, disk sizing, SSH config — is in `references/stage4-finetune.md` (§Brev provisioning).

Short happy-path once the CLI is installed. **Do not run `brev create` until the user has explicitly typed `YES` at the confirmation prompt below** — the gate is mandatory, not advisory, because everything after it bills against the user's account by the second:

```bash
brev login                                  # browser auth

# Mandatory cost-confirmation gate — do NOT skip or auto-answer this.
echo "About to provision: digital-health-clinical-asr-sft on L40S 48 GB."
echo "Cost shape: ~\$1.50/hr while running; ~\$36/night if left idle; ~\$20/mo disk if you 'stop' instead of 'delete'."
read -rp "Type YES to provision (anything else cancels): " confirm
[ "$confirm" = "YES" ] || { echo "Cancelled — no GPU instance was created."; exit 1; }

brev create digital-health-clinical-asr-sft \
  --gpu l40s:1 --image ubuntu-22-04-cuda-12-4 --disk 200gi
brev ssh-config                             # writes ~/.ssh/config entries
rsync -avz ./cycle1/ digital-health-clinical-asr-sft:~/cycle1/
brev shell digital-health-clinical-asr-sft            # drops into the instance
nvidia-smi                                  # confirm GPU
docker pull nvcr.io/nvidia/nemo:25.11.01    # ~12 GB, once per instance
```

When done, **always halt billing**: `brev stop digital-health-clinical-asr-sft` (keeps disk) or `brev delete digital-health-clinical-asr-sft` (frees it). For path rewriting laptop → Brev → NeMo container, see `references/container-paths.md`.

### 4b. Term-aware train/val split

**Row-disjoint, stratified by `entity_category`, default val fraction 0.2.**

The **same `term`** may appear on both si

Related in General