llm-fine-tuning-skill
Run a staged LLM fine-tuning workflow from bootstrap through investigation, requirements and success criteria, implementation planning, implementation plus data preparation, training and validation, code review, docs sync, and handoff. Use for supervised fine-tuning, preference tuning, reinforcement-style training, and other LLM adaptation work with reproducible empirical validation.
What this skill does
# LLM Fine-Tuning Skill

## Overview

Run a staged workflow for LLM fine-tuning work where the hard parts are usually investigation quality, implementation planning, data preparation, prompt or chat formatting, tokenizer correctness, and empirical validation rather than large implementation volume.

Use this skill for tasks such as supervised fine-tuning, preference tuning, reinforcement-style training, domain adaptation, data-format changes, eval-set changes, benchmark comparisons, and reproducible validation work.

This skill is method-agnostic. It can be used for adapter-based, quantized, or full-parameter tuning, but it should force the workflow to make the chosen objective and update strategy explicit instead of assuming one method.

This workflow is stage-gated. Do not batch-generate all artifacts by default. Advance only when the current stage gate is satisfied or a classified re-entry path says otherwise.

## Skill Layout

- `SKILL.md` is the workflow router.
- `shared/workflow-state-template.md` is the canonical stage-control artifact.
- `stages/` stores stage-owned guides and templates:
  - `stages/00-bootstrap/`
  - `stages/01-investigation/`
  - `stages/02-requirements-and-success-criteria/`
  - `stages/03-implementation-plan/`
  - `stages/04-implementation/`
  - `stages/05-training-and-validation/`
  - `stages/06-code-review/`
  - `stages/07-docs-sync/`
  - `stages/08-handoff/`

## Workflow

### Ticket Folder Convention

- For each task, create or reuse one ticket folder under `tickets/in-progress/`.
- Write active workflow artifacts in `tickets/in-progress/<ticket-name>/`.
- Archive completed tickets in `tickets/done/<ticket-name>/`.
- Move a ticket to `done` only after explicit user verification or explicit user instruction.
- If the user reopens a completed task, move the ticket back to `tickets/in-progress/<ticket-name>/` before new updates.
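
The ticket folder convention above could be sketched as follows. This is a minimal illustration, not part of the skill itself: the helper names `ensure_ticket`, `archive_ticket`, and `reopen_ticket`, the `root` argument, and the `user_verified` flag are all hypothetical.

```python
from pathlib import Path
import shutil

# Folder names taken from the convention; everything else is an assumption.
IN_PROGRESS = Path("tickets") / "in-progress"
DONE = Path("tickets") / "done"


def ensure_ticket(root: Path, name: str) -> Path:
    """Create or reuse tickets/in-progress/<name>/ and return its path."""
    folder = root / IN_PROGRESS / name
    folder.mkdir(parents=True, exist_ok=True)
    return folder


def _move(src: Path, dst: Path) -> Path:
    """Move a ticket folder, refusing to overwrite an existing destination."""
    if dst.exists():
        raise FileExistsError(f"{dst} already exists")
    dst.parent.mkdir(parents=True, exist_ok=True)
    shutil.move(str(src), str(dst))
    return dst


def archive_ticket(root: Path, name: str, user_verified: bool) -> Path:
    """Move a ticket to tickets/done/ only after explicit user sign-off."""
    if not user_verified:
        raise PermissionError("archiving needs explicit user verification or instruction")
    return _move(root / IN_PROGRESS / name, root / DONE / name)


def reopen_ticket(root: Path, name: str) -> Path:
    """Move a completed ticket back to in-progress before any new updates."""
    return _move(root / DONE / name, root / IN_PROGRESS / name)
```

Refusing to overwrite an existing destination is a design choice of this sketch; the convention itself does not say how name collisions should be handled.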
### Bootstrap And Worktree Setup

- Before investigation, create or reuse the ticket folder and write `requirements.md` with status `Draft`.
- If the project is a git repository:
  - resolve the base branch from explicit user instruction when provided, otherwise infer the tracked remote default or integration branch with highest confidence,
  - refresh tracked remote refs before creating a new ticket branch or worktree,
  - create or reuse a dedicated ticket worktree,
  - create or reuse a ticket branch named `codex/<ticket-name>`.
- If the environment is not a git repository, continue without worktree setup and still enforce the ticket-folder and `Draft` requirement capture.

### Workflow State File

- Create and maintain `tickets/in-progress/<ticket-name>/workflow-state.md` as the mandatory stage-control artifact.
- Initialize it during Stage 0 with:
  - `Current Stage = 0`
  - `Code Edit Permission = Locked`
  - the bootstrap record filled in
  - stage gates set to `Not Started` or `In Progress`
- Update `workflow-state.md` on every stage transition, gate decision, and re-entry declaration.

### Source-Edit Lock Rule

- No source-code edits are allowed unless `workflow-state.md` shows:
  - `Current Stage = 4`
  - `Code Edit Permission = Unlocked`
- Default state is `Locked`.
- Unlock source-code edits only after Stage 3 `Implementation Plan` is current enough to drive implementation.
- If Stage 5, 6, or 7 fails and a re-entry is required, lock source edits before taking the return path.

### Canonical Flow

- Forward path: `0 -> 1 -> 2 -> 3 -> 4 -> 5 -> 6 -> 7 -> 8`
- Re-entry is mandatory when failures show the issue is upstream of the current stage.
- Do not stop after recording a re-entry path; resume work in the returned stage immediately unless blocked by the environment or waiting for an explicit user-only decision.
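
The source-edit lock and the lock-before-re-entry rule above might be checked mechanically along these lines. This is a sketch under a stated assumption: that `workflow-state.md` contains literal lines of the form `Current Stage = N` and `Code Edit Permission = Locked|Unlocked`. The actual layout of `shared/workflow-state-template.md` is not shown here, and the function names `read_state`, `edits_allowed`, and `reenter` are hypothetical.

```python
import re

# Assumed field layout, e.g. "Current Stage = 4" (optionally inside a bullet
# or backticks). The real template may format these fields differently.
FIELD = re.compile(
    r"^\W*(Current Stage|Code Edit Permission)\W*=\W*(\w+)", re.MULTILINE
)


def read_state(text: str) -> dict:
    """Extract the two lock-relevant fields from workflow-state.md text."""
    return {key: value for key, value in FIELD.findall(text)}


def edits_allowed(state: dict) -> bool:
    """Source edits need Stage 4 AND an unlocked permission; default is locked."""
    return (
        state.get("Current Stage") == "4"
        and state.get("Code Edit Permission") == "Unlocked"
    )


def reenter(text: str, return_stage: int) -> str:
    """Lock source edits first, then record the stage being returned to."""
    text = re.sub(r"(Code Edit Permission\W*=\W*)\w+", r"\g<1>Locked", text)
    return re.sub(r"(Current Stage\W*=\W*)\w+", rf"\g<1>{return_stage}", text)
```

For example, calling `reenter(state_text, 3)` after a failed Stage 5 gate would leave the file reporting Stage 3 with edits locked, so `edits_allowed` returns `False` until Stage 4 is reached and the permission is explicitly unlocked again.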
## Stage Router

### 0) Bootstrap

- Primary files:
  - `stages/00-bootstrap/README.md`
  - `stages/00-bootstrap/bootstrap-checklist.md`
- Required outcome:
  - ticket context exists,
  - `requirements.md` exists with status `Draft`,
  - `workflow-state.md` exists and records bootstrap details.

### 1) Investigation

- Primary files:
  - `stages/01-investigation/README.md`
  - `stages/01-investigation/investigation-guide.md`
  - `stages/01-investigation/investigation-notes-template.md`
- Investigation is first-class in this workflow.
- Investigation can include:
  - reading local code, configs, datasets, logs, checkpoints, and eval harnesses,
  - reading tokenizer, prompt-template, and response-formatting logic,
  - reading open-source fine-tuning repositories, framework internals, and relevant documentation,
  - checking papers, model cards, or fine-tuning references when needed,
  - running probes, small scripts, reproductions, and data sanity checks.
- Required outcome:
  - `investigation-notes.md` is a durable dossier with concrete evidence,
  - the task is triaged for scope and uncertainty,
  - later stages can reuse the findings directly.

### 2) Requirements & Success Criteria

- Primary files:
  - `stages/02-requirements-and-success-criteria/README.md`
  - `stages/02-requirements-and-success-criteria/requirements-success-criteria-guide.md`
- Required outcome:
  - `requirements.md` moves from `Draft` to `Plan-ready` or `Refined`,
  - task definition, baseline, measurable or rubric-defined outcomes, constraints, and success criteria are explicit,
  - the planned validation gate can measure pass, fail, or inconclusive results truthfully.

### 3) Implementation Plan

- Primary files:
  - `stages/03-implementation-plan/README.md`
  - `stages/03-implementation-plan/implementation-plan-template.md`
- This stage replaces heavy software-architecture runtime modeling with a concrete implementation plan.
- Focus on:
  - dataset sourcing, curation, filtering, split assumptions, and data-preparation design,
  - prompt or chat template design,
  - tokenizer, special-token, masking, truncation, and packing behavior,
  - base model choice, objective type, and parameter-update strategy,
  - optimizer, scheduler, and fine-tuning recipe,
  - evaluation protocol, held-out prompt suites, inference settings, and sample-based validation,
  - objective-specific needs such as preference pairs, reward signals, or trajectory handling when applicable,
  - comparison matrix or ablations,
  - reproducibility plan,
  - implementation work items.
- Required outcome:
  - `implementation-plan.md` is current and can drive Stage 4 implementation and Stage 5 training or evaluation.

### 4) Implementation & Data Preparation

- Primary files:
  - `stages/04-implementation/README.md`
  - `stages/04-implementation/implementation-template.md`
- Implementation is important, but it is not the center of this workflow.
- Data preparation execution belongs here, not in Stage 5.
- This stage owns the materialization of dataset manifests, formatted samples, tokenized artifacts, or other prepared inputs that Stage 5 will consume.
- Keep the artifact execution-oriented:
  - changed files,
  - data-preparation scripts and materialized artifacts,
  - config updates,
  - commands,
  - checkpoints and logging paths,
  - smoke checks,
  - readiness for training and validation.
- Required outcome:
  - implementation matches the implementation plan closely enough to run Stage 5,
  - source edits are complete for the current iteration,
  - required data preparation is complete,
  - smoke or unit checks needed before training are complete.

### 5) Training & Validation

- Primary files:
  - `stages/05-training-and-validation/README.md`
  - `stages/05-training-and-validation/training-validation-guide.md`
  - `stages/05-training-and-validation/training-validation-template.md`
- This is the primary evidence gate of the workflow.
- Training here can mean supervised fine-tuning, preference optimization, or reinforcement-style optimization depending on the chosen objective.
- Stage 5 should consume the prepared code, configs, and data artifacts produced in Stage 4.
- Do not move substantial data-preparation work into Stage 5; onl