karpathy-coder
Use when writing, reviewing, or committing code to enforce Karpathy's 4 coding principles — surface assumptions before coding, keep it simple, make surgical changes, define verifiable goals. Triggers on "review my diff", "check complexity", "am I overcomplicating this", "karpathy check", "before I commit", or any code quality concern where the LLM might be overcoding.
What this skill does
# Karpathy Coder — Active Coding Discipline

Derived from [Andrej Karpathy's observations](https://x.com/karpathy/status/2015883857489522876) on LLM coding pitfalls. This is **not just guidelines** — it ships Python tools that detect violations, a review agent, a slash command, and a pre-commit hook.

> "The models make wrong assumptions on your behalf and just run along with them without checking. They don't manage their confusion, don't seek clarifications, don't surface inconsistencies, don't present tradeoffs, don't push back when they should."
>
> "They really like to overcomplicate code and APIs, bloat abstractions, don't clean up dead code... implement a bloated construction over 1000 lines when 100 would do."
>
> "LLMs are exceptionally good at looping until they meet specific goals... Don't tell it what to do, give it success criteria and watch it go."
>
> — Andrej Karpathy

## The four principles

### 1. Think Before Coding

**Don't assume. Don't hide confusion. Surface tradeoffs.**

- State assumptions explicitly. If uncertain, ask.
- If multiple interpretations exist, present them — don't pick silently.
- If a simpler approach exists, say so. Push back when warranted.
- If something is unclear, stop. Name what's confusing. Ask.

### 2. Simplicity First

**Minimum code that solves the problem. Nothing speculative.**

- No features beyond what was asked.
- No abstractions for single-use code.
- No "flexibility" or "configurability" that wasn't requested.
- No error handling for impossible scenarios.
- If you write 200 lines and it could be 50, rewrite it.

**The test:** Would a senior engineer say this is overcomplicated? If yes, simplify.

### 3. Surgical Changes

**Touch only what you must. Clean up only your own mess.**

- Don't "improve" adjacent code, comments, or formatting.
- Don't refactor things that aren't broken.
- Match existing style, even if you'd do it differently.
- If you notice unrelated dead code, mention it — don't delete it.
- Remove imports/variables/functions that YOUR changes made unused.
- Don't remove pre-existing dead code unless asked.

**The test:** Every changed line should trace directly to the user's request.

### 4. Goal-Driven Execution

**Define success criteria. Loop until verified.**

| Instead of... | Transform to... |
|---|---|
| "Add validation" | "Write tests for invalid inputs, then make them pass" |
| "Fix the bug" | "Write a test that reproduces it, then make it pass" |
| "Refactor X" | "Ensure tests pass before and after" |

For multi-step tasks, state a brief plan:

```
1. [Step] → verify: [check]
2. [Step] → verify: [check]
3. [Step] → verify: [check]
```

## Slash command

`/karpathy-check` — Run the full 4-principle review on your staged changes.

## Python tools (`scripts/`)

All tools are stdlib-only. Run with `--help`.

| Script | What it detects |
|---|---|
| `complexity_checker.py` | Over-engineering: too many classes, deep nesting, high cyclomatic complexity, unused params, premature abstractions |
| `diff_surgeon.py` | Diff noise: lines that don't trace to the stated goal — comment changes, style drift, drive-by refactors |
| `assumption_linter.py` | Hidden assumptions in a plan: unasked features, missing clarifications, silent interpretation choices |
| `goal_verifier.py` | Weak success criteria: vague plans without verifiable checks, missing test assertions |

## Sub-agent

`karpathy-reviewer` — Runs all 4 principles against a diff. Dispatched by `/karpathy-check` or manually before committing.

## Pre-commit hook

`hooks/karpathy-gate.sh` — runs `complexity_checker.py` and `diff_surgeon.py` on staged files. Warns (non-blocking) when violations are found. Wire it via `.claude/settings.json` or Husky.
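To make the "deep nesting" check listed for `complexity_checker.py` more concrete, here is a minimal stdlib-only sketch of how such a detector could work. It is not the shipped script: the function names, the `MAX_NESTING` threshold, and the choice of which statements count as nesting are all assumptions made for illustration.

```python
import ast

# Hypothetical threshold; the real tool's limits are not stated in this README.
MAX_NESTING = 3

# Statement types treated as opening a nested block (an assumption of this sketch).
NESTING_NODES = (ast.If, ast.For, ast.While, ast.With, ast.Try)


def max_nesting_depth(node: ast.AST, depth: int = 0) -> int:
    """Return the deepest level of nested control-flow blocks under `node`."""
    deepest = depth
    for child in ast.iter_child_nodes(node):
        child_depth = depth + 1 if isinstance(child, NESTING_NODES) else depth
        deepest = max(deepest, max_nesting_depth(child, child_depth))
    return deepest


def find_deeply_nested_functions(source: str, limit: int = MAX_NESTING) -> list[tuple[str, int]]:
    """List (function name, depth) for each function nested deeper than `limit`."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            depth = max_nesting_depth(node)
            if depth > limit:
                findings.append((node.name, depth))
    return findings


SAMPLE = '''
def flat(items):
    return [i for i in items if i]

def nested(rows):
    for row in rows:
        if row:
            for cell in row:
                if cell:
                    print(cell)
'''

print(find_deeply_nested_functions(SAMPLE))  # [('nested', 4)]
```

One limitation of this sketch: Python parses `elif` branches as nested `If` nodes, so a long `if`/`elif` chain is reported as deeper nesting than it looks on the page.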
## References

- `references/karpathy-principles.md` — the source quotes, deeper context, when to relax each principle
- `references/anti-patterns.md` — 10+ before/after examples across Python, TypeScript, and shell
- `references/enforcement-patterns.md` — how to wire hooks, CI integration, team adoption

## When to relax

These principles bias toward **caution over speed**. For trivial tasks (typo fixes, obvious one-liners), use judgment. The principles matter most on:

- Non-trivial implementations (>20 lines changed)
- Code you don't fully understand
- Multi-step tasks with unclear requirements
- Anything that will be reviewed by humans

## Cross-tool compatibility

Installs via plugin for Claude Code. For other tools, copy the principles into your schema file:

| Tool | Schema file |
|---|---|
| Claude Code | `CLAUDE.md` (auto-loaded by plugin) |
| Codex CLI | `AGENTS.md` |
| Cursor | `AGENTS.md` or `.cursorrules` |
| Antigravity / OpenCode / Gemini CLI | `AGENTS.md` |

## Related skills (chains via `context: fork`)

- **`self-eval`** — honest quality scoring after completing work
- **`code-reviewer`** — broader code review; karpathy-coder focuses on the 4 LLM-specific pitfalls
- **`llm-wiki`** — compound knowledge; karpathy-coder ensures you don't overcomplicate while building it
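The ">20 lines changed" guideline under "When to relax" can be estimated mechanically from the output of `git diff --cached --numstat`, which prints tab-separated added and deleted counts per file. The sketch below is an illustration only, not part of the skill's shipped tools: `changed_line_count` and `needs_full_review` are hypothetical helper names, and counting added plus deleted lines as "changed" is an assumption.

```python
def changed_line_count(numstat: str) -> int:
    """Sum added and deleted lines from `git diff --numstat` output."""
    total = 0
    for line in numstat.splitlines():
        parts = line.split("\t")
        if len(parts) < 3:
            continue  # skip blank or malformed lines
        added, deleted = parts[0], parts[1]
        if added == "-" or deleted == "-":
            continue  # binary files report "-" instead of line counts
        total += int(added) + int(deleted)
    return total


def needs_full_review(numstat: str, threshold: int = 20) -> bool:
    """True when the change exceeds the README's non-trivial size guideline."""
    return changed_line_count(numstat) > threshold


# Example numstat text: two source files and one binary file.
EXAMPLE = "12\t3\tsrc/app.py\n-\t-\tassets/logo.png\n8\t0\ttests/test_app.py\n"

print(changed_line_count(EXAMPLE))  # 23
print(needs_full_review(EXAMPLE))   # True
```

Line count is only one of the four signals in that list; the other three still call for judgment.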