agent-artifex:implement
Use when the user wants to improve an existing MCP server, agent, chatbot, or tool-calling system. This includes: improving tool descriptions, fixing error messages, adding output schemas, writing tests, implementing quality checks, adding evals, setting up test harnesses, or any task where they say "help me improve", "fix my descriptions", "add tests", "write evals", "implement quality checks", "make my server better", "apply the design principles", or are ready to make code changes to improve quality. This skill covers both design application (making the code better) and test implementation (verifying the code is good). For scaffolding new projects, use claude-api:mcp-builder. For design principles without code changes, use agent-artifex:design.
What this skill does
# agent-artifex:implement: AI Services Implementation Guide

## When to Use

This is the hands-on improvement skill. It covers both applying design principles to make code better AND writing tests to verify code quality. Use it whenever the user is ready to make changes, whether that means rewriting tool descriptions, restructuring error messages, adding output schemas, writing test harnesses, or building eval pipelines.

Cross-references:

- Scaffolding new projects → `claude-api:mcp-builder`
- Design principles without code changes → `agent-artifex:design`
- Gap diagnosis → `agent-artifex:assess`

---

## On Invocation

Start by understanding what the user needs:

1. **Determine the work type:**
   - **Design application**: improving tool descriptions, fixing error messages, adding schemas, restructuring tool sets → read the corresponding design reference (`agent-artifex/skills/design/references/`)
   - **Test implementation**: writing quality checks, evals, test harnesses → read the corresponding testing reference (`references/`)
   - **Both**: improve the code AND add tests (the ideal flow)
2. **What are you building?** An MCP server? An agent? A chatbot? All three?
3. **Which area?** If not specified, determine it from context:
   - Building/modifying tool definitions → Tool Description Design / Quality
   - Validating tool call results → Server Correctness
   - Testing whether the foundation model (FM) picks the right tool → Agent Behavior
   - Verifying the final answer to the user → Response Accuracy
   - Testing multi-turn conversations → Chatbot Integration
   - Fixing error messages → Error Message Design
   - Restructuring parameters → Parameter & Schema Design
   - Optimizing system prompts → System Prompt Design
   - Improving multi-turn handling → Multi-Turn Conversation Design
   - Reorganizing tool sets → Tool Set Architecture
   - Standardizing output formats → Response Format Design
4. **What's the tech stack?** TypeScript, Python, or Go? Which test framework? Which MCP SDK version?

Then **read the relevant reference files** before writing any code.

---

## Reference Files

### Design references (for applying improvements)

Read these when making code changes to improve quality. Each file contains principles, patterns, anti-patterns, and concrete guidance for one design area.

| Design Area | Reference File | What it contains |
|---|---|---|
| Tool Description Design | `agent-artifex/skills/design/references/tool-descriptions.md` | Six-component rubric, structural markers, augmentation patterns, domain-specific guidance |
| Parameter & Schema Design | `agent-artifex/skills/design/references/parameter-schema.md` | `.describe()` patterns, output schema design, argument count guidance, naming conventions |
| Error Message Design | `agent-artifex/skills/design/references/error-messages.md` | Problem/input/why/recovery structure, anti-patterns, `isError` usage, cross-references in recovery |
| System Prompt Design | `agent-artifex/skills/design/references/system-prompts.md` | Knowledge placement, ordering instructions, prompt sizing, collision avoidance |
| Multi-Turn Conversation Design | `agent-artifex/skills/design/references/multi-turn.md` | Result trimming, stable ID patterns, pagination, context pressure mitigation |
| Tool Set Architecture | `agent-artifex/skills/design/references/tool-set-architecture.md` | Dynamic discovery, cross-references, tool splitting, token footprint management |
| Response Format Design | `agent-artifex/skills/design/references/response-format.md` | Field naming, pagination patterns, fact vs. narrative, schema consistency |

### Testing references (for writing tests)

Read these when writing test code, assertions, harness setup, or eval pipelines. Each file contains working code examples, prompt templates, regex patterns, and pass/fail criteria.

| Testing Area | Reference File | What it contains |
|---|---|---|
| Tool Description Quality | `references/tool-descriptions.md` | Tier 1 code examples (all 5 checks with regex), Tier 2 FM scoring prompt template, multi-model jury setup, pass/fail criteria |
| Server Correctness | `references/server-correctness.md` | Schema validation (Ajv/jsonschema), error anti-pattern regex, golden-file patterns, FM recovery 4-step procedure |
| Agent Behavior | `references/agent-behavior.md` | Scenario design with examples, recorded replay (TestProvider pattern), live evaluation 4-step procedure, grading guidance |
| Response Accuracy | `references/response-accuracy.md` | Closed-loop harness 5 steps, claim decomposition with LLM prompt templates, DeepMind FACTS two-phase evaluation |
| Chatbot Integration | `references/chatbot-testing.md` | 5 coreference categories, 5 workflow patterns, 6 scenario categories, 4 conflict types, 6 degradation failure modes |

The canonical source documents, with full evidence and footnotes, are in `docs/ai-services/`.

---

## Design Application by Area

### Tool Description Design

**What to look for:**

- Descriptions shorter than 4 sentences
- Missing Usage Guidelines (89.3% prevalence): no cues such as "use this when", "do not use", or "instead use"
- Vague Limitations that hurt more than they help (removing bad limitations improved the success rate (SR) by 10 percentage points)
- No cross-references between related or confusable tools

**What to change:**

- Add a Purpose statement: what the tool does, what it returns, and its behavioral characteristics
- Add Usage Guidelines with domain-specific cues: when to use the tool, when NOT to use it, and what to use instead
- Make Limitations concrete and actionable, or remove them entirely if they are vague
- Add inter-tool cross-references: "Use `tool_x` instead when [condition]" or "Often used after `tool_y`"

**How to verify:**

- Run Tier 1 structural checks: sentence count >= 3, and regex markers for Usage Guidelines and Limitations present
- Check that every related tool pair has at least one cross-reference
- Optionally run the Tier 2 FM-scored rubric: all six component means >= 3 across a 3-model jury

### Parameter & Schema Design

**What to look for:**

- Missing `.describe()` annotations on Zod schemas, or missing `description` fields in JSON Schema
- No `outputSchema` declared (the server returns unstructured text only)
- More than 20 parameters on a single tool (out-of-distribution for FM training)
- Generic parameter names like `data`, `input`, `value`, `options` without clarifying descriptions

**What to change:**

- Add type, meaning, behavioral effect, and default value to every parameter description
- Add `outputSchema` declarations so servers return `structuredContent`
- Rename ambiguous parameters, or add descriptions that disambiguate them
- For tools with more than 20 parameters, consider splitting them into multiple tools or using nested objects

**How to verify:**

- Check that all `inputSchema.properties` entries have non-empty, non-trivial descriptions
- Verify that `outputSchema` is declared and `structuredContent` conforms to it
- Count arguments per tool; flag any exceeding 20

### Error Message Design

**What to look for:**

- Stack traces leaking to the FM (`Error at`, `at function_name (`)
- Raw exception class names (`TypeError:`, `ReferenceError:`)
- Error messages shorter than 20 characters
- No recovery actions: the FM receives an error but no guidance on what to do next

**What to change:**

- Structure errors with: what went wrong, which input caused it, why it failed, and what to try instead
- Add tool cross-references in recovery actions: "Try `tool_x` with [adjusted args]"
- Set `isError: true` on all error responses so the FM knows the call failed
- Remove internal implementation details; replace them with user/FM-facing language

**How to verify:**

- Regex checks: no matches for `/Error\s+at\s/`, `/at\s+\w+\s+\(/`, `/^(TypeError|ReferenceError|Error):/`
- Length checks: all error messages > 20 characters
- Recovery action presence: error text contains actionable guidance (not just "failed")

### System Prompt Design

**What to look for:**

- Domain knowledge duplicated between system prompt and tool descriptions
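
The Tier 1 checks listed under Tool Description Design and Parameter & Schema Design can be sketched as a small linter. This is a minimal sketch, not the code from `references/tool-descriptions.md`: it assumes a tool definition shaped like an MCP `Tool` (`name`, `description`, `inputSchema`, optional `outputSchema`), and the `lintTool` and `countSentences` helpers, the Limitations regex, and the "description only repeats the parameter name" triviality test are hypothetical choices made for illustration. It uses the `>= 3` sentence threshold from the How to verify list.

```typescript
// Only the fields of an MCP tool definition that these checks read.
interface JsonSchemaProperty {
  type?: string;
  description?: string;
}

interface ToolDefinition {
  name: string;
  description?: string;
  inputSchema: {
    type: "object";
    properties?: Record<string, JsonSchemaProperty>;
  };
  outputSchema?: Record<string, unknown>;
}

const MIN_SENTENCES = 3; // "sentence count >= 3" from How to verify
const MAX_PARAMETERS = 20; // flag tools with more than 20 parameters
// Usage Guidelines cues quoted in the checklist.
const USAGE_GUIDELINE_MARKERS = /\b(use this when|do not use|instead use)\b/i;
// Hypothetical Limitations markers; tune these for your own descriptions.
const LIMITATION_MARKERS = /\b(limitations?|cannot|can't|does not support|not supported)\b/i;

// Naive sentence count: splits on ., ! or ? followed by whitespace.
function countSentences(text: string): number {
  return text.split(/[.!?]+\s+/).filter((s) => s.trim().length > 0).length;
}

// Returns one human-readable issue per failed check (empty list = pass).
function lintTool(tool: ToolDefinition): string[] {
  const issues: string[] = [];
  const description = tool.description ?? "";

  if (countSentences(description) < MIN_SENTENCES) {
    issues.push(`${tool.name}: description has fewer than ${MIN_SENTENCES} sentences`);
  }
  if (!USAGE_GUIDELINE_MARKERS.test(description)) {
    issues.push(`${tool.name}: no Usage Guidelines marker`);
  }
  if (!LIMITATION_MARKERS.test(description)) {
    issues.push(`${tool.name}: no Limitations marker`);
  }

  const properties = tool.inputSchema.properties ?? {};
  const names = Object.keys(properties);
  if (names.length > MAX_PARAMETERS) {
    issues.push(`${tool.name}: ${names.length} parameters (more than ${MAX_PARAMETERS})`);
  }
  for (const name of names) {
    const text = (properties[name].description ?? "").trim();
    // Treat empty descriptions, and ones that only repeat the name, as trivial.
    if (text.length === 0 || text.toLowerCase() === name.toLowerCase()) {
      issues.push(`${tool.name}.${name}: missing or trivial description`);
    }
  }

  if (tool.outputSchema === undefined) {
    issues.push(`${tool.name}: no outputSchema declared`);
  }
  return issues;
}
```

Returning issue strings rather than a boolean lets a test assert on an empty list while a failing run names the exact tool and parameter to fix. The naive splitter overcounts abbreviations such as "e.g.", which is tolerable for a coarse structural gate; the FM-scored Tier 2 rubric is where description quality is judged.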
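
The Error Message Design checklist can be sketched the same way. This minimal sketch assumes the MCP tool-result shape (a `content` array of `{ type: "text", text }` items plus an `isError` flag); the `toolError` and `lintErrorResult` helpers and the recovery-cue regex are hypothetical, while the three anti-pattern regexes and the 20-character threshold come from the checklist above.

```typescript
// MCP tool result, reduced to the text-content case used here.
interface TextContent {
  type: "text";
  text: string;
}

interface ToolResult {
  content: TextContent[];
  isError?: boolean;
}

// Anti-pattern regexes from the How to verify list.
const ERROR_ANTI_PATTERNS: RegExp[] = [
  /Error\s+at\s/,
  /at\s+\w+\s+\(/,
  /^(TypeError|ReferenceError|Error):/,
];
const MIN_ERROR_LENGTH = 20;
// Hypothetical recovery cues; adjust to match your server's wording.
const RECOVERY_CUES = /\b(try|use|instead|check|retry)\b/i;

// Builds an error result in problem / input / why / recovery order.
function toolError(problem: string, input: string, why: string, recovery: string): ToolResult {
  return {
    content: [{ type: "text", text: `${problem} (input: ${input}). ${why} ${recovery}` }],
    isError: true,
  };
}

// Returns one issue per failed check (empty list = pass).
function lintErrorResult(result: ToolResult): string[] {
  const issues: string[] = [];
  if (result.isError !== true) {
    issues.push("isError is not set to true");
  }
  const text = result.content.map((c) => c.text).join("\n");
  for (const pattern of ERROR_ANTI_PATTERNS) {
    if (pattern.test(text)) {
      issues.push(`matches anti-pattern ${pattern}`);
    }
  }
  if (text.length <= MIN_ERROR_LENGTH) {
    issues.push(`message is ${MIN_ERROR_LENGTH} characters or shorter`);
  }
  if (!RECOVERY_CUES.test(text)) {
    issues.push("no recovery guidance");
  }
  return issues;
}
```

Routing every error path through one helper like `toolError` keeps the four-part structure consistent across a server, and running `lintErrorResult` over error paths in tests catches stack traces that leak from unhandled exceptions.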