accelerate
Use when the workflow is too slow, too expensive, or both and needs latency, cost, or token usage optimization.
What this skill does
## MANDATORY PREPARATION

Invoke /agent-workflow: it contains workflow principles, anti-patterns, and the **Context Gathering Protocol**. Follow the protocol before proceeding. If no workflow context exists yet, you MUST run /teach-maestro first. Consult the context-management reference in the agent-workflow skill for window optimization and budget strategies.

---

Make the workflow faster and cheaper without sacrificing quality. Measure before and after.

### Performance Audit

Measure current performance:

```text
Current metrics:
  Latency (p50): ___ms
  Latency (p95): ___ms
  Cost per request: $___
  Token usage (avg): ___ input / ___ output
  Error rate: ___%
```

### Acceleration Strategies

**Reduce Token Usage**
- Shorten system prompts (remove redundant instructions)
- Compress few-shot examples to minimum viable length
- Use structured output schemas instead of verbose text
- Summarize context instead of passing raw documents
- Reduce output length requirements

**Model Cascading**
- Route simple tasks to cheaper/faster models
- Escalate only complex tasks to capable models
- Use classification to determine complexity

**Caching**
- Cache responses for identical or near-identical inputs
- Cache tool results with appropriate TTL
- Cache embeddings for frequently-queried documents
- Use semantic caching for similar (not identical) queries

**Parallelization**
- Run independent tool calls in parallel
- Run independent agent steps in parallel
- Use streaming to start processing before full response

**Context Optimization**
- Retrieve less, retrieve better (improve retrieval precision)
- Use context compression techniques
- Implement sliding window for long conversations

### Acceleration Report

For each optimization:

1. **What changed**: Specific modification
2. **Before**: Latency/cost/tokens before
3. **After**: Latency/cost/tokens after
4. **Quality impact**: Any quality change (verify with golden tests)
5. **Trade-off**: What was sacrificed for the improvement

### Acceleration Checklist

- [ ] Baseline metrics recorded before any changes
- [ ] Each optimization measured with before/after comparison
- [ ] Quality impact verified (golden tests still pass)
- [ ] Trade-offs documented for each change
- [ ] Cost/latency improvements quantified

### Recommended Next Step

After optimization, run `/evaluate` to verify quality didn't degrade, or `/iterate` to set up continuous monitoring.

**NEVER**:
- Optimize without measuring first (you need a baseline)
- Sacrifice quality for speed without explicit user approval
- Cache outputs that depend on real-time data
- Skip the quality check after optimization
- Optimize prematurely (make it correct first, then make it fast)
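
The Performance Audit metrics and the before/after comparison asked for in the Acceleration Report can be sketched in Python. This is a minimal illustration, not part of the skill itself: `RequestSample`, `summarize`, `compare`, and every field and key name are hypothetical, and the nearest-rank percentile is one reasonable choice among several.

```python
from __future__ import annotations

import math
from dataclasses import dataclass
from statistics import mean


@dataclass
class RequestSample:
    """One measured request. All field names are hypothetical."""
    latency_ms: float
    cost_usd: float
    input_tokens: int
    output_tokens: int
    ok: bool


def percentile(values: list[float], pct: float) -> float:
    """Nearest-rank percentile: smallest value with at least pct% of samples at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(samples: list[RequestSample]) -> dict[str, float]:
    """Collapse raw samples into the metrics listed in the Performance Audit."""
    if not samples:
        raise ValueError("need at least one sample to record a baseline")
    latencies = [s.latency_ms for s in samples]
    failures = sum(1 for s in samples if not s.ok)
    return {
        "latency_p50_ms": percentile(latencies, 50),
        "latency_p95_ms": percentile(latencies, 95),
        "cost_per_request_usd": mean(s.cost_usd for s in samples),
        "avg_input_tokens": mean(s.input_tokens for s in samples),
        "avg_output_tokens": mean(s.output_tokens for s in samples),
        "error_rate_pct": 100 * failures / len(samples),
    }


def compare(before: dict[str, float], after: dict[str, float]) -> dict[str, float | None]:
    """Percent change per metric (negative means lower after the optimization).

    Returns None for a metric whose baseline is zero, since the relative
    change is undefined there.
    """
    changes: dict[str, float | None] = {}
    for name, old in before.items():
        new = after[name]
        changes[name] = None if old == 0 else (new - old) / old * 100
    return changes


if __name__ == "__main__":
    # Made-up numbers purely for illustration.
    baseline = [RequestSample(100.0 * i, 0.02, 1000, 200, True) for i in range(1, 11)]
    optimized = [RequestSample(50.0 * i, 0.01, 600, 150, True) for i in range(1, 11)]
    for metric, change in compare(summarize(baseline), summarize(optimized)).items():
        print(metric, "n/a" if change is None else f"{change:+.1f}%")
```

Recording the output of `summarize` before any change gives the baseline the checklist requires; `compare` then fills the Before and After rows of the report. Quality impact still has to be checked separately, for example with golden tests.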
Related in enhancement
amplify
Use when the workflow works but needs to handle more complex cases or produce higher-quality output through better tools, context, prompts, or models.
enrich
Use when the agent needs access to information beyond its training data, such as knowledge sources, RAG pipelines, or grounding data.
guard
Use when deploying to production, handling sensitive data, or the workflow needs safety constraints, input validation, and security boundaries.
iterate
Use when the workflow needs to self-correct, improve over time, or establish feedback loops and evaluation cycles.
temper
Use when the workflow feels over-engineered, has premature optimizations, unnecessary abstraction layers, or complexity beyond actual requirements.
turbocharge
Use when the user wants to push past conventional workflow limits with advanced performance techniques like parallel orchestration, streaming pipelines, or adaptive routing.