autonomous-agents
Autonomous agents are AI systems that can independently decompose goals, plan actions, execute tools, and self-correct without constant human guidance. The challenge isn't making them capable - it's making them reliable. Every extra decision multiplies failure probability.
What this skill does
# Autonomous Agents
Autonomous agents are AI systems that can independently decompose goals,
plan actions, execute tools, and self-correct without constant human guidance.
The challenge isn't making them capable - it's making them reliable. Every
extra decision multiplies failure probability.
This skill covers agent loops (ReAct, Plan-Execute), goal decomposition,
reflection patterns, and production reliability. Key insight: compounding
error rates kill autonomous agents. A 95% success rate per step drops to
60% by step 10. Build for reliability first, autonomy second.
2025 lesson: The winners are constrained, domain-specific agents with clear
boundaries, not "autonomous everything." Treat AI outputs as proposals,
not truth.
## Principles
- Reliability over autonomy - every step compounds error probability
- Constrain scope - domain-specific beats general-purpose
- Treat outputs as proposals, not truth
- Build guardrails before expanding capabilities
- Human-in-the-loop for critical decisions is non-negotiable
- Log everything - every action must be auditable
- Fail safely with rollback, not silently with corruption
## Capabilities
- autonomous-agents
- agent-loops
- goal-decomposition
- self-correction
- reflection-patterns
- react-pattern
- plan-execute
- agent-reliability
- agent-guardrails
## Scope
- multi-agent-systems → multi-agent-orchestration
- tool-building → agent-tool-builder
- memory-systems → agent-memory-systems
- workflow-orchestration → workflow-automation
## Tooling
### Frameworks
- LangGraph - When: Production agents with state management Note: 1.0 released Oct 2025, checkpointing, human-in-loop
- AutoGPT - When: Research/experimentation, open-ended exploration Note: Needs external guardrails for production
- CrewAI - When: Role-based agent teams Note: Good for specialized agent collaboration
- Claude Agent SDK - When: Anthropic ecosystem agents Note: Computer use, tool execution
### Patterns
- ReAct - When: Reasoning + Acting in alternating steps Note: Foundation for most modern agents
- Plan-Execute - When: Separate planning from execution Note: Better for complex multi-step tasks
- Reflection - When: Self-evaluation and correction Note: Evaluator-optimizer loop
## Patterns
### ReAct Agent Loop
Alternating reasoning and action steps
**When to use**: Interactive problem-solving, tool use, exploration
# REACT PATTERN:
"""
The ReAct loop:
1. Thought: Reason about what to do next
2. Action: Choose and execute a tool
3. Observation: Receive result
4. Repeat until goal achieved
Key: Explicit reasoning traces make debugging possible
"""
## Basic ReAct Implementation
"""
from langchain.agents import create_react_agent
from langchain_openai import ChatOpenAI
# Define the ReAct prompt template
react_prompt = '''
Answer the question using the following format:
Question: the input question
Thought: reason about what to do
Action: tool_name
Action Input: input to the tool
Observation: result of the action
... (repeat Thought/Action/Observation as needed)
Thought: I now know the final answer
Final Answer: the answer
'''
# Create the agent
agent = create_react_agent(
llm=ChatOpenAI(model="gpt-4o"),
tools=tools,
prompt=react_prompt,
)
# Execute with step limit
result = agent.invoke(
{"input": query},
config={"max_iterations": 10} # Prevent runaway loops
)
"""
## LangGraph ReAct (Production)
"""
from langgraph.prebuilt import create_react_agent
from langgraph.checkpoint.postgres import PostgresSaver
# Production checkpointer
checkpointer = PostgresSaver.from_conn_string(
os.environ["POSTGRES_URL"]
)
agent = create_react_agent(
model=llm,
tools=tools,
checkpointer=checkpointer, # Durable state
)
# Invoke with thread for state persistence
config = {"configurable": {"thread_id": "user-123"}}
result = agent.invoke({"messages": [query]}, config)
"""
### Plan-Execute Pattern
Separate planning phase from execution
**When to use**: Complex multi-step tasks, when full plan visibility matters
# PLAN-EXECUTE PATTERN:
"""
Two-phase approach:
1. Planning: Decompose goal into subtasks
2. Execution: Execute subtasks, potentially re-plan
Advantages:
- Full visibility into plan before execution
- Can validate/modify plan with human
- Cleaner separation of concerns
Disadvantages:
- Less adaptive to mid-task discoveries
- Plan may become stale
"""
## LangGraph Plan-Execute
"""
from langgraph.prebuilt import create_plan_and_execute_agent
# Planner creates the task list
planner_prompt = '''
For the given objective, create a step-by-step plan.
Each step should be atomic and actionable.
Format: numbered list of steps.
'''
# Executor handles individual steps
executor_prompt = '''
You are executing step {step_number} of the plan.
Previous results: {previous_results}
Current step: {current_step}
Execute this step using available tools.
'''
agent = create_plan_and_execute_agent(
planner=planner_llm,
executor=executor_llm,
tools=tools,
replan_on_error=True, # Re-plan if step fails
)
# Human approval of plan
config = {
"configurable": {
"thread_id": "task-456",
},
"interrupt_before": ["execute"], # Pause before execution
}
# First call creates plan
plan = agent.invoke({"objective": goal}, config)
# Review plan, then continue
if human_approves(plan):
result = agent.invoke(None, config) # Continue from checkpoint
"""
## Decomposition Strategies
"""
# Decomposition-First: Plan everything, then execute
# Best for: Stable tasks, need full plan approval
# Interleaved: Plan one step, execute, repeat
# Best for: Dynamic tasks, learning as you go
def interleaved_execute(goal, max_steps=10):
state = {"goal": goal, "completed": [], "remaining": [goal]}
for step in range(max_steps):
# Plan next action based on current state
next_action = planner.plan_next(state)
if next_action == "DONE":
break
# Execute and update state
result = executor.execute(next_action)
state["completed"].append((next_action, result))
# Re-evaluate remaining work
state["remaining"] = planner.reassess(state)
return state
"""
### Reflection Pattern
Self-evaluation and iterative improvement
**When to use**: Quality matters, complex outputs, creative tasks
# REFLECTION PATTERN:
"""
Self-correction loop:
1. Generate initial output
2. Evaluate against criteria
3. Critique and identify issues
4. Refine based on critique
5. Repeat until satisfactory
Also called: Evaluator-Optimizer, Self-Critique
"""
## Basic Reflection
"""
def reflect_and_improve(task, max_iterations=3):
# Initial generation
output = generator.generate(task)
for i in range(max_iterations):
# Evaluate output
critique = evaluator.critique(
task=task,
output=output,
criteria=[
"Correctness",
"Completeness",
"Clarity",
]
)
if critique["passes_all"]:
return output
# Refine based on critique
output = generator.refine(
task=task,
previous_output=output,
critique=critique["feedback"],
)
return output # Best effort after max iterations
"""
## LangGraph Reflection
"""
from langgraph.graph import StateGraph
def build_reflection_graph():
graph = StateGraph(ReflectionState)
# Nodes
graph.add_node("generate", generate_node)
graph.add_node("reflect", reflect_node)
graph.add_node("output", output_node)
# Edges
graph.add_edge("generate", "reflect")
graph.add_conditional_edges(
"reflect",
should_continue,
{
"continue": "generate", # Loop back
"end": "output",
}
)
return graph.compile()
def should_continue(state):
if state["iteration"] >= 3:
return "end"
if state["score"] >= 0.9:
return "end"
return "cRelated in General
modeling-omnistudio-epc-catalog
IncludedSalesforce Industries CME EPC product-modeling skill for Product2-based catalog creation. Use when creating EPC products, configuring product attributes, building offer bundles with Product Child Items, or reviewing EPC DataPack JSON metadata for product catalog changes. TRIGGER when: user creates or updates Product2 EPC records, AttributeAssignment payloads, AttributeMetadata/AttributeDefaultValues, Offer bundles, or ProductChildItem relationships. DO NOT TRIGGER when: designing OmniScripts/FlexCards/Integration Procedures (use building-omnistudio-omniscript, building-omnistudio-flexcard, or building-omnistudio-integration-procedure), implementing Apex business logic (use generating-apex), or troubleshooting deployment pipelines (use deploying-metadata).
relationship-science-coach
IncludedUse this skill for direct, practical adult relationship coaching: couples conflict, repair, trust, marriage, dating, flirting, attachment patterns, emotional connection, sex, desire differences, eroticism, kink negotiation, affection, love languages, breakups, and long-term passion. Draw on Gottman, EFT and Hold Me Tight, attachment science, modern sex research, Perel, Nagoski, Kerner, Schnarch, Love and Stosny, and flexible love-language tools. Be concrete and low-hedge. Redirect only for imminent danger, abuse, coercive control, minors, non-consent, self-harm, stalking, or medical/legal/psychiatric decisions.
building-sf-integrations
IncludedSalesforce integration architecture and runtime plumbing with 120-point scoring. Use this skill to set up Named Credentials, External Credentials, External Services, REST/SOAP callout patterns, Platform Events, and Change Data Capture. TRIGGER when: user sets up Named Credentials, External Services, REST/SOAP callouts, Platform Events, CDC, or touches .namedCredential-meta.xml files. DO NOT TRIGGER when: Connected App/OAuth config (use configuring-connected-apps), Apex-only logic (use generating-apex), or data import/export (use handling-sf-data).
venue-templates
IncludedAccess comprehensive LaTeX templates, formatting requirements, and submission guidelines for major scientific publication venues (Nature, Science, PLOS, IEEE, ACM), academic conferences (NeurIPS, ICML, CVPR, CHI), research posters, and grant proposals (NSF, NIH, DOE, DARPA). This skill should be used when preparing manuscripts for journal submission, conference papers, research posters, or grant proposals and need venue-specific formatting requirements and templates.
let-fate-decide
IncludedDraws the 12 Houses of the Zodiac Tarot spread to inject entropy into planning when prompts are vague, ambiguous, or casually delegated. Interprets the spread to guide next steps. Use when the user says 'let fate decide', 'YOLO', 'whatever', 'idk', or other nonchalant phrases, makes Yu-Gi-Oh references, or when you are about to arbitrarily pick between multiple reasonable approaches. Prefer over ask-questions-if-underspecified when the user's tone is casual or playful rather than precision-seeking.
net-ops
IncludedCross-platform network troubleshooting (Windows, macOS, Linux) via local or remote shell. Use for: DNS broken, can't resolve hostnames, nslookup/dig works but apps fail, NRPT, WFP, scutil, /etc/resolver, systemd-resolved, /etc/resolv.conf, NetworkManager, VPN DNS leak residue (ProtonVPN/Mullvad/WireGuard/AnyConnect), AV/firewall blocking DNS or DoH, Tailscale DNS interaction, intermittent connectivity, remote diagnostics over SSH.