webhook-dx-audit
Audit the developer experience of any platform that sends outbound webhooks or event destinations to its customers, and produce a structured YAML audit file with scored findings and prioritized recommendations. Use whenever the task is to review, assess, grade, or critique a company's webhook/event-delivery DX: their signup and onboarding, signing and verification, retry and delivery semantics, event catalog and payloads, setup surfaces (UI/API/CLI/IaC/SDK), consumer-facing observability, local dev, and local-to-production transition. Trigger this for a 'webhook DX review', an 'event destinations audit', or an 'outbound webhook assessment', even if the user names a specific company (e.g. 'review Acme's webhooks') rather than the word audit. The output is a YAML audit file conforming to `schema/audit.schema.yaml`; whoever consumes it downstream renders their own presentation.
# Webhook DX Audit
Audit how a platform's customers experience its outbound webhooks and event destinations, end to end, from discovery through to production, and produce a scored YAML audit file with specific, prioritized recommendations.
The subject is any company that sends events to its developers (Stripe, Shopify, Paddle, or a smaller platform). You evaluate what their integrating developers actually hit: docs, dashboard, signing, retries, observability, and tooling, using only what is public or already exposed in product.
**Scope: webhooks AND event destinations.** Treat "outbound webhooks" and "event destinations" as the same audit. The industry terminology is in flux: Stripe popularized "event destinations" (and now delivers directly to Amazon EventBridge and Azure Event Grid alongside webhooks), Shopify ships HTTP webhooks plus EventBridge and Pub/Sub destinations and is rolling out "Event Subscriptions" branding, and others still call the whole thing "webhooks". The benchmark for what a modern offering should include is the Event Destinations initiative at https://eventdestinations.org. Score against that broader concept regardless of the platform's chosen label. For a webhook-only platform, criteria that target other destination types are Not Applicable, with one exception: the destination-type breadth criterion in category 6 still scores 0, because the breadth gap is real.
**Three states + two scores.** Each criterion ends up either scored 0/1/2 or in one of three labeled states: Not Supported (= 0 with intent labeled), Not Applicable (a logical exclusion, dropped from the math), or Not Assessed (could not be reached in this pass, e.g. dashboard-gated in a Pass 1 run). Pass 1 produces two roll-ups from the same data: a Public-scope grade (what's reachable now) and a Provisional minimum (the floor if human-in-the-loop (HITL) verification never runs). See `references/rubric.md` for definitions and `references/scoring.md` for the math.
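A minimal Python sketch of how the two roll-ups could differ, to make the states concrete. The authoritative math lives in `references/scoring.md`; everything here is an assumption for illustration: the names `State`, `roll_up`, and `MAX_POINTS`, the percentage form of the result, and in particular the reading that the Public-scope figure leaves Not Assessed criteria out while the Provisional minimum counts them as 0.

```python
from enum import Enum
from typing import Optional

MAX_POINTS = 2  # each scored criterion is worth 0, 1, or 2


class State(Enum):
    SCORED = "scored"                  # carries a 0, 1, or 2
    NOT_SUPPORTED = "not_supported"    # counts as 0
    NOT_APPLICABLE = "not_applicable"  # dropped from the math
    NOT_ASSESSED = "not_assessed"      # unreachable in this pass


def roll_up(criteria: list[tuple[State, Optional[int]]]) -> dict[str, float]:
    """Return hypothetical public-scope and provisional-minimum percentages."""
    public_earned = public_possible = 0
    floor_earned = floor_possible = 0
    for state, score in criteria:
        if state is State.NOT_APPLICABLE:
            continue  # logical exclusion: absent from both roll-ups
        points = score if state is State.SCORED and score is not None else 0
        # Assumed floor: Not Assessed counts as 0 until a human verifies it.
        floor_earned += points
        floor_possible += MAX_POINTS
        if state is not State.NOT_ASSESSED:
            # Assumed public scope: only criteria reachable in this pass.
            public_earned += points
            public_possible += MAX_POINTS

    def pct(earned: int, possible: int) -> float:
        return round(100 * earned / possible, 1) if possible else 0.0

    return {
        "public_scope": pct(public_earned, public_possible),
        "provisional_minimum": pct(floor_earned, floor_possible),
    }


example = [
    (State.SCORED, 2),
    (State.SCORED, 1),
    (State.NOT_SUPPORTED, None),
    (State.NOT_APPLICABLE, None),
    (State.NOT_ASSESSED, None),
]
print(roll_up(example))  # {'public_scope': 50.0, 'provisional_minimum': 37.5}
```

Under these assumptions the gap between the two numbers is exactly the weight of the criteria still waiting on HITL verification.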
**Audience matters.** Declare the platform's intended audience at audit start: `developer-platform` (where integrators are software engineers), `no-code-saas` (where integrators are power users wiring up automations through a UI), or `mixed` (multiple audiences with the webhook surface serving a specific tier). Verify the designation by fetching the platform's homepage and citing specific signals (hero copy, nav structure, customer testimonials, pricing tiers, API prominence); see `references/methodology.md` step 0 for the checklist. The audience-driven N/A logic in `rubric.md` removes criteria that don't apply (e.g. IaC and local-dev workflow simulation are N/A for a pure no-code SaaS; under `mixed` you score by judgment per criterion). Default to `developer-platform` only as a Pass-1 fallback if the homepage cannot be reached; Pass 2 must revisit with HITL verification.
**Perspective: this is a human developer's experience.** Categories 1 through 11 score what a person integrating with the platform encounters, so read docs as a human reads them: the rendered HTML pages a developer visits, not `.md` or `llms.txt` exports. Whether those machine-readable doc formats exist is an AI-readiness signal scored only in category 12. Keep all AI and agent assessment inside category 12; do not let it bleed into the other eleven. (Fetching a formal API/event spec like OpenAPI for category 4 is fine; that serves human codegen and validation, and is not the same as reading a machine doc export in place of the human docs.)
Fetching the `.md` export of a page to extract a quote or speed up evidence collection is fine; the rule is that you *score* what the rendered HTML page presents to a human, not what the `.md` contains. If the two diverge, treat it as an evidence gap, not a free pass to use whichever is better.
## When to use this
Use this for any request to review, grade, or critique a platform's webhook or event-destination DX. The review scope covers onboarding through to first delivered webhook, local dev experience, local-to-production transition, event types, webhook signing, retry support, and examples. See `references/program-mapping.md` for how findings map to matching Hookdeck offerings when relevant.
## Roles: who does what
This is a collaboration. Most of the work is yours (the agent), but some evidence sits behind a login or a UI that only a human can reach. Split the categories accordingly and do not stall waiting on the human for things you can already get.
**You (the agent) do unattended, from public surfaces:** implementation guidance, event catalog & schema, security & authentication (as documented), delivery semantics (as documented), SDKs & verification (read the actual repo source, not just the README), API/CLI/IaC setup surfaces (docs, Terraform registry), and agent/AI readiness (`llms.txt`, the `hookdeck/webhook-skills` repo, any MCP). You also do all scoring math and write the YAML audit. This is the bulk of the audit.
**The human is required for:** account creation (signup almost always needs a person for email confirmation, captcha, or a card), and the in-product surfaces that cannot be judged from docs: dashboard configuration, firing a test event and seeing it land, consumer-facing delivery logs, and self-serve endpoint/subscription management.
**Critical HITL capture: an example delivery payload.** Whenever the human fires a test event or observes a real delivery, they capture and share the full delivery payload (all request headers and the body) with the auditor. The actual delivery often surfaces information the docs do not: which signing mode is active, what the dedup ID header is named in practice, which timestamp format is used, whether any custom headers are set by the operator on the destination, what user-agent identifies the sender. With a payload to score against, the auditor can recommend "document the `webhook-signature` header you're already sending" instead of the more abstract "add a signature scheme". Without it, recommendations stay conditional ("in default mode the header is X; in Standard Webhooks mode it's Y") and the integrator has to figure out their own situation.
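As a rough illustration of what a captured delivery can reveal, the Python sketch below summarizes a set of request headers. The function name `summarize_delivery`, the returned keys, and the mode labels are hypothetical. The one external fact relied on is that the Standard Webhooks specification uses the `webhook-id`, `webhook-timestamp`, and `webhook-signature` headers; any other header containing "signature" is only guessed to be a custom scheme.

```python
from typing import Optional

STANDARD_WEBHOOKS_HEADERS = {"webhook-id", "webhook-timestamp", "webhook-signature"}


def summarize_delivery(headers: dict[str, str]) -> dict[str, object]:
    """Summarize what a captured delivery's headers say about the sender."""
    # HTTP header names are case-insensitive, so normalize before matching.
    lowered = {name.lower(): value for name, value in headers.items()}

    if STANDARD_WEBHOOKS_HEADERS <= lowered.keys():
        signing_mode = "standard-webhooks"
    elif any("signature" in name for name in lowered):
        signing_mode = "custom"  # heuristic guess, to be confirmed by a human
    else:
        signing_mode = "unsigned-or-unknown"

    user_agent: Optional[str] = lowered.get("user-agent")
    return {
        "signing_mode": signing_mode,
        "signature_headers": sorted(n for n in lowered if "signature" in n),
        # Candidate dedup ID headers; the auditor still confirms their meaning.
        "id_headers": sorted(n for n in lowered if n.endswith("-id")),
        "timestamp_headers": sorted(n for n in lowered if "timestamp" in n),
        "user_agent": user_agent,
    }


captured = {
    "Webhook-Id": "msg_123",
    "Webhook-Timestamp": "1700000000",
    "Webhook-Signature": "v1,placeholder",
    "User-Agent": "Acme-Hooks/1.0",
}
print(summarize_delivery(captured)["signing_mode"])  # standard-webhooks
```

A summary like this is only a starting point for the auditor's reading of the payload; it does not verify the signature itself.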
**HITL captures fill structured fields, not narrative paragraphs.** The delivery payload lands in `audit.hitl_evidence.delivery_payload_capture` as a structured object (`signing_mode`, `headers` map, `body`, `custom_headers_feature_in_use`, example values). In-product observations land in `audit.findings[].criteria[].evidence` strings keyed by criterion id (the criterion the observation scores) and, when audience-driven or HITL-specific, also as records in `audit.hitl_evidence.scoring_decisions` or `audit.hitl_evidence.other_observations`. Do not write free-form "HITL Pass 2 lifted the grade from F to D..." narrative into the `summary` field; the dual-score data already lives in `grade.public_scope` and `grade.provisional_minimum`, and the criteria Pass 2 closed live in `passes.pass_2.closed_criteria`. The summary stays about the platform's webhook DX, not the audit's own process.
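To show the shape of a structured capture, here is a small Python sketch that attaches one to an in-memory audit document. The field names (`signing_mode`, `headers`, `body`, `custom_headers_feature_in_use`) and the `audit.hitl_evidence.delivery_payload_capture` path come from the paragraph above; the helper name `record_capture`, the value types, and the sample values are assumptions, and `schema/audit.schema.yaml` remains the source of truth for the real structure.

```python
from typing import Any


def record_capture(
    document: dict[str, Any],
    *,
    signing_mode: str,
    headers: dict[str, str],
    body: str,
    custom_headers_feature_in_use: bool,
) -> dict[str, Any]:
    """Store a delivery capture as structured data, leaving `summary` untouched."""
    audit = document.setdefault("audit", {})
    evidence = audit.setdefault("hitl_evidence", {})
    evidence["delivery_payload_capture"] = {
        "signing_mode": signing_mode,
        "headers": dict(headers),  # copy so later edits do not alter the record
        "body": body,
        "custom_headers_feature_in_use": custom_headers_feature_in_use,
    }
    return document


doc = {"audit": {"summary": "Platform webhook DX summary goes here."}}
record_capture(
    doc,
    signing_mode="standard-webhooks",          # hypothetical label
    headers={"webhook-id": "msg_123"},         # hypothetical sample header
    body='{"type": "order.created"}',          # hypothetical sample body
    custom_headers_feature_in_use=False,
)
print(sorted(doc["audit"]["hitl_evidence"]["delivery_payload_capture"]))
```

Keeping the capture in its own object means the `summary` string never has to carry process notes about what Pass 2 found.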
**Two ways the human covers the gated parts**, whichever they prefer:
- **Relay:** the human clicks through and pastes back screenshots or a few sentences of what they saw, and you score from that.
- **Authenticated browser:** the human logs in and hands you the session (Claude in Chrome), so you navigate the dashboard yourself with them supervising. Signup itself usually still needs the human.
Default to relay if the human does not say. Never guess a gated capability to avoid asking; mark it Not Assessed (or Not Applicable if a logical rule rules it out) and queue it for the human instead.
## How an audit runs
Run it in two passes so the human is only in the loop briefly, with a precise ask. The output of every pass is the audit YAML file (scaffolded from `assets/report-template.yaml`); there is no Markdown intermediate.
0. **Scaffold the audit YAML.** Copy `assets/report-template.yaml` to the path the caller chose. Fill in `audit.platform`, `audit.prepared`, and the `audit.reviewer` block. The default flow i