
webhook-dx-audit


Audit the developer experience of any platform that offers outbound webhooks or event destinations to its customers, and produce a structured YAML audit file with scored findings and prioritized recommendations. Use whenever the task is to review, assess, grade, or critique a company's webhook/event-delivery DX: their signup and onboarding, signing and verification, retry and delivery semantics, event catalog and payloads, setup surfaces (UI/API/CLI/IaC/SDK), consumer-facing observability, local dev, and local-to-production transition. Trigger this for a 'webhook DX review', 'event destinations audit', or an 'outbound webhook assessment', even if the user names a specific company (e.g. 'review Acme's webhooks') rather than using the word 'audit'. The output is a YAML audit file conforming to `schema/audit.schema.yaml`; whoever consumes it downstream renders their own presentation.


What this skill does


# Webhook DX Audit

Audit how a platform's customers experience its outbound webhooks and event destinations, end to end, from discovery through to production, and produce a scored YAML audit file with specific, prioritized recommendations.

The subject is any company that sends events to its developers (Stripe, Shopify, Paddle, or a smaller platform). You evaluate what their integrating developers actually hit: docs, dashboard, signing, retries, observability, and tooling, using only what is public or already exposed in product.

**Scope: webhooks AND event destinations.** Treat "outbound webhooks" and "event destinations" as the same audit. The industry terminology is in flux: Stripe popularized "event destinations" (and now delivers directly to Amazon EventBridge and Azure Event Grid alongside webhooks), Shopify ships HTTP webhooks + EventBridge + Pub/Sub destinations and is rolling out "Event Subscriptions" branding, and others still call the whole thing "webhooks". The benchmark for what a modern offering should include is the Event Destinations initiative at https://eventdestinations.org. Score against that broader concept regardless of the platform's chosen label. For a webhook-only platform, criteria that target other destination types are Not Applicable, with one exception: the destination type breadth criterion in category 6 still scores 0, because the breadth gap is real.

**Three states + two scores.** Each criterion ends up at 0/1/2, Not Supported (= 0 with intent labeled), Not Applicable (logical exclusion, dropped from math), or Not Assessed (couldn't reach, e.g. dashboard-gated in a Pass 1 run). Pass 1 produces two roll-ups from the same data: a Public-scope grade (what's reachable now) and a Provisional minimum (the floor if human-in-the-loop (HITL) verification never runs). See `references/rubric.md` for definitions and `references/scoring.md` for the math.
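The two roll-ups can be pictured with a small sketch. This is an illustration only, under loud assumptions: the authoritative math lives in `references/scoring.md` and may weight categories or criteria differently. Here every criterion is worth a maximum of 2, Not Applicable is dropped, Not Assessed is left out of the public-scope roll-up, and Not Assessed counts as 0 toward the provisional minimum. The `State` enum and `rollups` function are hypothetical names, not part of the skill.

```python
from enum import Enum


class State(Enum):
    SCORED = "scored"                  # carries a numeric score of 0, 1, or 2
    NOT_SUPPORTED = "not_supported"    # counts as 0
    NOT_APPLICABLE = "not_applicable"  # dropped from the math entirely
    NOT_ASSESSED = "not_assessed"      # could not be reached in this pass


def rollups(criteria):
    """Return (public_scope, provisional_minimum) as fractions in [0, 1].

    `criteria` is a list of (State, score_or_None) pairs.
    Assumption: unweighted criteria, each with a maximum of 2 points.
    """
    earned = 0
    assessed_max = 0
    unassessed_max = 0
    for state, score in criteria:
        if state is State.NOT_APPLICABLE:
            continue  # logical exclusion: no effect on either roll-up
        if state is State.NOT_ASSESSED:
            unassessed_max += 2  # only affects the provisional minimum
            continue
        assessed_max += 2
        if state is State.SCORED:
            earned += score
    public_scope = earned / assessed_max if assessed_max else None
    total_max = assessed_max + unassessed_max
    provisional_minimum = earned / total_max if total_max else None
    return public_scope, provisional_minimum


example = [
    (State.SCORED, 2),
    (State.SCORED, 1),
    (State.NOT_SUPPORTED, None),
    (State.NOT_APPLICABLE, None),
    (State.NOT_ASSESSED, None),
]
public_scope, provisional_minimum = rollups(example)
# public_scope is 3/6 = 0.5; provisional_minimum is 3/8 = 0.375
```

Under these assumptions the provisional minimum can never exceed the public-scope figure, which matches its description as a floor.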

**Audience matters.** Declare the platform's intended audience at audit start: `developer-platform` (where integrators are software engineers), `no-code-saas` (where integrators are power users wiring up automations through a UI), or `mixed` (multiple audiences with the webhook surface serving a specific tier). Verify the designation by fetching the platform's homepage and citing specific signals (hero copy, nav structure, customer testimonials, pricing tiers, API prominence); see `references/methodology.md` step 0 for the checklist. The audience-driven N/A logic in `rubric.md` removes criteria that don't apply (e.g. IaC and local-dev workflow simulation are N/A for a pure no-code SaaS; under `mixed` you score by judgment per criterion). Default to `developer-platform` only as a Pass-1 fallback if the homepage cannot be reached; Pass 2 must revisit with HITL verification.

**Perspective: this is a human developer's experience.** Categories 1 through 11 score what a person integrating with the platform encounters, so read docs as a human reads them: the rendered HTML pages a developer visits, not `.md` or `llms.txt` exports. Whether those machine-readable doc formats exist is an AI-readiness signal scored only in category 12. Keep all AI and agent assessment inside category 12; do not let it bleed into the other eleven. (Fetching a formal API/event spec like OpenAPI for category 4 is fine; that serves human codegen and validation, and is not the same as reading a machine doc export in place of the human docs.)

Fetching the `.md` export of a page to extract a quote or speed up evidence collection is fine; the rule is that you *score* what the rendered HTML page presents to a human, not what the `.md` contains. If the two diverge, treat it as an evidence gap, not a free pass to use whichever is better.

## When to use this

Use this for any request to review, grade, or critique a platform's webhook or event-destination DX. The review scope covers onboarding through to first delivered webhook, local dev experience, local-to-production transition, event types, webhook signing, retry support, and examples. See `references/program-mapping.md` for how findings map to matching Hookdeck offerings when relevant.

## Roles: who does what

This is a collaboration. Most of the work is yours (the agent), but some evidence sits behind a login or a UI that only a human can reach. Split the categories accordingly and do not stall waiting on the human for things you can already get.

**You (the agent) do unattended, from public surfaces:** implementation guidance, event catalog & schema, security & authentication (as documented), delivery semantics (as documented), SDKs & verification (read the actual repo source, not just the README), API/CLI/IaC setup surfaces (docs, Terraform registry), and agent/AI readiness (`llms.txt`, the `hookdeck/webhook-skills` repo, any MCP). Plus all scoring math and writing the YAML audit. This is the bulk of the audit.

**The human is required for:** account creation (signup almost always needs a person for email confirmation, captcha, or a card), and the in-product surfaces that cannot be judged from docs: dashboard configuration, firing a test event and seeing it land, consumer-facing delivery logs, and self-serve endpoint/subscription management.

**Critical HITL capture: an example delivery payload.** Whenever the human fires a test event or observes a real delivery, they capture and share the full delivery payload (all request headers and the body) with the auditor. The actual delivery often surfaces information the docs do not: which signing mode is active, what the dedup ID header is named in practice, which timestamp format is used, whether any custom headers are set by the operator on the destination, what user-agent identifies the sender. With a payload to score against, the auditor can recommend "document the `webhook-signature` header you're already sending" instead of the more abstract "add a signature scheme". Without it, recommendations stay conditional ("in default mode the header is X; in Standard Webhooks mode it's Y") and the integrator has to figure out their own situation.

**HITL captures fill structured fields, not narrative paragraphs.** The delivery payload lands in `audit.hitl_evidence.delivery_payload_capture` as a structured object (`signing_mode`, `headers` map, `body`, `custom_headers_feature_in_use`, example values). In-product observations land in `audit.findings[].criteria[].evidence` strings keyed by criterion id (the criterion the observation scores) and, when audience-driven or HITL-specific, also as records in `audit.hitl_evidence.scoring_decisions` or `audit.hitl_evidence.other_observations`. Do not write free-form "HITL Pass 2 lifted the grade from F to D..." narrative into the `summary` field; the dual-score data already lives in `grade.public_scope` and `grade.provisional_minimum`, and the criteria Pass 2 closed live in `passes.pass_2.closed_criteria`. The summary stays about the platform's webhook DX, not the audit's own process.
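As a rough sketch of what "structured fields, not narrative" means for the delivery payload, the capture might be assembled like this before being written into the audit YAML. Only the four field names come from the description above; the helper name `build_delivery_payload_capture`, the `signing_mode` value, and every header and body value are made-up placeholders, and the real shape is whatever `schema/audit.schema.yaml` defines.

```python
import json


def build_delivery_payload_capture(signing_mode, headers, body,
                                   custom_headers_feature_in_use):
    """Assemble a capture record from what the human pasted back.

    Headers are stored exactly as captured, since the header names
    themselves are evidence (for example, the dedup ID header).
    """
    return {
        "signing_mode": signing_mode,
        "headers": dict(headers),
        "body": body,
        "custom_headers_feature_in_use": custom_headers_feature_in_use,
    }


# Placeholder values for illustration only, not a real delivery.
capture = build_delivery_payload_capture(
    signing_mode="standard-webhooks",  # hypothetical label
    headers={
        "webhook-id": "msg_example",
        "webhook-timestamp": "1700000000",
        "webhook-signature": "v1,<base64 signature>",
        "user-agent": "ExamplePlatform-Webhooks/1.0",
    },
    body={"type": "example.created", "data": {"id": "obj_123"}},
    custom_headers_feature_in_use=False,
)

# The record nests under audit.hitl_evidence.delivery_payload_capture.
audit_fragment = {"audit": {"hitl_evidence": {"delivery_payload_capture": capture}}}
print(json.dumps(audit_fragment, indent=2))
```

Keeping the headers verbatim is what lets a recommendation name the exact header the platform already sends.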

**Two ways the human covers the gated parts**, whichever they prefer:

- **Relay:** the human clicks through and pastes back screenshots or a few sentences of what they saw, and you score from that.
- **Authenticated browser:** the human logs in and hands you the session (Claude in Chrome), so you navigate the dashboard yourself with them supervising. Signup itself usually still needs the human.

Default to relay if the human does not say. Never guess a gated capability to avoid asking; mark it Not Assessed (or Not Applicable if a logical rule rules it out) and queue it for the human instead.
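The "never guess" rule above amounts to a simple triage, sketched here under assumptions: the dictionary keys (`id`, `public_evidence`, `excluded_by_rule`), the criterion ids, and the state labels are hypothetical and do not come from the rubric or schema.

```python
def triage(criteria):
    """Decide, per criterion, whether to score now or queue it for the human.

    Each criterion is a dict with hypothetical keys:
      id               -- criterion identifier
      public_evidence  -- True if docs, repos, or registries cover it
      excluded_by_rule -- True if a logical rule makes it Not Applicable
    """
    states = {}
    human_queue = []
    for criterion in criteria:
        if criterion["excluded_by_rule"]:
            states[criterion["id"]] = "not_applicable"
        elif criterion["public_evidence"]:
            states[criterion["id"]] = "score_from_public_evidence"
        else:
            # Gated and unreachable: never guess, queue a precise ask.
            states[criterion["id"]] = "not_assessed"
            human_queue.append(criterion["id"])
    return states, human_queue


states, human_queue = triage([
    {"id": "signing-docs", "public_evidence": True, "excluded_by_rule": False},
    {"id": "delivery-logs", "public_evidence": False, "excluded_by_rule": False},
    {"id": "iac-provider", "public_evidence": False, "excluded_by_rule": True},
])
# human_queue == ["delivery-logs"]
```

The queue is what turns into the short, specific request the human receives, via relay by default.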

## How an audit runs

Run it in two passes so the human is only in the loop briefly, with a precise ask. The output of every pass is the audit YAML file (scaffolded from `assets/report-template.yaml`); there is no Markdown intermediate.

0. **Scaffold the audit YAML.** Copy `assets/report-template.yaml` to the path the caller chose. Fill in `audit.platform`, `audit.prepared`, and the `audit.reviewer` block. The default flow i