notion-incident-runbook

Execute Notion incident response procedures with triage, mitigation, and postmortem. Use when responding to Notion API outages, investigating errors, or running post-incident reviews for Notion integration failures. Trigger with phrases like "notion incident", "notion outage", "notion down", "notion on-call", "notion emergency", "notion broken".

# Notion Incident Runbook

## Overview

Rapid incident response procedures for Notion API failures. This runbook covers a structured triage flow (under 5 minutes), automated health checks against both status.notion.so and your own integration, a decision tree for classifying failures (Notion-side vs. integration-side), per-error-type mitigation with real `Client` code, cached fallback patterns, communication templates, and postmortem structure.

## Prerequisites

- Access to application monitoring dashboards and log aggregator
- `NOTION_TOKEN` environment variable set for diagnostic API calls
- `curl` and `jq` installed for quick CLI triage
- Node.js with `@notionhq/client` (`npm install @notionhq/client`) for the TypeScript examples
- Python alternative: `notion-client` (`pip install notion-client`)
- Communication channels configured (Slack webhook, PagerDuty, etc.)

## Instructions

### Step 1: Quick Triage (Under 5 Minutes)

Run this diagnostic script to determine if the issue is Notion-side or integration-side:

```bash
#!/bin/bash
# notion-triage.sh — run at first alert
set -euo pipefail
echo "=== Notion Incident Triage ==="
echo "Time: $(date -u +%Y-%m-%dT%H:%M:%SZ)"

# 1. Check Notion's public status page
echo -e "\n--- Notion Platform Status ---"
STATUS=$(curl -sf https://status.notion.so/api/v2/status.json \
  | jq -r '.status.description' 2>/dev/null || echo "UNREACHABLE")
echo "Notion Status: $STATUS"

INCIDENTS=$(curl -sf https://status.notion.so/api/v2/incidents/unresolved.json \
  | jq '.incidents | length' 2>/dev/null || echo "UNKNOWN")
echo "Active Incidents: $INCIDENTS"

if [ "$INCIDENTS" != "0" ] && [ "$INCIDENTS" != "UNKNOWN" ]; then
  echo "INCIDENT DETAILS:"
  curl -sf https://status.notion.so/api/v2/incidents/unresolved.json \
    | jq -r '.incidents[] | "  - \(.name) (\(.status)): \(.incident_updates[0].body)"' \
    || true  # best-effort detail; do not abort triage if this fetch fails
fi

# 2. Test our integration authentication
echo -e "\n--- Integration Auth Check ---"
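# No -f here: with -f, a 401 or 429 makes curl exit non-zero, and a fallback
# echo would append "000" to the code already printed by -w (e.g. "401000").
# On a network failure curl itself prints 000.
AUTH_HTTP=$(curl -s -o /dev/null -w "%{http_code}" \
  https://api.notion.com/v1/users/me \
  -H "Authorization: Bearer ${NOTION_TOKEN}" \
  -H "Notion-Version: 2022-06-28" 2>/dev/null || true)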
echo "Auth HTTP Status: $AUTH_HTTP"

if [ "$AUTH_HTTP" = "200" ]; then
  BOT_NAME=$(curl -sf https://api.notion.com/v1/users/me \
    -H "Authorization: Bearer ${NOTION_TOKEN}" \
    -H "Notion-Version: 2022-06-28" | jq -r '.name')
  echo "Bot Name: $BOT_NAME"
fi

# 3. Test database query (if test DB configured)
echo -e "\n--- API Responsiveness ---"
if [ -n "${NOTION_TEST_DATABASE_ID:-}" ]; then
  # No -f, for the same reason as the auth check: keep the real HTTP code intact.
  QUERY_RESULT=$(curl -s -o /dev/null -w "%{http_code} %{time_total}s" \
    -X POST "https://api.notion.com/v1/databases/${NOTION_TEST_DATABASE_ID}/query" \
    -H "Authorization: Bearer ${NOTION_TOKEN}" \
    -H "Notion-Version: 2022-06-28" \
    -H "Content-Type: application/json" \
    -d '{"page_size": 1}' 2>/dev/null || true)
  echo "Database Query: $QUERY_RESULT"
else
  echo "NOTION_TEST_DATABASE_ID not set — skipping query test"
fi

# 4. Classification
echo -e "\n--- Triage Result ---"
if [ "$STATUS" != "All Systems Operational" ] && [ "$STATUS" != "UNREACHABLE" ]; then
  echo "CLASSIFICATION: Notion-side issue. Enable fallback mode."
elif [ "$AUTH_HTTP" = "401" ]; then
  echo "CLASSIFICATION: Token expired or revoked. Rotate immediately."
elif [ "$AUTH_HTTP" = "429" ]; then
  echo "CLASSIFICATION: Rate limited. Reduce concurrency."
elif [ "$AUTH_HTTP" = "000" ]; then
  echo "CLASSIFICATION: Network/DNS issue. Check firewall and DNS."
else
  echo "CLASSIFICATION: Integration-side issue. Check application logs."
fi
```

**TypeScript — programmatic triage:**

```typescript
import { Client, isNotionClientError, APIErrorCode } from '@notionhq/client';

async function triageNotionHealth(token: string): Promise<{
  classification: string;
  notionStatus: string;
  authStatus: string;
  latencyMs: number;
}> {
  // Check Notion status page
  let notionStatus = 'unknown';
  try {
    const res = await fetch('https://status.notion.so/api/v2/status.json');
    const data = await res.json();
    notionStatus = data.status.description;
  } catch { notionStatus = 'unreachable'; }

  // Test our authentication
  const client = new Client({ auth: token, timeoutMs: 10_000 });
  const start = Date.now();
  let authStatus = 'unknown';
  let classification = 'unknown';

  try {
    await client.users.me({});
    authStatus = 'authenticated';
    classification = 'integration-side';
  } catch (error) {
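    if (isNotionClientError(error)) {
      // Timeout errors carry no HTTP status, so read it defensively.
      const status = 'status' in error ? error.status : 'n/a';
      authStatus = `${error.code} (HTTP ${status})`;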
      switch (error.code) {
        case APIErrorCode.Unauthorized:
          classification = 'token-expired';
          break;
        case APIErrorCode.RateLimited:
          classification = 'rate-limited';
          break;
        case APIErrorCode.ServiceUnavailable:
          classification = 'notion-down';
          break;
        default:
          classification = 'api-error';
      }
    } else {
      authStatus = 'network-error';
      classification = 'network-issue';
    }
  }

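  // A reported platform incident overrides the local result. An unreachable or
  // unparsed status page is not evidence of a Notion-side outage.
  const statusKnown = notionStatus !== 'unknown' && notionStatus !== 'unreachable';
  if (statusKnown && notionStatus !== 'All Systems Operational') {
    classification = 'notion-side';
  }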

  return {
    classification,
    notionStatus,
    authStatus,
    latencyMs: Date.now() - start,
  };
}
```
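
The triage result is easier to act on once it is turned into a short message for the channels listed under Prerequisites. The following is a minimal sketch, not part of the runbook's own code: `TriageResult` mirrors the return shape of `triageNotionHealth`, while `FIRST_ACTION`, `formatTriageSummary`, and `toSlackPayload` are hypothetical names. The action text is taken from the classification messages in the bash script.

```typescript
// Mirrors the return value of triageNotionHealth above.
interface TriageResult {
  classification: string;
  notionStatus: string;
  authStatus: string;
  latencyMs: number;
}

// Hypothetical mapping from classification to the first on-call action.
const FIRST_ACTION: Record<string, string> = {
  'notion-side': 'Enable fallback mode and monitor status.notion.so.',
  'notion-down': 'Enable fallback mode and monitor status.notion.so.',
  'token-expired': 'Rotate the integration token.',
  'rate-limited': 'Reduce concurrency and add backoff.',
  'network-issue': 'Check firewall and DNS.',
  'integration-side': 'Check application logs.',
};

// Builds a plain-text summary; `at` is injectable so output is reproducible.
function formatTriageSummary(result: TriageResult, at: Date = new Date()): string {
  const action = FIRST_ACTION[result.classification] ?? 'Check application logs.';
  return [
    `Notion triage at ${at.toISOString()}`,
    `Classification: ${result.classification}`,
    `Platform status: ${result.notionStatus}`,
    `Auth check: ${result.authStatus} (${result.latencyMs} ms)`,
    `First action: ${action}`,
  ].join('\n');
}

// Body for a Slack incoming webhook, which accepts a JSON object with a `text` field.
function toSlackPayload(result: TriageResult, at?: Date): { text: string } {
  return { text: formatTriageSummary(result, at) };
}
```

Posting the payload is left out on purpose: the webhook URL and the HTTP client are deployment-specific.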

### Step 2: Decision Tree and Mitigation

```
Is status.notion.so showing an incident?
|
+-- YES --> Notion-side outage
|   +-- Enable cached/fallback mode
|   +-- Notify users of degraded service
|   +-- Monitor status page for resolution
|   +-- DO NOT restart or rotate tokens
|
+-- NO --> Our integration issue
    |
    +-- Auth returning 401?
    |   +-- YES --> Token expired or revoked
    |   |   +-- Regenerate at notion.so/my-integrations
    |   |   +-- Update secret manager (see below)
    |   |   +-- Restart application
    |   +-- NO --> Continue
    |
    +-- Getting 429 rate limits?
    |   +-- YES --> Exceeding 3 req/s average
    |   |   +-- Check for runaway loops or webhook storms
    |   |   +-- Reduce concurrency to 1
    |   |   +-- Add exponential backoff
    |   +-- NO --> Continue
    |
    +-- Getting 404 on specific resources?
    |   +-- YES --> Pages unshared or deleted
    |   |   +-- Re-share pages with integration via Connections menu
    |   |   +-- Check if pages were moved to trash
    |   +-- NO --> Continue
    |
    +-- Getting 400 validation errors?
    |   +-- YES --> Database schema changed in Notion UI
    |   |   +-- Re-fetch schema (databases.retrieve)
    |   |   +-- Compare with expected properties
    |   |   +-- Update property mappings in code
    |   +-- NO --> Investigate application logs
```
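
The 429 branch calls for exponential backoff but does not show one. Below is a minimal sketch under stated assumptions: `backoffDelayMs`, `withBackoff`, and `RetryOptions` are hypothetical names, and the thrown error is assumed to expose `status` plus an optional `retryAfterSeconds` that the caller has parsed from the `Retry-After` response header. It is not the SDK's own retry logic, and it leaves out jitter to keep the delays predictable.

```typescript
interface RetryOptions {
  maxAttempts: number; // total tries, including the first
  baseDelayMs: number; // delay before the first retry
  maxDelayMs: number;  // cap on any computed delay
  sleep?: (ms: number) => Promise<void>; // injectable for tests
}

// Assumed error shape; adapt to whatever your HTTP layer throws.
interface RateLimitLike {
  status?: number;
  retryAfterSeconds?: number;
}

// Delay before retry number `attempt` (0-based): base * 2^attempt, capped.
// A server-supplied Retry-After value takes priority and is not capped.
function backoffDelayMs(
  attempt: number,
  opts: Pick<RetryOptions, 'baseDelayMs' | 'maxDelayMs'>,
  retryAfterSeconds?: number,
): number {
  if (retryAfterSeconds !== undefined && retryAfterSeconds > 0) {
    return retryAfterSeconds * 1000;
  }
  return Math.min(opts.baseDelayMs * 2 ** attempt, opts.maxDelayMs);
}

// Retries `fn` only on HTTP 429; every other error is rethrown immediately.
async function withBackoff<T>(fn: () => Promise<T>, opts: RetryOptions): Promise<T> {
  const sleep =
    opts.sleep ?? ((ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms)));
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (error) {
      const e = error as RateLimitLike;
      const isLastAttempt = attempt >= opts.maxAttempts - 1;
      if (e?.status !== 429 || isLastAttempt) throw error;
      await sleep(backoffDelayMs(attempt, opts, e.retryAfterSeconds));
    }
  }
}
```

During an incident, pair this with the concurrency reduction from the tree: backoff on its own only spaces out retries and does not lower the steady request rate.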

**Token rotation:**

```bash
# AWS Secrets Manager
aws secretsmanager update-secret \
  --secret-id notion/production \
  --secret-string '{"token":"ntn_NEW_TOKEN_HERE"}'

# GCP Secret Manager
echo -n "ntn_NEW_TOKEN_HERE" | \
  gcloud secrets versions add notion-token-prod --data-file=-

# Restart to pick up new token
kubectl rollout restart deployment/my-app  # Kubernetes
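# or, on Cloud Run, deploy a new revision that reads the latest secret version:
# gcloud run services update my-service --update-secrets=NOTION_TOKEN=notion-token-prod:latest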
```
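
Before restarting, it can be worth confirming that the new token actually authenticates, so a bad paste does not turn a 401 incident into a second one. The following is a minimal sketch: `verifyNotionToken` and `FetchLike` are hypothetical names, the endpoint and `Notion-Version` header are the ones used by the triage script, and the `fetchImpl` parameter exists only so the check can be exercised without network access.

```typescript
// Minimal fetch shape so the sketch does not depend on DOM or Node typings.
type FetchLike = (
  url: string,
  init: { headers: Record<string, string> },
) => Promise<{ status: number }>;

// Returns true only when the token authenticates against /v1/users/me.
async function verifyNotionToken(
  token: string,
  fetchImpl: FetchLike = (globalThis as any).fetch,
): Promise<boolean> {
  try {
    const res = await fetchImpl('https://api.notion.com/v1/users/me', {
      headers: {
        Authorization: `Bearer ${token}`,
        'Notion-Version': '2022-06-28',
      },
    });
    return res.status === 200;
  } catch {
    // Network failure: the token could not be verified either way.
    return false;
  }
}
```

A `false` result covers both a rejected token and an unreachable API, so check the status page before concluding that the rotation failed.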

**Cached fallback for Notion outages:**

```typescript
import { Client, isNotionClientError } from '@notionhq/client';

const notion = new Client({ auth: process.env.NOTION_TOKEN! });
const cache = new Map<string, { data: any; timestamp: number }>();
const CACHE_TTL_MS = 5 * 60 * 1000; // 5 minutes

async function queryWithFallback(dbId: string, filter?: any) {
  const cacheKey = `query:${dbId}:${JSON.stringify(filter)}`;

  try {
    const result = await notion.databases.query({
      database_id: dbId,
      filter,
      page_size: 100,
    });

    // Update cache on success
    cache.set(cacheKey, { data: result, timestamp: Date.now() });
    return { data: result, source: 'live' as const };
  } catch (error) {
    // Fall back to cache on any API error
    const cached = cache.get(cacheKey);
    if (cached && Date.now() - cached.timestamp < CACHE_TTL_MS) {
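      console.warn(`Notion unavailable, serving cached data (age: ${
        Math.round((Date.now() - cached.timestamp) / 1000)
      }s)`);
      return { data: cached.data, source: 'cache' as const };
    }
    // No fresh cache entry: surface the original error to the caller
    throw error;
  }
}
```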
