system-architecture


Canonical joelclaw topology and wiring map. Use when reasoning about architecture, tracing event flow, debugging why something ran/didn't run, identifying which worker executes a function, checking what listens on a port, or following an event end-to-end.

Tags: General, architecture, topology, inngest, gateway, kubernetes, observability



# System Architecture (Canonical Topology)

This skill is the **single source of truth** for joelclaw system wiring.
Use it for:
- "why did this run / not run"
- "which worker handles this function"
- "what is listening on port X"
- "how does event Y flow"
- full-stack routing/debug across CLI → Inngest → workers → gateway → telemetry

## Ground-Truth Scope + Evidence Snapshot

This document is grounded in direct reads of:
- `apps/docs-api/src/index.ts`
- `packages/restate/Dockerfile`
- `packages/restate/src/index.ts`
- `packages/restate/src/workflows/dag-orchestrator.ts`
- `packages/agent-execution/src/microvm.ts`
- `packages/system-bus/src/serve.ts`
- `packages/system-bus/src/inngest/functions/index.host.ts`
- `packages/system-bus/src/inngest/functions/index.cluster.ts`
- `packages/system-bus/src/inngest/client.ts`
- `infra/worker-supervisor/src/main.rs`
- `~/Library/LaunchAgents/com.joel*.plist`
- `k8s/*` (all files)
- `infra/pds/values.yaml`
- `packages/gateway/src/daemon.ts`
- `packages/gateway/src/channels/*.ts`
- `~/.joelclaw/gateway/AGENTS.md`
- `~/.joelclaw/gateway/.pi/settings.json`
- `~/.local/caddy/Caddyfile`
- `~/.colima/default/colima.yaml` + `colima status --json`
- `packages/cli/src/cli.ts`, `packages/cli/src/config.ts`, `packages/cli/src/inngest.ts`
- `packages/system-bus/src/observability/*` (key files: `emit.ts`, `otel-event.ts`, `store.ts`)
- `packages/telemetry/src/emitter.ts`
- `packages/system-bus/src/lib/langfuse.ts`
- `packages/inference-router/src/tracing.ts`
- ADRs in `~/Vault/docs/decisions/` (required + topology-adjacent)
- last 50 lines of `~/Vault/system/system-log.jsonl`

### Related docs verified
- `docs/architecture.md` — Restate/Firecracker runtime + workload execution flow
- `docs/deploy.md` — Restate worker deploy + auth/identity/PVC procedures
- `docs/cli.md` — workload command tree + runtime bridge
- `docs/observability.md` — **not inspected in this update**

---

## 1) Physical Topology

```text
Mac Mini "Panda" (host macOS)
├─ launchd services (gateway, worker supervisor, caddy, talon, agent-mail, etc.)
├─ Colima VM (driver: VZ, arch: aarch64, runtime: docker, VM IP: 192.168.64.2)
│  └─ Talos node: joelclaw-controlplane-1 (k8s v1.35.0, internal IP 10.5.0.2)
│     ├─ namespace: joelclaw
│     │  ├─ inngest (StatefulSet + NodePort 8288/8289)
│     │  ├─ redis (StatefulSet + NodePort 6379)
│     │  ├─ typesense (StatefulSet + ClusterIP 8108)
│     │  ├─ restate (StatefulSet + NodePort 8080/9070/9071)
│     │  ├─ system-bus-worker (Deployment + ClusterIP 3111)
│     │  ├─ restate-worker (Deployment + ClusterIP 9080; full agent image + Firecracker)
│     │  ├─ dkron (StatefulSet + ClusterIP 8080)
│     │  ├─ docs-api (Deployment + NodePort 3838)
│     │  ├─ livekit-server (Deployment + NodePort 7880/7881)
│     │  ├─ bluesky-pds (Deployment + NodePort 3000)
│     │  └─ minio (StatefulSet + NodePort 30900/30901)
│     └─ namespace: aistor
│        ├─ aistor operator (Deployments: adminjob-operator, object-store-operator)
│        └─ aistor-s3 object store (StatefulSet + NodePort 31000/31001)
├─ Caddy reverse proxy (tailnet HTTPS fan-in)
├─ Gateway daemon (embedded pi session)
├─ Firecracker substrate (requires Colima nestedVirtualization=true for /dev/kvm; OFF by default — unstable under load)
└─ NAS "three-body" (NFS tiers per ADR-0088)
```
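
Port 8080 appears twice in the diagram (restate as NodePort, dkron as ClusterIP), so the exposure type matters when asking what is reachable from the host. The following is a minimal sketch that transcribes the diagram into data and checks that no two NodePort services claim the same port. The type and function names (`ClusterService`, `nodePortOwners`, `nodePortConflicts`) are hypothetical and are not taken from the joelclaw codebase; the data is a snapshot of the diagram, not a live cluster read.

```typescript
// Hypothetical model of the diagram above; names are illustrative only.
type Exposure = "NodePort" | "ClusterIP";

interface ClusterService {
  namespace: string;
  name: string;
  exposure: Exposure;
  ports: number[];
}

// Transcribed from the topology diagram (snapshot, not a live query).
const services: ClusterService[] = [
  { namespace: "joelclaw", name: "inngest", exposure: "NodePort", ports: [8288, 8289] },
  { namespace: "joelclaw", name: "redis", exposure: "NodePort", ports: [6379] },
  { namespace: "joelclaw", name: "typesense", exposure: "ClusterIP", ports: [8108] },
  { namespace: "joelclaw", name: "restate", exposure: "NodePort", ports: [8080, 9070, 9071] },
  { namespace: "joelclaw", name: "system-bus-worker", exposure: "ClusterIP", ports: [3111] },
  { namespace: "joelclaw", name: "restate-worker", exposure: "ClusterIP", ports: [9080] },
  { namespace: "joelclaw", name: "dkron", exposure: "ClusterIP", ports: [8080] },
  { namespace: "joelclaw", name: "docs-api", exposure: "NodePort", ports: [3838] },
  { namespace: "joelclaw", name: "livekit-server", exposure: "NodePort", ports: [7880, 7881] },
  { namespace: "joelclaw", name: "bluesky-pds", exposure: "NodePort", ports: [3000] },
  { namespace: "joelclaw", name: "minio", exposure: "NodePort", ports: [30900, 30901] },
  { namespace: "aistor", name: "aistor-s3", exposure: "NodePort", ports: [31000, 31001] },
];

// Maps each NodePort to the services claiming it; ClusterIP services are skipped
// because they are only reachable from inside the cluster.
function nodePortOwners(all: ClusterService[]): Record<number, string[]> {
  const owners: Record<number, string[]> = {};
  for (const svc of all) {
    if (svc.exposure !== "NodePort") continue;
    for (const port of svc.ports) {
      (owners[port] = owners[port] || []).push(svc.namespace + "/" + svc.name);
    }
  }
  return owners;
}

// Returns every NodePort claimed by more than one service.
function nodePortConflicts(all: ClusterService[]): number[] {
  const owners = nodePortOwners(all);
  return Object.keys(owners)
    .map(Number)
    .filter((port) => owners[port].length > 1);
}
```

With the data as transcribed, `nodePortOwners(services)[8080]` lists only `joelclaw/restate`, because dkron's 8080 is in-cluster only.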

### Known runtime endpoints
- Colima VM IP: `192.168.64.2` (`colima status --json`)
- Kubernetes API (stable operator tunnel): `https://127.0.0.1:16443`
- Talos API (stable operator tunnel): `127.0.0.1:15000`
- Tailnet hostnames seen in config:
  - `panda.tail7af24.ts.net` (Caddy routes)
  - `pds.panda.tail7af24.ts.net` (PDS values)

### Tailscale mesh state
- `tailscale status --json` failed in this environment: **UNKNOWN — needs manual verification**

---

## 2) Process Inventory (Long-Running)

### Host launchd inventory (snapshot)

> Snapshot source: `launchctl print gui/$(id -u)/<label>` and plist inspection.

| Launchd label | State | PID (snapshot) | Role | Ports / endpoints |
|---|---:|---:|---|---|
| `com.joel.system-bus-worker` | running | 75292 | Host worker supervisor (`worker-supervisor`) | supervises child bun on 3111 |
| `com.joel.restate-worker` | retired / rollback-only | — | Historical host Restate wrapper (`scripts/restate/start.sh`) | superseded by `deployment/restate-worker` on 9080 |
| `com.joel.gateway` | running | 81275 | Gateway daemon (`packages/gateway/src/daemon.ts`) | WS `:3018`, Redis bridge |
| `com.joel.caddy` | running | 9347 | Reverse proxy | 3443, 5443, 6443, 7443, 8290, 8443, 9443 |
| `com.joel.talon` | running | 96359 | Infra watchdog | health `127.0.0.1:9999` |
| `com.joel.agent-secrets` | running | 98048 | Secret lease daemon | no public port |
| `com.joel.imsg-rpc` | running | 61110 | iMessage JSON-RPC socket daemon | Unix socket `/tmp/imsg.sock` |
| `com.joel.kube-operator-access` | running | varies | stable kubectl/talos operator tunnel | local 16443 (kube), 15000 (talos) |
| `com.joel.voice-agent` | running | 71887 | voice agent runtime | local 8081 |
| `com.joel.local-sandbox-janitor` | scheduled | (launchd timer) | ADR-0221 local sandbox janitor (`scripts/local-sandbox-janitor.sh` → `joelclaw workload sandboxes janitor`) | logs in `/tmp/joelclaw/local-sandbox-janitor.{log,err}` |
| `com.joelclaw.agent-mail` | spawn scheduled | (none in launchctl snapshot) | agent-mail MCP HTTP service | observed listener `127.0.0.1:8765` (python process) |
| `com.joel.colima` | not running | — | startup helper for Colima | n/a |
| `com.joel.k8s-reboot-heal` | not running | — | periodic k8s heal script | n/a |
| `com.joel.system-bus-sync` | not running | — | sync guard watcher | n/a |
| `com.joel.gateway-tripwire` | not running | — | gateway tripwire script | n/a |
| `com.joel.content-sync-watcher` | not running | — | fs watch -> content/updated event | n/a |
| `com.joel.vault-log-sync` | not running | — | Vault log sync watcher | n/a |
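
For "what is listening on port X" questions, the table above can be treated as a lookup. The sketch below is illustrative only: `Listener`, `hostListeners`, and `whoListensOn` are hypothetical names, and the data is copied from the snapshot table rather than read from `launchctl`, so it goes stale whenever the table does.

```typescript
// Hypothetical lookup over the snapshot table; not part of the joelclaw CLI.
interface Listener {
  label: string;
  ports: number[];
  note?: string;
}

const hostListeners: Listener[] = [
  { label: "com.joel.system-bus-worker", ports: [3111], note: "child bun process, not the supervisor itself" },
  { label: "com.joel.gateway", ports: [3018], note: "WebSocket" },
  { label: "com.joel.caddy", ports: [3443, 5443, 6443, 7443, 8290, 8443, 9443] },
  { label: "com.joel.talon", ports: [9999], note: "health endpoint on 127.0.0.1" },
  { label: "com.joel.kube-operator-access", ports: [16443, 15000], note: "kube and talos tunnels" },
  { label: "com.joel.voice-agent", ports: [8081] },
  { label: "com.joelclaw.agent-mail", ports: [8765], note: "observed python listener on 127.0.0.1" },
];

// Returns the launchd entry that owns the port, or undefined if the table has no match.
function whoListensOn(port: number): Listener | undefined {
  for (const listener of hostListeners) {
    if (listener.ports.indexOf(port) !== -1) return listener;
  }
  return undefined;
}
```

A miss (`undefined`) only means the port is absent from this snapshot; Unix-socket services such as `com.joel.imsg-rpc` and the Kubernetes NodePorts are out of scope for this table.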

### Process supervision behavior: `worker-supervisor`

Source: `infra/worker-supervisor/src/main.rs`

- Default config:
  - worker dir: `~/Code/joelhooks/joelclaw/packages/system-bus`
  - command: `bun run src/serve.ts`
  - port: `3111`
  - health endpoint: `/api/inngest`
  - sync endpoint: `/api/inngest` (PUT)
  - health interval: 30s
  - restart after 3 consecutive health failures
  - restart backoff: 1s → 30s max
- Pre-start kills stale process on port 3111.
- Runs host import preflight before spawn:
  - `bun --eval "await import('./src/inngest/functions/index.host.ts');"`
  - on failure, skips spawn and retries with exponential backoff
- Loads env from `~/.config/system-bus.env` plus leased secrets.
- Forces `WORKER_ROLE=host` for the supervised host worker.
- Emits OTEL events via CLI on supervisor failures/restarts:
  - `worker.supervisor.preflight.failed`
  - `worker.supervisor.worker_exit`
  - `worker.supervisor.health_check.restart`
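
The restart defaults above can be sketched as follows. This is not a port of `main.rs` (the real supervisor is written in Rust); it is a TypeScript illustration with hypothetical names (`restartBackoffMs`, `HealthTracker`). Only the threshold of 3 failures and the 1s and 30s bounds come from the list above. Doubling the delay on each attempt is an assumption, since the source states the bounds but not the growth factor.

```typescript
// Values taken from the defaults listed above.
const FAILURE_THRESHOLD = 3;
const BACKOFF_INITIAL_MS = 1000;
const BACKOFF_MAX_MS = 30000;

// Delay before restart attempt N (0-based), capped at the 30s maximum.
// ASSUMPTION: the delay doubles per attempt; the source only gives the bounds.
function restartBackoffMs(attempt: number): number {
  const delay = BACKOFF_INITIAL_MS * Math.pow(2, Math.max(0, attempt));
  return Math.min(delay, BACKOFF_MAX_MS);
}

// Counts consecutive health-check failures and signals when a restart is due.
class HealthTracker {
  private consecutiveFailures = 0;

  // Records one health-check result; returns true when the worker should be restarted.
  record(healthy: boolean): boolean {
    if (healthy) {
      this.consecutiveFailures = 0; // any success resets the streak
      return false;
    }
    this.consecutiveFailures += 1;
    if (this.consecutiveFailures >= FAILURE_THRESHOLD) {
      this.consecutiveFailures = 0; // start a fresh streak after triggering a restart
      return true;
    }
    return false;
  }
}
```

Under these assumptions, with a 30s health interval, a hung worker is restarted roughly 90 seconds after its first failed check, and a single successful check in between resets the count.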

### Worker supervision split note
- Talon is running (`com.joel.talon`), but the host worker is still launched via `com.joel.system-bus-worker` → `worker-supervisor`.
- The ADRs and the system log indicate that Talon can defer worker supervision while the two coexist.

---

### Kubernetes process inventory

#### Node
- `joelclaw-controlplane-1` (Talos v1.12.4, k8s v1.35.0, internal IP `10.5.0.2`)

#### Core services

| Service | Workload kind | Service type | Service port(s) | NodePort(s) / exposure | Role |
|---|---|---|---|---|---|
| Inngest | StatefulSet `inngest` | NodePort (`inngest-svc`) | 8288, 8289 | 8288, 8289 | Event API + connect ws |
| Redis | StatefulSet `redis` | NodePort | 6379 | 6379 | Queue/state/pubsub |
| Typesense | StatefulSet `typesense` | NodePort | 8108 | 8108 via Colima/Lima host publish | Search + telemetry store |
| Restate | StatefulSet `restate` | NodePort | 8080, 9070, 9071 | 8080, 9070, 9071 | Durable workflow ingress + admin + metrics |
| system-bus-worker | Deployment | ClusterIP | 3111 | in-cluster only | Cluster-role worker (12 functions) |
| restate-worker | Deployment