temporal-cloud
Fix Temporal Cloud connection, auth, and config problems. Use when users hit login failures, can't connect to Cloud, get x509/TLS errors, have namespace or endpoint mismatches, paste broken SDK connection snippets, are confused about which endpoint to use, see "no pollers" or RESOURCE_EXHAUSTED, struggle with PrivateLink/PSC, or need help setting up a new namespace. Also use for HA namespace failover and DNS issues. Not for worker performance tuning or scaling.
What this skill does
# Temporal Cloud Skill
Help users diagnose and resolve Temporal Cloud connectivity, authentication, and configuration issues using tcld and temporal CLI.
## Core Philosophy
Cloud issues are frustrating because they sit at the intersection of configuration, networking, authentication, and Temporal-specific code. Most problems fall into predictable patterns. This skill provides systematic diagnosis to quickly identify root causes and prescribe fixes.
**References:**
- See `references/cloud-troubleshooting-reference.md` for full CLI command reference and error codes
- See `references/common-scenarios.md` for step-by-step setup walkthroughs
- [Environment configuration docs](https://docs.temporal.io/develop/environment-configuration) - SDK setup for connecting to Cloud
- [HA namespace connectivity](https://docs.temporal.io/cloud/high-availability/ha-connectivity) - multi-region endpoint and DNS setup
- [Dev Success troubleshooting guide](https://github.com/temporalio/dev-success/blob/main/troubleshooting-connection-issues-to-temporal-cloud.md) - companion connection troubleshooting guide
**Out of scope:** Worker performance tuning, scaling, metrics interpretation, SDK-specific config, deployment patterns. Those topics are covered by separate worker-focused skills.
## Issue Classification
| Category | Key Symptoms | First Check |
|----------|--------------|-------------|
| **tcld Login** | login failed, token refresh failed, wrong account | `tcld account get` |
| **Connection/Auth** | can't connect, access denied, handshake failures | Endpoint format + DNS + port connectivity |
| **Ambiguous Runtime Errors** | `context deadline exceeded`, `workflow is busy` | Identify the operation and layer first |
| **mTLS/Certs** | x509 errors, unknown authority, expired | `openssl x509 -noout -enddate -in <cert.pem>` |
| **Namespace** | namespace not found, SNI mismatch | Namespace name format |
| **HA / Failover** | Failover not working, wrong region, DNS stale | DNS CNAME resolution |
| **Worker** | Tasks not picked up, stale connections | `temporal task-queue describe` |
| **Private Connectivity** | PrivateLink/PSC errors | VPC endpoint status |
| **Rate Limiting** | RESOURCE_EXHAUSTED | APS (actions per second) limits |
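As a rough illustration of how the table can drive a first-pass triage, the sketch below matches an error string against keyword hints drawn from the "Key Symptoms" column. The `CATEGORY_HINTS` list, its ordering, and the `classify` helper are hypothetical names introduced here; the keyword choices are assumptions, not an official mapping.

```python
# Keyword hints per category, loosely based on the "Key Symptoms" column.
# Order matters: "workflow is busy" is checked before the generic
# RESOURCE_EXHAUSTED hint so that it lands in the ambiguous bucket.
# The hints themselves are illustrative assumptions.
CATEGORY_HINTS = [
    ("Ambiguous Runtime Errors", ["context deadline exceeded", "workflow is busy"]),
    ("Rate Limiting", ["resource_exhausted"]),
    ("mTLS/Certs", ["x509", "unknown authority", "certificate has expired"]),
    ("tcld Login", ["login failed", "token refresh failed"]),
    ("Namespace", ["namespace not found", "sni mismatch"]),
    ("Private Connectivity", ["privatelink", "private service connect"]),
    ("Worker", ["tasks not picked up", "stale connection"]),
    ("Connection/Auth", ["access denied", "handshake", "can't connect"]),
]


def classify(error_text: str) -> str:
    """Return the first category whose hint appears in the error text."""
    text = error_text.lower()
    for category, hints in CATEGORY_HINTS:
        if any(hint in text for hint in hints):
            return category
    return "Unclassified"
```

A result of `"Unclassified"` means the error text alone is not enough, and the Step 1 questions still need answers.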
## The Process
### Step 1: Identify the Category
Ask the user:
- **What's the exact error message?** (copy-paste if possible)
- **What are you trying to do?** (tcld command, starting workers, running workflows)
- **What changed recently?** (new certs, new namespace, new region)
### Step 2: Gather Context
**For SDK/client snippet reviews:**
- Which auth method are you using: API key or mTLS?
- Which SDK and version are you using?
- What exact `HostPort` / address are you using?
- What exact Namespace are you using?
- Is this SDK code, `temporal` CLI, or `tcld`?
**For tcld issues:**
- Can you run `tcld account get`?
- Multiple Temporal accounts?
**For connection issues:**
- What's your exact address / `HostPort`?
- Using mTLS or API keys?
- Which SDK and version are you using?
- Any firewall/proxy between you and Cloud?
**For ambiguous runtime errors:**
- Where exactly do you see the error: workflow start, signal/update, polling, querying, logs?
- Is this happening before work starts, while polling, or while workflow code is already running?
- Are pollers present on the relevant task queue?
- Did this start after a traffic spike, deploy, or config change?
**For certificate issues:**
- When were certs generated?
- What CA was used?
- Is CA uploaded to namespace?
**For worker issues:**
- Are workers running? How many?
- What does `temporal task-queue describe` show?
- Any errors in worker logs?
### Step 3: Apply Decision Tree
Use the appropriate decision tree based on category (see below).
### Step 4: Provide Fix
Give specific commands to resolve the issue, with verification steps.
Always include a confidence score for the proposed diagnosis or fix:
- `Confidence: 9-10/10` when the symptom, operation, and confirming signals line up cleanly
- `Confidence: 6-8/10` when the evidence is good but one plausible alternative remains
- `Confidence: 1-5/10` when the issue is still ambiguous and the "fix" is really the next discriminating check
If the problem is ambiguous, say so explicitly and keep the recommendation scoped to the next check rather than presenting a speculative root cause as settled.
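One way to keep the confidence line consistent across answers is a small formatter that maps a score to the three bands above. This is a minimal sketch; `confidence_line` is a hypothetical helper, and the wording of each note is paraphrased from the bullets rather than prescribed by them.

```python
def confidence_line(score: int) -> str:
    """Format a confidence line using the three bands described above."""
    if not 1 <= score <= 10:
        raise ValueError("score must be between 1 and 10")
    if score >= 9:
        note = "symptom, operation, and confirming signals line up"
    elif score >= 6:
        note = "good evidence, one plausible alternative remains"
    else:
        note = "still ambiguous; treat the fix as the next discriminating check"
    return f"Confidence: {score}/10 ({note})"
```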
## Decision Trees
### tcld Login Issues
```
Symptom: tcld login not working
│
├─ Can `tcld account get` run?
│  ├─ Yes → Login is valid; continue with account verification
│  └─ No → Run `tcld login`
│
├─ Token refresh failed?
│  └─ tcld logout && tcld login
│
├─ Wrong organization/account?
│  ├─ tcld account get
│  └─ Verify the expected namespace appears in `tcld namespace list`
│
└─ "unauthorized" or auth errors?
   └─ tcld logout && tcld login
```
### Connection Failures
**Docs:** [Environment configuration](https://docs.temporal.io/develop/environment-configuration) - SDK connection options
**Endpoint check before network debugging:**
| Use case | Recommended endpoint | Notes |
|----------|---------------------|-------|
| Workers & clients (all auth) | `<namespace>.<account>.tmprl.cloud:7233` | **Namespace Endpoint** - works for both mTLS and API key auth. Recommended for all namespaces. |
| Multi-region HA (advanced) | `<region>.<cloud_provider>.api.temporal.io:7233` | Regional Endpoint - only needed for advanced HA routing. See [namespace access docs](https://docs.temporal.io/cloud/namespaces#access-namespaces). |
| tcld / Cloud Ops API | `saas-api.tmprl.cloud` | Control plane |
**Exception:** Namespaces using Flexible Auth (pre-release) cannot use Namespace Endpoints yet.
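The endpoint formats in the table can be checked mechanically before any network debugging. The sketch below assumes the formats exactly as written in the table; the regular expressions, the `classify_endpoint` name, and the returned labels are illustrative assumptions, not an official validator.

```python
import re

# Patterns mirror the formats in the table above; the permissive label
# character class is an assumption made for illustration.
NAMESPACE_ENDPOINT = re.compile(r"^[^.\s:]+\.[^.\s:]+\.tmprl\.cloud:7233$")
REGIONAL_ENDPOINT = re.compile(r"^[^.\s:]+\.[^.\s:]+\.api\.temporal\.io:7233$")
CONTROL_PLANE_HOST = "saas-api.tmprl.cloud"


def classify_endpoint(address: str) -> str:
    """Label an address as namespace, regional, control-plane, or unknown."""
    address = address.strip().lower()
    host = address.split(":", 1)[0]
    if host == CONTROL_PLANE_HOST:
        return "control-plane"
    if NAMESPACE_ENDPOINT.match(address):
        return "namespace"
    if REGIONAL_ENDPOINT.match(address):
        return "regional"
    return "unknown"
```

An `"unknown"` result for something that looks like a Namespace Endpoint usually means the address is missing the `:7233` port or has an extra or missing label, which is worth fixing before checking DNS.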
```
Symptom: Can't connect to Temporal Cloud
│
├─ Check: Using Namespace Endpoint?
│  ├─ Using regional endpoint (`*.api.temporal.io`) without HA need?
│  │  └─ Switch to Namespace Endpoint (`<ns>.<acct>.tmprl.cloud:7233`)
│  ├─ Using old/stale endpoint format?
│  │  └─ Switch to Namespace Endpoint
│  └─ Endpoint looks correct → Continue
│
├─ Check: DNS resolution
│  └─ nslookup <host-from-address>
│     ├─ Fails → DNS issue (check network, VPN)
│     └─ Succeeds → Continue
│
├─ Check: Port connectivity
│  └─ nc -zv <host-from-address> 7233
│     ├─ Fails → Firewall blocking port 7233
│     └─ Succeeds → Continue
│
├─ Check: TLS handshake
│  └─ openssl s_client -connect <address>
│     ├─ Fails → Certificate issue (see mTLS tree)
│     └─ Succeeds → Continue
│
└─ Check: Temporal CLI test
   └─ temporal workflow list --limit 1 --address ...
      ├─ PERMISSION_DENIED → Check namespace name format
      ├─ UNAUTHENTICATED → Certificate not accepted
      └─ Works → Connection OK, issue elsewhere
```
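The DNS, port, and TLS layers of the tree above can also be walked from a script that stops at the first failing layer. This is a minimal sketch using only the Python standard library. The function names and the 5-second timeout are assumptions, and the TLS step validates only the server certificate: it does not present a client certificate or speak gRPC, so a pass here does not prove that mTLS or API key auth will succeed.

```python
import socket
import ssl
from typing import Callable, List, Optional, Tuple

# A check is a (name, function) pair; the function raises OSError on failure.
Check = Tuple[str, Callable[[str, int], None]]


def check_dns(host: str, port: int) -> None:
    # Raises socket.gaierror (an OSError subclass) if the name does not resolve.
    socket.getaddrinfo(host, port)


def check_tcp(host: str, port: int) -> None:
    # Raises OSError if the port is unreachable or the connection times out.
    with socket.create_connection((host, port), timeout=5):
        pass


def check_tls(host: str, port: int) -> None:
    # Server-side TLS only; no client certificate is presented (assumption).
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=5) as sock:
        with context.wrap_socket(sock, server_hostname=host):
            pass


DEFAULT_CHECKS: List[Check] = [("dns", check_dns), ("tcp", check_tcp), ("tls", check_tls)]


def first_failure(host: str, port: int = 7233,
                  checks: Optional[List[Check]] = None) -> Optional[str]:
    """Run checks in order; return 'layer: reason' for the first failure, else None."""
    for name, check in (checks or DEFAULT_CHECKS):
        try:
            check(host, port)
        except OSError as exc:
            return f"{name}: {exc}"
    return None
```

Calling `first_failure("<namespace>.<account>.tmprl.cloud")` with a real host returns `None` when all three layers pass; the `checks` parameter exists so the ordering logic can be exercised without network access.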
### Ambiguous Runtime Errors
Do not assume these are pure connectivity failures. Classify them by operation first.
| Error text | Common interpretations | First discriminator |
|------------|------------------------|---------------------|
| `context deadline exceeded` | wrong endpoint, network timeout, oversized payload, blocked execution path, client-side timeout | Where in the flow does it occur? |
| `workflow is busy` / `RESOURCE_EXHAUSTED: Workflow is busy` | operation-level contention, workload pressure, confusing user-facing error semantics | Which operation returned it? |
| `no pollers` | no connected workers, workers present but misconfigured, stale/misleading metrics | Does `temporal task-queue describe` show pollers? |
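The table reads naturally as a lookup from error fragment to the first question to ask. The sketch below encodes exactly the three rows above; `FIRST_DISCRIMINATOR` and `first_discriminator` are hypothetical names, and substring matching is an assumed simplification.

```python
from typing import Optional

# Error fragment -> first discriminating question, copied from the table above.
FIRST_DISCRIMINATOR = {
    "context deadline exceeded": "Where in the flow does it occur?",
    "workflow is busy": "Which operation returned it?",
    "no pollers": "Does `temporal task-queue describe` show pollers?",
}


def first_discriminator(error_text: str) -> Optional[str]:
    """Return the first question to ask for a known ambiguous error, else None."""
    text = error_text.lower()
    for fragment, question in FIRST_DISCRIMINATOR.items():
        if fragment in text:
            return question
    return None
```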
Use this decision sequence:
```
Symptom: ambiguous runtime error
│
├─ Check: Which operation returned the error?
│  ├─ start / signal / update / query request
│  ├─ poll loop / worker logs
│  └─ UI / metrics only
│
├─ Check: Is work reaching a task queue?
│  ├─ No pollers listed
│  │  └─ Treat as worker connectivity / config until proven otherwise
│  ├─ Pollers listed, backlog growing
│  │  └─ Worker capacity / tuning issue (out of scope for this skill)
│  └─ Pollers listed, no backlog issue
│     └─ Continue
│
├─ For `context deadline exceeded`
│  ├─ Happens before any work
```