langchain-otel-observability

Wire LangChain 1.0 / LangGraph 1.0 traces into an OpenTelemetry-native backend (Jaeger, Honeycomb, Grafana Tempo, Datadog) with LLM-specific SLOs, safe prompt-content policy, and subgraph-aware span propagation. Use when LangSmith is not the right fit (existing OTEL stack, compliance, multi-cloud) or alongside LangSmith for deep-system traces. Trigger with "langchain OTEL", "langchain opentelemetry", "langchain jaeger", "langchain honeycomb", "langchain SLO", "LLM span", "langchain tempo", "langchain datadog tracing".

# LangChain OTEL Observability (Python)

## Overview

An engineer wires OpenTelemetry expecting to see prompts and responses in
Honeycomb. The traces land — but only timing, model name, and token counts
appear. The prompt body is blank. This is **not** a bug: it's the OTEL GenAI
semantic-conventions privacy-safe default (P27), where
`OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT` is off. The instinct is to
flip it on and move on. On a multi-tenant workload that flip is a leak — the
next engineer to search traces for Tenant A sees Tenant B's PII in the results,
because redaction was supposed to happen upstream and never did.

A second trap lives inside LangGraph. A `BaseCallbackHandler` attached to the
parent runnable never fires on inner agent tool calls, because LangGraph
creates a child runtime per subgraph and callbacks do not inherit (P28). Spans
inside subgraphs appear orphaned in the waterfall — or they do not appear at
all — and SLO dashboards under-count latency on the exact calls that matter
most: the nested agent loops.

This skill wires LangChain 1.0 / LangGraph 1.0 into an OTEL-native backend
(Jaeger, Honeycomb, Grafana Tempo, Datadog) with a correct content-capture
policy, subgraph-aware span propagation, and five LLM-specific SLOs (p95 / p99
latency, error rate, cost-per-request, TTFT) with burn-rate alerts. Pin:
`langchain-core 1.0.x`, `langgraph 1.0.x`,
`opentelemetry-instrumentation-langchain >= 0.33`, OTEL GenAI semconv as of
2026-04. Pain-catalog anchors: P27, P28 (and cross-references P04, P34, P37).

## Prerequisites

- Python 3.10+
- `langchain-core >= 1.0, < 2.0`, `langgraph >= 1.0, < 2.0`
- A chosen OTEL-native backend: Jaeger (dev), Honeycomb / Tempo / Datadog (prod)
- For multi-tenant: upstream redaction middleware already in place (see
  `langchain-security-basics` and `langchain-middleware-patterns`)
- Access to set env vars at deploy time (`OTLP_ENDPOINT`, API keys)

## Instructions

### Step 1 — Install the SDK and instrumentor, configure the exporter

```bash
pip install \
  opentelemetry-api \
  opentelemetry-sdk \
  opentelemetry-exporter-otlp-proto-http \
  "opentelemetry-instrumentation-langchain>=0.33"
```

```python
import os
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.sdk.resources import Resource
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
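from opentelemetry.instrumentation.langchain import LangchainInstrumentor


def _parse_headers(raw: str) -> dict[str, str]:
    """Parse 'k1=v1,k2=v2' into a dict (format assumed, as in OTEL_EXPORTER_OTLP_HEADERS)."""
    pairs = (item.split("=", 1) for item in raw.split(",") if "=" in item)
    return {key.strip(): value.strip() for key, value in pairs}
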
resource = Resource.create({
    "service.name": "my-langchain-app",
    "service.version": "1.0.0",
    "deployment.environment": os.getenv("ENV", "dev"),
})
provider = TracerProvider(resource=resource)
provider.add_span_processor(BatchSpanProcessor(
    OTLPSpanExporter(
        endpoint=os.environ["OTLP_ENDPOINT"],       # full traces URL (typically ends in /v1/traces); per-backend, see matrix
        headers=_parse_headers(os.getenv("OTLP_HEADERS", "")),
    ),
    max_queue_size=2048,        # spans buffered before drop; raise for high volume
    max_export_batch_size=512,  # batched export keeps per-span overhead under 1ms
))
trace.set_tracer_provider(provider)

LangchainInstrumentor().instrument()   # emits gen_ai.* attrs on every run
```

`BatchSpanProcessor` keeps per-span overhead well under 1 ms. Use
`SimpleSpanProcessor` only in local dev — it blocks the call path per span.

Per-backend `OTLP_ENDPOINT` and header config lives in
[Backend Setup Matrix](references/backend-setup-matrix.md) — Jaeger,
Honeycomb, Grafana Tempo, Datadog.

### Step 2 — Verify the GenAI attribute schema

Trigger one call and inspect what landed in the backend. With the instrumentor
from Step 1 active, every chat-model span should carry these `gen_ai.*`
attributes:

| Attribute | Example |
|-----------|---------|
| `gen_ai.system` | `anthropic` |
| `gen_ai.request.model` | `claude-sonnet-4-6` |
| `gen_ai.request.temperature` | `0.0` |
| `gen_ai.usage.input_tokens` | `1234` |
| `gen_ai.usage.output_tokens` | `567` |
| `gen_ai.response.finish_reasons` | `["stop"]` |

Missing anything? Likely a stale instrumentor version or an outdated provider
package. The full emitted-vs-custom matrix plus LangGraph's span taxonomy
(`LangGraph.invoke` → `LangGraph.node.*` → `LangGraph.subgraph.*`) is in
[GenAI Semantic Conventions](references/genai-semantic-conventions.md).
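
One way to run this check is to copy a single span's attributes out of the backend (most offer a JSON view) and compare them against the table above. The sketch below assumes the attributes arrive as a plain dict; `REQUIRED_GENAI_ATTRS` and `missing_genai_attrs` are hypothetical names for illustration, not part of any OTEL package.

```python
# Keys taken from the attribute table above.
REQUIRED_GENAI_ATTRS = (
    "gen_ai.system",
    "gen_ai.request.model",
    "gen_ai.request.temperature",
    "gen_ai.usage.input_tokens",
    "gen_ai.usage.output_tokens",
    "gen_ai.response.finish_reasons",
)


def missing_genai_attrs(span_attrs: dict) -> list[str]:
    """Return the expected gen_ai.* keys absent from one span's attributes."""
    return [key for key in REQUIRED_GENAI_ATTRS if key not in span_attrs]


# Hypothetical span pasted from a backend's JSON view.
sample = {
    "gen_ai.system": "anthropic",
    "gen_ai.request.model": "claude-sonnet-4-6",
    "gen_ai.usage.input_tokens": 1234,
    "gen_ai.usage.output_tokens": 567,
}
print(missing_genai_attrs(sample))
# ['gen_ai.request.temperature', 'gen_ai.response.finish_reasons']
```

An empty list means the span matches the table; anything else points at the version mismatch described above.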

### Step 3 — Decide on prompt-content capture (critical — do not skip)

The engineer's instinct is to flip the capture flag to see prompts. Before
flipping it, classify the workload into one of these buckets:

| Workload | Flag | Notes |
|----------|------|-------|
| Dev / staging with synthetic inputs | `true` | Fine. Do not copy these traces to prod. |
| Single-tenant internal tool | `true` | Fine if RBAC on backend is tight. |
| Single-tenant product, signed compliance artifacts | `true` | BAA / DPIA in place; retention policy matches log retention. |
| Multi-tenant SaaS, **no upstream redaction** | **`false`** | Hard no. Fix redaction first. |
| Multi-tenant SaaS, **with upstream redaction** | `true` | Safe — the span sees the already-redacted text. |
| Healthcare / finance / legal without legal sign-off | **`false`** | Hard no. |

```bash
# only for workloads marked `true` in the table above
export OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT=true
export TRACELOOP_TRACE_CONTENT=true   # OpenLLMetry alias; set both to be safe
```

Leave both flags unset anywhere else, then confirm in a staging trace that
message bodies are actually absent; if they are not, set the flags to `false`
explicitly rather than trusting the default. To capture bodies in a
multi-tenant system, wire redaction middleware upstream of the model call
first; see [Prompt Content Policy](references/prompt-content-policy.md) and
cross-reference pack siblings `langchain-security-basics` (PII redaction
middleware pattern, P34) and `langchain-middleware-patterns` (middleware order:
redact → cache → model, P24). **Failure pattern P27**, prompts missing from
traces because capture was never opted in, is the #1 first-day OTEL complaint;
make the decision explicit instead of surprise-flipping the flag in prod.
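
To make that decision explicit in code, the two "hard no" rows of the table can be turned into a startup guard. This is a minimal sketch under assumptions: `Workload`, `capture_allowed`, and `assert_capture_policy` are hypothetical names, and the conditional rows ("fine if RBAC is tight", "BAA / DPIA in place") still need human review because they cannot be checked from inside the process.

```python
import os
from dataclasses import dataclass

CAPTURE_FLAG = "OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT"


@dataclass(frozen=True)
class Workload:
    """Hypothetical description of the deployment, filled in by the operator."""
    multi_tenant: bool
    upstream_redaction: bool = False
    regulated: bool = False       # healthcare / finance / legal
    legal_signoff: bool = False


def capture_allowed(w: Workload) -> bool:
    """Return False for the two hard-no rows of the table, True otherwise."""
    if w.regulated and not w.legal_signoff:
        return False
    if w.multi_tenant and not w.upstream_redaction:
        return False
    return True


def assert_capture_policy(w: Workload, env=None) -> None:
    """Raise at startup if the capture flag is on for a workload that forbids it."""
    env = os.environ if env is None else env
    flag_on = env.get(CAPTURE_FLAG, "").strip().lower() == "true"
    if flag_on and not capture_allowed(w):
        raise RuntimeError(f"{CAPTURE_FLAG}=true is not allowed for {w}")
```

Calling `assert_capture_policy(Workload(multi_tenant=True))` before instrumenting would stop a deploy that flipped the flag without redaction in place, instead of letting it leak quietly.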

### Step 4 — Propagate callbacks through subgraphs (P28)

LangGraph creates a child runtime per subgraph. Callbacks bound at parent
definition time are **not** inherited by those child runtimes:

```python
from langgraph.prebuilt import create_react_agent

# Assumes llm, tools, and my_handler are defined elsewhere.

# WRONG: subagent spans orphaned or missing (P28)
agent = create_react_agent(model=llm, tools=tools).with_config(
    callbacks=[my_handler]  # bound at definition time; children do not see it
)
agent.invoke({"messages": [...]})

# RIGHT: build the agent without bound callbacks, then pass them at invocation
agent = create_react_agent(model=llm, tools=tools)
agent.invoke(
    {"messages": [...]},
    config={"callbacks": [my_handler]}  # invocation-time; inherited by children
)
```

The same rule applies to custom attribute handlers (e.g. the
`CostAttributeHandler` in the semantic-conventions reference that stamps
`gen_ai.usage.cost_usd` on each model span). Attach via
`config["callbacks"]`, never via `.with_config()`. **Failure pattern P28
symptom:** SLO dashboards show low latency because the slow nested spans are
missing entirely, not because the nested calls are fast.
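
For orientation, the value such a handler stamps as `gen_ai.usage.cost_usd` can be derived from the token counts already on the span. The sketch below is an assumption about how that computation might look, not the reference's actual `CostAttributeHandler`; the price table is a placeholder and `model-a` is not a real model name.

```python
# Hypothetical USD prices per one million tokens; replace with your provider's rates.
PRICES_PER_MTOK = {
    "model-a": {"input": 3.00, "output": 15.00},
}


def cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Compute request cost from token counts and a per-million-token price table."""
    price = PRICES_PER_MTOK[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


# Token counts reuse the example values from the attribute table in Step 2.
print(round(cost_usd("model-a", 1234, 567), 6))  # 0.012207
```

Raising `KeyError` on an unknown model is deliberate in this sketch: a silent zero would hide exactly the model-swap regressions the cost SLI is meant to catch.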

### Step 5 — Define LLM SLOs and dashboards

Five SLIs matter from day one. All five derive from `gen_ai.*` span attributes
— no second pipeline required:

| SLI | Target example | Why |
|-----|----------------|-----|
| **p95 latency** (top-level chat) | `< 5 s` for chat UI | Provider variance dominates |
| **p99 latency** | `< 15 s` | Tail matters on chat; agents with tools live here |
| **Error rate** | `< 0.5%` | Includes 429s + `finish_reason IN ("length","content_filter")` |
| **Cost per request** (p95) | `< $0.05` | Catches `haiku`→`opus` regressions |
| **TTFT p95** (streaming) | `< 2 s` | Perceived latency, not total duration |

Concrete Honeycomb / PromQL / Datadog queries for each SLI, plus multi-window
multi-burn-rate alerts (14.4× / 1h fast burn, 6× / 6h slow burn), are in
[LLM SLO Dashboards](references/llm-slo-dashboards.md).
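
The burn-rate thresholds reduce to simple arithmetic: burn rate is the observed error ratio divided by the error budget, which here is the 0.5% error-rate target from the table. A minimal sketch, assuming per-window error and request counts have already been queried from the backend; the function names and the `page` / `ticket` labels are illustrative choices, not taken from the dashboards reference.

```python
ERROR_BUDGET = 0.005  # the < 0.5% error-rate target from the SLI table


def burn_rate(errors: int, total: int, budget: float = ERROR_BUDGET) -> float:
    """Observed error ratio divided by the allowed error ratio."""
    if total == 0:
        return 0.0
    return (errors / total) / budget


def classify(rate_1h: float, rate_6h: float) -> str:
    """Map window burn rates to an alert level using the 14.4x / 6x thresholds."""
    if rate_1h >= 14.4:
        return "page"    # fast burn over the 1 h window
    if rate_6h >= 6.0:
        return "ticket"  # slow burn over the 6 h window
    return "ok"


# 80 errors in 1,000 requests over the last hour: 8% / 0.5% = 16x burn
print(classify(burn_rate(80, 1000), burn_rate(120, 6000)))  # page
```

A fuller multi-window setup usually also checks a shorter confirmation window before firing, so that alerts reset quickly once the errors stop; that is omitted here for brevity.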

### Step 6 — Tune sampling

Defaults are wrong for two ends of the volume spectrum:

```python
from opentelemetry.sdk.trace.sampling import TraceI
```

Source: https://github.com/jeremylongshore/claude-code-plugins-plus-skills/tree/main/plugins/saas-packs/langchain-py-pack/skills/langchain-otel-observability
