
langchain-observability


Wire LangSmith tracing and custom metric callbacks into a LangChain 1.0 chain or LangGraph 1.0 agent correctly — env-var spelling, subgraph propagation, per-tenant dimensions, cost and latency counters. Use when setting up observability on a new service, debugging blank traces in LangSmith, or adding per-tenant cost breakdowns. Trigger with "langchain observability", "langsmith tracing", "langchain callbacks", "langchain metrics".

Tags: AI Agents, saas, langchain, langgraph, python, langchain-1.0, observability, langsmith, callbacks

What this skill does

# LangChain Observability (Python)

## Overview

An engineer copies `LANGCHAIN_TRACING_V2=true` and `LANGCHAIN_API_KEY=...` from
the 0.2 docs, restarts the service, and sees zero traces in LangSmith, with no
errors and no warnings. That is P26: in LangChain 1.0 the canonical env vars are
`LANGSMITH_TRACING` and `LANGSMITH_API_KEY`. The `LANGCHAIN_*` names are
soft-deprecated and fail silently on any chain that goes through 1.0 middleware
or `create_react_agent`. One-line fix:

```bash
export LANGSMITH_TRACING=true
export LANGSMITH_API_KEY=lsv2_...
export LANGSMITH_PROJECT=my-service-prod
```

Next failure mode: a custom `BaseCallbackHandler` attached via
`chain.with_config(callbacks=[meter])` fires on the parent but is silent on
LangGraph subgraphs and `create_react_agent` tool calls — token counts
under-report by 30-70% vs the provider dashboard. That is P28: LangGraph
creates a child runtime per subgraph, and bound callbacks do not propagate.
Pass callbacks at invocation time instead:

```python
await chain.ainvoke(inputs, config={"callbacks": [meter], "configurable": {"tenant_id": t}})
```

This skill walks through canonical LangSmith setup, a metric-callback template
with tenant dimensions, invocation-time propagation, `RunnableConfig` trace
tagging, and a decision tree for LangSmith-only vs OTEL-native (defer to
`langchain-otel-observability` / L33 for OTEL-heavy). Pin: `langchain-core 1.0.x`,
`langgraph 1.0.x`, `langsmith` current. LangSmith tracing adds <5ms per-span
overhead; metric callbacks add <1ms per fire. Pain-catalog anchors: P26, P28,
P04 (cache-token aggregation), P25 (retry double-counting).

## Prerequisites

- Python 3.10+
- `langchain-core >= 1.0, < 2.0`, `langgraph >= 1.0, < 2.0`
- `langsmith` (bundled with `langchain`; upgrade to current for 1.0 env-var support)
- A LangSmith API key (`lsv2_...`) — free tier at https://smith.langchain.com
- Optional metric sinks: `prometheus_client`, `statsd`, or `datadog` Python packages

## Instructions

### Step 1 — Enable LangSmith with the canonical 1.0 env vars

`LANGSMITH_TRACING=true` is the switch. `LANGSMITH_API_KEY` authenticates.
`LANGSMITH_PROJECT` groups traces by environment — use one project per
`service-env` pair (`myapp-prod`, `myapp-staging`), not one per service.

```bash
# .env (loaded via python-dotenv or secret manager)
LANGSMITH_TRACING=true
LANGSMITH_API_KEY=lsv2_pt_...
LANGSMITH_PROJECT=my-service-prod

# Legacy fallback names (still work, soft-deprecated — do not use in new code):
# LANGCHAIN_TRACING_V2=true
# LANGCHAIN_API_KEY=lsv2_pt_...
# LANGCHAIN_PROJECT=my-service-prod
```

Verify in a REPL that the client sees the key before relying on it in
production:

```python
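from langsmith import Client

c = Client()  # reads LANGSMITH_API_KEY and LANGSMITH_ENDPOINT
# list_projects() returns a lazy iterator; pull one item so the request is sent.
# A wrong key raises langsmith.utils.LangSmithAuthError at this point.
print(next(iter(c.list_projects(limit=1)), None))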
```

Do NOT set both `LANGCHAIN_TRACING_V2` and `LANGSMITH_TRACING` — mixed settings
have caused stale project routing in 1.0.x. See P26.

For selective sampling in high-traffic services, set
`LANGSMITH_TRACING_SAMPLING_RATE=0.1` (10% of runs). Full detail in
[LangSmith Setup](references/langsmith-setup.md).
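
A startup check along these lines can surface the P26 misconfiguration before the first request. This is a minimal sketch: `check_tracing_env` and `LEGACY_TO_CANONICAL` are hypothetical names, not part of `langsmith`, and the check only looks at the variable names listed in this step.

```python
import os
from collections.abc import Mapping
from typing import Optional

# Legacy name -> canonical 1.0 name, as listed in this step.
LEGACY_TO_CANONICAL = {
    "LANGCHAIN_TRACING_V2": "LANGSMITH_TRACING",
    "LANGCHAIN_API_KEY": "LANGSMITH_API_KEY",
    "LANGCHAIN_PROJECT": "LANGSMITH_PROJECT",
}


def check_tracing_env(env: Optional[Mapping[str, str]] = None) -> list[str]:
    """Return human-readable problems with the tracing env vars (empty list = OK)."""
    env = os.environ if env is None else env
    problems: list[str] = []
    for legacy, canonical in LEGACY_TO_CANONICAL.items():
        if legacy in env and canonical in env:
            problems.append(f"both {legacy} and {canonical} are set; remove {legacy}")
        elif legacy in env:
            problems.append(f"{legacy} is soft-deprecated; rename it to {canonical}")
    if env.get("LANGSMITH_TRACING", "").lower() != "true":
        problems.append("LANGSMITH_TRACING is not 'true'; tracing is off")
    if not env.get("LANGSMITH_API_KEY"):
        problems.append("LANGSMITH_API_KEY is missing")
    return problems
```

Calling `check_tracing_env()` at process start and logging (or failing on) a non-empty result turns the silent failure into a loud one.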

### Step 2 — Write a metric callback for per-request observability

Subclass `BaseCallbackHandler`. Record `token_in`, `token_out`, `latency_ms`,
`tool_calls`, and `error`, tagged with a `tenant_id` dimension for downstream
grouping.

```python
import time
from langchain_core.callbacks import BaseCallbackHandler
from langchain_core.outputs import LLMResult

class MetricCallback(BaseCallbackHandler):
    """Per-LLM-call metrics tagged with tenant_id. Overhead <1ms per event."""

    def __init__(self, tenant_id: str, sink) -> None:
        self.tenant_id = tenant_id
        self.sink = sink
        self._starts: dict[str, float] = {}

    def on_llm_start(self, serialized, prompts, *, run_id, **kwargs) -> None:
        self._starts[str(run_id)] = time.perf_counter()

    def on_llm_end(self, response: LLMResult, *, run_id, **kwargs) -> None:
        t0 = self._starts.pop(str(run_id), time.perf_counter())
        elapsed_ms = (time.perf_counter() - t0) * 1000   # wall-clock latency
        tags = {"tenant_id": self.tenant_id}
        for gen in response.generations:
            for g in gen:
                # Only ChatGeneration carries .message; a plain Generation does not.
                msg = getattr(g, "message", None)
                meta = getattr(msg, "usage_metadata", None) or {}
                self.sink.incr("llm.token_in",   meta.get("input_tokens", 0),  tags)
                self.sink.incr("llm.token_out",  meta.get("output_tokens", 0), tags)
                # P04: aggregate Anthropic cache reads across calls
                details = meta.get("input_token_details") or {}
                self.sink.incr("llm.cache_read", details.get("cache_read", 0), tags)
        self.sink.hist("llm.latency_ms", elapsed_ms, tags)

    def on_llm_error(self, error, *, run_id, **kwargs) -> None:
        self._starts.pop(str(run_id), None)
        self.sink.incr("llm.error", 1, {"tenant_id": self.tenant_id,
                                         "error_type": type(error).__name__})

    def on_tool_end(self, output, *, run_id, **kwargs) -> None:
        self.sink.incr("llm.tool_calls", 1, {"tenant_id": self.tenant_id})
```

A thin `sink` protocol (`incr`, `hist`) swaps between Prometheus, StatsD, or
Datadog. Alternative sinks (LangSmith-only, OTEL) do not need this callback
at all — see Step 5. Full sink adapters and P25 retry dedupe in
[Custom Metrics Callback](references/custom-metrics-callback.md).

### Step 3 — Pass callbacks via `config["callbacks"]` at invocation (P28)

This is the single most common observability bug in LangGraph 1.0 services.
Binding callbacks at definition time does not propagate into subgraphs or
`create_react_agent` tool nodes — those create child runtimes with their own
callback scope.

```python
# WRONG — fires on parent runnable only; silent on subgraphs (P28)
agent_bound = agent.with_config(callbacks=[MetricCallback(tenant_id, sink)])
result = await agent_bound.ainvoke(inputs)

# RIGHT — propagates to every runnable, subgraph, and tool call
meter = MetricCallback(tenant_id, sink)
result = await agent.ainvoke(
    inputs,
    config={
        "callbacks": [meter],
        "configurable": {"thread_id": session_id, "tenant_id": tenant_id},
        "tags": ["prod", f"tenant:{tenant_id}"],
        "metadata": {"request_id": req_id, "tier": "enterprise"},
    },
)
```

Construct the callback *inside* the request handler so it captures a fresh
`tenant_id` per request — and in that pattern, invocation-time config is the
only way callbacks reach subgraphs. See [Trace Metadata and Tagging](references/trace-metadata-and-tagging.md)
for the full `RunnableConfig` shape.
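
One way to keep the "fresh callback per request" rule hard to violate is to build the whole config in a small factory. This is a sketch under assumptions: `build_invoke_config` and its `make_meter` parameter are hypothetical, and the keys simply mirror the RIGHT example above.

```python
from typing import Any, Callable


def build_invoke_config(
    make_meter: Callable[[str], Any],
    *,
    tenant_id: str,
    session_id: str,
    request_id: str,
) -> dict[str, Any]:
    """Build a new config dict for one request; do not reuse it across requests."""
    meter = make_meter(tenant_id)  # new callback instance on every call
    return {
        "callbacks": [meter],
        "configurable": {"thread_id": session_id, "tenant_id": tenant_id},
        "tags": [f"tenant:{tenant_id}"],
        "metadata": {"request_id": request_id},
    }
```

Inside a request handler this would be used as `config = build_invoke_config(lambda t: MetricCallback(t, sink), tenant_id=..., session_id=..., request_id=...)`, followed by `await agent.ainvoke(inputs, config=config)`.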

### Step 4 — Tag and annotate traces via `RunnableConfig`

LangSmith indexes two per-request fields: `tags` (flat list, filterable) and
`metadata` (key-value, searchable). Fix conventions early — LangSmith has no
rename tool.

```python
import os

config = {
    "callbacks": [meter],
    "tags": [
        "env:prod",                # environment
        f"tenant:{tenant_id}",     # tenant
        f"tier:{tenant_tier}",     # plan tier
        f"feature:{feature_flag}", # A/B experiment arm
    ],
    "metadata": {
        "request_id": req_id,
        "user_id": user_id,
        "session_id": session_id,
        "app_version": os.environ["APP_VERSION"],
    },
    "run_name": "agent_main",      # LangSmith UI label; overrides chain class name
}
```

Hierarchical tag conventions (`env:prod`, `tenant:acme`, `tier:enterprise`)
make LangSmith filters work. Free-form tags (`"important"`, `"check-me"`) do
not. See [Trace Metadata and Tagging](references/trace-metadata-and-tagging.md).
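
The `key:value` convention can be enforced in code so that free-form tags never reach LangSmith. The helper below is a minimal sketch; `make_tags`, `is_hierarchical`, and the exact regular expression are assumptions for illustration, not a LangSmith requirement.

```python
import re

# key:value with a lowercase key and no whitespace or extra colon in the value
# (an assumed house rule, not something LangSmith itself enforces).
_TAG_RE = re.compile(r"^[a-z][a-z0-9_]*:[^\s:]+$")


def is_hierarchical(tag: str) -> bool:
    """True for 'env:prod'; False for free-form tags such as 'important'."""
    return _TAG_RE.match(tag) is not None


def make_tags(**dimensions: str) -> list[str]:
    """Turn keyword dimensions into 'key:value' tags sorted by key."""
    tags = [f"{key}:{value}" for key, value in sorted(dimensions.items())]
    bad = [t for t in tags if not is_hierarchical(t)]
    if bad:
        raise ValueError(f"tags must look like 'key:value': {bad}")
    return tags
```

For example, `make_tags(env="prod", tenant=tenant_id, tier=tenant_tier)` produces a list that can be passed directly as `config["tags"]`.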

### Step 5 — Pick a sink and the stack shape

The callback handler is the integration point. Options, in decreasing order of
fit:

- **LangSmith only** — zero additional overhead; tracing already covers latency
  and token accounting. Fine for solo dev, small teams, and LLM-native ops.
- **Prometheus (pull)** — bes
