langchain-local-dev-loop

Build a fast, deterministic local test loop for LangChain 1.0 / LangGraph 1.0 — FakeListChatModel fixtures, pytest config, VCR cassettes with key redaction, warning-filter policy. Use when adding tests to a new chain, fixing a flaky test, or making integration tests reproducible. Trigger with "langchain pytest", "FakeListChatModel", "VCR langchain", "langchain test fixtures", "langchain integration test".

# LangChain Local Dev Loop (Python)

## Overview

An engineer writes the most natural assertion possible:

```python
def test_summarize():
    out = chain.invoke({"text": "..."})
    assert out.content == "expected summary"
```

It passes locally against Claude at `temperature=0`. It fails in CI on the third
run with a one-token delta in the output. That is P05: Anthropic's `temperature=0`
does not guarantee greedy, reproducible decoding, so outputs still vary between
runs. Tests against live Claude are not deterministic,
period.

So the engineer swaps in `FakeListChatModel(responses=["expected summary"])` and
the assertion passes. Then the downstream callback that logs cost blows up in CI
with `KeyError: 'token_usage'` — because `FakeListChatModel` does not emit
`response_metadata["token_usage"]` (P43). Production code reads that key, so
either the fake has to synthesize it or the test has to skip the callback.

Meanwhile, the first integration test under VCR records a cassette that ships
the `x-api-key: sk-ant-api03-...` request header into the repo (P44). PR review catches it;
the reviewer revokes the key; the dev loop is hosed for an afternoon.

And none of this matters if pytest cannot even collect the suite because
`import langchain_community` emits a `DeprecationWarning` that `-W error` promotes
to failure (P45).

This skill installs the four layers that make the whole loop fast and safe:
`FakeListChatModel` / `FakeListLLM` with a metadata-emitting subclass (fixes P43);
VCR with `filter_headers` plus a pre-commit hook (fixes P44); pytest
`filterwarnings` policy in `pyproject.toml` (fixes P45); and an env-var-gated
integration marker so the default `pytest` run never touches live APIs.

**Speed targets:** unit tests with `FakeListChatModel` run in **< 100ms** per
test; VCR-replayed integration tests run in **500ms – 2s** per test; live
integration tests (the `RUN_INTEGRATION=1` gate) run only in nightly or
manual workflows.

**Pin:** `langchain-core 1.0.x`, `langgraph 1.0.x`, `pytest` current, `vcrpy`
current. Pain-catalog anchors: P05, P43, P44, P45.

## Prerequisites

- Python 3.10+
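- `pip install "langchain-core>=1.0,<2.0" "langgraph>=1.0,<2.0" pytest vcrpy pytest-recording` (quote the version specifiers so the shell does not treat `>` and `<` as redirections)
- For integration tests: the provider integration package (the examples below use `langchain-anthropic`) and at least one provider key (`ANTHROPIC_API_KEY`, etc.)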
- Project uses `pyproject.toml` (PEP 621) for pytest config

## Instructions

### Step 1 — Deterministic unit tests with `FakeListChatModel`

Use `FakeListChatModel` from `langchain_core.language_models.fake_chat_models`
for chat chains and `FakeListLLM` from `langchain_core.language_models.fake`
for legacy completion LLMs. Responses cycle through the list.

```python
from langchain_core.language_models.fake_chat_models import FakeListChatModel
from langchain_core.prompts import ChatPromptTemplate

def test_classifier_picks_positive():
    fake = FakeListChatModel(responses=["positive"])
    prompt = ChatPromptTemplate.from_messages([("user", "Classify: {text}")])
    chain = prompt | fake
    out = chain.invoke({"text": "I love it"})
    assert out.content == "positive"
```

This is deterministic, runs in single-digit milliseconds, and has zero provider
dependency. Use it for every chain assertion that does not specifically require
real model behavior.

### Step 2 — Subclass `FakeListChatModel` to emit `response_metadata` (P43 fix)

The stock fake emits no `response_metadata["token_usage"]`. If your chain has a
callback that records cost, the callback crashes under the fake. Subclass and
synthesize the metadata instead of mocking around the callback:

```python
from langchain_core.language_models.fake_chat_models import FakeListChatModel
from langchain_core.outputs import ChatGeneration, ChatResult
from langchain_core.messages import AIMessage

class FakeChatWithUsage(FakeListChatModel):
    """FakeListChatModel that emits response_metadata['token_usage'] so
    downstream callbacks reading token usage do not crash under test."""

    def _generate(self, messages, stop=None, run_manager=None, **kwargs):
        response = self.responses[self.i % len(self.responses)]
        self.i += 1
        message = AIMessage(
            content=response,
            response_metadata={
                "token_usage": {
                    "input_tokens": 10,
                    "output_tokens": len(response.split()),
                    "total_tokens": 10 + len(response.split()),
                },
                "model_name": "fake-chat",
            },
            usage_metadata={
                "input_tokens": 10,
                "output_tokens": len(response.split()),
                "total_tokens": 10 + len(response.split()),
            },
        )
        return ChatResult(generations=[ChatGeneration(message=message)])
```

Use `FakeChatWithUsage` whenever a chain's observability / cost path is in the
assertion surface. See [Fake Model Fixtures](references/fake-model-fixtures.md)
for agent, retriever, and embedder fakes.

### Step 3 — pytest fixtures that wire the fake into chains

Put fixtures in `tests/conftest.py` so they are shared across the suite:

```python
# tests/conftest.py
import pytest
from langchain_core.prompts import ChatPromptTemplate
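# Assumes Step 2's FakeChatWithUsage lives in tests/fakes.py and that tests/
# is importable as a package (e.g. it contains an __init__.py).
from tests.fakes import FakeChatWithUsage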

@pytest.fixture
def fake_chat():
    """Reusable fake chat model. Function-scoped, so each test gets a fresh
    instance and can reassign fake_chat.responses directly."""
    return FakeChatWithUsage(responses=["ok"])

@pytest.fixture
def summarize_chain(fake_chat):
    prompt = ChatPromptTemplate.from_messages([
        ("system", "Summarize the user's text in one line."),
        ("user", "{text}"),
    ])
    return prompt | fake_chat
```

Per-test response override:

```python
def test_summary_shape(summarize_chain, fake_chat):
    fake_chat.responses = ["short summary"]
    out = summarize_chain.invoke({"text": "long input"})
    assert out.content == "short summary"
```

### Step 4 — VCR cassettes for integration tests with key redaction (P44 fix)

Unit tests should never touch the network. Integration tests do, exactly once —
to record a cassette — and every subsequent run replays from the cassette file.
`vcrpy` records request headers by default, which means `Authorization: Bearer sk-...`
(OpenAI) or `x-api-key: sk-ant-...` (Anthropic) lands in the cassette unless you filter it.
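The rule that unit tests never touch the network can be enforced rather than trusted. What follows is a minimal sketch for `tests/conftest.py`. It assumes a convention where only tests marked `integration` may open sockets. `NetworkBlockedError`, `network_allowed`, and the fixture name are hypothetical. pytest-recording also documents a `--block-network` option that may cover this without custom code.

```python
import socket

import pytest


class NetworkBlockedError(RuntimeError):
    """Raised when a test without the integration marker opens a connection."""


def _refuse_connect(self, address):
    # Stand-in for socket.socket.connect while a unit test runs.
    raise NetworkBlockedError(f"unit test tried to connect to {address!r}")


def network_allowed(marker_names: set[str]) -> bool:
    # Only tests marked `integration` (live or VCR-replayed) keep real sockets.
    return "integration" in marker_names


@pytest.fixture(autouse=True)
def block_network_for_unit_tests(request, monkeypatch):
    marker_names = {marker.name for marker in request.node.iter_markers()}
    if not network_allowed(marker_names):
        # monkeypatch restores the original method after each test.
        monkeypatch.setattr(socket.socket, "connect", _refuse_connect)
    yield
```

Patching `socket.socket.connect` also blocks loopback connections. Tests that talk to a local server therefore need the marker too.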

Configure VCR in `tests/conftest.py`:

```python
# tests/conftest.py (continued)
import pytest

@pytest.fixture(scope="module")
def vcr_config():
    return {
        "filter_headers": [
            "authorization",
            "x-api-key",
            "anthropic-version",
            "openai-organization",
            "cookie",
        ],
        "filter_query_parameters": ["api_key"],
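        # Record mode is left to the CLI: pytest-recording's --record-mode
        # defaults to "none" (replay only), which blocks accidental
        # re-recording in CI, while `pytest --record-mode=once` records locally.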
    }
```
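`filter_headers` applies to recorded request headers only. Response headers such as `set-cookie` or organization identifiers are written to the cassette unchanged. vcrpy's `before_record_response` hook covers that side. The sketch below assumes the header names listed; they are examples, so inspect a freshly recorded cassette to see what your providers actually return. `scrub_response` and `SENSITIVE_RESPONSE_HEADERS` are hypothetical names.

```python
# Example header names only; adjust to what your cassettes actually contain.
SENSITIVE_RESPONSE_HEADERS = {
    "set-cookie",
    "openai-organization",
    "anthropic-organization-id",
}


def scrub_response(response: dict) -> dict:
    """before_record_response hook: drop sensitive response headers
    (matched case-insensitively) before vcrpy writes the cassette."""
    headers = response.get("headers", {})
    for name in list(headers):
        if name.lower() in SENSITIVE_RESPONSE_HEADERS:
            del headers[name]
    return response


# Wire it into the vcr_config dict above:
#     "before_record_response": scrub_response,
```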

Use `pytest-recording`:

```python
import pytest

@pytest.mark.vcr  # cassette at tests/cassettes/<test_module>/<test_name>.yaml
@pytest.mark.integration
def test_live_claude_short_answer():
    from langchain_anthropic import ChatAnthropic
    chat = ChatAnthropic(model="claude-sonnet-4-6", temperature=0, timeout=30)
    out = chat.invoke("Say 'ok' and nothing else.")
    assert "ok" in out.content.lower()
```

To record (once, locally, with a real key): `pytest --record-mode=once tests/`.
Every other run replays — cassettes are committed, real API is never hit again.

**Pre-commit hook to block key leaks:**

```bash
#!/usr/bin/env bash
# Save as .git/hooks/pre-commit (and chmod +x), or point a local hook in
# .pre-commit-config.yaml at this script. The shebang must stay on line 1.
set -e
if git diff --cached --name-only | grep -q '^tests/cassettes/'; then
    if git diff --cached -U0 -- 'tests/cassettes/' | \
       grep -E '(sk-ant-[A-Za-z0-9_-]+|sk-[A-Za-z0-9]{20,}|Bearer[[:space:]]+[A-Za-z0-9_-]{20,})'; then
        echo "ERROR: API key pattern found in staged cassette." >&2
        exit 1
    fi
fi
```
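The hook only sees staged diffs, so a cassette committed with `--no-verify` slips past it. A second, whole-tree check can run as an ordinary test in CI. The sketch below reuses the hook's patterns. `find_key_leaks` and the test name are hypothetical, and the test assumes the default `tests/cassettes/` layout.

```python
import re
from pathlib import Path

# Same patterns as the pre-commit hook, in Python regex syntax.
KEY_PATTERN = re.compile(
    r"sk-ant-[A-Za-z0-9_-]+|sk-[A-Za-z0-9]{20,}|Bearer\s+[A-Za-z0-9_-]{20,}"
)


def find_key_leaks(cassette_dir: Path) -> list[str]:
    """Return "path:line" for every cassette line that looks like an API key."""
    if not cassette_dir.is_dir():
        return []
    leaks = []
    for path in sorted(cassette_dir.rglob("*.yaml")):
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if KEY_PATTERN.search(line):
                leaks.append(f"{path}:{lineno}")
    return leaks


def test_committed_cassettes_contain_no_api_keys():
    # Assumes this file sits in tests/, next to the cassettes/ directory.
    assert find_key_leaks(Path(__file__).parent / "cassettes") == []
```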

See [VCR Cassette Hygiene](references/vcr-cassette-hygiene.md) for the full
pre-commit config, record-new-episodes flow, shared-cassette patterns, and the
PR review checklist.

### Step 5 — Pytest warnings + markers in `pyproject.toml` (P45 fix)

`langchain_community` and some provider SDKs emit `DeprecationWarning` at import
time. If the suite runs `-W error`, collection fails before any test runs.
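The overview's env-var-gated integration marker can be implemented with pytest's standard `pytest_collection_modifyitems` hook in `tests/conftest.py`. The sketch below assumes the gate simply skips every test marked `integration` unless `RUN_INTEGRATION=1` is set. `integration_enabled` is a hypothetical helper. If the suite runs with `--strict-markers`, the `integration` marker still needs registering in the pytest config.

```python
import os

import pytest

RUN_INTEGRATION_VAR = "RUN_INTEGRATION"


def integration_enabled(environ=None) -> bool:
    """True only when the gate variable is exactly "1"."""
    env = os.environ if environ is None else environ
    return env.get(RUN_INTEGRATION_VAR) == "1"


def pytest_collection_modifyitems(config, items):
    # Default runs skip every test marked `integration`; set RUN_INTEGRATION=1
    # (nightly or manual workflows) to let them through.
    if integration_enabled():
        return
    skip_integration = pytest.mark.skip(
        reason=f"set {RUN_INTEGRATION_VAR}=1 to run integration tests"
    )
    for item in items:
        if "integration" in item.keywords:
            item.add_marker(skip_integration)
```

Under this sketch, VCR-replayed tests are also skipped by default. If the default run should replay cassettes, narrow the condition, for example to `integration` tests that carry no `vcr` marker.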
