
langchain-reference-architecture


A reference layered architecture for production LangChain 1.0 / LangGraph 1.0 services — LLM factory with version-safe defaults, chain/graph registry, retriever and tool DI, Pydantic-validated config, per-request tenant scoping, middleware ordering, checkpointer selection per environment. Use when starting a new service, refactoring a tangled chain, or onboarding a team to existing code. Trigger with "langchain architecture", "langchain llm factory", "langchain chain registry", "langchain dependency injection", "langchain project structure".

Tags: AI Agents, saas, langchain, langgraph, python, langchain-1.0, architecture, reference-architecture, patterns


# LangChain Reference Architecture (Python)

## Overview

Eight months into a LangChain service, a code review surfaces the mess.
Twelve chain definitions live inlined inside FastAPI route handlers. Three
retrievers are constructed at module-global scope, one bound to
`tenant_id="acme"` because that was the first tenant in the pilot —
that retriever now returns Acme's documents to every other tenant, a P33
leak that has been live in production for six weeks.
`max_retries=6` is hardcoded at four separate call sites. A
`RunnableWithMessageHistory` backed by the default
`InMemoryChatMessageHistory` loses every conversation on pod restart
(P22) — which is most days, because Cloud Run scales to zero.
Config is read from `os.environ` in three modules with three different
fallback strategies. There is no place to put a new provider without
touching seven files, and nobody remembers why the retriever is built
at import time.

The fix is not "rename a variable." The fix is an architecture that makes
every one of those mistakes hard to write. This skill is the target
layered architecture:

- `app/` — FastAPI routes. Thin. Parses HTTP, calls into `services`,
  serializes response. No chain logic, no vendor clients, no env vars.
- `services/` — chain and graph definitions. Take dependencies through
  constructor args, not module-level imports.
- `adapters/` — vendor clients, LLM factory, retriever factory, tool
  factory. This is where `langchain-anthropic` is imported. Nowhere else.
- `config/` — one Pydantic `Settings` class. `SecretStr` for keys,
  `Literal["dev","staging","prod"]` for env names, `.env` file loader.
- `domain/` — Pydantic models, typed LangGraph state, enums. No I/O.

Five layers, five imports deep at most. Dependency direction is
**strictly downward**. `app` imports `services`; `services` imports
`adapters`; `adapters` imports `config` and `domain`. Never the reverse.
Import-linter enforces this in CI. Pain-catalog anchors: P22 (in-memory
history loses messages — architectural fix is persistent history
injected via DI) and P33 (per-tenant vector stores leak if retriever
bound at import — architectural fix is per-request factory). Adjacent:
P10 (recursion limits), P24 (middleware order), P28 (callback
inheritance). Pin: `langchain-core 1.0.x`, `langgraph 1.0.x`,
`langchain-anthropic 1.0.x`, `langchain-openai 1.0.x`, `pydantic 2.x`,
`import-linter 2.x`.
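
To make the direction rule concrete, here is a minimal stdlib sketch of the kind of check a layers contract performs. It is not import-linter's implementation or its configuration format; the `LAYERS` list and the helper names `imported_layers` and `violations` are assumptions made for illustration, and the package root `my_service` is borrowed from the directory layout in Step 1.

```python
import ast

# Layers from top (HTTP boundary) to bottom (pure models).
LAYERS = ["app", "services", "adapters", "config", "domain"]
RANK = {name: i for i, name in enumerate(LAYERS)}


def imported_layers(source: str, root: str = "my_service") -> set[str]:
    """Return the layers of `root` that the given module source imports."""
    found: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            names = [node.module]
        else:
            continue  # not an absolute import statement
        for name in names:
            parts = name.split(".")
            if len(parts) >= 2 and parts[0] == root and parts[1] in RANK:
                found.add(parts[1])
    return found


def violations(layer: str, source: str) -> list[str]:
    """Layers imported by `source` that sit above `layer` (upward imports)."""
    return sorted(l for l in imported_layers(source) if RANK[l] < RANK[layer])


bad_adapter = (
    "from my_service.services.registry import register\n"
    "from my_service.config.settings import get_settings\n"
)
print(violations("adapters", bad_adapter))
```

The sketch only sees absolute `import x.y` and `from x.y import z` statements; relative imports and `from my_service import adapters` are ignored, which is one reason to rely on the real tool in CI.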

## Prerequisites

- Python 3.10+
- `langchain-core >= 1.0, < 2.0`, `langgraph >= 1.0, < 2.0`
- `pydantic >= 2.5` and `pydantic-settings >= 2.1`
- `import-linter >= 2.0` for layer enforcement in CI
- Provider package(s): `langchain-anthropic`, `langchain-openai`, etc.
- For staging/prod checkpointer: `langgraph-checkpoint-postgres` and a Postgres instance
- Cross-reference: sibling skill `langchain-model-inference` for the LLM factory's version-safe defaults

## Instructions

### Step 1 — Adopt the 5-layer directory layout

```
src/my_service/
├── app/                         # Layer 1: HTTP boundary (FastAPI)
│   ├── __init__.py
│   ├── main.py                  # FastAPI instance, DI wiring, lifespan
│   ├── routes/
│   │   ├── support.py           # POST /support → services.support.run(...)
│   │   └── health.py
│   └── deps.py                  # FastAPI Depends() providers
├── services/                    # Layer 2: chain and graph definitions
│   ├── __init__.py
│   ├── registry.py              # name → builder lookup
│   ├── support/
│   │   ├── chain.py             # SupportChain(llm, retriever, memory)
│   │   └── graph.py             # SupportGraph (LangGraph StateGraph)
│   └── triage/
│       └── chain.py
├── adapters/                    # Layer 3: vendor integrations
│   ├── __init__.py
│   ├── llm_factory.py           # chat_model(provider, **kwargs) → BaseChatModel
│   ├── retriever_factory.py     # retriever_for(tenant_id) → Retriever
│   ├── tool_factory.py          # tools_for(tenant_id) → list[BaseTool]
│   ├── checkpointer.py          # checkpointer_for(env) → BaseCheckpointSaver
│   └── history.py               # history_for(session_id, tenant_id) → BaseChatMessageHistory
├── config/                      # Layer 4: configuration
│   ├── __init__.py
│   └── settings.py              # Pydantic Settings
└── domain/                      # Layer 5: pure models, no I/O
    ├── __init__.py
    ├── state.py                 # TypedDict / Pydantic for LangGraph state
    └── models.py                # request/response schemas
tests/
├── unit/                        # fake adapters, assert service logic
├── integration/                 # real adapters against ephemeral infra
└── contract/                    # schema snapshots (e.g., tool specs)
pyproject.toml                   # includes [tool.importlinter] contracts
```

Typical depth is 5 layers. See [Directory Layout](references/directory-layout.md) for the full tree with file-naming conventions.

### Step 2 — Centralize LLM defaults in an `adapters/llm_factory.py`

Chains depend on the `BaseChatModel` protocol, not a concrete class. The factory is the one place version-safe defaults live:

```python
# src/my_service/adapters/llm_factory.py
from langchain_core.language_models import BaseChatModel
from langchain_anthropic import ChatAnthropic
from langchain_openai import ChatOpenAI

_SAFE_DEFAULTS = {"timeout": 30, "max_retries": 2}

def chat_model(provider: str, **overrides) -> BaseChatModel:
    defaults = {**_SAFE_DEFAULTS, **overrides}  # caller wins
    if provider == "anthropic":
        return ChatAnthropic(model="claude-sonnet-4-6", **defaults)
    if provider == "openai":
        return ChatOpenAI(model="gpt-4o", **defaults)
    raise ValueError(f"Unknown provider: {provider!r}")
```

The `max_retries=6` scatter in the mess-case becomes `max_retries=2` in exactly one file. Services that want a longer timeout pass `timeout=60` — but they never set `max_retries=6` by accident. Cross-reference `langchain-model-inference` Step 3 for the factory pattern's provenance; see [LLM Factory Pattern](references/llm-factory-pattern.md) for per-provider variants and caching.
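
The merge rule in `chat_model` can be checked without any provider package. The sketch below reuses the `_SAFE_DEFAULTS` values from the factory; `FakeChatModel` and `fake_chat_model` are hypothetical stand-ins that only record the keyword arguments they receive.

```python
from dataclasses import dataclass

_SAFE_DEFAULTS = {"timeout": 30, "max_retries": 2}


@dataclass
class FakeChatModel:
    """Hypothetical stand-in for a provider class; stores its kwargs."""
    model: str
    timeout: float
    max_retries: int


def fake_chat_model(**overrides) -> FakeChatModel:
    # Same merge as chat_model(): defaults first, caller overrides last.
    kwargs = {**_SAFE_DEFAULTS, **overrides}
    return FakeChatModel(model="fake-model", **kwargs)


default = fake_chat_model()
slow = fake_chat_model(timeout=60)
print(default, slow)
```

Because the caller wins, an explicit `max_retries=6` would still be honored. The factory removes accidental drift between call sites; it does not forbid a deliberate override.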

### Step 3 — Replace scattered imports with a chain/graph registry

```python
# src/my_service/services/registry.py
from typing import Protocol
from langchain_core.runnables import Runnable

class ChainBuilder(Protocol):
    def __call__(self, *, tenant_id: str) -> Runnable: ...

_BUILDERS: dict[str, ChainBuilder] = {}

def register(name: str):
    def decorator(fn: ChainBuilder) -> ChainBuilder:
        _BUILDERS[name] = fn
        return fn
    return decorator

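def get(name: str, *, tenant_id: str) -> Runnable:
    try:
        builder = _BUILDERS[name]
    except KeyError:
        raise KeyError(
            f"No chain registered under {name!r}. Known: {list(_BUILDERS)}"
        ) from None
    # Build on every call so nothing tenant-specific is fixed at import time.
    return builder(tenant_id=tenant_id)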
```

Each service module registers itself:

```python
# src/my_service/services/support/chain.py
from my_service.services.registry import register
from my_service.adapters.llm_factory import chat_model
from my_service.adapters.retriever_factory import retriever_for

@register("support_agent")
def build_support_agent(*, tenant_id: str):
    llm = chat_model("anthropic")
    retriever = retriever_for(tenant_id=tenant_id)
    # ... compose chain ...
    return chain
```

Routes become one line: `chain = registry.get("support_agent", tenant_id=req.tenant_id)`. There is one place to look, not twelve.
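
The register-then-look-up flow can be exercised end to end with no LangChain dependency. In this sketch the registry is reduced to plain callables, and `build_echo` is a hypothetical builder standing in for a real chain.

```python
from typing import Callable

_BUILDERS: dict[str, Callable[..., Callable[[str], str]]] = {}


def register(name: str):
    """Decorator that records a builder under `name`."""
    def decorator(fn):
        _BUILDERS[name] = fn
        return fn
    return decorator


def get(name: str, *, tenant_id: str) -> Callable[[str], str]:
    try:
        builder = _BUILDERS[name]
    except KeyError:
        raise KeyError(
            f"No chain registered under {name!r}. Known: {list(_BUILDERS)}"
        ) from None
    return builder(tenant_id=tenant_id)  # built per call, per tenant


@register("echo")
def build_echo(*, tenant_id: str) -> Callable[[str], str]:
    # Hypothetical "chain": tags its input with the tenant it was built for.
    return lambda text: f"[{tenant_id}] {text}"


chain = get("echo", tenant_id="acme")
print(chain("hi"))
```

One consequence of decorator-based registration: a builder is only registered once its module has been imported, so the service modules need to be imported during application startup before the first `get` call.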

### Step 4 — Build retrievers and tools per-request, keyed by tenant (P33)

This is the P33 architectural fix. The factory takes `tenant_id` as a runtime argument. Nothing is bound at import:
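
As a dependency-free sketch of that rule: the store is cached per `tenant_id`, and the retriever is built on each request from the store for that tenant. `FakeStore`, its `search` method, and the use of the tenant ID as a namespace are assumptions for illustration; they are not the Pinecone API and may not match how the real factory isolates tenants.

```python
from functools import lru_cache
from typing import Callable


class FakeStore:
    """Hypothetical stand-in for a vector store scoped to one tenant."""

    def __init__(self, namespace: str) -> None:
        self.namespace = namespace

    def search(self, query: str) -> str:
        return f"{self.namespace}:{query}"


@lru_cache(maxsize=256)
def store_for(tenant_id: str) -> FakeStore:
    # One store object per tenant, created lazily on first request.
    return FakeStore(namespace=tenant_id)


def retriever_for(tenant_id: str) -> Callable[[str], str]:
    # Built per request; the tenant is a runtime argument, never a global.
    store = store_for(tenant_id)
    return lambda query: store.search(query)


print(retriever_for("globex")("refund policy"))
```

Caching keyed on `tenant_id` keeps connection setup cheap while making it impossible for one tenant's request to reuse another tenant's store.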

```python
# src/my_service/adapters/retriever_factory.py
from functools import lru_cache
from langchain_core.retrievers import BaseRetriever
from langchain_pinecone import PineconeVectorStore
from my_service.config.settings import get_settings

@lru_cache(maxsize=256)  # cache the *store*, not the retriever
def _store_for(tenant_id: str) -> PineconeVectorStore:
    s = get_settings()
    return PineconeVectorStore(
        in
