langchain-reference-architecture
A reference layered architecture for production LangChain 1.0 / LangGraph 1.0 services — LLM factory with version-safe defaults, chain/graph registry, retriever and tool DI, Pydantic-validated config, per-request tenant scoping, middleware ordering, checkpointer selection per environment. Use when starting a new service, refactoring a tangled chain, or onboarding a team to existing code. Trigger with "langchain architecture", "langchain llm factory", "langchain chain registry", "langchain dependency injection", "langchain project structure".
# LangChain Reference Architecture (Python)
## Overview
Eight months into a LangChain service, a code review surfaces the mess.
Twelve chain definitions live inlined inside FastAPI route handlers. Three
retrievers are constructed at module-global scope, one bound to
`tenant_id="acme"` because that was the first tenant in the pilot —
that retriever now returns Acme's documents to every other tenant, a P33
leak that has been live in production for six weeks.
`max_retries=6` is hardcoded at four separate call sites. A
`RunnableWithMessageHistory` backed by the default
`InMemoryChatMessageHistory` loses every conversation on pod restart
(P22) — which is most days, because Cloud Run scales to zero.
Config is read from `os.environ` in three modules with three different
fallback strategies. There is no place to put a new provider without
touching seven files, and nobody remembers why the retriever is built
at import time.
The fix is not "rename a variable." The fix is an architecture that makes
every one of those mistakes hard to write. This skill describes that target
layered architecture:
- `app/` — FastAPI routes. Thin. Parses HTTP, calls into `services`,
serializes response. No chain logic, no vendor clients, no env vars.
- `services/` — chain and graph definitions. Take dependencies through
constructor args, not module-level imports.
- `adapters/` — vendor clients, LLM factory, retriever factory, tool
factory. This is where `langchain-anthropic` is imported. Nowhere else.
- `config/` — one Pydantic `Settings` class. `SecretStr` for keys,
`Literal["dev","staging","prod"]` for env names, `.env` file loader.
- `domain/` — Pydantic models, typed LangGraph state, enums. No I/O.
Five layers, five imports deep at most. Dependency direction is
**strictly downward**. `app` imports `services`; `services` imports
`adapters`; `adapters` imports `config` and `domain`. Never the reverse.
Import-linter enforces this in CI. Pain-catalog anchors: P22 (in-memory
history loses messages — architectural fix is persistent history
injected via DI) and P33 (per-tenant vector stores leak if retriever
bound at import — architectural fix is per-request factory). Adjacent:
P10 (recursion limits), P24 (middleware order), P28 (callback
inheritance). Pin: `langchain-core 1.0.x`, `langgraph 1.0.x`,
`langchain-anthropic 1.0.x`, `langchain-openai 1.0.x`, `pydantic 2.x`,
`import-linter 2.x`.
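
The downward-only rule is mechanical enough to check by hand. The sketch below is not import-linter's API (the skill relies on import-linter contracts in CI). It is a hypothetical, stdlib-only illustration of what such a contract asserts, assuming the package is named `my_service` and that the layers are ordered as in the list above, with `config` sitting above `domain`.

```python
# Hypothetical illustration of the layering rule; import-linter does the real check in CI.
LAYERS = ["app", "services", "adapters", "config", "domain"]  # top to bottom (assumed order)


def layer_of(module: str) -> str:
    """Return the layer segment of a dotted path such as 'my_service.app.main'."""
    parts = module.split(".")
    if len(parts) < 2 or parts[1] not in LAYERS:
        raise ValueError(f"{module!r} is not inside a known layer")
    return parts[1]


def is_allowed(importer: str, imported: str) -> bool:
    """An import is allowed only if it stays in its layer or points downward."""
    return LAYERS.index(layer_of(importer)) <= LAYERS.index(layer_of(imported))


def violations(edges: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Keep only the (importer, imported) pairs that point upward."""
    return [edge for edge in edges if not is_allowed(*edge)]


edges = [
    ("my_service.app.routes.support", "my_service.services.registry"),         # downward
    ("my_service.services.support.chain", "my_service.adapters.llm_factory"),  # downward
    ("my_service.adapters.retriever_factory", "my_service.config.settings"),   # downward
    ("my_service.domain.models", "my_service.adapters.llm_factory"),           # upward
]
bad = violations(edges)  # only the domain -> adapters edge remains
```

Skipping a layer downward (for example `app` importing `domain`) passes this check; only upward edges are flagged.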
## Prerequisites
- Python 3.10+
- `langchain-core >= 1.0, < 2.0`, `langgraph >= 1.0, < 2.0`
- `pydantic >= 2.5` and `pydantic-settings >= 2.1`
- `import-linter >= 2.0` for layer enforcement in CI
- Provider package(s): `langchain-anthropic`, `langchain-openai`, etc.
- For staging/prod checkpointer: `langgraph-checkpoint-postgres` and a Postgres instance
- Cross-reference: sibling skill `langchain-model-inference` for the LLM factory's version-safe defaults
## Instructions
### Step 1 — Adopt the 5-layer directory layout
```
src/my_service/
├── app/                          # Layer 1: HTTP boundary (FastAPI)
│   ├── __init__.py
│   ├── main.py                   # FastAPI instance, DI wiring, lifespan
│   ├── routes/
│   │   ├── support.py            # POST /support → services.support.run(...)
│   │   └── health.py
│   └── deps.py                   # FastAPI Depends() providers
├── services/                     # Layer 2: chain and graph definitions
│   ├── __init__.py
│   ├── registry.py               # name → builder lookup
│   ├── support/
│   │   ├── chain.py              # SupportChain(llm, retriever, memory)
│   │   └── graph.py              # SupportGraph (LangGraph StateGraph)
│   └── triage/
│       └── chain.py
├── adapters/                     # Layer 3: vendor integrations
│   ├── __init__.py
│   ├── llm_factory.py            # chat_model(provider, **kwargs) → BaseChatModel
│   ├── retriever_factory.py      # retriever_for(tenant_id) → Retriever
│   ├── tool_factory.py           # tools_for(tenant_id) → list[BaseTool]
│   ├── checkpointer.py           # checkpointer_for(env) → BaseCheckpointSaver
│   └── history.py                # history_for(session_id, tenant_id) → BaseChatMessageHistory
├── config/                       # Layer 4: configuration
│   ├── __init__.py
│   └── settings.py               # Pydantic Settings
└── domain/                       # Layer 5: pure models, no I/O
    ├── __init__.py
    ├── state.py                  # TypedDict / Pydantic for LangGraph state
    └── models.py                 # request/response schemas
tests/
├── unit/                         # fake adapters, assert service logic
├── integration/                  # real adapters against ephemeral infra
└── contract/                     # schema snapshots (e.g., tool specs)
pyproject.toml                    # includes [tool.importlinter] contracts
```
Typical depth is 5 layers. See [Directory Layout](references/directory-layout.md) for the full tree with file-naming conventions.
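
The `tests/unit/` row depends on services taking their collaborators as constructor arguments. A minimal sketch of that shape follows. The `TextModel` and `DocSearch` protocols, the `SupportService` class, and both fakes are hypothetical stand-ins rather than LangChain types, chosen so the example runs without any vendor package.

```python
from typing import Protocol


class TextModel(Protocol):  # hypothetical stand-in for a chat model
    def invoke(self, prompt: str) -> str: ...


class DocSearch(Protocol):  # hypothetical stand-in for a retriever
    def search(self, query: str) -> list[str]: ...


class SupportService:
    """Service-layer object: receives collaborators, imports no vendor code."""

    def __init__(self, llm: TextModel, retriever: DocSearch) -> None:
        self._llm = llm
        self._retriever = retriever

    def answer(self, question: str) -> str:
        docs = self._retriever.search(question)
        prompt = f"Context: {' | '.join(docs)}\nQuestion: {question}"
        return self._llm.invoke(prompt)


class FakeModel:
    """Records every prompt it receives and returns a canned reply."""

    def __init__(self) -> None:
        self.prompts: list[str] = []

    def invoke(self, prompt: str) -> str:
        self.prompts.append(prompt)
        return "canned reply"


class FakeSearch:
    """Returns the same two documents for any query."""

    def search(self, query: str) -> list[str]:
        return ["refund policy", "shipping times"]


fake_llm = FakeModel()
service = SupportService(llm=fake_llm, retriever=FakeSearch())
reply = service.answer("Where is my order?")
```

Because nothing is constructed at module scope, a unit test can assert on `fake_llm.prompts` without network access or API keys.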
### Step 2 — Centralize LLM defaults in an `adapters/llm_factory.py`
Chains depend on the `BaseChatModel` interface (an abstract base class), not on a concrete provider class. The factory is the one place version-safe defaults live:
```python
# src/my_service/adapters/llm_factory.py
from langchain_core.language_models import BaseChatModel
from langchain_anthropic import ChatAnthropic
from langchain_openai import ChatOpenAI

_SAFE_DEFAULTS = {"timeout": 30, "max_retries": 2}


def chat_model(provider: str, **overrides) -> BaseChatModel:
    defaults = {**_SAFE_DEFAULTS, **overrides}  # caller wins
    if provider == "anthropic":
        return ChatAnthropic(model="claude-sonnet-4-6", **defaults)
    if provider == "openai":
        return ChatOpenAI(model="gpt-4o", **defaults)
    raise ValueError(f"Unknown provider: {provider!r}")
```
The `max_retries=6` scatter in the mess-case becomes `max_retries=2` in exactly one file. Services that want a longer timeout pass `timeout=60` — but they never set `max_retries=6` by accident. Cross-reference `langchain-model-inference` Step 3 for the factory pattern's provenance; see [LLM Factory Pattern](references/llm-factory-pattern.md) for per-provider variants and caching.
### Step 3 — Replace scattered imports with a chain/graph registry
```python
# src/my_service/services/registry.py
from typing import Protocol

from langchain_core.runnables import Runnable


class ChainBuilder(Protocol):
    def __call__(self, *, tenant_id: str) -> Runnable: ...


_BUILDERS: dict[str, ChainBuilder] = {}


def register(name: str):
    """Decorator that records a builder under `name`."""
    def decorator(fn: ChainBuilder) -> ChainBuilder:
        _BUILDERS[name] = fn
        return fn
    return decorator


def get(name: str, *, tenant_id: str) -> Runnable:
    """Look up the builder for `name` and build a chain for this tenant."""
    try:
        builder = _BUILDERS[name]
    except KeyError:
        raise KeyError(
            f"No chain registered under {name!r}. Known: {list(_BUILDERS)}"
        ) from None
    # Call outside the try so a KeyError raised inside the builder is not masked.
    return builder(tenant_id=tenant_id)
```
Each service module registers itself:
```python
# src/my_service/services/support/chain.py
from my_service.services.registry import register
from my_service.adapters.llm_factory import chat_model
from my_service.adapters.retriever_factory import retriever_for


@register("support_agent")
def build_support_agent(*, tenant_id: str):
    llm = chat_model("anthropic")
    retriever = retriever_for(tenant_id=tenant_id)  # built per call, never at import
    # ... compose chain ...
    return chain
```
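Routes become one line: `chain = registry.get("support_agent", tenant_id=req.tenant_id)`. There is one place to look, not twelve. Registration runs when the decorated module is imported, so each service module must be imported once at startup, before the first `get` call.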
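
To see the registry's behaviour in isolation, here is a stdlib-only sketch. It is a stand-in built on assumptions: builders return plain callables instead of `Runnable` objects, and the `echo_agent` name is invented for the example.

```python
from typing import Callable

Chain = Callable[[str], str]  # stand-in for a LangChain Runnable
_BUILDERS: dict[str, Callable[..., Chain]] = {}


def register(name: str):
    """Decorator that records a builder under `name`."""
    def decorator(fn: Callable[..., Chain]) -> Callable[..., Chain]:
        _BUILDERS[name] = fn
        return fn
    return decorator


def get(name: str, *, tenant_id: str) -> Chain:
    """Find the builder by name, then build a chain for this tenant."""
    try:
        builder = _BUILDERS[name]
    except KeyError:
        raise KeyError(f"No chain registered under {name!r}. Known: {list(_BUILDERS)}") from None
    return builder(tenant_id=tenant_id)


@register("echo_agent")  # hypothetical chain name
def build_echo_agent(*, tenant_id: str) -> Chain:
    # The tenant is captured per call, so each request gets its own closure.
    return lambda text: f"[{tenant_id}] {text}"


acme_chain = get("echo_agent", tenant_id="acme")
globex_chain = get("echo_agent", tenant_id="globex")

# An unknown name fails loudly and lists what is registered.
try:
    get("missing", tenant_id="acme")
except KeyError as exc:
    lookup_error = str(exc)
```

Each `get` call runs the builder again, so tenant-specific state is created per request instead of being shared through a module global.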
### Step 4 — Build retrievers and tools per-request, keyed by tenant (P33)
This is the P33 architectural fix. The factory takes `tenant_id` as a runtime argument. Nothing is bound at import:
```python
# src/my_service/adapters/retriever_factory.py
from functools import lru_cache

from langchain_core.retrievers import BaseRetriever
from langchain_pinecone import PineconeVectorStore

from my_service.config.settings import get_settings


@lru_cache(maxsize=256)  # cache the *store*, not the retriever
def _store_for(tenant_id: str) -> PineconeVectorStore:
    s = get_settings()
    return PineconeVectorStore(
        in
```