langchain-deploy-integration

Deploy a LangChain 1.0 / LangGraph 1.0 app to Cloud Run, Vercel, or LangServe correctly — timeouts sized for chain length, cold-start mitigation, SSE anti-buffering headers, Secret Manager over `.env`. Use when prepping first prod deploy, debugging a stream that hangs behind a proxy, or diagnosing p99 latency spikes. Trigger with "langchain deploy", "langchain cloud run", "langchain vercel python", "langchain langserve", "langchain docker".

Tags: Cloud & DevOps, saas, langchain, langgraph, python, langchain-1.0, deployment, cloud-run, vercel

What this skill does

# LangChain Deploy Integration (Python)

## Overview

An engineer ships a working LangGraph agent to Vercel. Every non-trivial request
returns `FUNCTION_INVOCATION_TIMEOUT`. The Python runtime on Vercel defaults to
a **10-second** cap (P35) — a three-tool agent with one RAG round easily runs
20-40s. Local dev never exposed the wall because `uvicorn` on a laptop has no
timeout. Two fixes apply together and each is load-bearing:

`vercel.json`, the baseline cap bump (Pro plan max is 60s, Enterprise 900s):

```json
{ "functions": { "api/chat.py": { "maxDuration": 60 } } }
```

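```python
# api/chat.py: stream the response so partial output arrives before the cap
from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from pydantic import BaseModel

app = FastAPI()

class ChatRequest(BaseModel):
    input: str  # assumed request shape; the original snippet does not define it

# `chain` is the application's LangChain Runnable, built elsewhere (not shown).

@app.post("/api/chat")
async def chat(req: ChatRequest):
    async def gen():
        async for chunk in chain.astream(req.input):
            yield f"data: {chunk.model_dump_json()}\n\n"
    return StreamingResponse(gen(), media_type="text/event-stream",
                             headers={"X-Accel-Buffering": "no"})
```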

The `maxDuration: 60` raises the Vercel-imposed wall; streaming reduces
time-to-first-byte to under a second so the user sees progress even on a
40-second completion. Once the Vercel cap is fixed, the next three walls are:
Cloud Run cold starts (**5-15s** p99 on Python + LangChain — P36), `.env`
secrets leaking via `docker exec <container> env` (P37), and SSE streams hanging
because Nginx / Cloud Run buffer the final chunk (P46).
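
The `data: ...` line followed by a blank line in the snippet above is the whole SSE wire format for a simple event. As a minimal stdlib-only sketch (the helper name `format_sse` is an assumption, not part of FastAPI or LangChain), a frame builder that also handles named events and multi-line payloads might look like this:

```python
import json
from typing import Any, Optional


def format_sse(data: Any, event: Optional[str] = None) -> str:
    """Serialize one Server-Sent Events frame.

    Hypothetical helper: every payload line gets its own `data:` field and a
    blank line terminates the frame.
    """
    payload = data if isinstance(data, str) else json.dumps(data)
    lines = []
    if event is not None:
        lines.append(f"event: {event}")
    # A raw newline inside a data field would end the field early, so split.
    lines.extend(f"data: {part}" for part in payload.split("\n"))
    return "\n".join(lines) + "\n\n"
```

Centralizing the framing in one function keeps the trailing blank line from being forgotten, which is the usual cause of a client that never fires its message handler.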

This skill walks through a production-grade multi-stage Dockerfile, Cloud Run
flags for cold-start mitigation, Vercel `maxDuration` + streaming, LangServe
route mounting with FastAPI lifespan, SSE anti-buffering headers, and Secret
Manager via `pydantic.SecretStr`. Pin: `langchain-core 1.0.x`, `langgraph 1.0.x`,
`langserve 1.0.x`. Pain-catalog anchors: **P35** (Vercel 10s default),
**P36** (Cloud Run cold start), **P37** (`.env` leaks), **P46** (SSE buffering).

## Prerequisites

- Python 3.11+ (3.12 preferred for `uvicorn` startup speed)
- `langchain-core >= 1.0, < 2.0`, `langgraph >= 1.0, < 2.0`, `langserve >= 1.0, < 2.0`
- `fastapi >= 0.110`, `uvicorn[standard] >= 0.27`
- Target platform: `gcloud` CLI (Cloud Run), `vercel` CLI (Vercel), or `docker` (generic)
- For Cloud Run: a GCP project with Secret Manager API enabled
- For Vercel: a project with `@vercel/python` runtime configured
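
The `>= 1.0, < 2.0` pins above can be checked at container start rather than discovered at the first failing request. The following is a rough sketch using only `importlib.metadata`; the function names `major_of` and `check_pins` are illustrative, and the check only compares major versions, so it does not cover the `fastapi` and `uvicorn` minimums.

```python
from importlib.metadata import PackageNotFoundError, version


def major_of(version_string: str) -> int:
    """Return the leading integer of a version string such as '1.0.3'."""
    return int(version_string.split(".")[0])


def check_pins(required_major: dict[str, int]) -> list[str]:
    """Return human-readable problems; an empty list means every pin holds."""
    problems = []
    for package, major in required_major.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            problems.append(f"{package}: not installed")
            continue
        if major_of(installed) != major:
            problems.append(f"{package}: found {installed}, want {major}.x")
    return problems


if __name__ == "__main__":
    # Package names taken from the prerequisites list above.
    for line in check_pins({"langchain-core": 1, "langgraph": 1, "langserve": 1}):
        print(line)
```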

## Instructions

### Step 1 — Multi-stage Dockerfile with slim runtime and `uvicorn`

A multi-stage build keeps the runtime image under 400MB, which cuts Cloud Run
cold starts by 2-3 seconds. Use `python:3.12-slim` as the final stage (not
`python:3.12` — that base adds ~900MB for dev tooling that never runs in prod).

```dockerfile
# syntax=docker/dockerfile:1.7
FROM python:3.12-slim AS builder
WORKDIR /build
RUN pip install --no-cache-dir uv
COPY pyproject.toml uv.lock ./
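# --no-emit-project keeps the app itself out of requirements.txt (only
# pyproject.toml and uv.lock exist at this stage); --frozen uses uv.lock as-is.
RUN uv export --frozen --no-emit-project --format requirements-txt --no-hashes > requirements.txt \
 && pip wheel --wheel-dir=/wheels -r requirements.txt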

FROM python:3.12-slim AS runtime
RUN useradd -m -u 10001 app
WORKDIR /app
COPY --from=builder /wheels /wheels
RUN pip install --no-cache-dir --no-index --find-links=/wheels /wheels/* \
 && rm -rf /wheels
COPY --chown=app:app app/ ./app/
USER app
EXPOSE 8080
ENV PORT=8080 PYTHONUNBUFFERED=1 PYTHONDONTWRITEBYTECODE=1
# exec replaces the shell so SIGTERM from Cloud Run reaches uvicorn directly.
CMD ["sh", "-c", "exec uvicorn app.main:app --host 0.0.0.0 --port ${PORT} --workers 1"]
```

Single worker is correct — Cloud Run handles horizontal scale; in-process
multi-worker just duplicates LangChain client memory. See [Dockerfile and
Secrets](references/dockerfile-and-secrets.md) for the distroless variant
and the `.dockerignore` hardening for `.env` files.
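
Because a stray `.env` file in the build context is the root of P37, a pre-build check can catch it before `docker build` runs. This is a small sketch under stated assumptions: `find_env_files` is a hypothetical helper, it matches only `.env` and `.env.*` names, and it does not read `.dockerignore`, so it reports files that an ignore rule may already exclude.

```python
from pathlib import Path


def find_env_files(context_dir: str) -> list[str]:
    """List dotenv-style files under a build context as sorted relative paths."""
    root = Path(context_dir)
    hits = [
        p.relative_to(root).as_posix()
        for p in root.rglob("*")  # pathlib globbing includes dotfiles
        if p.is_file() and (p.name == ".env" or p.name.startswith(".env."))
    ]
    return sorted(hits)
```

A CI step could call `find_env_files(".")` and fail the job when the list is non-empty.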

### Step 2 — Deploy to Cloud Run with cold-start mitigation

Python + LangChain + `tiktoken` + one embedding model imports take 5-15
seconds (P36). At `--min-instances=0`, every scale-from-zero request eats that
as user-facing latency. Paying for one always-on instance is usually cheaper
than the lost requests.
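
To see which imports dominate the cold start, it can help to time them one by one in a fresh interpreter. The sketch below is stdlib-only and `time_imports` is an illustrative name; modules that are already cached report near-zero time, so the numbers only mean something on a first import.

```python
import importlib
import time


def time_imports(module_names: list[str]) -> list[tuple[str, float]]:
    """Import each module in order and return (name, seconds), slowest first."""
    timings = []
    for name in module_names:
        start = time.perf_counter()
        importlib.import_module(name)
        timings.append((name, time.perf_counter() - start))
    return sorted(timings, key=lambda item: item[1], reverse=True)
```

For example, `time_imports(["langchain_core", "langgraph", "tiktoken"])` inside the runtime image gives a rough ranking; CPython's `python -X importtime` produces a finer per-module breakdown.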

```bash
gcloud run deploy langchain-api \
  --source=. \
  --region=us-central1 \
  --min-instances=1 \
  --max-instances=20 \
  --cpu=2 --memory=2Gi \
  --cpu-boost \
  --no-cpu-throttling \
  --timeout=3600 \
  --concurrency=80 \
  --set-secrets=ANTHROPIC_API_KEY=anthropic-key:latest,OPENAI_API_KEY=openai-key:latest \
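  --service-account=SERVICE_ACCOUNT_EMAIL
# SERVICE_ACCOUNT_EMAIL is a placeholder: the address is redacted in the source,
# and the flag name is inferred from context.
# --timeout=3600 is the Cloud Run per-request maximum (1 hour), needed
# because multi-tool LangGraph agents routinely run 1-5 minutes end-to-end.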
```

The load-bearing flags: `--min-instances=1` removes scale-from-zero cold starts
from p99 (one always-warm replica is quoted at ~$15/mo; the actual figure
depends on CPU, memory, and billing mode); `--cpu-boost` adds extra CPU during
startup and for the first 10 seconds after it; `--no-cpu-throttling`
(CPU-always-allocated billing) keeps `astream` running between keepalive pings
so long LangGraph runs do not stall at tool boundaries; `--concurrency=80`
matches typical I/O-bound workloads (drop to 10 if embedding large docs
in-process).
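
The interaction between `--concurrency` and `--max-instances` can be sanity-checked with Little's law (requests in flight = arrival rate × latency). This is a back-of-the-envelope sketch, not a Cloud Run autoscaler model; the function name and the traffic numbers in the usage note are assumptions.

```python
import math


def instances_needed(requests_per_second: float, avg_latency_s: float,
                     concurrency: int) -> int:
    """Estimate instance count from Little's law: in-flight = rate x latency."""
    in_flight = requests_per_second * avg_latency_s
    # At least one instance is always needed to serve any traffic at all.
    return max(1, math.ceil(in_flight / concurrency))
```

At a hypothetical 40 requests per second with 30-second agent runs, `instances_needed(40, 30, 80)` gives 15, which fits under `--max-instances=20`. Dropping to `--concurrency=10` for the same traffic gives 120, so the instance cap would need to rise with it.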

See [Cloud Run Deploy](references/cloud-run-deploy.md) for VPC egress, file
secret mounts, revision traffic splitting, and the full cost model.
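
With `--set-secrets`, each secret reaches the process as an environment variable, so the remaining risk is printing it by accident. The sketch below uses a tiny stdlib `Secret` class as a stand-in for `pydantic.SecretStr` (swapped in only to keep the example dependency-free); `load_secret` is a hypothetical helper name.

```python
import os


class Secret:
    """Minimal stand-in for pydantic.SecretStr: hides the value in repr/str."""

    def __init__(self, value: str) -> None:
        self._value = value

    def get_secret_value(self) -> str:
        """Return the raw value; call this only where the key is actually used."""
        return self._value

    def __repr__(self) -> str:
        return "Secret('**********')"

    __str__ = __repr__


def load_secret(name: str) -> Secret:
    """Read a secret from the environment and fail fast if it is missing."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"required secret {name} is not set")
    return Secret(value)
```

Failing at startup when a key is absent surfaces a misconfigured revision during deploy instead of at the first model call.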

### Step 3 — Vercel Python: `maxDuration: 60` + streaming to beat the cap

On Vercel Hobby the default cap is **10s** (P35); the Pro maximum is **60s** and
the Enterprise maximum is **900s**. Always set `maxDuration` explicitly in
`vercel.json`, because the default is a trap.

```json
{
  "functions": {
    "api/chat.py": { "maxDuration": 60, "memory": 1024 }
  }
}
```

Streaming is not just a UX fix; it is also the mitigation for requests that
still exceed `maxDuration`. A time-to-first-byte under a second shows the proxy
that the request is alive, partial content renders on the client, and when the
cap finally triggers the user has already seen most of the answer. The Vercel
entrypoint pattern mirrors the Overview snippet above; pair it with the SSE
headers from Step 5.
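
One way to make the cap less abrupt is to stop the stream yourself slightly before `maxDuration` and tell the client why. The following is a stdlib-only sketch under stated assumptions: `stream_with_deadline` is a hypothetical wrapper, and the `end` and `truncated` event names are illustrative rather than anything Vercel or LangChain defines.

```python
import asyncio
import time
from typing import AsyncIterator


async def stream_with_deadline(source: AsyncIterator[str],
                               budget_s: float) -> AsyncIterator[str]:
    """Relay SSE frames from `source`, stopping cleanly once `budget_s` is spent."""
    deadline = time.monotonic() + budget_s
    iterator = source.__aiter__()
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            yield "event: truncated\ndata: {}\n\n"
            return
        try:
            # Wait for the next frame, but never past the deadline.
            frame = await asyncio.wait_for(iterator.__anext__(), timeout=remaining)
        except StopAsyncIteration:
            yield "event: end\ndata: {}\n\n"  # source finished normally
            return
        except asyncio.TimeoutError:
            yield "event: truncated\ndata: {}\n\n"  # budget ran out mid-wait
            return
        yield frame
```

With a 60-second cap, a budget of roughly 55 seconds leaves room for the final frame to flush; the client can then offer a "continue" action when it sees `truncated`.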

Edge Runtime is **not** an option here — `@vercel/edge` is JavaScript-only.
Anything that imports `langchain` must run on `@vercel/python` (serverless,
Node-free Python container). See [Vercel Python Deploy](references/vercel-python-deploy.md)
for env vars vs Vercel Secrets, cold-start profiling, and the serverless vs
fluid-compute tradeoff.

### Step 4 — LangServe: `add_routes` + FastAPI lifespan for pool cleanup

LangServe ships typed HTTP routes over any `Runnable`. The `playground` path
is invaluable in dev but **must be disabled in production** — it leaks chain
topology to anyone who can hit the URL. Mount behind a FastAPI `lifespan`
that closes `asyncpg` / `httpx` / Redis pools on revision retirement;
`on_shutdown` fires too late on Cloud Run and connections leak across
revisions.

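```python
# app/main.py
import os
from contextlib import asynccontextmanager

from fastapi import FastAPI
from langserve import add_routes

# `build_chain` and `db_pool` come from the application's own modules (not shown).
chain = build_chain()  # build once and share it between app.state and the routes

@asynccontextmanager
async def lifespan(app: FastAPI):
    app.state.chain = chain
    yield
    await db_pool.close()  # runs at shutdown, when the revision is retired

app = FastAPI(lifespan=lifespan)

# `__debug__` is True unless Python runs with -O, so it cannot gate production.
# ENABLE_PLAYGROUND is a variable name assumed here; leave it unset in prod.
playground_on = os.getenv("ENABLE_PLAYGROUND") == "1"
add_routes(app, chain, path="/chat",
           enable_feedback_endpoint=False,
           disabled_endpoints=None if playground_on else ["playground"])
```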

See [LangServe Patterns](references/langserve-patterns.md) for typed input/output
schemas, auth middleware, and coexisting with raw FastAPI handlers.
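
The ordering guarantee that makes `lifespan` useful (setup before the first request, cleanup after the last) can be seen without FastAPI at all. This is a toy sketch: `FakePool` and `serve` are hypothetical stand-ins, and the `try`/`finally` is an addition that runs cleanup even when serving raises.

```python
import asyncio
from contextlib import asynccontextmanager


class FakePool:
    """Stand-in for an asyncpg / httpx / Redis pool; only tracks open state."""

    def __init__(self) -> None:
        self.closed = False

    async def close(self) -> None:
        self.closed = True


@asynccontextmanager
async def lifespan(state: dict):
    state["pool"] = FakePool()        # startup: runs before the first request
    try:
        yield
    finally:
        await state["pool"].close()   # shutdown: runs even if serving raised


async def serve(state: dict) -> bool:
    """Pretend to serve one request and report whether the pool was closed."""
    async with lifespan(state):
        return state["pool"].closed   # still open while "serving"
```

Running `asyncio.run(serve(state))` returns `False` (the pool is open during the request), and `state["pool"].closed` is `True` afterwards.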

### Step 5 — SSE anti-buffering: survive Nginx, Cloud Run, Cloudflare

Nginx, Cloud Run's load balancer, and Cloudflare all buffer responses by
default. On SSE, buffering means the client never sees the final `end` event,
so a consumer of the graph's `astream` output waits forever (P46). Two headers
plus one response flush fix it:

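```python
import json

from fastapi.responses import StreamingResponse

def sse_headers() -> dict:
    return {
        "Content-Type": "text/event-stream",
        "Cache-Control": "no-cache, no-transform",
        "X-Accel-Buffering": "no",          # disables Nginx buffering
        "Connection": "keep-alive",
    }

@app.post("/api/chat/stream")
async def stream(payload: dict):
    async def gen():
        # The source is cut off after "async for event in gra". The rest of this
        # handler is an assumed completion; `graph` is a hypothetical compiled
        # LangGraph app defined elsewhere.
        async for event in graph.astream(payload):
            yield f"data: {json.dumps(event, default=str)}\n\n"
        yield "event: end\ndata: {}\n\n"  # explicit final event for the client
    return StreamingResponse(gen(), headers=sse_headers())
```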

Files: 6

Size: 43.1 KB

Complexity: 56/100

Category: Cloud & DevOps

Source: https://github.com/jeremylongshore/claude-code-plugins-plus-skills/tree/main/plugins/saas-packs/langchain-py-pack/skills/langchain-deploy-integration
