langchain-ci-integration

Wire LangChain 1.0 / LangGraph 1.0 tests into a GitHub Actions pipeline — unit tests with FakeListChatModel, VCR-gated integration tests, warning-filter policy, and eval-regression merge gates. Complements langchain-local-dev-loop (F23) which covers the inner loop; THIS covers the CI wire-up. Use when setting up GHA for a new LLM service, after a VCR cassette leak incident, or hardening an existing pipeline. Trigger with "langchain ci", "langchain github actions", "langchain test pipeline", "vcr ci", "langchain eval gate", "pytest -W error langchain".

# LangChain CI Integration (Python)

## Overview

A PR passes every test on your laptop. You push. GHA runs `pytest` and aborts
during collection — before a single test executes — with:

```
PytestUnraisableExceptionWarning: Exception ignored in: ...
DeprecationWarning: langchain_community.llms ...
```

The org runs `pytest -W error` and a provider SDK emitted a `DeprecationWarning`
at *import* time, which the warning filter promoted to an exception while pytest
was still walking the test tree. This is **P45** and it blocks every PR for the
team until someone pins a `filterwarnings` config.

Meanwhile the integration suite has its own failure mode: a VCR cassette
recorded three months ago at `temperature=0` against Anthropic is now flaking
against a snapshot. `temperature=0` is not deterministic on Claude — it still
nucleus-samples (**P05**) — so the cassette captured *one* valid completion, not
*the* valid completion. And yesterday a reviewer caught
`Authorization: Bearer sk-ant-...` in a cassette file that had been committed
six weeks ago (**P44**) because `vcrpy` records all request headers by default.

This skill covers the outer loop: the GitHub Actions workflow, the unit /
integration / eval gate separation, VCR cassette hygiene, pytest warning
policy, and a merge-blocking eval regression gate. The **inner** loop — fake
model fixtures, VCR recording workflow, local determinism tricks — lives in
`langchain-local-dev-loop` (F23); cross-reference it, do not duplicate it.
Pin: `langchain-core 1.0.x`, `langgraph 1.0.x`, `actions/checkout@v4`,
`actions/setup-python@v5`, `vcrpy 6.x`. Pain-catalog anchors: **P05, P43, P44, P45**.

## Prerequisites

- Python 3.10, 3.11, or 3.12 (matrix)
- `langchain-core >= 1.0, < 2.0`, `langgraph >= 1.0, < 2.0`
- `pytest >= 8`, `pytest-asyncio`, `pytest-timeout` (provides the `--timeout` flag used below), `vcrpy >= 6` (integration)
- `langchain-local-dev-loop` (F23) applied locally — fixtures and recording workflow
- GitHub repo with Actions enabled; secrets set for any live-API nightly job

## Instructions

### Step 1 — GHA workflow skeleton with four jobs

Single workflow at `.github/workflows/tests.yml`. Matrix on unit only; keep
integration and eval single-version to control cost.

```yaml
name: tests

on:
  pull_request:
  push:
    branches: [main]
  schedule:
    - cron: "0 6 * * *"  # nightly live-API re-record check (06:00 UTC)

concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true

jobs:
  unit:
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:
        python: ["3.10", "3.11", "3.12"]
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python }}
          cache: pip
          cache-dependency-path: |
            pyproject.toml
            requirements*.txt
      - run: pip install -e ".[test]"
      - run: pytest tests/unit/ -W error --timeout=30 -q

  integration:
    needs: unit
    if: github.event_name == 'schedule' || contains(github.event.pull_request.labels.*.name, 'run-integration')
    runs-on: ubuntu-latest
    env:
      RUN_INTEGRATION: "1"
      VCR_MODE: "none"  # replay-only; nightly cron flips to "once"
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with: { python-version: "3.12", cache: pip }
      - run: pip install -e ".[test,integration]"
      - run: pytest tests/integration/ -W error --timeout=60 -q

  eval:
    needs: unit
    if: github.event_name == 'pull_request'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with: { fetch-depth: 0 }   # need base ref for delta comparison
      - uses: actions/setup-python@v5
        with: { python-version: "3.12", cache: pip }
      - run: pip install -e ".[test,eval]"
      - run: python scripts/run_eval.py --baseline origin/${{ github.base_ref }} --head HEAD --n 100
      # run_eval.py posts a PR comment and exits nonzero on regression > threshold

  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with: { python-version: "3.12", cache: pip }
      - run: pip install -e ".[dev]"
      - run: ruff check .
      - run: python scripts/dryrun_load_chains.py   # catches ImportError migration regressions
```

See [GHA Workflow Reference](references/github-actions-workflow.md) for the full
job definitions including the secret-injection pattern, the matrix caching
nuance, and the `softprops/action-gh-release`-style PR comment action used by
the eval job.
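
The merge-gate decision in the eval job reduces to a small comparison. The block below is a minimal sketch of that logic only, not the contents of `scripts/run_eval.py` (which this section does not show). The function names, the mean-score metric, and the `0.02` default threshold are all assumptions for illustration.

```python
from statistics import mean


def regression_delta(baseline_scores, head_scores):
    """Mean score drop from baseline to head (positive means head is worse)."""
    return mean(baseline_scores) - mean(head_scores)


def gate_exit_code(baseline_scores, head_scores, threshold=0.02):
    """Return 1 (block the merge) when the drop exceeds threshold, else 0.

    The threshold default is a placeholder; tune it to your eval's noise floor.
    """
    drop = regression_delta(baseline_scores, head_scores)
    return 1 if drop > threshold else 0
```

A script built on this would end with `sys.exit(gate_exit_code(baseline, head))`, so GHA marks the job failed when the gate trips. Improvements (a negative drop) always pass.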

### Step 2 — Unit job: `-W error` + `filterwarnings` to neutralize P45

Root cause of the collection abort: pytest collects tests by importing them.
Some provider SDKs emit `DeprecationWarning` on import. With `-W error` those
become exceptions during collection. Fix at the *filter* level, not by dropping
`-W error` (which would mask real warnings).

In `pyproject.toml`:

```toml
[tool.pytest.ini_options]
filterwarnings = [
    "error",
    # P45 — neutralize known import-time noise; scoped per module so new
    # warnings from YOUR code still fail the build.
    "ignore::DeprecationWarning:langchain_community.*",
    "ignore::DeprecationWarning:pydantic.*",
    "ignore:Pydantic serializer warnings:UserWarning",
]
asyncio_mode = "auto"
testpaths = ["tests"]
```

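The ordering matters: `"error"` first, specific `"ignore"` entries after, because
later entries take precedence and the ignores must override the global
promote-to-error. pytest applies command-line `-W` filters after the ini list,
so a bare `-W error` on the command line can win over these ignores; if the
collection abort persists, drop the flag from the `pytest` invocation and rely
on the `"error"` entry here. Keep the list **narrow**: a blanket
`ignore::DeprecationWarning` hides regressions you need to see.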
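
The precedence rule can be checked with the standard `warnings` module alone. This is a sketch of the mechanism, not of pytest's internals. `emit` and `check_filter_order` are hypothetical helper names, and `myapp.chains` stands in for your own code.

```python
import warnings


def emit(module_name: str) -> str:
    """Emit a DeprecationWarning attributed to module_name; report the outcome."""
    try:
        warnings.warn_explicit(
            "old API",
            DeprecationWarning,
            filename=f"{module_name}.py",
            lineno=1,
            module=module_name,
        )
    except DeprecationWarning:
        return "error"
    return "ignored"


def check_filter_order() -> dict:
    """Install 'error' first, then a module-scoped ignore, and probe both paths."""
    with warnings.catch_warnings():
        warnings.resetwarnings()
        warnings.simplefilter("error")  # the global promote-to-error
        # Added later, so it is consulted first and wins for matching modules.
        warnings.filterwarnings(
            "ignore",
            category=DeprecationWarning,
            module=r"langchain_community.*",
        )
        return {
            "sdk": emit("langchain_community.llms"),
            "own": emit("myapp.chains"),
        }
```

The SDK-attributed warning is swallowed while the one attributed to your own module still raises, which is the behavior the narrow ignore list is meant to preserve.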

Unit tests use `FakeListChatModel` fixtures from F23 (do not redefine them
here). One CI-specific gotcha (**P43**): `FakeListChatModel` does not emit
`response_metadata["token_usage"]`, so any callback that asserts on token counts
will break. Either subclass the fake and inject `generation_info`, or gate the
assertion:

```python
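import pytest

# `chain` is the chain under test, built elsewhere in the test module (not shown).
def test_chain_uses_tokens(patched_chat_model):
    result = chain.invoke({"input": "hi"})
    # Compare by class name so a subclass that injects token usage still runs.
    if patched_chat_model.__class__.__name__ == "FakeListChatModel":
        pytest.skip("fake model doesn't emit token_usage (P43)")
    assert result.response_metadata["token_usage"]["total_tokens"] > 0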
```

Budget: unit job should finish in **<2 minutes** across the 3-version matrix.
If it doesn't, something is probably calling out to a real provider. Use
`pytest --collect-only -q | wc -l` to sanity-check the test count and
`pytest --durations=10` to list the slowest tests, then audit which of those
lack fake-model fixtures.
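
One way to find the offending tests is to make any outbound connection fail loudly during the unit run. The block below is a minimal stdlib-only sketch under that assumption. `NetworkBlocked` and `block_network` are hypothetical names, and it only guards `socket.socket.connect`, so it is not a complete sandbox (the `pytest-socket` plugin covers more paths).

```python
import socket
from contextlib import contextmanager
from unittest import mock


class NetworkBlocked(RuntimeError):
    """Raised when code under test tries to open a network connection."""


@contextmanager
def block_network():
    """Temporarily replace socket.socket.connect with a function that raises."""

    def _guarded_connect(self, address, *args, **kwargs):
        raise NetworkBlocked(f"unit test attempted to connect to {address!r}")

    # mock.patch.object restores the original attribute on exit.
    with mock.patch.object(socket.socket, "connect", _guarded_connect):
        yield
```

Wrapped in an autouse fixture in `tests/unit/conftest.py`, a test that reaches a real provider then fails with `NetworkBlocked` instead of silently slowing the job down.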

### Step 3 — Integration job: VCR replay + `filter_headers` (P44)

Integration tests replay pre-recorded VCR cassettes. Three rules:

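1. Gate the job. `if: contains(github.event.pull_request.labels.*.name, 'run-integration')` or `env.RUN_INTEGRATION == "1"`, plus a nightly cron that flips to `VCR_MODE=once` and re-records against live APIs. PRs default to pure replay. Note that vcrpy's `once` mode records only when no cassette file exists, so the nightly job has to delete stale cassettes first (or use `all`) for a re-record to actually happen.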
2. Enforce `filter_headers` at the fixture level — not per-test. A single `conftest.py` prevents any contributor from recording a cassette with raw credentials.
3. Pre-commit + CI both scan cassettes for leaked keys. Belt and suspenders.

Fixture (lives in `tests/integration/conftest.py`, owned by this skill's
pipeline concern — F23 owns the *recording* workflow):

```python
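import os

import pytest

# Read by a pytest VCR plugin that looks for a `vcr_config` fixture
# (pytest-recording is assumed here).
@pytest.fixture(scope="module")
def vcr_config():
    return {
        "filter_headers": [
            "authorization",
            "x-api-key",
            "anthropic-version",
            ("openai-organization", "REDACTED"),
        ],
        "filter_post_data_parameters": ["api_key"],
        # CI default is replay-only; the workflow's VCR_MODE env var overrides it.
        "record_mode": os.environ.get("VCR_MODE", "none"),
        "match_on": ["method", "scheme", "host", "port", "path", "query"],
    }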
```
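
Rule 3 (scan cassettes for leaked keys) could be implemented with a short script shared by the pre-commit hook and a CI step. This is a sketch under stated assumptions. The regexes are illustrative guesses at common key shapes, not an authoritative list, and `find_leaks` / `scan_cassettes` are hypothetical names.

```python
import re
from pathlib import Path

# Illustrative patterns only; extend them for the providers you actually call.
SECRET_PATTERNS = [
    re.compile(r"sk-ant-[A-Za-z0-9_\-]{8,}"),          # Anthropic-style prefix
    re.compile(r"sk-[A-Za-z0-9]{20,}"),                # OpenAI-style prefix
    re.compile(r"(?i)bearer\s+[A-Za-z0-9._\-]{16,}"),  # raw bearer tokens
]


def find_leaks(text: str) -> list[str]:
    """Return every substring of text that matches a secret pattern."""
    return [m.group(0) for p in SECRET_PATTERNS for m in p.finditer(text)]


def scan_cassettes(root: str) -> dict[str, list[str]]:
    """Map each cassette file under root to the suspicious strings it contains."""
    leaks = {}
    for path in sorted(Path(root).rglob("*.yaml")):
        hits = find_leaks(path.read_text(encoding="utf-8", errors="replace"))
        if hits:
            leaks[str(path)] = hits
    return leaks
```

A CI step would call `scan_cassettes` on the cassette directory and exit nonzero when the result is non-empty. A cassette recorded through the `filter_headers` fixture above should produce no hits.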

Integration suite must finish in **<5 minutes** wall-clock on the runner, or
you will start getting cancellation flakes from the `concurrency` block. If
you exceed 5 minutes, split into a nightly-only long tier.

See [Integration Gating](references/integration-gating.md) for the full
live-vs-replay decision tree, cost-per-run budget worksheet, and the

Source: https://github.com/jeremylongshore/claude-code-plugins-plus-skills/tree/main/plugins/saas-packs/langchain-py-pack/skills/langchain-ci-integration
