langchain-embeddings-search
Build and query vector stores with LangChain 1.0 without getting burned by flipped score semantics, embedding-dim mismatches, reranker quirks, and chunk-splitter bugs. Use when building a RAG pipeline, choosing between FAISS / Pinecone / Chroma / PGVector, filtering by similarity score, or adding a reranker. Trigger with "langchain embeddings", "vector store similarity search", "langchain RAG retrieval", "FAISS score", "Pinecone score", "reranker score".
# LangChain Embeddings and Vector Search (Python)
## Overview
`FAISS.similarity_search_with_score()` returns L2 distance — **lower is better**.
`PineconeVectorStore.similarity_search_with_score()` returns cosine similarity on a cosine-metric index, so **higher is
better**. Swap your vector store and your `if score > 0.8` filter now keeps the
garbage and drops the good results, silently. This is pain-catalog entry P12,
and it is the single most common reason a "we migrated from FAISS to Pinecone
for scale" project loses retrieval quality overnight.
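The flip can be reproduced without any vector store. The sketch below is a toy illustration, not either library's implementation: the two-dimensional vectors and the `near`/`far` labels are made up, and the same `> 0.8` filter is applied to an L2 distance and to a cosine similarity.

```python
import math

def l2_distance(a, b):
    """Euclidean distance: 0.0 for identical vectors, larger means less similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_similarity(a, b):
    """Cosine similarity: 1.0 for the same direction, smaller means less similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hypothetical toy vectors, chosen only to make the effect visible
query = [1.0, 0.0]
candidates = {
    "near": [0.9, 0.1],   # points almost the same way as the query
    "far": [-1.0, 0.2],   # points almost the opposite way
}

THRESHOLD = 0.8  # the naive `score > 0.8` filter from the text

distance_scores = {name: l2_distance(query, vec) for name, vec in candidates.items()}
similarity_scores = {name: cosine_similarity(query, vec) for name, vec in candidates.items()}

# Same filter expression, two different score semantics
kept_by_distance = [name for name, s in distance_scores.items() if s > THRESHOLD]
kept_by_similarity = [name for name, s in similarity_scores.items() if s > THRESHOLD]

print(kept_by_distance)    # the filter keeps the distant vector
print(kept_by_similarity)  # the filter keeps the close vector
```

With a distance metric the unchanged filter keeps exactly the document it should have dropped, and nothing raises an error.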
The sibling gotchas:
- P13 — `RecursiveCharacterTextSplitter` default separators break inside code
fences, so RAG over Markdown docs truncates code examples mid-function
- P14 — Embedding-dim mismatch crashes at insert time (after 10 minutes of
processing), not at `VectorStore.__init__`; the error reads "dim
mismatch: 1536 != 3072" and nothing fails earlier to warn you
- P15 — Cohere/Jina reranker scores are **within-query relative**, so a 0.34
top-1 is not worse than a 0.92 top-1 on a different query; filtering by
threshold is the wrong heuristic
This skill walks through embedding model selection, vector store creation with
the version-safe dim guard, score normalization, hybrid keyword+vector search,
and rerankers with the correct filter-by-rank pattern. Pin: `langchain-core 1.0.x`,
`langchain-community 1.0.x`, `langchain-openai 1.0.x`, `faiss-cpu`, `langchain-pinecone`.
Pain-catalog anchors: P12, P13, P14, P15, P49, P50.
## Prerequisites
- Python 3.10+
- `langchain-core >= 1.0, < 2.0` and `langchain-community >= 1.0, < 2.0`
- Embedding provider: `pip install langchain-openai` (text-embedding-3-small/large)
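- Vector store: `pip install faiss-cpu` OR `pip install langchain-pinecone`
- Steps 4 to 6: `pip install langchain-text-splitters rank_bm25 langchain-cohere`
- Provider API keys: `OPENAI_API_KEY`, `PINECONE_API_KEY`, and `COHERE_API_KEY` for the Step 6 reranker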
## Instructions
### Step 1 — Initialize embeddings with an explicit dim
```python
from langchain_openai import OpenAIEmbeddings
embeddings = OpenAIEmbeddings(
    model="text-embedding-3-small",  # 1536 dims
    # text-embedding-3-large produces 3072 dims; the index must match
)

# Check the dim at startup (prevents P14). This makes one embedding API call.
assert len(embeddings.embed_query("test")) == 1536, "embedding dim drifted"
```
Swapping models (`-small` 1536 → `-large` 3072) is a migration, not a swap.
Plan it — back-fill the index, not just the config.
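One possible shape for a reusable startup guard is sketched below. `check_embedding_dim` and `FakeEmbeddings` are hypothetical names introduced for illustration; the only assumption about the real client is that it exposes `embed_query(text)` returning a list of floats, as in the snippet above. Unlike a bare `assert`, an explicit `raise` still runs when Python is started with `-O`.

```python
from typing import List

class FakeEmbeddings:
    """Hypothetical stand-in for an embeddings client exposing embed_query()."""

    def __init__(self, dim: int) -> None:
        self.dim = dim

    def embed_query(self, text: str) -> List[float]:
        # A real client would call the provider here; this returns a zero vector
        return [0.0] * self.dim

def check_embedding_dim(embeddings, expected_dim: int) -> int:
    """Embed one probe string and fail fast if the dim differs from the index."""
    actual = len(embeddings.embed_query("dimension probe"))
    if actual != expected_dim:
        raise ValueError(
            f"embedding dim {actual} != index dim {expected_dim}; "
            "reindex before switching models"
        )
    return actual

# Passes: model and index agree
check_embedding_dim(FakeEmbeddings(1536), expected_dim=1536)

# Fails at startup instead of ten minutes into an insert job
try:
    check_embedding_dim(FakeEmbeddings(3072), expected_dim=1536)
except ValueError as err:
    print(err)
```

Calling the guard once before any documents are processed moves the P14 failure from insert time to process start.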
### Step 2 — Choose a vector store
| Store | Score metric | Latency (1M vectors) | When to use |
|---|---|---|---|
| `FAISS` | L2 distance (lower = better) | ~5ms | Local dev, < 1M vectors, in-process |
| `Chroma` | Distance (lower = better); squared L2 by default, cosine distance if the collection is configured for it | ~10ms | Small multi-user, persistent local |
| `PGVector` | Cosine distance by default (lower = better) | ~20ms | Existing Postgres, transactional needs |
| `PineconeVectorStore` | Cosine similarity (higher = better) | ~50ms (hosted) | > 1M vectors, multi-tenant, managed |
```python
from langchain_community.vectorstores import FAISS
store = FAISS.from_documents(docs, embedding=embeddings)
results = store.similarity_search_with_score("query", k=5)
# FAISS: [(doc, 0.31), (doc, 0.42), ...] — LOWER IS MORE SIMILAR
```
vs.
```python
from langchain_pinecone import PineconeVectorStore
store = PineconeVectorStore(index_name="prod", embedding=embeddings)
results = store.similarity_search_with_score("query", k=5)
# Pinecone: [(doc, 0.91), (doc, 0.87), ...] — HIGHER IS MORE SIMILAR
```
See [Vector Store Comparison](references/vector-store-comparison.md) for the
feature matrix and the migration gotchas.
### Step 3 — Normalize scores before any threshold filter
Write a normalizer at the retriever boundary, so downstream code never sees
raw store-specific scores:
```python
def normalize(score: float, store_type: str) -> float:
    """Return similarity in [0, 1] where 1 = identical, 0 = unrelated."""
    if store_type == "faiss_l2":
        return 1.0 / (1.0 + score)  # collapse L2 distance into similarity
    if store_type == "pinecone":
        return max(0.0, min(1.0, score))  # cosine similarity; clamp negatives to 0
    if store_type in {"chroma", "pgvector"}:
        # Assumes the store is configured for cosine and returns cosine
        # *distance* (1 - cosine similarity), where lower is better
        return max(0.0, min(1.0, 1.0 - score))
    raise ValueError(f"Unknown store type: {store_type}")
```
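Now `score > 0.7` always points the same way (higher is better) regardless of
backend. The mappings are not calibrated against each other, so retune the
threshold on your eval set after a backend swap. See
[Score Semantics](references/score-semantics.md) for the per-store derivation.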
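As a rough sketch of what "at the retriever boundary" could look like, the helper below converts a list of `(doc, score)` pairs and re-sorts it so the best hit comes first. `normalize_results` and `l2_to_similarity` are hypothetical names, the documents are plain strings rather than real document objects, and the raw scores reuse the FAISS example values from Step 2.

```python
from typing import Callable, List, Tuple

def l2_to_similarity(distance: float) -> float:
    """Map an L2 distance in [0, inf) to a similarity in (0, 1]."""
    return 1.0 / (1.0 + distance)

def normalize_results(
    results: List[Tuple[str, float]],
    to_similarity: Callable[[float], float],
) -> List[Tuple[str, float]]:
    """Convert raw store scores to similarities and sort best-first."""
    converted = [(doc, to_similarity(score)) for doc, score in results]
    return sorted(converted, key=lambda pair: pair[1], reverse=True)

# Raw FAISS-style output: lower score means more similar
raw_faiss = [("doc-b", 0.42), ("doc-a", 0.31)]

normalized = normalize_results(raw_faiss, l2_to_similarity)
for doc, similarity in normalized:
    print(doc, round(similarity, 3))
```

Downstream code then only ever compares similarities, and the conversion function is the single place that changes when the backend does.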
### Step 4 — Chunk text with language-aware splitters
```python
from langchain_text_splitters import RecursiveCharacterTextSplitter, Language
# BAD — breaks inside Markdown code fences (P13)
bad = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
# GOOD — respects Markdown structure
md_splitter = RecursiveCharacterTextSplitter.from_language(
Language.MARKDOWN, chunk_size=1000, chunk_overlap=100,
)
# For Python source files
py_splitter = RecursiveCharacterTextSplitter.from_language(
Language.PYTHON, chunk_size=1500, chunk_overlap=150,
)
```
PDF pipelines have their own pain: `PyPDFLoader` splits by page, tearing tables
in half (P49). Use `PyMuPDFLoader` or `UnstructuredPDFLoader` for documents
with tables.
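Whichever splitter is used, a cheap sanity check can flag chunks that cut a code fence in half. The sketch below is a hypothetical validation helper, not part of LangChain: it counts fence markers per chunk and treats an odd count as a torn fence. `naive_split` is a deliberately crude fixed-width splitter used only to produce a failing case.

```python
from typing import List

FENCE = "`" * 3  # Markdown fence marker, built at runtime to avoid a literal fence here

def naive_split(text: str, chunk_size: int) -> List[str]:
    """Fixed-width split that ignores document structure entirely."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

def has_torn_fence(chunk: str) -> bool:
    """True when a chunk contains an odd number of fence markers."""
    return chunk.count(FENCE) % 2 == 1

# Small Markdown document with one fenced Python example
doc = (
    "Intro paragraph.\n\n"
    + FENCE + "python\n"
    + "def add(a, b):\n"
    + "    return a + b\n"
    + FENCE + "\n\n"
    + "Closing paragraph.\n"
)

chunks = naive_split(doc, chunk_size=40)
torn = [chunk for chunk in chunks if has_torn_fence(chunk)]

print(f"{len(torn)} of {len(chunks)} chunks contain a torn code fence")
```

Running the same check over the output of a real splitter on a sample of your corpus gives a quick signal of whether P13 is affecting the index.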
### Step 5 — Hybrid search (keyword + vector)
Pure vector search misses exact-match keywords (product SKUs, error codes,
function names). Combine BM25 + vector:
```python
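# LangChain 1.0 moved EnsembleRetriever into the langchain-classic package
# (pip install langchain-classic); on 0.3.x import it from langchain.retrievers.
from langchain_classic.retrievers import EnsembleRetriever
from langchain_community.retrievers import BM25Retriever  # requires rank_bm25

bm25 = BM25Retriever.from_documents(docs)
bm25.k = 5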
vector = store.as_retriever(search_kwargs={"k": 5})
ensemble = EnsembleRetriever(
retrievers=[bm25, vector],
weights=[0.4, 0.6], # tune on your eval set
)
```
See [Hybrid Search](references/hybrid-search.md) for the eval harness and the
weight-tuning procedure.
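To build intuition for what the weights do, here is a minimal sketch of weighted Reciprocal Rank Fusion, which is, as far as I know, the fusion scheme `EnsembleRetriever` applies. It is not the library's code: `weighted_rrf`, the document ids, and the constant `c=60` (the conventional RRF default) are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, List

def weighted_rrf(rankings: List[List[str]], weights: List[float], c: int = 60) -> List[str]:
    """Fuse ranked id lists: each list adds weight / (rank + c) to a document's score."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking, weight in zip(rankings, weights):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += weight / (rank + c)
    # Highest fused score first
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)

# Hypothetical retriever outputs, best match first
bm25_ranking = ["sku-123", "faq-7", "guide-2"]      # keyword hits
vector_ranking = ["guide-2", "sku-123", "blog-9"]   # semantic hits

fused = weighted_rrf([bm25_ranking, vector_ranking], weights=[0.4, 0.6])
print(fused)
```

Only ranks enter the fusion, never raw scores, which is why BM25 scores and vector scores can be combined without normalizing either one.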
### Step 6 — Rerank by rank, not by score
```python
from langchain_cohere import CohereRerank
reranker = CohereRerank(top_n=3, model="rerank-v3.5")
reranked = reranker.compress_documents(
documents=candidates, query=query,
)
# reranked[0].metadata["relevance_score"] is query-relative — 0.34 may be the best
# WRONG: [d for d in reranked if d.metadata["relevance_score"] > 0.5]
# RIGHT: keep `reranked` as returned (already the top_n documents, best first)
```
Filter by *rank* (keep top-k) not threshold. Calibration per-query is possible
but rarely worth the engineering cost.
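The difference between the two filters shows up as soon as two queries score on different scales. The sketch below uses made-up `(doc_id, relevance_score)` pairs and hypothetical helper names; it does not call any reranker.

```python
from typing import List, Tuple

Scored = List[Tuple[str, float]]

def keep_by_threshold(scored: Scored, threshold: float) -> List[str]:
    """Keep documents whose score exceeds a fixed cutoff."""
    return [doc for doc, score in scored if score > threshold]

def keep_top_k(scored: Scored, k: int) -> List[str]:
    """Keep the k best-ranked documents, whatever their absolute scores."""
    ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)
    return [doc for doc, _ in ranked[:k]]

# Illustrative scores only: query B's best hit scores 0.34
query_a = [("a1", 0.92), ("a2", 0.61), ("a3", 0.12)]
query_b = [("b1", 0.34), ("b2", 0.09), ("b3", 0.02)]

print(keep_by_threshold(query_a, 0.5))  # keeps two documents
print(keep_by_threshold(query_b, 0.5))  # keeps nothing, even the best hit is dropped
print(keep_top_k(query_b, 2))           # keeps the two best for this query
```

With the threshold filter, query B reaches the LLM with no context at all; with top-k it always gets its best candidates.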
## Output
- Embeddings initialized with dim assertion at startup
- Vector store chosen from the comparison matrix with score-semantics awareness
- Score normalizer applied at retriever boundary (no raw scores downstream)
- Language-aware text splitter that respects code fences, plus a table-safe PDF loader
- Hybrid retriever combining BM25 and vector with tuned weights
- Reranker filtering by rank, not threshold
## Error Handling
| Error | Cause | Fix |
|-------|-------|-----|
| `PineconeApiException: dim mismatch: 1536 != 3072` | Changed embedding model without reindexing (P14) | Create a new index with the new dim; migrate in a background job |
| Retrieval quality drops after FAISS→Pinecone swap | Score semantics flipped (P12) | Apply `normalize()` at boundary; retune threshold on eval set |
| RAG answers misquote tables | `PyPDFLoader` tore table across pages (P49) | Switch to `PyMuPDFLoader` or `UnstructuredPDFLoader` |
| RAG retrieval drops code examples mid-function | `RecursiveCharacterTextSplitter` broke code fence (P13) | Use `from_language(Language.MARKDOWN/PYTHON)` |
| Cohere reranker top-1 score < 0.5 | Scores are per-query relative (P15) | Filter by rank (`reranked[:k]`), not threshold |
| `WebBaseLoader` returns 403 / Cloudflare interstitial (P50) | Default User-Agent flagged as bot | Pass `header_template={"User-Agent": "Mozilla/5.0 ..."}`; respect robots.txt |
| `ValueError: expected str instance, NoneType found` on embed | Empty document content | Filter `docs = [d for d in docs if d.page_content.strip()]` before embedding |
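The empty-content fix in the last row can be made slightly more defensive. The sketch below assumes only that each document has a `page_content` attribute; `Doc` is a hypothetical stand-in class, and `drop_empty` additionally tolerates `page_content` being `None`, which the one-line filter in the table would not.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Doc:
    """Hypothetical stand-in for a document object with a page_content field."""
    page_content: Optional[str]
    metadata: dict = field(default_factory=dict)

def drop_empty(docs: List[Doc]) -> List[Doc]:
    """Remove documents whose content is None, empty, or whitespace only."""
    return [d for d in docs if d.page_content and d.page_content.strip()]

docs = [
    Doc("Real content."),
    Doc(""),          # empty page
    Doc("   \n"),     # whitespace only
    Doc(None),        # loader returned nothing
]

clean = drop_empty(docs)
print(f"kept {len(clean)} of {len(docs)} documents")
```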
## Examples
### Building a RAG retriever with hybrid search
End-to-end: load Markdown docs with language-aware chunking, embed with OpenAI
`text-embedding-3-small`, index in FAISS for local dev, wrap in an
`EnsembleRetriever` with BM25 at 0.4 weight and vector at 0.6.
See [Hybrid Search](referenc