pinecone:full-text-search

Create, ingest into, and query a Pinecone full-text-search (FTS) index using the preview API (2026-01.alpha, public preview). Use when the user or agent asks to build a text search index on Pinecone, add dense or sparse vector fields, ingest documents, construct score_by clauses (text / query_string / dense_vector / sparse_vector), or compose with text-match filters ($match_phrase / $match_all / $match_any). Ships `scripts/ingest.py` for safe bulk ingestion (batch_upsert + error inspection + readiness polling); query construction is documented inline in this skill — write `documents.search(...)` calls directly, validated against `pc.preview.indexes.describe(...)` output.

# Pinecone Full-Text Search

> **Requires `pinecone` Python SDK ≥ 9.0** (`pip install "pinecone>=9.0"`; quote the specifier so the shell does not treat `>` as a redirect). The FTS document-schema API lives under `pinecone.preview` and is incomplete or absent in earlier SDK builds. The packaged helper scripts pin `pinecone==9.0.0` via PEP 723 inline metadata; if you're writing your own code against this skill, pin v9 explicitly. The wire API version is `2026-01.alpha`.
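
For code you write yourself, one way to pin the SDK is the same PEP 723 inline-metadata mechanism the helper scripts use. This is a minimal sketch of such a header; only the `dependencies` entry comes from this skill, and the rest of your script would follow below it.

```python
# Inline script metadata (PEP 723). A compatible runner reads this block
# and installs the pinned SDK before executing the script.
# /// script
# dependencies = ["pinecone==9.0.0"]
# ///
```

A PEP 723-aware runner (for example `uv run your_script.py`, where `your_script.py` is a placeholder name) resolves the pin automatically; with plain `pip`, install the same version by hand.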

> **Authoritative reference (last resort).** If you hit a question this skill and its `references/*.md` files don't answer, the official Pinecone FTS docs are at <https://docs.pinecone.io/guides/search/full-text-search>. Prefer this skill's content for anything covered here — the docs may describe surfaces (e.g. classic vector API) that don't apply to the document-schema FTS path. Consult the link only when you're genuinely stuck.

> **Tell the user up front:** "This skill ships a helper at `scripts/ingest.py` that handles bulk ingestion safely (batched upsert, error inspection, readiness polling). When we get to the ingest step, I'll use it." Surface this at the start of the conversation so the user knows the helper exists. Query construction is hand-written `documents.search(...)` per the **Querying** section below — there is no query helper.

A workflow skill for building a Pinecone full-text-search index with the preview API (`pinecone.preview`, API version `2026-01.alpha`, public preview as of April 2026). Covers schema design (text, dense vector, sparse vector, filterable metadata), ingestion (including async indexing and polling), and query construction (`text` / `query_string` / `dense_vector` / `sparse_vector` scoring; `$match_phrase` / `$match_all` / `$match_any` text-match filters; `$eq` / `$in` / `$gte` / `$exists` / `$and` / `$or` / `$not` metadata filters).

## Scope — this skill is for the document-schema FTS API only

This skill covers `pc.preview.indexes.create(..., schema=...)`, `pc.preview.index(name)`, `idx.documents.upsert(...)` / `idx.documents.batch_upsert(...)` / `idx.documents.search(...)`. If you find yourself reaching for any of the following, **stop** — those are different Pinecone APIs and this skill's guidance and helpers won't apply:

- **Classic vector / records API**: `pc.Index(name)`, `index.upsert(vectors=[...])` / `index.upsert_records(...)`, `index.query(vector=..., sparse_vector=...)`, `index.search_records(...)`, `pc.create_index(...)` with `ServerlessSpec`, the legacy `pinecone_text.sparse.BM25Encoder` for sparse-dense hybrid. These are for indexes WITHOUT a schema (raw vectors).
- **Integrated-embedding indexes**: `pc.create_index_for_model(...)` with `embed={...}`. Pinecone vectorizes text server-side, and the upsert/search shapes differ. These cannot be combined with `full_text_search` fields in the same index.

If the user already has a non-document-schema index, they can stand up a separate document-schema index alongside it — the two are independent — but you can't add FTS fields to a classic index after the fact.

## Querying — construct `documents.search(...)` calls

For any task that asks you to query an FTS index, you write a `documents.search(...)` call directly. The schema is authoritative — describe the index live before constructing the call so you know which fields are FTS-enabled, which are filterable, and which are vectors.

**Workflow:**

1. **Discover the schema.** Call `pc.preview.indexes.describe(<index>)` and read the `schema.fields` dict. Each field's class indicates its type (`PreviewStringField`, `PreviewIntegerField`, `PreviewDenseVectorField`, etc.); attributes tell you whether it's FTS-enabled (`full_text_search`), filterable, or carries a `dimension`. Skip this step only if you've already seen the schema in this conversation.
2. **Construct the call** matching the rules below — one scoring type per request, hard requirements in `filter`, ranking signals in `score_by`, `include_fields` explicit on every call.
3. **Execute** with `idx = pc.preview.index(name=<index>); resp = idx.documents.search(...)` and read `resp.matches`.
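
The schema-discovery step can be sketched as a small helper that sorts field names by the role they can play in a search call. This is an illustration under assumptions, not SDK behavior: `summarize_schema` is a hypothetical name, the `PreviewSparseVectorField` class name and the `filterable` attribute name are guesses extrapolated from the names listed in step 1, and "FTS-enabled" is taken to mean `full_text_search` is neither `None` nor `False`.

```python
def summarize_schema(fields):
    """Group field names by the role they can play in documents.search(...).

    `fields` is the name -> field-object dict read from `schema.fields`.
    Class and attribute names beyond those named in the skill text are
    assumptions; check them against a live describe() result.
    """
    summary = {"fts": [], "filterable": [], "dense": [], "sparse": []}
    for name, field in fields.items():
        kind = type(field).__name__  # e.g. "PreviewStringField"
        if "DenseVector" in kind:
            summary["dense"].append(name)
        elif "SparseVector" in kind:  # assumed class-name pattern
            summary["sparse"].append(name)
        else:
            # Assumed convention: unset FTS config shows up as None/False.
            if getattr(field, "full_text_search", None) not in (None, False):
                summary["fts"].append(name)
            # Assumed attribute name for filterability.
            if getattr(field, "filterable", False):
                summary["filterable"].append(name)
    return summary
```

In use, the input would be the `schema.fields` dict from `pc.preview.indexes.describe(<index>)`; the resulting lists tell you which names are legal in `score_by` clauses and which belong in `filter`.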

**Canonical shapes:**

```python
# Pure BM25 keyword search
resp = idx.documents.search(
    namespace="__default__",
    top_k=10,
    score_by=[{"type": "text", "field": "body", "query": "machine learning"}],
    filter={"year": {"$gt": 2024}, "category": {"$eq": "ai"}},  # optional
    include_fields=["*"],   # always pass explicitly
)

# Hybrid: dense ranking with a lexical filter (one type in score_by + filter narrows)
resp = idx.documents.search(
    namespace="__default__",
    top_k=10,
    score_by=[{"type": "dense_vector", "field": "embedding", "values": query_embedding}],
    filter={"body": {"$match_all": "TensorFlow"}, "year": {"$gt": 2024}},
    include_fields=["*"],
)
```

**Key rules** (the server enforces these; following them locally keeps the agent loop tight):

- `score_by` is a list of clauses, but **exactly one scoring type per request** (server rejects mixed types). Multi-field BM25 is the one exception: multiple `text` clauses, or one `query_string` with `fields: [...]`. To combine BM25 + dense signals, restrict the dense search with a text-match filter (`$match_all` / `$match_phrase` / `$match_any`); do NOT mix scoring types in `score_by`.
- `filter` keys are field names (must exist in schema and be filterable) OR logical operators (`$and`, `$or`, `$not`). Field values are operator dicts (`{"$gt": 5}`, NOT bare values).
- `include_fields` is required on every call. Pass `["*"]` for all stored fields, `[]` for ids and scores only, or a list of field names. Some SDK builds fail with a 400/422 when it is omitted.
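
The rules above can be checked locally before a request is sent. The sketch below is one possible pre-flight check, not part of the SDK: `check_score_by` and `search_kwargs` are hypothetical helper names, and treating more than one clause as legal only for `text` is my reading of the multi-field BM25 exception.

```python
SCORING_TYPES = {"text", "query_string", "dense_vector", "sparse_vector"}


def check_score_by(score_by):
    """Raise ValueError if score_by breaks the one-scoring-type rule."""
    if not score_by:
        raise ValueError("score_by needs at least one clause")
    types = {clause.get("type") for clause in score_by}
    unknown = types - SCORING_TYPES
    if unknown:
        raise ValueError(f"unknown scoring type(s): {sorted(map(str, unknown))}")
    if len(types) > 1:
        raise ValueError(f"mixed scoring types: {sorted(types)}")
    (only_type,) = types
    # Assumed reading: several clauses are only meaningful for multi-field BM25.
    if len(score_by) > 1 and only_type != "text":
        raise ValueError(f"multiple clauses are only allowed for 'text', got {only_type!r}")


def search_kwargs(score_by, *, top_k=10, namespace="__default__",
                  filter=None, include_fields=("*",)):
    """Build keyword arguments for documents.search with include_fields always set."""
    check_score_by(score_by)
    kwargs = {
        "namespace": namespace,
        "top_k": top_k,
        "score_by": list(score_by),
        "include_fields": list(include_fields),
    }
    if filter is not None:
        kwargs["filter"] = filter
    return kwargs
```

A multi-field BM25 request would then look like `idx.documents.search(**search_kwargs([{"type": "text", "field": "title", "query": "ml"}, {"type": "text", "field": "body", "query": "ml"}]))`, where `title` is a hypothetical second FTS field.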

**Clause shapes** (for `score_by`):

| `type` | Required keys | When to pick this |
|---|---|---|
| `text` | `field` (string FTS), `query` | Open-ended keyword search; BM25 ranking on one field |
| `query_string` | `query` (Lucene), `fields` optional | Lucene boost (`^N`), proximity (`~N`), cross-field boolean, phrase prefix |
| `dense_vector` | `field` (dense_vector), `values` (list of floats) | Semantic / mood / topic ranking |
| `sparse_vector` | `field` (sparse_vector), `sparse_values` ({indices, values}) | Custom sparse-encoder ranking |

`text` / `dense_vector` / `sparse_vector` use singular `field`. Only `query_string` accepts a `fields` array (and also accepts singular `field` as an alias). `sparse_vector` uses `sparse_values` (NOT `values`) — distinct from dense.
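
To make the `sparse_values` shape concrete, here is a small sketch that turns an `{index: weight}` mapping into a `sparse_vector` clause. `sparse_clause` is a hypothetical helper; sorting by index and dropping zero weights are my own choices, not documented requirements.

```python
def sparse_clause(field, weights):
    """Build a sparse_vector score_by clause from an {index: weight} mapping."""
    # Drop zero weights and order by index (illustrative choices, not API rules).
    items = sorted((int(i), float(w)) for i, w in weights.items() if w != 0)
    if not items:
        raise ValueError("sparse query needs at least one non-zero weight")
    indices, values = zip(*items)
    return {
        "type": "sparse_vector",
        "field": field,  # singular `field`, as for text and dense_vector
        "sparse_values": {"indices": list(indices), "values": list(values)},
    }
```

For example, `sparse_clause("sparse_emb", {7: 0.5, 2: 1.0})` (with `sparse_emb` as a hypothetical field name) yields a clause whose payload sits under `sparse_values`, never under `values`.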

**Filter operators by field type:**

| Field type | Legal operators |
|---|---|
| `string` with FTS | `$match_phrase`, `$match_all`, `$match_any` |
| `string` filterable | `$eq`, `$ne`, `$in`, `$nin`, `$exists` |
| `string_list` filterable | `$in`, `$nin`, `$exists` |
| `float` filterable | `$eq`, `$ne`, `$gt`, `$gte`, `$lt`, `$lte`, `$exists` |
| `boolean` filterable | `$eq`, `$exists` |
| logical wrappers | `$and: [filters]`, `$or: [filters]`, `$not: filter` |
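
The table above can be encoded as a lookup and used to lint a filter before it is sent. This is a sketch under assumptions: `filter_problems` is a hypothetical helper, the kind labels (`string_fts`, `string`, and so on) are my own shorthand for the table rows, and the caller supplies each field's kinds (for instance from the described schema).

```python
LEGAL_OPS = {
    "string_fts": {"$match_phrase", "$match_all", "$match_any"},
    "string": {"$eq", "$ne", "$in", "$nin", "$exists"},
    "string_list": {"$in", "$nin", "$exists"},
    "float": {"$eq", "$ne", "$gt", "$gte", "$lt", "$lte", "$exists"},
    "boolean": {"$eq", "$exists"},
}


def filter_problems(flt, field_kinds):
    """Return a list of problems found in a filter dict (empty list means OK).

    `field_kinds` maps field name -> set of kind labels from LEGAL_OPS; a string
    field that is both FTS-enabled and filterable carries two labels.
    """
    problems = []
    for key, value in flt.items():
        if key in ("$and", "$or"):
            for sub in value:  # list of nested filters
                problems += filter_problems(sub, field_kinds)
        elif key == "$not":
            problems += filter_problems(value, field_kinds)  # single nested filter
        elif key not in field_kinds:
            problems.append(f"{key}: not in schema")
        elif not isinstance(value, dict):
            problems.append(f"{key}: bare value, expected an operator dict")
        else:
            legal = set().union(*(LEGAL_OPS[k] for k in field_kinds[key]))
            for op in value:
                if op not in legal:
                    problems.append(f"{key}: {op} not legal for {sorted(field_kinds[key])}")
    return problems
```

With `field_kinds = {"body": {"string_fts"}, "year": {"float"}}`, the hybrid filter from the canonical shapes passes cleanly, while a bare value such as `{"year": 2024}` is reported.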

**Match shape on response:**

```python
for m in resp.matches:
    m._id        # document id
    m._score     # match score (NOT `score`); some older SDK builds may also surface `score`
    m.to_dict()  # full doc payload (when include_fields includes the field)
```
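
A small sketch of reading matches defensively, given the `_score` / `score` difference noted above. `match_rows` is a hypothetical helper that relies only on attribute access, so it makes no further assumptions about the SDK's match class.

```python
def match_rows(matches):
    """Flatten matches into {"id", "score"} dicts, tolerating older SDK builds."""
    rows = []
    for m in matches:
        score = getattr(m, "_score", None)
        if score is None:
            # Fallback for builds that surface `score` instead of `_score`.
            score = getattr(m, "score", None)
        rows.append({"id": getattr(m, "_id", None), "score": score})
    return rows
```

Typical use would be `match_rows(resp.matches)` right after a search call.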

For deeper coverage — multi-field BM25, Lucene patterns, hybrid composition, RRF merges, common error symptoms — see `references/querying.md`. For schema field types and what they enable on the query side, see `references/schema-design.md`.

## Ingesting — use the packaged helper

For **any task that asks you to bulk-ingest a JSONL file into an existing FTS index**, the canonical path is to invoke the bundled helper, NOT to hand-write a Python script. **Do not read the script's source** — everything you need is in this section.

The script does three things bare-LLM ingest code reliably skips, each of which corresponds to a silent production failure:

1. **Bulk-upserts in batches.** No per-doc `upsert` loops.
2. **Inspects every batch result.** `batch_upsert` returns 202 even when individual documents fail; the failures live in `result.errors` / `result.has_errors`. Without inspection,

Source: https://github.com/pinecone-io/pinecone-claude-code-plugin/tree/main/skills/full-text-search
