podium-contact-dedup

Included with Lifetime

$97 forever

Deduplicate Podium contacts in production and survive the data-quality failures — phone-format inconsistency producing four contacts for one phone, merge-api ordering that silently discards the richer record, opt-out flags lost on merge re-enabling marketing on an opted-out person, soft-delete confusion, cross-location duplicate blind spots, and simultaneous-merge race conditions. Use when you build a Podium dedup pipeline, scan and detect duplicate clusters, normalize phones to e164 form across an entire contact corpus, preserve opt-out state through merges, or validate cross-location duplicate resolution. Trigger with "podium contact dedup", "podium phone normalization", "podium e164", "podium duplicate contacts", "podium merge contacts", "podium opt-out preservation", "podium cross-location dedup".

Backend & APIspodiumcontact-dedupe164-normalizationdata-qualityopt-out-preservationmerge-orchestrationscripts

What this skill does


# Podium Contact Dedup

## Overview

Deduplicate Podium contacts in production and operate the dedup pipeline at scale. This is not a one-shot cleanup script — it is the data-quality layer your integration runs continuously to keep the contact corpus sane while messages, calls, webchats, and reviews keep mutating it. Run it once and Sydney's "0412 345 678" walk-in stops creating a fifth contact next to the four that already exist as `+61 412 345 678`, `(04) 1234-5678`, `+61412345678`, and `0412345678`.

The six production failures this skill prevents:

1. **Phone format inconsistency** — `+61 412 345 678`, `0412 345 678`, `(04) 1234-5678`, `+61412345678` are all the same phone but produce four contacts. Operators paste numbers from a CRM, a phone screen, a written form, and a stored fragment; Podium dedups on exact string match, so all four survive and the next caller appears as a fifth.
2. **Merge API ordering loses fields** — Podium's merge endpoint takes a `primary` and a `duplicate`; whichever you pick as `primary` keeps its own fields, the other's fields are discarded. Pick the wrong record (newer but emptier) as primary and the older, richer record's name, tags, and conversation links vanish silently.
3. **Opt-out flag lost on merge** — duplicate had `marketing_opt_out=true`, primary had `marketing_opt_out=false`; naive merge keeps primary's flag and re-enables marketing on a person who explicitly opted out. This is a compliance incident (TCPA, GDPR Article 21, ACMA Spam Act) and a trust incident — the customer opted out, you marketed at them anyway.
4. **Soft-delete vs hard-delete semantic confusion** — Podium's `DELETE /contacts/{uid}` is reversible; the record is hidden, not destroyed. Treat it as terminal and you ship a "contact reappeared after we deleted them" support ticket every time an admin restores a contact via the UI. Hard-delete (purge) is a separate, irreversible endpoint with different scopes.
5. **Duplicate detection across locations** — same phone calls Sydney AND Burleigh Heads, two contacts created (one per location), per-location dedup misses it entirely. Cross-location dedup needs a separate routine keyed by `phone_natural_key` across the union of contacts in every location_uid, not just within one.
6. **Merge conflicts on simultaneous edits** — two operators (or one operator + one automated job) merge overlapping clusters at the same time; the second merge's `primary` may have already been merged into another record, the API silently merges into a now-stale target, and one operator's intent is dropped without surfacing the conflict.

## Prerequisites

- Python 3.10+ with the `phonenumbers` library (`pip install phonenumbers`)
- A working `podium-auth` integration (this skill calls Podium with an authenticated client)
- Read scope: `contacts.read`. Write scope: `contacts.write`. (`contacts.delete` only if hard-purge is in scope.)
- A local SQLite database for the natural-key index and merge state file (default `./podium-dedup.sqlite`)
- A default region for E.164 parsing (`AU` for Australian deployments, `US` for US — set per-tenant)

## Instructions

Build in this order. Each section neutralizes one production failure mode.

### 1. E.164 normalization with natural-key emission (neutralizes phone-format inconsistency)

Every contact's phone is parsed by the `phonenumbers` library into E.164 form, then hashed into a stable "natural key" suitable for an index lookup. Same human-readable phone → same key, regardless of formatting input.

```python
import phonenumbers
from phonenumbers import NumberParseException

def normalize_phone(raw: str, default_region: str = "AU") -> dict:
    """Return {e164, national, country, natural_key, valid} for any input format."""
    try:
        parsed = phonenumbers.parse(raw, default_region)
    except NumberParseException as e:
        return {"valid": False, "reason": f"parse_failed: {e}"}
    if not phonenumbers.is_valid_number(parsed):
        return {"valid": False, "reason": "not_a_valid_number"}
    e164 = phonenumbers.format_number(parsed, phonenumbers.PhoneNumberFormat.E164)
    national = phonenumbers.format_number(parsed, phonenumbers.PhoneNumberFormat.NATIONAL)
    return {
        "valid": True,
        "e164": e164,                         # +61412345678
        "national": national,                 # 0412 345 678
        "country": phonenumbers.region_code_for_number(parsed),
        "natural_key": e164,                  # the E.164 IS the natural key — no further hashing needed
    }
```

The natural key is the E.164 string itself. Hashing it adds nothing — E.164 is already canonical and bounded in length. Use the E.164 directly as the SQLite primary key on the natural-key index.

### 2. SQLite-backed natural-key index (neutralizes O(N²) duplicate scans)

A naive dedup scans every pair of contacts — O(N²) on a 50k-contact corpus is hours. Instead, build a `(natural_key → [contact_uid, ...])` index in SQLite once, then duplicate detection is O(N) over the index.

```sql
CREATE TABLE IF NOT EXISTS contact_index (
    contact_uid    TEXT PRIMARY KEY,
    location_uid   TEXT NOT NULL,
    natural_key    TEXT NOT NULL,   -- E.164
    raw_phone      TEXT,
    name           TEXT,
    field_count    INTEGER NOT NULL DEFAULT 0,
    marketing_opt_out  INTEGER NOT NULL DEFAULT 0,
    sms_opt_out        INTEGER NOT NULL DEFAULT 0,
    email_opt_out      INTEGER NOT NULL DEFAULT 0,
    deleted_at_podium  TEXT,        -- ISO8601, NULL if live
    updated_at_podium  TEXT NOT NULL,
    indexed_at         TEXT NOT NULL
);
CREATE INDEX IF NOT EXISTS idx_natural_key ON contact_index(natural_key);
CREATE INDEX IF NOT EXISTS idx_natural_key_per_location ON contact_index(natural_key, location_uid);
```

The `field_count` column is precomputed at index time so the merge orchestrator picks the richer record as `primary` without re-fetching every record.

### 3. Duplicate detection with confidence scoring (neutralizes blind merges)

A cluster is a set of contacts sharing the same `natural_key`. Within a cluster, each pair gets a confidence score in `[0.0, 1.0]`:

| Factor | Weight |
|---|---|
| Same E.164 (always true within a cluster) | 0.60 — required floor |
| Same name (case-insensitive, normalized) | +0.20 |
| Same email | +0.15 |
| Overlapping tags | +0.05 |

Only clusters with all pairwise scores `>= 0.80` auto-merge by default. Lower-scored clusters surface for human review.

```python
def cluster_confidence(a: dict, b: dict) -> float:
    score = 0.60   # same natural_key by construction
    if a.get("name") and a["name"].strip().lower() == (b.get("name") or "").strip().lower():
        score += 0.20
    if a.get("email") and a["email"].strip().lower() == (b.get("email") or "").strip().lower():
        score += 0.15
    a_tags, b_tags = set(a.get("tags") or []), set(b.get("tags") or [])
    if a_tags & b_tags:
        score += 0.05
    return round(min(score, 1.0), 4)
```

### 4. Merge orchestrator with primary selection (neutralizes lost richer record)

For each auto-mergeable cluster, pick the `primary` by this deterministic rule, in order:

1. **Most fields populated** (highest `field_count`) — the richer record wins.
2. **Most recently updated** (`updated_at_podium`) — break ties toward fresher state.
3. **Lowest contact_uid** (lexical) — final, deterministic, reproducible tiebreak.

Every other contact in the cluster is a `duplicate` to be merged INTO the primary. Never trust caller-supplied ordering — always compute primary inside the orchestrator.

```python
def select_primary(cluster: list[dict]) -> dict:
    return max(
        cluster,
        key=lambda c: (c["field_count"], c["updated_at_podium"], -ord_key(c["contact_uid"]))
    )

def ord_key(uid: str) -> int:
    # Stable, deterministic tiebreak — lower uid sorts first
    return sum(ord(c) for c in uid)
```

### 5. Opt-out preservation by union (neutralizes compliance re-enable)

The strongest setting wins, always. If any

Files: 11

Size: 113.0 KB

Complexity: 80/100

Category: Backend & APIs

Source: https://github.com/jeremylongshore/claude-code-plugins-plus-skills/tree/main/plugins/saas-packs/podium-pack/skills/podium-contact-dedup

Related in Backend & APIs

jfrog

Included

Interact with the JFrog Platform via the JFrog CLI and REST/GraphQL APIs. Use this skill when the user wants to manage Artifactory repositories, upload or download artifacts, manage builds, configure permissions, manage users and groups, work with access tokens, configure JFrog CLI servers, search artifacts, manage properties, set up replication, manage JFrog Projects, run security audits or scans, look up CVE details, query exposures scan results from JFrog Advanced Security, manage release bundles and lifecycle operations, aggregate or export platform data, or perform any JFrog Platform administration task. Also use when the user mentions jf, jfrog, artifactory, xray, distribution, evidence, apptrust, onemodel, graphql, workers, mission control, curation, advanced security, exposures, or any JFrog product name.

Backend & APIsscripts

cupynumeric-migration-readiness

Included

Pre-migration readiness assessor for porting NumPy to cuPyNumeric. Use BEFORE substantial porting work begins when the user asks whether code will scale on GPU, whether they should migrate to cuPyNumeric, which NumPy patterns transfer cleanly, what must be refactored before porting, or mentions pre-port assessment, scaling analysis, or refactor planning. Inspect the user's source code, look up NumPy usage, cross-reference the cuPyNumeric API support manifest, and distinguish distributed-scaling-friendly patterns from blockers such as unsupported APIs, scalar synchronization, host round-trips, Python/object-heavy control flow, shape/data-dependent branching, and in-place mutation hazards. Produce a verdict of READY, LIGHT REFACTOR, SIGNIFICANT REFACTOR, or NOT RECOMMENDED, with concrete refactor pointers.

Backend & APIsscripts

alibabacloud-data-agent-skill

Included

Invoke Alibaba Cloud Apsara Data Agent for Analytics via CLI to perform natural language-driven data analysis on enterprise databases. Data Agent for Analytics is an intelligent data analysis agent developed by Alibaba Cloud Database team for enterprise users. It automatically completes requirement analysis, data understanding, analysis insights, and report generation based on natural language descriptions. This tool supports: discovering data resources (instances/databases/tables) managed in DMS, initiating query or deep analysis sessions, real-time progress tracking, and retrieving analysis conclusions and generated reports. Use this Skill when users need to query databases, analyze data trends, generate data reports, ask questions in natural language, or mention "Data Agent", "data analysis", "database query", "SQL analysis", "data insights".

Backend & APIsscripts

token-optimizer

Included

Reduce OpenClaw token usage and API costs through smart model routing, heartbeat optimization, budget tracking, and native 2026.2.15 features (session pruning, bootstrap size limits, cache TTL alignment). Use when token costs are high, API rate limits are being hit, or hosting multiple agents at scale. The 4 executable scripts (context_optimizer, model_router, heartbeat_optimizer, token_tracker) are local-only — no network requests, no subprocess calls, no system modifications. Reference files (PROVIDERS.md, config-patches.json) document optional multi-provider strategies that require external API keys and network access if you choose to use them. See SECURITY.md for full breakdown.

Backend & APIsscripts

resend-cli

Included

Use this skill when the task is specifically about operating Resend from an AI agent, terminal session, or CI job via the official resend CLI: installing/authenticating the CLI, sending/listing/updating/cancelling emails, batch sends, domains and DNS, webhooks and local listeners, inbound receiving, contacts, topics, segments, broadcasts, templates, API keys, profiles, or debugging Resend CLI/API failures. Trigger on mentions of Resend CLI, `resend`, `resend doctor`, `resend emails send`, `resend domains`, `resend webhooks listen`, `resend emails receiving`, or agent-friendly terminal automation.

Backend & APIsscripts

alibabacloud-odps-maxframe-coding

Included

Use this skill for MaxFrame SDK development and documentation navigation on Alibaba Cloud MaxCompute (ODPS). Helps answer MaxFrame API, concept, official example, and supported pandas API questions; create data processing programs; read/write MaxCompute tables; debug jobs (remote or local); and build custom DPE runtime images. Trigger when users mention MaxFrame, MaxCompute with MaxFrame, ODPS table processing, DPE runtime, MaxFrame docs/examples, DataFrame/Tensor operations, or GPU runtime setup. Works for both English and Chinese queries about Alibaba Cloud data processing with MaxFrame.

Backend & APIsscripts