databricks-upgrade-migration

Upgrade Databricks runtime versions and migrate between features. Use when upgrading DBR versions, migrating to Unity Catalog, or updating deprecated APIs and features. Trigger with phrases like "databricks upgrade", "DBR upgrade", "databricks migration", "unity catalog migration", "hive to unity".

# Databricks Upgrade & Migration

## Overview

Upgrade Databricks Runtime versions and migrate from Hive Metastore to Unity Catalog. Covers version compatibility, deprecated config removal, table migration via SYNC/CTAS, API endpoint updates, and Delta protocol upgrades.

## Prerequisites

- Admin access to workspace
- Test environment (dev/staging) for validation before prod
- Inventory of current workloads and dependencies

## Instructions

### Step 1: Runtime Version Upgrade

#### Version Compatibility Matrix

| Current DBR | Target DBR | Key Changes | Effort |
|-------------|------------|-------------|--------|
| 12.2 LTS | 13.3 LTS | Spark 3.4, Python 3.10 default | Low |
| 13.3 LTS | 14.3 LTS | Spark 3.5, improved AQE, Liquid Clustering in Public Preview (GA from 15.2) | Medium |
| 14.3 LTS | 15.4 LTS | Python 3.11 default, credential passthrough deprecated, Unity Catalog recommended over legacy DBFS mounts | High |
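
The matrix compares runtimes by major and minor version, which is easy to get wrong with plain string comparison (`"9.1"` sorts after `"14.3"`). Below is a minimal sketch of a numeric comparison, assuming the `spark_version` key starts with `<major>.<minor>.` as in `14.3.x-scala2.12`; `DbrVersion`, `parse_dbr_version`, and `is_upgrade` are hypothetical helper names, not part of the Databricks SDK.

```python
import re
from typing import NamedTuple


class DbrVersion(NamedTuple):
    """Numeric (major, minor) pair for a Databricks Runtime version."""
    major: int
    minor: int


def parse_dbr_version(spark_version: str) -> DbrVersion:
    """Extract (major, minor) from a key such as '14.3.x-scala2.12'."""
    match = re.match(r"^(\d+)\.(\d+)\.", spark_version)
    if match is None:
        raise ValueError(f"Unrecognized spark_version: {spark_version!r}")
    return DbrVersion(int(match.group(1)), int(match.group(2)))


def is_upgrade(current: str, target: str) -> bool:
    """True when target is a strictly newer runtime than current."""
    return parse_dbr_version(target) > parse_dbr_version(current)
```

Tuples compare element by element, so `is_upgrade("9.1.x-scala2.12", "14.3.x-scala2.12")` is evaluated numerically rather than lexically.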

#### Automated Upgrade Script

```python
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

def plan_cluster_upgrade(
    cluster_id: str,
    target_version: str = "14.3.x-scala2.12",
    dry_run: bool = True,
) -> dict:
    """Plan and optionally execute a DBR version upgrade."""
    cluster = w.clusters.get(cluster_id)
    plan = {
        "cluster_id": cluster_id,
        "cluster_name": cluster.cluster_name,
        "current_version": cluster.spark_version,
        "target_version": target_version,
        "removals": [],
        "warnings": [],
    }

    # Check for deprecated Spark configs
    deprecated = {
        "spark.databricks.delta.preview.enabled": "GA in 13.x+",
        "spark.sql.legacy.createHiveTableByDefault": "Removed in 14.x+",
        "spark.databricks.passthrough.enabled": "Deprecated in 15.0+",
        "spark.sql.legacy.allowNonEmptyLocationInCTAS": "Removed in 14.x+",
    }

    for key, reason in deprecated.items():
        if cluster.spark_conf and key in cluster.spark_conf:
            plan["removals"].append({"config": key, "reason": reason})

    # Check Python version compatibility (parse the major version instead of substring matching)
    target_major = int(target_version.split(".")[0])
    if target_major >= 13:
        plan["warnings"].append("Default Python is 3.10 or newer on this runtime: verify library compatibility")

    if not dry_run:
        clean_conf = {
            k: v for k, v in (cluster.spark_conf or {}).items()
            if k not in deprecated
        }
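        # clusters.edit replaces the whole cluster spec, so pass every
        # attribute you need to keep (only a subset is shown here).
        w.clusters.edit(
            cluster_id=cluster_id,
            spark_version=target_version,
            cluster_name=cluster.cluster_name,
            spark_conf=clean_conf,
            node_type_id=cluster.node_type_id,
            num_workers=cluster.num_workers,
            autoscale=cluster.autoscale,
        )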
        plan["status"] = "APPLIED"
    else:
        plan["status"] = "DRY_RUN"

    return plan

# Dry run first
for cluster in w.clusters.list():
    plan = plan_cluster_upgrade(cluster.cluster_id, dry_run=True)
    if plan["removals"] or plan["warnings"]:
        print(f"\n{plan['cluster_name']}:")
        for r in plan["removals"]:
            print(f"  REMOVE: {r['config']} ({r['reason']})")
        for w_ in plan["warnings"]:
            print(f"  WARN: {w_}")
```
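
The config cleanup inside the upgrade script can also be kept as a pure function, which makes it testable without a workspace connection. This is a sketch under that assumption; `split_spark_conf` and `DEPRECATED_KEYS` are hypothetical names, and the key set simply mirrors the `deprecated` dictionary above.

```python
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical constant: the same keys the upgrade script flags as deprecated.
DEPRECATED_KEYS = frozenset({
    "spark.databricks.delta.preview.enabled",
    "spark.sql.legacy.createHiveTableByDefault",
    "spark.databricks.passthrough.enabled",
    "spark.sql.legacy.allowNonEmptyLocationInCTAS",
})


def split_spark_conf(
    spark_conf: Optional[Dict[str, str]],
    deprecated: Iterable[str] = DEPRECATED_KEYS,
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Return (kept, removed) without mutating the input mapping."""
    deprecated_set = set(deprecated)
    conf = spark_conf or {}  # clusters without custom configs report None
    kept = {k: v for k, v in conf.items() if k not in deprecated_set}
    removed = {k: v for k, v in conf.items() if k in deprecated_set}
    return kept, removed
```

Keeping the removed entries alongside the cleaned ones gives you a record to restore from if a workload turns out to depend on one of them.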

### Step 2: Unity Catalog Migration (Hive Metastore)

#### Inventory Current Tables

```sql
-- List all Hive Metastore tables to migrate
SHOW DATABASES IN hive_metastore;
SHOW TABLES IN hive_metastore.my_database;

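-- Get table sizes for migration planning
-- hive_metastore has no information_schema, so inspect tables one at a time.
-- Delta tables: read sizeInBytes and numFiles from the result
DESCRIBE DETAIL hive_metastore.my_database.customers;

-- Any table: read the Type, Provider, and Location rows
DESCRIBE TABLE EXTENDED hive_metastore.my_database.customers;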
```

#### Migrate Tables

```sql
-- Create Unity Catalog destination
CREATE CATALOG IF NOT EXISTS analytics;
CREATE SCHEMA IF NOT EXISTS analytics.migrated;

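-- Option A: SYNC (in-place: keeps data where it is, adds UC metadata)
-- Best for external tables already on cloud storage
-- Preview the outcome with DRY RUN, then run the real sync
SYNC SCHEMA analytics.migrated FROM hive_metastore.my_database DRY RUN;
SYNC SCHEMA analytics.migrated FROM hive_metastore.my_database;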

-- Option B: CTAS (copies data — creates managed Delta tables)
-- Best for small-medium tables or format conversion
CREATE TABLE analytics.migrated.customers AS
SELECT * FROM hive_metastore.my_database.customers;

-- Option C: DEEP CLONE (best for Delta-to-Delta; copies data and table metadata,
-- but the clone starts its own history independent of the source)
CREATE TABLE analytics.migrated.orders
DEEP CLONE hive_metastore.my_database.orders;

-- Migrate views
CREATE VIEW analytics.migrated.customer_summary AS
SELECT * FROM analytics.migrated.customers
WHERE active = true;

-- Verify migration
SELECT 'source' AS system, COUNT(*) AS rows
FROM hive_metastore.my_database.customers
UNION ALL
SELECT 'target', COUNT(*)
FROM analytics.migrated.customers;

-- Grant access (Unity Catalog uses USE CATALOG / USE SCHEMA, not the legacy USAGE privilege)
GRANT USE CATALOG ON CATALOG analytics TO `data-team`;
GRANT USE SCHEMA, SELECT ON SCHEMA analytics.migrated TO `data-team`;
```
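
Choosing between the three options can be sketched as a small decision function. It assumes `table_type` and `provider` are taken from the `Type` and `Provider` rows of `DESCRIBE TABLE EXTENDED`; `choose_migration_option` is a hypothetical helper, and the rules only restate the guidance in the SQL comments above.

```python
def choose_migration_option(table_type: str, provider: str) -> str:
    """Map table metadata to one of the migration options described above."""
    kind = table_type.strip().upper()
    fmt = provider.strip().lower()
    if kind == "VIEW":
        # Views hold no data; recreate them against the migrated tables.
        return "RECREATE VIEW"
    if kind == "EXTERNAL":
        # Data already lives on cloud storage, so register it in place.
        return "SYNC"
    if fmt == "delta":
        # Managed Delta table: copy data and metadata Delta-to-Delta.
        return "DEEP CLONE"
    # Managed non-Delta table: copy and convert to a managed Delta table.
    return "CTAS"
```

Treat the result as a starting point for the migration plan; large managed tables may still justify a different option than the default.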

### Step 3: API Endpoint Migration

```python
# Jobs API 2.0 → 2.1 changes
# Old: POST /api/2.0/jobs/create with flat task definition
# New: POST /api/2.1/jobs/create with tasks[] array (multi-task)

# Old (single task):
old_config = {
    "name": "my-job",
    "existing_cluster_id": "abc-123",
    "notebook_task": {"notebook_path": "/path"}
}

# New (multi-task):
new_config = {
    "name": "my-job",
    "tasks": [{
        "task_key": "main",
        "existing_cluster_id": "abc-123",
        "notebook_task": {"notebook_path": "/path"}
    }]
}

# The Python SDK uses the latest API version automatically
from databricks.sdk import WorkspaceClient
from databricks.sdk.service.jobs import Task, NotebookTask

w = WorkspaceClient()
job = w.jobs.create(
    name="my-job",
    tasks=[Task(
        task_key="main",
        existing_cluster_id="abc-123",
        notebook_task=NotebookTask(notebook_path="/path"),
    )],
)
```
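
For stored job definitions that still use the flat 2.0 layout, the conversion shown above can be sketched as a dictionary transform. `to_multitask_config` is a hypothetical helper, and `TASK_LEVEL_KEYS` is an assumption covering only common task-scoped settings; extend it to match the fields your jobs actually use.

```python
import copy
from typing import Any, Dict

# Assumption: flat-format settings that belong to a task in the tasks[] layout.
TASK_LEVEL_KEYS = (
    "existing_cluster_id",
    "new_cluster",
    "notebook_task",
    "spark_jar_task",
    "spark_python_task",
    "spark_submit_task",
    "libraries",
)


def to_multitask_config(old_config: Dict[str, Any], task_key: str = "main") -> Dict[str, Any]:
    """Wrap a flat single-task job definition in a one-element tasks[] array."""
    if "tasks" in old_config:
        # Already multi-task: return an independent copy unchanged.
        return copy.deepcopy(old_config)
    job: Dict[str, Any] = {}
    task: Dict[str, Any] = {"task_key": task_key}
    for key, value in old_config.items():
        target = task if key in TASK_LEVEL_KEYS else job
        target[key] = copy.deepcopy(value)
    job["tasks"] = [task]
    return job
```

Applied to `old_config` from the block above, this yields the same structure as `new_config`.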

### Step 4: Delta Protocol Upgrade

```sql
-- Check current protocol version
DESCRIBE DETAIL analytics.silver.orders;
-- Look at: minReaderVersion, minWriterVersion

-- Upgrade to support Deletion Vectors (reader v3, writer v7)
ALTER TABLE analytics.silver.orders
SET TBLPROPERTIES (
    'delta.minReaderVersion' = '3',
    'delta.minWriterVersion' = '7',
    'delta.enableDeletionVectors' = 'true'
);

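-- Enable Liquid Clustering (replaces partitioning + Z-order)
-- Requires an unpartitioned Delta table; existing data is reclustered by later OPTIMIZE runs
ALTER TABLE analytics.silver.orders CLUSTER BY (order_date, region);

-- WARNING: Protocol upgrades cannot be undone by lowering the version properties.
-- To step back, use ALTER TABLE ... DROP FEATURE where your runtime supports it,
-- or rewrite the data into a new table (for example with CTAS). A DEEP CLONE
-- typically carries the source's table properties with it.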
```
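
Before running the `ALTER TABLE`, it can help to check which tables are actually below the target protocol. The following is a minimal sketch that takes the `minReaderVersion` and `minWriterVersion` values reported by `DESCRIBE DETAIL`; `protocol_gap` is a hypothetical helper, and the default requirement is the reader v3 / writer v7 pair named above.

```python
from typing import List, Tuple

# Reader/writer versions the section above cites for Deletion Vectors.
DELETION_VECTORS_PROTOCOL: Tuple[int, int] = (3, 7)


def protocol_gap(
    min_reader: int,
    min_writer: int,
    required: Tuple[int, int] = DELETION_VECTORS_PROTOCOL,
) -> List[str]:
    """List the protocol fields that are below the required versions."""
    required_reader, required_writer = required
    gaps = []
    if min_reader < required_reader:
        gaps.append("minReaderVersion")
    if min_writer < required_writer:
        gaps.append("minWriterVersion")
    return gaps
```

An empty list means the version numbers already meet the requirement; it does not by itself confirm that a specific table feature is enabled, so still check the table properties.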

## Output

- DBR version upgraded with deprecated configs removed
- Hive Metastore tables migrated to Unity Catalog (SYNC/CTAS/DEEP CLONE)
- API calls updated to latest SDK patterns
- Delta protocol upgraded for Deletion Vectors and Liquid Clustering

## Error Handling

| Issue | Cause | Solution |
|-------|-------|----------|
| Library incompatible with new DBR | Python/Java version change | Pin library versions in `requirements.txt`, test in staging |
| `PERMISSION_DENIED` after migration | Missing Unity Catalog grants | Run `GRANT USE CATALOG ON CATALOG`, `GRANT USE SCHEMA` and `GRANT SELECT ON SCHEMA` |
| `SYNC` fails | Storage location inaccessible | Check cloud storage permissions and network config |
| Protocol downgrade error | Cannot lower protocol version | Use `ALTER TABLE ... DROP FEATURE` where supported, or rewrite the data into a new table with CTAS |
| `Table not found` after migration | Notebooks still reference `hive_metastore` | Update all references to `catalog.schema.table` format |
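
Updating stale references, the fix in the last row, can be partly automated for notebook or SQL source text. This is a deliberately naive sketch: `rewrite_table_refs` is a hypothetical helper that matches unquoted `hive_metastore.<db>.<table>` and two-part `<db>.<table>` names with a regular expression, so it will miss backtick-quoted identifiers and should be reviewed before anything is committed.

```python
import re


def rewrite_table_refs(source_text: str, source_db: str, target_schema: str) -> str:
    """Rewrite references to tables in source_db as target_schema.<table>."""
    pattern = re.compile(
        r"\b(?:hive_metastore\.)?" + re.escape(source_db) + r"\.(\w+)\b"
    )
    # A function replacement avoids backslash handling in target_schema.
    return pattern.sub(lambda m: f"{target_schema}.{m.group(1)}", source_text)
```

Run it once per source database, and diff the output against the original file to confirm that only table references changed.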

## Examples

### Quick Upgrade Check

```bash
# Current state
echo "CLI: $(databricks --version)"
echo "SDK: $(pip show databricks-sdk | grep Version)"
# Legacy CLI syntax; newer CLI releases take the ID positionally: databricks clusters get "$CID"
echo "Cluster DBR: $(databricks clusters get --cluster-id "$CID" | jq -r .spark_version)"

# Upgrade SDK
pip install --upgrade databricks-sdk
```

### Bulk Table Migration Script

```python
# Migrate all tables in a Hive Metastore database
source_db = "hive_metastore.legacy_data"
target_schema = "analytics.migrated"

tables = spark.sql(f"SHOW TABLES IN {source_db}").collect()
for t in tables:
    table_name = t.tableName
    print(f"Migrating {table_name}...")
    spark.sql(f"""
        CREATE TABLE IF NOT EXISTS {target_schema}.{table_name}
        AS SELECT * FROM {source_db}.{table_name}
    """)
    # Verify
    src_count = spark.table(f"{source_db}.{table_name}").count()
    tgt_count = spark.table(f"{target_schema}.{table_name}").count()
```

Source: https://github.com/jeremylongshore/claude-code-plugins-plus-skills/tree/main/plugins/saas-packs/databricks-pack/skills/databricks-upgrade-migration
