databricks-core-workflow-b

Execute Databricks secondary workflow: MLflow model training and deployment. Use when building ML pipelines, training models, or deploying to production. Trigger with phrases like "databricks ML", "mlflow training", "databricks model", "feature store", "model registry".

# Databricks Core Workflow B: MLflow Training & Serving

## Overview

Full ML lifecycle on Databricks: Feature Engineering Client for discoverable features, MLflow experiment tracking with auto-logging, Unity Catalog model registry with aliases (`champion`/`challenger`), and Mosaic AI Model Serving endpoints for real-time inference via REST API.

## Prerequisites

- Completed `databricks-install-auth` and `databricks-core-workflow-a`
- `databricks-sdk`, `databricks-feature-engineering`, `mlflow`, `scikit-learn` installed
- Unity Catalog enabled (required for model registry)
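
A quick way to confirm the Python prerequisites is to test whether each module can be located before running any step. The sketch below is illustrative only: `missing_modules` is a hypothetical helper, and the module list is taken from the imports used in this guide (`sklearn` is the import name of `scikit-learn`).

```python
from importlib.util import find_spec

# Import names used by the steps in this guide (assumed list, adjust as needed).
REQUIRED_MODULES = [
    "databricks.sdk",
    "databricks.feature_engineering",
    "mlflow",
    "sklearn",
]


def missing_modules(modules):
    """Return the module names that cannot be located in this environment."""
    missing = []
    for name in modules:
        try:
            if find_spec(name) is None:
                missing.append(name)
        except (ImportError, ValueError):
            # ImportError is raised when a parent package (e.g. "databricks") is absent.
            missing.append(name)
    return missing


print("Missing modules:", missing_modules(REQUIRED_MODULES) or "none")
```

An empty result only shows that the packages are importable; it does not verify authentication or that Unity Catalog is enabled.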

## Instructions

### Step 1: Feature Engineering with Feature Store

Create a feature table in Unity Catalog so features are discoverable and reusable.

```python
from databricks.feature_engineering import FeatureEngineeringClient
from pyspark.sql import SparkSession
import pyspark.sql.functions as F

spark = SparkSession.builder.getOrCreate()
fe = FeatureEngineeringClient()

# Build features from gold layer tables
user_features = (
    spark.table("prod_catalog.gold.user_events")
    .groupBy("user_id")
    .agg(
        F.count("event_id").alias("total_events"),
        F.avg("session_duration_sec").alias("avg_session_sec"),
        F.max("event_timestamp").alias("last_active"),
        F.countDistinct("event_type").alias("unique_event_types"),
        F.datediff(F.current_date(), F.max("event_timestamp")).alias("days_since_last_active"),
    )
)

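# Register as a feature table. create_table() fails if the table already exists;
# to refresh an existing table use fe.write_table(name=..., df=..., mode="merge").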
fe.create_table(
    name="prod_catalog.ml_features.user_behavior",
    primary_keys=["user_id"],
    df=user_features,
    description="User behavioral features for churn prediction",
)
```
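
To make the meaning of each aggregated column concrete, here is a plain-Python sketch of the same per-user aggregation on a few in-memory rows. The sample events and the `user_features_py` helper are hypothetical, and the sketch assumes no null values (Spark's `count` and `avg` skip nulls, which this version does not model).

```python
from collections import defaultdict
from datetime import date, datetime

# Hypothetical sample rows with the columns read by the Spark aggregation.
events = [
    {"user_id": "u1", "event_id": 1, "event_type": "view",
     "session_duration_sec": 100.0, "event_timestamp": datetime(2024, 1, 1, 9, 0)},
    {"user_id": "u1", "event_id": 2, "event_type": "click",
     "session_duration_sec": 200.0, "event_timestamp": datetime(2024, 1, 3, 9, 0)},
    {"user_id": "u1", "event_id": 3, "event_type": "view",
     "session_duration_sec": 60.0, "event_timestamp": datetime(2024, 1, 5, 9, 0)},
    {"user_id": "u2", "event_id": 4, "event_type": "purchase",
     "session_duration_sec": 30.0, "event_timestamp": datetime(2024, 1, 9, 9, 0)},
]


def user_features_py(rows, today):
    """Group rows by user_id and compute the five feature columns."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["user_id"]].append(row)

    features = {}
    for user_id, user_rows in grouped.items():
        last_active = max(r["event_timestamp"] for r in user_rows)
        features[user_id] = {
            "total_events": len(user_rows),
            "avg_session_sec": sum(r["session_duration_sec"] for r in user_rows) / len(user_rows),
            "last_active": last_active,
            "unique_event_types": len({r["event_type"] for r in user_rows}),
            "days_since_last_active": (today - last_active.date()).days,
        }
    return features


features = user_features_py(events, today=date(2024, 1, 10))
print(features["u1"])
```

Note that `last_active` is a timestamp, so it is useful for lookups and freshness checks but is not directly usable as a numeric model input.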

### Step 2: MLflow Experiment Tracking

```python
import mlflow
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

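# Point MLflow to the Databricks tracking server and the Unity Catalog registry
mlflow.set_tracking_uri("databricks")
mlflow.set_registry_uri("databricks-uc")
mlflow.set_experiment("/Users/<your-email>/churn-prediction")  # replace with your workspace user path

# Load features.
# NOTE: the feature table built in Step 1 has no `churned` label column. Join your
# labels onto features_df before the lines below, otherwise drop() raises KeyError.
features_df = spark.table("prod_catalog.ml_features.user_behavior").toPandas()
# Drop the key, the label, and the raw `last_active` timestamp (not a numeric feature)
X = features_df.drop(columns=["user_id", "churned", "last_active"])
y = features_df["churned"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)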

# Train with experiment tracking
with mlflow.start_run(run_name="gbm-baseline") as run:
    params = {"n_estimators": 200, "max_depth": 5, "learning_rate": 0.1}
    mlflow.log_params(params)

    model = GradientBoostingClassifier(**params)
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)

    metrics = {
        "accuracy": accuracy_score(y_test, y_pred),
        "precision": precision_score(y_test, y_pred),
        "recall": recall_score(y_test, y_pred),
        "f1": f1_score(y_test, y_pred),
    }
    mlflow.log_metrics(metrics)

    # Log and register the model; input_example lets MLflow infer the signature
    # that Unity Catalog registration and serving-time validation rely on
    mlflow.sklearn.log_model(
        model,
        artifact_path="model",
        input_example=X_test.iloc[:5],
        registered_model_name="prod_catalog.ml_models.churn_predictor",
    )
    print(f"Run {run.info.run_id}: accuracy={metrics['accuracy']:.3f}")
```
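
The four logged metrics all derive from the same confusion-matrix counts. As a worked illustration, the sketch below computes them by hand for a small, made-up set of binary labels; `binary_metrics` is a hypothetical helper, and it returns 0.0 when a denominator is zero, which is a simplifying assumption.

```python
def binary_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall, and F1 for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # churners correctly flagged
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed churners
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # non-churners correctly passed

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Made-up labels: 3 true positives, 1 false positive, 1 false negative, 3 true negatives.
y_true_demo = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred_demo = [1, 1, 0, 0, 0, 1, 0, 1]
demo_metrics = binary_metrics(y_true_demo, y_pred_demo)
print(demo_metrics)
```

For churn data, where positives are usually the minority class, precision and recall tend to be more informative than accuracy alone.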

### Step 3: Model Registry with Aliases

Unity Catalog model registry replaces legacy stages with aliases (`champion`, `challenger`).

```python
from mlflow import MlflowClient

client = MlflowClient()
model_name = "prod_catalog.ml_models.churn_predictor"

# List versions
for mv in client.search_model_versions(f"name='{model_name}'"):
    print(f"v{mv.version}: status={mv.status}, aliases={mv.aliases}")

# Promote best version to champion
client.set_registered_model_alias(model_name, alias="champion", version="3")

# Load model by alias in downstream code
champion = mlflow.pyfunc.load_model(f"models:/{model_name}@champion")
predictions = champion.predict(X_test)
```
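
The version passed to `set_registered_model_alias` is hard-coded above. One possible way to choose it is to compare a metric across candidate versions, as in this sketch. The `candidates` list and its metric values are made up for illustration, and `pick_champion` is a hypothetical helper, not part of MLflow.

```python
def pick_champion(candidates, metric="f1"):
    """Return the version string of the candidate with the highest metric value."""
    if not candidates:
        raise ValueError("no candidate versions to compare")
    best = max(candidates, key=lambda c: c["metrics"][metric])
    return str(best["version"])


# Hypothetical evaluation results, one entry per registered model version.
candidates = [
    {"version": "1", "metrics": {"f1": 0.71}},
    {"version": "2", "metrics": {"f1": 0.74}},
    {"version": "3", "metrics": {"f1": 0.78}},
]

best_version = pick_champion(candidates)
print(f"Promote version {best_version} to champion")
```

The returned string can then be used as the `version` argument of `client.set_registered_model_alias(model_name, alias="champion", version=best_version)`.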

### Step 4: Deploy Model Serving Endpoint

Mosaic AI Model Serving creates a REST API endpoint with auto-scaling.

```python
from databricks.sdk import WorkspaceClient
from databricks.sdk.service.serving import (
    EndpointCoreConfigInput, ServedEntityInput,
)

w = WorkspaceClient()

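# Create the serving endpoint and block until it is ready.
# This call fails if the name already exists; to change an existing endpoint,
# use w.serving_endpoints.update_config_and_wait(...) instead.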
endpoint = w.serving_endpoints.create_and_wait(
    name="churn-predictor-prod",
    config=EndpointCoreConfigInput(
        served_entities=[
            ServedEntityInput(
                entity_name="prod_catalog.ml_models.churn_predictor",
                entity_version="3",
                workload_size="Small",
                scale_to_zero_enabled=True,
            )
        ]
    ),
)
print(f"Endpoint ready: {endpoint.name} ({endpoint.state.ready})")
```

### Step 5: Query the Serving Endpoint

```python
import requests

# Score via REST API (w.config.token is populated only for personal access token auth)
url = f"{w.config.host}/serving-endpoints/churn-predictor-prod/invocations"
headers = {
    "Authorization": f"Bearer {w.config.token}",
    "Content-Type": "application/json",
}
payload = {
    "dataframe_records": [
        {"total_events": 42, "avg_session_sec": 120.5,
         "unique_event_types": 7, "days_since_last_active": 3},
    ]
}
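response = requests.post(url, headers=headers, json=payload, timeout=30)
response.raise_for_status()  # surface 4xx/5xx errors instead of parsing an error body
print(response.json())  # e.g. {"predictions": [0]}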

# Or use the SDK
result = w.serving_endpoints.query(
    name="churn-predictor-prod",
    dataframe_records=[
        {"total_events": 42, "avg_session_sec": 120.5,
         "unique_event_types": 7, "days_since_last_active": 3},
    ],
)
print(result.predictions)
```
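
Both query styles send the same `dataframe_records` structure, and a request that omits a feature is likely to be rejected against the model signature. A small builder that validates rows first can catch that earlier. This is a sketch under the assumption that the model expects exactly the four features shown above; `build_payload` is a hypothetical helper.

```python
import json

# Feature names taken from the example payload above.
FEATURE_COLS = [
    "total_events",
    "avg_session_sec",
    "unique_event_types",
    "days_since_last_active",
]


def build_payload(rows, feature_cols=FEATURE_COLS):
    """Validate rows and return a dataframe_records request body."""
    records = []
    for i, row in enumerate(rows):
        missing = [c for c in feature_cols if c not in row]
        if missing:
            raise ValueError(f"row {i} is missing features: {missing}")
        # Keep only the expected features, dropping extras such as user_id.
        records.append({c: row[c] for c in feature_cols})
    return {"dataframe_records": records}


demo_payload = build_payload([
    {"user_id": "u1", "total_events": 42, "avg_session_sec": 120.5,
     "unique_event_types": 7, "days_since_last_active": 3},
])
print(json.dumps(demo_payload))
```

The resulting dictionary can be passed as `json=` to `requests.post`, or its `dataframe_records` list can be passed to `w.serving_endpoints.query`.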

### Step 6: Batch Inference Job

```python
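import mlflow

# Scheduled Databricks job for daily batch scoring
mlflow.set_registry_uri("databricks-uc")
model_name = "prod_catalog.ml_models.churn_predictor"
# Load the native sklearn model: the generic pyfunc wrapper exposes predict() only,
# so predict_proba() below would fail on a pyfunc model
champion = mlflow.sklearn.load_model(f"models:/{model_name}@champion")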

# Score all active users
active_users = spark.table("prod_catalog.gold.active_users").toPandas()
feature_cols = ["total_events", "avg_session_sec", "unique_event_types", "days_since_last_active"]
active_users["churn_probability"] = champion.predict_proba(active_users[feature_cols])[:, 1]

# Write scores back to Delta
(spark.createDataFrame(active_users[["user_id", "churn_probability"]])
    .write.mode("overwrite")
    .saveAsTable("prod_catalog.gold.churn_scores"))
```
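
Collecting every active user with `toPandas()` places the whole table in driver memory. If that becomes a problem, one option is to score in fixed-size chunks. The sketch below shows only the chunking logic on plain lists: `predict` stands in for the model's scoring call, and `iter_chunks`, `score_in_chunks`, and the sample data are all hypothetical.

```python
def iter_chunks(rows, chunk_size):
    """Yield consecutive slices of rows, each with at most chunk_size items."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, len(rows), chunk_size):
        yield rows[start:start + chunk_size]


def score_in_chunks(rows, predict, chunk_size=1000):
    """Apply predict() to each chunk and concatenate the scores in order."""
    scores = []
    for chunk in iter_chunks(rows, chunk_size):
        scores.extend(predict(chunk))
    return scores


def fake_predict(chunk):
    """Stand-in scorer: a made-up rule, not a real model."""
    return [min(1.0, row["days_since_last_active"] / 30) for row in chunk]


demo_rows = [{"days_since_last_active": d} for d in (0, 3, 15, 30, 60)]
demo_scores = score_in_chunks(demo_rows, fake_predict, chunk_size=2)
print(demo_scores)
```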

## Output

- Feature table in Unity Catalog (`prod_catalog.ml_features.user_behavior`)
- MLflow experiment with logged runs, metrics, and artifacts
- Model versions in registry with `champion` alias
- Live serving endpoint at `/serving-endpoints/churn-predictor-prod/invocations`
- Batch scoring pipeline writing to `prod_catalog.gold.churn_scores`

## Error Handling

| Error | Cause | Solution |
|-------|-------|----------|
| `RESOURCE_DOES_NOT_EXIST` | Wrong experiment path | Verify with `mlflow.search_experiments()` |
| `INVALID_PARAMETER_VALUE` on `log_model` | Missing signature | Pass `input_example=` to auto-infer signature |
| `Model not found in registry` | Wrong three-level name | Use `catalog.schema.model_name` format |
| `Endpoint FAILED` | Model loading error | Inspect `w.serving_endpoints.get("name").state` and `.pending_config`, then check the endpoint's events and logs for the load error |
| `429 on serving endpoint` | Rate limit exceeded | Increase `workload_size` or add traffic splitting |
| `FEATURE_TABLE_NOT_FOUND` | Table not created | Run `fe.create_table()` first |
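
For the 429 case, clients commonly retry with exponential backoff in addition to resizing the endpoint. The sketch below illustrates that pattern without any Databricks dependency: `send` is any zero-argument callable returning a `(status_code, body)` tuple, and `call_with_backoff` with its default delays is a hypothetical helper whose values are assumptions to tune.

```python
import time


def call_with_backoff(send, max_attempts=5, base_delay=0.5, sleep=time.sleep):
    """Call send() until it returns a non-429 status or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = send()
        if status != 429:
            return status, body
        if attempt < max_attempts - 1:
            # Wait 0.5s, 1s, 2s, ... before the next attempt.
            sleep(base_delay * 2 ** attempt)
    return status, body  # still rate limited after the final attempt


# Demo with a fake endpoint that is rate limited twice, then succeeds.
fake_responses = iter([(429, None), (429, None), (200, {"predictions": [0]})])
recorded_delays = []
demo_status, demo_body = call_with_backoff(
    lambda: next(fake_responses),
    sleep=recorded_delays.append,  # record delays instead of actually sleeping
)
print(demo_status, demo_body, recorded_delays)
```

In real use, `send` would wrap the `requests.post` call from Step 5 and return `(response.status_code, response.json())`.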

## Examples

### Hyperparameter Sweep

```python
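from sklearn.model_selection import ParameterGrid

# Reuses mlflow, GradientBoostingClassifier, accuracy_score, and the train/test split from Step 2
grid = {"n_estimators": [100, 200], "max_depth": [3, 5, 7], "learning_rate": [0.05, 0.1]}
for params in ParameterGrid(grid):
    # Include every swept parameter in the run name so runs are distinguishable
    run_name = f"gbm-d{params['max_depth']}-n{params['n_estimators']}-lr{params['learning_rate']}"
    with mlflow.start_run(run_name=run_name):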
        mlflow.log_params(params)
        model = GradientBoostingClassifier(**params)
        model.fit(X_train, y_train)
        mlflow.log_metric("accuracy", accuracy_score(y_test, model.predict(X_test)))
        mlflow.sklearn.log_model(model, "model")
```
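
The sweep trains one model per combination in the grid, so its cost grows multiplicatively with each added parameter. The following sketch expands the same grid with `itertools.product` to show how many runs to expect; `expand_grid` is a hypothetical helper that approximates what `ParameterGrid` produces.

```python
from itertools import product

grid = {"n_estimators": [100, 200], "max_depth": [3, 5, 7], "learning_rate": [0.05, 0.1]}


def expand_grid(param_grid):
    """Return one dict per combination of parameter values."""
    keys = sorted(param_grid)
    value_lists = [param_grid[k] for k in keys]
    return [dict(zip(keys, values)) for values in product(*value_lists)]


combos = expand_grid(grid)
print(len(combos))  # 2 * 3 * 2 = 12 runs
print(combos[0])
```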

## Resources

- [MLflow on Databricks](https://docs.databricks.com/aws/en/mlflow/)
- [Feature Engineering](https://docs.databricks.com/aws/en/machine-learning/feature-store/)
- [Model Serving](

Source: https://github.com/jeremylongshore/claude-code-plugins-plus-skills/tree/main/plugins/saas-packs/databricks-pack/skills/databricks-core-workflow-b
