mlops

Included with Lifetime

$97 forever

End-to-end MLOps guidance on AWS — platform selection, training, inference, pipelines, monitoring, and cost optimization. This skill should be used when the user asks to "build an ML pipeline", "deploy a model on SageMaker", "set up MLOps", "configure SageMaker Pipelines", "choose between SageMaker and Bedrock", "deploy ML models to production", "set up model monitoring", "use MLflow on AWS", "train a model with Spot instances", "configure inference endpoints", "set up distributed training", or mentions SageMaker, MLflow, Kubeflow, ML pipelines, model registry, model monitoring, hyperparameter tuning, inference endpoints, or MLOps on AWS.

Cloud & DevOps

What this skill does


Specialist guidance for MLOps on AWS. Covers platform selection, training job configuration, inference deployment patterns, CI/CD for ML, experiment tracking, model monitoring, and cost optimization.

## Process

1. Identify the ML workload characteristics: model type (classical ML, deep learning, foundation model), training data volume, inference latency requirements, traffic pattern, team expertise
2. Use the `awsknowledge` MCP tools (`mcp__plugin_aws-dev-toolkit_awsknowledge__aws___search_documentation`, `mcp__plugin_aws-dev-toolkit_awsknowledge__aws___read_documentation`, `mcp__plugin_aws-dev-toolkit_awsknowledge__aws___recommend`) to verify current SageMaker instance types, limits, pricing, and feature availability
3. Select the appropriate MLOps platform using the decision matrix below
4. Design the training infrastructure (instance selection, distributed strategy, Spot configuration)
5. Design the inference topology (real-time, serverless, batch, async)
6. Configure the ML pipeline (SageMaker Pipelines, Step Functions, or CI/CD integration)
7. Set up experiment tracking (MLflow on SageMaker or SageMaker Experiments)
8. Configure model monitoring (data quality, model quality, bias drift, feature attribution drift)
9. Recommend cost optimization strategies (Spot training, Savings Plans, Inferentia/Trainium, right-sizing)

## Platform Selection Decision Matrix

| Requirement | Recommendation | Why |
|---|---|---|
| End-to-end ML platform, team wants managed infrastructure | SageMaker (full) | Integrated training, tuning, deployment, monitoring, and model registry in one service; eliminates infrastructure management |
| CI/CD for ML with automated retraining and approval workflows | SageMaker Pipelines | Native step types for Processing, Training, Tuning, Transform, Model, Condition, and Callback; integrates with Model Registry for approval gates |
| Team already uses MLflow, needs portability across clouds | MLflow on SageMaker (managed) | Zero-infrastructure MLflow tracking server with automatic SageMaker Model Registry sync; preserves existing MLflow workflows |
| Customizing a foundation model without managing training infra | Bedrock fine-tuning / continued pre-training | No instance selection, no distributed training config, no checkpointing — AWS manages all training infrastructure; pay per training token |
| Kubernetes-native teams with existing EKS clusters | Kubeflow on EKS | Leverages existing K8s expertise and cluster; full control over scheduling, GPU sharing, and custom operators; but significant operational overhead |
| Simple orchestration for inference-only or lightweight training | Step Functions + Lambda | Event-driven, serverless, pay-per-execution; appropriate when training is infrequent and models are small enough for Lambda memory limits |
| Large-scale foundation model training (billions of parameters) | SageMaker HyperPod | Persistent managed clusters with automatic fault detection and repair; checkpointless recovery; supports Slurm and EKS orchestration |

## Training Instance Selection

### Training Instances

| Instance Family | Accelerator | Use Case | Price-Performance Notes |
|---|---|---|---|
| **ml.trn1 / ml.trn1n** | AWS Trainium | Large model training (LLMs, diffusion) | Up to 50% cheaper than comparable GPU instances for supported architectures; requires Neuron SDK |
| **ml.p5.48xlarge** | 8x NVIDIA H100 | Largest models, highest performance | Most powerful GPU option; use when Trainium does not support the model architecture |
| **ml.p4d.24xlarge** | 8x NVIDIA A100 | Large model training | Previous-gen flagship; still strong for most distributed training |
| **ml.g5.xlarge-48xlarge** | NVIDIA A10G | Medium models, fine-tuning | Good balance of cost and capability for fine-tuning and smaller training jobs |
| **ml.m5.large-24xlarge** | CPU only | Classical ML (XGBoost, sklearn) | No GPU overhead; appropriate for tree-based models and tabular data |

### Inference Instances

| Instance Family | Accelerator | Use Case | Price-Performance Notes |
|---|---|---|---|
| **ml.inf2** | AWS Inferentia2 | LLM and generative AI inference | Up to 4x higher throughput and 10x lower latency vs Inf1; 50%+ cheaper than GPU for supported models |
| **ml.g5** | NVIDIA A10G | General-purpose GPU inference | Broad framework support; use when Inferentia does not support the model |
| **ml.g4dn** | NVIDIA T4 | Cost-effective GPU inference | Previous-gen but still the cheapest GPU option for small-medium models |
| **ml.c7g / ml.c6g** | Graviton (CPU) | CPU inference for classical ML | Best price-performance for models that do not need GPU (XGBoost, sklearn, small NLP) |
| **Serverless** | Auto-managed | Sporadic or unpredictable traffic | No idle cost; 1-6 GB memory; cold start latency of seconds; max 60s processing time |

### Default to Trainium/Inferentia When Possible

Always evaluate ml.trn1 for training and ml.inf2 for inference before selecting GPU instances. Trainium offers up to 50% cost savings for training and Inferentia2 offers 50%+ cost savings for inference on supported model architectures. The AWS Neuron SDK supports PyTorch and TensorFlow natively. Only fall back to GPU instances when the model architecture is not supported by the Neuron compiler (check the Neuron model support matrix) or when the team needs CUDA-specific libraries.

## Inference Deployment Decision Matrix

| Pattern | Latency | Max Payload | Timeout | Cost Model | When to Use |
|---|---|---|---|---|---|
| **Real-time endpoint** | Low (ms) | 25 MB | 60s (8 min streaming) | Per-instance-hour (always running) | Consistent traffic with latency SLAs; use auto-scaling to match demand |
| **Serverless inference** | Medium (cold start) | 4 MB | 60s | Per-request + per-ms compute | Sporadic traffic with idle periods; eliminates idle instance cost entirely |
| **Batch transform** | High (minutes-hours) | 100 MB/record | Days | Per-instance-hour (job duration) | Offline scoring of large datasets; no persistent endpoint needed |
| **Async inference** | Medium-high | 1 GB | 1 hour | Per-instance-hour (scale to 0) | Large payloads or long processing; queue-based with SNS notifications |

### Real-time Endpoint Patterns

- **Single-model endpoint**: One model per endpoint. Simplest. Use for most production deployments.
- **Multi-model endpoint (MME)**: Thousands of models behind one endpoint, loaded on demand from S3. Use when you have many similar models (per-customer, per-region) and cannot justify an endpoint per model. Trade-off: first-request latency while loading a model.
- **Multi-container endpoint**: Up to 15 containers per endpoint, invoked individually or as a serial pipeline. Use for A/B testing different model versions or combining pre/post-processing with inference.
- **Shadow testing**: Route production traffic to both current and candidate models simultaneously. Compare metrics before promoting. Always use shadow testing before replacing a production model because it reveals performance differences under real traffic that offline evaluation cannot capture.

### Auto-Scaling

Default to target-tracking scaling on `SageMakerVariantInvocationsPerInstance` because it automatically adjusts instance count based on actual request load without requiring manual threshold tuning.

```
Target value: start at 70% of the max RPS the instance can handle (benchmark first)
Scale-in cooldown: 300 seconds (prevent flapping)
Scale-out cooldown: 60 seconds (respond quickly to load spikes)
```

Use Inference Recommender before production deployment to benchmark instance types and find the optimal instance/model combination. It runs load tests and reports latency, throughput, and cost per inference, replacing guesswork with data.

## SageMaker Pipelines

### Pipeline Step Types

| Step | Purpose | Notes |
|---|---|---|
| **Processing** | Data prep, feature engineering, evaluation | Runs a processing container (sklearn, Spark, custom) |
| **Training** | Model training |

Files: 4

Size: 72.6 KB

Complexity: 51/100

Category: Cloud & DevOps

Source: https://github.com/aws-samples/sample-claude-code-plugins-for-startups/tree/main/plugins/aws-dev-toolkit/skills/mlops

Related in Cloud & DevOps

appbuilder-action-scaffolder

Included

Create, implement, deploy, and debug Adobe Runtime actions with consistent layout, validation, and error handling. Use this skill whenever the user needs to add actions to an App Builder project, understand action structure (params, response format, web/raw actions), configure actions in the manifest, use App Builder SDKs (State, Files, Events, database), deploy and invoke actions via CLI, debug action issues, or implement patterns such as webhook receivers, custom event providers, journaling consumers, large payload redirects, action sequence pipelines, and Asset Compute workers. Also trigger when users mention serverless functions in Adobe context, action logging, IMS authentication for actions, or cron-style scheduled actions.

Cloud & DevOpsscripts

orchestrating-datacloud

Included

Salesforce Data Cloud product orchestrator for connect→prepare→harmonize→segment→act workflows. Use this skill when the user needs a multi-step Data Cloud pipeline, cross-phase troubleshooting, or data space and data kit management. TRIGGER when: user needs a multi-step Data Cloud pipeline, asks to set up or troubleshoot Data Cloud across phases, manages data spaces or data kits, or wants a cross-phase sf data360 workflow. DO NOT TRIGGER when: work is isolated to a single phase (use the matching phase-specific skill), the task is STDM/session tracing/parquet telemetry (use observing-agentforce), standard CRM SOQL (use querying-soql), or Apex implementation (use generating-apex).

Cloud & DevOpsscripts

github-project-automation

Included

Automate GitHub repository setup with CI/CD workflows, issue templates, Dependabot, and CodeQL security scanning. Includes 12 production-tested workflows and prevents 18 errors: YAML syntax, action pinning, and configuration. Use when: setting up GitHub Actions CI/CD, creating issue/PR templates, enabling Dependabot or CodeQL scanning, deploying to Cloudflare Workers, implementing matrix testing, or troubleshooting YAML indentation, action version pinning, secrets syntax, runner versions, or CodeQL configuration. Keywords: github actions, github workflow, ci/cd, issue templates, pull request templates, dependabot, codeql, security scanning, yaml syntax, github automation, repository setup, workflow templates, github actions matrix, secrets management, branch protection, codeowners, github projects, continuous integration, continuous deployment, workflow syntax error, action version pinning, runner version, github context, yaml indentation error

Cloud & DevOpsscripts

sf-datacloud

Included

Salesforce Data Cloud product orchestrator for connect→prepare→harmonize→segment→act workflows. TRIGGER when: user needs a multi-step Data Cloud pipeline, asks to set up or troubleshoot Data Cloud across phases, manages data spaces or data kits, or wants a cross-phase `sf data360` workflow. DO NOT TRIGGER when: work is isolated to a single phase (use the matching sf-datacloud-* skill), the task is STDM/session tracing/parquet telemetry (use sf-ai-agentforce-observability), standard CRM SOQL (use sf-soql), or Apex implementation (use sf-apex).

Cloud & DevOpsscripts

fabric-cli

Included

Use this skill for Fabric.so CLI workflows with the `fabric` terminal command: diagnose/install/login, search or browse a Fabric library, save notes/links/files, create folders, ask the Fabric AI assistant, manage tasks/workspaces, generate shell completion, check subscription usage, produce JSON output, and use Fabric as persistent agent memory. Do not use for Microsoft Fabric/Azure/Power BI `fab`, Daniel Miessler's Fabric framework, Python Fabric SSH, Fabric.js, or textile/fashion fabric.

Cloud & DevOpsscripts

lark

Included

Lark/Feishu CLI skills: lark-cli operations for docs, markdown, sheets, base, calendar, im, mail, task, okr, drive, wiki, slides, whiteboard, apps, approval, attendance, contact, vc, minutes, event. Use when the user needs to operate Lark/Feishu resources via lark-cli, send messages, manage documents, spreadsheets, calendars, tasks, OKRs, deploy web pages, or any Feishu/Lark workspace operations.

Cloud & DevOpsscripts