# DevOps Expert
Expert guidance for DevOps practices, culture, CI/CD pipelines, infrastructure automation, and operational excellence.
## Core Concepts
### DevOps Culture
- Collaboration and communication
- Shared responsibility
- Continuous improvement
- Breaking down silos
- Blameless culture
- Measuring everything
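
As one illustration of "measuring everything", the sketch below summarises delivery performance from a list of deployment records. The `Deployment` schema, the `delivery_metrics` function, and the three metrics chosen are assumptions made for this example, not something the list above prescribes.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import List


@dataclass
class Deployment:
    """One production deployment record (hypothetical schema)."""
    committed_at: datetime  # when the change was committed
    deployed_at: datetime   # when the change reached production
    failed: bool            # True if it led to an incident or rollback


def delivery_metrics(deployments: List[Deployment], window_days: int) -> dict:
    """Summarise deployment frequency, lead time, and failure rate."""
    if not deployments:
        return {
            "deploys_per_day": 0.0,
            "median_lead_time_hours": None,
            "change_failure_rate": None,
        }
    # Lead time: hours between commit and production deployment
    lead_times = [
        (d.deployed_at - d.committed_at).total_seconds() / 3600
        for d in deployments
    ]
    failures = sum(1 for d in deployments if d.failed)
    return {
        "deploys_per_day": len(deployments) / window_days,
        "median_lead_time_hours": median(lead_times),
        "change_failure_rate": failures / len(deployments),
    }


if __name__ == "__main__":
    start = datetime(2024, 1, 1, 9, 0)
    sample = [
        Deployment(start, start + timedelta(hours=2), failed=False),
        Deployment(start, start + timedelta(hours=4), failed=True),
    ]
    print(delivery_metrics(sample, window_days=1))
```

In practice the records would come from the deployment tooling or incident tracker; the in-memory list here only keeps the example self-contained.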
### Automation
- Infrastructure as Code (IaC)
- Configuration management
- Deployment automation
- Testing automation
- Monitoring automation
- Self-service platforms
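
Infrastructure as Code and configuration management tools generally compare a declared desired state with the actual state and apply only the difference. The sketch below shows that idea on plain dictionaries; `plan_changes` and `apply_changes` are hypothetical names for this example and do not correspond to any specific tool's API.

```python
from typing import Dict, List, Tuple


def plan_changes(desired: Dict[str, str],
                 actual: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return the (action, key) steps needed to move `actual` to `desired`."""
    steps = []
    for key in sorted(desired):
        if key not in actual:
            steps.append(("create", key))
        elif actual[key] != desired[key]:
            steps.append(("update", key))
    for key in sorted(actual):
        if key not in desired:
            steps.append(("delete", key))
    return steps


def apply_changes(desired: Dict[str, str],
                  actual: Dict[str, str]) -> Dict[str, str]:
    """Apply the plan to a copy of `actual` and return the resulting state."""
    state = dict(actual)
    for action, key in plan_changes(desired, actual):
        if action == "delete":
            del state[key]
        else:
            state[key] = desired[key]
    return state


if __name__ == "__main__":
    desired = {"nginx": "1.25", "redis": "7.2"}
    actual = {"nginx": "1.24", "memcached": "1.6"}
    print(plan_changes(desired, actual))
    converged = apply_changes(desired, actual)
    # A second run finds nothing left to do, which is what makes it idempotent
    print(plan_changes(desired, converged))
```

Running the plan a second time against the converged state yields no steps, which is the property that makes repeated automated runs safe.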
### CI/CD
- Continuous Integration
- Continuous Delivery
- Continuous Deployment
- Pipeline as Code
- Artifact management
- Release strategies
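
One common release strategy is a percentage-based rollout behind a feature flag. A minimal sketch, assuming users are identified by a string ID and that hashing the feature name together with the ID is an acceptable way to assign buckets; `rollout_bucket` and `is_enabled` are hypothetical helpers, not part of any feature flag product.

```python
import hashlib


def rollout_bucket(feature_name: str, user_id: str) -> int:
    """Map a (feature, user) pair to a stable bucket in the range 0-99."""
    key = f"{feature_name}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()
    return int(digest, 16) % 100


def is_enabled(feature_name: str, user_id: str,
               enabled: bool, rollout_percentage: int) -> bool:
    """Decide whether a user sees the feature at the current rollout level."""
    if not enabled:
        return False
    # Users whose bucket falls below the percentage are included
    return rollout_bucket(feature_name, user_id) < rollout_percentage


if __name__ == "__main__":
    users = [f"user-{n}" for n in range(1000)]
    exposed = sum(is_enabled("new-checkout", u, True, 10) for u in users)
    print(f"{exposed} of {len(users)} users see the feature")
```

Because the bucket depends only on the feature name and user ID, a user who is included at 10% stays included as the percentage is raised, so nobody flips back and forth during the rollout.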
## CI/CD Pipeline
```yaml
# GitHub Actions Example
name: CI/CD Pipeline

on:
  push:
    branches: [main, develop]
  pull_request:
    branches: [main]

env:
  REGISTRY: ghcr.io
  # Image names must be lowercase; adjust if the repository slug contains capitals
  IMAGE_NAME: ${{ github.repository }}

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      - name: Setup Node.js
        uses: actions/setup-node@v3
        with:
          node-version: '18'
          cache: 'npm'

      - name: Install dependencies
        run: npm ci

      - name: Run linting
        run: npm run lint

      - name: Run tests
        run: npm test

      - name: Run security scan
        run: npm audit

      - name: Upload coverage
        uses: codecov/codecov-action@v3

  build:
    needs: test
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write
    steps:
      - uses: actions/checkout@v3

      - name: Log in to Container Registry
        uses: docker/login-action@v2
        with:
          registry: ${{ env.REGISTRY }}
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}

      - name: Extract metadata
        id: meta
        uses: docker/metadata-action@v4
        with:
          images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
          # Tag with the full commit SHA so the deploy jobs can reference it
          tags: |
            type=ref,event=branch
            type=raw,value=${{ github.sha }}

      - name: Build and push Docker image
        uses: docker/build-push-action@v4
        with:
          context: .
          push: true
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}

  deploy-staging:
    needs: build
    if: github.ref == 'refs/heads/develop'
    runs-on: ubuntu-latest
    environment: staging
    steps:
      # The smoke tests need the repository contents and dependencies
      - uses: actions/checkout@v3

      - name: Install dependencies
        run: npm ci

      # Assumes cluster credentials (kubeconfig) are already available to kubectl
      - name: Deploy to staging
        run: |
          kubectl set image deployment/myapp \
            myapp=${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:${{ github.sha }} \
            --namespace=staging

      - name: Wait for rollout
        run: kubectl rollout status deployment/myapp -n staging

      - name: Run smoke tests
        run: npm run test:smoke

  deploy-production:
    needs: build
    if: github.ref == 'refs/heads/main'
    runs-on: ubuntu-latest
    environment: production
    steps:
      # Assumes cluster credentials (kubeconfig) are already available to kubectl
      - name: Deploy to production
        run: |
          kubectl set image deployment/myapp \
            myapp=${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:${{ github.sha }} \
            --namespace=production

      - name: Wait for rollout
        run: kubectl rollout status deployment/myapp -n production
```
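
The branch gating and image naming used by the deploy jobs above can be sketched in Python, which makes the rules easy to unit test outside the pipeline. `ENVIRONMENT_BY_REF`, `deploy_target`, and `image_reference` are hypothetical names for this illustration; the workflow itself expresses the same rules through its `if:` conditions.

```python
from typing import Optional

# Mirrors the `if:` conditions on the deploy jobs: develop goes to staging,
# main goes to production, and every other ref deploys nowhere.
ENVIRONMENT_BY_REF = {
    "refs/heads/develop": "staging",
    "refs/heads/main": "production",
}


def deploy_target(ref: str) -> Optional[str]:
    """Return the namespace a ref deploys to, or None when nothing deploys."""
    # Pull request runs carry a refs/pull/... ref, so they never match
    return ENVIRONMENT_BY_REF.get(ref)


def image_reference(registry: str, repository: str, sha: str) -> str:
    """Build the image reference passed to `kubectl set image`."""
    # Image repository names must be lowercase; repository slugs may not be
    return f"{registry}/{repository.lower()}:{sha}"


if __name__ == "__main__":
    print(deploy_target("refs/heads/develop"))
    print(deploy_target("refs/pull/12/merge"))
    print(image_reference("ghcr.io", "Example-Org/MyApp", "0a1b2c3"))
```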
## Infrastructure as Code
```python
# Pulumi Infrastructure as Code
import json

import pulumi
import pulumi_aws as aws

# VPC
vpc = aws.ec2.Vpc("main-vpc",
    cidr_block="10.0.0.0/16",
    enable_dns_hostnames=True,
    enable_dns_support=True,
    tags={"Name": "main-vpc"})

# Subnets (EKS requires subnets in at least two availability zones)
public_subnet = aws.ec2.Subnet("public-subnet",
    vpc_id=vpc.id,
    cidr_block="10.0.1.0/24",
    availability_zone="us-east-1a",
    map_public_ip_on_launch=True,
    tags={"Name": "public-subnet"})

private_subnet = aws.ec2.Subnet("private-subnet",
    vpc_id=vpc.id,
    cidr_block="10.0.2.0/24",
    availability_zone="us-east-1b",
    tags={"Name": "private-subnet"})

# Internet Gateway
igw = aws.ec2.InternetGateway("igw",
    vpc_id=vpc.id,
    tags={"Name": "main-igw"})

# Route Table
route_table = aws.ec2.RouteTable("public-rt",
    vpc_id=vpc.id,
    routes=[
        aws.ec2.RouteTableRouteArgs(
            cidr_block="0.0.0.0/0",
            gateway_id=igw.id,
        )
    ],
    tags={"Name": "public-rt"})

# Associate the public subnet with the route table so it uses the IGW route
route_table_association = aws.ec2.RouteTableAssociation("public-rta",
    subnet_id=public_subnet.id,
    route_table_id=route_table.id)

# Security Group
security_group = aws.ec2.SecurityGroup("web-sg",
    vpc_id=vpc.id,
    description="Allow HTTP and HTTPS traffic",
    ingress=[
        aws.ec2.SecurityGroupIngressArgs(
            protocol="tcp",
            from_port=80,
            to_port=80,
            cidr_blocks=["0.0.0.0/0"],
        ),
        aws.ec2.SecurityGroupIngressArgs(
            protocol="tcp",
            from_port=443,
            to_port=443,
            cidr_blocks=["0.0.0.0/0"],
        ),
    ],
    egress=[
        aws.ec2.SecurityGroupEgressArgs(
            protocol="-1",  # all protocols
            from_port=0,
            to_port=0,
            cidr_blocks=["0.0.0.0/0"],
        )
    ])

# IAM role assumed by the EKS control plane
cluster_role = aws.iam.Role("eks-cluster-role",
    assume_role_policy=json.dumps({
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"Service": "eks.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }],
    }))

cluster_policy = aws.iam.RolePolicyAttachment("eks-cluster-policy",
    role=cluster_role.name,
    policy_arn="arn:aws:iam::aws:policy/AmazonEKSClusterPolicy")

# EKS Cluster
cluster = aws.eks.Cluster("app-cluster",
    role_arn=cluster_role.arn,
    vpc_config=aws.eks.ClusterVpcConfigArgs(
        subnet_ids=[public_subnet.id, private_subnet.id],
        security_group_ids=[security_group.id],
    ),
    # Create the cluster only after the role has its policy attached
    opts=pulumi.ResourceOptions(depends_on=[cluster_policy]))

# Export outputs
pulumi.export("vpc_id", vpc.id)
pulumi.export("cluster_name", cluster.name)
pulumi.export("cluster_endpoint", cluster.endpoint)
```
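
Before provisioning, it can help to check that the subnet CIDR blocks fit inside the VPC range and do not overlap. The sketch below does that with Python's standard `ipaddress` module, using the CIDR values from the Pulumi program above; `validate_layout` is a hypothetical helper and not part of Pulumi.

```python
import ipaddress
from typing import Dict, List


def validate_layout(vpc_cidr: str, subnet_cidrs: Dict[str, str]) -> List[str]:
    """Return a list of problems found in the network layout (empty if none)."""
    vpc = ipaddress.ip_network(vpc_cidr)
    subnets = {name: ipaddress.ip_network(cidr)
               for name, cidr in subnet_cidrs.items()}
    problems = []

    # Every subnet must sit entirely inside the VPC range
    for name, net in subnets.items():
        if not net.subnet_of(vpc):
            problems.append(f"{name} ({net}) is outside the VPC range {vpc}")

    # No two subnets may share addresses
    names = list(subnets)
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            if subnets[first].overlaps(subnets[second]):
                problems.append(f"{first} overlaps {second}")
    return problems


if __name__ == "__main__":
    layout = {
        "public-subnet": "10.0.1.0/24",
        "private-subnet": "10.0.2.0/24",
    }
    print(validate_layout("10.0.0.0/16", layout))
```

A check like this could run in the CI test job so that a bad CIDR is caught before `pulumi up` reaches the cloud provider.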
## Deployment Strategies
```python
from typing import Any, Dict
import time


class DeploymentStrategy:
    """Implement various deployment strategies.

    Helper methods such as deploy_environment(), run_tests(), switch_traffic(),
    check_health(), and rollback() are platform-specific and are expected to be
    provided by a subclass.
    """

    def __init__(self, service_name: str):
        self.service_name = service_name

    def blue_green_deployment(self, blue_version: str, green_version: str) -> bool:
        """Blue-Green deployment"""
        # Deploy green environment
        self.deploy_environment("green", green_version)

        # Run tests on green
        if self.run_tests("green"):
            # Switch traffic to green
            self.switch_traffic("green")
            # Keep blue for rollback
            print(f"Deployment successful. Blue ({blue_version}) kept for rollback.")
            return True

        # Traffic was never switched, so blue simply stays active
        print("Tests failed on green. Keeping blue active.")
        return False

    def canary_deployment(self, current_version: str, new_version: str,
                          canary_percentage: int = 10) -> bool:
        """Canary deployment"""
        # Deploy canary with small percentage
        self.deploy_canary(new_version, canary_percentage)

        # Monitor metrics
        metrics = self.monitor_canary_metrics(duration_minutes=10)

        # Thresholds are illustrative; tune them to the service
        if metrics['error_rate'] < 0.1 and metrics['latency_p95'] < 500:
            # Gradually increase canary traffic
            for percentage in [25, 50, 75, 100]:
                self.update_canary_traffic(percentage)
                time.sleep(300)  # 5 minutes between increases

                if not self.check_health():
                    self.rollback(current_version)
                    return False

            print(f"Canary deployment successful: {new_version}")
            return True

        self.rollback(current_version)
        print("Canary deployment failed - rolled back")
        return False

    def rolling_deployment(self, version: str, batch_size: int = 1) -> bool:
        """Rolling deployment"""
        instances = self.get_instances()

        for i in range(0, len(instances), batch_size):
            batch = instances[i:i + batch_size]

            # Update batch
            for instance in batch:
                self.update_instance(instance, version)
                self.wait_for_healthy(instance)

            # Verify batch health
            if not self.check_health():
                print(f"Rolling deployment failed at batch {i//batch_size + 1}")
                return False

        print(f"Rolling deployment successful: {version}")
        return True

    def feature_flag_deployment(self, feature_name: str, enabled: bool,
                                rollout_percentage: int = 100) -> Dict[str, Any]:
        """Feature flag based deployment"""
        return {
            'feature': feature_name,
            'enabled': enabled,
            'rollout_percentage': rollout_percentage,
        }
```