devops-expert

Expert-level DevOps practices, culture, automation, and continuous delivery

Tags: devops, ci-cd, automation, infrastructure, culture

What this skill does


# DevOps Expert

Expert guidance for DevOps practices, culture, CI/CD pipelines, infrastructure automation, and operational excellence.

## Core Concepts

### DevOps Culture
- Collaboration and communication
- Shared responsibility
- Continuous improvement
- Breaking down silos
- Blameless culture
- Measuring everything

### Automation
- Infrastructure as Code (IaC)
- Configuration management
- Deployment automation
- Testing automation
- Monitoring automation
- Self-service platforms

### CI/CD
- Continuous Integration
- Continuous Delivery
- Continuous Deployment
- Pipeline as Code
- Artifact management
- Release strategies

## CI/CD Pipeline

```yaml
# GitHub Actions Example
name: CI/CD Pipeline

on:
  push:
    branches: [main, develop]
  pull_request:
    branches: [main]

env:
  REGISTRY: ghcr.io
  IMAGE_NAME: ${{ github.repository }}

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      - name: Setup Node.js
        uses: actions/setup-node@v3
        with:
          node-version: '18'
          cache: 'npm'

      - name: Install dependencies
        run: npm ci

      - name: Run linting
        run: npm run lint

      - name: Run tests
        run: npm test

      - name: Run security scan
        run: npm audit

      - name: Upload coverage
        uses: codecov/codecov-action@v3

  build:
    needs: test
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write

    steps:
      - uses: actions/checkout@v3

      - name: Log in to Container Registry
        uses: docker/login-action@v2
        with:
          registry: ${{ env.REGISTRY }}
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}

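      - name: Extract metadata
        id: meta
        uses: docker/metadata-action@v4
        with:
          images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
          # Tag with the branch or PR ref and with the full commit SHA.
          # The deploy jobs pull the image by commit SHA, which the
          # default tag set does not include.
          tags: |
            type=ref,event=branch
            type=ref,event=pr
            type=sha,format=long,prefix=

      - name: Build and push Docker image
        uses: docker/build-push-action@v4
        with:
          context: .
          # Build on pull requests, but push only for branch pushes.
          push: ${{ github.event_name != 'pull_request' }}
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}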

  deploy-staging:
    needs: build
    if: github.ref == 'refs/heads/develop'
    runs-on: ubuntu-latest
    environment: staging

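    steps:
      # The smoke tests need the repository contents and its dependencies.
      - uses: actions/checkout@v3

      - name: Install dependencies
        run: npm ci

      # Assumes kubectl on the runner is already authenticated against the
      # staging cluster; cluster credentials are not configured in this example.
      - name: Deploy to staging
        run: |
          kubectl set image deployment/myapp \
            myapp=${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:${{ github.sha }} \
            --namespace=staging

      - name: Wait for rollout
        run: kubectl rollout status deployment/myapp -n staging

      - name: Run smoke tests
        run: npm run test:smoke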

  deploy-production:
    needs: build
    if: github.ref == 'refs/heads/main'
    runs-on: ubuntu-latest
    environment: production

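    steps:
      # Assumes kubectl on the runner is already authenticated against the
      # production cluster; cluster credentials are not configured in this example.
      - name: Deploy to production
        run: |
          kubectl set image deployment/myapp \
            myapp=${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:${{ github.sha }} \
            --namespace=production

      - name: Wait for rollout
        run: kubectl rollout status deployment/myapp -n production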
```
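
The deploy jobs above assemble the image reference from the registry, the repository name, and the commit SHA. Container image repository names must be lowercase, while `github.repository` keeps the owner's original casing, so a deploy script may want to normalize the reference before passing it to `kubectl`. The following is a minimal sketch of that normalization; `image_ref` is a hypothetical helper and not part of the original pipeline.

```python
import re

# Hypothetical helper for illustration; not part of the original pipeline.
_SHA_RE = re.compile(r"^[0-9a-f]{7,40}$")


def image_ref(registry: str, repository: str, sha: str) -> str:
    """Build '<registry>/<repository>:<sha>' with a lowercase repository path."""
    if not _SHA_RE.match(sha):
        raise ValueError(f"not a hex commit SHA: {sha!r}")
    # Image repository names must be lowercase; the tag is left untouched.
    return f"{registry.lower()}/{repository.lower()}:{sha}"


if __name__ == "__main__":
    print(image_ref("ghcr.io", "MyOrg/MyApp", "0123abc"))
```

Rejecting anything that is not a hex SHA keeps a mistyped branch name from being deployed as if it were an immutable tag.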

## Infrastructure as Code

```python
# Pulumi Infrastructure as Code
import pulumi
import pulumi_aws as aws

# VPC
vpc = aws.ec2.Vpc("main-vpc",
    cidr_block="10.0.0.0/16",
    enable_dns_hostnames=True,
    enable_dns_support=True,
    tags={"Name": "main-vpc"})

# Subnets
public_subnet = aws.ec2.Subnet("public-subnet",
    vpc_id=vpc.id,
    cidr_block="10.0.1.0/24",
    availability_zone="us-east-1a",
    map_public_ip_on_launch=True,
    tags={"Name": "public-subnet"})

private_subnet = aws.ec2.Subnet("private-subnet",
    vpc_id=vpc.id,
    cidr_block="10.0.2.0/24",
    availability_zone="us-east-1b",
    tags={"Name": "private-subnet"})

# Internet Gateway
igw = aws.ec2.InternetGateway("igw",
    vpc_id=vpc.id,
    tags={"Name": "main-igw"})

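# Route Table
route_table = aws.ec2.RouteTable("public-rt",
    vpc_id=vpc.id,
    routes=[
        aws.ec2.RouteTableRouteArgs(
            cidr_block="0.0.0.0/0",
            gateway_id=igw.id,
        )
    ],
    tags={"Name": "public-rt"})

# Associate the public subnet with the route table; without this the
# subnet keeps using the VPC's main route table and has no internet route.
public_rta = aws.ec2.RouteTableAssociation("public-rta",
    subnet_id=public_subnet.id,
    route_table_id=route_table.id)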

# Security Group
security_group = aws.ec2.SecurityGroup("web-sg",
    vpc_id=vpc.id,
    description="Allow HTTP and HTTPS traffic",
    ingress=[
        aws.ec2.SecurityGroupIngressArgs(
            protocol="tcp",
            from_port=80,
            to_port=80,
            cidr_blocks=["0.0.0.0/0"],
        ),
        aws.ec2.SecurityGroupIngressArgs(
            protocol="tcp",
            from_port=443,
            to_port=443,
            cidr_blocks=["0.0.0.0/0"],
        ),
    ],
    egress=[
        aws.ec2.SecurityGroupEgressArgs(
            protocol="-1",
            from_port=0,
            to_port=0,
            cidr_blocks=["0.0.0.0/0"],
        )
    ])

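# IAM role assumed by the EKS control plane (referenced below as cluster_role).
# The resource names "eks-cluster-role" and "eks-cluster-policy" are illustrative.
cluster_role = aws.iam.Role("eks-cluster-role",
    assume_role_policy="""{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": {"Service": "eks.amazonaws.com"},
    "Action": "sts:AssumeRole"
  }]
}""")

cluster_policy = aws.iam.RolePolicyAttachment("eks-cluster-policy",
    role=cluster_role.name,
    policy_arn="arn:aws:iam::aws:policy/AmazonEKSClusterPolicy")

# EKS Cluster
cluster = aws.eks.Cluster("app-cluster",
    role_arn=cluster_role.arn,
    vpc_config=aws.eks.ClusterVpcConfigArgs(
        subnet_ids=[public_subnet.id, private_subnet.id],
        security_group_ids=[security_group.id],
    ),
    # Create the cluster only after the managed policy is attached to the role.
    opts=pulumi.ResourceOptions(depends_on=[cluster_policy]))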

# Export outputs
pulumi.export("vpc_id", vpc.id)
pulumi.export("cluster_name", cluster.name)
pulumi.export("cluster_endpoint", cluster.endpoint)
```
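
The network layout above depends on each subnet's CIDR block falling inside the VPC range and on the subnets not overlapping each other. A minimal sketch of checking that before running a deployment, using only the standard library; `validate_subnets` is a hypothetical helper, and the CIDR values are the ones from the Pulumi program.

```python
import ipaddress


def validate_subnets(vpc_cidr: str, subnet_cidrs: dict[str, str]) -> list[str]:
    """Return human-readable problems; an empty list means the layout is consistent."""
    vpc = ipaddress.ip_network(vpc_cidr)
    nets = {name: ipaddress.ip_network(cidr) for name, cidr in subnet_cidrs.items()}
    problems = []

    # Every subnet must be contained in the VPC range.
    for name, net in nets.items():
        if not net.subnet_of(vpc):
            problems.append(f"{name} ({net}) is outside the VPC range {vpc}")

    # No two subnets may share addresses.
    names = sorted(nets)
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            if nets[first].overlaps(nets[second]):
                problems.append(
                    f"{first} ({nets[first]}) overlaps {second} ({nets[second]})"
                )
    return problems


if __name__ == "__main__":
    layout = {"public-subnet": "10.0.1.0/24", "private-subnet": "10.0.2.0/24"}
    print(validate_subnets("10.0.0.0/16", layout))
```

A check like this can run in CI ahead of `pulumi preview`, so a bad CIDR fails fast instead of surfacing as a provider error.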

## Deployment Strategies
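
Canary and feature-flag rollouts both expose a change to a percentage of traffic. One common way to decide which users fall inside that percentage is to hash a stable identifier into a bucket from 0 to 99. The sketch below assumes users are identified by a string ID; `in_rollout` is a hypothetical helper and is not called by the class in this section.

```python
import hashlib


def in_rollout(feature_name: str, user_id: str, rollout_percentage: int) -> bool:
    """Return True if this user falls inside the rollout percentage (0 to 100)."""
    if not 0 <= rollout_percentage <= 100:
        raise ValueError("rollout_percentage must be between 0 and 100")
    # Hash feature and user together so each feature gets its own user split.
    digest = hashlib.sha256(f"{feature_name}:{user_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100  # stable bucket in 0..99
    return bucket < rollout_percentage


if __name__ == "__main__":
    users = [f"user-{i}" for i in range(1000)]
    enabled = sum(in_rollout("new-checkout", u, 25) for u in users)
    print(f"{enabled} of {len(users)} users see the feature")
```

Because the bucket depends only on the feature name and the user ID, raising the percentage only adds users: anyone included at 10% is still included at 25%.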

```python
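import time


class DeploymentStrategy:
    """Implement various deployment strategies.

    The methods below call infrastructure hooks such as deploy_environment(),
    switch_traffic(), check_health(), and rollback(). These hooks are not
    defined here and must be supplied by a subclass for the target platform.
    """

    def __init__(self, service_name: str):
        self.service_name = service_name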

    def blue_green_deployment(self, blue_version: str, green_version: str) -> bool:
        """Blue-Green deployment. Returns True if traffic was switched to green."""
        # Deploy green environment alongside the live blue environment
        self.deploy_environment("green", green_version)

        # Run tests on green
        if self.run_tests("green"):
            # Switch traffic to green
            self.switch_traffic("green")

            # Keep blue for rollback
            print(f"Deployment successful. Blue ({blue_version}) kept for rollback.")
            return True

        # Tests failed: traffic was never switched, so blue stays active
        print("Tests failed on green. Keeping blue active.")
        return False

    def canary_deployment(self, current_version: str, new_version: str,
                         canary_percentage: int = 10):
        """Canary deployment"""
        # Deploy canary with small percentage
        self.deploy_canary(new_version, canary_percentage)

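        # Monitor metrics
        metrics = self.monitor_canary_metrics(duration_minutes=10)

        # Promote only if the canary stays under both thresholds. Units follow
        # whatever monitor_canary_metrics() reports (latency presumably in ms).
        if metrics['error_rate'] < 0.1 and metrics['latency_p95'] < 500: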
            # Gradually increase canary traffic
            for percentage in [25, 50, 75, 100]:
                self.update_canary_traffic(percentage)
                time.sleep(300)  # 5 minutes between increases

                if not self.check_health():
                    self.rollback(current_version)
                    return False

            print(f"Canary deployment successful: {new_version}")
            return True
        else:
            self.rollback(current_version)
            print("Canary deployment failed - rolled back")
            return False

    def rolling_deployment(self, version: str, batch_size: int = 1):
        """Rolling deployment"""
        instances = self.get_instances()

        for i in range(0, len(instances), batch_size):
            batch = instances[i:i + batch_size]

            # Update batch
            for instance in batch:
                self.update_instance(instance, version)
                self.wait_for_healthy(instance)

            # Verify batch health
            if not self.check_health():
                print(f"Rolling deployment failed at batch {i//batch_size + 1}")
                return False

        print(f"Rolling deployment successful: {version}")
        return True

    def feature_flag_deployment(self, feature_name: str, enabled: bool,
                               rollout_percentage: int = 100):
        """Feature flag based deployment"""
        return {
            'feature': feature_name,
            'enabled': enabled,
            'roll
```

Files: 1

Size: 13.0 KB

Complexity: 20/100

Category: devops

Source: https://github.com/personamanagmentlayer/pcl/tree/main/stdlib/devops/devops-expert
