kubernetes-expert
Expert-level Kubernetes cluster management, deployment strategies, networking, and production operations
What this skill does
# Kubernetes Expert
You are an expert in Kubernetes with deep knowledge of cluster architecture, workload management, networking, security, and production operations. You design and manage scalable, reliable Kubernetes deployments following cloud-native best practices.
## Core Expertise
### Kubernetes Architecture
**Core Components:**
```
Control Plane:
├── API Server (kube-apiserver)
├── etcd (distributed key-value store)
├── Scheduler (kube-scheduler)
├── Controller Manager (kube-controller-manager)
└── Cloud Controller Manager
Worker Nodes:
├── kubelet (node agent)
├── kube-proxy (network proxy)
└── Container Runtime (containerd, CRI-O)
```
### Pods
**Basic Pod:**
```yaml
apiVersion: v1
kind: Pod
metadata:
name: nginx-pod
labels:
app: nginx
env: production
annotations:
description: "Production nginx server"
spec:
containers:
- name: nginx
image: nginx:1.25
ports:
- containerPort: 80
name: http
protocol: TCP
resources:
requests:
memory: "64Mi"
cpu: "250m"
limits:
memory: "128Mi"
cpu: "500m"
env:
- name: ENVIRONMENT
value: "production"
- name: DATABASE_URL
valueFrom:
secretKeyRef:
name: db-secret
key: url
volumeMounts:
- name: config
mountPath: /etc/nginx/conf.d
readOnly: true
livenessProbe:
httpGet:
path: /health
port: 80
initialDelaySeconds: 30
periodSeconds: 10
readinessProbe:
httpGet:
path: /ready
port: 80
initialDelaySeconds: 5
periodSeconds: 5
volumes:
- name: config
configMap:
name: nginx-config
restartPolicy: Always
nodeSelector:
disktype: ssd
tolerations:
- key: "node-role"
operator: "Equal"
value: "web"
effect: "NoSchedule"
```
**Multi-Container Pod:**
```yaml
apiVersion: v1
kind: Pod
metadata:
name: app-with-sidecar
spec:
containers:
# Main application
- name: app
image: myapp:1.0
ports:
- containerPort: 8080
volumeMounts:
- name: shared-logs
mountPath: /var/log/app
# Sidecar: log collector
- name: log-collector
image: fluentd:latest
volumeMounts:
- name: shared-logs
mountPath: /var/log/app
readOnly: true
volumes:
- name: shared-logs
emptyDir: {}
```
### Deployments
**Production Deployment:**
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: web-app
namespace: production
labels:
app: web-app
version: v1
spec:
replicas: 3
strategy:
type: RollingUpdate
rollingUpdate:
maxSurge: 1 # Max pods above desired count
maxUnavailable: 0 # Always maintain availability
selector:
matchLabels:
app: web-app
template:
metadata:
labels:
app: web-app
version: v1
annotations:
prometheus.io/scrape: "true"
prometheus.io/port: "9090"
spec:
serviceAccountName: web-app-sa
securityContext:
runAsNonRoot: true
runAsUser: 1000
fsGroup: 2000
containers:
- name: web-app
image: myregistry.io/web-app:1.2.3
imagePullPolicy: IfNotPresent
ports:
- containerPort: 8080
name: http
- containerPort: 9090
name: metrics
env:
- name: ENVIRONMENT
value: "production"
- name: DATABASE_URL
valueFrom:
secretKeyRef:
name: db-credentials
key: url
- name: POD_NAME
valueFrom:
fieldRef:
fieldPath: metadata.name
- name: NODE_NAME
valueFrom:
fieldRef:
fieldPath: spec.nodeName
resources:
requests:
memory: "256Mi"
cpu: "500m"
limits:
memory: "512Mi"
cpu: "1000m"
livenessProbe:
httpGet:
path: /healthz
port: 8080
initialDelaySeconds: 30
periodSeconds: 10
timeoutSeconds: 5
failureThreshold: 3
readinessProbe:
httpGet:
path: /ready
port: 8080
initialDelaySeconds: 10
periodSeconds: 5
timeoutSeconds: 3
successThreshold: 1
failureThreshold: 3
startupProbe:
httpGet:
path: /startup
port: 8080
initialDelaySeconds: 0
periodSeconds: 10
timeoutSeconds: 3
failureThreshold: 30
volumeMounts:
- name: config
mountPath: /etc/config
readOnly: true
- name: cache
mountPath: /var/cache
securityContext:
allowPrivilegeEscalation: false
readOnlyRootFilesystem: true
capabilities:
drop:
- ALL
volumes:
- name: config
configMap:
name: app-config
- name: cache
emptyDir: {}
affinity:
podAntiAffinity:
preferredDuringSchedulingIgnoredDuringExecution:
- weight: 100
podAffinityTerm:
labelSelector:
matchExpressions:
- key: app
operator: In
values:
- web-app
topologyKey: kubernetes.io/hostname
imagePullSecrets:
- name: registry-secret
```
### Services
**ClusterIP Service:**
```yaml
apiVersion: v1
kind: Service
metadata:
name: web-app-service
namespace: production
spec:
type: ClusterIP
selector:
app: web-app
ports:
- name: http
port: 80
targetPort: 8080
protocol: TCP
sessionAffinity: ClientIP
sessionAffinityConfig:
clientIP:
timeoutSeconds: 10800
```
**LoadBalancer Service:**
```yaml
apiVersion: v1
kind: Service
metadata:
name: web-app-lb
annotations:
service.beta.kubernetes.io/aws-load-balancer-type: "nlb"
service.beta.kubernetes.io/aws-load-balancer-cross-zone-load-balancing-enabled: "true"
spec:
type: LoadBalancer
selector:
app: web-app
ports:
- port: 443
targetPort: 8080
protocol: TCP
loadBalancerSourceRanges:
- 10.0.0.0/8
```
**Headless Service:**
```yaml
apiVersion: v1
kind: Service
metadata:
name: database-headless
spec:
clusterIP: None # Headless
selector:
app: database
ports:
- port: 5432
targetPort: 5432
```
### Ingress
**Nginx Ingress:**
```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: web-ingress
annotations:
nginx.ingress.kubernetes.io/rewrite-target: /
nginx.ingress.kubernetes.io/ssl-redirect: "true"
nginx.ingress.kubernetes.io/rate-limit: "100"
cert-manager.io/cluster-issuer: "letsencrypt-prod"
spec:
ingressClassName: nginx
tls:
- hosts:
- example.com
- www.example.com
secretName: example-com-tls
rules:
- host: example.com
http:
paths:
- path: /api
pathType: Prefix
backend:
service:
name: api-service
port:
number: 80
- path: /
pathType: Prefix
backend:
service:
name: web-service
port:
number: 80
- host: admin.example.com
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: admin-service
port:
number: 80
```
### ConfigMaps and Secrets
**ConfigMap:**
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
name: app-config
namespace: production
data:
# Key-value pairs
app.properties: |
environment=production
log.level=info
cache.ttl=3600
nginx.conf: |
server {
listen 80;
location / {
proxy_pass http://backend:8080;
}
}
DATABASE_HOST: "postgres.production.svc.cluster.local"
REDIS_HOST: "redis.production.svc.cluster.local"
```
**Secret:**
```yamlRelated in devops
github-actions-advanced
IncludedDesign, debug, and harden GitHub Actions CI/CD workflows, including reusable workflows, matrix builds, self-hosted runners, OIDC authentication, caching, environments, secrets, and release automation.
cicd-pipeline-skill
IncludedGenerates CI/CD pipeline configurations for test automation with GitHub Actions, Jenkins, GitLab CI, and Azure DevOps. Includes TestMu AI cloud integration. Use when user mentions "CI/CD", "pipeline", "GitHub Actions", "Jenkins", "GitLab CI". Triggers on: "CI/CD", "pipeline", "GitHub Actions", "Jenkins", "GitLab CI", "Azure DevOps", "automated testing pipeline".
docker-expert
IncludedDocker containerization expert with deep knowledge of multi-stage builds, image optimization, container security, Docker Compose orchestration, and production deployment patterns. Use PROACTIVELY for Dockerfile optimization, container issues, image size problems, security hardening, networking, and orchestration challenges.
terraform-expert
IncludedExpert-level Terraform infrastructure as code, modules, state management, and production best practices
cicd-expert
IncludedExpert-level CI/CD with GitHub Actions, Jenkins, deployment pipelines, and automation
monitoring-expert
IncludedExpert-level monitoring and observability with Prometheus, Grafana, logging, and alerting