setting-up-log-aggregation

Use when setting up a log aggregation solution with ELK, Loki, or Splunk. Trigger with phrases like "setup log aggregation", "deploy ELK stack", "configure Loki", or "install Splunk". Generates production-ready configurations for data ingestion, processing, storage, and visualization, with security and scalability settings included.

# Setting Up Log Aggregation

## Overview

Deploy centralized log aggregation platforms (ELK Stack, Grafana Loki, Splunk) with ingestion pipelines, structured parsing, retention policies, visualization dashboards, and alerting. Configure log shippers (Filebeat, Promtail, Fluentd) to collect from applications, containers, and system logs with proper security and scalability.

## Prerequisites

- Target infrastructure identified: Kubernetes, Docker Compose, or VMs
- Storage requirements calculated: estimate daily log volume and multiply by retention period
- Network connectivity between log sources and the aggregation platform (typically port 9200 for Elasticsearch, 3100 for Loki, and 8088 for the Splunk HTTP Event Collector)
- Authentication mechanism defined (LDAP, OAuth, API tokens, or basic auth)
- Resource allocation planned: Elasticsearch needs significant heap memory (minimum 4GB per node)
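
The storage prerequisite above (daily log volume multiplied by retention period) can be sketched as a small calculation. This is a rough planning aid, not a sizing formula from any vendor: the function name `estimate_storage_gb`, the `replicas` parameter, and the 20% `overhead` margin are assumptions added for illustration.

```python
def estimate_storage_gb(daily_gb: float, retention_days: int,
                        replicas: int = 1, overhead: float = 0.2) -> float:
    """Rough disk estimate: daily volume x retention, scaled for copies and headroom."""
    if daily_gb < 0 or retention_days < 0 or replicas < 0 or overhead < 0:
        raise ValueError("inputs must be non-negative")
    raw = daily_gb * retention_days        # the rule from the prerequisites list
    copies = 1 + replicas                  # primary plus replica copies (assumption)
    return raw * copies * (1 + overhead)   # indexing and free-space margin (assumption)


if __name__ == "__main__":
    # 50 GB/day kept for 14 days, one replica, 20% headroom
    print(f"{estimate_storage_gb(50, 14):.0f} GB")
```

Setting `replicas=0` and `overhead=0.0` reduces the result to the plain volume-times-retention figure, which is closer to what a single-copy object store would hold.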

## Instructions

1. Select the log aggregation platform: ELK for full-text search and complex queries, Loki for lightweight Kubernetes-native logging, Splunk for enterprise deployments that need advanced analytics
2. Deploy the storage backend: Elasticsearch cluster, Loki with object storage (S3/GCS), or Splunk indexers
3. Configure log shippers on sources: Filebeat for ELK, Promtail for Loki, Fluentd/Fluent Bit for multi-destination
4. Define parsing rules: Logstash grok patterns for unstructured logs, JSON parsing for structured logs, multiline handling for stack traces
5. Set retention policies: Index Lifecycle Management (ILM) for Elasticsearch, chunk retention for Loki, index rotation for Splunk
6. Deploy visualization: Kibana dashboards for ELK, Grafana dashboards for Loki, Splunk dashboards (Search & Reporting app) for Splunk
7. Configure alerting: define log-based alerts for error spikes, application exceptions, and security events
8. Implement RBAC: restrict dashboard access and log visibility by team and environment
9. Test the full pipeline: generate test logs, verify ingestion, confirm parsing, and validate dashboard display
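
Steps 4 and 9 can be tried locally before any stack is deployed. The sketch below generates a synthetic access-log line and extracts fields from it with a regular expression that plays the role of a grok pattern. The pattern, the field names, and the helpers `parse_line` and `make_test_line` are illustrative assumptions, not output of the skill. Unmatched lines are kept and tagged `_grokparsefailure`, the tag Logstash applies when a grok filter fails.

```python
import json
import re
from datetime import datetime, timezone

# Simplified access-log pattern, loosely comparable to a grok expression.
# Field names are illustrative, not a schema required by any tool.
ACCESS_LINE = re.compile(
    r'(?P<client>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-)'
)


def parse_line(line: str) -> dict:
    """Return extracted fields, or the raw line tagged as a parse failure."""
    match = ACCESS_LINE.match(line)
    if match is None:
        # Fallback: keep the line instead of dropping it, and tag it for later search.
        return {"message": line, "tags": ["_grokparsefailure"]}
    fields = match.groupdict()
    fields["status"] = int(fields["status"])
    fields["bytes"] = 0 if fields["bytes"] == "-" else int(fields["bytes"])
    return fields


def make_test_line(status: int = 500, path: str = "/pipeline-test") -> str:
    """Build a synthetic access-log line to push through the pipeline."""
    now = datetime.now(timezone.utc).strftime("%d/%b/%Y:%H:%M:%S %z")
    return f'127.0.0.1 - - [{now}] "GET {path} HTTP/1.1" {status} 512'


if __name__ == "__main__":
    for line in (make_test_line(), "java.lang.IllegalStateException: boom"):
        print(json.dumps(parse_line(line)))
```

Using a distinctive path such as `/pipeline-test` makes the synthetic events easy to find in the dashboard once they have been shipped through the real pipeline.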

## Output

- Docker Compose or Kubernetes manifests for the log aggregation stack
- Log shipper configuration files (Filebeat YAML, Promtail config, Fluentd conf)
- Parsing and field extraction rules (Logstash pipeline, grok patterns)
- Retention policy configuration (ILM, lifecycle rules)
- Dashboard JSON exports for Kibana or Grafana
- Alert rule definitions for error rate monitoring

## Error Handling

| Error | Cause | Solution |
|-------|-------|---------|
| `Elasticsearch heap space exhausted` | JVM heap too small for index volume | Increase the heap size in `ES_JAVA_OPTS` (up to 50% of available RAM, staying below 32GB so the JVM keeps compressed object pointers) or add nodes |
| `Cannot connect to Elasticsearch` | Network issue or Elasticsearch not started | Verify Elasticsearch is running and healthy; check firewall rules and bind address |
| `Failed to create index` | Disk space full or index template misconfigured | Check disk usage with `df -h`; review index template settings and shard allocation |
| `Failed to parse log line` | Grok pattern mismatch or unexpected log format | Test grok patterns with Kibana Grok Debugger; add fallback pattern for unmatched lines |
| `Promtail: too many open files` | System file descriptor limit too low for log tailing | Raise the open-file limit (`ulimit -n`) to 65536, or reduce the number of watched paths |

## Examples

- "Deploy an ELK stack on Docker Compose with Filebeat collecting Nginx and application logs, Logstash parsing with grok, and a Kibana dashboard for 5xx error monitoring."
- "Set up Loki + Promtail on Kubernetes with 14-day retention, basic auth, and a Grafana dashboard showing logs per namespace."
- "Configure Fluentd to ship logs from 20 application servers to both Elasticsearch (hot storage, 7 days) and S3 (cold storage, 1 year)."

## Resources

- Elastic Stack guide: https://www.elastic.co/guide/
- Grafana Loki: https://grafana.com/docs/loki/latest/
- Fluentd documentation: https://docs.fluentd.org/
- Promtail configuration: https://grafana.com/docs/loki/latest/send-data/promtail/

Files: 13

Size: 40.2 KB

Complexity: 86/100

Category: Cloud & DevOps

Source: https://github.com/jeremylongshore/claude-code-plugins-plus-skills/tree/main/plugins/devops/log-aggregation-setup/skills/setting-up-log-aggregation
