clickhouse-best-practices
MUST USE when reviewing ClickHouse schemas, queries, or configurations. Contains 28 rules that MUST be checked before providing recommendations. Always read relevant rule files and cite specific rules in responses.
What this skill does
# ClickHouse Best Practices Guidance for ClickHouse covering schema design, query optimization, and data ingestion. Contains 28 atomic rules across 3 categories (schema, query, insert), prioritized by impact. Extended with 14 reference files covering cluster management, backups, monitoring, and integrations. > **Official docs:** [ClickHouse Best Practices](https://clickhouse.com/docs/best-practices) > **Official docs:** [ClickHouse Best Practices](https://clickhouse.com/docs/best-practices) --- ## ⚠️ Security Considerations ### Credential Placeholders Example credentials in documentation (`password123`, `AKIAIOSFODNN7EXAMPLE`) are placeholders only. Never use these in production. Use proper secret management: - Environment variables - Secret managers (AWS Secrets Manager, HashiCorp Vault, etc.) - Kubernetes secrets (for K8s deployments) - ClickHouse named collections with external configuration ### Installation & Operations For installation and operational procedures: - Follow official documentation links provided in reference files - Prefer package managers (`apt`, `yum`, `helm`) over direct downloads - Use versioned artifacts instead of `latest` in production - Test procedures in non-production environments first --- ## IMPORTANT: How to Apply This Skill **Before answering ClickHouse questions, follow this priority order:** 1. **Check for applicable rules** in the `rules/` directory 2. **If rules exist:** Apply them and cite them in your response using "Per `rule-name`..." 3. **If no rule exists:** Check `references/` for deeper topic coverage 4. **If neither covers it:** Use general ClickHouse knowledge or search documentation 5. **Always cite your source:** rule name, reference file, or URL **Why rules take priority:** ClickHouse has specific behaviors (columnar storage, sparse indexes, merge tree mechanics) where general database intuition can be misleading. The rules encode validated, ClickHouse-specific guidance. --- ## Review Procedures ### For Schema Reviews (CREATE TABLE, ALTER TABLE) **Read these rule files in order:** 1. `rules/schema-pk-plan-before-creation.md` — ORDER BY is immutable 2. `rules/schema-pk-cardinality-order.md` — Column ordering in keys 3. `rules/schema-pk-prioritize-filters.md` — Filter column inclusion 4. `rules/schema-pk-filter-on-orderby.md` — Query filter alignment 5. `rules/schema-types-native-types.md` — Proper type selection 6. `rules/schema-types-minimize-bitwidth.md` — Numeric type sizing 7. `rules/schema-types-lowcardinality.md` — LowCardinality usage 8. `rules/schema-types-avoid-nullable.md` — Nullable vs DEFAULT 9. `rules/schema-types-enum.md` — Enum for finite value sets 10. `rules/schema-partition-low-cardinality.md` — Partition count limits 11. `rules/schema-partition-lifecycle.md` — Partitioning purpose 12. `rules/schema-partition-query-tradeoffs.md` — Partition pruning trade-offs 13. `rules/schema-partition-start-without.md` — Start without partitioning 14. `rules/schema-json-when-to-use.md` — JSON type usage **Check for:** - [ ] PRIMARY KEY / ORDER BY column order (low-to-high cardinality) - [ ] Data types match actual data ranges - [ ] LowCardinality applied to appropriate string columns - [ ] Partition key cardinality bounded (100-1,000 values) - [ ] ReplacingMergeTree has version column if used ### For Query Reviews (SELECT, JOIN, aggregations) **Read these rule files:** 1. `rules/query-join-choose-algorithm.md` — Algorithm selection 2. `rules/query-join-use-any.md` — ANY vs regular JOIN 3. `rules/query-join-filter-before.md` — Pre-join filtering 4. `rules/query-join-consider-alternatives.md` — Dictionaries/denormalization 5. `rules/query-join-null-handling.md` — join_use_nulls setting 6. `rules/query-index-skipping-indices.md` — Secondary index usage 7. `rules/query-mv-incremental.md` — Incremental materialized views 8. `rules/query-mv-refreshable.md` — Refreshable materialized views **Check for:** - [ ] Filters use ORDER BY prefix columns - [ ] JOINs filter tables before joining (not after) - [ ] Correct JOIN algorithm for table sizes - [ ] Skipping indices for non-ORDER BY filter columns ### For Insert Strategy Reviews (data ingestion, updates, deletes) **Read these rule files:** 1. `rules/insert-batch-size.md` — Batch sizing requirements 2. `rules/insert-async-small-batches.md` — Async insert usage 3. `rules/insert-format-native.md` — Native format for performance 4. `rules/insert-mutation-avoid-update.md` — UPDATE alternatives 5. `rules/insert-mutation-avoid-delete.md` — DELETE alternatives 6. `rules/insert-optimize-avoid-final.md` — OPTIMIZE TABLE risks **Check for:** - [ ] Batch size 10K-100K rows per INSERT - [ ] No ALTER TABLE UPDATE for frequent changes - [ ] ReplacingMergeTree or CollapsingMergeTree for update patterns - [ ] Async inserts enabled for high-frequency small batches --- ## Output Format Structure review responses as follows: ``` ## Rules Checked - `rule-name-1` — Compliant / Violation found - `rule-name-2` — Compliant / Violation found ... ## Findings ### Violations - **`rule-name`**: Description of the issue - Current: [what the code does] - Required: [what it should do] - Fix: [specific correction] ### Compliant - `rule-name`: Brief note on why it's correct ## Recommendations [Prioritized list of changes, citing rules] ``` --- ## Rule Categories by Priority | Priority | Category | Impact | Prefix | Count | |----------|----------|--------|--------|-------| | 1 | Primary Key Selection | CRITICAL | `schema-pk-` | 4 | | 2 | Data Type Selection | CRITICAL | `schema-types-` | 5 | | 3 | JOIN Optimization | CRITICAL | `query-join-` | 5 | | 4 | Insert Batching | CRITICAL | `insert-batch-` | 1 | | 5 | Mutation Avoidance | CRITICAL | `insert-mutation-` | 2 | | 6 | Partitioning Strategy | HIGH | `schema-partition-` | 4 | | 7 | Skipping Indices | HIGH | `query-index-` | 1 | | 8 | Materialized Views | HIGH | `query-mv-` | 2 | | 9 | Async Inserts | HIGH | `insert-async-` | 2 | | 10 | OPTIMIZE Avoidance | HIGH | `insert-optimize-` | 1 | | 11 | JSON Usage | MEDIUM | `schema-json-` | 1 | --- ## Quick Reference ### Schema Design — Primary Key (CRITICAL) - `schema-pk-plan-before-creation` — Plan ORDER BY before table creation (immutable) - `schema-pk-cardinality-order` — Order columns low-to-high cardinality - `schema-pk-prioritize-filters` — Include frequently filtered columns - `schema-pk-filter-on-orderby` — Query filters must use ORDER BY prefix ### Schema Design — Data Types (CRITICAL) - `schema-types-native-types` — Use native types, not String for everything - `schema-types-minimize-bitwidth` — Use smallest numeric type that fits - `schema-types-lowcardinality` — LowCardinality for <10K unique strings - `schema-types-enum` — Enum for finite value sets with validation - `schema-types-avoid-nullable` — Avoid Nullable; use DEFAULT instead ### Schema Design — Partitioning (HIGH) - `schema-partition-low-cardinality` — Keep partition count 100-1,000 - `schema-partition-lifecycle` — Use partitioning for data lifecycle, not queries - `schema-partition-query-tradeoffs` — Understand partition pruning trade-offs - `schema-partition-start-without` — Consider starting without partitioning ### Schema Design — JSON (MEDIUM) - `schema-json-when-to-use` — JSON for dynamic schemas; typed columns for known ### Query Optimization — JOINs (CRITICAL) - `query-join-choose-algorithm` — Select algorithm based on table sizes - `query-join-use-any` — ANY JOIN when only one match needed - `query-join-filter-before` — Filter tables before joining - `query-join-consider-alternatives` — Dictionaries/denormalization vs JOIN - `query-join-null-handling` — join_use_nulls=0 for default values ### Query Optimization — Indices (HIGH) - `query-index-skipping-indices` — Skipping indices for non-ORDER BY filters ### Query Optimization — Materialized Views (HIGH) - `query-mv-incremental` — Incremental MVs for real-time aggregations - `query-mv-refreshable` — Refreshab
Related in Data & Analytics
clawarr-suite
IncludedComprehensive management for self-hosted media stacks (Sonarr, Radarr, Lidarr, Readarr, Prowlarr, Bazarr, Overseerr, Plex, Tautulli, SABnzbd, Recyclarr, Unpackerr, Notifiarr, Maintainerr, Kometa, FlareSolverr). Deep library exploration, analytics, dashboard generation, content management, request handling, subtitle management, indexer control, download monitoring, quality profile sync, library cleanup automation, notification routing, collection/overlay management, and media tracker integration (Trakt, Letterboxd, Simkl).
querying-soql
IncludedSOQL query generation, optimization, and analysis with 100-point scoring. Use this skill when the user needs SOQL/SOSL authoring or optimization: natural-language-to-query generation, relationship queries, aggregates, query-plan analysis, and performance or safety improvements for Salesforce queries. TRIGGER when: user writes, optimizes, or debugs SOQL/SOSL queries, touches .soql files, or asks about relationship queries, aggregates, or query performance. DO NOT TRIGGER when: bulk data operations (use handling-sf-data), Apex DML logic (use generating-apex), or report/dashboard queries.
app-store-optimization
IncludedApp Store Optimization (ASO) toolkit for researching keywords, analyzing competitor rankings, generating metadata suggestions, and improving app visibility on Apple App Store and Google Play Store. Use when the user asks about ASO, app store rankings, app metadata, app titles and descriptions, app store listings, app visibility, or mobile app marketing on iOS or Android. Supports keyword research and scoring, competitor keyword analysis, metadata optimization, A/B test planning, launch checklists, and tracking ranking changes.
habit-flow
IncludedAI-powered atomic habit tracker with natural language logging, streak tracking, smart reminders, and coaching. Use for creating habits, logging completions naturally ("I meditated today"), viewing progress, and getting personalized coaching.
app-store-optimization
IncludedApp Store Optimization (ASO) toolkit for researching keywords, analyzing competitor rankings, generating metadata suggestions, and improving app visibility on Apple App Store and Google Play Store. Use when the user asks about ASO, app store rankings, app metadata, app titles and descriptions, app store listings, app visibility, or mobile app marketing on iOS or Android. Supports keyword research and scoring, competitor keyword analysis, metadata optimization, A/B test planning, launch checklists, and tracking ranking changes.
visualizing-data
IncludedBuilds dashboards, reports, and data-driven interfaces requiring charts, graphs, or visual analytics. Provides systematic framework for selecting appropriate visualizations based on data characteristics and analytical purpose. Includes 24+ visualization types organized by purpose (trends, comparisons, distributions, relationships, flows, hierarchies, geospatial), accessibility patterns (WCAG 2.1 AA compliance), colorblind-safe palettes, and performance optimization strategies. Use when creating visualizations, choosing chart types, displaying data graphically, or designing data interfaces.