snowflake-performance-tuning
Optimize Snowflake query performance with clustering, materialized views, caching, and query profiling. Use when queries are slow, analyzing QUERY_HISTORY for bottlenecks, or optimizing warehouse utilization and data scanning. Trigger with phrases like "snowflake performance", "optimize snowflake", "snowflake slow query", "snowflake clustering", "snowflake query profile".
What this skill does
# Snowflake Performance Tuning
## Overview
Optimize Snowflake query performance using clustering keys, materialized views, result caching, query profiling, and warehouse tuning.
## Prerequisites
- Access to `SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY`
- Understanding of micro-partitions and pruning
- Role with `MONITOR` privilege on warehouses
## Instructions
### Step 1: Identify Slow Queries
```sql
-- Top 20 slowest queries in last 24 hours
SELECT query_id, query_text, total_elapsed_time / 1000 AS seconds,
bytes_scanned / 1e9 AS gb_scanned,
partitions_scanned, partitions_total,
ROUND(partitions_scanned / NULLIF(partitions_total, 0) * 100, 1) AS pct_scanned,
warehouse_name, warehouse_size
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
WHERE execution_status = 'SUCCESS'
AND start_time >= DATEADD(hours, -24, CURRENT_TIMESTAMP())
AND query_type = 'SELECT'
ORDER BY total_elapsed_time DESC
LIMIT 20;
-- Queries scanning too many partitions (poor pruning)
SELECT query_id, query_text,
partitions_scanned, partitions_total,
bytes_scanned / 1e9 AS gb_scanned
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
WHERE partitions_scanned > partitions_total * 0.5
AND partitions_total > 100
AND start_time >= DATEADD(hours, -24, CURRENT_TIMESTAMP())
ORDER BY partitions_scanned DESC
LIMIT 10;
```
### Step 2: Add Clustering Keys
```sql
-- Clustering improves pruning for large tables (> 1TB)
-- Choose columns used in WHERE and JOIN clauses
-- Cluster by date (most common filter)
ALTER TABLE orders CLUSTER BY (order_date);
-- Multi-column clustering
ALTER TABLE events CLUSTER BY (event_date, event_type);
-- Check clustering depth (lower = better)
SELECT SYSTEM$CLUSTERING_INFORMATION('orders', '(order_date)');
-- Monitor automatic reclustering
SELECT table_name, num_rows, bytes,
SYSTEM$CLUSTERING_DEPTH('orders') AS clustering_depth
FROM INFORMATION_SCHEMA.TABLES
WHERE table_name = 'ORDERS';
```
### Step 3: Use Materialized Views
```sql
-- Pre-compute expensive aggregations
CREATE OR REPLACE MATERIALIZED VIEW daily_revenue_mv
CLUSTER BY (metric_date)
AS
SELECT
DATE_TRUNC('day', order_date) AS metric_date,
COUNT(*) AS order_count,
SUM(amount) AS total_revenue,
AVG(amount) AS avg_order_value
FROM orders
GROUP BY DATE_TRUNC('day', order_date);
-- Query the MV instead of base table — automatic rewrite may also apply
SELECT * FROM daily_revenue_mv
WHERE metric_date >= DATEADD(days, -30, CURRENT_DATE());
-- Check MV freshness
SELECT name, is_secure, text, refresh_on
FROM INFORMATION_SCHEMA.MATERIALIZED_VIEWS
WHERE name = 'DAILY_REVENUE_MV';
```
### Step 4: Leverage Result Caching
```sql
-- Result cache is ON by default — same query returns instantly
-- Cache is valid for 24 hours if underlying data hasn't changed
-- Check if a query used cache
SELECT query_id, query_text,
CASE WHEN bytes_scanned = 0 AND rows_produced > 0
THEN 'CACHE HIT' ELSE 'CACHE MISS' END AS cache_status,
total_elapsed_time
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
WHERE start_time >= DATEADD(hours, -1, CURRENT_TIMESTAMP())
ORDER BY start_time DESC;
-- Disable cache for benchmarking
ALTER SESSION SET USE_CACHED_RESULT = FALSE;
-- Re-enable
ALTER SESSION SET USE_CACHED_RESULT = TRUE;
```
### Step 5: Optimize Common Query Patterns
```sql
-- ANTI-PATTERN: SELECT * on wide tables
-- SELECT * FROM events; -- Scans all columns
-- BETTER: Select only needed columns
SELECT event_id, event_type, event_date FROM events WHERE event_date = CURRENT_DATE();
-- ANTI-PATTERN: Cartesian joins
-- SELECT * FROM a, b WHERE a.id = b.id;
-- BETTER: Explicit JOIN with filter pushdown
SELECT a.id, a.name, b.amount
FROM customers a
INNER JOIN orders b ON a.id = b.customer_id
WHERE b.order_date >= '2026-01-01';
-- ANTI-PATTERN: LIKE with leading wildcard
-- WHERE name LIKE '%smith%' -- Full scan
-- BETTER: Use search optimization service for LIKE queries
ALTER TABLE customers ADD SEARCH OPTIMIZATION ON EQUALITY(name), SUBSTRING(name);
-- ANTI-PATTERN: Many small queries in a loop
-- for row in rows: execute(f"INSERT INTO t VALUES ({row})")
-- BETTER: Batch inserts
INSERT INTO target_table
SELECT * FROM source_table WHERE condition;
```
### Step 6: Query Profile Analysis
```sql
-- Use EXPLAIN to see execution plan
EXPLAIN SELECT * FROM orders WHERE order_date = CURRENT_DATE();
-- Get query profile data programmatically
SELECT *
FROM TABLE(GET_QUERY_OPERATOR_STATS('<query_id>'));
-- Key metrics to watch:
-- TableScan: partitions_scanned vs partitions_total
-- Filter: if filter is AFTER scan, consider clustering
-- Sort: high spilling_to_remote_storage = needs bigger warehouse
-- Join: broadcast vs hash, skew detection
```
### Step 7: Warehouse Tuning
```sql
-- Match warehouse size to workload
-- Small: simple queries, < 100GB scans
-- Medium: moderate joins, 100GB-1TB
-- Large: complex analytics, > 1TB scans
-- Scale up for single-query performance
ALTER WAREHOUSE ANALYTICS_WH SET WAREHOUSE_SIZE = 'LARGE';
-- Scale out for concurrent queries (multi-cluster)
ALTER WAREHOUSE ANALYTICS_WH SET
MIN_CLUSTER_COUNT = 1
MAX_CLUSTER_COUNT = 4
SCALING_POLICY = 'STANDARD';
-- Monitor warehouse efficiency
SELECT warehouse_name,
SUM(credits_used) AS total_credits,
COUNT(DISTINCT query_id) AS total_queries,
SUM(credits_used) / NULLIF(COUNT(DISTINCT query_id), 0) AS credits_per_query
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY q
JOIN SNOWFLAKE.ACCOUNT_USAGE.WAREHOUSE_METERING_HISTORY w
ON q.warehouse_name = w.warehouse_name
AND DATE_TRUNC('hour', q.start_time) = w.start_time
WHERE q.start_time >= DATEADD(days, -7, CURRENT_TIMESTAMP())
GROUP BY warehouse_name
ORDER BY total_credits DESC;
```
## Performance Benchmarks
| Optimization | Typical Improvement |
|-------------|-------------------|
| Clustering key on filter column | 10-100x fewer partitions scanned |
| Materialized view | 10-1000x for aggregation queries |
| Result cache hit | Instant (0ms scan) |
| Column pruning (SELECT specific cols) | 2-10x less data scanned |
| Search optimization service | 10-100x for point lookups |
## Error Handling
| Issue | Cause | Solution |
|-------|-------|----------|
| Spilling to disk | Warehouse too small for query | Scale up warehouse size |
| High partition scan ratio | No clustering on filter column | Add clustering key |
| MV stale | Background refresh delayed | Check MV refresh status |
| Cache miss on same query | Data changed or session setting | Verify `USE_CACHED_RESULT = TRUE` |
## Resources
- [Query Performance](https://docs.snowflake.com/en/user-guide/performance-query-exploring)
- [Clustering Keys](https://docs.snowflake.com/en/user-guide/tables-clustering-keys)
- [Materialized Views](https://docs.snowflake.com/en/user-guide/views-materialized)
- [Search Optimization](https://docs.snowflake.com/en/user-guide/search-optimization-service)
## Next Steps
For cost optimization, see `snowflake-cost-tuning`.
Related in General
modeling-omnistudio-epc-catalog
IncludedSalesforce Industries CME EPC product-modeling skill for Product2-based catalog creation. Use when creating EPC products, configuring product attributes, building offer bundles with Product Child Items, or reviewing EPC DataPack JSON metadata for product catalog changes. TRIGGER when: user creates or updates Product2 EPC records, AttributeAssignment payloads, AttributeMetadata/AttributeDefaultValues, Offer bundles, or ProductChildItem relationships. DO NOT TRIGGER when: designing OmniScripts/FlexCards/Integration Procedures (use building-omnistudio-omniscript, building-omnistudio-flexcard, or building-omnistudio-integration-procedure), implementing Apex business logic (use generating-apex), or troubleshooting deployment pipelines (use deploying-metadata).
relationship-science-coach
IncludedUse this skill for direct, practical adult relationship coaching: couples conflict, repair, trust, marriage, dating, flirting, attachment patterns, emotional connection, sex, desire differences, eroticism, kink negotiation, affection, love languages, breakups, and long-term passion. Draw on Gottman, EFT and Hold Me Tight, attachment science, modern sex research, Perel, Nagoski, Kerner, Schnarch, Love and Stosny, and flexible love-language tools. Be concrete and low-hedge. Redirect only for imminent danger, abuse, coercive control, minors, non-consent, self-harm, stalking, or medical/legal/psychiatric decisions.
building-sf-integrations
IncludedSalesforce integration architecture and runtime plumbing with 120-point scoring. Use this skill to set up Named Credentials, External Credentials, External Services, REST/SOAP callout patterns, Platform Events, and Change Data Capture. TRIGGER when: user sets up Named Credentials, External Services, REST/SOAP callouts, Platform Events, CDC, or touches .namedCredential-meta.xml files. DO NOT TRIGGER when: Connected App/OAuth config (use configuring-connected-apps), Apex-only logic (use generating-apex), or data import/export (use handling-sf-data).
venue-templates
IncludedAccess comprehensive LaTeX templates, formatting requirements, and submission guidelines for major scientific publication venues (Nature, Science, PLOS, IEEE, ACM), academic conferences (NeurIPS, ICML, CVPR, CHI), research posters, and grant proposals (NSF, NIH, DOE, DARPA). This skill should be used when preparing manuscripts for journal submission, conference papers, research posters, or grant proposals and need venue-specific formatting requirements and templates.
let-fate-decide
IncludedDraws the 12 Houses of the Zodiac Tarot spread to inject entropy into planning when prompts are vague, ambiguous, or casually delegated. Interprets the spread to guide next steps. Use when the user says 'let fate decide', 'YOLO', 'whatever', 'idk', or other nonchalant phrases, makes Yu-Gi-Oh references, or when you are about to arbitrarily pick between multiple reasonable approaches. Prefer over ask-questions-if-underspecified when the user's tone is casual or playful rather than precision-seeking.
net-ops
IncludedCross-platform network troubleshooting (Windows, macOS, Linux) via local or remote shell. Use for: DNS broken, can't resolve hostnames, nslookup/dig works but apps fail, NRPT, WFP, scutil, /etc/resolver, systemd-resolved, /etc/resolv.conf, NetworkManager, VPN DNS leak residue (ProtonVPN/Mullvad/WireGuard/AnyConnect), AV/firewall blocking DNS or DoH, Tailscale DNS interaction, intermittent connectivity, remote diagnostics over SSH.