snowflake-load-scale
Implement Snowflake load testing, warehouse scaling, and capacity planning. Use when testing query performance at scale, configuring multi-cluster warehouses, or planning capacity for production Snowflake workloads. Trigger with phrases like "snowflake load test", "snowflake scale", "snowflake capacity", "snowflake benchmark", "snowflake multi-cluster".
What this skill does
# Snowflake Load & Scale
## Overview
Load testing, scaling strategies, and capacity planning for Snowflake workloads using warehouse sizing, multi-cluster configuration, and concurrent query simulation.
## Scaling Model
| Dimension | How to Scale | When |
|-----------|-------------|------|
| Single query speed | Scale UP (bigger warehouse) | Complex queries, large scans |
| Concurrent queries | Scale OUT (multi-cluster) | Many users, dashboard refresh |
| Data volume | Scale UP + clustering | Tables > 1TB |
| Mixed workloads | Separate warehouses | ETL + analytics on same data |
## Instructions
### Step 1: Benchmark Current Performance
```sql
-- Baseline metrics for critical queries
-- Run each query 3 times and record results
-- Disable result cache for accurate benchmarking
ALTER SESSION SET USE_CACHED_RESULT = FALSE;
-- Test query 1: Point lookup
SELECT * FROM orders WHERE order_id = 12345;
-- Test query 2: Aggregation
SELECT DATE_TRUNC('month', order_date) AS month,
COUNT(*) AS orders, SUM(amount) AS revenue
FROM orders
WHERE order_date >= '2025-01-01'
GROUP BY month ORDER BY month;
-- Test query 3: Join + filter
SELECT c.name, SUM(o.amount) AS total_spend
FROM customers c
JOIN orders o ON c.id = o.customer_id
WHERE o.order_date >= DATEADD(days, -90, CURRENT_DATE())
GROUP BY c.name
ORDER BY total_spend DESC
LIMIT 100;
-- Record results
SELECT query_id, query_text, warehouse_name, warehouse_size,
total_elapsed_time / 1000 AS seconds,
bytes_scanned / 1e9 AS gb_scanned,
rows_produced, partitions_scanned, partitions_total,
bytes_spilled_to_local_storage, bytes_spilled_to_remote_storage
FROM TABLE(INFORMATION_SCHEMA.QUERY_HISTORY_BY_SESSION())
ORDER BY start_time DESC
LIMIT 10;
-- Re-enable cache
ALTER SESSION SET USE_CACHED_RESULT = TRUE;
```
### Step 2: Test Warehouse Size Impact
```sql
-- Run same query on different warehouse sizes to find optimal
-- XS → S → M → L → XL
ALTER WAREHOUSE BENCHMARK_WH SET WAREHOUSE_SIZE = 'XSMALL';
ALTER SESSION SET USE_CACHED_RESULT = FALSE;
-- Run your benchmark query
SELECT /* BENCHMARK_XS */ ...;
ALTER WAREHOUSE BENCHMARK_WH SET WAREHOUSE_SIZE = 'SMALL';
SELECT /* BENCHMARK_S */ ...;
ALTER WAREHOUSE BENCHMARK_WH SET WAREHOUSE_SIZE = 'MEDIUM';
SELECT /* BENCHMARK_M */ ...;
-- Compare results
SELECT warehouse_size, query_id,
total_elapsed_time / 1000 AS seconds,
bytes_scanned / 1e9 AS gb_scanned
FROM TABLE(INFORMATION_SCHEMA.QUERY_HISTORY_BY_SESSION())
WHERE query_text LIKE '%BENCHMARK_%'
ORDER BY start_time DESC;
-- Typical scaling: doubling size halves runtime for scan-heavy queries
-- Diminishing returns for small/simple queries
```
### Step 3: Concurrent Load Testing
```python
# load_test.py — simulate concurrent Snowflake queries
import snowflake.connector
import threading
import time
import os
from statistics import mean, median
CONCURRENT_USERS = 20
QUERIES_PER_USER = 10
WAREHOUSE = 'LOAD_TEST_WH'
TEST_QUERIES = [
"SELECT COUNT(*) FROM orders WHERE order_date = CURRENT_DATE() - 1",
"SELECT customer_id, SUM(amount) FROM orders GROUP BY customer_id LIMIT 100",
"SELECT * FROM orders WHERE order_id = %s",
]
results = []
errors = []
def run_user_session(user_id: int):
conn = snowflake.connector.connect(
account=os.environ['SNOWFLAKE_ACCOUNT'],
user=os.environ['SNOWFLAKE_USER'],
password=os.environ['SNOWFLAKE_PASSWORD'],
warehouse=WAREHOUSE,
database='PROD_DW',
schema='GOLD',
)
cursor = conn.cursor()
for i in range(QUERIES_PER_USER):
query = TEST_QUERIES[i % len(TEST_QUERIES)]
start = time.time()
try:
if '%s' in query:
cursor.execute(query, (user_id * 1000 + i,))
else:
cursor.execute(query)
cursor.fetchall()
elapsed = time.time() - start
results.append({'user': user_id, 'query': i, 'seconds': elapsed})
except Exception as e:
errors.append({'user': user_id, 'query': i, 'error': str(e)})
conn.close()
# Run concurrent sessions
threads = []
start_time = time.time()
for uid in range(CONCURRENT_USERS):
t = threading.Thread(target=run_user_session, args=(uid,))
threads.append(t)
t.start()
for t in threads:
t.join()
total_time = time.time() - start_time
# Report
times = [r['seconds'] for r in results]
print(f"=== Load Test Results ===")
print(f"Users: {CONCURRENT_USERS}, Queries/user: {QUERIES_PER_USER}")
print(f"Total queries: {len(results)}, Errors: {len(errors)}")
print(f"Total time: {total_time:.1f}s")
print(f"Avg latency: {mean(times):.3f}s")
print(f"Median: {median(times):.3f}s")
print(f"P95: {sorted(times)[int(len(times)*0.95)]:.3f}s")
print(f"QPS: {len(results)/total_time:.1f}")
```
### Step 4: Multi-Cluster Warehouse Configuration
```sql
-- Standard scaling: Snowflake adds clusters when queries queue
CREATE OR REPLACE WAREHOUSE ANALYTICS_WH
WAREHOUSE_SIZE = 'MEDIUM'
MIN_CLUSTER_COUNT = 1
MAX_CLUSTER_COUNT = 6
SCALING_POLICY = 'STANDARD'
AUTO_SUSPEND = 300
AUTO_RESUME = TRUE;
-- Economy scaling: tolerates queuing, minimizes cost
ALTER WAREHOUSE ANALYTICS_WH SET SCALING_POLICY = 'ECONOMY';
-- Maximized mode: all clusters always running (predictable latency)
CREATE WAREHOUSE DASHBOARD_WH
WAREHOUSE_SIZE = 'SMALL'
MIN_CLUSTER_COUNT = 3
MAX_CLUSTER_COUNT = 3 -- Same = maximized mode
AUTO_SUSPEND = 120
AUTO_RESUME = TRUE;
-- Monitor multi-cluster behavior
SELECT start_time, warehouse_name,
avg_running, avg_queued_load, avg_queued_provisioning
FROM TABLE(INFORMATION_SCHEMA.WAREHOUSE_LOAD_HISTORY(
DATE_RANGE_START => DATEADD(hours, -4, CURRENT_TIMESTAMP()),
WAREHOUSE_NAME => 'ANALYTICS_WH'
))
WHERE avg_queued_load > 0
ORDER BY start_time DESC;
```
### Step 5: Capacity Planning
```sql
-- Weekly growth analysis
SELECT DATE_TRUNC('week', start_time) AS week,
SUM(credits_used) AS weekly_credits,
COUNT(DISTINCT query_id) AS weekly_queries,
ROUND(SUM(credits_used) / NULLIF(COUNT(DISTINCT query_id), 0), 4) AS credits_per_query
FROM SNOWFLAKE.ACCOUNT_USAGE.WAREHOUSE_METERING_HISTORY w
JOIN SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY q
ON w.warehouse_name = q.warehouse_name
WHERE w.start_time >= DATEADD(months, -3, CURRENT_TIMESTAMP())
GROUP BY week
ORDER BY week;
-- Storage growth trend
SELECT usage_date,
ROUND(storage_bytes / 1e12, 3) AS data_tb,
LAG(ROUND(storage_bytes / 1e12, 3)) OVER (ORDER BY usage_date) AS prev_tb,
ROUND((storage_bytes - LAG(storage_bytes) OVER (ORDER BY usage_date)) / 1e9, 1) AS daily_growth_gb
FROM SNOWFLAKE.ACCOUNT_USAGE.STORAGE_USAGE
WHERE usage_date >= DATEADD(days, -30, CURRENT_DATE())
ORDER BY usage_date;
```
## Benchmark Results Template
```
## Snowflake Performance Benchmark
Date: YYYY-MM-DD
Environment: [staging/production]
Table size: [X rows, Y GB]
| Warehouse | Query Type | Avg (s) | P95 (s) | GB Scanned | Spill |
|-----------|-----------|---------|---------|-----------|-------|
| XS | Agg | | | | |
| S | Agg | | | | |
| M | Agg | | | | |
Concurrent: [N users, M queries, QPS achieved]
Recommendation: [sizing/clustering/multi-cluster advice]
```
## Error Handling
| Issue | Cause | Solution |
|-------|-------|----------|
| Queries queuing | Concurrency > capacity | Add multi-cluster or separate warehouse |
| Linear scaling fails | Query not parallelizable | Optimize SQL (reduce shuffle) |
| Spilling on larger warehouse | Data skew | Check for hot partition/join skew |
| Load test throttled | Login rate limit | Use connection pooling |
## Resources
- [Warehouse Considerations](https://docs.snowflake.com/en/user-guide/warehouses-considerations)
- [Multi-Cluster Warehouses](https://docs.snowflake.com/en/user-guide/warehouses-multicluster)
- [Warehouse Load Monitoring](httRelated in Code Review
gstack
IncludedFast headless browser for QA testing and site dogfooding. Navigate pages, interact with elements, verify state, diff before/after, take annotated screenshots, test responsive layouts, forms, uploads, dialogs, and capture bug evidence. Use when asked to open or test a site, verify a deployment, dogfood a user flow, or file a bug with screenshots. (gstack)
startup-due-diligence
IncludedLegal due diligence review for seed-stage and Series A startups (US, Delaware C-Corp focus). Supports both investor and founder perspectives. Capabilities include: (1) Interactive document review and issue spotting; (2) Document request list generation; (3) Cap table and SAFE/convertible note analysis; (4) Red flag identification with severity ratings; (5) Diligence report generation. TRIGGERS: due diligence, DD, startup investment, cap table review, Series A, seed round, investor diligence, legal review startup, SAFE analysis, convertible note, 409A, founder vesting.
interview-master
IncludedThis skill should be used when the user asks to "generate interview questions", "prepare for interview", "optimize resume", "conduct mock interview", "analyze git commits for resume", "generate resume from code", "review my resume", or mentions interview preparation, career assistance, or extracting project experience from git history. Provides comprehensive interview and career development guidance for both job seekers and interviewers.
fix-issue
IncludedFixes GitHub issues using parallel analysis agents for root cause investigation, code exploration, and regression detection. Reads issue context from gh CLI, searches codebase and memory for related patterns, generates a fix with tests, and links the resolution back to the issue via PR. Includes prevention analysis to avoid recurrence. Use when debugging errors, resolving regressions, fixing bugs, or triaging issues.
sf-apex
IncludedGenerates and reviews Salesforce Apex code with 150-point scoring. TRIGGER when: user writes, reviews, or fixes Apex classes, triggers, test classes, batch/queueable/schedulable jobs, or touches .cls/.trigger files. DO NOT TRIGGER when: LWC JavaScript (use sf-lwc), Flow XML (use sf-flow), SOQL-only queries (use sf-soql), or non-Salesforce code.
swift-development
IncludedComprehensive Swift development for building, testing, and deploying iOS/macOS applications. Use when Claude needs to: (1) Build Swift packages or Xcode projects from command line, (2) Run tests with XCTest or Swift Testing framework, (3) Manage iOS simulators with simctl, (4) Handle code signing, provisioning profiles, and app distribution, (5) Format or lint Swift code with SwiftFormat/SwiftLint, (6) Work with Swift Package Manager (SPM), (7) Implement Swift 6 concurrency patterns (async/await, actors, Sendable), (8) Create SwiftUI views with MVVM architecture, (9) Set up Core Data or SwiftData persistence, or any other Swift/iOS/macOS development tasks.