data_analysis
High-performance data analysis using Polars - load, transform, aggregate, visualize and export tabular data. Use for CSV/JSON/Parquet processing, statistical analysis, time series, and creating charts.
# Data Analysis Skill
Comprehensive data analysis toolkit using **Polars** - a fast, multi-threaded DataFrame library. This skill provides instructions, reference documentation, and ready-to-use scripts for common data analysis tasks.
## Iteration Checkpoints
| Step | What to Present | Question for the User |
|------|-----------------|-----------------|
| Data Loading | Shape, columns, sample rows | "Is this the right data?" |
| Data Exploration | Summary stats, data quality issues | "Any columns to focus on?" |
| Transformation | Before/after comparison | "Does this transformation look correct?" |
| Analysis | Key findings, charts | "Should I dig deeper into anything?" |
| Export | Output preview | "Ready to save, or any changes?" |
## Quick Start
```python
import polars as pl
from polars import col
# Load data
df = pl.read_csv("data.csv")
# Explore
print(df.shape, df.schema)
print(df.describe())  # describe() returns a DataFrame; print it when running as a script
# Transform and analyze
result = (
    df.filter(col("value") > 0)
    .group_by("category")
    .agg(col("value").sum().alias("total"))
    .sort("total", descending=True)
)
# Export
result.write_csv("output.csv")
```
## When to Use This Skill
- Loading datasets (CSV, JSON, Parquet, Excel, databases)
- Data cleaning, filtering, and transformation
- Aggregations, grouping, and pivot tables
- Statistical analysis and summary statistics
- Time series analysis and resampling
- Joining and merging multiple datasets
- Creating visualizations and charts
- Exporting results to various formats
## Skill Contents
### Reference Documentation
Detailed API reference and patterns for specific operations:
- `reference/loading.md` - Loading data from all supported formats
- `reference/transformations.md` - Column operations, filtering, sorting, type casting
- `reference/aggregations.md` - Group by, window functions, running totals
- `reference/time_series.md` - Date parsing, resampling, lag features
- `reference/statistics.md` - Correlations, distributions, hypothesis testing setup
- `reference/visualization.md` - Creating charts with matplotlib/plotly
### Ready-to-Use Scripts
Executable Python scripts for common tasks:
- `scripts/explore_data.py` - Quick dataset exploration and profiling
- `scripts/summary_stats.py` - Generate comprehensive statistics report
## Core Patterns
### Loading Data
```python
# Imports assumed by all snippets in this section
import polars as pl
from polars import col

# CSV (most common)
df = pl.read_csv("data.csv")
# Lazy loading for large files
df = pl.scan_csv("large.csv").filter(col("x") > 0).collect()
# Parquet (recommended for large datasets)
df = pl.read_parquet("data.parquet")
# JSON
df = pl.read_json("data.json")
df = pl.read_ndjson("data.ndjson") # Newline-delimited
```
### Filtering and Selection
```python
# Select columns
df.select("col1", "col2")
df.select(col("name"), col("value") * 2)
# Filter rows
df.filter(col("age") > 25)
df.filter((col("status") == "active") & (col("value") > 100))
df.filter(col("name").str.contains("Smith"))
```
### Transformations
```python
# Add/modify columns
df = df.with_columns(
    (col("price") * col("qty")).alias("total"),
    col("date_str").str.to_date("%Y-%m-%d").alias("date"),
)
# Conditional values
df = df.with_columns(
    pl.when(col("score") >= 90).then(pl.lit("A"))
    .when(col("score") >= 80).then(pl.lit("B"))
    .otherwise(pl.lit("C"))
    .alias("grade")
)
```
### Aggregations
```python
# Group by
df.group_by("category").agg(
    col("value").sum().alias("total"),
    col("value").mean().alias("avg"),
    pl.len().alias("count"),
)
# Window functions
df.with_columns(
    col("value").sum().over("group").alias("group_total"),
    col("value").rank().over("group").alias("rank_in_group"),
)
```
### Exporting
```python
df.write_csv("output.csv")
df.write_parquet("output.parquet")
# Polars 1.0+ always writes row-oriented JSON; older releases accepted row_oriented=True
df.write_json("output.json")
```
## Best Practices
1. **Use lazy evaluation** for large datasets: `pl.scan_csv()` + `.collect()`
2. **Filter early** to reduce data volume before expensive operations
3. **Select only needed columns** to minimize memory usage
4. **Prefer Parquet** for storage - faster I/O, better compression
5. **Use `.explain()`** on a lazy query to inspect the optimized query plan before calling `.collect()`