# AutoViz Automatic EDA Skill
Use AutoViz for rapid exploratory data analysis with a single function call. It generates a broad set of visualizations automatically, helping you spot patterns and outliers, and can export the resulting charts.
## When to Use This Skill
### USE AutoViz when:
- **Quick EDA** - Need rapid insights into a new dataset
- **Initial exploration** - Starting analysis on unfamiliar data
- **Pattern discovery** - Automatically detect relationships between variables
- **Presentation prep** - Need charts quickly for stakeholder meetings
- **Large datasets** - Built-in row sampling (`max_rows_analyzed`) keeps analysis of large data tractable
- **Feature analysis** - Understanding distribution and importance of features
- **Correlation hunting** - Finding relationships without manual chart creation
- **Report generation** - Export comprehensive HTML reports
### DON'T USE AutoViz when:
- **Custom visualizations** - Need highly specific chart designs
- **Interactive dashboards** - Use Streamlit or Dash instead
- **Real-time data** - Streaming visualization requirements
- **Production systems** - Charts for automated pipelines (use Plotly/Altair)
- **Precise statistical tests** - Need formal hypothesis testing
- **Domain-specific plots** - Specialized visualizations not in standard EDA
## Prerequisites
```bash
# Basic installation
pip install autoviz
# With all visualization backends
pip install autoviz matplotlib seaborn plotly bokeh
# Using uv (recommended)
uv pip install autoviz pandas matplotlib seaborn plotly
# Jupyter notebook support
pip install autoviz ipywidgets notebook
# Verify installation
python -c "from autoviz import AutoViz_Class; print('AutoViz ready!')"
```
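The one-line import check above only covers AutoViz itself. A minimal sketch for reporting which of the packages from the install commands are present, using only the standard library; the helper name `installed_versions` is illustrative and not part of AutoViz:

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Package is not installed in the current environment
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Distribution names taken from the pip commands above
    wanted = ["autoviz", "pandas", "matplotlib", "seaborn",
              "plotly", "bokeh", "ipywidgets", "notebook"]
    for name, version in installed_versions(wanted).items():
        print(f"{name}: {version or 'NOT INSTALLED'}")
```

Run it inside the same environment (virtualenv, conda env, or notebook kernel) that will run AutoViz, since installed packages differ between environments.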
## Core Capabilities
### 1. Basic One-Line EDA
**Simplest Usage:**
```python
from autoviz import AutoViz_Class

# Initialize AutoViz
AV = AutoViz_Class()

# Automatic visualization with one call:
# returns a DataFrame and generates all charts
df_analyzed = AV.AutoViz(
    filename="data.csv",
    sep=",",
    depVar="",                 # Target variable (optional)
    dfte=None,                 # Pass a DataFrame here instead of using filename
    header=0,
    verbose=1,                 # 0=silent, 1=print messages and display charts,
                               # 2=print messages and save charts without displaying them
    lowess=False,
    chart_format="svg",
    max_rows_analyzed=150000,
    max_cols_analyzed=30
)

print(f"Analyzed {df_analyzed.shape[0]} rows, {df_analyzed.shape[1]} columns")
```
**From DataFrame:**
```python
from autoviz import AutoViz_Class
import pandas as pd

# Load your data
df = pd.read_csv("sales_data.csv")

# Or create sample data
df = pd.DataFrame({
    "revenue": [100, 200, 150, 300, 250, 400, 350, 500],
    "units": [10, 20, 15, 30, 25, 40, 35, 50],
    "category": ["A", "B", "A", "B", "A", "B", "A", "B"],
    "region": ["North", "South", "East", "West", "North", "South", "East", "West"],
    "profit": [20, 40, 30, 60, 50, 80, 70, 100],
    "customer_age": [25, 35, 45, 55, 30, 40, 50, 60]
})

# Initialize and visualize
AV = AutoViz_Class()

# Pass DataFrame directly using dfte parameter
df_result = AV.AutoViz(
    filename="",       # Empty when using dfte
    sep=",",
    depVar="profit",   # Optional: specify target variable
    dfte=df,
    header=0,
    verbose=1,
    chart_format="png"
)
```
**With Target Variable Analysis:**
```python
from autoviz import AutoViz_Class
import pandas as pd

# Classification dataset
df_classification = pd.DataFrame({
    "feature_1": [1.2, 2.3, 1.5, 3.4, 2.1, 4.5, 3.2, 5.1],
    "feature_2": [0.5, 1.2, 0.8, 2.1, 1.0, 3.2, 2.4, 4.0],
    "feature_3": ["low", "medium", "low", "high", "medium", "high", "medium", "high"],
    "target": [0, 0, 0, 1, 0, 1, 1, 1]
})

AV = AutoViz_Class()

# Specify target variable for focused analysis
df_analyzed = AV.AutoViz(
    filename="",
    sep=",",
    depVar="target",   # Target variable for classification
    dfte=df_classification,
    header=0,
    verbose=2,         # Print messages; save charts instead of displaying them
    chart_format="svg"
)

# Regression dataset
df_regression = pd.DataFrame({
    "size": [1000, 1500, 1200, 2000, 1800, 2500, 2200, 3000],
    "bedrooms": [2, 3, 2, 4, 3, 4, 4, 5],
    "location": ["urban", "suburban", "urban", "rural", "suburban", "rural", "suburban", "rural"],
    "age": [5, 10, 3, 15, 8, 20, 12, 25],
    "price": [200000, 280000, 220000, 350000, 300000, 380000, 340000, 420000]
})

# Analyze with continuous target
df_analyzed = AV.AutoViz(
    filename="",
    sep=",",
    depVar="price",    # Continuous target
    dfte=df_regression,
    header=0,
    verbose=1,
    chart_format="png"
)
```
### 2. Chart Format and Output Options
**Different Chart Formats:**
```python
from autoviz import AutoViz_Class
import pandas as pd

df = pd.read_csv("data.csv")
AV = AutoViz_Class()

# SVG format (vector, scalable)
df_svg = AV.AutoViz(
    filename="",
    dfte=df,
    chart_format="svg",     # Scalable vector graphics
    verbose=1
)

# PNG format (raster, good for presentations)
df_png = AV.AutoViz(
    filename="",
    dfte=df,
    chart_format="png",     # PNG images
    verbose=1
)

# HTML format (interactive charts saved as HTML files, for web)
df_html = AV.AutoViz(
    filename="",
    dfte=df,
    chart_format="html",    # Interactive HTML files
    verbose=1
)

# Bokeh backend for interactive plots displayed in Jupyter notebooks
df_bokeh = AV.AutoViz(
    filename="",
    dfte=df,
    chart_format="bokeh",   # Bokeh interactive
    verbose=1
)

# Server mode (charts open in the browser)
df_server = AV.AutoViz(
    filename="",
    dfte=df,
    chart_format="server",  # Served to a browser tab rather than shown inline
    verbose=1
)
```
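To keep the format choice explicit in scripts, one option is a small lookup keyed by use case. This is only a sketch: the helper `choose_chart_format` and the use-case labels are hypothetical, and the mapping simply mirrors the format strings shown above.

```python
# Hypothetical mapping from a use-case label to a chart_format string.
# The labels are illustrative; the values are the format strings used above.
CHART_FORMATS = {
    "vector": "svg",        # scalable output
    "slides": "png",        # raster images for presentations
    "web": "html",          # HTML output for web pages
    "interactive": "bokeh"  # interactive Bokeh plots
}


def choose_chart_format(use_case: str, default: str = "svg") -> str:
    """Return the chart_format for a use case, falling back to `default`."""
    return CHART_FORMATS.get(use_case.strip().lower(), default)


if __name__ == "__main__":
    print(choose_chart_format("slides"))
```

The returned string can be passed straight through as `chart_format=choose_chart_format("slides")`.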
**Saving Charts to Directory:**
```python
from autoviz import AutoViz_Class
import pandas as pd
import os

# Create output directory
output_dir = "analysis_output"
os.makedirs(output_dir, exist_ok=True)

df = pd.read_csv("data.csv")
AV = AutoViz_Class()

# Save all charts to the specified directory
df_analyzed = AV.AutoViz(
    filename="",
    dfte=df,
    chart_format="png",
    save_plot_dir=output_dir,  # Directory to save plots
    verbose=2                  # Save charts to disk instead of displaying them
)

# List generated files (walk the tree, since charts may land in a subfolder)
for root, _dirs, files in os.walk(output_dir):
    for file in files:
        print(f"Generated: {os.path.join(root, file)}")
```
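Once charts are on disk, a single browsable page is often handier than a folder of images. A minimal sketch using only the standard library, assuming the charts were saved as image files somewhere under one directory; the function name `build_chart_index` and the output file name are illustrative, not part of AutoViz:

```python
import html
from pathlib import Path

# Image types to include in the index (assumed; adjust to the formats you save)
IMAGE_SUFFIXES = {".png", ".svg", ".jpg"}


def build_chart_index(plot_dir, index_name="index.html"):
    """Write an HTML page embedding every chart image found under plot_dir."""
    root = Path(plot_dir)
    charts = sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )

    figures = []
    for chart in charts:
        # Relative path so the page keeps working if the folder is moved
        src = html.escape(chart.relative_to(root).as_posix())
        title = html.escape(chart.stem)
        figures.append(
            f'<figure><img src="{src}" alt="{title}">'
            f"<figcaption>{title}</figcaption></figure>"
        )

    page = (
        '<!DOCTYPE html>\n<html><head><meta charset="utf-8">'
        "<title>Charts</title></head><body>\n"
        + "\n".join(figures)
        + "\n</body></html>\n"
    )
    index_path = root / index_name
    index_path.write_text(page, encoding="utf-8")
    return index_path
```

Calling `build_chart_index("analysis_output")` writes `analysis_output/index.html`, which can be opened in any browser or attached to a report.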
### 3. Handling Large Datasets
**Sampling Strategies:**
```python
from autoviz import AutoViz_Class
import pandas as pd
import numpy as np

# Create large dataset
np.random.seed(42)
large_df = pd.DataFrame({
    "feature_" + str(i): np.random.randn(500000)
    for i in range(20)
})
large_df["category"] = np.random.choice(["A", "B", "C", "D"], 500000)
large_df["target"] = np.random.randint(0, 2, 500000)

print(f"Dataset size: {large_df.shape}")

AV = AutoViz_Class()

# Control sampling with max_rows_analyzed
df_analyzed = AV.AutoViz(
    filename="",
    dfte=large_df,
    max_rows_analyzed=100000,  # Sample 100K rows
    max_cols_analyzed=25,      # Limit columns analyzed
    verbose=1,
    chart_format="png"
)

# For very large datasets, use smaller sample
df_analyzed_small = AV.AutoViz(
    filename="",
    dfte=large_df,
    max_rows_analyzed=50000,   # Smaller sample for speed
    max_cols_analyzed=15,
    verbose=0,                 # Minimal output
    chart_format="svg"
)
```
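When the data lives in a CSV file too large to load at once, a row sample can also be drawn while streaming the file. The sketch below uses reservoir sampling (Algorithm R), which reads the file once and returns exactly `sample_size` rows when the file has at least that many. It is a standard-library illustration, not something AutoViz provides; the function name `reservoir_sample_csv` is hypothetical, and the file is assumed to have a header row.

```python
import csv
import random


def reservoir_sample_csv(path, sample_size, seed=None):
    """Return (header, rows): a uniform random sample of at most sample_size rows."""
    rng = random.Random(seed)
    reservoir = []
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        header = next(reader)  # Assumes the first line is a header
        for index, row in enumerate(reader):
            if index < sample_size:
                # Fill the reservoir with the first sample_size rows
                reservoir.append(row)
            else:
                # Replace a random slot with probability sample_size / (index + 1)
                slot = rng.randint(0, index)
                if slot < sample_size:
                    reservoir[slot] = row
    return header, reservoir


# Usage sketch (values come back as strings, so convert dtypes before analysis):
# header, rows = reservoir_sample_csv("data.csv", 100000, seed=42)
# df = pd.DataFrame(rows, columns=header)
```

Only the reservoir is held in memory, so peak usage depends on `sample_size` rather than on the size of the file.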
**Memory-Efficient Analysis:**
```python
from autoviz import AutoViz_Class
import numpy as np
import pandas as pd


def analyze_large_file(file_path: str, sample_size: int = 100000) -> pd.DataFrame:
    """
    Analyze large files efficiently with sampling.

    Args:
        file_path: Path to CSV file
        sample_size: Approximate number of rows to sample

    Returns:
        Analyzed DataFrame
    """
    # Count data rows without loading the file into memory
    with open(file_path) as f:
        total_rows = sum(1 for _ in f) - 1  # Exclude header

    if total_rows > sample_size:
        # Probability of skipping any given data row
        skip_prob = 1 - (sample_size / total_rows)
        # Keep the header (i == 0) and randomly skip data rows while reading
        df = pd.read_csv(
            file_path,
            skiprows=lambda i: i > 0 and np.random.random() < skip_prob
        )
    else:
        df = pd.read_csv(file_path)

    print(f"Sampled {len(df)} rows from {total_rows} total")

    AV = AutoViz_Class()
    return AV.AutoViz(
        filename="",
        dfte=df,
        verbose=1,
        chart_format="png"
    )
```