autoviz

Automatic exploratory data analysis and visualization with a single line of code - generates comprehensive charts, detects patterns, and exports to HTML/notebooks

data-analysis, autoviz, eda, exploratory-data-analysis, visualization, automated, charts, correlation, distribution

What this skill does


# AutoViz Automatic EDA Skill

Master AutoViz for instant exploratory data analysis with a single line of code. Generate comprehensive visualizations, detect patterns, identify outliers, and export publication-ready charts automatically.

## When to Use This Skill

### USE AutoViz when:
- **Quick EDA** - Need rapid insights into a new dataset
- **Initial exploration** - Starting analysis on unfamiliar data
- **Pattern discovery** - Automatically detect relationships between variables
- **Presentation prep** - Need charts quickly for stakeholder meetings
- **Large datasets** - Built-in sampling (`max_rows_analyzed`, `max_cols_analyzed`) limits how much data is charted
- **Feature analysis** - Understanding distribution and importance of features
- **Correlation hunting** - Finding relationships without manual chart creation
- **Report generation** - Export comprehensive HTML reports

### DON'T USE AutoViz when:
- **Custom visualizations** - Need highly specific chart designs
- **Interactive dashboards** - Use Streamlit or Dash instead
- **Real-time data** - Streaming visualization requirements
- **Production systems** - Charts for automated pipelines (use Plotly/Altair)
- **Precise statistical tests** - Need formal hypothesis testing
- **Domain-specific plots** - Specialized visualizations not in standard EDA

## Prerequisites

```bash
# Basic installation
pip install autoviz

# With all visualization backends
pip install autoviz matplotlib seaborn plotly bokeh

# Using uv (recommended)
uv pip install autoviz pandas matplotlib seaborn plotly

# Jupyter notebook support
pip install autoviz ipywidgets notebook

# Verify installation (single quotes outside so "!" is not treated as history expansion in interactive bash)
python -c 'from autoviz import AutoViz_Class; print("AutoViz ready!")'
```
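The install commands above pull in several optional backends. As a minimal sketch, the check below reports which of them Python can actually find; it uses only the standard library, and the helper name `missing_packages` and the list `OPTIONAL_BACKENDS` are illustrative, not part of AutoViz. It assumes each package's import name matches its pip name, which holds for the packages listed here.

```python
from importlib.util import find_spec

# Import names for the optional packages installed above
# (assumed to match their pip names).
OPTIONAL_BACKENDS = ["matplotlib", "seaborn", "plotly", "bokeh", "ipywidgets"]


def missing_packages(names):
    """Return the top-level module names that cannot be found, in input order."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(["autoviz", "pandas"] + OPTIONAL_BACKENDS)
    if missing:
        print("Not installed:", ", ".join(missing))
    else:
        print("All packages found")
```

`find_spec` only locates a module without importing it, so the check stays fast even for heavy plotting libraries.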

## Core Capabilities

### 1. Basic One-Line EDA

**Simplest Usage:**
```python
from autoviz import AutoViz_Class

# Initialize AutoViz
AV = AutoViz_Class()

# Automatic visualization with one line
# Returns a dataframe and generates all charts
df_analyzed = AV.AutoViz(
    filename="data.csv",
    sep=",",
    depVar="",  # Target variable (optional)
    dfte=None,  # Pass DataFrame directly instead of filename
    header=0,
    verbose=1,  # 0=silent, 1=print messages and display charts, 2=print messages and save charts without displaying
    lowess=False,
    chart_format="svg",
    max_rows_analyzed=150000,
    max_cols_analyzed=30
)

print(f"Analyzed {df_analyzed.shape[0]} rows, {df_analyzed.shape[1]} columns")
```

**From DataFrame:**
```python
from autoviz import AutoViz_Class
import pandas as pd

# Load your data, for example:
# df = pd.read_csv("sales_data.csv")

# Or create sample data
df = pd.DataFrame({
    "revenue": [100, 200, 150, 300, 250, 400, 350, 500],
    "units": [10, 20, 15, 30, 25, 40, 35, 50],
    "category": ["A", "B", "A", "B", "A", "B", "A", "B"],
    "region": ["North", "South", "East", "West", "North", "South", "East", "West"],
    "profit": [20, 40, 30, 60, 50, 80, 70, 100],
    "customer_age": [25, 35, 45, 55, 30, 40, 50, 60]
})

# Initialize and visualize
AV = AutoViz_Class()

# Pass DataFrame directly using dfte parameter
df_result = AV.AutoViz(
    filename="",  # Empty when using dfte
    sep=",",
    depVar="profit",  # Optional: specify target variable
    dfte=df,
    header=0,
    verbose=1,
    chart_format="png"
)
```

**With Target Variable Analysis:**
```python
from autoviz import AutoViz_Class
import pandas as pd

# Classification dataset
df_classification = pd.DataFrame({
    "feature_1": [1.2, 2.3, 1.5, 3.4, 2.1, 4.5, 3.2, 5.1],
    "feature_2": [0.5, 1.2, 0.8, 2.1, 1.0, 3.2, 2.4, 4.0],
    "feature_3": ["low", "medium", "low", "high", "medium", "high", "medium", "high"],
    "target": [0, 0, 0, 1, 0, 1, 1, 1]
})

AV = AutoViz_Class()

# Specify target variable for focused analysis
df_analyzed = AV.AutoViz(
    filename="",
    sep=",",
    depVar="target",  # Target variable for classification
    dfte=df_classification,
    header=0,
    verbose=2,  # Print messages and save charts instead of displaying them
    chart_format="svg"
)

# Regression dataset
df_regression = pd.DataFrame({
    "size": [1000, 1500, 1200, 2000, 1800, 2500, 2200, 3000],
    "bedrooms": [2, 3, 2, 4, 3, 4, 4, 5],
    "location": ["urban", "suburban", "urban", "rural", "suburban", "rural", "suburban", "rural"],
    "age": [5, 10, 3, 15, 8, 20, 12, 25],
    "price": [200000, 280000, 220000, 350000, 300000, 380000, 340000, 420000]
})

# Analyze with continuous target
df_analyzed = AV.AutoViz(
    filename="",
    sep=",",
    depVar="price",  # Continuous target
    dfte=df_regression,
    header=0,
    verbose=1,
    chart_format="png"
)
```

### 2. Chart Format and Output Options

**Different Chart Formats:**
```python
from autoviz import AutoViz_Class
import pandas as pd

df = pd.read_csv("data.csv")
AV = AutoViz_Class()

# SVG format (vector, scalable)
df_svg = AV.AutoViz(
    filename="",
    dfte=df,
    chart_format="svg",  # Scalable vector graphics
    verbose=1
)

# PNG format (raster, good for presentations)
df_png = AV.AutoViz(
    filename="",
    dfte=df,
    chart_format="png",  # PNG images
    verbose=1
)

# HTML format (interactive Bokeh charts saved as HTML files)
df_html = AV.AutoViz(
    filename="",
    dfte=df,
    chart_format="html",  # Bokeh charts written to HTML files
    verbose=1
)

# Bokeh format (interactive charts rendered in a Jupyter notebook)
df_bokeh = AV.AutoViz(
    filename="",
    dfte=df,
    chart_format="bokeh",  # Interactive Bokeh charts in the notebook
    verbose=1
)

# Server mode (interactive Bokeh charts opened in the browser)
df_server = AV.AutoViz(
    filename="",
    dfte=df,
    chart_format="server",  # One browser page per chart type
    verbose=1
)
```

**Saving Charts to Directory:**
```python
from autoviz import AutoViz_Class
import pandas as pd
import os

# Create output directory
output_dir = "analysis_output"
os.makedirs(output_dir, exist_ok=True)

df = pd.read_csv("data.csv")
AV = AutoViz_Class()

# Save all charts to specified directory
df_analyzed = AV.AutoViz(
    filename="",
    dfte=df,
    chart_format="png",
    save_plot_dir=output_dir,  # Directory to save plots
    verbose=2  # 2 = save charts to disk instead of displaying them
)

# List generated files
for file in os.listdir(output_dir):
    print(f"Generated: {file}")
```
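AutoViz may write charts into a subfolder of `save_plot_dir` (for example, one named after the target variable), in which case `os.listdir` shows only the folder name. The sketch below makes no assumption about the layout: it walks the directory recursively and groups whatever files it finds by extension. The helper name `group_charts_by_type` is illustrative and not part of AutoViz.

```python
from collections import defaultdict
from pathlib import Path


def group_charts_by_type(output_dir):
    """Map each file extension to the sorted relative paths found under output_dir."""
    root = Path(output_dir)
    if not root.is_dir():
        return {}
    grouped = defaultdict(list)
    for path in root.rglob("*"):
        if path.is_file():
            # Lower-case the extension so ".PNG" and ".png" land in one group
            grouped[path.suffix.lower()].append(path.relative_to(root).as_posix())
    return {ext: sorted(paths) for ext, paths in grouped.items()}


if __name__ == "__main__":
    for ext, paths in group_charts_by_type("analysis_output").items():
        print(f"{ext or '(no extension)'}: {len(paths)} file(s)")
        for rel_path in paths:
            print(f"  {rel_path}")
```

Grouping by extension gives a quick check that the chosen `chart_format` actually produced files of the expected type.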

### 3. Handling Large Datasets

**Sampling Strategies:**
```python
from autoviz import AutoViz_Class
import pandas as pd
import numpy as np

# Create large dataset
np.random.seed(42)
large_df = pd.DataFrame({
    "feature_" + str(i): np.random.randn(500000)
    for i in range(20)
})
large_df["category"] = np.random.choice(["A", "B", "C", "D"], 500000)
large_df["target"] = np.random.randint(0, 2, 500000)

print(f"Dataset size: {large_df.shape}")

AV = AutoViz_Class()

# Control sampling with max_rows_analyzed
df_analyzed = AV.AutoViz(
    filename="",
    dfte=large_df,
    max_rows_analyzed=100000,  # Sample 100K rows
    max_cols_analyzed=25,  # Limit columns analyzed
    verbose=1,
    chart_format="png"
)

# For very large datasets, use smaller sample
df_analyzed_small = AV.AutoViz(
    filename="",
    dfte=large_df,
    max_rows_analyzed=50000,  # Smaller sample for speed
    max_cols_analyzed=15,
    verbose=0,  # Minimal output
    chart_format="svg"
)
```
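The calls above sample from a DataFrame that is already fully in memory. When the file itself is too large to load, one alternative is to draw a fixed-size uniform sample in a single pass with reservoir sampling (Algorithm R). The sketch below uses only the standard library; the function name `reservoir_sample_csv` is illustrative, and it assumes a CSV file with a single header row.

```python
import csv
import random


def reservoir_sample_csv(file_path, sample_size, seed=None):
    """Return (header, rows): a uniform random sample of up to sample_size data rows."""
    rng = random.Random(seed)
    with open(file_path, newline="") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])
        reservoir = []
        for index, row in enumerate(reader):
            if index < sample_size:
                # Fill the reservoir with the first sample_size rows
                reservoir.append(row)
            else:
                # Keep the new row with probability sample_size / (index + 1)
                slot = rng.randint(0, index)
                if slot < sample_size:
                    reservoir[slot] = row
    return header, reservoir
```

The sampled rows are lists of strings, so building a DataFrame with `pd.DataFrame(rows, columns=header)` would still need a type-conversion step (for example `pd.to_numeric` on numeric columns) before passing it to `dfte`.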

**Memory-Efficient Analysis:**
```python
from autoviz import AutoViz_Class
import numpy as np
import pandas as pd

def analyze_large_file(file_path: str, sample_size: int = 100000) -> pd.DataFrame:
    """
    Analyze large files efficiently with sampling.

    Args:
        file_path: Path to CSV file
        sample_size: Number of rows to sample

    Returns:
        Analyzed DataFrame
    """
    # Count data rows without loading the file into memory
    with open(file_path) as handle:
        total_rows = sum(1 for _ in handle) - 1  # Exclude header

    if total_rows > sample_size:
        # Calculate skip probability
        skip_prob = 1 - (sample_size / total_rows)

        # Read with sampling
        df = pd.read_csv(
            file_path,
            skiprows=lambda i: i > 0 and np.random.random() < skip_prob
        )
    else:
        df = pd.read_csv(file_path)

    print(f"Sampled {len(df)} rows from {total_rows} total")

    AV = AutoViz_Class()
    return AV.AutoViz(
        filename="",
        dfte=df,
        verbose=1,
        chart_format="png"
    )

```