
pandasai


Conversational data analysis using natural language queries on DataFrames. Chat with your data using LLMs to generate insights, create visualizations, and explain code.



# PandasAI Skill

> Chat with your data using natural language. Ask questions about DataFrames and get insights, visualizations, and explanations powered by LLMs.

## Quick Start

```bash
# Install PandasAI
pip install pandasai

# Install with OpenAI support
pip install pandasai openai

# Install with visualization support
pip install pandasai matplotlib seaborn plotly

# Set API key
export OPENAI_API_KEY="your-api-key"
```

## When to Use This Skill

**USE when:**
- Analyzing data without writing the pandas code yourself
- Exploring unfamiliar datasets with natural language questions
- Creating quick visualizations from descriptive prompts
- Explaining complex data transformations to stakeholders
- Building conversational interfaces for data exploration
- Prototyping data analysis workflows rapidly
- Getting LLM-powered insights from pandas DataFrames
- Creating reports from data with natural language descriptions

**DON'T USE when:**
- Production data pipelines requiring deterministic outputs
- Processing highly sensitive data (if unavoidable, review PandasAI's `enforce_privacy` config option)
- Requiring precise control over the generated code
- Performance-critical applications with large datasets
- Simple queries better handled by direct pandas operations
- Reproducible analyses requiring version-controlled code
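For simple questions, plain pandas gives a deterministic, reviewable answer with no LLM round trip. A minimal sketch, assuming a small sales table with illustrative `region` and `revenue` columns (the data is made up for the example):

```python
import pandas as pd

# Hypothetical sample data; column names are illustrative only
sales = pd.DataFrame({
    "region": ["North", "South", "North", "East"],
    "revenue": [500, 1125, 2000, 3125],
})

# "What is the total revenue?" as a direct pandas expression
total_revenue = sales["revenue"].sum()

# "Which region has the highest revenue?" via groupby + idxmax
revenue_by_region = sales.groupby("region")["revenue"].sum()
top_region = revenue_by_region.idxmax()

print(total_revenue)  # 6750
print(top_region)     # East
```

Both lines are easy to version-control and review, which is exactly what the LLM-generated path makes harder.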

## Prerequisites

```bash
# Core installation
# (quote the specifier so the shell does not treat ">" as a redirect)
pip install "pandasai>=2.0.0"

# For OpenAI backend (recommended)
pip install pandasai openai

# For visualization
pip install pandasai matplotlib seaborn plotly

# For Excel/CSV handling
pip install pandasai openpyxl xlrd

# For database connections
pip install pandasai sqlalchemy psycopg2-binary pymysql

# Environment setup
export OPENAI_API_KEY="sk-..."
# Or for other providers
export ANTHROPIC_API_KEY="sk-ant-..."
export GOOGLE_API_KEY="..."
```
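The `pandasai>=2.0.0` pin only applies at install time. To check that an existing environment already satisfies it, here is a sketch using the standard library's `importlib.metadata`. The helper names are hypothetical, and the comparison ignores pre-release suffixes, so it is looser than pip's PEP 440 rules (`packaging.version.Version` is the stricter choice if `packaging` is installed):

```python
import re
from importlib.metadata import PackageNotFoundError, version
from typing import Optional, Tuple


def parse_numeric_version(text: str) -> Tuple[int, ...]:
    """Leading numeric components of a version, e.g. '2.3.1rc1' -> (2, 3, 1)."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
        if match.end() != len(piece):
            break  # stop after a component with a suffix such as 'rc1'
    return tuple(parts)


def meets_minimum(installed: str, minimum: str) -> bool:
    """Component-wise comparison, padding the shorter version with zeros."""
    a = parse_numeric_version(installed)
    b = parse_numeric_version(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def installed_pandasai_version() -> Optional[str]:
    """Installed pandasai version string, or None if the package is missing."""
    try:
        return version("pandasai")
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_pandasai_version()
    if found is None:
        print("pandasai is not installed")
    else:
        print(f"pandasai {found} satisfies >=2.0.0: {meets_minimum(found, '2.0.0')}")
```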

### Verify Installation

```python
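import pandasai
from pandasai import SmartDataframe
from pandasai.llm import OpenAI
import pandas as pd

print(f"PandasAI version: {pandasai.__version__}")

# Quick test with an explicit LLM, so the check uses OPENAI_API_KEY.
# Without one, PandasAI 2.x may fall back to its hosted BambooLLM backend,
# which expects PANDASAI_API_KEY instead.
df = pd.DataFrame({"name": ["Alice", "Bob"], "age": [25, 30]})
smart_df = SmartDataframe(df, config={"llm": OpenAI()})
print("PandasAI installed successfully!")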
```
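Before building a SmartDataframe, it can help to confirm that at least one provider key from the Prerequisites block is exported, so configuration problems show up before the first query. A minimal sketch; `configured_providers` is a hypothetical helper, the variable names are the ones listed above, and it only checks that a key is present, not that it is valid:

```python
import os
from typing import Dict, List, Mapping, Optional

# Environment variable names taken from the Prerequisites block
PROVIDER_KEYS: Dict[str, str] = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "google": "GOOGLE_API_KEY",
}


def configured_providers(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return providers whose API key variable is set to a non-blank value."""
    env = os.environ if env is None else env
    return [
        provider
        for provider, var in PROVIDER_KEYS.items()
        if env.get(var, "").strip()
    ]


if __name__ == "__main__":
    found = configured_providers()
    if found:
        print(f"API keys found for: {', '.join(found)}")
    else:
        print("No provider API key set; export OPENAI_API_KEY (or another key) first")
```

Passing an explicit mapping instead of reading `os.environ` keeps the helper easy to test.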

## Core Capabilities

### 1. Basic Natural Language Queries

**Simple DataFrame Conversations:**
```python
"""
Query DataFrames using natural language with PandasAI.
"""
import pandas as pd
from pandasai import SmartDataframe
from pandasai.llm import OpenAI

def create_smart_dataframe(
    df: pd.DataFrame,
    model: str = "gpt-4",
    temperature: float = 0.0
) -> SmartDataframe:
    """
    Create a SmartDataframe for natural language queries.

    Args:
        df: Input pandas DataFrame
        model: LLM model to use
        temperature: Sampling temperature (0 for deterministic)

    Returns:
        SmartDataframe instance ready for queries
    """
    # Initialize LLM
    llm = OpenAI(
        model=model,
        temperature=temperature
    )

    # Create SmartDataframe
    smart_df = SmartDataframe(
        df,
        config={
            "llm": llm,
            "verbose": True,
            "enable_cache": True,
            "conversational": False
        }
    )

    return smart_df


def query_dataframe(smart_df: SmartDataframe, question: str) -> object:
    """
    Ask a natural language question about the data.

    Args:
        smart_df: SmartDataframe instance
        question: Natural language question

    Returns:
        Query result (can be value, DataFrame, or visualization)
    """
    try:
        result = smart_df.chat(question)
        return result
    except Exception as e:
        print(f"Query error: {e}")
        return None


# Usage Example
# Create sample sales data
sales_data = pd.DataFrame({
    "date": pd.date_range("2025-01-01", periods=100, freq="D"),
    "product": ["Widget A", "Widget B", "Widget C", "Widget D"] * 25,
    "region": ["North", "South", "East", "West"] * 25,
    "units_sold": [50, 75, 100, 125] * 25,
    "revenue": [500, 1125, 2000, 3125] * 25,
    "cost": [300, 600, 1200, 1800] * 25
})

# Create SmartDataframe
smart_sales = create_smart_dataframe(sales_data)

# Ask questions in natural language
questions = [
    "What is the total revenue?",
    "Which product has the highest average revenue?",
    "What is the profit margin by product?",
    "Show me the top 5 days by units sold"
]

for question in questions:
    print(f"\nQ: {question}")
    result = query_dataframe(smart_sales, question)
    print(f"A: {result}")
```
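LLM-generated code can be wrong in subtle ways, so it is worth checking answers against plain pandas. This sketch rebuilds the same `sales_data` frame and computes reference values for the four questions above. Defining profit margin as `(revenue - cost) / revenue` is an assumption, since the question does not specify it:

```python
import pandas as pd

# Same sample data as the example above
sales_data = pd.DataFrame({
    "date": pd.date_range("2025-01-01", periods=100, freq="D"),
    "product": ["Widget A", "Widget B", "Widget C", "Widget D"] * 25,
    "region": ["North", "South", "East", "West"] * 25,
    "units_sold": [50, 75, 100, 125] * 25,
    "revenue": [500, 1125, 2000, 3125] * 25,
    "cost": [300, 600, 1200, 1800] * 25,
})

# "What is the total revenue?"
total_revenue = sales_data["revenue"].sum()

# "Which product has the highest average revenue?"
avg_revenue = sales_data.groupby("product")["revenue"].mean()
top_product = avg_revenue.idxmax()

# "What is the profit margin by product?" (assumed: (revenue - cost) / revenue)
totals = sales_data.groupby("product")[["revenue", "cost"]].sum()
profit_margin = ((totals["revenue"] - totals["cost"]) / totals["revenue"]).round(4)

# "Show me the top 5 days by units sold" (ties resolved by row order, keep="first")
top_days = sales_data.nlargest(5, "units_sold")[["date", "product", "units_sold"]]

print(total_revenue)  # 168750
print(top_product)    # Widget D
print(profit_margin)
print(top_days)
```

Because 25 days tie at 125 units, "top 5 days" has no unique answer here; an LLM response listing any five Widget D days is consistent with the data.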

**Aggregation and Statistical Queries:**
```python
"""
Perform aggregations and statistical analysis using natural language.
"""
import pandas as pd
from pandasai import SmartDataframe
from pandasai.llm import OpenAI

def create_analytics_interface(df: pd.DataFrame) -> SmartDataframe:
    """Create an analytics interface for statistical queries."""
    llm = OpenAI(model="gpt-4", temperature=0)

    return SmartDataframe(
        df,
        config={
            "llm": llm,
            "verbose": False,
            "enable_cache": True,
            "custom_whitelisted_dependencies": ["scipy", "numpy"]
        }
    )


# Create sample employee performance data
employee_data = pd.DataFrame({
    "employee_id": range(1, 101),
    "department": ["Engineering", "Sales", "Marketing", "HR", "Finance"] * 20,
    "experience_years": [1, 2, 3, 5, 7, 10, 12, 15, 18, 20] * 10,
    "performance_score": [round(3.0 + i * 0.02, 2) for i in range(100)],
    "salary": [50000 + i * 500 for i in range(100)],
    "projects_completed": [5 + i % 20 for i in range(100)]
})

smart_employees = create_analytics_interface(employee_data)

# Statistical queries
statistical_questions = [
    "What is the average salary by department?",
    "Calculate the correlation between experience_years and salary",
    "What is the standard deviation of performance scores?",
    "Find the median projects completed by department",
    "Which department has the highest variance in salary?",
    "Show the salary distribution statistics (mean, median, std, min, max)"
]

print("Statistical Analysis Results:")
print("=" * 50)

for question in statistical_questions:
    print(f"\nQuestion: {question}")
    result = smart_employees.chat(question)
    print(f"Answer: {result}")
```
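The same cross-check applies to the statistical questions. Here is a sketch of direct pandas equivalents on the `employee_data` frame above, rebuilt so the block runs on its own. It assumes "variance" and "standard deviation" mean the sample statistics, which are pandas' default (`ddof=1`):

```python
import pandas as pd

# Same synthetic data as the example above
employee_data = pd.DataFrame({
    "employee_id": range(1, 101),
    "department": ["Engineering", "Sales", "Marketing", "HR", "Finance"] * 20,
    "experience_years": [1, 2, 3, 5, 7, 10, 12, 15, 18, 20] * 10,
    "performance_score": [round(3.0 + i * 0.02, 2) for i in range(100)],
    "salary": [50000 + i * 500 for i in range(100)],
    "projects_completed": [5 + i % 20 for i in range(100)],
})

by_dept = employee_data.groupby("department")

# Average salary and median projects per department
avg_salary = by_dept["salary"].mean()
median_projects = by_dept["projects_completed"].median()

# Pearson correlation (pandas' default method)
corr = employee_data["experience_years"].corr(employee_data["salary"])

# Sample standard deviation and per-department sample variance
score_std = employee_data["performance_score"].std()
salary_var = by_dept["salary"].var()

# Summary statistics for one column
salary_stats = employee_data["salary"].agg(["mean", "median", "std", "min", "max"])

print(avg_salary, median_projects, salary_var, salary_stats, sep="\n\n")
print(f"corr={corr:.3f}, performance std={score_std:.4f}")
```

In this synthetic data, each department's salaries step by 2,500 over the same number of rows, so `salary_var` is identical for every department. An answer that names a single "highest variance" department is picking from a tie. The experience/salary correlation is also weak (about 0.1), because `experience_years` repeats every 10 rows while `salary` rises steadily.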

### 2. Chart Generation and Visualization

**Creating Charts from Natural Language:**
```python
"""
Generate visualizations using natural language descriptions.
"""
import pandas as pd
from pandasai import SmartDataframe
from pandasai.llm import OpenAI
import matplotlib.pyplot as plt
from pathlib import Path

def create_visualization_interface(
    df: pd.DataFrame,
    save_charts: bool = True,
    charts_path: str = "./charts"
) -> SmartDataframe:
    """
    Create interface optimized for chart generation.

    Args:
        df: Input DataFrame
        save_charts: Whether to save generated charts
        charts_path: Directory to save charts

    Returns:
        SmartDataframe configured for visualization
    """
    llm = OpenAI(model="gpt-4", temperature=0)

    # Create charts directory
    Path(charts_path).mkdir(parents=True, exist_ok=True)

    return SmartDataframe(
        df,
        config={
            "llm": llm,
            "save_charts": save_charts,
            "save_charts_path": charts_path,
            "verbose": True,
            "custom_whitelisted_dependencies": [
                "matplotlib",
                "seaborn",
                "plotly"
            ]
        }
    )


# Create sample time series data
import numpy as np

np.random.seed(42)
dates = pd.date_range("2024-01-01", periods=365, freq="D")
time_series_data = pd.DataFrame({
    "date": dates,
    "sales": np.random.normal(1000, 200, 365).cumsum(),
    "customers": np.random.poisson(50, 365),
    "category": ["Electronics", "Clothing", "Food", "Home"] * 91 + ["Electronics"],
    "region": ["North", "South", "East", "West", "Central"] * 73
})

smart_viz = create_visualization_interface(time_series_data)

# Generate various chart types
chart_prompts = [
    "Create a line chart showing sales trend over time",
