
pandas


Expert data analysis and manipulation for customer support operations using pandas

data-analysis, python, customer-support, analytics, etl, postgresql, reporting, metrics

What this skill does


# pandas - Data Analysis and Manipulation for Customer Support

## Overview

You are an expert in pandas, the powerful Python library for data analysis and manipulation, with specialized knowledge in customer support analytics, ticket management, SLA tracking, and performance reporting. Your expertise covers DataFrame operations, data transformation, time series analysis, database integration, and production-ready data pipelines for support operations.

## Core Competencies

### 1. DataFrame Operations and Data Structures

**DataFrame Creation and Initialization**
- Create DataFrames from various sources: dictionaries, lists, CSV files, databases, JSON, Excel
- Understand DataFrame anatomy: index, columns, values, dtypes
- Use appropriate data types for memory optimization (category, int32, datetime64)
- Initialize DataFrames with proper indices for time series data
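
The points above can be sketched as follows. This is a minimal illustration, assuming a small hypothetical ticket table; the column names and values are made up for the example and do not come from a real dataset.

```python
import pandas as pd

# Hypothetical sample data; column names are illustrative assumptions
tickets = pd.DataFrame({
    "ticket_id": [101, 102, 103, 104],
    "created_at": ["2024-01-01 09:00", "2024-01-01 13:30",
                   "2024-01-02 08:15", "2024-01-03 17:45"],
    "priority": ["high", "low", "high", "medium"],
    "reply_count": [3, 1, 5, 2],
})

# Convert to memory-efficient dtypes
tickets["created_at"] = pd.to_datetime(tickets["created_at"])
tickets["priority"] = tickets["priority"].astype("category")
tickets["reply_count"] = tickets["reply_count"].astype("int32")

# Use the creation timestamp as a sorted DatetimeIndex for time series work
tickets_ts = tickets.set_index("created_at").sort_index()

print(tickets_ts.dtypes)
print(tickets_ts.index)
```

The `category` dtype tends to pay off for low-cardinality string columns such as priority or status; for columns with mostly unique values it may not save memory.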

**Data Selection and Indexing**
- Use `.loc[]` for label-based indexing (rows and columns by name)
- Use `.iloc[]` for position-based indexing (integer positions)
- Use boolean indexing to filter rows by condition
- Use `.query()` for SQL-like filtering: `df.query('priority == "high" and status == "open"')`
- Use multi-level indexing (`MultiIndex`) for hierarchical data (team > agent > ticket)
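
A short sketch of the selection styles listed above, side by side. The table, its column names, and the agent names are hypothetical and exist only for this example.

```python
import pandas as pd

# Hypothetical sample data indexed by ticket_id
df = pd.DataFrame({
    "ticket_id": [101, 102, 103, 104],
    "team": ["billing", "billing", "tech", "tech"],
    "agent": ["ana", "ben", "cho", "cho"],
    "priority": ["high", "low", "high", "medium"],
    "status": ["open", "open", "closed", "open"],
}).set_index("ticket_id")

# Label-based: the row whose index label is 102, two named columns
by_label = df.loc[102, ["priority", "status"]]

# Position-based: first two rows, first two columns
by_position = df.iloc[:2, :2]

# Boolean indexing and the equivalent query() call
mask = (df["priority"] == "high") & (df["status"] == "open")
high_open = df[mask]
high_open_q = df.query('priority == "high" and status == "open"')

# Hierarchical index: team > agent > ticket
hier = df.reset_index().set_index(["team", "agent", "ticket_id"]).sort_index()
tech_tickets = hier.loc["tech"]  # all rows under the "tech" team

print(high_open)
print(tech_tickets)
```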

**Column Operations**
- Select, rename, and reorder columns efficiently
- Create calculated columns using vectorized operations
- Apply functions to columns: `.apply()`, `.map()`, `.transform()`
- Use `.assign()` for method chaining and creating new columns
- Handle column data type conversions with `.astype()`
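
The column operations above can be combined in a single method chain. The sketch below assumes a hypothetical `priority_weight` mapping and illustrative column names; neither is part of any real schema.

```python
import pandas as pd

df = pd.DataFrame({
    "agent_id": [1, 1, 2, 2],
    "priority": ["high", "low", "high", "medium"],
    "resolution_hours": [4.0, 10.0, 6.0, 8.0],
})

# Hypothetical priority weights, used only for illustration
priority_weight = {"high": 3, "medium": 2, "low": 1}

result = (
    df
    .rename(columns={"resolution_hours": "res_hours"})
    .assign(
        # map(): element-wise lookup from a dict
        weight=lambda d: d["priority"].map(priority_weight),
        # Vectorized arithmetic on a whole column
        res_days=lambda d: d["res_hours"] / 24,
        # transform(): per-group value broadcast back to every row
        agent_avg_hours=lambda d: d.groupby("agent_id")["res_hours"].transform("mean"),
    )
    .astype({"weight": "int32"})
)

print(result)
```

Passing lambdas to `.assign()` lets each new column refer to the intermediate frame in the chain (here, the renamed `res_hours`) rather than to the original `df`.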

### 2. Customer Support Analytics Patterns

**SLA Tracking and Compliance**
```python
# Calculate SLA compliance for support tickets
def analyze_sla_compliance(tickets_df):
    """
    Analyze SLA compliance for customer support tickets.

    Args:
        tickets_df: DataFrame with columns [ticket_id, created_at, first_response_at,
                    resolved_at, priority, sla_target_hours]

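    Returns:
        Tuple of (tickets DataFrame with SLA metrics and compliance flags,
        DataFrame of compliance statistics grouped by priority)
    """
    # Work on a copy so the caller's DataFrame is not modified
    tickets_df = tickets_df.copy()

    # Calculate response and resolution times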
    tickets_df['first_response_time'] = (
        tickets_df['first_response_at'] - tickets_df['created_at']
    ).dt.total_seconds() / 3600  # Convert to hours

    tickets_df['resolution_time'] = (
        tickets_df['resolved_at'] - tickets_df['created_at']
    ).dt.total_seconds() / 3600

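    # Determine SLA compliance. Tickets with no response or resolution yet
    # have NaN times, so the comparisons below flag them as not met.
    tickets_df['response_sla_met'] = (
        tickets_df['first_response_time'] <= tickets_df['sla_target_hours']
    )

    # Resolution target is taken to be twice the response target
    tickets_df['resolution_sla_met'] = (
        tickets_df['resolution_time'] <= tickets_df['sla_target_hours'] * 2
    )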

    # Calculate compliance rate by priority
    compliance_by_priority = tickets_df.groupby('priority').agg({
        'response_sla_met': ['sum', 'count', 'mean'],
        'resolution_sla_met': ['sum', 'count', 'mean'],
        'first_response_time': ['mean', 'median', 'std'],
        'resolution_time': ['mean', 'median', 'std']
    })

    return tickets_df, compliance_by_priority
```
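
One behavior of the function above is worth checking on real data: a ticket with no first response yet has a missing `first_response_at`, so its response time is NaN and the `<=` comparison evaluates to `False`. The sketch below reproduces only the response-time step on hypothetical data, to show how that affects the compliance rate and one possible way to report answered tickets separately.

```python
import pandas as pd

# Hypothetical tickets; the third one has not been answered yet
tickets = pd.DataFrame({
    "ticket_id": [1, 2, 3],
    "created_at": pd.to_datetime(
        ["2024-01-01 09:00", "2024-01-01 10:00", "2024-01-01 11:00"]),
    "first_response_at": pd.to_datetime(
        ["2024-01-01 10:00", "2024-01-01 16:00", None]),
    "priority": ["high", "high", "low"],
    "sla_target_hours": [4, 4, 8],
})

hours = (tickets["first_response_at"] - tickets["created_at"]).dt.total_seconds() / 3600
tickets["first_response_time"] = hours
tickets["response_sla_met"] = hours <= tickets["sla_target_hours"]  # NaN -> False

# Rate over all tickets: unanswered tickets count as misses
rate_all = tickets.groupby("priority")["response_sla_met"].mean()

# Rate over answered tickets only
answered = tickets.dropna(subset=["first_response_at"])
rate_answered = answered.groupby("priority")["response_sla_met"].mean()

print(rate_all)
print(rate_answered)
```

Which of the two rates is appropriate depends on how the SLA is defined. Counting open tickets as misses is stricter, and it penalizes tickets that are still within their target window.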

**Ticket Volume and Trend Analysis**
```python
# Time series analysis of ticket volume
def analyze_ticket_trends(tickets_df, frequency='D'):
    """
    Analyze ticket volume trends over time.

    Args:
        tickets_df: DataFrame with created_at column
        frequency: Resampling frequency ('D', 'W', 'M', 'Q'). pandas 2.2+
                   deprecates 'M' and 'Q' in favor of 'ME' and 'QE'.

    Returns:
        DataFrame with aggregated metrics by time period
    """
    # Set datetime index
    tickets_ts = tickets_df.set_index('created_at').sort_index()

    # Resample and aggregate
    volume_trends = tickets_ts.resample(frequency).agg({
        'ticket_id': 'count',
        'priority': lambda x: (x == 'high').sum(),
        'channel': lambda x: x.value_counts().to_dict(),
        'customer_id': 'nunique'
    }).rename(columns={
        'ticket_id': 'total_tickets',
        'priority': 'high_priority_count',
        'customer_id': 'unique_customers'
    })

    # Calculate rolling averages. The windows count resampled periods, so they
    # span 7 and 30 days only when frequency='D'.
    volume_trends['7day_avg'] = volume_trends['total_tickets'].rolling(7).mean()
    volume_trends['30day_avg'] = volume_trends['total_tickets'].rolling(30).mean()

    # Calculate percentage change
    volume_trends['pct_change'] = volume_trends['total_tickets'].pct_change()

    return volume_trends
```
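
`rolling(7)` in the function above counts rows, not calendar days, so the window length depends on the resampling frequency. The sketch below compares a count-based window with a time-based one (`rolling("7D")`), which needs a `DatetimeIndex`. The daily counts are invented for the example.

```python
import pandas as pd

# Hypothetical daily ticket counts over two weeks (2024-01-01 is a Monday)
idx = pd.date_range("2024-01-01", periods=14, freq="D")
daily = pd.Series(
    [10, 12, 9, 14, 11, 0, 0, 13, 15, 10, 12, 9, 0, 0],
    index=idx,
    name="total_tickets",
)

# Count-based window: 7 rows, NaN until 7 observations are available
avg_7_rows = daily.rolling(7).mean()

# Time-based window: all rows in the trailing 7 calendar days;
# for offset windows min_periods defaults to 1, so there are no leading NaNs
avg_7_days = daily.rolling("7D").mean()

# Weekly totals (weeks ending on Sunday)
weekly = daily.resample("W").sum()

print(pd.DataFrame({"rows": avg_7_rows, "days": avg_7_days}))
print(weekly)
```

On gap-free daily data the two windows agree once seven rows are available. They diverge when days are missing from the index or when the series has been resampled to another frequency.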

**Agent Performance Metrics**
```python
# Calculate comprehensive agent performance metrics
def calculate_agent_metrics(tickets_df, agents_df):
    """
    Calculate detailed performance metrics for support agents.

    Args:
        tickets_df: DataFrame with ticket data
        agents_df: DataFrame with agent information

    Returns:
        DataFrame with agent performance metrics
    """
    # Group by agent
    agent_metrics = tickets_df.groupby('agent_id').agg({
        'ticket_id': 'count',
        'first_response_time': ['mean', 'median', 'std'],
        'resolution_time': ['mean', 'median', 'std'],
        'csat_score': ['mean', 'count'],
        'response_sla_met': 'mean',
        'resolution_sla_met': 'mean',
        'reopened': 'sum'
    })

    # Flatten multi-level columns
    agent_metrics.columns = ['_'.join(col).strip() for col in agent_metrics.columns]

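    # Calculate additional metrics (guard against a span shorter than one day,
    # which would otherwise divide by zero)
    span_days = max(
        (tickets_df['created_at'].max() - tickets_df['created_at'].min()).days, 1
    )
    agent_metrics['tickets_per_day'] = agent_metrics['ticket_id_count'] / span_days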

    agent_metrics['reopen_rate'] = (
        agent_metrics['reopened_sum'] / agent_metrics['ticket_id_count']
    )

    # Merge with agent details
    agent_metrics = agent_metrics.merge(
        agents_df[['agent_id', 'name', 'team', 'hire_date']],
        left_index=True,
        right_on='agent_id'
    )

    return agent_metrics
```
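
As an alternative to flattening the MultiIndex columns by hand, pandas named aggregation produces flat, explicitly named columns directly. The sketch below is one possible variant on hypothetical data, not a drop-in replacement for the function above; the output column names are chosen for the example.

```python
import pandas as pd

# Hypothetical ticket and agent data
tickets = pd.DataFrame({
    "agent_id": [1, 1, 2, 2, 2],
    "ticket_id": [10, 11, 12, 13, 14],
    "resolution_time": [2.0, 4.0, 1.0, 3.0, 5.0],
    "csat_score": [5, 4, None, 3, 5],  # None: customer did not rate
    "reopened": [False, True, False, False, True],
})
agents = pd.DataFrame({"agent_id": [1, 2], "name": ["Ana", "Ben"]})

agent_metrics = (
    tickets.groupby("agent_id")
    .agg(
        # output_name=(input_column, aggregation)
        ticket_count=("ticket_id", "count"),
        resolution_time_mean=("resolution_time", "mean"),
        csat_mean=("csat_score", "mean"),
        csat_responses=("csat_score", "count"),  # count() skips missing ratings
        reopened_sum=("reopened", "sum"),
    )
    .assign(reopen_rate=lambda d: d["reopened_sum"] / d["ticket_count"])
    .reset_index()
    .merge(agents, on="agent_id", how="left")
)

print(agent_metrics)
```

Resetting the index before the merge keeps `agent_id` as an ordinary column on both sides, which avoids mixing `left_index` with `right_on`.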

### 3. Data Integration and ETL

**PostgreSQL Integration with SQLAlchemy**
```python
# Load and save data to PostgreSQL
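from sqlalchemy import create_engine, text
from sqlalchemy.engine import URL
import pandas as pd

def create_db_connection(host, database, user, password, port=5432):
    """Create SQLAlchemy engine for PostgreSQL."""
    # URL.create (SQLAlchemy 1.4+) escapes special characters such as '@' or '/'
    # in credentials, which a hand-built f-string URL would not.
    url = URL.create(
        drivername="postgresql",
        username=user,
        password=password,
        host=host,
        port=port,
        database=database,
    )
    return create_engine(url)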

def load_tickets_from_db(engine, start_date, end_date):
    """
    Load ticket data from PostgreSQL with optimized query.

    Args:
        engine: SQLAlchemy engine
        start_date: Start date for filtering
        end_date: End date for filtering

    Returns:
        DataFrame with ticket data
    """
    query = text("""
        SELECT
            t.ticket_id,
            t.created_at,
            t.updated_at,
            t.resolved_at,
            t.first_response_at,
            t.priority,
            t.status,
            t.channel,
            t.category,
            t.agent_id,
            t.customer_id,
            t.subject,
            c.name as customer_name,
            c.tier as customer_tier,
            a.name as agent_name,
            a.team as agent_team
        FROM tickets t
        LEFT JOIN customers c ON t.customer_id = c.customer_id
        LEFT JOIN agents a ON t.agent_id = a.agent_id
        WHERE t.created_at >= :start_date
          AND t.created_at < :end_date
        ORDER BY t.created_at DESC
    """)

    # Load with proper data types
    df = pd.read_sql(
        query,
        engine,
        params={'start_date': start_date, 'end_date': end_date},
        parse_dates=['created_at', 'updated_at', 'resolved_at', 'first_response_at']
    )

    # Optimize data types
    df['priority'] = df['priority'].astype('category')
    df['status'] = df['status'].astype('category')
    df['channel'] = df['channel'].astype('category')
    df['customer_tier'] = df['customer_tier'].astype('category')

    return df

def save_metrics_to_db(df, table_name, engine, if_exists='replace'):
    """
    Save processed metrics to PostgreSQL.

    Args:
        df: DataFrame to save
        table_name: Target table name
        engine: SQLAlchemy engine
        if_exists: 'replace', 'appe
