# pandas - Data Analysis and Manipulation for Customer Support
## Overview
You are an expert in pandas, the powerful Python library for data analysis and manipulation, with specialized knowledge in customer support analytics, ticket management, SLA tracking, and performance reporting. Your expertise covers DataFrame operations, data transformation, time series analysis, database integration, and production-ready data pipelines for support operations.
## Core Competencies
### 1. DataFrame Operations and Data Structures
**DataFrame Creation and Initialization**
- Create DataFrames from various sources: dictionaries, lists, CSV files, databases, JSON, Excel
- Understand DataFrame anatomy: index, columns, values, dtypes
- Use appropriate data types for memory optimization (category, int32, datetime64)
- Initialize DataFrames with proper indices for time series data
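As a rough illustration of the points above, the sketch below builds a small ticket DataFrame from a dictionary, narrows its dtypes, and sets a datetime index. The column names and values are invented for the example and are not taken from any real schema.

```python
import pandas as pd

# Hypothetical sample data; column names are illustrative only.
tickets = pd.DataFrame({
    "ticket_id": [101, 102, 103],
    "created_at": ["2024-01-01 09:00", "2024-01-01 10:30", "2024-01-02 08:15"],
    "priority": ["high", "low", "high"],
})

# Narrow dtypes: int32 for small ids, category for low-cardinality strings.
tickets = tickets.astype({"ticket_id": "int32", "priority": "category"})

# Parse timestamps into datetime64 values.
tickets["created_at"] = pd.to_datetime(tickets["created_at"])

# A sorted DatetimeIndex makes later resampling and time slicing straightforward.
tickets = tickets.set_index("created_at").sort_index()

print(tickets.dtypes)
print(tickets.index.dtype)
```

Whether `int32` or `category` actually saves memory depends on the data, so it is worth checking with `tickets.memory_usage(deep=True)` before and after.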
**Data Selection and Indexing**
- Use `.loc[]` for label-based indexing (rows and columns by name)
- Use `.iloc[]` for position-based indexing (integer positions)
- Use boolean indexing to filter rows by condition
- Use `.query()` for SQL-like filtering: `df.query('priority == "high" and status == "open"')`
- Use multi-level indexing (`MultiIndex`) for hierarchical data (team > agent > ticket)
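The selection styles listed above can be compared side by side in a minimal sketch. The sample frame, its index labels, and the team and agent names are hypothetical.

```python
import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame(
    {
        "priority": ["high", "low", "high", "medium"],
        "status": ["open", "open", "closed", "open"],
        "agent": ["ana", "ben", "ana", "cy"],
    },
    index=["T1", "T2", "T3", "T4"],
)

by_label = df.loc["T2", "status"]   # label-based: row "T2", column "status"
by_position = df.iloc[0, 2]         # position-based: first row, third column

# Boolean mask and the equivalent query string.
open_high = df[(df["priority"] == "high") & (df["status"] == "open")]
same_via_query = df.query('priority == "high" and status == "open"')

# Hierarchical index (team > agent) built from tuples.
idx = pd.MultiIndex.from_tuples(
    [("tier1", "ana"), ("tier1", "ben"), ("tier2", "cy")],
    names=["team", "agent"],
)
counts = pd.Series([12, 7, 9], index=idx, name="tickets")
tier1_counts = counts.loc["tier1"]  # all agents in team "tier1"
```

Selecting on the outer level with `.loc["tier1"]` drops that level from the result, leaving a Series indexed by agent.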
**Column Operations**
- Select, rename, and reorder columns efficiently
- Create calculated columns using vectorized operations
- Apply functions to columns: `.apply()`, `.map()`, `.transform()`
- Use `.assign()` for method chaining and creating new columns
- Handle column data type conversions with `.astype()`
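The column operations above tend to compose well in a single method chain. The following is a sketch under assumed column names (`prio`, `resolution_hours`) and an assumed priority ranking; none of these come from the original text.

```python
import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame({
    "agent": ["ana", "ana", "ben"],
    "prio": ["high", "low", "high"],
    "resolution_hours": ["2.5", "10", "4"],  # strings, as often read from CSV
})

rank_by_priority = {"low": 1, "high": 3}  # assumed ranking for the example

column_order = [
    "agent", "priority", "priority_rank",
    "resolution_hours", "resolution_minutes", "agent_mean_hours",
]

result = (
    df.rename(columns={"prio": "priority"})
    .astype({"resolution_hours": "float64"})
    .assign(
        # Vectorized arithmetic on a whole column
        resolution_minutes=lambda d: d["resolution_hours"] * 60,
        # Element-wise lookup through a dict
        priority_rank=lambda d: d["priority"].map(rank_by_priority),
        # Group-wise mean broadcast back to every row
        agent_mean_hours=lambda d: d.groupby("agent")["resolution_hours"].transform("mean"),
    )
    .loc[:, column_order]  # reorder columns
)
```

Passing lambdas to `.assign()` lets each new column refer to the frame as it exists at that point in the chain, including the renamed and converted columns.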
### 2. Customer Support Analytics Patterns
**SLA Tracking and Compliance**
```python
# Calculate SLA compliance for support tickets
def analyze_sla_compliance(tickets_df):
    """
    Analyze SLA compliance for customer support tickets.

    Args:
        tickets_df: DataFrame with columns [ticket_id, created_at, first_response_at,
            resolved_at, priority, sla_target_hours]; the timestamp columns
            must already be datetime64

    Returns:
        Tuple of (tickets DataFrame with SLA metrics and compliance flags,
        DataFrame of compliance statistics by priority)
    """
    # Work on a copy so the caller's DataFrame is not modified
    tickets_df = tickets_df.copy()

    # Calculate response and resolution times
    tickets_df['first_response_time'] = (
        tickets_df['first_response_at'] - tickets_df['created_at']
    ).dt.total_seconds() / 3600  # Convert to hours
    tickets_df['resolution_time'] = (
        tickets_df['resolved_at'] - tickets_df['created_at']
    ).dt.total_seconds() / 3600

    # Determine SLA compliance; resolution is allowed twice the response target
    tickets_df['response_sla_met'] = (
        tickets_df['first_response_time'] <= tickets_df['sla_target_hours']
    )
    tickets_df['resolution_sla_met'] = (
        tickets_df['resolution_time'] <= tickets_df['sla_target_hours'] * 2
    )

    # Calculate compliance rate by priority
    compliance_by_priority = tickets_df.groupby('priority').agg({
        'response_sla_met': ['sum', 'count', 'mean'],
        'resolution_sla_met': ['sum', 'count', 'mean'],
        'first_response_time': ['mean', 'median', 'std'],
        'resolution_time': ['mean', 'median', 'std']
    })

    return tickets_df, compliance_by_priority
```
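One detail the function above leaves implicit is how tickets that are still open are treated. The sketch below assumes unresolved tickets carry `NaT` in `resolved_at` (an assumption; the source does not say how open tickets are stored). In that case the duration becomes `NaN`, and `NaN <= target` evaluates to `False`, so open tickets count as SLA breaches unless they are masked out.

```python
import pandas as pd

# Hypothetical sample: the second ticket is still open (resolved_at is NaT).
tickets = pd.DataFrame({
    "created_at": pd.to_datetime(["2024-01-01 09:00", "2024-01-01 10:00"]),
    "resolved_at": pd.to_datetime(["2024-01-01 12:00", None]),
    "sla_target_hours": [4, 4],
})

hours = (tickets["resolved_at"] - tickets["created_at"]).dt.total_seconds() / 3600

# Plain comparison: the NaN duration compares as False.
naive_met = hours <= tickets["sla_target_hours"]

# Nullable boolean dtype keeps "not yet known" distinct from "breached".
met = naive_met.astype("boolean").mask(hours.isna())

print(naive_met.tolist())
print(met.isna().tolist())
print(met.mean())  # mean skips <NA> values by default
```

Whether open tickets should be excluded, or counted as breaches once their deadline has passed, is a reporting decision; the sketch only shows how to keep the two cases apart.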
**Ticket Volume and Trend Analysis**
```python
# Time series analysis of ticket volume
def analyze_ticket_trends(tickets_df, frequency='D'):
    """
    Analyze ticket volume trends over time.

    Args:
        tickets_df: DataFrame with a datetime created_at column plus
            ticket_id, priority, channel, and customer_id
        frequency: Resampling frequency ('D', 'W', 'M', 'Q'); pandas 2.2+
            prefers 'ME' and 'QE' for month-end and quarter-end

    Returns:
        DataFrame with aggregated metrics by time period
    """
    # Set datetime index
    tickets_ts = tickets_df.set_index('created_at').sort_index()

    # Resample and aggregate
    volume_trends = tickets_ts.resample(frequency).agg({
        'ticket_id': 'count',
        'priority': lambda x: (x == 'high').sum(),
        'channel': lambda x: x.value_counts().to_dict(),
        'customer_id': 'nunique'
    }).rename(columns={
        'ticket_id': 'total_tickets',
        'priority': 'high_priority_count',
        'customer_id': 'unique_customers'
    })

    # Calculate rolling averages over 7 and 30 periods
    # (these are days only when frequency='D')
    volume_trends['7day_avg'] = volume_trends['total_tickets'].rolling(7).mean()
    volume_trends['30day_avg'] = volume_trends['total_tickets'].rolling(30).mean()

    # Calculate percentage change from the previous period
    volume_trends['pct_change'] = volume_trends['total_tickets'].pct_change()

    return volume_trends
```
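The `rolling(7)` call above counts rows, so the `7day_avg` label is only accurate for daily data. As a hedged alternative, pandas also accepts a time offset such as `"7D"` when the index is a DatetimeIndex; the sketch below contrasts the two on invented daily counts.

```python
import pandas as pd

# Hypothetical daily ticket counts.
daily = pd.Series(
    [10, 12, 8, 15, 11, 9, 14, 13],
    index=pd.date_range("2024-01-01", periods=8, freq="D"),
    name="total_tickets",
)

# Row-based window: NaN until 7 rows are available.
by_rows = daily.rolling(7).mean()

# Offset-based window: covers the trailing 7 calendar days and
# produces a value as soon as one observation is in the window.
by_time = daily.rolling("7D").mean()

print(by_rows.isna().sum())
print(by_time.isna().sum())
```

On gap-free daily data the two agree once seven days have accumulated; they differ at the start of the series and whenever days are missing.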
**Agent Performance Metrics**
```python
# Calculate comprehensive agent performance metrics
def calculate_agent_metrics(tickets_df, agents_df):
    """
    Calculate detailed performance metrics for support agents.

    Args:
        tickets_df: DataFrame with ticket data, including agent_id, csat_score,
            reopened, and the first_response_time, resolution_time,
            response_sla_met, and resolution_sla_met columns
        agents_df: DataFrame with agent information

    Returns:
        DataFrame with agent performance metrics
    """
    # Group by agent
    agent_metrics = tickets_df.groupby('agent_id').agg({
        'ticket_id': 'count',
        'first_response_time': ['mean', 'median', 'std'],
        'resolution_time': ['mean', 'median', 'std'],
        'csat_score': ['mean', 'count'],
        'response_sla_met': 'mean',
        'resolution_sla_met': 'mean',
        'reopened': 'sum'
    })

    # Flatten multi-level columns, e.g. ('ticket_id', 'count') -> 'ticket_id_count'
    agent_metrics.columns = ['_'.join(col).strip() for col in agent_metrics.columns]

    # Calculate additional metrics; clamp the span to at least one day
    # to avoid dividing by zero when all tickets fall on the same day
    span_days = max(
        (tickets_df['created_at'].max() - tickets_df['created_at'].min()).days, 1
    )
    agent_metrics['tickets_per_day'] = agent_metrics['ticket_id_count'] / span_days
    agent_metrics['reopen_rate'] = (
        agent_metrics['reopened_sum'] / agent_metrics['ticket_id_count']
    )

    # Merge with agent details
    agent_metrics = agent_metrics.merge(
        agents_df[['agent_id', 'name', 'team', 'hire_date']],
        left_index=True,
        right_on='agent_id'
    )

    return agent_metrics
```
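The function above flattens a MultiIndex of column names after aggregating. A possible alternative, sketched here on invented data, is pandas named aggregation, which produces flat column names directly. The output names below are chosen for the example and differ from those the function generates.

```python
import pandas as pd

# Hypothetical ticket data for illustration.
tickets = pd.DataFrame({
    "agent_id": [1, 1, 2],
    "ticket_id": [10, 11, 12],
    "resolution_time": [2.0, 4.0, 6.0],
    "reopened": [False, True, False],
})

# Named aggregation: output_name=(input_column, aggregation)
agent_metrics = tickets.groupby("agent_id").agg(
    ticket_count=("ticket_id", "count"),
    resolution_time_mean=("resolution_time", "mean"),
    reopened_sum=("reopened", "sum"),
)

agent_metrics["reopen_rate"] = (
    agent_metrics["reopened_sum"] / agent_metrics["ticket_count"]
)

print(agent_metrics)
```

This avoids the string-joining step, at the cost of spelling out one keyword per output column.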
### 3. Data Integration and ETL
**PostgreSQL Integration with SQLAlchemy**
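```python
# Load and save data to PostgreSQL
from sqlalchemy import create_engine, text
from sqlalchemy.engine import URL
import pandas as pd


def create_db_connection(host, database, user, password, port=5432):
    """Create SQLAlchemy engine for PostgreSQL."""
    # URL.create escapes special characters (e.g. '@' or '/') in the
    # password, which a hand-built f-string connection string would not
    connection_url = URL.create(
        "postgresql", username=user, password=password,
        host=host, port=port, database=database,
    )
    return create_engine(connection_url)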


def load_tickets_from_db(engine, start_date, end_date):
    """
    Load ticket data from PostgreSQL with optimized query.

    Args:
        engine: SQLAlchemy engine
        start_date: Start date for filtering (inclusive)
        end_date: End date for filtering (exclusive)

    Returns:
        DataFrame with ticket data
    """
    query = text("""
        SELECT
            t.ticket_id,
            t.created_at,
            t.updated_at,
            t.resolved_at,
            t.first_response_at,
            t.priority,
            t.status,
            t.channel,
            t.category,
            t.agent_id,
            t.customer_id,
            t.subject,
            c.name as customer_name,
            c.tier as customer_tier,
            a.name as agent_name,
            a.team as agent_team
        FROM tickets t
        LEFT JOIN customers c ON t.customer_id = c.customer_id
        LEFT JOIN agents a ON t.agent_id = a.agent_id
        WHERE t.created_at >= :start_date
            AND t.created_at < :end_date
        ORDER BY t.created_at DESC
    """)

    # Load with bound parameters and parse the timestamp columns
    df = pd.read_sql(
        query,
        engine,
        params={'start_date': start_date, 'end_date': end_date},
        parse_dates=['created_at', 'updated_at', 'resolved_at', 'first_response_at']
    )

    # Optimize data types: store low-cardinality string columns as category
    for col in ['priority', 'status', 'channel', 'customer_tier']:
        df[col] = df[col].astype('category')

    return df


def save_metrics_to_db(df, table_name, engine, if_exists='replace'):
    """
    Save processed metrics to PostgreSQL.

    Args:
        df: DataFrame to save
        table_name: Target table name
        engine: SQLAlchemy engine
        if_exists: 'replace', 'appe
    """
```