pandas-data-processing
Included with Lifetime
$97 forever
Pandas for time series analysis, OrcaFlex results processing, and marine engineering data workflows
programmingpandasdata-processingtime-seriescsvengineeringorcaflex
What this skill does
# Pandas Data Processing Skill
Master Pandas for time series analysis, OrcaFlex results processing, and configuration-driven data workflows in marine and offshore engineering.
## When to Use This Skill
Use Pandas data processing when you need:
- **Time series analysis** - Wave elevation, vessel motions, mooring tensions
- **OrcaFlex results** - Load simulation results, process RAOs, analyze dynamics
- **Multi-format data** - CSV, Excel, HDF5, Parquet for large datasets
- **Statistical analysis** - Summary statistics, rolling windows, resampling
- **Data transformation** - Pivot, melt, merge, group operations
- **Engineering reports** - Automated data extraction and summary generation
**Avoid when:**
- Real-time streaming data (use Polars or streaming libraries)
- Extremely large datasets (>100GB) - use Dask, Vaex, or PySpark
- Pure numerical computation (use NumPy directly)
- Graph/network data (use NetworkX)
## Core Capabilities
### 1. Time Series Analysis
**Load and Process Time Series:**
```python
import pandas as pd
import numpy as np
from pathlib import Path
def load_orcaflex_time_series(
csv_file: Path,
time_column: str = 'Time',
parse_dates: bool = True
) -> pd.DataFrame:
"""
Load OrcaFlex time series results from CSV.
Args:
csv_file: Path to CSV file
time_column: Name of time column
parse_dates: Whether to parse time column as datetime
Returns:
DataFrame with time as index
"""
# Load CSV
df = pd.read_csv(csv_file)
# Set time as index
if parse_dates:
df[time_column] = pd.to_datetime(df[time_column], unit='s')
df.set_index(time_column, inplace=True)
return df
# Usage
results = load_orcaflex_time_series(
Path('data/processed/vessel_motions.csv')
)
print(f"Time range: {results.index[0]} to {results.index[-1]}")
print(f"Duration: {(results.index[-1] - results.index[0]).total_seconds()} seconds")
print(f"Sampling rate: {1 / results.index.to_series().diff().mean().total_seconds():.2f} Hz")
```
**Resampling and Aggregation:**
```python
def resample_time_series(
df: pd.DataFrame,
target_frequency: str = '1S',
method: str = 'mean'
) -> pd.DataFrame:
"""
Resample time series to target frequency.
Args:
df: Input DataFrame with datetime index
target_frequency: Target frequency ('1S', '0.1S', '1min', etc.)
method: Aggregation method ('mean', 'max', 'min', 'std')
Returns:
Resampled DataFrame
"""
# Resample
if method == 'mean':
resampled = df.resample(target_frequency).mean()
elif method == 'max':
resampled = df.resample(target_frequency).max()
elif method == 'min':
resampled = df.resample(target_frequency).min()
elif method == 'std':
resampled = df.resample(target_frequency).std()
else:
raise ValueError(f"Unknown method: {method}")
# Fill NaN values (forward fill)
resampled.fillna(method='ffill', inplace=True)
return resampled
# Example: Downsample from 0.05s to 1s
high_freq_data = load_orcaflex_time_series(
Path('data/processed/mooring_tension_0.05s.csv')
)
low_freq_data = resample_time_series(
high_freq_data,
target_frequency='1S',
method='mean'
)
print(f"Original points: {len(high_freq_data)}")
print(f"Resampled points: {len(low_freq_data)}")
```
**Rolling Statistics:**
```python
def calculate_rolling_statistics(
df: pd.DataFrame,
column: str,
window: str = '60S'
) -> pd.DataFrame:
"""
Calculate rolling statistics for time series.
Args:
df: Input DataFrame with datetime index
column: Column name to analyze
window: Rolling window size (time-based)
Returns:
DataFrame with rolling statistics
"""
stats = pd.DataFrame(index=df.index)
# Rolling calculations
rolling = df[column].rolling(window=window)
stats[f'{column}_mean'] = rolling.mean()
stats[f'{column}_std'] = rolling.std()
stats[f'{column}_max'] = rolling.max()
stats[f'{column}_min'] = rolling.min()
return stats
# Example: 60-second rolling statistics
tension_stats = calculate_rolling_statistics(
results,
column='Tension_Line1',
window='60S'
)
# Plot rolling mean and standard deviation
import plotly.graph_objects as go
fig = go.Figure()
fig.add_trace(go.Scatter(
x=results.index,
y=results['Tension_Line1'],
name='Raw Tension',
opacity=0.3
))
fig.add_trace(go.Scatter(
x=tension_stats.index,
y=tension_stats['Tension_Line1_mean'],
name='60s Rolling Mean',
line=dict(width=3)
))
fig.update_layout(
title='Mooring Tension: Raw vs Rolling Mean',
xaxis_title='Time',
yaxis_title='Tension (kN)'
)
fig.write_html('reports/tension_rolling_mean.html')
```
### 2. Statistical Analysis
**Summary Statistics:**
```python
def generate_statistical_summary(
df: pd.DataFrame,
columns: list = None
) -> pd.DataFrame:
"""
Generate comprehensive statistical summary.
Args:
df: Input DataFrame
columns: Columns to analyze (None = all numeric)
Returns:
DataFrame with statistical metrics
"""
if columns is None:
columns = df.select_dtypes(include=[np.number]).columns.tolist()
# Standard statistics
summary = df[columns].describe()
# Additional statistics
additional_stats = pd.DataFrame({
'median': df[columns].median(),
'skewness': df[columns].skew(),
'kurtosis': df[columns].kurtosis(),
'variance': df[columns].var()
}).T
# Combine
full_summary = pd.concat([summary, additional_stats])
return full_summary
# Example
motion_stats = generate_statistical_summary(
results,
columns=['Surge', 'Sway', 'Heave', 'Roll', 'Pitch', 'Yaw']
)
print(motion_stats)
# Export to CSV
motion_stats.to_csv('reports/motion_statistics.csv')
```
**Extreme Value Analysis:**
```python
def extract_extreme_values(
df: pd.DataFrame,
column: str,
n_extremes: int = 10,
extreme_type: str = 'max'
) -> pd.DataFrame:
"""
Extract extreme values (max or min) from time series.
Args:
df: Input DataFrame with datetime index
column: Column to analyze
n_extremes: Number of extreme values to extract
extreme_type: 'max' or 'min'
Returns:
DataFrame with extreme events
"""
if extreme_type == 'max':
extremes = df.nlargest(n_extremes, column)
elif extreme_type == 'min':
extremes = df.nsmallest(n_extremes, column)
else:
raise ValueError("extreme_type must be 'max' or 'min'")
# Sort by time
extremes = extremes.sort_index()
return extremes
# Example: Top 10 maximum tensions
max_tensions = extract_extreme_values(
results,
column='Tension_Line1',
n_extremes=10,
extreme_type='max'
)
print("Top 10 Maximum Tensions:")
print(max_tensions[['Tension_Line1']])
```
### 3. Data Transformation
**Pivot Operations:**
```python
def pivot_mooring_data(
df: pd.DataFrame,
index: str = 'Time',
columns: str = 'LineID',
values: str = 'Tension'
) -> pd.DataFrame:
"""
Pivot long-format mooring data to wide format.
Args:
df: Input DataFrame in long format
index: Index column (usually time)
columns: Column to pivot (usually line identifier)
values: Value column (tension, angle, etc.)
Returns:
Pivoted DataFrame
"""
pivoted = df.pivot(
index=index,
columns=columns,
values=values
)
# Rename columns
pivoted.columns = [f'{values}_Line{col}' for col in pivoted.columns]
return pivoted
# Example: Convert long format to wide format
# Long format:
# Time LineID Tension
# 0.0 1 1500
# 0.0 2 1520
# 0.1 1 1505
# 0.1 2 1525
long_format = pd.DataFrame({
'Time': [0.0, 0.0, 0.1, 0.1, 0.2, 0.2],
'LineID': [1, 2, 1, 2, 1, 2],
'TensionRelated in programming
numpy-numerical-analysis
IncludedNumPy for matrix operations, FFT, linear algebra, and numerical computations in marine engineering
programming
python-scientific-computing
IncludedPython for engineering analysis, numerical computing, and scientific workflows using NumPy, SciPy, SymPy
programming
yaml-configuration
IncludedYAML for configuration-driven engineering workflows, model setup, and analysis parameters
programming