Claude
Skills
Sign in
Back

statistics-fundamentals

Included with Lifetime
$97 forever

Apply statistical methods to financial data including descriptive statistics, covariance estimation, regression, hypothesis testing, and resampling. Use when the user asks about return distributions, correlation between assets, building a covariance matrix, running a CAPM regression, testing whether alpha is significant, checking if returns are normal, or estimating confidence intervals. Also trigger when users mention 'volatility', 'how correlated are these', 'fat tails', 'skewness', 'R-squared', 'beta of a fund', 'bootstrap a Sharpe ratio', 'shrinkage estimator', 'Ledoit-Wolf', or ask why their optimizer produces unstable weights.

Generalscripts

What this skill does


# Statistics Fundamentals

## Purpose
This skill enables Claude to apply core statistical methods to financial data, including descriptive statistics, covariance estimation, linear regression, hypothesis testing, and resampling techniques. These methods form the quantitative backbone for portfolio construction, risk measurement, and factor modeling.

## Layer
0 — Mathematical Foundations

## Direction
both

## When to Use
- Analyzing return distributions
- Estimating correlations or covariance matrices
- Running regression analysis on financial data
- Testing hypotheses about returns
- Building factor models

## Core Concepts

### Descriptive Statistics

**Mean (Expected Value):**

$$\mu = E[X] = \frac{1}{n} \sum_{i=1}^{n} x_i$$

The arithmetic average of observed values. For financial returns, this represents the central tendency of the return distribution.

**Variance:**

Population variance:

$$\sigma^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \mu)^2$$

Sample variance (Bessel's correction):

$$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$$

**Standard Deviation:**

$$\sigma = \sqrt{\sigma^2}$$

In finance, standard deviation of returns is commonly called **volatility**. Annualized volatility from monthly data: `sigma_annual = sigma_monthly * sqrt(12)`.

**Skewness:**

$$\gamma = \frac{E[(X - \mu)^3]}{\sigma^3}$$

Measures asymmetry of the distribution. Negative skewness (left tail) is common in equity returns and indicates a higher probability of large losses than large gains.

**Excess Kurtosis:**

$$\kappa = \frac{E[(X - \mu)^4]}{\sigma^4} - 3$$

Measures tail thickness relative to the normal distribution (which has excess kurtosis of 0). Financial returns typically exhibit positive excess kurtosis (leptokurtosis), meaning fat tails and more frequent extreme events than a normal distribution would predict.

### Covariance and Correlation

**Covariance:**

$$\text{Cov}(X, Y) = E[(X - \mu_X)(Y - \mu_Y)]$$

Sample covariance:

$$\hat{\text{Cov}}(X, Y) = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$$

Covariance measures the linear co-movement between two variables. Positive covariance means they tend to move together; negative means they move inversely.

**Correlation (Pearson):**

$$\rho(X, Y) = \frac{\text{Cov}(X, Y)}{\sigma_X \times \sigma_Y}$$

Correlation normalizes covariance to the range `[-1, +1]`, making it unit-free and comparable across variable pairs.

### Covariance Matrix Estimation

For a set of `p` assets with `n` return observations, the sample covariance matrix is:

$$\hat{\Sigma} = \frac{1}{n-1} (X - \bar{X})^T (X - \bar{X})$$

where `X` is the `n x p` matrix of returns.

**The curse of dimensionality:** When `p` (number of assets) is large relative to `n` (number of observations), the sample covariance matrix becomes poorly conditioned or singular, leading to unstable portfolio optimizations.

### Ledoit-Wolf Shrinkage Estimator

Shrinkage blends the sample covariance matrix with a structured target (e.g., the identity matrix scaled by average variance) to produce a more stable estimate:

$$\hat{\Sigma}_{shrunk} = \delta \cdot F + (1 - \delta) \cdot \hat{\Sigma}$$

where:
- `F` = the shrinkage target (structured estimator)
- `delta` = the optimal shrinkage intensity (estimated analytically)
- `Sigma_hat` = the sample covariance matrix

Ledoit-Wolf determines the optimal `delta` that minimizes expected squared Frobenius distance to the true covariance matrix. This produces better-conditioned matrices and more stable portfolio weights.

### OLS Regression

Ordinary Least Squares estimates the linear relationship `y = X * beta + epsilon` by minimizing the sum of squared residuals.

**Coefficient Estimate:**

$$\hat{\beta} = (X^T X)^{-1} X^T y$$

**Key Regression Diagnostics:**

**R-squared (Coefficient of Determination):**

$$R^2 = 1 - \frac{SS_{res}}{SS_{tot}} = 1 - \frac{\sum(y_i - \hat{y}_i)^2}{\sum(y_i - \bar{y})^2}$$

Represents the proportion of variance in the dependent variable explained by the model.

**Adjusted R-squared:**

$$\bar{R}^2 = 1 - (1 - R^2) \frac{n - 1}{n - k - 1}$$

where `k` = number of regressors. Penalizes additional regressors that do not improve fit.

**Standard Errors:**

$$SE(\hat{\beta}) = \sqrt{\hat{\sigma}^2 \cdot \text{diag}((X^T X)^{-1})}$$

where `sigma_hat^2 = SS_res / (n - k - 1)`.

**t-statistic:**

$$t = \frac{\hat{\beta}_j}{SE(\hat{\beta}_j)}$$

Tests whether each coefficient is significantly different from zero.

In finance, the single-factor regression `R_i - R_f = alpha + beta * (R_m - R_f) + epsilon` is the CAPM regression, where `alpha` is the risk-adjusted excess return and `beta` is market sensitivity.

### Common Distributions in Finance

**Normal Distribution:** Symmetric, fully characterized by mean and variance. Used as a baseline model for returns, though real returns deviate from normality.

**Log-Normal Distribution:** If `ln(X)` is normal, then `X` is log-normal. Asset prices (not returns) are often modeled as log-normal, ensuring prices cannot go negative.

**Student-t Distribution:** Has heavier tails than the normal. Parameterized by degrees of freedom `nu`; lower `nu` means fatter tails. Commonly used to model financial returns more realistically. As `nu -> infinity`, converges to the normal.

**Chi-Squared Distribution:** The distribution of a sum of squared standard normal variables. Used in variance tests and as the sampling distribution of `(n-1)*s^2 / sigma^2`.

### Bootstrap Methods

Non-parametric resampling technique for estimating the sampling distribution of a statistic.

**Algorithm:**
1. From the original dataset of size `n`, draw `B` bootstrap samples, each of size `n`, with replacement.
2. Compute the statistic of interest on each bootstrap sample.
3. Use the distribution of the `B` bootstrap statistics to estimate confidence intervals, standard errors, or bias.

**Confidence Interval (Percentile Method):**
The `(1 - alpha)` confidence interval is given by the `alpha/2` and `1 - alpha/2` percentiles of the bootstrap distribution.

Bootstrap is especially useful in finance when:
- Analytical formulas for standard errors are unavailable (e.g., Sharpe ratio)
- The underlying distribution is unknown or non-normal
- Small sample sizes make asymptotic results unreliable

### Hypothesis Testing

**t-test (mean):** Tests whether a sample mean differs significantly from a hypothesized value.

$$t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$$

with `n - 1` degrees of freedom.

**F-test (joint significance):** Tests whether a group of regression coefficients are jointly zero. Used in multi-factor models.

$$F = \frac{(SS_{restricted} - SS_{unrestricted}) / q}{SS_{unrestricted} / (n - k - 1)}$$

where `q` = number of restrictions.

**Jarque-Bera Test (normality):** Tests whether sample skewness and kurtosis are consistent with a normal distribution.

$$JB = \frac{n}{6} \left(\gamma^2 + \frac{\kappa^2}{4}\right)$$

where `gamma` = sample skewness and `kappa` = sample excess kurtosis. Under the null of normality, JB follows a chi-squared distribution with 2 degrees of freedom. Financial return series almost always reject normality due to fat tails and skewness.

## Key Formulas

| Formula | Expression | Use Case |
|---------|-----------|----------|
| Sample Mean | `x_bar = (1/n) * sum(x_i)` | Central tendency |
| Sample Variance | `s^2 = (1/(n-1)) * sum((x_i - x_bar)^2)` | Dispersion |
| Annualized Volatility | `sigma_annual = sigma_period * sqrt(periods_per_year)` | Risk standardization |
| Skewness | `gamma = E[(X-mu)^3] / sigma^3` | Asymmetry |
| Excess Kurtosis | `kappa = E[(X-mu)^4] / sigma^4 - 3` | Tail thickness |
| Covariance | `Cov(X,Y) = E[(X-mu_X)(Y-mu_Y)]` | Co-movement |
| Correlation | `rho = Cov(X,Y) / (sigma_X * sigma_Y)` | Standardized co-movement |
| Shrinkage Estimator | `Sigma_shrunk = delta*F + (1-delta)*Sigma_hat` | Stable covariance matrix |
| OLS Coefficients | `beta_hat = (X'X)^(-1) X'y` | Linear regression |
| R-squared | `1 - SS_

Related in General