statistics-fundamentals

Apply statistical methods to financial data including descriptive statistics, covariance estimation, regression, hypothesis testing, and resampling. Use when the user asks about return distributions, correlation between assets, building a covariance matrix, running a CAPM regression, testing whether alpha is significant, checking if returns are normal, or estimating confidence intervals. Also trigger when users mention 'volatility', 'how correlated are these', 'fat tails', 'skewness', 'R-squared', 'beta of a fund', 'bootstrap a Sharpe ratio', 'shrinkage estimator', 'Ledoit-Wolf', or ask why their optimizer produces unstable weights.

# Statistics Fundamentals

## Purpose
This skill enables Claude to apply core statistical methods to financial data, including descriptive statistics, covariance estimation, linear regression, hypothesis testing, and resampling techniques. These methods form the quantitative backbone for portfolio construction, risk measurement, and factor modeling.

## Layer
0 — Mathematical Foundations

## Direction
both

## When to Use
- Analyzing return distributions
- Estimating correlations or covariance matrices
- Running regression analysis on financial data
- Testing hypotheses about returns
- Building factor models

## Core Concepts

### Descriptive Statistics

**Mean (Expected Value):**

$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$$

The arithmetic average of observed values, which estimates the population mean `mu = E[X]`. For financial returns, this represents the central tendency of the return distribution.

**Variance:**

Population variance:

$$\sigma^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \mu)^2$$

Sample variance (Bessel's correction):

$$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$$

**Standard Deviation:**

$$\sigma = \sqrt{\sigma^2}$$

In finance, standard deviation of returns is commonly called **volatility**. Annualized volatility from monthly data: `sigma_annual = sigma_monthly * sqrt(12)`. This square-root-of-time scaling assumes returns are serially uncorrelated.
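
As a minimal sketch of the formulas above, the following uses only Python's standard library. The `monthly_returns` values are hypothetical and chosen purely for illustration.

```python
import math
import statistics

# Hypothetical monthly returns, chosen only to illustrate the formulas.
monthly_returns = [0.02, -0.01, 0.03, 0.01, -0.02, 0.04]

mean_return = statistics.fmean(monthly_returns)          # (1/n) * sum(x_i)
pop_variance = statistics.pvariance(monthly_returns)     # divides by n
sample_variance = statistics.variance(monthly_returns)   # divides by n - 1 (Bessel)
monthly_vol = math.sqrt(sample_variance)

# Square-root-of-time scaling; assumes returns are uncorrelated across months.
annual_vol = monthly_vol * math.sqrt(12)

print(f"mean={mean_return:.4f}  s^2={sample_variance:.6f}  annual vol={annual_vol:.4f}")
```

The sample variance is always at least as large as the population variance on the same data, because it divides by `n - 1` rather than `n`.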

**Skewness:**

$$\gamma = \frac{E[(X - \mu)^3]}{\sigma^3}$$

Measures asymmetry of the distribution. Negative skewness (a longer left tail) is common in equity returns: large losses occur more often than large gains of the same size.

**Excess Kurtosis:**

$$\kappa = \frac{E[(X - \mu)^4]}{\sigma^4} - 3$$

Measures tail thickness relative to the normal distribution (which has excess kurtosis of 0). Financial returns typically exhibit positive excess kurtosis (leptokurtosis), meaning fat tails and more frequent extreme events than a normal distribution would predict.
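
The two moment ratios can be sketched directly from their definitions. This illustration uses population moments (dividing by `n`), which is one common convention; the function names and the `returns` data are hypothetical.

```python
import statistics

def skewness(xs):
    """Moment-based skewness: E[(X - mu)^3] / sigma^3, using population moments."""
    mu = statistics.fmean(xs)
    sigma = statistics.pstdev(xs)
    return statistics.fmean([(x - mu) ** 3 for x in xs]) / sigma ** 3

def excess_kurtosis(xs):
    """Moment-based excess kurtosis: E[(X - mu)^4] / sigma^4 - 3."""
    mu = statistics.fmean(xs)
    sigma = statistics.pstdev(xs)
    return statistics.fmean([(x - mu) ** 4 for x in xs]) / sigma ** 4 - 3.0

# Hypothetical returns with one large loss, to mimic a crash month.
returns = [0.01, 0.02, 0.015, 0.005, -0.12, 0.01, 0.02, 0.0, 0.015, 0.01]

print(f"skewness={skewness(returns):.2f}  excess kurtosis={excess_kurtosis(returns):.2f}")
```

A single large loss is enough to push skewness negative and excess kurtosis positive in a short sample, which is why both statistics are sensitive to outliers.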

### Covariance and Correlation

**Covariance:**

$$\text{Cov}(X, Y) = E[(X - \mu_X)(Y - \mu_Y)]$$

Sample covariance:

$$\hat{\text{Cov}}(X, Y) = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$$

Covariance measures the linear co-movement between two variables. Positive covariance means they tend to move together; negative means they move inversely.

**Correlation (Pearson):**

$$\rho(X, Y) = \frac{\text{Cov}(X, Y)}{\sigma_X \times \sigma_Y}$$

Correlation normalizes covariance to the range `[-1, +1]`, making it unit-free and comparable across variable pairs.
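
A short sketch of the sample covariance and Pearson correlation formulas, written out by hand so each term is visible. The function names and the two return series are hypothetical.

```python
import math

def sample_cov(xs, ys):
    """Sample covariance with the n - 1 denominator."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

def pearson_corr(xs, ys):
    """Covariance divided by the product of the two standard deviations."""
    return sample_cov(xs, ys) / math.sqrt(sample_cov(xs, xs) * sample_cov(ys, ys))

# Hypothetical return series for two assets.
asset_a = [0.01, 0.03, -0.02, 0.02, 0.00]
asset_b = [0.02, 0.05, -0.03, 0.03, 0.01]

cov_ab = sample_cov(asset_a, asset_b)
corr_ab = pearson_corr(asset_a, asset_b)

# Rescaling returns (e.g. to percent) changes covariance but not correlation.
asset_a_pct = [100 * x for x in asset_a]
asset_b_pct = [100 * y for y in asset_b]

print(f"cov={cov_ab:.6f}  corr={corr_ab:.4f}")
print(f"cov (pct)={sample_cov(asset_a_pct, asset_b_pct):.2f}  "
      f"corr (pct)={pearson_corr(asset_a_pct, asset_b_pct):.4f}")
```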

### Covariance Matrix Estimation

For a set of `p` assets with `n` return observations, the sample covariance matrix is:

$$\hat{\Sigma} = \frac{1}{n-1} (X - \bar{X})^T (X - \bar{X})$$

where `X` is the `n x p` matrix of returns.

**The curse of dimensionality:** When `p` (number of assets) is large relative to `n` (number of observations), the sample covariance matrix becomes poorly conditioned. When `p >= n` it is singular, because its rank is at most `n - 1`. Either case leads to unstable portfolio optimizations.
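
The rank problem can be seen in a small pure-Python sketch. With three assets and only three observations, the sample covariance matrix has a zero determinant; adding two more observations makes it invertible. The helper names and the return data are hypothetical.

```python
def sample_cov_matrix(X):
    """X is a list of n rows (observations), each holding p asset returns."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    centered = [[row[j] - means[j] for j in range(p)] for row in X]
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(p)]
            for i in range(p)]

def det3(m):
    """Determinant of a 3 x 3 matrix by cofactor expansion."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

# Hypothetical returns: n = 3 observations of p = 3 assets (rank at most 2).
X_short = [[0.01, 0.02, -0.01],
           [0.03, -0.01, 0.02],
           [-0.02, 0.01, 0.00]]
# Two more hypothetical observations give n = 5 > p.
X_long = X_short + [[0.00, 0.03, 0.01], [0.02, 0.00, -0.02]]

cov_short = sample_cov_matrix(X_short)
cov_long = sample_cov_matrix(X_long)

print(f"det with n=3: {det3(cov_short):.3e}")  # zero up to rounding error
print(f"det with n=5: {det3(cov_long):.3e}")
```

An optimizer that inverts the `n = 3` matrix is dividing by (numerically) zero, which is one source of the unstable weights described above.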

### Ledoit-Wolf Shrinkage Estimator

Shrinkage blends the sample covariance matrix with a structured target (e.g., the identity matrix scaled by average variance) to produce a more stable estimate:

$$\hat{\Sigma}_{shrunk} = \delta \cdot F + (1 - \delta) \cdot \hat{\Sigma}$$

where:
- `F` = the shrinkage target (structured estimator)
- `delta` = the optimal shrinkage intensity (estimated analytically)
- `Sigma_hat` = the sample covariance matrix

Ledoit-Wolf determines the optimal `delta` that minimizes expected squared Frobenius distance to the true covariance matrix. This produces better-conditioned matrices and more stable portfolio weights.
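
The blending step can be sketched as follows. Note the assumptions: `delta` is fixed by hand here rather than estimated with the Ledoit-Wolf analytical formula, the target `F` is the identity scaled by average variance, and the 2 x 2 `sigma_hat` is hypothetical. The function names are illustrative only.

```python
import math

def shrink_covariance(sample_cov, delta):
    """Return delta * F + (1 - delta) * sample_cov, with F = (average variance) * I."""
    p = len(sample_cov)
    avg_var = sum(sample_cov[i][i] for i in range(p)) / p
    return [[delta * (avg_var if i == j else 0.0) + (1 - delta) * sample_cov[i][j]
             for j in range(p)] for i in range(p)]

def condition_number_2x2(m):
    """Ratio of largest to smallest eigenvalue of a symmetric 2 x 2 matrix."""
    half_trace = (m[0][0] + m[1][1]) / 2
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    disc = math.sqrt(half_trace ** 2 - det)
    return (half_trace + disc) / (half_trace - disc)

# Hypothetical near-singular sample covariance (correlation of 0.95).
sigma_hat = [[0.040, 0.038],
             [0.038, 0.040]]
delta = 0.3  # assumed intensity, NOT the Ledoit-Wolf optimal value

sigma_shrunk = shrink_covariance(sigma_hat, delta)
cond_before = condition_number_2x2(sigma_hat)
cond_after = condition_number_2x2(sigma_shrunk)

print(f"condition number before={cond_before:.1f}  after={cond_after:.1f}")
```

Shrinking toward a scaled identity leaves the average variance unchanged and pulls the off-diagonal entries toward zero, which is what lowers the condition number.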

### OLS Regression

Ordinary Least Squares estimates the linear relationship `y = X * beta + epsilon` by minimizing the sum of squared residuals.

**Coefficient Estimate:**

$$\hat{\beta} = (X^T X)^{-1} X^T y$$

**Key Regression Diagnostics:**

**R-squared (Coefficient of Determination):**

$$R^2 = 1 - \frac{SS_{res}}{SS_{tot}} = 1 - \frac{\sum(y_i - \hat{y}_i)^2}{\sum(y_i - \bar{y})^2}$$

Represents the proportion of variance in the dependent variable explained by the model.

**Adjusted R-squared:**

$$\bar{R}^2 = 1 - (1 - R^2) \frac{n - 1}{n - k - 1}$$

where `k` = number of regressors. Penalizes additional regressors that do not improve fit.

**Standard Errors:**

$$SE(\hat{\beta}) = \sqrt{\hat{\sigma}^2 \cdot \text{diag}((X^T X)^{-1})}$$

where `sigma_hat^2 = SS_res / (n - k - 1)`.

**t-statistic:**

$$t = \frac{\hat{\beta}_j}{SE(\hat{\beta}_j)}$$

Tests whether each coefficient is significantly different from zero.

In finance, the single-factor regression `R_i - R_f = alpha + beta * (R_m - R_f) + epsilon` is the CAPM regression, where `alpha` is the risk-adjusted excess return and `beta` is market sensitivity.
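
For the single-factor case, the matrix formulas above reduce to closed-form sums. The sketch below is one way to write that out in pure Python; the function name `capm_regression` and both excess-return series are hypothetical.

```python
import math

def capm_regression(asset_excess, market_excess):
    """OLS of asset excess returns on market excess returns, with an intercept."""
    n = len(asset_excess)
    mean_x = sum(market_excess) / n
    mean_y = sum(asset_excess) / n
    sxx = sum((x - mean_x) ** 2 for x in market_excess)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(market_excess, asset_excess))
    beta = sxy / sxx
    alpha = mean_y - beta * mean_x
    ss_res = sum((y - alpha - beta * x) ** 2
                 for x, y in zip(market_excess, asset_excess))
    ss_tot = sum((y - mean_y) ** 2 for y in asset_excess)
    sigma2 = ss_res / (n - 2)  # n - k - 1 with k = 1 regressor
    se_beta = math.sqrt(sigma2 / sxx)
    se_alpha = math.sqrt(sigma2 * (1 / n + mean_x ** 2 / sxx))
    return {
        "alpha": alpha, "beta": beta,
        "r_squared": 1 - ss_res / ss_tot,
        "t_alpha": alpha / se_alpha, "t_beta": beta / se_beta,
    }

# Hypothetical monthly excess returns for the market and one fund.
market_excess = [-0.04, -0.03, -0.02, -0.01, 0.01, 0.02, 0.03, 0.04]
fund_excess = [-0.045, -0.037, -0.025, -0.009, 0.015, 0.023, 0.035, 0.051]

fit = capm_regression(fund_excess, market_excess)
print({name: round(value, 4) for name, value in fit.items()})
```

In this made-up sample the beta t-statistic is large while the alpha t-statistic is small, so the fund's market exposure is well identified but its alpha would not be judged significant.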

### Common Distributions in Finance

**Normal Distribution:** Symmetric, fully characterized by mean and variance. Used as a baseline model for returns, though real returns deviate from normality.

**Log-Normal Distribution:** If `ln(X)` is normal, then `X` is log-normal. Asset prices (not returns) are often modeled as log-normal, ensuring prices cannot go negative.

**Student-t Distribution:** Has heavier tails than the normal. Parameterized by degrees of freedom `nu`; lower `nu` means fatter tails. Commonly used to model financial returns more realistically. As `nu -> infinity`, converges to the normal.

**Chi-Squared Distribution:** The distribution of a sum of squared independent standard normal variables. Used in variance tests and as the sampling distribution of `(n-1)*s^2 / sigma^2` when the data are normally distributed.

### Bootstrap Methods

Non-parametric resampling technique for estimating the sampling distribution of a statistic.

**Algorithm:**
1. From the original dataset of size `n`, draw `B` bootstrap samples, each of size `n`, with replacement.
2. Compute the statistic of interest on each bootstrap sample.
3. Use the distribution of the `B` bootstrap statistics to estimate confidence intervals, standard errors, or bias.

**Confidence Interval (Percentile Method):**
The `(1 - alpha)` confidence interval is given by the `alpha/2` and `1 - alpha/2` percentiles of the bootstrap distribution.

Bootstrap is especially useful in finance when:
- Analytical formulas for standard errors are unavailable or rest on restrictive assumptions (e.g., the Sharpe ratio when returns are non-normal)
- The underlying distribution is unknown or non-normal
- Small sample sizes make asymptotic results unreliable
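
The algorithm and percentile interval above can be sketched for a Sharpe ratio as follows. The assumptions are worth stating: returns are simulated rather than real, the risk-free rate is taken as zero, observations are resampled independently (which ignores any serial dependence), and the function names are hypothetical.

```python
import random
import statistics

def sharpe_ratio(returns):
    """Per-period Sharpe ratio, assuming a zero risk-free rate."""
    return statistics.fmean(returns) / statistics.stdev(returns)

def bootstrap_percentile_ci(data, statistic, n_boot=2000, alpha=0.05, seed=0):
    """Percentile-method confidence interval from n_boot resamples with replacement."""
    rng = random.Random(seed)
    n = len(data)
    boot_stats = sorted(statistic(rng.choices(data, k=n)) for _ in range(n_boot))
    # Nearest-rank approximation to the alpha/2 and 1 - alpha/2 percentiles.
    lower = boot_stats[round((alpha / 2) * (n_boot - 1))]
    upper = boot_stats[round((1 - alpha / 2) * (n_boot - 1))]
    return lower, upper

# Simulated monthly returns (hypothetical: mean 1%, volatility 4%).
data_rng = random.Random(42)
monthly_returns = [data_rng.gauss(0.01, 0.04) for _ in range(60)]

point_estimate = sharpe_ratio(monthly_returns)
lower, upper = bootstrap_percentile_ci(monthly_returns, sharpe_ratio)

print(f"Sharpe={point_estimate:.3f}  95% CI=({lower:.3f}, {upper:.3f})")
```

Fixing `seed` makes the interval reproducible; with five years of monthly data the interval is typically wide, which is a useful reminder of how noisy Sharpe ratio estimates are.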

### Hypothesis Testing

**t-test (mean):** Tests whether a sample mean differs significantly from a hypothesized value.

$$t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$$

with `n - 1` degrees of freedom.
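
A minimal sketch of the statistic, using hypothetical monthly returns and a null hypothesis of zero mean return. The standard library has no Student-t distribution, so the result is compared with a tabulated critical value instead of computing a p-value.

```python
import math
import statistics

def t_statistic(sample, mu0=0.0):
    """Return the one-sample t-statistic and its degrees of freedom."""
    n = len(sample)
    standard_error = statistics.stdev(sample) / math.sqrt(n)
    return (statistics.fmean(sample) - mu0) / standard_error, n - 1

# Hypothetical monthly returns.
monthly_returns = [0.02, -0.01, 0.03, 0.01, -0.02, 0.04]

t_value, dof = t_statistic(monthly_returns, mu0=0.0)

# Two-sided 5% critical value for 5 degrees of freedom, from a t-table.
critical_value = 2.571
reject_null = abs(t_value) > critical_value

print(f"t={t_value:.3f}  dof={dof}  reject H0 at 5%: {reject_null}")
```

Here the mean return is positive, but six observations are too few to distinguish it from zero.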

**F-test (joint significance):** Tests whether a group of regression coefficients are jointly zero. Used in multi-factor models.

$$F = \frac{(SS_{restricted} - SS_{unrestricted}) / q}{SS_{unrestricted} / (n - k - 1)}$$

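where `q` = number of restrictions, `SS_restricted` and `SS_unrestricted` are the residual sums of squares of the restricted and unrestricted models, and `k` = number of regressors in the unrestricted model.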
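
The arithmetic is short enough to sketch directly. All inputs below are hypothetical: a three-factor model fitted to 60 observations, tested against a restricted model that drops two of the factors.

```python
def f_statistic(ss_restricted, ss_unrestricted, q, n, k):
    """F-statistic for q restrictions, from the two residual sums of squares."""
    numerator = (ss_restricted - ss_unrestricted) / q
    denominator = ss_unrestricted / (n - k - 1)
    return numerator / denominator

# Hypothetical residual sums of squares for the two fitted models.
f_value = f_statistic(ss_restricted=0.050, ss_unrestricted=0.040, q=2, n=60, k=3)

print(f"F={f_value:.2f} with (2, 56) degrees of freedom")
```

The statistic is then compared with an F distribution with `q` and `n - k - 1` degrees of freedom; a large value means the dropped factors explain a meaningful share of the residual variance.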

**Jarque-Bera Test (normality):** Tests whether sample skewness and kurtosis are consistent with a normal distribution.

$$JB = \frac{n}{6} \left(\gamma^2 + \frac{\kappa^2}{4}\right)$$

where `gamma` = sample skewness and `kappa` = sample excess kurtosis. Under the null of normality, JB asymptotically follows a chi-squared distribution with 2 degrees of freedom, so the approximation is weakest in small samples. Financial return series almost always reject normality due to fat tails and skewness.
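
A sketch of the test in pure Python. It relies on the fact that a chi-squared distribution with 2 degrees of freedom has survival function `exp(-x / 2)`, so the asymptotic p-value needs no statistics library. The function name and return data are hypothetical, and moments are computed with the `n` denominator.

```python
import math
import statistics

def jarque_bera(xs):
    """Return the JB statistic and its asymptotic chi-squared(2) p-value."""
    n = len(xs)
    mu = statistics.fmean(xs)
    sigma = statistics.pstdev(xs)
    skew = statistics.fmean([(x - mu) ** 3 for x in xs]) / sigma ** 3
    excess_kurt = statistics.fmean([(x - mu) ** 4 for x in xs]) / sigma ** 4 - 3.0
    jb = (n / 6) * (skew ** 2 + excess_kurt ** 2 / 4)
    p_value = math.exp(-jb / 2)  # survival function of chi-squared with 2 dof
    return jb, p_value

# Hypothetical returns: a quiet series repeated four times, plus one crash.
returns = [0.01, 0.02, 0.015, 0.005, 0.0, -0.005, 0.01, 0.02, -0.01, 0.015] * 4
returns.append(-0.15)

jb_stat, p_value = jarque_bera(returns)
print(f"JB={jb_stat:.1f}  p-value={p_value:.2e}")
```

A small p-value rejects normality; in this made-up series the single crash observation drives both the skewness and kurtosis terms.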

## Key Formulas

| Formula | Expression | Use Case |
|---------|-----------|----------|
| Sample Mean | `x_bar = (1/n) * sum(x_i)` | Central tendency |
| Sample Variance | `s^2 = (1/(n-1)) * sum((x_i - x_bar)^2)` | Dispersion |
| Annualized Volatility | `sigma_annual = sigma_period * sqrt(periods_per_year)` | Risk standardization |
| Skewness | `gamma = E[(X-mu)^3] / sigma^3` | Asymmetry |
| Excess Kurtosis | `kappa = E[(X-mu)^4] / sigma^4 - 3` | Tail thickness |
| Covariance | `Cov(X,Y) = E[(X-mu_X)(Y-mu_Y)]` | Co-movement |
| Correlation | `rho = Cov(X,Y) / (sigma_X * sigma_Y)` | Standardized co-movement |
| Shrinkage Estimator | `Sigma_shrunk = delta*F + (1-delta)*Sigma_hat` | Stable covariance matrix |
| OLS Coefficients | `beta_hat = (X'X)^(-1) X'y` | Linear regression |
| R-squared | `1 - SS_

Files: 2

Size: 28.8 KB

Source: https://github.com/joellewis/finance_skills/tree/main/plugins/core/skills/statistics-fundamentals
