Principal Component Analysis (PCA)
PCA finds the orthogonal directions that explain the most variance in a dataset. In finance it powers yield-curve decomposition (level/slope/curvature), statistical factor extraction, dimensionality reduction, and covariance de-noising.
Functions
| Function | Description |
|---|---|
| `standardize(data)` | Z-score columns to mean 0, unit variance (for correlation-matrix PCA) |
| `pca(data, n_components=None)` | Eigendecomposition → components, eigenvalues, variance ratios, scores |
| `reconstruct(scores, components, mean)` | Rebuild data from a (truncated) component set |
| `cumulative_variance(ratios)` | Running cumulative variance explained (scree / elbow analysis) |
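
The module's source is not shown on this page, so the following is only a minimal NumPy sketch of how four functions with these signatures could behave. The `_sketch` suffixes, the dictionary keys other than `explained_variance_ratio` and `scores`, and the one-component-per-row layout of `components` are assumptions made for illustration, not the module's documented API.

```python
import numpy as np

def standardize_sketch(data):
    """Z-score each column: subtract its mean, divide by its sample std."""
    data = np.asarray(data, dtype=float)
    return (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)

def pca_sketch(data, n_components=None):
    """PCA via eigendecomposition of the sample covariance matrix."""
    data = np.asarray(data, dtype=float)
    mean = data.mean(axis=0)
    centered = data - mean
    cov = np.cov(centered, rowvar=False)               # (p, p), symmetric
    eigenvalues, eigenvectors = np.linalg.eigh(cov)    # ascending eigenvalues
    order = np.argsort(eigenvalues)[::-1]              # largest variance first
    eigenvalues = eigenvalues[order]
    eigenvectors = eigenvectors[:, order]
    ratios = eigenvalues / eigenvalues.sum()
    k = data.shape[1] if n_components is None else n_components
    components = eigenvectors[:, :k].T                 # (k, p), assumed layout
    return {
        "components": components,
        "eigenvalues": eigenvalues[:k],
        "explained_variance_ratio": ratios[:k],
        "scores": centered @ components.T,             # (n, k) projections
        "mean": mean,
    }

def reconstruct_sketch(scores, components, mean):
    """Map scores back to the original space and add the mean back."""
    return scores @ components + mean

def cumulative_variance_sketch(ratios):
    """Running total of the variance ratios."""
    return np.cumsum(ratios)

# Round trip on random data: keeping every component reproduces the input.
rng = np.random.default_rng(0)
data = rng.normal(size=(200, 5))
full = pca_sketch(data)
rebuilt = reconstruct_sketch(full["scores"], full["components"], full["mean"])
print(np.allclose(rebuilt, data))
```

Passing a smaller `n_components` to the sketch truncates the scores and components, so the same `reconstruct_sketch` call then returns a low-rank approximation rather than the exact input.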
Key Concepts
- PCA = eigendecomposition of the covariance matrix. Eigenvectors are the component directions (loadings); eigenvalues are the variance each explains.
- Yield curves: the first three PCs are usually interpretable as level (parallel shift), slope (steepening/flattening), and curvature (bowing); together they typically explain upwards of 95% of the variation in yield changes, and often close to 99%.
- Variance ratio: `eigenvalue / total variance` tells you how much each component matters; use the cumulative sum to pick how many to keep.
- Low-rank reconstruction: keep the top-k components and discard the rest to de-noise a covariance matrix or compress correlated returns.
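
The concepts above can be sketched on synthetic data. The example below simulates curve changes from three hypothetical loading shapes (level, slope, curvature) plus small noise, then computes variance ratios, picks the number of components from the cumulative sum, and builds a low-rank covariance matrix. The shapes, factor volatilities, noise level, and the 99% threshold are all assumptions chosen for illustration; no real market data is used.

```python
import numpy as np

rng = np.random.default_rng(42)
n_obs, n_tenors = 1000, 8

# Hypothetical loading shapes, chosen for illustration only.
x = np.linspace(-1.0, 1.0, n_tenors)
shapes = np.vstack([
    np.ones(n_tenors),        # level: every tenor moves together
    x,                        # slope: short and long ends move oppositely
    x**2 - np.mean(x**2),     # curvature: wings move against the belly
])

# Simulated daily changes: three factors plus small idiosyncratic noise.
factors = rng.normal(0.0, [5.0, 3.0, 2.0], size=(n_obs, 3))
changes = factors @ shapes + rng.normal(0.0, 0.1, size=(n_obs, n_tenors))

# PCA = eigendecomposition of the covariance matrix.
cov = np.cov(changes, rowvar=False)
eigenvalues, eigenvectors = np.linalg.eigh(cov)          # ascending order
eigenvalues, eigenvectors = eigenvalues[::-1], eigenvectors[:, ::-1]

# Variance ratio and its cumulative sum.
ratios = eigenvalues / eigenvalues.sum()
cumulative = np.cumsum(ratios)

# Smallest k whose cumulative variance reaches the threshold.
threshold = 0.99
k = int(np.searchsorted(cumulative, threshold)) + 1

# Low-rank covariance: keep the top-k eigenpairs, discard the rest.
top_vecs = eigenvectors[:, :k]
cov_low_rank = top_vecs @ np.diag(eigenvalues[:k]) @ top_vecs.T
rel_error = np.linalg.norm(cov - cov_low_rank) / np.linalg.norm(cov)

print("variance ratios:", np.round(ratios[:4], 4))
print("components kept:", k)
print("relative covariance error:", rel_error)
```

With these settings, three components are needed to pass the 99% threshold, and the discarded eigenvalues correspond almost entirely to the injected noise. A full de-noising procedure would usually also restore the diagonal of the covariance matrix, which this sketch omits.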
Example
```python
import numpy as np
from pca import pca, cumulative_variance

# 500 observations of 8 simulated return series (i.i.d. noise, 1% volatility)
returns = np.random.default_rng(0).normal(0, 0.01, (500, 8))
result = pca(returns, n_components=3)

print(result["explained_variance_ratio"])                       # variance per PC
print(cumulative_variance(result["explained_variance_ratio"]))  # running total
print(result["scores"].shape)                                   # (500, 3) factor time series
```
Practical Notes
- Use `np.linalg.eigh` (symmetric solver), not `eig`: it is faster and returns real, orthonormal eigenvectors.
- Standardise first (`standardize`) when variables have different units, so you analyse the correlation matrix rather than the covariance matrix.
- Eigenvector signs are arbitrary; interpret loadings by their relative signs and magnitudes, not absolute sign.
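
The three notes above can be checked with plain NumPy. The sketch below uses made-up data on very different scales and one possible sign convention (flip each eigenvector so its largest-magnitude loading is positive). That convention is an assumption for illustration; the page does not say whether the module normalises signs at all.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500

# Made-up data: two correlated variables on very different scales, plus noise.
base = rng.normal(size=(n, 1))
data = np.hstack([
    1000.0 * (base + 0.5 * rng.normal(size=(n, 1))),   # large-unit variable
    0.01 * (base + 0.5 * rng.normal(size=(n, 1))),     # small-unit variable
    rng.normal(size=(n, 1)),                           # independent variable
])

# Without standardising, the large-unit column dominates the first component.
raw_vals = np.linalg.eigh(np.cov(data, rowvar=False))[0]
raw_top_ratio = raw_vals[-1] / raw_vals.sum()

# After z-scoring, the covariance matrix equals the correlation matrix.
z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
corr = np.cov(z, rowvar=False)

# eigh exploits symmetry: real eigenvalues, orthonormal eigenvectors.
eigenvalues, eigenvectors = np.linalg.eigh(corr)
is_orthonormal = np.allclose(eigenvectors.T @ eigenvectors, np.eye(3))

# Assumed sign convention: largest-magnitude loading of each vector is positive.
cols = np.arange(eigenvectors.shape[1])
largest = np.argmax(np.abs(eigenvectors), axis=0)
signs = np.sign(eigenvectors[largest, cols])
eigenvectors_fixed = eigenvectors * signs

print("top variance ratio, raw covariance:", raw_top_ratio)
print("matches np.corrcoef:", np.allclose(corr, np.corrcoef(data, rowvar=False)))
print("orthonormal eigenvectors:", is_orthonormal)
```

Flipping a sign does not change what a component explains: each column of `eigenvectors_fixed` is still an eigenvector of `corr` with the same eigenvalue, so a fixed convention only makes loadings comparable across runs.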