The source code for these lectures is available in the GitHub repository. The schedule (with slides and homework) from Spring 2023 is available here, and from Fall 2022 here.

Fall 2023

Week 1: Probability

Overview of probability and statistics in data science - randomness, uncertainty, estimation, and prediction. Probability and expectation, conditional probabilities, and random variables.

Week 2: The modeler’s toolbox

Simulation, random variables, properties of and relationships between some common probability distributions; computing means, variances, and expectations. Stochastic gradient descent.

Week 3: Simulation, moments, and overdispersion

How to pick “realistic” simulation parameters. Central limit theorem. Method-of-moments fitting; minimum-variance estimators. Outliers and overdispersion: scale mixtures, goodness-of-fit.
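As a rough illustration of method-of-moments fitting (not taken from the course materials), the sketch below simulates Gamma data and recovers the parameters by matching the sample mean and variance. The Gamma family, the function name `gamma_method_of_moments`, and the parameter values are assumptions chosen for the example.

```python
import random
import statistics


def gamma_method_of_moments(xs):
    """Return (shape, scale) whose Gamma mean and variance match the sample.

    For a Gamma(shape, scale): mean = shape * scale, variance = shape * scale**2,
    so scale = variance / mean and shape = mean**2 / variance.
    """
    m = statistics.fmean(xs)
    v = statistics.variance(xs)  # sample variance (n - 1 denominator)
    scale = v / m
    shape = m * m / v
    return shape, scale


# Hypothetical "true" parameters used only to generate test data.
true_shape, true_scale = 3.0, 2.0
rng = random.Random(1)
sample = [rng.gammavariate(true_shape, true_scale) for _ in range(20000)]

shape_hat, scale_hat = gamma_method_of_moments(sample)
print(f"shape ~ {shape_hat:.2f}, scale ~ {scale_hat:.2f}")
```

Simulating from known parameters and checking that the fit recovers them is also one way to sanity-check whether simulation parameters are "realistic".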

Week 4: Model choice, categorical prediction, and likelihood

Likelihood, p-values, hypothesis testing, power and false positives, false discovery rates.
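One standard way to control the false discovery rate is the Benjamini-Hochberg step-up procedure; whether the lectures use this particular procedure is an assumption. The sketch below is a minimal version, with the function name `benjamini_hochberg` and the example p-values made up for illustration.

```python
def benjamini_hochberg(pvalues, q=0.05):
    """Return a list of booleans: True where the hypothesis is rejected.

    Sort the p-values, find the largest rank k with p_(k) <= q * k / m,
    and reject the k hypotheses with the smallest p-values.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= q * rank / m:
            k = rank
    rejected = [False] * m
    for i in order[:k]:
        rejected[i] = True
    return rejected


# Made-up p-values for illustration only.
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
decisions = benjamini_hochberg(pvals, q=0.05)
print(sum(decisions), "of", len(pvals), "hypotheses rejected")
```

Note that four of these p-values fall below 0.05 individually but are not rejected once the multiple-testing correction is applied.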

Week 5: Quantifying uncertainty

Calibration of estimates of uncertainty; asymptotics versus simulation. Review.

Week 6: Multivariate data and latent structure

The multivariate Gaussian distribution, autocorrelation, modeling correlated data, random walks. Principal components analysis.

Week 7: Linear models

Introduction to linear models, and some history of modern statistics. Robust models, loss functions and likelihood.

Week 8: Generalized linear models

Response distributions, nonlinear relationships, transformations.

Week 9: Problems with linear models

Too many variables, not enough linearity: regularization and diagnostics.

  • Slides: Transformations and diagnostics ipynb html
  • Slides: Regularization and crossvalidation ipynb html

Week 10: Prediction and inference revisited

The bootstrap; identifiability, ill-posed inference, non-convex optimization.
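For a concrete picture of the bootstrap, here is a minimal sketch of a percentile bootstrap confidence interval for a sample mean. The function name `bootstrap_ci`, the number of resamples, and the simulated data are assumptions for illustration, not the course's own code.

```python
import random
import statistics


def bootstrap_ci(xs, stat=statistics.fmean, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap interval for stat(xs).

    Resample xs with replacement n_boot times, recompute the statistic on
    each resample, and take the central `level` fraction of the results.
    """
    rng = random.Random(seed)
    n = len(xs)
    reps = sorted(stat(rng.choices(xs, k=n)) for _ in range(n_boot))
    lo_idx = int((1 - level) / 2 * n_boot)
    hi_idx = int((1 + level) / 2 * n_boot) - 1
    return reps[lo_idx], reps[hi_idx]


# Simulated data with made-up parameters (mean 10, standard deviation 2).
data_rng = random.Random(42)
data = [data_rng.gauss(10, 2) for _ in range(200)]

lo, hi = bootstrap_ci(data)
print(f"95% bootstrap interval for the mean: ({lo:.2f}, {hi:.2f})")
```

The percentile interval is the simplest variant; other bootstrap intervals exist and can behave better for skewed statistics.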