The source code for these lectures is available at the GitHub repository. The schedule (with slides and homework) from Fall 2022 is available here.

Spring 2023

Week 1 (4/3): Probability

Overview of probability and statistics in data science - randomness, uncertainty, estimation, and prediction. Probability and expectation, conditional probabilities, and random variables.

Week 2 (4/10): The modeler’s toolbox

Simulation, random variables, properties of and relationships between some common probability distributions; computing means, variances, and expectations. Stochastic gradient descent.

Week 3 (4/17): Simulation, moments, and overdispersion

How to pick “realistic” simulation parameters. Central limit theorem. Method-of-moments fitting; minimum-variance estimators. Outliers and overdispersion: scale mixtures, goodness-of-fit.
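As a rough illustration of method-of-moments fitting, the sketch below matches the sample mean and variance to those of a gamma distribution. The gamma family, the function name `gamma_method_of_moments`, and the simulation parameters are all assumptions chosen for the example, not taken from the course materials.

```python
import random
import statistics


def gamma_method_of_moments(xs):
    """Estimate gamma (shape, scale) by matching the first two moments.

    For a gamma distribution, mean = shape * scale and
    variance = shape * scale**2, so solving for the parameters gives
    shape = mean**2 / variance and scale = variance / mean.
    """
    mean = statistics.fmean(xs)
    var = statistics.variance(xs)  # sample variance (n - 1 denominator)
    shape = mean**2 / var
    scale = var / mean
    return shape, scale


# Simulate data with known (hypothetical) parameters, then check the fit.
rng = random.Random(123)
sample = [rng.gammavariate(3.0, 2.0) for _ in range(10_000)]
shape_hat, scale_hat = gamma_method_of_moments(sample)
print(f"shape estimate: {shape_hat:.2f}, scale estimate: {scale_hat:.2f}")
```

Simulating from known parameters and then refitting, as done here, is one simple way to check that an estimator behaves sensibly before applying it to real data.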

Week 4 (4/24): Model choice, categorical prediction, and likelihood

Likelihood, p-values, hypothesis testing, power and false positives, false discovery rates.

Week 5 (5/1): Quantifying uncertainty

Calibration of estimates of uncertainty; asymptotics versus simulation. Review.

Week 6 (5/8): Multivariate data and latent structure

The multivariate Gaussian distribution, autocorrelation, modeling correlated data, random walks. Principal components analysis.

Week 7 (5/15): Linear models

Introduction to linear models, and some history of modern statistics. Robust models, loss functions and likelihood.

Week 8 (5/22): Generalized linear models

Response distributions, nonlinear relationships, transformations.

Week 9 (5/29): Problems with linear models

Too many variables, not enough linearity: regularization and diagnostics.

  • No class Monday, 5/29 (Memorial Day)
  • Slides: Transformations and diagnostics ipynb html
  • Slides: Regularization and crossvalidation ipynb html
  • Homework (due 6/7): ipynb html

Week 10 (6/5): Prediction and inference revisited

The bootstrap; identifiability, ill-posed inference, non-convex optimization.
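For readers unfamiliar with the bootstrap, here is a minimal sketch of a percentile bootstrap confidence interval for a sample mean. The function name `bootstrap_percentile_ci`, its defaults, and the simulated exponential data are assumptions made for illustration; the quantiles are taken as simple order statistics rather than interpolated.

```python
import random
import statistics


def bootstrap_percentile_ci(xs, stat=statistics.fmean, n_boot=2000,
                            alpha=0.05, seed=0):
    """Percentile bootstrap interval for stat(xs).

    Resamples xs with replacement n_boot times, recomputes the statistic
    on each resample, and returns approximate alpha/2 and 1 - alpha/2
    quantiles of the resulting values.
    """
    rng = random.Random(seed)
    n = len(xs)
    reps = sorted(stat(rng.choices(xs, k=n)) for _ in range(n_boot))
    lo = reps[int((alpha / 2) * n_boot)]
    hi = reps[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


# Hypothetical data: 200 draws from an exponential distribution with mean 2.
data_rng = random.Random(1)
data = [data_rng.expovariate(0.5) for _ in range(200)]
lo, hi = bootstrap_percentile_ci(data)
print(f"sample mean: {statistics.fmean(data):.2f}")
print(f"approximate 95% interval: ({lo:.2f}, {hi:.2f})")
```

Passing a different `stat` (for example `statistics.median`) reuses the same resampling procedure for another estimator.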