This is an old version of the schedule! See the current (or most recent) class schedule instead.

The source code for these lectures is available in the GitHub repository.

Fall 2022

Week 1 (9/28): Probability

Overview of probability and statistics in data science: randomness, uncertainty, estimation, and prediction. Probability and expectation, conditional probabilities, and random variables.

Week 2 (10/3): The modeler’s toolbox

Simulation, random variables, properties of and relationships between some common probability distributions; computing means, variances, and expectations. Stochastic gradient descent.

Week 3 (10/10): Simulation, moments, and overdispersion

How to pick “realistic” simulation parameters. Central limit theorem. Method-of-moments fitting; minimum-variance estimators. Outliers and overdispersion: scale mixtures, goodness-of-fit.
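As a rough illustration of method-of-moments fitting (not taken from the course materials), the sketch below simulates gamma-distributed data and recovers the parameters by matching the sample mean and variance. The choice of the gamma distribution, the function name `fit_gamma_moments`, and the parameter values are all assumptions made for this example.

```python
import random
import statistics

def fit_gamma_moments(xs):
    """Method-of-moments estimates (shape, scale) for a gamma distribution.

    Hypothetical helper for illustration only.
    """
    m = statistics.fmean(xs)
    v = statistics.variance(xs)  # sample variance (n - 1 denominator)
    # For a gamma: mean = shape * scale and variance = shape * scale**2,
    # so shape = mean**2 / variance and scale = variance / mean.
    return m * m / v, v / m

# Simulate from known (assumed) parameters, then check they are recovered.
rng = random.Random(1)
true_shape, true_scale = 3.0, 2.0
sample = [rng.gammavariate(true_shape, true_scale) for _ in range(20000)]

shape_hat, scale_hat = fit_gamma_moments(sample)
print(f"shape ~ {shape_hat:.2f}, scale ~ {scale_hat:.2f}")
```

Simulating from known parameters first is one way to check that an estimator behaves sensibly before applying it to real data.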

Week 4 (10/17): Model choice, categorical prediction, and likelihood

Likelihood, p-values, hypothesis testing, power and false positives, false discovery rates.

Week 5 (10/24): Quantifying uncertainty

Calibration of estimates of uncertainty; asymptotics versus simulation. Review.

Week 6 (10/31): Multivariate data and latent structure

The multivariate Gaussian distribution, autocorrelation, modeling correlated data, random walks. Principal components analysis.

Week 7 (11/7): Linear models

Introduction to linear models, and some history of modern statistics. Robust models, loss functions and likelihood.

Week 8 (11/14): Generalized linear models

Response distributions, nonlinear relationships, transformations.

Week 9 (11/21): Problems with linear models

Too many variables, not enough linearity: regularization and diagnostics.

  • Slides: Regularization and cross-validation (ipynb, html)
  • Slides: Transformations and diagnostics (ipynb, html)
  • Homework (due 12/1): (ipynb, html)

Week 10 (11/28): Prediction and inference revisited

The bootstrap; identifiability, ill-posed inference, and non-convex optimization.
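For readers unfamiliar with the bootstrap, here is a minimal sketch of a percentile bootstrap confidence interval for a sample median. It is not drawn from the course materials: the function name `bootstrap_ci`, the exponential test data, and the number of resamples are assumptions chosen for illustration.

```python
import random
import statistics

def bootstrap_ci(xs, stat, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap interval for stat(xs).

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    n = len(xs)
    # Resample the data with replacement and recompute the statistic each time.
    reps = sorted(stat(rng.choices(xs, k=n)) for _ in range(n_boot))
    lo_idx = int((1 - level) / 2 * n_boot)
    hi_idx = int((1 + level) / 2 * n_boot) - 1
    return reps[lo_idx], reps[hi_idx]

# Assumed example data: 200 draws from an exponential distribution.
data_rng = random.Random(42)
data = [data_rng.expovariate(1.0) for _ in range(200)]

lo, hi = bootstrap_ci(data, statistics.median)
print(f"sample median {statistics.median(data):.3f}, 95% interval ({lo:.3f}, {hi:.3f})")
```

The percentile interval is only one of several bootstrap interval constructions; it is used here because it is the shortest to write down.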