course schedule
This is an old version of the schedule! Here is the current (or most recent) class schedule.
The source code for these lectures is available at the GitHub repository.
Fall 2022
- Week 1 (9/28): Probability
- Overview of probability and statistics in data science: randomness, uncertainty, estimation, and prediction. Probability and expectation, conditional probabilities, and random variables.
- Slides: Introduction, and probability ipynb html
- Reading: Adhikari & Pitman, chapters 1, 2, & 3
- alternative reading: Wasserman, chapters 1 and 2.1-2.4
- Homework (due 10/6): ipynb html
- Week 2 (10/3): The modeler’s toolbox
- Simulation, random variables, properties of and relationships between some common probability distributions; computing means, variances, and expectations. Stochastic gradient descent.
- Slides: Random variables ipynb html
- Slides: Stochastic gradient descent ipynb html
- Reading: Adhikari & Pitman, chapters 4, 6.1-6.3, 6.5, 8, 15.1-15.4
- alternative reading: Wasserman, chapters 2 & 3
- Homework (due 10/13): ipynb html
- Week 3 (10/10): Simulation, moments, and overdispersion
- How to pick “realistic” simulation parameters. Central limit theorem. Method-of-moments fitting; minimum-variance estimators. Outliers and overdispersion: scale mixtures, goodness-of-fit.
- Week 4 (10/17): Model choice, categorical prediction, and likelihood
- Likelihood, p-values, hypothesis testing, power and false positives, false discovery rates.
- Slides: Likelihood ipynb html
- Slides: P-values, and hypotheses ipynb html
- Reading: Adhikari & Pitman, chapter 20
- alternative reading: Wasserman, chapter 9
- Homework (due 10/27): ipynb html
- Week 5 (10/24): Quantifying uncertainty
- Calibration of estimates of uncertainty; asymptotics versus simulation. Review.
- Slides: Review ipynb html
- In-class exercise: Confidence intervals and uncertainty ipynb html
- Reading: Adhikari & Pitman, chapter 14
- alternative reading: Wasserman, chapters 8 & 11
- Homework (due 11/3): ipynb html
- Week 6 (10/31): Multivariate data and latent structure
- The multivariate Gaussian distribution, autocorrelation, modeling correlated data, random walks. Principal components analysis.
- Slides: Correlation and covariance ipynb html
- Slides: Principal components analysis ipynb html
- Reading: Adhikari & Pitman, chapters 17.1-17.3 and 23
- alternative reading: Wasserman, chapter 14
- Homework (due 11/10): ipynb html
- Week 7 (11/7): Linear models
- Introduction to linear models, and some history of modern statistics. Robust models, loss functions and likelihood.
- Week 8 (11/14): Generalized linear models
- Response distributions, nonlinear relationships, transformations.
- Week 9 (11/21): Problems with linear models
- Too many variables, not enough linearity: regularization and diagnostics.
- Week 10 (11/28): Prediction and inference revisited
- The bootstrap; identifiability, ill-posed inference, non-convex optimization.
- Slides: Uncertainty and the bootstrap ipynb html
- Slides: Interpolation and ill-posedness ipynb html
- Slides: Review ipynb html
- Reading: Adhikari, DeNero & Wagner, chapter 13
- alternative reading: Wasserman, chapter 8
- Final (due 12/8): ipynb html