course schedule
The source code for these lectures is available at the GitHub repository. The schedule (with slides and homework) from Fall 2022 is available here.
Spring 2023
- Week 1 (4/3): Probability
- Overview of probability and statistics in data science: randomness, uncertainty, estimation, and prediction. Probability and expectation, conditional probabilities, and random variables.
- Slides: Introduction and probability ipynb html
- Slides: Random variables ipynb html
- Reading: Adhikari & Pitman, chapters 1, 2, & 3
- alternative reading: Wasserman, chapters 1 and 2.1-2.4
- Short Homework (due 4/6): ipynb html
- Homework (due 4/12): ipynb html
- Week 2 (4/10): The modeler’s toolbox
- Simulation, random variables, properties of and relationships between some common probability distributions; computing means, variances, and expectations. Stochastic gradient descent.
- Slides: Stochastic gradient descent ipynb html
- Slides: Poisson counts ipynb html
- Reading: Adhikari & Pitman, chapters 4, 6.1-6.3, 6.5, 8, 15.1-15.4
- alternative reading: Wasserman, chapters 2 & 3
- Homework (due 4/19): ipynb html
- Week 3 (4/17): Simulation, moments, and overdispersion
- How to pick “realistic” simulation parameters. Central limit theorem. Method-of-moments fitting; minimum-variance estimators. Outliers and overdispersion: scale mixtures, goodness-of-fit.
- Week 4 (4/24): Model choice, categorical prediction, and likelihood
- Likelihood, p-values, hypothesis testing, power and false positives, false discovery rates.
- Slides: Likelihood ipynb html
- Reading: Adhikari & Pitman, chapter 20
- alternative reading: Wasserman, chapter 9
- Homework (due 5/3): ipynb html
- Week 5 (5/1): Quantifying uncertainty
- Calibration of estimates of uncertainty; asymptotics versus simulation. Review.
- Slides: P-values and hypotheses ipynb html
- In-class exercise: Confidence intervals and uncertainty ipynb html
- Slides: Power and false positives ipynb html
- Reading: Adhikari & Pitman, chapter 14
- alternative reading: Wasserman, chapters 8 & 11
- Homework (due 5/10): ipynb html
- Week 6 (5/8): Multivariate data and latent structure
- The multivariate Gaussian distribution, autocorrelation, modeling correlated data, random walks. Principal components analysis.
- Slides: Correlation and covariance ipynb html
- Reading: Adhikari & Pitman, chapter 17.1-17.3 and chapter 23
- alternative reading: Wasserman, chapter 14
- Homework (due 5/17): ipynb html
- Week 7 (5/15): Linear models
- Introduction to linear models, and some history of modern statistics. Robust models, loss functions and likelihood.
- Week 8 (5/22): Generalized linear models
- Response distributions, nonlinear relationships, transformations.
- Week 9 (5/29): Problems with linear models
- Too many variables, not enough linearity: regularization and diagnostics.
- Week 10 (6/5): Prediction and inference revisited
- The bootstrap; identifiability, ill-posed inference, non-convex optimization.
- Slides: Uncertainty and the bootstrap ipynb html
- Slides: Interpolation and ill-posedness ipynb html
- Slides: Review ipynb html
- Reading: Adhikari, DeNero & Wagner, chapter 13
- alternative reading: Wasserman, chapter 8
- Final (due 6/15): ipynb html