course schedule
The source code for these lectures is available at the GitHub repository. The schedule (with slides and homeworks) from Fall 2023 is available here, from Spring 2023 here, and from Fall 2022 here.
Spring 2024
- Week 1: Probability
- Overview of probability and statistics in data science: randomness, uncertainty, estimation, and prediction. Probability and expectation, conditional probabilities, and random variables.
- Slides: Introduction, and probability ipynb html
- Slides: Random variables ipynb html
- Reading: Adhikari & Pitman, chapters 1, 2, & 3
- alternative reading: Wasserman, chapter 1 and sections 2.1-2.4
- Week 2: The modeler’s toolbox
- Simulation, random variables, properties of and relationships between some common probability distributions; computing means, variances, and expectations. Stochastic gradient descent (a short sketch follows this week's materials).
- Slides: Stochastic gradient descent ipynb html
- Slides: Poisson counts ipynb html
- Reading: Adhikari & Pitman, chapters 4, 6.1-6.3, 6.5, 8, 15.1-15.4.
- alternative reading: Wasserman, chapters 2 & 3
- Homework: ipynb html
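As a taste of the stochastic gradient descent topic, here is a minimal numpy sketch that fits a line by SGD on simulated data; the data-generating values, step size, and epoch count are illustrative choices, not taken from the lecture notebooks.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data: y = 2*x + 1 + noise (values are illustrative).
n = 1_000
x = rng.uniform(-1, 1, size=n)
y = 2.0 * x + 1.0 + rng.normal(scale=0.5, size=n)

# Plain SGD on the squared-error loss, one observation at a time.
a, b = 0.0, 0.0
step = 0.1
for epoch in range(20):
    for i in rng.permutation(n):
        resid = (a * x[i] + b) - y[i]
        a -= step * resid * x[i]   # gradient of the single-point loss w.r.t. a
        b -= step * resid          # gradient w.r.t. b
print(a, b)  # should end up near the true slope 2 and intercept 1
```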
- Week 3: Simulation, moments, and overdispersion
- How to pick “realistic” simulation parameters. The central limit theorem. Method-of-moments fitting; minimum-variance estimators. Outliers and overdispersion: scale mixtures and goodness-of-fit. (A short method-of-moments sketch follows below.)
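A minimal sketch of method-of-moments fitting for overdispersed counts, using a negative binomial model; the simulated data and parameter values are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated overdispersed counts from a negative binomial (illustrative parameters).
counts = rng.negative_binomial(n=5, p=0.4, size=2_000)

# Method of moments: match the sample mean and variance to the model's
# mean n(1-p)/p and variance n(1-p)/p**2.
m = counts.mean()
v = counts.var(ddof=1)
p_hat = m / v                    # only sensible when v > m (overdispersion)
n_hat = m * p_hat / (1 - p_hat)
print(n_hat, p_hat)              # should be near the simulating values 5 and 0.4
```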
- Week 4: Model choice, categorical prediction, and likelihood
- Likelihood, p-values, hypothesis testing, power and false positives, false discovery rates. (A short maximum-likelihood sketch follows this week's materials.)
- Slides: Likelihood ipynb html
- Reading: Adhikari & Pitman, chapter 20
- alternative reading: Wasserman, chapter 9
- Homework: ipynb html
- Worksheet: ipynb
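A small illustration of likelihood as a function of a parameter: the log-likelihood of a Poisson rate on simulated counts, maximized over a grid; the true rate, sample size, and grid are illustrative choices, not from the course materials.

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulated Poisson counts with an illustrative true rate of 3.
counts = rng.poisson(lam=3.0, size=500)

def log_lik(lam):
    # Log-likelihood of iid Poisson counts, dropping the log(k!) term,
    # which does not depend on lam.
    return np.sum(counts * np.log(lam) - lam)

grid = np.linspace(0.5, 6.0, 551)
mle = grid[np.argmax([log_lik(lam) for lam in grid])]
print(mle, counts.mean())  # the MLE of a Poisson rate is the sample mean
```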
- Week 5: Quantifying uncertainty
- Calibration of estimates of uncertainty; asymptotics versus simulation. Review. (A short coverage-by-simulation sketch follows this week's materials.)
- Slides: P-values, and hypotheses ipynb html
- Slides: Confidence intervals and uncertainty ipynb html
- In-class exercise: Power and false positives ipynb html
- Reading: Adhikari & Pitman, chapter 14
- alternative reading: Wasserman, chapters 8 & 11
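A sketch of checking calibration by simulation: how often does a nominal 95% normal-approximation interval for a mean actually cover the truth? The exponential data-generating distribution, sample size, and number of replicates are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(3)

true_mean = 1.0
n, reps, z = 30, 5_000, 1.96
covered = 0
for _ in range(reps):
    sample = rng.exponential(scale=true_mean, size=n)
    half_width = z * sample.std(ddof=1) / np.sqrt(n)   # normal-approximation interval
    if abs(sample.mean() - true_mean) <= half_width:
        covered += 1
print(covered / reps)  # calibrated if this is close to the nominal 0.95
```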
- Week 6: Multivariate data and latent structure
- The multivariate Gaussian distribution, autocorrelation, modeling correlated data, and random walks. Principal components analysis (a short sketch follows this week's materials).
- Slides: Correlation and covariance ipynb html
- Slides: Principal components analysis ipynb html
- Reading: Adhikari & Pitman, sections 17.1-17.3 and chapter 23
- alternative reading: Wasserman, chapter 14
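A minimal principal components sketch via the eigendecomposition of a sample covariance matrix, on simulated correlated Gaussian data; the covariance matrix below is an illustrative choice.

```python
import numpy as np

rng = np.random.default_rng(4)

# Simulated correlated bivariate Gaussian data (covariance chosen for illustration).
cov = np.array([[3.0, 1.5],
                [1.5, 1.0]])
X = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov, size=1_000)

# PCA: eigendecomposition of the sample covariance, largest eigenvalue first.
Xc = X - X.mean(axis=0)
sample_cov = Xc.T @ Xc / (len(X) - 1)
eigvals, eigvecs = np.linalg.eigh(sample_cov)    # eigh returns ascending order
order = np.argsort(eigvals)[::-1]
print(eigvals[order])            # variance explained by each component
print(eigvecs[:, order])         # columns are the principal directions
scores = Xc @ eigvecs[:, order]  # the data in principal-component coordinates
```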
- Week 7: Linear models
- Introduction to linear models, and some history of modern statistics. Robust models, loss functions, and likelihood. (A short least-squares sketch follows this week's materials.)
- Slides: Introduction to linear models ipynb html
- Slides: In-class exercise ipynb html
- Reading: Adhikari & Pitman, chapters 24 & 25
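A minimal least-squares sketch: fitting a line by minimizing squared-error loss with numpy; the coefficients and noise level in the simulated data are illustrative.

```python
import numpy as np

rng = np.random.default_rng(5)

# Simulated data for a simple linear model (illustrative coefficients).
n = 200
x = rng.uniform(0, 10, size=n)
y = 1.0 + 0.5 * x + rng.normal(scale=1.0, size=n)

# Ordinary least squares: the coefficients minimizing squared-error loss.
X = np.column_stack([np.ones(n), x])          # design matrix with an intercept column
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta)  # should be close to the intercept 1.0 and slope 0.5
```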
- Week 8: Generalized linear models
- Response distributions, nonlinear relationships, transformations.
- Week 9: Problems with linear models
- Too many variables, not enough linearity: regularization and diagnostics. (A short ridge-regression sketch follows below.)
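One standard response to “too many variables” is a ridge penalty; this sketch solves the penalized least-squares problem in closed form on simulated, nearly collinear data. The penalty strength and all data values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(6)

# Many correlated predictors, few observations: plain least squares is unstable here.
n, p = 50, 40
X = rng.normal(size=(n, p))
X[:, 1] = X[:, 0] + 0.01 * rng.normal(size=n)   # two nearly collinear columns
beta_true = np.zeros(p)
beta_true[:3] = [2.0, -1.0, 0.5]
y = X @ beta_true + rng.normal(scale=0.5, size=n)

# Ridge regression: minimize ||y - X b||^2 + lam * ||b||^2, which has a closed form.
lam = 1.0   # penalty strength; in practice chosen by cross-validation
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
print(beta_ridge[:5])
```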
- Week 10: Prediction and inference revisited
- The bootstrap; identifiability, ill-posed inference, and non-convex optimization. (A short bootstrap sketch follows this week's materials.)
- Slides: Uncertainty and the bootstrap ipynb html
- Slides: Interpolation and ill-posedness ipynb html
- Slides: Review ipynb html
- Reading: Adhikari, DeNero & Wagner, chapter 13
- alternative reading: Wasserman, chapter 8
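A minimal bootstrap sketch: a percentile confidence interval for a median, built by resampling the data with replacement; the lognormal data and the number of resamples are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(7)

# Skewed sample whose median we want an interval for (illustrative data).
data = rng.lognormal(mean=0.0, sigma=1.0, size=200)

# Bootstrap: recompute the statistic on resamples drawn with replacement.
boot_medians = np.array([
    np.median(rng.choice(data, size=len(data), replace=True))
    for _ in range(2_000)
])
lo, hi = np.quantile(boot_medians, [0.025, 0.975])   # percentile interval
print(np.median(data), (lo, hi))
```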