Course schedule
The source code for these lectures is available in the GitHub repository.
Below, worksheets are collections of exercises that we may or may not do in class; treat them as supplementary practice.
- Week 1: Probability
  - Overview of probability and statistics in data science: randomness, uncertainty, estimation, and prediction. Probability and expectation, conditional probabilities, and random variables.
  - Slides: Introduction and probability (ipynb, html)
  - Slides: Random variables (ipynb, html)
  - Reading: Adhikari & Pitman, chapters 1, 2, and 3
  - Alternative reading: Wasserman, chapter 1 and sections 2.1-2.4
  - Homework: ipynb, html
  - Worksheet: ipynb
- Week 2: The modeler’s toolbox
  - Simulation, random variables, properties of and relationships between some common probability distributions; computing means, variances, and expectations. Stochastic gradient descent.
  - Slides: Stochastic gradient descent (ipynb, html)
  - Slides: Poisson counts (ipynb, html)
  - Reading: Adhikari & Pitman, chapters 4, 6.1-6.3, 6.5, 8, and 15.1-15.4
  - Alternative reading: Wasserman, chapters 2 and 3
  - Worksheet: ipynb
- Week 3: Simulation, moments, and overdispersion
  - How to pick “realistic” simulation parameters. Central limit theorem. Method-of-moments fitting; minimum-variance estimators. Outliers and overdispersion: scale mixtures, goodness-of-fit.
- Week 4: Model choice, categorical prediction, and likelihood
  - Likelihood, p-values, hypothesis testing, power and false positives, false discovery rates.
  - Slides: Likelihood (ipynb, html)
  - Reading: Adhikari & Pitman, chapter 20
  - Alternative reading: Wasserman, chapter 9
  - Worksheet: ipynb
- Week 5: Quantifying uncertainty
  - Calibration of uncertainty estimates; asymptotics versus simulation. Review.
  - Slides: P-values and hypotheses (ipynb, html)
  - Slides: Confidence intervals and uncertainty (ipynb, html)
  - In-class exercise: Power and false positives (ipynb, html)
  - Reading: Adhikari & Pitman, chapter 14
  - Alternative reading: Wasserman, chapters 8 and 11
  - Worksheet: ipynb
- Week 6: Multivariate data and latent structure
  - The multivariate Gaussian distribution, autocorrelation, modeling correlated data, and random walks. Principal components analysis.
  - Slides: Correlation and covariance (ipynb, html)
  - Slides: Principal components analysis (ipynb, html)
  - Reading: Adhikari & Pitman, sections 17.1-17.3 and chapter 23
  - Alternative reading: Wasserman, chapter 14
  - Worksheet: ipynb
- Week 7: Linear models
  - Introduction to linear models and some history of modern statistics. Robust models, loss functions, and likelihood.
- Week 8: Generalized linear models
  - Response distributions, nonlinear relationships, transformations.
- Week 9: Problems with linear models
  - Too many variables, not enough linearity: regularization and diagnostics.
- Week 10: Prediction and inference revisited
  - The bootstrap. Identifiability, ill-posed inference, and non-convex optimization.
  - Slides: Uncertainty and the bootstrap (ipynb, html)
  - Slides: Interpolation and ill-posedness (ipynb, html)
  - Slides: Review (ipynb, html)
  - Reading: Adhikari, DeNero & Wagner, chapter 13
  - Alternative reading: Wasserman, chapter 8
Previous versions: the schedule (with slides and homework assignments) from Fall 2023 is available here.