course schedule
The source code for these lectures is available at the GitHub repository. The schedule (with slides and homeworks) from Spring 2023 is available here, and the schedule from Fall 2022 is available here.
Fall 2023
- Week 1: Probability
Overview of probability and statistics in data science: randomness, uncertainty, estimation, and prediction. Probability and expectation, conditional probabilities, and random variables.
- Slides: Introduction, and probability ipynb html
- Slides: Random variables ipynb html
- Reading: Adhikari & Pitman, chapters 1, 2, & 3
- alternative reading: Wasserman, chapters 1 & 2.1-2.4
- Short Homework: ipynb html
- Homework: ipynb html
- Week 2: The modeler’s toolbox
Simulation, random variables, properties of and relationships between some common probability distributions; computing means, variances, and expectations. Stochastic gradient descent.
- Slides: Stochastic gradient descent ipynb html
- Slides: Poisson counts ipynb html
- Reading: Adhikari & Pitman, chapters 4, 6.1-6.3, 6.5, 8, 15.1-15.4
- alternative reading: Wasserman, chapters 2 & 3
- Week 3: Simulation, moments, and overdispersion
How to pick “realistic” simulation parameters. Central limit theorem. Method-of-moments fitting; minimum-variance estimators. Outliers and overdispersion: scale mixtures, goodness-of-fit.
- Week 4: Model choice, categorical prediction, and likelihood
Likelihood, p-values, hypothesis testing, power and false positives, false discovery rates.
- Slides: Likelihood ipynb html
- Reading: Adhikari & Pitman, chapter 20
- alternative reading: Wasserman, chapter 9
- Homework: ipynb html
- Week 5: Quantifying uncertainty
Calibration of estimates of uncertainty; asymptotics versus simulation. Review.
- Slides: P-values, and hypotheses ipynb html
- In-class exercise: Confidence intervals and uncertainty ipynb html
- Slides: Power and false positives ipynb html
- Reading: Adhikari & Pitman, chapter 14
- alternative reading: Wasserman, chapters 8 & 11
- Week 6: Multivariate data and latent structure
The multivariate Gaussian distribution, autocorrelation, modeling correlated data, random walks. Principal components analysis.
- Slides: Correlation and covariance ipynb html
- Reading: Adhikari & Pitman, chapters 17.1-17.3 & 23
- alternative reading: Wasserman, chapter 14
- Homework: ipynb html
- Week 7: Linear models
Introduction to linear models, and some history of modern statistics. Robust models, loss functions and likelihood.
- Slides: Principal components analysis ipynb html
- Slides: Introduction to linear models ipynb html
- Slides: In-class exercise ipynb html
- Reading: Adhikari & Pitman, chapters 24 & 25
- Homework: ipynb html
- For problem 1 in the homework, you will need to refer to this image.
- Week 8: Generalized linear models
Response distributions, nonlinear relationships, transformations.
- Week 9: Problems with linear models
Too many variables, not enough linearity: regularization and diagnostics.
- Week 10: Prediction and inference revisited
The bootstrap; identifiability, ill-posed inference, non-convex optimization.
- Slides: Uncertainty and the bootstrap ipynb html
- Slides: Interpolation and ill-posedness ipynb html
- Slides: Review ipynb html
- Reading: Adhikari, DeNero & Wagner, chapter 13
- alternative reading: Wasserman, chapter 8