Course schedule
The source code for these lectures is available in the GitHub repository. Please also see the technical_notes for software and other troubleshooting tips.
Winter 2026
- Week 1: Exploratory Data Analysis
- Overview of the goals of the course: description, visualization, exploration, pattern discovery, and summarization. Introduction to different frameworks and goals, and their relationship to preregistration and hypothesis testing. Types of data: tidy data, images, geospatial data, words, and time series.
- Reading: Chapter 2 (“Exploratory Data Analysis”) from Haig, The Philosophy of Quantitative Methods (see Canvas)
- Reading: Introduction from Tan, Steinbach, & Kumar, Introduction to Data Mining (see Canvas)
- Case study: the Youth Tobacco Survey
- Reading: NYTS report 2024
- Slides: Introduction (ipynb)
- Discussion: Exploratory Data Analysis
- Demo: Youth Tobacco Survey (ipynb)
- Assignment (due Monday 1/12): ipynb html
- Lab (for 1/9): ipynb html
- Week 2: Visualization
- Grammar of graphics. Overview of plot types for univariate and multivariate summarization, color palettes, and transformations. Output formats: bitmap, vector, and interactive web-based graphics.
- Links: plotnine documentation
- Reading: A layered grammar of graphics, Wickham
- Slides: Graphics and visualization
- Demo: Grammar of Graphics
- Demo: Weather Data (completed version from class)
- Assignment (due Tuesday 1/20): ipynb html
- Lab (for 1/16): ipynb html
- Week 3: Summarizing, smoothing, and outliers.
- Split-apply-combine options. Types and goals of smoothers. Methods for outlier identification.
- Monday: no class (MLK Day)
- Optional reading: The Split-Apply-Combine strategy for data analysis
- Slides: Split/apply/combine (completed version from class)
- Assignment (due Monday 1/26): ipynb html
- Weeks 4-5: Dimension reduction.
- What low-dimensional representations do, and what they don't. Overview of methods, both similarity-based and distance-based; examples include principal component analysis (PCA) and t-SNE.
- Slides: PCA
- Slides: PCA and SVD
- Demo: penguin PCA (completed version from class)
- Assignment (due Monday 2/2): ipynb html
- Lab (for 1/30): ipynb html
- Slides: Generalizing: t-SNE
- Slides: On ordination
- Assignment (due Monday 2/9): ipynb html
- Slides: Federalist papers (completed version from class)
- Lab (for 2/6): ipynb html (completed version from class)
- Week 6: Working with words.
- Bag of words, preprocessing, embeddings, latent Dirichlet allocation, and other applications of dimension reduction. Finding n-grams; sentiment analysis.
- Slides: Working with text
- Assignment (due Monday 2/16): ipynb html
- Slides: Scraping, and plotting text
- Lab (for 2/13): ipynb html (semi-completed version from class)
- Week 7: Case study.
- Groundwater monitoring.
- Slides: Groundwater
- Assignment (due Monday 2/23): ipynb html
- Week 8: Spatial data.
- Spatial smoothing and prediction.
- Slides: Mapping
- Reference/reading: Geocomputation with Python
- Assignment (due Monday 3/2): ipynb html
- Slides: Mapping (edited notebook from class)
- Lab (for 2/27): ipynb html
- Week 9: Working with images.
- Formats; layers; types of image data. Normalization and preprocessing. Applications of dimension reduction.
- Slides: Landsat images (edited notebook from class)
- Assignment (due Monday 3/2): html
- Week 10: More with images.
- More with images; case study recap; student presentations.