Useful links
Software
These might be of interest (for prerequisites, see the syllabus):
- scikit-learn
- statsmodels
- patsy for specifying (linear) models in Python
- jupyterhub troubleshooting
- matplotlib - a common way to make plots in Python
- seaborn - another way to make plots in Python
- plotnine - yet another way to make plots in Python
- scipy - many fundamental algorithms (e.g., for optimization)
Other books
If you are looking for more reference or reading material, these are also good. They are in no particular order; browse and find one that speaks to you.
- Python for Data Analysis, by Wes McKinney, free online, covering “the nuts and bolts of manipulating, processing, cleaning, and crunching data in Python”.
- Probability for Data Science, by Ani Adhikari and Jim Pitman, free online.
- Theory meets data, by Ani Adhikari.
- Probability, by Pitman, free ebook via the UO library website.
- Introduction to Probability, by Blitzstein and Hwang.
- Learning Data Science, by Sam Lau, Joey Gonzalez, and Deb Nolan.
- Introduction to Data Science, by Rafael A. Irizarry. Subtitle: “Data Analysis and Prediction Algorithms with R”.
- An Introduction to Statistical Learning, by Gareth James, Daniela Witten, Trevor Hastie, and Rob Tibshirani. A less technical (i.e., less math-heavy) and more applied treatment of the topics in The Elements of Statistical Learning.
- The Elements of Statistical Learning, by Trevor Hastie, Rob Tibshirani, and Jerome Friedman.
- Introduction to Probability for Data Science, by Stanley H. Chan.
- All of Statistics, by Larry Wasserman. A technical book on statistical theory.
- Introductory Statistics, by Shafer & Zhang.
- R for Data Science, by Hadley Wickham and Garrett Grolemund (O’Reilly, 2016). How to do many common data analysis tasks in R, specifically in the tidyverse.
- Fundamentals of Data Visualization, by Claus O. Wilke (O’Reilly). How to think about visualization (with source code for plots available!).
Other courses
- Probability for Data Science at UC Berkeley
- Data 88 at UC Berkeley
- Data 100 at UC Berkeley
- Data 8 at UC Berkeley
- book: Computational and Inferential Thinking: The Foundations of Data Science, by Ani Adhikari, John DeNero, and David Wagner (the textbook for “Data 8” at UC Berkeley).
General resources
- linuxcommand.org and bashguide
- Software Carpentry
- Reproducible Research by Karl Broman: talk and course
- Karl Broman’s excellent short tutorials on rmarkdown, git/github, make, perl, and more.
- a visual introduction to git
Probability and statistics
- List of common probability distributions
- Interactive plot of the beta distribution
- Interactive plot of the gamma distribution
- Interactive plot of Student’s t-distribution