- Stat-GB.4304: Modern statistics for data science
- Stat-UB.103: Introductory statistics (created by Vinu Abeywick)
- Stat-UB.103 (old): Statistics for business: control, regression, and forecasting
- Stat-UB.003: Regression and forecasting (page not active)
- Stats 390: Statistical Consulting Workshop (at Stanford, old page)
For the R programming language
Installation: download and install the R language itself, then download and install the free desktop version of RStudio. Finally, you may find this short, free book on the basics of R and RStudio helpful for getting started.
Packages and guides
The tidyverse is essential. It’s a bundle of packages including blockbusters like
This list of recommended packages has a lot of great suggestions as well.
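As a rough illustration of what working with the tidyverse looks like, here is a minimal sketch. It assumes the tidyverse is already installed and uses mtcars, a small data set that ships with base R; the object names mpg_by_cyl and p are made up for this example.

```r
# Run once per machine (needs an internet connection):
# install.packages("tidyverse")

# The example only runs if the tidyverse is installed.
if (requireNamespace("tidyverse", quietly = TRUE)) {
  library(tidyverse)  # attaches dplyr, ggplot2, and other core packages

  # dplyr: average fuel economy and number of cars for each cylinder count
  mpg_by_cyl <- mtcars %>%
    group_by(cyl) %>%
    summarise(mean_mpg = mean(mpg), n = n())
  print(mpg_by_cyl)

  # ggplot2: scatterplot of car weight against fuel economy
  p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
    geom_point()
  print(p)
}
```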
Interesting data sets
Data in R packages is easy to access. Just run install.packages("packagename") (with quotes), then library(packagename) (without quotes), and read the package documentation for help. For example, the nycflights13 documentation describes 5 dataframes contained in that package, like the flights dataframe, along with the name and meaning of each variable it contains.
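The steps above might look like this in practice. This is a minimal sketch: list_package_data is a hypothetical helper name introduced here, and the nycflights13 part only runs if that package has already been installed.

```r
# Run once per machine (needs an internet connection):
# install.packages("nycflights13")

# Helper (hypothetical name): names of the datasets bundled with a package.
list_package_data <- function(pkg) {
  data(package = pkg)$results[, "Item"]
}

# Works for any installed package; "datasets" ships with base R.
print(head(list_package_data("datasets")))

# The rest runs only if nycflights13 is installed.
if (requireNamespace("nycflights13", quietly = TRUE)) {
  library(nycflights13)
  print(list_package_data("nycflights13"))  # names of the bundled dataframes
  print(names(flights))                     # variables in the flights dataframe
  print(head(flights))                      # first few rows
  # help("flights") opens the page describing each variable
}
```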
If a package is not available on CRAN (the central repository for R), but is instead only hosted on GitHub, you may need to install the package devtools and use the command
FiveThirtyEight is an R package with many interesting datasets from the data journalism site of the same name. There are several sports and politics datasets, one about which movies pass or fail the Bechdel test, one about hate crimes and income inequality, one about employment and earnings of recent college graduates by major, and so on. Check the documentation for a full list of datasets and how to use each one.
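A minimal sketch of exploring this package, assuming it is installed under its CRAN name fivethirtyeight, and that its bechdel dataframe has a column called binary recording PASS or FAIL (check the package documentation to confirm). The helper category_share is a hypothetical name used only for this example.

```r
# Run once per machine (needs an internet connection):
# install.packages("fivethirtyeight")

# Helper (hypothetical name): share of each category, rounded to two decimals.
category_share <- function(x) {
  round(prop.table(table(x)), 2)
}

# The example only runs if fivethirtyeight is installed.
if (requireNamespace("fivethirtyeight", quietly = TRUE)) {
  library(fivethirtyeight)

  # List every dataset in the package
  print(data(package = "fivethirtyeight")$results[, "Item"])

  # bechdel: one row per movie; `binary` is assumed to hold PASS or FAIL
  print(category_share(bechdel$binary))
}
```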
Gapminder is an R package with data including life expectancy and GDP per capita by country and year. This is the data shown in a famous TED talk by the late Hans Rosling.
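As a sketch of how this data might be used, the example below plots life expectancy against GDP per capita for the most recent year in the data. It assumes the gapminder package is installed; latest_year is a hypothetical helper name, and base R plotting is used here as one option among many.

```r
# Run once per machine (needs an internet connection):
# install.packages("gapminder")

# Helper (hypothetical name): keep only the rows from the most recent year.
latest_year <- function(df) {
  df[df$year == max(df$year), ]
}

# The example only runs if gapminder is installed.
if (requireNamespace("gapminder", quietly = TRUE)) {
  library(gapminder)
  recent <- latest_year(gapminder)

  # One point per country; GDP per capita is drawn on a log scale
  plot(recent$gdpPercap, recent$lifeExp, log = "x",
       xlab = "GDP per capita", ylab = "Life expectancy (years)")
}
```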
The four packages described in this post on the RStudio blog: