Course pages and useful learning resources

Course pages

  • Stat-GB.4304: Modern statistics for data science
  • Stat-UB.103: Introductory statistics (created by Vinu Abeywick)
  • Stat-UB.103 (old): Statistics for business: control, regression, and forecasting
  • Stat-UB.003: Regression and forecasting (page not active)
  • Stats 390: Statistical Consulting Workshop (at Stanford, old page)

Selected links

For the R programming language

Installation: first download and install the R language itself, then download and install the free desktop version of RStudio. Finally, you may find this short, free book on the basics of R and RStudio helpful for getting started.

Packages and guides

Interesting data sets

Data in R packages is easy to access. Run install.packages("packagename") (with quotes) once to install a package, then library(packagename) (without quotes) in each session to load it, and read the package documentation for help. For example, the nycflights13 documentation describes the 5 dataframes contained in that package, such as flights, and gives the name and meaning of each variable they contain.
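As a minimal sketch of these steps, the R code below installs and loads nycflights13, then takes a first look at the flights dataframe. The requireNamespace() check and the cloud.r-project.org CRAN mirror are assumptions added for illustration: they let the script skip reinstalling a package that is already present and run outside an interactive session. Inside RStudio, plain install.packages("nycflights13") works without them.

```r
# Install nycflights13 from CRAN only if it is not already installed
# (the mirror URL is an illustrative choice)
if (!requireNamespace("nycflights13", quietly = TRUE)) {
  install.packages("nycflights13", repos = "https://cloud.r-project.org")
}

library(nycflights13)  # load the package; no quotes needed here

# In an interactive session, ?flights opens the documentation page
# that explains each variable in the flights dataframe

# A first look at the flights dataframe
dim(flights)    # number of rows and columns
names(flights)  # variable names, as described in the documentation
head(flights)   # first six rows
```

The other dataframes in the package (airlines, airports, planes and weather) can be inspected the same way.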

If a package is not available on CRAN (the central repository for R packages) and is instead hosted only on GitHub, you may need to install the devtools package and then use the command devtools::install_github("authorname/packagename").

  • fivethirtyeight is an R package with many interesting datasets from the data journalism site FiveThirtyEight. There are several sports and politics datasets, one about which movies pass or fail the Bechdel test, one about hate crimes and income inequality, one about employment and earnings of recent college graduates by major, and so on. Check the documentation for a full list of datasets and how to use each one.

  • gapminder is an R package with data including life expectancy and GDP per capita by country and year. These are the data shown in a famous TED talk by the late Hans Rosling.

  • The four packages described in this post on the RStudio blog: babynames, fueleconomy, nasaweather and nycflights13.
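As a short sketch of exploring one of these packages, the R code below uses gapminder: it installs the package if needed, shows the structure of the gapminder dataframe, and averages life expectancy by continent for the most recent year in the data. The mirror URL and the particular summary computed are illustrative choices, not something prescribed by the packages themselves.

```r
# Install gapminder from CRAN only if it is not already installed
if (!requireNamespace("gapminder", quietly = TRUE)) {
  install.packages("gapminder", repos = "https://cloud.r-project.org")
}

library(gapminder)

# One row per country and year, with columns such as
# country, continent, year, lifeExp, pop and gdpPercap
str(gapminder)

# Keep only the most recent year available in the data
latest <- subset(gapminder, year == max(year))

# Mean life expectancy by continent in that year
by_continent <- aggregate(lifeExp ~ continent, data = latest, FUN = mean)
print(by_continent)
```

The same install-load-inspect pattern applies to fivethirtyeight, babynames, fueleconomy and nasaweather; only the package and dataframe names change.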