Section 1 Outline

This course surveys theory and methods for important statistical aspects of data science, with a focus on high-dimensional data, statistical learning, and causal inference. We will begin with advances in hypothesis testing, such as control of the false discovery rate for multiple comparisons. Then we will discuss statistical theory for popular learning and model selection methods such as the lasso, including recent advances in post-selection inference. Finally, after reviewing frameworks for causal inference, the course will conclude with readings from the literature on applying statistical learning methods to causal inference. Throughout, we will combine theory (mathematical understanding) with practice (empirical understanding and competency with applications such as simulation studies). We assume a working knowledge of probability and linear algebra, familiarity with statistics, and a willingness to code in the R statistical programming language; otherwise the course is self-contained.

Topics:

  • Ethical data science
  • Foundations of statistical inference
  • Large-scale hypothesis testing
  • Inference for causal effects
  • Foundations of regression models
  • Machine learning and high-dimensional regression
  • Adjusting inference for model selection
  • Machine learning in causal inference

Textbooks:

Readings will be selected from the following references (subject to minor changes), all of which have free PDFs available by following these links.