Modern Statistics 4 Data Science
Joshua Loftus
2019-12-05
Section 1 Outline
This course surveys theory and methods addressing important statistical aspects of data science with a focus on high-dimensional data, statistical learning, and causal inference. We will begin with advances in hypothesis testing such as control of the false discovery rate for multiple comparisons. Then we will discuss statistical theory for popular learning and model selection methods such as the lasso, including recent advances in post-selection inference. Finally, after reviewing frameworks for causal inference the course will conclude by reading literature on the application of statistical learning methods to causal inference. Throughout the course we will combine theory, through mathematical understanding, with practice, through empirical understanding and competency with applications such as simulation studies. We assume a working knowledge of probability and linear algebra, familiarity with statistics, and willingness to code in the R statistical programming language, but otherwise the course is self-contained.
Topics:
- Ethical data science
- Foundations of statistical inference
- Large-scale hypothesis testing
- Inference for causal effects
- Foundations of regression models
- Machine learning and high-dimensional regression
- Adjusting inference for model selection
- Machine learning in causal inference
Textbooks:
Selected readings from the following references (subject to minor changes), all of which have free PDFs available by following these links.
- Computer Age Statistical Inference: Algorithms, Evidence and Data Science by Bradley Efron and Trevor Hastie
- Statistical Learning with Sparsity: The Lasso and Generalizations by Trevor Hastie, Robert Tibshirani, and Martin Wainwright
- Causal Inference by Miguel Hernán and Jamie Robins
- Elements of Causal Inference: Foundations and Learning Algorithms by Jonas Peters, Dominik Janzing and Bernhard Schölkopf