## Course description

This course examines modern statistical methods as a basis for decision making in the face of uncertainty. Topics include probability theory, discrete and continuous distributions, hypothesis testing, estimation, and statistical quality control. With the aid of computers, these statistical methods are used to analyze data. The course also introduces statistical models and their application to decision making, including the simple linear regression model, inference in regression analysis, sensitivity analysis, and multiple regression analysis.

### Syllabus

Full syllabus here: pdf

## Assignments

- Reading: OIS Chapter 1, up to and including Section 1.5, but you may skip 1.4.2.
- Reading: HBR article 1 on A/B testing.
- Reading: HBR article 2 beginning with the section on low quality data.
- RStudio introduction video (skip to 1:45, or 5:45)
- Homework 1 (solution)
- Reading: OIS Chapter 2, Sections 1 and 2.
- Homework 2 (solution)
- Reading: OIS Chapter 2, Section 4, up to 2.4.3, and 2.4.4 by Friday.
- Reading: econ blog on correlation, causation, and confounding
- Reading: Wikipedia on Types (especially self-selection, survivorship) and Problems of sampling bias

## Lecture notes

- Observational studies
- Summary statistics
- ggplot2 & histograms
- Probability definitions
- Independence, conditional probability
- Bayes’ theorem, random variables
- Random variables continued, expectation
- Expectation continued, variance, data/model worlds
- Variance continued, Chebyshev’s inequality
- Sampling
- Sampling distributions
- Estimation
- Bias-variance trade-off
- Laws of large numbers (Lab.Rmd)
- Central limit theorem (Lab.Rmd)
- Using the CLT
- Confidence intervals (Lab.Rmd)
- Hypothesis tests
- Tests for multiple groups (Lab.Rmd)
- Misuse of tests
- Test power and sample size
- Covariance and correlation
- Scatterplots and regression lines
- Regression coefficients
- Prediction
- Model inference (Lab.Rmd)
- Multiple regression, other extensions
- Model selection and the F test
- Overfitting and validation
- Classification with logistic regression
- Common pitfalls in regression
- Resources to learn more
- Principles and ethics

## Reading/textbook references

These references are generally good, and some parts of them closely match the material we are covering.

- ModernDive (MD)
- Learning Statistics with R (LSR)
- OpenIntro Statistics (OIS) (has exercises and solutions)
- Introduction to Data Science (IDS)
- Statistics, 4th Edition, by Freedman, Pisani, and Purves (FPP). Textbook, *not required*.

Specific chapter or section references for various topics are as follows.

- Controlled and observational studies: LSR 2, especially 2.5 onward; OIS 1.1-1.5; FPP 1-2.
- Summaries and plots: LSR 5; OIS 1.6; MD 3; FPP 3-4,7.
- Probability: LSR 9; OIS 2, 3.4; IDS 26, 28.1-4,6; FPP 13-15, 17.
- Estimation: LSR 10; IDS 32-33; OIS 4.1-2; FPP 21, 23-24.
- Intervals and hypothesis tests: MD Appendix B; LSR 10.5, 11, 13; OIS 4.2-5, 5.1-4, 6.1-2,4-6; IDS 34, 38; FPP 26-29.
- Covariance and simple regression: MD 6; LSR 5.7, 15.1-2,4,6,8,9; OIS 7; FPP 8-12.
- Multiple regression: MD 7; LSR 15.3,10; OIS 8.1-3.

## Getting started with R

- Download and install the R language
- Download and install the free desktop version of RStudio
- Check out this free book on the basics of R and RStudio
- Install the tidyverse bundle of packages using `install.packages("tidyverse")`, and read about ggplot2 and dplyr
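
Once the installation steps above are done, a short script along the following lines is one way to check that the tidyverse loads and that ggplot2 and dplyr work. It is only a sketch, not part of the course materials: it assumes the `mpg` example data set that ships with ggplot2, and the variable names `class_summary` and `hwy_hist` are made up for illustration.

```r
# Load the tidyverse; this attaches ggplot2 and dplyr, among other packages.
library(tidyverse)

# mpg is an example data set bundled with ggplot2 (used here as an assumption,
# not as course data).

# dplyr: number of cars and mean highway mileage for each class of car.
class_summary <- mpg %>%
  group_by(class) %>%
  summarize(mean_hwy = mean(hwy), n = n())
print(class_summary)

# ggplot2: histogram of highway mileage.
hwy_hist <- ggplot(mpg, aes(x = hwy)) +
  geom_histogram(bins = 20)
print(hwy_hist)
```

If the script prints a small summary table and RStudio shows a histogram in the Plots pane, the packages are installed and working.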