### Academic homepage and blog

I write about statistics, data science, and machine learning, and sometimes opine on politics and current events, data journalism, science, and academia.

I’m a statistician and data scientist with a broad range of interests including theory, applications, and teaching with the R statistical programming language. My research focuses on common practices in machine learning and data science pipelines and addressing sources and types of error that have previously been overlooked. This includes, for example:

- Developing methods for inference after model selection, such as *p*-values adjusted for selection bias
- Analyzing the social fairness of machine learning algorithms from a causal perspective

My work has been published in the **Annals of Statistics** and **Advances in Neural Information Processing Systems** (NIPS).

As a first-generation college graduate, my journey in higher education started in community college. I care about diversity and inclusion, and I’m happy to speak with, mentor, or help students from any background.

### History

- Assistant Professor, **New York University**, Department of Technology, Operations, and Statistics, 2017-present.
- Research Fellow, **Alan Turing Institute** and **University of Cambridge**, 2016-17.
- Ph.D. Statistics (Biostatistics trainee), **Stanford University**, 2016.
- M.A. Mathematics (concentration in computational biology), **Rutgers University**, 2011.
- B.S. Mathematics (summa cum laude), **Western Michigan University**, 2009.

### Selected Honors and Awards

- Statistics Department Teaching Award, 2014.
- Alan M. Abrams Memorial Fellowship, 2013-2015.
- Phi Beta Kappa.

## A concise defense of statistical significance

## Counterfactual privilege ICML talk

## Data for good talk at Columbia Data Science Institute

## Russian Twitter trolls attacked Bernie too

## A conditional approach to inference after model selection

## Algorithmic fairness is as hard as causation

## Model selection bias invalidates significance tests

When researchers find a statistically significant *p*-value, they often conclude they’ve found something real. But there’s a problem: the reason for that significant *p*-value may just be something called model selection bias. This bias can invalidate inferences done after model selection, and it may be one of the contributors to the reproducibility crisis in science. Adjusting inference methods to account for model selection is an area of ongoing research where I have done some work. [Read More]
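The inflation from selecting a model before testing it can be seen in a small simulation (a hypothetical sketch for illustration, not code from the post): under a global null where every candidate effect is truly zero, naively testing only the candidate with the largest |z|-statistic at the nominal 5% level rejects far more often than 5%.

```python
import random

random.seed(0)

def naive_rejection_rate(n_sims=2000, n_candidates=10, crit=1.96):
    """Simulate inference after model selection under a global null.

    Each simulation draws n_candidates independent standard-normal
    z-statistics (all true effects are zero), 'selects' the candidate
    with the largest |z|, then applies an unadjusted two-sided 5% test
    to that selected statistic.
    """
    rejections = 0
    for _ in range(n_sims):
        z_stats = [random.gauss(0, 1) for _ in range(n_candidates)]
        selected = max(z_stats, key=abs)  # model selection step
        if abs(selected) > crit:          # naive, unadjusted test
            rejections += 1
    return rejections / n_sims

rate = naive_rejection_rate()
print(f"Naive rejection rate after selection: {rate:.2f}")
```

With 10 null candidates the naive test rejects roughly 1 − 0.95¹⁰ ≈ 40% of the time, an order of magnitude above the nominal 5% level, which is exactly the kind of invalid inference that selection-adjusted *p*-values are designed to correct.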