Academic homepage and blog

I write about statistics, data science, and machine learning, and sometimes opine on politics and current events, data journalism, science, and academia.

A conditional approach to inference after model selection

Model selection can invalidate inference, such as significance tests, but statisticians have recently made progress developing methods to adjust for this bias. One approach uses conditional probability: it adjusts inferences by conditioning on the event that the chosen model was selected. This post motivates the conditional approach with a simple screening rule example and introduces the selectiveInference R package, which can compute adjusted significance tests after popular model selection methods like forward stepwise and the lasso. [Read More]
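
To give a rough feel for the conditional idea, here is a minimal sketch in Python. It is not code from the post and does not use the selectiveInference package. It assumes a hypothetical screening rule that keeps a standard normal test statistic only when its absolute value exceeds a cutoff of 1, with every effect truly null. The function names and the cutoff are illustrative assumptions. The adjusted p-value divides the usual two-sided tail probability by the probability of passing the screen.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1


def naive_p(z):
    """Two-sided p-value that ignores the screening step."""
    return 2 * (1 - STD_NORMAL.cdf(abs(z)))


def conditional_p(z, cutoff):
    """Two-sided p-value conditional on having passed the screen |z| > cutoff.

    Only meaningful for statistics that actually passed the screen.
    """
    return naive_p(z) / naive_p(cutoff)


def rejection_rates(n=50_000, cutoff=1.0, alpha=0.05, seed=0):
    """Simulate null statistics, screen them, and test only the survivors."""
    rng = random.Random(seed)
    draws = (rng.gauss(0, 1) for _ in range(n))
    selected = [z for z in draws if abs(z) > cutoff]
    naive = sum(naive_p(z) < alpha for z in selected) / len(selected)
    adjusted = sum(conditional_p(z, cutoff) < alpha for z in selected) / len(selected)
    return naive, adjusted


naive, adjusted = rejection_rates()
print(f"naive rejection rate: {naive:.3f}, conditional rejection rate: {adjusted:.3f}")
```

Under these assumptions, the naive test should reject much more often than the nominal 5% among the screened statistics, while the conditional test should stay close to 5%.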

Algorithmic fairness is as hard as causation

This post describes a simple example that illustrates why algorithmic fairness is a hard problem. I claim it is at least as hard as doing causal inference from observational data, i.e. distinguishing between mere association and actual causation. In the process, we will also see that SCOTUS Chief Justice Roberts has a mathematically incorrect theory on how to stop discrimination. Unfortunately, that theory persists as one of the most common constraints on fairness. [Read More]

Model selection bias invalidates significance tests

People often do regression model selection, either by hand or using algorithms like forward stepwise or the lasso. Sometimes they also report significance tests for the variables in the chosen model. After all, a significant p-value means they’ve found something real. But there’s a problem: the reason for that significant p-value may just be something called model selection bias. This bias can invalidate inferences made after model selection, and may be one of the contributors to the reproducibility crisis in science. Adjusting inference methods to account for model selection is an area of ongoing research to which I have contributed. [Read More]
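
A small simulation can make the bias concrete. The sketch below is an illustration only, not code from the post. It assumes ten pure-noise predictors whose z-statistics are independent standard normals (a simplification that roughly corresponds to an orthogonal design with known error variance), picks the predictor with the largest absolute z-statistic, and then tests it as if it had been chosen in advance. The function name and default settings are hypothetical.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1


def false_positive_rate(n_predictors=10, n_trials=20_000, alpha=0.05, seed=0):
    """Fraction of trials in which the 'best' noise predictor looks significant."""
    rng = random.Random(seed)
    false_positives = 0
    for _ in range(n_trials):
        # Assumption: z-statistics of noise predictors are independent N(0, 1).
        z_stats = [rng.gauss(0, 1) for _ in range(n_predictors)]
        # Model selection step: keep only the most extreme predictor.
        best = max(abs(z) for z in z_stats)
        # Naive two-sided p-value that ignores the selection step.
        p_value = 2 * (1 - STD_NORMAL.cdf(best))
        false_positives += p_value < alpha
    return false_positives / n_trials


rate = false_positive_rate()
print(f"simulated false positive rate: {rate:.3f}")
print(f"theoretical rate 1 - 0.95**10: {1 - 0.95 ** 10:.3f}")
```

With independent statistics, the chance that at least one of ten null predictors clears the 5% threshold is 1 - 0.95^10, or about 40%, so the simulated rate should land near that value rather than near 5%.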