A conditional approach to inference after model selection

Model selection can invalidate inference such as significance tests, because the same data are used both to choose the model and to test it. Statisticians have recently made progress on methods that adjust for the resulting bias. One approach uses conditional probability: inferences are adjusted by conditioning on the event that the chosen model was selected. This post motivates the conditional approach with a simple screening rule example and introduces the selectiveInference R package, which can compute adjusted significance tests after popular model selection methods like forward stepwise and the lasso.

Model selection bias invalidates significance tests

People often do regression model selection, either by hand or using algorithms like forward stepwise or the lasso. Sometimes they also report significance tests for the variables in the chosen model. After all, a significant p-value seems to mean they've found something real. But there's a problem: that significant p-value may be nothing more than a product of model selection bias. Variables are chosen precisely because they look strongly associated with the outcome in the data at hand, so testing them on that same data overstates their significance. This bias can invalidate inferences done after model selection, and may be one of the contributors to the reproducibility crisis in science. Adjusting inference methods to account for model selection is an area of ongoing research where I have done some work.
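To make the bias concrete, here is a minimal simulation sketch in Python. It is a hypothetical illustration, not the example used later in this post and not part of the selectiveInference package. It assumes 10 independent standard normal test statistics, none of which reflects a real effect, and a made-up selection rule that keeps only the statistic with the largest absolute value. The function names and parameters are my own choices for illustration.

```python
import math
import random


def naive_p_value(z):
    """Two-sided p-value for a standard normal statistic, ignoring selection."""
    return math.erfc(abs(z) / math.sqrt(2))


def selected_rejection_rate(n_vars=10, n_sims=20000, alpha=0.05, seed=0):
    """Fraction of simulations in which the selected variable looks significant.

    Every variable is pure noise (the global null holds), so a valid test
    should reject in roughly an alpha fraction of simulations.
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        # One z-statistic per candidate variable, all drawn under the null.
        z_stats = [rng.gauss(0.0, 1.0) for _ in range(n_vars)]
        # Hypothetical selection rule: keep the most extreme statistic.
        z_selected = max(z_stats, key=abs)
        # Naive test: treat the selected statistic as if it were chosen in advance.
        if naive_p_value(z_selected) < alpha:
            rejections += 1
    return rejections / n_sims


rate = selected_rejection_rate()
print(f"naive rejection rate after selection: {rate:.3f}")
```

Under these assumptions the naive test should reject far more often than the nominal 5%. With 10 independent noise variables, the chance that at least one clears the 0.05 threshold is 1 - 0.95^10, or about 40%, and the selection rule always picks that one.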