Academic homepage and blog

I write about statistics, data science, and machine learning, and sometimes opine on politics and current events, data journalism, science, and academia.

Data for Good talk at Columbia Data Science Institute

(Note: links don’t work in this preview; click through to the post to view them.) I’m happy to be speaking at 1pm EST today at Columbia University on causal inference and selection bias in algorithmic fairness. I believe video will be available at the webinar link, and here are my slides. The talk is based on work described in this survey with my coauthors Matt Kusner, Chris Russell, and Ricardo Silva. See here for the video of Matt’s oral presentation of our first paper in this line of work at NIPS 2017. [Read More]

Russian Twitter trolls attacked Bernie too

You may have seen stories about Twitter accounts operated by Russians attempting to influence the 2016 election in the United States. Much of the reporting that I’ve seen described a Simple Narrative: Russians tried to help Trump and hurt Clinton, even supporting Bernie Sanders in order to attack Clinton. I’ve also seen plenty of Democrats on Twitter attacking Sanders over this. I have not seen any stories reporting that many of these bots also attacked Sanders. (If you’re aware of such stories, I’d be happy to hear about them.) […] Recently, FiveThirtyEight published a repository with roughly 3 million tweets from about 2,800 accounts that Twitter concluded were associated with the Russian effort. About the data: […] The data set is the work of two professors at … [Read More]
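(A quick aside: if you want to poke at the data yourself, here is a minimal sketch of mine, not code from the post. It assumes you have downloaded a CSV shard from FiveThirtyEight’s russian-troll-tweets repository and that it includes a `content` column holding the tweet text; the file name below is illustrative.)

```python
# Minimal sketch: count candidate mentions in one shard of the
# FiveThirtyEight russian-troll-tweets data. Assumes a CSV with a
# `content` column of tweet text; adjust the file name as needed.
import pandas as pd

tweets = pd.read_csv("IRAhandle_tweets_1.csv")

for name in ["Bernie", "Sanders", "Hillary", "Clinton", "Trump"]:
    n = tweets["content"].str.contains(name, case=False, na=False).sum()
    print(f"{name}: {n} mentions")
```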

A conditional approach to inference after model selection

Model selection can invalidate inferences such as significance tests, but statisticians have recently made progress on methods that adjust for this bias. One approach uses conditional probability: adjust inferences by conditioning on the event that the chosen model was selected. This post motivates the conditional approach with a simple screening rule example and introduces the selectiveInference R package, which computes adjusted significance tests after popular model selection methods like forward stepwise and the lasso. [Read More]
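As a toy illustration of the conditional idea (my sketch, not code from the post or the selectiveInference package): suppose we observe Z ~ N(mu, 1) but only test H0: mu = 0 when Z clears a screening cutoff |Z| > c. Conditional on being selected, the correct null distribution of Z is a normal truncated to |Z| > c, so the adjusted p-value divides the usual tail probability by the probability of selection.

```python
# Sketch of a selection-adjusted p-value for a simple screening rule.
# We test H0: mu = 0 for Z ~ N(mu, 1), but only when |Z| > c was observed.
from scipy.stats import norm

def naive_pvalue(z):
    # Usual two-sided p-value, ignoring that Z was screened.
    return 2 * norm.sf(abs(z))

def selective_pvalue(z, c):
    # P(|Z| >= |z| given |Z| > c) under H0, i.e. a truncated-normal tail.
    # Valid for |z| >= c, which holds whenever the test is run at all.
    return norm.sf(abs(z)) / norm.sf(c)

z, c = 2.1, 1.96  # z was only examined because it cleared the cutoff c
print(naive_pvalue(z))         # ~0.036: looks significant
print(selective_pvalue(z, c))  # ~0.71: not significant after adjustment
```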

Algorithmic fairness is as hard as causation

This post describes a simple example that illustrates why algorithmic fairness is a hard problem. I claim it is at least as hard as doing causal inference from observational data, i.e., distinguishing mere association from actual causation. In the process, we will also see that SCOTUS Chief Justice Roberts has a mathematically incorrect theory of how to stop discrimination. Unfortunately, that theory persists as one of the most commonly used fairness constraints. [Read More]
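To make the difficulty concrete, here is a toy sketch of my own (not the post’s example) of why a “just stop using the protected attribute” constraint fails: if a remaining feature is a proxy for the protected attribute, a model that never sees the attribute can still produce very different predictions by group, and separating proxy effects from legitimate ones is exactly a causal question.

```python
# Toy sketch: dropping a protected attribute A from a model does not
# remove its influence when a remaining feature is a proxy for A.
import numpy as np

rng = np.random.default_rng(1)
n = 10_000
A = rng.integers(0, 2, n)                  # protected attribute
proxy = A + 0.5 * rng.standard_normal(n)   # a feature correlated with A
y = 2.0 * A + rng.standard_normal(n)       # outcome driven by A itself

# "Fairness through unawareness": regress y on the proxy alone, never on A.
slope, intercept = np.polyfit(proxy, y, 1)
pred = intercept + slope * proxy
print(pred[A == 1].mean() - pred[A == 0].mean())  # large gap remains by group
```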

Model selection bias invalidates significance tests

People often do regression model selection, either by hand or using algorithms like forward stepwise or the lasso. Sometimes they also report significance tests for the variables in the chosen model. After all, a significant p-value means they’ve found something real. But there’s a problem: that significant p-value may be due to nothing more than model selection bias. This bias can invalidate inferences done after model selection, and may be one of the contributors to the reproducibility crisis in science. Adjusting inference methods to account for model selection is an area of ongoing research in which I have done some work. [Read More]
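Here is a small simulation sketch (mine, not from the post) of the bias: under a global null where no predictor is related to y, selecting the best-looking predictor out of twenty and then running an ordinary significance test rejects far more often than the nominal 5%; with twenty pure-noise predictors the winner looks significant roughly 1 − 0.95²⁰ ≈ 64% of the time.

```python
# Sketch: pick the predictor most correlated with y out of 20 pure-noise
# predictors, then run a naive test on it. Rejections far exceed 5%.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, p, reps = 100, 20, 2000
rejections = 0
for _ in range(reps):
    X = rng.standard_normal((n, p))
    y = rng.standard_normal(n)          # y is pure noise: no real effects
    j = np.argmax(np.abs(X.T @ y))      # "select" the best-looking predictor
    r, pval = stats.pearsonr(X[:, j], y)
    rejections += pval < 0.05
print(rejections / reps)  # far above the nominal 0.05 level
```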