Blog

Workshop on race and racism in science

Slides for my talk on racial bias in machine learning, AI, and data science, along with additional comments and links to other resources.

Conference on machine learning and inequality

Slides for my own talk on using causality to study the normative dimensions of machine learning models, and a blog post with additional comments on the conference.

Did the new variant of COVID spread through schools?

statistics
covid19
biocentrism

The common theory of the new COVID variant is that intrinsic biological factors make it more transmissible. An alternative theory attributes the same dynamics to social explanations like community spread among school-aged children.

Is the new variant of COVID really more transmissible?

statistics
covid19
biocentrism

A variant of COVID recently grew rapidly in London. Experts have warned this strain may be more transmissible, and governments have enacted more restrictions in response. But the new variant has not spread rapidly in other locations.

Relocating and rebuilding

A retrospective on an eventful 2020, announcing a move from New York to London, a couple of book recommendations, and a new design for this website.

Least squares as springs

statistics
machine learning
physics

Physics intuition for regression and other methods that minimize squared error. We can imagine springs pulling the model toward the data.
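
As a rough sketch of the analogy, assume each data point is attached to the fitted line by an ideal vertical spring with zero rest length and a common stiffness $k$ (the symbols $k$, $U$, and $\beta$ are introduced here for illustration). By Hooke's law, the total potential energy is proportional to the residual sum of squares, so the equilibrium position is the least squares fit:

$$
U(\beta) = \sum_{i=1}^{n} \tfrac{1}{2} k \,\bigl(y_i - x_i^\top \beta\bigr)^2 = \tfrac{k}{2}\,\mathrm{RSS}(\beta),
\qquad
\nabla U(\beta) = -k \sum_{i=1}^{n} x_i \bigl(y_i - x_i^\top \beta\bigr) = 0
\;\Longleftrightarrow\;
X^\top (y - X\beta) = 0 .
$$

The condition on the right is the usual set of normal equations: the spring forces, weighted by the predictors, balance out.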

A concise defense of statistical significance

statistics
reproducibility

Recent arguments against the use of p-values and significance testing are mostly weak. The weak ones are really arguments against making decisions at all, or against ever making mistakes, and neither can be avoided.

A conditional approach to inference after model selection

statistics
reproducibility
selective inference
R

Model selection can invalidate inference, such as significance tests, but statisticians have recently made progress developing methods to adjust for this bias. This post motivates a conditional approach with a simple screening rule example and introduces an R package that can compute adjusted significance tests.
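
A minimal Python sketch of the conditional idea, under assumptions that are mine rather than the post's: a single z-statistic with known unit variance, and a screening rule that reports it only when its absolute value exceeds a hypothetical `cutoff`. The adjusted p-value conditions on having passed the screen, which amounts to using a truncated normal distribution. This is not the API of the R package mentioned above.

```python
import math
import numpy as np

def two_sided_p(z):
    """Usual two-sided p-value for a N(0, 1) test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))

def conditional_p(z, cutoff):
    """P(|Z| > |z| given |Z| > cutoff) for Z ~ N(0, 1).

    Only defined for statistics that passed the screening rule.
    """
    if abs(z) <= cutoff:
        raise ValueError("statistic did not pass the screening rule")
    return two_sided_p(z) / two_sided_p(cutoff)

# Simulate under the null: every statistic is pure noise.
rng = np.random.default_rng(1)
cutoff = 1.5  # hypothetical screening threshold
z = rng.standard_normal(200_000)
selected = z[np.abs(z) > cutoff]  # only these would be reported

alpha = 0.05
naive_rate = np.mean([two_sided_p(v) < alpha for v in selected])
adjusted_rate = np.mean([conditional_p(v, cutoff) < alpha for v in selected])
print(f"naive: {naive_rate:.3f}, adjusted: {adjusted_rate:.3f}")
```

Among the screened statistics, the naive test rejects far more often than the nominal 5%, while the conditional test stays close to it.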

Model selection bias invalidates significance tests

statistics
reproducibility

People often perform regression model selection, either by hand or using algorithms like forward stepwise or the lasso. Sometimes they also report significance tests for the variables in the chosen model. But there's a problem: a small *p*-value may simply reflect model selection bias rather than a real effect.
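
A small simulation can illustrate the problem. The setup below is an assumption for illustration, not taken from the post: the response is pure noise with known unit variance, the "selection" step picks whichever of `p` predictors has the largest marginal z-statistic, and the naive test ignores that this selection happened.

```python
import math
import numpy as np

def naive_rejection_rate(n=50, p=10, n_sims=2000, alpha=0.05, seed=0):
    """Fraction of null simulations where the selected variable looks significant."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(n_sims):
        X = rng.standard_normal((n, p))
        y = rng.standard_normal(n)  # pure noise: no predictor has any effect
        # Marginal z-statistics; each is N(0, 1) given X because Var(y_i) = 1.
        z = X.T @ y / np.linalg.norm(X, axis=0)
        z_selected = np.max(np.abs(z))  # selection: keep the "best" predictor
        p_value = math.erfc(z_selected / math.sqrt(2))  # naive two-sided p-value
        rejections += p_value < alpha
    return rejections / n_sims

rate = naive_rejection_rate()
print(f"naive rejection rate under the null: {rate:.3f}")
```

Even though no predictor matters, the naive test rejects far more often than the nominal 5%. If the ten z-statistics were exactly independent, the rate would be 1 - 0.95^10, or about 0.40.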


Reuse

Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".