Algorithmic fairness is as hard as causation

What is algorithmic fairness?

(Feel free to skip this section if you’re already familiar with the topic.)

Algorithmic fairness is an interdisciplinary research field concerned with the various ways that algorithms may perpetuate or reinforce unfair legacies of our history, and how we might modify the algorithms or the systems they are used in to prevent this. For example, if the training data used by a machine learning method contains patterns caused by things like racism, sexism, ableism, or other types of injustice, then the model may learn those patterns and use them to make predictions and decisions that are unfair. There are many ways that technology can have unintended consequences, and this is just one of them.

Chief Justice Roberts’ theory of fairness

The Chief Justice of the US Supreme Court wrote in 2007 that

The way to stop discrimination on the basis of race is to stop discriminating on the basis of race.

He wasn’t just trying to be cute. He was arguing in favor of a specific definition of fairness: one where a person’s race is not allowed to be taken into consideration when making decisions about them. For decisions made by humans, it’s hard to imagine how such a rule could ever be enforced. But for algorithms, there is a formal criterion that may seem at first to be a rigorous version of Roberts’ maxim. We need just a little background and notation for the variables in an automated decision-making setting:

  • \(A\) represents a sensitive attribute, like race or gender, which is the basis of potential unfairness
  • \(X\) represents other attributes, like education, test scores, or criminal record, which are considered informative for the decision
  • \(Y\) is an outcome variable to be predicted or classified, like GPA, risk of recidivism, or lifetime value of a loan, etc.

The algorithm uses this data to learn a function \(f\) which is used to predict \(Y\) from the input variables, with the hope that \(Y \approx f(X,A)\). Now, while it’s hard to imagine how a human might be “race blind” when making a decision–attaining a superhuman level of objectivity and freedom from implicit bias–many people believe that computers can do this easily. Just delete the variable \(A\) from the data, and make the function \(f\) a function of only \(X\), i.e., require that \(Y \approx f(X)\).

This definition of fairness has two advantages: it is enshrined in US labor law and, in surveys of the public, it is widely popular (see Table 1 of this paper). Unfortunately, it is fundamentally flawed, as the next example shows.

The red car example

Consider a car insurance company that wants to predict the risk of insuring potential customers using two predictor variables: gender \(A\) and whether or not the car to be insured is red \(X\). Using the driving records \(Y\) and other data of their existing customers, they want to learn a function \(f(X,A)\) to predict \(Y\). Then when a new customer applies for insurance, they collect the data \(X, A\) and apply the function \(f\) to predict the risk of the new policy.

Is it fair to fit a function \(f(X) \approx Y\)? It depends on the underlying causal structure of the world. Suppose women are more likely to drive red cars and, independently of that, people with aggressive personalities \(U\) are also more likely to drive red cars. Further, suppose that the most aggressive people are the only ones who have high car insurance risk. In this world, there is no direct relationship between gender and aggression or high risk. The big problem here is that aggression is unobserved: it is an unmeasured confounding variable. The possibility of unobserved confounding like this is the same phenomenon that makes causal inference so difficult.

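Let’s generate some data in this idealized world:

library(ggplot2)   # plotting
library(ggthemes)  # theme_tufte()
library(reshape2)  # melt()
library(dplyr)     # %>%, group_by(), summarise()
A <- rbinom(1000, 1, .5)  # gender indicator: 1 = Woman, 0 = Other
U <- rnorm(1000)          # aggressiveness, unobserved by the insurer
X <- (A + U) > 1          # red car: caused by both gender and aggressiveness
Y <- U > 1/2              # high risk: caused by aggressiveness only
world <- data.frame(Gender = factor(c("Other", "Woman")[as.numeric(A)+1]),
                    CarColor = factor(c("Silver", "Red")[as.numeric(X)+1]),
                    Aggressiveness = U,
                    HighRisk = Y)
head(world)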
##   Gender CarColor Aggressiveness HighRisk
## 1  Other   Silver     0.07730312    FALSE
## 2  Other   Silver    -0.29686864    FALSE
## 3  Woman   Silver    -1.18324224    FALSE
## 4  Woman      Red     0.01129269    FALSE
## 5  Other   Silver     0.99160104     TRUE
## 6  Woman      Red     1.59396745     TRUE

You can see there’s no relationship between gender and aggressiveness in this data:

ggplot(world, aes(Gender, Aggressiveness)) + geom_boxplot() + theme_tufte()
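The boxplot can be backed by a quick numeric check. The following is a minimal sketch in base R, not part of the original analysis: it regenerates \(A\) and \(U\) the same way as above (the seed is an assumption added here for reproducibility, so the draws differ from the data shown earlier) and compares mean aggressiveness across the two gender groups.

```r
set.seed(1)  # assumed seed, chosen only so the sketch is reproducible

n <- 1000
A <- rbinom(n, 1, 0.5)  # gender indicator: 1 = Woman, 0 = Other
U <- rnorm(n)           # aggressiveness, drawn independently of A

# Mean aggressiveness within each gender group (names are "0" and "1")
means <- tapply(U, A, mean)
gap <- unname(means["1"] - means["0"])

# Welch two-sample t-test of the difference in means
tt <- t.test(U ~ A)

print(means)
print(gap)
print(tt$p.value)
```

Because \(U\) is generated without reference to \(A\), any gap in the group means is sampling noise and should shrink as the sample size grows.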

We can summarize the causal relationships between these variables in a graph, where there is a directed arrow from variable 1 to variable 2 if and only if variable 1 is a cause of variable 2.

  graph TB;
  A[Gender]-->X[Car Color]
  U[Aggressiveness]-->X
  U-->Y[High Risk]

Now we get to the problem. If the insurance company bases its decisions on \(X\) only, and not \(A\), it will end up charging higher premiums to people with red cars, because red-car drivers are more aggressive on average:

ggplot(world, aes(CarColor, Aggressiveness)) + geom_boxplot() + theme_tufte()

Furthermore, this policy would have disparate impact on women since more of them drive red cars. To see this, let’s look at predicted risk scores for logistic regression models using either \(X\) alone (Roberts’ model) or using both \(X\) and \(A\) (full model).

full_model <- glm(HighRisk ~ CarColor + Gender, family = binomial, world)
roberts_model <- glm(HighRisk ~ CarColor, family = binomial, world)
world$full <- predict(full_model, type = "response")
world$roberts <- predict(roberts_model, type = "response")
output <- melt(world, measure.vars = c("full", "roberts"), variable.name = "Model", value.name = "Prediction")
output %>% group_by(Model, Gender, CarColor) %>% 
  summarise(Predicted = mean(Prediction))
## # A tibble: 8 x 4
## # Groups:   Model, Gender [?]
##   Model   Gender CarColor     Predicted
##   <fct>   <fct>  <fct>            <dbl>
## 1 full    Other  Red      1.000        
## 2 full    Other  Silver   0.199        
## 3 full    Woman  Red      0.582        
## 4 full    Woman  Silver   0.00000000223
## 5 roberts Other  Red      0.697        
## 6 roberts Other  Silver   0.126        
## 7 roberts Woman  Red      0.697        
## 8 roberts Woman  Silver   0.126

The Roberts model predicts high risk based on red car only. But there are many more women with red cars than men, independently of aggression.

table(world$Gender, world$CarColor)
##         Red Silver
##   Other  88    432
##   Woman 232    248
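To put a number on the disparate impact, one option is to average the Roberts model's predicted risk within each gender and compare it with the actual high-risk rate. This is a minimal base R sketch under the same data-generating assumptions as above; the seed and the names `by_gender`, `MeanPredicted`, and `ActualRate` are hypothetical choices made for this illustration.

```r
set.seed(1)  # assumed seed, chosen only so the sketch is reproducible

n <- 1000
A <- rbinom(n, 1, 0.5)    # gender indicator: 1 = Woman, 0 = Other
U <- rnorm(n)             # unobserved aggressiveness
X <- (A + U) > 1          # red car
Y <- as.integer(U > 1/2)  # high risk (1) or not (0)

# Roberts model: car color only, gender deliberately left out
roberts_model <- glm(Y ~ X, family = binomial)
pred <- predict(roberts_model, type = "response")

# Average predicted risk and actual high-risk rate within each gender
by_gender <- data.frame(
  Gender        = c("Other", "Woman"),
  MeanPredicted = as.numeric(tapply(pred, A, mean)),
  ActualRate    = as.numeric(tapply(Y, A, mean))
)
print(by_gender)
```

In this world the actual high-risk rates of the two groups differ only by sampling noise, yet the gender-blind model assigns women a higher average predicted risk because more of them drive red cars.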

The full model distinguishes between women with red cars and men with red cars. Since men in this world don’t have a preference for red, they only drive red cars if they are aggressive, so those men receive very high risk predictions. Women who drive red cars receive a medium risk score, reflecting uncertainty about which of the two possible causes, gender or aggression, explains their red car.
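The full model's fitted values can be compared against the exact conditional probabilities implied by the simulation. The sketch below assumes the structural equations used to generate the data, namely that a car is red when \(A + U > 1\), that risk is high when \(U > 1/2\), and that \(U\) is standard normal. Each combination of gender and car color then pins \(U\) to an interval, and the probability of high risk is the share of that interval lying above \(1/2\). The helper `p_high_given` is a hypothetical name introduced for this illustration.

```r
# P(U > 1/2 | lower < U <= upper) for a standard normal U
p_high_given <- function(lower, upper) {
  lo <- max(lower, 0.5)                    # part of the interval above 1/2
  num <- max(pnorm(upper) - pnorm(lo), 0)  # zero if the interval lies below 1/2
  num / (pnorm(upper) - pnorm(lower))
}

exact <- c(
  Other_Red    = p_high_given(1, Inf),   # A = 0: red car means U > 1
  Other_Silver = p_high_given(-Inf, 1),  # A = 0: silver car means U <= 1
  Woman_Red    = p_high_given(0, Inf),   # A = 1: red car means U > 0
  Woman_Silver = p_high_given(-Inf, 0)   # A = 1: silver car means U <= 0
)
print(round(exact, 3))
```

Under these assumptions the exact values are 1 for Other with a red car, about 0.178 for Other with a silver car, about 0.617 for a woman with a red car, and 0 for a woman with a silver car. The full model's estimates in the table above approximate these up to sampling error.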

To summarize: we started with an ideal world where there is no actual unfairness on the basis of gender, but by ignoring gender in our predictions we can actually create unfairness. Enforcing Roberts’ definition of fairness leads to unfair predictions because the underlying causal structure of the world was not properly integrated into the predictive model.


Neither Roberts’ notion of fairness nor the simplistic, possibly unrealistic nature of the red car example is the root problem here. Any definition of fairness that does not incorporate the causal structure of the world may break down and possibly increase the amount of unfairness rather than decrease it.

Unobserved confounders are a fundamental limitation on both causal inference and fairness

In causal inference, unobserved confounders can invalidate whatever conclusions we draw from the data we do observe. Similarly, in fairness, missing variables that carry important information related to the protected attributes can cause any definition of fairness to break down. In the red car example it was the insurance company’s inability to directly measure aggressiveness that made the problem difficult. These sorts of problems are ubiquitous in machine learning and data science: proxies are used in place of the outcome variables we’re really interested in, the data is often just whatever happened to be available rather than a carefully collected set of variables directed by a specific scientific question, and so on.

Feedback is a fundamental limitation on both causal inference and fairness

In the red car example there was no direct relationship between \(A\) and \(Y\), but in many fairness examples there is such a causal link (probably with other unmeasured variables in between them). In policing, for example, individual arrest records or geographic reported crime rates may be used as outcomes or predictors for models trying to predict future criminal activity. But if more police are sent to patrol neighborhoods where there were higher arrest rates, the increase in police can cause the number of arrests to go even higher. The resulting feedback loops can lead to overpolicing some areas and underpolicing others. It’s hard to determine whether arrest rates in one area are high because of actual high crime rates or because of more policing. Similarly, this kind of feedback can also make causal inference difficult or impossible.
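A toy simulation can make the feedback loop concrete. The sketch below is purely illustrative and rests on assumptions not found in the text: two hypothetical areas, `north` and `south`, with identical true crime rates, expected arrests proportional to patrol presence times the crime rate, and a made-up reallocation rule that moves a fixed share of patrols toward whichever area recorded more arrests in the previous round.

```r
true_crime_rate <- c(north = 0.1, south = 0.1)  # identical by construction
share <- c(north = 0.55, south = 0.45)          # slightly uneven starting patrols
total_patrols <- 100
shift <- 0.05                                   # patrol share moved each round

n_steps <- 10
history <- matrix(NA_real_, nrow = n_steps, ncol = 2,
                  dimnames = list(NULL, names(share)))

for (step in seq_len(n_steps)) {
  # Expected arrests scale with patrol presence times the true crime rate
  arrests <- total_patrols * share * true_crime_rate
  history[step, ] <- arrests

  # Hypothetical rule: the area with more arrests gains patrols from the other
  leader <- names(which.max(arrests))
  other <- setdiff(names(share), leader)
  moved <- min(shift, share[[other]])
  share[[leader]] <- share[[leader]] + moved
  share[[other]] <- share[[other]] - moved
}

print(history)  # arrests per round in each area
```

Although both areas have the same underlying crime rate, the small initial difference in patrols grows until nearly all arrests are recorded in one area, so the arrest counts alone cannot distinguish more crime from more policing.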

Takeaways for all, even skeptics of causal definitions of fairness

Similar strategies seem necessary for overcoming the fundamental challenges of both causal inference and fairness. It’s important to involve domain experts and stakeholders, and to think about how the data was generated or gathered, how the predictors relate to one another, and whether unobserved variables could be confounding the results. We can also gain insight by thinking about how we would conduct a randomized, controlled study, to the extent that it’s feasible or even imaginable. We must at least try to understand the variety of sources of error or unfairness, how our modeling assumptions might be wrong, and which of these are the most dangerous to our conclusions.

If you are interested in further reading, the red car example appears in our 2017 NIPS oral paper on counterfactual fairness, and there are several other papers and links on causal inference (and other topics!) on Moritz Hardt’s course page on fairness in machine learning.