Estimating counterfactuals

Conceptually, the missing counterfactual outcome is one that would have occurred under a different set of circumstances. In causal inference, we wish we could observe the conterfactual outcome that would have occurred in an alternate universe where the exposure status for a given observation was flipped. To do this, we attempt to control for all factors that are related to an exposure and outcome such that we can construct (or estimate) such a counterfactual outcome. Let’s suppose some happiness index, from 1-10 exists. We are interested in assessing whether eating chocolate ice cream versus vanilla will increase happiness. We have 10 individuals with two potential outcomes for each, one is what their happiness would be if they ate chocolate ice cream, (defined as y_chocolate in the code below), and one is what their happiness would be if they ate vanilla ice cream (defined as y_vanilla in the code below). We can define the true causal effect of eating chocolate ice cream (versus vanilla) on happiness for each individual as the difference between the two:

data <- data.frame(
  id = 1:10,
  y_chocolate = c(4, 4, 6, 5, 6, 5, 6, 7, 5, 6),
  y_vanilla = c(1, 3, 4, 5, 5, 6, 8, 6, 3, 5)
)

data <- data |>
  mutate(causal_effect = y_chocolate - y_vanilla)

data

##    id y_chocolate y_vanilla causal_effect
## 1   1           4         1             3
## 2   2           4         3             1
## 3   3           6         4             2
## 4   4           5         5             0
## 5   5           6         5             1
## 6   6           5         6            -1
## 7   7           6         8            -2
## 8   8           7         6             1
## 9   9           5         3             2
## 10 10           6         5             1

data |>
  summarize(
    avg_chocolate = mean(y_chocolate),
    avg_vanilla = mean(y_vanilla),
    avg_causal_effect = mean(causal_effect)
  )

##   avg_chocolate avg_vanilla avg_causal_effect
## 1           5.4         4.6               0.8

The causal effect of eating chocolate ice cream (versus vanilla) for individual 4 is 0, whereas the causal effect for individual 9 is 2. The average potential happiness after eating chocolate is 5.4 and the average potential happiness after eating vanilla is 4.6. The average treatment effect of eating chocolate (versus vanilla) ice cream among the ten individuals in this study is 0.8.

In reality, we cannot observe both potential outcomes, in any moment in time, each individual in our study can only eat one flavor of ice cream. Suppose we let our participants choose which ice cream they wanted to eat and each choose their favorite (i.e. they knew which would make them “happier” and picked that one). Now what we observe is:

data_observed <- data |>
  mutate(
    exposure = case_when(
      # people who like chocolate more chose that
      y_chocolate > y_vanilla ~ "chocolate",
      # people who like vanilla more chose that
      y_vanilla >= y_chocolate ~ "vanilla"
    ),
    observed_outcome = case_when(
      exposure == "chocolate" ~ y_chocolate,
      exposure == "vanilla" ~ y_vanilla
    )
  ) |>
  # we can only observe the exposure and one potential outcome
  select(id, exposure, observed_outcome)
data_observed

##    id  exposure observed_outcome
## 1   1 chocolate                4
## 2   2 chocolate                4
## 3   3 chocolate                6
## 4   4   vanilla                5
## 5   5 chocolate                6
## 6   6   vanilla                6
## 7   7   vanilla                8
## 8   8 chocolate                7
## 9   9 chocolate                5
## 10 10 chocolate                6

data_observed |>
  group_by(exposure) |>
  summarise(avg_outcome = mean(observed_outcome))

## # A tibble: 2 × 2
##   exposure  avg_outcome
##   <chr>           <dbl>
## 1 chocolate        5.43
## 2 vanilla          6.33

Now, the observed average outcome among those who ate chocolate ice cream is 5.4 (the same as the true average potential outcome), while the observed average outcome among those who ate vanilla is 6.3 – quite different from the actual average (4.6). The estimated causal effect here could be calculated as 5.4 - 6.3 = -0.9.

It turns out here, these 10 participants chose which ice cream they wanted to eat and they always chose to eat their favorite! This artificially made it look like eating vanilla ice cream would increase the happiness in this population when in fact we know the opposite is true. The next section will discuss which assumptions need to be true in order to allow us to accurately estimate causal effects using observed data. As a sneak peak, our issue here was that how the exposure was decided, if instead we randomized who ate chocolate versus vanilla ice cream we would (on average, with a large enough sample) recover the true causal effect.

The magic of randomization

Since now we are doing something random so let’s set a seed so we always observe the same result each time we run the code:

set.seed(11)
data_observed <- data |>
  mutate(
    # change the exposure to randomized, generate from a binomial distribution
    # with a probability 0.5 for being in either group
    exposure = case_when(
      rbinom(10, 1, 0.5) == 1 ~ "chocolate",
      TRUE ~ "vanilla"
    ),
    observed_outcome = case_when(
      exposure == "chocolate" ~ y_chocolate,
      exposure == "vanilla" ~ y_vanilla
    )
  ) |>
  # as before, we can only observe the exposure and one potential outcome
  select(id, exposure, observed_outcome)
data_observed |>
  group_by(exposure) |>
  summarise(avg_outcome = mean(observed_outcome))

## # A tibble: 2 × 2
##   exposure  avg_outcome
##   <chr>           <dbl>
## 1 chocolate        5.33
## 2 vanilla          4.71

As you can see, now we have correctly estimated the causal effect in this population… how this happened? More on this in block 2 (Study Design) and block 3 !

Causal Inference Introduction

Estimating counterfactuals

The magic of randomization