26th November 2025 - Vincenzo Gioia
Instrumental Variable Regression
Use of instrumental variables in the context of the potential outcome model
Focus on the simplest case:
IV Regression Assumptions: Exclusion Restriction
IV Regression Assumptions: Causal effect and monotonicity
IV regression: Results under the assumptions
\[\begin{array}{rcl} y_i(z_i = 1, d_i(1)) - y_i(z_i = 0, d_i(0)) &=& y_i(d_i(1)) - y_i(d_i(0)) \\ &=& \left[y_i(1) d_i(1) + y_i(0) (1 - d_i(1))\right] \\ &-& \left[y_i(1) d_i(0) + y_i(0) (1 - d_i(0))\right] \\ &=& (y_i(1) - y_i(0)) (d_i(1) - d_i(0)) \end{array} \]
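This identity can be checked numerically. A small sketch (all values illustrative) enumerating the possible combinations of potential treatments and potential outcomes:

```r
# Check that y_i(z=1, d(1)) - y_i(z=0, d(0)) = (y1 - y0) * (d1 - d0)
# for every combination of potential treatments d1, d0 in {0, 1}
# and arbitrary potential outcomes y1, y0 (illustrative values).
check <- expand.grid(d1 = 0:1, d0 = 0:1, y1 = c(2, 5), y0 = c(1, 3))
obs_diff <- with(check,
                 (y1 * d1 + y0 * (1 - d1)) - (y1 * d0 + y0 * (1 - d0)))
stopifnot(all(obs_diff == with(check, (y1 - y0) * (d1 - d0))))
```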
IV regression: Results under the assumptions
Consider now the relation between \(z\) and \(d\) at the individual level. There are four observable categories of individuals (indicated in a gray square) that can be considered
Counterfactuals are represented by two circles inside the square and we can distinguish:
compliers: \(d_i(1) = 1\) and \(d_i(0) = 0\)
always takers: \(d_i(1) = d_i(0) = 1\)
never takers: \(d_i(1) = d_i(0) = 0\)
deniers: \(d_i(1) = 0\) and \(d_i(0) = 1\)
IV regression: Results under the assumptions
The monotonicity assumption rules out the existence of deniers
monotonicity: \[d_i(1) \geq d_i(0) \quad \forall i\]
deniers: \(d_i(1) = 0\) and \(d_i(0) = 1\)
IV regression: Results under the assumptions
For the always takers and the never takers, the causal effect of \(z\) on \(y\) is zero, because the causal effect of \(z\) on \(d\) is zero
always takers: \(d_i(1) = d_i(0) = 1\)
never takers: \(d_i(1) = d_i(0) = 0\)
\[y_i(z_i = 1, d_i(1)) - y_i(z_i = 0, d_i(0)) = (y_i(1) - y_i(0)) (d_i(1) - d_i(0)) = 0\]
IV regression: Results under the assumptions
Therefore, the causal effect of \(z\) on \(y\) reduces to the treatment effect for the compliers
compliers: \(d_i(1) = 1\) and \(d_i(0) = 0\)
\[y_i(z_i = 1, d_i(1)) - y_i(z_i = 0, d_i(0)) = (y_i(1) - y_i(0)) (d_i(1) - d_i(0)) = y_i(1) - y_i(0)\]
IV regression: Results under the assumptions
\[ \textbf{LATE}:=\frac{\mbox{E}\left(y(1, d(1)) - y(0, d(0))\right)} {\mbox{E}\left(d(1) - d(0)\right)} \]
The numerator is also called the reduced form equation, as it measures the effect of the instrument on the outcome
The denominator is called the intention to treat equation (often also referred to as the first stage); it measures the effect of the instrument on the treatment dummy
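The sample analogue of this ratio is the Wald estimator. A minimal simulation sketch (all names hypothetical, not from the document's data) illustrating that it recovers the compliers' treatment effect:

```r
set.seed(1)
n <- 10000
z <- rbinom(n, 1, 0.5)                        # binary instrument
type <- sample(c("complier", "always taker", "never taker"), n,
               replace = TRUE, prob = c(0.5, 0.25, 0.25))
d <- ifelse(type == "always taker", 1,
            ifelse(type == "never taker", 0, z))  # compliers follow z
y <- 1 + 2 * d + rnorm(n)                     # true treatment effect = 2
reduced_form <- mean(y[z == 1]) - mean(y[z == 0])  # effect of z on y
first_stage  <- mean(d[z == 1]) - mean(d[z == 0])  # effect of z on d
late <- reduced_form / first_stage            # Wald estimator, close to 2
```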
Econometric example
The example uses the paces data set, extracted from Angrist (1992)
Econometric example
Effect of the instrument on the treatment (regression of privsch on voucher):

             Estimate Std. Error  t value      Pr(>|t|)
(Intercept) 0.2549277 0.01354494 18.82089  1.946945e-71
voucher     0.6421311 0.01882989 34.10169 2.386245e-191

Effect of the instrument on the outcome (regression of educyrs on voucher):

            Estimate Std. Error    t value   Pr(>|t|)
(Intercept) 7.344284 0.03979311 184.561685 0.00000000
voucher     0.107922 0.05531955   1.950884 0.05124794
Econometric example
The IV estimator is the ratio of these two effects, which is 0.168 (to be compared to 0.29 from the OLS regression)
Therefore, in this example, the IV estimate of the treatment effect is much smaller than the OLS estimate, which may be a symptom that unobserved determinants of the outcome are positively correlated with enrollment in private school
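As a quick check, the ratio can be computed from the voucher coefficients of the two tables above:

```r
# IV (Wald) estimator: reduced-form coefficient of voucher
# divided by the first-stage coefficient of voucher
iv_wald <- 0.107922 / 0.6421311
round(iv_wald, 3)  # 0.168
```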
Econometric example
Econometric example
library(ivreg)  # ivreg() is assumed to come from the ivreg package (also in AER)
ols <- lm(educyrs ~ privsch + pilot + housvisit + smpl +
              phone + age + sex + strata + month, data = paces)
iv <- ivreg(educyrs ~ privsch + pilot + housvisit + smpl +
                phone + age + sex + strata + month |
                voucher + pilot + housvisit + smpl +
                phone + age + sex + strata + month, data = paces)
rbind(summary(ols)$coefficients[2, ], summary(iv)$coefficients[2, ])
      Estimate Std. Error  t value     Pr(>|t|)
[1,] 0.1408088 0.04239104 3.321663 0.0009156163
[2,] 0.1342252 0.06510330 2.061726 0.0393999944
A note on Regression discontinuity
Eligibility to a program is sometimes based on the value of an observable variable (called the forcing variable)
More precisely, on whether the value of this variable for an individual is below or above a given threshold
Individuals just below and just over the threshold therefore constitute two groups of individuals who are very similar, except that the first group receives the treatment and the second group doesn’t. This is called a regression discontinuity (RD) design
Two variants of regression discontinuity designs can be considered: the sharp design, in which the treatment is a deterministic function of the forcing variable, and the fuzzy design, in which crossing the threshold only changes the probability of receiving the treatment
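A minimal sketch of a sharp design on simulated data (all names hypothetical): the treatment is a deterministic function of the forcing variable, and the effect is the jump in the outcome at the threshold:

```r
set.seed(2)
forcing <- runif(500, -1, 1)        # forcing variable, threshold at 0
treat <- as.numeric(forcing >= 0)   # sharp design: deterministic rule
y <- 1 + 0.5 * forcing + 2 * treat + rnorm(500, sd = 0.3)  # true jump = 2
rd <- lm(y ~ treat + forcing)       # linear fit with a common slope
coef(rd)["treat"]                   # estimated jump, close to 2
```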
Difference-in-Differences
\[ \frac{\sum_{i = 1}^{n_T} (y_{i2} - y_{i1})}{n_T} - \frac{\sum_{i = 1}^{n_C} (y_{i2} - y_{i1})}{n_C} \]
Difference-in-Differences: Theoretical Example
Suppose that the average annual wage in the treatment group increased by $1000 in 2022 compared to 2021. This would be a relevant estimate of the effect of the program if nothing had changed on the labor market in 2022 compared to 2021
Suppose also that the average annual wage in the control group increased by $600 (if the economic situation improved, average wages increase even for those who haven’t been treated)
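The difference-in-differences estimate then nets out the common trend:

```r
treated_change <- 1000  # wage change in the treatment group (2022 vs 2021)
control_change <- 600   # wage change in the control group (common trend)
did <- treated_change - control_change
did  # 400: the part of the increase attributable to the program
```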
Difference-in-Differences: Practical Example
Econometric example
\[ y_{it} = \beta_1 + \beta_2 x_{i1} + \beta_3 x_{it2} + \beta_4 x_{i1} x_{it2} + \epsilon_{it} \] with \(t = 1, 2\) (the periods before and after the attack), where \(x_{i1}\) is the treatment-group dummy and \(x_{it2}\) the after-period dummy, so that \(\beta_4\) is the difference-in-differences estimate.
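On a toy data set (hypothetical numbers), the interaction coefficient of this regression reproduces the manual difference-in-differences:

```r
# Four cell means: control/treated before and after the intervention
toy <- data.frame(
  y      = c(10, 12, 20, 25),  # C before, C after, T before, T after
  group  = c(0, 0, 1, 1),      # treatment-group dummy
  period = c(0, 1, 0, 1)       # after-period dummy
)
fit <- lm(y ~ group * period, data = toy)
unname(coef(fit)["group:period"])  # (25 - 20) - (12 - 10) = 3
```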
Econometric example
The car_thefts data set contains repeated observations of car thefts (thefts) for 876 blocks.
Each block is observed 10 times on a monthly basis, and the month of the attack (July) is split into two half-month observations
We first compute the total number of thefts (thefts) before and after the attack for all the blocks
As the two periods are of unequal length (3.5 and 4.5 months), we divide the number of thefts in each period by the corresponding number of days and multiply by 30.5 to obtain a monthly value: the number of monthly car thefts per block is about 0.09
car_thefts <- as.data.frame(micsr.data::car_thefts)
sum_thefts <- aggregate(thefts ~ block + period, data = car_thefts, sum)
sum_days <- aggregate(days ~ block + period, data = car_thefts, sum)
two_obs <- merge(sum_thefts, sum_days, by = c("block", "period"))
two_obs$thefts <- two_obs$thefts / two_obs$days * 30.5
mean(two_obs$thefts)
[1] 0.09327905
Econometric example
Econometric example
We transform distance to a dummy for the same block:
two_obs$distance <- ifelse(two_obs$distance == "same", 1, 0)
mod <- lm(thefts ~ period * distance, data = two_obs)
summary(mod)$coefficients
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.089150917 0.004558794 19.5558136 3.575037e-77
periodafter 0.010551986 0.006447108 1.6367008 1.018730e-01
distance 0.008982932 0.022182021 0.4049645 6.855531e-01
periodafter:distance -0.072318585 0.031370115 -2.3053337 2.126450e-02
Econometric example
before <- two_obs[two_obs$period == "before",
c("block", "distance", "thefts")]
after <- two_obs[two_obs$period == "after",
c("block", "distance", "thefts")]
names(before)[names(before) == "thefts"] <- "before"
names(after)[names(after) == "thefts"] <- "after"
diffs <- merge(before, after, by = c("block", "distance"))
diffs$dt <- diffs$after - diffs$before
Econometric example
[1] -0.07231858
Two Sample t-test
data: dt by factor(distance)
t = 2.7388, df = 874, p-value = 0.006291
alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
95 percent confidence interval:
0.02049419 0.12414298
sample estimates:
mean in group 0 mean in group 1
0.01055199 -0.06176660
Econometric example
The difference-in-differences can be extended to the case where the observation units before and after the implementation of the treatment are not the same
Let’s consider the case of Hong (2013), who studied the impact of the introduction of Napster on music expenditure
The study is based on the Consumer Expenditure Survey, which is performed on a quarterly basis, and the data set is called napster
Napster was introduced in June 1999 and became the dominant file-sharing service
Households with (without) internet access constitute the treatment (control) group
Econometric example
From the date series, we construct a period variable using June 1999 as the cutoff.
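A minimal sketch of such a cutoff (the date variable and its values are hypothetical, not taken from the napster data set):

```r
# Hypothetical sketch: classify observations relative to June 1999
dates <- as.Date(c("1999-03-15", "1999-08-01"))
period <- ifelse(dates < as.Date("1999-06-01"), "before", "after")
period  # c("before", "after")
```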
We then proceed to the estimation, with expmusic, the expenditure on recorded music, as the response
internetyes has a strong positive effect on expenditure. The interaction term between period and internet indicates that the deployment of Napster led to a significant reduction of about $4.6 in the expenditure of the “treated” individuals (those with internet access).
Estimate Std. Error t value Pr(>|t|)
(Intercept) 10.560788 0.1749405 60.367880 0.000000e+00
periodafter -1.749118 0.2532558 -6.906529 4.993947e-12
internetyes 14.781168 0.4318063 34.231013 1.992747e-255
periodafter:internetyes -4.588916 0.5517472 -8.317063 9.120641e-17
Matching and Propensity Score Matching
Matching is used to make treated and control units comparable in observational data.
When covariates are many or continuous, exact matching becomes impossible
This is why we use propensity score matching: units are matched on the propensity score, the estimated probability of receiving the treatment given the covariates
However, estimating the propensity score requires understanding logistic (or probit) regression first, which we will introduce on Friday
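As a preview, a minimal propensity score matching sketch on simulated data (all names hypothetical; glm() fits the logistic regression mentioned above):

```r
set.seed(3)
n <- 200
x <- rnorm(n)                                    # observed covariate
d <- rbinom(n, 1, plogis(x))                     # treatment more likely for high x
pscore <- fitted(glm(d ~ x, family = binomial))  # estimated propensity score
treated <- which(d == 1)
control <- which(d == 0)
# match each treated unit to the control with the closest propensity score
match <- sapply(treated, function(i) {
  control[which.min(abs(pscore[control] - pscore[i]))]
})
```

After matching, the outcomes of treated units are compared with those of their matched controls.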
Synthetic Control Methods
References