10/12/2025 - Vincenzo Gioia
Introduction
Cases where the response is continuous but observed only over a certain range
These variables are truncated at a certain value, which can lie on the left side of the distribution (\(l\)), on the right side (\(u\)), or on both sides
The distribution of such a variable is a mixture of a discrete and a continuous distribution:
Truncated responses in economics:
The consumer problem
Suppose there are two goods: \(y\) = vacations and \(z\) = food
Utility function describing the consumer’s preferences (\(q_y\) and \(q_z\) being the quantities of the two goods):
\[ U(q_y,q_z) = (q_y + \mu)^\beta q_z^{1-\beta}, \quad 0<\beta<1, \; \mu>0 \]
The consumer seeks to maximize their utility subject to the budget constraint (\(x\) is the income and \(p_y\) and \(p_z\) are the prices): \(x = p_y q_y + p_z q_z\)
Condition for an interior solution:
\[ \frac{\beta}{1-\beta}\frac{q_z}{q_y+\mu} = \frac{p_y}{p_z} \]
\[ \begin{cases} q_y = \beta \dfrac{x}{p_y} - (1-\beta)\mu \\ q_z = (1-\beta)\dfrac{x}{p_z} + (1-\beta)\dfrac{p_y}{p_z}\mu \end{cases} \]
Income Threshold
Demand for \(y\) can become negative → not economically admissible
Minimum income level for positive consumption of \(y\):
\[ \bar x = \frac{1-\beta}{\beta} p_y \mu \]
\[ q_y = 0, \quad q_z = \frac{x}{p_z} \quad \Rightarrow \textbf{corner solution} \]
\[ y = p_y q_y = \beta x - \beta \bar{x} = \beta_1 + \beta_2 x \]
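As a numerical sanity check, the demand system and the income threshold above can be verified against the budget constraint and the interior-solution condition. This is a minimal sketch; the parameter values are purely illustrative assumptions, not taken from the text.

```python
# Illustrative parameter values (assumptions, not from the text)
beta, mu = 0.4, 2.0
p_y, p_z = 10.0, 5.0

def demands(x):
    """Interior-solution demands derived above."""
    q_y = beta * x / p_y - (1 - beta) * mu
    q_z = (1 - beta) * x / p_z + (1 - beta) * p_y / p_z * mu
    return q_y, q_z

x = 100.0
q_y, q_z = demands(x)

# The interior solution exhausts the budget: x = p_y q_y + p_z q_z
assert abs(p_y * q_y + p_z * q_z - x) < 1e-9

# First-order condition: beta/(1-beta) * q_z/(q_y + mu) = p_y/p_z
assert abs(beta / (1 - beta) * q_z / (q_y + mu) - p_y / p_z) < 1e-9

# Income threshold: below x_bar the demand for y turns negative
x_bar = (1 - beta) / beta * p_y * mu   # here x_bar = 30
assert abs(demands(x_bar)[0]) < 1e-9   # q_y = 0 exactly at the threshold
assert demands(x_bar - 1.0)[0] < 0     # negative demand just below it
```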
Solution
Key Concepts
Sample selection occurs when the observation of the dependent variable is not random due to a self-selection process
Example:
Key feature:
Relation to truncation and corner solutions:
Tobit Models
Cases where the dependent variable is a truncated variable, but the sample can be either censored or truncated
Differences illustrated with the vacation–food expenditure example
Censored regression model - Tobit model (Tobin, 1958):
Truncated regression model:
Definition
Tobit-1 (or tobit, for short) is a linear model: \(y_i = x^\top_i \beta + \epsilon_i\)
In general, the tobit name is restricted to models estimated in a censored sample
In the context of the linear regression model, violating the assumptions of homoskedasticity and normality is not too severe, as the OLS estimator remains consistent. This is not the case for the model studied here: wrong assumptions of homoskedasticity and normality lead to biased and inconsistent estimators
Truncated normal
Truncated normal
\[E(Y | Y > l) = \mu + \sigma \lambda_{\tilde{l}}\]
\[V(Y | Y > l)= \sigma ^ 2\left[1 - \lambda_{\tilde{l}}(\lambda_{\tilde{l}}-\tilde{l})\right] \] where \(\tilde{l} = (l - \mu)/\sigma\) is the standardized truncation point and \(\lambda_{\tilde{l}} = \frac{\phi(\tilde{l})}{1 - \Phi(\tilde{l})}\) is the inverse Mills ratio
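These two moment formulas can be checked numerically against a truncated normal distribution; the sketch below uses `scipy.stats.truncnorm`, with illustrative values of \(\mu\), \(\sigma\) and \(l\) that are assumptions, not taken from the text.

```python
import numpy as np
from scipy.stats import norm, truncnorm

# Illustrative values (assumptions): mean mu, sd sigma, left truncation at l
mu, sigma, l = 1.0, 2.0, 0.0
lt = (l - mu) / sigma                       # standardized truncation point
lam = norm.pdf(lt) / (1 - norm.cdf(lt))     # inverse Mills ratio

# Moment formulas for the left-truncated normal
m = mu + sigma * lam
v = sigma**2 * (1 - lam * (lam - lt))

# Check against scipy's truncated normal (no upper truncation: b = +inf)
d = truncnorm(a=lt, b=np.inf, loc=mu, scale=sigma)
assert abs(m - d.mean()) < 1e-8
assert abs(v - d.var()) < 1e-8
```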
Consequences
The expectation and the variance of \(y\) left-truncated at 0 can then be written as: \[ \begin{array}{rcl} \mbox{E}(Y_i \mid x_i, Y_i > 0) &=& \mu_i + \sigma r(\mu_i / \sigma)\\ \mbox{V}(Y_i \mid x_i, Y_i > 0) &=& \sigma ^ 2 \left[1 + r'(\mu_i / \sigma)\right]\\ \end{array} \] where \(r(x)=\phi(x)/\Phi(x)\)
Truncation has two consequences for the linear regression model:
Consequences
Censored sample
\[\begin{cases}\begin{array}{rclccc} y&=&0 &\mbox{ if }& y^* < 0\\ y&=&y^* &\mbox{ if }& y^* \geq 0\\ \end{array}\end{cases}\]
\[ \begin{array}{rcl} \mbox{E}(Y_i\mid x_i) &=& \left[1 - \Phi\left(\frac{\mu_i}{\sigma}\right)\right] \times 0 + \Phi\left(\frac{\mu_i}{\sigma}\right) \times \mbox{E}(Y_i\mid x_i, Y_i > 0) \\ &=& \mu_i \Phi\left(\frac{\mu_i}{\sigma}\right) + \sigma \phi\left(\frac{\mu_i}{\sigma}\right) \end{array} \]
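The closed-form expectation of the censored variable can be verified by Monte Carlo: draw the latent variable, censor it at zero, and compare the sample mean with the formula. The values of \(\mu\) and \(\sigma\) below are illustrative assumptions.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
mu, sigma = 0.5, 1.0   # illustrative values (assumptions)

# Formula: E(Y) = mu * Phi(mu/sigma) + sigma * phi(mu/sigma)
e_formula = mu * norm.cdf(mu / sigma) + sigma * norm.pdf(mu / sigma)

# Monte Carlo check: censor the latent variable at zero
y_star = mu + sigma * rng.standard_normal(1_000_000)
y = np.maximum(y_star, 0)
assert abs(y.mean() - e_formula) < 0.01
```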
Censored sample
As for the previous case, the conditional expected value of \(Y\) is not \(\mu_i\), which implies that the OLS estimator is biased and inconsistent
The downward bias of the slope seems more severe than for the truncated sample because there are many more observations at very low values of \(x\), i.e., in the range of values of \(x\) where the correlation between \(x\) and \(\epsilon\) is severe
Corner Solution Models
Let’s consider only the case of corner solution and not the case of data censoring (like top-coding)
In both cases, the regression function \(\mu_i = x^\top_i \beta\) returns the mean of the untruncated distribution of \(y\)
In the data-censoring case, which is just a problem of missing values of the response, this is the relevant distribution to consider, and therefore \(\beta_k\) is the marginal effect of covariate \(x_k\) that we have to consider
On the contrary, for corner solution models, the relevant distributions to consider are, on the one hand, the probability that \(y > 0\) and, on the other hand, the zero left-truncated distribution of \(y\)
Corner Solution Models
Therefore, \(\mu_i\) is the mean of an untruncated latent variable, \(\beta_k\) is the marginal effect of \(x_k\) on this latent variable, and neither of these values is particularly meaningful.
For a corner solution model, the effect of a change in \(x_k\) is actually twofold:
The probability that \(y\) is positive and the conditional expectation for positive values of \(y\) are, denoting as usual \(\mu_i = x^\top_i \beta\):
\[ \begin{array}{rcl} \mbox{P}(Y_i > 0\mid x_i) &=& \Phi\left(\frac{\mu_i}{\sigma}\right)\\ \mbox{E}(Y_i\mid x_i, Y_i > 0) &=& \mu_i + \sigma r\left(\frac{\mu_i}{\sigma}\right) \end{array} \]
and the unconditional expectation of \(y\) is just the product of these two expressions:
\[ \mbox{E}(Y_i\mid x_i) = \mbox{P}(Y_i > 0\mid x_i) \times \mbox{E}(Y_i\mid x_i, Y_i > 0) \]
Corner Solution Models
\[ \begin{cases} \begin{array}{lcl} \frac{\displaystyle\partial\mbox{P}(Y_i > 0\mid x_i)}{\displaystyle\partial x_{ik}} &=& \frac{\beta_k}{\sigma} \phi\left(\frac{\mu_i}{\sigma}\right) \\ \frac{\displaystyle\partial \mbox{E}(Y_i\mid x_i, Y_i>0)}{\displaystyle\partial x_{ik}} &=& \beta_k \left[1 + r'\left(\frac{\mu_i}{\sigma}\right) \right] \end{array} \end{cases} \]
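These two conditional effects combine, via the product formula for the unconditional expectation, into the well-known simplification \(\partial \mbox{E}(Y_i \mid x_i)/\partial x_{ik} = \beta_k \Phi(\mu_i/\sigma)\) (the McDonald–Moffitt decomposition). A minimal numerical check, with illustrative parameter values that are assumptions:

```python
from scipy.stats import norm

beta_k, mu, sigma = 0.8, 1.2, 2.0   # illustrative values (assumptions)

def e_y(m):
    """Unconditional expectation: P(Y > 0) * E(Y | Y > 0)."""
    r = norm.pdf(m / sigma) / norm.cdf(m / sigma)   # r(x) = phi(x)/Phi(x)
    return norm.cdf(m / sigma) * (m + sigma * r)

# Numerical derivative with respect to x_k (chain rule: d mu / d x_k = beta_k)
h = 1e-6
num = (e_y(mu + beta_k * h) - e_y(mu - beta_k * h)) / (2 * h)

# Simplification: dE(Y)/dx_k = beta_k * Phi(mu/sigma)
assert abs(num - beta_k * norm.cdf(mu / sigma)) < 1e-5
```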
Consistent estimators
Several consistent estimators are available for the truncated and the censored model
Inefficient estimators:
The maximum likelihood estimator is asymptotically efficient if the conditional distribution of \(y\) is normal and homoskedastic
The symmetrically trimmed least squares estimator, which is consistent even if the distribution of \(y\) is non-normal or heteroskedastic
Non-linear least squares
The conditional expected value of \(y\): \(\mbox{E}(Y_i\mid x_i) = x^\top_i \beta + \sigma r\left(\frac{x^\top_i \beta }{\sigma}\right)\) is non-linear in \(x\)
The parameters can be consistently estimated using non-linear least squares, by minimizing:
\[ \sum_{i=1} ^ n \left[y_i - x^\top_i \beta - \sigma r\left(\frac{x^\top_i \beta}{\sigma}\right)\right] ^ 2 \]
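A minimal sketch of this NLS estimation on a simulated truncated sample (the data-generating values and the use of `scipy.optimize.least_squares` are assumptions for illustration):

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import least_squares

rng = np.random.default_rng(42)

# Simulated truncated sample (assumed true values: beta = (1, 0.5), sigma = 1)
n = 5000
X = np.column_stack([np.ones(n), rng.uniform(-2, 2, n)])
y_star = X @ np.array([1.0, 0.5]) + rng.standard_normal(n)
keep = y_star > 0                  # truncated sample: only positive y observed
Xt, yt = X[keep], y_star[keep]

def r(z):
    """r(x) = phi(x) / Phi(x)"""
    return norm.pdf(z) / norm.cdf(z)

def residuals(theta):
    beta, sigma = theta[:2], abs(theta[2])
    mu = Xt @ beta
    return yt - mu - sigma * r(mu / sigma)

fit = least_squares(residuals, x0=np.array([0.0, 0.0, 1.0]))
print(fit.x)   # should be close to the true values (1, 0.5, 1)
```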
Probit and two-step estimators
The probability that \(y\) is positive is \(\Phi\left(\frac{x^\top_i \beta}{\sigma}\right)\), therefore, a probit model can be used to estimate the vector of coefficients \(\frac{\beta}{\sigma}\)
\(\sigma\) is not identified, and each element of \(\beta\) is only estimated up to a \(1/\sigma\) factor
The probit estimation can only be performed for a censored sample, and not a truncated sample for which all the values of \(y\) are positive
This idea leads to the two-step estimator:
Maximum-likelihood estimation
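For a censored sample, the tobit log-likelihood mixes a probability mass for the censored observations and a density for the positive ones: \(\ln L = \sum_{y_i = 0} \ln\left[1 - \Phi(\mu_i/\sigma)\right] + \sum_{y_i > 0} \left[\ln \phi\left(\frac{y_i - \mu_i}{\sigma}\right) - \ln \sigma\right]\). A minimal numerical sketch (the data-generating values are assumptions, not taken from the text):

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize

rng = np.random.default_rng(2)

# Simulated censored sample (assumed true values: beta = (1, 0.5), sigma = 1)
n = 5000
X = np.column_stack([np.ones(n), rng.uniform(-2, 2, n)])
y = np.maximum(X @ np.array([1.0, 0.5]) + rng.standard_normal(n), 0)
pos = y > 0

def tobit_nll(theta):
    beta, sigma = theta[:-1], np.exp(theta[-1])   # log-sigma parametrization
    mu = X @ beta
    # censored observations contribute P(y* <= 0), positive ones the density
    ll_zero = norm.logcdf(-mu[~pos] / sigma)
    ll_pos = norm.logpdf((y[pos] - mu[pos]) / sigma) - np.log(sigma)
    return -(ll_zero.sum() + ll_pos.sum())

res = minimize(tobit_nll, np.zeros(3), method="BFGS")
beta_hat, sigma_hat = res.x[:2], np.exp(res.x[2])
print(beta_hat, sigma_hat)   # close to the true values (1, 0.5) and 1
```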
Semi-parametric estimators
Only the regression function is specified parametrically: No distributional assumption on the error term (except symmetry)
The semi-parametric estimator is valid under much weaker assumptions
Main idea of the estimator:
Advantages:
Drawback:
Model fitting
The estimation of the tobit-1 model is available in functions of different packages:
AER::tobit
censReg::censReg
micsr::tobit1
The three functions return identical results, except that they are parametrized differently: micsr::tobit1 estimates \(\sigma\) whereas the two other functions estimate \(\ln \sigma\).
In addition to the formula and data arguments, they allow one to specify:
a left and a right argument to indicate the truncation points
The micsr::tobit1 function allows the use of either a censored or a truncated sample by setting the sample argument to either "censored" or "truncated".
The truncated regression model can also be estimated using the truncreg::truncreg function
Model fitting
micsr::tobit1, which also has the advantage of providing several different estimators, selected using the method argument:
"ml" for maximum likelihood (the only method available for the other two functions)
"lm" for the linear model
"twostep" for the two-step estimator
"trimmed" for the trimmed estimator
"nls" for the non-linear least squares estimator
Example
The charitable data set concerns charitable giving (Wilhelm, 2008)
Example
The donation variable is left-censored at the value 25, as this value corresponds to the item “less than $25 donation”
For this value, we have households who didn’t make any charitable donation and households who made a small one (from $1 to $25)
donation donparents education religion
Min. : 25 Min. : 25 less_high_school:242 none : 322
1st Qu.: 25 1st Qu.: 125 high_school :847 catholic : 537
Median : 275 Median : 775 some_college :670 protestant:1174
Mean : 1234 Mean : 2569 college :396 jewish : 66
3rd Qu.: 1125 3rd Qu.: 2368 post_college :229 other : 285
Max. :76825 Max. :491525
income married south
Min. : 855.5 Min. :0.0000 Min. :0.0000
1st Qu.: 31101.4 1st Qu.:0.0000 1st Qu.:0.0000
Median : 50712.5 Median :1.0000 Median :0.0000
Mean : 63391.0 Mean :0.6355 Mean :0.3058
3rd Qu.: 78329.7 3rd Qu.:1.0000 3rd Qu.:1.0000
Max. :785385.2 Max. :1.0000 Max. :1.0000
Example
The model can be estimated either by using logdon as the response with the default values of left (0) and right (\(+\infty\)), or by using log(donation) as the response and setting left to log(25)
Example
log(income) and log(donparents) positively influence charitable giving; higher education levels are associated with higher donations; religious groups donate more than the reference group; married households donate more; region is not significant.
Maximum likelihood estimation
Estimate Std. Error z-value Pr(>|z|)
(Intercept) -17.617804 0.898027 -19.6184 < 2.2e-16 ***
log(donparents) 0.200352 0.025235 7.9394 2.031e-15 ***
log(income) 1.453386 0.087030 16.6999 < 2.2e-16 ***
educationhigh_school 0.622148 0.188142 3.3068 0.0009437 ***
educationsome_college 1.100389 0.194320 5.6628 1.490e-08 ***
educationcollege 1.325042 0.214808 6.1685 6.895e-10 ***
educationpost_college 1.727244 0.235683 7.3287 2.324e-13 ***
religioncatholic 0.638635 0.171421 3.7255 0.0001949 ***
religionprotestant 1.257030 0.154226 8.1506 3.623e-16 ***
religionjewish 1.001090 0.307026 3.2606 0.0011118 **
religionother 0.836793 0.193670 4.3207 1.555e-05 ***
married 0.766903 0.116755 6.5685 5.084e-11 ***
south 0.112612 0.104586 1.0767 0.2815971
sigma 2.113606 0.040971 51.5877 < 2.2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
log-Likelihood: -4005.3
Example
Using micsr::tobit1, we also estimate the two-step, the SCLS (symmetrically censored least squares) and the OLS estimators.
Let’s just compare graphically the estimated coefficients under the four estimation approaches (we do not explore the standard errors because, except for the ML case, they may be affected by a bug)
Example
Checks
The most popular method of estimation for the tobit-1 model is the fully parametric maximum likelihood method
Contrary to the OLS case, the estimator is only consistent if the generating process is perfectly described by the likelihood function, i.e., if \(\epsilon_i \sim \mathcal{N}(0,\sigma^2)\)
In particular, the consistency of the estimator rests on the hypothesis of normality and homoskedasticity.
The conditional moment tests are based on residuals (here, the residuals are partially observed) and can be computed using the micsr::cmtest function
Normality and heteroskedasticity
Conditional Expectation Test for Normality
data: logdon ~ log(donparents) + log(income) + education + religion + ...
chisq = 116.35, df = 2, p-value < 2.2e-16
Heteroscedasticity Test
data: logdon ~ log(donparents) + log(income) + education + religion + ...
chisq = 103.59, df = 12, p-value < 2.2e-16
Skewness and kurtosis
Conditional Expectation Test for Skewness
data: logdon ~ log(donparents) + log(income) + education + religion + ...
z = 10.393, p-value < 2.2e-16
Conditional Expectation Test for Kurtosis
data: logdon ~ log(donparents) + log(income) + education + religion + ...
z = 2.3294, p-value = 0.01984
Example
The food data set
Example
food_tobit <- tobit1(log(food) ~ log(income) + log(hsize) + midage,
data = food, subset = year == 1980,
left = -Inf, right = log(13030))
food_tobit
Call:
tobit1(formula = log(food) ~ log(income) + log(hsize) + midage,
data = food, subset = year == 1980, left = -Inf, right = log(13030))
Coefficients:
(Intercept) log(income) log(hsize) midage sigma
4.70629 0.34005 0.47304 0.09501 0.36716
Example
The portfolio data set contains the following variables:
 [1] "id"               "year"             "share"
[4] "uncert" "expinc" "finass_10"
[7] "finass_10_100" "finass_more" "networth"
[10] "noncapinc" "mtrate" "high_inc_oversmpl"
[13] "age" "educ" "diploma"
[16] "female" "adults" "child_0_12"
[19] "child_13_more" "occup" "riskav"
[22] "feeling" "flex" "smoke"
[25] "alcohol" "body_mass" "habits"
Example
This data set is a panel of annual observations from 1993 to 1998 (yearly observations of 3348 households)
The response is share, and the two covariates of main interest are uncert and expinc.
uncert indicates the degree of uncertainty felt by the household; it is a factor with levels low, moderate and high
expinc indicates the household’s prediction concerning the evolution of its income over the next 5 years; it is a factor with levels increase, constant and decrease
Other covariates: net worth, the age of the household’s head and its square, and a dummy for households whose head is a woman
We use the micsr::tobit1 function and we set the left and the right arguments respectively to 0 and 1.
Example
Call:
tobit1(formula = share ~ uncert + expinc + networth + age + agesq +
female, data = portfolio, left = 0, right = 1)
Coefficients:
(Intercept) uncertmod uncerthigh expinccst expincdecr networth
1.48600 0.04410 0.08155 0.04059 0.04631 -0.02940
age agesq female sigma
-0.02848 0.03055 0.14153 0.45578
Example
expinc and uncert
$location
Estimate Std. Error z value Pr(>|z|)
(Intercept) 2.66330267 0.081415222 32.712589 1.034277e-234
uncertmod 0.03473738 0.013854048 2.507381 1.216295e-02
uncerthigh 0.05364941 0.016387136 3.273874 1.060840e-03
expinccst 0.02644687 0.010686066 2.474893 1.332761e-02
expincdecr 0.04222966 0.014608545 2.890751 3.843229e-03
networth -0.14346949 0.004353413 -32.955634 3.513004e-238
age -0.02202405 0.003080029 -7.150600 8.639953e-13
agesq 0.02669050 0.003034251 8.796404 1.412698e-18
female 0.09196394 0.015422812 5.962852 2.478736e-09
$scale
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.41494455 0.126065307 3.291505 9.965294e-04
networth -0.09291492 0.002554462 -36.373570 1.114519e-289
age -0.01574593 0.005108319 -3.082410 2.053320e-03
agesq 0.02267481 0.004968105 4.564077 5.016967e-06
female 0.07026685 0.027956425 2.513442 1.195595e-02
Example
Endogeneity
Endogeneity can be treated in the Tobit model in a way very similar to the Probit model
Key difference with Probit:
Because the positive values are observed:
Available estimation methods (the tobit1 function can be used for this purpose)
Testing exogeneity can be done using a Wald test based on the two-step estimator (endogtest can be used)
Extra: see the example in the online book Microeconometrics with R