28/11 - 03/12/2025 - Vincenzo Gioia
Moving beyond responses defined on the real line
Up to now: responses that are real numbers defined on the whole real line (or strictly positive real numbers mapped to the whole real line by taking their logarithm)
Now, responses that are not defined on the whole real line
Moving beyond responses defined on the real line
With respect to the linear model, the main difference is that the conditional expectation of the response is no longer a linear function of the covariates
Here, we will denote by \(\eta = X \beta\) the linear predictor, which is related to the conditional expectation
More precisely, \(\eta_i =g(\mu_i)\), with \(\mu_i = \mathbb{E}(Y_i|x_i)\) and where \(g(\cdot)\) is the link function
In the linear case \(\beta_k = \frac{\partial \mbox{E}(Y_i \mid x_i)}{\partial x_{ik}}\), so the marginal effect of the \(k\)-th covariate was the corresponding coefficient
Here we have \(\frac{\partial \mu_i}{\partial x_{ik}}=\beta_k \frac{\partial g ^ {-1}}{\partial \eta_i}(\eta_i)\) so the marginal effect of the \(k\)-th covariate is now proportional (but not equal) to the corresponding coefficient
Binary response
Binary responses can take only two mutually exclusive values (coded as 1 and 0)
Common examples:
For this kind of response, the statistical distribution is a binomial distribution with one trial (that is, a Bernoulli distribution)
Denoting 1 as “a success” and 0 as “a failure”, this distribution is fully characterized by a single parameter, \(p\), which is the probability of success and is also the expected value of the variable, that is: \(\mbox{E}(Y) = \mu = p\).
The variance of the distribution is: \(\mbox{V}(Y) = \mu(1-\mu) = p(1-p)\).
Binary response
Simplest choice for \(F\): Advantages
\[p_i = F(\eta_i) = F(x^\top_i\beta)\]
Simplest choice: Identity function for \(F(\cdot)\) \[\implies p_i = \eta_i = x^\top_i\beta\]
So, the parameter of the Bernoulli distribution is assumed to be a linear function of the covariates
Advantages:
Simplest choice for \(F\): Pitfalls
\[p_i = F(\eta_i) = F(x^\top_i\beta)\]
\[\implies p_i = \eta_i = x^\top_i\beta\]
Better choices
\[F(z) = \Phi(z) = \displaystyle \int_{-\infty} ^ z \phi(t) dt = \int_{-\infty} ^ z \frac{1}{\sqrt{2\pi}} e ^{-\frac{1}{2}t ^ 2} dt\]
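For comparison, the logistic CDF \(F(z) = \frac{e^z}{1 + e^z}\) is the other standard choice. A quick base-R sketch (not part of the original slides) to plot the two CDFs side by side:
curve(pnorm(x), from = -4, to = 4, lty = 1,
      xlab = "linear predictor", ylab = "probability of success")  # probit: F = Phi
curve(plogis(x), add = TRUE, lty = 2)                              # logit: logistic CDF
legend("topleft", legend = c("probit", "logit"), lty = c(1, 2), bty = "n")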
Logit/Probit models
\[ \frac{\partial \mu_i}{\partial x_{ik}} = \beta_k f(\eta_i) \]
where \(f\) is the first derivative of \(F\) (the standard normal and the logistic density for the probit and the logit model, respectively)
The marginal effect is therefore obtained by multiplying the coefficient by \(f(\eta_i)\), which depends on the value of the covariates for a given observation: the marginal effect is observation-dependent, but the ratio of two marginal effects for two covariates is not, as it is equal to the ratio of the two corresponding coefficients
As the coefficient of proportionality is the normal/logistic density, the maximum marginal effect is for \(\eta_i = 0\), which results in a probability of success of 0.5.
Logit/Probit models
At \(\eta_i = 0\), the values of the densities are (approximately) 0.4 for the normal and 0.25 for the logistic distribution
Therefore, a rule of thumb to interpret the coefficients is to multiply them by 0.4 (probit) or 0.25 (logit) to get an estimate of the maximum marginal effect
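A one-line check of the two densities at zero (an illustrative snippet, not from the slides):
c(normal = dnorm(0), logistic = dlogis(0))   # 0.3989 and 0.25: the 0.4/0.25 rule of thumb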
Better choices
Example: Car versus Public Transport
As an example, we consider the data set used by Horowitz (1993) which concerns the transport mode chosen for work trips by a sample of 842 individuals in Washington DC in the late sixties
The response mode is 1 for car and 0 for public transport.
The covariates are the in- and out-vehicle times (ivtime and ovtime) and the cost differences between car and public transport
A positive value indicates that the car trip is longer/more expensive than the corresponding trip using public transport
Example: Car versus Public Transport
linprob <- lm(mode ~ gcost, mode_choice)
probit <- glm(mode ~ gcost, mode_choice, family = binomial(link = "probit"))
logit <- glm(mode ~ gcost, mode_choice, family = binomial(link = "logit"))
round(summary(linprob)$coefficients[2,],2)
Estimate Std. Error t value Pr(>|t|)
0.02 0.00 9.56 0.00
Estimate Std. Error z value Pr(>|z|)
0.11 0.01 8.65 0.00
Estimate Std. Error z value Pr(>|z|)
0.21 0.02 8.52 0.00
Example: Car versus Public Transport
The coefficient of gcost for the linear-probability model is 0.0226, which means that a one dollar increase of the generalized cost differential will increase the probability of using the car by 2.26 percentage points.
If we use the previously described rule of thumb to multiply the probit/logit coefficients by 0.4/0.25 in order to have an upper limit for the marginal effect, we get 4.51 and 5.28 percentage points, which are much higher values than for the linear probability model
This is because the coefficient of the linear model estimates the marginal effect at the sample mean. In our sample, the mean value of the covariate is 2.9
Example: Car versus Public Transport
To get comparable marginal effects for the probit/logit models, we should first compute \(\hat{\beta_1} + \hat{\beta_2} \bar{x}\) (1.15 and 2 respectively for the probit and logit models) and use these values with the relevant densities (\(\phi(1.15) = 0.206\) and \(\lambda(2) = 0.105\))
At the sample mean, the marginal effects are then \(\phi(1.15)\times 0.1129 = 0.023\) and \(\lambda(2) \times 0.2112 = 0.02255435\): they are therefore very close to the linear probability model coefficient
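The computation above can be reproduced in a few lines (a sketch assuming the linprob, probit and logit objects defined earlier; the exact code is not in the original slides):
xbar <- mean(mode_choice$gcost)                            # about 2.9
eta_probit <- coef(probit)[1] + coef(probit)[2] * xbar     # about 1.15
eta_logit  <- coef(logit)[1]  + coef(logit)[2]  * xbar     # about 2
c(linear = coef(linprob)[2],
  probit = dnorm(eta_probit) * coef(probit)[2],
  logit  = dlogis(eta_logit) * coef(logit)[2])             # all close to 0.022-0.023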
Example: Car versus Public Transport
xlim <- c(-30, 30)
set.seed(123)
y_jitter <- jitter(as.numeric(mode_choice$mode), amount = 0.02)
plot(mode_choice$gcost, y_jitter, xlim = xlim, pch = 16,
cex = 0.2, xlab = "gcost", ylab = "mode")
curve(coef(linprob)[1] + coef(linprob)[2] * x, add = TRUE, lty = 1 )
curve(pnorm(coef(probit)[1] + coef(probit)[2] * x), add = TRUE, lty = 2)
curve(plogis(coef(logit)[1] + coef(logit)[2] * x), add = TRUE, lty = 3)
legend("topleft", legend = c("linear", "probit", "logit"),
lty = c(1, 2, 3), bty = "n")
Example: Car versus Public Transport
Example: Airbnb
The data set called airbnb has been analyzed by Edelman, Luca, and Svirsky (2017)
Aim: analyze the presence of racial discrimination on the Airbnb platform
The authors create guest accounts that differ by the first name chosen. More specifically, the race of the applicant is suggested by the choice of the first name, either a “white” (Emily, Sarah, Greg) or an “African American” (Lakisha or Kareem) first name.
The response is acceptance and is 1 if the host gave a positive response and 0 otherwise
We use only three covariates: the race suggested by the guest's first name (guest_race), the price of the listing (price, entered in logs), and the city (city)
Example: Airbnb
Note: the mean of the response is \(0.45\) which is a distinctive feature of this data set compared to the previous one (mode_choice)
As the mean value of the probability of success is close to 50%, we can expect the rule of thumb (multiplying the logit/probit coefficients by 0.25/0.4) to give an estimated marginal effect close to the one directly obtained from the linear probability model
'data.frame': 6235 obs. of 10 variables:
$ acceptance : Factor w/ 2 levels "no","yes": 2 1 2 1 2 1 2 2 1 1 ...
$ guest_race : Factor w/ 2 levels "white","black": 1 2 1 1 2 1 2 1 2 2 ...
$ guest_gender: Factor w/ 2 levels "male","female": 1 2 1 2 2 1 2 1 2 2 ...
$ host_race : Factor w/ 2 levels "other","black": 1 1 1 1 2 1 1 1 1 1 ...
$ host_gender : Factor w/ 2 levels "female","male": 1 2 2 1 2 1 1 1 1 2 ...
$ multlistings: num 0 0 0 0 0 0 1 0 0 1 ...
$ shared : num 0 1 0 1 1 1 1 1 1 0 ...
$ tenreviews : num 1 1 0 0 1 0 0 0 0 1 ...
$ price : int 120 74 150 110 50 250 85 62 125 119 ...
$ city : Factor w/ 5 levels "Baltimore","Dallas",..: 2 2 2 2 2 2 2 2 2 2 ...
[1] 0.4484362
no yes
0.5515638 0.4484362
Example: Airbnb
acceptance guest_race guest_gender host_race host_gender multlistings
0 0 0 0 0 0
shared tenreviews price city
0 0 67 5
'data.frame': 6168 obs. of 10 variables:
$ acceptance : num 1 0 1 0 1 0 1 1 0 0 ...
$ guest_race : Factor w/ 2 levels "white","black": 1 2 1 1 2 1 2 1 2 2 ...
$ guest_gender: Factor w/ 2 levels "male","female": 1 2 1 2 2 1 2 1 2 2 ...
$ host_race : Factor w/ 2 levels "other","black": 1 1 1 1 2 1 1 1 1 1 ...
$ host_gender : Factor w/ 2 levels "female","male": 1 2 2 1 2 1 1 1 1 2 ...
$ multlistings: num 0 0 0 0 0 0 1 0 0 1 ...
$ shared : num 0 1 0 1 1 1 1 1 1 0 ...
$ tenreviews : num 1 1 0 0 1 0 0 0 0 1 ...
$ price : int 120 74 150 110 50 250 85 62 125 119 ...
$ city : Factor w/ 5 levels "Baltimore","Dallas",..: 2 2 2 2 2 2 2 2 2 2 ...
- attr(*, "na.action")= 'omit' Named int [1:67] 817 834 2106 2401 2443 2470 2471 2580 2803 2967 ...
..- attr(*, "names")= chr [1:67] "817" "834" "2106" "2401" ...
Example: Airbnb
We fit the models (linear probability, logit and probit models)
We compare the coefficients (those for logit and probit are multiplied by 0.25 and 0.4 respectively, easing the comparison) and we report the ratio between the logit and probit coefficients (showing that it is close to 1.6)
For a 100% increase of the (log-) price, the probability of acceptance reduces by 4.16 percentage points
The estimated marginal effect for black guests is about -8.5 percentage points (we do not need to compute the derivatives because the covariate is a dummy)
fit_lprob <- lm(acceptance ~ guest_race + I(log(price)) + city, data = airbnb)
fit_logit <- glm(acceptance ~ guest_race + I(log(price)) + city,
data = airbnb, family = binomial("logit"))
fit_probit <- glm(acceptance ~ guest_race + I(log(price)) + city,
data = airbnb, family = binomial("probit"))
round(cbind(coef(fit_lprob), coef(fit_logit) * 0.25,
coef(fit_probit) * 0.4, coef(fit_logit)/coef(fit_probit)), 3)
[,1] [,2] [,3] [,4]
(Intercept) 0.708 0.215 0.214 1.604
guest_raceblack -0.084 -0.085 -0.085 1.602
I(log(price)) -0.045 -0.047 -0.047 1.603
cityDallas 0.023 0.023 0.023 1.593
cityLos-Angeles 0.015 0.016 0.016 1.590
citySt-Louis 0.010 0.010 0.010 1.573
cityWashington -0.037 -0.037 -0.037 1.611
Example: Airbnb
\[P(Y_i=1|x_{i1} = 1, x_{i2} = 1, x_{i3} = \log(mean(price))) - P(Y_i=1|x_{i1} = 1, x_{i2} = 0, x_{i3} = \log(mean(price)))\]
Example: Airbnb
(Intercept) guest_raceblack I(log(price)) cityDallas cityLos-Angeles
[1,] 0.859 -0.341 -0.187 0.093 0.063
[2,] 0.536 -0.213 -0.116 0.059 0.040
citySt-Louis cityWashington
[1,] 0.039 -0.150
[2,] 0.025 -0.093
pnorm(coef(fit_probit)[1] + coef(fit_probit)[2] + coef(fit_probit)[3] * log(182)) -
pnorm(coef(fit_probit)[1] + coef(fit_probit)[3] * log(182))
(Intercept)
-0.08337756
plogis(coef(fit_logit)[1] + coef(fit_logit)[2] + coef(fit_logit)[3] * log(182)) -
plogis(coef(fit_logit)[1] + coef(fit_logit)[3] * log(182))
(Intercept)
-0.08328864
Latent variable structure
Provide a theoretical foundation to the probit/logit models (we will consider the case where there is a unique covariate)
We observe that \(y\) is equal to 0 or 1, but we now assume that this value is related to a latent continuous variable (called \(y^*\)) which is unobserved
\[ \left\{ \begin{array}{rcl} y = 0 & \mbox{if } & y ^ * \leq \psi \\ y = 1 & \mbox{if } & y ^ * > \psi \\ \end{array} \right. \]
where \(\psi\) is an unknown threshold
Latent variable structure
Assuming a linear model for the latent variable, \(y ^ * = \beta_1 + \beta_2 x + \epsilon\), this becomes:
\[ \left\{ \begin{array}{rcl} y = 0 & \mbox{if } & \epsilon \leq \psi - \beta_1 - \beta_2 x\\ y = 1 & \mbox{if } & \epsilon > \psi - \beta_1 - \beta_2 x \\ \end{array} \right. \]
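A small simulation (not in the original slides) illustrates this structure: thresholding a latent linear model with standard normal errors generates data for which the probit model recovers the latent coefficients (here the threshold \(\psi\) is set to 0 for identification):
set.seed(1)
n <- 1000
x <- rnorm(n)
ystar <- 0.5 + 1 * x + rnorm(n)        # latent variable, threshold psi set to 0
y <- as.numeric(ystar > 0)             # observed binary response
coef(glm(y ~ x, family = binomial(link = "probit")))   # close to (0.5, 1)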
Latent variable structure
Random utility model
\[ \left\{ \begin{array}{rcl} U_0 &=& \beta_{01}+ \beta_2 x_0 + \epsilon_0 \\ U_1 &=& \beta_{11} + \beta_2 x_1 + \epsilon_1 \end{array} \right. \]
where \(\beta_2\) is the marginal utility of $1.
Random utility model
\[ \left\{ \begin{array}{rcl} y = 0 & \mbox{if } & \epsilon_1 - \epsilon_0 \leq - (\beta_{11} - \beta_{01}) - \beta_2 (x_1 - x_0)\\ y = 1 & \mbox{if } & \epsilon_1 - \epsilon_0 > - (\beta_{11} - \beta_{01}) - \beta_2 (x_1 - x_0)\\ \end{array} \right. \]
Setting \(\epsilon = \epsilon_1 - \epsilon_0\), \(\beta_1 = \beta_{11} - \beta_{01}\) and \(x = x_1 - x_0\), this reduces to:
\[ \left\{ \begin{array}{rcl} y = 0 & \mbox{if } & \epsilon \leq - (\beta_1 + \beta_2 x)\\ y = 1 & \mbox{if } & \epsilon > - (\beta_1 + \beta_2 x)\\ \end{array} \right. \]
Binomial models are a case of generalized linear models
The generalized linear models (GLM) are a wide family of models that are intended to extend the linear model
These models have the following components: a random component (the conditional distribution of the response, a member of the exponential family), a linear predictor \(\eta = X\beta\), and a link function \(g\) such that \(g(\mu) = \eta\)
Exponential family
\[ f(y;\theta,\phi) = e ^ {\displaystyle\left(y\theta - b(\theta)\right)/\phi + c(y, \phi)} \]
with \(\theta\) and \(\phi\) being respectively a location (canonical) parameter and a scale (dispersion) parameter
Linear models
Linear models are a specific case of generalized linear models with a normal distribution and an identity link
We have in this case the following density function:
\[ \phi(y;\mu, \sigma) = \frac{1}{\sqrt{2\pi}\sigma} e ^ {-\frac{1}{2}\frac{(y - \mu)^2}{\sigma ^ 2}}= e^{\frac{y\mu - 0.5 \mu ^ 2}{\sigma ^ 2}- 0.5 y ^ 2 / \sigma ^ 2 - 0.5 \ln(2\pi\sigma ^ 2)} \]
which is a member of the exponential family with \(\theta = \mu\), \(\phi = \sigma^ 2\), \(b(\theta) = \theta ^ 2/2\) and \(c(y, \phi) = -(y ^ 2 / \phi + \ln(2\pi\phi))/2\)
\[ \ln L(y, \hat{\mu}) = - \frac{n}{2}\ln(2 \pi \sigma ^ 2) - \frac{1}{2\sigma ^2} \sum_{i=1} ^ n (y_i - \hat{\mu}_i) ^ 2 \]
Linear models
For a hypothetical “perfect” or saturated model with a perfect fit, we would have \(\hat{\mu}_i = y_i\), so that the log-likelihood would be \(- \frac{n}{2}\ln(2 \pi \sigma ^ 2)\)
Minus two times the difference of these two values of the log likelihood function is called the scaled deviance of the proposed model: \[ D^*(y;\hat\mu) = \sum_{i=1} ^ n\frac{(y_i - \hat{\mu}_i) ^ 2}{\sigma ^ 2} \]
The deviance is obtained by multiplying the scaled deviance by \(\sigma ^ 2\)
\[ D(y; \hat{\mu}) = \sum_{i=1} ^ n(y_i - \hat{\mu}_i) ^ 2 \]
which is simply, for the linear model, the sum of square residuals
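A quick check with the linear probability fit from the example above (assuming the linprob object):
c(deviance(linprob), sum(resid(linprob)^2))   # identical: deviance = sum of squared residuals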
Binomial models
\[ f(y;\mu) = e ^ {y \ln \mu + (1 - y) \ln(1 - \mu)}=e ^ {y \ln \frac{\mu}{1 - \mu} + \ln(1 - \mu)}= e^{y\theta - \ln (1 + e ^ \theta)} \]
which is a member of the exponential family with \(\theta = \ln \frac{\mu}{1 - \mu}\), \(\phi = 1\), \(b(\theta) = \ln(1 + e ^ \theta)\) and \(c(y, \phi) = 0\)
Binomial models
The model is fully characterized once the link is specified
For the logit model, we have \(\mu = \frac{e ^ \eta}{1+e ^ \eta}\), so that \(\eta = \ln \frac{\mu}{1 - \mu} = g(\mu)\)
We then have \(\theta = \eta\), so that the logit link is called the canonical link for binomial models
As the density of the binomial model returns a probability, and the saturated model sets \(\hat{\mu}_i = y_i\), each contribution to the likelihood equals one and the log-likelihood of the saturated model is zero. Therefore, the deviance is:
\[ D(y;\hat{\mu}) = -2 \sum_{i=1} ^ n \left(y_i \ln \hat{\mu}_i + (1 - y_i) \ln(1 - \hat{\mu}_i)\right) \]
The null model is a model with only an intercept: in this case, \(\hat{\mu}_i = \hat{\mu}_0\) and the maximum likelihood estimate of \(\mu_0\) is \(\sum_{i=1} ^ n y_i / n\), i.e., the share of success in the sample
The deviance of this model is called the null deviance
Binomial models
\[ X ^ 2 = \sum_{i=1} ^ n \frac{(y_i - \hat{\mu}_i) ^ 2}{{\hat V} (\hat{\mu}_i)}= \sum_{i=1} ^ n \frac{(y_i - \hat{\mu}_i) ^ 2}{\hat{\mu}_i(1 - \hat{\mu}_i)} \]
Residuals under the linear model
Residuals for GLMs: Pearson’s residuals
The most obvious definition of the residuals for binomial models is the response residuals, which are simply the difference between the response and the prediction of the model (the fitted probability of success \(\hat{\mu}\))
However, these residuals (\(y_i - \hat{\mu}_i\)) are necessarily heteroscedastic, as the variance of \(Y_i\) is \(\mu_i(1 - \mu_i)\)
Scaling the response residuals by their standard deviation leads to Pearson’s residuals: \[\frac{y_i - \hat{\mu}_i}{\sqrt{\hat{\mu}_i(1 - \hat{\mu}_i)}}\]
The sum of squares of Pearson’s residuals is the generalized Pearson statistic
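With the logit fit from the example, this statistic can be obtained in one line (assuming the logit object defined earlier):
sum(resid(logit, type = "pearson")^2)   # generalized Pearson statistic X^2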
Residuals for GLMs: Deviance residuals
For binary responses, the deviance residual can be written as \((2 y_i - 1)\sqrt{-2\left(y_i \ln \hat{\mu}_i + (1 - y_i) \ln(1 - \hat{\mu}_i)\right)}\); the term \(2 y_i - 1\) gives a positive sign to the residuals of observations for which \(y_i = 1\) and a negative sign for those with \(y_i = 0\), as for the two other types of residuals
Estimation with glm()
Call:
glm(formula = mode ~ gcost, family = binomial(link = "logit"),
data = mode_choice)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 1.39048 0.09977 13.937 <2e-16 ***
gcost 0.21122 0.02478 8.524 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 741.33 on 841 degrees of freedom
Residual deviance: 647.39 on 840 degrees of freedom
AIC: 651.39
Number of Fisher Scoring iterations: 5
Estimation with glm()
[1] 741.332
[1] 841
Call:
glm(formula = mode ~ 1, family = "binomial", data = mode_choice)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 1.65576 0.09392 17.63 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 741.33 on 841 degrees of freedom
Residual deviance: 741.33 on 841 degrees of freedom
AIC: 743.33
Number of Fisher Scoring iterations: 3
Estimation with glm()
The resid method for glm objects has a type argument; the three types of interest here are "response", "pearson" and "deviance", whose first six values are shown below in this order
1 2 3 4 5 6
0.03651330 -0.55944482 0.23717556 -0.66782129 0.04603174 -0.80628781
1 2 3 4 5 6
0.1946716 -1.1268821 0.5575999 -1.4178955 0.2196654 -2.0401710
1 2 3 4 5 6
0.2727512 -1.2804059 0.7358361 -1.4846428 0.3070012 -1.8118398
Estimation with glm()
1 2 3 4 5 6
3.2728820 0.2389092 1.1682273 0.6983475 3.0312992 1.4260673
1 2 3 4 5 6
0.9634867 0.5594448 0.7628244 0.6678213 0.9539683 0.8062878
1
2.446591
1
0.9203118
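The outputs above are consistent with predictions from the logit fit on the link scale and on the probability scale; for the last pair of values, a hedged reconstruction (gcost = 5 is an assumption that matches the printed numbers):
nd <- data.frame(gcost = 5)                        # hypothetical new observation
predict(logit, newdata = nd)                       # link scale: about 2.447
predict(logit, newdata = nd, type = "response")    # probability scale: about 0.920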
Model estimation
The stats::glm function uses an iteratively reweighted least squares (IWLS) algorithm to fit GLMs.
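A minimal IWLS sketch for the logit model (an illustration of the idea, not the exact stats::glm implementation; it assumes the mode_choice data with mode coded 0/1, as in the lm() call above):
X <- model.matrix(~ gcost, mode_choice)
y <- mode_choice$mode                      # assumed to be coded 0/1
beta <- rep(0, ncol(X))
for (iter in 1:25) {
  eta <- drop(X %*% beta)                  # linear predictor
  mu  <- plogis(eta)                       # fitted probabilities
  w   <- mu * (1 - mu)                     # working weights
  z   <- eta + (y - mu) / w                # working response
  beta_new <- solve(crossprod(X, w * X), crossprod(X, w * z))
  if (max(abs(beta_new - beta)) < 1e-8) break
  beta <- beta_new
}
drop(beta)                                 # close to coef(logit)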
Model evaluation
These measures usually indicate an important difference between the constrained and unconstrained model
However, the comparison between the constrained and unconstrained models is spurious, because adding further covariates, even if they are irrelevant, necessarily increases the fit of the model
Therefore, it is important to consider indicators that penalize highly parametrized models. The two most popular indicators are the Akaike and the Bayesian information criteria (AIC and BIC):
They are obtained by augmenting the deviance by a term which is a multiple of the number of fitted parameters
The rule is to select the model with the lowest value of the statistic (the lower, the better)
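For binary responses (saturated log-likelihood equal to zero), AIC and BIC can be recovered from the deviance as deviance + 2k and deviance + ln(n) k; a quick check with the logit fit (assuming the logit object above):
k <- length(coef(logit)); n <- nobs(logit)
c(AIC(logit), deviance(logit) + 2 * k)        # both 651.39
c(BIC(logit), deviance(logit) + log(n) * k)   # both 660.86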
Model estimation and evaluation
Log-likelihood and needed quantities
Maximum likelihood estimates
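The code used in the slides for this step is not reproduced here; a minimal sketch of what it could look like for the logit specification mode ~ gcost (an assumption), requesting the Hessian so that the standard-error code below runs:
X <- model.matrix(~ gcost, mode_choice)
y <- mode_choice$mode                          # assumed coded 0/1
negloglik <- function(beta) {                  # minus the Bernoulli log-likelihood
  eta <- drop(X %*% beta)
  -sum(y * eta - log(1 + exp(eta)))
}
opt <- optim(c(0, 0), negloglik, method = "BFGS", hessian = TRUE)
beta_hat <- opt$par                            # close to coef(logit)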
Standard Errors
H_obs <- opt$hessian
vcov_obs <- solve(H_obs)
se_obs <- sqrt(diag(vcov_obs))
cbind(se_obs, summary(logit)$coefficients[,2])
se_obs
(Intercept) 0.09977194 0.09977098
gcost 0.02478110 0.02478069
p_hat <- 1 / (1 + exp(-X %*% beta_hat))
W <- diag(as.vector(p_hat * (1 - p_hat)))
Fisher <- t(X) %*% W %*% X
vcov_fisher <- solve(Fisher)
se_fisher <- sqrt(diag(vcov_fisher))
cbind(se_fisher, summary(logit)$coefficients[,2])
se_fisher
(Intercept) 0.09977194 0.09977098
gcost 0.02478121 0.02478069
Log-likelihood, deviance and Information Criteria
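Using the manual fit above, these quantities can be computed directly (a sketch consistent with the glm() output shown earlier; opt and beta_hat come from the hypothetical optim() code above):
ll  <- -opt$value                              # maximised log-likelihood, about -323.69
dev <- -2 * ll                                 # deviance (saturated log-likelihood is zero)
aic <- dev + 2 * length(beta_hat)              # about 651.39
bic <- dev + log(nrow(X)) * length(beta_hat)   # about 660.86
c(logLik = ll, deviance = dev, AIC = aic, BIC = bic)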
Example: Model comparison
logit2 <- glm(mode ~ cost + ivtime + ovtime, data = mode_choice, family = binomial("logit"))
probit2 <- glm(mode ~ cost + ivtime + ovtime, data = mode_choice, family = binomial("probit"))
cbind(summary(logit)$coefficients[,1:2], summary(probit)$coefficients[,1:2])
Estimate Std. Error Estimate Std. Error
(Intercept) 1.3904762 0.09977098 0.8211603 0.05634293
gcost 0.2112231 0.02478069 0.1128718 0.01305510
Estimate Std. Error Estimate Std. Error
(Intercept) 1.0618581 0.1949694 0.66376405 0.10919704
cost 0.1562312 0.0355831 0.08553217 0.01953444
ivtime 0.5445625 0.4545989 0.30822412 0.23822344
ovtime 4.7595607 0.9582451 2.38034964 0.49523659
Example: Model comparison
The log-likelihood, AIC, BIC, residual deviance and null deviance of the four models (logit, logit2, probit, probit2) are:
              logit     logit2     probit    probit2
logLik    -323.6937  -317.3219  -324.2697  -318.7685
AIC        651.3874   642.6438   652.5393    645.537
BIC        660.8589   661.5869   662.0109   664.4802
deviance   647.3874   634.6438   648.5393    637.537
null dev.   741.332    741.332    741.332    741.332
Testing: Tests of nested models
\[LR=2(\ell(\hat \beta^U)−\ell(\hat \beta^R))\]
\[W= (\hat \beta^U − \beta_0)^\top Var(\hat \beta^U)^{−1} (\hat \beta^U−\beta_0)\]
where \(\beta_0\) satisfies the null restrictions (e.g., \(\beta_0=0\))
\[Score =S(\hat \beta^R)^\top I(\hat \beta^R)^{−1} S(\hat \beta^R)\]
where \(S(\beta)=\partial \ell (\beta)/\partial \beta\) is the score vector and \(I(\beta)\) is the Fisher information matrix (the expected value of the negative Hessian)
Testing: Tests of nested models
We estimated a model with the generalized cost (\(g\)) as a unique covariate, which was computed as: \(g_i = c_i + 8(i_i + o_i)\), where \(c\), \(i\), and \(o\) are the differences in monetary cost, in-vehicle time and out-vehicle time, based on the hypothesis that time value was $8 per hour
The unconstrained model for the probit case is: \[ P(Y_i = 1) = \Phi(\beta_0 + \beta_c c_i + \beta_i i_i + \beta_o o_i) \]
The constrained model implies the following two restrictions: \(\beta_i = 8 \beta_c\) and \(\beta_o = 8 \beta_c\), i.e., \(H_0: \beta_o = \beta_i = 8 \beta_c\). It is more convenient to rewrite the model so that, under \(H_0\), a subset of the parameters is 0:
\[ \begin{array}{rcl} P(Y_i = 1) &=& \Phi\left(\beta_0 + \beta_c \left(c_i + 8 (i_i + o_i)\right) + (\beta_i - 8\beta_c)i_i + (\beta_o - 8 \beta_c) o_i\right)\\ &=& \Phi(\beta_0 + \beta_c g_i + \beta_i'i_i + \beta_o'o_i) \end{array} \]
where \(\beta_i' = (\beta_i - 8\beta_c)\) and \(\beta_o' = (\beta_o - 8\beta_c)\) are the reduced form parameters of the binomial regression with the generalized cost, the in-vehicle and out-vehicle time as covariates
Testing: Tests of nested models
Likelihood ratio test
Model 1: mode ~ gcost
Model 2: mode ~ cost + ivtime + ovtime
#Df LogLik Df Chisq Pr(>Chisq)
1 2 -324.27
2 4 -318.77 2 11.002 0.004082 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Linear hypothesis test:
- 8 cost + ivtime = 0
- 8 cost + ovtime = 0
Model 1: restricted model
Model 2: mode ~ cost + ivtime + ovtime
Res.Df Df Chisq Pr(>Chisq)
1 840
2 838 2 10.986 0.004115 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
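The two outputs above are consistent with a likelihood-ratio test and a Wald test of these restrictions; hedged commands that produce this kind of output (assuming the lmtest and car packages and the probit/probit2 fits from the model-comparison example):
library(lmtest)
library(car)
lrtest(probit, probit2)                                   # likelihood ratio test
linearHypothesis(probit2, c("ivtime - 8 * cost = 0",
                            "ovtime - 8 * cost = 0"))     # Wald test of the two restrictions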
Testing: Conditional moment tests
Conditional Expectation Test for Normality
data: mode ~ cost + ivtime + ovtime
chisq = 4.6997, df = 2, p-value = 0.09538
Heteroscedasticity Test
data: mode ~ cost + ivtime + ovtime
chisq = 4.129, df = 3, p-value = 0.2479
Reset test
data: mode ~ cost + ivtime + ovtime
chisq = 3.2182, df = 2, p-value = 0.2001