Load the required libraries
library(haven)
library(tidyverse)
library(magrittr)
Data comes from SHARE, the Survey of Health, Ageing and Retirement in Europe (https://share-eric.eu/data/). It is a multidisciplinary and cross-national panel database of data on health, socio-economic status and social and family networks of about 140000 individuals aged 50 or older (around 380000 interviews). SHARE covers 27 European countries and Israel.
The dataset we use here contains only part of the data the survey produced.
The research question is whether a stressful period could be
associated with the occurrence of muscular weakness in the over 50
population: therefore we can consider it from the pint of view of an
explanatory (causal) model, not a predictive one.
This question could be of interest since muscular weakness can be
considered a proxy of poor physical health. Muscular strength was
defined as the hand grip measured with a dynamo-meter and the weakness
was defined as having a grip strength below a threshold calculated
according to BMI and sex.
The exposure group was defined as subjects who underwent a stressful period between the first and the last time they had been interviewed; all other subject were defined as non-exposed.
Of note, an inclusion criteria was to have a normal hand strength at the start of the study. Once the subjects were divided in the exposed and non-exposed group, the hand grip measurement after two years from the exposure definition was used to define the outcome.
In this example we will consider the following variables:
load("Logistic Regression examples.RData")
Before starting to fit any regression model we have to properly code categorical variables. This is usually done in the initial data analysis (IDA) and the descriptive statistics phase.
The first step consists in identifying which variables are categorical. This may seem as an easy step but it isn’t always. If fact the choice for some variables (i.e. indexes and scales) is not obvious and it requires either knowledge of the data we are analysing or a bit more effort on the data modelling.
The important message is that we can’t assume that it is correct to model a variable as numerical only because it was of type numerical/integer in the dataset imported in R.
Once we have identified the variables that we think are categorical we can process them. We can have three cases that leads to a slightly different procedure according to the R type of the variable:
Binary variables can either be treated as numerical, usually dummy 0/1 variables, or as factors. It does not make any difference in terms of the results obtained from the model but we have to be aware of how they are coded for a correct interpretation of the model.
In general, the model will always have 1 parameter for a binary variable. However, we have to choose which level the parameter should represent by deciding which group to code with 1. The choice should be made according to how we want the model to be interpreted. For example, for the treatment variable, 0 is usually the placebo or the control treatment and 1 is usually the experimental treatment. Alternatively, in the case of the covariates/exposure variables, we could simply use a statistical criteria: for a better stability of the model it is always better to use the most frequent category as a reference.
For example, it makes more sense to consider as the reference level of paid_job subjects who have had a paid job. This will help with the stability of the model but also it will ease the interpretation of the results. Therefore, we create a new variable no_paid_job:
datiSHAREStress$no_paid_job <- ifelse(datiSHAREStress$paid_job==0,1,0)
In this case we must transform the variable in a factor, assign labels to the factor levels and choose a reference level. The reference level will be the group all the others levels will be compared to. In fact when a factor variable with c levels is added to a model, R internally transforms it in c-1 dummy variables and a model parameter will be estimated for each of them. Again, is important we choose the reference level. In this case we have ses which we have to transform into a factor
datiSHAREStress$ses <- as.factor(datiSHAREStress$ses)
We can now check how R has coded the variable:
str(datiSHAREStress$ses)
## Factor w/ 4 levels "1","2","3","4": 4 4 4 4 4 4 4 2 4 4 ...
contrasts(datiSHAREStress$ses)
## 2 3 4
## 1 0 0 0
## 2 1 0 0
## 3 0 1 0
## 4 0 0 1
By default, R uses the group 1 as the reference level. With the contrasts() function we can see how the variable will be parametrized in the regression model. In the rows we have the variable levels and on the columns the parameters generated for this variable.
It is also useful to assign more meaningful labels:
datiSHAREStress$ses=factor(datiSHAREStress$ses,
labels = c("Great Difficulties",
"Some Difficulties",
"Fairly Easily",
"Easily"))
Last but not least we can set the reference category of our choice. In this case we choose “Easily” which is the most frequent category.
table(datiSHAREStress$ses)
##
## Great Difficulties Some Difficulties Fairly Easily Easily
## 191 813 1997 3148
datiSHAREStress$ses=relevel(datiSHAREStress$ses,ref="Easily")
If the variable is of type character the steps are similar to the previous ones with the exception that we won’t need to explicitly set the labels. As a reminder, by default R uses as a reference level the first level in alphabetical order.
We skip this part here, since we already discussed how to describe a dataset in the various IDA examples. Remind that you should pay particular attention to outliers, missing values, continuous variable’s distribution shapes, rare categories in categorical ones!
A common method used in the analysis of health data for variables selection, is fitting many models with one covariate at the time. This is also a way to begin to explore the dataset, even if as discussed it is not the suggested method to select variables in the model. In this case, we can start with the exposure to stress, our variable of interest. This kind of analysis in the medical literature is often referred as “univariable analysis”.
So we start by fitting the model
fit_uni_stress <- glm(low_grip~stress,family = binomial,data=datiSHAREStress)
As a reminder, the glm function automatically uses the logit as link function unless we state otherwise.
The results of the model can be obtained with
summary(fit_uni_stress)
##
## Call:
## glm(formula = low_grip ~ stress, family = binomial, data = datiSHAREStress)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -0.3848 -0.3848 -0.3848 -0.3848 2.5013
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -2.5660 0.0501 -51.217 <2e-16 ***
## stress -0.5174 0.4204 -1.231 0.218
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 3142.8 on 6148 degrees of freedom
## Residual deviance: 3141.1 on 6147 degrees of freedom
## AIC: 3145.1
##
## Number of Fisher Scoring iterations: 5
It seems that we can’t reject the null hypothesis for \(\beta_{stress}\).
We can also obtain the 95% Confidence Interval
confint.default(fit_uni_stress)
## 2.5 % 97.5 %
## (Intercept) -2.664221 -2.4678285
## stress -1.341440 0.3066133
We can do more univariable models to look for possible confounders (previoulsy discussed with an expert if possible..) in the exposure-outcome relationship.
fit_uni_age <- glm(low_grip~age,family = binomial,data=datiSHAREStress)
summary(fit_uni_age)
##
## Call:
## glm(formula = low_grip ~ age, family = binomial, data = datiSHAREStress)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.4362 -0.4048 -0.2897 -0.2115 2.9594
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -9.505544 0.419383 -22.67 <2e-16 ***
## age 0.101366 0.005822 17.41 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 3142.8 on 6148 degrees of freedom
## Residual deviance: 2817.0 on 6147 degrees of freedom
## AIC: 2821
##
## Number of Fisher Scoring iterations: 6
fit_uni_job <- glm(low_grip~no_paid_job,family = binomial,data=datiSHAREStress)
summary(fit_uni_job)
##
## Call:
## glm(formula = low_grip ~ no_paid_job, family = binomial, data = datiSHAREStress)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -0.6381 -0.3749 -0.3749 -0.3749 2.3196
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -2.61992 0.05134 -51.030 < 2e-16 ***
## no_paid_job 1.13184 0.21544 5.254 1.49e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 3142.8 on 6148 degrees of freedom
## Residual deviance: 3120.8 on 6147 degrees of freedom
## AIC: 3124.8
##
## Number of Fisher Scoring iterations: 5
fit_uni_sex <- glm(low_grip~female,family = binomial,data=datiSHAREStress)
summary(fit_uni_sex)
##
## Call:
## glm(formula = low_grip ~ female, family = binomial, data = datiSHAREStress)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -0.3849 -0.3849 -0.3814 -0.3814 2.3053
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -2.56532 0.07178 -35.738 <2e-16 ***
## female -0.01918 0.09955 -0.193 0.847
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 3142.8 on 6148 degrees of freedom
## Residual deviance: 3142.8 on 6147 degrees of freedom
## AIC: 3146.8
##
## Number of Fisher Scoring iterations: 5
fit_uni_move <- glm(low_grip~house_move,family = binomial,data=datiSHAREStress)
summary(fit_uni_move)
##
## Call:
## glm(formula = low_grip ~ house_move, family = binomial, data = datiSHAREStress)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -0.4221 -0.3774 -0.3774 -0.3774 2.3141
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -2.3735 0.1307 -18.161 <2e-16 ***
## house_move -0.2329 0.1413 -1.648 0.0993 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 3142.8 on 6148 degrees of freedom
## Residual deviance: 3140.2 on 6147 degrees of freedom
## AIC: 3144.2
##
## Number of Fisher Scoring iterations: 5
fit_uni_ses <- glm(low_grip~ses,family = binomial,data=datiSHAREStress)
summary(fit_uni_ses)
##
## Call:
## glm(formula = low_grip ~ ses, family = binomial, data = datiSHAREStress)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -0.5066 -0.3813 -0.3614 -0.3614 2.3500
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -2.69592 0.07324 -36.809 < 2e-16 ***
## sesGreat Difficulties 0.70745 0.23408 3.022 0.00251 **
## sesSome Difficulties 0.37973 0.14288 2.658 0.00787 **
## sesFairly Easily 0.11084 0.11422 0.970 0.33182
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 3142.8 on 6148 degrees of freedom
## Residual deviance: 3129.9 on 6145 degrees of freedom
## AIC: 3137.9
##
## Number of Fisher Scoring iterations: 5
When we have covariates with many levels, such as ses, we can use the global Wald test which takes into account for multiple testing.
drop1(fit_uni_ses,test="Chisq")
## Single term deletions
##
## Model:
## low_grip ~ ses
## Df Deviance AIC LRT Pr(>Chi)
## <none> 3129.8 3137.8
## ses 3 3142.8 3144.8 12.988 0.004663 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
We can now fit a multivariable logistic model with the variables that
were found to be associated with the outcome in the previous analysis
and we suspect could act as confounders of the main exposure of
interest.
Since we have not the possibility do discuss with an expert, we decide
to include in the multivariable regression model all covariates with a
p-value < 0.1. This criterion could be not the best one, but is
widely applied in clinical and epidemiological research.
We obviously include in the model stress , since it is our exposure of interest.
We will treat the others variables are possible confounders in the model, and we will explore if significant associations are present with the exposure of interest. In principle, in the explanatory setting, we should start from a DAG and make the causal assumptions (in order to estimate a causal effect!!) but we do not have here the experts of the matter and we will limit ourselves in the end to a cautious interpretation of the estimated association.
fit_multi <- glm(low_grip~stress+no_paid_job+age+house_move+ses,family = binomial,data=datiSHAREStress)
summary(fit_multi)
##
## Call:
## glm(formula = low_grip ~ stress + no_paid_job + age + house_move +
## ses, family = binomial, data = datiSHAREStress)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.3619 -0.3994 -0.2864 -0.2075 2.9996
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -9.486409 0.448437 -21.154 < 2e-16 ***
## stress -0.149441 0.438194 -0.341 0.733075
## no_paid_job 0.408889 0.233225 1.753 0.079568 .
## age 0.101474 0.005924 17.129 < 2e-16 ***
## house_move -0.196669 0.147181 -1.336 0.181472
## sesGreat Difficulties 0.939131 0.249104 3.770 0.000163 ***
## sesSome Difficulties 0.489696 0.149577 3.274 0.001061 **
## sesFairly Easily 0.066057 0.118419 0.558 0.576962
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 3142.8 on 6148 degrees of freedom
## Residual deviance: 2790.2 on 6141 degrees of freedom
## AIC: 2806.2
##
## Number of Fisher Scoring iterations: 6
This model estimates the coefficients of each of the variables independently from all the others. This is key concept of multivariable regression and it is what allows to remove confounding.
At a first sight it seems that our exposure of interest is not associated with the outcome.
Knowledge of the context from which the data comes from can be used to make hypothesis about possible interaction between confounders and the exposure.
In this case we want to test for an interaction between no_paid_job and stress. In other words we want to see if the association between the outcome and the exposure could vary for different levels of no_paid_job.
We then fit the model:
fit_multi2 <- glm(low_grip~stress*no_paid_job+house_move+age+ses,family = binomial,data=datiSHAREStress)
summary(fit_multi2)
##
## Call:
## glm(formula = low_grip ~ stress * no_paid_job + house_move +
## age + ses, family = binomial, data = datiSHAREStress)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.3766 -0.3996 -0.2850 -0.2054 3.0626
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -9.544923 0.450526 -21.186 < 2e-16 ***
## stress -0.812176 0.603047 -1.347 0.178049
## no_paid_job 0.267759 0.243706 1.099 0.271901
## house_move -0.181885 0.147844 -1.230 0.218604
## age 0.102246 0.005946 17.197 < 2e-16 ***
## sesGreat Difficulties 0.927537 0.249828 3.713 0.000205 ***
## sesSome Difficulties 0.490455 0.149577 3.279 0.001042 **
## sesFairly Easily 0.060946 0.118563 0.514 0.607225
## stress:no_paid_job 3.189101 1.054703 3.024 0.002497 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 3142.8 on 6148 degrees of freedom
## Residual deviance: 2781.6 on 6140 degrees of freedom
## AIC: 2799.6
##
## Number of Fisher Scoring iterations: 6
As a reminder, using * introduces both the main effect and the interaction effect between variables in the model.
Alternatively, we can also fit the same model as follows:
fit_multi2 <- glm(low_grip~stress+no_paid_job+age+ses+stress:no_paid_job+house_move,family = binomial,data=datiSHAREStress)
This second way of adding an interaction term is useful in case we have multiple interactions involving the same variable.
It seems that a significant interaction is present between stress and the no_paid_job confounder. Note that in any case, the main effect of stress is still not significant in the model.
Now that we have fitted these two models, which one should we choose?
We can use a formal statistical test to compare the two nested models:
anova(fit_multi,fit_multi2,test="Chisq")
## Analysis of Deviance Table
##
## Model 1: low_grip ~ stress + no_paid_job + age + house_move + ses
## Model 2: low_grip ~ stress + no_paid_job + age + ses + stress:no_paid_job +
## house_move
## Resid. Df Resid. Dev Df Deviance Pr(>Chi)
## 1 6141 2790.2
## 2 6140 2781.6 1 8.5938 0.003373 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The null hypothesis of this test is that the simpler model (i.e. the one with the smaller number of regression parameters) is no different from the more complex model. In this case we reject that hypothesis at a 95% confidence level and we would keep the model with the interaction.
Even if we are here estimating an explanatory model, this does not means that we are not at all interested in evaluating if the predicted probabilities are in line with the observed event rates. A basic procedure to evaluate this aspect in logistic regression is the Hosmer-Lemeshow test (see for details: https://onlinelibrary.wiley.com/doi/book/10.1002/0471722146). In brief, the null hypothesis of the test is that the predicted probabilities (splitted in ordinal groups) are in line with the observed rates of the events in each group.
library(generalhoslem)
logitgof(datiSHAREStress$low_grip,fitted(fit_multi2))
##
## Hosmer and Lemeshow test (binary model)
##
## data: datiSHAREStress$low_grip, fitted(fit_multi2)
## X-squared = 5.6329, df = 8, p-value = 0.6883
The model doesn’t show evidence for poor goodness of fit/calibration. Remind that when we are instead working with models used specifically for prediction, we should use more refined analyses such that the bootstrap overfitting-corrected calibration curves.
The next step is interpreting the results obtained from the model which is one of the most important part of the analysis to be discussed with experts. Of course we are keeping things simple here: in reality we would have had to consider many other possible candidate confounders for the outcome in order to properly control for confounding in this observational study, at least for the measured ones.
We obtain the OR, which is the measure of association selected when reporting and interpreting the results of a logistic model.
We start with the OR estimated for age:
exp(fit_multi2$coefficients)[4]
## age
## 1.107656
In general, OR>1 indicate an increase of the probability of the outcome whereas OR<1 a decrease. But the question is : with respect to what? We know that the OR is a relative measure of association so we must always ask ourselves what comparison we are making when we report an OR we have estimated. The above output does not mean anything by itself.
The first important thing to remember is that age is a continuous variable (measured in years) and in the model we have assumed it has a linear effect on the log-odds of the outcome. When we obtain the \(\hat{OR}\) by exponentiating the coefficient for age we obtain the OR estimates for an increase of one year of age. Being the \(\hat{OR}>1\), the probability of developing low grip increases as the age increases. Specifically, a subject has odds of having low grip 1.11 times greater with respect to a subject who is 1 year younger.
For continuous variables is important to report an OR for a difference that is relevant in the application at hand; a proper choice will depend of the scale the variable is measured and on the magnitude of the estimated coefficient.
For example here 1 year may not be so relevant from an epidemiological point of view, we may want to report an OR for a 5 years difference in age:
exp(fit_multi2$coefficients*5)[4]
## age
## 1.66734
What if we want to interpret the OR of a binary variable? Let’s consider the variable house_move. If it is coded numerical, then the coefficients refers always to the level 1. In this case 1 stands for having moved country in the past, so if the estimated coefficient was significant we could say that subjects who moved seem to have an odds lower by 17% (\(1-\hat{OR}\)) compared to people that have never moved countries in their life. However, when we look at the 95% CI we observe that it contains 1 so the association does not seem to be statistically significant.
#OR
exp(fit_multi2$coefficients)[8]
## house_move
## 0.8336973
exp(confint.default(fit_multi2))[8,]
## 2.5 % 97.5 %
## 0.6239695 1.1139185
The interpretation of the OR for categorical variables is similar: we will have to keep in mind that we are always comparing each of the variable levels to the reference level. Here, it seems (interestingly!) that the odds of health worsening is the same for subject with good or fairly good self-perceived socio-economic status while it increases as the economic situation gets worse.
# OR
exp(fit_multi2$coefficients)[5:7]
## sesGreat Difficulties sesSome Difficulties sesFairly Easily
## 2.528273 1.633059 1.062842
# 95% CI
exp(confint.default(fit_multi2))[5:7,]
## 2.5 % 97.5 %
## sesGreat Difficulties 1.549423 4.125513
## sesSome Difficulties 1.218097 2.189385
## sesFairly Easily 0.842456 1.340880
Finally, we will obtain the \(\hat{OR}s\) for stress and no_paid_job. Since there is an interaction involved, we have to be a bit more careful and we have to consider the two variables together. So first of all we have to know which is our reference group. In our case this is the group who have not experienced a stressful period and have done some paid job in their life (the reference levels for both variables involved in the interaction). Then, we can consider all combinations of the levels of the variables.
If we want the \(\hat{OR}\) for undergoing a stressful period and having had a paid job with respect to not having had a period of stress and having had a paid job we simply exponentiate the main estimated coefficient for stress (keeping at the same values the others covariates in the model):
exp(fit_multi2$coefficients)[2]
## stress
## 0.4438912
and we don’t reject the null hypothesis it is equal to 1 (no significant effect):
exp(confint.default(fit_multi2))[2,]
## 2.5 % 97.5 %
## 0.1361326 1.4474079
On the other hand, if we want the \(\hat{OR}\) for never having had a paid job and not being exposed to stress with respect to having had a paid job and not being exposed to stress we simply exponentiate the coefficient for the main effect of the variable no_paid_job:
exp(fit_multi2$coefficients)[3]
## no_paid_job
## 1.307032
Also this effect is not statistically significant which means that there does not seem to be a difference in the risk of low hand grip with regards to job experience in the non-exposed group (no stress) keeping fixed all others covariates.
exp(confint.default(fit_multi2))[3,]
## 2.5 % 97.5 %
## 0.8106679 2.1073152
Last, we can obtain the OR for subjects who were stressed and had never have a paid job with respect to the reference group (not experienced a stressful period and have done some paid job):
exp(fit_multi2$coefficients[9]+fit_multi2$coefficients[2]+fit_multi2$coefficients[3])
## stress:no_paid_job
## 14.07899
So in this subgroup it seems that the probability of developing low grip increases.
However, we should evaluate also the 95% CI for this \(\hat{OR}\).
One possibility to do it is to re-fit the multivariable logistic model using the rms R package:
library(rms)
dd <- datadist(datiSHAREStress)
options(datadist='dd') #define ranges of the covariates
fit_multi2b <- lrm(low_grip~stress*no_paid_job+age+ses+house_move,data=datiSHAREStress)
The code is quite similar, the main difference is that first we have to run the datadist function which stores the distribution summaries of the variables.
The print function returns a very similar output of the summary for the glm() object.
print(fit_multi2b)
## Logistic Regression Model
##
## lrm(formula = low_grip ~ stress * no_paid_job + age + ses + house_move,
## data = datiSHAREStress)
##
## Model Likelihood Discrimination Rank Discrim.
## Ratio Test Indexes Indexes
## Obs 6149 LR chi2 361.20 R2 0.143 C 0.752
## 0 5714 d.f. 8 R2(8,6149)0.056 Dxy 0.504
## 1 435 Pr(> chi2) <0.0001 R2(8,1212.7)0.253 gamma 0.505
## max |deriv| 2e-06 Brier 0.061 tau-a 0.066
##
## Coef S.E. Wald Z Pr(>|Z|)
## Intercept -9.5449 0.4505 -21.19 <0.0001
## stress -0.8122 0.6031 -1.35 0.1781
## no_paid_job 0.2678 0.2437 1.10 0.2719
## age 0.1022 0.0059 17.20 <0.0001
## ses=Great Difficulties 0.9275 0.2498 3.71 0.0002
## ses=Some Difficulties 0.4905 0.1496 3.28 0.0010
## ses=Fairly Easily 0.0609 0.1186 0.51 0.6072
## house_move -0.1819 0.1478 -1.23 0.2186
## stress * no_paid_job 3.1891 1.0547 3.02 0.0025
The summary on the other hand gives you the estimates for the coefficients as well as the ones for the OR. Of note by default for the continuous variables here, such as age, the function calculates the OR for the difference between the \(1^{st}\) and the \(3^{rd}\) quartile.
summary(fit_multi2b)
## Effects Response : low_grip
##
## Factor Low High Diff. Effect S.E. Lower 0.95
## stress 0 1.0 1.0 -0.812180 0.603110 -1.99430
## Odds Ratio 0 1.0 1.0 0.443890 NA 0.13612
## no_paid_job 0 1.0 1.0 0.267760 0.243710 -0.20990
## Odds Ratio 0 1.0 1.0 1.307000 NA 0.81067
## age 58 70.8 12.8 1.308700 0.076105 1.15960
## Odds Ratio 58 70.8 12.8 3.701500 NA 3.18860
## house_move 0 1.0 1.0 -0.181880 0.147840 -0.47165
## Odds Ratio 0 1.0 1.0 0.833700 NA 0.62397
## ses - Great Difficulties:Easily 1 2.0 NA 0.927540 0.249830 0.43788
## Odds Ratio 1 2.0 NA 2.528300 NA 1.54940
## ses - Some Difficulties:Easily 1 3.0 NA 0.490460 0.149580 0.19729
## Odds Ratio 1 3.0 NA 1.633100 NA 1.21810
## ses - Fairly Easily:Easily 1 4.0 NA 0.060946 0.118560 -0.17143
## Odds Ratio 1 4.0 NA 1.062800 NA 0.84246
## Upper 0.95
## 0.36990
## 1.44760
## 0.74541
## 2.10730
## 1.45790
## 4.29700
## 0.10788
## 1.11390
## 1.41720
## 4.12550
## 0.78362
## 2.18940
## 0.29333
## 1.34090
##
## Adjusted to: stress=0 no_paid_job=0
So far, the OR for stress and the job experience are the ones for the main effects. We can easily obtain here the OR for the interaction with the built-in contrast function:
c <- contrast(fit_multi2b,
list(stress=1,no_paid_job=1), list(stress=0,no_paid_job=0),type="average")
c
## Contrast S.E. Lower Upper Z Pr(>|z|)
## 11 2.644684 0.8328779 1.012273 4.277095 3.18 0.0015
##
## Confidence intervals are 0.95 individual intervals
#OR and 95% CI
exp(c$Contrast)
## 1
## 14.07899
exp(c$Lower)
## 1
## 2.75185
exp(c$Upper)
## 1
## 72.03085
So it seems that there is a significant interaction between stress and no paid job on the outcome.
Remind our initial research question: whether a stressful period could be associated with the occurrence of muscular weakness in the over 50 population. Based on our analysis it seems that is not the single event of a stessful period that increase the occurrence of muscular weakness, but it is the joint impact of having a stressful period coupled with no paid job that is associated with the occurrence of muscular weakness, adjusting for (or independently from) the specific age, house condition or socio-economic status.