Pneumonia is a very common nosocomial infection in intensive care
units (ICUs). Many studies have investigated risk factors for the
development of infection and its consequences. We now explore a dataset
from a prospective cohort study, with 747 patients admitted in hospital,
and 97 had pneumonia at admission.
They were followed until they die in the ICU or they were discharged
alive. 657 were discharged alive, 76 died in ICU and 14 were censored
(i.e. we do not know if they die in hospital or they were discharged
alive: imagine for example that these patients have been transferred to
another hospital). The variable time in the dataset is the
length of hospital stay in days. The variable status is coded
as 0 for censoring, 1 for discharged alive, 2 for death. The variable
pneu indicates if the patient had pneumonia at admission.
Load the required libraries:
library(mvna)
library(survival)
Load the data:
data(sir.adm)
head(sir.adm)
## id pneu status time age sex
## 1 41 0 1 4 75.34153 F
## 2 395 0 1 24 19.17380 M
## 3 710 1 1 37 61.56568 M
## 4 3138 0 1 8 57.88038 F
## 5 3154 0 1 3 39.00639 M
## 6 3178 0 1 24 70.27762 M
table(sir.adm$pneu)
##
## 0 1
## 650 97
table(sir.adm$status)
##
## 0 1 2
## 14 657 76
If the status is 1 or 2 we have an uncensored observation in this example; if it is 0 is censored.
The first scientific question is : does pneumonia increase the risk of in-hospital mortality?
Let’see the raw data:
table(sir.adm$pneu,sir.adm$status)
##
## 0 1 2
## 0 6 589 55
## 1 8 68 21
my.table <- table(sir.adm$pneu,sir.adm$status)
round(prop.table(my.table,1),3)
##
## 0 1 2
## 0 0.009 0.906 0.085
## 1 0.082 0.701 0.216
It seems that pneumonia patients have more chances to die in hospital.
The second scientific question is : does pneumonia increase the risk of staying in the hospital (i.e. increase the length of hospital stay) ?
Now, first of all let’s explore the survival functions (in terms of length of stay) for a composite end-point, i.e. putting together in-hospital death or to be discharged alive (this composite end-point has no really a clinical meaning, it is just analysed here for didactic purposes):
plot(survfit(Surv(time, status != 0) ~ pneu, data = sir.adm), col = 1:2)
legend(100, 1, legend = c("no pneu", "pneu"), col = 1:2, lty = 1)
Pneumonia group shows longer hospital stays.
We can also obtain the corresponding cumulative hazard functions by using the Nelson-Aalen estimator:
plot(survfit(Surv(time, status != 0) ~ pneu, data = sir.adm), fun = "cumhaz", col = 1:2)
legend(0, 6.5, legend = c("no pneu", "pneu"), col = 1:2, lty = 1)
Let’s now quantify the effect of pneumonia on length of stay by means of a Cox model on the composite end-point:
cox <- coxph(Surv(time, status != 0) ~ pneu, data = sir.adm)
summary(cox)
## Call:
## coxph(formula = Surv(time, status != 0) ~ pneu, data = sir.adm)
##
## n= 747, number of events= 733
##
## coef exp(coef) se(coef) z Pr(>|z|)
## pneu -0.9481 0.3875 0.1152 -8.227 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## exp(coef) exp(-coef) lower .95 upper .95
## pneu 0.3875 2.581 0.3092 0.4857
##
## Concordance= 0.575 (se = 0.008 )
## Likelihood ratio test= 83.89 on 1 df, p=<2e-16
## Wald test = 67.69 on 1 df, p=<2e-16
## Score (logrank) test = 72.39 on 1 df, p=<2e-16
Pneumonia multiplies the hazard by 0.39, i.e. it is a significant factor in increasing the stay in the hospital (by decreasing the instantaneous risk to leave the hospital).
Now, we split the dataset in two subgroups according to the presence of pneumonia and we re-obtain an estimate of the cumulative hazard, this time based on the Cox model estimation (this is called the Breslow estimator of the cumulative hazard: remind that the Cox model is composed by two parts, the effect of the covariates and an estimate of the baseline hazard).
To do so, we have to extract the estimated cumulative hazard from the Cox model on the data of two patient, one with pneumonia and the other without (that are the values of cumulative hazard estimated in the Cox model for the two corresponding groups):
survfit(cox, newdata = sir.adm[1,]) -> cox.nop
survfit(cox, newdata = sir.adm[3,]) -> cox.p
We can now compare the non-parametric Nelson-Aalen estimator previously obtained with this estimator obtained from the Cox model:
plot(survfit(Surv(time, status != 0) ~ pneu, data = sir.adm), fun = "cumhaz", col = 1:2)
legend(0, 6.5, legend = c("no pneu", "pneu"), col = 1:2, lty = 1)
lines(cox.nop$time, cox.nop$cumhaz, type = "s", lty = 2)
lines(cox.p$time, cox.p$cumhaz, type = "s", lty = 2, col = 2)
They are quite close, and this is also a sort of goodness of fit plot for the cox model.Remind that here there are not competing risk, we are using the composite end-point.
For the 76 patients died in-hospital we will never observe their time to be discharged alive.
Conversely, for the 657 subjects discharged alive, we will not observe in this study their time to death.
From the raw data we saw that for the pneumonia group the chances to be discharged alive are smaller.
We also have seen that patients with pneumonia has more chance to stay longer in the hospital.
But, now the question is : patients with pneumonia have a different cause-specific hazard of death ?
Let’s now first of all estimate the cumulative incidence functions for these two distinct type of events in the two groups, taking into account the competing risks (Aalen-Johansen estimator):
library(cmprsk)
my.sir.cif <- cuminc(sir.adm$time, sir.adm$status, group = sir.adm$pneu, cencode =0)
my.sir.cif
## Tests:
## stat pv df
## 1 42.07480 8.784784e-11 1
## 2 16.16525 5.804939e-05 1
## Estimates and Variances:
## $est
## 50 100 150
## 0 1 0.88998816 0.91100110 0.91261748
## 1 1 0.64633198 0.73293123 NA
## 0 2 0.07930062 0.08576614 0.08576614
## 1 2 0.18393349 0.22550113 NA
##
## $var
## 50 100 150
## 0 1 0.0001517978 0.0001266104 0.0001250158
## 1 1 0.0025299050 0.0022581560 NA
## 0 2 0.0001125301 0.0001222882 0.0001222882
## 1 2 0.0016495598 0.0020723886 NA
We can also visualize the estimated CIFs:
plot(my.sir.cif, main = "CIFs", xlab = "Days", curvlab=c("No pneumo alive", "Pneumo alive", "No pneumo Death", "Pneumo Death"), ylim=c(0,1.4))
And we can also calculate a test statistic to compare the CIFs:
my.sir.cif$Tests
## stat pv df
## 1 42.07480 8.784784e-11 1
## 2 16.16525 5.804939e-05 1
From the test results, a significant difference is observed in both CIFs with respect to the presence/absence of pneumonia for the two types of events.
Note that these functions are quite different from the corresponding 1-KM curves: 1-KM over-estimate the incidence of the event of interest.
sir.adm$status_1_KM <- ifelse(sir.adm$status==1, 1, 0)
sir.adm$status_2_KM <- ifelse(sir.adm$status==2, 1, 0)
table(sir.adm$status_1_KM)
##
## 0 1
## 90 657
table(sir.adm$status_2_KM)
##
## 0 1
## 671 76
KM_1 <- survfit(Surv(time, status_1_KM) ~ pneu, data = sir.adm)
KM_2 <- survfit(Surv(time, status_2_KM) ~ pneu, data = sir.adm)
fup01 <- my.sir.cif$"0 1"$time
est01 <- my.sir.cif$"0 1"$est
fup11 <- my.sir.cif$"1 1"$time
est11 <- my.sir.cif$"1 1"$est
fup02 <- my.sir.cif$"0 2"$time
est02 <- my.sir.cif$"0 2"$est
fup12 <- my.sir.cif$"1 2"$time
est12 <- my.sir.cif$"1 2"$est
plot(KM_1, fun = "event", col = 1:2)
lines(fup01,est01, col="black", lty=2, lwd=2)
lines(fup11,est11, col="red", lty=2, lwd=2)
plot(KM_2, fun = "event", col = 1:2)
lines(fup02,est02, col="black", lty=2, lwd=2)
lines(fup12,est12, col="red", lty=2, lwd=2)
Let’s now estimate two cause-specific Cox models in the general population:
summary(coxph(Surv(time,status==1)~pneu,data=sir.adm))
## Call:
## coxph(formula = Surv(time, status == 1) ~ pneu, data = sir.adm)
##
## n= 747, number of events= 657
##
## coef exp(coef) se(coef) z Pr(>|z|)
## pneu -1.0901 0.3362 0.1299 -8.391 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## exp(coef) exp(-coef) lower .95 upper .95
## pneu 0.3362 2.974 0.2606 0.4337
##
## Concordance= 0.577 (se = 0.008 )
## Likelihood ratio test= 91.7 on 1 df, p=<2e-16
## Wald test = 70.4 on 1 df, p=<2e-16
## Score (logrank) test = 77.09 on 1 df, p=<2e-16
Pneumonia multiply the hazard to be discharged alive by a factor of 0.34. It is significant.
summary(coxph(Surv(time,status==2)~pneu,data=sir.adm))
## Call:
## coxph(formula = Surv(time, status == 2) ~ pneu, data = sir.adm)
##
## n= 747, number of events= 76
##
## coef exp(coef) se(coef) z Pr(>|z|)
## pneu -0.1622 0.8503 0.2678 -0.606 0.545
##
## exp(coef) exp(-coef) lower .95 upper .95
## pneu 0.8503 1.176 0.503 1.437
##
## Concordance= 0.531 (se = 0.023 )
## Likelihood ratio test= 0.37 on 1 df, p=0.5
## Wald test = 0.37 on 1 df, p=0.5
## Score (logrank) test = 0.37 on 1 df, p=0.5
Pneumonia multiply the hazard to die in the hospital by a factor of 0.85, but with a 95% CI from 0.50 to 1.43. It is not significant.Therefore having pneumonia does not significantly increase or decrease the instantaneous risk of death.
Note that if we want now to obtain the estimated cumulative incidence functions, related to the Cox models, we should switch to a multi-state approach:
sir.adm$eventMM <- factor(sir.adm$status, 0:2, labels=c("censor", "discharged alive", "death"))
table(sir.adm$eventMM)
##
## censor discharged alive death
## 14 657 76
fitMM <- coxph(Surv(time, eventMM) ~ pneu, sir.adm, id=id)
print(fitMM, digits=2)
## Call:
## coxph(formula = Surv(time, eventMM) ~ pneu, data = sir.adm, id = id)
##
##
## 1:2 coef exp(coef) se(coef) robust se z p
## pneu -1.09 0.34 0.13 0.11 -10 <2e-16
##
##
## 1:3 coef exp(coef) se(coef) robust se z p
## pneu -0.16 0.85 0.27 0.26 -0.6 0.5
##
## States: 1= (s0), 2= discharged alive, 3= death
##
## Likelihood ratio test=92 on 2 df, p=<2e-16
## n= 747, number of events= 733
dummy <- data.frame(pneu=c(0,1))
csurv <- survfit(fitMM, newdata=dummy)
plot(csurv[,'discharged alive'], ylab="CIF (from Cox)", col=1:2, lty=c(1,1), lwd=2)
lines(csurv[,'death'], ylab="CIF death", col=1:2, lty=c(2,2), lwd=2)
If instead we use the subdistribution hazards regression approach (Fine and Gray, just mentioned in the slides):
crr(ftime=sir.adm$time,fstatus=sir.adm$status,cov1=sir.adm$pneu,failcode=1,cencode=0) -> res.phsd1
res.phsd1
## convergence: TRUE
## coefficients:
## sir.adm$pneu1
## -0.9069
## standard errors:
## [1] 0.1033
## two-sided p-values:
## sir.adm$pneu1
## 0
exp(res.phsd1$coef)
## sir.adm$pneu1
## 0.4037637
Pneumonia multiply the subdistribution hazard to be discharged alive by a factor of 0.40. It is significant.
crr(ftime=sir.adm$time,fstatus=sir.adm$status,cov1=sir.adm$pneu,failcode=2,cencode=0) -> res.phsd2
res.phsd2
## convergence: TRUE
## coefficients:
## sir.adm$pneu1
## 0.9749
## standard errors:
## [1] 0.2496
## two-sided p-values:
## sir.adm$pneu1
## 9.4e-05
exp(res.phsd2$coef)
## sir.adm$pneu1
## 2.650951
Pneumonia multiply the subdistribution hazard to die in the hospital by a factor of 2.65. It is significant.
What does this means ? Why the two regression models give us a different picture in terms of hazards ?
This difference indicate that since pneumonia patients stay longer in the hospital, if we take into account the cumulative hazard scale, their risk of death is increased with respect to non-pneumonia patients. But, their instantaneous cause-specific risk of death at each time t, is not different from non-pneumonia patients. We observe them for longer time in hospital and this carries out an increased proportions of deaths in-hospital.