Load libraries:
library(survival)
library(MASS)
library(rms)
library(ISwR)
library(survMisc)
library(survminer)
library(KMsurv)
Loading and viewing dataset melanom:
dati <- melanom
head(dati)
## no status days ulc thick sex
## 1 789 3 10 1 676 2
## 2 13 3 30 2 65 2
## 3 97 2 35 2 134 2
## 4 16 3 99 2 290 1
## 5 21 1 185 1 1208 2
## 6 469 1 204 1 484 2
attach(dati)
Create the survival object:
msurv <- with(dati, Surv(days, status == 1))
msurv
## [1] 10+ 30+ 35+ 99+ 185 204 210 232 232+ 279 295 355+
## [13] 386 426 469 493+ 529 621 629 659 667 718 752 779
## [25] 793 817 826+ 833 858 869 872 967 977 982 1041 1055
## [37] 1062 1075 1156 1228 1252 1271 1312 1427+ 1435 1499+ 1506 1508+
## [49] 1510+ 1512+ 1516 1525+ 1542+ 1548 1557+ 1560 1563+ 1584 1605+ 1621
## [61] 1627+ 1634+ 1641+ 1641+ 1648+ 1652+ 1654+ 1654+ 1667 1678+ 1685+ 1690
## [73] 1710+ 1710+ 1726 1745+ 1762+ 1779+ 1787+ 1787+ 1793+ 1804+ 1812+ 1836+
## [85] 1839+ 1839+ 1854+ 1856+ 1860+ 1864+ 1899+ 1914+ 1919+ 1920+ 1927+ 1933
## [97] 1942+ 1955+ 1956+ 1958+ 1963+ 1970+ 2005+ 2007+ 2011+ 2024+ 2028+ 2038+
## [109] 2056+ 2059+ 2061 2062 2075+ 2085+ 2102+ 2103 2104+ 2108 2112+ 2150+
## [121] 2156+ 2165+ 2209+ 2227+ 2227+ 2256 2264+ 2339+ 2361+ 2387+ 2388 2403+
## [133] 2426+ 2426+ 2431+ 2460+ 2467 2492+ 2493+ 2521+ 2542+ 2559+ 2565 2570+
## [145] 2660+ 2666+ 2676+ 2738+ 2782 2787+ 2984+ 3032+ 3040+ 3042 3067+ 3079+
## [157] 3101+ 3144+ 3152+ 3154+ 3180+ 3182+ 3185+ 3199+ 3228+ 3229+ 3278+ 3297+
## [169] 3328+ 3330+ 3338 3383+ 3384+ 3385+ 3388+ 3402+ 3441+ 3458+ 3459+ 3459+
## [181] 3476+ 3523+ 3667+ 3695+ 3695+ 3776+ 3776+ 3830+ 3856+ 3872+ 3909+ 3968+
## [193] 4001+ 4103+ 4119+ 4124+ 4207+ 4310+ 4390+ 4479+ 4492+ 4668+ 4688+ 4926+
## [205] 5565+
Histogram of the follow up (days) and descriptive statistics:
hist(days)
summary(days)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 10 1525 2005 2153 3042 5565
Estimate the KM curve:
mfit <- survfit(Surv(days, status==1)~1, data = dati)
mfit
## Call: survfit(formula = Surv(days, status == 1) ~ 1, data = dati)
##
## n events median 0.95LCL 0.95UCL
## [1,] 205 57 NA NA NA
Output of the KM estimate at different time points:
summary(mfit, times=c(1000,3000,4000))
## Call: survfit(formula = Surv(days, status == 1) ~ 1, data = dati)
##
## time n.risk n.event survival std.err lower 95% CI upper 95% CI
## 1000 171 26 0.869 0.0240 0.823 0.917
## 3000 54 29 0.677 0.0385 0.605 0.757
## 4000 13 2 0.645 0.0431 0.566 0.735
Graphical representation of the curve:
plot(mfit)
abline(h=0.50, col="red",lty=2)
In this case the median survival time is not even interesting because the estimate is infinite.
The survival curve does not cross the 50% mark before all patients are censored.
If you want to see also censored times in the plot:
plot(mfit,mark.time=T)
abline(h=0.50, col="red",lty=2)
if you want to see also the number at risk at different time points:
ggsurvplot(mfit, data=dati, risk.table = TRUE)
Remember that estimate of the curve is reliable until the risk set
has a sufficient sample size.
As a rule of thumb: the curve can become unreliable when the number of
patients at risk is less than 15. The width of confidence intervals will
help to decide.
surv.bysex <- survfit(Surv(days,status==1)~sex)
Plot the curves:
plot(surv.bysex, conf.int=T,col=c("red","black"))
Other types of graphical representations:
ggsurvplot(surv.bysex, data=dati, risk.table = TRUE)
ggsurvplot(surv.bysex, data=dati, risk.table ="abs_pct")
Log-rank test to see if there is a significant difference between males and females: the survdiff function implements different tests specified by a parameter called rho, allowing a non-proportional hazards alternative to the null hypothesis, but the default value of rho gives the log-rank test.
survdiff(Surv(days,status==1)~sex)
## Call:
## survdiff(formula = Surv(days, status == 1) ~ sex)
##
## N Observed Expected (O-E)^2/E (O-E)^2/V
## sex=1 126 28 37.1 2.25 6.47
## sex=2 79 29 19.9 4.21 6.47
##
## Chisq= 6.5 on 1 degrees of freedom, p= 0.01
We conclude that there is a significant difference in survival between males and females.
surv.bysex.ulc <- survdiff(Surv(days,status==1)~sex+strata(ulc))
surv.bysex.ulc
## Call:
## survdiff(formula = Surv(days, status == 1) ~ sex + strata(ulc))
##
## N Observed Expected (O-E)^2/E (O-E)^2/V
## sex=1 126 28 34.7 1.28 3.31
## sex=2 79 29 22.3 1.99 3.31
##
## Chisq= 3.3 on 1 degrees of freedom, p= 0.07
To make separate graphs:
index.ulc1 <- ulc==1
index.ulc2 <- ulc==2
surv.bysex1 <- survfit(Surv(days,status==1)~sex, data=dati[index.ulc1,])
surv.bysex2 <- survfit(Surv(days,status==1)~sex, data=dati[index.ulc2,])
par(pty=("s"), mfrow=c(1,2))
plot(surv.bysex1, conf.int=T,col=c("red","black"))
title("Presence of ulcer")
plot(surv.bysex2, conf.int=T,col=c("red","black"))
title("Absence of ulcer")
Stratification makes the effect of sex less significant. A possible explanation might be that males seek treatment when the disease is in a more advanced state than women do, so that the gender difference is reduced when adjusted for a measure of disease progression.
Example of log rank with more than 2 groups:
data(myeloma)
fit <- survfit(Surv(time, event) ~ molecular_group, data = myeloma)
ggsurvplot(fit, legend.labs = levels(myeloma$molecular_group),
pval = TRUE)
This plot shows the global p value across groups. Let’s see the differences between pairs (adjusting for multiple comparisons):
res <- pairwise_survdiff(Surv(time, event) ~ molecular_group,
data = myeloma)
res
##
## Pairwise comparisons using Log-Rank test
##
## data: myeloma and molecular_group
##
## Cyclin D-1 Cyclin D-2 Hyperdiploid Low bone disease MAF
## Cyclin D-2 0.723 - - - -
## Hyperdiploid 0.943 0.723 - - -
## Low bone disease 0.723 0.988 0.644 - -
## MAF 0.644 0.447 0.523 0.485 -
## MMSET 0.328 0.103 0.103 0.103 0.723
## Proliferation 0.103 0.038 0.038 0.062 0.485
## MMSET
## Cyclin D-2 -
## Hyperdiploid -
## Low bone disease -
## MAF -
## MMSET -
## Proliferation 0.527
##
## P value adjustment method: BH
Proliferation vs Cyclin D-2 and vs Hyperdiploid show significant differences in survival curves.No others significant differences are detected between others groups.