Load libraries:

library(survival)
library(MASS)
library(rms)
library(ISwR)
library(survMisc)
library(survminer)
library(KMsurv)

Recap of the basic steps

Loading and viewing dataset melanom:

dati <- melanom
head(dati)

##    no status days ulc thick sex
## 1 789      3   10   1   676   2
## 2  13      3   30   2    65   2
## 3  97      2   35   2   134   2
## 4  16      3   99   2   290   1
## 5  21      1  185   1  1208   2
## 6 469      1  204   1   484   2

attach(dati)

Create the survival object:

msurv <- with(dati, Surv(days, status == 1))
msurv

##   [1]   10+   30+   35+   99+  185   204   210   232   232+  279   295   355+
##  [13]  386   426   469   493+  529   621   629   659   667   718   752   779 
##  [25]  793   817   826+  833   858   869   872   967   977   982  1041  1055 
##  [37] 1062  1075  1156  1228  1252  1271  1312  1427+ 1435  1499+ 1506  1508+
##  [49] 1510+ 1512+ 1516  1525+ 1542+ 1548  1557+ 1560  1563+ 1584  1605+ 1621 
##  [61] 1627+ 1634+ 1641+ 1641+ 1648+ 1652+ 1654+ 1654+ 1667  1678+ 1685+ 1690 
##  [73] 1710+ 1710+ 1726  1745+ 1762+ 1779+ 1787+ 1787+ 1793+ 1804+ 1812+ 1836+
##  [85] 1839+ 1839+ 1854+ 1856+ 1860+ 1864+ 1899+ 1914+ 1919+ 1920+ 1927+ 1933 
##  [97] 1942+ 1955+ 1956+ 1958+ 1963+ 1970+ 2005+ 2007+ 2011+ 2024+ 2028+ 2038+
## [109] 2056+ 2059+ 2061  2062  2075+ 2085+ 2102+ 2103  2104+ 2108  2112+ 2150+
## [121] 2156+ 2165+ 2209+ 2227+ 2227+ 2256  2264+ 2339+ 2361+ 2387+ 2388  2403+
## [133] 2426+ 2426+ 2431+ 2460+ 2467  2492+ 2493+ 2521+ 2542+ 2559+ 2565  2570+
## [145] 2660+ 2666+ 2676+ 2738+ 2782  2787+ 2984+ 3032+ 3040+ 3042  3067+ 3079+
## [157] 3101+ 3144+ 3152+ 3154+ 3180+ 3182+ 3185+ 3199+ 3228+ 3229+ 3278+ 3297+
## [169] 3328+ 3330+ 3338  3383+ 3384+ 3385+ 3388+ 3402+ 3441+ 3458+ 3459+ 3459+
## [181] 3476+ 3523+ 3667+ 3695+ 3695+ 3776+ 3776+ 3830+ 3856+ 3872+ 3909+ 3968+
## [193] 4001+ 4103+ 4119+ 4124+ 4207+ 4310+ 4390+ 4479+ 4492+ 4668+ 4688+ 4926+
## [205] 5565+

Histogram of the follow up (days) and descriptive statistics:

hist(days)

summary(days)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##      10    1525    2005    2153    3042    5565

Estimate the KM curve:

mfit <- survfit(Surv(days, status==1)~1, data = dati)
mfit

## Call: survfit(formula = Surv(days, status == 1) ~ 1, data = dati)
## 
##        n events median 0.95LCL 0.95UCL
## [1,] 205     57     NA      NA      NA

Output of the KM estimate at different time points:

summary(mfit, times=c(1000,3000,4000))

## Call: survfit(formula = Surv(days, status == 1) ~ 1, data = dati)
## 
##  time n.risk n.event survival std.err lower 95% CI upper 95% CI
##  1000    171      26    0.869  0.0240        0.823        0.917
##  3000     54      29    0.677  0.0385        0.605        0.757
##  4000     13       2    0.645  0.0431        0.566        0.735

Graphical representation of the curve:

plot(mfit)
abline(h=0.50, col="red",lty=2)

In this case the median survival time is not even interesting because the estimate is infinite.

The survival curve does not cross the 50% mark before all patients are censored.

If you want to see also censored times in the plot:

plot(mfit,mark.time=T)
abline(h=0.50, col="red",lty=2)

if you want to see also the number at risk at different time points:

ggsurvplot(mfit, data=dati, risk.table = TRUE)

Remember that estimate of the curve is reliable until the risk set has a sufficient sample size.
As a rule of thumb: the curve can become unreliable when the number of patients at risk is less than 15. The width of confidence intervals will help to decide.

Estimate survival curves by gender:

surv.bysex <- survfit(Surv(days,status==1)~sex)

Plot the curves:

plot(surv.bysex, conf.int=T,col=c("red","black"))

Other types of graphical representations:

ggsurvplot(surv.bysex, data=dati, risk.table = TRUE)

ggsurvplot(surv.bysex, data=dati, risk.table ="abs_pct")

Comparing survival curves : the log-rank test

Log-rank test to see if there is a significant difference between males and females: the survdiff function implements different tests specified by a parameter called rho, allowing a non-proportional hazards alternative to the null hypothesis, but the default value of rho gives the log-rank test.

survdiff(Surv(days,status==1)~sex)

## Call:
## survdiff(formula = Surv(days, status == 1) ~ sex)
## 
##         N Observed Expected (O-E)^2/E (O-E)^2/V
## sex=1 126       28     37.1      2.25      6.47
## sex=2  79       29     19.9      4.21      6.47
## 
##  Chisq= 6.5  on 1 degrees of freedom, p= 0.01

We conclude that there is a significant difference in survival between males and females.

Stratified log-rank test using presence of ulcer as strata:

surv.bysex.ulc <- survdiff(Surv(days,status==1)~sex+strata(ulc))
surv.bysex.ulc

## Call:
## survdiff(formula = Surv(days, status == 1) ~ sex + strata(ulc))
## 
##         N Observed Expected (O-E)^2/E (O-E)^2/V
## sex=1 126       28     34.7      1.28      3.31
## sex=2  79       29     22.3      1.99      3.31
## 
##  Chisq= 3.3  on 1 degrees of freedom, p= 0.07

To make separate graphs:

index.ulc1 <- ulc==1
index.ulc2 <- ulc==2

surv.bysex1 <- survfit(Surv(days,status==1)~sex, data=dati[index.ulc1,])
surv.bysex2 <- survfit(Surv(days,status==1)~sex, data=dati[index.ulc2,])


par(pty=("s"), mfrow=c(1,2))
plot(surv.bysex1, conf.int=T,col=c("red","black"))
title("Presence of ulcer")
plot(surv.bysex2, conf.int=T,col=c("red","black"))
title("Absence of ulcer")

Stratification makes the effect of sex less significant. A possible explanation might be that males seek treatment when the disease is in a more advanced state than women do, so that the gender difference is reduced when adjusted for a measure of disease progression.

Extension for more than two groups

Example of log rank with more than 2 groups:

data(myeloma)
fit <- survfit(Surv(time, event) ~ molecular_group, data = myeloma)
ggsurvplot(fit, legend.labs = levels(myeloma$molecular_group),
           pval = TRUE)

This plot shows the global p value across groups. Let’s see the differences between pairs (adjusting for multiple comparisons):

res <- pairwise_survdiff(Surv(time, event) ~ molecular_group,
                         data = myeloma)
res

## 
##  Pairwise comparisons using Log-Rank test 
## 
## data:  myeloma and molecular_group 
## 
##                  Cyclin D-1 Cyclin D-2 Hyperdiploid Low bone disease MAF  
## Cyclin D-2       0.723      -          -            -                -    
## Hyperdiploid     0.943      0.723      -            -                -    
## Low bone disease 0.723      0.988      0.644        -                -    
## MAF              0.644      0.447      0.523        0.485            -    
## MMSET            0.328      0.103      0.103        0.103            0.723
## Proliferation    0.103      0.038      0.038        0.062            0.485
##                  MMSET
## Cyclin D-2       -    
## Hyperdiploid     -    
## Low bone disease -    
## MAF              -    
## MMSET            -    
## Proliferation    0.527
## 
## P value adjustment method: BH

Proliferation vs Cyclin D-2 and vs Hyperdiploid show significant differences in survival curves.No others significant differences are detected between others groups.

Survival Analysis in R: comparing survival curves

Recap of the basic steps

Estimate survival curves by gender:

Comparing survival curves : the log-rank test

Stratified log-rank test using presence of ulcer as strata:

Extension for more than two groups