For this example we will use the R package pROC. Its main function, roc, builds a ROC curve and returns an object of class "roc". This object can be printed, plotted, or passed to the functions auc, ci, smooth.roc and coords. Additionally, two roc objects can be compared with the function roc.test.

Data can be provided as response, predictor, where the predictor is the numeric (or ordered) level of the evaluated risk factor and the response encodes the observation class (control or case). The levels argument specifies which response value is taken as controls (first value of levels) and which as cases (second value). It can safely be omitted when the response is encoded as 0 and 1, but may give wrong results otherwise: by default, the first two values of levels(as.factor(response)) are taken as controls and cases, and any remaining levels are ignored. This means that if your response is coded "control" and "case", the levels will be inverted.
library(pROC)
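To make the behaviour of levels concrete, here is a minimal sketch on toy vectors (the resp and pred values below are invented for illustration and are not part of the data used in this tutorial):
# Toy data: response coded "control"/"case", predictor is a numeric score.
resp <- c("control", "control", "case", "case", "control", "case")
pred <- c(0.1, 0.3, 0.8, 0.6, 0.2, 0.9)
# Without 'levels', levels(as.factor(resp)) would be c("case", "control"),
# so "case" would silently be treated as the control class.
roc(resp, pred, levels = c("control", "case"))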
This dataset summarizes several clinical variables and one laboratory variable for 113 patients with an aneurysmal subarachnoid hemorrhage (aSAH). The purpose of the case presented here is to identify patients at risk of a poor post-aSAH outcome, as they require specific healthcare management; the clinical test must therefore be highly specific. Detailed results of the study are reported in [1]; here we only outline the features relevant to the ROC analysis.
data(aSAH)
head(aSAH)
## gos6 outcome gender age wfns s100b ndka
## 29 5 Good Female 42 1 0.13 3.01
## 30 5 Good Female 37 1 0.14 8.54
## 31 5 Good Female 42 1 0.10 8.09
## 32 5 Good Female 27 1 0.04 10.42
## 33 1 Poor Female 42 3 0.13 17.40
## 34 1 Poor Male 48 2 0.10 12.75
with(aSAH, table(gender, outcome))
## outcome
## gender Good Poor
## Male 22 20
## Female 50 21
Let’s estimate the ROC curve for the parameter s100b, a biomarker:
roc(aSAH$outcome, aSAH$s100b, levels = c("Good", "Poor"))
##
## Call:
## roc.default(response = aSAH$outcome, predictor = aSAH$s100b, levels = c("Good", "Poor"))
##
## Data: aSAH$s100b in 72 controls (aSAH$outcome Good) < 41 cases (aSAH$outcome Poor).
## Area under the curve: 0.7314
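As noted above, the returned roc object can be stored and passed on to helper functions such as auc, ci and coords. A short sketch (the variable name rocfit and the threshold 0.2 are arbitrary choices made for illustration):
rocfit <- roc(aSAH$outcome, aSAH$s100b, levels = c("Good", "Poor"))
auc(rocfit)  # area under the curve
ci(rocfit)   # 95% CI of the AUC (DeLong by default)
# sensitivity and specificity at an arbitrary example threshold of 0.2
coords(rocfit, x = 0.2, input = "threshold",
       ret = c("threshold", "sensitivity", "specificity"))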
Inspect the ROC plot and report the 95% CI:
roc(aSAH$outcome, aSAH$s100b, percent = TRUE, plot = TRUE, ci = TRUE)
##
## Call:
## roc.default(response = aSAH$outcome, predictor = aSAH$s100b, percent = TRUE, ci = TRUE, plot = TRUE)
##
## Data: aSAH$s100b in 72 controls (aSAH$outcome Good) < 41 cases (aSAH$outcome Poor).
## Area under the curve: 73.14%
## 95% CI: 63.01%-83.26% (DeLong)
We can also inspect the confidence intervals around our estimates, point by point:
rocobj <- plot.roc(aSAH$outcome, aSAH$s100b)
ci.sp.obj <- ci.sp(rocobj, sensitivities=seq(0, 1, .01), boot.n=100)
plot(rocobj)
plot(ci.sp.obj, type="shape", col="blue")
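The mirror image of this is ci.se, which bootstraps confidence intervals of sensitivity at fixed specificities; a sketch along the same lines (boot.n kept low for speed, as above):
ci.se.obj <- ci.se(rocobj, specificities = seq(0, 1, .01), boot.n = 100)
plot(rocobj)
plot(ci.se.obj, type = "shape", col = "lightblue")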
We can also compare the performance of two different predictors, here the WFNS clinical score and the s100b biomarker:
roc.test(response = aSAH$outcome, predictor1 = aSAH$wfns, predictor2 = aSAH$s100b)
##
## DeLong's test for two correlated ROC curves
##
## data: aSAH$wfns and aSAH$s100b by aSAH$outcome (Good, Poor)
## Z = 2.209, p-value = 0.02718
## alternative hypothesis: true difference in AUC is not equal to 0
## 95 percent confidence interval:
## 0.01040618 0.17421442
## sample estimates:
## AUC of roc1 AUC of roc2
## 0.8236789 0.7313686
And plot the two ROC curves in one graph:
roc.s100b <- roc(aSAH$outcome, aSAH$s100b)
roc.wfns <- roc(aSAH$outcome, aSAH$wfns)
plot(roc.s100b)
plot(roc.wfns, add=TRUE, col="red")
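If helpful, a plain base-R legend can label the two curves (the labels and position below are our choice; the colours match the two plot calls above):
legend("bottomright", legend = c("s100b", "WFNS"),
       col = c("black", "red"), lwd = 2)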
Load the required library:
library(OptimalCutpoints)
The package OptimalCutpoints calculates optimal cutpoints in diagnostic tests. Several methods or criteria for selecting optimal cutoffs have been implemented, including methods based on cost-benefit analysis, on diagnostic test accuracy measures (Sensitivity/Specificity, Predictive Values and Diagnostic Likelihood Ratios) and on prevalence (Lopez-Raton et al. 2014) [2]. As a basic example, we can use the Youden index approach:
perf.s100b <- optimal.cutpoints(X = "s100b", status = "outcome", tag.healthy = "Good",
                                methods = "Youden", data = aSAH, control = control.cutpoints(),
                                ci.fit = TRUE, conf.level = 0.95, trace = FALSE)
perf.s100b
##
## Call:
## optimal.cutpoints.default(X = "s100b", status = "outcome", tag.healthy = "Good",
## methods = "Youden", data = aSAH, control = control.cutpoints(),
## ci.fit = TRUE, conf.level = 0.95, trace = FALSE)
##
## Optimal cutoffs:
## Youden
## 1 0.2200
##
## Area under the ROC curve (AUC): 0.731 (0.63, 0.833)
Therefore, 0.22 is the selected cut-off for the biomarker: the value that maximizes the sum of Sensitivity and Specificity (the Youden index). But always pay attention to the shape of the functional relationship between the biomarker and the risk of the outcome; more on this in block 3.
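As a quick cross-check, the same Youden-optimal point can be retrieved from the pROC fit created earlier with coords; note that pROC reports thresholds as midpoints between consecutive observed predictor values, so the number may not be exactly 0.22:
coords(roc.s100b, x = "best", best.method = "youden",
       ret = c("threshold", "sensitivity", "specificity"))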
[1] Natacha Turck, Laszlo Vutskits, Paola Sanchez-Pena, Xavier Robin, Alexandre Hainard, Marianne Gex-Fabry, Catherine Fouda, Hadiji Bassem, Markus Mueller, Frédérique Lisacek, Louis Puybasset and Jean-Charles Sanchez (2010) “A multiparameter panel method for outcome prediction following aneurysmal subarachnoid hemorrhage”. Intensive Care Medicine 36(1), 107-115. DOI: 10.1007/s00134-009-1641-y.
[2] Lopez-Raton, M., Rodriguez-Alvarez, M.X, Cadarso-Suarez, C. and Gude-Sampedro, F. (2014). OptimalCutpoints: An R Package for Selecting Optimal Cutpoints in Diagnostic Tests. Journal of Statistical Software 61(8), 1-36. URL http://www.jstatsoft.org/v61/i08/.