Load the required library:

library(epiR)

We will use an example of data from a stress test used to detect the presence of coronary disease.

dat           <- as.table(matrix(c(815,115,208,327), nrow = 2, byrow = TRUE))
colnames(dat) <- c("Dis+","Dis-")
rownames(dat) <- c("Test+","Test-")
dat
##       Dis+ Dis-
## Test+  815  115
## Test-  208  327

Let’s estimate the parameters of interest related to the performance of the stress test in detecting the disease. The following function (epi.tests) computes true and apparent prevalence, sensitivity, specificity, positive and negative predictive values, and positive and negative likelihood ratios from count data provided in a 2 by 2 table. Exact binomial confidence limits are calculated for test sensitivity, specificity, and positive and negative predictive value.

Apparent prevalence is in this case the number of subjects testing positive by the diagnostic test divided by the total number of subjects in the sample tested. True prevalence is the actual number of diseased subjects divided by the number of individuals in the population (here estimated from the sample).

The positive likelihood ratio is calculated as \[LR+ = \frac{Sensitivity}{1 - Specificity}\] It represents the probability of a person who has the disease testing positive divided by the probability of a person who does not have the disease testing positive.

The negative likelihood ratio is calculated as \[LR- = \frac{1 - Sensitivity}{Specificity}\] It represents the probability of a person who has the disease testing negative divided by the probability of a person who does not have the disease testing negative.

rval <- epi.tests(dat, conf.level = 0.95)
print(rval)
##           Outcome +    Outcome -      Total
## Test +          815          115        930
## Test -          208          327        535
## Total          1023          442       1465
## 
## Point estimates and 95% CIs:
## --------------------------------------------------------------
## Apparent prevalence *                  0.63 (0.61, 0.66)
## True prevalence *                      0.70 (0.67, 0.72)
## Sensitivity *                          0.80 (0.77, 0.82)
## Specificity *                          0.74 (0.70, 0.78)
## Positive predictive value *            0.88 (0.85, 0.90)
## Negative predictive value *            0.61 (0.57, 0.65)
## Positive likelihood ratio              3.06 (2.61, 3.59)
## Negative likelihood ratio              0.27 (0.24, 0.31)
## False T+ proportion for true D- *      0.26 (0.22, 0.30)
## False T- proportion for true D+ *      0.20 (0.18, 0.23)
## False T+ proportion for T+ *           0.12 (0.10, 0.15)
## False T- proportion for T- *           0.39 (0.35, 0.43)
## Correctly classified proportion *      0.78 (0.76, 0.80)
## --------------------------------------------------------------
## * Exact CIs
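
The point estimates printed by epi.tests can be checked by hand from the four cell counts. The following is a minimal sketch in base R: the variable names (tp, fp, fn, tn and so on) are introduced here for illustration only, and binom.test is used on the assumption that its exact (Clopper-Pearson) interval corresponds to the "Exact CIs" reported in the output.

```r
# Cell counts from the 2 by 2 table (rows: test result, columns: disease status)
tp <- 815  # Test+, Dis+
fp <- 115  # Test+, Dis-
fn <- 208  # Test-, Dis+
tn <- 327  # Test-, Dis-
n  <- tp + fp + fn + tn

# Accuracy measures
sens <- tp / (tp + fn)          # sensitivity
spec <- tn / (tn + fp)          # specificity
ppv  <- tp / (tp + fp)          # positive predictive value
npv  <- tn / (tn + fn)          # negative predictive value

# Prevalences
app_prev  <- (tp + fp) / n      # apparent prevalence: proportion testing positive
true_prev <- (tp + fn) / n      # true prevalence: proportion diseased

# Likelihood ratios
lr_pos <- sens / (1 - spec)
lr_neg <- (1 - sens) / spec

est <- c(app_prev = app_prev, true_prev = true_prev,
         sens = sens, spec = spec, ppv = ppv, npv = npv,
         lr_pos = lr_pos, lr_neg = lr_neg)
print(round(est, 2))

# Exact binomial 95% confidence interval for the sensitivity
sens_ci <- binom.test(tp, tp + fn, conf.level = 0.95)$conf.int
print(round(sens_ci, 2))
```

The rounded values should agree with the corresponding rows of the epi.tests output, up to rounding.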

Comparing two binary diagnostic tests [Optional topic!!]

The comparison of the performance of two binary diagnostic tests is an important topic in clinical research. The most frequent type of sample design to compare two binary diagnostic tests is the paired design. This design consists of applying the two binary diagnostic tests to all of the individuals in a random sample, where the disease status of each individual is known through the application of a gold standard.

For this example we will use an R program called compbdt, written by Roldán-Nofuentes in 2020. You can download it from: https://bmcmedresmethodol.biomedcentral.com/articles/10.1186/s12874-020-00988-y#MOESM1

The objective is to estimate the sensitivity and the specificity, the likelihood ratios and the predictive values of each diagnostic test applying the confidence intervals with the best asymptotic performance.

The program compares the sensitivities and specificities of the two diagnostic tests simultaneously, as well as the likelihood ratios and the predictive values, applying the global hypothesis tests with the best performance in terms of type I error and power.

When the global hypothesis test is significant, the causes of the significance are investigated by solving the individual hypothesis tests and applying Holm's multiple comparison method.
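
Holm's step-down procedure can be sketched with base R's p.adjust. The p-values below are hypothetical and chosen only for illustration; they are not taken from compbdt, which performs this step internally.

```r
# Hypothetical p-values for two individual hypothesis tests (illustration only)
p_raw <- c(H0_first = 0.001, H0_second = 0.40)
alpha <- 0.05

# Holm adjustment: with two tests, the smallest p-value is multiplied by 2
# and the largest by 1; adjusted values are kept non-decreasing and capped at 1
p_holm <- p.adjust(p_raw, method = "holm")

# An individual hypothesis is rejected when its adjusted p-value is at most alpha
reject <- p_holm <= alpha

print(p_holm)
print(reject)
```

With these illustrative values only the first hypothesis is rejected, which mirrors the situation in which a significant global test is explained by just one of the two individual comparisons.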

The best-performing confidence intervals are also calculated for the difference or ratio between the respective parameters.

Based on the data observed in the sample, the program also estimates the probability of making a type II error if the null hypothesis is not rejected, or estimates the power if the null hypothesis is rejected in favour of the alternative. We will review some basic ideas about statistical testing procedures in Block 2, in relation to sample size determination.

| Type of errors | \(H_{0}\) TRUE | \(H_{0}\) FALSE |
|---|---|---|
| Reject \(H_{0}\) | Type I error \(\alpha\) | ok |
| Do not reject \(H_{0}\) | ok | Type II error \(\beta\) |

We define the power of a statistical test as the probability of not making a Type II error, i.e. \(1-\beta\). Estimating the probability of a type II error helps the researcher judge how reliable the conclusion is when the null hypothesis is not rejected.

We will apply this function to a real example on the diagnosis of coronary artery disease.

The diagnosis of coronary artery disease (CAD) was investigated using the exercise test (Test 1) and the clinical history of chest pain (Test 2) as diagnostic tests, and coronary angiography as the gold standard. The frequencies obtained by applying the two diagnostic tests and the gold standard to a sample of 871 individuals are given as input to the program below.
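
The cross-classification can be laid out in R as two 2 by 2 tables, one for diseased and one for non-diseased individuals. This is a sketch: the counts are those passed to compbdt below, the assignment of counts to cells is inferred from the sensitivities and specificities that the program reports, and the object names (diseased, non_diseased) are illustrative.

```r
# Row and column labels: Test 1 in rows, Test 2 in columns
labels <- list(Test1 = c("T1+", "T1-"), Test2 = c("T2+", "T2-"))

# Individuals with CAD according to coronary angiography
diseased <- as.table(matrix(c(473, 29,
                               81, 25),
                            nrow = 2, byrow = TRUE, dimnames = labels))

# Individuals without CAD according to coronary angiography
non_diseased <- as.table(matrix(c(22,  46,
                                  44, 151),
                                nrow = 2, byrow = TRUE, dimnames = labels))

# Show both tables with row and column totals
print(addmargins(diseased))
print(addmargins(non_diseased))

# Sample size and estimated prevalence of the disease
n_total    <- sum(diseased) + sum(non_diseased)
prevalence <- sum(diseased) / n_total
print(n_total)
print(round(100 * prevalence, 3))
```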

We source the R program that we will use to do the calculations:

source('C:\\Users\\Giulia Barbati\\OneDrive - Università degli Studi di Trieste\\Stat_Learning_Epi\\Block 1\\R codes\\compbdt.R')

We then run the program. Its input consists of eight frequencies: first the four counts for diseased patients (both tests positive; Test 1 positive and Test 2 negative; Test 1 negative and Test 2 positive; both tests negative), then the same four counts for non-diseased patients.

pp <- compbdt(473, 29, 81, 25, 22, 46, 44, 151)
## 
##           PREVALENCE OF THE DISEASE 
## 
## Estimated prevalence of the disease is  69.805 % and its standard error is 0.016 
## 
## 95 % confidence interval for the prevalence of the disease is ( 66.681 % ;  72.768 %) 
## 
## 
##           COMPARISON OF THE ACCURACIES (SENSITIVITIES AND SPECIFICITIES) 
## 
## Estimated sensitivity of Test 1 is  82.566 % and its standard error is 0.015 
## 
## 95 % confidence interval for the sensitivity of Test 1 is ( 79.363 % ;  85.389 %) 
## 
## Estimated sensitivity of Test 2 is  91.118 % and its standard error is 0.012 
## 
## 95 % confidence interval for the sensitivity of Test 1 is ( 88.61 % ;  93.148 %) 
## 
## Estimated specificity of Test 1 is  74.144 % and its standard error is 0.027 
## 
## 95 % confidence interval for the specificity of Test 1 is ( 68.557 % ;  79.087 %) 
## 
## Estimated specificity of Test 2 is  74.905 % and its standard error is 0.027 
## 
## 95 % confidence interval for the specificity of Test 1 is ( 69.358 % ;  79.787 %) 
## 
## 
## Wald test statistic for the global hypothesis test H0: (Se1 = Se2 and Sp1 = Sp2) is  25.662  
## 
##   Global p-value is  0  
## 
##   Applying the global Wald test (to an alpha error of 5 %), we reject the hypothesis H0: (Se1 = Se2 and Sp1 = Sp2) 
## 
##   Estimated power (to an alpha error of 5 %) is 99.8 %  
## 
##   Investigation of the causes of significance: 
## 
##    McNemar test statistic (with cc) for H0: Se1 = Se2 is  23.645  and the two-sided p-value is  0  
## 
##    McNemar test statistic (with cc) for H0: Sp1 = Sp2 is  0.011  and the two-sided p-value is  0.991  
## 
##    Applying the Holm method (to an alpha error of 5 %), we reject the hypothesis H0: Se1 = Se2 and we do not reject the hypothesis H0: Sp1 = Sp2 
## 
##    Sensitivity of Test 2 is significantly greater than sensitivity of Test 1 
## 
##     95 % confidence interval for the difference Se2 - Se1 is ( 5.192 % ;  11.857 %) 
## 
## 
## 
##           COMPARISON OF THE LIKELIHOOD RATIOS 
## 
## Estimated positive LR of Test 1 is  3.193  and its standard error is 0.339 
## 
## 95 % confidence interval for the positive LR of Test 1 is ( 2.61  ;  3.952 ) 
## 
## Estimated positive LR of Test 2 is  3.631  and its standard error is 0.39 
## 
## 95 % confidence interval for the positive LR of Test 1 is ( 2.962  ;  4.505 ) 
## 
## Estimated negative LR of Test 1 is  0.235  and its standard error is 0.022 
## 
## 95 % confidence interval for the negative LR of Test 1 is ( 0.195  ;  0.283 ) 
## 
## Estimated negative LR of Test 2 is  0.119  and its standard error is 0.016 
## 
## 95 % confidence interval for the negative LR of Test 2 is ( 0.09  ;  0.153 ) 
## 
## 
## Test statistic for the global hypothesis test H0: (PLR1 = PLR2 and NLR1 = NLR2) is  23.438  
## 
##   Global p-value is  0  
## 
##   Applying the global hypothesis test (to an alpha error of 5 %), we reject the hypothesis H0: (PLR1 = PLR2 and NLR1 = NLR2) 
## 
##   Estimated power (to an alpha error of 5 %) is 99.78 %  
## 
##   Investigation of the causes of significance: 
## 
##    Test statistic for H0: PLR1 = PLR2 is  0.898  and the two-sided p-value is  0.369  
## 
##    Test statistic for H0: NLR1 = NLR2 is  4.663  and the two-sided p-value is  0  
## 
##    Applying the Holm method (to an alpha error of 5 %), we do not reject the hypothesis H0: PLR1 = PLR2 and we reject the hypothesis H0: NLR1 = NLR2 
## 
##    Negative likelihood ratio of Test 1 is significantly greater than negative likelihood ratio of Test 2 
## 
##     95 % confidence interval for the ratio NLR1 / NLR2 is ( 1.412  ;  2.554 ) 
## 
## 
## 
##           COMPARISON OF THE PREDICTIVE VALUES 
## 
## Estimated positive PV of Test 1 is  88.07 % and its standard error is 0.014 
## 
## 95 % confidence interval for the positive PV of Test 1 is ( 85.17 % ;  90.498 %) 
## 
## Estimated positive PV of Test 2 is  89.355 % and its standard error is 0.012 
## 
## 95 % confidence interval for the positive PV of Test 2 is ( 86.698 % ;  91.562 %) 
## 
## Estimated negative PV of Test 1 is  64.784 % and its standard error is 0.028 
## 
## 95 % confidence interval for the negative PV of Test 1 is ( 59.246 % ;  69.976 %) 
## 
## Estimated negative PV of Test 2 is  78.486 % and its standard error is 0.026 
## 
## 95 % confidence interval for the negative PV of Test 2 is ( 73.024 % ;  83.151 %) 
## 
## 
## Wald test statistic for the global hypothesis test H0: (PPV1 = PPV2 and NPV1 = NPV2) is  25.944  
## 
##   Global p-value is  0  
## 
##   Applying the global hypothesis test (to an alpha error of 5 %), we reject the hypothesis H0: (PPV1 = PPV2 and NPV1 = NPV2) 
## 
##   Estimated power (to an alpha error of 5 %) is 99.26 %  
## 
##   Investigation of the causes of significance: 
## 
##    Weighted generalized score statistic for H0: PPV1 = PPV2 is  0.807  and the two-sided p-value is  0.369  
## 
##    Weighted generalized score statistic for H0: NPV1 = NPV2 is  22.502  and the two-sided p-value is  0  
## 
##    Applying the Holm method (to an alpha error of 5 %), we do not reject the hypothesis H0: PPV1 = PPV2 and we reject the hypothesis H0: NPV1 = NPV2 
## 
##    Negative PV of Test 2 is significantly greater than negative PV of Test 1 
## 
##     95 % confidence interval for the difference NPV2 - NPV1 is ( 8.041 % ;  19.363 %)
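
As a cross-check, the main point estimates and the two McNemar statistics in the output above can be reproduced in base R from the eight input counts. This is only a sketch: the names s11, s10, ... are introduced here for illustration and are not necessarily those used inside compbdt, and only the point estimates and test statistics (not the program's confidence intervals or p-values) are recomputed.

```r
# s<ij>: diseased, r<ij>: non-diseased
# i = Test 1 result, j = Test 2 result (1 = positive, 0 = negative)
s11 <- 473; s10 <- 29; s01 <- 81; s00 <- 25
r11 <- 22;  r10 <- 46; r01 <- 44; r00 <- 151

n_dis <- s11 + s10 + s01 + s00   # diseased
n_non <- r11 + r10 + r01 + r00   # non-diseased

# Sensitivities and specificities
se <- c(Test1 = (s11 + s10) / n_dis, Test2 = (s11 + s01) / n_dis)
sp <- c(Test1 = (r01 + r00) / n_non, Test2 = (r10 + r00) / n_non)

# Likelihood ratios
plr <- se / (1 - sp)
nlr <- (1 - se) / sp

# Predictive values: true positives / all positives, true negatives / all negatives
pos <- c(Test1 = s11 + s10 + r11 + r10, Test2 = s11 + s01 + r11 + r01)
neg <- (n_dis + n_non) - pos
ppv <- c(Test1 = s11 + s10, Test2 = s11 + s01) / pos
npv <- c(Test1 = r01 + r00, Test2 = r10 + r00) / neg

print(round(100 * rbind(Se = se, Sp = sp, PPV = ppv, NPV = npv), 3))
print(round(rbind(PLR = plr, NLR = nlr), 3))

# McNemar statistics with continuity correction, based on the discordant pairs
# (rows: Test 1, columns: Test 2; matrix() fills column by column)
mcnemar_se <- unname(mcnemar.test(matrix(c(s11, s01, s10, s00), nrow = 2),
                                  correct = TRUE)$statistic)
mcnemar_sp <- unname(mcnemar.test(matrix(c(r11, r01, r10, r00), nrow = 2),
                                  correct = TRUE)$statistic)
print(round(c(Se = mcnemar_se, Sp = mcnemar_sp), 3))
```

The difference in sensitivity is driven by the discordant pairs among diseased individuals (29 versus 81), whereas among non-diseased individuals the discordant pairs are almost balanced (46 versus 44), which is why the specificities are not found to differ.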

You will also find three text files in your current working directory. The results obtained when comparing the sensitivities and specificities are recorded in the file “Results_Comparison_Accuracies.txt”, those obtained when comparing the LRs are recorded in the file “Results_Comparison_LRs.txt”, and those obtained when comparing the PVs are recorded in the file “Results_Comparison_PVs.txt”.

In R, an alternative to “compbdt” is the DTComPair package. It estimates the same parameters as “compbdt” but compares them individually, i.e. it solves each hypothesis test separately at a given α error, without a global test.

References

Leisenring, W., Alonzo, T., and Pepe, M. S. (2000). Comparisons of predictive values of binary medical diagnostic tests for paired designs. Biometrics, 56(2):345-51.

Kosinski, A.S. (2013). A weighted generalized score statistic for comparison of predictive values of diagnostic tests. Stat Med, 32(6):964-77.

Moskowitz, C.S., and Pepe, M.S. (2006). Comparing the predictive values of diagnostic tests: sample size and analysis for paired study designs. Clin Trials, 3(3):272-9.

Roldán-Nofuentes, J.A. (2020). Compbdt: an R program to compare two binary diagnostic tests subject to a paired design. BMC Med Res Methodol, 20:143. https://doi.org/10.1186/s12874-020-00988-y