The classical frequentist approach

Let’s begin with the classical frequentist approach: here we review the general topic of confidence intervals around means (numerical variables) and proportions (binary variables), and around differences between means and between proportions. These methods are widely employed in epidemiological studies to report the uncertainty around estimates that arises from random sampling from a target population.

Confidence intervals for proportions

Load libraries:

library(DescTools)
library(PropCIs)
library(epiR)
library(Epi)
library(popEpi)

The binom.test function output includes a confidence interval for a proportion, together with the proportion of “successes” as a decimal number.
(The binom.test function in the base stats package uses the Clopper-Pearson method (1934) for confidence intervals. This guarantees that the actual coverage is at least conf.level, but in general it does not give the shortest-length confidence intervals.)

Example: as part of a survey, a researcher asks a class of 21 students if they have ever studied statistics before.

The following data are collected:

Experience count: Yes = 7; No = 14

Note that when calculating confidence intervals for a binomial variable, one level of the nominal variable is chosen to be the “success” level.

This is an arbitrary decision, but remember that the confidence interval is reported for the proportion of “success” responses.

binom.test(7, 21, 0.5, alternative = "two.sided", conf.level = 0.95)
## 
##  Exact binomial test
## 
## data:  7 and 21
## number of successes = 7, number of trials = 21, p-value = 0.1892
## alternative hypothesis: true probability of success is not equal to 0.5
## 95 percent confidence interval:
##  0.1458769 0.5696755
## sample estimates:
## probability of success 
##              0.3333333

The BinomCI function in the DescTools package offers several methods for calculating confidence intervals for a binomial proportion. (If you are interested in the statistical details, see the function’s help page.)

BinomCI(7, 21,
        conf.level = 0.95,
        method = "clopper-pearson")
##            est    lwr.ci    upr.ci
## [1,] 0.3333333 0.1458769 0.5696755

The BinomCI function can also produce the confidence intervals for “success” and “failure” in one step:

observed = c(7, 14)
total = sum(observed)

BinomCI(observed, total,
        conf.level = 0.95,
        method = "clopper-pearson")
##           est    lwr.ci    upr.ci
## x.1 0.3333333 0.1458769 0.5696755
## x.2 0.6666667 0.4303245 0.8541231
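
Several alternative methods are available via the method argument. As a quick illustration (a sketch on the same data; no particular method is endorsed here), the Wilson interval can be requested as follows and is typically a little narrower than the Clopper-Pearson interval:

BinomCI(7, 21,
        conf.level = 0.95,
        method = "wilson")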

Confidence intervals for the incidence rate

As already introduced, if the incidence rate is assumed constant, we can use a Poisson data-generating mechanism, which leads to the following estimate (where D is the observed number of events and Y is the amount of person-time):

\[\hat{\lambda}=D/Y\] \[s.e.\left(\log \hat{\lambda}\right)=1/\sqrt{D}\]

From this formula, using the normal approximation, we can derive approximate 95% confidence intervals for the estimated rate. A summary of the methods used for confidence interval calculation in the epi.conf function of the epiR library is available in the function’s help page.

Example:

2648 persons died out of 150000 persons observed for two years (the number of person-years is 300000). Let’s fit a Poisson model to these data:

D  <- 2648
Y  <- 300000
mm <- glm(D ~ 1, offset = log(Y), family = poisson)
round(ci.pred(mm, newdata = data.frame(Y = 1000)), 3)
##   Estimate  2.5% 97.5%
## 1    8.827 8.497 9.169

The amount of follow-up/person-time (Y) is specified in the offset argument of the Poisson model. The estimated mortality rate is 8.83 deaths per 1000 person-years, and we now also have the confidence interval around this estimate.
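
As a cross-check, the same interval can be obtained by hand from the formulas above, applying the normal approximation on the log scale (a sketch; it should essentially reproduce the ci.pred output, apart from rounding):

# rate per 1000 person-years and approximate 95% CI on the log scale
rate  <- D / Y * 1000
lower <- exp(log(rate) - 1.96 / sqrt(D))
upper <- exp(log(rate) + 1.96 / sqrt(D))
round(c(rate = rate, lower = lower, upper = upper), 3)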

Confidence interval for the mean (numerical variable/outcome)

Let’s do now a simple simulation to demonstrate the concept of the confidence interval for the mean:

nSim <- 10000   # number of simulated samples
n    <- 10      # size of each simulated sample
ciArray <- array(0, dim = c(nSim, 2))

for (i in 1:nSim){
  x <- rnorm(n, mean = 10, sd = 1)      # sample from N(10, 1)
  ciArray[i, ] <- t.test(x)$conf.int    # store the 95% CI for the mean
}

head(ciArray)
##          [,1]     [,2]
## [1,] 9.463307 11.12918
## [2,] 9.539271 10.45433
## [3,] 9.329341 10.32360
## [4,] 8.984453 10.67847
## [5,] 9.271832 10.35168
## [6,] 8.784494 10.12948

What proportion of these CIs include the true parameter?

mean(1*((ciArray[,1] < 10) & (ciArray[,2] > 10)))
## [1] 0.9487

By default, the t.test function in R returns a 95% confidence interval for the mean. Remember: confidence intervals have a long-run interpretation over repeated “experiments” under the “same” conditions. Once a confidence interval has been computed on a specific observed sample, it either contains the true parameter or it does not, and we will never know which is the case! But we are “confident” because of the long-run property.
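
For a single observed sample the interval is simply extracted from the t.test output, and the conf.level argument controls the coverage level. A quick sketch, using the last simulated sample x left over from the loop above:

t.test(x)$conf.int                       # default: 95% CI for the mean
t.test(x, conf.level = 0.90)$conf.int    # a narrower 90% CI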

The epi.conf function

As we have already seen, the epi.conf function estimates many kinds of confidence intervals (for means, proportions, incidence rates, etc.) within a single function. Let’s see some examples:

library(epiR)

Estimate CI for a mean

Method mean.single requires a vector as input.

dat <- rnorm(n = 100, mean = 0, sd = 1)
epi.conf(dat, ctype = "mean.single")
##           est        se      lower     upper
## 1 -0.07835821 0.1028504 -0.2824356 0.1257192

Estimate CI for the difference between two independent groups means

In statistics we distinguish between independent samples and paired samples (this matters because the variance of the difference is evaluated differently in the two cases). Roughly, independent samples are composed of different statistical units (for example, different subjects) that are randomly selected and have “nothing” in common. Paired samples instead involve some correlation between the statistical units, for example repeated measures on the same subjects, or a correlation induced by design (we will introduce matching procedures later in the course). Very often we want to compare the mean values of a numerical outcome of interest between two groups; a reasonable option is then to compute the mean difference with its 95% CI and to check whether the interval contains zero. Why? Because if the 95% CI contains zero, we cannot conclude in favour of a statistically significant difference; instead we conclude that the data do not offer enough evidence that the samples come from different populations. (Remember that this procedure also depends heavily on the variability of the measure around the means and on the sample size.)

Method mean.unpaired requires a two-column data frame; the first column defining the groups must be a factor.

group <- factor(c(rep("A", times = 5), rep("B", times = 5)))
val   <- round(c(rnorm(n = 5, mean = 10, sd = 5), rnorm(n = 5, mean = 7, sd = 5)), digits = 0)
dat <- data.frame(group = group, val = val)
epi.conf(dat, ctype = "mean.unpaired")
##    est       se     lower    upper
## 1 -1.6 2.227106 -6.735715 3.535715
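
The same unpaired interval can be cross-checked with the classical two-sample t procedure. Assuming epi.conf uses the pooled-variance (equal-variance) interval, the following sketch should give essentially the same limits on these simulated data (possibly with opposite sign, depending on which group is taken as the reference):

t.test(val ~ group, data = dat, var.equal = TRUE)$conf.int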

Estimate CI for the difference between two paired groups means

Two paired samples: systolic blood pressure levels were measured in 16 middle-aged men before and after a standard exercise test. The mean rise in systolic blood pressure was 6.6 mmHg. The standard deviation of the difference was 6.0 mmHg. The standard error of the mean difference was 1.49 mmHg.

before <- c(148,142,136,134,138,140,132,144,128,170,162,150,138,154,126,116)
after  <- c(152,152,134,148,144,136,144,150,146,174,162,162,146,156,132,126)
dat <- data.frame(before, after)
epi.conf(dat, ctype = "mean.paired", conf.level = 0.95)
##     est       se    lower    upper
## 1 6.625 1.491294 3.446382 9.803618

The 95% confidence interval for the population value of the mean systolic blood pressure increase after standard exercise was 3.4 to 9.8 mmHg.
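
The same interval can be verified with a paired t-test on the raw data, whose confidence interval for the mean difference should coincide with the one reported above:

t.test(after, before, paired = TRUE)$conf.int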

Is there a statistically significant effect of the exercise test? What do you think?

Estimate CI for a proportion

Out of 263 patients giving their views on the use of personal computers in general medical practice, 81 of them thought that the privacy of their medical file had been reduced.

pos <- 81
neg <- (263 - 81)
dat <- as.matrix(cbind(pos, neg))
round(epi.conf(dat, ctype = "prop.single"), digits = 3)
##       est    se lower upper
## pos 0.308 0.028 0.255 0.366

The 95% confidence interval for the population value of the proportion of patients thinking their privacy was reduced was from 0.255 to 0.366.

Estimate CI for the difference between two independent proportions

Example: a study of adverse effects in 85 patients receiving either terbinafine or placebo treatment for dermatophyte onychomycosis is reported. Out of 56 patients receiving terbinafine, 5 experienced adverse effects. Out of 29 patients receiving placebo, none experienced adverse effects.

grp1 <- matrix(cbind(5, 51), ncol = 2)
grp2 <- matrix(cbind(0, 29), ncol = 2)
dat <- as.matrix(cbind(grp1, grp2))
round(epi.conf(dat, ctype = "prop.unpaired"), digits = 3)
##     est    se  lower upper
## 1 0.089 0.038 -0.038 0.193

The 95% confidence interval for the difference between the two groups is from -0.038 to +0.193.
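
As a rough cross-check, the difference between the two proportions can also be examined with the base prop.test function (a sketch; prop.test applies a continuity correction by default, so its limits will differ slightly from those of epi.conf):

prop.test(x = c(5, 0), n = c(56, 29))$conf.int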

Is there a statistically significant difference in the rate of adverse effects between terbinafine and placebo? What do you think?

Estimate CI for the difference between two paired proportions

In a reliability exercise, 41 patients were randomly selected from those who had undergone a stress test. The 41 sets of images were classified as normal or not by the laboratory and, independently, by clinical investigators from different centres. Of the 19 samples identified as ischaemic by clinical investigators 5 were identified as normal by the laboratory. Of the 22 samples identified as normal by clinical investigators 0 were identified as ischaemic by the laboratory.

dat <- as.matrix(cbind(14, 5, 0, 22))
round(epi.conf(dat, ctype = "prop.paired", conf.level = 0.95), digits = 3)
##     est    se lower upper
## 1 0.122 0.051 0.011 0.226

The 95% confidence interval for the difference in evaluations is 0.011 to 0.226, or approximately +1% to +23%. Is there a statistically significant difference between the laboratory and the clinical investigators’ evaluations? What do you think?
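
A common frequentist companion to this paired-proportions interval is McNemar’s test, which is based on the discordant pairs. A minimal sketch, assuming rows are the clinical classification and columns the laboratory classification:

paired_tab <- matrix(c(14, 5,
                        0, 22),
                     nrow = 2, byrow = TRUE,
                     dimnames = list(Clinical = c("Ischaemic", "Normal"),
                                     Laboratory = c("Ischaemic", "Normal")))
mcnemar.test(paired_tab)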

Estimate CI for a prevalence

A herd of 1000 cattle were tested for brucellosis. Four samples out of 200 tests returned a positive result. Assuming 100% test sensitivity and specificity, what is the estimated prevalence (proportion) of brucellosis in this group of animals?

pos <- 4; pop <- 200
dat <- as.matrix(cbind(pos, pop))
epi.conf(dat, ctype = "prevalence", method = "exact", N = 1000, 
   design = 1, conf.level = 0.95) * 100
##   est     lower    upper
## 1   2 0.5475566 5.041361

The estimated prevalence of brucellosis in this herd is 2 cases per 100 cattle (95% CI 0.54 – 5.0 cases per 100 cattle).
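
Ignoring the finite-population adjustment (the N argument), a quick cross-check of the exact interval can be obtained with binom.test; the limits should be very close to those reported above:

round(binom.test(4, 200)$conf.int * 100, 2)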

Credible intervals (Bayesian approach)

Credible intervals are an important concept in Bayesian statistics. Their core purpose is to describe and summarize the uncertainty about the unknown parameters you are trying to estimate. In this regard, they may appear quite similar to the frequentist concept of confidence intervals. However, while their goal is similar, their statistical definition and meaning are very different. A Bayesian credible interval for an unknown quantity of interest can be directly interpreted as having a high probability of containing the unknown quantity (for example, 95% probability), in contrast to a frequentist confidence interval, which may strictly be interpreted only in relation to a sequence of similar inferences that might be made in repeated practice.

From the point of view of practical implementation, several methods have been proposed to estimate credible intervals. The R library bayestestR provides two: the Highest Density Interval (HDI) and the Equal-Tailed Interval (ETI). The method can be chosen via the method argument of the ci() function. What is the difference? Let’s see:

library(bayestestR)
## Warning: package 'bayestestR' was built under R version 4.4.2
library(ggplot2)

# Generate a normal distribution (symmetric around the mean!)
posterior <- distribution_normal(1000)

# Compute HDI and ETI (default at 95%)
ci_hdi <- ci(posterior, method = "HDI")
ci_eti <- ci(posterior, method = "ETI")

# Plot the distribution and add the limits of the two CIs
out <- estimate_density(posterior, extend = TRUE)
ggplot(out, aes(x = x, y = y)) +
  geom_area(fill = "orange") +
  theme_classic() +
  # HDI in blue
  geom_vline(xintercept = ci_hdi$CI_low, color = "royalblue", linewidth = 3) +
  geom_vline(xintercept = ci_hdi$CI_high, color = "royalblue", linewidth = 3) +
  # Quantile in red
  geom_vline(xintercept = ci_eti$CI_low, color = "red", linewidth = 1) +
  geom_vline(xintercept = ci_eti$CI_high, color = "red", linewidth = 1)

In this example the HDI and ETI approaches give identical results.

But here:

# Generate a beta distribution (asymmetric!) 
posterior <- distribution_beta(1000, 6, 2)

# Compute HDI and Quantile CI
ci_hdi <- ci(posterior, method = "HDI")
ci_eti <- ci(posterior, method = "ETI")

# Plot the distribution and add the limits of the two CIs
out <- estimate_density(posterior, extend = TRUE)
ggplot(out, aes(x = x, y = y)) +
  geom_area(fill = "orange") +
  theme_classic() +
  # HDI in blue
  geom_vline(xintercept = ci_hdi$CI_low, color = "royalblue", linewidth = 3) +
  geom_vline(xintercept = ci_hdi$CI_high, color = "royalblue", linewidth = 3) +
  # ETI in red
  geom_vline(xintercept = ci_eti$CI_low, color = "red", linewidth = 1) +
  geom_vline(xintercept = ci_eti$CI_high, color = "red", linewidth = 1)

Here there is a clear difference between the two methods. Contrary to the HDI, for which all the values within the interval have a higher probability density than values outside the interval (the interval is determined as the narrowest one containing the required probability), the ETI is equal-tailed: a 95% ETI leaves 2.5% of the distribution on either side of its limits. In symmetric distributions the two methods return similar results, but this is not the case for skewed distributions. Indeed, for skewed distributions the ETI may include values that have relatively low probability density, which seems undesirable in a summary of the most credible values of a distribution. On the other hand, the ETI does not change when transformations are applied to the distribution (for instance, the log-odds to probability transformation): the lower and upper bounds of the transformed distribution correspond to the transformed lower and upper bounds of the original distribution. Applying a transformation to the distribution will instead change the resulting HDI. Thus, for instance, if exponentiated credible intervals are required, calculating the ETI is recommended.
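
A minimal sketch of this transformation property (an illustration added here, using the beta posterior generated above): applying a monotone transformation such as the logit to the posterior sample and back-transforming the resulting ETI limits should return the ETI of the original sample, whereas the same is not guaranteed for the HDI:

# ETI computed on the log-odds scale, back-transformed to probabilities
eti_logodds <- ci(qlogis(posterior), method = "ETI")
plogis(c(eti_logodds$CI_low, eti_logodds$CI_high))
# ...should match the ETI computed directly on the probability scale
c(ci_eti$CI_low, ci_eti$CI_high)

# the same comparison for the HDI will in general show a discrepancy
hdi_logodds <- ci(qlogis(posterior), method = "HDI")
plogis(c(hdi_logodds$CI_low, hdi_logodds$CI_high))
c(ci_hdi$CI_low, ci_hdi$CI_high)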

An example of Bayesian reasoning in health applications: diagnostic tests

To illustrate diagnostic tests, we will use an example of data from a stress test to detect the presence of coronary disease.

dat           <- as.table(matrix(c(815,115,208,327), nrow = 2, byrow = TRUE))
colnames(dat) <- c("Dis+","Dis-")
rownames(dat) <- c("Test+","Test-")
dat
##       Dis+ Dis-
## Test+  815  115
## Test-  208  327

Let’s estimate the parameters of interest related to the performance of the stress test in detecting the disease. The epi.tests function computes true and apparent prevalence (apparent prevalence is here the number of subjects testing positive by the diagnostic test divided by the total number of subjects tested; true prevalence is the actual number of diseased subjects divided by the number of individuals in the population), sensitivity, specificity, positive and negative predictive values, and positive and negative likelihood ratios from count data provided in a 2-by-2 table.

(Note that even if we are using this topic as an example of “Bayesian reasoning”, this R library calculates exact binomial frequentist confidence limits for test sensitivity, specificity, and positive and negative predictive value; the important concept here is the application of Bayes’ theorem in deriving the predictive values of the test, rather than the evaluation of uncertainty.)

The positive likelihood ratio is calculated as \[LR+ = \frac{Sensitivity}{1-Specificity}\] It represents the probability of a person who has the disease testing positive divided by the probability of a person who does not have the disease testing positive. The negative likelihood ratio is calculated as \[LR- = \frac{1-Sensitivity}{Specificity}\] It represents the probability of a person who has the disease testing negative divided by the probability of a person who does not have the disease testing negative.

rval <- epi.tests(dat, conf.level = 0.95)
print(rval)
##           Outcome +    Outcome -      Total
## Test +          815          115        930
## Test -          208          327        535
## Total          1023          442       1465
## 
## Point estimates and 95% CIs:
## --------------------------------------------------------------
## Apparent prevalence *                  0.63 (0.61, 0.66)
## True prevalence *                      0.70 (0.67, 0.72)
## Sensitivity *                          0.80 (0.77, 0.82)
## Specificity *                          0.74 (0.70, 0.78)
## Positive predictive value *            0.88 (0.85, 0.90)
## Negative predictive value *            0.61 (0.57, 0.65)
## Positive likelihood ratio              3.06 (2.61, 3.59)
## Negative likelihood ratio              0.27 (0.24, 0.31)
## False T+ proportion for true D- *      0.26 (0.22, 0.30)
## False T- proportion for true D+ *      0.20 (0.18, 0.23)
## False T+ proportion for T+ *           0.12 (0.10, 0.15)
## False T- proportion for T- *           0.39 (0.35, 0.43)
## Correctly classified proportion *      0.78 (0.76, 0.80)
## --------------------------------------------------------------
## * Exact CIs
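
To make the Bayesian reasoning explicit, the point estimates of the predictive values in the table above can be reconstructed by hand from sensitivity, specificity and (true) prevalence via Bayes’ theorem, and the likelihood ratios from the formulas given earlier (a sketch that should reproduce the estimates above, up to rounding):

se   <- 815 / 1023    # sensitivity: P(Test+ | Dis+)
sp   <- 327 / 442     # specificity: P(Test- | Dis-)
prev <- 1023 / 1465   # true prevalence: P(Dis+)

# predictive values via Bayes' theorem
ppv <- (se * prev) / (se * prev + (1 - sp) * (1 - prev))
npv <- (sp * (1 - prev)) / (sp * (1 - prev) + (1 - se) * prev)

# likelihood ratios
lr_pos <- se / (1 - sp)
lr_neg <- (1 - se) / sp

round(c(PPV = ppv, NPV = npv, "LR+" = lr_pos, "LR-" = lr_neg), 2)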