CI for proportions (prevalence/cumulative incidence)

Load libraries:

library(DescTools)
library(PropCIs)
library(epiR)
library(Epi)
library(popEpi)

The binom.test function output includes a confidence interval for a proportion, and the proportion of “success” as a decimal number.
The binom.test function in the native stats package uses the Clopper-Pearson method for confidence intervals (1934).

This guarantees that the confidence level is at least conf.level, but in general does not give the shortest-length confidence intervals.

Example: as part of a survey, a researcher asks a class of 21 students if they have ever studied statistics before.

The following are the data are collected:

Experience Count: Yes= 7; No=14

Note that when calculating confidence intervals for a binomial variable, one level of the nominal variable is chosen to be the “success” level.

This is an arbitrary decision, but you should be cautious to remember that the confidence interval is reported for the proportion of “success” responses.

binom.test(7, 21,0.5,alternative="two.sided",conf.level=0.95)
## 
##  Exact binomial test
## 
## data:  7 and 21
## number of successes = 7, number of trials = 21, p-value = 0.1892
## alternative hypothesis: true probability of success is not equal to 0.5
## 95 percent confidence interval:
##  0.1458769 0.5696755
## sample estimates:
## probability of success 
##              0.3333333

The BinomCI function in the DescTools package has several methods for calculating confidence intervals for a binomial proportion.

[see from the help of the function: method = c(“wilson”, “wald”, “agresti-coull”, “jeffreys”, “modified wilson”, “wilsoncc”,“modified jeffreys”, “clopper-pearson”, “arcsine”, “logit”, “witting”, “pratt”)]

BinomCI(7, 21,
        conf.level = 0.95,
        method = "clopper-pearson")
##            est    lwr.ci    upr.ci
## [1,] 0.3333333 0.1458769 0.5696755

The BinomCI function can also produce the confidence intervals for “success” and “failure” in one step.

observed = c(7, 14)
total = sum(observed)

BinomCI(observed, total,
        conf.level = 0.95,
        method = "clopper-pearson")
##           est    lwr.ci    upr.ci
## x.1 0.3333333 0.1458769 0.5696755
## x.2 0.6666667 0.4303245 0.8541231

Also the PropCIs package has functions for calculating confidence intervals for a binomial proportion.The exactci function uses the Clopper-Pearson exact method.

exactci(7, 21, conf.level=0.95)
## 
## 
## 
## data:  
## 
## 95 percent confidence interval:
##  0.1458769 0.5696755

The blakerci function uses the Blaker exact method.

blakerci(7, 21,
         conf.level=0.95)
## 
## 
## 
## data:  
## 
## 95 percent confidence interval:
##  0.1523669 0.5510455

CI for the incidence rate

As we already introduced in Epi Intro, in the case when the incidence rate is assumed constant, we can assume a Poisson likelihood that leads to the estimate : (the observed outcome is D, number of events, and the amount of follow up time is Y):

\[\hat{\lambda}=D/Y\] \[s.e.{\log(\hat{\lambda})}=1/\sqrt(D)\]

From this formula and using the normal approximation we can derive also approximate 95% confidence intervals for the estimated rate. A summary of the methods used for confidence interval calculations in the epi.conf function of the epiR library are available in the “help”of the function (refer to later in this section) .

CI for the mean (numerical variable/outcome)

Let’s do now a simple simulation to demonstrate the computation of the confidence interval for the mean:

nSim <- 10000
n    <- 10
ciArray <- array(0, dim=c(nSim,2))

for (i in 1:nSim){
  x <- rnorm(n, mean=10, sd=1)
  ciArray[i,] <- c(t.test(x)$conf.int[1],t.test(x)$conf.int[2]) 
}

head(ciArray)
##          [,1]     [,2]
## [1,] 9.245549 10.63962
## [2,] 9.480658 10.48660
## [3,] 8.955867 10.50219
## [4,] 9.314290 10.60022
## [5,] 9.992809 11.94563
## [6,] 9.065248 10.50861

What proportion of these CIs include the “true” parameter?

mean(1*((ciArray[,1] < 10) & (ciArray[,2] > 10)))
## [1] 0.9517

By default with the t.test function in R we obtain a 95% confidence interval for the mean.

Remember: confidence intervals (in the frequentist approach) make sense in the long run …

The Epi.conf function

As we have already explored, the function epi.conf estimate confidence intervals for means, proportions, incidence, etc… in one function. Let’s see a couple of examples:

library(epiR)

Estimate CI for a mean

Method mean.single requires a vector as input.

dat <- rnorm(n = 100, mean = 0, sd = 1)
epi.conf(dat, ctype = "mean.single")
##         est         se       lower     upper
## 1 0.1282932 0.09706294 -0.06430076 0.3208871

Estimate CI for the difference between two independent groups means

In statistics we distinguish between independent samples and paired samples. Roughly, independent samples are composed by different statistical units (for example different subjects) that are randomly selected and that have “nothing” in common. Paired data refer instead to the concept of a correlation between the statistical units, as for example repeated measures on the same subjects, or some induced correlation (we will introduce matching procedures later in the course, block 3).

Very often we want to compare mean values of a numerical outcome of interest between two groups, and in this case an option could be to compute the mean difference and the corresponding 95% CI and to check if the interval contain zero or not, Why ? Because if the 95% CI contain zero, then we can not conclude in favor of a statistical significant difference, but instead we conclude that data offer no evidence of coming from different populations. Remind that this procedure heavily depend also on variability around the means and on sample size!!

Method mean.unpaired requires a two-column data frame; the first column defining the groups must be a factor.

group <- c(rep("A", times = 5), rep("B", times = 5))
val = round(c(rnorm(n = 5, mean = 10, sd = 5), rnorm(n = 5, mean = 7, sd = 5)), digits = 0)
dat <- data.frame(group = group, val = val)
epi.conf(dat, ctype = "mean.unpaired")
##   est       se     lower    upper
## 1 6.6 3.563706 -1.617921 14.81792

Estimate CI for the difference between two paired groups means

Two paired samples: systolic blood pressure levels were measured in 16 middle-aged men before and after a standard exercise test. The mean rise in systolic blood pressure was 6.6 mmHg. The standard deviation of the difference was 6.0 mm Hg. The standard error of the mean difference was 1.49 mm Hg.

before <- c(148,142,136,134,138,140,132,144,128,170,162,150,138,154,126,116)
after <- c(152,152,134,148,144,136,144,150,146,174,162,162,146,156,132,126)
dat <- data.frame(before, after)
dat <- data.frame(cbind(before, after))
epi.conf(dat, ctype = "mean.paired", conf.level = 0.95)
##     est       se    lower    upper
## 1 6.625 1.491294 3.446382 9.803618

The 95% confidence interval for the population value of the mean systolic blood pressure increase after standard exercise was 3.4 to 9.8 mm Hg. Is there a significant effect of the exercise test? What do you think?

Estimate CI for a proportion (again :-) )

Out of 263 patients giving their views on the use of personal computers in general medical practice, 81 of them thought that the privacy of their medical file had been reduced.

pos <- 81
neg <- (263 - 81)
dat <- as.matrix(cbind(pos, neg))
round(epi.conf(dat, ctype = "prop.single"), digits = 3)
##       est    se lower upper
## pos 0.308 0.028 0.255 0.366

The 95% confidence interval for the population value of the proportion of patients thinking their privacy was reduced was from 0.255 to 0.366.

Estimate CI for the difference between two independent proportions

Adverse effects in 85 patients receiving either terbinafine or placebo treatment for dermatophyte onchomychois are reported. Out of 56 patients receiving terbinafine, 5 patients experienced adverse effects. Out of 29 patients receiving a placebo, none experienced adverse effects.

grp1 <- matrix(cbind(5, 51), ncol = 2)
grp2 <- matrix(cbind(0, 29), ncol = 2)
dat <- as.matrix(cbind(grp1, grp2))
round(epi.conf(dat, ctype = "prop.unpaired"), digits = 3)
##     est    se  lower upper
## 1 0.089 0.038 -0.038 0.193

The 95% confidence interval for the difference between the two groups is from -0.038 to +0.193.Is there a significant different rate of adverse effects of the terbinafine vs the placebo treatment ? What do you think?

Estimate CI for the difference between two paired proportions

In a reliability exercise, 41 patients were randomly selected from those who had undergone a stress test. The 41 sets of images were classified as normal or not by the laboratory and, independently, by clinical investigators from different centres. Of the 19 samples identified as ischaemic by clinical investigators 5 were identified as normal by the laboratory. Of the 22 samples identified as normal by clinical investigators 0 were identified as ischaemic by the laboratory.

dat <- as.matrix(cbind(14, 5, 0, 22))
round(epi.conf(dat, ctype = "prop.paired", conf.level = 0.95), digits = 3)
##     est    se lower upper
## 1 0.122 0.051 0.011 0.226

The 95% confidence interval for the difference in evaluations is 0.011 to 0.226 or approximately +1% to +23%. Is there a significant different evaluation between the laboratory and clinical investigators? What do you think?

Estimate CI for a prevalence

A herd of 1000 cattle were tested for brucellosis. Four samples out of 200 test returned a positive result. Assuming 100% test sensitivity and specificity, what is the estimated prevalence of brucellosis in this group of animals?

pos <- 4; pop <- 200
dat <- as.matrix(cbind(pos, pop))
epi.conf(dat, ctype = "prevalence", method = "exact", N = 1000, 
   design = 1, conf.level = 0.95) * 100
##   est     lower    upper
## 1   2 0.5475566 5.041361

The estimated prevalence of brucellosis in this herd is 2 cases per 100 cattle (95% CI 0.54 – 5.0 cases per 100 cattle).

References

Clopper, C.; Pearson, E. S. (1934). “The use of confidence or fiducial limits illustrated in the case of the binomial”. Biometrika. 26 (4): 404-413.

Blaker, H. (2000). Confidence curves and improved exact confidence intervals for discrete distributions, Canadian Journal of Statistics 28 (4), 783-798.