Load libraries:
library(DescTools)
library(PropCIs)
library(epiR)
library(Epi)
library(popEpi)
The binom.test function output includes a confidence interval for a
proportion, and the proportion of “success” as a decimal number.
The binom.test function in the base stats package uses the
Clopper-Pearson method for confidence intervals (Clopper and Pearson, 1934).
This guarantees that the confidence level is at least conf.level, but in general does not give the shortest-length confidence intervals.
Example: as part of a survey, a researcher asks a class of 21 students if they have ever studied statistics before.
The following data were collected:
Experience count: Yes = 7; No = 14
Note that when calculating confidence intervals for a binomial variable, one level of the nominal variable is chosen to be the “success” level.
This is an arbitrary decision, but remember that the confidence interval is reported for the proportion of “success” responses.
binom.test(7, 21, 0.5, alternative = "two.sided", conf.level = 0.95)
##
## Exact binomial test
##
## data: 7 and 21
## number of successes = 7, number of trials = 21, p-value = 0.1892
## alternative hypothesis: true probability of success is not equal to 0.5
## 95 percent confidence interval:
## 0.1458769 0.5696755
## sample estimates:
## probability of success
## 0.3333333
The BinomCI function in the DescTools package has several methods for calculating confidence intervals for a binomial proportion.
(see the function help: method = c("wilson", "wald", "agresti-coull", "jeffreys", "modified wilson", "wilsoncc", "modified jeffreys", "clopper-pearson", "arcsine", "logit", "witting", "pratt"))
BinomCI(7, 21,
conf.level = 0.95,
method = "clopper-pearson")
## est lwr.ci upr.ci
## [1,] 0.3333333 0.1458769 0.5696755
The BinomCI function can also produce the confidence intervals for “success” and “failure” in one step.
observed = c(7, 14)
total = sum(observed)
BinomCI(observed, total,
conf.level = 0.95,
method = "clopper-pearson")
## est lwr.ci upr.ci
## x.1 0.3333333 0.1458769 0.5696755
## x.2 0.6666667 0.4303245 0.8541231
The PropCIs package also provides functions for calculating confidence intervals for a binomial proportion. The exactci function uses the Clopper-Pearson exact method.
exactci(7, 21, conf.level=0.95)
##
##
##
## data:
##
## 95 percent confidence interval:
## 0.1458769 0.5696755
The blakerci function uses the Blaker exact method (Blaker, 2000).
blakerci(7, 21,
conf.level=0.95)
##
##
##
## data:
##
## 95 percent confidence interval:
## 0.1523669 0.5510455
As already introduced in the Epi intro, when the incidence rate is assumed constant we can use a Poisson likelihood, which leads to the following estimate (where D is the observed number of events and Y is the amount of follow-up time):
\[\hat{\lambda} = D/Y\] \[s.e.\left(\log\hat{\lambda}\right) = 1/\sqrt{D}\]
From these formulas, using the normal approximation on the log scale, we can also derive an approximate 95% confidence interval for the estimated rate. A summary of the methods used for confidence interval calculations in the epi.conf function of the epiR package is available in the function help (used later in this section).
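As a minimal sketch of these formulas, assume hypothetical values of D = 15 events over Y = 500 person-years (not data from this course); the 95% limits are computed on the log scale and then back-transformed:
D <- 15                                          # hypothetical number of events
Y <- 500                                         # hypothetical person-years of follow-up
rate <- D / Y                                    # estimated rate: lambda-hat = D/Y
se.log <- 1 / sqrt(D)                            # s.e. of log(lambda-hat)
ci <- exp(log(rate) + c(-1.96, 1.96) * se.log)   # back-transform to the rate scale
round(c(est = rate, lower = ci[1], upper = ci[2]), 4)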
Let’s now run a simple simulation to demonstrate the long-run coverage of the confidence interval for the mean:
nSim <- 10000                             # number of simulated samples
n <- 10                                   # size of each sample
ciArray <- array(0, dim = c(nSim, 2))     # storage for the CI limits
for (i in 1:nSim){
  x <- rnorm(n, mean = 10, sd = 1)        # draw a sample from N(10, 1)
  ciArray[i,] <- t.test(x)$conf.int       # store the 95% CI for the mean
}
head(ciArray)
## [,1] [,2]
## [1,] 9.245549 10.63962
## [2,] 9.480658 10.48660
## [3,] 8.955867 10.50219
## [4,] 9.314290 10.60022
## [5,] 9.992809 11.94563
## [6,] 9.065248 10.50861
What proportion of these CIs include the “true” parameter?
mean(1*((ciArray[,1] < 10) & (ciArray[,2] > 10)))
## [1] 0.9517
By default the t.test function in R returns a 95% confidence interval for the mean.
Remember: confidence intervals (in the frequentist approach) make sense in the long run …
As we have already seen, the epi.conf function estimates confidence intervals for means, proportions, incidence rates, etc. in a single function. Let’s look at a couple of examples:
library(epiR)
Method mean.single requires a vector as input.
dat <- rnorm(n = 100, mean = 0, sd = 1)
epi.conf(dat, ctype = "mean.single")
## est se lower upper
## 1 0.1282932 0.09706294 -0.06430076 0.3208871
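Since dat here is just a numeric vector, the same interval can be cross-checked with t.test; the limits should essentially match the epi.conf result for the same dat vector (the specific numbers change from run to run because dat is simulated without a seed):
t.test(dat)$conf.int                      # 95% CI for the mean of the same vector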
In statistics we distinguish between independent samples and paired samples. Roughly, independent samples consist of different statistical units (for example, different subjects) that are randomly selected and have “nothing” in common. Paired data instead involve some correlation between the statistical units, for example repeated measures on the same subjects, or a correlation induced by design (we will introduce matching procedures later in the course, in block 3).
Very often we want to compare the mean of a numerical outcome of interest between two groups. In this case one option is to compute the mean difference with its 95% CI and check whether the interval contains zero. Why? Because if the 95% CI contains zero we cannot conclude in favor of a statistically significant difference; instead we conclude that the data offer no evidence of coming from different populations. Remember that this procedure also depends heavily on the variability around the means and on the sample size!
Method mean.unpaired requires a two-column data frame; the first column defining the groups must be a factor.
group <- factor(c(rep("A", times = 5), rep("B", times = 5)))   # grouping column must be a factor for epi.conf
val <- round(c(rnorm(n = 5, mean = 10, sd = 5), rnorm(n = 5, mean = 7, sd = 5)), digits = 0)
dat <- data.frame(group = group, val = val)
epi.conf(dat, ctype = "mean.unpaired")
## est se lower upper
## 1 6.6 3.563706 -1.617921 14.81792
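As a hedged cross-check (a sketch for comparison, not the epiR implementation), a pooled-variance t-test on the same data should give essentially the same mean difference (A minus B) and interval:
t.test(val ~ group, data = dat, var.equal = TRUE)$conf.int   # pooled-variance 95% CI for the difference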
Two paired samples: systolic blood pressure levels were measured in 16 middle-aged men before and after a standard exercise test. The mean rise in systolic blood pressure was 6.6 mmHg. The standard deviation of the difference was 6.0 mm Hg. The standard error of the mean difference was 1.49 mm Hg.
before <- c(148,142,136,134,138,140,132,144,128,170,162,150,138,154,126,116)
after <- c(152,152,134,148,144,136,144,150,146,174,162,162,146,156,132,126)
dat <- data.frame(before, after)
epi.conf(dat, ctype = "mean.paired", conf.level = 0.95)
## est se lower upper
## 1 6.625 1.491294 3.446382 9.803618
The 95% confidence interval for the population value of the mean systolic blood pressure increase after standard exercise was 3.4 to 9.8 mm Hg. Is there a significant effect of the exercise test? What do you think?
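The same interval should be reproduced by a paired t-test on the two columns; this is just a cross-check of the epi.conf output above:
t.test(dat$after, dat$before, paired = TRUE)$conf.int   # paired 95% CI for after minus before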
Out of 263 patients giving their views on the use of personal computers in general medical practice, 81 of them thought that the privacy of their medical file had been reduced.
pos <- 81
neg <- (263 - 81)
dat <- as.matrix(cbind(pos, neg))
round(epi.conf(dat, ctype = "prop.single"), digits = 3)
## est se lower upper
## pos 0.308 0.028 0.255 0.366
The 95% confidence interval for the population value of the proportion of patients thinking their privacy was reduced was from 0.255 to 0.366.
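As a hedged cross-check (the exact interval method epi.conf applies here may depend on its defaults and version), the Wilson interval for 81 out of 263 from DescTools gives very similar limits:
BinomCI(81, 263, conf.level = 0.95, method = "wilson")   # Wilson 95% CI for comparison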
Adverse effects were recorded for 85 patients receiving either terbinafine or placebo treatment for dermatophyte onychomycosis. Out of 56 patients receiving terbinafine, 5 experienced adverse effects. Out of 29 patients receiving a placebo, none experienced adverse effects.
grp1 <- matrix(cbind(5, 51), ncol = 2)
grp2 <- matrix(cbind(0, 29), ncol = 2)
dat <- as.matrix(cbind(grp1, grp2))
round(epi.conf(dat, ctype = "prop.unpaired"), digits = 3)
## est se lower upper
## 1 0.089 0.038 -0.038 0.193
The 95% confidence interval for the difference between the two groups is from -0.038 to +0.193. Is the rate of adverse effects significantly different between terbinafine and placebo? What do you think?
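For comparison only (a rough sketch, not the method used by epi.conf), base R’s prop.test gives a continuity-corrected Wald-type interval for the same difference, so its limits will not match exactly; expect a warning because of the small counts:
prop.test(x = c(5, 0), n = c(56, 29))$conf.int   # alternative interval for the difference in proportions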
In a reliability exercise, 41 patients were randomly selected from those who had undergone a stress test. The 41 sets of images were classified as normal or not by the laboratory and, independently, by clinical investigators from different centres. Of the 19 samples identified as ischaemic by clinical investigators 5 were identified as normal by the laboratory. Of the 22 samples identified as normal by clinical investigators 0 were identified as ischaemic by the laboratory.
dat <- as.matrix(cbind(14, 5, 0, 22))
round(epi.conf(dat, ctype = "prop.paired", conf.level = 0.95), digits = 3)
## est se lower upper
## 1 0.122 0.051 0.011 0.226
The 95% confidence interval for the difference in evaluations is 0.011 to 0.226, or approximately +1% to +23%. Is there a significantly different evaluation between the laboratory and the clinical investigators? What do you think?
From a herd of 1000 cattle, 200 were tested for brucellosis and four samples returned a positive result. Assuming 100% test sensitivity and specificity, what is the estimated prevalence of brucellosis in this group of animals?
pos <- 4; pop <- 200
dat <- as.matrix(cbind(pos, pop))
epi.conf(dat, ctype = "prevalence", method = "exact", N = 1000,
design = 1, conf.level = 0.95) * 100
## est lower upper
## 1 2 0.5475566 5.041361
The estimated prevalence of brucellosis in this herd is 2 cases per 100 cattle (95% CI 0.54 – 5.0 cases per 100 cattle).
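Since this is treated as a simple random sample (design = 1), the exact interval above should coincide closely with the Clopper-Pearson interval for 4 positives out of 200; a quick cross-check, expressed per 100 animals:
binom.test(4, 200)$conf.int * 100   # exact Clopper-Pearson 95% CI per 100 animals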
Clopper, C. J., Pearson, E. S. (1934). The use of confidence or fiducial limits illustrated in the case of the binomial. Biometrika 26(4), 404-413.
Blaker, H. (2000). Confidence curves and improved exact confidence intervals for discrete distributions. Canadian Journal of Statistics 28(4), 783-798.