A case-control study of the relationship between smoking and CHD (coronary heart disease) is planned. A sample of men with newly diagnosed CHD will be compared for smoking status with a sample of controls. Assuming an equal number of cases and controls, how many study subjects are required to detect an odds ratio of 2.0 with 0.90 power using a two-sided 0.05 test? Previous surveys have shown that around 0.30 of males without CHD are smokers.
We use the epi.sscc function, which requires the following arguments:
*OR = scalar, the expected study odds ratio.
*p0 = scalar, the prevalence of exposure among the controls.
*n = scalar, the total number of subjects in the study (i.e. the number of cases plus the number of controls).
*power = scalar, the required study power.
*r = scalar, the number in the control group divided by the number in the case group.
*phi.coef = scalar, the correlation between case and control exposures for matched pairs. Ignored when method = “unmatched”.
*design = scalar, the design effect. The design effect accounts for possible clustering in the random sampling of the data. It is a measure of the variability between clusters and is generally calculated as the variance obtained assuming a complex sample design divided by the variance obtained assuming simple random sampling.
*sided.test = scalar, 1 or 2. Use a two-sided test to evaluate whether the odds of exposure in cases are greater than or less than the odds of exposure in controls. Use a one-sided test to evaluate whether the odds of exposure in cases are greater than the odds of exposure in controls.
*conf.level = scalar, the level of confidence in the computed result.
*method = a character string defining the method to be used. Options are unmatched or matched.
*fleiss = logical, indicating whether or not the Fleiss correction should be applied. This is a continuity correction, used because a continuous probability distribution approximates a discrete one. This argument is ignored when method = “matched”.
library(epiR)
epi.sscc(OR = 2.0, p0 = 0.30, n = NA, power = 0.90, r = 1, phi.coef = 0,
design = 1, sided.test = 2, conf.level = 0.95, method = "unmatched")
## $n.total
## [1] 376
##
## $n.case
## [1] 188
##
## $n.control
## [1] 188
##
## $power
## [1] 0.9
##
## $OR
## [1] 2
A total of 376 men need to be sampled: 188 cases and 188 controls.
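As a cross-check, the figure of 188 per group can be reproduced by hand. The sketch below uses base R only and assumes the textbook normal-approximation formula for comparing two proportions with equal group sizes and no continuity correction. It is not taken from the epi.sscc source, it is not guaranteed to agree with epi.sscc for other inputs (unequal allocation in particular), and the variable names are illustrative. The exposure prevalence among cases, p1, is derived from the odds ratio and p0.

```r
# Hand calculation for an unmatched case-control study with r = 1.
# Assumption: textbook two-proportion formula, no continuity correction.
OR <- 2.0
p0 <- 0.30
power <- 0.90
alpha <- 0.05

# Exposure prevalence among cases implied by OR and p0
p1 <- (OR * p0) / (1 + p0 * (OR - 1))

z_alpha <- qnorm(1 - alpha / 2)   # two-sided test
z_beta  <- qnorm(power)
p_bar   <- (p1 + p0) / 2          # pooled prevalence under equal allocation

n_per_group <- (z_alpha * sqrt(2 * p_bar * (1 - p_bar)) +
                z_beta * sqrt(p1 * (1 - p1) + p0 * (1 - p0)))^2 / (p1 - p0)^2

ceiling(n_per_group)  # 188 cases, and the same number of controls
```

The implied prevalence of smoking among cases is about 0.46, so the study is sized to distinguish 0.46 from 0.30.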
Suppose we wish to determine the power to detect an odds ratio of 2.0 using a two-sided 0.05 test when 188 cases and 940 controls are available (that is, the ratio of controls to cases is 5:1). Assume the prevalence of smoking in males without CHD is 0.30. Here we use the Fleiss correction because we are dealing with an inverse problem (i.e. estimating power given the sample size). For technical details see Fleiss JL et al. (2003), Statistical Methods for Rates and Proportions, 3rd edition, Wiley, New York.
n <- 188 + 940
epi.sscc(OR = 2.0, p0 = 0.30, n = n, power = NA, r = 5, phi.coef = 0,
design = 1, sided.test = 2, conf.level = 0.95, method = "unmatched",
fleiss = TRUE)
## $n.total
## [1] 1128
##
## $n.case
## [1] 188
##
## $n.control
## [1] 940
##
## $power
## [1] 0.9880212
##
## $OR
## [1] 2
With this sample size allocation, the power of the study is 0.99.
We wish to conduct a case-control study to assess whether bladder cancer may be associated with past exposure to cigarette smoking. Cases will be patients with bladder cancer and controls will be patients hospitalized for injury. It is assumed that 20% of controls will be smokers or past smokers, and we wish to detect an odds ratio of 2 with power 90%. Three controls will be recruited for every case.
How many subjects need to be enrolled in the study?
epi.sscc(OR = 2.0, p0 = 0.20, n = NA, power = 0.90, r = 3, phi.coef = 0,
design = 1, sided.test = 2, conf.level = 0.95, method = "unmatched")
## $n.total
## [1] 620
##
## $n.case
## [1] 155
##
## $n.control
## [1] 465
##
## $power
## [1] 0.9
##
## $OR
## [1] 2
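A total of 620 subjects need to be enrolled in the study: 155 cases and 465 controls.
For cohort studies we use the epi.sscohortc function. Its arguments n, power, design, sided.test and conf.level are as described for epi.sscc; irexp1 and irexp0 are the expected incidence of the outcome in the exposed and unexposed groups, and r is the number in the exposed group divided by the number in the unexposed group.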
A cohort (exposure-based) study of smoking and coronary heart disease (CHD) in middle-aged men is planned. A sample of men will be selected at random from the population and those who agree to participate will be asked to complete a questionnaire. The follow-up period will be 5 years. The investigators would like to be 0.90 sure of detecting a relative risk of CHD of 1.4 for smokers, using a one-sided 0.05 significance test. Previous evidence suggests that the incidence rate of death in non-smokers is 413 per 100000 person-years. Assuming equal numbers of smokers and non-smokers are sampled, how many men should be sampled overall?
irexp1 <- 1.4 * (5 * 413) / 100000
irexp0 <- (5 * 413) / 100000
epi.sscohortc(irexp1 = irexp1, irexp0 = irexp0, n = NA, power = 0.90,
r = 1, design = 1, sided.test = 1, conf.level = 0.95)
## $n.total
## [1] 12130
##
## $n.exp1
## [1] 6065
##
## $n.exp0
## [1] 6065
##
## $power
## [1] 0.9
##
## $irr
## [1] 1.4
##
## $or
## [1] 1.411908
Over a 5-year period the estimated risk of death in the unexposed (irexp0) is 0.02065. Thus, rounding up to the next highest even number, 12130 men should be sampled (6065 smokers and 6065 nonsmokers).
Because this design draws a simple random sample from the overall population, the expected prevalence of exposure may be known in advance, so we can be more flexible and allow unequal numbers of exposed and unexposed subjects.
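The inputs and the $irr and $or values in the output can be traced with a few lines of base R. This is a sketch of the arithmetic only: it follows the calculation used above, in which the 5-year risk is approximated as rate times follow-up time, and the variable names are illustrative.

```r
rate_unexposed <- 413 / 100000   # deaths per person-year among non-smokers
years <- 5                       # length of follow-up
rr <- 1.4                        # relative risk to be detected

# Cumulative risk approximated as rate x time (reasonable while the risk is small)
risk0 <- years * rate_unexposed  # 0.02065
risk1 <- rr * risk0              # 0.02891

risk_ratio <- risk1 / risk0
odds_ratio <- (risk1 / (1 - risk1)) / (risk0 / (1 - risk0))

round(c(irr = risk_ratio, or = odds_ratio), 6)
```

Because the outcome is rare, the odds ratio (about 1.41) is close to the relative risk of 1.4.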
The imbalance is set through the parameter r.
If the proportion of men smoking is one-third, then in a random sample we would expect to find n1 = n/3. Hence n2 = 2n/3 and thus r = n1/n2 = 0.5. Using this new value for r:
epi.sscohortc(irexp1 = irexp1, irexp0 = irexp0, n = NA, power = 0.90, r = 0.50, design = 1, sided.test = 1, conf.level = 0.95)
## $n.total
## [1] 13543.5
##
## $n.exp1
## [1] 4514.5
##
## $n.exp0
## [1] 9029
##
## $power
## [1] 0.9
##
## $irr
## [1] 1.4
##
## $or
## [1] 1.411908
Rounding up gives a total sample size requirement of 13544. We would expect that, after random sampling, about a third of these would be smokers. Notice that 13544 is not exactly divisible by 3, but any further rounding up would not be justifiable because we cannot guarantee the exact proportion of smokers.
Say, for example, we are only able to enroll 5000 subjects into the study described above. What are the minimum and maximum detectable relative risks?
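The link between the prevalence of exposure and r, and the expected split of the rounded total, can be sketched in base R. The variable names are illustrative, and the one-third prevalence is the assumption made in the text.

```r
p_smoker <- 1 / 3                 # assumed prevalence of smoking in the population
r <- p_smoker / (1 - p_smoker)    # exposed divided by unexposed, 0.5

n_total <- 13544                  # rounded-up total sample size
expected_smokers    <- n_total * p_smoker
expected_nonsmokers <- n_total - expected_smokers

round(c(r = r, smokers = expected_smokers, nonsmokers = expected_nonsmokers), 1)
```

The expected counts are not whole numbers, which is the reason given above for not rounding the total any further.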
irexp0 <- (5 * 413) / 100000
epi.sscohortc(irexp1 = NA, irexp0 = irexp0, n = 5000, power = 0.90,
r = 1, design = 1, sided.test = 1, conf.level = 0.95)
## $n.total
## [1] 5000
##
## $n.exp1
## [1] 2500
##
## $n.exp0
## [1] 2500
##
## $power
## [1] 0.9
##
## $irr
## [1] 0.5047104 1.6537812
##
## $or
## numeric(0)
The minimum detectable relative risk > 1 is 1.65. The maximum detectable relative risk < 1 is 0.50.
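One way to sanity-check these limits is to feed them back into a power calculation. The sketch below uses power.prop.test from base R's stats package rather than epiR, so it assumes a plain two-proportion normal approximation and may not match the epiR calculation exactly; the two relative risks are copied from the output above.

```r
risk0 <- (5 * 413) / 100000
rr_detectable <- c(0.5047104, 1.6537812)   # values copied from the output above

# n is the number of subjects in each group (2500 exposed, 2500 unexposed)
power_check <- sapply(rr_detectable, function(rr) {
  power.prop.test(n = 2500, p1 = rr * risk0, p2 = risk0,
                  sig.level = 0.05, alternative = "one.sided")$power
})

round(power_check, 2)   # both should be close to the 0.90 target
```

Relative risks between the two limits would be detected with less than the required power at this sample size.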
A study is to be carried out to assess the effect of a new treatment for the reproductive period in dairy cattle. What is the required sample size if we expect the proportion of cows responding in the treatment (exposed) group to be 0.30 and the proportion of cows responding in the control (unexposed) group to be 0.15? The required power for this study is 0.80 using a two-sided 0.05 test.
epi.sscohortc(irexp1 = 0.30, irexp0 = 0.15, n = NA, power = 0.80,
r = 1, design = 1, sided.test = 2, conf.level = 0.95)
## $n.total
## [1] 242
##
## $n.exp1
## [1] 121
##
## $n.exp0
## [1] 121
##
## $power
## [1] 0.8
##
## $irr
## [1] 2
##
## $or
## [1] 2.428571
A total of 242 cows are required: 121 in the treatment (exposed) group and 121 in the control (unexposed) group.
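The same answer can be cross-checked without epiR. The sketch below uses power.prop.test from base R's stats package, which assumes equal group sizes and a normal approximation without continuity correction; it is offered as an independent check, not as a description of how epi.sscohortc works internally.

```r
check <- power.prop.test(p1 = 0.30, p2 = 0.15, power = 0.80,
                         sig.level = 0.05, alternative = "two.sided")
n_per_group <- ceiling(check$n)   # cows needed in each group

# Effect measures reported as $irr and $or in the output above
irr <- 0.30 / 0.15
or  <- (0.30 / 0.70) / (0.15 / 0.85)

c(n_per_group = n_per_group, irr = irr, or = round(or, 6))
```

Here the outcome is common, so the odds ratio (about 2.43) is noticeably larger than the risk ratio of 2.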