Load the required libraries:
library(epiDisplay)
library(psych)
In nature, an event usually takes place in a very small amount of time, and at any given point in time the probability of encountering such an event is small. Instead of the probability of a single event, we therefore focus on the frequency of events as a density, that is, the incidence or 'count' of events over a period of time. (While time is one dimension, the same concept applies to the density of counts of small objects in a two-dimensional area or three-dimensional space.) Moreover, we can assume that one event is independent of another and that the densities in different units of time vary with a variance equal to the average density. This kind of random process can be approximated by the Poisson random variable.

When the probability of having an event is affected by some factors, a model is needed to explain and predict the density. Variation among different strata of a population can be explained by different combinations of these factors, while within each stratum (defined by a combination of covariates) the distribution of events is assumed to be random. Poisson regression deals with outcome variables that are counts in nature (non-negative whole numbers, or integers). Independent covariates play the same role as in linear and logistic regression.

In epidemiology, Poisson regression is very often used for analysing grouped population-based or cohort data, looking at the incidence density among the person-time contributed by subjects who share similar characteristics of interest. Poisson regression is one of three common regression models used in epidemiological studies. The other two, which are more commonly used, are linear regression and logistic regression, and have already been covered. The remaining family consists of survival methods, which will be explained in block 4.
There are two main assumptions for Poisson regression:
risk is homogeneous among person-times contributed by different subjects who share the same characteristics of interest (e.g. sex, age-group) and the same period.
asymptotically, or as the sample size becomes larger and larger, the mean of the counts is equal to the variance.
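The second assumption (mean equal to variance) can be illustrated with a minimal simulation sketch; the rate (lambda) and seed below are arbitrary choices for illustration only:
set.seed(123)                      # arbitrary seed, for reproducibility
y <- rpois(10000, lambda = 2.5)    # simulate 10,000 Poisson counts with density 2.5
mean(y)                            # close to 2.5
var(y)                             # approximately equal to the mean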
Straightforward linear regression methods (assuming constant variance and normal errors) are not appropriate for count data for four main reasons:
the linear model might lead to the prediction of negative counts
the variance of the response may increase with the mean
the errors will not be normally distributed
zero counts are difficult to handle in transformations
Moreover, in studies that involve the time dimension, different subjects may contribute different person-times of exposure. Analysing risk factors while ignoring these differences in person-time is wrong.
Poisson regression overcomes these limitations.
Note that in survival analysis, for example with Cox regression (see block 4), a hazard ratio is estimated for each covariate in the model rather than the incidence density in each subgroup; the Cox model focuses on "how long until an event occurs" (time to event), whereas the Poisson regression model focuses on "how many events occur in a given interval".
The dataset Montana was extracted from an occupational cohort study conducted to test the association between respiratory deaths (outcome) and exposure to arsenic in the industry, after adjusting for various other risk factors/confounders. The main outcome variable is respdeath. This is the count of the number of deaths among personyrs or personyears of subjects in each category. The other variables are independent covariates including age group agegr, period of employment period, starting time of employment start and the level of exposure to arsenic during the study period arsenic (the exposure of interest). Read in the data first and examine the variables.
data(Montana)
summary(Montana)
## respdeath personyrs agegr period
## Min. : 0.000 Min. : 4.2 Min. :1.000 Min. :1.000
## 1st Qu.: 0.000 1st Qu.: 127.5 1st Qu.:2.000 1st Qu.:1.000
## Median : 1.000 Median : 335.1 Median :3.000 Median :2.000
## Mean : 2.421 Mean : 1096.4 Mean :2.605 Mean :2.404
## 3rd Qu.: 3.000 3rd Qu.: 925.7 3rd Qu.:4.000 3rd Qu.:3.000
## Max. :19.000 Max. :12451.3 Max. :4.000 Max. :4.000
## starting arsenic
## Min. :1.000 Min. :1.000
## 1st Qu.:1.000 1st Qu.:1.250
## Median :1.000 Median :2.000
## Mean :1.456 Mean :2.474
## 3rd Qu.:2.000 3rd Qu.:3.000
## Max. :2.000 Max. :4.000
The last four variables are classed as integers. We need to tell R to interpret them as categorical variables, or factors, and attach labels to each of the levels. This can be done using the factor command with a ‘labels’ argument included.
Montana$agegr <- factor(Montana$agegr, labels=c("40-49","50-59","60-69","70-79"))
Montana$period <- factor(Montana$period, labels=c("1938-1949", "1950-1959","1960-1969", "1970-1977"))
Montana$start <- factor(Montana$starting, labels=c("pre-1925", "1925 & after"))
Montana$arsenic1 <- factor(Montana$arsenic, labels=c("<1 year", "1-4years","5-14 years", "15+ years"))
summary(Montana)
## respdeath personyrs agegr period starting
## Min. : 0.000 Min. : 4.2 40-49:24 1938-1949:30 Min. :1.000
## 1st Qu.: 0.000 1st Qu.: 127.5 50-59:28 1950-1959:32 1st Qu.:1.000
## Median : 1.000 Median : 335.1 60-69:31 1960-1969:28 Median :1.000
## Mean : 2.421 Mean : 1096.4 70-79:31 1970-1977:24 Mean :1.456
## 3rd Qu.: 3.000 3rd Qu.: 925.7 3rd Qu.:2.000
## Max. :19.000 Max. :12451.3 Max. :2.000
## arsenic start arsenic1
## Min. :1.000 pre-1925 :62 <1 year :29
## 1st Qu.:1.250 1925 & after:52 1-4years :29
## Median :2.000 5-14 years:29
## Mean :2.474 15+ years :27
## 3rd Qu.:3.000
## Max. :4.000
We keep the original arsenic variable unchanged for use later on.
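As an optional sanity check, cross-tabulating the original codes against the new labels confirms that the labelling was applied in the intended order:
with(Montana, table(arsenic, arsenic1))   # original codes 1-4 versus the new labels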
Let us explore the person-years breakdown by age and period. Firstly, create a table for total person-years:
table.pyears <- tapply(Montana$personyrs, list(Montana$period, Montana$agegr), sum)
Carry out the same procedure for number of deaths, and compute the table of incidence per 10,000 person years for each cell.
table.deaths <- tapply(Montana$respdeath, list(Montana$period, Montana$agegr), sum)
table.inc10000 <- table.deaths/table.pyears*10000
table.inc10000
## 40-49 50-59 60-69 70-79
## 1938-1949 5.424700 17.13102 34.95107 26.53928
## 1950-1959 3.344638 23.47556 49.01961 64.82632
## 1960-1969 4.341516 20.49375 58.23803 55.06608
## 1970-1977 4.408685 14.77747 44.09949 80.81413
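If desired, the same tables can be collapsed over periods to obtain a crude incidence rate per 10,000 person-years for each age group alone; a short sketch using the objects created above:
# total deaths and person-years by age group (summed over periods)
colSums(table.deaths) / colSums(table.pyears) * 10000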
Now, create a time-series plot of the incidence:
plot.ts(table.inc10000, plot.type="single", xlab=" ", ylab="#/10,000 person-years",
        xaxt="n", col=c("black","blue","red","green"), lty=c(2,1,1,2), las=1)
points(rep(1:4,4), table.inc10000, pch=22, cex=table.pyears/sum(table.pyears) * 20)
title(main = "Incidence by age and period")
axis(side = 1, at = 1:4, labels = levels(Montana$period))
legend(3.2,40, legend=levels(Montana$agegr)[4:1], col=c("green","red", "blue", "black"), bg = "white", lty=c(2,1,1,2))
The above graph shows that the older age groups are generally associated with a higher risk. On the other hand, the sample size (reflected by the size of the squares at each point) decreases with age. The possible confounding effect of age on the exposure of interest can be better examined using Poisson regression.
Let’s estimate a Poisson regression model taking into account only period as a covariate:
model1 <- glm(respdeath ~ period, offset = log(personyrs), family = poisson, data = Montana)
summary(model1)
##
## Call:
## glm(formula = respdeath ~ period, family = poisson, data = Montana,
## offset = log(personyrs))
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -5.4932 -0.7949 0.3799 1.5010 4.7243
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -6.4331 0.1715 -37.511 <2e-16 ***
## period1950-1959 0.2365 0.2117 1.117 0.2638
## period1960-1969 0.3781 0.2001 1.889 0.0588 .
## period1970-1977 0.4830 0.2036 2.372 0.0177 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 376.02 on 113 degrees of freedom
## Residual deviance: 369.27 on 110 degrees of freedom
## AIC: 596.05
##
## Number of Fisher Scoring iterations: 6
The option offset = log(personyrs) makes the variable personyrs act as the denominator for the counts in respdeath. The logarithmic transformation is needed because the link function for the Poisson family is the natural log, so the offset must be supplied on the same (log) scale as the linear predictor.
Reminder: an important criterion in the choice of a link function for the various families of distributions is to ensure that the fitted values from the model stay within reasonable bounds. Specifying a log link (the default for the Poisson family) ensures that the fitted counts are all greater than or equal to zero. For more details on the default links for the various families of distributions used in generalized linear modelling, see the help in R under help(family).
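Note that an equivalent way of specifying the denominator is to include the offset term directly in the model formula; the following sketch (the object name 'model1b' is just illustrative) should give the same fit as 'model1' above:
# equivalent specification, with the offset written into the formula
model1b <- glm(respdeath ~ period + offset(log(personyrs)),
               family = poisson, data = Montana)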
The first model above, with period as the only independent variable, suggests that the death rate increased with time. The model can be tested for goodness of fit to check whether the Poisson assumptions mentioned earlier have been violated.
To test the goodness of fit of the Poisson model, type:
poisgof(model1)
## $results
## [1] "Goodness-of-fit test for Poisson assumption"
##
## $chisq
## [1] 369.2654
##
## $df
## [1] 110
##
## $p.value
## [1] 9.578443e-30
The component '$chisq' is actually computed from the model deviance, a quantity reflecting the level of error. A large chi-squared value relative to its degrees of freedom indicates a significant violation of the Poisson assumption (p < 0.05). If only the P value is wanted, the command can be shortened.
poisgof(model1)$p.value
## [1] 9.578443e-30
The P value is very small indicating a poor fit.
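This P value can also be reproduced directly from the residual deviance and its degrees of freedom, which is essentially what poisgof computes; a minimal sketch using 'model1' as fitted above:
# upper-tail chi-squared probability of the residual deviance
pchisq(deviance(model1), df.residual(model1), lower.tail = FALSE)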
Note: this method assumes a large sample size. An alternative is to fit a negative binomial regression model (not covered in the slides); an example is given at the end of this section.
We now add the second independent variable ‘agegr’ to the model:
model2 <- glm(respdeath ~ agegr + period, offset = log(personyrs), family = poisson, data = Montana)
AIC(model2)
## [1] 396.6407
The AIC (Akaike Information Criterion) has decreased markedly from 'model1' to 'model2', indicating that the first model fits relatively poorly.
poisgof(model2)$p.value
## [1] 0.0003381328
But ‘model2’ still violates the Poisson assumption.
model3 <- glm(respdeath ~ agegr, offset = log(personyrs), family = poisson, data = Montana)
AIC(model3)
## [1] 394.4699
poisgof(model3)$p.value
## [1] 0.0003295142
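As a complementary check, whether 'period' can be dropped may also be assessed with a likelihood ratio test comparing the two nested models; a brief sketch using the base anova method:
# likelihood ratio (deviance) test: model3 (agegr only) versus model2 (agegr + period)
anova(model3, model2, test = "Chisq")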
Removal of 'period' further reduces the AIC, but the model still violates the Poisson assumption to about the same extent as the previous model. The next step is to add the exposure of interest: 'arsenic1'.
model4 <- glm(respdeath ~ agegr + arsenic1, offset = log(personyrs), family = poisson, data = Montana)
summary(model4)
##
## Call:
## glm(formula = respdeath ~ agegr + arsenic1, family = poisson,
## data = Montana, offset = log(personyrs))
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.8356 -0.8555 -0.2741 0.6216 2.0531
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -7.9947 0.2237 -35.739 < 2e-16 ***
## agegr50-59 1.4624 0.2454 5.960 2.53e-09 ***
## agegr60-69 2.3502 0.2380 9.873 < 2e-16 ***
## agegr70-79 2.5987 0.2564 10.136 < 2e-16 ***
## arsenic11-4years 0.8041 0.1577 5.099 3.42e-07 ***
## arsenic15-14 years 0.5957 0.2059 2.893 0.00381 **
## arsenic115+ years 0.9976 0.1758 5.673 1.40e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 376.02 on 113 degrees of freedom
## Residual deviance: 122.25 on 107 degrees of freedom
## AIC: 355.04
##
## Number of Fisher Scoring iterations: 5
poisgof(model4)$p.value
## [1] 0.1486902
'model4' has a much lower AIC than 'model3' and no longer violates the Poisson assumption.
Now let us change the reference level of the arsenic variable so that '1-4 years' is compared against the other levels:
Montana$arsenic.b <- relevel(Montana$arsenic1,ref="1-4years")
model5 <- glm(respdeath ~ agegr + arsenic.b, offset = log(personyrs), family = poisson, data = Montana)
summary(model5)
##
## Call:
## glm(formula = respdeath ~ agegr + arsenic.b, family = poisson,
## data = Montana, offset = log(personyrs))
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.8356 -0.8555 -0.2741 0.6216 2.0531
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -7.1905 0.2466 -29.156 < 2e-16 ***
## agegr50-59 1.4624 0.2454 5.960 2.53e-09 ***
## agegr60-69 2.3502 0.2380 9.873 < 2e-16 ***
## agegr70-79 2.5987 0.2564 10.136 < 2e-16 ***
## arsenic.b<1 year -0.8041 0.1577 -5.099 3.42e-07 ***
## arsenic.b5-14 years -0.2085 0.2326 -0.896 0.37
## arsenic.b15+ years 0.1935 0.2071 0.935 0.35
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 376.02 on 113 degrees of freedom
## Residual deviance: 122.25 on 107 degrees of freedom
## AIC: 355.04
##
## Number of Fisher Scoring iterations: 5
There does not appear to be any increase in the risk of death with more than 4 years of exposure to arsenic, so it may be worth combining the exposure into just two levels:
Montana$arsenic2 <- Montana$arsenic1
levels(Montana$arsenic2) <- c("<1 year", rep("1+ years", 3))
model6 <- glm(respdeath ~ agegr + arsenic2,offset=log(personyrs), family=poisson, data=Montana)
summary(model6)
##
## Call:
## glm(formula = respdeath ~ agegr + arsenic2, family = poisson,
## data = Montana, offset = log(personyrs))
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.5852 -0.8666 -0.3307 0.6802 2.2097
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -8.0086 0.2233 -35.859 < 2e-16 ***
## agegr50-59 1.4702 0.2453 5.994 2.04e-09 ***
## agegr60-69 2.3661 0.2372 9.976 < 2e-16 ***
## agegr70-79 2.6238 0.2548 10.297 < 2e-16 ***
## arsenic21+ years 0.8109 0.1210 6.699 2.09e-11 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 376.02 on 113 degrees of freedom
## Residual deviance: 125.02 on 109 degrees of freedom
## AIC: 353.8
##
## Number of Fisher Scoring iterations: 5
At this stage, we would accept 'model6' as the final model, since it has the smallest AIC among all the models we have tried. We conclude that exposure to arsenic for at least one year is associated with an increase in the risk of respiratory death by a factor of exp(0.8109) = 2.25, with statistical significance and independently of age.
In the Poisson model, the outcome is a count. In a generalized linear model, the relationship between the outcome (as measured in the data and predicted by the model as fitted values) and the linear predictor is determined by the link function, which relates the mean value of the outcome to its linear predictor. By default, the link function for the Poisson family is the natural logarithm. With the offset being log(person-time), the model effectively describes the log(incidence rate).
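To make this concrete, the fitted counts from 'model6' can be reconstructed by hand: on the log scale they equal the linear predictor plus the offset. A small sketch (using only objects created above; 'lp' is just an illustrative name):
# X %*% beta: the linear predictor without the offset
lp <- as.vector(model.matrix(model6) %*% coef(model6))
# log(fitted count) = linear predictor + log(person-years)
all.equal(unname(log(fitted(model6))), lp + log(Montana$personyrs))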
The matrix ‘table.inc10000’ (created previously) gives the crude incidence rate by age group and period. Each of the Poisson regression models above can be used to compute the predicted incidence rate when the variables in the model are given. For example, to compute the incidence rate from a population of 100,000 people aged between 40-49 years who were exposed to arsenic for less than one year using ‘model6’, type:
newdata <- as.data.frame(list(agegr="40-49",arsenic2="<1 year", personyrs=100000))
predict(model6, newdata, type="response")
## 1
## 33.25736
This population would have an estimated incidence rate of 33.26 per 100,000 person-years.
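The same idea extends to any combination of covariate values. For example, a sketch (the object name 'newdata2' is arbitrary) computing the predicted rate per 100,000 person-years for every age group among those exposed to arsenic for less than one year:
# predicted incidence per 100,000 person-years by age group, arsenic exposure < 1 year
newdata2 <- expand.grid(agegr = levels(Montana$agegr),
                        arsenic2 = "<1 year",
                        personyrs = 100000)
cbind(newdata2, pred = predict(model6, newdata2, type = "response"))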
In this example, all subjects pool their follow-up times, and this total is called the 'person-time', which is then used as the denominator for the event counts, giving the 'incidence rate'. Comparing incidence rates between two groups defined by exposure status is fairer than comparing crude risks (based on the population at baseline). The ratio between the incidence rates of two groups is called the incidence rate ratio (IRR), an improved form of the relative risk.
In ‘model6’, for example, we want to compute the incidence rate ratio between the subjects exposed to arsenic for one or more years against those exposed for less than one year.
The shortest way to obtain this IRR is to exponentiate the coefficient of the variable 'arsenic2', which is the fifth coefficient in the model.
coef(model6)
## (Intercept) agegr50-59 agegr60-69 agegr70-79
## -8.0086495 1.4701505 2.3661111 2.6237532
## arsenic21+ years
## 0.8108698
exp(coef(model6)[5])
## arsenic21+ years
## 2.249864
The following steps explain how the 95% confidence interval for all variables can be obtained.
coeff <- coef(model6)
coeff.95ci <- cbind(coeff, confint(model6))
coeff.95ci
## coeff 2.5 % 97.5 %
## (Intercept) -8.0086495 -8.4780952 -7.598323
## agegr50-59 1.4701505 1.0094838 1.976026
## agegr60-69 2.3661111 1.9239329 2.858522
## agegr70-79 2.6237532 2.1411250 3.145495
## arsenic21+ years 0.8108698 0.5724818 1.047494
Note that confint(model6) provides a 95% confidence interval for the model coefficients.
IRR.95ci <- round(exp(coeff.95ci), 1)[-1,]
The required values are obtained from exponentiating the last matrix with the first row or intercept removed. The display is rounded to 1 decimal place for better viewing. Then the matrix column is labelled and the 95% CI is displayed.
colnames(IRR.95ci) <- c("IRR", "lower95ci", "upper95ci")
IRR.95ci
## IRR lower95ci upper95ci
## agegr50-59 4.3 2.7 7.2
## agegr60-69 10.7 6.8 17.4
## agegr70-79 13.8 8.5 23.2
## arsenic21+ years 2.2 1.8 2.9
A simpler way is to use the command idr.display in epiDisplay (note that it uses the term IDR: incidence density ratio).
idr.display(model6, decimal=1)
##
## Poisson regression predicting respdeath with offset = log(personyrs)
##
## crude IDR(95%CI) adj. IDR(95%CI) P(Wald's test)
## agegr: ref.=40-49
## 50-59 4.5 (2.8,7.3) 4.3 (2.7,7) < 0.001
## 60-69 11.3 (7.1,17.9) 10.7 (6.7,17) < 0.001
## 70-79 14.5 (8.8,23.8) 13.8 (8.4,22.7) < 0.001
##
## arsenic2: 1+ years vs <1 year 2.5 (2,3.1) 2.2 (1.8,2.9) < 0.001
##
## P(LR-test)
## agegr: ref.=40-49 < 0.001
## 50-59
## 60-69
## 70-79
##
## arsenic2: 1+ years vs <1 year < 0.001
##
## Log-likelihood = -171.9
## No. of observations = 114
## AIC value = 353.8
The command idr.display gives results to 3 decimal places by default. This can easily be changed by the user.
Recall that one of the assumptions for a valid Poisson model is that the mean and variance of the count variable are equal. In practice it is quite common for the variance of the outcome to be larger than the mean; this is called overdispersion. If a count variable is overdispersed, Poisson regression underestimates the standard errors of the predictor coefficients. When overdispersion is evident, one solution is to specify that the errors follow a negative binomial distribution, a more general distribution for 'count' response data that allows for greater dispersion (variance) of the counts. Negative binomial regression gives coefficients similar to those from Poisson regression (identical in the example below) but with larger standard errors; the interpretation of the results is the same as for Poisson regression.

Take as an example counts of water containers infested with mosquito larvae in a field survey. The data are contained in the dataset DHF99.
library(MASS)
data(DHF99)
summary(DHF99)
## houseid village education containers
## Min. : 1.00 Min. : 1.00 Primary :168 Min. : 0.0000
## 1st Qu.: 77.75 1st Qu.: 23.00 Secondary : 36 1st Qu.: 0.0000
## Median :154.50 Median : 51.00 High school: 34 Median : 0.0000
## Mean :174.27 Mean : 48.56 Bachelor : 25 Mean : 0.3512
## 3rd Qu.:269.25 3rd Qu.: 75.00 Other : 37 3rd Qu.: 0.0000
## Max. :385.00 Max. :105.00 Max. :11.0000
## NA's :1
## viltype
## rural:180
## urban: 72
## slum : 48
##
##
##
##
describeBy(DHF99$containers, group=DHF99$viltype)
##
## Descriptive statistics by group
## group: rural
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 179 0.49 1.25 0 0.2 0 0 11 11 4.58 29.1 0.09
## ------------------------------------------------------------
## group: urban
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 72 0.07 0.26 0 0 0 0 1 1 3.32 9.13 0.03
## ------------------------------------------------------------
## group: slum
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 48 0.25 0.53 0 0.15 0 0 2 2 1.93 2.84 0.08
The function for performing a negative binomial glm is glm.nb. This function is located in the MASS library. In addition, a very helpful function for selecting the best model based on the AIC value is the step function, which is located in the stats library (a default library loaded on start-up).
model.poisson <- step(glm(containers ~ education + viltype, family=poisson, data=DHF99))
## Start: AIC=513.3
## containers ~ education + viltype
##
## Df Deviance AIC
## - education 4 363.67 505.92
## <none> 363.05 513.30
## - viltype 2 397.54 543.79
##
## Step: AIC=505.92
## containers ~ viltype
##
## Df Deviance AIC
## <none> 363.67 505.92
## - viltype 2 398.52 536.76
model.nb <- step(glm.nb(containers ~ education + viltype, data=DHF99))
## Start: AIC=432.05
## containers ~ education + viltype
##
## Df Deviance AIC
## - education 4 156.64 424.23
## <none> 156.46 432.05
## - viltype 2 174.78 446.37
##
## Step: AIC=424.23
## containers ~ viltype
##
## Df Deviance AIC
## <none> 156.37 424.23
## - viltype 2 174.95 438.80
coef(model.poisson)
## (Intercept) viltypeurban viltypeslum
## -0.7100490 -1.9571792 -0.6762454
coef(model.nb)
## (Intercept) viltypeurban viltypeslum
## -0.7100490 -1.9571792 -0.6762454
Both stepwise procedures end up selecting only 'viltype'. The coefficients are essentially identical. The goodness-of-fit test indicates significant lack of fit (overdispersion) for the Poisson model, but not for the negative binomial model.
poisgof(model.poisson)$p.value
## [1] 0.004387831
poisgof(model.nb)$p.value
## [1] 1
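Overdispersion in the Poisson model can also be gauged informally with the Pearson dispersion statistic (the sum of squared Pearson residuals divided by the residual degrees of freedom), which should be close to 1 if the Poisson assumption holds; a short sketch:
# Pearson dispersion statistic for the Poisson model (values well above 1 suggest overdispersion)
sum(residuals(model.poisson, type = "pearson")^2) / df.residual(model.poisson)
# estimated dispersion parameter (theta) of the fitted negative binomial model
model.nb$theta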
The AIC of the negative binomial model is also better (smaller) than that of the Poisson model.
model.poisson$aic
## [1] 505.9154
model.nb$aic
## [1] 426.2283
Finally, the main differences to be examined are their standard errors, the 95% confidence intervals and P values.
summary(model.poisson)$coefficients
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.7100490 0.1066000 -6.660873 2.722059e-11
## viltypeurban -1.9571792 0.4597429 -4.257117 2.070800e-05
## viltypeslum -0.6762454 0.3077286 -2.197538 2.798202e-02
summary(model.nb)$coefficients
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.7100490 0.1731160 -4.101578 4.103414e-05
## viltypeurban -1.9571792 0.5255707 -3.723912 1.961591e-04
## viltypeslum -0.6762454 0.4274174 -1.582166 1.136116e-01
idr.display(model.poisson)
##
## Poisson regression predicting # infested vessels
##
## IDR(95%CI) P(Wald's test) P(LR-test)
## Village type: ref.=rural < 0.001
## urban 0.14 (0.06,0.35) < 0.001
## slum 0.51 (0.28,0.93) 0.028
##
## Log-likelihood = -249.9577
## No. of observations = 299
## AIC value = 505.9154
attach(DHF99)
idr.display(model.nb)
##
## Negative binomial regression predicting # infested vessels
##
## IDR(95%CI) P(Wald's test) P(LR-test)
## Village type: ref.=rural < 0.001
## urban 0.14 (0.05,0.4) < 0.001
## slum 0.51 (0.22,1.18) 0.114
##
## Log-likelihood = -209.1141
## No. of observations = 299
## AIC value = 426.2283
The standard errors from the negative binomial model are somewhat larger than those from the Poisson model, resulting in wider 95% confidence intervals and larger P values. From the Poisson regression, both the urban community and the slum area had a significantly lower risk of infestation than the rural villages (incidence rate ratios of about 0.14 and 0.51, respectively). However, from the negative binomial regression, only the urban community had a significantly lower risk.
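This difference can also be summarised by computing the IRRs with their 95% confidence intervals directly from each model; a short sketch using Wald-type intervals based on the standard errors from each fit:
# IRRs and Wald-type 95% CIs from the Poisson and the negative binomial models
round(exp(cbind(IRR = coef(model.poisson), confint.default(model.poisson))), 2)
round(exp(cbind(IRR = coef(model.nb), confint.default(model.nb))), 2)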