Basic Epidemiological Measures: always Prevalence and Incidence!

Load the R libraries that we will use: note that in R it is possible to use different libraries for the same type of calculations; we will introduce here the survival library for the person years calculations. We will use a lot this package in Block 4.

library(Epi)
library(AER)
library(MASS)
library(survival)

US Traffic Fatalities (cumulative incidence)

We use a dataset of US traffic fatalities data for the “lower 48” US states (i.e., excluding Alaska and Hawaii), annually for 1982 through 1988.

data(Fatalities)
head(Fatalities)

##   state year spirits unemp   income   emppop  beertax baptist  mormon drinkage
## 1    al 1982    1.37  14.4 10544.15 50.69204 1.539379 30.3557 0.32829    19.00
## 2    al 1983    1.36  13.7 10732.80 52.14703 1.788991 30.3336 0.34341    19.00
## 3    al 1984    1.32  11.1 11108.79 54.16809 1.714286 30.3115 0.35924    19.00
## 4    al 1985    1.28   8.9 11332.63 55.27114 1.652542 30.2895 0.37579    19.67
## 5    al 1986    1.23   9.8 11661.51 56.51450 1.609907 30.2674 0.39311    21.00
## 6    al 1987    1.18   7.8 11944.00 57.50988 1.560000 30.2453 0.41123    21.00
##       dry youngdrivers    miles breath jail service fatal nfatal sfatal
## 1 25.0063     0.211572 7233.887     no   no      no   839    146     99
## 2 22.9942     0.210768 7836.348     no   no      no   930    154     98
## 3 24.0426     0.211484 8262.990     no   no      no   932    165     94
## 4 23.6339     0.211140 8726.917     no   no      no   882    146     98
## 5 23.4647     0.213400 8952.854     no   no      no  1081    172    119
## 6 23.7924     0.215527 9166.302     no   no      no  1110    181    114
##   fatal1517 nfatal1517 fatal1820 nfatal1820 fatal2124 nfatal2124  afatal
## 1        53          9        99         34       120         32 309.438
## 2        71          8       108         26       124         35 341.834
## 3        49          7       103         25       118         34 304.872
## 4        66          9       100         23       114         45 276.742
## 5        82         10       120         23       119         29 360.716
## 6        94         11       127         31       138         30 368.421
##       pop  pop1517  pop1820  pop2124 milestot unempus emppopus         gsp
## 1 3942002 208999.6 221553.4 290000.1    28516     9.7     57.8 -0.02212476
## 2 3960008 202000.1 219125.5 290000.2    31032     9.6     57.9  0.04655825
## 3 3988992 197000.0 216724.1 288000.2    32961     7.5     59.5  0.06279784
## 4 4021008 194999.7 214349.0 284000.3    35091     7.2     60.1  0.02748997
## 5 4049994 203999.9 212000.0 263000.3    36259     7.0     60.7  0.03214295
## 6 4082999 204999.8 208998.5 258999.8    37426     6.2     61.5  0.04897637

This data frame contain 336 observations on 34 variables. Traffic fatalities are extracted from the US Department of Transportation Fatal Accident Reporting System. There are a lot of information, for example the beer tax is the tax on a case of beer, which is an available measure of state alcohol taxes more generally. The drinking age variable (drinkage) indicates whether the minimum legal drinking age is 18, 19, or 20. The two binary punishment variables (jail and service) describe the state’s minimum sentencing requirements for an initial drunk driving conviction.Total vehicle miles traveled annually by state was obtained from the Department of Transportation. Personal income was obtained from the US Bureau of Economic Analysis, and the unemployment rate was obtained from the US Bureau of Labor Statistics.

Variables of interest for us now are:

fatal: Number of vehicle fatalities
pop: Population in the state

We can calculate the fatality cumulative incidence rate (number of traffic deaths per 10000 people living in that state in that year) as follows:

Fatalities$frate <- with(Fatalities, fatal/pop*10000)
head(Fatalities$frate)

## [1] 2.12836 2.34848 2.33643 2.19348 2.66914 2.71859

Then, we can calculate the cumulative incidence per state per year, and plot as time series.

table.deaths <- with(Fatalities,tapply(fatal, list(state,year), sum))
table.exp    <- with(Fatalities,tapply(pop, list(state,year), sum))
inc.cum.year.state    <-(table.deaths/table.exp)*10000

Plot just the first five states:

plot.ts(t(inc.cum.year.state[1:5,]), plot.type="single", xlab="years", ylab="Incidence", col=c(1:5), lty=c(1:5), ylim=c(1,4)) 
legend("topleft", col=c(1:5), lty=c(1:5), legend=c("al", "az", "ar", "ca", "co"),bty="n")

Note that here we are not showing the confidence intervals (assuming that these data come from random sampling…) around these estimates. We can use library epiR to do this :

library(epiR)
tmp <- cbind(Fatalities$fatal,Fatalities$pop)
pp <- epi.conf(tmp, ctype="inc.risk", method="exact")
inc.rates <- pp*10000
head(inc.rates)

##       est    lower    upper
## 1 2.12836 1.986775 2.277370
## 2 2.34848 2.199966 2.504381
## 3 2.33643 2.188834 2.491359
## 4 2.19348 2.051104 2.343132
## 5 2.66914 2.512397 2.833098
## 6 2.71859 2.561012 2.883324

Melanoma data (incidence rate)

Loading and viewing now the dataset Melanoma:

data(Melanoma)
head(Melanoma)

##   time status sex age year thickness ulcer
## 1   10      3   1  76 1972      6.76     1
## 2   30      3   1  56 1968      0.65     0
## 3   35      2   1  41 1977      1.34     0
## 4   99      3   0  71 1968      2.90     0
## 5  185      1   1  52 1965     12.08     1
## 6  204      1   1  28 1971      4.84     1

The Melanoma data frame has data on 205 patients in Denmark with malignant melanoma. This data frame contains the following columns:

time : survival time in days, possibly censored (we will define in detail characteristics of survival data later on block 4).
status: 1 died from melanoma, 2 alive, 3 dead from other causes.
sex: 1 = male, 0 = female.
age: age in years.
year: year of operation.
thickness: tumour thickness in mm.
ulcer: 1 = presence, 0 = absence.

Now we want to calculate the incidence rate of death for melanoma using the person time epidemiological approach. (We will ignore now death for other causes, that represents a competing risk in the survival jargon…).

mm.deaths <-sum(Melanoma$status==1)
mm.deaths

## [1] 57

per.time  <- sum((Melanoma$time)/365.25)
per.time

## [1] 1208.279

mortality.rate  <-mm.deaths/per.time
incidence.rate  <- round(100*mortality.rate,1)
incidence.rate

## [1] 4.7

If you report the incidence rate of deaths as 4.7 per 100 person-years, epidemiologists might understand, but most others will not. Person-time is epidemiologic jargon.

To convert this jargon to something understandable, simply replace person-years with persons per year. Reporting the results as 4.7 deaths per 100 persons per year sounds like English rather than jargon.

It also conveys the sense of the incidence rate as a dynamic process, the speed at which new cases of disease occur in the population.

In the R library(survival) there is a specific function that computes the person-years of follow-up time contributed by a cohort of subjects, also stratified into subgroups.

py <- pyears(Surv(time, status==1) ~1, data=Melanoma) 
py$pyears

## [1] 1208.279

We can evaluate for example if the incidence rate of deaths seems different among men and women (we will only examine it from a descriptive point of view, without making any inference):

py.sex <- pyears(Surv(time, status==1) ~sex, data=Melanoma) 
py.sex

## Call:
## pyears(formula = Surv(time, status == 1) ~ sex, data = Melanoma)
## 
## Total number of events: 57 
## Total number of person-years tabulated: 1208.279 
## Total number of person-years off table: 0 
## Observations in the data set: 205

py.sex$pyears

## sex
##        0        1 
## 787.4415 420.8378

Now let’s re-do the calculations stratified by sex:

index.males     <- Melanoma$sex==1
mm.deaths.males <-sum(Melanoma[index.males,]$status==1)
mm.deaths.males

## [1] 29

index.females   <- Melanoma$sex==0
mm.deaths.females     <-sum(Melanoma[index.females,]$status==1)
mm.deaths.females

## [1] 28

mortality.rate.males  <-mm.deaths.males/py.sex$pyears[2]
incidence.rate.males  <- round(100*mortality.rate.males,1)
mortality.rate.females  <-mm.deaths.females/py.sex$pyears[1]
incidence.rate.females  <- round(100*mortality.rate.females,1)
incidence.rate.males

##   1 
## 6.9

incidence.rate.females

##   0 
## 3.6

So, for males we observe 6.9 deaths per 100 persons per year, instead for females 3.6 deaths per 100 persons per year.