Much of the statistical modeling done in epidemiology involves transforming probabilities (risks), which are constrained between 0 and 1, into outcomes amenable to linear models, which can range from negative to positive infinity. One useful approach to this problem is the logistic transformation. It involves two steps. First, we take our risk measurement and extend it to include values beyond 1 by taking the odds. Second, we take the log of the odds. This yields a quantity that can range from minus to plus infinity, and is therefore suitable as the outcome of a linear model. Odds are closely related to probability: if p is the probability, then p/(1-p) is the odds; conversely, the probability equals odds/(odds+1). While a probability always ranges from 0 to 1, odds range from 0 to infinity.
risk <- seq(0,0.99,by=0.01)
odds <- risk/(1-risk)
plot(risk, odds)
If we restrict our attention to probabilities below 10%, we can see that the two quantities are very close:
plot(risk[1:10], odds[1:10])
abline(0,1)
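To put a number on how close odds and risk are in this range, we can compute the relative gap directly (a quick check, not part of the original code):

```r
# Relative gap between odds and risk for risks below 10%.
# Algebraically, (odds - risk)/risk = risk/(1 - risk), so the gap
# grows with the risk but stays under 10% in this range.
risk_small <- seq(0.01, 0.09, by = 0.01)
odds_small <- risk_small / (1 - risk_small)
rel_gap <- (odds_small - risk_small) / risk_small
max(rel_gap)  # largest relative gap, at risk = 0.09
```

This is why, for rare outcomes, the odds ratio is often used as an approximation of the risk ratio.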
We can therefore use two simple functions to move from probability to odds and from odds to probability:
p2o <- function (p) p/(1-p)
o2p <- function (o) o/(1+o)
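Since the two functions are inverses of each other, converting to odds and back should recover the original probability. A small sanity check:

```r
# Round trip: probability -> odds -> probability
p2o <- function(p) p / (1 - p)
o2p <- function(o) o / (1 + o)

p <- c(0.05, 0.25, 0.5, 0.9)
o2p(p2o(p))  # recovers p (up to floating point)
```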
curve(log(x/(1 - x)), 0, 1)
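The curve above is the logit (log-odds) function. Base R also provides this transformation and its inverse as built-ins, `qlogis()` and `plogis()`, which we can use instead of writing the formula by hand:

```r
# qlogis() is the logit: log(p / (1 - p)); plogis() is its inverse
p <- 0.2
log(p / (1 - p))   # log-odds computed manually
qlogis(p)          # same value via the built-in
plogis(qlogis(p))  # back to the original probability, 0.2
```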
Note that in logistic regression (see block 3), the non-intercept coefficients of the independent variables represent log odds ratios, so their exponentials are odds ratios (not odds). Suppose the model contains a binary independent variable X (i.e. with only two possible values, 1 and 2); then the exponential of the regression coefficient corresponding to X is the odds ratio:
\[ OR = \frac{Odds_1}{Odds_2} = \frac{\frac{p_1}{1-p_1}}{\frac{p_2}{1-p_2}} \]
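This link between the regression coefficient and the sample odds ratio can be checked with a small sketch using made-up counts (the groups and numbers below are hypothetical, chosen just for illustration). Note that with group 1 as the reference level, the fitted coefficient corresponds to \(\log(Odds_2/Odds_1)\), the reciprocal of the OR as written above:

```r
# Hypothetical data: group x = 1 has 30 events out of 100,
# group x = 2 has 10 events out of 100 (counts chosen for illustration)
dat <- data.frame(
  x = factor(c(1, 1, 2, 2)),
  y = c(1, 0, 1, 0),
  n = c(30, 70, 10, 90)
)

# Logistic regression with counts as frequency weights
fit <- glm(y ~ x, family = binomial, weights = n, data = dat)

odds1 <- 30 / 70
odds2 <- 10 / 90
exp(coef(fit)[["x2"]])  # exponentiated coefficient
odds2 / odds1           # sample odds ratio of group 2 vs group 1 -- same value
```

Because the model is saturated (one parameter per group), the exponentiated coefficient reproduces the sample odds ratio exactly, up to numerical precision.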
We will also see how to interpret the regression coefficients of a logistic regression when X is a numerical variable or a categorical variable with more than two levels.
The effectsize package also contains functions to convert between indices of effect size. This can be useful for meta-analyses, or for any comparison between different types of statistical analyses.
library(effectsize)
As we have discussed, odds ratios, although popular, are not very intuitive to interpret. We don’t usually think about the chances of catching a disease in terms of odds; instead, we tend to think in terms of the probability of some event, that is, the risk.
For example, suppose we find that for individuals suffering from a migraine, every bowl of brussels sprouts they eat increases their odds of the migraine subsiding within an hour by \(OR = 3.5\). So, should people eat brussels sprouts to effectively reduce pain? Hard to say… Maybe if we look at the RR we’ll get a clue.
We can convert between OR and RR with the following formula, but note that depending on the baseline risk the two measures can be quite different from one another!
\[ RR = \frac{OR}{1 - p_0 + (p_0 \times OR)} \] Where \(p_0\) is the base-rate risk, i.e. the probability of the event without the intervention (e.g., the probability of the migraine subsiding within an hour without eating any brussels sprouts).
If the base-rate risk is, say, 85% (very high!), we get an RR of:
OR <- 3.5
baserate <- 0.85
(RR <- oddsratio_to_riskratio(OR, baserate))
## [1] 1.12
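We can verify this result by implementing the conversion formula above directly (`or_to_rr` is our own helper here, not an effectsize function), using the same numbers:

```r
# Direct implementation of RR = OR / (1 - p0 + p0 * OR)
or_to_rr <- function(or, p0) or / (1 - p0 + p0 * or)
or_to_rr(3.5, 0.85)  # 3.5 / 3.125 = 1.12, matching oddsratio_to_riskratio()
```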
That is, for every bowl of brussels sprouts, we increase the chances of reducing the migraine by a mere 12%! Is it worth it? Depends on your affinity for brussels sprouts…
Note that the base-rate risk is crucial here. If instead of 85% it was only 4%, then the RR would be:
oddsratio_to_riskratio(OR, 0.04)
## [1] 3.181818
That is, for every bowl of brussels sprouts, we multiply the chances of reducing the migraine by a whopping 3.18, an increase of 218%! Is it worth it? I guess that still depends on your affinity for brussels sprouts… If the outcome is not rare (>10%), it is better to estimate the RR directly (if the study design permits!), as it provides a more accurate measure of the association.
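The dependence on the base rate can be seen at a glance by applying the conversion formula to a fixed OR across a range of base-rate risks (again using our own `or_to_rr` helper, which just implements the formula above):

```r
# Same OR = 3.5, varying base-rate risk: the implied RR shrinks
# toward 1 as the base rate grows
or_to_rr <- function(or, p0) or / (1 - p0 + p0 * or)
p0 <- c(0.01, 0.04, 0.10, 0.50, 0.85)
round(or_to_rr(3.5, p0), 2)
```

For rare outcomes the RR is close to the OR, but as the base rate rises the two diverge sharply, which is exactly why the >10% rule of thumb above matters.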