Lab on priors

Esempi di specificazione di a priori

library(dplyr) # optional
library(kableExtra)
library(PriorGen)

Linguaggio: R

Esempi

  1. Sensitivity of a Test Based on Past Research
  2. Prevalence Based on Expert Opinion
  3. Univ. of Iowa Students with ACT Scores ≥ 19 in the Early 1970s
  4. Student test scores

Specifying Beta priors

Libreria PriorGen: Generates Prior Distributions for Proportions

Translates beliefs into prior information in the form of Beta and Gamma distributions. It can be mainly used for the generation of priors on the prevalence of disease and the sensitivity/specificity of diagnostic tests.

The last example is a R code demonstrating the specification of a Beta prior distribution akin to Computer-Assisted Data Analysis (CADA) as described by Novick & Jackson (1974)

Sensitivity of a Test Based on Past Research

Past research studies in the literature have estimated \(\theta\):

  • The estimates from the studies averaged out to be .90.
  • And all of the studies have estimated it to be above .80.

Model prior beliefs that say

  • we expect a value of .90
  • we’re 95% sure the value is greater than .80
# R code demonstrating the specification of a Beta prior distribution in PriorGen
# 
# * Find the beta distribution ----------------
findbeta(
  themean=.90,    # best guess for the mean
  #themedian=.90, # best guess for the median
  #themode=.90,   # best guess for the mode 
  percentile=.95, # confidence level   
  lower.v=FALSE,  # upper bound (TRUE) or lower bound (FALSE)
  percentile.value=0.80      # boundary value
)
[1] "The desired Beta distribution that satisfies the specified conditions is: Beta( 27.79 3.09 )"
[1] "Here is a plot of the specified distribution."
[1] "Descriptive statistics for this distribution are:"
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 0.5761  0.8699  0.9088  0.9004  0.9397  0.9981 
[1] "Verification: The percentile value 0.8 corresponds to the 0.05 th percentile"
# * Define the beta distribution ----------------
alpha <- 27.79
beta <- 3.09


# * Plot the beta distribution ----------------
curve(
  dbeta(x, alpha, beta),
  xlab="", ylab="",
  main=paste0("Beta(", alpha, ",", beta, ")"),
  yaxt="n"
)

Prevalence Based on Expert Opinion

A SME indicates that

  • Their best guess is .15
  • They would be really surprised if it’s above .40.

Model prior beliefs that say

  • the most likely value is .15
  • we’re 90% confident the value is less than .40
# * Find the beta distribution ----------------
findbeta(
  #themean=.15,              # best guess for the mean
  #themedian=.15,            # best guess for the median
  themode=.15,               # best guess for the mode 
  percentile=.90,            # confidence level   
  lower.v=TRUE,              # upper bound (TRUE) or lower bound (FALSE)
  percentile.value=0.40      # boundary value
)
[1] "The desired Beta distribution that satisfies the specified conditions is: Beta( 2.15 7.52 )"
[1] "Here is a plot of the specified distribution."
[1] "Descriptive statistics for this distribution are:"
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
0.00226 0.12602 0.20532 0.22487 0.30219 0.73757 
[1] "Verification: The percentile value 0.4 corresponds to the 0.9 th percentile"
# * Define the beta distribution ----------------
alpha <- 2.15
beta <- 7.52


# * Plot the beta distribution ----------------
curve(
  dbeta(x, alpha, beta),
  xlab="", ylab="", yaxt="n",
  main=paste0("Beta(", alpha, ",", beta, ")")
)

Univ. of Iowa Students with ACT Scores ≥ 19 in the Early 1970s

Function to find the HDI given the credMass

# R code demonstrating the specification of a Beta prior distribution akin to CADA as described by Novick & Jackson (1974)
# 
# # Define HDI function (from Krushke) ---------------

HDIofICDF = function( ICDFname , credMass=0.95 , tol=1e-8 , ... ) {
   # Arguments:
   #   ICDFname is R's name for the inverse cumulative density function
   #     of the distribution.
   #   credMass is the desired mass of the HDI region.
   #   tol is passed to R's optimize function.
   # Return value:
   #   Highest density iterval (HDI) limits in a vector.
   # Example of use: For determining HDI of a beta(30,12) distribution, type
   #   HDIofICDF( qbeta , shape1 = 30 , shape2 = 12 )
   #   Notice that the parameters of the ICDFname must be explicitly named;
   #   e.g., HDIofICDF( qbeta , 30 , 12 ) does not work.
   # Adapted and corrected from Greg Snow's TeachingDemos package.
   incredMass =  1.0 - credMass
   intervalWidth = function( lowTailPr , ICDFname , credMass , ... ) {
      ICDFname( credMass + lowTailPr , ... ) - ICDFname( lowTailPr , ... )
   }
   optInfo = optimize( intervalWidth , c( 0 , incredMass ) , ICDFname=ICDFname ,
                        credMass=credMass , tol=tol , ... )
   HDIlowTailPr = optInfo$minimum
   return( c( ICDFname( HDIlowTailPr , ... ) ,
                ICDFname( credMass + HDIlowTailPr , ... ) ) )
}

Set the mode to .75 and start with prior pseudo observations set to 25

# Specify the most likely value and the weight ----------------

most.likely <- .75
prior.weight <- 25


# Calculate alpha and beta ----------------

alpha = prior.weight * most.likely +1
beta = prior.weight - alpha + 2

# print out alpha and beta ----------------

print(paste0("alpha = ", alpha, ", beta = ", beta))
[1] "alpha = 19.75, beta = 7.25"


# Plot the beta distribution ----------------
curve(
  dbeta(x, alpha, beta),
  xlab="", ylab="", yaxt="n",
  main=paste0("Beta(", alpha, ",", beta, ")")
  )


# Obtain the highest density interval ----------------
confidence.level = .50
HDIofICDF( qbeta , shape1 = alpha , shape2 = beta, credMass=confidence.level )
[1] 0.6903551 0.8039036

Continue with \(\alpha+\beta-2=50, 100, 150,\) up to \(200\) (final decision is 150)

prior.weight <- 150

# Calculate alpha and beta ----------------

alpha = prior.weight * most.likely +1
beta = prior.weight - alpha + 2

# Plot the beta distribution ----------------
curve(
  dbeta(x, alpha, beta),
  xlab="", ylab="", yaxt="n",
  main=paste0("Beta(", alpha, ",", beta, ")")
  )


# Obtain the highest density interval ----------------
confidence.level = .50
HDIofICDF( qbeta , shape1 = alpha , shape2 = beta, credMass=confidence.level )
[1] 0.7257685 0.7732301

Plotting priors and prior predictive distribution

R code for examining implications of priors

Student test scores

Normal prior for the mean and exponential for the standard deviation

par(mfrow=c(1,2))

# Plot normal prior for the mean -----
mu.mu = 75
sigma.squared.mu = 50
sigma.mu = sqrt(sigma.squared.mu)

curve(
  dnorm( x , mu.mu , sigma.mu ), 
  from=mu.mu-4*sigma.mu, 
  to=mu.mu+4*sigma.mu,
  xlab="mu",
  ylab="p(mu)",
  font.lab=6,
  font.axis=6,
  lwd=2,                        # thicker than default line
  col="blue",
  main=paste0("N(",mu.mu,",",sigma.squared.mu,")")
)

# Plot exponential for the standard deviation -----
lambda = 1

curve(
  dexp( x , rate=lambda), 
  from=0, 
  to=5,
  xlab="sigma",
  ylab="p(sigma)",
  font.lab=6,
  font.axis=6,
  lwd=2,                        # thicker than default line
  col="blue",
  main=paste0("exp(",lambda,")")
)

Prior predictive distribution

# Prior predictive distribution -----
n_samples = 50000
sample_mu <- rnorm(n_samples, mu.mu , sigma.mu )
sample_sigma <- rexp(n_samples, rate=lambda) 
prior_x <- rnorm(n_samples, sample_mu, sample_sigma)

plot(
  density(prior_x),
  xlab="x",
  ylab="p(x)",
  font.lab=6,
  font.axis=6,
  lwd=2,                        # thicker than default line
  col="dark green",
  main="Prior predictive distribution"
)