Let \(y=(y_{1},\ldots, y_{n})\) a sample of i.i.d. values from a Weibull distribution, \(Y \sim \mbox{We}(\gamma, \beta)\), with parameter \(\theta=(\gamma, \beta)\) and density function:
\[ f (y; \gamma, \beta)= \frac{\gamma}{\beta}\left(\frac{y}{\beta}\right)^{\gamma-1}e^{-(y/\beta)^{\gamma}} , \ \ \ y \geq 0, \ \ \ \gamma, \beta >0,\] where \(\gamma\) is the shape parameter and \(\beta\) is the scale parameter.
#Plot of the probability density functions of the Weibull distribution, by varying lambda and beta
#shape = 0.5; scale = 1
curve(dweibull(x, shape=0.5, scale = 1), from = 0, to = 3,
ylab=expression("f(y;" ~ gamma ~"," ~ beta ~")"), xlab="y")
#shape = 1; scale = 1
curve(dweibull(x, shape=1, scale = 1), add=TRUE, col = "red")
#shape = 1; scale = 2
curve(dweibull(x, shape=1, scale = 2), add=TRUE, col = "blue")
#shape = 1.5; scale = 1
curve(dweibull(x, shape=1.5, scale = 1), add=TRUE, col = "green")
#shape = 5; scale = 1
curve(dweibull(x, shape=5, scale = 1), add=TRUE, col = "purple")
# add a legend to plot
legend("topright", legend=c(expression(gamma == 0.5 * ";" ~ beta == 1),
expression(gamma == 1 * ";" ~ beta == 1),
expression(gamma == 1 * ";" ~ beta == 2),
expression(gamma == 1.5 * ";" ~ beta == 1),
expression(gamma == 5 * ";" ~ beta == 1)),
col = c("black", "red", "blue", "green", "purple"), lty = rep(1,5),
cex = 0.8, box.lty = 0)
Then, the likelihood is defined as:
\[L( \gamma, \beta ;y )=\prod_{i=1}^{n}L_{i}=\prod_{i=1}^{n}f(y_i;\gamma, \beta),\] and the log-likelihood, up to an additive constant, is defined as:
\[\begin{align*} l(\gamma, \beta; y)= \log\left( \prod_{i=1}^{n}L_{i}\right)= & \sum_{i=1}^{n} \log f(y_i;\gamma, \beta) \\ = & n\log(\gamma)-n \gamma \log(\beta)+ \gamma\sum_{i=1}^{n}\log(y_{i})-\sum_{i=1}^{n}(y_{i}/\beta)^{\gamma}. \end{align*}\]
We may write the log-likelihood function in \(\mathsf{R}\):
# Negative log-likelihood function
# by using the dweibull() function
n_logLik_Weib <- function(param, data){
-sum(dweibull(data, shape = param[1], scale = param[2], log = TRUE))
}
# by hand (the last term in the summation, sum(log(y)), is the additive constant which could be removed without affecting inferential results)
n_logLik_Weib2 <- function(param, data){
n <- length(data)
res <- n*log(param[1]) - n*param[1]*log(param[2]) +
param[1]*sum(log(data)) - sum((data/param[2])^param[1]) - sum(log(data))
return(-res)
}
Suppose now to observe \(n = 15\) failure times (expressed in days) of a sample of light bulbs: we assume \(y_{1}, \ldots, y_{15} \sim \mbox{We}(\gamma, \beta)\), with \(\gamma\) and \(\beta\) unknown. We need to estimate them via maximum likelihood estimation. Preliminarily, we could inspect the log-likelihood function through contour plots.
# We generate the data by taking for example shape = 7, scale = 155
set.seed(1234)
y <- rweibull(15, shape = 7, scale = 155)
# Define a parameter grid to plot the log-likelihood
gamma <- seq(0.1, 15, length = 100)
beta <- seq(100, 200, length = 100)
parvalues <- expand.grid(gamma, beta)
# obtain the log-likelihood values for each point of the grid
n_llikvalues <- apply(parvalues, 1, n_logLik_Weib, data = y)
llikvalues <- matrix(-n_llikvalues, nrow = length(gamma), ncol = length(beta),
byrow = FALSE)
par(mfrow = c(1, 2))
# Define the confidence levels (the level of the confidence region)
conf_levels <- c(0, 0.5, 0.75, 0.9, 0.95, 0.99)
# contour plot
contour(gamma, beta, llikvalues - max(llikvalues),
levels = -qchisq(conf_levels, 2)/2,
xlab = expression(gamma),
ylab = expression(beta),
labels = as.character(conf_levels))
title("Weibull relative log likelihood")
# image plot
image(gamma, beta, llikvalues - max(llikvalues), zlim = c(-6, 0),
col = terrain.colors(20), xlab = expression(gamma),
ylab = expression(beta))
title("Weibull relative log likelihood")
We may compute the maximum likelihood estimates \(\hat{\gamma}, \hat{\beta}\) by equating at zero the log-likelihood derivatives:
\[\begin{align*} \frac{\partial}{\partial \gamma}l(\gamma, \beta; y)= & \frac{n}{\gamma}-n \log(\beta)+ \sum_{i=1}^{n}\log(y_{i})-\sum_{i=1}^{n}(y_{i}/\beta)^{\gamma}\log( y_{i}/\beta) = 0 \\ \frac{\partial}{\partial \beta}l(\gamma, \beta; y)= & -\frac{n}{\beta} \gamma + \frac{\gamma} {\beta^{\gamma+1}} \sum_{i=1}^{n} y_{i}^{\gamma} = 0 \end{align*}\]
Solving the second equation we get the constrained estimate \(\beta_{\gamma}= (\sum_{i=1}^{n} y^{\gamma}_{i}/n)^{1/\gamma}\). Substituting this in the first equation, we get
\[\frac{n}{\gamma}+\sum_{i=1}^{n} \log (y_{i})-n \frac{\sum_{i} y^{\gamma}_{i}\log(y_{i})}{\sum_{i} y^{\gamma}_{i}} =0\]
The last equation needs to be solved numerically.
# Score function (derivatives of l(theta) w.r.t. to gamma evaluated in betahat )
logLik_score_g <- function(x, data){
n <- length(data)
res <- n/x + sum(log(data)) - n*(sum(data^x*log(data))/(sum(data^x)))
return(res)
}
# MLE of gamma: solve the lok-likelihood equation based on the previous score by using the uniroot function
gammahat <- uniroot(logLik_score_g, c(1e-5, 15), data = y)$root
gammahat
## [1] 7.289321
# MLE of beta, for a fixed value of gamma (the MLE of gamma)
betahat <- mean(y^gammahat)^(1/gammahat)
betahat
## [1] 155.3338
# Check if the score is (0,0) in the MLE
library(numDeriv)
grad(c(gammahat, betahat), func = n_logLik_Weib, data = y)
## [1] 1.564041e-05 -2.112082e-12
Of course, we are interested in assessing the variability of our MLE estimators. Let \(\theta=(\gamma, \beta)\) denote the 2-dimensional parameter vector. At this purpose, we may compute the observed information matrix \[J(\theta; y)=- \frac{\partial^{2} l( \theta;y)}{ \partial \theta \partial \theta^{T}}.\]
\[\frac{\partial^{2} }{\partial \gamma^2}l( \theta;y)=-\frac{n}{\gamma^2}-\sum_{i=1}^{n}\Big(\frac{y_i}{\beta}\Big)^\gamma \Bigg\{\log\Big(\frac{y_i}{\beta}\Big) \Bigg\}^2\]
\[\frac{\partial^{2} }{\partial \gamma \partial \beta}l( \theta;y)=-\frac{n}{\beta}+\sum_{i=1}^{n}\Big(\frac{y_i^\gamma}{\beta^{\gamma+1}} \Big)\Big\{\gamma \log\Big(\frac{y_i}{\beta}\Big)+1 \Big\}\] \[\frac{\partial^{2} }{\partial \beta^2}l( \theta;y)= \frac{n\gamma}{\beta^2}-\frac{\gamma(\gamma+1)}{\beta^{\gamma+2}} \sum_{i=1}^{n}y_i^\gamma\]
Taking its inverse, evaluated in \((\hat{\gamma}, \hat{\beta})\), we obtain an estimate for the variance of our estimators.
n <- length(y)
# observed information matrix evaluated in the MLE
jhat <- matrix(NA, 2, 2)
jhat[1,1] <- n/gammahat^2 + sum((y/betahat)^gammahat*(log(y/betahat)^2))
jhat[1,2] <- jhat[2,1] <- n/betahat - sum(y^gammahat/(betahat^(gammahat+1))*(gammahat*log(y/betahat)+1))
jhat[2,2] <- -n*gammahat/(betahat^2) + gammahat*(gammahat+1)*sum(y^gammahat)/(betahat^(gammahat+2))
jhat
## [,1] [,2]
## [1,] 0.60641358 -0.04736006
## [2,] -0.04736006 0.03303187
# Check by using the hessian function of the numDeriv package
hessian(c(gammahat, betahat), func=n_logLik_Weib, data=y)
## [,1] [,2]
## [1,] 0.60641358 -0.04736006
## [2,] -0.04736006 0.03303187
# Estimate of the std.err of the MLE estimators
mle.se <- sqrt(diag(solve(jhat)))
mle.se
## [1] 1.362709 5.838764
Exercise 1
Compute the MLE and the observed information matrix for a gamma model with parameters shape \(\alpha\) and scale \(\beta\). Recall that for \(Y\sim \rm{Ga}(\alpha,\beta)\) the probability density function is \[f_Y(y; \alpha, \beta) = \frac{1}{\beta^\alpha \Gamma(\alpha)} y^{\alpha - 1} e^{-y/\beta}, \ \ \ y \geq 0, \ \ \ \alpha, \beta >0,\]
So far, we used \(\mathsf{R}\) simply as a pocket calculator, computing MLE and the variance of our estimators analytically, and then obtaining the numerical values just plugging into the formulas the inputs. However, remind the MLE for \(\gamma\), where the equation:
\[\frac{n}{\gamma}+\sum_{i=1}^{n} \log (y_{i})-n \frac{\sum_{i} y^{\gamma}_{i}\log(y_{i})}{\sum_{i} y^{\gamma}_{i}} =0\] does not have an analytical solution. Many times we do not have a closed form for MLE estimates and we may need numerical optimization. \(\mathsf{R}\) provides various functions for performing numerical methods:
\(\mathsf{nlm()}\): minimizes a function using a Newton-type algorithm. It needs a starting value and does not allow constraints on the parameters. It is usually fast and reliable. It returns the ML estimate \(\hat{\theta}\) (\(\mathsf{estimate}\)), the value of the likelihood \(-l(\hat{\theta})\) (\(\mathsf{minimum}\)) and the hessian (\(\mathsf{hessian}\)), if \(\mathsf{hessian = TRUE}\).
\(\mathsf{optim()}\): minimizes a function using Nelder-Mead, quasi-Newton and conjugate gradients algorithms. It includes an option for box-constrained optimization, and it requires a starting value. It returns the ML estimate \(\hat{\theta}\) (\(\mathsf{par}\)) and the value of the likelihood \(-l(\hat{\theta})\) (\(\mathsf{value}\)) and the hessian (\(\mathsf{hessian}\)), if \(\mathsf{hessian = TRUE}\). You also can maximize the function by using \(\mathsf{fnscale = -1}\) in \(\mathsf{control}\) argument.
\(\mathsf{nlminb()}\): often is more stable, robust and reliable than \(\mathsf{optim}\) (in particular with “nasty” functions). It performs only minimization and does not yield numerical derivatives as output. It returns the ML estimate \(\hat{\theta}\) (\(\mathsf{par}\)) and the value of the likelihood \(-l(\hat{\theta})\) (\(\mathsf{objective}\)).
\(\mathsf{optimize()}\): by using a combination of golden section search and successive parabolic interpolation, searches in an interval for a minimum or a maximum (if \(\mathsf{maximum=TRUE}\)) of a function. It returns the ML estimate \(\hat{\theta}\) (\(\mathsf{minimum}\)) and the value of the likelihood \(l(\hat{\theta})\) (\(\mathsf{objective}\)). Drawback: suited only for one-dimensional parameter.
Extra
# Numerical optimization of the log-likelihood function
# By using nlm function
#Staring values near to MLE estimate
weib.nlm_start1 <- nlm(f = n_logLik_Weib, p = c(5, 160), data = y, hessian = TRUE)
weib.nlm_start1
## $minimum
## [1] 67.4824
##
## $estimate
## [1] 7.289283 155.333712
##
## $gradient
## [1] 1.111246e-06 -1.282633e-07
##
## $hessian
## [,1] [,2]
## [1,] 0.60639337 -0.04728086
## [2,] -0.04728086 0.03299776
##
## $code
## [1] 1
##
## $iterations
## [1] 10
#Staring values far to MLE estimate
weib.nlm_start2 <- nlm(f = n_logLik_Weib, p = c(0.1, 0.1), data = y, hessian = TRUE)
weib.nlm_start2
## $minimum
## [1] 67.4824
##
## $estimate
## [1] 7.289276 155.333705
##
## $gradient
## [1] -2.544171e-06 -5.040877e-08
##
## $hessian
## [,1] [,2]
## [1,] 0.60639377 -0.04728075
## [2,] -0.04728075 0.03299770
##
## $code
## [1] 1
##
## $iterations
## [1] 27
# By using optim function
#Staring values near to MLE estimate
weib.optim_start1 <- optim(par = c(5, 160), fn= n_logLik_Weib, data = y, hessian = TRUE,
method = "L-BFGS-B", lower = rep(1e-7, 2), upper = rep(Inf,2))
weib.optim_start1
## $par
## [1] 7.289289 155.333802
##
## $value
## [1] 67.4824
##
## $counts
## function gradient
## 11 11
##
## $convergence
## [1] 0
##
## $message
## [1] "CONVERGENCE: REL_REDUCTION_OF_F <= FACTR*EPSMCH"
##
## $hessian
## [,1] [,2]
## [1,] 0.60641601 -0.04735972
## [2,] -0.04735972 0.03303161
#Staring values far to MLE estimate
weib.optim_start2 <- optim(par = c(0.1, 0.1), fn = n_logLik_Weib, data = y, hessian = TRUE,
method = "L-BFGS-B", lower = rep(1e-7, 2), upper = rep(Inf,2))
weib.optim_start2
## $par
## [1] 7.289292 155.333808
##
## $value
## [1] 67.4824
##
## $counts
## function gradient
## 35 35
##
## $convergence
## [1] 0
##
## $message
## [1] "CONVERGENCE: REL_REDUCTION_OF_F <= FACTR*EPSMCH"
##
## $hessian
## [,1] [,2]
## [1,] 0.60641575 -0.04735971
## [2,] -0.04735971 0.03303163
# What happens if we using c(0,0) as guesstimate
weib.nlm_start3 <- nlm(f = n_logLik_Weib, p = c(0, 0), data = y, hessian = TRUE)
weib.nlm_start3
## $minimum
## [1] 1.797693e+308
##
## $estimate
## [1] 0 0
##
## $gradient
## [1] 0 0
##
## $hessian
## [,1] [,2]
## [1,] 0 -Inf
## [2,] -Inf 0
##
## $code
## [1] 1
##
## $iterations
## [1] 0
A warning appears…to avoid numerical warnings, we should work with a parameters’ reparametrization:
\[ \psi = \psi(\theta)=(\psi_1 = \log (\gamma), \psi_2 = \log(\beta)). \]
Generally, for numerical purposes, it is convenient to reparameterize the model in such a way that the new parameter space is unbounded (in this case, \(\theta = \theta(\psi)= (e^{\psi_1}, e^{\psi_2})\)). At such point, the parameters estimates will be expressed in the log-scale, and we need to re-express them in the original scale:
# Reparameterization
theta <- function(omega) exp(omega)
# Negative log-likelihood
n_logLik_Weib_rep <- function(param, data) n_logLik_Weib(theta(param), data)
# Optimize the log-likelihood function by using nlm (also here the are some warnings but the algorithm works)
weib_nlm_start3_rep <- nlm(f = n_logLik_Weib_rep, p = c(0, 0), data = y)
weib_nlm_start3_rep
## $minimum
## [1] 67.4824
##
## $estimate
## [1] 1.986401 5.045573
##
## $gradient
## [1] -1.860059e-07 -1.101251e-06
##
## $code
## [1] 1
##
## $iterations
## [1] 34
# Check
theta(weib_nlm_start3_rep$estimate)
## [1] 7.289249 155.333353
weib.nlm_start1$estimate
## [1] 7.289283 155.333712
# Try to use also nlminb
The \(\mathsf{optim()}\) function provides a lot of numerical methods, such us Nelder-Mead, quasi-Newton, conjugate-gradient methods and simulated annealings. As a drawback, the user has to set up very carefully the initial parameters and the adopted method, since the final solution may be quite sensitive to these choices…To compute the observed information, the function \(\mathsf{optimHess()}\) computes numerical derivatives of generic functions, if \(\mathsf{hessian = TRUE}\) was forgotten in \(\mathsf{optim()}\).
# Working on the reparameterization
# Log-likelihood optimization by using optim:
# starting values = c(1,20), quasi-Newton Method
weib_optim_start3_rep_qn <- optim(par = c(1, 20), fn = n_logLik_Weib_rep,
method = "BFGS", data=y)
# starting values = c(1,6), conjugate-gradient Method
weib_optim_start3_rep_CG <- optim(par = c(1, 6), fn = n_logLik_Weib_rep,
method = "CG", data = y)
# check
theta(weib_optim_start3_rep_qn$par)
## [1] 7.289323 155.333973
theta(weib_optim_start3_rep_CG$par)
## [1] 7.28905 155.33369
weib.nlm_start1$estimate
## [1] 7.289283 155.333712
# Check the hessian
optimHess(theta(weib_optim_start3_rep_qn$par), n_logLik_Weib, data = y)
## [,1] [,2]
## [1,] 0.60641002 -0.04735869
## [2,] -0.04735869 0.03303163
jhat
## [,1] [,2]
## [1,] 0.60641358 -0.04736006
## [2,] -0.04736006 0.03303187
In practical situations, some components of the parameter vector \(\theta\) are more important than others; essentially, in such situations it is of interest for us making inference only on those subgroups of parameters. In the Weibull case, we could treat \(\gamma\) as the parameter of interest and \(\beta\) as the nuisance parameter. We may then define the profile log-likelihood:
\[l_{P}(\gamma) = \underset{\beta}{\max}\ l(\gamma, \beta; y)= l(\gamma, \hat{\beta}_{\gamma}; y),\] where \(\hat{\beta}_{\gamma}\) is the constrained MLE for \(\beta\), with \(\gamma\) fixed. Some issues deserve a quick consideration:
the profile log-likelihood is simply the log-likelihood for the bidimensional parameter \(\theta\), with the nuisance component \(\beta\) replaced by \(\hat{\beta}_{\gamma}= (\sum_{i=1}^{n} y^{\gamma}_{i}/n)^{1/\gamma}\).
\(l_{P}\) is not a genuine likelihood. However, it has some nice features which ease to work with it.
weib.y.mle<- optim(par = c(1,1), fn = n_logLik_Weib, hessian = TRUE,
method = "L-BFGS-B", data = y,
lower = rep(1e-7, 2), upper = rep(Inf, 2) )
# We can visualize the profile log-likelihood directly in the contour plot of the log-likelihood
gamma <- seq(0.1, 15, length = 100)
beta <- seq(100, 200, length = 100)
parvalues <- expand.grid(gamma, beta)
llikvalues <- apply(parvalues, 1, n_logLik_Weib, data = y)
llikvalues <- matrix(-llikvalues, nrow = length(gamma),
ncol = length(beta), byrow = FALSE)
conf.levels <- c(0, 0.5, 0.75, 0.9, 0.95, 0.99)
par(mfrow=c(1,1))
contour(gamma, beta, llikvalues - max(llikvalues),
levels = -qchisq(conf.levels, 2)/2,
xlab = expression(gamma),
ylab = expression(beta),
labels = as.character(conf_levels))
beta.gamma <- sapply(gamma, function(x) mean(y^x)^(1/x))
# line of the constrained estimate
lines(gamma, beta.gamma, lty = "dashed", col = 2)
points(weib.y.mle$par[1],weib.y.mle$par[2])
title("Weibull relative log-likelihood with profile log-likelihood")
In some sense, we reduced the dimension of the problem, and we acknowledged that we may work with a one-dimensional likelihood evaluated in the constrained value \(\hat{\beta}_{\gamma}\) for the nuisance component. Then, we may now compute some deviance confidence intervals with level \(1-\alpha\) as:
\[ \{ \gamma: l_{P}(\gamma) \ge l_{P}(\hat{\gamma})-\frac{1}{2}\chi^{2}_{1; 1-\alpha} \},\] where \(\chi^{2}_{1; 1-\alpha}\) is the \(1-\alpha\)-th quantile of a chi-squared distribution with 1 d.f, the asymptotic distribution for the profile likelihood-ratio test statistic:
\[W_{P}(\gamma)=2 \{ l_{P}(\hat{\gamma})-l_{P}(\gamma) \}.\]
# profile log-likelihood
n_logLik_Weib_profile <- function(gamma, data){
beta.gamma <- mean(data^gamma)^(1/gamma)
n_logLik_Weib(c(gamma, beta.gamma), data)
}
# vectorize the function with respect to gamma
n_logLik_Weib_profile_v <- Vectorize(n_logLik_Weib_profile, 'gamma')
# Vectorized function allows to compute the function in a vector of points and get the corresponding output results. So
n_logLik_Weib_profile(c(5,6,7), data = y) # wrong
## [1] 156639624
#since return the log-likelihood evaluated in c(5,6)
n_logLik_Weib(c(5,6), data=y)
## [1] 156639624
n_logLik_Weib_profile_v(c(5,6,7), data = y) # correct
## [1] 69.12128 67.96647 67.50531
# Plot the relative profile log-likelihood
plot(function(x) -n_logLik_Weib_profile_v(x, data = y) + weib.y.mle$value,
from = 0.1, to =15, xlab = expression(gamma),
ylab = "Profile relative log-likelihood", ylim = c(-8,0))
# choose the confidence level
conf.level <- 0.95
#Obtaining the upper and the lower limits of the deviance confidence interval
abline(h = -qchisq(conf.level,1)/2, lty = "dashed", col = "red")
# Find the numerical values by using the uniroot function
lrt.ci1 <- uniroot(function(x) -n_logLik_Weib_profile_v(x, data = y) + weib.y.mle$value +
qchisq(conf.level,1)/2,
c(1e-7,weib.y.mle$par[1]))$root
lrt.ci <- c(lrt.ci1, uniroot(function(x) -n_logLik_Weib_profile_v(x, data = y) + weib.y.mle$value +
qchisq(conf.level,1)/2,
c(weib.y.mle$par[1],15))$root)
# Plotting some quantities in the relative profile log-likelihood plot
segments(lrt.ci[1], -qchisq(conf.level,1)/2,
lrt.ci[1], -n_logLik_Weib_profile_v(lrt.ci[1], data = y), col = "red")
segments(lrt.ci[2], - qchisq(conf.level,1)/2,
lrt.ci[2], -n_logLik_Weib_profile_v(lrt.ci[2], data = y), col = "red")
points(lrt.ci[1],-qchisq(conf.level,1)/2, pch=16, col="red", cex=1.5)
points(lrt.ci[2],-qchisq(conf.level,1)/2, pch=16, col="red", cex=1.5)
segments(lrt.ci[1], -8.1, lrt.ci[2], -8.1, col = "red", lty = 1, lwd =2 )
text(7,-7.5, "95% Deviance CI", col = 2)
Exercise 2 The Wald confidence interval with level \(1-\alpha\) is defined as:
\[ \hat{\gamma} \pm z_{1-\alpha/2}j_{P}(\hat{\gamma})^{-1/2}. \] Compute the Wald confidence interval of level 0.95, plot the results, and evaluate via simulation the empirical coverage of the confidence interval.
Exercise 3 Repeat the steps above —write the profile log-likelihood, plot it and find the deviance confidence intervals— considering this time \(\gamma\) as a nuisance parameter and \(\beta\) as the parameter of interest.