---
title: "Estimating VaR and ES Using ARMA-GARCH Models"
output:
  pdf_document:
    latex_engine: xelatex
  html_document: default
  word_document: default
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

To build realistic models for risk-management purposes we need to take into account the empirical properties of risk factors and discuss some time series models that are able to capture the properties of financial risk-factor change data, in particular differenced logarithmic price series (log-returns) and exchange-rate series. We start by reviewing essential concepts in the analysis of time series, such as stationarity, autocorrelations and their estimation. Then, we consider real data to discuss some features that apply to many time series of risk-factor changes, such as log-returns, known as *stylized facts* of financial time series. This motivates the introduction of suitable statistical models for univariate time series, i.e. the ARMA and GARCH models, the latter being a popular approach to model the important phenomenon of volatility.

## Fundamentals of time series analysis

This section provides a short summary of the basic definitions in classical univariate time series analysis. A time series model for a single risk factor is a family of random variables (rvs) $(X_{t})_{t\in \mathbb{Z}}$, also known as a stochastic process, indexed by the integers and defined on some probability space $(\Omega,\mathcal{F},P)$.

A **stochastic process** provides the description of a statistical phenomenon that evolves in time according to some probabilistic law. Mathematically, the process $(X_{t})_{t\in \mathbb{Z}}$ is a collection of rvs that are ordered in time and defined on a set of discrete time points. A simple way of describing a stochastic process is to give the moments of the process, particularly the first and second moments.

***Moments of a time series.*** Assuming they exist, the *mean function* $\mu(t)$ and the *autocovariance function* (acv.f.) $\gamma(t,s)$ of $(X_{t})_{t\in \mathbb{Z}}$ are given by
$$
\mu(t)=E(X_t), \quad \quad t\in \mathbb{Z}
$$
$$
\gamma(t,s)=E((X_t-\mu(t))(X_s-\mu(s)))=\rm{Cov}(X_t, X_s), \quad \quad t, s\in \mathbb{Z}.
$$
It follows that the autocovariance function satisfies $\gamma(s,t)=\gamma(t,s)$ and $\gamma(t,t)=\rm{Var}(X_t)$, that is, the variance function is a special case of the acv.f. when $s=t$.

### Stationary processes

Generally speaking, a time series is said to be *stationary* if there is no systematic change in the mean (no trend) or in the variance, and if periodic variations such as seasonal effects have been removed. In other words, the behaviour of a time series is similar in any epoch in which we might observe it. The definitions given below formalize this concept in terms of properties of the mean, the variance and the covariances between observations.

***Strict stationarity.*** The time series $(X_{t})_{t\in \mathbb{Z}}$ is *strictly* stationary if
$$
(X_{t_1}, \dots, X_{t_n})\,{\buildrel d \over =}\, (X_{t_1+k}, \dots, X_{t_n+k})
$$
for all $t_{1}, \dots,t_{n}, k \in \mathbb{Z}$, and $n\in \mathbb{N}$. In other words, shifting the time origin by an amount $k$ has no effect on the joint distribution of $X_{t_1}, \dots, X_{t_n}$.

***Weak stationarity.*** The time series $(X_{t})_{t\in \mathbb{Z}}$ is *weakly* stationary (or *covariance* or *second-order* stationary) if the first two moments exist and satisfy
$$
\mu(t)=\mu; \quad \quad \gamma(t,s)=\gamma(t+k, s+k), \quad \quad t, s, k\in \mathbb{Z}.
$$
Thus the mean is constant and the acv.f. $\gamma(t,s)$ does not depend on the values of $t$ and $s$ but only on $|s-t|$, which we call the *lag*. Indeed, we have that, for all $s$ and $t$,
$$
\gamma(t-s, 0)=\gamma(t,s)=\gamma(s, t)=\gamma(s-t,0)
$$
so that the covariance between $X_t$ and $X_s$ only depends on their temporal separation (lag) and not on the specific values of $s$ and $t$. Thus, for a weakly stationary process we write the acv.f. as a function of one variable:
$$
\gamma(k):= \gamma(k, 0), \quad \forall k\in \mathbb{Z},
$$
and $\gamma(0)=\rm{Var}(X_t)$.

#### Autocorrelation function

The acv.f. depends on the units in which $X_t$ is measured, hence it is helpful to standardize it and define the so-called *autocorrelation function* to measure the correlation at lag $k$, that is, the linear correlation between $X_{t}$ and $X_{t+k}$ (which is the same as the correlation between $X_k$ and $X_0$).

***Autocorrelation function.*** The autocorrelation function (ACF) $\rho(k)$ of a weakly stationary process $(X_{t})_{t\in \mathbb{Z}}$ is
$$
\rho(k) = \rho(X_t,X_{t+k})= \gamma(k)/\gamma(0), \quad \forall k\in \mathbb{Z}.
$$
In time series analysis the estimated quantities $\hat{\rho}(k)$ for different values of $k$, known as the *sample autocorrelation coefficients* at lag $k$, are the objects of principal interest. Indeed, these coefficients are used to check for the presence of autocorrelation or *serial correlation* in a time series, which is of utmost importance for model selection.

#### (Strict) White noise

The basic building blocks for creating useful time series models are stationary processes without serial correlation, known as white noise processes.

***White Noise.*** $(X_{t})_{t\in \mathbb{Z}}$ is a *white noise* process if it is covariance stationary with autocorrelation function
$$
\rho(k) = \begin{cases} 1, & k=0\\ 0, & k\neq 0. \end{cases}
$$
A series of uncorrelated rvs, centred to have mean zero ($\mu=0$) and variance $\sigma^2 = \rm{Var}(X_t)$, is a white noise process and will be denoted $\rm{WN}(0, \sigma^2)$.

$(X_{t})_{t\in \mathbb{Z}}$ is a *strict* white noise (SWN) process if it is a series of iid rvs with finite variance. An SWN process centred to have mean zero and with variance $\sigma^2$ will be denoted $\rm{SWN}(0, \sigma^2)$. Note that SWN is the easiest kind of noise process to understand, although it is not the only possible noise.

### Correlogram and Ljung-Box test

The estimation of serial correlation involves calculating empirical estimates of autocorrelations from a sample $X_1, \dots, X_n$ from the weakly stationary time series $(X_{t})_{t\in \mathbb{Z}}$. Such estimates can then be used to make inference about the serial dependence structure of the underlying process.

***Correlogram.*** The sample ACF is computed as
$$
\hat{\rho}(k)=\hat{\gamma}(k)/\hat{\gamma}(0), \quad k\in [0, n)
$$
where $\hat{\gamma}(\cdot)$ is the sample autocovariance function
$$
\hat{\gamma}(k)=\frac{1}{n}\sum_{t=1}^{n-k}(X_{t+k}-\bar{X})(X_{t}-\bar{X}), \quad k\in [0, n),
$$
$\hat{\gamma}(0)=\hat{\sigma}^2$ is the sample variance and $\bar{X}=(1/n)\sum_{t=1}^{n}X_t$ is the sample mean. The *correlogram* is the plot of $\hat{\rho}(k)$ versus the number of lags $k$ ($k=0, 1, 2,\dots$), which is designed to facilitate the interpretation of the sample ACF.
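As a quick numerical check of the definitions above (a minimal sketch, not part of the original text), one can compute $\hat{\gamma}(k)$ and $\hat{\rho}(k)$ directly and compare them with the values returned by *R*'s `acf()`, which by default uses the same estimator; the series `y` below is an arbitrary example.

```{r}
## Minimal check of the sample ACF formula against R's acf()
set.seed(1)
y <- rnorm(50)                     # any numeric series would do
ny <- length(y); ybar <- mean(y)
gamma.hat <- sapply(0:10, function(k)
  sum((y[(1 + k):ny] - ybar) * (y[1:(ny - k)] - ybar)) / ny)  # hat{gamma}(k), k = 0,...,10
rho.hat <- gamma.hat / gamma.hat[1]                           # hat{rho}(k)
max(abs(rho.hat - as.numeric(acf(y, lag.max = 10, plot = FALSE)$acf)))  # should be ~0
```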
To interpret such estimators of serial correlation, we need to know the theoretical ACF for some time series models. In particular, it can be proved that for sufficiently large $n$, the sample autocorrelations of data from an SWN process will behave like iid normal observations with mean 0 and variance $1/n$. That is, $\hat{\rho}(k)$ is asymptotically normally distributed with $E(\hat{\rho}(k))\approx 0$ and $\rm{Var}(\hat{\rho}(k))\approx 1/n$. Thus, having plotted the correlogram, we can check for randomness (absence of serial correlation) by plotting approximate $95\%$ confidence limits at $\pm 1.96/\sqrt{n}$, since for an SWN process 95\% of the estimated correlations will lie in the interval $(-1.96/\sqrt{n}, 1.96/\sqrt{n})$. If more than 5\% of the estimated correlations lie outside these bounds, then this is considered as evidence against the null hypothesis that the data are an SWN.

Let's provide an example by simulating 100 iid observations from the $\mathcal{N}(0,1)$ distribution and plotting the correlogram of these data:

```{r, fig.height=4, fig.width=5}
set.seed(3)
n <- 100
x <- rnorm(n)
acf(x, main="")
```

The "null" 95\% confidence bounds are drawn at about $\pm 2/\sqrt{100}=\pm 0.2$, and there is no clear evidence to reject the hypothesis that the observations are independently distributed, since none of the plotted coefficients is statistically significant.

\newpage

***The Ljung-Box Test (portmanteau test).*** It is often useful to combine the visual analysis of the correlogram with a formal numerical test of the strict white noise hypothesis. To this aim, a popular test is that of Ljung and Box, where the null hypothesis is that the data are an SWN. The statistic, computed by means of the first $m$ lags,
$$
Q_{LB}(m)=n(n+2)\sum_{j=1}^{m} \frac{\hat{\rho}(j)^2}{n-j}
$$
has an asymptotic chi-squared distribution with $m$ degrees of freedom (see Ljung and Box, 1978). For the data simulated above and $m=10$ we get

```{r}
Box.test(x, lag=10, type="Ljung-Box")
```

**Remark.** If a series of rvs forms an SWN process, then the series of absolute or squared values must also be iid. Hence, by examining the correlogram and applying the Ljung-Box (LB) test to absolute values, one can further test the SWN hypothesis. The absolute values are sometimes preferred to the squared values because the squared series is only an SWN when the underlying series has a finite fourth moment. Daily log-return data often point to models with an infinite fourth moment.

## Stylized facts of financial return series

The stylized facts described below arise from the empirical observation of the statistical properties of many time series of daily log-returns and often continue to hold for longer-interval series, such as weekly or monthly returns. In what follows the main stylized facts are illustrated by using the log-returns computed from the DAX Stock Index available in *R*'s EuStockMarkets dataset. We first select the time series from 1994 to 1998, $(S_t)_t$, and then compute the log-returns by logarithmic differencing, $X_t=\ln(S_t/S_{t-1})$, $t=1, \dots, n$.
```{r, fig.height=4, fig.width=6}
data(EuStockMarkets)
# select DAX index values in the first column
daxI <- EuStockMarkets[ ,1]
DAX <- window(daxI, start=1994, end=1998)
plot(DAX) # time series plot of DAX index
```

The code below computes and plots the log-returns

```{r, fig.height=4, fig.width=6}
lrt <- diff(log(DAX))
plot(lrt)
```

For univariate time series of log-returns the main stylized facts are

(a) Return series are not iid, although they show little serial correlation;
(b) Return series are not normally distributed and appear to have heavy tails;
(c) Extreme returns appear in clusters;
(d) Conditional expected returns are close to zero and volatility is not constant;
(e) Series of absolute or squared returns show pronounced serial correlation.

Evidence for stylized facts (a) and (b) can be seen when comparing the time series plot and ACF plot of DAX log-returns to simulated iid data from a normal distribution with mean and variance given by the sample mean and sample variance of the returns, respectively

```{r, fig.height=4, fig.width=8}
n <- length(lrt)
set.seed(20)
Sim <- rnorm(n, mean=mean(lrt), sd=sqrt(var(lrt)))
plot(1:n, lrt, main="(a)", type="l", ylab="", ylim=c(-0.06, 0.06))
points(1:n, Sim, type="l", col=2, lty=2)
legend("bottomleft", lty=c(1,2), col=c(1,2),
       legend=c("DAX log-returns", "iid normal"), bty="n")
```

The DAX returns and the normal iid series clearly behave in a different way, and do not show the same range of extreme values. Further evidence of the departure from the Gaussian model and the presence of heavy tails can be seen in the normal quantile-quantile (Q-Q) plot of the data, which provides a visual tool for investigating the relationship between empirical quantiles and normal theoretical quantiles:

```{r, fig.height=4, fig.width=4}
qqnorm(lrt, datax=TRUE)
qqline(lrt, datax=TRUE)
```

The normal Q-Q plot confirms that the normal distribution is a poor model for DAX log-returns. In general, daily financial returns often appear to have a much higher kurtosis than the normal distribution. Moreover, the DAX returns exhibit a phenomenon known as *volatility clustering* (stylized fact (c)), which is not present in the simulated series. Volatility clustering can be described as the tendency for extreme returns to be followed by other extreme returns, although not necessarily with the same sign. In other words, volatility clusters are present when the magnitudes of the volatilities (the standard deviations of returns) tend to cluster together, so that we observe many days of high volatility, followed by many days of low volatility.

The presence of volatility clustering is also confirmed by the serial dependence shown in the absolute or squared return values (stylized fact (e)), whereas the correlogram reveals very little evidence of serial correlation in the raw data, with most autocorrelations lying within the $95\%$ confidence bounds (stylized fact (a)).

```{r, fig.height=4, fig.width=8}
par(mfrow=c(1,2))
Rt <- as.vector(lrt)
acf(Rt)
acf(abs(Rt))
```

While the lack of serial correlation in the DAX log-returns suggests that our best guess for tomorrow's return based on our observations up to today is zero, the pronounced serial dependence shown in the absolute values provides strong evidence for the predictability of volatility: if we know that returns have been large in the last few days, then we could expect that the distribution of tomorrow's return has a large variance.
This idea is expressed in the assertion of the fourth stylized fact (d): the *conditional* standard deviation of returns (volatility) given historical information is continually changing in a partly predictable manner and, hence, time series models for changing volatility are needed to describe this phenomenon.

We can test for the joint significance of the autocorrelation coefficients over several lags by using the Ljung-Box test, applied to the raw and absolute values, respectively.

```{r}
# test on DAX log returns
Box.test(Rt, lag=20, type="Ljung")
# test on absolute log returns
Box.test(abs(Rt), lag=20, type="Ljung")
```

Results confirm that the null hypothesis of no serial correlation cannot be rejected when considering DAX returns, so that the raw data can be considered approximately uncorrelated. However, the strong evidence against the null hypothesis of the LB test for the series of absolute returns suggests that the data support a white noise but not an SWN. In such a case, one can consider a particular class of non-linear models, known as (G)ARCH models, that are primarily concerned with modelling changes in variance.

```{r, fig.height=4, fig.width=5}
plot(abs(Rt), type="l") # absolute values for DAX returns
# plot(Rt^2, type="l")  # squared values for DAX returns
```

## ARMA-GARCH modeling

An important family of time series models is that of ARMA (autoregressive moving-average) processes, which are widely used in many applications of time series analysis.

#### ARMA models

ARMA models are weakly stationary processes that are constructed using white noise as a basic building block and combine an autoregressive process and a moving average process. Using such models implies that we regard the data as coming from a stationary process that can describe some form of temporal dependence.

***ARMA(p,q) model.*** Let $(\epsilon_t)_{t\in \mathbb{Z}}$ be $\rm{SWN}(0, \sigma_\epsilon^2)$. The process $(X_t)_{t\in \mathbb{Z}}$ is a zero-mean ARMA(p,q) process if it is a weakly stationary process of the form
$$
X_t=\phi_1X_{t-1}+\dots+\phi_p X_{t-p}+\epsilon_t+\theta_1 \epsilon_{t-1}+\dots+\theta_q \epsilon_{t-q}, \quad \forall t \in \mathbb{Z}.
$$
The white noise $(\epsilon_t)_{t\in \mathbb{Z}}$ is also known as the process of *innovations*. Note that if the $\epsilon_t$'s are iid, then the ARMA process is strictly stationary.

$(X_t)_{t\in \mathbb{Z}}$ is an ARMA process with mean $\mu$ if the series $(X_t-\mu)_{t\in \mathbb{Z}}$ is a zero-mean ARMA(p,q) process. We can write such a model as
$$
X_{t}=\phi_{0}+\sum_{i=1}^{p}\phi_{i}X_{t-i}+\epsilon_{t}+\sum_{i=1}^{q}\theta_{i} \epsilon_{t-i}
$$
where $\phi_{0}=\mu(1-\phi_{1}- \dots-\phi_{p})$ is the intercept of the model. The values $p$ and $q$ are called the autoregressive and the moving average order, respectively. Note that

- An *autoregressive model* of order p (AR(p)) is an ARMA(p, 0) process: $X_t$ is regressed on past values of $X_t$ together with a random error;
- A *moving average model* of order q (MA(q)) is an ARMA(0, q) process: the present value depends linearly on past values of the error series;
- The ARMA(0, 0) model has the form $X_{t}=\mu+\epsilon_t$ and is a white noise series.

***General linear process.*** For practical purposes, a requirement for ARMA processes is that they can be expressed in the form of a general linear process
$$
X_t=\mu+\sum_{i=0}^{\infty}\psi_i \epsilon_{t-i},
$$
where a sufficient condition for the sum to converge, and hence for the process to be stationary, is that $\sum_{i=0}^{\infty}|\psi_i|<\infty$.
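As an aside (a minimal sketch, not part of the original text), the $\psi$-weights of a given ARMA model can be computed in *R* with the base function `ARMAtoMA()`; the coefficients $\phi_1=0.5$ and $\theta_1=0.4$ below are arbitrary example values, and the weight $\psi_0=1$ is implicit.

```{r}
## psi-weights of an example ARMA(1,1) with phi_1 = 0.5 and theta_1 = 0.4
psi <- ARMAtoMA(ar = 0.5, ma = 0.4, lag.max = 10)
round(psi, 4)  # weights decay geometrically: remote shocks have vanishing impact
```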
For a linear time series the dynamic structure of $X_t$ is governed by the weights $\psi_i$. If $X_t$ is weakly stationary, we can obtain its mean and variance easily by using the properties of $(\epsilon_t)$ as
$$
E(X_t)=\mu, \quad \quad \rm{Var}(X_t)=\sigma_\epsilon^2\sum_{i=0}^{\infty}\psi^2_i,
$$
provided that $\sum_{i=0}^{\infty}\psi^2_i<\infty$, which requires $\psi_i\to 0$ as $i \to \infty$. Consequently, for a stationary series, the impact of the remote error $\epsilon_{t-i}$ on the return $X_t$ vanishes as $i$ increases. For such processes the $\psi$-weights are related to the autocorrelations of $X_t$ in such a way that the linear dependence of the current return $X_t$ on the remote past return $X_{t-k}$ diminishes for large $k$:
$$
\rho_k=\frac{\gamma_k}{\gamma_0}=\frac{\sum_{i=0}^{\infty}\psi_i \psi_{i+k}}{1+\sum_{i=1}^{\infty}\psi^2_i}, \quad k\geq 0,
$$
where $\psi_0=1$.

**Example: AR(1) model.** The AR(1) model with mean $\mu\neq 0$ has the form $X_t-\mu=\phi_1(X_{t-1}-\mu)+\epsilon_t$ and, hence, by repeated substitutions we get
$$
X_t-\mu=\sum_{i=0}^{\infty}\phi_1^i \epsilon_{t-i}.
$$
It follows that the process is a linear process with weights $\psi_i=\phi^i_1$, provided that $\phi_1^2<1$, i.e. $|\phi_1|<1$. Moreover, it can be shown that the ACF of $X_t$ satisfies
$$
\rho_k=\phi_1^k, \quad k>0.
$$
Thus the ACF is exponentially decaying, with possibly alternating sign, and starting value $\rho_0=1$. An AR(1) process can be generated by the R function *arima.sim()*:

```{r, fig.height=4, fig.width=8}
set.seed(440)
x <- arima.sim(list(order = c(1,0,0), ar = 0.4), n=500)
par(mfrow=c(1,2))
acf(x)  # ACF plot
pacf(x) # PACF plot
```

The behaviour of data simulated from ARMA(p,q) models (having intercept $c$) is illustrated via the interactive Shiny App available at the link.

*Remarks on causal and invertible processes.* An ARMA process is a *causal* process if and only if all the roots of the AR polynomial
$$
\phi(z)=1-\phi_1 z-\phi_2 z^2-\dots-\phi_{p}z^p
$$
exceed one in absolute value, $|z|>1$. Similarly, a general ARMA process is said to be *invertible* if the MA polynomial
$$
\theta(z)=1+\theta_1 z+\theta_2 z^2+\dots+\theta_q z^q
$$
has all its roots outside the unit circle. In practice, the models we fit to data will satisfy both the causality and invertibility conditions.

**Model Fitting and Checking**

A standard approach to model fitting first attempts to identify the order of a suitable ARMA process using the correlogram of the data and the plot of the so-called *partial autocorrelations* against the number of lags. Indeed, in some circumstances the sample ACF and the partial autocorrelation function (PACF) may suggest that a pure MA or AR model can be appropriate for the data at hand. For instance, the presence of a cut-off at lag $q$ in the correlogram is taken as a sign that a pure moving-average model can be appropriate to describe the features of the data, since it can be shown that for an MA(q) process $\rho_k=0$ when $k>q$. A similar cut-off in the PACF plot indicates pure AR behaviour. However, the ACF and PACF are not informative in determining the order of mixed ARMA models. In practice, one can simply fit a variety of MA, AR and ARMA models and use well-known model selection criteria such as the Akaike Information Criterion (AIC) or the Bayesian Information Criterion (BIC) to choose the "best" model (see Chatfield, 2003, for further details). Besides the information criteria, *residual analysis* is an important step in model comparison.
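As a brief illustration of these two steps (a minimal sketch, not part of the original text), one can fit a couple of candidate models to the AR(1) series `x` simulated above, compare them by AIC, and apply a Ljung-Box test to the residuals of the preferred model:

```{r}
## Sketch: compare AR(1) and MA(1) fits to the simulated series x by AIC,
## then check the AR(1) residuals for remaining serial correlation
fit.ar1 <- arima(x, order = c(1,0,0))
fit.ma1 <- arima(x, order = c(0,0,1))
c(AIC.AR1 = AIC(fit.ar1), AIC.MA1 = AIC(fit.ma1))  # smaller AIC is preferred
Box.test(residuals(fit.ar1), lag = 10, type = "Ljung-Box")
```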
Suppose an ARMA model was fit to real data and the parameters $\phi_i$ and $\theta_j$ estimated. The residuals are inferred values $\hat{\epsilon}_t$ for the unobserved errors $\epsilon_t$ and they are calculated recursively from the data and the fitted model. If the residuals behave like SWN, then no further time series modelling is required; if they behave like WN but not SWN, then the volatility models introduced below should be adopted.

#### ARCH models

Autoregressive conditionally heteroscedastic (ARCH) models (Engle, 1982) and the generalized ARCH (GARCH) processes (Bollerslev, 1986) are the most important models for daily risk-factor return series, since such models have been introduced to address important features of volatility such as volatility clusters, heavy tails and the leverage effect. Given the information set available at time $t-1$ representing the history of the process up to time $t-1$, $\mathcal{F}_{t-1}$, the manner in which the conditional variance $\sigma_t^2=\rm{Var}(X_t |\mathcal{F}_{t-1})$ evolves over time distinguishes one (G)ARCH model from another.

***ARCH(m) model.*** Let $(Z_t)_{t\in \mathbb{Z}}$ be SWN(0,1). The process $(X_t)_{t\in \mathbb{Z}}$ is an ARCH(m) process if it is strictly stationary and if it satisfies the equations
\begin{align*}
X_{t} &= \sigma_{t} Z_{t}\\
\sigma^2_t &= w + \alpha_1 X^2_{t-1}+\dots+\alpha_m X^2_{t-m}
\end{align*}
where

- $\sigma^2_t$ is the conditional variance of $X_t$ given the past values;
- $(Z_{t})_t$ is an iid process with mean zero and variance 1;
- $w>0$, $\alpha_{1}, \dots, \alpha_{m}\geq 0$ are the model's coefficients.

Hence, ARCH models apply to time series from which any trend, seasonal effects or short-term correlation have been removed, but whose variance is found to vary through time, rather than to be constant. For instance, for returns data, $X_t$ could be the series obtained by removing the sample mean from the data if the sample mean is significantly different from zero and the data exhibit no serial correlation. From
$$
\rm{Var}(X_t \vert \mathcal{F}_{t-1})=E(\sigma_{t}^2 Z^2_{t} \,\vert\, \mathcal{F}_{t-1})=\sigma_{t}^2
$$
it follows that the *conditional standard deviation* $\sigma_t$ (volatility) of the process is a function of the previous squared values of the process. If one or more of the $|X_{t-i}|$ are particularly large, then $X_t$ is effectively drawn from a distribution with large variance, and may itself be large. This is the way the model generates volatility clusters. The distribution of the innovations $(Z_t)_t$ can in principle be any zero-mean, unit-variance distribution; in practice, alternatives to the normal distribution (like the Student's $t$) are considered in order to address specific features such as fat tails.

**Remarks on the ARCH(1) model.** When $m=1$, the ARCH(1) model assumes that the square of $\sigma_t$ depends on the most recent value of the $X_t$ series through
$$
\sigma^2_t = w + \alpha X^2_{t-1}.
$$
Note that this equation does not include an error term and, hence, does not by itself define a stochastic process. Moreover, $(X_t)_t$ is a WN process if and only if $\alpha<1$, and the unconditional variance of such a process is
$$
\rm{Var}(X_t)=E(X^2_t)=w/(1-\alpha).
$$
However, while ARCH models are WN, it is not the case that they are SWN. Indeed, if $(X_t)_t$ is ARCH(1), it can be shown that its squared series $(X^2_t)_t$ has the same form of ACF as an AR(1) process, thus exhibiting serial correlation in the ACF plot and a cut-off at lag 1 in the PACF plot.
The following code simulates an ARCH(1) process with Gaussian innovations, $w=0.1$ and $\alpha=0.4$, and illustrates this behaviour through the correlograms of the simulated series and of its squared values:

```{r, fig.height=4, fig.width=9}
set.seed(440)
w <- 0.1
alpha <- 0.4
z <- rnorm(1000)      # innovations Z_t
x <- rep(0, 1000)
sig2 <- rep(0, 1000)  # conditional variances sigma_t^2
for (i in 2:1000) {
  sig2[i] <- w + alpha * (x[i - 1]^2)
  x[i] <- z[i] * sqrt(sig2[i])
}
par(mfrow=c(1,3))
acf(x)
acf(x^2)
pacf(x^2)
```

#### Generalized ARCH models (GARCH)

A generalization of the ARCH model that allows the variance to depend on past values of both the series and the volatility in squared form is the *generalized ARCH* (or GARCH) model.

***GARCH(m, s) model.*** Let $(Z_t)_{t\in \mathbb{Z}}$ be SWN(0,1). The process $(X_t)_{t\in \mathbb{Z}}$ is a GARCH(m, s) process if it is strictly stationary and if it satisfies the equations
\begin{align*}
X_{t} &= \sigma_{t} Z_{t}\\
\sigma^2_t & = w+\sum_{i=1}^{m}\alpha_i X^2_{t-i}+\sum_{j=1}^{s}\beta_j \sigma_{t-j}^2
\end{align*}
where $w> 0$, $\alpha_i, \beta_j\geq 0$. The GARCH(m, s) model has the ARCH(m) model as the special case GARCH(m, 0). Only lower-order GARCH models are used in most applications.

**Remarks on the GARCH(1,1) model.** The GARCH(1,1) model has the form
$$
X_{t}= \sigma_{t} Z_{t},\quad \sigma^2_t= w+\alpha X^2_{t-1}+\beta \sigma_{t-1}^2
$$
with $Z_{t}\sim \text{SWN}(0,1)$, $w>0$, $\alpha, \beta \geq 0$, and $\alpha+\beta<1$ to ensure that the process is a weakly stationary white noise process. The variance of the process is given by $w/(1 -\alpha-\beta)$.

Under a GARCH(1,1) model, a large $X^2_{t-1}$ or $\sigma_{t-1}^2$ gives rise to a large $\sigma^2_t$ (volatility clustering). In particular, GARCH models achieve a persistent effect of volatility clustering more parsimoniously than ARCH models, since $|X_t|$ has a chance of being large if either $|X_{t-1}|$ is large or $\sigma_{t-1}$ is large. Similarly to ARCH models, one can establish parallels with the ARMA(1,1) process. Indeed, it can be proved that the series of squared values $(X^{2}_t)$ from a GARCH(1,1) process with finite fourth moment follows an ARMA(1,1) process. Higher-order ARCH and GARCH models have the same general behaviour as ARCH(1) and GARCH(1,1), but their mathematical analysis becomes more tedious.

The code below simulates a GARCH(1,1) process with Gaussian innovations and parameters $w = 0.1$, $\alpha= 0.1$, $\beta = 0.8$; then, it displays the realization of the process and the correlogram of the raw and squared values.

```{r, fig.height=4, fig.width=8}
set.seed(440)
w <- 0.1
alpha <- 0.1
beta <- 0.8
z <- rnorm(1000)
x <- rep(0, 1000)
sig2 <- rep(0, 1000)
for (i in 2:1000){
  sig2[i] <- w + alpha * (x[i - 1]^2) + beta * sig2[i-1]
  x[i] <- z[i] * sqrt(sig2[i])
}
par(mfrow=c(1,3))
plot(x, type="l")
acf(x)
acf(x^2)
```

**Example: GARCH model for DAX log-returns.** We consider again the DAX daily log-returns for the period 1994-1998, which show no evidence of serial correlation, whereas their absolute values do show serial correlation and we reject the null hypothesis of a Ljung-Box test (based on the first 20 estimated correlations) at the $5\%$ level. We first consider a maximum likelihood approach to fit a GARCH(1,1) model with Gaussian innovations via the **rugarch** package:

```{r, message=FALSE}
## GARCH(1,1) with N(0,1) innovations
library(rugarch)
## Fitting via ugarchfit()
uspec <- ugarchspec(variance.model = list(model = "sGARCH", garchOrder = c(1, 1)),
                    distribution.model = "norm",
                    mean.model = list(armaOrder = c(0, 0)))
fit <- ugarchfit(uspec, data = Rt)
# fit
fit@fit$matcoef
```

If the model is adequate for the data, then neither the raw nor the squared standardized residuals obtained from the fitted model, given by $\hat{Z}_t=\hat{\epsilon}_t/\hat{\sigma}_t$ and $\hat{Z}^2_t$ respectively, should exhibit serial correlation (this can be checked via LB tests and ACF plots). Moreover, if $(Z_t)$ has been assumed to have a normal distribution, then this assumption can be checked by means of statistical tests or the visual inspection of a normal Q-Q plot of the standardized residuals.

```{r, fig.height=4, fig.width=8}
Z <- fit@fit$z # standardized residuals
plot(Z, type="l")
par(mfrow=c(1,3))
qqnorm(Z, datax=TRUE)
qqline(Z, datax=TRUE)
acf(Z)
acf(Z^2)
```

GARCH models with (scaled) $t$-distributed innovations with $\nu$ degrees of freedom are often preferred to models with Gaussian innovations. This can be done by setting `distribution.model = "std"`:

```{r}
# Specify tGARCH
uspec <- ugarchspec(variance.model = list(model = "sGARCH", garchOrder = c(1, 1)),
                    distribution.model = "std",
                    mean.model = list(armaOrder = c(0, 0)))
fit2 <- ugarchfit(uspec, data = Rt)
```

When fitting the tGARCH specification, you will see that there is a new parameter called shape, which is the estimate of the degrees of freedom of the $t$ distribution:

```{r}
# coef(fit2)
# fit2
fit2@fit$matcoef
```

Let's consider some diagnostic plots, which show a satisfactory fit

```{r, fig.height=4, fig.width=8}
# call plot() on fit2
par(mfrow=c(1,3))
plot(fit2, which=9)
plot(fit2, which=10)
plot(fit2, which=11)
```

#### ARMA models with GARCH errors

When dealing with returns data, the estimated residuals from an ARMA model are often found to be uncorrelated, while the squared residual series shows autocorrelation, due to the presence of a non-constant conditional variance. Hence, a common way to handle such a situation is to combine the ARMA and GARCH models by setting the ARMA error $(\epsilon_{t})_t$ equal to $\sigma_{t}Z_t$, where $\sigma_t$ follows a GARCH volatility specification in terms of historical values of $(\epsilon_{t})_t$. In other words, the model for the daily log-returns is an ARMA(p,q) model whose innovation process is GARCH(m,s). This provides a flexible model that may properly describe the dynamics of the data.

The general ARMA(p,q) model with GARCH(m,s) errors has the form
$$
X_{t}=\mu_t+\epsilon_{t}
$$
where
\begin{align*}
\mu_t & =\phi_{0}+\sum_{i=1}^{p}\phi_{i}X_{t-i}+\sum_{i=1}^{q}\theta_{i} \epsilon_{t-i},\\
\epsilon_{t} & = \sigma_{t} \,Z_{t},\\
\sigma^2_t & = w+\sum_{i=1}^{m}\,\alpha_i \epsilon^2_{t-i}+\sum_{j=1}^{s}\,\beta_j \sigma_{t-j}^2,
\end{align*}
$w>0$, $\alpha_i, \beta_j \geq 0,$ $\sum_{i}\alpha_i+\sum_j \beta_j<1$, and $(Z_{t})$ is SWN(0,1).

**Model fitting and checking**

In practice, the most widely used approach to fitting GARCH models to data is maximum likelihood, and the same approach can be used for an ARMA model with GARCH errors. As with ARMA models, it is usual to check fitted GARCH models using residuals.
For a general ARMA-GARCH model there are two types of residuals:

- the ordinary residual series from the ARMA model, $\hat{\epsilon}_{t}=x_{t}-\hat{x}_{t}$, which estimates $\epsilon_{t}$ and is used when forecasting;
- the *standardized* residuals, used for model checking, $\hat{z}_{t}= \hat{\epsilon}_{t}/\hat{\sigma}_{t}$, obtained by dividing the ordinary residual series by its estimated conditional standard deviation.

The standardized residuals should behave like an SWN and this can be investigated by examining the ACF plot of their raw and absolute values and applying LB tests.

### Volatility Forecasting and Risk Estimation

In this section, we use ARMA-GARCH models for prediction and estimation of Value-at-Risk and Expected Shortfall to account for periods of high or low volatility. In particular, estimating VaR for a future time period requires us to be able to forecast volatility, that is, forecast $\sigma_{t+h}$ for $h\geq 1$ based on a sample of size $n$, $X_{t-n+1}, X_{t-n+2}, \dots, X_t$. We will assume that the data are generated by the process $X_t=\mu_t+\sigma_t Z_t$ discussed above.

In what follows, we use daily log-returns on the BMW share price from Tuesday 2nd January 1973 until Tuesday 23rd July 1996. These data are available as a numeric vector in the **evir** package in *R* (data can also be downloaded from Yahoo Finance by using the **quantmod** package).

**1. Obtain the data and plot**

We first load the data, select and plot the first $n=2000$ observations, and check for serial correlation.

```{r, message=FALSE, fig.height=4, fig.width=8}
library(evir)
library(qrmtools)
library(rugarch)
data(bmw)
n <- 2000
X <- bmw[1:n]
plot(X, type="l", xlab="t", ylab=expression(X[t]))
par(mfrow=c(1,2))
acf(X)
pacf(X)
```

**2. Fit an ARMA-GARCH model to the data**

We now consider an ARMA-GARCH model for the BMW *negative* log-returns (losses), where the ARMA part consists of a pure AR(2) model, based on the information provided by the ACF and PACF plots (compare with alternative models; a quick AIC comparison is sketched below).
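For instance (a minimal sketch, not part of the original workflow), candidate ARMA orders for the losses can be compared by AIC with base *R*'s `arima()`; the loss series is taken here as `-X`, anticipating the definition of `L` in the next chunk.

```{r}
## Sketch: compare a few candidate ARMA orders for the losses (-X) by AIC
cand <- list("AR(1)" = c(1,0,0), "AR(2)" = c(2,0,0),
             "AR(3)" = c(3,0,0), "ARMA(1,1)" = c(1,0,1))
round(sapply(cand, function(ord) AIC(arima(-X, order = ord))), 2)
```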
The GARCH part will be a GARCH(1,1) model for the errors:

```{r}
L <- -X # obtain process of negative log-returns
# only ARMA(2,0) fit
# fit.arma <- arima(L, order = c(2,0,0))
## Fit an ARMA(2,0)-GARCH(1,1) model
## with t standardized residuals
spec <- ugarchspec(variance.model = list(model = "sGARCH", garchOrder = c(1,1)),
                   mean.model = list(armaOrder = c(2,0), include.mean = FALSE), # mu was not significant
                   distribution.model = "std")
fit <- ugarchfit(spec, data = L)
# fit
fit@fit$matcoef
```

We can then extract the resulting series and plot the losses $L_t$ together with the fitted conditional means $\hat{\mu}_t$:

```{r, fig.height=4, fig.width=5}
mu.fit <- fitted(fit)  # fitted hat{mu}_t
sig.fit <- sigma(fit)  # fitted hat{sigma}_t
plot(L, type = "l", xlab = "t",
     ylab = expression("Data"~L[t]~"and fitted values"~hat(mu)[t]))
lines(as.numeric(mu.fit), col = 2)
legend("bottomright", bty = "n", lty = c(1,1), col = c("black", 2),
       legend = c(expression(L[t]), expression(hat(mu)[t])))
```

The following code produces the plot of the estimated conditional standard deviations based on the AR(2)-GARCH(1,1) model:

```{r, fig.height=4, fig.width=5}
# unconditional volatility:
# Var(x_t) = w/(1 - alpha - beta)
sqrt(fit@fit$coef[["omega"]] / (1 - fit@fit$coef[["alpha1"]] - fit@fit$coef[["beta1"]]))
plot(as.numeric(sig.fit), type = "l", xlab = "t", ylab = expression(hat(sigma)[t]))
```

We can also produce some plots of residuals and standardized residuals from the fitted model

```{r, fig.height=4, fig.width=5}
## Plot the unstandardized residuals epsilon_t
resi <- as.numeric(residuals(fit))
plot(resi, type = "l", xlab = "t", ylab = expression(epsilon[t]))
```

```{r, message=FALSE, fig.height=4, fig.width=8}
Z <- fit@fit$z # standardized residuals
par(mfrow=c(1,3))
acf(Z)
acf(Z^2)
## Q-Q plot of Z_t against their specified t (t_nu scaled to have variance 1)
library(qrmtools)
nu.fit <- fit@fit$coef[["shape"]]
qq_plot(Z, FUN = function(p) sqrt((nu.fit-2)/nu.fit) * qt(p, df = nu.fit),
        main = "std - QQ plot")
# or
# plot(fit, which=9)
```

**3. Calculate estimates for VaR and ES**

To calculate conditional risk measures we use the following result. Let $L_t = -X_t$ represent losses (negative log-returns) on an asset price, and assume that the process $(L_t)_t$ follows a stationary model of the form $L_t=\mu_t+\sigma_t Z_t$, where $(Z_t)_t$ is an SWN(0,1). Hence, we look at the estimation of value-at-risk and expected shortfall for the distribution $F_{L_{t+1}|\mathcal{F}_t}$. If $\rm{VaR}^t_{\alpha}$ denotes the $\alpha$-quantile of $F_{L_{t+1}|\mathcal{F}_t}$ and $\rm{ES}^t_{\alpha}$ denotes the corresponding expected shortfall, it is easy to verify that
$$
\rm{VaR}^t_{\alpha}=\mu_{t+1}+\sigma_{t+1}q_\alpha(Z);\quad \rm{ES}^t_{\alpha}=\mu_{t+1}+\sigma_{t+1}ES_\alpha(Z)
$$
where $Z$ is a rv with the same distribution as the innovations. After fitting an ARMA-GARCH model, estimates of the degrees of freedom, $\hat{\nu}$, of the conditional mean, $\hat{\mu}_{t+1}$, and of the volatility, $\hat{\sigma}_{t+1}$, are available. The quantile and the corresponding expected shortfall depend on the innovation distribution. Since we have assumed (scaled) $t$-innovations, the parametric quantile and expected shortfall of the standard univariate $t$ distribution must be scaled by the factor $\sqrt{(\nu-2)/\nu}$ to take account of the fact that the innovation distribution is scaled to have variance one.
Under this assumption, VaR and ES are estimated as
$$
\widehat{\rm{VaR}}^t_{\alpha}=\hat{\mu}_{t+1}+\hat{\sigma}_{t+1} \sqrt{(\nu-2)/\nu}\, q_{\nu, \alpha};\quad \widehat{\rm{ES}}^t_{\alpha}=\hat{\mu}_{t+1}+\hat{\sigma}_{t+1}\widehat{ES}_\alpha(Z)
$$
where $q_{\nu, \alpha}$ is the $\alpha$-quantile of the $t$ distribution with mean 0 (for $\nu > 1$) and variance $\nu/(\nu-2)$ (for $\nu > 2$), and $\widehat{ES}_\alpha(Z)$ is the parametric expected shortfall of the standard univariate $t$ distribution, scaled by the same factor.

```{r}
alpha <- 0.99
## Extract fitted VaR_alpha:
## quantile(, probs = alpha) provides VaR_alpha = hat{mu}_t + hat{sigma}_t * q_Z(alpha)
VaR.fit <- as.numeric(quantile(fit, probs = alpha))
## Build manually and compare the two
VaR.fit2 <- as.numeric(mu.fit + sig.fit * sqrt((nu.fit-2)/nu.fit) * qt(alpha, df = nu.fit)) # VaR_alpha computed manually
stopifnot(all.equal(VaR.fit, VaR.fit2))
# Expected shortfall via the function ES_t() (qrmtools)
ES.fit <- mu.fit + sig.fit * ES_t(alpha, scale = sqrt((nu.fit-2)/nu.fit), df = nu.fit)
```

**4. Predict VaR and ES based on the fitted model**

In order to estimate conditional risk measures, we need to derive prediction equations under the assumption of an ARMA-GARCH model, i.e., $X_t -\mu_t = \epsilon_t = \sigma_t Z_t$, where $\mu_t$ and $\sigma_t$ describe an AR(2) model and a GARCH(1,1) model, respectively. We will use the fact that $E(\epsilon_{t+h}|\mathcal{F}_t)=0$, and that the rvs $(X_s)_{s\leq t}$ and $(\epsilon_s)_{s\leq t}$ are known at time $t$. If the forecast origin is $t$,

- predictions of $X_{t+h}$ or $\mu_{t+h}$ ($h\geq 1$) are given by
$$
E(X_{t+h}|\mathcal{F}_t)=E(\mu_{t+h}|\mathcal{F}_t)
$$
and are recursively calculated in terms of $E(X_{t+h-1}|\mathcal{F}_t)$;
- predictions of $\sigma^2_{t+h}$ or $\epsilon^2_{t+h}$ ($h\geq 1$) are given by
$$
\rm{Var}(X_{t+h}|\mathcal{F}_t)=E((X_{t+h}-\mu_{t+h})^2|\mathcal{F}_t)
$$
and can be computed recursively in terms of $E((X_{t+h-1}-\mu_{t+h-1})^2|\mathcal{F}_t)=E(\epsilon_{t+h-1}^2|\mathcal{F}_t)$.

Let $\hat{X}_t(h):=E(X_{t+h}|\mathcal{F}_t)$ and $\hat{\sigma}^2_t(h):=\rm{Var}(X_{t+h}|\mathcal{F}_t)$. Using the AR(2)-GARCH(1,1) specification, we get the $h$-step-ahead forecasts
\begin{align*}
\hat{X}_{t}(h)&=\mu+\phi_1(\hat{X}_t(h-1)-\mu)+\phi_2(\hat{X}_t(h-2)-\mu)\\
\hat{\sigma}^2_t(h)&=w+(\alpha+\beta)\hat{\sigma}^2_t(h-1)
\end{align*}
(with the convention $\hat{X}_t(j)=X_{t+j}$ for $j\leq 0$), and these expressions are approximated by substituting inferred values for $\epsilon_t$ and $\sigma_t$ obtained from the residuals of the model (see McNeil et al., 2015).

```{r}
# Predict from the fitted process
mod <- getspec(fit)                 # specification of the fitted process
setfixed(mod) <- as.list(coef(fit)) # set the parameters to the fitted ones
m <- ceiling(n / 20)                # number of times to roll the n.ahead forecast
# predict from the fitted process
pred <- ugarchforecast(mod, data = L, n.ahead = 1, n.roll = m-1, out.sample = m)
## Extract the resulting series
mu.predict <- fitted(pred)  # predicted conditional mean mu_t
sig.predict <- sigma(pred)  # predicted sigma_t
# predicted VaR_alpha
VaR.predict <- as.numeric(quantile(pred, probs = alpha))
# predicted ES_alpha
ES.predict <- as.numeric(mu.predict + sig.predict *
                           ES_t(alpha, scale = sqrt((nu.fit-2)/nu.fit), df = nu.fit))
```
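As a quick inspection (a minimal sketch, not part of the original code), the first few rolling one-step-ahead predictions of the conditional mean, volatility, VaR and ES can be printed side by side:

```{r}
## Sketch: inspect the first few rolling one-step-ahead predictions
head(cbind(mu    = as.numeric(mu.predict),
           sigma = as.numeric(sig.predict),
           VaR   = VaR.predict,
           ES    = ES.predict))
```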
Finally, we can plot the data (negative log-returns) with the fitted conditional mean $\hat{\mu}_t$, the estimated $\widehat{\rm{VaR}}_\alpha$, and their predicted values.

```{r}
ran <- range(L, mu.fit, VaR.fit, mu.predict, VaR.predict)
yran <- c(-max(abs(ran)), max(abs(ran))) # y-range for the plot
xran <- c(1, length(X) + m)              # x-range for the plot
plot(L, type = "l", xlim = xran, ylim = yran, xlab = "Time t", ylab = "",
     main = "ARMA-GARCH, fit, VaR, VaR predictions")
lines(as.numeric(mu.fit), col = adjustcolor("darkblue", alpha.f = 0.5)) # hat{mu}_t
lines(VaR.fit, col = "darkred") # estimated VaR_alpha
## Predictions
t. <- length(L) + seq_len(m)        # future time points
lines(t., mu.predict, col = "blue") # predicted process X_t (or mu_t)
lines(t., VaR.predict, col = "red") # predicted VaR_alpha
legend("bottomright", bty = "n", lty = rep(1, 4),
       col = c(adjustcolor("darkblue", alpha.f = 0.5), "blue", "darkred", "red"), cex = 0.6,
       legend = c(expression(hat(mu)[t]),
                  expression("Predicted"~mu[t]),
                  substitute(widehat(VaR)[a], list(a = alpha)),
                  substitute("Predicted"~VaR[a], list(a = alpha))))
```

#### References

- Bollerslev, T. (1986). Generalized autoregressive conditional heteroskedasticity. Journal of Econometrics, 31, 307-327.
- Chatfield, C. (2003). The Analysis of Time Series: An Introduction. Chapman and Hall/CRC.
- Engle, R. F. (1982). Autoregressive conditional heteroskedasticity with estimates of the variance of UK inflation. Econometrica, 50, 987-1008.
- Ljung, G. M. and Box, G. E. P. (1978). On a measure of lack of fit in time series models. Biometrika, 65, 297-303. doi: 10.2307/2335207.
- McNeil, A. J., Frey, R., and Embrechts, P. (2015). Quantitative Risk Management: Concepts, Techniques and Tools, revised edition. Princeton University Press, Princeton, NJ.