
On Estimation Of Risk Measures:

A GARCH-EVT Dynamic Approach

Hung Chu (S2348594)

MSc. Econometrics, Operations Research, and Actuarial Studies
University of Groningen, the Netherlands
Faculty of Economics and Business
Master thesis

June 30, 2016

Abstract

In this thesis we focus on the estimation of risk measures by GARCH models. The two risk measures that we consider are the Value-at-Risk (VaR) and the Expected Shortfall (ES). The GARCH models are fitted to loss time series using maximum likelihood, where the innovations are assumed to be normally and t distributed, respectively. The VaR and ES estimation performance of the models is assessed by the results of various backtests. If, from these results, the innovations turn out to be misspecified, we implement a dynamic approach that combines the GARCH model with Extreme Value Theory (EVT). In this dynamic approach we model the misspecified innovations by EVT. For normally distributed innovations the GARCH model estimates the two risk measures poorly, but the corresponding dynamic approach substantially improves the estimation. For t distributed innovations the GARCH model already estimates the two risk measures very well, so the dynamic approach is unnecessary. In fact, the dynamic approach turns out to estimate the ES equally well as its predecessor GARCH model, but it worsens the VaR estimation.

JEL classification. C22; C51; C52; C53; G32.

1 Introduction

According to the Capital Adequacy Directive of the Bank of International Settlements (BIS), the risk capital of a financial institution should be sufficient to prevent the institution from defaulting under normal circumstances. Two risk measures that are frequently used by regulators to determine the capital requirement for market risk are the so-called Value-at-Risk (VaR) and Expected Shortfall (ES). Measuring these risk measures accurately is a main concern for many financial institutions. An overestimation of the risk measures could result in lower benefits from the businesses. On the other hand, an underestimation could be fatal for the financial institution, as this would lead to lower required capital and hence a greater risk of being unable to cover its exposures.

A traditional and popular approach for estimating the VaR and ES for market risk is the historical simulation (HS) approach. This approach has the advantage that it does not assume a particular distribution of the returns and it is also relatively easy to implement. However, it assumes that future movements of the returns are exactly represented by the returns' historical movements over a specified time window. As explained in Hull (2015), this assumption is unrealistic in practice. Therefore we propose more advanced models that overcome this disadvantage. One such model is the famous Generalized Autoregressive Conditional Heteroskedastic (GARCH) model, developed by Bollerslev (1986). This model expresses the volatility of the time series at some time instance as a function of lagged values of the time series and the volatility. Contrary to the HS, the GARCH model is able to describe the volatility clustering of time series well. This model has become very popular in time series analysis due to its simplicity and its unique ability to deal with heteroskedasticity. In particular, it has been applied successfully in financial applications because many financial time series exhibit volatility clustering (McNeil et al. (2015), Predescu & Stancu (2011), Füss et al. (2007) and Engle (2001)). As a result, many extensions of the GARCH model have been developed since its introduction, see Teräsvirta (2009) for an overview. According to Van den Goorbergh & Vlaar (1999), volatility dynamics are the most important characteristic of stock returns when modelling the VaR.

1.1 Research question

A drawback of the GARCH model is that it generally does not estimate the tails of the time series distribution very well, in particular not as well as EVT does. On the contrary, EVT does not directly reflect the volatility behaviour of the return time series. McNeil & Frey (2000) proposed a two-step approach that combines a GARCH-type model and EVT. The objective of this approach is that it should be able to describe the heavy tails of the return time series, while at the same time also accounting for its volatile behaviour. In their paper, the authors analyzed five historical series of logarithmic returns. In the first step they estimate the AR(1)-GARCH(1,1) model, where the normal distribution of the innovations is assumed to be misspecified. In the second step they assume that the excess distribution of the innovations is an EVT distribution and as a consequence they model these innovations by EVT. The fitted EVT distribution is used to obtain estimates of the risk measures. Based on the backtest criteria, the two-step approach turned out to outperform the GARCH models with normal and t distributed innovations and the unconditional EVT.

In this thesis we consider the ARMA-GARCH model with normally and t distributed innovations. For these models we are interested in whether the VaR and ES backtests support the hypothesis of correct specification. If the model is correctly specified with respect to the functional relationships between the variables, we consider the model a pseudo-true model. Otherwise we consider it misspecified with respect to the distribution of the innovations. In the latter case we implement the dynamic approach, where we change the model specification by assuming that the excess distribution of the innovations is an EVT distribution, as in McNeil & Frey (2000). We use this distribution to estimate the risk measures.

We consider the log losses time series of four stock indices and an equally weighted portfolio composed of these four stock indices: AEX, DAX, S&P 500 and SSE. The estimation of the risk measures is backtested to assess the estimation performance of the models. In Section 2.5 we will explain in more detail the backtesting procedures as well as how we can use the result of a backtest to determine whether the innovations distribution is correctly specified. The research question is formulated as follows:

Research Question

Is the ARMA-GARCH model with normally and t distributed innovations correctly specified with regard to the backtest criterion of each risk measure estimation? If not, could we improve the estimation by additionally using EVT?

In order to answer this research question, we formulate the following sub-question:


The remainder of this thesis is organized as follows. Section 2 introduces the risk measures and also provides theory about the GARCH model, EVT and GARCH-EVT model. The empirical data and descriptive statistics are presented in Section 3. Section 4 discusses the result of the backtests. Section 5 concludes the thesis and Section 6 discusses the methodologies and consequently provides suggestions for further studies.

2 Methodology

The theory in this section follows McNeil et al. (2015) unless cited otherwise.

2.1 Risk measures

The VaR and ES are two prevailing risk measures in risk management. In this subsection we introduce and discuss the use of these risk measures. For each risk measure we define the unconditional and the conditional variant.

2.1.1 Value-at-Risk

In 1996 the Basel Committee on Banking Supervision (BCBS) added the VaR to Basel I for the calculation of the capital requirement for market risk. This risk measure is still widely encountered in risk management thanks to its simplicity and convenience. At present the VaR is used by the BCBS in Basel II.5. Let {L_t}_{t∈Z} be the daily losses and denote its distribution function by F_L(l) = P(L ≤ l).

Definition 2.1. Given a confidence level α ∈ (0, 1), the unconditional VaR at the confidence level α is defined as

VaRα = inf{l ∈ R | FL(l) ≥ α}.

In risk management α ≤ 0.05 is usually chosen when considering returns or, equivalently, α ≥ 0.95 when considering losses.

The unconditional variant of the risk measure is commonly used in models where constant volatility is assumed. In our research we account for time-varying volatility and thus consider the conditional variant. For this, let F_t denote the information set available at time t, i.e. the σ-algebra generated by the losses up to and including time t.

Definition 2.2. Given a confidence level α ∈ (0, 1), the conditional VaR over the next day at the confidence level α is defined as

VaR^t_α = inf{l ∈ R | F_{L_{t+1}|F_t}(l) ≥ α}.   (1)

Intuitively, the VaR is the loss level that will not be exceeded with a certain confidence level over a target time horizon. In fact, the VaR is just a quantile of the loss distribution. A drawback of the VaR is that it does not tell us the severity of the losses given that the losses exceed a certain exceedance level.

2.1.2 Expected Shortfall

The BCBS has been continuously revising its market risk framework since 2012. For the calculation of the market risk capital requirements, it is expected that the BCBS will make the transition from the VaR to the ES by 2018.

Definition 2.3. Given a confidence level α ∈ (0, 1), the unconditional ES at the confidence level α is defined as

ES_α = E[L | L > VaR_α],

provided that E[|L|] < ∞.

Intuitively, the expected shortfall is the expected loss given that the loss exceeds the VaR at a certain confidence level over a target time horizon. Subsequently, an alternative way to express the ES at confidence level α is:

ES_α = VaR_α + E[L − VaR_α | L > VaR_α].

We have to point out that this is the reason that some of the literature uses the terminology 'conditional VaR' for the definition of the unconditional ES. However, as we are explicitly using time-varying models, this is by no means equal to our convention of the conditional VaR and the unconditional ES: these two variants of the risk measures are different.

For the same reason as for the VaR, we are particularly interested in the conditional variant of the ES risk measure.

Definition 2.4. Given a confidence level α ∈ (0, 1), the conditional ES over the next period at the confidence level α is defined as

ES^t_α = E[L_{t+1} | L_{t+1} > VaR^t_α],   (2)

provided that E[|L_{t+1}|] < ∞.

The ES makes up for the VaR's inability to measure the risk in the tail beyond the quantile. Thus the ES is superior to the VaR concerning the modelling of risks in the tails. The VaR can be classified as a loss frequency measure, whereas the ES can be classified as a loss severity measure.

2.2 GARCH models

As mentioned in the introduction, many extensions of the basic GARCH model have been developed. To the best of our knowledge, what all of these GARCH models have in common in their construction is the strict white noise (SWN) process.

Definition 2.5. A discrete-time stochastic process {Z_t}_{t∈Z} is a SWN process if it is a series of independent and identically distributed (i.i.d.) random variables with finite variance.

SWN processes are stationary processes without serial correlation. A SWN process with mean μ and variance σ² is denoted by SWN(μ, σ²). These processes describe the innovations of the time series.

Most financial losses time series exhibit autocorrelation and heteroskedasticity. An extension of the GARCH-type model that is able to deal with these phenomena is the so-called ARMA(p1, q1)-GARCH(p2, q2) model. As a matter of fact, this GARCH-type model will be used for our analysis.

Definition 2.6. Let {Z_t} be a SWN(0,1) process. The process {L_t}_{t∈Z} is an ARMA(p1, q1)-GARCH(p2, q2) process if it is covariance stationary and if it satisfies, for all t ∈ Z and some strictly positive-valued process {σ_t}_{t∈Z}, the following equations:

L_t = μ_t + σ_t Z_t,   (3)
μ_t = μ + Σ_{i=1}^{p1} φ_i (L_{t−i} − μ) + Σ_{j=1}^{q1} λ_j (L_{t−j} − μ_{t−j}),   (4)
σ_t² = α_0 + Σ_{i=1}^{p2} α_i (L_{t−i} − μ_{t−i})² + Σ_{j=1}^{q2} β_j σ_{t−j}²,   (5)

where α_0 > 0, α_i ≥ 0, i = 1, …, p2, and β_j ≥ 0, j = 1, …, q2.

The necessary and sufficient condition for covariance stationarity of this model is Σ_{i=1}^{p2} α_i + Σ_{j=1}^{q2} β_j < 1.

The ARMA part is used for the dynamics of the conditional mean, where it takes into account the autocorrelation of the time series. On the other hand, the GARCH part takes into account the time-varying volatility (i.e. heteroskedasticity). In practice, low-order ARMA(p1, q1) and GARCH(p2, q2) models are commonly used. In this thesis we work with the ARMA(p1, q1)-GARCH(1,1), where p1, q1 ≤ 3. The ARMA(p1, q1)-GARCH(1,1) model with the combination of p1 ≤ 3 and q1 ≤ 3 that yields the lowest Akaike Information Criterion (AIC) value will be used for the modelling.

2.2.1 Conditional risk measures

Denote the distribution function of the innovations by F_Z. For time series of the form

L_t = μ_t + σ_t Z_t,

we have

F_{L_{t+1}|F_t}(l) = P(μ_{t+1} + σ_{t+1} Z_{t+1} ≤ l | F_t)
                   = P(Z_{t+1} ≤ (l − μ_{t+1})/σ_{t+1} | F_t)
                   = F_Z((l − μ_{t+1})/σ_{t+1}),

and so, by the definition of the conditional VaR in (1), setting F_Z((l − μ_{t+1})/σ_{t+1}) = α gives (l − μ_{t+1})/σ_{t+1} = F_Z^{−1}(α) = z_α. Solving this equation for l yields the expression for the conditional VaR. By the arguments above and using (2) it is straightforward to show that the conditional VaR and conditional ES can be written as

VaR^t_α = μ_{t+1} + σ_{t+1} z_α,   (6)
ES^t_α = μ_{t+1} + σ_{t+1} E[Z_{t+1} | Z_{t+1} > z_α],   (7)

where z_α is the α-quantile of the innovations Z_t. So for normally and t distributed innovations we respectively have

z_α = Φ^{−1}(α)   and   z_α = t_ν^{−1}(α),

where Φ is the standard normal distribution function and t_ν is the standard t distribution function with ν degrees of freedom.
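To make (6) and (7) concrete, the following minimal sketch (an illustration, not taken from the thesis) computes the conditional VaR and ES for normally distributed innovations, using the closed-form result E[Z | Z > z_α] = φ(z_α)/(1 − α) for a standard normal Z; mu_next and sigma_next are assumed to be the one-step-ahead ARMA-GARCH forecasts, and the t case would use the corresponding t quantile and tail expectation in the same way.

```python
# Sketch only: conditional VaR and ES under normal innovations, equations (6)-(7).
from scipy.stats import norm

def cond_var_es_normal(mu_next, sigma_next, alpha=0.99):
    z_alpha = norm.ppf(alpha)                    # alpha-quantile of Z ~ N(0, 1)
    es_z = norm.pdf(z_alpha) / (1.0 - alpha)     # E[Z | Z > z_alpha] for the standard normal
    var_next = mu_next + sigma_next * z_alpha    # conditional VaR, equation (6)
    es_next = mu_next + sigma_next * es_z        # conditional ES, equation (7)
    return var_next, es_next
```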

2.2.2 Estimation

The parameters of an ARMA-GARCH model can be estimated by the maximum likelihood (ML) estimation procedure. The conditional (log-)likelihood function of the losses that has to be maximized can be written in terms of the pre-specified innovation distribution. Recall that the density function of the losses is

f_{L_{t+1}|F_t}(l) = (1/σ_{t+1}) f_Z((l − μ_{t+1})/σ_{t+1}).

Thus for a sample of n observations and a vector θ of parameters, the conditional likelihood function can be written as

L(θ) = Π_{t=1}^{n} (1/σ_t) f_Z((L_t − μ_t)/σ_t),   (8)

where f_Z is the density function of the innovations, and μ_t and σ_t are respectively given by the ARMA specification in (4) and the GARCH specification in (5). In this thesis we will consider normally and t distributed innovations. For notational convenience we denote θ = (μ, φ, λ, α, β), where φ, λ, α and β are vectors of the ARMA-GARCH coefficients with elements φ_i, i = 1, …, p1, λ_j, j = 1, …, q1, α_k, k = 0, …, p2, and β_l, l = 1, …, q2, respectively. Using the standardized variants of the normal and Student's t density functions, the conditional log-likelihood functions for the correspondingly distributed innovations of the ARMA-GARCH model are respectively given by

ℓ_N(θ) = −(n/2) ln(2π) − (1/2) Σ_{t=1}^{n} ln(σ_t²) − (1/2) Σ_{t=1}^{n} (L_t − μ_t)²/σ_t²,   (9)

ℓ_t(θ, ν) = n ln[ Γ((ν+1)/2) / (√(π(ν−2)) Γ(ν/2)) ] − (1/2) Σ_{t=1}^{n} ln(σ_t²) − ((ν+1)/2) Σ_{t=1}^{n} ln(1 + (L_t − μ_t)²/(σ_t²(ν−2))),   (10)

where ν > 2 is the degrees of freedom of the t distribution (Van den Goorbergh & Vlaar (1999)). Hence, from (8)−(10), the conditional log-likelihood function for the ARMA-GARCH model is given by

ℓ(θ*) = −Σ_{t=1}^{n} ln(σ_t) + Σ_{t=1}^{n} ln f_Z((L_t − μ_t)/σ_t)   (11)
      = −Σ_{t=1}^{n} ln(σ_t) + ℓ_i(θ*),   (12)

where θ* = θ and ℓ_i(θ*) = ℓ_N(θ*) for normally distributed innovations, while otherwise θ* = (θ, ν) and ℓ_i(θ*) = ℓ_t(θ*) for t distributed innovations. Maximizing this conditional log-likelihood function subject to σ_t > 0 (and additionally ν > 2 for t distributed innovations) yields an estimate of θ.
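As an illustration of this ML step, the sketch below is a deliberately simplified, assumption-based example and not the author's code: it maximizes the normal log-likelihood (9) numerically for a plain GARCH(1,1) with a constant conditional mean, i.e. the ARMA part is dropped, and losses is assumed to be a one-dimensional numpy array of daily percentage losses.

```python
# Sketch only: ML fit of a constant-mean GARCH(1,1) with normal innovations.
import numpy as np
from scipy.optimize import minimize

def neg_loglik(params, losses):
    mu, omega, alpha1, beta1 = params
    eps = losses - mu
    sigma2 = np.empty_like(losses)
    sigma2[0] = np.var(losses)            # common initialization of the variance recursion
    for t in range(1, len(losses)):
        sigma2[t] = omega + alpha1 * eps[t - 1] ** 2 + beta1 * sigma2[t - 1]
    # negative of the normal log-likelihood in (9)
    return 0.5 * np.sum(np.log(2 * np.pi) + np.log(sigma2) + eps ** 2 / sigma2)

def fit_garch11(losses):
    x0 = np.array([np.mean(losses), 0.1 * np.var(losses), 0.05, 0.90])
    # note: the covariance-stationarity restriction alpha1 + beta1 < 1 is not imposed here
    bounds = [(None, None), (1e-8, None), (0.0, 1.0), (0.0, 1.0)]
    res = minimize(neg_loglik, x0, args=(losses,), bounds=bounds, method="L-BFGS-B")
    return res.x                          # (mu, omega, alpha1, beta1)
```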

2.3 Extreme Value Theory

EVT offers two main approaches for modelling extreme events: the Block Maxima (BM) method and the Peak-Over-Threshold (POT) method. The BM considers only the observations that are maxima in every fixed time horizon, whereas the POT considers all data that exceed a fixed threshold. Empirically, the POT has been shown to be more efficient than the BM in modelling extremes for limited data, see Gilli & Këllezi (2006). Consistent with these results we will exclusively use the POT for our analysis.

2.3.1 Peak-Over-Threshold

Definition 2.7. The (conditional) excess distribution over the threshold u for a random variable X with distribution function F is given by

F_u(x) = P(X − u ≤ x | X > u) = (F(x + u) − F(u)) / (1 − F(u)),

for 0 ≤ x ≤ x_F − u, where x_F ≡ sup{x ∈ R | F(x) < 1} ≤ ∞ is the right endpoint of F.

The excess distribution is the general distribution for the excesses over the threshold u. The main distribution that is used in modelling the exceedances over a threshold (or rather the excess distribution) is the so-called Generalized Pareto Distribution (GPD).

Definition 2.8. The distribution function of the GPD is given by

G_{ξ,β}(x) = 1 − (1 + ξx/β)^{−1/ξ}   if ξ ≠ 0,
G_{ξ,β}(x) = 1 − e^{−x/β}            if ξ = 0,   (13)

where β > 0, x ≥ 0 if ξ ≥ 0 and 0 ≤ x ≤ −β/ξ if ξ < 0.

The parameters ξ and β are respectively specified as the shape and scale parameter. For ξ > 0 the distribution function of the GPD is the ordinary Pareto distribution, which corresponds to a relatively fat-tailed distribution. For ξ = 0 it is the exponential distribution, and for ξ < 0 it is a short-tailed Pareto type II distribution.

A powerful result in EVT is that, for a large class of underlying distributions F and a sufficiently high threshold u > 0, we can model the excess distribution F_u by the GPD. This result is stated in the following theorem.

Theorem 2.1 (Pickands (1975) and Balkema & de Haan (1974)). For a large class of underlying distributions F we have

lim_{u → x_F}  sup_{0 ≤ x < x_F − u} |F_u(x) − G_{ξ,β(u)}(x)| = 0.

Motivated by this result, we make the following assumption.

Assumption 2.1. For an underlying distribution F with right endpoint x_F, we assume that for some high threshold u > 0 we have F_u(x) = G_{ξ,β}(x) for ξ ∈ R, β > 0 and 0 ≤ x < x_F − u.

As mentioned by Scarrott & MacDonald (2012), the choice of the value for u concerns a trade-off between bias and variance. The threshold should be sufficiently high to make sure that we investigate the shape of the tail of the distribution, thus reducing the bias of the estimates. At the same time, the threshold should not be too high, so that we have enough data to estimate the parameters accurately: less data yields estimates with higher variance.

2.3.2 Estimation

The estimation of the GPD is also done by ML and it is based on the excess data. Consider the data X_1, …, X_n and denote the number of exceedances over a certain threshold u by n_u ≤ n. For convenience we denote these exceedances by X̃_1, …, X̃_{n_u} and consequently define the excess data as Y_i = X̃_i − u, for i = 1, …, n_u. The density function of the GPD can easily be derived from (13):

g_{ξ,β}(x) = (1/β)(1 + ξx/β)^{−(1/ξ+1)}   if ξ ≠ 0,
g_{ξ,β}(x) = (1/β) e^{−x/β}               if ξ = 0,

where β > 0, x ≥ 0 if ξ ≥ 0 and 0 ≤ x ≤ −β/ξ if ξ < 0.

If the excess data are assumed to be realizations of independent random variables, then it can be verified that the log-likelihood function is given by

ℓ(ϑ) = Σ_{i=1}^{n_u} ln g_{ξ,β}(Y_i)
     = −n_u ln(β) − (1 + 1/ξ) Σ_{i=1}^{n_u} ln(1 + ξY_i/β)   if ξ ≠ 0,
     = −n_u ln(β) − (1/β) Σ_{i=1}^{n_u} Y_i                   if ξ = 0,

where ϑ = (ξ, β)^T. To obtain the parameter estimates we have to maximize this log-likelihood function subject to the constraints β > 0 and, in the case that ξ < 0, also 1 + ξY_i/β > 0 for all i = 1, …, n_u. This procedure yields a fitted GPD distribution G_{ξ̂,β̂} for the excess distribution F_u(x).
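As a small illustration (an assumed implementation, not the thesis code), this ML step can be carried out with scipy's generalized Pareto distribution; in scipy's parametrization the argument c plays the role of ξ and scale the role of β, and loc is fixed at zero because the excesses Y_i = X̃_i − u are fitted directly.

```python
# Sketch only: ML fit of the GPD to the excesses over a threshold u.
import numpy as np
from scipy.stats import genpareto

def fit_gpd_excesses(data, u):
    data = np.asarray(data)
    excesses = data[data > u] - u
    xi, _, beta = genpareto.fit(excesses, floc=0)   # shape xi, scale beta; loc fixed at 0
    return xi, beta, len(excesses)
```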

2.3.3 Unconditional risk measures

Denote the upper tail probability by F̄(x) ≡ P(X > x) = 1 − F(x), and in a similar manner define F̄(u) and F̄_u(x). By Assumption 2.1, we have for some sufficiently high threshold u > 0 that the tail probability can be written as

F̄(x) = P(X > x) = P(X > u) P(X > x | X > u)
      = F̄(u) P(X − u > x − u | X > u)
      = F̄(u) F̄_u(x − u)
      = F̄(u) (1 + ξ(x − u)/β)^{−1/ξ}.   (14)

Inverting F̄(x), using (14), yields the expression of the unconditional VaR in terms of the coefficients of the GPD. For α ≥ F(u), it can easily be shown that this is

VaR_α = F̄^{−1}(1 − α) = u + (β/ξ) [((1 − α)/F̄(u))^{−ξ} − 1].   (15)

Using this result we can derive an alternative expression for the unconditional ES:

ES_α = (1/(1 − α)) ∫_α^1 VaR_x dx
     = (1/(1 − α)) ∫_α^1 [ u + (β/ξ)((1 − x)/F̄(u))^{−ξ} − β/ξ ] dx
     = u − β/ξ + (β/(ξ(1 − ξ))) ((1 − α)/F̄(u))^{−ξ}
     = VaR_α/(1 − ξ) + (β − ξu)/(1 − ξ),   (16)

provided that ξ < 1.
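The following one-function sketch (illustrative only, not from the thesis) evaluates (15) and (16) from fitted GPD parameters; it assumes ξ ≠ 0 and ξ < 1, and Fbar_u is the estimated exceedance probability F̄(u), e.g. N_u/n.

```python
# Sketch only: unconditional VaR and ES implied by the fitted GPD tail.
def gpd_var_es(u, xi, beta, Fbar_u, alpha=0.99):
    var_alpha = u + (beta / xi) * (((1 - alpha) / Fbar_u) ** (-xi) - 1)   # equation (15)
    es_alpha = var_alpha / (1 - xi) + (beta - xi * u) / (1 - xi)          # equation (16)
    return var_alpha, es_alpha
```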

2.4 GARCH-EVT model

As mentioned before, GARCH-type models perform well in modelling the dynamics of the time series. However, it is relatively difficult to model the tails of the return distribution accurately, and often we have to make assumptions about the underlying innovation distribution. By combining the GARCH model with EVT we obtain a two-step model that is able to model the time-varying volatility of the daily returns as well as the tails of the return distribution. For that reason this model is sometimes called the conditional EVT.

Once we have modelled the innovations {Z_t} of the GARCH model by EVT, it is straightforward to obtain the conditional risk measures. The conditional VaR of this approach can be computed by simply substituting the right-hand side expression in (15) into (6):

VaR^t_α = μ_{t+1} + σ_{t+1} ( u + (β/ξ) [((1 − α)/F̄(u))^{−ξ} − 1] ).   (17)

For n observations and N_u exceedances over u, we can estimate F̄(u) by N_u/n.

For the expression of the conditional ES we substitute (16) into (7):

ES^t_α = μ_{t+1} + σ_{t+1} ( z_α/(1 − ξ) + (β − ξu)/(1 − ξ) ),   (18)

where z_α is given by the right-hand side expression in (15). The estimation procedure of the risk measures using this two-step approach can be described by the following steps.

1. For some predetermined innovation distribution F_Z, fit a GARCH-type model to the losses time series by ML.

2. Using the estimates of the GARCH-type model, compute the innovations: Z_t = (L_t − μ_t)/σ_t.

3. Model the innovation distribution F_Z by EVT and use this distribution to estimate its 95%, 97.5%, 99% and 99.9% quantiles.

4. Calculate the conditional VaR and ES using (17) and (18), where μ_{t+1} and σ_{t+1} are estimated from the GARCH model (a sketch of this procedure is given below).
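Under the same simplifying assumptions as the earlier sketches (constant conditional mean, GARCH(1,1) with a normal quasi-likelihood), the two-step procedure could be wired together roughly as follows; fit_garch11, fit_gpd_excesses and gpd_var_es are the hypothetical helpers defined above, and the threshold is chosen by the k = N^{2/3}/log(log(N)) rule of thumb discussed in Section 4.

```python
# Sketch only: the two-step GARCH-EVT estimate of the conditional VaR and ES.
import numpy as np

def garch_evt_var_es(losses, alpha=0.99):
    # Step 1: fit the GARCH-type model by ML.
    mu, omega, a1, b1 = fit_garch11(losses)
    eps = losses - mu
    sigma2 = np.empty_like(losses)
    sigma2[0] = np.var(losses)
    for t in range(1, len(losses)):
        sigma2[t] = omega + a1 * eps[t - 1] ** 2 + b1 * sigma2[t - 1]
    # Step 2: standardized residuals play the role of the innovations Z_t.
    z = eps / np.sqrt(sigma2)
    # Step 3: fit the GPD to the excesses of z over the threshold u = z_(k+1).
    N = len(z)
    k = int(round(N ** (2 / 3) / np.log(np.log(N))))
    u = np.sort(z)[-(k + 1)]
    xi, beta, n_u = fit_gpd_excesses(z, u)
    # Step 4: one-step-ahead forecasts and the conditional risk measures (17)-(18).
    mu_next = mu
    sigma_next = np.sqrt(omega + a1 * eps[-1] ** 2 + b1 * sigma2[-1])
    z_alpha, es_z = gpd_var_es(u, xi, beta, n_u / N, alpha)
    return mu_next + sigma_next * z_alpha, mu_next + sigma_next * es_z
```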

2.5 Backtesting

Backtests are employed to examine the accuracy of the estimated risk measures of the models. Different backtests usually have different criteria, but what they have in common is that they are used for model validation.

When backtesting the VaR, we are interested in the number of losses that in a day would have exceeded the estimated 1-day VaR. The days for which the losses {l_t}_{t∈N} exceed the estimated VaR are referred to as exceptions. Observe that we use the losses of the out-of-sample data points for this purpose. For an estimated VaR with a confidence level of α, the number of exceptions I^n_α over the data points T + 1, …, T + n is defined as follows:

I^n_α = Σ_{t=T+1}^{T+n} 1{l_t > VaR^{t−1}_α},   (19)

where 1{l_t > VaR^{t−1}_α} = 1 if l_t > VaR^{t−1}_α and 0 otherwise.

Various backtest methods can be found in the literature. Common VaR backtests are (1) the unconditional coverage backtest, (2) the independence backtest and (3) the conditional coverage backtest, the latter being the most comprehensive. In fact, the conditional coverage test is a compound of the first two backtests. To backtest the ES we use a method similar to that of McNeil & Frey (2000), which considers the discrepancy between the excesses and the estimated ES.

As we will see in Sections 2.5.1−2.5.4, for every backtest we can extract a p-value. From the p-value we evaluate the significance of the test statistic of a backtest, which is used to determine whether the innovation distribution is correctly specified. Namely, if the p-value is larger than or equal to the pre-specified significance level, then the test statistic does not differ significantly from zero and consequently we do not reject the corresponding null hypothesis. In that case we say that the test statistic is non-significant at the pre-specified significance level. Otherwise we say that the test statistic is significant at the pre-specified significance level. We have evidence that the innovation distribution is correctly specified if all test statistics of a backtest are non-significant. Otherwise, we have evidence that the innovation distribution is misspecified.

2.5.1 Unconditional coverage backtest

The unconditional coverage (uc) backtest was proposed by Kupiec (1995). This likelihood ratio test is a two-tailed test and it is used to determine whether the probability of an exception is equal to the exception probability 1 − α implied by the confidence level of the estimated VaR. In other words, we test the accuracy of the number of observed exceptions. For a probability of exceptions 1 − α, if we observe m exceptions over n trading days, then

LR_uc = 2 ln[(1 − m/n)^{n−m} (m/n)^m] − 2 ln[α^{n−m} (1 − α)^m] ∼ χ²(1),

is the corresponding test statistic.

For this backtest we test the null hypothesis:

H_0 : E[I^n_α] = n(1 − α)   vs.   H_A : E[I^n_α] ≠ n(1 − α).
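A compact sketch of this test (illustrative, not the thesis code) is given below; m is the observed number of exceptions, n the number of backtest days, and alpha the VaR confidence level, so the exception probability under the null is 1 − α. The sketch assumes 0 < m < n.

```python
# Sketch only: Kupiec unconditional coverage test statistic and p-value.
import numpy as np
from scipy.stats import chi2

def lr_uc(m, n, alpha):
    p_hat, p0 = m / n, 1 - alpha
    loglik_hat = (n - m) * np.log(1 - p_hat) + m * np.log(p_hat)
    loglik_0 = (n - m) * np.log(1 - p0) + m * np.log(p0)
    stat = 2 * (loglik_hat - loglik_0)
    return stat, chi2.sf(stat, df=1)      # statistic and p-value from chi^2(1)
```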


2.5.2 Independence backtest

The independence (ind) backtest is proposed by Christoffersen (1998). As explained in Hull (2015), the occurrence of exceptions should be evenly spread throughout the backtesting period. If the exceptions happen to be clustered, then this would lead to a higher risk of default than if the exceptions happen to be independent. Clustering of exceptions could be a result of poorly measured heteroskedasticity of the time series.

The test statistic is

LR_ind = 2 ln[(1 − π_01)^{n_00} π_01^{n_01} (1 − π_11)^{n_10} π_11^{n_11}] − 2 ln[(1 − π)^{n_00+n_10} π^{n_01+n_11}] ∼ χ²(1),

where n_ij denotes the number of pairs (i, j), with i = 1{l_t > VaR^{t−1}_α} and j = 1{l_{t+1} > VaR^t_α}, t = T + 1, …, T + n, and

π = (n_01 + n_11)/(n_00 + n_01 + n_10 + n_11),   π_01 = n_01/(n_00 + n_01),   π_11 = n_11/(n_10 + n_11).

For this backtest we test the null hypothesis:

H_0 : π_01 = π_11   vs.   H_A : π_01 ≠ π_11.

If there are periods with sufficiently many exceptions clustered together, then the exceptions are unlikely to be independent and hence we would reject the null hypothesis. Likewise, the test statistic is considered non-significant at the X ·100% level if the p-value of the test is greater than or equal to X.
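The independence statistic can be computed from the binary exception series as sketched below (an illustration under the assumption that all transition counts n_ij are positive and that π_01 and π_11 lie strictly between 0 and 1).

```python
# Sketch only: Christoffersen independence test from the exception indicator series.
import numpy as np
from scipy.stats import chi2

def lr_ind(exceptions):
    ex = [int(x) for x in exceptions]               # 1 if day t is an exception, else 0
    pairs = list(zip(ex[:-1], ex[1:]))
    n00 = pairs.count((0, 0)); n01 = pairs.count((0, 1))
    n10 = pairs.count((1, 0)); n11 = pairs.count((1, 1))
    pi = (n01 + n11) / (n00 + n01 + n10 + n11)
    pi01 = n01 / (n00 + n01)
    pi11 = n11 / (n10 + n11)
    loglik_a = (n00 * np.log(1 - pi01) + n01 * np.log(pi01)
                + n10 * np.log(1 - pi11) + n11 * np.log(pi11))
    loglik_0 = (n00 + n10) * np.log(1 - pi) + (n01 + n11) * np.log(pi)
    stat = 2 * (loglik_a - loglik_0)
    return stat, chi2.sf(stat, df=1)
```

The conditional coverage statistic of Section 2.5.3 is then simply the sum LR_cc = LR_uc + LR_ind, compared against a χ²(2) distribution.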

2.5.3 Conditional coverage backtest

The conditional coverage (cc) backtest was proposed by Christoffersen (1998) and is a combination of the unconditional coverage backtest and the independence backtest. In other words, we test the null hypothesis of an accurate number of exceptions (see uc) and independent occurrence of these exceptions (see ind). The likelihood ratio test statistic is given by

LR_cc = LR_uc + LR_ind ∼ χ²(2).


2.5.4 Bootstrap backtest

For backtesting the ES we concentrate on the innovations at the out-of-sample data points where an exception occurs, i.e. on days when l_t > VaR^{t−1}_α, t = T + 1, …, T + n. We define the discrepancy

d_{t+1} = (l_{t+1} − ES^t_α)/σ_{t+1}.   (20)

The discrepancies should behave like an i.i.d. sample with expected value of zero (McNeil & Frey (2000)). Consequently, we test

H_0 : E[d_{t+1}] = 0   vs.   H_A : E[d_{t+1}] ≠ 0.   (21)

The p-value is obtained by a bootstrapping procedure, in which we do not make any assumptions about the underlying distribution of the discrepancies. In case we reject the null hypothesis at the X · 100% significance level, we infer that the conditional expected shortfall is under- or overestimated by the model and hence the estimated risk measure is respectively too optimistic or too pessimistic about the market position of the financial institution.

3 Data

We analyse a total of five returns time series, each as a single investment portfolio. For this we consider four stock indices, AEX, DAX, S&P 500 and SSE, and an equally weighted (E.W.) portfolio of the four stock indices. For the four stock indices, Yahoo! Finance has provided us with the adjusted closing price and the daily trading volume from January 1, 2000 until April 12, 2016. The descriptive statistics can easily be obtained for each of the five portfolios. To illustrate the ideas behind the tests that we will perform later on (see Section 4), we initially consider the portfolio that solely consists of the returns of the AEX. This initial analysis applies to the remaining four portfolios in a similar manner.

The daily returns are calculated as the percentage differenced logarithmic value series, based on the following formula:

r_t = ln(P_t/P_{t−1}) · 100,   (22)

where P_t is the adjusted closing price at time t. The time series of the AEX stock index’ closing price and the stock returns r_t are plotted in Figure 1.
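As a small illustration (assuming the adjusted closing prices are available as a pandas Series), the returns in (22) can be computed as follows.

```python
# Sketch only: percentage log returns r_t = 100 * ln(P_t / P_{t-1}), equation (22).
import numpy as np
import pandas as pd

def pct_log_returns(prices: pd.Series) -> pd.Series:
    return 100 * np.log(prices / prices.shift(1)).dropna()
```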

Figure 1: Daily closing price (left) and daily returns (right) of the AEX stock index from January 1, 2000 until April 12, 2016.

From Figure 1, we speculate that a GARCH model could properly fit the daily returns series due to the volatility clustering. More precisely, we observe relatively high volatility around the time periods of 2003 and 2009, whereas we observe relatively low volatility in the remaining time periods. Heteroskedasticity, and in particular volatility clustering, supports the use of a GARCH model for the daily return time series when measuring price volatility.

We graphically investigate the autocorrelation up to the 12th lag in the data with a correlogram. The correlograms for the returns and squared returns can be found in Figure 2, where the blue dashed lines indicate the 95% confidence band. From the figure we observe that more than 5% of the estimated correlations lie outside the confidence band, for both the raw and squared returns. Thus, from the correlograms of the raw and squared returns respectively, we speculate that ARMA modelling is appropriate and that volatility clustering is present.

Figure 2: Correlogram of the returns (left) and the squared returns (right) of the AEX stock index from January 1, 2000 until April 12, 2016.

Figure 3: Q-Q plot of the returns of the AEX stock index from January 1, 2000 until April 12, 2016. The red line refers to the normal distribution.

Next, we investigate whether the returns are normally distributed. We do this by comparing the quantiles of the normal reference distribution to the quantiles of the returns distribution using a Q-Q plot. The observations in the Q-Q plot should lie on a straight line if the empirical distribution is the same as the normal reference distribution. As we can see in Figure 3, the low and high quantiles of the returns clearly deviate from the normal reference distribution (the red line in the figure). The inverted S-shaped curve of the observations suggests that the returns distribution exhibits heavier tails than the normal distribution.

So far we have investigated the data graphically. In order to support the use of the GARCH-EVT model, we additionally perform three different tests. The Jarque-Bera (J-B) test will be employed to examine non-normality of the time series. In case of non-normality, the return distribution is heavy-tailed and/or leptokurtic, which motivates a long-tailed return distribution. Second, we test for serial correlation in the daily returns series using the Ljung-Box test up to a 12th lag length. The null hypothesis is that the data are independently distributed. This test is employed to examine to what extent the ARMA part is necessary. Lastly, we apply a unit root test, namely the Augmented Dickey-Fuller (ADF) test, to test for non-stationarity of the daily return time series. The stationarity condition is required for estimating the GARCH model. The descriptive statistics and the results of the tests for the daily returns can be found in Table 1.

Table 1: Descriptive and test statistics of the daily returns (from January 1, 2000 to April 12, 2016)

                 AEX           DAX           S&P 500        SSE           E.W.
Observations     4160          4145          4093           4108          3911
Mean             −0.0103       −0.0089       0.0085         0.0194        0.0051
Maximum          10.0283       10.7975       10.9572        9.4008        8.8417
Minimum          −9.5903       −7.4335       −9.4695        −9.2562       −6.5572
Variance         2.1843        2.3896        1.5987         2.6551        1.1541
Skewness         −0.0802       0.0025        −0.1850        −0.3237       −0.0985
Kurtosis         6.0037        3.9726        7.9260         4.6946        5.0090
J-B              6261.2265*    2730.2396*    10751.5240*    3849.0986*    4101.5821*
Q(12)            68.8942*      28.6070*      63.6502*       34.5605*      55.1557*
ADF              −15.0929*     −15.2945*     −15.9684*      −13.9306*     −14.7253*

The descriptive statistics and three test statistics are given for the daily returns r_t of the four stock indices and the equally weighted portfolio. *Significant at the 1% significance level.

For the four stock indices and the E.W. portfolio, we reject the corresponding null hypotheses of all three tests at the 1% significance level. That is, from the J-B test we reject the null hypothesis of normality, which implies that the returns distribution is non-normal. In combination with the excess kurtosis we assert that the returns distribution has heavy tails. The result of the Ljung-Box Q(12) test shows that the daily returns exhibit serial correlation and thus confirms the use of the ARMA part. Lastly, the result of the ADF test implies stationarity of the returns, which allows us to use GARCH models. These results encourage the application of ARMA-GARCH models.

4 Empirical results

In this section we will show and discuss the estimation performance of the time series models applied to the percentage differenced logarithmic value series of the losses:

l_t = −ln(P_t/P_{t−1}) · 100.

We will analyse four different models: two GARCH(1,1) models, one with normally distributed innovations (GARCH-N) and one with t distributed innovations (GARCH-t), and similarly two GARCH-EVT models with normally and t distributed innovations (GARCH-EVT-N and GARCH-EVT-t, respectively). The GARCH models are defined as ARMA-GARCH(1,1) with innovations Z_t ~ i.i.d. N(0, 1) and Z_t ~ i.i.d. t_ν(0, √((ν − 2)/ν)), respectively.

The realizations of the innovations can be ordered as z_(1) ≥ … ≥ z_(k) ≥ z_(k+1) ≥ … ≥ z_(N), where N is the number of innovations. Denote the value of the threshold by u = z_(k+1), so that the GPD is fitted to the k excesses z_(1) − z_(k+1), …, z_(k) − z_(k+1). Common choices for k are k_1 = √N and k_2 = N^{2/3}/log(log(N)), rounded off to the nearest integer (Scarrott & MacDonald (2012)). As we will see in Section 4.1, we have N = 1,000 and so the choices would be k_1 = 32 and k_2 = 52. This implies that we model the upper 3.2% and 5.2% of the innovation distribution, respectively.

4.1 Backtesting the VaR

The backtest is based on one-step ahead forecasts from a series of rolling windows, where each window consists of 1,000 observations. Based on these windows we backtest for each model the estimated 1-day 95%, 97.5%, 99% and 99.9% conditional VaR and ES. For α ∈ {0.95, 0.975, 0.99, 0.999}, this procedure is described by the following steps.

1. Let N = T = 1,000 and let T + n be the total number of observations.

2. Fit an ARMA(p1, q1)-GARCH(1,1) model to the oldest 1,000 losses available and estimate VaR^T_α using (6), where the optimal values for p1, q1 ≤ 3 are chosen based on AIC. If the GARCH model cannot be fitted due to stationarity problems of the series of losses, then skip this step.

3. Update T = T + 1 and remove the oldest loss.

4. Repeat steps 2−3 until we have obtained the estimates VaR^N_α, …, VaR^{N+n−1}_α.

Observe that the number of windows (i.e. the length of this procedure) is n − 1 instead of n. Also observe that at every next window we fit a new ARMA-GARCH(1,1) model. The advantage of refitting is that we always use the optimal ARMA-GARCH(1,1) model, up to ARMA orders of 3.

We should caution that the results of Table 1 apply to the whole sample of losses and so they do not necessarily apply to the series of losses in every window. For this reason we only consider windows for which the losses time series is stationary, i.e. for which we are able to estimate the GARCH coefficients. At least from the results of Table 1 we suspect that it is proper to use ARMA-GARCH models for most of these series of losses.
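The rolling-window loop in steps 1−4 above could be sketched as follows; one_step_var is a hypothetical helper that fits the chosen model (GARCH-N, GARCH-t or a GARCH-EVT variant) to a window of 1,000 losses and returns the one-day-ahead VaR forecast, and windows where the fit fails are skipped as in step 2.

```python
# Sketch only: rolling one-step-ahead VaR forecasts and the exception indicator series.
import numpy as np

def rolling_var_forecasts(losses, alpha=0.99, window=1000):
    forecasts, realized = [], []
    for start in range(len(losses) - window):
        train = losses[start:start + window]
        try:
            forecasts.append(one_step_var(train, alpha))   # fit the model, forecast VaR
            realized.append(losses[start + window])        # the next day's loss
        except Exception:
            continue                                        # skip problematic windows
    forecasts, realized = np.asarray(forecasts), np.asarray(realized)
    exceptions = realized > forecasts                       # inputs to (19) and the backtests
    return forecasts, exceptions
```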

After the rolling forecasting procedure we perform the backtest by first determining the number of exceptions by (19) and then extracting the p-value of each of the three VaR backtests. Since 1 − α is the probability of an exception and n − 1 is the number of windows, the expected number of exceptions is calculated by (1 − α)(n − 1), rounded to the nearest integer. The backtest results of the GARCH-N and GARCH-t model can be found in Tables 2 and 3. Table 2 shows the number of exceptions (No. exc.) and the test statistic of the conditional coverage backtest, while Table 3 shows the test statistics of the unconditional coverage and independence backtests. The p-value of each VaR backtest is given in parentheses, where a bold value indicates that the test statistic is significant at the 5% level. If the p-value is greater than or equal to 0.05, then we say that the test statistic is non-significant. Otherwise the test statistic is said to be significant.


On most occasions the two GARCH models have a larger number of exceptions than expected and so their VaR estimates are relatively low. In other words, these models tend to be optimistic. Recall from Section 2.5 that only the result of the conditional coverage backtest is regarded to specify the VaR estimation performance of a model. With regard to this backtest, the GARCH-N model has seven out of twenty test statistics that are non-significant, so we have evidence that the normal distribution of the innovations is misspecified. This is in line with McNeil & Frey (2000). Therefore we will perform the two-step approach. That is, we assume that the excess distribution of the innovations is an EVT distribution and hence we will fit the GPD to the excess distribution.

Interestingly, the GARCH-t model has only non-significant test statistics with regard to the conditional coverage backtest, so we have evidence that the t distribution of the innovations is correctly specified. Thus we consider the GARCH-t model a pseudo-true model. There is no need to perform the two-step approach, as this approach will not yield more non-significant test statistics than its predecessor GARCH model. However, we are also curious about the result of the two-step approach with these innovations. For this reason we will fit the GPD to the excess distribution of these innovations as well. Step 2 in the rolling procedure above can be adjusted for the two-step approach as follows.

2*. Apply the GARCH-EVT approach to the oldest 1,000 losses available and estimate VaR^T_α using (17). If the GARCH model cannot be fitted due to stationarity problems of the series of losses, then skip this step.

Observe that the threshold for the EVT-based models and the estimated degrees of freedom of the t distribution for the innovations are generally different in every window, because a different set of losses is modelled in every window. The backtest results of the GARCH-EVT-N and GARCH-EVT-t can also be found in Tables 2 and 3.


To motivate that the innovation distribution exhibits heavy tails, we plot the estimated degrees of freedom of the t distributed innovations from every window. Such a plot is depicted in Figure 4 for the AEX. Starting around 2007 and continuing until the end of the whole sample, the degrees of freedom are indeed relatively low.

Figure 4: Plot of the estimated degrees of freedom, ˆν, of the innovation distribution from the GARCH-t model, where the whole sample of the AEX is used.

Considering the unconditional coverage backtest, the GARCH-EVT-N, GARCH-t, GARCH-EVT-t and GARCH-N models respectively yield one, two, six, and fourteen significant test statistics. So the GARCH-EVT-N model has the largest number of accurate exception counts, followed by the GARCH-t model, and so on.

Considering the independence backtest, all test statistics of the four GARCH-type models are non-significant. Therefore we have evidence that the GARCH-type models respond quickly to changes in the volatility of the time series. This is in line with the results of Soltane et al. (2012) using the Tunisian Stock Market, Singh et al. (2011) using the ASX-All Ordinaries index and S&P 500, Bhattacharyya & Ritolia (2008) using the NSE Nifty index, and many more.


Table 2: The number of exceptions and test statistic of the conditional coverage backtest for every model are shown. The p-value of the backtest is given in parenthesis, where a bold value indicates that the test statistic is significant at the 5% level.

Dataset                   AEX        DAX        S&P 500    SSE        E.W.
Length of test (n − 1)    3,017      3,049      2,867      3,007      2,818
α = 0.95                  No. exc.  LR_cc   No. exc.  LR_cc   No. exc.  LR_cc   No. exc.  LR_cc   No. exc.  LR_cc


Table 3: The test statistic of the unconditional coverage and independence backtest for every model are shown. The p-value of the backtests is given in parenthesis, where a bold value indicates that the test statistic is significant at the 5% level.

Dataset                   AEX        DAX        S&P 500    SSE        E.W.
Length of test (n − 1)    3,017      3,049      2,867      3,007      2,818
α = 0.95                  LR_uc  LR_ind   LR_uc  LR_ind   LR_uc  LR_ind   LR_uc  LR_ind   LR_uc  LR_ind


Figure 5: The AEX daily portfolio losses and the 95% VaR estimates of the GARCH-EVT-N and GARCH-N for the time period January 1, 2006 until January 1, 2010.

Figure 6: The AEX daily portfolio losses and the 95% VaR estimates of the GARCH-EVT-t and GARCH-t for the time period January 1, 2006 until January 1, 2010.

Concluding, based on the VaR backtests we have evidence that the innovation distribution of the GARCH-N model is misspecified, but correctly specified for the GARCH-t model. Generally speaking, the two-step approach improves the VaR estimation of the corresponding GARCH model if the innovations are misspecified. For a correctly specified GARCH model it turns out that the corresponding two-step approach even performs worse on estimating the VaR.

4.2 Backtesting the ES

The ES estimates are obtained from the same rolling procedure as in Section 4.1, to which we add the following step.

5. Repeat steps 2−3 until we have obtained the estimates ES^{1000}_α, …, ES^{1000+n−1}_α, where we use (18) to estimate the ES.

The performance of the estimation of the ES is determined by the p-value of the test of the null hypothesis formulated in (21). This hypothesis is tested by a non-parametric bootstrapping procedure with replacement. The bootstrapping procedure is described by the following steps (Efron & Tibshirani (1993)).

1. Obtain the discrepancies d_{t+1} = (l_{t+1} − ES^t_α)/σ̂_{t+1}, using the estimated ES^t_α, for the excesses l_{t+1} > VaR^t_α only. Write these discrepancies as d_(i), i = 1, …, m, where m ≤ n − 1 is the number of excesses.

2. Compute the t statistic θ̂ = d̂/(σ̂/√m), where

   d̂ = (1/m) Σ_{i=1}^{m} d_(i),   (23)
   σ̂² = (1/(m − 1)) Σ_{i=1}^{m} (d_(i) − d̂)².   (24)

3. Obtain a new sample d̃_(i) = d_(i) − E[d_(i)], i = 1, …, m, and bootstrap {d̃_(i)} with replacement, B = 100,000 times.

4. Compute for each bootstrap sample the t statistic θ̂*_b, b ∈ {1, …, B}, defined as in step 2 but with the bootstrap sample d̃_(i) in place of d_(i) in (23) and (24).

5. Obtain the two-sided p-value by

   p = 2 min(p_l, p_u),   where   p_l = (1 + Σ_{b=1}^{B} 1{θ̂*_b < θ̂})/(1 + B),   p_u = (1 + Σ_{b=1}^{B} 1{θ̂*_b > θ̂})/(1 + B).

We make the following observations about this procedure.

• If there are fewer than two discrepancies, we cannot use the t statistic. In the case of one discrepancy, it is possible to consider the mean statistic θ̂ = d̂ = d_(1). Still, this will always yield a p-value of almost zero. For this reason we do not perform the backtest in this case.

• The new sample d̃_(i) has the same distribution as the original sample d_(i), i = 1, …, m, except for the expected value. Furthermore, the distribution of the new sample obeys the null hypothesis in (21).

• We add “1” to the numerator and denominator in p_u and p_l. Otherwise the p-value could be exactly equal to zero, which is not possible.
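A compact sketch of this bootstrap (illustrative; the function name and defaults are assumptions, though the thesis also uses B = 100,000) is given below; d is the array of discrepancies d_(1), …, d_(m) with m ≥ 2.

```python
# Sketch only: two-sided bootstrap p-value for H0: E[d] = 0, following steps 1-5 above.
import numpy as np

def bootstrap_es_pvalue(d, B=100_000, seed=0):
    rng = np.random.default_rng(seed)
    d = np.asarray(d, dtype=float)
    m = len(d)
    t_obs = d.mean() / (d.std(ddof=1) / np.sqrt(m))     # observed t statistic, step 2
    d_centered = d - d.mean()                            # impose the null E[d] = 0, step 3
    t_boot = np.empty(B)
    for b in range(B):
        s = rng.choice(d_centered, size=m, replace=True)
        t_boot[b] = s.mean() / (s.std(ddof=1) / np.sqrt(m))
    p_low = (1 + np.sum(t_boot < t_obs)) / (1 + B)       # step 5
    p_high = (1 + np.sum(t_boot > t_obs)) / (1 + B)
    return 2 * min(p_low, p_high)
```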

The test statistics θ̂ for the GARCH-N and GARCH-t model can be found in Table 4. The p-values of the ES backtest are given in parentheses, where a bold value indicates that the test statistic is significant at the 5% level.

From Table 4 we see that the GARCH-N model has only four out of twenty test statistics that are non-significant. Again we have evidence that the innovation distribution is misspecified. So we will consider the corresponding two-step approach, see Table 4 for the results. From the table we see that for the GARCH-N and GARCH-EVT-N model the estimated 95% ES of the AEX has a p-value of 0.000 and 0.313, respectively. A plot of the corresponding discrepancies for the two models can be found in Figure 7. From the figure it seems that the discrepancies of the GARCH-EVT-N model behave like an i.i.d. sample with an expected value of zero, whereas for the GARCH-N model the expected value seems to be larger than zero. This result is in accordance with the p-values.

For the GARCH-t model all test statistics are non-significant. Again we have evidence that the innovation distribution is correctly specified. Therefore it is unnecessary to perform the two-step approach. But, for the same reason as in the case of the VaR, we will still consider the GARCH-EVT-t model.

Interestingly, all test statistics of the two GARCH-EVT models are non-significant. In other words, the GARCH-EVT-N model improves the ES estimation of its predecessor GARCH model. The GARCH-EVT-t model does not perform the ES estimation worse than its predecessor GARCH model, as was the case for the VaR estimation.

Concluding, based on the ES backtest we have evidence that the innovation distribution of the GARCH-N model is misspecified, but correctly specified for the GARCH-t model. Generally speaking, the two-step approach improves the ES estimation of the corresponding GARCH model if the innovations are misspecified. For a correctly specified model it turns out that the corresponding two-step approach performs equally well on estimating the ES.


Table 4: The test statistics for every model are shown. The p-values of the bootstrap test are given in parentheses, where a bold value indicates that the test statistic is significant at the 5% level.

AEX DAX S&P 500 SSE E.W.


Figure 7: Discrepancies plot of the GARCH-N (left) and GARCH-EVT-N model (right). The discrepancies are from the estimated 95% ES of the AEX.

5 Conclusion

The ARMA-GARCH(1,1) model is misspecified with respect to the normal distribution of the innovations with regard to the backtest criteria, but it is correctly specified with respect to the functional relationships between the variables for t distributed innovations. For the misspecified model, i.e. the ARMA-GARCH(1,1) model with normally distributed innovations, the usage of EVT in the two-step approach substantially improves the estimation of both risk measures. For the correctly specified model, i.e. the ARMA-GARCH(1,1) model with t distributed innovations, the two-step approach performs the estimation of the conditional ES equally well as its predecessor GARCH model, but it worsens the estimation of the conditional VaR.

6 Discussion


The results of the backtests may be different for other choices of the threshold in the EVT modelling or a different number of losses for each window. We have to keep in mind that the ES backtest considered in this thesis has the drawback that it cannot be performed in case of fewer than two discrepancies. To the best of our knowledge, there is no ES backtesting procedure that can deal with this issue, so we suggest using a larger number of observations.

One of the interesting conclusions is that the two-step approach does not improve the VaR estimation of a correctly specified predecessor GARCH model. Therefore we highly encourage future studies to consider different initial distributions for the innovations, such as the skewed normal, skewed t and Generalized Hyperbolic distribution. In addition, it may be interesting to investigate GARCH models that account for leverage effect, i.e. where negative shocks are weighted differently than positive shocks. From a practical point of view it would be even more interesting to perform a similar study as ours, but with multivariate models instead.

Acknowledgements

Foremost I would like to express my gratitude to my supervisor Dr. Praagman for the wise comments, remarks and suggestions. He gave me the opportunity to write a thesis about this very interesting topic. Furthermore, I am very grateful to my co-assessor Dr. Ronchetti for introducing me to time series models and EVT and also for his valuable comments on my thesis. His courses motivated me to further study these topics. Last but not least, I would like to extend my sincere thanks to my family for their support. This thesis would not have been possible without the contribution of the people mentioned above.

References

Balkema, A. A. & de Haan, L. (1974), ‘Residual Life Time At Great Age’, The Annals of Probability 3, 792–804.

Bhattacharyya, M. & Ritolia, G. (2008), ‘Conditional VaR using EVT - Towards a planned margin scheme’, International Review of Financial Analysis 17, 382– 395.

Bollerslev, T. (1986), ‘Generalized Autoregressive Conditional Heteroskedasticity’, Journal of Econometrics 31, 307–327.

Brodin, E. & Klüppelberg, C. (2008), Extreme Value Theory in Finance, Encyclopedia of Quantitative Risk Analysis and Assessment, John Wiley & Sons, Ltd, West Sussex.


Efron, B. & Tibshirani, R. (1993), An Introduction to the Bootstrap, Springer Science+Business Media Dordrecht, New York: Chapman & Hall, Inc.

Embrechts, P., Resnick, S. I. & Samorodnitsky, G. (1999), ‘Extreme Value Theory as a Risk Management Tool’, North American Actuarial Journal 3, 30–41.

Engle, R. (2001), ‘GARCH 101: The Use of ARCH/GARCH Models in Applied Econometrics’, Journal of Economic Perspectives 15(4), 157–168.

Füss, R., Kaiser, D. G. & Adams, Z. (2007), ‘Value at Risk, GARCH modelling and the Forecasting of Hedge Fund Return Volatility’, Journal of Derivatives & Hedge Funds 13(1), 2–25.

Gilli, M. & Këllezi, E. (2006), ‘An Application of Extreme Value Theory for Measuring Financial Risk’, Computational Economics 27, 207–228.

Hull, J. (2015), Risk Management and Financial Institutions, 4th edn, John Wiley & Sons, Inc., Hoboken, New Jersey.

Kupiec, P. (1995), ‘Techniques for Verifying the Accuracy of Risk Management Models’, Journal of Derivatives 3, 73–84.

McNeil, A. J. (1999), ‘Extreme Value Theory for Risk Managers’, In Internal Modelling and CAD II 7, 93–113.

McNeil, A. J. & Frey, R. (2000), ‘Estimation of Tail-Related Risk Measures for Heteroscedastic Financial Time Series: An Extreme Value Approach’, Journal of Empirical Finance 7, 271–300.

McNeil, A. J., Frey, R. & Embrechts, P. (2015), Quantitative Risk Management: concepts, techniques and tools, Princeton Series in Finance, Princeton University Press, Princeton (N.J.).

Pickands, J. (1975), ‘Statistical Inference Using Extreme Order Statistics’, The Annals of Statistics 3, 119–131.

Predescu, O. M. & Stancu, S. (2011), ‘Portfolio Risk Analysis using ARCH and GARCH Models in the Context of the Global Financial Crisis’, Theoretical and Applied Economics 18(2), 75–88.

Scarrott, C. & MacDonald, A. (2012), ‘A Review Of Extreme Value Threshold Estimation And Uncertainty Quantification’, Statistical Journal 10(1), 33–66.

Singh, A. K., Allen, D. E. & Powell, R. J. (2011), ‘Value at Risk Estimation Using Extreme Value Theory’, International Congress on Modelling and Simulation 19, 41–50.

Soltane, H. B., Karaa, A. & Bellalah, M. (2012), ‘Conditional VaR using GARCH-EVT approach: Forecasting Volatility in Tunisian Financial Market’, Journal of Computations & Modelling 2(2), 95–115.

Teräsvirta, T. (2009), An Introduction to Univariate GARCH Models, Handbook of Financial Time Series, Springer-Verlag Berlin Heidelberg.
