
Estimating Value-at-Risk through Extreme Value Theory Combined with Time Series Analysis

Jeroen Ruissen

Master's Thesis to obtain the degree in Actuarial Science and Mathematical Finance
University of Amsterdam
Faculty of Economics and Business
Amsterdam School of Economics

Author: Jeroen Ruissen
Student nr: 11424346
Email: jeroen.ruissen@student.uva.nl
Date: July 7, 2017


Abstract

Although much criticized, Value-at-Risk is still one of the main tools used in financial risk management. While more comprehensive risk measures exist, its simplicity and the fact that it always exists explain why it remains so popular.

Besides the choice of a specific risk measure, another important factor is the underlying distribution that is used to estimate the chosen risk measure. This thesis evaluates the use of the EVT and of a GARCH extension of the EVT for estimating Value-at-Risk, in comparison to the two more classical methods of normal approximation and quantile estimation. These four models are evaluated on two financial time series spanning the period from 1986 up to 2016. After fitting the model parameters on the years 1986-2006, estimates of the daily Value-at-Risk are calculated for various probability levels. These estimates are then backtested against the remaining years, 2007-2016.

The conclusion is that the GARCH extension of the EVT outperforms the other three models and could contribute to risk management practice, although future research, especially regarding its behaviour within multivariate models, is needed.

Keywords: Extreme Value Theory, Generalized Autoregressive Conditional Heteroscedasticity, Risk Management, Risk Measures, Back-testing.


Contents

Preface
1 Introduction
2 The Extreme Value Theory
  2.1 The Generalised Extreme Value distribution
  2.2 Calibrating data to the GEV distribution
  2.3 Generalised Pareto Distribution
  2.4 Estimating VaR with the Generalised Pareto Distribution
  2.5 Pitfalls of the assumptions of the EVT
3 EVT under heteroskedasticity
  3.1 Stationarity
  3.2 The Ljung-Box test
  3.3 GARCH extension of the EVT
4 Practical Study
  4.1 Methodology
  4.2 Fitting GPD distributions and estimating Value-at-Risk
  4.3 Analysis of exceedances
  4.4 Discussion of results
5 Conclusion and future research
Appendix: R Code
Bibliography


Preface

After five intensive months, writing a preface is the final stage of my Master's thesis. Over the last five months I have learned an incredible amount - not just about Extreme Value Theory, but also about doing research in general and writing a thesis at Master's level.

I would like to use this page to thank Umut Can for his role as supervisor. Not only did he provide me with very precise feedback, I was also lucky enough to be granted a lot of freedom in doing my own research and writing my own thesis. In particular, I have been very lucky to be allowed to write the first part of my thesis from Halifax, Canada.

Secondly, I would like to thank my family for their support during the last four years, specifically my parents, who probably played the biggest role in me becoming the person that I am now. Not only have my mom and dad taught me many basics in life, they also enabled me to follow my dreams and commit to a subsidiary Bachelor's and Master's in Law and an exchange semester in Australia. Life lessons and memories that I will carry with me for the rest of my life!

Last, but definitely not least, there is my girlfriend, Stephanie. During the last five months she supported me unconditionally. She saw the struggles of me writing up the first pages, and therefore I would like to devote the last words of this thesis to her. Your support means a lot to me and definitely helps me make the most of myself and my academic career. I am excited for all the academic challenges we will face together, wherever those might be.


Chapter 1

Introduction

The extreme value theory (hereafter: EVT) is slowly but steadily gaining the interest of those who are concerned with the likelihood and severity of extreme events. Extreme events can be thought of in many different ways, varying from a dike breaching because of a spring tide to the Dutch financial system breaking down because of the collapse of a big German bank. Whatever field extreme events are studied in, they potentially have great influence on society and people's day-to-day lives, making it of great importance to have a statistical theory that can correctly predict the occurrence and severity of extreme events. Within financial institutions, modelling extreme events falls under the broader concept of risk management.

While more sophisticated measures have been introduced, financial risk management still revolves around Value-at-Risk (VaR). A one-day VaR is interpreted as the value that a financial variable will not exceed within one day, with a pre-specified certainty. Mathematically, it is the quantile of a random variable X at some level p close to 1:

$$\mathrm{VaR}_p(X) = \inf\{x \in \mathbb{R} \mid F(x) \geq p\}, \qquad (1.1)$$

where F denotes the cumulative distribution function of X.

One of the reasons VaR is still used on a large scale is that financial supervisors, like De Nederlandsche Bank, base their capital requirements on it. For example, under the Basel II Accord [2], a bank's market-risk capital requirement is based on the larger of its most recent 10-day 99% VaR and a multiplier k times the average of the 10-day 99% VaR over the last 60 business days. Furthermore, the Solvency Capital Requirement (SCR) under Solvency II is given by the one-year 99.5% VaR [11].
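As a rough illustration (this is the commonly cited form of the Basel II market-risk charge, stated here as an assumption rather than as a quotation from the Accord), the requirement at day t can be written as

$$\mathrm{MRC}_t = \max\!\left( \mathrm{VaR}_{t-1},\; k \cdot \frac{1}{60}\sum_{i=1}^{60} \mathrm{VaR}_{t-i} \right), \qquad k \ge 3,$$

where each VaR term is a 10-day 99% Value-at-Risk.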


VaR has been criticised for many reasons over the last decades, mainly because it lacks subadditivity (the property that the VaR of X + Y should be smaller than or equal to the VaR of X plus the VaR of Y) and because it does not consider the entire tail of the distribution of X. While both of these points of criticism have valid grounds, and alternatives like Tail-Value-at-Risk could indeed lead to a better understanding of risk, just switching to a different risk measure will not do the trick.

The main problem that this thesis addresses is the underlying distributional assumptions that are made before calculating the value of any risk measure. The previously mentioned Basel II Accord names variance-covariance methods, historical simulation and Monte Carlo simulation as potential frameworks for arriving at a value of VaR. As shown by Homolka [7], all of these classical methods lead to relatively low estimates and are outperformed by the EVT.

This thesis discusses the framework of the EVT before explaining how it can be used to determine values of VaR. However, while it will prove to be a refinement of the classical methods, over time the EVT has been criticised for the assumptions it makes on the underlying distribution.

To evaluate these points of criticism, the EVT is backtested to see whether it would have correctly predicted the financial crisis of 2008. By fitting models to data on the Dow Jones and S&P 500 indices from 1986 until 2006, the Value-at-Risk estimates of the EVT and of two classical methods are evaluated in a backtest against the period 2007 until 2016, which includes the financial crisis starting in 2008. Furthermore, the possible contribution of a heteroskedastic extension, in this case the GARCH extension of the EVT, is evaluated over the same time periods.

Chapter 2 focuses on the EVT, its estimate of VaR_p and the pitfalls of the underlying assumptions that it makes. Chapter 3 introduces the concept of stationarity and discusses the use of the EVT within a GARCH time series. Chapter 4 puts both previous chapters into practice and evaluates the outcomes, after which the final chapter concludes the thesis and gives suggestions for possible future research.


Chapter 2

The Extreme Value Theory

2.1 The Generalised Extreme Value distribution

Consider a sequence of random variables X_i, i ∈ ℕ, that are independent and identically distributed (iid) with mean µ and variance σ², and have distribution function F(x). A well-known asymptotic result is given by the Central Limit Theorem, which states that the distribution of S_n, the sum of the first n variables X_i, converges to a normal distribution when n goes to infinity. More precisely, for each n there exist a_n and b_n such that

$$\lim_{n\to\infty} P\!\left( \frac{S_n - a_n}{b_n} \le x \right) = \Phi(x),$$

for each x ∈ ℝ, where Φ(x) denotes the standard normal distribution function. In this case, a_n and b_n equal µn and σ√n respectively.

Like the Central Limit Theorem, extreme value theory considers the asymptotic behaviour of a random sample. However, instead of the sum of the X_i, it considers the extreme values of the sequence. Define the n-block maximum M_n as the maximum of the first n variables,

$$M_n = \max(X_1, \ldots, X_n),$$

and note that one can express the minimum of a sample as a maximum by considering

$$\min(X_1, \ldots, X_n) = -\max(-X_1, \ldots, -X_n),$$

which justifies that considering the theory for maxima only is sufficient.

A subtle difference between the two theories is that the Central Limit Theorem describes a property that holds for the sums of all iid sequences X_1, ..., X_n with finite variance, while there is no general formula that holds for the distribution of the maxima of all sequences. If M_n, possibly after translation and rescaling, converges to some function H(x) as n goes to infinity, then the underlying distribution function F is said to be in the maximum domain of attraction of H(x). More formally:

Definition 1. If there exist sequences c_n and d_n such that

$$\lim_{n\to\infty} P\!\left( \frac{M_n - d_n}{c_n} \le x \right) = \lim_{n\to\infty} F^n(c_n x + d_n) = H(x), \qquad (2.1)$$

for some H(x), then F is said to be in the maximum domain of attraction of H(x) (F ∈ MDA(H)).

In 1928, Fisher and Tippett [5] concluded that there were only three possible distributions for H(x), which had separately been presented by Fréchet, Gumbel and Weibull. These three distributions are combined in the generalised extreme value distribution.

Definition 2. The generalised extreme value (GEV) distribution function is given by

$$H_\xi(x) = \begin{cases} \exp\!\left(-(1+\xi x)^{-1/\xi}\right), & \text{if } \xi \neq 0, \\ \exp\!\left(-e^{-x}\right), & \text{if } \xi = 0, \end{cases} \qquad (2.2)$$

for 1 + ξx > 0.
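As a small illustration (not part of the original thesis code), the GEV distribution function of Definition 2, with the location and scale parameters µ and σ introduced in section 2.2, can be evaluated in R as follows; the function name pgev_manual and its arguments are my own illustrative choices.

# Minimal sketch of the GEV cdf H_xi((x - mu) / sigma) from Definition 2.
pgev_manual <- function(x, xi = 0, mu = 0, sigma = 1) {
  z <- (x - mu) / sigma
  if (abs(xi) < 1e-12) {
    exp(-exp(-z))                # Gumbel case (xi = 0)
  } else {
    t <- pmax(1 + xi * z, 0)     # support requires 1 + xi * z > 0
    exp(-t^(-1 / xi))            # evaluates to 0 below the lower endpoint (xi > 0)
                                 # and to 1 above the upper endpoint (xi < 0)
  }
}

# Example: probability that a standardised block maximum stays below 3
pgev_manual(3, xi = 0.2)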


The parameter ξ in H_ξ(x) is determined by the tail behaviour of the distribution function of the random variables X_i. One usually distinguishes the three cases ξ > 0, ξ < 0 and ξ = 0, for which the GEV corresponds to the Fréchet, Weibull and Gumbel distributions respectively. As is graphically illustrated by three examples in figure 2.1, a value of ξ larger than 0 means the distribution of X_i has relatively heavy tails, while it has a finite right endpoint when ξ is smaller than 0. In fact, when ξ is larger than 0, the moments E[X^k] are infinite for k larger than 1/ξ. When ξ equals zero, the distribution of X_i has finite moments of all orders k ∈ ℕ.

2.2 Calibrating data to the GEV distribution

When fitting model (2.2) to a data set, one wants to find parameters c_n, d_n and ξ such that F^n(c_n x + d_n) approximately follows H_ξ(x). As n will typically be fixed, denote c_n and d_n by σ and µ. In other words, we are looking for the best estimate of H_ξ((x − µ)/σ), which will from now on be denoted as H_{ξ,µ,σ}(x).

Typically this best estimate is found using the method of maximum likelihood. Dividing our data into sets of size n and taking the maximum value of every set, denoted by M_{n,1}, ..., M_{n,j}, ..., M_{n,m}, gives us m realizations of the n-block maximum M_n, with which the maximum-likelihood estimate can be found.

Trade-off between n and m:

Typically a data set is not large enough for both n and m to be chosen big enough. When n is chosen relatively big, leading to a relatively small m, the maximum values M_{n,j} for 1 ≤ j ≤ m will be well approximated by the GEV distribution and the resulting parameter estimates will have low bias. However, as there will be relatively few values of M_n, the variance of our estimates will be high. Alternatively, when m is relatively high and n is relatively low, our estimates will have higher bias, but a lower variance.
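To make the block-maxima procedure concrete, a minimal R sketch is given below. It is not taken from the thesis appendix; the block size, the helper names (gev_nll, fit_gev_ml) and the use of optim are my own illustrative choices, assuming a numeric vector x of daily log-losses is available.

# Negative log-likelihood of the GEV with parameters (xi, mu, sigma),
# evaluated on a vector of block maxima. Illustrative helper, not from the thesis.
gev_nll <- function(par, maxima) {
  xi <- par[1]; mu <- par[2]; sigma <- par[3]
  if (sigma <= 0) return(Inf)
  z <- (maxima - mu) / sigma
  if (abs(xi) < 1e-8) {
    ll <- -log(sigma) - z - exp(-z)                  # Gumbel limit
  } else {
    t <- 1 + xi * z
    if (any(t <= 0)) return(Inf)                     # outside the GEV support
    ll <- -log(sigma) - (1 + 1/xi) * log(t) - t^(-1/xi)
  }
  -sum(ll)
}

fit_gev_ml <- function(x, block_size = 250) {
  # Split the sample into blocks of roughly one trading year and take block maxima
  m <- floor(length(x) / block_size)
  maxima <- sapply(seq_len(m), function(j) max(x[((j - 1) * block_size + 1):(j * block_size)]))
  start <- c(xi = 0.1, mu = mean(maxima), sigma = sd(maxima))
  optim(start, gev_nll, maxima = maxima)$par
}

# Usage (assuming x holds daily log-losses in percent):
# fit_gev_ml(x, block_size = 250)

A larger block_size trades a smaller number of maxima (higher variance) for a better GEV approximation (lower bias), exactly the trade-off described above.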

2.3 Generalised Pareto Distribution

Rather than the distribution of n-block maxima, to evaluate VaR it might be more useful to look at the distribution of exceedances over some threshold u. Where the distribution of n-block maxima revolves around the GEV, the excess distribution F_u(x) = P(X − u ≤ x | X > u), for some threshold u, has the Generalised Pareto Distribution (hereafter: GPD) at its centre.


Definition 3. The Generalised Pareto Distribution function is given by

$$G_{\xi,\beta}(x) = \begin{cases} 1 - (1 + \xi x/\beta)^{-1/\xi}, & \text{if } \xi \neq 0, \\ 1 - \exp(-x/\beta), & \text{if } \xi = 0, \end{cases} \qquad (2.3)$$

for x ≥ 0 if ξ ≥ 0, and 0 ≤ x ≤ −β/ξ when ξ < 0.

A theorem by Pickands [10] states a direct dependence between F_u and the GEV distribution. It states that F ∈ MDA(H_ξ) if, and only if,

$$\lim_{u \to x_F} \; \sup_{0 \le x < x_F - u} \left| F_u(x) - G_{\xi,\beta(u)}(x) \right| = 0,$$

for some function β(u). In other words, if the distribution function F lies in the maximum domain of attraction, as in (2.1), then a β(u) can be found for every threshold u, so that asymptotically the excess distribution F_u(x) converges to the GPD as in (2.3).

Generally, a GPD will be fitted using maximum likelihood estimation for some fixed threshold u. This threshold has to be small enough to leave enough observations exceeding it (leading to lower variance in the estimates ξ̂ and β̂) and big enough to ensure that the estimates are not too biased. This is similar to the trade-off between n and m within the GEV framework. Observing that the mean-excess function, e(u) = E[X − u | X > u], is a linear function of u under (2.3), u is typically chosen as the smallest value above which the sample mean-excess plot does indeed look approximately linear.
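This threshold diagnostic can be reproduced with a few lines of R. The thesis appendix uses MEplot from the QRM package; the manual version below is an illustrative alternative, with the helper name mean_excess and the threshold grid being my own choices.

# Sample mean-excess function e_n(u) = mean(x - u | x > u), evaluated on a grid
# of candidate thresholds. Illustrative helper; the QRM function MEplot does the same job.
mean_excess <- function(x, thresholds) {
  sapply(thresholds, function(u) mean(x[x > u] - u))
}

# Usage sketch, assuming x holds daily log-losses in percent:
# u_grid <- seq(quantile(x, 0.80), quantile(x, 0.99), length.out = 50)
# plot(u_grid, mean_excess(x, u_grid), type = "b",
#      xlab = "threshold u", ylab = "sample mean excess")
# Choose u as the smallest threshold beyond which the plot looks roughly linear.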

2.4 Estimating VaR with the Generalised Pareto Distribution

For practical reasons, in estimating a Value-at-Risk at level p, VaR_p, the GPD is preferred over the GEV distribution. In this thesis we also limit the evaluation of the risk measure to the one-day VaR_p. As explained in the introduction by equation (1.1), VaR_p denotes a quantile of the underlying distribution.

Given a distribution function F, if the excess distribution F_u above a threshold u is exactly equal to the GPD G_{ξ,β}, then the VaR at level p can be computed as

$$\mathrm{VaR}_p = u + \frac{\beta}{\xi}\left( \left( \frac{1-p}{\bar{F}(u)} \right)^{-\xi} - 1 \right), \qquad (2.4)$$


where F̄(u) equals 1 − F(u). Now, consider a sample of size n with N(u) exceedances of threshold u, and estimates ξ̂ and β̂. Approximating F̄(u) by N(u)/n, we then approximate (2.4) by

$$\widehat{\mathrm{VaR}}_p = u + \frac{\hat\beta}{\hat\xi}\left( \left( \frac{n}{N(u)}\,(1-p) \right)^{-\hat\xi} - 1 \right). \qquad (2.5)$$
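A compact R version of estimator (2.5) is sketched below. It mirrors the computation done in the thesis appendix with the QRM function fit.GPD, but the wrapper name var_gpd and its argument names are my own.

# Estimator (2.5): VaR_p from a GPD fitted above threshold u.
# n        : sample size
# n_exceed : number of exceedances N(u)
# xi, beta : fitted GPD parameters
var_gpd <- function(p, u, xi, beta, n, n_exceed) {
  u + (beta / xi) * (((n / n_exceed) * (1 - p))^(-xi) - 1)
}

# Usage sketch with the QRM package (as in the appendix; par.ests[1] is xi, par.ests[2] is beta):
# library(QRM)
# fit <- fit.GPD(losses, threshold = u)
# var_gpd(0.99, u, fit$par.ests[1], fit$par.ests[2], fit$n, fit$n.exceed)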

2.5 Pitfalls of the assumptions of the EVT

While the EVT is often considered a subtle refinement of traditional estimation methods and has proven to lead to more realistic estimates of Value-at-Risk, its contribution to risk management has been controversial over the years. Diebold, Schuermann and Stroughair [4] believe that the enthusiasm for the EVT is "partly appropriate but also partly exaggerated", where they are more concerned with the use of the EVT in practice than with its theoretical framework. While they find the probability theory elegant, rigorous and voluminous, they observe that its statistical application often lacks insight and resorts to primitive solutions. One of their main concerns is that the trade-off between m and n is often misunderstood, which is why they plead for a standardised minimum for m. If this value of m is not attainable due to a lack of data, they argue one should resort to Monte Carlo simulation and apply the EVT to that rather than to the data set.

McNeil and Frey [8] also see opportunities as well as deficiencies in the EVT. They praise the statistical theory and are in favour of parameterising extreme events, but find that the EVT fails to capture the volatility background of most financial data, and therefore fails to capture the distribution of the X_i. Most financial data tend to have periods of relatively high volatility, in which big shocks are followed by more big shocks, as well as relatively stable times. This phenomenon is known as "clustered volatility", and can by definition not be captured by a model with constant volatility.

Clustered volatility is one example of serial dependence within financial data. As the general EVT model is based upon the assumption that X_i is a time series with constant volatility σ (homoskedasticity) and no interdependence, it gives no opportunity to model a more dependent volatility structure.

Within risk management, the failure to capture an underlying distribution of financial losses can have enormous effects on capital requirements and therefore on the financial stability within a system or organisation. It is therefore worthwhile considering an extension of the EVT that does allow for a more comprehensive volatility structure.


The next chapter discusses such an extension, after which chapter 4 evaluates several criteria of its estimates of VaR.


Chapter 3

EVT under heteroskedasticity

This chapter considers a suggested extension of the EVT: the GARCH (Generalised Autoregressive Conditional Heteroskedasticity) extension of the EVT, as introduced by McNeil and Frey in 2000 [8]. This model relaxes the assumptions of the general EVT on the underlying volatility structure. While the GARCH model is well known for capturing clustered volatility, it still makes the assumption of stationarity on the distribution of X_i.

This chapter explains the concept of stationarity, after which a numerical argument for extending the EVT is given. It concludes by discussing the extension and how to calculate Value-at-Risk with the extended model.

3.1 Stationarity

An important concept in time series analysis is stationarity, which can be seen as a generalisation of independence. Independence requires any X_i, X_j, for i ≠ j, to be independent realisations of a random variable; in particular, the joint distribution function F_{X_i,X_j} equals the product of the two marginal distribution functions. While stationarity allows for interdependence of realisations and thus for a less trivial joint distribution function, it does require the joint distribution to be time-invariant.

Definition 4. A process (X_i)_{i∈ℤ} is strictly stationary if

$$F_{X_{t_1},\ldots,X_{t_n}} = F_{X_{t_1+k},\ldots,X_{t_n+k}},$$

for all t_1, ..., t_n, k ∈ ℤ and for all n ∈ ℕ.


Another form of stationarity is covariance-stationarity, also known as weak stationarity. Let the autocovariance function γ(t, s) be defined as

$$\gamma(t, s) = E[(X_t - \mu_t)(X_s - \mu_s)], \qquad \text{for } t, s \in \mathbb{Z}. \qquad (3.1)$$

Covariance-stationarity then requires X_i to have a constant mean and a time-invariant autocovariance function.

Definition 5. A process X_i is said to be covariance-stationary if the first two moments exist and satisfy

$$E(X_t) = \mu, \qquad \gamma(t, s) = \gamma(t + k, s + k), \qquad \text{for } t, s, k \in \mathbb{Z} \text{ and some } \mu \in \mathbb{R}.$$

It is important to realise that although stationarity is a more lenient assumption than independence, it still makes specific assumptions about the distribution of the random variables. It assumes, for example, that there is no trend in the data and that the covariance structure does not change over time. The main reason to move to a model that assumes stationarity rather than independence is that it does allow for a dependence structure, and thus gives opportunities to model clustered volatility.

3.2 The Ljung-Box test

The Ljung-Box test is a numerical argument that can be used alongside a plot of a data set to argue whether a more advanced dependence structure is desirable or not. It tests the null hypothesis that the time series X_i forms an iid sequence, formally known as a white noise process.

Definition 6. Under the condition of covariance-stationarity, the autocorrelation function of X_i is given by

$$\rho(h) = \rho(X_h, X_0) = \gamma(h)/\gamma(0), \qquad \forall h \in \mathbb{Z}.$$


Under the null hypothesis of an iid sequence, the sample autocorrelations ρ̂(h) will typically be relatively small. Then the test statistic Q_LB, given by

$$Q_{LB} = n(n+2) \sum_{j=1}^{h} \frac{\hat\rho(j)^2}{n - j},$$

asymptotically follows a chi-squared distribution with h degrees of freedom under the assumption of a white noise process. For a given significance level α, this assumption is rejected if the value of Q_LB is higher than the (1 − α)-quantile of χ²_h. If it is indeed rejected, the Ljung-Box test suggests that the data call for a more comprehensive modelling of the dependence structure than assuming an iid structure.
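In R the Ljung-Box test is available as Box.test, which is also what the thesis appendix uses. A short usage sketch follows; the lag choice is the one made in the appendix, and the simulated series merely stands in for actual log-returns.

# Ljung-Box test on a return series; lag = 5 follows the choice made in the appendix.
set.seed(1)
returns <- rnorm(2500)                      # placeholder for actual log-returns
Box.test(returns, lag = 5, type = "Ljung-Box")
# A small p-value (< 0.05) rejects the white-noise hypothesis and suggests
# that a dependence structure such as GARCH is worth modelling.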

3.3 GARCH extension of the EVT

A way to model heteroskedasticity is with a GARCH model, which defines X_i as the product of a volatility σ_i and a random component Z_i. Typically, Z_i follows a white noise process and, by definition, σ_{i+1} depends on previous realisations of X_i and σ_i.

Definition 7. With (Z_i)_{i∈ℤ} following a white noise process, X_i is defined as a GARCH process if it is strictly stationary and satisfies, for a strictly positive-valued process σ_i,

$$X_i = \sigma_i Z_i, \qquad (3.2)$$
$$\sigma_i^2 = \alpha_0 + \sum_{j=1}^{p} \alpha_j X_{i-j}^2 + \sum_{k=1}^{q} \gamma_k \sigma_{i-k}^2,$$

with α_0 > 0, α_j ≥ 0 and γ_k ≥ 0.

Note that although this process is stationary, it does include heteroskedasticity, as there is no fixed value for σ_i. It also tends to have periods of relatively high or low volatility. The volatility at time i will be high whenever one of the (up to q) previous volatilities or one of the (up to p) previous realisations of X_i was high. Similarly, if all q previous values of the volatility as well as the previous p values of X_i were low, the volatility at time i will be low too. We conclude that the GARCH model gives opportunities to capture clustered volatility.
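To illustrate the clustering behaviour described above, the short R sketch below simulates a GARCH(1,1) path as in (3.2). The parameter values are arbitrary illustrative choices, not estimates from the thesis data.

# Simulate a GARCH(1,1) process X_i = sigma_i * Z_i with Gaussian innovations.
# Parameter values are illustrative only.
simulate_garch11 <- function(n, alpha0 = 0.05, alpha1 = 0.10, gamma1 = 0.85) {
  z <- rnorm(n)
  x <- numeric(n)
  sigma2 <- numeric(n)
  sigma2[1] <- alpha0 / (1 - alpha1 - gamma1)   # start at the unconditional variance
  x[1] <- sqrt(sigma2[1]) * z[1]
  for (i in 2:n) {
    sigma2[i] <- alpha0 + alpha1 * x[i - 1]^2 + gamma1 * sigma2[i - 1]
    x[i] <- sqrt(sigma2[i]) * z[i]
  }
  list(x = x, sigma = sqrt(sigma2))
}

set.seed(42)
sim <- simulate_garch11(2500)
plot(sim$x, type = "l", xlab = "time", ylab = "simulated return")
# Calm and turbulent periods alternate: clustered volatility.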

When incorporating a GARCH model into the EVT, the main difference between the EVT for a GARCH model and the classical EVT is the random component that is considered. Once a GARCH model is fitted to the data, typically by maximum-likelihood estimation, the realisations of the random component Z_i are assumed to be in the maximum domain of attraction of the generalised extreme value distribution.

Then, a one-day VaR_p at time t can be estimated by calculating σ_{t+1} through (3.2) and plugging in the VaR_p of Z_i, given by (2.5), such that

$$\mathrm{VaR}_p = \sigma_{t+1} \cdot F_Z^{-1}(p). \qquad (3.3)$$
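A condensed R sketch of this two-step procedure is given below. It follows the same route as the appendix code (a GARCH(1,1) fit with garchFit from the fGarch package and a GPD fit with fit.GPD from QRM), but the threshold, the object names and the wrapping into a single function are my own illustrative choices.

# Two-step GARCH-EVT estimate of the one-day VaR_p, as in equation (3.3).
# Assumes 'losses' is a numeric vector of daily log-losses in percent.
library(fGarch)   # garchFit
library(QRM)      # fit.GPD

garch_evt_var <- function(losses, p = 0.99, u_z = 2.0) {
  # Step 1: fit GARCH(1,1) and extract standardised residuals Z_i = X_i / sigma_i
  fit <- garchFit(~ garch(1, 1), data = losses, include.mean = FALSE, trace = FALSE)
  z <- residuals(fit, standardize = TRUE)

  # Step 2: fit a GPD to the residuals above an (illustrative) threshold u_z,
  # then apply estimator (2.5) to obtain the VaR of Z
  gpd <- fit.GPD(z, threshold = u_z)
  var_z <- u_z + (gpd$par.ests[2] / gpd$par.ests[1]) *
    (((gpd$n / gpd$n.exceed) * (1 - p))^(-gpd$par.ests[1]) - 1)

  # One-step-ahead volatility forecast times the residual quantile, equation (3.3)
  sigma_next <- predict(fit, n.ahead = 1)$standardDeviation
  sigma_next * var_z
}

# garch_evt_var(losses, p = 0.99)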


Chapter 4

Practical Study

This chapter discusses the practical study that examines the benefit of extending the traditional methods of quantile and normal estimation to the more advanced methods, the EVT and its GARCH extension. After discussing the approach and the test criteria, the results and the conclusions that can be drawn from them are discussed.

In order to evaluate the EVT and its GARCH extension and their contributions to risk management, we use data on the log-returns of the S&P 500 and Dow Jones indices from 1986 up until 2016. Both methods are compared to one another and to two more classical methods, the normal approximation and the quantile estimate. We expect the EVT estimates to outperform those of the normal approximation, as its distributions (with positive ξ) have heavier tails than the normal distribution. Furthermore, we expect the GARCH extension to capture the volatility structure better and therefore to perform (even) better than the general EVT.

4.1 Methodology

The methodology consists of the following steps:

1. Fitting distributions and estimating Value-at-Risk

Based on the data from 1986 until 2006, values of the one-day VaR_p are estimated by the four different methods. The approaches of the classical methods are fairly straightforward: for the quantile estimate, the empirical p-quantile of the sample is taken, while the normal approximation takes this quantile from the normal distribution whose mean and standard deviation equal those of the sample (a short code sketch follows below).

Arriving at the (extended) EVT values of VaR_p is less trivial. The general EVT has a VaR_p given by equation (2.5). For the GARCH extension, there is no such thing as "one" VaR_p: as we can see in (3.3), it also depends on the volatility at time t + 1. Therefore, we determine the VaR_p of the white noise sequence Z_i and then determine the volatility σ_{t+1} for time t + 1. Multiplying the two gives the value of VaR_p at time t + 1.
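For the two classical benchmarks this amounts to two one-liners in R, as in the appendix; here 'losses' is an assumed vector of in-sample daily log-losses and the level 0.99 is only an example.

p <- 0.99
var_quantile <- quantile(losses, p)                 # empirical p-quantile of the sample
var_normal   <- qnorm(p, mean(losses), sd(losses))  # normal approximation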

2. Back-testing and the test function

With given values of VaR_p, we proceed to back-testing them against the data from the remaining ten years, 2007 until 2016. Specifically, we are interested in the exceedance identifier, which records at which points in time the VaR_p is exceeded. Formally, it is given by the following equation:

$$\mathbf{1}_{\mathrm{VaR}_p}(t) = \begin{cases} 1 & \text{if } X_t > \mathrm{VaR}_p, \\ 0 & \text{if } X_t \le \mathrm{VaR}_p. \end{cases} \qquad (4.1)$$

3. Test criteria

In particular, we are interested in three test criteria to evaluate the goodness of fit. The first is the number of exceedances, given by the sum of (4.1) over all t. As the backtest time frame consists of 2518 working days, we expect the number of exceedances to be binomially distributed with size 2518 and probability (1 − p). If this is not the case, the binomial test suggests that the value of VaR_p is wrong (and typically too low).

Secondly, the average size of the exceedances is of interest. While a small exceedance of the minimum capital requirement may not have a detrimental effect, a risk measure that underestimates not only the probability but also the magnitude of extreme events can have devastating effects for the financial system.

A final criterion is the clustering of the exceedances. If the underlying distribution fits the data well, we expect the exceedances to be serially independent and therefore the test function to follow a white noise process. The serial independence of the exceedances will be evaluated by a Ljung-Box test. A short R sketch of all three criteria is given below.
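The following R sketch computes the three criteria for one model. It mirrors the structure of the appendix code; the object names (realised, var_hat) and the packaging into a single function are my own, and var_hat may be either a single number (constant VaR) or a vector of daily GARCH-EVT estimates.

# Three backtest criteria for a VaR_p estimate over 2518 out-of-sample days.
# realised : vector of realised daily log-losses (2007-2016)
# var_hat  : VaR_p estimate(s); a scalar or a vector of the same length as realised
backtest_var <- function(realised, var_hat, p, lag = 5) {
  hits <- as.numeric(realised > var_hat)                    # identifier function (4.1)
  n_exceed <- sum(hits)
  avg_exceed <- mean((realised - var_hat)[hits == 1])       # average size of exceedances
  binom_p <- 1 - pbinom(n_exceed, length(realised), 1 - p)  # one-sided binomial test
  lb <- Box.test(hits, lag = lag, type = "Ljung-Box")       # clustering of exceedances
  list(exceedances = n_exceed, expected = length(realised) * (1 - p),
       average_exceedance = avg_exceed,
       binomial_p_value = binom_p, ljung_box_p_value = lb$p.value)
}

# backtest_var(realised, var_hat, p = 0.99)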


4.2 Fitting GPD distributions and estimating Value-at-Risk

The data used to fit a GPD, for both the general GPD model and the GPD for GARCH model, are shown in the graphs below. For the general GPD, the parameters β and ξ are estimated from the data directly. To fit a GPD for GARCH model, the parameters α_j and γ_k and the corresponding values of σ_i are estimated first; afterwards, Z_i (= X_i/σ_i) is fitted to a GPD.

Figure 4.1: S&P 500 and Dow Jones log-returns from 1986 until 2006.

Before estimating β and ξ, thresholds u have to be determined in such a way that the sample mean excess function, e(u), approximately follows a linear trend. In figure 4.2 one can find those mean excess plots as well as the corresponding values of u.

Once the thresholds u are determined, one can fit a GPD to the excess distribution. With the fitted parameters β̂, ξ̂ and a value of p, equation (2.5) gives the value of VaR_p for the general GPD model. For the GPD for GARCH model, the same equation gives the VaR_p of Z_i; multiplying by σ_i gives the VaR_p of X_i. The VaR_p estimates of the general GPD and the two classical methods are given in table 4.1.


Figure 4.2: S&P 500 and Dow Jones sample Mean-Excess plots for a general GPD model and a GPD for GARCH model and the deduced threshold values.

            p       EVT-estimate   Quantile-estimate   Normal estimate
S&P 500     0.950   1.95           1.61                1.72
            0.975   2.20           2.10                2.06
            0.990   2.72           2.73                2.45
            0.995   3.32           3.23                2.72
            0.999   5.98           6.22                3.27
Dow Jones   0.950   1.89           1.57                1.74
            0.975   2.19           2.06                2.08
            0.990   2.78           2.71                2.48
            0.995   3.46           3.42                2.76
            0.999   6.33           6.94                3.32

Table 4.1: VaR_p estimates of the general EVT, the quantile estimate and the normal approximation.


While the above estimates are constant for a given value of p, the estimates of VaR_p for the EVT for GARCH are not. As these estimates are a linear function of the volatility σ_i, they will be higher in times of high volatility and lower in calmer times. This is illustrated by the next graphs for p = 0.99.

Figure 4.3: The estimates of Value-at-Risk at 99% level for S&P 500 and Dow Jones of the GPD for GARCH versus the general GPD and the classical methods.


4.3 Analysis of exceedances

To test the goodness of fit of the four models, we focus on the exceedances of their estimates of VaR_p. This is done by back-testing the estimates against the actual log-returns between 2007 and 2016. Summing the identifier function of equation (4.1), we found the following numbers of exceedances for the models.

            p       Expected   GARCH   GPD   Quantile   Normal
S&P 500     0.950   125.9      106     82    166        146
            0.975   62.95      86      70    103        107
            0.990   25.18      25      46    49         70
            0.995   12.59      9       34    37         50
            0.999   2.52       1       4     4          34
Dow Jones   0.950   125.9      123     129   174        146
            0.975   62.95      83      88    107        103
            0.990   25.18      39      46    50         67
            0.995   12.59      13      30    31         47
            0.999   2.52       1       4     4          34

Table 4.2: Number of exceedances of VaR_p and expected number of exceedances.

Number of exceedances

Now, a higher estimate of VaR is not necessarily a better estimate. It is assumed that X_i, or Z_i in the case of the GARCH extension, is independently and identically distributed. By definition of VaR_p, an exceedance of VaR_p at time i should then follow a Bernoulli experiment with probability (1 − p). Then, as the entire time frame is made up of 2518 working days, we expect the number of exceedances to be binomially distributed with size 2518 and probability (1 − p).

Next, a one-sided binomial test can be used to evaluate the goodness of fit of the four models. Assuming that the underlying distribution fits the distribution of X_i well, the number of exceedances is binomially distributed with n equal to the number of working days (2518) and probability (1 − p). As we are only concerned with models underestimating the likelihood of an exceedance, we use a one-sided test. Then, for a significance level α and a number of exceedances k, one could reject the model if P(Y > k) < α, where Y denotes a binomially distributed random variable with n = 2518 and success probability (1 − p).

For the numbers of exceedances in table 4.2 we find that, for a common level of α equal to 0.05, the normal estimate of VaR_p would be rejected at all probability levels on both indices. The quantile estimate would be rejected for all levels of p except p = 0.999.

Furthermore, comparing the general GPD estimate and the GPD estimate of the GARCH extension, we find that the GARCH extension outperforms the general model for most levels of p, and that the difference becomes bigger as p goes up.

            p       GARCH-estimate   GPD-estimate   Quantile-estimate   Normal estimate
S&P 500     0.950   0.96             0.99           0.0001              0.03
            0.975   0.23             0.17           9.5e-07             9.5e-07
            0.990   0.46             5.9e-05        7.9e-06             4.1e-14
            0.995   0.8              1.4e-07        5.3e-09             2.22e-16
            0.999   0.72             0.11           0.11                0
Dow Jones   0.950   0.582            0.37           1.2e-05             0.03
            0.975   0.005            9.8e-04        1.03e-07            9.5e-07
            0.990   0.003            5.9e-05        3.5e-06             9.5e-13
            0.995   0.38             7.9e-06        3.1e-06             1.8e-14
            0.999   0.72             0.11           0.11                0

Table 4.3: Binomial probabilities of the numbers of exceedances in table 4.2.

The average exceedance

The second criterion to be evaluated is the average size of the exceedances. As capital requirements are based on risk measures, small exceedances of an estimated VaR_p will have less of an impact on the financial health of an institution than bigger exceedances.

On this criterion a big difference between the EVT for GARCH and the other three models is observed. As the EVT for GARCH estimate is a multiple of a fluctuating volatility, we expect the average exceedance of the EVT for GARCH estimate to be lower than those of the other three. Table 4.4 gives the average exceedances for the two stock indices.


            p       GARCH-estimate   GPD-estimate   Quantile-estimate   Normal estimate
S&P 500     0.950   0.68             1.09           1.07                1.09
            0.975   0.57             1.22           1.12                1.14
            0.990   0.51             1.48           1.44                1.26
            0.995   0.64             1.4            1.39                1.48
            0.999   0.51             1.44           0.82                1.37
Dow Jones   0.950   0.58             1.25           1.09                1.11
            0.975   0.54             1.28           1.13                1.12
            0.990   0.61             1.46           1.45                1.24
            0.995   0.67             1.37           1.34                1.43
            0.999   0.66             1.85           1.55                1.41

Table 4.4: Average exceedances of VaR_p, for different levels of p.

The correlation of exceedances

The serial correlation of the exceedances should be zero; in other words, the exceedances should follow a white noise process. If exceedances do not take place independently of each other and have positive correlation, then a capital requirement based on a one-day VaR_p might not be sufficient.

We test the correlation by applying a Ljung-Box test, for various lag lengths h, to the identifier functions of equation (4.1). If the exceedances indeed take place independently of each other, we expect to find p-values higher than 0.05.

The exceedances were tested for lag lengths 5 and 10. The results show a marked difference between the GARCH extension on the one hand and the other three methods on the other. While the latter three tests produce p-values of magnitude 10^-16, the GARCH extension is relatively consistent with a magnitude around 10^-2.

4.4 Discussion of results

This section discusses the results of the three test criteria, their implications and the conclusions that can and cannot be drawn from them. To start off, it is worthwhile spending a few words on the two time frames that have been considered. Both the time frame used to calibrate the models and the one used to back-test their estimates included highly volatile times. While the first period included a volatile spell initiated by Black Monday on the 19th of October 1987, a major part of the latter time frame consists of the credit crisis of 2008. These times are a good representation of a world in which financial markets can take big hits. We therefore expected the EVT and its GARCH extension to face an ultimate test, as this way we could evaluate whether "extreme events" could indeed be captured by the mathematical models.

Firstly, looking at the binomial probabilities of table 4.3, we found that the quantile and normal estimates both produced estimates that were exceeded significantly too often and are therefore to be rejected on the basis of this data. Note that the quantile estimate did have a binomial probability of 0.11 for p equal to 0.999, but our data set was probably too small to make a fair judgement of exceedances at such a high p value.

We find that the GARCH extension would not be rejected at any level of p for the S&P 500, and only for p values of 0.975 and 0.99 for the Dow Jones. On the other hand, the general EVT method would be rejected for both the S&P 500 and the Dow Jones at p levels of 0.99 and 0.995. As for the quantile estimate, it is likely that the VaR_0.999 estimate has a relatively high bias. Taking that into account, we can state that the GARCH extension produces VaR_p estimates that fit the empirical results better than those of the other three methods.

The second criterion that was evaluated was the average exceedance. In table 4.4 we see that the average underestimation of VaR_p by the GARCH extension is lower than one percentage point for all estimates, while the other three methods have an average exceedance between one and two percentage points for all levels of p, except for the 99.9% quantile estimate of the S&P 500. In other words, the GARCH estimates come closer to the actual losses in case of an exceedance of the VaR_p.

Taking the first two test criteria together, we find that the GARCH model produced better estimates of the losses on the S&P 500 and Dow Jones indices. Not only did it estimate values of VaR_p that were exceeded significantly less often than those of the other three methods, it also had a lower average exceedance. In a simplified world where capital requirements would be based on a one-day VaR_p, GARCH estimates would lead to capital requirements that offer better protection against future losses.

After studying the number of exceedances and the average exceedance, the correlation of the exceedances was evaluated. Testing for independence, we found that a Ljung-Box test on the hit series, as in (4.1), would reject independence of the exceedances for all four methods. This gives reason to believe that none of the four methods has been able to completely capture the dependence structure of the underlying data.


However, the magnitudes of the p-values of the Ljung-Box statistics show one big difference. With the p-value of the GARCH estimates being roughly 10^14 times those of the other three methods, we find that independence of the hit series is far more plausible under the GARCH model than under the other three.

The practical importance of this independence lies once again in its implications for risk measures. If exceedances of a risk measure are assumed to be independent, then a capital requirement based on such a measure will be lower than when dependence exists.

Now, if this dependence does exist but is not accounted for, this will lead to an underestimation of the risk of a series of financial losses (bigger than VaR_p). This underestimation could have detrimental effects for financial institutions, especially if the losses are of relatively high magnitude and occur more often than implied by the definition of VaR_p.

Looking at the three test criteria as a whole, we conclude that the two classical methods of normal approximation and sample quantile estimation produce the poorest results. While their average exceedance and correlation of exceedances are relatively close to those of the EVT estimate, they have a far higher number of exceedances and therefore more significant evidence to be rejected. Based on the binomial probabilities, the EVT produces more plausible estimates, especially for p values of 0.95 and 0.975.

The main concern with the EVT, however, is its failure to capture the dependence structure of the underlying distribution. With that being the initial motive to consider a GARCH extension of the general EVT, the results were as expected: the GARCH extension captures the dependence structure better than the other three methods, resulting in lower numbers of exceedances, lower averages of exceedances and less dependence between exceedances.

Focusing on the three test results alone, one would conclude that the three other methods should be discouraged as of tomorrow and that some version of the GARCH extension of the EVT should be incorporated into risk measures and regulations. However, a disadvantage of the GARCH extension is that there is no such thing as "one" VaR_p. As illustrated by figure 4.3, the GARCH estimate fluctuates over time, while the other three methods produce a constant estimate. As the estimate depends on the (estimated) volatility of tomorrow, a capital requirement based on it would have to include a significant buffer to mitigate the risk of a sudden increase in volatility (leading to a higher VaR_p).


Furthermore, one of the advantages of a constant VaR_p is that it is an easy and well-known number that can be used to argue a financial case to someone who is less quantitatively oriented. If this constant VaR_p starts fluctuating over time, it will be harder to explain to someone who lacks the statistical background to understand more complicated time series and simulations.

We therefore conclude that there are statistical arguments to extend a risk measure like the EVT estimate of VaR_p to an estimate that considers the dependence structure, such as a GARCH model of volatility. These arguments will have to be weighed against the practical advantages of a constant risk measure.


Chapter 5

Conclusion and future research

This thesis examined VaR estimates under several underlying models: the two more classical methods of normal approximation and sample quantile estimation, the classical EVT, and the GARCH extension of the EVT. Using data on the S&P 500 and Dow Jones indices from 1986 until 2006, the four different models were calibrated to the log-returns. These four models were then used to estimate the value of a well-known risk measure, the one-day Value-at-Risk. While capital requirements are often based on n-day risk measures for some integer n, evaluating the one-day Value-at-Risk gives a general indication of the benefits of the two EVT models versus the two classical ones.

With the three test criteria being the number of exceedances (and the associated binomial probabilities), the average exceedance and the correlation of exceedances, we found that the general EVT outperformed the two classical methods on the first criterion. However, the general EVT and the two classical methods showed similar results for the other two criteria: the average exceedance was relatively high, about half of the value of the Value-at-Risk, and the exceedances were highly correlated.

Based on the three test criteria, the EVT for GARCH model performed best among the four models that were considered. In contrast to the results of the other models, the binomial probabilities did not show significant evidence to reject the model, and the average exceedance was roughly 50% lower than for the other three. In addition, there was weaker evidence of serial correlation among the exceedances. Based on these results, one would conclude that the EVT for GARCH model leads to the best fit out of the four models that were considered and that practical issues should be put aside to incorporate this model in risk management.

Before coming to such a conclusion, however, future research will have to be done. In this thesis the implications are based on the one-day Value-at-Risk, while capital requirements are often based on risk measures over multi-day periods. As the general EVT and the two classical methods assume independence of the financial time series, an n-day Value-at-Risk is a trivial multiple of the one-day Value-at-Risk. For the EVT for GARCH, however, the n-day Value-at-Risk also depends on the dependence structure of the data.

It would therefore be worthwhile doing the same practical study for n-day Values-at-Risk, and incorporating the EVT for a multivariate GARCH model. This future study can be based on the work of Hakim et al. [6].

Once a multivariate volatility model and its effects on the Value-at-Risk are studied, the contribution of GARCH volatility models to risk management can be evaluated. If the multivariate volatility models outperform the classical methods in a way similar to the one-day evaluation, the academic world carries the responsibility to encourage incorporating them into the internal models of financial institutions and into regulation. It is time to learn from our mistakes.


Appendix: R Code

In this appendix one can find the code that was used for the practical study on the Dow Jones index. The S&P 500 study has been conducted analogously.

Dow Jones evaluation

# Data transformations
DJ <- read.csv(file.choose(), header=TRUE)
DJ$Date <- as.Date(DJ$Date)

# Daily log-losses (negative log-returns), in percent, for the calibration period 1986-2006
DJLogRet <- NA
for(i in 2:5298){
  DJLogRet[i] <- -(log(DJ$Close[i]) - log(DJ$Close[i-1])) * 100
}
DJLogRet[1] <- 0
DJDate <- DJ$Date[1:5298]

# Log-losses for the back-test period 2007-2016
DJLogRet.Fut <- NA
for(i in 1:2518){
  DJLogRet.Fut[i] <- -(log(DJ$Close[i+5298]) - log(DJ$Close[i+5298-1])) * 100
}
DJDate.Fut <- DJ$Date[5299:7816]

# Analysis (SP5LogRet and SP5Date are created analogously in the S&P 500 script)
plot(SP5Date, SP5LogRet, "l", xlab="Date", ylab="S&P500 log-return",
     ylim=c(-10,10), col="red")
plot(DJDate, DJLogRet, "l", xlab="Date", ylab="Dow Jones log-return",
     ylim=c(-10,10), col="blue")

Box.test(SP5LogRet, lag=5, type="Ljung-Box")
# Very unlikely that the log-returns follow a white-noise distribution
Box.test(DJLogRet, lag=5, type="Ljung-Box")
# Very unlikely that the log-returns follow a white-noise distribution

# GPD fit
library(QRM)
MEplot(DJLogRet[DJLogRet>0], main="Mean-Excess Dow Jones", xlab="GPD Threshold")
legend("topleft", expression("u=2.8"), lty=1, lwd=2.5, col="red")
abline(v=2.8, col="red")
u <- 2.8
fit.GPD(DJLogRet, threshold = u)

## Garch
install.packages("fGarch", type = "source")
library(fGarch)
Garch <- garchFit(formula = ~ garch(1, 1), data=DJLogRet, include.mean=FALSE,
                  cond.dist="norm", trace=FALSE)
alpha0.Gar <- coef(Garch)[1]; alpha1.Gar <- coef(Garch)[2]; beta1.Gar <- coef(Garch)[3]

### Transforming Garch to just "Z": reconstruct the volatility and standardise the returns
SIGMA.GAR <- NA
SIGMA.GAR[1] <- 1
for(i in 2:5298){
  SIGMA.GAR[i] <- sqrt(alpha0.Gar + alpha1.Gar * (DJLogRet[i-1])^2 +
                         beta1.Gar * (SIGMA.GAR[i-1])^2)
}
plot(DJDate, SIGMA.GAR, "l")
Z.GAR <- DJLogRet / SIGMA.GAR
Box.test(Z.GAR, lag=5, type="Ljung-Box")
plot(DJDate, Z.GAR, "l")

# Determine threshold for the standardised residuals
library(QRM)
MEplot(Z.GAR[Z.GAR>0], main="Mean-Excess Dow Jones", xlab="Garch Threshold")
legend("topleft", expression("u=2.45"), lty=1, lwd=2.5, col="red")
abline(v=2.45, col="red")
u.GAR <- 2.45
xi.Gar.GPD <- fit.GPD(Z.GAR, threshold = u.GAR)$par.ests[1]
beta.Gar.GPD <- fit.GPD(Z.GAR, threshold = u.GAR)$par.ests[2]

## Value-at-Risks
p <- 0.95  # probability level for the Value-at-Risk estimates

# GPD (equation 2.5)
xi.GPD <- fit.GPD(DJLogRet, threshold = u)$par.ests[1]
beta.GPD <- fit.GPD(DJLogRet, threshold = u)$par.ests[2]
VaR.GPD <- u + beta.GPD / xi.GPD *
  ((fit.GPD(DJLogRet, threshold=u)$n /
      fit.GPD(DJLogRet, threshold=u)$n.exceed * (1-p))^-xi.GPD - 1)

# Quantile estimate
VaR.quan <- quantile(DJLogRet, p)

# Normal approximation
mu.norm <- mean(DJLogRet); sigma.norm <- sd(DJLogRet)
VaR.norm <- qnorm(p, mu.norm, sigma.norm)

# Garch for EVT (equation 3.3): VaR of Z times the forecasted volatility
VaR.GAR.Z <- u.GAR + beta.Gar.GPD / xi.Gar.GPD *
  ((fit.GPD(Z.GAR, threshold=u.GAR)$n /
      fit.GPD(Z.GAR, threshold=u.GAR)$n.exceed * (1-p))^-xi.Gar.GPD - 1)

SIGMA.GAR.Fut <- NA
SIGMA.GAR.Fut[1] <- sqrt(alpha0.Gar + alpha1.Gar * (DJLogRet[5298])^2 +
                           beta1.Gar * (SIGMA.GAR[5298])^2)
for(i in 2:2518){
  SIGMA.GAR.Fut[i] <- sqrt(alpha0.Gar + alpha1.Gar * (DJLogRet.Fut[i-1])^2 +
                             beta1.Gar * (SIGMA.GAR.Fut[i-1])^2)
}
Var.GAR.Fut <- SIGMA.GAR.Fut * VaR.GAR.Z

## Identifying functions (equation 4.1) and numbers of exceedances
I.GEV <- numeric(2518)
for(i in 1:2518){
  if(DJLogRet.Fut[i] > VaR.GPD){ I.GEV[i] <- 1 }}
Exc.GEV <- sum(I.GEV)

I.QQ <- numeric(2518)
for(i in 1:2518){
  if(DJLogRet.Fut[i] > VaR.quan){ I.QQ[i] <- 1 }}
Exc.QQ <- sum(I.QQ)

I.norm <- numeric(2518)
for(i in 1:2518){
  if(DJLogRet.Fut[i] > VaR.norm){ I.norm[i] <- 1 }}
Exc.norm <- sum(I.norm)

I.GAR <- numeric(2518)
for(i in 1:2518){
  if(DJLogRet.Fut[i] > Var.GAR.Fut[i]){ I.GAR[i] <- 1 }}
Exc.GAR <- sum(I.GAR)

## Average Exceedance
X.GEV <- 0
for(i in 1:2518){
  if(DJLogRet.Fut[i] > VaR.GPD){
    X.GEV <- X.GEV + DJLogRet.Fut[i] - VaR.GPD }}
Avg.Exc.GEV <- X.GEV / Exc.GEV

X.QQ <- 0
for(i in 1:2518){
  if(DJLogRet.Fut[i] > VaR.quan){
    X.QQ <- X.QQ + DJLogRet.Fut[i] - VaR.quan }}
Avg.Exc.QQ <- X.QQ / Exc.QQ

X.norm <- 0
for(i in 1:2518){
  if(DJLogRet.Fut[i] > VaR.norm){
    X.norm <- X.norm + DJLogRet.Fut[i] - VaR.norm }}
Avg.Exc.norm <- X.norm / Exc.norm

X.GAR <- 0
for(i in 1:2518){
  if(DJLogRet.Fut[i] > Var.GAR.Fut[i]){
    X.GAR <- X.GAR + DJLogRet.Fut[i] - Var.GAR.Fut[i] }}
Avg.Exc.GAR <- X.GAR / Exc.GAR

## Binomial testing (one-sided)
1 - pbinom(Exc.GAR, 2518, 1-p); 1 - pbinom(Exc.GEV, 2518, 1-p)
1 - pbinom(Exc.QQ, 2518, 1-p);  1 - pbinom(Exc.norm, 2518, 1-p)

## Correlation test of the test function
# Small p-values indicate serial dependence (clustering) in the exceedances
Box.test(I.GAR, lag=5, type="Ljung-Box")
Box.test(I.GEV, lag=5, type="Ljung-Box")
Box.test(I.QQ, lag=5, type="Ljung-Box")
Box.test(I.norm, lag=5, type="Ljung-Box")

# VaR plot
plot(DJDate.Fut, Var.GAR.Fut, type="l", xlab="Date", ylab="Value-at-Risk",
     main="Dow Jones Value at risk for p=0.99")
abline(h=VaR.GPD, col="orange")   # general GPD estimate; colour chosen for illustration
abline(h=VaR.norm, col="green")
abline(h=VaR.quan, col="blue")
legend("topright", c("Norm", "Quan", "GEV", "GARCH"),
       col=c("green", "blue", "orange", "black"), lty=1)  # legend completed; colours illustrative

Bibliography

[1] Balkema, A. and de Haan, L. (1974). "Residual life time at great age", Annals of Probability, 2, 792-804.

[2] Basel Committee on Banking Supervision (2006). "Basel II: International Convergence of Capital Measurement and Capital Standards: A Revised Framework - Comprehensive Version", Bank for International Settlements.

[3] Danielsson, J., P. Embrechts, C. Goodhart, C. Keating, F. Muennich, O. Renault and H.S. Shin (2001). "An academic response to Basel II", Special Paper, LSE Financial Markets Group.

[4] Diebold, F.X., T. Schuermann and J.D. Stroughair (2000). "Pitfalls and Opportunities in the Use of Extreme Value Theory in Risk Management", The Journal of Risk Finance, 1(2), 30-35.

[5] Fisher, R.A. and L.H.C. Tippett (1928). "Limiting forms of the frequency distribution of the largest or smallest member of a sample", Mathematical Proceedings of the Cambridge Philosophical Society, 24, 180-190.

[6] Hakim, A., M. McAleer and F. Chan (2009). "Forecasting Portfolio Value at Risk for International Stocks, Bonds and Foreign Exchange", working paper: http://mssanz.org.au/MODSIM07/papers/31_s8/ForecastingPortfolio_s8_Hakim.pdf

[7] Homolka, L. (2013). "Extreme value approach for estimating value at risk metrics with respect to Basel II", International Journal of Mathematics and Computers in Simulation, 7, 171-178.

[8] McNeil, A.J. and R. Frey (2000). "Estimation of tail-related risk measures for heteroscedastic financial time series: an extreme value approach", Journal of Empirical Finance, 7, 271-300.

[9] McNeil, A.J., R. Frey and P. Embrechts (2005). Quantitative Risk Management, Princeton University Press, Princeton.

[10] Pickands, J. (1975). "Statistical inference using extreme order statistics", Annals of Statistics, 3, 119-131.

[11] European Parliament and Council (2009). "Directive 2009/138/EC of the European Parliament and of the Council of 25 November 2009 on the taking-up and pursuit of the business of Insurance and Reinsurance - Solvency II".
