Comparison between conditional and unconditional methods to estimate the Value-at-Risk of nancial data

(1)

UNIVERSITY OF AMSTERDAM

Comparison between conditional and

unconditional methods to estimate the

Value-at-Risk of financial data

by

Martin Kroon

A thesis submitted in fulfillment for the

Master degree in Actuarial Science and Mathematical Finance

at the

Faculty of Economics and Business (FEB)

(2)

Declaration of Authorship

I, Martin Kroon, declare that this thesis titled, ‘Comparison between conditional and unconditional methods to estimate the Value-at-Risk of financial data’ and the work presented in it are my own. I confirm that:

This work was done wholly or mainly while in candidature for a research degree

at this University.

Where any part of this thesis has previously been submitted for a degree or any

other qualification at this University or any other institution, this has been clearly stated.

Where I have consulted the published work of others, this is always clearly

at-tributed.

Where I have quoted from the work of others, the source is always given. With

the exception of such quotations, this thesis is entirely my own work.

I have acknowledged all main sources of help.

Where the thesis is based on work done by myself jointly with others, I have made

clear exactly what was done by others and what I have contributed myself.

Signed: M. J. Kroon

(3)

“Only those who will risk going too far can possibly find out how far one can go.”

(4)

UNIVERSITY OF AMSTERDAM

Abstract

Faculty of Economics and Business (FEB) Master in Actuarial Science and Mathematical finance

by Martin Kroon 1

Various conditional and unconditional methods are compared to estimate the Value-at-Risk. The conditional methods rely on an AR(1)-GARCH(1,1) time series model with varying distributions for the innovations, while the unconditional methods assume i.i.d. data. Using 8 years of data consisting of DAX, S&P 500 and AEX daily returns, we compare methods with backtesting to see which method performs best in combination with a given mesokurtic or leptokurtic distribution. Results show, using a leptokurtic distribution for the innovations, the conditional methods outperform the unconditional methods for the 90% quantile Value-at-Risk. This is the other way around when esti-mating the 99,5% quantile of this risk measure, where the unconditional methods show significant binomial test results. Note that the results and conclusions strongly depend on the data period chosen.

1

Supervision: dr. S.U. (Umut) Can (UvA, first reader), dhr. prof. dr. R.J.A. (Roger) Laeven (UvA, second reader) and Maarten van der Maarel (WTW)

(5)

Introduction

1.1 The Value-at-Risk

Nowadays, financial institutions (e.g. insurance companies) own great amounts of assets to cover their liabilities (Hull, 2012). A lot of these assets can be traded on a daily basis, which brings financial risks into a company. These risks could be dangerous for the pol-icyholders. In order to protect these policyholders, the regulators, in the Netherlands for example the DNB (De Nederlandsche Bank), want to know to what extent these institutions are exposed to these risks. Because when a financial crisis occurs (a down-side market shock), an insurance company should still be able to cover their liabilities. However, the question which then arises, is how to manage this risk in the sense of how much money to hold.

For all the financial institutions, there are different rules for this amount to hold. For example, following the Capital Adequacy Directive by the Bank for International Settlements (BIS) in Basel, (Basel Comittee, 1996) there should be enough capital to cover losses on the bank its trading portfolio over a 10 day holding period in 99% of the scenarios. In this way, we are 99% certain that the amount to hold will be enough to cover the losses during 10 days. This capital buffer is what we call the Value-at-Risk (VaR). It makes us able to know that there is a 1% chance of making a loss worse than this estimate. Note that most financial institutions use a 1 day holding period and a confidence level of 95% for internal risk control.

The Value-at-Risk has not only been used within the financial business. It has also found application within the agribusiness (Boehlje and Lins, 1998). Moreover, Manfredo and Leuthold (2001) used the Value-at-Risk for estimating the market risk of cattle feed-ers.

(8)

So as we mentioned before, the Value-at-Risk depends on a chosen quantile (con-fidence level) and on a time period (Jorion, 1997):

VaRq,t(X) = inf{x| P (X ≤ x) > q}, 0 < q < 1

Here t, q and X represent respectively the chosen time period, the confidence level and a loss. If we look at it in a more abstract way, we can say, that the Value-at-Risk equals the quantile from a distribution of returns (negative and positive) generated by an asset (or portfolio) over a desired time period.

Note that there are also other risk measures we could use. For example, there is the expected shortfall, the Tail Value-at-Risk (TVaR) and the worst conditional expectation. To test whether a given risk measure, ρ(X), is coherent, Artzner et al. (1997) proposed some axioms:

• Axiom T: translation invariance (i.e. the risk measure is in monetary units): ρ(X + α) = ρ(X) − α, ∀ X, α ∈ R (1.1) • Axiom S: Subadditivity:

ρ(X1+ X2) ≤ ρ(X1) + ρ(X2), ∀ X1, X2 (1.2)

• Axiom M: Monotonicity: if X1 ≤ X2, then

ρ(X1) ≥ ρ(X2) (1.3)

• Axiom PH: Positive homogeneity: if λ is a non-negative constant, then

ρ(λX) = λρ(X) (1.4) If a risk measure satisfies all four axioms, then we call it a coherent risk measure according to Artzner et al. (1999).

The Value-at-Risk satisfies only three out of the four, because this risk measure is not subadditive. This means, when implementing it, this measure could give us an incentive to split our portfolio (because it would be less risky according to our Value-at-Risk estimate). However this gives a wrong impression of the situation and is therefore one major disadvantage of using it.

(9)

1.1.1 Solvency II

Solvency II is an EU directive concerning the regulation of insurance business in EU countries. Here (re)insurers are able to find the rules for the required capital they have to hold to cover their liabilities, but also rules about their risk management and their required documentation [33]. Initially, the regulator scheduled its start from 2013. How-ever, this has been postponed and eventually it came into effect on January 2016. Its main goal is to improve the Solvency I framework, which had a number of drawbacks (Doff, 2008). Now the regulators use a more realistic modelling of all types of risks. Under Solvency I, insurance companies had to hold fixed percentages of their liabilities and of their risk capital as their solvency capital. This gave the policyholder not much certainty. Therefore, the regulator changed this policy and obligated these companies to estimate their Solvency Capital Requirement (SCR).

This capital requirement basically means that the company should have enough reserves to survive a 1-in-200 years crisis (this is the 99,5% VaR with a time horizon of 1 year on the basic own funds of the insurance company). This requirement can be calcu-lated according to a standard formula, which is explained in the Solvency II regulation, or by an internal model, however this needs approval from the regulator (which can be expensive).

A standard formula estimates the Solvency Capital Requirement, which basically depends i.a. on the Basic Solvency Capital Requirement (BSCR). This BSCR is calcu-lated by aggregating the solvency requirements of all risk sub-modules (see figure 1.1).

Because the Solvency Capital Requirement is calibrated using the Value-at-Risk on

(10)

the basic own funds of an insurance company, we also have to apply this to every risk (sub)module within a standard formula (see figure 1.2). After this is done, we have to aggregate the capital requirements with the use of correlation matrices (also for diversi-fication) and we get the Basic Solvency Capital Requirement.

So all in all, the Value-at-Risk plays a key role within Solvency II for determining the Solvency Capital Requirement and is therefore very interesting to do research on. In this thesis we focus on the way it is estimated and we use different old and newly introduced techniques.

Figure 1.2: Every risk (sub)module gets subjected to different stresses based on the 99,5% VaR with a time horizon of 1 year. This increases the importance of rightly

estimating the Value-at-Risk.

1.1.2 Estimation methods

The Value-at-Risk can be calculated in numerous ways. The three simplest methods are the following ones (Danielsson and de Vries, 1997):

• The Historical Method :

Re-organize actual historical losses (for a given time period) and put them in order from worst to less bad. Then assume the history to repeat itself in the future (this is not very realistic). After this is done, we simply estimate the desired quantile from the empirical distribution and we end up with the desired Value-at-Risk estimate (Linsmeier and Pearson, 1996).

(11)

• The Variance-Covariance Method :

This method assumes the losses to be normally distributed. For this normal dis-tribution, we require two parameters. The first parameter is the expectation, µ, and the second parameter is the standard deviation σ. When we estimate these parameters from our data, we are able to estimate the desired quantile from the fitted normal distribution which represents our Value-at-Risk estimate.

• Monte Carlo Simulation:

This last method involves developing models for the future returns. With the help of Monte Carlo simulation we can create multiple scenarios and taking the expectation gives us the expected returns. After putting these expected returns into a histogram, we can obtain our quantile estimate (Rouvinez, 1997).

All these methods share one common feature, and that is simplicity. However, assuming the future losses to be normally distributed or assuming them to be similarly distributed as in the past is too short-sighted. Therefore we have to take a look at more sophisticated methods (McNeil and Frey, 2000). In this paper, we focus on these methods and in particular on how they work/perform. We compare the following methods, which will be explained in detail in following chapters:

1. Conditional method

2. Conditional Extreme Value Theory method 3. Unconditional Extreme Value Theory method

4. Conditional translated Gamma approximation method 5. Unconditional translated Gamma approximation method 6. Unconditional historical simulation method

1.1.2.1 Scaling

According to the Solvency II legislation given above, we need the 99,5% Value-at-Risk with a time horizon of 1 year for calibrating the stresses within a standard formula. However, the problem with this time horizon is that if we would use the above methods, we need a lot of data.

(12)

Therefore we could scale the 1 day Value-at-Risk to the 1 year Value-at-Risk1. The most widely used approach for this, is the square root scaling method:

VaRq,t=h = VaRq,t=1∗

√

h, h ≥ 1, 0 < q < 1 (1.5) This method assumes the returns (mostly log returns) to be independent identically normally distributed with an expectation equal to zero (Hamidieh and Ensor, 2009). However, in practice we see that the data is neither independent identically distributed nor normally distributed. Therefore it is a very inaccurate way for estimating the 1 year Value-at-Risk (McNeil and Frey, 2000).

Better methods for scaling are the following ones: • McNeil/Frey Simulation

• Non-overlapping Scaling

• Bounded Non-overlapping Scaling • Overlapping Scaling

• Bounded Overlapping Scaling

These methods are widely explained and tested for their accuracy in Hamidieh and Ensor (2009).

1.1.3 About this thesis

In this thesis we compare various conditional and unconditional methods to estimate the Value-at-Risk. Some of these methods depend on time, i.e. conditional methods need an assumption about the distribution of the innovations from the time series. In this thesis we use three distributions that have fluctuating shapes (they differ in peakedness). This and the methods mentioned in paragraph 1.1.2 are thoroughly explained in chapter 2 and finally being tested in the chapters 3 and 4. In the last chapter, we analyse the results which we derive from backtesting. In this way we are able to see when a certain method in combination with a degree of peakedness of the dependent distribution of the innovations (for the conditional methods) performs better than another method, in the sense of estimating the Value-at-Risk, and we can conclude why.

1

(13)

Chapter 2

Background theory

In this chapter we explain in a comprehensive way how the different methods are estab-lished and which assumptions and such hold. First we explain how the time series model (conditional part) works and how we apply it. Afterwards we explain all the methods in the order given in paragraph 2.2. This explanation consists of what theorems we use up til the derivation of the formula for the Value-at-Risk.

2.1 The conditional part

For the conditional methods, we apply a time series model in order to pick up the sudden changes in the volatility within the data. To begin, we assume the negative log returns to follow a strictly stationary process where:

Xt = µt+ σtZt, (2.1)

Here the Xt and the Zt respectively represent the daily negative log returns and the

standardized residuals for each t. The residuals are assumed to be i.i.d. with expectation 0 and variance 1. We will let FZ denote their distribution function. We can estimate

µt and σt with the help of the information up until t − 1 with t ∈ Z, so we exploit the

entire history of observed values, say H.

As already mentioned in paragraph 1.1, the unconditional Value-at-Risk is defined as follows:

VaRq,t(X) = inf {x| P (X ≤ x) > q}, 0 < q < 1 (2.2)

So basically, you can obtain the quantile estimate directly from the data. However, for the conditional methods we want to know this estimate from the predictive distribution

(14)

function FXt+1|H(X), which is 1-step ahead in the future. So therefore we alter equation

(2.2) to fit our conditional quantile estimate:

xt_q = inf {x ∈ R : FXt+1|H(x) > q}, 0 < q < 1 (2.3)

If we finally substitute our time series model into this equation and work it out, we get our desired expression:

P (µt+1+ σt+1Zt+1≤ xtq|H) ≥ q ⇔ FZ xt_q− µt+1 σt+1 ! ≥ q ⇔ xt_q− µt+1≥ F_Z−1(q)σt+1 ⇔ xt_q= µt+1+ σt+1zq (2.4)

Here the zq represents the quantile function of Zt, i.e. the upper qth quantile (which

does not depend on t).

2.1.1 Calibration of the parameters

Just like in Jalal and Rockinger (2004), we model our µtwith an AR(1) process and our

σ_t2 with a GARCH(1, 1) process. Given the conditional heteroscedasticity of most of the returns data in the world (this is later showed in chapter 3 for our data), this choice of modelling is most common. We use their method for calibration, as the purpose of this paper is not to show which time series model would work best. Other time series models which are sometimes applied in the literature are from the ARCH/GARCH family (Bollerslev et al., 1992). So all in all, we make use of the following equations together with equation (2.1):

µt = φXt−1 (2.5)

σ_t2 = α0+ α12t−1+ β0σ2t−1 (2.6)

t = Xt− µt (2.7)

In order to estimate all the parameters, we need to determine which dependent distribution, scaled to have a variance equal to 1 (we assume a white noise process), to use for Zt. Because they mostly have a leptokurtic distribution (Bollerslev and Wooldgridge,

(15)

• Skew Generalized Error distribution: f (z; µ, σ, λ, p) = pe − |x−µ+m| vσ(1+λsign(x−µ+m)) p 2vσΓ 1 p with, m = 22p_vσλΓ 1 2+ 1 p √ π and v = v u u u u t πΓ1_p π(1 + 3λ2_)Γ3 p − 161p_λ2_Γ 1 2+ 1 p 2 Γ1_p • Student’s t-distribution: fv(z) = Γ v+1₂ √ vπΓ v₂ 1 +z 2 v −v+1₂

• Standard Normal distribution:

f (z) = √1 2πe

−1 2z

2

The first distribution has the highest kurtosis, then comes the second one. Their kurtosis is higher than that of a standard normal distribution and are therefore leptokurtic.

Beside leptokurtic distributions, they can also be platykurtic and mesokurtic (see figure 2.1). With a platykurtic distribution the points along the x-axis are dispersed. This results into a lower peak than the curvature from a normal distribution (so this means a lower kurtosis). Finally with a mesokurtic distribution, we basically mean that the kurtosis of the distribution equals the kurtosis of a distribution from the normal family (our third distribution).

Figure 2.1: Leptokurtic, mesokurtic (normal) and platykurtic distributions.

To actually estimate the parameters, we use the maximum likelihood method. The algorithm basically works by taking derivatives of the log-likelihood function with respect to the parameters θ = (φ, α0, α1, β0)0 and setting them equal to 0. For the

standard normal distributed innovations, we get a log-likelihood function equal to: L(θ) = −m 2 ln(2π) − 1 2 m X t=2 (ln(α0+ α12t−1+ β0σt−12 )) − 1 2 m X t=2 xt− φxt−1 α0+ α12t−1+ βσt−12 2

(16)

For the Skew Generalized Error distribution and the Student’s t-distribution there are no closed forms for the log likelihood functions to be found, however approaches and approximations do exist and can be found in the econometric literature.

2.1.2 1-Step forecast

We estimate the conditional mean and standard deviation respectively as an AR(1) and an GARCH(1, 1) process, which was proposed by Jalal and Rockinger (2004). The pa-rameters of this time series model is then estimated with the maximum likelihood method like we explained in the previous section. We could have estimated the parameters using the Pseudo Maximum Likelihood method of Gourieroux, Monfort and Trognon (1984). However this method assumes no distribution for the innovations, which is not what we desire for our research. We want to include the effect of changing this distribution from a mesokurtic one to a leptokurtic one.

For equation (2.4) we need to estimate µt+1 and σt+1, which are the one step

ahead estimates of the conditional mean and standard deviation. To find these values, we make predictions using our time series model. We do this by fixing the amount of memory for each prediction equal to m = 1000 (McNeil, 2000), which is a necessary assumption for chapter 4. Then the data we use, consists of 1000 daily negative log returns (xt−1000+1, ..., xt−1). With some transformations of the equations (2.5), (2.6)

and (2.7) we get the following results:

˜

µt+1 = ˆφxt (2.8)

˜

σ_t+12 = ˆα0+ ˆα1ˆ2t + ˆβ0σˆt2 (2.9)

with, ˆt = xt− ˆµt (2.10)

We need formula (2.10) to check whether the assumption of strict white noise is not violated and we have to check whether the series is strictly stationary, which can be done by checking whether β0+ α1 < 1. A proof for this expression can be found in McNeil

(2000). After estimating the future parameters we are only left with the estimation of zq. This value does not depend on time and is therefore not conditional and it can be

estimated in several ways, which is explained in the next sections.

2.2 The used methods

In this section we explain all the used methods which are listed in section 1.1.2. We start with explaining the used theorems and end with the application of them to our methods.

(17)

Following this way, we end up with the desired estimate for zq (for the conditional

methods) or for the Value-at-Risk (for the unconditional methods).

2.2.1 Conditional method

The conditional method is the easiest method when using time series for estimating the Value-at-Risk. This method basically follows a few simple steps. At first we need to make an assumption about the distribution of the i.i.d. innovations. This distribution must have a variance and a mean respectively equal to 1 and 0. Therefore we need to scale it. This can easily be done for example for the Student’s t-distribution, there we multiply the quantile function with

q

v−2

v for v > 2. After this, we can fit the time

series model through the data using equations (2.5), (2.6), (2.1) and with the help of maximum likelihood. This gives us the desired parameters we need for estimating the 1-step ahead prediction. Then we actually have finished the hard part of this method, because now we only need to estimate the quantile of the innovations distribution from the fitted time series model. When we substitute this value into equation (2.4), we end up with an estimator for the Value-at-Risk.

Estimating the Value-at-Risk with this method has some drawbacks. For example, there is the problem of not foreseeing extreme events on time. The GARCH(1, 1) model is capable of finding the volatility clustering within the data, however if no such thing as an extreme event has taken place in the past, then this method cannot predict this. Therefore we also need stronger methods, which do take this problem into account and can better cope with extreme events.

2.2.2 The Extreme Value Theory method

The Extreme Value Theory analyses the stochastic behavior of extreme values of random variables. Within this theory we can look at the behavior of ’block maxima’ or we analyse ’threshold exceedances’ of random variables. The latter one depends on the Pickands-Balkema-De Haan theorem, which gives the asymptotic tail distribution of a random variable. This way of modelling is also known as Peaks over Threshold (Pickands 1975, Balkema and de Haan 1974). In the next section we explain how this theory works and how we eventually end up with an estimator for the Value-at-Risk.

(18)

2.2.2.1 Pickands-Balkema-de Haan theorem

Given a random variable X, where F (x) represents the distribution function of X, we define the excess distribution of X over the threshold u as follows:

Fu(x) = P (X − u ≤ x|X > u) =

F (x + u) − F (u)

1 − F (u) , 0 ≤ x ≤ xF − u (2.11) where xF is the right endpoint of the underlying distribution F . The

Pickands-Balkema-de Haan theorem states that when taking a threshold u high enough, Fu(x) converges

to a certain family of distributions. This basically means that we can find a function β(u), such that the following expression holds:

lim u→xF sup 0≤x<xF−u |Fu(x) − Gξ,β(u)(x)| = 0 (2.12) with Gξ,β(x) =    1 − (1 + ξx_β )−1ξ _{, ξ 6= 0} 1 − exp(−x_β ), ξ = 0 (2.13)

This formula from Embrechts, Klupperberg and Mikosch (1997) only holds when F ∈ M DA(Hξ) and ξ ∈ R, which means that F has to be in the maximum domain of

attraction of Hξ. This is the case for most of the continuous distributions, but for

example for the Poisson and the negative binomial distributions this is not valid. The ξ parameter indicates whether the used distribution is heavy tailed or not. The case is, the more positive ξ is, the heavier the tail shall be. This also indicates that this distribution will have infinite higher moments. Eventually when equation (2.12) holds and converges, then the tail distribution follows a Generalized Pareto distribution (GPD) which is defined as equation (2.13) with x ≥ 0, when ξ ≥ 0 and when 0 ≤ x ≤ −β_ξ , then ξ < 0. The β and the ξ represent respectively the scale parameter and the shape parameter. Changing the shape parameter causes the GPD to transform into one of the following distributions:

• Pareto distribution, if ξ > 0 • Exponential distribution, if ξ = 0 • Pareto type II distribution, if ξ < 0

where the first case is most relevant for us, as we probably use heavier tailed distributions. All such distributions, where the limiting excess distribution equals the Generalized

(19)

Pareto distribution with ξ > 0, have tails of the form (Gnedenko, 1943): 1 − F (x) = x

−1

ξ _L(x) _(2.14)

Here the L(x) represents a slowly varying function which is defined as: lim

x→∞

L(ax)

L(x) = 1, ∀ a > 0.

If we work out equation (2.11) and substitute (2.13) in it, we end up with a formula which depends on F (u). This quantity can be estimated with n−h_n , as the probability of a random variable X being smaller than u can be estimated by the amount of data points in this interval, n − h, divided by the total amount (so this is an unbiased estimator). Here h represents the amount of data points above the threshold. This eventually gives us the following approximation for values of x higher than u:

P(X ≤ x) ≈ 1 − h n 1 + ξx − u β −1_ξ (2.15) We could also estimate the asymptotic relative error of the estimator given above. In Smith (1987) we find a result for this:

h12 1 − F (x)

1 − F (x) − 1

d

→ N (0, a2) (2.16) For the estimating part of the ξ and β parameters we use maximum likelihood, which basically means that we maximize the following function with respect to the scale and shape parameters: L(β, ξ |x) = −h ln β + h ln h n − 1 +1 ξ h X i=1 ln 1 +ξ(xi− u) β , with ξ 6= 0

Unconditional Value-at-Risk estimation With the help of the previous section, we can make an estimator for the Value-at-Risk. Using equation (2.15) we make sure we only need a desired quantile for the F (x) (say q). This means, we have a chance of 1 − q to find values which exceed x. When we set the right-hand side of equation (2.15) equal to q and solve for x, we should get our desired estimate:

1 + ξx − u β −1_ξ = n h(1 − q) ⇔ x − u β = n h(1 − q)) −ξ_{− 1} ξ ⇔ VaR = u +β ξ   1 − q h n !ξ − 1   (2.17)

(20)

The parameters can again be estimated using the maximum likelihood method, however it is also possible to estimate them with the help of the conditional Extreme Value The-ory (EVT). Here we use a two-step procedure which is explained in McNeil and Frey (2000), note that in this thesis, we will not go any further into this method.

The unconditional part of the title of this section basically means that we esti-mate the Value-at-Risk without considering the use of time series. So we look at the distribution of the negative log-returns, which we assume to be i.i.d., and estimate the tail distribution with the help of the previously mentioned formulas. This is in contrast with the next method, which does take the GARCH(1,1) and the AR(1) models into consideration.

Using the Extreme Value Theory for modelling the Value-at-Risk has however some difficulties. At first we have to determine which threshold to take, this can be done in several ways. Firstly, we can estimate the mean excess function which is defined as E(X − u|X > u). An unbiased estimator for this function could be

Ph i=1(yi)

h , with yi

representing the exceedances. So in words, we calculate the average exceedance. When this function approximately becomes linear (with the different thresholds on the x-axis), we know that we have to take this u. This, because the threshold choice is a linear upward function for ξ > 0 and a downward function for ξ < 0:

e(i) = β − ξ 1 − ξu +

ξ

1 − ξi, i ≥ u, ξ < 1 and β > 0

Another way for determining the threshold is making use of simulations. This is widely explained in McNeil (2000) and this is also where we base our thresholds on in this thesis.

Finally, the last difficulty of the Extreme Value Theory is the trade-off between variance and bias. If we choose the threshold high enough, we expect the Generalized Pareto distribution to fit well. However, simultaneously, it means highly volatile param-eters (and the other way around). Therefore it is crucial to rightly choose the threshold u, because the entire setting depends on it.

2.2.2.2 Conditional Extreme Value Theory method

The conditional Extreme Value Theory method is more sophisticated in the sense that we have to take multiple factors into consideration. The goal of this method is to find the quantile zq from the tail distribution of the innovations. In order to do so, we follow

the steps which are given in McNeil (2000). At first, we want a fixed amount of data in our tail, lets say k = h in equation (2.15). Because we want to estimate a lot of Value-at-Risk values in chapter 4 through time, we want to have the same amount of data used for every estimation. However this is not possible if we just choose a random

(21)

threshold for every iteration. Therefore we need to fix the amount of tail data which makes it possible to say that the (k + 1)th element of the n standardized innovations equals the threshold. Note that this only works when we order these innovations from highest to lowest (otherwise we have negative and positive numbers mixed up). After this is done, we can modify equation (2.15) to fit the tail of our innovations in order to estimate our quantile zq:

zq= zk+1+ β ξ   1 − q h n !−ξ − 1   (2.18) Where zk+1 represents the (k + 1)th element of the n ordered standardized innovations.

If we use this equation together with (2.4), we end up with our desired estimator which is given as follows: VaR = µt+1+ σt+1  zk+1+ β ξ   1 − q h n !−ξ − 1     (2.19) In this thesis we fix the parameters h and n and set them respectively equal to 100 and 1000. This is based on the simulation study, which is done in McNeil (2000). All in all, this is what we use for backtesting our data which is fully explained in chapter 4.

2.2.3 Unconditional translated Gamma approximation method

The raw negative log returns from for example the DAX (German stock index) or the AEX (Dutch stock index) are mostly very skewed to the right as already mentioned. When using time series to model this, we would expect the innovations also to be skewed to the right. A Translated Gamma Approximation can in this case be very helpful. If we would fit a regular Gamma distribution through the previously mentioned distributions we get a mismatch (it would be ideal if they would). Therefore we allow a shift in the Gamma distribution. So now we say that a random variable X, which has a Gamma distribution with α and θ as its parameters, becomes a different random variable X + x0.

If we want to estimate the parameters of this new distribution, we make sure that the first three moments of both X + x0 and X match. This means the following:

µ = x0+ α θ σ2 = α θ2 γ = √2 α

(22)

When we rewrite the equations above in terms of the α, θ and the γ, we end up with the desired parameters. After calibrating these parameters, we have a fit of the distri-bution up to the 3th moment, so F (x) ≈ _Γ(α)1

x−x0

R

0

y1−α_θα_e−θy_{dy holds. Because we use}

the unconditional translated Gamma approximation method, we only need to fit the translated Gamma distribution through the data and estimate:

F (xq− x0) = q ⇔ xq− x0 = F−1(q) ⇔

VaR = F−1(q) + x0 (2.20)

with F representing the gamma cdf with parameters α and θ.

2.2.4 Conditional translated Gamma approximation method

The conditional translated Gamma approximation method follows the same steps as the unconditional method. The necessary parameters α, θ and x0 can again be estimated

with the equations from the previous section. However there are 2 main differences to mention. At first we use the distribution of the standardized innovations from the GARCH(1,1)-AR(1) time series model instead of the distribution of the raw log returns. And secondly, our Value-at-Risk estimate has to be estimated with equation (2.4). So we need to calculate the quantile zq from the distribution of the standardized innovations.

This can be done in the following way:

H(xq− x0) = q ⇔ xq= H−1(q) + x0 ⇔

VaR = µt+1+ σt+1(H−1(q) + x0) (2.21)

with H representing the gamma cdf with parameters α and θ. Note that the conditional and the unconditional translated Gamma approximation method have one major disad-vantage. When the data, we use to fit the translated Gamma distribution through, is negatively skewed then this method does not work properly and gives therefore unreliable Value-at-Risk estimates (Kaas et al., 2009).

2.2.5 Unconditional historical simulation method

The unconditional historical simulation method works basically the same as the condi-tional method mentioned in section 2.2.1. However now we do not use the standardized innovations, but we re-organize the actual historical negative log returns (for a given time period) and put them in order from worst to less bad. After this, we assume the history to repeat itself in the future. Now, we can simply estimate the desired quantile

(23)

from the empirical distribution and we obtain the desired Value-at-Risk estimate like already explained in chapter 1.

In the next chapters, we introduce our data which comes from the financial database Datastream. All the explained methods above are tested on this data and with the help of certain techniques, we can compare the outcomes. Hereafter, we analyse the results and end with a conclusion which contains an answer to our central question.

(24)

Chapter 3

Empirical Study

In this chapter we introduce the data gathered from the financial database Datastream accessed through the UvA library. We need this data as input for chapter 4. We choose to use 2088 daily index prices from 08/02/2008 up until 09/02/2016 from the DAX (German stock index), S&P 500 (Standard and Poor 500) and the AEX (Dutch stock index). Between these two dates we should capture the financial crisis which started around 2008 (this should help finding the differences between using conditional and unconditional methods). This crisis could be seen by looking at the conditional standard deviations over time, as recessions mostly show high levels of volatility.

In the next sections we give the properties of the used data and we run some fit tests to show whether the used assumptions are correctly chosen and whether the models work properly.

3.1 Data analysis

As mentioned at the beginning of this chapter, we use three different vectors of data (where each element represents an index, which is defined as a index price with dividends, interest and rights offerings realized over a given period of time). This raw data clearly does not follow a stationary process. Therefore we transform the data into log returns by using the following formula:

rt= −log

index price_t+1 index price_t

(3.1) So using this formula, we obtain 3 vectors of 2087 log returns which we then apply for fitting the time series model. But before we can do this, we first need to run some

(25)

tests on a random fit to check the validity of our assumptions. This is done in the next paragraph.

Testing the stationarity of the raw data and residuals The credit crisis which reached its highest point in 2008 had a great impact on Europe. This can clearly be seen in the data we use in this thesis, periods of volatility clustering arise after each other. Because this crisis affected all three datasets, we find a high Pearson’s correlation coefficient between the index prices of all three pairs (around 0.85). This is the reason why we only run our tests on the AEX data.

For our tests we choose a random time interval of a 1000 days starting from 11/04/12 up until 09/02/16. This data is then used for calibrating the GARCH(1, 1) − AR(1) model and from this we obtain the standardized residuals (z1, z2, ..., z1000). At

first, we check whether the time series follow a strictly stationary process. From chapter 1, we know that this basically means that β0 + α1 < 1. From the R code which can

be found in Appendix B, we obtain that this equation holds (β0+ α1 = 0.987) for our

AEX data, so we have a stationary process. This can also clearly been seen in figure 3.1, the series does not deviate that much over time and has approximately a mean equal to 0. We also assumed in chapter 1 that the standardized residuals from the time series

Figure 3.1: The AEX dataset consisting of a 1000 negative log returns with condi-tional standard deviation (grey lines). Overall, we can clearly see a lot of clustering

and also high peaks around index 950.

would follow a strict white noise process and moreover we did not allow serial correlation between the different lags (so the standardized residuals are independently distributed). This can be checked in several ways, but in this thesis we use the Ljung-Box test and we show a graph of the autocorrelation functions of the (squared) data and of the (squared)

(26)

residuals, where this function is defined as: Rs,t=    E((xt−µt)(xs−µs)) σtσs , if t 6= s 1, if t = s (3.2) In figure 3.2, we can see that there is indeed serial correlation between the (squared) observations, but not between the standardized residuals, moreover the Ljung-Box test results into a x-squared of 0.8111 and a p-value of 0.8468 (with 3 lags). So the assumption of absence of serial correlation between the standardized residuals is valid. To check

0 5 10 15 20 25 30 0.0 0.4 0.8 Lags A CF

ACF of Squared Observations

0 5 10 15 20 25 30 0.0 0.4 0.8 Lags A CF

ACF of Squared Standardized Residuals

0 5 10 15 20 25 30 0.0 0.4 0.8 Lags A CF ACF of Observations 0 5 10 15 20 25 30 0.0 0.4 0.8 Lags A CF

ACF of Standardized Residuals

Figure 3.2: Autocorrelation functions of the (squared) observations and the (squared) standardized residuals are shown. We see that the white noise assumption is not

vio-lated.

whether the threshold for the Generalized Pareto distribution has been chosen correctly, we look at the sample mean excess plot to see where it becomes approximately linear. Because we use two different methods (unconditional and the conditional Extreme Value method) which uses this threshold choice, we follow the threshold simulation from McNeil (2000).

The paper of McNeil (2000) states that taking the 101th data point from the ordered standardized residuals or the ordered negative log returns (for respectively the conditional and unconditional method) would be optimal. As we can see from figure 3.3, the mean excess plots show downward linear trends, so parameter ξ from the GPD

(27)

is negative. This is not what we expected, but looking back at the time series, we notice that there are less outliers than expected. On the other hand, the threshold choice for the conditional Extreme Value Theory method could be improved. For example, we could change it into a smaller number, however we do not expect things to change because of the trade-off between variance and bias (the threshold choice for the other method is fine).

This trade-off can namely been seen in table 3.1, here the parameter estimates from the Generalized Pareto distribution are given. The standard errors of the parameters are definitely not high, which indicates that the parameters are estimated in the right way. And also the fit of the GPD, which can be found in figure 3.4, works fine.

u ξ s.e β s.e unconditional EVT 0.0115 -0.1315 (0.0868) 0.0094 (0.0012) conditional EVT 1.2204 -0.0842 (0.0916) 0.7414 (0.1003)

Table 3.1: Generalized Pareto distribution fit with the threshold, the parameters and the standard errors. The given standard deviations of the parameters are quite low,

which indicates low volatile parameters.

0.0 1.0 2.0 3.0 0.3 0.4 0.5 0.6 0.7 0.8 Threshold Mean Excess 0.000 0.010 0.020 0.030 0.0050 0.0065 0.0080 Threshold Mean Excess

Figure 3.3: Given are the mean excess functions for the AEX dataset of respectively the residuals and the negative log-returns. We see the functions becoming

(28)

2 3 4 5 6

1e−05

1e−03

1e−01

x (on log scale)

1−F(x) (on log scale)

2 3 4 5 6 0.0 0.2 0.4 0.6 0.8 1.0

x (on log scale)

Fu(x−u)

Figure 3.4: Shown is respectively the tail of the underlying distribution and of the excess distribution for the AEX dataset. We see a close fit of the Generalized Pareto

distribution.

Now we have analysed the data and showed the assumptions to hold, we can test which method in combination with a given dependent distribution (for the time series model) estimates the Value-at-Risk in the most reliable way for our data. How this is tested is explained in the next chapter (with the help of backtesting), afterwards we end with a conclusion in chapter 5.

(29)

Chapter 4

Backtesting

To test the methods explained in the previous chapters we apply backtesting. For our backtests we use the data which is explained in chapter 3. This consists of 2087 daily negative log returns from the AEX, the DAX and the S&P 500. To make estimates of the Value-at-Risk, we use a fixed time horizon for the fitting process as in Jalal & Rockinger (2004). This horizon is set equal to 1000, which gives us a few years of data to fit the time series model through for each t ∈ T = {1001, ..., 2088}. So basically, we are shifting the time horizon until we reach the last data point, which gives us 1087 Value-at-Risk estimates.

As explained in chapters 2 and 3, we also use a constant amount of data (100 data points), for the Generalized Pareto distribution. This is based on the simulation study in McNeil (2000) and has been found legitimate. To make 1-step forecasts for every t, we first need the parameters of the GARCH(1,1)-AR(1) model and of the Generalized Pareto distribution fit. Then we can calibrate the translated Gamma distribution with the help of the calculated moments as in chapter 2. Eventually, we end up with 1087 forecasts of the Value-at-Risk, so for every t we obtain one estimate. To find how good the methods perform, we apply a binomial test which is explained in Jalal & Rockinger (2004). Basically, we compare all the estimates for the different datasets and for the different quantiles q ∈ {0.995(SII), 0.90} for every t ∈ T , by looking if a violation has occurred or not. This violation indicates that the estimated Value-at-Risk is smaller than the observed value, xt_q< xt. In the next paragraph, this test is further explained.

(30)

Binomial test To test how the methods perform, we apply a binomial test. This test is based on the number of violations, where a violation occurs when a Value-at-Risk estimate is exceeded by the observed value. In total we would expect 1 − q times the total number of predictions to be violations. So we can define an indicator func-tion It, which equals 1 when a violation occurs and 0 otherwise. In other words, this

means that this function can be seen as a random variable which is Bernoulli distributed:

It=    1, VaRt< xt 0, Otherwise ∼ Bernoulli(1 − q) (4.1)

We know that if we take the sum of independent and identically distributed Bernoulli random variables we end up with a binomial distribution. This distribution uses two parameters, where one equals the number of trials, n, and the other the probability p of success for each trial. In our case, we set the cardinality (amount of elements) of the set T equal to n and 1 − q equal to p. This gives under the null hypothesis that a certain method estimates the Value-at-Risk in the right way. For this hypothesis, we use a two sided binomial test because want to capture the error of having too few violations and too many violations. When the p-value is less than 0.05, we say we have evidence against our null hypothesis.

In the next section we give our results for each dataset. The tables show the amount of violations per method and also their corresponding p-values. Additionally, we add the amount of skewness violations. This is only relevant for the translated Gamma approximation methods, as the other methods are not influenced by this problem. In appendix A, we can find the corresponding plots of the Value-at-Risk estimates. Again, we only show the plots of the AEX data, because it adds not much more information to our results.

4.0.1 Data: AEX

In this section we give our results with regards to the AEX dataset. For every t ∈ T , we made predictions with different dependent distributions (Standard Normal distribution, Student’s t-distribution and Skew Generalized Error distribution) for the time series model. Afterwards, we estimated the unconditional methods and the conditional meth-ods, which depend on this model. In the tables 4.1 and 4.2 we present the results, first we give the expected violations, this equals 1 − q times the total amount of predictions. And secondly, we give respectively per method the amount of observed violations, the p-value of the binomial test and in addition the total amount of negative skewnesses. We also make graphical illustrations of the Value-at-Risk estimates for all t ∈ T . These

(31)

graphs can be found in Appendix A.

Standard Normal distribution Skew Generalized Error distribution Conditional method 15 (0,000) Conditional method 9 (0,125)* Extreme Value Theory 10 (0,050) Extreme Value Theory 10 (0,050) Translated Gamma (8) 12 (0,005) Translated Gamma (9) 12 (0,005)

Student’s t-distribution unconditional

Conditional method 10 (0,050) Extreme Value theory 6 (0,808)* Extreme Value Theory 10 (0,050) Translated Gamma (50) 9 (0,125)* Translated Gamma (9) 12 (0,005) Historical simulation 7 (0,501)*

Table 4.1: All the methods are tested on the AEX dataset, in total we expect 6 violations for the quantile, q, equal to 0,995. The numbers per method respectively represent the estimated violations, the p-value of the binomial test and in addition the amount of negative skewnesses (between brackets) for the translated Gamma methods.

A (∗) sign indicates non-rejections based on the binomial test.

Standard Normal distribution Skew Generalized Error distribution Conditional method 100 (0,379)* Conditional method 86 (0,022) Extreme Value Theory 108 (0,944)* Extreme Value Theory 108 (0,944)* Translated Gamma (8) 99 (0,327)* Translated Gamma (9) 98 (0,279)*

Conditional method 111 (0,816)* Extreme Value theory 89 (0,046) Extreme Value Theory 108 (0,944)* Translated Gamma (50) 73 (0,000) Translated Gamma (9) 97 (0,237)* Historical simulation 89 (0,046)

Table 4.2: All the methods are tested on the AEX dataset, in total we expect 109 violations for the quantile, q, equal to 0,90. The numbers per method respectively represent the estimated violations, the p-value of the binomial test and in addition the amount of negative skewnesses (between brackets) for the translated Gamma methods.

(32)

4.0.2 Data: DAX

Also in this section we give our results, but now with regards to the DAX dataset. For every t ∈ T , we made predictions with different dependent distributions (Standard Normal distribution, Student’s t-distribution and Skew Generalized Error distribution) for the time series model. Afterwards, we estimated the unconditional methods and the conditional methods, who depend on this model. In tables 4.3 and 4.4 we present the results, first we give the expected violations, this equals 1 − q times the total amount of predictions. And secondly, we give respectively per method the amount of observed violations, the p-value of the binomial test and in addition the total amount of negative skewnesses for the translated Gamma methods.

Standard Normal distribution Skew Generalized Error distribution Conditional method 14 (0,000) Conditional method 7 (0,501)* Extreme Value Theory 6 (0,808)* Extreme Value Theory 7 (0,501)* Translated Gamma (5) 10 (0,050) Translated Gamma (5) 11 (0,017)

Conditional method 7 (0,501)* Extreme Value theory 3 (0,295)* Extreme Value Theory 7 (0,501)* Translated Gamma (226) 8 (0,270)* Translated Gamma (6) 10 (0,050) Historical simulation 5 (0,852)*

Table 4.3: All the methods are tested on the DAX dataset, in total we expect 6 violations for the quantile, q, equal to 0,995. The numbers per method respectively represent the estimated violations, the p-value of the binomial test and in addition the amount of negative skewnesses (between brackets) for the translated Gamma methods. A (∗) sign indicates non-rejections based on the binomial test. Note that the Trans-lated Gamma found a large number of negative skewed distributions. This indicates

(33)

Conditional method 111 (0,816)* Extreme Value theory 101 (0,436)* Extreme Value Theory 104 (0,635)* Translated Gamma (226) 82 (0,007) Translated Gamma (6) 98 (0,279)* Historical simulation 100 (0,379)*

Table 4.4: All the methods are tested on the DAX dataset, in total we expect 109 violations for the quantile, q, equal to 0,90. The numbers per method respectively represent the estimated violations, the p-value of the binomial test and in addition the amount of negative skewnesses (between brackets) for the translated Gamma methods. A (∗) sign indicates non-rejections based on the binomial test. Note that the Trans-lated Gamma found a large number of negative skewed distributions. This indicates

unreliable Value-at-Risk estimates (see also chapter 2).

4.0.3 Data: S&P 500

Finally, we give our results with regards to the S&P500 dataset. For every t ∈ T , we made predictions with different dependent distributions (Standard Normal distri-bution, Student’s t-distribution and Skew Generalized Error distribution) for the time series model. Afterwards, we estimated the unconditional methods and the conditional methods, who depend on this model. In tables 4.5 and 4.6 we present the results, first we give the expected violations, this equals 1 − q times the total amount of predic-tions. And secondly, we give respectively per method the amount of observed violations, the p-value of the binomial test and in addition the total amount of negative skewnesses.

(34)

Standard Normal distribution Skew Generalized Error distribution Conditional method 15 (0,000) Conditional method 7 (0,501)* Extreme Value Theory 7 (0,501)* Extreme Value Theory 7 (0,501)* Translated Gamma (8) 7 (0,501)* Translated Gamma (0) 7 (0,501)*

Conditional method 5 (0,852)* Extreme Value theory 4 (0,537)* Extreme Value Theory 6 (0,808)* Translated Gamma (1) 7 (0,501)* Translated Gamma (0) 7 (0,501)* Historical simulation 5 (0,852)*

Table 4.5: All the methods are tested on the S&P 500 dataset, in total we expect 6 violations for the quantile, q, equal to 0,995. The numbers per method respectively represent the estimated violations, the p-value of the binomial test and in addition the amount of negative skewnesses (between brackets) for the translated Gamma methods.

A (∗) sign indicates non-rejections based on the binomial test.

Conditional method 106 (0,785)* Extreme Value theory 81 (0,005) Extreme Value Theory 103 (0,564)* Translated Gamma (226) 62 (0,000) Translated Gamma (0) 97 (0,237)* Historical simulation 81 (0,005)

Table 4.6: All the methods are tested on the S&P 500 dataset, in total we expect 109 violations for the quantile, q, equal to 0,90. The numbers per method respectively represent the estimated violations, the p-value of the binomial test and in addition the amount of negative skewnesses (between brackets) for the translated Gamma methods. A (∗) sign indicates non-rejections based on the binomial test. Note that the Trans-lated Gamma found a large number of negative skewed distributions. This indicates

unreliable Value-at-Risk estimates (see also chapter 2).

4.1 Analysing the results

To analyse the results from the previous section, we begin with looking at every method and dependent distribution separately. From this, we can reach a conclusion and draw all the results together. Inspecting the findings for the dependent distributions separately from the different methods applied, gives us more than enough information to answer our research question. In the next section we refer to the 99,5% Value-at-Risk as case 1 and the other estimate as case 2 (for our ease).

(35)

Conditional method The conditional method is perfect to check which dependent distribution works best for the time series model. This is because the Value-at-Risk estimate from this method directly depends on this distribution. For case 1, we find one significant test result for the AEX data. This in contrast with the other datasets. There we find significant test results, except not for the model where we use the Standard Normal distribution. For the other case, all the test results are significant except not for the Skew Generalized Error distribution. The combination of this distribution with the conditional method gives overestimates of the Value-at-Risk, which corresponds to too few violations.

When we compare which dependent distribution performs best (in the sense of having the most parameters with the highest significant test results), we clearly find that the Student’s t-distribution outperforms the other two. Most results lay close to the expected violations, which means that the most leptokurtic distribution works best for our data. Also note that the Skew Generalized Error distribution, which is leptokurtic, does not work much better than the Standard Normal distribution. Actually, the Normal distribution works even better in case 2. However for the other case, which is the most important one because of Solvency II, we clearly see that the former distribution fails against the other two. The reason why these two distributions outperform each other sometimes, is because of the symmetry and the peakedness of the innovations distribution (sometimes one fits better than the other).

(Un)conditional Extreme Value Theory method The conditional Extreme Value Theory method gives only insignificant test results in case 1 for the AEX data. How-ever, for the other datasets, we do find significant results. When we look at the tables from this chapter, we clearly find that this method works best in case 2 (in the sense of having the most parameters with the highest significant test results). But also shows accurate output in case 1. It is probably better in estimating the tail of the innovations distribution and of the distribution of the data itself. Therefore we conclude that this method works best given our data and our other methods.

For the unconditional Extreme Value Theory method, we find significant test re-sults in case 1, which is in contrast with case 2. There we only find significant test results for the DAX dataset. However, this is more likely to be an outlier than a real finding. But all in all, using this method for estimating the 99,5% Value-at-Risk works fine.

(Un)conditional translated Gamma approximation method It is hard to say anything about the conditional translated Gamma approximation method when the used distributions show negative skewnesses. Therefore we also include this in our tables.

(36)

When we look at the overall picture, we clearly see that this method performs better in case 2 (except for the S&P 500 dataset), but still, the test results are nearly insignificant. We can also see that this method underestimates the amount of violations in both cases, which indicates too high Value-at-Risk estimates.

Just like the unconditional Extreme Value Theory method, we find that the results of the unconditional translated Gamma approximation method from the first case 1 outperform the ones from case 2. However, for this method, it is very hard to say whether this is a coincidence or not, because the found amount of negative skewnesses is very high (see tables from this chapter). Therefore there are a lot of unreliable estimates, which influences the binomial test. We can also see that the other two unconditional methods perform better in the sense of having more significant test results.

Unconditional historical simulation method Just like when using the uncon-ditional Extreme Value Theory method, we find significant test results for the DAX dataset. However, again, we presume this to be an outlier. On the other side, this method performs well in case 1. The amount of violations lies very close to the expected amount. Moreover, it works even better than the other two methods in this case for the DAX and the S&P 500 datasets.

When we look at all the tables from this chapter simultaneously, we find a pat-tern. In the first case, all the unconditional methods perform well, which is in contrast with the conditional methods. Whereas, using the latter methods for case 2, we do get estimates with significant test results.

This phenomenon is the consequence of outliers. Unconditional methods only use the raw data, which can contain a lot of outliers. So when we try to estimate the 99,5% Value-at-Risk, the resulting values will be relatively high. And as we do not apply the proposed time series model, we see in figure 4.1, the line (which represents the Value-at-Risk estimates for every t ∈ T ) entering the volatility clusterings within the data as it cannot adjust itself in time. This gives more accidentally violations. Therefore, as the unconditional methods fully depend on the used data, we conclude that they are not reliable to estimate the Value-at-Risk.

(37)

Chapter 5

Conclusion

Financial institutions own great amounts of assets which they mostly use to cover their liabilities (Hull, 2012). A lot of these assets are traded on a daily basis, which brings financial risks into a company. These risks could be dangerous for the policyholders, because when the economy collapses, we expect the prices of the assets to drop. This makes it nearly impossible for these institutions to cover their liabilities and that is why the regulators want to know to what extent these institutions are exposed to these risks. However, the question which then arises, is how to imagine this in the sense of how much money to hold.

According to Solvency II, which is the regulation within the insurance branch, insurers and reinsurers have to hold a Solvency Capital Requirement (SCR), which is determined by stressing the risks within a Standard formula or within an internal model. These stresses are calibrated on a 1 year 99,5% Value-at-Risk. Estimation of this risk measure can be done in several ways. In this paper, we compared conditional and un-conditional methods. For the un-conditional methods a GARCH(1,1)-AR(1) time series model is estimated, where we compared three different distributions for the innovations (varying from a mesokurtic distribution to a leptokurtic distribution). Moreover, we applied unconditional methods and used two different quantiles (90% and 99,5% Value-at-Risk). In this way, we were able to see when a certain method in combination with a degree of peakedness of the dependent distribution of the innovations performs better than another method, in the sense of estimating the Value-at-Risk, and we can conclude why.

To give an adequate answer to the question, which method performs best under different assumptions for the distribution of the innovations and different quantiles, we backtested on three datasets (AEX returns, DAX returns and S&P 500 returns). With the help of a binomial test we compared these methods. After examining the results, we

(38)

conclude the following, which generally holds for all three datasets:

• The unconditional methods outperform the conditional methods in our backtest un-der all given assumptions about the dependent distribution of the time series and only for the 99,5% quantile.

This is mainly caused by outliers. Unconditional methods only depend on the distribution of the data (which often contains outliers and high observations) for estimating the value of the quantile. So when we estimate the 99,5% Value-at-Risk, the resulting values are relatively high. And as we do not apply the pro-posed AR(1)-GARCH(1,1) time series model, we see in figure 4.1, the line (which represents all the Value-at-Risk estimates) entering the volatility clustering within the data as it cannot adjust itself in time. This gives more accidentally violations (which are defined as Value-at-Risk estimates smaller than the observed values), therefore using these methods is not reliable (they depend too much on the data). • Conditional methods perform better for the 90% quantile than for the 99,5%

quan-tile of the Value-at-Risk in our backtest1.

Our conditional methods take the volatility clustering of the data into account because of the fitted time series model. They also look at the behaviour of the in-novations distribution. When estimating the 90% quantile for the Value-at-Risk, we are more exposed to the volatility change than when estimating the 99,5% Value-at-Risk (the estimates lay lower). Therefore we also get more violations. In other words, when there is a sudden increase in volatility in the data, we expect the GARCH(1,1)-AR(1) time series model and the used method to adjust to the situation (so our Value-at-Risk estimate is ’protected’ on two fronds). On the other hand, the unconditional methods can not cope with this problem, which is shown as a violation clustering inside figure 4.1.

• The assumption of Student’s t-distributed innovations in the time series holds the most accurate Value-at-Risk estimates for all methods in our backtest.

The tables 4.1-4.6, where we can find the results from the backtest, we clearly see when using the Student’s t-distribution as the dependent distribution for the time series model, we obtain the most significant test results. This means that the p-value of the estimated violations is greater than 0,05 and therefore significant. In other words, the distribution of the innovations from the time series model is more leptokurtic than mesokurtic.

1

Note that we make 1087 Value-at-Risk estimates for every dataset, increasing this number could change this conclusion.

(39)

However, our results also show that using a Skew Generalized Error distribution as dependent distribution (which is more leptokurtic than mesokurtic) not always gives more reliable Value-at-Risk estimates in comparison with using a Standard Normal distribution (a mesokurtic distribution).

• Unconditional methods hold poor estimates for the 90% quantile in our backtest. As already mentioned, the unconditional methods slowly adjust to sudden increases in volatility within the data. Therefore when estimating the 90% quantile (which holds lower Value-at-Risk estimates than for the 99,5% quantile), we are more exposed to the high observations and this determines the amount of violations. When there are a lot of this kind of observations, we get more violations than expected (but also the other way around).

• The Extreme Value Theory method outperforms the other (un)conditional methods in our backtest.

Using the Extreme Value Theory unconditionally gives insignificant test results for the 90% quantile. However, we do get the most significant test results (in comparison with the other unconditional methods) if we want to know the 99,5% quantile of the Value-at-Risk2 _{(see tables 4.1, 4.3 and 4.5). For the 90% quantile,}

we find that the conditional Extreme Value Theory method outperforms the other methods for every dependent distribution of the innovations. The reason for this, is that this method can perform for both symmetric and asymmetric tails, as it fits a distribution in it that suits best. Therefore we conclude, given our results, that the (un)conditional Extreme Value method performs best, optionally in combination with a Student’s t-distribution as dependent distribution of the innovations.

2

(40)

Appendix A - VaR estimates

0

Appendix B - R code

#========WTW=================packages-and-data-input========

suppressPackageStartupMessages(library(fGarch)) # fgarch package loading suppressPackageStartupMessages(library(evir)) # Evir package loading

suppressPackageStartupMessages(library(fBasics)) # fbasics package loading suppressPackageStartupMessages(library(zoom)) # zoom package loading

suppressPackageStartupMessages(library(moments)) # moments package loading setwd("C:\Towers Watson Professional Development)

msc <- read.csv2("data_Msc.csv", header = T) #this can be altered (input) Msc.data <- msc[c(1:2088),c(5:6)]

Msc.data[,1] <-as.Date(as.character(Msc.data[,1]),format="%d-%m-%Y") #reading-in data.

#we use negative log returns, so positive numbers represent losses! #========WTW===========================tests================== #par(mfrow=c(1,2))

fit@fit$matcoef[3,1]+fit@fit$matcoef[4,1] #(McNeil, 2000)

Box.test(-diff(log(Msc.data[,2]))[c(1001:2087)], lag = 3, type = c("Box-Pierce", "Ljung-Box"), fitdf = 0) #independence test #========WTW====vectors-and-fitting-Garch(1,1)-AR(1)-model======= mu_next <- rep(1,(length(Msc.data[,2]))-1001) sigma_next <-rep(1,(length(Msc.data[,2]))-1001) cond_sged_var <-rep(1,(length(Msc.data[,2]))-1001) cond_std_var<-rep(1,(length(Msc.data[,2]))-1001)

(49)

cond_snorm_var <-rep(1,(length(Msc.data[,2]))-1001) invers_sged <-rep(1,(length(Msc.data[,2]))-1001) invers_st <-rep(1,(length(Msc.data[,2]))-1001) invers_snorm <-rep(1,(length(Msc.data[,2]))-1001) qua_est_quantile_k_gains <- rep(1,(length(Msc.data[,2]))-1001) cond_evt_sged_var<-rep(1,(length(Msc.data[,2]))-1001) cond_evt_std_var<-rep(1,(length(Msc.data[,2]))-1001) cond_evt_snorm_var<-rep(1,(length(Msc.data[,2]))-1001) qua_est_quantile_k_gains_uncond<-rep(1,(length(Msc.data[,2]))-1001) cond_shifted_gamma_snorm <- rep(1,(length(Msc.data[,2]))-1001) cond_shifted_gamma_sged <- rep(1,(length(Msc.data[,2]))-1001) cond_shifted_gamma_std <- rep(1,(length(Msc.data[,2]))-1001) uncond_shifted_gamma <- rep(1,(length(Msc.data[,2]))-1001) skewness <- rep(1,(length(Msc.data[,2]))-1001) skewnesss <- rep(1,(length(Msc.data[,2]))-1001) uncond_hist_sim <-rep(1,(length(Msc.data[,2]))-1001) #empty vectors

quantile <- 0.90 #quantile we wish to see (input). afh.distr<-toString("std"); #std/sged/snorm (input). for (i in 1:(length(Msc.data[,2])-1001)){ fit <- garchFit(~arma(1,0)+garch(1,1), mean=0, include.mean=FALSE, data = -diff(log(Msc.data[,2]))[c((i):(999+i))], cond.dist=afh.distr); #fitting parameters. mu_next[i] <- predict(fit,n.ahead=1)[[1]]; sigma_next[i] <- predict(fit,n.ahead=1)[[3]]; #========WTW===========Conditional-(distribution)================

if(afh.distr == "sged"){invers_sged[i] <- qsged(c(quantile), mean=0, sd=1) } else if(afh.distr == "std"){invers_st[i]

<-sqrt((fit@fit$matcoef[5,1]-2)/

Comparison between conditional and unconditional methods to estimate the Value-at-Risk of nancial data