
Estimation of Value at Risk for serial dependent data using an Extreme Value Theory approach

H.S.M. Koo

Master’s Thesis to obtain the degree in Actuarial Science and Mathematical Finance
University of Amsterdam
Faculty of Economics and Business
Amsterdam School of Economics

Author: H.S.M. Koo

Student nr: 11401923

Email: hsm.koo@gmail.com

Date: December 12, 2018

Supervisor: mw. dr. L. Yang
Second reader: dhr. dr. S.U. Can


Statement of Originality

This document is written by Student Helga Koo who declares to take full responsibility for the contents of this document. I declare that the text and the work presented in this document are original and that no sources other than those mentioned in the text and its references have been used in creating it. The Faculty of Economics and Business is responsible solely for the supervision of completion of the work, not for the contents.


Abstract

In Extreme Value Theory, two common approaches can be found for the estimation of the Value at Risk (VaR). One considers block maxima, while the other uses threshold exceedances to model the tail distribution. They are referred to as the block maxima (BM) method and the peak-over-threshold (POT) method respectively. Under i.i.d. assumptions, both methods have their own strengths and weaknesses. However, financial time series typically exhibit serial dependence features such as volatility clustering. Hence, non-i.i.d. assumptions should be applied in order to estimate the VaR. The theory suggests the inclusion of the extremal index in the BM method and an adjusted variance in the POT method. This might result in a preference of one method over the other. The empirical findings are obtained from a simulation study using Student t distributions and GARCH(1,1) models. In addition, a financial application with the Dow Jones Industrial Average is included. The results give us better insight into the behavior of both methods under different model assumptions. However, neither of the methods convincingly outperforms the other.

Keywords Value at Risk, stationary time series, GARCH model, block maxima, threshold exceedances, extremal index


Contents

Preface
1 Introduction
2 Extreme Value Theory
  2.1 Value at Risk
  2.2 Block maxima theory
    2.2.1 Block maxima - i.i.d.
    2.2.2 Block maxima - serial dependence
  2.3 Peak-over-threshold theory
3 Value at Risk estimation
  3.1 Block maxima method
  3.2 Peak-over-threshold method
  3.3 Evaluation criteria
4 Student t and GARCH models
  4.1 Simulation study design
  4.2 Results Value at Risk - i.i.d.
  4.3 Results Value at Risk - serial dependence
5 Dow Jones Industrial Average
  5.1 Descriptive statistics
  5.2 Results Value at Risk
6 Conclusion
References
Appendix
  A MATLAB code: pre-simulation
  B MATLAB code: simulation study - i.i.d.
  C MATLAB code: simulation study - serial dependence
  D MATLAB code: Dow Jones Industrial Average
  E MATLAB code: other functions


Preface

During my studies I started to have an interest in risk management. Therefore, I was very excited to dive into a specific topic in this field and to conduct research. The resulting thesis is written as a completion of the master’s degree in ‘Actuarial Science and Mathematical Finance’ at the University of Amsterdam.

This thesis allowed me to put all the knowledge and skills that I have gathered over the past few years into practice. It also pushed me to my limits and made me face and solve difficulties along the way. It has been a challenging and valuable journey, in which I not only learned a lot about the subject but also got to know myself better as a person.

Several people have contributed academically and practically to this master’s thesis. First, I would like to thank my supervisor Lu Yang for her time, valuable input and guidance throughout the entire process. I would also like to thank my second reader Umut Can for his time to read and comment on my thesis. In addition, I am grateful for the love and motivation that my friends gave me. Finally, I want to express my gratitude to my family for always being supportive during my entire studies.


Chapter 1

Introduction

The global financial crisis and the European sovereign debt crisis are two main events from the last decade that illustrated the importance of risk management. In order to manage risk effectively, we need to gain insight into extreme events. The Value at Risk (VaR) can be used to assess this risk. It is defined as a high quantile of the loss distribution and can be interpreted as the largest expected loss for a risky asset or portfolio over a specific period of time at a given confidence level. The VaR was first developed at JPMorgan and later became the most widely used risk measure in the financial industry. It plays an important role in the Basel regulatory framework and has also been used in Solvency II.

When we consider extreme events, the concern lies in the small number of observations in the tail of the distribution. For this reason, we rely on Extreme Value Theory (EVT), which provides a theoretical foundation for building models that focus on the tail of the distribution. Two main methods can be found in EVT to model extreme values under the assumption of independent and identically distributed (i.i.d.) observations. The most traditional method is referred to as the block maxima (BM) method. The data is split into non-overlapping blocks and the maximum of each block is taken. The resulting block maxima approximately follow the Generalized Extreme Value (GEV) distribution. An alternative method is the peak-over-threshold (POT) method. All observations exceeding some (high) threshold are considered and fitted by a generalized Pareto distribution (GPD). The latter method uses the data on extreme outcomes more efficiently, which could be beneficial in practical applications.

In the real world, financial data do not satisfy the assumption of i.i.d. random variables (see e.g. Cont (2001)). Financial time series typically exhibit serial dependence such as volatility clustering. Large observations from a cluster provide less information about the distribution than the same number of large independent observations. This could decrease the estimation accuracy of both methods described above. Therefore, the estimation of the VaR should be adjusted by taking the serial dependence into account. According to Bücher & Segers (2018), the block maxima can still be considered i.i.d., as these observations are mostly scattered in time and show very weak dependence. Thus, the advantage of the BM method is that the asymptotic properties of the estimator


remain valid in a non-i.i.d. setting. However, an additional estimation of the extremal index is required, which makes the estimation procedure more complicated. The extremal index can be interpreted as the reciprocal of the mean cluster size. This extra estimation step could lead to an increase in the estimation error. Following Drees et al. (2003), the POT method has the advantage that the estimation of the VaR remains unchanged for non-i.i.d. data. However, the asymptotic variance needs to be adjusted and usually turns out to be higher.

To summarize, the BM method and the POT method both have their strengths and weaknesses. In this research we aim to compare both methods in order to determine which one performs better in the estimation of the VaR for serial dependent data. We use an empirical approach and are particularly interested in the performance of the methods for small samples. A simulation study is carried out in which data generating processes possessing serial dependence are constructed. The GARCH(1,1) model is such a process and is often used for the modeling of financial time series. In addition, we apply the methods to data from the Dow Jones Industrial Average.

From our simulation results we conclude that the methods perform quite similarly for i.i.d. data. The POT method yields slightly better results, but the differences are small. Under non-i.i.d. assumptions, we use the intervals estimator and the sliding blocks estimator to estimate the extremal index and find better results for the latter. Comparing the BM method and the POT method themselves, it is more difficult to declare a winner. While the BM method performs better in terms of the variance and mean squared error (MSE), the coverage probabilities of its confidence intervals turn out to be low. On the other hand, the point estimates from the POT method yield better bias results, but the adjusted variance becomes very large. This also results in very large MSE values and confidence intervals. Hence, the results of our research do not convince us that one method is better than the other.

The structure of the thesis is as follows. Chapter 2 contains the literature review and introduction of the two methods in both the i.i.d. and serial dependence setting. In Chapter 3 we elaborate on both methods in order to estimate the VaR and specify the evaluation criteria. Our simulation study can be found in Chapter 4 and a financial application is presented in Chapter 5. The conclusion of the research is discussed in Chapter 6.


Chapter 2

Extreme Value Theory

Our interest lies in the estimation of quantiles in the tail of the marginal distribution of financial return series. We review the EVT and focus particularly on two methods to model extreme values. The first method makes use of block maxima, while the second method considers threshold exceedances.

2.1 Value at Risk

In the early 1990s JPMorgan introduced the VaR which was part of their RiskMetrics system. It became the standard measure in risk management; see Duffie & Pan (1997) for a general overview of VaR. The VaR can be used both by financial institutions to assess their risks and by a regulator to set margin requirements.

We will define the VaR within a probabilistic framework (see e.g. Tsay (2007)). Suppose that at time t we are interested in the risk of a financial position for the next $\ell$ periods. Let $L(\ell)$ be the loss function and $F_\ell(x)$ the cumulative distribution function (CDF) of $L(\ell)$. For any univariate $F_\ell(x)$ and probability q, such that 0 < q < 1, we call the quantity
\[ x_q = \inf\{x \mid F_\ell(x) > q\} \tag{2.1} \]
the qth quantile of $F_\ell(x)$, where inf refers to the smallest real number x satisfying $F_\ell(x) > q$. When the loss function $L(\ell)$ with CDF $F_\ell(x)$ is continuous, we can write
\[ q = \Pr(L(\ell) \le x_q). \tag{2.2} \]
If the CDF $F_\ell(x)$ is known, we can define $x_q$ in (2.2) as the VaR. Hence, the VaR is the qth quantile of the CDF of the loss function $L(\ell)$. The VaR is also often referred to as the upper pth quantile, with p = 1 − q the upper tail probability of the loss distribution.

In risk management, values such as p = 0.05 or p = 0.01 are often chosen, while p = 0.001 is used for stress testing. The choice of the time horizon $\ell$ might be set by the regulator and will depend on the type of risk. For market risk the time horizon is often set to 1 day or 10 days, for credit risk to 1 year or 5 years. In practice the CDF $F_\ell(x)$ remains


unknown and therefore needs to be estimated. We will especially focus on the upper tail behavior of the loss CDF and discuss two estimation methods.
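As a small illustration of the quantile definition above, the following sketch computes an empirical VaR directly from a vector of losses. It is a minimal sketch rather than the thesis code from the appendices; the simulated loss vector and the use of MATLAB's built-in quantile function (which interpolates between order statistics instead of taking the exact infimum in (2.1)) are assumptions made here purely for illustration.

    % Minimal sketch: the VaR as a high quantile of the loss distribution.
    rng(1);
    losses = trnd(4, 3024, 1);          % hypothetical i.i.d. Student t(4) losses
    p      = 0.01;                      % upper tail probability
    VaR99  = quantile(losses, 1 - p);   % empirical (1-p)th quantile, i.e. VaR_0.99
    fprintf('Empirical VaR_0.99 = %.3f\n', VaR99);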

2.2 Block maxima theory

2.2.1 Block maxima - i.i.d.

Let $(X_t, t \in \mathbb{Z})$ be random variables which represent daily financial losses. Initially, we impose the assumption that these returns are independent and identically distributed with an unknown underlying continuous distribution F. We divide the observations into k blocks with n observations in each block. The maximum observation in each block is denoted as $M_n = \max(X_1, \ldots, X_n)$. Applying the EVT, we fit the GEV distribution to the block maxima. We assume that the block maxima $M_n$ show regular limiting behavior, which means that there exist sequences of real constants $b_n$ and $a_n > 0$ such that
\[ \lim_{n\to\infty} P\{(M_n - b_n)/a_n \le x\} = \lim_{n\to\infty} F^n(a_n x + b_n) = H(x) \tag{2.3} \]

for a non-degenerate distribution function H(x). When (2.3) holds, F is said to be in the maximum domain of attraction of H: F ∈ MDA(H).

The GEV distribution is defined as
\[ H_\xi(x) = \begin{cases} \exp\left(-(1+\xi x)^{-1/\xi}\right) & \text{if } \xi \neq 0 \\ \exp\left(-\exp(-x)\right) & \text{if } \xi = 0, \end{cases} \tag{2.4} \]
where $1 + \xi x > 0$. The case $\xi = 0$ is interpreted as the limit of the distribution function as $\xi \to 0$. The GEV distribution corresponds to the Weibull, Gumbel and Fréchet distributions in the cases $\xi < 0$, $\xi = 0$ and $\xi > 0$ respectively. We are allowed to fit the GEV distribution to the block maxima when the following holds.

Theorem 2.2.1 (Fisher-Tippett, Gnedenko). If F ∈ MDA(H) for some non-degenerate df H, then H must be a distribution of type Hξ, i.e. a GEV distribution.

In addition, the following theorem applies for the Fréchet distribution.

Theorem 2.2.2 (Fréchet MDA, Gnedenko). If $\xi > 0$,
\[ F \in \mathrm{MDA}(H_\xi) \iff 1 - F(x) = x^{-1/\xi} L(x) \]
for some slowly varying measurable function $L : (0, \infty) \to (0, \infty)$ such that
\[ \lim_{x\to\infty} \frac{L(tx)}{L(x)} = 1 \quad \text{for all } t > 0. \]

Theorem 2.2.2 suggests that the tail of the original distribution F decays at a power rate. Hence, the Fréchet distribution can be considered heavy-tailed. This distribution has a finite jth moment if j < 1/ξ, where ξ is referred to as the tail index.


In the literature, estimates of ξ smaller than 0.25 are often found for financial data (see e.g. Jansen & de Vries (1991), Loretan & Phillips (1994) and Lux (1996)). For this reason, the Fréchet distribution will be of interest for our research. In practice, the true distribution is unknown, which also applies to the normalizing constants $a_n$ and $b_n$ in (2.3). Therefore, we introduce the location parameter μ and the positive scale parameter σ, which results in the three-parameter form of the GEV distribution

\[ H_{\xi,\mu,\sigma}(x) = \begin{cases} \exp\left(-\left(1+\xi\frac{x-\mu}{\sigma}\right)^{-1/\xi}\right) & \text{if } \xi \neq 0 \\ \exp\left(-\exp\left(-\frac{x-\mu}{\sigma}\right)\right) & \text{if } \xi = 0. \end{cases} \tag{2.5} \]
Two well-known methods used to fit the GEV distribution to the block maxima are maximum likelihood estimation (Dombry (2015)) and probability weighted moments (Ferreira & De Haan (2015)).
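As an illustration of the block maxima step, the sketch below splits a loss series into non-overlapping blocks, takes the block maxima and fits the three-parameter GEV by maximum likelihood with MATLAB's gevfit. It is a minimal sketch rather than the code from the appendices; the simulated Student t losses and the chosen block size are assumptions made here for illustration.

    % Minimal sketch: block maxima and GEV fit (Statistics Toolbox required).
    rng(1);
    X = trnd(4, 3024, 1);                        % illustrative i.i.d. losses, 12 x 252 days
    n = 21;                                      % block size (about one trading month)
    k = floor(numel(X)/n);                       % number of blocks
    M = max(reshape(X(1:k*n), n, k), [], 1)';    % block maxima M_n1, ..., M_nk
    parm  = gevfit(M);                           % ML estimates [shape xi, scale sigma, location mu]
    xi    = parm(1); sigma = parm(2); mu = parm(3);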

2.2.2 Block maxima - serial dependence

We relax the i.i.d. assumption and consider strictly stationary time series with serial dependence. For notational convenience, we define $(\tilde X_t)$ as the stationary time series and $(X_t)$ as the associated i.i.d. series with the same marginal distribution F. Likewise, the block maxima are denoted as $\tilde M_n = \max(\tilde X_1, \ldots, \tilde X_n)$ and $M_n = \max(X_1, \ldots, X_n)$ respectively.

There are two conditions under which the maxima $\tilde M_n$ and $M_n$ have identical limiting behavior (Leadbetter (1983)). First, if the stationary series only shows weak long-range dependence and no tendency of cluster-forming for large values, then the maxima of both series have exactly the same limiting behavior. Second, if $F \in \mathrm{MDA}(H_\xi)$ for some ξ, then $H_\xi$ will also be the natural limiting distribution for the normalized block maxima of the stationary series. Moreover, the normalizing sequences will be identical for both the stationary and the associated i.i.d. series.

However, the conditions above are not met for financial time series, as clusters of large values are usually involved. Recently, Bücher & Segers (2018) found valid reasons to regard block maxima from stationary series as i.i.d. An intuitive explanation is that those observations are usually scattered in time with very weak dependence. Bücher & Segers (2018) show that the maximum likelihood estimator remains consistent and asymptotically normal. However, the location and scale for the block maxima from stationary time series are different. This is due to the extremal index θ ∈ [0, 1], which characterizes the extremal serial dependence and results in an additional estimation step.

McNeil (1998) states that the stationary time series $(\tilde X_t)$ has extremal index θ when there is a sequence $u_n(\tau)$ such that the following hold
\[ \lim_{n\to\infty} n\,(1 - F(u_n(\tau))) = \tau, \qquad \lim_{n\to\infty} P\{M_n \le u_n(\tau)\} = \exp(-\tau), \qquad \lim_{n\to\infty} P\{\tilde M_n \le u_n(\tau)\} = \exp(-\theta\tau) \tag{2.6} \]


for every τ > 0. Note that θ = 1 means that there is no tendency to cluster and hence refers to the i.i.d. setting. For large n, we derive the following property
\[ P\{\tilde M_n \le u_n(\tau)\} \approx P^{\theta}\{M_n \le u_n(\tau)\} = F^{n\theta}(u_n(\tau)). \tag{2.7} \]
This property can be interpreted in such a way that the maximum of n observations from a stationary series with extremal index θ behaves like the maximum of nθ observations from the associated i.i.d. series. Put differently, in n observations we would have nθ pseudo-independent clusters. Therefore, θ can also be interpreted as the reciprocal of the mean cluster size. Applying (2.7) to the block maxima leads to Theorem 2.2.3.

Theorem 2.2.3. If $(\tilde X_n)$ is stationary with extremal index θ > 0, then
\[ \lim_{n\to\infty} P\{(M_n - b_n)/a_n \le x\} = H(x) \]
for a non-degenerate H(x) if and only if
\[ \lim_{n\to\infty} P\{(\tilde M_n - b_n)/a_n \le x\} = H^{\theta}(x) \]
with $H^{\theta}(x)$ non-degenerate.

This shows that when $F \in \mathrm{MDA}(H_\xi)$, the asymptotic distribution of the normalized maxima of the stationary series $(\tilde X_t)$ can be considered to be an extreme value distribution as well. However, the distribution function is raised to the power θ, resulting in different location and scale parameters (see e.g. Chavez-Demoulin & Davison (2012)). If the GEV parameters for the i.i.d. series $(X_t)$ are denoted as ξ, μ and σ, then the GEV parameters for the stationary series $(\tilde X_t)$ are given as
\[ \tilde\xi = \xi, \qquad \tilde\mu = \mu - \frac{\sigma}{\xi}\left(1 - \theta^{\xi}\right), \qquad \tilde\sigma = \sigma\theta^{\xi}. \tag{2.8} \]
Several methods for the estimation of the extremal index can be found in the literature (see e.g. Hsing (1993), Smith & Weissman (1994), Süveges et al. (2010) and Robert et al. (2009)). A comparison between various extremal index estimators takes place in Fawcett & Walshaw (2012). Their work favors the block maxima estimator and the intervals estimator. Berghaus & Bücher (2016) further analyze the block maxima estimator and specify the disjoint and sliding blocks estimators. The advantage is that these estimators align with the block maxima method, as they require a parameter for the block size as well. Berghaus & Bücher (2016) also compare numerous estimators and find the best performance in the sliding blocks estimator.
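To make the adjustment in (2.8) concrete, the short sketch below maps i.i.d. GEV parameters to their stationary counterparts for a given extremal index. The numerical values of ξ, μ, σ and θ are purely illustrative assumptions.

    % Minimal sketch of the parameter mapping in (2.8); all values are illustrative.
    xi    = 0.25;  mu = 3.0;  sigma = 1.0;        % hypothetical i.i.d. GEV parameters
    theta = 0.5;                                  % hypothetical extremal index
    xi_s    = xi;                                 % shape is unchanged
    mu_s    = mu - (sigma/xi) * (1 - theta^xi);   % shifted location
    sigma_s = sigma * theta^xi;                   % rescaled scale
    % With these numbers: theta^xi ~ 0.841, so mu_s ~ 2.364 and sigma_s ~ 0.841.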


2.3 Peak-over-threshold theory

Recall $(X_t)$ as the i.i.d. random variables for financial losses with distribution function F. For a specified threshold u, we obtain excess losses with underlying distribution
\[ F_u(x) = P\{X - u \le x \mid X > u\} = \frac{F(x+u) - F(u)}{1 - F(u)}, \tag{2.9} \]
where $0 \le x < x_F - u$ and $x_F \le \infty$ denotes the right endpoint of F.

Since only a small number of observations lie above the threshold, estimation of the distribution $F_u$ could be difficult. In this case, we rely on Theorem 2.3.1 from the EVT.

Theorem 2.3.1 (Pickands-Balkema-de Haan). Let F be a distribution function such that $F \in \mathrm{MDA}(H_\xi)$. Then the conditional excess distribution function $F_u(x)$ satisfies, for u large,
\[ \lim_{u \to x_F} F_u(x) = G_{\xi,\beta}(x), \qquad \text{where } G_{\xi,\beta}(x) = \begin{cases} 1 - (1 + \xi x/\beta)^{-1/\xi} & \text{if } \xi \neq 0 \\ 1 - \exp(-x/\beta) & \text{if } \xi = 0 \end{cases} \]
for β > 0, and x ≥ 0 when ξ ≥ 0 and 0 ≤ x ≤ −β/ξ when ξ < 0. $G_{\xi,\beta}$ is the so-called Generalized Pareto distribution (GPD).

The parameters ξ and β are also named the shape and scale parameter respectively. When ξ > 0, the GPD is considered heavy-tailed and equal to an ordinary Pareto distribution with parameters α = 1/ξ and κ = β/ξ. When the distribution of normalized maxima converges to the GEV distribution with shape parameter ξ, the distribution of excess losses converges to the GPD distribution with the same shape parameter ξ.

We can again make use of maximum likelihood estimation or probability weighted moments to fit the GPD to the data. The maximum likelihood estimation is easy to implement under i.i.d. assumptions, as the joint density is a product of the marginal GPD densities. An alternative is provided by Hill et al. (1975), where we estimate the extreme value index α = 1/ξ which indirectly leads to an estimate for the shape parameter ξ.

When we consider stationary time series with volatility clustering, the joint density is no longer equal to the product of the marginal densities. An easy approach is to neglect this misspecification, which results in quasi-maximum likelihood estimation (QML). The estimates remain consistent, but the standard errors may be too small. Drees et al. (2003) show that the POT approach still holds under serial dependence. Nonetheless, a different, usually higher, asymptotic variance should be taken into account.


Chapter 3

Value at Risk estimation

We first provide the methodology for the estimation of the VaR and corresponding confidence intervals using the BM method. This also contains the estimation of the extremal index. Next, we discuss the estimation of the VaR using the POT method, which includes the adjustment of the variance and the construction of the confidence intervals. An overview of our evaluation criteria can be found at the end.

3.1 Block maxima method

Let N be the total number of observations, which we divide into blocks. When we determine the number of blocks k and the size of the blocks n, a trade-off between bias and variance takes place. A large value of k means more block maxima and results in a larger bias yet a lower variance in the parameter estimates. Conversely, a large value of n leads to a better approximation of the block maxima distribution by a GEV distribution and therefore a lower bias but a larger variance in the parameter estimates.

We denote the block maxima as $M_{n1}, \ldots, M_{nk}$ and use maximum likelihood estimation in order to fit the GEV distribution $H_{\xi,\mu,\sigma}$ to the block maxima. Let $m_{n1}, \ldots, m_{nk}$ be the realizations of the block maxima and $h_{\xi,\mu,\sigma}$ be the density of the GEV distribution. We write the log-likelihood as
\[ l(\xi,\mu,\sigma; m_{n1},\ldots,m_{nk}) = \sum_{i=1}^{k} \ln h_{\xi,\mu,\sigma}(m_{ni}) = -k\ln\sigma - \left(1+\frac{1}{\xi}\right)\sum_{i=1}^{k}\ln\left(1+\xi\frac{m_{ni}-\mu}{\sigma}\right) - \sum_{i=1}^{k}\left(1+\xi\frac{m_{ni}-\mu}{\sigma}\right)^{-1/\xi}. \tag{3.1} \]
Using the parameter estimates from the maximum likelihood, we can estimate the VaR. Let p be a small upper tail probability and $x_{1-p}$ the (1 − p)th quantile of the

return series. When we use the relationship in (2.7) with θ = 1, we obtain

\[ P\{M_n \le x_{1-p}\} = \left(P\{X_t \le x_{1-p}\}\right)^n = (1-p)^n. \tag{3.2} \]


Filling this into the GEV distribution defined in (2.5) leads to
\[ (1-p)^n = \begin{cases} \exp\left(-\left(1+\xi\frac{x_{1-p}-\mu}{\sigma}\right)^{-1/\xi}\right) & \text{if } \xi \neq 0 \\ \exp\left(-\exp\left(-\frac{x_{1-p}-\mu}{\sigma}\right)\right) & \text{if } \xi = 0, \end{cases} \tag{3.3} \]
where $1+\xi\frac{x_{1-p}-\mu}{\sigma} > 0$ for ξ ≠ 0. We can rewrite (3.3) to
\[ x_{1-p} = \begin{cases} \mu - \frac{\sigma}{\xi}\left(1 - \left(-n\ln(1-p)\right)^{-\xi}\right) & \text{if } \xi \neq 0 \\ \mu - \sigma\ln\left(-n\ln(1-p)\right) & \text{if } \xi = 0. \end{cases} \tag{3.4} \]
The quantile $x_{1-p}$ given an upper tail probability p is called the VaR of the original return series. Since we are interested in the case ξ ≠ 0, we estimate the VaR by filling in the MLE parameter estimates in (3.4) and obtain
\[ \hat x_{1-p} = \hat\mu - \frac{\hat\sigma}{\hat\xi}\left(1 - \left(-n\ln(1-p)\right)^{-\hat\xi}\right). \tag{3.5} \]
For the construction of the asymptotic confidence intervals, we follow Drees et al. (2003). The confidence intervals for the VaR under i.i.d. assumptions are defined as
\[ \left[\hat x_{1-p}\exp\left(-z_{\alpha/2}\,\hat\xi\,(k-1)^{-1/2}\log\left((k-1)/(np)\right)\right),\; \hat x_{1-p}\exp\left(z_{\alpha/2}\,\hat\xi\,(k-1)^{-1/2}\log\left((k-1)/(np)\right)\right)\right], \tag{3.6} \]
where $z_{\alpha/2}$ is the (1 − α/2) quantile of the standard normal distribution.
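Continuing the gevfit sketch from Section 2.2.1, the snippet below evaluates the point estimate (3.5) and the interval (3.6). It is a hedged sketch: the variables xi, sigma, mu, n and k come from that earlier illustrative fit, and the product "np" in the logarithm of (3.6) is interpreted here as the total sample size N times p, in line with (3.12) and (3.20); this interpretation is an assumption.

    % Minimal sketch of (3.5) and (3.6); xi, sigma, mu, n, k as in the earlier sketch.
    p     = 0.01;                                              % upper tail probability
    N     = k * n;                                             % total number of observations
    VaRbm = mu - sigma/xi * (1 - (-n*log(1-p))^(-xi));         % point estimate (3.5)
    z     = norminv(0.975);                                    % z_{alpha/2} for a 95% interval
    w     = z * xi * sqrt(1/(k-1)) * log((k-1)/(N*p));         % assumes Np in (3.6), cf. (3.12)
    CIbm  = VaRbm * [exp(-w), exp(w)];                         % asymptotic confidence interval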

When we consider stationary time series, we extend the method with the estimation of the extremal index. We follow Berghaus & Bücher (2016) and implement the estimators that perform best in most scenarios. This results in the intervals estimator (Ferro & Segers (2003)) and the sliding blocks estimator (Berghaus & Bücher (2016)). While the sliding blocks estimator does not rely on any parameter choice other than the block size, the intervals estimator requires a parameter for the threshold.

We begin with the introduction of the intervals estimator. Let N again be the total number of observations and k the number of observations exceeding threshold u. We fix the number k to the same number of block maxima. Given the threshold exceedances, we obtain the exceedance times $S_1 < \ldots < S_k$ and the inter-exceedance times $T_i = S_{i+1} - S_i$, $i = 1, \ldots, k-1$. The intervals estimator is defined as
\[ \hat\theta_{\mathrm{int}} = \frac{2\left(\sum_{i=1}^{k-1}(T_i - a)\right)^2}{(k-1)\sum_{i=1}^{k-1}(T_i - b)(T_i - c)}, \tag{3.7} \]
where a = b = c = 0 if the largest inter-exceedance time is no greater than 2, and a = b = 1, c = 2 otherwise.
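A compact sketch of the intervals estimator (3.7) is given below. It assumes a loss vector X and a number of exceedances k as inputs; the final capping of the estimate at 1 is a common practical convention that is not stated explicitly in the text.

    % Minimal sketch of the intervals estimator (3.7); X and k are assumed inputs.
    Xs = sort(X, 'descend');
    u  = Xs(k+1);                                % threshold giving exactly k exceedances
    S  = find(X > u);                            % exceedance times S_1 < ... < S_k
    T  = diff(S);                                % inter-exceedance times, k-1 of them
    if max(T) <= 2
        theta_int = 2*sum(T)^2   / ((k-1)*sum(T.^2));           % a = b = c = 0
    else
        theta_int = 2*sum(T-1)^2 / ((k-1)*sum((T-1).*(T-2)));   % a = b = 1, c = 2
    end
    theta_int = min(theta_int, 1);               % cap at 1 (practical convention, assumption)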

The sliding blocks estimator requires k blocks of length n, such that N = kn. The values of k and n are the same as those already used at the


beginning of the BM method. We divide the sample into N − n + 1 sliding blocks of length n, for t = 1, ..., N − n + 1. For each block, we take the maximum
\[ M_t^{sl} = M_{t:(t+n-1)} = \max(X_t, \ldots, X_{t+n-1}). \tag{3.8} \]
Following Berghaus & Bücher (2016), we use these block maxima to compute
\[ \hat N_t^{sl} = \hat F(M_t^{sl}) \qquad \text{and} \qquad \hat Y_t^{sl} = -n\log(\hat N_t^{sl}), \tag{3.9} \]
where $\hat F(x) = N^{-1}\sum_{s=1}^{N} \mathbf{1}(X_s \le x)$ denotes the empirical CDF of $(X_n)$. The sliding blocks estimator is formulated as
\[ \hat\theta^{sl} = \left(\frac{1}{N-n+1}\sum_{t=1}^{N-n+1}\hat Y_t^{sl}\right)^{-1}. \tag{3.10} \]
Using the estimates of the extremal index, we compute the VaR in a similar way as under the i.i.d. assumption. Again we apply the relation defined in (2.7) to obtain the VaR. The (1 − p)th quantile of F(x) is the $(1-p)^{n\theta}$th quantile of the GEV distribution for the stationary time series. By MLE we directly obtain the parameters $\tilde\xi$, $\tilde\mu$ and $\tilde\sigma$ from (2.8), and we can estimate the VaR as
\[ \hat x_{1-p} = \tilde\mu - \frac{\tilde\sigma}{\tilde\xi}\left(1 - \left(-n\hat\theta\ln(1-p)\right)^{-\tilde\xi}\right). \tag{3.11} \]
The asymptotic confidence intervals are constructed in a similar way as under i.i.d. assumptions and are defined as
\[ \left[\hat x_{1-p}\exp\left(-z_{\alpha/2}\,\tilde\xi\,(k-1)^{-1/2}\log\left((k-1)/(Np)\right)\right),\; \hat x_{1-p}\exp\left(z_{\alpha/2}\,\tilde\xi\,(k-1)^{-1/2}\log\left((k-1)/(Np)\right)\right)\right]. \tag{3.12} \]
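The sketch below illustrates the sliding blocks estimator (3.8)-(3.10) and the adjusted VaR estimate (3.11). It reuses the illustrative X, n, k and p from the previous sketches, fits the GEV to the disjoint block maxima of the same series to obtain the tilde parameters, and is not the thesis implementation from the appendices.

    % Minimal sketch of the sliding blocks estimator (3.8)-(3.10) and VaR (3.11).
    N   = numel(X);
    Msl = zeros(N-n+1, 1);
    for t = 1:N-n+1
        Msl(t) = max(X(t:t+n-1));                 % sliding block maxima (3.8)
    end
    Nhat = arrayfun(@(m) mean(X <= m), Msl);      % empirical cdf evaluated at the maxima
    Yhat = -n * log(Nhat);                        % (3.9)
    theta_sl = 1 / mean(Yhat);                    % sliding blocks estimator (3.10)

    M    = max(reshape(X(1:k*n), n, k), [], 1)';  % disjoint block maxima of the series
    parm = gevfit(M);                             % tilde parameters for the stationary series
    xi_t = parm(1); sig_t = parm(2); mu_t = parm(3);
    VaR_bm = mu_t - sig_t/xi_t * (1 - (-n*theta_sl*log(1-p))^(-xi_t));   % (3.11)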

3.2 Peak-over-threshold method

The POT method requires us to set a threshold. The threshold choice is also subject to the bias-variance trade-off. A low threshold gives many usable observations, but could lead to a large bias. Then again, a high threshold resulting in a few observations could cause a high estimation variance. We can use a mean excess plot to explore the range of thresholds. The data is plotted against the mean excess function, which is defined as

\[ e_n(u) = E(X - u \mid X > u) = \frac{\sum_{i=1}^{n}(X_i - u)\,\mathbf{1}_{X_i > u}}{\sum_{i=1}^{n}\mathbf{1}_{X_i > u}}. \tag{3.13} \]
The mean excess function gives the average of the excesses of X over a range of values of the threshold u. A good choice for the threshold is the value at which the mean excess plot becomes linear. The plot in addition provides information about the shape parameter. A linear upward trend suggests ξ > 0, whereas a horizontal line and a linear downward trend indicate ξ = 0 and ξ < 0 respectively.
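A mean excess plot of the kind described above can be produced with a few lines of MATLAB; the snippet below is an illustrative sketch using the simulated loss vector X from the earlier sketches, with an assumed grid of candidate thresholds.

    % Minimal sketch of a mean excess plot (3.13) over a grid of candidate thresholds.
    us = quantile(X, 0.80:0.005:0.99);             % assumed grid of thresholds
    me = arrayfun(@(u) mean(X(X > u) - u), us);    % empirical mean excess e_n(u)
    plot(us, me, 'o-'); xlabel('threshold u'); ylabel('mean excess e_n(u)');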

In this research, the number of exceedances is set equal to the number of block maxima, both denoted as k. We pick the threshold u accordingly and use the mean excess


plot to verify our threshold choice. The excess loss amounts are defined as $Y_j = X_i - u$ with $X_i > u$. The GPD is fitted to the threshold exceedances using maximum likelihood estimation. Let $g_{\xi,\beta}$ be the density of the GPD; the log-likelihood is given as
\[ l(\xi,\beta; y_1,\ldots,y_k) = \sum_{j=1}^{k}\ln g_{\xi,\beta}(y_j) = -k\ln\beta - \left(1+\frac{1}{\xi}\right)\sum_{j=1}^{k}\ln\left(1+\xi\frac{y_j}{\beta}\right), \tag{3.14} \]
where $y_1, \ldots, y_k$ denote realizations of the excess loss amounts.

If the data are i.i.d., we can estimate the VaR using the maximum likelihood parameter estimates. We rewrite (2.9) in order to arrive at
\[ F(x) = (1 - F(u))\,F_u(y) + F(u), \tag{3.15} \]
with y = x − u. Using Theorem 2.3.1 we can estimate $F_u$ by the GPD and F(u) by (N − k)/N. Then we obtain
\[ \hat F(x) = \frac{k}{N}\left(1 - \left(1+\frac{\hat\xi}{\hat\beta}(x-u)\right)^{-1/\hat\xi}\right) + \left(1 - \frac{k}{N}\right) = 1 - \frac{k}{N}\left(1+\frac{\hat\xi}{\hat\beta}(x-u)\right)^{-1/\hat\xi}. \tag{3.16} \]
Taking the inverse of (3.16) gives
\[ \hat F^{-1}(x) = u + \frac{\hat\beta}{\hat\xi}\left(\left(\frac{N}{k}(1-x)\right)^{-\hat\xi} - 1\right). \tag{3.17} \]
Let p be the upper tail probability. The VaR is the (1 − p)th quantile of F,
\[ \hat x_{1-p} = \hat F^{-1}(1-p) = u + \frac{\hat\beta}{\hat\xi}\left(\left(\frac{N}{k}\,p\right)^{-\hat\xi} - 1\right). \tag{3.18} \]
The corresponding confidence intervals can be calculated in the same way as defined in (3.6).
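The i.i.d. POT estimate (3.18) can be obtained with MATLAB's gpfit, as in the sketch below. The threshold choice (the (k+1)-th largest observation, so that exactly k losses exceed it) and the reuse of the illustrative X, k, N and p from the earlier sketches are assumptions rather than the thesis code.

    % Minimal sketch of the POT estimate (3.18); X, k, N, p as in earlier sketches.
    Xs   = sort(X, 'descend');
    u    = Xs(k+1);                                   % threshold: exactly k exceedances
    Y    = Xs(1:k) - u;                               % excess losses over u
    parm = gpfit(Y);                                  % ML estimates [shape xi, scale beta]
    xiP  = parm(1); betaP = parm(2);
    VaR_pot = u + betaP/xiP * ((N/k*p)^(-xiP) - 1);   % point estimate (3.18)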

When the data are non-i.i.d. we use the same procedure and fit the likelihood using QML. However, we correct the asymptotic variance following Drees et al. (2003),
\[ \hat\sigma_x^{(D)2} = \left[\sum_{i=j}^{k-1}\left(i^{-1/2} - \frac{\log((k-1)/(Np))}{\log(i/(Np))}(k-1)^{-1/2}\right)^2\right]^{-1} \sum_{i=j}^{k-1}\left(\frac{\log\left(\hat x^{(i)}_{1-p}/\hat x_{1-p}\right)}{\log(i/(Np))}\right)^2, \tag{3.19} \]
where j is the smallest integer exceeding Np and $\hat x^{(i)}_{1-p} = X_{n-i:n}\left(\frac{Np}{i}\right)^{-1/\hat\xi_i}$, with $\hat\xi_i$ depending only on the i + 1 largest order statistics. According to Drees et al. (2003), the confidence intervals are then calculated as
\[ \left[\hat x_{1-p}\exp\left(-z_{\alpha/2}\,\hat\sigma_x^{(D)}(k-1)^{-1/2}\log((k-1)/(Np))\right),\; \hat x_{1-p}\exp\left(z_{\alpha/2}\,\hat\sigma_x^{(D)}(k-1)^{-1/2}\log((k-1)/(Np))\right)\right], \tag{3.20} \]
where $\hat\sigma_x^{(D)}$ denotes the corrected standard deviation of the VaR estimator.


3.3 Evaluation criteria

In order to evaluate the performance of the VaR estimators, we compare the estimates with the true value of the VaR. Since the true VaR is not known analytically, we use a pre-simulation to determine it. We simulate samples with a large sample size and take the (1 − p)th empirical quantile of each sample as a VaR estimate. The true value of the VaR is then set to the median of these VaR estimates.

Next, we run the simulation itself by generating samples with a smaller sample size. The VaR is now calculated for each sample using the BM and POT method. We normalize the VaR estimates by the true VaR obtained from the pre-simulation. This way, the scale is omitted from our results which enables comparison between models.

We report the bias, variance and mean squared error of the normalized estimators. Let s be the number of samples, $\mathrm{VaR}_{1-p}$ the true VaR and $\widehat{\mathrm{VaR}}^{(i)}_{1-p}$ the VaR estimate in run $i = 1, \ldots, s$. The bias is specified as
\[ \mathrm{Bias} = \frac{1}{s}\sum_{i=1}^{s}\left(\frac{\widehat{\mathrm{VaR}}^{(i)}_{1-p}}{\mathrm{VaR}_{1-p}} - 1\right). \tag{3.21} \]
The variance of the normalized VaR estimator is computed as
\[ \mathrm{Variance} = \frac{1}{s}\sum_{i=1}^{s}\left(\frac{\widehat{\mathrm{VaR}}^{(i)}_{1-p}}{\mathrm{VaR}_{1-p}} - \frac{1}{s}\sum_{j=1}^{s}\frac{\widehat{\mathrm{VaR}}^{(j)}_{1-p}}{\mathrm{VaR}_{1-p}}\right)^2. \tag{3.22} \]
The mean squared error of the normalized estimator is defined as
\[ \mathrm{MSE} = \frac{1}{s}\sum_{i=1}^{s}\left(\frac{\widehat{\mathrm{VaR}}^{(i)}_{1-p}}{\mathrm{VaR}_{1-p}} - 1\right)^2, \tag{3.23} \]
which can be decomposed into the squared bias plus the variance of the normalized estimator. A good estimator has both terms small, resulting in a small mean squared error as well.

Furthermore, we calculate the proportion of the time that the confidence intervals include the true VaR. This can be defined as the coverage probability,

\[ \text{Cov. prob.} = \frac{1}{s}\sum_{i=1}^{s}\mathbf{1}_{LB^{(i)} \le \mathrm{VaR}_{1-p} \le UB^{(i)}} \times 100\%, \tag{3.24} \]
where $LB^{(i)}$ and $UB^{(i)}$ denote the lower and upper bound of the confidence interval in run i and $\mathbf{1}_{LB^{(i)} \le \mathrm{VaR}_{1-p} \le UB^{(i)}}$ denotes the indicator function for whether the true value lies within the confidence interval.
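Given a vector of VaR estimates and the corresponding interval bounds, criteria (3.21)-(3.24) reduce to a few lines, as sketched below; the input variables vhat (estimates), lb, ub (interval bounds) and VaRtrue are assumed placeholders rather than names from the thesis code.

    % Minimal sketch of the evaluation criteria (3.21)-(3.24).
    % vhat: s-by-1 VaR estimates; lb, ub: interval bounds; VaRtrue: true VaR (assumed inputs).
    r        = vhat / VaRtrue;                            % normalized estimates
    bias     = mean(r - 1);                               % (3.21)
    variance = mean((r - mean(r)).^2);                    % (3.22)
    mse      = mean((r - 1).^2);                          % (3.23) = bias^2 + variance
    covprob  = mean(lb <= VaRtrue & VaRtrue <= ub) * 100; % (3.24), in percent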


Chapter 4

Student t and GARCH models

We compare the BM method and the POT method in a simulation study. We start with the design of the simulation study, which includes the introduction of the models for both the i.i.d. and the serial dependence setting. Subsequently, the discussion of the results for i.i.d. data is followed by the results for serial dependent data.

4.1 Simulation study design

Several time series will be generated in the simulation study. We define $(S_t, t \in \mathbb{Z})$ as a time series of prices, index levels or exchange rates and take the logarithmic difference in order to obtain the return series $X_t = -\ln(S_t/S_{t-1})$, $t = 1, \ldots, N$. As our main interest lies in the financial losses, we transform the negative log-returns into positive values and vice versa. The log-returns exhibit several stylized facts such as fat tails and volatility clustering.

We start the simulation with the estimation of the VaR in an i.i.d. setting. We generate i.i.d. data from a Student t distribution, which is heavier tailed than the normal distribution. It is therefore able to generate extreme values more similar to real financial data. Our models are based on Wagner & Marsh (2005) and shown in Table 4.1. We use three Student t distributions with different degrees of freedom. The Student t distribution possesses heavier tails when the degrees of freedom decrease. This is also reflected in the corresponding tail index of our models.

Table 4.1: Student t distribution with ν degrees of freedom and the corresponding tail index ξ of the stationary marginal distribution. The models are from Wagner & Marsh (2005).

                 ν     ξ
Student t(6)     6    0.17
Student t(4)     4    0.25
Student t(3)     3    0.33

Next, we apply our methods to stationary time series. One model that is able to


capture the earlier mentioned stylized facts is the generalized autoregressive conditionally heteroscedastic (GARCH) model of Bollerslev (1986). Let $(Z_t, t \in \mathbb{Z})$ be a strict white noise (SWN) process with mean 0 and variance 1. The standard normal distribution and the Student t distribution are often used for $(Z_t)$. A strictly stationary process $(X_t, t \in \mathbb{Z})$ can be written as a GARCH(p, q) process,
\[ X_t = \sigma_t Z_t, \qquad \sigma_t^2 = \alpha_0 + \sum_{i=1}^{p}\alpha_i X_{t-i}^2 + \sum_{j=1}^{q}\beta_j \sigma_{t-j}^2, \tag{4.1} \]
where $\alpha_0 > 0$, $\alpha_i \ge 0$, $i = 1, \ldots, p$, and $\beta_j \ge 0$, $j = 1, \ldots, q$. Due to its simplicity, the GARCH(1,1) model has been widely used to model financial time series. From (4.1), we derive the GARCH(1,1) model as
\[ X_t = \sigma_t Z_t, \qquad \sigma_t^2 = \alpha_0 + \alpha_1 X_{t-1}^2 + \beta_1 \sigma_{t-1}^2, \tag{4.2} \]
where $\alpha_0 > 0$ and $\alpha_1, \beta_1 \ge 0$. The parameter restriction $\alpha_1 + \beta_1 < 1$ should also hold in order to make the model covariance stationary. Moreover, the sum of the GARCH parameters is often found to be close to one for financial data (see e.g. Mikosch & Starica (2004)). The unconditional variance of the GARCH(1,1) model is $\sigma^2 = \alpha_0/(1 - \alpha_1 - \beta_1)$ and is taken here as the starting value $\sigma_0^2$ of the volatility equation.

Our model choice for stationary time series is based on Wagner & Marsh (2005). Their research involves GARCH(1,1) models with Student t(ν) errors. We present the models in Table 4.2, but choose to change the original constant parameter value $\alpha_0 = 10^{-6}$ to $\alpha_0 = 10^{-3}$. This adjustment re-scales the time series, but does not affect its shape.

Table 4.2: GARCH(1,1) models with Student t(ν) errors and corresponding tail index ξ of the stationary marginal distribution. The models are from Wagner & Marsh (2005) with a modification of parameter $\alpha_0 = 10^{-6}$ to $\alpha_0 = 10^{-3}$.

                    ν     ξ      α0      α1     β1
GARCH(1,1)-t(9)     9    0.17   10^-3   0.05   0.92
GARCH(1,1)-t(5)     5    0.25   10^-3   0.03   0.94
GARCH(1,1)-t(4)     4    0.33   10^-3   0.03   0.93

For the models in both the i.i.d. and the non-i.i.d. setting, we estimate the VaR with upper tail probabilities p = 0.01 and p = 0.001, denoted as $\widehat{\mathrm{VaR}}_{0.99}$ and $\widehat{\mathrm{VaR}}_{0.999}$ respectively. In addition, we calculate the corresponding 95% confidence intervals. In the pre-simulation we run 1000 samples of length $10^6$ each. Assuming that one trading year consists of 252 days, in the simulation itself we generate 1000 samples with a sample length of 12 trading years. This is equal to 3024 data points for each sample. Since the GARCH models are initialized with an arbitrary set of starting values, we choose a burn-in period to allow the processes to converge. In both the pre-simulation and the simulation, we choose a burn-in period which equals 10% of the length of the time series. The burn-in period is not counted in the previously specified sample lengths.
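The GARCH(1,1)-t simulation described above can be written directly from (4.2); the sketch below uses the GARCH(1,1)-t(9) parameters of Table 4.2 and the 10% burn-in, but the seed and the scaling of the t innovations to unit variance are illustrative assumptions rather than the thesis code from the appendices.

    % Minimal sketch: simulate a GARCH(1,1) path with Student t(9) errors, cf. (4.2).
    nu = 9; a0 = 1e-3; a1 = 0.05; b1 = 0.92;          % Table 4.2, GARCH(1,1)-t(9)
    N = 3024; burn = round(0.10*N);                   % 12 trading years plus 10% burn-in
    rng(1);
    Z  = trnd(nu, N+burn, 1) * sqrt((nu-2)/nu);       % SWN with mean 0 and unit variance
    s2 = a0 / (1 - a1 - b1);                          % start at the unconditional variance
    X  = zeros(N+burn, 1);
    for t = 1:N+burn
        X(t) = sqrt(s2) * Z(t);                       % X_t = sigma_t * Z_t
        s2   = a0 + a1*X(t)^2 + b1*s2;                % next sigma_t^2 from (4.2)
    end
    X = X(burn+1:end);                                % drop the burn-in period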


Under the BM approach, we consider block sizes equal to one, two and three months. The resulting number of block maxima k is then also used as the number of threshold exceedances in the POT approach. In Table 4.3 we give an overview of the specified blocks and the corresponding number of observations used for the approximation of the extreme value distributions.

Table 4.3: Overview of the number of data points in each block n and the resulting number of block maxima or threshold exceedances k. The sample size consists of 12 trading years, which equals 3024 observations. The sample is divided in blocks of one, two and three months.

      1 month   2 months   3 months
n        21        42         63
k       144        72         48

4.2 Results Value at Risk - i.i.d.

The true VaR values obtained from the pre-simulation are presented in Table 4.4. The VaR increases when the Student t distribution becomes more heavy-tailed and/or when the upper tail probability p becomes smaller. These results are in line with our expectations. Heavy-tailed distributions contain more extreme values and thus lead to a higher potential loss. Also, the probability of a small loss is higher than the probability of a large loss. Hence, the VaR_0.99 values are smaller than the VaR_0.999 values.

Table 4.4: Empirical VaR_0.99 and VaR_0.999 values for three Student t(ν) models with ν degrees of freedom. The values are obtained from a pre-simulation, where 1000 samples are run with a sample length of 10^6 data points.

               VaR_0.99   VaR_0.999
Student t(6)     3.14        5.21
Student t(4)     3.75        7.17
Student t(3)     4.54       10.21

Given the true VaR values, we compute the normalized VaR estimates using both the BM method and the POT method. The results for VaR_0.99 are shown in Table 4.5.

First of all, we notice that the bias is close to zero for all models. As expected, the bias decreases when the block size increases for almost all models. An exception is the Student t(3) distribution under the POT approach. In all other cases, we observe a considerably larger bias when k = 144. The bias for the other two values of k is smaller and the values lie closer to each other. The BM method also seems more sensitive to k, since the range of its bias is larger than the range of the bias in the POT method.

In line with our expectations, the variance increases when the block size increases. A larger block size leads to a lower number of observations used for the estimation and


thus a higher variance. The variances from the BM method are slightly higher, but very close to the variances from the POT method. Again, we notice a wider range of values for the BM method than for the POT method.

Since the bias is close to zero, resulting in a small squared bias, the variance largely determines the MSE. The variance and MSE are very close to each other and in some cases even identical after rounding. For both methods, the MSE increases when the distribution becomes more heavy-tailed and/or when the block size increases. The POT method mostly shows the lowest values and seems to perform better. However, the difference between the two methods is small.

For both methods, the confidence intervals provide better coverage when the Student t distribution becomes more heavy-tailed. This can be explained by the fact that a more heavy-tailed distribution has a larger shape parameter estimate $\hat\xi$. As this parameter is used to construct the confidence intervals, we obtain wider confidence intervals leading to higher coverage probabilities. Not only does the Student t(6) distribution have the lowest coverage rates, its rates also increase when the block size increases, which is in contrast with the other distributions. This is a result of the reciprocity between the block size n and the number of block maxima or threshold exceedances k, both of which are also used as input to construct the confidence intervals. The coverage probabilities for the Student t(6) are higher under the POT approach, but none are close to 95%. The BM confidence intervals provide higher coverage in the other cases.

Table 4.5: Evaluation criteria for the normalized VaR_0.99 estimators under the BM approach (left) and the POT approach (right). The bias, variance and MSE values are multiplied by 10^3.

                              BM approach                    POT approach
                        k=144    k=72    k=48          k=144    k=72    k=48
Student t(6)  Bias       8.22     2.85    2.69          4.04     2.71    2.56
              Variance   1.50     1.75    2.10          1.42     1.66    1.74
              MSE        1.57     1.75    2.10          1.44     1.67    1.75
              Cov. prob.  80%      86%     85%           83%      88%     90%
Student t(4)  Bias       7.43     1.86    0.16          3.37     1.74    1.61
              Variance   2.45     2.89    3.34          2.35     2.76    2.86
              MSE        2.51     2.90    3.34          2.36     2.76    2.87
              Cov. prob.  96%      95%     94%           93%      91%     90%
Student t(3)  Bias       7.32     1.15   -0.06          2.84     3.20    3.17
              Variance   3.13     3.70    4.52          3.06     3.54    3.64
              MSE        3.19     3.70    4.52          3.07     3.55    3.65
              Cov. prob. 100%      99%     98%           98%      96%     93%


For the VaR_0.999 estimates (Table 4.6), the lowest bias appears in the BM method for bimonthly blocks. This also applies to the Student t(4) distribution in the POT method. Similar to the VaR_0.99 results, the bias from the BM method is more sensitive to the block size than the bias from the POT method.

The variances behave in the same way as the variances of the VaR_0.99 results, but are larger this time. This is expected, as it is more difficult to estimate a higher quantile accurately. The variances are lower for the BM method when monthly blocks are used, but are otherwise higher than those of the POT method.

Again, the MSE values are most of the time equal to the variance after rounding. Due to the larger variances, the MSE values are higher for VaR_0.999 than for VaR_0.99. This time, we cannot favor one approach in terms of the MSE. The BM method performs better when monthly blocks are used, while the POT method would be a better choice when the number of exceedances is smaller.

The coverage probabilities are lower than those for VaR_0.99. The confidence intervals in all models fail to provide the intended 95% coverage, although the confidence intervals for the Student t(3) under the BM approach come close. Analogous to the VaR_0.99 results, we notice that the POT method has higher rates for the Student t(6) distribution, while the BM method is better for the other distributions.

Table 4.6: Evaluation criteria for the normalized VaR_0.999 estimators under the BM approach (left) and the POT approach (right). The bias, variance and MSE values are multiplied by 10^3.

                              BM approach                    POT approach
                        k=144    k=72    k=48          k=144    k=72    k=48
Student t(6)  Bias      -7.84     1.10    2.03         -3.22    -3.39   -3.65
              Variance   7.16     8.69    9.30          8.35     8.67    8.68
              MSE        7.22     8.69    9.30          8.36     8.68    8.69
              Cov. prob.  67%      79%     82%           72%      84%     86%
Student t(4)  Bias     -13.23    -0.40    1.87         -7.34    -5.60   -6.61
              Variance  13.30    16.48   18.35         16.01    17.50   17.42
              MSE       13.47    16.48   18.35         16.06    17.53   17.46
              Cov. prob.  86%      86%     87%           82%      82%     84%
Student t(3)  Bias     -18.23     0.94    5.92         -6.95    -7.07   -7.65
              Variance  17.95    22.97   26.26         21.96    24.26   24.68
              MSE       18.28    22.97   26.29         22.00    24.31   24.74
              Cov. prob.  94%      94%     94%           91%      88%     87%


4.3 Results Value at Risk - serial dependence

The true VaR_0.99 and VaR_0.999 values for the three GARCH models are reported in Table 4.7. Similarly to the i.i.d. time series and in line with our expectations, we observe increasing VaR values when the underlying distribution becomes more heavy-tailed and/or when the upper tail probability becomes smaller.

Table 4.7: Empirical VaR_0.99 and VaR_0.999 values for three GARCH models with Student t(ν) errors. The values are obtained from a pre-simulation, where 1000 samples are run with a sample length of 10^6 data points.

              VaR_0.99   VaR_0.999
GARCH-t(9)      0.75        1.29
GARCH-t(5)      1.10        2.20
GARCH-t(4)      1.19        2.72

Since we are considering serial dependent time series, the estimation of the VaR under the BM approach includes the estimation of the extremal index. In Table 4.8 we present the results from the intervals estimator and the sliding blocks estimator. The extremal index is estimated in every simulation run and we therefore report the mean and standard deviation. The mean extremal index estimates do not differ much between the two estimators and all lie between 0.52 and 0.63. These results confirm the presence of serial dependence in our time series. The means for GARCH-t(9) and GARCH-t(5) are almost identical to each other, while the mean for GARCH-t(4) is slightly lower, indicating a higher tendency to cluster. When k decreases, the mean estimates from the intervals estimator decrease as well, in contrast to the means from the sliding blocks estimator. On the other hand, the standard deviations of the extremal index obtained from the intervals estimator seem to increase, while the standard deviations from the sliding blocks estimator decrease. The standard deviations from both estimators are similar to each other for k = 144. When k decreases, we obtain lower standard deviations from the sliding blocks estimator.


Table 4.8: Estimation of the extremal index θ using the intervals estimator (left) and the sliding blocks estimator (right) for three GARCH models with Student t(ν) errors. The mean and standard deviation of the extremal index estimates are computed.

                          Intervals estimator          Sliding blocks estimator
                        k=144    k=72    k=48          k=144    k=72    k=48
GARCH-t(9)  Mean         0.59     0.55    0.56          0.55     0.58    0.62
            Std. dev.    0.12     0.14    0.16          0.13     0.09    0.08
GARCH-t(5)  Mean         0.59     0.55    0.55          0.55     0.58    0.63
            Std. dev.    0.14     0.15    0.17          0.14     0.11    0.09
GARCH-t(4)  Mean         0.55     0.52    0.52          0.52     0.55    0.60
            Std. dev.    0.14     0.16    0.18          0.14     0.11    0.10

The results for the normalized VaR_0.99 estimators from the BM method are shown in Table 4.9. The lowest values appear for GARCH-t(9) and the highest values for GARCH-t(4). We can conclude that the bias increases when the underlying distribution has heavier tails and stronger serial dependence. The BM method with the sliding blocks estimator produces a higher bias for monthly blocks, but smaller biases for block sizes of two and three months. Also, the difference in bias between the two approaches is smaller for monthly blocks than for the other block sizes.

Contrary to our expectations, the variances do not increase when the block size increases. Especially under the BM approach with the sliding blocks estimator, the variances become smaller. In most cases, we see that this approach leads to a smaller variance than the approach with the intervals estimator.

Again, the MSE values behave in the same way as the variances. This means that the BM method performs better when we estimate the extremal index with the sliding blocks estimator rather than the intervals estimator. The difference between these two estimators increases when the models become more heavy-tailed and have a higher tendency to cluster.

The coverage probabilities range from 50% to 86%. Parallel to the i.i.d. case, we note that the confidence intervals under the BM approach provide too little coverage. The coverage rates for the BM approach with the sliding blocks estimator are still, in most cases, a bit higher.


Table 4.9: Evaluation criteria for the normalized VaR_0.99 estimators under the BM approach with the intervals estimator (left) and the sliding blocks estimator (right). The bias, variance and MSE values are multiplied by 10^2.

                          Intervals estimator          Sliding blocks estimator
                        k=144    k=72    k=48          k=144    k=72    k=48
GARCH-t(9)  Bias         8.00     6.46    3.87         10.06     3.63   -1.12
            Variance     3.15     3.42    2.99          3.47     1.90    1.28
            MSE          3.79     3.84    3.14          4.48     2.04    1.29
            Cov. prob.    54%      66%     71%           50%      70%     74%
GARCH-t(5)  Bias        11.32     9.87    7.04         14.41     4.87   -1.60
            Variance     8.60     9.68    8.89          9.49     4.52    2.77
            MSE          9.88    10.65    9.39         11.57     4.76    2.80
            Cov. prob.    70%      76%     76%           67%      80%     80%
GARCH-t(4)  Bias        17.06    15.25    9.38         19.10     5.92   -2.31
            Variance    35.24    36.98   19.30         22.75     8.00    4.22
            MSE         38.15    39.31   20.18         26.40     8.35    4.27
            Cov. prob.    78%      82%     81%           76%      86%     85%

In order to estimate the VaR under serial dependence using the POT method, we need to adjust the variance following Drees et al. (2003). As the variance is adjusted in every simulation run, we report the mean of these values alongside the original variance of the VaR_0.99 estimates in Table 4.10. In line with Drees et al. (2003), we find the adjusted variances to be much higher than the original variances. While the original variances differ much more between the three GARCH models, the adjusted variances look less sensitive to the type of model.

Table 4.10: Comparison of the asymptotic variance $\hat\sigma_x^2$ and the asymptotic variance $\hat\sigma_x^{(D)2}$ from Drees et al. (2003) for the VaR_0.99 estimates. The mean of the asymptotic variance $\hat\sigma_x^{(D)2}$ is reported.

                                       k=144    k=72    k=48
GARCH-t(9)  $\hat\sigma_x^2$            0.01    0.01    0.01
            mean $\hat\sigma_x^{(D)2}$  0.37    0.30    0.51
GARCH-t(5)  $\hat\sigma_x^2$            0.04    0.05    0.05
            mean $\hat\sigma_x^{(D)2}$  0.34    0.29    0.54
GARCH-t(4)  $\hat\sigma_x^2$            0.09    0.09    0.10
            mean $\hat\sigma_x^{(D)2}$  0.33    0.32    0.64


We now compare the estimators under the BM approach with the estimator under the POT approach. The results for the VaR_0.99 are displayed in Table 4.11. Since we notice that the BM method with the sliding blocks estimator outperforms the BM method with the intervals estimator, we will focus more on the comparison between the former and the POT method.

Once more, we see the bias increase when the distribution becomes more heavy-tailed and possesses more serial dependence. In contrast to the BM method, the bias obtained from the POT method increases when k decreases. This appeared earlier as well for the POT method under i.i.d. assumptions. However, the bias of the POT estimates is close to zero and much lower than the bias of the BM estimates.

Conversely, the variances are much higher than the BM variances. This is due to the adjustment from Drees et al. (2003). Because of the relatively low bias and the high variances, the variance almost entirely determines the MSE. For all models, the smallest variance and MSE for the POT method are obtained with the number of excesses k = 72. Nonetheless, the BM methods turn out to be more favorable with regard to the variances and MSE.

The coverage probabilities under the POT approach are 94% or higher. The POT confidence intervals provide much better coverage than the BM confidence intervals. These high coverage probabilities are due to the higher variance estimates leading to wider confidence intervals.

When comparing the BM and POT methods for the VaR_0.99, neither method beats the other because of the large differences. When we consider the variance and MSE, the BM methods are preferred, while the POT method is better in terms of the bias and the coverage probabilities.


Table 4.11: Evaluation criteria for the normalized VaR_0.99 estimator under the POT approach. The bias, variance and MSE values are multiplied by 10^2.

                        k=144    k=72    k=48
GARCH-t(9)  Bias        -0.16    0.09    0.19
            Variance    65.65   52.96   90.76
            MSE         65.65   52.96   90.76
            Cov. prob.    99%     98%     98%
GARCH-t(5)  Bias        -0.21    0.24    0.36
            Variance    27.70   23.71   43.97
            MSE         27.70   23.71   43.97
            Cov. prob.    98%     95%     97%
GARCH-t(4)  Bias        -0.19    0.62    0.82
            Variance    23.71   22.90   45.39
            MSE         23.71   22.90   45.40
            Cov. prob.    95%     94%     96%

We now consider the VaR estimates with upper tail probability p = 0.001. The performance of the BM method with the two extremal index estimators is shown in Table 4.12. Similar to the VaR_0.99 results, the sliding blocks estimator has a lower bias than the intervals estimator for the bi- and trimonthly blocks. The difference in bias for the monthly blocks is still small. Parallel to the VaR_0.999 results under i.i.d. assumptions, we also notice that the intervals estimator provides the smallest bias for bimonthly blocks.

The variances in the VaR_0.999 results show similar behavior to the variances in the VaR_0.99 results. The variances are relatively large for the GARCH-t(4) model. Also, the variances for the intervals estimator are highest when k = 72. The MSE is similar to the variance. In most cases, both terms are smallest for the sliding blocks estimator. The coverage probabilities are slightly smaller than those for the VaR_0.99 results. However, we draw the same conclusions as for the VaR_0.99 results. The confidence intervals from both methods provide poor coverage. The BM method with the sliding blocks estimator provides somewhat more coverage than the method with the intervals estimator, with an exception for the monthly blocks. Overall, the comparison of the evaluation criteria for both extremal index estimators again suggests the use of the sliding blocks estimator.


Table 4.12: Evaluation criteria for the normalized VaR_0.999 estimators under the BM approach with the intervals estimator (left) and the sliding blocks estimator (right). The bias, variance and MSE values are multiplied by 10^2.

                          Intervals estimator          Sliding blocks estimator
                        k=144    k=72    k=48          k=144    k=72    k=48
GARCH-t(9)  Bias         8.70    10.64    9.26         10.48     7.39    3.51
            Variance     8.88    15.65   13.52          9.68     8.97    6.16
            MSE          9.63    16.79   14.38         10.78     9.51    6.28
            Cov. prob.    55%      65%     71%           54%      68%     73%
GARCH-t(5)  Bias        13.57    16.55   15.57         16.61    10.14    4.25
            Variance    28.84    39.93   48.32         31.91    20.41   14.88
            MSE         30.68    42.67   50.74         34.67    21.43   15.06
            Cov. prob.    66%      70%     74%           65%      74%     76%
GARCH-t(4)  Bias        28.04    33.31   24.21         28.23    16.12    6.87
            Variance   392.53   435.73  133.91        177.71    65.64   38.79
            MSE        400.39   446.82  139.77        185.68    68.23   39.26
            Cov. prob.    70%      74%     78%           68%      78%     80%

For the POT method, we again find the adjusted variance to be much higher than the original variance. The adjusted variances are also more stable across the different GARCH models. However, the variances in both methods decrease as k decreases, which is the opposite of the results for VaR_0.99.

Table 4.13: Comparison of the asymptotic variance $\hat\sigma_x^2$ and the asymptotic variance $\hat\sigma_x^{(D)2}$ from Drees et al. (2003) for the VaR_0.999 estimates. The mean of the asymptotic variance $\hat\sigma_x^{(D)2}$ is reported.

                                       k=144    k=72    k=48
GARCH-t(9)  $\hat\sigma_x^2$            0.07    0.06    0.06
            mean $\hat\sigma_x^{(D)2}$ 36.74   18.12   12.88
GARCH-t(5)  $\hat\sigma_x^2$            0.44    0.39    0.38
            mean $\hat\sigma_x^{(D)2}$ 32.14   16.86   12.26
GARCH-t(4)  $\hat\sigma_x^2$            1.20    1.00    0.96
            mean $\hat\sigma_x^{(D)2}$ 31.87   17.34   12.47

The evaluation criteria for the VaR_0.999 estimators for the three GARCH models under the POT approach are reported in Table 4.14. The absolute bias grows as


the number of threshold exceedances decreases. This is in the opposite direction from what the theory would suggest. Compared to the BM method, the POT method yields a smaller bias, with an exception for k = 48. In that case, the BM method with the sliding blocks estimator has a better result.

In contrast to Drees et al. (2003), we find remarkably large variances for the POT method. As a consequence, the MSE values are very large as well. Despite the large values, the results behave in a similar way to the POT results for VaR_0.99. Both terms are the highest for GARCH-t(9) and the lowest for GARCH-t(4). This is again in contrast with the variances from the BM method.

Due to the large variances, all coverage probabilities under the POT method reach 100%. This implies that the confidence intervals are too wide. Since the coverage probabilities for the BM method are, on the contrary, quite low, neither the BM nor the POT method provides useful confidence intervals when we estimate the VaR_0.999.

Looking at all evaluation criteria, we tend more towards the BM method with sliding blocks estimator. Especially in the case of k = 48, this method shows better criteria values compared to the POT method.

Table 4.14: Evaluation criteria for the normalized VaR_0.999 estimator under the POT approach. The bias, variance and MSE values are multiplied by 10^2.

                         k=144     k=72     k=48
GARCH-t(9)  Bias         -3.63    -3.94    -4.09
            Variance   2223.75  1096.58   779.51
            MSE        2223.88  1096.73   779.68
            Cov. prob.    100%     100%     100%
GARCH-t(5)  Bias         -5.81    -6.52    -6.72
            Variance    663.20   347.84   252.96
            MSE         663.54   348.26   253.41
            Cov. prob.    100%     100%     100%
GARCH-t(4)  Bias         -7.51    -8.88    -9.25
            Variance    430.90   234.38   168.57
            MSE         431.46   235.16   169.43
            Cov. prob.    100%     100%     100%


Chapter 5

Dow Jones Industrial Average

To add practical relevance to our research, we estimate the VaR for real financial data. The data is first described and analyzed for serial dependence properties. This is followed by a comparison of the VaR results from the BM and POT methods.

5.1 Descriptive statistics

We use data from the Dow Jones Industrial Average (DJIA) for our financial application. The sample consists of 3020 observations and ranges from 3 January 2006 to 29 December 2017. This equals twelve trading years, which is similar to the sample length used in our simulation study. The log-returns are presented in Figure 5.1. Extreme periods in this plot can be linked to the global financial crisis in 2008, the European debt crisis in 2011 and a stock market sell-off in 2015. Extreme returns in either direction occur close to each other, indicating the presence of volatility clustering in the time series.

Figure 5.1: Daily log-returns of the Dow Jones Industrial Average (DJIA). The data ranges from 3 January 2006 to 29 December 2017.

Table 5.1 presents the descriptive statistics of the sample. The largest loss on the DJIA has a value of 0.08 and the highest log-return a value of 0.11. In Figure 5.1 both events can be traced back to October 2008, during the global financial crisis. The other observations are mostly spread around zero, which is in line with a median and mean close to zero and a standard deviation of 0.01. The skewness is also close to zero, indicating a slightly left-skewed but almost symmetrical distribution. In addition, the distribution can be considered heavy-tailed based on a kurtosis value of 14.34.

Table 5.1: Descriptive statistics of the daily log-returns of the Dow Jones Industrial Average (DJIA). The data ranges from 3 January 2006 to 29 December 2017.

             min     max    median        mean          std. dev.   skewness   kurtosis
Dow Jones   -0.08    0.11   5.50 × 10⁻⁴   2.77 × 10⁻⁴   0.01        -0.12      14.34
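To make the computation concrete, a minimal MATLAB sketch along the following lines could reproduce the quantities in Table 5.1 from a vector of daily closing prices. The variable name price is an assumption made here, and skewness and kurtosis require the Statistics and Machine Learning Toolbox.

% x: daily log-returns computed from the (assumed) vector of closing prices
x = diff(log(price));

% Descriptive statistics as reported in Table 5.1
stats = [min(x), max(x), median(x), mean(x), std(x), skewness(x), kurtosis(x)];
disp(array2table(stats, 'VariableNames', ...
    {'min','max','median','mean','std','skewness','kurtosis'}));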

Figure 5.2 contains the correlograms of the log-returns and the absolute log-returns. The autocorrelations of the log-returns are close to zero, while high autocorrelations are found for the absolute log-returns. The Ljung-Box test yields p-values of zero in both cases. Hence, the autocorrelations are significant and we reject the null hypothesis of i.i.d. data.
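As an illustration, these diagnostics could be produced along the following lines in MATLAB; autocorr and lbqtest are part of the Econometrics Toolbox, and the lag length of 20 is an assumption made for this sketch.

% Correlograms for the log-returns and the absolute log-returns (Figure 5.2)
figure; autocorr(x);        % near-zero autocorrelations expected
figure; autocorr(abs(x));   % pronounced autocorrelations: volatility clustering

% Ljung-Box test for serial correlation (lag length of 20 is illustrative)
[~, pReturns]    = lbqtest(x,      'Lags', 20);
[~, pAbsReturns] = lbqtest(abs(x), 'Lags', 20);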


Figure 5.2: Plot of the autocorrelation function for the daily log-returns (a) and the absolute daily log-returns (b) of the Dow Jones Industrial Average (DJIA).

Figure 5.3 provides additional evidence of volatility clustering. We consider the 100 largest losses and the times between their occurrences. For i.i.d. data, the occurrences of such losses follow a Poisson process, so the waiting times between them should be exponentially distributed. This is not confirmed by Figure 5.3, which means that the data do not meet the i.i.d. assumption.
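A minimal MATLAB sketch of this diagnostic could look as follows; the use of maxk (available from R2017b) and the fitted exponential reference distribution are choices made here for illustration.

% Occurrence times of the 100 largest losses (losses are negative log-returns)
losses   = -x;
[~, idx] = maxk(losses, 100);
occTimes = sort(idx);                 % chronological occurrence times
waits    = diff(occTimes);            % waiting times between consecutive losses

% Q-Q plot of the waiting times against a fitted exponential distribution
qqplot(waits, fitdist(waits(:), 'Exponential'));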



Figure 5.3: Plot of the hundred largest losses for the Dow Jones Industrial Average (DJIA) (a) and Q-Q plot of times between occurrences of these losses against an exponential distribution (b).

5.2 Results Value at Risk

Since the return series of the DJIA exhibits serial dependence, including volatility clustering, we estimate the VaR under non-i.i.d. assumptions. Therefore, we estimate the extremal index in the BM method and present the results in Table 5.2. The estimates vary between 0.14 and 0.46 and thus confirm our belief that extremes form clusters in this time series. We observe quite some difference between the results of the two estimators. The sliding blocks estimates have a smaller range of values and increase as k decreases. The intervals estimates, by contrast, are more volatile with respect to k.
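For reference, the intervals estimator is commonly written in the form of Ferro and Segers (2003); a minimal MATLAB sketch under that form is given below, before turning to the estimates in Table 5.2. The threshold u that defines the exceedances is an assumption of the sketch, and this is not necessarily the exact implementation used elsewhere in this thesis.

% Intervals estimator of the extremal index (form of Ferro & Segers, 2003).
% losses: vector of daily losses; u: a high threshold (choice is an assumption).
exc = find(losses > u);                    % indices of the threshold exceedances
T   = diff(exc);                           % inter-exceedance times
N   = numel(exc);

if max(T) <= 2
    thetaInt = 2 * sum(T)^2 / ((N - 1) * sum(T.^2));
else
    thetaInt = 2 * sum(T - 1)^2 / ((N - 1) * sum((T - 1) .* (T - 2)));
end
thetaInt = min(1, thetaInt);               % the extremal index lies in (0, 1]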

Table 5.2: Overview of the intervals estimates θ̂_int and sliding blocks estimates θ̂_sl for the extremal index under the BM approach.

          k = 144    k = 72    k = 48
θ̂_int       0.46      0.14      0.20
θ̂_sl        0.31      0.37      0.41

Under the POT approach, we fix the threshold such that the number of excesses equals the number of block maxima. Our thresholds u are set to 1.79 × 10⁻², 2.39 × 10⁻² and 2.74 × 10⁻². The mean excess plot is shown in Figure 5.4. Based on this plot, a threshold close to zero would be suitable. Hence, our thresholds yield an appropriate number of threshold exceedances that can be modeled by the GPD.
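As an illustration, the empirical mean excess function underlying Figure 5.4 could be computed as in the sketch below; the threshold grid is an assumption chosen for the example.

% Empirical mean excess function e(u) = E[X - u | X > u] over a grid of thresholds
uGrid      = linspace(quantile(losses, 0.50), quantile(losses, 0.99), 50);
meanExcess = arrayfun(@(u) mean(losses(losses > u) - u), uGrid);

plot(uGrid, meanExcess, 'o-');
xlabel('threshold u'); ylabel('mean excess');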


Figure 5.4: Plot of the mean excess function of the daily log-returns of the Dow Jones Industrial Average (DJIA) against thresholds ranging from -1.0 to 0.2.

The VaR0.99 estimates from both methods are stated in Table 5.3. The results can be interpreted as follows: using the BM method with intervals estimator and monthly blocks, for example, the daily negative log-return is expected to be 3.50 × 10⁻² or smaller with a probability of 99%. Imagine that we have invested $100 in the DJIA index. We would then expect the daily loss to be at most $100 × (exp(3.50 × 10⁻²) − 1) = $3.56 with a probability of 99%.
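A sketch of how such a POT-based estimate could be obtained and converted into a dollar loss is given below. It assumes the usual GPD tail quantile formula VaR_p = u + (β/ξ)[(np/N_u)^(−ξ) − 1]; whether this matches the exact parameterization used in this thesis is not restated here, so treat it as illustrative. gpfit is part of the Statistics and Machine Learning Toolbox.

% GPD fit to the excesses over threshold u and the implied VaR estimate
y      = losses(losses > u) - u;          % threshold excesses
parhat = gpfit(y);                        % [shape xi, scale beta]
xi     = parhat(1);  beta = parhat(2);

n  = numel(losses);  Nu = numel(y);
p  = 0.01;                                % upper tail probability
VaR = u + beta/xi * ((n*p/Nu)^(-xi) - 1); % GPD-based quantile on the log-return scale

% Dollar loss on a $100 position, as in the example above
dollarLoss = 100 * (exp(VaR) - 1);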

The VaR0.99 estimates vary between 3.13 × 10⁻² and 4.52 × 10⁻². As we have seen before in our simulation study, the POT method provides the most stable results with regard to k. The POT estimates range from 3.35 × 10⁻² to 3.47 × 10⁻² and therefore lie in the middle of the total range of estimates. The BM method with intervals estimator tends to give the higher estimates, while the BM method with sliding blocks estimator accounts for the lower estimates.

The BM confidence intervals are smaller than the POT confidence intervals. However, as we have seen in the simulation study, the BM confidence intervals might be too narrow. The POT intervals seem more appropriate for large values of k, but widen rapidly as k decreases.

Table 5.3: VaR0.99 estimates (×10²) obtained from the BM method with intervals estimator (BM int) and sliding blocks estimator (BM sl) and the POT method (POT). The 95% confidence intervals are shown in brackets.

          k = 144             k = 72              k = 48
BM int    3.50 (2.88, 4.26)   4.52 (3.80, 5.39)   3.95 (3.21, 4.84)
BM sl     3.99 (3.28, 4.85)   3.42 (2.87, 4.07)   3.13 (2.55, 3.84)
POT       3.36 (2.03, 5.56)   3.43 (1.78, 6.61)   3.47 (0.86, 14.05)

Table 5.4 contains the VaR estimates for upper tail probability p = 0.001. The estimates range from 6.09 × 10⁻² to 7.99 × 10⁻². It is noticeable that for both p = 0.01 and p = 0.001, the highest VaR estimates appear for the BM method with intervals estimator when bimonthly blocks are considered. In the simulation study we also recognized the highest bias in this situation, which could indicate an overestimation of the VaR.

The VaR0.999 point estimates behave in the same way as the VaR0.99 estimates. The only exception is the POT method, for which the VaR estimates now decrease as k decreases. The confidence intervals have widened under all approaches compared to the confidence intervals for VaR0.99. However, the POT confidence intervals are so large that they become meaningless.

Table 5.4: VaR0.999 estimates (×10²) obtained from the BM method with intervals estimator (BM int) and sliding blocks estimator (BM sl) and the POT method (POT). The 95% confidence intervals are shown in brackets.

          k = 144               k = 72                k = 48
BM int    6.94 (5.33, 9.04)     7.99 (6.20, 10.29)    7.30 (5.33, 9.99)
BM sl     7.68 (5.90, 10.00)    6.43 (4.99, 8.29)     6.09 (4.45, 8.34)
POT       6.83 (0.24, 193.97)   6.65 (0.20, 219.89)   6.63 (0.38, 116.82)


Chapter 6

Conclusion

For this research we took an empirical approach to compare the BM method and the POT method for the estimation of the VaR. We applied EVT to simulated and real financial data and estimated the VaR0.99 and VaR0.999 under i.i.d. and non-i.i.d. assumptions. We were particularly interested in the performance of the estimators in the non-i.i.d. setting. These findings help us interpret the results of our financial application, which makes them more useful for the real-life situations that risk managers, for instance, face.

The differences in the performance of both methods for i.i.d. data are small. For the VaR0.99 estimates, the bias, variance and MSE are slightly lower for the POT method in all cases. Besides that, the BM results are more sensitive to the chosen number of block maxima than the POT results are to the number of threshold exceedances. The coverage probabilities of both methods are higher when the distribution becomes more heavy-tailed. This is due to a larger estimate of the shape parameter, which leads to wider confidence intervals. The POT confidence intervals provide better coverage for the Student t(6) distribution, and the BM confidence intervals are better for the heavier-tailed Student t(4) and Student t(3) distributions. In the VaR0.999 results, the BM evaluation criteria are again more volatile when the parameter k changes. This time, the BM method performs better than the POT method in terms of the MSE when k is high. All coverage probabilities have declined, but they still lead to the same conclusion as the VaR0.99 results. In general, the BM method and the POT method both have their strong points depending on the type of distribution and the chosen upper tail probability. However, the differences are small enough that the methods are comparable to each other.

In the non-i.i.d. setting, we first compare the BM method under two different extremal index estimators. The sliding blocks estimator reports lower standard deviations than the intervals estimator. Also, the corresponding VaR estimates have better values for all four evaluation criteria. Therefore, we conclude that the sliding blocks estimator is preferable to the intervals estimator. This is a positive outcome from a practical point of view, as the sliding blocks estimator is more in line with the BM approach: no additional parameter needs to be defined, as was required for the intervals estimator. In the POT method, the adjusted variances are larger, as expected. Additionally, the adjusted variances are less sensitive to the type of GARCH model. Comparing the BM method with the POT method, we prefer the BM method in terms of variance and MSE, but find the POT method better in terms of bias and coverage probabilities. The differences between both methods become larger for the VaR0.999 results. This also makes us believe that the BM confidence intervals are too narrow and the POT confidence intervals too wide. Therefore, the BM method would be more suitable when estimating the VaR0.999 for serially dependent data.

In the financial application, we notice that the VaR estimates from the BM method with sliding blocks estimator are the lowest, the estimates from the POT method fall in the middle of the range and the estimates from the BM method with intervals estimator are the highest. This applies to both the VaR0.99 and the VaR0.999, regardless of the size of k. Taking the performance of the estimators in the simulation study into account, we conclude that the BM method with intervals estimator tends to overestimate and that the POT method yields the point estimates that are most stable with respect to the parameter k. The confidence intervals are smaller under the BM method than under the POT method. However, based on our simulations, we would consider the BM confidence intervals to be too narrow and the POT confidence intervals to be too wide. Although the POT intervals seem reasonable for VaR0.99, especially for larger k, the intervals for VaR0.999 are extremely large and thus meaningless.

The failure of our confidence intervals calls for further research into the variances of both methods in the non-i.i.d. setting. Similar to the POT method, there might be reason to adjust the variances for the BM method as well. Likewise, the adjusted variance for POT estimates with upper tail probability p = 0.001 and beyond could be improved. We could also consider different ways to construct confidence intervals, for instance by using the profile likelihood. In spite of that, our simulation study and financial application point to the same conclusion: neither of the methods convincingly outperforms the other, but we gain insight into the behavior of both methods for different model assumptions regarding underlying distributions, block sizes and thresholds.


References

Berghaus, B., & Bücher, A. (2016). Weak convergence of a pseudo maximum likelihood estimator for the extremal index. The Annals of Statistics, 46 (5), 2307–2335.

Bollerslev, T. (1986). Generalized autoregressive conditional heteroskedasticity. Journal of Econometrics, 31 (3), 307–327.

Bücher, A., & Segers, J. (2018). Maximum likelihood estimation for the Fréchet distribution based on block maxima extracted from a time series. Official Journal of the Bernoulli Society for Mathematical Statistics and Probability, 24 (2), 1427–1462.

Chavez-Demoulin, V., & Davison, A. (2012). Modelling time series extremes. REVSTAT – Statistical Journal , 10 (1), 109–133.

Cont, R. (2001). Empirical properties of asset returns: stylized facts and statistical issues. Quantitative Finance, 1 , 223–236.

Dombry, C. (2015). Existence and consistency of the maximum likelihood estimators for the extreme value index within the block maxima framework. Official Journal of the Bernoulli Society for Mathematical Statistics and Probability, 21 (1), 420–436.

Drees, H., et al. (2003). Extreme quantile estimation for dependent data, with applications to finance. Bernoulli, 9 (4), 617–657.

Duffie, D., & Pan, J. (1997). An overview of value at risk. Journal of Derivatives, 4 (3), 7–49.

Fawcett, L., & Walshaw, D. (2012). Estimating return levels from serially dependent extremes. Environmetrics, 23 (3), 272–283.

Ferreira, A., & De Haan, L. (2015). On the block maxima method in extreme value theory: PWM estimators. The Annals of Statistics, 43 (1), 276–298.
