
Master’s Thesis

A comparison of alternative models for

Value at Risk and Expected Shortfall

Timna van der Horst

Student number: 10323104

Date of final version: January 15, 2018
Master's programme: Econometrics

Specialisation: Financial Econometrics
Supervisor: Prof. dr. H. P. Boswijk
Second reader: Dr. A. C. Rapp

Faculty of Economics and Business


Amsterdam School of Economics

Requirements thesis MSc in Econometrics.

1. The thesis should have the nature of a scientific paper. Consequently the thesis is divided into a number of sections and contains references. An outline can be something like (this is an example for an empirical thesis; for a theoretical thesis have a look at a relevant paper from the literature):

(a) Front page (requirements see below)

(b) Statement of originality (compulsory, separate page)

(c) Introduction

(d) Theoretical background

(e) Model

(f) Data

(g) Empirical Analysis

(h) Conclusions

(i) References (compulsory)

If preferred you can change the number and order of the sections (but the order you use should be logical) and the headings of the sections. You have a free choice of how to list your references, but be consistent. References in the text should contain the names of the authors and the year of publication, e.g. Heckman and McFadden (2013). In the case of three or more authors: list all names and the year of publication for the first reference and use the first name with et al. and the year of publication for subsequent references. Provide page numbers.

2. As a guideline, the thesis usually contains 25-40 pages using a normal page format. All that actually matters is that your supervisor agrees with your thesis.

3. The front page should contain:

(a) The logo of the UvA, a reference to the Amsterdam School of Economics and the Faculty as in the heading of this document. This combination is provided on Blackboard (in MSc Econometrics Theses & Presentations).

(b) The title of the thesis

(c) Your name and student number

(d) Date of submission final version

(e) MSc in Econometrics

(f) Your track of the MSc in Econometrics



Abstract

The current study provides a comprehensive overview of existing (semi-)parametric VaR and ES models and proposes to estimate these risk measures based on a strictly consistent loss function (Fissler & Ziegel, 2016). We propose nonlinear VaR and ES models that allow for financial time series and high frequency data. This study implements various Monte Carlo simulations and we confirm that the proposed VaR and ES models perform well in finite samples. We apply all estimation methods to three daily international stock indices. The application shows that the so-called asymmetric slope CAViaR model outperforms forecasts based on GARCH and other CAViaR-based models. Moreover, the study investigates to what extent the results remain consistent when different sampling windows and other out-of-sample periods are used.

Statement of Originality

This document is written by Timna van der Horst, who declares to take full responsibility for the contents of this document. I declare that the text and the work presented in this document are original and that no sources other than those mentioned in the text and its references have been used in creating it. The Faculty of Economics and Business is responsible solely for the supervision of completion of the work, not for the contents.


Contents

1 Introduction
2 Theory
  2.1 Definition VaR and ES
  2.2 Elicitability
    2.2.1 A strictly consistent loss function for VaR and ES
3 (Semi-)parametric models for VaR and ES
  3.1 GARCH models
    3.1.1 GARCH and EVT
  3.2 CAViaR models
    3.2.1 CAViaR and EVT
    3.2.2 By-product of a quantile regression model
  3.3 Realized kernel models
4 Numerical estimation of the LF-models
  4.1 Optimization
  4.2 Asymptotic variance-covariance estimation
5 Backtesting methods
  5.1 Backtest VaR
    5.1.1 Dynamic Quantile test
  5.2 ES bootstrap test
  5.3 Testing forecast dominance
    5.3.1 Diebold-Mariano test
    5.3.2 Backtest from Ziegel, Krüger, Jordan & Fasciati (2017)
6 Monte Carlo simulation study
  6.1 Simulation study LF-GARCH
  6.2 Simulation study LF-RK
7 Empirical application
  7.1 In-sample estimation
  7.3 Out-of-sample estimation
    7.3.1 Different window sizes
    7.3.2 Influence of the crisis period
    7.3.3 Various out-of-sample periods
    7.3.4 Crisis period in out-of-sample
    7.3.5 Realized kernel models
8 Conclusion
References


1 Introduction

The financial crisis of 2008-2009 and its impact have emphasized the importance of accurate quantitative risk measures in banking supervision and internal risk management. Value at Risk (VaR) is a widely used tool which has emerged as the industry standard by choice or by regulation (see for instance Jorion, 1997). VaR is defined as the maximum potential loss associated with a financial security or portfolio within a specific time period for a given confidence level. Although VaR is an intuitive and simple risk measure, it provides no information about the magnitude of the return losses in the lower percentiles. A second major criticism is that the risk measure has an undesirable mathematical property, namely a lack of subadditivity (Artzner, Delbaen, Eber & Heath, 1997, 1999).1 In other words, VaR is not a coherent measure of risk.2 This problem is caused by the fact that VaR is a quantile of the return distribution, so the shape of the tail before and after the VaR statistic need not have any bearing on the actual VaR prediction. To circumvent these shortcomings, Artzner et al. (1997, 1999) introduce an alternative measure of financial risk referred to as the Expected Shortfall (ES). This risk measure is more sensitive to the shape of the tails of the return distribution and has the appeal of being a coherent risk measure (Artzner et al., 1997, 1999). ES is defined as the expected return loss given that the return is below its VaR level. The Basel Committee on Banking Supervision (BCBS) also noted these shortcomings and recently decided to shift the quantitative risk metrics system from the more familiar VaR to the ES risk measure (Basel Committee on Banking Supervision, 2016).3

The econometrics literature has provided a variety of models to evaluate VaR and ES (see the overview in Kuester, Mittnik & Paolella, 2006). Schematically, we divide these existing models into the following three categories: (1) non-parametric methods, which are based on empirical distributions, (2) parametric methods with full distributional assumptions and (3) semi-parametric methods, which have parametric and non-parametric

1 Subadditivity is based on the principle of diversification and risk aggregation. A risk measure ρ is subadditive if ρ(X1 + X2) ≤ ρ(X1) + ρ(X2) for all X1 and X2, i.e. the risk of two aggregated financial portfolios X1 and X2 should be less than or equal to the sum of their individual risks (Artzner et al., 1999). VaR violates subadditivity, except in particular cases, e.g. a joint Gaussian distribution for X1 and X2.

2 A risk measure ρ(X) is defined to be a coherent risk measure if and only if it satisfies the following four axioms: (1) translational invariance, (2) subadditivity, (3) positive homogeneity and (4) monotonicity (Artzner et al., 1999). A risk measure is said to be translation invariant if for all fixed a, ρ(X + a) = ρ(X) + a; subadditive if ρ(X1 + X2) ≤ ρ(X1) + ρ(X2); positively homogeneous if for all λ ≥ 0, ρ(λX) = λρ(X); and, finally, it is said to be monotone if X ≤ Y a.s. implies ρ(X) ≤ ρ(Y).



components. Despite the strong appeal of the non- and semi-parametric methods, which avoid the need to specify and estimate a conditional return distribution, ES estimation, especially in the semi-parametric category, is challenging. The difficulty lies partially in the fact that ES fails to be elicitable (Gneiting, 2011). A risk measure or statistical functional is defined to be elicitable if there exists a loss function such that the correct forecast of the risk measure or functional uniquely minimizes the expected loss.

The semi-parametric category includes models based on extreme value theory (EVT) and regression quantile approaches. McNeil & Frey (2000) estimate the volatility by fitting a GARCH-type model and apply EVT to obtain a parametric tail distribution. Engle & Manganelli (2004a) directly compute the quantile of the distribution instead of modeling the whole distribution. Although direct quantile modeling is an advantage for VaR, ES estimation in this way is infeasible due to the lack of elicitability (Gneiting, 2011). Some researchers overcome the problem of elicitability: Taylor (2008) estimates expectiles and maps these to ES, Engle & Manganelli (2004b) discuss using EVT in combination with CAViaR dynamics and Chun, Shapiro & Uryasev (2012) construct a mixed quantile framework for VaR and ES estimation.

Recently, Fissler & Ziegel (2016) presented a more direct solution to the problem of elicitability by showing that, even though ES itself is not elicitable, VaR and ES are jointly elicitable. Using this insight, Dimitriadis & Bayer (2017) focus on independent and identically distributed (iid) return series and consider a VaR and ES regression framework with linear specifications. Barendse (2017) allows for financial time series but imposes linear specifications for VaR and ES. In this thesis, (nonlinear) models are introduced that allow for financial time series and high frequency data. The purpose of this thesis is to provide an extensive overview of (semi-)parametric models and to investigate whether these joint VaR and ES models have a substantial advantage over existing (semi-)parametric models at common tail probability levels. The performance of the models is assessed by traditional VaR and ES backtests from McNeil & Frey (2000) and Engle & Manganelli (2004a), and we use comparative backtesting methods for model comparisons in terms of forecasting accuracy.

The thesis is set up as follows. Section 2 formally defines VaR, ES and elicitability and introduces the strictly consistent loss function for the pair VaR and ES. Section 3 presents a comprehensive overview of different existing models, and we propose to estimate them via joint VaR and ES minimization. Section 4 provides numerical details on the implementation of the joint VaR and ES models and Section 5 discusses the commonly used backtesting methods for evaluating the adequacy of the models. Section 6 conducts several series of Monte Carlo experiments to assess the finite sample accuracy of the models estimated via joint minimization and Section 7 applies all models to real data to illustrate their use both in-sample and out-of-sample. Finally, Section 8 summarizes the main findings, concludes the thesis and provides suggestions for future research. The appendix contains supplementary analyses.


2 Theory

This section formally defines VaR, ES and the notion of elicitability more extensively, and introduces the strictly consistent loss function for the pair VaR and ES, which is the main interest of the thesis.

2.1 Definition VaR and ES

For a given financial security or portfolio and confidence level α, the one-day-ahead Value at Risk ($\mathrm{VaR}^t_\alpha$) is defined as the maximum potential return loss conditional on the information at time t − 1:

$$P[y_t \leq \mathrm{VaR}^t_\alpha \mid \mathcal{F}_{t-1}] = \alpha, \qquad (1)$$

where $y_t$ is a real-valued variable describing the daily period return at time point t and $\mathcal{F}_{t-1}$ represents the information set available up to time t − 1. In this thesis a negative value of $y_t$ corresponds to a loss; hence we are interested in $\mathrm{VaR}^t_\alpha$ for values of α close to zero.

A drawback of VaR is that the size of the losses beyond the quantile level α is not taken into account. $\mathrm{ES}^t_\alpha$ is defined as the expectation of the return loss of the underlying financial portfolio or security given that the return is below its $\mathrm{VaR}^t_\alpha$ level, conditional on the information available at t − 1:

$$\mathrm{ES}^t_\alpha = E[y_t \mid y_t \leq \mathrm{VaR}^t_\alpha,\ \mathcal{F}_{t-1}] \qquad (2)$$

Often, t is measured in days (e.g. one day or one week) and α is typically chosen to be 0.01 or 0.05. For small values of α, $\mathrm{VaR}^t_\alpha$ and $\mathrm{ES}^t_\alpha$ are a.s. strictly negative for all t.
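As a concrete illustration of definitions (1) and (2), the following sketch computes the empirical (historical-simulation) counterparts of VaR and ES from a return sample. This is the simplest non-parametric estimator, shown only for intuition; it is not one of the models studied in this thesis, and the standard-normal "returns" are a simulation assumption.

```python
import numpy as np

def historical_var_es(returns, alpha=0.05):
    """Empirical counterparts of eqs. (1)-(2): VaR is the alpha-quantile
    of the sample, ES the mean of the returns at or below it."""
    r = np.asarray(returns, dtype=float)
    var = np.quantile(r, alpha)
    es = r[r <= var].mean()
    return var, es

# For standard-normal "returns" the true values at alpha = 0.05 are
# approximately VaR = -1.645 and ES = -2.063.
rng = np.random.default_rng(0)
var, es = historical_var_es(rng.standard_normal(100_000), alpha=0.05)
```

Both estimates are strictly negative at this level, consistent with the sign convention above.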

2.2 Elicitability

From the perspective of coherence and tail risk sensitivity, ES should be preferred over VaR. However, despite these theoretical and statistical appeals, ES fails to be elicitable. A one-dimensional statistical functional, such as the mean or an expectile, is called elicitable if there exists a strictly consistent loss function for it (Gneiting, 2011). Subject to mild integrability and regularity conditions, we discuss some examples of strictly consistent loss functions for commonly used statistical functionals.

The loss functions that are strictly consistent for the mean functional are given by:

$$S(x, y) = \phi(y) - \phi(x) - \phi'(x)(y - x) \qquad (3)$$

where φ(·) is a strictly convex function and φ′(·) is its first derivative. A standard choice is φ(z) = z², which yields the well-known mean squared error loss function (Savage, 1971).

Similarly, any loss function of the form:

$$S(x, y) = \left|I[y < x] - \tau\right|\left(\phi(y) - \phi(x) - \phi'(x)(y - x)\right) \qquad (4)$$

where I[·] denotes the indicator function, is strictly consistent for the τ-expectile. The most prominent example arises when φ(z) = z², leading to the classical asymmetric piecewise quadratic loss function. It is well known that expectile-specific regression coefficients can be estimated with the help of this loss function (Newey & Powell, 1987).

Finally, a loss function is strictly consistent for the α-quantile if and only if it has the form:

$$S(x, y) = (I[y < x] - \alpha)(g(x) - g(y)) \qquad (5)$$

for a nondecreasing function g(·). A natural choice arises when g(z) = z, which simplifies to the so-called asymmetric piecewise linear function, also known as the tick-loss function (Koenker & Bassett, 1978). We refer to Gneiting & Raftery (2007) for a detailed overview of strictly consistent loss functions in more general contexts.
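The strict consistency of the tick loss in equation (5) can be checked numerically: over a grid of candidate forecasts, the sample-average tick loss should be minimized near the true α-quantile. A minimal sketch, with simulated standard-normal data assumed only for this illustration:

```python
import numpy as np

def tick_loss(x, y, alpha):
    """Average asymmetric piecewise linear loss, eq. (5) with g(z) = z."""
    return np.mean(((y < x).astype(float) - alpha) * (x - y))

rng = np.random.default_rng(1)
y = rng.standard_normal(100_000)
grid = np.linspace(-3.0, 0.0, 301)
losses = [tick_loss(x, y, 0.05) for x in grid]
best = grid[int(np.argmin(losses))]   # near the true 5% quantile, -1.645
```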

Not all one-dimensional functionals are elicitable, the most striking example in the present context being ES. However, it has been shown that a real-valued functional, although not elicitable itself, can be an element of an identifiable elicitable functional of order k, where k ≥ 2 (Osband, 1985). A well-known result is that the vector-valued mean and variance functional is jointly elicitable, while the variance alone fails to be elicitable. Similarly, it turns out that ES is elicitable of order 2, in the sense that the bivariate functional of VaR and ES is jointly elicitable under some weak regularity and integrability conditions (Fissler & Ziegel, 2016). This joint elicitability opens the possibility to estimate forecast models for ES in the same way as VaR is estimated by means of quantile regression.

2.2.1 A strictly consistent loss function for VaR and ES

Fissler & Ziegel (2016, Corollary 5.5) show that the unique minimizers of the expected value of any of the following loss functions, indexed by G1 and G2, are the true values for VaR and ES; we write $q_t$ and $e_t$ for $\mathrm{VaR}^t_\alpha$ and $\mathrm{ES}^t_\alpha$:

$$S(q_t, e_t, y_t) = (I[y_t \leq q_t] - \alpha)G_1(q_t) - I[y_t \leq q_t]G_1(y_t) + G_2(e_t)\left(e_t - q_t + \frac{(q_t - y_t)I[y_t \leq q_t]}{\alpha}\right) - \zeta_2(e_t) + a(y_t) \qquad (6)$$

where $y_t$ is the realized return at time point t, $G_1$ is increasing, $\zeta_2' = G_2$ with $\zeta_2$ strictly increasing and strictly convex, and $a(\cdot)$ is an integrable function.

To be able to use the loss function from Fissler & Ziegel (2016) in the estimation of our proposed models, we need to choose the two specification functions G1 and G2 subject to the regularity conditions defined below. Nolde & Ziegel (2017) argue that it is an important property of loss functions to be positively homogeneous, since the ranking of the generated loss differences should be unit consistent, e.g. the ranking should not depend on the currency in which the returns and risk measures are expressed. A loss function S is said to be positively homogeneous (scale invariant) of order b if for all c, x and z ∈ R:

$$S(cx, cz) = |c|^b S(x, z) \qquad (7)$$

The only scoring functions that are positively homogeneous of order 0 are obtained by choosing G1(z) = 0 and ζ2(z) = −log(−z) (with derivative G2) (Nolde & Ziegel, 2017).4 For these specification functions we have to impose that $\mathrm{VaR}^t_\alpha$ and $\mathrm{ES}^t_\alpha$ are a.s. strictly negative for all t. Since we consider values of α ranging from 0.01 to 0.10, we may assume that this is the case. We plug these choices for G1 and G2 into equation (6) and refer to the resulting loss function as the LF function. In Figure 1 we plot the contours of the strictly consistent LF function, where $y_t$ has a standard Normal distribution and α is equal to 0.05.

The LF function makes it possible to propose joint semi-parametric models for VaR and ES:

$$(q_t, e_t) = \left(\mathrm{VaR}^t_\alpha(x_{t-1}, \theta),\ \mathrm{ES}^t_\alpha(x_{t-1}, \theta)\right), \quad t = 1, \ldots, T \qquad (8)$$

where $\mathrm{VaR}^t_\alpha$ and $\mathrm{ES}^t_\alpha$ are two functions, parametrized by θ ∈ Θ ⊆ R^p, of the explanatory variables included in $x_{t-1}$. Let $x_{t-1} \in \mathcal{F}_{t-1}$ be a vector of observable variables at time t − 1, where $\mathcal{F}_{t-1}$ is defined as the σ-algebra $\mathcal{F}_{t-1} = \sigma(y_{t-1}, z'_{t-1}, y_{t-2}, z'_{t-2}, \ldots, y_1, z'_1)$. The realized return at time point t is denoted by $y_t$, whereas $z_t$ is defined as the vector of

4 There are no strictly consistent loss functions for the pair VaR and ES that are positively homogeneous of order b ≥ 1, only for b = 0 (Nolde & Ziegel, 2017).


exogenous variables. The vector θ ∈ Θ, containing p parameters, is estimated via:

$$\hat{\theta}_{S,T} = \operatorname*{argmin}_{\theta \in \Theta} \frac{1}{T} \sum_{t=1}^{T} S(q_t, e_t, y_t) \qquad (9)$$

where $\hat{\theta}_{S,T}$ is the estimator of the parameter of interest θ, based on a sample of T observations. In exploratory analysis, we found that the performance of the parameter estimator strongly depends on the choice of the specification functions.

Figure 1: Contours of the LF function when the distribution of the returns is standard Normal. The star corresponds to the minimum of the LF function, which is attained at $\mathrm{VaR}_\alpha = -1.63$ and $\mathrm{ES}_\alpha = -2.04$.


3 (Semi-)parametric models for VaR and ES

In this section, we discuss existing (semi-)parametric models for VaR and ES and we propose to estimate the parameters via LF minimization.

3.1 GARCH models

We propose models for VaR and ES predictions based on GARCH dynamics for the conditional variance, combined with two different distributions for the standardized errors. We assume that the dynamics of the return series can be modeled as:

$$y_t = \sigma_t z_t \qquad (10)$$

$$\sigma_t^2 = \alpha + \beta_1 \sigma_{t-1}^2 + \beta_2 y_{t-1}^2 \qquad (11)$$

where the conditional variance $\sigma_t^2$ of the returns follows a GARCH(1,1) process (Bollerslev, 1986) and the standardized errors $z_t$ are iid random variables with zero mean, unit variance and marginal, strictly increasing distribution function $F_z(z)$. The unknown parameters (α, β1, β2) are estimated using quasi-maximum likelihood (QML), under some choice for $F_z(z)$.

The model equations and the specified distribution $F_z(z)$ imply expressions for the VaR and ES forecasts:

$$\widehat{\mathrm{VaR}}{}^t_\alpha = z_q \hat{\sigma}_t \qquad (12)$$

$$\widehat{\mathrm{ES}}{}^t_\alpha = E[z_t \mid z_t \leq z_q]\,\hat{\sigma}_t \qquad (13)$$

where $z_q = \inf\{z \in \mathbb{R} : F_z(z) \geq \alpha\}$, i.e. the left α-quantile of the distribution $F_z(z)$, which is assumed not to depend on t. As already mentioned above, in order to evaluate VaR and ES we have to make a distributional assumption for $F_z(z)$ to identify $z_q$ and $E[z_t \mid z_t \leq z_q]$. It is conventional to assume that the standardized error distribution is standard Normal. Several alternative conditional distributions have been proposed, the most common specification being the standardized (location-zero, scale-one) Student-t distribution. The Student-t specification may be more appropriate, as significant evidence suggests that financial return distributions are heavier tailed than the standard Normal distribution (e.g. Danielsson & De Vries, 2000). The empirical application of these estimation models is provided in Section 7. We denote the GARCH model with standard Normal and Student-t innovation distribution by GARCH-N and GARCH-t, respectively.


3.1.1 GARCH and EVT

McNeil & Frey (2000) propose a method which avoids distributional assumptions about the standardized error distribution $F_z(z)$ by modeling the distribution of exceedances above a specified high threshold using extreme value theory. Define the conditional excess distribution function $F_u(y)$ above a threshold u as:

$$F_u(y) = P[Z \leq u + y \mid Z > u] = \frac{F(u + y) - F(u)}{1 - F(u)}, \quad y \geq 0 \qquad (14)$$

Therefore,

$$1 - F(z) = (1 - F(u))(1 - F_u(z - u)) \qquad (15)$$

The Pickands-Balkema-de Haan theorem (Balkema & de Haan, 1974; Pickands, 1975) states that for a reasonably wide class of distributions, above a sufficiently high threshold u, the excess distribution converges to the generalized Pareto distribution (GPD) with the following two-parameter cumulative distribution function:

$$G_{\xi,\beta}(y) = \begin{cases} 1 - (1 + \xi y/\beta)^{-1/\xi} & \text{if } \xi \neq 0 \\ 1 - \exp(-y/\beta) & \text{if } \xi = 0 \end{cases} \qquad (16)$$

where the constant ξ ∈ R is called the shape parameter, which characterizes the tail behavior of $G_{\xi,\beta}$, and β > 0 is an additional scaling parameter. The following values of ξ are of particular interest: ξ = 0, ξ > 0 or ξ < 0 respectively indicate an exponentially decaying, heavy-tailed or short-tailed distribution in the limit.

Define the order statistics z(1) ≤ z(2) ≤ ... ≤ z(T) as the sorted values of the sequence $\{z_t\}_{t=1}^T$ of iid continuous random variables from an unknown distribution $F_z(z)$. We set the number of observations that overshoot a certain high threshold u equal to a specific number k, with k/T > α. Hence, the threshold u is effectively represented by the random (k+1)-st ascending order statistic z(k+1). We estimate the parameters β and ξ of the GPD model by maximum likelihood, fitted to the excess loss data {z(1) − z(k+1), z(2) − z(k+1), ..., z(k) − z(k+1)}. Given the maximum likelihood estimates ($\hat{\beta}$, $\hat{\xi}$), $G_{\hat{\xi},\hat{\beta}}(y)$ provides an estimate of $F_u(y)$. The function F(u) in equation (15) can in turn be estimated non-parametrically by the empirical distribution function (EDF) of $\{z_t\}_{t=1}^T$ evaluated at u, $\hat{F}(u) = 1 - k/T$. This means that the estimator of the tail distribution for z ≥ u is obtained as:

$$\hat{F}(z) = 1 - \frac{k}{T}\left(1 + \hat{\xi}\,\frac{z - z_{(k+1)}}{\hat{\beta}}\right)^{-1/\hat{\xi}} \qquad (17)$$


Using equation (17), the extreme quantile estimator is given by:

$$\hat{z}^t_q = -\left(u + \frac{\hat{\beta}}{\hat{\xi}}\left(\left(\frac{\alpha}{k/T}\right)^{-\hat{\xi}} - 1\right)\right), \quad \hat{\xi} \neq 0 \qquad (18)$$

Next, VaR and ES are analytically expressed in terms of GPD and GARCH parameters:

$$\widehat{\mathrm{VaR}}{}^t_\alpha = \hat{\sigma}_{t+1}\hat{z}^t_q \qquad (19)$$

$$\widehat{\mathrm{ES}}{}^t_\alpha = \hat{\sigma}_{t+1}\hat{z}^t_q\left(\frac{1}{1-\hat{\xi}} + \frac{\hat{\beta} - \hat{\xi}\, z_{(k+1)}}{\hat{z}^t_q\,(1-\hat{\xi})}\right), \quad \hat{\xi} < 1 \qquad (20)$$
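The EVT step can be sketched as follows: select the k largest losses above the threshold given by the (k+1)-st order statistic and fit a GPD to the exceedances by maximum likelihood. The Student-t(4) data (whose true GPD shape is ξ = 1/4) and the choice k = 500 are illustrative assumptions only:

```python
import numpy as np
from scipy.stats import genpareto

# Simulated heavy-tailed "standardized residuals".
rng = np.random.default_rng(5)
z = rng.standard_t(4, size=20_000)

losses = np.sort(-z)              # positive losses, ascending
k = 500
u = losses[-(k + 1)]              # threshold: the (k+1)-st largest loss
excesses = losses[-k:] - u        # k exceedances over the threshold
# ML fit of the GPD of eq. (16); floc=0 pins the location at the threshold.
xi_hat, _, beta_hat = genpareto.fit(excesses, floc=0)
```

The fitted shape `xi_hat` should come out positive, signalling a heavy tail, with `beta_hat` the estimated scale.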

We refer to the VaR and ES predictions from McNeil & Frey (2000) as GARCH-EVT. If the assumption about the standardized residual distribution is violated, it is in general no longer clear that maximum likelihood is optimal. Alternatively, we estimate the parameters via LF minimization, rather than estimating the GARCH models by QML under assumptions about the data generating process (DGP). We refer to this model as LF-GARCH. In order to identify $z_q$ and $E[z_t \mid z_t \leq z_q]$, we include two additional parameters in the LF-GARCH parameter set, denoted by γ1 and γ2. In exploratory simulation analyses we found that the search method for the minimization problem (9) is highly sensitive to the starting value of the intercept parameter α. We therefore propose a reparametrization of the volatility equation, based on the so-called variance targeting technique introduced by Engle & Mezrich (1996), in which the intercept is replaced by the following equation:

$$\alpha = (1 - \beta_1 - \beta_2)\sigma^2 \qquad (21)$$

We estimate the unconditional variance σ² by the variance of the in-sample returns. The specification for the conditional variance $\sigma_t^2$ can then be rewritten as follows:

$$\sigma_t^2 = \sigma^2 + \beta_1(\sigma_{t-1}^2 - \sigma^2) + \beta_2(y_{t-1}^2 - \sigma^2) \qquad (22)$$

The LF-GARCH optimization parameter vector to be estimated becomes θ = (β1, β2, γ1, γ2). The parameters need to be constrained to preserve positivity and stationarity of the variance process $\sigma_t^2$ by requiring that α > 0, β1 ≥ 0, β2 ≥ 0 and β1 + β2 < 1. We follow Sheppard (2013) in transforming the parameters to impose these restrictions.

3.2 CAViaR models

Engle & Manganelli (2004a) directly model the conditional quantile for a chosen probability level using quantile regression, rather than extracting the quantile from an estimate of a fully specified distribution. The Conditional Autoregressive Value at Risk (CAViaR) model specifies an autoregressive process for the evolution of the desired conditional quantile. A generic CAViaR model relates the quantile of the return variable at time t to its own lags and, in addition, to previous returns:

$$\widehat{\mathrm{VaR}}{}^t_\alpha = \hat{\beta}_1 + \sum_{i=2}^{p} \hat{\beta}_i\, \widehat{\mathrm{VaR}}{}^{t-i+1}_\alpha + \sum_{j=2}^{q} \hat{\beta}_j\, l(y_{t-j+1}) \qquad (23)$$

where l is some (non)linear function of the lagged returns and the parameter vector β ∈ R^k (with dimension k = 1 + p + q) is found by minimizing the tick-loss function, following Koenker & Bassett (1978). In particular, we define in expressions (24)-(27) four conditional autoregressive specifications of the generic CAViaR model. The symmetric absolute value CAViaR specification, which responds symmetrically to past returns, is given by:

$$\widehat{\mathrm{VaR}}{}^t_\alpha = \hat{\beta}_1 + \hat{\beta}_2\, \widehat{\mathrm{VaR}}{}^{t-1}_\alpha + \hat{\beta}_3 |y_{t-1}| \qquad (24)$$

The asymmetric slope CAViaR specification is designed to capture the leverage effect, i.e. a large negative shock is expected to increase volatility more than a large positive shock, by responding differently to positive and negative returns:

$$\widehat{\mathrm{VaR}}{}^t_\alpha = \hat{\beta}_1 + \hat{\beta}_2\, \widehat{\mathrm{VaR}}{}^{t-1}_\alpha + \hat{\beta}_3\, y_{t-1} I(y_{t-1} > 0) + \hat{\beta}_4\, y_{t-1} I(y_{t-1} < 0) \qquad (25)$$

The indirect GARCH(1,1) CAViaR model, which responds symmetrically to past returns, is appropriate if the returns are generated by a GARCH(1,1) process with iid symmetric standardized errors:

$$\widehat{\mathrm{VaR}}{}^t_\alpha = \sqrt{\hat{\beta}_1 + \hat{\beta}_2\, (\widehat{\mathrm{VaR}}{}^{t-1}_\alpha)^2 + \hat{\beta}_3\, y_{t-1}^2} \qquad (26)$$

The adaptive CAViaR model is a smooth model, in the sense that if VaR is exceeded it should increase in the next period, whereas if VaR is not exceeded it should decrease very slightly:

$$\widehat{\mathrm{VaR}}{}^t_\alpha = \widehat{\mathrm{VaR}}{}^{t-1}_\alpha + \hat{\beta}_1\left(\left[1 + \exp\!\left(G\left[y_{t-1} - \widehat{\mathrm{VaR}}{}^{t-1}_\alpha\right]\right)\right]^{-1} - \alpha\right) \qquad (27)$$

where the parameter G is some sizeable positive number; as G → ∞, the model converges to $\widehat{\mathrm{VaR}}{}^t_\alpha = \widehat{\mathrm{VaR}}{}^{t-1}_\alpha + \hat{\beta}_1(I[y_{t-1} \leq \widehat{\mathrm{VaR}}{}^{t-1}_\alpha] - \alpha)$. In our empirical application, we set G equal to 10.
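A minimal sketch of the symmetric absolute value recursion (24), with illustrative rather than estimated parameters; the tick loss that is minimized in CAViaR estimation is computed as a by-product:

```python
import numpy as np

def caviar_sav(y, beta, var0):
    """Symmetric absolute value CAViaR recursion, eq. (24)."""
    b1, b2, b3 = beta
    var = np.empty(len(y))
    var[0] = var0
    for t in range(1, len(y)):
        var[t] = b1 + b2 * var[t - 1] + b3 * abs(y[t - 1])
    return var

def tick_loss(var, y, alpha):
    """The quantile loss minimized in CAViaR estimation."""
    return np.mean(((y < var).astype(float) - alpha) * (var - y))

# Illustrative parameters: b3 < 0 pushes VaR further below zero after
# large absolute returns; var0 seeds the recursion at the normal quantile.
rng = np.random.default_rng(6)
y = rng.standard_normal(2_000)
var_path = caviar_sav(y, beta=(-0.05, 0.9, -0.2), var0=-1.645)
loss = tick_loss(var_path, y, alpha=0.05)
```

In estimation, `beta` would be chosen to minimize `loss` over the sample.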

The effect of past returns on current VaR predictions in a CAViaR model becomes more insightful by considering the news impact curve (NIC). This curve represents how the estimated $\widehat{\mathrm{VaR}}{}^t_\alpha$ from the CAViaR model changes as the lagged return $y_{t-1}$ varies, using the estimated parameter vector β.5 We set $\widehat{\mathrm{VaR}}{}^{t-1}_\alpha$ equal to −2.326 and −1.645, respectively the 0.01 and 0.05 quantiles of a standard Normal distribution. Figure 2 plots the NICs for the four CAViaR specifications. One can see that the symmetric absolute value CAViaR and indirect GARCH curves respond symmetrically to past returns. The adaptive CAViaR curve reacts differently when the previous return exceeds the VaR estimate than when it does not. Finally, the asymmetric slope CAViaR curve suggests that negative returns tend to decrease the VaR estimate much more than positive returns.

Figure 2: NICs of the four CAViaR specifications for daily returns of 3000 observations ranging from 2003 to 2015 of the S&P500. Panels: (i) symmetric absolute value, (ii) asymmetric slope, (iii) indirect GARCH, (iv) adaptive.

3.2.1 CAViaR and EVT

For extreme quantile levels, CAViaR estimation becomes problematic. Hence, Engle & Manganelli (2004b) introduce an alternative estimator for VaR which incorporates EVT

5 We use the estimated parameters of the CAViaR models for the S&P500 and α = 0.05 (details are presented in Section 7 below and the parameters are presented in Table 9).


into the quantile regression framework. First, we fit a CAViaR specification to estimate the conditional quantile at an intermediate quantile level p (between 0.05 and 0.10). This quantile is then used to estimate the extreme conditional quantile. The series of standardized quantile residuals is constructed as follows:6

$$\hat{\varepsilon}^t_p = y_t - \widehat{\mathrm{VaR}}{}^t_p \qquad (28)$$

We calculate the CAViaR quantile estimator in combination with EVT by the following formula:

$$\widehat{\mathrm{VaR}}{}^t_\alpha = \widehat{\mathrm{VaR}}{}^t_p\,(1 + \hat{z}_q) \qquad (29)$$

where $\hat{z}_q$ is the EVT estimate of the α-quantile of the standardized quantile residuals.

3.2.2 By-product of a quantile regression model

CAViaR models have a strong appeal since they do not rely on distributional assumptions. However, the models concentrate solely on VaR estimation and it is therefore not clear how to estimate the corresponding ES.

Several researchers propose approaches to estimate ES. For instance, Engle & Manganelli (2004b) note that, given that the conditional expectation must be constant, ES can simply be estimated by a regression of the returns exceeding the quantile against the estimated quantile. We refer to this model as RQ-CAViaR and to the CAViaR framework which incorporates EVT as RQ-CAViaR-EVT.

Another approach is to estimate the parameters of the CAViaR models via LF minimization; we will refer to this as the LF-CAViaR model. We propose a CAViaR model for the conditional quantile component and, in order to compare the LF-CAViaR model with the RQ-CAViaR and RQ-CAViaR-EVT, we specify the following linear equation for the conditional ES component:

$$\widehat{\mathrm{ES}}{}^t_\alpha = \hat{\beta}_E\, \widehat{\mathrm{VaR}}{}^t_\alpha \qquad (30)$$

Instead of using a simple linear regression model to estimate the ES-specific parameter βE, the LF function is minimized to estimate βE. The LF-CAViaR parameter set to be estimated becomes θ = (β1, ..., βi, βE), where i is equal to 2, 3 or 4, depending on the choice of the CAViaR model.

6 If the specified model is assumed to be correct, the distribution of the standardized quantile residuals beyond the p-quantile does not depend on t.


Koenker (2005) provides an unconditional ES estimator which can be viewed as a by-product of a quantile regression model (Komunjer, 2007) and shows that the following equation for ES holds:

$$\mathrm{ES}^t_\alpha = E[x_t] - \frac{1}{\alpha}\, E\!\left[(x_t - \mathrm{VaR}^t_\alpha)\left(\alpha - I(x_t \leq \mathrm{VaR}^t_\alpha)\right)\right] \qquad (31)$$

Koenker (2005) suggests that the mean can be evaluated by the sample mean and that the expectation can be replaced by the minimized quantile regression objective. A disadvantage of this method is that it provides an unconditional estimator for ES. Daily returns are likely to exhibit heteroscedasticity, so it is plausible that an ES estimate should also vary over time. To deliver a conditional ES estimate, Taylor (2008) introduces Conditional Autoregressive Expectile (CARES) models which are based on the CAViaR specifications. Efron (1991) shows that there exists an interesting one-to-one relation between expectiles and quantiles. In fact, the α-quantile can be evaluated by the expectile for which the proportion of in-sample observations lying below it is equal to α. Consequently, conditional expectiles can be used to estimate the quantiles. Taylor (2008) relates ES and the conditional expectile μt(τ) by the following simple formula:

$$\mathrm{ES}^t_\alpha = \left(1 + \frac{\tau}{(1 - 2\tau)\alpha}\right)\mu_t(\tau) \qquad (32)$$

where τ is a real number satisfying $\mu_t(\tau) = \mathrm{VaR}^t_\alpha$.

Subsequently, Taylor (2008) proposes to substitute the conditional expectile µt(τ )

from expression (32) and delivers the following conditional asymmetric slope CARE model:

d EStα=  1 + ˆτ (1 − 2ˆτ )α  ˆ β1+ ˆβ2ESαt−1+  1 + τˆ (1 − 2ˆτ )α  ˆ β3yt−1I(yt−1> 0)+  1 + τˆ (1 − 2ˆτ )α  ˆ β4yt−1I(yt−1< 0) (33)

where the β parameters are estimated using the classical asymmetric piecewise quadratic loss function (Newey & Powell, 1987) and the value of τ is selected such that the proportion of in-sample observations lying below the conditional expectile equals α. We refer to this model as RQ-CAViaR-CARES.
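The expectile route to ES can be made concrete with a small unconditional sketch (an illustration of the mechanics, not Taylor's full conditional CARE estimation): we compute a sample τ-expectile by iterated asymmetric least squares, bisect for the τ whose expectile has a fraction α of observations below it, and then apply formula (32). All function names are ours, and the mapping in (32) assumes zero-mean returns.

```python
import numpy as np

def expectile(y, tau, max_iter=200, tol=1e-10):
    """Sample tau-expectile via iterated asymmetric least squares."""
    mu = float(np.mean(y))
    for _ in range(max_iter):
        w = np.where(y < mu, 1.0 - tau, tau)        # asymmetric weights
        mu_new = float(np.sum(w * y) / np.sum(w))   # weighted-mean fixed point
        if abs(mu_new - mu) < tol:
            return mu_new
        mu = mu_new
    return mu

def es_from_expectile(y, alpha, iters=60):
    """Bisect for the tau with P(y < mu(tau)) = alpha, then map the
    expectile to ES with expression (32) (valid for zero-mean y)."""
    lo, hi = 1e-6, 0.5 - 1e-6
    for _ in range(iters):
        tau = 0.5 * (lo + hi)
        if np.mean(y < expectile(y, tau)) > alpha:
            hi = tau        # too many observations below: lower tau
        else:
            lo = tau
    tau = 0.5 * (lo + hi)
    mu = expectile(y, tau)                           # plays the role of VaR_alpha
    es = (1.0 + tau / ((1.0 - 2.0 * tau) * alpha)) * mu
    return mu, es
```

On a large standard Normal sample with α = 0.05 this recovers estimates close to the theoretical VaR of −1.645 and ES of −2.063.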

Another approach is to estimate the parameters of the CARES models via LF minimization. For the VaR component, we propose an asymmetric slope CAViaR model and for the ES component, we propose a conditional autoregressive specification:

$$\widehat{ES}^t_\alpha = \hat\gamma_1 + \hat\gamma_2\widehat{ES}^{t-1}_\alpha + \hat\gamma_3\, y_{t-1}I(y_{t-1}>0) + \hat\gamma_4\, y_{t-1}I(y_{t-1}<0) \tag{34}$$


3.3 Realized kernel models

In the following section we obtain models for VaR and ES based on high frequency data. One can use the realized volatility as an estimate of the volatility of the intra-day returns. The realized volatility is computed as the square root of the sum of intra-day squared returns. Under ideal assumptions, $\sigma_t$ may be consistently estimated by the realized volatility measure (Barndorff-Nielsen & Shephard, 2002):

$$RV_t = \sqrt{\sum_{i=1}^{n} r_{t,i}^2} \;\xrightarrow{p}\; \sigma_t \tag{35}$$

where the n intra-day returns on day t are used to estimate the volatility of the daily return $r_t = \sum_{i=1}^{n} r_{t,i}$. However, market microstructure noise resulting from bid-ask bounces and non-synchronous trading effects induces negative autocorrelation in the intra-day returns, so that the convergence result is not attainable. Various correction methods have been proposed to estimate the intra-day volatility. The most popular is the realized kernel method developed by Barndorff-Nielsen, Hansen, Lunde & Shephard (2008):

$$RK_t = \sqrt{\sum_{i=1}^{n} r_{t,i}^2 + 2\sum_{j=1}^{l} w_j^l \sum_{i=j+1}^{n} r_{t,i}\, r_{t,i-j}} \tag{36}$$

where $w_j^l$ are weights decreasing from 1 to 0 as $j \to l$.

Several studies detect long-term dependence in daily realized variance series, characterized by hyperbolic decay rates in the autocorrelations. Corsi (2004) proposes the so-called long-memory heterogeneous autoregressive realized kernel (HAR-RK) model in order to capture this apparent high persistence in a simple way. The HAR-RK model is a linear function of previously observed realized kernels over different time horizons.

The HAR-RK model forecasts the next-day realized kernel using the following simple linear regression model:

$$RK_{t|t-1} = \beta_1 + \beta_d RK^{(d)}_{t-1} + \beta_w RK^{(w)}_{t-1} + \beta_m RK^{(m)}_{t-1} \tag{37}$$

where the terms are thought to represent daily, weekly and monthly effects. Consider the aggregated values of $RK_{t-1}$ as follows:

$$RK^{(n)}_{t-1} = \frac{1}{n}\left(RK_{t-1} + \dots + RK_{t-n}\right) \tag{38}$$

at three different time scales: d = 1 (daily), w = 5 (weekly) and m = 25 (monthly). Expression (37) is estimated by minimizing the mean squared error loss function, which


results in the ordinary least squares (OLS) estimator.
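As an illustration, the HAR-RK regression (37)-(38) can be estimated with a few lines of OLS; the helper below (the function name is ours) builds the daily, weekly and monthly aggregates and returns the one-step-ahead forecast:

```python
import numpy as np

def har_rk_forecast(rk, d=1, w=5, m=25):
    """Fit the HAR-RK model (37) by OLS on a realized-kernel series
    and return the one-step-ahead forecast RK_{T+1|T}."""
    rk = np.asarray(rk, dtype=float)
    T = len(rk)
    agg = lambda n, t: rk[t - n:t].mean()   # RK^{(n)}_{t-1} as in (38)
    X = np.array([[1.0, agg(d, t), agg(w, t), agg(m, t)] for t in range(m, T)])
    y = rk[m:]
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)    # OLS = MSE minimizer
    x_next = np.array([1.0, agg(d, T), agg(w, T), agg(m, T)])
    return float(x_next @ beta)
```

`np.linalg.lstsq` handles the near-collinearity of the three aggregates gracefully, which matters because the daily, weekly and monthly averages of a persistent series are strongly correlated.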

A common two-step procedure is applied in which we predict the one-day-ahead realized kernel in the first step; in the second step we estimate VaR and ES using the following two equations:

$$\widehat{VaR}^t_\alpha = z_q\,\widehat{RK}_{t|t-1} \tag{39}$$

$$\widehat{ES}^t_\alpha = E(z_t \mid z_t \le z_q)\,\widehat{RK}_{t|t-1} \tag{40}$$

We consider three specifications for $z_q$ and $E(z_t \mid z_t \le z_q)$, coming from a standard Normal, a standardized Student-t and a standardized Hansen (1994) skew-t distribution. We refer to these models as HAR-RK-N, HAR-RK-t and HAR-RK-skew-t, respectively.

Alternatively, we extend a GARCH model by introducing the realized kernel as an explanatory variable, known as the realized GARCH model (Hansen, Huang & Shek, 2012):

$$\sigma^2_{t,RK} = \alpha + \beta_1 RK^2_{t-1} + \beta_2\sigma^2_{t-1,RK} \tag{41}$$

where the parameters are found by maximum likelihood. Subsequently, VaR and ES forecasts are obtained as:

$$\widehat{VaR}^t_\alpha = z_q\,\hat\sigma_{t,RK} \tag{42}$$

$$\widehat{ES}^t_\alpha = E[z_t \mid z_t \le z_q]\,\hat\sigma_{t,RK} \tag{43}$$

We refer to this model as GARCH-RK.

We compare these two existing models with the LF-RK estimator, by introducing the following realized kernel specifications for the VaR and ES components:

$$\widehat{VaR}^t_\alpha = \hat\delta_1 + \hat\delta_2\,\widehat{RK}_{t|t-1} \tag{44}$$

$$\widehat{ES}^t_\alpha = \hat\delta_3 + \hat\delta_4\,\widehat{RK}_{t|t-1} \tag{45}$$

where $\widehat{RK}_{t|t-1}$ is defined as the estimated one-step-ahead realized kernel from the HAR-RK model. The parameter set $\theta = (\delta_1, \delta_2, \delta_3, \delta_4)$ is estimated by minimizing the LF function.

Extensions of these specifications are defined in our empirical analysis, based on additional explanatory variables that are introduced in the VaR and ES specifications.


4 Numerical estimation of the LF-models

4.1 Optimization

Since the LF function contains an indicator function and is therefore not differentiable, we can only apply derivative-free optimization algorithms. Hence, we use the function fminsearch from Matlab, an optimization algorithm based on the classic Nelder-Mead simplex search method. Furthermore, we find that our optimization algorithm is sensitive to the starting values. Initializing the algorithm with starting values that are fairly close to the global optimum (or at least a useful local optimum) helps it find the final optimum. The following numerical optimization approach for obtaining VaR and ES starting values is implemented (see Engle & Manganelli, 2004a). First, an array of $10^3$ random vectors is generated from the continuous uniform distribution, on the interval (0, 1) if the true parameter is positive and on the interval (-1, 0) if the parameter is negative. For each vector, we compute the value of the LF function and select the 25 vectors that produce the lowest LF. Using these initial starting values, we run the joint optimization algorithm to obtain new optimal parameters. Finally, we use these optimal parameters as new starting values for the simplex algorithm. This numerical optimization procedure is repeated five times so that the coverage rate is satisfied.

For the initial CAViaR starting values we perform a quantile regression at the same probability level α, which yields starting values extremely close to the final optimum; for the corresponding ES starting parameters of the CAViaR model we apply the optimization technique described above.

4.2 Asymptotic variance-covariance estimation

Engle & Manganelli (2004a) provide a feasible uniformly consistent estimator of the asymptotic covariance matrix of the parameters of the RQ-CAViaR models:

$$\widehat{Var}(\hat\theta_{S,T}) = \frac{1}{T}\,\alpha(1-\alpha)\,\hat D_T(\hat\theta_{S,T})'\,\hat A_T(\hat\theta_{S,T})^{-1}\,\hat D_T(\hat\theta_{S,T}) \tag{46}$$

where

$$\hat A_T(\hat\theta_{S,T}) = \frac{1}{T}\sum_{t=1}^{T}\nabla VaR^t_\alpha(\hat\theta_{S,T})'\,\nabla VaR^t_\alpha(\hat\theta_{S,T}) \tag{47}$$

$$\hat D_T(\hat\theta_{S,T}) = \frac{1}{2Tc_T}\sum_{t=1}^{T} I\big(|y_t - VaR^t_\alpha(\hat\theta_{S,T})| < c_T\big)\,\nabla VaR^t_\alpha(\hat\theta_{S,T})'\,\nabla VaR^t_\alpha(\hat\theta_{S,T}) \tag{48}$$


Patton, Ziegel & Chen (2017) obtain a feasible uniformly consistent estimator of the asymptotic covariance matrix of the parameters of the LF-GARCH, LF-CAViaR and LF-CAViaR-CAESiaR models:

$$\widehat{Var}(\hat\theta_{S,T}) = \frac{1}{T}\,\hat D_T(\hat\theta_{S,T})'\,\hat A_T(\hat\theta_{S,T})^{-1}\,\hat D_T(\hat\theta_{S,T}) \tag{49}$$

where

$$\hat A_T(\hat\theta_{S,T}) = \frac{1}{T}\sum_{t=1}^{T} g_t(\hat\theta_{S,T})\,g_t(\hat\theta_{S,T})' \tag{50}$$

$$\begin{aligned} g_t(\hat\theta_{S,T}) &= \left.\frac{\partial S(y_t, VaR^t_\alpha(\theta), ES^t_\alpha(\theta))}{\partial\theta}\right|_{\theta=\hat\theta_{S,T}} \\ &= \nabla VaR^t_\alpha(\hat\theta_{S,T})'\,\frac{1}{-ES^t_\alpha(\hat\theta_{S,T})}\left(\frac{1}{\alpha}I[y_t \le VaR^t_\alpha(\hat\theta_{S,T})] - 1\right) \\ &\quad + \nabla ES^t_\alpha(\hat\theta_{S,T})'\,\frac{1}{ES^t_\alpha(\hat\theta_{S,T})^2}\left(\frac{1}{\alpha}I[y_t \le VaR^t_\alpha(\hat\theta_{S,T})]\big(VaR^t_\alpha(\hat\theta_{S,T}) - y_t\big) - VaR^t_\alpha(\hat\theta_{S,T}) + ES^t_\alpha(\hat\theta_{S,T})\right) \end{aligned} \tag{51}$$

$$\hat D_T(\hat\theta_{S,T}) = \frac{1}{T}\sum_{t=1}^{T}\left[\frac{1}{2c_T}I\big[|y_t - VaR^t_\alpha(\hat\theta_{S,T})| < c_T\big]\,\frac{\nabla VaR^t_\alpha(\hat\theta_{S,T})'\,\nabla VaR^t_\alpha(\hat\theta_{S,T})}{-\alpha\,ES^t_\alpha(\hat\theta_{S,T})} + \frac{\nabla ES^t_\alpha(\hat\theta_{S,T})'\,\nabla ES^t_\alpha(\hat\theta_{S,T})}{ES^t_\alpha(\hat\theta_{S,T})^2}\right] \tag{52}$$

We set the bandwidth $c_T$ equal to $T^{-1/3}$.


5 Backtesting methods

In order to examine the adequacy of the VaR and ES models, a variety of well-known backtesting methods is discussed.

5.1 Backtest VaR

If the VaR model at confidence level α is adequate, then the following relation must hold:

$$P[y_t \le VaR^t_\alpha] = \alpha \tag{53}$$

Similar to Christoffersen (1998), we define the hit sequence $\{I_t(\alpha)\}_{t=1}^{T}$ of VaR violations as:

$$I_t(\alpha) = I[y_t \le VaR^t_\alpha], \quad t = 1, \dots, T \tag{54}$$

The validity of a VaR model corresponds to checking whether the hit sequence (54) satisfies two assumptions: the unconditional coverage (UC) hypothesis and the independence based (IB) hypothesis (Christoffersen, 1998). First the UC property, which states that the unconditional probability of a VaR violation exactly equals the coverage rate α: $P[I_t(\alpha) = 1] = E[I_t(\alpha)] = \alpha$. Under the UC hypothesis, the hit sequence produced by a correctly specified model should form a sequence of iid Bernoulli variables. However, the UC test ignores any form of dependence in the VaR violations; hence the independence based (IB) property considers serial dependence of order one in the occurrence of exceedances. VaR violations observed at two different dates must be distributed independently: $P[I_t(\alpha) = 1 \mid I_{t-1}(\alpha) = 0] = P[I_t(\alpha) = 1 \mid I_{t-1}(\alpha) = 1]$. Under the IB hypothesis, the number of independent violations should have a Binomial distribution. A third test, the conditional coverage (CC) test, tests the two hypotheses simultaneously (Christoffersen, 1998).

5.1.1 Dynamic Quantile test

The tests described in Section 5.1 examine the hit sequence for evidence of serial correlation, but have no power against violations of the restriction that the probability of a violation, conditionally on the quantile, is correct. Engle & Manganelli (2004a) describe an example of such a hit sequence, which is an extreme case of quantile measurement error. As an alternative, Engle & Manganelli (2004a) introduce a test for more general independence based on a binary regression model, often called the Dynamic Quantile (DQ) test. It analyses the relationship between the hit sequence and a set of relevant variables including an intercept,


the lagged violations, the current VaR forecast and other explanatory variables contained in the conditioning set. Define the hit series as follows:

$$Hit_t = I[y_t \le VaR^t_\alpha] - \alpha \tag{55}$$

The DQ test examines whether the hit sequence has expected value equal to 0 and is uncorrelated with the explanatory instruments included in $X_t$. The out-of-sample DQ statistic is defined by:

$$DQ = \frac{Hit'X\,(X'X)^{-1}X'\,Hit}{\alpha(1-\alpha)} \sim \chi^2(q) \tag{56}$$

where $Hit$ and $X$ stack $Hit_t$ and $X_t$ over the out-of-sample period and q refers to the dimension of $X_t$. The variables included in $X_t$ in our DQ test are a constant, the current VaR forecast and four lagged values of the hit sequence, so that the null distribution of the DQ test is $\chi^2(6)$.

5.2 ES bootstrap test

McNeil & Frey (2000) propose a hypothesis test based on the so-called violation residuals and focus on the subset of ES estimates for which the observed $y_t$ is smaller than the estimated $\widehat{VaR}^t_\alpha$. Define the violation residuals as follows:

$$r_t = \frac{y_t - \widehat{ES}^t_\alpha}{\widehat{VaR}^t_\alpha} \tag{57}$$

Under the assumption that the model yields correct VaR and ES forecasts, these exceedance residuals behave like an iid sample with unconditional mean zero. To test for a zero mean, a bootstrap test similar to those discussed in Efron & Tibshirani (1993) is used, which requires no assumptions on the distribution of the exceedance residuals.
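A minimal version of such a bootstrap test (our own helper; here one-sided against a positive mean, i.e. against systematic ES underestimation):

```python
import random

def bootstrap_pvalue(resid, n_boot=5000, seed=1):
    """Bootstrap test of H0: E[r_t] = 0 against a positive mean:
    recentre the exceedance residuals, resample with replacement, and
    count how often the bootstrap mean reaches the observed one."""
    rng = random.Random(seed)
    n = len(resid)
    mean_obs = sum(resid) / n
    centred = [r - mean_obs for r in resid]     # impose the null
    count = 0
    for _ in range(n_boot):
        m = sum(rng.choice(centred) for _ in range(n)) / n
        if m >= mean_obs:
            count += 1
    return count / n_boot
```

Recentring the residuals before resampling is what makes the null hypothesis hold in the bootstrap world, so no distributional assumption beyond iid is needed.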

5.3 Testing forecast dominance

5.3.1 Diebold-Mariano test

Generally, the tests described before are not aimed at identifying the best-performing model for VaR and ES. Recently, Fissler, Ziegel & Gneiting (2015) follow the lead of Diebold & Mariano (1995) for jointly evaluating VaR and ES.

Let $\{(q_t, e_t)\}_{t=1}^{T}$ and $\{(q^*_t, e^*_t)\}_{t=1}^{T}$ denote two competing sequences of $VaR_\alpha$ and $ES_\alpha$ estimates made by models A and B. We select a specific member of the class of loss functions in (6) by setting $G_1(z) = 0$ and $G_2(z) = \exp(z)/(1 + \exp(z))$ in our comparative backtest, which meets all the integrability and regularity requirements, but there are many other alternatives.7

The Diebold-Mariano (DM) test compares the averages of the loss differences of the VaR and ES predictions of two different models. We say that model A weakly dominates model B if:

$$E[S(q_t, e_t, y_t)] \le E[S(q^*_t, e^*_t, y_t)] \tag{58}$$

and define a test statistic T as follows:

$$T = \frac{\bar S_{q,e} - \bar S^*_{q,e}}{\hat\sigma_T/\sqrt{T}} \tag{59}$$

$$\bar S_{q,e} = \frac{1}{T}\sum_{t=1}^{T} S(q_t, e_t, y_t) \tag{60}$$

$$\bar S^*_{q,e} = \frac{1}{T}\sum_{t=1}^{T} S(q^*_t, e^*_t, y_t) \tag{61}$$

where $\hat\sigma^2_T$ is an appropriate estimate of the variance $\sigma^2_T = \mathrm{var}\big(\sqrt{T}(\bar S_{q,e} - \bar S^*_{q,e})\big)$. We estimate this variance by an autocorrelation-consistent Newey & West (1994) estimator with a truncation lag of four. Under suitable assumptions on the process of loss differences, the asymptotic distribution of T is standard Normal.8 The test statistic T has expected value less than or equal to zero under the null hypothesis that model A predicts at least as well as model B.
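A sketch of the DM statistic with the Bartlett-weighted Newey-West long-run variance (our own helper; the loss series are assumed to have been precomputed with the chosen scoring function):

```python
import math

def dm_statistic(loss_a, loss_b, lag=4):
    """DM statistic (59) on loss differences d_t = S_A - S_B with a
    Bartlett-weighted Newey-West long-run variance, truncation lag 4."""
    d = [a - b for a, b in zip(loss_a, loss_b)]
    T = len(d)
    dbar = sum(d) / T
    dc = [x - dbar for x in d]
    lrv = sum(x * x for x in dc) / T            # gamma_0
    for k in range(1, lag + 1):
        gamma = sum(dc[t] * dc[t - k] for t in range(k, T)) / T
        lrv += 2.0 * (1.0 - k / (lag + 1.0)) * gamma
    return dbar / math.sqrt(lrv / T)            # approx N(0,1) under H0
```

The Bartlett weights keep the long-run variance estimate nonnegative, which matters because the loss differences of overlapping forecasts are typically autocorrelated.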

5.3.2 Backtest from Ziegel, Krüger, Jordan & Fasciati (2017)

The test from Fissler et al. (2015) requires us to choose a set of specification functions. Patton (2016) demonstrates that the choice of $G_1$ and $G_2$ is relevant for ranking and backtesting in the presence of model misspecification. Since no intuitive or empirical guidance is available, it is challenging to select two specific functions on either economic or statistical grounds. Hence, Ziegel et al. (2017) provide a mixture representation of the strictly consistent loss functions of the form given in (6); let $b_1, b_2 \in \mathbb{R}$:

$$S(q_t, e_t, y_t) = \int S_{b_1}(q_t, y_t)\,dH_1(b_1) + \int S_{b_2}(q_t, e_t, y_t)\,dH_2(b_2) \tag{62}$$

7 Further in this thesis we refer to this loss function as the DF function.


where $H_1$ is a locally finite measure and $H_2$ is a finite measure on all intervals of the form $(-\infty, x]$, $x \in \mathbb{R}$. The first integral corresponds to the evaluation of $VaR_\alpha$:

$$S_{b_1}(q_t, y_t) = (I[y_t \le q_t] - \alpha)(I[b_1 \le q_t] - I[b_1 \le y_t]) \tag{63}$$

The second integral takes a more complex form, since it corresponds to the joint evaluation of $VaR_\alpha$ and $ES_\alpha$:

$$S_{b_2}(q_t, e_t, y_t) = I[b_2 \le e_t]\left(\frac{1}{\alpha}I[y_t \le q_t](q_t - y_t) - (q_t - b_2)\right) + I[b_2 \le y_t](y_t - e_t) \tag{64}$$

We consider only the second integral in our analysis, since we are interested in the pair $VaR_\alpha$ and $ES_\alpha$.
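Expression (64) is cheap to evaluate pointwise; a direct transcription (our own helper) that can be averaged over t for each threshold $b_2$:

```python
def elementary_loss_b2(q, e, y, b2, alpha):
    """Elementary loss (64) for the pair (VaR, ES) at threshold b2:
    q is the VaR forecast, e the ES forecast, y the realized return."""
    s = 0.0
    if b2 <= e:
        # joint VaR/ES term, active when the threshold is below the ES forecast
        s += (1.0 / alpha) * (y <= q) * (q - y) - (q - b2)
    if b2 <= y:
        s += y - e          # penalizes an ES forecast far below the outcome
    return s
```

Averaging this loss over the out-of-sample period, separately for each model and each threshold, produces the per-threshold loss series fed into the DM comparisons described next.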

The test procedure can be split into two steps. In the first step we apply, for each threshold $b_2$, the DM test from Section 5.3.1 with the elementary loss function defined in (64), which gives us a sequence of p-values. In the second step we follow Westfall & Young (1993) by correcting this sequence of p-values, and we reject our null hypothesis if the minimum of these corrected p-values is below the chosen significance level.


6 Monte Carlo simulation study

The finite sample behavior of the LF minimization estimators, introduced in Section 3, is assessed through various Monte Carlo simulation studies.

6.1 Simulation study LF-GARCH

In order to evaluate the finite sample properties of our LF-GARCH model, we implement three Monte Carlo simulations, set up as follows. First, we simulate random returns following the classical GARCH(1,1) model. Specifically, the data generating process is defined as:

$$y_t = \sigma_t z_t \tag{65}$$

$$\sigma_t^2 = \alpha + \beta_1\sigma_{t-1}^2 + \beta_2 y_{t-1}^2 \tag{66}$$

where the standardized errors $z_t$ form a sequence of independent random variables from the three distributions that we consider: a standard Normal, a standardized Student-t and a Hansen (1994) skew-t distribution with shape (degrees of freedom) parameter ν = 7 and skewness (degree of asymmetry) parameter γ = -0.5. We choose the following parameter specification: $(\alpha, \beta_1, \beta_2) = (0.01, 0.90, 0.09)$.
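The DGP (65)-(66) for the Normal case can be simulated with the standard library alone; the burn-in length and seed below are arbitrary choices of ours:

```python
import math, random

def simulate_garch(T, a=0.01, b1=0.90, b2=0.09, burn=500, seed=42):
    """Simulate the GARCH(1,1) DGP (65)-(66) with standard Normal z_t,
    starting from the unconditional variance and discarding a burn-in."""
    rng = random.Random(seed)
    var = a / (1.0 - b1 - b2)        # unconditional variance a/(1-b1-b2)
    y = []
    for t in range(T + burn):
        yt = math.sqrt(var) * rng.gauss(0.0, 1.0)
        if t >= burn:
            y.append(yt)
        var = a + b1 * var + b2 * yt * yt
    return y
```

With the chosen parameterization the unconditional variance is $0.01/(1-0.99) = 1$, so long simulated paths should have a sample variance near one, with sizable wander due to the high persistence $\beta_1 + \beta_2 = 0.99$.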

Under this GARCH(1,1) model the true VaR and ES are given by:

$$VaR^t_\alpha = z_q\sigma_t \tag{67}$$

$$ES^t_\alpha = E[z_t \mid z_t \le z_q]\sigma_t \tag{68}$$

The values of $z_q$ and $E[z_t \mid z_t \le z_q]$ depend on the choice of the distribution of $z_t$; we refer to the following table:

                          Normal                           Student-t                  Skew-t
$z_q$                     $\Phi^{-1}(\alpha)$              inverse t-CDF(α)           inverse skew-t-CDF(α)
$E(z_t \mid z_t \le z_q)$ $-\phi(\Phi^{-1}(\alpha))/\alpha$ *9                        *10

9 Broda & Paolella (2011) present the following expression for $E[z_t \mid z_t \le z_q]$ if $z_t$ has a standardized Student-t distribution with ν degrees of freedom:

$$E[z_t \mid z_t \le z_q] = -\frac{\nu^{-1/2}}{\alpha\,B\!\left(\frac{\nu}{2},\frac{1}{2}\right)}\left(1 + \frac{z_q^2}{\nu}\right)^{-\frac{\nu+1}{2}}\frac{\nu + z_q^2}{\nu - 1} \tag{69}$$

where $B(a,b) = \int_0^1 x^{a-1}(1-x)^{b-1}\,dx$, also known as the beta function.

10 We simulate 10,000 random numbers from the continuous Hansen (1994) skew-t distribution and subsequently calculate $E[z_t \mid z_t \le z_q]$ from the simulated sample.


In each replication, we simulate new return series of length 2500 and 5000 and subsequently estimate the unknown parameter set via LF minimization. We consider two different values of α in this simulation study, α = 0.01 and α = 0.05, which are commonly used probability levels in risk management.

Tables 1 and 2 present descriptive statistics of the estimated parameters of the LF-GARCH model under standard Normal and Student-t innovations, respectively. We observe that the biases are negligibly small for the different sample sizes and probability levels that we consider. The biases and the standard deviations of the estimated parameters decrease as the sample size or the quantile level increases. Comparing the biases and the standard deviations across Tables 1 and 2, we note that they are more often higher for Student-t than for standard Normal innovations.

Next, we focus on the accuracy of the VaR and ES forecasts of the LF-GARCH model. To study this we calculate the mean squared error (MSE) and the bias of the estimated risk measures, based on the following two equations:

$$MSE(\widehat{RM}^t_\alpha) = \frac{1}{T}\sum_{t=1}^{T}\big(\widehat{RM}^t_\alpha - RM^t_\alpha\big)^2 \tag{70}$$

$$Bias(\widehat{RM}^t_\alpha) = \frac{1}{T}\sum_{t=1}^{T}\big(\widehat{RM}^t_\alpha - RM^t_\alpha\big) \tag{71}$$

where $RM^t_\alpha$ stands for the true VaR or ES risk measure and $\widehat{RM}^t_\alpha$ for the estimated one.

Table 3 presents the results. We observe that the biases and the MSEs are larger when α is equal to 0.01 and vanish as the sample size increases. Comparing standard Normally distributed residuals with Student-t and Hansen (1994) skew-t distributed residuals, we note that, as expected, the MSEs and biases are higher for the Student-t and Hansen (1994) skew-t distributions. Overall, these simulation results show that the LF-GARCH model provides reasonable VaR and ES estimates under ideal conditions and that the parameter estimates improve as the sample size or the probability level α increases.


Normal innovations           α        β1       β2       γ1       γ2
T = 2500
α = 0.01   Value             0.018    0.888    0.093   -2.335   -2.658
           Bias              0.008   -0.012    0.003   -0.008    0.007
           Std dev.                   0.049    0.033    0.102    0.109
           Median            0.011    0.892    0.099   -2.340   -2.653
           90% - 10%         0.023    0.093    0.075    0.216    0.260
α = 0.05   Value             0.021    0.887    0.094   -1.652   -2.074
           Bias              0.011   -0.013    0.004   -0.007   -0.012
           Std dev.                   0.030    0.022    0.052    0.059
           Median            0.010    0.899    0.090   -1.654   -2.073
           90% - 10%         0.025    0.078    0.061    0.117    0.137
T = 5000
α = 0.01   Value             0.012    0.896    0.090   -2.323   -2.661
           Bias              0.002   -0.004    0.000    0.003    0.004
           Std dev.                   0.019    0.016    0.062    0.075
           Median            0.010    0.901    0.088   -2.336   -2.669
           90% - 10%         0.014    0.052    0.044    0.137    0.192
α = 0.05   Value             0.013    0.898    0.088   -1.649   -2.065
           Bias              0.003   -0.002   -0.002   -0.004   -0.002
           Std dev.                   0.019    0.014    0.035    0.042
           Median            0.011    0.896    0.091   -1.649   -2.069
           90% - 10%         0.014    0.044    0.038    0.093    0.092

Table 1: This table presents results based on the Monte Carlo simulation with 100 replications of the LF-GARCH model, with the DGP being the classical GARCH(1,1) with standard Normal errors. The first row reports the sample mean of the estimated parameter values. The second, third, fourth and fifth rows report the average bias, sample standard deviation, median and the difference between the 90th and 10th percentiles.


Student-t innovations        α        β1       β2       γ1       γ2
T = 2500
α = 0.01   Value             0.027    0.870    0.101   -2.564   -3.166
           Bias              0.017   -0.030    0.011   -0.030    0.041
           Std dev.                   0.065    0.043    0.186    0.223
           Median            0.013    0.882    0.099   -2.561   -3.165
           90% - 10%         0.073    0.172    0.095    0.359    0.464
α = 0.05   Value             0.013    0.891    0.092   -1.617   -2.211
           Bias              0.003   -0.009    0.002   -0.016   -0.017
           Std dev.                   0.034    0.024    0.063    0.092
           Median            0.011    0.893    0.090   -1.618   -2.218
           90% - 10%         0.027    0.130    0.084    0.142    0.185
T = 5000
α = 0.01   Value             0.014    0.895    0.088   -2.552   -3.187
           Bias              0.004   -0.005   -0.002   -0.019    0.000
           Std dev.                   0.032    0.026    0.112    0.155
           Median            0.010    0.901    0.088   -2.561   -3.208
           90% - 10%         0.019    0.082    0.068    0.255    0.359
α = 0.05   Value             0.011    0.899    0.089   -1.611   -2.209
           Bias              0.001   -0.001   -0.001   -0.010   -0.013
           Std dev.                   0.018    0.015    0.045    0.065
           Median            0.011    0.900    0.088   -1.617   -2.211
           90% - 10%         0.011    0.053    0.049    0.098    0.134

Table 2: This table presents results based on the Monte Carlo simulation with 100 replications of the LF-GARCH model, with the DGP being the classical GARCH(1,1) with standardized Student-t errors. The first row reports the sample mean of the estimated parameter values. The second, third, fourth and fifth rows report the average bias, sample standard deviation, median and the difference between the 90th and 10th percentiles.


                        Normal             Student-t          Skew-t
                        VaR      ES        VaR      ES        VaR      ES
α = 0.01
T = 2500   Bias        -0.091    0.142    -0.486   -0.164    -0.894   -0.629
           MSE          0.439    0.573     1.517    2.603     6.065    9.833
T = 5000   Bias         0.013    0.119    -0.148    0.018    -0.309   -0.210
           MSE          0.304    0.404     0.706    1.223     1.327    2.573
α = 0.05
T = 2500   Bias        -0.032   -0.031    -0.451   -0.537    -0.314   -0.325
           MSE          0.177    0.281     2.828    5.058     0.831    1.738
T = 5000   Bias        -0.016    0.015    -0.026   -0.009    -0.105   -0.136
           MSE          0.121    0.185     0.201    0.372     0.395    0.924

Table 3: This table reports the MSE and bias (× 10⁻¹) of the VaR and ES predictions obtained by the LF-GARCH model based on 100 Monte Carlo simulations for different distributions of the standardized residuals.

6.2 Simulation study LF-RK

In this section we assess the finite sample accuracy of our LF-RK estimator. All reported simulation results are based on 100 replications (with 2500 or 5000 VaR and ES predictions each) and we consider two different tail quantile levels, α = 0.01 and α = 0.05. Following Barendse (2017), we define the following DGP:

$$y_t = \gamma_0 + \gamma_1 x_t + (1 + \gamma_2 x_t)\varepsilon_t \tag{72}$$

$$x_t = \phi_x x_{t-1} + \nu_t \tag{73}$$

$$\varepsilon_t \sim N(0, 1) \tag{74}$$

$$\nu_t \sim N(0, (1 - \phi_x)^2) \tag{75}$$

This DGP is chosen by Barendse (2017), since it allows for a variety of setups that include serial correlation in the explanatory variables and conditional heteroscedasticity. The latent factor $x_t$ can affect $y_t$ nonlinearly through its conditional mean and volatility. In the first case we set γ0 = 0.25 and γ1 = γ2 = 0, so that $x_t$ does not affect $y_t$; in the second case we set γ0 = γ1 = 0.25 and γ2 = 0, so that $x_t$ affects only the mean of $y_t$; in the last case we set γ0 = γ1 = γ2 = 0.25, so that $x_t$ influences both the mean and the volatility of $y_t$. We set $\phi_x$ equal to 0.85. Under this model the true VaR and ES conditional on $x_t$ are given by:

$$VaR^t_\alpha = \gamma_0 + \gamma_1 X_t + \gamma_2 X_t\Phi^{-1}(\alpha) + \Phi^{-1}(\alpha) \tag{76}$$

$$ES^t_\alpha = \gamma_0 - \frac{\phi(\Phi^{-1}(\alpha))}{\alpha} + \gamma_1 X_t - \gamma_2\frac{\phi(\Phi^{-1}(\alpha))}{\alpha} X_t \tag{77}$$

The DGP defined in (72) implies the following values for the VaR and ES parameters:

$$\delta_1 = \gamma_0 + \Phi^{-1}(\alpha) \tag{78}$$

$$\delta_2 = \gamma_1 + \gamma_2\Phi^{-1}(\alpha) \tag{79}$$

$$\delta_3 = \gamma_0 - \frac{\phi(\Phi^{-1}(\alpha))}{\alpha} \tag{80}$$

$$\delta_4 = \gamma_1 - \gamma_2\frac{\phi(\Phi^{-1}(\alpha))}{\alpha} \tag{81}$$
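Expressions (78)-(81) are easy to evaluate numerically; the stdlib-only sketch below (the bisection inversion of the Normal CDF is our own expedient) gives, e.g., $\delta_1 \approx -1.395$ and $\delta_3 \approx -1.813$ for $\gamma_0 = \gamma_1 = 0.25$, $\gamma_2 = 0$ and α = 0.05:

```python
import math

def implied_deltas(g0, g1, g2, alpha):
    """True LF-RK parameter values (78)-(81) implied by the DGP (72)."""
    cdf = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    lo, hi = -10.0, 10.0
    while hi - lo > 1e-12:            # invert the Normal CDF by bisection
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if cdf(mid) < alpha else (lo, mid)
    zq = 0.5 * (lo + hi)
    lam = math.exp(-0.5 * zq * zq) / math.sqrt(2.0 * math.pi) / alpha  # phi(zq)/alpha
    return g0 + zq, g1 + g2 * zq, g0 - lam, g1 - g2 * lam
```

These true values are the benchmarks against which the biases of the simulated δ estimates are computed.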

Tables 4 and 5 present descriptive statistics of the δ parameter estimates of the LF-RK model for γ0 = γ1 = 0.25 and γ2 = 0, and for γ0 = γ1 = γ2 = 0.25, respectively. We do not report the results for the iid scenario (i.e. γ0 = 0.25, γ1 = γ2 = 0), since these results are similar. The simulation results show the same behavior as for the LF-GARCH estimator (i.e. the biases are rather small, and the standard deviations increase as α decreases and decrease as the sample size increases).

Table 6 presents the accuracy of the VaR and ES forecasts of the LF-RK model, in terms of the bias and MSE. We observe that the MSEs and biases decrease with the sample size and increase as we move further into the left tail.


γ0 = γ1 = 0.25, γ2 = 0       δ1       δ2       δ3       δ4
T = 2500
α = 0.01   Value            -2.078    0.246   -2.410    0.302
           Bias             -0.001   -0.004    0.006    0.052
           Std dev.          0.081    0.279    0.091    0.297
           Median           -2.080    0.239   -2.417    0.324
           90% - 10%         0.188    0.612    0.255    0.805
α = 0.05   Value            -1.392    0.263   -1.804    0.269
           Bias              0.003    0.013    0.009    0.019
           Std dev.          0.081    0.279    0.091    0.297
           Median           -1.386    0.256   -1.804    0.265
           90% - 10%         0.110    0.407    0.145    0.447
T = 5000
α = 0.01   Value            -2.076    0.256   -2.412    0.257
           Bias              0.000    0.006    0.004    0.007
           Std dev.          0.055    0.203    0.064    0.226
           Median           -2.080    0.251   -2.415    0.268
           90% - 10%         0.118    0.577    0.162    0.640
α = 0.05   Value            -1.394    0.232   -1.810    0.241
           Bias              0.001   -0.018    0.003   -0.009
           Std dev.          0.030    0.103    0.035    0.122
           Median           -1.392    0.232   -1.803    0.243
           90% - 10%         0.070    0.220    0.076    0.299

Table 4: This table presents results based on the Monte Carlo simulation with 100 replications of the LF-RK model when γ0 = γ1 = 0.25 and γ2 = 0. The first row reports the average estimated parameter values. The second, third, fourth and fifth rows report the average bias, sample standard deviation, median and the difference between the 90th and 10th percentiles.


γ0 = γ1 = γ2 = 0.25          δ1       δ2       δ3       δ4
T = 2500
α = 0.01   Value            -2.070   -0.364   -2.399   -0.489
           Bias              0.006   -0.033    0.016   -0.072
           Std dev.          0.078    0.268    0.090    0.298
           Median           -2.072   -0.368   -2.397   -0.499
           90% - 10%         0.177    0.673    0.254    0.824
α = 0.05   Value            -1.398   -0.136   -1.811   -0.234
           Bias             -0.004    0.025    0.002    0.032
           Std dev.          0.044    0.152    0.049    0.168
           Median           -1.392   -0.142   -1.813   -0.224
           90% - 10%         0.118    0.383    0.133    0.418
T = 5000
α = 0.01   Value            -2.070   -0.351   -2.412   -0.403
           Bias              0.006   -0.019    0.003    0.013
           Std dev.          0.053    0.184    0.065    0.217
           Median           -2.068   -0.378   -2.412   -0.421
           90% - 10%         0.133    0.421    0.169    0.486
α = 0.05   Value            -1.414   -0.158   -1.841   -0.266
           Bias             -0.019    0.003   -0.028    0.000
           Std dev.          0.029    0.103    0.035    0.119
           Median           -1.395   -0.175   -1.808   -0.303
           90% - 10%         0.080    0.284    0.092    0.394

Table 5: This table presents results based on the Monte Carlo simulation with 100 replications of the LF-RK model when γ0 = γ1 = γ2 = 0.25. The first row reports the average estimated parameter values. The second, third, fourth and fifth rows report the average bias, sample standard deviation, median and the difference between the 90th and 10th percentiles.


                        (1)                (2)                (3)
                        VaR      ES        VaR      ES        VaR      ES
α = 0.01
T = 2500   Bias         0.543   -1.148    -1.205   -2.591     0.198   -0.903
           MSE          1.123    1.708     1.189    1.784     1.386    2.044
T = 5000   Bias        -0.102   -0.491     0.129    0.380     0.384   -0.672
           MSE          0.560    0.995     0.518    0.634     0.823    0.384
α = 0.05
T = 2500   Bias         0.497   -0.123     0.129    0.380    -0.363   -0.311
           MSE          0.367    0.543     0.518    0.634     0.396    0.677
T = 5000   Bias         0.377   -0.145    -0.045   -0.297    -0.081   -0.311
           MSE          0.168    0.204     0.324    0.406     0.225    0.426

Table 6: This table reports the MSE and bias (× 10⁻²) of the VaR and ES predictions obtained by the LF-RK model based on 100 Monte Carlo simulations. (1) corresponds to the first case (γ0 = 0.25, γ1 = γ2 = 0), (2) to the second case (γ0 = γ1 = 0.25, γ2 = 0) and (3) to the last case (γ0 = γ1 = γ2 = 0.25).


7 Empirical application

In this section we consider an empirical application of all estimation methods discussed in Section 3 to the international stock indices S&P500, FTSE100 and Nikkei. The daily closing prices and the realized measures are obtained from the Oxford-Man Institute of Quantitative Finance Realized Library; each series contains 3000 daily return observations ending in October 2015 (the exact start dates of the indices vary due to differences in trading hours and days).11 We compute the daily returns as 100 times the first difference of the logarithm of the closing prices, i.e. $y_t = 100(\log(p_t) - \log(p_{t-1}))$, where $p_t$ denotes the closing price on day t. Table 7 provides full-sample summary statistics for the three stock indices. The upper panel presents the sample mean, standard deviation, skewness and kurtosis. We observe that the skewness of all indices is slightly negative. Furthermore, the kurtosis of all indices is greater than 3, suggesting that the return series are heavier tailed than a standard Normal distribution. The lower two panels present the full-sample VaR and ES statistics for different values of α. For each stock index, we use the first 2500 observations for estimation and reserve the last 500 observations for out-of-sample (OOS) testing and comparison. We estimate 1, 2.5, 5 and 10% one-day-ahead VaR and ES, using the models specified in Section 3.


              S&P500    FTSE100    Nikkei
Mean           0.025     0.012      0.003
Std dev.       1.200     1.116      1.490
Skewness      -0.269    -0.144     -0.813
Kurtosis      14.251    12.210      9.783
VaR 0.01      -3.467    -3.109     -4.207
VaR 0.025     -2.525    -2.384     -3.030
VaR 0.05      -1.780    -1.749     -2.341
VaR 0.1       -1.185    -1.135     -1.665
ES 0.01       -5.265    -4.610     -6.409
ES 0.025      -3.868    -3.453     -4.649
ES 0.05       -2.989    -2.732     -3.641
ES 0.1        -2.213    -2.060     -2.809

Table 7: This table summarizes statistics for the stock indices S&P500, FTSE100 and Nikkei over the full-sample period ending in 2015 (3000 observations). The first panel reports the mean, standard deviation, skewness and kurtosis. The second and third panels report the full-sample VaR and ES for different choices of α.

7.1 In-sample estimation

We now present the parameter estimates of the GARCH models and CAViaR models from Section 3, along with the corresponding standard errors and one-sided probability values, in Tables 8 and 9. The accuracy of the parameter estimates is evaluated in terms of their standard errors and probability values. We calculate the standard errors using the consistent asymptotic covariance matrices from Section 4.2. To save space, we only report the parameter values for the S&P500 for α = 0.05.

It can be observed from Table 8 that the coefficients γ1 and γ2 are both smaller for the LF-GARCH model than for the GARCH-N and GARCH-t models. The parameter values of the GARCH-EVT and LF-GARCH models do not differ substantially from each other. Concentrating on Table 9, one important result, in common with Engle & Manganelli (2004a), is that the coefficient of the autoregressive term of the CAViaR-based models is strongly significant. This indicates that daily returns in the tail exhibit the volatility clustering property.12 The second result is that we find that the parameter

12 Volatility clustering refers to the phenomenon that "large changes tend to be followed by large changes, of either sign, and small changes tend to be followed by small changes" (Mandelbrot, 1963).


values of the negative returns are strongly significant, whereas the parameters of the positive returns are not. We conclude that the standard errors of the estimated parameters of all models are relatively small. Generally, the parameters of the LF-CAViaR models are estimated with greater precision than those of the other models.

                              β1       β2       γ1       γ2
GARCH-EVT   Values            0.899    0.086   -1.715   -2.351
            Standard errors   0.008    0.008
            p-values          0.000    0.000
GARCH-N     Values            0.899    0.086   -1.645   -2.063
            Standard errors   0.008    0.008
            p-values          0.000    0.000
GARCH-t     Values            0.906    0.084   -1.607   -2.186
            Standard errors   0.010    0.010
            p-values          0.000    0.000
LF-GARCH    Values            0.889    0.100   -1.742   -2.304
            Standard errors   0.013    0.010    0.049    0.081
            p-values          0.000    0.000    0.000    0.000

Table 8: This table presents the estimated parameters, the corresponding standard errors and the one-sided probability values for the S&P500 for α = 0.05 over the in-sample period ending in 2013 for the GARCH models and the LF-GARCH model.


                              β1       β2       β3       β4       βE
Model 1
LF-CAViaR   Values            0.032    0.893    0.205             1.344
            Standard errors   0.020    0.031    0.072             0.068
            p-values          0.055    0.000    0.002             0.000
RQ-CAViaR   Values            0.027    0.886    0.233             1.274
            Standard errors   0.037    0.035    0.028
            p-values          0.233    0.000    0.000
Model 2
LF-CAViaR   Values            0.045    0.907    0.041   -0.279    1.330
            Standard errors   0.016    0.022    0.052    0.041    0.033
            p-values          0.002    0.000    0.213    0.000    0.000
RQ-CAViaR   Values            0.038    0.914    0.038   -0.266    1.301
            Standard errors   0.020    0.032    0.053    0.051
            p-values          0.029    0.000    0.235    0.000
Model 3
LF-CAViaR   Values            0.054    0.889    0.301             1.325
            Standard errors   0.037    0.023    0.176             0.046
            p-values          0.073    0.000    0.044             0.000
RQ-CAViaR   Values            0.049    0.875    0.354             1.287
            Standard errors   0.057    0.048    0.525
            p-values          0.198    0.000    0.250
Model 4
LF-CAViaR   Values            0.470                               1.336
            Standard errors   0.003                               0.064
            p-values          0.000                               0.000
RQ-CAViaR   Values            0.996                               1.246
            Standard errors   0.007                               0.000
            p-values          0.000                               0.000

Table 9: This table presents the estimated parameters, the corresponding standard errors and the one-sided probability values for the S&P500 for α = 0.05 over the in-sample period ending in 2013 for the RQ-CAViaR models and the LF-CAViaR models.


7.2 Realized kernel models

Table 10 reports the parameter estimates together with their standard errors and one-sided probability values for the 5% fitted VaR and ES estimates for the S&P500 based on the LF-RK model. We observe that the coefficients δ2 and δ4 are strongly significant, whereas the coefficients δ1 and δ3 are not significantly different from zero (with p-values of 0.421 and 0.384).

                          δ1       δ2       δ3       δ4
LF-RK   Values           -0.012   -2.026   -0.026   -2.678
        Standard errors   0.060    0.059    0.088    0.209
        p-values          0.421    0.000    0.384    0.000

Table 10: This table presents the estimated parameters, the corresponding standard errors and the one-sided probability values for the S&P500 for α = 0.05 over the in-sample period ending in 2013, for the LF-RK model.

7.3 Out-of-sample estimation

In the following, we discuss the out-of-sample (OOS) forecast performance of the eighteen models discussed in Section 3. We initially focus on the results for α = 0.05, given the emphasis on that quantile in the literature. In Figure 3 we have plotted the 5% VaR predictions from the LF-GARCH model and the GARCH-EVT model. From this figure we conclude that the 5% VaR and ES predictions from the LF-GARCH and GARCH-EVT models are largely similar, with the LF-GARCH VaR predictions being slightly more negative. This is in line with the parameter estimates in Table 8, from which we conclude that the estimated parameter values of the GARCH-EVT and the LF-GARCH model are rather similar.
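For reference, a one-step-ahead 5% VaR and ES forecast under a GARCH(1,1) model with standard normal innovations (the GARCH-N case) can be sketched as follows. This is a minimal sketch with hypothetical inputs; the thesis's actual forecasts use the maximum-likelihood estimates over the in-sample period, and the parameter values below are merely in the neighbourhood of the Table 8 estimates.

```python
import math
from statistics import NormalDist

def garch_normal_var_es(ret_t, sigma2_t, omega, a1, b1, alpha=0.05):
    """One-step-ahead lower-tail VaR and ES under GARCH(1,1) with N(0,1)
    innovations:
        sigma2_{t+1} = omega + a1 * ret_t**2 + b1 * sigma2_t
        VaR_{t+1}    = sigma_{t+1} * z_alpha
        ES_{t+1}     = -sigma_{t+1} * phi(z_alpha) / alpha
    where z_alpha is the standard normal alpha-quantile and phi its density.
    """
    nd = NormalDist()
    sigma = math.sqrt(omega + a1 * ret_t ** 2 + b1 * sigma2_t)
    z = nd.inv_cdf(alpha)              # z_0.05 is approximately -1.645
    var = sigma * z
    es = -sigma * nd.pdf(z) / alpha    # mean of the normal below its alpha-quantile
    return var, es

# Hypothetical inputs, loosely in the range of the Table 8 estimates:
var, es = garch_normal_var_es(ret_t=-0.01, sigma2_t=1e-4,
                              omega=1e-6, a1=0.09, b1=0.90)
```

With these inputs the one-step volatility is exactly 1% per day, so the 5% VaR is about -1.64% and the 5% ES about -2.06%. Under normality the ES/VaR ratio is fixed at φ(z_α)/(α|z_α|) ≈ 1.25, which is one reason a GARCH-N model can understate tail risk relative to EVT-based alternatives.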

In Figure 4 we have plotted the 5% VaR predictions from the asymmetric slope RQ-CAViaR model and from the asymmetric slope LF-CAViaR model. From this figure we conclude that the VaR predictions from these two models vary together. This is to be expected considering the LF function from expression (6): the term in parentheses in the last part of the second line coincides with the tick-loss function. However, the ES predictions from the asymmetric slope LF-CAViaR model are somewhat more extreme than the ES predictions from the RQ-CAViaR model. We can also see this in Table 9, where the estimated multiplicative factors βE for the RQ-CAViaR and LF-CAViaR models are presented. From this table we conclude that the final estimate of the βE parameter is higher for the LF-CAViaR than for the RQ-CAViaR.
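The asymmetric slope specification (Model 2 in Table 9) updates the VaR forecast with separate coefficients on positive and negative lagged returns. A minimal sketch of this recursion follows; note that the sign convention (quantile as a negative return level versus a positive loss level) and the mapping of β3/β4 to the positive/negative parts vary across papers, and the ordering here simply follows the usual Engle and Manganelli (2004) form. The Table 9 Model 2 LF-CAViaR point estimates are plugged in purely for illustration.

```python
def asymmetric_slope_caviar(returns, b1, b2, b3, b4, var0):
    """Asymmetric slope CAViaR recursion (Engle & Manganelli, 2004 style):
        VaR_t = b1 + b2 * VaR_{t-1} + b3 * max(y_{t-1}, 0) + b4 * min(y_{t-1}, 0)
    var0 is typically the empirical alpha-quantile of an initial window.
    Returns the VaR path, one value per observation in `returns`."""
    path = [float(var0)]
    for y in returns[:-1]:
        path.append(b1 + b2 * path[-1] + b3 * max(y, 0.0) + b4 * min(y, 0.0))
    return path

# Illustration with the Model 2 LF-CAViaR point estimates from Table 9:
var_path = asymmetric_slope_caviar([0.010, -0.020, 0.005],
                                   b1=0.045, b2=0.907, b3=0.041, b4=-0.279,
                                   var0=-0.020)
```

Because b3 and b4 differ, a negative lagged return moves the next forecast by more than a positive one of the same size, which is how the model captures the leverage effect in the VaR path.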

Figure 3: This figure plots the 5% VaR predictions of the LF-GARCH model and the GARCH-EVT model for the S&P500 over the OOS period ending in 2015.


Figure 4: This figure plots the 5% VaR predictions of the asymmetric slope RQ-CAViaR model and of the asymmetric slope LF-CAViaR model for the S&P500 over the OOS period ending in 2015.

We now consider the results of the traditional backtesting procedures discussed in Sections 6.1 and 6.2. Table 11 presents the outcomes of the coverage test, Table 12 the outcomes of the ES bootstrap test and Table 13 the outcomes of the DQ test. For the S&P500, only the asymmetric slope CAViaR models pass all the tests. For the FTSE100, the CAViaR and LF-GARCH models pass all the tests, whereas for the Nikkei only the adaptive CAViaR models pass all the tests. Nolde & Ziegel (2017) note that comparative backtests are better suited for method comparisons on the basis of forecasting accuracy. We therefore also evaluate the DM test and the backtest of Ziegel et al. (2017). To get a first impression of the performance of each model, we look at the average losses under the tick-loss function and the DF function in Tables 14 and 15.[13] We conclude that the best-performing model is the asymmetric slope LF-CAViaR. From Figure 5 we conclude that the LF-CAViaR model indeed performs somewhat better than the RQ-CAViaR, owing to its ES predictions. It is also interesting to note that the LF-CAViaR-ESiaR and RQ-CAViaR-CARES models do not outperform the asymmetric slope LF-CAViaR and RQ-CAViaR models. This seems reasonable, as VaR and ES are, to some extent, likely to vary together, since both vary with the time-varying volatility. Another interesting finding is that the LF-GARCH performs better than the GARCH models (especially the GARCH-N and GARCH-t models). For the LF-GARCH model we avoid assuming any particular form for the return distribution, in contrast with the GARCH-N and GARCH-t models. This allows us to infer that the return series are heavy tailed and in fact depart from the standard normal or Student-t distribution. This is in line with the histogram of the daily returns of the S&P500 in Figure 6 and the full-sample statistics in Table 7: the histogram displays fat tails and the table reports negative skewness and large kurtosis values.

[13] For the DF loss function we use the functions G1(z) = 0 and G2(z) = exp(z)/(1 + exp(z)) in (6). The tick-loss function only evaluates the VaR forecasts and ignores the ES forecasts, whereas the DF function corresponds to the joint evaluation of VaR and ES.
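As a concrete illustration, the tick loss and a DF loss with G1(z) = 0 and G2(z) = exp(z)/(1 + exp(z)) can be sketched as follows. This is a minimal sketch of one common scalar parameterization of the Fissler–Ziegel loss family; the exact scalar form of expression (6) in the thesis may differ in sign conventions or additive terms that do not affect forecast rankings.

```python
import math

def tick_loss(y, var, alpha=0.05):
    """Tick (pinball) loss: evaluates the VaR forecast only."""
    hit = 1.0 if y <= var else 0.0
    return (alpha - hit) * (y - var)

def df_loss(y, var, es, alpha=0.05):
    """Joint VaR/ES loss of the Fissler-Ziegel family with G1(z) = 0 and
    G2(z) = exp(z) / (1 + exp(z)), whose antiderivative is log(1 + exp(z)).
    Lower average loss indicates a better joint (VaR, ES) forecast."""
    hit = 1.0 if y <= var else 0.0
    g2 = math.exp(es) / (1.0 + math.exp(es))   # G2(es)
    G2 = math.log1p(math.exp(es))              # antiderivative of G2 at es
    return g2 * (es - var + hit * (var - y) / alpha) - G2
```

Averaging such losses over the OOS period, in the spirit of Tables 14 and 15, ranks models by VaR accuracy alone (tick loss) or by the joint accuracy of the (VaR, ES) pair (DF loss).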

                                 S&P500   FTSE100   Nikkei

LF-GARCH                         0.837    1.000     0.837
LF-CAViaR        Symmetric       1.000    0.837     0.837
                 Asymmetric      0.837    0.538     0.538
                 Ind. GARCH      0.837    1.000     0.837
                 Adaptive        0.837    0.682     1.000
LF-CAViaR-ESiaR                  0.837    0.682     0.305
GARCH-EVT                        0.837    0.837     0.682
GARCH-N                          0.305    0.151     0.412
GARCH-t                          0.305    0.101     0.412
RQ-CAViaR        Symmetric       0.837    0.682     0.837
                 Asymmetric      0.837    0.538     0.682
                 Ind. GARCH      0.837    1.000     0.682
                 Adaptive        1.000    0.682     1.000
RQ-CAViaR-EVT    Symmetric       0.538    0.151     0.837
                 Asymmetric      0.837    0.538     0.538
                 Ind. GARCH      0.305    0.305     0.305
                 Adaptive        0.682    0.538     0.837
RQ-CAViaR-CARES                  0.837    0.538     0.682

Table 11: This table presents the p-values of the unconditional coverage test for three daily stock indices over the OOS period ending in 2015, for eighteen different forecasting models.
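The unconditional coverage test checks whether the fraction of VaR violations in the OOS period matches the nominal level α. A minimal sketch of the standard Kupiec (1995) likelihood-ratio version follows; the exact statistic used in Section 6.1 may differ in detail, so this is an illustration of the idea rather than a reproduction of Table 11.

```python
import math

def kupiec_uc_test(violations, n, alpha=0.05):
    """Kupiec unconditional coverage LR test.
    violations: number of days the realized return fell below the VaR forecast.
    Returns (LR statistic, asymptotic chi-squared(1) p-value)."""
    x = violations
    pi_hat = x / n
    if x in (0, n):
        # Degenerate MLE: the unrestricted log-likelihood is zero.
        lr = -2.0 * (x * math.log(alpha) + (n - x) * math.log(1 - alpha))
    else:
        log_l0 = x * math.log(alpha) + (n - x) * math.log(1 - alpha)
        log_l1 = x * math.log(pi_hat) + (n - x) * math.log(1 - pi_hat)
        lr = -2.0 * (log_l0 - log_l1)
    # Survival function of chi-squared with 1 df: P(X > lr) = erfc(sqrt(lr/2)).
    p_value = math.erfc(math.sqrt(lr / 2.0))
    return lr, p_value
```

For example, 50 violations in 1000 days at α = 0.05 yields LR = 0 and a p-value of 1 (perfect coverage), whereas 70 violations yields a p-value well below 0.05, i.e. a rejection of correct unconditional coverage.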
