
Bayesian GARCH Estimation and Expectile Backtesting

An Investigation of Recent Developments in Financial Risk Management

P.I. Zeinstra

Student number: 10867848

Date of final version: February 28, 2016

Master's programme: Econometrics

Specialisation: Econometrics

Supervisors: dr. S. A. Broda (UvA), P. M. M. H. J. M. Verstappen, MSc CFA (EY)

Second reader: prof. dr. H. P. Boswijk (UvA)

Declaration of Authorship

I, Paulus Zeinstra, declare that this thesis and the work presented in it are my own. I confirm that:

- This work was done wholly while in candidature for a master's degree in Econometrics at the University of Amsterdam.

- Where I have quoted from the work of others, the source is always given. With the exception of such quotations, this thesis is entirely my own work.

- I have acknowledged all main sources of help.

Signed:

Date: February 28, 2016


Abstract

In this research we investigate recent developments in financial risk modelling: Bayesian estimation of GARCH(1,1) models and the use of the Expectile. We apply the GARCH(1,1) model with normal innovations to a Monte Carlo DGP and to market data from two stock indices. We find that Bayesian GARCH(1,1) estimation performs relatively poorly for the Monte Carlo DGP in comparison with the MLE. Furthermore, for market data we find that there is barely a difference between the two estimation procedures and that the choice of innovation process is of considerably greater importance. The second part of our research investigates the backtestability of the Expectile. We develop two backtests for the Expectile: one based on the first order condition of its scoring function and one based on the asymptotics of the asymmetric least squares estimate. In a Monte Carlo study we find that both tests perform comparably to Kupiec's (1995) test in terms of rejection frequencies. Moreover, no evidence against the empirical use of the Expectile and its backtests is found. Combining the theoretical properties with its backtestability leads us to conclude that the Expectile might be an attractive risk measure for use in practice.


Contents

1 Introduction
  1.1 Background
  1.2 Risk measures
  1.3 Models
2 Methodology
  2.1 GARCH models
  2.2 Inference methods
    2.2.1 Frequentist method
    2.2.2 Bayesian method
  2.3 Risk measures
  2.4 Backtests
    2.4.1 Kupiec's backtest (VaR)
    2.4.2 ES backtests
    2.4.3 Expectile backtests
  2.5 Bootstrapping
    2.5.1 Resampling method
    2.5.2 Moving Block Bootstrap
3 Monte Carlo study
  3.1 Monte Carlo DGP
  3.2 Inference Methods
    3.2.1 Point estimates
    3.2.2 Sample length
    3.2.3 Innovations
  3.3 Backtests
    3.3.1 Power
    3.3.2 Deviations from the unconditional VaR/Expectile
4 Empirical study
  4.1 Market data
    4.1.1 S&P 500
    4.1.2 Nikkei 225
  4.2 Inference methods
    4.2.1 S&P 500
    4.2.2 Nikkei 225
  4.3 Backtests
    4.3.1 S&P 500
    4.3.2 Nikkei 225
5 Conclusion
6 Appendix
  6.1 A
    6.1.1 Data plots
    6.1.2 DGP output
    6.1.3 Empirical Analysis output
  6.2 B
    6.2.1 Matlab code


1 Introduction

Since Engle (1982) and Bollerslev (1986) published their articles on volatility models, time series analysis has changed substantially; the number of their citations alone (more than 18,000 each) reveals their influence. From that point forward, a multitude of adaptations of their (generalized) autoregressive conditional heteroskedasticity (GARCH) model have been introduced. The main advantage of these models is that they allow for volatility clustering, i.e. periods of low volatility alternating with more agitated periods.

Popularity grew as GARCH models proved perfectly suitable for applications in finance (e.g. in risk and asset management), because volatility on financial markets tends to cluster as well. In fact, J.P. Morgan developed the RiskMetrics software, which is in essence a modification of the GARCH model. These models became even more interesting once they were extended to the multivariate setting (MGARCH models), because they then allow for correlations. This created enthusiasm among practitioners in finance once more: considering correlations between assets is important when diversifying a portfolio or calculating capital charges.

A more recent movement in the field of statistics is Bayesian inference. Its rise in popularity is predominantly due to the increase in computing power. One of the advantages of Bayesian inference over frequentist (or classical) inference is that information can be added a priori, leading to potentially improved estimates. Of course, the appropriateness of introducing subjectivity into statistical analysis is debatable.

An ongoing debate among practitioners concerns which risk measure to use. Even though the theoretical properties of the Value at Risk (VaR) appear inferior to those of the Expected Shortfall (ES), practitioners are dissatisfied with the increase in complexity that comes with the ES, especially regarding backtestability. Recently, another risk measure has joined the debate: the Expectile.

The question we ask ourselves here is how these recent developments (Bayesian inference and the Expectile) can contribute to market risk modelling. This question naturally splits into two parts: we first answer it for Bayesian inference and then for the Expectile. Since we consider market risk (the risk that arises from movements in market prices), we answer our questions using GARCH models.

Using a Monte Carlo DGP and market data, we evaluate both frequentist and Bayesian inference methods. Frequentist inference, and more precisely MLE, serves as a benchmark for Bayesian inference, as it is widely applied in the literature. We compare both methods using a Monte Carlo simulation and base conclusions on statistical measures such as bias, standard deviation and RMSE.

Besides theoretical properties, the backtestability of a risk measure is of great importance in determining whether a capital charge for a certain risk is prudent or not. Therefore, in this research the emphasis lies on the backtestability of the Expectile, and we ask whether we can come up with a backtest for the Expectile. To evaluate the size and power of the tests we use rejection frequencies, obtained by Monte Carlo simulation as well. We use the backtest proposed by Kupiec (1995) as a benchmark because of its intuitiveness and wide application.

In the following sections we discuss the literature on risk measures and GARCH models, beginning with some background on why these notions are important in practice. The second chapter contains the detailed methodology on inference methods and risk measures. With regard to inference, the emphasis lies on the Bayesian method, not only because it is the inference method we investigate but also because it is the most involved. Secondly, we elaborate on the risk measures' backtests. Chapter 3 introduces the Monte Carlo DGP and summarizes the results for this dataset. Chapter 4 involves the application to market data. Both Chapter 3 and Chapter 4 are divided into GARCH estimation and backtesting. Chapter 5 provides the conclusions of our research.

1.1 Background

In 1973, the collapse of the Bretton-Woods¹ system caused massive currency losses for many banks. As a response, the central bank governors of the G10 countries established what we nowadays know as the Basel Committee on Banking Supervision (BCBS). BCBS's mandate is "to strengthen the regulation, supervision and practices of banks worldwide with the purpose of enhancing financial stability". It does so by publishing a set of minimum regulatory capital requirements for banks, which are (partly) enforced by law. Examples are Basel I (published in 1988, enforced by law in 1992) and Basel II (published in 2004, enforced by law in 2008). The latest regulation on capital requirements for European Union members is bundled in the Capital Requirements Regulation and Directive (CRR/CRD-IV) and is the first step in implementing Basel III. Between now and 2019, more provisions will be phased in such that Basel III is fully live in 2019. In the meantime the BCBS publishes other (consultative) documents such as the Fundamental Review of the Trading Book (FRTB) and the Review of the Credit Valuation Adjustment Risk Framework. To some extent, this can be seen as Basel IV.

¹ The Bretton-Woods system was a monetary policy agreement between 44 countries made in July 1944. The main feature of the agreement was to keep exchange rates fixed.

Basel I primarily focused on credit risk (the risk of an obligor not being able to meet contractual agreements). According to Basel I, risks should be mapped to risk-weighted assets (RWA), a risk-adjusted measure for off-balance exposure. Banks should then hold a capital ratio of at least 8%, that is, the capital held in reserve divided by the RWA. Basel II was designed to improve regulatory capital requirements, to incorporate risks not yet included (such as market and operational risk) and to respond to recent financial modernization. Among other things, it introduced the notion of VaR, a measure to adequately address risk.

1.2 Risk measures

Mathematically speaking, a risk measure for a random variable X is a functional mapping the c.d.f. of X to a real number. In the nineties, VaR was a risk measure widely accepted by practitioners in financial risk management. However, in 1999, Artzner et al. raised concerns about the use of VaR. The reason was that the measure is not 'coherent': it lacks the property of subadditivity. More criticism came from Yamai and Yoshiba (2002), who showed that VaR lacks prudence in case of market stress: under extreme price fluctuations or extreme dependence structures between assets, VaR may underestimate risk. At that time, practitioners, mainly risk managers, also raised concerns, namely that VaR does not take the largest risks into account because it does not look beyond the specified quantile. In 2005, Campbell reviewed backtests for the VaR, distinguishing between conditional, independence, joint and realized-loss backtests. His results suggested that gains in statistical power could be achieved if other quantiles were taken into account. This finding is intuitive, as the VaR does not tell us anything about the rest of the distribution.

Because of the growing aversion towards the use of VaR, other, coherent, risk measures were proposed in the literature, of which the ES has received the most attention. Numerous articles on ES (and its comparison with VaR) were published, all favouring the use of ES over VaR (see e.g. Acerbi et al. (2001), Yamai and Yoshiba (2005), Dardac et al. (2011) and Ardia and Hoogerheide (2014)). Consequently, the BCBS proposed to replace the VaR by the ES in the FRTB: "A number of weaknesses have been identified with using VaR for determining regulatory capital requirements, including its inability to capture 'tail risk'. For this reason, the Committee proposed in May 2012 to replace VaR with ES." A disadvantage of ES was recognized by Kondor et al. (2015). Their central message is that "nobody should be using Expected Shortfall for the purpose of portfolio optimization", the reason being that exceedingly large samples are needed to obtain an acceptable estimation error.

Another risk measure that has gained more attention recently is the Expectile. The advantage of the Expectile over the ES is that it is elicitable, making it more suitable for backtesting. Literature on Expectiles, let alone on backtesting Expectiles, is scarce. Yet, Emmer et al. (2013) conclude that ES seems to be the best risk measure for use in practice. In Section 2.3 we elaborate on the theoretical details of these risk measures.

1.3 Models

To calculate risk measures, most banks apply historical simulation because of its intuitiveness, simplicity and computational ease. This non-parametric method is simply a simulation in which a distribution is drawn from $k \in \mathbb{N}^+$ past observations with replacement. As one may suspect, it relies on the assumption that the distribution of future values is equal to the historical distribution. Others use RiskMetrics, which basically consists of an integrated GARCH model. Another option would be to apply Extreme Value Theory (EVT). Even though these models yield reasonable results, the most applied in empirical research are GARCH models (see e.g. Marshall et al. (2009) and Zhu and Galbraith (2011)). A brief review of the univariate GARCH model family is given by Xu et al. (2011). Li et al. (2011) compare nine estimation methods by applying three different GARCH models under three different innovations to the CSI 300 index. The three possible innovations are the normal distribution, the t-distribution and the GED. The models they investigate are the GARCH, exponential GARCH (EGARCH) and GJR-GARCH (named after Glosten et al. (1993)). They find that the VaR rejection frequency of the EGARCH and GJR-GARCH with GED innovations is closest to the chosen confidence level, from which they conclude that these models are most precise.

Back in 1996, Nakatsuma and Tsurumi compared Bayesian estimates with Maximum Likelihood estimates (MLE) for ARMA-GARCH models. Based on the Mean Squared Error (MSE), they found that the Bayesian point estimates outperform MLE for small samples. Hoogerheide et al. (2012) compared frequentist and Bayesian estimation methods with respect to their density forecasts using GARCH models. They found no significant difference between the quality of the whole density forecasts; however, they did find that the Bayesian method leads to significantly better left-tail forecast accuracy. Moreover, they concluded that Bayesian estimation methods should be preferred in risk management applications, because of their superior predictive accuracy in the left tail. Aussenegg and Miazhynskaia (2006) find another rationale to prefer the Bayesian approach over traditional techniques for estimating GARCH models: they conclude that the Bayesian approach involves less uncertainty in VaR estimates compared to other methods. However, when considering backtests they do not find any significant difference. Stegmueller (2013) compared the Bayesian and frequentist approach for multilevel models. He argued that Bayesian estimates show far better properties; for instance, the magnitude of the bias is much smaller for Bayesian estimates than for ML estimates.


2 Methodology

In this chapter we set up our framework. In the first section we introduce the GARCH model, the two inference methods and how to estimate the GARCH model. We then continue by reviewing risk measures and backtesting, and conclude with bootstrap methods. First, we start off by explaining the very basics of time series analysis and the notion of returns.

2.1 GARCH models

If the price of a stock at time $t$ is defined as $P_t$, then the one-period simple return is defined as
\[
R_t = \frac{P_t - P_{t-1}}{P_{t-1}}.
\]
The continuously compounded return or log-return equals
\[
r_t = \log(1 + R_t) = \log P_t - \log P_{t-1}.
\]
Then, by a first order Taylor approximation we have that for small $R_t$, $R_t \approx r_t$. Therefore, we may use the words return and log-return interchangeably. Now, we let $\{r_t\}_{t=1}^T$ be a return process and $\mathcal{F}_t = \{r_t, \mathcal{F}_{t-1}\}$ be the filtration. A filtration is a set that contains all information up to time $t$. At $t = 1$, the filtration does not contain any information. If we then define $a_t = r_t - E[r_t \mid \mathcal{F}_{t-1}]$ and $h_t = \mathrm{Var}[r_t \mid \mathcal{F}_{t-1}]$, the ARCH(m) model is given by
\[
a_t = \sqrt{h_t}\,\epsilon_t, \qquad h_t = \alpha_0 + \sum_{i=1}^{m} \alpha_i a_{t-i}^2, \qquad \epsilon_t \sim \text{i.i.d.}(0,1),
\]


with $\alpha_0 > 0$ and $\alpha_i \geq 0$. In 1986, Bollerslev proposed the GARCH(m,s) model, in which the latter equation is extended to
\[
h_t = \alpha_0 + \sum_{i=1}^{m} \alpha_i a_{t-i}^2 + \sum_{j=1}^{s} \beta_j h_{t-j}.
\]
An extension of this model is the GJR-GARCH (also known as threshold GARCH). The model, extended by the parameter(s) $\delta_i$, is given below:
\[
h_t = \alpha_0 + \sum_{i=1}^{m} \left(\alpha_i + \delta_i I_{\{a_{t-i} < 0\}}\right) a_{t-i}^2 + \sum_{j=1}^{s} \beta_j h_{t-j}.
\]
In empirical studies it is commonly found that, when modelling a financial time series with a GJR-GARCH(1,1), $\delta_1 > 0$. This phenomenon is called the leverage effect: a negative return causes a decrease in equity, hence an increase in the leverage ratio and thus a larger return on equity. The GJR-GARCH model is therefore more realistic than the GARCH model, in which negative shocks are equivalent to positive shocks. Other models that incorporate asymmetry are the EGARCH and the asymmetric power ARCH (APARCH). These models, however, provide results comparable to the GARCH model (see Rodríguez and Ruiz (2012)) and will therefore not be investigated, for the sake of computational ease. The multivariate (asymmetric) GJR-DCC-GARCH model is the standard workhorse in the multivariate setting and is given below:
\[
H_t = D_t P_t D_t, \qquad D_t = \mathrm{diag}\!\left(\sqrt{h_{i,t}}\right), \qquad
P_t = \{\mathrm{diag}\,Q_t\}^{-\frac{1}{2}} Q_t \{\mathrm{diag}\,Q_t\}^{-\frac{1}{2}},
\]
\[
Q_t = \bar{Q}\,(1 - \theta_1 - \theta_2) + \theta_1 \epsilon_{t-1}\epsilon_{t-1}' + \theta_2 Q_{t-1} + \theta_3\left(v_{t-1}v_{t-1}' - \bar{N}\right), \qquad \epsilon_t = D_t^{-1} a_t,
\]
where $v_t = \max(0, -\epsilon_t)$ and $\bar{N} = \mathrm{Var}(v_t)$. As one may notice, it allows for dynamic correlations. Because of its computational complexity, however, we will not investigate the multivariate case.

2.2 Inference methods

While frequentist inference has been mainstream for decades, Bayesian inference is on the rise. In the following two sections we describe estimation techniques for both inference methods.

2.2.1 Frequentist method

In the frequentist framework, there are two estimation methods for GARCH models: the Generalized Method of Moments (GMM) and (Quasi-) Maximum Likelihood ((Q)ML). An illustration of GMM estimation for an ARCH model is given by Mark (1988), who estimates a model of forward foreign exchange rates based on the CAPM equations. However, GARCH models are most frequently estimated by (Q)ML, and therefore we consider only this frequentist estimation method.

If we let $\gamma$ be the vector of AR parameters and $\phi$ the vector of GARCH parameters, the AR-GARCH process is written as
\[
r_t = \mu_t(\gamma) + \sqrt{h_t(\gamma,\phi)}\,\epsilon_t.
\]
Then, we assume that the innovations are i.i.d. and that $r_t$ has a normal conditional density $f(\cdot)$, leading to the likelihood
\[
L(\gamma,\phi \mid r_t) = \prod_{t=m+1}^{T} f(r_t \mid \mathcal{F}_{t-1},\gamma,\phi),
\]
with log-likelihood
\[
\sum_{t=m+1}^{T} \log f(r_t \mid \mathcal{F}_{t-1},\gamma,\phi) = \sum_{t=m+1}^{T} \ell_t(\gamma,\phi)
\propto \sum_{t=m+1}^{T} \left( -\tfrac{1}{2}\log h_t(\gamma,\phi) - \tfrac{1}{2}\,\frac{(r_t - \mu_t(\gamma))^2}{h_t(\gamma,\phi)} \right).
\]

The parameters $(\hat\gamma_{MLE}, \hat\phi_{MLE})$ which maximize the latter function are the ML estimates. The function can be optimized by numerically solving the first order conditions. Note, however, that we need $m = \max(p, q, m, s)$ initial values for $a_t$ and $h_t$, as they are unobserved. A typical assumption is therefore
\[
a_t = 0, \qquad h_t = \frac{1}{T-m}\sum_{t=m+1}^{T}\left(r_t - \mu_t(\gamma)\right)^2, \qquad t = 1,\dots,m.
\]
To estimate the variance-covariance matrix, define (with $\psi = (\gamma,\phi)$)
\[
\hat{A} = -\sum_{t=m+1}^{T}\frac{\partial^2 \ell_t(\hat\psi)}{\partial\psi\,\partial\psi'}
\qquad \text{and} \qquad
\hat{B} = \sum_{t=m+1}^{T}\frac{\partial \ell_t(\hat\psi)}{\partial\psi}\,\frac{\partial \ell_t(\hat\psi)}{\partial\psi'}.
\]
If the information equality holds (that is, in this case, if the error term is indeed normally distributed), then the variance can be consistently estimated by either $\hat{A}^{-1}$, $\hat{B}^{-1}$ or $\hat{A}^{-1}\hat{B}\hat{A}^{-1}$. However, if one believes that the errors follow a t-distribution instead of a normal distribution, which is very plausible for financial time series, one can still apply QML by using Bollerslev-Wooldridge standard errors.
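As an illustration of this estimation step, the R sketch below (not the thesis code) evaluates the Gaussian GARCH(1,1) log-likelihood for a zero-mean return series and maximizes it numerically with optim(); the function names, starting values and crude parameter constraints are purely illustrative.

    # Negative Gaussian GARCH(1,1) log-likelihood for a zero-mean return series r
    garch11_negloglik <- function(par, r) {
      a0 <- par[1]; a1 <- par[2]; b <- par[3]
      if (a0 <= 0 || a1 < 0 || b < 0 || a1 + b >= 1) return(1e10)  # crude constraints
      n <- length(r)
      h <- numeric(n)
      h[1] <- var(r)                       # initialise h_1 at the sample variance
      for (t in 2:n) h[t] <- a0 + a1 * r[t - 1]^2 + b * h[t - 1]
      0.5 * sum(log(h) + r^2 / h)          # negative log-likelihood up to a constant
    }

    # Numerical maximisation of the likelihood (illustrative starting values)
    fit_garch11_ml <- function(r, start = c(1e-5, 0.05, 0.90)) {
      optim(start, garch11_negloglik, r = r, method = "Nelder-Mead",
            control = list(maxit = 5000))
    }

In practice one would rather rely on a dedicated routine (such as garchFit() from the fGarch package mentioned in Chapter 3), but the sketch makes the structure of the log-likelihood above explicit.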


2.2.2 Bayesian method

The concept of Bayesian statistics

The difference between the frequentist and the Bayesian approach stems from the underlying assumption on the parameter, say $\theta = (\theta_1,\dots,\theta_d)$, which lies in a parameter space $\Theta \subseteq \mathbb{R}^d$. Frequentists assume that there exists a fixed and true $\theta$. Contrarily, Bayesians assume $\theta$ is a random variable with a prior density $p(\theta)$. The prior density represents the information about $\theta$ known by the researcher in advance (which is not necessarily informative). If we define the likelihood function $L(\theta \mid r_t) = p(r_t \mid \theta)$, then by Bayes' rule we have
\[
p(\theta \mid r_t) = \frac{p(r_t \mid \theta)\,p(\theta)}{p(r_t)} = \frac{L(\theta \mid r_t)\,p(\theta)}{\int_\Theta L(\theta \mid r_t)\,p(\theta)\,d\theta},
\]

where the denominator is simply a normalizing constant. The posterior density $p(\theta \mid r_t)$ describes the distribution of $\theta$ after combining the researcher's prior belief about the distribution of $\theta$ with the likelihood of $\theta$ given the data. Figure 2.1 depicts this relationship.

Figure 2.1: From prior and likelihood to posterior.

Note that if we set a flat prior $p(\theta) \propto 1$, we would have
\[
p(\theta \mid r_t) \propto L(\theta \mid r_t).
\]
Hence, maximizing the posterior would yield a result equivalent to the frequentist one. Also, under certain specifications of the prior distribution, the Bernstein-von Mises theorem implies that the estimates from both inference methods converge asymptotically to the same distribution.

An often convenient choice is a prior that is conjugate to the likelihood, such that the posterior can be determined analytically. However, a conjugate prior does not necessarily represent the prior state of knowledge. In such cases, one can find a solution by Monte Carlo methods.

The most popular Markov Chain Monte Carlo (MCMC) methods are the Gibbs sampler and the Metropolis-Hastings (MH) algorithm. Which MCMC algorithm to use depends on whether the full conditional density $p(\theta_i \mid \theta_{\neq i}, r_t)$ is known or not, with $\theta_{\neq i} = (\theta_1,\dots,\theta_{i-1},\theta_{i+1},\dots,\theta_d)$. If it is known, the Gibbs sampler can be applied; if not, one should apply the MH algorithm. For the GARCH model parameters the full conditionals are unknown (due to the recursiveness of $h_t$), and thus the MH algorithm should be applied. For an extensive discussion on MCMC strategies, we refer to Tierney (1994).
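For intuition, a minimal random-walk MH sketch in R is given below; it only illustrates the generic accept/reject mechanism for a user-supplied log-posterior and is not the tailored algorithm of the next subsection. All names are illustrative.

    # Generic random-walk Metropolis-Hastings for a target log-density log_post()
    metropolis_hastings <- function(log_post, theta0, n_iter, step = 0.1) {
      d <- length(theta0)
      draws <- matrix(NA_real_, n_iter, d)
      theta <- theta0
      lp    <- log_post(theta)
      for (j in seq_len(n_iter)) {
        cand    <- theta + rnorm(d, sd = step)     # symmetric proposal
        lp_cand <- log_post(cand)
        if (log(runif(1)) < lp_cand - lp) {        # MH acceptance step
          theta <- cand; lp <- lp_cand
        }
        draws[j, ] <- theta
      }
      draws
    }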

Bayesian estimation of GARCH(1,1) with normal innovations

To the author's knowledge, the first to demonstrate Bayesian inference applied to a GARCH-type model is Geweke (1989). The Bayesian estimation of the univariate GARCH model has been discussed extensively by Ardia (2008), and we therefore base the methodology on Ardia's formulation. Let us redefine the model as follows:

\[
r_t = x_t'\gamma + a_t, \qquad a_t = \sqrt{h_t}\,\epsilon_t, \qquad h_t = \alpha_0 + \alpha_1 a_{t-1}^2 + \beta h_{t-1}, \qquad \epsilon_t \sim N(0,1),
\]
with $\alpha_i > 0$ and $\beta > 0$ such that $h_t > 0$, and $x_t$ an $m \times 1$ vector of exogenous or lagged dependent variables. Let us regroup the model parameters as $\psi = (\gamma,\phi) = (\gamma,\alpha,\beta)$ and define $\Sigma = \mathrm{diag}(\{h_t(\psi)\}_{t=1}^T)$, such that the likelihood function equals
\[
L(\psi \mid r, X) \propto |\det \Sigma|^{-\frac{1}{2}} \exp\left(-\tfrac{1}{2}\, a'\Sigma^{-1}a\right). \qquad (2.1)
\]

In addition, the following priors are proposed:
\[
p(\gamma) = N_m(\gamma \mid \mu_\gamma, \Sigma_\gamma), \qquad
p(\alpha) = N_2(\alpha \mid \mu_\alpha, \Sigma_\alpha)\,I_{\{\alpha > 0\}}, \qquad
p(\beta) = N(\beta \mid \mu_\beta, \Sigma_\beta)\,I_{\{\beta > 0\}},
\]
where $\mu_\bullet$ and $\Sigma_\bullet$ are hyperparameters and $N_d$ is the $d$-dimensional normal density. For the sake of convenience we assume prior independence, such that
\[
p(\psi) = p(\gamma)\,p(\alpha)\,p(\beta).
\]
Then, according to Bayes' law,
\[
p(\psi \mid r, X) \propto L(\psi \mid r, X)\,p(\psi).
\]

For the MH algorithm we need initial values $\psi^{[0]} = (\gamma^{[0]}, \alpha^{[0]}, \beta^{[0]})$ such that we can iterate $J$ passes. A single pass looks as follows:
\[
\gamma^{[j]} \sim p(\gamma \mid \alpha^{[j-1]}, \beta^{[j-1]}, r, X), \qquad
\alpha^{[j]} \sim p(\alpha \mid \gamma^{[j]}, \beta^{[j-1]}, r, X), \qquad
\beta^{[j]} \sim p(\beta \mid \gamma^{[j]}, \alpha^{[j]}, r, X).
\]

Since none of these full conditional densities is known analytically, we draw the parameters from proposal densities. The proposal density for $\gamma$ is obtained by combining the likelihood in (2.1) and the prior density through the Bayesian update:
\[
q_\gamma(\gamma \mid \tilde\gamma, \alpha, \beta, r, X) = N_m(\gamma \mid \hat\mu_\gamma, \hat\Sigma_\gamma), \qquad (2.2)
\]
with
\[
\hat\Sigma_\gamma^{-1} = X'\tilde\Sigma^{-1}X + \Sigma_\gamma^{-1}, \qquad
\hat\mu_\gamma = \hat\Sigma_\gamma\left(X'\tilde\Sigma^{-1}r + \Sigma_\gamma^{-1}\mu_\gamma\right), \qquad
\tilde\Sigma = \mathrm{diag}\left(\{h_t(\tilde\gamma, \alpha, \beta)\}_{t=1}^{T}\right),
\]
where $\tilde\gamma$ is the previous draw. Then, a candidate $\gamma^\star$ is sampled from the proposal density in (2.2) and accepted with probability
\[
\min\left\{ \frac{p(\gamma^\star, \alpha, \beta \mid r, X)}{p(\tilde\gamma, \alpha, \beta \mid r, X)}\,
\frac{q_\gamma(\tilde\gamma \mid \gamma^\star, \alpha, \beta, r, X)}{q_\gamma(\gamma^\star \mid \tilde\gamma, \alpha, \beta, r, X)},\; 1 \right\}.
\]

The parameters $\alpha$ and $\beta$ are obtained by transforming the GARCH(1,1) model. We define $w_t = a_t^2 - h_t$ such that
\[
h_t = \alpha_0 + \alpha_1 a_{t-1}^2 + \beta h_{t-1}
\;\Leftrightarrow\;
a_t^2 = \alpha_0 + (\alpha_1 + \beta)\,a_{t-1}^2 - \beta w_{t-1} + w_t,
\]
where
\[
w_t = a_t^2 - h_t = \left(\frac{a_t^2}{h_t} - 1\right)h_t = (\chi_1^2 - 1)\,h_t.
\]
The variable $w_t$ has a conditional mean of zero and a conditional variance of $2h_t^2$, and can be regarded as the error term of an ARMA(1,1) model in $a_t^2$. Writing this error recursively in terms of the parameters yields the following expression:
\[
z_t(\alpha,\beta) = a_t^2 - \alpha_0 - (\alpha_1 + \beta)\,a_{t-1}^2 - \beta z_{t-1}(\alpha,\beta). \qquad (2.3)
\]

Now, similarly as for $\gamma$, we can define the likelihood function of $(\alpha, \beta)$, which is given by
\[
L(\alpha,\beta \mid \gamma, r, X) \propto |\det \Lambda|^{-\frac{1}{2}} \exp\left(-\tfrac{1}{2}\, z'\Lambda^{-1}z\right). \qquad (2.4)
\]
Let us define $v_t = a_t^2$ such that the recursive transformations are
\[
c_t = \begin{pmatrix} l_t^* \\ v_t^* \end{pmatrix} = \begin{pmatrix} 1 + \beta l_{t-1}^* \\ v_{t-1} + \beta v_{t-1}^* \end{pmatrix},
\]
where the initial values $(l_0^*, v_0^*)'$ are equal to zero. Then, the function $z_t = v_t - c_t'\alpha$ can be evaluated in (2.4) to approximate $\alpha$. The proposal density is then
\[
q_\alpha(\alpha \mid \gamma, \tilde\alpha, \beta, r, X) \propto N_2(\alpha \mid \hat\mu_\alpha, \hat\Sigma_\alpha)\,I_{\{\alpha > 0\}},
\]
with
\[
\hat\Sigma_\alpha^{-1} = C'\tilde\Lambda_\alpha^{-1}C + \Sigma_\alpha^{-1}, \qquad
\hat\mu_\alpha = \hat\Sigma_\alpha\left(C'\tilde\Lambda_\alpha^{-1}v + \Sigma_\alpha^{-1}\mu_\alpha\right), \qquad
\tilde\Lambda_\alpha = \mathrm{diag}\left(\{2h_t^2(\gamma, \tilde\alpha, \beta)\}_{t=1}^{T}\right).
\]
As before, the tilde represents the previous draw and the star the candidate. The candidate $\alpha^\star$ is sampled from the proposal and accepted with probability
\[
\min\left\{ \frac{p(\gamma, \alpha^\star, \beta \mid r, X)}{p(\gamma, \tilde\alpha, \beta \mid r, X)}\,
\frac{q_\alpha(\tilde\alpha \mid \gamma, \alpha^\star, \beta, r, X)}{q_\alpha(\alpha^\star \mid \gamma, \tilde\alpha, \beta, r, X)},\; 1 \right\}.
\]

Finally, we need a proposal density for $\beta$. The function $z_t(\alpha,\beta)$ in (2.3) is linear with respect to $\alpha$ but not with respect to $\beta$, due to the $\beta z_{t-1}(\alpha,\beta)$ term in it. Therefore, we use a first order Taylor approximation of $z_t(\beta)$ at the point $\tilde\beta$:
\[
z_t(\beta) \simeq z_t(\tilde\beta) + \frac{\partial z_t}{\partial\beta}\bigg|_{\beta = \tilde\beta}\cdot(\beta - \tilde\beta).
\]
Let
\[
s_t = z_t(\tilde\beta) + \tilde\beta\,\nabla_t, \qquad
\nabla_t = u_t^2 - z_{t-1}(\tilde\beta) + \tilde\beta\,\nabla_{t-1},
\]
with $\nabla_0 = 0$, where $z_t(\beta)$ is defined as in Equation (2.3) with $\alpha$ treated as a constant. Combining this (approximated) likelihood with the prior density gives the proposal density for $\beta$:
\[
q_\beta(\beta \mid \gamma, \alpha, \tilde\beta, r, X) \propto N(\beta \mid \hat\mu_\beta, \hat\Sigma_\beta)\,I_{\{\beta > 0\}},
\]
with
\[
\hat\Sigma_\beta^{-1} = \nabla'\tilde\Lambda_\beta^{-1}\nabla + \Sigma_\beta^{-1}, \qquad
\hat\mu_\beta = \hat\Sigma_\beta\left(\nabla'\tilde\Lambda_\beta^{-1}s + \Sigma_\beta^{-1}\mu_\beta\right), \qquad
\tilde\Lambda_\beta = \mathrm{diag}\left(\{2h_t^2(\gamma, \alpha, \tilde\beta)\}_{t=1}^{T}\right).
\]
A candidate $\beta^\star$ is generated from this density and accepted with probability
\[
\min\left\{ \frac{p(\gamma, \alpha, \beta^\star \mid r, X)}{p(\gamma, \alpha, \tilde\beta \mid r, X)}\,
\frac{q_\beta(\tilde\beta \mid \gamma, \alpha, \beta^\star, r, X)}{q_\beta(\beta^\star \mid \gamma, \alpha, \tilde\beta, r, X)},\; 1 \right\}.
\]
Accordingly, the algorithm starts the next iteration again with $\gamma$, until the $J$-th iteration is reached. This results in a distribution, the posterior to be precise, of the parameters. Credible intervals can be obtained from quantiles of the posterior distribution. Point estimates are typically the mean, the median or the mode (obtained from a non-parametric kernel density estimate).

2.3 Risk measures

In this research we will examine three risk measures: VaR, ES and Expectiles. Let us first define a stationary process $R_t$ such that
\[
R_t = \mu_t + \sqrt{h_t}\,Z_t,
\]
where $Z \sim \text{i.i.d.}(0,1)$. Let us also define $P_R(r)$ and $F_R(r)$, which represent the predictive cumulative distribution and the true distribution, respectively. Then, the definitions of the unconditional Value-at-Risk, Expected Shortfall and Expectile are given below:
\[
-\mathrm{VaR}_p(R) = \inf\{r \in \mathbb{R} : P_R(r) \geq p\},
\]
\[
-\mathrm{ES}_p(R) = E[R \mid R \leq -\mathrm{VaR}_p(R)] = \frac{1}{p}\int_0^p -\mathrm{VaR}_u(R)\,du,
\]
\[
e_\tau(R) = \arg\min_{r \in \mathbb{R}} \left\{ \tau\,E\!\left[\max(R - r, 0)^2\right] + (1-\tau)\,E\!\left[\max(r - R, 0)^2\right] \right\},
\]
where $0 < p, \tau < 1$. In Figure 2.2 the latter functions of $p$ and $\tau$ are given for $R \sim N(0,1)$. For the interested reader, Bellini and Di Bernardino (2014) discuss a wide variety of distributions in the light of quantiles and Expectiles.

Figure 2.2: Risk measures ($-\mathrm{VaR}$, $-\mathrm{ES}$ and the Expectile) as functions of $p$ or $\tau$ under the standard normal distribution.

Note that in the left tail $-\mathrm{ES} < -\mathrm{VaR} < e$. Also, $\mathrm{VaR}_{.01}(R) \approx \mathrm{ES}_{.025}(R) \approx -e_{.00145}(R)$. From the formulas above it can be seen that these risk measures do not depend on $t$: they are unconditional. If we do take the stochastic nature into account and condition on the filtration $\mathcal{F}_{t-1}$, we get the conditional risk measures for the one-day-ahead return:
\[
-\mathrm{VaR}_p^t(R) = \mu_{t+1} + \sqrt{h_{t+1}}\,z_p, \qquad
-\mathrm{ES}_p^t(R) = \mu_{t+1} + \sqrt{h_{t+1}}\,E[Z \mid Z \leq z_p], \qquad
e_\tau^t(R) = \mu_{t+1} + \sqrt{h_{t+1}}\,e_\tau(Z),
\]
where $z_p$ is the $p$-quantile of an assumed distribution of $Z$ (for instance the normal). Note that we explicitly distinguish between the quantile level $p$ and $\tau$, as $\tau$ is not a quantile. For a discussion on conditional and unconditional VaR and ES we refer to McNeil and Frey (2000).
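As a numerical check on the values quoted above, the following R sketch computes the three unconditional risk measures for $Z \sim N(0,1)$; the Expectile is solved from its first order condition $\tau\,E[(Z-e)^+] = (1-\tau)\,E[(e-Z)^+]$, and all function names are illustrative.

    # Unconditional risk measures of Figure 2.2 for Z ~ N(0,1)
    var_normal <- function(p) -qnorm(p)              # VaR_p as a positive amount
    es_normal  <- function(p) dnorm(qnorm(p)) / p    # ES_p as a positive amount

    expectile_normal <- function(tau) {
      foc <- function(e) {
        up   <- dnorm(e) - e * (1 - pnorm(e))        # E[(Z - e)^+]
        down <- dnorm(e) + e * pnorm(e)              # E[(e - Z)^+]
        tau * up - (1 - tau) * down
      }
      uniroot(foc, c(-10, 10))$root
    }

    var_normal(0.01)            # approx. 2.33
    es_normal(0.025)            # approx. 2.34
    expectile_normal(0.00145)   # approx. -2.33, so VaR_.01 ~ ES_.025 ~ -e_.00145

The conditional one-day-ahead versions then follow from the location-scale transformation $\mu_{t+1} + \sqrt{h_{t+1}}\,(\cdot)$ given above.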

Properties

Earlier, we mentioned that VaR was not coherent. Artzner et al. (1999) defined a risk measure ρ(·) to be coherent if it satisfies the following four properties:

1. Translation invariance: $\rho(Z - c) = \rho(Z) - c$, $\forall\, c \in \mathbb{R}$.
2. Subadditivity: $\rho(Z_1 + Z_2) \leq \rho(Z_1) + \rho(Z_2)$.
3. Positive homogeneity: $\rho(\lambda Z) = \lambda\rho(Z)$, $\forall\, \lambda \geq 0$.
4. Monotonicity: $P(S \leq Z) = 1 \Rightarrow \rho(S) \leq \rho(Z)$.

In addition to coherence, three further properties are often considered:

5. Law invariance: $P(Z_1 \leq c) = P(Z_2 \leq c)\ \forall\, c \in \mathbb{R} \Rightarrow \rho(Z_1) = \rho(Z_2)$.
6. Comonotonic additivity: $\rho(Z_1 + Z_2) = \rho(Z_1) + \rho(Z_2)$ for comonotonic $Z_1, Z_2$.
7. Elicitability.

The latter notion, elicitability, has drawn a lot of attention lately after Gneiting (2011), who showed that ES is not elicitable, whereas VaR and the Expectile are. In fact, the Expectile is the only elicitable coherent risk measure. A risk measure ρ(Z) is said to be elicitable if it minimizes an expected value of the scoring function S:

\[
\rho = \arg\min_{\omega \in \mathbb{R}} E[S(\omega, Z)]. \qquad (2.5)
\]
The estimator of the expected scoring function is the mean score over the realizations. The scoring functions of the VaR and the Expectile are given below²:
\[
S_{\mathrm{VaR}}(\omega, z, p) = \left(\mathbb{1}_{\{\omega > z\}} - p\right)\cdot(\omega - z),
\]
\[
S_{\mathrm{Expectile}}(\omega, z, \tau) = \left|\mathbb{1}_{\{\omega > z\}} - \tau\right| \cdot (\omega - z)^2.
\]

The following table provides an overview of the VaR, ES and Expectile and which properties they satisfy.

             1.-4.   5.   6.   7.
VaR                  x    x    x
ES           x       x    x
Expectiles   x       x         x

Table 2.1: Risk measures and their properties.

For a broad discussion on these risk measures’ properties, we refer to Emmer et al. (2013).

2.4 Backtests

In this section some commonly applied backtests are discussed. As the literature on VaR backtests is immense and Acerbi and Szekely (2014) proposed three backtests for ES, we elaborate on Expectile backtests; in fact, we develop two ourselves.

² This notation comes from Newey and Powell (1987). Note that when we defined the Expectile we already made use of this scoring function.

2.4.1 Kupiec’s backtest (VaR)

One of the reasons why VaR was such a popular risk measure in practice is because its backtest is very intuitive. Let us define the hit series for a fixed time interval as below:

\[
I_{t+1}(p) = \begin{cases} 1 & \text{if } r_{t,t+1} < -\mathrm{VaR}_p, \\ 0 & \text{if } r_{t,t+1} \geq -\mathrm{VaR}_p. \end{cases}
\]

Then, under $H_0$ we should have $\sum_{t=1}^{T} I_t \sim \mathrm{Bin}(T, p)$, and we can use asymptotically valid tests³ based on the t-statistics
\[
t_0 = \frac{\hat{p} - p_0}{\sqrt{p_0(1 - p_0)/T}} \qquad \text{or} \qquad
\hat{t} = \frac{\hat{p} - p_0}{\sqrt{\hat{p}(1 - \hat{p})/T}},
\qquad \text{where } \hat{p} = \frac{1}{T}\sum_{t=1}^{T} I_t.
\]
Note that because of the asymptotic approximation, $t_0$ and $\hat{t}$ follow a standard normal distribution. Then, for a given confidence level $\alpha$, we reject the hypothesis that $p_0$ represents the frequency with which the loss exceeds $\mathrm{VaR}_p$ if $|t| > z_{1-\alpha}$, where $z_{1-\alpha}$ is the $(1-\alpha)$-quantile of the standard normal distribution.

³ Normal approximations roughly hold if $T \cdot p > 5$ and $T\cdot(1-p) > 5$. If this does not hold, we can still use the Binomial distribution: one can evaluate $F_{T,p}\!\left(\sum_{t=1}^{T} I_t\right)$ and compare it with the boundaries $\frac{\alpha}{2}$ and $1 - \frac{\alpha}{2}$.

This test is known as Kupiec's (1995) test. The CRD-IV (the latest capital requirements regulation and directive) penalizes banks by requiring them to hold more capital if, in their model, $\sum_{t=1}^{T} I_t$ exceeds $p \cdot T$ by too much. The test and the possible penalties are known as the Basel Committee's traffic light system. It consists of three zones (green, orange and red), each with different penalties.
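A minimal R sketch of this unconditional coverage test, using the $t_0$ statistic above, could look as follows (the VaR forecasts are passed as positive amounts; names are illustrative):

    # Kupiec's unconditional coverage test on a return series and VaR forecasts
    kupiec_test <- function(r, var_forecast, p0 = 0.01, alpha = 0.05) {
      hits  <- as.numeric(r < -var_forecast)   # 1 if the loss exceeds the VaR
      n     <- length(hits)
      p_hat <- mean(hits)
      t0    <- (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
      list(exceedances = sum(hits), t0 = t0,
           reject = abs(t0) > qnorm(1 - alpha))
    }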

VaR can also be backtested by Christoffersen's (1998) test or the Engle and Manganelli (2004) test. Both are conditional coverage tests, as they take the time series structure into account. The backtest proposed by Christoffersen tests for independence of the hit series in a Markov chain context. This test is out of scope here, as it concerns the independence of the hits rather than the frequency of exceedance of the VaR itself. The Engle-Manganelli backtest will not be discussed in the VaR context, but will be applied to an Expectile statistic.

2.4.2 ES backtests

In 2014, Acerbi and Szekely proposed three backtests for ES as a reaction to the ongoing criticism towards the backtestability of ES. In their paper they discussed three tests, $Z_1$, $Z_2$ and $Z_3$, each having its own strengths. Acerbi and Szekely concluded that $Z_2$ would be a suitable replacement for the current backtest for VaR (defined above). The test statistic is
\[
Z_2(R) = \sum_{t=1}^{T} \frac{R_t \cdot I_t}{T \cdot p \cdot \mathrm{ES}_{p,t}} + 1,
\]

where the appropriate hypothesis is
\[
H_0: P_t^{[a]} = F_t^{[a]},
\]
\[
H_1: \mathrm{ES}_{a,t}^F \geq \mathrm{ES}_{a,t} \text{ for all } t \text{ and } > \text{ for some } t, \quad \text{and} \quad \mathrm{VaR}_{a,t}^F \geq \mathrm{VaR}_{a,t} \text{ for all } t.
\]
The testing of the hypothesis goes as follows: we simulate the distribution $P_Z$ under $H_0$ to compute the p-value $p = P_Z(Z_2(r))$ of the realization $Z_2(r)$:

- simulate independent $R_t^i \sim P_t$ for all $t$ and $i = 1, \dots, M$;
- compute $Z_2^i = Z_2(R^i)$;
- estimate $p = \sum_{i=1}^{M} \mathbb{1}_{\{Z_2^i < Z_2(r)\}} / M$.

A great advantage of this test over $Z_1$ and $Z_3$ is that it does not require storing a lot of data. In fact, testing $Z_2$ requires recording only the magnitude of $R_t \cdot I_t$ and the predicted $\mathrm{ES}_{a,t}$ per day. A disadvantage of these tests is that they require Monte Carlo simulation and are therefore computationally intensive.
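A sketch of the $Z_2$ procedure in R is given below; for brevity the predictive distribution under $H_0$ is taken to be i.i.d. standard normal, which is an assumption of this illustration rather than part of Acerbi and Szekely's test. The VaR and ES forecasts are passed as positive amounts and all names are illustrative.

    # Z2 statistic: realized losses on exceedance days relative to the ES forecast
    z2_stat <- function(r, var_f, es_f, p) {
      hits <- as.numeric(r < -var_f)
      sum(r * hits / (length(r) * p * es_f)) + 1
    }

    # Simulated p-value under H0 (illustrative: P_t taken as i.i.d. N(0,1))
    z2_backtest <- function(r, var_f, es_f, p = 0.025, M = 1000) {
      z_obs <- z2_stat(r, var_f, es_f, p)
      z_sim <- replicate(M, z2_stat(rnorm(length(r)), var_f, es_f, p))
      list(Z2 = z_obs, p_value = mean(z_sim < z_obs))
    }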

2.4.3 Expectile backtests

In this subsection we propose two new backtests. Note from the earlier notation that the Expectile depends on $\tau$ rather than $p$. The relation between $\tau$ and $p$ depends on the underlying distribution of the random variable of interest and therefore cannot be established a priori. This is not the only issue: even if we knew this relationship, each backtest would test incomparable hypotheses, leading to apples-and-oranges comparisons. Hence, we are not so much interested in how the proposed Expectile backtests perform relative to each other, but in how they perform per se.

FOC backtest

We developed the first backtest by taking the first order derivative of the Expectile's scoring function. This yields the following equation:
\[
\tau = \frac{E[(Z - e_\tau)^-]}{E[|Z - e_\tau|]}, \qquad (2.6)
\]
where $x^- = \max(-x, 0)$ denotes the negative part. Thus, if the true Expectile is $e_\tau$, Equation (2.6) holds. Hence we form the hypothesis

$H_0$: Equation (2.6) holds, with $e_\tau = \hat{e}_\tau$;
$H_1$: Equation (2.6) does not hold,

where $\hat{e}_\tau$ is the hypothesized Expectile. Equation (2.6) also enables us to provide some intuition about the Expectile; Bellini and Di Bernardino (2014) state that "the Expectile is the amount of money that should be added to a position in order to have a prespecified, sufficiently high gain-loss ratio of $\tau$". We construct the test statistic $\hat{E}$ based on the sample equivalent of the equation above, using the return data $\{r_t\}_{t=1}^T$ and the hypothesized Expectile $\hat{e}_\tau$:
\[
\hat{z}_t \equiv \frac{r_t - \hat\mu_t}{\sqrt{\hat{h}_t}}, \qquad
\hat{y}_{t,\tau} \equiv \hat{z}_t - \hat{e}_\tau, \qquad
\hat{x}_{t,\tau} \equiv \tau\,|\hat{y}_{t,\tau}| + \hat{y}_{t,\tau}\cdot\mathbb{1}_{(\hat{y}_{t,\tau} < 0)}, \qquad
\hat{E}(\tau) = \frac{1}{T}\sum_{t=1}^{T}\hat{x}_{t,\tau}. \qquad (2.7)
\]

In Figure 2.3 we illustrate the statistic $\hat{x}_{t,.00145}$ for 100 i.i.d. observations with $Z \sim N(0,1)$. Recall that under the standard normal distribution the theoretical Expectile at $\tau = .00145$ is equal to the 99% VaR, and hence we expect $\hat{y}_{t,\tau}$ to be negative once. Indeed, in Figure 2.3 we observe that $\hat{x}_{t,\tau} < 0$ (and thus $\hat{y}_{t,\tau} < 0$) occurs once. Note that when $\hat{y}_{t,\tau} < 0$, its magnitude is large; ideally its magnitude would be just as large as the 99 observations larger than zero summed together.

Figure 2.3: Realisation of $\hat{x}_{t,.00145}$.

Also, note that $\hat{E}$ can be rewritten as
\[
\hat{E}(\tau) = \tau\,\frac{1}{T}\sum_{t=1}^{T}\hat{y}_{t,\tau}\,\mathbb{1}_{(\hat{y}_{t,\tau} \geq 0)} + (1-\tau)\,\frac{1}{T}\sum_{t=1}^{T}\hat{y}_{t,\tau}\,\mathbb{1}_{(\hat{y}_{t,\tau} < 0)}.
\]
Under the null hypothesis, and assuming $\hat{x}_{t,\tau}$ is i.i.d., we have by the LLN and CLT that
\[
\sqrt{T}\,\hat{E}(\tau) \xrightarrow{d} N(0, Q),
\]
where $Q$ is consistently estimated by the sample variance $s_x^2$. Then, at a $1-\alpha$ confidence level we reject $H_0$ if $\hat{E}(\tau) < t_{\frac{\alpha}{2}, T-1}\cdot \frac{s_x}{\sqrt{T}}$ or $\hat{E}(\tau) > t_{1-\frac{\alpha}{2}, T-1}\cdot \frac{s_x}{\sqrt{T}}$. Note that $\hat{E}(\tau)$ is equal to the OLS estimate of $\lambda$ in the equation
\[
x = \iota\cdot\lambda + u,
\]
where $x$, $\iota$ and $u$ are $T \times 1$ vectors containing $\hat{x}_{t,\tau}$, ones and normally distributed error terms, respectively. Then,
\[
\hat\lambda_{OLS} = \frac{\iota'x}{\iota'\iota} = \frac{1}{T}\sum_{t=1}^{T}\hat{x}_{t,\tau}
\qquad \text{and} \qquad
\mathrm{Var}[\hat\lambda_{OLS}] = \frac{\sigma_x^2}{T},
\]
with the equivalent hypothesis $H_0: \lambda = 0$; $H_1: \lambda \neq 0$. Generally, $\tau$ is very small, leading to a strong asymmetry in the distribution of $\hat{x}_{t,\tau}$, which causes poor predictions for the Expectile. This problem may be overcome by bootstrapping.
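The FOC backtest can be summarized in a few lines of R; the sketch below takes the standardized residuals and the hypothesized Expectile as inputs and applies the t-test described above (names are illustrative):

    # FOC Expectile backtest based on Equation (2.7)
    foc_expectile_test <- function(z_hat, e_tau, tau = 0.00145, alpha = 0.05) {
      y <- z_hat - e_tau
      x <- tau * abs(y) + y * (y < 0)          # x_t as defined in (2.7)
      n <- length(x)
      E_hat  <- mean(x)
      t_stat <- sqrt(n) * E_hat / sd(x)
      crit   <- qt(1 - alpha / 2, df = n - 1)
      list(E_hat = E_hat, t = t_stat, reject = abs(t_stat) > crit)
    }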

ALS backtest

Alternatively, let $\hat\beta(\tau)$ be the parameter that solves (2.5) for the Expectile scoring function, that is, asymmetric least squares with a constant on the residual data $\hat{z}_t$. Newey and Powell (1987) showed that, with
\[
\hat{u}_{t,\tau} \equiv \hat{z}_t - \hat\beta(\tau), \qquad
\hat{w}_{t,\tau} \equiv \left|\tau - \mathbb{1}_{(\hat{u}_{t,\tau} < 0)}\right|, \qquad
\hat{W} = \frac{1}{T}\sum_{t=1}^{T}\hat{w}_{t,\tau}, \qquad
\hat{V} = \frac{1}{T}\sum_{t=1}^{T}\hat{w}_{t,\tau}^2\,\hat{u}_{t,\tau}^2,
\]
we have
\[
\sqrt{T}\left(\hat\beta(\tau) - \beta(\tau)\right) \xrightarrow{d} N\!\left(0,\, W^{-1}VW^{-1}\right).
\]
We can estimate the variance-covariance matrix consistently by its sample analogue. We then test the following hypothesis using a t-test:
\[
H_0: \beta(\tau) = \hat{e}_\tau; \qquad H_1: \beta(\tau) \neq \hat{e}_\tau,
\]
where $\hat{e}_\tau$ is the Expectile implied by the assumed distribution of $Z$.
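A sketch of the ALS backtest in R: the Expectile of the standardized residuals is estimated by minimizing the asymmetric squared loss, and the Newey-Powell sandwich variance is used for the t-test (names are illustrative).

    # ALS Expectile backtest with Newey-Powell standard errors
    als_expectile_test <- function(z_hat, e_tau, tau = 0.00145, alpha = 0.05) {
      loss <- function(b) {
        u <- z_hat - b
        sum(abs(tau - (u < 0)) * u^2)          # asymmetric least squares objective
      }
      beta_hat <- optimize(loss, range(z_hat))$minimum
      u <- z_hat - beta_hat
      w <- abs(tau - (u < 0))
      n <- length(z_hat)
      W <- mean(w)
      V <- mean(w^2 * u^2)
      se <- sqrt((V / W^2) / n)                # sqrt of W^-1 V W^-1 / n
      t_stat <- (beta_hat - e_tau) / se
      list(beta_hat = beta_hat, t = t_stat,
           reject = abs(t_stat) > qnorm(1 - alpha / 2))
    }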

Expectile F-test

So far we have only considered unconditional coverage tests for the Expectile. The Engle and Manganelli (2004) test, however, is a conditional coverage test: the statistic jointly accounts for independence of lags and correct coverage of the Expectile. It was constructed for use with the VaR, but we use an equivalent test for the Expectile. It is basically an F-test performed on a regression of $\hat{x}_{t,\tau}$ on a constant and lags of $\hat{x}_{t,\tau}$, for the hypothesis that all coefficients are zero. The regression model we use is therefore
\[
x_t = \lambda + \sum_{k=1}^{K} x_{t-k}\,\delta_k + u_t^0.
\]
The model above is basically the FOC Expectile backtest regression model extended by lags of $x_t$. As a result, we can verify whether there is serial correlation left in $x_t$. Suppose, for instance, that we cannot reject the FOC hypothesis but do reject the EM test. In that case we would have reason to believe there is serial correlation left in $x_t$, as the FOC test does not have any power against dependence. The $\hat{F}$ statistic is given by
\[
\hat{F} = \frac{(TSS - RSS)/(K+1)}{RSS/(T - 2K - 1)} \sim F_{K+1,\,T-2K-1}.
\]
Note that $\hat{F}$ follows an F distribution with $K+1$ and $T - 2K - 1$ degrees of freedom, as we have $T - K$ observations and estimate $K+1$ coefficients. TSS and RSS stand for the total and residual sum of squares, respectively.
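A sketch of this F-test in R; since the null restricts all coefficients, including the intercept, to zero, the restricted sum of squares is taken as $\sum_t x_t^2$ (an assumption of this illustration consistent with the numerator degrees of freedom $K+1$). Names are illustrative.

    # Expectile F-test: regress x_t on a constant and K of its own lags
    expectile_f_test <- function(x, K = 1, alpha = 0.05) {
      n  <- length(x)
      y  <- x[(K + 1):n]                                   # n - K usable observations
      X  <- sapply(1:K, function(k) x[(K + 1 - k):(n - k)])
      fit <- lm(y ~ X)
      rss <- sum(residuals(fit)^2)
      tss <- sum(y^2)                                      # all coefficients zero under H0
      Fst <- ((tss - rss) / (K + 1)) / (rss / (n - 2 * K - 1))
      list(F = Fst,
           p_value = pf(Fst, K + 1, n - 2 * K - 1, lower.tail = FALSE),
           reject = Fst > qf(1 - alpha, K + 1, n - 2 * K - 1))
    }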

Realized loss functions

A statistic to compare multiple models based on the Expectile, the realized loss, is illustrated by Bellini and Di Bernardino (2014) and is given below:

\[
L(e_\tau, x_t) = \begin{cases} (1-\tau)\cdot(x_t - e_\tau)^2 & \text{if } x_t - e_\tau \leq 0, \\ \tau\cdot(x_t - e_\tau)^2 & \text{if } x_t - e_\tau > 0. \end{cases}
\]
The lower the realized loss, the better the accuracy of the Expectile's forecast. It is possible to backtest based on the realized loss function, although this requires simulation, and therefore we will not perform such a test. For the interested reader, we refer to the extensive review of (VaR) backtests given by Campbell (2005).
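For completeness, a one-function R sketch of the (average) realized loss; the function name is illustrative.

    # Average realized loss for a sequence of outcomes x and an Expectile forecast e_tau
    realized_loss <- function(x, e_tau, tau) {
      mean(ifelse(x - e_tau <= 0,
                  (1 - tau) * (x - e_tau)^2,
                  tau * (x - e_tau)^2))
    }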


2.5 Bootstrapping

Because there is a strong asymmetry in $\hat{x}_{t,\tau}$, we may mistrust the correctness of standardized confidence intervals for small $T$. Therefore we may prefer to bootstrap instead, which allows for asymmetric confidence intervals. For the unconditional coverage tests we apply the resampling method.

2.5.1 Resampling method

This method is probably the most straightforward bootstrap method. From the original sample $\{\hat{x}_{1,\tau},\dots,\hat{x}_{T,\tau}\}$ we resample exactly $T$ draws with replacement. The probability of drawing a certain $\hat{x}_{t,\tau}$ is uniform, such that every observation has a probability of $\frac{1}{T}$ of being drawn. With this bootstrapped sample we calculate a t-value, simply by using the bootstrapped sample as input for the hypothesis test instead of the original sample.

If we repeat this resampling $B$ times, we end up with a distribution $\{t_b\}_{b=1}^B$. Finally, we can estimate the p-value of the hypothesis by
\[
\text{p-value} = 2\min\left( \frac{1}{B}\sum_{b=1}^{B}\mathbb{1}_{(t_b < t_{obs})},\; 1 - \frac{1}{B}\sum_{b=1}^{B}\mathbb{1}_{(t_b < t_{obs})} \right),
\]
where $t_{obs}$ is the t-value of the original sample.
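A sketch of this resampling bootstrap in R, where t_fun computes the t-statistic of a given sample (e.g. the FOC or ALS statistic); names are illustrative.

    # Two-sided bootstrap p-value based on resampling with replacement
    bootstrap_pvalue <- function(x, t_fun, B = 399) {
      t_obs <- t_fun(x)
      t_b   <- replicate(B, t_fun(sample(x, replace = TRUE)))
      frac  <- mean(t_b < t_obs)
      2 * min(frac, 1 - frac)
    }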

2.5.2 Moving Block Bootstrap

An advantage of the resampling method is that it is very intuitive and simple. A disadvantage is that the random drawing of observations destroys the stochastic structure of the original sample. For the unconditional coverage tests this is not a problem, because we are not interested in this structure. It is a problem, however, for the $\hat{F}$-test: breaking the stochastic structure of serially correlated variables would lower the significance of the lags and result in relatively low p-values.

To remedy this, we use the moving block bootstrap. Let us define $N = T - \ell + 1$ with $\ell \in \mathbb{N}^+$. We cut the sample $\{\hat{x}_{t,\tau}\}_{t=1}^T$ into $N$ overlapping blocks,
\[
B_1 = \{\hat{x}_{1,\tau},\dots,\hat{x}_{\ell,\tau}\}, \quad
B_2 = \{\hat{x}_{2,\tau},\dots,\hat{x}_{\ell+1,\tau}\}, \quad \dots, \quad
B_N = \{\hat{x}_{N,\tau},\dots,\hat{x}_{T,\tau}\},
\]
and draw each with probability $\frac{1}{N}$. In total, we draw $k$ blocks with $k = T/\ell$, such that the bootstrap sample length is close to the original sample length. If this is repeated $B$ times, one obtains a bootstrapped F distribution. The p-value is then calculated by
\[
\text{p-value} = \frac{1}{B}\sum_{b=1}^{B}\mathbb{1}_{(F_b > F_{obs})},
\]
where $F_{obs}$ is the statistic arising from the original sample. The choice of $\ell$ is crucial here: too small blocks will break the stochastic structure of the sample, while too large blocks will not be random enough. The literature suggests choosing $\ell \approx T^{\frac{1}{3}}$.
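A sketch of the moving block bootstrap in R; the default block length follows the $\ell \approx T^{1/3}$ rule of thumb and all names are illustrative.

    # Moving block bootstrap p-value for a statistic f_fun (e.g. the F statistic)
    block_bootstrap_pvalue <- function(x, f_fun, l = NULL, B = 399) {
      n <- length(x)
      if (is.null(l)) l <- ceiling(n^(1/3))      # rule of thumb: l ~ T^(1/3)
      N <- n - l + 1                             # number of overlapping blocks
      k <- ceiling(n / l)                        # blocks per bootstrap sample
      F_obs <- f_fun(x)
      F_b <- replicate(B, {
        starts <- sample(1:N, k, replace = TRUE)
        xb <- unlist(lapply(starts, function(s) x[s:(s + l - 1)]))[1:n]
        f_fun(xb)
      })
      mean(F_b > F_obs)                          # bootstrapped p-value
    }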


3 Monte Carlo study

3.1 Monte Carlo DGP

In this section we describe the default parameter settings of the data generating process. These are, unless specified differently in later sections, given in the table below. The data generating process is a GARCH(1,1) process with standard normal innovations and zero mean. For the sake of convenience we leave out an AR(p) mean process, as it only produces additional noise when estimating. The process is depicted in the Appendix, and the table below summarizes the parameter values.

α0        α1     β      r0    h0      T
0.00001   0.05   0.94   0     0.001   500

Table 3.1: GARCH(1,1) parameters.

This process resembles a typical financial time series of returns, with two years of daily market data (about 250 workdays a year). Note that we set the initial values equal to their expectation (unconditional mean and variance).

\[
r_0 = 0, \qquad h_0 = \frac{\alpha_0}{1 - \alpha_1 - \beta}.
\]
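A sketch of this DGP in R, initialised at the unconditional mean and variance as described above (names are illustrative):

    # Zero-mean GARCH(1,1) DGP with standard normal innovations (Table 3.1)
    simulate_garch11 <- function(n_obs = 500, a0 = 1e-5, a1 = 0.05, b = 0.94) {
      r <- numeric(n_obs); h <- numeric(n_obs)
      h0 <- a0 / (1 - a1 - b)                 # unconditional variance (0.001 here)
      r_prev <- 0; h_prev <- h0
      for (t in 1:n_obs) {
        h[t] <- a0 + a1 * r_prev^2 + b * h_prev
        r[t] <- sqrt(h[t]) * rnorm(1)
        r_prev <- r[t]; h_prev <- h[t]
      }
      list(r = r, h = h)
    }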

3.2 Inference Methods

In this section we investigate the research questions by applying the methodology of the previous chapters to the data. For the MCMC method, a common procedure is to choose $J = 10{,}000$ iterations. We rely on Monte Carlo simulation to strengthen our analysis and thus choose 1,000 simulations. As a result, the analysis becomes computationally intensive: we estimate an MCMC algorithm with 10,000 iterations 1,000 times!

Since we require the MCMC parameter estimates to converge to the posterior, we choose a burn-in period of 5,000. This means that we drop the first 5,000 iterations and only consider the last 5,000. For the initial values of the MCMC algorithm, we choose the ones suggested by Ardia: $(\alpha^{[0]}, \beta^{[0]}) = (0.01, 0.1, 0.7)$.

In his book, Ardia chooses priors proportional to the truncated normal distribution. To be specific, he chooses diffuse priors, which are given below:
\[
\mu_\alpha = 0, \qquad \mu_\beta = 0, \qquad \Sigma_\alpha = 10{,}000\cdot I_2 \qquad \text{and} \qquad \Sigma_\beta = 10{,}000.
\]
As a result, the prior distribution contains little information, in fact approximately none, due to the wide variance. We concur with this diffuse prior, as it does not restrict the posterior distribution to a prespecified range of potential parameter estimates.

Now that a prior is chosen, we focus on the difference in point estimates between the two inference methods. Subsequently, convergence of the parameter estimates is explored by increasing the sample length. Furthermore, we investigate the behaviour of both inference methods' estimates under an error distribution different from the standard normal. In the following chapter we apply the same analysis to the S&P 500 and the Nikkei 225.

For the analysis we have made use of MATLAB and R. More precisely, we used the R package bayesGARCH, developed by David Ardia, which consists of an MCMC algorithm for GARCH(1,1) models with Student-t or normal innovations.
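A hedged sketch of how such a run could be set up with bayesGARCH is given below; the exact argument and element names are taken from the package documentation as we recall it and may differ between versions, so this should be checked against ?bayesGARCH rather than read as the thesis code.

    library(bayesGARCH)

    # Diffuse truncated-normal priors as in Ardia (2008); argument names assumed
    fit <- bayesGARCH(r,
                      mu.alpha = c(0, 0), Sigma.alpha = 10000 * diag(2),
                      mu.beta  = 0,       Sigma.beta  = 10000,
                      control  = list(n.chain = 1, l.chain = 10000))

    # Drop the first 5,000 draws as burn-in and take posterior means as point
    # estimates (the fitted object is assumed to be a list of coda MCMC chains)
    chain <- as.matrix(fit[[1]])
    theta <- colMeans(chain[5001:nrow(chain), ])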

3.2.1 Point estimates

In this subsection we compare point estimates for the frequentist and Bayesian method. For the frequentist method we consider the ML estimate; for the Bayesian method we consider the mean of the posterior distribution. For time series, there are multiple approaches to evaluate parameter estimates. For instance, one could forecast $n$ days ahead, either statically or dynamically. Similarly, one can evaluate the fit on the data. Common statistics used to evaluate estimation error are the bias, standard deviation and root mean squared error (RMSE). For a parameter estimate $\hat\theta$, these statistics are given by
\[
\mathrm{Bias}[\hat\theta] = E[\hat\theta - \theta], \qquad
\mathrm{StDev}[\hat\theta] = \sqrt{E\big[(\hat\theta - E[\hat\theta])^2\big]}, \qquad
\mathrm{RMSE}[\hat\theta] = \sqrt{E\big[(\hat\theta - \theta)^2\big]}.
\]
Note that $\mathrm{MSE} = \mathrm{Bias}^2 + \mathrm{Var}$. The statistics are given in the table below for all parameter estimates of $\psi$ and the 1-day-ahead forecast of $h_{T+1}$.


              $\hat\alpha_0$ (·10⁻⁴)            $\hat\alpha_1$ (·10⁻²)
              ML         MCMC         ML         MCMC
Bias          0.3564     1.183        0.2485     4.546
St. Dev       0.9052     0.4662       2.433      3.419
RMSE          0.9726     1.272        2.445      5.688

              $\hat\beta$ (·10⁻¹)               $\hat{h}_{T+1}$ (·10⁻⁴)¹
              ML         MCMC         ML         MCMC
Bias          -0.4261    -1.7455      0.02325    0.02713
St. Dev       1.154      0.7191       7.056      6.875
RMSE          1.230      1.888        7.056      6.875

Table 3.2: Statistics for ML and MCMC estimates.

¹ The approximation of $\hat{h}_{T+1}$ is based on $h_T$, that is, the true (and thus unobservable) conditional variance at time $T$.

The table above shows that the MCMC estimates suffer from considerably larger biases than the ML estimates. On the other hand, we find that the MCMC estimates are more tightly centred around their (biased) means than the ML estimates. Combining both results by examining the RMSE favours the ML estimates. In fact, we observe that the MCMC estimates are biased to the extent that the posterior parameter distributions barely cover the true parameter values, if at all.

When we take a closer look at the ML estimates from the Monte Carlo simulation, we find that their distribution is skewed and fat-tailed (around the true value). Indeed, a Jarque-Bera test confirms the non-normality by rejecting (with p-values of approximately 0) that the skewness and kurtosis are equal to those of the normal distribution. As a result, inference based on asymptotic theory will lead to type I (false positive) and type II (false negative) errors. An advantage of Bayesian inference is that it does not rely on asymptotic theory but on the posterior distribution, which may have any shape, including the normal, and is therefore more flexible. It should be mentioned, though, that relying on asymptotic theory can also be avoided by bootstrapping.
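The Monte Carlo evaluation statistics above reduce to a few lines of R (a sketch; est is the vector of estimates of one parameter across simulations and theta the true value):

    # Bias, standard deviation and RMSE of a Monte Carlo set of estimates
    mc_stats <- function(est, theta) {
      c(bias  = mean(est) - theta,
        stdev = sd(est),
        rmse  = sqrt(mean((est - theta)^2)))
    }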

3.2.2 Sample length

By means of an illustration, we show how the estimates behave for different sample lengths for $(\alpha_0, \alpha_1, \beta) = (.001, .05, .9)$. We consider only one time series, because a Monte Carlo simulation would be a computational burden for large $T$. In theory it should hold that for $T \to \infty$ the frequentist and Bayesian estimates converge to the true values, since we have a diffuse prior. As a result, both estimates will be close to one another. The table below depicts the parameter estimates for multiple sample lengths.


T         α̂0 (MLE)   α̂1 (MLE)   β̂ (MLE)   E^MLE(10)   α̂0 (MCMC)   α̂1 (MCMC)   β̂ (MCMC)   E^MCMC(10)
250       .0653      .0183      .9444     11.51       .1035       .0512       .3627      16.22
500       .0052      .1678      .5815     32.64       .0902       .2045       .3796      34.27
1,000     .0067      .0346      .9298     4.91        .01027      .0683       .3877      11.07
10,000    .00091     .0566      .8991     6.92        .00118      .0621       .8804      7.08
100,000   .00097     .0508      .9005     6.31        .00101      .0516       .8980      6.32

Table 3.3: Estimates for different values of T.

where $E(l)$ is the mean absolute percentage error (MAPE) of the $l$-day-ahead forecast, defined by
\[
E(l) = \frac{1}{l}\sum_{t=T+1}^{T+l}\frac{|h_t - \hat{h}_t|}{h_t}.
\]
Even though the MAPE does not decrease monotonically as $T$ increases, we find that for each sample length the MH estimate has a larger MAPE than the ML estimate.

Additional evidence can be found in Table 6.1, which is based on the same DGP as Table 3.2 but with a sample length of 1,000 observations. Again, in terms of RMSE the ML estimates are superior to the MCMC estimates. The Monte Carlo distribution of the MCMC estimates is given in Figure 6.4. What can be seen is that even for $T = 1000$ the estimates are severely biased.

3.2.3 Innovations

In this section we demonstrate the behaviour of both estimation methods under different innovations for 100 MC simulations. In addition, we examine how the estimates behave under misspecification. We consider two error distributions: the standardized Student-t distribution with $\nu = 3$ and the standard normal distribution. The true value of $\alpha_0$ has been modified here to .01.

                  Estimated with t(·)                    Estimated with Φ(·)
DGP               α0       α1       β        ν           α0       α1       β
ν = 3    MLE      .0108    .0513    .9394    3.016       .0168    .0698    .9144
         MCMC     .0209    .0751    .9115    2.948       .0365    .0824    .8602
Φ(·)     MLE      .0134    .0523    .9378    10.00       .0129    .0490    .9376
         MCMC     .0559    .0814    .8605    89.91       .0536    .0786    .8642

Table 3.4: Parameter estimates under different error innovations and assumptions.

In the diagonal boxes we find the correctly specified models and in the off-diagonal boxes the misspecified models. If we compare the parameter estimates with the true values, we first note that the MCMC estimation method has difficulties estimating the normal-innovations DGP: it strongly overestimates $\alpha$ and underestimates $\beta$. Overall we observe that MLE has smaller biases than MCMC, especially under misspecification. Note also that the ML estimate of $\nu$ was bounded from above at $\hat\nu = 10$ by the fGarch package used in R.

3.3 Backtests

From the interim results, it has become clear that a Monte Carlo simulation over an MCMC algorithm causes computational limitations. Moreover, for the Monte Carlo DGP the Bayesian point estimates suffered from serious bias relative to MLE. Using poor estimates would result in a 'garbage in, garbage out' effect. Therefore, it has been decided not to investigate risk measures under a Bayesian GARCH framework for the Monte Carlo DGP.

The following graph depicts the return series with the corresponding negative conditional VaR (a = 0.01) for the one day ahead return based on a GARCH(1,1) model with normal innovations estimated by ML:

Figure 3.1: Returns and negative Value at Risk.

Note that for $\tau = 0.00145$ the Expectile is approximately equal to the VaR under a standard normal distribution (2.33). In this section we perform a Monte Carlo study (5,000 simulations) to examine the rejection frequency of the backtests for $\hat{p}(p)$, $\hat{F}(p)$, $\hat{E}(\tau)$, $\hat\beta(\tau)$ and $\hat{F}(\tau)$ with a significance level of 5%, for $a = 0.01$ and $\tau = 0.00145$. As in Figure 3.1, the Monte Carlo DGP is kept as input, but the sample length is set to $T = 2{,}000$ unless specified differently.

Size

For a correctly specified model, we expect the rejection frequency to equal the size $\alpha$. Thus, for a very large number of observations a correctly specified model should have minuscule estimation error, and the rejection frequencies should be close to 5%. Also, the t-statistics should converge to normally distributed variables. Figure 3.2 depicts the distribution of 5,000 simulated t-values with $T = 100{,}000$ for the backtests based on t-statistics.

Figure 3.2: Histograms of the t-statistics for all three unconditional backtests (Kupiec's test, FOC Expectile test and ALS Expectile test).

The corresponding rejection percentages are 5.34%, 5.18% and 5.2%, respectively. Hence we observe that the size distortions (rejection frequency minus $\alpha$) are very small, indicating the tests' asymptotic validity. Also, the Jarque-Bera hypothesis that the third and fourth moments match those of a normal distribution cannot be rejected.

Figure 3.3: Backtests' convergence to $\alpha$ as $T$ increases (Kupiec's, $\hat{F}(a)$, FOC, ALS and $\hat{F}(\tau)$).

What we see is that for relatively small sample sizes there is a positive size distortion; all tests except Kupiec's over-reject. Surprisingly, Kupiec's test remains rather steady when the sample size decreases. As $T$ grows, the rejection frequencies converge to $\alpha$, although they still differ a little from the size. The explanation is that the estimated GARCH(1,1) parameters are not exactly equal to the true values, causing estimation error. Figure 3.3 therefore tells us more about how the tests behave in the presence of estimation error and how they converge to $\alpha$ as $T \to \infty$. We observe that of the VaR tests, the $\hat{F}(p)$ test is closest to the significance level (rejection frequency of 5.14%) and converges to $\alpha$ fast. For the Expectile backtests it is the ALS test, with a rejection frequency of 5.88%.

In the following table we have depicted the rejection frequencies for different values of τ .

τ          $\hat{E}(\tau)$    $\hat\beta(\tau)$    $\hat{F}(\tau)$
.00045     .1308              .1094                .1142
.00145     .0700              .0588                .0694
.00245     .0520              .0462                .0590
.5         .0468              .0468                .0542

Table 3.5: Rejection frequencies for multiple τ.

What we see is that as $\tau$ gets larger, the size distortions become smaller, most probably because the number of hits ($\hat{y}_{t,\tau} < 0$) increases, such that outliers affect the mean less heavily. In other words, an increase in the number of observations in the tail causes the size distortions to decrease. As a result, we would expect the smallest size distortion when we are exactly in the middle of the distribution, that is, $\tau = 0.5$. Indeed, we find a very low size distortion, but it is also worth mentioning that for $\tau = 0.5$ the Expectile test statistics are identical.

Bootstrapped t-values

If we plot the distribution of both Expectile backtest t-statistics for $T = 2000$, we find that they are skewed to the right. A Jarque-Bera test confirms the non-normality: we find p-values of approximately zero, resulting in a rejection of its null hypothesis (normal skewness and kurtosis). The VaR backtest, on the other hand, does not fail this test. Hence, bootstrapping is worthwhile for the Expectile backtests.

For the unconditional coverage tests (the t-tests) we apply the resampling method with $B = 399$ draws. For the $\hat{F}$-test we choose a block length of $\ell = 400$, such that we include 5 blocks in one bootstrap sample. The rejection frequencies of the bootstrapped Expectile backtests are given in Table 3.6.

             $\hat{E}(\tau)$    $\hat\beta(\tau)$    $\hat{F}$
Rej. Freq.   .0416              .0460                .0384

Table 3.6: Rejection frequencies using bootstrapped t-values.

When we compare the second row of Table 3.5 with Table 3.6, we find that the bootstrapped rejection frequencies are closer to α than the ones under asymptotic theory. Even though this encourages the use of bootstrapped t-values when using small samples, we continue with inference based on asymptotic theory as bootstrapping within a Monte Carlo analysis is a computational burden.

3.3.1 Power

In the previous part we investigated the size of the tests. As we have seen, all experiments yielded results reasonably close to the size of 5% for suitable values of $T$ and $\tau$. Another interesting property of statistical tests is their power: the probability of rejecting the null hypothesis given that the alternative hypothesis holds,
\[
P(\text{reject } H_0 \mid H_1 \text{ holds}).
\]
Accordingly, high power corresponds to a small probability of a type II error: the probability of not rejecting $H_0$ while $H_1$ holds.

3.3.2 Deviations from the unconditional VaR/Expectile

The most straightforward way to examine the power of the tests is by testing the hypotheses for different values of the VaR or Expectile. For a standard normally distributed random variable, the value of the risk measures under $\tau = 0.00145$ or $a = 0.01$ is about 2.33. In that case, the rejection frequencies are about equal to the size. The figure below illustrates how fast the rejection frequencies increase when we deviate from the unconditional VaR/Expectile.

Figure 3.4: Backtests' power against misspecified risk measure amounts (Kupiec's, $\hat{F}(a)$, FOC, ALS and $\hat{F}(\tau)$).

Apparently, all tests have more power against a too low value than against a too high one: the lines to the left of 2.33 increase to one faster than the lines on the right-hand side. An explanation might be that to the right there is less data in the tail to evaluate the hypotheses, leading to less power. We saw a similar pattern when we increased $\tau$: an increase in $\tau$ (a decrease in the risk measure amount) improved the estimates because there were more observations in the tail. Similarly, if we lower the risk measure amount, power increases as there is more data to verify the statement of the hypothesis. We further note that the power of the $\hat{F}(\tau)$ test is smaller than the FOC test's power. This finding is as expected, because this deviation from the null concerns solely the coverage level. The asymmetry of Kupiec's test around 2.33 is less surprising: at an amount of 2.1 its rejection frequency is circa 0.9, whereas at 2.56 it is about 0.55.

Dynamic Misspecification

If we assume that the series is independently and identically distributed (i.i.d.), the observations are independent of each other and the variance is constant over time. Hence, we treat the Monte Carlo DGP as if it were i.i.d. and estimate the residuals z_t using a variance that is fixed over time. To be able to compare the GARCH data with the standard normal distribution, we set the long-run variance equal to 1 by choosing α₀ = 0.01, α₁ = 0.05 and β = 0.94, since α₀/(1 − α₁ − β) = 0.01/0.01 = 1. The rejection frequencies for this misspecification are given below:

             p̂(p)     F̂(p)     Ê(τ)     β̂(τ)     F̂(τ)
Rej. Freq.   .2560    .3312    .2826    .3372    .3222

Table 3.7: Rejection Frequencies assuming i.i.d.-ness.

Table 3.7 confirms that neglecting the conditional volatility structure of the GARCH(1,1) model with normal innovations leads to higher rejection frequencies than when taking this structure into account (correctly specifying the model): the rejection frequencies in the second row of Table 3.5 are considerably lower and closer to α. We find that the F̂(·) tests have only slightly more power against this misspecification than their equivalent unconditional coverage tests (Kupiec's and the FOC). This is not necessarily because the F̂(·) tests have little additional power beyond the coverage level, but because the distribution of the simulated data has excess kurtosis by definition. This excess kurtosis is theoretically implied by the conditional volatility structure. As a result, coverage levels will not be correctly specified, which gives the unconditional coverage tests power as well. Including more lags of the hit series or of x_t might, however, improve the rejection frequencies of the F̂(·) tests relative to the unconditional coverage tests.
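A stripped-down sketch of this misspecification experiment is given below: a GARCH(1,1) path is simulated with the parameters above, standardised by a single sample standard deviation, and the resulting VaR exceedance rate is computed. The helper names are illustrative and the full backtest statistics are omitted.

```python
import numpy as np

def simulate_garch11(T, alpha0, alpha1, beta, rng):
    """Simulate a GARCH(1,1) path with standard normal innovations,
    started at the long-run variance alpha0 / (1 - alpha1 - beta)."""
    r = np.empty(T)
    h = alpha0 / (1.0 - alpha1 - beta)
    for t in range(T):
        r[t] = np.sqrt(h) * rng.standard_normal()
        h = alpha0 + alpha1 * r[t] ** 2 + beta * h
    return r

rng = np.random.default_rng(0)
r = simulate_garch11(2000, alpha0=0.01, alpha1=0.05, beta=0.94, rng=rng)

# Misspecified i.i.d. treatment: standardise by one fixed sample standard
# deviation instead of the conditional one, then count VaR exceedances.
z_iid = r / r.std()
print((z_iid > 2.33).mean())   # typically above 1% because of the excess kurtosis
```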

Distributional Misspecification

In this paragraph we skip the GARCH(1,1) estimation step and draw directly from i.i.d. data, which can be seen as the residuals of the GARCH(1,1) model (but without estimation error). First, we estimate the risk measure from data drawn from a standardized t(ν)-distribution. Then, we test the backtest statistics against the hypothesized value. Of course, for ν = ∞ the distribution is correctly specified and we would expect rejection frequencies around 5%; moreover, the power should increase as ν decreases.
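Drawing the standardized t(ν) residuals is straightforward; the sketch below rescales Student-t draws to unit variance, with ν = ∞ collapsing to the standard normal. The function name is illustrative.

```python
import numpy as np

def standardized_t(nu, size, rng):
    """i.i.d. draws from a t(nu) distribution rescaled to unit variance
    (defined for nu > 2); nu = np.inf reduces to the standard normal."""
    if np.isinf(nu):
        return rng.standard_normal(size)
    return rng.standard_t(nu, size) * np.sqrt((nu - 2.0) / nu)

rng = np.random.default_rng(0)
z = standardized_t(5, 100_000, rng)
print(z.var())   # close to 1 by construction
```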

Figure 3.5: Backtests' power against misspecified t(ν)-distribution (rejection frequency plotted against ν, 0-100, for Kupiec's test, F̂(a), the FOC test, the ALS test and F̂(τ)).

Figure 3.5 shows exactly what we would expect: gains in power as soon as the model is misspecified. If we look more closely, we find that the ALS, FOC and Kupiec's tests have substantially more power against a misspecified distribution than the two F̂(·)-tests. When the residuals approximately resemble a t(ν)-distribution with ν ≤ 10, the unconditional coverage tests reject the hypothesis of a correctly specified model with a rejection frequency of one.

Another conclusion we can draw is that once we suspect the data are i.i.d., we should not use the F̂-tests: since one part of their null hypothesis is true (that is, the lags are independent), these tests have more difficulty rejecting the misspecification of the coverage level than the unconditional coverage tests do. For instance, the difference between the FOC and the F̂(τ) test represents the gain in power when the lag of x_t is discarded and we only test for the intercept (unconditional coverage). Still, the F̂-tests gain power as ν becomes smaller, because they test for the coverage level as well.

When we simulate the backtests for the correctly specified distribution (ν = ∞), we can compare how GARCH(1,1) estimation would affect the results. Table 3.8 depicts the rejection frequencies and can be compared with the second row of Table 3.5, which involves the GARCH(1,1) estimation process.

ν        p̂(p)     F̂(a)     Ê(τ)     β̂(τ)     F̂(τ)
∞        .0560    .0510    .0490    .0564    .0530

Table 3.8: Rejection Frequencies for i.i.d. data.

This table once again provides evidence that, if the model is correctly specified, the rejection frequencies are close to the size. We also observe that the rejection frequencies in Table 3.8 have smaller absolute size distortions than those in Table 3.5. From this it can be concluded that estimation error has only a marginal effect on the rejection frequencies.


Empirical study

In this Chapter we apply our methodology to two financial market datasets. More precisely, we investigate the S&P 500 and the Nikkei 225. Just as in the previous chapter, we first apply the two inference methods to the data and continue by backtesting the model. When backtesting, we investigate different innovation processes and sample lengths too.

4.1 Market data

For the empirical study we will consider two financial market time series. More precisely, we consider two stock indices: the S&P 500 and the Nikkei 225. These time series are widely used in empirical research, and therefore the results will be comparable with other findings. The data are retrieved from Bloomberg, but can easily be found on websites such as Yahoo Finance too.

4.1.1 S&P 500

The S&P 500 (daily) closing price (in dollars) will be used when applying the models to market data. The S&P (Standard & Poor's) 500 is an American stock index of 500 large companies listed either on the NASDAQ or the NYSE (both American stock markets). This time series therefore represents typical financial market behaviour and is a widely used dataset in GARCH model studies. The starting date is January 1st, 2010; the ending date is December 31st, 2014. This period consists of 1258 trading days and includes the aftermath of the credit crisis of 2008, which is relatively volatile, but also a recovery period in which a positive trend is noticeable. Similarly, the log-return process depicts turbulent periods followed by relatively tranquil periods. Figures of the S&P 500 can be found in the Appendix (Figure 6.2) as well. The table below summarizes the log-return process by some statistics. Note, though, that except for the number of observations, all values are given in basis points. The same holds for Table 4.2. Also, the careful reader will notice that the log-returns consist of one observation less due to first differencing.


                Obs.    mean    med.    var.    min     max
r_t (S&P 500)   1257    4.8     7.2     1.0     −690    463

Table 4.1: Descriptive statistics S&P 500 (in basis points).

4.1.2 Nikkei 225

Just as the S&P 500 is a stock index of 500 large companies listed on American stock markets, the Nikkei 225 (prices in yen) is a stock index of 225 large companies listed on the Tokyo Stock Exchange (TSE). As is the case for the S&P 500, we have collected data from the 1st of January, 2010 up to the 31st of December, 2014. However, since there were fewer trading days, we have 1238 observations.

                  Obs.    mean    med.    var.    min      max
r_t (Nikkei 225)  1237    4.0     5.7     1.9     −1120    552

Table 4.2: Descriptive statistics Nikkei 225 (in basis points).

The price and log-return processes are depicted in Figure 6.3. At first sight it looks like the log-return process of the Nikkei 225 is far more volatile than the S&P 500, mainly due to the higher spikes. The descriptive statistics confirm this belief as the unconditional variance is higher.

For both indices the mean returns are positive and the returns are skewed. The magnitude of the mean return is negligible compared to its standard deviation. The skewness of the returns, though, is substantial; this is illustrated, for instance, by the difference in magnitude between the minima and maxima.
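For completeness, the log-returns and the statistics in Tables 4.1 and 4.2 can be reproduced roughly as follows. The price series is assumed to be a pandas Series of daily closes, and the scaling by 10⁴ reflects the tables' basis-point convention (for the variance this means units of 10⁻⁴ rather than squared basis points, which is an assumption about the tables' layout).

```python
import numpy as np
import pandas as pd

def log_returns(prices: pd.Series) -> pd.Series:
    """Daily log-returns from a series of closing prices; one observation
    is lost to the first difference."""
    return np.log(prices).diff().dropna()

def describe_bp(r: pd.Series) -> pd.Series:
    """Summary statistics scaled by 1e4, mirroring Tables 4.1 and 4.2."""
    stats = pd.Series({"mean": r.mean(), "med.": r.median(), "var.": r.var(),
                       "min": r.min(), "max": r.max()})
    return 1e4 * stats

# Example (with `sp500` a hypothetical pandas Series of closing prices):
# r = log_returns(sp500)
# print(r.size, describe_bp(r))
```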

4.2 Inference methods

In the following two subsections we compare the parameter estimates from the Bayesian and frequentist inference methods. The innovation processes are assumed to be normally distributed. For the MCMC estimates we provide 95% credible intervals; for the ML estimates we provide 95% confidence intervals.


4.2.1 S&P 500

The results are given in the table below and are based on the full sample length:

             MCMC                                      MLE
φ            φ̄        φ_0.5     φ_0.025   φ_0.975     φ_MLE     SE        min       max
α0 (·10⁻⁶)   4.358    4.287     2.863     6.237       3.448     0.8287    1.824     5.072
α1           0.1495   0.1512    0.1104    0.1966      0.1218    0.0209    0.0900    0.1721
β            0.8088   0.8101    0.7612    0.8510      0.8341    0.0231    0.7888    0.8794

Table 4.3: Estimates S&P 500.

We have defined φ_q as the q-quantile of the posterior distribution. The estimates above are based on the first 500 observations. In comparison with the Monte Carlo results, we find that the point estimates are relatively close to each other and that their intervals largely overlap. Yet, their interpretation is different. For instance, for the ML estimate of α1 we would say that "if this experiment is repeated many times, in 95% of these cases α1 will be contained in the constructed interval". For the MCMC estimate we could say: "Given our observed data, there is a 95% probability that the true value of α1 lies within [0.1003, 0.2134]". Before even regarding the parsimony of the models, one could prefer one method to another based on this difference in interpretation. A discussion of the fundamental differences between confidence and credible regions is given by Vanderplas (2014) and Jaynes and Kempthorne (1976). We will not dive into this discussion because we are specifically interested in the models' relation to backtestability. Besides, if we increase the sample size we find that both estimates converge towards each other, just as we saw with the Monte Carlo DGP.

The RMSEs of the 10-day-ahead static forecasts of h_{T+l} for the MLE and MCMC estimates are 9.335 and 9.293 (both ·10⁻⁵), respectively. Hence, in contrast to the earlier results, both sets of estimates perform equally well. This result aligns with that of Hoogerheide et al. (2012), who found no significant difference between frequentist and Bayesian GARCH estimates for the S&P 500. Likewise, Ardia (2006) reports the same finding.
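A sketch of how such static (one-step-ahead, rolled forward over realized returns) variance forecasts can be produced is given below. Squared returns are used as a noisy volatility proxy for the RMSE, which is an assumption since the evaluation target is not restated here, and the initialisation of the variance recursion is a common choice rather than necessarily the one used for the reported figures.

```python
import numpy as np

def garch_variance_path(r, alpha0, alpha1, beta):
    """In-sample conditional variances h_t of a GARCH(1,1), aligned with r;
    the recursion is started at the sample variance."""
    h = np.empty(len(r))
    h[0] = np.var(r)
    for t in range(1, len(r)):
        h[t] = alpha0 + alpha1 * r[t - 1] ** 2 + beta * h[t - 1]
    return h

def static_forecast_rmse(r_in, r_out, alpha0, alpha1, beta):
    """One-step-ahead ('static') variance forecasts over the hold-out returns
    r_out, scored against squared returns as a (noisy) volatility proxy."""
    h_prev = garch_variance_path(r_in, alpha0, alpha1, beta)[-1]
    r_prev = r_in[-1]
    errs = []
    for r_next in r_out:
        h_fc = alpha0 + alpha1 * r_prev ** 2 + beta * h_prev   # forecast of h_{T+l}
        errs.append(h_fc - r_next ** 2)
        h_prev, r_prev = h_fc, r_next
    return np.sqrt(np.mean(np.square(errs)))
```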

4.2.2 Nikkei 225

Based on the full sample, we find the following results for the Nikkei 225 log-returns:


             MCMC                                      MLE
φ            φ̄        φ_0.5     φ_0.025   φ_0.975     φ_MLE     SE        min       max
α0 (·10⁻⁵)   1.195    1.145     0.6546    1.978       0.823     0.2837    0.2670    1.379
α1           0.1248   0.1235    0.0866    0.1702      0.1112    0.0199    0.0722    0.1502
β            0.8160   0.8186    0.7494    0.8672      0.8482    0.0271    0.7951    0.9013

Table 4.4: Estimates Nikkei 225.

Just as with the S&P 500, we find that α̂0 and α̂1 are larger for the Bayesian method, whereas β̂ is larger for the frequentist method. Also, all coefficients are found to be significant at a 5% significance level.

In the previous section we observed that the Nikkei 225 has larger spikes than the S&P 500. For the GARCH(1,1) model, it is theoretically implied that a high α1 leads to large spikes in the time series. The results show that α1 is somewhat larger for the Nikkei 225 than for the S&P 500, which aligns with what we observe. It also means that yesterday's unexpected shocks have a relatively larger effect on the volatility of the Nikkei 225 than on the volatility of the S&P 500. Another finding is that α1 + β is approximately equal for the S&P 500 and the Nikkei 225, which implies that the half-life of a shock to the S&P 500 is about as long as that of a shock to the Nikkei 225.
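For reference, the half-life follows directly from the persistence α₁ + β; plugging in the ML point estimates from Tables 4.3 and 4.4 gives a back-of-the-envelope calculation (these figures are not reported in the tables themselves):

```latex
\text{half-life} = \frac{\ln(1/2)}{\ln(\alpha_1 + \beta)},\qquad
\frac{\ln 0.5}{\ln(0.1218 + 0.8341)} \approx 15 \text{ days (S\&P 500)},\qquad
\frac{\ln 0.5}{\ln(0.1112 + 0.8482)} \approx 17 \text{ days (Nikkei 225)}.
```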

4.3 Backtests

The estimates in the previous section allow us to estimate the standardized residuals, which we backtest in this section. The backtests we apply here are the same as in Section 3.3. Correspondingly, conclusions are drawn based on a size of 5%.

4.3.1 S&P 500

Table 4.5 contains the p-values of the backtests. We distinguish between estimation methods, innovation processes and sample sizes. For instance, the p-values in the first row correspond to the ML estimation method with normal innovations for the first 500 observations. In the second row we investigate the effect of changing the MLE to MCMC estimates, so we can identify the influence of the estimation method while keeping other factors constant. For the third and fourth rows the estimated innovation process is changed from a normal to a t(ν) distribution. Accordingly, a change in distribution changes the value of the risk measure. Bear in mind that different estimation methods can lead to similar p-values when the estimated conditional variances differ only slightly. The same goes for distributions: when ν̂ is large, normal and t(ν̂) innovations are approximately identical.
