Sensitivity analysis of value-at-risk


Economics and Business, University of Amsterdam Master thesis Financial Econometrics

SENSITIVITY ANALYSIS OF VALUE-AT-RISK

Ramon de Punder (10631712)

Prof. dr. C.G.H. Diks (supervisor)


Statement of Originality

This is to certify that the content of this thesis is the product of my own work and that all sources have been acknowledged.


1 Introduction

Since the Basel Committee on Banking Supervision (1996), financial institutions have been allowed to employ internal forecast models to calculate their required regulatory capital. For this purpose, the Value-at-Risk (VaR) for a horizon of ten trading days (h = 10) and a confidence level of α = 0.99 has become the industry-standard risk measure. Conceptually, the (conditional) VaR is nothing but a quantile of the (conditional) density of the aggregated portfolio returns, h trading days ahead. Ever-increasing computational power and memory enable risk managers to use highly complex forecast models that incorporate very detailed information on the asset returns at hand. In their recent empirical analysis, Kole, Markwat, Opschoor, and Van Dijk (2017) investigate the impact of the degree of complexity and information in constructing VaR forecasts. In particular, they compare models based on different levels of temporal and portfolio aggregation∗, in which a lower level of aggregation improves the precision and efficiency of VaR forecasts due to the larger number of available observations and the inclusion of the cross-correlation structure between assets over time.

∗ Specifically, Kole, Markwat, Opschoor, and Van Dijk (2017) consider daily, weekly and biweekly return data as three different levels of temporal aggregation and accomplish portfolio aggregation by grouping assets in different asset classes or a single portfolio (full portfolio aggregation).

Given a specific choice of aggregation, a wide range of (multivariate) time series models is nowadays available to replicate the characteristics of financial returns. Specifically, regarding the first two moments of the conditional distribution of the returns, the financial econometrics literature has established that the conditional variance is often persistent over time whereas the linear dependence of the returns over time is weak, if it exists at all. The three most popular univariate models to capture the persistence in variance are the (Generalised) Autoregressive Conditional Heteroskedasticity (GARCH) model proposed by Engle (1982) and Bollerslev (1986), the GJR model proposed by Glosten, Jagannathan, and Runkle (1993) and the Exponential GARCH model proposed by Nelson (1991). The GJR-GARCH and EGARCH models can also be used to replicate asymmetry, which denotes the different impacts on the conditional variance of positive and negative shocks of equal magnitude. In the class of multivariate conditional variance models, research over the last two decades has centered around the variance-correlation decomposition of the conditional covariance matrix of the assets. Models like the Dynamic Conditional Correlation (DCC) model proposed by Engle (2002) and the GOGARCH model proposed by Van der Weide (2002) select univariate GARCH models for the variance part and a parsimonious model for the correlation part to overcome challenges in terms of the feasibility of parameter estimation and the tractability of the positive-definiteness constraints for the conditional covariance matrix. A survey of multivariate GARCH models is provided by Bauwens, Laurent, and Rombouts (2006) and Silvennoinen and Teräsvirta (2009).

Although the (multivariate) normal distribution is often used in practice, distributions with fatter tails than the normal distribution may provide a better fit to the standardised residuals of financial time series. Additionally, with respect to multivariate models, the implicitly imposed assumption by the multivariate normal distribution that the variance-covariance structure of the returns is fully represented by the first two conditional moments, may be restrictive in practice. The (multivariate) (skewed) Student’s t-distribution is therefore a promising alternative (see for example Braione and Scholtes, 2016) whereas even more flexible approaches are often based on copulas or regime switching methods, or both.

Another approach of increasing interest in modeling the conditional distribution of the returns is based on bootstrap procedures. Contrary to the traditional parametric models mentioned so far, forecast densities based on bootstrap procedures do not require any distributional assumption. Among others, Thombs and Schucany (1990) and Kim (2001) introduce bootstrap procedures based on the backward representation of an AR(p) series excluding multivariate models with moving average (MA) components or GARCH-disturbances. Moreover, the asymptotic validity of the bootstrap-after-bootstrap procedure proposed by Kim (2001) still relies on Gaussian innovations. More recently, forward bootstrap procedures for univariate ARMA and GARCH models proposed by Pascual, Romo, and Ruiz (2004) and Pascual, Romo, and Ruiz (2006), respectively, have been extended by Trucíos and Hotta (2016) to incorporate asymmetry and Fresoli, Ruiz, and Pascual (2015) to multivariate models. Another advantage of the bootstrap procedures is that they allow for parameter uncertainty in the construction of forecast densities.

Several authors have previously used one-step VaR forecasts to compare the predictive accuracy of competing forecast models. To meet the Basel Committee on Banking Supervision (1996) requirements, it is also sufficient to construct one-step ahead forecasts. Indeed, the Basel Committee on Banking Supervision (1996) explicitly advises scaling one-step ahead forecasts to the ten-day VaR by the so-called square-root-of-time rule when the data are observed at a higher frequency than the forecast horizon of ten days. Theoretically, iterated forecasts, which incorporate the path dependence that results from a time series model, are more efficient when the model is correctly specified. Kole, Markwat, Opschoor, and Van Dijk (2017) consider both forecast techniques and find some empirical evidence in favor of iterated forecasts, although this evidence varies strongly across other model choices∗.

∗ As advocated by Chevillon (2007), scaled forecasts are more robust against model misspecifications than iterated forecasts.

In this thesis, we study the impact of the (i) portfolio model, (ii) level of temporal aggregation, (iii) forecast technique and (iv) forecast model on the predictive accuracy of the conditional VaR, for a fixed forecast horizon of ten days. Regarding the portfolio model, we only consider univariate modeling and projected modeling, in which the portfolio returns are modeled by a univariate model or the linear projection of a multivariate model on the portfolio returns, respectively∗. For the level of temporal aggregation we consider the same three levels as Kole, Markwat, Opschoor, and Van Dijk (2017), namely daily returns (κ = 1), weekly returns (κ = 5) and biweekly returns (κ = 10), and model these returns as if they were observable only once a day, once a week and once every two weeks, respectively. With respect to the forecast technique, we consider iterated forecasts, bootstrapped forecasts, scaled forecasts and normal forecasts. Bootstrapped forecasts are actually nothing but iterated forecasts based on a semi-parametric forecast model, whereas normal forecasts restrict the convolution to be normally distributed, but with the first two conditional moments equal to the conditional moments of iterated forecasts. The forecast model (i.e. time series model) describes the evolution of the conditional distribution of the returns over time in terms of associated conditional moments and a distributional assumption for the innovations†. Following Giacomini and White (2006), we refer to a set of model choices, which at least involves a particular choice for each class (i) to (iv), as a forecast method.

We are not just interested in the best set of choices regarding classes (i) to (iv), but also aim to identify the sensitivity of these choices with respect to each other. For example, we are interested in whether the difference in performance between projected and univariate modeling is magnified by simplifications in forecast techniques (e.g. by using scaled forecasts rather than iterated forecasts) or temporal aggregation (e.g. by using κ = 10 instead of κ = 1). Similarly, we also investigate whether a particular class is dominant with respect to the predictive accuracy, for example, whether forecast methods that involve iterated forecasts outperform all other forecast methods, irrespective of the choices for each of the other classes (i), (ii) and (iv). We also implement a Monte Carlo analysis to investigate whether the impact of the choices from the different classes (i) to (iv), and how they are related to each other, is robust against different circumstances in terms of the correlation structure between assets and the persistence in variance.

∗ Diks and Fang (2016) also evaluate VaR forecasts based on multivariate density forecasts and show that the use of high-dimensional information could be misleading.

† The forecast model is used to construct the empirical distribution of the standardised residuals when

To compare all resulting forecast methods, we first apply the traffic light system proposed by the Basel Committee on Banking Supervision (1996) and a set of widely used backtesting procedures proposed by Christoffersen (1998), Kupiec (1995) and Engle and Manganelli (2004). Second, we implement a pairwise statistical comparison within the framework of Giacomini and White (2006) based on a Diebold and Mariano (1995) test and the asymmetric tick loss function proposed by Giacomini and Komunjer (2005). Third, we apply the Model Confidence Set (MCS) proposed by Hansen, Lunde, and Nason (2011) to determine differences between the different classes, all together.

Our research is most closely related to the empirical analysis of Kole, Markwat, Opschoor, and Van Dijk (2017), who compare 255 forecast methods and focus on the impact of temporal aggregation on the predictive accuracy of the VaR. We contribute to their work and other related papers by allowing for different and more complex models for the evolution of the conditional (co-)variance (matrix) and by including a bootstrap procedure and the normal forecast technique. Besides these differences in the forecast methods that are compared, we apply the forecast methods to different data in our empirical analysis and additionally perform a Monte Carlo experiment.

The remainder of this thesis is structured as follows. Section 2 describes the framework including the definitions of different forecast models and techniques and different testing procedures. Section 3 provides an empirical application based on the univariate series of the individual assets of the portfolio. Section 4 builds on the concepts and results discussed in Section 3 and provides a detailed comparison of the different forecast methods. Section 5 provides a Monte Carlo experiment and Section 6 concludes.


2 Framework

This section provides the theoretical background with respect to the particular choice aspects discussed in Section 1, which build up to the forecast methods as referred to by Giacomini and White (2006). We motivate the selection of the forecast models that are used in the subsequent sections and include estimation procedures and several conditional moments and VaR forecasts in explicit form accordingly. The last two subsections provide a set of performance evaluation tests based on backtesting procedures and methods that rely on Diebold and Mariano (1995) tests.

2.1 General setup and notation

Consider assets $A \equiv \{A_1, \ldots, A_M\}$ with prices $\{P_{1,t}, \ldots, P_{M,t}\}$ observable at times $t \in \tilde{T}^{\kappa} \equiv \{0, \kappa, 2\kappa, \ldots, T\}$, where $\kappa \in \mathbb{N}$. The corresponding $\kappa$-period log returns $y^{\kappa}_{i,t} \equiv \log(P_{i,t}/P_{i,t-\kappa})$, for $i = 1, \ldots, M$, are defined on a probability space $(\Omega^{\kappa}, \mathcal{F}^{\kappa}, \mathbb{P})$ and adapted to the filtration $\{\mathcal{F}^{\kappa}_{t}\}_{t \in T^{\kappa}}$, where $T^{\kappa} \equiv \tilde{T}^{\kappa} \setminus \{0\}$. Suppose that for each asset a series of returns $\{y^{\kappa}_{i,t}\}_{\kappa = K^{-}}^{K^{+}}$ exists, so that results based on a particular level of temporal aggregation $\kappa^{*}$ can be compared with results based on lower levels of temporal aggregation $\kappa \in [K^{-}, \kappa^{*})$, as if these lower levels of temporal aggregation were not available (or too expensive) to a risk manager in practice for a particular asset and the full length of the sample.

Some care is needed when introducing the $\tau$-step ahead forecast densities conditional on the information set $\mathcal{F}^{\kappa}_{T}$, since we choose the step size to be equal to $\kappa$ (e.g. instead of one day). The associated random variables are denoted by $\tilde{y}^{\kappa,j}_{i,T}[\tau]$, where $j \in \{\mathrm{It}, \mathrm{Sc}, \mathrm{B}, \mathrm{N}\}$ is used to distinguish between the forecast techniques iterated (It), scaled (Sc), bootstrapped (B) and normal (N), which are based on different assumptions with respect to the conditional distribution of $\tilde{y}^{\kappa,j}_{i,T}[\tau]$. Consequently, the random variables corresponding to the $h$-period ahead aggregated returns are given by the sum of $\tilde{y}^{\kappa,j}_{i,T}[\tau]$ for $\tau = 1, 2, \ldots, h/\kappa$, whose density is the convolution of the corresponding forecast densities, for all forecast techniques. In other words,

$$\hat{y}^{\kappa,j}_{i,T}(h) = \sum_{\tau=1}^{h/\kappa} \tilde{y}^{\kappa,j}_{i,T}[\tau],$$

where $h$ is assumed to be a multiple of $\kappa$. The motivation for this notation is that the use of $\tilde{y}^{\kappa,j}_{i,T}[\tau]$ in constructing $\hat{y}^{\kappa,j}_{i,T}(h)$ reveals important differences in the construction of $\hat{y}^{\kappa,j}_{i,T}(h)$ for different levels of temporal aggregation (e.g. $\hat{y}^{\kappa,j}_{i,T}(h)$ is a simple one-step ahead forecast for $\kappa = 10$), albeit perhaps a bit confusing at first glance∗.
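The relation between the levels of temporal aggregation can be illustrated with a minimal sketch. It assumes a hypothetical series of simulated daily log returns and a helper `aggregate_log_returns` that is not part of the thesis; it only shows that, for h = 10, the h-period return equals the sum of the last h/κ non-overlapping κ-period log returns for κ = 1, 5 and 10.

```python
import numpy as np

def aggregate_log_returns(daily_returns, kappa):
    """Sum non-overlapping blocks of `kappa` daily log returns."""
    r = np.asarray(daily_returns, dtype=float)
    n_blocks = len(r) // kappa
    # Drop the oldest observations that do not fill a complete block.
    trimmed = r[len(r) - n_blocks * kappa:]
    return trimmed.reshape(n_blocks, kappa).sum(axis=1)

rng = np.random.default_rng(0)
daily = rng.normal(0.0, 0.01, size=20)   # hypothetical daily log returns
h = 10
last_h_days = daily[-h:].sum()

for kappa in (1, 5, 10):
    y_kappa = aggregate_log_returns(daily, kappa)
    n_steps = h // kappa                 # number of kappa-period steps in the horizon
    # The h-period return is the sum of the last h/kappa kappa-period returns.
    assert np.isclose(y_kappa[-n_steps:].sum(), last_h_days)
```

Log returns are additive over time, which is why the block sums reproduce the ten-day return exactly; the forecast densities differ across κ only because the models are fitted to differently aggregated series.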

∗ Because different random variables like $\tilde{y}^{5,j}_{P,T}[2]$ and $\tilde{y}^{1,j}$

The individual asset returns $y^{\kappa}_{t} \in \mathbb{R}^{M}$ contribute to a portfolio $P$ with weights $w \in \mathbb{R}^{M}$, satisfying $\sum_{i=1}^{M} w_i = 1$. Because the $\tau$-step (and $h$-period) ahead forecasts of the (aggregated) portfolio returns depend on the portfolio model and forecast technique, we replace the index $i$ by $P$ and $P_M$ to denote univariate and projected modeling, respectively. Consequently,

$$\hat{y}^{\kappa,j}_{P,T}(h) = \sum_{\tau=1}^{h/\kappa} \tilde{y}^{\kappa,j}_{P,T}[\tau], \qquad \hat{y}^{\kappa,j}_{P_M,T}(h) = \sum_{\tau=1}^{h/\kappa} w' \tilde{y}^{\kappa,j}_{T}[\tau],$$

where $j \in \{\mathrm{It}, \mathrm{Sc}, \mathrm{B}, \mathrm{N}\}$, denote the aggregated portfolio returns for univariate and projected modeling, respectively∗.

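As a final step, we use the density of $\hat{y}^{\kappa,j}_{p,T}(h)$, with $p \in \{P, P_M\}$, to construct the conditional $h$-period ahead VaR, denoted by $\mathrm{VaR}^{\kappa,h,j}_{\alpha,p,T}$. The latter measures the severity of the risk of holding portfolio $P$ over the time period $[T, T+h]$. Defining $L^{\kappa,h,j}_{p,T}$ as the portfolio loss over the time period $[T, T+h]$ conditional on $\mathcal{F}^{\kappa}_{T}$, such that $L^{\kappa,h,j}_{p,T} = -\hat{y}^{\kappa,j}_{p,T}(h)$, leads to the definition

$$\mathrm{VaR}^{\kappa,h,j}_{\alpha,p,T} \equiv \inf\left\{\ell \in \mathbb{R} : \mathbb{P}\left[L^{\kappa,h,j}_{p,T} > \ell \,\middle|\, \mathcal{F}^{\kappa}_{T}\right] \le 1 - \alpha\right\},$$

where $\alpha \in (0, 1)$. In other words, $\mathrm{VaR}^{\kappa,h,j}_{\alpha,p,T}$ is the maximum portfolio loss which is not exceeded in the time period $[T, T+h]$ at a given confidence level $\alpha$. In line with the Basel Committee on Banking Supervision (1996) we take $h = 10$ and $\alpha = 0.99$, but also include $\alpha = 0.95$. The definition in terms of the conditional loss distribution is intuitive and often used in the actuarial sciences literature (e.g. McNeil, Frey, and Embrechts, 2015). Analytically, however, it is more convenient to work with the negative of $\mathrm{VaR}^{\kappa,h,j}_{\alpha,p,T}$,

$$-\mathrm{VaR}^{\kappa,h,j}_{\alpha,p,T} = \sup\left\{\ell \in \mathbb{R} : \mathbb{P}\left[-L^{\kappa,h,j}_{p,T} \le \ell \,\middle|\, \mathcal{F}^{\kappa}_{T}\right] \le 1 - \alpha\right\}, \tag{2.1}$$

since it is just the $(1 - \alpha)$ quantile of $\hat{y}^{\kappa,j}_{p,T}(h)$, so that the VaR itself is the negative $(1 - \alpha)$ quantile.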
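As a minimal sketch of this quantile characterisation, the VaR can be computed from a sample of aggregated returns as the negative empirical (1 − α) quantile. The function name `var_from_draws` and the simulated normal ten-day returns are assumptions made for illustration only; they are not taken from the thesis.

```python
import numpy as np

def var_from_draws(aggregated_returns, alpha):
    """VaR at confidence level alpha: negative (1 - alpha) quantile of the returns."""
    draws = np.asarray(aggregated_returns, dtype=float)
    return -np.quantile(draws, 1.0 - alpha)

rng = np.random.default_rng(1)
draws = rng.normal(0.0, 0.03, size=200_000)   # hypothetical 10-day log returns
var_99 = var_from_draws(draws, 0.99)
var_95 = var_from_draws(draws, 0.95)
```

Because losses are the negative of returns, a higher confidence level moves further into the left tail of the return distribution and therefore yields a larger VaR.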

∗ Because returns refers to log returns $\log(P_{i,t}/P_{i,t-\kappa})$ rather than simple returns $(P_{i,t} - P_{i,t-\kappa})/P_{i,t-\kappa}$ in this thesis, the linear combination $y^{\kappa}_{P,t} = w' y^{\kappa}_{t}$ is actually an approximation for the portfolio return which is however

Based on these definitions, the effect of the level of temporal aggregation is expected to be two-fold. On the one hand, smaller values of $\kappa$ reduce parameter uncertainty; on the other hand, they yield a forecast density that is built from more $\tau$-step ahead forecasts. To clarify the latter effect, notice that for $\kappa = 10$, $\mathrm{VaR}^{\kappa,h,j}_{\alpha,p,T}$ is just the negative $(1 - \alpha)$ quantile of a one-step ahead forecast density, while for $\kappa = 5$ it is based on the convolution of two dependent forecast densities, for which a closed form solution may not exist. However, for even smaller values of $\kappa$ (e.g. $\kappa = 1$) the shape of the distribution of the sum of these random variables is likely to become closer to the shape of the normal distribution as a result of a Central Limit Theorem.

2.2 Univariate parametric models

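Univariate time series models are needed for both univariate and projected modeling. Indeed, as will be shown in Section 2.4, the multivariate dynamics of the asset returns partially rely on their univariate dynamics. In this section, we discuss a selection of parametric models for which the conditional joint density of $(y^{\kappa}_{m+\kappa}, \ldots, y^{\kappa}_{T})$ can be written as

$$\varphi(y^{\kappa}_{m+\kappa}, \ldots, y^{\kappa}_{T} \mid \mathcal{F}^{\kappa}_{m}; \vartheta^{\kappa}) = \prod_{t=m+\kappa}^{T} \varphi(y^{\kappa}_{t} \mid \mathcal{F}^{\kappa}_{t-\kappa}; \vartheta^{\kappa}), \tag{2.2}$$

conditional on $\mathcal{F}^{\kappa}_{m}$, the information set at time $t = m$, including $\left\{y^{\kappa}_{\kappa t}, a^{\kappa}_{\kappa t}, (\sigma^{\kappa}_{\kappa(t+1)})^{2}\right\}_{t=1}^{m}$. Regarding the conditional densities $\varphi(y^{\kappa}_{t} \mid \mathcal{F}^{\kappa}_{t-\kappa}; \vartheta^{\kappa})$, it is convenient to extract the conditional mean and variance by standardising $y^{\kappa}_{t}$ in the following way:

$$y^{\kappa}_{t} = \mu^{\kappa}_{t} + a^{\kappa}_{t}, \qquad a^{\kappa}_{t} = \sigma^{\kappa}_{t} \varepsilon^{\kappa}_{t}, \qquad \varepsilon^{\kappa}_{t} \overset{\text{i.i.d.}}{\sim} \psi(z; \eta^{\kappa}). \tag{2.3}$$

The standardisation established in Eq. (2.3) enables us to discuss models for the conditional mean, the conditional variance and the distributional assumption on the standardised innovations $\varepsilon^{\kappa}_{t}$ separately.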

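Regarding the conditional mean $\mu^{\kappa}_{t} \equiv E[y^{\kappa}_{t} \mid \mathcal{F}^{\kappa}_{t-\kappa}]$, an ARMA(p,q) model,

$$\mu^{\kappa}_{t} = \phi^{\kappa}_{0} + \sum_{j=1}^{p} \phi^{\kappa}_{j} y^{\kappa}_{t-j\kappa} + \sum_{j=1}^{q} \theta^{\kappa}_{j} a^{\kappa}_{t-j\kappa}, \tag{2.4}$$

is selected to model the correlation in the asset returns, which is however expected to be small, if it exists at all. In general, an ARMA(p,q) model is stationary if all roots of the corresponding AR polynomial $\phi(z) = 1 - \sum_{j=1}^{p} \phi_j z^j$ lie outside the unit circle and invertible if all roots of the corresponding MA polynomial $\theta(z) = 1 + \sum_{j=1}^{q} \theta_j z^j$ lie outside the unit circle, where the plus sign in $\theta(z)$ follows from the sign convention of Eq. (2.4).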

Although linear dependencies may be weak, financial return data often exhibit clustered periods of high volatility, and negative shocks usually have a higher impact on tomorrow's volatility than positive shocks. A possible characterisation of the conditional variance $(\sigma^{\kappa}_{t})^{2}$ that exploits (some of) these non-linear dependencies is

$$(\sigma^{\kappa}_{t})^{2} = \omega^{\kappa} + \left(\alpha^{\kappa} + \gamma^{\kappa} I_{\{a^{\kappa}_{t-\kappa} < 0\}}\right) (a^{\kappa}_{t-\kappa})^{2} + \beta^{\kappa} (\sigma^{\kappa}_{t-\kappa})^{2}, \tag{2.5}$$

where $\omega^{\kappa} > 0$, $\alpha^{\kappa} \ge 0$, $\gamma^{\kappa} \ge 0$, $\beta^{\kappa} \ge 0$ and $a^{\kappa}_{t} \equiv \sigma^{\kappa}_{t} \varepsilon^{\kappa}_{t}$. For $\varepsilon^{\kappa}_{t} \sim N(0, 1)$, Eq. (2.5) specifies the GJR-GARCH(1,1) model proposed by Glosten, Jagannathan, and Runkle (1993). The model incorporates the leverage effect for $\gamma^{\kappa} > 0$ and simplifies to the GARCH(1,1) model, pioneered by Bollerslev (1986), for $\gamma^{\kappa} = 0$. In particular, we refer to RiskMetrics if in addition $\omega^{\kappa} = 0$, $\alpha^{\kappa} = 1 - \beta^{\kappa}$ and $\mu_{t} = 0$.
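The recursion of Eq. (2.5) can be sketched in a few lines. The function name, the initial variance and the parameter values used in the comparison are hypothetical choices for illustration; setting `gamma = 0` gives the GARCH(1,1) recursion.

```python
import numpy as np

def gjr_garch_variance(a, omega, alpha, gamma, beta, sigma2_init):
    """Filter conditional variances for the shocks `a` with the recursion of Eq. (2.5).

    Returns an array of length len(a) + 1 whose first entry is sigma2_init.
    """
    a = np.asarray(a, dtype=float)
    sigma2 = np.empty(len(a) + 1)
    sigma2[0] = sigma2_init
    for t in range(len(a)):
        # The leverage term is only active after a negative shock.
        leverage = gamma if a[t] < 0 else 0.0
        sigma2[t + 1] = omega + (alpha + leverage) * a[t] ** 2 + beta * sigma2[t]
    return sigma2

# A negative shock raises next-period variance by more than a positive shock of equal size.
after_negative = gjr_garch_variance([-0.1], 0.01, 0.05, 0.10, 0.90, 0.04)[1]
after_positive = gjr_garch_variance([+0.1], 0.01, 0.05, 0.10, 0.90, 0.04)[1]
```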

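Another possible specification of the dynamics of the conditional variance is

$$\log (\sigma^{\kappa}_{t})^{2} = (1 - \beta^{\kappa}) \omega^{\kappa} + \alpha^{\kappa} \varepsilon^{\kappa}_{t-\kappa} + \gamma^{\kappa} \left(|\varepsilon^{\kappa}_{t-\kappa}| - E|\varepsilon^{\kappa}_{t-\kappa}|\right) + \beta^{\kappa} \log (\sigma^{\kappa}_{t-\kappa})^{2}, \tag{2.6}$$

which defines the EGARCH model proposed by Nelson (1991). An obvious advantage of this specification is that $(\sigma^{\kappa}_{t})^{2} > 0$ is guaranteed for all parameter values. Eq. (2.6) accommodates the asymmetric relation between the returns and volatility changes, similar to the GJR-GARCH(1,1) model. To see this, we rewrite the model as

$$(\sigma^{\kappa}_{t})^{2} = \left((\sigma^{\kappa}_{t-\kappa})^{2}\right)^{\beta^{\kappa}} \exp\left((1 - \beta^{\kappa}) \omega^{\kappa} - \gamma^{\kappa} E|\varepsilon^{\kappa}_{t-\kappa}|\right) \times \begin{cases} \exp\left((\alpha^{\kappa} - \gamma^{\kappa}) \dfrac{a^{\kappa}_{t-\kappa}}{\sigma^{\kappa}_{t-\kappa}}\right) & \text{if } a^{\kappa}_{t-\kappa} < 0, \\[2ex] \exp\left((\alpha^{\kappa} + \gamma^{\kappa}) \dfrac{a^{\kappa}_{t-\kappa}}{\sigma^{\kappa}_{t-\kappa}}\right) & \text{if } a^{\kappa}_{t-\kappa} \ge 0, \end{cases}$$

to show that $(\alpha^{\kappa} - \gamma^{\kappa})$ and $(\alpha^{\kappa} + \gamma^{\kappa})$ are the coefficients through which asymmetry in the response to negative and positive shocks is established. More precisely, if $\gamma^{\kappa} \neq 0$, the News Impact Curve (NIC) in terms of $\log (\sigma^{\kappa}_{t})^{2}$ is symmetric about the vertical axis if $\alpha^{\kappa} = 0$ and a set of two lines with different absolute slopes if $\alpha^{\kappa} \neq 0$. For $\alpha^{\kappa} < 0$ (and $\gamma^{\kappa} > 0$) the slope of the line corresponding to negative shocks $a^{\kappa}_{t}$ is larger in absolute value, which corresponds to the earlier mentioned leverage effect. So, in a Monte Carlo analysis we can increase the strength of the leverage effect by either increasing $\gamma^{\kappa}$ in Eq. (2.5) or decreasing $\alpha^{\kappa}$ in Eq. (2.6).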
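The asymmetry of the EGARCH response can be made concrete with a small sketch of one step of Eq. (2.6). It assumes standard normal innovations, for which E|ε| = √(2/π); the function names and parameter values are hypothetical.

```python
import math

E_ABS_NORMAL = math.sqrt(2.0 / math.pi)   # E|eps| for eps ~ N(0, 1)

def egarch_next_log_variance(log_sigma2, eps, omega, alpha, gamma, beta):
    """One step of Eq. (2.6), assuming standard normal innovations."""
    return ((1.0 - beta) * omega + alpha * eps
            + gamma * (abs(eps) - E_ABS_NORMAL) + beta * log_sigma2)

def news_impact(eps, omega, alpha, gamma, beta):
    """Next log-variance as a function of the shock, evaluated at log sigma^2 = omega."""
    return egarch_next_log_variance(omega, eps, omega, alpha, gamma, beta)

params = dict(omega=-9.0, alpha=-0.1, gamma=0.2, beta=0.95)   # hypothetical values
rise_after_negative = news_impact(-1.0, **params) - news_impact(0.0, **params)
rise_after_positive = news_impact(+1.0, **params) - news_impact(0.0, **params)
```

With α < 0 and γ > 0, a unit negative shock raises the log-variance by γ − α, whereas a unit positive shock raises it by only γ + α, which is the leverage effect described above.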

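It is however not immediately clear how to control for persistence in variance. Suppose that $\gamma^{\kappa} = 0$ in Eq. (2.5). Then

$$(a^{\kappa}_{t})^{2} = \omega^{\kappa} + (\alpha^{\kappa} + \beta^{\kappa}) (a^{\kappa}_{t-\kappa})^{2} + \xi^{\kappa}_{t} - \beta^{\kappa} \xi^{\kappa}_{t-\kappa}, \tag{2.7}$$

with $\xi^{\kappa}_{t} \equiv (a^{\kappa}_{t})^{2} - E\left[(a^{\kappa}_{t})^{2} \mid \mathcal{F}^{\kappa}_{t-\kappa}\right]$, which is an ARMA(1,1) model for $(a^{\kappa}_{t})^{2}$, and hence

$$\mathrm{Corr}\left((a^{\kappa}_{t})^{2}, (a^{\kappa}_{t-\kappa\ell})^{2}\right) = (\alpha^{\kappa} + \beta^{\kappa})^{\ell-1} \, \frac{\alpha^{\kappa} \left(1 - \alpha^{\kappa}\beta^{\kappa} - (\beta^{\kappa})^{2}\right)}{1 - 2\alpha^{\kappa}\beta^{\kappa} - (\beta^{\kappa})^{2}}$$

is the autocorrelation function of $(a^{\kappa}_{t})^{2}$, for $\ell \ge 1$. Therefore, increasing $\alpha^{\kappa} + \beta^{\kappa}$ yields higher persistence in the conditional variance. On the other hand, in order for the time series to be stationary, $\alpha^{\kappa} + \beta^{\kappa} < 1$ should be satisfied as well. More generally, the series $y^{\kappa}_{t}$ in Eq. (2.5) is stationary if $0 \le \alpha^{\kappa} + \tfrac{1}{2}\gamma^{\kappa} + \beta^{\kappa} < 1$, provided that the distribution of the innovations is symmetric.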
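To see how α + β governs the decay of the autocorrelations, the autocorrelation function of the squared shocks can be evaluated directly. The function name and parameter values are hypothetical, and the formula presupposes that the fourth moment of the shocks exists.

```python
def garch_acf_squared(alpha, beta, lag):
    """Autocorrelation of squared GARCH(1,1) shocks at `lag` >= 1.

    Assumes a finite fourth moment, so that the autocorrelations are well defined.
    """
    rho_1 = alpha * (1.0 - alpha * beta - beta ** 2) / (1.0 - 2.0 * alpha * beta - beta ** 2)
    return (alpha + beta) ** (lag - 1) * rho_1

# Hypothetical parameters: the autocorrelations decay geometrically at rate alpha + beta.
acf = [garch_acf_squared(0.05, 0.90, lag) for lag in range(1, 6)]
```

Each autocorrelation equals the previous one multiplied by α + β, so values of α + β close to one imply that shocks to the variance die out slowly.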

The ACF of $(a^{\kappa}_{t})^{2}$ for the EGARCH model has been derived from the results of Nelson (1991) by Rodríguez and Ruiz (2012). They show that the rate of decay in the autocorrelations of $(a^{\kappa}_{t})^{2}$ is not constant, but tends to $\beta^{\kappa}$ for large lags. The stationarity condition depends on the error distribution. For example, the EGARCH model specified by Eq. (2.6) is always stationary if $|\beta^{\kappa}| < 1$ and $\varepsilon^{\kappa}_{t} \sim N(0, 1)$ or $\varepsilon^{\kappa}_{t}$ follows a Generalized Error Distribution (GED) with $\nu^{\kappa} > 1$. For a Student's t-distribution the restrictions become implausible (see Rodríguez and Ruiz, 2012), making the GED the preferred leptokurtic error distribution.

More generally, the distributional assumption regarding the standardised innovations $\varepsilon^{\kappa}_{t}$ is the third and final component in Eq. (2.3) that needs a specification. Unless stated differently, the standard normal distribution is selected, with density

$$\psi(z; \vartheta^{\kappa}) = \frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2} z^{2}}. \tag{2.8}$$

This, however, is a rather traditional choice, since the financial econometrics literature has established that financial returns and their standardised residuals∗ exhibit a higher kurtosis than is consistent with Gaussian innovations. Therefore, promising alternatives are leptokurtic distributions like the standardised Student's t-distribution, with density

$$\psi(z; \vartheta^{\kappa}) = \frac{\Gamma\left((\nu^{\kappa} + 1)/2\right)}{\Gamma(\nu^{\kappa}/2) \sqrt{(\nu^{\kappa} - 2)\pi}} \left(1 + \frac{z^{2}}{\nu^{\kappa} - 2}\right)^{-(\nu^{\kappa} + 1)/2}, \tag{2.9}$$

where $\nu^{\kappa} > 2$ and $\Gamma(x) = \int_{0}^{\infty} y^{x-1} e^{-y} \, dy$ is the usual Gamma function. The standardised Student's t-distribution reduces to the standard normal distribution if $\nu^{\kappa} \to \infty$ and has fatter tails for all finite $\nu^{\kappa} > 2$. Another leptokurtic distribution is the GED, with density

$$\psi(z; \vartheta^{\kappa}) = \frac{\nu^{\kappa} \, e^{-\frac{1}{2} |z/\lambda|^{\nu^{\kappa}}}}{\lambda \, 2^{(1 + 1/\nu^{\kappa})} \, \Gamma(1/\nu^{\kappa})},$$

where $\nu^{\kappa} > 0$, $\Gamma(x)$ denotes the Euler gamma function and

$$\lambda = \sqrt{\frac{2^{-2/\nu^{\kappa}} \, \Gamma(1/\nu^{\kappa})}{\Gamma(3/\nu^{\kappa})}}. \tag{2.10}$$

The GED reduces to the standard normal distribution if $\nu^{\kappa} = 2$ and has fatter tails for $0 < \nu^{\kappa} < 2$.

∗ Note: It has been shown that the kurtosis of a series following a stationary GARCH model exceeds three by
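As a sanity check on the standardisation, the sketch below evaluates both leptokurtic densities on a grid and verifies numerically that they integrate to one and have unit variance. It reads λ in Eq. (2.10) as the square root of 2^(−2/ν) Γ(1/ν) / Γ(3/ν), which is the usual GED scaling and an assumption about the garbled source; function names, grid and shape parameters are illustrative.

```python
import math
import numpy as np

def std_t_pdf(z, nu):
    """Standardised Student's t density of Eq. (2.9); unit variance for nu > 2."""
    c = math.gamma((nu + 1) / 2) / (math.gamma(nu / 2) * math.sqrt((nu - 2) * math.pi))
    return c * (1.0 + z ** 2 / (nu - 2)) ** (-(nu + 1) / 2)

def ged_pdf(z, nu):
    """GED density with lambda read as a ratio of gamma functions (assumed form of Eq. (2.10))."""
    lam = math.sqrt(2.0 ** (-2.0 / nu) * math.gamma(1.0 / nu) / math.gamma(3.0 / nu))
    norm = lam * 2.0 ** (1.0 + 1.0 / nu) * math.gamma(1.0 / nu)
    return nu * np.exp(-0.5 * np.abs(z / lam) ** nu) / norm

def mass_and_variance(pdf, nu, half_width=60.0, n_points=1_200_001):
    """Riemann-sum approximation of the total mass and second moment of a density."""
    z = np.linspace(-half_width, half_width, n_points)
    dz = z[1] - z[0]
    f = pdf(z, nu)
    return float(np.sum(f) * dz), float(np.sum(z ** 2 * f) * dz)

t_mass, t_var = mass_and_variance(std_t_pdf, 8.0)
ged_mass, ged_var = mass_and_variance(ged_pdf, 1.5)
```

Unit variance is what allows the conditional variance in Eq. (2.3) to be interpreted as the variance of the shocks regardless of the chosen innovation distribution.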

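In this thesis, we consider seven combinations of choices regarding the conditional variance and the specification of the distribution of the standardised innovations $\varepsilon^{\kappa}_{t}$. In an empirical analysis, the model for the conditional mean that fits the data best is selected, whereas $\mu^{\kappa}_{t} = 0$ in a Monte Carlo analysis. A summary of the different time series models is provided in Table 1. To construct $\mathrm{VaR}^{\kappa,h,j}_{\alpha,P,T}$ based on the different models in Table 1, we recall that $\mathrm{VaR}^{\kappa,h,j}_{\alpha,P,T}$ is the negative $(1 - \alpha)$ quantile of the forecast density of the random variable $\hat{y}^{\kappa,j}_{P,T}(h) \equiv \sum_{\tau=1}^{h/\kappa} \tilde{y}^{\kappa,j}_{P,T}[\tau]$, where

$$\tilde{y}^{\kappa,\mathrm{It}}_{P,T}[\tau] \sim \tilde{\psi}\left(z; \tilde{\mu}^{\kappa}_{T}[\tau], (\tilde{\sigma}^{\kappa}_{T})^{2}[\tau], \eta^{\kappa}\right), \tag{2.11}$$

for $\tau = 1, 2, \ldots, h/\kappa$, and where $\tilde{\psi}(z; \mu^{\kappa}, (\sigma^{\kappa})^{2}, \eta^{\kappa})$ denotes the destandardised $\psi(z; \eta^{\kappa})$ of Eq. (2.3), with expectation $\mu^{\kappa}$ and variance $(\sigma^{\kappa})^{2}$. Regarding the $\tau$-step ahead forecasts for the conditional moments, the $\tau$-step ahead forecasts for the conditional mean are in general implied by Eq. (2.4). Specifically,

$$\tilde{\mu}^{\kappa}_{T}[\tau] \equiv E[y^{\kappa}_{T+\tau\kappa} \mid \mathcal{F}^{\kappa}_{T}] = \phi^{\kappa}_{0} + \sum_{j=1}^{p} \phi^{\kappa}_{j} \tilde{\mu}^{\kappa}_{T}[\tau - j] + \sum_{j=1}^{q} \theta^{\kappa}_{j} a^{\kappa}_{T}[\tau - j], \tag{2.12}$$

where $\tilde{\mu}^{\kappa}_{T}[\tau - j] = y^{\kappa}_{T+(\tau-j)\kappa}$ for $j \ge \tau$, $a^{\kappa}_{T}[\tau - j] = a^{\kappa}_{T+(\tau-j)\kappa}$ for $j \ge \tau$ and $a^{\kappa}_{T}[\tau - j] = 0$ for $j < \tau$.

For $p = 1$ and $q = 0$, the recursion in Eq. (2.12) simplifies to $\tilde{\mu}^{\kappa}_{T}[\tau] = \phi^{\kappa}_{0} \sum_{j=0}^{\tau-1} (\phi^{\kappa}_{1})^{j} + (\phi^{\kappa}_{1})^{\tau} y^{\kappa}_{T}$.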
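The AR(1) special case can be verified by comparing the recursion with its closed form. In this sketch the geometric sum starts at j = 0, which is what the recursion gives for τ = 1 (namely φ₀ + φ₁ y_T); the function names and numerical values are hypothetical.

```python
def ar1_mean_forecasts(phi0, phi1, y_T, n_steps):
    """Iterate mu[tau] = phi0 + phi1 * mu[tau - 1], starting from mu[0] = y_T."""
    forecasts, mu = [], y_T
    for _ in range(n_steps):
        mu = phi0 + phi1 * mu
        forecasts.append(mu)
    return forecasts

def ar1_mean_closed_form(phi0, phi1, y_T, tau):
    """phi0 * sum_{j=0}^{tau-1} phi1**j + phi1**tau * y_T."""
    return phi0 * sum(phi1 ** j for j in range(tau)) + phi1 ** tau * y_T

recursive = ar1_mean_forecasts(0.1, 0.5, 2.0, 3)
closed = [ar1_mean_closed_form(0.1, 0.5, 2.0, tau) for tau in (1, 2, 3)]
```

As τ grows, both expressions converge to the unconditional mean φ₀ / (1 − φ₁) when |φ₁| < 1.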

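As can be seen from Table 1, the $\tau$-step ahead forecasts for the conditional variance are model-dependent. As a first step in their derivation, we use the tower property of expectations, the previsibility of the series $(\sigma^{\kappa}_{t})^{2}$ and the independence of $\varepsilon^{\kappa}_{t}$ to show that

$$(\tilde{\sigma}^{\kappa}_{T})^{2}[\tau] \equiv E\left[(a^{\kappa}_{T+\tau\kappa})^{2} \mid \mathcal{F}^{\kappa}_{T}\right] = E\left[(\sigma^{\kappa}_{T+\tau\kappa})^{2} \, E\left[(\varepsilon^{\kappa}_{T+\tau\kappa})^{2} \mid \mathcal{F}^{\kappa}_{T+(\tau-1)\kappa}\right] \mid \mathcal{F}^{\kappa}_{T}\right] = E\left[(\sigma^{\kappa}_{T+\tau\kappa})^{2} \mid \mathcal{F}^{\kappa}_{T}\right].$$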


T able 1: Mo del selection conditional mean Index Name Conditional v ariance equation ∗ P arameter restrictions Stationarit y conditions I RiskMetrics (1 − β κ) a κ t− κ 2 + β κ σ κ t− κ 2 β κ ≥ 0 -II GAR C H -N ω κ + α κ a κ t− κ 2 + β κ σ κ t− κ 2 ω κ > 0 ; α κ, β κ ≥ 0 α κ + β κ < 1 II I GAR C H -t ω κ + α κ a κ t− κ 2 + β κ σ κ t− κ 2 ω κ > 0 ; α κ, β κ ≥ 0 ; ν κ > 2 α κ + β κ < 1 IV GJR-GAR CH-N ω κ + α κ + γ κI { a κ t− κ < 0 } a κ t− κ 2 + β κ σ κ t− κ 2 ω κ > 0 ; α κ, β κ, γ κ ≥ 0 α κ + 1γ2 κ + β κ < 1 V GJR-GAR CH-t ω κ + α κ + γ κI { a κ t− κ < 0 } a κ t− κ 2 + β κ σ κ t− κ 2 ω κ > 0 ; α κ, β κ, γ κ ≥ 0 ; ν κ > 2 α κ + 1γ2 κ + β κ < 1 VI EGAR C H -N 1 − β κ ω κ + α κε κ t− κ + γ κ |ε κ t− κ | − E |ε κ t− κ | − |β κ| < 1 + β κlog σ κ t− κ 2 VI I EGAR C H -GED 1 − β κ ω κ + α κε κ t− κ + γ κ |ε κ t− κ | − E |ε κ t− κ | ν κ > 0 |β κ| < 1 + β κlog σ κ t− κ 2 Index Uncond. v ariance ∗ τ -step ahead forecasts ∗ ∗∗ Log-lik eliho o d con tributions ∗∗ I − σ κ T+ κ 2 − 1 2 log (2 π ) − 1 2 log σ κ t 2 − ( a κ)t 2 2( σ κ t) 2 II ω κ 1 − α κ − β κ ω κ 1 − ( α κ+ β κ) τ − 1 1 − α κ− β κ + (α κ + β κ) τ − 1 σ κ T+ κ 2 − 1 2 log (2 π ) − 1 2 log σ κ t 2 − ( a κ)t 2 2( σ κ t) 2 II I ω κ 1 − α κ − β κ ω κ 1 − ( α κ+ β κ) τ − 1 1 − α κ− β κ + (α κ + β κ) τ − 1 σ κ T+ κ 2 Ψ ν κ+1 2 − Ψ ν κ ₂ − 1 2 log (ν κ − 2) π − 1log2 σ κ t− κ 2 − ν κ+1 2 log 1 + a 2 t ( ν κ − 2) σ κ t− κ 2 IV ω κ 1 − α κ − β κ − 1γ2 κ ω κ 1 − ( α κ+ β κ+ 1γ2 κ) τ − 1 1 − α κ− β κ − 1γ2 κ + (α κ + β κ + 1γ2 κ) τ − 1 σ κ T+ κ 2 − 1 2 log (2 π ) − 1 2 log σ κ t 2 − ( a κ)t 2 2( σ κ t) 2 V ω κ 1 − α κ − β κ − 1γ2 κ ω κ 1 − ( α κ+ β κ+ 1γ2 κ) τ − 1 1 − α κ− β κ − 1γ2 κ + (α κ + β κ + 1γ2 κ) τ − 1 σ κ T+ κ 2 Ψ ν κ+1 2 − Ψ ν κ ₂ − 1 2 log (ν κ − 2) π − 1log2 σ κ t− κ 2 − ν κ+1 2 log 1 + a 2 t ( ν κ − 2) σ κ t− κ 2 VI ω κ log ( ζ κ) 1 − ( β κ) τ − 1 1 − β κ + (β κ) τ − 1log σ κ T+ κ 2 − 1 2 log (2 π ) − 1 2 log σ κ t 2 − ( a κ)t 2 2 ( σ κ t) 2 VI I ω κ log ( δ κ)4 1 − ( β κ) τ − 1 1 − β κ + (β κ) τ − 1 log ˜σ 
κ T+ κ 2 − log (2) + log (ν κ) − 3Ψ2 1 κ_ν − 1Ψ2 3 κ_ν − 1log2 σ κ t 2 + σ κ t − ν κ a κ t λ κ ν κ ∗In terms of natural logarithms for mo dels VI and V II, e.g. replace σ κ t 2 b y log σ κ t 2 . ∗∗ Γ( x ) and Ψ( x ) denote the (p oly)gamma functions and λ κ, ζ κ, δ κ 1, δ κ 2, δ κ 3 and δ κ 4 are giv en in Eq. (2.10), Eq. (A.4), Eq. (A.5), Eq. (A.6), Eq. (A.7) and Eq. (A.9).


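For the GARCH(1,1) model, this result implies

$$(\tilde{\sigma}^{\kappa}_{T})^{2}[\tau] = E\left[\omega^{\kappa} + \alpha^{\kappa}(a^{\kappa}_{T+(\tau-1)\kappa})^{2} + \beta^{\kappa}(\sigma^{\kappa}_{T+(\tau-1)\kappa})^{2} \mid \mathcal{F}^{\kappa}_{T}\right] = \omega^{\kappa} + (\alpha^{\kappa} + \beta^{\kappa})(\tilde{\sigma}^{\kappa}_{T})^{2}[\tau - 1],$$

or equivalently, by recursive substitution,

$$(\tilde{\sigma}^{\kappa}_{T})^{2}[\tau] = \omega^{\kappa}\frac{1 - (\alpha^{\kappa} + \beta^{\kappa})^{\tau-1}}{1 - \alpha^{\kappa} - \beta^{\kappa}} + (\alpha^{\kappa} + \beta^{\kappa})^{\tau-1}(\sigma^{\kappa}_{T+\kappa})^{2}. \tag{2.13}$$

Because RiskMetrics is just a specific GARCH(1,1) model ($\omega^{\kappa} = 0$, $\alpha^{\kappa} = 1 - \beta^{\kappa}$), the explicit $\tau$-step ahead conditional variance forecast follows directly from the recursion underlying Eq. (2.13) and equals $(\sigma^{\kappa}_{T+\kappa})^{2}$ for all $\tau$. The derivations of the $\tau$-step ahead forecasts for the GJR-GARCH and EGARCH models given in Table 1 are deferred to Appendices A.1 and A.2, respectively.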
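The equivalence between the recursion and the closed form in Eq. (2.13) can be checked numerically. The sketch assumes hypothetical parameter values and function names; the one-step ahead variance is known at time T and serves as the starting value.

```python
def garch_variance_forecasts(omega, alpha, beta, sigma2_next, n_steps):
    """Recursion: forecast[tau] = omega + (alpha + beta) * forecast[tau - 1], forecast[1] = sigma2_next."""
    forecasts = [sigma2_next]
    for _ in range(n_steps - 1):
        forecasts.append(omega + (alpha + beta) * forecasts[-1])
    return forecasts

def garch_variance_closed_form(omega, alpha, beta, sigma2_next, tau):
    """Closed form of Eq. (2.13), valid for alpha + beta != 1."""
    p = alpha + beta
    return omega * (1.0 - p ** (tau - 1)) / (1.0 - p) + p ** (tau - 1) * sigma2_next

recursive = garch_variance_forecasts(0.02, 0.05, 0.90, 0.5, 10)
closed = [garch_variance_closed_form(0.02, 0.05, 0.90, 0.5, tau) for tau in range(1, 11)]
```

With these values the forecasts decline monotonically from 0.5 towards the unconditional variance ω / (1 − α − β) = 0.4, at the geometric rate α + β.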

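Given the conditional moments and the nuisance parameter $\eta^{\kappa}$, the sequence of density forecasts $\left\{\tilde{y}^{\kappa,\mathrm{It}}_{P,T}[\tau]\right\}_{\tau=1}^{h/\kappa}$ is completely known. For $h = \kappa$, we only need the one-step ahead density forecast (i.e. $\tau = 1$), in which case the result follows directly from Eq. (2.1):

$$-\mathrm{VaR}^{\kappa,h,\mathrm{It}}_{\alpha,P,T} = \tilde{\mu}^{\kappa}_{T}[1] + \tilde{\sigma}^{\kappa}_{T}[1] \, \Psi^{-1}(1 - \alpha; \eta^{\kappa}),$$

where $\Psi^{-1}(z; \eta^{\kappa})$ denotes the inverse of $\Psi(z; \eta^{\kappa})$, the cdf of $\psi(z; \eta^{\kappa})$. By the linearity of expectations, it follows that for $h > \kappa$,

$$\mathrm{VaR}^{\kappa,h,\mathrm{It}}_{\alpha,P,T} = -\hat{\mu}^{\kappa}_{T}(h) + \widetilde{\mathrm{VaR}}^{\kappa,h,\mathrm{It}}_{\alpha,P,T}, \tag{2.14}$$

where $\hat{\mu}^{\kappa}_{T}(h) = \sum_{j=1}^{h/\kappa} \tilde{\mu}^{\kappa}_{T}[j]$ and $\widetilde{\mathrm{VaR}}^{\kappa,h,\mathrm{It}}_{\alpha,P,T}$ is based on $\hat{a}^{\kappa,\mathrm{It}}_{P,T}(h)$ rather than $\hat{y}^{\kappa,\mathrm{It}}_{P,T}(h)$.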

When univariate modeling is selected, the forecast techniques It, Sc and B can now be easily constructed. When $h > \kappa$, iterated forecasts are constructed by Simulation Algorithm S.2.1, which is given by

Each simulation $s \in S$ consists of the following steps

1. Set $\tau = 1$ and draw $\varepsilon_{s} \overset{\text{i.i.d.}}{\sim} \psi(z; \eta^{\kappa})$.

2. Construct the one-step ahead forecast $\tilde{a}^{\kappa,\mathrm{It}}_{P,T}[\tau] = \sqrt{(\tilde{\sigma}^{\kappa}_{T})^{2}[\tau]} \cdot \varepsilon_{s}$. If $\tau < h/\kappa$, increase $\tau$ by one and repeat this step. Otherwise, proceed to Step 3.

3. Construct the draw from the distribution of $\hat{a}^{\kappa,\mathrm{It}}_{P,T}(h)$ as

$$\hat{a}^{\kappa,\mathrm{It}}_{P,T}(h) = \sum_{\tau=1}^{h/\kappa} \tilde{a}^{\kappa,\mathrm{It}}_{P,T}[\tau]. \tag{S.2.1}$$

Simulation is needed, since the conditional covariance of $\hat{y}^{\kappa,\mathrm{It}}$ [...] dependence of the random variables $\left\{\tilde{y}^{\kappa,\mathrm{It}}_{P,T}[\tau]\right\}_{\tau=1}^{h/\kappa}$ and the possible non-normal distributional assumptions on $\varepsilon^{\kappa}_{t}$. Notice that it is indeed sufficient to simulate a distribution for $\hat{a}^{\kappa,\mathrm{It}}_{P,T}(h)$ by means of Eq. (2.14).
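A minimal sketch of such a simulation for the GARCH(1,1) model with standard normal innovations is given here. It reflects one reading of Simulation Algorithm S.2.1, namely that a fresh innovation is drawn at every step and that the conditional variance is updated along each simulated path; the function name, parameter values and number of simulations are hypothetical.

```python
import numpy as np

def simulate_aggregated_shocks(omega, alpha, beta, sigma2_next, n_steps, n_sims, rng):
    """Draw n_sims realisations of the sum of n_steps consecutive GARCH(1,1) shocks."""
    sigma2 = np.full(n_sims, sigma2_next, dtype=float)   # one-step ahead variance, known at T
    total = np.zeros(n_sims)
    for _ in range(n_steps):
        eps = rng.standard_normal(n_sims)                # fresh innovation at every step
        a = np.sqrt(sigma2) * eps
        total += a
        sigma2 = omega + alpha * a ** 2 + beta * sigma2  # variance follows the simulated path
    return total

rng = np.random.default_rng(42)
draws = simulate_aggregated_shocks(0.02, 0.05, 0.90, 0.5, n_steps=10, n_sims=100_000, rng=rng)
var_99_iterated = -np.quantile(draws, 0.01)   # VaR of the aggregated shocks at alpha = 0.99
```

Since the shocks are uncorrelated, the variance of the simulated sums should be close to the sum of the ten variance forecasts from Eq. (2.13), while the tails of the simulated distribution are generally heavier than those of a normal distribution with the same variance.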

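Scaled forecasts, prescribed by the Basel Committee on Banking Supervision (1996), only use the one-step ahead forecasts for the conditional mean and variance in constructing the density of $\hat{y}^{\kappa,\mathrm{Sc}}_{P,T}(h)$. More precisely, scaled forecasts make use of the assumption that $\hat{y}^{\kappa,\mathrm{Sc}}_{P,T}(h)$ is a member of the same parametric family as $\varepsilon^{\kappa}_{t}$, with conditional moments only depending on their one-step ahead forecasts in the following way:

$$\hat{y}^{\kappa,\mathrm{Sc}}_{P,T}(h) \sim \tilde{\psi}\left(z; \frac{h}{\kappa}\tilde{\mu}^{\kappa}_{T}[1], \frac{h}{\kappa}(\tilde{\sigma}^{\kappa}_{T})^{2}[1], \eta^{\kappa}\right),$$

which implies $-\mathrm{VaR}^{\kappa,h,\mathrm{Sc}}_{\alpha,P,T} = \frac{h}{\kappa}\tilde{\mu}^{\kappa}_{T}[1] + \sqrt{\frac{h}{\kappa}(\tilde{\sigma}^{\kappa}_{T})^{2}[1]} \, \Psi^{-1}(1 - \alpha; \eta^{\kappa})$. Obviously, it follows that $\mathrm{VaR}^{\kappa,h,\mathrm{It}}_{\alpha,P,T} \equiv \mathrm{VaR}^{\kappa,h,\mathrm{Sc}}_{\alpha,P,T}$ when $h = \kappa$. Finally, with respect to normal forecasts, it is assumed that the first two conditional moments of $\hat{y}^{\kappa,\mathrm{N}}_{P,T}(h)$ are equal to the first two conditional moments of $\hat{y}^{\kappa,\mathrm{It}}_{P,T}(h)$. The distribution of these forecasts, however, is assumed to be normal.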
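For standard normal innovations, the scaled VaR reduces to the familiar square-root-of-time rule, which can be sketched as follows. The function name and the numerical inputs are hypothetical, and the normal quantile is taken from the Python standard library.

```python
import math
from statistics import NormalDist

def scaled_var(mu_1, sigma2_1, h, kappa, alpha):
    """Scaled VaR from one-step ahead moments, assuming standard normal innovations."""
    scale = h / kappa                                # number of kappa-period steps in the horizon
    quantile = NormalDist().inv_cdf(1.0 - alpha)     # (1 - alpha) quantile of N(0, 1)
    return -(scale * mu_1 + math.sqrt(scale * sigma2_1) * quantile)

# Hypothetical one-step ahead daily moments: zero mean and 1% daily volatility.
var_daily_scaled = scaled_var(0.0, 1e-4, h=10, kappa=1, alpha=0.99)
var_one_step = scaled_var(0.0, 1e-4, h=10, kappa=10, alpha=0.99)
```

With zero conditional mean, the ten-day VaR from daily data is simply √10 times the one-day VaR, and for h = κ the scaling factor equals one.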

2.3 Univariate semi-parametric models

The models and corresponding density forecasts discussed in Section 2.2 rely on the distributional assumption of the innovations and the estimated dynamics of the conditional mean and variance. Based on these assumptions, three different approaches in constructing $\mathrm{VaR}^{\kappa,h,j}_{\alpha,P,T}$ were considered, namely based on iterated, scaled and normal forecasts. In this section we introduce a specific bootstrap procedure to obtain $\mathrm{VaR}^{\kappa,h,\mathrm{B}}_{\alpha,P,T}$. This procedure relies on the empirical distribution $\hat{\psi}^{\kappa}(z)$ of the standardised residuals.

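To obtain $\hat{\psi}^{\kappa}(z)$ we first estimate $\vartheta^{\kappa}$ based on the $(T - m)/\kappa$ observations $\{y^{\kappa}_{t}\}_{t=m+1}^{T}$ by quasi-maximum likelihood∗. We denote by $\varpi^{\kappa} \subseteq \vartheta^{\kappa}$ and $\varrho^{\kappa} \subseteq \vartheta^{\kappa}$ the subsets of parameters involving the conditional mean and variance equation, respectively. The resulting $\hat{\vartheta}^{\kappa}$ implies $\{\hat{\mu}^{\kappa}_{t}\}_{t=m+1}^{T}$ and $\left\{(\hat{\sigma}^{\kappa}_{t})^{2}\right\}_{t=m+1}^{T}$, using the conditional moment dynamics specified in Eqs. (2.4) to (2.6). This yields the standardised residuals $\{\hat{\varepsilon}^{\kappa}_{t}\}_{t=m+1}^{T} = \left\{\dfrac{y^{\kappa}_{t} - \hat{\mu}^{\kappa}_{t}}{\hat{\sigma}^{\kappa}_{t}}\right\}_{t=m+1}^{T}$ and the empirical CDF

$$\hat{\Psi}^{\kappa}(z) = \frac{1}{(T - m)/\kappa} \sum_{t=m+1}^{T} I_{\{\hat{\varepsilon}^{\kappa}_{t} \le z\}}, \tag{2.15}$$

where $I_{\{A\}}$ denotes an indicator function which equals one if event $A$ occurs and is zero otherwise.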


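Notice that $\hat{\Psi}^{\kappa}(z)$ is a step function with jumps at the order statistics $\hat{\varepsilon}^{\kappa}_{[j]}$ for $j = 1, \ldots, (T - m)/\kappa$. We denote random draws with replacement from the series $\{\hat{\varepsilon}^{\kappa}_{t}\}_{t=m+1}^{T}$ by $\varepsilon^{\kappa}_{t} \overset{\text{i.i.d.}}{\sim} \hat{\psi}^{\kappa}(z)$, where $\hat{\Psi}^{\kappa}(z)$ is the empirical CDF corresponding to the series.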
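The construction of the standardised residuals, the empirical CDF of Eq. (2.15) and the draws with replacement can be sketched as follows. The helper names are hypothetical, and the fitted conditional means and variances are assumed to be supplied by a previously estimated model.

```python
import numpy as np

def standardised_residuals(y, mu_hat, sigma2_hat):
    """(y_t - mu_hat_t) / sigma_hat_t for fitted conditional means and variances."""
    y, mu_hat, sigma2_hat = (np.asarray(x, dtype=float) for x in (y, mu_hat, sigma2_hat))
    return (y - mu_hat) / np.sqrt(sigma2_hat)

def ecdf(residuals, z):
    """Empirical CDF of Eq. (2.15): fraction of residuals less than or equal to z."""
    return float(np.mean(np.asarray(residuals, dtype=float) <= z))

def resample(residuals, size, rng):
    """i.i.d. draws with replacement, i.e. draws from the empirical distribution."""
    return rng.choice(np.asarray(residuals, dtype=float), size=size, replace=True)
```

Drawing with replacement from the residuals is equivalent to sampling from the step function in Eq. (2.15), so no distributional assumption on the innovations is required.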

The bootstrap approach detailed in this section is similar to the procedure proposed by Pascual, Romo, and Ruiz (2004) for ARIMA models, which is extended by, among others, Pascual, Romo, and Ruiz (2006) and Trucíos and Hotta (2016). We generalise their procedures to construct $\tilde{a}^{\kappa}_{T}[j]$ for conditional variance Eqs. (2.5) and (2.6), denoted by ${\sigma^{\kappa}_{t}}^{2} = \mathcal{V}^{\kappa}\big({\sigma^{\kappa}_{t-\kappa}}^{2}, a^{\kappa}_{t-\kappa}\big)$. The result is Bootstrap Algorithm B.2.1, given by

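Each bootstrap-simulation replicate $(b, s) \in B \times S$ consists of the following steps

1. Set $j = m$ and let ${\varsigma^{\kappa}_{j}}^{2*} = {\sigma^{\kappa}_{m}}^{2}$ and ${c^{\kappa}_{j}}^{*} = a^{\kappa}_{m}$.

2. Draw ${\xi^{\kappa}_{j,b}}^{*} \overset{\text{i.i.d.}}{\sim} \hat{\psi}^{\kappa}(z)$.

3. Construct ${\varsigma^{\kappa}_{j+\kappa,b}}^{2*} = \mathcal{V}^{\kappa}\big({\varsigma^{\kappa}_{j,b}}^{2*}, {c^{\kappa}_{j,b}}^{*}; \hat{\varrho}^{\kappa}\big)$ and ${c^{\kappa}_{j+\kappa,b}}^{*} = \sqrt{{\varsigma^{\kappa}_{j,b}}^{2*}}\,{\xi^{\kappa}_{j,b}}^{*}$. If $j < \frac{T-m}{\kappa}$, increase $j$ by one and go back to Step 2. Otherwise, proceed to Step 4.

4. Construct $\hat{\varrho}^{\kappa*}_{b}$ by quasi-maximum likelihood, based on $\{{c^{\kappa}_{j,b}}^{*}\}_{j=m+\kappa}^{T}$.

5. Reset $j = m$ and let ${\sigma^{\kappa}_{j}}^{2*} = {\sigma^{\kappa}_{m}}^{2}$ and ${a^{\kappa}_{j}}^{*} = a^{\kappa}_{m}$.

6. Construct ${\sigma^{\kappa}_{j+\kappa,b}}^{2*} = \mathcal{V}^{\kappa}\big({\sigma^{\kappa}_{j,b}}^{2*}, {a^{\kappa}_{j,b}}^{*}; \hat{\varrho}^{\kappa*}_{b}\big)$. If $j < \frac{T-m}{\kappa}$, increase $j$ by one and repeat this step. Otherwise, proceed to Step 7.

7. Draw $\varepsilon^{\kappa}_{b,s} \overset{\text{i.i.d.}}{\sim} \hat{\psi}(z; \eta^{\kappa})$.

8. Set $\ell = 1$, draw $\varepsilon^{\kappa}_{b,s} \overset{\text{i.i.d.}}{\sim} \hat{\psi}(z; \eta^{\kappa})$ and let ${\tilde{\sigma}^{\kappa}_{T,b,s}}^{2*} = {\sigma^{\kappa}_{T,b}}^{2*}$ and ${\tilde{a}^{\kappa}_{T,b,s}}^{*} = {a^{\kappa}_{T,b}}^{*}$.

9. Construct ${\tilde{\sigma}^{\kappa}_{T+\kappa\ell,b,s}}^{2*} = \mathcal{V}^{\kappa}\big({\tilde{\sigma}^{\kappa}_{T+\kappa(\ell-1),b,s}}^{2*}, {\tilde{a}^{\kappa}_{T+\kappa(\ell-1),b,s}}^{*}; \hat{\varrho}^{\kappa*}_{b}\big)$ and ${\tilde{a}^{\kappa}_{T+\kappa\ell,b,s}}^{*} = \sqrt{{\tilde{\sigma}^{\kappa}_{T+\kappa(\ell-1),b,s}}^{2*}}\,\varepsilon^{\kappa}_{b,s}$. If $\ell < h/\kappa$, increase $\ell$ by one and repeat this step. Otherwise, proceed to Step 10.

10. If $s < S$ increase $s$ by one and go back to Step 8. Otherwise, proceed to Step 11.

11. Construct the draw from distribution $\hat{a}^{\kappa,B}_{P,T}(h)$ as

$$\hat{a}^{\kappa,B}_{P,T}(h) = \sum_{\ell=1}^{h/\kappa} {\tilde{a}^{\kappa}_{T+\kappa\ell,b,s}}^{*}. \tag{B.2.1}$$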


incorporates the variability due to parameter estimation in the construction of $\hat{y}^{\kappa,B}_{P,T}(h)$. Essentially, Bootstrap Algorithm B.2.1 estimates the distribution of the forecast densities of $\hat{y}^{\kappa,B}_{P,T}(h)$ while the different approaches discussed in Section 2.2 only resulted in a single forecast density. Although the distribution of forecast densities may be hard to visualise, in terms of $\mathrm{VaR}^{\kappa,h,B}_{\alpha,P,T}$, Bootstrap Algorithm B.2.1 estimates the distribution of $\mathrm{VaR}^{\kappa,h,B}_{\alpha,P,T}$, reflecting the distribution of the involved estimators $\hat{\varrho}^{\kappa}_{b}$. The bootstrapped distribution of $\mathrm{VaR}^{\kappa,h,B}_{\alpha,P,T}$ results from Bootstrap Algorithm B.2.1 by taking the negative $(1-\alpha)$ quantile of the simulated $\hat{a}^{\kappa,B}_{T,P,b}(h)$, a vector of length $S$, the number of simulations, with entries ${\tilde{a}^{\kappa}_{T+\kappa\ell,b,s}}^{*}$. Finally, we obtain $\mathrm{VaR}^{\kappa,h,B}_{\alpha,P,T}$ by Eq. (2.14), neglecting the parameter uncertainty in $\varpi^{\kappa}$ because the gain from including it is unlikely to offset the increase in computation time.
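As a rough illustration of the forecasting stage (Steps 7 to 11), the following Python sketch resamples standardised residuals to simulate aggregated innovations and reads off the negative $(1-\alpha)$ quantile. It is a minimal sketch under stated assumptions, not the implementation used in this thesis: the re-estimation in Steps 1 to 6 is skipped, a GARCH(1,1) update stands in for the generic variance map, each innovation is scaled by the variance of its own period with a fresh residual drawn at every step, and all names and parameter values are hypothetical.

```python
import random


def garch_step(sigma2, a, omega, alpha, beta):
    """GARCH(1,1) variance update; an assumed stand-in for the variance map."""
    return omega + alpha * a * a + beta * sigma2


def simulate_horizon_sums(resid, sigma2_T, a_T, params, steps, n_sims, rng):
    """Simulate n_sims sums of `steps` future innovations by resampling resid."""
    omega, alpha, beta = params
    sums = []
    for _ in range(n_sims):
        sigma2, a, total = sigma2_T, a_T, 0.0
        for _ in range(steps):
            sigma2 = garch_step(sigma2, a, omega, alpha, beta)
            a = sigma2 ** 0.5 * rng.choice(resid)  # draw with replacement
            total += a
        sums.append(total)
    return sums


def negative_quantile(draws, prob):
    """Negative of a simple order-statistic estimate of the prob-quantile."""
    ordered = sorted(draws)
    index = min(len(ordered) - 1, int(prob * len(ordered)))
    return -ordered[index]


# Hypothetical inputs for illustration only.
rng = random.Random(0)
residuals = [-2.1, -0.7, -0.2, 0.1, 0.4, 0.9, 1.6]
draws = simulate_horizon_sums(residuals, 1.0, 0.5, (0.05, 0.1, 0.85), 10, 2000, rng)
var_estimate = negative_quantile(draws, 0.01)
```

Repeating this for every bootstrap replicate, each with its own re-estimated variance parameters, would give one value per replicate and hence a distribution of VaR estimates rather than a single number.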

2.4 Multivariate parametric models

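Multivariate models are needed when projected modeling is selected as portfolio model. Analogous to the univariate case, the conditional joint density can be written as

$$\varphi\big(y^{\kappa}_{1,m+\kappa}, \ldots, y^{\kappa}_{M,m+\kappa}, \ldots, y^{\kappa}_{1,T}, \ldots, y^{\kappa}_{M,T} \,\big|\, \mathcal{F}^{\kappa}_{m}; \vartheta^{\kappa}\big) = \prod_{t=m+\kappa}^{T} \varphi\big(y^{\kappa}_{t} \,\big|\, \mathcal{F}^{\kappa}_{t-1}; \vartheta^{\kappa}\big), \tag{2.16}$$

conditional on $\mathcal{F}^{\kappa}_{m}$ and where

$$y^{\kappa}_{t} = \mu^{\kappa}_{t} + a^{\kappa}_{t}, \qquad a^{\kappa}_{t} = {\Sigma^{\kappa}_{t}}^{\frac{1}{2}} \varepsilon^{\kappa}_{t}, \qquad \varepsilon^{\kappa}_{t} \overset{\text{i.i.d.}}{\sim} \psi(z; \eta^{\kappa}), \tag{2.17}$$

is the multivariate extension of the standardisation given in Eq. (2.3), in which $\{\varepsilon^{\kappa}_{t}\}_{t=m+1}^{T}$ is a vector martingale difference sequence satisfying $E[\varepsilon_{t} \mid \mathcal{F}^{\kappa}_{t-\kappa}] = 0$ and $V[\varepsilon_{t} \mid \mathcal{F}^{\kappa}_{t-\kappa}] = I_{M}$.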

The multivariate time series possibly exhibit some weak autocorrelations within and between the different univariate series∗. A model that exploits these correlations is the VAR(p) model,

$$\mu^{\kappa}_{t} = \phi^{\kappa}_{0} + \sum_{j=1}^{p} \Phi^{\kappa}_{j} y^{\kappa}_{t-j\kappa}, \tag{2.18}$$

which is a stationary process if all roots of $|\Phi(z)| = 0$ lie outside the unit circle, with $\Phi(z) = I_{M} - \sum_{j=1}^{p} \Phi_{j} z^{j}$. The curse of dimensionality comes into play when similar generalisations of univariate

∗ For example, $y^{2}_{1,t}$ may be correlated with $y^{2}_{2,t-\ell}$ possibly in a different way than $y^{2}_{2,t}$ is correlated with $y^{2}_{1,t-\ell}$, for some $\ell \in \mathbb{R}$. In particular, when $\mathrm{Corr}(y^{2}_{1,t}, y^{2}_{2,t-\ell}) > 0$ for all $\ell \in \mathbb{R}$ we say that the series $y^{2}_{2,t}$ Granger causes $y^{2}$


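Table 2: Summary of different conditional variance models.

Model   Definition                                                                                                                                                                   Number of parameters
VEC     $\mathrm{vech}(\Sigma^{\kappa}_{t}) = \omega^{\kappa} + \tilde{A}^{\kappa}\,\mathrm{vech}(a^{\kappa}_{t-\kappa} {a^{\kappa}_{t-\kappa}}') + \tilde{B}^{\kappa}\,\mathrm{vech}(\Sigma^{\kappa}_{t-\kappa})$   $O(M^{4})$
DVEC    $\Sigma^{\kappa}_{t} = \Omega^{\kappa} + A^{\kappa} \odot a^{\kappa}_{t-\kappa} {a^{\kappa}_{t-\kappa}}' + B^{\kappa} \odot \Sigma^{\kappa}_{t-\kappa}$                                   $O(M^{2})$
dDVEC   $\Sigma^{\kappa}_{t} = \Omega^{\kappa} \odot I_{M} + A^{\kappa} \odot I_{M} \odot a^{\kappa}_{t-\kappa} {a^{\kappa}_{t-\kappa}}' + B^{\kappa} \odot I_{M} \odot \Sigma^{\kappa}_{t-\kappa}$   $O(M)$
sDVEC   $\Sigma^{\kappa}_{t} = \bar{\Omega}(\omega^{\kappa}) + \alpha^{\kappa} a^{\kappa}_{t-\kappa} {a^{\kappa}_{t-\kappa}}' + \beta^{\kappa} \Sigma^{\kappa}_{t-\kappa}$                          $3$

Note: $\omega^{\kappa}$ is a $\frac{1}{2}M(M+1) \times 1$ vector, $\tilde{A}^{\kappa}$ and $\tilde{B}^{\kappa}$ are both $\frac{1}{2}M(M+1) \times \frac{1}{2}M(M+1)$ matrices, $\Omega^{\kappa}$, $A^{\kappa}$ and $B^{\kappa}$ are lower triangular, positive definite matrices of size $M \times M$, $\bar{\Omega} : \mathbb{R} \to \mathbb{R}^{M} \times \mathbb{R}^{M}$ and $\odot$ denotes the Hadamard product.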

time series models are applied to the conditional variance $\Sigma^{\kappa}_{t} \equiv V[y^{\kappa}_{t} \mid \mathcal{F}_{t-1}] = E[a^{\kappa}_{t} {a^{\kappa}_{t}}' \mid \mathcal{F}_{t-1}]$. Hence, the feasibility of a VAR(p) type model for the vector $\mathrm{vech}(a^{\kappa}_{t} {a^{\kappa}_{t}}')$, permitting all products $a^{\kappa}_{i,t} a^{\kappa}_{j,t}$ to depend on all their past values for all $i \in \{1, \ldots, M\}$ and $j \in \{1, \ldots, M\}$, becomes challenging, even for small orders $p$. The operator $\mathrm{vech}(\cdot)$ stacks all lower triangular elements of the argument matrix in a column vector, so that $\mathrm{vech}(a^{\kappa}_{t} {a^{\kappa}_{t}}')$ contains all $\frac{1}{2}M(M+1)$ unique elements of $a^{\kappa}_{t} {a^{\kappa}_{t}}'$. Consequently, a trade-off must be made between the flexibility and feasibility of the model when selecting conditional variance models in a multivariate context. Table 2 provides models for different levels of flexibility including the associated number of parameters that need to be estimated. In addition to the large number of parameters involved, the parameter restrictions required to guarantee that $\Sigma^{\kappa}_{t}$ is positive definite contribute yet another dimension to the intractability of the VEC model. In a DVEC model, these restrictions become trivial, albeit at the cost of excluding all volatility and correlation spillover effects∗.
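To make the orders of magnitude in Table 2 concrete, the sketch below counts free parameters as a function of $M$. The counts are an assumption derived from the matrix dimensions stated in the note to Table 2 (a vector of length $\frac{1}{2}M(M+1)$ plus two square matrices for VEC, three lower triangular matrices for DVEC, three diagonals for dDVEC); the sDVEC count of 3 is taken directly from the table. The function names are hypothetical.

```python
def vech_length(m):
    """Number of unique elements in a symmetric m x m matrix."""
    return m * (m + 1) // 2


def parameter_counts(m):
    """Approximate number of free parameters per model for m assets."""
    k = vech_length(m)
    return {
        "VEC": k + 2 * k * k,  # omega (k) plus two k x k matrices
        "DVEC": 3 * k,         # three lower triangular m x m matrices
        "dDVEC": 3 * m,        # three diagonals of length m
        "sDVEC": 3,            # as reported in Table 2
    }


counts_small = parameter_counts(3)
counts_large = parameter_counts(10)
```

Under these assumptions, moving from three to ten assets raises the VEC count from 78 to 6105 parameters, while the dDVEC count only grows from 9 to 30.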

A survey of some promising multivariate GARCH models is provided by Bauwens, Laurent, and Rombouts (2006) and Silvennoinen and Teräsvirta (2009). Many of these models are based on a decomposition of the conditional variance matrix, given by

$$\Sigma^{\kappa}_{t} = {D^{\kappa}_{t}}^{\frac{1}{2}} R^{\kappa}_{t} {D^{\kappa}_{t}}^{\frac{1}{2}}, \tag{2.19}$$

where $D^{\kappa}_{t} \equiv \mathrm{diag}\big({\sigma^{\kappa}_{1,t}}^{2}, \ldots, {\sigma^{\kappa}_{M,t}}^{2}\big)$ and $[R^{\kappa}_{t}]_{ij} = \mathrm{Corr}(a^{\kappa}_{i,t}, a^{\kappa}_{j,t})$, for $i, j \in \{1, \ldots, M\}$. The $\mathrm{diag}(\cdot)$ operator returns a diagonal matrix with the argument vector on the diagonal. By means of Eq. (2.19), the univariate conditional variances and asset correlation structure can be modeled separately. Put differently, even if $R^{\kappa}_{t} = I_{M}$, the flexibility of the dDVEC specification is guaranteed by implementing the univariate models discussed in Section 2.2 in the specification of $D^{\kappa}_{t}$. In this section we discuss two different approaches in modeling the covariance structure of the standardised series $\xi^{\kappa}_{t} = {D^{\kappa}_{t}}^{-\frac{1}{2}} a^{\kappa}_{t}$.

∗ A noteworthy solution to the intractable parameter restrictions of the VEC model without losing all volatility and correlation spillovers is the BEKK model due to Engle and Kroner (1995). Positive definiteness is guaranteed by construction since $\Sigma^{\kappa}_{t} = \Omega^{\kappa} {\Omega^{\kappa}}' + \sum_{i=1}^{s} A^{\kappa}_{i} a^{\kappa}_{t-\kappa} {a^{\kappa}_{t-\kappa}}' {A^{\kappa}_{i}}' + \sum_{i=1}^{s} B^{\kappa}_{i} \Sigma^{\kappa}_{t-\kappa} {\Sigma^{\kappa}_{t-\kappa}}' {B^{\kappa}_{i}}'$, where $\Omega^{\kappa}$ is lower triangular, $A^{\kappa}_{i}$ and $B^{\kappa}_{i}$ are non-singular and $s \in \mathbb{N}$ is fixed but arbitrary.
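The variance-correlation decomposition in Eq. (2.19) and the standardisation $\xi^{\kappa}_{t} = {D^{\kappa}_{t}}^{-\frac{1}{2}} a^{\kappa}_{t}$ can be sketched elementwise as follows. The function names and the two-asset numbers are hypothetical and serve only as an illustration.

```python
def covariance_from_decomposition(variances, corr):
    """Sigma_ij = sd_i * R_ij * sd_j, i.e. D^(1/2) R D^(1/2) for diagonal D."""
    sd = [v ** 0.5 for v in variances]
    m = len(sd)
    return [[sd[i] * corr[i][j] * sd[j] for j in range(m)] for i in range(m)]


def standardise(a, variances):
    """xi = D^(-1/2) a: divide every innovation by its own standard deviation."""
    return [ai / v ** 0.5 for ai, v in zip(a, variances)]


# Hypothetical two-asset example.
variances = [4.0, 9.0]
corr = [[1.0, 0.5], [0.5, 1.0]]
sigma = covariance_from_decomposition(variances, corr)
xi = standardise([2.0, -3.0], variances)
```

Because $D^{\kappa}_{t}$ is diagonal, no matrix square root routine is needed: the univariate variances only rescale the rows and columns of the correlation matrix.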

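In the first approach, we use the Dynamic Conditional Correlation model introduced by Engle (2002) by implementing a sDVEC specification for $R^{\kappa}_{t} = V[\xi^{\kappa}_{t} \mid \mathcal{F}^{\kappa}_{t-\kappa}]$. Strictly speaking, the sDVEC specification is applied to a proxy process $Q^{\kappa}_{t}$ which is needed to guarantee that $R^{\kappa}_{t}$ has unit diagonal entries, in the following way

$$R^{\kappa}_{t} = {Q^{*\kappa}_{t}}^{-\frac{1}{2}} Q^{\kappa}_{t} {Q^{*\kappa}_{t}}^{-\frac{1}{2}}, \qquad Q^{\kappa}_{t} = (1 - \alpha^{\kappa} - \beta^{\kappa}) S^{\kappa} + \alpha^{\kappa} \xi^{\kappa}_{t-\kappa} {\xi^{\kappa}_{t-\kappa}}' + \beta^{\kappa} Q^{\kappa}_{t-\kappa}, \tag{2.20}$$

where $Q^{*\kappa}_{t} \equiv Q^{\kappa}_{t} \odot I_{M}$. The correlations are stationary if $S^{\kappa}$ is positive definite and $\alpha^{\kappa} + \beta^{\kappa} < 1$.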

We do not use the correction proposed by Aielli (2013), because it seems to make very little difference in practice∗. Estimation of the parameters $\varrho^{\kappa}$ involving the conditional variance is based on the conditional distribution of $\varepsilon^{\kappa}_{t}$. In line with Section 2.2, the multivariate normal and heavier tailed multivariate Student's t-distribution are considered, with densities,

$$\psi(z; \eta^{\kappa}) = \left(\frac{1}{\sqrt{2\pi}}\right)^{M} e^{-\frac{1}{2} z'z} \tag{2.21}$$

and

$$\psi(z; \eta^{\kappa}) = \frac{\Gamma\big((\nu^{\kappa} + M)/2\big)}{\Gamma(\nu^{\kappa}/2)\,\big((\nu^{\kappa} - 2)\pi\big)^{\frac{1}{2}M}} \left(1 + \frac{z'z}{\nu^{\kappa} - 2}\right)^{-(\nu^{\kappa} + M)/2}, \tag{2.22}$$

respectively, in which $\nu^{\kappa} > 2$ and where $\Gamma(x)$ denotes the usual Gamma function. We use the two-step estimation procedure proposed by Engle and Sheppard (2001) to estimate $\varrho^{\kappa}$ consistently. A detailed description, including the first and second stage log-likelihood, is deferred to Appendix B.2.
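A single update of the proxy process and its rescaling to a correlation matrix, as in Eq. (2.20), might look as follows in Python. This is a sketch only: the function names and numerical values are hypothetical, and the parameters are taken as given rather than estimated by the two-step procedure.

```python
def dcc_update(q_prev, xi_prev, s_bar, alpha, beta):
    """One step of Q_t = (1 - alpha - beta) S + alpha xi xi' + beta Q_{t-1}."""
    m = len(xi_prev)
    return [
        [
            (1 - alpha - beta) * s_bar[i][j]
            + alpha * xi_prev[i] * xi_prev[j]
            + beta * q_prev[i][j]
            for j in range(m)
        ]
        for i in range(m)
    ]


def to_correlation(q):
    """Rescale Q_t by its diagonal so that the result has unit diagonal."""
    m = len(q)
    d = [q[i][i] ** 0.5 for i in range(m)]
    return [[q[i][j] / (d[i] * d[j]) for j in range(m)] for i in range(m)]


# Hypothetical values for two assets.
s_bar = [[1.0, 0.3], [0.3, 1.0]]
q_t = dcc_update(s_bar, [1.0, -1.0], s_bar, alpha=0.05, beta=0.9)
r_t = to_correlation(q_t)
```

In this toy example a pair of standardised shocks with opposite signs pulls the conditional correlation below its long-run level of 0.3.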

In the second approach, we use the Generalised Orthogonal GARCH (GOGARCH) model due to Van der Weide (2002). In this model, the highly complex system of covariance dynamics of the standardised returns $\xi^{\kappa}_{t}$ is transformed into the $M$ principal components of $V(\xi^{\kappa}_{t}) = \mathrm{Corr}(a^{\kappa}_{t}) \equiv R^{\kappa}$, denoted by $f^{\kappa}_{t}$. Define $\Lambda^{\kappa} = \mathrm{diag}(\lambda_{1}, \ldots, \lambda_{M})$ with $\lambda_{i}$ the $i$-th eigenvalue of $R^{\kappa}$ satisfying $\lambda_{1} \ge \ldots \ge \lambda_{M}$ and stack the corresponding eigenvectors in the matrix $P^{\kappa}$. Because $R^{\kappa}$ is positive

∗ By which we follow Engle, Ledoit, and Wolf (2017). Indeed, among others, Caporin and McAleer (2014) find


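definite, its spectral decomposition, $R^{\kappa} = P^{\kappa} \Lambda^{\kappa} {P^{\kappa}}'$, where $P^{\kappa}$ is an orthogonal matrix, exists. In this notation, the first $M$ principal components of $R^{\kappa}$ are given by $f^{\kappa}_{t} = {P^{\kappa}}' \xi^{\kappa}_{t}$. We denote the standardised principal components by $s^{\kappa}_{t} = {\Lambda^{\kappa}}^{-\frac{1}{2}} {P^{\kappa}}' \xi^{\kappa}_{t}$ where the factor ${\Lambda^{\kappa}}^{-\frac{1}{2}}$ normalises the latent factors to have unit variance,

$$V[s^{\kappa}_{t}] = V\big[{\Lambda^{\kappa}}^{-\frac{1}{2}} {P^{\kappa}}' \xi^{\kappa}_{t}\big] = {\Lambda^{\kappa}}^{-\frac{1}{2}} {P^{\kappa}}' P^{\kappa} \Lambda^{\kappa} {P^{\kappa}}' P^{\kappa} {\Lambda^{\kappa}}^{-\frac{1}{2}} = I_{M},$$

whereas $H^{\kappa} \equiv V[f^{\kappa}_{t}] = \Lambda^{\kappa}$. To clarify, we adapt Eq. (2.17) to include the principal components,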

$$y^{\kappa}_{t} = \mu^{\kappa}_{t} + a^{\kappa}_{t}, \quad a^{\kappa}_{t} = {D^{\kappa}_{t}}^{\frac{1}{2}} \xi^{\kappa}_{t}, \quad \xi^{\kappa}_{t} = Z^{\kappa} f^{\kappa}_{t}, \quad f^{\kappa}_{t} = {H^{\kappa}_{t}}^{\frac{1}{2}} \varepsilon^{\kappa}_{t}, \quad \varepsilon^{\kappa}_{t} \overset{\text{i.i.d.}}{\sim} N(0, I_{M}), \tag{2.23}$$

where $H^{\kappa}_{t} \equiv V(f^{\kappa}_{t} \mid \mathcal{F}^{\kappa}_{t-\kappa})$ is modeled by a dDVEC model with diagonal matrices $\Omega^{\kappa}$, $A^{\kappa}$ and $B^{\kappa}$. In other words, $H^{\kappa}_{t} \equiv \mathrm{diag}\big({\varsigma^{\kappa}_{1,t}}^{2}, \ldots, {\varsigma^{\kappa}_{M,t}}^{2}\big)$, where ${\varsigma^{\kappa}_{i,t}}^{2}$ follows a univariate GARCH(1,1) model for $i \in \{1, \ldots, M\}$. Although extensions with respect to the dynamics of the $H^{\kappa}_{t}$ components are possible, we restrict ourselves to a GARCH(1,1) model with Gaussian innovations, while the elements of $D^{\kappa}_{t}$ are still as general as is allowed by Eqs. (2.5) and (2.6). The GOGARCH model implies the conditional variance-covariance matrix

$$\Sigma^{\kappa}_{t} = {D^{\kappa}_{t}}^{\frac{1}{2}} Z^{\kappa} H^{\kappa}_{t} {Z^{\kappa}}' {D^{\kappa}_{t}}^{\frac{1}{2}}, \tag{2.24}$$

which follows directly from Eq. (2.23). The latter reveals that the GOGARCH model is more flexible regarding its covariance specification compared to the DCC model. Hence, in addition to the common dDVEC specification of ${D^{\kappa}_{t}}^{\frac{1}{2}}$, the GOGARCH model introduces a specification for the conditional covariance structure that relies on another diagonal dDVEC specification whereas the conditional covariance structure of the DCC model relies on a sDVEC specification which is too parsimonious to include volatility spillovers.

The linkage between the independent components and the observed data is a generalisation of the linkage implied by the Orthogonal GARCH (OGARCH) model proposed by Ding (1994) and Chibumba (1996). Specifically, it assumes that the observed (standardised) data $\xi^{\kappa}_{t}$ can be linearly transformed into a set of independent latent factors $f^{\kappa}_{t}$ by means of an invertible matrix


assumption is of course very restrictive and, moreover, leads to identification issues.

Van der Weide (2002) generalises $Z^{\kappa}$ by introducing an orthogonal matrix $U^{\kappa}$. Hence, as a direct result from the Singular Value Decomposition, there exists an orthogonal matrix $U^{\kappa}$ such that the very general, but invertible mapping $Z^{\kappa}$ satisfies,

$$Z^{\kappa} = P^{\kappa} {\Lambda^{\kappa}}^{\frac{1}{2}} U^{\kappa}, \tag{2.25}$$

adding $M(M-1)/2$ parameters to the model∗. We use Eq. (2.25) to show that the OGARCH model is nested in the GOGARCH model for $U^{\kappa} = I_{M}$ in Appendix C.1.
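For $M = 2$ the mapping in Eq. (2.25) involves a single rotation angle, and it can be checked numerically that $Z^{\kappa} {Z^{\kappa}}' = R^{\kappa}$ for any choice of that angle. The sketch below is illustrative only: it uses the closed-form eigendecomposition of a $2 \times 2$ correlation matrix with off-diagonal element $\rho \ge 0$, and all function names are hypothetical.

```python
import math


def matmul(a, b):
    """Plain nested-list matrix product."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(a):
    return [list(row) for row in zip(*a)]


def gogarch_link_2d(rho, angle):
    """Z = P Lambda^(1/2) U for R = [[1, rho], [rho, 1]] with rho >= 0."""
    lam = [1 + rho, 1 - rho]  # eigenvalues in decreasing order
    r = 1 / math.sqrt(2)
    p = [[r, r], [r, -r]]     # columns are the matching eigenvectors
    p_sqrt_lam = [[p[i][k] * math.sqrt(lam[k]) for k in range(2)] for i in range(2)]
    u = [
        [math.cos(angle), -math.sin(angle)],
        [math.sin(angle), math.cos(angle)],
    ]                         # rotation matrix, orthogonal for every angle
    return matmul(p_sqrt_lam, u)


z = gogarch_link_2d(0.4, 0.7)
implied_r = matmul(z, transpose(z))
```

Setting the angle to zero gives $U^{\kappa} = I_{M}$, which corresponds to the OGARCH case.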

Van der Weide (2002) proposes a two-step maximum likelihood procedure in which $P^{\kappa}$ and $\Lambda^{\kappa}$ are based on the consistently estimated $\hat{R}^{\kappa} \equiv \frac{1}{(T-m)/\kappa} \sum_{t=m+\kappa}^{T} \hat{\xi}^{\kappa}_{t} {\hat{\xi}^{\kappa}_{t}}'$ and together yield $\hat{s}^{\kappa}_{t} = {\hat{\Lambda}^{\kappa}}^{-\frac{1}{2}} {\hat{P}^{\kappa}}' \hat{\xi}^{\kappa}_{t}$. The remaining $\{\varphi^{\kappa}_{ij}\}_{i<j}$, $\{\theta^{\kappa}_{i}\}_{i=1}^{M}$ and $\{\delta^{\kappa}_{i}\}_{i=1}^{M}$ are estimated by maximum likelihood accordingly. Although this procedure may be feasible for moderate values of $M$, we follow Boswijk and Van Der Weide (2006) in estimating the parameters of the GOGARCH model. A summary of their estimation procedure is provided in Appendix C.2.

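Finally, we apply the same forecast techniques as discussed in Section 2.2 to construct $\mathrm{VaR}^{\kappa,h,j}_{\alpha,PM,T}$, for $j \in \{\mathrm{It}, \mathrm{Sc}, \mathrm{N}\}$. The forecast techniques are in fact applied to univariate and projected modeling in the exact same way by substituting $\{\tilde{y}^{\kappa}_{P,T}[\tau]\}_{\tau=1}^{h/\kappa}$ for $\{w' \tilde{y}^{\kappa,j}_{T}[\tau]\}_{\tau=1}^{h/\kappa}$. Consequently, the closed-form solutions for $\kappa = 10$ when iterative forecasts are selected and the general closed form solution for scaled forecasts extend naturally to the projected modeling case. Hence,

$$\mathrm{VaR}^{\kappa,h,j}_{\alpha,PM,T} = \frac{h}{\kappa} {w^{\kappa}_{T}}' \tilde{\mu}^{\kappa}_{T}[1] + \sqrt{\frac{h}{\kappa} {w^{\kappa}_{T}}' \tilde{\Sigma}^{\kappa}_{T}[1] w^{\kappa}_{T}}\; \Psi^{-1}(1-\alpha; \eta^{\kappa}), \tag{2.26}$$

defines both $\mathrm{VaR}^{10,h,\mathrm{It}}_{\alpha,PM,T}$ and $\mathrm{VaR}^{\kappa,h,\mathrm{It}}_{\alpha,PM,T}$. When $h > \kappa$, iterated forecasts are constructed by Simulation Algorithm S.2.2, which is given by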

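Each simulation $s \in S$ consists of the following steps

1. Set $\tau = 1$ and draw $\varepsilon_{s} \overset{\text{i.i.d.}}{\sim} \psi(z; \eta^{\kappa})$.

2. Construct one-step ahead forecast $\tilde{a}^{\kappa,\mathrm{It}}_{PM,T}[\tau] = {w^{\kappa}_{T}}' \tilde{\Sigma}^{\kappa}_{T}[\tau]^{-\frac{1}{2}} \varepsilon_{s}$. If $\tau < h/\kappa$, increase $\tau$ by one and repeat this step. Otherwise, proceed to Step 3.

3. Construct the draw from distribution $\hat{a}^{\kappa,\mathrm{It}}_{PM,T}(h)$ as

$$\hat{a}^{\kappa,\mathrm{It}}_{PM,T}(h) = \sum_{\tau=1}^{h/\kappa} \tilde{a}^{\kappa}_{T}[\tau]. \tag{S.2.2}$$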

∗ To see this, Van der Weide (2002) uses the result of Vilenkin (1968) that $U^{\kappa}$ can be written as the product of $M(M-1)/2$ rotation matrices, i.e. $U^{\kappa} = \prod_{i<j} R_{ij}(\varphi^{\kappa}_{ij})$, with Euler angles $\varphi^{\kappa}_{ij} \in [-\pi, \pi]$ for all $i \in \{1, \ldots, M\}$ and


As a final step, we obtain $\widetilde{\mathrm{VaR}}^{h,\kappa,\mathrm{It}}_{\alpha,PM,T}$ by taking the negative $(1-\alpha)$ quantile of the simulated distribution of $\hat{a}^{\kappa,\mathrm{It}}_{PM,T}(h)$ and hence $\mathrm{VaR}^{\kappa,h,\mathrm{It}}_{\alpha,PM,T}$ by means of Eq. (2.14). The corresponding $\tau$-step ahead forecasts $\tilde{\mu}^{\kappa}_{T}[\tau]$ and $\tilde{\Sigma}^{\kappa}_{T}[\tau]$ depend on the forecast model. We refer to Appendix B.3 for a detailed description of these forecasts when the DCC model is selected. As a result of Eq. (2.24), the construction of $\tilde{\Sigma}^{\kappa}_{T}[\tau]$ based on the GOGARCH model fully relies on the forecasts of univariate parametric models and estimation of $Z^{\kappa}$, which have already been discussed in Section 2.2.

2.5 Backtesting

Estimation and forecast construction based on the rolling window estimation procedure popularized by Fama and MacBeth (1973) yields the series $\{\mathrm{VaR}^{\kappa,h,j}_{\alpha,p,T}, \mathrm{VaR}^{\kappa,h,j}_{\alpha,p,T+\kappa}, \ldots, \mathrm{VaR}^{\kappa,h,j}_{\alpha,p,T-\frac{h}{\kappa}}\}$, with $j \in \{\mathrm{It}, \mathrm{Sc}, \mathrm{B}, \mathrm{N}\}$ and $p \in \{P, PM\}$. The length of the estimation window $N$ is chosen to be fixed in terms of the number of trading days between the first and last observation of the window (i.e. in terms of $T^{1}$), but not in terms of the number of observations that is actually used for estimation, since the latter is equal to $N/\kappa$. Parameters are re-estimated after each observation and the length of the evaluation window is set equal to $h$. In evaluating the performance of $\mathrm{VaR}^{\kappa,h,j}_{\alpha,p,T}$, it is convenient to define the hit series

$$x^{\kappa}_{\alpha,t+h} \equiv I\{y^{\kappa}_{P,t+h} < -\mathrm{VaR}^{\kappa,h,j}_{\alpha,p,T}\}, \tag{2.27}$$

for $t = T, T+\kappa, \ldots, T - h/\kappa$ and where $I\{A\}$ denotes an indicator function which equals one if event $A$ occurs and is zero otherwise.

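The Basel Committee on Banking Supervision (1996) proposes a backtesting framework which is directly related to the hit series $x^{\kappa}_{\alpha,t+h}$, in terms of the actual hits that have occurred over a fixed period of time $[T_{1}, T_{2}]$. A natural implementation of their framework is based on the coverage rate $\mathrm{CR}^{\kappa,h}_{\alpha,T_{1},T_{2}} = \frac{\sum_{t=T_{1}}^{T_{2}} x^{\kappa}_{\alpha,t}}{n^{\kappa}}$, where $n^{\kappa}$ denotes the number of observations in $[T_{1}, T_{2}]$. In terms of $\mathrm{CR}^{\kappa,h}_{\alpha,T_{1},T_{2}}$, the traffic light classes proposed by the Basel Committee on Banking Supervision (1996) are given by

$$C\big(\mathrm{CR}^{\kappa,h}_{\alpha,T_{1},T_{2}}\big) = \begin{cases} \text{green} & \text{if } \mathrm{CR}^{\kappa,h}_{\alpha,T_{1},T_{2}} < 0.02, \\ \text{yellow} & \text{if } 0.02 \le \mathrm{CR}^{\kappa,h}_{\alpha,T_{1},T_{2}} < 0.04, \\ \text{red} & \text{if } \mathrm{CR}^{\kappa,h}_{\alpha,T_{1},T_{2}} \ge 0.04, \end{cases}$$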

which will be used to color the coverage rates. In addition, we consider three traditional Likelihood Ratio (LR) tests to test whether the hit series satisfies the unconditional coverage and independence property (jointly)∗. Hence, if the model is correctly specified, $\mathrm{VaR}^{\kappa,h,j}_{\alpha,p,T}$ is based on a forecast density that incorporates all levels of volatility clustering and any other or higher order dynamics such that the hit series itself is filtered for all these (nonlinear) dependencies†. The explicit hypotheses and LR statistics involving the Unconditional Coverage (UC) test introduced by Kupiec (1995) and the Independence (IND) and Conditional Coverage (CC) test proposed by Christoffersen (1998) are deferred to Appendix D. It is good practice to include all three tests to overcome particular situations in which the risk model passes the UC and CC test, but fails the IND test. Hence, in the latter case, the model is likely to cause successive periods of hits which in turn may be even more problematic than systematically underestimating the portfolio risk, while it is accepted when the IND test is excluded.
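The hit series in Eq. (2.27), the coverage rate and the traffic light classification can be sketched in a few lines of Python. The function names and the toy numbers are hypothetical; the thresholds 0.02 and 0.04 are the ones stated above.

```python
def hit_series(returns, var_forecasts):
    """1 if the realised return falls below minus the VaR forecast, else 0."""
    return [1 if y < -v else 0 for y, v in zip(returns, var_forecasts)]


def coverage_rate(hits):
    """Share of observations in the evaluation period that are hits."""
    return sum(hits) / len(hits)


def traffic_light(cr):
    """Traffic light class for a coverage rate, using the thresholds above."""
    if cr < 0.02:
        return "green"
    if cr < 0.04:
        return "yellow"
    return "red"


# Hypothetical toy example: one violation in four periods.
hits = hit_series([-0.05, 0.01, -0.01, 0.02], [0.03, 0.03, 0.03, 0.03])
colour = traffic_light(coverage_rate(hits))
```

With $\alpha = 0.99$ a correctly specified model is expected to produce a coverage rate close to 0.01, so the toy rate of 0.25 would be classified as red.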

Because $\mathrm{VaR}^{\kappa,h,j}_{\alpha,p,T}$ and $\mathrm{VaR}^{\kappa,h,j}_{\alpha,p,T+\kappa}$ are related by construction if $\kappa < h$, the IND and CC test are not very informative in this case. The Dynamic Quantile (DQ) test proposed by Engle and Manganelli (2004) is another widely used test for the independence of the hit series. The simple linear regression

$$x^{\kappa}_{\alpha,T+h} - \alpha = \lambda^{\kappa}_{0} - \lambda^{\kappa}_{1} \mathrm{VaR}^{\kappa,h,j}_{\alpha,p,T},$$

with hypotheses $H_{0}: \lambda_{1} = 0$ against $H_{1}: \lambda_{1} \neq 0$ defines an implementation of the DQ test which is robust against the overlap in forecast horizon. We use a paired block bootstrap with $B = 10000$ bootstrap replications and block length $b = 10$ to obtain the corresponding critical values.
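The resampling idea behind a paired block bootstrap can be sketched as follows: blocks of consecutive (regressor, response) pairs are drawn together, so that each hit stays attached to its own VaR forecast and short-run dependence within a block is preserved. This is a minimal sketch assuming a moving-block scheme with overlapping blocks; how the resampled slopes are turned into critical values is not specified here and is left out. All names are hypothetical.

```python
import random


def ols_slope(x, y):
    """Slope coefficient of a simple linear regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx


def paired_block_bootstrap_slopes(x, y, block_len, n_boot, rng):
    """Resample blocks of (x, y) pairs and re-estimate the slope each time."""
    n = len(x)
    slopes = []
    while len(slopes) < n_boot:
        idx = []
        while len(idx) < n:
            start = rng.randrange(n - block_len + 1)
            idx.extend(range(start, start + block_len))
        idx = idx[:n]
        bx = [x[i] for i in idx]
        by = [y[i] for i in idx]
        if max(bx) > min(bx):  # skip resamples with a constant regressor
            slopes.append(ols_slope(bx, by))
    return slopes


# Hypothetical check on exactly linear data: every resample recovers the slope.
x_demo = [float(i) for i in range(20)]
y_demo = [2.0 * v + 1.0 for v in x_demo]
boot = paired_block_bootstrap_slopes(x_demo, y_demo, 5, 100, random.Random(0))
```

In the DQ regression above, the regressor would be the VaR forecast and the response the centred hit; the estimated slope then equals $-\lambda_{1}$, which does not affect a test of $\lambda_{1} = 0$.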

2.6 Comparing forecast methods

The use of a fixed rolling window for parameter estimation fits in the framework of Giacomini and White (2006). Within their framework they provide statistical tests of equal predictive accuracy between two forecast methods rather than two forecast models. Additionally, it allows for nested models (e.g. GARCH is nested in GJR-GARCH) and parameter uncertainty.

We implement their framework by means of the statistic proposed by Diebold and Mariano (1995) together with the asymmetric tick loss function proposed by Giacomini and Komunjer (2005). Specifically, for two methods $i$ and $j$ with corresponding losses $L^{\kappa}_{i,\alpha}$ and $L^{\kappa}_{j,\alpha}$ based on

$$L^{\kappa}_{\alpha,T+h}(e^{\kappa}_{\alpha,T+h}) = \big((1-\alpha) - I\{e^{\kappa}_{\alpha,T+h} < 0\}\big)\, e^{\kappa}_{\alpha,T+h},$$

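∗ In other words, to determine whether $x^{\kappa}_{\alpha,t+h} \overset{\text{i.i.d.}}{\sim} \mathrm{Bernoulli}(1-\alpha)$.

† Note that $\operatorname{plim} \mathrm{CR}^{\kappa,h}_{\alpha,T_{1},T_{2}} = \operatorname{plim} \frac{\sum_{t=T_{1}}^{T_{2}} x^{\kappa}_{\alpha,t}}{n^{\kappa}} = E\big[x^{\kappa}_{\alpha,t}\big] = P\big[y^{\kappa}_{PM,t} < -\mathrm{VaR}^{\kappa,h,j}_{\alpha,p,T}\big] = 1-\alpha$ if the model is correctly specified, by means of a Law of Large Numbers.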


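where $e^{\kappa}_{\alpha,T+h} \equiv y^{\kappa}_{T+h} + \mathrm{VaR}^{\kappa,h,j}_{\alpha,p,T}$, we test $H_{0}: E[d^{\kappa}_{i,j,T}] = 0$ against $H_{1}: E[d^{\kappa}_{i,j,T}] \neq 0$ in which $d^{\kappa}_{i,j,T} \equiv L^{\kappa}_{i,T} - L^{\kappa}_{j,T}$, by means of the DM test statistic

$$t^{\kappa}_{i,j,\alpha,T_{1},T_{2}} = \frac{\frac{1}{n^{\kappa}} \sum_{T=T_{1}}^{T_{2}} d^{\kappa}_{i,j,T}}{\sqrt{{\hat{\sigma}^{2}_{\mathrm{HAC}}}^{\kappa} / n^{\kappa}}}, \tag{2.28}$$

where ${\hat{\sigma}^{2}_{\mathrm{HAC}}}^{\kappa}$ is a heteroskedasticity and autocorrelation-consistent (HAC) variance estimator given by ${\hat{\sigma}^{2}_{\mathrm{HAC}}}^{\kappa} = \hat{\gamma}_{0} + 2 \sum_{k=1}^{K-1} a_{k} \hat{\gamma}_{k}$, where $\hat{\gamma}_{k}$ denotes the lag-$k$ sample autocovariance of the sequence $\{d^{\kappa}_{T_{1}}, \ldots, d^{\kappa}_{T_{2}}\}$ and in which $a_{k} = 1 - \frac{k}{K}$ with $K = \lfloor {n^{\kappa}}^{\frac{1}{4}} \rfloor$. Similar to the DQ test, p-values are based on a block bootstrap with $b = 10$ and $B = 10000$.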
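The tick loss, the HAC variance and the statistic in Eq. (2.28) can be written down directly. The sketch below follows the formulas above with hypothetical function names; the block bootstrap used for the p-values is not included.

```python
import math


def tick_loss(y, var, alpha):
    """Asymmetric tick loss evaluated at e = y + VaR."""
    e = y + var
    indicator = 1.0 if e < 0 else 0.0
    return ((1 - alpha) - indicator) * e


def autocovariance(d, k):
    """Lag-k sample autocovariance with divisor n."""
    n = len(d)
    mean = sum(d) / n
    return sum((d[t] - mean) * (d[t - k] - mean) for t in range(k, n)) / n


def hac_variance(d):
    """gamma_0 + 2 * sum_{k=1}^{K-1} (1 - k/K) gamma_k with K = floor(n^(1/4))."""
    n = len(d)
    big_k = int(math.floor(n ** 0.25))
    var = autocovariance(d, 0)
    for k in range(1, big_k):
        var += 2 * (1 - k / big_k) * autocovariance(d, k)
    return var


def dm_statistic(loss_i, loss_j):
    """Mean loss differential divided by its HAC standard error."""
    d = [a - b for a, b in zip(loss_i, loss_j)]
    n = len(d)
    return (sum(d) / n) / math.sqrt(hac_variance(d) / n)


# Hypothetical toy losses for two methods.
t_stat = dm_statistic([2.0, 4.0, 2.0, 4.0], [1.0, 1.0, 1.0, 1.0])
```

Note that the tick loss is non-negative on both sides of zero: exceedances of the VaR ($e < 0$) are weighted by $\alpha$ and non-exceedances by $1 - \alpha$.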

Because multiple pairwise comparisons do not extend to valid statements regarding multiple hypotheses we also consider the Model Confidence Set (MCS) proposed by Hansen, Lunde, and Nason (2011). The objective of the MCS is to reduce a set of initial models $\mathcal{M}^{0}$ to the MCS $\mathcal{M}^{*}_{1-\delta}$ which includes the set of best models $\mathcal{M}^{*}$ with a confidence level $1-\delta$∗ based on an iterative procedure that involves an equivalence test $\delta_{\mathcal{M}}$ and elimination rule $e_{\mathcal{M}}$. In particular, if $\delta_{\mathcal{M}}$ is rejected with a significance level $\delta$ for a set $\mathcal{M}$, this means that there is sufficient statistical evidence against the hypothesis that the objects in $\mathcal{M}$ are equally good. As a second step, $e_{\mathcal{M}}$ is used to eliminate the worst performing object(s). This procedure starts with $\mathcal{M} = \mathcal{M}^{0}$ and iterates the $\delta_{\mathcal{M}}$, $e_{\mathcal{M}}$ steps until $\delta_{\mathcal{M}}$ is no longer rejected. We implement the multiple t-statistic approach advocated in their work, where $\delta_{\mathcal{M}}$ is based on the test statistic

$$T^{\kappa}_{\max,\mathcal{M},\alpha} \equiv \max_{i \in \mathcal{M}} t^{\kappa}_{i,\alpha,T_{1},T_{2}},$$

where $t^{\kappa}_{i,\alpha,T_{1},T_{2}}$ is defined as in Eq. (2.28), but then for $d^{\kappa}_{i,T} \equiv L^{\kappa}_{i,T} - \frac{1}{m} \sum_{i \in \mathcal{M}} L^{\kappa}_{i,T}$, in which $m$ is the number of models in $\mathcal{M}$, rather than $d^{\kappa}_{i,j,T} = L^{\kappa}_{i,T} - L^{\kappa}_{j,T}$. The natural elimination rule corresponding to $T^{\kappa}_{\max,\mathcal{M},\alpha}$ is $e_{\max,\mathcal{M}} = \arg\max_{i \in \mathcal{M}} t^{\kappa}_{i,\alpha,T_{1},T_{2}}$†. Because the asymptotic distribution of $T^{\kappa}_{\max,\mathcal{M},\alpha}$ is non-standard we implement a block bootstrap with $b = 10$ and $B = 10000$ to obtain the MCS p-values. The MCS p-value $\hat{p}_{i}$ is the threshold at which $i \in \widehat{\mathcal{M}}^{*}_{1-\delta}$ if and only if $\hat{p}_{i} \ge \delta$. So, if $\hat{p}_{j} < \hat{p}_{i}$, object $i$ is more likely to be one of the best alternatives in $\mathcal{M}$ because there exists a continuum of confidence levels for which only $i$ survives, namely $\{(1-\delta) \mid \hat{p}_{j} < \delta \le \hat{p}_{i}\}$, while $i$ always survives when $j$ survives.
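One pass of the elimination rule can be sketched as follows: losses are centred on the cross-model average, a t-statistic is computed per model, and the model with the largest statistic is the candidate for removal. The sketch is deliberately simplified and its names are hypothetical: it uses a plain sample variance instead of the HAC estimator of Eq. (2.28), and it omits the bootstrap equivalence test that decides whether a model is actually removed.

```python
import math


def relative_losses(losses):
    """d_i,T = L_i,T minus the average loss over all models at time T."""
    names = list(losses)
    n = len(losses[names[0]])
    avg = [sum(losses[m][t] for m in names) / len(names) for t in range(n)]
    return {m: [losses[m][t] - avg[t] for t in range(n)] for m in names}


def t_statistics(losses):
    """Per-model t-statistic of the relative loss (plain variance, not HAC)."""
    stats = {}
    for name, d in relative_losses(losses).items():
        n = len(d)
        mean = sum(d) / n
        var = sum((v - mean) ** 2 for v in d) / n
        stats[name] = mean / math.sqrt(var / n)
    return stats


def elimination_candidate(losses):
    """Model with the largest t-statistic, together with that statistic."""
    stats = t_statistics(losses)
    worst = max(stats, key=stats.get)
    return worst, stats[worst]


# Hypothetical losses for three methods over four periods.
toy_losses = {
    "A": [1.0, 2.0, 1.0, 2.0],
    "B": [2.0, 3.0, 3.0, 2.0],
    "C": [3.0, 4.0, 3.0, 5.0],
}
candidate, t_max = elimination_candidate(toy_losses)
```

In the full procedure this step would be repeated on the surviving models until the equivalence test is no longer rejected.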

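∗ Formally, $\lim_{n\to\infty} P[\mathcal{M}^{*} \subset \widehat{\mathcal{M}}^{*}_{1-\delta}] \ge 1-\delta$ in general and $\lim_{n\to\infty} P[\mathcal{M}^{*} = \widehat{\mathcal{M}}^{*}_{1-\delta}] = 1$ when $\mathcal{M}^{*}$ contains a single object.

† We have also considered the test statistic $T^{\kappa}_{R,\mathcal{M},\alpha} \equiv \max_{i,j \in \mathcal{M}} t^{\kappa}_{i,j,\alpha,T_{1},T_{2}}$ with elimination rule $e_{R,\mathcal{M}} = \arg\max_{i \in \mathcal{M}} \sup_{j \in \mathcal{M}} t^{\kappa}_{i,j,\alpha,T_{1},T_{2}}$, but found a significant increase in power in our empirical analysis by using $T^{\kappa}$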


3 Empirical results for individual stocks

In this section, we select and estimate forecast models for three US stocks (Source: Yahoo Finance): the Alcoa stock (AA), the McDonald's stock (MCD) and the Merck stock (MRK) over the period from April 2, 1984 until May 14, 2015, using daily (κ = 1), weekly (κ = 5) and biweekly (κ = 10) data. The prices and returns are shown in Fig. 1. After a short motivation for the fixed estimation window length N, we estimate parameters based on a rolling window and perform some diagnostic tests accordingly. We conclude this section by applying the backtesting procedures introduced in Section 2.5 to the univariate return series of the AA, MCD and MRK stock.

[Figure 1: two panels for the series AA, MCD and MRK plotted against t: (a) Prices, $P_{i,t}$; (b) Returns, $y_{i,t}$.]

Figure 1: Daily prices and returns of individual stocks AA, MCD and MRK

3.1 Evaluation window length

In selecting and estimating forecast models (i.e. time series models) for the individual stock returns, valid statistical inference relies on the stationarity of the individual series. We choose the length of the estimation window N such that the null hypothesis of the Augmented Dickey Fuller (ADF) test that the series has a unit root, is rejected at a significance level of 0.05 in each of the individual windows∗ and for all κ. As can be seen from Figs. 2b to 2d, stationarity† is provided already for N = 2500 (250 observations) for all values of κ. On the other hand, Fig. 2a indicates that for N = 1000 there are some consecutive non-stationary windows for the AA stock during the credit crunch when κ = 10. Henceforth, the length of the estimation window is set equal to N = 2500.

∗ Note that we are testing the single hypothesis that the stock returns in a specific window are stationary (independent of all other windows) rather than the multiple hypothesis that all windows are stationary.


[Figure 2: four panels showing the ADF statistic for AA, MCD and MRK plotted against T: (a) N = 1000 and κ = 10; (b) N = 2500 and κ = 10; (c) N = 2500 and κ = 1; (d) N = 2500 and κ = 5.]

Figure 2: Results individual ADF tests. The dotted line represents the critical value −2.86 corresponding to a significance level of 0.05.

3.2 Conditional mean

As a first step in estimating and evaluating the different models summarised in Table 1, suitable orders p and q need to be selected for the (different) ARMA(p,q)-specifications for the conditional means of the assets AA, MCD and MRK. Fig. 3 shows that significant autocorrelation is present from lag one and onwards for some consecutive windows. Hence, the Ljung Box (LB) statistics are sufficiently large. The LB(m)-test is a joint test for $H_{0}: \rho_{1} = \ldots = \rho_{m} = 0$ against $H_{1}: \rho_{l} \neq 0$ for some $l \le m$ and is performed for each individual window, similar to the ADF test discussed in Section 3.1. Due to the time varying conditional variance, the true distribution of LB(m) differs from the asymptotic $\chi^{2}(m - p - q)$ distribution that is used to obtain the critical values depicted in Fig. 3. Specifying an ARMA(p,q) model for the conditional mean exploits the serial correlation in the stock returns.
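For reference, the Ljung-Box statistic in its standard textbook form, $n(n+2)\sum_{k=1}^{m} \hat{\rho}_{k}^{2}/(n-k)$, can be computed as in the sketch below. The function names are hypothetical and the alternating toy series is chosen only because its first-order autocorrelation is easy to verify by hand.

```python
def autocorrelation(x, k):
    """Lag-k sample autocorrelation."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    num = sum((x[t] - mean) * (x[t - k] - mean) for t in range(k, n))
    return num / denom


def ljung_box(x, m):
    """Ljung-Box statistic n(n+2) * sum_{k=1}^{m} rho_k^2 / (n - k)."""
    n = len(x)
    return n * (n + 2) * sum(
        autocorrelation(x, k) ** 2 / (n - k) for k in range(1, m + 1)
    )


# Hypothetical alternating series with strong negative first-order correlation.
series = [1.0, -1.0] * 10
lb_1 = ljung_box(series, 1)
```

Applying such a function to every estimation window would give one statistic per window, which is the kind of path plotted in Fig. 3.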

Without allowing for a GARCH specification for the conditional variance equation, it is however hard to find orders p and q such that the residuals are serially uncorrelated. Moreover, we start


[Figure 3: two panels showing the LB statistic for AA, MCD and MRK plotted against T: (a) κ = 1; (b) κ = 10.]

Figure 3: Results individual LB statistics residuals. The dotted lines represent the $\chi^{2}$ critical values at a significance level of 0.05.

with small orders p and q to overcome identification problems for the ARMA coefficients due to common factors. Consequently, we compare the significance of the estimated parameters $\hat{\phi}^{\kappa}_{1}$ and $\hat{\phi}^{\kappa}_{2}$ in an AR(1) and AR(2) specification for the conditional mean combined with a GARCH(1,1), GJR-GARCH(1,1) or EGARCH(1,1) specification for the conditional variance. In all cases we use the standard normal distribution for the specification of the error distribution. Some care is needed when it comes to model selection in a rolling window set-up. Indeed, it would be a rare coincidence if selection criteria such as the Akaike Information Criterion (AIC) identified the same best model in every single window. Also, the significance of estimated parameters, implying possible model simplifications, does not generally yield unanimous decisions in all estimation windows.

As illustration, Fig. 4 provides t-values for all estimation windows and different specifications, based on the returns of the AA stock. Additionally, the AICs are given for the same windows and specifications. Regarding the t-values, Fig. 4a indicates that the null hypotheses for which either $\phi^{\kappa}_{1}$ in an AR(1)-GARCH(1,1) model or $\phi^{\kappa}_{1}$ or $\phi^{\kappa}_{2}$ in an AR(2)-GARCH(1,1) are equal to zero, are unanimously not rejected, whereas the t-values themselves are of course varying per window. The t-statistic outcomes in Figs. 4b and 4c do not lead to the same model specification in each window, but nevertheless follow a pattern which is closely related to the standard GARCH specification in Fig. 4a. Because we do not allow for any type of model switching, simplification of both the AR(1)-GARCH(1,1) and AR(2)-GARCH(1,1) model to a standard GARCH(1,1) model is motivated by the rejection rate of the null hypothesis among all windows. Put differently, if an AR(1)-EGARCH(1,1) model were selected, statistical evidence for an EGARCH(1,1) simplification would have been found most of the times (but not every time). Another motivation for the standard GARCH(1,1) model relies on Fig. 4d. Because the AIC paths are so close to each other, we simply use the mean of all window AICs to rank the different specifications. For the AA stock, the


[Figure 4: four panels for the AA stock plotted against T. Panels (a) GARCH, (b) GJR-GARCH and (c) EGARCH show the t-values of the ar(1) coefficient in the AR(1) specification and of the ar(1) and ar(2) coefficients in the AR(2) specification. Panel (d) AIC shows the AIC of the GARCH(1,1), AR(1)-GARCH(1,1) and AR(2)-GARCH(1,1) specifications.]

Figure 4: Results t-statistics and AICs for AA stock. The dotted lines represent the corresponding critical values of the t-statistics at a significance level of 0.05.

average AICs are minimized for the standard GARCH(1,1) specification. So, another motivation for a standard GARCH(1,1) model in terms of Fig. 4d is that the blue line is below the others, on average across time. Notice that, on the other hand, averaging t-values over all estimation windows may be misleading, because a few very significant coefficients may draw the average into the critical region of the t-statistic while in fact only a few windows exhibit (highly) significant autocorrelation. It is therefore also informative to check whether the rejection rates and average t-values are in line with each other, and we calculate both for different values of κ. The results are summarised in Table 3.
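The difference between the two summary measures is easy to reproduce with made-up numbers. In the sketch below the t-values are hypothetical and the critical value 1.96 is an assumed two-sided 5% normal critical value; neither is taken from the empirical results.

```python
def summarise_t_values(t_values, critical=1.96):
    """Return the average t-value and the share of windows with |t| > critical."""
    n = len(t_values)
    average = sum(t_values) / n
    rejection_rate = sum(1 for t in t_values if abs(t) > critical) / n
    return average, rejection_rate


# Hypothetical example: eight insignificant windows and two extreme ones.
average_t, rejection_rate = summarise_t_values([0.5] * 8 + [12.0] * 2)
```

Here the average t-value lies in the critical region although the null hypothesis is rejected in only one fifth of the windows, which is the situation the comparison of both measures is meant to detect.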

Quantifying ‘most of the times’ as at least half of the times, Table 3 only contains average t-values and rejection rates which are in line with each other. Therefore, we do not expect that some extremely significant coefficients are distorting the outcomes a lot. Based on these two measures in Table 3, there is sufficient statistical evidence to simplify the AR(1)-GARCH(1,1) and AR(2)-GARCH(1,1) model to a standard GARCH(1,1) model for all three assets AA, MCD and MRK for κ ∈ {5, 10} returns. For κ = 1, the results are less general.


Table 3: Model selection conditional mean

                                 Average t-value        Rejection rate       Average
                                 φκ1        φκ2         φκ1       φκ2        AIC
κ = 1
  AA    GARCH(1,1)               -          -           -         -          -5.0480
        AR(1)-GARCH(1,1)         2.4416     -           0.5596    -          -5.0535
        AR(2)-GARCH(1,1)         2.4090     -0.9825     0.5588    0.1418     -5.0535
  MCD   GARCH(1,1)               -          -           -         -          -5.4746
        AR(1)-GARCH(1,1)         -2.7808    -           0.8939    -          -5.4778
        AR(2)-GARCH(1,1)         -2.7868    -0.8590     0.8945    0.0000     -5.4774
  MRK   GARCH(1,1)               -          -           -         -          -5.2608
        AR(1)-GARCH(1,1)         -0.5554    -           0.0707    -          -5.2604
        AR(2)-GARCH(1,1)         -0.5427    -0.9907     0.0717    0.0000     -5.2601
κ = 5
  AA    GARCH(1,1)               -          -           -         -          -3.3192
        AR(1)-GARCH(1,1)         0.0377     -           0.0000    -          -3.3157
        AR(2)-GARCH(1,1)         0.0350     -0.1590     ∗         ∗          -3.3119
  MCD   GARCH(1,1)               -          -           -         -          -4.0273
        AR(1)-GARCH(1,1)         -0.9205    -           0.2357    -          -4.0262
        AR(2)-GARCH(1,1)         -0.9084    -0.7638     0.2357    0.0290     -4.0235
  MRK   GARCH(1,1)               -          -           -         -          -3.6518
        AR(1)-GARCH(1,1)         -1.6508    -           0.3938    -          -3.6539
        AR(2)-GARCH(1,1)         -1.6387    0.1026      0.4125    0.0000     -3.6507
κ = 10
  AA    GARCH(1,1)               -          -           -         -          -2.6360
        AR(1)-GARCH(1,1)         -0.1547    -           0.0000    -          -2.6288
        AR(2)-GARCH(1,1)         0.1584     -0.7945     0.0018    0.1813     -2.6244
  MCD   GARCH(1,1)               -          -           -         -          -3.2965
        AR(1)-GARCH(1,1)         -0.7949    -           0.2357    -          -3.2919
        AR(2)-GARCH(1,1)         -0.7918    -0.1853     0.0168    0.0467     -3.2855
  MRK   GARCH(1,1)               -          -           -         -          -2.9265
        AR(1)-GARCH(1,1)         -1.0269    -           0.0000    -          -2.9235
        AR(2)-GARCH(1,1)         -1.0159    0.0792      ∗         ∗          -2.9163

The corresponding AICs in the last column of Table 3 point to the same models as the average t-values and rejection rates. There is however one particular situation, namely for MRK when κ = 5, for which the AIC is minimised for AR(1)-GARCH(1,1) instead of GARCH(1,1). Zooming in on this situation, where the rejection rate is equal to 0.3938, the earlier mentioned lack of unanimity across windows is very strong. It is also informative whether rejections occur in clusters. For a standard GARCH(1,1) specification, Fig. 5 shows that rejections are indeed clustered for the MCD and MRK stock. As an extreme case, the windows for which $\phi^{5}_{1}$ estimates in the AR(1)-GARCH(1,1) model for MCD are significant are almost contained in a single set of consecutive windows (see Fig. 5a).

Zooming out again, the behavior of the LB statistic depicted in Fig. 3 and the results in Table 3 for different values of κ are strongly related. Despite the result that including an AR(1) model for the conditional mean is more likely to be close to the true model specification for daily returns compared to longer period returns, Fig. 3 is also a visualisation of the higher rejection rates for AA and MCD compared to MRK when κ = 1. Specifically, the average t-values, rejection rates and AICs (in five decimals) are all indicating the need for an AR(1) specification for the conditional
