
Combining models for Value-at-Risk estimation

Koen Willems

11110201

MSc in Econometrics

Track: Financial Econometrics

Date of final version: August 8, 2017

Supervisor: Dr M.J. vd Leij

Second reader: Prof Dr H.P. Boswijk

Abstract

The performance of different forecast combination schemes is measured in this paper. Linear, quantile and logarithmic combinations are considered. We measure the performance of the combination schemes by evaluating their VaR accuracy. We apply this approach using GARCH-class volatility models to daily returns on four stock indices. The VaR estimates improve when a logarithmic combination is used.


This document is written by Student Koen Willems who declares to take full responsibility for the contents of this document. I declare that the text and the work presented in this document are original and that no sources other than those mentioned in the text and its references have been used in creating it. The Faculty of Economics and Business is responsible solely for the supervision of completion of the work, not for the contents.


Contents

1 Introduction
2 Literature Review
3 Combinations of probability and density forecasts
4 Volatility models
5 Weight determination and model assessment
6 Empirical analysis
7 Conclusion
References


1 Introduction

Modern risk management for financial institutions is based on adequate risk measurement. Since the 1990s, regulators have emphasised Value at Risk (VaR) as the main tool for downside risk measurement. The implementation and hence the evaluation of VaR depends on assumptions made on daily returns. The VaR of a portfolio is the amount risked over some period of time with a fixed probability. Financial institutions can use their own internal models to obtain VaR estimates. There are many different methods for measuring downside risk, and financial institutions are challenged to choose the model that best suits their purposes. Since the true data generating process is unknown, there is always model uncertainty; in fact, any model is wrong in some sense. Hence, relying on only one model when constructing a VaR estimate might be dangerous.

Combining models may result in better VaR estimates. A simple portfolio diversification argument motivates this idea. A second reason for using forecast combinations is that individual forecasts may be very differently affected by structural breaks, caused by institutional change or technological developments, for example. Some models adapt quickly and will only be temporarily affected. Other models may adjust very slowly after a structural break.

Forecasters may have totally different views about some future period due to assumptions underlying the model they use and/or due to differences in their information set. An issue that immediately arises with the availability of many forecasts is how a decision maker can best exploit the information in the individual forecasts.

The vast majority of the literature evaluates and compares density combinations which are combined through the linear opinion pool (weighted sum of individual densities). This thesis adds to the literature as it focuses more on the decision of how to combine different forecast methods. In addition to the linear combination, a logarithmic combination as well as a quantile combination are evaluated.

Two volatility models from the GARCH class are used for forecasting in the analysis. To combine the models, we use equal weights as well as the weights determined by the log-scoring rule. The methods are applied to four stock indices. We evaluate the economic value by examining the quality of the 1-day 99% and 95% VaR estimates by applying the Conditional Coverage (CC) test of Christoffersen (1998).

The next section reviews a selection of the literature on forecast combinations. Section 3 introduces the three combination schemes and shows how they lead to different forecasts using an example. Section 4 describes the univariate volatility models that are used in the empirical analysis. Section 5 describes how the weights in the different models are obtained and how the models are evaluated. Section 6 is the empirical analysis and Section 7 concludes.


2 Literature Review

Many empirical studies have found that combined models produce better forecasts on average than methods based on individual forecasting models. For example, Clemen and Winkler (1986) tested a variety of combined forecasts of GNP from four major econometric models. They found that a combination of the simple average, the normal model with an independence assumption, and the Bayesian model performed better than the individual models.

Stock and Watson (2004) considered five types of forecast combination methods. They examined the empirical performance of the combinations using a seven-country economic data set and compared it with individual forecasts. They concluded the following: First, some combinations performed better than the AR benchmark. Second, the combinations that have the least data adaptivity in their weighting schemes performed best. Third, the combination forecasts with the least adaptivity to the data were also found to be the most stable.

Aiolfi and Timmermann (2006) found significant evidence of persistence in forecasting performance across linear and nonlinear models and, given this persistence, argued that it is likely that successful conditional combination strategies can be designed. Guidolin and Timmermann (2009) developed a flexible approach to combine forecasts of future spot rates with forecasts from time-series models or macroeconomic variables. They found empirical evidence that combining forecasts from different models helps improve the out-of-sample forecasting performance for US short-term rates.


There is enough evidence to conclude that combination forecast methods perform better than individual forecasts. In addition, simple combinations tend to have better features than the more complex models. However, these conclusions are primarily based on studies of point forecasts. Not much information can be extracted from point forecasts if there is no indication of their uncertainty. This led to a growing interest in density forecast combinations.

Opschoor et al. (2016) investigate the added value of combining density forecasts focused only on the left tail of the distribution. They evaluate three scoring rules in order to obtain weights to assign to the individual density forecasts. They combine densities in a linear opinion pool, which is a weighted sum of the individual densities. The first scoring rule is the log score function (Mitchell and Hall, 2005). This function takes the logarithm of the predictive density evaluated at the realization $y_t$. The second scoring rule is the censored likelihood scoring rule (CSL), advocated by Diks et al. (2011). The third rule is the Continuous Ranked Probability Score (CRPS) of Gneiting and Ranjan (2011). For each scoring rule they choose the weights in three different ways. First, they obtain the weights by maximizing (minimizing in the case of the CRPS) the score of the combined density summed over some time period $\tau$. Second, they use a weighting scheme proposed by Jore et al. (2010), which gives individual method $i$ a weight equal to the exponent of the score of method $i$ divided by the sum of the exponents of the scores of all methods. Third, they use the equally weighted combination as a benchmark. They apply this approach to daily returns on the S&P500, DJIA, FTSE and Nikkei indices using volatility models including GARCH, HEAVY and GAS models, and show that combining density forecasts based on optimizing the censored likelihood scoring rule significantly outperforms the other models.

3 Combinations of probability and density forecasts

Consider the problem of an agent who has N different forecast models for a time series variable of interest y at his or her disposal. Different distributions for y in period t are associated with each of these methods. Suppose that all of these methods are conditioned on an information set denoted by $I_t$, which includes all information available up to time t. The agent can combine different models by assigning weights to each model. Before a decision maker obtains weights to combine forecasts, he or she has to consider how to combine the predictions, e.g. taking a linear or a nonlinear combination. This section discusses several methods to combine forecasts.

Suppose we are considering combining N predictive densities of y, denoted by $f_{t+h,1}(y_{t+h}|I_t), \dots, f_{t+h,N}(y_{t+h}|I_t)$, which are forecasts h periods ahead made at time t. To end up with a combination, a decision maker has to assign weights to each of these individual densities. The combination is required to be convex, with weights summing to one and lying in the zero-one interval, so that the probability forecast never becomes negative. This still gives room for a wide set of possible combinations.

Linear opinion pool

The most obvious way to combine densities is the so-called linear opinion pool, which is the convex combination

$$f^p_{t+h}(y_{t+h}|I_t) = \sum_{i=1}^{N} w_i f_{t+h,i}(y_{t+h}|I_t) \qquad (1)$$

with $w_i \geq 0$ and $\sum_{i=1}^{N} w_i = 1$. A difficulty associated with this density is that it is typically multimodal, so that no clear choice for a jointly preferred outcome emerges.

Logarithmic combination

Genest and Zidek (1986) proposed a method which is less dispersed than the linear combination and is also unimodal. One can adopt a logarithmic combination of densities

$$f^l_{t+h}(y_{t+h}|I_t) = \frac{\prod_{i=1}^{N} \left(f_{t+h,i}(y_{t+h}|I_t)\right)^{w_i}}{\int \prod_{i=1}^{N} \left(f_{t+h,i}(y_{t+h}|I_t)\right)^{w_i}\,dy} \qquad (2)$$

where the weights have to be chosen such that the integral in the denominator is finite.

Figure 1 illustrates the difference between the two combination methods. The blue and the red line represent normal densities with variance equal to one and four, and mean zero and mean six, respectively. The orange line in Figure 1(a) is the linear combination as in equation (1) with N = 2 and equal weights. The orange line in Figure 1(b) is the logarithmic combination as in (2) with N = 2 and equal weights. There is a clear difference between the two combination methods: the linear combination in this example is a bimodal distribution, whereas the logarithmic combination is unimodal. Furthermore, the logarithmic combination is less dispersed than the linear combination.

Figure 1: Individual normal densities, N(0,1) (blue) and N(6,4) (red), and their combinations. Panel (a): the equally weighted linear combination (orange); panel (b): the equally weighted logarithmic combination (orange).
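To make the difference concrete, the following sketch (not part of the thesis; the component distributions N(0,1) and N(6,4) and the equal weights are taken from the example above, and the grid and normalisation are illustrative choices) computes the linear pool of equation (1) and the logarithmic pool of equation (2), with the normalising integral of (2) approximated numerically.

```python
# Illustrative sketch, not from the thesis: equally weighted linear vs.
# logarithmic combination of two normal densities, N(0,1) and N(6,4).
import numpy as np
from scipy import stats

x = np.linspace(-8, 14, 2201)
dx = x[1] - x[0]
f1 = stats.norm.pdf(x, loc=0, scale=1)            # N(0,1)
f2 = stats.norm.pdf(x, loc=6, scale=np.sqrt(4))   # N(6,4): mean 6, variance 4
w1, w2 = 0.5, 0.5                                 # equal weights

# Linear opinion pool, equation (1): weighted sum of the densities
f_lin = w1 * f1 + w2 * f2

# Logarithmic combination, equation (2): weighted geometric mean,
# renormalised numerically so that it integrates to one
f_log = f1**w1 * f2**w2
f_log /= f_log.sum() * dx

# The linear pool is bimodal; the logarithmic pool is unimodal and less dispersed
print("peak of linear pool at x =", x[np.argmax(f_lin)])
print("peak of logarithmic pool at x =", x[np.argmax(f_log)])
```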

Quantile combination

Another way to combine forecasts is to look at their cumulative distribution functions instead of their densities. Linear pooling of cumulative distribution functions is equivalent to taking the linear combination of the densities.

$$F_{t+h}(y_{t+h}|I_t) = \sum_{i=1}^{N} w_i F_{t+h,i}(y_{t+h}|I_t)$$

From a graphical point of view we can also call this a vertical combination of distributions, as it adds the probabilities in a vertical way. One can also combine horizontally by using the quantiles of the distribution. Quantiles are cut points dividing the range of a probability distribution into continuous intervals with equal probabilities. Given a set of quantile forecasts $Q_{t+h,1}, \dots, Q_{t+h,N}$, a linear combination can be constructed in the following way

$$Q^c_{t+h} = \sum_{i=1}^{N} w_i Q_{t+h,i} \qquad (3)$$

with constraint $\sum_{i=1}^{N} w_i = 1$. Figure 2(b) is a graphical representation of a combination of quantiles, where the $P$th quantile $Q_P$ is the inverse of the normal cdf evaluated at $P$, denoted by $\Phi^{-1}(P)$. The distribution parameters are the same as in Figure 2(a).
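As an illustration of the horizontal combination in equation (3), the sketch below (again not from the thesis; it reuses the assumed N(0,1) and N(6,4) components) averages the quantile functions of the two distributions and contrasts the result with the vertical (linear) pooling of the cdfs.

```python
# Illustrative sketch, not from the thesis: equally weighted quantile
# ("horizontal") combination of N(0,1) and N(6,4), as in equation (3).
import numpy as np
from scipy import stats

p = np.linspace(0.001, 0.999, 999)                 # probability grid
q1 = stats.norm.ppf(p, loc=0, scale=1)             # quantile function of N(0,1)
q2 = stats.norm.ppf(p, loc=6, scale=np.sqrt(4))    # quantile function of N(6,4)
q_comb = 0.5 * q1 + 0.5 * q2                       # equation (3) with equal weights

# The pairs (q_comb, p) trace the cdf of the quantile combination; the
# vertical (linear) pool of the cdfs evaluated at the same points differs:
F_vertical = 0.5 * stats.norm.cdf(q_comb, 0, 1) + 0.5 * stats.norm.cdf(q_comb, 6, np.sqrt(4))

print("5% quantile of the quantile combination:", np.interp(0.05, p, q_comb))
```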

Summary of combinations

Figure 2: Individual normal cdf's, N(0,1) (blue) and N(6,4) (red), and their combinations. Panel (a): the equally weighted linear combination (orange); panel (b): the equally weighted combination of quantiles (orange).

The orange line in Figure 2(a) represents the linear combination as in (1), where two normal cumulative distribution functions are combined with equal weights. As before, the blue and the red line have a mean of zero and a mean of six respectively, and a variance equal to one and four respectively. The orange lines in Figure 1(a) and 2(a) are related in the sense that the orange line in 2(a) is the cdf of the orange line in 1(a), and hence they give the same prediction. This is true because it is calculated as an integral of a linear combination, which is equal to the linear combination of the individual integrals:

$$\int \sum_i w_i p_i(x)\,dx = \sum_i w_i \int p_i(x)\,dx$$

To compare different techniques of combining, we can calculate the cumulative distribution functions of the density combinations. The cdf of the logarithmic combination in equation (2) is defined as follows:

$$F^l_{t+h}(y_{t+h}) = \int_{-\infty}^{y_{t+h}} f^l_{t+h}(y)\,dy \qquad (4)$$

Figure 3: Probability functions of two normal distributions, N(0,1) and N(6,4), and their equally weighted linear, logarithmic and quantile combinations.

Figure 3 represents the cdf's of the three different combinations as well as the individual normal cdf's. Now we have three different ways of combining models to forecast probabilities. An agent can make a forecast by (i) taking a linear combination of densities, (ii) taking a logarithmic combination of densities and (iii) taking a linear combination of quantiles of the cumulative distribution function. The next section discusses several forecast methods that are used in the empirical analysis.

4 Volatility models

Implementation and hence evaluation of Value-at-Risk depends on assumptions on daily returns. These returns can be modeled using several methods, each with different underlying assumptions. Asset returns usually do not have simple normal distribution features but are characterized by skewness, fat-tailedness and peakedness. Many methods are available to model the returns and make forecasts. For the empirical analysis in this thesis, conditional volatility models are used. Models of conditional heteroskedasticity for time series have a very important role in today's financial risk management and its attempts to make financial decisions on the basis of observed asset price data in discrete time. These models are often used in VaR estimation. The methods can be framed in the following general set-up:

$$a_t = \sigma_t \epsilon_t \qquad (5)$$

$$\epsilon_t \sim D(0, 1) \qquad (6)$$

where $a_t = y_t - \mu_t$ is the return series of the asset minus its expectation conditional on the return history $I_{t-1}$. The demeaned return $a_t$ has the property $E(a_t|I_{t-1}) = E(y_t|I_{t-1}) - \mu = 0$ where $\mu = E(y_t)$. The conditional variance in the model is $Var(y_t|I_{t-1}) = E[(y_t - \mu)^2|I_{t-1}] = E[a_t^2|I_{t-1}] = Var(a_t|I_{t-1})$, and $\epsilon_t$ is the standardized unexpected return following a distribution with mean zero and variance equal to one. Several conditional distributions are considered in the empirical analysis. They are discussed later.

Conditional volatility models

The first model considered for the conditional variance $\sigma_t^2$ is the widely used GARCH(1,1) model proposed by Bollerslev (1986). The model is given by

$$\sigma_t^2 = \alpha_0 + \alpha_1 a_{t-1}^2 + \beta_1 \sigma_{t-1}^2$$

where positivity of $\sigma_t^2$ requires $\alpha_0 > 0$, $\alpha_1 \geq 0$ and $\beta_1 \geq 0$. The model is stationary if $\alpha_1 + \beta_1 < 1$, and in that case has unconditional variance $\sigma^2 = \alpha_0 / (1 - \alpha_1 - \beta_1)$.


In the GARCH(1,1) model the news impact curve (NIC) is a parabola and is symmetric around its minimum at $a_t = 0$.¹ This means that a positive shock has the same effect on $\sigma_{t+1}^2$ as a negative shock of the same size. In practice this is unrealistic: a large negative shock, like a stock market crash, is expected to increase volatility more than a large positive shock. This is also known as the leverage effect. A large negative return shock leads to a decrease in equity value, hence an increase in the debt-to-equity ratio (leverage), and hence a more volatile return on equity. A model that deals with this is the exponential GARCH (EGARCH) model of Nelson (1991).

The EGARCH(1,1) specifies the log of the conditional variance as

$$\log \sigma_t^2 = (1 - \alpha_1)\alpha_0 + \theta \epsilon_{t-1} + \gamma\left[|\epsilon_{t-1}| - E(|\epsilon_{t-1}|)\right] + \alpha_1 \log \sigma_{t-1}^2,$$

with $\epsilon_t = a_t/\sigma_t$ as before. There is no need to impose restrictions, since $\sigma_t^2$ is positive for all parameter values. There is a leverage effect if $\theta < 0$.

The parameters of the GARCH models and their extensions are estimated by maximum likelihood. The distribution of $\epsilon_t$ corresponds with the conditional distribution of the returns, $y_t|I_{t-1} \sim D(\mu, \sigma_t^2)$, where we consider a normal distribution, since it is so widely used, and a Student-t distribution because of its fat tails.

¹ The news impact curve is the effect of $a_t$ on $\sigma_{t+1}^2$, keeping $\sigma_t^2$ and the past fixed. For GARCH(1,1), given $\sigma_t^2 = \sigma^2$, the NIC is $\alpha_0 + \beta_1 \sigma^2 + \alpha_1 a_t^2$.
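To make the filtering and estimation step concrete, the following sketch (not part of the thesis; the initialisation at the sample variance, the optimiser and the simulated placeholder returns are illustrative assumptions) fits a GARCH(1,1) with Gaussian innovations to a demeaned return series by maximum likelihood and produces a one-day-ahead variance forecast. The EGARCH and Student-t variants follow the same pattern with a different recursion and likelihood.

```python
# Illustrative sketch, not from the thesis: GARCH(1,1) filtering and Gaussian
# (quasi-)maximum likelihood for a demeaned return series a_t, following
# sigma_t^2 = alpha0 + alpha1 * a_{t-1}^2 + beta1 * sigma_{t-1}^2.
import numpy as np
from scipy.optimize import minimize

def garch11_filter(params, a):
    """Conditional variance series implied by a GARCH(1,1) model."""
    alpha0, alpha1, beta1 = params
    sigma2 = np.empty_like(a)
    sigma2[0] = a.var()                       # start from the sample variance
    for t in range(1, len(a)):
        sigma2[t] = alpha0 + alpha1 * a[t - 1] ** 2 + beta1 * sigma2[t - 1]
    return sigma2

def neg_gaussian_loglik(params, a):
    """Negative Gaussian log-likelihood of the demeaned returns."""
    sigma2 = garch11_filter(params, a)
    return 0.5 * np.sum(np.log(2 * np.pi * sigma2) + a ** 2 / sigma2)

rng = np.random.default_rng(0)
a = rng.standard_normal(2500)                 # placeholder for demeaned index returns
res = minimize(neg_gaussian_loglik, x0=[0.05, 0.05, 0.90], args=(a,),
               bounds=[(1e-8, None), (0.0, 1.0), (0.0, 1.0)], method="L-BFGS-B")
alpha0, alpha1, beta1 = res.x

# One-day-ahead variance forecast, used later for the VaR in equation (10)
sigma2_hist = garch11_filter(res.x, a)
sigma2_next = alpha0 + alpha1 * a[-1] ** 2 + beta1 * sigma2_hist[-1]
print(res.x, sigma2_next)
```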


5 Weight determination and model assessment

This section describes how the weights are calculated for the individual methods and how the forecast combinations are evaluated in terms of accuracy and performance. The performance of the different weighting schemes is not the main focus of this study but it serves more as a robustness check.

Obtaining weights

We first evaluate the equally weighted combinations with $w_i = 1/N$, $i = 1, \dots, N$.

As a robustness check, we evaluate the models with another weighting scheme. The individual weights are calculated as follows. As in Opschoor et al. (2016), the log scoring rule is used.² This function takes the logarithm of the density forecast evaluated at $y_t$. For method $M_i$ the score is given by

$$S_i(y_t, M_i) = \log f_t(y_t|I_t, M_i) \qquad (7)$$

which can be viewed as a goodness-of-fit measure. Since we proposed different ways of combining the densities in Section 3, we also have different scores for the forecast combination. The weights at time T are obtained for the individual densities by maximizing the sum of the log score of the combination, where the sum is taken over some time period $\tau$. For the simple linear opinion pool as in (1), that means maximizing

$$S(Y_{T_2}, C^p) = \sum_{t \in T_2} \log \left[ \sum_{i=1}^{N} w_i f_{t,i}(y_t|I_t) \right] \qquad (8)$$

where $Y_{T_2}$ is the set of realizations in $T_2$ and $T_2$ is the second observation set, which is explained later. $C^p$ indicates the evaluation of the linear pooled combination. We constrain the weights to sum to one and to be nonnegative, that is, $\sum_{i=1}^{N} w_i = 1$ and $w_i \geq 0 \ \forall i$. For the logarithmic combination as in (2), the maximization problem (8) becomes

$$S(Y_{T_2}, C^l) = \sum_{t \in T_2} \left[ \sum_{i=1}^{N} w_i \log f_{t,i}(y_t|I_t) - \log\left( \int \prod_{i=1}^{N} f_{t,i}^{w_i}(y_t|I_t)\,dy \right) \right]. \qquad (9)$$

For computational reasons we use the weights of (8) for the quantile combination in the empirical analysis.
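A minimal sketch of the weight optimisation in equation (8) is given below (not from the thesis; the placeholder density matrix and the use of scipy's SLSQP solver are assumptions for illustration). The logarithmic-pool objective in (9) could be optimised in the same way after adding a numerical evaluation of the normalising integral.

```python
# Illustrative sketch, not from the thesis: combination weights chosen by
# maximizing the summed log score of the linear opinion pool, equation (8).
# `dens` (T2 x N) holds the individual predictive densities evaluated at the
# realisations y_t; here it is filled with placeholder values.
import numpy as np
from scipy.optimize import minimize

def neg_log_score(w, dens):
    """Negative summed log score of the linear pool with weights w."""
    pool = dens @ w                            # weighted sum of densities per day
    return -np.sum(np.log(pool))

def optimal_pool_weights(dens):
    n = dens.shape[1]
    w0 = np.full(n, 1.0 / n)                   # start from equal weights
    res = minimize(neg_log_score, w0, args=(dens,),
                   bounds=[(0.0, 1.0)] * n,    # w_i >= 0
                   constraints=[{"type": "eq", "fun": lambda w: w.sum() - 1.0}],
                   method="SLSQP")
    return res.x

rng = np.random.default_rng(1)
dens = np.abs(rng.normal(0.3, 0.05, size=(1000, 2)))   # placeholder densities
print(optimal_pool_weights(dens))
```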

Backtesting Value at Risk

To assess the accuracy of the VaR estimates we use the Conditional Coverage (CC) test of Christoffersen (1998). This tests whether the actual number of violations is in line with the expected number of VaR violations. Additionally, the VaR exceedances should not occur in clusters. We can test for both separately. The first test is in fact the proportion of failures (POF) test introduced by Kupiec (1995).

We apply these tests to the 1-day VaR estimates. Let the 1-day return VaR estimate (conditional on $I_t$) with confidence level $q = 1 - p$ be

$$VaR_t^q = \sigma_{t+1} z_q \qquad (10)$$

where $z_q = \inf\{z \in \mathbb{R} : F(z) \geq q\}$ and $\sigma_{t+1}$ is the forecasted volatility. We estimate both for $q = 0.99$ and $q = 0.95$. There is no mean variable in (10), as it applies to the demeaned return of an individual model. In the following section we explain how this is applied to combined distributions.
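As a small numerical illustration of equation (10) for an individual model (not from the thesis; the volatility forecast and the degrees of freedom are placeholder values), the sketch below computes the 1-day VaR under a normal and a standardised Student-t innovation distribution. It uses the lower-tail quantile, so that a violation corresponds to a return falling below the VaR estimate, consistent with the hit indicator defined next.

```python
# Illustrative sketch, not from the thesis: 1-day VaR of an individual model,
# VaR_t^q = sigma_{t+1} * z_q as in equation (10), for normal and standardised
# Student-t innovations. sigma_next and nu are placeholder values.
import numpy as np
from scipy import stats

def var_estimate(sigma_next, q=0.99, dist="normal", nu=6):
    """1-day return VaR with confidence level q (a negative number here)."""
    if dist == "normal":
        z = stats.norm.ppf(1 - q)                          # e.g. -2.326 for q = 0.99
    else:
        # Student-t quantile rescaled to unit variance (standardised t)
        z = stats.t.ppf(1 - q, df=nu) * np.sqrt((nu - 2) / nu)
    return sigma_next * z

sigma_next = 1.2   # placeholder one-day-ahead volatility forecast (in percent)
print(var_estimate(sigma_next, q=0.99), var_estimate(sigma_next, q=0.99, dist="t"))
```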

Let $x_t^q = VaR_t^q$ and define hits as $I_t = 1\{a_{t+1} < x_t^q\}$. Christoffersen (1998) proposed a number of likelihood ratio tests, based on a Markov chain alternative. Assume

$$Pr(I_t = 1 | I_{t-1} = 0) = p_0, \quad Pr(I_t = 1 | I_{t-1} = 1) = p_1, \quad \text{and} \quad Pr(I_t = 1) = p.$$

Based on these we can test for serial independence of $\{I_t\}$ with a likelihood ratio test for $H_0: p_1 = p_0$. In addition, we can test for the proportion of failures $E(I_t|\mathcal{F}_{t-1}) = 1 - q$ with a likelihood ratio test for $H_0: p = 1 - q$.

The test statistic for independence is given by

$$LR_{CCI} = -2 \log \left( \frac{(1 - p)^{n_{00}+n_{10}} \, p^{n_{01}+n_{11}}}{(1 - p_0)^{n_{00}} \, p_0^{n_{01}} \, (1 - p_1)^{n_{10}} \, p_1^{n_{11}}} \right)$$

with $n_{ij}$ the number of pairs $(I_{t-1}, I_t) = (i, j)$; e.g. $n_{01}$ denotes the number of periods with no failure followed by a period with a failure. The test statistic is asymptotically distributed as chi-squared with one degree of freedom.

The POF test works with the binomial distribution approach. It uses a likelihood ratio to test whether the probability of exceptions is synchronized with the probability p implied by the VaR confidence level. If the data suggest that the probability of exceptions is different from p, the VaR model is rejected. The POF test statistic is given by

$$LR_{POF} = -2 \log \left( \frac{(1 - p)^{N - x} \, p^{x}}{\left(1 - \frac{x}{N}\right)^{N - x} \left(\frac{x}{N}\right)^{x}} \right)$$

where $x$ is the number of failures, $N$ the number of observations and $p = 1 - $ VaR level. This statistic has a chi-squared distribution with one degree of freedom.

The above two tests combined give the conditional coverage test of $H_0: p_0 = p_1 = p$ with statistic

$$LR_{CC} = LR_{CCI} + LR_{POF}.$$

This test is asymptotically distributed as chi-squared with two degrees of freedom.
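The sketch below (not from the thesis; the simulated hit series and helper names are illustrative) computes the Kupiec POF statistic, the Christoffersen independence statistic and their sum, the conditional coverage statistic, for a sequence of VaR hits.

```python
# Illustrative sketch, not from the thesis: Kupiec POF, Christoffersen
# independence and the combined conditional coverage statistic for a hit
# series (1 = return below the VaR estimate). The simulated hits are a
# placeholder; the series is assumed to contain at least one violation.
import numpy as np
from scipy import stats
from scipy.special import xlogy   # xlogy(0, 0) = 0, avoids log(0) issues

def kupiec_pof(hits, p):
    """LR_POF: test of the unconditional failure probability against p."""
    N, x = len(hits), int(np.sum(hits))
    pi_hat = x / N
    log_l0 = xlogy(N - x, 1 - p) + xlogy(x, p)
    log_l1 = xlogy(N - x, 1 - pi_hat) + xlogy(x, pi_hat)
    lr = -2 * (log_l0 - log_l1)
    return lr, 1 - stats.chi2.cdf(lr, df=1)

def christoffersen_independence(hits):
    """LR_CCI: serial independence of hits against a first-order Markov chain."""
    h = np.asarray(hits, dtype=int)
    n = {(i, j): int(np.sum((h[:-1] == i) & (h[1:] == j))) for i in (0, 1) for j in (0, 1)}
    p0 = n[(0, 1)] / (n[(0, 0)] + n[(0, 1)])
    p1 = n[(1, 1)] / (n[(1, 0)] + n[(1, 1)])
    p = (n[(0, 1)] + n[(1, 1)]) / (len(h) - 1)
    log_l0 = xlogy(n[(0, 0)] + n[(1, 0)], 1 - p) + xlogy(n[(0, 1)] + n[(1, 1)], p)
    log_l1 = (xlogy(n[(0, 0)], 1 - p0) + xlogy(n[(0, 1)], p0)
              + xlogy(n[(1, 0)], 1 - p1) + xlogy(n[(1, 1)], p1))
    lr = -2 * (log_l0 - log_l1)
    return lr, 1 - stats.chi2.cdf(lr, df=1)

rng = np.random.default_rng(2)
hits = rng.random(1000) < 0.012                    # placeholder 1% VaR hit series
lr_pof, p_pof = kupiec_pof(hits, p=0.01)
lr_cci, p_cci = christoffersen_independence(hits)
lr_cc = lr_pof + lr_cci                            # conditional coverage statistic
print(p_pof, p_cci, 1 - stats.chi2.cdf(lr_cc, df=2))
```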

6 Empirical analysis

Data

We analyse the added value of combination techniques using the forecast methods discussed in Section 4. For the empirical analysis we use data from four major stock indices. Daily returns on the AEX, the FTSE 100, the S&P500 and the NASDAQ index are used to evaluate the forecast combinations. Our sample starts on January 3, 2005 and ends on May 25, 2017. We delete the days on which the exchange is closed from the sample. Around 3000 observations for each index are used in the analysis. Figure 4 shows the development of the price of the four indices with the first observation in the sample as the benchmark (=100). The AEX and the FTSE 100 show a similar pattern, while the NASDAQ seems to be much more volatile. The effect of the financial crisis of 2007 on all the indices is clearly visible.


Figure 4: Price development of the four stock indices (AEX, FTSE, NASDAQ, S&P500), indexed to 100 at the first observation.

Implementation

For each index we have around three thousand observations. The full set of observations $\{T\}$ is split into three subsets denoted by $\{T_1\}$, $\{T_2\}$ and $\{T_3\}$. On the first thousand observations, denoted by $\{T_1\}$, the parameters of the individual volatility models are estimated. Using these estimated parameters $\hat{\theta}$, the individual one-day-ahead predictive densities are constructed for $\{T_2\}$. Hence for each model we have a thousand density forecasts. The second thousand observations $\{T_2\}$ are used to calculate the weights as in (8) and (9). After we have obtained the weights we again estimate the models and calculate the 1-day ahead VaR estimates in $\{T_3\}$ using simulation techniques. On the last set of observations, $\{T_3\}$, we test the accuracy of the 1-day VaR estimates of the different combinations by applying the Conditional Coverage test of Christoffersen (1998) explained in the previous section.

In total four models are estimated: the GARCH(1,1) and the EGARCH(1,1) model, both with a standard normal and a standardised Student's t conditional distribution, estimated using maximum likelihood. Using the estimated parameters we construct 1-day ahead conditional forecasts for the volatility, i.e. for the GARCH(1,1)-n model we have

$$\sigma_h^2 \,|\, \mathcal{F}_{h-1} = \hat{\alpha}_0 + \hat{\alpha}_1 a_{h-1}^2 + \hat{\beta}_1 \sigma_{h-1}^2 \quad \forall h \in \{T_3\}$$

where we use the unconditional variance for the first observation of $\sigma_{h-1}^2$. Figure 5 shows the series of the 1-day return predictive variances in $\{T_3\}$ of the FTSE 100 for each individual model. The four conditional models follow the same pattern, since the forecasted volatility in these models depends heavily on the return shock $a_t$.

Figure 5: 1-day ahead forecasted variances in $\{T_3\}$ for the FTSE 100, estimated by a GARCH and an EGARCH model, both with a normal and a t-distribution as conditional distribution (GARCH-n, GARCH-t, EGARCH-n, EGARCH-t).

As a graphical example, Figure 6 represents the predictive densities of the returns of the FTSE 100 on day 150 in $\{T_3\}$ for each model, including the equally weighted linear combination and the equally weighted logarithmic combination. One can see that the logarithmic combination is slightly more peaked than the linear combination. This is only an example of one day in the sample; other days may have individual densities that do not resemble each other as much as in this example.

The 1-day VaR as in (10) cannot be applied directly, as our predictive distribution is a combination of individual distributions. Since calculating the exact quantile from a user-defined distribution is computationally heavy, we use simulation techniques to overcome this issue.³

³ For each observation in the forecast sample we draw 100,000 random numbers from the combined (user-defined) density and calculate the corresponding VaR estimates using the Matlab function quantile(X, p).
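In the same spirit as the footnote above, a simulation sketch is shown below (not from the thesis; it replaces Matlab's quantile(X, p) with numpy and assumes an equally weighted linear pool of a normal and a standardised Student-t predictive distribution as the combined density). For the logarithmic pool, one could instead evaluate the combined density on a grid and invert its cdf numerically.

```python
# Illustrative sketch, not from the thesis: simulation-based VaR of a combined
# predictive distribution. The combined density is assumed to be an equally
# weighted linear pool of a normal (GARCH-n style) and a standardised
# Student-t (GARCH-t style) one-day-ahead forecast.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n_draws = 100_000
weights = np.array([0.5, 0.5])                     # pool weights
sigma_next = np.array([1.1, 1.3])                  # assumed 1-day volatility forecasts
nu = 6                                             # assumed Student-t degrees of freedom

# Draw from the linear pool: pick a component with probability w_i,
# then draw from that component's predictive distribution.
component = rng.choice(len(weights), size=n_draws, p=weights)
normal_draws = rng.normal(0.0, sigma_next[0], n_draws)
t_draws = stats.t.rvs(df=nu, size=n_draws, random_state=rng) * np.sqrt((nu - 2) / nu) * sigma_next[1]
draws = np.where(component == 0, normal_draws, t_draws)

# 1-day 99% and 95% VaR as empirical lower-tail quantiles of the simulated returns
var_99, var_95 = np.quantile(draws, [0.01, 0.05])
print(var_99, var_95)
```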


Figure 6: 1-day predictive densities of the 150th day in $\{T_3\}$ for the FTSE 100, estimated by the individual models (GARCH-n, GARCH-t, EGARCH-n, EGARCH-t) and the equally weighted linear and logarithmic combinations.

Results

To assess the added value of choosing a combination that is not the linear opinion pool, we first test whether the combinations perform better than the individual models. Since this is not the main point of this study, we only apply this to one index. In Table 1 we can see how the individual volatility models perform in terms of 1-day return VaR estimation. For the 1-day 99% VaR estimates we see that both individual models that assume a normal conditional distribution are rejected in terms of predicting the correct number of VaR violations. For stock returns, it has often been shown in practice that a Student's t-distribution is a better choice for the error innovations than a normal distribution, due to its fat tails. However, looking at the 1-day 95% VaR estimates, the GARCH-n and EGARCH-n models do perform well and are rejected by neither the serial independence test nor the POF test.

From Table 1 we can only conclude that the combinations perform better than the individual models with a normal distribution assumption for the 1-day 99% VaR prediction. We cannot generally conclude that taking a combination outperforms the individual models for VaR estimation. However, since none of the combinations is rejected by the two tests, we can conclude that the combinations are good alternatives for predicting VaR measures.

Table 2 represents the main outcome of our study. It shows the evaluation of the 1-day 99% and 95% VaR estimates of the equally weighted combinations of densities for the four analysed indices. The observation set on which the estimates are backtested is $\{T_3\}$. For each stock index, the three combinations are evaluated and tested. We do not include p-values for the combined CC test since we find, for our case, that combinations are rejected by the CC test if they are rejected by one or both individual parts of the test, i.e. the POF test and the serial independence test. Bold numbers denote p-values above the 5% level. The first and fourth column for each index represent the number of violations and the corresponding percentage in parentheses. For each index we estimate 1000 VaRs.


Table 1: Evaluation of 1-day ahead VaR estimates of the FTSE 100 for individual models and their equally weighted combinations

FTSE 100
Model        VaR0.99  pCCI   pPOF   VaR0.95  pCCI   pPOF
GARCH-n      21       0.459  0.002  47       0.880  0.660
GARCH-t      12       0.130  0.538  50       0.730  1
EGARCH-n     22       0.505  0.001  49       0.286  0.884
EGARCH-t     14       0.186  0.231  51       0.682  0.885
Linear       10       0.653  1      43       0.909  0.299
Quantile     12       0.589  0.538  43       0.909  0.299
Logarithmic  10       0.501  1      49       0.336  0.901

Note: This table provides the accuracy of 1-day ahead 99% and 95% VaR estimates of the daily return of the FTSE 100 index. The columns represent the number of violations for the 99% and 95% VaR estimates and the p-values of the individual independence test (CCI) and the proportion of failures (POF) test. Bold numbers represent those models which have a p-value associated with the CCI and POF test above 5%.


Table 2: Evaluation of 1-day ahead VaR estimates of equally weighted combinations

             AEX                                          FTSE 100
Combination  V.99(%)   pCCI   pPOF   V.95(%)   pCCI   pPOF   V.99(%)   pCCI   pPOF   V.95(%)   pCCI   pPOF
Linear       6 (0.6)   0.788  0.170  39 (3.9)  0.640  0.097  10 (1.0)  0.653  1      43 (4.3)  0.909  0.299
Quantile     7 (0.7)   0.753  0.313  38 (3.8)  0.685  0.070  12 (1.2)  0.589  0.538  43 (4.3)  0.909  0.299
Logarithmic  9 (0.9)   0.413  0.863  42 (4.2)  0.502  0.253  10 (1.0)  0.653  1      49 (4.9)  0.336  0.901

             NASDAQ                                       S&P500
Combination  V.99(%)   pCCI   pPOF   V.95(%)   pCCI   pPOF   V.99(%)   pCCI   pPOF   V.95(%)   pCCI   pPOF
Linear       6 (0.6)   0.025  0.170  31 (3.1)  0.968  0.003  3 (0.3)   0.893  0.009  32 (3.2)  0.097  0.005
Quantile     4 (0.4)   0.858  0.030  20 (2.0)  0.414  0.000  4 (0.4)   0.857  0.030  27 (2.7)  0.037  0.000
Logarithmic  7 (0.7)   0.872  0.313  32 (3.2)  0.826  0.005  8 (0.8)   0.536  0.632  34 (3.4)  0.347  0.038

Note: This table provides the accuracy of 1-day ahead 99% and 95% VaR estimates of the daily returns of four stock indices. The columns represent the number of violations for the 99% and 95% VaR estimates with corresponding percentages (of the total number of observations) and the p-values of the individual independence test (CCI) and the proportion of failures (POF) test. Bold numbers represent those models which have a p-value associated with the CCI and POF test above 5%.


For the AEX index, we do not reject that the linear, the quantile and the logarithmic combination give accurate 1-day return 99% and 95% VaR estimates. The same is true for the FTSE 100 index. The logarithmic combination seems to predict the number of violations better than the other two combinations. However, note that we cannot conclude from this that the logarithmic combination is the better one. For the NASDAQ, both the linear and the quantile combination are rejected in either the serial independence test or the POF test. This means that they are also rejected by the CC test. In most cases they are rejected because the predicted number of violations is incorrect. For the S&P500, we draw the same conclusions for the linear and the quantile combination. In terms of 99% VaR estimate accuracy, the logarithmic combination is not rejected for either the NASDAQ or the S&P500. We can conclude that this combination performs better than the other two combinations. However, if we look at the 95% VaR estimate accuracy, we also reject the logarithmic combination due to an incorrect number of predicted VaR exceedances.

For all four indices we see that the number of VaR violations of the logarithmic combination is larger than that of the linear combination. This can be explained by the fact that the logarithmic combination is less dispersed and hence predicts a lower VaR, so violations may occur more often.

As a robustness check, the same analysis is done on the forecast combinations, but instead of using equal weights we now assign the weights by maximizing the log score of the combined density. The outcome of the analysis can be found in Table 3. We can immediately see that combinations based on this weighting scheme are rejected more often by the CC test than the equally weighted ones. For the AEX index, again the logarithmic combination for the 99% VaR is not rejected. For the FTSE 100 index we find similar results to the equally weighted scheme, as none of the estimations are rejected. For the NASDAQ stock market index we do not reject the 95% VaR estimates for any of the three combinations; for the 99% VaR, we only do not reject the logarithmic combination scheme. For the S&P 500 the log-score weighting scheme performs poorly, as we reject all three combinations.

Based on Table 2 and Table 3 we can conclude that the logarithmic combination scheme is a good alternative to the linear combination, as it performed better in some cases and did not perform worse in any case. We did not find any performance difference between the linear and the quantile combination schemes.

7 Conclusion

The added value of combining models to improve forecasts has been confirmed by many empirical studies. Using daily returns on four stock market indices, we applied a linear, a logarithmic and a quantile combination to univariate volatility models of the GARCH class. We found that the decision on how to combine the models can lead to diverse outcomes.

When using forecast combinations, the results show that taking a logarithmic combination of densities is a good alternative to taking a linear combination. The logarithmic combination performed better than the linear and the quantile combination when estimating 1-day 99% VaRs for the NASDAQ and the S&P500 stock market indices. The logarithmic method produces more VaR violations than the linear method. This can be explained by the fact that the logarithmic density is less dispersed than the linear density and hence has smaller VaR estimates and more violations. For both the AEX and the FTSE stock market indices all combinations perform well: the linear, quantile and logarithmic combination methods are not rejected by the CC test (Christoffersen, 1998) for either the 1-day 99% or the 95% VaR estimate evaluation.

Table 3: Evaluation of 1-day ahead VaR estimates of combinations weighted using the log scoring rule

             AEX                                          FTSE 100
Combination  V.99(%)   pCCI   pPOF   V.95(%)   pCCI   pPOF   V.99(%)   pCCI   pPOF   V.95(%)   pCCI   pPOF
Linear       8 (0.8)   0.050  0.510  33 (3.3)  0.416  0.009  7 (0.7)   0.753  0.314  41 (4.1)  0.555  0.178
Quantile     8 (0.8)   0.050  0.510  31 (3.1)  0.336  0.003  9 (0.9)   0.066  0.747  42 (4.2)  0.515  0.233
Logarithmic  9 (0.9)   0.132  0.863  35 (3.5)  0.361  0.041  10 (1.0)  0.101  1      45 (4.5)  0.431  0.461

             NASDAQ                                       S&P500
Combination  V.99(%)   pCCI   pPOF   V.95(%)   pCCI   pPOF   V.99(%)   pCCI   pPOF   V.95(%)   pCCI   pPOF
Linear       7 (0.7)   0.036  0.314  45 (4.5)  0.502  0.461  2 (0.2)   0.929  0.002  19 (1.9)  0.049  0.000
Quantile     6 (0.6)   0.025  0.170  39 (3.9)  0.640  0.097  3 (0.3)   0.004  0.009  19 (1.9)  0.049  0.000
Logarithmic  8 (0.8)   0.051  0.510  44 (4.4)  0.515  0.300  4 (0.4)   0.031  0.030  22 (2.2)  0.027  0.000

Note: This table provides the accuracy of 1-day ahead 99% and 95% VaR estimates of the daily returns of four stock indices. The columns represent the number of violations for the 99% and 95% VaR estimates with corresponding percentages (of the total number of observations) and the p-values of the individual independence test (CCI) and the proportion of failures (POF) test. Bold numbers represent those models which have a p-value associated with the CCI and POF test above 5%.

The above conclusions are based primarily on the assessment with equal weights. The log-score weighting scheme did not yield very different results, apart from performing worse in some cases. However, the logarithmic combination seems the best alternative for this weighting scheme as well. In our empirical analysis we saw that almost all models and schemes produced fewer VaR violations than expected. This can be explained by the fact that markets showed extremely low volatility over the past five years (on which the estimates are backtested). If we were to do the analysis on another time period we might find different results, and the logarithmic combination might not be the best alternative anymore.

Further research can be done using additional volatility models from classes other than the GARCH class, such as Generalized Autoregressive Score (GAS) models (Creal et al., 2013) or Realized GARCH models. Using stochastic volatility models or models from extreme value theory might also lead to interesting results. In addition, one can also assume other conditional distributions for the volatility models, such as the generalized error distribution.


References

Marco Aiolfi and Allan Timmermann. Persistence in forecasting performance and conditional combination strategies. Journal of Econometrics, 135:31–53, 2006.

T. Bollerslev. Generalized autoregressive conditional heteroskedasticity. Journal of Econometrics, 31:307–327, 1986.

Peter F. Christoffersen. Evaluating interval forecasts. International Economic Review, 39:841–862, 1998.

Robert T. Clemen and Robert L. Winkler. Combining economic forecasts. Journal of Business & Economic Statistics, 4(1):39–46, 1986.

C. Diks, V. Panchenko, and D. van Dijk. Likelihood-based scoring rules for comparing density forecasts in tails. Journal of Econometrics, 163:215–230, 2011.

Christian Genest and James V. Zidek. Combining probability distributions: A critique and an annotated bibliography. Statistical Science, 1(1):114–148, 1986.

Lawrence R. Glosten, Ravi Jagannathan, and David E. Runkle. On the relation between the expected value and the volatility of the nominal excess return on stocks. The Journal of Finance, 48:1779–1801, 1993.

T. Gneiting and R. Ranjan. Comparing density forecasts using threshold- and quantile-weighted scoring rules. Journal of Business & Economic Statistics, 29:411–422, 2011.

Massimo Guidolin and Allan Timmermann. Forecasts of US short-term interest rates: A flexible forecast combination approach. Journal of Econometrics, 150:297–311, 2009.

A.S. Jore, J. Mitchell, and S.P. Vahey. Combining forecast densities from VARs with uncertain instabilities. Journal of Applied Econometrics, 25:621–634, 2010.

Paul H. Kupiec. Techniques for verifying the accuracy of risk measurement models. The Journal of Derivatives, 3(2):73–84, 1995.

J. Mitchell and S.G. Hall. Evaluating, comparing and combining density forecasts using the KLIC with an application to the Bank of England and NIESR fan charts of inflation. Oxford Bulletin of Economics and Statistics, 67:995–1033, 2005.

D.B. Nelson. Conditional heteroskedasticity in asset returns: A new approach. Econometrica, 59:347–370, 1991.

Anne Opschoor, Dick van Dijk, and Michel van der Wel. Combining forecasts using focused scoring rules. Tinbergen Institute Discussion Paper 14-090/III, 2016.

James H. Stock and Mark W. Watson. Combination forecasts of output growth in a seven-country data set. Journal of Forecasting, 23:405–430, 2004.
