Academic year: 2021


EVALUATION AND COMPARISON OF UNIVARIATE AND MULTIVARIATE DENSITY FORECASTS IN TAILS FOR A PORTFOLIO OF COMMODITIES

M.M.A. Povel

MSc Financial Econometrics

Amsterdam School of Economics

Faculty of Economics and Business

Supervisor: prof. dr. C.G.H. Diks




ABSTRACT

This thesis investigates the effect of aggregation on the evaluation and comparison of competing density forecasts in the left tail of the distribution. Conditional and censored likelihood scoring rules are used to construct Diebold-Mariano type test statistics, which test for equal predictive accuracy. Monte Carlo simulations are performed to assess the size and power properties of the test, which were found to be satisfactory. In an empirical application, the daily returns on a portfolio of commodities are used to produce competing aggregated univariate density forecasts and multivariate density forecasts. The test statistics for the investigated univariate and multivariate forecast methods indicate that aggregation is possible. For competing density forecasts whose underlying models share the same mean vector, aggregation methods appear to yield an accurate measure of the downside risk.


"Declare the past, diagnose the present, foretell the future." — Hippocrates

ACKNOWLEDGMENTS

Firstly, I would like to thank my advisor Prof. Cees Diks for all his support and guidance. He was always there to answer my questions and guide me in the right direction.

I would also like to thank Hao Fang, who has spent a lot of time with me during the hours of simulation. Thanks to his help, I managed to solve several issues while coding. He was always willing to think about solutions, which I really appreciate.

I thank my girlfriend Eline for supporting me throughout my thesis. She helped me get through long nights, cooking for me and keeping me motivated.

Last but not least, I would like to thank my parents and my brothers for supporting me while writing this thesis and throughout life in general.


CONTENTS

1 Introduction
2 Theory
2.1 Density forecast evaluation
2.2 Scoring rules
2.3 Testing approach and null hypothesis
2.4 Kullback-Leibler Information Criterion
2.5 The logarithmic scoring rule
2.6 Weighted logarithmic scoring rules
2.7 Weighted likelihood scoring rules
2.8 Models
2.8.1 Vector Autoregression Model
2.8.2 The GARCH process
2.8.3 Multivariate DCC model
3 Monte Carlo study
3.1 Density forecasts in different dimensions
3.2 Scoring rules
3.3 Size
3.4 Power
4 Empirical application
4.1 Portfolio of commodities
4.2 Stationarity
4.3 Model selection
4.4 Simulation setup
4.5 Results
5 Conclusion
A Appendix
Bibliography


LIST OF FIGURES

Figure 1 Standardized Student-t(5) distribution shown in red, the standard normal distribution in blue.
Figure 2 Two shifted normal distributions that are equally far situated from the standard normal distribution; these can be used to construct a simulation setup to assess the size property for the univariate case.
Figure 3 One-sided rejection rates of the Diebold-Mariano test statistic of equal predictive accuracy when using the scoring rules Scsl and Scl, under weight function I(−r ≤ r_t^p ≤ r). Rejection rates, based on 1000 replications, against the alternative that the N(0.2, 1) distribution has better predictive accuracy.
Figure 4 One-sided rejection rates of the Diebold-Mariano test statistic of equal predictive accuracy when using the scoring rules Scsl and Scl, under weight function I(−r ≤ r_t^p ≤ r). Rejection rates, based on 1000 replications, against the alternative that the N(0.2, 1) distribution has better predictive accuracy.
Figure 5 Univariate power for sample size T = 500.
Figure 6 Multivariate power for sample size T = 1000.
Figure 7 Returns of the portfolio assets Gold, Silver, Platinum and Palladium.
Figure 8 Cointegration relation between assets.
Figure 9 Illustration of aggregation consequences for 2-dimensional densities; p is the true unknown density, f̂ and ĝ are competing density forecasts. The blue line represents the threshold.

LIST OF TABLES

Table 1 Output computed by egcitest.
Table 2 The different portfolio weights used for the empirical application; the β~i represent the portfolio weight vectors and βi stands for the weight asset i receives.
Table 3 Test statistics and p-values for the average univariate conditional likelihood score differences at significance level 5%.
Table 4 Test statistics and p-values for the average univariate censored likelihood score differences at significance level 5%.
Table 5 Test statistics and p-values for the average multivariate conditional likelihood score differences at significance level 5%.
Table 6 Test statistics and p-values for the average multivariate censored likelihood score differences at significance level 5%.

LISTINGS

Listing 1 The DCC model used as data generating process in the simulations, with the parameters described below.
Listing 2 Engle-Granger test for cointegration.
Listing 3 Construction of several VAR models and model specification tests in Matlab.
Listing 4 Construction of the density forecasts in Matlab using a moving window.
Listing 5 Continuation of Listing 4; construction of density forecasts.
Listing 6 Continuation of Listing 5; end of the moving window loop and construction of the scoring rules.
Listing 7 Calculating score differences and HAC estimate for variance.
Listing 8 Calculating test statistic and comparing with critical value.
Listing 9 Testing which model is preferred.


1 INTRODUCTION

Density forecasting has become an important tool for financial institutions over the last decades. At first, point forecasts were used to forecast returns. However, it did not take long before economists realized that point forecasts are not very informative, as they do not provide a measure of the forecast accuracy, as noted by Diebold et al. (1998). Density forecasts, by contrast, produce a full predictive distribution of the price, providing a complete measure of its uncertainty, as argued by Granger and Pesaran (2000).

Portfolio managers are constantly trading, balancing risk against expected return. They use density forecasts to measure risk. Many density forecasts can be made and different models can be used to estimate them. Selecting the best forecast method can be very difficult. There are several statistical tools to test the accuracy of the forecast density function. These techniques can be classified into two groups:

1. Testing the quality of an individual density forecast relative to the data generating process.

2. Comparing two or more competing density forecasts.

The tests for evaluating the accuracy of a density forecast from the first group are based on the probability integral transform (PIT), dating back to Rosenblatt (1952). Diebold et al. (1998) were among the first to propose methods for evaluating density forecasts based on the PIT: they evaluate a sequence of density forecasts by determining whether the PITs of the realizations of the variable with respect to the forecast densities are independent and identically distributed (i.i.d.) U(0, 1). Based on this hypothesis, formal tests can be constructed, such as the one by Berkowitz (2001). Corradi and Swanson (2003) constructed a test that allows for model misspecification.

These methods test the "goodness" of a given sequence of density forecasts, relative to the data-generating process. The econometric model used to produce the sequence of density forecasts is likely to be misspecified. Absolute testing will only reject or fail to reject a hypothesis; in case of rejection, no alternative model is presented.

In practice, tests of the second group, based on relative predictive accuracy, are preferred. These tests allow portfolio managers to decide which of two (or more) competing density forecasts describes the data better, given a measure of accuracy. Every model underlying a density forecast may be misspecified; therefore it is preferred to compare a set of candidate models based on the 'distance' of the models to the true, unknown data generating process. Contributions to this type of testing have been made by Sarno and Valente (2004), Mitchell and Hall (2005), Corradi and Swanson (2005), Amisano and Giacomini (2007) and Bao et al. (2004). The constructed test statistics compare the relative distance between the competing density forecasts and the true density, but the way of measuring this distance differs. Mitchell and Hall (2005), Amisano and Giacomini (2007) and Bao et al. (2004) developed tests of equal predictive ability based on out-of-sample Kullback-Leibler Information Criterion (KLIC) values to compare the predictive accuracy of competing density forecasts. The comparison based on KLIC values can be cast in terms of scoring rules, as done by Amisano and Giacomini (2007). Scoring rules are loss functions that depend on the density forecast and the actual observed data. The difference between the logarithmic scoring rules of two competing density forecasts is exactly their relative KLIC value.

A particular region of the density forecast is of great importance for financial institutions: an accurate description of the left tail of the distribution of the portfolio returns is needed to estimate the Value-at-Risk or the Expected Shortfall, which quantify the level of financial risk within an investment portfolio over a specific time frame. To compare density forecasts in a particular region, Bao et al. (2004) and Amisano and Giacomini (2007) considered using a likelihood ratio (LR) based on weighting the KLIC-type logarithmic scoring rule. Corradi and Swanson (2006), however, showed that the accuracy of density forecasts in a specific region cannot be measured directly using the KLIC. By construction, the weighted logarithmic scoring rule favors density forecasts with more probability mass in the region of interest. For instance, the t-distribution has more probability mass in the tails (fatter tails) than the normal distribution, so in this example the resulting tests of equal predictive ability will be biased towards the t-distribution.

Diks et al. (2011) considered the problem of the weighted logarithmic scoring rule favoring density forecasts with more probability mass in the region of interest. They showed that this problem can be solved by replacing the full likelihood with the conditional likelihood, given that the actual observation lies in the region of interest, or with the censored likelihood, in which observations outside the region of interest are censored. Again, these adjusted scoring rules can be interpreted in terms of Kullback-Leibler divergences between weighted versions of the density forecast and the actual density.

Rather than forecasting univariate distributions of asset returns, multivariate models are used to capture the dependence in the second-order moments of asset returns in a portfolio. In the multivariate case the same problem holds for the weighted logarithmic scoring rule. Diks et al. (2013) extended the solution of Diks et al. (2011) to the multivariate case: they compared the predictive accuracy of two competing density forecasts within a particular region of the copula support, and evaluated the size and power properties of the resulting test statistics using Monte Carlo simulations to verify their reliability. In this thesis we extend the research of Diks et al. (2011) and Diks et al. (2013). So far, there is no method for evaluating competing density forecasts of a portfolio of assets. Therefore, in this research we test the possibility of using aggregation methods to produce and evaluate competing density forecasts of portfolio returns.

Following Diks et al. (2011), a test can be constructed both for univariate and multivariate density forecasts. For both cases, the finite-sample properties of the test are assessed using Monte Carlo simulations. The simulation results show that the test properties are satisfactory for realistic sample sizes. Competing multivariate density forecasts and aggregated versions of these densities can therefore be tested. By doing so, we investigate whether aggregation is a feasible technique for evaluating competing density forecasts for portfolio returns.

As an empirical application, a portfolio is constructed containing four commodities: Gold, Palladium, Silver and Platinum. Over a period of 10 years, the daily prices of these commodities are used to identify the best forecasting method, both before and after aggregation. Results show that density forecasts that include spillover effects are preferred in both the multivariate and the aggregated case. According to model specification tests, the underlying model with spillover effects is also preferred. The influence of the portfolio weights and of the threshold value, which defines the region of interest, is also investigated.

The remainder of this thesis is organized as follows. Chapter 2 introduces density forecasting, discusses the different models used in this thesis, and presents the testing approach for the competing density forecasts, including the scoring rules introduced by Diks et al. (2011). In Chapter 3 the test properties are investigated by means of Monte Carlo simulations. In Chapter 4 the results of the empirical application are discussed. The conclusions are presented in Chapter 5.


2 THEORY

This thesis builds on the articles of Diks et al. (2011) and Diks et al. (2013). In this chapter we will explain the concept of density forecasting and discuss the models and testing approach used in this thesis.

2.1 Density forecast evaluation

Consider a vector-valued stochastic process {Z_t : Ω → R^(k+d)}, t = 1, …, T, defined on a complete probability space (Ω, F, P), and identify Z_t with (Y_t, X_t), where Y_t : Ω → R^d is the d-dimensional random variable of interest and X_t : Ω → R^k is a vector of predictors. The information set available at time t is defined as F_t = σ(Z_1, …, Z_t). In this thesis we consider several cases where two competing forecast methods are available, each producing a one-step-ahead forecast of Y_{t+1} using the information in F_t. We denote the density forecasts of Y_{t+1} by the probability density functions f̂_t(y) and ĝ_t(y), respectively, where f and g are measurable functions.

Amisano and Giacomini (2007) argued that it is better to compare 'forecast methods' instead of forecast models in the relative evaluation of density forecasts. Any econometric model used to produce the sequence of density forecasts is likely to be misspecified; it is therefore more relevant to use the approach of Amisano and Giacomini (2007). In their approach, the forecast method is defined as the set of choices that the forecaster makes at the time of the prediction, including the density model, the estimation method and the estimation window. Very few restrictions are imposed on the forecast methods: they must only fulfill the requirement that the density forecasts depend on a finite number R of most recent observations Z_{t−R+1}, …, Z_t. The density forecasts can therefore be produced by parametric, semi-parametric, nonparametric or Bayesian estimation methods. Another advantage of comparing forecast methods rather than models is that parameter estimation uncertainty can be treated as being part of the density forecast.

The use of a finite 'rolling window', as introduced by Fama and MacBeth (1973), affords considerable analytical convenience and permits a rich variety of possibilities. The asymptotic theory of tests of equal predictive accuracy is simplified considerably by the use of a finite rolling window, as demonstrated in Section 2.3. Another advantage of comparing forecast methods is that it permits a unified treatment of nested and non-nested models, whereas the tests of West (1996) are only applicable to non-nested models.

2.2 Scoring rules

This research uses scoring rules, put forward by Diebold and Lopez (1996), to evaluate and compare the relative performance of the one-step-ahead density forecasts f̂_t(y) and ĝ_t(y). In their research, they emphasize the small chance of ever stumbling upon a fully optimal forecast. A crucial object in measuring forecast accuracy is therefore the loss function L(y_{t+k}, ŷ_{t+k,t}), which charts the "loss", "cost" or "disutility" associated with various pairs of forecasts and realizations. In addition to the shape of the loss function, the forecast horizon k is also of crucial importance.

The appropriate loss function depends on the situation at hand. In our research, a scoring rule S*(f̂_t; y_{t+1}) is such a loss function. A scoring rule assigns a higher score to a density forecast that is 'better' compared to another density forecast. Diebold et al. (1998) and Granger and Pesaran (2000) argue that a scoring rule should assign the highest score (on average) to a density forecast when it is in fact the true conditional density p_t of Y_{t+1}, since a rational user, who is trying to forecast Y_{t+1}, would prefer p_t over any other density forecast. One would therefore try to design a scoring rule such that the highest score is assigned to p_t, that is,

E_t[S*(f̂_t; Y_{t+1})] ≤ E_t[S*(p_t; Y_{t+1})], for all t.   (1)

A scoring rule is called 'proper' if it does not assign a higher average score to an incorrect density forecast f̂_t than to the true conditional density p_t, as described by Gneiting and Raftery (2007).

Density forecasts include estimated parameters. Even if the density forecast f̂_t is based upon a correctly specified model, it will involve estimated parameters. Therefore, the average score of the density forecast, E_t[S*(f̂_t; Y_{t+1})], may not achieve the upper bound E_t[S*(p_t; Y_{t+1})] due to non-vanishing estimation uncertainty. It may happen that a density forecast based on a misspecified model with limited estimation uncertainty is preferred over a density forecast based on a correct model specification with larger estimation uncertainty, as illustrated by Diks et al. (2011) using Monte Carlo simulation.

2.3 Testing approach and null hypothesis

This thesis focuses on tests evaluating the accuracy of forecasting methods, rather than models. Giacomini and White (2006) developed tests of equal predictive ability, building on the predictive ability testing of Diebold and Mariano (1995) and West (1996), that evaluate the accuracy of forecasting methods rather than models. Giacomini and White (2006) distinguished tests of unconditional and conditional predictive ability. Given that the considered density forecasts are based on estimators whose estimation uncertainty does not vanish asymptotically, their tests have a number of appealing properties, mentioned in Section 2.1.

The testing approach of Giacomini and White (2006) can be extended to obtain tests of conditional predictive ability. Assume that two competing density forecasts f̂_t and ĝ_t and corresponding realizations of the variable Y_{t+1} are available for t = R, R+1, …, T − 1, where R is the rolling window size. Then f̂_t and ĝ_t can be compared on their average scores by testing formally whether their difference is statistically significant. The score difference is defined as

d_{t+1} = S*(f̂_t; Y_{t+1}) − S*(ĝ_t; Y_{t+1}).

The null hypothesis of equal scores, for a given scoring rule S*, is given by

H0 : E(d_{t+1}) = 0, for all t = R, R+1, …, T − 1.

The sample average of the score differences is defined as

d̄_{R,P} = P^{−1} Σ_{t=R}^{T−1} d_{t+1}, with P = T − R.

To test the null hypothesis against the alternative hypothesis, a test statistic of the type constructed by Diebold and Mariano (1995) can be used.

Diebold and Mariano (1995) set up an asymptotic test statistic using the sample mean loss differential d̄. They considered a sample path of a loss differential, equivalent to the sample path of score differences {d_{t+1}}_{t=R}^{T−1} in this research. Given that the sample path of the score differences is covariance stationary and has short memory, the asymptotic distribution of the sample mean loss differential d̄ is

√T (d̄ − µ) →d N(0, 2π f_d(0)),

where f_d(0) = (1/2π) Σ_{τ=−∞}^{∞} γ_d(τ) is the spectral density of the loss differential at frequency zero, γ_d(τ) = E[(d_t − µ)(d_{t−τ} − µ)] is the autocovariance of the loss differential at displacement τ, and µ is the population mean loss differential.


The obvious large-sample N(0, 1) statistic for testing the null hypothesis of equal forecast accuracy is

S_1 = d̄ / √(2π f̂_d(0) / T),   (2)

where f̂_d(0) is a consistent estimate of f_d(0).

The statistic in (2) can be rewritten for testing the null hypothesis of equal expected scores as

t_{R,P} = √P d̄_{R,P} / √(σ̂²_{R,P}),   (3)

where σ̂²_{R,P} is a heteroskedasticity and autocorrelation consistent (HAC) variance estimator of σ²_{R,P} = Var(√P d̄_{R,P}), which satisfies σ̂²_{R,P} − σ²_{R,P} →p 0. Theorem 4 of Giacomini and White (2006), stated as Theorem 1 below, characterizes the asymptotic distribution of the test statistic under the null hypothesis.

Theorem 1. The statistic t_{R,P} in (3) is asymptotically (as P → ∞ with R fixed) N(0, 1) distributed under the null hypothesis if:

i. {Z_t} is φ-mixing of size −q/(2q − 2) with q ≥ 2, or α-mixing of size −q/(q − 2) with q > 2;

ii. E|d_{t+1}|^{2q} < ∞ for all t; and

iii. σ²_{R,P} = Var(√P d̄_{R,P}) > 0 for all P sufficiently large.

Proof. The proof of Theorem 1 is based on the central limit theorem for dependent heterogeneous processes given in Wooldridge and White (1988).
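The construction of the statistic in (3) can be sketched numerically. The following Python snippet (illustrative only; the thesis's own code is in Matlab, and the function name and the Bartlett-kernel HAC choice are assumptions, not the thesis's exact implementation) computes a Diebold-Mariano type statistic from a series of score differences:

```python
import numpy as np

def dm_statistic(d, n_lags=None):
    """Diebold-Mariano type statistic sqrt(P) * dbar / sigma_hat, where
    sigma_hat^2 is a Bartlett-kernel (Newey-West style) HAC estimate of
    Var(sqrt(P) * dbar)."""
    d = np.asarray(d, dtype=float)
    P = d.size
    if n_lags is None:
        n_lags = int(np.floor(P ** (1.0 / 3.0)))  # a common rule-of-thumb truncation
    dbar = d.mean()
    u = d - dbar
    var = u @ u / P                               # lag-0 autocovariance
    for k in range(1, n_lags + 1):
        w = 1.0 - k / (n_lags + 1)                # Bartlett weights
        var += 2.0 * w * (u[:-k] @ u[k:]) / P
    return np.sqrt(P) * dbar / np.sqrt(var)

# Score differences with zero mean: the statistic should be roughly standard normal
rng = np.random.default_rng(0)
t_stat = dm_statistic(rng.normal(0.0, 1.0, size=500))
```

Under the null of equal expected scores the statistic is compared with standard normal critical values; a large positive value favors f̂_t, a large negative value favors ĝ_t.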

2.4 Kullback-Leibler Information Criterion

In this thesis, we use a scoring rule that is closely related to the Kullback-Leibler information criterion (KLIC). The KLIC has been used for over 60 years, although not for the evaluation of density forecasts; it forms the basis for the Akaike information criterion (AIC). The AIC is used to rank alternative models according to how close they are to the true but unknown data generating process. The AIC, estimated from the maximized log-likelihood, gives an approximately unbiased estimate of the expected relative KLIC 'distance' between a given model and the true but unknown density, which is treated as a constant across competing models; see Burnham and Anderson (2002).

The KLIC_t divergence between the true conditional density p_t and the density forecast f̂_t is defined as

KLIC_t(f̂_t) = ∫_{−∞}^{∞} p_t(y_{t+1}) log( p_t(y_{t+1}) / f̂_t(y_{t+1}) ) dy_{t+1},   (4)

or, equivalently,

KLIC_t(f̂_t) = E_t[ log p_t(Y_{t+1}) − log f̂_t(Y_{t+1}) ].   (5)

The smaller the divergence, the closer the density forecast is to the true density. KLIC_t = 0 if and only if p_t = f̂_t, under the constraint that ∫_{−∞}^{∞} f̂_t(y) dy = 1. This follows from the fact that, for any density f̂_t,

KLIC_t(f̂_t) = E_t[ log( p_t(Y_{t+1}) / f̂_t(Y_{t+1}) ) ]
            = −E_t[ log( f̂_t(Y_{t+1}) / p_t(Y_{t+1}) ) ]
            ≥ −( E_t[ f̂_t(Y_{t+1}) / p_t(Y_{t+1}) ] − 1 )
            = −( ∫_{−∞}^{∞} p_t(y_{t+1}) f̂_t(y_{t+1}) / p_t(y_{t+1}) dy_{t+1} − 1 ) = 0,   (6)

where the inequality follows from applying log x ≤ x − 1 to f̂_t / p_t.
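The definition in (4) and the nonnegativity in (6) are easy to verify numerically. A minimal sketch (not part of the thesis; the helper name is illustrative): for p = N(0, 1) and f̂ = N(0.2, 1) the divergence has the well-known closed form (µ_p − µ_f)²/2 = 0.02, which quadrature should reproduce.

```python
import numpy as np
from scipy import stats
from scipy.integrate import quad

def klic(p_pdf, f_pdf, lo=-12.0, hi=12.0):
    """KLIC(p || f) = integral of p(y) * log(p(y) / f(y)) dy, by quadrature."""
    integrand = lambda y: p_pdf(y) * np.log(p_pdf(y) / f_pdf(y))
    value, _ = quad(integrand, lo, hi)
    return value

p = stats.norm(0.0, 1.0).pdf   # "true" density
f = stats.norm(0.2, 1.0).pdf   # density forecast
kl = klic(p, f)                # closed form: 0.2**2 / 2 = 0.02
```

The divergence is zero when forecast and truth coincide and strictly positive otherwise, consistent with (6).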

2.5 The logarithmic scoring rule

The KLIC was introduced by Mitchell and Hall (2005) as a unified means of evaluating, comparing and combining competing density forecasts, whether model-based or subjectively formed. Mitchell and Hall (2005), Bao et al. (2004) and Amisano and Giacomini (2007) focused on the logarithmic scoring rule

S^l(f̂_t; y_{t+1}) = log f̂_t(y_{t+1}),   (7)

such that an observation that falls within a region with high predictive density f̂_t receives a high score, and an observation that falls within a region with low predictive density receives a low score.

The KLIC can be used for testing the null hypothesis H0 : KLIC_t = 0 for all t, of an individual density forecast being correct. The KLIC has the advantage of an absolute lower bound equal to zero, achieved when f̂_t = p_t. Its value therefore provides a measure of the divergence between the density forecast f̂_t and the true conditional density p_t. However, because p_t is unknown, the KLIC cannot be evaluated directly. Some authors have tried to estimate


p_t, but the problem can also be circumvented by following Bao et al. (2004), invoking Proposition 2 of Berkowitz (2001) and noting that

log p_t(y_{t+1}) − log f̂_t(y_{t+1}) = log q_t(z_{f̂,t+1}) − log φ(z_{f̂,t+1}),   (8)

where

z_{t+1} = ∫_{−∞}^{y_{t+1}} f̂_t(y) dy,  z_{f̂,t+1} = Φ^{−1}(z_{t+1}),

and q_t(·) is the true unknown conditional density of z_{f̂,t+1}, φ(·) the standard normal density and Φ the c.d.f. of the standard normal. This result states that testing the departure of {z_{f̂,t+1}}_{t=R}^{T−1} from i.i.d. N(0, 1) is equivalent to testing the distance of the forecast density from p_t. The fact remains that q_t(·) is also unknown, but it can be estimated using a flexible density function. We know at least that when f̂_t(y_{t+1}) is correctly specified, z_{f̂,t+1} is i.i.d. N(0, 1). However, when we specify f̂_t(y_{t+1}), there is no certainty that it accommodates q_t(·).
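The transformation in (8) is straightforward to compute. A short illustration (in Python rather than the thesis's Matlab; the simulated setup is an assumption for demonstration): under a correctly specified forecast density the PITs z_{t+1} are approximately U(0, 1) and the transformed values Φ^{−1}(z_{t+1}) are approximately N(0, 1), while a misspecified forecast produces visibly non-uniform PITs.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
y = rng.standard_normal(5000)            # realizations from the true density N(0, 1)

# PIT under a correctly specified forecast density N(0, 1)
z = stats.norm.cdf(y)                    # z_{t+1}: approximately U(0, 1)
z_inv = stats.norm.ppf(z)                # Phi^{-1}(z_{t+1}): approximately N(0, 1)

# PIT under a misspecified forecast density N(0.5, 1)
z_bad = stats.norm.cdf(y, loc=0.5)       # piles up below 0.5: clearly not uniform
```

In practice one would apply a uniformity or normality test (e.g. Berkowitz, 2001) to the transformed series rather than eyeballing moments.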

To test the null hypothesis p_t = f̂_t, Mitchell and Hall (2005) exploit the framework of West (1996) and White (2000), using the loss differential

d_{t+1} = log p_t(y_{t+1}) − log f̂_t(y_{t+1}) = log q_t(z_{f̂,t+1}) − log φ(z_{f̂,t+1}).   (9)

The null hypothesis of the density forecast being correctly specified then becomes

H0 : E(d_{t+1}) = 0  ⇔  KLIC_t = 0.   (10)

The null hypothesis in (10) is in fact the same as the null hypothesis discussed in Section 2.3.

The sample mean d̄ is defined as

d̄ = (1/(T − R)) Σ_{t=R}^{T−1} [ log q_t(z_{f̂,t+1}) − log φ(z_{f̂,t+1}) ].   (11)

Mitchell and Hall (2005) use d̄ to test a hypothesis about d_{t+1}. From the central limit theorem, under appropriate assumptions, the limiting distribution of d̄ is

√T (d̄ − E(d_{t+1})) →d N(0, Ω),   (12)

where a general expression for the covariance matrix Ω, allowing for parameter uncertainty, is given in West (1996).


Based on the P available observations for evaluation, y_{R+1}, …, y_T, the density forecasts f̂_t and ĝ_t can be ranked according to their average scores P^{−1} Σ_{t=R}^{T−1} log f̂_t(y_{t+1}) and P^{−1} Σ_{t=R}^{T−1} log ĝ_t(y_{t+1}). The density forecast that receives the highest average score is the preferred one. To test whether the difference in predictive accuracy is significant, the sample average of the log score differences

d^l_{t+1} = log f̂_t(y_{t+1}) − log ĝ_t(y_{t+1})   (13)

may be used, together with the test statistic defined in (3). This coincides with the log-likelihood ratio of the two competing density forecasts.

As discussed in Mitchell and Hall (2005) and Bao et al. (2004), the KLIC can be used to measure the relative accuracy of two competing densities. This is in fact the same as the expected log score difference, as

KLIC_t(ĝ_t) − KLIC_t(f̂_t) = E_t[ log p_t(Y_{t+1}) − log ĝ_t(Y_{t+1}) ] − E_t[ log p_t(Y_{t+1}) − log f̂_t(Y_{t+1}) ]
                         = E_t[ log f̂_t(Y_{t+1}) − log ĝ_t(Y_{t+1}) ].   (14)

Summarizing, the null hypothesis of equal average logarithmic scores for the densities f̂_t and ĝ_t corresponds with the null hypothesis of equal KLICs. Using logarithmic scores is in fact the same as examining which of the competing density forecasts comes closest to the true distribution.
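The ranking by average logarithmic score can be sketched directly. The following illustration (assumed densities, not the thesis's empirical setup): with data generated from p = N(0, 1), the mean log score difference between a correct N(0, 1) forecast and a shifted N(0.2, 1) forecast estimates the relative KLIC in (14), which here equals 0.02.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
y = rng.standard_normal(20000)              # realizations from p = N(0, 1)

logscore_f = stats.norm.logpdf(y)           # log fhat, forecast N(0, 1)
logscore_g = stats.norm.logpdf(y, loc=0.2)  # log ghat, forecast N(0.2, 1)

d_l = logscore_f - logscore_g               # log score differences as in (13)
mean_diff = d_l.mean()                      # estimates KLIC(ghat) - KLIC(fhat) = 0.02
```

A positive mean difference ranks f̂ above ĝ; feeding d_l into the statistic (3) would then test whether the difference is significant.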

2.6 Weighted logarithmic scoring rules

In risk management, assessing the downside risk of a portfolio is only possible when an accurate description of the left tail of its distribution is available. Several instruments use this part of the density forecast for quantifying and assessing the market downside risks associated with financial and commodity asset price fluctuations. For instance, Value-at-Risk (VaR) models determine the maximum expected loss an asset or a portfolio can generate over a certain holding period, with a pre-determined probability value. VaR models can also be used to determine the most effective risk management strategy for a given situation. In the current market environment, quantification of extreme losses in asset markets is important; extreme value theory (EVT) provides a comprehensive theoretical framework through which statistical models describing extreme scenarios can be developed.

To determine the performance of density forecasts in the left tail of the distribution, weight functions w_t(y) can be used. This allows determining which density forecast scores best in the region of interest; see Bao et al. (2004) and Amisano and Giacomini (2007). The remaining part of the support becomes less important: even if a particular density forecast describes the true distribution better overall, the density forecast that scores higher in the region of interest will be chosen.

(22)

12 t h e o ry

Fig. 1: Standardized Student-t(5) distribution shown in red, the standard normal distribution in blue.

Amisano and Giacomini (2007) suggested using the weighted logarithmic scoring rule

S^wl(f̂_t; y_{t+1}) = w_t(y_{t+1}) log f̂_t(y_{t+1}),   (15)

with an appropriate weight function w_t(y), to assess the quality of the density forecast f̂_t on a specific part of the distribution. The weighted average scores P^{−1} Σ_{t=R}^{T−1} w_t(y_{t+1}) log f̂_t(y_{t+1}) and P^{−1} Σ_{t=R}^{T−1} w_t(y_{t+1}) log ĝ_t(y_{t+1}) can be used to rank the forecasts on the specific region of the distribution. The weighted score difference is defined as

d^wl_{t+1} = S^wl(f̂_t; y_{t+1}) − S^wl(ĝ_t; y_{t+1}) = w_t(y_{t+1}) [ log f̂_t(y_{t+1}) − log ĝ_t(y_{t+1}) ].   (16)

The null hypothesis of equal weighted scores, H0 : E(d^wl_{t+1}) = 0 for all t = R, R+1, …, T − 1, can be tested using a test statistic of the form (3). The only requirement imposed by Amisano and Giacomini (2007) on the weight function is that it should be positive and bounded. However, as pointed out by Gneiting and Ranjan (2008), the wl score does not satisfy the properness property: an incorrect density forecast f̂_t can receive a higher average score than the actual density p_t. Consequently, the test of equal predictive accuracy might suggest that the incorrect density forecast is significantly better than the true density.

Diks et al. (2011) showed that the ‘threshold’ weight function wt(y) =

I(y ≤ r), with a fixed threshold r, where I(A) = 1 if the event A occurs

and zero otherwise, does not satisfy the properness property. The weighted logarithmic score causes the predictive ability tests to be biased toward densi-ties with more probability mass in the left tail. For instance, below a certain

(23)

2.7 weighted likelihood scoring rules 13

threshold r, the standardized Student-t(5) distribution shown in Fig. 1 will always be preferred over the standard normal distribution by the weighted logarithmic scoring rule, due to this probability mass issue.
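This bias is easy to reproduce numerically. The sketch below (Python with NumPy/SciPy; an illustration, not the thesis's Matlab code) draws from a standard normal DGP and compares the average wl scores of the true N(0, 1) density and the standardized Student-t(5) density under the threshold weight w(y) = I(y \leq -3). Although the normal is the true density, the fat-tailed forecast receives the higher average score, because below the crossover point the t(5) density lies everywhere above the normal density.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
y = rng.standard_normal(200_000)   # DGP: the true density is N(0, 1)
r = -3.0                           # threshold weight function w(y) = I(y <= r)
w = (y <= r).astype(float)

# Average weighted logarithmic score of the (true) standard normal density
score_norm = np.mean(w * stats.norm.logpdf(y))

# Standardized Student-t(5): rescale by c = sqrt(nu / (nu - 2)) for unit variance
nu = 5
c = np.sqrt(nu / (nu - 2.0))
score_t = np.mean(w * (stats.t.logpdf(c * y, df=nu) + np.log(c)))

# The incorrect fat-tailed forecast outscores the true density in the far tail
print(score_t > score_norm)
```

Repeating the experiment with a threshold near the center of the distribution removes the effect, which is exactly why the cl and csl rules of the next section are needed.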

2.7 Weighted likelihood scoring rules

To avoid these inconsistencies, proper scoring rules will be used. Diks et al. (2011) adapt the logarithmic scoring rule in (7) to obtain a scoring rule for evaluating and comparing density forecasts in a specific region of interest. By replacing the full likelihood with either the conditional likelihood, given that the observation lies in the region of interest, or the censored likelihood, with censoring of observations outside A_t, the resulting KLIC-based scoring rules satisfy the properness property while evaluating and comparing density forecasts on a specific region of interest A_t \subset \mathbb{R}.

For a given region of interest A_t, the conditional likelihood (cl) score function is given by

S^{cl}(\hat{f}_t; y_{t+1}) = I(y_{t+1} \in A_t) \log \left( \frac{\hat{f}_t(y_{t+1})}{\int_{A_t} \hat{f}_t(y)\,dy} \right), (17)

where I(\cdot) denotes the indicator function, which is one if the event in its argument occurs and zero otherwise. The division by \int_{A_t} \hat{f}_t(y)\,dy normalizes the density on the region of interest, which allows competing density forecasts to be compared in terms of their relative KLIC values. The conditional likelihood scoring rule is chosen when density forecasts are evaluated based only on their behavior in the region of interest A_t.

Due to this normalization, however, the cl scoring rule does not take into account the accuracy of the density forecast for the total probability of the region of interest. For example, if A_t represents the left tail y_{t+1} \leq r, the conditional likelihood does not take into account whether the tail probability implied by \hat{f}_t matches the frequency at which tail observations actually occur. Therefore, density forecasts with similar tail shapes receive comparable scores from the cl score function, even though the forecasts may have completely different tail probabilities. Tail probabilities are important in risk management, in particular for VaR evaluation. The tail probability can be taken into account by using the censored likelihood (csl) score function, given by

S^{csl}(\hat{f}_t; y_{t+1}) = I(y_{t+1} \in A_t) \log \hat{f}_t(y_{t+1}) + I(y_{t+1} \in A_t^c) \log \left( \int_{A_t^c} \hat{f}_t(y)\,dy \right), (18)



where A_t^c is the complement of A_t. The csl scoring rule takes into account the likelihood associated with having an observation outside the region of interest, but apart from that ignores the shape of \hat{f}_t outside A_t.
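For a left-tail region A_t = (-\infty, r] and a normal density forecast, (17) and (18) have simple closed forms. The sketch below (Python with NumPy/SciPy; the function names are illustrative, not taken from the thesis code) implements both rules for that case:

```python
import numpy as np
from scipy import stats

def cl_score(y, r, mu=0.0, sigma=1.0):
    """Conditional likelihood score (17) for a N(mu, sigma^2) forecast,
    with region of interest A = (-inf, r]."""
    if y > r:
        return 0.0                                        # indicator is zero outside A
    tail_prob = stats.norm.cdf(r, loc=mu, scale=sigma)    # integral of f over A
    return stats.norm.logpdf(y, loc=mu, scale=sigma) - np.log(tail_prob)

def csl_score(y, r, mu=0.0, sigma=1.0):
    """Censored likelihood score (18): the full log-density inside A,
    the log of the complement probability outside A."""
    if y <= r:
        return stats.norm.logpdf(y, loc=mu, scale=sigma)
    return np.log(1.0 - stats.norm.cdf(r, loc=mu, scale=sigma))
```

For an observation inside the tail the two rules differ by exactly the log tail probability; outside the tail the cl score is zero, while the csl score still rewards a forecast whose implied tail probability is accurate.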

The cl and csl scoring rules focus on a sharply defined region of interest A_t. It is possible to adapt these scoring rules by introducing the weight function w_t(y) = I(y \in A_t). The scoring rules then become

S^{cl}(\hat{f}_t; y_{t+1}) = w_t(y_{t+1}) \log \left( \frac{\hat{f}_t(y_{t+1})}{\int w_t(y) \hat{f}_t(y)\,dy} \right) (19)

and

S^{csl}(\hat{f}_t; y_{t+1}) = w_t(y_{t+1}) \log \hat{f}_t(y_{t+1}) + \left(1 - w_t(y_{t+1})\right) \log \left( 1 - \int w_t(y) \hat{f}_t(y)\,dy \right). (20)

Following Diks et al. (2011), the following assumptions are made concerning the density forecasts that are to be compared, and the weight function.

Assumption 1. The density forecasts \hat{f}_t and \hat{g}_t satisfy KLIC(\hat{f}_t) < \infty and KLIC(\hat{g}_t) < \infty, where KLIC(\hat{h}_t) = \int p_t(y) \log \left( p_t(y) / \hat{h}_t(y) \right) dy is the Kullback-Leibler divergence between the density forecast \hat{h}_t and the true conditional density p_t.

Assumption 2. The weight function w_t(y) is such that

(a) it is determined by the information available at time t, and hence is a function of F_t,

(b) 0 \leq w_t(y) \leq 1, and

(c) \int w_t(y) p_t(y)\,dy > 0.

Assumption 1 ensures that the expected score differences for the competing density forecasts are finite. Assumption 2(c) is needed to avoid cases where w_t(y) takes strictly positive values only outside the support of the data.

The following lemma ensures that the scoring rules in (19) and (20) are proper. The Appendix of Diks et al. (2011) presents the proof of this lemma for univariate density functions; the proof remains valid if multivariate densities are used.

Lemma 1. Under Assumptions 1 and 2, the generalized conditional likelihood scoring rule given in (19) and the generalized censored likelihood scoring rule given in (20) are proper.



Therefore, we can test the null hypothesis of equal predictive accuracy of the density forecasts \hat{f}_t(y_{t+1}) and \hat{g}_t(y_{t+1}), for all t = R, R+1, \ldots, T-1, against the alternative, where the forecasts are scored with the conditional likelihood and censored likelihood scoring rules. As described before, to test the null hypothesis we use the score differences to compute a test statistic of the form of (3).

2.8 Models

So far, we have explained the scoring rules for assessing the predictive accuracy of competing density forecasts. In the following subsections, the models on which the density forecasts are based are presented.

2.8.1 Vector Autoregression Model

The Vector Autoregression (VAR) model is used for the analysis of multivariate time series. It is a natural extension of the univariate autoregressive (AR) model to dynamic multivariate time series. The AR model specifies that the output variable depends linearly on its own previous values and on a stochastic term \varepsilon_t. The VAR model generalizes the univariate AR model by allowing for more than one evolving variable. The VAR model has proven especially useful for describing the dynamic behavior of economic and financial time series and for forecasting.

To define the VAR model, let Y_t = (y_{1,t}, \ldots, y_{n,t})' denote an (n \times 1) vector of time series variables. The basic p-lag VAR(p) model is then of the form

Y_t = c + \Pi_1 Y_{t-1} + \ldots + \Pi_p Y_{t-p} + \varepsilon_t, \quad t = 1, \ldots, T, (21)

where the \Pi_i are (n \times n) coefficient matrices and \varepsilon_t is an (n \times 1) unobservable zero-mean white noise vector process (serially uncorrelated or independent) with time-invariant covariance matrix \Sigma (that is, E[\varepsilon_t \varepsilon_t'] = \Sigma).

In this thesis the VAR(4) model is considered to estimate the conditional mean return vector, which is given by

Y_t = c + \sum_{i=1}^{4} \Pi_i Y_{t-i} + \varepsilon_t, \quad t = 1, \ldots, T. (22)

2.8.2 The GARCH process

The Generalised Autoregressive Conditional Heteroskedasticity (GARCH) model, as suggested by Bollerslev (1986), extends the ARCH (Autoregressive Conditional Heteroskedastic) class of models to allow for both a longer memory and a more flexible lag structure. The ARCH process introduced by Engle (1982) allows the conditional variance to change over time as a function of past errors, leaving the unconditional variance constant. The ARCH(q) model can be expressed as

h_t = \alpha_0 + \sum_{i=1}^{q} \alpha_i \varepsilon_{t-i}^2, (23)

where h_t represents the conditional variance and \varepsilon_t the error term.

Let the error term \varepsilon_t denote a real-valued discrete-time stochastic process, and F_t the information set (\sigma-field) of all information through time t. The GARCH(p, q) process is then given by

\varepsilon_t \mid F_{t-1} \sim N(0, h_t), (24)

h_t = \alpha_0 + \sum_{i=1}^{q} \alpha_i \varepsilon_{t-i}^2 + \sum_{i=1}^{p} \beta_i h_{t-i} = \alpha_0 + A(L)\varepsilon_t^2 + B(L)h_t, (25)

where p \geq 0, q \geq 0, \alpha_0 \geq 0, \alpha_i \geq 0 for i = 1, \ldots, q, and \beta_i \geq 0 for i = 1, \ldots, p.

For p = 0 the process reduces to the ARCH(q) process, and for p = q = 0, \varepsilon_t is simply white noise.

The GARCH(1,1) process, which is used to estimate the conditional variances of the different assets in the empirical application, is given by equation (24) and

h_t = \alpha_0 + \alpha_1 \varepsilon_{t-1}^2 + \beta_1 h_{t-1}, \quad \alpha_0 > 0, \; \alpha_1 \geq 0, \; \beta_1 \geq 0. (26)
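The GARCH(1,1) recursion (26) is straightforward to simulate. The sketch below (Python/NumPy; the parameter values are illustrative, not the thesis's estimates) generates a return series together with its conditional variance path:

```python
import numpy as np

def simulate_garch11(T, alpha0=0.05, alpha1=0.1, beta1=0.85, seed=0):
    """Simulate eps_t = sqrt(h_t) z_t with h_t from the GARCH(1,1) recursion (26),
    z_t ~ N(0, 1)."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(T)
    h = np.empty(T)
    eps = np.empty(T)
    h[0] = alpha0 / (1.0 - alpha1 - beta1)   # start at the unconditional variance
    eps[0] = np.sqrt(h[0]) * z[0]
    for t in range(1, T):
        h[t] = alpha0 + alpha1 * eps[t - 1] ** 2 + beta1 * h[t - 1]
        eps[t] = np.sqrt(h[t]) * z[t]
    return eps, h
```

With alpha1 + beta1 < 1 the process is covariance stationary, and the unconditional variance equals alpha0 / (1 - alpha1 - beta1); the simulated series shows the volatility clustering visible later in Fig. 7.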

2.8.3 Multivariate DCC model

There are many models for estimating conditional covariances and correlations in the class of multivariate GARCH models; the most widely used are the BEKK and dynamic conditional correlation (DCC) models, developed by Engle and Kroner (1995) and Engle (2002), respectively. The disadvantage of the BEKK model is that it suffers from the 'curse of dimensionality': the number of parameters increases at a faster rate than the number of assets. For example, the most general BEKK model of Engle and Kroner (1995) has a number of parameters increasing with order O(n^4), where n represents the dimension of the vector of financial variables, while the fully parametrized DCC model of Engle (2002) increases by O(n^2); see Caporin and McAleer (2010). Because of this curse of dimensionality and the limited number of data points available for the portfolio in the empirical application, the DCC model will be used. DCC models have the flexibility of univariate GARCH models coupled with parsimonious parametric models for the correlations.

The DCC model was introduced by Engle (2002) as a generalization of the constant conditional correlation (CCC) model of Bollerslev (1990). In this case, the focus is on the separate modeling of the conditional variances and conditional correlations. The covariance matrix is decomposed as follows:

\Sigma_t = D_t R_t D_t, (27)

D_t = \mathrm{diag}(\sigma_{1,t}, \ldots, \sigma_{k,t}), (28)

R_t = \bar{Q}_t^{-1/2} Q_t \bar{Q}_t^{-1/2}, \quad \bar{Q}_t = \mathrm{diag}(Q_t), (29)

where D_t contains the conditional volatilities, which are modeled by a set of univariate GARCH equations (see Bollerslev (1990); Engle (2002)). The dynamic correlation matrix R_t is not explicitly driven by a dynamic equation, but is derived from a standardization of a different matrix Q_t, which has a dynamic structure. The form of Q_t determines the model complexity and the feasibility in large cross-sectional dimensions. Several specifications have been suggested for Q_t. The DCC model (or Hadamard DCC [HDCC]) is given in Engle (2002) as

Q_t = S + A \circ (D_{t-1}^{-1} \varepsilon_{t-1} \varepsilon_{t-1}' D_{t-1}^{-1} - S) + B \circ (Q_{t-1} - S), (30)

where \circ denotes the Hadamard (element-wise) product, A and B are symmetric parameter matrices, and S is referred to as a long-run correlation matrix.

To estimate the conditional variance-covariance matrix, we will consider the DCC(1,1), or ADCC(1,0,1), model. The numbers (1,0,1) represent, respectively, the order of the symmetric innovations, the order of the asymmetric innovations, and the order of the lagged correlation in the DCC model.
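A common simplification of (30) replaces the matrices A and B by scalars a and b, so that the Hadamard products reduce to scalar multiplications. The resulting correlation recursion can be sketched as follows (Python/NumPy; an illustrative simplification, not the estimation code used later in the thesis):

```python
import numpy as np

def dcc_correlations(Z, S, a=0.05, b=0.90):
    """Scalar DCC recursion: Q_t = (1 - a - b) S + a z_{t-1} z_{t-1}' + b Q_{t-1},
    then R_t = diag(Q_t)^{-1/2} Q_t diag(Q_t)^{-1/2} as in (29).
    Z: (T, n) standardized residuals; S: (n, n) long-run correlation matrix."""
    T, n = Z.shape
    Q = S.copy()
    R = np.empty((T, n, n))
    for t in range(T):
        d = 1.0 / np.sqrt(np.diag(Q))
        R[t] = Q * np.outer(d, d)     # standardize Q_t into a correlation matrix
        Q = (1.0 - a - b) * S + a * np.outer(Z[t], Z[t]) + b * Q
    return R
```

Because each Q_t is a nonnegative combination of positive semi-definite matrices, every R_t is a valid correlation matrix with unit diagonal and off-diagonal entries bounded by one in absolute value.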


3 Monte Carlo Study

In this chapter, Monte Carlo simulations are used to examine the finite-sample behavior of predictive accuracy tests for comparing two competing density forecasts in selected regions of the distribution. In particular, we consider the size and power properties of the Diebold-Mariano type statistic given in (3) for different dimensions. The tests are constructed using the conditional (19) and censored (20) likelihood scores described in Chapter 2. The tests are examined on a univariate as well as a multivariate level. That is, the tests are constructed using scoring rules for competing density forecasts based on portfolio returns, as well as scoring rules based on a 4-dimensional probability density function representing the return space of a portfolio of 4 assets.

3.1 Density forecasts in different dimensions

The scoring rules are constructed for univariate and multivariate density forecasts. The univariate density forecast that will be used in the evaluation is of the form

\hat{f}(r^e_{t+1}) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp \left( -\frac{(r_{t+1} - r^e_{t+1})^2}{2\sigma^2} \right). (31)

The multivariate density forecast that will be used in the evaluation is of the form

\hat{f}(\vec{r}^{\,e}_{t+1}) = \frac{1}{(\sqrt{2\pi})^4 |\Sigma|^{1/2}} \exp \left( -\frac{1}{2} (\vec{r}_{t+1} - \vec{r}^{\,e}_{t+1})' \Sigma^{-1} (\vec{r}_{t+1} - \vec{r}^{\,e}_{t+1}) \right), (32)

where the variance-covariance matrix is

\Sigma = \begin{pmatrix} \sigma_1^2 & \sigma_{12} & \sigma_{13} & \sigma_{14} \\ \sigma_{12} & \sigma_2^2 & \sigma_{23} & \sigma_{24} \\ \sigma_{13} & \sigma_{23} & \sigma_3^2 & \sigma_{34} \\ \sigma_{14} & \sigma_{24} & \sigma_{34} & \sigma_4^2 \end{pmatrix}. (33)

The variance-covariance matrices will be estimated by a DCC(1,1) model and by several GARCH(1,1) models. The GARCH(1,1) models produce a diagonal variance-covariance matrix with \hat{\sigma}_{ij} = 0 for all i \neq j.



3.2 Scoring rules

To obtain expressions for the scoring rules S^{cl} and S^{csl} given in (19) and (20), respectively, we need to evaluate the integral \int w_t(y) \hat{f}_t(y)\,dy. We use an indicator function to define the region of interest, w_t(y) = I(y \in A_t) = I(r^p_t \leq r). The integral of w_t(y) \hat{f}_t(y) normalizes the density on the region of interest. Because all variables are normally distributed, this integral can be solved analytically:

\int w_t(y) \hat{f}_t(y)\,dy = \int I(y \leq r) \hat{f}_t(y)\,dy = \int_{-\infty}^{r} \hat{f}_t(y)\,dy = P(Y \leq r). (34)

The portfolio return is defined as r^p_t = \beta_1 r_{1,t} + \beta_2 r_{2,t} + \beta_3 r_{3,t} + \beta_4 r_{4,t} = \vec{\beta} \cdot \vec{r}_t, where r_1, \ldots, r_4 are the asset returns. The probability in (34) can now be rewritten by filling in the variable of interest r^p_t and solving

P(Y \leq r) = P(\vec{\beta} \cdot \vec{r}_t \leq r) = P(\vec{\beta} \cdot (\vec{r}_t - \vec{\mu}) \leq r - \vec{\beta} \cdot \vec{\mu}) = P(\vec{\beta} \cdot (\vec{r}_t - \vec{\mu}) \leq c_1)
= P(\vec{\beta} \cdot \Sigma^{1/2} \Sigma^{-1/2} (\vec{r}_t - \vec{\mu}) \leq c_1) = P(\vec{\beta} \cdot \Sigma^{1/2} \vec{Z} \leq c_1) = P(\vec{v} \cdot \vec{Z} \leq c_1)
= P \left( Z \leq \frac{c_1}{\sqrt{v_1^2 + v_2^2 + v_3^2 + v_4^2}} \right) = \Phi \left( \frac{c_1}{\sqrt{v_1^2 + v_2^2 + v_3^2 + v_4^2}} \right). (35)

In the equations above we transformed the variable of interest into a standard normal random variable, making it possible to obtain a value for the probability. To do so, we used r - \vec{\beta} \cdot \vec{\mu} = c_1 and \vec{\beta} \cdot \Sigma^{1/2} = \vec{v}. Because \vec{Z} \sim N(\vec{0}, I), the product \vec{v} \cdot \vec{Z} \sim N(0, v_1^2 + v_2^2 + v_3^2 + v_4^2).

For the univariate case the integral can be evaluated in a similar fashion:

P(Y \leq r) = P(r^p_t \leq r) = P(r^p_t - \mu^p \leq r - \mu^p) = P \left( \frac{r^p_t - \mu^p}{\sigma^p} \leq \frac{r - \mu^p}{\sigma^p} \right) = \Phi \left( \frac{r - \mu^p}{\sigma^p} \right). (36)

The cumulative distribution functions found in (35) and (36) can be used to calculate the scoring rules.
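The multivariate route (35) and the aggregated univariate route (36) must yield the same tail probability, since \sigma_p^2 = \vec{\beta}' \Sigma \vec{\beta}. A small consistency sketch (Python with NumPy/SciPy; the function names are illustrative):

```python
import numpy as np
from scipy import stats

def tail_prob_multivariate(beta, mu, Sigma, r):
    """P(beta' r_t <= r) via the standardization in (35):
    c1 = r - beta'mu, v = beta' Sigma^{1/2}, P = Phi(c1 / ||v||)."""
    c1 = r - beta @ mu
    v = beta @ np.linalg.cholesky(Sigma)   # Cholesky factor as a valid square root
    return stats.norm.cdf(c1 / np.linalg.norm(v))

def tail_prob_univariate(mu_p, sigma_p, r):
    """P(r^p_t <= r) from the aggregated portfolio moments, as in (36)."""
    return stats.norm.cdf((r - mu_p) / sigma_p)
```

The agreement holds for any matrix square root, because ||v||^2 = beta' Sigma^{1/2} Sigma^{1/2'} beta = beta' Sigma beta, the portfolio variance.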

3.3 Size

In this section we examine the size properties of the Diebold-Mariano type statistic under the null hypothesis of equal predictive accuracy.



Fig. 2: Two shifted normal distributions, \hat{f} and \hat{g}, situated equally far from the standard normal distribution. These can be used to construct a simulation setup to assess the size property for the univariate case.

To investigate the size properties, two competing density forecasts need to be created that are equally incorrect. Fig. 2 shows an example of two equally incorrect univariate density forecasts when the true conditional density p_t equals the standard normal distribution.

The focus of this thesis is on the left tail of the density forecast of a portfolio of assets. However, it seems impossible to construct an example for which the null hypothesis of equal predictive accuracy of two competing density forecasts holds if we were to include the weight function I(r^p_t \leq r): the threshold value r would determine whether the null hypothesis holds.

The central part of the distribution is suited for constructing an example for which the size property of the tests can be evaluated. This is achieved by using the weight function I(-r \leq r^p_t \leq r). A heteroskedasticity and autocorrelation consistent (HAC) estimator for the asymptotic variance of the average relative score \bar{d}_{R,P} will be used:

\hat{\sigma}^2_{R,P} = \hat{\gamma}_0 + 2 \sum_{k=1}^{K-1} a_k \hat{\gamma}_k, (37)

where \hat{\gamma}_k denotes the lag-k sample covariance of the sequence \{d_{t+1}\}_{t=R}^{T-1} and a_k = 1 - k/K are the Bartlett weights with K = \lfloor n^{1/4} \rfloor.

For the Monte Carlo simulation, observation sample sizes of P = 500 and P = 1000 are chosen. The data generating process (DGP) is an i.i.d. standard normal distribution. The competing density forecasts are normal distributions with means -0.2 and 0.2 and variance 1, as in Fig. 2, and the threshold value r is varied. By symmetry, both density



Fig. 3: One-sided rejection rates of the Diebold-Mariano test statistic of equal predictive accuracy when using the scoring rules S^{csl} and S^{cl}, under weight function I(-r \leq r^p_t \leq r), for sample size N = 500 (univariate size). The figure shows the rejection rates, based on 1000 replications, against the alternative that the N(0.2, 1) distribution has better predictive accuracy.

forecasts have the same predictive accuracy. Next, the scoring rules S^{csl} and S^{cl} are calculated for the samples, after which the score differences are taken. By means of the HAC estimator for the asymptotic variance, the test statistic can be evaluated for different threshold values. By calculating the rejection rates at the nominal significance levels of 1%, 5% and 10%, based on 1000 replications, Fig. 3 and Fig. 4 can be created for the two sample sizes.

As can be seen in Fig. 3 and Fig. 4, the rejection rates at the three significance levels are identical for the scoring rules S^{csl} and S^{cl}. For this combination of DGP and the central region of the probability density functions, the scoring rules S^{csl} and S^{cl} both degenerate to the weighted logarithmic scoring rule S^{wl} in (15), as described by Diks et al. (2011). The rejection rates of the tests are also quite close to the nominal significance levels, and as r increases, they appear to converge to these levels. Therefore, the size properties of the predictive accuracy tests appear satisfactory.

3.4 Power

To assess the power properties of the tests of equal predictive accuracy, simulation experiments are performed in which one of the competing density forecasts



Fig. 4: One-sided rejection rates of the Diebold-Mariano test statistic of equal predictive accuracy when using the scoring rules S^{csl} and S^{cl}, under weight function I(-r \leq r^p_t \leq r), for sample size N = 1000 (univariate size). The figure shows the rejection rates, based on 1000 replications, against the alternative that the N(0.2, 1) distribution has better predictive accuracy.

is correct. That is, one of the competing density forecasts has the same distribution function as the DGP. Therefore, regardless of the threshold r in the weight function w_t(y) = I(r^p_t \leq r), the correct density forecast will always be preferred.

The setup used for the Monte Carlo simulations closely follows the setup of the empirical application in Chapter 4. Several sample sizes will be used, including large ones, to make sure that the curse of dimensionality is overcome.

As described in Section 2.1, a finite 'rolling window' is well suited to construct density forecasts. The number of forecasts is denoted by P = T - R, where R is the length of the rolling window and T the sample size. In the simulation, 100 density forecasts are produced for 2 different sample sizes and rolling windows. The DGP is a DCC model; its parameters are displayed in Listing 1. For this simulation, we iterate 100 times with threshold values r ranging over [-1, 0]. The portfolio weights are set to \beta = [1/2, 1/2, 1/2, 1/2]'. In every iteration, dcc_simulate generates data (W) from the DCC model. Next, a rolling window produces 100 multivariate and univariate competing density forecasts from 2 models: a DCC(1,1) model and GARCH(1,1) models.



Listing 1: For the data generating process in the simulations a DCC model was used with the parameters described below.

kseries = 4;
CorrMat = [1, 0.5, 0.5, 0.5;
           0.5, 1, 0.5, 0.5;
           0.5, 0.5, 1, 0.5;
           0.5, 0.5, 0.5, 1];
garchparameters = [0.05, 0.2, 0.6, 0.06, 0.3, 0.5, 0.08, 0.4, 0.2, 0.05, 0.3, 0.3];
archP = [1, 1, 1, 1]';
garchQ = [1, 1, 1, 1]';
dccparameters = [0.4, 0.5]';
lengthdata = T;
[W, HT_true] = dcc_simulate(kseries, lengthdata, CorrMat, garchparameters, archP, garchQ, dccparameters, 1, 1);
P = size(W, 1);

These density forecasts are used to produce the different scores, as described in Section 3.2. By taking the score differences and calculating the HAC estimator for the asymptotic variance of the score differences, a Diebold-Mariano type test statistic can be produced. For each nominal significance level, the test statistic rejects or fails to reject the null hypothesis. After all iterations are finished, the rejection rates can be calculated for the nominal significance levels of 1%, 5% and 10%.

The results are shown in Fig. 5 and Fig. 6. These figures show the rejection rates at nominal significance levels of 1%, 5% and 10% against superior predictive ability of the correct density forecast, as a function of the threshold value r. From these figures, we can conclude that for values of r close to -1, the conditional likelihood scores have higher power than the censored likelihood scores. Especially when tests are performed at the 1% level, the conditional likelihood scores perform better. For threshold values larger than -1, all tests have good power at the 5% and 10% levels. As the threshold value r increases, the power increases for all scores.

Analysing the difference between the univariate and multivariate scores, we see that they follow the same trend. The multivariate power is higher than the univariate power for negative values of the threshold value r. The univariate density forecasts for the portfolio returns are produced by aggregating the estimated variance-covariance matrix and the vector containing the expected values of the assets. The portfolio return is a linear combination of the asset returns, r^p_t = \beta_1 r_{1,t} + \beta_2 r_{2,t} + \beta_3 r_{3,t} + \beta_4 r_{4,t} = \vec{\beta} \cdot \vec{r}_t. Due to this linear combination, extreme portfolio returns will only occur if all asset returns are extreme, resulting in a small number of observations for values of r


Fig. 5: Univariate power for sample size T = 500.

close to -1. That explains why the multivariate scoring rules achieve higher power than the univariate scoring rules.

For threshold values close to 0, the univariate conditional and censored scoring rules converge to the same power at all levels. Because the conditional scoring rules have better power for threshold values in [-1, 0], that scoring rule would be preferred. For the multivariate scoring rules, we see that for threshold values larger than -0.3, the censored likelihood scores have equal or better power. The exact reason for this phenomenon is not clear, and it is not the main focus of this thesis.

These results show that all scoring rules have good power for threshold values close to 0. Appropriate threshold values close to 0 will therefore be chosen for the empirical application, discussed in Chapter 4.



Fig. 6: Multivariate power for sample size T = 1000.


4 Empirical Application

In this chapter we test the empirical relevance of the proposed aggregation and the predictive accuracy tests for comparing univariate and multivariate density forecasts with an application to a portfolio of commodities.

4.1 Portfolio of commodities

The portfolio consists of 4 assets: Gold, Silver, Platinum and Palladium. The log returns will be considered:

r_t = \ln P_t - \ln P_{t-1} = \ln \left( \frac{P_t}{P_{t-1}} \right), (38)

where P_t is the closing price on day t. The analysis is based on daily data over the period from November 15, 2004 until November 14, 2014, which yields a total of 2610 data points (Source: FactSet Commodities). Fig. 7 shows the return series of the portfolio assets. There is a clear sign of volatility clustering, although the level of volatility differs considerably across periods. The returns of Silver, Platinum and Palladium are of the same magnitude, while the returns of Gold are smaller.
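Computing (38) from a price series is a one-liner; the sketch below uses hypothetical closing prices purely for illustration (Python/NumPy):

```python
import numpy as np

# Hypothetical closing prices for one asset (illustrative values only)
prices = np.array([100.0, 101.5, 99.8, 102.3, 101.1])

# r_t = ln P_t - ln P_{t-1} = ln(P_t / P_{t-1}), as in (38)
log_returns = np.diff(np.log(prices))
```

A convenient property of log returns is that they sum over time: the total log return over the sample equals ln(P_T / P_0).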

4.2 Stationarity

Density forecasts based on models that are not stationary might suffer from inconsistent estimates, due to impulse responses that do not decay. Before selecting the models used to produce the density forecasts, we therefore analyzed the data for cointegration. Cointegrated variables, identified by cointegration tests, can be combined to form new, stationary variables. An n-dimensional time series is cointegrated if there exists a linear combination \alpha_1 \ln P_{1t} + \ldots + \alpha_n \ln P_{nt} of the component variables that is stationary. The linear combination is called the cointegration relation and the vector \vec{\alpha} = (\alpha_1, \ldots, \alpha_n)' the cointegration vector. Cointegrated variables have the tendency to revert to a common stochastic trend. This tendency is expressed in terms of 'error correction'. To capture this error correction, such a term is added to the model, resulting in a Vector Error-Correction (VEC) model.

The approaches currently utilized for cointegration testing originated from the work of Engle and Granger (1987). Their method starts with



Fig. 7: Returns of the portfolio assets Gold, Silver, Platinum and Palladium.

regressing the first component, here r_{1t}, on the remaining components of r_t and testing the residuals for a unit root. The null hypothesis states that the series in r_t are not cointegrated. If the residual test fails to find evidence against the null of a unit root, the Engle-Granger test fails to find evidence that the estimated regression relation is cointegrating. The regression equation can be rewritten as

\ln P_{1t} - \alpha_2 \ln P_{2t} - \ldots - \alpha_n \ln P_{nt} = \vec{\alpha}' \ln \vec{P}_t - c_0 = \varepsilon_t, (39)

where \eta = [1\ \vec{\alpha}']' is the cointegrating vector and c_0 is the intercept. In order to test for cointegration, the Matlab code in Listing 2, displayed in Appendix A, was used. We compute both the \tau (t_1) and z (t_2) Dickey-Fuller statistics, which egcitest compares to tabulated Engle-Granger critical values. Table 1 shows the output of egcitest for the return series. The vector h contains the Boolean test decisions, with zero representing no cointegration. The \tau (t_1) and z (t_2) tests fail to reject the null of no cointegration, with p-values close to 85% and Dickey-Fuller test statistics substantially larger than the critical values. The regression coefficients are found to be c_0 = 5.4418 and \alpha = [0.8720, -0.2127, 0.0826]'. The



Fig. 8: Cointegration relation between assets.

regression coefficients can be utilized to examine the hypothesized cointegrating vector \eta. Fig. 8 shows the cointegration relation retrieved from the output of the Engle-Granger test for cointegration. The combination appears relatively stationary, as the test confirms.

Table 1: Output computed by egcitest.

            tau (t_1)    z (t_2)
Prob        0.8506       0.8422
Test stat.  -2.0623      -9.0153
Crit. V.    -4.1059      -32.1448
h           0            0

Since there is no evidence of cointegration between the return series, a VEC model does not need to be used. To estimate the mean vector of the portfolio, a VAR model will be employed.
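The first Engle-Granger step, and a Dickey-Fuller tau statistic on its residuals, can be sketched as follows (Python/NumPy; a simplified illustration without lag augmentation, not the egcitest implementation — the resulting statistic must be compared with Engle-Granger critical values, not ordinary t tables):

```python
import numpy as np

def engle_granger_tau(logP):
    """First Engle-Granger step: OLS of the first log price on the others plus an
    intercept, then the Dickey-Fuller tau statistic on the residuals (no lag
    augmentation). logP is a (T, n) array of log price levels."""
    y = logP[:, 0]
    X = np.column_stack([np.ones(len(logP)), logP[:, 1:]])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    u = y - X @ coef                         # candidate cointegrating residuals
    du, ulag = np.diff(u), u[:-1]
    rho = (ulag @ du) / (ulag @ ulag)        # slope of du_t on u_{t-1}
    resid = du - rho * ulag
    s2 = (resid @ resid) / (len(du) - 1)
    se = np.sqrt(s2 / (ulag @ ulag))
    return rho / se                          # tau statistic
```

A strongly negative tau (below the Engle-Granger critical value) points to stationary residuals and hence cointegration; values near zero, as found for the metals above, do not.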

4.3 Model selection

The VAR model is discussed in Section 2.8.1. The next step is to determine the number of lags. The code displayed in Listing 3 in Appendix A was written to find the number of lags. First, several VAR models are created. The matrix dt is a diagonal logical matrix, which specifies that the autoregressive matrices for VAR1diag, VAR2diag and VAR4diag are diagonal. In contrast, the



Fig. 9: Illustration of aggregation consequences for 2-dimensional densities. p is the true unknown density; \hat{f} and \hat{g} are competing density forecasts. The blue line represents the threshold. (a) Bivariate case. (b) Univariate case, in which \hat{f} = p.

specifications for the unrestricted counterparts contain empty matrices instead of dt; vgxvarx then fits the defaults, which are full autoregressive and correlation matrices. Next, with vgxvarx the created models can easily be fitted to the data. With the lratiotest function we compare the restricted (diagonal) AR models to their unrestricted (full) counterparts. The test rejects or fails to reject the hypothesis that the restricted models are adequate, at a default 5% significance level. The reject7 Boolean returns a 0, which shows that the corresponding test does not reject the AR(4) specification in favor of the larger model. All other reject Booleans returned a 1, indicating rejection.

As another check, the Akaike Information Criterion (AIC) was assessed; see Listing 3. The lowest value returned by aicbic indicates the best model. The VAR(4) model attains the lowest value and is therefore the preferred model, confirming our previous result.
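The AIC comparison can be illustrated in a univariate setting with an AR(p) fitted by OLS (Python/NumPy; a Gaussian AIC analogue of the aicbic check, with illustrative function names):

```python
import numpy as np

def ar_aic(y, p):
    """Gaussian AIC of an AR(p) fitted by OLS: T log(RSS/T) + 2k,
    with k counting the intercept, the p AR coefficients and the variance."""
    T = len(y) - p
    X = np.column_stack([np.ones(T)] + [y[p - i:len(y) - i] for i in range(1, p + 1)])
    b, *_ = np.linalg.lstsq(X, y[p:], rcond=None)
    rss = float(np.sum((y[p:] - X @ b) ** 2))
    k = p + 2
    return T * np.log(rss / T) + 2 * k
```

A lower AIC indicates the preferred lag order; fitting too few lags leaves structure in the residuals and inflates the RSS term, while extra lags are penalized through 2k.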

For illustrative purposes, the models on which the density forecasts are built will share the same VAR(4) model to estimate the portfolio mean vector, \vec{\mu}^p, but have different estimated variance-covariance matrices. The consequence of having different mean vectors is illustrated in Fig. 9. In panel (a), the axes r_1 and r_2 represent asset returns, p is the true unknown density, and \hat{f} and \hat{g} are the competing density forecasts. If the scoring rules are evaluated in 2 dimensions with the threshold line depicted in blue, the density forecast \hat{g}_t has the highest predictive accuracy. However, if the scoring rules are evaluated after aggregation onto the black line, shown in panel (b), the density forecast \hat{f}_t would coincide with the true
