Evaluating Density Forecasts

Density forecast evaluation for misspecification on the mean

Mats van Beelen (10446133)

Supervisors: Cees Diks and Hao Fang

University of Amsterdam

June 2015

In this paper a modification of the logarithmic scoring rule is proposed, such that it becomes more sensitive to the conditional mean of the density forecast. The modification involves adding a penalty to the logarithmic scoring rule, such that a model with a misspecified conditional mean is penalized more heavily than a model with a correctly specified mean. The proposed modified scoring rule should therefore have more power than the original logarithmic scoring rule.


Contents

1 Introduction
2 Scoring rules and test statistics for evaluating density forecasts
2.1 The data environment
2.2 Scoring rule and test statistics
2.3 Weighted logarithmic scoring rule
3 A modification of the logarithmic scoring rule
3.1 Misspecification of the conditional mean
4 Monte Carlo Simulations
4.1 The optimal weight value
4.2 Size
4.3 Power
4.4 Weight functions
5 Conclusions
References

1 Introduction

Forecasting is a process that estimates the outcome of events that have not yet been observed. A density forecast of a random variable to be observed in the future is an estimate of the probability distribution of that future value. It provides information about the uncertainty of the prediction, whereas point forecasts are not very informative unless some information about their uncertainty is available, according to Granger and Pesaran (2000) and Garratt et al. (2003). This has led to an increasing interest in density forecasting in both finance and macroeconomics (see Tay and Wallis, 2000 for a survey).

Forecasting is a central method to plan and achieve goals in the future. The government and other authorities often base their policy on forecasts of economic variables, such as the inflation rate. The problem that forecasters face is how to evaluate two or more competing forecast alternatives. Amisano and Giacomini (2007) propose a test for comparing the out-of-sample accuracy of competing density forecasts. Giacomini and White (2006) propose a framework for forecast selection in which one forecast model is possibly misspecified, and introduce a test for equal conditional predictive ability of two competing forecasts.

In this paper we take the logarithmic scoring rule as a starting point and modify it such that it becomes more sensitive to specific aspects of the density forecasts, such as the conditional mean of their distribution. The modification involves adding a penalty to the logarithmic scoring rule that discredits a misspecified model more than a correctly specified one. The penalty must ensure that a correctly specified model is chosen more often than a misspecified model; in other words, we want to push a correctly specified model to a higher score, such that it is chosen more often. A scoring rule with such a penalty should therefore have more power than a scoring rule without it.

Through Monte Carlo simulations the size and the power of the modified scoring rule are studied: does the power of the modified scoring rule indeed increase when the competing density forecasts differ in the conditional mean or variance? Secondly, the weights in the modified scoring rule are examined and we investigate whether there is an optimal value for the weights in terms of power. Finally, the modified logarithmic scoring rule and the original logarithmic scoring rule compete in an empirical application, where the question arises which competing density forecast is better for the empirical time series of interest.

This paper is organized as follows. In section 2 we review the literature to provide the necessary background on the environment of the scoring rules, the scoring rules themselves and the test statistic described by Diebold and Mariano (1995). In section 3 we modify the log-likelihood scoring rule such that it becomes more sensitive to the conditional mean of the density forecasts. Through Monte Carlo simulation experiments in section 4, the size and power of the modified scoring rules are examined and we address the optimal weight of the penalty in terms of power. Section 4 also contains an empirical application, in which the modified log-likelihood scoring rule and the original one compete on which forecasts better. Section 5 concludes.

2 Scoring rules and test statistics for evaluating density forecasts

2.1 The data environment

First we need more information about the environment of the stochastic process. White (1994) considers a stochastic process $W_t : \Omega \to \mathbb{R}^{k+1}$, $k \in \mathbb{N}$, $t = 1, 2, \dots$, defined on a complete probability space $(\Omega, \mathcal{F}, P)$, and identifies $W_t = (Y_t, X_t')'$, where $Y_t : \Omega \to \mathbb{R}$ is the real-valued variable of interest and $X_t : \Omega \to \mathbb{R}^k$ is a vector of predictor variables. Define $\mathcal{F}_t$ as the information set at time $t$, with $\mathcal{F}_t = \sigma(W_1', \dots, W_t')$.

Amisano and Giacomini (2007) suppose there are two competing models available, each producing one-step-ahead predictive densities of $Y_{t+1}$ based on $\mathcal{F}_t$. The two competing density forecasts of $Y_{t+1}$ are denoted by $\hat f_{m,t} = f(W_t, W_{t-1}, \dots, W_{t-m+1}; \hat\beta_{m,t})$ and $\hat g_{m,t} = g(W_t, W_{t-1}, \dots, W_{t-m+1}; \hat\beta_{m,t})$, both based on the $m$ previous observations. The vector $\hat\beta_{m,t}$ collects the parameters estimated for both models.

For producing the forecasts a rolling window is used. Giacomini and White (2006) propose such a rolling window with $T$ the total sample size, $m$ the maximum size of the estimation window and $t = m, m+1, \dots, T-1$. The first one-step-ahead forecast, made at time $t = m$, is compared to the realized value $y_{m+1}$. At time $m+1$ the second forecast is formed using the previous $m$ observations and compared to $y_{m+2}$. This procedure produces $n = T - m$ forecasts.
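To make the bookkeeping concrete, here is a minimal sketch of the rolling-window scheme in Python (the helper name `fit_density` and its interface are hypothetical stand-ins for whichever model is estimated on each window):

```python
import numpy as np

def rolling_window_forecasts(y, m, fit_density):
    """Produce the n = T - m one-step-ahead density forecasts.

    y           : array of T observations
    m           : estimation window size
    fit_density : hypothetical callable mapping a window of m observations
                  to a one-step-ahead predictive density object
    Returns a list of (predictive density, realized value) pairs.
    """
    T = len(y)
    pairs = []
    for t in range(m, T):               # t = m, m+1, ..., T-1
        window = y[t - m:t]             # the m most recent observations
        f_hat = fit_density(window)     # forecast for the next observation
        pairs.append((f_hat, y[t]))     # y[t] plays the role of y_{t+1}
    return pairs                        # n = T - m pairs
```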

The out-of-sample forecasts can be evaluated by a loss function, in the current context denoted by $S^*(\hat f_t; y_{t+1})$ (Diks et al., 2011). Diebold and Lopez (1996) discuss loss functions for quantile, probability, and density forecasts. The loss function measures the quality of the forecast: if the forecast is perfectly accurate, the loss function is zero; if not, the loss function measures 'how bad' the mistake is.

2.2 Scoring rule and test statistics

The previous subsection showed how a density forecast can be constructed with a rolling-window scheme. If two competing density forecasts are available, both producing one-step-ahead predictive densities of the random variable, we want to know which one performs better. The one-step-ahead density forecasts are denoted by $\hat f_t(y)$ and $\hat g_t(y)$. Diebold and Lopez (1996) use scoring rules to evaluate the relative predictive accuracy of the competing density forecasts. A scoring rule measures the 'quality' of a probability density forecast by a numerical score based on the density forecast and the actually observed value $y_{t+1}$: $S^*(\hat f_t; y_{t+1})$. The numerical scores can be used for ranking competing density forecasts.

Nonetheless, according to Diebold et al. (1998) and Granger and Pesaran (2000), any rational user would prefer the true conditional density $p_t$ of $Y_{t+1}$ over an incorrect density forecast. Therefore an incorrect density forecast $\hat f_t$ should not receive a higher score from the scoring rule than the true conditional density, such that the following inequality must hold:

$$E_t[S(\hat f_t; Y_{t+1})] \le E_t[S(p_t; Y_{t+1})], \quad \text{for all } t \qquad (1)$$

The scoring rule $S$ is called proper if it satisfies the above inequality (Gneiting and Raftery, 2007), and strictly proper if equality holds only when $\hat f_t$ coincides with the true conditional density. Properness implies that even a density forecast $\hat f_t$ based on a correctly specified model cannot have a higher expected score $E_t[S(\hat f_t; Y_{t+1})]$ than the expected score of the true conditional density, $E_t[S(p_t; Y_{t+1})]$, because parameter estimation leaves some uncertainty in the density forecast.

For a given scoring rule we need to test for equal predictive ability. Giacomini and White (2006) construct tests for both conditional and unconditional predictive ability. To compare two competing density forecasts $\hat f_t(y)$ and $\hat g_t(y)$ for the one-step-ahead variable $Y_{t+1}$, we construct a score difference

$$d^*_{t+1} = S^*(\hat f_t; y_{t+1}) - S^*(\hat g_t; y_{t+1}) \qquad (2)$$

using a loss function $S^*$. The null hypothesis of equal predictive ability is given by

$$H_0 : E[d^*_{t+1}] = 0, \quad \text{for all } t = m, m+1, \dots, T-1 \qquad (3)$$

Diebold and Mariano (1995) constructed a test statistic to test the null hypothesis against the alternative $H_1 : E[d^*_{t+1}] \neq 0$ (or $< 0$, or $> 0$). Let $\bar d^*_{m,n}$ be the sample average of the score differences defined in (2), $\bar d^*_{m,n} = \frac{1}{n}\sum_{t=m}^{T-1} d^*_{t+1}$. The Diebold and Mariano (1995) type statistic is given by

$$t_{m,n} = \frac{\bar d^*_{m,n}}{\sqrt{\hat\sigma^2_{m,n}/n}} \qquad (4)$$

which is asymptotically standard normally distributed as $n \to \infty$, where $\hat\sigma^2_{m,n}$ is a heteroskedasticity and autocorrelation consistent (HAC) variance estimator of $\sigma^2_{m,n} = \mathrm{Var}(\sqrt{n}\, \bar d^*_{m,n})$. Diks et al. (2011) use $\hat\sigma^2_{m,n} = \hat\gamma_0 + 2\sum_{k=1}^{K-1} a_k \hat\gamma_k$ as HAC estimator for the asymptotic variance of the average score difference, where $\hat\gamma_k$ denotes the lag-$k$ sample covariance of the sequence $(d^*_{m+1}, \dots, d^*_{T-1})$ and the $a_k$ are the Bartlett weights $a_k = 1 - k/K$ with $K = \lfloor n^{1/4} \rfloor$. The HAC estimator satisfies $\hat\sigma^2_{m,n} - \sigma^2_{m,n} \xrightarrow{p} 0$.


At significance level $\alpha$ the test rejects the null hypothesis of equal predictive ability whenever $t_{m,n} > z_\alpha$ for the one-sided test ($|t_{m,n}| > z_{\alpha/2}$ for the two-sided test), where $z_\alpha$ ($z_{\alpha/2}$) is the $1-\alpha$ ($1-\alpha/2$) quantile of the standard normal distribution.
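The following sketch implements the test statistic in (4) with the Bartlett-weight HAC estimator described above, taking a precomputed sequence of score differences $d^*_{t+1}$ as input (a minimal illustration, not the author's code):

```python
import numpy as np
from scipy.stats import norm

def dm_statistic(d):
    """Diebold-Mariano type statistic for a sequence of score differences.

    Uses sigma2_hat = gamma_0 + 2 * sum_{k=1}^{K-1} (1 - k/K) * gamma_k
    with K = floor(n^(1/4)), as in Diks et al. (2011).
    """
    d = np.asarray(d, dtype=float)
    n = len(d)
    d_bar = d.mean()
    K = int(np.floor(n ** 0.25))
    # lag-k sample autocovariances gamma_0, ..., gamma_{K-1}
    gamma = [np.mean((d[k:] - d_bar) * (d[:n - k] - d_bar)) for k in range(K)]
    sigma2 = gamma[0] + 2.0 * sum((1.0 - k / K) * gamma[k] for k in range(1, K))
    t_stat = d_bar / np.sqrt(sigma2 / n)
    p_one_sided = 1.0 - norm.cdf(t_stat)   # H1: E[d] > 0, i.e. f beats g
    return t_stat, p_one_sided
```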

2.3 Weighted logarithmic scoring rule

One of the more popular scoring rules is the log-likelihood scoring rule. Mitchell and Hall (2005) and Bao et al. (2004) focus on the logarithmic scoring rule

$$S^l(\hat f_t; y_{t+1}) = \log \hat f_t(y_{t+1}) \qquad (5)$$

which is based on the Kullback–Leibler Information Criterion (KLIC). The logarithmic scoring rule, denoted by $S^l$, assigns a high score to a density forecast if the realized value $y_{t+1}$ falls in a region with high predictive density $\hat f_t$, and a low score if the observation falls in a region with low predictive density. If $n$ forecasts are available to evaluate $y_{m+1}, y_{m+2}, \dots, y_T$, the two competing density forecasts can be ranked by their average scores $\frac{1}{n}\sum_{t=m}^{T-1} S^l(\hat f_t; y_{t+1})$ and $\frac{1}{n}\sum_{t=m}^{T-1} S^l(\hat g_t; y_{t+1})$ (Diks et al., 2011). Obviously the density forecast with the highest average score is preferred over the other. In subsection 2.2 the difference between two competing density forecasts was given by $d^*_{t+1}$. The log score difference

$$d^l_{t+1} = S^l(\hat f_t; y_{t+1}) - S^l(\hat g_t; y_{t+1}) \qquad (6)$$

can be used to test $H_0$ against the alternative hypothesis with the test statistic from Diebold and Mariano (1995) as described in (4).
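For concreteness, a small sketch of the log score difference (6) for two fixed normal density forecasts; `norm.logpdf` evaluates $\log \hat f_t(y_{t+1})$:

```python
import numpy as np
from scipy.stats import norm

def log_score_difference(y, mu_f, mu_g, sd_f=1.0, sd_g=1.0):
    """d^l_{t+1} = log f_hat(y_{t+1}) - log g_hat(y_{t+1}), per observation."""
    y = np.asarray(y, dtype=float)
    return norm.logpdf(y, loc=mu_f, scale=sd_f) - norm.logpdf(y, loc=mu_g, scale=sd_g)

# Example: DGP standard normal, f correctly specified, g with mean 0.5;
# the resulting differences can be fed to the test statistic sketched above.
rng = np.random.default_rng(0)
d_l = log_score_difference(rng.standard_normal(100), mu_f=0.0, mu_g=0.5)
```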

The divergence between two conditional densities can be appropriately measured by the Kullback–Leibler (1951) Information Criterion (KLIC). The KLIC allows us to measure the relative accuracy of two competing densities, as discussed by Mitchell and Hall (2005) and Bao et al. (2004, 2007). The KLIC for a density forecast $\hat f_t$ is defined as

$$\mathrm{KLIC}(\hat f_t) = E_t[\log p_t(Y_{t+1}) - \log \hat f_t(Y_{t+1})] = \int_{-\infty}^{\infty} p_t(y_{t+1}) \log\!\left(\frac{p_t(y_{t+1})}{\hat f_t(y_{t+1})}\right) dy_{t+1},$$

with $p_t$ the true conditional density. However, $p_t$ is unknown and thus the KLIC cannot be evaluated directly.

As said above, Mitchell and Hall (2005) and Bao et al. (2004, 2007) use the KLIC to measure the relative accuracy of two competing densities. They take the difference of the KLICs of the two competing density forecasts,

$$\mathrm{KLIC}(\hat g_t) - \mathrm{KLIC}(\hat f_t) = E_t[\log p_t(Y_{t+1}) - \log \hat g_t(Y_{t+1})] - E_t[\log p_t(Y_{t+1}) - \log \hat f_t(Y_{t+1})] = E_t[\log \hat f_t(Y_{t+1}) - \log \hat g_t(Y_{t+1})],$$

such that the term $E_t[\log p_t(Y_{t+1})]$ drops out. In the last term we recognize the expected logarithmic score difference as stated in (6). For the logarithmic scoring rule the test statistic $\bar d^*_{m,n} / \sqrt{\hat\sigma^2_{m,n}/n}$ from Diebold and Mariano (1995) becomes $\bar d^l_{m,n} / \sqrt{\hat\sigma^2_{m,n}/n}$ (Diks et al., 2011). Diks et al. also show that the null hypothesis as stated in (3) applies to the logarithmic scoring rule:

$$H_0 : E[d^l_{t+1}] = 0, \quad \text{for all } t = m, m+1, \dots, T-1 \qquad (7)$$

where $d^l_{t+1}$ is the log score difference as stated in (6).

Giacomini and White (2006) propose a null hypothesis that differs from the one stated in (3): they condition the expectation $E[S^*(\hat f_t; y_{t+1}) - S^*(\hat g_t; y_{t+1})]$ on some information set $\mathcal{G}_t$:

$$H_0 : E[S^*(\hat f_t; Y_{t+1}) - S^*(\hat g_t; Y_{t+1}) \mid \mathcal{G}_t] = 0 \qquad (8)$$

Their null hypothesis states that one cannot predict which forecasting method will be more accurate at time $t$ using the information set $\mathcal{G}_t$. For conditional predictive ability the information set $\mathcal{G}_t$ equals $\mathcal{F}_t$, the time-$t$ information set defined in subsection 2.1. The other possibility is $\mathcal{G}_t = \{\emptyset, \Omega\}$, which yields a test for equal unconditional predictive ability. If the goal is to produce a forecast for a specific date in the future, the conditional test can be more informative, because additional current information can help to predict which forecast will be more accurate for that specific date (Giacomini and White, 2006).

Amisano and Giacomini (2007) propose a modification of the log-likelihood scoring rule. They introduce a weighted logarithmic scoring rule, in which a given weight function $w(\cdot)$ multiplies the logarithmic scoring rule:

$$S^{wl}(\hat f_t; y_{t+1}) = w(y_{t+1}) \log \hat f_t(y_{t+1}) = w(y_{t+1}) S^l(\hat f_t; y_{t+1}) \qquad (9)$$

It should be noted that the weighted logarithmic scoring rule is not proper, but Diks et al. (2011) showed some interesting applications of this scoring rule. The test for equal performance of the density forecasts $\hat f_t$ and $\hat g_t$ differs slightly from the test for the logarithmic scoring rule. The null hypothesis can be formulated as

$$H_0 : E[d^{wl}_{t+1} \mid \mathcal{G}_t] = E[w(y_{t+1})(S^l(\hat f_t; y_{t+1}) - S^l(\hat g_t; y_{t+1})) \mid \mathcal{G}_t] = 0 \qquad (10)$$

With $\bar d^{wl}_{m,n} = \frac{1}{n}\sum_{t=m}^{T-1} d^{wl}_{t+1}$, the test statistic from Diebold and Mariano becomes $t_{m,n} = \bar d^{wl}_{m,n} / \sqrt{\hat\sigma^2_{m,n}/n}$, which is still asymptotically standard normally distributed as $n \to \infty$, where $\hat\sigma^2_{m,n}$ is the same Bartlett-weight HAC estimator as before, now applied to $\sigma^2_{m,n} = \mathrm{Var}(\sqrt{n}\, \bar d^{wl}_{m,n})$. As before, at significance level $\alpha$ the null hypothesis of equal predictive ability is rejected whenever $t_{m,n} > z_\alpha$ ($|t_{m,n}| > z_{\alpha/2}$ for the two-sided test).
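A sketch of the weighted log score difference $d^{wl}_{t+1}$ appearing in (10), with the weight function supplied by the user (here an illustrative right-tail indicator, in the spirit of the tail-focused weights in Diks et al., 2011):

```python
import numpy as np
from scipy.stats import norm

def weighted_log_score_difference(y, w, logpdf_f, logpdf_g):
    """d^{wl}_{t+1} = w(y_{t+1}) * (log f_hat(y_{t+1}) - log g_hat(y_{t+1}))."""
    y = np.asarray(y, dtype=float)
    return w(y) * (logpdf_f(y) - logpdf_g(y))

# Illustrative weight: focus on the right tail beyond threshold 1.
right_tail = lambda y: (y > 1.0).astype(float)
d_wl = weighted_log_score_difference(
    np.random.default_rng(1).standard_normal(200),
    right_tail,
    lambda y: norm.logpdf(y, loc=0.0, scale=1.0),   # f_hat
    lambda y: norm.logpdf(y, loc=0.5, scale=1.0),   # g_hat
)
```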

3 A modification of the logarithmic scoring rule

In this section we introduce a modification of the logarithmic scoring rule described in section 2. The aim is to modify the log-likelihood scoring rule so that it becomes more sensitive to specific aspects of the density forecasts; the specific aspect investigated in this paper is the conditional mean. The modification involves adding a penalty to the logarithmic scoring rule, such that a model with a misspecified conditional mean scores lower than a model with a correctly specified conditional mean.

3.1 Misspecification of the conditional mean

The aim is to generate density forecasts for a variable of interest over a future time horizon. Given an observed series $y_1, y_2, \dots, y_m$ and a forecast horizon, generate predictions for $y_{m+1}, y_{m+2}, \dots, y_T$. Let $\mu_{\hat f,t+1}$ denote the point forecast of $y_{t+1}$ based on the density forecast $\hat f$, conditional on $\mathcal{G}_t$:

$$\mu_{\hat f,t+1} = E[y_{t+1} \mid \mathcal{G}_t]$$

The minimum mean squared error (MSE) forecast is the forecast $\mu_{\hat f,t+1}$ that minimizes the expected squared loss

$$E[(y_{t+1} - \mu_{\hat f,t+1})^2 \mid \mathcal{G}_t] \qquad (11)$$

Patton and Timmermann (2007) propose the MSE as a scoring rule in the Diebold and Mariano test statistic described in section 2. For a given weight function $w(\cdot)$ we can construct a test in which a model with a misspecified conditional mean scores lower than a model with a correctly specified mean. Consider $y_{t+1} - \mu_{\hat f,t+1}$: if $\mu_{\hat f,t+1}$ is correctly specified, the difference between the observed value and $\mu_{\hat f,t+1}$ is zero in conditional expectation. This result can be used to construct a penalty on the log-likelihood score when the difference is nonzero. The weight function can be as simple as a constant $c$ between 0 and 1; the trade-off between the original log-likelihood scoring rule and the penalty depends on $c$, and for $c$ equal to one no penalty is added to the logarithmic score. Summarizing the above gives

$$d^{cl}_{t+1} = \left[c \, S^l(\hat f_t; y_{t+1}) - (1-c)(y_{t+1} - \mu_{\hat f,t+1})^2\right] - \left[c \, S^l(\hat g_t; y_{t+1}) - (1-c)(y_{t+1} - \mu_{\hat g,t+1})^2\right] \qquad (12)$$

In (8) the null hypothesis was given for equal conditional predictive ability of the competing density forecasts. To use the same null hypothesis, the expected value of the penalty difference $(1-c)\left[(y_{t+1} - \mu_{\hat f,t+1})^2 - (y_{t+1} - \mu_{\hat g,t+1})^2\right]$ needs to be zero. If neither conditional mean is misspecified, the two penalty terms have the same conditional expectation and the penalty difference is zero in expectation, in line with the null hypothesis proposed by Giacomini and White (2006). The null hypothesis therefore does not change, and it allows us to use the same test statistic as stated in (4), with $d^* = d^{cl}$:

$$H_0 : E[d^{cl}_{t+1} \mid \mathcal{G}_t] = 0, \quad \text{for all } t = m, m+1, \dots, T-1. \qquad (13)$$

Let $\bar d^{cl}_{m,n} = \frac{1}{n}\sum_{t=m}^{T-1} d^{cl}_{t+1}$. For testing $H_0$ against $H_1 : E[d^{cl}_{t+1} \mid \mathcal{G}_t] \neq 0$, the Diebold and Mariano (1995) test statistic can be used,

$$t_{m,n} = \frac{\bar d^{cl}_{m,n}}{\sqrt{\hat\sigma^2_{m,n}/n}} \qquad (14)$$

where $\hat\sigma^2_{m,n}$ is a HAC variance estimator of $\sigma^2_{m,n} = \mathrm{Var}(\sqrt{n}\, \bar d^{cl}_{m,n})$.
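A minimal sketch of the modified score difference $d^{cl}_{t+1}$ in (12); the conditional-mean forecasts `mu_f` and `mu_g` and the log densities are assumed to be precomputed by the forecaster:

```python
import numpy as np

def modified_score_difference(y, log_f, log_g, mu_f, mu_g, c=0.5):
    """d^{cl}_{t+1} as in (12): a convex combination (weight c) of the log
    score and a squared-error penalty on the conditional-mean forecast."""
    y = np.asarray(y, dtype=float)
    score_f = c * np.asarray(log_f) - (1.0 - c) * (y - np.asarray(mu_f)) ** 2
    score_g = c * np.asarray(log_g) - (1.0 - c) * (y - np.asarray(mu_g)) ** 2
    return score_f - score_g   # c = 1 recovers the plain log score difference
```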

4 Monte Carlo Simulations

Through Monte Carlo simulations the size and power properties of the modified test described in section 3 are studied. In the first part of the simulation experiments different values of c are considered for the modified scoring rule, where only convex combinations of the original scoring rule and the penalty are used. After a choice is made for the optimal value of c in terms of power, the size of the test is investigated. In the last simulation experiments the power is studied for several cases.


Figure 1: One-sided rejection rates at nominal level 10% of the Diebold-Mariano type test statistic of equal predictive accuracy defined in (3) when using the modified scoring rule for different values of c, based on 1000 replications. For both graphs the DGP is i.i.d. standard normal; the test compares the predictive accuracy of the standard normal distribution and the normal distribution with mean 0.1 (left panel) or 0.5 (right panel).

4.1 The optimal weight value

The first problem that arises is for which value of c the test statistic is optimal. At first two cases are studied where the data generating process is i.i.d. standard normally distributed; the two competing density forecasts are normally distributed as well. One of the competing density forecasts corresponds exactly to the data generating process; the other density forecast has mean 0.1 in the first case and 0.5 in the second, with variance 1 in both cases.

Figure 1 displays the rejection probabilities for different values of c when the data generating process is taken to be standard normally distributed and one of the competing density forecasts corresponds exactly to the data generating process. The number of replications is 1000 and the number of observations is 50 and 100.

For both graphs the test appears to have the same power for every value of c, so the penalty does not seem to result in more power. This calls for further investigation of the scoring rule when the competing density forecasts are normally distributed; therefore the scoring rule as stated in (12) is written out. For unit-variance normal density forecasts,

$$
\begin{aligned}
d_{t+1} &= c\left[\log \hat f_t(y_{t+1}) - \log \hat g_t(y_{t+1})\right] - (1-c)\left[(y_{t+1} - \mu_{\hat f,t+1})^2 - (y_{t+1} - \mu_{\hat g,t+1})^2\right] \\
&= -\frac{c}{2}\left[(y_{t+1} - \mu_{\hat f,t+1})^2 - (y_{t+1} - \mu_{\hat g,t+1})^2\right] - (1-c)\left[(y_{t+1} - \mu_{\hat f,t+1})^2 - (y_{t+1} - \mu_{\hat g,t+1})^2\right] \\
&= -\left(1 - \frac{c}{2}\right)\left[(y_{t+1} - \mu_{\hat f,t+1})^2 - (y_{t+1} - \mu_{\hat g,t+1})^2\right]
\end{aligned}
$$

From the last line it can be seen that the original logarithmic scoring rule and the penalty are perfectly correlated: the log score difference is exactly minus one half times the penalty difference. Varying c therefore only rescales the score difference, which leaves the Diebold-Mariano statistic, and hence the power, unchanged.

Accordingly, different density forecasts need to be considered. In the next case two double exponential distributions are considered as the two competing density forecasts. First the scoring rule is written out, and thereafter a simulation is executed.

$$
\begin{aligned}
d_{t+1} &= c\left[\log \hat f_t(y_{t+1}) - \log \hat g_t(y_{t+1})\right] - (1-c)\left[(y_{t+1} - \mu_{\hat f,t+1})^2 - (y_{t+1} - \mu_{\hat g,t+1})^2\right] \\
&= c\left[\log\!\left(\frac{e^{-|y_{t+1} - \mu_{\hat f,t+1}|/b}}{2b}\right) - \log\!\left(\frac{e^{-|y_{t+1} - \mu_{\hat g,t+1}|/b}}{2b}\right)\right] - (1-c)\left[(y_{t+1} - \mu_{\hat f,t+1})^2 - (y_{t+1} - \mu_{\hat g,t+1})^2\right] \\
&= c\left[-\frac{|y_{t+1} - \mu_{\hat f,t+1}|}{b} + \frac{|y_{t+1} - \mu_{\hat g,t+1}|}{b}\right] - (1-c)\left[-2\mu_{\hat f,t+1} y_{t+1} + \mu_{\hat f,t+1}^2 + 2\mu_{\hat g,t+1} y_{t+1} - \mu_{\hat g,t+1}^2\right]
\end{aligned}
$$

From this equation it follows that the scoring rule and the penalty are not as correlated as for the normal distributions. Therefore consider a case where the data generating process is again taken to be standard normal. The two competing density forecasts are double exponential distributed; the mean for f is 0 with variance 1, and the mean for g is 0.1 in the first case and 0.5 in the second, with variance 1.

Figure 2: One-sided rejection rates at nominal level 10% of the Diebold-Mariano type test statistic of equal predictive accuracy defined in (3) when using the modified scoring rule for different values of c, based on 1000 replications. For both graphs the DGP is i.i.d. standard normal; the test compares the predictive accuracy of two double exponential distributions, one with mean 0 and variance 1 (both panels) and the other with mean 0.1 (left panel) or 0.5 (right panel).

Figure 2 displays the rejection probabilities for different values of c. The number of replications is 1000. The right panel shows that the rejection rate decreases for larger values of c, so the penalty appears to give extra power to the Diebold-Mariano type test statistic. The left panel has a maximum at a c value around 0.5; it can therefore be concluded that the penalty indeed results in more power for a range of c values. For the size and power properties, this paper focuses on the normal and double exponential distributions for the competing density forecasts. In the simulation experiments a c value of $\frac{1}{2}$ is taken.
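The Monte Carlo experiment behind Figures 1 and 2 can be sketched as follows, reusing `modified_score_difference` and `dm_statistic` from the earlier sketches (a Laplace scale of $1/\sqrt{2}$ gives variance 1; the parameter defaults are illustrative):

```python
import numpy as np
from scipy.stats import norm, laplace

def rejection_rate(c, n_obs=100, n_reps=1000, mu_g=0.5, level=0.10, seed=1):
    """Fraction of replications in which the one-sided test rejects H0.

    DGP: i.i.d. standard normal. Both forecasts are double exponential
    with variance 1; f has mean 0 (correct), g has mean mu_g."""
    rng = np.random.default_rng(seed)
    z_alpha = norm.ppf(1.0 - level)
    b = 2 ** -0.5                      # Laplace scale for unit variance
    rejections = 0
    for _ in range(n_reps):
        y = rng.standard_normal(n_obs)
        log_f = laplace.logpdf(y, loc=0.0, scale=b)
        log_g = laplace.logpdf(y, loc=mu_g, scale=b)
        d = modified_score_difference(y, log_f, log_g, 0.0, mu_g, c=c)
        t_stat, _ = dm_statistic(d)
        rejections += t_stat > z_alpha
    return rejections / n_reps
```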

4.2 Size

To investigate the size properties of the modified test a case is required with two competing predictive densities that are either both equally correct or both equally incorrect. The data generating process is i.i.d. standard normal. In the first case the two competing density forecasts are normally distributed with different means: the mean for f is -a and the mean for g is a, both with variance equal to 1. The value of c is chosen to be $\frac{1}{10}$, $\frac{1}{2}$ and $\frac{9}{10}$.

Figure 3: One-sided rejection rates at nominal level 10% of the Diebold-Mariano type test statistic of equal predictive accuracy defined in (3) when using the modified scoring rule for different values of a, based on 1000 replications. The DGP is i.i.d. standard normal; the test compares the predictive accuracy of the normal distributions with means -a and a (left panel) and of the double exponential distributions with means -a and a (right panel).

Figure 3 shows the one-sided rejection rates at significance level 10% of the null hypothesis against the alternative hypothesis that f has better predictive ability. The Monte Carlo simulations are based on 1000 replications for sample size n = 250.

From the left panel of figure 3 it follows that the test is slightly undersized, which could be a result of the small number of observations; this holds for every value of c. Second, from the right panel we can conclude that the test is better sized for the c-values 0.9 and 0.5 than for smaller values. Overall, for the modified scoring rule the test appears to be well sized for different means.


4.3 Power

The power of the modified logarithmic scoring rule is studied by performing simulation experiments where one of the competing density forecasts corresponds exactly to the data generating process and the other differs from the data generating process in its mean. Two experiments are used to investigate the power of the test. First the power is investigated as the mean of the incorrect predictive density moves further away from the mean of the correctly specified density. The scoring rule used for this experiment is the one stated in (12).

Two cases are considered. We take a data generating process which is standard normally distributed; the number of observations is 100 and 200. In the first case the two competing density forecasts are normally distributed, where f corresponds exactly to the data generating process and g has a mean which varies between -1 and 1, with variance 1. The second case contains two competing density forecasts which are double exponential distributed, where f has mean 0 and variance 1 and g has a mean which varies between -1 and 1, with variance 1.

Figure 4 displays the rejection rates at nominal significance level 10% of the Diebold and Mariano type test statistic of equal predictive ability when using the original logarithmic scoring rule and the modified scoring rule. The left panel supports the same conclusion as the previous simulation experiments. The right panel, however, shows that for both sample sizes the modified scoring rule results in more power.

We should note that when the competing density forecasts are exactly the same, the score differences are exactly zero and therefore the HAC estimator is zero. This results in a degenerate point at $\mu_g = 0$, where the rejection rate should be around 0.1.

Figure 4: One-sided rejection rates at nominal level 10% of the Diebold-Mariano type test statistic of equal predictive accuracy defined in (3) when using the modified scoring rule for different means of g, based on 1000 replications. The DGP is i.i.d. standard normal; the test compares the predictive accuracy of the normal distributions with mean 0 and mean varying over [-1,1] (left panels) and of the double exponential distributions with mean 0 and mean varying over [-1,1] (right panels); in all graphs the original scoring rule is shown as well. The top panels contain 100 observations, the bottom panels 200.

In the last simulation experiment the effect of a rolling window is examined, where the experiment contains parameter estimation for the two competing density forecasts. The data generating process is taken to be an AR(2) process, specified as $y_t = 0.9 y_{t-1} + 0.05 y_{t-2} + \varepsilon_t$, where the $\varepsilon_t$ are i.i.d. normally distributed. One of the competing density forecasts is based on an AR(2) specification, correct relative to the data generating process up to two estimated parameters. The other density forecast is an AR(1) specification, where only one parameter needs to be estimated using the rolling window. The parameters are estimated by OLS. The competing density forecasts have the following model specifications:

$$f : y_{t+m} = \alpha y_{t+m-1} + \beta y_{t+m-2} + \varepsilon_{t+m}$$
$$g : y_{t+m} = \delta y_{t+m-1} + \varepsilon_{t+m}$$
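A sketch of one step of this rolling-window experiment: OLS estimation of an AR(p) model without intercept on a window of m observations, returning the one-step-ahead conditional mean and residual variance that define the normal density forecast (index bookkeeping is illustrative):

```python
import numpy as np

def ols_ar_forecast(window, p):
    """Fit AR(p) by OLS on `window` and return the one-step-ahead
    mean forecast and the residual variance."""
    y = np.asarray(window, dtype=float)
    # column j holds lag j+1 regressors for targets y[p], ..., y[-1]
    X = np.column_stack([y[p - 1 - j:len(y) - 1 - j] for j in range(p)])
    target = y[p:]
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ coef
    mu_next = coef @ y[-1:-p - 1:-1]   # the p most recent values, in lag order
    return mu_next, resid.var()

# For each window y[t-m:t], f is N(mu, s2) from ols_ar_forecast(window, 2)
# and g is N(mu, s2) from ols_ar_forecast(window, 1), both evaluated at y[t].
```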

Table 1 displays the one-sided rejection probabilities of the test of equal predictive ability for various rolling window sizes m. The table also reports the parameter estimates, obtained using a moving window of m observations. The estimation window size m is varied from 50 to 2000, the out-of-sample size is set to 1000 and the number of replications is 1000. The same data are used for every moving window size m. For relatively small estimation windows (m = 50, 100, 250) the OLS uncertainty is more important than for larger estimation windows, and the test indicates that the incorrect AR(1) model produces better density forecasts. For larger estimation windows (m = 500, 1000, 2000) the test favors the correct AR(2) model. The estimated α and β show that for a bigger estimation window the estimated parameters are closer to the true parameters.

m                       50      100     250     500     1000    2000
Rejection probability   0       0       0.1950  0.6880  0.9070  0.9810

Table 1: The table displays the one-sided rejection probabilities at nominal significance level 10% of the null hypothesis of equal predictive accuracy against the alternative hypothesis by the Diebold and Mariano type test statistic defined in (14) when using the modified scoring rule as stated in (12). The number of observations is 1000, the moving window size is varied from 50 to 2000 and the number of replications is 2000.

4.4 Weight functions

Diks et al. (2011) described in their paper the usefulness of focusing on different regions of the distribution: in risk management one is more interested in the tails of the distribution, while for monetary policymakers who aim to keep inflation between boundaries the central region could be of interest. The proposed scoring rule involves a penalty which only focuses on the mean of the density forecast, so it could be of interest to investigate the outer region of the distribution while adding a penalty which focuses on the central part of the distribution. Therefore an additional weight function is added to the modified scoring rule, namely a threshold function. The threshold function, $w_t(y) = I(y < -r \text{ or } y > r)$, is 0 if the value lies between $-r$ and $r$ and 1 otherwise. The modified scoring rule becomes:

$$d^{wcl}_{t+1} = w_t(y_{t+1}) \left[ c\left(\log \hat f_t(y_{t+1}) - \log \hat g_t(y_{t+1})\right) - (1-c)\left((y_{t+1} - \mu_{\hat f,t+1})^2 - (y_{t+1} - \mu_{\hat g,t+1})^2\right) \right] \qquad (15)$$
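A self-contained sketch of the threshold-weighted modified score difference in (15), with the weight equal to 0 on the central region $[-r, r]$ and 1 outside, as described above:

```python
import numpy as np

def threshold_weighted_difference(y, log_f, log_g, mu_f, mu_g, c=0.5, r=1.0):
    """d^{wcl}_{t+1} as in (15): the modified score difference of (12)
    multiplied by the threshold weight w_t(y)."""
    y = np.asarray(y, dtype=float)
    w = (np.abs(y) > r).astype(float)   # 0 inside [-r, r], 1 in the tails
    score_f = c * np.asarray(log_f) - (1.0 - c) * (y - mu_f) ** 2
    score_g = c * np.asarray(log_g) - (1.0 - c) * (y - mu_g) ** 2
    return w * (score_f - score_g)
```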

We consider two cases, one where the competing density forecasts are normally distributed and one where they are double exponential distributed. For both cases the data generating process is taken to be i.i.d. standard normal. First we investigate the size properties of the weighted modified scoring rule, and thereafter the power.

Figure 5: One-sided rejection rates at nominal level 10% of the Diebold-Mariano type test statistic of equal predictive accuracy defined in (3) when using the weighted modified scoring rule for different values of r, based on 1000 replications.

Figure 5 displays the one-sided rejection rates at nominal significance level 10% of the null hypothesis against the alternative hypothesis, under the threshold weight function. The left panel compares the predictive accuracy of the normal distributions with means -0.1 and 0.1, both with variance 1. The right panel does the same for the double exponential distributions with means -0.1 and 0.1, both with variance 1. The number of replications is 1000 and the sample size is set to 300. For large values of r the size properties are not satisfactory, which could be because more and more observations are left out as r grows. Therefore the power properties are studied with r = 1 for the threshold function.

To evaluate the power properties of the weighted modified scoring rule, we consider a case where one of the competing density forecasts corresponds to the data generating process, which is taken to be i.i.d. standard normally distributed. The other density forecast has a variable mean between -1 and 1. The c-value is $\frac{1}{2}$.

Figure 6 shows the rejection probabilities for n = 500, based on 1000 replications. In the left panel the competing density forecasts are normally distributed; in the right panel they are double exponential distributed. The nominal significance level is 10%. Several conclusions can be drawn from the graph. First, the threshold function has more impact on the double exponential density forecasts than on the normal density forecasts: in figure 4 the widths of both power curves were similar, whereas in figure 6 the width of the normal power curve is smaller. Second, in the right panel the rejection rate goes to zero when the conditional mean is more heavily misspecified, while in the left panel the rejection rate already reaches 1 for a misspecification of 1 in absolute value on the conditional mean.


Figure 6: One-sided rejection rates at nominal level 10% of the Diebold-Mariano type test statistic of equal predictive accuracy defined in (3) when using the weighted modified scoring rule with r = 1, based on 1000 replications. The left panel compares the predictive accuracy of the normal distributions, where the density forecast g has a variable mean between [-1,1]. The right panel compares the predictive accuracy of the double exponential distributions, where the density forecast g has a variable mean between [-1,1].

5 Conclusions

In this paper a new scoring rule based on the logarithmic scoring rule is developed, such that the scoring rule becomes more sensitive to misspecification of the conditional mean of the density forecast. The modified scoring rule is designed to have more power than the original logarithmic scoring rule when the conditional mean is misspecified; the added penalty should push the relatively better density forecast to a higher score. The modified scoring rule is investigated through multiple simulations.

The first part of the simulation experiments focused on finding an optimal value of c in terms of power. Due to the perfect correlation between the log score difference of the normal density forecasts and the penalty, adding a penalty did not result in more power in the normal case. The density forecasts with the double exponential distributions, however, showed that for a range of values of c the rejection probability increases when a penalty is added.

Monte Carlo simulations demonstrated that the Diebold and Mariano type test statistic using the modified scoring rule appears to be well sized. Secondly, we have seen that for density forecasts based on the double exponential distributions the modified scoring rule results in more power than the original logarithmic scoring rule across different misspecified means of the incorrect density forecast. We have also seen that for the weighted modified scoring rule the test appears to be well sized for some threshold values. The power of the weighted modified scoring rule was lower than that of the modified scoring rule, but the rejection probabilities still tend to 1 for larger misspecification of the conditional mean. Hence we can conclude that adding a penalty to the logarithmic scoring rule indeed results in more power.


References

1. Amisano, G. and Giacomini, R. (2007). Comparing density forecasts via weighted likelihood ratio tests. Journal of Business and Economic Statistics 25, 177-190.

2. Bao, Y., Lee, T. and Saltoglu, B. (2004). A test for density forecast comparison with applications to risk management. Working paper 04-08.

3. Bao, Y., Lee, T. and Saltoglu, B. (2007). Comparing density forecast models. Journal of Forecasting 26, 203-225.

4. Diebold, F.X. and Mariano, R.S. (1995). Comparing predictive accuracy. Journal of Business and Economic Statistics 13, 253-263.

5. Diebold, F.X. and Lopez, J.A. (1996). Forecast evaluation and combination. Federal Reserve Bank of New York Research Paper 9525.

6. Diks, C., Panchenko, V. and van Dijk, D. (2011). Likelihood-based scoring rules for comparing density forecasts in tails. Journal of Econometrics 163, 215-230.

7. Garratt, A., Lee, K., Pesaran, M.H. and Shin, Y. (2003). A long run structural macroeconometric model of the UK. The Economic Journal 113, 412-455.

8. Giacomini, R. and White, H. (2006). Tests of conditional predictive ability. Econometrica 74, 1545-1578.

9. Gneiting, T. and Raftery, A.E. (2007). Strictly proper scoring rules, prediction, and estimation. Journal of the American Statistical Association 102, 359-378.

10. Granger, C.W.J. and Pesaran, M.H. (2000). Economic and statistical measures of forecast accuracy. Journal of Forecasting 19, 537-560.

11. Mitchell, J. and Hall, S.G. (2005). Evaluating, comparing and combining density forecasts using the KLIC with an application to the Bank of England and NIESR 'fan' charts of inflation. Oxford Bulletin of Economics and Statistics 67, 995-1033.

12. Patton, A.J. and Timmermann, A.G. (2007). Properties of optimal forecasts. Econometric Society 2004 North American Winter Meetings 234.
