
Comparison of model averaging methods

with an application to export of China

An empirical analysis of forecast accuracy

Hanning Ma (10596968)

December 23, 2016

Bachelor thesis econometrics
Supervisor: Prof. dr. C.G.H. Diks


Statement of Originality

This document is written by student Hanning Ma, who declares to take full responsibility for the contents of this document. I declare that the text and the work presented in this document is original and that no sources other than those mentioned in the text and its references have been used in creating it. The Faculty of Economics and Business is responsible solely for the supervision of completion of the work, not for the contents.


Abstract

This paper describes a case study in which five different model averaging methods have been compared in terms of their RMSE. Model averaging methods combine individual forecasts generated by different models into a single combined forecast. The five methods compared were: equal weights averaging (EWA), Bates-Granger averaging (BGA), averaging using Akaike's information criterion (AICA), averaging using Bayesian information criterion (BICA) and two versions of Granger-Ramanathan averaging (GRA). The results were then compared to the results of three different studies performed by other researchers. The conclusion is that GRA performed the best over three subsets of models, which was also the result of the three analysed case studies. The version of GRA that performed the best in this case was the unconstrained version.


Contents

1 Introduction
2 Literature review
  2.1 Overview of past studies
  2.2 Model averaging methods
    2.2.1 Equal weights averaging
    2.2.2 Bates-Granger averaging
    2.2.3 Information criterion averaging
    2.2.4 Granger-Ramanathan averaging
  2.3 Hypotheses
3 Methodology and Data
  3.1 Data set
  3.2 Models
  3.3 Method of Comparison
4 Results
5 Conclusion
References

1 Introduction

In college, econometrics students learn to specify models for fitting given data sets and to select the best, or at least one of the best fitting, models out of the many thinkable models. However, with so many competing models, how does one choose a model? And especially, how can one choose a best model when the results of some of the models turn out to be equally good in terms of errors, but differ in outcome? An alternative to model selection is model averaging. Multi-model averaging (MA) combines two or more forecast models to generate a forecast that is a weighted average of the selected models. Many researchers have shown that forecast combination techniques improve performance over model selection techniques. The study by Bates and Granger (1969) has been very influential and an important example to many other researchers in this field.

Many multi-model averaging methods have been developed over the past few decades besides the equal weights averaging technique, and much research has been done to compare these different model averaging techniques. Hansen (2008) developed the Mallows model averaging (MMA) method and compared it with ten other techniques. His conclusion is that MMA is in the lead group of his study, along with (constrained) Granger-Ramanathan averaging and smoothed Akaike's information criterion averaging. Diks and Vrugt (2010) compared eight different model averaging methods, including equal weights averaging (EWA), Bates-Granger averaging (BGA), Akaike's information criterion averaging (AICA) and Bayesian information criterion averaging (BICA), and conclude that Granger-Ramanathan averaging gives similar results to the Mallows model averaging method and the Bayesian model averaging method, but they pointed out that GRA is easier to use and less complex in terms of computation.

In this study the model averaging techniques chosen for comparison are equal weights averaging, Bates-Granger averaging, AIC averaging, BIC averaging and two versions of Granger-Ramanathan averaging, namely a constrained and an unconstrained version. Mallows model averaging and Bayesian model averaging have not been chosen because Diks and Vrugt (2010) already argued that the results of those two methods would be similar to the results of GRA, and since GRA is easier to use, the other two have been left out. The accuracy of the point predictors of these techniques has been compared by calculating the root mean square forecast errors (RMSE) in an ex post evaluation, giving the resulting ranking of the methods by RMSE.

By comparing these techniques the question naturally arises whether one or more techniques outperform the others, and how this result differs from the results obtained by other researchers, such as Diks and Vrugt (2010).

In this study an analysis of the multi-model averaging techniques has been made by making predictions of the total export of China. These predictions have then been evaluated using observed evaluation data from the same period. According to the World Trade Organization (2015), China has been the world's leading export country since 2009, and statistics show the export value of China is still rising every year.

To gain more insight into the opinions and theories of other researchers, and to be able to compare the final results of this case study with results of past research, a literature review has been performed, in which several other empirical case studies have been analysed and compared and the model averaging methods used in this research have been introduced. To compare the model averaging techniques, an empirical case study has been performed and the results have been analysed and discussed.

This study is composed of five parts. A literature review is set up in the next section, where previous case studies by other researchers and the forecast combination techniques used in this research are described. From this literature study the hypotheses for the main research question of this thesis have been drawn. In Section 3, the data and methodology used in the empirical study are described. In Section 4 the results of the study are presented and a detailed analysis of these results is given. In Section 5 the conclusions that can be drawn from the literature study and the empirical study are combined and the main question is answered.

2 Literature review

As stated above, the work by Bates and Granger has been of great influence to other researchers. Since this seminal work, many different forecast combination techniques have been developed and studied. This section discusses a selection of the studies that have been performed, the model averaging methods that have been used in those studies and the conclusions that have been drawn. After that, the model averaging techniques used in this study are described, and at the end the hypotheses that follow from this literature review are given.

2.1 Overview of past studies

Hansen (2007) developed an averaging technique called the Mallows model averaging technique by minimizing the Mallows criterion. He compared this technique with EWA, median forecast, BIC selection, weighted BIC, AIC selection, smoothed AIC, BGA, predictive least squares and GRA (Hansen, 2008). A selection of thirteen models and a simulated data set containing 20,000 observations were used. When comparing the out-of-sample mean-square forecast error (MSFE), Hansen found that only three methods were undominated, namely weighted BIC, Bates-Granger and MMA, but none of these three methods uniformly dominated the others. However, he also calculated the maximum regrets of the procedures, which is the difference between the MSFE and the best MSFE achievable, in Hansen's case the best MSFE among the eleven forecasting methods. The results were that MMA had the lowest and best maximum regret out of all of the compared combination methods. Smoothed AIC and constrained Granger-Ramanathan came in second and third respectively. His conclusion is that even though some procedures have low MSFE in some cases, MMA achieved the best performance when comparing MSFE and maximum regret. The ranking of the remaining methods relevant to this case study was, from best performing to worst performing: AICA, BICA, BGA and EWA. It should be noted, though, that the unconstrained version of GRA performed worse than the constrained version of GRA and BGA, but was still better than EWA.

Another article on the comparison of different MA methods was that by Diks and Vrugt (2010). They compared eight different forecast combination methods: EWA, unconstrained GRA, MMA, Bayesian model averaging in the finite mixture model, Bayesian model averaging in the linear regression model, Bates-Granger model averaging (BGA), averaging using Akaike's information criterion (AICA) and averaging using Bayesian information criterion (BICA). Diks and Vrugt then performed two case studies, each with a different set of models and a different data set. For the first case study they used eight models and about 13,500 observations in total, and for the second case study they used seven models and about 10,000 observations in total. Besides the RMSE, they also compared the ∆RMSE of the different techniques, by first calculating the ex post optimal weights for the evaluation period and then the difference between the RMSE of the optimal weights and the RMSE obtained with the different model averaging techniques. Their results showed that the RMSE of BMA in the linear regression model, GRA and MMA were the lowest and the closest to the RMSE of the optimal weights. They conclude that even though these three methods perform similarly, GRA has the most advantages, since it is easier to use and implement. After a bias correction was applied, AICA and BICA came in joint fourth. In both case studies these methods gave one model a weight of 1 and the rest a weight of 0, which means that their RMSEs were also exactly the same. BGA performed less well than AICA and BICA, while EWA performed the worst of the whole set of methods in both case studies.

Arsenault, Gatien, Renaud, Brissette and Martel (2015) consider the research by Diks and Vrugt (2010) of great influence to their research. They also compared different MA methods, using hydrological data and models. The methods compared in their study were EWA, BICA, AICA, BGA, BMA, three types of GRA (an unconstrained version, a version with the restriction that all the weights must sum to one, and a version with a bias correction) and Shuffle Complex averaging (SCA). Their results showed that the lead group consisted of the three types of GRA and the SCA method. The SCA method is an iteration-based method and takes longer to execute than GRA. Therefore, they conclude that the version of GRA with a bias correction and without the restriction of the weights summing to one would be the most efficient and least complex technique to apply. Of the remaining GRA methods, the version with the restriction of the weights summing to one performed second best, and the GRA method with no restrictions at all performed the worst of the GRA methods. The ranking of the remaining methods relevant to this study, in order from best to worst performing, was: BMA, BICA, BGA, AICA and EWA.

2.2 Model averaging methods

As stated before in the introduction, this study describes the comparison of five multi-model averaging methods: equal weights averaging, Bates-Granger averaging, AIC averaging, BIC averaging and Granger-Ramanathan averaging. Next, these five different techniques are described, based on the linear model

Y_t = X_t^T β + ε_t = Σ_{i=1}^k β_i X_{i,t} + ε_t.  (1)

Here ε_t is ideally white noise with distribution N(0, σ²) and k is the number of forecasts, which is equal to the number of models chosen by the model averaging method. X_{i,t} is the forecast of the observed value Y_t made by model i for period t. X_t^T β adds up to the combined forecast Ŷ_t, the point forecast of Y_t, where β is calculated differently for each type of MA method.

2.2.1 Equal weights averaging

The weights used in equal weights averaging are equal for each of the models, giving the vector β̂^EWA = (1/k, ..., 1/k) with the weighted average of predictors Ŷ_t^EWA = (1/k) Σ_{i=1}^k X_{i,t}.
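As an illustration, a minimal sketch of the EWA combination in Python; the function name and toy numbers are illustrative, not from the thesis:

```python
import numpy as np

def ewa_combine(X):
    """Equal weights averaging: each of the k model forecasts in the
    (n, k) array X gets weight 1/k; returns the combined forecast."""
    k = X.shape[1]
    weights = np.full(k, 1.0 / k)   # beta_hat^EWA = (1/k, ..., 1/k)
    return X @ weights

# toy example: two models forecasting three periods
X = np.array([[1.0, 3.0],
              [2.0, 4.0],
              [0.0, 2.0]])
combined = ewa_combine(X)           # row-wise average of the two columns
```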

2.2.2 Bates-Granger averaging

Bates and Granger (1969) chose to select weights by setting the weights equal to

β̂_{BGA,i} = (1/σ̂_i²) / Σ_{j=1}^k (1/σ̂_j²),  (2)

where σ̂_i² is the forecast error variance of model i in the calibration period.
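A sketch of how these Bates-Granger weights could be computed from a matrix of calibration-period forecast errors; the names and toy data are illustrative:

```python
import numpy as np

def bga_weights(errors):
    """Bates-Granger weights from an (n, k) array of calibration-period
    forecast errors: weight_i proportional to 1 / sigma_i^2 (eq. 2)."""
    inv_var = 1.0 / np.var(errors, axis=0)
    return inv_var / inv_var.sum()

# toy example: model 1's errors are twice as variable as model 2's,
# so model 2 should receive roughly four times the weight
rng = np.random.default_rng(0)
errors = np.column_stack([2.0 * rng.standard_normal(1000),
                          rng.standard_normal(1000)])
w = bga_weights(errors)
```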


2.2.3 Information criterion averaging

A proposal by Buckland, Burnham and Augustin (1997) and Burnham and Anderson (2002) is to use Akaike's information criterion (AIC) and Bayesian information criterion (BIC) weights, which are both of the form

β̂_i = exp(−I_i/2) / Σ_{j=1}^k exp(−I_j/2),  (3)

where I_i = −2 log(σ̂_i) + 2p for AICA and I_i = −2 log(σ̂_i) + p log(n) for BICA. Again, σ̂_i is the forecast error variance in the calibration period.
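A sketch of information-criterion weighting. Note that I_i as printed above would give the lowest-variance model the smallest weight; this sketch instead assumes the standard convention I_i = n log(σ̂_i²) + penalty, which matches the behaviour described in Section 4, where the lowest-variance model receives almost all the weight:

```python
import numpy as np

def ic_weights(sigma_hat, p, n, criterion="aic"):
    """Information-criterion weights (eq. 3), using the standard convention
    AIC_i = n*log(sigma_i^2) + 2p and BIC_i = n*log(sigma_i^2) + p*log(n)
    (an assumption; the thesis states a slightly different I_i)."""
    sigma_hat = np.asarray(sigma_hat, dtype=float)
    penalty = 2.0 * p if criterion == "aic" else p * np.log(n)
    I = n * np.log(sigma_hat ** 2) + penalty
    I = I - I.min()              # shift for numerical stability; weights unchanged
    w = np.exp(-I / 2.0)
    return w / w.sum()

# hypothetical error standard deviations for three models with equal p
w_aic = ic_weights([0.039, 0.041, 0.048], p=3, n=300, criterion="aic")
```

With n = 300 calibration points, even small differences in the error variances translate into near winner-take-all weights, which is consistent with AICA and BICA assigning all weight to a single model in the results tables.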

2.2.4 Granger-Ramanathan averaging

According to Diks and Vrugt (2010) Granger and Ramanathan exploited the presence of covariances when estimating weights by using the OLS estimators in a linear regression model, which could be obtained by minimizing:

S(β) = Σ_{t=1}^n (Y_t − β^T X_t)²,  (4)

and the OLS estimator is then given by

β̂_GRA = (X^T X)^{−1} X^T Y.  (5)

The weights can be left unconstrained. Hansen (2008) also suggested a constrained GRA method, which imposes the convexity constraint 0 ≤ β_i ≤ 1 and the additivity constraint Σ_{i=1}^k β_i = 1 on the minimizer.
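The unconstrained GRA weights are just OLS coefficients, which can be sketched as follows; the constrained version would additionally require a quadratic-programming solver and is omitted here:

```python
import numpy as np

def gra_weights(X, y):
    """Unconstrained Granger-Ramanathan weights: OLS of the observed
    series y on the matrix of individual forecasts X (eqs. 4-5)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

# toy example: y is exactly a 0.3/0.7 combination of the two forecasts,
# so OLS should recover those weights
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 2))
y = 0.3 * X[:, 0] + 0.7 * X[:, 1]
beta = gra_weights(X, y)
```

Note that, unlike EWA or BGA, these weights can be negative or exceed one, which is exactly what appears in the GRA rows of the results tables.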

2.3 Hypotheses

Several conclusions could be drawn from this literature review and these conclusions form the following hypotheses for this research.

The Granger-Ramanathan averaging method has been in the leading group in the studies by Hansen (2008), Diks and Vrugt (2010) and Arsenault et al. (2015). It is expected that GRA is in the lead group in the following case study as well. Furthermore, Diks and Vrugt (2010) only studied the unconstrained GRA, Hansen (2008) also studied GRA with the constraint of the weights adding up to one besides the unconstrained version, and Arsenault et al. (2015) also considered a third GRA, the unconstrained GRA with a bias correction, which they recommended as the best averaging method.

In the cases of Hansen (2008) and Diks and Vrugt (2010), AICA and BICA performed better than BGA, but Arsenault et al. (2015) found that BGA performed better than AICA, though worse than BICA. Therefore no clear hypothesis can be formed concerning the performance of these three methods relative to each other.

The equal weights averaging method clearly performed the worst in all three studies, and it is expected that EWA is going to perform the worst out of the five chosen methods in this empirical case study as well.

Different models and data are used in the case described in this paper, and the data set contains very few observations compared with the three past studies discussed in this section, so it would not be a surprise if the results of this case study differ slightly from those of the three analysed studies.

3 Methodology and Data

The previous section described the model averaging techniques that are used in the case study reported in this paper. This section gives a detailed description of how the research was done using these model averaging methods. First the data set is explained, then an extra explanation is given of the constraints and definitions of the model averaging techniques analysed in this case study. After that, the models chosen for this research are described, and at the end of this section a description is given of how the data are analysed and compared.

3.1 Data set

For this study, a forecast of the value of the exports of China has been generated. The data used in this case study are collected by the General Administration of Customs of the People's Republic of China and were retrieved from the Datastream database by Thomson Reuters. The data consist of monthly observations over 33 years, from July 1983 until November 2016, which equals exactly 400 data points. The exports at customs refer to the real value of commodities that are exported from China. The raw values are given in hundreds of millions of US dollars and are not seasonally adjusted. Exports in China are valued on an FOB (Free on Board) basis, meaning the values include the transaction value of the goods and the costs of bringing the goods to the border of the country of export.

In this study approximately the first 300 data points are used as the calibration set. The remaining observations are used for the evaluation of the forecasts. With the data from the evaluation period a set of optimal weights has also been calculated ex post for comparison with the weights calculated with the model averaging methods.

The raw time series contains an upward trend and seasonal effects, which are undesirable in this case study because the models require stationary time series and do not account for seasonal effects (Fig. 1). To remove the trend visible in the time series, first the natural logarithm of the values was taken and then first differences. This resulted in the time series shown in Fig. 2. With respect to the seasonal effects, the data remained unchanged; instead, dummy variables were introduced, which will be described later in this section.

Figure 1: The raw value of the exports of China over a period of 400 months.

Figure 2: The data after having taken the natural logarithm and first differences.
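The log-difference transformation described above can be sketched as follows; the input series here is hypothetical, not the actual export data:

```python
import numpy as np

def detrend(raw):
    """Remove the exponential trend as described in the text:
    take natural logarithms, then first differences."""
    return np.diff(np.log(raw))

# hypothetical series growing 1% per month: after the transform,
# every element equals log(1.01), i.e. the trend has been removed
raw = 100.0 * 1.01 ** np.arange(12)
z = detrend(raw)
```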

The variable Y_t is the observed total value of export from the evaluation period and the vector X is composed of the one-step-ahead forecast points for Y_t. Besides the forecasts for the evaluation period, the X estimates for the calibration data were also made.

3.2 Models

In this study several models have been used to give the predictions that build up the vector X. A total of three model sets are set up. All models are variations of the autoregressive AR(k) model. The first model contains one lagged variable of the first order, the second model contains two lagged variables of the first and second order and the third model contains a lagged variable of the second order and a squared lagged variable of the first order.

The forecasts of the individual models are first calculated, then used to determine the weights using the different MA-methods and these weights combined with the individual forecasts give the combined forecasts.

Since the data is obtained over a long period, a Dickey-Fuller test is performed to test whether the time series is stationary or not. The model used for this test is

∆y_t = α + θt + γy_{t−1} + ε_t.  (6)

The null hypothesis that a unit root is present (γ = 0) is then tested against the alternative hypothesis that γ < 0. When θ ≠ 0 there is a deterministic trend in the time series. Trends are removed by taking first differences.
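A minimal sketch of computing the t-statistic for γ in regression (6) with OLS; in practice a library implementation would be used, and the 5% critical value quoted in the comment is an approximate large-sample value for the constant-and-trend case:

```python
import numpy as np

def df_tstat(y):
    """t-statistic on gamma in the test regression (6):
    Delta y_t = alpha + theta*t + gamma*y_{t-1} + eps_t."""
    dy = np.diff(y)
    n = dy.size
    Z = np.column_stack([np.ones(n), np.arange(1, n + 1), y[:-1]])
    beta, *_ = np.linalg.lstsq(Z, dy, rcond=None)
    resid = dy - Z @ beta
    s2 = resid @ resid / (n - Z.shape[1])
    se_gamma = np.sqrt(s2 * np.linalg.inv(Z.T @ Z)[2, 2])
    return beta[2] / se_gamma

rng = np.random.default_rng(2)
rw = np.cumsum(rng.standard_normal(400))   # random walk: unit root present
e = rng.standard_normal(400)
ar = np.empty(400)
ar[0] = e[0]
for t in range(1, 400):
    ar[t] = 0.2 * ar[t - 1] + e[t]         # stationary AR(1)
t_rw = df_tstat(rw)
t_ar = df_tstat(ar)
# compare with the Dickey-Fuller critical value (roughly -3.41 at the
# 5% level for the constant-and-trend case, an approximate value)
```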

As was stated before, the time series most likely contains seasonal effects, and these effects are accounted for by the introduction of seasonal dummies. There are four dummies in total, one for every quarter of the year; a dummy equals 1 when the chosen value is observed in the quarter of the year connected to that dummy and 0 when the chosen value is from another period. The months were grouped by the seasons of the year: the first dummy, covering December, January and February, represents winter; the second, covering March, April and May, represents spring; the third, covering June, July and August, represents summer; and the fourth and last, covering September, October and November, represents fall. Since not all the estimated coefficients of the dummies might be significant, the Akaike's information criterion of each model is compared to choose the best one, which is the model with the minimum AIC value. The different sets of dummies are one set with three dummies, six sets with combinations of two dummies and four sets which contain one dummy each. Besides the runs with dummy variables, a model without any dummies at all is also run as an alternative. In the end the model with the lowest AIC value is kept and the forecast is calculated with the resulting coefficients of that model.
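The dummy-set selection step can be sketched as follows. The AIC formula and the enumeration of all subsets of size zero to three (a slight superset of the twelve sets listed in the text) are assumptions, since the thesis does not spell these details out:

```python
import numpy as np
from itertools import combinations

def fit_aic(Z, y):
    """OLS fit of y on Z; returns AIC computed as n*log(SSR/n) + 2p
    (one common convention; the thesis does not state its exact formula)."""
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ beta
    n, p = Z.shape
    return n * np.log(resid @ resid / n) + 2 * p

def best_dummy_set(D, y):
    """Try every subset of the four quarterly dummies of size 0 to 3
    and return the subset with the minimum AIC; every candidate model
    also includes a constant."""
    n = y.size
    best_aic, best_cols = np.inf, None
    for r in range(4):
        for cols in combinations(range(4), r):
            Z = np.column_stack([np.ones(n)] + [D[:, c] for c in cols])
            aic = fit_aic(Z, y)
            if aic < best_aic:
                best_aic, best_cols = aic, cols
    return best_cols

# hypothetical monthly series with a strong effect in quarter 0
# (exact alignment with the seasons in the text is ignored here)
rng = np.random.default_rng(4)
quarter = (np.arange(240) % 12) // 3
D = (quarter[:, None] == np.arange(4)).astype(float)
y = 2.0 * D[:, 0] + 0.1 * rng.standard_normal(240)
chosen = best_dummy_set(D, y)
```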

Combining the AR variables and the dummy variables results in the following six models used for this case study:

X_{1,t} = c + γ^T D_{1,t} + ε_{1,t},  (7)
X_{2,t} = c + γ^T D_{2,t} + φ_1 X_{2,t−1} + ε_{2,t},  (8)
X_{3,t} = c + γ^T D_{3,t} + φ_1 X_{3,t−1} + φ_2 X_{3,t−2} + ε_{3,t},  (9)
X_{4,t} = c + γ^T D_{4,t} + φ_1 (X_{4,t−1})² + φ_2 X_{4,t−2} + ε_{4,t},  (10)
X_{5,t} = c + γ^T D_{5,t} + φ_1 (X_{5,t−1})² + ε_{5,t},  (11)
X_{6,t} = c + γ^T D_{6,t} + φ_1 X_{6,t−1} + φ_2 (X_{6,t−1})² + ε_{6,t},  (12)

where D is a vector consisting of a varying number of elements, one for each contained season, and γ is a vector with a length equal to the number of elements in D.

First the observations from the calibration period were estimated using the models and the data from the calibration period. For the forecasts of the evaluation period, the data set from the calibration period was used as well as observations from the evaluation period up until the forecast period. So for every new forecast, an observation from the evaluation period was added to the data set instead of using previous forecasts for the evaluation period. This method was chosen since it is in line with the method Diks and Vrugt (2010) applied to their case studies.
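The expanding-window forecasting scheme described above can be sketched as follows, with a plain AR(1) standing in for the thesis' models; the names and simulated data are illustrative:

```python
import numpy as np

def expanding_ar1_forecasts(y, n_cal):
    """One-step-ahead forecasts for the evaluation period: each forecast
    for period t re-estimates an AR(1) by OLS on all observations before t,
    so observed evaluation data (not previous forecasts) enter the window."""
    preds = []
    for t in range(n_cal, y.size):
        Z = np.column_stack([np.ones(t - 1), y[:t - 1]])
        beta, *_ = np.linalg.lstsq(Z, y[1:t], rcond=None)
        preds.append(beta[0] + beta[1] * y[t - 1])   # forecast of y[t]
    return np.array(preds)

# simulated stationary AR(1) series, 300 calibration + 100 evaluation points
rng = np.random.default_rng(3)
e = rng.standard_normal(400)
y = np.empty(400)
y[0] = e[0]
for t in range(1, 400):
    y[t] = 0.5 * y[t - 1] + e[t]
preds = expanding_ar1_forecasts(y, n_cal=300)
```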

3.3 Method of Comparison

The weights are calculated by applying the model averaging techniques to the data from the calibration period. The resulting weights are then applied to the evaluation period.


To compare the different MA methods and their results, their respective RMSEs have been compared, calculated from the combined forecast points and the data from the evaluation period. The out-of-sample one-step-ahead RMSE is given by

RMSE = √( (1/N) Σ_{t=1}^N (Y_{t,n+1} − Ŷ_{t,n+1})² ),  (13)

where N stands for the number of observations in the evaluation period and n stands for the period in which the one-step-ahead forecast for period n + 1 has been estimated.
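Equation (13) translates directly into code; the toy numbers are illustrative:

```python
import numpy as np

def rmse(y_obs, y_pred):
    """Root mean squared error of one-step-ahead forecasts (eq. 13)."""
    y_obs = np.asarray(y_obs, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.sqrt(np.mean((y_obs - y_pred) ** 2))

# toy check: a constant forecast error of 0.5 gives an RMSE of 0.5
val = rmse([1.0, 2.0, 3.0], [1.5, 2.5, 3.5])
```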

4 Results

This section discusses the results that were obtained from the study by following the methods described in the last section and using the model averaging techniques explained in Section 2.

The Dickey-Fuller test performed on the raw data set showed that the null hypothesis could not be rejected and that a stochastic trend was present, so it was decided to take first differences of the natural logarithms of the raw data and to use the resulting time series for the rest of the case study. When tested with the Dickey-Fuller test again, the resulting time series no longer contained a stochastic trend.

In table 1, the results of the model selection using Akaike's information criterion are shown. The different seasonal dummy sets are defined in the two leftmost columns, where the second column from the left shows the dummies contained in the dummy set. These dummies are as defined in the previous section. Note that the last set does not contain any seasonal dummies at all. The second dummy set was chosen most often when forecasting the time series. This suggests that there are indeed seasonal effects contained in the time series and that these effects are most prominent in the winter and spring seasons.

After letting the computer decide which set of dummy variables to use for each forecast to account for the seasonal effects in the data, the forecasts were calculated for each of the models individually and for each period separately, using data from the observed time series.

The resulting weights and RMSE after applying each model averaging technique to all six models are shown in table 2, as well as the resulting RMSE of the individual models. Here β̂1 is the weight given to model 1 described by equation 7, β̂2 the weight given to model 2 described by equation 8, β̂3 the weight given to model 3 described by equation 9, β̂4 the weight given to model 4 described by equation 10, β̂5 the weight given to model 5 described by equation 11 and β̂6 the weight given to model 6 described by equation 12. The weights of an ex post combined forecast are also shown for comparison. The weights for β_opt were calculated using OLS in the same way as implemented for the GRA method. The only difference is that GRA uses data from the calibration period, whereas the ex post evaluation uses data from the evaluation period to minimize the sum of


        Dummies      Model 1  Model 2  Model 3  Model 4  Model 5  Model 6
Set 1   D1, D2, D3      0        1       69        4        1        3
Set 2   D1, D2        273      369      294      234      272      311
Set 3   D2, D3          0        0        0        0        1        0
Set 4   D1, D3          0        0        0        0        0        0
Set 5   D1, D4          0        0        0        0        0        0
Set 6   D2, D4          0        0        0        0        0        0
Set 7   D3, D4          0        0        0        0        0        0
Set 8   D1             12        2       14       80       42        0
Set 9   D2             60       22       18       50       57       81
Set 10  D3              2        0        0        0        0        0
Set 11  D4              1        0        0        0        0        0
Set 12  No dummies     49        3        2       29       24        2

Table 1: The number of times each dummy set was chosen from the total of twelve dummy sets for each of the six models, when estimating a total of 397 forecasts per model.

squared errors. The RMSE obtained from β_opt is therefore also the best achievable RMSE. It can be noted that all of the calculated RMSEs are quite small, so that the differences between them are small too. Therefore a ∆RMSE was calculated as well, by taking the difference between the RMSE of each model averaging technique and the best RMSE, in this case the one obtained from the ex post evaluation. These RMSE and ∆RMSE are shown in the two rightmost columns.

The first thing that can be noted is that none of the model averaging techniques is outperformed by the individual models in terms of RMSE, except by model 1 (the model of equation 7). This best individual model is the model which relies only on a constant and the seasonal dummies as independent variables. When comparing the ∆RMSE of the model averaging techniques, it is clear that GRA outperforms the other methods by far. The remaining methods perform quite similarly, with constrained GRA performing best among these five methods. AICA and BICA perform third best in the overall set of methods and both give all weight to the model from equation 7, since it follows from the equations of AICA


Method     β̂1      β̂2      β̂3      β̂4      β̂5      β̂6      RMSE    ∆RMSE
Combined
EWA       0.1667  0.1667  0.1667  0.1667  0.1667  0.1667  0.1514  0.0072
BGA       0.1777  0.1773  0.1536  0.1718  0.1774  0.1421  0.1513  0.0070
AICA      1.0000  0.0000  0.0000  0.0000  0.0000  0.0000  0.1512  0.0069
BICA      1.0000  0.0000  0.0000  0.0000  0.0000  0.0000  0.1512  0.0069
GRA       0.3241  2.0169 -0.3819  0.3182 -0.7865 -1.0287  0.1458  0.0015
CGRA      0.5038  0.4962  0.0000  0.0000  0.0000  0.0000  0.1510  0.0067
Best
βopt      1.7083  2.7097  0.5385 -0.6565 -1.5475 -2.4257  0.1443  0
Individual
X1        1       0       0       0       0       0       0.1512  0.0069
X2        0       1       0       0       0       0       0.1596  0.0153
X3        0       0       1       0       0       0       0.1642  0.0199
X4        0       0       0       1       0       0       0.1555  0.0112
X5        0       0       0       0       1       0       0.1528  0.0086
X6        0       0       0       0       0       1       0.1620  0.0177

Table 2: Weights, RMSE and ∆RMSE of the model averaging methods for the full set of six models, the ex post evaluation and the individual models.

and BICA that the model giving the forecasts with the lowest error variance gains virtually all the weight. The variances of the forecast errors are shown in table 3. BGA divides the weights according to the size of the forecast error variances; therefore the models that generate the smallest error variances have been given the largest weights by the Bates-Granger method.

When looking at the individual forecasts, model 1 performs the best. Second best is model 5, whose only difference from model 1 is that it contains a squared lag of order one. It should be noted that model 5 performed better than model 2, which contained an unsquared lag of order one, and model 6, which contained a lag of order one as well as


Models     X1      X2      X3      X4      X5      X6
Variances  0.0390  0.0391  0.0451  0.0403  0.0391  0.0488

Table 3: Variances of the forecast errors of each individual model.

a squared lag of order one. The worst performing model among the individual forecasts is model 3, which is the model of the AR(2) form. Model 4, which contained a squared lag of order one and a lag of order two, performed better than model 3 in terms of RMSE, just as model 5 with a squared lag of order one performed better than model 2 with an unsquared lag of order one.

For a better understanding and comparison of the methods, two more subsets of models have been analysed. One is the subset of models given by equations 7 to 11, leaving just one model out of the total set of six models. The resulting weights and RMSE are given in table 4. In this case the unconstrained version of GRA still outperforms all the other methods, while the RMSEs of the remaining five methods lie very close together, just like with the set of six models. This time, however, BGA performs slightly better than AICA and BICA. Coincidentally, EWA performs just as well as AICA and BICA in this case.

Another model set used for analysis was the set with the models given by equations 9 to 12. The first two models of the total set have been left out, since they contained the forecasts with the lowest error variances. This way, the changes in the methods that form their weights based on the error variances can be seen relative to the methods that do not rely as much on the error variances. Table 5 shows the results in the same way as table 2 does, except that this time X1, X2 and the corresponding weights β have been left out. The weights have been generated based on the forecasts of the four models in the subset. The ranking of the methods remained mostly the same, but one big difference was that AICA and BICA fell far behind and could not even outperform EWA, which had been in the worst performing group every time so far. This can be attributed to the fact that the remaining error variances were so much larger, compared to the variances of the two models that were left out, that even though AICA


Method     β̂1      β̂2      β̂3      β̂4      β̂5      RMSE    ∆RMSE
Combined
EWA       0.2     0.2     0.2     0.2     0.2     0.1512  0.0068
BGA       0.2071  0.2067  0.1790  0.2003  0.2068  0.1511  0.0067
AICA      1.0000  0.0000  0.0000  0.0000  0.0000  0.1512  0.0068
BICA      1.0000  0.0000  0.0000  0.0000  0.0000  0.1512  0.0068
GRA      -0.0495  1.5011 -0.9257  1.4861 -1.4223  0.1464  0.0020
CGRA      0.5038  0.4962  0.0000  0.0000  0.0000  0.1510  0.0066
Best
βopt     -0.4813 -0.1270  0.4735 -0.5702  1.0462  0.1444  0
Individual
X1        1       0       0       0       0       0.1512  0.0068
X2        0       1       0       0       0       0.1596  0.0152
X3        0       0       1       0       0       0.1642  0.0198
X4        0       0       0       1       0       0.1555  0.0111
X5        0       0       0       0       1       0.1528  0.0084

Table 4: Weights, RMSE and ∆RMSE of the model averaging methods for the subset of five models, the ex post evaluation and the individual models.

and BICA still chose the model with the lowest forecast error variance in the subset, this was not enough to compete with methods that actually combined forecasts.

Comparing all the analyses, each with a different set of models, it is clear that the GRA method performs best in terms of RMSE. Even though the remaining methods tend to perform similarly, constrained GRA keeps outperforming the other four methods. BGA, AICA and BICA perform about the same; depending on the forecast error variances and the gaps between them, either method can have a slight edge. Finally, EWA tends to be in the worst performing group.


             β̂3       β̂4       β̂5       β̂6       RMSE     ∆RMSE
  Combined
    EWA      0.2500   0.2500   0.2500   0.2500   0.1521   0.0077
    BGA      0.2381   0.2664   0.2751   0.2203   0.1518   0.0074
    AICA     0.0000   0.0000   1.0000   0.0000   0.1528   0.0084
    BICA     0.0000   0.0000   1.0000   0.0000   0.1528   0.0084
    GRA      1.0279  -2.2842   2.2578  -0.7107   0.1456   0.0012
    CGRA     0.2480   0.0000   0.7520   0.0000   0.1516   0.0072
  Best
    βopt     0.7572  -1.0554   1.0488  -0.3901   0.1444   0
  Individual
    X3       1        0        0        0        0.1642   0.0198
    X4       0        1        0        0        0.1555   0.0111
    X5       0        0        1        0        0.1528   0.0084
    X6       0        0        0        1        0.1620   0.0175

Table 5: Weights, RMSE and ∆RMSE for each model averaging method, the ex post evaluation and the individual models (subset of four models)


5 Conclusion

In this paper, a case study has been described in which six different model averaging methods were compared in order to determine which of them performs best in terms of RMSE. Three different model sets were used in the analysis to get a clearer picture of how the performance of the model averaging methods changes relative to one another. The results were then compared with the results obtained in previous case studies by Hansen (2008), Diks and Vrugt (2010) and Arsenault et al. (2015).

The three previous studies all had slightly different outcomes, but the clear winner in all three cases was the GRA method. For Hansen (2008) and Arsenault et al. (2015) the best-performing variant was constrained GRA, while Diks and Vrugt (2010) only considered unconstrained GRA. The case study in this paper showed that the unconstrained version of GRA outperformed all the other model averaging methods considered, including constrained GRA. In theory it would seem natural for the unconstrained version to perform best in every case study, since it has no restrictions and can freely minimize the errors, but as shown, this does not always hold in practice. The curve of the observed values from the evaluation period, and the corresponding RMSE, can differ from the curve of the observations in the calibration period, so minimizing the RMSE ex ante does not necessarily give the optimal RMSE ex post. The choice between the two GRA methods comes down to the researcher's or data analyst's preference regarding the weights, as some applications require non-negative weights that sum to one and others do not.
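The ex ante versus ex post distinction can be made concrete. Unconstrained GRA estimates its weights by ordinary least squares of the observations on the individual forecasts over the calibration period; nothing guarantees that those weights remain optimal over the evaluation period. A minimal numpy sketch, using synthetic data and an assumed 2/3–1/3 calibration/evaluation split (not the thesis setup; whether an intercept is included depends on the GRA variant, and none is added here):

```python
import numpy as np

def gra_weights(forecasts, observed):
    """Unconstrained Granger-Ramanathan weights: OLS of the observations
    on the matrix of individual forecasts (weights unrestricted in sign
    and sum)."""
    weights, *_ = np.linalg.lstsq(forecasts, observed, rcond=None)
    return weights

def rmse(predicted, observed):
    return float(np.sqrt(np.mean((observed - predicted) ** 2)))

rng = np.random.default_rng(1)
y = rng.normal(size=120)
F = y[:, None] + rng.normal(scale=0.3, size=(120, 4))  # four noisy forecasts

w = gra_weights(F[:80], y[:80])       # fitted ex ante on the calibration part
rmse_cal = rmse(F[:80] @ w, y[:80])   # minimal over the calibration period by construction
rmse_eval = rmse(F[80:] @ w, y[80:])  # ex post: may be larger or smaller
print(w, rmse_cal, rmse_eval)
```

The constrained variant adds non-negativity and a sum-to-one restriction, which turns the fit into a quadratic programming problem rather than plain OLS.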

The performance of the AICA, BICA and BGA methods all came down to the forecast error variances, and it was shown that the ranking of these three methods fluctuated according to the model sets that were used. The results showed that AICA and BICA were more susceptible to changes in the error variances, since these methods put all of their weight on the model that generated the lowest forecast error variance. When the error variances were large enough, AICA and BICA performed just as poorly as, or even worse than, EWA, which had been the worst-performing method in all of the analysed past case studies.
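Why AICA and BICA effectively degenerate to selecting a single model can be seen from the usual information-criterion weighting formula (as in Buckland et al., 1997), in which each model's weight decays exponentially in its AIC gap to the best model; even a moderate gap therefore concentrates essentially all weight on one model. A small sketch with hypothetical AIC values:

```python
import numpy as np

def aic_weights(aic_values):
    """Akaike weights: w_i proportional to exp(-ΔAIC_i / 2), where
    ΔAIC_i is the gap to the smallest AIC in the set."""
    delta = np.asarray(aic_values, dtype=float)
    delta = delta - delta.min()
    w = np.exp(-0.5 * delta)
    return w / w.sum()

# A gap of 20 AIC points already yields a weight of about 0.99995 on
# the best model, which rounds to the 1.0000 / 0.0000 pattern seen in
# the tables.
print(aic_weights([100.0, 120.0, 125.0]))
```

The same mechanism applies to BICA with BIC gaps in place of AIC gaps.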

Leaving out the model that did not rely on past data and only on a constant and seasonal dummies, none of the combined forecasts of the model averaging methods were outperformed by the individual forecasts of any of the models. This means that, in this case, any of the model averaging techniques should be preferred over the individual models if the goal is to optimize the RMSE.

Even though there can be discussion as to whether the constrained or the unconstrained version of GRA is the best model averaging method of the group, it is clear that either version will give more satisfactory and stable results than the other methods discussed and analysed in this case study.


References

Arsenault, R., Gatien, P., Renaud, B., Brissette, F., & Martel, J.-L. (2015). A comparative analysis of 9 multi-model averaging approaches in hydrological continuous streamflow simulation. Journal of Hydrology, 529 (3), 754–767.

Bates, J. M., & Granger, C. W. (1969). The combination of forecasts. Journal of the Operational Research Society, 20 (4), 451–468.

Buckland, S. T., Burnham, K. P., & Augustin, N. H. (1997). Model selection: an integral part of inference. Biometrics, 53 , 603–618.

Burnham, K. P., & Anderson, D. R. (2002). Model selection and multimodel inference: a practical information-theoretic approach (2nd ed.). New York: Springer.

Diks, C. G., & Vrugt, J. A. (2010). Comparison of point forecast accuracy of model averaging methods in hydrologic applications. Stochastic Environmental Research and Risk Assessment , 24 (6), 809–820.

Granger, C. W., & Ramanathan, R. (1984). Improved methods of combining forecasts. Journal of Forecasting, 3 (2), 197–204.

Hansen, B. E. (2007). Least squares model averaging. Econometrica, 75 (4), 1175–1189.

Hansen, B. E. (2008). Least-squares forecast averaging. Journal of Econometrics, 146 (2), 342–350.

United Nations Commodity Trade Statistics Database. (2015). Retrieved from http://comtrade.un.org/db/mr/rfGlossaryList.aspx
