
Model selection criteria for ARMA one-step-ahead forecasting : an investigation of the potential of AIC, BIC and HQC


Academic year: 2021


Bachelor Thesis

Econometrie & Operationele Research

Model selection criteria for ARMA one-step-ahead forecasting An investigation of the potential of AIC, BIC and HQC

Jurijn Jongkees 10375821

University of Amsterdam

Supervisor: Mr dr P.H.F.M. van Casteren 30-11-2015


Abstract

In this research the model selection criteria AIC, BIC and HQC are investigated based on their potential to forecast future values of daily time series one step ahead with autoregressive moving average (ARMA) models. The in-sample model selection criteria AIC, BIC and HQC are therefore compared with out-of-sample forecast criteria. Three financial time series are used and different ARMA models are estimated. Recent developments in ARMA modelling, model selection criteria and forecasting performance are discussed. Moreover, the relation between these model selection criteria and their forecasting performance on a hold-out sample is investigated; no clear connection could be found. It should be noted, however, that common roots have led to some misspecification of the ARMA models. It seems that model selection criteria are not able to deal with common roots problems. In addition, the three model selection criteria are compared based on their forecasting performance. Results indicate that BIC performs better than AIC and HQC, which is in line with earlier research for ARMA models.


Contents

1 Introduction

2 Recent developments

2.1 ARMA order identification

2.2 Model selection criteria

2.3 Forecasting performance

3 Research design

3.1 Data and variables

3.2 Criteria

3.3 Model and ways to measure

4 Results

5 Analysis

6 Conclusion

References

Appendix


1 Introduction

In the last fifty years the topic of autoregressive moving average (ARMA) order identification has attracted considerable attention in the time series literature and in research areas such as econometrics. An ARMA model can be used to describe a stationary time series. As the name suggests, ARMA models consist of autoregressive terms and moving average terms. The model can be written as

$$\phi(L)\, y_t = c + \theta(L)\, \varepsilon_t$$

with autoregressive (AR) polynomial $\phi(L) = 1 - \phi_1 L - \dots - \phi_p L^p$ and moving average (MA) polynomial $\theta(L) = 1 + \theta_1 L + \dots + \theta_q L^q$. Here $\varepsilon_t$ is white noise, $c$ is the constant and $L$ is the backward shift operator, also known as the lag operator. The above model is called an ARMA model of order (p,q), which is often abbreviated to ARMA(p,q).
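Written out, the ARMA(p,q) equation is the recursion y_t = c + φ_1 y_{t-1} + … + φ_p y_{t-p} + ε_t + θ_1 ε_{t-1} + … + θ_q ε_{t-q}. The following Python sketch illustrates that recursion only. The function name `simulate_arma` and the choice to set all pre-sample values of y and ε to zero are assumptions made for illustration and are not taken from the thesis.

```python
import random


def simulate_arma(eps, phi, theta, c=0.0):
    """Generate y_t = c + sum_i phi_i*y_{t-i} + eps_t + sum_j theta_j*eps_{t-j}.

    Assumption of this sketch: values of y and eps before the start of
    the sample are treated as zero.
    """
    y = []
    for t in range(len(eps)):
        # autoregressive part: lagged values of y
        ar = sum(phi[i] * y[t - 1 - i] for i in range(len(phi)) if t - 1 - i >= 0)
        # moving average part: lagged values of the noise
        ma = sum(theta[j] * eps[t - 1 - j] for j in range(len(theta)) if t - 1 - j >= 0)
        y.append(c + ar + eps[t] + ma)
    return y


# Example: 250 observations of an ARMA(1,1) process with hypothetical parameters.
random.seed(0)
noise = [random.gauss(0.0, 1.0) for _ in range(250)]
series = simulate_arma(noise, phi=[0.5], theta=[0.3], c=0.1)
```

Setting `theta=[]` gives a pure AR(p) model and `phi=[]` a pure MA(q) model, which mirrors how ARMA(p,0) and ARMA(0,q) arise as special cases.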

A difficult and often examined problem in the literature is the identification of the order of an ARMA process. This amounts to choosing the order (p,q) in the above model that gives the best description of a time series. The ultimate aim of a model in time series analysis is to describe the underlying data generating process as well as possible, but how can one measure whether a model is a good approximation of reality? For practical purposes it is often of interest to know how well a model can forecast the future values of a time series.

High forecasting power is a property that is highly valued by time series analysts. Heij et al (2004) illustrate how to quantify the forecasting performance by using a hold-out set of data. Part of the data set is left out of the estimation, and the $m$ observations that were left out are used for evaluation purposes. The accuracy of the forecast can subsequently be checked by calculating the root mean squared error (RMSE) or the mean absolute prediction error (MAE)

$$RMSE = \left[ \frac{1}{m} \sum_{h=1}^{m} \left( y_{n+h} - \hat{y}_{n+h} \right)^2 \right]^{1/2}, \qquad MAE = \frac{1}{m} \sum_{h=1}^{m} \left| y_{n+h} - \hat{y}_{n+h} \right|$$

in which $y_{n+h}$ is the true value of the time series at time $n + h$ and $\hat{y}_{n+h}$ is the corresponding predicted value. A low RMSE or MAE corresponds to an ARMA model that produces accurate forecasts. It depends on the preferences of the time series analyst whether short-term forecasting or long-term forecasting is more important.
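As an illustration of the two definitions above, the following Python sketch computes RMSE and MAE over a hold-out sample. The function names and the example numbers are hypothetical and serve only to show the arithmetic.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error over the m hold-out observations."""
    m = len(actual)
    return math.sqrt(sum((y - f) ** 2 for y, f in zip(actual, predicted)) / m)


def mae(actual, predicted):
    """Mean absolute error over the m hold-out observations."""
    m = len(actual)
    return sum(abs(y - f) for y, f in zip(actual, predicted)) / m


# Hypothetical true values and forecasts for a hold-out sample of m = 4.
true_values = [0.2, -0.1, 0.4, 0.0]
forecasts = [0.1, 0.0, 0.1, 0.1]
print(rmse(true_values, forecasts), mae(true_values, forecasts))
```

Because the errors are squared before averaging, RMSE gives more weight to large forecast errors than MAE does, so the two criteria need not rank models in the same order.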

Through the years many procedures to solve the problem of ARMA order determination have been proposed, as is described by De Gooijer et al (1985). Roughly, a distinction can be made between two approaches: the pattern identification approach and the model selection approach based on information criteria. In this research only model selection criteria are studied. For pattern identification methods such as the Box-Jenkins method, as described by Box & Jenkins (1970), only the main idea is explained.

Qi & Zhang (2001) study whether there is a relation between model selection criteria and out-of-sample forecasting performance using artificial neural network models. They conclude that model selection criteria are not able to provide a reliable guide to out-of-sample forecasting performance, that is, there is no clear connection between in-sample model selection and out-of-sample forecasting performance. Their research focuses only on artificial neural networks. As artificial neural networks typically have more parameters than ARMA models, their results and conclusions are not directly applicable to ARMA models. As many penalty-based model selection criteria were actually derived and proposed for autoregressive (AR) models or even ARMA models, there might be a stronger relationship between in-sample model selection criteria and out-of-sample performance when using ARMA models.

It is therefore interesting to evaluate certain model selection criteria based on their forecasting performance applied to real data. The purpose of this research is to find an answer to the following research question: what is the potential of certain model selection criteria applied to ARMA models in one-step-ahead forecasting of real financial time series with daily observations? This investigation is useful as it is not always clear to time series analysts which information criteria perform best when it comes to ARMA modelling.

The main question is supported by three sub-questions. Firstly, what important developments in ARMA modelling were made in the last fifty years with regard to order identification, model selection criteria and forecasting? Secondly, can in-sample model selection criteria indeed select models that perform well on a hold-out set of data, and what is the relation between the criteria and their out-of-sample performance? Thirdly, which criterion performs best when it comes to one-step-ahead forecasting of ARMA time series? These issues are examined empirically using three real financial time series with daily observations: the exchange rate between the euro and the US dollar, the interest rate of the youngest 10-year Dutch state loan and the Cushing spot price of crude oil.

The remainder of this text is structured as follows. Section 2 deals with the first sub-question and elaborates on the research done in identifying and forecasting ARMA models. Moreover, three model selection criteria are explained. Section 3 outlines the data and describes the way the research is performed. In section 4 the empirical findings are reported and the second sub-question is answered. Section 5 analyses these findings and attempts to answer the last sub-question. It also elaborates on the validity and reliability of the research performed. Finally, section 6 concludes this research.

2 Recent developments

This section deals with the first sub-question. Some important developments of the last fifty years in ARMA order identification, model selection criteria and forecasting performance are discussed.


2.1 ARMA order identification

In many cases low order ARMA models can be used to approximate higher order autoregressive (AR) models and higher order moving average (MA) models. That is, ARMA models need relatively few parameters to give an accurate approximation of a stationary time series (Heij et al, 2004, p. 544). Stationarity means that the sample mean and sample covariances are roughly the same over time. Consequently, the first thing to do is to inspect the data to make sure that the variables are stationary. When one wants to model a non-stationary series, ARIMA models can be used (Pindyck & Rubinfeld, 1976, pp. 469-471). ARIMA modelling amounts to taking first (sometimes second) differences to obtain a stationary series and subsequently applying ARMA modelling. Since data can often be made stationary by taking first (or second) differences, ARIMA models are not discussed further in this research.

Various methods and procedures for ARMA identification have been proposed and explored in the literature through the years. Pattern identification methods were originally introduced by Box & Jenkins (1970). The Box-Jenkins method studies the patterns of the sample autocorrelation function (SACF) and the sample partial autocorrelation function (SPACF). According to the theory, for an MA(q) model the SACF should become zero after lag q, and the SPACF of an AR(p) model should cut off after lag p. However, for ARMA(p,q) models (with p and q unequal to zero) reading the SACF and SPACF to determine p and q is not an easy task. A simple inspection of the graphs of the SACF and the SPACF will not be enough to give accurate estimates of the orders p and q of the ARMA model (Heij et al, 2004).

Hence, new methods to identify the structure of ARMA processes have been proposed to supplement the Box-Jenkins method. These methods are very well described by Chan (1999) and by De Gooijer et al (1985). Also, Choi (1992, Chapter 5) provides a detailed recapitulation of these pattern identification methods and a comprehensive bibliography.

The main advantage of penalty-based model selection criteria is that there is no need to read the SACF and SPACF, as is required for the subjective Box-Jenkins-related methods. Simply minimizing the criterion is enough. In this objective way the model with the best balance between goodness of fit and model simplicity can be found. A comparison between the forecasting power of the Box-Jenkins method and the objective model selection criteria methods is given by Beveridge & Oickle (1994). They conclude that the objective selection criteria methods in their study perform as well as or even better than Box-Jenkins methods.

2.2 Model selection criteria

The main purpose of this research is not to add new model selection criteria to the existing literature. In this section three model selection criteria that are commonly used in time series analysis are discussed. This research does not elaborate too much on the technical details underlying the criteria. At the end of this section, the consistency of the three criteria is discussed. The articles of Forster (2000) and Zucchini (2000) provide a good summary of the basics of model selection criteria, which is needed to understand the way model selection criteria work.

The Akaike Information Criterion (AIC) (Akaike, 1974) is a popular criterion used for model identification. Several definitions exist, which are essentially equivalent. In this research the following definition of AIC is used:

$$AIC = \frac{-2l + 2k}{n}$$

where $l$ is the natural logarithm of the likelihood function of the model, $k$ is the number of model parameters and $n$ is the number of observations. For ARMA(p,q) models including a constant, $k$ is equal to p + q + 1.

As for all penalty-based information criteria, the first part of AIC is a measure of the goodness of fit, which is usually calculated from the likelihood. The second part is a penalty term for the number of parameters. This accounts for the fact that the model fit (likelihood) never decreases when more parameters (for ARMA models lagged values of 𝑦 or 𝜀) are included in the model. Large models have a higher variance of the estimated coefficients. Thus, unnecessarily large models lead to inefficiency. Only variables that have a clear effect on the dependent variable $y_t$ should be included in a model and less important variables should be excluded. More parameters in a model lead to a higher penalty and therefore the penalty term prevents overfitting. In conclusion, AIC can be useful in finding a trade-off between goodness of fit (accuracy) and model parsimony (lower variance).

AIC is based on the Kullback-Leibler information number (Kullback & Leibler, 1951). A derivation of AIC based on the Kullback-Leibler information number is given by Choi (1992, pp. 47-51). In addition, Van Casteren (1994) explains that the penalty term should correct for both a bias effect and a variance effect. Larger models usually lead to a higher accuracy and to a higher variance of the estimators. He provides a statistical derivation of AIC.

Ozaki (1977) shows that the method of minimizing AIC is more effective than the Box-Jenkins method, as it is able to overcome some of the difficulties of the Box-Jenkins method. He repeats the analysis of Box & Jenkins (1970) on the same data, but uses AIC instead of the Box-Jenkins method. His conclusion is that AIC is an easier and no less effective way of finding the optimal order of ARMA models.


Another commonly used information criterion is the Bayesian Information Criterion (BIC), which was first proposed by Schwarz (1978). It is also known as the Schwarz information criterion.

$$BIC = \frac{-2l + k \log(n)}{n}.$$

AIC and BIC are quite similar; the only difference is the penalty term. For n>7 BIC imposes a greater penalty term. Consequently, BIC is more inclined to choose a smaller model than AIC. AIC and BIC are the most popular information criteria used to determine model orders in all sorts of research areas. For a study on the difference between assumptions and performance of AIC and BIC see Kuha (2004), who concludes that both criteria are valuable.

In addition to AIC and BIC the Hannan-Quinn criterion (HQC) is included, which is based on the work done by Hannan & Quinn (1979). HQC is, as AIC and BIC are, a penalty-based information criterion. Also, HQC can be defined in many ways. In this research the following definition is used:

$$HQC = \frac{-2l + 2k \log \log(n)}{n}.$$

This means that HQC is exactly equal to AIC for n equal to 𝑒 raised to the power 𝑒 (which is approximately 15.154), since log log(n) = 1 at that point. Consequently, for a sample with 16 or more observations HQC imposes a larger penalty for model complexity than AIC. HQC will for n>2 impose a smaller penalty than BIC. In conclusion, BIC imposes a greater penalty for model complexity than AIC and HQC, and HQC is a criterion that for all samples with 16 or more observations penalizes more than AIC, but less than BIC.
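The ordering of the three penalty terms can be checked numerically. The sketch below implements the three criteria in the per-observation form used in this section (all terms divided by n) and compares the penalties around the crossover points n = 7 to 8 and n = 15 to 16. The function names and the log-likelihood value are hypothetical. HQC requires n > 1 so that log log(n) is defined.

```python
import math


def aic(loglik, k, n):
    """AIC = (-2l + 2k) / n."""
    return (-2.0 * loglik + 2.0 * k) / n


def bic(loglik, k, n):
    """BIC = (-2l + k log(n)) / n."""
    return (-2.0 * loglik + k * math.log(n)) / n


def hqc(loglik, k, n):
    """HQC = (-2l + 2k log(log(n))) / n, defined for n > 1."""
    return (-2.0 * loglik + 2.0 * k * math.log(math.log(n))) / n


# Penalty per parameter for several sample sizes (fit term set to zero).
for n in (7, 8, 15, 16, 200):
    print(n, aic(0.0, 1, n) * n, hqc(0.0, 1, n) * n, bic(0.0, 1, n) * n)

# With a hypothetical log-likelihood, an ARMA(1,1) with constant (k = 3), n = 200.
values = {"AIC": aic(-310.0, 3, 200), "HQC": hqc(-310.0, 3, 200), "BIC": bic(-310.0, 3, 200)}
```

For the estimation sample size of 200 used later in this research, the penalty per parameter is 2 for AIC, about 3.33 for HQC and about 5.30 for BIC, so the ordering AIC < HQC < BIC applies.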

BIC and HQC are consistent criteria, meaning that the model order is estimated correctly when the sample size approaches infinity. This consistency is proven by Hannan (1980). AIC, on the contrary, is not a consistent criterion. Moreover, it has a relatively small penalty term. AIC is, therefore, sometimes said to be overfitting, meaning that inefficiently large models are selected.


AIC and other inconsistent criteria are still commonly used to determine orders in time series analysis despite the fact that they are inconsistent. Hansen (2005, pp. 62-63) explains that it is important to keep in mind that in reality the order of any process based on finite samples may never be correctly identified by any procedure. All models should be considered as approximations of reality. Their value and strength lie in their ability to explain a situation and predict future values. Consequently, inconsistencies of information criteria certainly do not have to be an obstacle for accurate model selection.

2.3 Forecasting performance

In this section earlier research on the forecasting performance of the in-sample criteria AIC, BIC and HQC is described. Mantalos et al (2010) carry out a Monte Carlo experiment to test how often the correct model is selected by AIC, BIC and HQC. Moreover, they study which model selected by the different criteria predicts future values best. Their results are as follows: for a simulated ARMA(1,1) model with a sample of 200 observations, AIC chooses the correct model 64.75% of the time, BIC 94.47% and HQC 84.81%. Similarly, for a simulated ARMA(2,1) model, again with 200 observations, AIC selects the correct model 65.94% of the time, BIC 86.64% of the time and HQC 79.37% of the time.

Next, the question of which criterion leads to the best predictions of future values is examined. This is evaluated by the normalized mean forecast error. Due to the normalization, the criterion whose value is closest to 1 can be considered the best. For the ARMA(1,1) model with a 200 observation sample the normalized mean forecast error for AIC is 1.008094, for BIC 0.997053 and for HQC 1.000558. For the ARMA(2,1) model the following values are reported respectively: 1.040837, 1.034200 and 1.035841. Although the differences are small, one could argue that BIC and HQC perform a little better than AIC. It should, however, be noted that the conclusions might be different when using real data instead of simulated data. Also, it is important to keep in mind that only ARMA(1,1) and ARMA(2,1) are tested.

Similarly, Koehler & Murphree (1988) conclude that BIC should be preferred to AIC for choosing the order of an ARMA model using real data. However, one should be aware of the fact that they use multiple series with monthly data (instead of daily data). Also, these series all have different sample sizes (instead of a fixed sample size), meaning their conclusion is not fully applicable to one fixed sample size. Moreover, they use state space models, which are equivalent to ARMA(p,p) models. This covers only part of the order identification problem for ARMA(p,q) models. Also, HQC is not included in their study.

As was already mentioned in the introduction, Qi & Zhang (2001) were not able to find an obvious relation between certain model selection criteria and their out-of-sample forecasting performance. Their research focuses on artificial neural networks, which are somewhat similar to ARMA models, but there are some important differences. Therefore, no conclusion can yet be drawn about the relation between model selection criteria and their out-of-sample performance for ARMA models.

The conclusion of this section is that model selection criteria such as AIC and BIC appear to be a good alternative to the Box-Jenkins method. AIC, BIC and HQC are therefore commonly used in time series analysis. Earlier research shows that the consistent criterion BIC performs better for ARMA modelling than the inconsistent criterion AIC. However, it is not certain that the same conclusion applies to ARMA(p,q) modelling with real time series data using a fixed sample size and daily data. Also, the number of models that are tested can play an important role.


3 Research design

To investigate the relation between model selection criteria and forecasting performance, empirical research is performed in section 4. This section elaborates on the design of this research. First, the time series data and variables that are used in the research are discussed. Secondly, the model selection criteria and forecasting criteria are described. Thirdly, the model and estimation method are explained.

3.1 Data and variables

Three time series, the exchange rate between the euro and the US dollar (𝐸𝑈𝑅𝑈𝑆𝐷), the interest rate of the youngest 10-year Dutch state loan (𝑖𝑛𝑡𝑒𝑟𝑒𝑠𝑡) and the Cushing spot price of crude oil (𝑜𝑖𝑙) are evaluated in this research. Cushing is a place in the US where oil is stored. All three series consist of daily data, but Saturdays, Sundays and official holidays are not observed, which is typical for a lot of financial data. Therefore, every series contains at most five observations per week. In all series there are some values that are unknown: the exchange rate EUR/USD contains 77 missing values, the interest rate contains 2 missing values and the spot price of crude oil contains 103 missing values. The missing values are filled in by linear interpolation between the previous and the next observation, hoping to keep the pattern in the data.
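The interpolation step can be sketched as follows. This is a minimal illustration and not the procedure actually used for the data: it assumes missing values are marked as `None`, that a run of several consecutive missing values is filled on the straight line between the nearest observed neighbours, and that missing values at the very start or end of the series are left untouched. The function name is hypothetical.

```python
def interpolate_missing(values):
    """Fill None entries by linear interpolation between observed neighbours.

    Assumption of this sketch: leading and trailing None entries have no
    neighbour on one side and are therefore left as None.
    """
    filled = list(values)
    known = [i for i, v in enumerate(filled) if v is not None]
    for left, right in zip(known, known[1:]):
        gap = right - left
        for i in range(left + 1, right):
            weight = (i - left) / gap
            filled[i] = filled[left] + weight * (filled[right] - filled[left])
    return filled


# Hypothetical exchange rate observations with two missing days.
rates = [1.10, None, 1.14, 1.13, None, None, 1.16]
print(interpolate_missing(rates))
```

One consequence of this choice is that the first differences are constant inside a filled gap, so interpolated stretches carry no independent information about the dynamics of the series.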

The data on the exchange rate between the euro and the US dollar were taken from De Nederlandsche Bank (the Dutch central bank); the series starts on 4 January 1999 and ends on 30 August 2015. The data on the interest rate of the youngest 10-year Dutch state loan were also taken from De Nederlandsche Bank and run from 1 January 1990 until 31 August 2015. The spot price of Cushing crude oil was obtained from the U.S. Energy Information Administration; the series starts on 2 January 1986 and its last observation is on 21 September 2015.

As was stated earlier, the data need to be stationary for ARMA modelling. Since there is an exponential trend in the spot price of oil, the natural logarithm of the oil data is taken to remove the exponential trend, obtaining log 𝑜𝑖𝑙. After taking first differences the augmented Dickey-Fuller test rejects the null hypothesis of a nonstationary series. Thus, Δ log 𝑜𝑖𝑙 appears to be stationary. According to the augmented Dickey-Fuller test 𝐸𝑈𝑅𝑈𝑆𝐷 and 𝑖𝑛𝑡𝑒𝑟𝑒𝑠𝑡 are nonstationary, whereas their first differences are stationary. Figure 1 shows a plot of the time series 𝐸𝑈𝑅𝑈𝑆𝐷 and figure 2 shows its first difference. Figures 3 and 4 show the same for 𝑖𝑛𝑡𝑒𝑟𝑒𝑠𝑡 and figures 5 and 6 for log 𝑜𝑖𝑙. In conclusion, in this research the variables 𝛥 𝑖𝑛𝑡𝑒𝑟𝑒𝑠𝑡, 𝛥 𝐸𝑈𝑅𝑈𝑆𝐷 and Δ log 𝑜𝑖𝑙 are used.
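The transformations described above are simple to state in code. The sketch below shows first differencing and log differencing only; it does not implement the augmented Dickey-Fuller test, which would require a statistical package. Function names and the example prices are hypothetical.

```python
import math


def diff(series):
    """First differences: y_t - y_{t-1}."""
    return [b - a for a, b in zip(series, series[1:])]


def log_diff(series):
    """First differences of the natural logarithm: log(y_t) - log(y_{t-1}).

    Requires strictly positive values, as is the case for a price series.
    """
    return [math.log(b) - math.log(a) for a, b in zip(series, series[1:])]


# Hypothetical oil prices: a 10% rise followed by a fall back to the start.
prices = [50.0, 55.0, 50.0]
print(diff(prices))
print(log_diff(prices))
```

Each differenced series is one observation shorter than the original. For small changes the log difference is approximately the relative change, which is why it removes an exponential trend that a plain first difference would leave in the variance.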

It is also important to know why the data were chosen. As a lot of time series are financial data, this study also focuses on financial data. Three different sorts of financial data are chosen on purpose in order to investigate a wide range of data types. Unfortunately, the problem with a lot of financial data such as stock prices is that these series are expected to behave like white noise. If this were not the case, it would be relatively easy to predict future values, meaning that an arbitrage opportunity would exist. However, as the data used in this research are not stock prices, one could still expect a certain pattern (although maybe small) and therefore an ARMA model can indeed be a reasonable alternative to white noise.

3.2 Criteria

In this study three model selection criteria are investigated, as was already stated. These are AIC, BIC and HQC. These criteria were already defined and described in section 2.2.


As was already mentioned in the introduction, in this study only short-term forecasting is examined. To be more exact, only one-step-ahead forecasts are considered. This is also known as static forecasting (Heij et al, 2004, p. 570). Static forecasting corresponds to forecasting the value $y_{n+h}$ using all observations until $t = n + h - 1$. In this study three forecast evaluation criteria are used: the root mean squared error (RMSE), the mean absolute error (MAE) and the mean absolute percentage error (MAPE). All forecast criteria evaluate the forecasting performance by comparing the predicted value $\hat{y}_{n+h}$ with the true value $y_{n+h}$.

RMSE and MAE were already defined in the introduction. The definition of MAPE is

$$MAPE = \frac{100}{m} \sum_{h=1}^{m} \left| \frac{y_{n+h} - \hat{y}_{n+h}}{y_{n+h}} \right|.$$

MAPE expresses the forecast error for the 𝑚 observations in the hold-out sample as a percentage of the true value.
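A minimal Python sketch of the MAPE definition is given below; the function name and numbers are hypothetical. Note that the formula divides by the true value, so it is undefined when a true value equals zero and becomes very large when true values are close to zero, which is something to keep in mind for differenced series.

```python
def mape(actual, predicted):
    """Mean absolute percentage error over the m hold-out observations.

    Raises ZeroDivisionError if any true value is exactly zero.
    """
    m = len(actual)
    return 100.0 / m * sum(abs((y - f) / y) for y, f in zip(actual, predicted))


# Hypothetical example: both forecasts are off by 10% of the true value.
print(mape([100.0, 200.0], [110.0, 180.0]))
```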

3.3 Model and ways to measure

For each of the three time series, subsamples of 250 observations are created. In each subsample 200 observations are used as the estimation sample and 50 observations are used as the forecast sample (or hold-out sample). The ARMA model that is estimated on the estimation sample is subsequently used to evaluate the forecasting performance on the hold-out sample. The time series of the exchange rate between the euro and the dollar consists of 17 subsamples, the interest rate of the youngest 10-year Dutch state loan consists of 26 subsamples and the Cushing spot price of crude oil consists of 29 subsamples. It is also important to note that in this research only non-seasonal ARMA models are used, since there is no reason to assume that the three series are influenced by seasonality.
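The subsample construction can be sketched as below. The sketch assumes that the subsamples are consecutive, non-overlapping blocks and that an incomplete block at the end of the series is discarded; the text does not state this explicitly, so both points are assumptions, as is the function name.

```python
def split_subsamples(series, size=250, n_estimation=200):
    """Cut a series into consecutive blocks of `size` observations.

    Each block is returned as (estimation sample, hold-out sample), where
    the estimation sample holds the first `n_estimation` observations.
    Assumption: blocks do not overlap and a trailing partial block is dropped.
    """
    subsamples = []
    for start in range(0, len(series) - size + 1, size):
        block = series[start:start + size]
        subsamples.append((block[:n_estimation], block[n_estimation:]))
    return subsamples


# A hypothetical series of 4300 observations yields 17 complete subsamples.
example = list(range(4300))
parts = split_subsamples(example)
print(len(parts), len(parts[0][0]), len(parts[0][1]))
```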

For every subsample the following models are estimated using EViews: white noise; ARMA(1,0); ARMA(2,0); ARMA(3,0); ARMA(0,1); ARMA(0,2); ARMA(1,1); ARMA(1,2); ARMA(2,1) and ARMA(2,2). These ten models are chosen because they are the most commonly used ARMA models. ARMA models with more than two lags on both sides are not often used. Also, there is no economic theory that would suggest that larger models are appropriate. Therefore, these ten models provide quite an extensive range of possibilities to select from.

So the above 10 models are estimated each time using the first 200 observations. The values of AIC, BIC and HQC are calculated next. Subsequently, static forecasting in EViews (using moving average backcasting) is applied to the last 50 observations of the subsample and the values of RMSE, MAE and MAPE are calculated and compared with the values of AIC, BIC and HQC.

During the first days of the credit crisis in 2008 (close to the fall of Lehman Brothers) there is an unusually large drop in log 𝑜𝑖𝑙 and also in 𝐸𝑈𝑅𝑈𝑆𝐷. As a consequence, a few values were left out for these two series during the peak of the credit crisis to avoid distorted results.

In this research three ways of measuring the relation between model selection criteria and their forecasting performance on the hold-out sample are used. The first method measures the loss in efficiency. This method is also used in the doctoral thesis of Van Casteren (1994). The value of RMSE, MAE or MAPE of the model that is selected by AIC, BIC or HQC is divided by the minimum value of RMSE, MAE or MAPE over all models in the corresponding subsample, and the average per series is reported in section 4. This gives a value that is one or higher. The lower the value, the smaller the efficiency loss.
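For one subsample, the efficiency loss described above can be computed as in the following sketch. The dictionaries map model names to criterion values; all names and numbers are hypothetical and only illustrate the calculation.

```python
def efficiency_loss(selection_values, forecast_errors):
    """Forecast error of the selected model divided by the smallest forecast error.

    `selection_values` maps model name -> AIC, BIC or HQC (lower is better).
    `forecast_errors` maps model name -> RMSE, MAE or MAPE on the hold-out sample.
    The result is 1.0 when the criterion picks the best forecasting model.
    """
    selected = min(selection_values, key=selection_values.get)
    return forecast_errors[selected] / min(forecast_errors.values())


# Hypothetical values for three candidate models in one subsample.
bic_values = {"white noise": -7.10, "ARMA(1,0)": -7.15, "ARMA(1,1)": -7.12}
rmse_values = {"white noise": 0.0100, "ARMA(1,0)": 0.0110, "ARMA(1,1)": 0.0105}
print(efficiency_loss(bic_values, rmse_values))
```

In this example BIC selects ARMA(1,0), whose RMSE is 10% above the lowest RMSE, so the efficiency loss is 1.10. Averaging this ratio over all subsamples of a series gives the figure reported per series.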

The second method is the Pearson correlation coefficient. Whether two variables tend to change linearly together is often measured by calculating the correlation between these two variables. More specifically, the Pearson correlation is commonly used in all sorts of research. The correlation between a model selection criterion (AIC, BIC or HQC) and a forecast criterion (RMSE, MAE or MAPE) is calculated for each subsample of the three time series, and subsequently the average of these values is calculated and reported in section 4. The Pearson coefficient assumes that the variables are both normally distributed with constant variance (no heteroscedasticity).
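For a single subsample, the Pearson correlation is computed across the candidate models, pairing each model's selection criterion value with its forecast error. A minimal sketch is given below; the function name and the numbers are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical BIC values and hold-out RMSE values of four models.
bic_per_model = [-7.15, -7.12, -7.10, -7.05]
rmse_per_model = [0.0101, 0.0103, 0.0102, 0.0108]
print(pearson(bic_per_model, rmse_per_model))
```

Since lower values are better for both the selection criterion and the forecast error, a positive correlation indicates that models preferred in-sample also tend to forecast better out of sample.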

The third way of measuring the performance of the model selection criteria is the Spearman’s rank correlation coefficient (Bain & Engelhardt, 1992, pp. 489-491). This is a non-parametric way of measuring the connection between two ordinal variables. These two variables have to be matched pairs. Moreover, there must be a monotonic relation. A monotonic relation means either that when the value of one variable increases the other value of the matched variable increases as well (positively monotonic) or as the value of one variable increases the matched value of the other variable decreases (negatively monotonic). However, there is no assumption on the exact shape of the relation.
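Spearman's rank correlation can be obtained by replacing the values with their ranks and computing the ordinary correlation of the ranks. The sketch below does this, assigning tied values the average of their ranks; this treatment of ties is a common convention and an assumption here, as are the function names.

```python
import math


def ranks(values):
    """Ranks starting at 1; tied values receive the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sxx = sum((a - mean_x) ** 2 for a in rx)
    syy = sum((b - mean_y) ** 2 for b in ry)
    return sxy / math.sqrt(sxx * syy)


# A monotonic but non-linear relation gives a rank correlation of one.
print(spearman([1, 2, 3, 4], [1, 4, 9, 16]))
```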

In conclusion, the Pearson correlation assumes normally distributed variables, a linear relation between the variables and interval variables, whereas Spearman's rank correlation coefficient assumes less: no assumption on the distribution of the variables, a monotonic relation and ordinal (ranked) variables. Since there is no reason to assume that the conditions needed for either the Pearson or the Spearman correlation coefficient hold, these two coefficients are merely used in a descriptive way. In the next section the results of the above research are discussed.

4 Results

In this section the empirical results of the research described in the previous section are discussed. Moreover, the relation between model selection criteria and their performance on the hold-out sample is evaluated (second sub-question). This section therefore contributes substantially to answering the central question. For every subsample the values of AIC, BIC, HQC, RMSE, MAE and MAPE for all ten models are given in tables 1, 2 and 3 for the time series 𝛥 𝐸𝑈𝑅𝑈𝑆𝐷, 𝛥 𝑖𝑛𝑡𝑒𝑟𝑒𝑠𝑡, and Δ log 𝑜𝑖𝑙 respectively. The minimum value per row is underlined.

The first way of measuring whether model selection criteria select models with a good performance on a hold-out sample is by comparing the minimum value of AIC, BIC, HQC with the minimum value of RMSE, MAE and MAPE. This is done in table 4. The number of times the model selection criteria select the same model as the forecast criteria is counted for the three time series. In table 5 the results are shown for all 72 subsamples. This is of course an indicator that uses little information (only the lowest value). However, it can still be argued that this is an important indicator, because the main task of a model selection criterion is to select the best model.
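The counting procedure can be sketched as follows: for each subsample, find the model with the lowest selection criterion and the model with the lowest forecast criterion, and count how often they coincide. The data structure, the function name and the numbers below are hypothetical.

```python
def count_agreements(subsamples, selection="BIC", forecast="RMSE"):
    """Count subsamples in which both criteria are minimised by the same model.

    `subsamples` is a list of dicts; each dict maps a criterion name to a
    dict of model name -> criterion value for that subsample.
    """
    count = 0
    for table in subsamples:
        chosen = min(table[selection], key=table[selection].get)
        best = min(table[forecast], key=table[forecast].get)
        if chosen == best:
            count += 1
    return count


# Two hypothetical subsamples with two candidate models each.
data = [
    {"BIC": {"white noise": -7.2, "ARMA(1,0)": -7.1},
     "RMSE": {"white noise": 0.010, "ARMA(1,0)": 0.011}},
    {"BIC": {"white noise": -6.9, "ARMA(1,0)": -7.0},
     "RMSE": {"white noise": 0.012, "ARMA(1,0)": 0.013}},
]
print(count_agreements(data))
```

In this example the criteria agree in the first subsample only. With ten candidate models, a criterion unrelated to forecasting performance would be expected to agree in roughly one subsample out of ten if every model were equally likely to forecast best, which gives a rough benchmark for the counts in table 5.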

As shown in table 5, the best performing criterion BIC is only 15 times able to select the model with the lowest RMSE, only 17 times able to select the model with the lowest MAE and only 23 times able to select the model with the lowest MAPE while there are 72 subsamples. This result indicates that there is not a very clear relation between the models selected by model selection criteria and their forecasting performance, but there might be some. Naturally, more research has to be done to further investigate this relation, which is done below.

Secondly, the average efficiency loss of the model selection criteria, as described in section 3.3, is calculated and reported. In table 6 this is done for 𝛥 𝐸𝑈𝑅𝑈𝑆𝐷, in table 7 for 𝛥 𝑖𝑛𝑡𝑒𝑟𝑒𝑠𝑡 and in table 8 for Δ log 𝑜𝑖𝑙. Table 6 shows that for RMSE and MAE the loss in efficiency is somewhere between 10% and 17%. The loss in efficiency is much bigger for MAPE: somewhere between 85% and 134%. Table 7 shows better results: the maximum loss in efficiency is 6% for all forecast criteria. Table 8 shows results similar to table 6: the loss stays within 8% for RMSE and MAE. However, for AIC combined with MAPE the loss is clearly bigger: around 51%.

From these results it can be concluded that for 𝛥 𝑖𝑛𝑡𝑒𝑟𝑒𝑠𝑡 the relation between model selection criteria and their forecasting performance is best, since the efficiency loss is smallest. However, not much can be said about this relation in general. More research needs to be done to further investigate this relation, which is done below. In section 5 the difference between AIC, BIC and HQC is discussed.

Thirdly, to get a better picture of the relation between the selected models and their performance on the hold-out sample a somewhat more extensive method is used: the Pearson correlation. This method takes more information into account than the previous method, because the efficiency loss uses only the selected model and the best model whereas the correlation uses all models. Therefore, the correlation is useful in answering the second sub-question. In tables 9, 10 and 11 the average Pearson correlation is shown for 𝛥 𝐸𝑈𝑅𝑈𝑆𝐷, 𝛥 𝑖𝑛𝑡𝑒𝑟𝑒𝑠𝑡 and Δ log 𝑜𝑖𝑙 respectively.

If there were no relation between model selection criteria and their performance on the hold-out sample, the correlation should be close to zero. If there were a clear relation, the correlation should be positive, since lower values are better for both the selection criteria and the forecast criteria. The results are surprising. For 𝛥 𝐸𝑈𝑅𝑈𝑆𝐷 and Δ log 𝑜𝑖𝑙 most combinations of model selection criteria and forecast criteria are negatively correlated. This means that models ranked favourably by the model selection criteria tend to forecast the 50 hold-out values of these two time series worse rather than better. Only the correlation between BIC and MAPE is an exception, since it is positive for both series.

The 𝛥 𝑖𝑛𝑡𝑒𝑟𝑒𝑠𝑡 series performs evidently better. Most correlations are positive, meaning that for this time series a relation between model selection criteria and forecasting performance is present. Only the correlations between AIC and MAE and between AIC and MAPE are negative, but still close to zero.

The results of the Pearson correlation do not indicate a relation between model selection criteria and the forecast criteria. Where one would expect either a correlation close to zero in case of no relation or a positive correlation in case of a relation, a negative relation is shown for two time series.

Fourthly, Spearman’s rank correlation coefficient, as described in section 3.3, is calculated and reported. Tables 12, 13 and 14 show the Spearman’s rank correlation coefficient for 𝛥 𝐸𝑈𝑅𝑈𝑆𝐷, 𝛥 𝑖𝑛𝑡𝑒𝑟𝑒𝑠𝑡 and Δ log 𝑜𝑖𝑙 respectively. The results are similar to those for the Pearson correlation coefficient: for 𝛥 𝐸𝑈𝑅𝑈𝑆𝐷 and Δ log 𝑜𝑖𝑙 the results are mostly negative. The values for BIC are, however, positive for all criteria concerning 𝛥 𝐸𝑈𝑅𝑈𝑆𝐷 and for the MAE and MAPE criteria concerning the Δ log 𝑜𝑖𝑙 series. Generally, it seems that for all three time series Spearman’s rank correlation coefficient is less negative than the Pearson correlation coefficient. Still, the results of both the Spearman and the Pearson correlation coefficients are contrary to expectation.
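The rank-based calculation can be sketched in the same way. The sketch assumes that Spearman's coefficient is obtained as the Pearson correlation of the ranks, with tied values receiving their mean rank; the helper names are hypothetical. The data are the AIC and RMSE values of subsample 7 in the first appendix table, where ARMA(2,2) combines the lowest AIC with by far the highest RMSE.

```python
import math


def average_ranks(values):
    """Rank values from 1 (smallest) upward, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the block of tied values starting at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Subsample 7, ten candidate models (white noise, ..., ARMA(2,2)).
aic = [-7.091301, -7.084711, -7.074747, -7.066179, -7.084473,
       -7.074477, -7.122693, -7.114481, -7.11383, -7.188952]
rmse = [0.006629, 0.006623, 0.006548, 0.006624, 0.006543,
        0.006542, 0.006761, 0.006678, 0.006685, 0.014374]

rho = spearman(aic, rmse)
print(f"Spearman: {rho:.4f}, Pearson: {pearson(aic, rmse):.4f}")
```

Because only the rank order enters, the size of the outlying RMSE of ARMA(2,2) does not affect the Spearman coefficient, whereas it does affect the Pearson coefficient.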

Concluding, the efficiency criteria show that the average efficiency loss is clearly smallest for 𝛥 𝑖𝑛𝑡𝑒𝑟𝑒𝑠𝑡. A similar conclusion can be drawn for the average Pearson and Spearman correlation: most correlations are positive for the 𝛥 𝑖𝑛𝑡𝑒𝑟𝑒𝑠𝑡 series. It is surprising and contrary to expectation that for 𝛥 𝐸𝑈𝑅𝑈𝑆𝐷 and Δ log 𝑜𝑖𝑙 mainly negative correlations are observed. The results discussed in this section do not show that there is indeed a relation between model selection criteria and their forecasting performance on a hold-out sample, so the second sub-question is answered negatively. This conclusion corresponds with the conclusion of Qi & Zhang (2001). In section 5 some possible reasons for the negative correlation are discussed.

5 Analysis

This section continues where the previous section left off: the results are analyzed in more detail. First, this section deals with the third sub-question: which criterion performs best when it comes to one-step-ahead forecasting of ARMA time series? In addition, the validity and limitations of this research are discussed and some recommendations for future research are made.

This paragraph elaborates on the difference between AIC, BIC and HQC. Table 5 shows that BIC is most often able to select the best forecasting model. The difference with AIC is most obvious: 10 versus 15 for RMSE, 4 versus 17 for MAE and 9 versus 23 for MAPE. HQC performs a little better than AIC on all three criteria, but BIC still outperforms HQC. Also, tables 6, 7 and 8 show that for the three time series BIC has the lowest average efficiency loss compared with AIC and HQC for all values of RMSE, MAE and MAPE. This means that on average BIC selects models that are more efficient than the models selected by AIC and HQC.

Similarly, tables 9, 10 and 11 show that BIC has the highest Pearson correlation for all three time series and for all three forecast criteria. The RMSE of 𝛥 𝑖𝑛𝑡𝑒𝑟𝑒𝑠𝑡 is the only exception: in this case AIC and HQC have higher correlation coefficients. For all other cases BIC does the best job in selecting models with low forecasting error. The same results hold for the Spearman correlation. BIC performs strictly better for 𝛥 𝐸𝑈𝑅𝑈𝑆𝐷 and Δ log 𝑜𝑖𝑙. Only for 𝛥 𝑖𝑛𝑡𝑒𝑟𝑒𝑠𝑡 does BIC not perform best.

In this section it is tested whether BIC is indeed significantly better than AIC and HQC. By using the central limit theorem (Bain & Engelhardt, 1992, p. 238-240) it is possible to perform a paired sample test between the Pearson correlation coefficients of AIC, BIC and HQC. This way of testing is described on page 380 of Bain & Engelhardt (1992). It is an approximate method, because the normality of the sample means that follows from the central limit theorem holds only asymptotically and is therefore an approximation in finite samples. The sample sizes of 17, 26 and 29 are considered large enough to use the central limit theorem. Also, it should be noted that the true variance of the coefficients is unknown, so the sample variance is used instead. This could also introduce a small error.

The test is performed for all three time series and for all three forecast criteria (RMSE, MAE and MAPE). The results as shown in table 9, 10 and 11 give rise to the hypothesis that BIC performs best, followed by HQC and AIC comes last. It is now tested whether these results are actually significant. Therefore, the following alternative one-sided hypotheses are tested: BIC>AIC, BIC>HQC and HQC>AIC. The corresponding null


hypotheses are of course: BIC=AIC, BIC=HQC and HQC=AIC. These are tested and the corresponding Z-value (of the standard normal distribution) is reported. Significance at 10%, 5%, and 1% level is indicated with *, **, and ***, respectively.
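A minimal sketch of such a paired sample test is given here. It assumes that the test statistic is the mean of the paired differences divided by its estimated standard error, compared with the standard normal distribution; the exact procedure on page 380 of Bain & Engelhardt (1992) may differ in detail. The function name `paired_z_test` and the six correlation values per criterion are hypothetical and serve only as an illustration (the samples in this research contain 17, 26 and 29 subsamples).

```python
import math
from statistics import mean, stdev


def paired_z_test(a, b):
    """One-sided test of H1: mean(a) > mean(b) on paired observations.

    Returns the Z-value and the upper-tail p-value of the standard normal.
    """
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    # The sample standard deviation replaces the unknown true one.
    z = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p_value


# Hypothetical per-subsample Pearson correlations (illustration only).
corr_bic = [0.30, -0.10, 0.25, 0.05, 0.40, -0.20]
corr_aic = [0.10, 0.05, 0.20, -0.15, 0.15, -0.10]

z, p = paired_z_test(corr_bic, corr_aic)
print(f"Z = {z:.3f}, one-sided p-value = {p:.3f}")
```

The null hypothesis BIC=AIC would be rejected in favour of BIC>AIC at the 10% level if the p-value were below 0.10.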

Table 15 shows the results of the paired sample test explained above. Not much can be concluded for 𝛥 𝑖𝑛𝑡𝑒𝑟𝑒𝑠𝑡: the Z-values are all reasonably close to 0 and there are no significant results. In contrast, for 𝛥 𝐸𝑈𝑅𝑈𝑆𝐷 and for Δ log 𝑜𝑖𝑙 the Z-values for BIC>AIC are in most cases significant at the 10% level or better. Only for the Δ log 𝑜𝑖𝑙 series is BIC not significantly better than AIC for RMSE. This means that for those two time series the relation between model selection criteria and the forecast criteria is stronger for the models selected by BIC than for those selected by AIC. The other two tests are less clear than BIC>AIC: BIC>HQC and HQC>AIC are significant only for MAPE, at the 10% and 5% level respectively. Thus, there is significant statistical evidence that the Pearson correlation for models selected by BIC is higher than for those selected by AIC for the 𝛥 𝐸𝑈𝑅𝑈𝑆𝐷 and the Δ log 𝑜𝑖𝑙 series. These results are not observed for the 𝛥 𝑖𝑛𝑡𝑒𝑟𝑒𝑠𝑡 series.

The same test is also performed for Spearman’s rank correlation coefficient in table 16. It can clearly be seen that the results are less significant in this case. Only for MAPE is BIC>AIC significant at the 1% level and HQC>AIC significant at the 10% level. The results are similar to those for the Pearson correlation, but less pronounced.

The efficiency loss, the Pearson correlation and the Spearman correlation show that BIC generally performs better than AIC and HQC in this research. However, due to the limited number of subsamples it is hard to show that BIC performs better than HQC and that HQC performs better than AIC. Some significant evidence is found, however, that BIC performs better than AIC for 𝛥 𝐸𝑈𝑅𝑈𝑆𝐷 and for Δ log 𝑜𝑖𝑙 for the Pearson correlation. This result, combined with the fact that the efficiency loss, the Pearson correlation and the Spearman correlation show that BIC generally performs better than AIC in this research, leads to the conclusion that BIC performs on average better than AIC in this research. This conclusion corresponds to the conclusions of Mantalos et al (2010) and Koehler & Murphree (1988), but it only holds for 𝛥 𝐸𝑈𝑅𝑈𝑆𝐷 and for Δ log 𝑜𝑖𝑙. Below some important limitations of this research are discussed. First a possible explanation is given for the negative correlations.

The results for 𝛥 𝐸𝑈𝑅𝑈𝑆𝐷 and Δ log 𝑜𝑖𝑙 are surprising. It can hardly be true that the models selected by model selection criteria are negatively related to the models selected by forecast criteria. Multiple things may have caused negative correlations, which are

discussed below.

The first and probably most important problem of this research is the fact that common roots are present in this research. In case of one common root an ARMA(p,q) model can be written equivalently as ARMA(p-1,q-1). The inverted autoregressive roots and moving average roots are commonly too close together. This parameter redundancy leads to misspecification of the ARMA model (Woodward et al, 2012, p. 120-123).

Common roots can clearly be observed when estimating the different ARMA models. Apparently it has a great influence on the results of this research: the models that contain common roots sometimes have remarkably low values of AIC, BIC and HQC, but high values of RMSE, MAE and MAPE, which does not really make sense. This influences the Pearson correlation strongly and to a lesser extent the Spearman correlation as well. Common roots therefore also provide a reason for the fact that the Spearman correlation is on average higher than the Pearson correlation in this research. The misleading values of AIC, BIC and HQC have more influence on the linear relation and less on the relation of ranked variables.
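How near-common roots can be detected from estimated coefficients is sketched below for orders up to two. The sketch assumes the sign convention (1 − φ1·B − φ2·B²)x_t = (1 + θ1·B + θ2·B²)ε_t and flags a pair of inverted roots as (nearly) common when their distance is below a tolerance. The function names, the tolerance of 0.05 and the coefficient values are hypothetical and do not come from the estimated models in this research.

```python
import cmath


def inverted_roots(coeffs):
    """Inverted roots of 1 - c1*z - c2*z^2, i.e. roots of x^2 - c1*x - c2 = 0."""
    if len(coeffs) == 0:
        return []
    c1 = coeffs[0]
    c2 = coeffs[1] if len(coeffs) > 1 else 0.0
    if c2 == 0.0:
        return [complex(c1)]
    disc = cmath.sqrt(c1 * c1 + 4.0 * c2)
    return [(c1 + disc) / 2.0, (c1 - disc) / 2.0]


def near_common_roots(ar, ma, tol=0.05):
    """Pairs of inverted AR and MA roots that lie closer together than tol."""
    ar_roots = inverted_roots(ar)
    # The MA polynomial 1 + t1*z + t2*z^2 equals 1 - (-t1)*z - (-t2)*z^2.
    ma_roots = inverted_roots([-t for t in ma])
    return [(r, s) for r in ar_roots for s in ma_roots if abs(r - s) < tol]


# Hypothetical ARMA(1,1): (1 - 0.90B) x_t = (1 - 0.88B) e_t, almost white noise.
print(near_common_roots([0.90], [-0.88]))
# Hypothetical ARMA(1,1) without redundancy: inverted roots 0.5 and -0.4.
print(near_common_roots([0.50], [0.40]))
```

In the first case the autoregressive and moving average factors almost cancel, so the ARMA(1,1) model is nearly equivalent to white noise and its parameters are poorly identified.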

The common root problem makes the conclusion of this research less valid, as common roots can often be observed for ARMA(2,2), ARMA(2,1), ARMA(1,2) and sometimes even for ARMA(1,1) models. An important result of this research is that model selection criteria cannot deal with ARMA models that have common roots, because misleading values of AIC, BIC and HQC are reported. Therefore it is important for time series analysts to be aware of the common roots problem, especially because in this research it seems to spoil the way model selection criteria work. Unfortunately, not much is known about the effect of common roots on model selection criteria. More research should be done to investigate why these problems appear and how they can be prevented.

There is a second limitation of this research. For the data used in this research BIC has the highest performance on average. It should, however, be noted that this conclusion applies to estimation samples consisting of 200 observations containing daily data. Generalizations to other research with a different sample size or with a different observation frequency should be made with caution. Besides, the number and kind of models that are included in a study is of significant importance.

Mantalos et al (2010), for example, repeat their simulation study with fewer models to choose from. The true model is therefore selected more often, leading to a lower forecast error. In this research there is of course no true model as in a simulation study: a real time series will never be exactly equal to a certain ARMA model. However, the procedure of the model selection criteria should be viewed as one which identifies the order of the ARMA model that is a good approximation of the data generating process (i.e. the economic behavior under study). Adding models to the research that describe the data generating process well is therefore crucial. On the other hand, including more unnecessary models could lead to more wrong identifications.

Thirdly, the weekend effect could influence this research. The time series contain daily observations. The fact that Saturdays and Sundays are not observed means that there is a longer time period between an observation on Friday and an observation on Monday than there is between other days (for example Tuesday and Wednesday). However, all observations are treated as equally spaced daily observations. This problem is referred to as the weekend effect (French, 1980). The weekend effect might have an effect on the pattern of the data, but this is hard to observe.

Fourthly, it should be noted that in this research there is no correction for clustered volatility. ARCH(q) or GARCH(p,q) modelling are possible alternatives for ARMA(p,q) modelling. Although using subsamples of 250 observations should go some way towards eliminating clustered volatility, more investigation is needed to find out whether further measures should be taken. Tests for conditional heteroskedasticity can be used to evaluate the disturbance caused by this clustered volatility.
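One such test is Engle's ARCH LM test, sketched here for a single lag. This test is not part of this research and is named only as one possible choice. With one lag, the statistic is the number of observations in the auxiliary regression times the squared correlation between the squared residuals and their first lag, and it is compared with the 5% critical value of the chi-squared distribution with one degree of freedom (3.841). The function name and the residual series are hypothetical.

```python
def arch_lm_one_lag(residuals):
    """ARCH LM statistic with one lag: n * R^2 of e_t^2 regressed on e_{t-1}^2."""
    squared = [e * e for e in residuals]
    y, x = squared[1:], squared[:-1]
    n = len(y)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        return 0.0  # constant squared residuals: no evidence of ARCH effects
    # With a single regressor, R^2 equals the squared correlation.
    return n * sxy * sxy / (sxx * syy)


CHI2_1_CRITICAL_5PCT = 3.841

# Hypothetical residuals: a calm period followed by a volatile period.
clustered = [0.1, -0.1] * 50 + [2.0, -2.0] * 50
lm = arch_lm_one_lag(clustered)
print(f"LM = {lm:.2f}, reject homoskedasticity: {lm > CHI2_1_CRITICAL_5PCT}")
```

Applied to the residuals of each estimated ARMA model, a statistic above the critical value would indicate that clustered volatility remains in the subsample.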

Fifthly, it is important to keep in mind that the Pearson correlation coefficient assumes the relation between the variables to be linear. Testing for other relations, for example a quadratic relation, is difficult due to the fact that every subsample contains only 10 values of each variable. Therefore it is difficult to significantly find the true relation for each subsample. A solution to this problem might be Spearman’s rank correlation

coefficient. This coefficient assumes only a monotonic relation. A disadvantage of

Spearman’s rank correlation coefficient, however, is that it only takes into account the rank order and not absolute differences between the variables. This is why both the Pearson and Spearman correlation are calculated in this research as both coefficients have some

interesting properties. Both coefficients are only used in a descriptive way.

Finally, it is also important to remember that by definition BIC selects models that are more parsimonious than those selected by AIC and HQC. Therefore, one should be careful in concluding that BIC is the best criterion for ARMA one-step-ahead forecasting with daily observations. If the data generating processes of the time series used in this research are by chance best described by parsimonious models (for example white noise), then it is a logical consequence that BIC performs best. On the other hand, in this case AIC and HQC would also be expected to select parsimonious models, and the difference should not be as large as in this research.
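The difference in parsimony can be made explicit by comparing the penalty terms. The sketch below assumes that the criteria are scaled per observation, so that each equals −2ℓ/n plus a penalty of 2k/n (AIC), k·ln(n)/n (BIC) or 2k·ln(ln(n))/n (HQC), and that k counts the constant as well as the ARMA coefficients. Both assumptions are consistent with the values in the appendix: with n = 200 they reproduce the reported differences between the criteria for the white noise model of subsample 1.

```python
import math


def penalties(k, n):
    """Per-observation penalty terms for k parameters and n observations."""
    return {
        "AIC": 2 * k / n,
        "BIC": k * math.log(n) / n,
        "HQC": 2 * k * math.log(math.log(n)) / n,
    }


n = 200
white_noise = penalties(1, n)  # constant only (assumed parameter count)
arma22 = penalties(5, n)       # constant + 2 AR + 2 MA terms (assumed)

# Reported values for the white noise model in subsample 1 of the appendix.
aic, bic, hqc = -7.284905, -7.268414, -7.278232
print(round(bic - aic, 6), round(white_noise["BIC"] - white_noise["AIC"], 6))
print(round(hqc - aic, 6), round(white_noise["HQC"] - white_noise["AIC"], 6))
print(arma22)
```

For n = 200 each additional parameter costs 0.0100 under AIC, about 0.0167 under HQC and about 0.0265 under BIC, which is why BIC favours smaller models.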

There are some valuable properties of this research. The fact that three different time series are used in this research makes the results more valid and generally applicable. Also, the fact that daily observations are used is a great advantage, because many


model selection criteria in forecasting future values with ARMA models, making the results reliable.

Besides, estimating ARMA models using subsamples is preferable to estimating ARMA models for the entire series. Subsamples have the valuable property that the estimation and hold-out samples are of the same size in every subsample. Also, the results of the subsamples can easily be compared with other subsamples and therefore conclusions can be drawn easily and reliably. This is a clear advantage over the research of Qi & Zhang (2001), since they use monthly data and estimate their models for the entire series.

Another advantage of this research is the fact that three commonly used model selection criteria and three commonly used forecast criteria are investigated, making it easier for time series analysts to choose which criterion is useful and which is not. The three forecast criteria determine the forecasting performance all in a different way, which all have some important aspects. Using more criteria makes the results more general and more reliable.

Recently there has been more investigation into averaging forecasts, which is a method to combine multiple forecasts into a single forecast. Timmermann (2006), for example, shows that averaging forecasts is more accurate than choosing a single best forecast. For future research forecast averaging may be a possible way of improving and simplifying the results. Forecast averaging makes it possible to create a weighted average of a few important forecasts and combine these into one value, making the results easier to interpret.
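A minimal sketch of forecast averaging is given here. It assumes a simple weighted average of the one-step-ahead forecasts of several models, with equal weights by default; other weighting schemes are discussed by Timmermann (2006). The function names and all numbers are hypothetical and constructed so that the errors of the two models cancel exactly, which will hold only partially in practice.

```python
import math


def combine_forecasts(forecasts, weights=None):
    """Weighted average of several forecast series (equal weights by default)."""
    m = len(forecasts)
    if weights is None:
        weights = [1.0 / m] * m
    horizon = len(forecasts[0])
    return [sum(w * f[t] for w, f in zip(weights, forecasts))
            for t in range(horizon)]


def rmse(forecast, actual):
    """Root mean squared error of a forecast series."""
    return math.sqrt(sum((f - a) ** 2 for f, a in zip(forecast, actual)) / len(actual))


# Hypothetical hold-out values and forecasts of two models (illustration only).
actual = [0.2, -0.1, 0.0, 0.3]
model_a = [0.4, 0.1, 0.2, 0.5]     # forecasts too high by 0.2
model_b = [0.0, -0.3, -0.2, 0.1]   # forecasts too low by 0.2

combined = combine_forecasts([model_a, model_b])
print(rmse(model_a, actual), rmse(model_b, actual), rmse(combined, actual))
```

The combined forecast can then be evaluated with the same forecast criteria (RMSE, MAE and MAPE) as the individual models.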

The most important recommendation for future research is to investigate why common roots have a negative effect on model selection criteria as unfortunately not much is known about this. Finding the relation between model selection criteria and forecast criteria should also be done using only the ARMA models that do not contain common roots. All models that contain common roots should be eliminated from the range of models. Moreover, more time series with daily data should be studied to be able to


significantly find the relation between model selection criteria and their forecasting performance on the hold-out sample and to find out whether BIC is indeed superior for daily time series with an estimation sample of approximately 200 values and a hold-out sample of 50 values.

Concluding, BIC most often selects the model with the lowest forecast error and also the efficiency loss and the Pearson and Spearman correlation coefficient show that BIC performs on average best in this research. The results are, however, less obvious for the 𝛥 𝑖𝑛𝑡𝑒𝑟𝑒𝑠𝑡 series. For 𝛥 𝐸𝑈𝑅𝑈𝑆𝐷 and for Δ log 𝑜𝑖𝑙 it can for most forecast criteria significantly be shown that the Pearson correlation is higher for BIC than for AIC. This cannot significantly be shown in most cases for the Spearman’s rank correlation. The main issue of this research is the fact that common roots probably lead to misspecification of the ARMA models and therefore the values of AIC, BIC and HQC are less valid. An important recommendation for future research is that only ARMA models that do not contain

common roots should be used.

6 Conclusion

This section summarizes the most important results of the paper and tries to explicitly answer the main question. The purpose of this research was to find the potential of certain model selection criteria applied to ARMA models in one-step-ahead forecasting of real financial time series with daily observations. Knowing more about the different model selection criteria for ARMA models and their forecasting performance can be useful for time series analysts as it is not always straightforward for them what model selection criteria to use. Testing the different model selection criteria and their potential to forecast future values one-step-ahead, can indeed improve ARMA modelling for real time series with daily observations.


To begin, earlier research shows that model selection criteria are indeed a powerful method of identifying ARMA models and a good alternative to Box-Jenkins related methods. This is, for example, shown by Ozaki (1977). The inconsistency of AIC and other model selection criteria does not necessarily have to be a problem for accurate forecasting. However, earlier research shows that the consistent criterion BIC performs better for ARMA modelling than the inconsistent criterion AIC. This was shown by Mantalos et al (2010) using a simulation study and by Koehler & Murphree (1988) using real data.

The second sub-question of this research focusses on finding the relation between model selection criteria and their forecasting performance on a hold-out sample of 50 observations. Qi & Zhang (2001) found no clear relation between model selection criteria and forecast criteria for artificial neural networks. This relation is not often studied for ARMA models, which makes it worth trying.

The Pearson correlation coefficient and the Spearman correlation coefficient showed a negative relation in this research. This is surprising, as it is clearly not as expected. Either a positive relation would be expected or no relation, but not a negative relation. Section 5 explains some reasons for the negative relation. The main problem seems to be the fact that common roots cause misspecifications of ARMA models. In this research common roots also seem to have caused low values of the model selection criteria and high values of the forecast criteria, which indeed leads to a negative relation between these two.

Due to the problem of common roots not much can be concluded about the relation that is investigated in section 4. The presence of common roots lowers the validity of this research. Future research should focus more on finding a reason for the fact that model selection criteria cannot deal with common roots. A reason should be found for the fact that sometimes low values of model selection criteria were reported for models with common roots while at the same time the forecasting performance was not good at all. Moreover, more research needs to be done to find the relation between model selection criteria and their forecasting performance on a hold-out sample using only ARMA models that do not contain common roots.

Section 5 elaborates on the third sub-question. The difference between AIC, BIC and HQC was discussed. BIC selected the model with the lowest forecast error most often. Also, the efficiency loss showed that BIC outperforms HQC and AIC. It seemed that AIC performs on average worse than HQC, but these results were less obvious.

The Pearson correlation and Spearman correlation were also calculated to find out more about the relation between model selection criteria and their forecasting performance on the hold-out-set. For 𝛥 𝐸𝑈𝑅𝑈𝑆𝐷 and Δ log 𝑜𝑖𝑙 the correlation with the forecast criteria was best for the models selected by BIC. These results did not hold for 𝛥 𝑖𝑛𝑡𝑒𝑟𝑒𝑠𝑡.

Although common roots may have influenced the results, significance tests showed that the Pearson correlation coefficient of BIC was significantly higher than the coefficient of AIC for 𝛥 𝐸𝑈𝑅𝑈𝑆𝐷 and Δ log 𝑜𝑖𝑙. This is in line with the results of Mantalos et al (2010) and Koehler & Murphree (1988). The performance difference between BIC and HQC was not as clear as the difference between BIC and AIC. Hence, this study suggests that for modelling daily time series with ARMA models using relatively large samples (200 observations) BIC selects on average models that have the best forecasting performance.

Now, the main question can be answered. As earlier research shows that model selection criteria are indeed doing a good job predicting future values, AIC, BIC and HQC certainly have potential. The relation in this research between model selection criteria and forecast criteria was, however, mainly negative, probably due to common roots. In line with earlier research it can be concluded that the consistent criterion BIC performs better than the inconsistent criterion AIC for relatively large samples (200 observations). HQC performs a little better than AIC but a little worse than BIC in this research.

Investigating to what extent different model selection criteria, such as AIC, BIC and HQC, can indeed be helpful in forecasting future values of a real time series with daily data is a difficult task but worth the effort, as it can improve time series analysis. More research needs to be done to find the relation between the different model selection criteria and their ability to forecast future values. Besides, more research should be done to extend this research to smaller and larger estimation samples. Future research should always take the effect of common roots into account and be performed for more time series and more sample sizes to provide a good foundation upon which conclusions can be drawn.


References

Akaike, H. (1974). A new look at the statistical model identification. IEEE Transactions on Automatic Control, 19(6), 716-723.

Bain, L. J., & Engelhardt, M. (1992). Introduction to probability and mathematical statistics (2nd ed.). Belmont: Brooks/Cole.

Beveridge, S., & Oickle, C. (1994). A comparison of Box-Jenkins and objective methods for determining the order of a non-seasonal ARMA model. Journal of Forecasting, 13(5), 419-434.

Box, G.E.P., & Jenkins, G.M. (1970). Time series analysis: forecasting and control. San Francisco: Holden Day.

Casteren, P.H.F.M. van. (1994). Statistical model selection rules. (Doctoral Thesis). Amstelveen: Vrije Universiteit.

Chan, W. S. (1999). A comparison of some of pattern identification methods for order determination of mixed ARMA models. Statistics & Probability Letters, 42(1), 69-79.

Choi, B. (1992). ARMA model identification. New York: Springer-Verlag.

Forster, M. R. (2000). Key concepts in model selection: performance and generalizability. Journal of Mathematical Psychology, 44(1), 205-231.

French, K. R. (1980). Stock returns and the weekend effect. Journal of Financial Economics, 8(1), 55-69.

Gooijer, J. G. de, Abraham, B., Gould, A., & Robinson, L. (1985). Methods for determining the order of an autoregressive-moving average process: a survey. International Statistical Review/Revue Internationale de Statistique, 53(3), 301-329.

Hannan, E. J. (1980). The estimation of the order of an ARMA process. The Annals of Statistics, 8(5), 1071-1081.

Hannan, E. J., & Quinn, B. G. (1979). The determination of the order of an autoregression. Journal of the Royal Statistical Society. Series B (Methodological), 41(2), 190-195.

Hansen, B. E. (2005). Challenges for econometric model selection. Econometric Theory, 21(1), 60-68.

Heij, C., De Boer, P., Franses, P. H., Kloek, T., & Van Dijk, H. K. (2004). Econometric methods with applications in business and economics. New York: Oxford University Press.

Koehler, A. B., & Murphree, E. S. (1988). A comparison of the Akaike and Schwarz criteria for selecting model order. Applied Statistics, 37(2), 187-195.

Kuha, J. (2004). AIC and BIC comparisons of assumptions and performance. Sociological


Kullback, S., & Leibler, R. A. (1951). On information and sufficiency. The Annals of Mathematical Statistics, 22(1), 79-86.

Mantalos, P., Mattheou, K., & Karagrigoriou, A. (2010). Forecasting ARMA models: a comparative study of information criteria focusing on MDIC. Journal of Statistical Computation and Simulation, 80(1), 61-73.

Ozaki, T. (1977). On the order determination of ARIMA models. Applied Statistics, 26(3), 290-301.

Pindyck, R.S., & Rubinfeld, D.L. (1976). Econometric models and economic forecasts. Tokyo: McGraw-Hill Kogakusha.

Qi, M., & Zhang, G.P. (2001). An investigation of model selection criteria for neural network time series forecasting. European Journal of Operational Research, 132(3), 666-680.

Schwarz, G. (1978). Estimating the dimension of a model. The Annals of Statistics, 6(2), 461-464.

Timmermann, A. (2006). Forecast combinations. Handbook of Economic Forecasting, 1(1), 135-196.

Woodward, W. A., Gray, H. L., & Elliott, A. C. (2012). Applied time series analysis. Boca Raton: CRC Press.

Zucchini, W. (2000). An introduction to model selection. Journal of Mathematical


Appendix

Figure 1: Plot of the time series 𝐸𝑈𝑅𝑈𝑆𝐷 (horizontal axis: 1999-2015)

Figure 2: Plot of the first order difference of 𝐸𝑈𝑅𝑈𝑆𝐷 (horizontal axis: 1999-2015)

Figure 3: Plot of the time series 𝑖𝑛𝑡𝑒𝑟𝑒𝑠𝑡 (horizontal axis: 1990-2014)

Figure 4: Plot of the first order difference of 𝑖𝑛𝑡𝑒𝑟𝑒𝑠𝑡 (horizontal axis: 1990-2014)

Plots of log(oil) and of the first order difference of log(oil) (horizontal axis: 1986-2014)

Sample criterium Witte ruis ARMA(1,0) ARMA(2,0) ARMA(3,0) ARMA(0,1) ARMA(0,2) ARMA(1,1) ARMA(1,2) ARMA(2,1) ARMA(2,2) 1 AIC -7,284905 -7,27511 -7,268943 -7,258944 -7,275309 -7,266795 -7,265797 -7,257279 -7,25941 -7,289238 BIC -7,268414 -7,242127 -7,219468 -7,192978 -7,242326 -7,21732 -7,216322 -7,191313 -7,193444 -7,20678 HQC -7,278232 -7,261762 -7,248921 -7,232249 -7,261962 -7,246773 -7,245776 -7,230583 -7,232714 -7,255868 RMSE 0,005797 0,005799 0,005757 0,005757 0,005801 0,005768 0,0058 0,005768 0,005758 0,005799 MAE 0,004352 0,004365 0,004302 0,004301 0,004372 0,004328 0,004374 0,00433 0,004308 0,004354 MAPE 93,13302 94,55857 96,26023 96,22321 95,56279 96,46502 95,86264 96,43047 96,48724 93,3942 2 AIC -6,850636 -6,842445 -6,846336 -6,836449 -6,842995 -6,844796 -6,838874 -6,834859 -6,836394 -6,863968 BIC -6,834144 -6,809462 -6,796862 -6,770483 -6,810012 -6,795321 -6,789399 -6,768892 -6,770427 -6,781511 HQC -6,843962 -6,829097 -6,826315 -6,809754 -6,829648 -6,824774 -6,818852 -6,808163 -6,809698 -6,830599 RMSE 0,007446 0,007457 0,007534 0,007552 0,007463 0,007545 0,007536 0,007553 0,007545 0,007889 MAE 0,005916 0,005913 0,00594 0,005956 0,005913 0,005939 0,005933 0,005946 0,005949 0,006231 MAPE 109,4207 111,1707 111,6621 109,9715 111,6216 110,0422 108,3986 109,0384 110,6951 154,3179 3 AIC -7,135849 -7,125884 -7,117893 -7,107991 -7,125927 -7,118263 -7,116903 -7,108522 -7,108715 -7,149361 BIC -7,119358 -7,092901 -7,068418 -7,042025 -7,092944 -7,068788 -7,067429 -7,042556 -7,042748 -7,066903 HQC -7,129175 -7,112536 -7,097872 -7,081295 -7,11258 -7,098241 -7,096882 -7,081827 -7,082019 -7,115991 RMSE 0,004616 0,004618 0,004561 0,004567 0,004561 0,00462 0,004624 0,004571 0,004571 0,00485 MAE 0,00385 0,003852 0,003786 0,003789 0,00379 0,003853 0,003852 0,003794 0,003793 0,004026 MAPE 98,01563 98,10248 97,6224 97,87841 103,1356 98,16363 97,96942 97,25979 97,64638 133,4843 4 AIC -7,510377 -7,501563 -7,494355 -7,498561 -7,501647 -7,494802 -7,494802 -7,497763 -7,488801 
-7,488896 BIC -7,493886 -7,46858 -7,44488 -7,432594 -7,468664 -7,445328 -7,445328 -7,448289 -7,422835 -7,42293 HQC -7,503704 -7,488216 -7,474334 -7,471865 -7,488299 -7,474781 -7,474781 -7,477742 -7,462105 -7,4622 RMSE 0,005254 0,005257 0,005252 0,005205 0,005257 0,00525 0,005235 0,005229 0,005228 0,005196 MAE 0,004206 0,004195 0,004205 0,004145 0,004194 0,0042 0,00417 0,004179 0,00418 0,004196

(35)

5 AIC BIC -7,097073 -7,078408 -7,053541 -7,033824 -7,07786 -7,055408 -7,052134 -7,031893 -7,030475 -7,057527 HQC -7,106891 -7,098043 -7,082994 -7,073095 -7,097496 -7,084861 -7,081587 -7,071164 -7,069745 -7,106616 RMSE 0,007866 0,007876 0,007862 0,007852 0,007876 0,007857 0,007874 0,007878 0,007862 0,008007 MAE 0,006054 0,006139 0,006141 0,006139 0,006129 0,006157 0,006138 0,006162 0,006152 0,006177 MAPE 105,8473 107,733 113,6585 119,4245 106,0876 118,1744 108,528 121,363 118,5395 197,5349 6 AIC -6,915923 -6,908487 -6,910867 -6,901176 -6,909147 -6,9108 -6,913846 -6,903974 -6,90431 -6,952483 BIC -6,899431 -6,875503 -6,861393 -6,83521 -6,876164 -6,861326 -6,864372 -6,838008 -6,838344 -6,870025 HQC -6,909249 -6,895139 -6,890846 -6,874481 -6,895799 -6,890779 -6,893825 -6,877278 -6,877614 -6,919114 RMSE 0,006631 0,006677 0,006813 0,006841 0,006695 0,006821 0,006777 0,006781 0,006789 0,008941 MAE 0,005024 0,005063 0,00519 0,005204 0,005084 0,005187 0,005126 0,005126 0,005136 0,007193 MAPE 100,5098 115,2747 137,5516 137,211 123,7005 142,3653 153,3391 153,3938 158,0709 593,8124 7 AIC -7,091301 -7,084711 -7,074747 -7,066179 -7,084473 -7,074477 -7,122693 -7,114481 -7,11383 -7,188952 BIC -7,074809 -7,051728 -7,025273 -7,000213 -7,05149 -7,025002 -7,073218 -7,048515 -7,047864 -7,106495 HQC -7,084627 -7,071363 -7,054726 -7,039483 -7,071125 -7,054455 -7,102671 -7,087786 -7,087135 -7,155583 RMSE 0,006629 0,006623 0,006548 0,006624 0,006543 0,006542 0,006761 0,006678 0,006685 0,014374 MAE 0,005011 0,005006 0,005006 0,005076 0,004993 0,004991 0,005113 0,005016 0,005026 0,011517 MAPE 96,72421 96,67577 118,829 120,8429 115,4819 114,9191 219,4316 199,5458 199,2812 1022,816 8 AIC -7,212348 -7,206002 -7,197926 -7,187964 -7,206905 -7,198576 -7,204192 -7,194203 -7,19437 -7,213964 BIC -7,195857 -7,173019 -7,148451 -7,121998 -7,173922 -7,149101 -7,154717 -7,128236 -7,128404 -7,131506 HQC -7,205674 -7,192655 -7,177904 -7,161269 -7,193558 -7,178554 -7,18417 -7,167507 -7,167675 -7,180595 RMSE 
0,005919 0,005827 0,00589 0,005883 0,00582 0,005882 0,005845 0,005849 0,005863 0,005979 MAE 0,004558 0,004524 0,004579 0,004575 0,004528 0,004581 0,004567 0,00457 0,004578 0,00462 MAPE 98,35293 99,44196 101,5425 100,967 100,31 101,7777 101,9647 101,9081 101,7411 100,5159

(36)

BIC HQC -7,943512 -7,926846 -7,910172 -7,898379 -7,926839 -7,910166 -7,91071 -7,894149 -7,894207 -7,926828
RMSE 0,00485 0,004847 0,004847 0,004843 0,004849 0,004849 0,004853 0,004853 0,004852 0,005023
MAE 0,003729 0,003726 0,003726 0,003702 0,003728 0,003728 0,003731 0,003723 0,00372 0,003866
MAPE 117,0154 116,3257 116,2621 112,812 116,7664 116,7021 118,0796 116,237 115,5351 119,5138

10
AIC -6,891653 -6,8869 -6,878004 -6,868288 -6,888331 -6,879013 -6,88123 -6,87126 -6,871237 -6,863929
BIC -6,875162 -6,853917 -6,82853 -6,802322 -6,855348 -6,829538 -6,831756 -6,805293 -6,80527 -6,781471
HQC -6,884979 -6,873552 -6,857983 -6,841593 -6,874983 -6,858992 -6,861209 -6,844564 -6,844541 -6,830559
RMSE 0,010751 0,010755 0,010768 0,01076 0,010763 0,010771 0,010768 0,010768 0,010769 0,010775
MAE 0,00732 0,007138 0,007184 0,00716 0,00714 0,007176 0,007172 0,007167 0,007175 0,007185
MAPE 108,792 94,55605 99,99929 94,60457 94,96747 98,46802 97,12971 96,69645 97,37832 97,89886

11
AIC -6,198479 -6,194067 -6,191473 -6,18294 -6,194122 -6,192694 -6,186629 -6,220979 -6,219862 -6,27793
BIC -6,181988 -6,161084 -6,141998 -6,116973 -6,161139 -6,143219 -6,137154 -6,155012 -6,153896 -6,195472
HQC -6,191805 -6,180719 -6,171451 -6,156244 -6,180774 -6,172672 -6,166607 -6,194283 -6,193166 -6,24456
RMSE 0,009173 0,009186 0,009218 0,009295 0,009191 0,00926 0,009185 0,008996 0,008987 0,013276
MAE 0,006887 0,006896 0,006934 0,007031 0,006903 0,007011 0,006903 0,00678 0,006765 0,010799
MAPE 154,8955 152,2947 153,4746 169,2599 155,6003 165,1691 151,0668 179,3279 175,8285 729,2688

12
AIC -6,584454 -6,57912 -6,569287 -6,561169 -6,579006 -6,569522 -6,569129 -6,572187 -6,571217 -6,576475
BIC -6,567963 -6,546137 -6,519812 -6,495203 -6,546023 -6,520047 -6,519654 -6,50622 -6,50525 -6,494017
HQC -6,57778 -6,565772 -6,549265 -6,534474 -6,565658 -6,5495 -6,549107 -6,545491 -6,544521 -6,543106
RMSE 0,010658 0,010712 0,010712 0,010665 0,010711 0,010712 0,010712 0,010692 0,010688 0,010646
MAE 0,008899 0,008933 0,008948 0,008973 0,008927 0,008956 0,008935 0,009004 0,009005 0,008866
MAPE 99,32718 106,0816 109,7617 112,4341 104,5677 111,1483 106,6135 116,3677 117,1426 100,6882

13
AIC -6,454533 -6,444645 -6,437165 -6,432117 -6,444667 -6,436786 -6,442529 -6,428068 -6,428043 -6,425147

(37)

HQC RMSE 0,011029 0,011038 0,01105 0,01115 0,011039 0,011044 0,011407 0,011086 0,01108 0,011387
MAE 0,008584 0,00859 0,008531 0,008629 0,00859 0,008531 0,009054 0,008532 0,008535 0,009018
MAPE 98,88636 98,17995 94,94319 97,28575 98,1502 95,23766 118,7376 95,51119 95,38888 116,1065

14
AIC -7,066952 -7,058332 -7,0484 -7,038401 -7,058497 -7,048512 -7,048666 -7,038666 -7,038695 -7,091969
BIC -7,050461 -7,025349 -6,998925 -6,972435 -7,025514 -6,999037 -6,999191 -6,9727 -6,972729 -7,009511
HQC -7,060278 -7,044984 -7,028378 -7,011706 -7,04515 -7,02849 -7,028644 -7,011971 -7,012 -7,0586
RMSE 0,005925 0,005941 0,00595 0,005951 0,005944 0,005949 0,005949 0,005949 0,005955 0,005872
MAE 0,004878 0,004888 0,004891 0,004892 0,004891 0,004892 0,004893 0,004893 0,004894 0,004915
MAPE 107,2948 115,1633 116,1477 116,062 116,0636 116,4578 116,3892 116,3976 116,7747 111,3926

15
AIC -7,16845 -7,188473 -7,18046 -7,189984 -7,188624 -7,18005 -7,179161 -7,179969 -7,178563 -7,224969
BIC -7,151958 -7,155489 -7,130985 -7,124018 -7,155641 -7,130575 -7,129686 -7,114002 -7,112596 -7,142511
HQC -7,161776 -7,175125 -7,160438 -7,163288 -7,175276 -7,160028 -7,159139 -7,153273 -7,151867 -7,191599
RMSE 0,005172 0,005167 0,005169 0,005185 0,005166 0,005171 0,005168 0,005138 0,005208 0,005621
MAE 0,00376 0,003712 0,003693 0,003798 0,003699 0,003726 0,003704 0,003767 0,003761 0,004483
MAPE 118,2774 142,5687 141,4414 134,37 139,6533 148,3671 142,781 143,4421 143,2476 246,9

16
AIC -7,988151 -7,986149 -7,977127 -7,972891 -7,996025 -7,98633 -8,022257 -8,015373 -8,014789 -8,005427
BIC -7,97166 -7,953165 -7,927652 -7,906925 -7,963041 -7,936855 -7,972783 -7,949407 -7,948823 -7,922969
HQC -7,981477 -7,972801 -7,957105 -7,946195 -7,982677 -7,966308 -8,002236 -7,988678 -7,988094 -7,972058
RMSE 0,005169 0,005164 0,005179 0,005234 0,005177 0,00518 0,005279 0,005246 0,005259 0,005242
MAE 0,003809 0,003799 0,003798 0,003805 0,003797 0,003797 0,003867 0,003829 0,003837 0,003829
MAPE 99,77045 113,3418 106,9938 108,519 124,7162 123,2146 152,6582 136,2689 138,2452 136,2764

17
AIC -6,875393 -6,866231 -6,859946 -6,855927 -6,865852 -6,860531 -6,859331 -6,850786 -6,850248 -6,875396
BIC -6,858902 -6,833248 -6,810471 -6,789961 -6,832868 -6,811056 -6,809856 -6,784819 -6,784282 -6,792938
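
As a reading aid for the table, the following minimal Python sketch shows how one block can be compared row by row. It uses the values listed for series 10 and assumes that lower values are preferred in every row and that the ten values in each row refer to the same ten candidate models in the same order. The model labels are not part of this excerpt, so columns are identified by position only. The names `SERIES_10`, `parse_row` and `best_column` are illustrative and do not come from the thesis.

```python
# Rows for series 10, transcribed from the table (decimal commas kept as printed).
SERIES_10 = {
    "AIC": "-6,891653 -6,8869 -6,878004 -6,868288 -6,888331 "
           "-6,879013 -6,88123 -6,87126 -6,871237 -6,863929",
    "BIC": "-6,875162 -6,853917 -6,82853 -6,802322 -6,855348 "
           "-6,829538 -6,831756 -6,805293 -6,80527 -6,781471",
    "HQC": "-6,884979 -6,873552 -6,857983 -6,841593 -6,874983 "
           "-6,858992 -6,861209 -6,844564 -6,844541 -6,830559",
    "RMSE": "0,010751 0,010755 0,010768 0,01076 0,010763 "
            "0,010771 0,010768 0,010768 0,010769 0,010775",
}


def parse_row(row):
    """Convert a row of decimal-comma numbers into a list of floats."""
    return [float(value.replace(",", ".")) for value in row.split()]


def best_column(row):
    """Return the 1-based column position holding the smallest value."""
    values = parse_row(row)
    return values.index(min(values)) + 1


# Assumption: the smallest value wins for every criterion in the table.
selected = {name: best_column(row) for name, row in SERIES_10.items()}
print(selected)  # {'AIC': 1, 'BIC': 1, 'HQC': 1, 'RMSE': 1}
```

For this block the three in-sample criteria and the out-of-sample RMSE all point to the first column. The same check can be repeated for the other series by replacing the row strings.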
