A Forecast Comparison of Volatility Models using Bayesian and Frequentist Approaches

Baharak Ibrahimy

January 16, 2019

Abstract

We compare 45 (G)ARCH-type models on their out-of-sample ability to forecast the conditional variance using high-frequency data. Both a Bayesian approach (Bayesian Model Averaging weights) and a frequentist approach (Model Confidence Set) are used to rank the performance of the models on EUR/USD and USD/JPY exchange rate data for the period 1999-2018. We consider three time horizons: 1-hour-ahead, 1-day-ahead and 1-month-ahead. In our analysis, we also examine the forecast performance of the models during the financial crisis of 2007-2008 separately. We suggest combining the considered approaches for optimal results. We find evidence that a GARCH(1,1) is outperformed by more sophisticated models, notably by a TGARCH(2,2).

1 Introduction

The conditional variance is important for risk management, hedging, asset pricing and option pricing. Modeling and forecasting the conditional variance is thereby an important subject in time series analysis. Volatility models are routinely used to describe and forecast the conditional variance in financial time series. Their goal is to model the conditional variance by means of past squared returns. Andersen and Bollerslev (1998) state that these models are able to forecast accurately when using realized variance, constructed from high-frequency data, as a proxy for the conditional variance (which is volatility squared). Hansen and Lunde (2005) build on this notion and examine the forecast performance of 330 (G)ARCH-type models. They state that for the Deutsche Mark/US dollar exchange rate data from the period 1987-1993, nothing outperforms the GARCH(1,1) of Bollerslev (1986) for the 1-day-ahead out-of-sample forecasts based on the year 1992-1993. They compare the models with the superior predictive ability (SPA) test of Hansen et al. (2001) and the reality check for data snooping (RC) by White (2000). These methods permit some degree of confidence in the results, so that good results are not mistaken for chance. This study can be seen as a re-examination of their question, motivated by the following three factors:

1) To provide insights on whether the evidence can be extended to different horizons. Besides the 1-day-ahead forecasts, two other horizons are considered (1-hour-ahead and 1-month-ahead forecasts). We include several horizons (a.k.a. frequencies) to examine their influence on the forecasting performance. The more complex models are assumed to make more precise forecasts at the longer horizons, in contrast to the simple models, which are assumed to forecast more accurately at the shorter horizons. Moreover, the Deutsche Mark no longer exists; the Euro is used instead. Using recent data provides insights, since tomorrow's data is better described by the recent past.

2) To include adversity. The global financial crisis of 2007-2008 occurred after the paper of Hansen and Lunde (2005) was published. Perhaps certain volatility models are more suited than others during these times of turmoil.


3) To consider different measures for evaluating the forecast ability. We consider three different approaches, which we refer to as ranking measures since the approaches involve ranking the models in terms of their forecasting performance. (1) We employ the plain method, which is ranking by the lowest value of the corresponding loss function. (2) We consider ranking by the Model Confidence Set (MCS) of Hansen, Lunde, and Nason (2011). An MCS is a set of superior model(s), which contains the best model(s) at a given confidence level. (3) We consider Bayesian Model Averaging (BMA) weights as proposed by Stock and Watson (2004). Bayesian methods are widely used for model selection. The BMA weights are constructed from forecast combinations using a shrinkage method. Shrinkage methods are used to avoid poor forecasting and poor model selection. Forecast combinations, simply put combined forecasts, are found to produce better forecasts than 'individual' forecasting models (Timmermann, 2006).

The MCS can be seen as a frequentist approach, and using BMA weights can be seen as a Bayesian approach. The plain method serves as a contrast to these approaches to reflect their power, i.e. to show how much is gained by using them instead of simply ranking by the loss function. We consider the plain method a safe point of reference, since the other two methods build on the same loss functions. A model that outperforms the others merely by chance is accounted for by these ranking measures.

Using these ranking measures instead of SPA and RC has the following advantages. The MCS is (i) independent of any benchmark model; (ii) the entire set of models is ranked, so one can easily observe which models are significantly outperformed by others; and (iii) the tested hypotheses are of a simple form. The BMA weights do not suffer from difficulties in formulation, development or interpretation, and they provide a set of widely applicable tools. Furthermore, our computations do not demand difficult tools and can easily be implemented and thus replicated.

Our aim is to examine whether certain types of volatility models have a higher forecasting accuracy than the other models. More precisely, our research questions are: (1) Does GARCH(1,1) outperform the other considered volatility models? (2) Does this answer depend on the horizon or on the used ranking measure? In order to answer these questions, we compare 45 (G)ARCH-type models in terms of their out-of-sample forecasting ability of the conditional variance for two major exchange rate series: Euro/US dollar and US dollar/Japanese yen.

Our study could contribute evidence to an empirical literature that suggests either a Bayesian or a frequentist approach. We suggest combining both approaches for optimal results. To preview our results, we find that a GARCH(1,1) is outperformed by a TGARCH(2,2), and that the class of Symmetric models is outperformed by the other two considered classes: the class of Asymmetric and Leverage accommodating models and the class of Asymmetric models. Furthermore, we find that the period of the financial crisis of 2007-2008 does not exhibit similarities with the regular period.

The structure of this paper is as follows. Section 2 gives a literature review. Section 3 describes the data and the considered models. Section 4 presents the techniques used for the out-of-sample forecasts. The results are presented in Section 5 and a discussion follows in Section 6. A conclusion in Section 7 completes the paper.

2 A literature review


the forecastability of several econometric models from the 1920s to recent years [Cassel (1922); Dornbusch (1973); Giddy and Dufey (1975); Finn (1986); MacDonald (1999); Berkowitz and Giorgianni (2001); Alba and Papell (2007); Pacelli (2012); Beckmann and Schüssler (2016)]. One of the most cited papers on this subject is that of Meese and Rogoff (1983). They established that for short horizons, a period of at most twelve months, the driftless random walk is able to forecast the exchange rate quite accurately. These findings have never fully been overturned despite the effort of other forecasting models. Inspired by this study, many authors, including Kuan and Liu (1995); Tenti (1996); Leung, Chen, and Daouk (2000); Nag and Mitra (2002); and Panda and Narasimhan (2007), have applied non-linear models such as neural networks to examine the predictability of exchange rate dynamics. Other models, such as structural models, Markov regime switching models, and SETAR models, have also been applied to forecast the exchange rate [Schinasi and Swamy (1989); Engel (1994); Brooks (2001)]. Although some claim that their model improves over a random walk model, the improvement is either very small or not at shorter horizons. Time series of exchange rates are characterized by volatility clustering, leptokurtosis, and conditional heteroscedasticity [Mandelbrot (1972); Fama (1965); F. X. Diebold and Nason (1990); Canova (1993)]. These features imply that the hypothesis of normality is rejected, since these series exhibit alternating periods of large fluctuations around the mean and periods of smaller variations. ARCH and GARCH models have the ability to analyze the time variability of the volatility, which makes them useful in capturing the non-linearity of changes in exchange rates [Brooks (1997); Kilian and Taylor (2003); Wang, Chen, Jin, and Zhou (2010)].

Engle (1982) and Bollerslev (1986) are the pioneers of the ARCH (Autoregressive Conditional Heteroscedasticity) and GARCH (Generalized Autoregressive Conditional Heteroscedasticity) models, respectively. The generalized model opened a way for new models to capture the time series dynamics. The GARCH models are divided in two categories: the univariate and the multivariate models. Some examples of the univariate models are the Exponential GARCH of Nelson (1991), the GJR-GARCH model of Glosten, Jagannathan, and Runkle (1993), the APARCH model of Ding, Granger, and Engle (1993), and the component GARCH of Engle, White, et al. (1999). Some examples of the multivariate models are the VECH model of Bollerslev, Engle, and Wooldridge (1988), the BEKK model of Engle and Kroner (1995), and the Generalized Orthogonal GARCH of Van der Weide (2002). The multivariate frameworks allow evaluating the adequacy of density forecasts involving cross-variable interactions, such as time-varying conditional correlations (Diebold, Hahn, & Tay, 1999).

Multivariate models are able to beat the random walk [Canova (1993); Jorion and Sweeney (1996); Leung et al. (2000)]. Carriero, Kapetanios, and Marcellino (2009) claim that the multivariate models suffer from the same problem as the univariate models in forecasting exchange rates better than the univariate driftless random walk. This is especially true for the shorter horizons. Multivariate GARCH models are assumed to outperform the univariate GARCH models in terms of forecast ability, since the adequacy of density forecasts involves cross-variable interactions, such as time-varying conditional correlations (Diebold et al., 1999). However, multivariate models are not as flexible as the univariate models from a practical point of view, since the dimensionality problem makes estimation in practice more difficult. In our analysis, the parsimony of a model is important. Furthermore, we would like to emphasize models that are of practical relevance and which can be easily implemented. Univariate models show these characteristics more often than multivariate models. Therefore, we will focus on the univariate models.


Andersen and Bollerslev (1998) argue that volatility models are able to forecast accurately when using high-frequency data. They state that squared daily returns are an extremely noisy estimator of the realized daily returns. They propose using the sum of squared intraday returns, the realized variance, to reduce the noise. The intraday returns are constructed using high-frequency data, which will be explained in more detail in Section 3. Martens and Zein (2004) also observe that high-frequency data improves the forecasting of financial volatility in comparison to historical daily returns or even implied volatilities. This finding inspired many scholars to use volatility models for forecasting volatility or the conditional variance (which is volatility squared), such as Hansen and Lunde (2005). They use 330 volatility models to forecast the conditional variance using high-frequency data and state that a GARCH(1,1) outperforms the other considered volatility models for the Deutsche Mark/US dollar (DM/USD) exchange rate data for the out-of-sample forecasting period of October 1992 until September 1993. Their article is used as a base for this study because of (i) its strong claim of one superior model (GARCH(1,1)), and (ii) its attraction in the scientific world [Andersen, Bollerslev, and Diebold (2003); Cappiello, Engle, and Sheppard (2006); McAleer and Medeiros (2008); Patton (2011); Clark and Ravazzolo (2015); Bollerslev, Hood, Huss, and Pedersen (2018); Viola, Klotzle, Pinto, and da Silveira Barbedo (2019)].

3 Data and Models

3.1 Data

We analyze two major exchange rate series, namely the Euro/US dollar (EUR/USD) and the US dollar/Japanese yen (USD/JPY). The considered period is January 1999 until December 2018. We construct the returns from the hourly closing prices obtained from Tickstory. We use prices at close since in this way the data is less affected by micro-structure effects. The hourly (log-exchange rate) returns r_i are then constructed by (Lee & Kim, 1995)

r_i = ln(C_i) − ln(C_{i−1}),

where C_i is the closing price at hour i of the US dollar versus the corresponding currency or vice versa. Each day consists of 24 hours. The intraday returns R_t are then constructed by summing up the hourly log-exchange rate returns:

R_t = Σ_{i=1}^{24} r_{t,i}   (t = 1, 2, ..., n),

where r_{t,i} is the log-exchange rate return of the ith hour of day t and n is the number of observations in the sample. Monthly returns are computed in the same manner. One month consists of approximately 21 working days. Let M_t represent the monthly returns; then

M_t = Σ_{i=1}^{21·24} r_{t,i}   (t = 1, 2, ..., n),

where t now stands for month t. We consider exchange hours on the time interval 10:00 EST–16:00 EST. For more detailed descriptions of the construction of the returns we refer to Andersen, Bollerslev, et al. (1997) and Martens (2001).
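As an illustration of this construction, the following is a minimal Python sketch (not part of the original study) that builds hourly log returns from a series of hourly closing prices and aggregates them to daily and approximately monthly returns; the pandas layout, the variable names and the assumption of a DatetimeIndex are ours.

```python
import numpy as np
import pandas as pd

def build_returns(prices: pd.Series):
    """Construct hourly, daily and (approximately) monthly log returns.

    `prices` is assumed to be a Series of hourly closing prices C_i indexed by
    a DatetimeIndex (a hypothetical layout, not prescribed by the paper).
    """
    # r_i = ln(C_i) - ln(C_{i-1}): hourly log-exchange-rate returns
    hourly = np.log(prices).diff().dropna()

    # R_t: sum of the hourly returns within calendar day t
    daily = hourly.resample("D").sum()

    # M_t: block sums over roughly 21 working days * 24 hours per 'month'
    monthly = hourly.groupby(np.arange(len(hourly)) // (21 * 24)).sum()

    return hourly, daily, monthly
```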

We also examine the period of the global financial crisis. When using monthly returns for this period, we do not obtain a sufficient number of observations. Therefore, we omit this series for the crisis. Since we would still like to examine different horizons for this period, we examine the hourly (log-exchange rate) returns and consider 1-hour-ahead and 1-day-ahead forecasts. The crisis period spans January 2003 - December 2008.


The realized variance, our proxy for the conditional variance at each point in time (hour, day or month), is given by r_i² for the hourly returns. For the daily returns it is defined as

σ²_{realized,day_t} = Σ_{i=1}^{7} r²_{t,i},

since there are only seven exchange hours in a day. For the monthly returns, the realized variance is defined as

σ²_{realized,month_t} = Σ_{i=1}^{21·7} r²_{t,i}.

Diebold et al. (1999) state that h-step-ahead density forecasting is equivalent to one-step-ahead forecasting of h-period returns. We evaluate three horizons in our empirical analysis: 1-hour-ahead, 1-day-ahead and 1-month-ahead forecasts. Therefore, we forecast one step ahead for each of the return series. We divide the data into two periods: an estimation period and a forecasting period. For the regular period, consisting of the daily and monthly returns, the estimation period is January 1999 - December 2014 and the forecasting period is January 2015 - December 2018. For the period of the financial crisis, these are January 2003 - June 2007 and July 2007 - December 2008.
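A corresponding sketch for the realized-variance proxies, again with hypothetical variable names and assuming the hourly returns have already been restricted to the seven exchange hours:

```python
import numpy as np
import pandas as pd

def realized_variances(hourly: pd.Series):
    """Realized variance as the sum of squared intraday returns.

    `hourly` is assumed to contain only the seven exchange hours per day
    (10:00-16:00 EST), as described in the text.
    """
    squared = hourly ** 2

    # Daily realized variance: sum of the 7 squared hourly returns of day t
    rv_daily = squared.resample("D").sum()

    # Monthly realized variance: sum over roughly 21 * 7 exchange hours
    rv_monthly = squared.groupby(np.arange(len(squared)) // (21 * 7)).sum()

    return rv_daily, rv_monthly
```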

In our analysis, we also include density plots. The conditional density of r_t is denoted by f(r|F_{t−1}), where F_{t−1} is the information up to and including t − 1. The forecasted conditional mean is defined as µ_t = E(r_t|F_{t−1}), which is known as the location parameter. The forecasted conditional variance is defined as σ_t² = var(r_t|F_{t−1}), which is known as the scale parameter. The density is then characterized by ϕ_t = (µ_t, σ_t², η_t), where η_t is the forecasted shape parameter.

We use a method with rolling (moving) windows for the density forecasts, which is a simple way of incorporating actual data into the estimation of the conditional variance. Data from the estimation period t = 1, 2, ..., T is used to determine a window width of T for the estimation of the volatilities and to form the one-step-ahead forecasts starting from time T. Thus, we obtain the forecasted volatility for time T + 1. Afterwards, the window is moved one period ahead in time, the volatilities are re-estimated using data from t = 2, 3, ..., T + 1, and the one-step-ahead forecast is produced starting at T + 1, which gives the forecasted volatility of time T + 2. For the nth forecast, data is used from t = n, n + 1, ..., T + n − 1, which gives us the forecasted volatility of time T + n.
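The rolling scheme can be sketched as follows; `fit_and_forecast` is a hypothetical placeholder for whichever (G)ARCH specification is being estimated, not a function from the paper.

```python
import numpy as np

def rolling_one_step_forecasts(returns, T, fit_and_forecast):
    """Rolling-window one-step-ahead conditional-variance forecasts.

    returns          : 1-D array of returns of length T + n
    T                : width of the estimation window
    fit_and_forecast : hypothetical callable that fits a volatility model on a
                       window of T observations and returns the one-step-ahead
                       conditional-variance forecast
    """
    n = len(returns) - T
    forecasts = np.empty(n)
    for k in range(n):
        window = returns[k:k + T]                 # observations t = k+1, ..., k+T
        forecasts[k] = fit_and_forecast(window)   # forecast of sigma^2 at time T+k+1
    return forecasts
```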

3.2 Preliminary data analysis

The descriptive characteristics of the daily and monthly returns of the two exchange rate series are given in Table 1. The return series display similar statistical properties as far as the third and fourth moments go. More specifically, the return series are either negatively or positively skewed, and the large returns lead to a larger value of the kurtosis (indicating heavy tails). The table also presents the p-values of the Ljung-Box test for sample autocorrelation. This test statistic is defined by (Ljung & Box, 1978)

Q = n(n + 2) Σ_{k=1}^{l} ρ̂_k² / (n − k),

where ρ̂_k is the sample autocorrelation at lag k and l is the number of lags.
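For readers who want to reproduce the pQ(10) column below, a minimal sketch using the Ljung-Box implementation in statsmodels (an assumption on our side; the thesis does not state which software was used):

```python
import numpy as np
from statsmodels.stats.diagnostic import acorr_ljungbox

# Ljung-Box test with l = 10 lags, mirroring the pQ(10) column of Table 1.
# The return series below is simulated noise, used only to make the sketch runnable.
rng = np.random.default_rng(0)
returns = rng.standard_normal(1_000)

result = acorr_ljungbox(returns, lags=[10])
print(result)   # recent statsmodels versions return a DataFrame with lb_stat (Q) and lb_pvalue
```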


EUR/USD

          M      mean·10⁻⁴   std. dev.   min       max      skew      kurt      pQ(10)
monthly   227    19.5679     0.0149      -0.0595   0.0375   -0.7787   2.1765    0.7941
daily     4755   0.9313      0.0032      -0.0220   0.0310   0.1766    5.9608    < 0.0001

USD/JPY

monthly   226    3.9484      0.0141      -0.0497   0.0354   -0.3512   1.0064    0.3061
daily     4738   0.1986      0.0032      -0.0360   0.0208   -0.9018   11.3689   < 0.0001

Table 1: Summary statistics of the monthly and daily returns for the EUR/USD and USD/JPY exchange rate data. M denotes the sample size. The p-values of the Ljung-Box test statistics under ten lags are given by pQ(10). The considered time period is 01/01/1999-31/12/2018.

The descriptive graphs (density of the empirical versus the estimated Normal Inverse Gaussian (NIG) and normal distribution, and the corresponding QQ-plots) for the monthly and the daily returns of the EUR/USD and USD/JPY series are presented in Figures 1 and 2, respectively. The density graphs and the QQ-plots show that the daily returns exhibit asymmetric fat tails. The monthly returns do not exhibit leptokurtosis. The NIG distribution clearly fits the data well.

Forsberg and Bollerslev (2002) show that GARCH models with NIG errors are able to forecast very accurately. They use Euro/US dollar exchange rate data for their out-of-sample forecasts, using data from January 1999 until December 2001.

The NIG distribution with parameters α, β, µ, and δ is denoted as NIG(α, β, µ, δ). Its density function is defined by

g(x; α, β, µ, δ) = a(α, β, µ, δ) q((x − µ)/δ)⁻¹ K₁{δα q((x − µ)/δ)} e^{βx},

where

a(α, β, µ, δ) = (1/π) α e^{δ√(α² − β²) − βµ} and q(x) = √(1 + x²),

with µ ∈ ℝ, 0 ≤ |β| < α, and δ ∈ ℝ₊. µ is a location parameter, β is an asymmetry parameter, α ± β determines the heaviness of the tails (see Barndorff-Nielsen and Prause (2001) for more information), and δ is a scaling parameter. Furthermore, K₁ represents the modified Bessel function of the third kind with index one. We refer to Abramowitz and Stegun (1965) for the properties of the Bessel function. For further information about the NIG distribution we refer the reader to Barndorff-Nielsen (1997).
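A minimal sketch of fitting an NIG density to a return series, assuming SciPy's norminvgauss implementation; note that SciPy uses an (a, b, loc, scale) parametrization that corresponds to, but is not written in, the (α, β, µ, δ) notation above, and the simulated returns are purely illustrative.

```python
import numpy as np
from scipy import stats

# Fit an NIG and a normal density to a return series, as in Figures 1 and 2.
# The 'returns' below are simulated and only make the sketch runnable.
rng = np.random.default_rng(1)
returns = rng.standard_t(df=5, size=2_000) * 0.003

nig_params = stats.norminvgauss.fit(returns)     # (a, b, loc, scale) in SciPy's parametrization
norm_params = stats.norm.fit(returns)

grid = np.linspace(returns.min(), returns.max(), 200)
nig_pdf = stats.norminvgauss.pdf(grid, *nig_params)
norm_pdf = stats.norm.pdf(grid, *norm_params)    # compare against the empirical density / QQ-plot
```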

3.3 A country of (G)ARCH-type models

This section presents the considered (G)ARCH-type models for this study. We fit univariate ARMA-GARCH models to each of the return series with innovations assumed to come from an NIG distribution. Let (Z_t)_{t∈ℤ} be a sequence of independent and identically distributed (i.i.d.) random variables with mean zero and unit variance. The process (X_t)_{t∈ℤ} is an ARMA(p₁, q₁)-GARCH(p₂, q₂) process if the following holds:

X_t = φ₁X_{t−1} + ... + φ_{p₁}X_{t−p₁} + ε_t + θ₁ε_{t−1} + ... + θ_{q₁}ε_{t−q₁},

ε_t = σ_t Z_t,

σ_t² = ω + Σ_{j=1}^{q₂} α_j ε²_{t−j} + Σ_{j=1}^{p₂} β_j σ²_{t−j},

where Z_t is independent of (X_s)_{s≤t}. The polynomials 1 − φ₁z − ... − φ_{p₁}z^{p₁} and 1 + θ₁z + ... + θ_{q₁}z^{q₁} have no common roots and no roots inside the unit circle (Brockwell, Davis, & Fienberg, 1991). The conditional variance σ_t², or some form of the conditional standard deviation σ_t, is specified for all the considered models and displayed in Table 2. For convenience, we set p = p₂ and q = q₂ when specifying some form of σ_t for the models below. We fit the ARMA-GARCH models by maximum likelihood to each of the return series, where we assume a certain distribution for the innovations Z_t. The ARMA orders under each frequency (i.e. for each considered return series: monthly, daily, and hourly returns) are obtained by selecting the model with the lowest AIC (Akaike information criterion) value among many ARMA orders. The AIC is defined as (Akaike, 1970) AIC = −2LL + 2m, where LL stands for the log-likelihood and m for the number of parameters of the corresponding model. Table 7 in the Appendix shows the fitted orders under each frequency. Each considered model is described in more detail below, including its main contribution to the (G)ARCH universe.
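The AIC-based order selection can be sketched as follows, assuming the ARIMA implementation in statsmodels; the grid of orders and the simulated series are illustrative only.

```python
import itertools
import warnings
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

# Select the ARMA(p1, q1) order for the conditional mean by minimizing
# AIC = -2*LL + 2*m over a small grid of orders; the series is simulated.
rng = np.random.default_rng(2)
returns = rng.standard_normal(500) * 0.003

best = None
for p1, q1 in itertools.product(range(3), range(3)):
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        aic = ARIMA(returns, order=(p1, 0, q1)).fit().aic
    if best is None or aic < best[0]:
        best = (aic, p1, q1)

print("lowest AIC (aic, p1, q1):", best)
```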

ARCH model

Consider the process {Z_t}, where {Z_t} is i.i.d. and a Strict White Noise (SWN) process. The process {ε_t}_{t∈ℤ} is called an Autoregressive Conditional Heteroskedasticity (ARCH(q)) process (Engle, 1982) if it can be written as

ε_t = σ_t Z_t,

σ_t² = ω + Σ_{j=1}^{q} α_j ε²_{t−j},

where ω > 0, α_j ≥ 0 for j = 1, 2, ..., q − 1, and α_q > 0. ω represents the intercept, ε²_t stands for the squared residuals from the conditional mean specification, α_j represents the coefficient of the jth lag, and σ_t² represents the conditional variance. ARCH allows the conditional variance to change over time as a function of past residuals. The unconditional variance is constant, so the past can give information about the one-period forecast variance. We follow in the footsteps of Hansen and Lunde (2005) by including this model for the following reasons: (i) as a contrast to the other models (much like a control group), since the model cannot fully capture the persistence in volatility, and (ii) to verify that the MCS and BMA weights have power. If ARCH is not rejected as the best model, then the MCS and BMA weights cannot be very powerful in informing us which model is the better one. The models described below share the same properties and relations for the processes {Z_t} and {ε_t}_{t∈ℤ}; therefore, they will not be repeated for each model, and only the conditional variance (or some form of it, e.g. the standard deviation σ_t) will be mentioned. The same holds for the mentioned parameters and their restrictions, unless specified otherwise.

(standard) GARCH

The generalized ARCH model of Bollerslev (1986) adds lagged conditional variances:

σ_t² = ω + Σ_{j=1}^{q} α_j ε²_{t−j} + Σ_{j=1}^{p} β_j σ²_{t−j},

where β_j ≥ 0 and β_p > 0. β_j represents the coefficient of the jth lag of σ²_{t−j}. These restrictions (including the ones defined under the ARCH model) hold for all the considered models, unless stated otherwise. The GARCH model is symmetric, i.e. past negative and positive shocks have the same impact on the current volatility. The model cannot cope with skewness; thus, it can easily under- or overpredict the amount of volatility after a shock.
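A minimal simulation of the GARCH(1,1) recursion above (with standard normal innovations for simplicity rather than the NIG errors used in the paper):

```python
import numpy as np

def simulate_garch11(n, omega, alpha, beta, seed=0):
    """Simulate eps_t = sigma_t * Z_t with
    sigma_t^2 = omega + alpha * eps_{t-1}^2 + beta * sigma_{t-1}^2,
    using standard normal innovations (the paper uses NIG innovations instead).
    """
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n)
    eps = np.zeros(n)
    sigma2 = np.zeros(n)
    sigma2[0] = omega / (1.0 - alpha - beta)      # start from the unconditional variance
    eps[0] = np.sqrt(sigma2[0]) * z[0]
    for t in range(1, n):
        sigma2[t] = omega + alpha * eps[t - 1] ** 2 + beta * sigma2[t - 1]
        eps[t] = np.sqrt(sigma2[t]) * z[t]
    return eps, sigma2

eps, sigma2 = simulate_garch11(1_000, omega=1e-6, alpha=0.05, beta=0.90)
```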

IGARCH

The integrated GARCH model of Engle and Bollerslev (1986) assumes the persistence parameter P̂ = Σ_{j=1}^{q} α_j + Σ_{j=1}^{p} β_j = 1. Hansen and Lunde (2005) express it as

σ_t² = ω + ε²_{t−1} + Σ_{j=2}^{q} α_j(ε²_{t−j} − ε²_{t−1}) + Σ_{j=1}^{p} β_j(σ²_{t−j} − ε²_{t−1}).

Because of unit persistence, other results cannot be calculated, e.g. unconditional variance. Under the IGARCH model, shocks to volatility are permanent.

CGARCH

The conditional variance can be decomposed into a permanent and a transitory component, which allows examining the long- and short-run movements of volatility. This model captures the property of high persistence in volatility and was introduced by Engle et al. (1999). It can be written as

σ_t² = q_t + Σ_{j=1}^{q} α_j(ε²_{t−j} − q_{t−j}) + Σ_{j=1}^{p} β_j(σ²_{t−j} − q_{t−j}),

q_t = ω + ρq_{t−1} + φ(ε²_{t−1} − σ²_{t−1}),

where ρ ∈ (0, 1) and φ > 0. q_t denotes the permanent component of the conditional variance. The transitory component of the conditional variance is denoted by σ²_{t−j} − q_{t−j}, which is the difference between the conditional variance and its trend; this component is the more volatile of the two. The CGARCH model is an effective tool for collecting the short- and long-term persistence effects, but it is not an effective tool for determining the presence of asymmetric effects, because asymmetry is present in the short term as well as the long term (de Jesús Gutiérrez, Calisto, & Salgado, 2017). Asymmetry is defined as equal magnitudes of positive and negative shocks having different effects on the conditional volatility (McAleer, 2014).

EGARCH

Standard GARCH models have a few drawbacks: they do not consider asymmetry and leverage effects. Furthermore, dynamics of the conditional variance may not be captured well enough by estimations due to the parameter restrictions which are imposed by GARCH models. The exponential GARCH (EGARCH) model introduced by Nelson (1991) deals with these drawbacks. It is represented by

ln(σ_t²) = ω + Σ_{j=1}^{q} (α_j ε_{t−j} + γ_j(|ε_{t−j}| − E|ε_{t−j}|)) + Σ_{j=1}^{p} β_j ln(σ²_{t−j}),

where ln stands for the natural logarithm. The coefficient γ_j captures the size effect of ε_t and the coefficient α_j captures the sign effect of ε_t. The model does not impose non-negativity constraints on its parameters, since the natural logarithm ensures that σ_t² remains non-negative. The model allows for an asymmetric volatility response to shocks.

GJR-GARCH

Glosten et al. (1993) propose the GJR-GARCH model, which treats positive and negative shocks to the conditional variance asymmetrically by using the indicator function I_{ε_{t−j}≤0}, which is 1 if ε_{t−j} ≤ 0 and 0 otherwise. The model is represented by

σ_t² = ω + Σ_{j=1}^{q} (α_j ε²_{t−j} + γ_j I_{ε_{t−j}≤0} ε²_{t−j}) + Σ_{j=1}^{p} β_j σ²_{t−j},

with γ_j > 0 the leverage term. Leverage is defined as the negative correlation between the return and subsequent shocks to volatility (McAleer, 2014). For γ_j > 0 asymmetry exists. It is essential to include an asymmetric term in time series models, as shown by many studies, among others Engle and Ng (1993) and French, Schwert, and Stambaugh (1987), since it takes into consideration the difference in effects on predictable volatility caused by an unexpected drop or increase in the price. The GJR-GARCH model can easily test whether there are leverage effects.

APARCH

The asymmetric power ARCH (APARCH) was introduced by Ding et al. (1993). It is defined by

σ_t^δ = ω + Σ_{j=1}^{q} α_j(|ε_{t−j}| − γ_j ε_{t−j})^δ + Σ_{j=1}^{p} β_j σ_{t−j}^δ,

where δ ≥ 0 and γ_j ∈ (−1, 1) for all j. δ denotes the Box-Cox power transformation (Box & Cox, 1964) of the asymmetric absolute residuals and the conditional standard deviation, whereby the nonlinear models can be linearized. The model is able to capture asymmetric effects on the conditional standard deviation, with γ_j the leverage term. The APARCH model encompasses the ARCH, GARCH, Taylor/Schwert's GARCH, GJR-GARCH, TARCH, NARCH, and log-ARCH models by imposing the appropriate restrictions on δ, γ_j, and β_j.

(F)FGARCH

The full family GARCH (FFGARCH) model is proposed by Hentschel et al. (1995) and is a submodel of the family GARCH (FGARCH) model (also proposed by Hentschel et al. (1995)). The latter is denoted by

σ_t^λ = ω + Σ_{j=1}^{q} α_j σ_{t−j}^λ (|z_{t−j} − ν_{2j}| − ν_{1j}(z_{t−j} − ν_{2j}))^δ + Σ_{j=1}^{p} β_j σ_{t−j}^λ,

where z_t represents the standardized residuals, λ > 0 determines the shape of the conditional standard deviation, and δ > 0 transforms the absolute value function by rotating it by means of ν₁ ≥ 0 and shifting it by means of ν₂ ≥ 0. Rotation is the main source of asymmetry for large shocks, while the shift is the main source of asymmetry for small shocks.


From this expression a family of symmetric and asymmetric GARCH models can be derived by applying a Box-Cox transformation to the conditional standard deviation. In fact, all of the following models will be derived from this expression, starting with the FFGARCH defined by

σ_t^δ = ω + Σ_{j=1}^{q} α_j σ_{t−j}^δ (|z_{t−j} − ν_{2j}| − ν_{1j}(z_{t−j} − ν_{2j}))^δ + Σ_{j=1}^{p} β_j σ_{t−j}^δ.

For this model the restriction λ = δ is imposed on the FGARCH model, i.e. the parameter that determines the transformation of the absolute value function is the same as the parameter that shapes the conditional standard deviation. The model is able to capture vital attributes of volatility, among others volatility clustering (periods of large fluctuations in volatility) and the leverage effect. For the TGARCH, the imposed restrictions are λ = δ = 1, |ν_{1j}| ≤ 1 and ν_{2j} = 0. For the NGARCH, we have λ = δ and ν_{1j} = ν_{2j} = 0. For the NAGARCH, it is λ = δ = 2 and ν_{1j} = 0. Finally, for the AVGARCH, the imposed restrictions are λ = δ = 1 and |ν_{1j}| ≤ 1.

TGARCH

The threshold GARCH (TGARCH) is introduced by Zakoian (1994) and expressed as

σ_t = ω + Σ_{j=1}^{q} α_j σ_{t−j}(|z_{t−j}| − ν_{1j} z_{t−j}) + Σ_{j=1}^{p} β_j σ_{t−j}, |ν_{1j}| ≤ 1.

This specification allows positive and negative shocks to have different (asymmetric) impacts on the volatility; the impacts need not be the same at all lags, which the model accommodates well. The TGARCH model is fat-tailed and deals with asymmetric information, capturing the leverage effect.

NGARCH

The nonlinear GARCH (NGARCH) model of Higgins and Bera (1992) proposes a nonlinear functional form for the conditional variance, which is useful in modeling some financial time series data such as exchange rates or stock market data. Consistent with the restrictions above (λ = δ, ν_{1j} = ν_{2j} = 0), it can be expressed as

σ_t^δ = ω + Σ_{j=1}^{q} α_j σ_{t−j}^δ |z_{t−j}|^δ + Σ_{j=1}^{p} β_j σ_{t−j}^δ.

This model is also referred to as power GARCH, and it can easily be reduced to the standard GARCH model. NGARCH is applied to smooth volatility clustering in modeling and to resolve the fat-tail effect in complex time series (Gouriéroux, 2012). The NGARCH captures volatility asymmetry of the following form: positive shocks cause more volatility than negative shocks of the same size.

NAGARCH

The nonlinear asymmetric GARCH (NAGARCH) is obtained from the family GARCH by imposing λ = δ = 2 and ν_{1j} = 0, which gives

σ_t² = ω + Σ_{j=1}^{q} α_j σ²_{t−j} |z_{t−j} − ν_{2j}|² + Σ_{j=1}^{p} β_j σ²_{t−j}.

The NAGARCH captures asymmetry in the opposite way to the NGARCH, i.e. positive shocks introduce less volatility than negative shocks.

AVGARCH

Taylor (1986) and Schwert (1990) have introduced the Absolute Value GARCH (AVGARCH).

σ_t = ω + Σ_{j=1}^{q} α_j σ_{t−j}(|z_{t−j} − ν_{2j}| − ν_{1j}(z_{t−j} − ν_{2j})) + Σ_{j=1}^{p} β_j σ_{t−j}, |ν_{1j}| ≤ 1.

This model concentrates on the Taylor property. Haas (2009) defines the Taylor property as "the time series dependencies of financial volatility as measured by the autocorrelation function of power-transformed absolute returns which are stronger for absolute stock returns than for the squares." Haas (2009) argues that for an AVGARCH(1,1) the Taylor property exists (but is small) for extremely large values of the unconditional kurtosis.

Model: Specification for the conditional variance

Symmetric
ARCH:       σ_t² = ω + Σ_{j=1}^{q} α_j ε²_{t−j}
GARCH:      σ_t² = ω + Σ_{j=1}^{q} α_j ε²_{t−j} + Σ_{j=1}^{p} β_j σ²_{t−j}
IGARCH:     σ_t² = ω + ε²_{t−1} + Σ_{j=2}^{q} α_j(ε²_{t−j} − ε²_{t−1}) + Σ_{j=1}^{p} β_j(σ²_{t−j} − ε²_{t−1})
CGARCH:     σ_t² = q_t + Σ_{j=1}^{q} α_j(ε²_{t−j} − q_{t−j}) + Σ_{j=1}^{p} β_j(σ²_{t−j} − q_{t−j}), where q_t = ω + ρq_{t−1} + φ(ε²_{t−1} − σ²_{t−1})

Asymmetric & Leverage
EGARCH:     ln(σ_t²) = ω + Σ_{j=1}^{q} (α_j ε_{t−j} + γ_j(|ε_{t−j}| − E|ε_{t−j}|)) + Σ_{j=1}^{p} β_j ln(σ²_{t−j})
GJR-GARCH:  σ_t² = ω + Σ_{j=1}^{q} (α_j ε²_{t−j} + γ_j I_{ε_{t−j}≤0} ε²_{t−j}) + Σ_{j=1}^{p} β_j σ²_{t−j}
APARCH:     σ_t^δ = ω + Σ_{j=1}^{q} α_j(|ε_{t−j}| − γ_j ε_{t−j})^δ + Σ_{j=1}^{p} β_j σ_{t−j}^δ
FFGARCH:    σ_t^δ = ω + Σ_{j=1}^{q} α_j σ_{t−j}^δ (|z_{t−j} − ν_{2j}| − ν_{1j}(z_{t−j} − ν_{2j}))^δ + Σ_{j=1}^{p} β_j σ_{t−j}^δ
TGARCH:     σ_t = ω + Σ_{j=1}^{q} α_j σ_{t−j}(|z_{t−j}| − ν_{1j} z_{t−j}) + Σ_{j=1}^{p} β_j σ_{t−j}, |ν_{1j}| ≤ 1

Asymmetric
NGARCH:     σ_t^δ = ω + Σ_{j=1}^{q} α_j σ_{t−j}^δ |z_{t−j}|^δ + Σ_{j=1}^{p} β_j σ_{t−j}^δ
NAGARCH:    σ_t² = ω + Σ_{j=1}^{q} α_j σ²_{t−j} |z_{t−j} − ν_{2j}|² + Σ_{j=1}^{p} β_j σ²_{t−j}
AVGARCH:    σ_t = ω + Σ_{j=1}^{q} α_j σ_{t−j}(|z_{t−j} − ν_{2j}| − ν_{1j}(z_{t−j} − ν_{2j})) + Σ_{j=1}^{p} β_j σ_{t−j}, |ν_{1j}| ≤ 1

Table 2: This table presents the specifications for the conditional variance σ_t². The (G)ARCH-type models are divided into three classes: Symmetric, Asymmetric & Leverage, and Asymmetric models. The full descriptions, including the descriptions of the symbols, can be found in the section on (G)ARCH-type models.


We consider the GARCH(1,1) as our benchmark model since it is the simplest model that is assumed to give the best out-of-sample forecasts. ARCH(1) is also a benchmark model, but it is assumed to give the worst out-of-sample forecasts since it cannot fully capture the persistence in volatility.

4 Out-of-sample techniques

4.1 Loss functions

We evaluate three different loss functions in our analysis. We consider the mean squared error (MSE) and the mean absolute error (MAE) as defined below

MSE = (1/n) Σ_{t=1}^{n} (σ_t² − σ̂_t²)²,   MAE = (1/n) Σ_{t=1}^{n} |σ_t² − σ̂_t²|,

where σ_t² stands for the realized variance and σ̂_t² for the forecasted conditional variance, for all t. MAE is more robust to outliers than MSE. For our third loss function, we consider the Value-at-Risk-based (VaR) loss function proposed by González-Rivera, Lee, and Mishra (2004). We choose this loss function for several reasons: (1) its construction differs from the other loss functions; it does not take a form that depends only on the realized variance and the forecasted conditional variance, and it is able to provide a certain quantile for the distribution of the exchange rate returns (Sadorsky, 2006); (2) this loss function focuses on the tails of the density, and heavy tails are much examined in financial data; and (3) this loss function is applied less often in empirical analyses than the other two loss functions. Hansen and Lunde (2005) did not include this loss function in their study, and we want to examine its performance compared to the other two loss functions. Since the estimation of volatility is of paramount importance, we also include this loss function.

Let VaR^α_{t+1} denote the conditional Value-at-Risk. We define it as the conditional quantile

P(r_{t+1} ≤ VaR^α_{t+1} | F_t) = α,

where r_{t+1} represents the exchange rate return and α is the quantile level. Noh and Lee (2016) state that the classical mean-variance ARMA-GARCH models have the form of the general conditional location-scale model. Furthermore, Elliott and Timmermann (2013) also confirm that GARCH-type models are location-scale models. Since the density of r is of the location-scale form, we can estimate the VaR from

VaR^α_{t+1} = µ_{t+1}(θ̂_t) + Φ⁻¹_{t+1}(α) σ_{t+1}(θ̂_t),

where θ̂_t denotes the parameter estimates at time t and Φ⁻¹_{t+1}(α) is the α-quantile of the innovation distribution. The VaR-based loss function of González-Rivera et al. (2004) is then given by

V = (1/n) Σ_{t=1}^{n} (α − d^α_{t+1})(r_{t+1} − VaR^α_{t+1}),

where α = 0.05, and d^α_{t+1} = I(r_{t+1} < VaR^α_{t+1}) is the indicator function that is 1 if r_{t+1} < VaR^α_{t+1} and 0 otherwise. The VaR-based loss function is an asymmetric loss function: with a weight of (1 − α), it penalizes the observations for which r − VaR^α < 0. Small values of V indicate a better goodness of fit.
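The three loss functions can be sketched as follows; the innovation quantile entering the VaR forecast is passed in as an argument, since its value depends on the fitted innovation distribution.

```python
import numpy as np

def mse(realized_var, forecast_var):
    # MSE = mean of (sigma_t^2 - sigma_hat_t^2)^2
    return np.mean((realized_var - forecast_var) ** 2)

def mae(realized_var, forecast_var):
    # MAE = mean of |sigma_t^2 - sigma_hat_t^2|
    return np.mean(np.abs(realized_var - forecast_var))

def var_loss(returns, mean_forecast, var_forecast, innovation_quantile, alpha=0.05):
    """VaR-based loss in the spirit of Gonzalez-Rivera et al. (2004).

    innovation_quantile : alpha-quantile of the standardized innovation
                          distribution (e.g. taken from the fitted NIG);
                          passed in to keep the sketch model-agnostic.
    """
    var_fc = mean_forecast + innovation_quantile * np.sqrt(var_forecast)  # VaR^alpha_{t+1}
    hit = (returns < var_fc).astype(float)                                # d^alpha_{t+1}
    return np.mean((alpha - hit) * (returns - var_fc))
```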

4.2 Ranking measures

We consider three ranking measures for determining the best out-of-sample forecasting models: the plain method, the Model Confidence Set (MCS) and Bayesian Model Averaging (BMA) weights. Many scholars report the plain method in their analyses. The Bayesian Model Averaging approach yields the BMA weights. Perhaps the least known method among the three is the MCS, which was designed by Hansen et al. (2011) specifically to answer the question: "Which is the 'best' forecasting model?" The Bayesian approach is often used for model selection and model uncertainty. Often these approaches require Monte Carlo computations, since some quantities are (almost) impossible to compute analytically. We propose an easier route for the Bayesian approach by using BMA weights as a ranking measure. Their obvious advantage is that they are easy to compute analytically and that they can serve as a ranking measure for any loss function. The ranking measures can thus be divided into a frequentist and a Bayesian approach.

4.2.1 Plain method

The plain method simply ranks the forecasting models by the values obtained from the corresponding loss function. The model with the lowest loss value is considered the 'best' model; the 'worst' model is the forecasting model with the highest value of the corresponding loss function. The plain method is included as a contrast to the other ranking measures. The question that arises is whether its simplicity is a boon or a bane.

4.2.2 Model Confidence Set

The goal of the Model Confidence Set, developed by Hansen et al. (2011), is to determine the superior set of model(s) M*, from a collection of models M. The MCS contains the best forecasting model(s) with probability 100(1 − α)%, with α denoting the significance level. The advantages of the MCS are: (i) it acknowledges the limitations of the data: informative data yield an MCS containing few models, and less informative data yield an MCS containing many models; (ii) it is possible to make statements about significance in the traditional sense; (iii) more than one model can be the 'best' model. In fact, the procedure does not assume that there is one particular true model: all models are treated equally and are compared solely on their out-of-sample predictive ability. A disadvantage is that a model remains in the MCS unless it is proven inferior, which implies that the models in the MCS may not all be good models.

We consider a set of models M containing the forecasting models, indexed by i = 1, ..., m. The objects are evaluated over the sample t = 1, ..., n in terms of a loss function, and we denote the loss associated with model i in period t by L_{i,t}. The loss differential between model i and model j is defined by d_{ij,t} = L_{i,t} − L_{j,t} for all i, j ∈ M. The assumption is that E|d_{ij,t}| < ∞ and that it is not time dependent, for all i, j ∈ M.


The set M contains all models that are considered: M = {1, 2, ..., m}. The superior set of models is defined as M* = {i ∈ M : E(d_{ij,t}) ≤ 0 for all j ∈ M}.

MCS determines M* through a sequence of significance tests, where objects are eliminated if they are found to be significantly inferior to other elements of M. Let d̄_{i,t} = (1/(m−1)) Σ_{j∈M} d_{ij,t} for i = 1, 2, ..., m. The following hypotheses are tested:

H_{0,M⁻}: E(d̄_i) = 0 for all i ∈ M⁻,
H_{A,M⁻}: E(d̄_i) ≠ 0 for some i ∈ M⁻,

where M⁻ ⊂ M stands for a set of candidate models. The null hypothesis H_{0,M⁻} is a test for equal predictive ability (EPA) over the models in M⁻. If the null hypothesis is rejected, then the worst performing model is eliminated from M⁻. This process continues until H_{0,M⁻} is not rejected for the first time. The set of surviving models is called the Model Confidence Set M̂*_{1−α}, with significance level α. The procedure is based on the equivalence test δ_{M⁻} and an elimination rule e_{M⁻}. The equivalence test is used to test the null hypothesis for any M⁻ ⊂ M: if δ_{M⁻} = 1, H_{0,M⁻} is rejected; if δ_{M⁻} = 0, it is not. When H_{0,M⁻} is rejected, e_{M⁻} identifies the object to remove from M⁻. To test the null against the alternative hypothesis H_{A,M⁻}, the following test statistic is constructed:

t_i = d̄_i / √(var̂(d̄_i)) for i ∈ M⁻,   (1)

where d̄_i = (1/(m−1)) Σ_{j∈M⁻} d̄_{ij} for i ∈ M⁻, with m the number of models in M⁻, d̄_{ij} = (1/M) Σ_{t=1}^{M} d_{ij,t} with M denoting the sample size, and var̂(d̄_i) a bootstrapped estimate of var(d̄_i). d̄_i stands for the performance of model i relative to the average of the models in the set M⁻, and the relative performance between models i and j is measured by d̄_{ij}. Let ν_{i,M⁻} = t_i denote the measure used to rank the models; the best performing model is defined by arg min_{i∈M⁻} ν_{i,M⁻}. The EPA test employs the following test statistic:

T_{max,M⁻} = max_{i∈M⁻} t_i.

It can be used to test the null against the alternative hypothesis. The asymptotic distribution of this test statistic is not standard; therefore, a bootstrap procedure is used to estimate the relevant distribution under the null hypothesis. This procedure is similar to the one used to estimate var̂(d̄_i).

The MCS is determined by sequentially trimming the set of models M⁻. At each step, the procedure eliminates the worst model until the hypothesis of EPA cannot be rejected for the remaining candidate models in M⁻; the surviving set of models then constitutes the superior set of models. For elimination, the elimination rule is used, which is coherent with the test statistic of (1) and is defined by (Bernardi & Catania, 2015)

e_{max,M⁻} = arg max_{i∈M⁻} d̄_i / √(var̂(d̄_i)).

A useful characteristic of the MCS procedure is that it yields p-values for all the candidate models. A model with a small p-value has a smaller chance of being in the superior set of models M*, since the null hypothesis is rejected for large values of the test statistic. The interpretation of the MCS p-value is that the MCS is a random subset of models that contains M* with a certain probability. The p-values are calculated using a bootstrap method; we use 10,000 bootstrap replications in our analysis. We refer the interested reader to Hansen, Lunde, and Nason (2003) for more details about the bootstrap procedure. The MCS can be seen as a generalization of the Diebold-Mariano test, which is easier to grasp; we refer the interested reader to Diebold and Mariano (1995).

Formally, the MCS procedure is as follows.

Step 1: Initialize M⁻ = M.

Step 2: Test for EPA in M⁻. (i) If the EPA test is rejected, then define

d̄_i = (1/(m−1)) Σ_{j∈M⁻} d̄_{ij},

and determine the 'worst' model in M⁻, defined by

i⁻ = arg max_{i∈M⁻} d̄_i / √(var̂(d̄_i)).

Then, remove model i⁻ from M⁻ and repeat the procedure from Step 2 until EPA is not rejected for the first time. (ii) If the EPA test is not rejected, set M̂*_{1−α} = M⁻. This is the (1 − α)-confidence set.
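As one way to compute an MCS in practice, the sketch below uses the MCS class from Kevin Sheppard's arch package (our assumption; the thesis does not name its implementation). The loss matrix and model names are hypothetical.

```python
import numpy as np
import pandas as pd
from arch.bootstrap import MCS

# T x m matrix of per-period losses, one column per volatility model.
# The losses and model names below are simulated placeholders.
rng = np.random.default_rng(3)
losses = pd.DataFrame(
    rng.random((500, 3)) + np.array([0.00, 0.02, 0.10]),   # third model is clearly worse
    columns=["GARCH11", "TGARCH22", "ARCH1"],
)

mcs = MCS(losses, size=0.10, reps=10_000)   # 10,000 bootstrap replications, as in the paper
mcs.compute()
print(mcs.pvalues)    # MCS p-values per model
print(mcs.included)   # models in the superior set at the chosen level
```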

4.2.3 Bayesian Model Averaging weights

Researchers have considered methods that combine the information in a large number of time series. Simple averaging of the forecasts of different models is found to be useful for out-of-sample forecasting and often performs better than forecasts from any one model. Another method that is found to be useful for out-of-sample forecasts is Bayesian Model Averaging. At the start, there are many possible models and prior beliefs regarding the probability that each model is the true model. Afterwards, the posterior probability that each model is the true model is computed. The forecasts from the different models are then averaged, weighted by the posterior probabilities. This is a shrinkage method over models. BMA is widely used for model selection and parameter uncertainty, and it yields the BMA weights.

We prefer using BMA weights over the full BMA method for the following reasons: (i) its simplicity. We are able to derive the weights in a simple manner. Moreover, our goal for using this method is to use it as a ranking measure. Therefore, we are not interested in the whole procedure; (ii) we can evaluate any loss function desired and derive the corresponding BMA weights from it; (iii) BMA weights are probabilities. Therefore, we can simply compare the results with the other considered ranking measures.

We use forecast combinations for the construction of the BMA weights. Forecast combinations are assumed to produce better forecasts than methods involving the 'best' individual forecasting model. Some reasons for using forecast combinations are that (1) non-stationarities may affect individual models in different ways; (2) individual models will have misspecification bias of an unknown form; and (3) different loss functions may be used for the forecasts. However, there are also drawbacks of forecast combinations, e.g. (1) estimation errors corrupt the combination weights; and (2) non-stationarities may lead to instabilities in the combination weights and may not lead to a proper set of combination weights (Timmermann, 2006).


A shrinkage method is used to protect the combination weights against the parameter estimation errors involved in estimating the weights. Let σ̂^{2,h}_{i,t+h,t} denote the h-step-ahead out-of-sample forecast of σ_t² by the ith individual forecasting model, computed at time t. We use h = 1 in our analysis; for clarity and for reasons of replication, we keep h in the formulas. The forecast combination f_{t+h,t} can be described as

f_{t+h,t} = Σ_{i=1}^{n} w_{it} σ̂^{2,h}_{i,t+h,t},

where n denotes the number of forecasting models. The weights w_{it} on the ith forecast in period t are given by

w_{it} = m⁻¹_{it} / Σ_{j=1}^{n} m⁻¹_{jt}, with m_{it} = L_i(·),

where L_i(·) denotes the loss function value obtained by the ith forecasting model. The inverse of the loss function is used to reduce the effect of parameter estimation errors.

Shrinkage methods are used to shrink the extremes towards the center: extremely high estimated coefficients in the covariance matrix contain a lot of positive error and are therefore pulled downwards, and the extremely low estimated coefficients are pulled upwards. These methods improve the quality of the set of weights by taking the sample size and the number of forecasting models into consideration. We seek shrinkage toward equal weighting, thereby not favoring any model. The BMA weights w^{BMA}_{it} have the following form:

w^{BMA}_{it} = φ ŵ^{cOLS}_{it} + (1 − φ)(1/n), with φ = max{0, 1 − n/(M − n − h − 1)},

where M denotes the sample size and ŵ^{cOLS}_{it} denotes the constrained OLS estimator of the combination weight on the ith model in the forecast combination, obtained from

f_{t+h,t} = w_{it} σ̂^{2,h}_{t+h,t} + ε_t, Σ_{i=1}^{n} w_{it} = 1, w_{it} ≥ 0 for all i,

where σ̂^{2,h}_{t+h,t} denotes the h-step-ahead out-of-sample forecasts of all models. The weights obtained from the constrained OLS are the BMA weights; they reflect the relative performance of the models. These BMA weights are probabilities, i.e. they are non-negative and sum to 1. This approach has the advantage that it is applicable under general correlation patterns, e.g. when there is strong heterogeneity among the forecasting models. Its disadvantage is that when the data sample gets large or when the number of forecasting models is small, the estimation error becomes a big problem. For unbalanced datasets, this approach is also poor, since the full covariance matrix cannot be estimated.
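A sketch of the weight construction described above: constrained OLS (non-negative weights summing to one) followed by shrinkage toward equal weights. Reading the constrained OLS step as a regression of the realized variance on the individual forecasts, and the use of SciPy's SLSQP optimizer, are our assumptions.

```python
import numpy as np
from scipy.optimize import minimize

def bma_weights(realized_var, forecasts, h=1):
    """Shrinkage ('BMA') combination weights as described above.

    realized_var : array of length M with the realized variance sigma_t^2
    forecasts    : M x n array, column i holding the forecasts of model i
    """
    M, n = forecasts.shape

    # Constrained OLS: minimize ||realized_var - forecasts @ w||^2
    # subject to w >= 0 and sum(w) = 1.
    def sse(w):
        resid = realized_var - forecasts @ w
        return resid @ resid

    w0 = np.full(n, 1.0 / n)
    res = minimize(
        sse, w0, method="SLSQP",
        bounds=[(0.0, None)] * n,
        constraints=[{"type": "eq", "fun": lambda w: w.sum() - 1.0}],
    )
    w_cols = res.x

    # Shrink toward equal weights: phi = max(0, 1 - n / (M - n - h - 1))
    phi = max(0.0, 1.0 - n / (M - n - h - 1))
    return phi * w_cols + (1.0 - phi) / n
```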

5 Results

5.1 Results for the daily and monthly returns


EUR/USD monthly returns (MAE, 8 eliminations)

Models        p_MCS    Loss·10⁻³   w_BMA
GJRGARCH21    1.0000   0.1465      0.0278
GARCH22       0.8533   0.1495      0.0264
GJRGARCH12    0.8533   0.1501      0.0273
APARCH21      0.8533   0.1510      0.0260
NGARCH11      0.8533   0.1520      0.0264
NGARCH22      0.8533   0.1526      0.0257
CGARCH22      0.8533   0.1553      0.0254
NGARCH21      0.6085   0.1546      0.0263
FFGARCH11     0.5068   0.1548      0.0273
GARCH11       0.5068   0.1554      0.0265
APARCH22      0.5068   0.1560      0.0259
NAGARCH12     0.5068   0.1561      0.0260
GJRGARCH22    0.5068   0.1577      0.0256
AVGARCH21     0.3667   0.1620      0.0263
CGARCH21      0.3667   0.1612      0.0252
GJRGARCH11    0.3667   0.1584      0.0259
APARCH12      0.2746   0.1576      0.0262
GARCH21       0.2746   0.1582      0.0260
NGARCH12      0.2746   0.1591      0.0261
EGARCH12      0.2746   0.1604      0.0268
APARCH11      0.2746   0.1593      0.0258
CGARCH11      0.2746   0.1593      0.0255
TGARCH12      0.2746   0.1624      0.0258
TGARCH11      0.2746   0.1641      0.0261
CGARCH12      0.2746   0.1605      0.0254
FFGARCH12     0.2746   0.1615      0.0261
TGARCH22      0.0890   0.1626      0.0259
AVGARCH11     0.0890   0.1667      0.0259
EGARCH22      0.0890   0.2160      0.0222
EGARCH21      0.0267   0.1646      0.0263
EGARCH11      0.0267   0.1635      0.0263
AVGARCH22     0.0267   0.1656      0.0258
IGARCH22      0.0224   0.1648      0.0241
GARCH12       0.0134   0.1643      0.0251
ARCH1         0.0134   0.1763      0.0230
TGARCH21      0.0134   0.1665      0.0249
IGARCH11      0.0134   0.1698      0.0237

USD/JPY monthly returns (MAE, 34 eliminations)

Models        p_MCS    Loss·10⁻³   w_BMA
TGARCH22      1.0000   0.1253      0.0368
TGARCH21      0.2829   0.1304      0.0352
EGARCH12      0.2829   0.1315      0.0360
APARCH22      0.0954   0.1307      0.0359
AVGARCH12     0.0954   0.1311      0.0352
TGARCH12      0.0954   0.1358      0.0330
TGARCH11      0.0954   0.1319      0.0341
APARCH12      0.0954   0.1335      0.0346
AVGARCH11     0.0954   0.1332      0.0331
GJRGARCH12    0.0954   0.1322      0.0334
NAGARCH12     0.0206   0.1358      0.0346
EGARCH22      0.0206   0.1531      0.0339
GJRGARCH11    0.0206   0.1457      0.0289
NAGARCH11     0.0206   0.1425      0.0305
GARCH11       0.0027   1.6586      0.0278
ARCH1         0.0000   1.9058      0.0253


EUR/USD daily returns (MAE, 8 eliminations)

Models        p_MCS    Loss·10⁻⁴   w_BMA
NGARCH21      1.0000   0.1039      0.0311
IGARCH11      0.7466   0.1048      0.0299
NGARCH12      0.7466   0.1053      0.0301
IGARCH21      0.7466   0.1049      0.0297
IGARCH22      0.7466   0.1050      0.0299
CGARCH11      0.7466   0.1054      0.0293
IGARCH12      0.7466   0.1052      0.0298
GARCH21       0.7466   0.1077      0.0264
GARCH12       0.7466   0.1071      0.0282
NGARCH22      0.7466   0.1066      0.0281
GJRGARCH22    0.7466   0.1065      0.0281
GARCH22       0.7466   0.1068      0.0282
NGARCH11      0.7466   0.1099      0.0256
GARCH11       0.4084   0.1071      0.0280
CGARCH22      0.4084   0.1095      0.0247
GJRGARCH11    0.4084   0.1083      0.0267
FFGARCH22     0.4084   0.1083      0.0272
AVGARCH12     0.4084   0.1098      0.0265
APARCH22      0.1757   0.1077      0.0277
TGARCH21      0.1319   0.1090      0.0268
CGARCH21      0.1319   0.1098      0.0253
NAGARCH12     0.0165   0.1106      0.0264
AVGARCH22     0.0165   0.1096      0.0266
CGARCH12      0.0165   0.1100      0.0255
FFGARCH11     0.0165   0.1120      0.0243
TGARCH22      0.0165   0.1101      0.0260
APARCH11      0.0165   0.1127      0.0241
GJRGARCH12    0.0165   0.1108      0.0260
FFGARCH12     0.0165   0.1110      0.0258
NAGARCH11     0.0165   0.1122      0.0256
GJRGARCH21    0.0165   0.1131      0.0240
AVGARCH21     0.0165   0.1120      0.0252
AVGARCH11     0.0165   0.1132      0.0241
TGARCH11      0.0165   0.1136      0.0243
APARCH12      0.0165   0.1113      0.0257
ARCH1         0.0165   0.1629      0.0150
EGARCH21      0.0165   0.1233      0.0196
TGARCH12      0.0165   0.1191      0.0217
EGARCH12      0.0165   1.5505      0.0017
EGARCH22      0.0165   4.4850      0.0007

USD/JPY daily returns (MAE, 19 eliminations)

Models        p_MCS    Loss·10⁻⁴   w_BMA
APARCH12      1.0000   0.1688      0.0347
FFGARCH11     0.6069   0.1800      0.0337
APARCH11      0.6069   0.1839      0.0323
FFGARCH12     0.6069   0.1955      0.0310
ARCH1         0.6069   0.2286      0.0259
GJRGARCH21    0.0540   0.2008      0.0305
EGARCH12      0.0540   0.2052      0.0287
CGARCH22      0.0540   0.2198      0.0262
CGARCH11      0.0540   0.2255      0.0260
CGARCH21      0.0540   0.2258      0.0260
CGARCH12      0.0540   0.2290      0.0258
AVGARCH22     0.0162   0.2302      0.0291
NGARCH21      0.0162   0.2259      0.0275
AVGARCH11     0.0109   0.2302      0.0279
GJRGARCH12    0.0109   0.2371      0.0281
GJRGARCH11    0.0109   0.2384      0.0277
NGARCH22      0.0109   0.2477      0.0250
GARCH11       0.0109   0.2316      0.0265
EGARCH22      0.0109   0.2943      0.0249
GARCH12       0.0109   0.2484      0.0259
NGARCH11      0.0109   0.2633      0.0244
AVGARCH12     0.0109   0.2514      0.0271
GARCH22       0.0109   0.2458      0.0266
NGARCH12      0.0104   0.2487      0.0266
IGARCH21      0.0104   0.2591      0.0264
GARCH21       0.0104   0.2886      0.0236


We present a selection of the results here, since the full results are too much to grasp. Tables 3 and 4 present the results for the monthly and daily returns under MAE. We do not prefer this loss function over the others; we present it because its Superior Set of Models (SSM) is usually smaller than that of MSE and larger than that of VaR. The other results (for the monthly and daily returns) are presented in the Appendix in Tables 8 and 9.

Two things are immediately obvious from Tables 3 and 4. (1) The SSM for the EUR/USD exchange rate data is larger than that of the USD/JPY exchange rate data; the USD/JPY exchange rate data is more informative. (2) Both benchmark models are included in the SSM for all of the considered scenarios except for the USD/JPY monthly returns. Under MSE (presented in Tables 8 and 9 in the Appendix), fewer eliminations take place, and under VaR there are more. This is somewhat surprising, since VaR is more sensitive to outliers (large mispredictions) than MAE and MSE, and therefore the data should have been less informative (see Hansen et al. (2003)). The more models that are eliminated, the higher the heterogeneity of the competing forecasting models. If the SSM contains a large portion of the models of the starting set M, then the competing models are statistically equivalent in their ability to forecast the future conditional variance. For instance, if we examine the EUR/USD monthly returns under MAE, our empirical findings emphasize the statistical equivalence of forecasting the conditional variance by a simple model such as the GARCH(1,1) or by more sophisticated models such as the AVGARCH(2,2). If we examine the USD/JPY daily returns under VaR, we observe that particular models are in the SSM: the CGARCH models are able to adequately describe the future conditional variance. When we look at the USD/JPY monthly returns under VaR, we observe that the symmetric and asymmetric volatility models are excluded from the SSM, suggesting that these models do not adequately describe the future conditional variance. For the EUR/USD monthly returns under VaR, on the other hand, these types of models are included, suggesting that they describe the future conditional variance adequately. The p-values of the MCS are also included in the tables; the higher the p-value, the higher the ranking by the MCS. The p-value of T_{max,M⁻} is the minimum of all p-values reported in the second column of each table. We refer to Hansen et al. (2011) for a detailed discussion of the interpretation of the MCS p-values. The ranking by the loss functions is as follows: the lower the loss, the higher the ranking. For the BMA weights (w_BMA), we have the opposite: the higher the value, the higher the ranking, since these are probabilities. While the p-values of the MCS (p_MCS) and the loss values differ across models in the SSM, the values of the BMA weights do not differ much. In fact, they do not differ much for each model under different loss functions under each frequency (the difference starts to show after four or five decimal places). This is due to the construction of the weights by means of the shrinkage forecast method. The ranking measures do not completely agree on the exact order of the 'best' performing models, although the rankings are not too different. This especially holds when comparing the rankings under the plain method and the MCS.

Figure 3 presents the plots of the realized conditional variance together with the forecasted conditional variance of the benchmark models and the 'best' performing model per considered series. The presented forecasting period is from January 2015 until December 2018 for the monthly returns. For the daily returns the presented forecasting period is from 1 December 2018 until 31 December 2018. We selected the 'best' performing model per series by examining which model obtained the highest ranking overall, i.e. the highest MCS p-value, the lowest loss value and the highest BMA weight for the corresponding series. This 'best' performing model differs for each series and is a sophisticated model (meaning not parsimonious such as the GARCH(1,1)). Moreover, these models are categorized under the


Figure 4: Graphs of model performances for the daily returns of the EUR/USD and USD/JPY exchange rate data. ARCH(1) is omitted under EUR/USD since its loss value was too low. The x-axis is the negative value of the MAE loss. The plots on the right-hand-side present separate densities for the asymmetric (A), asymmetric and leverage (A&L) and symmetric (S) models. The considered time period is 01/01/2015-31/12/2018.

of the EUR/USD exchange rate data, which belongs to the class of Asymmetric models. Although the ARCH(1) was assumed to be the worst performing model, it is able to capture the peaks of the realized variance more accurately than the other models. GARCH(1,1) is not too different from the 'best' performing model in its ability to capture these peaks for the considered scenarios. These peaks are present more frequently in the period from 2015 until the middle of 2017. For the USD/JPY series, we do not observe as many peaks as for the EUR/USD exchange rate series.


The x-axes of these figures show the negative of the average sample losses. In this way, the model with the best sample performance is represented by the right tail. All the figures can be divided into a left-hand and a right-hand side: the left-hand side shows the model density of all the models, whereas the right-hand side shows the performance densities for the different classes of models. The models are divided into three classes: symmetric models (S), asymmetric and leverage accommodating models (A&L), and asymmetric models (A). From Figure 4 we observe that for the EUR/USD, GARCH(1,1) performs quite nicely. The ARCH(1) has one of the worst sample performances; it is omitted since its value was too low. For the USD/JPY series, both benchmark models perform almost equally well, with ARCH(1) slightly outperforming GARCH(1,1). For this series, the class of Asymmetric models is inferior to the models of the other two classes in terms of sample performance. For the EUR/USD series, the class of Symmetric models takes the lead. The graphs of model performances for the monthly EUR/USD returns in Figure 7 do not exhibit any preference for any of the classes. For the EUR/USD daily returns, there is a preference toward the class of Symmetric models, as shown in Figure 8. Figure 9 clearly shows that the class of Symmetric models is inferior to the other classes for the USD/JPY monthly returns, while the class of Asymmetric and Leverage accommodating models takes the upper hand. For the USD/JPY daily returns, shown in Figure 10, the class of Asymmetric models is inferior under MAE and VaR; under MSE, there are no major differences between the various types of models.

The out-of-sample performance under each frequency, covering both exchange rate series, is evaluated in the Discussion in Section 6.

5.2 Results in the period of the financial crisis of 2007-2008

This subsection presents our findings for the period surrounding the financial crisis. The descriptive characteristics of the hourly and daily returns of the two exchange rate series are given in Table 5. The averages of all the returns are almost negligible in comparison to their standard deviations. The return series are either slightly negatively or positively skewed, and the hourly returns exhibit larger kurtosis values than the daily returns. The p-values of the Ljung-Box test indicate high autocorrelation in the variance for all of the considered scenarios.

EUR/USD (crisis)
         M      mean·10^-5   std. dev.   min       max      skew      kurt      pQ(10)
hourly   8857   0.9789       0.0014      -0.0128   0.0125   -0.0597   11.2634   <0.0001
daily    1476   6.1466       0.0034      -0.0220   0.0262   -0.0258   6.1636    <0.0001

USD/JPY (crisis)
hourly   8829   1.5164       0.0014      -0.0170   0.0141   0.0675    12.4972   <0.0001
daily    1472   8.9514       0.0033      -0.0209   0.0184   -0.1260   4.6313    <0.0001

Table 5: Summary statistics of the hourly and daily returns for the EUR/USD and USD/JPY exchange rate data in the period of the financial crisis of 2007-2008. M denotes the sample size. The p-values of the Ljung-Box test statistic under ten lags are given by pQ(10). The considered time period is 01/01/2003-31/12/2008.
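The entries of Table 5 can be computed as in the sketch below. It assumes the returns are held in a pandas Series and that the Ljung-Box test is applied to the squared returns, so that a small p-value signals autocorrelation in the variance; whether the kurtosis column is raw or excess kurtosis is a convention choice, and the snippet reports raw kurtosis.

import pandas as pd
from scipy.stats import skew, kurtosis
from statsmodels.stats.diagnostic import acorr_ljungbox

def summary_stats(returns, lags=10):
    """Descriptive statistics in the style of Table 5 for one return series.

    returns : pandas Series of (hourly or daily) log returns.
    """
    # Ljung-Box on squared returns: small p-value -> autocorrelation in the variance
    lb = acorr_ljungbox(returns.pow(2).dropna(), lags=[lags], return_df=True)
    return {
        'M':          returns.size,
        'mean·1e-5':  returns.mean() * 1e5,
        'std. dev.':  returns.std(),
        'min':        returns.min(),
        'max':        returns.max(),
        'skew':       skew(returns, nan_policy='omit'),
        'kurt':       kurtosis(returns, fisher=False, nan_policy='omit'),  # raw kurtosis
        'pQ(10)':     float(lb['lb_pvalue'].iloc[0]),
    }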


the data very well. The ARMA orders for the conditional variance are given in Table 7 in the Appendix.

Table 6 presents the results for the daily returns under MAE for both exchange rate series. Under this frequency, the SSM is larger for the USD/JPY exchange rate data than for the EUR/USD exchange rate data. In fact, this also holds for the other considered loss functions, which are provided in Table 11 in the Appendix. In this period, the EUR/USD exchange rate data is more informative. Table 10 in the Appendix presents the results of the hourly returns. Again, the EUR/USD exchange rate data is more informative under MSE and MAE, but not under the VaR loss function. ARCH(1) outperforms GARCH(1,1) for the EUR/USD series under all scenarios (both hourly and daily returns), with the exception of the daily returns under MSE. For the hourly EUR/USD returns, ARCH(1) is considered the best performing model. For the USD/JPY series, ARCH(1) outperforms GARCH(1,1) only for the hourly returns under MSE. From Table 6, we observe that the MCS and the loss function (MAE) more or less agree on how they would rank the models of the SSM from top to bottom. The BMA weights do not agree as often. This is especially true for the USD/JPY series. For instance, for the USD/JPY daily returns, the NGARCH(2,2) and AVGARCH(2,1) have the highest ranking under the Bayesian approach. NGARCH(2,2) also scores high under the other two ranking measures, but AVGARCH(2,1) scores much lower. From this we conclude that our Bayesian ranking measure is less sensitive to the values of the loss function than our other two measures.
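As a reference for how the plain loss ranking is obtained, the sketch below computes the average MSE and MAE of each model's conditional-variance forecasts against the realized-variance proxy and ranks the models accordingly. The VaR-based loss is omitted here, and the variable names are illustrative rather than taken from our implementation.

import pandas as pd

def rank_by_loss(forecasts, realized):
    """Rank volatility models by average out-of-sample loss.

    forecasts : DataFrame with one column of conditional-variance forecasts per model.
    realized  : Series of realized variance (the volatility proxy), same index.
    Returns a DataFrame with the average MSE and MAE per model and the
    resulting ranks (rank 1 = lowest loss = best).
    """
    err = forecasts.sub(realized, axis=0)       # forecast error per model and period
    table = pd.DataFrame({
        'MSE': err.pow(2).mean(),
        'MAE': err.abs().mean(),
    })
    table['rank_MSE'] = table['MSE'].rank()
    table['rank_MAE'] = table['MAE'].rank()
    return table.sort_values('MAE')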

When we divide the SSM into the three categories of volatility models as presented in Table 2, we have the following. For the EUR/USD daily returns, presented in Table 6, 30% belongs to the class of symmetric models (S), 45% to the class of asymmetric and leverage accommodating models (A&L) and 25% to the class of asymmetric models (A). For the USD/JPY series, these percentages are 40%, 36%, and 24%, respectively. Of course, “a picture is worth a thousand words”. Figures 12, 13, 14, and 15 in the Appendix present graphs of model performances for all of the considered scenarios. For the hourly returns of the EUR/USD, we do not observe a distinct class to be superior over the other classes. We acknowledge that under MSE and MAE the asymmetric (A) class is superior, but it is inferior under VaR. Of course, this class comprised only 20% of the models under VaR (see Table 10). For the USD/JPY series, we have that under hourly returns the asymmetric and leverage accommodating class (A&L) has the upper hand, while the opposite holds for the symmetric class (S). For the EUR/USD daily return series (during the financial crisis), we do not have a distinct superior class. The only clear distinction that can be made is under MAE, where the A&L class takes the lead and the S class is inferior. For the USD/JPY daily return series (during the financial crisis), the A&L class takes the lead under MSE; the A class takes the lead under VaR, while the A&L class is inferior under VaR. Overall, during times of turmoil, the A&L class seems to have the upper hand and the S class seems to be inferior to the other classes for the USD/JPY exchange rate data. For the EUR/USD exchange rate data, no firm statements can be made.


model so that we would not only observe the performance of the benchmark models. There are more peaks present for the USD/JPY exchange rate data than for the EUR/USD series. For the USD/JPY exchange rate data, the two best performing models belong to the A class, even though this class was not the superior class in the density performance graphs. For the EUR/USD exchange rate data, the two best performing models belong to different classes: for the hourly returns this is the S class and for the daily returns this is the A&L class. The distinction of a superior class was also more pronounced in the density performance graphs for the USD/JPY exchange rate data.

When we compare our daily returns under regular times (out-of-sample forecasts of 2015-2018) to those of the period of turmoil (out-of-sample forecasts of 2007-2008), we have smaller sizes of the SSM for the USD/JPY exchange rate data and larger sizes for the EUR/USD exchange rate data. When we compare these periods under MSE, the GARCH(1,1) outperforms ARCH(1) for both exchange rate series in both periods. Under MAE and VaR, this only holds in a few scenarios. Furthermore, when we compare the density plots, we observe that for the EUR/USD series the S class on average takes the lead under regular times, while it almost does the opposite in times of turmoil. For the USD/JPY series, the A&L class on average takes the lead under regular times and the A class takes the lead in times of turmoil. Moreover, the plots of the forecasted and realized variance show more peaks for the EUR/USD series than for the USD/JPY series under regular times. This is reversed in times of turmoil. This could possibly also explain why we observe larger sizes of the SSM for the EUR/USD series than for the USD/JPY series under regular times and smaller sizes in times of turmoil. The out-of-sample performance under each frequency, covering both exchange rate series, is evaluated in the Discussion in the next section.

6 Discussion

In this study, we aimed to answer the question “Which volatility model performs best in terms of its out-of-sample forecast ability under different circumstances?” by means of a Bayesian and a frequentist approach. Our findings show that no such single model exists, not even the GARCH(1,1) as claimed by Hansen and Lunde (2005). In fact, it is not even always ranked on top. Furthermore, models with more lags appear able to perform well in terms of out-of-sample forecasting. Overall, we find the class of asymmetric and leverage accommodating models to be the superior class for the USD/JPY exchange rate data. For the EUR/USD exchange rate data there was no strong distinction between the classes.

We find the MCS to be the most informative ranking measure, since it provides a set of superior models in addition to ranking them. It is, however, not an independent ranking measure, since its rankings depend on the values of the loss functions. The BMA weights are less sensitive to the loss function used, which makes them a more stubborn ranking measure. The rankings of the plain method and the MCS are somewhat similar; they do not completely coincide, but they never differ much in their ranking of the ‘best’ performing models. Rankings under the Bayesian method differ more when compared to the other two measures. The BMA weights are able to reflect relative model performance since they are probabilistic measures of a model being correct given the observations. BMA has been shown to produce more precise and reliable forecasts than other multi-model techniques (Raftery, Madigan, & Hoeting, 1997).
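To illustrate the multi-model use of such weights, the sketch below forms a combined variance forecast as the weighted average of the individual model forecasts. This is a generic BMA-style illustration with illustrative names; it is not the shrinkage-based construction used to obtain the weights reported in our tables.

import pandas as pd

def bma_combined_forecast(forecasts, weights):
    """Combine individual conditional-variance forecasts with BMA weights.

    forecasts : DataFrame with one column of conditional-variance forecasts per model.
    weights   : Series of BMA weights indexed by model name (summing to one).
    Returns a Series with the weighted-average (combined) forecast.
    """
    w = weights / weights.sum()                  # guard against rounding: renormalise
    return forecasts.mul(w, axis=1).sum(axis=1)  # sum_k w_k * sigma2_{k,t}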
