One Model to Rule them All? Comparing Volatility Model Performance

(1)

One Model to Rule them All?

Comparing Volatility Model Performance

Burak Alp (s1912216)

(2)

(3)

One Model to Rule them All?

Comparing Volatility Model Performance

Student:

Burak A

LP

Student number:

S1912216

January 2, 2018

Abstract

(4)

1 Introduction

In this day and age, the financial world affects all of us, be it directly or indirectly. Some have savings accounts at banks, others have mortgages and some have large portfolios of investments. Even if someone does not make use of services the financial world pro-vides, big events occurring in this world affects us all to some extent. For example, various financial crises have been on the front page of newspapers, affecting big economies and sometimes even the whole world. Think about the 2007-2008 financial crisis, caused (ini-tially) by the housing bubble in the Unites States, and the rippling effects that were felt by countries that were on the other side of the globe.

Since the financial world can have major impact on everyday life, a closer look at all aspects of this world is needed to have a better grasp at what we are dealing with. Remember that the financial world is fueled by human interactions and sentiment, making it unpredictable at times. To minimize the occurrence of these events (or to have the maximum amount of control one can have on these phenomena), finance professionals, regulators and academics continually find ways to get more insight in these phenomena and try to find effective ways to ”tame” this animal called the financial world which can get wild if left unattended. There are a lot of different aspects to finance and the financial world, and each aspect is of vital importance. What we will focus on in this paper however, will be on volatility, which is the bread and butter of the study of finance. As Andersen and Bollerslev (1998) put it so eloquently in their paper: Volatility permeates finance.

First of all, we start out with the question what volatility – or in more general terms, stan-dard deviation – is. Generally speaking, it is a statistical measure for the stanstan-dard dis-persion between the data points within a quantitative data set and the convention is that volatility is denoted by the lower case Greek letter σ (sigma). If the data set is, for example, a set of returns of a stock, then the volatility is the degree of variation of the return of that stock. While a low volatility would imply that a return on a stock does not fluctuate a lot, a high volatility would imply big spikes in the return, both in the positive and negative direction. Hence, volatility is a measure of risk. But where do we see this volatility pop up in finance, and how is it used?

(5)

Below, the optimal proportions (or weights) for a minimum variance portfolio is given in matrix form (where bold letters indicate a vector)

wmvp = Σ

−1 ι ι0Σ−1ι

whereΣ is the matrix containing the variance and covariances of the securities included in the portfolio. It is evident that the estimated volatilities directly influence the proportions of the portfolio, and consequently the return it generates, which in this case is given by (with r being the vector of expected returns)

rmvp =r0wmvp

Another area within finance where volatility is an integral part of, is option pricing. The seminal paper of Black and Scholes (1973) gave us the elegant Black-Scholes partial differ-ential equation that must hold true for all financial derivatives. Furthermore, for the case of equity options, the Black-Scholes formula gives us an exact price for this option (given some parameters). The Black-Scholes PDE is given below

∂V ∂t +rS ∂V ∂S + 1 2σ 2_S2∂2V ∂S2 =rV

This result is often called the most important result in finance, since it gave birth to the branch mathematical finance and to the now trillion dollar derivative business. Notice that besides the obvious variables present in the equation above (price of the option V, price of the underlying security S, the risk-free rate r and time t), we see the volatility parameter σ in there.

Risk measures such as the Value at Risk (VaR) and Expected Shortfall (ES) have become increasingly important for banks and regulators since the Basel framework which brings stricter regulations on capital for financial institutions. These risk measures are usually calculated with historical data using historical simulation, or by using Monte Carlo simu-lation where normality of returns is assumed. The analytical forms for these risk measures (under the assumption of normality) are given below.

VaRα =µ+σΦ −1₍ α) ESα = 1 1−α Z 1 α VaRu du

(6)

The examples above show that volatility indeed permeates finance. Since daily stock in-dex data is by nature a time series, Auto-Regressive Integrated Moving Average (ARIMA) models are used to model the conditional mean. In our case, the conditional mean (con-ditional on the observed returns up to time t) for the returns of various indices. Note that ARIMA models only explicitly model the mean and hence assume that the return volatility is constant over time. This, of course, is proven to be incorrect and a model to account for this changing conditional volatility (called conditional heteroscedasticity) was proposed by Engle (1982), with the name of the model being Auto-Regressive Conditional Heteroscedas-ticity (ARCH).

Four years later, a more general model was proposed by Bollerslev (1986): the GARCH model, which will be the benchmark model for this paper. The GARCH model is the ex-tended version of the ARCH model, since it also adds lagged values of the volatility to the conditional volatility instead of just the lagged squared residuals. There has been a great deal of research about volatility modeling, which gave birth to an incredible list of volatil-ity models. To get a feel of how extensive this list is, Bollerslev (2009) made a glossary of all the different GARCH variants up to that year, which has more than forty pages with around 6 models on each page.

In this paper the focus will be on which of the numerous volatility models is the most suited model to use while forecasting stock index volatility. The most well established models in the literature will be considered, with the GARCH(1, 1) being the benchmark model to beat. ”Well established models” refer to the number of times the papers where these models are introduced are cited in other scientific research. The chosen models (see Methodology section) all are cited at least a thousand times (some even more than ten thousand times). This leads to the main research question of this paper

Q_main= Which volatility model has the best forecasting performance?

where we define the ”best” model as the model that scores best overall on various forecast performance measures. To see if the findings extend to other series, the analysis will be performed on multiple stock indices. Furthermore, we will look at the forecasting perfor-mance of the model when facing adversity (an unexpected financial crisis). Also, we take a look at the evolution of the behavior of return volatility over time; does a different period in history lend itself to a different optimal model in terms of performance or not?

(7)

assumptions of it)?

The model uncertainty will be checked by using two methods; Nyblom’s Stability test (Ny-blom, 1989) and Monte Carlo simulations for the model parameters. A detailed information as to how these methods are employed are given in the Methodology section. The process uncertainty will be checked by examining the stochastic element of the model. This is done in the Data section.

Section 2 of this paper will look at the literature on volatility modeling and model un-certainty. Section 3 will lay out the methodology used in this paper. Section 4 gives the relevant information about the data to be used to produce the results which are given in section 5. Section 6 concludes.

2 Literature Review

To generalize the implausible assumption of a constant one-period forecast variance, Engle (1982) introduced the Autoregressive Conditional Heteroscedasticity (ARCH) model to ex-plicitly model the variance of a given set of data. The general form of the equations for the conditional mean and variance are as follows

Xt =µ+ p

∑

i=1 φiXt−i+ q

∑

j=0 θjεt−j σt2=ω+ m

∑

r=1 αrε2t−r ε2t =σtηt with ηt ∼ (iid 0, 1)

The model for the variance is determined by the squared residuals of the specification of the conditional mean. While this is a leap forward from the assumption of constant variance (which is proven to be false for financial data, as will be shown in section 4), Bollerslev (1986) went one step further by generalizing the ARCH model to allow for past conditional variances in the current conditional variance. The second equation above then becomes

σ_t2 =ω+ m

∑

r=1 αrε2_t−r+ n

∑

s=1 βsσ_t2−s

(8)

the paper by Engle (2001). The motivation why these models are so popular is explained by Engle, Patton, et al. (2001), where they show the stylized facts about volatility such as pronounced persistence, mean-reversion and the possibility of exogenous variables influ-encing volatility to name a few and how volatility models capture all of these features. But first of all, is forecasting of the volatility even possible? Are the resulting forecasts of any use? Remember that a model which has good in-sample fit might still be use-less depending on the out-of-sample performance. Andersen and Bollerslev (1998) ad-dress the skeptics of the forecasting performance of volatility models by showing that, in-deed, ARCH and stochastic volatility models do provide accurate forecasts for daily return volatility data. Furthermore, Bollerslev, Chou, and Kroner (1992) show numerous empiri-cal applications of the ARCH/GARCH models using financial data. Also, there are cases where modifications made on the standard GARCH model can improve the performance of the model, such as in the paper of Campbell and Hentschel (1992), where asymmetry is introduced in the volatility model.

While there are many other examples in the scientific literature of the use of ARCH/GARCH models and their modifications (some of them will be discussed in the methodology sec-tion), more refined versions of volatility itself are also used for forecasting. An example of this can be observed in the paper of Andersen, Bollerslev, and Lange (1999), where the anal-ysis focuses on what they call integrated volatility instead of the more traditional volatility that we will consider. Another way of forecasting the volatility can be done using advanced methods in a continuous-time framework, as seen in, for example, the work of Andersen, Bollerslev, Diebold, and Labys (2003).

Now that we know that there are numerous specifications to model the conditional vari-ance, the question that remains is which one to use. With the advancement of the litera-ture, new modifications are added to account for (newly) observed behavior of the data, but when does the modification start to hurt the forecasting performance of the model? To try to answer these questions, we first look at the seminal paper of Hansen and Lunde (2005), where various volatility models are compared using forecasting performance as the criterion. They found that out of all the models considered, there was no significant evi-dence that the standard GARCH(1,1) model can be outperformed for exchange rate data, while the standard GARCH(1,1) model was clearly inferior to models that can incorporate a leverage effect - such as the GJR-GARCH model by Glosten, Jagannathan, and Runkle (1993) - for IBM return data. Furthermore, although comparisons between models could be made in their analysis, their testing methods lack power making it unable to distinguish between good and bad models.

(9)

uncertainty to the model. Avramov (2002) finds that model uncertainty is more important than estimation risk. Since a high level of model uncertainty can lead to a model that leads to wrong investment decisions, ignoring this uncertainty can lead to large losses.

Another example of model uncertainty and investment is given in the paper of Barberis (2000). In this paper we can see that model uncertainty adds to the variance of the model (which is shown in the paper by the added simulation bias for the estimated parameters) and consequently determines the fraction of the wealth an investor should put in risky as-sets given a certain investment horizon. Barberis shows that when parameter uncertainty is introduced in to the model, modifications to the investment strategy need to be made to avoid losses.

Furthermore, Chatfield (2006) argues that there is still much to be done about model uncer-tainty in the scientific literature. He argues that even though it is recognized that all models are approximations, many scientists make predictions from models while disregarding this fact, resulting in the accuracy of the predictions being of poor quality.

(10)

3 Methodology

The Hansen and Lunde (2005) paper will be the main inspiration for the research in the following sections. Unlike Hansen & Lunde, we will restrict our set of models by including only the models that are well established in financial literature and look at the forecasting performance in multiple time-periods. The analysis that will be performed on the S&P 500 stock index returns will then be performed for the remaining two stock index return data (DAX and Nikkei 225) to see if the results extend to other series.

Univariate vs. Multivariate Model

Before delving into the journey to our best performing model, we turn to the difference of univariate and multivariate GARCH models. While there is a mountain of research done on univariate GARCH models, multivariate GARCH models also got the attention they needed and deserved in the scientific literature, giving birth to various models such as the PC-GARCH, DCC-GARCH or BEKK-GARCH to name a few.

The benefit of these multivariate models is that correlations are also taken into account, which brings our model closer to ”reality”. Though this flexibility is very useful in theory, the dimensionality problem of these models make things difficult for estimation in practice. Since the ultimate goal is to find models that are of practical relevance and the parsimony of a model is of great importance, such elaborate models are hard to justify in most cases. Hence, only univariate models will be considered

Standard GARCH Model (sGARCH)

To start off, we have our standard GARCH(p, q) model (Bollerslev, 1986), with the bench-mark model obtained for p=q =1. The model may be written as:

σt2 =ω+ p

∑

j=1 αjε2_t−j+ q

∑

j=1 βjσ_t2−j

with σ_t2denoting the conditional variance, ω the intercept, αj and βjthe jth parameter for

(11)

Exponential GARCH Model (EGARCH)

The exponential model, or EGARCH(p, q) (Nelson, 1991), is defined as ln(σt2) =ω+ p

∑

j=1 h αjεt−j+γj(|εt−j| −E|εt−j|) i + q

∑

j=1 βjln(σ_t2−j)

where we now have an additional parameter γ in the specification. This model allows us to measure the sign effect of the innovation variable εt through the parameter αj and the

size effect through γj. Due to the left side of the equation being ln(σ_t2), the right side is

allowed to be negative and hence there are no parameter restrictions for this specification.

GARCH Model with Leverage Effect (GJR-GARCH)

Another modification on the GARCH model, called the GJR-GARCH(p, q), where GJR are the initials of the authors of the paper where this model was proposed (Glosten et al., 1993), incorporates positive and negative shocks on the conditional variance in an asymmetric way using an indicator function. The specification is as follows

σt2=ω+ p

∑

j=1 (αjε2_t−j+γjIt−jε2_t−j) + q

∑

j=1 βjσ_t2−j

where the indicator function is equal to zero if ε > 0 and equal to one if ε ≤ 0 and where the γjparameter denotes the leverage term.

Asymmetric Power ARCH Model (APARCH)

A model that also allows for leverage, but goes one step further by allowing for the Tay-lor effect (the sample autocorrelation of absolute returns are usually larger than that of squared returns), is the asymmetric power ARCH of Ding, Granger, and Engle (1993). The specification of the APARCH(p, q) model is defined as

σ_tδ =ω+ p

∑

j=1 αj(|εt−j| −γjεt−j)δ+ q

∑

j=1 βjσtδ−j

where γj is again the leverage term and δ ∈ (0,∞). Note that now, the specification is

not in terms of simply the conditional variance (σ_t2), but in terms of σδ

t, where the power

parameter has to be estimated. Furthermore, note that the GARCH and the GJR-GARCH models are sub-models of the APARCH: For δ=2 and γj=0 the APARCH model reduces

(12)

Component Standard GARCH Model (CS-GARCH)

The last model that will be considered in this paper, is the Component Standard GARCH model proposed by Engle and Lee (1993). This model decomposes the conditional variance into two separate components; a permanent component (denoted by qtin the specification)

and a transitory component (denoted by σ_t2₋_j−qt−j in the specification). The model

speci-fication is σ_t2 =qt+ p

∑

j=1 αj ε2t−j−qt−j + q

∑

j=1 βj σt2−j−qt−j qt =ω+ρqt−1+φ ε2_t−1−σt2−1

where we note that the intercept of this model is time-varying, with an autoregressive com-ponent of order 1, whereas the four previous models have a time-constant intercept ω. The decomposition of the permanent and transitory component of the conditional variance al-lows for an investigation of short-run and long-run movements of volatility.

Although there are more models that have received a lot of attention and are well estab-lished in the literature, these five models encompass many of the different methods to improve upon the standard GARCH model, such as the leverage term, size and sign ef-fects and short- and long-run movements of the volatility. Since these models are the most cited in literature, they are the most obvious candidates to improve upon the benchmark sGARCH specification.

Parameter Restrictions

The parameter restrictions for all models (except for the EGARCH specification) are the non-negativity restriction for all the parameters included in the model. This ensures that the left side of the specification is positive (as is needed since a negative volatility does not make sense).

Model Selection: Choosing p and q

The optimal lags for the autoregressive residual and variance terms will be determined using the likelihood ratio test. The computed test statistic is

D =2× LLHalt−LLHnull

where LLHalt and LLHnull are the loglikelihood values for the alternative model and the

(13)

model does not improve significantly upon the null model, follows a χ2(r−s)-distribution, where r is the number of estimated parameters for the alternative model and s is the num-ber of estimated parameters for the null model. Note however, that the likelihood ratio test will not be the only deciding factor in obtained the most preferred model. Significance of model parameter estimates and a parsimonious model will also be factors determining the models.

After the model selection phase, there will be five models (the five different models ex-plained in this section) with the optimal amount of lags and parameter estimates. Once the optimal models are obtained, the model performance phase will start where the forecasting performance will be evaluated.

Forecasting

The forecasting performance will be compared using three accuracy measures. The first and most known accuracy measure is the Mean Squared Error (MSE), which is defined as

MSE= 1 N N

∑

i=1 (yt− ˜yt)2

where N is the number of forecast values and yt and ˜ytare the actual values and the values

obtained from forecasting, respectively. The second measure, which is quite similar to the first one, is the Mean Absolute Error, which is defined as

MAE = 1 N N

∑

i=1 |yt− ˜yt|

where the symbols are equivalent to the MSE, but instead of taking the square, we take the absolute value of the deviations. These accuracy measures penalizes for the deviations from the actual values and hence, a lower value for these accuracy measures translate to a better forecasting performance.

The third and final measure to be used is the Directional Accuracy measure. This measure looks at the sign of the forecasting value in one step (say, from t−1 to t) and compares it to the sign of the actual value. This measure is defined as

DAC= 1

N

∑

_t 1sign(yt−yt−1)==sign(˜yt−˜yt−1)

(14)

value for this measure translates into a better forecasting performance.

For our analysis, the yt will be the actual volatility (σt) at time t calculated from the data

and ˜yt will be the forecast volatility ( ˜σt) at time t obtained by the model. An example: Let’s

assume we have a dataset of n+m daily returns. We split the data to obtain the first n values (to use for forecasting) and use the last m values to assess the forecast using the aforementioned forecasting measures. We will denote a dataset from value r to value s by the notation: {r, s}.

For the first forecast, we fit a model for the first n values {1, n} and consequently obtain a forecast for ˜σn+1 and compare this to the volatility in time n+1: σn+1 (by calculating the

volatility for the dataset {2, n+1}). For the second forecast, we fit a model for the dataset {2, n+1} to obtain the forecast for ˜σn+2. This value will then be compared to the actual

volatility σn+2in time n+2 calculated for the dataset {3, n+2}. For the m-th forecast, we

fit a model for the dataset {m, n+m−1} to obtain the forecast for ˜σn+m. This value will

then be compared to the actual volatility σn+m calculated for the dataset {m+1, n+m}.

After obtaining the actual and forecast volatilities in this manner, the forecast performance measures can be calculated by looking at the differences for the MSE and MAE measures and looking at the signs of the differences between two consecutive actual volatilities and two consecutive forecast volatilities for the DAC measure.

First, the performance on the whole dataset will be examined. Then, the performance in different time periods (1990’s, 2000’s and 2010’s) will be considered, to see if a single model outperforms in all time periods. Next, forecasting performance during a time of crisis will be considered to examine which model can perform the best against adversity. These steps will be taken for all three datasets to see if the results obtained for the S&P 500 extend to other series.

(15)

Model Uncertainty

After the optimal forecasting model is obtained from the analysis, Nyblom’s Stability test (Nyblom, 1989) will be performed on the optimal model and the benchmark model to check the stability of the model parameters and see if the model is adequate enough to use for forecasting purposes. This test looks at the parameters (which we will denote as the vec-tor β) and under the alternative model it assumes that the parameters have the following specification

β =βt =βt−1+ηt, where ηit iid

∼ (0, σ_η2_i)

Nyblom’s test then has the following hypotheses

H0 : β is constant ⇐⇒ ση2i =0 for all i

Ha : ση2i >0 for some i

Since the test statistic L (which is a Lagrange Multiplier type statistic) does not follow a standard distribution, the critical values will depend on the volatility model at hand. Therefore, the critical values will be provided in the table notes (see Results section). Furthermore, a Monte Carlo simulation on the parameters of the benchmark model and the resulting optimal model will be performed, to examine the underlying distribution and consequently the consistency of the estimated parameters. The procedure for this method is as follows, where we take the benchmark model as an example: First, the optimal param-eter estimates for the standard GARCH(1, 1) model are obtained after fitting the model to the data. Let us call these values ˆα and ˆβ. Then, M simulations will be performed where in each simulation, N random daily returns are drawn from the dataset. For each simulation, the optimal parameter estimates αN and βN are obtained by fitting the model to the N data

points. In the end, there will be M different αN’s and βN’s for which we can draw density

plots. If our original fitted parameters (ˆα and ˆβ) which we hypothesize to be the true values for the parameters are not close to the center of these density plots, then we can say that the parameter estimates for the GARCH(1, 1) model exhibit high uncertainty, leading to model uncertainty. Also, the simulation horizon (the amount of draws N) will be extended recur-sively to see if the error observed between the simulated parameter density mean and the hypothesized true parameter value decreases as the simulation horizon increases. In other words, this method will assess the consistency of the parameter estimates.

(16)

4 Data

The data used for the analysis are daily return data for three indices: the Standard & Poor’s 500 (S&P 500, American stock index), Deutscher Aktienindex (DAX, German stock index) and the Nikkei Heikin Kabuka (Nikkei 225, Japanese stock index). The data, obtained from Yahoo! Finance1span 27 years, starting from January 1st 1990 to December 31st, 2016. To get daily return data from the obtained index level data, first log differences were taken for all three indices. Furthermore, days for which one or more indices did not have a value (reason being missing data and some country-specific holidays) are removed for fair com-parison purposes, so that we have the exact same days and their returns for all three series. These were around 400 in the whole dataset, which is negligible (seeing as it is only a cou-ple days per year). The remaining dataset consists of 6304 data points.

Table 1: Descriptive Statistics

N Mean Std.Dev Min Max Skewness Kurtosis

S&P 500 6304 0.00020 0.0112 -0.09 0.10 -0.45 7.85

DAX 6304 0.00012 0.0142 -0.10 0.11 -0.20 3.72

Nikkei 225 6304 -0.00017 0.0155 -0.12 0.13 -0.14 5.28

Table 1 shows the descriptive statistics for the three indices. As was expected, the expected return on all three indices are extremely close to zero. Furthermore, we see a similar stan-dard deviation for all three indices of around 11%−15%. Also, we see that the skewness for all three indices are negative, but close to zero. The most remarkable (but expected) re-sult we see in the table is the excess kurtosis2for the indices, which are significantly larger than the excess kurtosis for a normal distribution (which is zero).

It is a well established (stylized) fact that return data generally do not follow a normal dis-tribution. This is because the normal distribution assigns too small a probability to extreme events (such as a huge loss or huge gain). The excess kurtosis we see in the descriptives is a sign of this. In figure 4 we see the histograms for the index returns, with the normal curve superimposed. Clearly, we can see that the peak around zero is higher than the normal distribution suggests, and there are more observations in the tails of the histograms than the normal distribution suggests.

1_{https://finance.yahoo.com/} 2

(17)

The QQ plots in figure 5 show us that a t-distribution (which allows for fatter tails) is a more appropriate distribution to use for this type of data. A Jarque-Bera test confirms our beliefs: the return data are not samples from a normal distribution. The large excess kurtosis values, the Jarque-Bera test and the QQ-plots all point to the data having fat-tails. This is only one of many stylized facts for return data. The other stylized facts can be seen by inspecting the plots of the time series.

Figure 1: Daily return for the S&P 500 Index

The figures 1, 2 and 3 (the last two figures can be found in the Appendix) show the behav-ior of the daily return over time for the S&P 500, DAX and Nikkei 225, respectively. We see that the returns oscillate around zero, and that there are high spikes in certain areas. Also, we see that extreme events appear in clusters: this is called volatility clustering and it implies that the volatility differs in different points of time.

Taking a closer look at the time series of the S&P 500, we see large volatility around the year 2000 and around 2007-2008, the dates where the dot-com bubble and sub-prime mortgage crisis occurred. The same clustering can be seen for the DAX and the Nikkei 225 index. Notice that the period leading up to the cluster of 2007 has relatively little volatility.

(18)

Table 2: Correlations

S&P 500 DAX Nikkei 225

S&P 500 1 -

-DAX 0.535 1

-Nikkei 225 0.125 0.259 1

multivariate models for reasons stated in the previous section.

Note that since the paper will focus on GARCH models (modeling of the conditional vari-ance), the ARMA modeling (the modeling of the conditional mean) will not be treated in the Results section. However, since the specification for the conditional mean has a direct impact on the specification of the conditional variance (the residual term ε of the ARMA model enters the GARCH model), a suitable specification for the conditional mean is nec-essary. The specification is as follows

rt =µ+ m

∑

i=1 φirt−i+ n

∑

j=0 θjεt−j

Multiple combinations of m and n are tested. Since the models are all nested models, the likelihood ratio test is used to determine the most preferred model. Since all specifications higher than ARMA(1, 1) resulted in very low likelihood ratio test statistics, the null hypoth-esis cannot be rejected which implies that the more elaborate models show no significant improvement of the loglikelihood and hence are not an improvement on the ARMA(1, 1) model.

(19)

5 Results

5.1 Model Estimation

In this section the optimal specification for the five discussed models will be obtained by using the Likelihood Ratio Test, significance of added parameters and the argument of par-simony as selection criteria. For each model, the optimal specification will be given with the robust standard errors of the parameter estimates in brackets below the parameter val-ues. We start off with the model that started it all: the standard GARCH(p, q).

5.1.1 sGARCH

In table 3, multiple GARCH models with different lags are fitted to the S&P 500 index return data. We note that adding an additional lag of the squared residuals to the model (column 2) does significantly improve the model according to the LR test at a 1% level (with a value of 16.66 to be compared to the critical value χ2_0.01(1) = 6.63), the added parameter appears to be insignificant and the α1parameter estimate drops from being significant at a

1% level to 5% level. Furthermore, adding an additional lag of the squared volatility (col-umn 3) to the specification introduces a highly insignificant parameter β2while not adding

anything to the log likelihood value. Keeping significance of parameters, the argument of parsimony and the LR test in mind, the GARCH(1, 1) is chosen as the optimal model. The specification for the squared volatility is as follows

ˆσ_t2=0.0776 (0.0262)ˆε 2 t−1+0.9175 (0.0244)ˆσ 2 t−1

Note that the constant ω is dropped from the specification since the parameter estimate is highly insignificant and equal to zero. Also, note that the parameter estimate for the lagged squared volatility is huge, indicating that the volatility of the previous period goes into the volatility of the current period with a factor of 0.9175; a perfect example of the incorporation of the phenomenon of volatility clustering in the GARCH framework (recall that the ARCH model does not have this term and hence does not capture the volatility clustering). Note that the optimal specification for the Standard GARCH model coincides with the benchmark model.

5.1.2 EGARCH

(20)

insignificant parameter β2 to the model while only improving the loglikelihood value by

0.10. This corresponds to an LR test statistic of 0.20 which is compared to the critical value

χ2_0.01(1) = 6.63, with the conclusion being that the model in column 3 is not a significant

improvement on the model from column 1.

If we look at column 2, it can be seen that adding an additional lag for the size and sign effects, the model loglikelihood goes up by 30.77 with all model parameters being signifi-cant at a 1% level. The LR test statistic becomes 61.54, which is far greater than the critical value it has to be compared to χ2_0.01(2) =9.21, concluding that the EGARCH(2, 1) model is a significant improvement upon the EGARCH(1, 1) model. Using the same logic, we see that column 4 does not improve upon column 2 (LR test statistic of 1.41 compared to the critical value χ2_0.01(1) = 6.63) and that column 5 does significantly improve upon column 2 (LR test statistic of 49.62 compared to the critical value χ2_0.01(2) =9.21). Although column 5 introduces two insignificant parameter estimates to the model (α2and γ3), the size of the

LR test statistic (which is twice the gain on the loglikelihood for the model) justifies the significant improvement.

In conclusion, we arrive at our optimal model for the Exponential GARCH(p, q) specifica-tion with p=3, q=1. The specification of this model is the following

ln(ˆσ_t2) = −0.1349 (0.0017) +−0.2264 (0.0215) ˆεt−1−0.0758 (0.0141)S1 +−0.0128 (0.0291) ˆεt−2+0.1870 (0.0408)S2 +0.1560 (0.0203)ˆεt−3+0.0252(0.0294)S3 +0.9859 (0.0000)ln(ˆσ 2 t−1) Si = |ˆεt−i| −E|ˆεt−i|

Note that the size effect for lag i is defined as Si to make the expression less tedious and

easier to read. Furthermore, for this model we also see a large parameter value for the lagged (natural log of) squared volatility, as was expected.

5.1.3 GJR-GARCH

(21)

critical value of χ2_0.01(2) =9.21). In conclusion, the GJR-GARCH(3, 1) model is optimal. Furthermore, we see that, for all the specifications considered in this table, all αi

parame-ters are insignificant and close to or equal to zero, whereas the γi coefficients are

signifi-cant. This indicates that only negative shocks in the market attribute to a higher volatil-ity (through εi and the corresponding indicator functions), which makes intuitive sense.

Using the LR test for determining the optimal model for our data, we conclude that the GJR-GARCH(3, 1) specification is the optimal model. Below, the expression for the chosen model is given ˆσ_t2 =0.0000 (0.0280)ˆε 2 t−1+0.1030 (0.0350)It−1ˆε 2 t−1 +0.0000 (0.0352)ˆε 2 t−2+0.1660 (0.0493)It−2ˆε 2 t−2 +0.0247 (0.0282)ˆε 2 t−3−0.1686 (0.0383)It−3ˆε 2 t−3 +0.9131 (0.0255)ˆσ 2 t−1

Note that, once again, the constant ω is omitted from the expression, since its value is zero and highly insignificant. Also, the parameter value for the lagged squared volatility is once again quite high.

5.1.4 APARCH

In table 6, we see the fitted model parameters for various specifications for the APARCH(p, q) model. It can be seen that there are no improvements to be seen in the columns following the first one, indicating that the APARCH(1, 1) model is the optimal model if we look at significance of model parameters, parsimony and the LR test (which would fail to reject the null with the given loglikelihood values in the table). We see that all the parameter estimates - except for the constant ω - are significantly different from zero at a 1% level. Furthermore, note that the optimal fitted δ is equal to 1.2145, giving us a term that is in ’between’ regular volatility (where δ =1) and the variance (where δ=2).

ˆσ 1.2145 (0.1673) t =0.0688 (0.0124) |ˆεt−1| −0.8973 (0.1447)ˆεt−1 1.2145 (0.1673)₊_0.9202 (0.0176)ˆσ 1.2145 (0.1673) t−1

Note that the constant ω is dropped from the expression due to it being equal to zero and highly insignificant.

5.1.5 CS-GARCH

(22)

parameter estimates of the intercept-specification.

From the table we can infer that adding either lag (p or q) improves the model significantly in terms of loglikelihood (where a gain of more than 120 units for the loglikelihood can be seen when going from column one to two or from column one to three). The loglikelihood values of the models in columns four and five are almost identical, with only a marginal improvement on the loglikelihood compared to the model in column two. The LR test statistic corresponding to the columns two and four equals 0.34 and the LR test statistics corresponding to columns two and five equals 0.16, both of these being far below the critical value χ2_0.01(1) = 6.63. This brings us to the conclusion that there is no model that improves significantly upon the CS-GARCH(2, 1) specification when keeping the criteria used for selecting models in mind. The expression for this model is given by

ˆσ_t2 = ˆqt+0.0010 (0.0184) ˆε2_t−1− ˆqt−1 +0.0759 (0.0221) ˆε2_t−2− ˆqt−2 +0.8725 (0.1589) ˆσ_t2−1− ˆqt−1 ˆqt =0.9973 (0.0000)ˆqt−1+0.0369(0.0163) ˆε2_t₋₁−ˆσ_t2₋₁

Note that the constant term for the intercept-specification is omitted since it was highly in-significant and equal to zero. Furthermore, note that the intercept-specification ( ˆqt), which

is called the permanent component of the conditional volatility for this model, is deter-mined largely by its lagged value (the coefficient is close to 1). Also, we see that the coeffi-cient on the transitory component ˆσ_t2₋₁− ˆqt−1 and the coefficient on the difference of the

lagged squared residuals and lagged permanent component are highly significant.

5.2 Forecasting

(23)

Five periods will be considered: the whole period for which the dataset is recorded (1990-2017), the nineties (1990-2000), the decade following the millennium celebrations (2000-2010), the final seven years of the data (2010-2017) and the financial crisis period (2007-2008).

5.2.1 Full Data Set: 1990-2017

To start off, forecasts of the five models on the full range of the dataset are performed. The results of the forecast performance measures and the forecast points can be found in table 8. First of all, we note that for the DAX dataset, the forecasts have a 70% probability for all models to forecast the correct direction, whereas this probability for the Nikkei 225 dataset is quite low around the 35%-45% range. In general, we note that the EGARCH model performs poorly for every dataset with the highest values for the MSE and MAE measures. Looking at index-specific rankings, we see that for the S&P 500 the GJR-GARCH and APARCH models rank highest, for the DAX the CS-GARCH and sGARCH models rank highest and for the Nikkei 225 the highest ranking models are the GJR-GARCH and CS-GARCH. The GJR-GARCH and CS-GARCH models seem to perform the best on the whole dataset, with APARCH and sGARCH models trailing with TFP numbers that are close. Notice that the benchmark model, the sGARCH, is quite close to the other models in terms of forecasting performance.

5.2.2 Period: 1990-2000

(24)

5.2.3 Period: 2000-2010

Table 10 includes the relevant forecast information for the period 2000-2010. First thing that draws the attention are the relatively large numbers for the MSE and MAE values, compared to the previous tables of the full data set and the period 1990-2000. We see that the CS-GARCH model ranks highest for the S&P 500 and the DAX, and ranks second best for the Nikkei 225, whereas the benchmark model ranks best for Nikkei 225 and second best for the S&P 500. In aggregate, we see that the CS-GARCH model clearly is the best model for this period, with the benchmark model ranking as the second best. A peculiar result is the bottom ranking of the APARCH model, if we compare it to the good results for this model in the previous two forecasting period analyses. The EGARCH model performs, just as in the previous cases, poorly.

5.2.4 Period: 2010-2017

The forecast performance results for the period 2010-2017 are given in table 11. If we look at index-specific rankings, we see again that the CS-GARCH ranks best in two out of three indices, and ranking second best in the remaining index. In aggregate, this model by far outperforms the other four models based on our criteria. Behind, we have the bench-mark model and the GJR-GARCH model outperforming the once again poorly performing EGARCH and APARCH models.

5.2.5 Period: Crisis (2007-2008)

The forecast performance results for the period 2007-2008, where the models were esti-mated just before the end of 2007 and the forecasting performed in the first month of 2008, are given in table 12. First thing we note (and which was expected) are the huge values for the MSE and MAE performance measures, since forecasting the volatility in a crisis with a model not ’prepared’ for a crisis gives a relatively poor forecasting performance. Sur-prisingly, we see that the EGARCH model is on par with the leading CS-GARCH model in terms of forecast points. In an index-specific setting, we see that CS-GARCH ranks first for two indices, but last in the remaining index. Nonetheless, we see that the CS-GARCH model, once again, comes out on top.

5.2.6 Cumulative Forecast Points

(25)

model sGARCH(1, 1) has the highest amount of points for the S&P 500, whereas the CS-GARCH(2, 1) is superior for the DAX and Nikkei 225.

Looking at the total on the bottom of the table, we see that indeed, the CS-GARCH(2, 1) is overall the best performing model when it comes to forecasting volatility. The benchmark model sGARCH(1, 1) comes in second, with the GJR-GARCH(3, 1) finishing the top three list. The EGARCH(3,1) model performs the worst out of the selected models, whether it is period specific or in aggregate.

To justify using these models for forecasting, the adequacy of these models have to be inspected. We will check the stability and consistency of the model parameters to gauge the usefulness of these models. Since the CS-GARCH and GJR-GARCH model, along with the benchmark model, were the highest ranking models, the model adequacy of these three models will be examined in the next subsection.

5.3 Model Uncertainty

5.3.1 Nyblom’s Stability Test

To check for the stability of the model parameters, the Nyblom’s Stability Test on all model parameters are performed for all five models. The test statistics are shown in table 14. Looking at the top three models from our analysis, we see that the benchmark model does not, apart from the constant ω, exhibit significant statistics. This translates into the model parameters of the benchmark model being constant over time, or in other words: stable. For the highest ranking model, the CS-GARCH(2, 1), we see that the test statistics for all model parameters exceed the critical value at 5%, with more than half of these statistics exceeding the critical value even at 1%. This indicates that while the model ranks highest for forecasting, the model parameters are not constant over time which translates into a higher degree of model uncertainty compared to the benchmark model.

(26)

5.3.2 Monte Carlo Simulation & Asymptotic Efficiency

As a second way to look at the stability of the model parameters, Monte Carlo simulations are performed for the top three ranked models. We first hypothesize that the estimated parameter values obtained in the previous section are the true population values of the pa-rameters. Then, we draw M samples of size N from the daily return ’population’ (which is our dataset) and obtain parameter estimates for the given models. Since we created a set-ting where we know the population values of the parameters, the resulset-ting density plots for the parameter estimates (obtained from M simulations) gives us the degree of uncertainty present in these estimates. If the means of the density plots are close to the true population values of the parameters, we can say that the estimated parameters are consistent estima-tors of the true values. Furthermore, by recursively increasing the sample size N for the simulations, we can assess the (simulation-based) consistency of the parameter estimates using the Root Mean Squared Error (RMSE) measure. According to statistical theory, the parameter estimates converge in the limit to a normal distribution (asymptotic normality) with√N-rate of convergence:

√

N(ˆθN −θ0) −→ N (d 0, V)

The simulation results for the benchmark sGARCH(1, 1) model are given in table 15. We see that for all the different-sized simulations, the mean values of the parameters are very close to the hypothesized true population values. This suggests consistency and stability of the model parameters. Furthermore, parameter density plots for all parameters included in the model are found in figure 6. Considering only the parameters for the volatility specifi-cation (ω, α, β), we see that the density for all parameters resemble approximately a normal curve. To assess the asymptotic efficiency of the model parameters, we look at the rate of convergence plots in figure 7, where the red line in each plot indicates the expected RMSE for a√N-rate of convergence. As can be seen from the plots, the RMSE for all the relevant parameters are lower than the red line as the simulation size (N) increases, suggesting con-sistency.

The results for the CS-GARCH(2, 1) model are given in table 16. We again see that for the various-sized simulations, the parameter estimates are approximately equal, but for the pa-rameters α1and α2- especially for the first one - large differences are observed compared to

the hypothesized true values. This instability of the two parameters were also highlighted by Nyblom’s test, with these two parameters having huge test statistics (2.61 and 1.78, re-spectively) compared to the critical value of 0.47. The asymptotic efficiency plots in figure 9 support this finding, where we see that for the parameters α1and α2the RMSE is larger

(27)

The simulation results for the final model in consideration, the GJR-GARCH(3, 1) model, can be found in table 17. Note that for all the different-sized simulations, the values of the parameter estimates are very close. If we look at the hypothesized true population values and compare these with the simulation values, we see that for the ω, β and γi parameters

the values have little difference, but the αiparameters do exhibit large differences between

true and simulated values. Especially for the α1 and α2 parameters, the true values and

simulation values differ by a factor of 105; a tremendous difference. The asymptotic effi-ciency plots for this model, found in figure 11, also indicate that multiple parameters show poor performance. For example, the RMSE for the parameters α1, α3 and γ1 is larger than

expected for √N-rate of convergence for all simulation sizes. These results suggest poor consistency, despite exhibiting parameter constancy in the model according to Nyblom’s test.

(28)

6 Conclusion & Discussion

To see which volatility model gives the best forecasting performance, the empirical anal-ysis gives the following insights. For the five GARCH-type models considered, the fol-lowing specifications were found to be optimal on the grounds of parsimony, significance of model parameters and the likelihood ratio test: Standard GARCH(1, 1) (which is also the benchmark model), Exponential GARCH(3, 1), GJR-GARCH(3, 1), Asymmetric Power ARCH(1, 1) and the Component Standard GARCH(2, 1). The forecast performance mea-sures show that, overall, the CS-GARCH specification outperforms all other models using a ranking method by assigning Forecast Points according to model rankings, whereas the benchmark sGARCH(1, 1) model ranks as second best. The Exponential GARCH specifica-tion ranked lowest for all time periods considered. Analyzing multiple time periods show that the CS-GARCH specification always outperforms the other models for the DAX index daily returns, a result not found for the remaining indices.

While the forecasting performance of the CS-GARCH model appears to be the best, Ny-blom’s Stability test indicates that the model parameters do not exhibit constancy. Further-more, Monte Carlo simulations indicate that some of the model parameters for this optimal model do not exhibit consistency and a√N-rate of convergence expected from an adequate model. The benchmark model on the other hand, shows more promising results for the Ny-blom’s test and the Monte Carlo simulations, indicating that while it does rank as second best according to the forecast performance analysis, the low degree of model uncertainty due to the stable model parameters indicate that the benchmark model would be a better choice.

Returning to the research question ”Which volatility model has the best forecasting perfor-mance?”, it is tried to find an answer with the results summarized above. Unfortunately, the analysis does not lead to a clear-cut answer to this question. While for the DAX index the CS-GARCH specification did outperform the remaining models in all time periods us-ing our rankus-ing criteria, claimus-ing this model to be the best in terms of forecastus-ing for this index would be too naive. This is partly due to the instability and lack of constancy in the model parameters, suggesting a high degree of model uncertainty; an undesirable trait for every model in general and forecasting models in particular.

(29)

data using different methods. The results found in this paper are aligned with the results found by Hansen and Lunde (2005), where - while being outperformed by the CS-GARCH specification - the benchmark model is hard to beat if we take model uncertainty into ac-count along with forecast performance measures.

The paper is concluded with the limitations. The analysis considered five models, with the motivation that these models are well established specifications for the conditional vari-ance in scientific literature. However, as stated in previous sections, there are numerous variations of the GARCH model which may or may not outperform the models considered in this paper. Furthermore, only three forecast performance measures are used to assess the forecasting accuracy, with two loss functions (MSE and MAE) and one directional accuracy measure (DAC). Even though these measures are the most established in the literature (along with the RMSE, which gives the same conclusions as MSE), other measures might give different results than the results found in this paper.

Additionally, the ranking method (FP) used does not account for the magnitude of dif-ference in the various measures. For example, say we only consider two models in our analysis. If Model 1 has an MSE value of 3 and an MAE value of 3, whereas Model 2 has an MSE value of 0.3 and an MAE value of 3.001, the resulting Forecast Points for both models would be 3 (1+2 and 2+1, respectively), even though we can clearly state that the forecast-ing performance for Model 2 is better. Even though the defined FP method is simple and intuitive to use, including the magnitude of these differences by using a more sophisticated method could lead to different results.

Also, the analysis only included three stock indices: the S&P 500, DAX and Nikkei 225. Including other indices into the analysis might lead to different results since each different index might react differently to certain financial news in their respective country. As a fi-nal caveat we note that only index data was used for modeling and forecasting purposes. Other types of data, such as exchange rate data, might exhibit different volatility processes leading to different conclusions.

(30)

References

Andersen, T. G. and T. Bollerslev (1998). Answering the skeptics: Yes, standard volatility models do provide accurate forecasts. International Economic Review, 885–905.

Andersen, T. G., T. Bollerslev, F. X. Diebold, and P. Labys (2003). Modeling and forecasting realized volatility. Econometrica 71(2), 579–625.

Andersen, T. G., T. Bollerslev, and S. Lange (1999). Forecasting financial market volatility: Sample frequency vis-a-vis forecast horizon. Journal of Empirical Finance 6(5), 457–477. Avramov, D. (2002). Stock return predictability and model uncertainty. Journal of Financial

Economics 64(3), 423–458.

Barberis, N. (2000). Investing for the long run when returns are predictable. The Journal of Finance 55(1), 225–264.

Black, F. and M. Scholes (1973). The pricing of options and corporate liabilities. Journal of Political Economy 81(3), 637–654.

Bollerslev, T. (1986). Generalized autoregressive conditional heteroskedasticity. Journal of Econometrics 31(3), 307–327.

Bollerslev, T. (2009). Glossary to arch (garch). In Volatility and Time Series Econometrics: Essays in Honour of Robert F. Engle. Citeseer.

Bollerslev, T., R. Y. Chou, and K. F. Kroner (1992). Arch modeling in finance: A review of the theory and empirical evidence. Journal of Econometrics 52(1-2), 5–59.

Campbell, J. Y. and L. Hentschel (1992). No news is good news: An asymmetric model of changing volatility in stock returns. Journal of Financial Economics 31(3), 281–318.

Chatfield, C. (2006). Model uncertainty. Encyclopedia of Environmetrics.

Ding, Z., C. W. Granger, and R. F. Engle (1993). A long memory property of stock market returns and a new model. Journal of Empirical Finance 1(1), 83–106.

Engle, R. (2001). Garch 101: The use of arch/garch models in applied econometrics. The Journal of Economic Perspectives 15(4), 157–168.

Engle, R. F. (1982). Autoregressive conditional heteroscedasticity with estimates of the variance of united kingdom inflation. Econometrica: Journal of the Econometric Society, 987–1007.

(31)

Engle, R. F., A. J. Patton, et al. (2001). What good is a volatility model. Quantitative Fi-nance 1(2), 237–245.

Glosten, L. R., R. Jagannathan, and D. E. Runkle (1993). On the relation between the ex-pected value and the volatility of the nominal excess return on stocks. The Journal of Finance 48(5), 1779–1801.

Hansen, P. R. and A. Lunde (2005). A forecast comparison of volatility models: does any-thing beat a garch (1, 1)? Journal of Applied Econometrics 20(7), 873–889.

Markowitz, H. (1952). Portfolio selection. The Journal of Finance 7(1), 77–91.

Nelson, D. B. (1991). Conditional heteroskedasticity in asset returns: A new approach. Econometrica, 347–370.

(32)

7 Appendix: Tables

Table 3: Standard GARCH(p, q) esti-mates for S&P 500

(1,1) (2,1) (1,2) α1 0.0776**_(0.0262) _(0.0141)0.0294* 0.0777**_(0.0225) α2 _(0.0460)0.0719 β1 0.9175** (0.0244) 0.8916** (0.0387) 0.9176** (0.0128) β2 0.0000 (0.0082) ω 0.0000 (0.0000) 0.0000 (0.0000) 0.0000 (0.0000) LLH 20748.52 20756.85 20748.52

(33)

Table 4: EGARCH(p, q) estimates for S&P 500 (1,1) (2,1) (1,2) (2,2) (3,1) α1 -0.1119** (0.0095) -0.2349** (0.0230) -0.1101** (0.0098) -0.2375** (0.0238) -0.2264** (0.0215) α2 0.1291** (0.0234) 0.1336** (0.0247) -0.0128 (0.0291) α3 0.1560**_(0.0203) β1 0.9825** (0.0002) 0.9812** (0.0000) 0.9999** (0.0005) 0.9999** (0.0000) 0.9859** (0.0000) β2 -0.0173** (0.0025) -0.0188** (0.0006) γ1 0.1356**_(0.0111) -0.0693**_(0.0204) 0.1338**_(0.0116) -0.0701*_(0.0343) -0.0758**_(0.0141) γ2 0.2183** (0.0098) 0.2170** (0.0327) 0.1870** (0.0408) γ3 0.0252 (0.0294) ω -0.1677** (0.0021) -0.1800** (0.0023) -0.1656** (0.0283) -0.1792** (0.0052) -0.1349** (0.0017) LLH 20821.34 20852.11 20821.44 20852.84 20876.92

(34)

Table 5: GJR-GARCH(p, q) estimates for S&P 500 (1,1) (2,1) (1,2) (2,2) (3,1) α1 0.0020 (0.0824) 0.0000 (0.0470) 0.0020 (0.0243) 0.0000 (0.3384) 0.0000 (0.0280) α2 0.0193 (0.0594) 0.0214 (0.2331) 0.0000 (0.0352) α3 _(0.0282)0.0247 β1 0.9117** (0.3076) 0.8941** (0.1699) 0.9113** (0.0390) 0.4481 (3.4803) 0.9131** (0.0255) β2 0.0004 (0.0293) 0.4035 (3.0201) γ1 _(0.4260)0.1404 0.1074**_(0.0391) _(0.0879)0.1407 _(0.2040)0.0896 0.1030**_(0.0350) γ2 0.0313 (0.1610) 0.1141 (0.8179) 0.1660** (0.0493) γ3 -0.1686** (0.0383) ω 0.0000 (0.0000) 0.0000 (0.0000) 0.0000 (0.0000) 0.0000 (0.0000) 0.0000 (0.0000) LLH 20813.55 20817.77 20813.55 20819.53 20832.65

(35)

Table 6: APARCH(p, q) estimates for S&P 500 (1,1) (1,2) (1,3) α1 0.0688**_(0.0124) _(0.1409)0.0684 0.0724**_(0.0085) β1 0.9202**_(0.0176) 0.9202**_(0.0388) 0.8517**_(0.0007) β2 0.0000 (0.0235) 0.0000 (0.0134) β3 0.0624**_(0.0043) γ1 0.8973**_(0.1447) 0.8993**_(0.3534) 0.9567**_(0.0024) δ 1.2145** (0.1673) 1.2213** (0.2806) 1.2195** (0.1313) ω 0.0000 (0.0000) 0.0000 (0.0000) 0.0000 (0.0000) LLH 20828.62 20828.47 20829.18

(36)

Table 7: CS-GARCH(p, q) estimates for S&P 500 (1,1) (2,1) (1,2) (2,2) (3,1) α1 0.0645**_(0.0103) _(0.0184)0.0010 _(0.0517)0.0503 _(0.0212)0.0008 _(0.0353)0.0000 α2 0.0759**_(0.0221) _(0.0337)0.0858* _(0.0903)0.0750 α3 0.0005 (0.1578) β1 0.8725**_(0.1589) 0.8475**_(0.0526) 0.8945**_(0.0482) _(0.7121)0.6465 0.8490**_(0.0475) β2 _(0.0539)0.0033 _(0.6035)0.1822 ρ 0.9956** (0.0000) 0.9973** (0.0000) 0.9975** (0.0000) 0.9974** (0.0000) 0.9973** (0.0000) φ 0.0292 (0.0336) 0.0369* (0.0163) 0.0382** (0.0033) 0.0371** (0.0138) 0.0371 (0.0351) ω 0.0000 (0.0000) 0.0000 (0.0000) 0.0000 (0.0000) 0.0000 (0.0000) 0.0000 (0.0000) LLH 20613.72 20768.99 20756.69 20769.16 20769.07

(37)

Table 8: Forecast Performance Measures (FPM) for the obtained models for all three indices

Index FPM S(1,1) E(3,1) GJR(3,1) AP(1,1) CS(2,1)

MSE 2.5867 2.5975 2.5825 2.5730 2.5837 S&P 500 MAE 4.0811 4.0834 4.0672 4.0688 4.0801 DAC 55% 55% 50% 50% 60% FP 8 6 12 12 11 MSE 6.5027 6.7106 6.6180 6.6243 6.4914 DAX MAE 5.6809 5.7724 5.7228 5.6927 5.6786 DAC 70% 70% 70% 70% 70% FP 13 7 10 10 15 MSE 5.0277 5.1316 5.0837 5.1247 5.0252 Nikkei 225 MAE 5.4685 5.4776 5.4563 5.4607 5.4649 DAC 35% 35% 45% 40% 35% FP 9 5 13 10 11 Total FP TFP 30 18 35 32 37

1_{The letters denote S(tandard), E(xponential), GJR, A(symmetric) P(ower)}

and C(omponent) S(tandard) respectively, with the lags in parentheses

2_{MSE is multiplied by 10}5_{and MAE multiplied by 10}3_{for convenience} Table 9: Forecast Performance Measures (FPM) for the obtained models for all three indices in the time period 1990-2000

(38)

Table 10: Forecast Performance Measures (FPM) for the obtained mod-els for all three indices in the time period 2000-2010

2_{MSE is multiplied by 10}5_{and MAE multiplied by 10}3_{for convenience} Table 11: Forecast Performance Measures (FPM) for the obtained models for all three indices in the time period 2010-2017

(39)

Table 12: Forecast Performance Measures (FPM) for the obtained mod-els for all three indices during the 2007-2008 crisis

2_{MSE is multiplied by 10}5_{and MAE multiplied by 10}3_{for convenience}

Table 13: Cumulative Forecast Points

S(1,1) E(3,1) GJR(3,1) AP(1,1) CS(2,1) Total

S&P 500 56 38 52 52 54 252

DAX 53 41 48 42 74 258

Nikkei 225 51 43 57 37 60 248

(40)

Table 14: Nyblom’s Stability Test statistics for model parameters S(1,1) E(3,1) GJR(3,1) AP(1,1) CS(2,1) α1 0.36 1.90** 0.25 0.39 2.61** α2 1.55** 0.33 1.78** α3 1.51** 0.44 β1 0.36 0.24 0.43 0.42 0.52* γ1 0.17 0.36 0.97** γ2 0.06 0.42 γ3 0.07 0.43 δ 0.36 ρ 0.48* φ 0.98** ω 119.48** 0.26 85.06** 0.33 323.90** ν 0.53* 0.38 0.54* 0.44 0.49*

(41)

Table 15: Monte Carlo simulations for the sGARCH(1,1) parameters

ω α1 β1 ν

True (Hypothesized) 8.6788e-07 0.077644 0.91748 6.2130

Sim (N=2000) 10.872e-07 0.077667 0.91480 6.5026

Sim (N=3000) 10.006e-07 0.078141 0.91542 6.4028

Sim (N=4000) 9.1144e-07 0.077332 0.91710 6.3877

Sim (N=5000) 9.2454e-07 0.077523 0.91684 6.3350

Sim (N=6000) 9.1065e-07 0.077430 0.91703 6.3072

Table 16: Monte Carlo simulations for the CS-GARCH(2,1) parameters

ω α1 α2 β1 ρ φ ν

True (Hypothesized) 3.5197e-07 0.0010068 0.075927 0.84752 0.99729 0.036875 6.3910

Sim (N=2000) 3.9336e-07 0.0150468 0.070881 0.81509 0.99671 0.029919 6.7099

Sim (N=3000) 3.6330e-07 0.0121924 0.072001 0.83101 0.99691 0.030654 6.6737

Sim (N=4000) 2.9952e-07 0.0111556 0.071568 0.84024 0.99745 0.030654 6.5866

Sim (N=5000) 3.1600e-07 0.0107152 0.072010 0.83860 0.99731 0.031834 6.5316

(42)

(43)

8 Appendix: Figures

(44)

(45)

(a) S&P 500 (b) DAX

(c) Nikkei 225

(46)

(a) Normal QQ plot (S&P 500) (b) t QQ plot (S&P 500)

(c) Normal QQ plot (DAX) (d) t QQ plot (DAX)

(e) Normal QQ plot (Nikkei 225) (f) t QQ plot (Nikkei 225)

(47)

(48)

(49)

(50)

(51)

(52)

(53)

9 Appendix: R Code

rm(list=ls())

setwd("C:/Users/Burak/Dropbox/University/MSc/Thesis/Data") #all of the libraries used in the code

library(xlsx) library(car) library(MASS) library(xtable) library(moments) library(rugarch) library(parallel) attach(data) ############################### Data Preperation ############################### ################################################################################

#load in data and make names lowercase snp <- read.csv("^GSPC.csv", header=T) names(snp)

names(snp) <- tolower(names(snp)) #load in data and make names lowercase dax <- read.xlsx("dax.xlsx", 1, header=T) names(dax)

names(dax) <- tolower(names(dax)) #load in data and make names lowercase nik <- read.xlsx("nik.xlsx", 1, header=T) names(nik)

names(nik) <- tolower(names(nik))

#change class of date into Date and inspect data head(snp)

tail(snp)

class(snp$date)

snp$date <- as.Date(as.character(snp$date), format="%Y-%m-%d") head(dax)

tail(dax)

(54)

tail(nik) class(nik$date) #daily returns snp$ret <- c(NA,log(snp$adj.close[2:(dim(snp)[1])]) -log(snp$adj.close[1:(dim(snp)[1]-1)])) dax$ret <- c(NA,log(dax$adj.close[2:(dim(dax)[1])]) -log(dax$adj.close[1:(dim(dax)[1]-1)])) nik$ret <- c(NA,log(nik$adj.close[2:(dim(nik)[1])]) -log(nik$adj.close[1:(dim(nik)[1]-1)])) #adjust names for merging

names(snp) <- paste(names(snp),".snp",sep="") head(snp) names(dax) <- paste(names(dax),".dax",sep="") head(dax) names(nik) <- paste(names(nik),".nik",sep="") head(nik)

#merge into new data

snpdax <- merge(snp, dax, by.x="date.snp", by.y="date.dax") head(snpdax)

tail(snpdax)

snpdaxnik <- merge(snpdax, nik, by.x="date.snp", by.y="date.nik") head(snpdaxnik)

snpdaxnik.final <- snpdaxnik[-1,] head(snpdaxnik.final)

#home strech, we finally have our ready-to-use dataset

(55)

################################# Descriptives ################################# ################################################################################ attach(data)

#plot of the timeseries for the indices

plot(date,snp, type="l", main="Daily return S&P 500 Index", xlab="Date (Year)", ylab="Return")

plot(date,dax, type="l",main="Daily return DAX Index", xlab="Date (Year)", ylab="Return")

plot(date,nik, type="l",main="Daily return Nikkei 225 Index", xlab="Date (Year)", ylab="Return")

#descriptives

#descriptive statistics for firm and growth firm descr <- function(a,b,c) { ac <- cbind(length(a),mean(a),sd(a),min(a),max(a), as.numeric(skewness(a)),as.numeric(kurtosis(a))-3) bc <- cbind(length(b),mean(b),sd(b),min(b),max(b), as.numeric(skewness(b)),as.numeric(kurtosis(b))-3) cc <- cbind(length(c),mean(c),sd(c),min(c),max(c), as.numeric(skewness(c)),as.numeric(kurtosis(c))-3) final <- as.data.frame(rbind(ac,bc,cc)) colnames(final) <- c("N","Mean","Std.Dev","Min", "Max","Skewness","Kurtosis") rownames(final) <- c("S&P 500","DAX","Nikkei 225") print(final)

}

descriptives <- descr(snp,dax,nik) xtable(descriptives)

#correlations of the three indices cor(snp,dax)

cor(snp,nik) cor(dax,nik)

#histograms with normal curve superimposed

hist(snp, prob=TRUE, breaks=30, main="Histogram of S&P 500 return") curve(dnorm(x, mean=mean(snp), sd=sd(snp)), add=TRUE)

hist(dax, prob=TRUE, breaks=30, main="Histogram of DAX return") curve(dnorm(x, mean=mean(dax), sd=sd(dax)), add=TRUE)

(56)

curve(dnorm(x, mean=mean(nik), sd=sd(nik)), add=TRUE) #estimate df that gives best fit

fitdistr(snp,"t") #df=2.9 fitdistr(dax,"t") #df=3.7 fitdistr(nik,"t") #df=4.3

#plotting the data against theoretical normal and t values #par(mfrow=c(1,1)) qqPlot(snp,"norm") qqPlot(snp,"t",df=2.9) qqPlot(dax,"norm") qqPlot(dax,"t",df=3.7) qqPlot(nik,"norm") qqPlot(nik,"t",df=4.3) jarque.test(snp) ################################## Modelling ################################### ################################################################################ #LR test statistic: D = 2*(logalt-lognull) --> compare to chisq

model <- arima(snp,c(1,0,1)) model

model.res <- residuals(model)

acf(model.res, main="ACF for the residuals of ARMA(1,1)") pacf(model.res, main="PACF for the residuals of ARMA(1,1)") #ljung box test for white noise of residuals

for(h in c(1,5,10,15,20,25)) {

LBtest <- Box.test (model.res, lag=h, type="L") print(LBtest)

}

#the five different specifications for the conditional variance

spec.s<-ugarchspec(variance.model=list(model="sGARCH",garchOrder = c(1,1)), mean.model=list(armaOrder=c(1,1),include.mean = TRUE), distribution.model = "std")