Forecasting intra-day volatility : multiplicative component realized GARCH


UNIVERSITY OF AMSTERDAM

FACULTY OF ECONOMICS AND BUSINESS

Master thesis

in the subject of

Financial Econometrics

Forecasting intra-day volatility.

Multiplicative Component Realized GARCH

Karolina Jerofejevaite

10603484

Supervised by Peter Boswijk


1 Introduction

There is by now a vast literature on modelling and forecasting the daily variance of returns, whereas intra-day volatility models have received far less attention. However, trading becomes more frequent and more automated every year, which motivates the development of intra-day volatility forecasting models. Such forecasts serve as an important input to algorithms for scheduling trades, pricing assets and adjusting hedge ratios. They also help in placing limit orders and can be used to calculate intra-day Value at Risk (VaR), which may lead to a reallocation of funds within a day. Consequently, interest in modelling intra-day variance is growing.

My main goal in this thesis is therefore to find the most empirically suitable approach to modelling and forecasting intra-day variance. One of the most influential papers on this matter was recently published by Engle and Sokalska (2012). The authors introduce the Multiplicative Component GARCH model for high-frequency intra-day financial returns, which specifies the conditional variance as a product of daily, diurnal and stochastic intra-day volatility. My master thesis builds on this paper. I investigate the performance of these variance components and seek to answer the question: how significant are these components for intra-day volatility modelling and forecasting results?

In that paper, commercially available volatility forecasts are used as the daily variance component. These predictions are based on a multi-factor risk model. In contrast, I want to make use of the information that high-frequency data provide. I therefore take a different approach and model the daily volatility component by Realized GARCH, introduced by Hansen, Huang and Shek (2011). By its structure, the model accounts for the asymmetry and long-memory properties of daily returns. It has also been shown to give substantial improvements in modelling and forecasting the daily conditional variance over standard GARCH models. Standard GARCH models typically employ squared returns to extract information about the current level of daily volatility. Within the Realized GARCH model, observed realized measures of the latent volatility are used instead.


These measures are built from high-frequency returns in such a way that they approximate the quadratic variation of the true underlying price process by filtering out the microstructure noise. Many realized measures have been proposed; in this work I focus on the 5-minute Realized Variance, the sub-sampled 5-minute Realized Variance and the Realized Kernel. The last gives the most accurate daily forecasts among the three, so these forecasts are used as the daily variance component in the Multiplicative Component GARCH model. After applying this model, I predict intra-day volatility 15 minutes ahead and evaluate the obtained forecasts in several ways, using various proxies for the true volatility and various prediction measures. The models are applied to two frequently traded stocks: Intel Corporation and Microsoft. The data sample runs from the 2nd of June, 1990 to the 3rd of May, 2011. Every trading day consists of 5-second log-returns, so in total there are 3000 days of observations with 4680 5-second log-returns within each day.

This thesis is organized as follows. The second chapter gives a literature review. In the third chapter I introduce the Realized GARCH and Multiplicative Component GARCH models. The data are described in Chapter 4, and the results are summarized in Chapter 5. Possible extensions are discussed in Chapter 6, limitations of the Realized GARCH model in Chapter 7, and the overall conclusions of the thesis in Chapter 8.


2 Literature review

The frequency of trading in financial markets increases every year, and we have reached the point where a proper intra-day volatility model is of substantial empirical importance. It has been argued that conventional GARCH models are not suitable for within-the-day modelling and fail to capture important features of intra-day volatility (see for instance Andersen and Bollerslev (1997)). The reason is the distinctive intra-day seasonality, in other words the diurnal pattern, of volatility. A number of closely connected models were developed to account for intra-day volatility patterns; see Ghose and Kroner (1996), Andersen and Bollerslev (1997, 1998), Giot (2005) and Engle and Sokalska (2012). The latter extended the model proposed by Andersen and Bollerslev (1997) and introduced the Multiplicative Component GARCH model. In contrast to Andersen and Bollerslev, this model includes not only daily and diurnal volatility components but also a stochastic intra-day component.

This thesis builds on the work of Engle and Sokalska (2012). The authors dealt with a very large data set (high-frequency returns for 2500 US equities), so the main focus of their work was on applying a number of different specifications and comparing the forecasting results. Namely, they construct models for separate companies, pool data into industries, and consider various criteria for grouping returns. They conclude that the forecasts from the pooled specifications outperform the corresponding forecasts from company-by-company estimation. This kind of modelling requires access to a comprehensive sample and considerable computing power. In my thesis I therefore take a different approach and focus on the significance of the daily, diurnal and stochastic intra-day volatility components for the forecasting results, along with an in-depth investigation of the properties of intra-day returns. Moreover, I apply different ways of evaluating the accuracy of the forecasts.

For the time being, let us focus on modelling the daily volatility component. Standard GARCH models are commonly used for this purpose. Within the GARCH framework, daily returns (typically squared returns) are employed to extract information about the current level of volatility, and this information is used to form expectations about the next period's volatility. It must be emphasized, however, that squared returns offer only a weak signal about the current level of volatility. Moreover, the GARCH model is known to be slow at 'catching up': it takes many periods for the conditional variance implied by the GARCH model to reach its new level, as discussed in Andersen (2003). Since I am dealing with high-frequency financial data, it is therefore natural to take advantage of this additional information. A number of realized measures of volatility, including Realized variance, bipower variation, the Realized Kernel and others (see Andersen and Bollerslev (1998), Andersen (2001), Barndorff-Nielsen and Shephard (2002, 2004, 2008), Hansen and Lunde (2006), Bandi and Russell (2008)), prove to be far more informative about the current level of volatility than the squared return. This makes realized measures very useful for modelling and forecasting future volatility. Andersen (2001) and Barndorff-Nielsen and Shephard (2002) show that Realized variance, a measure constructed by summing squared high-frequency returns, improves the understanding of time-varying variance and the ability to forecast future volatility. Hansen and Lunde (2006) carry out an in-depth analysis of Realized variance and investigate its upward bias at high frequencies. Their work shows that using 1- to 5-minute squared returns for the Realized variance measure gives the best results. Barndorff-Nielsen, Hansen, Lunde and Shephard extend this influential Realized variance literature by introducing the Realized Kernel. This non-negative estimator is robust to autocorrelation in the high-frequency returns and has broadly the same form as a standard heteroskedasticity and autocorrelation consistent (HAC) covariance matrix estimator.

Engle (2002) introduced a model called GARCH-X, a GARCH model that includes a realized measure. Within the GARCH-X framework, however, the variation in the realized measures is left unexplained, which is why GARCH-X models are called partial. Engle and Gallo (2006) introduced the first 'complete' model in this context. Their model specifies a GARCH structure for each of the realized measures, so that an additional latent volatility process is introduced for each realized measure in the model. The model by Engle and Gallo (2006) is known as the multiplicative error model (MEM), because it builds on the MEM structure proposed by Engle (2002). Another complete model is the HEAVY model by Shephard and Sheppard (2010), which also incorporates at least two separate equations: one for the latent volatility and one for the realized measure. Thus, unlike traditional GARCH models, these models operate with multiple latent volatility processes. A further example of a complete model, called Realized GARCH, was introduced by Hansen, Huang and Shek (2011). This model combines a GARCH structure for the returns with an integrated model for the realized measures of volatility. Importantly, the authors show statistical gains from incorporating realized measures in volatility models. This is not the only paper that illustrates the benefit of including realized measures in the analysis. Shephard and Sheppard (2010) show that, when it comes to forecasting, HEAVY models outperform standard GARCH substantially, both in sample and out of sample. This suggests that HEAVY models would be a good choice for forecasting the daily variance component. However, in this model the latent volatility and realized measure equations are estimated separately, and Shephard and Sheppard (2010) briefly mention that pooling information across the two equations might bring more explanatory power to the model. That is exactly what Hansen, Huang and Shek (2011) do in the Realized GARCH model. Therefore, Realized GARCH is chosen in this thesis to model and forecast daily volatility.

In this Master thesis I combine arguably the best known practices to model intra-day and daily volatility. Namely, the Multiplicative Component GARCH by Engle and Sokalska (2012) for intra-day variance and the Realized GARCH for daily volatility (component) forecasting. This way I obtain an extended model, called the Multiplicative Component Realized GARCH.

3 Econometric methods

General notation. Every trading day is divided into $N$ bins (time intervals); $j = 1, ..., N$ indexes the bin within a day and $t = 1, ..., T$ indexes the trading day. Combined, $\{t, j\}$ denotes the $j$-th bin on the $t$-th day.


The data set contains high-frequency 5-second log returns. This means that the size of the bin is 5 seconds and there are 4680 such bins within each trading day. The 5-second log returns are defined as:

$$r_{t,j} = \log\left(\frac{S_{t,j}}{S_{t,j-1}}\right),$$

where $S_{t,j}$ denotes the stock price at time $\{t, j\}$.

The daily returns can then be obtained as follows:

$$r_t = \sum_{j=1}^{4680} r_{t,j}.$$

Recall that there are 4680 5-second bins within a day.

Throughout this thesis 15-minute returns are used often. To distinguish them from the high-frequency 5-second returns, I denote them by $r_{t,i}$, where $i = 1, ..., 26$ (each day has 26 15-minute bins).
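The aggregation from 5-second returns to 15-minute and daily returns can be sketched as follows. This is a minimal illustration in Python with NumPy on simulated placeholder data; the function name `aggregate_returns` and the array layout (one row per trading day) are assumptions made for this example and are not taken from the thesis.

```python
import numpy as np

def aggregate_returns(r5s, bins_per_block):
    """Sum consecutive 5-second log returns into coarser bins.

    r5s: array of shape (T, n) with one row of 5-second log returns per day.
    bins_per_block: number of 5-second bins per coarse bin
                    (180 for 15 minutes, since 900 s / 5 s = 180).
    """
    T, n = r5s.shape
    if n % bins_per_block != 0:
        raise ValueError("bins_per_block must divide the number of intra-day bins")
    # Log returns are additive over time, so coarser returns are plain sums.
    return r5s.reshape(T, n // bins_per_block, bins_per_block).sum(axis=2)

# Simulated placeholder data: 3 days of 4680 five-second returns (not real data).
rng = np.random.default_rng(0)
r5s = rng.normal(scale=1e-4, size=(3, 4680))

r15 = aggregate_returns(r5s, 180)  # 15-minute returns, shape (3, 26)
r_daily = r5s.sum(axis=1)          # daily returns, shape (3,)
```

Because log returns are additive, summing the 26 15-minute returns of a day reproduces the daily return obtained directly from the 5-second returns.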

3.1 Multiplicative Component GARCH

The Multiplicative component GARCH model for the intra-day financial returns specifies the conditional variance to be a multiplicative product of daily, diurnal and stochastic intra-day volatility. In this section I give the general specification of the model.

The intra-day returns $r_{t,i}$ are assumed to take the following form:

$$r_{t,i} = \sqrt{h_t s_i q_{t,i}}\,\epsilon_{t,i}, \qquad \epsilon_{t,i} \sim N(0, 1). \tag{1}$$

Here $h_t$ is the daily volatility component, $s_i$ is the diurnal variance pattern, $\epsilon_{t,i}$ is the error term, and $q_{t,i}$ is the stochastic intra-day volatility component, with $E(q_{t,i}) = 1$. In this thesis I chose to use 15-minute returns. Modelling with the Multiplicative Component GARCH breaks down into three major parts.

(i) The daily volatility component $h_t$ needs to be modelled and predicted. For this purpose the Realized GARCH model, detailed in section 3.2, is used.

(ii) The deterministic diurnal pattern ($s_i$) has to be obtained. To this end, the squared returns are divided by the daily variance component $h_t$:

$$\frac{r_{t,i}^2}{h_t} = s_i q_{t,i} \epsilon_{t,i}^2 \;\Rightarrow\; E\left[\frac{r_{t,i}^2}{h_t}\right] = s_i E[q_{t,i}] = s_i.$$

This leads to the following estimator of $s_i$:

$$\hat{s}_i = \frac{1}{T} \sum_{t=1}^{T} \frac{r_{t,i}^2}{h_t}. \tag{2}$$

(iii) The intra-day returns must be normalized by the daily component ($h_t$) and the diurnal pattern ($\hat{s}_i$):

$$z_{t,i} = \frac{r_{t,i}}{\sqrt{h_t \hat{s}_i}} \approx \sqrt{q_{t,i}}\,\epsilon_{t,i}.$$

These normalized returns $z_{t,i}$ are then used in a GARCH(1,1) model for the intra-day stochastic component $q_{t,i}$:

$$q_{t,i} = \omega + \alpha z_{t,i-1}^2 + \beta q_{t,i-1}, \tag{3}$$

where $z_{t,i} \mid F_{t,i-1} \sim N(0, q_{t,i})$.

Here $F_{t,i-1}$ is a σ-algebra containing all the 15-minute returns observed before time $\{t, i\}$. In more detail, $F_{t,i-1} = \{r_{1,1}, ..., r_{1,N}, ..., r_{t-1,1}, ..., r_{t-1,N}, r_{t,1}, ..., r_{t,i-1}\}$ with $N = 26$. I apply a simple GARCH(1,1) for the stochastic component $q_{t,i}$, as is done in the paper by Engle and Sokalska (2012). Possible extensions of this model are discussed in Chapter 7.
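Steps (ii) and (iii) and the recursion in equation (3) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the daily component `h` is simulated rather than forecast by Realized GARCH, the GARCH parameters are illustrative values rather than estimates, the recursion is started at $q = 1$ (the unconditional mean), and it is run straight through the overnight break. The function names are hypothetical.

```python
import numpy as np

def diurnal_pattern(r, h):
    """Equation (2): s_hat_i = (1/T) * sum_t r_{t,i}^2 / h_t."""
    return np.mean(r**2 / h[:, None], axis=0)

def normalize(r, h, s_hat):
    """Step (iii): z_{t,i} = r_{t,i} / sqrt(h_t * s_hat_i)."""
    return r / np.sqrt(h[:, None] * s_hat[None, :])

def garch11_filter(z, omega, alpha, beta):
    """Equation (3), run over the normalized returns in time order."""
    z_flat = z.ravel()          # day 1 bins, then day 2 bins, and so on
    q = np.empty(z_flat.size)
    q[0] = 1.0                  # assumption: start at E(q) = 1
    for k in range(1, z_flat.size):
        q[k] = omega + alpha * z_flat[k - 1] ** 2 + beta * q[k - 1]
    return q.reshape(z.shape)

# Simulated placeholder data: T days of N = 26 fifteen-minute returns.
rng = np.random.default_rng(1)
T, N = 250, 26
h = np.exp(rng.normal(-9.0, 0.3, size=T))                       # stand-in for h_t
s_true = 1.0 + 0.8 * np.cos(np.linspace(0.0, 2.0 * np.pi, N))   # U-shaped pattern
r = np.sqrt(h[:, None] * s_true[None, :]) * rng.standard_normal((T, N))

s_hat = diurnal_pattern(r, h)
z = normalize(r, h, s_hat)
# omega = 1 - alpha - beta keeps the unconditional mean of q at 1.
q = garch11_filter(z, omega=0.05, alpha=0.05, beta=0.90)
```

By construction of $\hat{s}_i$, the normalized returns have a sample second moment of exactly one in every bin, which is a convenient sanity check on an implementation.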

3.2 Realized GARCH

In this part I introduce the Realized GARCH model proposed by Hansen, Huang and Shek (2011) and the realized measures applied. It is assumed that $E(r_t \mid F_{t-1}) = 0$, which is empirically an accurate assumption. Here the information set $F_{t-1}$ contains the high-frequency returns $r_{t,j}$ and the realized measures constructed from these returns, observed up to day $t-1$. That is, $F_{t-1} = \{r_{t-1,1}, ..., r_{t-1,N}, r_{t-2,1}, ..., r_{t-2,N}, ..., r_{1,1}, ..., r_{1,N}, x_{t-1}, ..., x_1\}$, where $N = 4680$ and $x_{t-1}$ denotes the realized measure. As mentioned, $r_t$ are the daily returns, and $h_t$ denotes the daily volatility of the returns, $h_t = V(r_t \mid F_{t-1})$.

First of all, a broader introduction to realized measures is needed. Suppose we want to measure the variation over the period $[0, T]$, and assume that the log price process $Y$ is a Brownian semi-martingale. A continuous semi-martingale is a process that can be decomposed as $Y_t = Y_0 + A_t + M_t$, where $\{A_t\}_{t \ge 0}$ is of bounded variation and $\{M_t\}_{t \ge 0}$ is a continuous local martingale. Ito processes, also known as Brownian semi-martingales, form a subset, with $A_t = \int_0^t \mu_s\,ds$ and $M_t = \int_0^t \sigma_s\,dW_s$. Combining everything we get:

$$Y_t = Y_0 + \int_0^t \mu_s\,ds + \int_0^t \sigma_s\,dW_s.$$

The main focus lies on the latent quadratic variation of this process over the whole period of interest $[0, T]$:

$$[Y] = \int_0^T \sigma_u^2\,du,$$

where, by the Ito isometry and the multiplication table, we have

$$V\left(\int_0^t \sigma_s\,dW_s\right) = E\left[\left(\int_0^t \sigma_s\,dW_s\right)^2\right] = E\left[\int_0^t \sigma_s^2\,ds\right].$$

See for instance Etheridge (2002). Furthermore, assume that

$$X_{t,j} = Y_{t,j} + U_{t,j}$$

is a noisy observation of the true log price process $Y_{t,j}$ for all $\{t, j\} \in [0, T]$, with $U_{t,j}$ denoting the market microstructure noise, such that $E[U_{t,j}] = 0$ and $V[U_{t,j}] = \omega^2$.

The realized measures are constructed in such a way that they approximate the quadratic variation of the semi-martingale that drives the underlying log price process by filtering out the market microstructure noise. In the empirical part of my thesis I apply the following realized measures: Realized variance ($RV$), sub-sampled Realized variance ($RV_{sub}$) and Realized kernel ($RK$). The detailed specifications of these measures are given at the end of this chapter.

Within the Realized GARCH framework, the conditional variance $h_t$ depends not only on $h_{t-1}$ but also on the realized measure of volatility, denoted $x_{t-1}$. The measurement equation is a very important component of the model: it ties the realized measure to the latent volatility and provides a simple way of modelling the joint dependence between $r_t$ and $x_t$. The specification of the model is given in more detail in section 3.2.1.

Hansen, Huang and Shek (2011) define different specifications of the model, but they emphasize the log-linear Realized GARCH. There are a few reasons for this. First, the log-linear specification automatically ensures a positive variance. In practice the log-linear specification of the usual GARCH model is not often used because $r_t$ may take the value zero, which would cause censoring in the model. In contrast, within the Realized GARCH framework the logarithm of the squared returns ($\log r_{t-1}^2$) does not appear in the model (this is shown explicitly below in equation (5)). For the motivation for not including returns in the Realized GARCH model I refer the reader to Hansen, Huang and Shek (2011). Another attractive feature of the log-linear Realized GARCH is that it preserves the ARMA structure that characterizes some of the standard GARCH models. All things considered, the log-linear Realized GARCH seems the best choice for empirical work.

Usually a GARCH(1,1) specification is applied to account for volatility clustering. Here I am dealing with the returns of the Intel Corporation and Microsoft stocks, and GARCH(1,1) proves to be sufficient to remove the serial correlation in the squared residuals of the data analysed (detailed information about the data is provided in Chapter 4).

Taking all these arguments into account, I chose to use the log-linear Realized GARCH(1,1) model for modelling and forecasting daily volatility. The detailed specification of the model is given in the next section.


3.2.1 Specification of the log-linear Realized GARCH(1,1)

Return equation:

$$r_t = \sqrt{h_t}\,\epsilon_t. \tag{4}$$

GARCH equation:

$$h_t = \exp\{\omega + \beta \log h_{t-1} + \gamma \log x_{t-1}\}. \tag{5}$$

Realized measure equation:

$$\log x_t = \xi + \varphi \log h_t + \tau(\epsilon_t) + u_t, \tag{6}$$

where $\tau(\epsilon_t) = \tau_1 \epsilon_t + \tau_2(\epsilon_t^2 - 1)$ is a leverage function such that $E[\tau(\epsilon_t)] = 0$, $\epsilon_t \sim iid\,N(0, 1)$, $u_t \sim iid\,N(0, \sigma_u^2)$ and $x_t$ is a realized measure. Hansen, Huang and Shek (2011) show that using this type of leverage function $\tau(\epsilon_t)$ in the realized measure equation (6) induces an EGARCH-type structure in the GARCH equation (5).

The model specification above reveals another argument for choosing the log-linear Realized GARCH. It lies in the specification of $r_t$ in equation (4), which implies that

$$\log(r_t^2) = \log(h_t) + \log(\epsilon_t^2),$$

and a realized measure is in many ways similar to the squared return $r_t^2$, although it is a more accurate measure of $h_t$. It is therefore natural to express the realized measure $\log(x_t)$ in terms of $\log(h_t)$ and $\epsilon_t$, as in equation (6).

3.2.2 Log-likelihood

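For the purpose of estimation, the Gaussian specification is adopted, so that the log-likelihood is given by:

$$l(r, x; \theta) = -\frac{1}{2} \sum_{t=1}^{n} \left[\log(h_t) + \frac{r_t^2}{h_t} + \log(\sigma_u^2) + \frac{u_t^2}{\sigma_u^2}\right], \tag{7}$$

where $\theta = (\omega, \beta, \gamma, \xi, \varphi, \tau_1, \tau_2, \sigma_u^2)'$.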
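To make equations (4) to (7) concrete, the log-likelihood can be evaluated as in the sketch below. This is an illustration only, not the estimation code used in the thesis: the function name `realized_garch_loglik`, the treatment of the initial value `log_h0` as a given input, and the numbers in the usage lines are all assumptions. As in equation (7), constant terms are omitted.

```python
import numpy as np

def realized_garch_loglik(theta, r, x, log_h0):
    """Gaussian log-likelihood (7) of the log-linear Realized GARCH(1,1).

    theta: (omega, beta, gamma, xi, phi, tau1, tau2, sigma2_u)
    r: daily returns; x: realized measure; log_h0: assumed value of log h_1.
    """
    omega, beta, gamma, xi, phi, tau1, tau2, sigma2_u = theta
    n = r.size
    log_x = np.log(x)
    log_h = np.empty(n)
    log_h[0] = log_h0
    for t in range(1, n):
        # GARCH equation (5) in logs
        log_h[t] = omega + beta * log_h[t - 1] + gamma * log_x[t - 1]
    h = np.exp(log_h)
    eps = r / np.sqrt(h)                        # return equation (4)
    tau = tau1 * eps + tau2 * (eps**2 - 1.0)    # leverage function
    u = log_x - xi - phi * log_h - tau          # measurement equation (6)
    return -0.5 * np.sum(log_h + r**2 / h + np.log(sigma2_u) + u**2 / sigma2_u)

# Illustrative (hypothetical) parameter values and a tiny made-up sample.
theta = (0.2, 0.55, 0.4, -0.2, 1.0, -0.02, 0.05, 0.2)
r = np.array([0.01, -0.02, 0.015])
x = np.array([1e-4, 3e-4, 2e-4])
ll = realized_garch_loglik(theta, r, x, log_h0=np.log(1e-4))
```

In practice the negative of this function would be passed to a numerical optimizer to obtain the estimate of $\theta$.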


3.2.3 Multi-period Forecast

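The Realized GARCH model can be used to predict not only the conditional return variance but also the realized measure. Even more importantly, the advantage of having a model that fully describes the dynamic properties of the realized measure ($x_t$) is that multi-period-ahead forecasting is possible. In contrast, such predictions are not feasible without the realized measure equation (6). The Realized GARCH model induces the following VARMA(1,1) structure, which is used for multi-period-ahead forecasts:

$$\begin{pmatrix} \log(h_{t+k}) \\ \log(x_{t+k}) \end{pmatrix} = \begin{pmatrix} \beta & \gamma \\ \varphi\beta & \varphi\gamma \end{pmatrix}^{k} \begin{pmatrix} \log(h_t) \\ \log(x_t) \end{pmatrix} + \sum_{j=0}^{k-1} \begin{pmatrix} \beta & \gamma \\ \varphi\beta & \varphi\gamma \end{pmatrix}^{j} \left[ \begin{pmatrix} \omega \\ \xi + \varphi\omega \end{pmatrix} + \begin{pmatrix} 0 \\ \tau(\epsilon_{t+k-j}) + u_{t+k-j} \end{pmatrix} \right],$$

where $\tau(\epsilon_{t+k-j}) = \tau_1 \epsilon_{t+k-j} + \tau_2(\epsilon_{t+k-j}^2 - 1)$. To simplify the expression, denote:

$$A = \begin{pmatrix} \beta & \gamma \\ \varphi\beta & \varphi\gamma \end{pmatrix}, \quad b = \begin{pmatrix} \omega \\ \xi + \varphi\omega \end{pmatrix}, \quad Y_t = \begin{pmatrix} \log(h_t) \\ \log(x_t) \end{pmatrix}, \quad \zeta_t = \begin{pmatrix} 0 \\ \tau(\epsilon_t) + u_t \end{pmatrix},$$

then we get:

$$Y_{t+k} = A^k Y_t + \sum_{j=0}^{k-1} A^j (b + \zeta_{t+k-j}). \tag{8}$$

3.2.4 Realized measures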

In this section I describe in more detail the realized measures which are used in the empirical part of the thesis.

Realized variance

The simplest and yet most widely used realized measure is the Realized variance. The main idea is to aggregate squared high-frequency intra-day returns $r_{t,j}$ to approximate the daily increments of the quadratic variation of the price process. In more detail, if the prices are observed without noise then, as $\max_j |t_j - t_{j-1}| \downarrow 0$, the Realized variance converges to the daily increment of the quadratic variation of the price process, that is, it estimates it consistently (see for instance Barndorff-Nielsen and Shephard (2002), Bandi and Russell (2008)). In my data set $t_j = t_{j-1} + \delta t$ and $\delta t$ is 5 seconds. However, due to the market microstructure noise ($U_{t,j}$) there is a difference between the observed price process and the true price process, whose quadratic variation is the object of interest. The effect of market microstructure noise on the Realized variance estimates is illustrated by Barndorff-Nielsen and Shephard (2002), who show that these estimates are upward biased at high frequencies. Therefore, in practice 1- to 5-minute return data are used to mitigate the effect of the noise (see also Hansen and Lunde (2006)). I chose to use the 5-minute Realized variance in the empirical part of this thesis.

The Realized variance is defined as:

$$x_t = RV_t = \sum_{i=1}^{N} r_{t,i}^2. \tag{9}$$

Note that here $r_{t,i}$ are 5-minute returns, so every trading day is divided into $N = 78$ bins (time intervals), consistent with a 6.5-hour trading day of 4680 5-second bins.

Sub-sampled Realized variance

If the Realized variance is computed from a subset of the data, it is possible to average across many such estimators, each using a different subset. This is called sub-sampling. Theoretically this procedure is beneficial in reducing the upward bias of the Realized variance measure at high frequencies (see for instance Barndorff-Nielsen and Shephard (2002), Barndorff-Nielsen, Hansen, Lunde and Shephard (2008), Bandi and Russell (2008)). In this thesis the sub-sampled 5-minute Realized variance is constructed by shifting the starting time of the 5-minute grid in 5-second increments. This gives 60 values of $RV_t$ (defined in eq. (9)) for each day, and $RV_{sub}$ is obtained as the simple average of these Realized variances.
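The two variance measures can be sketched as follows, starting from one day of 5-second log returns. This is a minimal illustration: the function names are hypothetical, and the handling of the shifted grids (incomplete 5-minute blocks at the end of the day are simply dropped, with no rescaling) is an assumption, since the text does not specify it.

```python
import numpy as np

def realized_variance(r5s_day, block=60):
    """Equation (9) on 5-minute returns built from 5-second returns.

    block = 60 five-second bins per 5-minute bin (300 s / 5 s).
    Trailing observations that do not fill a complete block are dropped.
    """
    n = (r5s_day.size // block) * block
    r5m = r5s_day[:n].reshape(-1, block).sum(axis=1)
    return float(np.sum(r5m**2))

def subsampled_rv(r5s_day, block=60):
    """Average of RV over `block` grids, each shifted by one 5-second bin."""
    rvs = [realized_variance(r5s_day[k:], block) for k in range(block)]
    return float(np.mean(rvs))

# Simulated placeholder day of 4680 five-second returns (not real data).
rng = np.random.default_rng(3)
r_day = rng.normal(scale=1e-4, size=4680)
rv = realized_variance(r_day)
rv_sub = subsampled_rv(r_day)
```

With 4680 observations the unshifted grid contains 78 complete 5-minute bins, while each of the 59 shifted grids contains 77, so under this convention the shifted estimates cover slightly less of the day.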

Realized Kernel

One of the concerns that arises when dealing with high-frequency data is the autocorrelation between the high-frequency returns. This motivated the construction of a measure that accounts for this serial correlation. Barndorff-Nielsen, Hansen, Lunde and Shephard (2008) proposed the Realized Kernel, which is defined as:

$$x_t = RK_t = K(X) = \sum_{h=-H}^{H} k\left(\frac{h}{H+1}\right) \gamma_h,$$

where $k(x)$ is a symmetric function (so that $k(-x) = k(x)$), here the Parzen kernel:

$$k(x) = \begin{cases} 1 - 6x^2 + 6x^3, & 0 \le x \le 0.5, \\ 2(1 - x)^3, & 0.5 \le x \le 1, \\ 0, & x > 1, \end{cases}$$

and

$$\gamma_h = \sum_{j=|h|+1}^{n} r_{t,j}\, r_{t,j-|h|},$$

where $n = \lfloor t/\delta t \rfloor = 4680$, $\delta t = 5$ seconds and $r_{t,j}$ are 5-second log returns. The authors show that as $n \to \infty$, $K(U) \to 0$ and $K(Y) \to [Y]$, which implies that $K(X) \to [Y]$ as well. Recall that $X$ is a noisy observation of $Y$ and $U$ denotes the market microstructure noise. I now briefly show how the bandwidth $H$ can be chosen. In the same paper, the authors argue that $H$ should be found using the formula:

$$H = c\,\hat{\xi}^{4/5} n^{3/5}, \tag{10}$$

with $c = 3.5134$ for the Parzen kernel $k(x)$ given above and

$$\hat{\xi}^2 = \frac{\hat{\omega}^2}{RV_{sub}},$$

where $RV_{sub}$ is the sub-sampled Realized variance described above and $\hat{\omega}^2$ can be found using the formula:

$$\hat{\omega}^2 = \frac{1}{q} \sum_{i=1}^{q} \frac{RV_{dense}^{(i)}}{2 n_{(i)}}, \tag{11}$$

where $RV_{dense}^{(1)}, ..., RV_{dense}^{(q)}$ are Realized variances calculated using every $q$-th observation (every 5th second, 10th second, 15th second and so on).

It should be noted that within the Realized Kernel framework the highest-frequency returns can be used ($r_{t,j}$ are 5-second returns) because, in contrast to the Realized variance measure, the Realized Kernel is not upward biased at high frequencies. Thus more information is used to construct this measure, which indicates that, theoretically, the Realized Kernel should approximate the quadratic variation of the log price process better than the Realized variance.
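The Realized Kernel with Parzen weights and the bandwidth rule (10) can be sketched as below. This is an illustration under assumptions rather than the implementation used in the thesis: the function names are hypothetical, the bandwidth is rounded up to the next integer (the text does not say how $H$ is made an integer), and the inputs $\hat{\omega}^2$ and $RV_{sub}$ are taken as given.

```python
import numpy as np

def parzen(x):
    """Parzen kernel, evaluated at |x| because the kernel is symmetric."""
    x = abs(x)
    if x <= 0.5:
        return 1.0 - 6.0 * x**2 + 6.0 * x**3
    if x <= 1.0:
        return 2.0 * (1.0 - x) ** 3
    return 0.0

def autocov(r, h):
    """gamma_h = sum_{j=|h|+1}^{n} r_j * r_{j-|h|}."""
    h = abs(h)
    return float(np.dot(r[h:], r[: r.size - h]))

def realized_kernel(r, H):
    """K(X) = sum_{h=-H}^{H} k(h / (H + 1)) * gamma_h."""
    return sum(parzen(h / (H + 1)) * autocov(r, h) for h in range(-H, H + 1))

def parzen_bandwidth(omega2_hat, rv_sub, n, c=3.5134):
    """Equation (10): H = c * xi^(4/5) * n^(3/5), with xi^2 = omega2_hat / rv_sub.

    Rounding up to an integer is an assumption of this sketch.
    """
    xi2 = omega2_hat / rv_sub
    return int(np.ceil(c * xi2**0.4 * n**0.6))

# Simulated placeholder day of 4680 five-second returns (not real data).
rng = np.random.default_rng(4)
r_day = rng.normal(scale=1e-4, size=4680)
rk = realized_kernel(r_day, H=30)
```

With $H = 0$ only the $h = 0$ term remains and the estimator reduces to the sum of squared 5-second returns, which gives a simple check of an implementation.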

4 Data

In the empirical part of the thesis I apply the models to the Intel Corporation (INTC) and Microsoft stocks. The results obtained for the Microsoft stock are summarized in the appendix and serve as a robustness check. Therefore, in this section I provide plots and descriptive statistics only for the Intel Corporation (the corresponding results for Microsoft are shown in the appendix). The data sample for the INTC stock runs from the 2nd of June, 1990 to the 3rd of May, 2011. In total there are 3000 trading days (of 6.5 hours), each with 4680 5-second log returns, so that 4680 × 3000 = 14,040,000 high-frequency log returns are treated as known observations. In this comprehensive sample around 44% of the 5-second returns of the INTC stock are zeros. An additional 30 days of data (from 2011.05.04 to 2011.06.15) are also recorded in the data set and serve as out-of-sample observations for the evaluation of the forecasts. To illustrate the data, I plot the daily returns in Figure 1 and provide their descriptive statistics in Figure 2. The returns show two high-volatility periods, which correspond to the dot-com bubble (approximately between 2001 and 2003) and the credit crisis (between 2008 and 2010). From the descriptive statistics we see that the mean is close to zero and the standard deviation is much smaller than 1. The return distribution has a kurtosis of 5.7 (> 3), which indicates fatter tails than those of the Normal distribution. This non-normality is mainly caused by the volatility clustering that can be clearly seen in Figure 1. Furthermore, the autocorrelations of the squared daily returns are plotted in Figure 3, where we observe that the autocorrelation is high and decays very slowly (long memory); in fact, it diminishes completely only after 500 lags.


Figure 1: Daily returns of the Intel Corporation stock


Figure 3: Autocorrelation for the squared daily Intel Corporation stock returns

5 Results

In this section I present the results obtained after applying the models detailed above. In order to apply the Multiplicative Component GARCH model I need forecasts of the daily volatility component. I therefore first present the modelling and forecasting results obtained from the Realized GARCH model, and only then the Multiplicative Component GARCH results.

5.1 Realized GARCH modelling results

Taking advantage of the available high-frequency data, I model the latent daily volatility by the Realized GARCH model. As emphasized in the section above, there are many different ways to approximate the quadratic variation of the price process. I chose to apply three of those measures, namely the 5-minute Realized variance, the 5-minute sub-sampled Realized variance and the Realized Kernel. By comparing the obtained results, I then determine which of these measures is the most suitable for modelling daily volatility.

Figure 4 shows the three realized measures plotted together. All three measures clearly capture the two high-volatility periods within the sample, which correspond to the dot-com bubble and the credit crisis. However, comparing the measures with each other just from observing the graph is quite difficult, since all three seem to move very closely together. But if we take a closer look (see Figure 5) we can notice that, especially during low-volatility periods, the Realized Kernel tends on average to give higher volatility than the other measures. It should be noted that realized measures ignore the overnight variation of prices, which leads to a lower volatility level than the one implied by squared daily returns (Shephard and Sheppard 2010). Taking this into account, we can argue that a higher level of volatility obtained from a realized measure is desirable, and in this respect the Realized Kernel performs best.

In Table 1 I present the values of the log-likelihood for the Realized GARCH model (specified in equations (5) and (6)). The highest log-likelihood value is achieved when using the Realized Kernel (RK) measure. This also supports the choice of the Realized Kernel (RK) for modelling the latent daily volatility within the Realized GARCH framework.

Table 1: Log-likelihood for log-linear Realized GARCH(1,1)

Realized measure: RV RV sub RK


Figure 4: Realized measures


Table 2: Obtained results for log-linear Realized GARCH(1,1)

      Parameters                   Standard errors              Student's t                  p-value
           RV   RV sub       RK         RV   RV sub       RK         RV   RV sub       RK         RV   RV sub       RK
ω     -0.1191  -0.1490   0.2013     0.0824   0.0933   0.1234    -1.4447  -1.5973   1.6306     0.1486   0.1103   0.1031
β      0.6271   0.5768   0.5361     0.0210   0.0221   0.0209    29.8329  26.0514  25.6435     0.0000   0.0000   0.0000
γ      0.3554   0.4003   0.5000     0.0207   0.0220   0.0272    17.1356  18.1523  18.3764     0.0000   0.0000   0.0000
ξ     -0.0807  -0.0453  -0.7056     0.2151   0.2169   0.2181    -0.3753  -0.2091  -3.2353     0.7075   0.8344   0.0012
ϕ      0.9971   1.0050   0.8898     0.0269   0.0271   0.0273    37.0600  37.0561  32.5684     0.0000   0.0000   0.0000
τ1    -0.0188  -0.0167  -0.0219     0.0078   0.0073   0.0055    -2.4163  -2.2805  -3.9772     0.0157   0.0226   0.0001
τ2     0.0842   0.0809   0.0505     0.0055   0.0052   0.0038    15.1627  15.3759  13.3606     0.0000   0.0000   0.0000
π      0.9815   0.9790   0.9810


Some important conclusions can be drawn from the results in Table 2. Almost all of the parameters of the model are highly significant with every realized measure. It can also be clearly seen that the standard ($RV$) and sub-sampled ($RV_{sub}$) 5-minute Realized variances give very close results, so the sub-sampling procedure does not yield an empirically meaningful improvement in this case. However, a few differences can be noticed in the results with the Realized Kernel ($RK$). First, the leverage effect (captured by the parameter $\tau_1$) becomes even more important. Second, with this realized measure the parameter $\xi$ becomes significant, unlike with the other realized measures. Third, when $RK$ is applied the parameter $\beta$ becomes noticeably smaller and $\gamma$ becomes larger. This implies that less weight is given to $h_{t-1}$ and more to $x_{t-1}$, which means that the Realized Kernel measure has more explanatory power for the latent daily volatility $h_t$.

The parameter $\pi = \beta + \varphi\gamma$ must lie in the interval $(-1, 1)$ for the model to be stationary (see Hansen, Huang and Shek (2011)). In fact $\pi$ is the largest eigenvalue of the matrix $A$ in equation (8). In Table 2 we see that this parameter is close to 1 no matter which realized measure is used. This indicates that the autocorrelation of the squared returns implied by the model decays slowly. Recall that in Chapter 4 it was shown that the squared daily returns of the Intel Corporation have this long-memory property. It can thus be concluded that the Realized GARCH model captures the long-memory property.
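The eigenvalue statement can be checked numerically with the point estimates from Table 2. The sketch below builds the matrix $A$ of equation (8) for each realized measure and compares its largest eigenvalue with $\beta + \varphi\gamma$; the function name `persistence` is hypothetical. Since the second row of $A$ is $\varphi$ times the first, $A$ has rank one, so its eigenvalues are 0 and its trace $\beta + \varphi\gamma$.

```python
import numpy as np

def persistence(beta, gamma, phi):
    """Return the largest eigenvalue of A (in modulus) and beta + phi * gamma."""
    A = np.array([[beta, gamma],
                  [phi * beta, phi * gamma]])
    largest = float(np.max(np.abs(np.linalg.eigvals(A))))
    return largest, beta + phi * gamma

# Point estimates (beta, gamma, phi) taken from Table 2.
estimates = {
    "RV": (0.6271, 0.3554, 0.9971),
    "RV sub": (0.5768, 0.4003, 1.0050),
    "RK": (0.5361, 0.5000, 0.8898),
}

results = {name: persistence(*vals) for name, vals in estimates.items()}
for name, (largest, pi) in results.items():
    print(f"{name}: largest eigenvalue = {largest:.4f}, beta + phi*gamma = {pi:.4f}")
```

Up to the rounding of the reported estimates, the computed values agree with the row $\pi$ of Table 2.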

5.2 Realized GARCH forecasting results

In this part I try to determine which of the realized measures gives the best forecasting results within the Realized GARCH framework. As mentioned above, this model induces a VARMA(1,1) structure for multi-period forecasts (detailed in equation (8)). The benefit of this structure is not only that we can forecast the daily volatility ht several periods ahead, but also that we can forecast the realized measure xt. Therefore, in this section forecasts over different horizons, namely 1,2,...,30 days ahead, are computed and compared, for both xt and ht. In more detail, I use


daily volatility and realized measure 1,2,...,30 days ahead.
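The multi-period forecasts can be sketched as follows. Equation (8) is not reproduced in this section, so the code assumes the standard log-linear Realized GARCH(1,1) recursion, log h(t+1) = ω + β log h(t) + γ log x(t) and log x(t) = ξ + ϕ log h(t) + τ(z(t)) + u(t), with both shocks having mean zero. The function name and the starting values are hypothetical. Note that the sketch forecasts logarithms; exponentiating them does not give an unbiased forecast of the levels.

```python
def forecast_logs(log_h, log_x, omega, beta, gamma, xi, phi, horizon):
    """Conditional-mean forecasts of (log h, log x) for 1..horizon days ahead.

    Assumed model (log-linear Realized GARCH(1,1)):
        log h_{t+1} = omega + beta * log h_t + gamma * log x_t
        log x_{t+1} = xi + phi * log h_{t+1} + tau(z_{t+1}) + u_{t+1}
    with E[tau(z)] = E[u] = 0, so both shocks drop out of the forecast.
    """
    path = []
    for _ in range(horizon):
        log_h = omega + beta * log_h + gamma * log_x  # next-day log variance
        log_x = xi + phi * log_h                      # expected log realized measure
        path.append((log_h, log_x))
    return path


# Realized Kernel estimates quoted in Chapter 7; starting values are made up.
params = dict(omega=0.2013, beta=0.5361, gamma=0.5, xi=-0.7056, phi=0.8898)
path = forecast_logs(log_h=-8.0, log_x=-8.0, horizon=30, **params)
print(path[0], path[-1])
```

Under these assumptions the forecasts converge geometrically, at rate π = β + ϕγ, to the long-run level (ω + γξ)/(1 − π) of log h.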

True volatility is not observable. Thus, when we want to evaluate variance forecasts, we face the problem of choosing the right proxy for the true volatility. Volatility forecasts are often compared to squared returns, but these carry little information about the variance. Another solution is to use a realized measure as the proxy. I chose three proxies, namely the 5 minute Realized variance RVout-sample, the sub-sampled 5 minute Realized variance RVsub-out-sample and the Realized Kernel RKout-sample. The forecasting procedure can be summarized as follows:

(i) By applying 3 different realized measures I obtain 3 different forecasts for xt and 3 for ht.

(ii) In order to determine which measure gives the best forecasts, I calculate 3 out-of-sample realized measures, which serve as proxies for the true volatility.

(iii) Finally, the comparison takes place. Forecasts of xt and ht obtained with the Realized variance are compared to RVout-sample, forecasts obtained with the sub-sampled Realized variance are compared to RVsub-out-sample, and forecasts obtained with the Realized Kernel are compared to RKout-sample.

The results are summarized in Table 3. Here we see that once again the Realized Kernel performs best and gives the most accurate forecasts.

Table 3: Obtained results for out of sample forecasts of daily volatility and realized measure over 30 days horizon

Realized measure:            RV                      RVsub                   RK
Forecasts for:               ht         xt           ht         xt           ht         xt
RMSE:                        1.67E-04   1.93E-04     1.24E-04   1.02E-04     6.52E-05   7.66E-05
Compared to out of sample:   RV                      RVsub                   RK

All things considered, it is clear that Realized Kernel is the most suitable realized measure to be used for modelling daily volatility and obtaining corresponding forecasts.


5.3 Modelling results for Multiplicative Component GARCH

In this section I present the results obtained after applying the Multiplicative Component GARCH model (detailed in equations (1), (2) and (3)) to INTC stock. The last 30 days of observations are used for this model. Here ht is modelled by the Realized GARCH, and as rt,i I take 15 minute returns. In total there are 30*26=780 observations of rt,i, and the diurnal pattern ŝi is calculated using these returns. The results of the Multiplicative Component GARCH model are summarized in Table 4. In this table we can see that only the parameter β is significant, and it is very high. These results lead to the conclusion that zt,i gives no explanatory information for the stochastic component qt,i. In other words, modelling the daily volatility component with Realized GARCH leaves no ARCH effects in the normalized 15 minute returns zt,i. (The formal tests for ARCH effects in the squared normalized returns z²t,i are given in Chapter 6.)

Table 4: GARCH(1,1) results for stochastic component qt,i

Parameter   Value      Standard error   Student’s t   p-value
ω           0.006726   0.013649         0.492762      0.626165
α           0.015688   0.009767         1.606262      0.119850
β           0.978239   0.020707         47.242246     0.000000

Value of the log-likelihood: 1102.1942
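As a reading aid for Table 4, the sketch below recomputes the Student's t values from the reported estimates and standard errors and derives two summary quantities of the fitted GARCH(1,1): the persistence α + β and the implied long-run level ω/(1 − α − β). It assumes the usual recursion q(i) = ω + α z(i−1)² + β q(i−1); equation (3) is not reproduced in this section, and the helper name `garch_update` is hypothetical.

```python
# Point estimates and standard errors copied from Table 4.
estimates = {
    "omega": (0.006726, 0.013649),
    "alpha": (0.015688, 0.009767),
    "beta": (0.978239, 0.020707),
}

# Student's t is the estimate divided by its standard error.
t_stats = {name: est / se for name, (est, se) in estimates.items()}

omega, alpha, beta = (estimates[k][0] for k in ("omega", "alpha", "beta"))
persistence = alpha + beta                 # must be below 1 for a finite long-run level
uncond_mean = omega / (1.0 - persistence)  # implied long-run level of q


def garch_update(q_prev, z_prev):
    """One assumed GARCH(1,1) step: q_i = omega + alpha * z_{i-1}^2 + beta * q_{i-1}."""
    return omega + alpha * z_prev ** 2 + beta * q_prev


print(t_stats, round(persistence, 6), round(uncond_mean, 4))
```

With α this small, each update moves q only marginally in response to the latest squared return, which is consistent with the observation that zt,i carries little information for qt,i.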

In Figure 6 the diurnal pattern ŝi is plotted; it shows the variance of the returns in each of the 26 15 minute bins. From this graph it can be clearly seen that at the beginning of each trading day there is a substantial increase in volatility. Variance also increases at the end of the day, but considerably less than in the morning. This U-shaped intra-day seasonality of volatility is documented in a number of articles, see for instance Andersen and Bollerslev (1997) and Andersen (2001).
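A minimal sketch of how such a diurnal pattern can be computed is given below. Equation (2) is not reproduced in this section, so the code assumes the estimator of Engle and Sokalska (2012), in which ŝi is the average over days of the squared returns in bin i deflated by the daily variance. The function name and the toy numbers are hypothetical.

```python
def diurnal_pattern(returns, daily_var):
    """Average of r_{t,i}^2 / h_t over days, separately for each intra-day bin.

    returns   : list of days, each a list of N intra-day returns r_{t,i}
    daily_var : list with one daily variance h_t per day
    """
    n_days, n_bins = len(returns), len(returns[0])
    return [
        sum(returns[t][i] ** 2 / daily_var[t] for t in range(n_days)) / n_days
        for i in range(n_bins)
    ]


# Toy example with 2 days and 2 bins (made-up numbers).
toy_returns = [[0.01, 0.02], [0.02, 0.04]]
toy_daily_var = [0.0001, 0.0004]
print(diurnal_pattern(toy_returns, toy_daily_var))
```

In the thesis setting the input would have 30 days and N = 26 bins, giving one value of ŝi per 15 minute interval.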

Figure 6: Diurnal pattern

For a better understanding and illustration of the model I plot the returns, the daily variance component ht, the intra-day component qt,i, the diurnal pattern si, the composite variance ht,i and the composite variance without the stochastic intra-day component, gt,i. Here by composite variance I mean:

$$h_{t,i} = \sqrt{h_t s_i q_{t,i}} \tag{12}$$

whereas gt,i does not contain the stochastic component qt,i. Thus it is equal to:

$$g_{t,i} = \sqrt{h_t s_i}. \tag{13}$$

From Figure 7 one important feature should be noted: the composite variance ht,i and the composite variance without the stochastic intra-day component gt,i move very closely together. Also, all the peaks of the diurnal pattern si remain clearly visible in the composite variance graph, which indicates that intra-day seasonality is of particular importance.

5.4 Forecasting results for Multiplicative Component GARCH

Figure 7: Volatility components

In this part I present the results obtained for the one-step-ahead forecasts of the stochastic intra-day volatility component qt,i. That is, qt,i is predicted sequentially for each 15 minute time interval of the next day, that is 2011.05.04. The procedure can be summarized as follows:

(i) I take the daily variance component ht modelled by Realized GARCH.

(ii) Then ŝi is calculated as detailed above.

(iii) Having this information, I model the stochastic intra-day volatility component qt,i by GARCH(1,1) and forecast it one period ahead, which gives qt+1,1. This way I obtain the prediction for the first 15 minute bin of 2011.05.04.

(iv) Let us assume that the first 15 minutes of 2011.05.04 have passed, that is, rt+1,1 is now known. Using this additional information and the forecast of the daily volatility component ht+1 (see equation (8)), qt+1,2 can now be predicted.

This procedure is then repeated until qt+1,i is found for all i = 1, ..., N. Here N denotes the number of bins in one trading day and is equal to 26, because there are 26 intervals of 15 minutes in one trading day.
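The sequential procedure above can be sketched as a simple loop. This is an illustration under stated assumptions, not the code used in the thesis: the GARCH(1,1) recursion q(i) = ω + α z(i−1)² + β q(i−1) and the normalization z = r / sqrt(h s) are assumed from the standard Multiplicative Component GARCH specification, and all names and inputs are hypothetical.

```python
import math


def sequential_q_forecasts(returns_next_day, h_next, s, q_last, z_last,
                           omega, alpha, beta):
    """One-step-ahead forecasts of q for bins 1..N of the next day.

    returns_next_day : the N intra-day returns, revealed one bin at a time
    h_next           : forecast of the daily variance component for that day
    s                : diurnal pattern, one value per bin
    q_last, z_last   : last fitted q and normalized return of the current day
    """
    forecasts = []
    q_prev, z_prev = q_last, z_last
    for i, r in enumerate(returns_next_day):
        q_forecast = omega + alpha * z_prev ** 2 + beta * q_prev
        forecasts.append(q_forecast)
        # Bin i has now passed: normalize its return and use it for bin i + 1.
        z_prev = r / math.sqrt(h_next * s[i])
        q_prev = q_forecast
    return forecasts
```

Each forecast uses only information available before the bin it refers to, which is what makes the forecasts genuinely one step ahead.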


When we want to evaluate the volatility forecasts, we again face the problem of choosing the right proxy for the true volatility. Here the true volatility for every bin in each day is denoted σ²t,i. To evaluate it, 3 volatility proxies are employed, namely r²t,i, RVsub.t,i and RKt,i. In more detail, r²t,i are squared 15 minute returns, RVsub.t,i is a sub-sampled 1 minute Realized variance for every 15 minute time interval, and RKt,i is a 15 minute Realized Kernel (a detailed explanation of the calculation of this measure can be found in the appendix).

I choose to compare ht,i (composite intra-day variance) and gt,i (composite intra-day variance without the stochastic component), detailed in equations (12) and (13), to the true volatility proxies. This way I can determine whether including the predicted stochastic intra-day component qt,i makes the forecasts more accurate. For this purpose the root mean squared error (RMSE) is used. It is calculated as follows:

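(i) $$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(\sigma_{t,i}^2 - h_{t,i}\right)^2}$$

(ii) $$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(\sigma_{t,i}^2 - g_{t,i}\right)^2}$$

where σ²t,i is one of the three mentioned volatility proxies.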
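The two RMSE expressions translate directly into code. The sketch below is a plain-Python illustration with a hypothetical function name; the proxy and forecast series are assumed to be lists of equal length N.

```python
import math


def rmse(proxy, forecast):
    """Root mean squared error between a volatility proxy and a forecast."""
    n = len(proxy)
    return math.sqrt(sum((p - f) ** 2 for p, f in zip(proxy, forecast)) / n)


# Made-up numbers: compare a forecast with and without an extra component.
proxy = [1.0, 2.0, 3.0]
print(rmse(proxy, [1.0, 2.0, 5.0]), rmse(proxy, [1.0, 2.0, 3.0]))
```

Applying the same function once to ht,i and once to gt,i, against the same proxy, gives the pair of numbers reported for each date and proxy.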

The results are summarized in Table 5. In order to check the robustness of the results, I take a grid of dates: I assume that 'today' is 2008.12.12, 2009.09.30, 2010.07.19 or 2011.05.03 and obtain predictions for the following days, 2008.12.13, 2009.10.01, 2010.07.20 and 2011.05.04 respectively. In Table 5 bold numbers indicate which gives better results: the forecasts with the stochastic component (ht,i) or without it (gt,i). Further, (*) marks significantly different forecasts at the 10 %, (**) at the 5 % and (***) at the 1 % significance level according to the Diebold-Mariano (DM) test, which is summarized in the appendix along with the values of the DM statistic. At first glance, the forecasts give contradictory results for different time periods: including the stochastic intra-day component qt,i does not always improve the accuracy of the forecasts. On the other hand, the Diebold-Mariano test indicates that only three of these pairs of predictions are significantly different, and they all favour the forecasts obtained with the stochastic intra-day component. Another noticeable feature is that the predictions are more accurate when compared to RVsub.t,i and RKt,i rather than r²t,i.


Table 5: RMSE comparison for composite variance

RMSE
Date:                  2008.12.12                  2009.09.30                2010.07.19                2011.05.03
Forecasts of:          ht,i           gt,i         ht,i         gt,i         ht,i         gt,i         ht,i       gt,i
Compared to proxies:
r²t,i                  2.148E-09***   2.377E-09    1.55E-10     1.83E-10     8.3E-11***   8.9E-11      1.96E-10   1.52E-10
RVsub.t,i              1.103E-09      1.084E-09    6.4E-11      8.1E-11      1.5E-11      1.6E-11      1E-10      5.2E-11
RKt,i                  8.47E-10       7.98E-10     7.6E-11***   1.18E-10     1.2E-11      1.2E-11      5E-11      4.3E-11

Along with the preceding analysis I also employ the same approach as used by Engle and Sokalska (2012) in their paper. That is, I use the measure

$$\mathrm{MSE}_{t,i} = \left(z_{t,i}^2 - q^f_{t,i}\right)^2$$

to evaluate the forecasts (here q^f_{t,i} denotes the predicted intra-day component). Consequently, I obtain comparable results (see Table 5.4 and footnote 2). However, the drawback of this approach is that it is more difficult to obtain other true volatility proxies in place of z²t,i (recall that zt,i are normalized 15 minute returns). This motivated me to focus more on the evaluation of the composite intra-day variance forecasts.

MSE
Date:                  2008.12.12             2009.09.30          2010.07.19          2011.05.03
Forecasts of:          qt,i        qt,i=1     qt,i     qt,i=1     qt,i     qt,i=1     qt,i     qt,i=1
Compared to proxies:
z²t,i                  2.0717***   2.2256     4.5305   5.4935     1.6432   1.6926     2.6495   2.9354

² (*) marks significantly different forecasts at the 10 %, (**) at the 5 % and (***) at the 1 % significance level.

Another possible way to compare the accuracy of the predictions is to calculate the out-of-sample log-likelihood. Recall that the intra-day stochastic component qt,i is modelled by GARCH(1,1). For evaluation purposes the first term of the likelihood, ½ log(2π), can be ignored. Consequently, we get the forecasting measure LIKt,i detailed below:

$$\mathrm{LIK}_{t,i} = -\log\left(q^f_{t,i}\right) - \frac{z_{t,i}^2}{q^f_{t,i}},$$

where q^f_{t,i} denotes the predicted stochastic intra-day component. The results obtained with this measure are summarized in Table 6. Similarly, I construct LIK measures for ht,i and gt,i:

(i) $$\mathrm{LIK} = -\log(h_{t,i}) - \frac{r_{t,i}^2}{h_{t,i}}$$

(ii) $$\mathrm{LIK} = -\log(g_{t,i}) - \frac{r_{t,i}^2}{g_{t,i}}$$

and summarize the results in Table 7. The results in Tables 6 and 7 agree and indicate that including the stochastic intra-day component is beneficial for forecasting accuracy.
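The LIK measure can be sketched in a few lines. The code assumes that the numbers reported in Tables 6 and 7 are sums of the per-bin values over the N = 26 bins of the forecast day, which the magnitudes suggest but the text does not state. The function name is hypothetical.

```python
import math


def lik(squared_obs, variance_forecast):
    """Sum over bins of -log(f_i) - y_i / f_i.

    squared_obs       : squared (normalized) returns y_i
    variance_forecast : forecast variances f_i (q, h or g, depending on the table)
    """
    return sum(-math.log(f) - y / f for y, f in zip(squared_obs, variance_forecast))


# Made-up example: a forecast matching the level of y scores higher
# than one that is three times too large.
y = [1.0, 1.0, 1.0, 1.0]
print(lik(y, [1.0] * 4), lik(y, [3.0] * 4))
```

A higher (less negative) value indicates the better forecast, which is how the pairs in Tables 6 and 7 are compared.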

Table 6: LIK comparison for the intra-day component

LIK
Date:                  2008.12.12          2009.09.30          2010.07.19          2011.05.03
Forecasts of:          qt,i     qt,i=1     qt,i     qt,i=1     qt,i     qt,i=1     qt,i     qt,i=1
Compared to proxies:
z²t,i                  -31.28   -33.18     -37.09   -44.48     -25.95   -26.80     -35.78   -38.34

Table 7: LIK comparison for composite variance

LIK
Date:                  2008.12.12          2009.09.30          2010.07.19          2011.05.03
Forecasts of:          ht,i     gt,i       ht,i     gt,i       ht,i     gt,i       ht,i     gt,i
Compared to:
r²t,i                  236.42   235.05     277.94   270.99     288.62   286.90     275.84   271.97

Overall, the results favour including the stochastic intra-day component qt,i for achieving higher accuracy of the predictions, but only slightly. Thus, for further investigation, I add one more forecast evaluation measure, MME(u), used by McMillan and Garcia (2009). The idea of this measure is that it weights under-predictions of the volatility more heavily than over-predictions. It is defined as:

$$\mathrm{MME}(u) = \frac{1}{h}\left[\sum_{i=1}^{u}\left|h_{t,i} - \sigma_{t,i}^2\right| + \sum_{i=1}^{o}\sqrt{\left|h_{t,i} - \sigma_{t,i}^2\right|}\right] \tag{14}$$

where u denotes the number of under-predictions, o the number of over-predictions and h the number of forecasts. When we think about risk in general, under-estimating risk can cause more trouble than over-estimating it. Thus it could be argued that MME(u) is the most important forecast evaluation measure discussed in this thesis. The results can be found in Table 8 below (see footnote 3), along with a graph of the predictions of qt,i in Figure 8.
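A small sketch of the mean mixed error follows. For errors smaller than one the square root is larger than the error itself, so whichever group of errors receives the square root is penalized more heavily. Equation (14) as printed applies the square root to the over-predictions, while the surrounding text describes a heavier penalty on under-predictions; which variant produced Table 8 cannot be determined from the text, so the hypothetical function below exposes the choice as a parameter rather than asserting one.

```python
import math


def mme(forecast, proxy, sqrt_on="over"):
    """Mean mixed error of variance forecasts against a volatility proxy.

    sqrt_on="over"  : equation (14) as printed (square root on over-predictions)
    sqrt_on="under" : square root on under-predictions instead
    """
    if sqrt_on not in ("over", "under"):
        raise ValueError("sqrt_on must be 'over' or 'under'")
    total = 0.0
    for f, p in zip(forecast, proxy):
        err = abs(f - p)
        is_under = f < p  # the forecast is below the proxy
        if (sqrt_on == "under") == is_under:
            total += math.sqrt(err)
        else:
            total += err
    return total / len(forecast)


# Made-up example: one over-prediction with an absolute error of 0.04.
print(mme([0.04], [0.0], sqrt_on="over"), mme([0.04], [0.0], sqrt_on="under"))
```

With variance errors far below one, as in this application, the group that receives the square root dominates the value of the measure.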

Table 8: MME(u) comparison for composite variance

MME(u)
Date:                  2008.12.12                2009.09.30                2010.07.19                2011.05.03
Forecasts of:          ht,i       gt,i           ht,i          gt,i        ht,i       gt,i           ht,i       gt,i
r²t,i                  2.11E-03   2.07E-03***    1.20E-03      1.52E-03    8.22E-04   7.33E-04***    1.08E-03   1.36E-03
RVsub.t,i              2.65E-03   2.49E-03       1.58E-03      1.89E-03    8.19E-04   6.13E-04       1.10E-03   1.61E-03
RKt,i                  3.04E-03   2.85E-03       2.13E-03***   2.59E-03    9.96E-04   8.00E-04       1.29E-03   1.95E-03

After investigating Figure 8 and Table 8 a simple conclusion can be drawn. If qt,i is predicted to be below 1, then ht,i is biased downward compared to gt,i and consequently MME(u) gives worse results for ht,i. This motivates excluding the stochastic intra-day component qt,i. The opposite holds if qt,i is predicted to be above 1. Overall, this illustrates that the level of the predicted stochastic intra-day component is more important than its variation within the day.

³ (*) marks significantly different forecasts at the 10 %, (**) at the 5 % and (***) at the 1 % significance level.


Figure 8: Forecasts for qt,i, ∀ i = 1, ..., N

6 Possible extensions

6.1 Asymmetries

In this part I check whether a leverage effect occurs in the daily returns and in the intra-day returns. As mentioned in section 3.2.1, the realized measure equation (6) induces an EGARCH-type structure in the GARCH equation (5) within the Realized GARCH framework. This motivates applying the EGARCH model to the daily returns and investigating the significance of the leverage effect. The results for this model are given in Table 9. It can be clearly seen that the leverage effect is important for the daily returns, as indicated by the highly significant coefficient α2. To illustrate the leverage effect more vividly I use the News Impact Curve (NIC), which shows how positive and negative shocks to the returns affect future volatility (see Figure 9). The NIC is clearly asymmetric, which implies that negative shocks to the returns have a stronger effect on volatility than positive shocks. Of course, these findings are expected.

On the other hand, if we investigate the standardized 15 minute returns zt,i and their


Table 9: EGARCH model results for the daily returns of Intel Corporation

Dependent Variable: rt

$$\log h_t = \alpha_0 + \alpha_1\left|\frac{r_{t-1}}{\sqrt{h_{t-1}}}\right| + \alpha_2\,\frac{r_{t-1}}{\sqrt{h_{t-1}}} + \alpha_3 \log h_{t-1}$$

Variable   Coefficient   Std. Error   z-Statistic   Prob.
α0         -0.135701     0.017482     -7.762547     0.0000
α1         0.106276      0.011581     9.177085      0.0000
α2         -0.041434     0.007356     -5.633033     0.0000
α3         0.993618      0.001571     632.5466      0.0000


Table 10: EGARCH model results for the normalized intra-day 15 minute returns of INTC

Dependent Variable: zt,i

$$\log q_{t,i} = \alpha_0 + \alpha_1\left|\frac{z_{t,i-1}}{\sqrt{q_{t,i-1}}}\right| + \alpha_2\,\frac{z_{t,i-1}}{\sqrt{q_{t,i-1}}} + \alpha_3 \log q_{t,i-1}$$

Variable   Coefficient   Std. Error   z-Statistic   Prob.
α0         -0.106300     0.033075     -3.213912     0.0013
α1         0.135951      0.042946     3.165588      0.0015
α2         -0.000207     0.023703     -0.008713     0.9930
α3         0.867883      0.059561     14.57141      0.0000


indicated by the insignificant coefficient α2 and a symmetric News impact curve.

All things considered, it can be concluded that a longer period of time needs to be considered for the leverage effect to become significant. In other words, the leverage effect takes longer to materialize. This is why in this thesis the leverage effect is accounted for in the daily volatility model but not on the intra-day basis (where the stochastic component is modelled by a simple GARCH(1,1)).
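The asymmetry can be illustrated numerically from the EGARCH estimates in Tables 9 and 10. The sketch below evaluates the multiplicative effect exp(α1|z| + α2 z) of a standardized shock z on next-period variance, holding the lagged variance fixed. The function name and the shock size of two standard deviations are illustrative choices, not taken from the thesis.

```python
import math


def news_impact(z, alpha1, alpha2):
    """Multiplicative effect of a standardized shock z on next-period variance."""
    return math.exp(alpha1 * abs(z) + alpha2 * z)


daily = dict(alpha1=0.106276, alpha2=-0.041434)     # Table 9
intraday = dict(alpha1=0.135951, alpha2=-0.000207)  # Table 10

for label, p in (("daily", daily), ("intra-day", intraday)):
    negative, positive = news_impact(-2.0, **p), news_impact(2.0, **p)
    print(f"{label}: ratio of negative to positive impact = {negative / positive:.4f}")
```

A ratio well above one for the daily model and essentially equal to one for the intra-day model corresponds to the asymmetric and symmetric News Impact Curves described in the text.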

6.2 Long memory

Another property that needs to be accounted for is long memory. In section 5.3 I find that the normalized 15 minute returns zt,i give no explanatory information for the intra-day stochastic component qt,i; in other words, there are no ARCH effects on the intra-day basis. This can be seen from the graphs of rt and zt,i, see Figures 1 and 11. From the rt graph we observe that volatility clustering plays an important role and needs to be accounted for, whereas zt,i looks similar to white noise.

Figure 11:

For comparison I investigate ARCH effects in the residuals of the squared daily returns r²t and of the squared normalized intra-day returns z²t,i. First, the Breusch-Godfrey Serial Correlation LM Test is performed. The results are shown in Tables 11 and 13. Here the null hypothesis states that there is no serial correlation. It is clearly rejected for the residuals of the squared daily returns but not for those of the normalized intra-day returns, which means that there are ARCH effects in r²t but not in z²t,i. This conclusion is supported by the Q-statistic as well, see Tables 12 and 14. Even though the Q-statistic indicates some autocorrelation in the z²t,i residuals, it is not highly significant, as is clearly illustrated by the autocorrelation function (ACF) graph, see Figure 12. Conversely, the ACF plot for r²t (Figure 3) shows that autocorrelation is of particular importance and very persistent. In fact, serial correlation in the daily returns disappears completely only after 500 lags, which implies very long memory.

All things considered, we arrive at another conclusion: intra-day returns normalized by the daily variance and the diurnal volatility component no longer exhibit serial correlation, and no GARCH-type model needs to be applied. Thus ARCH effects and the long memory property should be accounted for on the daily basis (which in this thesis is done by Realized GARCH) but not within the day.
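For readers who want to reproduce a Q-statistic of the kind reported in Tables 12 and 14, a minimal sketch is given below. It assumes that the reported statistic is the Ljung-Box version, Q(m) = n(n+2) Σ ρk²/(n−k), which the text does not state explicitly. The function name and the toy series are hypothetical.

```python
def ljung_box(x, max_lag):
    """Cumulative Ljung-Box Q-statistics for lags 1..max_lag."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    denom = sum(d * d for d in dev)
    running = 0.0
    q_stats = []
    for k in range(1, max_lag + 1):
        # Sample autocorrelation at lag k.
        rho = sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
        running += rho * rho / (n - k)
        q_stats.append(n * (n + 2) * running)
    return q_stats


# Toy series with strong (negative) first-order autocorrelation.
series = [1.0, -1.0] * 50
print(ljung_box(series, 3))
```

Under the null of no serial correlation, Q(m) is compared with a chi-square distribution with m degrees of freedom; applying the function to z²t,i and to r²t would correspond to the two cases contrasted in the text.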

Table 11: Serial Correlation test for the squared daily returns

Breusch-Godfrey Serial Correlation LM Test:

F-statistic     13.71320   Prob. F(50,2949)       0.0000
Obs*R-squared   565.9347   Prob. Chi-Square(50)   0.0000

Dependent Variable: Residuals of r²t
Method: Least Squares
Sample: 1 3000

Table 12: Q-statistic for the squared daily returns

Q-statistic: 123.60 274.05 409.66 486.85 621.53 727.85 824.03 901.33 1060.0 1156.9 1279.3 1331.5 1433.7 1503.7 1618.0


Table 13: Serial Correlation test for the squared normalized intra-day returns

Breusch-Godfrey Serial Correlation LM Test:

F-statistic     1.674308   Prob. F(5,774)        0.1383
Obs*R-squared   8.346163   Prob. Chi-Square(5)   0.1382

Dependent Variable: Residuals of z²t,i
Method: Least Squares
Sample: 1 780

Table 14: Q-statistic for the squared normalized 15 minute returns

Q-statistic   Prob.
6.8269        0.009
6.9267        0.031
7.4979        0.058
7.5141        0.111
7.9985        0.156


Figure 12: Autocorrelation for the squared normalized intra-day returns

7 Limitations

In this chapter I simulate daily returns from the Realized GARCH model (detailed in section 3.2.1) and explore possible limitations of this model. As true parameters I use the values obtained from the model with the Realized Kernel (see the third column of Table 2), that is: ω = 0.2013; β = 0.5361; γ = 0.5; ξ = −0.7056; ϕ = 0.8898; τ1 = −0.0219; τ2 = 0.0505; σu = 0.09. The simulated returns are plotted in Figure 13. This graph raises the concern that the simulated returns show a lower degree of volatility clustering than the real data (compare with Figure 1). The autocorrelations of the squared simulated returns are plotted in Figure 14. They are lower in magnitude and decay faster than those observed for the Intel Corporation squared returns (see Figure 3).

The serial correlation of the simulated squared returns is strongly influenced by the parameters τ1, τ2 and σu. If these are relatively small, then the variances of log(ht) and log(r²t) are also small; recall that log(r²t) = log(ht) + log(z²t). This in turn implies small autocorrelations in r²t. In the model with the Realized Kernel, the estimates of these three parameters are small, so higher values for these parameters should be chosen. For instance, I implemented τ1 = −0.25, τ2 = 0.1 and σu = 0.1, which leads to autocorrelations similar to those observed in the data, just with a faster decay, see Figure 15.
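The simulation exercise can be sketched as follows. This is not the code used in the thesis. It assumes the log-linear Realized GARCH(1,1) specification with Gaussian shocks z and u, treats σu as a standard deviation, and starts the recursion at the stationary mean of log h; the function name and the seed are hypothetical.

```python
import math
import random


def simulate_realized_garch(n, omega, beta, gamma, xi, phi, tau1, tau2,
                            sigma_u, seed=0):
    """Simulate n daily returns and log variances.

    Assumed model:
        r_t     = sqrt(h_t) * z_t
        log x_t = xi + phi * log h_t + tau1 * z_t + tau2 * (z_t^2 - 1) + u_t
        log h_t = omega + beta * log h_{t-1} + gamma * log x_{t-1}
    with z_t ~ N(0, 1) and u_t ~ N(0, sigma_u^2).
    """
    rng = random.Random(seed)
    pi = beta + phi * gamma
    log_h = (omega + gamma * xi) / (1.0 - pi)  # start at the stationary mean
    returns, log_variances = [], []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)
        u = rng.gauss(0.0, sigma_u)
        returns.append(math.exp(0.5 * log_h) * z)
        log_variances.append(log_h)
        log_x = xi + phi * log_h + tau1 * z + tau2 * (z * z - 1.0) + u
        log_h = omega + beta * log_h + gamma * log_x
    return returns, log_variances


# Parameter values quoted above for the Realized Kernel specification.
estimated = dict(omega=0.2013, beta=0.5361, gamma=0.5, xi=-0.7056, phi=0.8898,
                 tau1=-0.0219, tau2=0.0505, sigma_u=0.09)
sim_returns, sim_log_h = simulate_realized_garch(3000, **estimated)
print(sum(sim_log_h) / len(sim_log_h))
```

Rerunning the function with τ1 = −0.25, τ2 = 0.1 and σu = 0.1 increases the variability of log(ht), which is the mechanism behind the stronger autocorrelation discussed above.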

Overall, two important features should be noted. First, the values of the parameters τ1, τ2 and σu obtained from the Realized GARCH model should be treated with care and might need adjustment to account for the higher level of volatility clustering observed in the data. Second, even with the adjusted parameters the autocorrelation still decays faster than in the data. Initially, after observing the results of the Realized GARCH model, it was argued in section 5.1 that this model captures the long memory property, as indicated by the value of π being close to 1 (recall the results in Table 2). However, after observing the simulation results, this conclusion needs to be adjusted: the Realized GARCH model fails to fully capture the long memory property.


Figure 14: Autocorrelation of the simulated squared returns


8 Conclusions

First of all, I would like to draw some important conclusions regarding the Realized GARCH model. It was shown that, within the framework of this model, the Realized Kernel provides the most accurate results, both in modelling and in forecasting. This suggests that the quadratic variation of the underlying true price process should be measured by the Realized Kernel rather than the Realized variance. However, even then the simulation of daily returns indicated that Realized GARCH does not fully capture the volatility clustering and persistence observed in the data.

For the Multiplicative Component GARCH model, the importance of the diurnal pattern is emphasized. The clear shape of this pattern obtained here coincides with the results of many papers on intra-day volatility. The majority of the forecast evaluation measures favour including the stochastic intra-day volatility component, but its significance remains doubtful: the various measures give similar results, which are sometimes contradictory. Therefore an additional forecast evaluation measure, MME(u), was added, which penalizes under-predictions more than over-predictions. This measure indicated that the stochastic intra-day component should be included in the model if it is predicted to be above 1. Thus what matters is the predicted level of the stochastic component rather than its variation within the day. In-depth analysis of the intra-day returns (normalized by the daily variance component forecasts from the Realized GARCH model and by the deterministic diurnal pattern) shows that no GARCH-type model is needed on the intra-day basis. In more detail, no asymmetries and, in fact, no significant serial correlation are observed in the standardized squared 15 minute returns. Consequently, we can argue that modelling the daily volatility component and identifying the diurnal component are of substantial importance for forecasting intra-day variance, combined with accurate predictions of the level of the stochastic intra-day component.


References

[1] Andersen T.G., Bollerslev T. 1997. Intraday periodicity and volatility persistence in financial markets. Journal of Empirical Finance 4 (1997) 115-158.

[2] Andersen T.G., Bollerslev T. 1998. Answering the skeptics: yes, standard volatility models do provide accurate forecasts. International Economic Review 39(4): 885–905.

[3] Andersen T.G., Bollerslev T., Ashish D. 2001. Variance-ratio Statistics and High-frequency Data: Testing for Changes in Intraday Volatility Patterns. The Journal of Finance, 2001, Vol.56(1), pp.305-327

[4] Andersen T.G., Bollerslev T., Diebold F.X., Labys P. 2001. The distribution of exchange rate volatility. Journal of the American Statistical Association 96(453): 42–55 (correction published 2003, Vol. 98, p. 501).

[5] Andersen T.G., Bollerslev T., Diebold FX, Labys P. 2003. Modeling and forecasting realized volatility. Econometrica 71(2): 579–625.

[6] Bandi F.M., Russell J.R. 2008. Microstructure Noise, Realized Variance, and Optimal Sampling. Review of Economic Studies 75, 339–369.

[7] Barndorff-Nielsen O.E., Shephard N. 2002. Econometric analysis of realised volatility and its use in estimating stochastic volatility models. Journal of the Royal Statistical Society B 64: 253–280.

[8] Barndorff-Nielsen O.E., Shephard N. 2004. Power and bipower variation with stochastic volatility and jumps (with discussion). Journal of Financial Econometrics 2: 1–48.

[9] Barndorff-Nielsen O.E., Hansen P.R., Lunde A., Shephard N. 2008. Designing realised kernels to measure the ex-post variation of equity prices in the presence of noise. Econometrica 76: 1481–536.

[10] Barndorff-Nielsen O.E., Hansen P.R., Lunde A., Shephard N. 2009a. Realised kernels in practice: trades and quotes. Econometrics Journal 12: 1–33.


[11] Engle R.F. 2002. New frontiers of ARCH models. Journal of Applied Econometrics 17: 425–446.

[12] Engle R.F., Gallo G. 2006. A multiple indicators model for volatility using intra-daily data. Journal of Econometrics 131: 3–27.

[13] Engle R.F., Sokalska M.E. 2012. Forecasting intraday volatility in the US equity market. Multiplicative Component GARCH. Journal of Financial Econometrics 10(1): 54–83.

[14] Etheridge A. 2002. A Course in Financial Calculus.

[15] Ghose D., Kroner K. 1996. Components of Volatility in Foreign Exchange Markets: An Empirical Analysis of High Frequency Data. Unpublished manuscript, Department of Economics, University of Arizona.

[16] Giot P., 2005. Market Risk Models for Intraday Data. European Journal of Finance 11: 309–324.

[17] Hansen P.R., Lunde A. 2006. Consistent Ranking of Volatility Models. Journal of Econometrics 131: 97–121.

[18] Hansen P.R., Huang Z., Shek H.H. 2011. Realized GARCH: A joint model for returns and realized measures of volatility. Journal of Applied Econometrics 27: 877–906.

[19] McMillan D.G., Garcia R.Q. 2009. Intra-day volatility forecasts. Applied Financial Economics 19: 611–623.

[20] Shephard N., Sheppard K. 2010. Realising the future: forecasting with high frequency based volatility (HEAVY) models. Journal of Applied Econometrics 25: 197–231.


A Appendix

A.1 15 minute Realized Kernel calculation

In order to calculate the 15 minute RKt,i the same procedure as in section 3.2.4 is employed. The only difference is that, to find the optimal H for every 15 minute bin, I calculate the 1 minute sub-sampled Realized Variance and use returns sampled every 5, 10, ..., 150 seconds to find RVdense(1), RVdense(2), ..., RVdense(q). From these, ω̂ and H are obtained for every 15 minute time interval.

A.2 Diebold-Mariano test for Comparing Predictive Accuracy

Let yt denote the series to be predicted, and assume that two forecasts, ŷ¹t and ŷ²t, are available. Then the forecast errors ε¹t = yt − ŷ¹t and ε²t = yt − ŷ²t can be constructed. The interest lies in evaluating whether the accuracy of the two forecasts differs. The accuracy of each forecast is measured by a particular loss function. The most common loss functions are:

(i) squared error loss: (εⁱt)², i = 1, 2

(ii) absolute error loss: |εⁱt|, i = 1, 2

In this thesis I chose the absolute error loss function because the magnitudes of the obtained errors were very small. To determine whether one model predicts better than the other, I test the null hypothesis

$$H_0: E|\varepsilon^1_t| = E|\varepsilon^2_t| \quad \text{with alternative} \quad H_1: E|\varepsilon^1_t| \neq E|\varepsilon^2_t|.$$

The Diebold-Mariano test is based on the loss differential

$$d_t = |\varepsilon^1_t| - |\varepsilon^2_t|$$

and the statistic is

$$S = \frac{\bar{d}}{\sqrt{\hat{V}_{\bar{d}}/T}} \sim N(0, 1),$$


where

$$\bar{d} = \frac{1}{T}\sum_{t=t_0}^{T} d_t \quad \text{and} \quad \hat{V}_{\bar{d}} = \gamma_0 + 2\sum_{j=1}^{\infty}\gamma_j, \qquad \gamma_j = \mathrm{cov}(d_t, d_{t-j}).$$

Table 15: The Diebold-Mariano (DM) Test for the composite variance

Date                   2008.12.12    2009.09.30    2010.07.19    2011.05.03
Compared to proxies:
r²t,i                  -2.0276***    -0.5888       -3.4989***    1.3354
RVsub.t,i              -0.7402       -0.3826       -1.1151       0.9259
RKt,i                  0.4761        -3.8525***    0.5305        0.0303

Table 16: The Diebold-Mariano (DM) Test for qt,i

Date                   2008.12.12    2009.09.30    2010.07.19    2011.05.03
Compared to:
z²t,i                  -4.1476***    -1.2114       -1.4253       1.2017

In Table 15 I present the results of the DM test comparing the prediction accuracy of the 15 minute ahead forecasts of ht,i (composite variance) and gt,i (composite variance with no stochastic component) for Intel Corporation stock. In Table 16 I present the results of the DM test comparing the prediction accuracy of qt,i and qt,i = 1.
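The test statistic described in this section can be sketched as follows. The code uses the absolute error loss, as in the thesis, and truncates the infinite sum in the long-run variance at a user-chosen lag `max_lag`; this truncation and the function name are assumptions of the sketch, since the text does not say how the sum was handled in practice.

```python
import math


def diebold_mariano(actual, forecast1, forecast2, max_lag=0):
    """DM statistic for absolute error loss; negative values favour forecast1."""
    d = [abs(a - f1) - abs(a - f2)
         for a, f1, f2 in zip(actual, forecast1, forecast2)]
    n = len(d)
    d_bar = sum(d) / n

    def autocov(j):
        # Sample autocovariance of the loss differential at lag j.
        return sum((d[t] - d_bar) * (d[t - j] - d_bar) for t in range(j, n)) / n

    long_run_var = autocov(0) + 2.0 * sum(autocov(j) for j in range(1, max_lag + 1))
    if long_run_var <= 0.0:
        raise ValueError("non-positive long-run variance estimate")
    return d_bar / math.sqrt(long_run_var / n)


# Made-up example in which forecast1 is always closer to the target.
print(diebold_mariano([0.0] * 4, [1.0, 2.0, 1.0, 2.0], [2.0, 4.0, 2.0, 4.0]))
```

With this sign convention, and taking the forecast with the stochastic component as the first forecast, a negative statistic corresponds to the cases in Tables 15 and 16 where including qt,i gives the smaller loss.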

A.3 Data analysis for Microsoft stock

The data sample for Microsoft stock runs from the 2nd of June, 1990 to the 3rd of May, 2011. In total I have 3000 trading days (6.5 hours each) of observations, each with 4680 5 second log-returns. All in all, I treat 4680*3000=14,040,000 high frequency log-returns as known observations. Around 43% of these 5 second returns are zeros. An additional 30 days of data (from 2011.05.04 to 2011.06.15), also recorded in the data set, serve as out-of-sample observations for the evaluation of the forecasts. For a better illustration of the data I plot the daily returns and provide their descriptive statistics in Figures 16 and 17, along with plots of the autocorrelations in Figure 18.

Figure 16: Daily log returns of the Microsoft stock


Figure 18: Autocorrelation for the squared daily returns of the Microsoft stock

A.4 Results for Microsoft stock

In order to check whether the results obtained for Intel Corporation are robust, I carry out the same analysis for the Microsoft stock. Microsoft and Intel Corporation stocks have very similar trading frequency; in fact, in my dataset these two stocks are the most frequently traded. Therefore, the results should be similar (summarized results for Microsoft can be found in Tables 17 to 25 and Figure 19).

After close investigation of the results, one noticeable difference is the higher significance of the forecasts for the Microsoft stock. Other than that, the results are very similar for both stocks. Therefore the same conclusions can be drawn, and this serves as a robustness check for the results obtained for the Intel Corporation stock.

Table 17: Obtained results for log-linear Realized GARCH(1,1)

      Parameters                            Standard errors
      RV          RVsub       RK            RV         RVsub      RK
ω     0.095615    0.076677    0.638091      0.083816   0.090438   0.126586
β     0.648266    0.610604    0.555942      0.019106   0.019807   0.020644
γ     0.362973    0.397128    0.536165      0.020708   0.021327   0.028291
ξ     -0.713624   -0.638446   -1.499560     0.205688   0.203290   0.185150
ϕ     0.916160    0.928232    0.791914      0.024092   0.023797   0.021673
τ1    -0.029849   -0.034959   -0.032247     0.008251   0.007842   0.005725
τ2    0.091492    0.091127    0.056496      0.005505   0.005292   0.003835
π     0.980808    0.979231    0.980539

      Student’s t                           p-value
      RV          RVsub       RK            RV         RVsub      RK
ω     1.140776    0.847844    5.040761      0.254054   0.396593   0.000000
β     33.930427   30.827763   26.930361     0.000000   0.000000   0.000000
γ     17.528512   18.620863   18.951653     0.000000   0.000000   0.000000
ξ     -3.469458   -3.140562   -8.099162     0.000529   0.001703   0.000000
ϕ     38.027053   39.006406   36.539008     0.000000   0.000000   0.000000
τ1    -3.617575   -4.458042   -5.633122     0.000302   0.000009   0.000000
τ2    16.620458   17.220512   14.732924     0.000000   0.000000   0.000000


Table 18: Log-likelihood for log-linear Realized GARCH(1,1)

Realized measure:              RV           RVsub        RK
Value of the log-likelihood:   12155.3449   12317.2870   13253.8499

Table 19: Obtained results for out of sample forecasts of daily volatility and realized measure over 30 days horizon

Realized measure:            RV                      RVsub                   RK
Forecasts for:               ht         xt           ht         xt           ht         xt
RMSE:                        1.07E-04   1.06E-04     1.24E-04   1.28E-04     7.01E-05   1.00E-04
Compared to out of sample:   RVt                     RVsub.t                 RKt

A.4.2 Multiplicative Component GARCH results

Table 20: GARCH(1,1) results for stochastic component qt,i

     Parameter value   Standard error   Student’s t   p-value
ω    0.059516          0.022729         2.618532      0.014305
α    0.093866          0.021823         4.301170      0.000199
β    0.847342          0.034912         24.270579     0.000000
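A quick way to read Table 20 is through the persistence and the implied long-run level of the stochastic component. The Python sketch below assumes that qt,i follows a standard GARCH(1,1) recursion, so that the persistence is α + β and the unconditional mean is ω / (1 − α − β). These formulas and the variable names are assumptions made for illustration, not output of the estimation code.

```python
# Point estimates and standard errors copied from Table 20.
omega, alpha, beta = 0.059516, 0.093866, 0.847342
se_omega, se_alpha, se_beta = 0.022729, 0.021823, 0.034912

# t ratio = estimate / standard error.
t_ratios = {
    "omega": omega / se_omega,
    "alpha": alpha / se_alpha,
    "beta": beta / se_beta,
}

# Persistence of a GARCH(1,1): the weight carried over from one interval to the next.
persistence = alpha + beta

# Unconditional mean implied by the recursion (only defined for persistence < 1).
unconditional_mean = omega / (1.0 - persistence)

print(f"persistence        = {persistence:.6f}")
print(f"unconditional mean = {unconditional_mean:.4f}")
for name, value in t_ratios.items():
    print(f"t({name}) = {value:.4f}")
```

With the Table 20 estimates the persistence is about 0.94 and the implied unconditional mean is close to one, which is what one would expect for a component that scales returns already standardized by the daily and diurnal components.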

Table 21: RMSE comparison for composite variance

RMSE
Date:         2008.12.12                2009.09.30               2010.07.19               2011.05.03
r²t,i         5.724E-09    7.076E-09    6.3E-11      7.3E-11     3.74E-10*    3.86E-10    2.6E-11***   7.2E-11
RVsub.t,i     1.5E-09      1.431E-09    4.8E-11*     7.3E-11     1.7E-11      1.6E-11     7E-12***     2.8E-11
RKt,i         1.684E-09    1.654E-09    4.7E-11**    6.7E-11     2E-11        1.8E-11     7E-12        7E-11

The Diebold-Mariano (DM) Test

Date          2008.12.12   2009.09.30   2010.07.19   2011.05.03
Compared to proxies:
r²t,i         -1.4504      0.5096       -1.6770*     -3.9020***
RVsub.t,i     -0.3738      -1.8587*     1.3959       -2.1925**
RKt,i         -0.4910      -2.0378**    0.6318       -0.9537
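The mechanics behind a Diebold-Mariano statistic can be sketched in a few lines of Python. The version below is a minimal illustration under stated assumptions: squared-error loss, one-step-ahead forecasts, no autocorrelation correction of the variance, and a two-sided normal p-value. The exact variant used for the tables is not reproduced here, the function name is hypothetical, and the data are synthetic.

```python
import math

import numpy as np


def diebold_mariano(proxy, forecast_a, forecast_b):
    """Return the DM statistic and a two-sided normal p-value.

    Assumes squared-error loss and one-step-ahead forecasts, so the loss
    differential is treated as serially uncorrelated (no HAC correction).
    """
    proxy = np.asarray(proxy, dtype=float)
    loss_a = (proxy - np.asarray(forecast_a, dtype=float)) ** 2
    loss_b = (proxy - np.asarray(forecast_b, dtype=float)) ** 2
    d = loss_a - loss_b  # loss differential per interval
    n = d.size
    dm = d.mean() / math.sqrt(d.var(ddof=1) / n)
    p_value = math.erfc(abs(dm) / math.sqrt(2.0))
    return dm, p_value


# Synthetic illustration only: these numbers are not from the thesis data.
rng = np.random.default_rng(0)
proxy = rng.chisquare(1, size=500)
forecast_a = proxy + 0.1 * rng.standard_normal(500)  # small forecast errors
forecast_b = proxy + 0.5 * rng.standard_normal(500)  # larger forecast errors
dm_stat, p_val = diebold_mariano(proxy, forecast_a, forecast_b)
print(f"DM = {dm_stat:.3f}, p-value = {p_val:.4f}")
```

With this sign convention a negative statistic means that the first forecast has the lower average loss, and swapping the two forecasts flips the sign of the statistic.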

Table 22: MSE comparison for the intra-day component

MSE
Date:           2008.12.12             2009.09.30             2010.07.19              2011.05.03
Forecasts of:   qt,i      qt,i = 1     qt,i      qt,i = 1     qt,i       qt,i = 1     qt,i       qt,i = 1
Compared to proxies:
z²t,i           2.6926    3.0967       2.4829    2.8399       1.4256***  1.4830       1.2033***  1.4826

The Diebold-Mariano (DM) Test for qt,i

Date            2008.12.12   2009.09.30   2010.07.19   2011.05.03
Compared to:
z²t,i           -0.0314      -0.0173      -2.383***    -5.6998***

Table 23: LIK comparison for composite variance

LIK
Date:     2008.12.12             2009.09.30             2010.07.19             2011.05.03
r²t,i     247.4329   244.0970    293.3140   289.6178    285.0877   282.9899    310.2033   302.2858

Table 24: LIK comparison for the intra-day component

LIK
Date:           2008.12.12             2009.09.30             2010.07.19             2011.05.03
Forecasts of:   qt,i      qt,i = 1     qt,i      qt,i = 1     qt,i      qt,i = 1     qt,i      qt,i = 1
Compared to proxies: z2

Table 25: MME(u) comparison for composite variance

MME(u)
Date:         2008.12.12             2009.09.30             2010.07.19              2011.05.03
r²t,i         1.89E-03   2.09E-03    9.02E-04   1.01E-03    9.25E-04*  9.35E-04     3.15E-04   2.38E-04***
RVsub.t,i     3.22E-0    3.53E-03    1.40E-03*  1.61E-03    6.31E-04   6.05E-04     7.04E-04   3.02E-04***

Contents

1 Introduction

2 Literature review

3 Econometric methods

3.1 Multiplicative Component GARCH

3.2 Realized GARCH

4 Data

5 Results

5.1 Realized GARCH modelling results

      Parameters                    Standard errors              Student’s t                   p-value
      RV        RVsub     RK        RV       RVsub    RK         RV        RVsub     RK        RV       RVsub    RK
ω     -0.1191   -0.1490   0.2013    0.0824   0.0933   0.1234     -1.4447   -1.5973   1.6306    0.1486   0.1103   0.1031

β

0.6271

0.5768

0.5361

0.0210

0.0221