
Asymmetric and Skewed GARCH modeling GARCH models in a GAS framework using the AEPD with an application in the Expected Shortfall


Amsterdam School of Economics

Faculty of Economics and Business

Asymmetric and Skewed GARCH modeling

GARCH models in a GAS framework using the AEPD with an application in the Expected Shortfall

Marwin W. Booleman (10462147)
Master of Econometrics, Econometrics track

Supervisor: Andreas Rapp
Second reader: Kasia Lasak
Date: August 15, 2018

Abstract:

This paper derives the GARCH model in the generalized autoregressive score (GAS) framework. Instead of the standard Normal and Student's t distributions, it uses the asymmetric exponential power distribution (AEPD). The GAS framework uses more information from the density than just its moments, which makes it a better candidate for financial forecasting than the standard GARCH model. The models compared are the Normal GARCH, t-GARCH, AEPD-GARCH, Normal GAS (equivalent to Normal GARCH), Student's t GAS and AEPD GAS models. All models are fitted by maximum likelihood estimation so that their performance can be compared. The AEPD GAS model has the best fit in terms of the lowest AIC, BIC and AICC. Next, predictive strength is tested by forecasting the variance and, from that, the expected shortfall. Here the AEPD GAS model does not perform best. The AEPD GARCH model performs best for the one-day-ahead predictions, and the t-GARCH model performs best for the five-day-ahead predictions. This is likely due to overfitting or to the model placing too much density in its tails.


Statement of Originality

This document is written by student Marwin Booleman, who declares to take full responsibility for the contents of this document. I declare that the text and the work presented in this document are original and that no sources other than those mentioned in the text and its references have been used in creating it. The Faculty of Economics and Business is responsible solely for the supervision of completion of the work, not for the contents.


Contents

1 Introduction
2 Theoretical Framework
2.1 GARCH
2.2 AEPD
2.3 AEPD-GARCH
2.4 GAS
2.5 AEPD-GAS-GARCH
2.6 Expected Shortfall
3 Research Design
3.1 Maximum Likelihood Estimation
3.2 Expected Shortfall prediction
3.3 Data
4 Maximum Likelihood Estimation Results and Analyses
5 Expected Shortfall prediction Results and Analyses
6 Conclusions
References


1 Introduction

Which model best fits and predicts financial data has been debated for many years. The literature started with Autoregressive Moving Average models (ARMA models). It then moved on to the ARCH model of Engle (1982), which Bollerslev (1986) generalized into the GARCH model. Yet these models still mostly assume normally distributed errors. As research by, among others, Hansen (1994), Campbell et al. (1997) and Bao et al. (2017) shows, the normal distribution is too simple: it cannot sufficiently account for asymmetry and skewness.

More recently, many new distributions have come into use, mainly because of the increase in computing power over the last few decades and the abundance of available data. Examples include the Student's t distribution used by Bollerslev (1986), the skewed t distribution used by Hansen (1994), the generalized error distribution, the asymmetric power distribution of Komunjer (2007) and the asymmetric exponential power distribution (AEPD) of Zhu and Zinde-Walsh (2009). Each distribution can capture more skewness, asymmetry or fat-tailedness than the previous one. Several of these distributions, such as the generalized error distribution and the asymmetric power distribution, are special cases of the AEPD of Zhu and Zinde-Walsh (2009), which is therefore the distribution of choice for this paper. This brings us to the first research question of this thesis: how does the AEPD perform compared to the Normal and t distributions?

Besides the distribution, many models could be used as well: the models mentioned above, but also the DCC model of Engle (2002). More recently, Creal et al. (2013) published a paper on generalized autoregressive score (GAS) models. This framework encompasses these models. According to Creal et al. (2013), the use of the score of the assumed density distinguishes the framework from other observation-driven models in the literature, because it depends on the complete density and not only on the moments of the observations. This paper tests this by deriving the closed form of the AEPD-GAS model, which, to the author's knowledge, has not yet appeared in the literature. This is the basis for the second research question: does the GAS framework improve the results?

This paper examines whether a flexible and general distribution, combined with a general framework, gives good results. To find out, the GARCH, t-GARCH, AEPD-GARCH, normal GAS, t-GAS and AEPD-GAS models are fitted to daily financial data. Besides fitting the models to the data, a more practical test assesses their predictive power. This covers both the j-step-ahead variance and a risk measure, the expected shortfall.

The paper is structured as follows. Section 2 covers the different models and the derivation of the AEPD-GAS model. Section 3 describes the research design: the maximum likelihood estimation, the design of the expected shortfall prediction test and, lastly, the data. Sections 4 and 5 present and analyse the results of the maximum likelihood estimation and of the expected shortfall forecasting. Finally, Section 6 contains the conclusion.

2 Theoretical Framework

This section explains the distributions and models used and works out the new model. First, the Generalized Autoregressive Conditional Heteroskedasticity model (GARCH model) is presented. Second, the Asymmetric Exponential Power Distribution (AEPD) is introduced. The third part combines the two to form the AEPD-GARCH model. The fourth part discusses the Generalized Autoregressive Score model (GAS model) and its application to GARCH models, most importantly the Student's t and normally distributed GAS GARCH models. The fifth part unites the models and the distribution and derives the GARCH model obtained when the GAS approach is applied with the AEPD as its distribution. Lastly, the Expected Shortfall (ES) is described for all the different models and distributions.

2.1 GARCH

The model

Engle (1982) introduced the Autoregressive Conditional Heteroscedastic model (ARCH(m) model, with m the number of lagged periods used). Before his research, models mostly conditioned the expected return on the past. He instead proposed a model that conditions the variance on the past. The conditioning rests on the following assumption. When the return $r_{t-i}$ of a stock has deviated strongly in absolute value from its mean return $\mu_{t-i}$, that is, when $|a_{t-i}| = |r_{t-i} - \mu_{t-i}|$ is large, the volatility is also expected to be higher for that period, and thus for the next periods when forecasting. The mean $\mu_{t-j}$ is assumed to be zero for all $t$ and $j$, so that $a_t$ equals the return $r_t$. Because the variance cannot be negative, the model is a linear function with positive coefficients. Following Tsay (2005, p. 103), it is given by:

$$a_t = \sigma_t\varepsilon_t, \qquad \sigma_t^2 = \alpha_0 + \sum_{i=1}^{m}\alpha_i a_{t-i}^2, \qquad \varepsilon_t \sim \text{i.i.d.}(0, 1) \tag{1}$$


Bollerslev (1986) improved on Engle's research. He noticed the similarity to the Autoregressive Moving Average models (ARMA(p, q) models) used in time series analysis. Tsay (2005, p. 58) gives the general ARMA(p, q) form:

$$r_t = \phi_0 + \sum_{i=1}^{p}\phi_i r_{t-i} + \varepsilon_t - \sum_{j=1}^{q}\beta_j\varepsilon_{t-j}, \qquad \varepsilon_t \text{ white noise} \tag{2}$$

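Drawing on the similarity in functional form, he showed the GARCH(m, s) model to be:

$$a_t = \sigma_t\varepsilon_t, \qquad \sigma_t^2 = \phi_0 + \sum_{i=1}^{m}\phi_i a_{t-i}^2 + \sum_{j=1}^{s}\beta_j\sigma_{t-j}^2, \qquad \varepsilon_t \sim \text{i.i.d.}(0, 1) \tag{3}$$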

The usual assumption on $\varepsilon_t \sim \text{i.i.d.}(0, 1)$ is that it is independent and identically distributed, with a Gaussian distribution. If fatter tails are needed, the standardized Student's t distribution is often used. These models are mostly estimated by maximum likelihood (ML) estimation, which is discussed in Section 3.1.
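The sketch below illustrates recursion (3) for m = s = 1. It filters the conditional variances of a GARCH(1,1) model from a series of zero-mean returns. This is not the thesis' Matlab code. The function name `garch11_variances` is an assumption of this illustration. So is the default of starting the recursion at the sample variance of the returns, the choice later made in Section 3.1.

```python
from statistics import variance


def garch11_variances(returns, phi0, phi1, beta1, sigma2_init=None):
    """Conditional variances sigma_t^2 of a GARCH(1,1) model, recursion (3) with m = s = 1.

    sigma2[t] = phi0 + phi1 * a[t-1]**2 + beta1 * sigma2[t-1], where a[t] are the
    zero-mean returns. The first variance defaults to the sample variance of the returns.
    """
    if phi0 <= 0 or phi1 < 0 or beta1 < 0:
        raise ValueError("phi0 > 0 and phi1, beta1 >= 0 keep the variances positive")
    sigma2 = [variance(returns) if sigma2_init is None else sigma2_init]
    for a_prev in returns[:-1]:  # sigma_t^2 only uses information up to t-1
        sigma2.append(phi0 + phi1 * a_prev ** 2 + beta1 * sigma2[-1])
    return sigma2


# Illustrative (made-up) daily percentage returns.
print(garch11_variances([0.5, -1.2, 0.3, 2.0, -0.7], phi0=0.01, phi1=0.08, beta1=0.9, sigma2_init=1.0))
```

The restrictions φ0 > 0 and φ1, β1 ≥ 0 are sufficient for positive variances. Covariance stationarity additionally requires φ1 + β1 < 1.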

Distributions

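The pdfs of the normal and the standardized Student's t distribution are given by:

$$f(\varepsilon_t \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\left(-\frac{(\varepsilon_t - \mu)^2}{2\sigma^2}\right) \tag{4}$$

$$f(\varepsilon_t \mid \nu) = \frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\Gamma\left(\frac{\nu}{2}\right)\sqrt{(\nu-2)\pi}}\left(1 + \frac{\varepsilon_t^2}{\nu-2}\right)^{-\frac{\nu+1}{2}}, \qquad \nu > 2 \tag{5}$$

where $\Gamma(x) = \int_0^\infty y^{x-1}e^{-y}\,dy$ is the gamma function.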

For the GARCH model with $\varepsilon_t \sim N(0, 1)$, the pdf at time $t$ given the information set $F_{t-1}$ is:

$$f(a_t, \sigma_t \mid F_{t-1}) = \frac{1}{\sqrt{2\pi\sigma_t^2}}\exp\left(-\frac{a_t^2}{2\sigma_t^2}\right) \tag{6}$$

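If $\tilde\varepsilon_t \sim t(\nu)$ and $\varepsilon_t = \tilde\varepsilon_t\sqrt{(\nu-2)/\nu}$, then $\varepsilon_t$ has a standardized Student's t distribution with mean zero and variance one. The pdf is given by:

$$f(a_t, \sigma_t \mid F_{t-1}) = \frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\Gamma\left(\frac{\nu}{2}\right)\sqrt{(\nu-2)\pi}}\frac{1}{\sqrt{\sigma_t^2}}\left(1 + \frac{a_t^2}{(\nu-2)\sigma_t^2}\right)^{-\frac{\nu+1}{2}} \tag{7}$$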

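The factor $1/\sqrt{\sigma_t^2}$ outside the exponent (or the brackets) comes from the Jacobian of the transformation from $a_t = \sigma_t\varepsilon_t$ to $\varepsilon_t = a_t/\sigma_t$:

$$f(a_t \mid F_{t-1}) = f(\varepsilon_t \mid F_{t-1})\left|\frac{\partial\varepsilon_t}{\partial a_t}\right| = f(\varepsilon_t \mid F_{t-1})\frac{1}{\sqrt{\sigma_t^2}}$$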
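The Jacobian factor can be checked numerically. The sketch below integrates density (7) with a plain midpoint rule. The helper names and integration settings are illustrative assumptions. If the factor $1/\sqrt{\sigma_t^2}$ is correct, the density integrates to one and $a_t$ has mean zero and variance $\sigma_t^2$.

```python
import math


def std_t_logpdf_a(a_t, sigma2_t, nu):
    """Log of density (7) for a_t = sigma_t * eps_t, eps_t standardized Student's t."""
    if nu <= 2:
        raise ValueError("the standardized t distribution requires nu > 2")
    log_const = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
                 - 0.5 * math.log((nu - 2) * math.pi))
    # -0.5 * log(sigma2_t) is the log of the Jacobian 1 / sqrt(sigma_t^2)
    return (log_const - 0.5 * math.log(sigma2_t)
            - (nu + 1) / 2 * math.log1p(a_t ** 2 / ((nu - 2) * sigma2_t)))


def midpoint_moments(sigma2_t, nu, half_width=500.0, n=200_000):
    """Midpoint-rule integrals of f, a * f and a^2 * f over [-half_width, half_width]."""
    h = 2.0 * half_width / n
    m0 = m1 = m2 = 0.0
    for k in range(n):
        a = -half_width + (k + 0.5) * h
        w = math.exp(std_t_logpdf_a(a, sigma2_t, nu)) * h
        m0 += w
        m1 += a * w
        m2 += a * a * w
    return m0, m1, m2


print(midpoint_moments(sigma2_t=2.0, nu=6.0))  # approximately (1, 0, 2)
```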

2.2 AEPD

The standardized Student's t distribution can account for the fat tails of financial data that matter for risk management. However, as Hansen (1994), Campbell et al. (1997) and Bao et al. (2017) note, there is still a need for asymmetry. For this purpose the asymmetric exponential power distribution of Zhu and Zinde-Walsh (2009) is chosen.

Zhu and Zinde-Walsh (2009) show that this class accommodates both asymmetry and fat tails through easily interpretable location, scale and shape parameters. They also derive closed-form expressions for the moments, the ES and the value at risk (VaR). Finally, they show that the maximum likelihood estimator is consistent and asymptotically normal, and they give closed-form expressions for the information matrix. These characteristics make this distribution a strong candidate for fitting daily financial data.

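The density function of the AEPD is given by:

$$f_Y(y) = \begin{cases} \dfrac{\alpha}{\alpha^*}\dfrac{1}{\sigma}K_{EP}(p_1)\exp\left(-\dfrac{1}{p_1}\left|\dfrac{y-\mu}{2\alpha^*\sigma}\right|^{p_1}\right), & y \le \mu \\[2ex] \dfrac{1-\alpha}{1-\alpha^*}\dfrac{1}{\sigma}K_{EP}(p_2)\exp\left(-\dfrac{1}{p_2}\left|\dfrac{y-\mu}{2(1-\alpha^*)\sigma}\right|^{p_2}\right), & y > \mu \end{cases} \tag{8}$$

$$K_{EP}(p_i) = \left[2p_i^{1/p_i}\Gamma\left(1 + \frac{1}{p_i}\right)\right]^{-1}, \qquad i = 1, 2 \tag{9}$$

$$\alpha^* = \frac{\alpha K_{EP}(p_1)}{\alpha K_{EP}(p_1) + (1-\alpha)K_{EP}(p_2)} \tag{10}$$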

Here µ is the location parameter and σ the scale parameter, as in the Gaussian case. The parameter α ∈ (0, 1) controls the skewness of the distribution. Zhu and Zinde-Walsh (2009) show that α < 1/2 gives skewness to the right and α > 1/2 skewness to the left. The parameter $p_1 > 0$ governs the left tail and $p_2 > 0$ the right tail. The closer $p_i$ is to zero, the heavier that tail and the more leptokurtic the distribution. As $p_i$ approaches infinity, that side of the distribution reduces to the corresponding side of a uniform distribution. The three parameters $(\alpha, p_1, p_2)$ therefore control the shape of the distribution. The parameter $\alpha^*$ adjusts the scale of the left and right parts so that the density remains continuous when the shape parameters change. The function $K_{EP}(p_i)$ ensures that $\int_{-\infty}^{\infty}f_Y(y)\,dy = 1$, making $f_Y$ a proper density function. Γ(x) is again the gamma function from the previous section. Figure 1 shows the AEPD density for different parameter values.


Figure 1: Multiple AEPD density plots for different values of α, $p_1$ and $p_2$, from Zhu and Zinde-Walsh (2009)

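As the densities show, the distribution is continuous everywhere and unimodal with mode µ, but not differentiable at µ. In general the distribution is asymmetric, so its mean differs from its mode µ. For µ = 0 and σ = 1, its mean (ω) and variance (δ²) are given by:

$$\omega = \frac{1}{B}\left[(1-\alpha)^2\frac{p_2\Gamma(2/p_2)}{\Gamma^2(1/p_2)} - \alpha^2\frac{p_1\Gamma(2/p_1)}{\Gamma^2(1/p_1)}\right] \tag{11}$$

$$\delta^2 = \frac{1}{B^2}\left[(1-\alpha)^3\frac{p_2^2\Gamma(3/p_2)}{\Gamma^3(1/p_2)} + \alpha^3\frac{p_1^2\Gamma(3/p_1)}{\Gamma^3(1/p_1)}\right] - \omega^2 \tag{12}$$

with

$$B \equiv \frac{\alpha}{\alpha^*}K_{EP}(p_1) = \frac{1-\alpha}{1-\alpha^*}K_{EP}(p_2) = \alpha K_{EP}(p_1) + (1-\alpha)K_{EP}(p_2) \tag{13}$$

and $\Gamma^y(x) = (\Gamma(x))^y$.

B is mainly a notational device that makes the expressions for ω and δ² easier to read. By (8) and (13), it also equals σ times the height of the density at its mode µ.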
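The density (8)-(10) and the moments (11)-(13) translate directly into code. The sketch below evaluates them for µ = 0 and σ = 1. The function names are illustrative and not from the thesis. The code uses the identity $(\alpha/\alpha^*)K_{EP}(p_1) = ((1-\alpha)/(1-\alpha^*))K_{EP}(p_2) = B$ from (13), and it compares ω and δ² with a numerical integration of the density. For α = 1/2 and $p_1 = p_2 = 2$ it recovers the standard normal, with ω = 0 and δ² = 1.

```python
import math


def k_ep(p):
    """K_EP(p) from (9)."""
    return 1.0 / (2.0 * p ** (1.0 / p) * math.gamma(1.0 + 1.0 / p))


def aepd_pdf(y, alpha, p1, p2, mu=0.0, sigma=1.0):
    """AEPD density (8); both branches share the prefactor B / sigma, with B from (13)."""
    b = alpha * k_ep(p1) + (1.0 - alpha) * k_ep(p2)
    a_star = alpha * k_ep(p1) / b                       # alpha* from (10)
    u = (y - mu) / sigma
    if y <= mu:
        return b / sigma * math.exp(-abs(u / (2.0 * a_star)) ** p1 / p1)
    return b / sigma * math.exp(-abs(u / (2.0 * (1.0 - a_star))) ** p2 / p2)


def aepd_mean_var(alpha, p1, p2):
    """Mean omega (11) and variance delta^2 (12) of the AEPD with mu = 0, sigma = 1."""
    g = math.gamma
    b = alpha * k_ep(p1) + (1.0 - alpha) * k_ep(p2)
    omega = ((1 - alpha) ** 2 * p2 * g(2 / p2) / g(1 / p2) ** 2
             - alpha ** 2 * p1 * g(2 / p1) / g(1 / p1) ** 2) / b
    second = ((1 - alpha) ** 3 * p2 ** 2 * g(3 / p2) / g(1 / p2) ** 3
              + alpha ** 3 * p1 ** 2 * g(3 / p1) / g(1 / p1) ** 3) / b ** 2
    return omega, second - omega ** 2


def numerical_mean_var(alpha, p1, p2, lo=-50.0, hi=50.0, n=200_000):
    """Midpoint-rule mass, mean and variance of the density, for comparison."""
    h = (hi - lo) / n
    ys = [lo + (k + 0.5) * h for k in range(n)]
    ws = [aepd_pdf(y, alpha, p1, p2) * h for y in ys]
    mass = sum(ws)
    mean = sum(w * y for w, y in zip(ws, ys)) / mass
    var = sum(w * (y - mean) ** 2 for w, y in zip(ws, ys)) / mass
    return mass, mean, var


print(aepd_mean_var(0.45, 1.2, 1.8))
print(numerical_mean_var(0.45, 1.2, 1.8))
```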

Another useful aspect of this distribution is that for certain parameter values it simplifies to other distributions. The main examples are the normal distribution and the asymmetric power distribution (APD) or skewed exponential power distribution (SEPD) of Fernández et al. (1995), Komunjer (2007) and Theodossiou (2015). Table 1 lists important special cases. Because the maximum likelihood estimator (MLE) is asymptotically normal, a significance test can be used to check whether any of these distributions is the true distribution instead of the AEPD.

| Distribution | Skewness parameter | Tail parameters |
|---|---|---|
| Normal | α = 1/2 | p1 = p2 = 2 |
| Two-piece normal | α ≠ 1/2 | p1 = p2 = 2 |
| APD/SEPD | α ∈ (0, 1) | p1 = p2 = p, p > 0 |
| Generalized Error Distribution (GED) | α = 1/2 | p1 = p2 = p, p > 0 |
| Laplace | α = 1/2 | p1 = p2 = 1 |
| Asymmetric Laplace | α ≠ 1/2 | p1 = p2 = 1 |

Table 1: Combinations of AEPD parameter values resulting in different well-known distributions

To use the AEPD in the GAS framework, closed forms of the score function and of the Hessian, or information matrix, are needed. As Zhu and Zinde-Walsh (2009) show, the score function of the AEPD is:

$$f = f(a_t; \theta), \qquad \theta = (\alpha, p_1, p_2, \mu, \sigma)$$

$$L \equiv L(a_t; \theta) \equiv \frac{\Gamma(1 + 1/p_1)\,|\mu - a_t|}{\alpha\sigma}\,I(a_t < \mu) \tag{14}$$

$$R \equiv R(a_t; \theta) \equiv \frac{\Gamma(1 + 1/p_2)\,|a_t - \mu|}{(1-\alpha)\sigma}\,I(a_t > \mu) \tag{15}$$

$$\ln f = -\ln(\sigma) - [L(a_t; \theta)]^{p_1} - [R(a_t; \theta)]^{p_2} \tag{16}$$

$$\frac{\partial\ln f}{\partial\alpha} = \frac{p_1}{\alpha}L^{p_1} - \frac{p_2}{1-\alpha}R^{p_2} \tag{17}$$

$$\frac{\partial\ln f}{\partial p_1} = \left[\frac{1}{p_1}\psi\left(1 + \frac{1}{p_1}\right) - \ln(L)\right]L^{p_1} \tag{18}$$

$$\frac{\partial\ln f}{\partial p_2} = \left[\frac{1}{p_2}\psi\left(1 + \frac{1}{p_2}\right) - \ln(R)\right]R^{p_2} \tag{19}$$

$$\frac{\partial\ln f}{\partial\mu} = -\frac{\Gamma(1/p_1)}{\alpha\sigma}L^{p_1-1} + \frac{\Gamma(1/p_2)}{(1-\alpha)\sigma}R^{p_2-1} \tag{20}$$

$$\frac{\partial\ln f}{\partial\sigma} = \frac{p_1}{\sigma}L^{p_1} + \frac{p_2}{\sigma}R^{p_2} - \frac{1}{\sigma} \tag{21}$$

Here $\psi(x) = \Gamma'(x)/\Gamma(x)$ is the digamma function. For the Hessian, the polygamma functions $\psi^{(k)}(x) = (-1)^{k+1}k!\sum_{i=0}^{\infty}\frac{1}{(x+i)^{k+1}}$, $k \ge 1$, are used.

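The information matrix is given by Zhu and Zinde-Walsh (2009), where $\lambda_{ij} \equiv E\left[(\partial\ln f(a_t; \theta_0)/\partial\theta_i)(\partial\ln f(a_t; \theta_0)/\partial\theta_j)\right]$ and $\lambda_{ij} = \lambda_{ji}$:

$$\begin{aligned}
\lambda_{11} &= \frac{p_1+1}{\alpha} + \frac{p_2+1}{1-\alpha}, & \lambda_{12} &= -\frac{1}{p_1}, & \lambda_{13} &= \frac{1}{p_2},\\
\lambda_{14} &= -\frac{1}{\sigma}\left(\frac{p_1}{\alpha} + \frac{p_2}{1-\alpha}\right), & \lambda_{15} &= \frac{p_1 - p_2}{\sigma},\\
\lambda_{22} &= \frac{\alpha}{p_1^3}\left(1 + \frac{1}{p_1}\right)\psi'\left(1 + \frac{1}{p_1}\right), & \lambda_{23} &= 0,\\
\lambda_{24} &= \frac{1}{\sigma p_1}\left[\psi(2) - \psi\left(1 + \frac{1}{p_1}\right)\right], & \lambda_{25} &= -\frac{\alpha}{\sigma p_1},\\
\lambda_{33} &= \frac{1-\alpha}{p_2^3}\left(1 + \frac{1}{p_2}\right)\psi'\left(1 + \frac{1}{p_2}\right), & \lambda_{34} &= -\frac{1}{\sigma p_2}\left[\psi(2) - \psi\left(1 + \frac{1}{p_2}\right)\right],\\
\lambda_{35} &= -\frac{1-\alpha}{\sigma p_2}, & \lambda_{44} &= \frac{\Gamma(1/p_1)\Gamma(2 - 1/p_1)}{\alpha\sigma^2} + \frac{\Gamma(1/p_2)\Gamma(2 - 1/p_2)}{(1-\alpha)\sigma^2},\\
\lambda_{45} &= \frac{p_2 - p_1}{\sigma^2}, & \lambda_{55} &= \frac{\alpha p_1 + (1-\alpha)p_2}{\sigma^2}
\end{aligned} \tag{22}$$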

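These expressions are obtained through a rescaled reparametrization of (8):

$$f_{AEPD}(y \mid \theta) = \begin{cases} \dfrac{1}{\sigma}\exp\left(-\dfrac{1}{p_1}\left|\dfrac{y-\mu}{2\alpha\sigma K_{EP}(p_1)}\right|^{p_1}\right), & y \le \mu \\[2ex] \dfrac{1}{\sigma}\exp\left(-\dfrac{1}{p_2}\left|\dfrac{y-\mu}{2(1-\alpha)\sigma K_{EP}(p_2)}\right|^{p_2}\right), & y > \mu \end{cases} \tag{23}$$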
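The score expressions (17)-(21) can be verified against finite differences of the log-density (16). The sketch below is an illustrative check, not code from the thesis. Its helper names are assumptions, as is the step size. Python's standard library has no digamma function, so the sketch approximates ψ with a recurrence plus asymptotic series; that choice is also an assumption of this illustration.

```python
import math


def digamma(x):
    """psi(x) for x > 0: upward recurrence, then the asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    return (result + math.log(x) - 0.5 / x
            - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252)))


def l_and_r(y, alpha, p1, p2, mu, sigma):
    """L and R from (14)-(15); at most one of them is non-zero."""
    L = math.gamma(1 + 1 / p1) * (mu - y) / (alpha * sigma) if y < mu else 0.0
    R = math.gamma(1 + 1 / p2) * (y - mu) / ((1 - alpha) * sigma) if y > mu else 0.0
    return L, R


def log_f(theta, y):
    """Log-density (16) of the reparametrized AEPD (23), theta = (alpha, p1, p2, mu, sigma)."""
    alpha, p1, p2, mu, sigma = theta
    L, R = l_and_r(y, *theta)
    return -math.log(sigma) - L ** p1 - R ** p2


def score(theta, y):
    """Analytic score (17)-(21)."""
    alpha, p1, p2, mu, sigma = theta
    L, R = l_and_r(y, *theta)
    d_alpha = p1 / alpha * L ** p1 - p2 / (1 - alpha) * R ** p2
    d_p1 = (digamma(1 + 1 / p1) / p1 - math.log(L)) * L ** p1 if L > 0 else 0.0
    d_p2 = (digamma(1 + 1 / p2) / p2 - math.log(R)) * R ** p2 if R > 0 else 0.0
    d_mu = 0.0
    if L > 0:
        d_mu -= math.gamma(1 / p1) / (alpha * sigma) * L ** (p1 - 1)
    if R > 0:
        d_mu += math.gamma(1 / p2) / ((1 - alpha) * sigma) * R ** (p2 - 1)
    d_sigma = p1 / sigma * L ** p1 + p2 / sigma * R ** p2 - 1 / sigma
    return [d_alpha, d_p1, d_p2, d_mu, d_sigma]


def numerical_score(theta, y, h=1e-6):
    """Central finite differences of log_f in each parameter."""
    grad = []
    for i in range(len(theta)):
        up, down = list(theta), list(theta)
        up[i] += h
        down[i] -= h
        grad.append((log_f(up, y) - log_f(down, y)) / (2 * h))
    return grad


theta = (0.4, 1.5, 2.5, 0.1, 1.2)
for y in (-0.7, 1.3):
    print(y, score(theta, y), numerical_score(theta, y))
```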


2.3 AEPD-GARCH

With the AEPD and the GARCH model introduced, the two can now be combined. The standard GARCH(m, s) formulation still holds, now with AEPD-distributed errors:

$$a_t = \sigma_t\varepsilon_t, \qquad \sigma_t^2 = \phi_0 + \sum_{i=1}^{m}\phi_i a_{t-i}^2 + \sum_{j=1}^{s}\beta_j\sigma_{t-j}^2, \qquad \varepsilon_t \overset{\text{i.i.d.}}{\sim} AEPD(\alpha, p_1, p_2, 0, 1) \tag{24}$$

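The general AEPD density can be expressed as $\frac{1}{\sigma}f_{AEPD}\left(\frac{u-\mu}{\sigma}\right)$. Since µ = 0 and σ = 1 in this version, the pdf is given by:

$$f_{AEPD}(y) = \begin{cases} \dfrac{\alpha}{\alpha^*}K_{EP}(p_1)\exp\left(-\dfrac{1}{p_1}\left|\dfrac{y}{2\alpha^*}\right|^{p_1}\right), & y \le 0 \\[2ex] \dfrac{1-\alpha}{1-\alpha^*}K_{EP}(p_2)\exp\left(-\dfrac{1}{p_2}\left|\dfrac{y}{2(1-\alpha^*)}\right|^{p_2}\right), & y > 0 \end{cases} \tag{25}$$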

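Because µ and σ are almost never equal to the expected value and variance of the AEPD, $\varepsilon_t$ needs to be standardized. This is done through a random variable Z with density (25), using ω and δ from (11) and (12), which do not depend on µ and σ:

$$\varepsilon_t = \sigma\frac{Z - \omega}{\delta}, \qquad \sigma = 1 \tag{26}$$

$$a_t = \sigma_t\varepsilon_t \;\Rightarrow\; Z = \frac{\delta a_t}{\sigma_t} + \omega \tag{27}$$

$$f_{a_t}(a_t) = \frac{\delta}{\sigma_t}f_Y\left(\omega + \frac{\delta a_t}{\sigma_t}\right) \tag{28}$$

Given the information set available at time $t$ ($F_{t-1}$), the pdf is therefore:

$$f_{AEPD}(a_t, \sigma_t \mid F_{t-1}) = \begin{cases} \dfrac{\delta}{\sigma_t}\dfrac{\alpha}{\alpha^*}K_{EP}(p_1)\exp\left(-\dfrac{1}{p_1}\left|\dfrac{\omega + \delta a_t/\sigma_t}{2\alpha^*}\right|^{p_1}\right), & \omega + \dfrac{\delta a_t}{\sigma_t} \le 0 \\[2ex] \dfrac{\delta}{\sigma_t}\dfrac{1-\alpha}{1-\alpha^*}K_{EP}(p_2)\exp\left(-\dfrac{1}{p_2}\left|\dfrac{\omega + \delta a_t/\sigma_t}{2(1-\alpha^*)}\right|^{p_2}\right), & \omega + \dfrac{\delta a_t}{\sigma_t} > 0 \end{cases} \tag{29}$$

2.4 GAS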

Before turning to the main model of interest, the AEPD GARCH model in the GAS framework, the generalized autoregressive score model itself is presented. Creal et al. (2013) build on existing observation-driven models. These models introduce time variation in the parameters by letting them depend on their own lagged values and on exogenous (lagged) variables. As a result, the models can be estimated by maximum likelihood. Examples of such models are the ARCH model of Engle (1982) and the GARCH model of Bollerslev (1986). An important characteristic of the GAS model is that it encompasses these models, as well as others such as the dynamic conditional correlation (DCC) model of Engle (2002).

General specification

The notation of Creal et al. (2013) is used to explain the GAS model; later, the equations are written in the notation of Tsay (2005). Let the vector $y_t$ be the dependent variable, $f_t$ the time-varying parameter vector and $x_t$ a vector of exogenous variables, all at time $t$. Further, let $Y^t = \{y_1, \dots, y_t\}$, $F^t = \{f_0, f_1, \dots, f_t\}$ and $X^t = \{x_1, \dots, x_t\}$, and let θ be a vector of static parameters. At time $t$ the available information set is $\{f_t, \mathcal{F}_t\}$, with:

$$\mathcal{F}_t = \{Y^{t-1}, F^{t-1}, X^t\}, \qquad t = 1, \dots, n$$

It is assumed that $y_t$ is generated by the observation density:

$$y_t \sim p(y_t \mid f_t, \mathcal{F}_t; \theta)$$

The mechanism for updating the time-varying parameter closely resembles an autoregressive updating equation, such as the second equation in (3), and is given by:

$$f_{t+1} = \omega + \sum_{i=1}^{p}A_i s_{t-i+1} + \sum_{j=1}^{q}B_j f_{t-j+1} \tag{30}$$

where ω is a vector of constants. Since this paper focuses on the univariate case, ω is a scalar. $A_i$ and $B_j$ are coefficient matrices; in the univariate case they are scalars, which can be collected in vectors of dimensions 1 × p and 1 × q. The term $s_t = s_t(y_t, f_t, \mathcal{F}_t; \theta)$ is a function of past data.

The chosen approach, which also explains why the model's name includes the word score, makes $s_t$ a function of the score of the assumed observation density:

$$s_t = S_t\nabla_t, \qquad \nabla_t = \frac{\partial\ln p(y_t \mid f_t, \mathcal{F}_t; \theta)}{\partial f_t}, \qquad S_t = S(t, f_t, \mathcal{F}_t; \theta) \tag{31}$$

where S(·) is a matrix function that can be chosen. A useful option is the inverse of the variance of the score:

$$S_t = \mathcal{I}_{t|t-1}^{-1}, \qquad \mathcal{I}_{t|t-1} = E_{t-1}\left[\nabla_t\nabla_t'\right] \tag{32}$$

where $E_{t-1}$ denotes the expectation with respect to the observation density $p(y_t \mid f_t, \mathcal{F}_t; \theta)$.


Special cases of GAS models

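The first special case shows how the GAS framework encompasses the Gaussian GARCH model. Consider $a_t = \sigma_t\varepsilon_t$ with $\varepsilon_t$ normally distributed with mean zero and variance one, and take $f_t = \sigma_t^2$ as the time-varying parameter. The GAS(1,1) model is:

$$f_{t+1} = \omega + A_1 s_t + B_1 f_t$$

With $S_t = \mathcal{I}_{t|t-1}^{-1}$, the score is $\nabla_t = (a_t^2 - \sigma_t^2)/(2\sigma_t^4)$ and the information is $\mathcal{I}_{t|t-1} = 1/(2\sigma_t^4)$. Hence $s_t = a_t^2 - \sigma_t^2$, and the model reduces to:

$$f_{t+1} = \omega + A_1(a_t^2 - f_t) + B_1 f_t \;\Rightarrow\; \sigma_{t+1}^2 = \phi_0 + \phi_1 a_t^2 + \beta_1\sigma_t^2 \tag{33}$$

with $\phi_0 = \omega$, $\phi_1 = A_1$ and $\beta_1 = B_1 - A_1$. This is equivalent to the standard GARCH(1,1) model with Gaussian error terms.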
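The equivalence in (33) is easy to confirm numerically. The sketch below uses illustrative names and parameter values. It runs the Gaussian GAS(1,1) recursion with scaled score $s_t = a_t^2 - \sigma_t^2$ next to a GARCH(1,1) recursion with $\phi_0 = \omega$, $\phi_1 = A_1$ and $\beta_1 = B_1 - A_1$. Both produce the same variance path up to floating-point rounding.

```python
def gaussian_gas11_path(a, omega, A1, B1, f1):
    """Gaussian GAS(1,1) for f_t = sigma_t^2 with inverse-information scaling.

    For N(0, f_t): grad_t = (a_t^2 - f_t) / (2 f_t^2) and I_t = 1 / (2 f_t^2),
    so s_t = grad_t / I_t = a_t^2 - f_t.
    """
    f = [f1]
    for a_t in a[:-1]:
        s_t = a_t ** 2 - f[-1]
        f.append(omega + A1 * s_t + B1 * f[-1])
    return f


def garch11_path(a, phi0, phi1, beta1, sigma2_1):
    """GARCH(1,1) recursion (3) with m = s = 1."""
    s2 = [sigma2_1]
    for a_t in a[:-1]:
        s2.append(phi0 + phi1 * a_t ** 2 + beta1 * s2[-1])
    return s2


a = [0.4, -1.1, 2.3, -0.2, 0.9]
omega, A1, B1 = 0.02, 0.07, 0.96
gas = gaussian_gas11_path(a, omega, A1, B1, f1=1.0)
garch = garch11_path(a, phi0=omega, phi1=A1, beta1=B1 - A1, sigma2_1=1.0)
print(max(abs(x - y) for x, y in zip(gas, garch)))  # rounding-level difference
```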

However, when the assumed observation density is the standardized Student's t density instead of the normal density, the GAS(1,1) model is not equivalent to the GARCH(1,1) specification. The resulting GAS(1,1) model is:

$$a_t = \sigma_t\varepsilon_t, \qquad \sigma_{t+1}^2 = \omega + A_1(1 + 3\nu^{-1})\left[\frac{(1+\nu^{-1})a_t^2}{(1-2\nu^{-1})\left(1 + \dfrac{a_t^2\nu^{-1}}{(1-2\nu^{-1})\sigma_t^2}\right)} - \sigma_t^2\right] + B_1\sigma_t^2 \tag{34}$$

See Appendix A for part of the derivation.

As expected, when ν → ∞ the Student's t distribution reduces to the normal distribution, and the updating equation reduces to the one obtained under a Gaussian observation density. For finite ν, however, the updating equation is very different from the GARCH(1,1) model of Bollerslev (1986). The term multiplying $A_1$ causes a large realization of $|a_t|$ to have a smaller impact on the next period's variance than in the standard t-GARCH(1,1) model. As $|a_t|$ grows, the bracketed term in (34) increases towards the finite bound $\nu\sigma_t^2$ instead of growing with $a_t^2$. This is where the GAS framework really differentiates itself from the GARCH models, as it takes more aspects of the distribution into account. The intuition is that the errors come from a fat-tailed distribution, so a large realization of $|a_t|$ does not necessarily signal a large increase in the variance.
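The dampening effect is visible when update (34) is coded directly. The sketch below uses hypothetical function names and parameter values. In the t-GAS update the bracketed term stays below $\nu\sigma_t^2$ as $|a_t|$ grows, whereas the Gaussian GAS update grows with $a_t^2$. For very large ν the two updates coincide.

```python
def t_gas_update(a_t, sigma2_t, omega, A1, B1, nu):
    """One step of the Student's t GAS(1,1) variance update (34)."""
    inv_nu = 1.0 / nu
    weight = (1.0 + inv_nu) / ((1.0 - 2.0 * inv_nu)
                               * (1.0 + a_t ** 2 * inv_nu / ((1.0 - 2.0 * inv_nu) * sigma2_t)))
    s_t = (1.0 + 3.0 * inv_nu) * (weight * a_t ** 2 - sigma2_t)  # scaled score
    return omega + A1 * s_t + B1 * sigma2_t


def gaussian_gas_update(a_t, sigma2_t, omega, A1, B1):
    """Gaussian GAS(1,1) update, equivalent to GARCH(1,1) by (33)."""
    return omega + A1 * (a_t ** 2 - sigma2_t) + B1 * sigma2_t


for shock in (1.0, 3.0, 10.0):
    print(shock,
          t_gas_update(shock, 1.0, 0.03, 0.06, 0.97, nu=6.0),
          gaussian_gas_update(shock, 1.0, 0.03, 0.06, 0.97))
```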

2.5 AEPD-GAS-GARCH

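With the GAS framework introduced, it can now be extended by using the AEPD as the observation density. The full derivation can be found in Appendix B. The assumptions made are:

$$a_t = \sigma_t\varepsilon_t, \qquad a_t \sim p(a_t \mid f_t, \mathcal{F}_t; \theta), \qquad f_t = \sigma_t^2$$

$$p(a_t \mid f_t, \mathcal{F}_t; \theta) = \begin{cases} \dfrac{1}{\sigma}\exp\left(-\dfrac{1}{p_1}\left|\dfrac{a_t-\mu}{2\alpha\sigma K_{EP}(p_1)}\right|^{p_1}\right), & a_t \le \mu \\[2ex] \dfrac{1}{\sigma}\exp\left(-\dfrac{1}{p_2}\left|\dfrac{a_t-\mu}{2(1-\alpha)\sigma K_{EP}(p_2)}\right|^{p_2}\right), & a_t > \mu \end{cases} \tag{35}$$

$$\nabla_t = \frac{\partial\ln p(a_t \mid f_t, \mathcal{F}_t; \theta)}{\partial f_t}, \qquad S_t = -\mathcal{I}_{t|t-1}^{-1} = -\left(E\left[\nabla_t\nabla_t'\right]\right)^{-1}, \qquad s_t = S_t\nabla_t \tag{36}$$

From the derivation in Appendix B it follows that:

$$L(a_t; \theta) = \frac{\Gamma(1 + 1/p_1)}{\alpha}\left(-\omega - \frac{\delta a_t}{\sqrt{\sigma_t^2}}\right)I\left(\omega + \frac{\delta a_t}{\sqrt{\sigma_t^2}} < 0\right) \tag{37}$$

$$R(a_t; \theta) = \frac{\Gamma(1 + 1/p_2)}{1-\alpha}\left(\omega + \frac{\delta a_t}{\sqrt{\sigma_t^2}}\right)I\left(\omega + \frac{\delta a_t}{\sqrt{\sigma_t^2}} > 0\right) \tag{38}$$

$$\nabla_t = -\frac{1}{2\sigma_t^2} - \frac{\Gamma(1/p_1)}{\alpha}\left[L(a_t; \theta)\right]^{p_1-1}\frac{\delta a_t}{2\sigma_t^2\sqrt{\sigma_t^2}} + \frac{\Gamma(1/p_2)}{1-\alpha}\left[R(a_t; \theta)\right]^{p_2-1}\frac{\delta a_t}{2\sigma_t^2\sqrt{\sigma_t^2}} \tag{39}$$

$$S_t = -2(\sigma_t^2)^2\left[1 - \frac{(p_1-1)\Gamma(1/p_1)\Gamma(1-1/p_1)\delta^2}{\alpha p_1} - \frac{(p_2-1)\Gamma(1/p_2)\Gamma(1-1/p_2)\delta^2}{(1-\alpha)p_2}\right]^{-1} \tag{40}$$

$$\sigma_{t+1}^2 = \alpha_0 + A_1(-\sigma_t^2)\frac{1 + \dfrac{\Gamma(1/p_1)}{\alpha}\left[L(a_t; \theta)\right]^{p_1-1}\dfrac{\delta a_t}{\sqrt{\sigma_t^2}} - \dfrac{\Gamma(1/p_2)}{1-\alpha}\left[R(a_t; \theta)\right]^{p_2-1}\dfrac{\delta a_t}{\sqrt{\sigma_t^2}}}{1 - \dfrac{(p_1-1)\Gamma(1/p_1)\Gamma(1-1/p_1)\delta^2}{\alpha p_1} - \dfrac{(p_2-1)\Gamma(1/p_2)\Gamma(1-1/p_2)\delta^2}{(1-\alpha)p_2}} + B_1\sigma_t^2 \tag{41}$$

Here ω and δ are as defined in (11) and (12). As in the AEPD GARCH model, the distribution parameters to be estimated are α, $p_1$ and $p_2$, which control the asymmetry (skewness) and the thickness of the tails. As in the GAS(1,1) model with the t distribution, the $s_t$ term causes large realizations of $a_t$ to have a more moderate effect on the variance than in the AEPD GARCH(1,1) model.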

Because the L and R functions contain indicator functions, each is only active when the (standardized) data point lies in its own tail: the right tail for R and the left tail for L. This makes the model more flexible than the t-GAS model. The contribution of a realization of $a_t$ is governed by the corresponding tail parameter, which moderates its influence on the variance.
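A purely numerical route to the scaled score follows. It is an alternative to the closed-form scaling in (40), not the thesis' implementation. It uses the inverse-information choice of (32), $S_t = \mathcal{I}_{t|t-1}^{-1}$, and computes the expectation by midpoint integration. It rests on stated assumptions. First, ω and δ are taken as the numerically computed mean and standard deviation of the reparametrized density (23) with µ = 0 and σ = 1, so that $\varepsilon_t$ has mean zero and unit variance. Second, $p_1, p_2 \ge 1$. All function names are hypothetical.

Writing $z = \omega + \delta a_t/\sigma_t$, the score (39) equals $\nabla_t = -(1 + h(z))/(2\sigma_t^2)$ with $h(z) = (d\ln f/dz)(z - \omega)$. The information is therefore $E[(1+h)^2]/(4\sigma_t^4)$. For α = 1/2 and $p_1 = p_2 = 2$, the scaled score reduces to $a_t^2 - \sigma_t^2$, the Gaussian case of (33).

```python
import math


def logf(z, alpha, p1, p2):
    """ln f of the reparametrized AEPD (23) with mu = 0, sigma = 1, i.e. -L^p1 - R^p2."""
    if z < 0.0:
        return -(math.gamma(1.0 + 1.0 / p1) * (-z) / alpha) ** p1
    return -(math.gamma(1.0 + 1.0 / p2) * z / (1.0 - alpha)) ** p2


def dlogf(z, alpha, p1, p2):
    """d ln f / dz (assumes p1, p2 >= 1 so the expression is finite at z = 0)."""
    if z < 0.0:
        L = math.gamma(1.0 + 1.0 / p1) * (-z) / alpha
        return math.gamma(1.0 / p1) / alpha * L ** (p1 - 1.0)
    R = math.gamma(1.0 + 1.0 / p2) * z / (1.0 - alpha)
    return -math.gamma(1.0 / p2) / (1.0 - alpha) * R ** (p2 - 1.0)


def gas_constants(alpha, p1, p2, lo=-40.0, hi=40.0, n=100_000):
    """(omega, delta, E[(1 + h)^2]) by midpoint integration, h(z) = dlogf(z) * (z - omega)."""
    dz = (hi - lo) / n
    zs = [lo + (k + 0.5) * dz for k in range(n)]
    ws = [math.exp(logf(z, alpha, p1, p2)) * dz for z in zs]
    mass = sum(ws)
    omega = sum(w * z for w, z in zip(ws, zs)) / mass
    delta = math.sqrt(sum(w * (z - omega) ** 2 for w, z in zip(ws, zs)) / mass)
    info = sum(w * (1.0 + dlogf(z, alpha, p1, p2) * (z - omega)) ** 2
               for w, z in zip(ws, zs)) / mass
    return omega, delta, info


def aepd_gas_update(a_t, sigma2_t, alpha0, A1, B1, alpha, p1, p2, consts):
    """sigma_{t+1}^2 = alpha0 + A1 * s_t + B1 * sigma_t^2 with s_t = I^{-1} * grad_t."""
    omega, delta, info = consts
    z = omega + delta * a_t / math.sqrt(sigma2_t)                              # cf. (27)
    grad = -(1.0 + dlogf(z, alpha, p1, p2) * (z - omega)) / (2.0 * sigma2_t)  # (39)
    fisher = info / (4.0 * sigma2_t ** 2)                                     # E[grad^2]
    return alpha0 + A1 * grad / fisher + B1 * sigma2_t


consts = gas_constants(0.45, 1.3, 1.7)
for a_t in (-3.0, -1.0, 1.0, 3.0):
    print(a_t, aepd_gas_update(a_t, 1.0, 0.01, 0.05, 0.95, 0.45, 1.3, 1.7, consts))
```

The expectation is computed once per parameter vector. In an estimation loop, `gas_constants` would only need to be re-evaluated when α, $p_1$ or $p_2$ change, not at every time step.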

The leverage effect states that negative returns have a bigger effect on next period's volatility than positive returns. Equation (41) can be split into two parts: one for the left side of the distribution (the part containing $L(a_t; \theta)$) and one for the right side. By the assumptions on the error term, the left part only involves returns below $-\omega\sigma_t/\delta$. These are the returns below the mode of the distribution, which are mostly negative. This could remove the need for models with extra parameters for the leverage effect, such as the EGARCH or TGARCH model. The distribution can already put more density in the left tail than in the right tail, which could automatically account for the leverage effect. Testing this empirically is beyond the scope of this paper, but a simple check is done.

2.6 Expected Shortfall

The Expected Shortfall (ES) is a coherent risk measure that has gained traction in recent years. For a return X, the ES at level α is the expected return given that the return falls below its α-quantile $q_\alpha$:

$$ES_\alpha(X) = E[X \mid X < q_\alpha] \tag{42}$$

Artzner et al. (1999) define coherence through the following four properties:

1. Translation invariance

$$ES(X + \alpha r) = ES(X) - \alpha \tag{43}$$

2. Sub-additivity

$$ES(X_1 + X_2) \le ES(X_1) + ES(X_2) \tag{44}$$

3. Positive homogeneity

$$\lambda \ge 0 \Rightarrow ES(\lambda X) = \lambda ES(X) \tag{45}$$

4. Monotonicity

$$X \le Y \Rightarrow ES(X) \le ES(Y) \tag{46}$$

Translation invariance means that adding an amount α of the risk-free asset, with return r, changes the ES by exactly α, so that $ES(X + ES(X)r) = 0$ holds. Sub-additivity is the natural requirement that merging risks does not create extra risk. Positive homogeneity means that scaling a position scales its risk by the same factor. Monotonicity means that a position with lower returns in every state has a lower ES.

Cizek et al. (2011) give closed-form solutions for the ES under the Normal and Student's t distributions:

$$ES_\alpha(X) = \mu - \sigma\frac{\phi(\Phi^{-1}(\alpha))}{\alpha} \tag{47}$$

This assumes that the observation density is normal. Here α is the quantile level for which the ES is computed, φ(x) is the standard normal pdf and $\Phi^{-1}(x)$ is the inverse of the standard normal cdf, that is, the quantile function. Given the GARCH modeling approach, µ = 0 and σ = $\sigma_t$ are used to compute the ES for the Normal GARCH(1,1) and Normal GAS(1,1) models.

For the t distribution the ES is computed differently:

$$ES_\alpha(X) = -\frac{1}{\alpha}(\nu - 1)^{-1}\left(\nu - 2 + x_{\alpha,\nu}^2\right)f_\nu(x_{\alpha,\nu})\,\sigma_t \tag{48}$$

$$x_{\alpha,\nu} = F^{-1}(\alpha, \nu) \tag{49}$$

Here $f_\nu(x)$ is the pdf of the standardized t distribution, $F^{-1}(\alpha, \nu)$ its quantile function, and ν the degrees of freedom obtained from the ML estimation. This form takes into account that the errors follow a standardized Student's t distribution.
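Formulas (47)-(49) take only a few lines of code. In the sketch below, the normal case uses Python's `statistics.NormalDist`. The standard library has no Student's t quantile function, so the caller passes in the α-quantile $x_{\alpha,\nu}$ of the standardized t. For example, `scipy.stats.t.ppf(alpha, nu) * math.sqrt((nu - 2) / nu)` would provide it, but that is an external dependency and is not used here. The function names are illustrative.

```python
import math
from statistics import NormalDist


def es_normal(alpha, sigma_t, mu=0.0):
    """Left-tail ES (47): mu - sigma * phi(Phi^{-1}(alpha)) / alpha."""
    std = NormalDist()
    return mu - sigma_t * std.pdf(std.inv_cdf(alpha)) / alpha


def std_t_pdf(x, nu):
    """Pdf (5) of the standardized (unit-variance) Student's t distribution."""
    c = math.exp(math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)) / math.sqrt((nu - 2) * math.pi)
    return c * (1.0 + x * x / (nu - 2)) ** (-(nu + 1) / 2)


def es_std_t(x_q, alpha, nu, sigma_t):
    """ES (48) given the alpha-quantile x_q of the standardized t distribution."""
    return -(nu - 2 + x_q ** 2) * std_t_pdf(x_q, nu) * sigma_t / ((nu - 1) * alpha)


print(es_normal(0.05, 1.0))  # about -2.063
```

Passing a variance forecast's square root $\sigma_{t+j|t}$ instead of $\sigma_t$ gives the forecasts (62) and (63).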

Lastly, Zhu and Zinde-Walsh (2009) and Zhu and Galbraith (2011) give closed forms for the ES of an asymmetric exponential power distributed observation density. They show:

$$ES_p(X) = \frac{2}{p}\left[-\alpha\alpha^*E\left[|X_{EP}(p_1)|\right]\left(1 - G\left(h_1(q_p); \frac{2}{p_1}\right)\right) + (1-\alpha)(1-\alpha^*)E\left[|X_{EP}(p_2)|\right]G\left(h_2(q_p); \frac{2}{p_2}\right)\right] \tag{50}$$

Here p is the tail probability for which the ES is computed, $q_p$ the corresponding p-quantile of the AEPD, and α the skewness parameter of the AEPD; $\alpha^*$ is defined in (10). $X_{EP}(p_i)$ is a random variable with the standard GED distribution with parameter $p_i$, and $E[|X_{EP}(p_i)|] = p_i^{1/p_i}\Gamma(2/p_i)/\Gamma(1/p_i)$. $G(h_i; 2/p_i)$ is the cdf of the gamma distribution with shape $2/p_i$ and scale 1, for i = 1, 2. Lastly:

$$h_1(q) = \frac{1}{p_1}\left|\frac{\min(q, 0)}{2\alpha^*}\right|^{p_1}, \qquad h_2(q) = \frac{1}{p_2}\left|\frac{\max(q, 0)}{2(1-\alpha^*)}\right|^{p_2} \tag{51}$$

The ES does have a shortcoming. Because it is defined as a conditional mean in the tail of the distribution, it is very sensitive to extreme values and to the small amount of data in the tails, especially in small samples. It is therefore important to model the left tail of the distribution properly.
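Formulas (50)-(51) need the gamma cdf G and the quantile $q_p$. The sketch below computes the ES of the AEPD with µ = 0 and σ = 1. Its function names are hypothetical. It evaluates the regularized incomplete gamma function with a series and finds $q_p$ by bisection on the AEPD cdf; both are choices of this illustration, not taken from the thesis. The cdf that is inverted follows from integrating (8): $F(y) = \alpha[1 - G(h_1(y); 1/p_1)]$ for $y \le 0$ and $F(y) = \alpha + (1-\alpha)G(h_2(y); 1/p_2)$ for $y > 0$.

```python
import math


def k_ep(p):
    """K_EP(p) from (9)."""
    return 1.0 / (2.0 * p ** (1.0 / p) * math.gamma(1.0 + 1.0 / p))


def alpha_star(alpha, p1, p2):
    """alpha* from (10)."""
    return alpha * k_ep(p1) / (alpha * k_ep(p1) + (1.0 - alpha) * k_ep(p2))


def gamma_cdf(x, a):
    """G(x; a): cdf of Gamma(shape a, scale 1), via the series for P(a, x)."""
    if x <= 0.0:
        return 0.0
    term = total = 1.0 / a
    n = 0
    while term > 1e-17 * total:
        n += 1
        term *= x / (a + n)
        total += term
    return min(1.0, total * math.exp(-x + a * math.log(x) - math.lgamma(a)))


def h1(q, p1, a_star):
    return (abs(min(q, 0.0)) / (2.0 * a_star)) ** p1 / p1      # (51)


def h2(q, p2, a_star):
    return (max(q, 0.0) / (2.0 * (1.0 - a_star))) ** p2 / p2   # (51)


def aepd_pdf(y, alpha, p1, p2):
    """Density (8) with mu = 0, sigma = 1; the prefactor is B from (13)."""
    b = alpha * k_ep(p1) + (1.0 - alpha) * k_ep(p2)
    a_star = alpha_star(alpha, p1, p2)
    return b * math.exp(-(h1(y, p1, a_star) if y <= 0.0 else h2(y, p2, a_star)))


def aepd_cdf(y, alpha, p1, p2):
    a_star = alpha_star(alpha, p1, p2)
    if y <= 0.0:
        return alpha * (1.0 - gamma_cdf(h1(y, p1, a_star), 1.0 / p1))
    return alpha + (1.0 - alpha) * gamma_cdf(h2(y, p2, a_star), 1.0 / p2)


def aepd_quantile(p, alpha, p1, p2, lo=-100.0, hi=100.0):
    """q_p by bisection on the continuous, increasing cdf."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if aepd_cdf(mid, alpha, p1, p2) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def aepd_es(p, alpha, p1, p2):
    """ES (50) of the AEPD with mu = 0, sigma = 1 at tail probability p."""
    a_star = alpha_star(alpha, p1, p2)
    q = aepd_quantile(p, alpha, p1, p2)
    e1 = p1 ** (1.0 / p1) * math.gamma(2.0 / p1) / math.gamma(1.0 / p1)  # E|X_EP(p1)|
    e2 = p2 ** (1.0 / p2) * math.gamma(2.0 / p2) / math.gamma(1.0 / p2)  # E|X_EP(p2)|
    return 2.0 / p * (-alpha * a_star * e1 * (1.0 - gamma_cdf(h1(q, p1, a_star), 2.0 / p1))
                      + (1.0 - alpha) * (1.0 - a_star) * e2 * gamma_cdf(h2(q, p2, a_star), 2.0 / p2))


print(aepd_es(0.01, 0.45, 1.2, 1.8))
```

With α = 1/2 and $p_1 = p_2 = 2$, `aepd_es` reproduces the standard normal ES of (47), because the AEPD then reduces to N(0, 1).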

Now that all models and equations have been explained, the next section presents the research design for the ML estimation and the expected shortfall prediction.

3 Research Design

This section describes the research design: first the functions maximized in the ML estimation, then the ES prediction and lastly a closer look at the data set used.


3.1 Maximum Likelihood Estimation

Both the GARCH and the GAS models are estimated by maximum likelihood, which is possible because both are observation-driven models. The estimation programs are written in Matlab, and the function fmincon is used to find the optimum.

Consider normally distributed errors in the GARCH model, and the GAS model with a normal observation density. In both cases the log-likelihood function is obtained by taking the natural logarithm (log) of the likelihood function. Under normality it is given by:

$$\mathcal{L}_T(\theta) = \sum_{t=1}^{T}\log\left(\frac{1}{\sqrt{2\pi\sigma_t^2}}e^{-\frac{a_t^2}{2\sigma_t^2}}\right) = -\frac{T}{2}\log(2\pi) - \sum_{t=1}^{T}\left(\frac{1}{2}\log(\sigma_t^2) + \frac{a_t^2}{2\sigma_t^2}\right) \tag{52}$$

$$\sigma_t^2 = \phi_0 + \phi_1 a_{t-1}^2 + \beta_1\sigma_{t-1}^2$$

Here the recursion for $\sigma_t^2$ is the same for the GAS and the GARCH model, as the two are identical, as shown in (33). The parameter vector $\theta = (\phi_0, \phi_1, \beta_1)$ lies in the optimization space $\Theta \subset \mathbb{R}_+ \times \mathbb{R}_+ \times \mathbb{R}_+$. All parameters are restricted to $\mathbb{R}_+$ because negative values could lead to negative variances, which are not possible. The optimization is only over θ, because the model assumes errors with zero mean and unit variance, so these are not estimated.

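For the t distribution the log-likelihood function is different:

$$\mathcal{L}_T(\theta) = T\log\left(\frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\Gamma\left(\frac{\nu}{2}\right)\sqrt{(\nu-2)\pi}}\right) + \sum_{t=1}^{T}\left[-\frac{1}{2}\log(\sigma_t^2) - \frac{\nu+1}{2}\log\left(1 + \frac{a_t^2}{(\nu-2)\sigma_t^2}\right)\right] \tag{53}$$

$$\sigma_t^2 = \phi_0 + \phi_1 a_{t-1}^2 + \beta_1\sigma_{t-1}^2$$

This is used for estimating the t-GARCH model. The maximization is over $\theta = (\phi_0, \phi_1, \beta_1, \nu)$ in the space $\Theta \subset \mathbb{R}_+ \times \mathbb{R}_+ \times \mathbb{R}_+ \times \mathbb{R}_+$, with ν > 2 required for the standardized t density (5).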

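The t-GAS model, however, has a different recursion for $\sigma_t^2$:

$$\sigma_t^2 = \omega + A_1(1 + 3\nu^{-1})\left[\frac{(1+\nu^{-1})a_{t-1}^2}{(1-2\nu^{-1})\left(1 + \dfrac{a_{t-1}^2\nu^{-1}}{(1-2\nu^{-1})\sigma_{t-1}^2}\right)} - \sigma_{t-1}^2\right] + B_1\sigma_{t-1}^2 \tag{54}$$

The t-GAS model therefore optimizes $\theta = (\omega, A_1, B_1, \nu)$, also over the set $\Theta \subset \mathbb{R}_+ \times \mathbb{R}_+ \times \mathbb{R}_+ \times \mathbb{R}_+$.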

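The log-likelihood function of the AEPD GARCH model is:

$$\mathcal{L}_T(\theta) = T\log(\delta B) + \sum_{t=1}^{T}\left[-\frac{1}{2}\log(\sigma_t^2) - \frac{1}{p_1}\left|\frac{\omega + \delta a_t/\sigma_t}{2\alpha^*}\right|^{p_1}I\left(\omega + \frac{\delta a_t}{\sigma_t} \le 0\right) - \frac{1}{p_2}\left|\frac{\omega + \delta a_t/\sigma_t}{2(1-\alpha^*)}\right|^{p_2}I\left(\omega + \frac{\delta a_t}{\sigma_t} > 0\right)\right] \tag{55}$$

$$\sigma_t^2 = \phi_0 + \phi_1 a_{t-1}^2 + \beta_1\sigma_{t-1}^2$$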


where ω, δ and $\alpha^*$ are as defined in (10)-(12). The parameters in θ are $(\phi_0, \phi_1, \beta_1, \alpha, p_1, p_2)$, with parameter space $\Theta \subset \mathbb{R}_+ \times \mathbb{R}_+ \times \mathbb{R}_+ \times (0, 1) \times \mathbb{R}_+ \times \mathbb{R}_+$.

The updating equation for the AEPD GAS model is given in (41). This changes the parameters to be optimized to $\theta = (\alpha_0, A_1, B_1, \alpha, p_1, p_2)$, within the same parameter space as for the AEPD GARCH model.

Because $\sigma_t^2$ is computed recursively, a starting value $\sigma_1^2$ is needed; the sample variance of the returns is used.
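The sketch below evaluates the log-likelihoods (52) and (53) for GARCH(1,1), with $\sigma_1^2$ set to the sample variance as described above. It only evaluates the objective. The thesis maximizes it with Matlab's fmincon. In Python one could, for instance, minimize the negative log-likelihood with `scipy.optimize.minimize` under bounds, which is not shown here. The function names are illustrative.

```python
import math
from statistics import variance


def garch11_sigma2(a, phi0, phi1, beta1):
    """sigma_t^2 recursion with sigma_1^2 equal to the sample variance of the returns."""
    s2 = [variance(a)]
    for a_prev in a[:-1]:
        s2.append(phi0 + phi1 * a_prev ** 2 + beta1 * s2[-1])
    return s2


def loglik_normal(a, phi0, phi1, beta1):
    """Gaussian GARCH(1,1) log-likelihood (52)."""
    s2 = garch11_sigma2(a, phi0, phi1, beta1)
    return sum(-0.5 * math.log(2 * math.pi) - 0.5 * math.log(v) - x * x / (2 * v)
               for x, v in zip(a, s2))


def loglik_t(a, phi0, phi1, beta1, nu):
    """Standardized Student's t GARCH(1,1) log-likelihood (53); requires nu > 2."""
    if nu <= 2:
        raise ValueError("nu must exceed 2")
    s2 = garch11_sigma2(a, phi0, phi1, beta1)
    const = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
             - 0.5 * math.log((nu - 2) * math.pi))
    return sum(const - 0.5 * math.log(v) - (nu + 1) / 2 * math.log1p(x * x / ((nu - 2) * v))
               for x, v in zip(a, s2))


returns = [0.3, -1.1, 0.6, 2.2, -0.4, -1.7, 0.9]
print(loglik_normal(returns, 0.05, 0.08, 0.9), loglik_t(returns, 0.05, 0.08, 0.9, 6.0))
```

As ν grows, (53) approaches (52), which gives a convenient sanity check on an implementation.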

After the ML estimation, the models are compared using information criteria. These are the Akaike Information Criterion (AIC) of Akaike (1974), the AICC of Hurvich and Tsai (1989), which adds a correction term for small samples, and the Bayesian Information Criterion (BIC) of Schwarz (1978). With k the number of estimated parameters and T the number of observations:

$$AIC = 2k - 2\mathcal{L}_T(\hat\theta), \qquad AICC = AIC + \frac{2k^2 + 2k}{T - k - 1}, \qquad BIC = k\log(T) - 2\mathcal{L}_T(\hat\theta)$$

Next, the expected shortfall prediction is discussed.

3.2 Expected Shortfall prediction

The expected shortfall prediction is based on forecasts of $\sigma_t^2$, because the estimated coefficients describe the recursive relation between the squared returns and the variance. The focus is on the one-step-ahead and the five-step-ahead forecasts. The one-step-ahead forecasts for the GARCH models and the normal GAS model are obtained by:

$$\sigma_t^2 = \phi_0 + \phi_1 a_{t-1}^2 + \beta_1\sigma_{t-1}^2$$

$$E[\sigma_{t+1}^2] = E\left[\phi_0 + \phi_1 a_t^2 + \beta_1\sigma_t^2\right]$$

$$\sigma_{t+1|t}^2 = \phi_0 + (\phi_1 + \beta_1)\sigma_t^2 \tag{56}$$

where the last step uses $E[a_t^2] = \sigma_t^2$.

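For the t-GAS and AEPD-GAS models the forecasts are slightly different:

$$\sigma_{t+1|t}^2 = \omega + A_1(1 + 3\nu^{-1})\left[\frac{(1+\nu^{-1})\sigma_t^2}{(1-2\nu^{-1})\left(1 + \dfrac{\sigma_t^2\nu^{-1}}{(1-2\nu^{-1})\sigma_t^2}\right)} - \sigma_t^2\right] + B_1\sigma_t^2 \tag{57}$$

$$\sigma_{t+1|t}^2 = \alpha_0 + A_1(-\sigma_t^2)\frac{1 + \dfrac{\Gamma(1/p_1)}{\alpha}\dfrac{\alpha}{p_1\Gamma(1+1/p_1)}\dfrac{\delta\sigma_t}{\sqrt{\sigma_t^2}} - \dfrac{\Gamma(1/p_2)}{1-\alpha}\dfrac{1-\alpha}{p_2\Gamma(1+1/p_2)}\dfrac{\delta\sigma_t}{\sqrt{\sigma_t^2}}}{1 - \dfrac{(p_1-1)\Gamma(1/p_1)\Gamma(1-1/p_1)\delta^2}{\alpha p_1} - \dfrac{(p_2-1)\Gamma(1/p_2)\Gamma(1-1/p_2)\delta^2}{(1-\alpha)p_2}} + B_1\sigma_t^2 \tag{58}$$

The five-step-ahead forecast is obtained by applying the one-step-ahead recursion five times, each time using the previously forecasted value of the variance. For example, the two- and three-step-ahead forecasts are:

$$\sigma_{t+2|t}^2 = \phi_0 + (\phi_1 + \beta_1)\left(\phi_0 + (\phi_1 + \beta_1)\sigma_t^2\right)$$

$$\sigma_{t+3|t}^2 = \phi_0 + (\phi_1 + \beta_1)\left(\phi_0 + (\phi_1 + \beta_1)\left(\phi_0 + (\phi_1 + \beta_1)\sigma_t^2\right)\right) \tag{61}$$

This is done analogously for the t-GAS and AEPD-GAS models. A for loop in Matlab computes these forecasts, producing the one- and five-step-ahead forecasts every 500 data points. This is the main difference between this research and that of Zhu and Galbraith (2011), who compute the forecasts for every data point after an initial set of 1000 observations.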
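The iteration in (56) and (61) can be written as a short loop. The sketch below uses illustrative names and assumes φ1 + β1 < 1. It iterates $\sigma_{t+h|t}^2 = \phi_0 + (\phi_1 + \beta_1)\sigma_{t+h-1|t}^2$ and compares the result with the equivalent unrolled form $\sigma_{t+h|t}^2 = \phi_0\sum_{i=0}^{h-1}(\phi_1 + \beta_1)^i + (\phi_1 + \beta_1)^h\sigma_t^2$. As h grows, the forecast tends to the long-run variance $\phi_0/(1 - \phi_1 - \beta_1)$.

```python
def garch_forecast_path(sigma2_t, phi0, phi1, beta1, horizon):
    """Iterate (56)/(61): sigma2_{t+h|t} = phi0 + (phi1 + beta1) * sigma2_{t+h-1|t}."""
    path, s2 = [], sigma2_t
    for _ in range(horizon):
        s2 = phi0 + (phi1 + beta1) * s2
        path.append(s2)
    return path


def garch_forecast_closed(sigma2_t, phi0, phi1, beta1, h):
    """Unrolled form: phi0 * sum_{i<h} rho^i + rho^h * sigma2_t with rho = phi1 + beta1."""
    rho = phi1 + beta1
    return phi0 * sum(rho ** i for i in range(h)) + rho ** h * sigma2_t


path = garch_forecast_path(1.3, 0.01, 0.07, 0.92, horizon=5)
print(path[0], path[4])  # one- and five-step-ahead variance forecasts
```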

With the forecasted variances, the expected shortfall can be forecast as well, using the closed forms of $ES_\alpha$ from Section 2.6. For the normal and t distributions this is straightforward. Replacing $\sigma_t$ in the formula by its forecast $\sigma_{t+j|t}$ (j = 1 or 5) yields the corresponding ES forecast. With one-step-ahead forecasts, the closed forms for the normal and t distributions are:

$$\widehat{ES}_{\alpha,t+1|t}(X) = -\sigma_{t+1|t}\frac{\phi(\Phi^{-1}(\alpha))}{\alpha} \tag{62}$$

$$\widehat{ES}_{\alpha,t+1|t}(X) = -\frac{1}{\alpha}(\nu - 1)^{-1}\left(\nu - 2 + x_{\alpha,\nu}^2\right)f_\nu(x_{\alpha,\nu})\,\sigma_{t+1|t} \tag{63}$$

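For the AEPD, the ES forecast uses the closed form (50), rescaled with the fitted ω and δ from (11) and (12):

$$\widehat{ES}_{p,t+j|t}(X) = \sigma_{t+j|t}\frac{ES_p(X) - \omega}{\delta} = \frac{\sigma_{t+j|t}}{\delta}\left(\frac{2}{p}\left[-\alpha\alpha^*E\left[|X_{EP}(p_1)|\right]\left(1 - G\left(h_1(q_p); \frac{2}{p_1}\right)\right) + (1-\alpha)(1-\alpha^*)E\left[|X_{EP}(p_2)|\right]G\left(h_2(q_p); \frac{2}{p_2}\right)\right] - \omega\right) \tag{64}$$

Once the forecasts are computed, the realized values are estimated by maximum likelihood in the usual way; these serve as the actual values. The following errors are then computed:

$$SE_j = \left(\sigma_t^2 - \hat\sigma_{t|t-j}^2\right)^2 \tag{65}$$

$$ME_j = \left|\sigma_t^2 - \hat\sigma_{t|t-j}^2\right| \tag{66}$$

$$SE_{p,j} = \left(ES_{p,t} - \widehat{ES}_{p,t|t-j}\right)^2 \tag{67}$$

$$ME_{p,j} = \left|ES_{p,t} - \widehat{ES}_{p,t|t-j}\right| \tag{68}$$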

To assess the prediction quality, the squared error and the absolute error of each forecast are computed. For each stock, these give the mean squared error (MSE) and the mean absolute error (MAE). These are used to compare the models' performances; the lower, the better. The tail probabilities used are α or p = (0.001, 0.005, 0.01, 0.025, 0.05, 0.1).
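Averaging the per-forecast errors (65)-(68) over all forecast origins gives the MSE and MAE reported per stock. The minimal sketch below works for both the variance forecasts and the ES forecasts; the function name is illustrative.

```python
def mse_mae(realized, forecasts):
    """Mean squared error and mean absolute error of a series of forecasts."""
    if len(realized) != len(forecasts) or not realized:
        raise ValueError("need two non-empty sequences of equal length")
    errors = [r - f for r, f in zip(realized, forecasts)]
    mse = sum(e * e for e in errors) / len(errors)
    mae = sum(abs(e) for e in errors) / len(errors)
    return mse, mae


print(mse_mae([1.0, 2.0, 4.0], [1.5, 2.0, 3.0]))  # (0.4166..., 0.5)
```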


3.3 Data

The data consist of the S&P 500 index and 30 of its constituents, taken from DataStream (Thomson Reuters). For the S&P 500 these are daily prices from 31 December 1963 to 11 July 2018. This is the largest available sample, chosen to obtain as many data points as possible. The data for the constituents all start later in the 20th century; see Appendix C for the stocks and their data periods. The number of observations ranges from T = 788 to T = 14227.

From the daily prices, the daily log returns (in percent) are computed:

$$R_t = 100\left(\log P_t-\log P_{t-1}\right) \tag{69}$$
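Equation (69) in code, as a minimal sketch with made-up prices. The first price only serves as the lag, so T prices give T − 1 returns.

```python
import math


def log_returns_pct(prices: list[float]) -> list[float]:
    """R_t = 100 * (log P_t - log P_{t-1}); T prices give T - 1 returns."""
    if any(p <= 0 for p in prices):
        raise ValueError("prices must be positive")
    return [100.0 * (math.log(p1) - math.log(p0)) for p0, p1 in zip(prices, prices[1:])]


prices = [100.0, 101.0, 99.99]  # hypothetical daily closing prices
print(log_returns_pct(prices))  # roughly [0.995, -1.005]
```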

Since this paper assumes $a_t = \sigma_t\varepsilon_t$ with no separate mean equation, $R_t = a_t$ holds. Figure 2 plots the fitted normal, Student's t and AEPD densities on the histogram of the S&P 500 returns. All three fit reasonably well, but the normal distribution overestimates the centre and underestimates the tails of the distribution. The t distribution gives much more density to the tails and appears to overestimate them considerably. The AEPD lies between the other two; although it still misses part of the density, it follows the empirical distribution of the returns more closely than either. The next section presents and discusses the maximum likelihood estimation results in terms of fit.


4 Maximum Likelihood Estimation Results and Analyses

This section starts by showing how the models fit the S&P 500: Table 2 gives the parameter estimates of the different models and their significance, based on the asymptotic normal distribution of the estimators.

| Model | φ0 | φ1/A1 | β1/B1 | α/ν | p1 | p2 |
|---|---|---|---|---|---|---|
| Normal | 0.0064 (3.17E-05) | 0.0750 (5.45E-06) | 0.9207 (0.000125429) | n/a | n/a | n/a |
| GAS Normal | 0.0404 (7.63E-05) | 0.0766 (2.55E-05) | 0.9558 (4.91E-06) | n/a | n/a | n/a |
| Student | 0.0047 (4.14E-05) | 0.0696 (8.63E-06) | 0.9287 (5.98E-05) | 6.0931 (1.16E-06) | n/a | n/a |
| GAS Student | 0.0309 (5.77E-05) | 0.0564 (5.52E-05) | 0.9669 (6.16E-07) | 6.2849 (2.59E-05) | n/a | n/a |
| AEPD | 0.0049* (0.0198) | 0.0718 (0.0048) | 0.9270 (0.0024) | 0.4800* (9.9327) | 1.1156* (1.3197) | 1.3784 (0.0947) |
| GAS AEPD | 0.0047 (0.0006) | 0.2052 (0.0001) | 0.9352 (5.35E-05) | 0.5040 (4.28E-05) | 1.2367 (9.22E-05) | 1.2538 (0.0002) |

Table 2: S&P 500 ML estimation results; standard errors in parentheses. * denotes not significant at the 5% level.

Almost every parameter is highly significant; only the AEPD GARCH model has an insignificant constant, α and left-tail parameter p1. Whether the AEPD GAS model's parameters are close to those that reduce the AEPD to the normal distribution is tested with two-sided t-tests:

$$\frac{\alpha-0.5}{4.28\times10^{-5}} = 92.60,\qquad \frac{p_1-2}{9.22\times10^{-5}} = -8282.0,\qquad \frac{p_2-2}{0.0002} = -3299.4$$

These statistics show that α, p1 and p2 differ significantly from the values α = 0.5 and p1 = p2 = 2 under which the AEPD reduces to the normal distribution. The models with t-distributed errors point the same way: as the degrees of freedom tend to infinity, the t distribution approaches the normal distribution, but the estimated degrees of freedom are only 6.09 and 6.28 for the t-GARCH and t-GAS models respectively, so normality is not supported here either.
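The t-statistics above are (estimate − value under normality) / standard error. Recomputing them from the rounded entries of Table 2 gives values close to the reported ones for α and p1, but about −3731 for p2, because its standard error is rounded to a single significant digit (the reported −3299.4 corresponds to a standard error of roughly 0.00023). That explanation is an inference from the rounding, not a figure from the original estimation output.

```python
# Rounded GAS AEPD estimates and standard errors from Table 2, together with the
# parameter values under which the AEPD reduces to the normal distribution.
table2_gas_aepd = {
    "alpha": (0.5040, 4.28e-5, 0.5),
    "p1": (1.2367, 9.22e-5, 2.0),
    "p2": (1.2538, 0.0002, 2.0),
}


def t_statistic(estimate: float, std_error: float, null_value: float) -> float:
    """Test statistic for H0: parameter = null_value (compared with N(0, 1) quantiles)."""
    return (estimate - null_value) / std_error


t_stats = {name: t_statistic(*vals) for name, vals in table2_gas_aepd.items()}
for name, t in t_stats.items():
    print(f"{name}: {t:.1f}")
```

All three statistics are far beyond the two-sided 5% critical value of 1.96, so the conclusion does not depend on the rounding.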

Table 3 gives the values of the AIC, AICC and BIC of the different models estimated on the S&P 500 by maximum likelihood; these values are divided by the sample size T. What stands out is that the AEPD GARCH model, despite its insignificant constant, α and p1, has the lowest AIC, AICC and BIC. The insignificance of these parameters may be caused by the estimation of the Hessian of the log-likelihood function, as the standard BFGS algorithm was used; a better Hessian approximation could improve this. The scaled information criteria of the models are, however, very close: the difference in AIC between the AEPD GARCH and the AEPD GAS model is only 0.007.

| Model | log(L) | AIC | AICC | BIC |
|---|---|---|---|---|
| Normal | -17358.20 | 2.4407702 | 2.4407703 | 2.4423651 |
| Student | -16971.01 | 2.3864768 | 2.386477 | 2.3886032 |
| AEPD | -16865.00 | 2.3718538 | 2.3718542 | 2.3750435 |
| GAS Normal | -17415.92 | 2.4488846 | 2.4488847 | 2.4504795 |
| GAS Student | -17011.467 | 2.3921632 | 2.3921634 | 2.3942897 |
| GAS AEPD | -16917.10 | 2.3791783 | 2.3791787 | 2.382368 |
| min/max | -16865.00 | 2.3718538 | 2.3718542 | 2.3750435 |
| Best | AEPD | AEPD | AEPD | AEPD |

Table 3: Values of the AIC, AICC and BIC (divided by T) for the different models estimated on the S&P 500

Table 4 shows the number of times each model had the lowest information criterion for a stock. The first thing to notice is that the normal distribution, in either the GARCH or the GAS model, never had the lowest value; it is clearly outperformed by all the other models. The GAS framework had the lowest value 65 out of 90 times, and the t-GARCH and AEPD-GARCH models the other 25. Appendix D lists which model performed best on which information criterion for each stock, sorted by sample size. At first glance the t-GAS model seems to outperform the AEPD-GAS model when the sample size is small, whereas for large samples the AEPD-GAS model performs better, as does the AEPD GARCH model. This may indicate that a larger sample is needed to fit the AEPD properly. That is not unexpected: the AEPD can fit a wider spread of data, and such a spread is more likely to show up in a larger sample.

| Model | AIC | AICC | BIC |
|---|---|---|---|
| Normal | 0 | 0 | 0 |
| Student | 2 | 3 | 3 |
| AEPD | 6 | 6 | 5 |
| GAS Normal | 0 | 0 | 0 |
| GAS Student | 9 | 9 | 12 |
| GAS AEPD | 13 | 12 | 10 |

Table 4: Number of times the model was optimal according to the information criteria
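The scaled criteria in Table 3 follow from the log-likelihood as AIC = 2k − 2 log L, AICC = AIC + 2k(k + 1)/(T − k − 1) and BIC = k log T − 2 log L, each divided by T. The sketch assumes k = 6 parameters for the AEPD GARCH model (φ0, φ1, β1, α, p1, p2 in Table 2) and T = 14226 returns (14227 prices minus one). Both values are inferred rather than stated in the text, but with them the sketch reproduces the AEPD row of Table 3 to about six decimal places.

```python
import math


def scaled_information_criteria(loglik: float, k: int, T: int) -> tuple[float, float, float]:
    """AIC, AICC and BIC divided by the sample size T."""
    aic = 2 * k - 2 * loglik
    aicc = aic + 2 * k * (k + 1) / (T - k - 1)  # small-sample correction (Hurvich and Tsai, 1989)
    bic = k * math.log(T) - 2 * loglik
    return aic / T, aicc / T, bic / T


# AEPD GARCH on the S&P 500: log-likelihood from Table 3; k and T are inferred assumptions
aic, aicc, bic = scaled_information_criteria(-16865.00, k=6, T=14226)
print(f"{aic:.7f} {aicc:.7f} {bic:.7f}")
```

With T this large the AICC correction is of order 1e-7 after scaling, which is why the AIC and AICC columns of Table 3 agree to six decimals.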

The leverage effect in the GAS-AEPD model, which was briefly discussed in the research design, is checked as well. For this, the average influence of negative returns and of positive returns on the next period's variance, and the variation of that influence, are compared. The average influence of a negative return is 0.0617, with a variation of 0.0470; the average influence of a positive return is 0.0624, with a variation of 0.0523. Since negative and positive returns have practically the same effect, no leverage effect is found in the AEPD GAS model. One possible interpretation is that the AEPD GAS framework already takes the leverage effect into account, which would remove the need for models such as EGARCH, but more research is needed on this matter.
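A minimal sketch of this comparison, assuming the per-observation influence values (taken here as the contribution of $a_t$ to $\sigma^2_{t+1}$, for example $A_1 s_t$) have already been computed from the fitted model. The text does not say whether "variation" means the variance or the standard deviation, so the sketch uses the population variance as an assumption; the inputs are toy numbers.

```python
from statistics import mean, pvariance


def influence_by_sign(returns: list[float], influence: list[float]) -> dict[str, tuple[float, float]]:
    """Mean and variance of the influence on next period's variance, split by return sign.

    Zero returns belong to neither group and are left out.
    """
    neg = [g for r, g in zip(returns, influence) if r < 0]
    pos = [g for r, g in zip(returns, influence) if r > 0]
    return {
        "negative": (mean(neg), pvariance(neg)),
        "positive": (mean(pos), pvariance(pos)),
    }


# Toy input, not the paper's data
res = influence_by_sign([-1.0, 2.0, -3.0, 4.0], [0.1, 0.2, 0.3, 0.4])
print(res)
```

A leverage effect would show up as a clearly larger mean influence for the negative group.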

5 Expected Shortfall Prediction Results and Analyses

This section presents the results of the expected shortfall forecasts. Table 5 shows how each model performed on the S&P 500 index at the given percentiles. The percentile does not seem to affect the ranking of the models; only the forecast horizon and the model matter. For the one-step-ahead forecasts the AEPD GARCH model clearly predicts best, but for five steps ahead the model with standardized Student's t errors performs considerably better than the others.

One explanation for why the AEPD performs better for the one-step forecast but not for the five-step forecast could be that, because of its spread compared to the t distribution, it is easier to be slightly off over a longer horizon, and the error then grows rapidly as the horizon increases. Table 6 shows the same pattern for the variance forecasts: the AEPD predicts the one-step-ahead variance almost perfectly, but its five-step-ahead forecasts diverge much more from the true value, while the t distribution stays quite close to the actual value. Table 7 shows the number of times each model had the lowest MSE or MAE over the 8 stocks used for the ES prediction tests: the S&P 500, Apple, JP Morgan, Alphabet, Exxon Mobil, Walmart, Netflix and AT&T. Almost every stock had a different sample size, but there was no clear relation between the best model and the sample size.

| J | Model | 0.001 MSE | 0.001 MAE | 0.005 MSE | 0.005 MAE | 0.01 MSE | 0.01 MAE | 0.025 MSE | 0.025 MAE | 0.05 MSE | 0.05 MAE | 0.1 MSE | 0.1 MAE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Normal | 0.1386 | 0.1659 | 0.1022 | 0.1425 | 0.0868 | 0.1313 | 0.0668 | 0.1152 | 0.052 | 0.1016 | 0.0377 | 0.0865 |
| 1 | GAS Normal | 0 | 0.0022 | 0 | 0.0019 | 0 | 0.0017 | 0 | 0.0015 | 0 | 0.0013 | 0 | 0.0011 |
| 1 | Student | 0.1748 | 0.2162 | 0.0679 | 0.1389 | 0.0457 | 0.1155 | 0.0275 | 0.0911 | 0.0191 | 0.0768 | 0.0136 | 0.0655 |
| 1 | GAS Student | 0.1359 | 0.217 | 0.048 | 0.1371 | 0.0314 | 0.1133 | 0.0184 | 0.0889 | 0.0126 | 0.0747 | 0.009 | 0.0636 |
| 1 | AEPD | 0 | 0.0018 | 0 | 0.0014 | 0 | 0.0013 | 0 | 0.0011 | 0 | 0.0009 | 0 | 0.0008 |
| 1 | GAS AEPD | 2.4893 | 0.9508 | 1.7444 | 0.7784 | 1.4467 | 0.7003 | 1.0753 | 0.5924 | 0.8128 | 0.5063 | 0.5685 | 0.4152 |
| 5 | Normal | 0.3626 | 0.4775 | 0.2675 | 0.4101 | 0.2272 | 0.378 | 0.1748 | 0.3315 | 0.1361 | 0.2925 | 0.0985 | 0.2489 |
| 5 | GAS Normal | 7.8353 | 1.7355 | 5.78 | 1.4906 | 4.9092 | 1.3738 | 3.7771 | 1.205 | 2.9405 | 1.0632 | 2.1286 | 0.9046 |
| 5 | Student | 0.2657 | 0.3981 | 0.1116 | 0.26 | 0.078 | 0.2178 | 0.0494 | 0.1736 | 0.0356 | 0.1475 | 0.0263 | 0.1267 |
| 5 | GAS Student | 0.399 | 0.4715 | 0.1654 | 0.3089 | 0.1146 | 0.2586 | 0.0716 | 0.2059 | 0.051 | 0.1745 | 0.0371 | 0.1493 |
| 5 | AEPD | 4.5682 | 1.2511 | 2.9084 | 1.0028 | 2.2931 | 0.8925 | 1.5772 | 0.7427 | 1.1131 | 0.6258 | 0.7198 | 0.505 |
| 5 | GAS AEPD | 12.1594 | 2.4565 | 7.8742 | 1.9892 | 6.2757 | 1.7798 | 4.3984 | 1.4934 | 3.1633 | 1.2677 | 2.0957 | 1.0317 |

Table 5: S&P 500 MSE and MAE values of the ES for different percentiles

| Horizon | Model | MSE | MAE |
|---|---|---|---|
| 1 step | Normal | 0.0122 | 0.0493 |
| 1 step | GAS Normal | 0 | 0.0006 |
| 1 step | Student | 0.0092 | 0.0445 |
| 1 step | GAS Student | 0 | 0.0007 |
| 1 step | AEPD | 0 | 0.0004 |
| 1 step | GAS AEPD | 0.1427 | 0.2993 |
| 5 steps | Normal | 0.032 | 0.1418 |
| 5 steps | GAS Normal | 0.725 | 0.5175 |
| 5 steps | Student | 0.0284 | 0.1317 |
| 5 steps | GAS Student | 0.0391 | 0.1538 |
| 5 steps | AEPD | 0.4849 | 0.3699 |
| 5 steps | GAS AEPD | 0.9843 | 0.5656 |

Table 6: S&P 500 MSE and MAE values of the forecasts of the variance

There is, however, a clear pattern in which model attains the lowest MSE or MAE. The GAS normal model is clearly the best predictor, followed by the model with standardized Student's t errors. Since the GAS and GARCH models with normally distributed errors are equivalent, the normal distribution can be said to be the best for predicting the expected shortfall. This is very clear for the one-step-ahead forecasts, but for the five-day forecasts the t distribution seems to catch up.

A reason why the AEPD predicts so poorly might be over-fitting. With its many parameters, the AEPD is very flexible and can fit many different data sets; as a result of over-fitting, it might drastically over- or under-predict the expected shortfall.

A last cause of the poor predictions might be the data set. Zhu and Galbraith (2011) also perform the fit and the expected shortfall estimation, but they use a smaller initial estimation sample and estimate the expected shortfall at every data point, whereas in this paper it was done every 500 data points. This paper may therefore have too few expected shortfall forecasts. Increasing the number of forecasts was tried for the S&P 500 only, and this yielded results similar to those in Table 5; for comparison, they can be found in Appendix E. There, too, the AEPD performed best for the one-day-ahead prediction and the t distribution for the five-day-ahead prediction.


| Model | All MSE | All MAE | 1 step MSE | 1 step MAE | 5 step MSE | 5 step MAE |
|---|---|---|---|---|---|---|
| Normal | 14 | 15 | 0 | 0 | 14 | 15 |
| GAS Normal | 55 | 56 | 55 | 56 | 0 | 0 |
| Student | 24 | 28 | 0 | 0 | 24 | 28 |
| GAS Student | 10 | 5 | 0 | 0 | 10 | 5 |
| AEPD | 9 | 8 | 9 | 8 | 0 | 0 |
| GAS AEPD | 0 | 0 | 0 | 0 | 0 | 0 |

Table 7: Number of times a model had the lowest MSE or MAE over the 8 stocks
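Table 7 is a tally of which model attains the lowest error in each case. A sketch of that tally, where the case key (stock, percentile, horizon) is an assumption about how the cases were enumerated and the error values are hypothetical:

```python
from collections import Counter


def count_wins(errors: dict[tuple, dict[str, float]]) -> Counter:
    """Count how often each model has the lowest error.

    errors maps a case, e.g. (stock, percentile, horizon), to {model: error}.
    Ties go to the first model listed for that case.
    """
    wins: Counter = Counter()
    for by_model in errors.values():
        best = min(by_model, key=by_model.__getitem__)
        wins[best] += 1
    return wins


# Hypothetical MSE values for two cases (not results from the paper)
example = {
    ("S&P 500", 0.01, 1): {"Normal": 0.08, "GAS Normal": 0.001, "AEPD": 0.002},
    ("S&P 500", 0.01, 5): {"Normal": 0.22, "Student": 0.07, "AEPD": 2.3},
}
print(count_wins(example))
```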

Unlike this research, Zhu and Galbraith (2011) test whether the AEPD is the best in a class of other very general distributions, which it was not for predicting the expected shortfall. They do, however, see an improvement when the generality of their distribution increases, which this research does not find. They also performed a very large out-of-sample prediction test for the expected shortfall; this is where this paper falls short, as its prediction sample was quite small, even though it seems to yield quite definitive results.

6 Conclusions

This paper set out to see whether a very general model combined with a very general distribution for the error terms would yield good results, both in terms of fitting the model and distribution and in terms of predictive power. 65 out of 90 computed information criteria were lowest for the GAS framework with either t-distributed or AEPD errors; the other 25 best fits were for the t-GARCH and the AEPD-GARCH. The normal distribution never gave the best fit on the daily financial data. The AEPD GAS model had the best fit on more data sets than any other model. The generalized autoregressive score model with an asymmetric exponential power distributed observation density therefore yields the best fits.

However, the same model that fit the data best never predicted the variance or the expected shortfall best. The least general model and distribution seem to perform best in terms of mean squared error and mean absolute error: the normal GARCH and normal GAS models, which are equivalent, appear to predict future variances and expected shortfalls best.

The evaluation of the expected shortfall predictions could be improved. In this paper the number of predictions was quite low, whereas Zhu and Galbraith (2011) used a very large set for the prediction, which yielded very different results.


which it clearly seems to have found and as it fits the data the best. Thus this paper seems to have, partly, succeeded in what it has set out to do.


References

Akaike, H. (1974). A new look at the statistical model identification. IEEE Transactions on Automatic Control, 19(6):716–723.

Artzner, P., Delbaen, F., Eber, J., and Heath, D. (1999). Coherent measures of risk. Mathematical Finance, 9(3):203–228.

Bao, T., Diks, C., and Li, H. (2017). A generalized CAPM model with asymmetric power distributed errors with an application to portfolio construction. Economic Modelling.

Bollerslev, T. (1986). Generalized autoregressive conditional heteroskedasticity. Journal of Econometrics, 31(3):307–327.

Campbell, J. Y., Lo, A. W.-C., MacKinlay, A. C., et al. (1997). The econometrics of financial markets, volume 2. Princeton University Press, Princeton, NJ.

Cizek, P., Härdle, W. K., and Weron, R. (2011). Statistical Tools for Finance and Insurance. Springer, Berlin, Heidelberg.

Creal, D., Koopman, S. J., and Lucas, A. (2013). Generalized autoregressive score models with applications. Journal of Applied Econometrics, 28(5):777–795.

Engle, R. (2002). Dynamic conditional correlation: A simple class of multivariate generalized autoregressive conditional heteroskedasticity models. Journal of Business and Economic Statistics, 20(3):339–350.

Engle, R. F. (1982). Autoregressive conditional heteroscedasticity with estimates of the variance of United Kingdom inflation. Econometrica, 50(4):987–1007.

Fernández, C., Osiewalski, J., and Steel, M. F. J. (1995). Modeling and inference with v-spherical distributions. Journal of the American Statistical Association, 90(432):1331–1340.

Hansen, B. (1994). Autoregressive conditional density estimation. International Economic Review, 35.

Hurvich, C. M. and Tsai, C.-L. (1989). Regression and time series model selection in small samples. Biometrika, 76(2):297–307.

Komunjer, I. (2007). Asymmetric power distribution: Theory and applications to risk measurement. Journal of Applied Econometrics, 22(5):891–921.

Schwarz, G. (1978). Estimating the dimension of a model. The Annals of Statistics, 6(2):461–464.

Theodossiou, P. (2015). Skewed generalized error distribution of financial assets and option pricing. Multinational Finance Journal, 19(3):223–266.

Tsay, R. S. (2005). Analysis of financial time series. Wiley Series in Probability and Mathematical Statistics. Wiley, Hoboken, NJ, 2nd edition.

Zhu, D. and Galbraith, J. W. (2011). Modeling and forecasting expected shortfall with the generalized asymmetric Student-t and asymmetric exponential power distributions. Journal of Empirical Finance, 18(4):765–778.

Zhu, D. and Zinde-Walsh, V. (2009). Properties and estimation of asymmetric exponential power distribution. Journal of Econometrics, 148(1):86–99.


Appendix

Appendix A

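Derivation of the GAS(1,1) model assuming a Student's t distribution with mean zero and unit variance for the error term. Set $a_t = \sigma_t\varepsilon_t$, $f_t = \sigma_t^2$ and $S_t = \mathcal{I}_{t|t-1}^{-1}$, so that $p(a_t \mid f_t, \mathcal{F}_t; \theta) = f(a_t \mid \sigma_t^2, \nu)$, where

$$f(a_t\mid\sigma_t^2,\nu) = \frac{\Gamma\left((\nu+1)/2\right)}{\Gamma(\nu/2)\,\sigma_t\sqrt{\pi(\nu-2)}}\left(1+\frac{a_t^2}{(\nu-2)\sigma_t^2}\right)^{-(\nu+1)/2} \tag{70}$$

$$\nabla_t = \frac{\partial\ln f(a_t\mid\sigma_t^2,\nu)}{\partial\sigma_t^2} = \frac{(\nu+1)a_t^2}{2\sigma_t^2\left((\nu-2)\sigma_t^2+a_t^2\right)}-\frac{1}{2\sigma_t^2} \tag{71}$$

$$S_t = \mathcal{I}_{t|t-1}^{-1} = \mathrm{E}_{t-1}\left[\nabla_t\nabla_t'\right]^{-1} = -\mathrm{E}_{t-1}\left[\frac{\partial^2\ln f(a_t\mid\sigma_t^2,\nu)}{\partial(\sigma_t^2)^2}\right]^{-1} \tag{72}$$

$$-\mathrm{E}_{t-1}\left[\frac{\partial^2\ln f(a_t\mid\sigma_t^2,\nu)}{\partial(\sigma_t^2)^2}\right]^{-1} = \frac{2\sigma_t^4(\nu+3)}{\nu} \tag{73}$$

$$s_t = \nabla_t S_t = \frac{\nu+3}{\nu}\,\sigma_t^2\left(\frac{(\nu+1)a_t^2}{(\nu-2)\sigma_t^2+a_t^2}-1\right) \tag{74}$$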
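The Student's t expressions can be checked numerically. The sketch compares the analytic gradient (71) with a central finite difference of the log density (70), confirms that (74) equals $\nabla_t S_t$ with $S_t$ from (73), and checks that the rewritten form (78) derived below agrees. The GAS(1,1) recursion $\sigma^2_{t+1} = \omega + A_1 s_t + B_1\sigma^2_t$ and the use of the GAS Student estimates from Table 2 as example parameters are assumptions for illustration.

```python
import math


def t_logpdf(a: float, h: float, nu: float) -> float:
    """Log of (70): Student's t density with variance h = sigma_t^2 (nu > 2)."""
    return (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
            - 0.5 * math.log(math.pi * (nu - 2)) - 0.5 * math.log(h)
            - (nu + 1) / 2 * math.log1p(a * a / ((nu - 2) * h)))


def t_gradient(a: float, h: float, nu: float) -> float:
    """(71): derivative of the log density with respect to h."""
    return (nu + 1) * a * a / (2 * h * ((nu - 2) * h + a * a)) - 1 / (2 * h)


def t_scaled_score(a: float, h: float, nu: float) -> float:
    """(74): s_t = nabla_t * S_t with S_t = 2 h^2 (nu + 3) / nu from (73)."""
    return (nu + 3) / nu * h * ((nu + 1) * a * a / ((nu - 2) * h + a * a) - 1)


def t_scaled_score_rewritten(a: float, h: float, nu: float) -> float:
    """(78): the same score in the form of Creal et al. (2013)."""
    inv = 1 / nu
    denom = (1 - 2 * inv) * (1 + a * a * inv / ((1 - 2 * inv) * h))
    return (1 + 3 * inv) * ((1 + inv) * a * a / denom - h)


def gas_update(h: float, a: float, nu: float, omega: float, A1: float, B1: float) -> float:
    """Assumed GAS(1,1) recursion: sigma^2_{t+1} = omega + A1 * s_t + B1 * sigma^2_t."""
    return omega + A1 * t_scaled_score(a, h, nu) + B1 * h


a, h, nu = -1.7, 1.2, 6.2849  # illustrative return, current variance and nu from Table 2
eps = 1e-6
fd = (t_logpdf(a, h + eps, nu) - t_logpdf(a, h - eps, nu)) / (2 * eps)
print(fd, t_gradient(a, h, nu))
print(t_scaled_score(a, h, nu), t_scaled_score_rewritten(a, h, nu))
print(gas_update(h, a, nu, omega=0.0309, A1=0.0564, B1=0.9669))
```

Because $a_t$ enters only through $a_t^2$, the score treats negative and positive returns of equal size identically.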

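This expression can be used as it stands, but to obtain the expression in Creal et al. (2013) it must be rewritten:

$$s_t = \frac{\nu+3}{\nu}\,\sigma_t^2\left(\frac{(\nu+1)a_t^2}{(\nu-2)\sigma_t^2+a_t^2}-1\right) = \frac{\nu+3}{\nu}\left(\frac{1/\nu}{1/\nu}\cdot\frac{(\nu+1)a_t^2\sigma_t^2}{(\nu-2)\sigma_t^2+a_t^2}-\sigma_t^2\right) \tag{75}$$

$$= \left(1+3\nu^{-1}\right)\left(\frac{(1+\nu^{-1})a_t^2\sigma_t^2}{(1-2\nu^{-1})\sigma_t^2+a_t^2\nu^{-1}}-\sigma_t^2\right) \tag{76}$$

$$= \left(1+3\nu^{-1}\right)\left(\frac{(1+\nu^{-1})a_t^2}{(1-2\nu^{-1})+a_t^2\nu^{-1}/\sigma_t^2}-\sigma_t^2\right) \tag{77}$$

$$= \left(1+3\nu^{-1}\right)\left(\frac{(1+\nu^{-1})a_t^2}{(1-2\nu^{-1})\left(1+a_t^2\nu^{-1}/\left((1-2\nu^{-1})\sigma_t^2\right)\right)}-\sigma_t^2\right) \tag{78}$$

Appendix B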

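Derivation of the GAS(1,1) model assuming the AEPD as the distribution of the errors. Set $a_t =$

in (23):

$$f(a_t\mid\sigma_t,\alpha,p_1,p_2) = \begin{cases}\dfrac{\delta}{\sqrt{\sigma_t^2}}\exp\left(-\dfrac{1}{p_1}\left|\dfrac{\omega+\delta a_t/\sqrt{\sigma_t^2}}{2\alpha K_{EP}(p_1)}\right|^{p_1}\right), & \text{if } \omega+\delta a_t/\sqrt{\sigma_t^2}\le 0,\\[2ex] \dfrac{\delta}{\sqrt{\sigma_t^2}}\exp\left(-\dfrac{1}{p_2}\left|\dfrac{\omega+\delta a_t/\sqrt{\sigma_t^2}}{2(1-\alpha)K_{EP}(p_2)}\right|^{p_2}\right), & \text{if } \omega+\delta a_t/\sqrt{\sigma_t^2}> 0\end{cases} \tag{79}$$

$$L(a_t;\theta) = \frac{\Gamma(1+1/p_1)\left(-\omega-\delta a_t/\sqrt{\sigma_t^2}\right)}{\alpha}\,I\left(\omega+\delta a_t/\sqrt{\sigma_t^2}<0\right) \tag{80}$$

$$R(a_t;\theta) = \frac{\Gamma(1+1/p_2)\left(\omega+\delta a_t/\sqrt{\sigma_t^2}\right)}{1-\alpha}\,I\left(\omega+\delta a_t/\sqrt{\sigma_t^2}>0\right) \tag{81}$$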

From equation (17), the logarithm of the density is

$$\ln f(a_t\mid\sigma_t,\alpha,p_1,p_2) = \ln\delta-\frac{1}{2}\ln\sigma_t^2-\left[L(a_t;\theta)\right]^{p_1}-\left[R(a_t;\theta)\right]^{p_2} \tag{83}$$

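Taking the derivative with respect to $\sigma_t^2$ gives the score:

$$\nabla_t = \frac{\partial\ln f}{\partial\sigma_t^2} = -\frac{1}{2\sigma_t^2}-\frac{\Gamma(1/p_1)}{\alpha}\left[L(a_t;\theta)\right]^{p_1-1}\frac{\delta a_t}{2\sigma_t^2\sqrt{\sigma_t^2}}+\frac{\Gamma(1/p_2)}{1-\alpha}\left[R(a_t;\theta)\right]^{p_2-1}\frac{\delta a_t}{2\sigma_t^2\sqrt{\sigma_t^2}} \tag{85}$$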
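The score (85) can be checked against a finite difference of the log density (83). In this sketch α, p1 and p2 are the GAS AEPD estimates from Table 2, while ω and δ, which in the paper come from (11) and (12), are placeholder values; the derivative formula holds for any ω and δ, so the check does not need the standardizing constants. The last lines confirm that with α = 0.5, p1 = p2 = 2, ω = 0 and δ = 1/√(2π) the density reduces to the normal density with variance $\sigma_t^2$. The check covers (83) and (85) only.

```python
import math


def aepd_logpdf(a: float, h: float, alpha: float, p1: float, p2: float,
                omega: float, delta: float) -> float:
    """Log density (83): ln(delta) - 0.5 ln(h) - L^p1 - R^p2, with h = sigma_t^2."""
    z = omega + delta * a / math.sqrt(h)
    base = math.log(delta) - 0.5 * math.log(h)
    if z <= 0:
        L = math.gamma(1 + 1 / p1) * (-z) / alpha  # (80)
        return base - L ** p1
    R = math.gamma(1 + 1 / p2) * z / (1 - alpha)   # (81)
    return base - R ** p2


def aepd_score(a: float, h: float, alpha: float, p1: float, p2: float,
               omega: float, delta: float) -> float:
    """Analytic derivative (85) of the log density with respect to h."""
    z = omega + delta * a / math.sqrt(h)
    common = delta * a / (2 * h * math.sqrt(h))
    grad = -1 / (2 * h)
    if z < 0:
        L = math.gamma(1 + 1 / p1) * (-z) / alpha
        grad -= math.gamma(1 / p1) / alpha * L ** (p1 - 1) * common
    elif z > 0:
        R = math.gamma(1 + 1 / p2) * z / (1 - alpha)
        grad += math.gamma(1 / p2) / (1 - alpha) * R ** (p2 - 1) * common
    return grad


def central_difference(f, h: float, step: float = 1e-6) -> float:
    """Central finite-difference approximation of f'(h)."""
    return (f(h + step) - f(h - step)) / (2 * step)


# alpha, p1, p2 from the GAS AEPD row of Table 2; omega and delta are placeholders
params = dict(alpha=0.5040, p1=1.2367, p2=1.2538, omega=0.1, delta=0.9)
for a in (-1.3, 0.8):  # one return on each side of the kink
    fd = central_difference(lambda h: aepd_logpdf(a, h, **params), 1.2)
    print(a, fd, aepd_score(a, 1.2, **params))

# Normal limit: alpha = 0.5, p1 = p2 = 2, omega = 0, delta = 1/sqrt(2*pi)
normal = dict(alpha=0.5, p1=2.0, p2=2.0, omega=0.0, delta=1 / math.sqrt(2 * math.pi))
print(aepd_logpdf(0.7, 1.5, **normal), -0.5 * math.log(2 * math.pi * 1.5) - 0.7 ** 2 / 3.0)
```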

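Next, the second derivative with respect to $\sigma_t^2$ is taken, using the product rule for the terms that contain the L and R functions:

$$\frac{\partial^2\ln f}{\partial(\sigma_t^2)^2} = \frac{1}{2(\sigma_t^2)^2}-\frac{(p_1-1)\Gamma(1/p_1)\Gamma(1+1/p_1)}{\alpha^2}\left[L(a_t;\theta)\right]^{p_1-2}\left(\frac{\delta a_t}{2\sigma_t^2\sqrt{\sigma_t^2}}\right)^2+\frac{\Gamma(1/p_1)}{\alpha}\left[L(a_t;\theta)\right]^{p_1-1}\frac{3\delta a_t}{4(\sigma_t^2)^{5/2}}-\frac{(p_2-1)\Gamma(1/p_2)\Gamma(1+1/p_2)}{(1-\alpha)^2}\left[R(a_t;\theta)\right]^{p_2-2}\left(\frac{\delta a_t}{2\sigma_t^2\sqrt{\sigma_t^2}}\right)^2-\frac{\Gamma(1/p_2)}{1-\alpha}\left[R(a_t;\theta)\right]^{p_2-1}\frac{3\delta a_t}{4(\sigma_t^2)^{5/2}} \tag{87}$$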


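Here the expected value is taken:

$$\mathrm{E}_{t-1}\left[\frac{\partial^2\ln f}{\partial(\sigma_t^2)^2}\right] = \frac{1}{2(\sigma_t^2)^2}-\frac{(p_1-1)\Gamma(1/p_1)\Gamma(1+1/p_1)}{\alpha^2}\mathrm{E}\left[L(a_t;\theta)^{p_1-2}\right]\frac{\delta^2\mathrm{E}[a_t^2]}{2(\sigma_t^2)^3}+\frac{\Gamma(1/p_1)}{\alpha}\mathrm{E}\left[L(a_t;\theta)^{p_1-1}\right]\frac{3\delta\mathrm{E}[a_t]}{4(\sigma_t^2)^{5/2}}-\frac{(p_2-1)\Gamma(1/p_2)\Gamma(1+1/p_2)}{(1-\alpha)^2}\mathrm{E}\left[R(a_t;\theta)^{p_2-2}\right]\frac{\delta^2\mathrm{E}[a_t^2]}{2(\sigma_t^2)^3}-\frac{\Gamma(1/p_2)}{1-\alpha}\mathrm{E}\left[R(a_t;\theta)^{p_2-1}\right]\frac{3\delta\mathrm{E}[a_t]}{4(\sigma_t^2)^{5/2}} \tag{89}$$

With $\mathrm{E}[a_t] = 0$ and $\mathrm{E}[a_t^2] = \sigma_t^2$:

$$\mathrm{E}_{t-1}\left[\frac{\partial^2\ln f}{\partial(\sigma_t^2)^2}\right] = \frac{1}{2(\sigma_t^2)^2}-\frac{(p_1-1)\Gamma(1/p_1)\Gamma(1+1/p_1)}{\alpha^2}\mathrm{E}\left[L(a_t;\theta)^{p_1-2}\right]\frac{\delta^2}{2(\sigma_t^2)^2}-\frac{(p_2-1)\Gamma(1/p_2)\Gamma(1+1/p_2)}{(1-\alpha)^2}\mathrm{E}\left[R(a_t;\theta)^{p_2-2}\right]\frac{\delta^2}{2(\sigma_t^2)^2} \tag{90}$$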

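Lemma 8 in the appendix of Zhu and Zinde-Walsh (2009) gives the expected values of the L and R functions:

$$\mathrm{E}\left[L(a_t;\theta)^r\left(\ln L(a_t;\theta)\right)^m I(a_t<\mu)\right] = \frac{\alpha}{p_1^{m+1}}\,\frac{\Gamma^{(m)}\left((1+r)/p_1\right)}{\Gamma(1+1/p_1)} \tag{91}$$

$$\mathrm{E}\left[R(a_t;\theta)^r\left(\ln R(a_t;\theta)\right)^m I(a_t>\mu)\right] = \frac{1-\alpha}{p_2^{m+1}}\,\frac{\Gamma^{(m)}\left((1+r)/p_2\right)}{\Gamma(1+1/p_2)} \tag{92}$$

where $\Gamma^{(m)}$ denotes the m-th derivative of the gamma function. Here $r = p_1-2$ for L and $r = p_2-2$ for R, and in both cases $m = 0$, so: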

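$$\mathrm{E}\left[L(a_t;\theta)^{p_1-2}\right] = \frac{\alpha}{p_1}\,\frac{\Gamma(1-1/p_1)}{\Gamma(1+1/p_1)} \tag{93}$$

$$\mathrm{E}\left[R(a_t;\theta)^{p_2-2}\right] = \frac{1-\alpha}{p_2}\,\frac{\Gamma(1-1/p_2)}{\Gamma(1+1/p_2)} \tag{94}$$

$$\Rightarrow\ \mathrm{E}_{t-1}\left[\frac{\partial^2\ln f}{\partial(\sigma_t^2)^2}\right] = \frac{1}{2(\sigma_t^2)^2}-\frac{(p_1-1)\Gamma(1/p_1)\Gamma(1+1/p_1)}{\alpha^2}\,\frac{\alpha}{p_1}\frac{\Gamma(1-1/p_1)}{\Gamma(1+1/p_1)}\,\frac{\delta^2}{2(\sigma_t^2)^2}-\frac{(p_2-1)\Gamma(1/p_2)\Gamma(1+1/p_2)}{(1-\alpha)^2}\,\frac{1-\alpha}{p_2}\frac{\Gamma(1-1/p_2)}{\Gamma(1+1/p_2)}\,\frac{\delta^2}{2(\sigma_t^2)^2} \tag{95}$$

$$= \frac{1}{2(\sigma_t^2)^2}-\frac{(p_1-1)\Gamma(1/p_1)\Gamma(1-1/p_1)}{\alpha p_1}\,\frac{\delta^2}{2(\sigma_t^2)^2}-\frac{(p_2-1)\Gamma(1/p_2)\Gamma(1-1/p_2)}{(1-\alpha)p_2}\,\frac{\delta^2}{2(\sigma_t^2)^2} \tag{96}$$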
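Lemma 8 with m = 0 can be checked by numerical integration. For a unit-scale AEPD centred at zero, the left branch of the density in the standardized variable is $\exp(-L^{p})$ with L as in (80), so $\mathrm{E}[L^r I(\cdot)]$ is a one-dimensional integral. The sketch uses p = 2.5, because for p < 2 the integrand $L^{p-2}$ is singular at zero and the plain midpoint rule used here would converge slowly; α and p are illustrative values, and the unit-scale simplification is an assumption of the sketch.

```python
import math


def lemma8_left(r: float, alpha: float, p: float) -> float:
    """(91) with m = 0: E[L^r I(left branch)] = alpha * Gamma((1 + r)/p) / (p * Gamma(1 + 1/p))."""
    return alpha * math.gamma((1 + r) / p) / (p * math.gamma(1 + 1 / p))


def lemma8_left_numeric(r: float, alpha: float, p: float,
                        n: int = 200_000, z_min: float = -10.0) -> float:
    """Midpoint rule for the integral over z < 0 of L(z)^r * exp(-L(z)^p),
    with L(z) = Gamma(1 + 1/p) * (-z) / alpha (unit-scale AEPD, mu = 0)."""
    c = math.gamma(1 + 1 / p) / alpha
    width = -z_min / n
    total = 0.0
    for i in range(n):
        z = z_min + (i + 0.5) * width
        L = c * (-z)
        total += L ** r * math.exp(-(L ** p)) * width
    return total


alpha, p = 0.45, 2.5  # illustrative values with p > 2, so L^(p - 2) is not singular
r = p - 2             # the exponent used in (93)
print(lemma8_left(r, alpha, p), lemma8_left_numeric(r, alpha, p))
print(lemma8_left(0.0, alpha, p))  # r = 0 gives the left-branch probability alpha
```

The r = 0 case is a useful by-product: it shows that the left branch carries probability α, which is the role α plays in the AEPD.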


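Taking the inverse of (96) and putting a minus sign in front gives

$$S_t = -\mathrm{E}_{t-1}\left[\frac{\partial^2\ln f}{\partial(\sigma_t^2)^2}\right]^{-1} = -2(\sigma_t^2)^2\left[1-\frac{(p_1-1)\Gamma(1/p_1)\Gamma(1-1/p_1)\delta^2}{\alpha p_1}-\frac{(p_2-1)\Gamma(1/p_2)\Gamma(1-1/p_2)\delta^2}{(1-\alpha)p_2}\right]^{-1} \tag{98}$$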

Finally, $\nabla_t$ and $S_t$ are combined, where $\nabla_t$ is everything within the square brackets in the numerator. Since every term in the numerator has a factor $1/(2\sigma_t^2)$, the formula can be simplified slightly, which gives the end result used in this paper:

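$$s_t = \nabla_t S_t = -2(\sigma_t^2)^2\,\frac{\dfrac{1}{2\sigma_t^2}+\dfrac{\Gamma(1/p_1)}{\alpha}\left[L(a_t;\theta)\right]^{p_1-1}\dfrac{\delta a_t}{2\sigma_t^2\sqrt{\sigma_t^2}}-\dfrac{\Gamma(1/p_2)}{1-\alpha}\left[R(a_t;\theta)\right]^{p_2-1}\dfrac{\delta a_t}{2\sigma_t^2\sqrt{\sigma_t^2}}}{1-\dfrac{(p_1-1)\Gamma(1/p_1)\Gamma(1-1/p_1)\delta^2}{\alpha p_1}-\dfrac{(p_2-1)\Gamma(1/p_2)\Gamma(1-1/p_2)\delta^2}{(1-\alpha)p_2}} \tag{100}$$

$$= -\sigma_t^2\,\frac{1+\dfrac{\Gamma(1/p_1)}{\alpha}\left[L(a_t;\theta)\right]^{p_1-1}\dfrac{\delta a_t}{\sqrt{\sigma_t^2}}-\dfrac{\Gamma(1/p_2)}{1-\alpha}\left[R(a_t;\theta)\right]^{p_2-1}\dfrac{\delta a_t}{\sqrt{\sigma_t^2}}}{1-\dfrac{(p_1-1)\Gamma(1/p_1)\Gamma(1-1/p_1)\delta^2}{\alpha p_1}-\dfrac{(p_2-1)\Gamma(1/p_2)\Gamma(1-1/p_2)\delta^2}{(1-\alpha)p_2}} \tag{101}$$

$$\sigma_{t+1}^2 = \alpha_0+A_1\left(-\sigma_t^2\right)\frac{1+\dfrac{\Gamma(1/p_1)}{\alpha}\left[L(a_t;\theta)\right]^{p_1-1}\dfrac{\delta a_t}{\sqrt{\sigma_t^2}}-\dfrac{\Gamma(1/p_2)}{1-\alpha}\left[R(a_t;\theta)\right]^{p_2-1}\dfrac{\delta a_t}{\sqrt{\sigma_t^2}}}{1-\dfrac{(p_1-1)\Gamma(1/p_1)\Gamma(1-1/p_1)\delta^2}{\alpha p_1}-\dfrac{(p_2-1)\Gamma(1/p_2)\Gamma(1-1/p_2)\delta^2}{(1-\alpha)p_2}}+B_1\sigma_t^2$$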


Appendix C

| Stock name | Start date | Sample size T |
|---|---|---|
| S&P 500 | 31-12-1963 | 14227 |
| Apple | 12-12-1980 | 9804 |
| J P Morgan | 02-01-1973 | 11877 |
| Alphabet | 19-08-2004 | 3625 |
| Exxon Mobil | 02-01-1973 | 11877 |
| Walmart | 02-01-1973 | 11877 |
| Boeing | 02-01-1973 | 11877 |
| Mastercard | 25-05-2006 | 3165 |
| AT&T | 21-11-1983 | 9038 |
| Coca Cola | 02-01-1973 | 11877 |
| McDonald's | 02-01-1973 | 11877 |
| FedEx | 12-04-1978 | 10501 |
| 20th Century Fox | 03-11-2004 | 3571 |
| Electronic Arts | 20-09-1989 | 7516 |
| Delta Airlines | 26-04-2007 | 2925 |
| Hilton | 12-12-2013 | 1195 |
| Netflix | 23-05-2002 | 4210 |
| PayPal | 06-07-2015 | 788 |
| Activision Blizzard | 25-10-1993 | 6448 |
| Boston Scientific | 19-05-1992 | 6822 |
| Target | 02-01-1973 | 11877 |
| Estee Lauder | 17-11-1995 | 5909 |
| Amazon | 15-05-1997 | 5520 |
| General Electric | 02-01-1973 | 11877 |
| Nike | 02-12-1980 | 9812 |
| Colgate | 02-01-1973 | 11877 |
| Ford Motor | 02-01-1973 | 11877 |
| eBay | 25-09-1998 | 5164 |
| Twitter | 07-11-2013 | 1220 |
| Facebook | 18-05-2012 | 1604 |


Appendix D

| Stock name | AIC | AICC | BIC | T |
|---|---|---|---|---|
| S&P 500 | AEPD | AEPD | AEPD | 14227 |
| J P Morgan | GAS AEPD | GAS AEPD | GAS AEPD | 11877 |
| Exxon Mobil | AEPD | AEPD | AEPD | 11877 |
| Walmart | AEPD | AEPD | AEPD | 11877 |
| Boeing | GAS AEPD | GAS AEPD | GAS AEPD | 11877 |
| Coca Cola | GAS AEPD | GAS AEPD | Student | 11877 |
| McDonald's | Student | Student | GAS Student | 11877 |
| Target | GAS AEPD | GAS AEPD | GAS AEPD | 11877 |
| General Electric | GAS AEPD | GAS AEPD | GAS AEPD | 11877 |
| Colgate | GAS AEPD | GAS AEPD | GAS AEPD | 11877 |
| Ford Motor | GAS AEPD | GAS AEPD | GAS AEPD | 11877 |
| FedEx | GAS AEPD | GAS AEPD | GAS AEPD | 10501 |
| Nike | AEPD | AEPD | GAS Student | 9812 |
| Apple | GAS Student | GAS Student | GAS Student | 9804 |
| AT&T | GAS AEPD | GAS AEPD | GAS AEPD | 9038 |
| Electronic Arts | GAS AEPD | GAS AEPD | GAS AEPD | 7516 |
| Boston Scientific | GAS Student | GAS Student | GAS Student | 6822 |
| Activision Blizzard | GAS AEPD | GAS AEPD | GAS AEPD | 6448 |
| Estee Lauder | GAS Student | GAS Student | GAS Student | 5909 |
| Amazon | GAS Student | GAS Student | GAS Student | 5520 |
| eBay | GAS AEPD | Student | Student | 5164 |
| Netflix | GAS Student | GAS Student | GAS Student | 4210 |
| Alphabet | GAS Student | GAS Student | GAS Student | 3625 |
| 20th Century Fox | Student | Student | Student | 3571 |
| Mastercard | GAS Student | GAS Student | GAS Student | 3165 |
| Delta Airlines | AEPD | AEPD | AEPD | 2925 |
| Facebook | GAS Student | GAS Student | GAS Student | 1604 |
| Twitter | GAS Student | GAS Student | GAS Student | 1220 |
| Hilton | AEPD | AEPD | AEPD | 1195 |
| PayPal | GAS AEPD | GAS AEPD | GAS Student | 788 |


Appendix E

The following three tables show the results when an expected shortfall forecast is made every 100 data points instead of every 500. As in Table 5, the AEPD performs best for the 1-day-ahead forecasts and the Student's t distribution performs best for the 5-day-ahead forecasts.

| Horizon | α | Normal MSE | Normal MAE | GAS Normal MSE | GAS Normal MAE |
|---|---|---|---|---|---|
| 1 day | 0.001 | 0.031284629 | 0.034980559 | 8.45E-06 | 0.000492882 |
| 1 day | 0.005 | 0.02307824 | 0.030044334 | 6.23E-06 | 0.000423329 |
| 1 day | 0.01 | 0.019601341 | 0.0276888 | 5.29E-06 | 0.000390139 |
| 1 day | 0.025 | 0.015081248 | 0.024287336 | 4.07E-06 | 0.000342212 |
| 1 day | 0.05 | 0.011740837 | 0.021429438 | 3.17E-06 | 0.000301944 |
| 1 day | 0.1 | 0.008498994 | 0.018232449 | 2.30E-06 | 0.000256898 |
| 1 day | 0.25 | 0.004458464 | 0.01320547 | 1.20E-06 | 0.000186067 |
| 1 day | 0.5 | 0.001756716 | 0.00828919 | 4.74E-07 | 0.000116796 |
| 5 day | 0.001 | 0.075355803 | 0.099808646 | 1.951582137 | 0.437767772 |
| 5 day | 0.005 | 0.055588938 | 0.08572431 | 1.439655252 | 0.37599288 |
| 5 day | 0.01 | 0.047214074 | 0.079003358 | 1.222761084 | 0.346514309 |
| 5 day | 0.025 | 0.036326452 | 0.069298096 | 0.940790906 | 0.303946344 |
| 5 day | 0.05 | 0.028280349 | 0.061143767 | 0.732411062 | 0.268180883 |
| 5 day | 0.1 | 0.020471667 | 0.052021925 | 0.530180012 | 0.22817184 |
| 5 day | 0.25 | 0.010739176 | 0.037678647 | 0.278125687 | 0.165261206 |
| 5 day | 0.5 | 0.004231432 | 0.023651217 | 0.109586609 | 0.103735908 |


| Horizon | α | Student t MSE | Student t MAE | GAS Student t MSE | GAS Student t MAE |
|---|---|---|---|---|---|
| 1 day | 0.001 | 0.042448995 | 0.052612889 | 0.034284328 | 0.053288258 |
| 1 day | 0.005 | 0.016411101 | 0.033248124 | 0.011898263 | 0.03267836 |
| 1 day | 0.01 | 0.011015519 | 0.027425823 | 0.007715924 | 0.026685517 |
| 1 day | 0.025 | 0.006604801 | 0.021421265 | 0.004482416 | 0.020633557 |
| 1 day | 0.05 | 0.004564337 | 0.017916985 | 0.00306036 | 0.017177152 |
| 1 day | 0.1 | 0.003231404 | 0.015159813 | 0.002163612 | 0.01450648 |
| 1 day | 0.25 | 0.002198888 | 0.012580482 | 0.001490313 | 0.012055552 |
| 1 day | 0.5 | 0.001897847 | 0.01171343 | 0.001298139 | 0.01124379 |
| 5 day | 0.001 | 0.05943764 | 0.08808571 | 0.089382437 | 0.102370645 |
| 5 day | 0.005 | 0.024431294 | 0.056317696 | 0.036535221 | 0.066060627 |
| 5 day | 0.01 | 0.016864821 | 0.046688641 | 0.025132101 | 0.054904898 |
| 5 day | 0.025 | 0.010503115 | 0.036705025 | 0.015542768 | 0.043259019 |
| 5 day | 0.05 | 0.007470278 | 0.030847127 | 0.010970392 | 0.036366572 |
| 5 day | 0.1 | 0.005438324 | 0.02621776 | 0.007908587 | 0.030861967 |
| 5 day | 0.25 | 0.003821578 | 0.021867205 | 0.005477034 | 0.025632444 |
| 5 day | 0.5 | 0.003340121 | 0.020399706 | 0.004754909 | 0.023858223 |

Table 11: Student's t distribution errors when increasing the amount of predictions

| Horizon | α | AEPD MSE | AEPD MAE | GAS AEPD MSE | GAS AEPD MAE |
|---|---|---|---|---|---|
| 1 day | 0.001 | 3.72E-06 | 0.000360855 | 0.690045607 | 0.251918457 |
| 1 day | 0.005 | 2.33E-06 | 0.000296854 | 0.488209218 | 0.207819016 |
| 1 day | 0.01 | 1.82E-06 | 0.000267567 | 0.406344604 | 0.187619615 |
| 1 day | 0.025 | 1.25E-06 | 0.000226843 | 0.303307465 | 0.159482785 |
| 1 day | 0.05 | 8.76E-07 | 0.000194174 | 0.229841429 | 0.136844278 |
| 1 day | 0.1 | 5.66E-07 | 0.000159455 | 0.161078092 | 0.112686758 |
| 1 day | 0.25 | 2.53E-07 | 0.000109203 | 0.079923606 | 0.077454012 |
| 1 day | 0.5 | 9.17E-08 | 6.61E-05 | 0.030038995 | 0.046789247 |
| 5 day | 0.001 | 1.047997861 | 0.27907734 | 3.104522765 | 0.578192166 |
| 5 day | 0.005 | 0.666200687 | 0.223085081 | 1.971681312 | 0.462276531 |
| 5 day | 0.01 | 0.524849644 | 0.198267951 | 1.556661813 | 0.411017957 |
| 5 day | 0.025 | 0.360549952 | 0.164653906 | 1.076266452 | 0.341682502 |
| 5 day | 0.05 | 0.25417711 | 0.138485876 | 0.765387293 | 0.287756256 |
| 5 day | 0.1 | 0.164161244 | 0.111515265 | 0.501059116 | 0.232186684 |
| 5 day | 0.25 | 0.072208182 | 0.074189329 | 0.226782992 | 0.155189227 |
| 5 day | 0.5 | 0.025241402 | 0.043965659 | 0.081798225 | 0.092651845 |
