
Quantifying the uncertainty associated with Value-at-Risk in ARMA(1,1)-GARCH(1,1) models

Chris van den Broek [s1549987]

Supervisors: prof. dr. Ruud Koning, prof. dr. Laura Spierdijk

January 31st, 2014

Abstract. We investigate a range of methods to assess the estimation uncertainty associated with Value-at-Risk forecasting. We propose an additional method based on the likelihood ratio statistic and the profile likelihood, and we provide the means to implement it successfully. We assess the ability of the methods to correctly quantify the estimation uncertainty that accompanies one-step-ahead Value-at-Risk forecasting by comparing the empirical coverage probabilities with the nominal coverages for a wide variety of settings.

1 Introduction

In this thesis we investigate various techniques to quantify the uncertainty associated with Value-at-Risk estimation, and we evaluate a new addition to the class of Value-at-Risk assessment methods. Our main contribution consists of profile likelihood-based methods and the evaluation of their performance in a controlled environment.

The Value-at-Risk concerns the estimation of the lower quantiles of a data generating process, such as modelled by models for portfolio return distributions. Risk measures such as the Value-at-Risk are estimates: under the assumption that a (parametric) model underlies the data generating process of the returns, they depend on the estimated model parameters. Risk measures are therefore subject to the estimation uncertainty of the parameters of the underlying model. Quantifying the uncertainty that accompanies Value-at-Risk estimation is of great importance given the prominent role of the Value-at-Risk in finance and econometrics. Besides parameter uncertainty, risk measures are also subject to model uncertainty and process error. In this paper we focus on parameter uncertainty.

In empirical contributions to the academic literature, authors often note certain stylized facts of high frequency financial data. These observations include the little serial correlation exhibited by return series, whereas series of squared and absolute returns exhibit significant serial correlation. Other observations include the fat tails that return series tend to exhibit, volatility that appears to vary over time, and extreme returns that tend to appear in clusters. Concerning the heavy tails that often accompany financial return data, Stoyanov et al. (2011) give an overview of a large set of recorded cases in applied finance where fat tails are accompanied by skewness in the underlying distribution of the data generating process. In many of the recorded cases the frequency of extreme gains and extreme losses is much higher than one would expect on the basis of models that incorporate the normal distribution. Even though the frequent occurrence of such cases is widely acknowledged, many concepts and procedures in empirical and theoretical finance make the incorrect assumption that the data generating process follows a normal distribution. The variance-covariance approach of RiskMetrics in the setting of Value-at-Risk forecasting and the Black-Scholes-Merton option pricing model serve as illustrative examples of this point.

The ARMA-GARCH model offers a modelling approach to deal with both of the aforementioned characteristics: it models volatility over time and it includes a conditional mean which captures the autocorrelation through an ARMA model. The well-developed theory around estimation techniques for the parameters makes ARMA-GARCH models a first-class approach to model financial data. The parameters of ARMA-GARCH models are often estimated using QML (quasi-maximum likelihood) procedures; Francq & Zakoian (2004) prove consistency and asymptotic normality of the QMLE under relatively weak modelling conditions for the general ARMA(p,q)-GARCH(Q,P) model with lag parameters p, q, Q and P. Among other things, Francq & Zakoian (2004) show that QML yields consistent estimates for ARMA-GARCH models even if the assumption of normality on the distribution of the residuals is violated (we come back to this subject later in this paper).

The primary aim of this paper is to assess and compare the delta method and the profile likelihood approach in quantifying the uncertainty associated with the one-step ahead Value-at-Risk forecast. We introduce an approach based on profile likelihood methods and the generalized likelihood ratio statistic, and we examine its properties and performance against the delta method on the basis of coverage probability accuracy (the coverage probability of a method is the probability that the confidence interval it supplies contains the true value) and the average width of the confidence interval bounds.

We proceed as follows. In section 2 we introduce the ARMA(1,1)-GARCH(1,1) model and the one-step ahead Value-at-Risk forecast on the basis of the data generating process specified by this model. In section 3 we provide a literature overview of a commonly used method to assess the uncertainty associated with Value-at-Risk estimation. We propose additional methods based on profile likelihood theory in section 4. In section 5 we compare the methods in terms of empirical coverage probabilities. We present an application of our results in an empirical setting in section 6. Finally, in section 7 we discuss our findings and we share our ideas on the direction of future research on the subject of Value-at-Risk estimation and the quantification of its uncertainty.

2 The model

We consider the general ARMA(1,1)-GARCH(1,1) model as specified by Francq & Zakoian (2004). Using similar notation as in McNeil et al. (2005), the ARMA(1,1)-GARCH(1,1) model constitutes a process $X = \{X_t,\ t = 1, \dots, T\}$ such that:

$$
\begin{aligned}
X_t &= \mu + \phi(X_{t-1} - \mu) + e_t + \varphi e_{t-1} \\
e_t &= \sigma_t Z_t \\
\sigma_t^2 &= \alpha_0 + \alpha_1 e_{t-1}^2 + \beta \sigma_{t-1}^2,
\end{aligned} \tag{1}
$$

such that $(\mu, \phi, \varphi)' \in \mathbb{R}^3$ and $(\alpha_0, \alpha_1, \beta)' \in (0, +\infty) \times [0, +\infty)^2$. Let $\theta$ denote the whole parameter vector, i.e. $\theta = (\mu, \phi, \varphi, \alpha_0, \alpha_1, \beta)'$, and let $\Theta$ denote the parameter space. We assume the residual process $Z = \{Z_t,\ t = 1, \dots, T\}$ satisfies $Z \sim \mathrm{SWN}(0, 1)$, i.e. $Z$ is a strict white noise process with mean 0 and variance 1. In the academic literature, a random process is called a strict white noise process if it is a series of iid random variables with a finite variance. The covariance stationarity condition for the model is $\alpha_1 + \beta < 1$. We can reparameterize model (1) to a more commonly known specification by introducing $\mu_t = X_t - e_t$, which allows us to rewrite (1) as:

$$
\begin{aligned}
X_t &= \mu_t + \sigma_t Z_t \\
\mu_t &= \mu + \phi(X_{t-1} - \mu) + \varphi(X_{t-1} - \mu_{t-1}) \\
\sigma_t^2 &= \alpha_0 + \alpha_1 (X_{t-1} - \mu_{t-1})^2 + \beta \sigma_{t-1}^2
\end{aligned} \tag{2}
$$

Higher moments of $X_t$ do not necessarily exist, and higher moments of $Z_t$ may not exist either. To estimate the parameters by QML some authors require the existence of a fourth moment of the residual process, as discussed in Ling & McAleer (2003). A necessary and sufficient condition for the existence of the fourth moment is derived in McNeil et al. (2005) and given by $E[(\alpha_1 Z_t^2 + \beta)^2] < 1$. This condition expresses a dependence on the kurtosis of the residual process and on the parameters of the GARCH part, which becomes more explicit by deriving:

$$
E[(\alpha_1 Z_t^2 + \beta)^2] = \alpha_1^2 E[Z_t^4] + 2\alpha_1\beta E[Z_t^2] + \beta^2 = \alpha_1^2 \kappa_Z + 2\alpha_1\beta + \beta^2 \tag{3}
$$
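To make the recursion in (1)-(2) concrete, the following R sketch simulates a sample path. The function name, the burn-in length and the start value of the variance are illustrative assumptions of this sketch (section 5.1 initializes with σ₀² = α₀ instead); the parameter values correspond to the high persistence specification used later in section 5.

```r
# Simulate a sample path of the ARMA(1,1)-GARCH(1,1) recursion in (1)-(2).
# Burn-in length and variance start value are illustrative choices.
simulate_armagarch <- function(T, mu, phi, vphi, alpha0, alpha1, beta,
                               rinnov = rnorm, burnin = 500) {
  n <- T + burnin
  Z <- rinnov(n)
  X <- e <- sigma2 <- numeric(n)
  sigma2[1] <- alpha0 / (1 - alpha1 - beta)   # unconditional variance start
  e[1] <- sqrt(sigma2[1]) * Z[1]
  X[1] <- mu + e[1]
  for (t in 2:n) {
    sigma2[t] <- alpha0 + alpha1 * e[t - 1]^2 + beta * sigma2[t - 1]
    e[t]      <- sqrt(sigma2[t]) * Z[t]
    X[t]      <- mu + phi * (X[t - 1] - mu) + e[t] + vphi * e[t - 1]
  }
  list(X = tail(X, T), e = tail(e, T), sigma2 = tail(sigma2, T))
}

set.seed(1)
sim <- simulate_armagarch(1000, mu = 0, phi = 0.3, vphi = 0.1,
                          alpha0 = 0.01, alpha1 = 0.1, beta = 0.8)
```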


Based on our random process $X$, the one-step ahead Value-at-Risk forecast for time $T+1$ with a confidence level $\xi$ is a number $\mathrm{VaR}^{\xi}_{T+1} \in \mathbb{R}$ such that $P(X_{T+1} > \mathrm{VaR}^{\xi}_{T+1} \mid \mathcal{F}_T) = 1 - \xi$, with $\mathcal{F}_T$ denoting the natural filtration induced by our random process $X$ up to and including time $T$, i.e. $\mathcal{F}_T = \sigma(X_s;\ s \le T)$. In particular, we are interested in $\mathrm{VaR}^{\xi}_{T+1}$. By our reparameterization, it is clear from (2) that:

$$
\begin{aligned}
\xi &= P(X_{T+1} \le \mathrm{VaR}^{\xi}_{T+1} \mid \mathcal{F}_T) = P(\mu_{T+1} + \sigma_{T+1} Z_{T+1} \le \mathrm{VaR}^{\xi}_{T+1} \mid \mathcal{F}_T) = P\!\left(Z_{T+1} \le \frac{\mathrm{VaR}^{\xi}_{T+1} - \mu_{T+1}}{\sigma_{T+1}} \,\Big|\, \mathcal{F}_T\right) \\
q_{\xi} &= \frac{\mathrm{VaR}^{\xi}_{T+1} - \mu_{T+1}}{\sigma_{T+1}},
\end{aligned} \tag{4}
$$

where $q_{\xi}$ denotes the quantile associated with probability $\xi$ of the residual process $Z$. Notice that (4) can be rewritten to yield:

$$
\mathrm{VaR}^{\xi}_{T+1} = \mu_{T+1} + \sigma_{T+1} q_{\xi} \tag{5}
$$

In the previous derivation we used $q_{\xi}, \mu_{T+1}, \sigma_{T+1} \in m\mathcal{F}_T$, meaning these entities are measurable with respect to the filtration up to and including time $T$. The implication is that $\mathrm{VaR}^{\xi}_{T+1} \in m\mathcal{F}_T$, and hence there is no randomness involved within the formulation of the one-step ahead Value-at-Risk forecast at time $T$.

Let $\hat{\theta}$ denote the QMLE based on the Gaussian quasi-log-likelihood function and the information available at time $T$. We consider a two-stage approach for estimating the one-step ahead Value-at-Risk, in which we first estimate $\theta$ by QML and in the second stage use the standardized residuals to obtain an empirical quantile estimate of $q_{\xi}$, denoted by $\hat{q}_{\xi}$. Our main interest lies with assessing the estimation uncertainty of the resulting Value-at-Risk forecast. Let $\mathrm{VaR}^{\xi}_0$ denote the true (but unknown) value of $\mathrm{VaR}^{\xi}$ and let $\Upsilon$ denote the parameter space of $\mathrm{VaR}^{\xi}$.
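A minimal R sketch of the two-stage forecast described above, re-using the model recursion of (2): a Gaussian quasi-log-likelihood is maximized with optim, the standardized residuals deliver the empirical quantile, and (5) gives the forecast. The initialization of μ_t and σ_t², the unconstrained optimizer without positivity restrictions, and the user-supplied starting values theta_start are simplifying assumptions of this sketch, not the thesis' exact estimation routine.

```r
# Filter the conditional mean and variance of model (2) for
# theta = (mu, phi, vphi, alpha0, alpha1, beta); the start values for mu_t
# and sigma_t^2 (sample mean and variance) are assumptions of this sketch.
filter_armagarch <- function(theta, X) {
  mu <- theta[1]; phi <- theta[2]; vphi <- theta[3]
  a0 <- theta[4]; a1  <- theta[5]; b    <- theta[6]
  T <- length(X)
  mut <- sig2 <- numeric(T)
  mut[1] <- mean(X); sig2[1] <- var(X)
  for (t in 2:T) {
    mut[t]  <- mu + phi * (X[t - 1] - mu) + vphi * (X[t - 1] - mut[t - 1])
    sig2[t] <- a0 + a1 * (X[t - 1] - mut[t - 1])^2 + b * sig2[t - 1]
  }
  list(mut = mut, sig2 = sig2)
}

# Gaussian quasi-log-likelihood (up to an additive constant); invalid
# parameter values are mapped to a very poor value instead of an error.
qloglik <- function(theta, X) {
  f <- filter_armagarch(theta, X)
  if (any(!is.finite(f$sig2)) || any(f$sig2 <= 0)) return(-1e10)
  -0.5 * sum(log(f$sig2) + (X - f$mut)^2 / f$sig2)
}

# Two-stage one-step-ahead VaR forecast: QML for theta, then the empirical
# quantile of the standardized residuals, combined through (5).
var_forecast <- function(X, xi = 0.05, theta_start) {
  fit <- optim(theta_start, function(th) -qloglik(th, X),
               control = list(maxit = 5000))
  th <- fit$par
  f  <- filter_armagarch(th, X)
  Zhat <- (X - f$mut) / sqrt(f$sig2)
  qhat <- quantile(Zhat, probs = xi, names = FALSE)
  T <- length(X)
  mu_next   <- th[1] + th[2] * (X[T] - th[1]) + th[3] * (X[T] - f$mut[T])
  sig2_next <- th[4] + th[5] * (X[T] - f$mut[T])^2 + th[6] * f$sig2[T]
  list(theta = th, qhat = qhat, VaR = mu_next + sqrt(sig2_next) * qhat)
}
```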

3 Methods to quantify the estimation error of the Value-at-Risk

Maximum likelihood estimation is the conventional approach to estimate $\theta$. Let $L_T(\theta)$ denote the likelihood function for $\theta \in \Theta$ and a given sample $X$. In the remainder we freely use $\ln L_T(\theta)$ to denote the log-likelihood function, i.e. $\ln L_T(\theta) = \ln(L_T(\theta))$, and $l_t$ to denote the contribution of observation $t$ to the log-likelihood. Based on the first order partial derivatives we can define the (expected) Fisher information matrix $I(\theta)$ and its alternative Hessian form $J(\theta)$, which is based on the expected values of the second order partial derivatives, as:

$$
I(\theta) = E\!\left[\frac{\partial l_t(\theta)}{\partial \theta}\frac{\partial l_t(\theta)}{\partial \theta'}\right] \quad \text{and} \quad J(\theta) = -E\!\left[\frac{\partial^2 l_t(\theta)}{\partial \theta \partial \theta'}\right] \tag{6}
$$

These objects can be estimated by $\bar{I}(\theta)$ and $\bar{J}(\theta)$, defined as:

$$
\bar{I}(\theta) = \frac{1}{T}\sum_{t=1}^{T}\frac{\partial l_t(\theta)}{\partial \theta}\frac{\partial l_t(\theta)}{\partial \theta'} \quad \text{and} \quad \bar{J}(\theta) = -\frac{1}{T}\sum_{t=1}^{T}\frac{\partial^2 l_t(\theta)}{\partial \theta \partial \theta'} \tag{7}
$$

Now, under a few technical conditions as specified in Hendry (1995), we have the following asymptotic result:

$$
\sqrt{T}\,(\hat{\theta} - \theta_0) \xrightarrow{D} N(0, \Sigma) \tag{8}
$$

with $\Sigma = I(\theta_0)^{-1}$. Given some regularity assumptions, Bollerslev & Wooldridge (1992) provide a robust sandwich estimator $\hat{\Sigma}$ of the covariance matrix $\Sigma$. The authors demonstrate that the sandwich estimator is both consistent and asymptotically normal even if we deviate from the normality assumption on the model errors. In our setting, the proposed estimator can be written in the form:

$$
\hat{\Sigma} = \bar{J}(\hat{\theta})^{-1}\,\bar{I}(\hat{\theta})\,\bar{J}(\hat{\theta})^{-1}, \tag{9}
$$

and can be used for consistent asymptotic variance estimation.
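The sandwich estimator (9) can be approximated numerically from the per-observation contributions of the quasi-log-likelihood sketched above. The central-difference step size and the use of optimHess for the Hessian are implementation choices of this sketch, not prescriptions; analytical derivatives would be faster and more accurate.

```r
# Numerical version of the sandwich estimator (9), built from the
# per-observation contributions l_t of the quasi-log-likelihood above.
loglik_contrib <- function(theta, X) {
  f <- filter_armagarch(theta, X)
  -0.5 * (log(f$sig2) + (X - f$mut)^2 / f$sig2)
}

sandwich_cov <- function(theta_hat, X, h = 1e-5) {
  T <- length(X); p <- length(theta_hat)
  S <- matrix(0, T, p)                       # score contributions dl_t/dtheta
  for (j in seq_len(p)) {
    up <- dn <- theta_hat
    up[j] <- up[j] + h; dn[j] <- dn[j] - h
    S[, j] <- (loglik_contrib(up, X) - loglik_contrib(dn, X)) / (2 * h)
  }
  Ibar <- crossprod(S) / T                   # average outer product of scores
  Jbar <- optimHess(theta_hat, function(th) -qloglik(th, X) / T)
  Sigma_asy <- solve(Jbar) %*% Ibar %*% solve(Jbar)  # J^-1 I J^-1
  Sigma_asy / T                              # approximate var(theta_hat)
}
```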

3.1 The delta method

The delta method is a procedure to derive an approximate distribution for a function of the estimated parameters by using the asymptotic distribution of the estimator. In the literature, the delta method is generally classified as a generalized version of the central limit theorem. For illustration purposes, consider a transformation $g$ of the parameter vector and assume that the estimator vector $\hat{\theta}$ has a limiting normal distribution. In this (simple) setting, the delta method states that:

$$
\sqrt{T}\left(g(\hat{\theta}) - g(\theta_0)\right) \xrightarrow{D} N(0, \Sigma^{*}) \tag{10}
$$

with $\Sigma^{*} = \nabla g(\theta_0)'\,\mathrm{var}(\hat{\theta})\,\nabla g(\theta_0)$. Statement (10) can be derived by combining Taylor series expansions of $g(\cdot)$ with the assumption of an asymptotically normally distributed parameter estimator $\hat{\theta}$. Using a first-order Taylor series expansion around $\theta_0$, we obtain:

$$
g(\hat{\theta}) \approx g(\theta_0) + \nabla g(\theta_0)'\left(\hat{\theta} - \theta_0\right) \tag{11}
$$

Hence, we obtain for the variance of $g(\hat{\theta})$:

$$
\begin{aligned}
\mathrm{var}(g(\hat{\theta})) &\approx \mathrm{var}\!\left(g(\theta_0) + \nabla g(\theta_0)'(\hat{\theta} - \theta_0)\right) \\
&= \mathrm{var}\!\left(g(\theta_0) + \nabla g(\theta_0)'\hat{\theta} - \nabla g(\theta_0)'\theta_0\right) \\
&= \mathrm{var}\!\left(\nabla g(\theta_0)'\hat{\theta}\right) \\
&= \nabla g(\theta_0)'\,\mathrm{var}(\hat{\theta})\,\nabla g(\theta_0),
\end{aligned} \tag{12}
$$

with $\nabla g(\cdot)$ denoting the vector of partial derivatives of $g(\cdot)$ with respect to the parameters. For $\mathrm{var}(\hat{\theta})$ one can use the (robust) estimator of the covariance matrix.

The empirical quantile estimator of $q_{\xi}$ depends by construction on the estimated (and rescaled) residuals. Hence, the empirical quantile estimator is inherently subject to the estimation dynamics of the QML estimation of $\theta$. Starting from the consistency of the QMLE for ARMA(1,1)-GARCH(1,1) models, it is straightforward to verify that $\hat{q}_{\xi} \to q_{\xi}$ a.s. under the assumption that $\hat{\theta} \to \theta_0$ a.s. as $T \to \infty$. On the other hand, the limiting distribution of $(\hat{\theta}, \hat{q}_{\xi})'$ is far more complicated. Thankfully, Francq & Zakoïan (2012) establish asymptotic normality for this vector under some technical conditions for a general class of volatility models. We refer to page 7 of Francq & Zakoïan (2012) for the statements of the conditions. In particular,

Francq & Zakoïan (2012) show that under correct specification of the (quasi-)likelihood:

$$
\begin{pmatrix} \hat{\theta} \\ \hat{q}_{\xi} \end{pmatrix}
\xrightarrow{D}
N\!\left( \begin{pmatrix} \theta_0 \\ q_{\xi} \end{pmatrix},\ \Sigma_{\xi} \right),
\qquad \text{with} \qquad
\Sigma_{\xi} =
\begin{pmatrix}
\frac{\kappa_4 - 1}{4}\,(J^{*})^{-1} & -\lambda_{\xi}\,(J^{*})^{-1} K^{*} \\
-\lambda_{\xi}\,(K^{*})'(J^{*})^{-1} & \zeta_{\xi}
\end{pmatrix}
\tag{13}
$$

Here, $K^{*} = E\!\left[\frac{\partial l_t(\theta_0)}{\partial \theta}\right]$ and $J^{*} = E\!\left[\frac{\partial l_t(\theta_0)}{\partial \theta}\frac{\partial l_t(\theta_0)}{\partial \theta'}\right]$, which corresponds to our information matrix of section 3. We have $\lambda_{\xi} = q_{\xi}\,\frac{\kappa_4 - 1}{4} + \frac{p_{\xi}}{2 f(q_{\xi})}$, with $p_{\xi} = E[Z^2 \mathbf{1}_{Z < q_{\xi}}] - \xi$ and $f$ the density of $Z$. For our case of Gaussian QML, $\zeta_{\xi}$ reduces to $\zeta_{\xi} = \frac{\xi(1-\xi)}{f^2(q_{\xi})} + \frac{q_{\xi}^2}{2}$, with $f$ the density of the (true) innovation distribution of $Z$.

Consider the Value-at-Risk at time $T+1$, conditional on the information available at time $T$, as a function of $(\hat{\theta}, \hat{q}_{\xi})'$, i.e. $\mathrm{VaR}^{\xi}_{T+1} = g(\hat{\theta}, \hat{q}_{\xi}) = \mu_{T+1}(\hat{\theta}) + \sigma_{T+1}(\hat{\theta})\,\hat{q}_{\xi}$. Applying the chain rule, we obtain the partial derivatives with respect to $\theta$:

$$
\begin{aligned}
\frac{\partial \mathrm{VaR}^{\xi}_{T+1}}{\partial \theta} = \frac{\partial g(\hat{\theta}, \hat{q}_{\xi})}{\partial \theta}
&= \frac{\partial \mu_{T+1}(\hat{\theta})}{\partial \theta} + \frac{\partial \sigma_{T+1}(\hat{\theta})}{\partial \theta}\,\hat{q}_{\xi} + \sigma_{T+1}(\hat{\theta})\,\frac{\partial \hat{q}_{\xi}}{\partial \theta} \\
&= \frac{\partial \mu_{T+1}(\hat{\theta})}{\partial \theta} + \hat{q}_{\xi}\cdot\frac{\partial \sqrt{\sigma^2_{T+1}(\hat{\theta})}}{\partial \theta} + \frac{\partial \hat{q}_{\xi}}{\partial \theta}\cdot\sqrt{\sigma^2_{T+1}(\hat{\theta})} \\
&= \frac{\partial \mu_{T+1}(\hat{\theta})}{\partial \theta} + \frac{\hat{q}_{\xi}}{2\sqrt{\sigma^2_{T+1}(\hat{\theta})}}\cdot\frac{\partial \sigma^2_{T+1}(\hat{\theta})}{\partial \theta} + \frac{\partial \hat{q}_{\xi}}{\partial \theta}\cdot\sqrt{\sigma^2_{T+1}(\hat{\theta})}
\end{aligned} \tag{14}
$$

In Appendix A we provide the analytical derivatives of $\mu_{T+1}(\hat{\theta})$ and $\sigma^2_{T+1}(\hat{\theta})$ with respect to the individual parameters. Now, using the consistency of quasi-maximum likelihood estimators as established in White (1996), together with the Taylor expansion approach outlined above, it follows that for $\theta = (\theta_0, q_{\xi})$ we have:

$$
\sqrt{T}\left(\widehat{\mathrm{VaR}}^{\xi} - \mathrm{VaR}^{\xi}_0\right) \sim N\!\left(0,\ \frac{\partial \mathrm{VaR}^{\xi}(\theta)}{\partial \theta'}\,\mathrm{var}(\hat{\theta})\,\frac{\partial \mathrm{VaR}^{\xi}(\theta)}{\partial \theta}\right), \tag{15}
$$

By (15) and our use of QML as the estimation routine, the $(1-\delta)\%$ confidence interval for $\mathrm{VaR}^{\xi}(\theta)$ on the basis of the delta method is:

$$
\left[\ \widehat{\mathrm{VaR}}^{\xi} - Q_N\sqrt{\gamma}\ ,\ \ \widehat{\mathrm{VaR}}^{\xi} + Q_N\sqrt{\gamma}\ \right] \tag{16}
$$

where $Q_N$ denotes the quantile of the standard normal distribution associated with probability $1 - \delta/2$ and $\gamma = \frac{\partial \mathrm{VaR}^{\xi}(\theta)}{\partial \theta'}\,\Sigma\,\frac{\partial \mathrm{VaR}^{\xi}(\theta)}{\partial \theta}$ evaluated at $\theta = \hat{\theta}$, with $\Sigma$ the covariance matrix of $\hat{\theta}$.

The delta method is easy to implement and computationally not very demanding compared to most alternative methods. The computational effort can be decreased significantly by using numerical derivatives instead of analytical derivatives in the system of partial derivatives of $\mathrm{VaR}^{\xi}(\theta)$. We come back to this subject in section 5. Overall, the ability of the delta method to give correct confidence bounds, in the sense of coverage probabilities, depends on various factors. First, the performance of the delta method depends on how well the first-order Taylor expansion approximates $\mathrm{VaR}^{\xi}(\theta)$. Higher-order Taylor expansions increase the accuracy, but the analytical form of the confidence bounds becomes far more complicated. Second, the delta method relies on the asymptotic normality of the estimator vector, and hence the true confidence bounds may deviate significantly from the ones supplied by the delta method in the case of small samples and non-normal errors. In light of this argument, we can observe from (16) that the delta method presupposes a symmetrical confidence interval around the estimate. Asymptotically this holds, but small samples often exhibit more pronounced asymmetries. Duty & Flournoy (2009) discuss composition extensions to the delta method that yield an asymmetric interval with better coverage than the delta method. Unfortunately, their results require some additional (and rather strong) assumptions on the functional form of, in our case, $\mathrm{VaR}^{\xi}(\theta)$.
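A sketch of the delta-method interval in R, using a numerical gradient of the Value-at-Risk function and a covariance estimate such as the sandwich sketch above. For brevity the empirical quantile is held fixed at its estimate, so the interval reflects the uncertainty in θ̂ only; the full treatment in (13)-(15) also propagates the joint uncertainty of (θ̂, q̂_ξ).

```r
# Delta-method interval for the one-step-ahead VaR, cf. (15)-(16), with a
# numerical gradient; qhat is held fixed and Sigma is var(theta_hat),
# e.g. from sandwich_cov() above.
var_function <- function(theta, X, qhat) {
  f <- filter_armagarch(theta, X); T <- length(X)
  mu_next   <- theta[1] + theta[2] * (X[T] - theta[1]) +
               theta[3] * (X[T] - f$mut[T])
  sig2_next <- theta[4] + theta[5] * (X[T] - f$mut[T])^2 + theta[6] * f$sig2[T]
  mu_next + sqrt(sig2_next) * qhat
}

delta_ci <- function(theta_hat, X, qhat, Sigma, level = 0.90, h = 1e-5) {
  p <- length(theta_hat)
  grad <- numeric(p)
  for (j in seq_len(p)) {                    # numerical gradient of VaR(theta)
    up <- dn <- theta_hat
    up[j] <- up[j] + h; dn[j] <- dn[j] - h
    grad[j] <- (var_function(up, X, qhat) - var_function(dn, X, qhat)) / (2 * h)
  }
  se  <- sqrt(drop(t(grad) %*% Sigma %*% grad))
  est <- var_function(theta_hat, X, qhat)
  est + c(-1, 1) * qnorm(1 - (1 - level) / 2) * se   # symmetric interval
}
```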

4 The profile likelihood method

The profile likelihood approach offers a method to investigate the uncertainty associated with QML parameter estimation. The profile likelihood method can be characterized as an application of the generalized likelihood ratio test, a statistical test well described by Lehmann (2012). Before we further elaborate on profile likelihood techniques, we first introduce some additional notation. We let $L_T(\theta; X)$ denote the likelihood function on the basis of the set of observations $X$. Naturally, by definition of the maximum likelihood estimator $\hat{\theta}$, we have $\hat{\theta} = \arg\sup_{\theta} L_T(\theta; X)$.

We wish to restrict a proportion of the parameters to a certain value. Let $\theta^r \subseteq \{\mu, \phi, \varphi, \alpha_0, \alpha_1, \beta\}$ denote the set of parameters we wish to restrict, with restricted values $\theta^r_i$, and consider the restricted parameter space defined by $\Theta^r = \{\theta \in \Theta : \theta_i = \theta^r_i \text{ for all } i \in I\}$, where $I$ refers to the set of indices of the parameters we wish to restrict in the parameter vector. The generalized likelihood ratio is a statistic to test whether or not the set of observations supports the imposed restrictions on the parameter space. The generalized likelihood ratio statistic on the basis of the set of observations $X$ takes the form:

$$
\mathrm{LR} = \frac{\sup_{\theta \in \Theta^r} L_T(\theta; X)}{\sup_{\theta \in \Theta} L_T(\theta; X)} \tag{17}
$$

(The ordinary likelihood ratio describes a restricted parameter space in the denominator of (17) as well; the generalized likelihood ratio is a generalization in the sense that it does not restrict the parameter space $\Theta$ in the denominator of (17).)

Wilks' theorem considers the characteristics of the generalized likelihood ratio statistic as specified by (17) under the assumption of iid residuals and under the assumption that the ML estimators follow a normal distribution asymptotically. Using Taylor expansions and the Law of Large Numbers, Wilks' theorem establishes, among other things, that:

$$
-2\log(\mathrm{LR}) \xrightarrow{D} \chi^2_d, \tag{18}
$$

i.e. the left-hand side statistic converges in distribution to a chi-squared distributed random variable with $d$ degrees of freedom, where $d$ refers to the number of elements of $I$. Naturally, we have $-2\log(\mathrm{LR}) = -2\,[\log L_T(\hat{\theta}^r; X) - \log L_T(\hat{\theta}; X)]$, where $\hat{\theta}^r$ denotes the QMLE obtained by maximization of the likelihood on the restricted parameter space.
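As a small illustration of (17)-(18), the sketch below computes the generalized likelihood ratio statistic for the single restriction μ = μ₀ by re-optimizing the quasi-log-likelihood with the first parameter held fixed, and compares it with the χ²(1) critical value. The qloglik function from the earlier sketch is assumed to be in scope, and the starting values are assumptions.

```r
# Generalized likelihood ratio for the single restriction mu = mu0,
# compared with the chi-squared(1) critical value as in (18).
lr_test_mu <- function(X, mu0, theta_start, level = 0.95) {
  fit_u <- optim(theta_start, function(th) -qloglik(th, X))        # unrestricted
  fit_r <- optim(theta_start[-1],
                 function(th_rest) -qloglik(c(mu0, th_rest), X))   # mu fixed
  LR <- 2 * (fit_r$value - fit_u$value)   # -2 log(L_restricted / L_unrestricted)
  list(LR = LR, reject = LR > qchisq(level, df = 1))
}
```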

For a predetermined restricted parameter set $\theta^r$, the profile likelihood represents the maximized likelihood function on the restricted parameter space $\Theta^r$ induced by $\theta^r$. In terms of our notation, the profile likelihood is the function $\mathcal{P}(\theta^r; X) = \sup_{\theta \in \Theta^r} L_T(\theta; X)$. The profile likelihood inherits many characteristics from the likelihood function it is based on. The so-called invariance property is one such characteristic, and we believe the invariance principle is crucial in many ways to incorporate the Value-at-Risk estimator $\mathrm{VaR}^{\xi}(\theta)$ in a meaningful way.

There are corrections to the covariance matrix available in the academic literature such that the covariance matrix remains consistently estimated under certain misspecifications of the (quasi-)likelihood function. In this sense, we would expect to be able to correct the standard LR-test such that it remains valid under such misspecifications. Wooldridge (1990) discusses extensions to statistics that are based on likelihood principles and demonstrates that Wald statistics and LM statistics can be made robust. However, Chen & Kuan (2002) show that the information matrix equality is a necessary condition for the LR-statistic to converge in distribution to a $\chi^2$-distributed random variable. When it fails to hold, there is no correct normalizing matrix such that the LR-statistic has a limiting $\chi^2$-distribution. At the time of writing, there is no correction available to make the LR-statistic robust against the failure of the information matrix equality. As a consequence, we can conclude that some likelihood-based statistics, such as the Wald and LM statistics, have robust versions which yield correct sizes under misspecification, but the LR-statistic does not. In section 5 we elaborate on this issue.

In the following subsection we discuss a broader definition of the invariance principle than commonly encountered in the academic literature, and we make use of this extended definition later in this paper.

4.1 The invariance property for a general class of functions

A useful property of maximum likelihood estimation is the invariance property of the maximum likelihood estimator. The invariance property states that an estimator is invariant to a reparameterization of the model under certain conditions. The invariance principle applies to a more general class of estimators known in the literature as extremum estimators (Newey & McFadden (1994) provide a comprehensive overview of the properties of extremum estimators). Using similar notation as in Hayashi (2000), an estimator $\hat{\theta}$ is known as an extremum estimator if there exists a function $Q_T(\theta)$ such that:

$$
\hat{\theta} = \arg\max_{\theta \in \Theta} Q_T(\theta) \tag{19}
$$

It can be shown that a measurable solution $\hat{\theta}$ exists if $\Theta$ is compact and $Q_T(\theta)$ is continuous in $\theta$ and measurable. The maximum likelihood estimator is an extremum estimator because the ML estimator $\hat{\theta}$ solves (19) with $Q_T(\theta) = \frac{1}{T}\sum_{t=1}^{T}\log f(X_t; \theta)$, with $f(X_t; \theta)$ the joint density of $X_t$. We need to introduce one last notion before we can introduce the invariance principle.

Consider a one-to-one reparameterization $\tau: \Theta \to \Lambda$ with inverse function $\tau^{-1}(\cdot)$. It can be shown that the following condition is necessary for the invariance principle to hold:

$$
\tilde{Q}_T(\lambda) = Q_T(\tau^{-1}(\lambda)), \quad \text{for any } \lambda \in \Lambda \tag{20}
$$

Let $\hat{\lambda} = \tau(\hat{\theta})$ and write $\tilde{Q}_T(\lambda)$ for the objective function of the reparameterized model. Because $\hat{\theta}$ is the extremum estimator, $Q_T(\hat{\theta}) \ge Q_T(\theta) = Q_T(\tau^{-1}(\lambda)) = \tilde{Q}_T(\lambda)$. However, by the definition $\hat{\lambda} = \tau(\hat{\theta})$ it also holds that $Q_T(\hat{\theta}) = Q_T(\tau^{-1}(\hat{\lambda})) = \tilde{Q}_T(\hat{\lambda})$. Combining the results, we have $\tilde{Q}_T(\lambda) \le \tilde{Q}_T(\hat{\lambda})$ for any $\lambda \in \Lambda$. As noted above, for the maximum likelihood estimator we have $Q_T(\tau^{-1}(\lambda)) = \frac{1}{T}\sum_{t=1}^{T}\log f(X_t; \tau^{-1}(\lambda)) = \frac{1}{T}\sum_{t=1}^{T}\log \tilde{f}(X_t; \lambda) = \tilde{Q}_T(\lambda)$, and hence the condition in (20) is satisfied for maximum likelihood estimators.

The aim of our research is to quantify the estimation uncertainty associated with $\mathrm{VaR}^{\xi}(\hat{\theta})$. To apply the invariance principle we need to define a proper transformation. However, the natural candidate $\mathrm{VaR}^{\xi}(\theta)$ is not a one-to-one function of $\theta$. Consequently, its inverse does not necessarily exist, which in turn implies that we are not able to provide a well-defined likelihood function as in the previous paragraph. Define $\tau^{*}: \Theta \to \Upsilon \times \Theta$ by $\tau^{*}(\theta) = (\mathrm{VaR}^{\xi}(\theta), \theta)$. As the identity function is one-to-one, $\tau^{*}$ is trivially one-to-one. Our line of reasoning differs from Zehna (1967), but we come to the same conclusion. By the invariance principle, we can always incorporate Value-at-Risk estimation (and quantification) in likelihood-based methods through a reparameterization of the likelihood function from $\theta$ to $(\mathrm{VaR}^{\xi}(\theta), \theta)$.

4.2 Implementing the profile likelihood approach: ignoring any dynamics

We wish to implement the likelihood ratio approach to assess the uncertainty associated with the Value-at-Risk estimator $\mathrm{VaR}^{\xi}(\theta)$. We distinguish two approaches. In the first approach we (incorrectly) ignore the estimation uncertainty associated with the estimation of $q_{\xi}$ and its dynamics with $\hat{\theta}$. A suitable reparameterization of the (log-)likelihood function allows us to investigate the parameter estimation uncertainty and its effects on the Value-at-Risk through the associated profile likelihood. By our model specification in (1) it is clear that $\mu_{T+1}$ does not depend on $\alpha_0$, $\alpha_1$ and $\beta$, whereas $\sigma^2_{T+1}$ depends on every element of $\theta$. Together with the recursive nature of our model, it makes sense to reparameterize using a parameter of the volatility equation of model (2). Since the condition $\alpha_1 + \beta < 1$ makes individual parameter range evaluations of $\alpha_1$ and $\beta$ cumbersome, we try to derive an expression for $\mathrm{VaR}^{\xi}$ as a function of $\alpha_0$. We continue as follows. Select a $\mathrm{VaR}^{*} \in \Upsilon$ and use (5) to solve for $\alpha_0$, and hence obtain:

$$
\begin{aligned}
\mathrm{VaR}^{*} &= \mu_{T+1} + \sigma_{T+1} q_{\xi} \\
\mathrm{VaR}^{*} - \mu_{T+1} &= \sigma_{T+1} q_{\xi} \\
\left(\frac{\mathrm{VaR}^{*} - \mu_{T+1}}{q_{\xi}}\right)^2 &= \sigma^2_{T+1} \\
\left(\frac{\mathrm{VaR}^{*} - \mu_{T+1}}{q_{\xi}}\right)^2 &= \sum_{k=0}^{\infty}\beta^k(\alpha_0 + \alpha_1 e^2_{T-k}) \\
\left(\frac{\mathrm{VaR}^{*} - \mu_{T+1}}{q_{\xi}}\right)^2 - \sum_{k=0}^{\infty}\beta^k\alpha_1 e^2_{T-k} &= \frac{\alpha_0}{1-\beta} \\
\alpha_0 &= (1-\beta)\left[\left(\frac{\mathrm{VaR}^{*} - \mu_{T+1}}{q_{\xi}}\right)^2 - \sum_{k=0}^{\infty}\beta^k\alpha_1 e^2_{T-k}\right]
\end{aligned} \tag{21}
$$

Notice that the $\left(\frac{\mathrm{VaR}^{*} - \mu_{T+1}}{q_{\xi}}\right)^2$ term gives rise to multiple solutions for $\mathrm{VaR}^{*}$ if we do not take into account any parameter restrictions, i.e. in such cases $\mathrm{VaR}(\alpha_0)$ is many-to-one and hence non-injective. However, by the positivity of the volatility process $\sigma_t$ for any $t$ through the restriction $\alpha_0 > 0$, and by the standardization of the returns (which leads to $q_{\xi} < 0$ for non-degenerate $Z$ and $\xi < \frac{1}{2}$), we have by (5) that $\mathrm{VaR}^{*} < \mu_{T+1}$. The implementation of this restriction excludes one of the two possible solutions, and hence under the implicit parameter space $\mathrm{VaR}(\alpha_0)$ is injective. Because $\mathrm{VaR}(\alpha_0)$ is injective and by our reasoning in the previous subsection, we can formulate a well-defined reparameterized likelihood function for $\theta^{*} = (\mu, \phi, \varphi, \mathrm{VaR}^{*}, \alpha_1, \beta)$ using the inverse relation described by (21). Consequently, we can formulate the profile log-likelihood conditional on $\mathrm{VaR}^{\xi}(\alpha_0)$ and trace for which values $\mathrm{VaR}^{*} \in \Upsilon$ we have $\log \mathcal{P}(\mathrm{VaR}^{*}; X) = \log L_T(\hat{\theta}; X) - q_{\chi^2(1)}/2$, with $q_{\chi^2(1)}$ denoting the quantile of the chi-squared distribution with 1 degree of freedom for a confidence interval significance level of $\delta$. The two solutions to this equality form the $(1-\delta)\%$ confidence interval for the Value-at-Risk forecast.


4.3 Implementing the profile likelihood approach: incorporating dynamics

We now consider the case where we do not ignore the dynamics between the estimator $\hat{q}_{\xi}$ of the quantile of the standardized errors and the parameter estimator $\hat{\theta}$. Consider the problem of testing a single (linear) restriction, for example $\mu = \mu_0$. We reparameterize the log-likelihood function and the Value-at-Risk as $l^{*}(\mu, \tilde{\theta}) = l(\mu, \phi, \varphi, \alpha_0, \alpha_1, \beta)$ and $\mathrm{VaR}^{*}(\mu, \tilde{\theta}) = \mathrm{VaR}(\mu, \phi, \varphi, \alpha_0, \alpha_1, \beta)$ respectively, for any $\tilde{\theta} = (\phi, \varphi, \alpha_0, \alpha_1, \beta)'$. A likelihood ratio test statistic for the restriction $\mu = \mu_0$ would take the form:

$$
\mathrm{LR} = -2\left(l^{*}(\mu_0, \hat{\tilde{\theta}}) - l(\hat{\theta})\right), \quad \text{with } \hat{\tilde{\theta}} = \arg\max_{\tilde{\theta}}\, l^{*}(\mu_0, \tilde{\theta}), \tag{22}
$$

where $\hat{\tilde{\theta}}$ is the maximum likelihood estimator on the restricted parameter space. Under the hypothesis that the restriction holds true, i.e. $H_0: \mu = \mu_0$, the test statistic LR is approximately $\chi^2_1$ distributed. For the special case where the value of $\mu_0$ equals $\hat{\mu}$, which corresponds to the unrestricted maximum likelihood estimator $\hat{\theta} = (\hat{\mu}, \hat{\phi}, \hat{\varphi}, \hat{\alpha}_0, \hat{\alpha}_1, \hat{\beta})$, we have $l^{*}(\mu_0, \hat{\tilde{\theta}}) = l^{*}(\hat{\mu}, \hat{\tilde{\theta}}) = l(\hat{\theta})$ and hence $(\mu_0, \hat{\tilde{\theta}}) = \hat{\theta}$. To quantify the parameter uncertainty associated with Value-at-Risk estimation we could opt for a trace of the log-likelihood values $l^{*}(\mu_0, \hat{\tilde{\theta}})$ as we vary $\mu_0$ over its admissible values. For each choice of $\mu_0$ we can also obtain $\mathrm{VaR}^{*}(\mu_0, \tilde{\theta})$. Combining both results, we obtain the profile (log-)likelihood $(\mathrm{VaR}^{*}(\mu_0, \hat{\tilde{\theta}}),\ l^{*}(\mu_0, \hat{\tilde{\theta}}))$.

The problem with the previous approach is that $\mathrm{VaR}^{\xi}(\theta)$ is not necessarily one-to-one in any of the parameters $\mu$, $\phi$, $\varphi$, $\alpha_0$, $\alpha_1$ or $\beta$. As a consequence, there may exist values of the Value-at-Risk $\mathrm{VaR}^{\xi}$ for which there is no unique $\theta^{*} \in \Theta$ such that $\mathrm{VaR}^{\xi}(\theta^{*}) = \mathrm{VaR}^{\xi}$.

An alternative, more general approach goes as follows. Hayashi (2000) considers the problem of testing a set of $r$ restrictions of the form $a(\theta_0) = 0$, with both $0$ and $a(\theta_0)$ of dimension $r$. The constrained extremum estimator is defined as $\tilde{\theta} = \arg\max_{\theta \in \Theta^r} Q_T(\theta)$ with $\Theta^r = \Theta \cap \{\theta : a(\theta) = 0\}$. The estimator $\tilde{\theta}$ can be shown to be consistent and asymptotically normally distributed. Under the assumption that the null holds, the likelihood ratio statistic LR is defined as:

$$
\mathrm{LR} = 2n\left[Q_n(\hat{\theta}) - Q_n(\tilde{\theta})\right], \tag{23}
$$

which can be expanded around $\theta_0$ up to a remainder term $o_p$ that vanishes asymptotically under the null hypothesis. Under the assumption that the information matrix equality holds, i.e. $J(\theta_0) = I(\theta_0)$, we have:

$$
\mathrm{LR} = \left(\sqrt{n}\,\frac{\partial Q_n(\theta_0)}{\partial \theta}\right)' I(\theta_0)^{-1} A_0' \left[A_0 I(\theta_0)^{-1} A_0'\right]^{-1} A_0 I(\theta_0)^{-1} \left(\sqrt{n}\,\frac{\partial Q_n(\theta_0)}{\partial \theta}\right) + o_p \tag{24}
$$

Consider the $A_0 I(\theta_0)^{-1}\left(\sqrt{n}\,\frac{\partial Q_n(\theta_0)}{\partial \theta}\right)$ term from the previous equation. Hayashi (2000) shows that for maximum likelihood estimators we have:

$$
\sqrt{n}\,\frac{\partial Q_n(\theta_0)}{\partial \theta} \xrightarrow{D} N(0, I(\theta_0)) \tag{25}
$$

It immediately follows that:

$$
A_0 I(\theta_0)^{-1}\left(\sqrt{n}\,\frac{\partial Q_n(\theta_0)}{\partial \theta}\right) \xrightarrow{D} N\!\left(0,\ A_0 I(\theta_0)^{-1} I(\theta_0) I(\theta_0)^{-1} A_0'\right) = N\!\left(0,\ A_0 I(\theta_0)^{-1} A_0'\right) \tag{26}
$$

For ease of notation, we write $K = A_0 I(\theta_0)^{-1}\left(\sqrt{n}\,\frac{\partial Q_n(\theta_0)}{\partial \theta}\right)$. Concerning (24), we obtain with the help of the previous equation:

$$
\begin{aligned}
\mathrm{LR} &= K'\left[A_0 I(\theta_0)^{-1} A_0'\right]^{-1} K + o_p \\
&= R'\left(I(\theta_0)^{-1/2} A_0'\right)\left[A_0 I(\theta_0)^{-1} A_0'\right]^{-1}\left(A_0 I(\theta_0)^{-1/2}\right) R + o_p,
\end{aligned} \tag{27}
$$

with $R$ a standard normal random vector of the same dimension as $\theta$. It is straightforward to verify that $\left(I(\theta_0)^{-1/2} A_0'\right)\left[A_0 I(\theta_0)^{-1} A_0'\right]^{-1}\left(A_0 I(\theta_0)^{-1/2}\right)$ is a symmetric idempotent matrix with rank $r$. Hence, by Pearson's lemma on quadratic forms for normally distributed vectors (see for example Driscoll (1999)), we have that $\mathrm{LR} \xrightarrow{D} \chi^2_r$.

In practice, imposing such a restriction requires constrained optimization routines, and these procedures are accompanied by a significant increase in computation time. We come back to issues with computation time in Appendix C.

For the case where we ignore the estimation dynamics between $\hat{\theta}$ and $\hat{q}_{\xi}$, we expect to observe some adverse effects on the accuracy of the coverage rates of confidence intervals based on this approach. However, our reparameterization allows us to compute the confidence bounds with significantly less effort by avoiding (additional) non-linear constraints on the parameter space and hence using standard optimization procedures. We expect the profile likelihood approach(es) to adapt better to smaller samples than the delta method due to the asymmetry of the log-likelihood function, whereas the delta method is less suited to smaller samples because its confidence interval is symmetrical around the Value-at-Risk estimate. A disadvantage of profile likelihood-based methods is the computational effort required to determine the confidence bounds and the lack of a corrected likelihood ratio statistic that is robust to failure of the information matrix equality.

5 Simulation research

5.1 Specifications

Following Christoffersen & Gonçalves (2005), we consider two different model parameter specifications corresponding to a low and a high persistence case. We consider a high persistence model A by specifying $\theta_A = (0, 0.3, 0.1, 0.01, 0.1, 0.8)$. Model specification B represents a low persistence model with $\theta_B = (0, 0.3, 0.1, 0.01, 0.2, 0.4)$. To investigate the effect of the sample size on the empirical coverage probability, we consider sample sizes 300, 500, 1000 and 2500. Following Francq & Zakoian (2004), we initialize the data generating process by selecting $\sigma_0^2 = \alpha_0$ and $e_0 = \alpha_0$. We consider several specifications for the distribution of the innovations $Z_t$ (see Table 5.1); in each case, we standardize the errors to obtain a residual process with unit variance and expectation 0.
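The Monte Carlo design can be summarized in a short coverage loop; the sketch below re-uses the simulator introduced in section 2. ci_method is a placeholder for any of the interval methods (delta method or profile likelihood), and the true innovation quantile is taken as qnorm(ξ), which is only appropriate for the standard normal benchmark case and must be replaced for the Student's t and skewed t cases.

```r
# Monte Carlo estimate of the empirical coverage probability of an interval
# method: simulate, compute the true one-step-ahead VaR, and count coverage.
# ci_method(X, xi, level) is a placeholder returning c(lower, upper).
coverage <- function(n_sim, T, theta0, xi = 0.05, level = 0.90,
                     ci_method, rinnov = rnorm) {
  hits <- logical(n_sim)
  for (s in seq_len(n_sim)) {
    sim <- simulate_armagarch(T + 1, theta0[1], theta0[2], theta0[3],
                              theta0[4], theta0[5], theta0[6], rinnov = rinnov)
    X <- sim$X[1:T]
    # true mu_{T+1} and sigma_{T+1} from the simulated state at time T;
    # qnorm(xi) is the true innovation quantile for N(0,1) innovations only
    mu_true  <- theta0[1] + theta0[2] * (X[T] - theta0[1]) + theta0[3] * sim$e[T]
    sig_true <- sqrt(theta0[4] + theta0[5] * sim$e[T]^2 + theta0[6] * sim$sigma2[T])
    var_true <- mu_true + sig_true * qnorm(xi)
    ci <- ci_method(X, xi = xi, level = level)
    hits[s] <- ci[1] <= var_true && var_true <= ci[2]
  }
  mean(hits)
}
```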

5.2 Simulation results

Table 5.1 contains the results of our simulations for the high persistence case. It appears from the benchmark case with normally distributed innovations that the delta method tends to slightly exceed its nominal coverage probability. This tendency of the delta method to yield conservative confidence intervals in the benchmark case of normal errors is also confirmed by Pek et al. (2011). A likely reason for the overcoverage lies with the dependency of the delta method on the robust covariance matrix, which is a consistent but (generally) inefficient estimator.

| method | sample size | N(0,1) | Student's t | skewed t (γ = 0.5) | skewed t (γ = 0.25) |
|---|---|---|---|---|---|
| DM | n = 300 | 0.93 | 0.79 | 0.68 | 0.66 |
| DM | n = 500 | 0.92 | 0.76 | 0.71 | 0.70 |
| DM | n = 1000 | 0.92 | 0.77 | 0.72 | 0.70 |
| DM | n = 2500 | 0.93 | 0.81 | 0.74 | 0.72 |
| PLAstatic | n = 300 | 0.93 | 0.83 | 0.79 | 0.79 |
| PLAstatic | n = 500 | 0.93 | 0.83 | 0.80 | 0.78 |
| PLAstatic | n = 1000 | 0.93 | 0.81 | 0.79 | 0.78 |
| PLAstatic | n = 2500 | 0.94 | 0.84 | 0.79 | 0.78 |
| PLAdynamic | n = 300 | 0.93 | 0.94 | 0.94 | 0.95 |
| PLAdynamic | n = 500 | 0.92 | 0.91 | 0.95 | 0.96 |
| PLAdynamic | n = 1000 | 0.92 | 0.92 | 0.95 | 0.95 |
| PLAdynamic | n = 2500 | 0.92 | 0.92 | 0.95 | 0.95 |

Table 5.1: Empirical coverage probabilities associated with the various methods for the high persistence model A of (1) as specified in section 5.1. DM stands for the delta method of section 3.1, while PLAdynamic and PLAstatic stand for the profile likelihood approach with and without dynamics allowed between the empirical quantile estimator and the estimator for θ, respectively. For the Student's t-distribution we specify 3 degrees of freedom; the skewed t-distributions have 3 degrees of freedom as well.

For the different error specifications of our simulations, we observe from the table that the profile likelihood approach with the estimation dynamics ignored tends to yield empirical coverage probabilities closer to the nominal value than the delta method.

The profile likelihood technique with modelled dynamics between the estimator of θ and the empirical quantile performs considerably better than the previously discussed methods. As we introduce skewness and more severe skewness in the last two columns of Table 5.1, the delta method and the fixed-quantile profile likelihood perform far worse. We find it striking that the dynamic profile likelihood approach consistently overestimates its nominal coverage. The failure of the information matrix equality to hold leads to a statistic with a significantly different distribution than the limiting chi-squared distribution under the null hypothesis. For a number of interesting cases we calculated the average confidence interval width of the Value-at-Risk forecast. Because the Value-at-Risk forecast and its calculated confidence interval adjust dynamically to the simulated return series, we also calculated the number of times the true Value-at-Risk actually exceeds the implied CI-bounds. Ideally, for a nominal confidence interval of 90% one would expect approximately 5% of the true Value-at-Risk forecasts to violate the estimated upper CI-bound and a similar percentage to violate the lower CI-bound. Table 5.2 displays the results of our investigation. For the DM method, the proportion of upper versus lower bound exceedances is not significantly different from the ideal case, and hence the symmetrical CI appears to be correctly reflected by our simulation results in the case of standard normal errors. On the other hand, the presence of skewed and fat-tailed innovations does affect the ability of the delta method to come near the nominal coverage for any sample size considered. It appears from the number of violations of the upper and lower CI-bounds that the symmetrical confidence interval calculated by the delta method does not correctly reflect the uncertainty of the Value-at-Risk estimate. The confidence intervals have decreased in comparison to the benchmark case, but they do not sufficiently reflect the "true" uncertainty associated with the Value-at-Risk forecasts.

For the most severely skewed case, the numbers of violations of the lower bound become very small, i.e. 19 and 11 for sample sizes n = 300 and n = 1000 respectively. Hence, the profile likelihood approach (with incorporated dynamics) yields conservative confidence intervals that far overestimate the uncertainty associated with the Value-at-Risk forecasts. We continue our study in the next section with an empirical application.

| error distribution | | DM, n = 300 | DM, n = 1000 | PLAdynamic, n = 300 | PLAdynamic, n = 1000 |
|---|---|---|---|---|---|
| Zt ~ N(0,1) | CI-width ×1000 | 44.88 | 43.65 | 42.61 | 36.13 |
| | (Nl, Nu) | (35, 31) | (41, 36) | (36, 34) | (39, 44) |
| Zt ~ skewed t (γ = 0.50) | CI-width ×1000 | 38.37 | 40.87 | 65.83 | 62.69 |
| | (Nl, Nu) | (207, 111) | (182, 97) | (23, 37) | (20, 31) |
| Zt ~ skewed t (γ = 0.25) | CI-width ×1000 | 35.84 | 39.12 | 79.06 | 77.95 |
| | (Nl, Nu) | (249, 96) | (213, 82) | (19, 33) | (11, 18) |

Table 5.2: The number of exceedances of the implied CI-bounds for the Value-at-Risk for the delta method and the unrestricted profile likelihood approach. The table lists the average width of the confidence interval based on the 1000 simulations for each error distribution as specified in section 5 and for each sample size n. The displayed CI-widths are scaled by multiplication with one thousand. Nl and Nu denote the number of violations of the lower and upper confidence bounds, respectively, by the true Value-at-Risk in the set of simulations.

We have also investigated the 90% Value-at-Risk and the 99% Value-at-Risk for the sample sizes n = 300, n = 500 and n = 1000 for the low persistence case, but we found no evidence against the previous observations that:

1. the profile likelihood approach with incorporated dynamics tends to give an empirical coverage probability closer to 90% than the delta method and the profile likelihood approach without dynamics;

2. the profile likelihood-based method with incorporated dynamics consistently overestimates its confidence interval, and the empirical coverage rate is even larger for more skewed residual distributions.

6 Empirical application

We consider the Deutscher Aktien Index (the DAX index), which consists of the top 30 German companies measured by market capitalization and liquidity. We consider the DAX weekly log-return series, which can be formed from the DAX closing prices. In terms of our notation we consider $X_1, \dots, X_T$, formed by computing the logarithmic differences of the closing price index $S_t$ for $t = 0, \dots, T$, i.e. $X_t = \ln(S_t / S_{t-1})$. The information on the closing prices is acquired from a query of download.finance.yahoo.com. The time period we consider starts with an observation on November 26, 1990 and ends with an observation on December 9, 2013. In total, we have 1203 observations available to analyse. Figure 3 in Appendix B gives a graphical representation of the log-return series.

For the empirical distribution of the log-return series we calculate a skewness measure of -0.340 and an excess kurtosis of 1.148. Hence, negative skewness and fat tails seem to accompany the DAX log-return series. The Shapiro test for normality yields a statistic of 0.9598 with an associated p-value smaller than 0.001. The Jarque-Bera test yields a test statistic of 889.8, also with an associated p-value smaller than 0.001. Hence, we reject the null of normality. Considering serial dependence, the Ljung-Box test (including one lag) applied to the log-returns yields a test statistic of 1.873 with a p-value of 0.1711. On the basis of the Ljung-Box test we cannot reject the null of independently distributed data. However, the test applied to the squared returns results in a test statistic of 49.851 with an associated p-value below 0.001. On the basis of these findings we investigate an ARMA(1,1)-GARCH(1,1) model fit.
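For reference, the descriptive checks reported above can be reproduced with base-R tests along the following lines; price is assumed to hold the DAX closing prices (not bundled here), so the numbers will only match the text for the exact same sample.

```r
# Base-R checks on the weekly log-return series.
X <- diff(log(price))                        # X_t = ln(S_t / S_{t-1})

z <- (X - mean(X)) / sd(X)
c(skewness = mean(z^3), excess_kurtosis = mean(z^4) - 3)

shapiro.test(X)                              # Shapiro normality test
Box.test(X,   lag = 1, type = "Ljung-Box")   # serial dependence, returns
Box.test(X^2, lag = 1, type = "Ljung-Box")   # serial dependence, squared returns
# tseries::jarque.bera.test(X)               # Jarque-Bera, if tseries is installed
```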

The second period seems to correspond to the August 2011 stock market crash, which followed the downgrading of the credit rating of the U.S. by Standard & Poor's and the announcement by the European Central Bank that it would take more drastic action to resolve the Eurozone debt crisis by buying increasing amounts of Spanish and Italian government bonds.

Table 6.1 lists the estimation results for the ARMA(1,1)-GARCH(1,1) model. The robust covariance estimator considered in section 3 is used to provide the listed robust standard errors. We observe from the table that all included parameters are significant at a significance level of 5%. The standardized residuals exhibit a negative skewness measure of −0.3512 and an excess kurtosis of 1.164. The Shapiro and Jarque-Bera tests both reject the null of normality for the standardized residuals.

| parameter | estimate | robust standard error | t-value | p-value |
|---|---|---|---|---|
| µ | 0.37208 | 0.079458 | 4.6827 | 0.000003 |
| φ | -0.90465 | 0.078154 | -11.5753 | 0.000001 |
| ϕ | 0.88811 | 0.084252 | 10.5410 | 0.000001 |
| α0 | 0.54202 | 0.290935 | 2.1630 | 0.046459 |
| α1 | 0.18891 | 0.081834 | 2.3085 | 0.020974 |
| β | 0.76116 | 0.091223 | 8.3439 | 0.000001 |

Table 6.1: The estimation results for the ARMA(1,1)-GARCH(1,1) model for the weekly log-returns of the DAX stock index.

Figure 1 presents the Value-at-Risk forecasts with confidence intervals calculated on the basis of the delta method. We observe from the figure and the relatively low number of CI violations that the confidence intervals seem to reflect the uncertainty satisfactorily, albeit conservatively. The intervals seem wide enough to cover the uncertainty associated with shifts in volatility. The results from our simulation research carry over to the empirical application for the profile likelihood method. There are but a few violations of the CI lower bound, as Figure 2 demonstrates.

Figure 1: A graphical representation of the Value-at-Risk forecasts and the confidence upper and lower bounds calculated by the delta method. The red line displays the Value-at-Risk forecasts based on our rolling window estimation, the dashed lines reflect the calculated upper and lower bounds for each Value-at-Risk forecast.

Figure 2: A graphical representation of the Value-at-Risk forecasts and the confidence upper and lower bounds calculated by the profile likelihood approach with incorporated dynamics. The red line displays the Value-at-Risk forecasts based on our rolling window estimation, the dashed lines reflect the calculated upper and lower bounds for each Value-at-Risk forecast.

We stress that the reported uncertainty is only part of the total uncertainty, as our estimates and our computed confidence bounds do not reflect model error.

7 Discussion

The estimation uncertainty associated with Value-at-Risk estimation can be quantified in multiple ways. The delta method of section 3.1 is a viable method, but it tends to yield too small confidence intervals in cases of fat-tailed and skewed distributions. As we introduce more skewness, the delta method yields significantly lower realized coverages. One cannot rely on the delta method to give reliable confidence intervals for Value-at-Risk forecasts: in our controlled environment subject to the stylized facts of empirical finance, the delta method severely underperforms. On the other hand, the profile likelihood tends to yield more conservative coverages. Hence, the profile likelihood approach is a safer bet from the point of view of the cautious risk analyst. Together with the robustness of profile likelihood methods to sample size scaling, this may be a favourable method to quantify the estimation uncertainty. However, the wide confidence intervals of the profile likelihood methods and the lack of violations of these intervals imply that (parameter estimation) uncertainty is overestimated in environments subject to the stylized facts of empirical finance.

In this paper we do not consider many other alternative methods to quantify the uncertainty of the Value-at-Risk. The normal-approximation method and Bayesian methods are not treated. We have focussed on the Value-at-Risk risk measure and have not treated other risk measures, such as the Expected Shortfall. With the growing popularity of alternative risk measures and the desirable properties of the Expected Shortfall, we look forward to seeing more theory developed for the (conditional) autoregressive setting and more applications with the Expected Shortfall.

Appendix A

By the recursive nature of our model specification in (1) we have:

$$
\sigma_t^2 = \sum_{k=0}^{\infty}\beta^k\left(\alpha_0 + \alpha_1 e_{t-k-1}^2\right) \tag{28}
$$

from which we derive:

$$
\frac{\partial \sigma_t^2}{\partial \alpha_0} = \sum_{k=0}^{\infty}\beta^k, \qquad
\frac{\partial \sigma_t^2}{\partial \alpha_1} = \sum_{k=0}^{\infty}\beta^k e_{t-k-1}^2, \qquad
\frac{\partial \sigma_t^2}{\partial \beta} = \sum_{k=0}^{\infty} k\beta^{k-1}\left(\alpha_0 + \alpha_1 e_{t-k-1}^2\right) \tag{29}
$$

Also:

$$
\frac{\partial \sigma_t^2}{\partial \theta_p} = 2\alpha_1\sum_{k=0}^{\infty}\beta^k e_{t-k-1}\frac{\partial e_{t-k-1}}{\partial \theta_p}, \tag{30}
$$

for $p = 1, 2, 3$, where $(\theta_1, \theta_2, \theta_3) = (\mu, \phi, \varphi)$. By the first line of (1):

$$
\begin{aligned}
e_t &= X_t - \mu - \phi(X_{t-1} - \mu) - \varphi e_{t-1} \\
&= \sum_{k=0}^{\infty}(-\varphi)^k\left[X_{t-k} - \mu - \phi(X_{t-k-1} - \mu)\right] \\
&= \sum_{k=0}^{\infty}(-\varphi)^k\left[X_{t-k} - \mu - \phi L(X_{t-k} - \mu)\right] \\
&= \sum_{k=0}^{\infty}(-\varphi L)^k\left[X_t - \mu - \phi L(X_t - \mu)\right] \\
&= \frac{1 - \phi L}{1 + \varphi L}(X_t - \mu),
\end{aligned} \tag{31}
$$


Appendix B

Figure 3: A graphical representation of the weekly DAX log-return series. The figure displays the log-return series containing 1202 observations.

Figure 4: A graphical representation of the weekly DAX log-return series (black line) and the one week ahead Value-at-Risk forecast (red line) for approximately the last 4 years on basis of rolling window estimation.

Appendix C: Computational issues

In this section we discuss the main computational issues we encountered during our research that we feel are of importance to others in the field of parallelized computing. We advise the uninterested reader to skip this section.

Our simulations were implemented in R, partly because of its interfaces to compiled languages such as Fortran and C++ (which are known to outperform virtual machine-based code implementations in terms of execution speed) and the availability of packages that natively support parallelization routines.

Our computational network consists of 14 AMD Opteron dual-processor machines with 2GB RAM allocated to each core. The machines run on Red Hat Linux version 7.5. Due to the embarrassingly parallel nature of our problem, the communication bandwidth between the computing nodes and the master node is non-critical; hence we settled for a private ethernet network facilitated by CAT5 ethernet cables allowing for up to 100 Mbps transfer rates. The greater part of our simulation research was carried out through the message passing interface OpenMPI, which we considered superior to the alternatives due to its academic background.

The gain of load balancing

The effects of load balancing are substantial in all simulations of the methods, but the computation time gains were largest for the cases where profile likelihood-based approaches were involved. Load balancing refers to the practice of dividing the problem into smaller sub-problems of such size that (a) computing nodes do not have to wait too long on other nodes before commencing with a new sub-problem, and (b) the total time required to communicate the results to the master node and/or console is minimized.

The effect of load balancing is most pronounced in the case of the profile likelihood-based methods. In our simulation study, the part of the estimation process that required the most computation time was the profile likelihood approach, through the subsequent solving of maximization problems for the profile likelihoods in order to approximate the location of the confidence bounds. The number of iterations needed to arrive at a suitable bound differed significantly between generated samples based on the same underlying model and parameter specification. In a parallelized computing environment with a set of jobs to be executed by the various computing nodes, implementing load balancing routines can make a substantial difference, as Figure 5 illustrates.

By the nature of the problem and the computational time gain that can be achieved through load balancing, we believe our simulation study provides a useful case study to include in the list of effectively parallelizable problems as treated by Rossini et al. (2003).


Figure 5: A graphical representation of the CPU computation time required for each job by three particular nodes from the cluster, identified as N.1, N.2 and N.3. A job corresponds to solving for one confidence bound in the profile likelihood approach and is handled by a single computing node. After that, the result is communicated to the master node, and once all nodes have communicated the result of their job to the master, all nodes receive a new job assignment until there are no more jobs. The first three rows represent job executions without load balancing; the total amount of time spent idle (red) is of significant size compared to the time spent computing (green). The lower three rows represent the same nodes executing jobs with load balancing, where nodes are assigned new jobs dynamically and immediately.

The significance of compiler additions

In the most general sense, a compiler refers to any program that translates a high-level language, such as the C++ language or the R language, to a lower-level language such as machine code or byte code. Compiling enables the user to create compiled variants of user-defined functions using routines from the selected compiler package. Different compilers are made available to the users of most statistical programs, such as Matlab, S-Plus and R (a set of paid compiler routines for Matlab is available through the website of Matlab's publisher MathWorks). Testing was done on a single core of a single machine. If one were to test in a parallelized environment such as a cluster, one would have to take into account communication overhead between the computing nodes. As of version 2.13.0, R ships with the compiler package. Besides this package, other popular packages for the R environment include the Rcpp package and the Inline package; both combine the R language with fast C and C++ routines. The execution speed increase due to the use of a compiler is influenced by the coding style of the user. For example, repeated evaluation of 1/(1 − x) for any x requires less execution time on any system than repeated evaluation of (1 − x)^{-1}, even though the two expressions describe the same mathematical relation. Testing of the execution speed without a byte compiler is done on R version 2.13 without functions being compiled. We also included a routine named SL, which constructs a dataset of 1000 observations from an ARMA(1,1)-GARCH(1,1) specification 10000 times and re-estimates the QML estimators for each of them.

| routine | sample size | vanilla | compiler | Inline | Rcpp |
|---|---|---|---|---|---|
| SL | n = 100 | 23.59 | 168% | 129% | 131% |
| SL | n = 1000 | 121.27 | 91% | 82% | 83% |
| DM | n = 100 | 74.22 | 76% | 70% | 68% |
| DM | n = 1000 | 74.22 | 75% | 68% | 61% |
| PLAstatic | n = 100 | 211.93 | 31% | 26% | 28% |
| PLAstatic | n = 1000 | 1017.31 | 30% | 24% | 21% |
| PLAdynamic | n = 100 | 315.33 | 34% | 27% | 29% |
| PLAdynamic | n = 1000 | 1865.40 | 32% | 22% | 25% |

Table 7.1: Comparison of the different compiler packages. The first column gives the average execution times for the various problems without the use of a compiler, which are taken as the benchmark cases, while the subsequent columns list the percentual gain in computing time for the compiler packages compared to the benchmarks.

We observe from Table 7.1 that the inclusion of compiled functions yields considerable execution time gains in all of the considered cases. The compiler package appears to yield the best results in each of the cases.
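A minimal example of byte-compiling a user-defined function with the compiler package that ships with R; the toy function mirrors the 1/(1 − x) example above. Note that recent R versions byte-compile functions automatically (just-in-time compilation), so the measured gap will be smaller than in the R versions discussed here.

```r
library(compiler)

# Toy example: a loop-heavy function evaluating 1 / (1 - x) repeatedly,
# once interpreted and once byte-compiled with cmpfun().
f  <- function(x) { s <- 0; for (i in seq_along(x)) s <- s + 1 / (1 - x[i]); s }
fc <- cmpfun(f)

x <- runif(1e5, max = 0.5)
system.time(for (k in 1:50) f(x))
system.time(for (k in 1:50) fc(x))
```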

Approximating the profile likelihood confidence bounds

Consider the profile likelihood approaches of section 4. Efficient search algorithms for determining the location of the confidence bounds are an issue the researcher has to address. We can distinguish a few implementations from the literature:

1. Successive applications of crude grid searches, as implemented by Stryhn & Christensen (2003). The implementation takes the following form in our case. First, one constructs a countable set of points to approximate the parameter range of $\mathrm{VaR}^{\xi}$. In our case, an approximation using $N$ points gives rise to the approximating set:

$$
{}^{*}\Upsilon = \{{}^{*}\mathrm{VaR}^{i}_{\xi} : {}^{*}\mathrm{VaR}^{i}_{\xi} \in \Upsilon \ \text{for } i = 1, \dots, N\} \tag{35}
$$

The next step is to determine for which elements ${}^{*}\mathrm{VaR}$ we have $\log L_T(\tilde{\theta}; X) \ge \log L_T(\hat{\theta}; X) - \frac{1}{2} q_{\chi^2(1)}$, where $\tilde{\theta}$ is the maximum likelihood estimator under the restriction $\mathrm{VaR}^{\xi}(\theta) = {}^{*}\mathrm{VaR}$. The smallest and largest ${}^{*}\mathrm{VaR} \in {}^{*}\Upsilon$ such that this condition holds are taken as the approximating lower and upper bound for $\mathrm{VaR}^{\xi}$, respectively. Depending on the required accuracy of the location of the confidence bounds, successive grid searches may be employed in the neighbourhood of the approximating bounds to refine the approximation. The practicality of this approach is questionable for our case. For each ${}^{*}\mathrm{VaR} \in {}^{*}\Upsilon$ one has to solve a (restricted) maximization problem. The approach as laid out by Stryhn & Christensen (2003) forces us to do this many times compared to alternative methods that use characteristics of the maximization problem. We therefore consider this approach to be highly inefficient.

2. Systematic search procedures, such as the bisection-based algorithm as treated by Kaw (2010); a minimal sketch of this approach is given at the end of this subsection. In this approach, the parameter range is divided at $\mathrm{VaR}^{\xi}(\hat{\theta})$ into two ranges $M_1$ and $M_2$ such that $M_1 \cup M_2 = \Upsilon$ and $M_1 \cap M_2 = \mathrm{VaR}^{\xi}(\hat{\theta})$. Assuming $M_1$ describes the lower range of $\Upsilon$, the bisection algorithm iteratively divides the range into sub-ranges at a point $m \in \Upsilon$ and evaluates in which of the sub-ranges the value $\mathrm{VaR}^{\xi} \in \Upsilon$ lies such that $\log L_T(\tilde{\theta}; X) = \log L_T(\hat{\theta}; X) - \frac{1}{2} q_{\chi^2(1)}$, where $\tilde{\theta}$ is again the maximum likelihood estimator under the imposed restriction. The procedure partitions the sub-range and re-evaluates which of the two contains the solution. Depending on the required accuracy of the solution, the procedure can be carried out as many times as required to yield a relatively small neighbourhood in which the lower bound of the confidence interval of $\mathrm{VaR}^{\xi}$ lies. It is a common procedure to use an interpolating approximation on this small interval to yield a point estimate for the bound; see for example Carlstein & Krishnamoorthy (1992). It is common practice to split the range at the point where the resulting sub-ranges have equal length, but interpolating procedures such as approximation by splines and quadratic approximation may result in fewer iterations needed to reach convergence. The same procedure is carried out for $M_2$, leading to an approximate upper bound of the confidence interval of $\mathrm{VaR}^{\xi}$.


for the difference between function values between iterations. Typically, the increase is even larger as the sample size n increases. This is due to the increasing steepness of the likelihood function as more observations are available, as discussed in Bolker (2008).
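A hand-rolled bisection for the lower confidence bound, as referenced in item 2 above. It assumes the same hypothetical profile_loglik helper as in the sketch of section 4.2 and a bracket [lo, hi] that straddles the cutoff log L_T(θ̂; X) − q_{χ²(1)}/2.

```r
# Bisection for the lower profile-likelihood confidence bound (item 2):
# halve the bracket [lo, hi] in which the profile log-likelihood crosses
# the cutoff log L_T(theta_hat; X) - q_{chi^2(1)}/2.
bisect_lower_bound <- function(profile_loglik, lo, hi, cutoff, tol = 1e-6) {
  g <- function(v) profile_loglik(v) - cutoff
  stopifnot(g(lo) < 0, g(hi) > 0)            # bracket must straddle the cutoff
  while (hi - lo > tol) {
    mid <- (lo + hi) / 2
    if (g(mid) > 0) hi <- mid else lo <- mid
  }
  (lo + hi) / 2
}
```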

References

Bolker, B. M. (2008), Ecological models and data in R, Princeton University Press. Second edition.

Bollerslev, T. & Wooldridge, J. M. (1992), 'Quasi-maximum likelihood estimation and inference in dynamic models with time-varying covariances', Econometric Reviews 11(2), 143–172.

Carlstein, E. & Krishnamoorthy, C. (1992), 'Boundary estimation', Journal of the American Statistical Association 87(418), 430–438.

Casella, G. & Berger, R. L. (1990), Statistical inference, Vol. 70, Duxbury Press, Belmont, CA.

Chen, Y.-T. & Kuan, C.-M. (2002), 'The pseudo-true score encompassing test for non-nested hypotheses', Journal of Econometrics 106(2), 271–295.

Christoffersen, P. & Gonçalves, S. (2005), 'Estimation risk in financial risk management', Journal of Risk 17(3), 1–28.

Driscoll, M. F. (1999), 'An improved result relating quadratic forms and chi-square distributions', The American Statistician 53(3), 273–275.

Duty, P. & Flournoy, N. (2009), The composition method: a simple strategy for improving coverage of confidence intervals, Technical report, University of Missouri.

Fernández, C. & Steel, M. F. (1998), 'On Bayesian modeling of fat tails and skewness', Journal of the American Statistical Association 93(441), 359–371.

Francq, C. & Zakoian, J.-M. (2004), 'Maximum likelihood estimation of pure GARCH and ARMA-GARCH processes', Bernoulli 10(4), 605–637.

Francq, C. & Zakoïan, J.-M. (2012), 'Risk-parameter estimation in volatility models'. URL: http://mpra.ub.uni-muenchen.de/41713/

Hayashi, F. (2000), Econometrics, Princeton University Press.

Hendry, D. (1995), Dynamic econometrics, Advanced Texts in Econometrics, Oxford University Press.

Hulle, V. (2007), 'Solvency II. A panel discussion', British Actuarial Journal 13, 557–577.

Kaw, A. (2010), Numerical methods with applications: abridged, Autar Kaw Publishing. First edition.

Lehmann, E. L. (2012), On likelihood ratio tests, in 'Selected Works of E. L. Lehmann', Springer, pp. 209–216.

Ling, S. & McAleer, M. (2003), 'Asymptotic theory for a vector ARMA-GARCH model', Econometric Theory 19(2), 280–310.

Lumsdaine, R. L. (1996), 'Finite-sample properties of the maximum likelihood estimator in GARCH(1,1) and IGARCH', Journal of Business and Economic Statistics 13(1), 1–10.

McNeil, A. J., Frey, R. & Embrechts, P. (2005), Quantitative risk management: concepts, techniques, and tools, Princeton University Press.

Newey, W. K. & McFadden, D. (1994), 'Large sample estimation and hypothesis testing', Handbook of Econometrics 4, 2111–2245.

Papanicolaou, A. (2009), Taylor approximation and the delta method, Technical report, Stanford University, Mathematics Department.

Pek, J., Losardo, D. & Bauer, D. J. (2011), 'Confidence intervals for a semiparametric approach to modeling nonlinear relations among latent variables', Structural Equation Modeling: A Multidisciplinary Journal 18(4), 537–553.

Reutter, M., Weizsäcker, J. V. & Westermann, F. (2002), 'Septembear - a seasonality puzzle in the German stock index DAX', Applied Financial Economics 12(11), 765–769. URL: http://www.tandfonline.com/doi/abs/10.1080/09603100110037504

Rossini, A., Tierney, L. & Li, N. (2003), 'Simple parallel statistical computing in R'. URL: http://biostats.bepress.com/uwbiostat/paper193

Stoyanov, S. V., Rachev, S. T., Racheva-Iotova, B. & Fabozzi, F. J. (2011), Fat-tailed models for risk estimation, Technical report, Working Paper Series in Economics, No. 30.

Stryhn, H. & Christensen, J. (2003), Confidence intervals by the profile likelihood method, with applications in veterinary epidemiology, in 'Proceedings of the 10th International Symposium on Veterinary Epidemiology and Economics'.

Venzon, D. & Moolgavkar, S. (1988), 'A method for computing profile-likelihood-based confidence intervals', Applied Statistics, pp. 87–94.

White, H. (1996), Estimation, inference and specification analysis, Vol. 22, Cambridge University Press.

Williams, D. (2001), Weighing the odds: a course in probability and statistics, Springer. First edition.

Wooldridge, J. M. (1990), 'A unified approach to robust, regression-based specification tests', Econometric Theory 6(1), 17–43.
