
Master’s Thesis

Outlier-robust estimators for dynamic panel data models

Vancho Pecovski

Student number: 10887075

Date of final version: January 31, 2016

Master’s programme: Econometrics

Specialisation: Free Track
Supervisor: dr. M.J.G. Bun

Second reader: dr. J.C.M. van Ophem


Statement of Originality

This document is written by Student [Vancho Pecovski] who declares to take full responsibility for the contents of this document.

I declare that the text and the work presented in this document is original and that no sources other than those mentioned in the text and its references have been used in creating it.

The Faculty of Economics and Business is responsible solely for the supervision of completion of the work, not for the contents.


Contents

1 Introduction
2 The Setup
  2.1 Model and Assumptions
  2.2 Data contamination
    2.2.1 Independent outliers
    2.2.2 Patched outliers
  2.3 Definition of Median
3 Estimators
  3.1 Arellano and Bond (1991), Blundell and Bond (1998)
  3.2 Hsiao, Pesaran, Tahmiscioglu (2002)
  3.3 Dhaene and Zhu (2009)
  3.4 Lancaster (2002)
  3.5 Further assessment
4 Simulations
5 Conclusion
A Results
Bibliography


Chapter 1

Introduction

An outlier is an observation that is not generated by the model that is used. Not accounting for outliers can lead to biased estimators and unreliable inference. Contrary to the previously dominant belief that large sample sizes make robust techniques unnecessary, noted in Zaman, Rousseeuw and Orhan (2001), the recent increase in research interest in the effects of outliers in data reflects a growing awareness of their impact on regression results.

To address the issue of outliers, an application to panel data models is used, with a particular focus on dynamic panel data models. In the literature on panel data models there are outlier-robust estimators, but they depend on stationarity assumptions. Therefore, the research question in this study concerns the sensitivity of robust and non-robust estimators, with a particular focus on stationarity assumptions.

A robust estimator is an estimator that is not modified too much by changing a small percentage of the data. In this context, throughout the study the term contaminated data is used and applied in order to illustrate the robustness of the presented estimators to outliers. Additionally, robustness can also be interpreted as insensitivity to small deviations from the assumptions that the model incorporates.

Consequently, an important distinction is made between robustness to contaminated data and robustness to deviations from mean or covariance stationarity. There are methods which are robust to outliers, and there are methods which are robust to violations of the stationarity assumptions, but there is no method which is robust to both. Therefore, the trade-off between these two aspects of robustness will be analysed.

Furthermore, particular focus is on the dynamic nature of the model, which is related to the issue of initial conditions. This is a relevant issue because the treatment of the initial observations is an important theoretical and practical problem (Hahn 1999, Wooldridge 2005) and the assumptions on the initial conditions determine the consistency of the estimator (Anderson and Hsiao, 1981). In this context, with a short time series dimension, as is the case with panel data, the initial conditions have an important impact on each of the presented estimators.

The starting point in every linear dynamic panel data analysis is the fact that the fixed effects estimator is inconsistent for fixed T and large N. This inconsistency is called the Nickell (1981) bias. This resulted in the general practice of using the Generalized Method of Moments (GMM) for estimating the parameters of dynamic panel data models. However, GMM methods are sensitive to outliers, i.e. they are non-robust estimators, a term that will be used to distinguish them from the robust estimators. Additionally, some of the GMM methods are sensitive to the assumption of mean-stationarity. Roodman (2009) argues that not enough attention is given to this assumption in applied research. The effect of deviations from mean-stationarity is also analysed theoretically by Hayakawa (2009).

Since the main interest of the thesis is also the behaviour of the robust estimators in the presence of outlying observations, a brief overview of recent outlier-robust estimators includes Bramati and Croux (2007), who propose an estimator that has large efficiency gains with respect to the classical Within estimator in the presence of outliers. Lucas et al. (2007) propose a variant of the GMM estimator which is less sensitive to outliers. Aquaro and Cizek (2010) use a first difference rather than a Within transformation. Conventionally, differencing is employed in dynamic panel data models because it allows the unobserved heterogeneity to be removed.

Dhaene and Zhu (2009) propose a median-based estimator. The reason for using this estimator in the thesis is that, contrary to the above estimators, which are locally robust, the Dhaene and Zhu (2009) estimator is globally robust, which shows the stability of the estimator (a term used by Martin and Yohai, 1986) in the presence of a large fraction of outliers. Therefore, this estimator, to which Section 3.3 is devoted, is used in the thesis as a representative of the outlier-robust estimators. Because of its sensitivity to the number of periods and cross-sectional units, the simulation results in their research will be extended by including different numbers of time periods and cross-sectional units, in the context of both non-contaminated and contaminated data.

Additionally, among the GMM estimators, the choice is the standard Arellano and Bond (1991) and Blundell and Bond (1998) estimators, because they are among the most popular inference methods in the dynamic panel data literature. For the group of maximum likelihood estimators, the choice is the transformed maximum likelihood estimator of Hsiao, Pesaran and Tahmiscioglu (2002), based on the first difference transformation, because the transformed maximum likelihood estimator avoids, similar to the GMM estimator, the incidental parameter problem and thus yields consistent estimates, and because the advantage of the first difference transformation is that it does not require instruments, but only assumptions on the initial conditions.

Without intending to give a detailed overview of the presented estimators, the objective is to illustrate how sensitive various estimators are to stationarity assumptions and to the presence of outliers, using Monte Carlo simulations. In this context, particular focus is on the moment conditions underlying the presented estimators.

Further assessment of the simulation results is done by considering the strength of the deviation from mean-stationarity, i.e. the correlation between the deviation from mean-stationarity and the individual specific effect, by using the formula in Bun and Sarafidis (2013). This is known as the constant-correlated effects, or in short cce, assumption. The implied correlation, calculated by using the true values of the parameters, predicts whether the simulation results are symmetric when the assumption of mean-stationarity is violated.

Furthermore, the problem of robust estimation is also considered in a Bayesian context. The main motivation for including the Lancaster (2002) estimator in the analysis of the outlier-robust estimators is related to the use of uniform priors, defined as non-data-dependent priors, i.e. priors that do not depend on the distribution of the data because they are independent of the true parameters.

The approach used in the thesis is the following. Keeping the stationarity assumptions, the first question is what happens if the data are contaminated, i.e. if outliers are added; the robust estimators remain consistent in that case, since the mean and/or covariance stationarity on which they rely is maintained. The second question is what happens when stationarity is violated, and how these estimators behave then, with and without outliers. The trade-off between robustness to stationarity assumptions and robustness to outliers will become evident.

The plan of this study is as follows. Chapter 2 presents the model and the assumptions that are usually used; in this chapter the data contamination is introduced, as well as the definition of the median. Chapter 3 contains the non-robust and robust estimators, with a particular focus on the underlying stationarity assumptions. Chapter 4 defines the Monte Carlo design used for the simulations and assesses the results, which are presented in the appendix. Chapter 5 concludes.


Chapter 2

The Setup

In this chapter, the model used in the thesis is introduced. Particular focus is on the assumptions that underlie stationarity: after deriving the mean and covariance stationarity conditions in this chapter, the next chapter assigns to each of the presented estimators the conditions relevant for its consistency. Furthermore, in this chapter the median is introduced, which will be used in Section 3.3 for defining the median-based estimators.

2.1 Model and Assumptions

At the beginning of this section, mean and covariance stationarity are defined in order to derive the conditions that will be used for analysing the sensitivity of the presented estimators, and deviations from these conditions are generated by using different values for the parameters δ and λ that enter the initial conditions. The assumptions on the initial conditions are important because they determine the different finite sample properties of the presented estimators.

The following autoregressive model without strictly exogenous variables is considered:

y_{it} = \alpha y_{i,t-1} + \eta_i + \upsilon_{it}, \qquad i = 1, \dots, N,\ t = 2, \dots, T \qquad (2.1)

where α is the parameter of interest with |α| < 1 and η_i is an unobserved individual effect.

Additionally, υ_{it} ∼ iid(0, σ_υ²), i.e. the υ_{it} are assumed to have finite moments and in particular

E(\upsilon_{it}) = E(\upsilon_{it}\upsilon_{is}) = 0, \qquad t \neq s \qquad (2.2)

which means that there is a lack of serial correlation, but not necessarily independence over time. Following Arellano (2009), this assumption does not restrict the variance of υ_{it}. This means that the conditional variance may be some period-specific non-negative function of y_i^{t-1} and η_i, where y_i^{t-1} = (y_{i0}, y_{i1}, ..., y_{i,t-1}). Then

E(\upsilon_{it}^2 \mid y_i^{t-1}, \eta_i) = \varphi_t(y_i^{t-1}, \eta_i) \qquad (2.3)

and the unconditional variance may change with t,


E(\upsilon_{it}^2) = E[\varphi_t(y_i^{t-1}, \eta_i)] = \sigma_t^2. \qquad (2.4)

In order to derive the conditions for mean and covariance stationarity given in Arellano (2009), which will be used for analysing the sensitivity of the presented estimators, solving (2.1) recursively gives

y_{it} = \left(\sum_{s=0}^{t-1}\alpha^s\right)\eta_i + \alpha^t y_{i0} + \sum_{s=0}^{t-1}\alpha^s\upsilon_{i(t-s)} \qquad (2.5)

By using the assumption that E(\upsilon_{it} \mid y_i^{t-1}, \eta_i) = 0, the following result is valid,

E(y_{it} \mid \eta_i) = \left(\sum_{s=0}^{t-1}\alpha^s\right)\eta_i + \alpha^t E(y_{i0} \mid \eta_i) \qquad (2.6)

which for |α| < 1 and large t tends to μ_i = η_i/(1 − α).

Therefore, µi is the steady state mean for individual i and the stationarity in mean implies

E(y_{i0} \mid \eta_i) = \frac{\eta_i}{1-\alpha} \qquad (2.7)

In addition to the previous assumption, covariance stationarity requires the conditional homoskedasticity and/or time series homoskedasticity assumptions; they may hold in conjunction, but either may also occur in the absence of the other (Arellano 2009). Conditional homoskedasticity means that the conditional variance does not depend on y_i^{t-1} and η_i, although it may still be period-specific. This is a consequence of the assumption that E(υ_{it}) = E(υ_{it}υ_{is}) = 0 for t ≠ s, which holds cross-sectionally for any t but does not restrict the variance of υ_{it} (Arellano 2009).

This is the reason why we have a subscript t for

E(\upsilon_{it}^2 \mid y_i^{t-1}, \eta_i) = \sigma_t^2, \qquad (2.8)

while for time series homoskedasticity,

E(\upsilon_{it}^2) = \sigma^2 \qquad (2.9)

which lead to the following result,

\mathrm{cov}(y_{it}, y_{i,t-j} \mid \eta_i) = \alpha^{2t-j}\,\mathrm{Var}(y_{i0} \mid \eta_i) + \alpha^j\left(\sum_{s=0}^{t-j-1}\alpha^{2s}\right)\sigma^2 \qquad (2.10)

which for |α| < 1 and large t tends to α^j σ²/(1 − α²). Consequently, covariance stationarity implies that

\mathrm{Var}(y_{i0} \mid \eta_i) = \frac{\sigma^2}{1-\alpha^2} \qquad (2.11)

The model (2.1), together with conditions (2.7) and (2.11), leads to the following initial condition,

y_{i0} = \delta\mu_i + \epsilon_{i0} \qquad (2.12)

where μ_i = η_i/(1 − α), υ_{it} ∼ iid N(0, 1), η_i ∼ iid N(0, 1), and ε_{i0} ∼ N(0, λ/(1 − α²)).

The importance of the initial conditions will be illustrated by using different values for the parameters δ and λ: because they enter the initial conditions, they are relevant for the stationarity assumptions. An estimator of α obtained under the assumption that (y_{i0} | μ_i) follows the stationary unconditional distribution of the process will be inconsistent when the assumption is false. Therefore, in short panels the assumption of stationary initial conditions may be very informative about α.

Contrary to the approach to defining the initial conditions in Hayakawa (2013), the advantage is that we do not need to consider different values of δ for different values of α.¹ As noted by Bun and Sarafidis (2013), by allowing the variance of the moment conditions to be proportional to 1/(1 − α), it has explosive behaviour as α goes to 1.

The mean and covariance stationarity conditions, now using the parameters λ and δ, are obtained in the same way as before, i.e. by recursively solving for y_{it}, and we have the following result,

y_{it} = [1 - (1-\delta)\alpha^t]\mu_i + \sum_{j=0}^{t-1}\alpha^j\upsilon_{i,t-j} + \alpha^t\epsilon_{i0} \qquad (2.14)

where h_t = 1 - (1-\delta)\alpha^t.

The expectation and the covariance of y_{it} are given by

E(y_{it}) = h_t\,\mu \qquad (2.15)

\mathrm{cov}(y_{is}, y_{it}) = h_t h_s\,\sigma_\mu^2 + \sigma_\upsilon^2\,\alpha^{t-s}\left(\frac{1 - (1-\lambda)\alpha^{2s}}{1-\alpha^2}\right) \qquad (2.16)
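As a quick numerical illustration of (2.15) and (2.16) (the thesis does not report code for this), the following sketch evaluates the implied covariance for δ = λ = 1 and shows that it depends only on t − s; the values σ_η² = σ_υ² = 1 match the Monte Carlo design of Chapter 4, and the function names are illustrative.

```python
import numpy as np

# Minimal check of (2.15)-(2.16): for delta = lambda = 1 the implied moments of y_it
# depend only on t - s, i.e. the process is covariance stationary.
alpha, delta, lam = 0.5, 1.0, 1.0
sigma_eta2, sigma_v2 = 1.0, 1.0
sigma_mu2 = sigma_eta2 / (1.0 - alpha) ** 2        # variance of mu_i = eta_i / (1 - alpha)

def h(t):
    return 1.0 - (1.0 - delta) * alpha ** t        # h_t in (2.14)

def cov_y(s, t):
    # equation (2.16), written for s <= t
    return (h(t) * h(s) * sigma_mu2
            + sigma_v2 * alpha ** (t - s) * (1.0 - (1.0 - lam) * alpha ** (2 * s)) / (1.0 - alpha ** 2))

for (s, t) in [(1, 3), (2, 4), (5, 7)]:
    print(s, t, cov_y(s, t))   # identical values: the covariance depends only on t - s
```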

Finally, by imposing restrictions on λ and δ, we have several possible situations. If δ = 1 and λ = 1, y_{it} follows a mean and covariance stationary process. If δ = 1 and λ is unrestricted, the process is mean-stationary. If δ ≠ 1 the process is mean-nonstationary, i.e. over-stationary for δ > 1 and under-stationary for δ < 1. All these cases will be analysed in the simulations in Chapter 4 by choosing different values for λ and δ, which lead to different results regarding the finite sample properties on which the presented estimators are based.

¹ Hayakawa (2013) defines the initial conditions as

y_{i0} = \mu_i + \epsilon_{i0} \qquad (2.13)

where \mu_i = \eta_i/(1-\bar{\alpha}) and \bar{\alpha} is a function of α and an arbitrary constant, for example \bar{\alpha} = \alpha - 0.0075 + 0.0025 \cdot b. If we start with α = 0.9 and \bar{\alpha} = 0.9, we have stationarity as a special case; however, with a small deviation from stationarity we immediately get a huge variance. With the approach to defining the initial conditions used in the thesis, changing one parameter does not require changing the others to keep the same interpretation, i.e. irrespective of the value of the autoregressive parameter: if α = 0.5 and δ = 1 we have stationarity, and if α = 0.9 and still δ = 1 we again have stationarity. Therefore, we do not need to consider different values of δ for different values of α.

The strength of the deviation from mean-stationarity is considered by examining whether the deviations from mean-stationarity remain uncorrelated with the unobserved heterogeneity. This is known as the constant-correlated effects, or in short cce, assumption (Bun and Sarafidis, 2013), which requires that any deviations from steady state behaviour be uncorrelated with η_i. Referring to Bun and Sarafidis (2013), this assumption can also be called 'effect stationarity', because it is an expectation conditional on the individual specific effect η_i.

The correlation coefficient between the deviation of the initial condition of the y process from its long run mean and the level of its long run mean, which is given by equation (4.11) in Bun and Sarafidis (2013), for the AR(1) model can be simplified to:

r_y = \mathrm{corr}(y_{i0} - \mu_i,\ \mu_i) = \frac{(\delta-1)\frac{1}{1-\alpha}\,\sigma_\eta^2}{\sqrt{(\delta-1)^2\left(\frac{1}{1-\alpha}\right)^2\sigma_\eta^2 + \frac{1}{1-\alpha^2}\,\sigma_\eta^2}} \qquad (2.17)

This equation will be used in the simulations for calculating the implied r_y. The inconsistency of the estimators will be confirmed if the implied r_y for all estimators suggests a strong deviation from cce. More precisely, the real advantage of (2.17) is that, by inverting it as a function of δ, one can choose and control r_y instead.

2.2 Data contamination

The contribution of the thesis is that it considers the sensitivity of the estimators not only in the traditional way, with non-contaminated data, but also with contamination, by adding two types of outliers, independent outliers and patched outliers, which are defined in this section. The method follows the approach of Dhaene and Zhu (2009), which is related to the earlier literature that considered this question in a time-series context. The main difference between the two types of outliers is that additive outliers affect only a single observation, while innovative (independent) outliers affect all later observations. More precisely, in the model with innovative (independent) outliers, occasional innovations have a larger variance than the majority and can therefore appear as outliers. In the model with additive outliers, the isolated outlier has an additive, transient character that is unrelated to the model. Therefore, innovative (independent) outliers transmit their effect to later observations while additive outliers do not.

2.2.1 Independent outliers

The observed data can be contaminated in the sense of Fox (1972) type I outliers, by adding independent outliers


y_{it}^{\zeta,\epsilon} = y_{it} + a_{it} \qquad (2.18)

where y_{it} is the uncontaminated data, a_{it} is independent across i and t and independent of the uncontaminated data y_{it}, ε is the fraction of contaminated observations, and ζ is the size of the additive outliers. Independent or isolated outliers means that each pair of outliers is separated in time by at least one non-outlier observation.

What will be shown in the simulations, where the contamination rate is 0.05, is that for all values of ε and ζ in the stationary case, the bias becomes larger as α approaches 1. The choice of a contamination rate of 0.05 follows the practice in the literature, including Dhaene and Zhu (2009): as shown in Figure 1 (page 12) of Dhaene and Zhu (2009), the bias curve moves away from the zero-bias line as ε increases to 0.5, which is the worst contamination rate, i.e. at ε = 0.5 the bias curve is furthest from the zero-bias line. Therefore, since the bias increases with the contamination rate up to ε = 0.5, the choice is between the lowest contamination rate ε = 0.05 and the highest contamination rate ε = 0.5. Since the objective of the thesis is not to study the intensity of contamination, which a priori means increased bias, but the presence of contamination per se, the contamination rate used in the simulations is ε = 0.05.
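A minimal sketch of how the independent additive contamination (2.18) can be generated; it assumes a simple Bernoulli scheme with a random sign for a_{it}, so for small ε it only approximates isolated outliers (the separation in time is not enforced exactly). The function name and the outlier size ζ are illustrative and not taken from the thesis.

```python
import numpy as np

rng = np.random.default_rng(0)

def add_independent_outliers(y, eps=0.05, zeta=5.0, rng=rng):
    """Contaminate a panel y (N x (T+1) array) in the sense of eq. (2.18):
    each observation receives an additive shock of size zeta, with a random sign,
    with probability eps."""
    hit = rng.random(y.shape) < eps                 # which cells are contaminated
    sign = rng.choice([-1.0, 1.0], size=y.shape)
    return y + hit * sign * zeta

# y_contaminated = add_independent_outliers(y, eps=0.05, zeta=5.0)
```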

2.2.2 Patched outliers

The second way in which contamination can be introduced is by patches of additive outliers. Following Martin and Yohai (1986), the patched outliers are defined as

y_{it}^{\zeta,\epsilon,k} = \begin{cases} y_{it} + a_{it} & \text{if } z_{t-l}^{p} = 1,\ l = 0, \dots, k-1 \\ y_{it} & \text{otherwise} \end{cases} \qquad (2.19)

where y_{it} is the uncontaminated data and p = 1 - (1-\epsilon)^{1/k}.

If we let z_{t}^{p} be a binomial sequence, then for 0 ≤ γ ≤ 1

z_t^{\gamma} = \begin{cases} 1 & \text{if } z_{t-l}^{p} = 1 \text{ for some } l = 0, \dots, k-1 \\ 0 & \text{otherwise} \end{cases} \qquad (2.20)

Therefore, using the fact that P(z_t^{\gamma} = 1) = 1 - (1-p)^k = \gamma + o(\gamma),³ and the fact that z_t^{\gamma} is a zero-one process, in the programming part of the thesis the contaminated process is described as

y_t^{\gamma} = (1 - z_t^{\gamma})x_t + z_t^{\gamma}w_t \qquad (2.21)

where x_t is a Gaussian core process⁴ and w_t is a contaminating process.

³ Notation in this part of the thesis refers to p. 785 in Martin and Yohai (1986).

⁴ From the literature it is unclear why a Gaussian core process is imposed, but, as in the earlier studies, it is also imposed here.
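A minimal sketch of the patched contamination (2.19), assuming patches of length k start with probability p = 1 − (1 − ε)^{1/k}; the patch length k = 3, the outlier size ζ and the sign convention are illustrative assumptions, not taken from the thesis.

```python
import numpy as np

rng = np.random.default_rng(1)

def add_patched_outliers(y, eps=0.05, zeta=5.0, k=3, rng=rng):
    """Contaminate a panel y (N x (T+1) array) with patches of additive outliers, eq. (2.19):
    a patch of length k starts at (i, t) with probability p = 1 - (1 - eps)**(1/k), and every
    observation inside a patch receives an additive shock of size zeta (random sign)."""
    p = 1.0 - (1.0 - eps) ** (1.0 / k)
    n, T1 = y.shape
    starts = rng.random((n, T1)) < p
    z = np.zeros((n, T1), dtype=bool)              # indicator of contaminated cells
    for l in range(k):                             # a start at t switches on t, t+1, ..., t+k-1
        z[:, l:] |= starts[:, :T1 - l]
    sign = rng.choice([-1.0, 1.0], size=y.shape)
    return np.where(z, y + sign * zeta, y)
```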


2.3 Definition of Median

The main motivation for the inclusion of the median in the analysis is its advantage over the mean when the breakdown point is used as a robustness criterion. Referring to the definition by Serfling (2002), the breakdown point of an estimator measures the degree to which the estimator remains uninfluenced by the presence of outliers. The advantage of the median is that it has a breakdown point of 1/2, whereas the mean has a breakdown point of 1/n, where n is the number of observations, which means that the median performs better according to this robustness measure. Additionally, it is well known that one particular point can greatly influence the mean.

Although, as will be shown in the simulations, the use of the median-based estimators entails an efficiency loss as a side effect, this loss is considered small compared with the negative effects of the presence of outlying observations.

Based on these arguments, which justify the use of the median, the definition of the median is given in this section as an introduction to the definition of the median-based estimators in Section 3.3.

The median of x_1, ..., x_n is written as follows:

\mathrm{med}(x_1, \dots, x_n) = \begin{cases} x_{(k)} & \text{if } n = 2k-1 \\ (x_{(k)} + x_{(k+1)})/2 & \text{if } n = 2k \end{cases} \qquad (2.22)

Zielinski (1999), based on Hurwicz (1950), defines the median in the following way:

\mathrm{med}_z(x_1, \dots, x_n) = \begin{cases} x_{(k)} & \text{if } n = 2k-1 \\ D\,x_{(k)} + (1-D)\,x_{(k+1)} & \text{if } n = 2k \end{cases} \qquad (2.23)

where D is a Bernoulli variate, independent of x_1, ..., x_n, with Pr[D = 0] = Pr[D = 1] = 1/2.
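A small sketch of the two median definitions (2.22) and (2.23), written here for illustration (the thesis does not report its code; the function names are hypothetical).

```python
import numpy as np

rng = np.random.default_rng(2)

def med(x):
    """Standard median, eq. (2.22): middle order statistic for odd n,
    average of the two middle order statistics for even n."""
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    k = (n + 1) // 2
    return x[k - 1] if n % 2 == 1 else 0.5 * (x[k - 1] + x[k])

def med_z(x, rng=rng):
    """Randomized median of Zielinski (1999), eq. (2.23): for even n, pick one of the
    two middle order statistics with probability 1/2 each (D is a Bernoulli draw)."""
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    k = (n + 1) // 2
    if n % 2 == 1:
        return x[k - 1]
    D = rng.integers(0, 2)                         # Pr[D = 0] = Pr[D = 1] = 1/2
    return D * x[k - 1] + (1 - D) * x[k]
```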

When n = 2k − 1, as noted by Dhaene and Zhu (2009), the two definitions are the same, and they are asymptotically equivalent when, for n → ∞,

x_{[n/2]} - x_{[n/2]+1} = o_p(1) \qquad (2.24)

Therefore, what is relevant for the estimators is that \mathrm{med}_z(y_t/y_{t-1};\ t = 2, \dots, T) is median-unbiased for α, and that the median-bias of \mathrm{med}(y_t/y_{t-1};\ t = 2, \dots, T) converges to 0 as T → ∞.


Chapter 3

Estimators

After defining the assumptions for mean and covariance stationarity and presenting the model used in this study, this chapter relates each of the presented estimators to the previously defined stationarity conditions. The objective is not to give a detailed overview of each estimator, but to focus on the stationarity assumptions that determine the sensitivity of the estimator. Their robustness will be further illustrated in the simulations in the next chapter, for contaminated and non-contaminated data. Section 3.5 gives a further assessment of the presented estimators and summarizes the main assumptions together with the moment conditions, as an introduction to what can be expected in the simulations.

3.1 Arellano and Bond (1991), Blundell and Bond (1998)

Among the GMM estimators, the choice is the standard Arellano and Bond (1991) and Blundell and Bond (1998) estimators because they are among the most popular inference methods in dynamic panel data literature. Again, for illustrative purposes, the model (2.1) is given. The following autoregressive model without strictly exogenous variables is considered:

y_{it} = \alpha y_{i,t-1} + \eta_i + \upsilon_{it}, \qquad i = 1, \dots, N,\ t = 2, \dots, T \qquad (3.1)

where α is the parameter of interest with |α| < 1 and η_i is an unobserved individual effect.

The Arellano and Bond (1991) estimator exploits the assumption that there is a lack of serial correlation, but not necessarily independence over time,

E(\upsilon_{it}) = E(\upsilon_{it}\upsilon_{is}) = 0, \qquad t \neq s. \qquad (3.2)

Simulation studies, for example, in Blundell and Bond (1998), have shown that the GMM estimator based on the following moment conditions, which are related to assumption (3.2)

E(y_{i,t-s}\,\Delta\upsilon_{it}) = 0, \qquad t = 2, \dots, T,\ s \geq 2 \qquad (3.3)

has a large finite sample bias and poor performance, especially when α approaches 1.


The Arellano and Bond (1991) estimator, which does not require mean stationarity, has the property that it only requires very general assumptions about the initial values and the individual specific effects in order for the corresponding moment conditions to be satisfied.

Taking first-differences of the model (2.1) or (3.1) gives

\Delta y_{it} = \alpha\,\Delta y_{i,t-1} + \Delta\upsilon_{it} \qquad (3.4)

As noted in equation (3.3), this estimator is based on the following m = \frac{1}{2}T(T-1) linear moment conditions

E(y_{i,t-s}(\Delta y_{it} - \alpha\,\Delta y_{i,t-1})) = 0 \qquad (3.5)

for t = 2, \dots, T and s = 2, \dots, t.

They are valid under the basic assumptions (i)-(iii) of Section 3.5 and do not require that the mean stationarity assumption (2.7) is satisfied. Using stacked notation they can be expressed as

E(Z_{i1}'(\Delta y_i - \alpha\,\Delta y_{i,-1})) = 0 \qquad (3.6)

where \Delta y_i and \Delta y_{i,-1} are (T-1) \times 1 vectors defined as \Delta y_i = (\Delta y_{i2}, \dots, \Delta y_{iT})' and \Delta y_{i,-1} = (\Delta y_{i1}, \dots, \Delta y_{i,T-1})', and Z_{i1}' is the (T-1) \times m matrix defined as

Z_{i1}' = \begin{pmatrix} y_{i0} & 0 & 0 & \cdots & 0 & \cdots & 0 \\ 0 & y_{i0} & y_{i1} & \cdots & 0 & \cdots & 0 \\ \vdots & \vdots & \vdots & & \vdots & & \vdots \\ 0 & 0 & 0 & \cdots & y_{i0} & \cdots & y_{i,T-2} \end{pmatrix} \qquad (3.7)
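The following sketch builds the block-diagonal instrument matrix of (3.7) for one individual. The function name and the array layout (y_i containing y_{i0}, ..., y_{iT}) are assumptions made for illustration; the thesis does not report its code.

```python
import numpy as np

def ab_instrument_matrix(y_i):
    """Block-diagonal instrument matrix Z_i' of eq. (3.7) for one individual,
    with y_i = (y_i0, y_i1, ..., y_iT).  Each row corresponds to the first-differenced
    equation at period t (t = 2, ..., T) and contains the valid lags y_i0, ..., y_{i,t-2}."""
    y_i = np.asarray(y_i, dtype=float)
    T = len(y_i) - 1                       # periods 1, ..., T plus the initial value y_i0
    m = T * (T - 1) // 2                   # total number of moment conditions
    Z = np.zeros((T - 1, m))
    col = 0
    for t in range(2, T + 1):              # equation in first differences at period t
        n_inst = t - 1                     # instruments y_i0, ..., y_{i,t-2}
        Z[t - 2, col:col + n_inst] = y_i[:n_inst]
        col += n_inst
    return Z

# For T = 5 this returns a 4 x 10 matrix: y_i0 in the first row, (y_i0, y_i1)
# in the second row, and so on, as in (3.7).
```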

Blundell and Bond (1998) proposed a system GMM estimator. The consistency of the system GMM estimator is achieved only when the initial conditions satisfy mean-stationarity. More precisely, Blundell and Bond (1998) introduce an additional assumption on the initial conditions,

E(\eta_i\,\Delta y_{i1}) = 0, \qquad i = 1, \dots, N \qquad (3.8)

This assumption requires a stationarity restriction on the initial conditions y_{i0}. The equation for the first period observed is

y_{i1} = \alpha y_{i0} + \eta_i + \upsilon_{i1} \qquad (3.9)

Subtracting y_{i0} from both sides of this equation, and using the initial condition defined in (2.12) with δ = 1, gives

\Delta y_{i1} = (\alpha-1)\frac{\eta_i}{1-\alpha} + (\alpha-1)\epsilon_{i0} + \eta_i + \upsilon_{i1} = (\alpha-1)\epsilon_{i0} + \upsilon_{i1} \qquad (3.10)

Therefore the assumption is equivalent to the restriction

E(\epsilon_{i0}\,\eta_i) = 0, \qquad i = 1, \dots, N.¹ \qquad (3.11)

Consequently, a sufficient condition for Blundell and Bond (1998) is that the initial conditions y_{i0} satisfy the mean stationarity restriction E(y_{i0} | η_i) = η_i/(1 − α) for each individual. This requires only the first moment to be constant, and does not require a constant second moment. When this condition holds, the system GMM estimator outperforms the Arellano and Bond (1991) estimator in terms of asymptotic efficiency, but consistency relies on the validity of the moment restriction.²

Blundell and Bond (1998) requires mean stationarity. Under the mean stationarity assumption (2.7), we have (T − 1) additional moment conditions that are linear in the parameter α,

E(\Delta y_{i,t-1}(y_{it} - \alpha y_{i,t-1})) = 0 \qquad (3.12)

for t = 2, \dots, T, which together with the previous orthogonality conditions

E(y_{i,t-s}(\Delta y_{it} - \alpha\,\Delta y_{i,t-1})) = 0 \qquad (3.13)

for t = 2, \dots, T and s = 2, \dots, t, means that we have \frac{1}{2}T(T-1) + (T-1) moment conditions in total. The linear GMM estimator that uses all \frac{1}{2}T(T-1) + (T-1) moment conditions in (3.12) and (3.13) is referred to as the system GMM estimator.
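A sketch of the additional level-equation instruments used by the system GMM estimator, under the same array conventions as the sketch above (the lagged first differences Δy_{i,t−1} instrument the equations in levels, as in (3.12)); the function name is illustrative.

```python
import numpy as np

def bb_level_instruments(y_i):
    """Diagonal instrument matrix for the (T-1) level equations of the system GMM
    estimator, corresponding to the moment conditions (3.12): the equation in levels
    at period t (t = 2, ..., T) is instrumented by the lagged difference dy_{i,t-1}."""
    dy = np.diff(np.asarray(y_i, dtype=float))     # dy_i1, ..., dy_iT
    return np.diag(dy[:-1])                        # uses dy_i1, ..., dy_{i,T-1}

# Stacking ab_instrument_matrix(y_i) and bb_level_instruments(y_i) block-diagonally
# gives the T(T-1)/2 + (T-1) moment conditions of the system GMM estimator.
```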

3.2 Hsiao, Pesaran, Tahmiscioglu (2002)

For the maximum likelihood estimators, we know that they are robust to the initial conditions, so they do not need mean stationarity. The main advantage of the likelihood method is that, irrespective of which initial condition we take, it leads to a consistent estimator. The disadvantage is that if we go beyond the AR(1) model, we need strictly exogenous regressors, while GMM can allow for additional endogenous regressors.

As a representative of the maximum likelihood estimators, the choice is the transformed maximum likelihood estimator of Hsiao, Pesaran, Tahmiscioglu (2002), based on the first difference transformation, because the transformed maximum likelihood estimator avoids, similar to the GMM estimator, the incidental parameter problem and thus yields consistent estimates, and because the advantage of the first difference transformation is that it does not require instruments, but only assumptions on the initial conditions.

¹ The derivation of this restriction is given in Bond et al. (2001).

² The results for the Arellano-Bond and Blundell-Bond estimators in the simulations correspond to the findings in the literature.

Using the covariance matrix of \Delta\upsilon_i^{*}, where \Delta\upsilon_i^{*} = [\Delta y_{i2}, \Delta y_{i3} - \alpha\Delta y_{i2}, \dots, \Delta y_{iT} - \alpha\Delta y_{i,T-1}],

the objective in this study is to find α that maximizes

\log L = -\frac{NT}{2}\ln(2\pi) - \frac{N}{2}\ln|\Omega| - \frac{1}{2}\sum_{i=1}^{N}\Delta\upsilon_i^{*\prime}\,\Omega^{-1}\,\Delta\upsilon_i^{*} \qquad (3.14)

The covariance matrix of ∆υ∗ in HPT (2002) is:

\Omega = \sigma_\upsilon^2\begin{pmatrix} \omega & -1 & 0 & \cdots & 0 \\ -1 & 2 & -1 & \cdots & 0 \\ 0 & -1 & 2 & \ddots & \vdots \\ \vdots & & \ddots & \ddots & -1 \\ 0 & 0 & \cdots & -1 & 2 \end{pmatrix} \qquad (3.15)

with

\omega = \frac{1}{\sigma_\upsilon^2}\,\mathrm{Var}(\Delta y_{i2}) \qquad (3.16)

which in the case of covariance stationarity is equal to

\omega = \frac{2}{1+\alpha}, \qquad (3.17)

with

E(\Delta y_{i1}) = 0, \qquad (3.18)

\mathrm{Var}(\Delta y_{i1}) = 2\sigma_\upsilon^2/(1+\alpha). \qquad (3.19)

The extension to this approach is to verify the efficiency of the estimator when the imposed covariance stationarity assumption is violated, in the case of contaminated and uncontaminated data, which will be done in the simulations in Chapter 4. HPT also have a version without imposed covariance stationarity, which is not considered in this study; Section 3.5 gives more details about the imposed stationarity in HPT (2002). However, it should be noted that this estimator is not robust to heteroskedasticity or serial correlation of the transitory shocks.
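A sketch of the transformed log-likelihood (3.14) with the covariance matrix (3.15) and ω restricted as in (3.17). The function name is hypothetical and, since the thesis does not report its code, the dimension used in the constant term (here T − 1, the length of Δυ_i*) is an assumption that does not affect the maximization over α.

```python
import numpy as np

def hpt_loglik(alpha, sigma2_v, dy):
    """Transformed log-likelihood (3.14) with covariance stationarity imposed
    (omega = 2/(1+alpha), eq. (3.17)).  dy is an N x (T-1) array whose i-th row
    is (dy_i2, ..., dy_iT)."""
    N, Tm1 = dy.shape
    omega = 2.0 / (1.0 + alpha)
    Om = np.zeros((Tm1, Tm1))
    np.fill_diagonal(Om, 2.0)
    Om[0, 0] = omega
    idx = np.arange(Tm1 - 1)
    Om[idx, idx + 1] = -1.0                        # tridiagonal structure of (3.15)
    Om[idx + 1, idx] = -1.0
    Om *= sigma2_v
    # residual vector dv_i* = (dy_i2, dy_i3 - alpha*dy_i2, ..., dy_iT - alpha*dy_i,T-1)
    dv = dy.astype(float).copy()
    dv[:, 1:] = dy[:, 1:] - alpha * dy[:, :-1]
    _sign, logdet = np.linalg.slogdet(Om)
    quad = np.einsum('it,ts,is->', dv, np.linalg.inv(Om), dv)
    return -0.5 * N * Tm1 * np.log(2 * np.pi) - 0.5 * N * logdet - 0.5 * quad
```

In the simulations the estimates of α and σ_υ² would then be obtained by maximizing this function numerically, e.g. over a grid or with a standard optimizer.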

3.3 Dhaene and Zhu (2009)

In addition to the standard GMM estimators and the transformed maximum likelihood estimator, and after defining the median in Section 2.3, the median-based estimators of Dhaene and Zhu (2009) are used in this study as a representative of the outlier-robust estimators. The reason for using this estimator in the thesis is that the Dhaene and Zhu (2009) estimator is globally robust, which shows the stability of the estimator (a term used by Martin and Yohai, 1986) in the presence of a large fraction of outliers.

Again, the intention is not to give a detailed overview, but only to underline the stationarity assumptions that are relevant for consistency and robustness. The replication of their results in the simulation part is extended with the case in which the assumptions of mean and covariance stationarity are violated, and with different sizes of the cross-sectional and time dimensions, in order to illustrate the sensitivity and to compare the results with the original work by the authors. Additionally, the contribution of the thesis is that this estimator is also considered in the case of contaminated data.

Referring to Hurwicz (1950), who suggested an estimator of the autocorrelation of a Gaussian zero-mean AR(1) process, Dhaene and Zhu (2009) propose outlier-robust estimators for linear dynamic fixed effects models. The particular focus of the thesis is on the fact that the initial observations are assumed to be drawn from the stationary distributions when α < 1, i.e. the start-up of the processes has to lie in the distant past. As Dhaene and Zhu (2009) noted, "the assumption has the testable implication that the time series of the cross-sectional locations and scales of Δy_{it} are zero and constant, respectively". However, they do not provide any evidence for this assertion.

Furthermore, as will be verified in the simulations, the median-based estimators also assume covariance stationarity,

y_{it} \sim \left(\frac{\eta_i}{1-\alpha},\ \frac{\sigma_i^2}{1-\alpha^2}\right). \qquad (3.20)

The fixed effects η_i are eliminated by taking first differences, and the joint distribution of Δy_{it} and Δy_{i,t−1} is:

\begin{pmatrix}\Delta y_{it} \\ \Delta y_{i,t-1}\end{pmatrix} \sim N(0, \Omega_i), \qquad (3.21)

\Omega_i = \frac{\sigma_i^2}{1+\alpha}\begin{pmatrix} 2 & \alpha-1 \\ \alpha-1 & 2 \end{pmatrix}. \qquad (3.22)

The important element in the programming part of these estimators is the correlation between Δy_{it} and Δy_{i,t−1}, i.e. r = (α − 1)/2, from which one can obtain a robust estimator of α as α = 1 + 2r.

Using Dhaene and Zhu (2009) as a reference, two approaches to using the median are applied: the first is the median of all ratios Δy_{it}/Δy_{i,t−1}, and the second is the average cross-sectional median of the ratios Δy_{it}/Δy_{i,t−1}.

In the first case, the median of all ratios, labelled r, can be estimated by the solution of the sample analogue of the following moment condition:

E[\mathrm{sign}(\Delta y_{it}/\Delta y_{i,t-1} - r)] = 0 \qquad (3.23)

Because of the argument of symmetry, we also have the following moment condition,

E[\mathrm{sign}(\Delta y_{i,t-1}/\Delta y_{it} - r)] = 0 \qquad (3.24)

i.e. based on the reciprocals of all ratios Δy_{it}/Δy_{i,t−1}.

Another approach is to use the average cross-sectional median of the ratios,

\Delta y_{it}/\Delta y_{i,t-1} = (\Delta y_{it} - r\,\Delta y_{i,t-1})/\Delta y_{i,t-1} + r. \qquad (3.25)

For this purpose, the use of the median-based estimators as outlier-robust estimators is justified by the fact that they are robust against outliers: for example, an additive outlier affects only two ratios, so the median ratio is almost unaffected.
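A sketch of the two median-based estimators described above, using α = 1 + 2r; it uses the standard median (2.22) rather than the randomized median (2.23), and the function name is illustrative.

```python
import numpy as np

def dz_estimators(y):
    """Two median-based estimators of alpha in the spirit of Dhaene and Zhu (2009).
    y is an N x (T+1) array containing (y_i0, ..., y_iT)."""
    dy = np.diff(np.asarray(y, dtype=float), axis=1)   # first differences dy_i1, ..., dy_iT
    ratios = dy[:, 1:] / dy[:, :-1]                    # dy_it / dy_i,t-1 for t = 2, ..., T
    # (a) median of all ratios
    alpha_all = 1.0 + 2.0 * np.median(ratios)
    # (b) average of the cross-sectional medians of the ratios
    alpha_cs = 1.0 + 2.0 * np.mean(np.median(ratios, axis=0))
    return alpha_all, alpha_cs
```

In the simulations these would be computed on the uncontaminated panel as well as on the contaminated panels produced by the schemes of Section 2.2.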

3.4 Lancaster (2002)

Making a clear distinction between the Bayesian approach to handling outliers proposed by Liangappaiah (1976) and Dixit (1994) and a Bayesian estimator, the main motivation for including the Lancaster (2002) estimator in the analysis of the outlier-robust estimators is related to the use of uniform priors, defined as non-data-dependent priors, i.e. priors that do not depend on the distribution of the data because they are independent of the true parameters.

The simulation results given in Chapter 4 illustrate for the first time the performance of this estimator in the context of contaminated data, as well as in the stationary and non-stationary cases.

Lancaster (2002) proposed a Bayesian approach which involves using an orthogonal reparametrization of the fixed effects and integrating the new effects out of the likelihood function using a uniform prior density. The classic paper on information orthogonality is Cox and Reid (1987), and Sweeting (1987) makes the connection between orthogonality and inference from the integrated likelihood. Under information orthogonality, taking a uniform prior for the effects reduces the bias on the parameter of interest.

If we consider the following dynamic linear model with fixed effects,

y_{it} = \alpha y_{i,t-1} + f_i + \upsilon_{it} \qquad (3.26)

where

E(\upsilon_{it} \mid y_{i1}, \dots, y_{i,t-1}) = 0, \qquad (3.27)

E(\upsilon_{it}^2 \mid y_{i1}, \dots, y_{i,t-1}) = \sigma^2, \qquad (3.28)

E(\upsilon_{it}\upsilon_{is} \mid y_{i1}, \dots, y_{i,t-1}) = 0, \quad s \neq t \qquad (3.29)

and by defining the function

b(\alpha) = \frac{1}{T}\sum_{t=1}^{T}\frac{T-t}{t}\,\alpha^t \qquad (3.30)

Lancaster (2002) suggests the following parametrization


Denoting by

\ell(f_i, \theta) = \ell_{i1}(f_i)\,\ell_{i2}(\theta) \qquad (3.32)

where \ell_{i1} and \ell_{i2} are likelihood functions. As noted in Lancaster (2002, p. 650), if we choose the parametrization such that

\frac{\partial^2 L_i(f_i, \theta)}{\partial f_i\,\partial\theta} = 0 \qquad (3.33)

is true on average, we have information orthogonality, i.e. if

E\left[\frac{\partial^2 L_i(f_i, \theta)}{\partial f_i\,\partial\theta}\right] = 0 \qquad (3.34)

and f_i and θ are variation independent, then we have an information orthogonal parametrization of the fixed effects. Equation (3.34) states that the information matrix is block diagonal.

Referring to Proposition 1 in Arellano and Bonhomme (2006, p. 10), the uniform prior is bias reducing, defining uniform priors as non-data-dependent priors, i.e. priors that do not depend on the distribution of the data because they are independent of the true parameters. This is the link between the ability of a prior to reduce bias and information orthogonality.

In this context, analysing the results from the simulations, the main conclusion is that Lancaster's reparametrization cannot be used when α approaches 1, because in this case the orthogonal individual effects are not identified. Therefore, the estimator shows a sizable bias for large values of α despite the use of uniform priors. In contrast to the impact of α, violations of mean and covariance stationarity do not contribute to a large bias of this estimator.

3.5 Further assessment

As a summary of the presented estimators, this section gives the main assumptions along with the moment conditions, as an introduction to what can be expected in the simulations. Referring to the model (2.1) or (3.1), the basic assumptions are the following:

(i) υ_{it} is iid across i and t, with E(υ_{it}) = 0, E(υ_{it}²) = σ_υ² and E(υ_{it}⁴) < ∞

(ii) η_i is iid across i and independent of υ_{i1}, ..., υ_{iT}, with E(η_i²) = σ_η² and E(η_i⁴) < ∞

(iii) υ_{it} is independent of y_{i0} for t = 1, ..., T.

These assumptions state that the error terms υ_{it} are independent and identically distributed across i and t with mean zero and finite fourth-order moments. In addition, they are independent of the individual specific effect η_i and the initial value y_{i0}. The assumptions also state that the individual specific effects are independent and identically distributed across i with mean zero and finite fourth-order moments.

The HPT (2002) estimator imposes covariance stationarity. Under covariance stationarity, \omega = \frac{2}{1+\alpha}.³

³ This result follows from equation (3.16), \omega = \frac{1}{\sigma_\upsilon^2}\mathrm{Var}(\Delta y_{i2}), and equation (3.19), \mathrm{Var}(\Delta y_{i1}) = 2\sigma_\upsilon^2/(1+\alpha).


When ω is restricted in this way, the TML estimator estimates α = 1. When covariance stationarity is not imposed, but mean stationarity is kept,

y_{i1} = \frac{\eta_i}{1-\alpha} + \epsilon_{i1} \qquad (3.35)

with the variance of ε_{i1} equal to σ², it follows that

\mathrm{Var}(\Delta y_{i2}) = \sigma_\upsilon^2 + (1-\alpha)^2\sigma^2 \qquad (3.36)

or

\omega = 1 + (1-\alpha)^2\sigma^2/\sigma_\upsilon^2 \geq 1 \qquad (3.37)

However, Hsiao et al. (2002) do not impose this restriction.

Finally, the median-based estimator of Dhaene and Zhu (2009) assumes mean and covariance stationarity,

y_{it} \sim \left(\frac{\eta_i}{1-\alpha},\ \frac{\sigma_i^2}{1-\alpha^2}\right) \qquad (3.38)

Under stationarity,

E(\Delta y_{it} \mid \Delta y_{i,t-1}) = r\,\Delta y_{i,t-1} \qquad (3.39)

with

r = \frac{\mathrm{cov}(\Delta y_{it}, \Delta y_{i,t-1})}{\mathrm{var}(\Delta y_{i,t-1})} \qquad (3.40)

Equation (3.39) implies that Δy_{it} − rΔy_{i,t−1} and Δy_{i,t−1} are uncorrelated, and from the basic assumption (iii) they are independent and symmetrically distributed around zero, from which it follows that

E[\mathrm{sign}(\Delta y_{it} - r\,\Delta y_{i,t-1})\,\mathrm{sign}(\Delta y_{i,t-1})] = 0 \qquad (3.41)

This moment condition can be further rewritten as equations (3.23) and (3.24).

Consequently, the following table summarizes the stationarity conditions used in the thesis.⁴

                          AB    BB    HPT   DZ    L
mean-stationarity         no    yes   no    yes   yes
covariance-stationarity   no    no    yes   yes   yes

⁴ It should be noted that although HPT have a more general version without imposed covariance stationarity, only the version with imposed covariance stationarity is considered in this study.


Chapter 4

Simulations

In order to compare the efficiency and the robustness in both respects explained in the introduction of the thesis, i.e. robustness to outliers and robustness to stationarity assumptions, Monte Carlo simulations are carried out for the presented estimators: the Arellano and Bond (1991) and Blundell and Bond (1998) estimators, the transformed maximum likelihood estimator based on differenced data (HPT 2002) and the median-based estimators (Dhaene and Zhu 2009). In the appendix, the results are given as a summary of the simulations, with N = 100 and N = 300, and T = 5 and T = 4, because the objective is to obtain results comparable with previous studies, for example Blundell and Bond (1998), where these sample sizes are also used, knowing that the GMM estimators are originally designed for use when N is relatively large and T is relatively small, and aware that the computational demands increase considerably with higher T.

The Monte Carlo design differs from the one presented in Dhaene and Zhu (2009) in several ways. One difference is that the number of cross-sectional units is not N = 1000 as in Dhaene and Zhu (2009), because the design used in the thesis includes additional parameters which significantly increase the simulation time. However, this does not change the main conclusions given in Dhaene and Zhu (2009), and it allows the cases of mean and covariance non-stationarity to be considered, with and without data contamination, for different values of the parameters, δ ∈ {0.8, 1, 1.2, 1.5} and λ ∈ {1, 2, 5}. The case δ = 0 is not considered since it is natural to assume that the initial conditions contain individual effects.

The value δ = 0.8 is chosen intentionally as the lowest in the range, because for lower values the deviation from mean-stationarity is larger, which causes system GMM to fail, as well as the Dhaene and Zhu (2009) estimator, which assumes mean-stationarity; the only possibly consistent estimators are then Arellano and Bond (1991) and HPT (2002), but as outliers are added these also prove to be vulnerable to data contamination.

Referring to (2.7), the initial condition is defined in the following way

y_{i0} = \delta\mu_i + \epsilon_{i0} \qquad (4.1)

where μ_i = η_i/(1 − α), υ_{it} ∼ iid N(0, 1), η_i ∼ iid N(0, 1), and ε_{i0} ∼ N(0, λ/(1 − α²)).
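A minimal sketch of the data generating process used in the simulations, following (2.1) and (4.1) with σ_η² = σ_υ² = 1; the function name and the convention that the returned array contains y_{i0}, ..., y_{iT} are assumptions made for illustration.

```python
import numpy as np

def simulate_panel(alpha, delta, lam, N=300, T=5, rng=None):
    """Simulate the AR(1) panel (2.1) with the initial condition (4.1):
    y_i0 = delta*mu_i + e_i0, mu_i = eta_i/(1-alpha),
    eta_i ~ N(0,1), v_it ~ N(0,1), e_i0 ~ N(0, lam/(1-alpha**2)).
    Returns an N x (T+1) array containing (y_i0, ..., y_iT)."""
    rng = np.random.default_rng() if rng is None else rng
    eta = rng.standard_normal(N)
    mu = eta / (1.0 - alpha)
    e0 = rng.standard_normal(N) * np.sqrt(lam / (1.0 - alpha ** 2))
    y = np.empty((N, T + 1))
    y[:, 0] = delta * mu + e0
    for t in range(1, T + 1):
        y[:, t] = alpha * y[:, t - 1] + eta + rng.standard_normal(N)
    return y
```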


In addition to the means and the standard deviations, the root mean square error (abbreviated rmse) is included in the results, which is calculated as:

\mathrm{rmse} = \sqrt{\mathrm{bias}^2 + \mathrm{std}^2} \qquad (4.2)

Tables A.1 to A.4 show the results for the mean-stationary case (δ = 1) and the mean-nonstationary case (δ = 0.8), for α = 0.5 and α = 0.9. The results suggest that Arellano and Bond (1991) is indeed severely biased when α is large. It becomes very imprecise even when the data are close to mean-stationary, which is reflected in the very high standard deviations as α approaches 1.

With five percent contamination by independent additive outliers, the bias of the median-based estimator is always towards zero and is largest (in absolute value) when α is close to 1. As noted by Dhaene and Zhu (2009), the bias is always less than 0.15, which is approximately the fraction of contaminated ratios. Unlike the median-based estimator, when a small fraction of large independent outliers is added to the data, the Blundell and Bond (1998) and HPT (2002) estimators are heavily biased towards 0. When a small fraction of patched outliers is added, the Blundell and Bond (1998) and HPT (2002) estimators are nearly unbiased when α is close to 1. This is counter-intuitive given their sensitivity to outliers in general, but it shows that the GMM and ML estimators can be robust to particular contamination designs, in this case patched outliers, while remaining sensitive to independent outliers; a similar pattern is found in Dhaene and Zhu (2009).

When α = 0.5, the biases of all presented estimators are larger and the median-based estimator has the least bias.

The main difference between these two types of outliers is that additive outliers affect only a single observation, while innovative (independent) outliers affect all later observations. More precisely, in the model with innovative (independent) outliers, occasional innovations have a larger variance than the majority and can therefore appear as outliers. In the model with additive outliers, the isolated outlier has an additive, transient character that is unrelated to the model. Therefore, innovative (independent) outliers transmit their effect to later observations while additive outliers do not.

Given this distinction between the two types of outliers, the following can be stated as a possible explanation for the differences in robustness across designs. In a correctly specified model, both GMM and MLE provide consistent estimators. Therefore, among the reasons that may explain the differences between the GMM and ML estimators are, for example, differences in the finite-sample properties of the estimators in a correctly specified model, and misspecification, which results in inconsistency of the GMM as well as the ML estimators. The MLE is sensitive to distributional assumptions. Under normality, all observations are expected to lie within a certain range implied by the standard deviations, so if an observation lies outside this range, the estimation shifts the fitted distribution in the direction of the outlier, which influences the estimates. As a possible extension, the Student t distribution can be considered, under which ML is more robust to outliers than under normality, because its thicker tails capture the outlier as part of the tail.

Analysing the standard deviations of the presented estimators, the Blundell and Bond (1998) and HPT (2002) estimators outperform the median-based estimator in terms of efficiency, which confirms the expected trade-off between efficiency and robustness for the outlier-robust estimators. This is the case even when there is no contamination.

When the number of time periods is decreased to T = 4, with uncontaminated data, the median-based estimator is outperformed by HPT (2002). Reducing the number of cross-sectional units to N = 100, the same conclusion holds as in the case N = 300, but now with relatively higher standard deviations for all estimators.

The contribution of the thesis is to extend the previous analysis to the case in which the mean-stationarity assumption is violated, i.e. δ ≠ 1. For δ = 0.8 the following conclusions can be stated.

With uncontaminated data, the results suggest that now not only Arellano and Bond (1991), but also the median-based estimator as well as HPT (2002) are severely biased when α is large. It can be noticed that for α = 0.5 the median-based estimator and HPT (2002) do not have severe bias. The reason is that, although the median-based estimators (Dhaene and Zhu, 2009) are more sensitive to the violation of covariance stationarity than of mean stationarity, and the transformed maximum likelihood estimator, HPT (2002), relies on covariance stationarity, both of them are sensitive to the value of α.

As noted in Section 3.4, Lancaster (2002) is also severely biased as α approaches 1: the reparametrization cannot be used in this case because the orthogonal individual effects are not identified, so the estimator shows a sizable bias for large values of α despite the use of uniform priors.

The strength of the deviation from mean-stationarity is considered by verifying whether the deviations from mean-stationarity remain uncorrelated with the unobserved heterogeneity. This is known as the constant-correlated effects, or in short cce, assumption (Bun and Sarafidis, 2013), which requires that any deviations from steady state behaviour be uncorrelated with η_i.

Since we are considering values of δ both larger and smaller than one, it is interesting to see whether the results are symmetric, using the formula in Bun and Sarafidis (2013) to calculate the implied deviation from mean stationarity. For example, if it is not the same for δ = 0.8 and δ = 1.2, we can a priori predict that the simulation results are asymmetric. Because some of these estimators are inconsistent, which would lead to an inconsistent estimate of r_y, we use the true parameter values and obtain one number valid for all estimators.

The correlation coefficient between the deviation of the initial condition of the y process from its long run mean and the level of its long run mean, which is given by equation (4.11) in Bun and Sarafidis (2013), for the AR(1) model can be simplified to:

r_y = \mathrm{corr}(y_{i0} - \mu_i,\ \mu_i) = \frac{(\delta-1)\frac{1}{1-\alpha}\,\sigma_\eta^2}{\sqrt{(\delta-1)^2\left(\frac{1}{1-\alpha}\right)^2\sigma_\eta^2 + \frac{1}{1-\alpha^2}\,\sigma_\eta^2}} \qquad (4.3)

By using formula (4.3), we obtain symmetric results for δ = 0.8 and δ = 1.2: for α = 0.5, r_y = −0.3273 and r_y = 0.3273 respectively, which suggests that there is no strong correlation. For α = 0.9, the results are r_y = −0.6571 and r_y = 0.6571, respectively, which suggests a strong correlation between the deviation from mean-stationarity and the individual specific effect.
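As a check (not part of the thesis), the implied correlations quoted above can be reproduced directly from (4.3), with σ_η² = 1 as in the Monte Carlo design:

```python
import numpy as np

def implied_ry(alpha, delta, sigma_eta2=1.0):
    # equation (4.3) with sigma_eta^2 = 1, as in the Monte Carlo design
    num = (delta - 1.0) * sigma_eta2 / (1.0 - alpha)
    den = np.sqrt((delta - 1.0) ** 2 * sigma_eta2 / (1.0 - alpha) ** 2
                  + sigma_eta2 / (1.0 - alpha ** 2))
    return num / den

for alpha in (0.5, 0.9):
    for delta in (0.8, 1.2):
        print(alpha, delta, round(implied_ry(alpha, delta), 4))
# -> +/-0.3273 for alpha = 0.5 and +/-0.6571 for alpha = 0.9
```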

The decrease in the number of time periods and cross-sectional units affects the bias of the estimators. For α = 0.5 or α = 0.9, the standard deviations are not affected severely compared to the case δ = 1.

If the assumption of mean-stationarity is violated and the data are contaminated, only for α = 0.5 does the median-based estimator have the least bias. This is not the case when α = 0.9: depending on the type of outliers, it is HPT (2002) that has the least bias with patched outliers. Because the Dhaene and Zhu (2009) estimator depends on first differences of the dependent variable, it is sensitive to independent outliers and to patches of outliers when the mean-stationarity assumption is violated. Therefore, the outlier-robust estimator, the median-based estimator, shows sensitivity to the violation of the mean stationarity assumption. It is important to make a distinction between outliers in levels and in first differences of the time series (Maddala and Kim, 2000), because outliers in first differences correspond to changes in trend whereas outliers in levels refer to changes in the mean.

The Blundell-Bond estimator is sensitive to the number of cross-sectional units. The RMSE of the TML estimator is less than half that of the Dhaene and Zhu (2009) estimator in the case of uncontaminated data.

If we consider the case of over-stationarity, Tables A.5-A.8, the Arellano and Bond (1991) estimator is the least biased estimator, not only in the case of non-contaminated data, but also in the case of contaminated data with patched outliers. This is not valid if the contamination is due to independent outliers.

When the covariance stationarity assumption is violated, Table A.9 shows that with uncontaminated data the direction of the bias is opposite when the standard GMM estimators are compared with Dhaene and Zhu (2009) and HPT (2002): the former have a negative bias, while the latter have a positive bias. In this particular case, for α = 0.5 the standard GMM estimators are almost unbiased, while for α = 0.9 the bias disappears as the value of λ increases (here λ = 5) and the RMSE substantially decreases, while for the median-based estimator and HPT (2002) it substantially increases. In the case of contaminated data, HPT (2002) outperforms the other estimators; again, this difference is evident only if the contamination is due to patched outliers. The surprisingly identical results for Lancaster (2002), for α = 0.5 and α = 0.9 and for λ = 2 and λ = 5, were checked through repeated simulations under the same seed in order to verify that there is no mistake. The explanation is that this type of outlier has the same impact regardless of the parameter values and the degree of violation of stationarity, i.e. for different values of λ the impact on the bias does not change.


Chapter 5

Conclusion

The Arellano and Bond (1991) estimator, which does not require mean stationarity, has the property that it only requires very general assumptions about the initial values and the individual specific effects in order for the corresponding moment conditions to be satisfied. However, it has a large finite sample bias and poor performance, especially when α approaches 1. For Blundell and Bond (1998), a sufficient condition for consistency is mean stationarity. In the case of contaminated data, and if the stationarity assumptions are maintained, for the median-based estimators there is a trade-off between robustness to outliers and efficiency. The median-based estimators (Dhaene and Zhu, 2009) are more sensitive to the violation of covariance stationarity than of mean stationarity. The transformed maximum likelihood estimator, HPT (2002), relies on covariance stationarity. If the covariance stationarity assumption is violated, the standard GMM estimators outperform the median-based and HPT (2002) estimators for large values of λ.

Therefore, the presented study confirmed that there are no methods which are robust to outliers and at the same time robust to violations of the stationarity assumptions. The previous chapter confirmed that there is a trade-off between efficiency and robustness of the outlier-robust estimators when the necessary stationarity assumptions are violated. Furthermore, referring to the argumentation in Arellano (2009), there are cases in which one would not expect the initial conditions to be distributed according to the steady state distribution of the process. Therefore, a more robust specification is one in which the distribution of y_{i0} is left unrestricted, which can motivate an extension of the previously defined initial conditions.

A possible extension is to consider mixing, in equal proportions, independent additive outliers and patches of outliers. However, the results should not be very different, because they would simply be a convex combination of the results presented under the two types of outliers.

Distinguishing by the type of outlier used, a possible extension is to consider block-concentrated outliers, in which most of the outliers are concentrated on a few individuals, but these individuals are observed for most of the time period. There are only a few studies dealing with these problems using panel data.

Furthermore, Aquaro and Cizek (2015) recently extended the median-based estimator by means of a multiple pairwise difference transformation that performs as well as the standard GMM estimators in finite samples in the absence of outliers.

Regarding the moment conditions on which these estimators rely, it is hard to expect that there will be an estimator which is robust to outliers and also robust to violations of the stationarity assumptions.

The need for modelling the latter behaviour is well recognized by those who have dealt with dynamic panel data models. The recent estimators do not capture the nature of estimation problems which are closely coupled with the presence of outlying observations.

An interesting path for future research is to consider the problem of robust estimation in a Bayesian context, because of the use of uniform priors, defined as non-data-dependent priors, i.e. priors that do not depend on the distribution of the data, since they are independent of the true parameters, and which reduce the bias on the parameter of interest under information orthogonality.


Appendix A

Results

                              Uncontaminated data        Independent outliers       Patched AO
Estimator        T, N         mean    std     rmse       mean    std     rmse       mean    std     rmse
Arellano-Bond    T=5, N=300   0.476   0.101   0.104      0.057   0.047   0.445      0.502   0.181   0.181
Arellano-Bond    T=4, N=300   0.468   0.162   0.165      0.055   0.062   0.450      0.643   0.372   0.398
Arellano-Bond    T=5, N=100   0.398   0.191   0.217      0.053   0.077   0.454      0.404   0.453   0.463
Blundell-Bond    T=5, N=300   0.511   0.062   0.063      0.076   0.038   0.426      0.707   0.113   0.235
Blundell-Bond    T=4, N=300   0.490   0.073   0.074      0.066   0.045   0.436      0.771   0.138   0.304
Blundell-Bond    T=5, N=100   0.523   0.098   0.101      0.070   0.066   0.435      0.622   0.200   0.235
Dhaene-Zhu       T=5, N=300   0.497   0.110   0.110      0.479   0.0950  0.097      0.589   0.108   0.140
Dhaene-Zhu       T=4, N=300   0.480   0.117   0.119      0.451   0.119   0.128      0.565   0.065   0.133
Dhaene-Zhu       T=5, N=100   0.524   0.165   0.167      0.482   0.178   0.179      0.566   0.169   0.182
HPT              T=5, N=300   0.501   0.049   0.049      0.084   0.0667  0.422      0.788   0.061   0.292
HPT              T=4, N=300   0.488   0.059   0.061      0.079   0.099   0.432      0.855   0.080   0.364
HPT              T=5, N=100   0.511   0.075   0.075      0.104   0.123   0.415      0.776   0.129   0.305
Lancaster        T=5, N=300   0.393   0.031   0.111     -0.159   0.037   0.660      0.359   0.116   0.179
Lancaster        T=4, N=300   0.353   0.041   0.152     -0.210   0.045   0.712      0.225   0.086   0.287
Lancaster        T=5, N=100   0.412   0.075   0.113     -0.162   0.056   0.665      0.250   0.131   0.279

Table A.1: Results of the Monte Carlo simulation for α = 0.5 and δ = 1.

                              Uncontaminated data        Independent outliers       Patched AO
Estimator        T, N         mean    std     rmse       mean    std     rmse       mean    std     rmse
Arellano-Bond    T=5, N=300   0.462   0.139   0.144      0.041   0.046   0.460      0.487   0.193   0.193
Arellano-Bond    T=4, N=300   0.446   0.189   0.196      0.050   0.068   0.455      0.722   0.478   0.527
Arellano-Bond    T=5, N=100   0.385   0.277   0.300      0.058   0.099   0.453      0.392   0.340   0.357
Blundell-Bond    T=5, N=300   0.648   0.066   0.162      0.083   0.046   0.420      0.7630  0.091   0.278
Blundell-Bond    T=4, N=300   0.670   0.062   0.181      0.082   0.058   0.422      0.820   0.109   0.338
Blundell-Bond    T=5, N=100   0.635   0.104   0.171      0.096   0.081   0.412      0.669   0.1720  0.241
Dhaene-Zhu       T=5, N=300   0.525   0.096   0.099      0.478   0.110   0.112      0.581   0.092   0.123
Dhaene-Zhu       T=4, N=300   0.507   0.115   0.115      0.481   0.127   0.129      0.584   0.106   0.135
Dhaene-Zhu       T=5, N=100   0.530   0.180   0.183      0.488   0.178   0.178      0.582   0.161   0.181
HPT              T=5, N=300   0.522   0.056   0.060      0.086   0.069   0.420      0.794   0.069   0.302
HPT              T=4, N=300   0.511   0.065   0.066      0.076   0.121   0.441      0.861   0.062   0.366
HPT              T=5, N=100   0.523   0.084   0.087      0.125   0.128   0.396      0.780   0.117   0.301
Lancaster        T=5, N=300   0.395   0.030   0.108     -0.153   0.043   0.655      0.359   0.116   0.179
Lancaster        T=4, N=300   0.362   0.041   0.143     -0.213   0.046   0.715      0.225   0.086   0.287
Lancaster        T=5, N=100   0.411   0.072   0.113     -0.161   0.064   0.663      0.250   0.131   0.279

Table A.2: Results of the Monte Carlo simulation for α = 0.5 and δ = 0.8.

                              Uncontaminated data        Independent outliers       Patched AO
Estimator        T, N         mean    std     rmse       mean    std     rmse       mean    std     rmse
Arellano-Bond    T=5, N=300   0.384   0.592   0.786      0.007   0.089   0.898      0.412   0.417   0.642
Arellano-Bond    T=4, N=300   0.519   0.683   0.782      0.030   0.136   0.881      0.603   0.641   0.706
Arellano-Bond    T=5, N=100   0.169   0.510   0.891     -0.038   0.208   0.961      0.256   0.597   0.878
Blundell-Bond    T=5, N=300   0.916   0.082   0.084      0.119   0.065   0.784      0.982   0.073   0.080
Blundell-Bond    T=4, N=300   0.909   0.107   0.108      0.097   0.068   0.806      0.918   0.137   0.139
Blundell-Bond    T=5, N=100   0.939   0.074   0.083      0.167   0.159   0.750      0.918   0.097   0.098
Dhaene-Zhu       T=5, N=300   0.904   0.098   0.098      0.807   0.089   0.128      0.908   0.089   0.089
Dhaene-Zhu       T=4, N=300   0.907   0.128   0.128      0.806   0.116   0.150      0.922   0.091   0.094
Dhaene-Zhu       T=5, N=100   0.867   0.164   0.167      0.804   0.142   0.171      0.906   0.165   0.166
HPT              T=5, N=300   0.908   0.046   0.046      0.126   0.075   0.777      0.869   0.070   0.076
HPT              T=4, N=300   0.907   0.069   0.069      0.104   0.117   0.804      0.960   0.075   0.096
HPT              T=5, N=100   0.886   0.097   0.098      0.146   0.144   0.767      0.877   0.107   0.110
Lancaster        T=5, N=300   0.465   0.027   0.436     -0.153   0.043   1.054      0.360   0.117   0.552
Lancaster        T=4, N=300   0.376   0.033   0.524     -0.213   0.046   1.114      0.225   0.085   0.680
Lancaster        T=5, N=100   0.474   0.052   0.429     -0.161   0.063   1.063      0.251   0.130   0.661

Table A.3: Results of the Monte Carlo simulation for α = 0.9 and δ = 1.


α = 0.9, δ = 0.8                   Uncontaminated data      |  Independent outliers    |  Patched AO
Estimator                          mean   std    rmse       |  mean    std    rmse     |  mean   std    rmse
Arellano-Bond   T = 5, N = 300     0.732  0.273  0.321      | -0.063   0.097  0.968    | -0.045  0.585  1.111
                T = 4, N = 300     0.768  0.387  0.409      | -0.072   0.156  0.985    |  0.024  0.993  1.324
                T = 5, N = 100     0.475  0.516  0.668      | -0.095   0.235  1.022    |  0.081  0.520  0.970
Blundell-Bond   T = 5, N = 300     0.987  0.014  0.088      |  0.209   0.118  0.706    |  0.983  0.021  0.085
                T = 4, N = 300     0.995  0.017  0.096      |  0.225   0.164  0.694    |  0.990  0.029  0.095
                T = 5, N = 100     0.987  0.029  0.092      |  0.250   0.169  0.671    |  0.964  0.095  0.115
Dhaene-Zhu      T = 5, N = 300     0.942  0.112  0.120      |  0.859   0.096  0.104    |  0.963  0.089  0.109
                T = 4, N = 300     0.963  0.126  0.141      |  0.842   0.121  0.134    |  0.981  0.117  0.142
                T = 5, N = 100     0.905  0.172  0.172      |  0.849   0.143  0.152    |  0.975  0.137  0.156
HPT             T = 5, N = 300     0.950  0.047  0.069      |  0.153   0.079  0.751    |  0.875  0.071  0.075
                T = 4, N = 300     0.974  0.062  0.097      |  0.138   0.117  0.770    |  0.981  0.055  0.098
                T = 5, N = 100     0.967  0.077  0.102      |  0.153   0.145  0.760    |  0.889  0.117  0.117
Lancaster       T = 5, N = 300     0.476  0.023  0.424      | -0.153   0.043  1.054    |  0.360  0.117  0.552
                T = 4, N = 300     0.387  0.033  0.514      | -0.213   0.046  1.114    |  0.225  0.085  0.680
                T = 5, N = 100     0.480  0.056  0.423      | -0.161   0.063  1.063    |  0.251  0.130  0.661

Table A.4: Results of the Monte Carlo simulation for α = 0.9 and δ = 0.8.


α = 0.5, δ = 1.2                   Uncontaminated data      |  Independent outliers    |  Patched AO
Estimator                          mean   std    rmse       |  mean    std    rmse     |  mean   std    rmse
Arellano-Bond   T = 5, N = 300     0.484  0.076  0.078      |  0.083   0.053  0.420    |  0.503  0.127  0.127
                T = 4, N = 300     0.495  0.095  0.095      |  0.075   0.074  0.432    |  0.676  0.274  0.326
                T = 5, N = 100     0.457  0.141  0.147      |  0.073   0.097  0.438    |  0.456  0.359  0.362
Blundell-Bond   T = 5, N = 300     0.508  0.062  0.063      |  0.072   0.040  0.430    |  0.656  0.117  0.195
                T = 4, N = 300     0.506  0.097  0.054      |  0.058   0.045  0.450    |  0.755  0.151  0.296
                T = 5, N = 100     0.503  0.110  0.111      |  0.067   0.067  0.438    |  0.610  0.162  0.196
Dhaene-Zhu      T = 5, N = 300     0.519  0.089  0.091      |  0.495   0.096  0.096    |  0.575  0.102  0.127
                T = 4, N = 300     0.510  0.123  0.123      |  0.474   0.130  0.132    |  0.600  0.118  0.155
                T = 5, N = 100     0.502  0.160  0.160      |  0.480   0.180  0.183    |  0.551  0.159  0.167
HPT             T = 5, N = 300     0.524  0.048  0.053      |  0.077   0.064  0.428    |  0.785  0.072  0.294
                T = 4, N = 300     0.517  0.065  0.067      |  0.069   0.099  0.445    |  0.856  0.085  0.366
                T = 5, N = 100     0.526  0.080  0.084      |  0.111   0.141  0.414    |  0.782  0.124  0.308
Lancaster       T = 5, N = 300     0.397  0.031  0.107      | -0.153   0.043  0.655    |  0.359  0.116  0.179
                T = 4, N = 300     0.351  0.039  0.153      | -0.213   0.046  0.715    |  0.225  0.086  0.287
                T = 5, N = 100     0.418  0.076  0.109      | -0.161   0.064  0.663    |  0.250  0.131  0.279

Table A.5: Results of the Monte Carlo simulation for α = 0.5 and δ = 1.2.


α = 0.5, δ = 1.5                   Uncontaminated data      |  Independent outliers    |  Patched AO
Estimator                          mean   std    rmse       |  mean    std    rmse     |  mean   std    rmse
Arellano-Bond   T = 5, N = 300     0.495  0.053  0.054      |  0.116   0.053  0.387    |  0.518  0.112  0.113
                T = 4, N = 300     0.499  0.081  0.081      |  0.123   0.093  0.388    |  0.596  0.227  0.247
                T = 5, N = 100     0.501  0.097  0.097      |  0.114   0.095  0.398    |  0.444  0.177  0.186
Blundell-Bond   T = 5, N = 300     0.683  0.076  0.198      |  0.068   0.043  0.434    |  0.641  0.133  0.194
                T = 4, N = 300     0.811  0.094  0.324      |  0.058   0.071  0.448    |  0.713  0.171  0.273
                T = 5, N = 100     0.694  0.127  0.232      |  0.063   0.073  0.443    |  0.629  0.175  0.216
Dhaene-Zhu      T = 5, N = 300     0.617  0.084  0.143      |  0.568   0.093  0.115    |  0.669  0.093  0.193
                T = 4, N = 300     0.652  0.111  0.189      |  0.612   0.122  0.165    |  0.71   0.126  0.246
                T = 5, N = 100     0.612  0.155  0.191      |  0.554   0.161  0.170    |  0.661  0.158  0.226
HPT             T = 5, N = 300     0.629  0.054  0.140      |  0.108   0.070  0.397    |  0.803  0.068  0.311
                T = 4, N = 300     0.673  0.068  0.186      |  0.101   0.129  0.419    |  0.890  0.068  0.400
                T = 5, N = 100     0.641  0.085  0.164      |  0.126   0.133  0.397    |  0.817  0.105  0.333
Lancaster       T = 5, N = 300     0.411  0.030  0.094      | -0.153   0.043  0.655    |  0.359  0.116  0.179
                T = 4, N = 300     0.363  0.035  0.141      | -0.213   0.046  0.715    |  0.225  0.086  0.287
                T = 5, N = 100     0.433  0.072  0.096      | -0.161   0.064  0.663    |  0.250  0.131  0.279

Table A.6: Results of the Monte Carlo simulation for α = 0.5 and δ = 1.5.


α = 0.9, δ = 1.2                   Uncontaminated data      |  Independent outliers    |  Patched AO
Estimator                          mean   std    rmse       |  mean    std    rmse     |  mean   std    rmse
Arellano-Bond   T = 5, N = 300     0.875  0.130  0.132      |  0.160   0.126  0.751    |  0.771  0.151  0.199
                T = 4, N = 300     0.897  0.199  0.199      |  0.194   0.173  0.727    |  0.866  0.347  0.349
                T = 5, N = 100     0.788  0.241  0.266      |  0.122   0.217  0.808    |  0.612  0.292  0.411
Blundell-Bond   T = 5, N = 300     1.013  0.019  0.115      |  0.099   0.089  0.805    |  0.958  0.067  0.089
                T = 4, N = 300     1.006  0.026  0.109      |  0.097   0.101  0.809    |  0.950  0.133  0.142
                T = 5, N = 100     0.999  0.051  0.112      |  0.227   0.257  0.720    |  0.937  0.112  0.118
Dhaene-Zhu      T = 5, N = 300     0.947  0.109  0.119      |  0.860   0.086  0.095    |  0.975  0.088  0.115
                T = 4, N = 300     0.971  0.121  0.882      |  0.110   0.112  0.132    |  0.969  0.108  0.128
                T = 5, N = 100     0.971  0.187  0.200      |  0.843   0.174  0.183    |  0.974  0.149  0.167
HPT             T = 5, N = 300     0.948  0.049  0.069      |  0.143   0.071  0.760    |  0.884  0.069  0.070
                T = 4, N = 300     0.977  0.061  0.098      |  0.126   0.132  0.785    |  0.981  0.067  0.105
                T = 5, N = 100     0.957  0.091  0.107      |  0.174   0.153  0.742    |  0.912  0.126  0.126
Lancaster       T = 5, N = 300     0.468  0.029  0.432      | -0.153   0.043  1.054    |  0.360  0.117  0.552
                T = 4, N = 300     0.380  0.031  0.521      | -0.213   0.046  1.114    |  0.225  0.085  0.680
                T = 5, N = 100     0.481  0.049  0.422      | -0.161   0.063  1.063    |  0.251  0.130  0.661

Table A.7: Results of the Monte Carlo simulation for α = 0.9 and δ = 1.2.


α = 0.9, δ = 1.5                   Uncontaminated data      |  Independent outliers    |  Patched AO
Estimator                          mean   std    rmse       |  mean    std    rmse     |  mean   std    rmse
Arellano-Bond   T = 5, N = 300     0.888  0.048  0.050      |  0.559   0.145  0.371    |  0.837  0.084  0.105
                T = 4, N = 300     0.907  0.086  0.087      |  0.496   0.194  0.448    |  0.909  0.196  0.196
                T = 5, N = 100     0.876  0.103  0.106      |  0.463   0.257  0.507    |  0.827  0.185  0.199
Blundell-Bond   T = 5, N = 300     0.975  0.005  0.076      |  0.580   0.319  0.451    |  0.971  0.012  0.072
                T = 4, N = 300     0.973  0.006  0.073      |  0.654   0.371  0.446    |  0.972  0.055  0.090
                T = 5, N = 100     0.974  0.008  0.074      |  0.577   0.354  0.479    |  0.958  0.071  0.091
Dhaene-Zhu      T = 5, N = 300     1.182  0.108  0.302      |  1.031   0.082  0.155    |  1.164  0.089  0.278
                T = 4, N = 300     1.212  0.122  0.312      |  1.067   0.106  0.198    |  1.188  0.122  0.313
                T = 5, N = 100     1.221  0.174  0.365      |  1.023   0.171  0.211    |  1.153  0.132  0.285
HPT             T = 5, N = 300     1.178  0.044  0.281      |  0.227   0.091  0.678    |  0.957  0.068  0.089
                T = 4, N = 300     1.223  0.063  0.329      |  0.193   0.139  0.720    |  1.045  0.078  0.165
                T = 5, N = 100     1.193  0.068  0.300      |  0.231   0.158  0.688    |  0.982  0.132  0.156
Lancaster       T = 5, N = 300     0.500  0.030  0.401      | -0.153   0.043  1.054    |  0.360  0.117  0.552
                T = 4, N = 300     0.408  0.027  0.492      | -0.213   0.046  0.225    |  0.225  0.085  0.680
                T = 5, N = 100     0.513  0.047  0.390      | -0.161   0.063  1.063    |  0.251  0.130  0.661

Table A.8: Results of the Monte Carlo simulation for α = 0.9 and δ = 1.5.


T = 5, N = 300, δ = 1              Uncontaminated data      |  Independent outliers    |  Patched AO
Estimator                          mean   std    rmse       |  mean    std    rmse     |  mean   std    rmse
Arellano-Bond   α = 0.5, λ = 2     0.486  0.071  0.073      |  0.076   0.055  0.428    |  0.524  0.122  0.124
                α = 0.5, λ = 5     0.497  0.044  0.044      |  0.163   0.056  0.341    |  0.520  0.091  0.093
Blundell-Bond   α = 0.5, λ = 2     0.506  0.043  0.043      |  0.082   0.042  0.420    |  0.718  0.117  0.248
                α = 0.5, λ = 5     0.503  0.036  0.036      |  0.120   0.043  0.381    |  0.681  0.141  0.223
Dhaene-Zhu      α = 0.5, λ = 2     0.641  0.089  0.167      |  0.590   0.093  0.130    |  0.699  0.097  0.221
                α = 0.5, λ = 5     0.901  0.093  0.412      |  0.818   0.080  0.328    |  0.934  0.089  0.443
HPT             α = 0.5, λ = 2     0.663  0.048  0.170      |  0.114   0.077  0.394    |  0.825  0.081  0.335
                α = 0.5, λ = 5     1.015  0.054  0.518      |  0.207   0.079  0.304    |  0.907  0.073  0.414
Lancaster       α = 0.5, λ = 2     0.261  0.029  0.240      | -0.153   0.043  0.655    |  0.359  0.116  0.179
                α = 0.5, λ = 5     0.196  0.106  0.205      | -0.153   0.043  0.655    |  0.359  0.116  0.179
Arellano-Bond   α = 0.9, λ = 2     0.646  0.387  0.463      |  0.045   0.094  0.860    |  0.590  0.375  0.482
                α = 0.9, λ = 5     0.886  0.133  0.134      |  0.136   0.144  0.778    |  0.776  0.182  0.220
Blundell-Bond   α = 0.9, λ = 2     0.961  0.080  0.101      |  0.117   0.067  0.786    |  0.934  0.087  0.094
                α = 0.9, λ = 5     0.912  0.046  0.047      |  0.111   0.067  0.792    |  0.942  0.115  0.122
Dhaene-Zhu      α = 0.9, λ = 2     0.987  0.088  0.124      |  0.876   0.082  0.085    |  0.982  0.085  0.118
                α = 0.9, λ = 5     1.146  0.108  0.269      |  1.009   0.077  0.133    |  1.139  0.089  0.255
HPT             α = 0.9, λ = 2     0.977  0.048  0.091      |  0.155   0.084  0.750    |  0.889  0.070  0.070
                α = 0.9, λ = 5     1.147  0.049  0.251      |  0.211   0.100  0.696    |  0.936  0.078  0.086
Lancaster       α = 0.9, λ = 2     0.432  0.027  0.468      | -0.153   0.043  1.054    |  0.301  0.063  0.602
                α = 0.9, λ = 5     0.414  0.027  0.487      | -0.153   0.043  1.054    |  0.301  0.063  0.602

Table A.9: Results of the Monte Carlo simulation for α = 0.5 or α = 0.9 and λ = 2 or λ = 5.


Bibliography

[1] Adrover, J., and R. H. Zamar (2004). Bias Robustness of Three Median-Based Regression Estimates, Journal of Statistical Planning and Inference, 122(1-2), 203-227.

[2] Ahn, S.C. and P. Schmidt (1995). Efficient estimation of models for dynamic panel data, Journal of Econometrics, 68, 5-27.

[3] Alvarez, J. and M. Arellano (2003). The time series and cross-sectional asymptotics of dynamic panel data estimators. Econometrica 71, 1121-1159.

[4] Anderson, T.W. and C. Hsiao (1981). Estimation of dynamic models with error components, Journal of the American Statistical Association 76, 598-606.

[5] Anderson, T.W. and C. Hsiao (1982). Formulation and estimation of dynamic models using panel data, Journal of Econometrics 18, 47-82.

[6] Aquaro, M. and P. Cizek (2015). Robust Estimation and Moment Selection in Dynamic Fixed-effects Panel Data Models. CentER Discussion Paper 2015-002.

[7] Aquaro, M. and P. Cizek (2010). One-step robust estimation of fixed-effects panel data models. Tilburg University, Center for Economic Research, Technical Report 2010-110.

[8] Arellano, M. (2003a). Panel Data Econometrics, Oxford University Press.

[9] Arellano, M. (2009). Dynamic Panel Data Models I: Covariance Structures and Autoregressions, Unpublished manuscript.

[10] Arellano, M. (2003b). Modeling Optimal Instrumental Variables for Dynamic Panel Data Models, Working Paper 0310, Centro de Estudios Monetarios y Financieros, Madrid.

[11] Arellano, M. and S. Bond (1991). Some tests of specification for panel data: Monte Carlo evidence and an application to employment equations. Review of Economic Studies 58, 277-298.

[12] Arellano, M. and O. Bover (1995). Another look at the instrumental variable estimation of error-components models. Journal of Econometrics 68, 29-51.

[13] Arellano, M. and S. Bonhomme (2015). Nonlinear Panel Data Estimation via Quantile Regressions. Unpublished manuscript.


[14] Arellano, M. and B. Honoré (2001). Panel data models: some recent developments. In J. Heckman and E. Leamer (eds.): Handbook of Econometrics, Volume 5.

[15] Baltagi, B.H. (2008). Econometric analysis of panel data. Wiley.

[16] Bekker, P.A. (1994). Alternative approximations to the distributions of Instrumental Variable estimators. Econometrica 62, 657-681.

[17] Besley, T. and A. Case (2000). Unnatural experiments? Estimating the incidence of endogenous policies. The Economic Journal 110, F672-F694.

[18] Blundell, R. and S. Bond (1998). Initial conditions and moment restrictions in dynamic panel data models. Journal of Econometrics 87, 115-143.

[19] Blundell, R. and S. Bond (2000). GMM Estimation with persistent panel data: an application to production functions. Econometric Reviews 19, 321-340.

[20] Blundell, R., S. Bond and F. Windmeijer (2001). Estimation in dynamic panel data models: Improving on the performance of the standard GMM estimator. In: B.H. Baltagi, T.B. Fomby, R. Carter Hill (eds.), Nonstationary Panels, Panel Cointegration, and Dynamic Panels. Advances in Econometrics, Volume 15, Emerald Group Publishing Limited, 53-91.

[21] Bramati, M. C. and C. Croux (2007). Robust estimators for the fixed effects panel data model. Econometrics Journal 10 (3), 521-540.

[22] Bowsher, C. G. (2002). On testing overidentifying restrictions in dynamic panel data models. Economics Letters 77, 211-220.

[23] Bun, M.J.G. and M.A. Carree (2005). Bias-corrected estimation in dynamic panel data models. Journal of Business & Economic Statistics 23, 200-210.

[24] Bun, M.J.G. and J.F. Kiviet (2006). The effects of dynamic feedbacks on LS and MM estimator accuracy in panel data models. Journal of Econometrics 132, 409-444.

[25] Bun, M.J.G. and F. Windmeijer (2010). The weak instrument problem of the system GMM estimator in dynamic panel data models. Econometrics Journal 13, 95-126.

[26] Bun, M.J.G. and F. Kleibergen (2013). Identification and inference in moments based analysis of linear dynamic panel data models. UvA-Econometrics discussion paper 2013/07, University of Amsterdam.

[27] Bun, M.J.G. and V. Sarafidis (2013). Dynamic panel data models. UvA-Econometrics discussion paper 2013/01, University of Amsterdam.

[28] Cox, D. R. and N. Reid (1987). Parameter Orthogonality and Approximate Conditional Inference (with discussion). Journal of the Royal Statistical Society, Series B, 49, 1-39.
