Fixed T robust LM tests for Time Persistence using a Predictive Model for the Initial Conditions
Technical Supplement to Master’s Thesis
Ties van den Ende (s1885677) January 15, 2016
1 Introduction
In this paper we will develop LM tests that are, if applied simultaneously, capable of pointing out the type of time persistence at hand, if any. Existing tests are based on initial assumptions, restricting and influencing the outcomes. Therefore, we aim to create tests that are, similar to Anselin, Bera, Florax, and Yoon [1996], robust to the presence of either type of time persistence, circumventing the need for initial assumptions. More formally, we consider the following combined hypothesis test:
H0: no state dependence and no serial correlation;
H1A: state dependence, regardless of serial correlation;
H1B: serial correlation, regardless of state dependence. (1.1)

To construct two LM tests associated with these combined hypotheses, we need the score vector and information matrix of a general dynamic panel data model with serial correlation, evaluated at H0. Therefore, we will adhere to the maximum likelihood approach of Hsiao, Pesaran, and Tahmiscioglu [2002]. After these have been derived, applying a modification to the 'standard' LM statistic following Bera and Yoon [1993] will yield the relevant test statistics.
2 The model
We consider the following general dynamic panel data model with fixed individual effects and serial correlation, hence allowing for both types of time persistence, for i= 1, . . . , N and t = 2, . . . , T:
y_{it} = y_{i,t-1}\gamma + x_{it}'\beta + u_{it},
u_{it} = \mu_i + \nu_{it},
\nu_{it} = \nu_{i,t-1}\rho + \epsilon_{it},   (2.1)

where y = (y_{12}, \ldots, y_{NT})' is a vector of dependent variables of individuals over time, y_{-1} is the observable vector of dependent variables of individuals over time, lagged once, X = (x_{12}, \ldots, x_{NT})' is a matrix of k exogenous explanatory variables of individuals over time, \mu_i is the fixed individual-specific effect, \nu = (\nu_{12}, \ldots, \nu_{NT})' are disturbances, \nu_{-1} are the disturbances, lagged once, \epsilon_{it} \sim IIN(0, \sigma^2), and \gamma, \beta, \rho and \sigma^2 are parameters to be estimated. Throughout we will assume stationarity, i.e. |\gamma| < 1 and |\rho| < 1. We assume the initial observations y_{i1} are observable.
We define

V_\rho =
\begin{pmatrix}
1 & \rho & \rho^2 & \cdots & \rho^{T-2} \\
\rho & 1 & \rho & \cdots & \rho^{T-3} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\rho^{T-2} & \rho^{T-3} & \rho^{T-4} & \cdots & 1
\end{pmatrix}.
We find E[\nu\nu'] = \Omega = \sigma^2 (I_N \otimes V_\rho). Under normality of the disturbances, the loglikelihood function of model 2.1 is given by:

L(\theta) = \text{constant} - \frac{1}{2}\log|\Omega| - \frac{1}{2\sigma^2}\sum_{i=1}^{N} \left(y_i - y_{i,-1}\gamma - X_i\beta - \iota_{T-1}\mu_i\right)' V_\rho^{-1} \left(y_i - y_{i,-1}\gamma - X_i\beta - \iota_{T-1}\mu_i\right),
where \theta' = (\gamma, \beta', \rho, \sigma^2), \iota_{T-1} is a (T-1)-vector of ones, y_i has elements y_{it}, y_{i,-1} has elements y_{i,t-1} and X_i has rows x_{it}'. To cope with the individual effects, we consider the first-difference transform of model 2.1.
Let us define D = I_N \otimes D^*, where D^* is a (T-2) \times (T-1) matrix:

D^* =
\begin{pmatrix}
-1 & 1 & 0 & \cdots & 0 & 0 \\
0 & -1 & 1 & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & 0 & \cdots & -1 & 1
\end{pmatrix}.
Now consider the first-difference transform of model 2.1:

\Delta y = \Delta y_{-1}\gamma + \Delta X\beta + \Delta\nu,
\Delta\nu = \Delta\nu_{-1}\rho + \Delta\epsilon,   (2.2)

where \Delta y = Dy has elements y_{it} - y_{i,t-1} for i = 1, \ldots, N and t = 3, \ldots, T, and similar transformations hold for \Delta y_{-1}, \Delta X, \Delta\nu, \Delta\nu_{-1} and \Delta\epsilon. This conveniently cancels out the individual effect, solving the incidental parameter problem it induces. The transformation does, however, bring about another complication, namely the influence of the assumptions regarding the initial observation of the dependent variable. To make matters more explicit, we will elaborate on the (influence of the) initial observations.
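As a small numerical aside (not part of the thesis), the fact that the first-difference transform removes the individual effect can be checked directly: the matrix D^* defined above annihilates the vector of ones, so adding any \mu_i to y_i leaves the transformed data unchanged. The sketch below assumes numpy is available and uses an arbitrary T:

```python
import numpy as np

T = 6  # number of time periods (t = 1, ..., T); y_i below stacks t = 2, ..., T

# D* is the (T-2) x (T-1) first-difference matrix from the text
Dstar = np.zeros((T - 2, T - 1))
for j in range(T - 2):
    Dstar[j, j] = -1.0
    Dstar[j, j + 1] = 1.0

# an individual effect mu_i enters y_i as mu_i * iota, and D* annihilates iota
iota = np.ones(T - 1)
assert np.allclose(Dstar @ iota, 0.0)

# hence differencing y_i and y_i + mu_i * iota gives identical transformed data
rng = np.random.default_rng(0)
y_i = rng.normal(size=T - 1)
mu_i = 3.7
assert np.allclose(Dstar @ (y_i + mu_i * iota), Dstar @ y_i)
```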
3 Initial observations
This discussion regarding the initial observations is, to a large extent, derived from Hsiao et al. [2002]. It starts by noting that model 2.2 is well defined for t= 3, . . . , T, but not for t = 2, since we lack observations on∆yi1. Therefore, we need information on the process before the periods under consideration. Let us formalize this in the following assumption:
Assumption 1 (i) The process has started from the (1-m)th period, m = 0, 1, \ldots, and has then evolved according to model 2.1; (ii) The start of the process, y_{i,1-m}, is treated as exogenous.
For m > 0, starting from y_{i,1-m} and by repeated substitution of model 2.2, we can write \Delta y_{i2} as

\Delta y_{i2} = \gamma^m \Delta y_{i,2-m} + \sum_{j=0}^{m-1}\gamma^j \Delta x_{i,2-j}'\beta + \sum_{j=0}^{m-1}\gamma^j \Delta\nu_{i,2-j}.
Then we find

\eta_{i2} \equiv E[\Delta y_{i2} \mid \Delta y_{i,2-m}, \Delta x_{i,2-m}, \ldots, \Delta x_{iT}] = \gamma^m \Delta y_{i,2-m} + \sum_{j=0}^{m-1}\gamma^j \Delta x_{i,2-j}'\beta.
Since x_{i,2-j} is unknown for j > 0, \eta_{i2} is unknown. Treating \eta_{i2} as an extra parameter would once more result in the incidental parameter problem. Therefore, Hsiao et al. [2002] propose a predictive model for \eta_{i2} based on assumptions on (the data generating process of) the observable explanatory variables. Let \Delta X_i = (\Delta x_{i2}, \ldots, \Delta x_{iT})' and let q_{i2} = \eta_{i2} - E[\eta_{i2} \mid \Delta X_i]. First of all, we assume that m is sufficiently large (m \to \infty) for the process to have reached stationarity. Furthermore, we assume throughout that the explanatory variables are strictly exogenous. Then the marginal distribution of \Delta y_{i2} conditional on \Delta X_i is given by:
\Delta y_{i2} = \pi' \Delta X_i \iota_k + \xi_{i2}, \qquad \xi_{i2} = q_{i2} + \sum_{j=0}^{\infty}\gamma^j \Delta\nu_{i,2-j},
where π is a (T − 1)-vector of unknown parameters. Now that we have dealt with the initial observations, we will derive the score vector and information matrix corresponding to the transformed model.
4 Score vector and information matrix
First, let us define \Delta\nu_i^* = (\xi_{i2}, \Delta\nu_i')' for i = 1, \ldots, N and \Delta\nu^* = (\Delta\nu_1^{*\prime}, \ldots, \Delta\nu_N^{*\prime})'. Next, consider E[\Delta\nu\Delta\nu'] = D\Omega D' = \sigma^2\{I_N \otimes \Sigma\}, where \Sigma = D^* V_\rho D^{*\prime} is a (T-2) \times (T-2) matrix. Under H0, we find

\Sigma|_{H_0} = \hat\Sigma =
\begin{pmatrix}
2 & -1 & 0 & \cdots & 0 \\
-1 & 2 & -1 & \cdots & 0 \\
0 & -1 & 2 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & 2
\end{pmatrix}.
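The form of \hat\Sigma can be verified numerically. The following sketch (a check added here, not taken from the thesis) builds V_\rho and D^* as defined above and confirms that D^* V_\rho D^{*\prime} evaluated at \rho = 0 equals the displayed tridiagonal matrix:

```python
import numpy as np

T = 6
rho = 0.0  # evaluate under H0

# V_rho from Section 2: (T-1) x (T-1) with (s,t) element rho^{|s-t|}
idx = np.arange(T - 1)
V = rho ** np.abs(idx[:, None] - idx[None, :])

# first-difference matrix D* ((T-2) x (T-1))
Dstar = -np.eye(T - 2, T - 1) + np.eye(T - 2, T - 1, k=1)

Sigma = Dstar @ V @ Dstar.T

# under H0 this is tridiagonal with 2 on the diagonal and -1 beside it
Sigma_hat = 2 * np.eye(T - 2) - np.eye(T - 2, k=1) - np.eye(T - 2, k=-1)
assert np.allclose(Sigma, Sigma_hat)
```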
Furthermore, define E[\Delta\nu^*\Delta\nu^{*\prime}] = \Omega^* = \sigma^2 I_N \otimes \Sigma^*. Under H0, we find \Omega^* = \sigma^2 I_N \otimes \hat\Sigma^*, where

\hat\Sigma^* =
\begin{pmatrix}
2 & -1 & 0 & \cdots & 0 \\
-1 & 2 & -1 & \cdots & 0 \\
\vdots & \vdots & \ddots & \ddots & \vdots \\
0 & 0 & \cdots & -1 & 2
\end{pmatrix},

the (T-1) \times (T-1) matrix with first row (2, -1, 0, \ldots, 0), first column (2, -1, 0, \ldots, 0)' and lower-right (T-2) \times (T-2) block \hat\Sigma,
and therefore, under H0, \Omega^{*-1} = \frac{1}{\sigma^2}\left(I_N \otimes \hat\Sigma^{*-1}\right), \frac{\partial\Omega^*}{\partial\sigma^2} = I_N \otimes \hat\Sigma^* and \frac{\partial\Omega^*}{\partial\rho} = \sigma^2\left\{I_N \otimes \frac{\partial\hat\Sigma^*}{\partial\rho}\right\}, where

\frac{\partial\hat\Sigma^*}{\partial\rho} = \left.\frac{\partial\Sigma^*}{\partial\rho}\right|_{H_0} =
\begin{pmatrix}
-2 & 2 & -1 & \cdots & 0 \\
2 & -2 & 2 & \cdots & 0 \\
-1 & 2 & -2 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & -2
\end{pmatrix}.
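The banded pattern of this derivative can be cross-checked by finite differences, at least for the lower-right block of \Sigma^* that equals D^* V_\rho D^{*\prime} (the first row and column, which involve \xi_{i2}, are not covered by this check). This is a numerical aside under that assumption, not a derivation from the thesis:

```python
import numpy as np

T = 7
idx = np.arange(T - 1)
Dstar = -np.eye(T - 2, T - 1) + np.eye(T - 2, T - 1, k=1)

def Sigma(rho):
    # Sigma(rho) = D* V_rho D*' with V_rho as in Section 2
    V = rho ** np.abs(idx[:, None] - idx[None, :])
    return Dstar @ V @ Dstar.T

# central finite difference of Sigma(rho) at rho = 0
h = 1e-6
dSigma = (Sigma(h) - Sigma(-h)) / (2 * h)

# banded matrix from the text: -2 on the diagonal, 2 and -1 on the
# first and second off-diagonals
n = T - 2
band = (-2 * np.eye(n) + 2 * (np.eye(n, k=1) + np.eye(n, k=-1))
        - (np.eye(n, k=2) + np.eye(n, k=-2)))
assert np.allclose(dSigma, band, atol=1e-8)
```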
In this case, we are concerned with the following loglikelihood:

L(\theta) = \text{constant} - \frac{1}{2}\log|\Omega^*| - \frac{1}{2}\Delta\nu^{*\prime}\Omega^{*-1}\Delta\nu^*,
where \theta' = (\pi', \gamma, \beta', \rho, \sigma^2). Now define \Delta\tilde y_{i,-1} = (0, \Delta y_{i2}, \ldots, \Delta y_{i,T-1})' with \Delta\tilde y_{-1} = (\Delta\tilde y_{1,-1}', \ldots, \Delta\tilde y_{N,-1}')', \Delta\tilde X_i = (0_k, \Delta x_{i3}, \ldots, \Delta x_{iT})' with 0_k a k-vector of zeros, \Delta\tilde X = (\Delta\tilde X_1', \ldots, \Delta\tilde X_N')',

\Delta\tilde X_i^* = \begin{pmatrix} (\Delta X_i \iota_k)' \\ 0_{(T-2)\times(T-1)} \end{pmatrix}

and \Delta\tilde X^* = (\Delta\tilde X_1^{*\prime}, \ldots, \Delta\tilde X_N^{*\prime})'. We find
\frac{\partial L(\theta)}{\partial\pi} = \Delta\tilde X^{*\prime}\Omega^{*-1}\Delta\nu^*,

\frac{\partial L(\theta)}{\partial\gamma} = \Delta\tilde y_{-1}'\Omega^{*-1}\Delta\nu^*,

\frac{\partial L(\theta)}{\partial\beta} = \Delta\tilde X'\Omega^{*-1}\Delta\nu^*,

\frac{\partial L(\theta)}{\partial\rho} = -\frac{N}{2}\,\mathrm{tr}\!\left(\Sigma^{*-1}\frac{\partial\Sigma^*}{\partial\rho}\right) + \frac{1}{2\sigma^2}\,\Delta\nu^{*\prime}\left\{I_N \otimes \Sigma^{*-1}\frac{\partial\Sigma^*}{\partial\rho}\Sigma^{*-1}\right\}\Delta\nu^*,

\frac{\partial L(\theta)}{\partial\sigma^2} = -\frac{N(T-1)}{2\sigma^2} + \frac{1}{2\sigma^4}\,\Delta\nu^{*\prime}\left\{I_N \otimes \Sigma^{*-1}\right\}\Delta\nu^*.
Setting \frac{\partial L(\theta)}{\partial\pi} = 0 and solving for \pi yields an estimator of \pi, which under H0 is given by

\hat\pi = \left[\Delta\tilde X^{*\prime}\left(I_N \otimes \hat\Sigma^{*-1}\right)\Delta\tilde X^*\right]^{-1}\Delta\tilde X^{*\prime}\left(I_N \otimes \hat\Sigma^{*-1}\right)\Delta y,

setting \frac{\partial L(\theta)}{\partial\beta} = 0 and solving for \beta yields an estimator of \beta, which under H0 is given by

\hat\beta = \left[\Delta\tilde X'\left(I_N \otimes \hat\Sigma^{*-1}\right)\Delta\tilde X\right]^{-1}\Delta\tilde X'\left(I_N \otimes \hat\Sigma^{*-1}\right)\Delta y,

and setting \frac{\partial L(\theta)}{\partial\sigma^2} = 0 and solving for \sigma^2 yields an estimator of \sigma^2, which under H0 is given by

\hat\sigma^2 = \frac{\Delta\hat\nu'\left(I_N \otimes \hat\Sigma^{*-1}\right)\Delta\hat\nu}{N(T-1)},

where \Delta\hat\nu = \Delta y - \Delta\tilde X^*\hat\pi - \Delta\tilde X\hat\beta are the restricted estimation residuals. Then under H0 we find the following estimates of elements of the score vector:
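The variance estimator can be illustrated with a small Monte Carlo sketch (added here for illustration, not part of the thesis). In the simplest case without regressors, \Delta\hat\nu reduces to \Delta y, and under H0 the per-individual difference vector has covariance \sigma^2\hat\Sigma^*, so the quadratic form recovers \sigma^2:

```python
import numpy as np

rng = np.random.default_rng(42)
N, T, sigma2 = 4000, 5, 1.5

# simulate under H0 (gamma = 0, rho = 0), no regressors, with fixed effects
mu = rng.normal(size=(N, 1)) * 2.0
eps = rng.normal(scale=np.sqrt(sigma2), size=(N, T))
y = mu + eps                      # y_it = mu_i + eps_it, t = 1, ..., T

dy = np.diff(y, axis=1)           # (Delta y_i2, ..., Delta y_iT), length T-1

# Sigma*_hat: (T-1) x (T-1) tridiagonal with 2 / -1, as in the text
S = 2 * np.eye(T - 1) - np.eye(T - 1, k=1) - np.eye(T - 1, k=-1)

# sigma2_hat = dnu' (I_N x Sigma*_hat^{-1}) dnu / (N(T-1)); here dnu = dy
quad = np.einsum('it,ts,is->', dy, np.linalg.inv(S), dy)
sigma2_hat = quad / (N * (T - 1))
assert abs(sigma2_hat - sigma2) < 0.1
```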
\hat d(\hat\theta)_\gamma = \frac{1}{\hat\sigma^2}\,\Delta\tilde y_{-1}'\left(I_N \otimes \hat\Sigma^{*-1}\right)\Delta\hat\nu,

\hat d(\hat\theta)_\rho = \frac{N(T-1)}{T} + \frac{1}{2\hat\sigma^2}\,\Delta\hat\nu'\left(I_N \otimes \hat\Sigma^{*-1}\frac{\partial\hat\Sigma^*}{\partial\rho}\hat\Sigma^{*-1}\right)\Delta\hat\nu,

where we used that \mathrm{tr}\!\left(\hat\Sigma^{*-1}\frac{\partial\hat\Sigma^*}{\partial\rho}\right) = -\frac{2(T-1)}{T}.
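This trace identity is easy to confirm numerically for a range of panel lengths; the sketch below (an added check, assuming the tridiagonal and banded matrices defined earlier) does so:

```python
import numpy as np

def trace_term(T):
    n = T - 1
    # Sigma*_hat: tridiagonal with 2 on the diagonal and -1 beside it
    S = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
    # dSigma*_hat/drho: -2 diagonal, 2 and -1 on the first and second off-diagonals
    dS = (-2 * np.eye(n) + 2 * (np.eye(n, k=1) + np.eye(n, k=-1))
          - (np.eye(n, k=2) + np.eye(n, k=-2)))
    return np.trace(np.linalg.solve(S, dS))

checks = [np.isclose(trace_term(T), -2 * (T - 1) / T) for T in range(3, 12)]
assert all(checks)
```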
Next, we will derive the information matrix J(\theta) = -E\!\left[\frac{\partial^2 L(\theta)}{\partial\theta_j\partial\theta_l'}\right] for j, l = 1, \ldots, 5. Therefore, we will use that E[z'Ak] = \mathrm{tr}\{E[z'Ak]\} = \mathrm{tr}\{A\,E[kz']\} for N(T-1)-vectors z, k and a conformable matrix A. Now let us define \Delta\tilde X_{i,-1} = (0_k, \Delta x_{i2}, \ldots, \Delta x_{i,T-1})', \Delta\tilde X_{-1} = (\Delta\tilde X_{1,-1}', \ldots, \Delta\tilde X_{N,-1}')', \Delta\tilde\nu_i = (0, \Delta\nu_i')' and \Delta\tilde\nu = (\Delta\tilde\nu_1', \ldots, \Delta\tilde\nu_N')'. Furthermore, let E[\Delta\nu^*\Delta\tilde\nu'] = \sigma^2\{I_N \otimes Q\}. We observe
J(\theta)_{[\gamma,\gamma]} = E[\Delta\tilde y_{-1}'\Omega^{*-1}\Delta\tilde y_{-1}] = E[(\Delta\tilde X_{-1}\beta)'\Omega^{*-1}(\Delta\tilde X_{-1}\beta)] + E[\Delta\tilde\nu'\Omega^{*-1}\Delta\tilde\nu]
= (\Delta\tilde X_{-1}\beta)'\Omega^{*-1}(\Delta\tilde X_{-1}\beta) + \mathrm{tr}\{\Omega^{*-1}E[\Delta\tilde\nu\Delta\tilde\nu']\}.

Similarly,

J(\theta)_{[\gamma,\rho]} = N\,\mathrm{tr}\!\left\{\Sigma^{*-1}\frac{\partial\Sigma^*}{\partial\rho}\Sigma^{*-1}Q\right\}
\quad\text{and}\quad
J(\theta)_{[\gamma,\sigma^2]} = \frac{N}{\sigma^2}\,\mathrm{tr}\{\Sigma^{*-1}Q\}.
We obtain estimates of elements of the information matrix:

\hat J(\hat\theta)_{[\gamma,\pi]} = \hat J(\hat\theta)_{[\pi,\gamma]} = \frac{1}{\hat\sigma^2}(\Delta\tilde X_{-1}\hat\beta)'\left(I_N \otimes \hat\Sigma^{*-1}\right)\Delta\tilde X^*,

\hat J(\hat\theta)_{[\gamma,\gamma]} = \frac{1}{\hat\sigma^2}(\Delta\tilde X_{-1}\hat\beta)'\left(I_N \otimes \hat\Sigma^{*-1}\right)(\Delta\tilde X_{-1}\hat\beta) + \frac{N(T+1)(T-2)}{T},

\hat J(\hat\theta)_{[\gamma,\beta]} = \hat J(\hat\theta)_{[\beta,\gamma]} = \frac{1}{\hat\sigma^2}(\Delta\tilde X_{-1}\hat\beta)'\left(I_N \otimes \hat\Sigma^{*-1}\right)\Delta\tilde X,

\hat J(\hat\theta)_{[\gamma,\rho]} = \hat J(\hat\theta)_{[\rho,\gamma]} = \frac{N(T-1)(T-2)}{T},

\hat J(\hat\theta)_{[\gamma,\sigma^2]} = \hat J(\hat\theta)_{[\sigma^2,\gamma]} = 0,

and

\hat J(\hat\theta)_{[\pi,\pi]} = \frac{1}{\hat\sigma^2}\Delta\tilde X^{*\prime}\left(I_N \otimes \hat\Sigma^{*-1}\right)\Delta\tilde X^*,

\hat J(\hat\theta)_{[\pi,\rho]} = \hat J(\hat\theta)_{[\rho,\pi]} = 0,

\hat J(\hat\theta)_{[\beta,\pi]} = \hat J(\hat\theta)_{[\pi,\beta]} = \frac{1}{\hat\sigma^2}\Delta\tilde X'\left(I_N \otimes \hat\Sigma^{*-1}\right)\Delta\tilde X^*,

\hat J(\hat\theta)_{[\beta,\beta]} = \frac{1}{\hat\sigma^2}\Delta\tilde X'\left(I_N \otimes \hat\Sigma^{*-1}\right)\Delta\tilde X,

\hat J(\hat\theta)_{[\beta,\rho]} = \hat J(\hat\theta)_{[\rho,\beta]} = 0,

\hat J(\hat\theta)_{[\beta,\sigma^2]} = \hat J(\hat\theta)_{[\sigma^2,\beta]} = 0,

\hat J(\hat\theta)_{[\rho,\rho]} = \frac{N}{2}\,\mathrm{tr}\!\left\{\hat\Sigma^{*-1}\frac{\partial\hat\Sigma^*}{\partial\rho}\hat\Sigma^{*-1}\frac{\partial\hat\Sigma^*}{\partial\rho}\right\},

\hat J(\hat\theta)_{[\rho,\sigma^2]} = \hat J(\hat\theta)_{[\sigma^2,\rho]} = -\frac{N(T-1)}{\hat\sigma^2 T},

\hat J(\hat\theta)_{[\sigma^2,\sigma^2]} = \frac{N(T-1)}{2\hat\sigma^4},
where we used that under H0 the expectation of the score vector equals zero (see Appendix A), and therefore \mathrm{tr}\{\hat\Sigma^{*-1}Q\} = 0. Moreover, using Kruiniger [2000], we find

\frac{1}{2}\,\mathrm{tr}\!\left\{\hat\Sigma^{*-1}\frac{\partial\hat\Sigma^*}{\partial\rho}\hat\Sigma^{*-1}\frac{\partial\hat\Sigma^*}{\partial\rho}\right\} = \frac{T(T-1)(T-2) + 2}{T^2}.
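This half-trace can be evaluated numerically from the tridiagonal \hat\Sigma^* and the banded derivative given earlier; the closed form asserted below was obtained from that numerical evaluation (a check added here, worth comparing against Kruiniger's [2000] own expression):

```python
import numpy as np

def half_trace(T):
    n = T - 1
    S = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)        # Sigma*_hat
    dS = (-2 * np.eye(n) + 2 * (np.eye(n, k=1) + np.eye(n, k=-1))
          - (np.eye(n, k=2) + np.eye(n, k=-2)))                  # dSigma*_hat/drho
    B = np.linalg.solve(S, dS)
    return 0.5 * np.trace(B @ B)

for T in range(3, 12):
    assert np.isclose(half_trace(T), (T * (T - 1) * (T - 2) + 2) / T**2)
```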
Following Breusch and Pagan [1980] in constructing LM statistics, we define

\hat\psi_{\gamma\gamma} \equiv \hat J(\hat\theta)_{\gamma\gamma|\beta,\sigma^2} = \hat J(\hat\theta)_{[\gamma,\gamma]} - \hat J(\hat\theta)_{[\gamma,\beta]}'\hat J(\hat\theta)_{[\beta,\beta]}^{-1}\hat J(\hat\theta)_{[\beta,\gamma]} - \hat J(\hat\theta)_{[\gamma,\sigma^2]}\hat J(\hat\theta)_{[\sigma^2,\sigma^2]}^{-1}\hat J(\hat\theta)_{[\sigma^2,\gamma]},

\hat\psi_{\rho\rho} \equiv \hat J(\hat\theta)_{\rho\rho|\beta,\sigma^2} = \hat J(\hat\theta)_{[\rho,\rho]} - \hat J(\hat\theta)_{[\rho,\beta]}'\hat J(\hat\theta)_{[\beta,\beta]}^{-1}\hat J(\hat\theta)_{[\beta,\rho]} - \hat J(\hat\theta)_{[\rho,\sigma^2]}\hat J(\hat\theta)_{[\sigma^2,\sigma^2]}^{-1}\hat J(\hat\theta)_{[\sigma^2,\rho]},

\hat\psi_{\rho\gamma} \equiv \hat J(\hat\theta)_{\rho\gamma|\beta,\sigma^2} = \hat J(\hat\theta)_{[\rho,\gamma]} - \hat J(\hat\theta)_{[\rho,\beta]}'\hat J(\hat\theta)_{[\beta,\beta]}^{-1}\hat J(\hat\theta)_{[\beta,\gamma]} - \hat J(\hat\theta)_{[\rho,\sigma^2]}\hat J(\hat\theta)_{[\sigma^2,\sigma^2]}^{-1}\hat J(\hat\theta)_{[\sigma^2,\gamma]}.
In constructing the test statistics, we also use the inverse of the information matrix. Therefore, let us define \hat Z(\hat\theta) = \hat J(\hat\theta)^{-1} and let \hat Z(\hat\theta)_{[\gamma,\gamma]}, \hat Z(\hat\theta)_{[\gamma,\rho]} and \hat Z(\hat\theta)_{[\rho,\rho]} be the elements of the inverse of the information matrix corresponding to [\gamma,\gamma], [\gamma,\rho] and [\rho,\rho]. Following Anselin et al. [1996] and Bera and Yoon [1993], we propose modified LM tests for testing H0 : \rho = 0 against H1 : \rho \neq 0 in the presence of \gamma, and for testing H0 : \gamma = 0 against H1 : \gamma \neq 0 in the presence of \rho, independent of the number of time periods at hand:

LM_\rho^\dagger = \frac{\left\{\hat d(\hat\theta)_\rho - \hat\psi_{\rho\gamma}\hat\psi_{\gamma\gamma}^{-1}\,\hat d(\hat\theta)_\gamma\right\}^2}{\hat\psi_{\rho\rho} - \hat\psi_{\rho\gamma}^2\hat\psi_{\gamma\gamma}^{-1}} \;\to_d\; \chi_1^2,   (4.1)

LM_\gamma^\dagger = \frac{\left\{\hat d(\hat\theta)_\gamma - \hat\psi_{\rho\gamma}\hat\psi_{\rho\rho}^{-1}\,\hat d(\hat\theta)_\rho\right\}^2}{\hat\psi_{\gamma\gamma} - \hat\psi_{\rho\gamma}^2\hat\psi_{\rho\rho}^{-1}} \;\to_d\; \chi_1^2.   (4.2)
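Once the scores and the concentrated information terms are available as scalars, the modified statistics are simple arithmetic. The sketch below (an illustration with made-up numbers, not thesis data) implements the Bera-Yoon adjustment in this form:

```python
import numpy as np

def modified_lm(d_t, d_n, psi_tt, psi_nn, psi_tn):
    # modified LM statistic for a target parameter, adjusted for local
    # misspecification in a nuisance parameter (psi's as defined in the text)
    num = d_t - (psi_tn / psi_nn) * d_n
    den = psi_tt - psi_tn**2 / psi_nn
    return num**2 / den

# illustrative (made-up) score and concentrated-information values
d_gamma, d_rho = 2.0, 1.5
psi_gg, psi_rr, psi_rg = 30.0, 25.0, 5.0

lm_rho = modified_lm(d_rho, d_gamma, psi_rr, psi_gg, psi_rg)
lm_gamma = modified_lm(d_gamma, d_rho, psi_gg, psi_rr, psi_rg)

# with a zero cross term the statistics reduce to the standard LM form d^2/psi
assert np.isclose(modified_lm(d_rho, d_gamma, psi_rr, psi_gg, 0.0),
                  d_rho**2 / psi_rr)
assert lm_rho >= 0 and lm_gamma >= 0
```

Each statistic would then be compared with a chi-squared(1) critical value, e.g. about 3.84 at the 5% level.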
5 Conclusion
In this paper we have adhered to the approach of Hsiao et al. [2002], extending the first-differenced dynamic panel data model with a predictive model for the initial observations. We have constructed two fixed T LM tests that are robust to misspecification regarding the source(s) of time persistence. If applied simultaneously, these LM tests are capable of pointing out the type of time persistence at hand, if any.
References
L. Anselin, A. Bera, R. Florax, and M. Yoon. Simple diagnostic tests for spatial dependence. Regional Science and Urban Economics, 26:77–104, 1996.
A. Bera and M. Yoon. Specification testing with locally misspecified alternatives. Econometric Theory, 9:
649–658, 1993.
T. Breusch and A. Pagan. The Lagrange multiplier test and its applications to model specification in econometrics. Review of Economic Studies, 47:239–253, 1980.
C. Hsiao, M. Pesaran, and A. Tahmiscioglu. Maximum likelihood estimation of fixed effects dynamic panel data models covering short time periods. Journal of Econometrics, 109:107–150, 2002.
H. Kruiniger. Maximum Likelihood and GMM estimation of dynamic panel data models with fixed effects.
Working paper, Queen Mary University of London, 2000.
Z. Yang. Initial-Condition Free Estimation of Fixed Effects Dynamic Panel Data Models. Working paper, Singapore Management University, 2014.
Appendices
A Review of proof of consistency by Hsiao et al. [2002]
In their paper, Hsiao et al. [2002] derive a minimum distance estimator based on the likelihood of the first-difference transform of a dynamic panel data model, utilizing assumptions regarding the initial observations, and prove it is consistent. Here, we elaborate on their proof, based on the method proposed in footnote 13 of their paper. First, we define the model and associated assumptions. Consider
y_{it} = \alpha_i + \gamma y_{i,t-1} + u_{it}, \qquad i = 1, \ldots, N, \quad t = 2, \ldots, T,

where y_{it} is some dependent variable of interest, y_{i,t-1} is the dependent variable lagged once, \alpha_i is the individual-specific fixed effect, u_{it} are disturbances which are normally distributed with mean zero and variance \sigma_u^2, and \gamma and \sigma_u^2 are parameters to be estimated. In this simplified model without explanatory variables we assume that the initial observation y_{i1} is observable. The individual effect induces the incidental parameter problem, which in turn leads to inconsistent results when maximizing the associated likelihood. Therefore, consider the first-difference transform of the model:
\Delta y_{it} = \gamma\Delta y_{i,t-1} + \Delta u_{it},

where \Delta y_{it} = y_{it} - y_{i,t-1} and similar transformations hold for \Delta y_{i,t-1} and \Delta u_{it}. This model is well defined for t \geq 3 but not for t = 2, since \Delta y_{i1} requires knowledge of y_{i0}, which is unavailable. We assume that the process has been going on for a long time, reaching stationarity, and by repeated substitution we find
\Delta y_{i2} = \sum_{j=0}^{\infty}\gamma^j\Delta u_{i,2-j}.
Now define \Delta u_i = (\Delta y_{i2}, \Delta u_{i3}, \ldots, \Delta u_{iT})'. We find E[\Delta u_i\Delta u_i'] = \Omega = \sigma_u^2\Omega^*, where

\Omega^* =
\begin{pmatrix}
\frac{2}{1+\gamma} & -1 & 0 & \cdots & 0 \\
-1 & 2 & -1 & \cdots & 0 \\
0 & -1 & 2 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & 2
\end{pmatrix}.
We will evaluate the following expectation:

E\!\left[\sum_{i=1}^{N}\frac{\partial\Delta u_i}{\partial\gamma}'\Omega^{-1}\Delta u_i\right] = \mathrm{tr}\!\left\{\sum_{i=1}^{N}\Omega^{-1}E\!\left[\Delta u_i\frac{\partial\Delta u_i}{\partial\gamma}'\right]\right\}.
Therefore, we note that

E\!\left[\Delta u_i\frac{\partial\Delta u_i}{\partial\gamma}'\right] = E\!\left[\begin{pmatrix}\Delta y_{i2} \\ \Delta y_{i3} - \gamma\Delta y_{i2} \\ \vdots \\ \Delta y_{iT} - \gamma\Delta y_{i,T-1}\end{pmatrix}\begin{pmatrix}0 & \Delta y_{i2} & \cdots & \Delta y_{i,T-1}\end{pmatrix}\right]

= \sigma_u^2
\begin{pmatrix}
0 & \frac{2}{1+\gamma} & \frac{2\gamma}{1+\gamma} - 1 & \frac{2\gamma^2}{1+\gamma} - \gamma & \cdots & \frac{2\gamma^{T-3}}{1+\gamma} - \gamma^{T-4} \\
0 & -1 & 2 - \gamma & -\gamma^2 + 2\gamma - 1 & \cdots & -\gamma^{T-3} + 2\gamma^{T-4} - \gamma^{T-5} \\
0 & 0 & -1 & 2 & \cdots & 0 \\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & 0 & \cdots & -1
\end{pmatrix}.
Now if we set \gamma = 0, we find

E\!\left[\sum_{i=1}^{N}\frac{\partial\Delta u_i}{\partial\gamma}'\Omega^{-1}\Delta u_i\right] = N\,\mathrm{tr}\{B\} = 0,
where B is a (T-1) \times (T-1) matrix with the (j, j+1)-th element equal to one for j = 1, \ldots, T-2 and zeros everywhere else. Consequently, the claim made in footnote 13 of Hsiao et al. [2002] is justified. If we, however, treat \Delta y_{i2} as exogenous and start analyzing the model from t = 3 onwards, we essentially truncate \Omega and E[\Delta u_i\frac{\partial\Delta u_i}{\partial\gamma}'] by omitting the first row and column. As a result, we find

E\!\left[\sum_{i=1}^{N}\frac{\partial\Delta u_i}{\partial\gamma}'\Omega^{-1}\Delta u_i\right] = -N\,\frac{T-2}{T-1},

which corresponds to the bias observed in Yang [2014].
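Both results of this appendix can be confirmed numerically. At \gamma = 0 the expectation matrix is a zero first column followed by the first T-2 columns of \Omega, so the full-sample trace is tr(B) = 0, while dropping the first row and column produces the bias. The following sketch (a check added here, not from the thesis) verifies both claims:

```python
import numpy as np

def traces(T):
    n = T - 1
    # Omega / sigma_u^2 at gamma = 0: tridiagonal with 2 / -1
    Om = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
    # E[Delta u_i (0, Delta y_i2, ..., Delta y_i,T-1)] / sigma_u^2 at gamma = 0:
    # a zero first column followed by the first T-2 columns of Omega
    E = np.zeros((n, n))
    E[:, 1:] = Om[:, :-1]
    full = np.trace(np.linalg.solve(Om, E))
    trunc = np.trace(np.linalg.solve(Om[1:, 1:], E[1:, 1:]))
    return full, trunc

for T in range(3, 10):
    full, trunc = traces(T)
    assert np.isclose(full, 0.0)                  # the footnote-13 claim
    assert np.isclose(trunc, -(T - 2) / (T - 1))  # the bias when t = 2 is dropped
```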