Fixed T robust LM tests for Time Persistence using a Predictive Model for the Initial Conditions
Technical Supplement to Master’s Thesis
Ties van den Ende (s1885677) January 15, 2016
1 Introduction
In this paper we will develop LM tests that are, if applied simultaneously, capable of pointing out the type of time persistence at hand, if any. Existing tests are based on initial assumptions, restricting and influencing the outcomes. Therefore, we aim to create tests that are, similar to Anselin, Bera, Florax, and Yoon [1996], robust to the presence of either type of time persistence, circumventing the need for initial assumptions. More formally, we consider the following combined hypothesis test:
H0: no state dependence and no serial correlation;
H1A: state dependence, regardless of serial correlation;
H1B: serial correlation, regardless of state dependence. (1.1)

To construct two LM tests associated with these combined hypotheses, we need the score vector and information matrix of a general dynamic panel data model with serial correlation, evaluated at H0. Therefore, we will adhere to the maximum likelihood approach of Hsiao, Pesaran, and Tahmiscioglu [2002]. After these have been derived, applying a modification to the 'standard' LM statistic following Bera and Yoon [1993] will yield the relevant test statistics.
2 The model
We consider the following general dynamic panel data model with fixed individual effects and serial correlation, hence allowing for both types of time persistence, for i= 1, . . . , N and t = 2, . . . , T:
y_{it} = y_{i,t-1}\gamma + x_{it}'\beta + u_{it},
u_{it} = \mu_i + \nu_{it},
\nu_{it} = \nu_{i,t-1}\rho + \epsilon_{it},   (2.1)

where y = (y_{12}, \ldots, y_{NT})' is a vector of dependent variables of individuals over time, y_{-1} is the observable vector of dependent variables of individuals over time, lagged once, X = (x_{12}, \ldots, x_{NT})' is a matrix of k exogenous explanatory variables of individuals over time, \mu_i is the fixed individual-specific effect, \nu = (\nu_{12}, \ldots, \nu_{NT})' are disturbances, \nu_{-1} are the disturbances, lagged once, \epsilon_{it} \sim IIN(0, \sigma^2), and \gamma, \beta, \rho and \sigma^2 are parameters to be estimated. Throughout we will assume stationarity, i.e. |\gamma| < 1 and |\rho| < 1. We assume the initial observations y_{i1} are observable.
We define

V_\rho =
\begin{pmatrix}
1 & \rho & \rho^2 & \cdots & \rho^{T-2} \\
\rho & 1 & \rho & \cdots & \rho^{T-3} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\rho^{T-2} & \rho^{T-3} & \rho^{T-4} & \cdots & 1
\end{pmatrix}.
We find E[\nu\nu'] = \Omega = \sigma^2 (I_N \otimes V_\rho). Under normality of the disturbances, the loglikelihood function of model 2.1 is given by:

L(\theta) = \text{constant} - \frac{1}{2}\log|\Omega| - \frac{1}{2\sigma^2}\sum_{i=1}^{N} \left(y_i - y_{i,-1}\gamma - X_i\beta - \iota_{T-1}\mu_i\right)' V_\rho^{-1} \left(y_i - y_{i,-1}\gamma - X_i\beta - \iota_{T-1}\mu_i\right),
where \theta' = (\gamma, \beta', \rho, \sigma^2), \iota_{T-1} is a (T-1)-vector of ones, y_i has elements y_{it}, y_{i,-1} has elements y_{i,t-1} and X_i has rows x_{it}'. To cope with the individual effects, we consider the first-difference transform of model 2.1.
Let us define D = I_N \otimes D^*, where D^* is a (T-2) \times (T-1) matrix:

D^* =
\begin{pmatrix}
-1 & 1 & 0 & \cdots & 0 & 0 \\
0 & -1 & 1 & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & 0 & \cdots & -1 & 1
\end{pmatrix}.
Now consider the first-difference transform of model 2.1:

\Delta y = \Delta y_{-1}\gamma + \Delta X\beta + \Delta\nu,
\Delta\nu = \Delta\nu_{-1}\rho + \Delta\epsilon,   (2.2)

where \Delta y = Dy has elements y_{it} - y_{i,t-1} for i = 1, \ldots, N and t = 3, \ldots, T, and similar transformations hold for \Delta y_{-1}, \Delta X, \Delta\nu, \Delta\nu_{-1} and \Delta\epsilon. This conveniently cancels out the individual effect, solving the incidental parameter problem it induces. The transformation does, however, bring about another complication, namely the influence of the assumptions regarding the initial observation of the dependent variable. To make matters more explicit, we will elaborate on the (influence of the) initial observations.
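As a small numerical aside (not part of the thesis), the fact that the first-difference transform removes the individual effect can be checked directly: the matrix D^* defined above annihilates the vector of ones, so adding any \mu_i to y_i leaves the transformed data unchanged. The sketch below assumes numpy is available and uses an arbitrary T:

```python
import numpy as np

T = 6  # number of time periods (t = 1, ..., T); y_i below stacks t = 2, ..., T

# D* is the (T-2) x (T-1) first-difference matrix from the text
Dstar = np.zeros((T - 2, T - 1))
for j in range(T - 2):
    Dstar[j, j] = -1.0
    Dstar[j, j + 1] = 1.0

# an individual effect mu_i enters y_i as mu_i * iota, and D* annihilates iota
iota = np.ones(T - 1)
assert np.allclose(Dstar @ iota, 0.0)

# hence differencing y_i and y_i + mu_i * iota gives identical transformed data
rng = np.random.default_rng(0)
y_i = rng.normal(size=T - 1)
mu_i = 3.7
assert np.allclose(Dstar @ (y_i + mu_i * iota), Dstar @ y_i)
```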
3 Initial observations
This discussion regarding the initial observations is, to a large extent, derived from Hsiao et al. [2002]. It starts by noting that model 2.2 is well defined for t= 3, . . . , T, but not for t = 2, since we lack observations on∆yi1. Therefore, we need information on the process before the periods under consideration. Let us formalize this in the following assumption:
Assumption 1 (i) The process has started from the (1-m)th period, m = 0, 1, \ldots, and has then evolved according to model 2.1; (ii) The start of the process, y_{i,1-m}, is treated as exogenous.
For m > 0, starting from y_{i,1-m} and by repeated substitution of model 2.2, we can write \Delta y_{i2} as

\Delta y_{i2} = \gamma^m \Delta y_{i,2-m} + \sum_{j=0}^{m-1}\gamma^j \Delta x_{i,2-j}'\beta + \sum_{j=0}^{m-1}\gamma^j \Delta\nu_{i,2-j}.
Then we find

\eta_{i2} \equiv E[\Delta y_{i2} \mid \Delta y_{i,2-m}, \Delta x_{i,2-m}, \ldots, \Delta x_{iT}] = \gamma^m \Delta y_{i,2-m} + \sum_{j=0}^{m-1}\gamma^j \Delta x_{i,2-j}'\beta.
Since x_{i,2-j} is unknown for j > 0, \eta_{i2} is unknown. Treating \eta_{i2} as an extra parameter would once more result in the incidental parameter problem. Therefore, Hsiao et al. [2002] propose a predictive model for \eta_{i2} based on assumptions on (the data generating process of) the observable explanatory variables. Let \Delta X_i = (\Delta x_{i2}, \ldots, \Delta x_{iT})' and let q_{i2} = \eta_{i2} - E[\eta_{i2} \mid \Delta X_i]. First of all, we assume that m is sufficiently large (m \to \infty) for the process to have reached stationarity. Furthermore, we assume throughout that the explanatory variables are strictly exogenous. Then the marginal distribution of \Delta y_{i2} conditional on \Delta X_i is given by:
\Delta y_{i2} = \pi' \Delta X_i \iota_k + \xi_{i2}, \qquad \xi_{i2} = q_{i2} + \sum_{j=0}^{\infty}\gamma^j \Delta\nu_{i,2-j},
where π is a (T − 1)-vector of unknown parameters. Now that we have dealt with the initial observations, we will derive the score vector and information matrix corresponding to the transformed model.
4 Score vector and information matrix
First, let us define \Delta\nu_i^* = (\xi_{i2}, \Delta\nu_i')' for i = 1, \ldots, N and \Delta\nu^* = (\Delta\nu_1^{*\prime}, \ldots, \Delta\nu_N^{*\prime})'. Next, consider E[\Delta\nu\Delta\nu'] = D\Omega D' = \sigma^2\{I_N \otimes \Sigma\}, where \Sigma = D^* V_\rho D^{*\prime} is a (T-2) \times (T-2) matrix. Under H0, we find

\Sigma|_{H_0} = \hat\Sigma =
\begin{pmatrix}
2 & -1 & 0 & \cdots & 0 \\
-1 & 2 & -1 & \cdots & 0 \\
0 & -1 & 2 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & 2
\end{pmatrix}.
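The form of \hat\Sigma can be verified numerically. The following sketch (a check added here, not taken from the thesis) builds V_\rho and D^* as defined above and confirms that D^* V_\rho D^{*\prime} evaluated at \rho = 0 equals the displayed tridiagonal matrix:

```python
import numpy as np

T = 6
rho = 0.0  # evaluate under H0

# V_rho from Section 2: (T-1) x (T-1) with (s,t) element rho^{|s-t|}
idx = np.arange(T - 1)
V = rho ** np.abs(idx[:, None] - idx[None, :])

# first-difference matrix D* ((T-2) x (T-1))
Dstar = -np.eye(T - 2, T - 1) + np.eye(T - 2, T - 1, k=1)

Sigma = Dstar @ V @ Dstar.T

# under H0 this is tridiagonal with 2 on the diagonal and -1 beside it
Sigma_hat = 2 * np.eye(T - 2) - np.eye(T - 2, k=1) - np.eye(T - 2, k=-1)
assert np.allclose(Sigma, Sigma_hat)
```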
Furthermore, define E[\Delta\nu^*\Delta\nu^{*\prime}] = \Omega^* = \sigma^2 I_N \otimes \Sigma^*. Under H0, we find \Omega^* = \sigma^2 I_N \otimes \hat\Sigma^*, where

\hat\Sigma^* =
\begin{pmatrix}
2 & -1 & 0 & \cdots & 0 \\
-1 & 2 & -1 & \cdots & 0 \\
\vdots & \vdots & \ddots & \ddots & \vdots \\
0 & 0 & \cdots & -1 & 2
\end{pmatrix},

the (T-1) \times (T-1) matrix with first row (2, -1, 0, \ldots, 0), first column (2, -1, 0, \ldots, 0)' and lower-right (T-2) \times (T-2) block \hat\Sigma,
and therefore, under H0, \Omega^{*-1} = \frac{1}{\sigma^2}\left(I_N \otimes \hat\Sigma^{*-1}\right), \frac{\partial\Omega^*}{\partial\sigma^2} = I_N \otimes \hat\Sigma^* and \frac{\partial\Omega^*}{\partial\rho} = \sigma^2\left\{I_N \otimes \frac{\partial\hat\Sigma^*}{\partial\rho}\right\}, where

\frac{\partial\hat\Sigma^*}{\partial\rho} = \left.\frac{\partial\Sigma^*}{\partial\rho}\right|_{H_0} =
\begin{pmatrix}
-2 & 2 & -1 & \cdots & 0 \\
2 & -2 & 2 & \cdots & 0 \\
-1 & 2 & -2 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & -2
\end{pmatrix}.
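The banded pattern of this derivative can be cross-checked by finite differences, at least for the lower-right block of \Sigma^* that equals D^* V_\rho D^{*\prime} (the first row and column, which involve \xi_{i2}, are not covered by this check). This is a numerical aside under that assumption, not a derivation from the thesis:

```python
import numpy as np

T = 7
idx = np.arange(T - 1)
Dstar = -np.eye(T - 2, T - 1) + np.eye(T - 2, T - 1, k=1)

def Sigma(rho):
    # Sigma(rho) = D* V_rho D*' with V_rho as in Section 2
    V = rho ** np.abs(idx[:, None] - idx[None, :])
    return Dstar @ V @ Dstar.T

# central finite difference of Sigma(rho) at rho = 0
h = 1e-6
dSigma = (Sigma(h) - Sigma(-h)) / (2 * h)

# banded matrix from the text: -2 on the diagonal, 2 and -1 on the
# first and second off-diagonals
n = T - 2
band = (-2 * np.eye(n) + 2 * (np.eye(n, k=1) + np.eye(n, k=-1))
        - (np.eye(n, k=2) + np.eye(n, k=-2)))
assert np.allclose(dSigma, band, atol=1e-8)
```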
In this case, we are concerned with the following loglikelihood:

L(\theta) = \text{constant} - \frac{1}{2}\log|\Omega^*| - \frac{1}{2}\Delta\nu^{*\prime}\Omega^{*-1}\Delta\nu^*,
where \theta' = (\pi', \gamma, \beta', \rho, \sigma^2). Now define \Delta\tilde y_{i,-1} = (0, \Delta y_{i2}, \ldots, \Delta y_{i,T-1})' with \Delta\tilde y_{-1} = (\Delta\tilde y_{1,-1}', \ldots, \Delta\tilde y_{N,-1}')', \Delta\tilde X_i = (0_k, \Delta x_{i3}, \ldots, \Delta x_{iT})' with 0_k a k-vector of zeros, \Delta\tilde X = (\Delta\tilde X_1', \ldots, \Delta\tilde X_N')',

\Delta\tilde X_i^* = \begin{pmatrix} (\Delta X_i \iota_k)' \\ 0_{(T-2)\times(T-1)} \end{pmatrix}

and \Delta\tilde X^* = (\Delta\tilde X_1^{*\prime}, \ldots, \Delta\tilde X_N^{*\prime})'. We find
\frac{\partial L(\theta)}{\partial\pi} = \Delta\tilde X^{*\prime}\Omega^{*-1}\Delta\nu^*,

\frac{\partial L(\theta)}{\partial\gamma} = \Delta\tilde y_{-1}'\Omega^{*-1}\Delta\nu^*,

\frac{\partial L(\theta)}{\partial\beta} = \Delta\tilde X'\Omega^{*-1}\Delta\nu^*,

\frac{\partial L(\theta)}{\partial\rho} = -\frac{N}{2}\,\mathrm{tr}\!\left(\Sigma^{*-1}\frac{\partial\Sigma^*}{\partial\rho}\right) + \frac{1}{2\sigma^2}\,\Delta\nu^{*\prime}\left\{I_N \otimes \Sigma^{*-1}\frac{\partial\Sigma^*}{\partial\rho}\Sigma^{*-1}\right\}\Delta\nu^*,

\frac{\partial L(\theta)}{\partial\sigma^2} = -\frac{N(T-1)}{2\sigma^2} + \frac{1}{2\sigma^4}\,\Delta\nu^{*\prime}\left\{I_N \otimes \Sigma^{*-1}\right\}\Delta\nu^*.
Setting \frac{\partial L(\theta)}{\partial\pi} = 0 and solving for \pi yields an estimator of \pi, which under H0 is given by

\hat\pi = \left[\Delta\tilde X^{*\prime}\left(I_N \otimes \hat\Sigma^{*-1}\right)\Delta\tilde X^*\right]^{-1}\Delta\tilde X^{*\prime}\left(I_N \otimes \hat\Sigma^{*-1}\right)\Delta y,

setting \frac{\partial L(\theta)}{\partial\beta} = 0 and solving for \beta yields an estimator of \beta, which under H0 is given by

\hat\beta = \left[\Delta\tilde X'\left(I_N \otimes \hat\Sigma^{*-1}\right)\Delta\tilde X\right]^{-1}\Delta\tilde X'\left(I_N \otimes \hat\Sigma^{*-1}\right)\Delta y,

and setting \frac{\partial L(\theta)}{\partial\sigma^2} = 0 and solving for \sigma^2 yields an estimator of \sigma^2, which under H0 is given by

\hat\sigma^2 = \frac{\Delta\hat\nu'\left(I_N \otimes \hat\Sigma^{*-1}\right)\Delta\hat\nu}{N(T-1)},

where \Delta\hat\nu = \Delta y - \Delta\tilde X^*\hat\pi - \Delta\tilde X\hat\beta are the restricted estimation residuals. Then under H0 we find the following estimates of elements of the score vector:
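The variance estimator can be illustrated with a small Monte Carlo sketch (added here for illustration, not part of the thesis). In the simplest case without regressors, \Delta\hat\nu reduces to \Delta y, and under H0 the per-individual difference vector has covariance \sigma^2\hat\Sigma^*, so the quadratic form recovers \sigma^2:

```python
import numpy as np

rng = np.random.default_rng(42)
N, T, sigma2 = 4000, 5, 1.5

# simulate under H0 (gamma = 0, rho = 0), no regressors, with fixed effects
mu = rng.normal(size=(N, 1)) * 2.0
eps = rng.normal(scale=np.sqrt(sigma2), size=(N, T))
y = mu + eps                      # y_it = mu_i + eps_it, t = 1, ..., T

dy = np.diff(y, axis=1)           # (Delta y_i2, ..., Delta y_iT), length T-1

# Sigma*_hat: (T-1) x (T-1) tridiagonal with 2 / -1, as in the text
S = 2 * np.eye(T - 1) - np.eye(T - 1, k=1) - np.eye(T - 1, k=-1)

# sigma2_hat = dnu' (I_N x Sigma*_hat^{-1}) dnu / (N(T-1)); here dnu = dy
quad = np.einsum('it,ts,is->', dy, np.linalg.inv(S), dy)
sigma2_hat = quad / (N * (T - 1))
assert abs(sigma2_hat - sigma2) < 0.1
```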
\hat d(\hat\theta)_\gamma = \frac{1}{\hat\sigma^2}\,\Delta\tilde y_{-1}'\left(I_N \otimes \hat\Sigma^{*-1}\right)\Delta\hat\nu,

\hat d(\hat\theta)_\rho = \frac{N(T-1)}{T} + \frac{1}{2\hat\sigma^2}\,\Delta\hat\nu'\left(I_N \otimes \hat\Sigma^{*-1}\frac{\partial\hat\Sigma^*}{\partial\rho}\hat\Sigma^{*-1}\right)\Delta\hat\nu,

where we used that \mathrm{tr}\!\left(\hat\Sigma^{*-1}\frac{\partial\hat\Sigma^*}{\partial\rho}\right) = -\frac{2(T-1)}{T}.
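This trace identity is easy to confirm numerically for a range of panel lengths; the sketch below (an added check, assuming the tridiagonal and banded matrices defined earlier) does so:

```python
import numpy as np

def trace_term(T):
    n = T - 1
    # Sigma*_hat: tridiagonal with 2 on the diagonal and -1 beside it
    S = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
    # dSigma*_hat/drho: -2 diagonal, 2 and -1 on the first and second off-diagonals
    dS = (-2 * np.eye(n) + 2 * (np.eye(n, k=1) + np.eye(n, k=-1))
          - (np.eye(n, k=2) + np.eye(n, k=-2)))
    return np.trace(np.linalg.solve(S, dS))

checks = [np.isclose(trace_term(T), -2 * (T - 1) / T) for T in range(3, 12)]
assert all(checks)
```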
Next, we will derive the information matrix J(\theta) = -E\!\left[\frac{\partial^2 L(\theta)}{\partial\theta_j\partial\theta_l'}\right] for j, l = 1, \ldots, 5. Therefore, we will use that E[z'Ak] = \mathrm{tr}\{E[z'Ak]\} = \mathrm{tr}\{A\,E[kz']\} for N(T-1)-vectors z, k and a conformable matrix A. Now let us define \Delta\tilde X_{i,-1} = (0_k, \Delta x_{i2}, \ldots, \Delta x_{i,T-1})', \Delta\tilde X_{-1} = (\Delta\tilde X_{1,-1}', \ldots, \Delta\tilde X_{N,-1}')', \Delta\tilde\nu_i = (0, \Delta\nu_i')' and \Delta\tilde\nu = (\Delta\tilde\nu_1', \ldots, \Delta\tilde\nu_N')'. Furthermore, let E[\Delta\nu^*\Delta\tilde\nu'] = \sigma^2\{I_N \otimes Q\}. We observe
J(\theta)_{[\gamma,\gamma]} = E[\Delta\tilde y_{-1}'\Omega^{*-1}\Delta\tilde y_{-1}] = E[(\Delta\tilde X_{-1}\beta)'\Omega^{*-1}(\Delta\tilde X_{-1}\beta)] + E[\Delta\tilde\nu'\Omega^{*-1}\Delta\tilde\nu]
= (\Delta\tilde X_{-1}\beta)'\Omega^{*-1}(\Delta\tilde X_{-1}\beta) + \mathrm{tr}\{\Omega^{*-1}E[\Delta\tilde\nu\Delta\tilde\nu']\}.

Similarly,

J(\theta)_{[\gamma,\rho]} = N\,\mathrm{tr}\!\left\{\Sigma^{*-1}\frac{\partial\Sigma^*}{\partial\rho}\Sigma^{*-1}Q\right\}
\quad\text{and}\quad
J(\theta)_{[\gamma,\sigma^2]} = \frac{N}{\sigma^2}\,\mathrm{tr}\{\Sigma^{*-1}Q\}.
We obtain estimates of elements of the information matrix:

\hat J(\hat\theta)_{[\gamma,\pi]} = \hat J(\hat\theta)_{[\pi,\gamma]} = \frac{1}{\hat\sigma^2}(\Delta\tilde X_{-1}\hat\beta)'\left(I_N \otimes \hat\Sigma^{*-1}\right)\Delta\tilde X^*,

\hat J(\hat\theta)_{[\gamma,\gamma]} = \frac{1}{\hat\sigma^2}(\Delta\tilde X_{-1}\hat\beta)'\left(I_N \otimes \hat\Sigma^{*-1}\right)(\Delta\tilde X_{-1}\hat\beta) + \frac{N(T+1)(T-2)}{T},

\hat J(\hat\theta)_{[\gamma,\beta]} = \hat J(\hat\theta)_{[\beta,\gamma]} = \frac{1}{\hat\sigma^2}(\Delta\tilde X_{-1}\hat\beta)'\left(I_N \otimes \hat\Sigma^{*-1}\right)\Delta\tilde X,

\hat J(\hat\theta)_{[\gamma,\rho]} = \hat J(\hat\theta)_{[\rho,\gamma]} = \frac{N(T-1)(T-2)}{T},

\hat J(\hat\theta)_{[\gamma,\sigma^2]} = \hat J(\hat\theta)_{[\sigma^2,\gamma]} = 0,

and

\hat J(\hat\theta)_{[\pi,\pi]} = \frac{1}{\hat\sigma^2}\Delta\tilde X^{*\prime}\left(I_N \otimes \hat\Sigma^{*-1}\right)\Delta\tilde X^*,

\hat J(\hat\theta)_{[\pi,\rho]} = \hat J(\hat\theta)_{[\rho,\pi]} = 0,

\hat J(\hat\theta)_{[\beta,\pi]} = \hat J(\hat\theta)_{[\pi,\beta]} = \frac{1}{\hat\sigma^2}\Delta\tilde X'\left(I_N \otimes \hat\Sigma^{*-1}\right)\Delta\tilde X^*,

\hat J(\hat\theta)_{[\beta,\beta]} = \frac{1}{\hat\sigma^2}\Delta\tilde X'\left(I_N \otimes \hat\Sigma^{*-1}\right)\Delta\tilde X,

\hat J(\hat\theta)_{[\beta,\rho]} = \hat J(\hat\theta)_{[\rho,\beta]} = 0,

\hat J(\hat\theta)_{[\beta,\sigma^2]} = \hat J(\hat\theta)_{[\sigma^2,\beta]} = 0,

\hat J(\hat\theta)_{[\rho,\rho]} = \frac{N}{2}\,\mathrm{tr}\!\left\{\hat\Sigma^{*-1}\frac{\partial\hat\Sigma^*}{\partial\rho}\hat\Sigma^{*-1}\frac{\partial\hat\Sigma^*}{\partial\rho}\right\},

\hat J(\hat\theta)_{[\rho,\sigma^2]} = \hat J(\hat\theta)_{[\sigma^2,\rho]} = -\frac{N(T-1)}{\hat\sigma^2 T},

\hat J(\hat\theta)_{[\sigma^2,\sigma^2]} = \frac{N(T-1)}{2\hat\sigma^4},
where we used that under H0 the expectation of the score vector equals zero (see Appendix A), and therefore \mathrm{tr}\{\hat\Sigma^{*-1}Q\} = 0. Moreover, using Kruiniger [2000], we find

\frac{1}{2}\,\mathrm{tr}\!\left\{\hat\Sigma^{*-1}\frac{\partial\hat\Sigma^*}{\partial\rho}\hat\Sigma^{*-1}\frac{\partial\hat\Sigma^*}{\partial\rho}\right\} = \frac{T(T-1)(T-2) + 2}{T^2}.
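This half-trace can be evaluated numerically from the tridiagonal \hat\Sigma^* and the banded derivative given earlier; the closed form asserted below was obtained from that numerical evaluation (a check added here, worth comparing against Kruiniger's [2000] own expression):

```python
import numpy as np

def half_trace(T):
    n = T - 1
    S = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)        # Sigma*_hat
    dS = (-2 * np.eye(n) + 2 * (np.eye(n, k=1) + np.eye(n, k=-1))
          - (np.eye(n, k=2) + np.eye(n, k=-2)))                  # dSigma*_hat/drho
    B = np.linalg.solve(S, dS)
    return 0.5 * np.trace(B @ B)

for T in range(3, 12):
    assert np.isclose(half_trace(T), (T * (T - 1) * (T - 2) + 2) / T**2)
```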
Following Breusch and Pagan [1980] in constructing LM statistics, we define

\hat\psi_{\gamma\gamma} \equiv \hat J(\hat\theta)_{\gamma\gamma|\beta,\sigma^2} = \hat J(\hat\theta)_{[\gamma,\gamma]} - \hat J(\hat\theta)_{[\gamma,\beta]}'\hat J(\hat\theta)_{[\beta,\beta]}^{-1}\hat J(\hat\theta)_{[\beta,\gamma]} - \hat J(\hat\theta)_{[\gamma,\sigma^2]}\hat J(\hat\theta)_{[\sigma^2,\sigma^2]}^{-1}\hat J(\hat\theta)_{[\sigma^2,\gamma]},

\hat\psi_{\rho\rho} \equiv \hat J(\hat\theta)_{\rho\rho|\beta,\sigma^2} = \hat J(\hat\theta)_{[\rho,\rho]} - \hat J(\hat\theta)_{[\rho,\beta]}'\hat J(\hat\theta)_{[\beta,\beta]}^{-1}\hat J(\hat\theta)_{[\beta,\rho]} - \hat J(\hat\theta)_{[\rho,\sigma^2]}\hat J(\hat\theta)_{[\sigma^2,\sigma^2]}^{-1}\hat J(\hat\theta)_{[\sigma^2,\rho]},

\hat\psi_{\rho\gamma} \equiv \hat J(\hat\theta)_{\rho\gamma|\beta,\sigma^2} = \hat J(\hat\theta)_{[\rho,\gamma]} - \hat J(\hat\theta)_{[\rho,\beta]}'\hat J(\hat\theta)_{[\beta,\beta]}^{-1}\hat J(\hat\theta)_{[\beta,\gamma]} - \hat J(\hat\theta)_{[\rho,\sigma^2]}\hat J(\hat\theta)_{[\sigma^2,\sigma^2]}^{-1}\hat J(\hat\theta)_{[\sigma^2,\gamma]}.
In constructing the test statistics, we also use the inverse of the information matrix. Therefore, let us define \hat Z(\hat\theta) = \hat J(\hat\theta)^{-1} and let \hat Z(\hat\theta)_{[\gamma,\gamma]}, \hat Z(\hat\theta)_{[\gamma,\rho]} and \hat Z(\hat\theta)_{[\rho,\rho]} be the elements of the inverse of the information matrix corresponding to [\gamma,\gamma], [\gamma,\rho] and [\rho,\rho]. Following Anselin et al. [1996] and Bera and Yoon [1993], we propose modified LM tests for testing H0 : \rho = 0 against H1 : \rho \neq 0 in the presence of \gamma, and for testing H0 : \gamma = 0 against H1 : \gamma \neq 0 in the presence of \rho, independent of the number of time periods at hand:

LM_\rho^\dagger = \frac{\left\{\hat d(\hat\theta)_\rho - \hat\psi_{\rho\gamma}\hat\psi_{\gamma\gamma}^{-1}\,\hat d(\hat\theta)_\gamma\right\}^2}{\hat\psi_{\rho\rho} - \hat\psi_{\rho\gamma}^2\hat\psi_{\gamma\gamma}^{-1}} \;\to_d\; \chi_1^2,   (4.1)

LM_\gamma^\dagger = \frac{\left\{\hat d(\hat\theta)_\gamma - \hat\psi_{\rho\gamma}\hat\psi_{\rho\rho}^{-1}\,\hat d(\hat\theta)_\rho\right\}^2}{\hat\psi_{\gamma\gamma} - \hat\psi_{\rho\gamma}^2\hat\psi_{\rho\rho}^{-1}} \;\to_d\; \chi_1^2.   (4.2)
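Once the scores and the concentrated information terms are available as scalars, the modified statistics are simple arithmetic. The sketch below (an illustration with made-up numbers, not thesis data) implements the Bera-Yoon adjustment in this form:

```python
import numpy as np

def modified_lm(d_t, d_n, psi_tt, psi_nn, psi_tn):
    # modified LM statistic for a target parameter, adjusted for local
    # misspecification in a nuisance parameter (psi's as defined in the text)
    num = d_t - (psi_tn / psi_nn) * d_n
    den = psi_tt - psi_tn**2 / psi_nn
    return num**2 / den

# illustrative (made-up) score and concentrated-information values
d_gamma, d_rho = 2.0, 1.5
psi_gg, psi_rr, psi_rg = 30.0, 25.0, 5.0

lm_rho = modified_lm(d_rho, d_gamma, psi_rr, psi_gg, psi_rg)
lm_gamma = modified_lm(d_gamma, d_rho, psi_gg, psi_rr, psi_rg)

# with a zero cross term the statistics reduce to the standard LM form d^2/psi
assert np.isclose(modified_lm(d_rho, d_gamma, psi_rr, psi_gg, 0.0),
                  d_rho**2 / psi_rr)
assert lm_rho >= 0 and lm_gamma >= 0
```

Each statistic would then be compared with a chi-squared(1) critical value, e.g. about 3.84 at the 5% level.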
5 Conclusion
In this paper we have adhered to the approach of Hsiao et al. [2002], extending the first-differenced dynamic panel data model with a predictive model for the initial observations. We have constructed two fixed T LM tests that are robust to misspecification regarding the source(s) of time persistence. If applied simultaneously, these LM tests are capable of pointing out the type of time persistence at hand, if any.
References
L. Anselin, A. Bera, R. Florax, and M. Yoon. Simple diagnostic tests for spatial dependence. Regional Science and Urban Economics, 26:77–104, 1996.
A. Bera and M. Yoon. Specification testing with locally misspecified alternatives. Econometric Theory, 9:
649–658, 1993.
T. Breusch and A. Pagan. The Lagrange multiplier test and its applications to model specification in econometrics. Review of Economic Studies, 47:239–253, 1980.
C. Hsiao, M. Pesaran, and A. Tahmiscioglu. Maximum likelihood estimation of fixed effects dynamic panel data models covering short time periods. Journal of Econometrics, 109:107–150, 2002.
H. Kruiniger. Maximum Likelihood and GMM estimation of dynamic panel data models with fixed effects.
Working paper, Queen Mary University of London, 2000.
Z. Yang. Initial-Condition Free Estimation of Fixed Effects Dynamic Panel Data Models. Working paper, Singapore Management University, 2014.
Appendices
A Review of proof of consistency by Hsiao et al. [2002]
In their paper, Hsiao et al. [2002] derive a minimum distance estimator based on the likelihood of the first-difference transform of a dynamic panel data model, utilizing assumptions regarding the initial observations, and prove it is consistent. Here, we elaborate on their proof, based on the method proposed in footnote 13 of their paper. First, we define the model and associated assumptions. Consider
y_{it} = \alpha_i + \gamma y_{i,t-1} + u_{it}, \qquad i = 1, \ldots, N, \quad t = 2, \ldots, T,

where y_{it} is some dependent variable of interest, y_{i,t-1} is the dependent variable lagged once, \alpha_i is the individual-specific fixed effect, u_{it} are disturbances which are normally distributed with mean zero and variance \sigma_u^2, and \gamma and \sigma_u^2 are parameters to be estimated. In this simplified model without explanatory variables we assume that the initial observation y_{i1} is observable. The individual effect induces the incidental parameter problem, which in turn leads to inconsistent results when maximizing the associated likelihood. Therefore, consider the first-difference transform of the model:
\Delta y_{it} = \gamma\Delta y_{i,t-1} + \Delta u_{it},

where \Delta y_{it} = y_{it} - y_{i,t-1} and similar transformations hold for \Delta y_{i,t-1} and \Delta u_{it}. This model is well defined for t \geq 3 but not for t = 2, since \Delta y_{i1} requires knowledge of y_{i0}, which is unavailable. We assume that the process has been going on for a long time, reaching stationarity, and by repeated substitution we find
\Delta y_{i2} = \sum_{j=0}^{\infty}\gamma^j\Delta u_{i,2-j}.
Now define \Delta u_i = (\Delta y_{i2}, \Delta u_{i3}, \ldots, \Delta u_{iT})'. We find E[\Delta u_i\Delta u_i'] = \Omega = \sigma_u^2\Omega^*, where

\Omega^* =
\begin{pmatrix}
\frac{2}{1+\gamma} & -1 & 0 & \cdots & 0 \\
-1 & 2 & -1 & \cdots & 0 \\
0 & -1 & 2 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & 2
\end{pmatrix}.
We will evaluate the following expectation:

E\!\left[\sum_{i=1}^{N}\frac{\partial\Delta u_i}{\partial\gamma}'\Omega^{-1}\Delta u_i\right] = \mathrm{tr}\!\left\{\sum_{i=1}^{N}\Omega^{-1}E\!\left[\Delta u_i\frac{\partial\Delta u_i}{\partial\gamma}'\right]\right\}.
Therefore, we note that

E\!\left[\Delta u_i\frac{\partial\Delta u_i}{\partial\gamma}'\right] = E\!\left[\begin{pmatrix}\Delta y_{i2} \\ \Delta y_{i3} - \gamma\Delta y_{i2} \\ \vdots \\ \Delta y_{iT} - \gamma\Delta y_{i,T-1}\end{pmatrix}\begin{pmatrix}0 & \Delta y_{i2} & \cdots & \Delta y_{i,T-1}\end{pmatrix}\right]

= \sigma_u^2
\begin{pmatrix}
0 & \frac{2}{1+\gamma} & \frac{2\gamma}{1+\gamma} - 1 & \frac{2\gamma^2}{1+\gamma} - \gamma & \cdots & \frac{2\gamma^{T-3}}{1+\gamma} - \gamma^{T-4} \\
0 & -1 & 2 - \gamma & -\gamma^2 + 2\gamma - 1 & \cdots & -\gamma^{T-3} + 2\gamma^{T-4} - \gamma^{T-5} \\
0 & 0 & -1 & 2 & \cdots & 0 \\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & 0 & \cdots & -1
\end{pmatrix}.
Now if we set \gamma = 0, we find

E\!\left[\sum_{i=1}^{N}\frac{\partial\Delta u_i}{\partial\gamma}'\Omega^{-1}\Delta u_i\right] = N\,\mathrm{tr}\{B\} = 0,
where B is a (T-1) \times (T-1) matrix with the (j, j+1)-th element equal to one for j = 1, \ldots, T-2 and zeros everywhere else. Consequently, the claim made in footnote 13 of Hsiao et al. [2002] is justified. If we, however, treat \Delta y_{i2} as exogenous and start analyzing the model from t = 3 onwards, we essentially truncate \Omega and E[\Delta u_i\frac{\partial\Delta u_i}{\partial\gamma}'] by omitting the first row and column. As a result, we find

E\!\left[\sum_{i=1}^{N}\frac{\partial\Delta u_i}{\partial\gamma}'\Omega^{-1}\Delta u_i\right] = -N\,\frac{T-2}{T-1},

which corresponds to the bias observed in Yang [2014].
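Both results of this appendix can be confirmed numerically. At \gamma = 0 the expectation matrix is a zero first column followed by the first T-2 columns of \Omega, so the full-sample trace is tr(B) = 0, while dropping the first row and column produces the bias. The following sketch (a check added here, not from the thesis) verifies both claims:

```python
import numpy as np

def traces(T):
    n = T - 1
    # Omega / sigma_u^2 at gamma = 0: tridiagonal with 2 / -1
    Om = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
    # E[Delta u_i (0, Delta y_i2, ..., Delta y_i,T-1)] / sigma_u^2 at gamma = 0:
    # a zero first column followed by the first T-2 columns of Omega
    E = np.zeros((n, n))
    E[:, 1:] = Om[:, :-1]
    full = np.trace(np.linalg.solve(Om, E))
    trunc = np.trace(np.linalg.solve(Om[1:, 1:], E[1:, 1:]))
    return full, trunc

for T in range(3, 10):
    full, trunc = traces(T)
    assert np.isclose(full, 0.0)                  # the footnote-13 claim
    assert np.isclose(trunc, -(T - 2) / (T - 1))  # the bias when t = 2 is dropped
```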