
Multi-step Prediction from the One-Way Error Components Model with AR(1) Disturbances

Matthijs van der Groen

Student number: 10244123
Bachelor Thesis Econometrics
Supervisor: Andrew Pua
University of Amsterdam
June 29, 2015

Abstract

In this paper the work of Baillie and Baltagi (1999) and Kouassi et al. (2012) is extended to multi-step prediction for AR(1) errors in the disturbance term. Four different predictors are compared on their efficiency using the mean squared error. This is done by running a Monte Carlo simulation and by deriving their asymptotic mean squared errors. Using these, it was concluded that the ordinary optimal predictor and the fixed effects predictor perform well under various criteria.


Statement of own work

I, Matthijs van der Groen, hereby declare that I have written this thesis myself and that I take full responsibility for its contents. I confirm that the text and the work presented in this thesis are original and that I have used no sources other than those mentioned in the text and the references. The Faculty of Economics and Business is responsible solely for the supervision up to the submission of the thesis, not for its contents.


Contents

1 Introduction
2 The model
3 The predictors
  3.1 The ordinary predictor
  3.2 The feasible ordinary predictor
  3.3 The fully feasible ordinary predictor
  3.4 The truncated predictor
  3.5 The misspecified predictor
  3.6 The fixed effect predictor
4 Asymptotic mean squared errors
  4.1 The ordinary predictor with known parameters
  4.2 The ordinary predictor with estimated parameters
  4.3 The truncated predictor
  4.4 The misspecified predictor
  4.5 The fixed effect predictor
5 Monte Carlo experiment
  5.1 Data generating process
  5.2 Estimation of the parameters
6 Results
  6.1 One period prediction
  6.2 Longer period prediction
7 Discussion
8 Conclusion
References

1 Introduction

A popular method of dealing with panel data, following the work of Balestra and Nerlove (1966), is the regression model with error components, or in other words, variance components. This type of regression model can account for individual, time-invariant effects. This is one of the many advantages of working with panel data; others include expanding the sample size, capturing cross-sectional variation and dynamic behavior, improving the accuracy of parameter estimates, and obtaining more accurate predictions of individual outcomes. Although over the years more and more research has been done on one-period prediction with this one-way error components model, little is known about longer predictions using this type of error components.

Wansbeek and Kapteyn (1978) and Taub (1979) derived a predictor for the error components panel data model under the assumption that the errors are not serially correlated. Ignoring serial correlation when it is present gives consistent but inefficient estimates of the coefficients and biased standard errors. The classical error components model assumes that the only correlation over time is due to the presence in the panel of the same individual over multiple periods. This can be a restrictive assumption for economic relationships like consumption or investment, where an unobserved shock will affect later periods. Therefore, the predictor for the one-way error components model was later extended by Baltagi and Li (1992) to allow serial correlation in the remainder disturbance term, and Kouassi et al. (2012) did extensive research on an AR(1) disturbance term for a one-step forecast. Applications of this type of one-way error components panel data model include research by Berry et al. (1988) on the impact of plant closings on earnings and by Lillard and Weiss (1979) on the variation in earnings of American scientists across different fields. Contrary to the research by Baltagi and Li (1992), Baillie and Baltagi (1999) and Kouassi et al. (2012), which focused on one-step prediction, a prediction for a longer horizon will be formed here.

In panel data, two kinds of sizes can be distinguished: the micro panel and the macro panel. The difference between the two is that a micro panel has N individuals over T time periods, where T is much smaller than N, while in a macro panel N and T are of similar magnitude. In this thesis the focus lies on micro panels, which are more typical than macro panels, as longitudinal studies are more likely to have a short time horizon. Besides these micro panel sizes, one macro panel is considered, and one size is used to benchmark the results against those of Baillie and Baltagi (1999) and Kouassi et al. (2012). Different values of the autocorrelation are considered for these different panel sizes, as are varying values for the variances of the components of the one-way error components model.

This thesis provides theoretical and simulation evidence on the relative efficiency of four different predictors for a multi-step forecast. These predictors are discussed in the next chapter and are: (i) an ordinary predictor of the form of the optimal predictor, accounting for serial correlation in the error term, with consistent estimators replacing population parameters; (ii) a truncated predictor that ignores the error components correction but uses efficient estimators of the regression parameters; (iii) a misspecified predictor that uses OLS estimates of the regression parameters; and (iv) a fixed effects predictor that treats the individual effects as fixed parameters to be estimated. The section after that compares the analytic efficiencies of these predictors, calculated via the asymptotic mean squared error. Thereafter, the simulation evidence comes from a Monte Carlo experiment, which is used to determine the accuracy of the asymptotic approximation of the MSE of the predictors.

2 The model

Suppose there are N individuals over T periods of time. The regression model is defined as

$$y_{it} = x_{it}'\beta + u_{it} \qquad (1)$$

with i = 1, ..., N and t = 1, ..., T. Here x_it is a K-dimensional vector of explanatory variables for the t-th period and the i-th individual of the data set. The disturbances in this model follow a one-way error components model, which can be written as

$$u_{it} = \mu_i + \nu_{it} \qquad (2)$$

The first error component of this model, µ_i, denotes the individual, time-invariant error and is assumed to be NID(0, σ_µ²). The other component, the remainder disturbance ν_it, is assumed to be NID(0, σ_ν²). These two components, µ_i and ν_it, are assumed to be independent of each other. When the remainder disturbance ν_it follows an AR(1) process, it is given by

$$\nu_{it} = \rho\,\nu_{i,t-1} + \epsilon_{it} \qquad (3)$$

The innovation is denoted by ε_it and is assumed to be NID(0, σ_ε²). ρ denotes the autocorrelation between the disturbances and is an unknown parameter satisfying the stationarity condition that the root of 1 − ρz = 0 lies outside the complex unit circle. The serial correlation parameter ρ can be interpreted in two ways: as the effect of random shocks persisting longer than one period, or as reflecting the operation of individual, unobserved variables which are serially correlated over time. So when ν_it is serially correlated, it is assumed to be NID(0, σ_ε²/(1 − ρ²)).

Having introduced the assumptions, it is more convenient to rewrite them in matrix notation, starting with (1) as

$$Y = X\beta + u \qquad (4)$$

where Y is an NT × 1 vector, stacked as Y' = (y_11, ..., y_1T, ..., y_N1, ..., y_NT), X is an NT × K matrix with its rows stacked as for Y, β is a K × 1 vector and u is a stacked NT × 1 vector of disturbances. The disturbance term can be written in vector form as

$$u = (I_N \otimes \iota_T)\mu + \nu \qquad (5)$$

where µ is an N × 1 vector, ν an NT × 1 vector, I_N an identity matrix of dimension N, and ι_T a T × 1 vector of ones.
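To make the error structure concrete, the following minimal sketch (assuming Python with NumPy; parameter values are illustrative, not those used later in the thesis) simulates the disturbances u_it = µ_i + ν_it of (2)-(3), drawing the first ν_it from its stationary distribution:

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_disturbances(N, T, rho, sigma2_mu, sigma2_eps):
    """Simulate u_it = mu_i + nu_it with nu_it = rho * nu_{i,t-1} + eps_it."""
    sigma2_nu = sigma2_eps / (1.0 - rho**2)           # stationary variance of nu_it
    mu = rng.normal(0.0, np.sqrt(sigma2_mu), size=N)  # time-invariant individual effects
    nu = np.empty((N, T))
    nu[:, 0] = rng.normal(0.0, np.sqrt(sigma2_nu), size=N)  # stationary start
    for t in range(1, T):
        nu[:, t] = rho * nu[:, t - 1] + rng.normal(0.0, np.sqrt(sigma2_eps), size=N)
    return mu[:, None] + nu                           # N x T matrix of u_it

u = simulate_disturbances(N=50, T=10, rho=0.5, sigma2_mu=1.0, sigma2_eps=1.0)
```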

3 The predictors

3.1 The ordinary predictor

The optimal prediction of the i-th component at time T for s periods ahead is denoted by y_{i,T,s} and was derived by Goldberger (1962) and later expanded by Baltagi (2008). With known parameters (β, ρ, σ_µ², σ_ν²) this optimal predictor is given by

$$y_{i,T,s} = x_{i,T+s}'\beta + w_i'\Omega^{-1}u \qquad (6)$$

for s ≥ 1. The disturbance term is defined as u = y − Xβ, Ω denotes the variance-covariance matrix, and w_i is defined as

$$w_i = E(u_{i,T+s}\,u) \qquad (7)$$

The disturbances in period T + s are written as

$$u_{i,T+s} = \mu_i + \nu_{i,T+s} \qquad (8)$$

The expression w'Ω^{-1}u from (6) is obtained by starting with the variance-covariance matrix and thereafter calculating w'. The variance-covariance matrix is defined as

$$\Omega = \sigma_\mu^2\,(I_N \otimes \iota_T\iota_T') + \sigma_\nu^2\,(I_N \otimes \Gamma) \qquad (9)$$

with Γ denoting the autocorrelation in time between the components, written in matrix notation as

$$\Gamma = \begin{pmatrix} 1 & \rho & \cdots & \rho^{T-1} \\ \rho & 1 & \ddots & \vdots \\ \vdots & \ddots & \ddots & \rho \\ \rho^{T-1} & \cdots & \rho & 1 \end{pmatrix}$$

To eliminate the serially correlated error terms, the variables can be transformed with the Prais-Winsten (PW) transformation matrix, which eliminates serial correlation of the AR(1) type in a linear model. It is given by

$$C = \begin{pmatrix} \sqrt{1-\rho^2} & 0 & 0 & \cdots & 0 \\ -\rho & 1 & 0 & \cdots & 0 \\ 0 & -\rho & \ddots & \ddots & \vdots \\ \vdots & & \ddots & 1 & 0 \\ 0 & \cdots & 0 & -\rho & 1 \end{pmatrix}$$

Using this transformation matrix C, the variables can be transformed, e.g. y* = (I_N ⊗ C)y.
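As a numerical check on this transformation, a short sketch (NumPy assumed) builds C and Γ and verifies that CΓC' = (1 − ρ²)I_T, so that applying C to ν_i, whose covariance is σ_ν²Γ with σ_ν² = σ_ε²/(1 − ρ²), indeed yields uncorrelated errors with variance σ_ε²:

```python
import numpy as np

def pw_matrix(T, rho):
    """Prais-Winsten transformation matrix C for AR(1) errors."""
    C = np.eye(T)
    C[0, 0] = np.sqrt(1.0 - rho**2)
    for t in range(1, T):
        C[t, t - 1] = -rho
    return C

def ar1_corr(T, rho):
    """AR(1) autocorrelation matrix Gamma, with (s,t) element rho^|s-t|."""
    idx = np.arange(T)
    return rho ** np.abs(idx[:, None] - idx[None, :])

T, rho = 6, 0.7
C, Gamma = pw_matrix(T, rho), ar1_corr(T, rho)
assert np.allclose(C @ Gamma @ C.T, (1.0 - rho**2) * np.eye(T))  # whitened
```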

The transformed variance-covariance matrix becomes

$$\Omega^* = (I_N \otimes C)\,\Omega\,(I_N \otimes C') = \sigma_\mu^2\,(I_N \otimes C J_T C') + \sigma_\epsilon^2\,(I_N \otimes I_T) = \sigma_\mu^2\,(I_N \otimes \iota_T^{\alpha}\iota_T^{\alpha\prime}) + \sigma_\epsilon^2\,(I_N \otimes I_T)$$

with ι_T^α = Cι_T. The inverse of Ω* can be written as

$$\Omega^{*-1} = \frac{1}{\sigma_\epsilon^2}\left(I_N \otimes \left(I_T + \frac{\sigma_\mu^2}{\sigma_\epsilon^2}\,\iota_T^{\alpha}\iota_T^{\alpha\prime}\right)^{-1}\right) = \frac{1}{\sigma_\epsilon^2}\left(I_N \otimes I_T - \frac{\sigma_\mu^2}{\sigma_\epsilon^2 + d\sigma_\mu^2}\, I_N \otimes \iota_T^{\alpha}\iota_T^{\alpha\prime}\right) \qquad (10)$$

where d = ι_T^{α'}ι_T^α = 1 − ρ² + (T − 1)(1 − ρ)². As noted before, Ω* = (I_N ⊗ C)Ω(I_N ⊗ C'). Rearranging this leads to

$$\Omega^{-1} = (I_N \otimes C')\,\Omega^{*-1}\,(I_N \otimes C) \qquad (11)$$

Combining (10) and (11) gives the inverse variance-covariance matrix

$$\Omega^{-1} = \frac{1}{\sigma_\epsilon^2}\left(I_N \otimes C'C - \frac{\sigma_\mu^2}{\sigma_\epsilon^2 + d\sigma_\mu^2}\, I_N \otimes C'\iota_T^{\alpha}\iota_T^{\alpha\prime} C\right) \qquad (12)$$

Writing out the definition of w, with u in vector notation, shows

$$w = E(u\,u_{T+s}') = E\begin{pmatrix} (\mu_1\iota_T + \nu_1)u_{1,T+s} & \cdots & (\mu_1\iota_T + \nu_1)u_{N,T+s} \\ \vdots & \ddots & \vdots \\ (\mu_N\iota_T + \nu_N)u_{1,T+s} & \cdots & (\mu_N\iota_T + \nu_N)u_{N,T+s} \end{pmatrix}$$

Since disturbances of different individuals are independent, only the diagonal blocks are non-zero, and E[ν_i ν_{i,T+s}] = ρ^s σ_ν² (ρ^{T−1}, ρ^{T−2}, ..., 1)', so

$$w = \sigma_\mu^2\,(I_N \otimes \iota_T) + \rho^s\sigma_\nu^2\left(I_N \otimes (\rho^{T-1}, \rho^{T-2}, \ldots, 1)'\right) \qquad (13)$$

Combining (12) and (13) yields w'Ω^{-1}. Using (ρ^{T−1}, ρ^{T−2}, ..., 1)C'C = (1 − ρ²)d_T' and likewise (ρ^{T−1}, ρ^{T−2}, ..., 1)C'ι_T^α = (1 − ρ²), where d_T' = (0, ..., 0, 1) is a vector of length T with zeros everywhere but the last element, the two pieces of w'Ω^{-1} become

$$\sigma_\mu^2\,(I_N \otimes \iota_T)'\,\Omega^{-1} = \frac{\sigma_\mu^2}{\sigma_\epsilon^2 + d\sigma_\mu^2}\left(I_N \otimes \iota_T^{\alpha\prime} C\right)$$

and

$$\rho^s\sigma_\nu^2\left(I_N \otimes (\rho^{T-1}, \ldots, 1)'\right)'\,\Omega^{-1} = \rho^s\left(I_N \otimes d_T'\right) - \frac{\rho^s\sigma_\mu^2}{\sigma_\epsilon^2 + d\sigma_\mu^2}\left(I_N \otimes \iota_T^{\alpha\prime} C\right)$$

using σ_ν²(1 − ρ²) = σ_ε². Combining these results gives

$$w'\Omega^{-1} = \rho^s\left(I_N \otimes d_T'\right) + \frac{\sigma_\mu^2}{\sigma_\epsilon^2 + d\sigma_\mu^2}(1-\rho^s)\left(I_N \otimes \iota_T^{\alpha\prime} C\right)$$

This term can be simplified by introducing a 1 × T weight vector

$$F' = (f, g, g, \ldots, g, h) \qquad (14)$$

with elements defined as

$$f = \eta(1-\rho^s)(1-\rho), \quad g = \eta(1-\rho^s)(1-\rho)^2, \quad h = \rho^s + \eta(1-\rho^s)(1-\rho), \quad \eta = \frac{\sigma_\mu^2}{\sigma_\epsilon^2 + d\sigma_\mu^2} \qquad (15)$$

so the last term of (6), written in vector notation, can be rewritten as w_i'Ω^{-1}u = F'u_i, which in turn can be written as

$$w'\Omega^{-1}u = (I_N \otimes F)'u$$

This leads to the following expression for the predictor:

$$y_{T,s} = x_{T+s}'\beta + (I_N \otimes F)'u \qquad (16)$$

for s ≥ 1. This form of the optimal predictor is consistent with the formula for the one-way error components model with AR(1) errors and it generalizes to s periods ahead. Special cases include:

(1) Only serial correlation, σ_µ² = 0: F'u_i collapses into ρ^s u_{iT}.

(2) No serial correlation, ρ = 0: F'u_i becomes (σ_µ²/(Tσ_µ² + σ_ε²)) Σ_{t=1}^T u_{it}, and all errors get the same weight f = g = h = σ_µ²/(Tσ_µ² + σ_ε²).
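The weight vector F and its two special cases are easy to check numerically; a minimal sketch (NumPy assumed; parameter values illustrative):

```python
import numpy as np

def predictor_weights(T, s, rho, sigma2_mu, sigma2_eps):
    """Weight vector F = (f, g, ..., g, h)' from equations (14)-(15)."""
    d = 1 - rho**2 + (T - 1) * (1 - rho)**2
    eta = sigma2_mu / (sigma2_eps + d * sigma2_mu)
    f = eta * (1 - rho**s) * (1 - rho)
    g = eta * (1 - rho**s) * (1 - rho)**2
    h = rho**s + eta * (1 - rho**s) * (1 - rho)
    F = np.full(T, g)
    F[0], F[-1] = f, h
    return F

# Special case (2): rho = 0 gives equal weights sigma2_mu / (T sigma2_mu + sigma2_eps)
F = predictor_weights(T=5, s=1, rho=0.0, sigma2_mu=2.0, sigma2_eps=1.0)
assert np.allclose(F, 2.0 / (5 * 2.0 + 1.0))

# Special case (1): sigma2_mu = 0 gives F'u_i = rho^s * u_iT
F = predictor_weights(T=5, s=3, rho=0.6, sigma2_mu=0.0, sigma2_eps=1.0)
assert np.allclose(F, [0.0, 0.0, 0.0, 0.0, 0.6**3])
```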

3.2 The feasible ordinary predictor

As the population parameters are unknown, the feasible optimal predictor ŷ_{i,T,s} is obtained by substituting MLEs for those parameters. When the true variance components are known, Wansbeek and Kapteyn (1978) and Taub (1979) derived the best linear unbiased predictor from Goldberger's (1962) result, whereafter Baltagi and Li (1992) showed that ŷ_{i,T,s} is indeed the best linear unbiased predictor. Under the AR(1) correlation assumption the feasible optimal predictor is given as

$$\hat y_{T,s} = x_{T+s}'\beta_{GLS} + (I_N \otimes F)'u \qquad (17)$$

where β_GLS, the GLS estimator of β based on the true variance components, is given by

$$\beta_{GLS} = (X'\Omega^{-1}X)^{-1}X'\Omega^{-1}y$$

with X and y stacked as previously described.
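Given the closed form (12) for Ω^{-1}, the GLS estimator can be computed without inverting the full NT × NT matrix; a minimal sketch (NumPy assumed, reusing pw_matrix from the earlier sketch):

```python
import numpy as np

def omega_inv(N, T, rho, sigma2_mu, sigma2_eps):
    """Omega^{-1} from equation (12), built from the PW matrix C."""
    C = pw_matrix(T, rho)                 # earlier sketch
    iota_a = C @ np.ones(T)               # iota_T^alpha = C iota_T
    d = iota_a @ iota_a                   # 1 - rho^2 + (T-1)(1-rho)^2
    v = C.T @ iota_a
    block = C.T @ C - sigma2_mu / (sigma2_eps + d * sigma2_mu) * np.outer(v, v)
    return np.kron(np.eye(N), block) / sigma2_eps

def gls(X, y, Omega_inv):
    """beta_GLS = (X' Omega^{-1} X)^{-1} X' Omega^{-1} y."""
    XtOi = X.T @ Omega_inv
    return np.linalg.solve(XtOi @ X, XtOi @ y)
```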


3.3 The fully feasible ordinary predictor

To obtain the fully feasible optimal predictor, the unknown variance components must be substituted by consistent estimates. Two such estimators are described by Wallace and Hussain (1969) and Amemiya (1971). When the parameters of ŷ_{i,T,s} are replaced by their corresponding MLEs, the fully feasible predictor becomes

$$\hat y_{T,s} = x_{T+s}'\hat\beta + (I_N \otimes \hat F)'\hat u \qquad (18)$$

where β̂ and F̂ are the MLEs of β and F, the latter constructed with MLEs of the variance components, and û is the residual vector from the regression with β̂. The asymptotic mean squared error of this predictor, and thus its efficiency, will be compared with those of the following three alternative predictors.

3.4 The truncated predictor

The first of these three alternative predictors is the truncated predictor

$$\hat{\hat y}_{i,T,s} = x_{i,T+s}'\hat\beta \qquad (19)$$

This predictor is based on efficient estimates of the regression parameters, but it does not take any form of autocorrelation into account. It ignores the extra term in (6) and is therefore sub-optimal. This truncated predictor corresponds to the expected value predictor described by Goldberger (1962) and Baillie (1980).

3.5 The misspecified predictor

The second alternative predictor is based on inefficient OLS estimates of the regression parameters,

$$y^*_{i,T,s} = x_{i,T+s}'\hat\beta_{OLS} \qquad (20)$$

where β̂_OLS is the least squares estimator, which ignores the autocorrelated error components in both the estimation and the formation of the predictor. This predictor ignores the extra term F'u_i, which arises from Ω being non-spherical, and therefore corresponds to the situation where the predictor ignores the presence of the error components. This predictor is also considered by Baillie (1980) in the context of the regression model with ARMA disturbances.

3.6 The fixed effect predictor

The last alternative predictor is the fixed effects predictor, which uses the within transformation and assumes the µ_i's are fixed parameters to be estimated. The predictor is given by

$$\tilde y_{i,T,s} = x_{i,T+s}'\tilde\beta + \tilde\mu_i \qquad (21)$$

where the within estimator is defined as

$$\tilde\beta = (X'QX)^{-1}X'Qy$$

with

$$\tilde\mu_i = \bar y_i - \bar x_i'\tilde\beta, \qquad Q = I_N \otimes \left(I_T - \tfrac{1}{T}J_T\right), \qquad \bar y_i = \tfrac{1}{T}\sum_{t=1}^{T} y_{it}, \qquad \bar x_i = \tfrac{1}{T}\sum_{t=1}^{T} x_{it}$$

This predictor is the BLUP given that the µ_i's are constants.
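A minimal sketch of the within estimator and the recovered individual effects (NumPy assumed; X is the NT × K stacked regressor matrix and y the NT × 1 stacked outcome, ordered as above):

```python
import numpy as np

def fixed_effects(X, y, N, T):
    """Within estimator beta_tilde and effects mu_tilde_i = ybar_i - xbar_i' beta_tilde."""
    Xm = X.reshape(N, T, -1)
    ym = y.reshape(N, T)
    Xw = (Xm - Xm.mean(axis=1, keepdims=True)).reshape(N * T, -1)   # QX
    yw = (ym - ym.mean(axis=1, keepdims=True)).ravel()              # Qy
    beta = np.linalg.solve(Xw.T @ Xw, Xw.T @ yw)
    mu = ym.mean(axis=1) - Xm.mean(axis=1) @ beta
    return beta, mu

# Prediction for individual i, s periods ahead: x_{i,T+s}' beta + mu[i]
```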

4 Asymptotic mean squared errors

In order to compare the relative efficiency of these four predictors, it is useful to examine their asymptotic mean squared errors. Previous research on the effect of parameter estimation and the prediction of exogenous variables in the context of dynamic econometric models has been done by e.g. Schmidt (1974, 1977), Lahiri (1975), Baillie (1979, 1980), Yamamoto (1979) and Lütkepohl (1988). The method used in this paper is similar to the ones used in the above literature, except that the parameters of interest in the predictor come from the error components rather than from a dynamic model. The one-way error components regression model with AR(1) remainder disturbances is examined and the relative efficiency of the predictors is compared. The asymptotic mean squared error of a predictor ŷ_{i,T,s} is given by

$$E[(y_{i,T+s} - \hat y_{i,T,s})^2]$$

4.1 The ordinary predictor with known parameters

Starting with the optimal predictor with known parameters (β, ρ, σ_ν², σ_µ²) and y_{i,T,s} as in (6), the asymptotic mean squared error is by definition

$$AMSE(y_{i,T,s}) = E(y_{i,T+s} - y_{i,T,s})^2$$

The ordinary predictor is y_{i,T,s} = x_{i,T+s}'β + F'u_i, so the prediction error is

$$y_{i,T+s} - y_{i,T,s} = \nu_{i,T+s} + \mu_i - F'u_i = \nu_{i,T+s} + \mu_i(1 - F'\iota_T) - F'\nu_i$$

and its square is

$$(y_{i,T+s} - y_{i,T,s})^2 = (\nu_{i,T+s} - F'\nu_i)^2 + \mu_i^2(1 - F'\iota_T)^2 + 2\mu_i(1 - F'\iota_T)(\nu_{i,T+s} - F'\nu_i)$$

Using that the µ_i's and ν_i's are independent of each other, the AMSE becomes

$$E(y_{i,T+s} - y_{i,T,s})^2 = E(\nu_{i,T+s} - F'\nu_i)^2 + E[\mu_i(1 - F'\iota_T)]^2 = \sigma_\nu^2(1 - 2\rho^s F'\Gamma_T + F'\Gamma F) + \sigma_\mu^2(1 - F'\iota_T)^2$$

with Γ as defined before and Γ_T = (ρ^{T−1}, ..., ρ, 1)'. Writing out this term yields the AMSE, which is given by

$$E(y_{i,T+s} - y_{i,T,s})^2 = \sigma_\nu^2\left(1 - \rho^{2s} + d\,\frac{1+\rho}{1-\rho}\,f^2\right) + \sigma_\mu^2\left(1 - \rho^s - (T - (T-2)\rho)f\right)^2 \qquad (22)$$

In this case, the predictor depends entirely on known parameter values, so its prediction AMSE is equivalent to its MSE.
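Equation (22) can be verified numerically against the underlying quadratic form σ_ν²(1 − 2ρ^s F'Γ_T + F'ΓF) + σ_µ²(1 − F'ι_T)²; a short sketch (NumPy assumed, reusing predictor_weights and ar1_corr from the earlier sketches; parameter values illustrative):

```python
import numpy as np

T, s, rho = 7, 3, 0.5
sigma2_mu, sigma2_eps = 1.0, 1.0
sigma2_nu = sigma2_eps / (1 - rho**2)

F = predictor_weights(T, s, rho, sigma2_mu, sigma2_eps)  # earlier sketch
Gamma = ar1_corr(T, rho)                                 # earlier sketch
Gamma_T = rho ** np.arange(T - 1, -1, -1)                # (rho^{T-1}, ..., rho, 1)'

quad = sigma2_nu * (1 - 2 * rho**s * F @ Gamma_T + F @ Gamma @ F) \
       + sigma2_mu * (1 - F.sum())**2

d = 1 - rho**2 + (T - 1) * (1 - rho)**2
f = F[0]
closed = sigma2_nu * (1 - rho**(2 * s) + d * (1 + rho) / (1 - rho) * f**2) \
         + sigma2_mu * (1 - rho**s - (T - (T - 2) * rho) * f)**2

assert np.isclose(quad, closed)  # equation (22) agrees with the quadratic form
```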

4.2 The ordinary predictor with estimated parameters

As noted before, the predictor above is not practically feasible, so the parameters are replaced by their MLEs. This practically feasible version has a prediction error given by

$$y_{i,T+s} - \hat y_{i,T,s} = (y_{i,T+s} - y_{i,T,s}) - (\hat y_{i,T,s} - y_{i,T,s})$$

and

$$\hat y_{i,T,s} - y_{i,T,s} = x_{i,T+s}'(\hat\beta - \beta) + \hat F'\hat u_i - F'u_i = x_{i,T+s}'(\hat\beta - \beta) - F'u_i + \{F + (\hat F - F)\}'\{u_i + (\hat u_i - u_i)\} = (x_{i,T+s}' - F'x_i)(\hat\beta - \beta) + (\hat F - F)'u_i - (\hat F - F)'x_i(\hat\beta - \beta) \qquad (23)$$

To obtain the AMSE, the asymptotic variance-covariance matrix of the maximum likelihood estimators of the parameters ξ = (β, ρ, σ_µ², σ_ν²) is needed:

$$n^{1/2}(\hat\xi - \xi) \xrightarrow{d} N(0, J^{-1}(\xi))$$

where J(ξ) is the Fisher information matrix and n = NT. It is assumed that T is fixed and N varies. The Fisher information matrix is defined as

$$J(\xi) = -E\,\frac{\partial^2 l(\cdot)}{\partial\xi\,\partial\xi'}$$

where l(·) denotes the log-likelihood function, given by

$$l(\cdot) = \text{constant} - \tfrac{1}{2}\log|\Omega| - \tfrac{1}{2}(y - X\beta)'\Omega^{-1}(y - X\beta)$$

Working this through, following Kouassi et al. (2012), the Fisher information matrix becomes

$$J(\xi) = \begin{pmatrix} V_\beta^{-1} & 0 \\ 0 & V_\theta^{-1} \end{pmatrix}$$

with

$$V_\beta = (X'\Omega^{-1}X)^{-1}, \qquad V_\theta = E[(\hat\theta - \theta)(\hat\theta - \theta)'] = \begin{pmatrix} M_1 & M_2 & M_3 \\ M_2 & M_4 & M_5 \\ M_3 & M_5 & M_6 \end{pmatrix}^{-1}$$

where θ = (ρ, σ_µ², σ_ν²). The elements of V_θ^{-1}, with η defined as in (15), are

$$M_1 = \frac{N(T-1)(1+\rho^2)}{(1-\rho^2)^2\sigma_\nu^2} - \frac{2N\eta}{(1+\rho)\sigma_\nu^2}\left(\frac{T-1}{1+\rho} + T - 1 - \frac{1-\rho^{T-1}}{1-\rho}\right) + \frac{2N(T-1)(1-\rho)}{(1+\rho)\sigma_\mu^2\sigma_\nu^2}\left(\rho\sigma_\nu^2 + \left(\frac{d}{1-\rho} - 1\right)\sigma_\mu^2\right)\eta^2$$

$$M_2 = -\frac{N(T-1)\rho}{(1-\rho^2)^2\sigma_\nu^2} - \frac{N\eta}{\sigma_\nu^2}\left(\frac{d}{1-\rho} - 1 + \frac{(T-1)(1-\rho)}{1+\rho}\right) + \frac{Nd\eta^2}{\sigma_\mu^2\sigma_\nu^2}\left(\rho\sigma_\nu^2 + \left(\frac{d}{1-\rho} - 1\right)\sigma_\mu^2\right)$$

$$M_3 = \frac{N\eta}{\sigma_\mu^2}\left(\frac{d}{1-\rho} - 1\right) - \frac{Nd\eta^2}{\sigma_\mu^4}\left(\rho\sigma_\nu^2 + \left(\frac{d}{1-\rho} - 1\right)\sigma_\mu^2\right)$$

$$M_4 = \frac{NT}{2\sigma_\nu^4} - \frac{Nd\eta^2}{2\sigma_\nu^4} - \frac{Nd(1-\rho^2)^2\eta^2}{2\sigma_\mu^2\sigma_\nu^2}$$

$$M_5 = \frac{Nd(1-\rho^2)^2\eta^2}{2\sigma_\mu^4}, \qquad M_6 = \frac{Nd^2\eta^2}{2\sigma_\mu^4}$$

Working out the squared expectation of (23) yields

$$E(\hat y_{i,T,s} - y_{i,T,s})^2 = E\left[(x_{i,T+s}' - F'x_i)(\hat\beta - \beta) + (\hat F - F)'u_i - (\hat F - F)'x_i(\hat\beta - \beta)\right]^2$$

Keeping in mind that β̂ and θ̂ are independent, the expectations of all cross terms involving both β̂ − β and F̂ − F are equal to zero. Therefore,

$$E(\hat y_{i,T,s} - y_{i,T,s})^2 = (x_{i,T+s}' - F'x_i)\,E[(\hat\beta - \beta)(\hat\beta - \beta)']\,(x_{i,T+s}' - F'x_i)' + E[u_i'(\hat F - F)(\hat F - F)'u_i]$$

Using the first order approximation of F,

$$\hat F - F \approx \Lambda(\hat\theta - \theta) \quad \text{with} \quad \Lambda = \begin{pmatrix} \partial f/\partial\rho & \partial f/\partial\sigma_\mu^2 & \partial f/\partial\sigma_\nu^2 \\ \partial g/\partial\rho & \partial g/\partial\sigma_\mu^2 & \partial g/\partial\sigma_\nu^2 \\ \vdots & \vdots & \vdots \\ \partial g/\partial\rho & \partial g/\partial\sigma_\mu^2 & \partial g/\partial\sigma_\nu^2 \\ \partial h/\partial\rho & \partial h/\partial\sigma_\mu^2 & \partial h/\partial\sigma_\nu^2 \end{pmatrix}$$

and f, g and h defined as in (15), together with

$$E[(\hat\beta - \beta)(\hat\beta - \beta)'] = V_\beta, \qquad E[(\hat F - F)(\hat F - F)'] = E[\Lambda(\hat\theta - \theta)(\hat\theta - \theta)'\Lambda'] = \Lambda V_\theta \Lambda'$$

and further assuming that the y_it observations are independent of the estimators of the standard deviations, the AMSE becomes

$$AMSE(\hat y_{i,T,s}) = \sigma_\nu^2\left(1 - \rho^{2s} + d\,\frac{1+\rho}{1-\rho}\,f^2\right) + \sigma_\mu^2\left(1 - \rho^s - (T-(T-2)\rho)f\right)^2 + (x_{i,T+s}' - F'x_i)\,V_\beta\,(x_{i,T+s}' - F'x_i)' + \beta'x_i\Lambda V_\theta \Lambda' x_i'\beta \qquad (24)$$


4.3 The truncated predictor

The truncated predictor as defined before is

$$\hat{\hat y}_{i,T,s} = x_{i,T+s}'\hat\beta$$

This predictor uses efficient estimates of the regression parameters but ignores the last term in the optimal predictor, which is due to the predictable systematic behavior of the error components. The prediction error associated with the truncated predictor is given by

$$y_{i,T+s} - \hat{\hat y}_{i,T,s} = x_{i,T+s}'(\beta - \hat\beta) + u_{i,T+s}$$

Using the GLS estimator of β,

$$\hat\beta = (X'\Omega^{-1}X)^{-1}X'\Omega^{-1}y$$

we have

$$\hat\beta - \beta = (X'\Omega^{-1}X)^{-1}X'\Omega^{-1}(X\beta + u) - \beta = (X'\Omega^{-1}X)^{-1}X'\Omega^{-1}u$$

so that the cross term of the AMSE of the truncated predictor is

$$E[(\hat\beta - \beta)u_{i,T+s}] = (X'\Omega^{-1}X)^{-1}X'\Omega^{-1}E(u\,u_{i,T+s}) = (X'\Omega^{-1}X)^{-1}X'\Omega^{-1}w_i$$

The variance of β̂ is given by

$$Var(\hat\beta) = E[(\hat\beta - \beta)(\hat\beta - \beta)'] = (X'\Omega^{-1}X)^{-1}X'\Omega^{-1}E[uu']\,\Omega^{-1}X(X'\Omega^{-1}X)^{-1} = (X'\Omega^{-1}X)^{-1}$$

Furthermore, with u_{T+s} = µ + ν_{T+s} the N × 1 vector of period-(T + s) disturbances,

$$E[u_{T+s}'u_{T+s}] = E[(\mu + \nu_{T+s})'(\mu + \nu_{T+s})] = N(E[\mu_i^2] + E[\nu_{i,T+s}^2]) = N(\sigma_\mu^2 + \sigma_\nu^2)$$

Averaging over the N individuals, the AMSE of the truncated predictor becomes

$$E(y_{i,T+s} - \hat{\hat y}_{i,T,s})^2 = \frac{1}{N}\left[\mathrm{trace}\left(x_{T+s}'(X'\Omega^{-1}X)^{-1}x_{T+s}\right) - 2\,\mathrm{trace}\left(x_{T+s}'(X'\Omega^{-1}X)^{-1}X'\Omega^{-1}w\right)\right] + (\sigma_\nu^2 + \sigma_\mu^2) \qquad (25)$$
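As a sketch of how (25) can be evaluated for a given design (NumPy assumed; x_Ts stands for the K-vector x_{T+s} and w for the NT-vector E(u u_{i,T+s}) from (13) for the individual of interest):

```python
import numpy as np

def amse_truncated(x_Ts, X, Omega_inv, w, sigma2_mu, sigma2_nu, N):
    """Evaluate equation (25) for one individual's future regressor x_{T+s}."""
    V = np.linalg.inv(X.T @ Omega_inv @ X)   # Var(beta_hat) under GLS
    cross = X.T @ Omega_inv @ w              # X' Omega^{-1} w
    return (x_Ts @ V @ x_Ts - 2 * x_Ts @ V @ cross) / N + sigma2_mu + sigma2_nu
```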

Baillie (1980) found that there were some situations in the case of the regression model with ARMA(p,q) errors where this truncated predictor had a smaller AMSE than the optimal predictor. This was possible when the variability from estimating the error process parameters exceeded the efficiency gain from using the optimal predictor as opposed to ignoring the effect of autocorrelated errors.

4.4 The misspecified predictor

The misspecified OLS predictor is given by

$$\hat y^*_{i,T,s} = x_{i,T+s}'\hat\beta_{OLS}$$

and ignores the autocorrelated error components in both the estimation and the formation of the predictor. The OLS estimator is given by

$$\hat\beta_{OLS} = (X'X)^{-1}X'y$$

The prediction error is

$$y_{i,T+s} - \hat y^*_{i,T,s} = x_{i,T+s}'(\beta - \hat\beta_{OLS}) + u_{i,T+s}$$

The variance of the OLS estimator is

$$Var(\hat\beta_{OLS}) = E[(\beta - \hat\beta_{OLS})(\beta - \hat\beta_{OLS})'] = (X'X)^{-1}X'E[uu']X(X'X)^{-1} = (X'X)^{-1}X'\Omega X(X'X)^{-1}$$

as E[uu'] = Ω. Using this, the AMSE of the misspecified predictor is

$$E[(y_{i,T+s} - \hat y^*_{i,T,s})^2] = x_{i,T+s}'Var(\hat\beta_{OLS})x_{i,T+s} + E(u_{i,T+s}^2) + 2x_{i,T+s}'E[(\beta - \hat\beta_{OLS})u_{i,T+s}]$$

With

$$\beta - \hat\beta_{OLS} = \beta - (X'X)^{-1}X'(X\beta + u) = -(X'X)^{-1}X'u, \qquad E[(\beta - \hat\beta_{OLS})u_{i,T+s}] = -(X'X)^{-1}X'w_i$$

it follows that the AMSE of the misspecified predictor, averaged over individuals, is

$$E[(y_{i,T+s} - \hat y^*_{i,T,s})^2] = \frac{1}{N}\left[x_{T+s}'Var(\hat\beta_{OLS})x_{T+s} + N(\sigma_\mu^2 + \sigma_\nu^2) - 2x_{T+s}'(X'X)^{-1}X'w\right] \qquad (26)$$

Comparing the misspecified predictor to the truncated predictor in terms of efficiency, the last terms of equations (25) and (26) are o_p(1/n), so AMSE(ŷ*_{i,T,s}) − AMSE(ŷ̂_{i,T,s}) ≥ 0 for every N, T and s. This shows that the truncated predictor is more efficient than the misspecified predictor.

4.5 The fixed effect predictor

The fixed effects predictor for y_{i,T+s} is given by

$$\tilde y_{i,T,s} = x_{i,T+s}'\tilde\beta + \tilde\mu_i$$

and is obtained by estimating µ_i for the i-th individual as a fixed parameter. The prediction error is

$$y_{i,T+s} - \tilde y_{i,T,s} = x_{i,T+s}'(\beta - \tilde\beta) + u_{i,T+s} - \tilde\mu_i = x_{i,T+s}'(\beta - \tilde\beta) + u_{i,T+s} - (\bar y_i - \bar x_i'\tilde\beta) = (x_{i,T+s}' - \bar x_i')(\beta - \tilde\beta) + u_{i,T+s} - \bar u_i$$

The AMSE of the fixed effects predictor is

$$E[(y_{i,T+s} - \tilde y_{i,T,s})^2] = E\left[\left((x_{i,T+s}' - \bar x_i')(\beta - \tilde\beta) + u_{i,T+s} - \bar u_i\right)^2\right]$$

The variance of the within estimator is

$$Var(\tilde\beta) = (X'QX)^{-1}X'Q\,E[uu']\,QX(X'QX)^{-1} = (X'QX)^{-1}X'Q\Omega QX(X'QX)^{-1} = \sigma_\nu^2(X'QX)^{-1}X'Q(I_N \otimes \Gamma)QX(X'QX)^{-1}$$

since Q(I_N ⊗ ι_Tι_T')Q = 0, so the µ-part of Ω drops out. Writing out the cross terms of the AMSE gives the following results:

$$E[(\beta - \tilde\beta)'\bar u_i] = -(X'QX)^{-1}X'Q\,E(u'\bar u_i) = -\frac{1}{T}(X'QX)^{-1}X'Q\,\Omega\,(I_N \otimes \iota_T)$$

$$E[(\beta - \tilde\beta)u_{i,T+s}] = -(X'QX)^{-1}X'Q\,E(u\,u_{i,T+s}) = -(X'QX)^{-1}X'Qw_i$$

$$E(\bar u_i u_{i,T+s}) = \frac{1}{T}\,\iota_T'w_i, \qquad E(\bar u'\bar u) = \frac{1}{T^2}(I_N \otimes \iota_T')\,\Omega\,(I_N \otimes \iota_T) = N\sigma_\mu^2 + \frac{N\sigma_\nu^2}{T^2}\,\iota_T'\Gamma\iota_T$$

Therefore,

$$AMSE(\tilde y_{i,T,s}) = \frac{1}{N}\Big(\sigma_\nu^2(x_{T+s} - \bar x)'(X'QX)^{-1}X'Q(I_N \otimes \Gamma)QX(X'QX)^{-1}(x_{T+s} - \bar x) - 2(x_{T+s} - \bar x)'(X'QX)^{-1}X'Qw + \frac{2}{T}(x_{T+s} - \bar x)'(X'QX)^{-1}X'Q\,\Omega\,(I_N \otimes \iota_T) - \frac{2}{T}(I_N \otimes \iota_T')w + N(\sigma_\nu^2 + \sigma_\mu^2) + N\Big(\frac{\sigma_\nu^2}{T^2}\,\iota_T'\Gamma\iota_T + \sigma_\mu^2\Big)\Big) \qquad (27)$$

5 Monte Carlo experiment

5.1 Data generating process

To determine the accuracy of the asymptotic approximation of the MSE of the predictors in the type of sample sizes commonly encountered with panel data, a Monte Carlo experiment is conducted. The data are generated from the simple regression

$$y_{it} = \alpha + \beta x_{it} + u_{it}, \qquad i = 1, \ldots, N;\; t = 1, \ldots, T+s$$

with one-way error components u_it = µ_i + ν_it and ν_it = ρν_{i,t−1} + ε_it. Throughout the experiment the parameters were set at α = 5 and β = 0.5, with the total variance σ² = σ_ν² + σ_µ² fixed at 20. The variable x_it was generated as in Nerlove (1971) with

$$x_{it} = 0.1t + 0.5x_{i,t-1} + \omega_{it}$$

where ω_it is a random variable uniformly distributed on the interval [−0.5, 0.5] and x_i0 = 5 + 10ω_i0. The first 20 period observations are discarded to minimize the effect of initial values. Predictions were made for one, three and five periods ahead. It is important to mention that the predictor for this panel data model changes with s only through x_{i,T+s}: it is the presence of the same individual in the panel that creates the correlation over time, and this constant correlation does not die out no matter how far ahead the prediction is. Typical labor or consumer panels have large N and small T, so the sample sizes used in this experiment are N = 20, 50, 100, 200 and T = 3, 10, 20, with 1000 replications performed for each experiment. For each replication, N(T + s) + N NID(0, 1) random numbers are generated; the first N(T + s) are used to generate the ν_it's from NID(0, σ_ν²) and the remaining N are used to generate the µ_i's from NID(0, σ_µ²).
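A minimal sketch of this data generating process (NumPy assumed; simulate_disturbances is the earlier sketch, and the Nerlove-type regressor equation is the one stated above):

```python
import numpy as np

rng = np.random.default_rng(1)

def generate_panel(N, T, s, rho, psi, alpha=5.0, beta=0.5, sigma2=20.0):
    """Generate y_it = alpha + beta x_it + u_it for t = 1, ..., T+s."""
    sigma2_mu = psi * sigma2
    sigma2_nu = (1.0 - psi) * sigma2
    sigma2_eps = sigma2_nu * (1.0 - rho**2)
    burn = 20                                    # discarded initial periods
    P = T + s + burn
    omega = rng.uniform(-0.5, 0.5, size=(N, P + 1))
    x = np.empty((N, P + 1))
    x[:, 0] = 5.0 + 10.0 * omega[:, 0]           # x_i0 = 5 + 10 omega_i0
    for t in range(1, P + 1):
        x[:, t] = 0.1 * t + 0.5 * x[:, t - 1] + omega[:, t]
    u = simulate_disturbances(N, P, rho, sigma2_mu, sigma2_eps)  # earlier sketch
    x, u = x[:, burn + 1:], u[:, burn:]          # drop burn-in
    return x, alpha + beta * x + u               # N x (T+s) arrays x, y
```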

With this design and using the total variance σ² = σ_ν² + σ_µ², the implied values of ψ = σ_µ²/σ² are 0.01, 0.225, 0.45, 0.675 and 0.9; ρ also takes these values. For each of the predictors considered in this thesis, the AMSE for multiple-step-ahead predictions was computed from the formulas derived previously, and the sampling MSE is computed as

$$MSE = \frac{1}{NR}\sum_{r=1}^{R}\sum_{i=1}^{N}(y_{i,T+s} - \hat y_{i,T,s})^2$$

Following Spitzer and Baillie (1983), the quantity

$$AMSE\ BIAS\ VARIANCE = \frac{1}{NR}\sum_{r=1}^{R}\sum_{i=1}^{N}\left\{(y_{i,T+s} - \hat y_{i,T,s})^2 - AMSE(\hat y_{i,T,s})\right\}^2$$

is computed, where the summation extends over all R = 1000 replications and N individuals for each (T, ρ, ψ, s). Defining q = MSE − AMSE, it is possible to test H0: q = 0 against H1: q ≠ 0 using the statistic

$$Z = \frac{\sqrt{R}\,q}{(AMSE\ BIAS\ VARIANCE)^{1/2}}$$

Since T is fixed for each experiment, Z, for large R, will be approximately distributed as N(0, 1), since AMSE BIAS VARIANCE is an estimate of the population variance of q and both the MSE and the AMSE are χ²(1) variables.
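A sketch of the sampling MSE and the Z statistic (NumPy assumed; pred_err is a hypothetical R × N array of prediction errors y_{i,T+s} − ŷ_{i,T,s} across replications and amse the corresponding analytic value):

```python
import numpy as np

def z_statistic(pred_err, amse):
    """Z test of H0: MSE - AMSE = 0, following Spitzer and Baillie (1983)."""
    R = pred_err.shape[0]
    sq = pred_err**2
    mse = sq.mean()                      # (1/NR) sum over replications and individuals
    bias_var = ((sq - amse)**2).mean()   # AMSE BIAS VARIANCE
    return np.sqrt(R) * (mse - amse) / np.sqrt(bias_var)
```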

5.2 Estimation of the parameters

The method used to estimate the parameters is derived from the method of Baltagi and Li (1997), who recommend a consistent estimator of ρ based on the autocovariance function Q_s = σ_µ² + σ_ν²ρ^s. From here, using Q_0, Q_1 and Q_2, one can show that ρ + 1 = (Q_0 − Q_2)/(Q_0 − Q_1). Therefore, a consistent estimator of ρ is given by

$$\hat\rho_0 = \frac{\tilde Q_0 - \tilde Q_2}{\tilde Q_0 - \tilde Q_1} - 1 = \frac{\tilde Q_1 - \tilde Q_2}{\tilde Q_0 - \tilde Q_1} \qquad (28)$$

with

$$\tilde Q_s = \frac{1}{N(T-s)}\sum_{i=1}^{N}\sum_{t=s+1}^{T}\hat u_{it}\hat u_{i,t-s}$$

where û_it denotes the OLS residual from (1). The transformed variance-covariance matrix Ω* from (10) is rewritten to obtain its spectral decomposition, using the notation Cι_T = (1 − ρ)ι_T^γ, ι_T^{γ'} = (γ, ι_{T−1}') and γ = √((1 + ρ)/(1 − ρ)). Following Baltagi (2008), the transformed variance-covariance matrix can be rewritten as

$$\Omega^* = \sigma_\gamma^2(I_N \otimes \bar J^\gamma) + \sigma_\epsilon^2(I_N \otimes E_T^\gamma)$$

using J̄^γ = ι_T^γι_T^{γ'}/d² and E_T^γ = I_T − J̄^γ to utilize the Wansbeek and Kapteyn (1978) trick. Furthermore, d² = ι_T^{γ'}ι_T^γ = γ² + (T − 1) and σ_γ² = d²σ_µ²(1 − ρ)² + σ_ε². The best quadratic unbiased estimators of the variance components arise naturally from the spectral decomposition and are given by

$$\hat\sigma_{\epsilon,0}^2 = \frac{\hat u'(I_N \otimes E_T^\gamma)\hat u}{N(T-1)}, \qquad \hat\sigma_{\gamma,0}^2 = \frac{\hat u'(I_N \otimes \bar J^\gamma)\hat u}{N}$$

where û here denotes the OLS residuals from the PW-transformed equation using ρ̂_0. Transforming σ̂_{γ,0}² back to σ̂_{µ,0}², one can obtain the estimated variance-covariance matrix Ω̂_0 using equation (12). Using the GLS estimator of β from section 3.2, β̂_0 is estimated. With this estimate β̂_0, new residuals are formed, from which the next estimator ρ̂_1 is computed. Iterating this whole process until convergence gives consistent estimators of (β, σ_µ², σ_ν², ρ). In the Monte Carlo simulation, all replications with negative estimates of σ_µ² or σ_ν² are discarded; the numbers of discarded replications are reported in the tables.
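A sketch of the iteration (NumPy assumed; pw_matrix, omega_inv and gls are the earlier sketches; for simplicity a fixed number of passes replaces the convergence check, and negative variance estimates are clipped rather than discarded as in the thesis):

```python
import numpy as np

def rho_hat(uhat):
    """Baltagi-Li estimator (28) from residual autocovariances Q_0, Q_1, Q_2."""
    T = uhat.shape[1]
    Q = [np.mean(uhat[:, k:] * uhat[:, :T - k]) for k in (0, 1, 2)]
    return (Q[1] - Q[2]) / (Q[0] - Q[1])

def estimate(X, y, N, T, n_iter=20):
    """Iterate rho -> variance components -> GLS beta."""
    beta = np.linalg.solve(X.T @ X, X.T @ y)             # OLS start
    for _ in range(n_iter):
        uhat = (y - X @ beta).reshape(N, T)
        rho = rho_hat(uhat)
        ustar = uhat @ pw_matrix(T, rho).T               # PW-transformed residuals
        gamma = np.sqrt((1 + rho) / (1 - rho))
        d2 = gamma**2 + (T - 1)
        iota_g = np.r_[gamma, np.ones(T - 1)]
        Jbar = np.outer(iota_g, iota_g) / d2
        E = np.eye(T) - Jbar
        s2_eps = sum(ui @ E @ ui for ui in ustar) / (N * (T - 1))
        s2_gamma = sum(ui @ Jbar @ ui for ui in ustar) / N
        # sigma_gamma^2 = d^2 sigma_mu^2 (1 - rho)^2 + sigma_eps^2
        s2_mu = max((s2_gamma - s2_eps) / (d2 * (1 - rho)**2), 1e-8)
        beta = gls(X, y, omega_inv(N, T, rho, s2_mu, s2_eps))
    return beta, rho, s2_mu, s2_eps
```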

6 Results

In this section, the effect of the different parameters on the mean squared error is analyzed. This is done by plotting graphs varying the parameters in multiple settings for the panel sizes considered.

6.1 One period prediction

The first figure shows multiple graphs plotting ψ against the asymptotic mean squared error, for five different values of ρ, for the size N = 50, T = 10 and a one-step prediction. These graphs show that the ordinary predictor outperforms every other predictor for all values of ρ and ψ. The truncated and the misspecified predictors perform worse than the other predictors and are close to constant over all values considered. The exception holds for ρ = 0.01 and ψ = 0.01, when these predictors perform better than the ordinary and the fixed effects predictors. For very low values of ψ and low values of ρ, the fixed effects predictor is outperformed by all other predictors; it becomes the second best predictor when these parameters increase. For higher values of ρ, the fixed effects predictor outperforms the truncated and the misspecified predictor for all values of ψ. If there is close to no autocorrelation present in the disturbances, the ordinary and the fixed effects predictor perform equally well for higher ψ.

Figure 1 (N = 50, T = 10, s = 1)

The reduction in the AMSE of the ordinary predictor when ψ = 0.9 is about fifty-fold when ρ is small and around five-fold when ρ is large. For the ordinary predictor the sampling results are close to the analytic ones, as seen from the closeness of the MSE to the AMSE; the only exceptions are (ρ = 0.45, ψ = 0.9), where the Z statistic takes a very high value of 3.6, and ρ = 0.9 with ψ = 0.675 and ψ = 0.9. There the Z statistic is high and the MSE is significantly different from the AMSE, which is due to some large outliers produced by the estimators of the parameters (β, σ_µ², σ_ν², ρ). The only time the MSE differs significantly from the AMSE for the truncated predictor is for ψ = ρ = 0.9. The analytical and the sampling results of the misspecified predictor and the fixed effects predictor are very close to each other.

Settings with ψ = 0.01 generally give more discarded replications, as they are more likely to produce negative estimates of the variances of the error components. When ρ increases, more replications are thrown away for having infeasible estimates; this is at its maximum when ρ = 0.9, with almost 30% of all replications discarded for having negative estimates. When ρ increases, the fixed effects and the ordinary predictor perform better. For lower values of ρ and when ψ = 0.01, the fixed effects predictor performs worse than the truncated and the misspecified predictor, but when ρ increases it is a close second to the ordinary predictor.

Figure 2 (N = 50, T = 10, s = 1)

The macro panel, which has size N = 20, T = 20, shows the same characteristics as the N = 50, T = 10 panel, with the exception that the difference between the fixed effects predictor and the ordinary predictor is greater when ρ = 0.9 for this size. The sampling results of the ordinary predictor for this size differ more than when the data set is larger. There are large differences in the Z statistic for (ρ = 0.225, ψ = 0.675), (0.45, 0.9), (0.675, 0.9) and for multiple values of ψ when ρ = 0.9. Outliers occur when ρ = 0.9 and ψ = 0.675 or ψ = 0.9. There are even large Z values for the truncated predictor for ρ = 0.9 with ψ = 0.225 and ψ = 0.9.

Figure 3 shows that the ordinary predictor outperforms the others for this size as well, for all values of ψ and ρ. The truncated and the misspecified predictors stay constant around 20 for all parameter values considered. The fixed effects predictor performs worse for low values of ρ for this panel size relative to the size considered before, but performs better for higher values of ρ; it is again the second best predictor for higher values of ρ and ψ. The sampling results are fairly close to the analytical ones, with some exceptions for the ordinary predictor, where the Z statistic has values greater than 1 when ψ = 0.9 (except for ρ = 0.9), and for ρ = 0.9 with ψ = 0.45 and ψ = 0.675. The MSE and the AMSE of the other predictors are very close to each other. In this panel the fixed effects predictor is closer to the ordinary predictor for higher values of ρ, as opposed to the N = 50, T = 10 size, where these two predictors were closer to each other for lower values of ρ.

Figure 3 (N = 200, T = 3, s = 1)

For the micro panels, more infeasible estimates had to be removed from the replications than for the other data sizes. Comparing the results above to those for a panel with fewer individuals, that is N = 100, T = 3, shows that less severe outliers occur when N is larger.

6.2 Longer period prediction

Looking at the three-period prediction for the predictors in the panel of size N = 50, T = 10, a low autocorrelation coefficient has only a minor effect on the mean squared errors over time. However, the efficiencies of the predictors fall when ρ increases, especially for the ordinary and the fixed effects predictor, compared to the same values for a one-step prediction. For low values of ψ, the fixed effects predictor performs worse than the truncated and the misspecified predictor in the three-period prediction. These last two predictors both stay close to constant over s, ρ and ψ. The fixed effects predictor performs worse for low ψ when ρ rises, but its AMSE decreases when ρ increases.

Figure 4 (N = 50, T = 10, s = 5)

A longer-step prediction produces worse sampling results, as the MSEs for some values, especially for ρ = 0.9, are much higher than their analytical counterparts, although the Z value is lower than 1. Other data sizes give an even higher deviation of the MSE relative to the AMSE. A five-step prediction for a size of N = 100, T = 3 does not give usable sampling results for the ordinary predictor; the rest of the predictors give good sampling predictions in terms of the closeness of their MSE to their AMSE. The same change shows for a longer time prediction for the other sizes. The ordinary predictor still outperforms the other predictors for all values considered, and the fixed effects predictor performs second best apart from situations where ρ and ψ are both low. The next figure shows that when ψ is close to zero, the truncated and the misspecified predictor outperform the fixed effects predictor; when ψ increases, the fixed effects predictor performs better. If ρ increases, the fixed effects predictor performs worse, as does the ordinary predictor but to a lesser degree.

Figure 5 (N = 50, T = 10, s = 5)

Looking into the development of the asymptotic mean squared error over time for nine different combinations for the N = 50, T = 10 panel, one can see that if ρ is close to zero, the AMSE is close to constant over time. If ψ equals 0.01, the effect of the autocorrelation is more present than it is for higher values of ψ; this is due to the fact that when ψ is close to zero, the variance of the serially correlated component is higher than it is for higher values of ψ. The ordinary predictor outperforms the other predictors, apart from the situations where ψ = 0.01 and ρ < 0.45, where the ordinary predictor performs equally well as the truncated and the misspecified predictors; the fixed effects predictor performs worst of all the predictors in this situation. For all other prediction steps considered, the truncated and the misspecified predictor are close to constant and are outperformed by the ordinary and the fixed effects predictor. The ordinary predictor performs a third worse when ρ = 0.45 for a three-step prediction than it does for a one-step prediction; this declines to performing half as well when ρ = 0.9. The same holds for the fixed effects predictor, only less severely. The same pattern is seen when a five-step prediction is made: the predictors without a serially correlated error term stay constant over time, as opposed to the predictors when ρ = 0.9, which continue to perform worse over time.

Figure 6 (N = 50, T = 10)

Comparing these results to other panel sizes shows that when the time dimension is made smaller, the fixed effects predictor performs worse relative to the ordinary predictor when ρ is small, compared to when the panel has T = 10. The opposite holds when ρ = 0.9; in that case the fixed effects predictor has an AMSE closer to that of the ordinary predictor.

7 Discussion

To create the sampling results of these four predictors, a Monte Carlo simulation was conducted. A data generating process was formed in order to reproduce real-life panel data situations, and the predictors were constructed from the data produced by this DGP. As the true variances are not known in real life, estimators had to be used to compute predictions for the ordinary and the truncated predictor. The estimators used for these unknown parameters were not based on the maximum likelihood function; they are consistent, but produce infeasible solutions in this simulation. Only negative estimates of the variances of the two disturbance components were treated as unusable, and these infeasible solutions were discarded from the replications used to compute the mean squared error. Not discarding positive outliers leads to some unrealistic outcomes in the mean squared error of the ordinary predictor, and sometimes even in the truncated predictor. This is because the mean squared error tends to weight outliers heavily, since it squares each term, weighing large errors more heavily than small ones. With this type of estimator, which clearly produces outliers, a better method for comparing efficiencies could be a statistic less sensitive to outliers, such as the mean absolute error.

The replications thrown away for having negative estimates of the variances of the error components could rise up to 55% of the replications performed; this is also the reason why no more restrictions were placed on the estimators. In future experiments this could be done to see if the number of outliers is reduced. Looking at the tables, it can be seen that although some of the sampling results are much higher than the analytical results, the Z score stays under 1, which shows that in most replications performed the two are a close fit; a small number of replications produce a very high sampling result. In order to reduce the number of outliers, different estimators could be used. The estimator for ρ derived by Baltagi and Li (1997) only worked well for large N, as can be seen in the MSE of the N = 100, T = 3 table, especially for a longer prediction. Other estimators are proposed by Baltagi, Matyas and Sevestre (2008). Of course, the maximum likelihood estimators could also be used for computing the ordinary and the truncated predictor; this has been done for panel data with autocorrelated disturbances without an individual disturbance µ_i in the error terms by, for instance, Spitzer and Baillie (1983).

In figure 5, the AMSE decreases after ρ = 0.675, as opposed to the increasing mean squared error for lower values of ρ. This could be due to a faulty estimator for ρ, as this is seen for the other panel sizes too. Higher values of ρ did produce more unusable estimates, which can be seen in the number of discarded replications. More likely, ρ = 0.9 would produce an even higher mean squared error than ρ = 0.675.

Comparing the results of the one-step prediction with those of Kouassi et al. (2012), notable differences arise. Apart from minor differences in the data generating processes, most results are alike. The exception is all the values of the ordinary predictor when ρ equals 0.9; in this case the sampling results and even the analytical results are significantly different from those of Kouassi et al. (2012). This might be due to incorrect derivations of the predictor or faulty programming.

Comparing to the papers by Kouassi et al. (2012) and Baillie and Baltagi (1999), it was found that the ordinary and the fixed effects predictor both performed best, as was also found by Yamamoto (1979), who researched these predictors in the case of regression analysis. For all panel sizes, the one-step predictions when close to no serial correlation is present are in line with the results found by Baillie and Baltagi (1999).

8 Conclusion

Four predictors were compared on their efficiency, based on the mean squared errors they produced both in a sampling and in an analytical way, in the regression model with error components, where the components were assumed to follow a one-way error components model with the remainder disturbances following an AR(1) process.

For most predictors, apart from the ordinary predictor, the sampling results are very close to the analytical results, with some exceptions. These exceptions mostly occur when the size of the panel is smaller, in other words when the number of observations is smaller than for other sizes. This stays more or less the same when a longer prediction is made: more observations mean a better prediction when this estimation of the parameters is used.

When the autocorrelation coefficient is close to zero (ρ = 0.01), the analytical results, and to a lesser degree (because of some outliers) the sampling results, both show there are gains in mean squared prediction error from using the ordinary predictor instead of the truncated and the misspecified predictors for all values of ψ. The reduction in MSE is about tenfold for ρ equal to 0.9 and twofold for ρ equal to 0.675 for all sizes considered. These results support the ordinary predictor outperforming the other predictors in a one-way error components model with no autocorrelation, and it is recommended when predicting with panel data whose error terms are not serially correlated.

When ρ, the autocorrelation coefficient, increases, the results shift. The truncated and the misspecified predictor stay somewhat constant over ρ and over ψ. The ordinary predictor continues to outperform the other predictors and becomes more efficient when ρ increases, and the fixed effects predictor stays the second best predictor over this change. Even for high values of ρ and ψ, the ordinary predictor is four times as good as the truncated and the misspecified predictors. The mean squared errors of the ordinary and the fixed effects predictor decrease about tenfold as ρ increases from 0.01 to 0.9; this is found for all values of N and T considered.

For ρ and ψ close to 0, both the truncated and the misspecified predictor perform slightly better than the ordinary and the fixed effects predictors, and this continues to hold for a longer time prediction. When ψ increases, the fixed effects and the ordinary predictor perform better, so the truncated and the misspecified predictor are not recommended. A larger panel results in better estimates, which in turn result in better predictions; in an N = 50, T = 10 panel the fixed effects predictor gives a better prediction than when T is smaller. This effect is clearer when a longer prediction is formed.

When there is close to no autocorrelation in the error terms, a multi-step forecast will likely be constant over time; however, when ρ increases, the forecasts become more uncertain as the MSE starts to rise. The truncated and the fixed effects predictors are constant over time, as they are oblivious to the autocorrelation in forming their predictions. The fixed effects predictor is parallel to the ordinary predictor when looking at a longer time prediction.

In general, the ordinary predictor performs best, apart from the situation where ρ and ψ are close to 0; in that case the truncated and the misspecified predictor perform better. When ψ increases, the fixed effects predictor performs better and the difference between the ordinary and the fixed effects predictor decreases; different values of ρ do not change much about this difference. The fixed effects predictor does give better sampling results than the ordinary predictor. When computing a longer prediction, this ranking of the predictors does not change much. For a normal one-way error components model, i.e. with no serially correlated error term, the mean squared error is close to constant. The ordinary and the fixed effects predictor perform worse when the prediction step is increased, and the fixed effects predictor even performs worse than the truncated and the misspecified predictor for low values of ψ. Therefore, the fixed effects predictor is second best to the ordinary predictor, as it produces the second best result in most of the situations considered.


References

• Amemiya, T. (1971), "The Estimation of Variances in a Variance Components Model," International Economic Review, 12: 1-13.

• Baillie, R.T. (1979), "The Asymptotic Mean Squared Error of Multistep Prediction from the Regression Model with Autoregressive Errors," Journal of the American Statistical Association, 74: 175-184.

• Baillie, R.T. (1980), "Prediction from ARMAX Models," Journal of Econometrics, 12: 365-374.

• Baillie, R.T. and Baltagi, B.H. (1999), "Prediction from the Regression Model with One-Way Error Components," Chapter 10 in C. Hsiao, K. Lahiri, L.F. Lee and H. Pesaran, eds., Analysis of Panels and Limited Dependent Variable Models (Cambridge University Press, Cambridge), 255-267.

• Balestra, P. and Nerlove, M. (1966), "Pooling Cross-Section and Time-Series Data in the Estimation of a Dynamic Model: The Demand for Natural Gas," Econometrica, 34: 585-612.

• Baltagi, B.H. (2008), "Forecasting with Panel Data," Journal of Forecasting, 27: 153-173.

• Baltagi, B.H. and Li, Q. (1992), "Prediction in the One-Way Error Component Model with Serial Correlation," Journal of Forecasting, 11: 561-567.

• Baltagi, B.H. and Li, Q. (1997), "Monte Carlo Results on Pure and Pretest Estimators of an Error Component Model with Autocorrelated Disturbances," Annales d'Économie et de Statistique, 48: 69-82.

• Baltagi, B.H., Matyas, L. and Sevestre, P. (2008), "Error Components Models," in Matyas and Sevestre, eds., The Econometrics of Panel Data, Third Edition (Springer Verlag), 49-88.

• Berry, S., Gottschalk, P. and Wissoker, D. (1988), "An Error Components Model of the Impact of Plant Closing on Earnings," Review of Economics and Statistics, 70: 701-707.

• Breusch, T.S. (1987), "Maximum Likelihood Estimation of Random Effects Models," Journal of Econometrics, 36: 383-389.

• Goldberger, A.S. (1962), "Best Linear Unbiased Prediction in the Generalized Linear Regression Model," Journal of the American Statistical Association, 57: 369-375.

• Kouassi, E., Sango, J., Bosson Brou, J.M., Teubissi, F.N. and Kymn, K.O. (2012), "Prediction from the One-Way Error Components Model with AR(1) Disturbances," Journal of Forecasting, 31(7): 617-638.

• Lahiri, K. (1975), "Multiperiod Predictions in Dynamic Models," International Economic Review, 16: 699-711.

• Lillard, L.A. and Weiss, Y. (1979), "Components of Variation in Panel Earnings Data: American Scientists 1960-1970," Econometrica, 47: 437-454.

• Lütkepohl, H. (1988), "Prediction Tests for Structural Stability," Journal of Econometrics, 39: 267-296.

• Nerlove, M. (1971), "Further Evidence on the Estimation of Dynamic Economic Relations from a Time-Series of Cross-Sections," Econometrica, 39: 359-383.

• Schmidt, P. (1974), "The Asymptotic Distribution of Forecasts in the Dynamic Simulation of an Econometric Model," Econometrica, 42: 303-309.

• Schmidt, P. (1977), "Some Small Sample Evidence on the Distribution of Dynamic Simulation Forecasts," Econometrica, 45: 997-1005.

• Spitzer, J.J. and Baillie, R.T. (1983), "Small-Sample Properties of Predictions from the Regression Model with Autoregressive Errors," Journal of the American Statistical Association, 78: 258-263.

• Swamy, P.A.V.B. and Arora, S.S. (1972), "The Exact Finite Sample Properties of the Estimators of Coefficients in the Error Components Regression Models," Econometrica, 40: 261-275.

• Taub, A.J. (1979), "Prediction in the Context of the Variance-Components Model," Journal of Econometrics, 10: 103-107.

• Wallace, T.E. and Hussain, A. (1969), "The Use of Error Component Models in Combining Cross Section and Time Series Data," Econometrica, 37: 55-72.

• Wansbeek, T. and Kapteyn, A. (1978), "The Separation of Individual Variation and Systematic Change in the Analysis of Panel Data," Annales de l'INSEE, 30-31: 659-680.

• Yamamoto, T. (1979), "On the Prediction Efficiency of the Generalized Least Squares Model with an Estimated Variance Covariance Matrix," International Economic Review, 20: 693-705.
