
Forecasting with panels under the

Correlated Random Effects framework

as in Mundlak (1978)

Alejandro Brondino

Student number: 10840869

Date of final version: August 12, 2015

Master’s programme: Econometrics

Specialisation: Free Track

Supervisor: Prof. Andrew Pua

Second reader: Prof. Maurice Bun


Statement of Originality:

This document is written by Student Alejandro Brondino who declares to take full responsibility for the contents of this document.

I declare that the text and the work presented in this document is original and that no sources other than those mentioned in the text and its references have been used in creating it.

The Faculty of Economics and Business is responsible solely for the supervision of completion of the work, not for the contents.


Contents

1 A first approach to the problem
  1.1 Introduction
  1.2 The correlated random effects framework by Mundlak (1978)
  1.3 Forecasting environment
  1.4 Forecasting rules

2 BLUPs for post sample prediction
  2.1 Deriving the BLUPs
  2.2 (A)MSE of the predictors in the post sample dimension
    2.2.1 The Ordinary Predictor (OP)
      2.2.1.1 MSE of the OP when all parameters are known
      2.2.1.2 AMSE of the FOP
    2.2.2 The Truncated Predictor (TP)
    2.2.3 The Ordinary Least Squares predictor (OLS)
    2.2.4 The Fixed Effects predictor (FE)

3 BLUPs for out of sample prediction
  3.1 Deriving the BLUPs
  3.2 (A)MSE of the predictors in the out of sample dimension
    3.2.1 The Truncated Predictor (TP)
    3.2.2 The Ordinary Least Squares predictor (OLS)
    3.2.3 The Truncated Fixed Effects predictor (TFE)

4 Monte Carlo Results
  4.1 Monte Carlo Experiment Design
  4.2 Results
    4.2.1 Experiment 1
      4.2.1.1 Post Sample Case
      4.2.1.2 Out of Sample Case
    4.2.2 Experiment 3
      4.2.2.1 Post Sample Case
      4.2.2.2 Out of Sample Case
  4.3 Conclusions
    4.3.1 Post Sample prediction
    4.3.2 Out of Sample prediction
    4.3.3 General conclusion

A Asymptotic variance and covariance matrix of the MLE
  A.1 Setup and derivatives
    A.1.1 Partial derivatives w.r.t. $\gamma$
    A.1.2 Partial derivatives w.r.t. $\sigma_\mu$
    A.1.3 Partial derivatives w.r.t. $\sigma_v$
  A.2 The information matrix

B Results used in Chapter 2 and Chapter 3

C Other tables and figures
  C.1 Tables of correlation between $\alpha_i$ and $\bar{x}_i$
  C.2 Figures
    C.2.1 Experiment 2

A first approach to the problem

1.1 Introduction

Forecasting using panel data has numerous areas of application in economics and in other disciplines. To mention a few examples, Frees and Miller (2004) forecasted sales of lottery tickets in Wisconsin using a panel with demographic characteristics from fifty different zip codes in the course of forty weeks; Fiebig and Johar (2014) forecasted health expenditure for Australians, based on a micro-panel; Chamberlain and Hirano (1999) studied the case of earnings forecasts for individuals, based on data on one’s earnings history and on various personal characteristics such as age and education; Baltagi and Li (2006) forecasted the demand for liquor based on a panel of 43 states tracked for 29 years; Schmalensee et al. (1998) projected world carbon dioxide emissions up to year 2050, based on a national-level panel for period 1950-1990.

It is important to note that working with panels implies that forecasts can be made in two directions: a forecast for an individual that is part of the panel, outside of the time frame where data are available, and a forecast for an individual that is not part of the panel, for the period analyzed. Forecasts can also be made for an individual that is outside of the panel in both dimensions, but this thesis will not analyze such a case. Adopting the terminology of Fiebig and Johar (2014), we will refer to the forecast in the time dimension as a "post sample" forecast and to the forecast for an individual that is not part of the panel as an "out of sample" forecast. As stated by Baltagi (2007), most of the research has focused on the former, making the latter a particularly interesting area of study.

Additionally, the research mentioned in the survey by Baltagi (2007) is based either on the random effects model or on the fixed effects model, but never on the correlated random effects framework. To the best of our knowledge, only Fiebig and Johar (2014) used an application in the correlated random effects framework, but they provided no formal derivation of the Best Linear Unbiased Predictors (henceforth BLUPs) and did not derive the MSE or AMSE of the different predictors.

The correlated random effects framework as described in Chamberlain (1984) is a more general case of the random effects model. According to Cameron and Trivedi (2005), the key weakness of the random effects model is the strong assumption that the individual specific effects are independent of the regressors. It is up to the investigator to evaluate whether that assumption is reasonable for each particular case, but a more general framework might be necessary, and the BLUPs have to be derived and available for research purposes.

A particular case of the correlated random effects framework is the one described by Mundlak (1978), in which the correlation between the explanatory variables ($x_{i,t}$) and the individual specific effect ($\alpha_i$) is constant over time. It models the relationship between $x_{i,t}$ and $\alpha_i$ as a linear function of $\bar{x}_i$. This simplification allows us to estimate the correlation between the explanatory variables and the individual specific effect as a $K \times 1$ vector that will be called $\pi$. On the other hand, the more general model by Chamberlain (1984) involves the use of a $K \times T$ matrix for the same purpose, leading to a much larger number of parameters to be estimated.

An interesting feature of the correlated random effects model is that it decomposes the individual specific effect into an observable and an unobservable part:
$$\alpha_i = \underbrace{\bar{x}_i'\pi}_{\text{observable}} + \underbrace{\mu_i}_{\text{unobservable}}.$$
We expect this decomposition to be particularly useful when predicting for out of sample units, because no other predictor that we are aware of uses a forecast for the individual specific effect (due to the lack of information about $y_{j,t}$ where unit $j$ is outside of the panel). Working with this model allows us to predict the observable part of the individual specific effect for out of sample units, potentially leading to a better performing predictor.

The objective of this thesis is to follow the procedure used by Goldberger (1962) to derive the BLUPs for the correlated random effects framework as in Mundlak (1978), both for the post sample forecast and for the out of sample forecast. Additionally, we want to measure the performance of these BLUPs (in the sense of AMSE) and compare it with that of other predictors, to determine whether they are indeed preferred and under which circumstances. The theoretical AMSE will be derived for all the forecasting rules, and a Monte Carlo experiment will be performed in order to obtain an empirical MSE of the predictors. A comparison between the theoretical AMSE and the empirical MSE will be useful to determine whether the AMSE is a good approximation in a finite sample experiment.

This thesis is organized as follows: Section 1 corresponds to the introduction of the problem and the econometric framework; Section 2 contains the derivations necessary for the post sample predictor; Section 3 consists of the derivations for the out of sample predictor; Section 4 includes the Monte Carlo experiment design, the results and the conclusions. In Appendix A and Appendix B we show some derivations necessary to obtain the results in the main body of the thesis; in Appendix C we show additional information about the Monte Carlo experiments.


1.2 The correlated random effects framework by Mundlak (1978)

Assumptions A1 are based on the correlated random effects framework by Mundlak (1978):

A1: $y_{i,t} = x_{i,t}'\beta + u_{i,t}$

with $y_{i,t}$ a scalar, $x_{i,t}'$ a $1 \times K$ vector, $u_{i,t} = \alpha_i + v_{i,t}$ a scalar and $\alpha_i = \bar{x}_i'\pi + \mu_i$. $\beta$ and $\pi$ are $K \times 1$ vectors of coefficients. Throughout this thesis $x_{i,t}$, $\beta$ and $\pi$ are treated as non-stochastic. In the original model by Mundlak we have the $1 \times K$ vector $\bar{x}_i' \equiv T^{-1}\sum_{t=1}^{T} x_{i,t}'$. It is important to note that $\bar{x}_i$ is a device to model the correlation between $\alpha_i$ and the explanatory variables.

Mundlak does not specify how we should adapt this device for forecasting purposes. In particular, in the following chapters we will work under the assumption that we only have information about the explanatory variables up to period $T+s$. We decided to include this information in $\bar{x}_i$, because it is likely to be as useful as all the previous values of $x_{i,t}$ included in the mean. So in the present work we use
$$\bar{x}_i' \equiv (T+s)^{-1}\sum_{t=1}^{T+s} x_{i,t}'.$$
The downside of this choice is that the model has to be re-estimated every time we have new information about the explanatory variables, but the alternative of using just the first $T$ values of $x_{i,t}$ for the mean seems rather arbitrary.

Under assumptions A1 we can express all the outcomes of the panel in vector form:
$$y = X\beta + \bar{X}\pi + \mu + v \qquad (1.1)$$
where
$$y \equiv \begin{bmatrix} y_{1,1} \\ \vdots \\ y_{1,T} \\ \vdots \\ y_{N,1} \\ \vdots \\ y_{N,T} \end{bmatrix}_{NT \times 1}, \quad X \equiv \begin{bmatrix} x_{1,1}' \\ \vdots \\ x_{1,T}' \\ \vdots \\ x_{N,1}' \\ \vdots \\ x_{N,T}' \end{bmatrix}_{NT \times K}, \quad u \equiv \begin{bmatrix} \alpha_1 + v_{1,1} \\ \vdots \\ \alpha_1 + v_{1,T} \\ \vdots \\ \alpha_N + v_{N,1} \\ \vdots \\ \alpha_N + v_{N,T} \end{bmatrix}_{NT \times 1}, \quad \iota_T \equiv \begin{bmatrix} 1 \\ \vdots \\ 1 \end{bmatrix}_{T \times 1},$$
$$\bar{X} \equiv \begin{bmatrix} \big((T+s)^{-1}\sum_{t=1}^{T+s} x_{1,t}'\big) \otimes \iota_T \\ \vdots \\ \big((T+s)^{-1}\sum_{t=1}^{T+s} x_{N,t}'\big) \otimes \iota_T \end{bmatrix}_{NT \times K}, \quad v \equiv \begin{bmatrix} v_{1,1} \\ \vdots \\ v_{1,T} \\ \vdots \\ v_{N,1} \\ \vdots \\ v_{N,T} \end{bmatrix}_{NT \times 1}, \quad \mu \equiv \begin{bmatrix} \mu_1\iota_T \\ \vdots \\ \mu_N\iota_T \end{bmatrix}_{NT \times 1}.$$

Additionally, we define the $NT \times 2K$ matrix $A \equiv [X \;\; \bar{X}]$ and the $2K \times 1$ vector $\gamma \equiv \begin{bmatrix} \beta \\ \pi \end{bmatrix}$, so that we can rewrite equation (1.1):
$$y = A\gamma + \mu + v \qquad (1.2)$$
We also make the following assumptions for the error term:

A2: $v_{i,t} \sim \mathrm{iid}\,[0, \sigma_v^2]$ and $\mu_i \sim \mathrm{iid}\,[0, \sigma_\mu^2]$, with $v_{i,t}$ independent of $\mu_i$.

Some implications of assumptions A2 are:

• $E[v_{i,t} v_{j,s}] = E[v_{i,t}]E[v_{j,s}] = 0$ when $j \neq i$ or $t \neq s$.

• $E[\mu_i \mu_j] = E[\mu_i]E[\mu_j] = 0$ when $j \neq i$.

• $E[\mu_i v_{j,t}] = E[\mu_i]E[v_{j,t}] = 0$ for every $i$, $j$ and $t$.

• $E[\mu \mu_i] = \sigma_\mu^2 L$, where the $NT \times 1$ vector $L$ is defined as the $i$th column of $I_N \otimes \iota_T$.

• $E[(\mu + v)(\mu + v)'] = \sigma_\mu^2 (I_N \otimes \iota_T\iota_T') + \sigma_v^2 I_{NT} \equiv \Omega$, which we can rewrite as $\Omega \equiv \sigma_\mu^2 T P + \sigma_v^2 I_{NT}$ with $P \equiv T^{-1}(I_N \otimes \iota_T\iota_T')$.

• $\Omega^{-1} = \frac{1}{\sigma_v^2} Q + \frac{1}{T\sigma_\mu^2 + \sigma_v^2} P$, where $Q \equiv I_{NT} - P$.
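As a quick sanity check of the error-component structure above, the following sketch (in Python with NumPy, which is an assumption — the thesis does not state the software used) builds $\Omega$, $P$ and $Q$ for a small panel and verifies the closed-form expression for $\Omega^{-1}$ numerically.

```python
import numpy as np

# Small panel dimensions and variances for the check (illustrative values, not from the thesis).
N, T = 4, 3
sigma_mu2, sigma_v2 = 2.0, 1.5

I_NT = np.eye(N * T)
iota_T = np.ones((T, 1))

# P averages within each individual; Q removes individual means.
P = np.kron(np.eye(N), iota_T @ iota_T.T) / T
Q = I_NT - P

# Omega = sigma_mu^2*(I_N kron iota iota') + sigma_v^2*I_NT = sigma_mu^2*T*P + sigma_v^2*I_NT
Omega = sigma_mu2 * T * P + sigma_v2 * I_NT

# Closed-form inverse from the last bullet above.
Omega_inv = Q / sigma_v2 + P / (T * sigma_mu2 + sigma_v2)

# Should print a value numerically indistinguishable from zero.
print(np.max(np.abs(Omega @ Omega_inv - I_NT)))
```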

1.3 Forecasting environment

We make the following assumptions about the information available to the investigator:

A3: Data about $y_{i,t}$ are only available for $i = 1, \ldots, N$ and $t = 1, \ldots, T$ (note that $y_{j,t}$ are not available for $j \notin [1, \ldots, N]$). Data about $x_{i,t}$ are only available for $i = 1, \ldots, N, j$ and $t = 1, \ldots, T+s$. Vectors $\beta$ and $\pi$ are unknown to the investigator. The variances of the error components $\sigma_v^2$ and $\sigma_\mu^2$ are unknown. Scalars $\mu_i$ and $v_{i,t}$ are the unobservable part of the error term, so information about them is unavailable.

When we refer to a post sample forecast, based on assumptions A1 we want to predict
$$y_{i,T+s} = x_{i,T+s}'\beta + \bar{x}_i'\pi + \mu_i + v_{i,T+s} \qquad (1.3)$$
where $s > 0$ and $i \in [1, \ldots, N]$.

For the out of sample forecast, based on assumptions A1 we want to predict
$$y_{j,t} = x_{j,t}'\beta + \bar{x}_j'\pi + \mu_j + v_{j,t} \qquad (1.4)$$
where $j \notin [1, \ldots, N]$ and $t \in [1, \ldots, T]$.

1.4 Forecasting rules

The Ordinary Predictor (OP) is derived in Chapter 2 and is proved to be the BLUP for post sample units under assumptions A1, A2 and A3. If information on $\sigma_\mu^2$ and $\sigma_v^2$ is available to the investigator we have
$$\hat{y}_{i,T+s,OP} = \theta\bar{e}_{i,GLS} + A_{i,T+s}\hat{\gamma}_{GLS} \qquad (1.5)$$
with $\theta \equiv \frac{T\sigma_\mu^2}{T\sigma_\mu^2 + \sigma_v^2}$, $\hat{\gamma}_{GLS} \equiv (A'\Omega^{-1}A)^{-1}A'\Omega^{-1}y$, $\bar{e}_{i,GLS} \equiv T^{-1}\sum_{t=1}^{T}(y_{i,t} - A_{i,t}\hat{\gamma}_{GLS})$ the average of the GLS residuals for individual $i$ over periods 1 to $T$, and $A_{i,T+s} \equiv [x_{i,T+s}' \;\; \bar{x}_i']_{1 \times 2K}$. If $\sigma_\mu^2$ and $\sigma_v^2$ have to be estimated, the fully Feasible Ordinary Predictor (FOP) is
$$\hat{y}_{i,T+s,FOP} = \hat{\theta}_{MLE}\bar{e}_{i,MLE} + A_{i,T+s}\hat{\gamma}_{MLE}$$
where $\hat{\theta}_{MLE}$ and $\hat{\gamma}_{MLE}$ are based on the MLE of $\sigma_\mu^2$, $\sigma_v^2$ and $\gamma$, computed following the method of iterated GLS as in Breusch (1987), and $\bar{e}_{i,MLE} \equiv T^{-1}\sum_{t=1}^{T}(y_{i,t} - A_{i,t}\hat{\gamma}_{MLE})$. Observe that these predictors require the use of $y_{i,t}$ for $t = 1, \ldots, T$ in order to compute $\bar{e}_{i,MLE}$, so they cannot be applied for the out of sample forecast (assumptions A3 state that we have no information about $y_{j,t}$ for $j \notin [1, \ldots, N]$).
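To make the notation concrete, here is a small sketch (Python/NumPy is an assumed choice; the thesis does not specify its software) that computes $\hat{\gamma}_{GLS}$, the individual residual means $\bar{e}_{i,GLS}$, and the OP and TP forecasts of equations (1.5) and (1.6), taking $\sigma_\mu^2$ and $\sigma_v^2$ as known.

```python
import numpy as np

def gls_predictors(y, A, A_f, N, T, sigma_mu2, sigma_v2):
    """y: (N*T,) stacked outcomes; A: (N*T, 2K) regressors [X, Xbar];
    A_f: (N, 2K) rows A_{i,T+s} used for forecasting; returns OP and TP forecasts."""
    NT = N * T
    P = np.kron(np.eye(N), np.ones((T, T))) / T          # within-individual averaging
    Q = np.eye(NT) - P
    Omega_inv = Q / sigma_v2 + P / (T * sigma_mu2 + sigma_v2)

    # GLS estimator of gamma = (beta', pi')'
    gamma_gls = np.linalg.solve(A.T @ Omega_inv @ A, A.T @ Omega_inv @ y)

    # Average GLS residual per individual and the weight theta
    ebar = (y - A @ gamma_gls).reshape(N, T).mean(axis=1)
    theta = T * sigma_mu2 / (T * sigma_mu2 + sigma_v2)

    y_tp = A_f @ gamma_gls                 # Truncated Predictor (1.6)
    y_op = y_tp + theta * ebar             # Ordinary Predictor (1.5)
    return y_op, y_tp
```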

The Truncated Predictor (TP) is derived in Chapter 3 and is proved to be the BLUP for out of sample units under assumptions A1, A2 and A3. If information on $\sigma_\mu^2$ and $\sigma_v^2$ is available, for the post sample forecast we have
$$\hat{y}_{i,T+s,TP} = A_{i,T+s}\hat{\gamma}_{GLS} \qquad (1.6)$$
If $\sigma_\mu^2$ and $\sigma_v^2$ have to be estimated, the fully Feasible Truncated Predictor (FTP) is given by
$$\hat{y}_{i,T+s,FTP} = A_{i,T+s}\hat{\gamma}_{MLE}$$
Observe that the TP ignores the term $\theta\bar{e}_{i,GLS}$ of equation (1.5), so it is equal in expectation to our OP, but it does not use the information about the error terms to improve the forecast. Similarly, for the out of sample forecasts we use
$$\hat{y}_{j,t,TP} = A_{j,t}\hat{\gamma}_{GLS}, \qquad \hat{y}_{j,t,FTP} = A_{j,t}\hat{\gamma}_{MLE}.$$

The Fixed Effects (FE) predictor has been widely used in the literature; in particular, we use the same definition as Baillie and Baltagi (1999). This predictor assumes that $\alpha_i$ is a fixed parameter to be estimated, and uses the within transformation to do so. For post sample prediction we have
$$\hat{y}_{i,T+s,FE} = x_{i,T+s}'\hat{\beta}_{FE} + \hat{\alpha}_{i,FE} \qquad (1.7)$$
where $\hat{\beta}_{FE} \equiv (X'QX)^{-1}X'Qy$, $Q \equiv I_{NT} - T^{-1}(I_N \otimes \iota_T\iota_T')$ and $\hat{\alpha}_{i,FE} = T^{-1}\sum_{t=1}^{T}(y_{i,t} - x_{i,t}'\hat{\beta}_{FE})$. The fixed effects predictor under assumptions A1 and A2 is based on consistent estimation of the parameters, as shown by Mundlak (1978). Observe that the FE predictor requires the use of $y_{i,t}$ for $t = 1, \ldots, T$ in order to compute $\hat{\alpha}_{i,FE}$, so it cannot be applied for the out of sample forecast (assumptions A3 state that we have no information about $y_{j,t}$ for $j \notin [1, \ldots, N]$).
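The within transformation and the FE predictor above can be sketched as follows (again in Python/NumPy as an assumed language; `Q` is the within-transformation matrix defined in the text).

```python
import numpy as np

def fe_predictor(y, X, x_f, N, T):
    """y: (N*T,) outcomes; X: (N*T, K) regressors; x_f: (N, K) rows x_{i,T+s}.
    Returns the FE forecasts x_{i,T+s}' beta_FE + alpha_{i,FE}."""
    NT = N * T
    Q = np.eye(NT) - np.kron(np.eye(N), np.ones((T, T))) / T   # within transformation

    # Within (fixed effects) estimator of beta
    beta_fe = np.linalg.solve(X.T @ Q @ X, X.T @ Q @ y)

    # Estimated individual effects: average within-individual residual
    alpha_fe = (y - X @ beta_fe).reshape(N, T).mean(axis=1)

    return x_f @ beta_fe + alpha_fe
```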

The Truncated Fixed Effects predictor (TFE) was used by Fiebig and Johar (2014) as an adaptation of the FE predictor for out of sample units. It uses the feasible part of the FE predictor when the second term of equation (1.7) cannot be estimated. For the out of sample case we have
$$\hat{y}_{j,t,TFE} = x_{j,t}'\hat{\beta}_{FE}$$

The Ordinary Least Squares predictor (OLS) is important because it is easy to implement, and investigators are often tempted to use it. For the post sample case we have
$$\hat{y}_{i,T+s,OLS} = x_{i,T+s}'\hat{\beta}_{OLS}$$
where $\hat{\beta}_{OLS} \equiv (X'X)^{-1}X'y$. It is important to note that under assumptions A1 and A2 this is a misspecified estimation of $\beta$. The Gauss–Markov assumptions are not fulfilled; in particular, the error terms are correlated with the regressors through $\bar{X}\pi$, so that $E[\hat{\beta}_{OLS}] \neq \beta$. For the out of sample forecast we have
$$\hat{y}_{j,t,OLS} = x_{j,t}'\hat{\beta}_{OLS}$$

BLUPs for post sample prediction

2.1 Deriving the BLUPs

We want to derive the BLUPs following the procedure used by Goldberger (1962). We work under assumptions A3, so $x_{i,t}$ are available for $i = 1, \ldots, N$ and $t = 1, \ldots, T+s$, and $y_{i,t}$ are known only for $i = 1, \ldots, N$ and $t = 1, \ldots, T$. Rewriting equation (1.3), the $s$ periods ahead value of $y_i$ is:
$$y_{i,T+s} = A_{i,T+s}\gamma + \mu_i + v_{i,T+s}$$
with $A_{i,T+s} \equiv [x_{i,T+s}' \;\; \bar{x}_i']$ a $1 \times 2K$ vector. We want to find the $NT \times 1$ vector $c$ such that:

1. $E[c'y - y_{i,T+s}] = 0$
2. $\mathrm{Var}[c'y - y_{i,T+s}]$ is at its minimum.

Combining (1.2) and (1.3) we can rewrite the left hand side of the first condition as:
$$
\begin{aligned}
E[c'y - y_{i,T+s}] &= 0 \\
E[c'A\gamma + c'\mu + c'v - A_{i,T+s}\gamma - \mu_i - v_{i,T+s}] &= 0 \\
E[(c'A - A_{i,T+s})\gamma] &= 0 \\
(c'A - A_{i,T+s})\gamma &= 0 \\
c'A - A_{i,T+s} &= 0 \qquad (2.1)
\end{aligned}
$$
where the third equality follows from assumptions A2, the fourth equality follows from $\gamma$ and $x_{i,t}$ being non-stochastic, and the last equality follows from the assumption that $\gamma \neq 0$ (note that in the case in which $\gamma = 0$, $x_{i,t}$ have no explanatory power over $y_{i,t}$, so it is not an interesting scenario).

Next we use (2.1) to rewrite the second condition:
$$
\begin{aligned}
\mathrm{Var}[c'y - y_{i,T+s}] &= \mathrm{Var}[c'(\mu+v) - \mu_i - v_{i,T+s}] \\
&= E\big[(c'(\mu+v) - \mu_i - v_{i,T+s})((\mu+v)'c - \mu_i - v_{i,T+s})\big] \\
&= c'E[(\mu+v)(\mu+v)']c - 2c'\big(E(\mu\mu_i) + E(v\mu_i) + E(\mu v_{i,T+s}) + E(v v_{i,T+s})\big) \\
&\quad - 2E(\mu_i v_{i,T+s}) + E(\mu_i^2) + E(v_{i,T+s}^2) \\
&= c'\Omega c - 2\sigma_\mu^2 c'L + \sigma_\mu^2 + \sigma_v^2 \qquad (2.2)
\end{aligned}
$$
where the last equality follows from assumptions A2.

Following Goldberger (1962) we want to choose $c$ such that it minimizes (2.2) subject to (2.1). To achieve this we use a Lagrange function with the $1 \times 2K$ vector $\lambda'$ as the Lagrange multiplier:
$$\min_{c,\lambda}\; c'\Omega c - 2\sigma_\mu^2 c'L + \sigma_\mu^2 + \sigma_v^2 - 2\lambda'(A'c - A_{i,T+s}')$$
We take first derivatives w.r.t. $c$ and $\lambda$:
$$
\begin{aligned}
\frac{\partial}{\partial c}\big(c'\Omega c - 2\sigma_\mu^2 c'L - 2\lambda'(A'c - A_{i,T+s}')\big) &= 0 \\
2\Omega c - 2\sigma_\mu^2 L - 2A\lambda &= 0 \\
\Omega c + A(-\lambda) &= \sigma_\mu^2 L \qquad (2.3)
\end{aligned}
$$
$$
\begin{aligned}
\frac{\partial}{\partial \lambda}\big(c'\Omega c - 2\sigma_\mu^2 c'L - 2\lambda'(A'c - A_{i,T+s}')\big) &= 0 \\
2A'c - 2A_{i,T+s}' &= 0 \\
A'c &= A_{i,T+s}' \qquad (2.4)
\end{aligned}
$$
We have a system of $NT + 2K$ equations under full rank conditions. We can solve it for $c$ and $\lambda$ in matrix form combining (2.3) and (2.4):
$$
\begin{bmatrix} \Omega & A \\ A' & 0_{2K \times 2K} \end{bmatrix}
\begin{bmatrix} \hat{c} \\ -\hat{\lambda} \end{bmatrix}
=
\begin{bmatrix} \sigma_\mu^2 L \\ A_{i,T+s}' \end{bmatrix}
$$

$$
\begin{aligned}
\begin{bmatrix} \hat{c} \\ -\hat{\lambda} \end{bmatrix}
&=
\begin{bmatrix}
\Omega^{-1}\big(I_{NT} - A(A'\Omega^{-1}A)^{-1}A'\Omega^{-1}\big) & \Omega^{-1}A(A'\Omega^{-1}A)^{-1} \\
(A'\Omega^{-1}A)^{-1}A'\Omega^{-1} & -(A'\Omega^{-1}A)^{-1}
\end{bmatrix}
\begin{bmatrix} \sigma_\mu^2 L \\ A_{i,T+s}' \end{bmatrix} \\
\hat{c} &= \Omega^{-1}\big(I_{NT} - A(A'\Omega^{-1}A)^{-1}A'\Omega^{-1}\big)\sigma_\mu^2 L + \Omega^{-1}A(A'\Omega^{-1}A)^{-1}A_{i,T+s}' \\
\hat{c}' &= \sigma_\mu^2 L'\big(I_{NT} - \Omega^{-1}A(A'\Omega^{-1}A)^{-1}A'\big)\Omega^{-1} + A_{i,T+s}(A'\Omega^{-1}A)^{-1}A'\Omega^{-1} \\
\hat{c}' &= \sigma_\mu^2 L'\Omega^{-1} + \big(-\sigma_\mu^2 L'\Omega^{-1}A + A_{i,T+s}\big)(A'\Omega^{-1}A)^{-1}A'\Omega^{-1} \qquad (2.5)
\end{aligned}
$$
where the second equality uses the formula for the inverse of a partitioned matrix.¹

Recall that the BLUP was defined as $c'y$. We can further develop that expression by plugging equation (2.5) in:
$$
\begin{aligned}
\hat{c}'y &= \sigma_\mu^2 L'\Omega^{-1}y + \big(-\sigma_\mu^2 L'\Omega^{-1}A + A_{i,T+s}\big)(A'\Omega^{-1}A)^{-1}A'\Omega^{-1}y \\
&= \sigma_\mu^2 L'\Omega^{-1}y - \sigma_\mu^2 L'\Omega^{-1}A\hat{\gamma}_{GLS} + A_{i,T+s}\hat{\gamma}_{GLS} \\
&= \sigma_\mu^2 L'\Omega^{-1}y - \sigma_\mu^2 L'\Omega^{-1}\hat{y}_{GLS} + A_{i,T+s}\hat{\gamma}_{GLS} \\
&= \sigma_\mu^2 L'\Omega^{-1}(y - \hat{y}_{GLS}) + A_{i,T+s}\hat{\gamma}_{GLS} \\
&= \sigma_\mu^2 L'\Omega^{-1}e_{GLS} + A_{i,T+s}\hat{\gamma}_{GLS} \qquad (2.6) \\
&= \theta\bar{e}_{i,GLS} + A_{i,T+s}\hat{\gamma}_{GLS} \qquad (2.7)
\end{aligned}
$$
where the second equality recognizes the GLS estimate of $\gamma$, $\hat{\gamma}_{GLS} = (A'\Omega^{-1}A)^{-1}A'\Omega^{-1}y$. The third equality uses $A\hat{\gamma}_{GLS} = \hat{y}_{GLS}$, the GLS fitted values of $y$. The fifth equality recognizes $y - \hat{y}_{GLS} = e_{GLS}$, the sample residuals from the GLS estimation of $y$ using $A$ as explanatory variables. The last equality combines result (B.1) (derived in Appendix B) with equation (2.6). Note that (2.7) is the same expression as the definition of $\hat{y}_{i,T+s,OP}$ given in Chapter 1 in equation (1.5).

Equation (2.7) involves the use of $\sigma_\mu^2$ and $\sigma_v^2$, which might not be available to the investigator. The feasible BLUP involves the MLE (using the method of iterated GLS as in Breusch (1987)) of $\theta$² and $\gamma$:
$$\hat{y}_{i,T+s,FOP} = \hat{\theta}_{MLE}\bar{e}_{i,MLE} + A_{i,T+s}\hat{\gamma}_{MLE} \qquad (2.8)$$

¹ Starting with the partitioned matrix $\begin{bmatrix} A & B \\ C & D \end{bmatrix}$, if $A$ is non-singular and $D - CA^{-1}B$ is non-singular,
$$\begin{bmatrix} A & B \\ C & D \end{bmatrix}^{-1} = \begin{bmatrix} A^{-1} + A^{-1}B(D - CA^{-1}B)^{-1}CA^{-1} & -A^{-1}B(D - CA^{-1}B)^{-1} \\ -(D - CA^{-1}B)^{-1}CA^{-1} & (D - CA^{-1}B)^{-1} \end{bmatrix}.$$
² Recall the definition given in Chapter 1: $\theta \equiv T\sigma_\mu^2/(T\sigma_\mu^2 + \sigma_v^2)$.
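As an illustration of the Goldberger construction above, the following sketch (Python/NumPy, an assumed choice; the data are randomly generated and purely illustrative) solves the partitioned system for $\hat{c}$ directly and checks numerically that $\hat{c}'y$ coincides with $\theta\bar{e}_{i,GLS} + A_{i,T+s}\hat{\gamma}_{GLS}$ from equation (2.7).

```python
import numpy as np

rng = np.random.default_rng(0)
N, T, K, s, i = 5, 6, 2, 1, 0          # small illustrative panel; forecast unit i
NT = N * T
sigma_mu2, sigma_v2 = 1.0, 2.0

# Regressors, their (T+s)-period means, and the stacked design A = [X, Xbar]
x = rng.normal(size=(N, T + s, K))
xbar = x.mean(axis=1)                                   # (T+s)-period individual means
X = x[:, :T, :].reshape(NT, K)
Xbar = np.repeat(xbar, T, axis=0)
A = np.hstack([X, Xbar])
A_f = np.concatenate([x[i, T + s - 1], xbar[i]])        # A_{i,T+s}, 1 x 2K

# Error components and outcomes y = A*gamma + mu + v
gamma = rng.normal(size=2 * K)
mu_i = rng.normal(scale=np.sqrt(sigma_mu2), size=N)
y = A @ gamma + np.repeat(mu_i, T) + rng.normal(scale=np.sqrt(sigma_v2), size=NT)

# Omega and the selection vector L (i-th column of I_N kron iota_T)
P = np.kron(np.eye(N), np.ones((T, T))) / T
Omega = sigma_mu2 * T * P + sigma_v2 * np.eye(NT)
L = np.kron(np.eye(N)[:, i], np.ones(T))

# Solve the partitioned first-order conditions (2.3)-(2.4) for c
KKT = np.block([[Omega, A], [A.T, np.zeros((2 * K, 2 * K))]])
rhs = np.concatenate([sigma_mu2 * L, A_f])
c = np.linalg.solve(KKT, rhs)[:NT]

# Closed-form OP from (2.7)
Omega_inv = np.linalg.inv(Omega)
gamma_gls = np.linalg.solve(A.T @ Omega_inv @ A, A.T @ Omega_inv @ y)
theta = T * sigma_mu2 / (T * sigma_mu2 + sigma_v2)
ebar_i = (y - A @ gamma_gls).reshape(N, T)[i].mean()

print(c @ y, theta * ebar_i + A_f @ gamma_gls)          # the two numbers should agree
```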

2.2 (A)MSE of the predictors in the post sample dimension

There are some important properties that will be used repeatedly in this section and in Section 3.2 as well. To save time and space we enumerate and name these properties here, so that we can recall them later. It is important to note that the (A)MSE is the inner product of a vector with itself, meaning that it is a scalar.

• Property 1: For a scalar $a$, $a = a'$, so that $a + a' = 2a$.

• Property 2: Scalars are equal to their trace, $a = \mathrm{tr}(a)$.

• Property 3: $\mathrm{tr}(AB) = \mathrm{tr}(BA)$ as long as $A$ is an $i \times j$ matrix and $B$ is a $j \times i$ matrix.

• Property 4: Matrix $Q$ transforms time-invariant elements to a vector of zeros.

2.2.1 The Ordinary Predictor (OP)

Following the procedure used by Baillie and Baltagi (1999) we want to derive the MSE of the OP (2.7) and the AMSE of the FOP (2.8). The MSE is defined as
$$MSE(\hat{y}) = E\big[(y - \hat{y})'(y - \hat{y})\big] \qquad (2.9)$$
In this section we will work on the derivations of the MSE using vectorization, because it will be computationally more efficient when we apply it to the Monte Carlo simulations. In order to achieve this we define:
$$y_{T+s} \equiv \begin{bmatrix} y_{1,T+s} \\ \vdots \\ y_{N,T+s} \end{bmatrix}_{N \times 1}, \quad \hat{y}_{T,s} \equiv \begin{bmatrix} \hat{y}_{1,T+s} \\ \vdots \\ \hat{y}_{N,T+s} \end{bmatrix}_{N \times 1}, \quad \mu_{T+s} \equiv \begin{bmatrix} \mu_1 \\ \vdots \\ \mu_N \end{bmatrix}_{N \times 1}, \quad \bar{v} \equiv \begin{bmatrix} T^{-1}\sum_{t=1}^{T} v_{1,t} \\ \vdots \\ T^{-1}\sum_{t=1}^{T} v_{N,t} \end{bmatrix}_{N \times 1},$$
$$v_{T+s} \equiv \begin{bmatrix} v_{1,T+s} \\ \vdots \\ v_{N,T+s} \end{bmatrix}_{N \times 1}, \quad \bar{e} \equiv \begin{bmatrix} \bar{e}_1 \\ \vdots \\ \bar{e}_N \end{bmatrix}_{N \times 1} = \mu_{T+s} + \bar{v}, \quad A_{T+s} \equiv \begin{bmatrix} A_{1,T+s} \\ \vdots \\ A_{N,T+s} \end{bmatrix}_{N \times 2K}, \quad \bar{A} \equiv \begin{bmatrix} T^{-1}\sum_{t=1}^{T} A_{1,t} \\ \vdots \\ T^{-1}\sum_{t=1}^{T} A_{N,t} \end{bmatrix}_{N \times 2K}.$$

2.2.1.1 MSE of the OP when all parameters are known

First we define the OP when all parameters are known:
$$\hat{y}_{T,s,OPKP} = A_{T+s}\gamma + \theta\bar{e} \qquad (2.10)$$
Using (1.3) and (2.10), we have:
$$
\begin{aligned}
y_{T+s} - \hat{y}_{T,s,OPKP} &= A_{T+s}\gamma + \mu_{T+s} + v_{T+s} - A_{T+s}\gamma - \theta\bar{e} \\
&= \mu_{T+s} + v_{T+s} - \theta\mu_{T+s} - \theta\bar{v} \\
&= (1-\theta)\mu_{T+s} + v_{T+s} - \theta\bar{v}
\end{aligned}
$$
We compute the MSE, defined as in (2.9):
$$
\begin{aligned}
MSE(\hat{y}_{T,s,OPKP}) &= E\big[((1-\theta)\mu_{T+s} + v_{T+s} - \theta\bar{v})'((1-\theta)\mu_{T+s} + v_{T+s} - \theta\bar{v})\big] \\
&= \underbrace{E\big[\mu_{T+s}'\mu_{T+s}(1-\theta)^2\big]}_{a_1} + 2\underbrace{E\big[(1-\theta)\mu_{T+s}'v_{T+s}\big]}_{a_2} - 2\underbrace{E\big[(1-\theta)\theta\mu_{T+s}'\bar{v}\big]}_{a_3} \\
&\quad + \underbrace{E\big[v_{T+s}'v_{T+s}\big]}_{a_4} - 2\underbrace{E\big[\theta v_{T+s}'\bar{v}\big]}_{a_5} + \underbrace{E\big[\theta^2\bar{v}'\bar{v}\big]}_{a_6}
\end{aligned}
$$
where the last equality uses Property 1. We work on the terms one by one:
$$
\begin{aligned}
a_1 &= (1-\theta)^2 E\big[\mu_{T+s}'\mu_{T+s}\big] = (1-\theta)^2\sum_{i=1}^{N} E[\mu_i^2] = N(1-\theta)^2\sigma_\mu^2 \\
a_2 &= 0, \qquad a_3 = 0, \qquad a_5 = 0 \\
a_4 &= \sum_{i=1}^{N} E[v_{i,T+s}^2] = N\sigma_v^2 \\
a_6 &= \theta^2 E[\bar{v}'\bar{v}] = \theta^2\sum_{i=1}^{N} E\Big[\Big(T^{-1}\sum_{t=1}^{T} v_{i,t}\Big)^2\Big] = \theta^2 T^{-2}\sum_{i=1}^{N}\sum_{t=1}^{T} E[v_{i,t}^2] = \theta^2 T^{-1}N\sigma_v^2
\end{aligned}
$$
Putting it all together, we reach the final expression for $MSE(\hat{y}_{T,s,OPKP})$:
$$MSE(\hat{y}_{T,s,OPKP}) = N\big[(1-\theta)^2\sigma_\mu^2 + \sigma_v^2 + \theta^2 T^{-1}\sigma_v^2\big] \equiv \sigma_{T+s}^2 \qquad (2.11)$$
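A quick simulation check of (2.11) can be sketched as follows (Python/NumPy, an assumed choice; the parameter values are arbitrary). It draws the error components directly and compares the simulated average of $(y_{T+s} - \hat{y}_{T,s,OPKP})'(y_{T+s} - \hat{y}_{T,s,OPKP})$ with the closed form $\sigma_{T+s}^2$.

```python
import numpy as np

rng = np.random.default_rng(1)
N, T, R = 50, 10, 5000                  # panel size and number of replications (illustrative)
sigma_mu2, sigma_v2 = 6.0, 14.0
theta = T * sigma_mu2 / (T * sigma_mu2 + sigma_v2)

# Forecast error of the known-parameter OP: (1-theta)*mu + v_{T+s} - theta*vbar
mu = rng.normal(scale=np.sqrt(sigma_mu2), size=(R, N))
v_future = rng.normal(scale=np.sqrt(sigma_v2), size=(R, N))
vbar = rng.normal(scale=np.sqrt(sigma_v2), size=(R, N, T)).mean(axis=2)
err = (1 - theta) * mu + v_future - theta * vbar

mse_sim = (err ** 2).sum(axis=1).mean()
mse_theory = N * ((1 - theta) ** 2 * sigma_mu2 + sigma_v2 + theta ** 2 * sigma_v2 / T)
print(mse_sim, mse_theory)              # the two values should be close
```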

2.2.1.2 AMSE of the FOP

Following Baillie and Baltagi (1999) we want to derive the AMSE of the FOP. We will do this in two steps. We start from equation number 14 of Baillie and Baltagi (1999):
$$y_{T+s} - \hat{y}_{T,s,FOP} = (y_{T+s} - \hat{y}_{T,s,OPKP}) - (\hat{y}_{T,s,FOP} - \hat{y}_{T,s,OPKP})$$
$$
\begin{aligned}
\hat{y}_{T,s,FOP} - \hat{y}_{T,s,OPKP} &= A_{T+s}\hat{\gamma}_{MLE} + \hat{\theta}_{MLE}\bar{e}_{MLE} - A_{T+s}\gamma - \theta\bar{e} \\
&= A_{T+s}(\hat{\gamma}_{MLE} - \gamma) + \hat{\theta}_{MLE}\bar{e}_{MLE} - \theta\bar{e} \\
&= (A_{T+s} - \theta\bar{A})(\hat{\gamma}_{MLE} - \gamma) + (\hat{\theta}_{MLE} - \theta)\bar{e} - (\hat{\theta}_{MLE} - \theta)\bar{A}(\hat{\gamma}_{MLE} - \gamma) \qquad (2.12)
\end{aligned}
$$
where the last equality uses result (B.2) from Appendix B. With equation (2.12) we can now derive the MSE of the difference between the FOP and the OP:
$$
\begin{aligned}
MSE(\hat{y}_{T,s,FOP} - \hat{y}_{T,s,OPKP}) &= \underbrace{E\big[(\hat{\gamma}_{MLE}-\gamma)'(A_{T+s}-\theta\bar{A})'(A_{T+s}-\theta\bar{A})(\hat{\gamma}_{MLE}-\gamma)\big]}_{b_1} \\
&\quad + 2\underbrace{E\big[(\hat{\gamma}_{MLE}-\gamma)'(A_{T+s}-\theta\bar{A})'(\hat{\theta}_{MLE}-\theta)\bar{e}\big]}_{b_2} \\
&\quad - 2\underbrace{E\big[(\hat{\gamma}_{MLE}-\gamma)'(A_{T+s}-\theta\bar{A})'(\hat{\theta}_{MLE}-\theta)\bar{A}(\hat{\gamma}_{MLE}-\gamma)\big]}_{b_3} \\
&\quad + \underbrace{E\big[(\hat{\theta}_{MLE}-\theta)^2\bar{e}'\bar{e}\big]}_{b_4} - 2\underbrace{E\big[(\hat{\theta}_{MLE}-\theta)\bar{e}'(\hat{\theta}_{MLE}-\theta)\bar{A}(\hat{\gamma}_{MLE}-\gamma)\big]}_{b_5} \\
&\quad + \underbrace{E\big[(\hat{\gamma}_{MLE}-\gamma)'\bar{A}'(\hat{\theta}_{MLE}-\theta)^2\bar{A}(\hat{\gamma}_{MLE}-\gamma)\big]}_{b_6}
\end{aligned}
$$
Working term by term we have:
$$
\begin{aligned}
b_1 &= \mathrm{tr}\Big[E\big[(\hat{\gamma}_{MLE}-\gamma)'(A_{T+s}-\theta\bar{A})'(A_{T+s}-\theta\bar{A})(\hat{\gamma}_{MLE}-\gamma)\big]\Big] \\
&= \mathrm{tr}\Big[(A_{T+s}-\theta\bar{A})'(A_{T+s}-\theta\bar{A})\,E\big[(\hat{\gamma}_{MLE}-\gamma)(\hat{\gamma}_{MLE}-\gamma)'\big]\Big] \\
&= \mathrm{tr}\Big[(A_{T+s}-\theta\bar{A})'(A_{T+s}-\theta\bar{A})(A'\Omega^{-1}A)^{-1}\Big]
\end{aligned}
$$
where the first equality uses Property 2, the second equality uses Property 3, and the last equality uses the unbiasedness of the MLE of $\gamma$ together with $\mathrm{Var}[\hat{\gamma}_{MLE}] = (A'\Omega^{-1}A)^{-1}$, from equation (A.22).
$$b_2 \simeq E\big[(\hat{\gamma}_{MLE}-\gamma)'(A_{T+s}-\theta\bar{A})'\bar{e}\big]\,E\big[\hat{\theta}_{MLE}-\theta\big] = 0$$
where the approximation uses $E[\hat{\gamma}_{MLE}\hat{\theta}_{MLE}] \simeq E[\hat{\gamma}_{MLE}]E[\hat{\theta}_{MLE}]$, from equation (A.23).
$$
\begin{aligned}
b_3 &= \mathrm{tr}\Big[(A_{T+s}-\theta\bar{A})'\bar{A}\,E\big[(\hat{\theta}_{MLE}-\theta)(\hat{\gamma}_{MLE}-\gamma)(\hat{\gamma}_{MLE}-\gamma)'\big]\Big] \\
&\simeq \mathrm{tr}\Big[(A_{T+s}-\theta\bar{A})'\bar{A}\,E\big[\hat{\theta}_{MLE}-\theta\big]\,E\big[(\hat{\gamma}_{MLE}-\gamma)(\hat{\gamma}_{MLE}-\gamma)'\big]\Big] = 0
\end{aligned}
$$
where the first equality uses Properties 2 and 3 and the approximation uses result (A.23) from Appendix A.
$$b_4 = E[\bar{e}'\bar{e}]\,E\big[(\hat{\theta}_{MLE}-\theta)^2\big] = \mathrm{Var}\big[\hat{\theta}_{MLE}\big]\,N\big(\sigma_\mu^2 + T^{-1}\sigma_v^2\big) = \frac{2\sigma_v^4}{(T-1)(T\sigma_\mu^2 + \sigma_v^2)}$$
where the second equality comes from the assumed unbiasedness of $\hat{\theta}_{MLE}$ and $E[\bar{e}'\bar{e}] = N(\sigma_\mu^2 + T^{-1}\sigma_v^2)$, and the last equality uses $\mathrm{Var}[\hat{\theta}_{MLE}] = \frac{2T\sigma_v^4}{N(T-1)(T\sigma_\mu^2 + \sigma_v^2)^2}$, from equation (A.21).
$$b_5 = E\Big[\mathrm{tr}\big[\bar{e}'\bar{A}(\hat{\theta}_{MLE}-\theta)^2(\hat{\gamma}_{MLE}-\gamma)\big]\Big] \simeq E\big[(\hat{\theta}_{MLE}-\theta)^2\big]\,\mathrm{tr}\Big[E\big[\bar{e}'\bar{A}(\hat{\gamma}_{MLE}-\gamma)\big]\Big] = 0$$
where the first equality uses Property 2 and the approximation uses result (A.23) from Appendix A.
$$b_6 = E\big[(\hat{\theta}_{MLE}-\theta)^2\big]\,\mathrm{tr}\Big[\bar{A}'\bar{A}\,E\big[(\hat{\gamma}_{MLE}-\gamma)(\hat{\gamma}_{MLE}-\gamma)'\big]\Big] \simeq 0$$
where the first step uses Properties 2 and 3 and the last step uses result (B.3) from Appendix B.

Putting all the results together we can rewrite $AMSE(\hat{y}_{T,s,FOP} - \hat{y}_{T,s,OPKP})$ as:
$$AMSE(\hat{y}_{T,s,FOP} - \hat{y}_{T,s,OPKP}) = \mathrm{tr}\Big[(A_{T+s}-\theta\bar{A})'(A_{T+s}-\theta\bar{A})(A'\Omega^{-1}A)^{-1}\Big] + \frac{2\sigma_v^4}{(T-1)(T\sigma_\mu^2 + \sigma_v^2)} \qquad (2.13)$$
Combining (2.13) and (2.11) we reach the final expression for $AMSE(\hat{y}_{T,s,FOP})$:
$$AMSE(\hat{y}_{T,s,FOP}) = \mathrm{tr}\Big[(A_{T+s}-\theta\bar{A})'(A_{T+s}-\theta\bar{A})(A'\Omega^{-1}A)^{-1}\Big] + \frac{2\sigma_v^4}{(T-1)(T\sigma_\mu^2 + \sigma_v^2)} + \sigma_{T+s}^2$$
where the first term represents the loss in efficiency that arises from the estimation of $\hat{\gamma}_{MLE}$, the second term the loss in efficiency derived from the estimation of $\hat{\theta}_{MLE}$, and the last term the variance of the estimator with known parameters.

2.2.2 The Truncated Predictor (TP)

The truncated predictor for an individual outcome is $\hat{y}_{i,T,s,TP} = A_{i,T+s}\hat{\gamma}_{GLS}$. The vector form is:
$$\hat{y}_{T,s,TP} = A_{T+s}\hat{\gamma}_{GLS}$$
Now we want to compute the difference $y_{T+s} - \hat{y}_{T,s,TP}$ and use it to derive the MSE of the TP:
$$
\begin{aligned}
y_{T+s} - \hat{y}_{T,s,TP} &= A_{T+s}\gamma + \mu_{T+s} + v_{T+s} - A_{T+s}\hat{\gamma}_{GLS} = A_{T+s}(\gamma - \hat{\gamma}_{GLS}) + (\mu_{T+s} + v_{T+s}) \\
MSE(\hat{y}_{T,s,TP}) &= \underbrace{E\big[(\gamma - \hat{\gamma}_{GLS})'A_{T+s}'A_{T+s}(\gamma - \hat{\gamma}_{GLS})\big]}_{c_1} + \underbrace{2E\big[(\gamma - \hat{\gamma}_{GLS})'A_{T+s}'(\mu_{T+s} + v_{T+s})\big]}_{c_2} \\
&\quad + \underbrace{E\big[(\mu_{T+s} + v_{T+s})'(\mu_{T+s} + v_{T+s})\big]}_{c_3}
\end{aligned}
$$
where the last equality uses Property 1. We work on the terms one by one:
$$
\begin{aligned}
c_1 &= \mathrm{tr}\Big[A_{T+s}'A_{T+s}\,E\big[(\gamma - \hat{\gamma}_{GLS})(\gamma - \hat{\gamma}_{GLS})'\big]\Big] = \mathrm{tr}\Big[A_{T+s}'A_{T+s}(A'\Omega^{-1}A)^{-1}\Big] \\
c_2 &= 2\,\mathrm{tr}\Big[A_{T+s}'\,E\big[(\mu_{T+s} + v_{T+s})(\gamma - \hat{\gamma}_{GLS})'\big]\Big] = -2\sigma_\mu^2\,\mathrm{tr}\Big[A_{T+s}'(I_N \otimes \iota_T')\Omega^{-1}A(A'\Omega^{-1}A)^{-1}\Big] \\
c_3 &= E\big[\mu_{T+s}'\mu_{T+s}\big] + E\big[v_{T+s}'v_{T+s}\big] = N\big(\sigma_\mu^2 + \sigma_v^2\big)
\end{aligned}
$$
In $c_1$ the equalities use Properties 2 and 3 together with $E[\hat{\gamma}_{GLS}] = \gamma$ and $\mathrm{Var}[\hat{\gamma}_{GLS}] = (A'\Omega^{-1}A)^{-1}$; in $c_2$ they use Properties 2 and 3 and $E[(\mu_{T+s} + v_{T+s})(\gamma - \hat{\gamma}_{GLS})'] = -\sigma_\mu^2(I_N \otimes \iota_T')\Omega^{-1}A(A'\Omega^{-1}A)^{-1}$, from equation (B.5); in $c_3$ the cross terms vanish because of assumptions A2.

Putting all the results together, we reach an expression for $MSE(\hat{y}_{T,s,TP})$:
$$MSE(\hat{y}_{T,s,TP}) = \mathrm{tr}\Big[A_{T+s}'A_{T+s}(A'\Omega^{-1}A)^{-1}\Big] - 2\sigma_\mu^2\,\mathrm{tr}\Big[A_{T+s}'(I_N \otimes \iota_T')\Omega^{-1}A(A'\Omega^{-1}A)^{-1}\Big] + N\big(\sigma_\mu^2 + \sigma_v^2\big)$$
Since the FTP uses an MLE with consistent estimates of $\hat{\gamma}_{GLS}$, the AMSE of the FTP is equal to the MSE of the TP:
$$AMSE(\hat{y}_{T,s,FTP}) = \mathrm{tr}\Big[A_{T+s}'A_{T+s}(A'\Omega^{-1}A)^{-1}\Big] - 2\sigma_\mu^2\,\mathrm{tr}\Big[A_{T+s}'(I_N \otimes \iota_T')\Omega^{-1}A(A'\Omega^{-1}A)^{-1}\Big] + N\big(\sigma_\mu^2 + \sigma_v^2\big)$$
where the first two terms represent the loss in efficiency due to the estimation of $\hat{\gamma}_{MLE}$ and the last term represents the variance of the predictor with known parameters.
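Given the design matrices of Chapter 1, this closed-form AMSE is straightforward to evaluate numerically; a minimal sketch (Python/NumPy, an assumed choice) follows.

```python
import numpy as np

def amse_ftp_post_sample(A, A_f, N, T, sigma_mu2, sigma_v2):
    """A: (N*T, 2K) stacked [X, Xbar]; A_f: (N, 2K) rows A_{i,T+s}.
    Evaluates the AMSE expression for the (F)TP derived above."""
    NT = N * T
    P = np.kron(np.eye(N), np.ones((T, T))) / T
    Q = np.eye(NT) - P
    Omega_inv = Q / sigma_v2 + P / (T * sigma_mu2 + sigma_v2)
    V = np.linalg.inv(A.T @ Omega_inv @ A)               # Var(gamma_GLS)

    iota_block = np.kron(np.eye(N), np.ones((1, T)))      # I_N kron iota_T'
    term1 = np.trace(A_f.T @ A_f @ V)
    term2 = -2 * sigma_mu2 * np.trace(A_f.T @ iota_block @ Omega_inv @ A @ V)
    term3 = N * (sigma_mu2 + sigma_v2)
    return term1 + term2 + term3
```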

2.2.3 The Ordinary Least Squares predictor (OLS)

First we define
$$\bar{X}_{T+s} \equiv \begin{bmatrix} (T+s)^{-1}\sum_{t=1}^{T+s} x_{1,t}' \\ \vdots \\ (T+s)^{-1}\sum_{t=1}^{T+s} x_{N,t}' \end{bmatrix}_{N \times K}, \quad X_{T+s} \equiv \begin{bmatrix} x_{1,T+s}' \\ \vdots \\ x_{N,T+s}' \end{bmatrix}_{N \times K}, \quad u_{T+s} \equiv \begin{bmatrix} u_{1,T+s} \\ \vdots \\ u_{N,T+s} \end{bmatrix}_{N \times 1}$$
The OLS predictor for an individual outcome is $\hat{y}_{i,T,s,OLS} = x_{i,T+s}'\hat{\beta}_{OLS}$. The vector form is
$$\hat{y}_{T,s,OLS} = X_{T+s}\hat{\beta}_{OLS}$$
Now we want to compute the difference $y_{T+s} - \hat{y}_{T,s,OLS}$ and use it to derive the MSE of the OLS predictor:
$$
\begin{aligned}
y_{T+s} - \hat{y}_{T,s,OLS} &= X_{T+s}\beta + u_{T+s} - X_{T+s}\hat{\beta}_{OLS} = X_{T+s}(\beta - \hat{\beta}_{OLS}) + u_{T+s} \\
MSE(\hat{y}_{T,s,OLS}) &= \underbrace{E\big[(\beta - \hat{\beta}_{OLS})'X_{T+s}'X_{T+s}(\beta - \hat{\beta}_{OLS})\big]}_{d_1} + \underbrace{2E\big[(\beta - \hat{\beta}_{OLS})'X_{T+s}'u_{T+s}\big]}_{d_2} + \underbrace{E\big[u_{T+s}'u_{T+s}\big]}_{d_3}
\end{aligned}
$$
where the last equality uses Property 1. We work on the terms one by one:
$$
\begin{aligned}
d_1 &= \mathrm{tr}\Big[X_{T+s}'X_{T+s}\,E\big[(\beta - \hat{\beta}_{OLS})(\beta - \hat{\beta}_{OLS})'\big]\Big] = \mathrm{tr}\Big[X_{T+s}'X_{T+s}(X'X)^{-1}X'\big[\bar{X}\pi\pi'\bar{X}' + \Omega\big]X(X'X)^{-1}\Big] \\
d_2 &= 2\,\mathrm{tr}\Big[X_{T+s}'\,E\big[u_{T+s}(\beta - \hat{\beta}_{OLS})'\big]\Big] = -2\,\mathrm{tr}\Big[X_{T+s}'\big(\bar{X}_{T+s}\pi\pi'\bar{X}' + \sigma_\mu^2(I_N \otimes \iota_T')\big)X(X'X)^{-1}\Big] \\
d_3 &= E\big[(\bar{X}_{T+s}\pi + \mu_{T+s} + v_{T+s})'(\bar{X}_{T+s}\pi + \mu_{T+s} + v_{T+s})\big] = \pi'\bar{X}_{T+s}'\bar{X}_{T+s}\pi + N\big(\sigma_\mu^2 + \sigma_v^2\big) \qquad (2.14)
\end{aligned}
$$
In $d_1$ the equalities use Properties 2 and 3 and $E[(\hat{\beta}_{OLS} - \beta)(\hat{\beta}_{OLS} - \beta)'] = (X'X)^{-1}X'[\bar{X}\pi\pi'\bar{X}' + \Omega]X(X'X)^{-1}$, from equation (B.7); in $d_2$ they use Properties 2 and 3 and $E[u_{T+s}(\beta - \hat{\beta}_{OLS})'] = -(\bar{X}_{T+s}\pi\pi'\bar{X}' + \sigma_\mu^2(I_N \otimes \iota_T'))X(X'X)^{-1}$, from equation (B.8); in $d_3$ the cross terms vanish because of assumptions A2.

Putting all the results together, we reach the final expression for $MSE(\hat{y}_{T,s,OLS})$:
$$
\begin{aligned}
MSE(\hat{y}_{T,s,OLS}) &= \mathrm{tr}\Big[X_{T+s}'X_{T+s}(X'X)^{-1}X'\big[\bar{X}\pi\pi'\bar{X}' + \Omega\big]X(X'X)^{-1}\Big] \\
&\quad - 2\,\mathrm{tr}\Big[X_{T+s}'\big(\bar{X}_{T+s}\pi\pi'\bar{X}' + \sigma_\mu^2(I_N \otimes \iota_T')\big)X(X'X)^{-1}\Big] + \pi'\bar{X}_{T+s}'\bar{X}_{T+s}\pi + N\big(\sigma_\mu^2 + \sigma_v^2\big)
\end{aligned}
$$
where the first two terms represent the loss (or gain) in efficiency due to the estimation of the parameters, the third term represents the effect of the bias with known parameters, and the last term the variance of the estimator with known parameters.

2.2.4 The Fixed Effects predictor (FE)

First we define:
$$\bar{y}_i \equiv T^{-1}\sum_{t=1}^{T} y_{i,t}, \quad \bar{y} \equiv \begin{bmatrix} \bar{y}_1 \\ \vdots \\ \bar{y}_N \end{bmatrix}_{N \times 1}, \quad \bar{u} \equiv \begin{bmatrix} T^{-1}\sum_{t=1}^{T} u_{1,t} \\ \vdots \\ T^{-1}\sum_{t=1}^{T} u_{N,t} \end{bmatrix}_{N \times 1}, \quad \bar{v} \equiv \begin{bmatrix} T^{-1}\sum_{t=1}^{T} v_{1,t} \\ \vdots \\ T^{-1}\sum_{t=1}^{T} v_{N,t} \end{bmatrix}_{N \times 1},$$
$$\bar{X}_{N \times K} \equiv \begin{bmatrix} T^{-1}\sum_{t=1}^{T} x_{1,t}' \\ \vdots \\ T^{-1}\sum_{t=1}^{T} x_{N,t}' \end{bmatrix}, \quad \hat{\alpha}_{i,FE} \equiv \bar{y}_i - \bar{x}_i'\hat{\beta}_{FE}, \quad \hat{\alpha}_{FE} \equiv \begin{bmatrix} \hat{\alpha}_{1,FE} \\ \vdots \\ \hat{\alpha}_{N,FE} \end{bmatrix}_{N \times 1} = \bar{y} - \bar{X}_{N \times K}\hat{\beta}_{FE} = \bar{e}_{FE}$$
The FE predictor for an individual outcome is $\hat{y}_{i,T,s,FE} = x_{i,T+s}'\hat{\beta}_{FE} + \hat{\alpha}_{i,FE}$. The vector form is:
$$\hat{y}_{T,s,FE} = X_{T+s}\hat{\beta}_{FE} + \hat{\alpha}_{FE}$$
Now we want to compute the difference $y_{T+s} - \hat{y}_{T,s,FE}$ and use it to derive the MSE of the FE predictor:
$$
\begin{aligned}
y_{T+s} - \hat{y}_{T,s,FE} &= X_{T+s}\beta + u_{T+s} - X_{T+s}\hat{\beta}_{FE} - \hat{\alpha}_{FE} \\
&= X_{T+s}(\beta - \hat{\beta}_{FE}) + u_{T+s} - (\bar{y} - \bar{X}_{N \times K}\hat{\beta}_{FE}) \\
&= X_{T+s}(\beta - \hat{\beta}_{FE}) + u_{T+s} - \bar{y} + \bar{X}_{N \times K}\hat{\beta}_{FE} - \bar{X}_{N \times K}\beta + \bar{X}_{N \times K}\beta \\
&= (X_{T+s} - \bar{X}_{N \times K})(\beta - \hat{\beta}_{FE}) + u_{T+s} - (\bar{y} - \bar{X}_{N \times K}\beta) \\
&= (X_{T+s} - \bar{X}_{N \times K})(\beta - \hat{\beta}_{FE}) + (u_{T+s} - \bar{u})
\end{aligned}
$$
where we add and subtract $\bar{X}_{N \times K}\beta$ in the third equality.
$$MSE(\hat{y}_{T,s,FE}) = \underbrace{E\big[(\beta - \hat{\beta}_{FE})'(X_{T+s} - \bar{X}_{N \times K})'(X_{T+s} - \bar{X}_{N \times K})(\beta - \hat{\beta}_{FE})\big]}_{e_1} + \underbrace{2E\big[(\beta - \hat{\beta}_{FE})'(X_{T+s} - \bar{X}_{N \times K})'(u_{T+s} - \bar{u})\big]}_{e_2} + \underbrace{E\big[(u_{T+s} - \bar{u})'(u_{T+s} - \bar{u})\big]}_{e_3}$$
where the decomposition uses Property 1. We work on the terms one by one:
$$
\begin{aligned}
e_1 &= \mathrm{tr}\Big[(X_{T+s} - \bar{X}_{N \times K})'(X_{T+s} - \bar{X}_{N \times K})\,E\big[(\beta - \hat{\beta}_{FE})(\beta - \hat{\beta}_{FE})'\big]\Big] = \sigma_v^2\,\mathrm{tr}\Big[(X_{T+s} - \bar{X}_{N \times K})'(X_{T+s} - \bar{X}_{N \times K})(X'QX)^{-1}\Big] \\
e_2 &= -2\,\mathrm{tr}\Big[(X_{T+s} - \bar{X}_{N \times K})'\,E\big[(u_{T+s} - \bar{u})(\beta - \hat{\beta}_{FE})'\big]\Big] = 0 \\
e_3 &= E\big[u_{T+s}'u_{T+s}\big] - 2E\big[\bar{u}'u_{T+s}\big] + E\big[\bar{u}'\bar{u}\big] = N\Big(\frac{T+1}{T}\Big)\sigma_v^2 + \pi'\big[\bar{X}_{T+s}'\bar{X}_{T+s} - 2\bar{X}_{N \times K}'\bar{X}_{T+s} + \bar{X}_{N \times K}'\bar{X}_{N \times K}\big]\pi
\end{aligned}
$$
In $e_1$ the equalities use Properties 2 and 3 and $E[(\hat{\beta}_{FE} - \beta)(\hat{\beta}_{FE} - \beta)'] = \sigma_v^2(X'QX)^{-1}$, from equation (B.11); in $e_2$ they use Properties 2 and 3 and $E[(u_{T+s} - \bar{u})(\beta - \hat{\beta}_{FE})'] = 0$, from equation (B.12); in $e_3$ the last equality uses $E[\bar{u}'\bar{u}] = N(\sigma_\mu^2 + T^{-1}\sigma_v^2) + \pi'\bar{X}_{N \times K}'\bar{X}_{N \times K}\pi$, $E[\bar{u}'u_{T+s}] = N\sigma_\mu^2 + \pi'\bar{X}_{N \times K}'\bar{X}_{T+s}\pi$ and $E[u_{T+s}'u_{T+s}] = \pi'\bar{X}_{T+s}'\bar{X}_{T+s}\pi + N(\sigma_\mu^2 + \sigma_v^2)$, derived in equations (B.13), (B.14) and (B.15) respectively.

Putting it all together we can rewrite $MSE(\hat{y}_{T,s,FE})$ as:
$$MSE(\hat{y}_{T,s,FE}) = \sigma_v^2\,\mathrm{tr}\Big[(X_{T+s} - \bar{X}_{N \times K})'(X_{T+s} - \bar{X}_{N \times K})(X'QX)^{-1}\Big] + N\Big(\frac{T+1}{T}\Big)\sigma_v^2 + \pi'\big[\bar{X}_{T+s}'\bar{X}_{T+s} - 2\bar{X}_{N \times K}'\bar{X}_{T+s} + \bar{X}_{N \times K}'\bar{X}_{N \times K}\big]\pi$$
where the first term represents the loss in efficiency derived from the estimation of the parameters and the second term represents the variance of the estimator with known parameters. Note that the last term is negligible if $\bar{X}_{T+s} \simeq \bar{X}_{N \times K}$, which may be the case if $T$ is large and $x_{i,t}$ follow a

BLUPs for out of sample prediction

3.1 Deriving the BLUPs

We want to derive the BLUP for $y_{j,t}$ with $j \notin [1, \ldots, N]$ and $t \in [1, \ldots, T]$ following the procedure used by Goldberger (1962). Recall that assumptions A3 state that $x_{j,t}$ are available for periods $t = 1, \ldots, T+s$, $y_{i,t}$ are available for $i = 1, \ldots, N$ and $t = 1, \ldots, T$, and $x_{i,t}$ are available for $i = 1, \ldots, N$ and $t = 1, \ldots, T+s$. Rewriting equation (1.4), we want to predict:
$$y_{j,t} = A_{j,t}\gamma + \mu_j + v_{j,t}$$
with $A_{j,t} \equiv \big[x_{j,t}' \;\; (T+s)^{-1}\sum_{t=1}^{T+s} x_{j,t}'\big]$ a $1 \times 2K$ vector. We want to find the $NT \times 1$ vector $c$ such that:

1. $E[c'y - y_{j,t}] = 0$
2. $\mathrm{Var}[c'y - y_{j,t}]$ is at its minimum.

Combining (1.2) and (1.4), we can further develop the first condition:
$$
\begin{aligned}
E[c'y - y_{j,t}] &= E[c'A\gamma + c'\mu + c'v - A_{j,t}\gamma - \mu_j - v_{j,t}] \\
&= E[(c'A - A_{j,t})\gamma + c'\mu + c'v - \mu_j - v_{j,t}] \\
&= E[(c'A - A_{j,t})\gamma] \\
&= (c'A - A_{j,t})\gamma \\
c'A - A_{j,t} &= 0 \qquad (3.1)
\end{aligned}
$$
where the third equality uses assumptions A2, the fourth equality follows from $\gamma$ and $x_{i,t}$ being non-stochastic, and the last equality follows from the assumption that $\gamma \neq 0$ (note that in the case in which $\gamma = 0$, $x_{j,t}$ have no explanatory power for $y_{j,t}$).

With result (3.1) we rewrite the second condition:
$$
\begin{aligned}
\mathrm{Var}[c'y - y_{j,t}] &= \mathrm{Var}[c'(\mu+v) - \mu_j - v_{j,t}] \\
&= E\big[(c'(\mu+v) - \mu_j - v_{j,t})((\mu+v)'c - \mu_j - v_{j,t})\big] \\
&= E\big[c'(\mu+v)(\mu+v)'c - 2c'\mu\mu_j - 2c'v\mu_j - 2c'\mu v_{j,t} - 2c'v v_{j,t} - 2\mu_j v_{j,t} + \mu_j^2 + v_{j,t}^2\big] \\
&= c'\Omega c + \sigma_\mu^2 + \sigma_v^2 \qquad (3.2)
\end{aligned}
$$
where the last equality follows from assumptions A2.

Following Goldberger (1962) we want to choose $c$ such that it minimizes (3.2) subject to (3.1). To achieve this we use a Lagrange function with the $1 \times 2K$ vector $\lambda'$ as the Lagrange multiplier:
$$\min_{c,\lambda}\; c'\Omega c + \sigma_\mu^2 + \sigma_v^2 - 2\lambda'(A'c - A_{j,t}')$$
Now we take first derivatives w.r.t. $c$ and w.r.t. $\lambda$:
$$
\begin{aligned}
\frac{\partial}{\partial c}\big(c'\Omega c - 2\lambda'(A'c - A_{j,t}')\big) = 0_{NT \times 1} \quad &\Rightarrow \quad \Omega c + A(-\lambda) = 0_{NT \times 1} \qquad (3.3) \\
\frac{\partial}{\partial \lambda}\big(c'\Omega c - 2\lambda'(A'c - A_{j,t}')\big) = 0_{2K \times 1} \quad &\Rightarrow \quad A'c = A_{j,t}' \qquad (3.4)
\end{aligned}
$$
Now we have a system of $NT + 2K$ equations under full rank conditions. We can solve it for $c$ and $\lambda$ in matrix form combining (3.3) and (3.4):
$$
\begin{bmatrix} \Omega & A \\ A' & 0_{2K \times 2K} \end{bmatrix}
\begin{bmatrix} \hat{c} \\ -\hat{\lambda} \end{bmatrix}
=
\begin{bmatrix} 0_{NT \times 1} \\ A_{j,t}' \end{bmatrix}
\quad \Rightarrow \quad
\hat{c} = \Omega^{-1}A(A'\Omega^{-1}A)^{-1}A_{j,t}', \qquad \hat{c}' = A_{j,t}(A'\Omega^{-1}A)^{-1}A'\Omega^{-1} \qquad (3.5)
$$

where the second step uses the formula for the inverse of a partitioned matrix. Recall that the BLUP was defined as $c'y$. We can further develop that expression by plugging (3.5) in:
$$\hat{c}'y = A_{j,t}(A'\Omega^{-1}A)^{-1}A'\Omega^{-1}y = A_{j,t}\hat{\gamma}_{GLS} \quad \Rightarrow \quad \hat{y}_{j,t,TP} = A_{j,t}\hat{\gamma}_{GLS} \qquad (3.6)$$
where in the second equality we recognize the GLS estimate of $\gamma$, $\hat{\gamma}_{GLS} = (A'\Omega^{-1}A)^{-1}A'\Omega^{-1}y$, and the last expression recognizes the truncated predictor, defined in equation (1.6).

Since we are working under the assumption that $\sigma_v^2$ and $\sigma_\mu^2$ are not available to the investigator, we define the feasible version of the truncated predictor, which uses the same MLE of the parameters as the OP in Chapter 2:
$$\hat{y}_{j,t,FTP} = A_{j,t}\hat{\gamma}_{MLE}$$

3.2 (A)MSE of the predictors in the out of sample dimension

3.2.1 The Truncated Predictor (TP)

In this section we will work on the derivations of the MSE using vectorization, because it will be computationally more efficient when we apply it to the Monte Carlo simulations. In order to achieve this we need to define:
$$A_{j,t} \equiv \big[x_{j,t}' \;\; (T+s)^{-1}\textstyle\sum_{t=1}^{T+s} x_{j,t}'\big]_{1 \times 2K}, \quad A_j \equiv \begin{bmatrix} A_{j,1} \\ \vdots \\ A_{j,T} \end{bmatrix}_{T \times 2K}, \quad X_j \equiv \begin{bmatrix} x_{j,1}' \\ \vdots \\ x_{j,T}' \end{bmatrix}_{T \times K}, \quad \bar{X}_j \equiv \Big[\iota_T \otimes \Big((T+s)^{-1}\textstyle\sum_{t=1}^{T+s} x_{j,t}'\Big)\Big]_{T \times K},$$
$$v_j \equiv \begin{bmatrix} v_{j,1} \\ \vdots \\ v_{j,T} \end{bmatrix}_{T \times 1}, \quad u_j \equiv \begin{bmatrix} u_{j,1} \\ \vdots \\ u_{j,T} \end{bmatrix}_{T \times 1} = \mu_j\iota_T + v_j + \bar{X}_j\pi, \quad y_j \equiv \begin{bmatrix} y_{j,1} \\ \vdots \\ y_{j,T} \end{bmatrix}_{T \times 1} = A_j\gamma + v_j + \mu_j\iota_T, \quad \hat{y}_j \equiv \begin{bmatrix} \hat{y}_{j,1} \\ \vdots \\ \hat{y}_{j,T} \end{bmatrix}_{T \times 1}.$$
From equation (3.6) we have the truncated predictor for an individual outcome, $\hat{y}_{j,t,TP} = A_{j,t}\hat{\gamma}_{GLS}$. In vector form:
$$\hat{y}_{j,TP} \equiv \begin{bmatrix} A_{j,1}\hat{\gamma}_{GLS} \\ \vdots \\ A_{j,T}\hat{\gamma}_{GLS} \end{bmatrix} = A_j\hat{\gamma}_{GLS} \qquad (3.7)$$
Now we compute the difference $y_j - \hat{y}_{j,TP}$ and use it to compute the MSE of the TP in the out of sample dimension:
$$
\begin{aligned}
y_j - \hat{y}_{j,TP} &= A_j\gamma + \mu_j\iota_T + v_j - A_j\hat{\gamma}_{GLS} = A_j(\gamma - \hat{\gamma}_{GLS}) + \mu_j\iota_T + v_j \\
MSE(\hat{y}_{j,TP}) &= \underbrace{E\big[(\gamma - \hat{\gamma}_{GLS})'A_j'A_j(\gamma - \hat{\gamma}_{GLS})\big]}_{f_1} + 2\underbrace{E\big[(\gamma - \hat{\gamma}_{GLS})'A_j'\mu_j\iota_T\big]}_{f_2} + 2\underbrace{E\big[(\gamma - \hat{\gamma}_{GLS})'A_j'v_j\big]}_{f_3} \\
&\quad + \underbrace{E\big[\iota_T'\mu_j^2\iota_T\big]}_{f_4} + 2\underbrace{E\big[\iota_T'\mu_j v_j\big]}_{f_5} + \underbrace{E\big[v_j'v_j\big]}_{f_6}
\end{aligned}
$$
where the second equality uses Property 1. Now we work on each of the terms:
$$
\begin{aligned}
f_1 &= \mathrm{tr}\Big[A_j'A_j\,E\big[(\gamma - \hat{\gamma}_{GLS})(\gamma - \hat{\gamma}_{GLS})'\big]\Big] = \mathrm{tr}\Big[A_j'A_j(A'\Omega^{-1}A)^{-1}\Big] \\
f_2 &= 2(\gamma - E[\hat{\gamma}_{GLS}])'A_j'E[\mu_j]\iota_T = 0 \\
f_3 &= 2(\gamma - E[\hat{\gamma}_{GLS}])'A_j'E[v_j] = 0 \\
f_4 &= \iota_T'E[\mu_j^2]\iota_T = T\sigma_\mu^2 \\
f_5 &= 2\iota_T'E[\mu_j]E[v_j] = 0 \\
f_6 &= \sum_{t=1}^{T} E[v_{j,t}^2] = T\sigma_v^2
\end{aligned}
$$
In $f_1$ the equalities use Properties 2 and 3; $f_2$ and $f_3$ use the fact that $\hat{\gamma}_{GLS}$ is independent of $\mu_j$ and $v_j$ (note that $\hat{\gamma}_{GLS}$ depends on $\mu_i$ for $i \in [1, N]$, but not on $\mu_j$).

Combining all the results of this subsection, we can rewrite $MSE(\hat{y}_{j,TP})$ as:
$$MSE(\hat{y}_{j,TP}) = \mathrm{tr}\Big[A_j'A_j(A'\Omega^{-1}A)^{-1}\Big] + T\big(\sigma_\mu^2 + \sigma_v^2\big) \qquad (3.8)$$
Since the FTP uses an MLE with consistent estimates of $\gamma$, the AMSE of the FTP is equal to the MSE of the TP:
$$AMSE(\hat{y}_{j,FTP}) = \mathrm{tr}\Big[A_j'A_j(A'\Omega^{-1}A)^{-1}\Big] + T\big(\sigma_\mu^2 + \sigma_v^2\big)$$
where the first term represents the loss in efficiency due to the estimation of $\hat{\gamma}_{MLE}$ and the last term represents the variance of the predictor with known parameters.

3.2.2 The Ordinary Least Squares predictor (OLS)

The OLS predictor for an individual outcome is $\hat{y}_{j,t,OLS} = x_{j,t}'\hat{\beta}_{OLS}$. The vector form is:
$$\hat{y}_{j,OLS} = X_j\hat{\beta}_{OLS}$$
Now we compute the difference $y_j - \hat{y}_{j,OLS}$ and use it to compute the MSE of the OLS predictor in the out of sample dimension:
$$
\begin{aligned}
y_j - \hat{y}_{j,OLS} &= X_j\beta + u_j - X_j\hat{\beta}_{OLS} = X_j(\beta - \hat{\beta}_{OLS}) + u_j \\
MSE(\hat{y}_{j,OLS}) &= \underbrace{E\big[(\beta - \hat{\beta}_{OLS})'X_j'X_j(\beta - \hat{\beta}_{OLS})\big]}_{g_1} + \underbrace{2E\big[(\beta - \hat{\beta}_{OLS})'X_j'u_j\big]}_{g_2} + \underbrace{E\big[u_j'u_j\big]}_{g_3}
\end{aligned}
$$
where the second equality uses Property 1. We work on the terms one by one:
$$
\begin{aligned}
g_1 &= \mathrm{tr}\Big[X_j'X_j\,E\big[(\beta - \hat{\beta}_{OLS})(\beta - \hat{\beta}_{OLS})'\big]\Big] = \mathrm{tr}\Big[X_j'X_j(X'X)^{-1}X'\big[\bar{X}\pi\pi'\bar{X}' + \Omega\big]X(X'X)^{-1}\Big] \\
g_2 &= 2\,\mathrm{tr}\Big[X_j'\,E\big[u_j(\beta - \hat{\beta}_{OLS})'\big]\Big] = -2\,\mathrm{tr}\Big[X_j'\bar{X}_j\pi\pi'\bar{X}'X(X'X)^{-1}\Big] \\
g_3 &= \pi'\bar{X}_j'\bar{X}_j\pi + T\big(\sigma_\mu^2 + \sigma_v^2\big)
\end{aligned}
$$
In $g_1$ the equalities use Properties 2 and 3 and $E[(\hat{\beta}_{OLS} - \beta)(\hat{\beta}_{OLS} - \beta)'] = (X'X)^{-1}X'[\bar{X}\pi\pi'\bar{X}' + \Omega]X(X'X)^{-1}$, from equation (B.7); in $g_2$ the last equality uses $E[u_j(\beta - \hat{\beta}_{OLS})'] = -\bar{X}_j\pi\pi'\bar{X}'X(X'X)^{-1}$, from equation (B.16); $g_3$ follows the same derivation as equation (2.14) in the previous subsection.

Putting all the results together, we can rewrite $MSE(\hat{y}_{j,OLS})$:
$$MSE(\hat{y}_{j,OLS}) = \mathrm{tr}\Big[X_j'X_j(X'X)^{-1}X'\big[\bar{X}\pi\pi'\bar{X}' + \Omega\big]X(X'X)^{-1}\Big] - 2\,\mathrm{tr}\Big[X_j'\bar{X}_j\pi\pi'\bar{X}'X(X'X)^{-1}\Big] + \pi'\bar{X}_j'\bar{X}_j\pi + T\big(\sigma_\mu^2 + \sigma_v^2\big)$$
where the first two terms represent the loss (or gain) in efficiency due to the estimation of the parameters, the third term represents the effect of the bias with known parameters, and the last term represents the variance of the predictor with known parameters.

3.2.3 The Truncated Fixed Effects predictor (TFE)

The FE predictor for an individual outcome is $\hat{y}_{j,t,FE} = x_{j,t}'\hat{\beta}_{FE} + \hat{\alpha}_{j,FE}$. Due to the lack of information on $y_j$, $\hat{\alpha}_{j,FE}$ cannot be estimated. Therefore we use a Truncated Fixed Effects predictor that ignores $\hat{\alpha}_{j,FE}$ and for an individual outcome is $\hat{y}_{j,t,TFE} = x_{j,t}'\hat{\beta}_{FE}$. The vector form is:
$$\hat{y}_{j,TFE} = X_j\hat{\beta}_{FE}$$
Now we want to compute the difference $y_j - \hat{y}_{j,TFE}$ and use it to derive the MSE of the TFE predictor:
$$
\begin{aligned}
y_j - \hat{y}_{j,TFE} &= X_j\beta + u_j - X_j\hat{\beta}_{FE} = X_j(\beta - \hat{\beta}_{FE}) + u_j \\
MSE(\hat{y}_{j,TFE}) &= \underbrace{E\big[(\beta - \hat{\beta}_{FE})'X_j'X_j(\beta - \hat{\beta}_{FE})\big]}_{h_1} + 2\underbrace{E\big[(\beta - \hat{\beta}_{FE})'X_j'u_j\big]}_{h_2} + \underbrace{E\big[u_j'u_j\big]}_{h_3}
\end{aligned}
$$
Now we work on the terms one by one:
$$
\begin{aligned}
h_1 &= \mathrm{tr}\Big[X_j'X_j\,E\big[(\beta - \hat{\beta}_{FE})(\beta - \hat{\beta}_{FE})'\big]\Big] = \sigma_v^2\,\mathrm{tr}\Big[X_j'X_j(X'QX)^{-1}\Big] \\
h_2 &= -2\,\mathrm{tr}\Big[X_j'E[u_j]\,E\big[(\beta - \hat{\beta}_{FE})'\big]\Big] = 0 \\
h_3 &= \pi'\bar{X}_j'\bar{X}_j\pi + T\big(\sigma_\mu^2 + \sigma_v^2\big)
\end{aligned}
$$
In $h_1$ the last equality uses $E[(\hat{\beta}_{FE} - \beta)(\hat{\beta}_{FE} - \beta)'] = \sigma_v^2(X'QX)^{-1}$, from equation (B.11); $h_2$ uses the independence of $u_j$ and $\hat{\beta}_{FE}$; $h_3$ was derived previously in equation (2.14).

Putting all the results together, we can rewrite $MSE(\hat{y}_{j,TFE})$:
$$MSE(\hat{y}_{j,TFE}) = \sigma_v^2\,\mathrm{tr}\Big[X_j'X_j(X'QX)^{-1}\Big] + \pi'\bar{X}_j'\bar{X}_j\pi + T\big(\sigma_\mu^2 + \sigma_v^2\big)$$
where the first term represents the loss in efficiency due to the estimation of the parameters, the second term represents the effect of the bias, and the last term the variance of the predictor with all parameters known.


Monte Carlo Results

4.1 Monte Carlo Experiment Design

The comparison among the AMSE expressions derived in Chapter 2 and Chapter 3 depends on the values of $\pi$, $\sigma_\mu^2$ and $\sigma_v^2$. In order to find which predictor performs better under different combinations of the parameters, we implement four Monte Carlo experiments. First we define a benchmark and then we explain the differences of each experiment w.r.t. this benchmark.

Benchmark:

We use a data generating process in line with the one used by Baillie and Baltagi (1999), but with the introduction of correlation between the vector of explanatory variables and the individual specific effect. First we generate the scalars
$$x_{i,t} = 0.1t + 0.5x_{i,t-1} + \omega_{i,t}$$
with $x_{i,0} = 5 + 10\omega_{i,0}$ and $\omega_{i,t} \sim \text{i.i.d. } U[-0.5, 0.5]$, with the first 20 outcomes of $x_i$ discarded to minimize the effect of initial values. We set $N = 50$ and $T = 10$.

We perform the simulations under assumptions A1, A2 and A3, so that
$$y_{i,t} = x_{i,t}\beta + \bar{x}_i\pi + \mu_i + v_{i,t}.$$
We set the values of the parameters as $\beta = 0.5$ and $\sigma_\mu^2 + \sigma_v^2 = 20$. To test the performance of the predictors under different setups, we give different values to $\rho \equiv \frac{\sigma_\mu^2}{\sigma_\mu^2 + \sigma_v^2}$:
$$\rho = 0,\; 0.3,\; 0.6,\; 0.9$$
where in the first case we are not in a random effects model, and in the rest the relative weight of $\sigma_\mu^2$ w.r.t. $\sigma_v^2$ is increasing. We also give different values to the scalar $\pi$:
$$\pi = 0,\; 0.1,\; 0.2,\; 0.3,\; 0.4,\; 0.5,\; 0.75,\; 1,\; 1.25,\; 1.5$$
where in the first case there is no correlation between $x_{i,t}$ and $\alpha_i \equiv \bar{x}_i\pi + \mu_i$ and, for a constant $\rho \neq 0$, the correlation increases as the value of $\pi$ gets higher. For the corresponding values of $\sigma_\mu^2$, $\sigma_v^2$ and $\pi$ we generate the variables
$$\alpha_i = \bar{x}_i\pi + \mu_i$$
with $\mu_i \sim \text{i.i.d. } N[0, \sigma_\mu^2]$ and $v_{i,t} \sim \text{i.i.d. } N[0, \sigma_v^2]$.

With this information, we are in a position to generate the $y_{i,t}$ under each of the 40 parameter value combinations. In each of these situations we compute the average of the theoretical (A)MSE derived in the previous sections and the empirical MSE over 5000 replications, both for the post sample prediction (for the one period ahead forecast) and for the out of sample prediction (for individual $j \notin \{1, \ldots, N\}$ in periods 1 to $T$).
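For concreteness, the benchmark data generating process can be sketched as follows (Python/NumPy is an assumed choice; the thesis does not state the software it used). The function returns one simulated panel plus the regressors and outcomes of an extra out of sample unit.

```python
import numpy as np

def simulate_panel(N=50, T=10, s=1, beta=0.5, rho=0.3, pi=0.5, total_var=20.0,
                   ar=0.5, burn=20, rng=None):
    """Benchmark DGP sketch: x_{i,t} = 0.1*t + ar*x_{i,t-1} + omega, y = x*beta + xbar*pi + mu + v."""
    rng = np.random.default_rng() if rng is None else rng
    sigma_mu2 = rho * total_var
    sigma_v2 = (1 - rho) * total_var

    # Generate x for N in-sample units plus one out of sample unit, over burn + T + s periods.
    n_units, n_per = N + 1, burn + T + s
    omega = rng.uniform(-0.5, 0.5, size=(n_units, n_per + 1))
    x = np.empty((n_units, n_per))
    x_prev = 5.0 + 10.0 * omega[:, 0]
    for t in range(n_per):
        x_prev = 0.1 * (t + 1) + ar * x_prev + omega[:, t + 1]
        x[:, t] = x_prev
    x = x[:, burn:]                          # keep T + s periods, discard the burn-in

    xbar = x.mean(axis=1)                    # (T+s)-period individual means
    mu = rng.normal(scale=np.sqrt(sigma_mu2), size=n_units)
    v = rng.normal(scale=np.sqrt(sigma_v2), size=(n_units, T + s))
    y = x * beta + (xbar * pi + mu)[:, None] + v

    # In-sample panel (units 0..N-1, periods 1..T) and one out of sample unit (index N).
    return {"x": x[:N], "y": y[:N, :T], "y_future": y[:N, T:],
            "x_out": x[N], "y_out": y[N, :T],
            "sigma_mu2": sigma_mu2, "sigma_v2": sigma_v2}
```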

For the MLE of the parameters $\hat{\gamma}_{MLE}$ and $\hat{\theta}_{MLE}$ we use the method of iterated GLS as in Breusch (1987). Formally, starting from $\hat{\theta}_0 = 0$, we iterate between
$$\hat{\gamma}_s = \Big[A'\big(Q + \hat{\theta}_s Q_b\big)A\Big]^{-1} A'\big(Q + \hat{\theta}_s Q_b\big)y$$
with $d_s = y - A\hat{\gamma}_s$, $Q_b \equiv I_{NT} - (TN)^{-1}\iota_{NT}\iota_{NT}' - Q$, and $A$ and $Q$ as defined in Chapter 1; and
$$\hat{\theta}_{s+1} = \frac{d_s'Q d_s}{(T-1)\,d_s'Q_b d_s}.$$
When the estimated values are close enough ($|\hat{\theta}_s - \hat{\theta}_{s-1}| \leq 10^{-8}$) the procedure stops and the values are stored. The same algorithm is then run for the starting value $\hat{\theta}_0 = 1$ and the final values are stored. If the final estimates of the parameters for both starting values are close enough, we know that we found a global maximum, and the estimates of $\hat{\gamma}_{MLE}$ and $\hat{\theta}_{MLE}$ are the average of both estimators.
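A direct transcription of this iteration might look as follows (Python/NumPy is an assumed choice; the convergence tolerance and the averaging over the two starting values follow the description above).

```python
import numpy as np

def iterated_gls(y, A, N, T, theta0=0.0, tol=1e-8, max_iter=500):
    """Breusch (1987)-style iterated GLS as described in the text.
    Returns the final (gamma_hat, theta_hat) for one starting value theta0."""
    NT = N * T
    P = np.kron(np.eye(N), np.ones((T, T))) / T
    Q = np.eye(NT) - P                                   # within transformation
    Qb = np.eye(NT) - np.ones((NT, NT)) / (T * N) - Q    # between transformation

    theta = theta0
    for _ in range(max_iter):
        W = Q + theta * Qb
        gamma = np.linalg.solve(A.T @ W @ A, A.T @ W @ y)
        d = y - A @ gamma
        theta_new = (d @ Q @ d) / ((T - 1) * (d @ Qb @ d))
        if abs(theta_new - theta) <= tol:
            theta = theta_new
            break
        theta = theta_new
    return gamma, theta

def mle_estimates(y, A, N, T):
    """Run from both starting values (0 and 1) and average, as described above."""
    g0, t0 = iterated_gls(y, A, N, T, theta0=0.0)
    g1, t1 = iterated_gls(y, A, N, T, theta0=1.0)
    return (g0 + g1) / 2, (t0 + t1) / 2
```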

In order to determine whether there is a significant difference between the theoretical AMSE and the empirical MSE, we compute the statistic $Z \sim N(0,1)$ as in Baillie and Baltagi (1999):
$$Z \equiv R^{1/2}\,q\,(AmseBiasVariance)^{-1/2}$$
where $R$ is the number of replications, $q \equiv AMSE - MSE$ and
$$AmseBiasVariance \equiv R^{-1}N^{-1}\sum_{r=1}^{R}\sum_{i=1}^{N}\Big[(y_{i,T+s,r} - \hat{y}_{i,T+s,r})^2 - AMSE(\hat{y}_{i,T+s,r})\Big]^2.$$
As long as $|Z| \leq 1.96$ there is no significant difference between the AMSE and the empirical MSE.

Experiment 1 is the benchmark experiment. Note that when $\rho = 0$, $\mathrm{corr}(\bar{x}_i, \alpha_i) = 1$ (the individual specific effects are completely driven by $x$) and for the rest of the parameter value combinations $\mathrm{corr}(\bar{x}_i, \alpha_i) \in [0, 0.10]$.

Experiment 2 follows the benchmark design, but with $N = 200$ and $T = 5$. This is a "short panel", which is closer to what we might find in real life applications. Note that when $\rho = 0$, $\mathrm{corr}(\bar{x}_i, \alpha_i) = 1$ and for the rest of the parameter value combinations $\mathrm{corr}(\bar{x}_i, \alpha_i) \in [0, 0.13]$.

Experiment 3 follows the benchmark design with a few differences:

• Assumptions A1 are violated in this experiment. In particular, the data generating process for the individual outcomes is
$$y_{i,t} = x_{i,t}\beta + x_{i,1}\pi + \mu_i + v_{i,t},$$
so that the correlation between $x_{i,t}$ and $\alpha_i$ depends only on the first value of the explanatory variable, $x_{i,1}$, and NOT on the average. This means that the FOP and the FTP are misspecified. We want to see the performance of the predictors under this circumstance because it is most likely what will happen in real life applications.

• We changed the values of $\pi$ for this experiment, because the relevant behavior of the plotted functions requires a higher $\pi$ to be observed:
$$\pi = 0,\; 0.5,\; 1,\; 1.5,\; 2,\; 2.5,\; 3,\; 3.5,\; 4,\; 5$$

Since we did not derive the AMSE of the predictors under this data generating process, we will show the empirical MSE as the results. Therefore we decided to set the number of replications to 20000, to get smoother graphics. Note that the variability of the empirical MSE is higher, so more replications are needed to decrease the effect of extreme values. In this experiment, when $\rho = 0$, $\mathrm{corr}(\bar{x}_i, \alpha_i) = 0.37$ and for the rest of the parameter value combinations $\mathrm{corr}(\bar{x}_i, \alpha_i) \in [0, 0.21]$.

Experiment 4 follows the benchmark design but with a few differences:

• We changed the data generating process of $x_{i,t}$ to a more persistent autoregressive process. In particular, now we use
$$x_{i,t} = 0.1t + 0.9x_{i,t-1} + \omega_{i,t}.$$

• After a first trial with 5000 replications, we decided to set the number of replications to 20000. We did this because the theoretical AMSE was not a good approximation of the empirical MSE (in particular for the FE predictor), so we are forced to plot the latter.

In this experiment, when $\rho = 0$, $\mathrm{corr}(\bar{x}_i, \alpha_i) = 1$ and for the rest of the parameter value combinations $\mathrm{corr}(\bar{x}_i, \alpha_i) \in [0, 0.32]$.

4.2 Results

Results for Experiment 1, Experiment 2 and Experiment 4 are very similar, so we will only show Experiment 1 in this section and Experiment 2 and Experiment 4 in Appendix C.


4.2.1 Experiment 1

In Experiment 1 we have $Z \in [-0.30, 0.33]$ for the post sample case and $Z \in [-0.66, 0.82]$ for the out of sample case, meaning that there is no significant difference between the theoretical AMSE and the empirical MSE. We plot the AMSE of the predictors because they are smoother.

4.2.1.1 Post Sample Case

As shown in the graphics above, when $\rho = 0 \iff \sigma_\mu^2 = 0$, the Truncated Predictor performs better than the Ordinary Predictor. This is not surprising, because $\sigma_\mu^2 = 0$ implies that the individual specific effect $\alpha_i = \bar{x}_i'\pi + \mu_i$ depends only on the first term, since the second term (the unobservable part) is zero. The Ordinary Predictor is estimating a part that for this parameter combination is non-existent. This leads to an increase in variability with no positive contribution in predictive power, so the AMSE of the FOP is above that of the FTP.

The OLS is the best performing predictor when $\pi$ is close to zero. This is due to the fact that (based on the Gauss–Markov theorem) we know that the OLS is the BLUP in this setup when there is no serial correlation of the error term (in this case, when $\sigma_\mu^2 = 0$). As $\pi$ increases, the importance of the omitted variables for the OLS model increases, leading to a higher AMSE. Both the FTP and the FOP are well specified in this experiment, so their AMSE remains unchanged as the value of $\pi$ is modified.

For a value of $\pi$ that is big enough (in this case something close to 0.4), the FTP becomes the best performing predictor.

We can see that as $\rho$ increases, the predicting rules that use an estimate for $\mu_i$ perform remarkably better than the less complex rules. The shape of the AMSEs for all the predictors remains the same, but the FE predictor and the FOP shift down considerably. The zoom on the right hand side allows us to see that the FOP is the best performing predictor over the whole range of the parameters and the FE is a close second.

4.2.1.2 Out of sample case

For the out of sample case we can see at first glance that the TFE is performing poorly compared to the other predicting rules. This is due to the fact that it uses an unbiased estimator of $\beta$. Although unbiasedness is a desired property for an estimator, when trying to predict in a situation in which we have omitted relevant variables, a biased estimator (in this case the OLS estimation of $\beta$) can perform better, as it captures part of the effect of the omitted variables if they are correlated with the included ones. The graphic on the left is a good example of this, since the only difference between the OLS and the TFE is the estimation of $\beta$ that they use. The zoom on the right shows us that (similarly to what happened in the post sample case) the OLS predictor outperforms the FTP when $\pi$ is small, but as $\pi$ increases, the FTP is the preferred predicting rule.


As ρ increases, the performance of the TFE predictor doesn’t get any better. An interesting fact is that the range of values of π for which the OLS predictor outperforms the FTP is shrinking. One possible explanation for this is that the OLS predictor is not using the information about the serial correlation of the error term to improve the estimation of β, while the FTP uses a GLS method.

4.2.2 Experiment 3

In Experiment 3 only the FE predictor had $Z \in [-1.96, 1.96]$. This is not surprising, because we derived the AMSE of the predictors under assumptions A1, which are violated in this experiment. Since the AMSE is not a good approximation of the empirical MSE, we decided to plot the latter.

4.2.2.1 Post Sample Case

When the FOP and the FTP are misspecified, their MSE is no longer flat when we plot it against $\pi$. The bias of both predictors increases as $\pi$ increases, and this explains the shape that they exhibit. The OLS predictor has an empirical MSE that increases at the highest rate as $\pi$ changes. The empirical MSE corresponding to the FTP increases more slowly than the OLS one, because the additional term included in the prediction ($\bar{x}_i'\pi$) partially captures the observable part of the individual specific effect ($x_{i,1}\pi$). The MSE of the FOP increases even more slowly than that of the FTP because the additional term in the prediction ($\hat{\theta}_{MLE}\bar{e}_{i,MLE}$) compensates part of the bias corresponding to the FTP. Not surprisingly, the MSE of the FE predictor is still flat. This is due to the fact that the FE predictor estimates $\alpha_i$ as if it were a constant, independently of how it is defined, so that its MSE depends basically on the value of $\sigma_v^2$.

Putting everything together, the outcome for small values of π is similar to Experiment 1 and 2, and the OLS is the best performing predictor at the beginning. Then the FTP becomes the best performing predictor, but this is only temporary. As π becomes even bigger, the FOP is dominating the other predictors, and this will be true independently of the value of π.

The FE predictor is a close second to the FOP when π is remarkably big.

As ρ gets bigger, we notice that the MSE of the FTP is still increasing along with π.

Also the MSE of the FOP is basically flat, and it dominates the other predictors over the whole range of the parameters. This happens because now $\theta$ is very close to 1, so that the term $\hat{\theta}_{MLE}\bar{e}_{i,MLE}$ compensates almost all the bias that arises due to the misspecification of the predictor.

It is important to note that the FE predictor is always a close second to the FOP, even though it never performs better.


4.2.2.2 Out of Sample case

For the out of sample case, we can see that the TFE predictor is still performing poorly. When we look at the zoom on the right, we find the same effect that we had in the post sample prediction. The OLS predictor is better when $\pi$ is close to zero and then the FTP performs better, in spite of the fact that its MSE is increasing along with $\pi$.

As in Experiment 1, we find that all the MSEs are shifted up as $\rho$ increases. Since we are plotting empirical MSEs, the precision of the graphic does not allow for a detailed analysis of how the predictors behave when $\pi$ is close to zero. We believe that we should find a similar outcome to that of Experiment 1, in which there are fewer values of $\pi$ for which the OLS outperforms the FTP as $\rho$ increases.

4.3 Conclusions

4.3.1 Post Sample prediction

• The FOP derived in Chapter 2 is the best performing predictor under assumptions A1, A2 and A3, when $\rho \neq 0$ and $\pi$ is big enough. Experiment 3 showed that it can also perform adequately when assumptions A1 are not verified.

• The FE predictor is a close second to the FOP when $\rho$ is big enough, and it is highly recommended in those situations due to its simplicity and lack of strong assumptions.

• The FTP derived in Chapter 3 is the best performing predictor when $\rho = 0$ and $\pi \neq 0$ under assumptions A1, A2 and A3. It is not robust to violations of assumptions A1.

• The OLS predictor is recommended when there is enough evidence that $\rho = 0$ and that $\mathrm{corr}(\alpha_i, \bar{x}_i) = 0$. Otherwise, the FOP or the FTP would be a better choice.

4.3.2 Out of Sample prediction

• The FTP is the best performing predictor as long as $\pi$ is big enough and assumptions A1, A2 and A3 are verified. This was also true in Experiment 3, where assumptions A1 were violated, so the FTP is recommended for out of sample prediction as long as there is evidence that $\mathrm{corr}(\alpha_i, \bar{x}_i) \neq 0$.

• The OLS predictor performs nicely when $\pi$ is small, so it is recommended when the evidence suggests that $\mathrm{corr}(\alpha_i, \bar{x}_i) = 0$.

• The TFE predictor is by no means recommended for out of sample prediction. Due to the unbiasedness of $\hat{\beta}_{FE}$ and the lack of a term that estimates $\alpha_j$, it completely ignores the individual specific effect.

4.3.3 General conclusion

In the course of this thesis we found that the device suggested by Mundlak (1978) to model the correlation between the explanatory variables $x_i$ and the individual specific effect $\alpha_i$ performs nicely, at least under certain circumstances. The simplicity of this device and the gain in efficiency that it provides (measured in MSE) suggest that it should be taken into account for predictive purposes, even if this is not the true data generating process for $y_{i,t}$. This thesis is a first approach to this framework and does not pretend to give the ultimate solution to the problem. Future research could focus on applications comparing the performance of the FOP and the FTP with other predictors in real life circumstances, where the data generating process of the explanatory variables, the nature of the correlation between $x_i$ and $\alpha_i$ and the correlation of the error components will take different forms.

References

Baillie, R. and Baltagi, B. (1999). Prediction from the regression model with one-way error components. Analysis of Panels and Limited Dependent Variable Models, Cambridge University Press, pages 255–267.

Baltagi, B. H. (2007). Forecasting with panel data. Center for Policy Research Working Paper, (91).

Baltagi, B. H. and Li, D. (2006). Prediction in the panel data model with spatial correlation: the case of liquor. Spatial Economic Analysis, 1(2):175–185.

Breusch, T. S. (1987). Maximum likelihood estimation of random effects models. Journal of Econometrics, 36(3):383–389.

Cameron, A. C. and Trivedi, P. K. (2005). Microeconometrics: Methods and Applications. Cambridge University Press.

Chamberlain, G. (1984). Chapter 22: Panel data. Handbook of Econometrics, pages 1248–1318.

Chamberlain, G. and Hirano, K. (1999). Predictive distributions based on longitudinal earnings data. Annales d'Economie et de Statistique, pages 211–242.

Fiebig, D. and Johar, M. (2014). Forecasting with micro panels: The case of health care costs. Version: 24.

Frees, E. W. and Miller, T. W. (2004). Sales forecasting using longitudinal data models. International Journal of Forecasting, 20(1):99–114.

Goldberger, A. S. (1962). Best linear unbiased prediction in the generalized linear regression model. Journal of the American Statistical Association, 57(298):369–375.

Mundlak, Y. (1978). On the pooling of time series and cross section data. Econometrica: Journal of the Econometric Society, pages 69–85.

Schmalensee, R., Stoker, T. M., and Judson, R. A. (1998). World carbon dioxide emissions: 1950– 2050. Review of Economics and Statistics, 80(1):15–27.


Asymptotic variance and covariance matrix of the MLE

A.1 Setup and derivatives

Following Baillie and Baltagi (1999) we want to derive the variance and covariance matrix of the vector $[\gamma', \sigma_\mu^2, \sigma_v^2]$. Based on assumptions A1 and A2 we have:
$$y - A\gamma = \mu + v \sim N[0, \Omega]$$
We can set up the log likelihood for the estimation of $[\gamma', \sigma_\mu^2, \sigma_v^2]$:
$$\log L(\gamma', \sigma_\mu^2, \sigma_v^2) = c - 0.5\log|\Omega| - 0.5(y - A\gamma)'\Omega^{-1}(y - A\gamma) \qquad (A.1)$$
Now we need the first order and the second order partial derivatives to fill in the information matrix, which is defined as:
$$I \equiv \left[-E\begin{bmatrix}
\frac{\partial^2 \log L(.)}{\partial\gamma\partial\gamma'} & \frac{\partial^2 \log L(.)}{\partial\gamma\partial\sigma_\mu^2} & \frac{\partial^2 \log L(.)}{\partial\gamma\partial\sigma_v^2} \\
\frac{\partial^2 \log L(.)}{\partial\sigma_\mu^2\partial\gamma'} & \frac{\partial^2 \log L(.)}{\partial\sigma_\mu^2\partial\sigma_\mu^2} & \frac{\partial^2 \log L(.)}{\partial\sigma_\mu^2\partial\sigma_v^2} \\
\frac{\partial^2 \log L(.)}{\partial\sigma_v^2\partial\gamma'} & \frac{\partial^2 \log L(.)}{\partial\sigma_v^2\partial\sigma_\mu^2} & \frac{\partial^2 \log L(.)}{\partial\sigma_v^2\partial\sigma_v^2}
\end{bmatrix}\right]^{-1}$$

A.1.1 Partial derivatives w.r.t. γ

First we work on the partial derivative of (A.1) w.r.t. $\gamma$. It is a quadratic form. Using the chain rule and the fact that $\Omega^{-1}$ is symmetric:
$$
\begin{aligned}
\frac{\partial \log L(.)}{\partial\gamma} &= 0_{2K \times 1} \\
-0.5\frac{\partial}{\partial\gamma}(y - A\gamma)'\Omega^{-1}(y - A\gamma) &= 0_{2K \times 1} \\
(y - A\gamma)'\Omega^{-1}A &= 0_{2K \times 1} \\
y'\Omega^{-1}A - \hat{\gamma}'A'\Omega^{-1}A &= 0_{2K \times 1} \qquad (A.2) \\
\hat{\gamma}_{MLE} &= (A'\Omega^{-1}A)^{-1}A'\Omega^{-1}y \\
\hat{\gamma}_{MLE} &= \hat{\gamma}_{GLS}
\end{aligned}
$$
Note that $\hat{\gamma}_{MLE}$ is really just $\hat{\gamma}_{GLS}$ computed with the estimated values of $\sigma_\mu^2$ and $\sigma_v^2$. Now we want to compute the second derivatives of equation (A.2) w.r.t. $\gamma'$, $\sigma_\mu^2$ and $\sigma_v^2$:
$$
\begin{aligned}
\frac{\partial^2 \log L(.)}{\partial\gamma\partial\sigma_\mu^2} &= \frac{\partial}{\partial\sigma_\mu^2}(y - A\gamma)'\Omega^{-1}A = \frac{\partial}{\partial\sigma_\mu^2}(y - A\gamma)'\Big[\sigma_v^{-2}Q + (T\sigma_\mu^2 + \sigma_v^2)^{-1}P\Big]A \\
&= -(y - A\gamma)'\frac{T}{(T\sigma_\mu^2 + \sigma_v^2)^2}PA = -(y - A\gamma)'\frac{T}{(T\sigma_\mu^2 + \sigma_v^2)^2}\bar{A} \\
-E\Big[\frac{\partial^2 \log L(.)}{\partial\gamma\partial\sigma_\mu^2}\Big] &= \frac{T}{(T\sigma_\mu^2 + \sigma_v^2)^2}E\big[(y - A\gamma)'\big]\bar{A} = 0_{2K \times 1} \qquad (A.3)
\end{aligned}
$$
where the second equality uses the alternative definition $\Omega^{-1} \equiv \sigma_v^{-2}Q + (T\sigma_\mu^2 + \sigma_v^2)^{-1}P$ and the fourth equality uses $PA = \bar{A}$. In the last line we take minus the expectation of both sides and use $y - A\gamma = \mu + v$, so that $E[y - A\gamma] = 0$.
$$
\begin{aligned}
\frac{\partial^2 \log L(.)}{\partial\gamma\partial\sigma_v^2} &= \frac{\partial}{\partial\sigma_v^2}(y - A\gamma)'\Big[\sigma_v^{-2}Q + (T\sigma_\mu^2 + \sigma_v^2)^{-1}P\Big]A \\
&= -(y - A\gamma)'\Big[\frac{1}{\sigma_v^4}QA + \frac{1}{(T\sigma_\mu^2 + \sigma_v^2)^2}\bar{A}\Big] \\
-E\Big[\frac{\partial^2 \log L(.)}{\partial\gamma\partial\sigma_v^2}\Big] &= 0_{2K \times 1} \qquad (A.4)
\end{aligned}
$$
where again the first equality uses the alternative definition of $\Omega^{-1}$, the second equality uses $PA = \bar{A}$, and the last line takes minus the expectation of both sides and uses $E[y - A\gamma] = E[\mu + v] = 0$.
$$
\begin{aligned}
\frac{\partial^2 \log L(.)}{\partial\gamma\partial\gamma'} &= \frac{\partial}{\partial\gamma'}(y - A\gamma)'\Omega^{-1}A = \frac{\partial}{\partial\gamma'}\big(y'\Omega^{-1}A - \gamma'A'\Omega^{-1}A\big) = -A'\Omega^{-1}A \\
-E\Big[\frac{\partial^2 \log L(.)}{\partial\gamma\partial\gamma'}\Big] &= A'\Omega^{-1}A \qquad (A.5)
\end{aligned}
$$

A.1.2 Partial derivatives w.r.t. σµ

Now we work on the partial derivative of (A.1) w.r.t. $\sigma_\mu^2$:
$$\frac{\partial \log L(.)}{\partial\sigma_\mu^2} = -0.5\frac{\partial}{\partial\sigma_\mu^2}\Big[\log|\Omega| + (y - A\gamma)'\Omega^{-1}(y - A\gamma)\Big] = \underbrace{-0.5\frac{\partial}{\partial\sigma_\mu^2}\log|\Omega|}_{m_1}\; \underbrace{-\, 0.5\frac{\partial}{\partial\sigma_\mu^2}(y - A\gamma)'\Omega^{-1}(y - A\gamma)}_{m_2} \qquad (A.6)$$
