Income mobility in Ecuador : a Pseudo-Panel approach

Academic year: 2021


Income Mobility in Ecuador: a Pseudo-Panel approach

Alberto Basantes-Arias

Thesis presented for the degree of

Master of Science in Econometrics

Thesis Supervisor: Maurice Bun, PhD.

Department of Economics and Econometrics

University of Amsterdam

The Netherlands

Submitted for the degree of Master of Science in Econometrics

July, 2014

Abstract

The present thesis investigates the consistency conditions for estimating a small-N dynamic Pseudo-Panel formed from repeated cross sections (RCS). First, it is found that the inclusion of a time-varying cohort variable is essential to achieve unbiased estimation; second, the usual t-statistic is not normally distributed, which makes it necessary to use an alternative t-statistic. The methodology is applied to the analysis of income mobility in Ecuador in the period 2007-2013. The results suggest that the country is facing a medium-high mobility framework, which implies that policy should be oriented towards improving income; however, when gender differences are studied the picture is totally different for females, and this should be taken into account in policy design.

Contents

Abstract

1 Introduction

2 Methodology: Income mobility using Pseudo-Panel
2.1 Measurement error in income mobility
2.2 When can Pseudo-Panel be treated as a genuine Panel?
2.3 Estimation of Dynamic Models
2.3.1 Dynamic estimators for Genuine Panels
2.3.2 Dynamic estimators for Pseudo-Panels

3 Monte Carlo study
3.1 Monte Carlo design
3.2 Monte Carlo results
3.2.1 First setup: C = 20, nc = 200
3.2.2 Second setup: C = 80, nc = 50

4 Data and creation of Pseudo-Panel
4.1 National Employment and Unemployment Survey
4.2 Construction of the Pseudo-Panel

5 Empirical Application: Income Mobility in Ecuador
5.1 Interpretation of Results
5.2 Levels of Mobility

6 Conclusions

References

Appendix

A Cohorts creation
A.1 Three-year interval cohorts: balanced panel
A.2 Three-year interval by gender cohorts: balanced panel
A.3 One-year interval cohorts: unbalanced panel

List of Figures

1.1 Macroeconomic indicators for Ecuador
1.2 Gini coefficient for Ecuador
2.1 Tridiagonal matrix Σ
3.1 Diagram of cohorts creation
3.2 Simulation 1st setup: ρ = 0.99, β = 0
3.3 Simulation 1st setup: ρ = 0.50, β = 0, ε: non factor-structure
3.4 Simulation 1st setup: ρ = 0.50, β = 0.3, σx = 1
3.5 Simulation 1st setup: ρ = 0.50, β = 0.3, σx = 1.4

List of Tables

2.1 Main advantages and disadvantages of the dynamic estimators
3.1 Experimental design: scenarios
3.2 Dynamic Pseudo-Panel simulation results of the 1st setup
3.3 Dynamic Pseudo-Panel simulation results of the 2nd setup
4.1 Number of individuals: 26-65 years
4.2 Types of pseudo-panel
5.1 Estimates of quarterly mobility using first case of Pseudo-Panel
5.2 Estimates of quarterly mobility using second case of Pseudo-Panel
5.3 Estimates of quarterly mobility using third case of Pseudo-Panel
A.1 Observations per cohort: three-year interval
A.2 Observations per cohort: three-year interval, male
A.3 Observations per cohort: three-year interval, female
A.4 Cohorts creation criterion
A.5 Observations per cohort: one-year interval

Chapter 1

Introduction

Latin America has historically been characterised as a region with high levels of unemployment, low economic growth rates and high poverty indexes, which have brought high levels of inequality and macroeconomic instability as a consequence. Furthermore, Latin America has been recognised as the most unequal region in the world (Kliksberg, 2005).

In the years preceding the 2009 Great Recession, the region experienced a strong and sustained growth phase. As a result, Ecuador has been successfully following a policy plan aimed at diminishing poverty rates and increasing society's welfare. Figure 1.1 shows that the country has been experiencing higher growth rates than the region, and this has helped to ameliorate poverty.

Figure 1.1: Macroeconomic indicators for Ecuador

(a) Real GDP growth (b) Poverty incidence

Notes: Real GDP measured in 2005 US dollars. Poverty incidence measured as a percentage of the population.

Source: own elaboration using data from ENEMDU and World Bank.

More important still, Ecuador has become not only a regional but also a world reference because of its rapid reduction of inequality, especially in rural areas. As figure 1.2 shows, the national Gini coefficient has decreased by about 7 points since the beginning of the current government regime in 2007 (SENPLADES, 2013).

Figure 1.2: Gini coefficient for Ecuador

Source: own elaboration using data from ENEMDU and World Bank.

Now, it is important to understand whether this improvement in equality measures is sustainable over time, or is just a consequence of discrete policies that have rightly reduced poverty but will not generate a consistent impact on society's welfare (Cuesta, Ñopo, & Pizzolito, 2011).

This concern brings to light how important it is to analyse inequality in a dynamic way. The degree of income mobility, measured by the slope coefficient of a regression of income on its lagged value, is generally seen as a dynamic measure of inequality of opportunities, as suggested by Fields and Ok (1999), and also, according to Antman and McKenzie (2007), as a measure of flexibility in the labor market.

Furthermore, a complete view of the inequality patterns will help to identify the individuals with higher mobility and, in this way, to establish whether they would enjoy greater incentives to exert effort and climb up the income distribution, which would be an important policy aspect (Cuesta et al., 2011).

In this sense, Jäntti and Jenkins (2013) argue that greater levels of mobility are important in a society: the fact that individuals can end up at an income level different from the one at which they started represents a more open society with greater equality of opportunities.

Understanding important social issues such as the one presented above, or even poverty, employment and crime deterrence, makes the use of panel data essential, because it is necessary to study the same individuals over time and capture the changes in their characteristics in order to estimate dynamic models that make it possible to identify the true causal effect. However, in most underdeveloped countries this type of data is frequently lacking, owing to the absence of a national plan or to the high cost of following individuals over time.

To overcome this issue, Deaton (1985) proposed a methodology that takes advantage of the cross-sectional surveys with rotating samples, to follow cohorts of individuals over time and create a Pseudo-Panel, and with this be able to answer policy questions that require dynamic models. In the present thesis the Pseudo-Panel methodology is used to study income mobility in Ecuador.

After some treatment, detailed in the next chapters, the dynamic pseudo-panel can be estimated using the traditional methods for genuine panel data and will enjoy the same advantages as panel data: it is possible to control for selection biases by including fixed effects, which can be wiped out using a suitable transformation, in this case the within transformation.

This treatment is necessary because it is important to account for the fact that the level of poverty and the changes in income are not random; they depend on observed and unobserved intrinsic characteristics of the individuals (Beccaria, Maurizio, Fernandez, Monsalvo, & Álvarez, 2011).

Furthermore, wiping out these fixed effects is not only technically important, because it is necessary to identify the parameter by fixed effects (FE) estimation, but also empirically important, because it is the only way in which one can distinguish true state dependence from mere unobserved heterogeneity.

If an individual experiences a given outcome, in this case low income, because of unobserved adverse characteristics in period t, and is likely to experience the same outcome in the next period because of the same unobserved characteristics, the process is driven by unobserved heterogeneity (Cameron & Trivedi, 2005).

However, if the dynamics are driven by a true causal mechanism, in other words if low income in a specific period causes low income in the subsequent periods, the process is said, after Heckman (1981), to exhibit true state dependence.

Investigating whether income and its lag are related through a true causal mechanism or merely as a product of individual heterogeneity is vital for the design of policy derived from this analysis. As Giraldo, Rettore, and Trivellato (2002) note, the policy implications are different: if the relationship is due to true state dependence, it would make sense to push individuals to improve their income in the current period in order to reduce their likelihood of having low income in the following periods, but if there is no causal mechanism in the relation, such a policy makes no sense.

Two previous studies have investigated income mobility in Ecuador. Ordeñana and Villa (2012) concentrate on the effects of entrepreneurship on improving the welfare level in the country. They constructed a Pseudo-Panel using the ENEMDU from the fourth quarter of 2003-2010 and estimated the parameter by FE, GMM and simple OLS. However, their results send mixed messages (FE: 0.476, GMM: 0.864): considering that both the FE and GMM estimators are consistent in the dynamic pseudo-panel case, they present a very wide interval of mobility, which makes it difficult to draw policy implications.

The results in Carnelas (2010) are more in line with the results of the present thesis; however, the author uses additional controls and the effect is concentrated in a low-moderate scenario. The point estimates suggest a mobility in Ecuador ranging from 0.50 to 0.85 after including cohort effects that take unobserved heterogeneity into account.

The remainder of the thesis is organised as follows. Chapter 2 explains the main methodological aspects, including the construction of pseudo-panels and the estimation of dynamic panel models. Chapter 3 conducts a simulation exercise in order to investigate the properties of the Fixed Effects estimator and how the characteristics of the panel affect it. Chapter 4 gives a brief description of the data and explains the criteria used to create the cohorts. Chapter 5 presents the main results of the analysis and the interpretation of the main point estimates. Finally, the last chapter discusses some policy recommendations and gives the main conclusions.

Chapter 2

Methodology: Income mobility using Pseudo-Panel

This chapter gives an overview of the Pseudo-Panel methodology, as well as of the main asymptotic properties required to estimate dynamic models consistently with this method. When analysing income dynamics, panel data turn out to be necessary for the estimation of the model. The lack of panel data can be overcome by using repeated cross-sections (RCS) to follow cohorts of individuals over a period of time.

2.1 Measurement error in income mobility

Dynamic models are relevant in a broad scope of economic applications, including the one investigated in the present study: income mobility (MaCurdy, 1982). The mobility measure used here is the one that links current income to its immediate past, in the following way:

$$ y^{*}_{i,t} = \rho\, y^{*}_{i,t-1} + \beta x_{i,t} + \eta_{i,t} \tag{2.1.1} $$

where the asterisks denote the true values of variables that are, in practice, measured with some error, and the coefficient ρ is the measure of mobility, which is going to be addressed in two different models.¹

First, an unconditional measure of mobility will be obtained from (2.1.1). Following Antman and McKenzie (2007), if ρ is greater than 1 there is divergence rather than convergence, and if it is smaller than 1 there is some convergence of income, i.e. individuals with income below the mean in period t will experience quicker income growth than wealthier individuals.

¹ We allow the possibility of including exogenous regressors.

Second, taking into account that the dynamic process of income is influenced by unobserved variables that are usually fixed for each individual, the model will contain an individual fixed term $\alpha_i$. When this term is added to (2.1.1), the estimated parameter in (2.1.2) is a measure of conditional mobility.

$$ y^{*}_{i,t} = \rho\, y^{*}_{i,t-1} + \alpha_i + \eta_{i,t} \tag{2.1.2} $$

This indicates the speed at which an individual's income moves relative to its own mean; it is what the economic growth literature refers to as conditional convergence (Barro & Sala-i-Martin, 1991; Rodrik, 2011).
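To make the interpretation of ρ concrete, the following minimal sketch traces how a gap between an individual's income and its long-run mean evolves when only the deterministic part of the AR(1) process is considered. The function names `gap_path` and `half_life` are illustrative and do not come from the thesis, and the shocks $\eta_{i,t}$ are deliberately left out.

```python
import math


def gap_path(rho, initial_gap, periods):
    """Gap between income and its mean when gap_t = rho * gap_{t-1} (no shocks)."""
    return [initial_gap * rho ** t for t in range(periods + 1)]


def half_life(rho):
    """Number of periods needed for a gap to halve; defined only for 0 < rho < 1."""
    if not 0.0 < rho < 1.0:
        raise ValueError("half-life is only defined for 0 < rho < 1")
    return math.log(0.5) / math.log(rho)


if __name__ == "__main__":
    # Illustrative values of rho only; they are not estimates from the thesis.
    for rho in (0.5, 0.85, 0.99):
        path = [round(g, 3) for g in gap_path(rho, 1.0, 4)]
        print(f"rho={rho}: half-life={half_life(rho):.2f} periods, gap path={path}")
```

Under these assumptions, a value of ρ close to 1 means that gaps close very slowly (low mobility), while a smaller ρ means that gaps close quickly (high mobility).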

Consistent and unbiased estimates of ρ are therefore needed. However, income variables are usually measured with error, and in practice we observe:

$$ y_{i,t} = y^{*}_{i,t} + \nu_{i,t} \tag{2.1.3} $$

This measurement error is one of the challenges in the estimation of mobility, even more so in developing countries, where the measurement error tends to be larger because of some intrinsic aspects of society (Antman & McKenzie, 2007). For example, illiteracy can cause unintentional mistakes when individuals report their income, and some individuals intentionally misreport income in order to avoid taxes. Furthermore, a large share of workers are part of the shadow economy or are self-employed, which makes it difficult to calculate an exact measure of income.

The construction of the pseudo panel will allow some general forms of measurement error as in Antman and McKenzie (2007), in which the methodology averages out the measurement error, given that the cohorts are adequately constructed.

In this sense, the dynamic model for income mobility is obtained by substituting (2.1.3) into the mobility model (2.1.1), here written without exogenous regressors:

$$ y_{i,t} = \rho\, y_{i,t-1} + \varepsilon_{i,t} \tag{2.1.4} $$

where

$$ \varepsilon_{i,t} = \eta_{i,t} + \nu_{i,t} - \rho\, \nu_{i,t-1} $$

is a composite error term.
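The substitution behind (2.1.4) can be written out step by step. The lines below are a sketch for the case without exogenous regressors or individual effects, using $y^{*}_{i,t} = y_{i,t} - \nu_{i,t}$ from (2.1.3):

```latex
\begin{aligned}
y_{i,t} - \nu_{i,t} &= \rho\,\bigl(y_{i,t-1} - \nu_{i,t-1}\bigr) + \eta_{i,t} \\
y_{i,t} &= \rho\, y_{i,t-1}
  + \underbrace{\eta_{i,t} + \nu_{i,t} - \rho\,\nu_{i,t-1}}_{\varepsilon_{i,t}}
\end{aligned}
```

Because the observed regressor $y_{i,t-1}$ contains $\nu_{i,t-1}$, which also enters $\varepsilon_{i,t}$, the regressor and the composite error are correlated at the individual level whenever ρ ≠ 0. If, as an additional assumption, $\nu_{i,t}$ is serially uncorrelated and independent of true income, this covariance equals $-\rho\,\sigma^{2}_{\nu}$.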

We still have to make some remarks about how the method would deal with the presence of this measurement error in income.

The main identifying assumption, following Antman and McKenzie (2007), is that a law of large numbers applies within a cohort. That means that, as the number of individuals per cohort gets large, the average measurement error $\nu_{i,t}$ within the cohort converges in probability to zero:

$$ \frac{1}{n_c}\sum_{i=1}^{n_c} \nu_{i,t} \xrightarrow{\;p\;} 0 \tag{2.1.5a} $$

$$ \text{as } n_c \to \infty \tag{2.1.5b} $$

After constructing the pseudo-panel with sufficient observations per cohort, the measurement error will not affect the consistency of the estimates of (2.1.2). As mentioned by Antman and McKenzie (2007), if the individuals within a cohort do not share a common time-varying component in their measurement error, the construction of the Pseudo-Panel averages out the individual measurement error.
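The averaging argument in (2.1.5) can be illustrated with a small simulation. This is only a sketch under assumptions that are not taken from the thesis: the measurement errors are drawn as independent normal variables with mean zero and standard deviation `sigma_nu`, and the helper names are hypothetical.

```python
import random
import statistics


def cohort_mean_error(n_c, sigma_nu=1.0, seed=0):
    """Average of n_c independent, mean-zero measurement errors nu_{i,t}."""
    rng = random.Random(seed)
    errors = [rng.gauss(0.0, sigma_nu) for _ in range(n_c)]
    return statistics.fmean(errors)


def rms_of_cohort_mean_error(n_c, replications=500, sigma_nu=1.0):
    """Root mean square of the cohort-mean error over many simulated cohorts."""
    draws = [cohort_mean_error(n_c, sigma_nu, seed=r) for r in range(replications)]
    return (sum(d * d for d in draws) / replications) ** 0.5


if __name__ == "__main__":
    # The typical size of the averaged error should shrink roughly like 1/sqrt(n_c).
    for n_c in (10, 50, 200, 1000):
        print(n_c, round(rms_of_cohort_mean_error(n_c), 4))
```

A common time-varying component in the errors, which the text rules out, would not shrink in this way, because averaging only removes the idiosyncratic part.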

2.2 When can Pseudo-Panel be treated as a genuine Panel?

When dynamic relationships such as the one presented above are the main objective of an investigation, panel-type data are the main source of information. Consider the following general linear dynamic model:

$$ y_{i,t} = \rho\, y_{i,t-1} + \alpha_i + \varepsilon_{i,t} \tag{2.2.1} $$

where i = 1, ..., N indexes individuals, who are observed in periods t = 1, ..., T; $x_{i,t}$ corresponds to explanatory variables (which may be added as in (2.1.1)), $\varepsilon_{i,t}$ is an idiosyncratic component and $\alpha_i$ are the unobserved individual effects. An assumption that will be maintained throughout is that the disturbances $\varepsilon_{i,t}$ are independent across individuals.

Since $\alpha_i$ will, in general, be correlated with the other explanatory variables, equation (2.2.1) can only be estimated consistently from panel data by using a fixed effects estimation approach (Deaton, 1985).

Furthermore, even if $\alpha_i$ were random, OLS could not be used to estimate (2.2.1), because it would lead to inconsistent estimates: $y_{i,t-1}$ is expected to be correlated with the error term. This issue shows the need for the alternative (non-fixed-effects) estimators for dynamic models that will be presented later on (Cameron & Trivedi, 2005).

Nevertheless, recall that in this case panel data do not exist, and for that reason $y_{i,t-1}$ is not observed: individuals are not followed over time, which makes it impossible to identify the parameter ρ in equation (2.2.1). However, Deaton (1985), in his seminal work, pointed out that Repeated Cross Sections (RCS) can be used to estimate ρ consistently. He proposes to track cohorts (c = 1, ..., C) of individuals over a period of time.

The construction of the pseudo-panel involves taking averages over all individuals i belonging to each cohort c and treating them as synthetic observations, so the cohort-time average (denoted by a bar) version of equation (2.2.1), in which cohort-specific effects are included, is:

$$ \bar{y}_{c(t),t} = \rho\, \bar{y}_{c(t),t-1} + \bar{\alpha}_{c(t)} + \bar{\varepsilon}_{c(t),t} \tag{2.2.2} $$

However, this poses two main issues for the consistent estimation of ρ. First, $\bar{\alpha}_{c(t)}$ depends on time, unlike in the true panel case, which complicates estimating (2.2.2) by the usual random or fixed effects methods. Nevertheless, from a Monte Carlo simulation study, Verbeek and Nijman (1992) conclude that if the number of observations per cohort is large (e.g. at least 100-200), $\bar{\alpha}_{c(t)}$ can be assumed to be fixed, as in genuine panel data. In this way, the model will be:

$$ \bar{y}_{c(t),t} = \rho\, \bar{y}_{c(t),t-1} + \bar{\alpha}_{c} + \bar{\varepsilon}_{c(t),t} \tag{2.2.3} $$

Furthermore, considering the dynamic nature of the model and following Antman and McKenzie (2007), it is possible to expect that $\bar{y}_{c(t),t-1}$, which is not observable because a different random sample from the cohort is observed in each period, does not differ asymptotically from the sample mean of the individuals observed in t − 1, $\bar{y}_{c(t-1),t-1}$. So we can write equation (2.2.3) as:

$$ \bar{y}_{c(t),t} = \rho\, \bar{y}_{c(t-1),t-1} + \bar{\alpha}_{c} + \bar{\varepsilon}_{c(t),t} + \lambda_{c,t} \tag{2.2.4} $$

where

$$ \lambda_{c,t} = \rho\,\bigl(\bar{y}_{c(t),t-1} - \bar{y}_{c(t-1),t-1}\bigr). $$

This extra error term can also be controlled by letting the number of individuals in each cohort ($n_c = N/C$) become large, as shown in McKenzie (2004). $\lambda_{c,t}$ can be safely ignored when the cohort from which individuals are interviewed in t − 1 is the same as the one in t and in t + 1, because it then does not affect consistency; however, the standard errors change and need to be corrected.

This generates a problem in the present case with the oldest and the youngest cohorts, because changes in the composition of these cohorts would not be random; e.g. old poor people could be more likely to die in the period between t and t + 1 (McKenzie, 2004).

In the present study the data are quarterly, so that important movements such as migration or the death of individuals are less likely to occur between periods; in the same fashion, the sample is restricted to individuals aged 25-65, who are more likely to have a reasonably regular income. These two practical decisions were taken in order to mitigate the impact of the previous issue (Antman & McKenzie, 2007).

Until now, we have talked about the creation of cohorts as an important practical decision for the consistency of the estimate of ρ. A cohort is defined as a group of individuals with fixed membership that can be identified when they show up in the surveys (Cameron & Trivedi, 2005).

Following Verbeek and Nijman (1992), the cohorts should be defined on the basis of a continuous variable that is: i. constant over time, and ii. observed for all the individuals in the sample. These cohorts can be, for example, age-based, e.g. individuals born between 1980-1983.
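As an illustration of how such cohorts turn repeated cross-sections into a pseudo-panel, the sketch below groups respondents by birth-year interval and averages income by cohort and period. The record layout, the three-year width and the starting year 1950 are assumptions made for the example; they are not the cohort definitions used in this thesis.

```python
from collections import defaultdict


def birth_cohort(birth_year, first_year=1950, width=3):
    """Map a birth year to a fixed-width cohort label such as '1980-1982'."""
    start = first_year + ((birth_year - first_year) // width) * width
    return f"{start}-{start + width - 1}"


def pseudo_panel(records, first_year=1950, width=3):
    """Average income by (cohort, period) over repeated cross-sections.

    records: iterable of (period, birth_year, income) tuples, one per
    respondent; a different sample of respondents may appear in each period.
    Returns a dict {(cohort, period): (mean_income, n_c)}.
    """
    cells = defaultdict(lambda: [0.0, 0])  # running [sum of income, count]
    for period, birth_year, income in records:
        cell = cells[(birth_cohort(birth_year, first_year, width), period)]
        cell[0] += income
        cell[1] += 1
    return {key: (total / n, n) for key, (total, n) in cells.items()}


if __name__ == "__main__":
    # Made-up records, for illustration only.
    sample = [
        ("2007Q1", 1980, 300.0),
        ("2007Q1", 1981, 500.0),
        ("2007Q2", 1982, 450.0),
        ("2007Q1", 1984, 200.0),
    ]
    for key, value in sorted(pseudo_panel(sample).items()):
        print(key, value)
```

Birth year satisfies both requirements above: it does not change over time and it is observed for every respondent. The cell counts returned alongside the means make it easy to check whether each cohort has enough observations.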

For the asymptotic theory developed in the literature (e.g. McKenzie (2004), Antman and McKenzie (2007) and Verbeek and Nijman (1992)) to hold, the number of individuals in each cohort should be large enough. Note, however, that this implies that fewer cohorts can be defined, and hence fewer observations in the pseudo-panel, which increases the variance of the estimator of the coefficient. So there is a trade-off between bias and efficiency that should be taken into account in the construction of the cohorts.

Furthermore, this poses a new challenge in the estimation of mobility, because typically one will accept a small cross-section dimension in order to allow $n_c \to \infty$, whereas most of the theory of dynamic panel data estimation is developed for the typical microeconomic case of a large cross-section dimension and a small time dimension.

2.3 Estimation of Dynamic Models

Bearing in mind the previous discussion on creating a pseudo-panel based on cohorts of individuals observed in different independent cross-sections, it has been established that, if the cohorts are adequately constructed, the Pseudo-Panel can be treated as a genuine panel. It is therefore now time to discuss how the parameters of models (2.1.1) and (2.1.2) can be consistently estimated.

2.3.1 Dynamic estimators for Genuine Panels

It is well known that the usual OLS method cannot be applied to estimate the parameter of a dynamic model such as (2.1.2). Whether the term $\alpha_i$ is assumed to be stochastic or fixed, it is correlated with the lagged dependent variable; and if the disturbance term is assumed to exhibit serial correlation, $y_{i,t-1}$ is positively correlated with the error term, which violates the strict exogeneity assumption and therefore makes the OLS estimates inconsistent (even in large N, large T panels) (Bond, 2002).

Following this, it is natural to think of a transformation that eliminates this identified source of inconsistency; commonly the within transformation is used. This regresses $(y_{i,t} - \bar{y}_{i})$ on $(y_{i,t-1} - \bar{y}_{i,-1})$ and has error term $(\eta_{i,t} - \bar{\eta}_{i})$.

Note that in this case too the regressor $(y_{i,t-1} - \bar{y}_{i,-1})$ is correlated with the error $(\eta_{i,t} - \bar{\eta}_{i})$. As mentioned in Nickell (1981), when the lagged dependent variable appears as an explanatory variable, OLS estimation of the within-transformed model is also inconsistent.

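To show this, consider the within transformation of model (2.1.2) (ignoring for a moment the fact that these variables are observed with error):

$$ (y_{i,t} - \bar{y}_{i}) = \rho\,(y_{i,t-1} - \bar{y}_{i,-1}) + (\eta_{i,t} - \bar{\eta}_{i}) \tag{2.3.1} $$

where the mean values of $y_{i,t}$, $y_{i,t-1}$, $\alpha_i$ and $\eta_{i,t}$ are obtained and the model is expressed in deviations from these means. Now consider that

$$ \bar{y}_{i,-1} = \frac{1}{T-1}\sum_{t=1}^{T-1} y_{i,t} = \frac{1}{T-1}\,(y_{i,1} + \dots + y_{i,t} + \dots + y_{i,T-1}) $$

and

$$ \bar{\eta}_{i} = \frac{1}{T-1}\sum_{t=2}^{T} \eta_{i,t} = \frac{1}{T-1}\,(\eta_{i,2} + \dots + \eta_{i,t-1} + \dots + \eta_{i,T}). $$

So the transformed regressor and the transformed error term are of the form:

$$ y_{i,t-1} - \frac{1}{T-1}\,(y_{i,1} + \dots + y_{i,t} + \dots + y_{i,T-1}) \tag{2.3.2a} $$

$$ \eta_{i,t} - \frac{1}{T-1}\,(\eta_{i,2} + \dots + \eta_{i,t-1} + \dots + \eta_{i,T}) \tag{2.3.2b} $$

So we can notice that even after the transformation the regressor and the error term are correlated: specifically, the component $-\frac{y_{i,t}}{T-1}$ is correlated with $\eta_{i,t}$, and the component $-\frac{\eta_{i,t-1}}{T-1}$ is correlated with $y_{i,t-1}$.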

Nickell (1981) derived analytical expressions for the bias of the within estimator in the first-order autoregressive model, which has the following form (in the case of covariance stationarity):

$$ \underset{N \to \infty}{\operatorname{plim}}\,(\hat{\rho} - \rho) = -\frac{1+\rho}{T-1}\left[1 - \frac{1}{T}\,\frac{1-\rho^{T}}{1-\rho}\right] \times \left\{1 - \frac{2\rho}{(1-\rho)(T-1)}\left[1 - \frac{1-\rho^{T}}{T(1-\rho)}\right]\right\}^{-1} \tag{2.3.3} $$

For a positive ρ, the bias is negative. From this expression we can see that the bias is of order $O(T^{-1})$, so consistency requires T to grow, which reduces the bias. In practice, however, one should be careful about how large T has to be for this bias to be ignored, an issue that will be considered later (Nickell, 1981).
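To get a feeling for the size of this bias, expression (2.3.3) can be evaluated numerically. The function below is a direct transcription of the formula; its name and the example values of ρ and T are illustrative assumptions, not results from the thesis.

```python
def nickell_bias(rho, T):
    """Asymptotic (N -> infinity) bias of the within estimator, as in (2.3.3)."""
    if T < 2 or rho == 1.0:
        raise ValueError("requires T >= 2 and rho != 1")
    geometric = (1.0 - rho ** T) / (1.0 - rho)  # (1 - rho^T) / (1 - rho)
    shrink = 1.0 - geometric / T                # 1 - (1/T)(1 - rho^T)/(1 - rho)
    lead = -(1.0 + rho) / (T - 1.0)             # -(1 + rho)/(T - 1)
    correction = 1.0 - 2.0 * rho / ((1.0 - rho) * (T - 1.0)) * shrink
    return lead * shrink / correction


if __name__ == "__main__":
    # Illustrative grid: the bias shrinks in absolute value as T grows.
    for T in (2, 5, 10, 20, 50):
        print(T, round(nickell_bias(0.5, T), 4))
```

For T = 2 the expression collapses to $-(1+\rho)/2$, which gives a quick sanity check on the transcription.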

Therefore, in order to estimate the parameters of the dynamic model consistently, three additional estimators are going to be considered: a straightforward generalisation of the traditional Least Squares Dummy Variable (LSDV) estimator that incorporates a bias correction (LSDVbc), developed in Kiviet (1995) and further studied in Bun and Kiviet (2003); and two instrumental variable approaches, as in Anderson and Hsiao (1982) (A-H) and Arellano and Bond (1991) (A-B), the latter being based on the Generalised Method of Moments (GMM).

In this sense, the motivation for presenting the LSDVbc in the present study comes from the fact that the bias in equation (2.3.3) is derived for N → ∞. In pseudo-panel analysis, however, besides the two usual dimensions of a genuine panel one has to consider two further dimensions: the number of cohorts (C) and the number of individuals in each cohort ($n_c$). In the literature, the typical case is that $n_c \to \infty$ while the number of cohorts C (which is the cross-sectional dimension of the pseudo-panel) is fixed.

In other words, in most cases of pseudo-panel analysis the panel will be small, especially in the cross-section dimension, in order to allow $n_c \to \infty$ and ignore the measurement error that originates from not observing the same individuals in each time period, as mentioned before (Verbeek & Nijman, 1992; Antman & McKenzie, 2007).

Judson and Owen (1999) investigated the finite-sample properties of the LSDVbc, A-H and A-B estimators when the time dimension is larger than the ("small") cross-sectional dimension. After running a simulation exercise, they conclude that in the case of a balanced panel the LSDVbc is the preferred estimator, as it has the smallest Root Mean Squared Error (RMSE) among the estimators considered; however, in the case of an unbalanced panel with a time dimension smaller than 20, a GMM estimator with one instrument or the traditional A-H estimator would be preferred.

Bun (2001) also investigated, by means of Monte Carlo simulation, the properties of these estimators when both dimensions are small, and he concludes that in panels of small dimensions the LSDVbc estimator has a lower RMSE than the traditional Instrumental Variable (A-H) and GMM (A-B) estimators.

However, Beck and Katz (2004) investigate the properties of A-H, LSDV and LSDVbc when T is larger (e.g. T = 20). They conclude that the decision regarding A-H is easy: it should not be used in a large T, small N type of panel, because it may have small bias but its variability is very high.

When comparing the traditional LSDV with the Kiviet (1995) LSDVbc, the decision is less straightforward: while it is true that the latter presents a small bias and a relatively smaller RMSE, the "costs" of implementing it are high (Beck & Katz, 2004). The bias expressions are complicated and not easy to use in practice;² in addition, the estimator relies on the "true" parameters as initial values, which are unknown. Kiviet (1995) proposes to use A-H or A-B as initial values, but this introduces some noise in the estimators. Finally, standard errors can only be calculated by bootstrap (because an asymptotic expression is difficult to derive), and in this case the pseudo-panel nature of the data can impose some difficulties on the way in which the bootstrap is done at the cohort level.

² Nowadays there is a Stata routine that computes this estimator using A-H or A-B as initial values and the bootstrap to calculate standard errors; it was used in the present analysis (Bruno, 2005).

2.3.2 Dynamic estimators for Pseudo-Panels

In addition to the methods presented before, Verbeek and Vella (2005) and McKenzie (2004) presented simulation exercises for the specific pseudo-panel case. Their proposed estimators are presented next. The former paper presents an augmented Instrumental Variable method, considering that grouping the individuals into cohorts is an application of an IV method; the latter proposes a consistent OLS estimator to identify parameter heterogeneity.

In this sense, Verbeek and Vella (2005) argue that the within estimator bias in the dynamic model can be ignored. Consider the cohort mean model as in (2.2.4), where, in an attempt to capture the cohort effects, a set of dummies $c_{i(t)}$ indicating the cohort is included, and where the last term of the error is omitted:³

$$ \bar{y}_{c(t),t} = \rho\, \bar{y}_{c(t-1),t-1} + \gamma\, c_{i(t)} + \bar{\varepsilon}_{c(t),t} \tag{2.3.4} $$

Equation (2.3.4) is estimated by IV, as in Verbeek and Vella (2005), using $c_{i(t)}$ interacted with time dummies as instruments, after first transforming the model to wipe out the fixed effects:

$$ \bar{y}_{c(t),t} - \bar{y}_{c(t-1),t-1} = \rho\,\bigl(\bar{y}_{c(t-1),t-1} - \bar{y}_{c(t-2),t-2}\bigr) + \bigl(\bar{\varepsilon}_{c(t),t} - \bar{\varepsilon}_{c(t-1),t-1}\bigr) \tag{2.3.5} $$

Equation (2.3.5) is a standard transformed dynamic model, but the cross-sectional dimension is given by cohorts. For the augmented instrumental variables estimator to be consistent, it is required that $E\{\varepsilon_{i(t),t},\, c_{i(t)}\} = 0$.

In this way, it is concluded that the traditional within estimator bias, as presented before for the genuine panel case, can be ignored in a dynamic Pseudo-Panel if the previous condition holds, because the error term of the Pseudo-Panel model is a within-cohort average of the individual error terms, and it is asymptotically zero if the cohorts provide time-varying variation in the regressors and have no time-varying relation with the error term.

³ Provided the cohorts are adequately constructed and that $n_c \to \infty$, as in Verbeek and Nijman.

This fact is also tested in the Monte Carlo study in chapter 3, which provides evidence that a simple AR(1) model including a time-varying cohort variable can be consistently estimated using an FE estimator with the usual within transformation in the setup in which the number of individuals in each cohort is larger than the number of cohorts ($n_c = 200$, C = 20).

Finally, since cohort parameter heterogeneity can be an important issue in the present analysis, McKenzie (2004) can be followed to obtain consistent estimates of mobility in each of the cohorts.

Consider the following extension of the dynamic model for individual i in period t:

$$ y_{i(t),t} = \rho_c\, y_{i(t),t-1} + \alpha_{i(t)} + \varepsilon_{i(t),t} \tag{2.3.6} $$

As we can see, the parameter ρ is allowed to differ across cohorts, which gives the following cohort-average version:

$$ \bar{y}_{c(t),t} = \rho_c\, \bar{y}_{c(t-1),t-1} + \alpha_c + \bar{\varepsilon}_{c(t),t} \tag{2.3.7} $$

Here, as in the previous cases, the error that arises because one cannot observe the same individuals at two different times converges to zero as $n_c \to \infty$.

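To express model (2.3.7) in matrix form, following McKenzie (2004), let $\vartheta_c = (\alpha_c, \rho_c)'$ denote the 2 × 1 vector of parameters for cohort c, and let $\bar{y}_c$, $\bar{y}_{c,-1}$ and $\bar{\varepsilon}_c$ be the vectors with elements $\bar{y}_{c(t),t}$, $\bar{y}_{c(t-1),t-1}$ and $\bar{\varepsilon}_{c(t),t}$. Additionally, letting $\bar{\Gamma}_c = (\iota, \bar{y}_{c,-1})$, where ι is a vector of ones, we can write (2.3.7) as:

$$ \bar{y}_c = \bar{\Gamma}_c \vartheta_c + \bar{\varepsilon}_c, \qquad c = 1, 2, \dots, C \tag{2.3.8} $$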

A simple approach to estimate (2.3.8) is to apply OLS. The error term is now orthogonal to the regressor, because the unobserved lagged regressor is in effect instrumented by the mean of the individuals observed at t − 1, $\bar{y}_{c(t-1),t-1}$.

The estimator for cohort c is then given by:

$$\hat{\vartheta}_c = \left( \bar{\Gamma}_c' \bar{\Gamma}_c \right)^{-1} \bar{\Gamma}_c' \bar{y}_c \tag{2.3.9}$$

The author also considers an IV estimator, to allow for the case in which the pseudo-panel measurement error has not disappeared and causes the lagged dependent variable to be endogenous. However, it is proved that, under the natural asymptotic theory for pseudo-panels (nc → ∞, T fixed), the OLS estimator is consistent and (in the limit) normally distributed, and it had a smaller RMSE in the simulation exercise. McKenzie (2004) therefore suggests that the OLS estimator is likely to be preferred in practice.

As we can see, the literature has not reached a conclusion about the best estimator for dynamic models in a small N, moderate T panel, and even less so in the present case, in which repeated cross sections are used to estimate income mobility.
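The cohort-by-cohort OLS in (2.3.9) can be sketched as follows. This is a minimal illustration in Python under the assumption that the cohort means have already been computed and are ordered in time; the function name `cohort_ols` and the noise-free example series are hypothetical and serve only to show the mechanics.

```python
def cohort_ols(ybar):
    """OLS of ybar[t] on a constant and ybar[t-1] for a single cohort.

    ybar: list of cohort means ordered in time (hypothetical input).
    Returns (alpha_hat, rho_hat), the two elements of vartheta_c.
    """
    y = ybar[1:]    # dependent variable: cohort mean at t
    x = ybar[:-1]   # regressor: cohort mean at t-1
    n = len(y)
    if n < 2:
        raise ValueError("need at least three periods per cohort")
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("lagged cohort mean has no time variation")
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    rho_hat = sxy / sxx
    alpha_hat = y_mean - rho_hat * x_mean
    return alpha_hat, rho_hat


# Noise-free illustration: ybar[t] = 1.0 + 0.5 * ybar[t-1]
series = [0.0]
for _ in range(6):
    series.append(1.0 + 0.5 * series[-1])
alpha_hat, rho_hat = cohort_ols(series)
```

Applying the function separately to each cohort yields one pair of estimates per cohort, which is what allows the mobility parameter to differ across cohorts.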

Table 2.1 summarises the estimators presented in the previous section. Given the conditions of the empirical application, a standard FE estimator is going to be used to obtain consistent estimates of conditional and unconditional mobility in Ecuador. Finally, the McKenzie (2004) procedure will be used to obtain estimates that allow for cohort heterogeneity.

Table 2.1: Main advantages and disadvantages of the dynamic estimators

| Reference | Method | Advantages | Disadvantages |
|---|---|---|---|
| Traditional | LSDV | Performs well in panels with a large T dimension. Simple to implement. Used before in applications of Pseudo-Panel analysis. | |
| Kiviet (1995) and Bun and Kiviet (2003) | Bias-Corrected LSDV | Performs well in small dimension panels. | Not simple to implement. Not used before in applications of Pseudo-Panel analysis. |
| Anderson and Hsiao (1982) | Second lag instrument in levels | Used before in the income mobility literature. Computationally simple. | Not efficient in finite-sample dynamic panels. |
| Arellano and Bond (1991) | One-step GMM restricted to use less than 7 instruments | More efficient than the simple IV estimator. Performs well in unbalanced panels with a moderate T dimension (20 or less). | Does not perform well in finite-sample dynamic panels. With a large time dimension it could suffer from instrument proliferation, causing a weak instruments problem. |
| McKenzie (2004) | Parameter Heterogeneity | Used before in Pseudo-Panel analysis. Performs well in pseudo-panels. Simple to implement. | When the number of cohorts is large, interpretation is hard. |

All the estimators are detailed in the previous section.

Source: own elaboration using information on the detailed sources.

Standard errors in Pseudo-Panels

A final aspect to be addressed is the estimation of the standard errors in the pseudo-panel scenario. Consider that, as mentioned before, the cross-section dimension is usually small and this will pose some problems in estimating the standard errors.

First, the use of cluster-robust standard errors is not possible, as the number of clusters (cohorts) is small. Following Petersen (2009), recall that clustered standard errors place no restriction on the correlation structure within a cluster, so their consistency (and normal distribution; see Donald and Lang (2007)) depends on having a large number of clusters. In practice, with few clusters the t-statistic based on cluster standard errors tends to over-reject the null hypothesis. This has created a "consensus" in the literature that at least 50 cluster units are needed to obtain unbiased estimators of the variance (Cameron & Miller, 2013).

Furthermore, consistency of the least squares estimator of the dynamic parameter in a pseudo-panel framework requires nc → ∞. Under this condition the estimator is unaffected by the measurement error and is therefore consistent; the standard errors, however, are affected and will be larger than those obtained with genuine panels. A second fact with an important effect on the standard errors is that, in most cases, the number of elements is not the same in all of the cohorts.

To solve this problem, the variance-covariance matrix suggested in Inoue (2008) is going to be used in the present analysis.

Using a simulation exercise, Inoue (2008) underlines the importance of a different (robust) t-statistic in the pseudo-panel framework. Even though the fixed effects estimator is consistent in a dynamic pseudo-panel framework, the usual t-statistic is not asymptotically normally distributed.

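It is also proven that, under the null that $\hat{\rho} = \rho$,

$$t_{\rho_{FE},j} = \frac{\sqrt{N}\,\left(\hat{\rho}_{FE,j} - \rho_j\right)}{\sqrt{\left[ (W'MW)^{-1}\, W'M \hat{\Sigma}^{-1} W\, (W'MW)^{-1} \right]_{jj}}} \xrightarrow{d} N(0, 1) \tag{2.3.10}$$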

where $M$ is a within transformation matrix and $W$ is a weighting matrix formed by the exogenous variables $x$ at the cohort level and the cohort-level mean of the lagged dependent variable, given by:

$$W = \begin{bmatrix} x_{11}' & \bar{y}_{10} \\ \vdots & \vdots \\ x_{1T}' & \bar{y}_{1,T-1} \\ x_{C1}' & \bar{y}_{C0} \\ \vdots & \vdots \\ x_{CT}' & \bar{y}_{C,T-1} \end{bmatrix} \tag{2.3.11}$$

$\hat{\Sigma}$ is the $CT \times CT$ tridiagonal matrix (see figure 2.1) with $((c-1)T+t,\ (c-1)T+t)$th diagonal element given by:

$$\frac{N}{N_{ct}} \left[ \frac{1}{N_{ct}} \sum_{i \in I_{ct}} \left( y_i - \hat{\theta}' w_i \right)^2 - \left( \frac{1}{N_{ct}} \sum_{i \in I_{ct}} \left( y_i - \hat{\theta}' w_i \right) \right)^2 \right] + \frac{\hat{\rho}^2 N}{N_{c,t-1}} \left[ \frac{1}{N_{c,t-1}} \sum_{j \in I_{c,t-1}} y_j^2 - \left( \frac{1}{N_{c,t-1}} \sum_{j \in I_{c,t-1}} y_j \right)^2 \right] \tag{2.3.12}$$

and $((c-1)T+t,\ (c-1)T+t-1)$th and $((c-1)T+t-1,\ (c-1)T+t)$th elements given by:

$$-\frac{\hat{\rho} N}{N_{ct}} \left[ \frac{1}{N_{ct}} \sum_{i \in I_{ct}} \left( y_i - \hat{\theta}' w_i \right) y_i - \frac{1}{N_{c,t-1}} \sum_{i \in I_{c,t-1}} \left( y_i - \hat{\theta}' w_i \right) \frac{1}{N_{c,t-1}} \sum_{i \in I_{c,t-1}} y_i \right] \tag{2.3.13}$$

where $\hat{\theta} = [\hat{\rho}, \hat{\beta}]$ is some consistent estimator of $\theta$, such as the FE estimator.

Figure 2.1: Tridiagonal matrix Σ

Chapter 3

Monte Carlo study

In this chapter we introduce a Monte Carlo simulation study to investigate the distribution of the Fixed Effects estimator in the particular case of a pseudo-panel with a dynamic model and a small cross-section dimension.

In the first section the Data Generating Process (DGP) is presented, together with some considerations needed before performing the simulation exercise. In the next section the results of the Monte Carlo analysis are presented and some recommendations for practice are discussed.

3.1 Monte Carlo design

This section describes the DGP used to investigate the properties of the fixed effects estimator in a pseudo-panel context. The study focuses on the consistency of the within estimator when a dynamic model is estimated on pseudo-panel data. The effects of a factor structure in the error term and the importance of a time-varying exogenous regressor at the cohort level are also analysed.

The data are generated at the individual level, considering that in practice the creation of cohorts is an important issue in pseudo-panel analysis. The way in which the data are created has to take into account that the cohorts must exhibit between-group variation that is higher than the within-group variation.

Consequently, to create the cohorts, a large panel data set was generated, and the elements of each cohort were obtained by keeping the block-diagonal elements and discarding the rest. Figure 3.1 illustrates how each cohort was created, using cohort 1 as an example:

[Figure: block-diagonal layout for cohort 1 with blocks (C1, T1), (C1, T2), (C1, T3), ..., (C1, TT); dimension labels nC ∗ C, nc and T]

Figure 3.1: Diagram of cohorts creation
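The block-diagonal selection described above can be sketched as follows. This is a minimal illustration in Python under an assumption that is not stated explicitly in the text: the simulated panel for one cohort is taken to have nc · T individuals, so that a fresh block of nc individuals is observed in each period. The function name `block_diagonal_sample` and the toy data are hypothetical.

```python
def block_diagonal_sample(panel, n_c):
    """Keep only the block-diagonal entries of an (n_c * T) x T panel.

    panel: list of rows, one per individual, each row holding T values.
    Returns a list of T cross-sections; cross-section t contains the
    period-t values of individuals t*n_c, ..., (t+1)*n_c - 1 only, so
    no individual is observed in more than one period.
    """
    T = len(panel[0])
    if len(panel) != n_c * T:
        raise ValueError("panel must have n_c * T rows")
    return [
        [panel[i][t] for i in range(t * n_c, (t + 1) * n_c)]
        for t in range(T)
    ]


# Toy panel: 2 individuals per block, 3 periods; entry = 10 * individual + period
toy = [[10 * i + t for t in range(3)] for i in range(6)]
sections = block_diagonal_sample(toy, n_c=2)
```

Averaging each returned cross-section gives the cohort means that form one row of the pseudo-panel.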

This large panel data set is created C times using the following DGP:

$$y_{i,t} = \rho y_{i,t-1} + \beta x_{c,t} + \eta_c + \varepsilon_{i,t} \tag{3.1.1}$$

where $\eta_c = \alpha_i + \alpha_c$. Here $\alpha_c$ is a time-invariant cohort effect. Since the data are generated at the individual level, $\eta_c$ also contains an extra term that acts as a perturbation, $\alpha_i$, an individual effect that differentiates each individual within a cohort. $\varepsilon_{i,t}$ is a traditional idiosyncratic error term and $y_{i,t-1}$ is the lagged dependent variable.

The main reason for investigating the properties of the estimator is that the identification of the auto-regressive parameter ρ is not as trivial as it appears in the case of a pure AR(1) model. It turns out to depend crucially on the presence of some time-varying variable, i.e. β ≠ 0. If this is not the case, the dependent variable may converge in the limit to a constant at the cohort level.

To visualise this issue, consider first a simplified version of (3.1.1) in which there is no time-varying exogenous regressor at the cohort level, i.e. β = 0:

$$y_{i,t} = \rho y_{i,t-1} + \alpha_i + \alpha_c + \varepsilon_{i,t} \tag{3.1.2}$$

The cohort-average version of (3.1.2) can be rewritten, by recursive substitution, as:

$$\bar{y}_{c,t} = \frac{\bar{\alpha}_c}{1 - \rho} + \frac{\alpha_c}{1 - \rho} + \sum_{s=0}^{\infty} \rho^s \bar{\varepsilon}_{i,t-s} \tag{3.1.3}$$

In this case, when the number of individuals per cohort goes to infinity, as required for consistency in a pseudo-panel, the identification problem described before arises, because the dependent variable converges to a constant:

$$\operatorname{plim}\, \bar{y}_{c,t} = \frac{\alpha_c}{1 - \rho} \tag{3.1.4}$$

Hence, the within transformation used to obtain the FE estimator of model (3.1.1) could wipe out the dependent variable at the same time as it wipes out the unobserved fixed effects. As a result, in the simulation it is expected that the FE estimator of the pseudo-panel will not be identified, and hence that it will be inconsistent and present a large bias.
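The collapse of the cohort mean towards a constant can be illustrated numerically. The sketch below is a simplified Python illustration, not the thesis code: for brevity it follows the same simulated individuals over time instead of drawing new ones each period, sets all variances to one, starts the process at zero and discards a burn-in. The function name `cohort_mean_path` and all parameter values are assumptions made for the example.

```python
import random
import statistics


def cohort_mean_path(n_c, T, rho, alpha_c, burn_in=50, seed=0):
    """Simulate y_it = rho * y_it-1 + alpha_i + alpha_c + eps_it (beta = 0)
    for n_c individuals and return the T cohort means after burn-in."""
    rng = random.Random(seed)
    alpha_i = [rng.gauss(0.0, 1.0) for _ in range(n_c)]
    y = [0.0] * n_c
    means = []
    for t in range(burn_in + T):
        y = [rho * y[i] + alpha_i[i] + alpha_c + rng.gauss(0.0, 1.0)
             for i in range(n_c)]
        if t >= burn_in:
            means.append(sum(y) / n_c)
    return means


small = cohort_mean_path(n_c=10, T=20, rho=0.5, alpha_c=1.0)
large = cohort_mean_path(n_c=2000, T=20, rho=0.5, alpha_c=1.0)
# Time variation of the cohort mean: this is what the within transformation uses
var_small = statistics.pvariance(small)
var_large = statistics.pvariance(large)
```

With the larger cohort, the cohort mean stays close to α_c / (1 − ρ) and its variation over time becomes very small, so little is left once the cohort-specific time mean is subtracted.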

Now, if we consider the case in which β ≠ 0, identification is possible and the stationary case does not suffer from the same identification problem. Hence it is expected that the FE estimator in the pseudo-panel is unbiased and consistent.

Additionally, if we consider an error term that follows a factor structure, the within transformation would not wipe out the dependent variable. In this final case it is therefore considered that:

$$\varepsilon_{i,t} = \sum_{k=1}^{K} \lambda_{i,k} F_{k,t} + \sqrt{K}\, \xi_{i,t} \tag{3.1.5}$$

where $\lambda_{i,k}$ and $F_{k,t}$ are factor loadings and common factors, respectively, and $K$ is the number of factors. Following Lin, Hu, Wang, and Xia (2012), to simulate a strict factor structure in which there is cross-section independence, one should consider $\xi_{i,t}, \lambda_{i,k}, F_{k,t} \sim N(0, 1)$. In this sense, the consistency of the FE estimator is not affected as long as the factor loadings have zero mean and are constant over cohorts.

What matters about the time-varying cohort variable is not only its existence but also the degree of time variation that it exhibits, so that the cohorts are adequately identified and the dynamic parameter can be identified.

For the simulation, the time-varying cohort variable follows a normal distribution and its time variation is given by the variance of the variable. There is a trade-off between variation and explanatory power: the higher the variance, the higher the time variation but the lower the relevance of the variable, because it becomes noisier as the variance increases.

In summary, the experimental design concentrates on the time-varying cohort variable (xc,t), not only on its existence (β ≠ 0) but also on its variance (σx). In the same way, the trade-off between the number of cohorts (C) and the number of individuals in each cohort (nc) requires further study. Another important aspect is to illustrate the difference between a stationary case and a non-stationary case, with the parameter ρ close to a unit root. Finally, the error structure is the last feature of the pseudo-panel that is examined further in this study.

All of these scenarios are analysed within the framework provided by the following DGP:

$$\begin{aligned}
y_{i,t} &= \rho y_{i,t-1} + \beta x_{c,t} + \eta_c + \varepsilon_{i,t} && \text{(3.1.6a)} \\
x_{c,t} &\sim N\left(0, \sigma_x^2\right) && \text{(3.1.6b)} \\
\eta_c &= \alpha_i + \alpha_c && \text{(3.1.6c)} \\
\alpha_c &\sim N\left(0, \sigma_c^2\right), \quad \alpha_i \sim N\left(0, \sigma_i^2\right) && \text{(3.1.6d)} \\
\varepsilon_{i,t} &\sim N\left(0, \sigma_\varepsilon^2\right) && \text{(3.1.6e)}
\end{aligned}$$

The DGP is similar to that of Inoue (2008), except that the latter additionally includes a vector of individual-specific explanatory variables, and to that of McKenzie (2004), except that the latter allows for inter-cohort parameter heterogeneity. Furthermore, the process was warmed up, following Kiviet (2011): the process gradually gets onto its stationary track, so if we take the process from some observation s onwards, it will have a variance very close to its stationary value.
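The DGP in (3.1.6a)-(3.1.6e), including the warm-up, can be sketched as follows. This is a minimal illustration in Python and not the code used for the study: the function name `simulate_cohort`, the default variances, the zero starting value and the burn-in length are all assumptions made for the example. Setting `n_factors` to K > 0 replaces (3.1.6e) with the factor structure in (3.1.5).

```python
import math
import random


def simulate_cohort(n_c, T, rho, beta, sigma_x, sigma_c=1.0, sigma_i=1.0,
                    sigma_eps=1.0, n_factors=0, burn_in=50, seed=0):
    """Simulate one cohort from (3.1.6a)-(3.1.6e).

    Returns (x, y): x[t] is the cohort-level regressor and y[i][t] the
    outcome of individual i, for the T periods kept after the warm-up.
    With n_factors = K > 0 the error follows the factor structure (3.1.5).
    """
    rng = random.Random(seed)
    total = burn_in + T
    alpha_c = rng.gauss(0.0, sigma_c)
    x = [rng.gauss(0.0, sigma_x) if sigma_x > 0 else 0.0 for _ in range(total)]
    factors = [[rng.gauss(0.0, 1.0) for _ in range(total)]
               for _ in range(n_factors)]
    y = []
    for _ in range(n_c):
        eta = alpha_c + rng.gauss(0.0, sigma_i)   # eta_c = alpha_i + alpha_c
        loadings = [rng.gauss(0.0, 1.0) for _ in range(n_factors)]
        path, y_prev = [], 0.0                    # assumed starting value
        for t in range(total):
            if n_factors > 0:
                eps = sum(l * f[t] for l, f in zip(loadings, factors))
                eps += math.sqrt(n_factors) * rng.gauss(0.0, 1.0)
            else:
                eps = rng.gauss(0.0, sigma_eps)
            y_prev = rho * y_prev + beta * x[t] + eta + eps
            path.append(y_prev)
        y.append(path[burn_in:])                  # drop the warm-up periods
    return x[burn_in:], y
```

The function returns full individual paths; a repeated cross-section would keep only one period per individual, as in the block-diagonal construction of figure 3.1.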

In the first scenario, the process is near a unit-root process, with no exogenous regressors. In the second scenario, the auto-regressive parameter is smaller, so that the process is smoother and achieves stationarity; in this scenario the error structure is important, because a factor structure allows identification to be achieved. In the third scenario an exogenous regressor is included, first with small variation and then, in a final case, with larger variation.

These cases are described in table 3.1. To illustrate the trade-off between efficiency and bias described before, the cases are considered in two different setups. The first setup considers 20 cohorts with 200 individuals in each cohort, while the second comprises 80 cohorts with 50 individuals in each cohort.

Table 3.1: Experimental design: scenarios

| Scenario | Dynamic parameter | Time-varying exogenous parameter | Degree of variation in x | Error structure |
|---|---|---|---|---|
| 1st scenario | ρ = 0.99 | β = 0 | σx = 0 | ε does not follow a factor structure |
| 2nd scenario, 2.a | ρ = 0.50 | β = 0 | σx = 0 | ε does not follow a factor structure |
| 2nd scenario, 2.b | ρ = 0.50 | β = 0 | σx = 0 | ε does follow a factor structure |
| 3rd scenario, 3.a | ρ = 0.50 | β = 0.3 | σx = 1.0 | ε does not follow a factor structure |
| 3rd scenario, 3.b | ρ = 0.50 | β = 0.3 | σx = 1.4 | ε does not follow a factor structure |

The 5 cases above are investigated using C = 20 and nc = 200 or C = 80 and nc = 50

3.2 Monte Carlo results

3.2.1 First setup: C = 20, nc = 200

In this first setup, the cross-section dimension is small relative to the number of elements in each of the cohorts. In all of the scenarios, the genuine panel estimator presents the well-known dynamic bias (Nickell, 1981): the bias is around 100% of the estimator and negative in every case.

Table 3.2: Dynamic Pseudo-Panel simulation results of the 1st setup

| ρ | β | σx | ε | ρ̂ (pseudo-panel): median | s.d. | RMSE |
|---|---|---|---|---|---|---|
| 0.99 | 0 | - | n.f.s | 0.972 | 0.005 | 0.021 |
| 0.50 | 0 | - | n.f.s | -0.052 | 0.048 | 0.551 |
| 0.50 | 0 | - | f.s | 0.375 | 0.021 | 0.129 |
| 0.50 | 0.30 | 1.0 | n.f.s | 0.367 | 0.019 | 0.135 |
| 0.50 | 0.30 | 1.4 | n.f.s | 0.451 | 0.010 | 0.051 |

n.f.s: non-factor structure, f.s: factor structure

The numbers in the columns "median", "s.d." and "RMSE" are the median, the standard deviation, and the root mean squared error of the estimates, respectively.

Source: Own elaboration using Monte Carlo results based on 1000 reps.
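The three summary statistics reported in the table can be computed from a list of replicated estimates as sketched below. This is a minimal Python illustration; the function name `summarise` is hypothetical, and the use of the sample (n − 1) standard deviation is an assumption, since the text does not state which version was used.

```python
import math
import statistics


def summarise(estimates, true_value):
    """Median, standard deviation and RMSE of Monte Carlo estimates."""
    median = statistics.median(estimates)
    sd = statistics.stdev(estimates)   # sample standard deviation (assumed)
    rmse = math.sqrt(sum((e - true_value) ** 2 for e in estimates)
                     / len(estimates))
    return median, sd, rmse


# Toy input with three made-up "replications"
median, sd, rmse = summarise([0.4, 0.5, 0.6], true_value=0.5)
```

The RMSE combines both sources of error, so an estimator with a small standard deviation can still have a large RMSE when it is centred far from the true value.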

Table 3.2 shows that, for the pseudo-panel estimator, the bias of the dynamic estimator in the 1st scenario is not large. However, in this case the process is close to a unit root, hence the estimator is not reliable for inference: figure 3.2 shows that normal theory breaks down. The usual t-test under-rejects the null hypothesis that ρ̂ = ρ = 0.99 at every significance level when homoscedastic standard errors are used, and over-rejects when Inoue's standard errors are used.

As expected, when the estimated model is a pure AR(1) with a non-factor error structure, the dynamic parameter cannot be estimated and the bias is large. In figure 3.3, panel (a) shows that the distribution of the estimated parameter exhibits fat tails and is centered around -0.05, and panel (b) shows that the t-test strongly over-rejects the null hypothesis that the estimated parameter is equal to the true one. This provides evidence of the importance of including a time-varying cohort variable in the empirical application of the present thesis in order to achieve identification.

These results are in line with Inoue (2008), McKenzie (2004) and Verbeek and Vella (2005). To achieve identification, it is important that the explanatory variable at the cohort level varies across cohorts and also over time. In this sense, not only the existence of the time-varying cohort variable is important; as mentioned before, the degree of variation is also important. In this exercise, the degree of variation was controlled through the variance of the exogenous variable x.

[Figure: panel (a) x-axis: ρ̂; series: Std. Normal density, Kernel density. Panel (b) x-axis: Nominal size; series: Actual size Non-Robust, Actual size Inoue, 45° line]

Figure 3.2: Simulation 1st setup: ρ = 0.99, β = 0
(a) Density plot: estimated parameter of the dynamic pseudo-panel.
(b) P-value plot: t-test rejection frequencies using non-clustered standard errors.
Source: own elaboration using Monte Carlo results based on 1000 replications.

[Figure: panel (a) x-axis: ρ̂; series: Std. Normal density, Kernel density. Panel (b) x-axis: Nominal size; series: Actual size Non-Robust, Actual size Inoue, 45° line]

Figure 3.3: Simulation 1st setup: ρ = 0.50, β = 0, ε: non-factor structure
(a) Density plot: estimated parameter of the dynamic pseudo-panel.
(b) P-value plot: t-test rejection frequencies using non-clustered standard errors.
Source: own elaboration using Monte Carlo results based on 1000 replications.

[Figure: panel (a) x-axis: ρ̂; series: Std. Normal density, Kernel density. Panel (b) x-axis: Nominal size; series: Actual size Non-Robust, Actual size Inoue, 45° line]

Figure 3.4: Simulation 1st setup: ρ = 0.50, β = 0.3, σx = 1
(a) Density plot: estimated parameter of the dynamic pseudo-panel.
(b) P-value plot: t-test rejection frequencies using non-clustered standard errors.
Source: own elaboration using Monte Carlo results based on 1000 replications.

For the third case, table 3.2 shows that the bias of the auto-regressive parameter is considerably smaller than in all of the cases when β ≠ 0, but still bigger than in the case with larger variation in the variable. Likewise, figure 3.4 shows that normal theory does not hold, and the usual t-statistic based on either the homoscedastic or Inoue's standard errors is not valid for inference about the significance of the parameter.

Finally, in the last case, when the amount of variation increases (bigger σx), the parameter is identified and has a very small bias, while it also presents a small variance; hence the estimator is efficient. Given these results, it is important that a time-varying exogenous regressor at the cohort level is included in the empirical application.

In the same way, the rejection frequency based on the usual standard errors is smaller than the nominal size at every significance level, while the t-statistic based on the variance form in Inoue (2008) has good size, especially at small significance levels, which will allow us to make inference in the empirical application.
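The comparison between actual and nominal size that underlies the P-value plots can be sketched as follows. This is a minimal Python illustration, assuming a two-sided test against standard normal critical values; the function name `rejection_frequency` and the toy t-statistics are hypothetical.

```python
from statistics import NormalDist


def rejection_frequency(t_stats, nominal_size):
    """Share of replications in which a two-sided test with standard
    normal critical values rejects the null at the given nominal size."""
    critical = NormalDist().inv_cdf(1.0 - nominal_size / 2.0)
    rejections = sum(abs(t) > critical for t in t_stats)
    return rejections / len(t_stats)


# Four made-up t-statistics; two of them exceed the 5% critical value
actual_size = rejection_frequency([0.0, 1.0, 2.5, -3.0], nominal_size=0.05)
```

Evaluating the function on a grid of nominal sizes and plotting the result against the 45° line gives a P-value plot: points below the line indicate under-rejection and points above it over-rejection.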

[Figure: panel (a) x-axis: ρ̂; series: Std. Normal density, Kernel density. Panel (b) x-axis: Nominal size; series: Actual size Non-Robust, Actual size Inoue, 45° line]

Figure 3.5: Simulation 1st setup: ρ = 0.50, β = 0.3, σx = 1.4
(a) Density plot: estimated parameter of the dynamic pseudo-panel.
(b) P-value plot: t-test rejection frequencies using non-clustered standard errors.
Source: own elaboration using Monte Carlo results based on 1000 replications.

3.2.2 Second setup: C = 80, nc = 50

In order to investigate the trade-off between efficiency and consistency, we allow the number of cohorts to increase; the number of elements in each cohort therefore decreases to nc = 50, so that the two setups remain comparable.

Table 3.3: Dynamic Pseudo-Panel simulation results of the 2nd setup

| ρ | β | σx | ε | ρ̂ (pseudo-panel): median | s.d. | RMSE |
|---|---|---|---|---|---|---|
| 0.99 | 0 | - | n.f.s | 0.914 | 0.004 | 0.077 |
| 0.50 | 0 | - | n.f.s | 0.018 | 0.024 | 0.482 |
| 0.50 | 0 | - | f.s | 0.203 | 0.019 | 0.229 |
| 0.50 | 0.30 | 1.0 | n.f.s | 0.155 | 0.018 | 0.346 |
| 0.50 | 0.30 | 1.4 | n.f.s | 0.343 | 0.012 | 0.157 |

n.f.s: non-factor structure, f.s: factor structure

The numbers in the columns "median", "s.d." and "RMSE" are the median, the standard deviation, and the root mean squared error of the estimates, respectively.

Source: Own elaboration using Monte Carlo results based on 1000 reps.

is provided to the construction of the cohorts. However, in this setup the cross-section dimension is bigger, which generates a smaller standard deviation of the estimators. These results are in line with Verbeek and Vella (2005), McKenzie (2004) and Antman and McKenzie (2007), who state that the number of elements in each of the cohorts should be around 100.

Chapter 4

Data and creation of Pseudo-Panel

In this chapter, I will present the data that are used for the study of the income mobility in Ecuador. The National Employment and Unemployment Survey (ENEMDU) is the main source for the income measurements. Also, following the theory from previous chapters, the construction of the Pseudo-Panel and the definition of the cohorts are addressed in this chapter.

4.1 National Employment and Unemployment Survey

The ENEMDU is a survey with national coverage conducted by the National Institute of Statistics and Census of Ecuador (INEC). It is the principal statistical tool in matters of Labour Economics; since 2003 it has been carried out on a quarterly basis, and since 2010 on a monthly basis.

The ENEMDU's coverage is national. It is mainly focused on the five biggest cities of the country, including urban and rural sectors1, and also covers a sample of the rest of the states; however, the Insular Region (Galapagos Islands) is excluded from the sample. The survey sample is based on the National Population Census. The main observational unit is the household, and all individuals aged 5 years and older are interviewed. The questionnaire mainly covers information about their labour conditions and all their different sources of income; it also covers educational and social conditions, and includes a dwelling conditions module.

1 An urban sector is a populated center with more than 2 000 inhabitants. The rural sector is only included in the fourth quarter survey and, since 2010, in the second as well.

Since the main observational unit is the inhabited dwelling, we have to take into account that homeless people and floating dwellings 2 are not included in the survey. This matters because one of the poorest segments of society is left out; however, this type of dwelling represents less than 0.5% of the total (INEC, 2010).

In 2007, the INEC, together with the Central Bank of Ecuador, updated the methodology and increased the sample; this is the last important change of methodology in the ENEMDU (INEC, 2013a). For this reason, the present analysis uses the quarterly survey from 2007 until the last quarter of 2013.

Since the present study deals with income measures, following Antman and McKenzie (2007) we exclude individuals who are less than 25 or more than 65 years old at the moment of the survey. The reason is to take into account their possibly irregular income dynamics: the former group because they are being incorporated into the labour force, and the latter group because the official retirement age in Ecuador is 65.

Table 4.1 reports the number of interviewed individuals in each quarter after the age restriction. On average, 11 500 individuals are interviewed in the first and third quarters of each year, while 33 000 individuals are interviewed in the fourth quarter, because it comprises both the urban and the rural sector. Since 2010 both sectors have also been included in the second quarter, which for that reason has on average 25 000 interviewed individuals within the age range (25-65 years old).

2 Floating dwelling include all type of boat that works as a house for one or more people in the

Table 4.1: Number of individuals: 26-65 years

| Year | 1st quarter | 2nd quarter | 3rd quarter | 4th quarter |
|---|---|---|---|---|
| 2007 | 10,467 | 11,894 | 11,703 | 32,156 |
| 2008 | 11,821 | 16,303 | 11,833 | 33,204 |
| 2009 | 12,064 | 12,215 | 11,672 | 33,747 |
| 2010 | 11,319 | 34,107 | 11,933 | 35,765 |
| 2011 | 11,863 | 35,071 | 11,231 | 30,989 |
| 2012 | 11,152 | 31,429 | 10,955 | 32,952 |
| 2013 | 11,520 | 34,525 | 11,343 | 35,512 |

Source: Own elaboration using ENEMDU in each period.

In its income section, the survey has information regarding labour income, non-labour income including donations or transfers in kind (converted to dollars), retirement income, and remittances. The measure of income considered here is the labour income from the individual's first occupation.

All the income variables were deflated using the national monthly Consumer Price Index with base year 2004, averaged to obtain a quarterly index. This is an official indicator of the general level of prices of the principal consumption items and basic services for households of all socio-economic levels (INEC, 2013b).
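The deflation step described above can be sketched as follows. This is a minimal Python illustration with made-up index and income values; the function names `quarterly_cpi` and `deflate`, and the convention that the index equals 100 in the base period, are assumptions made for the example.

```python
def quarterly_cpi(monthly_cpi):
    """Average three monthly CPI values into one quarterly index."""
    if len(monthly_cpi) != 3:
        raise ValueError("a quarter has three monthly observations")
    return sum(monthly_cpi) / 3


def deflate(nominal_income, cpi, base=100.0):
    """Express nominal income in base-period prices (index = base)."""
    return nominal_income * base / cpi


cpi_q = quarterly_cpi([120.0, 121.0, 122.0])   # made-up index values
real_income = deflate(363.0, cpi_q)            # made-up nominal income
```

Because the dependent variable is later taken in logarithms, deflating amounts to subtracting the log of the price index from log nominal income.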

4.2 Construction of the Pseudo-Panel

As mentioned before, the lack of Panel Data in Ecuador is a pitfall in the measurement of dynamic models, in particular in this case the income dynamics. Due to the absence of the necessary data we proceed to construct a Pseudo-Panel as described in Chapter 2, and in that way follow cohorts of individuals within the independent cross sections from 2007 to 2013.

One of the most important steps in the construction of pseudo-panels is the definition of cohorts. Verbeek and Vella (2005) state that one should be as careful in choosing cohorts as in selecting instruments. Recall from chapter 2 that the choice of cohorts should be based on a set of variables that are i. observed for all individuals in the sample, and ii. constant over time for each individual.

These conditions ensure that the selection of cohort members is as homogeneous as possible; i.e. if an individual is selected in different cross-section periods, he or she should have the same probability of being assigned to the same cohort in each of the samples.

Furthermore, the asymptotic theory developed for pseudo-panels continues to hold, and the parameter is identified, if the cohort size is sufficiently large and if the true mean within each cohort exhibits sufficient time variation, as shown in Verbeek and Nijman (1992) and in the Monte Carlo study in chapter 3. After the cohorts are created, it is important to check whether there is enough temporal variation (for all of the cohorts) and also inter-cohort variation in the variables of interest.

Year of birth was therefore chosen as the cohort-defining variable, as it turns out to be the most appropriate one. It is not possible to use, as in previous works, the education level or the type of area in which the individual lives: the former is not exogenous to the income level, and the latter is not observed in all quarters.

Three cohort types were considered. In the first, cohorts are constructed using one-year birth intervals; in the second, cohorts are constructed using three-year birth intervals; and in the last, the three-year birth intervals are additionally divided by gender.
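The three cohort definitions can be sketched with a single assignment rule. This is a minimal Python illustration; the function name `cohort_id`, the parameter `first_year` (the earliest birth year in the sample) and the example birth years are hypothetical and are not taken from the ENEMDU data.

```python
def cohort_id(birth_year, first_year, width=1, gender=None):
    """Map a birth year (and optionally gender) to a cohort label.

    first_year: earliest birth year in the sample (hypothetical parameter).
    width: number of consecutive birth years per cohort (1 or 3 above).
    gender: if given, cohorts are additionally split by it.
    """
    if birth_year < first_year:
        raise ValueError("birth year precedes the first cohort")
    interval = (birth_year - first_year) // width
    return interval if gender is None else (interval, gender)


case_1 = cohort_id(1950, first_year=1942)                       # one-year cohorts
case_2 = cohort_id(1950, first_year=1942, width=3)              # three-year cohorts
case_3 = cohort_id(1950, first_year=1942, width=3, gender="F")  # split by gender
```

Because the label depends only on year of birth and gender, an individual sampled in any quarter is always assigned to the same cohort, which is the time-invariance requirement stated above.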

In all three cases, the number of individuals that belong to a cohort in each quarter is bigger than 200, as suggested by Verbeek and Nijman (1992), for most cohorts; less than 1% of the cohorts have fewer individuals than that. These cohorts were not eliminated, in order to keep the advantage of a balanced panel and because the results were not qualitatively different. 3

In this way we have the following three types of panel:

Table 4.2: Types of pseudo-panel

| Case | Time dimension | Number of cohorts | Number of individuals per cohort | Type |
|---|---|---|---|---|
| 1 | 28 | 47 | 491 | Unbalanced |
| 2 | 28 | 13 | 1550 | Balanced |
| 3 | 28 | 26 | 775 | Balanced |

Source: Own calculations.

3 The interested reader should refer to the appendix A.1 for the detailed information of the

The small cross-section dimension is almost unavoidable in a pseudo-panel analysis. In this particular case the number of cohorts varies from 13 to 47, so the data have to be considered a small N panel, and caution has to be taken when estimating the dynamic model. As we saw in the simulation exercise, with this number of cohorts we are able to identify the parameter of the model as long as an exogenous time-varying cohort variable is included in the model.

The next chapter presents the main results of the estimations, as well as the main policy implications that can be drawn from them.

Chapter 5

Empirical Application: Income Mobility in Ecuador

In this chapter, the results of the empirical application are reported. Using the methodology presented before and taking into account the practical recommendations following from the simulation study, the results of income mobility in Ecuador are estimated using a Pseudo Panel Approach.

It is found that Income Mobility is around 0.6 in the different specifications. When analysing gender differences it is found that males face a very high mobility level compared to females, which in turn will imply that a woman who has low income today in comparison to her peers will not converge to a higher income level tomorrow. These results will give the first signs of how to orient the public policy that is aimed to fight poverty or to improve social condition levels of the people.

5.1 Interpretation of Results

To interpret the results, recall the data generating process for the income of individual i at time t, given in equation (2.1.2). This can be rewritten as:

$$y^*_{i,t} = \alpha_i \left( \frac{1 - \rho^t}{1 - \rho} \right) + \rho^t y^*_{i,0} + \left( \sum_{s=0}^{t-1} \rho^s \eta_{i,t-s} \right) \tag{5.1.1}$$

The previous decomposition follows from the fact that the process is stationary; in this case, using an Im, Pesaran, and Shin (2003) test, it is found that the panel does not have a unit root.

Using (5.1.1) and following Antman and McKenzie (2007), we have partitioned current income into three different parts: one that captures the initial differences in income, one that represents the differences due to the individual's specific fixed effects, and one that denotes the impact of idiosyncratic shocks.

With this in mind, and to understand the meaning of the estimators, we can compare the current income of individual i with the current income of an individual j and establish the following expression for the gap between both:

$$y^*_{i,t} - y^*_{j,t} = (\alpha_i - \alpha_j) \left( \frac{1 - \rho^t}{1 - \rho} \right) + \rho^t \left( y^*_{i,0} - y^*_{j,0} \right) + \sum_{s=0}^{t-1} \rho^s \left( \eta_{i,t-s} - \eta_{j,t-s} \right) \tag{5.1.2}$$

As we can see, the gap is explained by a difference in the intrinsic characteristics of the individuals, the difference in initial income, and the difference in earnings innovations. When ρ = 0, the difference in income between both individuals is independent of the initial income and of past innovations, so that, apart from the current shock, the two incomes differ only by the individuals' own unobserved characteristics. This is why the literature calls this case origin independence (Antman & McKenzie, 2007; Jantti & Jenkins, 2013).

A small ρ implies that a large income gap between individuals i and j today will be smaller in the next period, which is why a small ρ is referred to as a high-mobility scenario. Conversely, the higher the parameter, the more persistent the income gap, with the degree of persistence growing with the parameter.
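The decomposition in (5.1.2) can be illustrated with a short numerical sketch. The function name, its arguments and the example values below are hypothetical and are not part of the thesis; the code simply evaluates the three terms of (5.1.2) for given gaps in fixed effects, initial income and innovations.

```python
def income_gap_components(rho, t, alpha_gap, initial_gap, shock_gaps):
    """Evaluate the three terms of the income gap in (5.1.2).

    alpha_gap   : alpha_i - alpha_j (gap in fixed effects)
    initial_gap : y*_{i,0} - y*_{j,0} (gap in initial income)
    shock_gaps  : shock_gaps[s] = eta_{i,t-s} - eta_{j,t-s} for s = 0..t-1,
                  so shock_gaps[0] is the gap in the most recent innovation.
    All inputs are illustrative assumptions, not estimates from the thesis.
    """
    if len(shock_gaps) != t:
        raise ValueError("shock_gaps must contain exactly t entries")
    # (1 - rho^t) / (1 - rho) written as a geometric sum, which also works for rho = 1.
    fixed_effect_weight = sum(rho ** s for s in range(t))
    fixed_part = alpha_gap * fixed_effect_weight
    initial_part = (rho ** t) * initial_gap
    shock_part = sum((rho ** s) * gap for s, gap in enumerate(shock_gaps))
    return fixed_part, initial_part, shock_part


# Origin independence: with rho = 0 the initial gap and all past shocks drop out.
print(income_gap_components(0.0, 4, 0.2, 1.0, [0.05, -0.1, 0.3, 0.0]))
# A persistent case: a large rho keeps most of the initial gap alive.
print(income_gap_components(0.9, 4, 0.2, 1.0, [0.05, -0.1, 0.3, 0.0]))
```

With ρ = 0 only the fixed-effect gap and the current innovation survive, while a ρ close to one leaves most of the initial gap in place after four periods.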

5.2 Levels of Mobility

In all the estimations the dependent variable is the logarithm of individual income for the period under consideration, which Fields and Ok (1999) have shown to be the only measure of income that exhibits scale invariance, symmetry, additive separability and multiplicability, properties that are desirable when working with income measures.¹

All the models using the within transformation include a time-varying exogenous cohort variable, as suggested by the Monte Carlo experiment, because it is necessary for identifying the (im)mobility parameter.
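As a rough illustration of the mechanics, the sketch below applies the within transformation to cohort-level series and regresses current income on its lag and on one time-varying cohort variable. It is a minimal example under stated assumptions, not the estimation code used in the thesis: the function names and the toy data are hypothetical, and the data are generated without noise, so the small-sample bias that motivates the bias-corrected LSDV estimator does not appear here.

```python
def within_estimator(y, x):
    """Within (fixed-effects) estimates of rho and beta in
    y[c][t] = alpha_c + rho * y[c][t-1] + beta * x[c][t] + error,
    where y and x are lists of per-cohort time series (e.g. cohort means).
    """
    rows = []  # cohort-demeaned (y_t, y_{t-1}, x_t) triples
    for y_c, x_c in zip(y, x):
        cur, lag, exo = y_c[1:], y_c[:-1], x_c[1:]
        n = len(cur)
        m_cur, m_lag, m_exo = sum(cur) / n, sum(lag) / n, sum(exo) / n
        rows += [(a - m_cur, b - m_lag, c - m_exo)
                 for a, b, c in zip(cur, lag, exo)]
    # Normal equations for two regressors, solved with Cramer's rule.
    s_ll = sum(l * l for _, l, _ in rows)
    s_xx = sum(e * e for _, _, e in rows)
    s_lx = sum(l * e for _, l, e in rows)
    s_ly = sum(l * a for a, l, _ in rows)
    s_xy = sum(e * a for a, _, e in rows)
    det = s_ll * s_xx - s_lx ** 2
    rho = (s_ly * s_xx - s_xy * s_lx) / det
    beta = (s_xy * s_ll - s_ly * s_lx) / det
    return rho, beta


def simulate_cohort(alpha, rho, beta, x_c, y0):
    """Noise-free toy series for one cohort (illustrative assumption)."""
    y_c = [y0]
    for t in range(1, len(x_c)):
        y_c.append(alpha + rho * y_c[-1] + beta * x_c[t])
    return y_c


# Four hypothetical cohorts observed over eight quarters.
x = [[((3 * c + 2 * t) % 5) / 5 for t in range(8)] for c in range(4)]
y = [simulate_cohort(0.5 * c, 0.4, 0.3, x[c], 1.0) for c in range(4)]
rho_hat, beta_hat = within_estimator(y, x)
print(rho_hat, beta_hat)
```

Because the toy data contain no error term, the estimator recovers the values used to generate them (0.4 and 0.3) up to floating-point error; with real cohort means the estimates would additionally be subject to sampling error and dynamic-panel bias.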

In all cases, the unconditional estimate is higher than the conditional one (which includes cohort fixed effects); these results are in line with those obtained by Antman and McKenzie (2007) for the Mexican case. This first result indicates that the individual fixed effects are important in the estimation of mobility, hence income differences between individuals will be somewhat persistent, although the unconditional estimate is not as high as in the Mexican case (0.540 vs 0.973).

Table 5.1: Estimates of quarterly mobility using 1st case of Pseudo-Panel

Dependent variable: Log real individual income.

                                      (1)     (2)     (3)     (4)
                                      OLS     FE      FE      LSDV
Quarterly lag of individual income    0.540   0.378   0.326   0.450
Standard error                        0.022   0.051   0.049   0.055
Cohort effects                        No      Yes     Yes     Yes
Time effects                          No      Yes     No      Yes
Years of education                    No      No      Yes     No
Number of observations                1101    1101    1101    1101
R-squared                             0.308   0.151   0.196

Notes: LSDVbc estimator using a bias correction up to O((N T )2) and bootstrapped standard errors. LSDVbc estimator initialised using the A-H estimator. R-squared in FE corresponds to the within-groups R-squared. Source: Own calculations using quarterly ENEMDU.

¹ Scale invariance introduces a degree of coherence into the measure because, no matter in what units income is measured, it maintains the same properties; symmetry is also known as the anonymity property: what matters in the end is the income, not the individual. Multiplicability and additive separability are useful when using income measures of subgroups of individuals.

In Table 5.1 we can see the estimation for the first case, i.e. when the pseudo-panel was created with one-year birth intervals. The results suggest that Ecuadorian individuals are able to recover quickly from negative shocks to income. The FE estimator is around 0.4 and indicates the presence of convergence in the income growth rate in this period. This suggests that if an individual has an income 10% higher than the mean, the difference will be reduced to 4% after one period.
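The arithmetic behind this reading can be sketched as follows. The helper names are hypothetical, and the calculation assumes equal fixed effects and averages over future shocks, so that the expected gap after t periods is simply ρ^t times the initial gap; the half-life is an additional illustrative summary that is not reported in the thesis.

```python
import math


def expected_gap(initial_gap, rho, periods):
    """Expected income gap after `periods` quarters, assuming equal fixed
    effects and zero-mean shocks: gap_t = rho**periods * gap_0."""
    return initial_gap * rho ** periods


def half_life(rho):
    """Number of periods needed for the expected gap to halve (0 < rho < 1)."""
    if not 0 < rho < 1:
        raise ValueError("rho must lie strictly between 0 and 1")
    return math.log(0.5) / math.log(rho)


# A 10% gap with the rounded FE estimate of 0.4 used in the text.
for quarters in (1, 2, 4):
    print(quarters, expected_gap(10.0, 0.4, quarters))
print("half-life in quarters:", half_life(0.4))
```

The same functions can be applied to any other coefficient in the tables to compare how quickly a given gap is expected to close under each specification.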

Table 5.2: Estimates of quarterly mobility using 2nd case of Pseudo-Panel

Dependent variable: Log real individual income.

                                      (1)     (2)     (3)     (4)
                                      OLS     FE      FE      LSDV
Quarterly lag of individual income    0.602   0.426   0.346   0.510
Standard error                        0.037   0.101   0.096   0.172
Cohort effects                        No      Yes     Yes     Yes
Time effects                          No      Yes     No      Yes
Years of education                    No      No      Yes     No
Number of observations                351     351     351     351
R-squared                             0.382   0.194   0.272

Notes: LSDVbc estimator using a bias correction up to O((N T )2) and bootstrapped standard errors. LSDVbc estimator initialised using the A-H estimator. R-squared in FE corresponds to the within-groups R-squared. Source: Own elaboration using quarterly ENEMDU.

In the second case of the Pseudo-Panel (three-year birth intervals), the results suggest much the same conclusions as before, although a slight increase in immobility is seen. The number of cohort-quarter observations is reduced to 351, and in this case the balanced panel suggests that the intrinsic characteristics of the individuals, or fixed effects, play a more important role, because the unconditional estimate is 50% higher than the conditional one.

Finally, it is important to distinguish between female and male cohorts. The results in Table 5.3 suggest totally opposite pictures for the two groups. The female cohorts face a very high immobility scenario: a 10% difference between high-income and low-income females will still be around 8% after one period. For the male cohorts, the point estimates suggest a very mobile scenario; however, the variable does not exhibit enough within variation (R-squared = 0.017), which results in a non-significant FE coefficient that is biased towards zero, as expected given the results of the Monte Carlo simulation in Chapter 3.

Table 5.3: Estimates of quarterly mobility using 3rd case of Pseudo-Panel

Dependent variable: Log real individual income.

(A) Estimates for males

                                      (1)     (2)     (3)     (4)
                                      OLS     FE      FE      LSDV
Quarterly lag of individual income    0.368   0.129   0.068   0.211
Standard error                        0.037   0.286   0.611   0.066
Cohort effects                        No      Yes     Yes     Yes
Time effects                          No      Yes     No      Yes
Years of education                    No      No      Yes     No
Number of observations                351     351     351     351
R-squared                             0.142   0.017   0.068

(B) Estimates for females

                                      (1)     (2)     (3)     (4)
                                      OLS     FE      FE      LSDV
Quarterly lag of individual income    0.819   0.740   0.705   0.821
Standard error                        0.033   0.034   0.062   0.079
Cohort effects                        No      Yes     Yes     Yes
Time effects                          No      Yes     No      Yes
Years of education                    No      No      Yes     No
Number of observations                351     351     351     351
R-squared                             0.700   0.573   0.590

Notes: LSDV estimator using a bias correction up to O((N T )2) and bootstrapped standard errors. LSDV estimator initialised using the A-H estimator. R-squared in FE corresponds to the within-groups R-squared. Source: Own elaboration using quarterly ENEMDU.
