
MASTER THESIS

MSc in Econometrics

Free Track

2015/2016

A study on the effect of labor mobility

on unemployment spells of US displaced

individuals and their termination

Author:

Sara Castro Sangrador

Supervisor:

Dr. J.C.M. van Ophem

Second reader:


Abstract

This thesis studies the determinants of unemployment duration and exit considering two types of re-employment: full-time and part-time jobs. More specifically, interest lies in assessing the effect of labor mobility, measured by the potentially endogenous binary indicator of whether an individual moved after being displaced or not. For this purpose, data from the US 2010 Current Population Survey (CPS) public database is used. A competing risks model is specified within the Proportional Hazards (PH) framework, interpreting the competing conditional hazards as the outcomes of a discrete choice that each individual faces at every month spent in the unemployment state. To account for the selectivity generated by the moved indicator, an auxiliary first-step Logit model is estimated for the probability of moving, and the results obtained are in turn used to build a correction term for the endogeneity: the Coslett dummies. It is shown that those individuals who actually moved to a new area after being displaced are more likely to take up a full-time job than to remain jobless, while they are more likely to remain without a job than to accept a part-time one. No significant effects have been found for 'non-movers'.


Contents

1 Introduction

2 Literature review

3 Data and descriptive analysis

3.1 Data collection and refinement

3.2 Summary statistics

3.3 Kaplan-Meier Survival Function Estimates

3.4 Nelson-Aalen Cumulative Hazard Function Estimates

4 Econometric methodology

4.1 The model

4.2 Second-step likelihood function and estimation

5 Empirical results

5.1 First-step Logit model

5.2 Competing risks model

5.3 Discriminating among models

6 Conclusions

7 Bibliography

8 Annex

8.1 Review of all variables used in the models (Tables 6-7)

8.2 Regressions and F-tests to prove the existence of endogeneity (Tables 8-19)

8.3 Estimation results (complete): First-step Logit model in Table 20


List of Figures

1 Duration measurement and censoring mechanism

2 Survival Curves: Full-time Re-employment

3 Survival Curves: Part-time Re-employment

4 Overall Survival Curves: Re-employment

5 Cumulative hazards: Full-time Re-employment

6 Cumulative hazards: Part-time Re-employment

7 Overall Cumulative hazard functions: Re-employment

8 ROC curves: models (1) and (2)

9 ROC curves: models (3) and (4)

List of Tables

1 Summary statistics (I)

2 Summary statistics (II)

3 Average Marginal Effects first-step Logit model

4 Multinomial Logit Main Results: Four specifications

5 Main Average Marginal Effects of MNL: four specifications

6 Tabular overview of variables used in the model

7 US Regions Classification

8 Regression of each education variable on moved

9 Regression of moved on all education variables

10 Regression of each 'lost job sector' on moved

11 Regression of moved on all 'lost job sectors'

12 Regression of each 'region of residence' on moved (I)

13 Regression of each 'region of residence' on moved (II)

14 Regression of moved on all 'regions of residence'

15 Regression of each 'race indicator' on moved

16 Regression of moved on all 'race indicators'

17 Regression of each 'indicator for timing of the displacement' on moved

18 Regression of moved on all 'indicators for timing of the displacement'

19 Regression of age variable on moved and vice versa

20 Estimated coefficients first-step Logit model

21 Multinomial Logit Results with Bootstrapped St. Errors (I): covariates

22 Multinomial Logit Results with Bootstrapped St. Errors (II): baseline hazard

23 Multinomial Logit Results with Bootstrapped St. Errors (III): Coslett dummies

24 Average Marginal Effects of MNL (I): covariates

25 Average Marginal Effects of MNL (II): baseline hazard

26 Average Marginal Effects of MNL (III): Coslett dummies

27 MNL with log(time) baseline hazard & BT St. Errors (I): covs & λ0

28 MNL with log(time) baseline hazard & BT St. Errors (II): Coslett dummies

29 Average Marginal Effects of MNL with log(time) (I): covs & λ0

1 Introduction

In the aftermath of the 2008 financial crisis, which yielded very high unemployment figures in the advanced economies, it has been observed that the recovery in certain regions has been faster and more uniform in terms of unemployment reduction and, ultimately, economic growth. In the EU, while some economies such as Germany or the UK show growth and an increased demand for workers, others still lag behind with a huge excess of labor supply and a shortage of demand, as is the case of Spain or Greece. The recovery in the US, on the other hand, seems to have happened faster and in a more uniform way.

It is of great interest to identify the causes behind this better performance of the US. This brings labor mobility into the picture (among many other factors). It is claimed that the US economy smooths out periods of high unemployment more easily because there is large labor mobility among states, from areas with a shortage of labor demand to areas where jobs are available. This difference is attributed firstly to cultural reasons, related to the pursuit of a better life in a country built on migration, and also to policy reasons, due to housing and labor market regulations that make housing turnover easier than in other countries. The near-absence of language barriers throughout the US territory is also claimed to be an important factor supporting this ease of mobility.

However, the decision to move is not taken in an aggregate manner, but at the individual level. There may be a mix of individual characteristics (demographic features, socioeconomic circumstances, etc.) which affect this decision to some extent. It is of interest to identify which of these are actual determinants of the decision to move and to gain some insight into the size of the corresponding effects. It is a fact that more people change residence after being displaced in the US. Therefore, it is also interesting to show whether this decision has an effect on unemployment spells, and on whether these are ended by a full-time or a part-time job. For example, it is suspected that unemployment durations have increased after this last crisis because of the increased levels of home ownership, a main and distinctive feature of the 2008 crisis, which prevents moving to new areas in search of better job opportunities more than was the case in previous recessions.

The present paper aims to provide empirical evidence on the relation between unemployment duration and the decision to move: whether moving to a different area after being displaced has an effect on unemployment exit or not, and whether it somehow determines the likelihood of obtaining either a full-time or a part-time job. For this purpose, 2010 US data on displaced individuals is used. A competing risks model considering these two possible exits will be estimated in the framework of Proportional Hazards (PH) models, undertaking a flexible approach for the estimation of the baseline hazard from the data (Han & Hausman, 1990), instead of assuming a distribution for this baseline hazard or using the Cox model. In order to account for the selectivity generated by the inclusion of the 'moving indicator' as a regressor in the conditional hazard rates, a two-step selection model is used. The first step consists of the estimation of a migration equation, whose results are used to calculate the probability of moving for each individual. The competing risks model is then estimated in the second step.

In Section 2 a review of the relevant literature is presented, comprising the available methods that could be used to approach the main aims of the present paper, describing their pros and cons, and the reasons why the methodology used is expected to be relevant. Section 3 describes the processes undertaken for the data collection and refinement and concludes with an exhaustive descriptive analysis, including non-parametric estimates of the survival and cumulative hazard functions. In Section 4, the econometric methodology and estimation procedure are described. Section 5 displays the main charts together with the interpretation of the empirical results and the inference conducted and, finally, Section 6 contains the conclusions and possible directions for future research.

2 Literature review

The goals of the present paper can be summarized into two main ones. First, it seeks to identify those factors/regressors which have an effect on individuals' unemployment durations and on the probabilities of leaving unemployment through either a full-time or a part-time job. In order to achieve this, a competing risks model will be estimated in the PH framework. The baseline hazard will be flexibly estimated from the data using a methodology inspired by the Han & Hausman (1990) approach. Moreover, the competing risks model will be interpreted as a discrete-choice setup. Each individual, in each time period spent in unemployment, faces a three-outcome choice: accepting a full-time job, accepting a part-time one, or remaining jobless. A multinomial logit framework will be developed to model this decision.

The endogenous binary indicator of whether a person moved after displacement or not has been included in the competing risks model as a covariate. The second main aim of the paper is to find its effect on unemployment duration and exit in the presence of endogeneity. A two-step selection model will be used, including a first step which estimates a correction term for selectivity: the Coslett dummies.

In order to arrive at the methodology developed in the present thesis, an exhaustive review of the literature has been performed, and the most relevant tools that could be used to approach the two main objectives of this paper are described throughout this section.

First of all, the literature on unemployment duration and estimation of conditional hazard rates is extensive. Many papers to date have examined the development of competing risks models, which acknowledge that the unemployment state can be left through different exits: re-employment/inactivity, full-time/part-time re-employment, etc. (McCall, 1996; van den Berg, van Lomwel & van Ours, 2008; Bover & Gomez, 1999).

The proportional hazards framework, developed in Prentice (1978), is an important tool which allows the conditional hazard functions of interest to be specified as a multiplicative relation between a baseline hazard, which is a function of time only, and a function of the Xi regressors:

λi(t|Xi) = λ0(t|δ0)φ(Xi, β),

where usually φ(Xi, β) = exp(Xiβ) is specified, although alternative parametric specifications could be considered. Here t stands for the time spent in the unemployment state, while δ0 and β are the vectors of parameters to be estimated, corresponding to the baseline hazard and the Xi regressors respectively. From the PH expression for the Weibull distribution, and after some algebra, the cumulative hazard is obtained as a function of the regressors and takes the form of a linear regression (Cameron & Trivedi, 2005). Such a derivation will be crucial for the application of a correction term for selectivity in this linear setup, which is the main contribution of the present paper.
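This linearization can be checked numerically; a minimal sketch, assuming a Weibull baseline λ0(t) = αt^(α−1) and made-up parameter values (not estimates from the thesis):

```python
import numpy as np

# Sketch: under a Weibull PH model, lambda(t|X) = alpha * t^(alpha-1) * exp(X @ beta),
# the integrated hazard is Lambda(t|X) = t^alpha * exp(X @ beta), so
# log Lambda(t|X) = alpha * log(t) + X @ beta  -- linear in log(t) and X.
alpha = 1.5
beta = np.array([0.4, -0.2])
X = np.array([[1.0, 2.0], [0.5, -1.0]])   # two illustrative individuals
t = 3.0

Lam = t**alpha * np.exp(X @ beta)          # cumulative hazard at t
lin = alpha * np.log(t) + X @ beta         # linear-regression form
assert np.allclose(np.log(Lam), lin)
```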


In fact, a semiparametric analogue of the Heckman two-step model (Heckman, 1979) will be implemented to correct for a potentially endogenous regressor. The introduction of such a selectivity term aims to restore the zero conditional mean assumption and ensures that the linear part of the conditional hazard containing the vector of regressors truly behaves as a linear regression. This is achieved without assuming joint normality for the outcome and the participation equations or, equivalently, for the cumulative hazards and the auxiliary Logit regression predicting the probability of moving. The Heckman model is considered by many authors too restrictive to successfully eliminate selection bias (Goldberger, 1983), triggering the development of more flexible semiparametric and nonparametric approaches, such as Coslett's.

Two procedures are very common in the PH survival analysis literature: either assuming a functional form for the baseline hazard, building a fully parametric model, or specifying a Cox model, which allows the β parameters to be estimated without specifying the functional form of λ0 (Cameron & Trivedi, 2005). Prentice (1978), for example, defines the PH decomposition using the monotonic exponential and Weibull distributions, both used extensively throughout the literature, especially the latter, despite the lack of theoretical support. These functional form assumptions respectively lead to:

h(t) = λ (exponential)

h(t) = λαt^(α−1) (Weibull)

These h(t) stand for the hazard functions with no covariates yet. The exponential distribution has the undesirable memoryless property, meaning that how long a subject has survived so far does not affect the chances of future survival or failure (Lee, 1992). The Weibull is more widely applicable because it has both a scale (λ) and a shape (α) parameter, while the exponential distribution has only a scale parameter, corresponding to α = 1. The scale parameter becomes the function of covariates in the PH framework, hence λ = exp(Xiβ), yielding the following conditional hazards:

λ(t|Xi) = exp(Xiβ)

λ(t|Xi) = αtα−1exp(Xiβ)
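The two hazards above can be evaluated directly; a small sketch with made-up coefficients, checking that α = 1 nests the exponential case and that the Weibull hazard is monotonic in t:

```python
import numpy as np

# Sketch: the exponential and Weibull PH hazards from the text,
# lambda(t|X) = exp(X @ beta) and lambda(t|X) = alpha * t^(alpha-1) * exp(X @ beta).
# beta and x are illustrative values, not estimates from the thesis.
beta = np.array([0.3, -0.1])
x = np.array([1.0, 2.0])
scale = np.exp(x @ beta)
t = np.array([1.0, 2.0, 4.0])

exp_haz = np.full_like(t, scale)               # constant in t
weib_haz = lambda a: a * t**(a - 1) * scale    # Weibull hazard at shape a

assert np.allclose(weib_haz(1.0), exp_haz)     # alpha = 1 recovers the exponential
assert np.all(np.diff(weib_haz(1.5)) > 0)      # alpha > 1: monotonically increasing
assert np.all(np.diff(weib_haz(0.5)) < 0)      # alpha < 1: monotonically decreasing
```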

The restrictions imposed by the features of these distributions on the survival data of interest may yield significant estimates and causal effects that stem from the distributional assumptions rather than from the actual underlying causal relationships. For example, their monotonic nature would lead to erroneous inference in scenarios where the hazard rates are highly volatile and experience upward and downward movements. The sensitivity of the results to such arbitrary assumptions is shown by Moffitt (1985).

The second method, the Cox model, is very appealing in that it eliminates the risk of misspecification by avoiding any functional form assumption for λ0 (Cox, 1972). It is also very valuable due to the ease of interpretation of the estimates obtained. The probability of failure at time tj of an individual i in the risk set R(tj) (containing those who have not yet experienced failure or censoring when period tj starts) is in turn used to build the conditional hazard for individual i. As can be seen below, the baseline hazard drops out when deriving the expression for the aforementioned probability and hence will not be part of λi(t|Xi):

2. This is actually the very basic assumption in the linear regression setup: the expectation of the error term (ε) conditional on the vector of regressors (X) equals 0, i.e. E(ε|X) = 0.


Pr[Tj = tj | R(tj)] = Pr[Tj = tj | Tj ≥ tj] / Σ_{l∈R(tj)} Pr[Tl = tj | Tl ≥ tj]

= λj(tj|Xj) / Σ_{l∈R(tj)} λl(tj|Xl) = φ(Xj, β) / Σ_{l∈R(tj)} φ(Xl, β)

The main drawback of this method is the information loss from not estimating the baseline hazard at all, missing the measurement of how the risk of failure changes over time at baseline levels of the covariates. Alternatively, the approach followed in this paper to analyze the discrete-time data of interest recovers the baseline hazard and allows the effect of time itself on the probability of failure through each of the exits considered to be found. The δ0 parameters in λ0(t|δ0) will be flexibly estimated, simultaneously with the β coefficients, which are contained in φ(Xi, β) = exp(Xiβ) as in the Cox model.
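The cancellation of the baseline hazard in the Cox probability above can be verified numerically; a minimal sketch with made-up data:

```python
import numpy as np

# Sketch: the Cox partial-likelihood contribution of subject j at event time t_j.
# Every hazard in the ratio shares the factor lambda_0(t_j), so the baseline
# cancels and only phi(X, beta) = exp(X @ beta) remains.
def cox_contribution(j, risk_set, X, beta):
    scores = np.exp(X[risk_set] @ beta)
    return np.exp(X[j] @ beta) / scores.sum()

X = np.array([[0.0, 1.0], [1.0, 0.5], [2.0, -1.0]])  # illustrative covariates
beta = np.array([0.7, -0.3])
p = cox_contribution(0, [0, 1, 2], X, beta)

# Multiplying every hazard by an arbitrary baseline value leaves p unchanged:
lam0 = 5.3
p_scaled = lam0 * np.exp(X[0] @ beta) / (lam0 * np.exp(X @ beta)).sum()
assert np.isclose(p, p_scaled)
```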

This flexible semiparametric PH framework, in line with the work of Meyer (1986) and especially Han & Hausman (1990), will be adapted to a competing risks environment in which exit from the joblessness spell can yield two different outcomes: either full-time or part-time re-employment. The model proposed by Han & Hausman was specifically developed for grouped or discrete-time survival data, which is preferable to using continuous-time models such as the Cox model on discrete data, where the latter typically results in a large number of ties that would need extra attention.

In Han & Hausman (1990) the relationship between survival analysis and ordinal regression models is exploited. From the PH specification of the single-exit individual conditional hazard function, λi(t) = λ0(t)exp(−Xiβ), the log form of the integrated or cumulative hazard reads:

log ∫_0^ti λ0(t)dt = Xiβ + εi

where εi is extreme value distributed. Letting:

δt = log ∫_0^t λ0(s)ds,  t = 1, ..., T,

the probability that individual i experiences the failure event at time t takes the form:

∫_{δ(t−1)−Xiβ}^{δt−Xiβ} f(ε)dε

and the logarithm of the baseline hazards is represented by the δt parameters. These work as constants for each discrete time period and will be estimated along with the β coefficients. The extreme value distribution of the error term yields a log-likelihood function of an ordered logit form:

L = Σi Σt yit log ∫_{δ(t−1)−Xiβ}^{δt−Xiβ} f(ε)dε
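These interval probabilities can be evaluated directly; a minimal sketch, assuming the type-1 (minimum) extreme value CDF F(e) = 1 − exp(−exp(e)) and illustrative δt cut-offs that are not estimates from the thesis:

```python
import math

# Sketch of the Han & Hausman interval probabilities. With
# log Lambda_0(t_i) = X_i beta + eps_i and eps_i minimum extreme value
# distributed, the probability of failing in interval t is
# F(delta_t - X beta) - F(delta_{t-1} - X beta).
F = lambda e: 1.0 - math.exp(-math.exp(e))

delta = [-math.inf, -1.5, -0.8, -0.2, 0.5]   # delta_0 = -inf since Lambda_0(0) = 0
xb = 0.3                                      # illustrative X beta value
p = [F(delta[t] - xb) - F(delta[t - 1] - xb) for t in range(1, len(delta))]

survive = 1.0 - F(delta[-1] - xb)             # right-censored beyond last interval
assert abs(sum(p) + survive - 1.0) < 1e-12    # probabilities are exhaustive
assert all(pi > 0 for pi in p)
```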

(9)

The main downside of this paper is that, as seen above, the theoretical development of the model yields extreme value distributed error terms and hence logistic functions. The authors, however, suggest using an ordered probit in the estimations instead, claiming that although it does not follow the PH specification itself, it is a fair approximation to the ordered logit model. These claims are somewhat arbitrary (Sueyoshi, 1992), which is why this thesis follows an alternative approach inspired by Han & Hausman (1990), but not exactly the one contained in their paper.

In the spirit of the Han & Hausman methodology, other authors (Allison, 1982; Singer & Willett, 1993; Most et al., 2014) have described how discrete choice models can be applied in survival analysis. A competing risks model with two exits can be specified in such a way that it corresponds to a multinomial logit model for a three-outcome choice performed in every discrete time period reached by the individual. The three possible outcomes would be: full-time re-employment, part-time re-employment, or remaining jobless (two types of failure events versus survival). This specification is developed in Section 4.

The m-choice multinomial logit model is derived from the additive random utility framework. A latent utility generated from choosing alternative j is specified, formed by the addition of a deterministic and a random component:

U*ij = Vij + εij

For the ith individual, and considering the case with no alternative-varying regressors, Vij = Xiβj. The alternative chosen will be the one yielding the largest utility, so that:

Pr[yi = j] = Pr[U*ij ≥ U*ik, ∀k ≠ j] = Pr[εik − εij ≤ Vij − Vik, ∀k ≠ j]

Provided that the εij are iid and type 1 extreme value distributed, the previous expression takes the form (Manski & McFadden, 1981):

Pr[yi = j] = e^Vj / (e^V1 + e^V2 + ... + e^Vm)

Independence both across alternatives and across individuals needs to hold or be assumed. The first assumption is usually known as Independence of Irrelevant Alternatives (IIA) and is likely to be violated, for example, if the alternatives can work to some extent as substitutes for each other. In a model with the three options of remaining jobless, full-time and part-time re-employment, it can be the case that a person wants to work and, although he/she has a defined preference for a specific type of work, would accept a job offer of any kind, as the primary need of this person is to have a regular income again. The nested logit and the random parameters logit can be used as alternatives, as both relax these restrictive assumptions (Borsch-Supan, 1987; Train, 2003).
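The choice probabilities above can be sketched for the three outcomes in the text; the Vj values below are illustrative deterministic utilities, not estimates from the thesis:

```python
import math

# Sketch: multinomial-logit choice probabilities for a three-outcome choice
# (remain jobless, part-time, full-time).
def mnl_probs(V):
    m = max(V)                                 # subtract max for numerical stability
    e = [math.exp(v - m) for v in V]
    s = sum(e)
    return [ei / s for ei in e]

V = {"jobless": 0.0, "part_time": -0.4, "full_time": 0.3}  # illustrative utilities
p = mnl_probs(list(V.values()))

assert abs(sum(p) - 1.0) < 1e-12
assert p[2] == max(p)                          # largest utility -> largest probability
```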

The present paper is based on data from the same database and survey, and uses a method very similar to that of McCall (1996), who estimates a competing risks model for unemployment durations allowing for two exits: full-time and part-time re-employment. In his case, he needs to account for the endogeneity generated by the regressor related to receipt of unemployment insurance (UI). The probability of the UI receipt binary indicator equaling one is estimated together with the competing risks in a joint maximum likelihood setup. McCall starts by assuming a fifth-order polynomial form for the baseline hazard, but later switches to an alternative approach in which the baseline hazard is flexibly estimated from the data (Han & Hausman, 1990), getting rid of the former functional form assumptions.

As an alternative to McCall's joint ML approach to including the selectivity correction term, the present paper defines a two-step selection model, inspired by Coslett (1991) (also see Hussinger, 2008). Although strict assumptions are required, estimates produced by the two-step procedure are still consistent (Heckman, 1979; Lee, 1982). Ideally, the maximization of a joint log-likelihood, which is the approach used in McCall (1996), would yield asymptotic efficiency and correct standard errors, apart from consistent estimates. But the procedure needed to implement it can be very cumbersome (Maddala, 1983). By contrast, the more tractable two-step formulation, which still yields consistent estimates, may produce erroneous standard errors in the second step, which includes first-step estimates that already carry variation from this former estimation stage. Hence, an extra bootstrapping procedure is needed for the computation of the standard errors in this second step.
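The bootstrap for the second-step standard errors can be sketched as follows; two_step_estimate is a hypothetical stand-in for the full first-step-Logit plus second-step-MNL pipeline (here just a sample mean), so the numbers are purely illustrative:

```python
import random
import statistics

# Sketch: re-draw the sample with replacement, re-run the whole estimation
# on each draw, and take the standard deviation of the re-estimated
# coefficient as its bootstrap standard error.
def two_step_estimate(sample):
    return sum(sample) / len(sample)           # stand-in for the real estimator

def bootstrap_se(data, estimator, reps=500, seed=42):
    rng = random.Random(seed)
    draws = [estimator([rng.choice(data) for _ in data]) for _ in range(reps)]
    return statistics.stdev(draws)

data = [0.2, 1.1, -0.5, 0.9, 0.4, 1.3, -0.2, 0.7]  # illustrative sample
se = bootstrap_se(data, two_step_estimate)
assert se > 0.0
```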

Moving after displacement (denoted by the dichotomous variable moved) will be considered as a treatment undergone by part of the sample. Hence a first-step Logit is estimated to perform predictions and discriminate between 'movers' and 'non-movers', with the final aim of identifying the effect of moving on the conditional hazard rates, capturing the effect of this 'treatment'. The second step corresponds to the competing risks specification, which follows a multinomial logit form and includes a correction term for the potentially endogenous regressor, calculated from the first-step estimation results.

In her paper, Hussinger estimates the effect of subsidy receipt as a covariate in a linear model for R&D investment levels: R&Di = Xiβ + subsidyiθ + εi. A two-step selection model is used to identify the effect of such an endogenous treatment on the outcome of interest. The propensity of subsidy receipt is estimated using a first-step Probit model of the form Yi* = Ziγ + νi, where νi ∼ N(0, 1). First, the fully parametric and highly restrictive Heckman model is applied:

E[R&Di | Yi* > 0] = E[Xiβ + εi | Ziγ + νi > 0] = Xiβ + E[εi | νi > −Ziγ] + ξi*

E[R&Di | Yi* ≤ 0] = E[Xiβ + εi | Ziγ + νi ≤ 0] = Xiβ + E[εi | νi ≤ −Ziγ] + ξi*

The joint normality assumption on the error terms νi and εi implies εi = σ12νi + ξi, provided that νi and ξi are independent. Under these restrictions the previous expectations become:

E[R&Di | Yi* > 0] = Xiβ + E[σ12νi + ξi | νi > −Ziγ] = Xiβ + σ12J(Ziγ)

E[R&Di | Yi* ≤ 0] = Xiβ + E[σ12νi + ξi | νi ≤ −Ziγ] = Xiβ + σ̃12J̃(Ziγ)

where J is the inverse Mills ratio, calculated using the fitted probabilities from the first-step Probit and distinguishing between those who did and did not receive the subsidy:

Ji(Yi* = 1) = φ(Ziγ̂) / Φ(Ziγ̂)

J̃i(Yi* = 0) = φ(Ziγ̂) / (1 − Φ(Ziγ̂))
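The two ratios can be computed from first-step Probit fitted indices using only standard normal functions; a minimal sketch with illustrative index values:

```python
import math

# Sketch: inverse Mills ratio terms built from a fitted index z = Z_i @ gamma_hat,
# one version for treated (Y* = 1) and one for untreated (Y* = 0) observations.
phi = lambda z: math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)  # std normal pdf
Phi = lambda z: 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))         # std normal cdf

def mills_treated(z):      # J_i for Y* = 1
    return phi(z) / Phi(z)

def mills_untreated(z):    # J~_i for Y* = 0
    return phi(z) / (1.0 - Phi(z))

# At z = 0 both ratios coincide at phi(0)/0.5:
assert abs(mills_treated(0.0) - mills_untreated(0.0)) < 1e-12
```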


This joint normality assumption is too restrictive and unrealistic for the present paper, which estimates a competing risks model with a multinomial logit form (where the conditional hazards have extreme value distributed error terms). The parametric Heckman model is preferred over more flexible methods in terms of efficiency, but to actually obtain these benefits it needs to be correctly specified, as it can become inconsistent otherwise. Following this reasoning, it is not possible to compute the Heckman correction terms correctly in the present setup. To overcome this, attention turns to alternative semiparametric methods (although it is important to remark that the restrictive distributional assumptions are not fully relaxed by this more general semiparametric procedure). Hussinger proposes three of these, all depending on the γ̂ estimates from a first-step Probit, so normality is still assumed for the error term of the participation equation, νi, but the joint normality assumption is avoided. Two of these methods are discussed below.

The semiparametric method introduced in Coslett (1991) utilizes a dummy variable approximation to compute the selection correction term. In order to do so, the algorithm from Ayer et al. (1955) is proposed first. The set of dummies is built by cutting into M sections the value-ordered Ziγ̂ vector on the real line, using the Probit γ̂ estimates. As we are dealing with a binary treatment as the endogenous regressor, different dummies will be introduced for treated and non-treated observations. In order to build each of these sets, only the Ziγ̂ values for the observations in each group are taken into account when splitting into M sections, so that the outcome equations read:

R&Di | (Yi* = 1) = Xiβ + subsidiesiθ + Σ_{m=1}^{M} b1m D1im(Ziγ̂) + ξ1i*

R&Di | (Yi* = 0) = Xiβ + subsidiesiθ + Σ_{m=1}^{M} b0m D0im(Ziγ̂) + ξ0i*
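The construction of the interval dummies can be sketched as follows; the equal-count split of the fitted index into M sections is an illustrative convention, and the z values are made up:

```python
import numpy as np

# Sketch: build Coslett-style interval dummies from a first-step fitted index
# z = Z @ gamma_hat, one set per treatment group. Here M = 4.
def coslett_dummies(z, M):
    edges = np.quantile(z, np.linspace(0, 1, M + 1))
    # np.digitize assigns each z to one of the M sections of the ordered index
    section = np.clip(np.digitize(z, edges[1:-1]), 0, M - 1)
    return np.eye(M)[section]                  # n x M dummy matrix

z_treated = np.array([-1.2, -0.3, 0.1, 0.4, 0.9, 1.7, 2.2, 2.8])  # illustrative
D1 = coslett_dummies(z_treated, M=4)

assert D1.shape == (8, 4)
assert np.all(D1.sum(axis=1) == 1)             # each observation in exactly one section
```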

Secondly, Newey's correction term is also proposed in Hussinger (2008), inspired by both Lee (1982) and Coslett (1991). This correction term is composed of a sum of functions which contain both unknown parameters and known smooth functions built from the unbounded Probit linear prediction vector Ziγ̂. It improves robustness with respect to outliers compared to Coslett's, due to the uniform boundedness, between −1 and 1, of its components. Both Newey (1999) and Coslett (1991) seek to identify slope parameters in the structural equation by using an infinite series approximation as correction term.

The advantage of Coslett's technique over Newey's is that the identification of the intercept, which captures the effect of the endogenous regressor, is straightforward. It is simply given by the dummy representing the interval that includes the observations with the largest estimated probability of participation for the treated individuals, and by the dummy capturing the lowest probabilities for the non-treated. On the contrary, when using either Newey's or Heckman's correction terms, the intercept estimators proposed by Heckman (1990) and Andrews and Schafgans (1998) need to be used in order to identify the effect of the endogenous regressor of interest (Hussinger, 2008).

3 Data and descriptive analysis

3.1 Data collection and refinement

Survey data from both the 2010 Displaced Worker Supplement (DWS) and the March Monthly Supplement of the US Current Population Survey (CPS) is used. These two datasets have been merged for the present study. This is possible due to the design of the survey sample, which is divided into eight subsamples or rotation groups. Hence, it is possible to track every housing unit over several months, so that observations corresponding to the same individual can be found in the two CPS Supplements used. These are collected at different points in time and, moreover, each supplement contains variables which cannot be found in the other, and these variables are crucial for this study.

The CPS survey is household-based, so there is no unique identifier or ID variable exclusive to each individual in the sample. Instead, there is an exclusive ID number for each household interviewed. This household ID will be used to conduct the dataset merge, together with a variable containing the number given to each person within his/her housing unit. A third variable, named "household counter", is also used for the merge; it is necessary to account for the fact that when a housing unit is selected for the survey, it will be subject to the eight interviews no matter whether the people occupying the dwelling change in the meantime (Madrian & Lefgren, 1999).

For example, a new family could have moved in during the period that the household is part of the DWS sample. It could also have been the case that some members of the family who were (or were not) living in the house before have moved out (or in). Hence, during the period in which the household participates in the survey, all or some of the household members may change. Thereby, this third variable is the way to avoid the problem of "false positives", i.e. merged observations which do not in fact correspond to the same person although they register the same household ID and the same member number within the family. Subsequently, several checks have been conducted with this same aim of deleting observations for which the merge failed, looking at gender, age, race and educational attainment variables.
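The three-key merge described above can be sketched with pandas; all column names and values are illustrative, not the actual CPS variable names:

```python
import pandas as pd

# Sketch of the merge on household ID, person line number within the household,
# and the household counter that distinguishes successive occupants of a dwelling.
dws = pd.DataFrame({
    "hh_id":   [101, 101, 102],
    "line_no": [1, 2, 1],
    "hh_num":  [1, 1, 2],                      # household counter
    "weeks_jobless": [12, 30, 5],
})
march = pd.DataFrame({
    "hh_id":   [101, 101, 102],
    "line_no": [1, 2, 1],
    "hh_num":  [1, 1, 1],                      # occupants changed in dwelling 102
    "ft_status": ["FT", "PT", "FT"],
})
merged = pd.merge(dws, march, on=["hh_id", "line_no", "hh_num"], how="inner")

# The dwelling-102 record is a "false positive" on (hh_id, line_no) alone;
# including the household counter drops it from the matched sample.
assert len(merged) == 2
```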

The population of interest consists of 20-64 year-old individuals displaced from their jobs in the three years prior to the interview, 2007-2009, due to plant closing, insufficient work or position abolishment. The spell starting at the job loss and lasting until the first post-displacement job is measured in the data. There is no information on job search activities, so the spell measure refers to the joblessness rather than the unemployment period (McCall, 1996). The labor force status of the respondents in January 2010 is considered (at the DWS interview). Individuals who were absent from work in the week prior to the interview, due to vacation, health issues or other reasons, are included in the DWS sample but not considered in this study. For some individuals, the aforementioned spell had not ended at the time of the final interview, and their duration measure will be right-censored. In any case, the maximum value that can be registered for the joblessness durations is 33 weeks, as this is the time passed from the earliest displacements considered until the final interview in January 2010. See Figure 1 for a visual description of how this duration is measured and how the censoring mechanism works.

3. Each of these groups is composed of a certain number of households which enter the sample at the same time to be interviewed for a period of four months, followed by an eight-month break and, finally, a second period of interviews (again lasting four consecutive months).

4. The 2010 DWS interview takes place in January, while the 2010 Monthly Supplement survey takes place in March. Observations corresponding to duration variables are hence adjusted so that they all pertain to January 2010.


Figure 1: Duration measurement and censoring mechanism

Only the joblessness duration until the first post-displacement job is considered, mainly because no data is available on subsequent spells for all observations. Due to the brief displacement time span considered (2007-2009), the proportion of individuals that experienced more than one spell is 16%. Moreover, information on joblessness duration was missing for some observations: incorrect answers, refusals to answer and recollection problems were the main reasons reported, especially the longer ago the displacement had been. The deletion from the sample of these individuals reporting incomplete answers might entail a slight risk of bias, as most of them were people displaced at the beginning of the considered displacement period who did not properly remember the details of their joblessness spell.

Joblessness spells were initially reported in weeks, but monthly intervals have been defined instead for two reasons. Firstly, the number of parameters to be estimated, one per interval, would otherwise be too large; secondly, wider intervals reduce the risk of bias from heaping of reported joblessness durations at even weeks (McCall, 1996). A drawback of these wider intervals is that many displaced workers re-enter the labor force in the first month(s) after being dismissed, and this phenomenon is captured more poorly by monthly than by weekly intervals.

Two possible exits from the joblessness situation will be considered: full-time and part-time re-employment.5 A difficulty encountered at this point is that the dataset only offers information on the full- or part-time status of the job held at the time of the interview, which is not necessarily the first post-displacement job, although, as the descriptive analysis will show, the two very frequently coincide due to the short time span of displacements considered. For example, the variable counting the number of jobs held after displacement has a mean of 0.881 and a standard deviation of approximately 1, so most people held either zero or one job, and more rarely two. The exits defined can therefore be understood as containing those post-displacement joblessness spells which led to an employment path concluding with the full-/part-time re-employment status held at the interview time.

A third category that needs to be considered consists of people who could not get any job since displacement, or who did at some point but then went back to unemployment or inactivity. In both cases their joblessness period is considered unfinished and the duration value is treated as right-censored.6 Three dummy variables are created (DF, DP, c), which equal one for the displaced workers corresponding to each of the three categories just discussed and zero otherwise, and which will work as the censoring indicators for the different hazard functions to be specified. An overall indicator for all those who achieved re-employment after displacement (D) is also created.7
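As an illustration of how these mutually exclusive indicators might be constructed, the following pandas sketch uses hypothetical column names (`first_job_type`, `still_jobless`), not actual CPS/DWS variables:

```python
import pandas as pd

# Hypothetical worker-level table; column names are assumptions for illustration.
df = pd.DataFrame({
    "first_job_type": ["full", "part", None, "full"],  # None = never re-employed
    "still_jobless": [False, False, True, False],      # jobless at interview time
})

# Censoring indicators as defined in the text: DF (full-time exit),
# DP (part-time exit), c (still jobless, right-censored), D (any exit).
df["DF"] = ((df["first_job_type"] == "full") & ~df["still_jobless"]).astype(int)
df["DP"] = ((df["first_job_type"] == "part") & ~df["still_jobless"]).astype(int)
df["c"] = df["still_jobless"].astype(int)
df["D"] = df["DF"] | df["DP"]
```

By construction the three categories are mutually exclusive, so DF + DP + c equals one for every individual.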

5 This labor force status is determined by the number of hours worked during the previous week (part-time if under 35, full-time otherwise).

6 For the second type of right-censored durations, corresponding to people who held a post-displacement job and lost it, the value of the joblessness spell results from adding the joblessness durations until and after the first post-displacement job. For this specific profile of displaced workers there is information about this second joblessness spell. Both spells are treated as a single, uninterrupted spell.


Among the wide range of covariates to be included in the conditional hazard specification, one is worth further mention: the binary indicator of whether the displaced worker moved from his/her prior residence after the job loss. The tables displayed below, containing the descriptive statistics of all variables used, show that around 11% of the individuals in this sample of displaced workers did move. Given the drastic nature of this decision, that can be considered a high percentage; it is certainly large compared to labor mobility figures in the EU, which did not exceed 3% at any point in the considered period (Eurostat). Moreover, 65% of these people claimed that the main reasons driving the change were economic.

There are reasons to believe that the inclusion of this explanatory variable in the hazard function leads to endogeneity: the unobserved factors driving the shortening of the joblessness period, or its end through a particular exit, may be the same ones which generated in this person the need to move to a new area in order to seek a job there. Moreover, simultaneity might also be present. The decision to change the area of residence within the US affects the length of the joblessness period, but it is also possible that, for example, a joblessness spell which starts to become long triggers the decision to move to a new area. The endogeneity of the moving variable needs to be taken into account (the procedure is described in detail in Section 4).

Apart from the potentially endogenous regressor described above, there are many other relevant covariates to be included in the competing risks specification. On the one hand, some variables are closely related to the former job: whether the displaced employee expects a recall from the previous employer, the sector of the former job, tenure, whether this person was unionized or received health insurance, and the weekly pay earned. Some demographic and socioeconomic indicators are also included, such as age, marital status, race, gender, number of children, place of residence (rural/urban, state) and level of education. Finally, the timing of the displacement also plays a role.

The final sample contains 4,266 observations.

3.2 Summary statistics

Tables 1 and 2 contain the main summary statistics of all variables that will be used in the models.8 The mean of the duration variable of interest, measured in months, is 6.1, although the standard deviation is also slightly larger than 6 and values up to 33 are registered (this is the upper bound: by the 33rd month every observation has experienced either censoring or failure through one of the two exits). In the present sample of displaced people, 41% were able to achieve re-employment in a full-time job, while almost 19% found a part-time job instead. As mentioned before, holding either zero or one job after displacement are the most common scenarios observed.

40% of the sample was not able to find a job between displacement and the time of the survey. This figure may seem relatively large, but it is explained by the fact that 49% of the cases considered were displaced in 2009, so that not much time had passed for many to find a job before January 2010; 18% and 33% were displaced in 2007 and 2008 respectively. This difference can be attributed to the very particular circumstances that were

8 For greater insight, Table 6 in the Annex contains a tabular overview of all variables, their names, definitions and how they are measured.


Table 1: Summary statistics (I)

Variable Mean Std. Dev. Min. Max. N
Joblessness duration (monthly periods) 6.102 6.098 1 33 4266
Number jobs after displacement 0.881 1.040 0 30 4140
moved 0.109 0.311 0 1 4266
Censoring indicators
DF (Re-employed: Full-time) 0.408 0.492 0 1 4266
DP (Re-employed: Part-time) 0.188 0.391 0 1 4266
c (Still jobless) 0.404 0.491 0 1 4266
D (Re-employed: Overall) 0.597 0.491 0 1 4266
Year of displacement
dw 2007 0.177 0.381 0 1 4266
dw 2008 0.332 0.471 0 1 4266
dw 2009 0.491 0.500 0 1 4266
Reason for displacement
dw plant 0.255 0.436 0 1 4266
dw insuf 0.500 0.500 0 1 4266
dw position 0.246 0.431 0 1 4266
xrecalll 0.067 0.249 0 1 4266
uibens 0.538 0.499 0 1 4266
Characteristics of lost job (LJ)
ljten (years) 5.034 6.336 0.003 41 4266
lj priv 0.962 0.192 0 1 4266
lj Agriculture 0.008 0.088 0 1 4266
lj Manufacturing 0.199 0.399 0 1 4266
lj Services 0.793 0.405 0 1 4266
lj Construction 0.152 0.359 0 1 4266
lj Trade 0.141 0.348 0 1 4266
lj Insurance 0.070 0.255 0 1 4266
lj union 0.087 0.283 0 1 4266
lj hi 0.536 0.499 0 1 4266
ljwkpay 721.545 696.781 0 3867 4266
Labor force status (January 2010)
Employed 0.597 0.491 0 1 4266
Unemployed 0.368 0.482 0 1 4266
Not-in-labor force (NILF) 0.035 0.184 0 1 4266
Sociodemographics
married 0.550 0.498 0 1 4266
age 41.159 11.657 20 64 4266
female 0.377 0.485 0 1 4266
White 0.712 0.453 0 1 4266
Black 0.100 0.300 0 1 4266
Hispanic 0.130 0.337 0 1 4266
Other (race) 0.057 0.232 0 1 4266
forborn 0.139 0.346 0 1 4266
citizen 0.921 0.270 0 1 4266
veteran 0.075 0.263 0 1 4266


Table 2: Summary statistics (II)

Variable Mean Std. Dev. Min. Max. N

Level of education
LTHS 0.086 0.280 0 1 4266
HS 0.338 0.473 0 1 4266
SomeCollege 0.314 0.464 0 1 4266
College 0.194 0.396 0 1 4266
Advanced 0.068 0.251 0 1 4266
Household/family composition
householder 0.423 0.494 0 1 4266
ownchild 0.721 1.068 0 7 4266
ch05 0.176 0.381 0 1 4266
ch613 0.225 0.417 0 1 4266
ch1417 0.128 0.335 0 1 4266
faminc2 10.266 3.907 1 16 4266
Household location
Area of residence
centcity 0.241 0.427 0 1 4266
suburb 0.396 0.489 0 1 4266
rural 0.174 0.379 0 1 4266
Region of residence
NewEngland 0.120 0.325 0 1 4266
MidAtlantic 0.076 0.265 0 1 4266
EastNorthCentral 0.136 0.343 0 1 4266
WestNorthCentral 0.113 0.316 0 1 4266
SouthAtlantic 0.171 0.377 0 1 4266
EastSouthCentral 0.041 0.198 0 1 4266
WestSouthCentral 0.065 0.247 0 1 4266
Mountain 0.120 0.325 0 1 4266
Pacific 0.158 0.364 0 1 4266


affecting the US economy in the 2007-2009 period, which changed over the years, but also to some extent to the bias problem mentioned before. Only 7% expected a recall from their former employer after displacement, and almost 54% were receiving unemployment insurance (UI). 54% had some kind of health insurance provision in their former jobs; this is considered an important variable when discussing displacement and job seeking, because losing a job in some cases also means losing health coverage, so people may feel even more urged to find a new job. Most of the displaced workers had been employed in the service sector (almost 80% of the sample), and 96% had worked for private employers, with a mean tenure of 5 years, although the dispersion is high: a standard deviation of 6.3 years.

According to the demographic indicators, 55% of the sample are married and 37% are female. 71% of the individuals are white, with only 10% black and 13% Hispanic; 14% were born outside the US, although only 8% do not hold US citizenship. The mean age is 41. 42% of the people in the sample declare themselves to be the main person bringing income into the household. The average number of children does not even reach 1: almost 18% of the individuals have a child aged 0-5, while 22% and 13% have children aged 6-13 and 14-17 respectively. Most individuals have at most a high school education (34%) or some college education without a completed degree (31%). Only 19% have completed college studies. The smallest proportions correspond to people who did not obtain a high school certificate (9%) or who completed advanced studies (Masters or PhD) (7%).

As for household location, most people live in city suburbs (40%). Using the official classification of US states, 9 regions have been defined; individuals are not evenly distributed across them. 12% live in New England, almost 8% in the Mid Atlantic zone, 14% and 11% in the East North Central and West North Central regions respectively, 17% in the South Atlantic area, 4% and 6% in the East South Central and West South Central regions, and finally 12% and 16% in the Mountain and Pacific regions respectively.9

3.3 Kaplan-Meier Survival Function Estimates

First, failure times are treated as continuous (although the data are grouped, the underlying dgp is continuous). The Kaplan-Meier survival function estimates are displayed in Figures 2-4. Kaplan-Meier curves have also been computed taking into account the fact that the data are recorded discretely (in monthly intervals), assuming that failures occur uniformly within each interval. Accordingly, an adjustment has been made so that failures are imputed to the midpoint of the interval rather than all being grouped at its beginning. In the end, the differences between the adjusted Kaplan-Meier curves and those included in Figures 2-4 are minimal, and it was considered unnecessary to also report them in this document, since these estimates serve descriptive purposes: to give an overall picture of the failure/censoring patterns over the time period analyzed.

This estimator considers, for each type of exit separately, the empirical distribution function in the presence of censoring (Prentice, 1978). It takes the form:

\hat{S}_w(t) = \prod_{i : t_i \le t} \frac{\eta_i - c_{iw}}{\eta_i}


where t_1, ..., t_k are the interval-censored ordered response times, c_{iw} = 1 indicates exit through type w (and c_{iw} = 0 indicates censoring at the considered t), and \eta_i stands for the number of individuals at risk at each t.
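The formula above can be translated into a minimal numpy sketch, treating competing exits and censored spells alike as contributing c = 0 (this is an illustrative implementation, not the exact code used for the figures):

```python
import numpy as np

def km_survival(times, events):
    """Exit-specific Kaplan-Meier estimate following the product formula:
    at each ordered observation time, multiply by (eta - c) / eta, where
    eta is the number still at risk and c = 1 flags an exit of the type
    under study (other exits and censored spells enter with c = 0)."""
    order = np.argsort(times)
    t = np.asarray(times, dtype=float)[order]
    e = np.asarray(events, dtype=int)[order]
    n = len(t)
    s, surv = 1.0, {}
    for i in range(n):
        eta = n - i              # individuals at risk just before t_i
        s *= (eta - e[i]) / eta
        surv[t[i]] = s
    return surv

# Toy data: 4 spells, exits of the studied type at t = 1 and t = 3.
km = km_survival([1, 2, 3, 4], [1, 0, 1, 0])
```

With this toy input the curve drops from 1 to 0.75 at the first exit and to 0.375 at the second, staying flat at censored times.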

Figure 2: Survival Curves: Full-time Re-employment

Figure 3: Survival Curves: Part-time Re-employment


Estimates of the survival curves for individuals re-employed in full-time (FT) and part-time (PT) jobs are reported, measuring the proportion of survivors left at each of the months observed (Y axis): first for the whole sample, and then for the two subsamples defined by whether the individual moved after displacement or not. Below each graph a basic risk table is also included, indicating the number of individuals still 'at risk' at each timepoint, i.e. those who have not yet experienced failure or censoring. If at a certain month t a PT re-employment takes place while the graph for FT is being built, the individual is observed for the last time at t but treated as censored at that timepoint, not as experiencing a transition or failure. An overall survival curve has also been plotted, containing all individuals who achieved re-employment of either kind.

3.4 Nelson-Aalen Cumulative Hazard Function Estimates

Figure 5: Cumulative hazards: Full-time Re-employment

Figure 6: Cumulative hazards: Part-time Re-employment

The Nelson-Aalen estimator is the corresponding non-parametric estimator for the cumulative hazard function, built from the discrete-time hazard rate. It takes the form:

\hat{\Lambda}_w(t) = \sum_{i : t_i \le t} \hat{\lambda}_{wi} = \sum_{i : t_i \le t} \frac{d_{wi}}{\eta_i}


Figure 7: Overall Cumulative hazard functions: Re-employment

where d_{wi} represents the number of spells ending at time t through an exit of type w, and \eta_i again stands for the individuals at risk at each t.
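A companion sketch for the Nelson-Aalen sum (again illustrative only, with the same toy data layout as the Kaplan-Meier example):

```python
import numpy as np

def nelson_aalen(times, events):
    """Exit-specific Nelson-Aalen cumulative hazard: at each ordered
    observation time add the discrete hazard contribution d / eta,
    where eta is the number of individuals still at risk."""
    order = np.argsort(times)
    t = np.asarray(times, dtype=float)[order]
    e = np.asarray(events, dtype=int)[order]
    n = len(t)
    h, cumhaz = 0.0, {}
    for i in range(n):
        h += e[i] / (n - i)      # eta = n - i individuals at risk
        cumhaz[t[i]] = h
    return cumhaz

# Toy data: exits of the studied type at t = 1 (1 of 4 at risk)
# and t = 3 (1 of 2 at risk).
na = nelson_aalen([1, 2, 3, 4], [1, 0, 1, 0])
```

The cumulative hazard here steps up by 1/4 at the first exit and by 1/2 at the second, remaining flat elsewhere.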

Figures 5-7 contain the Nelson-Aalen function estimates, which also treat failure times as continuous. The Y axis now displays the cumulative proportion: the hazard rate of each period, i.e. the proportion of exits out of the total number of individuals surviving to the examined timepoint, is added to that of the previous months. The Nelson-Aalen functions are defined for each exit separately and also for the global censoring indicator. This is done for the total sample, and then also discriminating by whether the individual moved from his residence after displacement or not.

From the non-parametric estimates of the survival and hazard curves displayed, it can be deduced whether the most commonly used monotonic distributions (Weibull, exponential, log-normal) properly fit the data in the present study. This appears not to be the case, which is why the econometric methodology proposed in the next section aims at more general distributions, valid even when the usual distributional assumptions are violated. Indeed, the proposed method is distribution-free as regards the estimation of the baseline hazard in the PH framework. An important risk in case of misspecification of the functional form of the baseline hazard is that significant causal effects for the covariates of interest may be obtained as a result of those erroneous functional form assumptions rather than from a true causal relationship existing in the data.

The present study is based on a survey-based public database built for a purpose disconnected from that of this paper. Building a tailored database would have been too time-consuming and intractable in this situation, but it would have allowed redesigning questions and their contents, solving the several problems of missing information encountered throughout the analysis, as well as the issue discussed next. It is important to keep in mind that the endogenous binary indicator of whether a person has moved after displacement registers a rare event, affecting only about 10% of the people in the database. It will be possible to identify the causes of such behaviour and to predict the probability of its occurrence (low probability estimates are expected), but this specific feature of the variable might pose obvious difficulties for obtaining statistically significant results in the subsequent estimations, when it is used as a covariate.


4 Econometric methodology

4.1 The model

A competing risks model will be specified to study joblessness duration by jointly considering exits to full-time and part-time re-employment. The conditional hazard functions will be defined in the context of Proportional Hazards (PH) models (Prentice, 1978) with a flexible parametric estimation of the baseline hazard based on the data (Han & Hausman, 1990), aiming to avoid misspecification due to erroneous functional form assumptions.

The underlying dgp generates transitions between states at continuous timepoints but, as is common in economic data, transition times are grouped in the present case and observed in discrete monthly intervals.10 It is not possible to know at which point within the indicated interval the transition exactly took place (interval-censored failure times). For the present specification it is assumed that the underlying continuous hazard rate is constant within each monthly interval (Jenkins, 1995).

Discrete failure times t = 1, 2, ..., k are observed. Accordingly, there are k underlying intervals of the form [a_0, a_1), [a_1, a_2), ..., [a_{q-1}, a_q), [a_q, \infty), with a_0 = 0 and q = k - 1, where T = t is observed if the failure event lies in the interval [a_{t-1}, a_t). Two possible risks or exits from the joblessness period are defined, w = f, p, where f and p stand for full-time and part-time re-employment respectively. The unobserved latent variables Y_w^* \ge 0 represent the durations of the joblessness spells if each w were the only exit analyzed.11

The data structure available takes the form (t_i, w_i, c_i | x_i) for each individual i = 1, 2, ..., N. The observed discrete failure time equals t_i = \min(Y_{fi}^*, Y_{pi}^*, C_i), the minimum of the survival times of type Y_f^* and Y_p^* and the censoring time C_i. Incomplete spells are considered right-censored and are denoted by:

c_i = 0 if Y_w^* < C_i, i.e. failure of type f or p occurs in the interval [a_{t-1}, a_t);
c_i = 1 if Y_w^* \ge C_i, indicating censoring at [a_{t-1}, a_t).    (1)

Censoring is not independent in the present study because it sets an upper bound, therefore influencing the parameters of the distribution of the observed failure times T. Durations are censored if running beyond a certain pre-fixed censoring time, say t_i^c = k. All joblessness spells starting within a 33-month period are considered. At the end of the 33 months, joblessness spell lengths are known for re-employed workers, but some people may not have left the jobless state yet; their duration values are then treated as right-censored and set to the number of months completed in joblessness at the 33rd month. The maximum value the duration measure can take is 33 months, for those displaced at the earliest period considered who were either re-employed or censored at the 33rd month. The censoring mechanism is therefore explicitly modelled as part of the likelihood function (Cameron & Trivedi, 2005).

The type of exit experienced is indicated by w_i = f, p. Finally, X_i is a vector of covariates, including a constant term in its first position, as well as the potentially endogenous regressor moved.

10 This monthly width of the time intervals is considered the most adequate given the sample size. Weekly joblessness intervals were originally reported in the DWS dataset. Two-week intervals were considered at first, but too many baseline-hazard parameters had to be estimated, so that identification could no longer be ensured according to Han and Hausman (1988).


Exit-specific conditional hazards are specified in the PH framework. The hazard rate for observation i is taken as the starting point and defined as follows:

\lambda_i^w(t) = \lim_{\Delta \to 0^+} \frac{P(t < t_i^w < t + \Delta \,|\, t_i^w > t, X_i^w)}{\Delta} = \lambda_0^w(t) \exp(-X_i^w \beta_w)    (2)

where w = f, p stands for transitions to full-time and part-time re-employment respectively. It denotes the conditional probability of exiting the joblessness state in [t, t + \Delta). Integrating and then taking logs yields the following expression:

\Lambda_0^w(t) - X_i^w \beta_w = \epsilon_{iw}    (3)

where \Lambda_0^w(t) = \log \int_0^{t_w} \lambda_0^w(s) \, ds and \epsilon_{iw} = \log \int_0^{t_w} \lambda_i^w(s) \, ds. Hence,

\delta_t^w = \log \int_0^{t_w} \lambda_0^w(s) \, ds = X_i \beta_w + \epsilon_{iw}    (4)

where the \delta_t^w are functions of the latent variables Y_w^* \ge 0, and the \epsilon_{iw} error terms are type I extreme value distributed.
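A quick numerical illustration of the proportionality embedded in (2) (with an arbitrary, made-up baseline and coefficient, purely for exposition): covariates shift the hazard multiplicatively, so the ratio between two individuals' hazards does not depend on t.

```python
import numpy as np

def hazard(t, x, beta, baseline=lambda t: 0.1 + 0.05 * t):
    # lambda_i(t) = lambda_0(t) * exp(-x * beta); baseline chosen arbitrarily
    return baseline(t) * np.exp(-x * beta)

beta = 0.8
# Hazard ratio of an individual with x = 1 over one with x = 2, at several t.
ratios = [hazard(t, 1.0, beta) / hazard(t, 2.0, beta) for t in range(1, 6)]
```

Every element of `ratios` equals exp(beta): the baseline cancels, which is exactly what allows the flexible, data-based estimation of the \delta_t^w intercepts.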

These \delta_t^w work as constants at each of the joblessness intervals and are estimated simultaneously with the \beta coefficients (Han & Hausman, 1990). In (4), X_i^w = X_i is used, which is appropriate from the identification viewpoint according to Han and Hausman's (1988) studies on identification in competing risks models. Identification is proved for such a specification, in which the same regressors are used for each competing hazard, as long as at least one of the covariates included is continuous. This requirement is fulfilled here through the inclusion of the continuous variables measuring tenure and earnings in the lost job.

The approach used for the specification of the conditional hazards, applicable to discrete duration data, consists of specifying a three-option choice model for transitions. When each discrete time interval of the form [a_{t-1}, a_t) is reached, the individual faces a decision with three (w = 1, 2, plus survival) possible outcomes: the two target events, full-time and part-time re-employment, or survival (not finding a job). The general formulation of the model reads:

Pr[a_{t-1} \le T < a_t \,|\, T \ge a_{t-1}, X_i] = F(\delta_{0t}^w + X_i'(a_{t-1}) \beta_w)    (5)

The coefficients of X_i are constant over time, while the \delta_{0t}^w intercepts vary over time.

The two hazard functions \lambda_w(t|X_i) sum up to the overall hazard function (Most et al., 2014; Prentice, 1978):

\lambda(t|X_i) = \sum_{w=1}^{2} \lambda_w(t|X_i) = P(T = t \,|\, T \ge t, X_i)    (6)


The survival function at t is given by:

S(t|X_i) = P(T > a_t \,|\, X_i) = \prod_{j=1}^{t} (1 - \lambda(j|X_i))    (7)

Hence the three conditional response probabilities \lambda_1(t|X_i), \lambda_2(t|X_i) and 1 - \lambda(t|X_i) sum to one, because the response categories are mutually exclusive and exhaustive. This is the separability property of continuous hazard models, and it can be approximated in the discrete case, working accurately only if the intervals defined are sufficiently small. Based on evidence provided by the literature, monthly intervals meet this requirement (Most et al., 2014; Bover & Gomez, 1999; Allison, 1982; Jenkins, 1995).

Considering the error terms \epsilon_{iw} in (3) and (4) to be extreme value distributed, the function F in (5) stands for the logistic cdf and the likelihood function is of a multinomial logit form. This approach yields the following exit-specific hazard functions, conditional on having reached [a_{t-1}, a_t), which are equivalent to the conditional response probabilities of the multinomial logit setup:

\lambda_w(t|X_i) = \frac{\exp(\delta_{0t}^w + X_i' \beta_w)}{1 + \sum_{w=1}^{2} \exp(\delta_{0t}^w + X_i' \beta_w)}    (8)

The (\delta_{01}^w, ..., \delta_{0q}^w) stand for the exit-specific baseline parameters, and the estimated \beta_w coefficients capture the exit-specific effects of the covariates in vector X_i. These \beta_w coefficients are constant over time. All parameters of type \delta and \beta are simultaneously estimated using a multinomial logit procedure. This method is very appealing for its simplicity, but it requires some rearrangements of the dataset that are described in Section 4.2. It could be objected that the multinomial logit was originally developed for intrinsically discrete data, which is not the case here; but if the interval hazard is relatively small, the estimates from such a model provide a very close approximation in the presence of interval-censored data, under the assumption that hazards are constant within each interval.
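The required rearrangement of the dataset amounts to expanding each spell into one record per month at risk (person-period form). A minimal pandas sketch, with assumed column names (`dur` for the spell length in months, `exit` coded 0 = censored, 1 = full-time, 2 = part-time):

```python
import pandas as pd

def expand_person_period(spells):
    """Expand spell-level rows into person-period form: one row per
    individual per month at risk, with outcome 0 (survive) in every
    month except, for completed spells, the exit type in the final month."""
    rows = []
    for _, r in spells.iterrows():
        for t in range(1, int(r["dur"]) + 1):
            outcome = int(r["exit"]) if t == int(r["dur"]) else 0
            rows.append({"id": r["id"], "month": t, "outcome": outcome})
    return pd.DataFrame(rows)

# Toy data: one 3-month spell ending in a full-time job, one 2-month
# spell that is censored.
spells = pd.DataFrame({"id": [1, 2], "dur": [3, 2], "exit": [1, 0]})
pp = expand_person_period(spells)
```

On the expanded data, month dummies built from `pp["month"]` play the role of the \delta_{0t}^w, and any multinomial logit routine (e.g. `MNLogit` in statsmodels) can then be applied to `outcome`.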

There is a third group of individuals who do not achieve re-employment in the whole period examined and hence are survivors. This third event is not explicitly specified in the multinomial logit because it works as the normalized outcome, whose coefficients have been set to zero to act as the reference or base category (Most et al., 2014; Cameron & Trivedi, 2005). All coefficients and marginal effects resulting from the model are interpreted against this base outcome. The conditional probability of surviving is defined as follows:

P(T > t \,|\, T \ge t, X_i) = 1 - \sum_{w=1}^{2} \lambda_w(t|X_i) = \frac{1}{1 + \sum_{w=1}^{2} \exp(\delta_{0t}^w + X_i' \beta_w)}    (9)
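As a numeric sanity check of the relationship between (8) and (9), with arbitrary illustrative values for the intercepts and linear indices:

```python
import numpy as np

def response_probs(d_f, d_p, xb_f, xb_p):
    """Conditional response probabilities at one interval: the two exit
    hazards of (8) plus the survival probability of (9); arguments are
    the exit-specific intercepts delta and linear indices X'beta."""
    e_f, e_p = np.exp(d_f + xb_f), np.exp(d_p + xb_p)
    denom = 1.0 + e_f + e_p
    return e_f / denom, e_p / denom, 1.0 / denom

# Arbitrary illustrative values, not estimates from the thesis.
lam_f, lam_p, surv = response_probs(-2.0, -2.5, 0.4, 0.1)
```

By construction the three probabilities lie in (0, 1) and sum exactly to one, as the mutually exclusive and exhaustive categories require.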

Selectivity issues are also crucial in the present model. A two-step selection model, a semiparametric analogue of the Heckman selection model, will be estimated with the main aim of finding the effect of a potentially endogenous regressor in the conditional hazards: the binary indicator of whether an individual has moved after displacement, 'moved'. In turn, choice among discrete alternatives in the multinomial logit setup is specified as choosing the alternative maximizing an underlying latent utility U_{ir}^*, where r stands for the (w + 1) possible outcomes: w_f, w_p, 1 - w_f - w_p. This utility is formed by a deterministic term and a purely random component:

U_{ir}^* = X_i' \beta_r + \epsilon_{ir}    (10)

Before applying any endogeneity treatment, note that the empirical three-equation model reads as follows:

\lambda_{if} = \frac{\exp(\delta_{0t}^f + X_i' \beta_f + moved_i \theta_f)}{1 + \sum_{w=1}^{2} \exp(\delta_{0t}^w + X_i' \beta_w + moved_i \theta_w)}

\lambda_{ip} = \frac{\exp(\delta_{0t}^p + X_i' \beta_p + moved_i \theta_p)}{1 + \sum_{w=1}^{2} \exp(\delta_{0t}^w + X_i' \beta_w + moved_i \theta_w)}    (11)

\hat{Pr}[moved] \rightarrow Pr[Y_{moved}^* = 1] = I\{Z_i' \gamma + \nu_i > 0\}    (12)

where the error \nu_i is assumed to follow a logistic distribution. The vector of regressors X_i^w = X_i is also assumed to be independent of the corresponding error terms \epsilon_{iw} in each of the hazard functions. Equation (11) defines the relationship between the outcome of interest and the X_i covariates, while (12) is the selection equation that defines the relationship between the binary indicator moved and the vector of regressors Z_i.12 Only two of the r outcomes are specified in (11) due to the use of the third as base outcome in the multinomial logit model. The multinomial logit specification requires independence across individuals (iid) and across alternatives. The latter assumption is the so-called 'IIA': if one outcome in the discrete choice is removed, the estimated probabilities and the effects of determinants on the other alternatives should remain unchanged, which might be questionable in the present scenario.

In the first step, an auxiliary logit estimation is conducted to obtain fitted values for the parameters of the covariates which affect the probability of 'having moved' (hence obtaining \hat{\gamma}). These results are in turn used to compute Coslett's correction term to account for the selectivity generated by the regressor moved. This term consists of a set of dummies that are subsequently introduced in the specification of the conditional competing hazards. The dummies are defined based on the value-ordered vector Z_i' \hat{\gamma}, cut into M sections. It is in fact a semiparametric analogue of Heckman's correction term (Heckman, 1979; Coslett, 1991; Hussinger, 2008). There is a large literature focused on obtaining more flexible methods than the Heckman one, which has been claimed to be too restrictive to actually remove selection bias (Goldberger, 1983). The restrictions imposed by functional form assumptions are much weaker in this Coslett setup; for example, the normality assumption required to apply the Heckman model is totally unrealistic in the present paper.

An important difference between this adaptation of the Heckman model and its original setup is that the outcomes of interest, the durations and the corresponding conditional hazard rates, are always observed (instead of observability being constrained to 'moved' taking a certain value). The essence of the Heckman methodology, estimating the selectivity bias and eliminating it from the outcome equation of interest by introducing a correction term built from the fitted values of the first-step logit procedure, is adapted to eliminate selection bias from the present competing risks model as in Hussinger (2008). Different sets of dummies are built for those who moved after displacement and for those who did not.
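Assuming the first-step fitted index Z_i'\hat{\gamma} is already available (random numbers stand in for it here), the construction of one set of correction dummies can be sketched by cutting the value-ordered index into M equal-probability groups:

```python
import numpy as np

def coslett_dummies(index, M=10):
    """Cut the value-ordered first-step index Z'gamma_hat into M groups
    and return an (n x M) dummy matrix. In the two-step procedure, one
    such set is built separately for movers and for non-movers."""
    # interior quantile cut points; digitize assigns each obs to a group 0..M-1
    cuts = np.quantile(index, np.linspace(0, 1, M + 1)[1:-1])
    groups = np.digitize(index, cuts)
    dummies = np.zeros((len(index), M))
    dummies[np.arange(len(index)), groups] = 1.0
    return dummies

# Stand-in for the fitted index from the auxiliary logit.
rng = np.random.default_rng(0)
D = coslett_dummies(rng.normal(size=200), M=10)
```

Each observation falls into exactly one group, so each row of the dummy matrix contains a single one; the columns then enter the second-step hazard specification as the correction terms.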

12 Indeed, moved is part of the X_i vector of regressors, although in some of the equations in this section, for the


Due to the properties of hazards, the corresponding cumulative hazard functions, based on the aforementioned utility maximization framework, read as follows:

\Lambda_f(t) = \log \int_0^{t_f} \lambda_0^f(s) \, ds = X_i^f \beta_f + moved_i \theta_f + \epsilon_{if}

\Lambda_p(t) = \log \int_0^{t_p} \lambda_0^p(s) \, ds = X_i^p \beta_p + moved_i \theta_p + \epsilon_{ip}    (13)

Due to the presence of simultaneity and the selection of individuals with certain characteristics into the moved group, E[\epsilon_{iw} \,|\, X_i, moved_i] \neq 0, where w = f, p. So,

E[U_{ir}^* \,|\, Y_{moved}^* = 1] = E[X_i^w \beta + \epsilon_{iw} \,|\, Z_i' \gamma + \nu_i > 0] = X_i^w \beta + E[\epsilon_{iw} \,|\, \nu_i > -Z_i' \gamma] + \xi_{ir}^*    (14)

E[U_{ir}^* \,|\, Y_{moved}^* = 0] = E[X_i^w \beta + \epsilon_{iw} \,|\, Z_i' \gamma + \nu_i \le 0] = X_i^w \beta + E[\epsilon_{iw} \,|\, \nu_i \le -Z_i' \gamma] + \xi_{ir}^*    (15)

This bias term is estimated using the logit model results, E[\epsilon_{iw} \,|\, \nu_i \le -Z_i' \gamma] = g(Z_i' \gamma). The function g(.) is estimated using \hat{\gamma} and takes the form of the Coslett series of dummy variables: g(.) is constant within each of the M sections into which the value-ordered Z_i' \hat{\gamma} is divided to produce these dummies. The cumulative hazard functions become:

\Lambda_w(t) \,|\, (Y_{moved}^* = 1) = \log \int_0^{t_w} \lambda_0^w(s) \, ds = X_i^w \beta_w + moved_i \theta_w + \sum_{m=1}^{M} b_{1mw} D_{i1mw} + \xi_{iw}^*

\Lambda_w(t) \,|\, (Y_{moved}^* = 0) = \log \int_0^{t_w} \lambda_0^w(s) \, ds = X_i^w \beta_w + moved_i \theta_w + \sum_{m=1}^{M} b_{0mw} D_{i0mw} + \xi_{iw}^*    (16)

These cumulative hazards in (16) represent the underlying latent utilities which define the discrete choice and the multinomial logit setup in the competing risks model, differentiating between those who moved and those who did not. The new errors \xi_{iw}^* are assumed to maintain the extreme value distribution and the i.i.d. structure. Moreover, the introduction of the correction terms in the original structural equations restores the zero conditional mean assumption, necessary for the linear part of the PH hazard model containing the covariates to truly behave as a linear regression. Six different sets of coefficients correspond to the Coslett dummies in this specification, representing 'movers' and 'non-movers' in each of the equations for the underlying utility of the three outcomes. All parameters corresponding to the Coslett dummies for the base outcome (remaining jobless) are set to zero.

In order to ascertain the effect of moved on the conditional hazards, the identification of the intercept is essential. In the Coslett setup the intercept is given by the dummy capturing the estimated probabilities of moving closest to 1 for the individuals who moved after displacement, hence the one representing the last of the M value-ordered sections. Following the same reasoning, for individuals who did not move the intercept is the dummy corresponding to the smallest estimated probabilities of moving (Hussinger, 2008).

For the sake of identification of the parameters (so that it does not rely only on nonlinearities), the X_i and Z_i vectors of regressors will not contain exactly the same variables; otherwise multicollinearity issues would show up. It is also true that the problem is less severe the greater the variation in Z, that is, the better the logit model can discriminate between those who did and did not move (Cameron & Trivedi, 2005). But this is quite likely not the case in the logit model for moved, which is a relatively rare event in the sample.

Finally, bootstrapped standard errors will be reported (250 replications). These are required because estimation is conducted in two steps, so that the second step relies on previously estimated components (the Coslett dummies) that already carry estimation variability of their own. Non-bootstrap standard errors could therefore under- or overestimate the real variability in the model.
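A pairs bootstrap over individuals that repeats both estimation steps in every replication could look roughly as follows. This is a sketch: `fit_first_step` and `fit_second_step` are placeholder names standing in for the two estimation stages described above, and the toy check uses a trivial "second step" rather than the actual MNL.

```python
import numpy as np

def bootstrap_se(data, fit_first_step, fit_second_step, B=250, seed=42):
    """Pairs bootstrap at the individual level: resample spells with
    replacement, redo BOTH steps (logit -> Coslett dummies -> MNL) in
    every replication, and take the standard deviation of the second-step
    estimates across the B replications."""
    rng = np.random.default_rng(seed)
    n = len(data)
    draws = []
    for _ in range(B):
        idx = rng.integers(0, n, size=n)                # resample individuals
        sample = [data[i] for i in idx]
        dummies = fit_first_step(sample)                # step 1 (placeholder)
        draws.append(fit_second_step(sample, dummies))  # step 2 (placeholder)
    return np.asarray(draws).std(axis=0, ddof=1)

# toy check: if the "second step" just returns the sample mean, the
# bootstrap SE should be close to sigma / sqrt(n) = 1 / 20 = 0.05
data = list(np.random.default_rng(1).normal(0.0, 1.0, size=400))
se = bootstrap_se(data,
                  fit_first_step=lambda s: None,
                  fit_second_step=lambda s, d: [np.mean(s)])
```

The key point reflected in the loop is that the first step is re-estimated inside every replication, so the variability of the Coslett dummies propagates into the reported standard errors.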

4.2 Second-step likelihood function and estimation

The (t_i, w_i, c_i, x_i) data setup is used to derive the likelihood function employed in the second-step estimation phase. The risk set in interval [a_{t-1}, a_t) is denoted by R_t = {i : t ≤ t_i}, and contains all individuals whose joblessness spells have not yet ended in failure or censoring at t - 1, and may end in the present or subsequent intervals, so that they are 'at risk'. Three indicator variables are defined: first the y_{itw} indicators for the two exits w = 1, 2, and then y_{it0} for survivors.

\[
y_{itw} = \begin{cases} 1, & \text{if an event of type } w \text{ occurs in the interval } [a_{t-1}, a_t) \\ 0, & \text{if no event of type } w \text{ occurs in the interval } [a_{t-1}, a_t) \end{cases} \tag{17}
\]
\[
y_{it0} = \begin{cases} 0, & \text{if an event of type } w \text{ occurs in the interval } [a_{t-1}, a_t) \\ 1, & \text{if no event of type } w \text{ occurs in the interval } [a_{t-1}, a_t) \end{cases} \tag{18}
\]
These indicators together form the response vector in the multinomial logit setup specified to deal with the discrete choice (conditional upon survival until a certain time). The contribution of the ith observation to the likelihood function takes the form:

\[
L_i = \prod_{t=1}^{t_i} \Bigg( \prod_{w=1}^{2} \lambda_w(t\,|\,X_i)^{y_{itw}} \Bigg) \Bigg( 1 - \sum_{w=1}^{2} \lambda_w(t\,|\,X_i) \Bigg)^{y_{it0}} \tag{19}
\]

From this, the joint log-likelihood, which will subsequently be maximized, is obtained:
\[
\mathcal{L} = \sum_{i=1}^{n} \sum_{t=1}^{t_i} \Bigg( \sum_{w=1}^{2} y_{itw} \log \lambda_w(t\,|\,x_i) + y_{it0} \log\Big(1 - \sum_{w=1}^{2} \lambda_w(t\,|\,x_i)\Big) \Bigg) = \sum_{t=1}^{q} \sum_{i \in R_t} \Bigg( \sum_{w=1}^{2} y_{itw} \log \lambda_w(t\,|\,x_i) + y_{it0} \log\Big(1 - \sum_{w=1}^{2} \lambda_w(t\,|\,x_i)\Big) \Bigg) \tag{20}
\]
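The equivalence exploited here, namely that each person-month contributes to (20) exactly like an observation in a three-outcome multinomial model, can be verified numerically. The hazard values below are arbitrary toy numbers, not estimates from the thesis.

```python
import numpy as np

# conditional hazards for exits w = 1, 2 in three person-month records,
# together with the realised outcome y (0 = survive the interval)
lam = np.array([[0.10, 0.05],
                [0.20, 0.10],
                [0.15, 0.25]])
y = np.array([0, 1, 2])

# direct evaluation per record: log-hazard of the exit taken, or log of
# the survival probability 1 - sum_w lambda_w when no exit occurs
surv = 1.0 - lam.sum(axis=1)
ll_direct = sum(np.log(lam[i, y[i] - 1]) if y[i] > 0 else np.log(surv[i])
                for i in range(len(y)))

# multinomial form: one probability vector (survive, exit 1, exit 2)
# per record, evaluated at the observed outcome
probs = np.column_stack([surv, lam])
ll_mnl = np.log(probs[np.arange(len(y)), y]).sum()
```

Both evaluations coincide, which is why standard multinomial logit software can be used on the expanded person-month dataset.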

This function has the same form as that of the multinomial logit model for discrete choice, which can be applied to the survival dataset after certain rearrangements (Allison, 1982). The main steps of the second-step estimation procedure and the preceding data treatment are described below:

• An unbalanced data panel is created in individual-month form, so that there is one observation for each person and each monthly interval of the form [a_{t-1}, a_t) he/she has reached (until failure through either of the two exits considered). After this expansion, the dataset for the multinomial logit counts 26,033 individual-month observations corresponding to the 4,266 individuals, representing every occasion on which someone faces the three-outcome decision.


• By grouping the binary indicators y_{it0} and y_{itw} described above, the dependent variable "exitmnl" is built. It takes value "0" for censored observations, "1" to indicate exit to full-time re-employment and "2" to indicate exit to part-time re-employment. This variable is created in the person-month layout. This means that for an individual who never left the joblessness state, all his/her observations throughout the whole period considered are censored and set equal to 0. Alternatively, for those individuals who experienced an exit from the joblessness state, all observations will be censored except the last one observed, which will take either value 1 or 2 depending on the type of exit.

• Estimate the MNL model using the "0" observations of the dependent variable as the reference category, against which all parameters estimated for exits 1 and 2 will be interpreted. Finally, from the estimated coefficients, average marginal effects will be computed to quantify the effects of the variables considered on the probabilities of reaching re-employment through either of the exits, or not at all, which is the final aim of the paper (especially the effect of the endogenous variable moved).
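The data rearrangement described in the bullets above can be sketched with pandas. This is a minimal illustration on toy data: the column names `id`, `duration` and `exit_type` are assumptions for the example, not variable names from the CPS extract.

```python
import pandas as pd

def expand_to_person_month(spells):
    """Expand one-row-per-spell data to person-month form: exitmnl is 0 in
    every month except the last month of an uncensored spell, where it
    records the exit type (1 = full-time, 2 = part-time)."""
    rows = []
    for _, r in spells.iterrows():
        for t in range(1, int(r["duration"]) + 1):
            is_exit_month = (t == r["duration"]) and r["exit_type"] in (1, 2)
            rows.append({"id": r["id"], "month": t,
                         "exitmnl": int(r["exit_type"]) if is_exit_month else 0})
    return pd.DataFrame(rows)

# three toy spells: exit to FT after 3 months, censored after 2 months,
# exit to PT after 4 months (exit_type 0 = censored)
spells = pd.DataFrame({"id": [1, 2, 3],
                       "duration": [3, 2, 4],
                       "exit_type": [1, 0, 2]})
pm = expand_to_person_month(spells)
```

A multinomial logit with exitmnl as the response can then be fitted on the expanded dataset, for instance with statsmodels' MNLogit, treating outcome 0 as the reference category.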

The two-step ML procedure provides an important tractability advantage compared to gathering the whole estimation procedure in the optimization of a single joint likelihood function. This ML procedure still provides consistent estimates for the coefficients (under strict assumptions), while the bootstrap is needed to obtain correct standard errors in the second step.


5 Empirical results

The first set of results presented in this section aims to provide evidence for the suggested existence of self-selection bias caused by the inclusion of the moved indicator as a regressor, due to unobserved factors affecting joblessness duration and exit which may at the same time trigger the decision to move after displacement.

Several separate regressions are performed for this purpose. The regressors included in the two-step procedure (those with statistically significant effects on exiting unemployment through either FT or PT re-employment, as reported later in Section 5.2), for example several demographic, education and civil status indicators, are first regressed on the moved binary indicator. The F-tests reported in Tables 8-19 in the Annex confirm the existence of statistically significant differences in the values these variables take for those who moved and those who did not. These results show that the observed factors act as determinants of both the competing hazard rates and the decision to move. It is therefore very probable that unobserved factors drive both decisions as well, and because these unobservables would be part of the error term, estimation results would quite likely be biased if the variable moved were included in the conditional hazard specifications without correction. The complete results for the aforementioned regressions can be found in Tables 8-19 in the Annex.

5.1 First-step Logit model

This second set of results, displayed in Table 3, contains the estimated average marginal effects resulting from the first-step logit model, which aims to identify which covariate values, and hence which characteristics of the individuals, have a greater incidence on the decision to move to a new house after being displaced.13 Due to the nature of the decision examined (a rare event and a drastic change in lifestyle), the proportion of individuals who did move corresponds to a low fraction of the total sample, around 11%. Therefore, the logit predicted probabilities are expected to be low. These first-step estimation results will be used to compute the Coslett dummies, which will subsequently be used as covariates in the second step of the estimation procedure, together with the endogenous indicator moved.

Most of the covariates are categorical variables which have been decomposed into binary indicators. Their coefficients are interpreted against the reference category, which has been excluded from the specification to avoid multicollinearity problems.14 Many of the covariates are not individually significant, but the overall joint significance Wald test, with p-value ≈ 0, supports the relevance of the covariates included.

The effect of the timing of the displacement is significant at 1%. The earlier it happened within the displacement period analysed (2007-2009), the higher the chance of having decided to move. Using 2009 as the reference category, the probabilities of moving of those who were displaced in 2007 and 2008 are 0.09 and 0.06 higher, respectively. Another factor significant at 1% is the expectation of being recalled by the employer of the lost job, which decreases the probability of moving by 0.165. On the contrary, there are no significant differences among the three reasons for displacement (position abolishment, insufficient work and plant closure) or the five different occupation sectors of the lost job recorded in the dataset, in terms of their effect on triggering the moving decision.

13 The corresponding estimates for the Logit coefficients are included in Table 20 in the Annex.
14 The reference individual is a white male from New England, with high school education, who worked in the manufacturing sector and was displaced in 2009 due to insufficient work.
