Requirements thesis MSc in Econometrics.

1. The thesis should have the nature of a scientific paper. Consequently the thesis is divided into a number of sections and contains references. An outline can be something like the following (this is an example for an empirical thesis; for a theoretical thesis, have a look at a relevant paper from the literature):

(a) Front page (requirements see below)

(b) Statement of originality (compulsory, separate page)

(c) Introduction

(d) Theoretical background

(e) Model

(f) Data

(g) Empirical Analysis

(h) Conclusions

(i) References (compulsory)

If preferred you can change the number and order of the sections (but the order you use should be logical) and the headings of the sections. You have a free choice in how to list your references, but be consistent. References in the text should contain the names of the authors and the year of publication, e.g. Heckman and McFadden (2013). In the case of three or more authors: list all names and the year of publication for the first reference, and use the first name, et al. and the year of publication for subsequent references. Provide page numbers.

2. As a guideline, the thesis usually contains 25-40 pages using a normal page format. All that actually matters is that your supervisor agrees with your thesis.

3. The front page should contain:

(a) The logo of the UvA, a reference to the Amsterdam School of Economics and the Faculty as in the heading of this document. This combination is provided on Blackboard (in MSc Econometrics Theses & Presentations).

(b) The title of the thesis

(c) Your name and student number

(d) Date of submission final version

(e) MSc in Econometrics

(f) Your track of the MSc in Econometrics

UNIVERSITY OF AMSTERDAM

Master Thesis Econometrics

Relations between mental health, income

and the labor market position in the

Netherlands: A dynamic panel data analysis

Thesis presented for the degree of

Master of Science in Econometrics

Author:

Sharon Keizer

Student number:

10359540

Supervisor:

Dr. J.F. Kiviet

Second reader:

Dr. J.C.M. van Ophem

29 January 2017

Abstract

This study investigates the effects of perceived job insecurity, income and other labor market variables on the mental health status of Dutch individuals. A biennial panel dataset from CentERdata has been used, containing microeconomic data in which a large number of individuals are observed for a few time periods. The full sample and the employee subset are estimated with dynamic panel data models that include the lagged dependent variable, mental health, as an additional explanatory variable. It is investigated whether these models appropriately take into account the dynamics in the relations between regressors, since no previous studies within this field have done so. Furthermore, possible valid and strong internal instruments are examined. Apart from the work category variables, health-related explanatory variables are studied as well. The models are estimated by the Arellano-Bond GMM estimator and compared to the estimates of an adapted version of Blundell-Bond system GMM. After classifying the regressors and employing a large number of dynamic panel data misspecification tests, some significant factors influencing mental health are found. A decrease in mental health can be explained by high perceived job insecurity (full sample), high mental effort (employee subset), low income (employee subset), low life satisfaction (both) and several other factors. In addition, the estimation is also performed for men and women separately to study the differences between genders.

This document is written by Sharon Keizer who declares to take full responsibility for the contents of this document. I declare that the text and the work presented in this document is original and that no sources other than those mentioned in the text and its references have been used in creating it. The Faculty of Economics and Business is responsible solely for the supervision of completion of the work, not for the contents.

Contents

Abstract
Statement of Originality
1 Introduction
2 Literature Review
  2.1 Work factors
  2.2 Income
  2.3 Health factors
  2.4 Implications for the model
3 Data
  3.1 Construction of the panel dataset
  3.2 Dependent variable: Mental health
  3.3 Explanatory variables
  3.4 Descriptive Statistics
4 Estimation Strategy
  4.1 Estimation method
  4.2 Specification tests
    4.2.1 Autocorrelation and heteroskedasticity in the errors
    4.2.2 Classification
    4.2.3 Too many instruments
    4.2.4 Blundell-Bond estimation
  4.3 Econometric models
5 Results and Analysis
  5.1 Results complete dataset
  5.2 Results employee dataset
  5.3 Results men vs. women
6 Summary and conclusions
References

1 Introduction

Promoting mental health and well-being has been part of the Sustainable Development Agenda of the United Nations General Assembly since September 2015. The goal of this agenda is to transform our world by 2030 (World Health Organization, 2016b). This recognition shows the importance of the matter on a global scale. Together with organizations such as the World Health Organization and the World Bank, the United Nations aims to raise attention to and awareness of mental health issues. These issues impose a huge burden on societies across the world, yet they are still subject to stigma and bias. The World Health Organization (WHO) defines mental health as “a state of well-being in which every individual realizes his or her own potential, can cope with the normal stresses of life, can work productively and fruitfully, and is able to make a contribution to her or his community.” It is an important indicator of the health status of a population and can be measured through psychiatric diagnoses, such as major depression, anxiety disorders and schizophrenia. The Mental Health Inventory (MHI-5), one of the subscales of the Short Form-36 (SF-36), is a widely used instrument to measure quality of life and is used in this paper to measure the mental health of the Dutch population (Hoeymans et al., 2004).

Studies estimate that approximately 10% of the world’s population is affected by mental illness and that 20% of children and adolescents experience some kind of mental disorder (Mnookin, 2016). In addition, depression alone affects 350 million people and is the number one cause of disability in the world (World Health Organization, 2016a). In the Netherlands approximately 20% of adults will suffer from depression at some point in their lives (Trimbos Instituut, 2016a). From the statistics of the Public Health and Health Care department of the Dutch Government, shown in figure 1, it is clear that the occurrence of anxiety and mood disorders has increased in the past decades (Volksgezondheid en zorg, 2011). The incidences of both anxiety and mood disorders, such as depression, bipolar disorder and panic disorder, almost doubled in the time span presented (1992-2010). Overall, the graphs in figure 1 show that more women (red line) suffer from mental disorders than men (blue line), and that the incidence of anxiety disorders (scale 0 - 800) is higher than that of mood disorders (scale 0 - 300).

Figure 1: Trends in the incidence of disorders among the Dutch population in the period of 1992-2010

Many studies in this field have demonstrated that unemployment and job insecurity are harmful to both mental and physical health, in themselves and because they lead in most cases to a loss of income. This study analyzes not only the relations between mental health, income, job insecurity, unemployment and related factors, but also the relation between mental health and several indicators of physical health. Another goal is to investigate the mental health of working men and women separately, in order to gain more insight into the vulnerability of one gender in comparison to the other with respect to detrimental effects on mental health. Men and women may have different ways of dealing with tough work conditions and/or an unhealthy lifestyle, which could affect their mental health status differently. The datasets that are used are from the LISS (Longitudinal Internet Studies for the Social sciences) panel administered by CentERdata (Tilburg University, The Netherlands). Three core panels, Health, Work & Education and Personality, are combined with the available background variables to obtain the dataset that is needed for panel data estimation techniques and to answer the research questions. The yearly panels have been collected from 2008 to 2015. Since the data of 2014 are not available in the Health panel, only 2009, 2011, 2013 and 2015 will be used to create a balanced biennial panel.

The NEMESIS-2 study of the Trimbos Institute, a longitudinal study of mental disorders among the Dutch population, concludes that people who have experienced the unfavorable consequences of the economic crisis in 2007, such as losing their jobs or a considerable loss in family income, have an increased chance of developing mental problems (Trimbos Instituut, 2016b). These relations are re-investigated in this study using core studies of the LISS panel of the Dutch population from 2009 to 2015. The main objective of this paper is to find the determinants of mental health of the Dutch population and to determine the dynamics of the relationships between the dependent variable and job insecurity and other explanatory variables in both the work and health categories. This is important because mental health issues are increasingly becoming a global problem, and knowing which factors may be (partially) responsible for anxious, depressed and unhappy individuals can help in preventing and treating these issues. The contribution of this paper to the existing literature is the dynamic model specification in which the relationship is studied. This is a major departure from previous studies, which all use static models for inference about job insecurity and mental health problems.

Before estimation of the coefficients can take place, the first necessary step is to divide the explanatory variables into three subsets of exogenous, predetermined or endogenous regressors with respect to the error terms. Some of the causal factors of mental health will plausibly be affected by immediate or lagged feedback from mental health, so the generalized method of moments (GMM) is used for parameter estimation. The results of both the Arellano-Bond (difference) and Blundell-Bond (system) GMM estimators will be investigated and compared. Finding the optimal model specification and adequately valid and strong instrumental variables involves tests for overidentifying restrictions, autocorrelation and heteroskedasticity. The results of these tests are used to construct an adequately specified model, which is needed for analyzing the data and for obtaining consistent and reliable results. The number and validity of the instruments will also be thoroughly examined. Too many instruments result in unreliable autocorrelation and overidentification tests, which is a serious problem since these tests are key indicators of whether the model is adequately specified. After the model estimation, the results are interpreted and compared.
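The three-way classification of regressors can be stated in terms of moment conditions. The following is a sketch in standard dynamic panel notation rather than a formula taken from this thesis: $x_{it}$ denotes a single regressor for individual $i$ in period $t$ and $\varepsilon_{it}$ the idiosyncratic error term, both symbols being assumptions introduced here for illustration.

```latex
\begin{align*}
\text{strictly exogenous:} \quad & E[x_{is}\,\varepsilon_{it}] = 0 \quad \text{for all } s, t, \\
\text{predetermined:}      \quad & E[x_{is}\,\varepsilon_{it}] = 0 \quad \text{for } s \le t, \\
\text{endogenous:}         \quad & E[x_{is}\,\varepsilon_{it}] = 0 \quad \text{for } s < t.
\end{align*}
```

Under these definitions, in an equation in first differences with error $\Delta\varepsilon_{it} = \varepsilon_{it} - \varepsilon_{i,t-1}$, the levels $x_{is}$ with $s \le t-1$ qualify as instruments for a predetermined regressor, whereas only those with $s \le t-2$ qualify for an endogenous one. This is why the classification has to precede estimation: it determines which lags may enter the instrument set.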

The remainder of this paper is organized as follows. The next section describes the previous literature on work factors (including job insecurity), income and health related factors, including the relations with mental health. It concludes with the implications these studies provide for the econometric models used in this paper. Section 3 elaborates on the data used and discusses the dependent and explanatory variables. Furthermore, relevant descriptive statistics are provided. Section 4 explains the estimation procedure, provides tests on endogeneity of the variables, instrument variables and autocorrelation in the error terms and presents the econometric models for the full sample and the employee subset. The results and an analysis are given in section 5. The final section concludes and provides a discussion of the assumptions made and possible limitations.

2 Literature Review

The relation between mental health and the factors that influence it has been studied thoroughly in previous papers. Associations between job insecurity, income and mental health have been established in the psychological and socio-economic literature. Health risk behaviors (smoking, drinking etc.) have also been found to have strong associations with mental disorders. This section discusses the results of previous research on these relations. Some studies are looked at in greater detail, because they also used a Dutch dataset including mental health variables or have very similar research questions. This makes it possible to compare the results of this study with the main findings of these similar studies. The last part of this section reviews the studies that handled endogenous regressors in their research. The implications of these results for the model of this study are addressed as well.

2.1 Work factors

The Dutch economy went into a recession after the financial crisis in 2008 that started with the fall of the American bank Lehman Brothers. As a result the unemployment rate in the Netherlands increased from 4.2% (355,000 individuals) in 2007 to 7.4% (660,000 individuals) in 2014. For the first time in years the unemployment rate declined in 2015, to 6.9% (CBS, 2016a). The Dutch economy is climbing back up: the size of the economy is equal to its size in 2008, exports are growing, and household incomes are increasing, as are consumption and investments (CBS, 2016b). Figure 2 shows the unemployment rates for men and women in the past 9 years. It is clear that the unemployment rates of women are higher than those of men in all the corresponding years. Apart from the unemployed, there are also many people who are uncertain whether their job will continue to exist. This job insecurity is one of the main focal points of this study.

Figure 2: Unemployment rate of men and women in the Netherlands from 2007 - 2015

Labour markets have changed a lot in the past decades: they have become more flexible and open. This increase in flexibility is mainly due to more employees with a temporary contract and more on-call workers. In addition, the number of self-employed individuals also grew in this period. Employees with a flexible labour market position are more likely to become unemployed than workers with a fixed contract (Gaalen et al., 2013). At the same time, organizations are faced with the pressure of making their business more effective with fewer resources. Competition has become harder and the economy more unpredictable, so downsizing, mergers, acquisitions etc. have become more common. This results in increased feelings of insecurity among the work force (Sverke, Hellgren and Näswall, 2002). Several arrangements have increased the flexibility with which firms can set wages and utilize labour (Gerritse and Høj, 2013). These developments imply a deterioration of the job security of employees. Such a loss in security leads to a negative change in the well-being of the affected workers. In addition, it can cause depressive feelings and therefore result in worse mental health. Hence, the relation between job insecurity and mental health is expected to be negative.

Different studies often use different definitions of job insecurity and do not measure it in the same way. In the past it has been done in various ways, but a common element is the definition of "perception of a potential threat to the continuity of the current job". This perceived job insecurity is measured in the study of Heaney et al. (1994) by a statement concerning the continuity of someone’s job. The respondents can answer with entirely disagree, disagree, agree and entirely agree. The perception of job insecurity is therefore a personal assessment, not the actual risk of losing employment (objective job insecurity). In another study investigating job insecurity and mental health, a percentage which represents the chance of losing one’s job in the coming 12 months is used as the measure of perceived job insecurity (Meer, Huizen and Plantenga, 2015). In the article of Reichert and Tauchmann (2011) a binary variable ‘fear of unemployment’ is constructed, where 1 means that the individual is somewhat or very concerned about their job security and 0 indicates no concern.

The main problem with job insecurity as an explanatory variable is that the relationship with mental health is simultaneous and that selection effects occur. This results in endogeneity and selection problems in the estimation process. Instead of higher job insecurity causing a decrease in mental health status, it can also be the other way around. Workers with mental health issues may select into more insecure job positions, or their perception of security may be far worse than the reality. In addition, selection effects imply that employees with poorer mental health are more likely to lose their jobs, and that finding a secure job is harder for unemployed individuals. Hellgren and Sverke (2003) deal with these issues in their paper. They investigate the direction of the relationship between job insecurity and health complaints using longitudinal data and control for prior levels of mental and physical health complaints. Their measure of job insecurity is somewhat similar to the one used in this paper. They used three statements concerning the way people feel about the chance of losing their jobs. The General Health Questionnaire (GHQ-12) is the screening test used for measurement of mental health. Physical health is indicated by a 10-item inventory reflecting experiences of several psychosomatic symptoms (related to the mind and the body) in the past year. Their findings provide empirical support for the theory that perceived job insecurity leads to mental health problems. The cross-lagged effect of job insecurity on mental health complaints was significant, whereas the reverse effects were not. However, a cross-lagged effect on physical health was not found. Furthermore, they conclude that prior levels of health are necessary to explain the relations between job insecurity and mental and physical health.

Many other researchers found the same effect. Green (2011) used nationally representative panel data from Australia to study the effects of unemployment and job insecurity on life satisfaction and mental health. His results indicate that all the hypotheses about the impact on well-being are supported: the average impact of unemployment on well-being is negative, the risk of job loss is directly connected to the loss of life satisfaction and mental health, and household income and well-being have a weak positive association. He controlled for unobserved characteristics and used random as well as fixed-effects approaches to obtain his results. All approaches give significant results for the effect of the probability of job loss on subjective mental health for males, but not for females. The result of the fixed effects method is only significant at the 10% level. The relation is negative for both genders, but for males the decrease in mental health is 3.9 points in comparison to 1.3 points for females, using the fixed effects method.

Reichert and Tauchmann (2011) analyze the causal effect of job insecurity on psychological health based on individual-level data. They used a panel with data on job insecurity and mental health of private sector workers in Germany and extended the sample to employees who have insecure jobs. This is done to capture the mental health consequences of potential job loss. The possibility of endogeneity is taken into account by instrumenting job insecurity with staff reductions in the company in the last 12 months. They found that an increase in fear of unemployment substantially decreases the mental health status of employees. The results of the fixed effects and IV approaches show that a shift from ‘not concerned’ to ‘somewhat or very concerned’ about job insecurity decreased mental health by 1.4 and 5.5 points respectively, both significant at the 1% level. The last approach estimates the two-stage least squares system with fixed effects (IV/FE) and does not give a significant result for the effect of fear of unemployment on mental health.

Van der Meer and Van Huizen (2015) use the same Dutch LISS panel dataset for the period 2008 - 2013 and therefore have the same dependent variable (MHI-5) and many comparable explanatory variables. Instead of a job insecurity scale, a percentage indicating the perceived chance of job loss is used. Moreover, they pay attention to the economic significance of the effect size and to different effects for men and women. The instrumental variable that is used for job insecurity is the number of new recipients of unemployment benefits in the month in which perceived job insecurity was measured. Three different estimation procedures are carried out for the total sample and for men and women separately: pooled ordinary least squares, fixed effects and IV with fixed effects. The F-statistic of 6.56 indicates that this instrument is rather weak. The results of their estimations show that the negative effect of perceived job insecurity on mental health is only significant for men using the fixed effects model. The IV approach does not yield any effects for either gender and leads to non-significant coefficients for most of the explanatory variables. The only exception is tenure, which has a negative influence on mental health for the whole sample and for women separately. Log net income also has an effect on the mental health of women: a positive effect of 4.4 points, significant at the 5% level. A limitation of this paper is that the authors failed to identify a causal effect of job insecurity on mental health. In addition, the instrument that is used is not very strong, so the IV estimation did not lead to many significant results. The current paper will extend the methods of Van der Meer and Van Huizen (2015) by exploring the possibilities of other instruments, using GMM estimation techniques and investigating the potential dynamic relation between variables.

Another Dutch study is by Collewet and de Koning (2011). They elaborate on the relationship between labour participation and the general health of individuals in the Netherlands in the period 2000 - 2008. To investigate this effect they use many work-related variables: bad working conditions (physically hard work/mentally hard work), income, job stress, control over work, social support, job satisfaction, self-employment, job insecurity, tenure, part-time work, irregular work hours and travel time. They motivate the relevance of every variable in relation to health and use panel data techniques for estimation with a fixed effects model. Having a job does not yield a significant effect on health for men; for women, however, it is significant at the 10% level. The number of hours worked does not have a significant influence on health for either men or women. Being self-employed has a significant negative effect on the general health of men, but no effect on women’s health. Furthermore, when a lot of mental effort is needed for the job, it has an evident negative effect on the health of both genders. The measurement of job satisfaction is difficult, because it correlates strongly with job participation. The results suffer from multicollinearity and should be treated carefully. Most of these variables are used here as well to check whether their influence on mental health is significant or not.

Apart from the variables mentioned before, there is another important factor influencing mental health: job stress. Mental effort was already confirmed by Collewet and De Koning (2011) to have an effect on health status. Cooper et al. (1989) write about mental health, job satisfaction and job stress among general practitioners in England. The variable job stress is broken down into six different factors and almost all of them were predictive of high levels of job dissatisfaction and lack of mental well-being.

Another study handling job stress is by Godin et al. (2005). The study tests associations between psycho-social stress at work and multiple indicators of poor mental health. An index of cumulative job stress was constructed and four categories are defined. They find an association with job stress for all five mental health indicators, in both men and women. Recent stress has the strongest influence on the mental health of men, whereas cumulative stress has the highest effect on the mental health of women.

2.2 Income

Closely related to job insecurity and job loss is the decline of household income. In most cases a reduction of personal or household income leads to a decrease in mental health. The article of Barbaglia et al. (2015) discussed the consequences of negative socioeconomic changes on the population’s mental health; this includes job loss and household income reductions. Only individuals with a paid job and without a mental disorder at baseline are considered. Furthermore, a dummy is created for a substantially reduced income (their income or their partner’s). They found that household income reductions increased the risk of any mental disorder, but particularly the risk of mood disorders (depression, bipolar disorder). Job loss leads to about the same results. However, it is a more important risk factor for men than for women, whereas household income reductions have a bigger effect on women.

Sturm and Gresenz (2002) take a different approach; they analyze the relation between geographical inequalities in income and common chronic medical conditions and mental health disorders. In addition, they investigate whether inequality in incomes is a stronger determinant of health than the income of individuals or households. To re-examine the income inequality hypothesis - a hypothesis stating that disparities in income among members of a community affect their health - income is divided into five ordered categories. Their findings indicate that there was a strong association between health and family income. However, no relation was found between chronic medical problems or mental health disorders and income inequality. Both studies provide evidence for the hypothesis that low or reduced household income is a major risk factor for the presence of mental health disorders.

2.3 Health factors

In the past years it has been demonstrated that work factors are important determinants of workers’ mental health. Nevertheless, much less attention has been given to other aspects that could influence individuals’ mental health. Apart from work and personal variables, health variables will also be considered as determinants of mental health problems in this study. The aim is to make the set of variables that influence the mental health of the population as complete as possible.

The studies of Vermeulen-Smit (2015) and Beauregard et al. (2011) both review several health-related non-work variables. The first one uses Dutch data from the NEMESIS-2 dataset and claims that all four identified health behaviour clusters, including different levels of smoking, drinking, active lifestyle and diet, are strongly associated with mental disorders. Individuals in all three unhealthy clusters had double the risk of major depression, and these clusters were strongly linked to drug dependence, alcohol abuse and social phobia. Smoking, alcohol use, physical activity and nutrition were used as health indicators. Due to the cross-sectional nature of the data, the paper states that no conclusions about the causality of the relationship between health risk behaviour clusters and mental health can be drawn.

The second article gives a systematic review of longitudinal studies and tries to bridge the gap in knowledge about non-work determinants predicting workers’ mental health status. The non-work domain ranges from chronic stressors and life events to health-related lifestyles and symptoms (Beauregard, Marchand and Blanc, 2011). Thirteen studies were selected for evaluation of the explanatory variables. The following variables were used in these studies: history of affective disorders, levels of psychological distress, alcohol-related problems, self-esteem, sense of cohesion, chronic health problems, smoking, physical activity, stressful childhood events, mastery, body mass index, hypertension, heart disease and back pain. Marchand et al. (2005) is one of the reviewed articles and shows that there exists a strong and significant relation between personal characteristics and psychological distress. Personal characteristics such as gender, age, physical health, locus of control, sense of coherence and stressful childhood life events all had a significant influence. As for lifestyle habits, alcohol consumption has no impact, while smoking increases the risk of distress and physical activity has only a small impact. It is clear that the health-related variables should not be overlooked; therefore some of these health and personality variables will be included in this paper.

2.4 Implications for the model

The previous subsections discussed various studies with similar research goals, but none of them incorporated dynamic elements into their model specification. Only some of their findings give insights into the issue of endogenous regressors. Van der Meer and Van Huizen (2015) also use data from the LISS panel. However, they only assume that job insecurity is endogenous and expect the other explanatory variables to be strictly exogenous; no tests were performed to check these assumptions. A static linear panel data model is used for estimation, so their model is not a very helpful example for this study. Although Reichert and Tauchmann (2011) do include lags of explanatory variables (employees’ employability, job satisfaction and overtime) to reduce the risk of an invalid instrumental variable, they do not test these assumptions directly. Instead they check whether staff reduction operates through other channels than job insecurity and conclude that it is a valid instrument. This procedure gives the desired results, but is not a reliable method to determine whether the instruments are strong and valid. Additionally, personal gross labor income and an unemployment indicator are also added in terms of one-period lags to avoid potential bias resulting from reverse causality.

Another important aspect of this subject is discussed by Hellgren and Sverke (2003), as mentioned earlier in subsection 2.1. Mental health can influence one’s job insecurity and the other way around. In summary, they deal with this issue in their article and conclude that the influence of job insecurity on mental health was significant, whereas the cross-lagged effect of mental and physical health on job insecurity was not. Their results imply that prior levels of (mental) health should be taken into account when explaining the effect. Including previous levels of the mental health variable transforms the model into a dynamic panel data model. Since only three time periods remain after taking differences, only one lag of mental health can be added to the model as an explanatory variable. Apart from exploring the dynamic relation between mental health and explanatory variables, this paper will contribute to the existing knowledge of this subject by investigating possible valid and strong internal instrumental variables and by applying dynamic panel data estimation techniques to the data, such as the methods of Arellano and Bond (1991) and Blundell and Bond (1998).
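To make the step from a static to a dynamic specification concrete, the following sketch writes out a generic dynamic panel data model with one lag of the dependent variable, its first-differenced form, and the moment conditions exploited by the Arellano-Bond estimator. The notation is an assumption introduced here, not the thesis's own: $y_{it}$ is the mental health score, $x_{it}$ a vector of explanatory variables, $\eta_i$ the individual effect and $\varepsilon_{it}$ an idiosyncratic error that is assumed to be serially uncorrelated.

```latex
\begin{align*}
y_{it} &= \gamma\, y_{i,t-1} + x_{it}'\beta + \eta_i + \varepsilon_{it}, \\
\Delta y_{it} &= \gamma\, \Delta y_{i,t-1} + \Delta x_{it}'\beta + \Delta\varepsilon_{it}, \\
E[\,y_{i,t-s}\,\Delta\varepsilon_{it}\,] &= 0 \quad \text{for } s \ge 2.
\end{align*}
```

Differencing removes $\eta_i$, but $\Delta y_{i,t-1}$ is correlated with $\Delta\varepsilon_{it}$ through $\varepsilon_{i,t-1}$, which is why lagged levels of $y$ serve as instruments. If the four waves are labelled $t = 1, \dots, 4$, differences exist for $t = 2, 3, 4$, and the differenced equation with one lag of mental health can be formed for $t = 3$ and $t = 4$ only; a second lag would leave a single period.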

3 Data

The data used in this paper are from the LISS (Longitudinal Internet Studies for the Social Sciences) panel. This panel is administered by CentERdata (Tilburg University, The Netherlands) and consists of yearly internet surveys in several domains, such as health, work & education, income, personality, values and political views. It is a representative sample of the Dutch-speaking population and has information about approximately 4500 households. The panel is based on a true probability sample of households drawn from the population register. Households that could not otherwise participate are provided with a computer and Internet connection. A comparison between the LISS panel and the Dutch population shows that in most fields the differences are quite small, except for age groups. Too few elderly individuals participated in the early years of the panel, but in the later years the participation gap between the elderly and other groups has been decreasing. The panel also contains fewer members with a foreign background. This could be due to the fact that the panel does not include persons with insufficient Dutch language skills (CentERdata, 2016).


3.1 Construction of the panel dataset

The core panels are collected every year in the period 2008-2015, but the data of 2014 are not available in the Health Core Study. To create a balanced panel, only the years 2009, 2011, 2013 and 2015 of every core panel are used to put together the dataset. One of the advantages of panel data techniques is an increase in estimation precision. This is the result of an increase in the number of observations due to pooling several time periods of data for each individual in the dataset. Another major advantage of analyzing panel data is that it makes it possible to deal with unobserved heterogeneity that is correlated with the included regressors. From the LISS panel the following three core panels and the background variables are used to construct the dataset:

• Health panel (LISS Core Study: wave 3, wave 5, wave 7, wave 8)
• Work & Schooling (LISS Core Study: wave 2, wave 4, wave 6, wave 8)
• Personality (LISS Core Study: wave 2, wave 4, wave 6, wave 8)
• Background variables (Dec 2009, Dec 2011, Dec 2013, Aug 2015)

First, the waves of each category are prepared in Stata to create files to work with. All useful variables are renamed and the remaining redundant variables are dropped. This procedure is repeated for all the waves in the core studies, to keep the names and the number of variables exactly the same. The background variables files are selected for the same month in which the data for the Health Core Study were collected. Again the necessary variables are renamed and the other variables are deleted. Next, the waves of each category are merged into one dataset per year, matched on the ID number. This results in four separate files for the years 2009, 2011, 2013 and 2015. A year variable is generated so that every observation is identified by a year as well as an ID number. To create a panel dataset the four files are appended into one. Individuals who do not have a complete dataset or have missing data for one or more years are filtered out of the sample. Furthermore, all individuals under 16 are removed to ensure that the panel only contains the working-age population and pensioners. The variables income, job insecurity, job stress, mental effort, appreciation, tenure, travel minutes, BMI and life satisfaction contain missing information, hence these missing observations are also dropped. This leaves a dataset of 1848 individuals participating in all four biennial time periods. With T = 4 and N = 1848 this dataset belongs to the category of microeconomic "small T, large N" panels, which is what the estimators used in this paper require. Few time periods keep the number of instruments low, and a large number of observations makes the cluster-robust standard errors and the Arellano-Bond autocorrelation test reliable (Roodman, 2009a).
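The filtering steps described above were carried out in Stata; the following Python fragment is only a minimal sketch of the same logic, not the code used for this thesis. The record layout (dictionaries with the keys `id`, `year` and `age`) and the function name `balanced_panel` are hypothetical, and applying the age restriction in every wave is an assumption.

```python
# Hypothetical sketch: keep only respondents observed in all four waves,
# with complete data and an age of at least 16 in every wave (assumption).
WAVES = (2009, 2011, 2013, 2015)
MIN_AGE = 16


def balanced_panel(records, required_vars):
    """Return the rows of respondents that form a balanced, complete panel.

    records: iterable of dicts, one per respondent-year, with keys
             "id", "year", "age" and the variables in required_vars.
    """
    by_id = {}
    for row in records:
        by_id.setdefault(row["id"], {})[row["year"]] = row

    kept = []
    for waves in by_id.values():
        # Drop respondents who miss one or more of the four years.
        if any(year not in waves for year in WAVES):
            continue
        rows = [waves[year] for year in WAVES]
        # Drop respondents with a missing value in any required variable.
        complete = all(r.get(v) is not None for r in rows for v in required_vars)
        # Drop respondents younger than 16.
        adults = all(r.get("age") is not None and r["age"] >= MIN_AGE for r in rows)
        if complete and adults:
            kept.extend(rows)
    return kept
```

Applied to the merged yearly files, a filter of this kind leaves every remaining respondent with exactly T = 4 observations.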

3.2 Dependent variable: Mental health

The LISS Health Core Study contains several health related variables that can be used to measure both mental and physical health. For mental health the dependent variable that is used is the Mental Health Inventory (MHI-5). For the relation between mental and physical health, some physical health variables are included as explanatory variables. The measure of physical health is a little more difficult to determine. Two variables could be used to indicate someone’s physical health: Body Mass Index (BMI) or suffering from major diseases.

The MHI-5 is a well-known method for evaluating mental health issues such as anxiety disorders, depression, behavioral control, positive affect and general distress; it is a measurement of overall emotional functioning. The method is part of the Short Form Health Survey (SF-36) and is based on the single items that best reproduce the total score of the longer version (Berwick et al., 1991). The MHI-5 scale has scores between 0 and 100, where a score of 100 indicates optimal mental health and 0 means severe distress (Hoeymans et al., 2004). The resulting MHI-5 contains the following items, each introduced by the question "How much of the time, during the last month, have you ...":

1. "... felt very anxious?" (anxiety)
2. "... felt so down that nothing could cheer me up?" (behavioral control)
3. "... felt calm and peaceful?" (positive affect)
4. "... felt depressed and gloomy?" (depression)
5. "... felt happy?" (positive affect)

The possible answers are 1 = never, 2 = seldom, 3 = sometimes, 4 = often, 5 = mostly and 6 = continuously. The standard procedure for calculating the MHI-5 gives scores between 0 and 100. First the scores of the negative feelings are reversed, so that a higher score translates to a happier person: the answers 6 to 1 become scores 0 to 5. For the positive feelings the answers 1 to 6 are rescaled to scores 0 to 5. Summing the five item scores results in a minimum of 0 and a maximum of 25. To get a maximum of 100, the sum of the five items is multiplied by four (Ware et al., 1993).
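The scoring procedure can be illustrated with a short sketch. The item keys (`anxious`, `down`, `calm`, `depressed`, `happy`) and the function name are hypothetical labels chosen for this example; the arithmetic follows the description above.

```python
# Sketch of the MHI-5 scoring rule; item names are illustrative assumptions.
NEGATIVE_ITEMS = ("anxious", "down", "depressed")
POSITIVE_ITEMS = ("calm", "happy")


def mhi5_score(answers):
    """Compute the MHI-5 score (0-100) from answers coded 1 = never ... 6 = continuously."""
    total = 0
    for item in NEGATIVE_ITEMS + POSITIVE_ITEMS:
        answer = answers[item]
        if answer not in (1, 2, 3, 4, 5, 6):
            raise ValueError(f"answer for {item!r} must be an integer from 1 to 6")
        if item in NEGATIVE_ITEMS:
            total += 6 - answer   # reversed: 6..1 becomes 0..5
        else:
            total += answer - 1   # rescaled: 1..6 becomes 0..5
    return 4 * total              # item sum 0..25 is scaled to 0..100
```

For example, a respondent who answers "seldom" (2) to feeling anxious and depressed, "never" (1) to feeling down, "often" (4) to feeling calm and "mostly" (5) to feeling happy has item scores 4 + 5 + 4 + 3 + 4 = 20 and an MHI-5 score of 80.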


A disadvantage of the MHI-5 is that it is usually reported as a mean score of the total sample or over the total time period. Moreover, it has no internationally agreed-upon cut-off point, so the distinction between individuals with probable mental problems and those without is not completely clear (Hoeymans et al., 2004). In earlier studies, values between 50 and 78 have been suggested as cut-off points between below-average mental health and average/good mental health (Kelly et al., 2008).

Figure 3 displays the distribution of the Mental Health Inventory (MHI-5) scores of the individuals in the sample at the beginning of the research period (2009) and at the end (2015). The graphs show that most subjects have a score in the 70-90 range in 2009. In 2015, after 6 years, the respondents have aged and the economy is no longer in a state of recession; the unemployment rate has decreased and consumer confidence has gone up. This could explain why more people appear to have a higher MHI-5 score and are therefore happier, as the distribution is concentrated even more toward the high end of the scale. The factors that positively or negatively influence mental health are examined in this study. More information about the dependent variable can be found in the descriptive statistics subsection.

Figure 3: Fraction distribution of the MHI-5 scores of individuals in the dataset in 2009 and 2015


3.3 Explanatory variables

The background variables used in this study are the gender of the respondent, age, civil status, the size of the family, whether the respondent has children, origin group, degree of urbanization of the area of residence and educational level. For many people in the dataset most of these variables do not change much over the sample period (some of them are therefore omitted from the descriptive statistics in Table 1). The consequences of this are addressed in section 4. An overview of the dependent and explanatory variables is given in Table 8 in the appendix, including a description and the type of variable. A dummy for gender is created so that the estimation can be performed for men and women separately. The data of the opposite gender are temporarily dropped from the sample to investigate the different effects on mental health by gender. Furthermore, civil status is divided into three categories: married, divorced, and a reference category that contains respondents who never married or are widowed. The variable Dutch indicates whether the respondent is of Dutch or non-Dutch origin. Another dummy is created for having children.

The next category contains the work-related variables. These variables are only observed for respondents who have a job in the years of the research period. A dummy is created to indicate whether someone is employed or not and another for long-term unemployment. The reason for this is to exclude students who do not have a job but have not yet finished their studies, and people who enjoy an early retirement. A combination of the work-related variables and the job dummy is used to estimate these effects in the complete dataset. For individuals who do not have a job the value of these variables is zero. Furthermore, variables for job insecurity, income, high workload (job stress and mental effort) and appreciation are included. For income the natural log of personal net monthly income in euros is used. Dummies regarding job details, such as being a temporary employee, being employed in the public or private sector, working irregular hours (evenings, nights and weekends) and being self-employed, are added as well. Further explanatory variables that help explain job insecurity are the average number of hours actually worked per week, the commuting time needed to get from home to work, and the amount of time passed since entering employment in the current job, also known as tenure.

In more detail, job insecurity is measured on a scale from 1 = disagree entirely to 4 = agree entirely with respect to the statement "It is uncertain whether my job will continue to exist". Each category is transformed into a dummy variable, and it is investigated whether the disagreeing categories and the agreeing categories can each be merged. This is permitted when the coefficients are close to each other (the difference must not be significant). For the job insecurity variable this is the case for both the disagree and the agree categories, so these are combined. In the complete dataset the reference category is "missing" information, because approximately 43 % of the observations (see the job dummy in Table 1) belong to individuals without a job, who therefore cannot answer the questions about work. In the employee dataset one of the dummies is used as reference category. The same is done for appreciation, based on the statement "I get the appreciation I deserve for my work". However, the dummy categories of this variable cannot be merged, and these dummies are only included in the employee dataset. The reason is that the missing-information dummy is exactly the same for job insecurity and appreciation, which leads to rank deficiency in the regressor matrix (dummy variables are omitted as a result). Furthermore, respondents have indicated how frequently their job gets too busy or requires mental effort, with 1 = never, 2 = sometimes and 3 = often. Again dummies are created for these categories, and it is investigated whether the coefficient of the dummy for missing values and the coefficient of the dummy for low stress are similar in the complete dataset. Results show that this is indeed the case for both job stress and mental effort (with reference category high/average job stress or mental effort); therefore the respondents without a job are put into the low category (this is equal to never experiencing job stress or mental effort).

The health-related variables included are a dummy for being a heavy drinker, a dummy for smoking and a dummy for drug use in the past month. In addition, the number of major diseases (including angina, heart attack, high blood pressure, cholesterol, stroke, diabetes, lung disease, asthma, arthritis, cancer, ulcer, Parkinson, cataract, broken hip, fracture, Alzheimer and tumors) and the body mass index (BMI) are included as health indicators. The BMI is calculated by dividing someone's weight in kilograms by the square of their height in meters; it is used as a proxy for an (un)healthy diet and (in)sufficient physical exercise. Respondents with a BMI over 30 fall into the obese category. To ensure that people who gain weight by training muscles are not considered unhealthy, the boundary of 30 is used as the indicator of being unhealthy for the high BMI variable. Finally, the variable from the personality dataset that is included as a regressor is life satisfaction (scale of 1 to 10). A higher score means a more satisfied person, probably with a better mental health status.
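Two of the coding rules in this subsection can be sketched in a few lines. The fragment is illustrative only: the function names are hypothetical, and it assumes that the scale values 1 and 2 form the merged "disagree" (low insecurity) category and 3 and 4 the merged "agree" (high insecurity) category, with respondents without a job coded as missing.

```python
def job_insecurity_dummies(answer):
    """Return (high, low) dummies for the job insecurity item.

    answer: 1..4 (1 = disagree entirely, 4 = agree entirely), or None for
    respondents without a job; both dummies are then 0, which makes
    "missing" the reference category in the complete dataset.
    """
    if answer is None:
        return (0, 0)
    if answer not in (1, 2, 3, 4):
        raise ValueError("answer must be 1, 2, 3, 4 or None")
    high = 1 if answer >= 3 else 0   # assumed: 3 and 4 are the agree categories
    return (high, 1 - high)


def high_bmi(weight_kg, height_m):
    """Return 1 if BMI = weight / height^2 exceeds 30, otherwise 0."""
    bmi = weight_kg / height_m ** 2
    return 1 if bmi > 30 else 0
```

A respondent of 1.80 m weighing 100 kg has a BMI of about 30.9 and is coded as high BMI, whereas the same height at 80 kg gives a BMI of about 24.7 and a value of zero.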


3.4 Descriptive Statistics

Table 1 contains quantitative information on the dependent and explanatory variables. The mean value and standard deviation (SD) are given for each of the four time periods. Further, the minimum and maximum values of the variables over the whole time period are provided. The overall mean of the Mental Health Inventory variable is equal to 76.71, hence the typical respondent has a mental health score in the upper part of the scale. This mean score has increased over the years; in 2009 it was equal to 76.03 and in 2015 it was 77.25. The minimum and maximum values are 0 and 100, equal to the boundaries of the MHI scale.

A short summary of the descriptive statistics: 49.4% of the individuals in the sample are male, 90.8% are of Dutch origin (not in the table), around 68% are married, approximately 9% are divorced and 48.6% have children. Furthermore, in 2015 50.4% have a job, 12% are unemployed and the remaining 37.6% are retired, students, disabled or taking care of the household (unpaid). For the variables job insecurity, job stress, mental effort and appreciation only the respondents with a job are included, therefore the percentages do not add up to 100 %. The share of individuals with high job insecurity is around 20 % of the total sample. The percentage with low job insecurity declines over time, which is probably caused by the decrease in the number of people with a job. These percentages indicate that the majority of people are not very insecure about the continuity of their jobs. The highest percentage of high job insecurity is 20.9 % in 2011, so most people were insecure about their jobs in that year. The share of respondents with high job stress is rather low in comparison with high mental effort, hence people experience mental effort more often than stress in their jobs. In addition, the percentage of people with high mental effort is considerably higher than the percentages in the average and low categories. The other work dummies show that 7.2 % have a temporary job, 33 % work in the private sector, 21.1 % work irregular hours and 4.1 % are self-employed. In the health category the statistics show that 14.4 % of the complete sample have a BMI over 30, 60.6% have ever smoked or are still smoking, 1.7% used drugs in the month before the questionnaire and 38.5% drink alcohol on three or more days a week. The average grade individuals give their life satisfaction is approximately 7.56, so most people are quite content with their lives.
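Because the panel is balanced, each pooled percentage quoted above equals the simple average of the four yearly means in Table 1. The small sketch below reproduces a few of them from the table entries; the helper name `pooled_mean` is hypothetical.

```python
def pooled_mean(yearly_means):
    """Average of yearly means; equals the pooled mean when every year has the same N."""
    return sum(yearly_means) / len(yearly_means)


# Yearly means for 2009, 2011, 2013 and 2015, taken from Table 1.
mhi = pooled_mean([76.03, 76.26, 77.31, 77.25])        # about 76.71
children = pooled_mean([0.522, 0.502, 0.476, 0.444])   # about 0.486
temporary = pooled_mean([0.082, 0.083, 0.062, 0.062])  # about 0.072
high_bmi_share = pooled_mean([0.133, 0.143, 0.147, 0.152])  # about 0.144
```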


Table 1: Descriptive Statistics

Entries for 2009-2015 are means with standard deviations in parentheses.

Variable                2009           2011           2013           2015           Min  Max
MHI                     76.03 (15.6)   76.26 (16.2)   77.31 (16.1)   77.25 (16.1)   0    100
Job                     0.627 (0.48)   0.613 (0.49)   0.538 (0.50)   0.504 (0.50)   0    1
Job insecurity: high    0.181 (0.38)   0.209 (0.41)   0.204 (0.40)   0.190 (0.39)   0    1
Job insecurity: low     0.446 (0.50)   0.403 (0.49)   0.334 (0.47)   0.314 (0.46)   0    1
Job stress: high        0.212 (0.41)   0.202 (0.40)   0.187 (0.39)   0.182 (0.39)   0    1
Job stress: average     0.369 (0.48)   0.347 (0.48)   0.303 (0.46)   0.280 (0.45)   0    1
Job stress: low         0.046 (0.21)   0.064 (0.24)   0.048 (0.21)   0.042 (0.20)   0    1
Mental effort: high     0.362 (0.48)   0.332 (0.47)   0.324 (0.47)   0.302 (0.46)   0    1
Mental effort: average  0.210 (0.41)   0.226 (0.42)   0.180 (0.38)   0.174 (0.38)   0    1
Mental effort: low      0.055 (0.23)   0.054 (0.23)   0.035 (0.18)   0.028 (0.17)   0    1
Appreciation: low 1     0.029 (0.17)   0.023 (0.15)   0.018 (0.13)   0.021 (0.14)   0    1
Appreciation: low 2     0.142 (0.35)   0.160 (0.37)   0.125 (0.33)   0.129 (0.34)   0    1
Appreciation: high 1    0.396 (0.48)   0.383 (0.49)   0.348 (0.48)   0.311 (0.46)   0    1
Appreciation: high 2    0.061 (0.24)   0.048 (0.21)   0.047 (0.21)   0.044 (0.21)   0    1
Life satisfaction       7.600 (1.31)   7.561 (1.26)   7.562 (1.33)   7.538 (1.26)   0    10
Unemployed              0.115 (0.32)   0.104 (0.31)   0.128 (0.33)   0.120 (0.33)   0    1
Log income              6.483 (2.26)   6.591 (2.14)   6.665 (2.05)   6.707 (2.03)   0    12.1
Average hours           29.38 (14.8)   27.91 (15.4)   29.83 (14.7)   29.37 (15.2)   0    80
Tenure (years)          12.71 (11.3)   13.20 (11.3)   13.77 (11.4)   14.27 (11.6)   0    59
Temporary job           0.082 (0.27)   0.083 (0.28)   0.062 (0.24)   0.062 (0.24)   0    1
Private sector          0.375 (0.48)   0.357 (0.48)   0.302 (0.46)   0.285 (0.45)   0    1
Irregular hours         0.223 (0.42)   0.207 (0.41)   0.211 (0.41)   0.204 (0.40)   0    1
Self-employed           0.042 (0.20)   0.037 (0.19)   0.043 (0.20)   0.043 (0.20)   0    1
Travel minutes          25.52 (22.5)   25.42 (22.0)   26.14 (21.5)   26.70 (22.4)   0    240
High BMI                0.133 (0.34)   0.143 (0.35)   0.147 (0.35)   0.152 (0.36)   0    1
Major diseases          0.580 (1.02)   0.706 (1.11)   0.802 (1.21)   0.865 (1.27)   0    8
Smoke                   0.608 (0.49)   0.606 (0.49)   0.602 (0.49)   0.607 (0.49)   0    1
Drugs                   0.021 (0.14)   0.013 (0.11)   0.018 (0.13)   0.017 (0.13)   0    1
Heavy drinker           0.213 (0.41)   0.195 (0.40)   0.198 (0.40)   0.188 (0.39)   0    1
Age                     52.17 (14.7)   54.14 (14.7)   56.16 (14.7)   57.81 (14.7)   16   93
Married                 0.682 (0.47)   0.683 (0.47)   0.675 (0.47)   0.679 (0.47)   0    1
Divorced                0.084 (0.28)   0.087 (0.28)   0.094 (0.29)   0.095 (0.29)   0    1
Children                0.522 (0.50)   0.502 (0.50)   0.476 (0.50)   0.444 (0.50)   0    1


4 Estimation Strategy

The estimation strategy section is constructed as follows. First the overall estimation method is described, then the misspecification tests are discussed in detail starting with the autocorrelation and heteroskedasticity, then the classification of explanatory variables and ending with the form of the instruments. The results of these tests are used to determine the final regression models for estimation of both the full sample and the employed subset. Table 4 gives a detailed overview of all the decisions that were made in this section.

4.1 Estimation method

This section elaborates on the estimation techniques that are used in the rest of the paper and on how the final model specification is determined. To accomplish this, various model specifications are tested. Autocorrelation and heteroskedasticity in the disturbances are reviewed. AR(1) and AR(2) tests are used to check whether the error terms are serially correlated and whether the model or instrument set needs adjustment. Furthermore, a compromise must be made between generality of the model and efficiency of the estimates; the number of restrictions and valid instruments should be balanced against the number of included regressors and the size of the sample. This process starts with the classification of the explanatory variables. All variables are initially classified as endogenous, predetermined or exogenous based on the findings of previous papers. From this starting point it is tested to which category they belong and whether the model must be adapted. During this procedure the autocorrelation restrictions also have to be satisfied. Another aspect is the instrumental variables issue; strong and valid instruments are preferred, but are not always available. A related question is the adequate number of instruments: too many instruments lead to biased estimates and to size distortions, whereas too few instruments result in either an under-identified model or in less precise estimates. The Arellano-Bond (1991) difference GMM and Blundell-Bond (1998) system GMM methods are used for estimation of the models. Furthermore, choices must be made regarding the instrument set, the form of the weighting matrix in relation to the type of heteroskedasticity, and the variance estimator (one-step or two-step estimation and non-robust, robust or cluster-robust standard error estimation). After all these issues are dealt with, the final econometric regression models are presented for both the complete dataset and the employed subset.


Starting with a standard dynamic panel data model:

$$y_{i,t} = \lambda y_{i,t-1} + \beta x_{i,t} + \gamma w_{i,t} + \delta v_{i,t} + \tau_t + \eta_i + \varepsilon_{i,t}, \qquad i = 1, \dots, N, \quad t = 2, \dots, T \tag{1}$$

where $y_{i,t}$ is the dependent variable mental health, $x_{i,t}$ contains the strictly exogenous explanatory variables, $w_{i,t}$ the predetermined explanatory variables and $v_{i,t}$ the endogenous explanatory variables. Furthermore, $\tau_t$ are the time effects, $\eta_i$ the time-invariant individual fixed effects and $\varepsilon_{i,t}$ the error terms.

The procedure of finding well-specified models for the data is quite complex when dealing with dynamic panel data relationships. Before the estimation process starts, an adequate candidate model with a set of variables that are possibly valid and effective instruments is required. The model should neither be too simplistic nor impose too many implicit or explicit restrictions on the model specification. Moreover, it is important that regressors that are most likely not strictly exogenous are assumed to be endogenous or predetermined until further investigation. This part of the paper elaborates on establishing adequate specifications for two different sets of models: one for the complete dataset and one for the employed subset. The employed dataset uses model (1) as initial model specification, where $y_{i,t-1}$, the lag of the dependent variable mental health, is included based on the findings of Hellgren and Sverke (2003). Whether the inclusion of the lag of MHI is really necessary is examined in the next section. For the complete dataset this model is combined with a job dummy indicating whether the respondent has a job in all the time periods. Combining these subsets with the job dummy divides the dataset into more categories: the complete dataset, the male subset, the female subset, the employed subset, the working men and the working women. All these extensions to the model and the final regression models used for estimation are described in subsection 4.3.

The difference GMM estimation procedure involves taking first differences of all the regressors in the model. Panel data allow for methods that control for some types of omitted variables: first-differencing the equation removes the individual-specific effects, thus eliminating a potential source of omitted variable bias in estimation. As a consequence, the time-invariant individual characteristics (gender, origin) and variables with limited within-variation (family size, urbanization, education) are omitted. Differencing also reduces the number of available time periods to three. Moreover, including the lag of the dependent variable mental health decreases the number of time periods even further, to two. This will most likely make it difficult to interpret the tests in the next section and, even more importantly, affect the reliability of the results.
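As a worked illustration of why the individual effects drop out, model (1) can be written in first differences. The error term is denoted by $\varepsilon_{i,t}$ here, which is a notational assumption; the remaining symbols are those of model (1).

```latex
\begin{align*}
\Delta y_{i,t} &= y_{i,t} - y_{i,t-1} \\
               &= \lambda \Delta y_{i,t-1} + \beta \Delta x_{i,t} + \gamma \Delta w_{i,t}
                  + \delta \Delta v_{i,t} + \Delta \tau_t + (\eta_i - \eta_i) + \Delta \varepsilon_{i,t} \\
               &= \lambda \Delta y_{i,t-1} + \beta \Delta x_{i,t} + \gamma \Delta w_{i,t}
                  + \delta \Delta v_{i,t} + \Delta \tau_t + \Delta \varepsilon_{i,t},
                  \qquad t = 3, \dots, T.
\end{align*}
```

The term $\Delta y_{i,t-1} = y_{i,t-1} - y_{i,t-2}$ requires observations from period $t-2$ onward, so with $T = 4$ the differenced equation with a lagged dependent variable exists only for $t = 3, 4$, which are the two remaining periods. Any regressor that is constant over time has a first difference of zero and drops out in the same way as $\eta_i$.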

4.2 Specification tests

Now that the initial forms of the models are specified, the assumptions made by these forms are tested. In the next subsections tests are performed to take into account the issues of endogenous regressors, overidentification (instrument validity), instrument strength, and heteroskedasticity and autocorrelation in the error terms. Apart from the model's estimation results, the xtabond2 package also provides an AR(1) test, an AR(2) test, the Sargan/Hansen test and difference-in-Hansen tests for instrument subsets. These misspecification tests aim to help achieve consistent parameter estimates and reliable results. However, it should be noted that both the size and the power of these tests could be highly distorted, so the results must be judged with great care (Kiviet, Pleus and Poldermans, 2015). When the models and all variables pass the tests, the interpretation of the estimated coefficients can start.

4.2.1 Autocorrelation and heteroskedasticity in the errors

Several kinds of standard error estimation can be implemented in the model: non-robust, robust, cluster-robust and Windmeijer-corrected cluster-robust errors. If autocorrelation is present in the idiosyncratic disturbance term, some lags of explanatory variables become invalid as instruments. This is tested by the Arellano-Bond test for first-order, AR(1), and second-order, AR(2), serial correlation applied to the residuals in first differences. First-order autocorrelation in levels can be investigated by looking for second-order correlation in differences. The Arellano-Bond test for autocorrelation is valid for any kind of GMM regression on panel data. The requirements for difference GMM are the presence of first-order autocorrelation (p < 0.1) and the absence of second-order autocorrelation (p > 0.1) in the differenced idiosyncratic errors. It must also be assumed that errors are not correlated across individuals (Roodman, 2009a). In all regressions these two conditions are required to be satisfied; only when the residuals do not display second-order autocorrelation in differences are the instruments of predetermined and endogenous regressors valid.
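The reason why first-order autocorrelation in the differenced errors is expected while second-order autocorrelation is not can be seen from a short derivation. It assumes, for illustration only, that the errors in levels $\varepsilon_{i,t}$ are serially uncorrelated with constant variance $\sigma^2$.

```latex
\begin{align*}
\operatorname{Cov}(\Delta\varepsilon_{i,t}, \Delta\varepsilon_{i,t-1})
  &= \operatorname{Cov}(\varepsilon_{i,t} - \varepsilon_{i,t-1},\;
                        \varepsilon_{i,t-1} - \varepsilon_{i,t-2})
   = -\operatorname{Var}(\varepsilon_{i,t-1}) = -\sigma^2 \neq 0, \\
\operatorname{Cov}(\Delta\varepsilon_{i,t}, \Delta\varepsilon_{i,t-2})
  &= \operatorname{Cov}(\varepsilon_{i,t} - \varepsilon_{i,t-1},\;
                        \varepsilon_{i,t-2} - \varepsilon_{i,t-3}) = 0.
\end{align*}
```

Consecutive differences share the term $\varepsilon_{i,t-1}$, which produces the negative first-order correlation. Differences two periods apart share no term, so a significant AR(2) statistic points to serial correlation in the level errors themselves.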

Unfortunately, the limited number of time periods and the inclusion of the lag of MHI make it impossible to perform a second-order autocorrelation test, since three time periods are needed for that. However, the coefficients of the lag of MHI in both datasets were very close to zero and not significant (p-value > 0.50) in all regressions in the pre-testing phase, before the official classification of explanatory variables was carried out. The significance is especially low in the employee dataset, so there the lag is removed and the model is no longer a dynamic panel data model. An advantage is that three time periods are now available for estimation, which makes more instruments available. In the full sample the variable is still included to study the effect of lagged mental health and the dynamic model. In the final specification of the instrument set the significance of L.MHI has greatly increased, but the coefficient is still close to zero. A solution to the problem of the second-order autocorrelation test is to drop the lagged variables (including mental health) temporarily from the model, but keep them as instrumental variables. This makes it possible to check whether second-order autocorrelation is present in the errors; afterwards the lags are re-added to the model of the complete dataset. Both tests are shown in the results of section 5 for the employee dataset, and only the AR(1) test for the complete dataset. During the classification procedure it is repeatedly checked whether the specification of the complete dataset still satisfies the requirements of the second-order autocorrelation test.

Furthermore, both one-step and two-step GMM estimators are examined in this paper to compare the results of regressions using different weighting matrices. An advantage of the one-step GMM estimator is that it uses weight matrices that are independent of estimated parameters, whereas the more efficient two-step GMM estimator weights the moment conditions by a consistent estimate of their covariance matrix. The one-step GMM estimator is used in combination with a robust variance estimate and corresponding weighting matrix, which yields standard errors that are robust to heteroskedasticity (Roodman, 2009a). The two-step GMM estimator is combined with a Windmeijer (2005) corrected variance estimate, which leads to more accurate inference. The resulting standard error estimates are robust and therefore consistent in the presence of any pattern of heteroskedasticity within panels. The small-sample correction is used because Monte Carlo simulations have shown that estimated asymptotic standard errors obtained by efficient two-step GMM can be severely downward biased in small samples (Arellano and Bond, 1991; Blundell and Bond, 1998).

4.2.2 Classification

A lot of issues have to be dealt with before the coefficients of the models can be estimated. Endogeneity of the regressors is a common problem, and many methods have been developed to handle this complication. The most widely known are the instrumental variable (IV) approach and, more generally, the generalized method of moments (GMM), which is used in this study. Endogenous regressors are jointly determined with the dependent variable of the model and are correlated with current error terms (and possibly with past error terms too). The notation follows Kiviet, Pleus and Poldermans (2015), where x contains the exogenous explanatory variables, w the predetermined explanatory variables and v the endogenous explanatory variables. The classification of the variables implies the following conditions:

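$$\begin{aligned} E(x_{it}\,\varepsilon_{is}) &= 0 \\ E(w_{it}\,\varepsilon_{i,t+l}) &= 0 \qquad \forall\, i, t, s,\ l \geq 0 \\ E(v_{it}\,\varepsilon_{i,t+1+l}) &= 0 \end{aligned} \tag{2}$$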

The first condition indicates that strictly exogenous regressors must be uncorrelated with all errors (for every $t$ and $s$). Predetermined regressors are affected by lagged feedback from the dependent variable or the errors $\varepsilon_{it}$, so the second condition requires these regressors to be uncorrelated with current and future errors. The last condition indicates when a regressor should be classified as endogenous: it allows both contemporaneous correlation between the current error $\varepsilon_{it}$ and $v_{it}$, and feedback from past errors onto the current value of $v_{it}$. These moment conditions relate to the variables in levels and change when first differences of the variables are used instead, as is the case with Arellano-Bond GMM. There the relation with respect to the transformed errors is investigated. Endogeneity is not only caused by simultaneous regressors (affected by direct feedback from the dependent variable); measurement errors in regressors or omitted variables can also result in it.

To be able to classify the explanatory variables into the aforementioned groups, the differences between the conditions (2) are tested with the incremental Hansen J-test, also known as the difference-in-Hansen test. Knowing to which category the variables belong helps in finding the best instrument set. The Hansen (1982) J-test for overidentification is robust, but can be weakened by many instrumental variables. Difference GMM, introduced by Arellano and Bond (1991), is used as a one-step estimation method with heteroskedasticity-robust standard errors. This is used as the initial specification for testing the overidentification restrictions and can be changed at a later stage of the research when tests indicate that other specifications need to be used. The test regresses the residuals from an IV regression on all instruments in Z. In a GMM regression it is a bit more complex: the J statistic is the value of the GMM objective function, evaluated at the efficient GMM estimator. It is mostly used as an evaluation of the suitability of the model. Under the null hypothesis that all instruments are uncorrelated with the error terms, and are therefore valid instruments, the test has a large-sample $\chi^2(r)$ distribution, where $r$ is the number of overidentifying restrictions. The Hansen statistic should not reject the overidentification restrictions, so its p-value should be high (for instance > 0.25) (Roodman, 2009).

Furthermore, the Hansen statistic can also be used to test the validity of subsets of instruments, via a difference-in-Hansen test. To make it easier to classify the variables into one of the categories, every explanatory variable is added as a subset of instruments to the instrument matrix in the xtabond2 package. The test performs an estimation with and without a particular subset of instruments. These are called the restricted regression (more moment conditions) and the unrestricted regression (fewer moment conditions). The maintained hypothesis is validity of the instruments used in the unrestricted regression. The null hypothesis is validity of the additional subset of instruments used in the restricted regression. The difference-in-Hansen test statistic of the restricted and unrestricted regression is asymptotically $\chi^2(s)$, where $s$ is equal to the number of added instruments. The statistic is calculated as the increase in J when the subset is added to the estimation procedure (Baum, Schaffer and Stillman, 2003). Apart from weakening the Hansen test, a high instrument count also weakens the difference test. Therefore, neither test should be relied upon too faithfully. The more moment conditions are implemented, the weaker the tests become (Roodman, 2009a). Nevertheless, the test gives an indication of the category to which the variables belong. Further testing must be performed before the test results can be viewed as reliable.
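The arithmetic of the difference-in-Hansen test can be sketched as follows. This is not the xtabond2 implementation: the function names are hypothetical, the chi-squared tail probability is computed with a standard recurrence for integer degrees of freedom using only the Python standard library, and the J values in the usage note are made-up numbers for illustration.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-squared variable with integer df >= 1."""
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        k, q = 2, math.exp(-half)                # closed form for df = 2
    else:
        k, q = 1, math.erfc(math.sqrt(half))     # closed form for df = 1
    while k < df:
        # Recurrence: Q(x; k + 2) = Q(x; k) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1)
        q += math.exp((k / 2.0) * math.log(half) - half - math.lgamma(k / 2.0 + 1.0))
        k += 2
    return q


def difference_in_hansen(j_restricted, j_unrestricted, extra_instruments):
    """Return the increase in J and its asymptotic p-value.

    j_restricted: J statistic with the additional instrument subset included.
    j_unrestricted: J statistic without that subset.
    extra_instruments: number of added instruments (the degrees of freedom s).
    """
    statistic = j_restricted - j_unrestricted
    # In finite samples the difference can be negative; chi2_sf then returns 1.0.
    return statistic, chi2_sf(statistic, extra_instruments)
```

With made-up values such as `difference_in_hansen(12.4, 9.1, 3)`, the statistic is 3.3 with three degrees of freedom, and the resulting p-value would be compared with the 0.25 threshold used in this section.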

Following the suggestion in Roodman’s (2009a) article on the use of xtabond2 in Stata, every regressor is put into the instrument matrix Z in some form. As mentioned earlier, every explanatory variable becomes a subset of instruments. For strictly exogenous variables the standard (but shortsighted) treatment is to put the variable in the Z matrix as one column (iv(var)); this is similar to the instrumental variable (IV) approach. Kiviet, Pleus and Poldermans (2015) note that it is better to also instrument a strictly exogenous variable by some of its lags in GMM-style to improve inference, since all lags of exogenous variables are valid instruments. If the variable is possibly predetermined, the subset of instruments is constructed from lags 1 and longer in GMM-style (gmm(var)). For potentially endogenous variables only lags 2 and longer in GMM-style (gmm(L.var)) are put into the instrument matrix Z.
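The lag rules above can be made concrete with a small Python sketch. For the first-differenced equation of period t it lists the dates of the level observations that qualify as GMM-style instruments under each classification. The function name and the period numbering (waves 1, 2, ...) are illustrative assumptions. Treating a strictly exogenous variable as usable from lag 0 onwards is the sketch's reading of "all lags are valid".

```python
# Earliest lag that may enter the instrument matrix for each classification.
# Assumption of this sketch: strictly exogenous variables are used from lag 0.
MIN_LAG = {"exogenous": 0, "predetermined": 1, "endogenous": 2}


def instrument_dates(classification, t, first_period=1):
    """Dates of the level observations usable as GMM-style instruments
    for the first-differenced equation of period t."""
    min_lag = MIN_LAG[classification]
    # Levels dated t - min_lag and earlier are valid; none exist before first_period.
    return list(range(first_period, t - min_lag + 1))


if __name__ == "__main__":
    for label in ("exogenous", "predetermined", "endogenous"):
        print(label, instrument_dates(label, t=5))
```

The endogenous case always has one date fewer than the predetermined case. That one extra instrument per period (the first lag) is exactly the moment condition tested by the incremental Hansen test.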


The first step in the testing procedure is to start with a relatively general model specification and instrument set. This specification has to pass the serial correlation tests, and preferably every subset of instruments has a p-value above 0.25 in the incremental Hansen test of the overidentification restrictions. If this is not the case, the model specification and instrument set should be reformulated. Regressors that are unlikely to be affected by reverse causality, like the background variables age and the time dummies, are initially classified as strictly exogenous. Classifying regressors as strictly exogenous is only correct if it can be ruled out that they receive feedback of any kind from the dependent variable MHI, since strict exogeneity imposes stricter conditions on the model. The moment conditions implied by this classification are not rejected in either dataset (p-value > 0.25), both for the level variables and for the corresponding lags in GMM-style; hence these results support the assumption of exogeneity. In addition, the same moment conditions are tested for the other background variables married, divorced and children, and here the exogeneity hypothesis is not rejected either. For the lagged dependent variable MHI all Arellano-Bond type instruments with lag two and longer are used, because in the first-differenced equation the lag of MHI (used as explanatory variable) is endogenous. This classifies the lag of MHI as predetermined with respect to the undifferenced error terms.

The other regressors were not so easy to classify, because the direction of causality was not obvious. These explanatory variables are initially specified as either predetermined or endogenous. All health variables start off as endogenous, whereas the work variables are divided between the two categories based on the findings of the papers reviewed in section 2. All variables except tenure, private sector, self-employed, irregular hours and travel minutes are initially classified as endogenous.

First, a regression containing all explanatory variables (shown in table 1) and a confined set of instruments (described below) is performed to obtain an initial set that passes the requirements of the Hansen test and the additional incremental Hansen tests. This instrument set contains mental health, job insecurity, life satisfaction, log income, average hours, temporary job, high BMI and major diseases (all initially specified as endogenous), tenure and private sector (both initially specified as predetermined) and the background variables already classified as exogenous. For the full sample, the p-value of the overall Hansen test of the overidentification restrictions is 0.766, hence the specification satisfies the requirement. All p-values of the separate incremental tests of the instrument subsets are well over 0.25. In the setting with only employees the same initial regression (but without the lag of MHI and unemployment) and instrument set are used. Here the p-value of the overall Hansen test is 0.897, the p-value of the first-order autocorrelation test is 0.000 and that of the second-order test is 0.907. Now that both requirements (p-values > 0.25 for the Hansen and AR(2) tests) are satisfied for the initial specification, the next step can be taken.

Next, it is checked whether the initial classification of each variable is justified or whether it should be changed to predetermined or exogenous. This is done first for the variables initially specified as endogenous and afterwards for the variables initially specified as predetermined. For endogenous variables the first lag is added as an instrument, and later on the unlagged level variable is added for predetermined variables (see table 3). Table 2 shows the p-values of the incremental Hansen tests of the instruments added to the initial instrument set, for the full sample and the employee subset. Note that job insecurity consists of multiple dummies in the full sample; the p-value shown in table 2 covers both dummies. Instruments are added to the initial set in the order displayed in the table. When the test rejects the additional instruments, they are considered invalid and are removed; when it does not reject them, they remain in the set. The table shows a rejection of the null hypothesis (p-value < 0.25) for life satisfaction and average hours in the complete dataset. The p-value of job insecurity is above the threshold, but the variable is still classified as endogenous as a precaution. Some opposite results are found in the workers-only subset: the extra instruments are rejected for log income, average hours and temporary job, but not for job insecurity and life satisfaction. The health variables have a sufficiently high p-value in both sets, thus they are classified as predetermined. The moment conditions for the variables initially specified as predetermined (tenure and private sector) are not rejected by the test either.

Table 2: Verifying the initial classification of endogenous variables with respect to the errors using GMM with robust standard errors

Additional instruments      Regressors in set    p-value incremental Hansen test
                                                 Complete dataset    Workers-only
Predetermined (first lag)   Job insecurity       0.313               0.980
                            Life satisfaction    0.179               0.868
                            Log income           0.554               0.135
                            Average hours        0.052               0.024
                            Temporary job        0.805               0.101
                            High BMI             0.540               0.988
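The decision rule applied to Table 2 can be sketched in Python as follows. The p-values are those reported in the table. The threshold of 0.25 follows the text, while the dictionary keys and the function name are illustrative. The sketch applies the threshold mechanically and therefore does not reproduce the precautionary decision to keep job insecurity endogenous in the complete dataset.

```python
THRESHOLD = 0.25  # minimum p-value required for the added first-lag instruments

# p-values of the incremental Hansen tests as reported in Table 2.
TABLE_2 = {
    "Job insecurity":    {"complete": 0.313, "workers_only": 0.980},
    "Life satisfaction": {"complete": 0.179, "workers_only": 0.868},
    "Log income":        {"complete": 0.554, "workers_only": 0.135},
    "Average hours":     {"complete": 0.052, "workers_only": 0.024},
    "Temporary job":     {"complete": 0.805, "workers_only": 0.101},
    "High BMI":          {"complete": 0.540, "workers_only": 0.988},
}


def rejected_first_lags(p_values, sample, threshold=THRESHOLD):
    """Regressors whose first-lag instruments are rejected in the given sample,
    so that the regressor stays classified as endogenous."""
    return [name for name, p in p_values.items() if p[sample] < threshold]


if __name__ == "__main__":
    for sample in ("complete", "workers_only"):
        print(sample, rejected_first_lags(TABLE_2, sample))
```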


Table 2 and the overall Hansen test verified the validity of the classification of the included variables and of the instruments used in the instrument matrix. This specification can be used as the starting point for further testing. The next step is to add all remaining explanatory variables to the instrument matrix one by one, based on their initial specification, and to check whether they are valid instruments. This starts with the variables initially classified as endogenous: job stress, mental effort, appreciation, unemployment and the health variables. First the second and longer lags of these variables are added to the instrument set; then, in the same way as in table 2, the first lags are included to test the extra moment conditions of the predetermined hypothesis. In the complete dataset, the extra instruments (first lag and longer) are not rejected by the incremental test (p-value > 0.25) for job stress, hence it is classified as predetermined. However, adding the second (and first) lag of mental effort (both dummies) and unemployment to the instrument set results in a sharp decline of the p-value of the overall Hansen test and a rejection of the first-lag instruments. Therefore these variables are classified as endogenous and are added to the regression model in lagged form. This changes the dynamics of the model, because the dynamics are now driven not only by lagged mental health but also by lagged high mental effort and lagged unemployment. After these modifications to the model the p-value of the Hansen test has increased to 0.813.

In the second dataset the p-values of the first lag of job stress (both dummies) and mental effort (both dummies) are over 0.25, so the classification of these variables is changed to predetermined. The extra instruments are not rejected either for any of the levels of appreciation. However, the p-value of the overall test decreased considerably when these instruments were added. Therefore the instrument subsets of job stress and appreciation are added in collapsed form, because this resulted in higher p-values of the incremental Hansen tests for those variables. Furthermore, in both datasets the null hypothesis of valid additional instruments is not rejected for any health instrument subset added to the matrix, including smoke, drugs and drinker. Hence the additional instruments are valid and the classification must be changed to predetermined (or exogenous). Now that the variables initially classified as endogenous have been reviewed, the remaining variables initially classified as predetermined are considered: irregular hours, self-employed and travel minutes. They are added one by one to the instrument set, and the incremental test does not reject the predetermined hypothesis. For irregular hours and self-employed the p-value of the incremental test of both the first and higher lag instruments is over 0.5 in both datasets. The p-value of travel minutes is somewhat lower, but still satisfies the requirement of the test.
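To illustrate why collapsing an instrument subset reduces the instrument count, the following Python sketch counts the GMM-style columns that one variable contributes to the differenced equations. Without collapsing there is one column per period and lag; with collapsing there is one column per lag distance. The number of waves (8) and the first differenced equation (period 3) in the usage note are hypothetical and are not taken from the datasets of this thesis.

```python
def gmm_instrument_count(n_periods, min_lag, first_equation, collapse=False):
    """Count the GMM-style instrument columns contributed by one variable.

    n_periods      -- number of waves, numbered 1..n_periods
    min_lag        -- 1 for predetermined, 2 for endogenous variables
    first_equation -- first period for which a differenced equation exists
    collapse       -- if True, one column per lag distance instead of
                      one column per period and lag
    """
    # Number of valid lagged levels for each differenced equation.
    per_equation = [max(t - min_lag, 0)
                    for t in range(first_equation, n_periods + 1)]
    if collapse:
        return max(per_equation, default=0)
    return sum(per_equation)


if __name__ == "__main__":
    for collapse in (False, True):
        print("collapse =", collapse,
              "endogenous:", gmm_instrument_count(8, 2, 3, collapse),
              "predetermined:", gmm_instrument_count(8, 1, 3, collapse))
```

In this hypothetical setting an endogenous variable contributes 21 columns uncollapsed and 6 collapsed. This shows how collapsing limits the instrument proliferation that weakens the Hansen tests.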
