
Insights into the relationship between health and retirement : European evidence of a bi-directional relationship


Academic year: 2021


University of Amsterdam Thesis for MSc in Econometrics

Insights into the relationship between health and retirement

European evidence of a bi-directional relationship

Puck van Trier

10552375

MSc in Econometrics
Track: Econometrics

Date of final version: July 15, 2018
Supervisor: dr. J.C.M. van Ophem
Second reader: dr. E. Aristodemou

Abstract

This thesis studies the relationship between health and retirement in Europe, using data from the Survey of Health, Ageing and Retirement in Europe. Studying this relationship is complicated by the presence of endogeneity since retirement can be a decision based on the health status itself. A two-step selection model is used to take the endogeneity into account. Using this model, a bi-directional relationship between physical health and retirement is found. A bad physical health condition increases the probability of retirement and retirement leads to a deterioration of physical health. Several checks have been performed to verify the robustness of these results.


Statement of Originality

This document is written by Puck van Trier who declares to take full responsibility for the contents of this document. I declare that the text and the work presented in this document is original and that no sources other than those mentioned in the text and its references have been used in creating it. The Faculty of Economics and Business is responsible solely for the supervision of completion of the work, not for the contents.


Contents

1 Introduction
2 Literature Review
  2.1 Reasons for early retirement
  2.2 The influence of retirement on health
3 Pension systems in Europe
  3.1 Early retirement and retirement ages
  3.2 Overview different pension systems
4 The Model
  4.1 Heckman Model Setup
  4.2 Model estimation
  4.3 Limitations of the model
  4.4 Semi-parametric estimation
5 Data
  5.1 Overview
  5.2 Variable selection
    5.2.1 Dependent variables
    5.2.2 Explanatory variables
    5.2.3 Descriptive statistics
6 Results
  6.1 Health index
  6.2 Heckman first step
  6.3 Heckman second step and baseline results
  6.4 Semi-parametric estimation
  6.5 Robustness checks
7 Conclusion
A Appendix
Bibliography


Chapter 1

Introduction

Since the industrial revolution, life expectancy has increased rapidly. Together with lower birth rates, this has led to a change in the population structure. In addition, many babies were born after the Second World War. This baby boom generation has now reached the retirement age, and as a consequence the proportion of retirees has increased. On top of that, retirees live longer in retirement. According to the Organisation for Economic Co-operation and Development (OECD), there were more than seven people working for every person of pension age in 1950, while this number is expected to drop to only two workers per pensioner by 2047 (OECD, 2011, p.40). In most OECD countries there are two systems: people either save for their own pension while working, or the working population pays for the pensions of current retirees. If the pension program is paid for by those still working, the high proportion of retirees increases the burden on the working population significantly. If people save for their own retirement, it might become problematic that people nowadays spend much more time in retirement than they used to. The most straightforward solution to both problems would be to raise the official retirement age to ensure that people work longer, either to save more for their own pension or to decrease the burden on the working population. By 2011, around half of OECD countries had already begun increasing pension ages (OECD, 2011, p.19), but most people in OECD countries still retire before the official retirement age (OECD, 2011, p.41).

Different factors can cause such early retirements. A distinction that is often made in the literature is between pull and push factors (Schultz et al., 1998). Factors pulling individuals toward retirement are often seen as positive considerations, such as a spouse’s retirement, leisure expectations and financial incentives. Factors pushing individuals out of the labor market can be seen as negative considerations, such as heavy working conditions or structural and technological changes. However, not all factors can be clearly defined as either a push or a pull factor. Health, for example, can act as both. When people expect their health to improve after retirement it might pull them into retirement, but when people’s health doesn’t allow them to work any longer it pushes them out of the labor market. Health is an important factor when it comes to the current policy of raising retirement ages. If people are already not fit or healthy enough to make it to the official retirement age, this calls the policy of raising the retirement age into question.


Another major challenge associated with increasing life expectancy is the rise in health-care costs. The elderly account for a large part of total health spending in most if not all countries. Based on American data, De Nardi et al. (2015) found that medical spending doubles between the ages of 70 and 90. A similar result was obtained by Christensen et al. (2016), who studied Danish data and found that between ages 50 and 80 total health expenditures triple for women and quadruple for men. Bakx et al. (2016) focused on Dutch data and concluded that health expenditures rise steeply with age.

Several studies have been conducted to investigate the effect of retirement on health. Retirement can be argued to have a negative effect on health, as people might become socially isolated when they lose their colleagues and daily structure. On the other hand, it might relieve stress, especially when people didn’t like their job. The absence of work also increases people’s leisure time, which leaves more time for hobbies, exercise or other meaningful activities such as voluntary work and care-giving. In those cases retirement can be argued to have a positive impact on physical and mental health. If that is the case, then the current idea of keeping people in the workforce up to a higher age might not be the correct policy for reducing costs. Especially with increasing life expectancy, it is important to gain a better understanding of the relationship between health and retirement.

The aim of this thesis is to find out what role health status plays in the decision to retire and whether retirement affects people’s health in European countries. Answering this question is complicated by the presence of endogeneity, since retirement can be a decision based on the health status itself. The endogeneity problems will be discussed extensively in chapter 4. In order to deal with the endogeneity and answer the question of interest, the two-step selection model of Heckman (1976) will be applied to a pooled dataset from five waves of the Survey of Health, Ageing and Retirement in Europe (SHARE).

The remainder of this thesis is organized as follows. A review of previous studies in this research field is given in chapter 2. In chapter 3 an overview of the current pension systems in Europe is given. Chapter 4 focuses on the model that is used, and chapter 5 provides an overview of the SHARE data accompanied by some descriptive statistics. The estimation results are provided in chapter 6. Finally conclusions are drawn in chapter 7.


Chapter 2

Literature Review

The decision to retire is affected by a number of factors. Several authors conducted interviews with people about their decision to retire in order to get a better understanding of the determinants of early retirement. Higgs et al. (2003) interviewed British civil servants who chose to retire early and civil servants who did not. With this information they constructed possible routes into retirement. Organizational restructuring, financial offers and the opportunities for leisure and self-fulfillment were found to be the main determinants leading to early retirement. De Wind et al. (2013) looked at pathways leading to early retirement as well, while focusing on the effect of health. Based on interviews from the Study on Transitions in Employment, Ability and Motivation, they found that health plays a role in the decision to retire early for half of the people. Poor health can lead to early retirement through four different pathways. First, employees with health problems might not be able to work any longer. Second, employees with health problems might be afraid that their health will decline even further if they continue to work and therefore decide to retire. Third, health problems may lead to a self-perceived decline in the ability to work, after which employees decide to retire. Fourth, employees with health problems might feel pushed out by their employer.

2.1 Reasons for early retirement

This research field has attracted a wide range of empirical studies with a variety of methodological approaches and datasets. Most studies found that poor health, either physical or mental, can be seen as the main reason for early retirement, although it is definitely not the sole determinant (Karpansalo et al., 2004; Mein et al., 2000; Rice et al., 2011; Harkonmäki et al., 2006; Sewdas, 2017). The study of Karpansalo et al. (2004) focused on perceived health as a predictor of early retirement. They distinguished between illness-based and non-illness-based early retirement. Illness-based early retirement is based on illness or disability due to some chronic disease, and can be obtained after a long work history. Non-illness-based early retirement is a voluntary decision that applies to part-time pensions, unemployment pensions and a special early pension regulation for farmers in Finland. Based on Finnish data they found that self-perceived poor health is a strong predictor of early illness-based retirement due to mental disorders, musculoskeletal disorders and cardiovascular diseases. Moreover, poor self-perceived health increases the probability of a non-illness-based pension as well. Mein et al. (2000) studied the determinants of early retirement using longitudinal data on civil servants in London. Time until early retirement was analyzed using a Cox proportional hazards model. Like Karpansalo et al. (2004), Mein et al. found that self-perceived health is an important predictor of early retirement, but employment grade and job satisfaction were found to be additional independent predictors. Another study on British data was conducted by Rice et al. (2011), who used data from the English Longitudinal Study of Ageing. Again poor self-rated health was found to be associated with early retirement, but in this case partner retirement, wealth, high alcohol consumption and symptomatic depression were also associated with an early exit from work. Workers reporting depression, lower limb pain and shortness of breath were especially likely to retire early. Harkonmäki et al. (2006) focused on mental health and social aspects related to early retirement. Multinomial regression models were used to analyze data from the Helsinki Health Study. They found a strong relationship between mental health functioning and early retirement. In addition to health, conflicts between work and family and unfavorable working conditions were found to increase the intention to retire early. Sewdas (2017) included both physical and mental health in his analysis of Danish data. He was especially interested in the differences between determinants of early retirement among older workers with and without chronic diseases, but no significant differences could be found. For both groups, poor health and more depressive symptoms were found to be associated with early retirement, together with job characteristics such as high physical workload, low job satisfaction, low influence at work, work-family conflicts and poor relationships with colleagues. Another study that focused on job characteristics with respect to early retirement was performed by Kubicek et al. (2010). Studying data from the Wisconsin Longitudinal Study, they found that both work- and family-related factors influence early retirement behavior. Job dissatisfaction and marital satisfaction in particular were found to be important predictors of early retirement. After traditional economic factors, like wealth and income, marital satisfaction even turned out to be the strongest predictor of retirement timing in their study.

In summary, next to physical and mental health measures, many work-related factors (high physical workload, low job satisfaction, unfavorable working conditions etc.), financial factors (wealth and income), and family-related factors (partner retirement and marital satisfaction) are associated with early retirement. Different classifications of factors determining retirement can be found in the existing literature. We already discussed the distinction between push and pull factors. A different distinction can be made between work-related and non-work-related factors. Beehr et al. (2000) were the first to study work-related and non-work-related factors systematically together in the same study. Based on earlier research they constructed two sets of variables, one work-related and one non-work-related. The first set consists of autonomy, skill variety, task significance, interaction with others, workforce cutbacks, being tired of working and a one-time retirement incentive offer from the employer. The latter set consists of expected retirement activities, having a family member who needs care, reaching mandatory retirement age, and cost and availability of continued health insurance. Both sets of characteristics were found to induce retirement decisions, based on data from employees of the state of Oregon in the United States. Furthermore, wealth was found to be a strong predictor of retirement decisions. A third distinction is used by Hochman and Lewin-Epstein (2013), who group all factors into four categories: institutional factors (such as official retirement ages); financial factors (such as early pension reductions, earnings from work and wealth); personal employment experiences and work attitudes (such as stress, job satisfaction and personal health); and family characteristics (such as marital status, partner’s retirement, children and grandchildren). They focus mainly on the impact of family characteristics; in particular, they study the role of grandparenthood in the retirement decision. Using data from SHARE, they find that grandchildren increase the probability of looking forward to retiring early. They also expected this effect to be stronger for women due to stronger familial commitment, but no significant difference could be found between men and women. The influence of some other characteristics might be gender-based, however (De Preter et al., 2013; Pienta, 2003; Dentinger and Clarkberg, 2002). De Preter et al. (2013), for example, compared men’s and women’s retirement decisions using data from the European Community Household Panel (ECHP). They found that work characteristics, education and leisure activities influence women’s retirement timing much more than men’s. On the other hand, health characteristics and social activities influence men’s retirement much more.

In summary, it is clear that many different factors influence the retirement decision. Health is an important predictor of early retirement, but definitely not the sole determinant. Therefore we follow the approach that was taken by Hochman and Lewin-Epstein (2013), and take factors from all four categories into consideration when analyzing the effect of health on the retirement decision.

2.2 The influence of retirement on health

Once the individual has decided to retire, either voluntarily or involuntarily, this might influence one’s future health. A big challenge when estimating the effect of retirement on health is the potential endogeneity, since health has a direct causal effect on work and vice versa, leading to a simultaneity issue (Lindeboom and Kerkhofs, 2009). This issue is further discussed in chapter 4.

For a long time the endogeneity issue was not addressed, which might have led to biased results. Moreover, different results were found with respect to the correlation of retirement with health. Some studies found no correlation with health (Rowland, 1977; Haynes et al., 1977), whilst other studies found a negative correlation, such as Casscells et al. (1980) and Gonzales (1980), who both found that retirement may lead to fatal coronary heart disease.


Most studies that were carried out more recently did take the endogeneity into account. Different methodological approaches can be found in the literature, which can be roughly categorized into four groups: fixed effects estimation, instrumental variables (IV), regression discontinuity design (RDD) methods and matching methods. Some authors also use Ordinary Least Squares (OLS) as a baseline to compare with the results obtained when taking the endogeneity into account. Applications using these different methodologies will now be discussed.

Fixed effects estimation was first introduced for this topic by Kerkhofs and Lindeboom (1997), who constructed a fixed effects panel data model for two waves of a Dutch panel dataset. They argued that the endogeneity will disappear when controlling for individual fixed effects, assuming that the sources of endogeneity are time-invariant, individual-specific constants. They concluded that retirement has a health-preserving effect in general, though the results depend on gender and differ between age cohorts. Note that if the assumption of time-invariant sources of endogeneity is violated, endogeneity will still be present and the results will be biased. A possible solution for such potential biases is given by Lindeboom et al. (2002), who argued that these can be overcome by including a broad range of explanatory variables to account for shocks varying between successive waves of the survey. They applied this method to a fixed effects model to identify the effect of older persons’ major life events on mental health. Again retirement turned out to have a health-preserving effect. In particular, no evidence was found that early retirement leads to more depressive feelings.

A different approach to handling the biased results was taken by Dave et al. (2008). They applied fixed effects estimation to a limited sample of individuals from the Health and Retirement Study who did not report any physical or mental health problems before their retirement, to ensure that the decision to retire cannot be driven by the individual’s health status. This clearly reduced the endogeneity problem, but it led to a very selective sample as well. They found, in contrast to the other fixed effects estimations, that complete retirement has a negative impact on both physical and mental health, mainly through a decline in physical activity and social interactions. A possible explanation for this outcome is the sample selection, since only people in good health are taken into account. Therefore the conclusions might not generalize to the entire population.

A different approach is taken by several authors who used instrumental variables. These instruments should be correlated with the decision to retire, but they shouldn’t affect health. Charles (2004) tried to estimate the effect of retirement on health by exploiting exogenous variation in pensions in the United States. Age-specific retirement incentives in the Social Security System as well as changes in laws for compulsory retirement are taken as instruments. He finds a positive impact of retirement on well-being, measured as feeling depressed or feeling lonely, after controlling for the endogeneity. Due to missing information in the data used, well-being only focuses on mental health and leaves out all information related to physical health. Neuman (2008) followed the approach taken by Charles (2004) using data from the Health and Retirement Study. This allowed him to look at the effect of retirement on physical health as well.


Just like Charles, Neuman found a health-preserving effect of retirement. Two other studies using data from the Health and Retirement Study were conducted by Coe and Lindeboom (2008) and Coe et al. (2012), using early retirement window offers as an instrument. Early retirement windows are "special incentives to retire at a specific time offered by employers to employees" (Coe and Lindeboom, 2008). These authors contributed to the literature by differentiating between blue and white collar workers. Using simple ordinary least squares, both studies found a significant negative relationship between the time spent in retirement and cognitive functioning, but no negative effect was found when taking the endogeneity into account. Coe et al. (2012) even found a positive relationship for blue collar workers. Coe and Lindeboom (2008) found a positive impact of early retirement on self-reported health, but the effect turned out to be only temporary.

All of the instrumental variable studies mentioned above focused on U.S. data. Due to differences in health insurance and social policies, these findings do not necessarily generalize to other countries. Coe and Zamarro (2012) focused on European countries using the SHARE data. Country-specific early and full retirement ages were taken as instruments, since these will influence the retirement decision but won’t directly affect health. A positive impact on overall general health was found. In fact, this method can be seen as a form of regression discontinuity design: exploiting the retirement ages makes the probability of retiring a discontinuous function of age. This approach was taken by Müller and Shaikh (2018) as well, who exploited the discontinuity at the country-specific retirement age in a fuzzy RDD to estimate the effect of retirement not only on the individual’s health but also on the partner’s health status. Using SHARE data for 19 European countries, they found that the partner’s retirement leads to less physical activity, increases both the amount and frequency of alcohol consumption and increases smoking among those who already smoke. Meanwhile, own retirement leads to more physical activity and doesn’t lead to a higher amount of alcohol consumption. Overall they conclude that the partner’s retirement has a negative impact on subjective health, whilst own retirement has positive effects on health. The impact of male retirement on the female partner’s health is stronger than vice versa, which might be a motivation to analyze men and women separately.
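To make the idea of retirement ages as instruments concrete, the following is a minimal sketch and not the implementation used by Coe and Zamarro (2012) or Müller and Shaikh (2018). It builds two eligibility indicators from a person's age and a pair of country-specific retirement ages. The names `RetirementAges` and `eligibility_indicators` are hypothetical, and the ages 60 and 65 are illustrative values only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RetirementAges:
    """Country-specific early and normal retirement ages (illustrative)."""
    early: float
    normal: float


def eligibility_indicators(age: float, ages: RetirementAges) -> dict:
    """Return 0/1 indicators that switch on at the two threshold ages.

    These indicators depend only on age and institutional rules, not on
    the individual's health, which is why they are candidates for
    instruments in the retirement equation.
    """
    return {
        "above_early": int(age >= ages.early),
        "above_normal": int(age >= ages.normal),
    }


# Hypothetical example: early age 60, normal age 65.
example_ages = RetirementAges(early=60, normal=65)
for age in (59.5, 60.0, 66.0):
    print(age, eligibility_indicators(age, example_ages))
```

The indicators jump from 0 to 1 exactly at the threshold ages. That jump is the discontinuity in the probability of retiring that an IV or fuzzy RDD approach exploits.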

Eibich (2008) focused on the mechanisms through which retirement affects health using an RDD with multiple discontinuities. He exploited the discontinuities arising from financial incentives in the German pension system using data from the German Socio-Economic Panel Study (SOEP). He found a positive impact of retirement on health through a relief from work-related stress, an increase in sleep duration on weekdays and a more active lifestyle after retirement.

A fourth method to overcome the endogeneity issues and identify causal effects is given by matching methods. Behncke (2009) applied non-parametric matching methods to data from the English Longitudinal Study of Ageing (ELSA). Compared to the other methods that were discussed before, matching methods have a few major advantages. IV and RDD provide estimates only for the subpopulation that responded to the instrument, whilst matching methods lead to estimates for all retirees. Also, no functional form assumptions have to be made and the method allows for individual effect heterogeneity. Behncke found that the effect of retirement differs between individuals. It doesn’t harm all retirees, but for some it increases the risk of a cardiovascular disease and of being diagnosed with cancer.

In summary, no clear conclusion can be drawn concerning the effect of retirement on health. Many studies have been conducted using different methodologies and datasets, leading to different findings. This thesis contributes to the existing literature by using a two-step Heckman selectivity model to cope with the endogeneity problem. The model will be discussed more extensively in chapter 4. In addition, recently released waves of the SHARE dataset that were not available to earlier studies will be taken into account.


Chapter 3

Pension systems in Europe

For a long time it was common to start working for a company at a young age and to remain there until retirement. Career paths were rather predictable and job switches were rare. More recently, due to recessions and a more flexible labor market, most people no longer spend their entire working life in one company. The traditional career path has changed due to part-time jobs as well as frequent entries into and exits from the workforce, for example to take care of children or the elderly (Feldman, 1994). This has also led to a changing definition of retirement, as discussed by Beehr et al. (2000). In earlier research, retirement was the point where people stopped working and fully withdrew from paid work, but later on it became more common to retire partly or to start working for oneself or for a new employer after retirement. Therefore authors started to develop various definitions for different types of retirement (Beehr, 1986). Feldman (1994) tried to come up with a general definition that captures all sorts of retirement. He defined retirement as "the exit from an organizational position or career path of considerable duration, taken by individuals after middle age, and taken with the intention of reduced psychological commitment to work thereafter" (p.287). This definition indeed captures the different varieties of retirement, whether the individual retires early or late, whether it is full or partial retirement and whether it is voluntary or involuntary.

3.1 Early retirement and retirement ages

Early retirement is defined as retiring before the official statutory retirement age. Most pension systems allow for early retirement under certain conditions, such as a certain contributory record in the labor market or reaching a certain threshold age (OECD, 2017, p.52). In most cases, early retirement comes with reduced pension benefits, to compensate for the longer period over which pensions are paid together with a shorter career. However, the flexibility to retire fully before the normal retirement age is strongly restricted in more than half of OECD countries. Some countries, such as the Netherlands, have no early retirement age at all in their mandatory pension systems (OECD, 2017, p.68). It might still be possible to retire early, for example by withdrawing the occupational pension (with adjusted benefits) earlier, as is the case in the Netherlands, where an occupational pension is quasi-mandatory as well.

Table 3.1 gives an overview of the requirements in 2015 of the mandatory pension systems to be eligible for early retirement and the accompanying reduction in pension benefits. This information is provided for the eleven countries that will be analyzed later, namely Austria, Belgium, Denmark, France, Germany, Greece, Italy, the Netherlands, Spain, Sweden and Switzerland.¹ The overview is simplified, since in fact both the reductions and the benefits are subject to many rules and exceptions. For example, in Greece there is no reduction if the individual is over age 62 with a contributory record of 40 years. The same holds for Italy with a contribution record of 42 years and 6 months for men and 41 years and 6 months for women. In Spain a distinction is made between voluntary and involuntary retirement. In the case of involuntary retirement, early retirement is already possible 4 years prior to the normal retirement age. As stated before, early retirement is possible in the Netherlands, but not through the mandatory (basic) pension system.

Country      Reduction in benefits                 Requirements for early retirement
Austria      4.2% per year                         40 (women) / 45 (men) years of contribution
Belgium      -                                     39.5 years of contribution
Denmark      -                                     3 years prior to normal retirement age
France       22%²                                  From age 60 with full contributory record of 40 years
Germany      3.6% per year                         From age 63 with at least 35 years of contribution
Greece       0.5% per month                        From age 62
Italy        One percentage point per year         From age 62
Netherlands  -                                     -
Spain        1.5% to 2% per quarter                2 years prior to normal age, 35 years of contributions
Sweden       Actuarial reduction depending on age  From age 61
Switzerland  6.8% per year                         From age 62 (women) / 63 (men)

Table 3.1: Requirements and reductions for early retirement
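As a rough illustration of how the reductions in table 3.1 translate into benefits, the sketch below applies them to a hypothetical full benefit of 1000. It assumes that per-year and per-month rates add up linearly without compounding, and it ignores the country-specific exceptions mentioned in the text, so it is a simplification rather than an implementation of any actual pension rule. The function names and the benefit amount are hypothetical; the rates and the stepped French schedule are taken from table 3.1 and its footnote.

```python
# Stepped reduction for France, keyed by the number of missing
# contribution years (values from the footnote to table 3.1).
FRANCE_REDUCTION = {0: 0.0, 1: 0.04, 2: 0.08, 3: 0.12, 4: 0.17, 5: 0.22}


def linear_reduction(rate_per_period: float, periods_early: float) -> float:
    """Total reduction share under a simple, non-compounding per-period rate.

    Assumption: the rate is applied linearly; real systems may differ.
    """
    if rate_per_period < 0 or periods_early < 0:
        raise ValueError("rate and periods must be non-negative")
    return min(rate_per_period * periods_early, 1.0)


def reduced_benefit(full_benefit: float, reduction: float) -> float:
    """Benefit left after applying a total reduction share in [0, 1]."""
    if not 0.0 <= reduction <= 1.0:
        raise ValueError("reduction must lie between 0 and 1")
    return full_benefit * (1.0 - reduction)


# Germany: 3.6% per year, retiring two years early.
germany = reduced_benefit(1000.0, linear_reduction(0.036, 2))
# Greece: 0.5% per month, retiring twelve months early.
greece = reduced_benefit(1000.0, linear_reduction(0.005, 12))
# France: three missing contribution years.
france = reduced_benefit(1000.0, FRANCE_REDUCTION[3])

print(round(germany, 2), round(greece, 2), round(france, 2))
```

Under these assumptions, a monthly rate of 0.5% corresponds to 6% per year, which makes the Greek reduction directly comparable to the yearly rates of the other countries.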

Table 3.1 describes the situation in 2015. However, pension systems are changing all over Europe, and many requirements and reductions for early retirement have changed recently or are about to change. Most reforms aim at keeping people in the workforce up to a higher age. Apart from increases in the retirement age, most reforms concern changing benefits, longer contributory records to qualify for benefits and changing tax incentives (OECD, 2017, p.21).

¹ The reason that these eleven countries are included is further explained in chapter 5.

² The reduction depends on the number of missing years until a full contributory record. The reduction stated here is the maximum reduction when the individual misses five years until a full contributory record. When the individual misses 1, 2, 3 or 4 years until a full contributory record the reduction is 4%, 8%, 12% or 17% respectively.

As can be seen in table 3.1, the reductions and conditions vary widely between countries and sometimes even by gender. This is not only the case for early retirement. There are big differences between statutory ("normal") retirement ages as well. At the statutory retirement age people are usually eligible for full pension benefits without reductions. However, the exact benefits differ between countries due to different pension systems and regulations, as discussed before. An overview of the rounded early and statutory retirement ages is given in table 3.2 for the same eleven countries of interest. The five waves displayed correspond to the waves of the SHARE data that will be used, which will be discussed extensively in chapter 5. The data of these waves were obtained in the years 2004, 2006, 2011, 2013 and 2015 respectively. The early and normal retirement ages correspond to these years. For waves 1 and 2 the information is obtained from the SHARE data, supplemented with information from OECD (2005). For waves 4 to 6 this information is no longer included in the SHARE data and retirement ages are taken from OECD (2013), OECD (2015) and OECD (2017) respectively.

Official retirement age, early/normal

             Wave 1        Wave 2        Wave 4        Wave 5        Wave 6
Country      Men    Women  Men    Women  Men    Women  Men    Women  Men    Women
Austria      62/65  57/60  62/65  57/60  62/65  57/60  62/65  59/60  62/65  60³/60
Belgium      60/65  60/63  60/65  60/64  60/65  60/65  61/65  61/65  62/65  62/65
Denmark      -/65   -/65   -/65   -/65   -/65   -/65   -/65   -/65   -/65   -/65
France       60/65  60/65  60/65  60/65  60/65  60/65  60/65  60/65  60/65  60/65
Germany      63/65  60/65  63/65  63/65  63/65  63/65  63/65  63/65  63/65  63/65
Greece       57/65  57/65  57/65  57/65  60/65  60/65  62/67  62/67  62/67  62/67
Italy        57/65  57/60  57/65  57/60  62/66  62/63  62/66  62/64  63/67  62/66
Netherlands  -/65   -/65   -/65   -/65   -/65   -/65   -/65   -/65   -/66   -/66
Spain        60/65  60/65  60/65  60/65  60/65  60/65  61/65  61/65  61/65  61/65
Sweden       61/65  61/65  61/65  61/65  61/65  61/65  61/65  61/65  61/65  61/65
Switzerland  62/65  60/63  62/65  61/64  63/65  62/64  63/65  62/64  63/65  62/64

Table 3.2: Overview of early and statutory retirement ages for men and women

Table 3.2 shows that both early and statutory retirement ages differ between countries. In some countries the retirement age for women used to be lower than for men, but the differences between men and women are becoming smaller. Germany, for example, had different retirement ages for men and women in wave 1, but since wave 2 this is no longer the case. Big jumps in the retirement ages between waves 2 and 4, such as for Italy, can be explained by the fact that the period between these two waves (2006 to 2011) is more than twice as long as the time between any other two consecutive waves. Overall a clear pattern can be observed between wave 1 and wave 6: the official retirement ages are rising and they are becoming more equal for men and women. This pattern can be observed in most OECD countries. According to the OECD (2017, p.49), the early retirement age has increased by around 14 months on average across OECD countries since 2002, while the normal retirement age has shown an average increase of eight months over the same period. Hence, the gap between the early and normal retirement age has narrowed.

3 The early age appears to be equal to the normal age due to rounding. The exact early age is 59 years and


3.2

Overview different pension systems

The large differences in pension systems and retirement ages introduce a lot of heterogeneity. It is therefore hard to compare countries and to analyze them all together. In most countries the pension system is built up from different programs. Classifying these programs is difficult due to country-specific programs and differences in the underlying pension structure. The OECD (2015, p.125) uses a framework of three main pillars to structure the different pension systems. An overview of this framework is given in figure 3.1, where DB denotes defined benefit, DC denotes defined contribution and NDC denotes notional defined contribution.

Pension systems
  Mandatory, adequacy: Basic, Minimum, Social assistance
  Mandatory, savings: Public (DB, points, NDC), Private (DB, DC)
  Voluntary, savings: Private (DB, DC)

Figure 3.1: Framework pension systems

Most OECD countries have at least one program from the left pillar. This pillar consists of three different types of pension programs: basic pensions, minimum pension programs and social assistance plans. Basic pensions either pay a fixed benefit to everyone who has reached the retirement age or a benefit based on the number of years of contributions. The purpose of minimum pension programs and social assistance plans is to guarantee a minimum standard of living after retirement. A flat minimum pension pays a basic benefit when the individual reaches the minimum retirement age, irrespective of the contributory record. Social assistance plans pay the highest benefits to the poorest retirees and reduced benefits to better-off pensioners. The benefits often depend on assets and income from other sources.

An overview of the different pension programs is given in table 3.3 for the eleven countries that are of interest for this analysis. Most OECD countries have some form of social safety net, but the countries marked in the social assistance column in table 3.3 are the ones where full-career workers with low earnings (less than 30% of the average) would qualify for the extra benefits (OECD, 2017, p.86).

Table 3.3 shows that all countries we focus on have a public program, a private program or both. Defined benefit (DB) plans are provided by the public sector. A predefined formula determines the pension payments on retirement. The payments often depend on the number of years of contributions and the individual’s terminal earnings. In a few countries, such as Switzerland and the Netherlands, a plan of this form is (quasi-)mandatory.

A few OECD countries have point schemes where workers earn pension points based on their earnings each year, and these points define their pension payment at the time of retirement. In a defined contribution (DC) plan, contributions are paid into an individual account. The money in the account is invested and the returns are added to (or subtracted from) the account. At the time of retirement, retirement benefits are paid from the account. Finally, notional defined contribution (NDC) plans only exist in a few OECD countries. As in a DC plan, contributions are paid into an individual account, but in this case a notional rate of return is applied to the balances.

Country Basic Minimum Social Assistance Public Private

Austria DB

Belgium X X DB

Denmark X X DC

France X DB and points

Germany points

Greece X DB

Italy X NDC

Netherlands X DB

Spain X DB

Sweden X NDC DC

Switzerland X DB DB

Table 3.3: Pension system overview

As shown above, a lot of heterogeneity is present between the eleven countries of interest in the pension programs, the requirements for early retirement and the retirement ages. To overcome this heterogeneity when estimating, one option is to cluster countries with comparable systems and retirement ages, and to perform a separate analysis for each cluster. However, since not many similarities can be found, it is hard to cluster all the countries. Another option is to include a dummy variable in the model for each country to capture country-specific effects. However, if the heterogeneity is too large, it is questionable whether the inclusion of a dummy variable is enough to correct for the differences between countries. Taking all these considerations into account, we will stick to the second approach and include dummy variables to account for country-specific effects.


Chapter 4

The Model

As mentioned before, we want to analyze the interrelation between health and retirement decisions. The effect of retirement on health can be captured in the following equation

$$ Y_i = X_i'\beta + R_i\delta + \varepsilon_i \tag{4.1} $$

where $Y_i$ is the individual’s health status, $R_i$ a binary decision variable of retirement, $X_i$ a vector of explanatory variables and $\varepsilon_i$ the error term that captures all unobserved factors. $Y_i$ refers to the health status of the individual at the time of interviewing. Three different health measures will be used in this paper. We will perform the analysis for a self-rated health status, a health index that is constructed from several objective and subjective health measures, and a European depression scale to look at mental health. All three health measures will be discussed extensively in chapter 5.

The most straightforward way to estimate equation 4.1 is to use ordinary least squares. However, results will be biased if the error term is correlated with any of the explanatory variables, i.e. $E[\varepsilon_i \mid X_i, R_i] \neq 0$. In that case the endogeneity should be taken into account to obtain consistent estimates.

Different underlying causes can be found for the presence of endogeneity. First, the omission of relevant variables from the regression model might lead to biased results. If no data is available on the omitted variable or if the variable is simply not observed, then it cannot be included in the model. The error term will capture its influence, but in case the missing variable is correlated with the explanatory variables this will lead to endogeneity and the results will be biased. The bias can be either positive or negative and could even lead to a sign reversal of the OLS coefficient (Cameron and Trivedi, 2005, p.93). Such a bias is called an omitted variable bias. A second cause of endogeneity is sample selection, since retirement is not randomly distributed among the individuals in the sample. Individuals who are unhealthy or not satisfied with their job, for example, are more likely to retire early; hence they self-select into early retirement based on individual preferences. In that case the assumption $E[\varepsilon_i \mid X_i, R_i] = 0$ might no longer hold since $R_i$ is correlated with unobserved individual preferences. Such a bias is known as a selection bias and leads to inconsistent results. Third, endogeneity can be due to simultaneity. In this case retirement and health are likely to be co-determined, meaning that retirement can influence the individual’s health but health can also influence the retirement decision. Such reverse causality can lead to endogeneity that is also known as a simultaneous equations bias.
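The selection mechanism can be made concrete with a small simulation. The sketch below is purely illustrative and is not the estimation used in this thesis: the data-generating process, the variable names and the true effect of -0.5 are assumptions. An unobserved health endowment drives both the retirement decision and the health outcome, so that the naive OLS estimate (with a single binary regressor, the difference in group means) overstates the health deterioration attributed to retirement. Under these assumed parameters the naive estimate is, in expectation, about $-0.5 - 2/\sqrt{\pi} \approx -1.63$.

```python
import random

def naive_retirement_effect(n=20000, true_effect=-0.5, seed=0):
    """Simulate self-selection into retirement and return the naive OLS estimate.

    With a single binary regressor, the OLS slope equals the difference in
    mean outcomes between retirees and non-retirees.
    """
    rng = random.Random(seed)
    retired, working = [], []
    for _ in range(n):
        h = rng.gauss(0.0, 1.0)       # unobserved health endowment (hypothetical)
        nu = rng.gauss(0.0, 1.0)      # error of the selection equation
        r = 1 if -h + nu > 0 else 0   # unhealthy individuals retire more often
        y = true_effect * r + h + rng.gauss(0.0, 1.0)  # health outcome, higher is healthier
        (retired if r else working).append(y)
    return sum(retired) / len(retired) - sum(working) / len(working)

estimate = naive_retirement_effect()
print(f"true effect: -0.5, naive OLS estimate: {estimate:.2f}")
```

Because the unobserved endowment ends up in the error term and is correlated with the retirement dummy, the zero conditional mean assumption fails and the naive estimate does not recover the true effect.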

Different approaches can be taken to overcome the endogeneity and self-selection problems. In this paper a two-step selection model will be used that was first proposed by Heckman (1976).

4.1

Heckman Model Setup

The Heckman selection model can be used to correct for non-randomly selected samples (Heckman, 1974, 1976, 1979). Such models typically consist of two equations: an outcome equation describing the relationship between the outcome of interest, in this case the health variable $Y_i$, and a vector of explanatory variables; and a selection equation that describes the relationship between a binary participation decision, the retirement decision $R_i$, and another vector of explanatory variables. The model can be written as follows:

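$$
\begin{aligned}
Y_i &= X_i'\beta + R_i\delta + \varepsilon_i && (4.2) \\
R_i^{*} &= Z_i'\gamma + \tilde{Y}_i\theta + \nu_i && (4.3) \\
R_i &= \begin{cases} 1 & \text{if } R_i^{*} > 0 \\ 0 & \text{if } R_i^{*} \le 0 \end{cases} && (4.4)
\end{aligned}
$$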

where equation 4.2 is the outcome equation as defined before in equation 4.1 and equation 4.3 presents the selection equation, with $Z_i$ a vector of explanatory variables, $\tilde{Y}_i$ the health status of individual $i$ at the moment of retirement and $\nu_i$ the error term. $Y_i$ in equation 4.2 and $\tilde{Y}_i$ in equation 4.3 both relate to the health status of individual $i$. They are not the same, however, since they refer to different points in time: $\tilde{Y}_i$ relates to health at the moment of retiring, whereas $Y_i$ relates to health at the time of interviewing. To ensure consistency of the model and to differentiate between these two points in time, different measures of health will be used for $Y_i$ and $\tilde{Y}_i$. Although they are not the same, they will be correlated with each other. The exact variables that are included in $Y_i$ and $\tilde{Y}_i$ are discussed in chapter 5, together with the variables that are included in $X_i$ and $Z_i$.

The Heckman model assumes a bivariate normal distribution with zero means and correlation $\rho$ for the error terms of the outcome equation and selection equation, which can be described as follows:

$$ \begin{pmatrix} \varepsilon_i \\ \nu_i \end{pmatrix} \sim N\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} \sigma_\varepsilon^2 & \rho\sigma_\varepsilon \\ \rho\sigma_\varepsilon & 1 \end{pmatrix} \right) \tag{4.5} $$

The variance of $\nu_i$ cannot be identified and is therefore set equal to 1. This can be done in a probit equation without loss of generality because the scale of the dependent variable is not observed. If the correlation $\rho$ between the error terms is zero, then no selection needs to be taken into account and OLS will provide consistent estimates. Otherwise a different approach is needed, which will be discussed in the next section.


4.2

Model estimation

The Heckman parametric selection models can be estimated following two different methods: an efficient maximum likelihood (ML) approach (Heckman, 1974) and a two-step procedure (Heckman, 1976). We will use the latter approach, which has several advantages compared to the ML estimator (Cameron and Trivedi, 2005, p.550-551). First of all, the required assumptions on the error terms $\varepsilon_i$ and $\nu_i$ are weaker for the two-step estimator. Furthermore, the two-step estimator is easier to implement. Finally, the distributional assumptions on the error terms can be weakened even further to permit semi-parametric estimation.

To estimate the selection model, we start with combining the two equations of the model by conditioning the outcome equation on the selection equation.

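$$
\begin{aligned}
E[Y_i \mid R_i^{*} > 0] &= E[X_i'\beta + R_i\delta + \varepsilon_i \mid Z_i'\gamma + \tilde{Y}_i\theta + \nu_i > 0] \\
&= X_i'\beta + R_i\delta + E[\varepsilon_i \mid Z_i'\gamma + \tilde{Y}_i\theta + \nu_i > 0] \\
&= X_i'\beta + R_i\delta + E[\varepsilon_i \mid \nu_i > -Z_i'\gamma - \tilde{Y}_i\theta]
\end{aligned}
\tag{4.6}
$$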

If the error terms are uncorrelated then the last term becomes zero and OLS will give consistent estimates. However, for any $\rho \neq 0$ the last term has a non-zero conditional mean and OLS estimates will be biased. The idea of Heckman is to restore a zero conditional mean by including an estimate of the selection bias (Hussinger, 2008). Therefore an expression for $E[\varepsilon_i \mid \nu_i > -Z_i'\gamma - \tilde{Y}_i\theta]$ is needed when $\varepsilon_i$ and $\nu_i$ are correlated, which is given by (Cameron and Trivedi, 2005, p.556)

$$ E[\varepsilon_i \mid \nu_i > -Z_i'\gamma - \tilde{Y}_i\theta] = \rho\,\sigma_\varepsilon\,\lambda_i \tag{4.7} $$

where $\rho$ is the correlation between the error terms and $\lambda_i$ is the inverse Mills ratio, “a monotone decreasing function of the probability that an observation is selected into the sample” (Heckman, 1979), that is defined as follows

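$$ \lambda_i = \frac{\phi(-(Z_i'\gamma + \tilde{Y}_i\theta))}{1 - \Phi(-(Z_i'\gamma + \tilde{Y}_i\theta))} = \frac{\phi(Z_i'\gamma + \tilde{Y}_i\theta)}{\Phi(Z_i'\gamma + \tilde{Y}_i\theta)} \tag{4.8} $$

where $\phi$ and $\Phi$ are the density and distribution function of a standard normal variable respectively.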
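The expressions for the conditional mean of the outcome error and for the inverse Mills ratio can be checked numerically. The following Python sketch, which uses only the standard library, computes the inverse Mills ratio for a given value of the selection index and compares $\rho\sigma_\varepsilon\lambda_i$ with a Monte Carlo estimate of the conditional mean under bivariate normal errors. The function names and the parameter values (index 0.3, $\rho = 0.6$, $\sigma_\varepsilon = 2$) are assumptions chosen for illustration and are not estimates from this thesis.

```python
import random
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: pdf is phi, cdf is Phi

def inverse_mills_ratio(index):
    """lambda = phi(index) / Phi(index), where index is the selection index."""
    return _STD_NORMAL.pdf(index) / _STD_NORMAL.cdf(index)

def simulated_conditional_mean(index, rho, sigma_eps, n=200_000, seed=0):
    """Monte Carlo estimate of E[eps | nu > -index] under bivariate normality."""
    rng = random.Random(seed)
    total, count = 0.0, 0
    for _ in range(n):
        nu = rng.gauss(0.0, 1.0)
        # eps has variance sigma_eps**2 and correlation rho with nu
        eps = sigma_eps * (rho * nu + (1.0 - rho**2) ** 0.5 * rng.gauss(0.0, 1.0))
        if nu > -index:
            total += eps
            count += 1
    return total / count

index, rho, sigma_eps = 0.3, 0.6, 2.0  # illustrative values, not estimates
analytic = rho * sigma_eps * inverse_mills_ratio(index)
simulated = simulated_conditional_mean(index, rho, sigma_eps)
print(f"analytic: {analytic:.3f}, simulated: {simulated:.3f}")
```

The two numbers should agree up to simulation noise. The function also makes the quoted property visible: the ratio falls as the index, and hence the probability of selection, rises.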

The model can now be estimated in two steps. First, the selection equation is estimated, which gives a better understanding of the role that health plays in the decision to retire. A probit specification is used, leading to estimates $\hat{\gamma}$ and $\hat{\theta}$ that we use to calculate the inverse Mills ratio. This ratio will be added to the outcome equation

$$ Y_i = X_i'\beta + R_i\delta + \lambda_i b + \eta_i \tag{4.9} $$

that will be estimated in the second step to find out what influence retirement has on health, where $E[\eta_i \mid R_i = 1] = 0$. Since the inverse Mills ratio is always positive, it follows that, if the correction term is omitted, the regression line for $Y_i$ on $X_i$ and $R_i$ will be biased upward when $\rho$ is positive and downward when $\rho$ is negative. The size of the bias depends on the correlation $\rho$, the variance of the error term $\varepsilon_i$ and the severity of the truncation.


4.3

Limitations of the model

The reported standard errors of equation 4.9 are incorrect for two reasons. First, equation 4.9 uses the estimates $\hat{\gamma}$ and $\hat{\theta}$ from the first step of the model. This introduces additional sampling variability in the second step that the reported standard errors do not account for. Secondly, the error in equation 4.9 is heteroskedastic. To overcome both problems and to obtain correct standard errors, the entire model will be bootstrapped using 200 bootstrap replications.
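The bootstrap of the entire model can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used for the results in this thesis: `bootstrap_se` and the toy data are hypothetical, and the sample mean merely stands in for the two-step estimator. The essential point is that the `estimator` function is applied to every resampled dataset, so that in the actual application both the probit step and the outcome regression would be re-estimated in each of the 200 replications.

```python
import random
from statistics import mean, stdev

def bootstrap_se(rows, estimator, n_reps=200, seed=0):
    """Bootstrap standard error of `estimator`, resampling whole observations.

    `estimator` maps a list of rows to a single number. To mimic bootstrapping
    the entire two-step model, it should re-run both steps (probit, then the
    outcome regression) on every resampled dataset.
    """
    rng = random.Random(seed)
    n = len(rows)
    estimates = []
    for _ in range(n_reps):
        resample = [rows[rng.randrange(n)] for _ in range(n)]  # draw with replacement
        estimates.append(estimator(resample))
    return stdev(estimates)

# Toy usage: the "estimator" is a sample mean, standing in for the two-step fit.
data_rng = random.Random(42)
toy_rows = [data_rng.gauss(0.0, 1.0) for _ in range(400)]
se = bootstrap_se(toy_rows, mean)
print(f"bootstrap SE of the mean: {se:.3f}")
```

For the toy data the result should be close to the analytic value $1/\sqrt{400} = 0.05$, which gives a simple sanity check of the procedure.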

The general Heckman two-step procedure is meant to be used for estimating the parameters in a linear model. The specification of the second step of our model depends on the health measure that is used for $Y_i$ in the outcome equation (4.9). As discussed at the beginning of this chapter, three different health measures will be used. One of these health measures is a continuous variable, the other two are ordered variables.1 In the case that our health measure is a continuous variable, the general model can be applied, performing OLS in the second step. In the case that our health measure is an ordered variable, an ordered probit model will be used.

Another possible complication when estimating the Heckman model is the identification of the model. Nawata (1993) focused on some limitations and problems of the Heckman two-step estimator. He states that Heckman’s two-step estimator might perform poorly when the degree of multicollinearity between $X_i$ and both $Z_i'\hat{\gamma}$ and $\tilde{Y}_i\hat{\theta}$ is too high, since the inverse Mills ratio term is approximately linear over a wide range of its argument in that case. The severity of this problem depends on the variation in $X\hat{\beta}$ across observations. The higher the variation, the better a probit model can discriminate between participants and nonparticipants (Cameron and Trivedi, 2005, p.551). To avoid such identification problems, some explanatory variables will be added to the selection equation that are not included in the outcome equation. An exact overview of the variables included in both $X$ and $Z$ and the corresponding identification conditions is given in chapter 5.

4.4

Semi-parametric estimation

The assumptions underlying the Heckman model are very strong. If the model specified in the outcome equation (4.2) and selection equation (4.3) is not correct, if the error terms are not normally distributed with mean 0, or if the error terms are not independent of both sets of explanatory variables, then the estimates may become inconsistent. Preferably, we would therefore relax the assumptions made. One possibility is to use a semi-parametric estimation technique that relies on less strict distributional assumptions. One example of such an estimation technique is given by Cosslett (1991).

Instead of the inverse Mills ratio, a dummy variables approximation of the selection correction is specified to approximate $E[\varepsilon_i \mid \nu_i > -Z_i'\gamma - \tilde{Y}_i\theta]$ in equation 4.6. A probit specification is still used to estimate the selection equation, assuming normality of the error terms. Therefore this method is only semi-parametric for the outcome equation. The big advantage is that the error terms of the outcome equation, $\varepsilon_i$, no longer need to be normally distributed as long as $\nu_i$ is normally distributed. The estimation results from the first step of the Heckman model, $Z_i'\hat{\gamma} + \tilde{Y}_i\hat{\theta}$, are ordered and then cut into $M$ sections. To determine the number of sections $M$ endogenously, we follow the approach that was taken by Hussinger (2008) and use the algorithm by Ayer et al. (1955).2 For each section a dummy variable $D_{im}$ is created that takes value 1 if individual $i$ belongs to section $m$ and value 0 otherwise. This approximation takes the following form

$$ E[\varepsilon_i \mid \nu_i > -Z_i'\gamma - \tilde{Y}_i\theta] = \sum_{m=1}^{M} b_m D_{im}(Z_i'\gamma + \tilde{Y}_i\theta) \tag{4.10} $$

where $b_m$ are the coefficients corresponding to the different dummy variables $D_{im}$, thus assuming a constant function in each interval. The outcome equation of the model now becomes

$$ Y_i = X_i'\beta + R_i\delta + \sum_{m=1}^{M} b_m D_{im}(Z_i'\gamma + \tilde{Y}_i\theta) + \xi_i \tag{4.11} $$

Again the reported standard errors are incorrect because the estimates $\hat{\gamma}$ and $\hat{\theta}$ are used to calculate the selection correction term, which introduces additional sampling variability. To obtain correct standard errors, the entire model will be bootstrapped using 200 bootstrap replications. As in the Heckman model, equation 4.11 will be estimated by either OLS or ordered probit, depending on the variable $Y_i$.
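The construction of the section dummies can be illustrated with a short Python sketch. It is a simplification made for illustration: the number of sections is fixed in advance and the sections are of near-equal size, whereas in the analysis the number of sections is determined with the algorithm of Ayer et al. (1955). The function name `section_dummies` and the fitted index values are hypothetical.

```python
def section_dummies(index_values, n_sections):
    """Order the first-step index and cut it into sections of near-equal size.

    Returns one row per observation; row i holds the dummies D_i1..D_iM,
    exactly one of which equals 1.
    """
    n = len(index_values)
    if not 1 <= n_sections <= n:
        raise ValueError("n_sections must lie between 1 and the sample size")
    order = sorted(range(n), key=lambda i: index_values[i])  # ascending index
    dummies = [[0] * n_sections for _ in range(n)]
    for rank, i in enumerate(order):
        section = rank * n_sections // n  # section of the observation ranked `rank`
        dummies[i][section] = 1
    return dummies

# Hypothetical fitted values of the first-step index for six respondents.
fitted_index = [0.8, -1.2, 0.1, 1.5, -0.4, 0.3]
D = section_dummies(fitted_index, n_sections=3)
for value, row in zip(fitted_index, D):
    print(value, row)
```

Each respondent belongs to exactly one section, so the dummies sum to one across sections. If the vector of explanatory variables contains a constant, one dummy has to be omitted in the second step to avoid perfect collinearity.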


Chapter 5

Data

The Survey of Health, Ageing and Retirement in Europe (SHARE)1 will be used for the analysis. This is a multidisciplinary and cross-national panel database of micro data on health, socio-economic status and social and family networks of more than 120,000 individuals aged 50 or older. It covers 27 European countries and currently consists of six waves of data. To exploit the large number of observations in the SHARE data and to ensure that we have sufficient variation in the dataset when it comes to health and retirement, a pooled dataset will be used. Five different data waves are pooled together. Only the third wave is left out, because this wave contains different information than the other waves: it is retrospective and focuses on people’s life histories. From this wave only the information on different occupation classes will be used to verify the robustness of our results afterwards. It will not be used in the main analysis of interest. Another advantage of the pooled dataset is that subsamples, which can be considered later to verify the robustness of the results, will still contain a sufficient number of observations. The original dataset of wave 1 covers eleven European countries. More European countries are added in later waves, but for these countries the interviews for a specific wave were not always conducted in the same year. Therefore, in order to ensure a homogeneous sample, we only focus on the original eleven countries, namely Austria, Belgium, Denmark, France, Germany, Greece, Italy, the Netherlands, Spain, Sweden and Switzerland.

1 This paper uses data from SHARE Waves 1, 2, 3 (SHARELIFE), 4, 5 and 6 (DOIs: 10.6103/SHARE.w1.610, 10.6103/SHARE.w2.610, 10.6103/SHARE.w3.610, 10.6103/SHARE.w4.610, 10.6103/SHARE.w5.610, 10.6103/SHARE.w6.610), see Börsch-Supan et al. (2013) for methodological details. (1) The SHARE data collection has been primarily funded by the European Commission through FP5 (QLK6-CT-2001-00360), FP6 (SHARE-I3: RII-CT-2006-062193, COMPARE: CIT5-CT-2005-028857, SHARELIFE: CIT4-CT-2006-028812) and FP7 (SHARE-PREP: No211909, SHARE-LEAP: No227822, SHARE M4: No261982). Additional funding from the German Ministry of Education and Research, the Max Planck Society for the Advancement of Science, the U.S. National Institute on Aging (U01 AG0974013S2, P01 AG005842, P01 AG08291, P30 -AG12815, R21 AG025169, Y1-AG-4553-01, IAG BSR06-11, OGHA 04-064, HHSN271201300071C) and from various national funding sources is gratefully acknowledged (see www.share-project.org).


5.1

Overview

SHARE data collection is based on computer-assisted personal interviewing (CAPI) with a questionnaire that is translated into the national language. Due to the physical tests that a respondent is asked to perform, personal interviews are necessary. Therefore the SHARE data can be considered reliable and rather advanced in the sense that individual-based data is collected in different countries. However, fully reliable data do not exist due to possible sample selection, human mistakes, self-rated questions etc. One of the problems, especially for individual-based data, is item non-response. Respondents might not be willing to answer a question, for example due to privacy concerns (such as questions on health or income), or they might not know the answer (in the case of wealth or income, for example).

In the literature a distinction is often made between data missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR) (Little and Rubin, 2002). Data are said to be missing completely at random if there is no systematic difference between missing and observed values in the data. Hence there is no relationship between the missing data point and any other values in the dataset. Data are said to be missing at random if the propensity for a data point to be missing is related to some of the observed data. In fact the variable is missing conditional on another variable. Finally, data are said to be missing not at random if systematic differences remain between the missing values and the observed values, even after the observed data are taken into account (Sterne et al., 2009).

The most straightforward way to deal with missing values would be to delete all incomplete observations. If the missing data mechanism is said to be ignorable, then deleting the observations will only lead to a loss of efficiency but still leads to consistent estimates. This is the case for MCAR and MAR, where imputation methods might reduce the loss of efficiency (Cameron and Trivedi, 2005, p.927). If the missing data are non-ignorable, which is the case if they are MNAR, then this needs to be taken into account to obtain consistent estimates. We expected health and financial variables to be missing due to privacy concerns. The self-perceived health status turns out to be missing for only 0.41% of the observations, and these observations are therefore dropped from the sample. However, both income and wealth are missing for many respondents. We expect these to be MNAR since it is likely that respondents with a very high or very low salary do not want to reveal their income or wealth. To obtain consistent and efficient estimates, a multiple imputation method is therefore used.

The SHARE dataset is accompanied by an extra dataset of multiple imputations. A multivariate imputation method is used, which requires that multiple variables are imputed simultaneously on the basis of a Markov Chain Monte Carlo (MCMC) technique as described in the SHARE release guide 6.1.0 (p.42). To account for the extra variability introduced by the imputation process, five imputations are constructed of the missing values. For each imputation an independent replication of the imputation procedure is used. We use imputation values for the annual household income and wealth. Selecting only one of the five imputation values might lead to misleading results. Therefore, we take the average of the five independently generated imputation values.
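The averaging step is simple enough to state in a few lines of Python. The sketch below is illustrative only: the function name `average_imputations` and the income figures are hypothetical and are not taken from the SHARE imputation dataset.

```python
from statistics import mean

def average_imputations(imputed_values):
    """Collapse the independent imputations of one variable into their mean."""
    if not imputed_values:
        raise ValueError("at least one imputed value is required")
    return mean(imputed_values)

# Hypothetical annual household income (EUR) in the five imputed datasets.
income_draws = [31200.0, 29850.0, 33400.0, 30100.0, 32450.0]
income = average_imputations(income_draws)
print(income)
```

Using the mean of the five draws yields one value per respondent and variable, so that the pooled dataset can be analysed as a single dataset.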

During the interviews, selected household members serve as family, financial or household respondents. They answer questions on behalf of the couple or the whole household. The answers to these questions, such as the number of children or grandchildren and the household income, are therefore only available for the selected household member and are missing by design for the other household members. During the imputation procedure the information is stored for all respondents. A respondent living without a partner in the household is automatically defined as family and financial respondent. An overview of the number of imputed values is given in table 5.1, where a distinction is made between values that were missing by design and values that were missing for the entire household. The number of imputed values for the household income and for wealth are very high, since they are constructed as the sum of 15 and 7 underlying variables respectively.2 Even if only one element is missing, the total value is considered an imputed value.

Variable Imputed values Percentage Missing by design Percentage

Income 37,766 46.47 25,146 30.94

Wealth 53,124 65.37 24,231 29.81

Table 5.1: Overview of the imputed values

After the imputation process, all observations with missing or negative values are deleted from the final sample, which then consists of 81,260 observations. Since we use a pooled dataset, respondents can occur in multiple data waves; hence the total number of observations does not correspond to 81,260 unique respondents. Table 5.2 shows that the pooled dataset contains 43,138 unique respondents, a number equal to just over half (53%) of the observations, and that only 2.52% of them appear in all five data waves.

Occurrences in pooled dataset   Respondents   Percentage
1                               19,436        45.06
2                               13,786        31.96
3                               6,500         15.07
4                               2,328         5.40
5                               1,088         2.52
Total                           43,138        100.00

Table 5.2: Occurrences of respondents in pooled dataset
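The figures in table 5.2 can be cross-checked against the size of the pooled sample: weighting each group of respondents by the number of waves in which it appears should reproduce the total number of observations. The short Python check below uses only the counts reported in the table; the variable names are chosen for illustration.

```python
# Respondents by number of waves in which they appear (Table 5.2).
occurrences = {1: 19436, 2: 13786, 3: 6500, 4: 2328, 5: 1088}

respondents = sum(occurrences.values())
observations = sum(k * count for k, count in occurrences.items())
shares = {k: round(100 * count / respondents, 2) for k, count in occurrences.items()}

print(respondents, observations)
print(shares)
```

The check returns 43,138 respondents and 81,260 observations, in line with the sample size reported above.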

5.2

Variable selection

The explanatory variables of health ($Y$) in the outcome equation include age, log income, education, marital status, number of children, wealth, and employment status (public sector, private sector, self-employed, retired). The selection equation on retirement contains roughly the same explanatory variables. For identification purposes a few variables are added. First, the number of grandchildren is added. We expect respondents with grandchildren to retire earlier in order to spend time with their grandchildren or to take care of them. Secondly, dummy variables are added that indicate whether the respondent has reached the early retirement age or the full retirement age. The latter makes sense since in all eleven countries that are analyzed, late retirement is possible and hence retirement is not compulsory at the full retirement age. Finally, some objective health measures are added (corresponding to $\tilde{Y}$), and the employment status is left out of the selection equation. Also, country and wave specific dummies are included in both equations. First the dependent variable of health will be discussed in more detail. Then we will pay attention to the explanatory variables that are included in $X$ and $Z$ of equations 4.2 and 4.3 respectively, as well as the definition of retirement. Finally, descriptive statistics are presented in subsection 5.2.3.
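The two retirement-age dummies can be sketched in Python as follows. The function name is hypothetical, and the treatment of countries without an early retirement age (the dummy is set to 0) is an assumption made for this illustration rather than a documented choice. The ages in the usage example are the wave 1 values for men from table 3.2.

```python
def retirement_age_dummies(age, early_age, normal_age):
    """Return (reached_early, reached_normal) as 0/1 indicators.

    `early_age` is None where Table 3.2 lists no early retirement age ("-");
    in that case the early-retirement dummy is set to 0 (an assumption).
    """
    reached_early = int(early_age is not None and age >= early_age)
    reached_normal = int(age >= normal_age)
    return reached_early, reached_normal

# Wave 1 ages for men, taken from Table 3.2.
print(retirement_age_dummies(63, early_age=62, normal_age=65))    # Austria
print(retirement_age_dummies(63, early_age=None, normal_age=65))  # Denmark
```

Because the eligibility ages vary by country, gender and wave, these dummies vary across respondents of the same age, which is what makes them useful for identification.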

5.2.1 Dependent variables

Health

In 1946 the World Health Organization (WHO) defined health as “a state of complete physical, mental and social well-being and not merely the absence of disease or infirmity” (WHO, 2006). Measuring health is difficult in survey data, since no single standard measurement tool for health status exists. SHARE data contain the respondent’s self-perceived health status (US scale). All respondents are asked if they would rate their own health as (1) excellent, (2) very good, (3) good, (4) fair or (5) poor. Although this self-perceived health status has been used in a variety of studies, it is clearly a subjective measure and large differences might exist between countries in the way that people rate their own health (Kapteyn et al., 2007). It is also unclear whether the question is concerned with physical health, mental health or both. Therefore respondents might interpret and answer the question differently.

Another problem with the self-reported health is the possible presence of justification bias (Lindeboom and Kerkhofs, 2009; Parsons, 1982; Anderson and Burkhauser, 1985). Respondents might have incentives to under-report their health status. For example, people receiving disability pay may underestimate their own health status to justify their behavior. Respondents without a paid job might therefore exaggerate their health problems. In that case the effect of health might be overestimated and the effect of other variables, such as economic incentives, is likely to be underestimated. The amount of exaggeration might differ per country, depending on the accessibility and generosity of the disability insurance programs (Lindeboom and Kerkhofs, 2009). Conversely, there are people who will not admit to being in poor health, even though they are (Myers, 1982, p.10).

We will still use the self-perceived health status, but in order to verify its correctness two other health measures will be used for $Y_i$ in equation 4.9 as well. One of them is the Euro-D depression scale, which looks solely at mental health aspects. This scale has been developed to compare symptoms of depression across European countries based on twelve depression-related variables: depressed mood, pessimism, suicidality, guilt, sleep, interest, irritability, appetite, fatigue, concentration, enjoyment, tearfulness. Respondents are asked about these twelve items and the Euro-D depression scale is constructed by summing the 0 (no) and 1 (yes) scores. Hence the scale ranges from 0 (not depressed) to 12 (very depressed).
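The construction of the Euro-D score amounts to a simple sum, as the following Python sketch shows. The item labels follow the list above; the function name and the example respondent are hypothetical and do not correspond to SHARE variable names.

```python
EURO_D_ITEMS = (
    "depressed mood", "pessimism", "suicidality", "guilt", "sleep", "interest",
    "irritability", "appetite", "fatigue", "concentration", "enjoyment", "tearfulness",
)

def euro_d_score(answers):
    """Sum the 0 (no) / 1 (yes) scores over the twelve Euro-D items."""
    missing = [item for item in EURO_D_ITEMS if item not in answers]
    if missing:
        raise ValueError(f"missing items: {missing}")
    if any(answers[item] not in (0, 1) for item in EURO_D_ITEMS):
        raise ValueError("every item must be scored 0 or 1")
    return sum(answers[item] for item in EURO_D_ITEMS)

# Hypothetical respondent reporting sleep problems, fatigue and irritability.
answers = {item: 0 for item in EURO_D_ITEMS}
answers.update({"sleep": 1, "fatigue": 1, "irritability": 1})
print(euro_d_score(answers))
```

A respondent answering yes to three items thus receives a score of 3, and the score can never fall outside the range from 0 to 12.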

The self-perceived health status and the Euro-D depression scale are both ordered variables. Finally, we will construct a continuous health variable. The SHARE data also contain several objective measures of health, both physical and mental. We try to form a complete picture of the respondent’s overall well-being from different subjective and objective health measures by creating a health index, which was first introduced by Bound et al. (1999). We follow an extended version of the approach taken by Coe and Zamarro (2012) and construct a health index based on the following health measures: the Euro-D depression scale, grip strength, limitations due to health, limitations with activities of daily living (ADL), physical inactivity, chronic diseases, mobility limitations, hospital stays, and the body mass index. Preferably, we would add information on smoking and heavy drinking behavior as well, but due to data limitations this is not possible. Information on smoking behavior is absent in waves 5 and 6. All waves contain information on drinking behavior, but some waves focus on the frequency of drinking whereas other waves are concerned with the units of alcohol consumed on one occasion. The data are too distinct to come up with one clear definition of heavy drinking.

A health index is constructed from the listed health measures. In particular, the following ordered probit model is estimated:

$$ H_{ij} = \beta_j L_{ij} + \gamma_j\,\text{male}_{ij} + \delta_j\,\text{age}_{ij} + \mu_{ij} \tag{5.1} $$

where $H_{ij}$ is the self-reported health status for individual $i$ from country $j$, ranging from excellent (1) to poor (5), $L_{ij}$ includes the individual objective and subjective measures of health as mentioned above, $\text{male}_{ij}$ is a dummy variable that takes value 1 if the respondent is a male and value 0 otherwise, and $\text{age}_{ij}$ is the age of the respondent at the time of interviewing. The gender dummy is included to correct for possible differences between men and women in the way that they rate their own health. The age variable is included to capture the effect that in general health deteriorates when people get older. No constant is included in $L_{ij}$. In order to specify a probit model, the errors are assumed to be normally distributed. To control for different means and cut points, equation 5.1 is estimated separately for each country in the sample. The estimated values of $\hat{\beta}_j$, $\hat{\gamma}_j$ and $\hat{\delta}_j$ are used to construct a health index for each individual by multiplying the estimated coefficients of equation 5.1 with the selected health measures, then taking the sum. The construction can be summarized as follows:

HIij = ˆjLij+ ˆj maleij+ ˆj ageij (5.2) where HIij is the health index for individual i from country j and maleij and ageij variables as defined for equation 5.1. This leads to a continuous measure of health that can be used in the outcome equation of the Heckman model as specified in chapter 4. The health measures included in the health index will now be discussed in more detail.
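The index construction in equation 5.2 can be illustrated with a minimal Python sketch. It assumes the ordered probit coefficients for one country have already been estimated elsewhere; the function name `health_index`, the measure names and all coefficient values below are hypothetical and chosen only for illustration, not taken from the estimation results.

```python
def health_index(measures, male, age, coef_measures, coef_male, coef_age):
    """Linear health index as in equation 5.2 (no constant term).

    measures      -- dict of health measures for one respondent
    coef_measures -- dict of estimated coefficients with the same keys
    """
    if set(measures) != set(coef_measures):
        raise ValueError("measures and coefficients must have the same keys")
    # Sum of coefficient times health measure over all measures
    linear_part = sum(coef_measures[name] * value
                      for name, value in measures.items())
    return linear_part + coef_male * male + coef_age * age


# Hypothetical country-specific coefficients (illustration only)
coef_measures_j = {"eurod": 0.10, "grip_strength": -0.02, "chronic_diseases": 0.25}
coef_male_j = -0.05
coef_age_j = 0.01

# Hypothetical respondent: a 60-year-old man
respondent = {"eurod": 3, "grip_strength": 40, "chronic_diseases": 2}
index_value = health_index(respondent, male=1, age=60,
                           coef_measures=coef_measures_j,
                           coef_male=coef_male_j,
                           coef_age=coef_age_j)
```

Because self-reported health is coded from excellent (1) to poor (5), a higher index value corresponds to worse health. In the actual analysis a separate set of coefficients would be used for each country.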

Euro-D depression scale

The Euro-D depression scale compares symptoms of depression across European countries as was discussed in subsection 5.2.1. It ranges from 0 (not depressed) to 12 (very depressed).

Maximum grip strength

The interviewer measures the respondent’s maximum handgrip strength, ranging from 0 to 100, with the aid of a dynamometer. This variable contains a lot of missing values, especially in the last three data waves. In total 10,620 values (11.56%) are missing. We assume that the grip strength of an individual decreases linearly over time and try to use linear interpolation to replace the missing values. Unfortunately this is not possible in most cases, either because the respondent only appears in one or two data waves or because one respondent has multiple missing values. Therefore, all respondents with missing grip strength values are deleted from the final sample.
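The interpolation step and the reason it often fails can be sketched as follows. This is an illustrative sketch only: it assumes the wave number serves as the time axis (interview dates could be used instead), and the function name `interpolate_grip` and the example values are hypothetical.

```python
def interpolate_grip(waves, values):
    """Fill interior gaps by linear interpolation over the wave numbers.

    A missing value (None) is filled only if an observed value exists
    both before and after it; all other gaps remain None.
    """
    filled = list(values)
    observed = [i for i, v in enumerate(values) if v is not None]
    for left, right in zip(observed, observed[1:]):
        span = waves[right] - waves[left]
        for k in range(left + 1, right):
            weight = (waves[k] - waves[left]) / span
            filled[k] = values[left] + weight * (values[right] - values[left])
    return filled


# Hypothetical respondent observed in waves 1, 2, 4, 5 and 6
waves = [1, 2, 4, 5, 6]
grip = [42.0, None, 36.0, None, None]

filled = interpolate_grip(waves, grip)
# The respondent stays in the sample only if no value is still missing
complete = all(v is not None for v in filled)
```

In this example the gap in wave 2 lies between two observed values and can be filled, whereas the trailing gaps in waves 5 and 6 cannot, so the respondent would still be dropped from the final sample.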

Limited due to health

Respondents are asked to what extent they have been limited because of a health problem in activities people usually do, for the past six months at least. Based on this information a dummy variable is created that takes value 1 if the respondent has been limited or severely limited and value 0 otherwise.

Physical inactivity

Respondents are asked how often they engage in vigorous physical activity such as sports, heavy housework, or a job that involves physical labor. A dummy variable is created based on this information that takes value 1 in case the respondent never engages in vigorous physical activities and takes value 0 otherwise.

Number of chronic diseases

Respondents are asked if they have been diagnosed by a doctor with chronic diseases, such as high blood pressure, a stroke, diabetes, chronic lung disease, cancer, Parkinson’s disease or osteoarthritis, that they are currently being treated for or bothered by. This variable counts the number of chronic diseases that a respondent has. It ranges from 0 to 12.

Number of mobility limitations

Respondents are asked whether they had difficulties doing any of the following everyday activities:

1. Walking 100 meters

2. Sitting for about two hours

3. Getting up from a chair after sitting for long periods

4. Climbing several flights of stairs without resting

5. Climbing one flight of stairs without resting

6. Stooping, kneeling, or crouching

7. Reaching or extending your arms above shoulder level

8. Pulling or pushing large objects like a living room chair

9. Lifting or carrying weights over 10 pounds/5 kilos, like a heavy bag of groceries

10. Picking up a small coin from a table

This variable counts the number of difficulties that a respondent has with these everyday activities, ranging from 0 to 10. Respondents were not asked to perform the activities listed above, hence the variable is a self-reported measure.

Number of ADL limitations

Respondents are asked whether they had difficulties doing any of the following daily activities during the last twelve months because of a physical, mental, emotional or memory problem:

1. Dressing, including putting on shoes and socks

2. Walking across a room

3. Bathing or showering

4. Eating, such as cutting up your food

5. Getting in or out of bed

6. Using the toilet, including getting up or down

This variable counts the self-reported number of limitations that a respondent has with these activities of daily living, hence it ranges from 0 to 6.

Hospital stay in previous year

Respondents are asked whether they have been in a hospital overnight during the last twelve months. This dummy variable takes value 1 if the respondent has been in a hospital overnight and takes value 0 otherwise.

Overweight and obesity

The body mass index (BMI), which is calculated as

$$\text{BMI} = \frac{\text{body mass (in kg)}}{\text{height}^2 \text{ (in meters)}},$$

is taken as an indicator for overweight and obesity. According to the World Health Organization, an adult is underweight if the BMI is below 18.5, has a normal weight if the BMI is between 18.5 and 25, is overweight if the BMI is between 25 and 30 and falls into the obese category if the BMI is over 30. Two dummy variables are included for being overweight and being obese.
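As a small illustration, the two BMI-based dummy variables could be derived as in the sketch below. The function names are hypothetical, and the treatment of the exact boundary values (a BMI of exactly 25 counted as overweight, exactly 30 as obese) is an assumption, since the text does not state how boundary cases are assigned.

```python
def bmi(mass_kg, height_m):
    """Body mass index: body mass in kg divided by squared height in meters."""
    if mass_kg <= 0 or height_m <= 0:
        raise ValueError("mass and height must be positive")
    return mass_kg / height_m ** 2


def weight_dummies(bmi_value):
    """Return the pair (overweight, obese) of 0/1 dummies.

    Assumed boundaries: overweight if 25 <= BMI < 30, obese if BMI >= 30.
    Underweight and normal weight form the reference category (0, 0).
    """
    overweight = 1 if 25 <= bmi_value < 30 else 0
    obese = 1 if bmi_value >= 30 else 0
    return overweight, obese


# Hypothetical respondents, all 1.75 m tall
normal = weight_dummies(bmi(65, 1.75))      # BMI of about 21.2
overweight = weight_dummies(bmi(80, 1.75))  # BMI of about 26.1
obese = weight_dummies(bmi(95, 1.75))       # BMI of about 31.0
```

The two dummies are mutually exclusive, so a respondent is never counted as both overweight and obese.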

5.2.2 Explanatory variables

Age

All individuals in the dataset are aged 50 or older. Since we focus on retirement we eliminate individuals who have never done paid work in their entire life or since they reached the age of 50.

Health generally deteriorates as people get older. For the oldest respondents this deterioration is likely to be due to their age rather than to their retirement, so leaving these respondents in the analysis sample might lead to biased results. Therefore all individuals aged 71 or older are left out.

Gender

A dummy variable male is included that takes value 1 if the respondent is male and 0 if the respondent is female.

Income

The SHARE data contain the annual net household income that aggregates at the household level all individual income components. Only the household income of wave 1 is given in gross terms instead of net terms, but SHARE provides an extra dataset where a conversion has been made from the gross household income to the net household income. The income consists of the sum of annual earnings from employment, old age pensions, early retirement pensions, survivor and war pensions, private occupational pensions, disability pension and benefits, unemployment benefits and insurance, payments from social assistance, sickness benefits and pensions, other regular payments from private pensions or private transfer, earnings from self-employment, income from rent or sublet, income from other household members, interest or dividend from bank account, bond, stock, and mutual funds.

To compare incomes across countries and waves we reduce the amount of heterogeneity. First, all incomes are converted to Euros by multiplying them with the nominal exchange rate of the interview year. To correct for differences in purchasing power between countries the income is then divided by the purchasing power parity rates of the interview year. Both the nominal exchange rates and purchasing power parity rates are provided by the SHARE data. Next, the income must be corrected for the number of household members. We follow the approach from Buhmann et al. (1988) and divide the household income by the square root of the number of persons in the household. Finally, we take the logarithm of the corrected income.

Education

Education is one of the most diverse international variables. To be able to compare different systems across countries the International Standard Classification of Education (ISCED) is used, which is already provided in the SHARE data. Waves 1, 2, and 4 contain a classification that was first introduced in 1997. In 2011 the ISCED classification was revised. Waves 5 and 6 therefore contain both the 1997 and 2011 ISCED classification. For consistency the 1997 classification is used for all waves, which classifies the individual’s education level into (UNESCO, 1997, p.19): (0) pre-primary level of education, (1) primary level of education, (2) lower secondary level of education, (3) upper secondary level of education, (4) post-secondary, non-tertiary education, (5) first stage of tertiary education or (6) second stage of tertiary education.

Marital status

A dummy variable married is created that takes value 1 if the respondent is married or has a registered partnership and takes value 0 otherwise. This variable is constructed from information on the respondent’s marital status. In each questionnaire respondents are asked about their marital status, with six possible answers: (1) married and living together with spouse, (2) registered partnership, (3) married, living separated from spouse, (4) never married, (5) divorced, (6) widowed. Hence the dummy variable takes value 1 for cases (1) to (3) and value 0 otherwise.

Wealth

All respondents are asked a variety of questions concerning their savings and investments. SHARE uses this information to construct some aggregated variables. One aggregated variable contains the total value of government and corporate bonds, stocks and mutual funds. A second one contains the total value of individual retirement accounts, contractual savings for housing and life insurance holdings. We define wealth as the total value of both aggregated variables together with the value of bank and other transaction accounts.

Like the income variable, the wealth variable is adapted to correct for differences between countries and waves. We multiply wealth with the nominal exchange rate of the interview year, divide it by the purchasing power parity rate of the same year and divide the total wealth by the square root of the number of persons in the household. Finally, we take the logarithm of the corrected wealth.
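The adjustment that is applied to both income and wealth can be summarized in a short Python sketch. The function name `adjust_amount` and the example figures are hypothetical. Returning `None` for non-positive amounts, for which the logarithm is undefined, is an assumption made for this sketch and not a rule taken from the data preparation described above.

```python
import math


def adjust_amount(amount, exchange_rate, ppp_rate, household_size):
    """Log of a household amount after the three corrections in the text.

    1. multiply by the nominal exchange rate (conversion to Euros)
    2. divide by the purchasing power parity rate
    3. divide by the square root of the number of household members
    """
    if exchange_rate <= 0 or ppp_rate <= 0 or household_size < 1:
        raise ValueError("rates must be positive and household size at least 1")
    corrected = amount * exchange_rate / ppp_rate / math.sqrt(household_size)
    if corrected <= 0:
        # Assumption: the logarithm is undefined here, so flag as missing
        return None
    return math.log(corrected)


# Hypothetical household of four with an annual net income of 40,000
log_income = adjust_amount(40000, exchange_rate=1.0, ppp_rate=1.25,
                           household_size=4)
# Hypothetical household without any financial wealth
log_wealth = adjust_amount(0, exchange_rate=1.0, ppp_rate=1.25,
                           household_size=4)
```

For the example household the corrected income equals 40,000 / 1.25 / 2 = 16,000, so `log_income` is the natural logarithm of 16,000.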

Number of children

The number of children of the respondent includes all natural children, fostered, adopted and stepchildren.

Number of grandchildren

The number of grandchildren of the respondent also includes grandchildren of spouse or partner from previous relationships.
