Predicting life expectancy for the Swedish pension system

Academic year: 2021

University of Groningen

Master’s Thesis Econometrics, Operations Research and Actuarial Studies Specialization: Actuarial Studies

Preface

This thesis marks the end of my studies in econometrics, with a specialization in Actuarial Science. I would therefore like to take the opportunity to express my gratitude to a number of people.

First of all, I would like to thank my supervisor, prof. dr. R.H. Koning. I appreciated that I could always drop by for a conversation or when I encountered problems. Furthermore, I want to thank him for his useful comments and feedback on my thesis.

Moreover, I would like to thank Nico for his support and trust in me. Last but not least I would like to thank my family and friends for their interest in my thesis.

Linda Swart

Contents

Preface
Contents
1 Introduction
2 Sweden’s pension system
  2.1 A new pension system
    2.1.1 Components of the national pension system
    2.1.2 Retirement age
    2.1.3 Contribution rates
    2.1.4 Buffer fund
    2.1.5 Transition rules
    2.1.6 Annual pension statement
  2.2 Indexation of pensions and calculating pension benefits
    2.2.1 Income index
    2.2.2 Balancing mechanism
    2.2.3 Balance growth
    2.2.4 Calculating pension benefits and indexing
  2.3 Current situation and risks
3 Definitions and data description
  3.1 Actuarial concepts and notation
  3.2 Mortality rates
  3.3 Data description
4 Forecasting methods
  4.1 Lee-Carter model
  4.2 Fit of LC model
  4.3 Criticisms on Lee-Carter method
  4.4 Improvements to LC model
    4.4.1 Lee-Miller variant
    4.4.2 Booth-Maindonald-Smith variant
5 Comparison between different methods
  5.1 Comparison of life expectancy
  5.2 Actual versus predicted values
  5.3 Comparison between methods
6 Conclusion
Bibliography
A Introduction to Sweden’s old pension system
  A.1 Old pension system
  A.2 Reform reasons and goals
    A.2.1 Reform reasons
    A.2.2 Reform goals
B Derivation of exposure to risk

Chapter 1

Introduction

In the last century life expectancy has increased rapidly through improvements in living conditions, medical care and working conditions. While life expectancy in Sweden was just above fifty years at the beginning of the twentieth century, it had risen to approximately eighty years by 2000 (Statistics Sweden, 2010). Furthermore, fertility rates declined. Clearly, pension systems are affected by these demographic developments.

In 1994, Sweden’s parliament decided to reform the pension system, because the existing system was expected to become unaffordable. This previous pension system is described in Appendix A. The new national pension system consists of an inkomstpension and a funded premium pension. The inkomstpension is a Notional Defined Contribution system, which means that individuals pay contributions and these contributions are credited to an individual virtual balance, together with notional interest. The inkomstpension is financed on a Pay-As-You-Go (PAYG) basis. PAYG financing means that current benefit obligations are financed with current contributions. When contributions exceed pension payments, money flows into the National Pension Funds (buffer funds) and vice versa.

What are the implications of using different forecasting models for life expectancy for the Swedish pension system?

In this thesis, we will compare different forecasting models. These models are the Lee-Carter (LC) model, two variants of the LC model and maximum likelihood estimation. In order to answer the research question, we have formulated a number of subquestions:

What are the effects of longevity in an NDC pension system?
What are the differences between the various forecasting methods?
How accurate are the life expectancy predictions?

Chapter 2

Sweden’s pension system

This chapter gives an overview of Sweden’s pension system. Firstly, in Section 2.1 we discuss a few important components of the new system, such as retirement age and the annual pension account. In Section 2.2 the indexation of pensions and the calculation of pension benefits are discussed. Finally, in Section 2.3 the current situation is described. For completeness, a description of Sweden’s previous pension system is given in Appendix A.

2.1 A new pension system

This section starts with an overview of the reform and some general properties of the national pension system in Section 2.1.1. Section 2.1.2 treats retirement age and the possibility of partial retirement, whereas in Section 2.1.3 the contribution rates are discussed. The role and the size of the buffer fund are considered in Section 2.1.4. To specify which group receives benefits according to the rules of the new and the old system, transition rules were formulated. These rules are discussed in Section 2.1.5. Finally, the annual account statement is treated in Section 2.1.6.

2.1.1 Components of the national pension system

In 1994, Sweden’s Parliament approved the public pension reform. Implementation of the reform started in 1995, and was completed in 1999 (Könberg, Palmer, and Sundén, 2006). Figure 2.1 gives a schematic overview of the national pension system. The new pension system includes an inkomstpension (Notional Defined Contribution, abbreviated as NDC) and a premium pension (Financial Defined Contribution, abbreviated as FDC). The difference between NDC and FDC systems is that NDC systems are financed on a Pay-As-You-Go (PAYG) basis while FDC systems are funded. PAYG financing means that current benefit obligations are financed with current contributions. Kok and Hollanders (2006) define funding as a financing system in which premiums are set aside and saved in funds until the time of the pension withdrawal.

of residence is needed after age 25. Pension disbursements for this pension type begin at age 65. Similarly to the previous pension system there are work-related supplementary pensions in the reformed system. For an overview we refer to the paper of Kok and Hollanders (2006).

Figure 2.1: Reformed Swedish pension system (source: www.ap3.se)

2.1.2 Retirement age

In Sweden there is no mandatory retirement age. From the age of 61 it is possible to retire, either fully or partially. Since 2001, individuals have the right to work until the age of 67, which is an increase of 2 years in comparison to the old pension system. In addition, Sweden’s pension website www.pensionsmyndigheten.se states that, when the employer approves, it is possible to continue working after the age of 67.

With partial retirement, individuals can choose between the following options: a 75 percent benefit claim, a 50 percent benefit claim or a 25 percent benefit claim. Moreover, people can receive pension payments while they are still working. For example, people can work full-time and receive a 25 percent benefit claim as well, or work part-time and receive, for instance, a 75 percent benefit claim. As a result, notional capital is enlarged by contributions on earnings (Palmer, 2000). Clearly, when individuals retire early, pension benefits will be relatively low. The moment of retiring thus influences the size of pension benefits. Sweden’s pension website www.pensionsmyndigheten.se mentions that for each year that retirement is postponed, pension benefits (of notional pension capital) are increased by approximately 7 to 8 percent.

2.1.3 Contribution rates

18.5 percent of the pension base. Since 1999 the allocation is as follows: 16 percent is used for PAYG financing, and 2.5 percent is invested in funds.

The pension base consists of pension-qualifying income and pension-qualifying amounts. The former consists of salaries and so-called transfer payments, which are social insurance and unemployment benefits. Individuals pay contributions of 7 percent on these benefits. Pension-qualifying amounts are meant to compensate individuals for child-care years, study, and national service, and to compensate individuals who receive a disability pension (National Social Insurance Board, 2001). Figure 2.2 gives the allocation between these categories in 2000. Approximately 83 percent of the pension credit base consists of earned income.

Figure 2.2: Pension credit base (source: National Social Insurance Board, 2001)

Employers pay contributions on individuals’ total earnings, whereas employees pay contributions only below a certain ceiling. The contribution for employers is 10.21 percent of any employee’s income. Employer contributions for earnings above the ceiling do not give individuals a right to pension credit. These contributions are considered a tax and are transferred to the central-government budget (Swedish Social Insurance Agency, 2008).

Vidlund (2009) explains that two different base amounts are used to determine pension-qualifying income: an income-related base amount and a price-related base amount. The former is in line with an earnings index that follows the development in wages (inkomstindex), whereas the latter is in line with consumer price developments. In 2008, the price-related base amount was SEK 41 000 and the income-related base amount was SEK 48 000. Each year individuals accrue pension rights when their income exceeds 42.3 percent of the price-related base amount (prisbasbeloppet). In 2008, this was 0.423 × 41 000 = SEK 17 343 (Swedish Social Insurance Agency, 2008). When income exceeds 8.07 times the income-related base amount, i.e. when income is above 8.07 × 48 000 = SEK 387 360, a ceiling in terms of pension rights is reached. Due to this ceiling, any income above SEK 387 360 will not count towards pension rights. In brief, pension credit is based on all earnings below the ceiling, given that income exceeds a certain threshold.
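The threshold and ceiling rule can be illustrated with a minimal Python sketch. The function and constant names are hypothetical, the base amounts are the 2008 figures quoted above, and the sketch assumes that once income exceeds the threshold, all income up to the ceiling counts towards pension rights.

```python
# Base amounts for 2008 as quoted in the text (SEK).
PRICE_BASE_AMOUNT = 41_000    # price-related base amount (prisbasbeloppet)
INCOME_BASE_AMOUNT = 48_000   # income-related base amount

THRESHOLD_FACTOR = 0.423      # share of the price-related base amount
CEILING_FACTOR = 8.07         # multiple of the income-related base amount


def pension_qualifying_income(income: float) -> float:
    """Return the part of annual income that counts towards pension rights.

    Assumption (not spelled out in the text): below the threshold nothing
    counts, above it all income up to the ceiling counts.
    """
    threshold = THRESHOLD_FACTOR * PRICE_BASE_AMOUNT
    ceiling = CEILING_FACTOR * INCOME_BASE_AMOUNT
    if income <= threshold:
        return 0.0
    return min(income, ceiling)


if __name__ == "__main__":
    for income in (10_000, 200_000, 500_000):
        print(income, round(pension_qualifying_income(income)))
```

Under these assumptions an income of SEK 500 000 is capped at the ceiling of roughly SEK 387 360, while an income of SEK 10 000 earns no pension credit.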

a contribution of 18.5 percent of these amounts is paid by the government (National Social Insurance Board, 2001). General tax revenue is used to finance these contributions.

In conclusion, employers pay 10.21 percent of each employee’s income, while employees themselves pay 7 percent on earnings. This makes a total of 17.21 percent, while the contribution rate is 18.5 percent. The explanation for this discrepancy is that individual pension contributions are deducted from pension-qualifying income. The corresponding calculation is as follows: $\frac{17.21}{100 - 7} \times 100 = \frac{17.21}{93} \times 100 \approx 18.5$ (Swedish Social Insurance Agency, 2008).
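The same arithmetic can be checked with a few lines of Python. This is only a sketch of the calculation described above; the function name is hypothetical.

```python
EMPLOYER_RATE = 10.21  # percent of income, from the text
EMPLOYEE_RATE = 7.0    # percent of income, from the text


def effective_contribution_rate(employer: float, employee: float) -> float:
    """Total contributions as a percentage of income net of the employee's share."""
    total = employer + employee       # 17.21 percent of gross income
    net_base = 100.0 - employee       # pension-qualifying income per 100 of gross income
    return total / net_base * 100.0


if __name__ == "__main__":
    print(round(effective_contribution_rate(EMPLOYER_RATE, EMPLOYEE_RATE), 2))  # 18.51
```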

2.1.4 Buffer fund

The national pension system has a buffer fund which consists of 5 funds, known as the First, Second, Third, Fourth and Sixth National Pension Funds (Swedish Social Insurance Agency, 2008). The buffer fund serves as a financial buffer in case pension disbursements exceed contributions. The main goal of the buffer fund is to stabilize pension payments and/or pension contributions in relation to economic and demographic variations (Swedish Social Insurance Agency, 2008).

Palmer (2000) reported that in 2000 reserves in the buffer fund were approximately 40 percent of GDP. With these reserves 5 years of current pension disbursements could be financed, or equivalently, fund strength was 5 years. Fund strength is defined as fund size divided by pension disbursements for the same year. In 2009, the fund strength was 3.8 years. The assets of the buffer fund had a value of approximately SEK 707 087 million in 2008 (Swedish Social Insurance Agency, 2008). An overview of the changes in the value of the buffer fund in the last years is given in Figure 2.3.

Figure 2.3: Buffer fund and balance ratio (source: Swedish Pensions Agency, 2009)

fund is important for the financial stability there are new investment rules for the buffer fund. A maximum of 70 percent of the portfolio may be invested in equities and a maximum of 40 percent of the portfolio can be exposed to currency risk. It is possible that the buffer fund has a value lower than 0. In that case, the fund can borrow money. This takes place via the National Debt Office (Swedish Social Insurance Agency, 2008). The interest rate is assumed to be 1 percent.

2.1.5 Transition rules

Individuals born in or after 1954 are completely included in the new pension system, and persons born in or before 1937 receive their pension benefits according to the old benefit formulas. For individuals born between 1938 and 1953 transition formulas were created. This group receives benefits partly according to the rules of the new system and partly according to the rules of the old system. For example, someone born in 1953 receives $\frac{19}{20}$ of the benefit according to the new rules, and $\frac{1}{20}$ according to the rules of the old ATP system. For an individual born in 1950, these proportions are respectively $\frac{16}{20}$ and $\frac{4}{20}$ (National Social Insurance Board, 2003). For each year, the change in proportion is $\frac{1}{20}$.
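As an illustration, the transition rule can be written as a small function. The linear formula below is an assumption inferred from the two examples and the stated step of one twentieth per birth year; it is not quoted from the source, and the function name is hypothetical.

```python
from fractions import Fraction


def new_system_share(birth_year: int) -> Fraction:
    """Share of the benefit calculated under the new rules.

    Assumption: the share rises by 1/20 per birth year, so that the
    1953 cohort receives 19/20 and the 1950 cohort 16/20 (see text).
    """
    if birth_year <= 1937:
        return Fraction(0)          # old benefit formulas only
    if birth_year >= 1954:
        return Fraction(1)          # new system only
    return Fraction(birth_year - 1934, 20)


if __name__ == "__main__":
    for year in (1937, 1938, 1950, 1953, 1954):
        share = new_system_share(year)
        print(year, "new:", share, "old:", 1 - share)
```

Using `Fraction` keeps the shares exact, so the new and old shares always add up to one.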

2.1.6 Annual pension statement

From the year 2000, individual account statements are annually sent to the Swedish population in February/March (Könberg et al., 2006). The individual account statements are known as Orange Envelopes. An example of the contents of an Orange Envelope is given in Figure 2.4. It displays among other things the notional value of the inkomstpension and the premium pension.

Figure 2.4: Illustration of annual pension statement (source: www.forsakringskassan.se)

expected moment of retiring. Expected pension benefits are based on different assumptions; one of these is that individuals keep the same income pattern. An illustration of a projection with different retirement ages and various assumptions about future growth is given in Figure 2.5.

Figure 2.5: Pension projections for different retirement ages, and two growth rates (source: www.pensionsmyndigheten.se)

The account statements are based on data of 2 years earlier. Hence, the envelopes sent in 2009 are based on income of 2007 (Swedish Social Insurance Agency, 2008). Notice that the Orange Envelope does not represent the occupational pension most workers have. Thus for most workers, the account statement only represents part of their total pension. In order to provide workers with an overview of their total pensions, an annual statement for occupational pension is sent as well (www.pensionsmyndigheten.se/OrangeKuvertet.html).

2.2 Indexation of pensions and calculating pension benefits

calculation of pension benefits is discussed.

2.2.1 Income index

Pension accounts are usually revalued by the change in the income index. The income index follows the average rate of growth in the earnings of the labor force (Swedish Pensions Agency, 2009). The income index depends on two components. The first component is the average annual change in real wages. This is measured by the average annual change in average income over the last 3 years, where inflation is excluded. The second component is the inflation rate over the last 12 months, ending in June.

Changes in the income index affect both retirees and economically active individuals. Pension liabilities to retirees in year t are affected by changes in the income index between years t − 1 and t via adjustment indexation of inkomstpension and ATP disbursements (Swedish Social Insurance Agency, 2008). Economically active individuals in year t are affected by changes in the income index between years t and t + 1 via income indexation of pension balances.

2.2.2 Balancing mechanism

One of the reform goals is financial stability. The financial stability of the pension system is measured by a balance ratio, which was introduced in 2001 (Könberg et al., 2006). This ratio compares the pension system’s assets with its liabilities. When the ratio equals 1, the pension system is in financial balance; a ratio above 1 means that assets exceed liabilities. When the ratio is below 1, there is a financial imbalance and an automatic balance mechanism is activated (Könberg et al., 2006).

Activation of the balance mechanism means that account balances and pensions are indexed by the change in the balance index instead of the change in the income index (Swedish Social Insurance Agency, 2008). The balance index is calculated by multiplying the income index by the balance ratio. For example, suppose that the balance ratio declines to 0.99 while the income index increases from 1.00 to 1.04. In that case balancing is activated, and indexation becomes 2.96 percent instead of 4 percent (0.99 × 1.04 = 1.0296).

Thus, when the balance ratio is below 1, pension benefits and account balances are indexed by average income growth times the balance ratio. This corresponds to lower indexation than if the income index were used. If the balance ratio exceeds 1 in a period when the balance mechanism is activated, pension benefits and account balances are indexed at a higher rate than average income growth (Settergren, 2001). The automatic balance mechanism is deactivated when the balance index reaches the level of the income index (Swedish Social Insurance Agency, 2008).
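The one-year effect of balancing can be sketched in Python as follows. The function name and the `balancing_active` flag are hypothetical, and the sketch covers a single indexation step only; it does not track when the balance index catches up with the income index again.

```python
def indexation_factor(income_index_prev: float,
                      income_index_new: float,
                      balance_ratio: float,
                      balancing_active: bool = False) -> float:
    """Factor by which balances and pensions are revalued in one year.

    Balancing applies when the balance ratio is below 1, or when the
    mechanism is still active from an earlier year (assumed to be
    signalled by the caller through `balancing_active`).
    """
    income_growth = income_index_new / income_index_prev
    if balance_ratio < 1.0 or balancing_active:
        return income_growth * balance_ratio   # change in the balance index
    return income_growth                       # change in the income index


if __name__ == "__main__":
    factor = indexation_factor(1.00, 1.04, 0.99)
    print(f"indexation: {100 * (factor - 1):.2f} percent")
```

With the figures from the example above, the factor is 0.99 × 1.04 = 1.0296, i.e. indexation of 2.96 percent rather than 4 percent.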

Figure 2.6: Balancing (source: Swedish Social Insurance Agency, 2008)

Note that balancing is activated every time the balance ratio has a value lower than 1. A ratio below 1 can have several causes: temporary downturns, but also economic and demographic developments. Balancing can give a negative signal to the public and therefore it should not be activated unnecessarily (Könberg et al., 2006). For the calculation of the balance ratio a three-year moving average is used. An overview of the balance ratio of the last few years is given in Figure 2.3.

2.2.3 Balance growth

Each year, adjustments are made to the pension account. Contributions, accrued interest, and inheritance gains are added to the account, and a charge for administrative costs is subtracted (Swedish Social Insurance Agency, 2008). The accrued interest is usually based on the growth in average income, which is measured by the income index. When the automatic balance mechanism is activated, the income index is replaced by the balance index. This is explained in Sections 2.2.1 and 2.2.2. Inheritance gains are the pension balances of deceased persons. Those gains are redistributed to individuals of the same birth cohort. In practice, this redistribution takes the form of a percentage increase in pension balances by means of an inheritance gain factor (Swedish Social Insurance Agency, 2008).

2.2.4 Calculating pension benefits and indexing

An example is given by the Swedish Pensions Agency (2009). Individuals that retire at the age of 65 have a remaining life expectancy of approximately 19 years. Because of the interest rate of 1.6 percent, the annuity divisor is reduced to 16. If someone has an inkomstpension account of SEK 2.5 million, he will receive a pension of SEK 156,250 (2,500,000/16) per year. For individuals that have withdrawn their pension before the age of 65, the disbursed amount is recalculated in the year when age 65 is reached. The reason for the recalculation is to incorporate changes in life expectancy. The final annuity divisor for a cohort is determined when the cohort reaches the age of 65 (Swedish Pensions Agency, 2009). Thus, changes in life expectancy do not influence pension benefits after the age of 65 (Könberg et al., 2006).

For the indexation of pension benefits after retirement, normally an income index is used (see Section 2.2.1). Each year pension benefits are recalculated by the ratio between the new and the previous income index (Swedish Social Insurance Agency, 2008). This ratio is divided by 1.016, which reflects the interest rate used in the annuity calculation. Pension benefits do not change in real terms if the actual rate of growth is exactly 1.6 percent more than inflation, where inflation is measured by the Consumer Price Index (National Social Insurance Board, 2002). For example, if wages and salaries increase by 2 percent more than inflation, pension benefits will increase by 0.4 percent in real terms ($(\frac{1.02}{1.016} - 1) \times 100 = 0.39$ %). If wages and salaries increase by 1 percent more than inflation, pensions will decrease by 0.6 percent in real terms ($(\frac{1.01}{1.016} - 1) \times 100 = -0.59$ %). An exception is made whenever the balance mechanism is activated: in this case, not the income index but the balance index is used to recalculate pension benefits.
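The two calculations in this example, the initial pension and the yearly adjustment indexation, can be reproduced with a short Python sketch. The function names are hypothetical, and the annuity divisor is taken as given rather than derived from a life table.

```python
NORM_RATE = 0.016  # interest rate built into the annuity divisor (from the text)


def annual_pension(notional_balance: float, annuity_divisor: float) -> float:
    """Yearly inkomstpension: account balance divided by the annuity divisor."""
    return notional_balance / annuity_divisor


def real_pension_change(real_wage_growth: float, norm: float = NORM_RATE) -> float:
    """Real change in pension benefits, in percent, for a given real wage growth.

    Benefits follow the income index but are reduced by the norm of 1.6
    percent that was already credited in the annuity divisor.
    """
    return ((1.0 + real_wage_growth) / (1.0 + norm) - 1.0) * 100.0


if __name__ == "__main__":
    print(annual_pension(2_500_000, 16))         # 156250.0
    print(round(real_pension_change(0.02), 2))   # 0.39
    print(round(real_pension_change(0.01), 2))   # -0.59
```

When real wage growth equals the norm of 1.6 percent, the real change is zero, which matches the statement that benefits are then constant in real terms.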

2.3 Current situation and risks

In this section the current situation is described, and risks are discussed. The balance mechanism was activated in both 2009 and 2010. At the start of 2010, the balance index decreased by 1.4 percent (Swedish Pensions Agency, 2009). Therefore, the inkomstpension credit of the economically active was lowered by this percentage, and the pension for retirees was reduced by 3.0 percent (1.4 + 1.6). The reason for balancing was the global financial crisis, which led to losses for the buffer fund. In 2009, the fund strength was calculated as 3.8 years. This means that without additional contributions or a higher return on the buffer fund, 3.8 years of pension disbursements could be financed by the fund.

Kok and Hollanders (2006) note that longevity risk is passed to the retiring generations, as pension size depends on the life expectancy of the retiring generation. Individuals can choose to continue working (partially), thereby enlarging their notional capital and increasing the resulting pension. The risk of low economic growth is shared by the retirees and the labor force, as both pensions and notional capital are indexed with average growth in real earnings.

possibly affected through the adjusted indexation.

According to the National Social Insurance Board (2001), the new pension system is generationally fair. This is due to the fixed contribution rate, average income as the basis for compounding in the system, adjustment of pension levels to changes in average life span before the age of 65, absence of any adjustment thereafter, the buffer fund, and automatic balancing.

Figure 2.7: Life expectancy and retirement age (source: National Social Insurance Board, 2001)

Figure 2.8: Life expectancy and retirement age (source: Swedish Pensions Agency, 2009)

Chapter 3

Definitions and data description

In this chapter various concepts and definitions are discussed, which are used in Chapter 4. Specifically, we start in Section 3.1 with a discussion of actuarial concepts, and continue with an explanation of Lexis diagrams and mortality rates in Section 3.2. In Section 3.3 a description of the Swedish mortality data is given; these data are used in Chapters 4 and 5 to forecast life expectancy.

3.1 Actuarial concepts and notation

This section explains how to convert mortality rates into probabilities of death. These probabilities can be used to calculate other life table measures, such as remaining life expectancy. Before explaining this, we first introduce some actuarial concepts.

$T(x)$ is a random variable that measures the remaining lifetime of an individual aged $x$. The probability distribution function of $T(x)$ is given by:

$$G(t) = \Pr(T(x) \le t), \quad t \ge 0 \tag{3.1}$$

where the function $G(t)$ represents the probability that a person aged $x$ will die within $t$ years. We assume that $G$ is continuous, with probability density function $g(t) = G'(t)$ (Gerber, 1997). To express survival and death probabilities, the notation as used by the international actuarial community is given. The symbol ${}_tq_x$ denotes the probability that an individual aged $x$ will die within $t$ years. On the contrary, the symbol ${}_tp_x$ denotes the probability that an individual aged $x$ will survive at least $t$ years. We have the following relations:

$${}_tq_x = G(t) \tag{3.2}$$

$${}_tp_x = 1 - G(t) \tag{3.3}$$

For $t = 1$, these probabilities are expressed as $q_x$ and $p_x$. The symbols $q_x$ and $p_x$ could also be expressed in terms of $l_x$, where $l_x$ is the number of survivors to age $x$ from the $l_0$ newborns.

To be able to derive a relation between mortality rates ($m_x$) and probabilities of death ($q_x$) the following definitions are required. $m_x$ denotes the central death rate or mortality rate at age $x$. $L_x$ is the total expected number of years lived between ages $x$ and $x + 1$ by $l_x$ lives, and $a(x)$ is defined as the average time lived within the age interval $[x, x + 1)$ for individuals dying at age $x$. Now the following equations are given (Bowers, Gerber, Hickman, Jones, and Nesbitt, 1997):

$$L_x = a(x)\,l_x + [1 - a(x)]\,l_{x+1} \tag{3.6}$$

$$m_x = \frac{l_x - l_{x+1}}{L_x} \tag{3.7}$$

From equation (3.6) we find:

$$a(x)\,l_x = L_x - [1 - a(x)]\,l_{x+1}$$

$$a(x)\,l_x - [a(x) - 1]\,l_x = L_x - [1 - a(x)]\,l_{x+1} - [a(x) - 1]\,l_x$$

$$l_x = L_x - [1 - a(x)]\,l_{x+1} + [1 - a(x)]\,l_x$$

$$l_x = L_x + [1 - a(x)](l_x - l_{x+1}) \tag{3.8}$$

In order to express $q_x$ in terms of $m_x$, we use equation (3.4), divide both numerator and denominator by $L_x$, and use equations (3.7) and (3.8) to find:

$$q_x = \frac{l_x - l_{x+1}}{l_x} = \frac{(l_x - l_{x+1})/L_x}{l_x/L_x}$$

$$q_x = \frac{m_x}{\bigl[L_x + [1 - a(x)](l_x - l_{x+1})\bigr]/L_x} = \frac{m_x}{1 + [1 - a(x)]\,m_x}$$

We assume a uniform distribution of deaths, and therefore $a(x) = \frac{1}{2}$, which gives:

$$q_x = \frac{m_x}{1 + \frac{1}{2} m_x} \tag{3.9}$$

Since deaths of newborns often occur in the first few weeks of life, the assumed uniform distribution of deaths does not hold for individuals aged 0. The corresponding formula of $a(0)$ is $a(0) = 0.07 + 1.7\,m_0$ (Booth, Tickle, Hyndman, Maindonald, and Miller, 2010).
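The conversion from central death rates to death probabilities can be sketched in Python as follows. The function names are hypothetical; the formulas are the ones derived above, with $a(x) = \frac{1}{2}$ as the default and the separate rule for age 0.

```python
def a_zero(m_0: float) -> float:
    """Average time lived in [0, 1) by those dying at age 0 (Booth et al., 2010)."""
    return 0.07 + 1.7 * m_0


def death_probability(m_x: float, a_x: float = 0.5) -> float:
    """Convert a central death rate m_x into a one-year death probability q_x.

    Implements q_x = m_x / (1 + [1 - a(x)] m_x); the default a_x = 0.5
    corresponds to a uniform distribution of deaths, equation (3.9).
    """
    return m_x / (1.0 + (1.0 - a_x) * m_x)


if __name__ == "__main__":
    m_0, m_80 = 0.003, 0.06   # illustrative rates, not taken from the data
    print(death_probability(m_0, a_zero(m_0)))
    print(death_probability(m_80))
```

Note that under the uniform assumption a death rate of 2 maps to a probability of exactly 1, so observed rates above 2 would give probabilities above 1 and need separate treatment.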

The expected value of the random variable $T(x)$ is $E[T(x)] = \mathring{e}_x$. This is the expected remaining lifetime of an individual aged $x$.

$K(x)$ is the curtate future lifetime of an individual aged $x$. To be more precise, it is the number of future years completed before death of an individual aged $x$ (Gerber, 1997). The expected value of the discrete random variable $K(x)$ is denoted by $e_x$.

$$e_x = E[K(x)] = \sum_{k=0}^{\infty} k \Pr(K(x) = k) = \sum_{k=0}^{\infty} \sum_{j=1}^{k} 1 \cdot \Pr(K(x) = k) = \sum_{j=1}^{\infty} \sum_{k=j}^{\infty} \Pr(K(x) = k) = \sum_{j=1}^{\infty} \Pr(K(x) \ge j) = \sum_{j=1}^{\infty} {}_jp_x \tag{3.11}$$

The expected future lifetime of an individual aged $x$ can be calculated using the following approximation: $\mathring{e}_x \approx e_x + \frac{1}{2}$.
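Equation (3.11) translates directly into a running product of one-year survival probabilities. The sketch below uses hypothetical function names and assumes that the life table is closed at its last age, so that survival beyond the final entry contributes nothing.

```python
from typing import Sequence


def curtate_life_expectancy(q: Sequence[float]) -> float:
    """e_x as the sum over j >= 1 of the j-year survival probabilities.

    q[j] is the one-year death probability at age x + j. Assumption: the
    table ends after the last entry, so later survival terms are zero.
    """
    e_x = 0.0
    survival = 1.0
    for q_age in q:
        survival *= 1.0 - q_age   # probability of surviving one more year
        e_x += survival
    return e_x


def complete_life_expectancy(q: Sequence[float]) -> float:
    """Approximation of the complete expectation: e_x + 1/2."""
    return curtate_life_expectancy(q) + 0.5


if __name__ == "__main__":
    toy_table = [0.5, 1.0]   # toy values, chosen only to make the sum easy to check
    print(curtate_life_expectancy(toy_table), complete_life_expectancy(toy_table))
```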

3.2 Mortality rates

This section gives a definition of mortality, and explains the function of Lexis diagrams. Mortality could be measured by the absolute number of deaths. However, such a measure depends heavily on population size, which makes comparisons over space and time difficult (Pressat, 1972). Therefore, mortality is usually measured by age-specific death rates, i.e. mortality rates. Mortality rates can be forecast using extrapolative methods, which will be explained in Section 4.1. Before a definition of age-specific death rates is stated, first a definition of a period rate is given (Preston, Heuveline, and Guillot, 2001). This demographic rate can be adapted to several events, for instance, the number of deaths or the number of births.

$$\text{Rate}\,[0, T] = \frac{\text{Number of events between time } 0 \text{ and } T}{\text{Person-years lived in the population between time } 0 \text{ and } T}$$

A person-year is 1 year lived by 1 person. Hence, if a person lives 1 week between time 0 and $T$, his contribution to the total person-years will be $\frac{1}{52}$ year.

As one might expect, in the definition of death rate, the number of events is replaced by the number of deaths. Hinde (1998) defines the total death rate in the following way:

$$\text{Total death rate} = \frac{\text{Number of deaths in a specified time period}}{\text{Number of people exposed to the risk of dying during that time period}}$$

The denominator is known as exposure, or exposure-to-risk. Since the number of deaths in the population varies with age, it is more interesting to calculate age-specific death rates, which are denoted by $m_{x,t}$. A definition of death rates is given here:

$$m_{x,t} = \frac{\text{Number of deaths aged } x \text{ in year } t}{\text{Exposure-to-risk of individuals aged } x \text{ in year } t} = \frac{D(x, t)}{E(x, t)} \tag{3.12}$$

Figure 3.1: Lexis diagram (source: www.demog.berkeley.edu)

individual’s life, see Figure 3.1. The starting point of a lifeline in a Lexis diagram is the moment of birth or migration into the population (◦), whereas end points are death (x) or emigration (•). Consider the white cell in Figure 3.1. When all exact lifelines of individuals are known, the age-specific death rate ($m_{x,t}$) for this cell could be calculated by dividing the number of deaths by the exposure-to-risk. Because lifelines are represented in a two-dimensional diagram, the length of 1 year is in fact $\sqrt{2}$. The exposure-to-risk could be calculated by adding the lengths of all lifelines in a cell and dividing this number by $\sqrt{2}$. Hence, exposure is measured as the total number of person-years. Nevertheless, this kind of data about individual lifelines is most often not available (Pressat, 1972).

Figure 3.2: Lexis diagram specified by period-age (top left), period-cohort (top right), and cohort-age (bottom) (source: Wilmoth et al., 2007)

3.3 Data description

In the Lee-Carter method (discussed in Section 4.1) we need data on death rates for age $x$ in year $t$. Hence, considering the explanation given in Section 3.2 about different types of death counts, we have to use the period-age type of observations. We use data from the Human Mortality Database. Note that time $t$ ($t = 1, \ldots, T$) and age $x$ ($x = 1, \ldots, n$) are discrete here. This means that a person aged $x$ has an exact age within the interval $[x, x + 1)$ and if an event occurs in year $t$ it happens within the interval $[t, t + 1)$ (Wilmoth et al., 2007).

Since we use data from the Human Mortality Database, we will also use the definition of age-specific death rates as used by the Human Mortality Database. In order to arrive at this definition, we first return to equation (3.12), which is the definition of the observed death rate. $D(x, t)$ is the number of deaths at age $x$ in year $t$. Obviously, $D(x, t) = D_L(x, t) + D_U(x, t)$

$D_L(x, t)$ is the number of deaths of birth cohort $t - x$ in year $t$ at age $x$. $E(x, t)$ is defined as the total person-years lived in the age interval $[x, x + 1)$ during calendar year $t$ (Wilmoth et al., 2007). $E(x, t)$ is also considered as the average population for a specific time period, in our case a year, or mid-year population. Finally, $P(x, t)$ is the number of individuals aged $x$ on January 1st of year $t$.

From equation (3.12) the definition of age-specific death rates as used by the Human Mortality Database follows in equation (3.13):

$$m_{x,t} = \frac{D(x, t)}{E(x, t)} = \frac{D_L(x, t) + D_U(x, t)}{\frac{1}{2}[P(x, t) + P(x, t + 1)] + \frac{1}{6}\left(D_L(x, t) - D_U(x, t)\right)} \tag{3.13}$$

In line with equation (3.12), death rates are measured by the number of deaths divided by the exposure-to-risk. In many papers the population exposed to the risk of death in a year is estimated by the average of two annual population estimates ($\frac{1}{2}[P(x, t) + P(x, t + 1)]$). The Human Mortality Database however adds a small correction that reflects the timing of deaths during the interval. A precise derivation of the exposure-to-risk is given in Appendix B, and can also be found in Appendix E of Wilmoth et al. (2007).
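Equation (3.13) can be sketched in Python as below. The function and argument names are hypothetical, and the numbers in the usage example are purely illustrative; they are not taken from the Human Mortality Database.

```python
def exposure_to_risk(p_start: float, p_end: float,
                     d_lower: float, d_upper: float) -> float:
    """Exposure E(x, t) as in the denominator of equation (3.13).

    p_start and p_end are the January 1st populations P(x, t) and
    P(x, t + 1); d_lower and d_upper are the death counts D_L and D_U.
    """
    return 0.5 * (p_start + p_end) + (d_lower - d_upper) / 6.0


def death_rate(p_start: float, p_end: float,
               d_lower: float, d_upper: float) -> float:
    """Age-specific death rate m_{x,t} = D(x, t) / E(x, t)."""
    deaths = d_lower + d_upper
    return deaths / exposure_to_risk(p_start, p_end, d_lower, d_upper)


if __name__ == "__main__":
    # Illustrative inputs only.
    print(exposure_to_risk(1000, 980, 14, 8))   # 991.0
    print(death_rate(1000, 980, 14, 8))
```

When $D_L = D_U$ the correction term vanishes and the exposure reduces to the simple average of the two population counts.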

To select time-series of Sweden’s central death rates, we use data of the Human Mortality Database (unisex, for years 1950 until 2007). We have chosen 1950 as our starting date because the registration of the population was re-organized in 1946, with the changes taking effect in 1950. We have chosen 2007 as our end date because there is no data available beyond this year.

To be able to work with the data we have chosen to set the maximum age at 100 instead of 110, for the following reason. Observed death rates at older ages can fluctuate considerably, because both the number of deaths and the exposure-to-risk become small. As a result, at older ages the death rate can exceed the value 1 because the exposure-to-risk can become smaller than the number of deaths. For example, consider 5 individuals aged 103 on January 1st. Three individuals die one week later, while 2 will live for at least another year. The number of deaths is 3, and the exposure-to-risk will be $2 + 3 \cdot \frac{1}{52}$. Hence $m_{103,t} = \frac{3}{2 + \frac{3}{52}} \approx 1.46$, which exceeds 1.


Chapter 4

Forecasting methods

In this chapter several methods to calculate future mortality rates and the corresponding life expectancy are discussed. Since the Lee-Carter (LC) model is generally considered to be the seminal model in mortality forecasting, we start by investigating this method in Section 4.1. Section 4.2 treats the fit of the LC model. Criticisms of the LC method are discussed in Section 4.3, and in Section 4.4 two improvements of the LC model are introduced. Finally, in Section 4.5 maximum likelihood estimation is treated. To work with the data we use the statistical program R, in particular the packages ’demography’ and ’gnm’ (the latter stands for generalized nonlinear models).

4.1 Lee-Carter model

This section treats a forecasting method for mortality introduced by Lee and Carter (LC) in 1992. In short, they specify a log-bilinear model for the central death rates (Brouhns, Denuit, and Vermunt, 2002). More precisely, Lee and Carter proposed a simple two-factor (age and time) model, in which singular value decomposition is used to extract a single time-varying mortality index, as stated by Booth, Hyndman, Tickle, and de Jong (2006). Next, this index is forecasted using an ARIMA time series model. Subsequently, using this forecasted mortality index and the estimates of the age-specific parameters, the corresponding future death rates can be calculated. Furthermore, life expectancy forecasts can be derived. Notice that possible changes in future medical or social influences are not taken into account, meaning that mortality rates are forecasted by extrapolating past trends.

Here, the LC method is discussed in more detail. Let $m_{x,t}$ denote the central death rate for age $x$ in year $t$; together these rates form a matrix. Death rates are also known as age-specific mortality rates. The death rates are fitted by the following model (Lee and Carter, 1992):

$$\ln m_{x,t} = a_x + b_x k_t + \varepsilon_{x,t} \tag{4.1}$$

or equivalently

$$m_{x,t} = e^{a_x + b_x k_t + \varepsilon_{x,t}} \tag{4.2}$$

The parameters have the following interpretation: $a_x$ is the age-specific average level of mortality (Statistics Sweden, 2009b), $b_x$ are the age-specific weights for trends over time ($\frac{d \ln m_{x,t}}{dt} = b_x \frac{dk_t}{dt}$), $k_t$ is the trend over time in the mortality rate and $\varepsilon_{x,t}$ is an error term with


Notice that (4.1) is an underdetermined model. From (4.1) we see that an observationally equivalent model is given by:

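$$\ln m_{x,t} = (a_x - b_x \gamma) + \delta b_x \left( \frac{k_t + \gamma}{\delta} \right) + \varepsilon_{x,t}, \qquad \gamma \in \mathbb{R},\ \delta \in \mathbb{R},\ \delta \neq 0 \tag{4.3}$$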

Hence, we need to impose two restrictions to identify the parameters: $\sum_{x=1}^{n} b_x = 1$ and $\sum_{t=1}^{T} k_t = 0$. Due to the last expression, parameter $a_x$ can be estimated as follows: $\hat{a}_x = \frac{1}{T} \sum_{t=1}^{T} \ln m_{x,t}$.
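The transformation in (4.3) can be used to bring any parameter set to the normalized form. The Python sketch below is only an illustration with a hypothetical function name and hypothetical numbers: it picks $\gamma$ and $\delta$ such that both restrictions hold, and the fitted values $a_x + b_x k_t$ are unchanged. It assumes that the $b_x$ do not sum to zero.

```python
def normalize_lc(a, b, k):
    """Rescale (a, b, k) so that sum(b) = 1 and sum(k) = 0.

    Uses (4.3) with gamma = -mean(k) and delta = 1 / sum(b);
    assumes sum(b) != 0.
    """
    b_sum = sum(b)
    k_mean = sum(k) / len(k)
    a_new = [ax + bx * k_mean for ax, bx in zip(a, b)]   # a_x - b_x * gamma
    b_new = [bx / b_sum for bx in b]                     # delta * b_x
    k_new = [b_sum * (kt - k_mean) for kt in k]          # (k_t + gamma) / delta
    return a_new, b_new, k_new


# Hypothetical, non-normalized parameters for two ages and three years.
a0, b0, k0 = [-5.0, -3.0], [0.5, 1.5], [2.0, 0.0, -1.0]
a1, b1, k1 = normalize_lc(a0, b0, k0)
```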

Lee and Carter state that the parameter estimates are chosen in such a way that they minimize the squared deviations from a given matrix of age-specific rates. Thus, $\hat{b}_x$ and $\hat{k}_t$ minimize:

$$\sum_{x=1}^{n} \sum_{t=1}^{T} \left[ \ln m_{x,t} - \hat{a}_x - b_x k_t \right]^2 \tag{4.4}$$

Since there is no observed variable on the right-hand side of (4.1), ordinary least squares cannot be used to find parameter estimates (Lee, 2000). However, when applying singular value decomposition (SVD) to the $(n \times T)$ matrix $N$, where $N(x, t) = \ln m_{x,t} - \hat{a}_x$, a least squares solution can be obtained. Thus, using singular value decomposition on matrix $N$, $\hat{b}_x$ and $\hat{k}_t$ can be found as shown below.

Note that the numbers $\sigma_1 \geq \sigma_2 \geq \dots \geq \sigma_r \geq 0$ are the singular values of $N$, $u_1, u_2, \dots, u_r$ and $v_1, v_2, \dots, v_r$ are the corresponding left and right singular vectors of $N$, and $r = \mathrm{rank}(N)$ (Koissi, Shapiro, and Högnäs, 2006; Plomp, 2009). $U$ and $V$ are orthogonal matrices, i.e. $U^T U = I$ and $V^T V = I$. Applying SVD on matrix $N$ leads to:

$$N = \sum_{i=1}^{r} \sigma_i u_i v_i^T \tag{4.5}$$

The ratio between the square of the $i$-th singular value and the sum of squares of all singular values ($\sigma_i^2 / \sum_{j=1}^{r} \sigma_j^2$) gives the proportion of total variance explained by using the $i$-th term in the SVD. Alho (2000) mentions that the one-dimensional approximation is empirically adequate for mortality in a large number of industrialized countries.
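For illustration, the proportions of explained variance follow directly from the singular values. The Python function below is a sketch with a hypothetical name, and the singular values in the example are made up.

```python
def explained_variance_shares(singular_values):
    """Share of total variance explained by each term of the SVD."""
    squares = [s * s for s in singular_values]
    total = sum(squares)
    return [sq / total for sq in squares]


# Hypothetical singular values sigma_1 = 4 and sigma_2 = 3.
shares = explained_variance_shares([4.0, 3.0])
```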

In accordance with this, Lee and Carter (1992) report that the first left and right singular vectors and the leading singular value of the SVD, after normalization, provide a unique solution. To be more precise, $\hat{b}_x$ equals the first column of matrix $U$ ($\hat{b}_x = u_1$), whereas $\hat{k}_t$ equals the first singular value of $N$ multiplied by the first column of matrix $V$ ($\hat{k}_t = \sigma_1 v_1$) (Koissi et al., 2006; Weerts, 2009). Note that $\hat{b}_x$ and $\hat{k}_t$ should be normalized such that $\sum_{x=1}^{n} b_x = 1$ and $\sum_{t=1}^{T} k_t = 0$.
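The first stage can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the implementation used in this thesis (which relies on R): instead of a full SVD routine, the leading singular triplet of $N$ is approximated by power iteration, the function name and the number of iterations are hypothetical, and the sketch assumes that $N$ is not identically zero and that the elements of $u_1$ do not sum to zero.

```python
import math


def first_stage_lee_carter(log_m, iterations=100):
    """Return (a, b, k) for log_m[x][t] = ln m_{x,t}, with sum(b) = 1."""
    n, T = len(log_m), len(log_m[0])
    a = [sum(row) / T for row in log_m]              # a_x: average over time
    N = [[log_m[x][t] - a[x] for t in range(T)] for x in range(n)]

    # Start vector: the (normalized) row of N with the largest norm.
    start = max(N, key=lambda row: sum(val * val for val in row))
    norm = math.sqrt(sum(val * val for val in start))
    v = [val / norm for val in start]

    sigma = 0.0
    for _ in range(iterations):
        # u <- N v / |N v|, then v <- N^T u / |N^T u|, sigma = |N^T u|
        u = [sum(N[x][t] * v[t] for t in range(T)) for x in range(n)]
        u_norm = math.sqrt(sum(val * val for val in u))
        u = [val / u_norm for val in u]
        w = [sum(N[x][t] * u[x] for x in range(n)) for t in range(T)]
        sigma = math.sqrt(sum(val * val for val in w))
        v = [val / sigma for val in w]

    scale = sum(u)                                   # normalization constant
    b = [val / scale for val in u]                   # sum(b) = 1
    k = [sigma * scale * val for val in v]           # product b_x * k_t unchanged
    return a, b, k
```

Because every row of $N$ sums to zero over $t$, the resulting $k_t$ automatically sum to zero (up to rounding), so only the rescaling of $b_x$ has to be done explicitly.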

Next, before we continue to forecast the time index $k_t$, the $k_t$’s are adjusted. Using the parameter estimates $\hat{a}_x$, $\hat{b}_x$, and $\hat{k}_t$, we can find the estimated central death rates $\hat{m}_{x,t}$, but this procedure can also be used in reverse. In fact, parameter $k_t$ is re-estimated such that


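Specifically, given estimated $\hat{a}_x$ and $\hat{b}_x$, $\tilde{k}_t$ is a solution to:

$$\sum_{x=1}^{n} D(x,t) = \sum_{x=1}^{n} E(x,t)\, \hat{m}_{x,t} = \sum_{x=1}^{n} E(x,t)\, e^{\hat{a}_x + \hat{b}_x \tilde{k}_t} \qquad \forall t \tag{4.6}$$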

This second stage estimation process is performed because the fitted death rates using the first stage estimates generally do not lead to the actual number of deaths (Lee and Carter, 1992). This is due to the fact that first stage estimation was based on log death rates instead of death rates (Lee, 2000). Note that the values of $\tilde{k}_t$ differ from the SVD estimates $\hat{k}_t$. The reason for this is that when fitting the log death rates all ages receive equal weight (Lee and Carter, 1992). Therefore, the second stage estimation process is required in order to re-estimate $k_t$. In fact, more weight is assigned to ages with higher death rates and larger age groups when determining $k_t$ (Plomp, 2009).
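As an illustration of (4.6), the re-estimation for a single year can be sketched in Python. The function names and the search bracket are hypothetical, and the sketch assumes that all $\hat{b}_x$ are positive, so that expected deaths are increasing in $k$ and bisection can be used; other root-finding methods would work as well.

```python
import math


def expected_deaths(k, a, b, exposure):
    """Total expected deaths in one year: sum over x of E(x,t) exp(a_x + b_x k)."""
    return sum(e * math.exp(ax + bx * k) for ax, bx, e in zip(a, b, exposure))


def reestimate_k(total_deaths, a, b, exposure, lo=-200.0, hi=200.0, tol=1e-10):
    """Solve equation (4.6) for one year by bisection (assumes all b_x > 0)."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if expected_deaths(mid, a, b, exposure) < total_deaths:
            lo = mid      # too few expected deaths: k must be larger
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)
```

Applying `reestimate_k` to each year in turn, with that year's exposures and observed total deaths, gives the adjusted series $\tilde{k}_t$.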

Figure 4.1: $\hat{a}_x$, $\hat{b}_x$ and $\tilde{k}_t$

In Figure 4.1, $\hat{a}_x$, $\hat{b}_x$ and $\tilde{k}_t$ are given. When analyzing the graphs, $\tilde{k}_t$ seems to be linear in time. Since we know mortality rates have declined over the last few decades, this is a logical trend. $a_x$ gives the age-specific average level of mortality. For babies mortality rates are relatively high. Starting at approximately age 30, death rates are increasing. Since the $b_x$ profile measures which rates will change rapidly or slowly in response to changes in $k_t$, ages below 20 are relatively sensitive to changes in $k_t$, whereas ages above 80 are relatively


4.2 Fit of LC model

To assess the quality of the fit, graphical inspection and $R^2$ are used. $R^2$ measures the proportion of variance explained by the model. We start with a graphical inspection. Figure 4.2 depicts actual log death rates ($\ln m_{x,t}$) and estimated log death rates ($\ln \hat{m}_{x,t}$), for the years 1950 until 2007. The actual rates are depicted by the solid lines, whereas the estimated death rates are displayed by dots. If we compare the fit for different ages the fit seems good, although a better fit is obtained for higher ages.

Figure 4.2: Actual and estimated log death rates


Figure 4.3: Actual and estimated age-specific log death rates

Goodness of fit can also be measured by the proportion of variance explained by the model. In Figure 4.4 the proportion of variance explained by the model is shown for all ages. This $R^2(x)$ is calculated as follows:

$$R^2(x) = 1 - \frac{\mathrm{RSS}(x)}{\mathrm{TSS}(x)} \tag{4.7}$$

$$R^2(x) = 1 - \frac{\sum_{t=1}^{T} \left( \ln m_{x,t} - \ln \hat{m}_{x,t} \right)^2}{\sum_{t=1}^{T} \left( \ln m_{x,t} - \hat{a}_x \right)^2} \tag{4.8}$$

RSS stands for the Residual, or unexplained, Sum of Squares, that is, the sum of squared deviations between the observed log mortality rates ($\ln m_{x,t}$) and the estimated log mortality rates ($\ln \hat{m}_{x,t}$). It measures the amount of variance that remains unexplained. TSS is the Total Sum of Squares: the sum of squared deviations between the observed log mortality rates ($\ln m_{x,t}$) and the average log mortality rate ($\frac{1}{T} \sum_{t=1}^{T} \ln m_{x,t} = \hat{a}_x$). TSS measures the variation in $\ln m_{x,t}$. Notice that the estimated log death rates are given by $\ln \hat{m}_{x,t} = \hat{a}_x + \hat{b}_x \tilde{k}_t$.
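Equation (4.8) translates directly into code. The following Python sketch computes $R^2(x)$ for a single age; the function name and the numbers in the example are hypothetical.

```python
def r_squared_for_age(log_m_obs, log_m_fit):
    """R^2(x) as in (4.8) for one age x.

    log_m_obs: observed ln m_{x,t} for t = 1, ..., T
    log_m_fit: fitted ln m_{x,t} for the same years
    """
    a_hat = sum(log_m_obs) / len(log_m_obs)          # average log rate
    rss = sum((obs - fit) ** 2 for obs, fit in zip(log_m_obs, log_m_fit))
    tss = sum((obs - a_hat) ** 2 for obs in log_m_obs)
    return 1.0 - rss / tss


# Hypothetical observed and fitted log death rates for four years.
example_r2 = r_squared_for_age([-5.0, -5.2, -5.4, -5.8],
                               [-5.0, -5.2, -5.5, -5.7])
```

Summing RSS and TSS over all ages before taking the ratio gives the overall $R^2$ instead of the age-specific one.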


Figure 4.4: Explained variance

of variance explained, drops towards 0.2. As Plomp (2009) states, an explanation for this is that at young ages death rates are low, resulting in large variation in death rates. For really old people, the number of deaths becomes really small since only a small fraction of the population reaches these high ages.

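Instead of calculating the proportion of variance explained by the model for each age separately, $R^2(x)$, the overall $R^2$ measures the proportion of total variance explained. It is defined as:

$$R^2 = 1 - \frac{\mathrm{RSS}}{\mathrm{TSS}} \tag{4.9}$$

$$R^2 = 1 - \frac{\sum_{x=1}^{n} \sum_{t=1}^{T} \left( \ln m_{x,t} - \ln \hat{m}_{x,t} \right)^2}{\sum_{x=1}^{n} \sum_{t=1}^{T} \left( \ln m_{x,t} - \hat{a}_x \right)^2} \tag{4.10}$$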

$R^2$ for our data is 0.858, thus 85.8 percent of total variance is explained by the model. The proportion of explained variance can be calculated in a different way as well. Recall from Section 4.1 that $\sigma_i^2 / \sum_{j=1}^{r} \sigma_j^2$ gives the proportion of total variance explained by using the $i$-th term in the SVD. By calculating this ratio, we find that the first term alone explains 86.5 percent of the variance. Using the second term as well would add only 2.5 percentage points. Since only the first term in the SVD is used in the LC model, it explains 86.5 percent of the variance. Notice that there is a small discrepancy between both numbers. This discrepancy is caused by the second stage estimation of $k_t$.


is therefore not a good measure of fit. Thus, the fit of the LC model to the data could be less accurate than as indicated by R2. On the other hand, considering the graphical inspection, the fit of the LC model is reasonable.

4.3 Criticisms on Lee-Carter method

Various researchers have criticized the Lee-Carter model on the following points.

• Age-specific death rates are very low, and therefore cannot realistically be projected to decline much further (Lee and Miller, 2001).

• Mortality rates are forecasted by extrapolating past trends, thus no information about possible future changes is incorporated into the model (Lee, 2000).

• The error terms are assumed homoscedastic, which is used in the singular value decomposition. Homoscedasticity means that the model errors have the same variance over all ages (Koissi et al., 2006). As mentioned by Koissi et al., the homoscedasticity assumption is not always correct.

• Lee (2000) reports that there would be a discontinuity between observed mortality rates and fitted mortality rates for the jump-off year.

• It was questioned whether the $b_x$ should be treated as fixed, since some observers suggested that the $b_x$ coefficients might vary over time (Lee and Miller, 2001).

Since the publication of the Lee-Carter paper, many extensions and modifications have been introduced to overcome the shortcomings of the original model. In Section 4.4, two variants of the Lee-Carter method are discussed. Notice that in both variants the first stage estimation is the same as in the original Lee-Carter method. Thus, the parameters $a_x$, $b_x$, $k_t$ are estimated using singular value decomposition. The main differences are the adjustment procedure of $k_t$ and the chosen time period. In Section 4.5 we assume that the number of deaths follows a Poisson distribution, and the LC parameters are estimated using maximum likelihood estimation.

4.4 Improvements to LC model

In this section two variants of the LC method are discussed. Firstly, in Section 4.4.1 we treat the Lee-Miller variant, whereas in Section 4.4.2 the Booth-Maindonald-Smith modification to the Lee-Carter method is considered.

4.4.1 Lee-Miller variant

In this section the three differences between Lee-Miller and the original LC method, as well as the reasons for the adjustments, are briefly discussed. The first difference is the use of actual mortality rates ($m_{x,T}$) for the last year of the fitting period (the jump-off year) as the basis for the forecast. Hence, forecasted mortality rates are calculated as $\hat{m}_{x,T+t} = m_{x,T}\, e^{\hat{b}_x (\tilde{k}_{T+t} - \tilde{k}_T)}$


The second difference concerns the time period. Lee and Carter (1992) used mortality rates between 1900 and 1989, data for almost a century, and assumed the $b_x$ coefficients to be constant during this time period. As Lee and Miller (2001) reported, some observers suggested that the $b_x$ coefficients might vary over time. Lee and Miller compared the change in mortality at specific ages ($b_x$) for various industrial countries for two time periods, 1900-1950 and 1950-1995. Lee and Miller found some differences, for example, a more rapid decline in mortality at younger ages in the first time period. Therefore, following Tuljapurkar, Li, and Boe (2000), it was decided to use 1950 as a starting year, such that the assumption of fixed $b_x$ only applies to this particular time period. Notice that this difference does not apply to our dataset, since we already chose 1950 as starting date. See Section 3.3 for a description of the data.

The third difference is the adjustment procedure of $k_t$. Lee and Carter adjust $k_t$ such that for all years the total expected number of deaths equals the total actual number of deaths. In consequence, population data is needed for this adjustment. To avoid this need, Lee and Miller re-estimate $k_t$ such that, given the already estimated $a_x$ and $b_x$, the estimated life expectancy at birth equals the observed life expectancy at birth for year $t$.

4.4.2 Booth-Maindonald-Smith variant

The main difference with the original Lee-Carter method is the optimization of the fitting period. Furthermore, the second stage adjustment of $k_t$ differs. Below, these differences are discussed in more detail. The analysis of Booth, Maindonald, and Smith (2002) is followed in this section. Similarly to LC, fitted rates are used in order to obtain mortality forecasts, instead of actual rates.

Firstly, in the Booth-Maindonald-Smith variant another adjustment procedure for $k_t$ is used compared to the original Lee-Carter model. Given the previously estimated $a_x$ and $b_x$, see Section 4.1, $k_t$ is adjusted by fitting to the age distribution of deaths $D(x, t)$ rather than to total annual deaths $\sum_{x} D(x, t)$. More specifically, a Poisson regression model is fitted to $D(x, t)$. The Poisson model is:

$$\ln D(x,t) = \ln E(x,t) + \ln \hat{m}_{x,t} + \hat{\varepsilon}_{x,t} \tag{4.11}$$

where $\ln \hat{m}_{x,t} = \hat{a}_x + \hat{b}_x \tilde{k}_t$. The adjusted $k_t$ are denoted by $\tilde{k}_t$, and $\hat{\varepsilon}_{x,t}$ are the residuals after adjustment of $k_t$. The deviance is used as the minimization criterion. The Poisson deviance (Faraway, 2006) is defined in (4.12); adjusting this to the current situation gives (4.13):

$$\mathrm{deviance} = 2 \sum_{i} \left[ y_i \ln \frac{y_i}{\hat{\mu}_i} - (y_i - \hat{\mu}_i) \right] \tag{4.12}$$

$$\mathrm{deviance}(t) = 2 \sum_{x} \left\{ D(x,t) \ln \frac{D(x,t)}{\hat{D}(x,t)} - \left[ D(x,t) - \hat{D}(x,t) \right] \right\} \tag{4.13}$$

where $\hat{D}(x,t)$ denotes the fitted deaths, which by (4.11) are obtained by:

$$\hat{D}(x,t) = E(x,t)\, \hat{m}_{x,t} = E(x,t)\, e^{\hat{a}_x + \hat{b}_x \tilde{k}_t} \tag{4.14}$$

Secondly, the most appropriate period to apply the Lee-Carter method on is chosen, under the assumption of linear $\tilde{k}_t$ (Booth, Maindonald, and Smith, 2001). However, the final year of the fitting period is based on the latest year for which data are available. Thus, the optimal choice of period only depends on the starting year, which is denoted by $S$.

The choice of $S$ depends on statistical measures of goodness of fit. As Booth et al. (2002) report, the total lack of fit of the model is composed of two parts, namely the base lack of fit and the additional lack of fit. The former considers the lack of fit from the basic Lee-Carter model after the adjustment of $k_t$ and is measured by:

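$$\mathrm{deviance}_{\mathrm{base}}(S) = \sum_{t} \mathrm{deviance}_{\mathrm{base}}(t) \tag{4.15}$$

where

$$\mathrm{deviance}_{\mathrm{base}}(t) = 2 \sum_{x} \left\{ D(x,t) \ln \frac{D(x,t)}{\hat{D}(x,t)} - \left[ D(x,t) - \hat{D}(x,t) \right] \right\} \tag{4.16}$$

$\hat{D}(x,t)$ is as given in equation (4.14) and depends on the adjusted $k_t$.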
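The deviance for a single year, as in (4.13) and (4.16), can be sketched in Python as follows. The function name is hypothetical, and the treatment of cells without deaths (using the limit $0 \ln 0 = 0$) is our assumption; fitted deaths are assumed to be strictly positive.

```python
import math


def poisson_deviance(deaths, fitted_deaths):
    """Poisson deviance for one year, summed over ages x.

    deaths: observed D(x, t) for all ages
    fitted_deaths: fitted deaths for the same ages (must be positive)
    """
    total = 0.0
    for d, d_hat in zip(deaths, fitted_deaths):
        # By convention 0 * ln(0) = 0, so the log term vanishes when d = 0.
        log_term = d * math.log(d / d_hat) if d > 0 else 0.0
        total += log_term - (d - d_hat)
    return 2.0 * total
```

Summing this quantity over all years of the fitting period gives the deviance in (4.15).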

The additional lack of fit considers the lack of fit from imposing an ARIMA linear model on $\tilde{k}_t$. This linear fit to $\tilde{k}_t$ passes through the mean of $\tilde{k}_t$ at midperiod and has slope $c$ [$c = \frac{1}{m-1} \sum_{t=2}^{m} (\tilde{k}_t - \tilde{k}_{t-1})$]. It is only used to determine $S$, the starting year of the fitting period.

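The total lack of fit, base plus additional, is measured by:

$$\mathrm{deviance}_{\mathrm{total}}(S) = \sum_{t} \mathrm{deviance}_{\mathrm{total}}(t) \tag{4.17}$$

where

$$\mathrm{deviance}_{\mathrm{total}}(t) = 2 \sum_{x} \left\{ D(x,t) \ln \frac{D(x,t)}{\hat{D}(x,t)} - \left[ D(x,t) - \hat{D}(x,t) \right] \right\} \tag{4.18}$$

where $\hat{D}(x,t)$ in equation (4.18) is derived from the linear fit to $\tilde{k}_t$.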

To be able to compare $\mathrm{deviance}_{\mathrm{base}}(S)$ with $\mathrm{deviance}_{\mathrm{total}}(S)$, both statistics are divided by the relevant degrees of freedom, resulting in mean deviance statistics. Since there are $n$ age categories and $m$ years in the fitting period ($m = 2007 - S + 1$), there are $n \times m$ age-specific death rates. For the first two years, $2n$ entries are needed. The log-additive model requires one additional parameter for each additional year, thus for $m$ years the number of additional parameters is $m - 2$. Now the number of independent entries can be derived, also known as the degrees of freedom (df). The df for $\mathrm{deviance}_{\mathrm{base}}(S)$ is: $nm - 2n - (m - 2) = n(m - 2) - (m - 2) = (n - 1)(m - 2)$. For the log-linear model no additional parameters are needed. Therefore, the df for $\mathrm{deviance}_{\mathrm{total}}(S)$ is: $nm - 2n = n(m - 2)$ (Booth et al., 2001).

The ratio of both mean deviance statistics indicates the ratio of total to base lack of fit and is used as the statistical measure of goodness of fit. For different starting years $S$ this ratio is calculated. The ratio is given by $R(S)$ and is defined as:

$$R(S) = \frac{\mathrm{deviance}_{\mathrm{total}}(S) / [n(m-2)]}{\mathrm{deviance}_{\mathrm{base}}(S) / [(n-1)(m-2)]} \tag{4.19}$$


As Booth et al. (2002) state, the criterion used for the choice of S is that R(S) is substantially smaller than the corresponding statistic for preceding values of S.
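For illustration, the ratio $R(S)$ and its degrees of freedom can be computed as in the Python sketch below. The function names and the deviance values in the example are hypothetical; only the formulas for the degrees of freedom and the final year 2007 are taken from the text above.

```python
def fitting_years(start_year, final_year=2007):
    """Number of years m in the fitting period that starts in year S."""
    return final_year - start_year + 1


def mean_deviance_ratio(deviance_total, deviance_base, n_ages, n_years):
    """R(S): ratio of the mean total deviance to the mean base deviance."""
    df_total = n_ages * (n_years - 2)           # log-linear model
    df_base = (n_ages - 1) * (n_years - 2)      # log-additive model
    return (deviance_total / df_total) / (deviance_base / df_base)


# Hypothetical deviances for S = 1975 and n = 101 ages (0 to 100).
m_1975 = fitting_years(1975)
ratio_1975 = mean_deviance_ratio(6262.0, 3100.0, 101, m_1975)
```

Evaluating `mean_deviance_ratio` for a range of starting years gives the series that is inspected when choosing $S$.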

Figure 4.5: Left: mean deviance statistics, right: ratio of both statistics.

In Figure 4.5 on the left, both mean deviance statistics are depicted and on the right $R(S)$ is plotted for different starting years $S$. When examining Figure 4.5 in more detail we consider 1975, displayed by the black triangle, as the pivotal year. This is the year for which $S - 1$ (1974) leads to a substantial increase in $R(S)$ compared to $S$ (1975). Thus, the optimal fitting period is 1975-2007. A striking feature of the figure is that after 1975 there appears to be a systematic pattern: approximately every five years $R(S)$ increases steadily and then drops sharply. Notice that the overall trend is still declining after 1975, but at a lower rate than before 1975, so despite this pattern in the graph we believe our choice of 1975 as starting year is justified.

Coelho and Nunes (2011) describe a number of disadvantages of the Booth-Maindonald-Smith method. One of the two main limitations is the fact that a non-objective criterion, visual inspection of the time plots of model fit ratios, is used to detect the existence of a structural change. Moreover, not all available information from the data is used, since all information before the start of the fitting period is discarded. Booth et al. (2002) have mentioned this in their paper as well. Coelho and Nunes (2011) introduced a method where an appropriate ARIMA model is identified, based on tests for structural changes and unit roots in the index of mortality $k_t$. For a detailed description of this method we refer to the paper of Coelho and


4.5 Maximum likelihood estimation

This section describes another method to estimate the parameters $a_x$, $b_x$ and $k_t$ of the Lee-Carter model. Instead of singular value decomposition, maximum likelihood estimation is used. Since the number of deaths is a count variable, the numbers of deaths $D(x, t)$ are modelled as independent Poisson response variables (Renshaw and Haberman, 2003), with:

$$\lambda_{x,t} = \mathrm{E}[D(x,t)] = E(x,t) \cdot m_{x,t}$$

$$\mathrm{Var}[D(x,t)] = \varphi \cdot \mathrm{E}[D(x,t)]$$

where ϕ is a measure of over-dispersion to allow for heterogeneity (Butt and Haberman, 2009). The canonical link of the Poisson distribution is the log-link function. Therefore, the (non-linear) predictor ηx,t is given by:

$$\eta_{x,t} = \log \mathrm{E}[D(x,t)] = \log\left[ E(x,t) \cdot m_{x,t} \right] = \log E(x,t) + \log m_{x,t} = \log E(x,t) + a_x + b_x k_t \tag{4.20}$$

However, since this predictor is non-linear in the parameters we cannot use generalized linear models (Renshaw and Haberman, 2003).

The parameters $a_x$, $b_x$, $k_t$ are estimated by maximizing the Poisson log-likelihood function. The log-likelihood function for a Poisson distribution is given by (4.21), hence in our case the log-likelihood function is defined by equation (4.22):

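$$l(\lambda) = \sum_{i=1}^{n} \left( X_i \log \lambda - \lambda - \log X_i! \right) \tag{4.21}$$

$$\begin{aligned}
l(a, b, k) &= \sum_{x,t} \left\{ D(x,t) \log\left[ E(x,t)\, e^{a_x + b_x k_t} \right] - E(x,t)\, e^{a_x + b_x k_t} - \log\left[ D(x,t)! \right] \right\} \\
&= \sum_{x,t} \left\{ D(x,t)(a_x + b_x k_t) - E(x,t)\, e^{a_x + b_x k_t} \right\} + \sum_{x,t} D(x,t) \log E(x,t) - \sum_{x,t} \log\left[ D(x,t)! \right] \\
&= \sum_{x,t} \left\{ D(x,t)(a_x + b_x k_t) - E(x,t)\, e^{a_x + b_x k_t} \right\} + \text{constant}
\end{aligned} \tag{4.22}$$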

where the constant consists of all terms which do not depend on $a$, $b$, $k$. We have to use numerical methods to find the maximum likelihood estimates. Brouhns et al. (2002) describe an algorithm in their report that updates the parameters in each iteration. The algorithm stops when the log-likelihood converges. Notice that Renshaw and Haberman (2006) use an equivalent procedure, however this algorithm stops when the deviance, given by $\sum_{x,t} 2 \left\{ D(x,t) \log \frac{D(x,t)}{\hat{D}(x,t)} - \left[ D(x,t) - \hat{D}(x,t) \right] \right\}$


In contrast to the original Lee-Carter method, where a second stage estimation of $k_t$ was performed, $k_t$ does not have to be re-estimated here. The reason is that the error applies directly to the number of deaths in the Poisson regression approach, as stated by Brouhns et al. (2002). Indeed, the maximum likelihood estimates $\hat{a}_x$, $\hat{b}_x$, and $\hat{k}_t$ satisfy equation (4.23):

$$\sum_{t} D(x,t) = \sum_{t} E(x,t)\, e^{\hat{a}_x + \hat{b}_x \hat{k}_t} \tag{4.23}$$

Thus, the $\hat{k}_t$’s are such that when multiplying the resulting mortality rates with the actual exposures, the actual total number of deaths for each age is obtained (Delwarde, Denuit, Guillén, and Vidiella-i Anguera, 2006).
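Equation (4.23) is the first-order condition of (4.22) with respect to $a_x$: setting $\partial l / \partial a_x = \sum_t [D(x,t) - E(x,t) e^{a_x + b_x k_t}]$ to zero gives a closed-form $a_x$ for fixed $b_x$ and $k_t$. The Python sketch below illustrates this for one age. It is not the iterative algorithm of Brouhns et al. (2002); the function names and the numbers in the test data are hypothetical.

```python
import math


def loglik_kernel(deaths, exposure, a, b, k):
    """Log-likelihood (4.22) without the constant; deaths[x][t], exposure[x][t]."""
    total = 0.0
    for x in range(len(a)):
        for t in range(len(k)):
            eta = a[x] + b[x] * k[t]
            total += deaths[x][t] * eta - exposure[x][t] * math.exp(eta)
    return total


def optimal_a_given_b_k(deaths_x, exposure_x, b_x, k):
    """a_x that maximizes (4.22) for one age when b_x and k are held fixed.

    Solves sum_t D(x,t) = sum_t E(x,t) exp(a_x + b_x k_t) for a_x.
    """
    weighted_exposure = sum(e * math.exp(b_x * kt) for e, kt in zip(exposure_x, k))
    return math.log(sum(deaths_x) / weighted_exposure)
```

At this value of $a_x$ the fitted deaths summed over time equal the observed deaths for that age, which is exactly the property stated in (4.23).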

Figure 4.6 depicts the parameter estimates, using maximum likelihood estimation and using singular value decomposition. The $a_x$ estimate using MLE seems higher for all ages than the $a_x$ estimate using SVD. Since the maximum likelihood estimate $\hat{b}_x$ is not normalized (that means rescaled such that $\sum_{x=1}^{n} b_x = 1$) there is some discrepancy between the normalized SVD $\hat{b}_x$ and the MLE $\hat{b}_x$. Notice that there is quite some difference between the $k_t$ estimates. However, if we would normalize the MLE estimate $k_t$ such that $\sum_{t=1}^{T} k_t = 0$, the MLE estimate $\hat{k}_t$ is almost identical to the LC second stage estimate $\tilde{k}_t$. Nevertheless, we will not use


Chapter 5

Comparison between different methods

In this chapter the various methods introduced in Chapter 4, are compared. Section 5.1 forecasts life expectancy. In Section 5.2 the accuracy of the mortality forecasting models is investigated. By using data until 1980, death rates and life expectancy are ’forecasted’ from 1981 to 2007, and the predicted life expectancy will be compared with observed life expectancy. In Section 5.3 the main differences between the models are investigated, and we make a comparison with the predictions shown in the annual pension statement, which are given in Chapter 2.

5.1 Comparison of life expectancy

Chapter 4 discussed how the LC parameters $a_x$, $b_x$ and $k_t$ are estimated using singular value decomposition or maximum likelihood estimation. Hereafter, time series methods are used to fit the mortality index $k_t$. In their paper, Lee and Carter (1992) discuss that a random walk with drift, which is an ARIMA(0,1,0) model with constant, describes the adjusted $k_t$ well. Therefore, $k_t$ could be defined as:

$$k_t = k_{t-1} + c + \eta_t \tag{5.1}$$

where $\eta_t \sim N(0, \sigma_\eta^2)$ and $c$ is a drift parameter that determines the average speed with which $k_t$ changes (Statistics Sweden, 2009a). This average annual change in $k_t$ is estimated as:

$$\hat{c} = \frac{\Delta k_t}{\Delta t} = \frac{1}{T-1} \sum_{t=2}^{T} (k_t - k_{t-1}) = \frac{k_T - k_1}{T-1}$$

The standard error of the estimate $\hat{c}$ is denoted by $\hat{\sigma}_c$, whereas the standard error of the error term $\eta_t$ is denoted by $\hat{\sigma}_\eta$. Booth et al. (2002) report that these terms together estimate the uncertainty associated with a one-year forecast. Now, we continue with forecasting the mortality index $k_t$. The point forecast for $k$ in year $T + s$ is given by (Pedroza, 2006):

$$\tilde{k}_{T+s} = \tilde{k}_T + s\hat{c} \tag{5.2}$$


about the drift term as well. The former is denoted by $\sigma_\eta$ and the latter by $\sigma_c$. These standard errors are estimated by (Li, Lee, and Tuljapurkar, 2004):

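$$\hat{\sigma}_\eta = \sqrt{\frac{1}{T-2} \sum_{t=2}^{T} (k_t - k_{t-1} - \hat{c})^2}$$

$$\hat{\sigma}_c = \sqrt{\mathrm{var}(\hat{c})} = \sqrt{\frac{\sigma_\eta^2}{T-1}} \approx \frac{\hat{\sigma}_\eta}{\sqrt{T-1}}$$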

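This leads to the following forecast error variances, for forecasts $s$ years ahead, and 95 percent prediction intervals:

LC: $\hat{\sigma}^2_{k_{T+s}} = s \hat{\sigma}_\eta^2$

BMS: $\hat{\sigma}^2_{k_{T+s}} = s^2 \hat{\sigma}_c^2 + s \hat{\sigma}_\eta^2$

$100(1 - \alpha)\%$ prediction interval: $\tilde{k}_{T+s} \pm z_{\alpha/2}\, \hat{\sigma}_{k_{T+s}}$

95% prediction interval: $\tilde{k}_{T+s} \pm 1.96\, \hat{\sigma}_{k_{T+s}}$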
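The drift estimate, the two standard errors and the resulting prediction interval can be sketched in Python as follows. The function names and the flag `include_drift_uncertainty` are hypothetical; with the flag switched off the LC variance is used, with the flag switched on the variance labelled BMS above.

```python
import math


def random_walk_drift(k):
    """Estimate drift c and standard errors for k_t = k_{t-1} + c + eta_t."""
    T = len(k)
    c_hat = (k[-1] - k[0]) / (T - 1)
    sse = sum((k[t] - k[t - 1] - c_hat) ** 2 for t in range(1, T))
    sigma_eta = math.sqrt(sse / (T - 2))
    sigma_c = sigma_eta / math.sqrt(T - 1)
    return c_hat, sigma_eta, sigma_c


def forecast_k(k, s, include_drift_uncertainty=False, z=1.96):
    """Point forecast of k for year T + s with a prediction interval."""
    c_hat, sigma_eta, sigma_c = random_walk_drift(k)
    point = k[-1] + s * c_hat                      # equation (5.2)
    variance = s * sigma_eta ** 2                  # LC forecast error variance
    if include_drift_uncertainty:
        variance += s ** 2 * sigma_c ** 2          # add uncertainty in the drift
    half_width = z * math.sqrt(variance)
    return point, point - half_width, point + half_width
```

The interval without drift uncertainty widens with $\sqrt{s}$, whereas the drift term contributes a component that grows linearly in $s$ and therefore dominates at long horizons.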

Figure 5.1 depicts the actual $\tilde{k}_t$ for the years 1950 up to 2007, point forecast values of $\tilde{k}_t$ for the years 2008 up to 2050, and the 95 percent prediction intervals. The $\tilde{k}_t$ of the original Lee-Carter method is used for the construction of this figure.

Figure 5.1: Actual and forecasted $\tilde{k}_t$ with 95 percent prediction intervals

We will follow the original Lee-Carter model, and only use the uncertainty in $k_t$ in the prediction intervals. Thus, the uncertainty in estimating $a_x$ and $b_x$ is not incorporated in the prediction intervals. The reason is that the errors in estimating $a_x$ and $b_x$ are dominated by the


Using the forecasted $\tilde{k}_t$ and the previously estimated $\hat{a}_x$ and $\hat{b}_x$, mortality rates and life expectancy can be forecasted, with prediction intervals. There are two ways to calculate the point mortality forecasts for $s$ years ahead. Firstly, we can use fitted rates, see equation (5.3). This is described in the original LC method, as discussed in Section 4.1. Secondly, we can use actual mortality rates $m_{x,T}$, as explained in Section 4.4.1. The corresponding forecasted mortality rates are given by equation (5.4). Now, the 95 percent prediction intervals for (log) mortality rates are given by (5.5) and (5.6):

Fitted rates:

$$\hat{m}_{x,T+s} = e^{\hat{a}_x + \hat{b}_x \tilde{k}_{T+s}}, \qquad s > 0 \tag{5.3}$$

Actual rates:

$$\hat{m}_{x,T+s} = m_{x,T}\, e^{\hat{b}_x (\tilde{k}_{T+s} - \tilde{k}_T)}, \qquad s > 0 \tag{5.4}$$

$100(1 - \alpha)\%$ prediction interval: $\ln \hat{m}_{x,T+s} \pm z_{\alpha/2}\, \hat{b}_x \hat{\sigma}_{k_{T+s}}$

95% prediction interval:

$$\ln \hat{m}_{x,T+s} \pm 1.96\, \hat{b}_x \hat{\sigma}_{k_{T+s}} \tag{5.5}$$

95% prediction interval:

$$\hat{m}_{x,T+s}\, e^{\pm 1.96\, \hat{b}_x \hat{\sigma}_{k_{T+s}}} \tag{5.6}$$

Thus, log mortality rates are forecasted using the above formulae, together with the prediction intervals. Figure 5.2 depicts the mortality forecasts for age 10 using the LC method, the LM variant, the BMS variant, and maximum likelihood estimation.
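The two point forecasts and the interval in (5.6) can be sketched in Python as follows. The function names are hypothetical, and the interval function assumes $\hat{b}_x \geq 0$, so that the first returned value is the lower bound.

```python
import math


def forecast_from_fitted(a_x, b_x, k_future):
    """Equation (5.3): forecast based on fitted rates."""
    return math.exp(a_x + b_x * k_future)


def forecast_from_actual(m_x_T, b_x, k_future, k_T):
    """Equation (5.4): forecast starting from the actual jump-off rate m_{x,T}."""
    return m_x_T * math.exp(b_x * (k_future - k_T))


def rate_interval(m_hat, b_x, sigma_k, z=1.96):
    """Equation (5.6): prediction interval for the rate (assumes b_x >= 0)."""
    half_width = z * b_x * sigma_k
    return m_hat * math.exp(-half_width), m_hat * math.exp(half_width)
```

If the fitted rate in the jump-off year happens to equal the actual rate, both point forecasts coincide; otherwise they differ by the ratio of the actual to the fitted jump-off rate at every horizon.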


Similarly, Figure 5.3, Figure 5.4, and Figure 5.5 show the mortality forecasts for age 40, age 65 and age 90. There are some interesting differences between the various graphs. A striking feature of Figure 5.2 is that the LC method and Poisson MLE yield much wider prediction intervals for this age than LM and BMS. On the other hand, the point forecasts are approximately equal for the various methods. For ages 40 and 65 the BMS point forecasts differ considerably from the others, see Figure 5.3 and Figure 5.4. Besides that, for higher ages the prediction intervals are narrower.


Figure 5.4: Mortality forecasts with 95 percent prediction intervals for age 65


The forecasted mortality rates are used to calculate forecasted life expectancy, with prediction bounds. Section 3.1 explains how to convert mortality rates into life expectancy. We use the approximation $\mathring{e}_x \approx e_x + \frac{1}{2}$ to calculate expected future lifetimes of individuals aged $x$.
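A minimal Python sketch of this conversion is given below. It rests on two assumptions of ours that may differ from the conversion in Section 3.1: one-year survival probabilities are obtained as $p_x = e^{-m_x}$ (constant force of mortality within each year of age), and the sum for the curtate life expectancy $e_x$ is truncated at the last age in the table. The function name is hypothetical.

```python
import math


def life_expectancy(rates, start_age=0):
    """Approximate complete life expectancy at start_age.

    rates[x] is the death rate m_x at age x (period or cohort rates).
    Assumes p_x = exp(-m_x) and truncates at the last age in `rates`.
    """
    survival = 1.0      # probability of surviving k years from start_age
    curtate = 0.0       # curtate life expectancy: sum of k-year survival probabilities
    for rate in rates[start_age:]:
        survival *= math.exp(-rate)
        curtate += survival
    return curtate + 0.5
```

Feeding forecasted rates for a fixed calendar year into this function gives a period life expectancy; using the lower and upper bounds of the rates gives corresponding bounds on life expectancy.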

Figure 5.6 shows life expectancy at birth using the various methods. It is remarkable that there is hardly any difference between the forecasted life expectancy using LC and using Poisson MLE. For example, consider the projected life expectancy in 2050. These are 85.93 years for Lee-Carter, 86.13 for Lee-Miller, 86.67 for Booth-Maindonald-Smith and 85.94 for Poisson MLE. The BMS forecasted life expectancy for 2050 is considerably higher than the other forecasts.


Similarly to Figure 5.6, Figure 5.7 shows that the Lee-Miller method yields the narrowest prediction intervals. In contrast, the Lee-Carter prediction bounds are wider than the others. At age 65, the forecasted remaining life expectancy for 2050 is 23.16 years for Lee-Carter, 23.24 for Lee-Miller, 23.36 for BMS, and 23.14 for Poisson MLE.

Figure 5.7: Actual life expectancy at age 65, and forecasts with 95 percent prediction intervals (with and without uncertainty from drift). Top left: Lee-Carter, top right: Lee-Miller, bottom left: Booth-Maindonald-Smith, bottom right: Poisson maximum likelihood estimation.

5.2 Actual versus predicted values


variant is an exception to this. In this variant an optimal fitting period is determined from the available data. Unfortunately, Booth et al. (2002) do not specify conditions regarding the minimum length of the time period.

In Figure 5.8 the mean deviance statistics are shown on the left, and the ratio of total to base lack of fit (R(S)) is depicted on the right for different starting years S. The criterion used for the choice of S is that R(S) is substantially smaller than the corresponding statistic for preceding values of S. After visual inspection 1970, 1971, and 1978 are identified as possible starting years. Since the fitting period with these starting years would consist of only 11, 10 or 3 years, we decided to use all data instead of a subset. Hence, for BMS the only difference with the original LC method is the second stage estimation of kt.

Figure 5.8: Left: mean deviance statistics, right: ratio of both statistics.


Figure 5.9: Actual life expectancy at birth, and forecasts with 95 percent prediction intervals (with and without uncertainty from drift). Top left: Lee-Carter, top right: Lee-Miller, bottom left: Booth-Maindonald-Smith, bottom right: Poisson maximum likelihood estimation.


Figure 5.10: Actual life expectancy at age 65, and forecasts with 95 percent prediction intervals (with and without uncertainty from drift). Top left: Lee-Carter, top right: Lee-Miller, bottom left: Booth-Maindonald-Smith, bottom right: Poisson maximum likelihood estimation.


       Actual life     Forecasted life expectancy at age 65
       expectancy                               Booth-Maindonald-Smith
Year   at 65           Lee-Carter  Lee-Miller   use all data  with optim. time period  Poisson MLE
1984   16.84           16.72       16.77        16.73         16.77                    16.74
1985   16.67           16.83       16.84        16.83         16.85                    16.85
1986   16.85           17.11       17.12        17.09         17.10                    17.10
1987   17.02           17.25       17.27        17.22         17.23                    17.23
1988   16.89           17.44       17.46        17.41         17.43                    17.42
1989   17.37           17.26       17.27        17.23         17.26                    17.24
1990   17.26           17.46       17.44        17.43         17.46                    17.43
1991   17.40           17.60       17.62        17.57         17.61                    17.58
1992   17.51           17.44       17.46        17.43         17.46                    17.43
1993   17.45           17.96       17.98        17.93         17.95                    17.93
1994   17.99           17.83       17.85        17.79         17.83                    17.80
1995   17.92           17.95       17.99        17.93         17.97                    17.93
1996   18.00           18.11       18.11        18.09         18.13                    18.09
1997   18.16           18.02       18.04        18.02         18.07                    18.03
1998   18.27           18.57       18.60        18.56         18.61                    18.57
1999   18.28           18.51       18.51        18.51         18.58                    18.52
2000   18.48           18.59       18.60        18.61         18.67                    18.62
2001   18.56           18.76       18.77        18.77         18.83                    18.78
2002   18.54           18.86       18.87        18.86         18.92                    18.87
2003   18.75           18.82       18.85        18.86         18.92                    18.87
2004   19.07           19.02       19.06        19.04         19.09                    19.05
2005   19.07           19.08       19.13        19.12         19.17                    19.13
2006   19.25           19.05       19.09        19.11         19.18                    19.14
2007   19.32           19.29       19.32        19.33         19.37                    19.36

Table 5.1: Actual versus predicted life expectancy at age 65
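The comparison in Table 5.1 can be summarized per method with two simple error measures. The following is a minimal sketch rather than the procedure used in this thesis: it assumes the column order actual, Lee-Carter, Lee-Miller, BMS (all data), BMS (optimal time period), Poisson MLE, and the names `TABLE_5_1`, `METHODS`, `forecast_errors`, and `summarize` are chosen here for illustration only. The bias is the mean of forecast minus actual, and the MAE is the mean absolute error, both in years.

```python
# Values transcribed from Table 5.1 (life expectancy at age 65, in years).
# Assumed column order: actual, then the five forecasts listed in METHODS.
METHODS = [
    "Lee-Carter",
    "Lee-Miller",
    "BMS (all data)",
    "BMS (optimal period)",
    "Poisson MLE",
]

TABLE_5_1 = {
    1984: (16.84, 16.72, 16.77, 16.73, 16.77, 16.74),
    1985: (16.67, 16.83, 16.84, 16.83, 16.85, 16.85),
    1986: (16.85, 17.11, 17.12, 17.09, 17.10, 17.10),
    1987: (17.02, 17.25, 17.27, 17.22, 17.23, 17.23),
    1988: (16.89, 17.44, 17.46, 17.41, 17.43, 17.42),
    1989: (17.37, 17.26, 17.27, 17.23, 17.26, 17.24),
    1990: (17.26, 17.46, 17.44, 17.43, 17.46, 17.43),
    1991: (17.40, 17.60, 17.62, 17.57, 17.61, 17.58),
    1992: (17.51, 17.44, 17.46, 17.43, 17.46, 17.43),
    1993: (17.45, 17.96, 17.98, 17.93, 17.95, 17.93),
    1994: (17.99, 17.83, 17.85, 17.79, 17.83, 17.80),
    1995: (17.92, 17.95, 17.99, 17.93, 17.97, 17.93),
    1996: (18.00, 18.11, 18.11, 18.09, 18.13, 18.09),
    1997: (18.16, 18.02, 18.04, 18.02, 18.07, 18.03),
    1998: (18.27, 18.57, 18.60, 18.56, 18.61, 18.57),
    1999: (18.28, 18.51, 18.51, 18.51, 18.58, 18.52),
    2000: (18.48, 18.59, 18.60, 18.61, 18.67, 18.62),
    2001: (18.56, 18.76, 18.77, 18.77, 18.83, 18.78),
    2002: (18.54, 18.86, 18.87, 18.86, 18.92, 18.87),
    2003: (18.75, 18.82, 18.85, 18.86, 18.92, 18.87),
    2004: (19.07, 19.02, 19.06, 19.04, 19.09, 19.05),
    2005: (19.07, 19.08, 19.13, 19.12, 19.17, 19.13),
    2006: (19.25, 19.05, 19.09, 19.11, 19.18, 19.14),
    2007: (19.32, 19.29, 19.32, 19.33, 19.37, 19.36),
}


def forecast_errors(table):
    """Return {method: [forecast - actual, one entry per year]}."""
    errors = {method: [] for method in METHODS}
    for year in sorted(table):
        actual, *forecasts = table[year]
        for method, forecast in zip(METHODS, forecasts):
            errors[method].append(forecast - actual)
    return errors


def summarize(table):
    """Return {method: {"bias": mean error, "mae": mean absolute error}}."""
    summary = {}
    for method, errs in forecast_errors(table).items():
        n = len(errs)
        summary[method] = {
            "bias": sum(errs) / n,
            "mae": sum(abs(e) for e in errs) / n,
        }
    return summary


if __name__ == "__main__":
    for method, stats in summarize(TABLE_5_1).items():
        print(f"{method:22s} bias={stats['bias']:+.3f}  MAE={stats['mae']:.3f}")
```

A positive bias means that a method overestimates life expectancy on average over 1984 to 2007. Because the table values are rounded to two decimals, small differences between methods should be read with care.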

5.3 Comparison between methods

In this section we will compare the different forecasting methods with each other, and with the projections given in the annual pension reports (of 2001 and 2009) of the Swedish Pensions Agency (2009). These are shown in Figure 2.7 and 2.8 of Chapter 2.

To be able to compare our projections with the projections of the Swedish Pensions Agency we constructed Table 5.2, which is similar to the tables depicted in Figure 2.7 and Figure 2.8.

Life expectancy at age 65, in years

Year   Lee-Carter   Lee-Miller   Booth-Maindonald-Smith   Poisson MLE
2010   19.71        19.80        19.79                    19.77
2015   20.16        20.25        20.28                    20.22
2020   20.61        20.69        20.76                    20.66
2025   21.05        21.14        21.22                    21.09
2030   21.49        21.57        21.68                    21.52
2035   21.92        22.00        22.12                    21.94
2040   22.34        22.42        22.55                    22.35
2045   22.75        22.85        22.96                    22.75
2050   23.16        23.24        23.36                    23.14

Table 5.2: Forecasted life expectancy at age 65, using the various methods

We would like to point out that the forecasts for 2009, depicted in Figure 2.8, are adjusted downwards compared to the predictions of 2008. That is, for all years the predicted life expectancy is 1 or 2 months lower. For our projections we have used data up to and including 2007, because the Human Mortality Database had no data available beyond this year at the moment of downloading the data.

In Section 5.1 and Section 5.2 we already concluded that there is not much difference between the life expectancy forecasts of the various forecasting methods. It is striking, however, that the BMS predictions are 1 to 2 months higher than the other forecasts. In general, the BMS forecasts are the largest, and the Lee-Carter and Poisson MLE predictions are rather similar.
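The size of the gap between BMS and the other methods can be read off Table 5.2 by converting the differences from years to months. The sketch below is illustrative only; the names `TABLE_5_2`, `METHODS`, and `bms_gap_in_months` are assumptions introduced here, and the values are transcribed from the table.

```python
# Values transcribed from Table 5.2 (forecasted life expectancy at age 65, in years).
METHODS = ("Lee-Carter", "Lee-Miller", "BMS", "Poisson MLE")

TABLE_5_2 = {
    2010: (19.71, 19.80, 19.79, 19.77),
    2015: (20.16, 20.25, 20.28, 20.22),
    2020: (20.61, 20.69, 20.76, 20.66),
    2025: (21.05, 21.14, 21.22, 21.09),
    2030: (21.49, 21.57, 21.68, 21.52),
    2035: (21.92, 22.00, 22.12, 21.94),
    2040: (22.34, 22.42, 22.55, 22.35),
    2045: (22.75, 22.85, 22.96, 22.75),
    2050: (23.16, 23.24, 23.36, 23.14),
}


def bms_gap_in_months(table):
    """For each year, return the BMS forecast minus each other method, in months."""
    bms_index = METHODS.index("BMS")
    gaps = {}
    for year, row in table.items():
        bms = row[bms_index]
        gaps[year] = {
            method: round((bms - value) * 12, 1)
            for method, value in zip(METHODS, row)
            if method != "BMS"
        }
    return gaps


if __name__ == "__main__":
    for year, gap in sorted(bms_gap_in_months(TABLE_5_2).items()):
        print(year, gap)
```

Positive values indicate that the BMS forecast lies above the other method. Since the table is rounded to two decimals, each gap is only accurate to about a tenth of a month.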

Figure 5.7 shows that the unisex remaining life expectancy at the age of 65 is forecasted to increase from approximately 19 years to 23 years in 2050. However, the projection given in Figure 2.8 shows an average life span of 21 years and 11 months for 2050, a discrepancy of more than 1 year.
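The size of this discrepancy can be made explicit with a short calculation. The notation $e_{65}(2050)$ for remaining life expectancy at age 65 in 2050 is an assumption made here for illustration; the forecast values are the smallest and largest 2050 entries of Table 5.2, and the agency figure is the 21 years and 11 months quoted above.

```latex
\begin{align*}
e_{65}^{\text{agency}}(2050) &= 21 + \tfrac{11}{12} \approx 21.92 \text{ years},\\
e_{65}^{\text{Poisson}}(2050) - e_{65}^{\text{agency}}(2050) &\approx 23.14 - 21.92 = 1.22 \text{ years} \approx 15 \text{ months},\\
e_{65}^{\text{BMS}}(2050) - e_{65}^{\text{agency}}(2050) &\approx 23.36 - 21.92 = 1.44 \text{ years} \approx 17 \text{ months}.
\end{align*}
```

Under these figures, all four methods exceed the agency projection for 2050 by roughly 15 to 17 months.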


Chapter 6

Conclusion

In this master’s thesis we focused on Sweden’s reformed pension system, and on mortality models. We investigated the implications of using different mortality models in forecasting life expectancy and the implications to the Swedish pension system. This resulted in the research question: what are the implications of using different forecasting models for life expectancy for the Swedish pension system?

As pension size depends on the life expectancy of the retiring generation, longevity risk is partly passed on to the retiring generations. Until the age of 65, changes in average life span directly influence the size of pension benefits. However, individuals can choose to continue working, fully or partially, which enlarges their notional capital and thereby increases their resulting pension. After the age of 65, the annuity divisor is definite. This means that changes in life expectancy cannot directly influence pension benefits after the age of 65. However, with an increasing life expectancy the pension liability of the system increases as well, which puts pressure on the balance ratio. When this happens, both retirees and the economically active may be affected through the adjusted indexation.

This thesis described four mortality models that were used to predict life expectancy. Among these models, Lee-Carter is considered to be the first seminal model in mortality forecasting. Lee-Miller and Booth-Maindonald-Smith are both variants of Lee-Carter. In these models, the parameters are estimated using singular value decomposition. In the fourth method, the parameters are estimated using maximum likelihood. In the Lee-Carter method, one of the parameters is re-estimated in a second stage. The differences between Lee-Carter and the two variants are the adjustment procedures of this parameter, and the chosen time period.

We investigated the accuracy of the mortality forecasting models. By using data until 1980, life expectancy was ‘forecasted’ from 1981 to 2007, and compared with the observed life expectancy. The predictions were quite accurate. In addition, life expectancy was predicted four years ahead, and compared with the actual values. The similarity of the predictions is striking. For almost every year, the various methods all either overestimate or underestimate life expectancy, and there is hardly any discrepancy between the methods. Overall, the BMS predictions (using all data) are most accurate.

[...] expectancy. However, when our predictions are compared with the predictions depicted in the annual pension statement of 2009, some discrepancies arise. For example, the assumed life expectancy for 2040 is already reached in 2030. In order to obtain the same level of pension payments, an increase of the retirement age by half a year is needed.
