
Modeling Mortality Trends

Jasper Van Halewyck

Master’s Thesis to obtain the degree in Actuarial Science and Mathematical Finance
University of Amsterdam
Faculty of Economics and Business
Amsterdam School of Economics

Author: Jasper Van Halewyck
Student nr: 10289380
Email: jaspervh@gmail.com
Date: March 27, 2014
Supervisor: Dr. K. Antonio


Abstract

Following Börger et al. (3) we work through a model estimating the evolution and uncertainty in the evolution of future mortality rates, with specific attention to changes in the underlying trend. This model allows for the simultaneous simulation of mortality rates of correlated populations, for example males and females or populations of countries with similar developments. This correlated simulation could create benefits by lowering the Life risk margin for an insurer in the Solvency II framework.

The model also gives an opportunity to check the Standard Solvency II model and the assumptions underlying recent mortality prognoses.

Contents

Preface

1 Introduction
1.1 Solvency II and Capital Requirements
1.2 Stochastic Mortality Models
1.3 Mortality in Old Age
1.4 Thesis Structure

2 Model and Data
2.1 Model
2.1.1 Volatility
2.2 Data
2.2.1 Data Quality Issues
2.2.2 Small Populations

3 Model Estimation
3.1 Model Estimation
3.2 Estimation of cohort effects

4 Projection
4.1 Present Trends
4.2 Model Calibration
4.2.1 Volatility of κ_t^(1)
4.3 Projecting κ_t^(1)
4.3.1 A process for the general population
4.3.2 Projection as a changing AR(n)-process
4.3.3 Trend sensitivity of κ_t^(1)
4.3.4 Projecting the individual κ_t^(1) from the general population
4.4 Projection of κ_t^(2), κ_t^(3) and κ_t^(4)
4.4.1 Projecting κ_t^(4)
4.5 Projecting Mortality
4.5.1 Projection as control for analysis
4.5.2 Calibrating the volatility of κ_t^(2), κ_t^(3), κ_t^(4)

5 Comparison with the Solvency II Standard Formula
5.1 Solvency Capital Requirement
5.1.1 Runoff Risk versus One Year Trend
5.2 Numerical Comparison with Standard Model
5.2.1 Funeral Insurance
5.2.2 Fixed Term Retirement Insurance
5.2.3 Annuities
5.2.4 Diversification Benefits
5.2.5 Summary
5.2.6 Group Life Insurance

6 Comparison with Actual Mortality Predictions
6.1 Methods
6.2 Comparison of κ_t^(1)
6.3 Comparison of κ_t^(2) and κ_t^(3)
6.4 Comparison of κ_t^(4)
6.5 Summary

7 Discussion
7.1 Comparison with Börger et al.
7.2 Practical Use

Preface

With this thesis I join the ranks of students all over Europe who have researched the realism of Solvency II standard setting. As long as these standards are not practically introduced (and for a while afterwards), exploring the implications of shortcuts, simplifications and generally the standard model will be a common use of students’ attention. All assumptions will be challenged, alternatives calculated and maybe even used as internal models later.

I hope this doesn’t make it sound as if the research is not rewarding. It is. We live in a time in which methods for market-value valuation are being perfected, and being part of this development (however small my part) feels like an actuarial adventure. Through this research I have looked in much more detail at the possibilities of longevity risk than I do in my daily work, and it has been fun to delve so deeply into a topic. The model of Börger et al. that I work through and redevelop has a satisfying complexity, while being accessible enough to replicate.

Researching and writing has been a great but demanding experience. I would like to thank Cherie and Theo, whose patience I have tried a little too often. Many thanks go to Erik Tornij, who was kind enough to read a draft version of the thesis.

Finally, a practical note. Almost all analysis and all calculations used in this thesis were performed in R (15), often using the MASS package (17). The code I wrote is too long to reasonably include as an appendix. In part this is due to my incomplete functional design: some calculations are repeated a few times with slightly different parameters, and the code is repeated as well. Mostly it is because the model is rather complex and a lot of new and different calculations have to be performed at each stage of the analysis. I invite all readers interested in the R code to request it by email; it is of course available.


Chapter 1

Introduction

During the 20th century, (almost) all western societies have awarded old-age benefits to their members, originally to compensate for the loss of income due to age-related disability. Disability is not a prerequisite for benefits, however. In general the benefits start for each person at age 65, though earlier age limits abound when times are good, and later limits when public finances are strained (67 is coming to western Europe and has been the law in the U.S. since 1983).

The financing of old age benefits is often a hybrid between Pay-As-You-Go systems (e.g. AOW in the Netherlands and Social Security in the U.S.) and capitalization systems, for instance as a private complement of government benefits. In the Netherlands, this private complement (second and third tiers of pensions) is well established, with a large capital covering the expected benefits. This has several benefits: the estimated cost of pensions is made explicit at the time the benefits are awarded, rather than postponed until each separate benefit is paid out. Another benefit is that it becomes clear very early on when this cost was underestimated: for example when investment income is lower than expected, or when pensioners live longer than was initially accounted for. This clarity gives the opportunity to discuss how to share the pain when pensions are underfunded. This has been an important public debate in the past years.

Estimating the longevity of employees when funding their pensions has proved to be very hard. Mortality tables used in the past were all considered prudent or best-estimate. Still, they were superseded after 5 to 10 years by new tables, each one having lower mortality than the last. This experience shows that it takes some courage now to proclaim a new best estimate of mortality. Maybe a safer route is to predict how wrong the current estimate might be, by quantifying the volatility of mortality or of mortality trends.

In (1), Bauer et al. do this explicitly from the history of best-estimate mortality tables: they consider the change of forward-looking mortality as the basis for uncertainty in the forward measure. In this thesis, we will use a more mundane approach: we will follow Börger et al. (3) in estimating the volatility of mortality based on past experience. This method can be useful for many purposes. We are most interested in reserving, for which a prudent estimate of volatility is necessary. We will also use this


(prudent) estimated volatility to evaluate the current prognosis of Dutch mortality by the CBS (Centraal Bureau voor Statistiek).

1.1 Solvency II and Capital Requirements

One of the effects of the upcoming Solvency II supervisory regime is to make explicit how ‘wrong’ the current best estimates can be, and what the possible impact is on the balance sheet of the insurance entity. The Technical Specifications of December 2012 (8) do this by having the insurance entity commit to a value for the 99.5% Value-at-Risk in its basic own funds. This risk-based capital is kept on the balance sheet, and the cost of this extra capital charge constitutes a Risk Margin on the Best-Estimate Liabilities. This ensures (with a model-dependent 99.5% certainty) that the insurance entity will be able to fulfill its liabilities one year after valuation.

The 99.5% VaR is composed of different VaRs for specific risks, with diversification benefits. For entities that do not develop their own method to determine the 99.5% VaR, a standard formula has been published. Life Risk, which encompasses mortality uncertainty, is composed of several sub-modules (mortality, longevity, disability/morbidity, lapse, expense, revision and catastrophe risk); we will zoom in on mortality and longevity risk.

The standard formula here mentions an instantaneous and permanent decrease (increase) of 20% (15%) of all mortality rates as the assumption underlying the capital requirement for longevity (mortality). This standard capital requirement can look arbitrary, but it largely determines how much capital most insurance entities will reserve for mortality risk.

It is of course impossible to say whether this will prove an adequate reserve. Still, it should be immediately clear that an age- and gender-independent formula will overcharge for some policyholders and undercharge for others (see e.g. (3), (2), (14)). This could result in unexpected releases or increases in the Solvency Ratio if a specific group of policyholders is managed as a run-off portfolio, i.e. without the benefit of new contracts to help manage the risk.

This makes it important to keep questioning and challenging the standard model, to check whether its outcomes are compatible with different individual expectations of mortality evolution. The stochastic model developed in (3) and shown in this thesis does this by analyzing the differences with the standard model for specific types of life insurance contracts.

1.2 Stochastic Mortality Models

When we want to determine how reliable mortality prognoses really are, we need a stochastic mortality model. The most famous of these models is the Lee-Carter model (10), in which

log m_{x,t} = a_x + b_x·k_t + ε_{x,t},

with age-specific constants a_x, b_x and k_t indicating the level of mortality in a given year.

Since its publication, this model has had many expansions and specifications: it is the basis of stochastic mortality modeling. The link function between mortality (q_{x,t}) and the parameters can be changed depending on how one wants to fit the model. Often, b_x is considered to be of the form (x − x_center), making k_t the sensitivity of mortality to age, with an extra time-dependent constant k_t^(0). A specific extra component κ_t^(2) can be added for young ages, as in (14), or a cohort parameter to cover the cross terms between age and calendar time.

The model we will be working with is also of this form. We will be looking at

logit q_{x,t} = α_x + κ_t^(1) + κ_t^(2)·(x − x_center) + κ_t^(3)·(x_young − x)^+ + κ_t^(4)·(x − x_old)^+ + γ_{t−x}

We use specific factors for both the low and the high end of the age spectrum, and use a cohort factor to capture the residuals (although these are not large, see section 3.2). The ages x_young, x_center and x_old are taken to be fixed at 55, 60 and 85, in accordance with Börger et al. An alternative approach would be to estimate these boundary and central ages, but this would require a more comprehensive model estimation.
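To make the shape of this predictor concrete, the following R sketch evaluates the model for a vector of ages; the function name and any parameter values passed to it are purely illustrative assumptions, not fitted estimates.

# A minimal sketch of the model's age profile; predict_q and all parameter
# values passed to it are hypothetical illustrations, not fitted estimates.
x_center <- 60; x_young <- 55; x_old <- 85    # fixed as in Börger et al.
predict_q <- function(x, alpha, k1, k2, k3, k4, gamma) {
  eta <- alpha + k1 + k2 * (x - x_center) +
    k3 * pmax(x_young - x, 0) +               # (x_young - x)^+ term
    k4 * pmax(x - x_old, 0) +                 # (x - x_old)^+ term
    gamma
  1 / (1 + exp(-eta))                         # inverse logit gives q_{x,t}
}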

However popular the Lee-Carter model is, recently another interesting form of stochastic mortality model has been developed. With the specific SCR definition in mind, Bauer et al. have developed in (1) a way to account for the systematic mortality risk in an insurance portfolio. They work from the term structure of mortality (as given by a generation mortality table) and attach a volatility structure to find out how much the best estimate of mortality can change in a given amount of time (e.g. 1 year with 99.5% certainty).

This approach is in principle very different from the Lee-Carter method. Rather than using past mortality experience as a guide to the future, they look at best-estimate generation tables to give the future mortality rates, the way we look at bond and swap prices to determine a term structure of interest rates. But where we can look at swaptions and other asymmetric derivatives to calculate an implied volatility term structure, there is no such deep market for mortality products (reinsurance, longevity swaps, ...). If volatility cannot be found using forward-looking instruments, it must be estimated from the changes in best-estimate term structures. Both Bauer et al. (1) and Börger (2) use the changes between generation tables as input for volatility calculations. When these generation tables exist, they must be split into a best-estimate table and a risk margin component (when applicable), as only the changes in best-estimate probabilities indicate the relevant volatility.

The Bauer model would bring mortality modeling and pricing much closer to financial asset pricing theory. The comparison between policyholder survival and the time value of money is clear, and it could be useful to bring insights from financial mathematics to the rest of (life) actuarial science. We will still focus on Lee-Carter type modeling, as it remains a very important method in stochastic mortality forecasting.


1.3 Mortality in Old Age

Life expectancy at birth, as a measure of longevity, has increased a lot over the past 100 years. It is an intuitive and clear measure, and has the advantage that it is highly sensitive to mortality rates in childhood and young adulthood. As improvement of survival at young ages is a public health goal, having such an easy statistic is appealing. However, it does not respond well to changes in adult longevity.

Adult mortality is more clearly described by the modal age at death, as explained in (6) for example. There, the authors describe three different evolutions in mortality: (a) a period in which life expectancy increased but the modal age at death stayed more or less constant, (b) a period in which the modal age at death increased, but less than life expectancy, and (c) a period with parallel movements between the two. In periods (a) and (b) it is clear that one expects a decrease of volatility of mortality rates, but the authors mention that also in (c) the standard deviation SD(M+) (of the age at death conditional on death occurring above the modal age) decreases. This means that the decrease of mortality at very high ages is not parallel to that at earlier ages. We will see that our analysis supports the idea that mortality around age 100 changes less than at younger ages. All this fits into a wider research effort on the rectangularization of the survival curve (most famously in (9)). A decrease of SD(M+) indicates that mortality after the mode is being compressed into an ever smaller interval. Everybody has to die at some point, but we should not take this to mean that mortality rates, decreasing at low ages, suddenly start increasing for the old. As in Lee-Carter, this is not a constraint but an effect of the model methods.

We have an old-age specific component of mortality in our model (κ_t^(4)) which is positive, meaning the general decrease of mortality is undone at high ages and might even lead to increases over time given a high enough age. And yet at very high ages mortality does not necessarily increase with age. Mangel (11) and Partridge and Mangel (13) speak of mortality plateaus at old age, where ‘[...] mortality trajectories do not necessarily increase with age, but may level off or even decline with age.’ Vaupel (7) even finds accelerating declines in mortality for the old. So maybe an extra extreme-age component is needed, or our κ_t^(4) should be limited. If we read that older ages do not necessarily incur higher mortality rates, it would be rather surprising for these rates to increase with calendar time.

We analyze κ_t^(4) in section 4.4.1. We keep our analysis as mechanical as possible, but need to exclude some unrealistic evolutions of mortality. We will take from this discussion that, as calendar time goes by, mortality at old ages should not increase significantly.

1.4 Thesis Structure

We go through the work of Börger et al. in (3), and apply this model to the Dutch insurance situation. After this, we use the model to check how compatible the past mortality rates are with the expected future mortality as given in recent CBS prognosis tables.

In chapter 2 we will briefly discuss the setup of the mortality model we are studying, and the data that are the basis for the analysis. Chapter 3 contains the process of estimating the model parameters, and the extra calibration which was needed to fit the model to our purposes (prudent capital reservation). We continue in chapter 4, projecting the different parameters forward and looking at the different time series and their structure. This exercise will give us a series of mortality tables that contains the estimated variance structures.

In chapters 5 and 6 we compare these mortality tables to the Solvency II capital requirement and the CBS mortality updates, respectively.

Chapter 2

Model and Data

In this section we will go briefly over the main parts of the model as developed in (3), deferring the actual estimation procedure to the next chapter. We discuss the chosen parametrization and the relevance of the different parts.

The model estimation is of course only as good as the available data, so we will look into some possible problems with the data sets we use.

2.1 Model

We analyze the evolution of mortality rates in multiple populations, identifying a common trend while allowing for differences to persist and vary between populations. Following (3) we use the following model for mortality rates q_{x,t}:

logit q_{x,t} = α_x + κ_t^(1) + κ_t^(2)·(x − x_center) + κ_t^(3)·(x_young − x)^+ + κ_t^(4)·(x − x_old)^+ + γ_{t−x}    (2.1)

In this equation κ_t^(1) indicates the general level of mortality over time, across ages. This equation will be calibrated on each separate population (and even for each calendar year), but we will look at common behaviour afterwards. The factor κ_t^(2) indicates how mortality increases with age, so we expect positive values for this parameter (visible in figure 4.1). Having the coefficient (x − x_center) moves some information from κ_t^(1) to κ_t^(2), so that the exact value of x_center does not influence the analysis. Next, we have two parameters κ_t^(3), κ_t^(4) which allow mortality to differ from this general trend at low ages (κ_t^(3)) and at old age (κ_t^(4)). In (3), young age ends at 55 and old age starts at 85. This choice makes the fit better after the retirement age, when longevity risk really comes into play, as we see in section 5.2. Cohort effects are captured by the γ_{t−x}, and the α_x capture age-related effects that are constant in time.

We will make a separate estimation of the κ_t^(·) parameters for each combination of year, country and gender. We can't make a general statement of how significant each parameter is in each version of the model, but if it is really obvious that any of the parameters is superfluous, we should shrink the model. To get an idea of how important the parameters are, we use the standard R glm output as a proxy for actual significance. In chapter 3 we will explain the estimation algorithm we use on each separate population. Within this algorithm, the glm is run for each calendar year; we record for each year whether a parameter appears significant. We have compiled statistics on how often each κ_t^(·) parameter appears as significant for each population over the calendar years. At significance level p = 0.5%, we count how often each κ_t^(·) parameter is significant, and we show this for some selected populations, as well as the total population, in table 2.1. We see no reason to remove one or more parameters from our model.

Population                  κ_t^(1)   κ_t^(2)   κ_t^(3)   κ_t^(4)
Australia female            94.74%    75.44%    75.44%    40.35%
Australia male              98.25%    89.47%    92.98%    33.33%
Belgium female              91.23%    10.53%    10.53%    68.42%
Belgium male                89.47%    70.18%    84.21%    89.47%
France female               96.49%    52.63%    43.86%    85.96%
France male                 92.98%     1.75%    54.39%    73.68%
Netherlands female          94.74%    68.42%    52.63%    77.19%
Netherlands male            87.72%    71.93%    73.68%    45.61%
Western Population female   96.49%    35.09%    78.95%    85.96%
Western Population male     94.74%    64.91%    84.21%    89.47%
Total Population            89.47%    68.42%    50.88%    78.95%

Table 2.1: Frequency of GLM significance for each parameter at the p = 0.5% level
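As a rough illustration of how such a tally could be compiled in R, consider the sketch below; fit_year() is a hypothetical stand-in for the per-year GLM call described in chapter 3.

# A minimal sketch of the significance tally behind table 2.1; fit_year() is
# a hypothetical helper returning the per-year Gaussian GLM fit of chapter 3.
significance_freq <- function(pop_data, years, p_level = 0.005) {
  hits <- sapply(years, function(t) {
    fit <- fit_year(pop_data, t)                    # per-year GLM call
    p <- summary(fit)$coefficients[, "Pr(>|t|)"]    # p-values per coefficient
    p[1:4] < p_level                                # kappa1 .. kappa4
  })
  rowMeans(hits)   # fraction of calendar years each parameter is significant
}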

The parameter κ_t^(4) is an important parameter: it allows the mortality at high ages to change in a different, often stronger way than general mortality. Fewer people surviving to very old age means more volatile frequencies in mortality, as a simple binomial argument shows.

2.1.1 Volatility

In this thesis we use the model for the purpose of reserving: it could serve as a best-estimate model, but we are mostly interested in finding confidence intervals for the value of reserves, to set economic capital. Because of necessary prudence in this matter, we will make sure that volatility is large enough, even introducing a sizable volatility add-on in section 4.2.1.

The model could also be used for other stochastic calculations on mortality; we think of longevity swaps (often benchmarked reinsurance contracts). It is better to price and report the value of these contracts on a best-estimate basis, i.e. without a volatility add-on. However, using different volatilities for reserving economic capital and for reporting hedges could diminish the raison d'être of these contracts, whereas using the artificially higher volatility would lead one party to overpay for the contract.


Figure 2.1: Left: trend differences between Restricted Population and General Population, κ_t^(1). Right: κ_t^(1) in Iceland and the United States.

2.2 Data

We use the population sets from the Human Mortality Database (12) as of March 19, 2013¹, both the mortality rates and the population sizes per age. The latter will be used as part of the weights in the estimation. We focus on the period 1950-2004, as a large number of countries have complete data for this period.

Not all countries have readily available data for the analyzed period 1950-2004. For example, Germany did not exist as a single country prior to 1990, and many ex-Soviet republics have no data prior to 1959. We will need to choose which countries to include in our analysis and projections. When we restrict the analysis to those countries that have clear data, we risk underestimating the volatility of mortality rates. On the other hand, including all countries means the general population can undergo very significant increases or decreases of mortality rates at arbitrary times, and this can obscure the underlying trends for the original population. To see the difference, we show in figure 2.1 the values for κ_t^(1) when determined for the general population as a single country (i.e. with all available data included), and when restricted to the countries with full data sets.

Restricting the analysis to countries with full data sets excludes many countries in Eastern Europe. As a result, the uptick in mortality after 1990 is no longer there (see section 4.2.1), and the increasing trend in life expectancy is even stronger.

¹ Specifically, we use the population data for Australia, Austria, Belarus, Belgium, Bulgaria, Canada, Chile, Czech Republic, Denmark, Estonia, Finland, France, Germany, Hungary, Iceland, Ireland, Israel, Italy, Japan, Latvia, Lithuania, Luxembourg, the Netherlands, New Zealand, Norway, Poland, Portugal, Russia, Slovakia, Slovenia, Spain, Sweden, Switzerland, Taiwan, the United Kingdom, Ukraine and the United States of America.

We will use either the general population (all countries) or the restricted group of countries with full data, depending on the situation. When calibrating the volatility of the model, we need to include all countries, as each can have a significantly large shock in any year. We consider the increase in mortality around 1990 in Russia to be important for the analysis, as it will help us define the shock scenario for mortality (section 4.2.1), and we will use the full list of countries. However, when we are interested in projecting the future evolution of mortality, we are most interested in what we expect for the Netherlands. In chapter 4, we split the evolution in two parts: a trend that is valid across countries and a country/gender-specific deviation from this trend. We will get the best results if we determine the common trend from a group of similar countries. So here, we will restrict ourselves to countries with full data², which we will refer to as the Restricted Population. As with the general population, we make this 'country' by adding all population sizes and using weighted averages of mortality rates. We can see in the left-hand side of figure 2.1 the difference between the behaviour of κ_t^(1) in our restricted population and in the total population. It is clear that the increase in mortality in the ex-USSR in the early 1990s has a big effect on the general evolution of mortality, and it would be unwise to include this effect in a projection of Dutch mortality. It would lead to larger differences between the general level of mortality and the Dutch level. As the total level of mortality (including Russia) is more volatile, the resulting projection of Dutch mortality would be much more volatile (resulting from both the more volatile total level and the more volatile difference between the total population and sub-populations).

² The countries with full data are Australia, Austria, Belgium, Bulgaria, Canada, Czech Republic, Denmark, Finland, France, Hungary, Iceland, Ireland, Italy, Japan, the Netherlands, New Zealand, Norway, Portugal, Spain, Sweden, Switzerland, the UK and the USA.

2.2.1 Data Quality Issues

The Human Mortality Database has data going back a very long time, but this has a drawback. Not all old data is as relevant as the data we are interested in. The automated transfer of mortality data (in text files) to our model needs a few tricks.

We choose the period of analysis to be 1950-2004 because this is a stable period in Western Europe (and North America and Japan). The data for France suffer from territory changes in 1861, which means that we have two values for its population in that year (1861− and 1861+). This leads to errors when machine-reading the figures (years become non-numeric). Luckily, our tool of choice (R) is flexible enough to bring us back to the integer realm, but not without leaving empty spots (NAs) in the database. Most of the time these missing data are before the period of analysis, but unfortunately they still affect the population we examine. We have a jump in Italy in 1971, leaving us without a population size. We solve this by using the average of the year before and the year after the jump as a proxy. We need these population totals as weights in the estimation of the different κ_t^(·).

Figure 2.2: Female mortality in Belarus, 1965

The data present at the HMD are extensive, but its authors note that the quality is not always ideal. This will show when we are looking at sudden changes in mortality in section 4.2.1. It is often noted that no mortality data on the Soviet Union between 1974 and 1986 were published for ideological reasons, though this period has been covered later. Even when the data were published, this did not happen in a very detailed manner. An example is age heaping: either by misreporting in a census, or because deaths are only published for ages ending in 0 or 5, and then smoothed. This influences our data, as can be seen in figure 2.2 for Belarus in 1965.

The main effect will be that the volatility of mortality for ex-USSR countries will be too high. In a sense, this will make our model more prudent: we include these countries when calibrating volatility and shocks, but not when we estimate the future western mortality rates.

2.2.2 Small Populations

In countries with small population sizes, each death has a large impact on the q_x. Comparing Iceland and the United States, we see in the right-hand side of figure 2.1 that with a smaller population a country will have a more volatile realization of mortality: the curse of finite numbers blurs the theoretical analysis.

Even less practical is that if a population is small enough, e.g. Iceland's 2268 37-year-olds in 2002, there might not be a single death. An analysis of logit q_x then becomes impossible. We chose to solve this problem by replacing q_x values of 0 by the lowest


Chapter 3

Model Estimation

In this chapter, we will go through the estimation of the different κ_t^(·) for each year in our analysis period: 1950-2005. We will encode most general information into the κ_t^(1) and κ_t^(2) parameters, the first one giving the general level of mortality in a given year and the second one the effect of age. We want these two parameters to be the main carriers of information, and will adapt our estimation of γ_{t−x} to this goal.

We consider the population as independent individuals subject to a probability q_{x,t} of dying. We use the logit as it is the canonical link function in our estimation of this probability, given population sizes as the initial exposure for each year.

3.1 Model Estimation

This section describes the estimation algorithm. We will use this algorithm to estimate the various model parameters for a given population, by which we mean a group of people described by the population size and mortality q_{x,t} for each age and calendar year. We will usually look at a specific subset of people, for example Dutch women, but we can also combine different populations into an aggregate, using the sum of population sizes and size-weighted mortality probabilities. For example, we can look at Germans, or at the general population (all available countries and both genders). We will typically consider a country-gender combination as a population, as these are the smallest building blocks in the Human Mortality Database.

Once the q_x and population size have been found in the Human Mortality Database for each of the populations, the model (2.1) can be estimated. To keep the model outcomes easy to interpret, and the estimation time reasonable, the estimation is split into five steps and occurs separately for each population.

• First, α_x is set to the average logit q_x in the analysis period:

α_x = (1 / (t_end − t_start + 1)) · Σ_{t=t_start}^{t_end} logit q_{x,t}

This is a quick and dirty way of removing the main cross-year effects from the q_x, reducing the estimation of (2.1) to those factors that depend on t.
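In R this first step is a one-liner; a minimal sketch, assuming q is an age-by-year matrix of observed mortality rates for one population:

logit <- function(p) log(p / (1 - p))
alpha_x <- rowMeans(logit(q))   # average logit q_{x,t} over all calendar years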


Next, we will estimate the dependence on t by making separate model calls for each year t. Only in later stages will we regroup the information across years to have a total view per population. Finally, we will look at cross terms as a correction for cohort effects. This way, we capture the first-order effects of x, t and their interaction in our model.

• Then, the remaining logit(q_{x,t}) − α_x are modeled as a response to the κ_t^(·). We make a separate estimation for each year, which is easier to interpret. It can have a computational advantage as well, although this seems to be minimal using R on these data sets. We use a GLM with a Gaussian distribution, with weights representing population size:

logit q_{x,t} − α_x ~ κ_t^(1) + κ_t^(2)·(x − x_center) + κ_t^(3)·(x_young − x)^+ + κ_t^(4)·(x − x_old)^+,  for x = x_min, ..., x_max

• After the first two steps, the mortality information is encoded in both the α_x and the κ_t^(·). We want our analysis to be as simple as possible, and are mostly interested in the evolution of mortality rates over time. It would be nice to have the κ_t^(·) contain as much relevant information as possible. To arrive at this state, we can center the α_x at 0. We go through the following replacements (with α_x linearly approximated as φ_0 + φ·x):

κ_t^(2) ← κ_t^(2) + φ
α_x ← α_x − φ·(x − x_center)
α_x ← α_x − α_{x_center}
κ_t^(1) ← κ_t^(1) + α_{x_center}

where α_{x_center} in the last line refers to its value before the previous subtraction.

As visible in figure 3.1, we remove the linear part of α_x and encode it in κ_t^(1), κ_t^(2). Note that this does not change the sum α_x + κ_t^(1) + κ_t^(2)·(x − x_center), and hence does not change the specification of the q_x. The κ_t^(1), κ_t^(2) parameters have only changed levels, but there should be no linear age-bias in the α_x anymore.

Figure 3.1: α_x original versus transformed, with the linear fit φ_0 + φ·(x − x_center)
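A minimal sketch of these replacements in R, assuming alpha is the vector of α_x over the ages x, and kappa1, kappa2 are the per-year estimates; note that κ_t^(1) must pick up the level of α at x_center before that level is subtracted:

phi    <- coef(lm(alpha ~ x))[2]      # slope of the approximation phi0 + phi*x
kappa2 <- kappa2 + phi
alpha  <- alpha - phi * (x - x_center)
a0     <- alpha[x == x_center]        # remaining level at the central age
kappa1 <- kappa1 + a0
alpha  <- alpha - a0                  # alpha is now centered at 0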


• In the estimation, we keep a record of the residuals for each of the GLM models (one for each calendar year). These are now fitted to cohort parameters to show cohort effects as γ_{t−x}. These could be estimated at the same time as the κ_t^(·), but that would (a) introduce trends in γ_{t−x} where we prefer not to have these, and (b) complicate the programming: we make separate estimations for each country-gender combination (population), and as different amounts of data are available for each country, the size of a model containing all calendar years depends on the country. In our approach we do the estimates on a smaller scale, just looking at all ages for a given population and a given year, then moving to the next year and ultimately to the next population. In section 3.2 we will discuss the differences in outcomes between the two estimation methods.

The authors of (3) estimate γ_{t−x} by least squares on the residuals. As we are estimating for one population at a time, this reduces to taking a population-weighted average. We simply define γ_cohort as the mean of the residuals when estimating logit(q_{x,t}) with (t − x) = cohort. We do not weight this average by population size, as a cohort involves the same individuals and its size only decreases in time. There seems to be no reason to weight an estimation error at young ages as more important than an error at high ages. The difference is minimal, as can be seen from figure 3.2.
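A minimal sketch of this residual average in R, assuming res is a data frame of per-year GLM residuals with columns x, t and residual:

gamma <- tapply(res$residual, res$t - res$x, mean)  # unweighted mean per cohort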

Figure 3.2: Difference in estimation methods for γ_{t−x}, General Population

Moreover, one might wonder why the size of the cohort is relevant as a weight when estimating the cohort effect but not when estimating the year or age effects. The main difference between the two sets of parameters is that α_x and κ_t^(·) capture broad population mortality, whereas γ_{t−x} follows the same group of people throughout their lives. This difference is not necessarily relevant when projecting mortality for future years, so we will keep using the traditional mean.

• Lastly, we readjust the parameter α_x. We have already taken out the age-trend, but would now like the model fit to be closest to the most recent data. We replace α_x by a weighted average of α_x and the final residuals, using lower weights for more distant years:

α_x = Σ_{t=t_start}^{t_end} (w_t / Σ_i w_i) · ( logit q_{x,t} − κ_t^(1) − κ_t^(2)·(x − x_center) − κ_t^(3)·(x_young − x)^+ − κ_t^(4)·(x − x_old)^+ − γ_{t−x} )    (3.1)

Here, the weights are w_t = (1 + 1/h_α)^(t−t_end), meaning that with smaller h_α the older observations are less important, and with h_α → ∞ there is no adjustment. In this step we follow Börger et al. The effect is that we will start our eventual projection with values quite close to the observed mortality, making the future expectation more relevant.
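A minimal sketch of (3.1) in R, assuming resid_mat holds logit q_{x,t} minus all κ and γ terms (ages in rows, years in columns), years the calendar years and h_alpha the weighting parameter:

w     <- (1 + 1 / h_alpha)^(years - max(years))  # lower weight on older years
alpha <- as.vector(resid_mat %*% (w / sum(w)))   # weighted average per age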

To check the quality of our fit, we look at Belgian males and females aged 30, 60 and 100 in 2004. In table 3.1 we see that the model fits rather well for young ages, but misses the mortality for this specific sample of very old females.

        Age   q_x         q̂_x         q_x/q̂_x
Female   30   4.00·10^-4  3.96·10^-4  100.91%
Female   60   5.94·10^-3  5.54·10^-3  107.19%
Female  100   3.72·10^-1  9.81·10^-1   37.88%
Male     30   9.30·10^-4  9.60·10^-4   96.90%
Male     60   1.11·10^-2  1.10·10^-2  100.26%
Male    100   4.11·10^-1  4.09·10^-1  100.49%

Table 3.1: Model accuracy for selected ages, Belgium 2004

3.2 Estimation of cohort effects

The estimation in the previous section can seem convoluted, using multiple steps to isolate the age-effect from the year-effects, and only estimating the cohort effect from the residuals instead of from the original data.

We think the case for treating α_x as above is clear: we are interested in how mortality changes over time, and thus want κ_t^(·) to carry as much information as possible. It concerns a linear transformation and changes nothing about the estimation and the analysis, but without the transformation κ_t^(1) would only capture part of the general level of mortality, and κ_t^(2) only part of the sensitivity to age. As for the cohort parameters γ_{t−x}, we follow (3) in re-estimating the model in a large GLM containing all calendar years for a given population, with a separate γ parameter for each cohort.

This is a bit more complicated than the model estimation described before. Where before we had four explanatory variables (κ_t^(1), κ_t^(2), κ_t^(3), κ_t^(4)) and residuals that were turned into 141 cohort variables (by averaging the residuals for each separate value of t − x), we now have four explanatory variables for each year, and cohort variables to match. Depending on the country, the model will include a different number of years, and thus a different number of explanatory variables (four per calendar year plus one per cohort). The GLM call is now, over all ages and calendar years simultaneously:

logit q_{x,t} − α_x ~ Σ_{s=t_start}^{t_end} χ(t=s)·( κ_s^(1) + κ_s^(2)·(x − x_center) + κ_s^(3)·(x_young − x)^+ + κ_s^(4)·(x − x_old)^+ ) + Σ_c χ(t−x=c)·γ_c,

with one indicator χ(t−x=c) for each cohort c = t_start − x_max, ..., t_end − x_min.

Again, we need to remove the linear trend from the α_x and introduce it into κ_t^(1), κ_t^(2), or the outcomes will not be comparable with what we have found before. In figure 3.3 we show the difference between the γ_{t−x} and κ_t^(1) parameters in the two estimation methods for the total population. We notice that the full estimation uses the cohort parameters for part of the mortality trend, resulting in a level difference in the κ_t^(1).

Figure 3.3: Full estimation of γ_{t−x} versus residual estimation

For the α_x parameters, a very clearly defined trend was easy to remove with a linear transformation, because the α_x are common to all the calendar years for which the κ_t^(·) are defined. The trend in γ_{t−x} is less clear, and if we tried to remove it we would run into the problem that the cohorts (t − x) are not independent of calendar time t.

The difference between the full estimation and the residual estimation of γ_{t−x} goes from 0 up to a level of about 0.15 and comes back to 0 (and a little below 0). It leads to an almost constant level difference between the two estimated κ_t^(1).

We are interested in a κ_t^(1) that carries as much mortality information as possible, because a simple trend structure will be the easiest to project into the future. If we found a trend in γ_{t−x} for one of the populations, this would explain part of the improvement of mortality with time. The improvement in κ_t^(1) would then understate (or overstate) the evolution of mortality rates, and a trend in γ_{t−x} would have to be projected in a way that is consistent with κ_t^(1) throughout the years. This would be even harder when considering multiple populations at the same time. Hence, we prefer to construct γ_{t−x} without trend, and stick with the separate estimation of κ_t^(·) and γ_{t−x}. We can then focus on the behaviour of κ_t^(1), the meaning of which is easy to interpret.

As a practical advantage of this decision we mention that not all cohorts are present for each analyzed country. Extracting the model coefficients and storing them in the appropriate data frames is clear-cut with yearly models that are the same size each time. Of course, a closed formula can be found for variable-size models, depending on t_start and t_end, but it would be a complication and make the model more error-prone. Based on this theoretical preference and the observations for US females, we prefer not to extend this alternative model estimation to the general case.


Chapter 4

Projection

Now that we have a model to analyze the movements of mortality in the past half-century for a large collection of populations, we need to set out projecting the future evolution. To do this, we will identify the main present trends in κ_t^(1), κ_t^(2), κ_t^(3), κ_t^(4) and use these to project the parameters further. Depending on the situation, we will make our projections either for the aggregate population or for individual countries (but still depending on what happens in other populations).

Later in the chapter we will describe how we bring all these projections together into forward-looking stochastic mortality tables.

4.1 Present Trends

We look at the behaviour of the four κ_t^(·) parameters in figure 4.1. For κ_t^(1) there is a very clear, common downward trend. For κ_t^(2) and κ_t^(3) there is no very clear trend, as all populations hover around 0.09 (for κ_t^(2)) or 0 (for κ_t^(3)). We see an upward trend in κ_t^(4) for the total population and the female groups (dashed lines in the graph). For men the trend is less clear in the picture. On aggregate, κ_t^(4) behaves for men as it does for women after 1970, having been about constant up to that point. By constructing the γ_{t−x} as residuals we ensured that they all lie around 0 without a common trend.

We will project κ_t^(1) for the entire population, with deviations for each country-gender combination to make individual projections. We will project κ_t^(2), κ_t^(3) without a trend (as in (3)), but correlated. Since we can see a trend in κ_t^(4), we will treat this parameter similarly to κ_t^(1), but will not include the possibility of common trend changes. This is a departure from the constant-mean κ_t^(4) that (3) projects. We will project γ_{t−x} similarly to κ_t^(2), to allow some volatility in the cross term between calendar time and age.

4.2 Model Calibration

The main trends will be visible in κ_t^(1), which dominates the overall mortality. Given this importance, we will want to spend some extra time getting the projection parameters right. In this first section, we look at what volatility is appropriate for a reserving model. Specifically, we need to decide how large one-year shocks in κ_t^(1) can be. That involves increasing the volatility and trend sensitivity beyond what was up to now a purely econometric exercise, and it necessarily injects some subjectivity into our model. Up to this point we have followed Börger et al. very closely, but we will use our own estimates in this part, which will lead to a higher estimate of future longevity risk.

Figure 4.1: Parameter behaviour of κ_t^(1), κ_t^(2), κ_t^(3), κ_t^(4) for sample populations.

4.2.1 Volatility of κ_t^(1)

As we will see in chapter 4, the main driver of mortality evolution in our model is the κ_t^(1) process. We need to establish what we consider a ‘big’ change in this parameter, and how often we expect to see these, so we look around for large movements in a single year.

We see some very large movements in mortality, mostly in the former USSR after the fall of the communist regimes. Translated into κ_t^(1), this means an enormous increase in the early 1990s, as visible in figure 4.2.

For all years t_1 = 1980, ..., 2005 we measure how much the realization κ_{t_1}^(1) differs from the expected value in a weighted linear model, given the volatility in the years t = 1950, ..., t_1.

Figure 4.2: Mortality increase in Russia in the early 1990s

We first estimate the weighted linear trend in κ_t^(1) limited to t ≤ t_1, with weights as in the readjustment of α_x (equation (3.1)), where we use the same value h = h_α. Then, we consider as the volatility of κ_t^(1) the weighted standard deviation from this linear trend:

σ² = ( Σ_i w_i / ( (Σ_i w_i)² − Σ_i w_i² ) ) · Σ_{i=1950}^{t_1} w_i·(κ_i^(1) − κ̂_i^(1))²    (4.1)
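A minimal sketch of this estimate in R, assuming kappa1 holds the series up to t_1 over years, with weighting parameter h:

w     <- (1 + 1 / h)^(years - max(years))   # heavier weight on recent years
fit   <- lm(kappa1 ~ years, weights = w)    # weighted linear trend
dev2  <- (kappa1 - fitted(fit))^2
sigma <- sqrt(sum(w) / (sum(w)^2 - sum(w^2)) * sum(w * dev2))   # as in (4.1)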

The big shock should show up in the volatility, so we look at how it behaves in the period 1990-1995, showing for each year the volatility up to but not including that year:

Year    1990     1991     1992     1993     1994     1995
σ       0.0692   0.0655   0.0616   0.0778   0.1449   0.1743
Shock   0.8778   0.8332   2.3155   4.1853   2.0898   0.3549

Table 4.1: Volatility of κ_t^(1) in the early 1990s

In this table, we define the shock as the difference between the realized κ_i^(1) and its estimate κ̂_i^(1), divided by the estimated standard deviation. Under the volatility of the observed period, we show how big the next year's shock is relative to this volatility. The shock in 1993 is 4.1853 standard deviations, which is rare: it only appears in a normal distribution with p = 1.424·10^-5. But as this shock did happen, we want it to be within a reasonable confidence interval of the history up to 1992 (e.g. 99.8%). To bring it within this reasonable range, we would need to increase the standard deviation from 0.0778 by 0.038 to reach 0.1158. Then the innovation of 0.326 would be 2.812 times the standard deviation of 0.1158, corresponding to a certainty of 99.8% assuming a normal distribution. Note that this is roughly the same volatility value as (3) reaches, although they get it from a 1992 volatility of 0.06, perhaps through a shift in years. As a result, our volatility add-on is a lot lower than the 0.05 that Börger et al. mention.


It is worth noting that Russian males also experienced a shock in 1986, when life expectancy jumped up and κ_t^(1) dropped to −3.57 from a steady rise. Measuring the shock in that year, we find 5.7679·σ (where σ is the estimated standard deviation according to (4.1)). We will follow (3) in picking 1993 as the shock on which to calibrate our model. The shock in 1986 could be a statistical anomaly: in 1986-1987 the Soviet Union was entering the new era of Glasnost and resumed publishing mortality statistics (12, Background and Documentation, Russia). In addition, there is a bit less data (27 years as opposed to 34 years) in which to establish the base volatility, making the shock less reliable. Coincidentally, the volatility add-on to make 1986 a 99.8th percentile shock is 0.044, a bit higher than the add-on we are considering, but still smaller than the 0.05 from the paper. It is this low because of the very low volatility of κ_t^(1) in the previous 27 years.

The reasoning behind the 99.8th percentile as calibration could be explained as follows. All 37 countries have about 50 years of data in the analyzed period (some slightly less). When we look at the volatility of κ_t^(1) as in equation (4.1), we assume that we have captured the inter-year correlation by the weighted linear model, and we will ignore residual correlation. The 37 countries are not independent: innovations in mortality can depend on climate, natural disasters, medical advances, or an interdependent economy. If we imagine that our data set consists of only 10 independent populations (for example because of a hypothetical correlation of 0.7 between different countries), we will have a relatively prudent estimation of volatility. Next, we have data on both genders. Positing perfect dependence in the innovations for males and females, we have 50·10·1 = 500 independent observations of changes in mortality. We want the experience of Russian males in 1993 to be a 1-in-500 scenario¹, or a 99.8th percentile scenario.

When looking at the multi-population setting, we can perform a similar analysis. We see a drop in life expectancy (as a strong rise in κ_t^(1)) in the early 1990s, and one in 1986. We want this to be a 99.6th percentile shock (now recognizing both shocks). We find the following volatilities and accompanying shocks:

Year    1990     1991     1992     1993     1994     1995
σ       0.0185   0.0169   0.0174   0.0230   0.0430   0.0496
Shock   0.7340   1.3709   2.4577   4.0864   1.8800   0.4430

Table 4.2: Volatility of κ_t^(1) in the early 1990s for the General Population

Here in 1993, a volatility add-on of 0.029 would make that year a 2-in-50 scenario. We pick 50 scenarios here because we only have about 50 years of data, and a single (consolidated) population.

¹ Or, in light of the discussion on 1986, a 2-in-500 scenario. This would lead to a volatility add-on of 0.046.


4.3 Projecting κ_t^(1)

4.3.1 A process for the general population

From figure 4.1 we see that κ_t^(1) behaves as stochastic noise around a linear trend, although this trend is not necessarily constant over the entire period. The top line in figure 4.1 (orange: Dutch males), for example, is at first slightly increasing and then turns decreasing around 1970. As a consequence, we can't simply model κ_t^(1) as a random walk with drift, as the drift in such a process is constant and the trend cannot easily change.

Instead, we will re-estimate the model with each new year of observations, updating both drift and volatility. Intuitively, this is close to the (non-life) actuarial practice where expectations are updated after each new year of observed claims. By contrast, the life practice we work in gives a lot of consideration to the biannual CBS mortality projections, where expert and epidemiological expectations also play a role in the forecasts.

So consider a weighted linear model, with weights (1 + 1/h)^(t−t_end). The calibration of h will follow in section 4.3.3. We find a line κ_t^(1) = a + b·t + e_t approximating the actual κ_t^(1), with most weight on the recent observations, by minimizing

Σ_i (1 + 1/h)^(i−t_end) · (κ_i^(1) − a − b·i)²

This line is the current trend. We use a weighted volatility σ (see equation (4.1)) to determine how large the deviations from this trend typically are.

With this information we project κ_{t_end+1}^(1) as:

κ_{t_end+1}^(1) = a + b·(t_end + 1) + ε_{t_end+1}·(σ + σ̄)    (4.2)

with ε_t i.i.d. standard normally distributed.

The parameter σ̄, calibrated in section 4.2.1, is added to the statistical volatility to make this model more prudent. We use the value 0.038 only when prudence is needed, and may set it to 0 for best-estimate projections.

Next, we imagine we observed the history up to t_end + 1, including the projected year, and repeat this process. We re-estimate the linear trend and the (weighted) variance from this trend, then use this new variance to project again. Note that (3) uses a constant weighted variance, which they find in a different way. Similar to our equation (4.1), they determine a weighted linear model based on the history t < t_1, for each t_1 = 1951, ..., 2004. Their volatility σ is the weighted average of the deviations

|κ_t^(1) − κ̂_t^(1)|,  t = t_start+2, ..., t_end.

Our version allows the volatility to increase in extreme scenarios, making the model more prudent when stress testing.
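A minimal sketch of this iterative projection in R; the function name is our own, and sigma_bar is the prudence add-on of section 4.2.1 (set it to 0 for a best estimate):

project_kappa1 <- function(kappa1, years, n_ahead, h = 3.5, sigma_bar = 0.038) {
  for (k in seq_len(n_ahead)) {
    w     <- (1 + 1 / h)^(years - max(years))
    fit   <- lm(kappa1 ~ years, weights = w)     # current weighted trend
    dev2  <- (kappa1 - fitted(fit))^2
    sigma <- sqrt(sum(w) / (sum(w)^2 - sum(w^2)) * sum(w * dev2))
    t_new <- max(years) + 1
    k_new <- predict(fit, newdata = data.frame(years = t_new)) +
      rnorm(1) * (sigma + sigma_bar)             # equation (4.2)
    kappa1 <- c(kappa1, k_new)   # treat the simulated year as observed
    years  <- c(years, t_new)    # and re-estimate in the next step
  }
  kappa1
}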

In both versions of the model, re-estimating is what makes the model more tolerant of trend changes than a simple random walk with drift. If we determined the drift µ as a weighted average of κ_t^(1) − κ_{t−1}^(1) and projected κ_{t+1}^(1) = κ_t^(1) + µ + ε·(σ + σ̄), this drift would stay constant and trend changes would be limited.

4.3.2 Projection as a changing AR(n)-process

Let’s have a closer look at equation (4.2):

κ_{t+1}^(1) = a + b·(t + 1) + ε_{t+1}·(σ + σ̄),

where a and b depend on the weights w_i.

We can write the trend value l_t as a + b·t, where we can expand both a and b, using n = t_end − t_start + 1:

a = κ̄_t^(1) − b · ( Σ_i i·w_i / Σ_i w_i )

b = ( Σ w_i / ( (Σ w_i)² − Σ w_i² ) ) · Σ_{i=t_start}^{t_end} w_i·( i − (t_start + t_end)/2 )·( κ_i^(1) − κ̄_t^(1) )

In this last expression, we can expand the weighted mean κ̄_t^(1) as a function of the separate κ_i^(1). Then we see that we can write this expression as a linear function of the κ_i^(1), with coefficients that depend on i only through the weighting parameters:

w_i·( i − (t_start + t_end)/2 ) = (1 + 1/h)^(i−t_end) · ( i − (t_start + t_end)/2 ),

i.e., they are of the form

(1 + 1/h)^i · (c_1 + c_2·i)

for some c_1, c_2 independent of i and the κ_i^(1). Because the coefficients do not depend on the previous κ_i^(1), we can write

κ_{t+1}^(1) = a_1·κ_1^(1) + a_2·κ_2^(1) + · · · + a_n·κ_n^(1) + ε_{t+1},

making this process look like an AR(n) process. It is not, for two reasons:

1. The value of n depends on how many terms have come before the current projection step. With the form the weights take, this could still be ignored as n → ∞, because terms from the distant past have negligible impact.

2. The volatility of ε_{t+1} is re-estimated in every timestep, whereas a regular AR(n) process is homoscedastic. In this we differ, of course, from (3).

4.3.3 Trend sensitivity of κ_t^(1)

We have seen that the trend can change, and the parameter h describes how much influence the most recent shocks get in ongoing projections, thus indicating trend sensitivity. Here we will find a value for h that makes the model fit the data.


We use a weighted linear model to estimate the trend, κ_t^(1) = a + b·t + e_t, where the parameters are estimated using weights (1 + 1/h)^(i−t_end) as in (4.1). The parameter h in this case determines how much weight is given to the most recent observations: a low value of h means that older observations are (almost) ignored, while as h → ∞ we use the entire history. To calibrate this parameter h we first need to identify a strong trend, and in figure 4.3 there is a clear and shared trend change around 1970 for a set of countries excluding most of the former USSR (the same set of countries as in 2.1).

parameters are estimated using weights 1/ (1 + 1/h)i as in (4.1). The parameter h in this case will determine how much weight is given to the most recent observations, so a low value of h will mean that older observations are (almost) ignored, but as h → ∞ we use the entire history. To calibrate this parameter parameter h we need to first identify a strong trend, and in figure4.3 there is a clear and shared trend change around 1970 for a set of countries excluding most of the former USSR (the same set of countries as in2.1). 1960 1970 1980 1990 2000 −5.5 −5.0 −4.5 −4.0 −3.5 Year kappa_one Australia Austria Belgium Denmark Finland France Italy Japan Portugal Netherlands Norway Spain UK USA General Population Restricted Countries

Figure 4.3: Different populations with their κ(1)t . Solid lines are for males, dashed lines for females.

The trend change around 1970 is one in which a rather stationary process (at least for men) changes to a decreasing process. We cannot visually identify another trend change, other than the increase in mortality around 1993 mainly in ex-Soviet countries. As trend changes take some time to become clear, we can’t just examine each separate year, so we will stick to analyzing the changes around 1970. Here, we want to find a value for h that makes the strongest change in trend appear at reasonable probabilities, say the 95th percentile (as in (3)).

Figure 4.3 shows a particularly strong trend change for Dutch males (orange in the graph), where a horizontal line changes into one decreasing as much as those of all other populations. We examine how h can be set to show the probability of this shift.

For this, we look at the history of κ_t^(1) up to 1970, and project 25 more years of κ_t^(1) using the projection method described above, but with different values of the weighting parameter h. We do this 2500 times, and see how the actual trend compares to the 99th percentile. We have no specific end year for the trend in mind, so we look at the 99th percentile of each specific projection year. In general this gives wider percentiles than looking at, say, the 99th percentile scenario for year 25 and following that specific scenario throughout the 25 years.

The comparison is visible in figure 4.4 for different values of h. To make the outcomes comparable, we keep the random seed constant at the start of each projection.

Figure 4.4: Actual trend versus possible trends, for different values of h

We are looking at percentiles for $\kappa^{(1)}_t$, computed for each separate year. The combined 95th percentile line will in general be lower than any of the individual scenarios, so setting a trend at this level corresponds to the actual trend lying at a higher percentile. Put more precisely, if we were to calibrate the observed trend change at the 99.5th percentile, we would never actually observe it as a scenario in our projections. To be able to rank entire scenarios, we need a specific metric, and the calibration would depend on that metric. We could pick the value of $\kappa^{(1)}_t$ at any specific year, or a more complex function from the various sets of $\kappa^{(1)}_t$ to $\mathbb{R}$. At the end of chapter 5 we will do precisely that, using the value of a model portfolio for each scenario as a metric. For now, we want to stay away from any specific portfolio or term structure of interest rates, so we limit ourselves to this simpler method.

We see that a value of h = 3.5 keeps the increase in Dutch male mortality within the 95th percentile. The value we find is lower than the value of 4-5 that (3) mentions, in part because we use a much lower volatility add-on of 0.038 (as compared to 0.05 for $\bar{\sigma}$).

Returning to the multi-population setting, we look at the evolution of $\kappa^{(1)}_t$ for the total population of countries with full data sets. We find a trend change that is not nearly as strong. Trend changes in the aggregate population are of course less intense than extreme cases found within this aggregate. Hence we look at a more moderate event such as the 90th percentile. With a value of h of about 2.5-3, the observed trend change just touches these percentiles.

Figure 4.5: Actual trend versus possible trends, for different values of h

Finally, we want to emphasize that calibrating h and $\bar{\sigma}$ is a highly iterative process: $\bar{\sigma}$ depends on the volatility estimate, which in turn depends on the weights h used in the estimation. Conversely, h depends on how volatile we assume $\kappa^{(1)}_t$ to be, since this volatility determines the percentile lines we see in the figures. The same weights are used to determine the volatility as to determine the trend sensitivity. Thus, when we pick a value for h, we need to determine the size of the shocks as in table 4.1 to find the volatility add-on. Different values for the volatility add-on will give different trends in figure 4.4, and this influences the value of h we want to pick.

To determine the value of h = 3.5, we have tried a range of possibilities between 2 and 4.5, each time reevaluating the volatility add-on and the resulting distribution of $\kappa^{(1)}_t$ after 1970. This corresponds to a slightly higher value for $\bar{\sigma}$ of 0.038. For this number, we base the calibration of $\bar{\sigma}$ on the Russian experience. This makes the volatility add-on larger than it would be in the multi-population case, as the expected binomial variance is larger for sub-populations. A real multi-population calibration, with its lower volatility, would lead to lower capital requirements.

4.3.4 Projecting the individual $\kappa^{(1)}_t$ from the general population


Figure 4.6: Differences between $\kappa^{(1)}_t$ for individual populations and the total analyzed population. Solid: males, dashed: females.


We see in figure 4.6 the difference between $\kappa^{(1)}_{t,p}$ and $\kappa^{(1)}_{t,\text{total}}$, where p varies over the different gender/country combinations. These differences often move in arcs, moving away from or closer to zero and then moving back, rather than drifting away entirely or converging to zero. In the observed period, it seems that each population has its own steady 'average' difference from the general population, and there is no reason to assume these differences will change in the future. We do have to allow departures from this steady difference to persist for a longer period of time (say 10 years).

It makes sense, then, to model the differences as mean-reverting AR(1) processes with a mean reversion parameter close to 1. We say that

$$\kappa^{(1)}_{t,p} - \kappa^{(1)}_{t,\text{total}} = a_p + b_p \cdot \left( \kappa^{(1)}_{t-1,p} - \kappa^{(1)}_{t-1,\text{total}} \right) + \epsilon_{t,p}$$

where $-1 < b_p < 1$ for stationarity, rather than the lower bound of 0 that (3) mentions.

To estimate $b_p$ we consider the correlation between $\kappa^{(1)}_{t,p} - \kappa^{(1)}_{t,\text{total}}$ (denoted $d_t$) and $\kappa^{(1)}_{t-1,p} - \kappa^{(1)}_{t-1,\text{total}}$ (denoted $d_{t-1}$). The mean reversion parameter $b_p$ is given by

$$b_p = \frac{\operatorname{cov}(d_t, d_{t-1})}{\operatorname{var}(d_{t-1})} = \frac{2 \cdot \operatorname{cov}(d_t, d_{t-1})}{\operatorname{var}(d_t) + \operatorname{var}(d_{t-1})}$$

The last adjustment is a theoretical equality for an AR(1) process, but we use it here because the two periods (1950 to 2003 and 1951 to 2004) do not, of course, give exactly the same variance. We find values of $b_p < 1$ as required. In this estimation, we use weighted variance and covariance with the customary weights for h = 3.5. In the populations considered, all but one of the values for $b_p$ were between 0.09 and 0.98, the exception being −0.22. All of these values indicate a stable mean reversion; the single negative value represents Finnish females, where the observed departures from the mean were very short-lived in the last half of the observed period.
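As an illustration, this estimator can be written compactly as follows; the function name and the oldest-first alignment of the series and weights are assumptions of this sketch.

import numpy as np

def estimate_bp(d, w):
    # Symmetrised weighted moment estimator from the text:
    # b_p = 2 * cov(d_t, d_{t-1}) / (var(d_t) + var(d_{t-1})).
    x, y = d[1:], d[:-1]                 # d_t and d_{t-1}, oldest-first
    ww = w[1:]                           # one weight per (d_t, d_{t-1}) pair
    mx = np.average(x, weights=ww)
    my = np.average(y, weights=ww)
    cov = np.average((x - mx) * (y - my), weights=ww)
    var = 0.5 * (np.average((x - mx) ** 2, weights=ww)
                 + np.average((y - my) ** 2, weights=ww))
    return cov / var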

The parameter $a_p$ is determined based on the historical difference $a'_p = a_p/(1 - b_p)$.

We consider four methods for finding this difference:

1. As in the previous sections of the analysis, a w-weighted mean with h = 3.5.

2. An unweighted average of differences.

3. An intermediate way, closer to (2): we use a w-weighted linear model to extrapolate each individual population with an extra 5 years, then take an unweighted average.

4. A second intermediate way: use 15 years of weighted extrapolation instead of 5 (see the sketch after this list).
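A minimal sketch of method 4, under the same geometric weighting convention as before (the name `long_run_difference` is hypothetical; only the 15-year extension and the unweighted average come from the description above):

import numpy as np

def long_run_difference(d, h, extra=15):
    # Method 4: extend the difference series with `extra` years of
    # w-weighted linear extrapolation, then take an unweighted average
    # over the extended series.
    n = len(d)
    w = (1.0 / (1.0 + 1.0 / h)) ** np.arange(n)[::-1]   # oldest-first history
    t = np.arange(n)
    b, a = np.polyfit(t, d, deg=1, w=np.sqrt(w))
    extension = a + b * np.arange(n, n + extra)
    return np.concatenate([d, extension]).mean()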

We give some example outcomes of this estimation in table 4.3. For some populations, notably UK females and French males, the four methods yield close to identical results. For populations with a strong and consistent upward or downward trend, such as Japanese females, the choice of method has an impact. Using an unweighted average will draw the $\kappa^{(1)}_t$-deviation back, away from the trend of the past decades, whereas with a weighted mean, depending on the value of h, only the most recent values will have influence. So we try to find a middle ground by extending the time series with 5 or 15 extra values, and then taking an unweighted average. This way, we find values for the long-term difference that are not too far from the recent trend, but also incorporate the more distant past. In the remainder of the projection we will consider only method 4.

Population            method 1   method 2   method 3   method 4
Australia    female    -0.6169    -0.4259    -0.4466    -0.4886
             male      -0.0174     0.2327     0.2048     0.1469
Belgium      female    -0.4139    -0.4066    -0.4057    -0.4015
             male       0.3161     0.3343     0.3318     0.3258
France       female    -0.5932    -0.5470    -0.5477    -0.5425
             male       0.3195     0.3263     0.3249     0.3208
Japan        female    -0.8129    -0.5508    -0.5748    -0.6167
             male       0.0456     0.1160     0.1117     0.1079
Netherlands  female    -0.3566    -0.5364    -0.5149    -0.4683
             male       0.2015     0.1222     0.1269     0.1304
UK           female    -0.2931    -0.2911    -0.2935    -0.3016
             male       0.2381     0.3287     0.3174     0.2923

Table 4.3: Different methods for estimating $a'_p$

Next we find $a_p = a'_p \cdot (1 - b_p)$.

After this we find that the differences at time t from the prediction at time t − 1 are close to white noise. Out of the 48 analyzed populations, a Jarque-Bera test performed at p = 0.05 rejects 12, so we continue and project the innovations $\epsilon_{t,p}$ as $\epsilon_{t,p} \sim N(0, \sigma_p)$. We allow for the inter-population correlation structure we find for the period 1951-2004, using w-weights with $h_\sigma = h$.
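A minimal sketch of drawing these correlated innovations, assuming a positive-definite weighted covariance estimate (the helper name is hypothetical):

import numpy as np

def draw_innovations(cov, n_years, seed=None):
    # Draw jointly normal innovations across populations whose
    # covariance matrix was estimated with w-weights (h_sigma = h).
    rng = np.random.default_rng(seed)
    L = np.linalg.cholesky(cov)          # requires a positive-definite cov
    z = rng.standard_normal((n_years, cov.shape[0]))
    return z @ L.T                       # row t holds year t's correlated shocks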

4.4 Projection of $\kappa^{(2)}_t$, $\kappa^{(3)}_t$ and $\kappa^{(4)}_t$

As we have seen in figure 4.1, the parameters $\kappa^{(2)}_t$ and $\kappa^{(3)}_t$ have no common trend and are more or less constant for each population. Thus, we cannot add any insight by starting off with a projection common to all populations and defining deviations from that projection. Hence, we will look at them as random walks with correlation between $\kappa^{(2)}_t$ and $\kappa^{(3)}_t$ as well as across populations.

To do this, we first isolate the different innovations $\kappa^{(2)}_t - \kappa^{(2)}_{t-1}$ and $\kappa^{(3)}_t - \kappa^{(3)}_{t-1}$, and determine the w-weighted covariance structure A, i.e. we consider correlations between $\kappa^{(2)}_t$ and $\kappa^{(3)}_t$ as well as correlations across populations. Collecting the innovations in a matrix X with a column for both $\kappa^{(2)}_t$ and $\kappa^{(3)}_t$ for each population, we find the covariance matrix A:

$$A_{j,k} = \frac{\sum_i w_i}{\left(\sum_i w_i\right)^2 - \sum_i w_i^2} \cdot \sum_i w_i \left(x_{ij} - \hat{x}_j\right)\left(x_{ik} - \hat{x}_k\right)$$
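This estimator translates directly into code; a minimal numpy sketch (function name assumed):

import numpy as np

def weighted_cov(X, w):
    # Weighted covariance matrix of the innovation columns of X,
    # A_{j,k} = c * sum_i w_i (x_ij - xbar_j)(x_ik - xbar_k), with the
    # bias-correction factor c = sum(w) / (sum(w)^2 - sum(w^2)).
    w = np.asarray(w, dtype=float)
    xbar = np.average(X, axis=0, weights=w)
    Xc = X - xbar
    c = w.sum() / (w.sum() ** 2 - (w ** 2).sum())
    return c * (Xc.T * w) @ Xc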

Next we notice that (3) projects $\kappa^{(2)}_t$ with a volatility add-on $\bar{\sigma}^{(2)}$ of $10^{-5}$ on an average volatility of (in our weighted estimation) $1.16 \cdot 10^{-5}$. The goal of this new volatility add-on is to reduce correlation between mortality rates at young versus old ages. The parameter $\kappa^{(1)}_t$ determines the general level of mortality in any given year or scenario, so these two will be somewhat positively correlated. Because the factor $x - x_{\text{center}}$ is negative for young ages and positive for old ages, larger values of $\kappa^{(2)}_t$ will cause more negative correlation between the two through time, and a larger volatility of $\kappa^{(2)}_t$ will cause negative correlation across scenarios:

$$\kappa^{(2)}_t = \kappa^{(2)}_{t-1} + \epsilon^{(2)}_t \cdot \left(\sigma^{(2)} + \bar{\sigma}^{(2)}\right)$$

Thus we can increase the volatility of $\kappa^{(2)}_t$ to bring the general dependence to 0.

Another way of achieving different evolutions for young and old mortality is to increase the volatility of $\kappa^{(3)}_t$ and $\kappa^{(4)}_t$. This will give both groups separate and almost unrelated movements, but $\kappa^{(1)}_t$ will still influence the general level, so we do not expect a correlation of 0. The difference between the two methods is shown in figure 4.7. The figure shows schematically what happens to mortality (y-axis) as age (x-axis) increases. In the left pane, a high value of $\kappa^{(2)}_t$ increases mortality at old ages but decreases it at young ages; a low value does the opposite. Thus, increased volatility makes for larger (and smaller) values of $\kappa^{(2)}_t$, and reduces correlation between old and young ages by adding negative correlation.

Figure 4.7: Independence between young and old ages is achieved by letting them vary independently (right) rather than by varying the general slope (left).

In the right pane, we show the increased volatility at high ages ($\kappa^{(4)}_t$ more volatile) and at young ages ($\kappa^{(3)}_t$ more volatile). The blue and red lines move independently of one another as we go through scenarios. This decreases the correlation by making the age-related part of mortality more independent, but does not change the base correlation caused by $\kappa^{(1)}_t$.



We can implement this other method by finding a common factor r (see section 4.5.2) with which to multiply the innovations for each of them:

$$\kappa^{(3)}_t = \kappa^{(3)}_{t-1} + \epsilon^{(3)}_t \cdot \sigma^{(3)} \cdot (1 + r)$$

and similarly for $\kappa^{(4)}_t$, with the caveat that we first extract a common trend from $\kappa^{(4)}_t$ before determining the innovations (see section 4.4.1). The common trend ($\kappa^{(2)}_t$) would not be subject to the increase in volatility. In section 4.5.2 we will test both methods quantitatively.
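For concreteness, the two update rules can be sketched side by side (numpy; hypothetical helper names, not the thesis's actual code):

import numpy as np

def step_kappa2(k2_prev, sigma2, sigma2_bar, rng):
    # First method: random walk with a volatility add-on on kappa2.
    return k2_prev + rng.normal(0.0, sigma2 + sigma2_bar)

def step_kappa3(k3_prev, sigma3, r, rng):
    # Second method: scale the kappa3 innovation (and, after removing
    # the common trend, the kappa4 innovation) by a factor (1 + r).
    return k3_prev + rng.normal(0.0, sigma3 * (1.0 + r))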

4.4.1 Projecting $\kappa^{(4)}_t$

In our model we have one parameter, $\kappa^{(4)}_t$, that describes how the trend of mortality at old ages differs from the general trend. We see very clearly in figure 4.8 that $\kappa^{(4)}_t$ has had an increasing trend in the past 50 years, even though early in the period we see different behaviour for men and women.

In the setup of our analysis, this means we expect an increasing trend in $\kappa^{(4)}_t$ for ages 85+. If we again use our weighted linear model with the value h = 3.5 (as we used for the trend sensitivity of $\kappa^{(1)}_t$) to determine the basic trend, we are almost ignoring the quasi-horizontal line before 1985. The projected increase in $\kappa^{(4)}_t$ would go from 0.027 in 2005 to 0.116 in 2065, causing a dramatic increase in mortality rates at ages 95 and up. Before that age, the increase is still offset by the large decrease in $\kappa^{(1)}_t$ that causes all mortality rates to drop.

Figure 4.8: Increasing trend in $\kappa^{(4)}_t$ for males (solid) and females (dashed).

We discussed in section 1.3 that although we might expect a reduction in the variance of mortality, we think a strong increase of mortality rates is not realistic. We want to choose the trend of $\kappa^{(4)}_t$ in such a way that the mortality rate at the end of our mortality table (in this analysis: 105 years of age) stays under the observed mortality at that age throughout the projection period. This 'ceiling' is too strict if the parameters can vary,
