On modeling stochastic portfolio-speciﬁc mortality rates for small pension funds based on income

(1)

On modeling stochastic

portfolio-specific mortality rates for small pension funds based on income

Abstract

This paper proposes a Bayesian Lee-Carter based multiple population mortality model which incorporates income effects that approaches pension fund-specific mortality based on historical mortality data sets of multiple pension funds. These data sets are allowed to be inconsistent in size in terms of numbers of historical years. We first applied the model to simulated data sets and found that the aggregate of the data should consist of enough individuals and historical years to obtain credible confidence intervals for forecasted numbers of deaths. The data set that is available for this research is too small to obtain reliable confidence intervals for the time trend in mortality. Therefore, an adjusted model that uses the time trend as input is calibrated which leads to a satisfactory distribution for the technical provision. However, as the uncertainty that follows from the time trend leads to more volatility in long-term predictions of future deaths, reliable confidence intervals for the time trends in mortality are needed for a more accurate distribution for the technical provision of a pension fund.

Author :

Stijn W. Groot, MSc

¹

Supervisor:

Dr. F. Van Berkum

²

Second reader:

Dr. T.J. Boonen

³

Faculty of Economics and Business: Quantitative Economics University of Amsterdam

The Netherlands August 12, 2021

1MSc in Econometrics, Student number: 11337028, email: stijngroot@live.nl

2University of Amsterdam, Amsterdam, The Netherlands

(2)

This page has intentionally been left blank.

(3)

Statement of Originality

This document is written by Stijn Groot who declares to take full responsibility for the contents of this document. I declare that the text and the work presented in this document are original and that no sources other than those mentioned in the text and its references have been used in creating it. The Faculty of Economics and Business is responsible solely for the supervision of completion of the work, not for the contents.

Acknowledgements

I would like to thank Dr. F. van Berkum of the Faculty of Economics and Business of the University of Amsterdam for his ideas and insights for this research and his guidance through all steps of the process of writing this thesis. I also would like to thank Dr. T.J. Boonen of the Faculty of Economics and Business of the University of Amsterdam as the second reader of this thesis. I am also grateful to my colleagues T. Konjer, Drs. W. Hoekert AAG and K. Keijzer for their useful advice. I would like to thank my peers Bas Borgers, MSc and Josha Dekker, MSc for their invaluable support and the many other peers that inspired me throughout the years that I spent at the Faculty of Economics and Business. Lastly, I would like to thank the employees of the pension funds who allowed me to use their data.

Stijn Groot, August 2021.

(4)

1 Introduction

Due to the expanding interest in internal risk modelling and the Solvency II agreement, which is in effect since January 1^st 2016, vast literature has been published on quantifying risks for insurers and pension funds.

Solvency II is an act of the European Union in order to protect the insurance and pension sector. Before Solvency II, capital requirements were based on a fixed percentage of the total technical provision. With Solvency II, capital requirements are based on internal or standard risk models. Hence, for pension funds, analyzing the sources of risk is of relevance.

For pension funds, two key risks come from mortality. On the one hand, there is the risk that policyholders live longer than expected resulting in more payments than expected, formally known as longevity risk. On the other hand, death before retirement age results in liabilities in the form of spouse’s pension and orphan’s pension. This risk is called mortality risk. This paper will mainly elaborate on the longevity risk. Modelling these kinds of risks requires assumptions on mortality rates of the policyholders. As these assumptions affect the quantifications of the risks severely, these assumptions should be made very carefully. Consequently, the quality of mortality rate models is widely studied (Lee & Carter,1992,Renshaw & Haberman,2006).

Previous century, most pension funds valued their liabilities using flat mortality tables which implies the assumption that age-specific mortality rates do not change over time. However, as of the start of tracking mortality data, it is commonly accepted that mortality rates decrease over time. This is mainly caused by the fact that nowadays healthier lifestyles are followed. For instance, in light of Ferrucci et al.(1999) and Stewart et al.(2009), this is partly due to the decrease in men and women with smoking habits. Stewart et al.(2009) found proof that there is an inverse relationship between smoking and life expectancy using mortality data from The United States. They also found that the negative effects of increasing BMI in the United States outweighs the positive effects due to a decrease in smoking rates. However, the latter is not the case for The Netherlands as obesity is there not as much of a problem as in The United States. Also, the increase in medical knowledge, quality of living and physical activity have contributed to the life expectancy in the past years. For example,Ferrucci et al. (1999) subdivided over eight thousand individuals by their habits concerning physical activity in three respective classes and estimated their life expectancy using a Markov chain based model. They significantly associated physical activity with more expected years of life. Specifically, people at age 65 who belong to the most active class may expect a future life span of approximately 15 years, while people at age 65 associated with the least active class are expected to live 11 more years.

Although the general public will regard this phenomenon positively, it puts excessive stress on pension funds. As the life expectancies increase, pension funds are obliged to pay their participants more benefits than anticipated when the liabilities were valued. This fallacy caused reoccurring negative results on mortality, which led to financial problems. Therefore, pension funds use projection tables in order to establish the value of their liabilities more accurately ever since. Projection tables incorporate the negative time trend in mortality rates so that the decrease in mortality and thus the increase in life expectancy are taken into account.

For all countries, there is data available that includes the total exposure and the total number of deaths

(7)

per age per year for both genders. From this, the baseline mortality could be estimated and forecasted for the whole country. In The Netherlands, this is effectuated by AG (Actuarieel Genootschap), which provides the best-estimates for the forecasted mortality rates for the whole population. However, these mortality rates should not be used by pension funds directly. Plenty of studies have shown that there is heterogeneity in mortality among different subpopulations. For instance, habits such as the aforementioned smoking and being physically active affect an individuals’ mortality rates severely. Furthermore, research has shown that socioeconomic circumstances such as educational attainment, occupation, income and deprivation also have an influence on mortality (Katikireddi et al.,2017,Raleigh & Kiri,1997,Shkolnikov et al.,2006). According toRaleigh & Kiri (1997), differences in life expectancies between people living in deprived areas and people who live in wealthy areas has widened in the previous century. Namely, whereas the mortality has decreased for people in the wealthy areas, it maintained, and even increased for adolescents, in the deprived areas. Moreover,Shkolnikov et al. (2006) show that in central and eastern Europe, different mortality trends are observed for different educational levels. It appears that in Russia and Estonia, the life expectancy has decreased for low educational levels but is increased for high educational levels. Due to the improved average educational attainment, the average unconditional expected life span has increased. Also, they argue that in central and eastern Europe, over 15 per cent of the increase in life expectancy could be attributed to the relative increase of the size of the higher educational categories. There also is a well-established relationship between job occupation and mortality.

Besides the fact that during the recent COVID-19 pandemic the excess mortality was observed to the greatest extent among in-person workers (see e.g. Chen et al.(2021)), disregarding health crises, there is a great gap between mortality among the most low-skilled and high-skilled workers. To illustrate,Katikireddi et al.(2017) find that the mortality rates of construction workers are almost three times higher than that of managers and teachers. Moreover, some low-skilled groups of occupation such as female cleaners even experienced increased mortality rates.

The fact that there is heterogeneity in the mortality rates of individuals, suggests that mortality rates that are used at pension funds should depend on its portfolio. In agreement, Solvency II obliges companies in the pension sector to find reliable estimates for the mortality specific to their portfolio. Therefore, much literature is dedicated to the estimation of portfolio-specific mortality could be found.

1.1 Portfolio mortality at Willis Towers Watson

In this subsection, we provide an overview of the former models for portfolio-specific mortality that were used by Willis Towers Watson.

In 2010, Towers Watson (now, Willis Towers Watson) published a new model that estimates mortality rates that are more appropriate for pension funds than the mortality rates for the whole population. As substantiated earlier, the working population usually experiences lower mortality rates than the rest of the population. Therefore, this model estimates age-specific factors that are applicable for labourers. These factors do not change over time and represent the ratio between the country-wide mortality and the mortality in the

(8)

portfolio. For this, a large data set is obtained from pension funds, which forms a representative subgroup of the working population in The Netherlands. As the resulting mortality is estimated for the working population rather than for a specific portfolio, pension funds could not simply incorporate these mortality rates in the calculations of their technical provisions. Instead, pension funds have to determine the extent to which their portfolio corresponds to the considered working population. The latter led to problems for smaller pension funds with respect to the statistical foundation. Due to the lack of participants, the confidence intervals of the mortality rates were not favourable so that the mortality derived from this model could not be used. Therefore, in 2012, Willis Towers Watson has been asked to account for this problem by finding a model that suits pension funds with portfolios of smaller size.

In response, Konjer (2012) proposed a new model that successfully derives unbiased estimates for fund- specific mortality rates. The information that is used for estimation is a set of tables that consists of observed death probabilities subdivided according to gender, age, income and working sector occupancy for the Dutch population from CBS (Centraal Bureau voor de Statistiek ). By means of the smoothing algorithm of van Broekhoven (2002), the data is converted into smooth functions and is used to fit a Gompertz distribution.

After this, through copula theory, the CBS data is transformed into portfolio specific-mortality factors. As this method requires characteristics of the portfolio rather than historical data as input, it is not possible to find reliable portfolio-specific estimates for mortality rates for small pension funds as well. As the compositions of the portfolios of pension funds change over time, the fund-specific factors are estimated every two years, after AG publishes the updated best-estimates of the forecasted mortality of the Dutch population.

A disadvantage of this model is that it is only able to find projections of the portfolio-specific mortality.

Hence, it does not provide any vision of the uncertainty in these projections, whereas this is desired according to Solvency II. Addressing these uncertainties results in valuable insights into the present longevity risk. Therefore, in this research, we assess a new model that stochastically estimates mortality rates for pension funds.

1.2 Stochastic portfolio-specific mortality modelling

Many papers have been written on modelling portfolio-specific mortality in a stochastic fashion (Plat,2009b, Richards et al.,2013,Van Berkum,2018). Portfolio-specific mortality could be modelled using historical mortality data of the pension fund. However, due to a lack of data, it is usually not possible to simply fit models such as the model pioneered byLee & Carter(1992) and varieties of this model on the available data (see e.g.

Renshaw & Haberman(2006),Brouhns et al.(2002),Plat(2009a) andCurrie et al.(2004)). We include some specifications of the Lee-Carter model to provide the reader with some intuition behind it. A description of the initial design of this model follows. Forecasting mortality rates based on the Lee-Carter model consists of two steps (seeSubsubsection 2.3.1for details). In the first step, the natural logarithms of mortality rates are estimated based on an age-specific effect, a general time-trend, and an age-specific factor that reflects the extent to which a certain age group is exposed to this time trend. This model is often estimated with Singular Value Decomposition or under a Poisson assumption for the observed death counts. The time series of time-trend

(9)

parameters is used to forecast future trends in mortality rates, usually based on a random walk process with a drift, which forms the second step. This process is statistically independent of the first step.

There are many parameters that have to be estimated for the first step of this model: two parameters for every age, and one for every observed year. As a result, a lot of data is required to model mortality accurately. Also, this leads to higher variance in parameter estimates. However, the great amount of variables leaves the model very flexible. Many researchers have proposed expansions of this mortality model (Li & Lee, 2005, Plat, 2009a, Renshaw & Haberman, 2006). For instance, Plat (2009a) adapts the Lee-Carter model to a more parametric version. He incorporates two age-dependent functions such that different time-trend could be estimated for different ages. Also, in this way the model is applicable for wider age ranges without letting the number of variables increase too much. An overview of extensions of the Lee-Carter model is provided in Van Berkum (2018), as well as a comparison of their performance when applied to Dutch data in a Bayesian framework. Later on, more details will be given on the way the Lee-Carter model is estimated and on restrictions that ensure identifiability of the parameters.

Models based on the Lee-Carter model are often fitted on huge data sets for the whole country with many historical years. Data availability for pension funds results in two key problems. Firstly, small pension funds have too few participants to base the mortality projections on. Secondly, there are often not enough historical years of data available. As the mortality rates are generally assumed to be decreasing over time, a time trend is included in mortality models to enable forecasting future mortality, such as in the Lee-Carter model. When there are only a few past years of data available, it is often deemed impossible to determine reliable time trends.

Therefore, in order to accurately forecast portfolio specific mortality, often portfolio specific age-dependent factors are used such as in the former models of Willis Towers Watson. As aforementioned, these factors are the ratio between the mortality in the portfolio and the country-wide mortality. The advantage of these factors is that these could be assumed to be constant over time.

Regarding mortality rates of multiple subpopulations simultaneously is often referred to as multiple population mortality modelling. This often leads to more stable and reliable estimates for the considered subpopulations separately (Antonio et al.,2015,Eriksson,2020,Li & Lee,2005,Van Berkum,2018). In a recent study, Eriksson(2020) finds that mortality forecasts are more accurate when multi-population mortality models are used than when the forecasts are based on individual models. Importantly, this result is most significant when the estimation period is short, which is the case in our study. Multiple population mortality modelling is also used to investigate mortality differentials among different socioeconomic groups (see e.g. Villegas & Haberman (2014) andWen et al.(2021)).

In agreement, to resolve the problem regarding the sizes of the portfolios, we expand the data set by aggregating the portfolios of all pension funds into one fictitious subpopulation. On this subpopulation, we fit an expansion of theLee & Carter(1992) model. We expect that this greater data set enables us to accurately estimate the baseline mortality based on this subpopulation. We also obtain age-dependent factors for each portfolio to multiply the baseline mortality by, to transform these into portfolio-specific mortality rates. These

(10)

to be constant.

Even though the mortality could be obtained accurately, it is questionable whether the forecasts are reliable as well. Namely, the data on the subpopulation still consists of too few historical years such that it is not possible to estimate the time-trend properly. Therefore, we estimate the general time-trend using the country- wide population and use this as an input for our model. Note that we expect that the age-specific exposures to the time trend could be estimated. In case these could not be estimated on the basis of the available data, these are estimated from the country-wide population as well. This approach forms a solution to the problem of the lack of data in the time dimension. We will also apply the model to simulated data sets with known parameters that consist of different sizes in terms of the portfolio size and the number of historical years in order to assess the validity of the model and the performance for different sizes of data sets. As mostly the extent to which the mortality in the age-groups are exposed to the time trend is hard to estimate for small portfolios with a low number of historical years, it is expected that increasing the size of the portfolio results in more valuable and reliable long term predictions.

The literature on heterogeneity also suggests that for pension funds, the baseline mortality could be estimated most accurately when lifestyle and socioeconomic factors are taken into account. However, due to privacy reasons, this information is usually not publicly available. According to the national regulations on privacy, data banks that own data that may result in the possibility of identifying an individual person, are not allowed to publish or distribute this under any condition. Consequently, pension funds are unable to use personal data of the whole population. More importantly, such characteristics of the fund’s participants are often not directly available. However, their income and pension accrual are known.⁴ Income could be used as an indicator of socioeconomic factors. In fact, it is commonly accepted that there is a positive link between income and socioeconomic factors. Chetty et al.(2016) proved that there is a link between income and mortality. Moreover, the heterogeneity in life expectancy among different income groups has increased over time. Also, Backlund et al. (1996) find that even after controlling for the other socioeconomic factors, there is still a positive link between life expectancy and income. Furthermore, their results suggest that the relationship between income and mortality is nonlinear. Hence, in this research, the baseline mortality is estimated by means of a Lee- Carter model extended with a variable that relates to income without making assumptions on the shape of the dependence.

1.3 Bayesion Theorem

As the sizes of the portfolios considered in this thesis are small, we presume that it would be hard to obtain accurate estimates for the age-dependent portfolio-specific factors. Therefore, we consider an underlying mean-reverting process in the factors in the age dimension. Thus, we assume the factors to be independent between the portfolios but there is dependence among different ages. In order to obtain portfolio-specific mortality, two models have to be estimated simultaneously as these models depend on each other. Namely, the

4In fact, incomes of for example pensioners are usually not known. For this, we propose a way to estimate their income based on their accrued pension rights.

(11)

extension of the Lee-Carter model and the mean-reverting process of the age-dependent factors. Due to the high dimension, it is not possible to estimate these models jointly in a frequentist fashion. Thus, we are bound to calibrate the model using Bayesian theorem. In this way, all parameters could be optimized simultaneously.

Bayesian theorem regards parameters as variables with a distribution. Furthermore, implementing a Bayesian approach also brings along the advantage that information on parameter uncertainty follows automatically from calibrating the model; the output of the model consists of distributions of parameters rather than projection points. The use of Bayesian methods to estimate the Lee-Carter model or extensions of that model is exploited earlier (Antonio et al.,2015,Czado et al.,2005,Van Berkum,2018). The Lee-Carter model typically includes a time-trend. As aforementioned, in most early studies forecasting mortality rates proceeds in two steps (Brouhns et al.,2002,Li & Lee,2005,Renshaw & Haberman,2006): first, the model is estimated from which the time- trend follows, and then this time trend is extrapolated to the future. This two-step procedure may be deficient and could thus lead to sub-optimal results. Therefore, Czado et al. (2005) was, according to the author’s knowledge, the first to estimate both models simultaneously using Bayesian implementation. In fact, they compared the outcomes of the Lee-Carter model when estimated using the frequentist approach, and when this model is calibrated by means of Bayesian theorem using data on French males. Surprisingly, they found that the expected mortality rates according to Bayesian theorem are slightly higher for all ages. Furthermore, Antonio et al. (2015) presented two multi-population mortality models in a Bayesian setting. One of these models is the Lee-Carter model. In the paper ofVan Berkum (2018), to which this thesis is closest, they adjusted this model considered inAntonio et al.(2015) so that it successfully estimates the mortality of a small population (e.g. a pension fund portfolio) and the country-wide mortality jointly. In their paper, they introduced the aforementioned age-dependent factors among which a mean-reverting process is assumed as in this paper.

The remainder of this paper is structured as follows. In Section 2we first specify the notation that is used throughout this paper. After this, we elaborate on the underlying assumptions of the Lee-Carter model and their validity. Then, we provide the modifications that are applied to the Lee-Carter model such that it takes income into account and that it simultaneously estimates the portfolio-specific factors simultaneously. Hereafter, we finalize the methodology by providing the technical details on the Bayesian implementation.

InSection 3we describe data sets. First, we depict the country-wide data which is used for the general time trend and the simulation of the date. Then, we describe the data generating process for the simulated data sets. After this, we present the data that is obtained from the pension funds.

InSection 4 we present and analyze the results on the Bayesian calibration. This section is twofold. First, the model is applied to data that is simulated. The data sets have different numbers of historical years and yearly exposures in order to provide more insights on how well the model performs for data sets of different sizes. The second part consists of the results on applying the model to the available data set that is obtained from the pension funds. Subsequently, from these results, we obtain distributions for the technical provision of one of the pension funds, which we will analyze as well.

After this, concluding remarks are given in Section 5, and finally,Section 6 provides a critical look at the

(12)

2 Methodology

In this section, we introduce some notation that is used throughout the rest of this paper. For sake of con- venience, we are as consistent as possible with notations used in existing actuarial literature. The methodology is applicable for one gender at a time. For readability, we will leave out indices that specify the genders.

2.1 Deaths and exposures

First, we provide the notation for individuals. As the methodology is designed to estimate mortality rates for the whole population rather than for individuals, these variables will be transformed into variables for the whole population.

2.1.1 Variables for individuals

Although we aim to be consistent with the literature in terms of notation, we must deviate slightly. Namely, we also incorporate an income effect unlike similar studies. For this, we define several income buckets indicated by k, on which we will elaborate later on. Also, we consider multiple pension funds which are indicated by i.

For each individual, we define δⁱ_k,t,j,x as a dummy variable that indicates whether individual j who is a participant in pension fund i and belonged to income bucket k with age x has died at year t, where age is defined as an individuals’ latest birthday as usual. Thus, δⁱ_k,t,j,x is 1 if individual j died at age x in the period between the start of year t and t + 1 and belonged to income bucket k, and zero otherwise.

The next variable that we discuss is the individual exposure of individual j, who is a participant in pension fund i, who belongs to income bucket k with age x at year t, which will be denoted by τ_k,t,j,xⁱ . This variable could be interpreted as the total amount of years that the individual is exposed to mortality at age x in the period between the start of year t and t + 1. We illustrate this by a calculation example to clarify the definition of this variable and to derive some intuition.⁵ Consider an individual j who belongs to income class k who celebrates his 40^th birthday on July 1^st in 2015. If he dies on September 1^st in that same year, he has lived for three months in 2015 with age 40. Thus in that case, τk,2015,j,40ⁱ equals 0.25. In the case that he would have survived the year 2015, he would have lived for 6 months in 2015 at age 40. In that case τk,2015,j,40ⁱ would have been equal to 0.50.

2.1.2 Variables for the population

To be able to formally derive the variables on deaths and exposures at the aggregated level, we define Lⁱ_k,t,x as the number of participants in pension fund i, who belong to income bucket k with age x, and who were alive at the start of year t. We define the total number of deaths d_k,t,x and the total exposure E_k,t,x in year t of people aged x for pension fund i as follows:

dⁱ_k,t,x=

Lk,t,x

X

j=1

δ_k,t,j,xⁱ and E_k,t,xⁱ =

Lk,t,x

X

j=1

τ_k,t,j,xⁱ .

5SeeVan Berkum(2018) for a clear visualization of the variables δ_k,t,j,xⁱ and τ_k,t,j,xⁱ .

(13)

As substantiated, we estimate a baseline mortality for the aggregated data of all pension funds, and we obtain age-dependent portfolio-specific factors simultaneously. This research is designed to use historical data of multiple pension funds. Advantageously, this expands the data set, such that the extent to which a certain age group is exposed to the negative time trend in mortality could be estimated more accurately. However, the number of available historical years is not consistent for all pension funds. As a result, the composition of the aggregated data is not consistent over time. Without inference, this could lead to difficulties when estimating the time trend. According to Katikireddi et al. (2017), the mortality could differ a lot for different sectors.

Suppose that there are some pension funds with low mortality rates with data from 2010 and some pension funds with higher mortality rates with data from 2015. In the aggregated data set, the average mortality rates have increased in from 2014 to 2015. Consequently, fitting conventional mortality models on the total data set will lead to an estimated positive trend in mortality. As this positive trend is driven by the composition of the data set rather than by an actual trend in the mortality rates, this will lead to spurious results. Therefore, we propose a model that could estimate different baseline mortalities for different years. All historical years are manually divided into groups by their composition. Subsequent historical years with approximately the same composition, follow by assumption the same baseline mortality. In agreement with other studies on multiple population modelling, we do assume that the time trend is the same within each group.

The model that is described in this section does not estimate the portfolio-specific mortality rates for all pension funds simultaneously. Instead, the model is applied to obtain the mortality of one fund. Hence, we first specify the pension fund on which the model will be fitted, and indicate the corresponding exposures and the numbers of deaths by ’pf’. After this, all available years in the rest of the aggregated data are divided into groups in which all years have comparable compositions in terms of size. The group that includes the most recent years is indicated by ’pop₁’. Likewise, the group with the second most recent years is indicated by ’pop₂’.

The group with the a^th most recent years is indicated by ’pop_a’ with a ∈ {1, 2, . . . , A}, where A is the number of defined groups of years. For example, again suppose that there are some pension funds with data from 2010 to 2019, and some pension funds with data from 2015 to 2019. In that case, all pension funds have at least data from 2015 to 2019. All those years of data in the data file consist of approximately the same composition and form one year group. The years 2010 to 2014 also consist of the same composition and form another year group.

As the year group with the years 2015 to 2019 includes the most recent years, this group is assigned to pop1. The years 2010 to 2014 form group pop2. Thus, for the years 2010 to 2014 we estimate a baseline mortality different from the baseline mortality in the years 2015 to 2019. The first year and last year included in group a are indicated by a^f and a^l. For the special case that the pension fund under consideration is the one pension fund with the most data, another group of years should be defined for the years that only include in the data from the pension fund under consideration. In this way, we enable using all data of the pension fund under consideration. As this would be the year group with the least recent years, this group would be indicated by popAin that case. When pension fund l is the pension fund under consideration, we define the total numbers of deaths and years of exposure using these definitions as follows:

(14)

d^pf_k,t,x= d^l_k,t,x, E^pf_k,t,x= E_k,t,x^l , d^pop_k,t,x^a =PJ

j6=ld^j_k,t,xI_(af≤t≤a^l), E^pop_k,t,x^a =PJ

j6=lE_k,t,x^j I_(af≤t≤a^l), with a ∈ {1, 2, . . . , A}.

Here, IB is an indicator function that equals 1 in the case that B is true, and 0 otherwise. Also, J is the total amount of pension funds considered in this paper. Note that PA

a=1E^pop_k,t,x^a = PJ

j6=lE_k,t,x^j and PA

a=1D^pop_k,t,x^a = PJ

j6=lD^j_k,t,x, as the data of the pension fund under consideration indicated by l, is not included in any of the

’pop_a’ groups. Also note that for the additional defined year group in the special case that the pension fund under consideration is the pension fund with the most historical years, the corresponding observed numbers of deaths and exposures are all equal to zero.

The combinations of years of historical data, income classes and the ages that are present in the data set are inconsistent. We define T^m as the set of historical years included in m, K^m as the set of income classes included in group m, and X^m as the ages between the minimum and maximum age included in group m, with m ∈ {pf, pop₁, pop₂, . . . , pop_A}.

Using these variables, we introduce indicator variables that indicate whether a certain combination of year, income class and age is included in every group. These are needed in order to formally derive the posterior distributions. We define

I_k,t,x^pf = I_(k∈Kpf)· I(t∈T^pf)· I(x∈X^pf), (1)

and

I_k,t,x^pop^a = I_(k∈Kpopa)· I_(t∈Tpopa)· I_(x∈Xpopa), with a ∈ {1, 2, . . . , A}. (2)

When group popA corresponds to the historical years that include in the data of the pension fund under consideration and does not include in the rest of the data in the aforementioned special case, we define I_k,t,x^pop^A as follows:

I_k,t,x^pop^A= I_k,t,x^pf I_(Af≤t≤A^l).

By defining the indicator variable like this, we also obtain a baseline mortality for the unobserved years in the aggregated data excluding the pension fund under consideration. The baseline mortality in this period is based on the mortality data of the pension fund, and the baseline mortalities in the other years. We elaborate on this later on. From this, we will define the set O^pop^a of observations that follow the baseline mortality of group pop_A, and the set O^pf of observations that follow the mortality multiplied by the portfolio specific factors.

O^pf =: {(k, t, x)|I_k,t,x^pf = 1},

and analogously,

O^pop^a =: {(k, t, x)|I_k,t,x^pop^a = 1}, with a ∈ {1, 2, . . . , A}.

In Equation 1 we also include observations {k, t, x} for which the age is not included in the group but are

(15)

between the highest and lowest age that include.⁶ Also note that all observations are included in one of the O^pop^a groups, and at least some of the observations are included in O^pf. Thus, some observations are included in two groups, others in just one group.

2.2 Mortality

In this section, we present the precise definition of the mortality rates and the derivation of the distribution of the total number of deaths. Throughout this paper, these will be indicated by income class. However, for clarity, these indices will be omitted in this section just as these for gender and portfolio.

We define the observed central death rate as the ratio between the number of deaths and the exposure:

m_t,x= dt,x

E_t,x. (3)

In light of Van Berkum (2018), this value is not restricted to the [0,1] interval. For instance, consider the case that there are 10 people alive aged x at the start of the calendar year t and assume that they all die on February 1^st in year t. In that case, Dx,t equals ten as ten people died with age x in year t, and the exposure is at most ten times one month, which results in a mt,x of at least twelve. Thus, this rate is a frequency rather than a probability.

Instead, the probability of dying is denoted by q_t,x and is called death probability. Specifically, this is the probability that an individual aged exactly x at the start of year t dies within this year. The survival probability p_t,xis calculated as one minus the death probability and is the probability that an individual aged exactly x at the start of year t survives this year. Analogously, _hq_t,x is the probability that someone aged exactly x at the start of year t dies before t + h.

In a continuous time framework where both t and x are not bound to be integers anymore, we could define the mortality rate:

µt,x= lim

h↓0 hqt,x

h . (4)

This could be interpreted as the instantaneous rate of mortality. Note that this mortality rate could be considered as a continuous function in both t and x. Just as p_t,x, _hp_t,x could be calculated as one minus death probability_hq_t,x. This probability could also be derived from the mortality rate µ_t,x:

hp_t,x= exp

"

− Z h

0

µ_t+s,x+sds

#

, for h ≥ 0. (5)

From this equation, we could derive the distribution of the stochastic number of deaths. However, as there is no continuous data available we should make an assumption on the continuous µt,x.⁷ Similar to Van Berkum

6In fact, only when we assume the log-normal prior that assumes dependence among different ages, the portfolio-specific factors could also be calibrated for ages that are not included in the portfolio. When an age is not included in a portfolio, the corresponding portfolio-specific factor could not be estimated accurately when we assume independence among ages. In that case, the resulting distribution of the age-specific factor will be close to zero.

7In this paper µk,t,xis used rather than µt,x. Although it does not affect the following derivations, note that we consider income buckets, which implies that µk,t,xis not continuous in k.

(16)

(2018), we assume a piecewise constant force of mortality. This assumes that the force of mortality does only depend on the year and integer age. Formally:

Assumption (Piecewise constant force of mortality).

Let x and t be integers. Then, the force of mortality µ_t+s,x+u with s and u in [0,1) satisfies µ_t+s,x+u= µ_t,x.

Piecewise constant force of mortality will be assumed throughout the rest of this thesis. Under this assumption, the period h mortality probability for h ∈ [0, 1) is given by

hqt,x= 1 −hpt,x= 1 − exph

−Rh

0 µt+s,x+sdsi

= 1 − exph

−Rh 0 µt,xdsi

= 1 − exp [−h · µt,x] .

(6)

From this, we derive the likelihood on the individual level lj(µt,x) for µt,x given age group x and year t.

When we consider an individual j who has lived for τt,j,x years in year t with age x and he does not die before the end of the year or before his birthday, the likelihood that he contributes to the total likelihood is equal to the probability that he survives for a period indicated by τt,j,x with age x in year t:

l_j(µ_t,x|δt,j,x = 0) =_τ_t,j,xp_t,x= exp [−τ_t,j,x· µt,x] .

In the case that this individual dies before the end of the year or before his birthday, the likelihood could be written as the probability that he survived the period that he lived until the moment that he died, which is by definition τt,j,x, multiplied by the instantaneous rate of mortality at the moment the individual died µt+τ_t,j,x,x+τ_t,j,x:

l(µt,xj|δt,j,x= 1) =τ_t,j,x pt,x· µt+τ_t,j,x,x+τ_t,j,x

= exp [−τt,j,x· µt,x] µt+τ_t,j,x,x+τ_t,j,x = exp [−τt,j,x· µt,x] µt,x.

Where the latter holds according to the assumption on piecewise constant force of mortality. These two expressions for likelihood can be combined with the use of the dummy δ_t,j,x. Using this expression we could define the total likelihood as follows:

L(µt,x) =

L_t,x

Y

j=1

l(µt,xj)

=

L_t,x

Y

j=1

exp [−τt,j,x· µt,x] µ^δ_t,x^t,j,x

= exp [−Et,x· µt,x] µ^d_t,x^t,x.

This expression for the likelihood is proportional to the density of the Poisson distributed stochastic number of deaths D_t,x∼ Poisson(Et,xµ_t,x). The density f (D_t,x) of the stochastic number of deaths is in that case given

(17)

by:

f (Dt,x) = exp [−Et,x· µt,x] µ^d_t,x^t,x dt,x! .

It is proportional in the sense that both equations differ a factor that does not depend on the parameters of the distribution. In order to derive the maximum likelihood estimator, we first define the log-likelihood L_t,x:

Lt,x= log(Lt,x) = −Et,x· µt,x+ dt,x· log(µt,x).

From this we derive the maximum likelihood estimator ˆµt,x as follows:

δL(µt,x) δµt,x

= 0

−Et,x+dt,x

ˆ µ_t,x = 0 d_t,x

ˆ µt,x

= Et,x

ˆ

µt,x= dt,x

E_t,x = mt,x.

Note that the second order condition holds as the second derivative ^δ_(δµ²^L(µ^t,x⁾

t,x)² = −_µ^d_ˆ^t,x

t,x2 is certainly negative.

The fact that these expressions are proportional is a key result that validates derivations later in this thesis.

2.3 Model

Now that the notation is introduced, we provide a detailed description of the procedure that is used to derive the portfolio-specific mortality in this section. This section is structured as follows. First, we describe the model of Lee & Carter(1992), which is the basis of our model for the baseline mortalities. Then we specify how we obtain the mortality time-trend from this model using data from the Human Mortality Database (HMD). As substantiated, the available data consists of too few historical years to estimate the time-trend accurately.

Therefore, we use data of the country-wide population to obtain this time-trend and use it as an input for the model. After this, we specify the extension of the Lee-Carter model that is used to estimate and forecast the mortality rates.

2.3.1 Lee-Carter model

As the mortality time-trend is used as an input for an expanded Lee-Carter model, the time-trend is estimated through the Lee-Carter model as well. Perhaps this model is used the most in practice for modelling and forecasting mortality rates in two steps. They introduce the following model to explain observed death rates m_t,x (seeEquation 3):

log(m_t,x) = α_x+ β_xκ_t+ _t,x.

(18)

The age-specific effect is captured by αx. In this model, the variable κtis the only parameter that captures the time trend and is upon itself the same for all ages. Therefore, this model is an example of a single-factor model. The sequence of resulting best-estimates for κ_twill be input for our model for the baseline mortalities.

This parameter is used to forecast future trends in mortality rates. The related β_xpresents the extent to which the time-effects affect the mortality for age x. Lastly, _t,x is the i.i.d. disturbance term with zero-mean. This model is estimated using Singular Value Decomposition or under a Poisson assumption for the observed death counts, and it is used for estimating the mortality rates based on observed data, which forms the first step of the forecasting procedure.

The LC model is not uniquely defined. Linear transformations of all parameters could lead to equal morality rates. Therefore, in order to be able to estimate these parameters, we must specify some constraint so that all parameters could be identified. When the model is applied to historic data from the years t1to tT that includes ages x₁ to x_X, the following constraints are applied:

tT

X

t=t1

κ_t= 0 and

xX

X

x=x1

β_x= 1. (7)

By setting the sum of κ_t equal to zero, we ensure that the value of α_x equals the average observed death rates mt,xover the years. In the second step, Lee & Carter(1992) modeled the time-trend parameters κtwith the use of a random walk model with drift from which future time specific parameters could be forecasted:

κt= κt−1+ δ + χt,

where χtis iid and normally distributed with mean zero. Note that χtand t,xare independent. This forecasting method is also used to forecast the mortality time-trend in this paper.

As there are many parameters that have to be estimated, a lot of data is required for for reliable results. The large amount of parameters leads to high variances in the estimates. However, due to the many parameters, the model is very flexible. For instance, a different time trend in mortality could result for different ages.

2.3.2 Baseline mortalities and portfolio-specific factors

The model that is applied for the baseline mortality is, as aforementioned, very close to the model ofLee &

Carter (1992). Much literature have found evidence for a link between income and mortality (seeSection 1).

In agreement, we expand the Lee-Carter model with a variable that depends on income:

log(µk,t,x) = αx+ βxκ˙t+ γk. (8)

In this equation, ˙κtdenotes the time-trend that is extracted from the regular Lee-Carter model estimated for the Dutch population. For clarification, ˙κtis notated in this way to stress that it is considered as input rather than parameters that have to be estimated. In the derivations of the posterior distributions in Subsection 2.4, the dot is omitted. The income-effect is captured in γ_k. As mentioned earlier, we will apply income buckets. The

(19)

upper and lower bounds for the income buckets are defined, such that all income buckets include approximately the same number of participants in the population. Recall that the population relates to the accumulated data of all pension funds. The model is applied to data that includes ages x₁ to x_X, historical years t₁ to t_T and income buckets k₁ to k_K.

Instead of modelling observed death rates as in Lee & Carter (1992), in this model mortality rates are modelled, which serve as parameters for the distributions of the numbers of deaths. Also, optimizing variables using Bayesian theorem fundamentally assumes that these variables follow a distribution rather than that they are constants. Hence, the uncertainty in estimations for µk,t,x follows from the uncertainty in the variables instead of a disturbance term. Therefore, there is no disturbance term included in the model in contrast to the model introduced inLee & Carter(1992).

Another difference in terms of estimation with the regular Lee-Carter model is that the model for the baseline mortality in this paper requires fewer constraints on the parameters for identification purposes. This is due to the fact that ˙κ_tis used as input for the model and thus does not have to be estimated.⁸ The only constraint that is still needed to avoid convergence problems is

γk= 0.

Here, γk is the parameter associated with the middle income class.

As mentioned earlier, we estimate different baseline mortalities for the different groups of years pop_a. We assume that people in the portfolio and the rest of the population share a baseline force of mortality which is based on µk,x,t. The baseline mortality of pop_a is given by the factor Θ^popx ^a multiplied by µk,x,t.

Also, we define the age-specific parameter Θ^pf_x as the random effect that captures the difference between the mortality of the pension fund under consideration and the baseline mortality followed by the rest of the pension funds. This effect is assumed to be the same for all income classes. As derived inSubsection 2.2, the stochastic total number of deaths is Poisson distributed with the total exposure multiplied by the group-specific mortality rate as parameter. The group-specific mortality rate is calculated as the baseline mortality multiplied by the group-specific factors Θⁱ_x. Specifically, the total random number of deaths of the different groups is distributed as follows:

D^pop_k,t,x^a|µk,t,x∼ Poisson

E_k,t,x^pop^aµk,t,xΘ^pop_x ^a

, for (k, t, x) ∈ O^pop^a, (9)

D^pf_k,t,x|µk,t,x∼ Poisson E_k,t,x^pf µk,t,xΘ^pf_x

A

Y

a=1

(Θ^pop_x ^a)^I^k,t,x^popa

!

, for (k, t, x) ∈ O^pf. (10)

Note that for all combinations {k, t, x}, exactly one of the indicators I_k,t,x^pop^a equals one. For instance, when a combination {k, t, x} is included in the portfolio and in group of years O^pop³, the parameter of the Poisson distribution of D_k,t,x^pf is given by E_k,t,x^pf µk,t,xΘ^pf_xΘ^popx ³. For sake of identifiability of αx and Θ^popx ^a, we impose

8Note that the values of ˙κtare in fact restricted because they are estimated under the constraint given inEquation 7. However, this does not restrict us when estimating the baseline mortality.

(20)

the following constraint:

Θ^pop_x ¹ = 1.

We choose to restrict the factor associated with this particular group of years as in this way the results on forecasting mortality rates are only affected by the behaviour of µ_k,x,tand the factors Θ^pf_x. Also note that it is possible that only one pension fund includes in a year group pop_a. Suppose that there is one pension fund with data from 2010, and that the rest of the pension funds have data from 2014 at most. In that case, the years 2010 to 2013 are included in pop_A. Including this year group leads to at least one more parameter for every considered age that has to be estimated in the model and adds little to the aggregated data. One would expect that this would be a reason to disregard the years 2010 to 2013. However, for forecasting the fund-specific mortality, only µk,x,t and Θ^pf_x are relevant. Thus, the high variance of the factors associated to this year group does not lead to a lot more uncertainty in future mortality rates. Moreover, these years of data contribute when estimating β_x and γ_k. Also, as aforementioned, when the pension fund under consideration is the pension fund with data from the year 2010 in the same example, a year group is defined for the years 2010 to 2013. In that case, the estimation of Θ^pf_x is not influenced by the data of the other pension funds as there are no observations for the remainder of the data in these years. As a result, Θ^popx ^A could be determined such that µk,x,tΘ^pf_xΘ^popx ^A

is equal to the average mortality of the pension fund for these years without affecting the likelihood in other observations. Thus, Θ^popx ^A could fully absorb the age-specific average mortality during these years. Although, the inclusion of this data helps estimating the age-specific time trend and the differences between the income classes. Including more years of data could be very useful for the estimation of βx. So including these years in the model could lead to less uncertainty in future mortality rates.

As aforementioned, we consider two varieties of this model in this paper. One variety assumes independence among pension funds and dependence among ages. Independence among ages is modeled by using a mean- reverting lognormal prior. Due to the low exposures, this could lead to a more interpretable pattern in the factors among ages. This model will be compared to the variety that assumes a Gamma prior. This variety uses slightly less parameters and is easier to compute. We either assume dependence among ages for the factors of all groups, or we assume independence among ages for the factors of all groups.

Now that the model is defined, we provide details on Bayesian implementation in the next section.

2.4 Bayesian implementation

In this section, the procedure to obtain the results for the Bayesian calibration is described and is rather similar to the Bayesian implementation inVan Berkum(2018). As aforementioned, there are two characteristics of the Bayesian method that lead to this choice. Firstly, the distributions of the parameters follow from calibration and it is needed for finding these distributions simultaneously. Multiple models that depend on each other can be estimated simultaneously. As we will assume a mean-reverting process for Θⁱ_xin the x dimension (see Subsubsection 2.5.3), Bayesian theorem is required in order to estimate this process simultaneously with the baseline mortality model.

(21)

2.4.1 Bayesian statistics

Consider a set of observations denoted by x = (x₁, . . . , x_n), which relates to the set of observed death rates.

Also consider outcomes of the random variables Xi with density Xi(xi|θ). Thus, θ = (θ1, . . . , θp) are needed to determine the distribution for random variables Xi. These parameters refer to the parameters for the baseline mortality model and the portfolio-specific effects. The distributions of θ given the data is the distribution that we aim to approach. This distribution is called the posterior distribution and could be expressed as p(θ|x, α).

Without any knowledge of the data, one could have prior expectations about the distribution of θ, which is called the prior distribution. This distribution needs to be specified and is indicated by p(θ|α), where α are the parameters of the prior distribution. This is a multidimensional distribution for all elements of θ and is needed to be specified. All elements in the prior distribution are statistically independent, The parameters of this distribution are called hyperparameters, and need to be specified before calibration, as well as the prior distribution itself. In practice, one does usually not have many indications to choose the prior distributions.

Therefore, the hyperparameters are often specified such that the variance is very high and has the desired range (Antonio et al.,2015,Van Berkum,2018). In this way, the distribution has very little influence on the posterior distribution, from which could be sampled.

According to Bayes’ rule, the posterior distribution could be written as

p(θ|x, α) = p(x|θ)p(θ|α)

p(x|α) ∝ p(x|θ)p(θ|α), (11)

where p(x|θ) is the likelihood of all observations evaluated in all updated parameters. The key result that follows from this equation is that the posterior distribution is proportional to the chosen prior distribution multiplied by the known distribution of the observations, given the parameters (see Equation 9). It is not possible to sample directly from the posterior distribution p(θ|x, α). Instead, we proceed two different algorithms based on Markov Chain Monte Carlo from which a sequence of samples of θ result. This sequence converges to the posterior distribution.

2.4.2 Markov Chain Monte Carlo methods

As substantiated, we want to obtain a distribution of θ. Although it is not possible to sample from the posterior distribution p(θ|x, α) directly, a sequence of samples θ¹, θ², θ³, . . . could be constructed that converges to this distribution. For some elements of θ we use the Gibbs algorithm and for others we use the Metropolis(- Hastings) algorithm, depending on the situation. In order to obtain samples of θ we construct a Markov Chain with state space Z. This subset is defined such that θ ∈ Z. When we continue adding to this sequence, it eventually converges to p(θ|x, α). Mathematically, we get

θⁿ −→^d

n→∞Θ ∼ p(θ|x, α), (12)

(22)

where Θ is the multivariate random variable that follows the posterior distribution. Hence, in this way, we could simulate the posterior distribution. We can also obtain expected values of functions f of θ conditional on this posterior distribution, which turns out to be useful later on:

1 n

n

X

i=1

f θⁱ

n→∞−→ E_p(θ|x,α){f (θ)}, (13)

which is simply the average of the functions evaluated in the MCMC simulations θⁱ. The basic idea of MCMC is that in every iteration, all elements of θ are updated consecutively. Suppose we have a set of starting values θ⁰ for all parameters. The parameters are updated one by one, based on all latest updates of the other variables, the observed data x and the hyperparameters α:

update θ₁¹ based on {x, α, θ⁰₂, θ⁰₃, . . . , θ_p⁰},

update θ₂¹ based on {x, α, θ¹₁, θ⁰₃, . . . , θ_p⁰}, ...

update θ_p¹ based on {x, α, θ¹₁, θ¹₂, . . . , θ_p−1¹ }.

After this procedure, the whole vector of parameters is updated from θ⁰ to θ¹. The updating steps are done either using Gibbs sampling or Metropolis(-Hastings) sampling. If the posterior distribution is proportional to a known distribution, we use Gibbs sampling. Otherwise, we use Metropolis(-Hastings) sampling. The descrip- tions for both algorithms are as follows.

The Gibbs algorithm

In case the posterior distribution of a certain parameter θi is proportional to a known distribution, this parameter is updated using Gibbs-sampling. Suppose we aim to update parameter θ_i^j. The new realization of this variable θ^j+1_i is directly drawn from the distribution which is proportional to the likelihood multiplied by the likelihood, evaluated in the latest updates of all other parameters.⁹ This distribution is conditional on all observations, all other updated parameters and the hyperparameters. It could be written as p(θ_i^j|x, α, θ^j+1₁ , . . . , θ^j+1_i−1, θ_i+1^j , . . . , θ^j_p). In this way, we obtain θ^j+1_i , which is the new realization of θ^j_i.

The Metropolis-(Hastings) algorithm

In the case that the posterior distribution of a certain parameter θi is not proportional to a known distribution, this parameter is updated using Metropolis-(Hastings) sampling. Suppose we aim to update parameter θ^j_i. When the posterior distribution is not proportional to a known distribution, it is not possible to sample from the posterior distribution directly. Instead, we draw a new candidate θ_i^j⁰ for θ_i^j+1from a candidate generating

9Intuitively, as any distribution always integrates to one, we are sure that when the derived posterior distribution is proportional to a known distribution, the posterior distribution is the same as this known distribution.

On modeling stochastic portfolio-speciﬁc mortality rates for small pension funds based on income