Pricing and Hedge Effectiveness of
Longevity-Linked Securities
Brian Möllenkamp
July 1, 2020
A Thesis presented for the Degree of
Master of Science in Econometrics, Operations Research and Actuarial Studies
Faculty of Economics and Business
University of Groningen
The Netherlands
Master’s Thesis Econometrics, Operations Research and Actuarial Studies
Supervisor: dr. J. Alonso-García
Contents
1 Introduction
  1.1 Structure of this research
  1.2 Literature review
  1.3 Aim of this research
  1.4 Main findings
2 Lee-Carter model
  2.1 Least squares estimation
    2.1.1 Singular value decomposition
    2.1.2 Newton-Raphson method
  2.2 Poisson estimation
  2.3 Data
  2.4 Estimation results
  2.5 Quantitative comparison of least squares with the Poisson assumption
3 Forecasting death rates and life expectancy
  3.1 Properties of an ARIMA(p,d,q) model
  3.2 ARIMA model choice
  3.3 Bootstrapping life expectancy
4 Solvency II
  4.1 Pillar 1
    4.1.1 Standard formula
    4.1.2 Internal model
  4.2 Pillar 2
  4.3 Pillar 3
5 Hedging longevity risk
  5.1 Distortion operators
  5.2 Pricing longevity-linked securities
    5.2.1 Survivor Forward
    5.2.2 Survivor Swap
  5.3 Risk measures
  5.4 Hedge effectiveness of longevity-linked securities
    5.4.1 Short-term longevity risk management
    5.4.2 Long-term longevity risk management
6 Conclusion and discussion
1 Introduction
The first notion of a formal public pension system was designed in 1881 by Otto von Bismarck in Germany. The pension system ensured that the few workers who reached age 70 could retire and still receive an income funded by the working class [9]. Since then, life expectancy at birth in Germany has more than doubled [30]. At first sight this seems nothing but good news, resulting from, among other things, better sanitation, education and health care. However, when improvements in life expectancy outpace the predictions made by pension funds and insurance companies, they may result in higher-than-expected outbound cash flows for these companies. This is called longevity risk, and it plays a substantial role in risk management in the insurance and pension business. A 2011 report by the European Insurance and Occupational Pensions Authority noted that the life risk module accounted for 24% of the basic capital requirements of insurance companies and pension funds, and that longevity risk is a main risk driver of life risk [21]. The primary goal of this thesis is to assess and ultimately compare financial instruments that mitigate longevity risk in a simple annuity portfolio.
1.1 Structure of this research
One of the most important elements in assessing longevity risk is the mortality model. In 1992 Ronald D. Lee and Lawrence Carter proposed a fairly simple mortality model, dependent on two age parameters αx and βx, and only one time series parameter κt [25]. These parameters can be estimated using the singular value decomposition or numerical algorithms like the Newton-Raphson method. Various distributional assumptions in the Lee-Carter model lead to different models. In this thesis we will first discuss the classical Lee-Carter model using least squares estimation, as proposed by Lee and Carter themselves [25]. This model assumes the error terms εx(t) to be N(0, σ²) distributed for all ages x and years t. This is a very strong assumption that may not hold for most mortality data sets; in particular, the assumption that the variance of the error terms is independent of age is dubious. The second model discussed in this research does not have that problem. The model is based on a paper of Brouhns et al. [13]: the number of deaths per age and year is assumed to be Poisson distributed, and the parameters are then estimated numerically by maximizing the corresponding likelihood. Both models are fitted to Dutch mortality data from 1965 to 2014. Ultimately, both models will be compared using graphical and statistical tests, and the model fitting the data best will be used in the remainder of the thesis.
The estimates of the time series parameter κt can be fitted with an ARIMA model to predict future mortality rates and life expectancy. Determining the ARIMA model that best fits κt is a key aspect of this research, since later on the effectiveness of longevity-linked securities will be estimated using simulated future values of κt. Therefore, the process of determining the ARIMA model for κt will be described comprehensively in this thesis.
Once an accurate model for forecasting future mortality rates has been built, it can be used to quantify the amount of longevity risk a portfolio is exposed to. The Solvency II framework imposes clear demands, in the form of capital requirements, on insurance companies and pension funds to protect them against longevity risk [1]. These capital requirements can represent a substantial amount on the balance sheet of insurance companies, which gives them an incentive to hedge the longevity risk in their portfolio in order to reduce the capital requirements. In this thesis we will extensively explain the Solvency II framework and the role of longevity risk within it.
In the fifth chapter all previous theory will be combined to assess and mitigate longevity risk. A risk premium λ, denoting the market price of longevity risk, will be incorporated into the ARIMA model for κt using Wang's distortion operator. Subsequently, the survivor forward, the survivor swap and the European survivor call will be priced for different maturities and values of λ. Ultimately, the hedge effectiveness of the longevity-linked securities will be determined based on the mean, standard deviation, Value-at-Risk and Expected Shortfall of simulated losses in hedged portfolios.
1.2 Literature review
This section provides a review of recent literature in the field of longevity risk. It is crucial to assess the economic impact and responsibilities of longevity risk, in particular who should bear longevity risk [28]. MacMinn et al. argue that governments could be most suitable to sell longevity-linked securities, since governments could best deal with the problem of adverse selection. They elaborate on the suitability of the government by discussing the Arrow-Lind theorem, which states that the risk premium of a financial instrument issued by the government would be approximately zero [4], because the risk is dispersed over a large population. This can be countered by the insight that a portion of longevity risk is not diversifiable. They continue by noting that the question of who should bear longevity risk depends on whether longevity risk is seen as a risk or as uncertainty, where uncertainty is defined as randomness with unknown probability distributions, as viewed by F.H. Knight in [24]. In line with E. Stallard they argue that longevity risk is quantifiable and hence a risk, and that further methodological development will improve the accuracy of future forecasts [38].
Successful hedging might diversify away a part of longevity risk, but will at the same time introduce basis risk [17]. Coughlan et al. provide a framework for understanding longevity basis risk, calibrating longevity-linked financial instruments and evaluating hedge effectiveness. Basis risk arises when there is a mismatch between the hedged portfolio and the hedging instrument. The appearance of longevity basis risk indicates that the instrument used for hedging cannot mitigate all longevity risk a portfolio is exposed to, and it also implies that some hedging instruments could be more effective than others. This hedge effectiveness depends on the underlying hedging objectives. In their paper they applied the framework to a case study of British mortality data. The results indicated that longevity basis risk between a pension plan or annuity portfolio and a hedging instrument linked to a national population-based longevity index can be reduced considerably using this framework. Our methodological framework is heavily inspired by theirs.
Understanding how longevity risk can be decomposed into different types of risk provides a deeper understanding of longevity risk [15]. Cairns et al. perform a case study of a pension plan that wants to hedge its longevity risk by reducing the standard deviation of its cash flow. They show how correlation and hedge effectiveness can be decomposed into contributions by distinct types of risk, the main ones being basis risk, recalibration risk and Poisson risk. The paper uses the framework provided by Coughlan et al., discussed in the previous paragraph. The main conclusion of Cairns et al. is that index hedge strategies are an effective and less costly alternative for hedging longevity risk compared to customised longevity hedges. They also note that ignoring some of these risk components might underestimate not only longevity risk, but also the hedge effectiveness of index-based hedging strategies. Although their research was limited to the objective of minimizing the standard deviation of the portfolio, they believe that similar results will hold for other risk metrics.
Denuit et al. describe a framework to price survivor bonds using Wang's transform in the Lee-Carter framework [18]. They design a survivor bond that could be used to hedge longevity risk, and their study finds that this bond could be an interesting instrument for diversifying longevity risk. In this thesis, Wang's transform is likewise used to price the longevity-linked securities under consideration.
1.3 Aim of this research
The purpose of this study is to determine the price and hedge effectiveness of the survivor forward, the survivor swap and the European survivor call for several maturities and market prices of longevity risk. In particular, the main objective is to find the derivative that most effectively hedges a portfolio exposed to longevity risk, based on the mean, standard deviation, 99.5% Value-at-Risk and 99.5% Expected Shortfall of simulated losses. I expect the forward and the swap to do well, since they are the most common derivatives used for hedging other risks such as credit risk and interest rate risk [8].
This thesis will contribute to the literature by presenting the longevity risk hedge effectiveness of these longevity-linked securities in a simple pension annuity portfolio under a Poisson distributed Lee-Carter mortality model. The results can be valuable for insurance companies and pension funds that want to assess, regulate and mitigate the longevity risk they are exposed to in their liabilities. In light of the paper by Cairns et al., a limitation of this research is that it does not specifically address parameter uncertainty in the hedging framework; this could cause longevity risk and hedge effectiveness to be significantly underestimated [15].
1.4 Main findings
In the second chapter we find that the homoskedasticity assumption on the error terms in the least squares estimated model does not hold for the Dutch mortality data. On top of that, analysis of the residuals also suggests that the least squares estimated model is a bad fit, while the residual analysis of the Poisson estimated model concludes that the Poisson estimated model is more suitable for modelling the Dutch mortality rates.
In the third chapter an ARIMA(0,1,0) model is shown to be the best fit to the κt time series obtained from the Poisson estimated model. The formula describing the time series κt is
\[
\kappa_t - \kappa_{t-1} = -1.8631 + \epsilon_t,
\]
with εt ∼ N(0, 5.877). This formula is used to simulate future realized survivor rates in the process of testing the hedge effectiveness.
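The random walk with drift above is straightforward to simulate. The sketch below reads the 5.877 in N(0, 5.877) as the innovation variance and uses a made-up last fitted value of −60 for κ; both choices are illustrative assumptions, not figures taken from the thesis.

```python
import numpy as np

# Sketch: simulate future kappa_t paths from the fitted ARIMA(0,1,0) model
# kappa_t - kappa_{t-1} = -1.8631 + eps_t, eps_t ~ N(0, 5.877).
# kappa_last = -60.0 is a hypothetical last fitted value, for illustration only.

def simulate_kappa(kappa_last, horizon, n_paths, drift=-1.8631, var=5.877, seed=0):
    rng = np.random.default_rng(seed)
    eps = rng.normal(0.0, np.sqrt(var), size=(n_paths, horizon))
    # cumulative sum of the drifted increments gives the simulated paths
    return kappa_last + np.cumsum(drift + eps, axis=1)

paths = simulate_kappa(kappa_last=-60.0, horizon=30, n_paths=10_000)
print(paths.shape)          # (10000, 30)
print(paths[:, -1].mean())  # close to -60 + 30 * (-1.8631) = -115.893
```

Each simulated κ path translates into a path of future death rates through the central Lee-Carter equation, which is how the realized survivor rates are generated later on.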
Finally, in the fifth chapter we find that the European survivor call is the most effective derivative for hedging longevity risk in a basic annuity portfolio. This finding is based on the result that it is less costly to decrease the Value-at-Risk and Expected Shortfall of losses in a portfolio hedged with European survivor calls than in a portfolio hedged with the survivor forward or the survivor swap.
2 Lee-Carter model
In 1992 Lawrence Carter and Ronald D. Lee introduced the Lee-Carter model, a fairly simple model expressing the change in mortality over time for a fixed age as a function of only one time dependent parameter [25]. The original paper proposes a least squares model, which relies on the assumption of constant variance across all ages. More recently, another model was proposed by Brouhns et al. [13], which allows the variance to increase with age. In this model the number of deaths is Poisson distributed, while the death rate mx(t) is still modelled according to the central equation of the Lee-Carter model. These approaches to modelling mortality data differ in the procedures for estimating the parameters. In this chapter, both models will be fitted to Dutch mortality data and the results will be tested on goodness-of-fit. The model that fits the data best will be used to simulate future survival rates to quantify longevity risk in chapter 5.
A strong alternative to the Lee-Carter model is the Cairns-Blake-Dowd model, proposed in 2006 [14]. This model uses two time parameters, one more than the Lee-Carter model. The extra time parameter allows the model to describe imperfect correlation in mortality rates across various ages for different years in a more precise fashion [35]. In general, Lee-Carter often describes mortality rates better for lower ages (younger than 60), and Cairns-Blake-Dowd describes mortality rates better for older ages (age 60 and older), which makes the Cairns-Blake-Dowd model a useful model for pension funds and life insurance companies. However, a comparative study by C. Maccheroni and S. Nocito concluded that for Italian data from 1975 to 2014 the Lee-Carter model performed better on the complete data set, and the Cairns-Blake-Dowd model only produced reliable forecasts for ages 75 and older [27]. In this thesis we also use Western-European mortality data from roughly the same period. In addition to Lee-Carter being the simpler model, this makes it the preferred model for this thesis. We will not discuss Cairns-Blake-Dowd any further, but refer the interested reader to the original paper in which the model is proposed [14].
Before focusing on the estimation of the model parameters, a few essential definitions and assumptions used throughout this thesis will be discussed. The first is the survival function S0,t(x), the probability for an individual born in year t to survive from birth to at least age x. Now define the force of mortality in terms of S0,t as
\[
\mu_x(t) \;=\; \frac{1}{S_{0,t}(x)} \lim_{dx \to 0^+} \frac{S_{0,t}(x) - S_{0,t}(x + dx)}{dx} \;=\; -\frac{1}{S_{0,t}(x)} \frac{d}{dx} S_{0,t}(x) \;=\; \frac{f_{0,t}(x)}{1 - F_{0,t}(x)},
\]
where we use that −(d/dx) S0,t(x) = f0,t(x) [19].
It is challenging to obtain accurate values for µx(t), since it is a quotient of distributions that are unknown. The distribution is based on the continuous random variable lifetime, while mortality data record age at death only as integers. The probability of death for age x and year t derived from these data is called the death rate, denoted mx(t). In the literature it is very common to assume that the death rate mx(t) equals the force of mortality µx(t). This can be achieved by assuming that the force of mortality is constant within unit intervals of time and age for every age x, i.e.
\[
\mu_{x+\xi_1}(t + \xi_2) = \mu_x(t), \qquad 0 \le \xi_1, \xi_2 < 1, \tag{1}
\]
see [35]. In the remainder of this thesis the one-year survival probability for someone of age x in year t will be denoted as Sx,t.
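For later use it is worth making explicit what assumption (1) implies for this one-year survival probability: integrating the (constant) force of mortality over one year gives the standard identity

\[
S_{x,t} \;=\; \exp\!\left(-\int_0^1 \mu_{x+s}(t+s)\,ds\right) \;=\; e^{-m_x(t)}.
\]

This link between the death rate and the survival probability is what allows simulated death rates to be turned into simulated survivor rates later on.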
After imposing this assumption we can discuss our model of interest. The log-bilinear form of the Lee-Carter model is defined as follows:
\[
\log m_x(t) = \alpha_x + \beta_x \cdot \kappa_t, \tag{2}
\]
where αx is an age dependent sequence and βx is an age dependent sequence interacting with the time dependent parameter κt [25]. Once the values of αx, βx and κt have been estimated from the death rate data, an autoregressive integrated moving average (ARIMA) model can be fitted to the κt's. This ARIMA model will help us forecast future κt's and therefore predict future death rates and life expectancy, as discussed in chapter 3. The remainder of this chapter deals with the least squares estimation of αx, βx and κt in section 2.1 and with the Poisson estimation of these parameters in section 2.2. Section 2.3 discusses the data and the required manipulations, and the estimation results are presented in section 2.4.
2.1 Least squares estimation
The classical way to estimate the values of αx, βx and κt is by using the least squares assumption, i.e.
\[
\log \hat m_x(t) = \alpha_x + \beta_x \cdot \kappa_t + \epsilon_x(t), \tag{3}
\]
for x = x1, ..., xm and t = t1, ..., tn [25]. An important note here is that the error terms εx(t) are assumed to be normally distributed with mean 0 and variance σ². In particular this means that the error terms are assumed to be homoskedastic; in other words, the variance of the error terms does not depend on age x. Whether this is a reasonable assumption will be discussed in section 2.5.
Note that the proposed model has identifiability issues. For
\[
\log m_x(t) = \tilde\alpha_x + \tilde\beta_x \cdot \tilde\kappa_t,
\]
with α̃x = αx − p · βx, β̃x = βx/q and κ̃t = q · (κt + p), we get exactly the same model as with αx, βx and κt, for every p, q ∈ R with q ≠ 0. This leads to an infinite number of equivalent maxima in the likelihood corresponding to the model, all of which generate equal forecasts. There is no point in complicating the model this way, so two constraints need to be imposed to ensure identification. The choice of these constraints is rather subjective; in the literature it is common to use the constraints shown in equation (4) [25]:
\[
\sum_{t=t_1}^{t_n} \kappa_t = 0, \qquad \sum_{x=x_1}^{x_m} \beta_x = 1. \tag{4}
\]
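The identifiability issue itself is easy to verify numerically. The following sketch, with arbitrary illustrative parameter values, checks that the reparameterization with p and q (q ≠ 0) leaves log mx(t) unchanged:

```python
import numpy as np

# Numerical check of the identifiability issue: the reparameterization
# (alpha - p*beta, beta/q, q*(kappa + p)) gives the same log m_x(t) for
# any p and any q != 0. All parameter values below are arbitrary.

rng = np.random.default_rng(6)
alpha = rng.normal(size=5)
beta = rng.normal(size=5)
kappa = rng.normal(size=7)
p, q = 3.0, -2.0

logm = alpha[:, None] + beta[:, None] * kappa[None, :]
logm_re = ((alpha - p * beta)[:, None]
           + (beta / q)[:, None] * (q * (kappa + p))[None, :])

print(np.allclose(logm, logm_re))  # True: both parameterizations coincide
```

Because the fitted death rates are identical for every (p, q), any choice of constraints that pins down one representative, such as equation (4), is purely a normalization.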
These identifiability constraints will also be used in this thesis. The optimal α̂x, β̂x and κ̂t are the estimates under which the objective function
\[
O_{LS}(\alpha, \beta, \kappa) = \sum_{x=x_1}^{x_m} \sum_{t=t_1}^{t_n} \left( \log \hat m_x(t) - \alpha_x - \beta_x \cdot \kappa_t \right)^2 \tag{5}
\]
is minimized [35]. The exact way to find these values is by computing a singular value decomposition, as suggested by the original Lee-Carter paper [25]. It can also be done numerically using the Newton-Raphson method. We will delve into these two estimation procedures in the following two subsections.
2.1.1 Singular value decomposition
This subsection discusses an exact way to find the α̂x, β̂x and κ̂t that minimize the objective function in equation (5). First the optimal value for α̂x is determined. Setting the derivative of OLS with respect to αx equal to 0, we find
\[
-2 \cdot \sum_{t=t_1}^{t_n} \left( \log \hat m_x(t) - \alpha_x - \beta_x \cdot \kappa_t \right) = 0.
\]
Note that \(\sum_{t=t_1}^{t_n} \kappa_t = 0\) by the constraint in equation (4). Solving for αx yields
\[
\hat\alpha_x = \frac{\sum_{t=t_1}^{t_n} \log \hat m_x(t)}{t_n - t_1 + 1}. \tag{6}
\]
Having estimated α̂x, we create a matrix that represents the bilinear term as follows:
\[
Z = \begin{pmatrix}
\log \hat m_{x_1}(t_1) - \hat\alpha_{x_1} & \cdots & \log \hat m_{x_1}(t_n) - \hat\alpha_{x_1} \\
\vdots & \ddots & \vdots \\
\log \hat m_{x_m}(t_1) - \hat\alpha_{x_m} & \cdots & \log \hat m_{x_m}(t_n) - \hat\alpha_{x_m}
\end{pmatrix}.
\]
The goal is now to minimize the updated objective function
\[
\tilde O_{LS}(\beta, \kappa) = \sum_{x=x_1}^{x_m} \sum_{t=t_1}^{t_n} \left( z_{xt} - \beta_x \cdot \kappa_t \right)^2 \tag{7}
\]
for βx and κt. Now the singular value decomposition is a useful tool. The Eckart-Young-Mirsky theorem states that the best rank k approximation of a matrix Z with singular value decomposition Z = UΣVᵀ equals
\[
Z_k = \sum_{i=1}^{k} \sigma_i u_i v_i^T,
\]
where σi equals the i-th diagonal entry of Σ, and ui and vi equal the i-th column of U and V respectively [20]. Define λ as the largest eigenvalue of ZᵀZ, and define the corresponding eigenvector as u. Furthermore, let v be the eigenvector corresponding to eigenvalue λ in ZZᵀ. Applying the Eckart-Young-Mirsky theorem for k = 1 leads to
\[
Z \approx Z_1 = \sqrt{\lambda}\, v u^T
\]
being the best rank 1 approximation of Z. Hence,
\[
\hat\beta = \frac{v}{\sum_{j=1}^{x_m - x_1 + 1} v_j}, \qquad \hat\kappa = \sqrt{\lambda} \left( \sum_{j=1}^{x_m - x_1 + 1} v_j \right) u \tag{8}
\]
are the values of βx and κt minimizing the objective function in equation (7) [25].
It is clear that β̂x satisfies the identifiability constraint in equation (4). For κt it is less obvious, but it satisfies the constraint as well [35]; showing why this holds is beyond the scope of this thesis.
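The SVD recipe of equations (6) and (8) can be sketched in a few lines of numpy. The matrix of log death rates below is simulated purely for illustration; it is not the Dutch data used in this thesis.

```python
import numpy as np

# Minimal sketch of the least squares Lee-Carter fit via SVD, on synthetic data.

rng = np.random.default_rng(1)
ages, years = 80, 50
alpha_true = np.linspace(-8.0, -1.0, ages)
beta_true = np.full(ages, 1.0 / ages)            # satisfies sum(beta) = 1
kappa_true = np.linspace(25.0, -25.0, years)     # satisfies sum(kappa) = 0
logm = (alpha_true[:, None] + beta_true[:, None] * kappa_true[None, :]
        + rng.normal(0.0, 0.01, (ages, years)))

alpha_hat = logm.mean(axis=1)                    # equation (6): row averages
Z = logm - alpha_hat[:, None]                    # the bilinear-term matrix
U, s, Vt = np.linalg.svd(Z, full_matrices=False) # Z = U diag(s) V^T
v = U[:, 0]                                      # age-side eigenvector of Z Z^T
u = Vt[0, :]                                     # year-side eigenvector of Z^T Z
sigma1 = s[0]                                    # sigma1 = sqrt(lambda)
beta_hat = v / v.sum()                           # equation (8)
kappa_hat = sigma1 * v.sum() * u                 # equation (8)

print(abs(beta_hat.sum() - 1.0), abs(kappa_hat.sum()))  # constraints (4) hold
```

Note that dividing and multiplying by the sum of v makes the result invariant to the arbitrary sign of the singular vectors, and that κ̂ sums to (numerically) zero automatically because the rows of Z are centered.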
2.1.2 Newton-Raphson method
The second way of determining the values of αx, βx and κt that minimize objective function (5) is by using a numerical algorithm named after Isaac Newton and Joseph Raphson [13]. Their algorithm is a very useful tool for finding the roots of a twice differentiable function f : R → R. It starts with a first guess x0, which is updated iteratively according to
\[
x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)},
\]
for f the function of interest and f′ its first derivative with respect to x. This process is repeated until |x_{n+1} − x_n| falls below a predetermined, sufficiently small tolerance.
In our case we perform a Newton-Raphson step for each of αx, βx and κt within every iteration. The function f in this case equals
\[
f(\theta) = \frac{\partial O(\alpha, \beta, \kappa)}{\partial \theta},
\]
for θ the parameter of choice. This leads to the following updating scheme:
\[
\hat\theta^{(k+1)} = \hat\theta^{(k)} - \frac{\partial O^{(k)}(\alpha, \beta, \kappa)/\partial \theta}{\partial^2 O^{(k)}(\alpha, \beta, \kappa)/\partial \theta^2},
\]
with
\[
\frac{\partial O(\alpha, \beta, \kappa)}{\partial \alpha_x} = -2 \cdot \sum_{t=t_1}^{t_n} \left( \log \hat m_x(t) - \alpha_x - \beta_x \cdot \kappa_t \right),
\]
\[
\frac{\partial O(\alpha, \beta, \kappa)}{\partial \kappa_t} = -2 \cdot \sum_{x=x_1}^{x_m} \beta_x \cdot \left( \log \hat m_x(t) - \alpha_x - \beta_x \cdot \kappa_t \right),
\]
\[
\frac{\partial O(\alpha, \beta, \kappa)}{\partial \beta_x} = -2 \cdot \sum_{t=t_1}^{t_n} \kappa_t \cdot \left( \log \hat m_x(t) - \alpha_x - \beta_x \cdot \kappa_t \right),
\]
for tn the last year in the data set and xm the oldest age in the data set [35]. Notice that the factor −2 will also appear in the second derivative, so all −2 factors can be dropped from both the numerators and the denominators.
The iterative updating scheme is as shown below:
\[
\hat\alpha_x^{(k+1)} = \hat\alpha_x^{(k)} + \frac{\sum_{t=t_1}^{t_n} \left( \log \hat m_x(t) - \hat\alpha_x^{(k)} - \hat\beta_x^{(k)} \cdot \hat\kappa_t^{(k)} \right)}{t_n - t_1 + 1},
\]
\[
\hat\kappa_t^{(k+1)} = \hat\kappa_t^{(k)} + \frac{\sum_{x=x_1}^{x_m} \hat\beta_x^{(k)} \cdot \left( \log \hat m_x(t) - \hat\alpha_x^{(k+1)} - \hat\beta_x^{(k)} \cdot \hat\kappa_t^{(k)} \right)}{\sum_{x=x_1}^{x_m} \left( \hat\beta_x^{(k)} \right)^2},
\]
\[
\hat\beta_x^{(k+1)} = \hat\beta_x^{(k)} + \frac{\sum_{t=t_1}^{t_n} \hat\kappa_t^{(k+1)} \cdot \left( \log \hat m_x(t) - \hat\alpha_x^{(k+1)} - \hat\beta_x^{(k)} \cdot \hat\kappa_t^{(k+1)} \right)}{\sum_{t=t_1}^{t_n} \left( \hat\kappa_t^{(k+1)} \right)^2}.
\]
The algorithm stops when the values of all α̂x^(k+1), κ̂t^(k+1) and β̂x^(k+1) barely change after a new iteration; the stopping criterion will be set to 10⁻¹⁰. For the starting values we use values that satisfy the identifiability constraints in equation (4) from the beginning: α̂x⁰ = 0, κ̂t⁰ = 0 and β̂x⁰ = 1/(xm − x1 + 1), for every x and t.
To make sure the identifiability constraints are met, the parameters need to be rescaled right after each iteration. We replace α̂x with α̂x + β̂x · κ̄, κ̂t with (κ̂t − κ̄) · β̂* and β̂x with β̂x/β̂*, where κ̄ is the mean of the κ̂t's and β̂* is the sum of the β̂x's. After this rescaling the identifiability constraints are satisfied [35].
2.2 Poisson estimation
The second way of estimating the model parameters is by using a Poisson distribution. Based on a paper by Brouhns et al., this approach overcomes the homoskedasticity issues that arise from the assumption that errors are identically distributed across all ages [13].
This modelling framework relies on the exposure-to-risk Ex,t and death counts Dx,t. The exposure-to-risk for age x and year t is the number of people of age x that are exposed to the risk of death during the one-year time interval from year t to year t + 1. This value is almost equal to the population size of age x in year t, except for a slight correction¹ for the timing of deaths. The Dutch exposure-to-risk (ETR) data is provided by the Human Mortality Database for all ages and years of our interest [2].
The concept of the Poisson distributed model is
\[
D_{xt} \sim \text{Poisson}(E_{xt} \cdot m_x(t)), \qquad m_x(t) = e^{\alpha_x + \beta_x \cdot \kappa_t}. \tag{9}
\]
The parameters of this model can be estimated by maximizing the log-likelihood corresponding to the model described above, which is given by
\[
L(\alpha, \beta, \kappa) = \sum_{x=x_1}^{x_m} \sum_{t=t_1}^{t_n} \left[ D_{xt} \cdot (\alpha_x + \beta_x \cdot \kappa_t) - E_{xt} \cdot e^{\alpha_x + \beta_x \cdot \kappa_t} \right] + c, \qquad c \in \mathbb{R}.
\]
The easiest way to estimate the optimal values of the parameters in this Poisson model is by using the Newton-Raphson method. Working in a similar fashion as before, we find the following updating scheme:
\[
\hat\alpha_x^{(k+1)} = \hat\alpha_x^{(k)} + \frac{\sum_{t=t_1}^{t_n} \left( D_{xt} - E_{xt} \cdot e^{\hat\alpha_x^{(k)} + \hat\beta_x^{(k)} \hat\kappa_t^{(k)}} \right)}{\sum_{t=t_1}^{t_n} E_{xt} \cdot e^{\hat\alpha_x^{(k)} + \hat\beta_x^{(k)} \hat\kappa_t^{(k)}}},
\]
\[
\hat\kappa_t^{(k+1)} = \hat\kappa_t^{(k)} + \frac{\sum_{x=x_1}^{x_m} \left( D_{xt} - E_{xt} \cdot e^{\hat\alpha_x^{(k+1)} + \hat\beta_x^{(k)} \hat\kappa_t^{(k)}} \right) \hat\beta_x^{(k)}}{\sum_{x=x_1}^{x_m} E_{xt} \cdot e^{\hat\alpha_x^{(k+1)} + \hat\beta_x^{(k)} \hat\kappa_t^{(k)}} \left( \hat\beta_x^{(k)} \right)^2},
\]
\[
\hat\beta_x^{(k+1)} = \hat\beta_x^{(k)} + \frac{\sum_{t=t_1}^{t_n} \left( D_{xt} - E_{xt} \cdot e^{\hat\alpha_x^{(k+1)} + \hat\beta_x^{(k)} \hat\kappa_t^{(k+1)}} \right) \hat\kappa_t^{(k+1)}}{\sum_{t=t_1}^{t_n} E_{xt} \cdot e^{\hat\alpha_x^{(k+1)} + \hat\beta_x^{(k)} \hat\kappa_t^{(k+1)}} \left( \hat\kappa_t^{(k+1)} \right)^2}.
\]
¹Someone of age x in year t could be born in year t − x or in year t − x + 1; the correction takes this into account.
The criterion to stop the algorithm will again be set to 10⁻¹⁰, which is a common choice in the literature [13]. For the starting values we use values that satisfy the constraints from the beginning, just as we did in subsection 2.1.2 for the least squares model. In this case we set α̂x⁰ = 0, κ̂t⁰ = 0 and β̂x⁰ = 1/m, for every x and t.
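The Poisson scheme above can be sketched in the same style as before. The exposures, true parameters and starting value for α below are illustrative assumptions (in this sketch α is initialised at the crude log death rate, a common practical choice that speeds up convergence, rather than at zero as in the text):

```python
import numpy as np

# Sketch of the Poisson Newton-Raphson scheme on synthetic exposures/deaths.

rng = np.random.default_rng(3)
m_ages, n_years = 30, 20
alpha_true = np.linspace(-7.0, -2.0, m_ages)
beta_true = np.full(m_ages, 1.0 / m_ages)          # sums to 1
kappa_true = np.linspace(10.0, -10.0, n_years)     # sums to 0
E = np.full((m_ages, n_years), 1e5)
D = rng.poisson(E * np.exp(alpha_true[:, None]
                           + beta_true[:, None] * kappa_true[None, :]))

alpha = np.log(D.sum(axis=1) / E.sum(axis=1))      # crude starting alphas
kappa = np.zeros(n_years)
beta = np.full(m_ages, 1.0 / m_ages)

def fitted(alpha, beta, kappa):
    return E * np.exp(alpha[:, None] + beta[:, None] * kappa[None, :])

for _ in range(500):
    Dhat = fitted(alpha, beta, kappa)
    alpha = alpha + (D - Dhat).sum(axis=1) / Dhat.sum(axis=1)
    Dhat = fitted(alpha, beta, kappa)
    kappa = kappa + (((D - Dhat) * beta[:, None]).sum(axis=0)
                     / (Dhat * beta[:, None] ** 2).sum(axis=0))
    Dhat = fitted(alpha, beta, kappa)
    beta = beta + (((D - Dhat) * kappa[None, :]).sum(axis=1)
                   / (Dhat * kappa[None, :] ** 2).sum(axis=1))
    # re-impose the identifiability constraints (4) after every sweep
    kbar, bsum = kappa.mean(), beta.sum()
    alpha, kappa, beta = alpha + beta * kbar, (kappa - kbar) * bsum, beta / bsum

print(np.abs(alpha - alpha_true).max())  # small: the true alphas are recovered
```

Because the simulated deaths come from the model itself and the exposures are large, the estimates should recover the true parameters up to Poisson noise.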
2.3 Data
The data originate from the Human Mortality Database (HMD) [2], which provides access to mortality data for 41 countries around the world, including the Dutch death rate data used in this thesis. Unless stated otherwise, the observed death rates used in this thesis cover calendar years t1 = 1965, ..., tn = t50 = 2014 and ages x1 = 0, ..., xm = x105 = 104.
It should be mentioned that the HMD smoothed the death rates for older ages, starting at age x = 80. Without smoothing, the number of deaths at older ages eventually becomes so small that the data start to exhibit large random variation. To see this we inspect the raw data set, also obtained from the HMD. The raw death rates were only available from age 0 up to age 95 and from year 1980 to 2014; they are calculated by dividing the number of deaths per age per year by the population size for that age in that year. Figure 1 shows boxplots of the raw mortality rates for every age.
Looking at figure 1, smoothing the mortality rates from age 80 onwards can be justified, since the variance increases rapidly for older ages; this can be deduced from the boxplots becoming wider as age increases. The data beyond that age are therefore smoothed using the parameters that optimize the log-likelihood function of a Poisson model, as described by the Methods Protocol [43]. This yields the following model:
\[
D_{xt} \sim \text{Poisson}(E_{xt} \cdot \mu_{x+0.5}(a, b)), \qquad \mu_x(a, b) = \frac{a \cdot e^{b(x-80)}}{1 + a \cdot e^{b(x-80)}}.
\]
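The shape of this smoothing curve is easy to see numerically. In the sketch below the values of a and b are made up for illustration; the HMD obtains them by maximizing the Poisson log-likelihood as described in the Methods Protocol.

```python
import numpy as np

# Sketch of the logistic old-age smoothing curve; a and b are illustrative.

def mu_oldage(x, a, b):
    z = a * np.exp(b * (x - 80.0))
    return z / (1.0 + z)        # increasing in x and bounded above by 1

x = np.arange(80, 111)
mu = mu_oldage(x, a=0.07, b=0.11)
print(mu[0])   # equals a / (1 + a) at age 80
```

The logistic form forces the smoothed force of mortality to rise monotonically with age while staying below 1, which is exactly what the noisy raw old-age rates fail to do.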
This smoothing turns out to be a problem when testing which model fits the data better, since for older ages the data will be biased towards the Poisson model. For that particular reason, only data from ages 0 up to and including 79 will be used when the models are compared to each other in quantitative tests. All data sets are transformed into a matrix in which rows represent ages and columns represent years.
Figure 1: Boxplots of mortality rates per age of the raw data. Source: the author.
2.4 Estimation results
We start by discussing the results of the least squares estimated model. The parameter estimates of the exact singular value decomposition method and the numerical Newton-Raphson method were almost identical: the largest difference between any of the vector entries of α, β or κ across the two methods is approximately 1.286 · 10⁻¹², so we can conclude that both methods work and treat the results as identical. Since the parameters of the Poisson distributed model can only be estimated with the Newton-Raphson method, from now on the Newton-Raphson results will be used instead of the SVD estimates for the least squares estimated model. This makes the results easier to compare, since the same method and tolerance are used for both models, giving numerical errors of equivalent size.
Now we compare the results of the least squares estimated model with those of the Poisson estimated model. The results for α, β and κ are shown graphically in figure 2, with the least squares model results in red and the Poisson model results in green. Looking at the first plot of figure 2, it becomes clear that α displays the general pattern of mortality across ages.
Figure 2: Plots of α, β and κ for the SVD estimated model (red dotted) and the Poisson
estimated model (green dotted). Source: the author.
Unfortunately, there are still a few newborns that die at or some time after birth due to lethal diseases and complications. Around ages 17 to 21 another marked increase in mortality can be spotted. This peak is called the young adult mortality hump. According to American research this hump is mainly caused by males, and is induced by biological and socioeconomic factors [36], including riskier behavior, alcohol consumption and being able to drive. The main causes of death that contribute to this hump are traffic accidents, suicides, homicides and poisonings. From this age on, α increases gradually as expected. Due to its interaction with κ, the course of β is less intuitive than that of α. The parameter β can be interpreted as the rate at which every age benefits from improving life expectancy over time. Since κt is negative from 1994 onwards, 3 to 12 year olds benefit the most from the improving survival rates. In the third graph we can clearly see κ decreasing over time, mainly driven by improvements in sanitation, housing, nutrition, education and medical care [37]. Individual humps are harder to distinguish in the κ series.
Comparing the parameter estimates of the two estimation methods, we see that the results for α are extremely similar. For β and κ there is relatively more difference between the two models: for β we see a slight difference for ages 3 to 11 and ages over 99, and for κ a slight difference for almost all years covered. In combination with the differences in β this could accumulate to larger differences, since these parameters interact with each other. It should be noted that the parameter values are not the only difference between these models. The main difference is the way the source of randomness is incorporated in the model. For the least squares estimated model this happens according to the formula in equation (3), where εx(t) ∼ N(0, σ²) is the source of randomness. In the Poisson estimated model the number of deaths is drawn from a Poisson distribution whose mean is derived from the central equation in the Lee-Carter model.
2.5 Quantitative comparison of least squares with the Poisson assumption
In this section the least squares and the Poisson model will be compared using quantitative results. The first test we perform is Bartlett's test. Recall that for the least squares method homoskedasticity of the error terms was assumed, which is a very strong assumption; in this section we will find out whether it is acceptable. We can see in figure 1 that the width of the boxplots increases rapidly already before age 80, the age from which our data is smoothed. This indicates that the homoskedasticity assumption may not hold for our data set. To test this assumption formally, Bartlett's test is applied to the complete smoothed data set. Bartlett's test is a formal way of testing whether samples have equal variance. Every row of the life table matrix is treated as a different sample, each of them having equal variance under the null hypothesis. The test statistic is approximately χ²ₖ₋₁ = χ²₁₀₄ distributed under the null hypothesis, which is thus rejected if the test statistic exceeds χ²₁₀₄,₀.₉₅ = 128.80. For our data the test statistic equals 3741.4, with a corresponding p-value of less than 2.2 · 10⁻¹⁶, so the null hypothesis is easily rejected. We conclude that the variance is not equal across ages. This rules out the possibility of homoskedastic error terms and favors the Poisson estimated model.
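Bartlett's statistic is straightforward to compute. The sketch below implements the textbook formula (scipy.stats.bartlett gives the same statistic) and applies it to simulated homoskedastic and heteroskedastic "age rows"; the data are made up purely to illustrate why the statistic explodes when variances grow with age.

```python
import numpy as np

# Bartlett's test statistic for equal variances across k samples.

def bartlett_stat(samples):
    k = len(samples)
    n = np.array([len(s) for s in samples])
    v = np.array([np.var(s, ddof=1) for s in samples])
    N = n.sum()
    sp2 = ((n - 1) * v).sum() / (N - k)                 # pooled variance
    num = (N - k) * np.log(sp2) - ((n - 1) * np.log(v)).sum()
    den = 1 + (np.sum(1 / (n - 1)) - 1 / (N - k)) / (3 * (k - 1))
    return num / den

rng = np.random.default_rng(4)
equal = [rng.normal(0, 1, 50) for _ in range(105)]              # same variance
unequal = [rng.normal(0, 1 + 0.05 * i, 50) for i in range(105)] # growing variance
print(bartlett_stat(equal))    # near its d.o.f. of 104 under the null
print(bartlett_stat(unequal))  # far above the 128.80 critical value
```

With 105 rows of 50 years each, as in the life table matrix, the statistic is compared against the χ²₁₀₄ distribution exactly as in the text.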
Another way to compare the two distributional assumptions is by performing residual analysis. Recall, however, that the data are smoothed from age 80 onwards. Consequently, only data for ages 0 to 79 will be tested, and the model parameters will be determined using only the data for ages 0 up to and including 79.
For the least squares method, Pearson residuals need to be inspected [34]. For this particular model the residuals are given by
\[
r_{xt} = \frac{\log \hat m_x(t) - (\hat\alpha_x + \hat\beta_x \cdot \hat\kappa_t)}{\sqrt{\frac{1}{(x_m - x_1)(t_n - t_1 - 1)} \sum_{x=x_1}^{x_m} \sum_{t=t_1}^{t_n} \left( \log \hat m_x(t) - (\hat\alpha_x + \hat\beta_x \cdot \hat\kappa_t) \right)^2}}.
\]
The sum of the squared residuals is chi-squared distributed with n · m = 80 · 50 = 4000 degrees of freedom under the null hypothesis that the least squares estimated model is a good fit. This means that this sum may not exceed χ²₄₀₀₀,₀.₉₅ ≈ 4148.2 in order for the model to be a good fit. The value of the test statistic equals 8,318,653, so we can conclude that the least squares estimated model is a bad fit.
For the Poisson model deviance residuals are considered. Deviance residuals can be seen as
the difference between the estimated model and the ideal model, for each data point individually. This boils down to

D = 2 \sum_x \sum_t \left[ \left( D_{xt} \log(D_{xt}) - D_{xt} - \log(D_{xt}!) \right) - \left( D_{xt} \log(\hat{D}_{xt}) - \hat{D}_{xt} - \log(D_{xt}!) \right) \right],

where D_{xt} is the actual number of deaths for the group of people aged x in year t and \hat{D}_{xt} is
the estimated number of deaths for this same group [6]. This simplifies to

D = 2 \sum_x \sum_t \left[ D_{xt} \log\!\left(\frac{D_{xt}}{\hat{D}_{xt}}\right) - (D_{xt} - \hat{D}_{xt}) \right].
This test statistic D is approximately χ²_{n·m−p} distributed, with p the number of parameters
involved in the model. So D ∼ χ²_{3997} under the null hypothesis.
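The deviance statistic is straightforward to compute once observed and fitted death counts are available. The sketch below uses toy counts rather than the fitted Lee-Carter values; `poisson_deviance` is an illustrative helper, not code from the thesis.

```python
import math

def poisson_deviance(observed, fitted):
    """Poisson deviance: D = 2 * sum[ D*log(D/Dhat) - (D - Dhat) ].

    `observed` and `fitted` are flat sequences of death counts D_xt and
    fitted counts Dhat_xt. A cell with D = 0 contributes 2 * Dhat, the
    limit of the general term as D -> 0.
    """
    total = 0.0
    for d, dhat in zip(observed, fitted):
        if d > 0:
            total += d * math.log(d / dhat) - (d - dhat)
        else:
            total += dhat
    return 2.0 * total

# a perfect fit has zero deviance; a crude constant fit has a large one
obs = [12, 30, 55, 80]
dev_perfect = poisson_deviance(obs, [float(d) for d in obs])
dev_poor = poisson_deviance(obs, [30.0, 30.0, 30.0, 30.0])
```

Comparing the resulting D with the χ²_{nm−p, 0.95} quantile gives the goodness-of-fit verdict used above.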
The value of the test statistic D for the Poisson model equals approximately 5878.6, while
χ²_{3997, 0.95} equals 4145.2. This implies that the Poisson model is a better fit than the least
squares model, but there is still room for improvement. A possible explanation for the
supposed bad fit of the Poisson model lies in the fact that our data sample is very large, which
causes even small deviations from the model to be statistically significant. When performing the
same test, but including each data point with probability 1/20 in order to decrease the amount of data drastically, the test statistic D appeared to be slightly smaller than the corresponding
χ²_{0.95} value. This means that after ruling out the negative effect the size of the data
set has on the goodness-of-fit, this model can be considered a good fit. This result does
not hold for the Pearson residuals in the least squares model: after including each data
point with probability 1/20 in that model, the residuals are still far from chi-squared distributed.
Another possible explanation for the somewhat bad fit of the deviance residuals lies in the
cohort effect. The cohort effect is the phenomenon that people born in certain periods of time
experience a more rapid improvement in survival rates than people born in other periods. The
analysis of the cohort effect is beyond the scope of this thesis. For more information on the
cohort effect, the paper "The Cohort Effect, Insights and Explanations" by R.C. Willets is
recommended [42].
In this chapter we discussed two different distributional assumptions for the Lee-Carter model,
i.e. the least squares assumption and the Poisson assumption. The parameters α, β and κ
under these two assumptions were estimated with the Newton-Raphson method. We have seen that the homoskedasticity assumption of the error terms, as made in the least squares
approach, is not suitable for this data. The residual analysis favored the Poisson assumption
as well. So based on the quantitative tests we performed, the Poisson assumption fits the data
better than the least squares assumption. In the remainder of this thesis we will continue with the Poisson estimated model.
3 Forecasting death rates and life expectancy
In May 2019, the Dutch government decided to raise the retirement age in the Netherlands
to 67 years in 2024 [5]. This started a fierce debate on whether, on average, people live long
enough, in good health, to have a decent life after retiring at age 67. Life expectancy and
quality of life were big issues in the media. Improving life expectancy affects the private pension sector as well: if realized survival rates outpace the predicted survival rates, this can
result in larger than expected outbound cash flows for insurance companies and pension funds.
In this chapter we will establish an ARIMA model that fits the κ_t time series. With this
ARIMA model we can predict and simulate future survival rates, ultimately used to price
longevity-linked securities and test their effectiveness in chapter 5.
The main goal of this chapter is to model the time parameter κ_t and use this time series model
to forecast future values. The model used for forecasting κ_t is the Autoregressive Integrated
Moving Average (ARIMA) time series model. In this model the AR part indicates that the variable is
regressed on its own prior values, the MA part indicates that the variable of interest is regressed on its own prior error terms and the I part indicates the number of times the data
is differenced in order to make it stationary. For our variable of interest κ_t^P, the κ_t
of the Poisson estimated model, this boils down to

\kappa_t^{P,d} = c + \phi_1 \kappa_{t-1}^{P,d} + \ldots + \phi_p \kappa_{t-p}^{P,d} + \theta_1 \epsilon_{t-1}^{P} + \ldots + \theta_q \epsilon_{t-q}^{P} + \epsilon_t^{P}, \qquad (10)

with c a constant, \phi_1, \ldots, \phi_p the parameter estimates of the AR part, \theta_1, \ldots, \theta_q the parameter
estimates of the MA part, \epsilon_t^{P} the white noise error term at time t and d the number of times
\kappa_t^P is differenced.
3.1 Properties of an ARIMA(p,d,q) model
In the process of finding the ARIMA model that best fits the κ_t^P time series of the Poisson model,
the first property that needs to be discussed is that differencing a time series tends to
drive the lag-1 autocorrelation towards a negative value. To see why this holds, Wold's representation
theorem for time series is used [3]. This theorem states that every covariance-stationary time
series can be written as

X_t = \sum_{j=0}^{\infty} \psi_j Z_{t-j} + \eta_t,

with \psi_0 = 1, \sum_{j=0}^{\infty} \psi_j^2 < \infty, Z_t a white noise process with variance \sigma^2 and \eta_t a deterministic
process. Note that

\mathrm{Cov}(X_t, X_{t-1}) = \mathrm{Cov}\!\left( \sum_{j=0}^{\infty} \psi_j Z_{t-j} + \eta_t, \; \sum_{j=0}^{\infty} \psi_j Z_{t-1-j} + \eta_{t-1} \right) = \sigma^2 \sum_{j=0}^{\infty} \psi_j \psi_{j+1},
since \eta_t is deterministic and Z_t is white noise. Now

Y_t = X_t - X_{t-1} = Z_t + (\psi_1 - 1) Z_{t-1} + \sum_{j=2}^{\infty} (\psi_j - \psi_{j-1}) Z_{t-j} + (\eta_t - \eta_{t-1}).

Hence,

\mathrm{Cov}(Y_t, Y_{t-1}) = \mathrm{E}[Y_t \cdot Y_{t-1}] = (\psi_1 - 1)\sigma^2 + \sigma^2 \sum_{j=2}^{\infty} (\psi_j - \psi_{j-1})(\psi_{j-1} - \psi_{j-2}),

where again the properties of a white noise process are used. Now since \sum_{j=0}^{\infty} \psi_j^2 < \infty, the
series \{\psi_j\}_{j=0}^{\infty} is decreasing and smaller than 1 most of the time. This implies that in most
cases \psi_1 - 1 < 0 and (\psi_1 - 1) + \sum_{j=2}^{\infty} (\psi_j - \psi_{j-1})(\psi_{j-1} - \psi_{j-2}) < 0. So to conclude,
differencing the time series can be a very useful tool to decrease autocorrelation. Once the lag-1 autocorrelation is around zero or negative, the time series has been differenced enough times.
Often the optimal number of times to difference a time series is the one for which the standard deviation of the
differenced series is smallest. Also, since stationary time series are mean-reverting, after the time
series is differenced the optimal number of times, the series should always eventually return
to its mean over time. As a consequence, if the series needs to be differenced once, then
the original series should show a constant trend. If it needs to be differenced multiple times,
then there should be a time-varying trend in the original series [32].
In order to determine the number of AR and MA parameters that should be added to the model, some more theory is required. Let B be the backshift operator, defined by

B \cdot Y_t = Y_{t-1},

for every time series Y_t and every t. This implies we can difference a time series \{Y_t\} once by
using

Y_t - Y_{t-1} = Y_t - B \cdot Y_t = (1 - B) \cdot Y_t.

Now an ARIMA(1,1,1) model can be written as

(1 - B) Y_t = \phi_1 (1 - B) Y_{t-1} + \epsilon_t - \theta_1 \epsilon_{t-1} = \phi_1 (1 - B)(B \cdot Y_t) + \epsilon_t - \theta_1 B \cdot \epsilon_t.

This is equivalent to

(1 - \phi_1 B)(1 - B) Y_t = (1 - \theta_1 B) \epsilon_t.
So we can add an AR parameter by multiplying Y_t by (1 - \phi_1 B), and we can add an MA
parameter by multiplying \epsilon_t by (1 - \theta_1 B) [32]. If the series is still somewhat underdifferenced
after differencing, this can be compensated by adding an AR parameter. The value
of \phi_1 corresponds to the extent to which the series is still underdifferenced. If for example \phi_1 ≈ 1,
the effect is almost the same as differencing the series once. On the other hand, adding an
MA parameter can undo overdifferencing. When for example \theta_1 ≈ 1, it almost completely
cancels out a (1 - B) term on the left-hand side of the equation, and nullifies one differencing
of the series.
Besides this theory, the partial autocorrelation function (PACF) and the autocorrelation function (ACF) itself
can also be used to determine how many AR and MA parameters need to be added.
To understand this, note that the partial autocorrelation function removes the parts that are
explained by earlier lags, in order to find which lags are correlated with the remaining residual. This is
closely related to how an AR model works: in an AR model current values depend on their own prior values. If the PACF cuts off sharply after lag n, then the model
could be properly described by the n previous values of the time series, indicating that an AR
model with n parameters could be a good fit to the time series. The ACF, on the
other hand, captures the entire correlation between a value of the time series and a lagged
version of that value; correlation that remains at those lags can only be caused by the residuals themselves. Hence a sharp cutoff in the ACF after n
lags suggests that the last n residuals help to describe the data, and no more. Thus adding
n MA parameters could improve the model.
This ultimately leads to two rules for adding AR and MA parameters. The first one
is that if the partial autocorrelation function shows a sharp cutoff after some particular lag, or
if the lag-1 autocorrelation is positive, i.e. if the time series is slightly underdifferenced, the model may need another AR parameter. On the other hand, if the autocorrelation function
shows a sharp cutoff after some particular lag, or if the lag-1 autocorrelation is negative, i.e.
if the time series is slightly overdifferenced, then the model may need another MA parameter
[32].
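The differencing rule above can be illustrated numerically: a trending series has a lag-1 autocorrelation near 1, and differencing it once drives the lag-1 autocorrelation down, here below zero. The sketch uses a small deterministic toy series, not the κ data, and the helper names are introduced for illustration only.

```python
def acf(series, lag):
    """Sample autocorrelation of `series` at the given lag."""
    n = len(series)
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series)
    cov = sum(
        (series[t] - mean) * (series[t - lag] - mean) for t in range(lag, n)
    )
    return cov / var

def difference(series):
    """First difference: y_t - y_{t-1}."""
    return [b - a for a, b in zip(series, series[1:])]

# linear trend plus a small alternating disturbance (deterministic "noise")
series = [2.0 * t + (1.0 if t % 2 == 0 else -1.0) for t in range(100)]

rho_raw = acf(series, 1)               # near 1: the trend dominates
rho_diff = acf(difference(series), 1)  # negative: differencing flips the sign
```

The raw series is clearly underdifferenced (lag-1 autocorrelation close to 1), while after one difference the lag-1 autocorrelation turns negative, matching the behavior derived from Wold's representation above.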
3.2 ARIMA model choice
We continue with determining the κ_t^P time series for our Poisson estimated model. The ACF
and PACF plots for the original series of κ_t^P are shown in figure 3.
Figure 3: ACF and PACF plots of κ_t^P for the Poisson model. The blue dotted lines indicate the border of significantly positive or negative autocorrelation. Source: the author.
Note that the ACF plot displays large values for a high number of lags. This indicates that
the series is probably underdifferenced. Differencing the series once leads to the ACF and
PACF plots depicted in figure 4.

Figure 4: ACF and PACF plots of the once-differenced κ_t^P for the Poisson model. The blue
dotted lines indicate the border of significantly positive or negative autocorrelation. Source: the author.
The lag-1 autocorrelation is negative, but not significantly so, so the differenced time
series does not appear to be overdifferenced. Also, the standard deviation of the
once-differenced series equals σ ≈ 2.453, which is lower than that of both the undifferenced series, with
σ ≈ 26.38, and the twice-differenced series of κ_t^P, with σ ≈ 3.72. Differencing the
series once therefore appears to be optimal for this time series. We continue with determining how
many AR and MA parameters should be added to the model. Consider the ACF plot
of the differenced series in figure 4. The lag-1 autocorrelation is negative, but not significant. Also, the ACF does not show a particularly sharp cutoff, since the correlations are not
significant. We therefore consider using just an ARIMA(0,1,0) model. This leads to
the following model:

\kappa_t^P - \kappa_{t-1}^P = -1.8631 + \epsilon_t^P, \qquad (11)
where the constant term is significant and \epsilon_t^P \sim N(0, 5.822) is the white noise error term. We continue with
figure 5, displaying various graphical goodness-of-fit tests for this ARIMA(0,1,0) model.

Figure 5: Various graphical goodness-of-fit tests for the estimated ARIMA(0,1,0) model for
κ_t in the Poisson model. Source: the author.
There seem to be no clear patterns in the standardized residuals in the top plot. They also seem to be mean-reverting. In the ACF plot it can be seen that the values lie almost entirely
between the blue dotted lines, showing no significant undescribed autocorrelation. The normal
Q-Q plot of the residuals in the third plot is straight enough to accept the assumption that
the residuals are normal. Lastly, we take a look at the p-values of the Ljung-Box test
statistics for different lags. Only for lag 17 is the p-value smaller than 0.05. This implies that
assuming that the differenced series of κ_t^P is independently distributed is reasonable. There
is still some correlation left in the residuals, but not enough to justify adding more variables. To get
rid of the remaining undescribed correlation at least 3 MA parameters would have to be added, since
the first significant autocorrelation in the ACF plot occurs after 3 lags. As is also common in
the literature, it is not worth complicating the model this much to capture this lag-3 autocorrelation [12]. Also, when adding the MA parameters iteratively, we observe that the first parameter has
an insignificant effect on the model. We conclude that the ARIMA(0,1,0) model as described
above is the best fit to the time series κ_t^P. The graph of the time series, along with its
forecasts, is shown in figure 6.
Figure 6: Time series plot of κ_t^P, for t_0 = 1965. Points in red are forecasts. The dark shaded
region equals \hat{\kappa}_t^P \pm \sqrt{\mathrm{E}[(\hat{\kappa}_t^P - \kappa_t^P)^2]}, and the light shaded region equals \hat{\kappa}_t^P \pm 2\sqrt{\mathrm{E}[(\hat{\kappa}_t^P - \kappa_t^P)^2]}.
Source: the author.
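For an ARIMA(0,1,0) model with drift, the forecasts and the shaded bands in figure 6 have a simple closed form: h steps ahead, the point forecast is κ_T + h·c and the forecast error variance is h·σ². A minimal sketch, using the drift and variance from equation (11); the last observed value of κ is a hypothetical placeholder, not the fitted value.

```python
import math

# ARIMA(0,1,0) with drift is a random walk with drift:
#   kappa_t = kappa_{t-1} + c + eps_t,  eps_t ~ N(0, sigma2),
# with c = -1.8631 and sigma2 = 5.822 as in equation (11).
C = -1.8631
SIGMA2 = 5.822

def forecast(kappa_last, horizon):
    """h-step point forecast and forecast standard error.

    The point forecast is kappa_T + h*c; the forecast error is a sum of h
    independent innovations, so its variance is h * sigma2.
    """
    point = kappa_last + horizon * C
    stderr = math.sqrt(horizon * SIGMA2)
    return point, stderr

KAPPA_LAST = -40.0  # hypothetical last observed value of kappa_t^P
point, se = forecast(KAPPA_LAST, 10)
# the light shaded band in figure 6 corresponds to +/- 2 standard errors
lower, upper = point - 2 * se, point + 2 * se
```

Note that the bands widen like √h, which is why the shaded regions in figure 6 fan out over the forecast horizon.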
3.3 Bootstrapping life expectancy
In the beginning of this chapter it was mentioned that the retirement age in the Netherlands
will be 67 in 2024. This makes the expected lifetime of 67-year-olds of major importance
for politics, but also for insurance companies and pension funds. In this section we will use the bootstrap to estimate the mean and the 95% confidence interval of the expected lifetime of
someone aged 67. This will give more insight into the amount of uncertainty rooted in the
Poisson estimated Lee-Carter model. The results will not be used in the remainder of the
thesis.
In order to determine life expectancy for every age and every year, future values of κ_t^P
need to be simulated from the model described in equation (11). With these simulated values
future expected mortality rates can be estimated, from which we can derive the expected lifetime
for all ages and years. We cannot derive mortality rates and life expectancy analytically
using the expected value of κ_t^P, since life expectancy is not linear in κ_t^P. Using simulation it
is possible to determine all mortality rates from 1965 to any year in the future, for any age up to 104. To terminate the recursion below, we need to
assign a value for life expectancy to age 105. The value that is assigned to age 105 does
not influence the results much, since under our model the probability of someone aged 67
in 2024 reaching age 105 equals approximately 0.0003. The expected lifetime we assign
to age 105 is 1.5 years, since life table data from the Human Mortality Database over the last
10 years indicates that the expected lifetime of someone aged 105 revolves constantly around
1.5 years [2].
Recall that equation (1) assumed the probability of death to be uniformly
distributed within the age interval [x, x + 1) for every age x. This implies that the expected
lifespan of someone dying at age x equals x + 1/2. Consequently, we can recursively define the cohort life expectancy for age x and year t as the probability of dying at age x in year
t times x + 1/2, plus the probability of surviving times the life expectancy for age x + 1 in year t + 1, i.e.

\mathring{e}_x(t) = \hat{q}_x(t) \cdot \left(x + \tfrac{1}{2}\right) + \hat{p}_x(t) \cdot \mathring{e}_{x+1}(t + 1),

where the expected mortality rate \hat{q}_x(t) for someone aged x in year t is estimated using 5000
simulations, and \hat{p}_x(t) = 1 - \hat{q}_x(t) denotes the expected survival rate for someone aged x in
year t. Plugging in age x = 67 and year t = 2024 leads to \mathring{e}_{67}(2024) \approx 19.70. This value exceeds the expected lifetimes mentioned in the public debate concerning raising the retirement
age.
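The recursion can be sketched in a few lines. The version below is written in the remaining-lifetime convention: someone dying in [x, x+1) contributes half a year, a survivor contributes one year plus the remaining expectancy at age x+1, and the recursion closes at age 105 with 1.5 years as in the text. For brevity it uses a single illustrative mortality schedule, ignoring the calendar-year dimension of q̂_x(t); the rates are invented, not model estimates.

```python
TERMINAL_AGE = 105
TERMINAL_LIFE_EXPECTANCY = 1.5  # assigned remaining lifetime at age 105

def cohort_life_expectancy(q, x):
    """Remaining life expectancy at age x, given mortality rates q[age].

    Uniform-deaths assumption: a death in [x, x + 1) happens on average
    halfway through the year; a survivor lives one full year plus the
    remaining expectancy at age x + 1.
    """
    if x >= TERMINAL_AGE:
        return TERMINAL_LIFE_EXPECTANCY
    qx = q[x]
    return qx * 0.5 + (1.0 - qx) * (1.0 + cohort_life_expectancy(q, x + 1))

# toy mortality schedule rising roughly exponentially with age (hypothetical)
q = {age: min(0.95, 0.005 * 1.09 ** (age - 67)) for age in range(67, TERMINAL_AGE)}
e67 = cohort_life_expectancy(q, 67)
```

In the thesis the rates additionally shift with the calendar year, so each recursive step would look up q̂ at age x + 1 and year t + 1 instead of a single column of rates.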
When forecasting life expectancy, uncertainty plays an important role as well. To assess how
uncertain an estimate is, confidence intervals can be a very useful tool. In our case there are
two types of uncertainty. The first type is the uncertainty in the estimated parameters α_x, β_x
and κ_t^P. The second type arises from the uncertainty in forecasting κ_t^P from the ARIMA time series. Also, note that life expectancy is a rather complex non-linear function of the parameters
α_x, β_x and κ_t^P. As a consequence, we will use the bootstrap to determine the confidence interval.
The general idea is to sample B realizations for each combination of age and year from the
Poisson distribution with parameter λ_{xt} = D_{xt} = ETR_{xt} · \hat{\mu}_x(t) [12]. In this way there will
be B life tables, each of them generating a new set of parameter estimates for α_x, β_x and κ_t^P.
For each κ_t^P we can simulate an ARIMA(0,1,0) series to obtain future values. It is clear
that this procedure is computationally intensive. Because of the complexity of the model we cannot make B too large, so we take B = 5000.
The 95% confidence interval for \mathring{e}_{67}(2024) then equals [18.81, 20.59]. The mean of all sampled
expected lifetimes equals 19.69, which is close to the expected value. The interval is narrower
than is common in the literature. This can be explained by the fact that this study does not
distinguish between sexes, and for that reason has more data, which narrows the interval. The
histogram of \mathring{e}_{67}(2024) for B = 5000 is shown in figure 7.
Figure 7: Histogram of ˚e67(2024) for B = 5000 samples. Source: the author.
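The percentile interval at the heart of the procedure can be sketched with a toy statistic: resample death counts around an observed value, recompute a mortality rate each time, and read off the empirical 2.5% and 97.5% quantiles. In the actual procedure each bootstrap draw is a whole Poisson-resampled life table and the statistic is ˚e67(2024); here a normal approximation to the Poisson resampling is used (valid because the count is large), and all input numbers are illustrative.

```python
import random

random.seed(42)

def percentile_ci(samples, alpha=0.05):
    """Percentile bootstrap confidence interval from resampled statistics."""
    s = sorted(samples)
    lo = s[int(len(s) * (alpha / 2))]
    hi = s[int(len(s) * (1 - alpha / 2)) - 1]
    return lo, hi

exposure = 10_000.0        # hypothetical exposure-to-risk for one cell
observed_deaths = 150      # hypothetical observed death count

B = 5000
rates = []
for _ in range(B):
    # normal approximation to Poisson(observed_deaths) resampling;
    # the thesis draws from Poisson(lambda_xt) for every life table cell
    d = max(0.0, random.gauss(observed_deaths, observed_deaths ** 0.5))
    rates.append(d / exposure)

ci_lo, ci_hi = percentile_ci(rates)
```

Replacing the single rate by a full refitted Lee-Carter model per draw gives exactly the [18.81, 20.59] style interval reported above, at correspondingly higher computational cost.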
In this chapter we established an ARIMA(0,1,0) time series model for κ_t^P. Subsequently, we
derived an estimate for the confidence interval of the expected lifetime of someone aged 67 in 2024, taking parameter uncertainty and uncertainty in forecasting κ_t into account. To measure
the impact of parameter uncertainty we also estimated the confidence interval without using
the bootstrap framework, by merely simulating 5000 values of κ_t^P, for κ_t^P as estimated in
chapter 2. Subsequently, the confidence interval of the expected lifetime was calculated using
α_x and β_x as estimated in chapter 2. The resulting confidence interval equals [18.83, 20.57], which
is just a factor 0.98 narrower than the confidence interval where parameter uncertainty was
taken into account. A similar pattern holds for the expected lifetime of someone aged 67 in 2015, where the confidence interval that does not take parameter
uncertainty into account is also a factor 0.98 smaller than the confidence interval taking
parameter uncertainty into account. We conclude that parameter uncertainty is negligible
compared to the uncertainty in forecasting κ_t^P in a life expectancy context. In chapter 5
future values of κ_t^P will be simulated to assess longevity risk in hedged and unhedged annuity portfolios. Due to the complexity of the model, parameter uncertainty will not be addressed in chapter 5.
4 Solvency II
Longevity risk is the risk that realized survival rates of policyholders and pensioners exceed
expected survival rates. This can result in higher than expected payout costs for insurance
companies and pension funds, possibly causing solvency problems. In 1973 the basis
of Solvency I was introduced by the European Economic Community, to monitor solvency regulations for insurance companies and pension funds and thus protect their policyholders.
However, in Solvency I the capital requirements did not match the amount of risk the insurance
companies were exposed to. This caused a lack of incentive for insurance companies to
properly manage the risks they faced. Also, insurance companies did not need to be transparent
about their financial position and the risk sensitivity of that position. These shortcomings
of Solvency I led to the new Solvency II agreement, fully enforced by the European Union from
the 1st of January 2016 [7]. The Solvency II agreement does give insurance companies and
pension funds an incentive to hedge longevity risk using longevity-linked securities, since this
lowers the capital requirements. This chapter will describe the structure of the Solvency
II agreements. This will give more insight into how to effectively hedge longevity risk in the fifth chapter, where we will hedge longevity risk using longevity-linked securities in a simple
annuity portfolio.
Solvency II applies to all companies whose gross premium income equals at least 5 million
euro, or whose technical provisions are at least 25 million euro. According to De Nederlandsche
Bank, technical provisions are defined as: "The amount to be held by an insurer on
the balance sheet date in order to settle all existing obligations towards policyholders" [7].
Technical provisions will be revisited later on in this chapter. Solvency II also applies to
insurance companies whose reinsurance activities or activities abroad are non-negligible [7].
Solvency II is often compared to the Basel II framework. The main reason is the similarity
in structure: the Solvency II framework, like the Basel II framework, is built on
a three-pillar concept. Pillar 1 incorporates the quantification of the risks the insurance
companies are exposed to, ultimately leading to capital requirements. The way the capital
requirements are calculated is particularly important for this research, since reducing the
capital requirements is a main objective in hedging longevity risk. Pillar 2 contains the mandatory
requirements for companies, with the ultimate goal of making the insurance and
pension business more transparent. All three pillars will be discussed in greater detail below
[23]. However, Pillar 2 and Pillar 3 will not be used in the remainder of this thesis.
4.1 Pillar 1
Under Pillar 1, Solvency II demands insurers to determine the market value of their balance
sheet and quantify the risks they face, either using a standard model or an internal model.
Insurers have to hold capital based on the amount of risk they are exposed to. The capital
requirements of the first pillar can be explained using figure 8 below.
Figure 8: The Solvency II Balance Sheet. Source: Society of Actuaries In Ireland.
As we can see, the liabilities consist of the capital requirements and the technical provisions.
The technical provisions consist of the best estimate liabilities and a risk margin. The best estimate liabilities can be described as the present value of all expected future cash flows,
discounted with the risk-free interest rate r. The risk-free interest rates are published by the
European Insurance and Occupational Pensions Authority (EIOPA). EIOPA is closely
involved with Solvency II and supervises the whole insurance and pension sector. The risk
margin is defined as the amount of capital another insurer would need to be paid in order
to compensate them for the risk that the best estimate liabilities end up being worse than expected. Also part of
the risk margin is a compensation for the cost of holding capital against these best estimate
liabilities that would be taken on. In terms of formulae, the risk margin is a percentage of the
discounted Solvency Capital Requirements, i.e.
RM = CoC \cdot \sum_{t=0}^{\infty} \frac{SCR_t}{(1 + r_{t+1})^{t+1}},
where CoC is the Cost of Capital rate, set to 6% [23]. The Solvency Capital Requirement
(SCR) will be discussed later on in this section.
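The risk margin formula is a discounted sum that is easy to evaluate once a run-off pattern of future SCRs is projected. A minimal sketch with CoC = 6% as in the text; the SCR projection and the flat 2% rates are illustrative assumptions, not figures from the thesis.

```python
COC = 0.06  # Cost of Capital rate prescribed under Solvency II

def risk_margin(scr_by_year, rates_by_year):
    """RM = CoC * sum_t SCR_t / (1 + r_{t+1})^(t+1).

    `scr_by_year[t]` is the projected SCR_t and `rates_by_year[t]` the
    risk-free rate r_{t+1} used to discount it over t + 1 years.
    """
    rm = 0.0
    for t, scr in enumerate(scr_by_year):
        r = rates_by_year[t]
        rm += scr / (1.0 + r) ** (t + 1)
    return COC * rm

scr_projection = [100.0, 80.0, 60.0, 40.0, 20.0]  # SCR_0 .. SCR_4 (illustrative)
flat_rates = [0.02] * 5                            # flat 2% curve (illustrative)
rm = risk_margin(scr_projection, flat_rates)
```

In practice the infinite sum is truncated once the projected SCR has run off to zero, as the finite projection here implicitly does.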
As mentioned in the beginning of this chapter, the main purpose of Pillar 1 is for companies
to quantify their risks by calculating their capital requirements. The Solvency Capital Requirement
(SCR) is the amount of capital an insurance company needs to hold in order to avoid
increasing supervisory interventions from EIOPA. There are two ways to calculate the
SCR: by means of a standard formula or by an internal model. We start by discussing the standard formula.
4.1.1 Standard formula
The standard formula is the method suggested by Solvency II itself. First we define the
Net Asset Value (NAV) at time t as

NAV_t := A_t - BEL_t, \qquad (12)

where A_t is the market value of the assets at time t and BEL_t is the best estimate liabilities
at time t. In the standard formula the SCR for longevity risk can now be calculated as

SCR_{long}^{Shock}(t) = NAV_0 - (NAV_0 \mid \text{Longevity Shock}), \qquad (13)
where a longevity shock entails a 20% decrease of the mortality rates for every age [31]. EIOPA
provides a valid approximation of this value, under the condition that the nature,
complexity and scale of the risk make this proportionate. Companies that want to use this approximation
need to perform an assessment of the risk to prove this. In this assessment any deviations
need to be justified as well [1]. The simplification provided by EIOPA is the following:

SCR_{long} = 0.2 \cdot q \cdot n \cdot 1.1^{\frac{n-1}{2}} \cdot BEL_t. \qquad (14)
In this formula q denotes the expected average mortality rate of all insured persons, weighted
by the insured sum. Secondly, n denotes the modified duration in years of the payments to beneficiaries
included in the best estimate, i.e.

n = \frac{1}{1 + r} \cdot \sum_{t=1}^{\infty} \frac{t \cdot PVCF_t}{PVTCF},

where r denotes the current risk-free interest rate, PVCF_t denotes the present value of the
cash flow at time t and PVTCF denotes the present value of the total cash flow. In equation
(14) the BEL_t term indicates the best estimate liabilities that are subject to longevity risk [1].
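Equation (14) and the modified duration n can be evaluated directly. The sketch below uses a level 10-year cash flow pattern and illustrative values for q, r and BEL; none of these numbers come from the thesis.

```python
def modified_duration(cash_flows, r):
    """n = (1/(1+r)) * sum_t t * PVCF_t / PVTCF, with t starting at 1."""
    pvcf = [cf / (1.0 + r) ** t for t, cf in enumerate(cash_flows, start=1)]
    pvtcf = sum(pvcf)
    weighted = sum(t * pv for t, pv in enumerate(pvcf, start=1))
    return weighted / pvtcf / (1.0 + r)

def scr_longevity(q, n, bel):
    """EIOPA simplification: SCR_long = 0.2 * q * n * 1.1^((n-1)/2) * BEL."""
    return 0.2 * q * n * 1.1 ** ((n - 1) / 2.0) * bel

cash_flows = [100.0] * 10  # level annuity payments over 10 years (illustrative)
r = 0.02                   # illustrative risk-free rate
n = modified_duration(cash_flows, r)
scr = scr_longevity(q=0.01, n=n, bel=1000.0)  # q and BEL illustrative
```

The 1.1^((n−1)/2) factor makes the charge grow with duration, reflecting that long-dated liabilities are more sensitive to a permanent mortality shock.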
The overall Solvency Capital Requirement can be calculated by aggregating the individual
Solvency Capital Requirements using the scheme in figure 9.

Figure 9: Solvency II SCR Risk Scheme. Source: Manolache, A.E.D. [29].

In this chart the various risks that insurance companies and pension funds are exposed to are
depicted. In this thesis we will only focus on the longevity risk within life risk². For life risk the overall SCR_{Life} can be
calculated using the following formula:

SCR_{Life} = \sqrt{\sum_{i,j} CorrL_{i,j} \cdot SCR_i \cdot SCR_j},
where CorrLi,j can be obtained from the correlation table in figure 10.
Figure 10: Life Risk correlation scheme. Source: European Union [1].
The Basic Solvency Capital Requirement (BSCR) can be calculated in a similar fashion using

BSCR = \sqrt{\sum_{i,j} CorrB_{i,j} \cdot SCR_i \cdot SCR_j},

where CorrB_{i,j} can be obtained from the correlation table in figure 11 [1].
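Both aggregation formulas have the same form, a correlation-weighted quadratic sum under a square root. A minimal sketch with two sub-modules and an illustrative correlation of 0.25 (not the Solvency II correlation tables); note that any correlation below 1 yields an aggregate SCR below the plain sum, which is the diversification benefit the formula captures.

```python
import math

def aggregate_scr(scrs, corr):
    """SCR = sqrt( sum_ij Corr_ij * SCR_i * SCR_j ).

    `scrs` lists the sub-module SCRs and `corr` is the matching
    correlation matrix, given as a list of rows.
    """
    total = 0.0
    for i, si in enumerate(scrs):
        for j, sj in enumerate(scrs):
            total += corr[i][j] * si * sj
    return math.sqrt(total)

scrs = [100.0, 50.0]                     # two sub-module SCRs (illustrative)
corr = [[1.0, 0.25], [0.25, 1.0]]        # illustrative correlation matrix
agg = aggregate_scr(scrs, corr)          # below the undiversified sum of 150
```

With full correlation (all entries 1) the formula collapses to the plain sum of the sub-module SCRs, so the square-root aggregation never rewards risks that move perfectly together.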
4.1.2 Internal model
The EU and EIOPA also offer insurance companies the opportunity to use an internal
model. This encourages companies to hedge the risks they are exposed to, in order to decrease
the amount of capital they have to hold. In the internal model the Solvency Capital
Requirement is defined as the amount of capital that would be needed to overcome all losses
that could occur in a one-year time frame with probability 0.995. Specifically for longevity
²In this thesis we focus on hedging longevity risk in an annuity portfolio. Annuity portfolios are not directly
Figure 11: BSCR Risk correlation scheme. Source: European Union [1].
risk this boils down to:

SCR_{long}^{VaR}(t) = \mathrm{argmin}_x \; P\!\left( NAV_t - \frac{NAV_{t+1}}{1 + r_{t+1}} > x \right) \leq 0.005, \qquad (15)

where r_t is the annual risk-free interest rate at time 0 for maturity t.
The Minimum Capital Requirement (MCR) is the strict minimum amount of capital an
insurance company needs to reserve. Below this level companies would lose their license. The
MCR is part of the SCR and is defined as

MCR_{long}^{VaR}(t) = \mathrm{argmin}_x \; P\!\left( NAV_t - \frac{NAV_{t+1}}{1 + r_{t+1}} > x \right) \leq 0.15,

but it is always larger than 25% of the SCR and always smaller than 45% of the SCR. The last constraint is that the MCR is always at least €3.7 million for life insurance companies [1].
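In practice the internal-model SCR and MCR are read off as empirical quantiles of simulated one-year discounted losses. The sketch below uses a placeholder normal loss distribution rather than the annuity portfolio model of chapter 5; the quantile levels 0.995 and 0.85 match the definitions above.

```python
import random

random.seed(7)

def var_quantile(losses, level):
    """Smallest x with P(loss > x) <= 1 - level, from simulated losses."""
    s = sorted(losses)
    idx = min(int(level * len(s)), len(s) - 1)
    return s[idx]

# placeholder one-year discounted NAV losses; in chapter 5 these would come
# from simulating kappa_t^P through the annuity portfolio
losses = [random.gauss(0.0, 10.0) for _ in range(100_000)]

scr = var_quantile(losses, 0.995)  # 99.5% VaR, as in equation (15)
mcr = var_quantile(losses, 0.85)   # 85% VaR, as in the MCR definition
```

Because the MCR sits at a much less extreme quantile, it is always well below the SCR for any reasonable loss distribution, consistent with the 25%–45% corridor above.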
4.2 Pillar 2
In Solvency II, Pillar 2 describes the supervisory review process, the governance requirements
and the responsibilities of some key functions within the insurance business. Every company that
Solvency II applies to is required to have a risk management team, an actuarial team, a
compliance team and an internal audit team. The organizational structure of these teams
is part of the governance requirements. Pillar 2 also requires every company to perform an Own Risk and Solvency Assessment
(ORSA) once every year. EIOPA defines ORSA as: "The entirety of the processes and
procedures employed to identify, assess, monitor, manage and report the short and long term
risks an insurance undertaking faces or may face and to determine the own funds necessary
to ensure that the undertaking's overall solvency needs are met at all times" [16]. What
distinguishes ORSA from the capital requirements mentioned in Pillar 1 is that the SCR and
the MCR refer to regulatory requirements, whereas ORSA is defined to be the whole process of being aware of, and reacting to, the risks the company is exposed to. Also, ORSA is unique to every
company, whereas the calculations of the SCR and MCR are likely to be more similar across companies, since these
processes are subject to the same constraints. Lastly, Pillar 1 is limited to a one-year time
horizon, while ORSA focuses on long-term business as well. ORSA even requires a company to
quantify its ability to meet the Pillar 1 capital requirements in the upcoming years [23].
4.3 Pillar 3
Lastly, Pillar 3 of the Solvency II framework contains disclosure requirements intended to enhance
transparency. The results of the capital requirement calculations in Pillar 1 and of review processes like
ORSA in Pillar 2 need to be disclosed in reports to EIOPA supervisors. Elements of these
reports that are not considered confidential need to be disclosed in an annually produced public Solvency and Financial Condition Report (SFCR). The main goal of these disclosure
requirements is to obtain more market discipline through transparency. Ultimately this will
decrease systemic risk in the whole sector as well, since everyone is more aware of the risks that
companies are exposed to [22]. Pillar 3 plays no important role in the remainder of this thesis.

The Solvency Capital Requirement gives insurance companies an incentive to hedge their
risks in order to minimize the amount of capital they need to reserve. Also, ORSA requires
insurance companies and pension funds that are exposed to longevity risk to identify, assess,
monitor, manage and report longevity risk. The last chapter of this thesis will present longevity-linked securities that can be used to manage longevity risk, and that are therefore of direct interest to these companies.
5 Hedging longevity risk
In the fourth chapter we introduced the capital requirements for longevity risk. The
capital requirements are designed to buffer against big losses that could occur in unfortunate
circumstances. The capital requirements decrease when the amount of longevity risk
a pension fund is exposed to decreases. This gives a pension fund an additional incentive
to hedge the longevity risk it faces, besides the incentive to avoid unnecessary tail risk in
general. In this chapter we will test the hedge effectiveness of longevity-linked securities in
a basic annuity portfolio. The annuitants in this portfolio are 67 years old. The results in
this chapter will show that in a simple unhedged pension annuity setting the 99.5% Value-at-Risk
of the loss can be roughly ten times larger than the 99.5% Value-at-Risk of the loss
for a portfolio completely hedged with a longevity-linked security. This raises two questions:
how should longevity risk be hedged, and what percentage of the face value should be hedged?
In this chapter the Survivor Forward, the Survivor Swap and the European Survivor Call
will be discussed, priced and tested for effectiveness in hedging longevity risk. A survivor forward is a financial instrument agreed upon at time t, that obliges the holder to exchange a
predetermined fixed amount, based on the expected survival rate, for a floating amount based
on the realized survival rate at maturity T. The survivor swap works similarly, but where
the survivor forward limits itself to only one payment date T, the survivor swap provides
multiple payment dates t + 1, t + 2, . . . , T [10]. The structure of these derivatives is well known
from hedging interest rate risk, and can potentially be effective in hedging longevity risk as well.
A loss in a hedged portfolio exposed to longevity risk is compensated by a positive cash flow
from the derivatives, while a loss on the derivatives themselves due to lower than expected survival
rates is compensated by lower payouts in the portfolio that was exposed to longevity risk.
The last derivative that will be discussed is the European survivor option, in particular the
European survivor call. The holder of the European survivor call has the right but not the
obligation to exchange a fixed amount based on the expected survival rates for the floating
amount based on the realized survival rates. The holder of this derivative will never exercise
the option if the realized survival rates are smaller than the expected survival rates, implying
that the potential loss is bounded by just the price paid for the derivative.
Mortality rates are not a continuously traded asset in the market. This means that longevity-linked
securities cannot be replicated in the market, implying that the market for
longevity-linked securities is incomplete [10]. Therefore the derivatives will be priced
by incorporating a market price of longevity risk λ into the model. Recall that
in the original model κ_t^P was modelled as an ARIMA(0,1,0) time series with \epsilon_t^P \sim N(0, 5.822),
as can be found in equation (11). The risk premium λ will be incorporated into this model by distorting \epsilon_t^P in such a way that it is N(-\lambda \cdot \sqrt{5.822}, \, 5.822) distributed. This will decrease
κ_t^P, representing a longevity shock. In this thesis prices for the derivatives will be calculated for
λ ∈ {0.10, 0.15, 0.20, 0.25, 0.30}. These values of λ are common in the literature and resemble
the market prices of longevity risk that are applied in the Dutch insurance market [40].
Since we have mortality data up to and including 2014, we take the risk-free interest rates
from 31 December 2014 and use this day as the issue date for the longevity-linked securities.
We use the risk-free interest rates provided by the U.S. Department of the Treasury [39], since
EIOPA did not yet provide risk-free interest rates in 2014. The risk-free interest rates
for maturities 1, 2, 3 and 5 years are obtained directly from the data; the 4-year rate is obtained by interpolation. The values are depicted in table 1.
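The risk-adjusted simulation described above amounts to shifting the mean of the ARIMA innovations by −λ·√5.822. A minimal sketch, using the drift and variance from equation (11); the starting value of κ and the number of simulated paths are illustrative assumptions.

```python
import math
import random

random.seed(1)

C = -1.8631     # drift from equation (11)
SIGMA2 = 5.822  # innovation variance from equation (11)

def simulate_kappa(kappa0, horizon, lam):
    """Terminal value of one risk-adjusted kappa path.

    The innovations are drawn from the distorted distribution
    N(-lam * sqrt(SIGMA2), SIGMA2); lam is the market price of risk.
    """
    sd = math.sqrt(SIGMA2)
    kappa = kappa0
    for _ in range(horizon):
        eps = random.gauss(-lam * sd, sd)  # distorted innovation
        kappa += C + eps
    return kappa

KAPPA0 = -40.0  # hypothetical last observed value of kappa_t^P

# average terminal value over many paths: a positive lambda pushes kappa
# down faster, i.e. mortality improves faster (a longevity shock)
mean_base = sum(simulate_kappa(KAPPA0, 30, 0.0) for _ in range(500)) / 500
mean_shocked = sum(simulate_kappa(KAPPA0, 30, 0.25) for _ in range(500)) / 500
```

Pricing a survivor forward or swap then amounts to converting such distorted κ paths into survival probabilities and discounting the resulting cash flows at the rates in table 1.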
5.1 Distortion operators
In this section we introduce the concept of distortion operators. A distortion operator is a
function g : [0, 1] → [0, 1] such that g is increasing with g(0) = 0 and g(1) = 1. Such a function
g transforms a given distribution into a new, distorted distribution [40]. Consider the risk-adjusted
distribution

F_{\tilde{X}}(x) = \Phi\!\left(\Phi^{-1}(F_X(x)) + \lambda\right), \qquad (16)

where F_X(x) is the cumulative distribution function of a random variable X, Φ is the cumulative
distribution function of the standard normal distribution and λ is the risk premium, which indicates the market price of longevity risk. Now define Wang's operator