Pricing and Hedge Effectiveness of
Longevity-Linked Securities
Brian Möllenkamp
July 1, 2020
A Thesis presented for the Degree of
Master of Science in Econometrics, Operations Research and Actuarial Studies
Faculty of Economics and Business
University of Groningen
The Netherlands
Master’s Thesis Econometrics, Operations Research and Actuarial Studies
Supervisor: dr. J. Alonso-García
Contents
1 Introduction
  1.1 Structure of this research
  1.2 Literature review
  1.3 Aim of this research
  1.4 Main findings
2 Lee-Carter model
  2.1 Least squares estimation
    2.1.1 Singular value decomposition
    2.1.2 Newton-Raphson method
  2.2 Poisson estimation
  2.3 Data
  2.4 Estimation results
  2.5 Quantitative comparison of least squares with the Poisson assumption
3 Forecasting death rates and life expectancy
  3.1 Properties of an ARIMA(p,d,q) model
  3.2 ARIMA model choice
  3.3 Bootstrapping life expectancy
4 Solvency II
  4.1 Pillar 1
    4.1.1 Standard formula
    4.1.2 Internal model
  4.2 Pillar 2
  4.3 Pillar 3
5 Hedging longevity risk
  5.1 Distortion operators
  5.2 Pricing longevity-linked securities
    5.2.1 Survivor Forward
    5.2.2 Survivor Swap
  5.3 Risk measures
  5.4 Hedge effectiveness of longevity-linked securities
    5.4.1 Short-term longevity risk management
    5.4.2 Long-term longevity risk management
6 Conclusion and discussion
1 Introduction
The first notion of a formal public pension system was designed in 1881 by Otto von Bismarck in Germany. The pension system ensured that the few workers who reached age 70 could retire and still receive an income funded by the working class [9]. Since then, life expectancy at birth in Germany has more than doubled [30]. At first sight this seems nothing but good news, resulting from, among other things, better sanitation, education and health care. However, when improvements in life expectancy outpace the predictions made by pension funds and insurance companies, they may result in higher-than-expected outbound cash flows for these companies. This is called longevity risk, and it plays a substantial role in risk management in the insurance and pension business. A 2011 report by the European Insurance and Occupational Pensions Authority noted that the life risk module accounted for 24% of the basic capital requirements of insurance companies and pension funds, and that longevity risk is a main risk driver of life risk [21]. The primary goal of this thesis is to assess and ultimately compare financial instruments that mitigate longevity risk in a simple annuity portfolio.
1.1 Structure of this research
One of the most important elements in assessing longevity risk is the mortality model. In 1992 Ronald D. Lee and Lawrence Carter proposed a fairly simple mortality model, dependent on two age parameters αx and βx, and only one time series parameter κt [25]. These parameters can be estimated using the singular value decomposition or numerical algorithms like the Newton-Raphson method. Various distributional assumptions in the Lee-Carter model lead to different models. In this thesis we will first discuss the classical Lee-Carter model using least squares estimation, as proposed by Lee and Carter themselves [25]. This model assumes the error terms εx(t) to be N(0, σ²) distributed for all ages x and years t. This is a very strong assumption that may not hold for most mortality data sets; in particular, the assumption that the variance of the error terms is independent of age is dubious. The second model discussed in this research does not have that problem. The model is based on a paper of Brouhns et al. [13]: the number of deaths per age and year is assumed to be Poisson distributed, and the parameters are then estimated numerically by maximizing the corresponding likelihood. Both models are fitted to Dutch mortality data from 1965 to 2014. Ultimately, both models will be compared using graphical and statistical tests, and the model fitting the data best will be used in the remainder of the thesis.
The estimates of the time series parameter κt can be fitted with an ARIMA model to predict future mortality rates and life expectancy. Determining the ARIMA model that best fits κt is a key aspect of this research, since later on the effectiveness of longevity-linked securities will be estimated using simulated future values of κt. Therefore, the process of determining the ARIMA model for κt will be described comprehensively in this thesis.
Once an accurate model for forecasting future mortality rates has been built, it can be used to quantify the amount of longevity risk a portfolio is exposed to. The Solvency II framework imposes clear demands, in the form of capital requirements, on insurance companies and pension funds to protect them against longevity risk [1]. These capital requirements can represent a substantial amount on the balance sheet of insurance companies, which gives them an incentive to hedge the longevity risk in their portfolio in order to reduce the capital requirements. In this thesis we will extensively explain the Solvency II framework and the role of longevity risk within it.
In the fifth chapter all previous theory will be combined to assess and mitigate longevity risk. A risk premium λ, denoting the market price of longevity risk, will be incorporated into the ARIMA model for κt using Wang's distortion operator. Subsequently, the survivor forward, the survivor swap and the European survivor call will be priced for different maturities and values of λ. Ultimately, the hedge effectiveness of the longevity-linked securities will be determined based on the mean, standard deviation, Value-at-Risk and Expected Shortfall of simulated losses in hedged portfolios.
1.2 Literature review
This section provides a review of recent literature in the field of longevity risk. It is crucial to assess the economic impact and responsibilities of longevity risk, in particular who should bear longevity risk [28]. MacMinn et al. argue that governments could be most suitable to sell longevity-linked securities, since governments could best deal with the problem of adverse selection. They elaborate on the suitability of the government by discussing the Arrow-Lind theorem, which states that the risk premium of a financial instrument issued by the government would be approximately zero [4], because the risk is dispersed over a large population. This can be countered by the insight that a portion of longevity risk is not diversifiable. They continue by noting that the question of who should bear longevity risk depends on whether longevity risk is seen as a risk or as uncertainty, where uncertainty is defined as randomness with unknown probability distributions, as viewed by F.H. Knight in [24]. In line with E. Stallard they argue that longevity risk is quantifiable and hence a risk, and that further methodological development will improve the accuracy of future forecasts [38].
Successful hedging might diversify away a part of longevity risk, but will at the same time introduce basis risk [17]. Coughlan et al. provide a framework for understanding longevity basis risk, calibrating longevity-linked financial instruments and evaluating hedge effectiveness. Basis risk arises when there is a mismatch between the hedged portfolio and the hedging instrument. The appearance of longevity basis risk indicates that the instrument used for hedging cannot mitigate all longevity risk a portfolio is exposed to, and it also implies that some hedging instruments could be more effective than others. This hedge effectiveness depends on the underlying hedging objectives. In their paper they applied the framework to a case study of British mortality data. The results indicated that longevity basis risk between a pension plan or annuity portfolio and a hedging instrument linked to a national population-based longevity index can be reduced considerably using this framework. Our methodological framework is heavily inspired by theirs.
Understanding how longevity risk can be decomposed into different types of risk provides a deeper understanding of longevity risk [15]. Cairns et al. perform a case study of a pension plan that wants to hedge its longevity risk by reducing the standard deviation of its cash flow. They show how correlation and hedge effectiveness can be decomposed into contributions by distinct types of risk, the main ones being basis risk, recalibration risk and Poisson risk. The paper uses the framework provided by Coughlan et al., discussed in the previous paragraph. The main conclusion of Cairns et al. is that index hedge strategies are an effective and less costly alternative for hedging longevity risk compared to customised longevity hedges. They also note that ignoring some of these risk components might underestimate not only longevity risk, but also the hedge effectiveness of index-based hedging strategies. Although their research was limited to the objective of minimizing the standard deviation of the portfolio, they believe that similar results will hold for other risk metrics.
Denuit et al. describe a framework to price survivor bonds using Wang's transform in the Lee-Carter framework [18]. They design a survivor bond that could be used to hedge longevity risk, and their study finds that this bond could be an interesting instrument for diversifying longevity risk. In this thesis, Wang's transform is likewise used to price the longevity-linked securities under consideration.
1.3 Aim of this research
The purpose of this study is to determine the price and hedge effectiveness of the survivor forward, the survivor swap and the European survivor call for several maturities and market prices of longevity risk. In particular, the main objective is to find the derivative that most effectively hedges a portfolio exposed to longevity risk, based on the mean, standard deviation, 99.5% Value-at-Risk and 99.5% Expected Shortfall of simulated losses. I expect the forward and the swap to do well, since they are the most common derivatives used for hedging other risks such as credit risk and interest rate risk [8].
This thesis will contribute to the literature by presenting the longevity risk hedge effectiveness of these longevity-linked securities in a simple pension annuity portfolio under a Poisson distributed Lee-Carter mortality model. The results can be valuable for insurance companies and pension funds that want to assess, regulate and mitigate the longevity risk they are exposed to in their liabilities. In light of the paper by Cairns et al., a limitation of this research is that it does not specifically address parameter uncertainty in the hedging framework; this could cause longevity risk and hedge effectiveness to be significantly underestimated [15].
1.4 Main findings
In the second chapter we find that the homoskedasticity assumption on the error terms in the least squares estimated model does not hold for the Dutch mortality data. On top of that, analysis of the residuals also suggests that the least squares estimated model is a bad fit, while the residual analysis of the Poisson estimated model concludes that the Poisson estimated model is more suitable for modelling the Dutch mortality rates.
In the third chapter an ARIMA(0,1,0) model is shown to be the best fit to the κt time series obtained from the Poisson estimated model. The formula describing the time series κt is
\[
\kappa_t - \kappa_{t-1} = -1.8631 + \epsilon_t,
\]
with εt ∼ N(0, 5.877). This formula is used to simulate future realized survivor rates in the process of testing the hedge effectiveness.
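The random walk with drift above is straightforward to simulate. The sketch below reads the 5.877 in N(0, 5.877) as the innovation variance and uses a made-up last fitted value of −60 for κ; both choices are illustrative assumptions, not figures taken from the thesis.

```python
import numpy as np

# Sketch: simulate future kappa_t paths from the fitted ARIMA(0,1,0) model
# kappa_t - kappa_{t-1} = -1.8631 + eps_t, eps_t ~ N(0, 5.877).
# kappa_last = -60.0 is a hypothetical last fitted value, for illustration only.

def simulate_kappa(kappa_last, horizon, n_paths, drift=-1.8631, var=5.877, seed=0):
    rng = np.random.default_rng(seed)
    eps = rng.normal(0.0, np.sqrt(var), size=(n_paths, horizon))
    # cumulative sum of the drifted increments gives the simulated paths
    return kappa_last + np.cumsum(drift + eps, axis=1)

paths = simulate_kappa(kappa_last=-60.0, horizon=30, n_paths=10_000)
print(paths.shape)          # (10000, 30)
print(paths[:, -1].mean())  # close to -60 + 30 * (-1.8631) = -115.893
```

Each simulated κ path translates into a path of future death rates through the central Lee-Carter equation, which is how the realized survivor rates are generated later on.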
Finally, in the fifth chapter we find that the European survivor call is the most effective derivative for hedging longevity risk in a basic annuity portfolio. This finding is based on the result that it is less costly to decrease the Value-at-Risk and Expected Shortfall of losses in a portfolio hedged with European survivor calls than in a portfolio hedged with the survivor forward or the survivor swap.
2 Lee-Carter model
In 1992 Lawrence Carter and Ronald D. Lee introduced the Lee-Carter model, a fairly simple model expressing the change in mortality over time for a fixed age as a function of only one time dependent parameter [25]. The original paper proposes a least squares model, which relies on the assumption of constant variance across all ages. More recently, another model was proposed by Brouhns et al. [13], which allows the variance to increase with age. In this model the number of deaths is Poisson distributed, while the death rate mx(t) is still modelled according to the central equation of the Lee-Carter model. These approaches to modelling mortality data differ in the procedures for estimating the parameters. In this chapter, both models will be fitted to Dutch mortality data and the results will be tested on goodness-of-fit. The model that fits the data best will be used to simulate future survival rates to quantify longevity risk in chapter 5.
A strong alternative to the Lee-Carter model is the Cairns-Blake-Dowd model, proposed in 2006 [14]. This model uses two time parameters, one more than the Lee-Carter model. The extra time parameter allows the model to describe imperfect correlation in mortality rates across various ages for different years in a more precise fashion [35]. In general, Lee-Carter often describes mortality rates better for lower ages (younger than 60), and Cairns-Blake-Dowd describes mortality rates better for older ages (age 60 and older), which makes the Cairns-Blake-Dowd model a useful model for pension funds and life insurance companies. However, a comparative study by C. Maccheroni and S. Nocito concluded that for Italian data from 1975 to 2014 the Lee-Carter model performed better on the complete data set, and the Cairns-Blake-Dowd model only produced reliable forecasts for ages 75 and older [27]. In this thesis we also use Western-European mortality data from roughly the same period. In addition to Lee-Carter being the simpler model, this makes it the preferred model for this thesis. We will not discuss Cairns-Blake-Dowd any further, but refer the interested reader to the original paper in which the model is proposed [14].
Before focusing on the estimation of the model parameters, a few essential definitions and assumptions used throughout this thesis will be discussed. The first is the survival function S0,t(x), the probability for an individual born in year t to survive from birth to at least age x. Now define the force of mortality in terms of S0,t as
\[
\mu_x(t) \;=\; \frac{1}{S_{0,t}(x)} \lim_{dx \to 0^+} \frac{S_{0,t}(x) - S_{0,t}(x + dx)}{dx} \;=\; -\frac{1}{S_{0,t}(x)} \frac{d}{dx} S_{0,t}(x) \;=\; \frac{f_{0,t}(x)}{1 - F_{0,t}(x)},
\]
where we use that −(d/dx) S0,t(x) = f0,t(x) [19].
It is challenging to obtain accurate values for µx(t), since it is a quotient of distributions that are unknown. The distribution is based on the continuous random variable lifetime, while mortality data record age at death only as integers. The probability of death for age x and year t derived from these data is called the death rate, denoted mx(t). In the literature it is very common to assume that the death rate mx(t) equals the force of mortality µx(t). This can be achieved by assuming that the force of mortality is constant within unit intervals of time and age for every age x, i.e.
\[
\mu_{x+\xi_1}(t + \xi_2) = \mu_x(t), \qquad 0 \le \xi_1, \xi_2 < 1, \tag{1}
\]
see [35]. In the remainder of this thesis the one-year survival probability for someone of age x in year t will be denoted as Sx,t.
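For later use it is worth making explicit what assumption (1) implies for this one-year survival probability: integrating the (constant) force of mortality over one year gives the standard identity

\[
S_{x,t} \;=\; \exp\!\left(-\int_0^1 \mu_{x+s}(t+s)\,ds\right) \;=\; e^{-m_x(t)}.
\]

This link between the death rate and the survival probability is what allows simulated death rates to be turned into simulated survivor rates later on.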
After imposing this assumption we can discuss our model of interest. The log-bilinear form of the Lee-Carter model is defined as follows:
\[
\log m_x(t) = \alpha_x + \beta_x \cdot \kappa_t, \tag{2}
\]
where αx is an age dependent sequence and βx is an age dependent sequence interacting with the time dependent parameter κt [25]. Once the values of αx, βx and κt have been estimated from the death rate data, an autoregressive integrated moving average (ARIMA) model can be fitted to the κt's. This ARIMA model will help us forecast future κt's and therefore predict future death rates and life expectancy, as discussed in chapter 3. The remainder of this chapter deals with the least squares estimation of αx, βx and κt in section 2.1 and with the Poisson estimation of these parameters in section 2.2. Section 2.3 discusses the data and the required manipulations, and the estimation results are presented in section 2.4.
2.1 Least squares estimation
The classical way to estimate the values of αx, βx and κt is by using the least squares assumption, i.e.
\[
\log \hat m_x(t) = \alpha_x + \beta_x \cdot \kappa_t + \epsilon_x(t), \tag{3}
\]
for x = x1, ..., xm and t = t1, ..., tn [25]. An important note here is that the error terms εx(t) are assumed to be normally distributed with mean 0 and variance σ². In particular this means that the error terms are assumed to be homoskedastic; in other words, the variance of the error terms does not depend on age x. Whether this is a reasonable assumption will be discussed in section 2.5.
Note that the proposed model has identifiability issues. For
\[
\log m_x(t) = \tilde\alpha_x + \tilde\beta_x \cdot \tilde\kappa_t,
\]
with α̃x = αx − p · βx, β̃x = βx/q and κ̃t = q · (κt + p), we get exactly the same model as with αx, βx and κt, for every p, q ∈ R with q ≠ 0. This leads to an infinite number of equivalent maxima in the likelihood corresponding to the model, all of which generate equal forecasts. There is no point in complicating the model this way, so two constraints need to be imposed to ensure identification. The choice of these constraints is rather subjective; in the literature it is common to use the constraints shown in equation (4) [25]:
\[
\sum_{t=t_1}^{t_n} \kappa_t = 0, \qquad \sum_{x=x_1}^{x_m} \beta_x = 1. \tag{4}
\]
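The identifiability issue itself is easy to verify numerically. The following sketch, with arbitrary illustrative parameter values, checks that the reparameterization with p and q (q ≠ 0) leaves log mx(t) unchanged:

```python
import numpy as np

# Numerical check of the identifiability issue: the reparameterization
# (alpha - p*beta, beta/q, q*(kappa + p)) gives the same log m_x(t) for
# any p and any q != 0. All parameter values below are arbitrary.

rng = np.random.default_rng(6)
alpha = rng.normal(size=5)
beta = rng.normal(size=5)
kappa = rng.normal(size=7)
p, q = 3.0, -2.0

logm = alpha[:, None] + beta[:, None] * kappa[None, :]
logm_re = ((alpha - p * beta)[:, None]
           + (beta / q)[:, None] * (q * (kappa + p))[None, :])

print(np.allclose(logm, logm_re))  # True: both parameterizations coincide
```

Because the fitted death rates are identical for every (p, q), any choice of constraints that pins down one representative, such as equation (4), is purely a normalization.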
These identifiability constraints will also be used in this thesis. The optimal α̂x, β̂x and κ̂t are the estimates under which the objective function
\[
O_{LS}(\alpha, \beta, \kappa) = \sum_{x=x_1}^{x_m} \sum_{t=t_1}^{t_n} \left( \log \hat m_x(t) - \alpha_x - \beta_x \cdot \kappa_t \right)^2 \tag{5}
\]
is minimized [35]. The exact way to find these values is by computing a singular value decomposition, as suggested by the original Lee-Carter paper [25]. It can also be done numerically using the Newton-Raphson method. We will delve into these two estimation procedures in the following two subsections.
2.1.1 Singular value decomposition
This subsection discusses an exact way to find the α̂x, β̂x and κ̂t that minimize the objective function in equation (5). First the optimal value for α̂x is determined. Setting the derivative of OLS with respect to αx equal to 0, we find
\[
-2 \cdot \sum_{t=t_1}^{t_n} \left( \log \hat m_x(t) - \alpha_x - \beta_x \cdot \kappa_t \right) = 0.
\]
Note that \(\sum_{t=t_1}^{t_n} \kappa_t = 0\) by the constraint in equation (4). Solving for αx yields
\[
\hat\alpha_x = \frac{\sum_{t=t_1}^{t_n} \log \hat m_x(t)}{t_n - t_1 + 1}. \tag{6}
\]
Having estimated α̂x, we create a matrix that represents the bilinear term as follows:
\[
Z = \begin{pmatrix}
\log \hat m_{x_1}(t_1) - \hat\alpha_{x_1} & \cdots & \log \hat m_{x_1}(t_n) - \hat\alpha_{x_1} \\
\vdots & \ddots & \vdots \\
\log \hat m_{x_m}(t_1) - \hat\alpha_{x_m} & \cdots & \log \hat m_{x_m}(t_n) - \hat\alpha_{x_m}
\end{pmatrix}.
\]
The goal is now to minimize the updated objective function
\[
\tilde O_{LS}(\beta, \kappa) = \sum_{x=x_1}^{x_m} \sum_{t=t_1}^{t_n} \left( z_{xt} - \beta_x \cdot \kappa_t \right)^2 \tag{7}
\]
for βx and κt. Now the singular value decomposition is a useful tool. The Eckart-Young-Mirsky theorem states that the best rank k approximation of a matrix Z with singular value decomposition Z = UΣVᵀ equals
\[
Z_k = \sum_{i=1}^{k} \sigma_i u_i v_i^T,
\]
where σi equals the i-th diagonal entry of Σ, and ui and vi equal the i-th column of U and V respectively [20]. Define λ as the largest eigenvalue of ZᵀZ, and define the corresponding eigenvector as u. Furthermore, let v be the eigenvector corresponding to eigenvalue λ in ZZᵀ. Applying the Eckart-Young-Mirsky theorem for k = 1 leads to
\[
Z \approx Z_1 = \sqrt{\lambda}\, v u^T
\]
being the best rank 1 approximation of Z. Hence,
\[
\hat\beta = \frac{v}{\sum_{j=1}^{x_m - x_1 + 1} v_j}, \qquad \hat\kappa = \sqrt{\lambda} \left( \sum_{j=1}^{x_m - x_1 + 1} v_j \right) u \tag{8}
\]
are the values of βx and κt minimizing the objective function in equation (7) [25].
It is clear that β̂x satisfies the identifiability constraint in equation (4). For κt it is less obvious, but it satisfies the constraint as well [35]; showing why this holds is beyond the scope of this thesis.
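The SVD recipe of equations (6) and (8) can be sketched in a few lines of numpy. The matrix of log death rates below is simulated purely for illustration; it is not the Dutch data used in this thesis.

```python
import numpy as np

# Minimal sketch of the least squares Lee-Carter fit via SVD, on synthetic data.

rng = np.random.default_rng(1)
ages, years = 80, 50
alpha_true = np.linspace(-8.0, -1.0, ages)
beta_true = np.full(ages, 1.0 / ages)            # satisfies sum(beta) = 1
kappa_true = np.linspace(25.0, -25.0, years)     # satisfies sum(kappa) = 0
logm = (alpha_true[:, None] + beta_true[:, None] * kappa_true[None, :]
        + rng.normal(0.0, 0.01, (ages, years)))

alpha_hat = logm.mean(axis=1)                    # equation (6): row averages
Z = logm - alpha_hat[:, None]                    # the bilinear-term matrix
U, s, Vt = np.linalg.svd(Z, full_matrices=False) # Z = U diag(s) V^T
v = U[:, 0]                                      # age-side eigenvector of Z Z^T
u = Vt[0, :]                                     # year-side eigenvector of Z^T Z
sigma1 = s[0]                                    # sigma1 = sqrt(lambda)
beta_hat = v / v.sum()                           # equation (8)
kappa_hat = sigma1 * v.sum() * u                 # equation (8)

print(abs(beta_hat.sum() - 1.0), abs(kappa_hat.sum()))  # constraints (4) hold
```

Note that dividing and multiplying by the sum of v makes the result invariant to the arbitrary sign of the singular vectors, and that κ̂ sums to (numerically) zero automatically because the rows of Z are centered.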
2.1.2 Newton-Raphson method
The second way of determining the values of αx, βx and κt that minimize objective function (5) is by using a numerical algorithm named after Isaac Newton and Joseph Raphson [13]. Their algorithm is a very useful tool for finding the roots of a twice differentiable function f : R → R. It starts with a first guess x0, which is updated iteratively according to
\[
x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)},
\]
for f the function of interest and f′ its first derivative with respect to x. This process is repeated until |x_{n+1} − x_n| falls below a predetermined, sufficiently small tolerance.
In our case we perform a Newton-Raphson step for each of αx, βx and κt within every iteration. The function f in this case equals
\[
f(\theta) = \frac{\partial O(\alpha, \beta, \kappa)}{\partial \theta},
\]
for θ the parameter of choice. This leads to the following updating scheme:
\[
\hat\theta^{(k+1)} = \hat\theta^{(k)} - \frac{\partial O^{(k)}(\alpha, \beta, \kappa)/\partial \theta}{\partial^2 O^{(k)}(\alpha, \beta, \kappa)/\partial \theta^2},
\]
with
\[
\frac{\partial O(\alpha, \beta, \kappa)}{\partial \alpha_x} = -2 \cdot \sum_{t=t_1}^{t_n} \left( \log \hat m_x(t) - \alpha_x - \beta_x \cdot \kappa_t \right),
\]
\[
\frac{\partial O(\alpha, \beta, \kappa)}{\partial \kappa_t} = -2 \cdot \sum_{x=x_1}^{x_m} \beta_x \cdot \left( \log \hat m_x(t) - \alpha_x - \beta_x \cdot \kappa_t \right),
\]
\[
\frac{\partial O(\alpha, \beta, \kappa)}{\partial \beta_x} = -2 \cdot \sum_{t=t_1}^{t_n} \kappa_t \cdot \left( \log \hat m_x(t) - \alpha_x - \beta_x \cdot \kappa_t \right),
\]
for tn the last year in the data set and xm the oldest age in the data set [35]. Notice that the factor −2 will also appear in the second derivative, so all −2 factors can be dropped from both the numerators and the denominators.
The iterative updating scheme is as shown below:
\[
\hat\alpha_x^{(k+1)} = \hat\alpha_x^{(k)} + \frac{\sum_{t=t_1}^{t_n} \left( \log \hat m_x(t) - \hat\alpha_x^{(k)} - \hat\beta_x^{(k)} \cdot \hat\kappa_t^{(k)} \right)}{t_n - t_1 + 1},
\]
\[
\hat\kappa_t^{(k+1)} = \hat\kappa_t^{(k)} + \frac{\sum_{x=x_1}^{x_m} \hat\beta_x^{(k)} \cdot \left( \log \hat m_x(t) - \hat\alpha_x^{(k+1)} - \hat\beta_x^{(k)} \cdot \hat\kappa_t^{(k)} \right)}{\sum_{x=x_1}^{x_m} \left( \hat\beta_x^{(k)} \right)^2},
\]
\[
\hat\beta_x^{(k+1)} = \hat\beta_x^{(k)} + \frac{\sum_{t=t_1}^{t_n} \hat\kappa_t^{(k+1)} \cdot \left( \log \hat m_x(t) - \hat\alpha_x^{(k+1)} - \hat\beta_x^{(k)} \cdot \hat\kappa_t^{(k+1)} \right)}{\sum_{t=t_1}^{t_n} \left( \hat\kappa_t^{(k+1)} \right)^2}.
\]
The algorithm stops when the values of all α̂x^(k+1), κ̂t^(k+1) and β̂x^(k+1) barely change after a new iteration; the stopping criterion will be set to 10⁻¹⁰. For the starting values we use values that satisfy the identifiability constraints in equation (4) from the beginning: α̂x⁰ = 0, κ̂t⁰ = 0 and β̂x⁰ = 1/(xm − x1 + 1), for every x and t.
To make sure the identifiability constraints are met, the parameters need to be rescaled right after each iteration. We replace α̂x with α̂x + β̂x · κ̄, κ̂t with (κ̂t − κ̄) · β̂* and β̂x with β̂x/β̂*, where κ̄ is the mean of the κ̂t's and β̂* is the sum of the β̂x's. After this rescaling the identifiability constraints are satisfied [35].
2.2 Poisson estimation
The second way of estimating the model parameters is by using a Poisson distribution. Based on a paper by Brouhns et al., this approach overcomes the homoskedasticity issues that arise from the assumption that errors are identically distributed across all ages [13].
This modelling framework relies on the exposure-to-risk Ex,t and death counts Dx,t. The exposure-to-risk for age x and year t is the number of people of age x that are exposed to the risk of death during the one-year time interval from year t to year t + 1. This value is almost equal to the population size of age x in year t, except for a slight correction¹ for the timing of deaths. The Dutch exposure-to-risk (ETR) data is provided by the Human Mortality Database for all ages and years of our interest [2].
The concept of the Poisson distributed model is
\[
D_{xt} \sim \text{Poisson}(E_{xt} \cdot m_x(t)), \qquad m_x(t) = e^{\alpha_x + \beta_x \cdot \kappa_t}. \tag{9}
\]
The parameters of this model can be estimated by maximizing the log-likelihood corresponding to the model described above, which is given by
\[
L(\alpha, \beta, \kappa) = \sum_{x=x_1}^{x_m} \sum_{t=t_1}^{t_n} \left[ D_{xt} \cdot (\alpha_x + \beta_x \cdot \kappa_t) - E_{xt} \cdot e^{\alpha_x + \beta_x \cdot \kappa_t} \right] + c, \qquad c \in \mathbb{R}.
\]
The easiest way to estimate the optimal values of the parameters in this Poisson model is by using the Newton-Raphson method. Working in a similar fashion as before, we find the following updating scheme:
\[
\hat\alpha_x^{(k+1)} = \hat\alpha_x^{(k)} + \frac{\sum_{t=t_1}^{t_n} \left( D_{xt} - E_{xt} \cdot e^{\hat\alpha_x^{(k)} + \hat\beta_x^{(k)} \hat\kappa_t^{(k)}} \right)}{\sum_{t=t_1}^{t_n} E_{xt} \cdot e^{\hat\alpha_x^{(k)} + \hat\beta_x^{(k)} \hat\kappa_t^{(k)}}},
\]
\[
\hat\kappa_t^{(k+1)} = \hat\kappa_t^{(k)} + \frac{\sum_{x=x_1}^{x_m} \left( D_{xt} - E_{xt} \cdot e^{\hat\alpha_x^{(k+1)} + \hat\beta_x^{(k)} \hat\kappa_t^{(k)}} \right) \hat\beta_x^{(k)}}{\sum_{x=x_1}^{x_m} E_{xt} \cdot e^{\hat\alpha_x^{(k+1)} + \hat\beta_x^{(k)} \hat\kappa_t^{(k)}} \left( \hat\beta_x^{(k)} \right)^2},
\]
\[
\hat\beta_x^{(k+1)} = \hat\beta_x^{(k)} + \frac{\sum_{t=t_1}^{t_n} \left( D_{xt} - E_{xt} \cdot e^{\hat\alpha_x^{(k+1)} + \hat\beta_x^{(k)} \hat\kappa_t^{(k+1)}} \right) \hat\kappa_t^{(k+1)}}{\sum_{t=t_1}^{t_n} E_{xt} \cdot e^{\hat\alpha_x^{(k+1)} + \hat\beta_x^{(k)} \hat\kappa_t^{(k+1)}} \left( \hat\kappa_t^{(k+1)} \right)^2}.
\]
¹Someone of age x in year t could be born in year t − x or in year t − x + 1; the correction takes this into account.
The criterion to stop the algorithm will again be set to 10⁻¹⁰, which is a common choice in the literature [13]. For the starting values we use values that satisfy the constraints from the beginning, just as we did in subsection 2.1.2 for the least squares model. In this case we set α̂x⁰ = 0, κ̂t⁰ = 0 and β̂x⁰ = 1/m, for every x and t.
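The Poisson scheme above can be sketched in the same style as before. The exposures, true parameters and starting value for α below are illustrative assumptions (in this sketch α is initialised at the crude log death rate, a common practical choice that speeds up convergence, rather than at zero as in the text):

```python
import numpy as np

# Sketch of the Poisson Newton-Raphson scheme on synthetic exposures/deaths.

rng = np.random.default_rng(3)
m_ages, n_years = 30, 20
alpha_true = np.linspace(-7.0, -2.0, m_ages)
beta_true = np.full(m_ages, 1.0 / m_ages)          # sums to 1
kappa_true = np.linspace(10.0, -10.0, n_years)     # sums to 0
E = np.full((m_ages, n_years), 1e5)
D = rng.poisson(E * np.exp(alpha_true[:, None]
                           + beta_true[:, None] * kappa_true[None, :]))

alpha = np.log(D.sum(axis=1) / E.sum(axis=1))      # crude starting alphas
kappa = np.zeros(n_years)
beta = np.full(m_ages, 1.0 / m_ages)

def fitted(alpha, beta, kappa):
    return E * np.exp(alpha[:, None] + beta[:, None] * kappa[None, :])

for _ in range(500):
    Dhat = fitted(alpha, beta, kappa)
    alpha = alpha + (D - Dhat).sum(axis=1) / Dhat.sum(axis=1)
    Dhat = fitted(alpha, beta, kappa)
    kappa = kappa + (((D - Dhat) * beta[:, None]).sum(axis=0)
                     / (Dhat * beta[:, None] ** 2).sum(axis=0))
    Dhat = fitted(alpha, beta, kappa)
    beta = beta + (((D - Dhat) * kappa[None, :]).sum(axis=1)
                   / (Dhat * kappa[None, :] ** 2).sum(axis=1))
    # re-impose the identifiability constraints (4) after every sweep
    kbar, bsum = kappa.mean(), beta.sum()
    alpha, kappa, beta = alpha + beta * kbar, (kappa - kbar) * bsum, beta / bsum

print(np.abs(alpha - alpha_true).max())  # small: the true alphas are recovered
```

Because the simulated deaths come from the model itself and the exposures are large, the estimates should recover the true parameters up to Poisson noise.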
2.3 Data
The data originate from the Human Mortality Database (HMD) [2], which provides access to mortality data for 41 countries around the world, including the Dutch death rate data used in this thesis. Unless stated otherwise, the observed death rates used in this thesis cover calendar years t1 = 1965, ..., tn = t50 = 2014 and ages x1 = 0, ..., xm = x105 = 104.
It should be mentioned that the HMD smoothed the death rates for older ages, starting at age x = 80. Without smoothing, the number of deaths at older ages eventually becomes so small that the data start to exhibit large random variation. To see this we inspect the raw data set, also obtained from the HMD. The raw death rates were only available from age 0 up to age 95 and from year 1980 to 2014; they are calculated by dividing the number of deaths per age per year by the population size for that age in that year. Figure 1 shows boxplots of the raw mortality rates for every age.
Looking at figure 1, smoothing the mortality rates from age 80 onwards can be justified, since the variance increases rapidly for older ages; this can be deduced from the boxplots becoming wider as age increases. The data beyond that age are therefore smoothed using the parameters that optimize the log-likelihood function of a Poisson model, as described by the Methods Protocol [43]. This yields the following model:
\[
D_{xt} \sim \text{Poisson}(E_{xt} \cdot \mu_{x+0.5}(a, b)), \qquad \mu_x(a, b) = \frac{a \cdot e^{b(x-80)}}{1 + a \cdot e^{b(x-80)}}.
\]
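The shape of this smoothing curve is easy to see numerically. In the sketch below the values of a and b are made up for illustration; the HMD obtains them by maximizing the Poisson log-likelihood as described in the Methods Protocol.

```python
import numpy as np

# Sketch of the logistic old-age smoothing curve; a and b are illustrative.

def mu_oldage(x, a, b):
    z = a * np.exp(b * (x - 80.0))
    return z / (1.0 + z)        # increasing in x and bounded above by 1

x = np.arange(80, 111)
mu = mu_oldage(x, a=0.07, b=0.11)
print(mu[0])   # equals a / (1 + a) at age 80
```

The logistic form forces the smoothed force of mortality to rise monotonically with age while staying below 1, which is exactly what the noisy raw old-age rates fail to do.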
This smoothing turns out to be a problem when testing which model fits the data better, since for older ages the data will be biased towards the Poisson model. For that particular reason, only data from ages 0 up to and including 79 will be used when the models are compared to each other in quantitative tests. All data sets are transformed into a matrix in which rows represent ages and columns represent years.
Figure 1: Boxplots of mortality rates per age of the raw data. Source: the author.
2.4 Estimation results
We start by discussing the results of the least squares estimated model. The parameter estimates of the exact singular value decomposition method and the numerical Newton-Raphson method were almost identical: the largest difference between any of the vector entries of α, β or κ across the two methods is approximately 1.286 · 10⁻¹², so we can conclude that both methods work and treat the results as identical. Since the parameters of the Poisson distributed model can only be estimated with the Newton-Raphson method, from now on the Newton-Raphson results will be used instead of the SVD estimates for the least squares estimated model. This makes the results easier to compare, since the same method and tolerance are used for both models, giving numerical errors of equivalent size.
Now we compare the results of the least squares estimated model with those of the Poisson estimated model. The results for α, β and κ are shown graphically in figure 2, with the least squares model results in red and the Poisson model results in green. Looking at the first plot of figure 2, it becomes clear that α displays the general pattern of mortality across ages.
Figure 2: Plots of α, β and κ for the SVD estimated model (red dotted) and the Poisson
estimated model (green dotted). Source: the author.
Unfortunately, there are still a few newborns that die at or some time after birth due to lethal diseases and complications. Around ages 17 to 21 another marked increase in mortality can be spotted. This peak is called the young adult mortality hump. According to American research this hump is mainly caused by males, and is induced by biological and socioeconomic factors [36], including riskier behavior, alcohol consumption and being able to drive. The main causes of death that contribute to this hump are traffic accidents, suicides, homicides and poisonings. From this age on, α increases gradually as expected. Due to its interaction with κ, the course of β is less intuitive than that of α. The parameter β can be interpreted as the rate at which every age benefits from improving life expectancy over time. Since κt is negative from 1994 onwards, 3 to 12 year olds benefit the most from the improving survival rates. In the third graph we can clearly see κ decreasing over time, mainly driven by improvements in sanitation, housing, nutrition, education and medical care [37]. Individual humps are harder to distinguish in the κ series.
Comparing the parameter estimates of the two estimation methods, we see that the results for α are extremely similar. For β and κ there is relatively more difference between the two models: for β we see a slight difference for ages 3 to 11 and ages over 99, and for κ a slight difference for almost all years covered. In combination with the differences in β this could accumulate to larger differences, since these parameters interact with each other. It should be noted that the parameter values are not the only difference between these models. The main difference is the way the source of randomness is incorporated in the model. For the least squares estimated model this happens according to the formula in equation (3), where εx(t) ∼ N(0, σ²) is the source of randomness. In the Poisson estimated model the number of deaths is drawn from a Poisson distribution whose mean is derived from the central equation in the Lee-Carter model.
2.5 Quantitative comparison of least squares with the Poisson assumption
In this section the least squares and the Poisson model will be compared using quantitative results. The first test we perform is Bartlett's test. Recall that for the least squares method homoskedasticity of the error terms was assumed, which is a very strong assumption; in this section we will find out whether it is acceptable. We can see in figure 1 that the width of the boxplots increases rapidly already before age 80, the age from which our data is smoothed. This indicates that the homoskedasticity assumption may not hold for our data set. To test this assumption formally, Bartlett's test is applied to the complete smoothed data set. Bartlett's test is a formal way of testing whether samples have equal variance. Every row of the life table matrix is treated as a different sample, each of them having equal variance under the null hypothesis. The test statistic is approximately χ²ₖ₋₁ = χ²₁₀₄ distributed under the null hypothesis, which is thus rejected if the test statistic exceeds χ²₁₀₄,₀.₉₅ = 128.80. For our data the test statistic equals 3741.4, with a corresponding p-value of less than 2.2 · 10⁻¹⁶, so the null hypothesis is easily rejected. We conclude that the variance is not equal across ages. This rules out the possibility of homoskedastic error terms and favors the Poisson estimated model.
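Bartlett's statistic is straightforward to compute. The sketch below implements the textbook formula (scipy.stats.bartlett gives the same statistic) and applies it to simulated homoskedastic and heteroskedastic "age rows"; the data are made up purely to illustrate why the statistic explodes when variances grow with age.

```python
import numpy as np

# Bartlett's test statistic for equal variances across k samples.

def bartlett_stat(samples):
    k = len(samples)
    n = np.array([len(s) for s in samples])
    v = np.array([np.var(s, ddof=1) for s in samples])
    N = n.sum()
    sp2 = ((n - 1) * v).sum() / (N - k)                 # pooled variance
    num = (N - k) * np.log(sp2) - ((n - 1) * np.log(v)).sum()
    den = 1 + (np.sum(1 / (n - 1)) - 1 / (N - k)) / (3 * (k - 1))
    return num / den

rng = np.random.default_rng(4)
equal = [rng.normal(0, 1, 50) for _ in range(105)]              # same variance
unequal = [rng.normal(0, 1 + 0.05 * i, 50) for i in range(105)] # growing variance
print(bartlett_stat(equal))    # near its d.o.f. of 104 under the null
print(bartlett_stat(unequal))  # far above the 128.80 critical value
```

With 105 rows of 50 years each, as in the life table matrix, the statistic is compared against the χ²₁₀₄ distribution exactly as in the text.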
Another way to compare the two distributional assumptions is by performing residual analysis. Recall, however, that the data are smoothed from age 80 onwards. Consequently, only data for ages 0 to 79 will be tested, and the model parameters will be determined using only the data for ages 0 up to and including 79.
For the least squares method, Pearson residuals need to be inspected [34]. For this particular model the residuals are given by
\[
r_{xt} = \frac{\log \hat m_x(t) - (\hat\alpha_x + \hat\beta_x \cdot \hat\kappa_t)}{\sqrt{\frac{1}{(x_m - x_1)(t_n - t_1 - 1)} \sum_{x=x_1}^{x_m} \sum_{t=t_1}^{t_n} \left( \log \hat m_x(t) - (\hat\alpha_x + \hat\beta_x \cdot \hat\kappa_t) \right)^2}}.
\]
The sum of the squared residuals is chi-squared distributed with n · m = 80 · 50 = 4000 degrees of freedom under the null hypothesis that the least squares estimated model is a good fit. This means that this sum may not exceed χ²₄₀₀₀,₀.₉₅ ≈ 4148.2 in order for the model to be a good fit. The value of the test statistic equals 8,318,653, so we can conclude that the least squares estimated model is a bad fit.
For the Poisson model deviance residuals are considered. Deviance residuals can be seen as
the difference between the estimated model and the ideal model, for each data point individually. This boils down to

D = 2 \sum_x \sum_t \left[ \left( D_{xt} \log(D_{xt}) - D_{xt} - \log(D_{xt}!) \right) - \left( D_{xt} \log(\hat{D}_{xt}) - \hat{D}_{xt} - \log(D_{xt}!) \right) \right],

where D_{xt} is the actual number of deaths for the group of people aged x in year t and \hat{D}_{xt} is
the estimated number of deaths for this same group [6]. This simplifies to

D = 2 \sum_x \sum_t \left[ D_{xt} \log\!\left(\frac{D_{xt}}{\hat{D}_{xt}}\right) - (D_{xt} - \hat{D}_{xt}) \right].
This test statistic D is approximately χ²_{n·m−p} distributed, with p the number of parameters
involved in the model. So D ∼ χ²_{3997} under the null hypothesis.
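The deviance statistic is straightforward to compute once observed and fitted death counts are available. The sketch below uses toy counts rather than the fitted Lee-Carter values; `poisson_deviance` is an illustrative helper, not code from the thesis.

```python
import math

def poisson_deviance(observed, fitted):
    """Poisson deviance: D = 2 * sum[ D*log(D/Dhat) - (D - Dhat) ].

    `observed` and `fitted` are flat sequences of death counts D_xt and
    fitted counts Dhat_xt. A cell with D = 0 contributes 2 * Dhat, the
    limit of the general term as D -> 0.
    """
    total = 0.0
    for d, dhat in zip(observed, fitted):
        if d > 0:
            total += d * math.log(d / dhat) - (d - dhat)
        else:
            total += dhat
    return 2.0 * total

# a perfect fit has zero deviance; a crude constant fit has a large one
obs = [12, 30, 55, 80]
dev_perfect = poisson_deviance(obs, [float(d) for d in obs])
dev_poor = poisson_deviance(obs, [30.0, 30.0, 30.0, 30.0])
```

Comparing the resulting D with the χ²_{nm−p, 0.95} quantile gives the goodness-of-fit verdict used above.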
The value of the test statistic D for the Poisson model equals approximately 5878.6, while
χ²_{3997, 0.95} equals 4145.2. This implies that the Poisson model is a better fit than the least
squares model, but there is still room for improvement. A possible explanation for the
supposed bad fit of the Poisson model lies in the fact that our data sample is very large, which
causes even small deviations from the model to be statistically significant. When performing the
same test, but including each data point with probability 1/20 in order to decrease the amount of data drastically, the test statistic D appeared to be slightly smaller than the corresponding
χ²_{0.95} value. This means that after ruling out the negative effect the size of the data
set has on the goodness-of-fit, this model can be considered a good fit. This result does
not hold for the Pearson residuals in the least squares model: after including each data
point with probability 1/20 in that model, the residuals are still far from chi-squared distributed.
Another possible explanation for the somewhat bad fit of the deviance residuals lies in the
cohort effect. The cohort effect is the phenomenon that people born in certain periods of time
experience a more rapid improvement in survival rates than people born in other periods. The
analysis of the cohort effect is beyond the scope of this thesis. For more information on the
cohort effect, the paper "The Cohort Effect, Insights and Explanations" by R.C. Willets is
recommended [42].
In this chapter we discussed two different distributional assumptions for the Lee-Carter model,
i.e. the least squares assumption and the Poisson assumption. The parameters α, β and κ
under these two assumptions were estimated with the Newton-Raphson method. We have seen that the homoskedasticity assumption of the error terms, as made in the least squares
approach, is not suitable for this data. The residual analysis favored the Poisson assumption
as well. So based on the quantitative tests we performed, the Poisson assumption fits the data
better than the least squares assumption. In the remainder of this thesis we will continue with the Poisson estimated model.
3 Forecasting death rates and life expectancy
In May 2019, the Dutch government decided to raise the retirement age in the Netherlands
to 67 years in 2024 [5]. This started a fierce debate on whether, on average, people live long
enough, in good health, to have a decent life after retiring at age 67. Life expectancy and
quality of life were big issues in the media. Improving life expectancy affects the private pension sector as well: if realized survival rates outpace the predicted survival rates, this can
result in larger than expected outbound cash flows for insurance companies and pension funds.
In this chapter we will establish an ARIMA model that fits the κ_t time series. With this
ARIMA model we can predict and simulate future survival rates, ultimately used to price
longevity-linked securities and test their effectiveness in chapter 5.
The main goal of this chapter is to model the time parameter κ_t and use this time series model
to forecast future values. The model used for forecasting κ_t is the Autoregressive Integrated
Moving Average (ARIMA) time series model. In this model the AR part indicates that the variable is
regressed on its own prior values, the MA part indicates that the variable of interest is regressed on its own prior error terms and the I part indicates the number of times the data
is differenced in order to make it stationary. For our variable of interest κ_t^P, the κ_t
of the Poisson estimated model, this boils down to

\kappa_t^{P,d} = c + \phi_1 \kappa_{t-1}^{P,d} + \ldots + \phi_p \kappa_{t-p}^{P,d} + \theta_1 \epsilon_{t-1}^{P} + \ldots + \theta_q \epsilon_{t-q}^{P} + \epsilon_t^{P}, \qquad (10)

with c a constant, \phi_1, \ldots, \phi_p the parameter estimates of the AR part, \theta_1, \ldots, \theta_q the parameter
estimates of the MA part, \epsilon_t^{P} the white noise error term at time t and d the number of times
\kappa_t^P is differenced.
3.1 Properties of an ARIMA(p,d,q) model
In the process of finding the ARIMA model that best fits the κ_t^P time series of the Poisson model,
the first property that needs to be discussed is that differencing a time series tends to
drive the lag-1 autocorrelation towards a negative value. To see why this holds, Wold's representation
theorem for time series is used [3]. This theorem states that every covariance-stationary time
series can be written as

X_t = \sum_{j=0}^{\infty} \psi_j Z_{t-j} + \eta_t,

with \psi_0 = 1, \sum_{j=0}^{\infty} \psi_j^2 < \infty, Z_t a white noise process with variance \sigma^2 and \eta_t a deterministic
process. Note that

\mathrm{Cov}(X_t, X_{t-1}) = \mathrm{Cov}\!\left( \sum_{j=0}^{\infty} \psi_j Z_{t-j} + \eta_t, \; \sum_{j=0}^{\infty} \psi_j Z_{t-1-j} + \eta_{t-1} \right) = \sigma^2 \sum_{j=0}^{\infty} \psi_j \psi_{j+1},
since \eta_t is deterministic and Z_t is white noise. Now

Y_t = X_t - X_{t-1} = Z_t + (\psi_1 - 1) Z_{t-1} + \sum_{j=2}^{\infty} (\psi_j - \psi_{j-1}) Z_{t-j} + (\eta_t - \eta_{t-1}).

Hence,

\mathrm{Cov}(Y_t, Y_{t-1}) = \mathrm{E}[Y_t \cdot Y_{t-1}] = (\psi_1 - 1)\sigma^2 + \sigma^2 \sum_{j=2}^{\infty} (\psi_j - \psi_{j-1})(\psi_{j-1} - \psi_{j-2}),

where again the properties of a white noise process are used. Now since \sum_{j=0}^{\infty} \psi_j^2 < \infty, the
series \{\psi_j\}_{j=0}^{\infty} is decreasing and smaller than 1 most of the time. This implies that in most
cases \psi_1 - 1 < 0 and (\psi_1 - 1) + \sum_{j=2}^{\infty} (\psi_j - \psi_{j-1})(\psi_{j-1} - \psi_{j-2}) < 0. So to conclude,
differencing the time series can be a very useful tool to decrease autocorrelation. Once the lag-1 autocorrelation is around zero or negative, the time series has been differenced enough times.
Often the optimal number of times to difference a time series is the one for which the standard deviation of the
differenced series is smallest. Also, since stationary time series are mean-reverting, after the time
series is differenced the optimal number of times, the series should always eventually return
to its mean over time. As a consequence, if the series needs to be differenced once, then
the original series should show a constant trend. If it needs to be differenced multiple times,
then there should be a time-varying trend in the original series [32].
In order to determine the number of AR and MA parameters that should be added to the model, some more theory is required. Let B be the backshift operator, defined by

B \cdot Y_t = Y_{t-1},

for every time series Y_t and every t. This implies we can difference a time series \{Y_t\} once by
using

Y_t - Y_{t-1} = Y_t - B \cdot Y_t = (1 - B) \cdot Y_t.

Now an ARIMA(1,1,1) model can be written as

(1 - B) Y_t = \phi_1 (1 - B) Y_{t-1} + \epsilon_t - \theta_1 \epsilon_{t-1} = \phi_1 (1 - B)(B \cdot Y_t) + \epsilon_t - \theta_1 B \cdot \epsilon_t.

This is equivalent to

(1 - \phi_1 B)(1 - B) Y_t = (1 - \theta_1 B) \epsilon_t.
So we can add an AR parameter by multiplying Y_t by (1 - \phi_1 B), and we can add an MA
parameter by multiplying \epsilon_t by (1 - \theta_1 B) [32]. If the series is still somewhat underdifferenced
after differencing, this can be compensated by adding an AR parameter. The value
of \phi_1 corresponds to the extent to which the series is still underdifferenced. If for example \phi_1 ≈ 1,
the effect is almost the same as differencing the series once. On the other hand, adding an
MA parameter can undo overdifferencing. When for example \theta_1 ≈ 1, it almost completely
cancels out a (1 - B) term on the left-hand side of the equation, and nullifies one differencing
of the series.
Besides this theory, the partial autocorrelation function (PACF) and the autocorrelation function (ACF) itself
can also be used to determine how many AR and MA parameters need to be added.
To understand this, note that the partial autocorrelation function removes the parts that are
explained by earlier lags, in order to find which lags are correlated with the remaining residual. This is
closely related to how an AR model works: in an AR model current values depend on their own prior values. If the PACF cuts off sharply after lag n, then the model
could be properly described by the n previous values of the time series, indicating that an AR
model with n parameters could be a good fit to the time series. The ACF, on the
other hand, captures the entire correlation between a value of the time series and a lagged
version of that value; correlation that remains at those lags can only be caused by the residuals themselves. Hence a sharp cutoff in the ACF after n
lags suggests that the last n residuals help to describe the data, and no more. Thus adding
n MA parameters could improve the model.
This ultimately leads to two rules for adding AR and MA parameters. The first one
is that if the partial autocorrelation function shows a sharp cutoff after some particular lag, or
if the lag-1 autocorrelation is positive, i.e. if the time series is slightly underdifferenced, the model may need another AR parameter. On the other hand, if the autocorrelation function
shows a sharp cutoff after some particular lag, or if the lag-1 autocorrelation is negative, i.e.
if the time series is slightly overdifferenced, then the model may need another MA parameter
[32].
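The differencing rule above can be illustrated numerically: a trending series has a lag-1 autocorrelation near 1, and differencing it once drives the lag-1 autocorrelation down, here below zero. The sketch uses a small deterministic toy series, not the κ data, and the helper names are introduced for illustration only.

```python
def acf(series, lag):
    """Sample autocorrelation of `series` at the given lag."""
    n = len(series)
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series)
    cov = sum(
        (series[t] - mean) * (series[t - lag] - mean) for t in range(lag, n)
    )
    return cov / var

def difference(series):
    """First difference: y_t - y_{t-1}."""
    return [b - a for a, b in zip(series, series[1:])]

# linear trend plus a small alternating disturbance (deterministic "noise")
series = [2.0 * t + (1.0 if t % 2 == 0 else -1.0) for t in range(100)]

rho_raw = acf(series, 1)               # near 1: the trend dominates
rho_diff = acf(difference(series), 1)  # negative: differencing flips the sign
```

The raw series is clearly underdifferenced (lag-1 autocorrelation close to 1), while after one difference the lag-1 autocorrelation turns negative, matching the behavior derived from Wold's representation above.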
3.2 ARIMA model choice
We continue with determining the κ_t^P time series for our Poisson estimated model. The ACF
and PACF plots for the original series of κ_t^P are shown in figure 3.
Figure 3: ACF and PACF plots of κ_t^P for the Poisson model. The blue dotted lines indicate the border of significantly positive or negative autocorrelation. Source: the author.
Note that the ACF plot displays large values for a high number of lags. This indicates that
the series is probably underdifferenced. Differencing the series once leads to the ACF and
PACF plots depicted in figure 4.

Figure 4: ACF and PACF plots of the once-differenced κ_t^P for the Poisson model. The blue
dotted lines indicate the border of significantly positive or negative autocorrelation. Source: the author.
The lag-1 autocorrelation is negative, but not significantly so, so the differenced time
series does not appear to be overdifferenced. Also, the standard deviation of the
once-differenced series equals σ ≈ 2.453, which is lower than that of both the undifferenced series, with
σ ≈ 26.38, and the twice-differenced series of κ_t^P, with σ ≈ 3.72. Differencing the
series once therefore appears to be optimal for this time series. We continue with determining how
many AR and MA parameters should be added to the model. Consider the ACF plot
of the differenced series in figure 4. The lag-1 autocorrelation is negative, but not significant. Also, the ACF does not show a particularly sharp cutoff, since the correlations are not
significant. We therefore consider using just an ARIMA(0,1,0) model. This leads to
the following model:

\kappa_t^P - \kappa_{t-1}^P = -1.8631 + \epsilon_t^P, \qquad (11)
where the constant term is significant and \epsilon_t^P \sim N(0, 5.822) is the white noise error term. We continue with
figure 5, displaying various graphical goodness-of-fit tests for this ARIMA(0,1,0) model.

Figure 5: Various graphical goodness-of-fit tests for the estimated ARIMA(0,1,0) model for
κ_t in the Poisson model. Source: the author.
There seem to be no clear patterns in the standardized residuals in the top plot. They also seem to be mean-reverting. In the ACF plot it can be seen that the values lie almost entirely
between the blue dotted lines, showing no significant undescribed autocorrelation. The normal
Q-Q plot of the residuals in the third plot is straight enough to accept the assumption that
the residuals are normal. Lastly, we take a look at the p-values of the Ljung-Box test
statistics for different lags. Only for lag 17 is the p-value smaller than 0.05. This implies that
assuming that the differenced series of κ_t^P is independently distributed is reasonable. There
is still some correlation left in the residuals, but not enough to justify adding more variables. To get
rid of the remaining undescribed correlation at least 3 MA parameters would have to be added, since
the first significant autocorrelation in the ACF plot occurs after 3 lags. As is also common in
the literature, it is not worth complicating the model this much to capture this lag-3 autocorrelation [12]. Also, when adding the MA parameters iteratively, we observe that the first parameter has
an insignificant effect on the model. We conclude that the ARIMA(0,1,0) model as described
above is the best fit to the time series κ_t^P. The graph of the time series, along with its
forecasts, is shown in figure 6.
Figure 6: Time series plot of κ_t^P, for t_0 = 1965. Points in red are forecasts. The dark shaded
region equals \hat{\kappa}_t^P \pm \sqrt{\mathrm{E}[(\hat{\kappa}_t^P - \kappa_t^P)^2]}, and the light shaded region equals \hat{\kappa}_t^P \pm 2\sqrt{\mathrm{E}[(\hat{\kappa}_t^P - \kappa_t^P)^2]}.
Source: the author.
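For an ARIMA(0,1,0) model with drift, the forecasts and the shaded bands in figure 6 have a simple closed form: h steps ahead, the point forecast is κ_T + h·c and the forecast error variance is h·σ². A minimal sketch, using the drift and variance from equation (11); the last observed value of κ is a hypothetical placeholder, not the fitted value.

```python
import math

# ARIMA(0,1,0) with drift is a random walk with drift:
#   kappa_t = kappa_{t-1} + c + eps_t,  eps_t ~ N(0, sigma2),
# with c = -1.8631 and sigma2 = 5.822 as in equation (11).
C = -1.8631
SIGMA2 = 5.822

def forecast(kappa_last, horizon):
    """h-step point forecast and forecast standard error.

    The point forecast is kappa_T + h*c; the forecast error is a sum of h
    independent innovations, so its variance is h * sigma2.
    """
    point = kappa_last + horizon * C
    stderr = math.sqrt(horizon * SIGMA2)
    return point, stderr

KAPPA_LAST = -40.0  # hypothetical last observed value of kappa_t^P
point, se = forecast(KAPPA_LAST, 10)
# the light shaded band in figure 6 corresponds to +/- 2 standard errors
lower, upper = point - 2 * se, point + 2 * se
```

Note that the bands widen like √h, which is why the shaded regions in figure 6 fan out over the forecast horizon.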
3.3 Bootstrapping life expectancy
In the beginning of this chapter it was mentioned that the retirement age in the Netherlands
will be 67 in 2024. This makes the expected lifetime of 67-year-olds of major importance
for politics, but also for insurance companies and pension funds. In this section we will use the bootstrap to estimate the mean and the 95% confidence interval of the expected lifetime of
someone aged 67. This will give more insight into the amount of uncertainty rooted in the
Poisson estimated Lee-Carter model. The results will not be used in the remainder of the
thesis.
In order to determine life expectancy for every age and every year, future values of κ_t^P
need to be simulated from the model described in equation (11). With these simulated values
future expected mortality rates can be estimated, from which we can derive the expected lifetime
for all ages and years. We cannot derive mortality rates and life expectancy analytically
using the expected value of κ_t^P, since life expectancy is not linear in κ_t^P. Using simulation it
is possible to determine all mortality rates from 1965 to any year in the future, for any age up to 104. To terminate the recursion below, we need to
assign a value for life expectancy to age 105. The value that is assigned to age 105 does
not influence the results much, since under our model the probability of someone aged 67
in 2024 reaching age 105 equals approximately 0.0003. The expected lifetime we assign
to age 105 is 1.5 years, since life table data from the Human Mortality Database over the last
10 years indicates that the expected lifetime of someone aged 105 revolves constantly around
1.5 years [2].
Recall that equation (1) assumed the probability of death to be uniformly
distributed within the age interval [x, x + 1) for every age x. This implies that the expected
lifespan of someone dying at age x equals x + 1/2. Consequently, we can recursively define the cohort life expectancy for age x and year t as the probability of dying at age x in year
t times x + 1/2, plus the probability of surviving times the life expectancy for age x + 1 in year t + 1, i.e.

\mathring{e}_x(t) = \hat{q}_x(t) \cdot \left(x + \tfrac{1}{2}\right) + \hat{p}_x(t) \cdot \mathring{e}_{x+1}(t + 1),

where the expected mortality rate \hat{q}_x(t) for someone aged x in year t is estimated using 5000
simulations, and \hat{p}_x(t) = 1 - \hat{q}_x(t) denotes the expected survival rate for someone aged x in
year t. Plugging in age x = 67 and year t = 2024 leads to \mathring{e}_{67}(2024) \approx 19.70. This value exceeds the expected lifetimes mentioned in the public debate concerning raising the retirement
age.
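The recursion can be sketched in a few lines. The version below is written in the remaining-lifetime convention: someone dying in [x, x+1) contributes half a year, a survivor contributes one year plus the remaining expectancy at age x+1, and the recursion closes at age 105 with 1.5 years as in the text. For brevity it uses a single illustrative mortality schedule, ignoring the calendar-year dimension of q̂_x(t); the rates are invented, not model estimates.

```python
TERMINAL_AGE = 105
TERMINAL_LIFE_EXPECTANCY = 1.5  # assigned remaining lifetime at age 105

def cohort_life_expectancy(q, x):
    """Remaining life expectancy at age x, given mortality rates q[age].

    Uniform-deaths assumption: a death in [x, x + 1) happens on average
    halfway through the year; a survivor lives one full year plus the
    remaining expectancy at age x + 1.
    """
    if x >= TERMINAL_AGE:
        return TERMINAL_LIFE_EXPECTANCY
    qx = q[x]
    return qx * 0.5 + (1.0 - qx) * (1.0 + cohort_life_expectancy(q, x + 1))

# toy mortality schedule rising roughly exponentially with age (hypothetical)
q = {age: min(0.95, 0.005 * 1.09 ** (age - 67)) for age in range(67, TERMINAL_AGE)}
e67 = cohort_life_expectancy(q, 67)
```

In the thesis the rates additionally shift with the calendar year, so each recursive step would look up q̂ at age x + 1 and year t + 1 instead of a single column of rates.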
When forecasting life expectancy, uncertainty plays an important role as well. To assess how
uncertain an estimate is, confidence intervals can be a very useful tool. In our case there are
two types of uncertainty. The first type is the uncertainty in the estimated parameters α_x, β_x
and κ_t^P. The second type arises from the uncertainty in forecasting κ_t^P from the ARIMA time series. Also, note that life expectancy is a rather complex non-linear function of the parameters
α_x, β_x and κ_t^P. As a consequence, we will use the bootstrap to determine the confidence interval.
The general idea is to sample B realizations for each combination of age and year from the
Poisson distribution with parameter λ_{xt} = D_{xt} = ETR_{xt} · \hat{\mu}_x(t) [12]. In this way there will
be B life tables, each of them generating a new set of parameter estimates for α_x, β_x and κ_t^P.
For each κ_t^P we can simulate an ARIMA(0,1,0) series to obtain future values. It is clear
that this procedure is computationally intensive. Because of the complexity of the model we cannot make B too large, so we take B = 5000.
The 95% confidence interval for \mathring{e}_{67}(2024) then equals [18.81, 20.59]. The mean of all sampled
expected lifetimes equals 19.69, which is close to the expected value. The interval is narrower
than is common in the literature. This can be explained by the fact that this study does not
distinguish between sexes, and for that reason has more data, which narrows the interval. The
histogram of \mathring{e}_{67}(2024) for B = 5000 is shown in figure 7.
Figure 7: Histogram of ˚e67(2024) for B = 5000 samples. Source: the author.
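The percentile interval at the heart of the procedure can be sketched with a toy statistic: resample death counts around an observed value, recompute a mortality rate each time, and read off the empirical 2.5% and 97.5% quantiles. In the actual procedure each bootstrap draw is a whole Poisson-resampled life table and the statistic is ˚e67(2024); here a normal approximation to the Poisson resampling is used (valid because the count is large), and all input numbers are illustrative.

```python
import random

random.seed(42)

def percentile_ci(samples, alpha=0.05):
    """Percentile bootstrap confidence interval from resampled statistics."""
    s = sorted(samples)
    lo = s[int(len(s) * (alpha / 2))]
    hi = s[int(len(s) * (1 - alpha / 2)) - 1]
    return lo, hi

exposure = 10_000.0        # hypothetical exposure-to-risk for one cell
observed_deaths = 150      # hypothetical observed death count

B = 5000
rates = []
for _ in range(B):
    # normal approximation to Poisson(observed_deaths) resampling;
    # the thesis draws from Poisson(lambda_xt) for every life table cell
    d = max(0.0, random.gauss(observed_deaths, observed_deaths ** 0.5))
    rates.append(d / exposure)

ci_lo, ci_hi = percentile_ci(rates)
```

Replacing the single rate by a full refitted Lee-Carter model per draw gives exactly the [18.81, 20.59] style interval reported above, at correspondingly higher computational cost.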
In this chapter we established an ARIMA(0,1,0) time series model for κ_t^P. Subsequently, we
derived an estimate for the confidence interval of the expected lifetime of someone aged 67 in 2024, taking parameter uncertainty and uncertainty in forecasting κ_t into account. To measure
the impact of parameter uncertainty we also estimated the confidence interval without using
the bootstrap framework, by merely simulating 5000 values of κ_t^P, for κ_t^P as estimated in
chapter 2. Subsequently, the confidence interval of the expected lifetime was calculated using
α_x and β_x as estimated in chapter 2. The resulting confidence interval equals [18.83, 20.57], which
is just a factor 0.98 narrower than the confidence interval where parameter uncertainty was
taken into account. A similar pattern holds for the expected lifetime of someone aged 67 in 2015, where the confidence interval that does not take parameter
uncertainty into account is also a factor 0.98 smaller than the confidence interval taking
parameter uncertainty into account. We conclude that parameter uncertainty is negligible
compared to the uncertainty in forecasting κ_t^P in a life expectancy context. In chapter 5
future values of κ_t^P will be simulated to assess longevity risk in hedged and unhedged annuity portfolios. Due to the complexity of the model, parameter uncertainty will not be addressed in chapter 5.
4 Solvency II
Longevity risk is the risk that realized survival rates of policyholders and pensioners exceed
expected survival rates. This can result in higher than expected payout costs for insurance
companies and pension funds, possibly causing solvency problems. In 1973 the basis
of Solvency I was introduced by the European Economic Community, to monitor solvency regulations for insurance companies and pension funds and thus protect their policyholders.
However, in Solvency I the capital requirements did not match the amount of risk the insurance
companies were exposed to. This caused a lack of incentive for insurance companies to
properly manage the risks they faced. Also, insurance companies did not need to be transparent
about their financial position and the risk sensitivity of that position. These shortcomings
of Solvency I led to the new Solvency II agreement, fully enforced by the European Union from
the 1st of January 2016 [7]. The Solvency II agreement does give insurance companies and
pension funds an incentive to hedge longevity risk using longevity-linked securities, since this
lowers the capital requirements. This chapter will describe the structure of the Solvency
II agreements. This will give more insight into how to effectively hedge longevity risk in the fifth chapter, where we will hedge longevity risk using longevity-linked securities in a simple
annuity portfolio.
Solvency II applies to all companies whose gross premium income equals at least 5 million
euro, or whose technical provisions are at least 25 million euro. According to De Nederlandsche
Bank, technical provisions are defined as: "The amount to be held by an insurer on
the balance sheet date in order to settle all existing obligations towards policyholders" [7].
Technical provisions will be revisited later on in this chapter. Solvency II also applies to
insurance companies whose reinsurance activities or activities abroad are non-negligible [7].
Solvency II is often compared to the Basel II framework. The main reason is the similarity
in structure: the Solvency II framework, like the Basel II framework, is built on
a three-pillar concept. Pillar 1 incorporates the quantification of the risks the insurance
companies are exposed to, ultimately leading to capital requirements. The way the capital
requirements are calculated is particularly important for this research, since reducing the
capital requirements is a main objective in hedging longevity risk. Pillar 2 contains the mandatory
requirements for companies, with the ultimate goal of making the insurance and
pension business more transparent. All three pillars will be discussed in greater detail below
[23]. However, Pillar 2 and Pillar 3 will not be used in the remainder of this thesis.
4.1 Pillar 1
Under Pillar 1, Solvency II demands insurers to determine the market value of their balance
sheet and quantify the risks they face, either using a standard model or an internal model.
Insurers have to hold capital based on the amount of risk they are exposed to. The capital
requirements of the first pillar can be explained using figure 8 below.
Figure 8: The Solvency II Balance Sheet. Source: Society of Actuaries In Ireland.
As we can see, the liabilities consist of the capital requirements and the technical provisions.
The technical provisions consist of the best estimate liabilities and a risk margin. The best estimate liabilities can be described as the present value of all expected future cash flows,
discounted with the risk-free interest rate r. The risk-free interest rates are published by the
European Insurance and Occupational Pensions Authority (EIOPA). EIOPA is closely
involved with Solvency II and supervises the whole insurance and pension sector. The risk
margin is defined as the amount of capital another insurer would need to be paid in order
to compensate them for the risk that the best estimate liabilities end up being worse than expected. Also part of
the risk margin is a compensation for the cost of holding capital against these best estimate
liabilities that would be taken on. In terms of formulae, the risk margin is a percentage of the
discounted Solvency Capital Requirements, i.e.
RM = CoC \cdot \sum_{t=0}^{\infty} \frac{SCR_t}{(1 + r_{t+1})^{t+1}},
where CoC is the Cost of Capital rate, set to 6% [23]. The Solvency Capital Requirement
(SCR) will be discussed later on in this section.
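The risk margin formula is a discounted sum that is easy to evaluate once a run-off pattern of future SCRs is projected. A minimal sketch with CoC = 6% as in the text; the SCR projection and the flat 2% rates are illustrative assumptions, not figures from the thesis.

```python
COC = 0.06  # Cost of Capital rate prescribed under Solvency II

def risk_margin(scr_by_year, rates_by_year):
    """RM = CoC * sum_t SCR_t / (1 + r_{t+1})^(t+1).

    `scr_by_year[t]` is the projected SCR_t and `rates_by_year[t]` the
    risk-free rate r_{t+1} used to discount it over t + 1 years.
    """
    rm = 0.0
    for t, scr in enumerate(scr_by_year):
        r = rates_by_year[t]
        rm += scr / (1.0 + r) ** (t + 1)
    return COC * rm

scr_projection = [100.0, 80.0, 60.0, 40.0, 20.0]  # SCR_0 .. SCR_4 (illustrative)
flat_rates = [0.02] * 5                            # flat 2% curve (illustrative)
rm = risk_margin(scr_projection, flat_rates)
```

In practice the infinite sum is truncated once the projected SCR has run off to zero, as the finite projection here implicitly does.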
As mentioned in the beginning of this chapter, the main purpose of Pillar 1 is for companies
to quantify their risks by calculating their capital requirements. The Solvency Capital Requirement
(SCR) is the amount of capital an insurance company needs to hold in order to avoid
increasing supervisory interventions from EIOPA. There are two ways to calculate the
SCR: by means of a standard formula or by an internal model. We start by discussing the standard formula.
4.1.1 Standard formula
The standard formula is the method suggested by Solvency II itself. First we define the
Net Asset Value (NAV) at time t as

NAV_t := A_t - BEL_t, \qquad (12)

where A_t is the market value of the assets at time t and BEL_t is the best estimate liabilities
at time t. In the standard formula the SCR for longevity risk can now be calculated as

SCR_{long}^{Shock}(t) = NAV_0 - (NAV_0 \mid \text{Longevity Shock}), \qquad (13)
where a longevity shock entails a 20% decrease of the mortality rates for every age [31]. EIOPA
provides a valid approximation of this value, under the condition that the nature,
complexity and scale of the risk make this proportionate. Companies that want to use this approximation
need to perform an assessment of the risk to prove this. In this assessment any deviations
need to be justified as well [1]. The simplification provided by EIOPA is the following:

SCR_{long} = 0.2 \cdot q \cdot n \cdot 1.1^{\frac{n-1}{2}} \cdot BEL_t. \qquad (14)
In this formula q denotes the expected average mortality rate of all insured persons, weighted
by the insured sum. Secondly, n denotes the modified duration in years of the payments to beneficiaries
included in the best estimate, i.e.

n = \frac{1}{1 + r} \cdot \sum_{t=1}^{\infty} \frac{t \cdot PVCF_t}{PVTCF},

where r denotes the current risk-free interest rate, PVCF_t denotes the present value of the
cash flow at time t and PVTCF denotes the present value of the total cash flow. In equation
(14) the BEL_t term indicates the best estimate liabilities that are subject to longevity risk [1].
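Equation (14) and the modified duration n can be evaluated directly. The sketch below uses a level 10-year cash flow pattern and illustrative values for q, r and BEL; none of these numbers come from the thesis.

```python
def modified_duration(cash_flows, r):
    """n = (1/(1+r)) * sum_t t * PVCF_t / PVTCF, with t starting at 1."""
    pvcf = [cf / (1.0 + r) ** t for t, cf in enumerate(cash_flows, start=1)]
    pvtcf = sum(pvcf)
    weighted = sum(t * pv for t, pv in enumerate(pvcf, start=1))
    return weighted / pvtcf / (1.0 + r)

def scr_longevity(q, n, bel):
    """EIOPA simplification: SCR_long = 0.2 * q * n * 1.1^((n-1)/2) * BEL."""
    return 0.2 * q * n * 1.1 ** ((n - 1) / 2.0) * bel

cash_flows = [100.0] * 10  # level annuity payments over 10 years (illustrative)
r = 0.02                   # illustrative risk-free rate
n = modified_duration(cash_flows, r)
scr = scr_longevity(q=0.01, n=n, bel=1000.0)  # q and BEL illustrative
```

The 1.1^((n−1)/2) factor makes the charge grow with duration, reflecting that long-dated liabilities are more sensitive to a permanent mortality shock.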
The overall Solvency Capital Requirement can be calculated by aggregating the individual
Solvency Capital Requirements using the scheme in figure 9.

Figure 9: Solvency II SCR Risk Scheme. Source: Manolache, A.E.D. [29].

In this chart the various risks that insurance companies and pension funds are exposed to are
depicted. In this thesis we will only focus on the longevity risk within life risk². For life risk the overall SCR_{Life} can be
calculated using the following formula:

SCR_{Life} = \sqrt{\sum_{i,j} CorrL_{i,j} \cdot SCR_i \cdot SCR_j},
where CorrLi,j can be obtained from the correlation table in figure 10.
Figure 10: Life Risk correlation scheme. Source: European Union [1].
The Basic Solvency Capital Requirement (BSCR) can be calculated in a similar fashion using

BSCR = \sqrt{\sum_{i,j} CorrB_{i,j} \cdot SCR_i \cdot SCR_j},

where CorrB_{i,j} can be obtained from the correlation table in figure 11 [1].
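Both aggregation formulas have the same form, a correlation-weighted quadratic sum under a square root. A minimal sketch with two sub-modules and an illustrative correlation of 0.25 (not the Solvency II correlation tables); note that any correlation below 1 yields an aggregate SCR below the plain sum, which is the diversification benefit the formula captures.

```python
import math

def aggregate_scr(scrs, corr):
    """SCR = sqrt( sum_ij Corr_ij * SCR_i * SCR_j ).

    `scrs` lists the sub-module SCRs and `corr` is the matching
    correlation matrix, given as a list of rows.
    """
    total = 0.0
    for i, si in enumerate(scrs):
        for j, sj in enumerate(scrs):
            total += corr[i][j] * si * sj
    return math.sqrt(total)

scrs = [100.0, 50.0]                     # two sub-module SCRs (illustrative)
corr = [[1.0, 0.25], [0.25, 1.0]]        # illustrative correlation matrix
agg = aggregate_scr(scrs, corr)          # below the undiversified sum of 150
```

With full correlation (all entries 1) the formula collapses to the plain sum of the sub-module SCRs, so the square-root aggregation never rewards risks that move perfectly together.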
4.1.2 Internal model
The EU and EIOPA also offer insurance companies the opportunity to use an internal
model. This encourages companies to hedge the risks they are exposed to, in order to decrease
the amount of capital they have to hold. In the internal model the Solvency Capital
Requirement is defined as the amount of capital that would be needed to overcome all losses
that could occur in a one-year time frame with probability 0.995. Specifically for longevity
²In this thesis we focus on hedging longevity risk in an annuity portfolio. Annuity portfolios are not directly
Figure 11: BSCR Risk correlation scheme. Source: European Union [1].
risk this boils down to:

SCR_{long}^{VaR}(t) = \mathrm{argmin}_x \; P\!\left( NAV_t - \frac{NAV_{t+1}}{1 + r_{t+1}} > x \right) \leq 0.005, \qquad (15)

where r_t is the annual risk-free interest rate at time 0 for maturity t.
The Minimum Capital Requirement (MCR) is the strict minimum amount of capital an
insurance company needs to reserve. Below this level companies would lose their license. The
MCR is part of the SCR and is defined as

MCR_{long}^{VaR}(t) = \mathrm{argmin}_x \; P\!\left( NAV_t - \frac{NAV_{t+1}}{1 + r_{t+1}} > x \right) \leq 0.15,

but it is always larger than 25% of the SCR and always smaller than 45% of the SCR. The last constraint is that the MCR is always at least €3.7 million for life insurance companies [1].
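In practice the internal-model SCR and MCR are read off as empirical quantiles of simulated one-year discounted losses. The sketch below uses a placeholder normal loss distribution rather than the annuity portfolio model of chapter 5; the quantile levels 0.995 and 0.85 match the definitions above.

```python
import random

random.seed(7)

def var_quantile(losses, level):
    """Smallest x with P(loss > x) <= 1 - level, from simulated losses."""
    s = sorted(losses)
    idx = min(int(level * len(s)), len(s) - 1)
    return s[idx]

# placeholder one-year discounted NAV losses; in chapter 5 these would come
# from simulating kappa_t^P through the annuity portfolio
losses = [random.gauss(0.0, 10.0) for _ in range(100_000)]

scr = var_quantile(losses, 0.995)  # 99.5% VaR, as in equation (15)
mcr = var_quantile(losses, 0.85)   # 85% VaR, as in the MCR definition
```

Because the MCR sits at a much less extreme quantile, it is always well below the SCR for any reasonable loss distribution, consistent with the 25%–45% corridor above.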
4.2 Pillar 2
In Solvency II, Pillar 2 describes the supervisory review process, the governance requirements
and the responsibilities of some key functions within the insurance business. Every company that
Solvency II applies to is required to have a risk management team, an actuarial team, a
compliance team and an internal audit team. The organizational structure of these teams
is part of the governance requirements. Pillar 2 also requires every company to perform an Own Risk and Solvency Assessment
(ORSA) once every year. EIOPA defines ORSA as: "The entirety of the processes and
procedures employed to identify, assess, monitor, manage and report the short and long term
risks an insurance undertaking faces or may face and to determine the own funds necessary
to ensure that the undertaking's overall solvency needs are met at all times" [16]. What
distinguishes ORSA from the capital requirements mentioned in Pillar 1 is that the SCR and
the MCR refer to regulatory requirements, whereas ORSA is defined to be the whole process of being aware of, and reacting to, the risks the company is exposed to. Also, ORSA is unique to every
company, whereas the calculations of the SCR and MCR are likely to be more similar across companies, since these
processes are subject to the same constraints. Lastly, Pillar 1 is limited to a one-year time
horizon, while ORSA focuses on long-term business as well. ORSA even requires a company to
quantify its ability to meet the Pillar 1 capital requirements in the upcoming years [23].
4.3 Pillar 3
Lastly, Pillar 3 of the Solvency II framework contains disclosure requirements intended to enhance
transparency. The results of the capital requirement calculations in Pillar 1 and of review processes like
ORSA in Pillar 2 need to be disclosed in reports to EIOPA supervisors. Elements of these
reports that are not considered confidential need to be disclosed in an annually produced public Solvency and Financial Condition Report (SFCR). The main goal of these disclosure
requirements is to obtain more market discipline through transparency. Ultimately this will
decrease systemic risk in the whole sector as well, since everyone is more aware of the risks that
companies are exposed to [22]. Pillar 3 plays no important role in the remainder of this thesis.

The Solvency Capital Requirement gives insurance companies an incentive to hedge their
risks in order to minimize the amount of capital they need to reserve. Also, ORSA requires
insurance companies and pension funds that are exposed to longevity risk to identify, assess,
monitor, manage and report longevity risk. The last chapter of this thesis will present longevity-linked securities that can be used to manage longevity risk, and that are therefore of direct interest to these companies.
5 Hedging longevity risk
In the fourth chapter we introduced the capital requirements for longevity risk. The
capital requirements are designed to buffer against big losses that could occur in unfortunate
circumstances. The capital requirements decrease when the amount of longevity risk
a pension fund is exposed to decreases. This gives a pension fund an additional incentive
to hedge the longevity risk it faces, besides the incentive to avoid unnecessary tail risk in
general. In this chapter we will test the hedge effectiveness of longevity-linked securities in
a basic annuity portfolio. The annuitants in this portfolio are 67 years old. The results in
this chapter will show that in a simple unhedged pension annuity setting the 99.5% Value-at-Risk
of the loss can be roughly ten times larger than the 99.5% Value-at-Risk of the loss
for a portfolio completely hedged with a longevity-linked security. This raises two questions:
how should longevity risk be hedged, and what percentage of the face value should be hedged?
In this chapter the Survivor Forward, the Survivor Swap and the European Survivor Call
will be discussed, priced and tested for effectiveness in hedging longevity risk. A survivor forward is a financial instrument agreed upon at time t, that obliges the holder to exchange a
predetermined fixed amount, based on the expected survival rate, for a floating amount based
on the realized survival rate at maturity T. The survivor swap works similarly, but where
the survivor forward limits itself to only one payment date T, the survivor swap provides
multiple payment dates t + 1, t + 2, . . . , T [10]. The structure of these derivatives is well known
from hedging interest rate risk, and can potentially be effective in hedging longevity risk as well.
A loss in a hedged portfolio exposed to longevity risk is compensated by a positive cash flow
from the derivatives, while a loss on the derivatives themselves due to lower than expected survival
rates is compensated by lower payouts in the portfolio that was exposed to longevity risk.
The last derivative that will be discussed is the European survivor option, in particular the
European survivor call. The holder of the European survivor call has the right but not the
obligation to exchange a fixed amount based on the expected survival rates for the floating
amount based on the realized survival rates. The holder of this derivative will never exercise
the option if the realized survival rates are smaller than the expected survival rates, implying
that the potential loss is bounded by just the price paid for the derivative.
Mortality rates are not a continuously traded asset in the market. This means that longevity-linked
securities cannot be replicated in the market, implying that the market for
longevity-linked securities is incomplete [10]. Therefore the derivatives will be priced
by incorporating a market price of longevity risk λ into the model. Recall that
in the original model κ_t^P was modelled as an ARIMA(0,1,0) time series with \epsilon_t^P \sim N(0, 5.822),
as can be found in equation (11). The risk premium λ will be incorporated into this model by distorting \epsilon_t^P in such a way that it is N(-\lambda \cdot \sqrt{5.822}, \, 5.822) distributed. This will decrease
κ_t^P, representing a longevity shock. In this thesis prices for the derivatives will be calculated for
λ ∈ {0.10, 0.15, 0.20, 0.25, 0.30}. These values of λ are common in the literature and resemble
the market prices of longevity risk that are applied in the Dutch insurance market [40].
Since we have mortality data up to and including 2014, we take the risk-free interest rates
from 31 December 2014 and use this day as the issue date for the longevity-linked securities.
We use the risk-free interest rates provided by the U.S. Department of the Treasury [39], since
EIOPA did not yet provide risk-free interest rates in 2014. The risk-free interest rates
for maturities 1, 2, 3 and 5 years are obtained directly from the data; the 4-year rate is obtained by interpolation. The values are depicted in table 1.
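The risk-adjusted simulation described above amounts to shifting the mean of the ARIMA innovations by −λ·√5.822. A minimal sketch, using the drift and variance from equation (11); the starting value of κ and the number of simulated paths are illustrative assumptions.

```python
import math
import random

random.seed(1)

C = -1.8631     # drift from equation (11)
SIGMA2 = 5.822  # innovation variance from equation (11)

def simulate_kappa(kappa0, horizon, lam):
    """Terminal value of one risk-adjusted kappa path.

    The innovations are drawn from the distorted distribution
    N(-lam * sqrt(SIGMA2), SIGMA2); lam is the market price of risk.
    """
    sd = math.sqrt(SIGMA2)
    kappa = kappa0
    for _ in range(horizon):
        eps = random.gauss(-lam * sd, sd)  # distorted innovation
        kappa += C + eps
    return kappa

KAPPA0 = -40.0  # hypothetical last observed value of kappa_t^P

# average terminal value over many paths: a positive lambda pushes kappa
# down faster, i.e. mortality improves faster (a longevity shock)
mean_base = sum(simulate_kappa(KAPPA0, 30, 0.0) for _ in range(500)) / 500
mean_shocked = sum(simulate_kappa(KAPPA0, 30, 0.25) for _ in range(500)) / 500
```

Pricing a survivor forward or swap then amounts to converting such distorted κ paths into survival probabilities and discounting the resulting cash flows at the rates in table 1.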
5.1 Distortion operators
In this section we introduce the concept of distortion operators. A distortion operator is a
function g : [0, 1] → [0, 1] such that g is increasing with g(0) = 0 and g(1) = 1. Such a function
g transforms a given distribution into a new, distorted distribution [40]. Consider the risk-adjusted
distribution

F_{\tilde{X}}(x) = \Phi\!\left(\Phi^{-1}(F_X(x)) + \lambda\right), \qquad (16)

where F_X(x) is the cumulative distribution function of a random variable X, Φ is the cumulative
distribution function of the standard normal distribution and λ is the risk premium, which indicates the market price of longevity risk. Now define Wang's operator