
2.5 Prior and posterior distributions for the Lee-Carter model

2.5.4 Hyperparameters and starting values

probability φ is calculated as

$$\varphi = \min\left\{ \frac{f\!\left(\tilde{\sigma}^2_m \mid \Lambda\setminus\{\tilde{\sigma}^2_m\}\right)}{f\!\left(\sigma^2_m \mid \Lambda\setminus\{\sigma^2_m\}\right)} \cdot \frac{g\!\left(\sigma^2_m \mid \tilde{\sigma}^2_m\right)}{g\!\left(\tilde{\sigma}^2_m \mid \sigma^2_m\right)} \,;\; 1 \right\}.$$

Here, g(· | σ²_m) is the density of the lognormal distribution with respective parameters ln σ²_m − ½s²_{σ²_m} and s²_{σ²_m}; with this choice, a candidate drawn from the proposal has expectation exp(ln σ²_m − ½s²_{σ²_m} + ½s²_{σ²_m}) = σ²_m. As mentioned earlier, the methodology is applied to simulated data sets as well as to the empirical data set. It appears that the sampling procedure as described in Van Berkum (2018) does not work well when it is applied to the empirical data set. The reason is that the sample variance s²_{σ²_m} reaches higher values. As a result, the sampling distribution is very positively skewed, and only values extremely close to zero are drawn, which are almost never accepted. Therefore, when the model is applied to the empirical data set, proposals are drawn as follows. Given the current value σ²_m and the Metropolis-Hastings sampling variance s²_{σ²_m}, a candidate σ̃²_m is drawn from the truncated normal distribution

$$\tilde{\sigma}^2_m \sim TN\!\left(\sigma^2_m,\; s^2_{\sigma^2_m},\; 0,\; \infty\right),$$

where the last two parameters are the lower and upper bound, respectively.

The lower bound of zero imposes that the sampled values are always strictly positive, while the infinite upper bound leaves the proposals unbounded from above. Since the truncated normal proposal distribution is not symmetric, the acceptance probability φ is again calculated as

$$\varphi = \min\left\{ \frac{f\!\left(\tilde{\sigma}^2_m \mid \Lambda\setminus\{\tilde{\sigma}^2_m\}\right)}{f\!\left(\sigma^2_m \mid \Lambda\setminus\{\sigma^2_m\}\right)} \cdot \frac{g\!\left(\sigma^2_m \mid \tilde{\sigma}^2_m\right)}{g\!\left(\tilde{\sigma}^2_m \mid \sigma^2_m\right)} \,;\; 1 \right\}.$$

Here, g(· | σ²_m) is the density of the truncated normal distribution with respective parameters σ²_m, s²_{σ²_m}, 0 and ∞. Hence, the normal distribution of the logarithms is used as the proposal density for the simulated data sets, and we use the truncated normal density for proposals when we apply the model to the empirical data set.
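For concreteness, one Metropolis-Hastings step for σ²_m with the truncated normal proposal could be sketched as follows. This is a minimal sketch, not the implementation used here: log_target is a hypothetical placeholder for the log full conditional f(σ²_m | Λ∖{σ²_m}), and we rely on SciPy's truncnorm, whose truncation bounds are given in standardized units. The Hastings ratio g(σ²_m | σ̃²_m)/g(σ̃²_m | σ²_m) is needed because the truncation at zero makes the proposal asymmetric.

```python
import numpy as np
from scipy.stats import truncnorm

def mh_step_sigma2(sigma2_cur, s2, log_target, rng=None):
    """One Metropolis-Hastings step for sigma^2_m with a truncated normal
    proposal TN(current value, s2, 0, inf).

    log_target is a hypothetical placeholder for the log of the full
    conditional of sigma^2_m given the remaining parameters."""
    rng = np.random.default_rng() if rng is None else rng
    s = np.sqrt(s2)
    # SciPy parameterizes truncation bounds in standardized units:
    # a = (lower - loc) / scale, b = (upper - loc) / scale.
    a_cur = -sigma2_cur / s
    cand = truncnorm.rvs(a_cur, np.inf, loc=sigma2_cur, scale=s, random_state=rng)

    # Hastings correction: the proposal is not symmetric, because the
    # normalizing constant of the truncated normal depends on its mean.
    a_cand = -cand / s
    log_g_ratio = (truncnorm.logpdf(sigma2_cur, a_cand, np.inf, loc=cand, scale=s)
                   - truncnorm.logpdf(cand, a_cur, np.inf, loc=sigma2_cur, scale=s))

    log_phi = min(log_target(cand) - log_target(sigma2_cur) + log_g_ratio, 0.0)
    if np.log(rng.uniform()) < log_phi:
        return cand, True     # candidate accepted
    return sigma2_cur, False  # candidate rejected
```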

$$D^{pop_a}_{k,t,x} \sim \text{Poisson}\!\left(E^{pop_a}_{k,t,x}\, \mu^{pop_a}_{k,t,x}\right),$$

with

$$\log\!\left(\mu^{pop_a}_{k,t,x}\right) = \alpha_x + \beta_x \kappa_t + \gamma_k + \theta^{pop_a}_x,$$

where θ^{pop_a}_x = log(Θ^{pop_a}_x), and μ^{pop_a}_{k,t,x} denotes the baseline mortality in year group pop_a. As mentioned before, the following constraints are implemented to avoid identification problems:

$$\gamma_4 = 0 \qquad \text{and} \qquad \Theta^{pop_1}_x = 1.$$
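To make the model structure concrete, the sketch below evaluates the Poisson log-likelihood implied by the equations above for a single year group; the array shapes and names are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np
from scipy.stats import poisson

def log_likelihood_pop(D, E, alpha, beta, kappa, gamma, theta):
    """Poisson log-likelihood of the baseline model for one year group:
    log(mu_{k,t,x}) = alpha_x + beta_x * kappa_t + gamma_k + theta_x,
    where theta = log(Theta) for this year group.

    D, E: deaths and exposures with shape (K, T, X);
    alpha, beta, theta: length-X; kappa: length-T; gamma: length-K."""
    log_mu = (alpha[None, None, :]
              + beta[None, None, :] * kappa[None, :, None]
              + gamma[:, None, None]
              + theta[None, None, :])
    return poisson.logpmf(D, E * np.exp(log_mu)).sum()
```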

In the second step, we calculate the frequentist estimate of Θ^pf_x as follows:

$$\Theta^{pf}_x = \frac{\sum_{t=1}^{T}\sum_{k=1}^{K} D^{pf}_{k,t,x}\, I^{pf}_{k,t,x}}{\sum_{t=1}^{T}\sum_{k=1}^{K} E^{pf}_{k,t,x}\, \mu_{k,t,x}\, I^{pf}_{k,t,x}}.$$

This can be interpreted as the ratio between the number of deaths observed in the pension fund per age, and the total number of deaths that would have been expected on the basis of the exposure and the baseline mortality.
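This estimator translates directly into code; a minimal sketch, assuming arrays indexed as (k, t, x) and with I^pf the indicator of cells for which pension fund data is available:

```python
import numpy as np

def theta_pf_frequentist(D_pf, E_pf, mu, I_pf):
    """Frequentist estimate of Theta^pf_x: observed deaths in the fund
    divided by the deaths expected under the baseline mortality.

    D_pf, E_pf, mu, I_pf: arrays of shape (K, T, X); the result has
    shape (X,), one factor per age."""
    observed = (D_pf * I_pf).sum(axis=(0, 1))       # sum over k and t
    expected = (E_pf * mu * I_pf).sum(axis=(0, 1))
    return observed / expected
```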

Note that the data corresponding to the pension fund under consideration is not taken into account when estimating the baseline mortality, whereas in the Bayesian framework this data is also utilized for the estimation of the baseline mortality. The reason is that the frequentist estimate of Θ^pf_x would otherwise be biased. In the Bayesian model, Θ^pf_x is the ratio between the mortality rates experienced by the pension fund and the mortality rates experienced by the rest of the data. If, for example, the data of the pension fund under consideration were also used in the frequentist estimation, and the fund experiences high mortality rates relative to the rest of the data, the baseline mortality would be estimated higher than when the fund is excluded. The estimates for Θ^pf_x that follow from the second step would then be too low.

Starting values and hyperparameters

In order to verify that the resulting posterior distribution does not depend on the starting values of the parameters, we add noise to the starting values to obtain four different chains, which we subsequently calibrate. For the parameters α_x and β_x, we use the frequentist estimates as starting values. For the parameters γ_k and Θ^m_x with m ∈ {pf, pop_1, pop_2, . . . , pop_A}, we do not want to make any prior assumptions on their values, so we set them to zero and one, respectively. The hyperparameters for Θ^m_x have starting values ρ_m = 0.8 and σ²_m = 1, and are updated every iteration.

As mentioned before, the parameters of the priors should be chosen such that they contain little information. In this way, the influence of the prior is negligible, as desired. The way we obtain the hyperparameters is largely analogous to Van Berkum (2018). For all data sets that we encounter, we have chosen the parameters for the prior distributions as follows:

• For the Gamma prior for α_x, we set b_x = 0.01 and a_x = b_x · exp(α̂_x). The latter equation implies that the exponent of a variable that follows this prior distribution has expectation exp(α̂_x). The low value of b_x ensures that the prior distribution has a large variance, such that it contains little information. Note that the parameter a_x depends on age, whereas b_x is the same for all ages.

• For β_x we also use a Gamma distribution, with respective parameters l_x and m_x. Similar to the prior for α_x, we set m_x = 0.01 and l_x = m_x · exp(β̂_x).

• Similarly, for γ_k we use a Gamma distribution as well. We set its hyperparameters to q_x = r_x = 0.01. In this way, the exponent of a variable that follows this prior distribution has expectation one.

Note that when γ_k is equal to zero, this parameter has no influence on the mortality rates. In this way, we ensure that we do not make any prior assumption on whether an income class has been exposed to higher or lower mortality rates.

• For Θ^m_x with m ∈ {pf, pop_1, pop_2, . . . , pop_A}, we use a Gamma prior when we assume independence among ages. In order to avoid prior assumptions on whether this factor is below or above one, we set c^m_x = 1.

• When we assume dependence among ages in the age-specific factors Θ^m_x with m ∈ {pf, pop_1, pop_2, . . . , pop_A}, we use a lognormal prior. This prior distribution contains three fixed parameters. For the mean reversion hyperparameter we set μ_{ρ_m} = 0 and σ²_{ρ_m} = 1. The hyperparameter for the variance parameter is set to A_i = 10.

• The scale parameters of the proposal distributions, which are updated during calibration, are initialized as follows. The proposal distribution for β_x has initial scale parameter s²_{β_x} = 1. The proposals for the parameters that correspond to the lognormal prior, which is used in the Bayesian model that assumes dependence among ages, are drawn from proposal distributions with initial scale parameters s²_{Θ^m_x} = 0.5, s²_{ρ_m} = 0.5 and s²_{σ²_m} = 0.5, with m ∈ {pf, pop_1, pop_2, . . . , pop_A}. A sketch collecting these starting values and the prior hyperparameters above follows this list.
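Collected in one place, the starting values and prior hyperparameters above might look as follows; the names are illustrative, and the frequentist estimates alpha_hat and beta_hat are assumed to be available from an earlier fit.

```python
import numpy as np

def prior_hyperparameters(alpha_hat, beta_hat):
    """Prior hyperparameters and starting values as listed above;
    alpha_hat and beta_hat are the frequentist estimates per age."""
    return {
        # Gamma priors for alpha_x and beta_x (weakly informative)
        "a_x": 0.01 * np.exp(alpha_hat), "b_x": 0.01,
        "l_x": 0.01 * np.exp(beta_hat),  "m_x": 0.01,
        # Gamma prior for gamma_k: exponent has prior expectation one
        "q": 0.01, "r": 0.01,
        # Independence case: Gamma prior for Theta^m_x
        "c": 1.0,
        # Dependence case: lognormal prior with fixed parameters
        "mu_rho": 0.0, "sigma2_rho": 1.0, "A": 10.0,
        # Starting values for the hyperparameters of Theta^m_x
        "rho_m": 0.8, "sigma2_m": 1.0,
        # Initial scale parameters of the proposal distributions
        "s2_beta": 1.0, "s2_theta": 0.5, "s2_rho": 0.5, "s2_sigma2": 0.5,
    }
```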

Adaptations to the final chains

For each chain, we run 150,000 iterations. As mentioned earlier, to account for the autocorrelation, we apply thinning to the chains, which means that we only keep every 50th iteration.¹¹ By applying a burn-in period, we ensure that we disregard the iterations before the chains have converged. The burn-in period consists of the first 50,000 iterations. During this burn-in period, the scale parameters of the proposal distributions are updated every 100th iteration, by setting them equal to the sample variance of the chain up to that iteration. These updates only start after the 5,000th iteration; before that, the initial scale parameters are used for the proposal distributions. In this way, we ensure that there are enough iterations such that there is at least some variance in the chain, so that the scale parameters do not attain extremely low values. Also, when computing this sample variance, we only consider iterations after the 2,000th iteration.¹² Mainly for the mean reversion parameter ρ_m and the variance parameter σ²_m, the high initial scale parameters of the proposal distributions lead to very high values that would otherwise affect the scale parameter for the whole calibration severely. Once the scale parameter is decreased after the 5,000th iteration, the proposed and accepted values are less extreme and do not severely affect the scale parameter.

Compared to Van Berkum (2018), this process for updating the scale parameters is rather involved. However, we calibrate this model on data sets of varying sizes that require different scale parameters, and with this process the initial scale parameters do not affect the calibration very heavily, as desired.

¹¹ In Van Berkum (2018), more thinning was applied. This was needed because of the way they drew all values for β_x jointly. As a result, the proposed values for the parameters were rejected very often, which in turn results in autocorrelation.

¹² To be precise, we consider at least 5,000 iterations when updating the scale parameter, to ensure that the scale parameters are based on enough iterations. For example, after the 6,100th iteration, the scale parameter will be set to the sample variance of the
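The update schedule described above can be summarized in code. This is a sketch under the stated schedule (updates every 100th iteration, starting only after the 5,000th, and ignoring the first 2,000 iterations when computing the sample variance); since footnote 12 is truncated in the source, the exact window is partly an assumption.

```python
import numpy as np

def update_scale(chain, it, current_scale, start=5_000, every=100, skip=2_000):
    """Proposal scale to use at iteration `it` during the burn-in.

    chain: sampled values of one parameter up to iteration `it`.
    Before `start`, the initial scale is kept; afterwards, every
    `every` iterations the scale is set to the sample variance of
    the chain, ignoring the first `skip` iterations (which may be
    extreme under the initial scale)."""
    if it <= start or it % every != 0:
        return current_scale
    return float(np.var(chain[skip:it], ddof=1))
```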

3 Data

In this section, we provide more information on the data used in this research. We first provide insights into the country-wide data that is used for estimating the general time trend and for simulating the data. After this, the procedure for simulating the different data sets is explained. Lastly, the data set obtained from the pension funds is described.

3.1 Country-wide data

As mentioned before, we expect that the data from pension funds lacks the time dimension needed to estimate the general long-term time trend accurately. Therefore, this time trend is estimated by fitting the Lee-Carter model on the data of the whole Dutch population. The parameters extracted from these results are also used for simulating the data sets. As in many similar studies, the country-wide data is obtained from the Human Mortality Database (HMD). This data set consists of the total exposures and total observed numbers of deaths per year and per age. In order to find the time trend, we used the years 1950 to 2019 and ages 45 to 90.

Although more years of data were available, the observed mortality in the years before 1950 shows different patterns than in the more recent past. A sketch of a standard Lee-Carter fit is given below.
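As an illustration of such a fit, the classical SVD-based estimation of Lee and Carter (1992) can be sketched as follows; whether the time trend here is estimated by SVD or by Poisson maximum likelihood is not specified in this excerpt, so this is one standard option.

```python
import numpy as np

def fit_lee_carter_svd(deaths, exposures):
    """Classical SVD fit of the Lee-Carter model
    log m(x, t) = alpha_x + beta_x * kappa_t,
    with deaths and exposures as arrays of shape (X, T),
    e.g. ages 45-90 by years 1950-2019."""
    log_m = np.log(deaths / exposures)              # central death rates
    alpha = log_m.mean(axis=1)                      # average age profile
    U, s, Vt = np.linalg.svd(log_m - alpha[:, None], full_matrices=False)
    beta, kappa = U[:, 0], s[0] * Vt[0]             # leading singular pair
    if beta.sum() < 0:                              # SVD sign is arbitrary
        beta, kappa = -beta, -kappa
    # Usual identification constraints: sum(beta) = 1, sum(kappa) = 0;
    # the shift in kappa is absorbed into alpha.
    alpha = alpha + beta * kappa.mean()
    kappa = (kappa - kappa.mean()) * beta.sum()
    beta = beta / beta.sum()
    return alpha, beta, kappa
```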

Figure 1: Observed deaths, exposures and death rates in the whole country per age for males, aggregated over all years.

In Figures 1 and 2, we depict the observed numbers of deaths and the years of exposure per age for males and females, respectively. In the years of exposure, we see for both genders an inverse relationship between age and exposure. This is expected, as a newborn individual is more likely to reach the lower ages than the higher ages. The deaths are such that the logarithm of the ratio of observed deaths to observed exposures increases smoothly with age, as expected. Also, note that mortality is lower for women than for men, which is a commonly known phenomenon.

The observed exposures and deaths per year are depicted in Figures 3 and 4. The exposures increase over time, which could be attributed to the increase in life expectancy: since we only consider ages 45 to 90, the number of individuals with ages in this range has increased over time.

Figure 2: Observed deaths, exposures and death rates in the whole country per age for females, aggregated over all years.

Figure 3: Observed deaths, exposures and death rates in the whole country per year for males, aggregated over all ages.

Figure 4: Observed deaths, exposures and death rates in the whole country per year for females, aggregated over all ages.

Perhaps unexpectedly, the ratio of the observed deaths to the exposures does not decline very much. The reason for this is that the average age generally increases over time as well: over the years, the number of people aged 90 increased considerably relative to the number of people aged 45. As shown before, this has an increasing effect on mortality rates. Apparently, those effects cancel each other out to some extent. Therefore, calculating the trend in observed mortality per age might provide valuable insights, as this isolates the effect of the decrease in death rates per age. The results are shown in Figure 5. It appears that, mostly for females, the logarithm decreases smoothly by approximately one. Note that Figures 3 and 4 show a decrease in log mortality of approximately 0.4.

Figure 5: The logarithms of the ratio of observed deaths to exposures in the whole country per year for males (left) and females (right), for selected ages.