

4.1.4 Forecasting future deaths

mortality rates for age group 75 are much smaller than those for age group 50, which could be attributed to the higher predictability of the death rates when mortality is higher. Also, the wide intervals of the fund-specific mortality rates compared to those of the baseline mortality rates reflect the uncertainty in $\Theta^{pf}_x$. Interestingly, we observe that in almost all graphs the green plane, associated with the Bayesian model that uses the lognormal prior, is closer to the light orange plane that is associated with the model that uses the Gamma prior. This is directly related to the fact that the mean-reverting process in the age-specific factors moves the factors towards the parameter value that is used for the simulations of the deaths.

Lastly, we observe that for the largest data sets, the parameter uncertainty hardly affects the prediction intervals. As in Van Berkum (2018), the widths of these intervals are almost entirely caused by the uncertainty in the projections of $\kappa_t$. They almost coincide with those related to the frequentist estimates.17

Thus, we conclude that the long-run prediction intervals for the mortality rates depend strongly on the size of the data sets, both in terms of exposure and the number of historical years. This is caused by the width of the confidence interval of $\beta_x$. We also see that the short-run prediction intervals of the fund-specific mortality rates depend heavily on the confidence interval of $\Theta^{pf}_x$, which relates largely to the size of the data set in terms of exposure. However, in the long term, this effect decreases.

yearly exposures are predetermined, whereas they actually are not. When the mortality rates are very high, the individual exposures are very low. In order to obtain a total exposure equal to the predetermined exposure, the number of individuals must then be much higher than the exposure. Consequently, the assumption that the exposure is predetermined is not valid when the mortality rates are very high. Therefore, we assume that the number of individuals is predetermined and equal to the exposure rounded to the closest integer.18 In this way, the number of individuals at the start of the year is used, rather than the total number of years that will be lived in that year. From these numbers of individuals, the number of deaths is drawn from a binomial distribution using the death probabilities. Furthermore, in practice, the technical provisions are calculated from the number of individuals rather than the years of exposure. Thus, the number of deaths is drawn from the following distribution, given the fund-specific mortality rates $\mu^{pf}_{k,t,x}$:

$$D^{pf}_{k,t,x} \sim \text{binomial}\left(E^{pf}_{k,t,x},\; 1 - \exp\left(-\mu^{pf}_{k,t,x}\right)\right).$$
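The draw described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used for the results: the function name `simulate_deaths`, the exposure of 255.7 and the mortality rate of 0.044 are hypothetical values chosen for the example, and the binomial draw is built from Bernoulli trials so that only the standard library is needed.

```python
import math
import random


def simulate_deaths(n_individuals, mu, rng):
    """Draw a death count from Binomial(n, q) with q = 1 - exp(-mu)."""
    q = 1.0 - math.exp(-mu)  # one-year death probability implied by the rate mu
    # A sum of n independent Bernoulli(q) trials is Binomial(n, q).
    return sum(1 for _ in range(n_individuals) if rng.random() < q)


rng = random.Random(2020)  # fixed seed so the sketch is reproducible
exposure = 255.7           # hypothetical yearly exposure
n = round(exposure)        # number of individuals: exposure rounded to an integer
mu = 0.044                 # hypothetical fund-specific mortality rate

draws = [simulate_deaths(n, mu, rng) for _ in range(2000)]
mean_deaths = sum(draws) / len(draws)
expected = n * (1.0 - math.exp(-mu))
print(f"simulated mean {mean_deaths:.2f}, theoretical mean {expected:.2f}")
```

The simulated mean should lie close to the theoretical mean $n\,(1 - \exp(-\mu))$, while individual draws scatter around it; that scatter is what the text refers to as binomial noise.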

The forecasted death probabilities correspond to five different approaches.

• The first approach uses the prediction intervals of the mortality rates that correspond to the parameters that are used for the simulation of the data sets. This approach does not take binomial noise into account: the volatility in the death counts corresponds solely to the uncertainty in the projections of future $\kappa_t$. (First column of the tables)

• The second approach is similar to the first approach but uses the parameters obtained by means of the frequentist approach. (Second column)

• The third approach is similar to the second approach in terms of mortality rates, but we also simulate the death counts using the binomial distribution with the probability parameter equal to the death probability as described above. (Third column)

• The scenarios of the death rates for the fourth approach are obtained by means of the Bayesian model that assumes the Gamma prior. Otherwise, the approach is similar to the third approach. Note that by using the Bayesian model, parameter uncertainty is also taken into account. (Fourth column)

• The fifth approach is similar to the fourth approach, but uses the parameters obtained by the Bayesian model with the lognormal prior. (Fifth column)
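To make the difference between the uncertainty layers in these approaches concrete, the following Monte Carlo sketch stacks them one by one for a single age group. It is an illustration only and rests on assumptions that are not taken from this section: a log-bilinear rate $\mu = \Theta \exp(\alpha + \beta\kappa)$, a random walk with drift for $\kappa_t$, and normal and lognormal draws for $\beta$ and $\Theta$ as a stand-in for posterior samples. All numerical values and function names are hypothetical.

```python
import math
import random
import statistics

rng = random.Random(1)

# Hypothetical values for one age group; none are taken from the paper.
N = 256                              # individuals in the age group
ALPHA, BETA, THETA = -3.1, 1.0, 1.0  # age effect, sensitivity to kappa, fund factor
DRIFT, SIGMA = -0.01, 0.02           # random-walk-with-drift parameters for kappa
HORIZON, N_SCEN = 20, 4000


def death_prob(beta, theta, kappa):
    mu = theta * math.exp(ALPHA + beta * kappa)  # assumed fund-specific rate
    return 1.0 - math.exp(-mu)


def project_kappa():
    """One scenario for kappa at the end of the horizon."""
    kappa = 0.0
    for _ in range(HORIZON):
        kappa += DRIFT + rng.gauss(0.0, SIGMA)
    return kappa


def binomial(n, q):
    return sum(1 for _ in range(n) if rng.random() < q)


ts, ts_bn, ts_bn_pu = [], [], []
for _ in range(N_SCEN):
    q = death_prob(BETA, THETA, project_kappa())
    ts.append(N * q)              # TS: expected deaths, no sampling noise
    ts_bn.append(binomial(N, q))  # TS + BN: binomial draw around that rate
    # TS + BN + PU: the parameters are drawn as well (stand-in for posterior draws)
    beta = rng.gauss(BETA, 0.5)
    theta = math.exp(rng.gauss(0.0, 0.1))
    ts_bn_pu.append(binomial(N, death_prob(beta, theta, project_kappa())))

sd_ts, sd_ts_bn, sd_ts_bn_pu = (statistics.pstdev(x) for x in (ts, ts_bn, ts_bn_pu))
print(f"SD  TS: {sd_ts:.2f}  TS+BN: {sd_ts_bn:.2f}  TS+BN+PU: {sd_ts_bn_pu:.2f}")
```

With a group of this size the binomial layer dominates the time-series layer, and the parameter layer adds a further, smaller amount. The relative sizes depend entirely on the assumed values.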

18 Technically, setting the number of individuals equal to the exposure relies on the assumption that the composition of the data set in terms of age groups is constant over time, also when the future composition is simulated using the mortality rates. Suppose we compare the exposure $E_{k,70,2020}$ of people aged 70 and the number of individuals aged 70 in the year 2020. If in 2020 the number of individuals aged 69 is too low or the mortality for that age group in that year is too high, too few people become 70 in the year 2020, such that $E_{k,70,2020}$ underestimates the number of individuals. As we harmonize the relative distribution over the ages with that of the empirical data set, this is a fair assumption for most ages. However, between ages 65 and 68 there is an abrupt increase in exposure, because non-actives appear in the data set when they retire.

The results for the smallest, the medium and the largest data sets in terms of yearly exposure are provided in Tables 1, 2 and 3, respectively. From investigating the confidence intervals of the forecasted fund-specific mortality rates, we concluded that when the time horizon increases, the width of the interval increases as well. Hence, one would expect the standard deviations of the number of deaths to increase with the time horizon as well. However, for a random variable $X \sim \text{binomial}(n, q)$ the variance equals $\text{Var}(X) = nq(1 - q)$. As we only consider death probabilities below 0.5, a lower death probability implies a lower variance of the binomial distribution. For instance, for the smallest data set the standard deviation of the number of deaths increases in time for the frequentist approach that does not take binomial noise into account, as can be seen in the second column. In contrast, when binomial noise is also taken into account in the third column, the standard deviation decreases when the time horizon increases. This is caused by the fact that the mortality rates decrease over time. We conclude that the presented standard deviations are not only driven by the volatility in the mortality rates, but also by the level of the mortality rates.
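The interplay between the two sources of variance can be checked with the law of total variance, $\text{Var}(D) = \mathbb{E}[nq(1-q)] + \text{Var}(nq)$. The sketch below applies it to the second-column values for age group 75 in the panel with 9 historical years of Table 1, using the 256 individuals stated in the caption of that table. The function name is our own, and the tabulated inputs are rounded simulation results, so the outcome is only an approximation.

```python
import math


def sd_with_binomial_noise(n, mean_ts, sd_ts):
    """SD of D ~ Binomial(n, q) when n*q has mean `mean_ts` and SD `sd_ts`.

    Law of total variance: Var(D) = E[n q (1 - q)] + Var(n q),
    with E[n q (1 - q)] = E[n q] - E[(n q)^2] / n.
    """
    var_ts = sd_ts ** 2
    expected_binomial_var = mean_ts - (var_ts + mean_ts ** 2) / n
    return math.sqrt(expected_binomial_var + var_ts)


n = 256  # individuals in age group 75, smallest data sets (caption of Table 1)
# (year, mean, SD) from the second column (Freq, TS), 9 historical years
freq_ts = [(2020, 11.05, 0.27), (2030, 10.03, 0.72), (2040, 9.21, 0.87)]

results = {}
for year, mean_ts, sd_ts in freq_ts:
    results[year] = sd_with_binomial_noise(n, mean_ts, sd_ts)
    print(year, round(results[year], 2))
```

This gives standard deviations of roughly 3.26, 3.19 and 3.10, against 3.27, 3.19 and 3.10 in the third column. The binomial term shrinks as the expected number of deaths falls from 11.05 to 9.21, and this outweighs the growth of the time-series term.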

We observe that when the mortality rates are lower, as is the case for lower ages, the binomial noise decreases considerably. Also, in the smallest data set the binomial noise dominates the uncertainty that results from the future time series $\kappa_t$. However, for the aggregate of all considered ages, the effect of binomial noise is very small, especially for the data sets with the highest exposures, as the standard deviations in the second and third column are very comparable. Thus, when the binomial noise is applied to more individuals, its impact on the standard deviation disappears.
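The vanishing share of binomial noise can be read off the tables directly. The sketch below takes the standard deviations of the second and third column for the total of all age groups in 2020 (9 historical years) from Tables 1, 2 and 3 and attributes the difference in variance to binomial noise. Because the tabulated values are rounded Monte Carlo estimates, the shares are indicative only.

```python
# (label, SD of Freq TS, SD of Freq TS + BN): total of all age groups, 2020,
# 9 historical years, read from Tables 1, 2 and 3.
rows = [
    ("smallest", 3.95, 13.53),
    ("medium", 48.6, 65.9),
    ("largest", 540.0, 562.0),
]

shares = {}
for label, sd_ts, sd_ts_bn in rows:
    # Law of total variance: Var(TS + BN) = Var(TS) + E[binomial variance],
    # so the difference of the squared SDs estimates the binomial-noise part.
    bn_var = sd_ts_bn ** 2 - sd_ts ** 2
    shares[label] = bn_var / sd_ts_bn ** 2
    print(f"{label:>8}: binomial noise explains {shares[label]:.0%} of the variance")
```

The share drops from roughly 91% for the smallest data set to about 46% for the medium and about 8% for the largest data set.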

We can isolate the effect of parameter uncertainty by comparing the third column with the fourth column. After all, from the confidence intervals of the parameters, we concluded that for all parameters, the posterior distributions that correspond to the model with the Gamma prior are centred around the frequentist estimates.

For the smallest data sets in terms of exposure, we see that for the data set with the most historical years, the difference between the third and fourth column is very small. In this case, the effect of the binomial noise is much stronger than the effect of including parameter uncertainty. However, when the number of historical years is lower, the inclusion of parameter uncertainty leads to much higher standard deviations in long-term projections. From the analysis of the confidence intervals of the parameters we have learned that, except for that of $\beta_x$, the confidence intervals of the posterior distributions narrow only slightly when the number of historical years increases. As a result, the standard deviations of predicted future death counts for only one year ahead are similar for different numbers of historical years. Note that in these cases the effect of parameter uncertainty is dominated by the binomial noise. However, we found that the confidence interval of $\beta_x$ narrows strongly as the number of historical years increases. Logically, this in turn leads to more uncertainty in the long-term predictions when the data set contains a small number of historical years. This can be verified from the results by analyzing how the difference between the standard deviations in the third and fourth column grows with the time horizon. For all sizes of data sets in terms of exposure, this increase is strongest when the number of historical years is low. This phenomenon could be attributed to the fact that when the number of historical years is low, the uncertainty of $\beta_x$ is by far the highest relative to the other parameters. For the smallest data sets in terms of exposure, the impact of parameter uncertainty dominates the impact of binomial noise on long-term predictions. Further, for the largest data set in terms of exposure and number of historical years, parameter uncertainty barely affects the standard deviations.

| Historical years | Age group | Year | Sim (TS) Mean | Sim (TS) SD | Freq (TS) Mean | Freq (TS) SD | Freq (TS + BN) Mean | Freq (TS + BN) SD | Bayes G (TS + BN + PU) Mean | Bayes G (TS + BN + PU) SD | Bayes LN (TS + BN + PU) Mean | Bayes LN (TS + BN + PU) SD |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 9 | 50 | 2020 | 0.33 | 0.01 | 0.22 | 0.05 | 0.22 | 0.47 | 0.32 | 0.61 | 0.27 | 0.53 |
| 9 | 50 | 2030 | 0.29 | 0.03 | 0.10 | 0.08 | 0.10 | 0.32 | 0.28 | 0.78 | 0.27 | 0.74 |
| 9 | 50 | 2040 | 0.26 | 0.03 | 0.05 | 0.08 | 0.05 | 0.23 | 0.36 | 2.17 | 0.43 | 2.58 |
| 9 | 75 | 2020 | 9.78 | 0.25 | 11.05 | 0.27 | 11.05 | 3.27 | 11.20 | 3.63 | 10.33 | 3.32 |
| 9 | 75 | 2030 | 8.78 | 0.72 | 10.03 | 0.72 | 10.02 | 3.19 | 10.56 | 4.29 | 9.48 | 3.76 |
| 9 | 75 | 2040 | 7.94 | 0.88 | 9.21 | 0.87 | 9.20 | 3.10 | 10.29 | 5.65 | 9.04 | 4.98 |
| 9 | Total | 2020 | 173.47 | 3.81 | 178.56 | 3.95 | 178.56 | 13.53 | 183.42 | 14.56 | 170.47 | 14.47 |
| 9 | Total | 2030 | 157.82 | 11.19 | 167.33 | 7.21 | 167.36 | 14.47 | 187.89 | 23.40 | 163.01 | 17.67 |
| 9 | Total | 2040 | 144.71 | 13.87 | 163.26 | 5.59 | 163.25 | 13.48 | 214.22 | 58.52 | 168.59 | 31.90 |
| 13 | 50 | 2020 | 0.32 | 0.01 | 0.60 | 0.00 | 0.61 | 0.78 | 0.64 | 0.85 | 0.28 | 0.53 |
| 13 | 50 | 2030 | 0.29 | 0.03 | 0.61 | 0.01 | 0.61 | 0.78 | 0.85 | 1.23 | 0.32 | 0.63 |
| 13 | 50 | 2040 | 0.25 | 0.03 | 0.62 | 0.01 | 0.62 | 0.79 | 1.25 | 2.69 | 0.42 | 1.00 |
| 13 | 75 | 2020 | 9.54 | 0.24 | 9.89 | 0.38 | 9.88 | 3.11 | 10.04 | 3.30 | 9.39 | 3.10 |
| 13 | 75 | 2030 | 8.56 | 0.70 | 8.42 | 1.03 | 8.43 | 3.04 | 8.86 | 3.36 | 8.11 | 3.12 |
| 13 | 75 | 2040 | 7.74 | 0.86 | 7.28 | 1.20 | 7.27 | 2.92 | 8.21 | 3.98 | 7.17 | 3.16 |
| 13 | Total | 2020 | 169.74 | 3.72 | 174.74 | 3.79 | 174.73 | 13.38 | 176.60 | 14.02 | 169.20 | 14.00 |
| 13 | Total | 2030 | 154.45 | 10.93 | 160.59 | 9.76 | 160.58 | 15.71 | 165.72 | 15.79 | 155.40 | 16.47 |
| 13 | Total | 2040 | 141.63 | 13.55 | 150.77 | 9.92 | 150.75 | 15.50 | 162.34 | 19.20 | 147.18 | 17.30 |
| 17 | 50 | 2020 | 0.32 | 0.01 | 0.38 | 0.01 | 0.38 | 0.62 | 0.43 | 0.68 | 0.40 | 0.64 |
| 17 | 50 | 2030 | 0.28 | 0.03 | 0.42 | 0.03 | 0.42 | 0.65 | 0.52 | 0.80 | 0.44 | 0.69 |
| 17 | 50 | 2040 | 0.25 | 0.03 | 0.45 | 0.04 | 0.45 | 0.67 | 0.64 | 1.03 | 0.49 | 0.79 |
| 17 | 75 | 2020 | 9.31 | 0.23 | 9.83 | 0.23 | 9.82 | 3.09 | 9.96 | 3.22 | 9.35 | 3.05 |
| 17 | 75 | 2030 | 8.35 | 0.68 | 8.88 | 0.68 | 8.88 | 3.01 | 9.07 | 3.20 | 8.32 | 3.01 |
| 17 | 75 | 2040 | 7.55 | 0.84 | 8.10 | 0.83 | 8.09 | 2.93 | 8.41 | 3.21 | 7.52 | 2.97 |
| 17 | Total | 2020 | 166.09 | 3.64 | 165.19 | 4.55 | 165.20 | 13.30 | 166.97 | 13.72 | 161.49 | 13.59 |
| 17 | Total | 2030 | 151.15 | 10.68 | 147.18 | 12.71 | 147.17 | 17.35 | 150.49 | 17.07 | 144.55 | 17.14 |
| 17 | Total | 2040 | 138.63 | 13.25 | 132.87 | 14.95 | 132.84 | 18.70 | 138.17 | 17.84 | 131.64 | 18.05 |

Table 1

In this table, the means (Mean) and standard deviations (SD) of the predicted future numbers of deaths are presented for three different time horizons, for age group 50, age group 75 and all age groups together. The results are based on the smallest data sets in terms of exposure. Age group 50 consists of 147 individuals, age group 75 of 256 individuals, and the total of all age groups of 7,762 individuals in each year. The results are obtained from the parameters that are used for the simulation of the data (Sim), from the frequentist estimators (Freq), and from the Bayesian models that use the Gamma prior (Bayes G) and the lognormal prior (Bayes LN). We investigate three different approaches that include different uncertainties. The uncertainty indicated by “TS” follows from the uncertainty in future values of $\kappa_t$, “BN” indicates the uncertainty that results from the binomial noise, and for the Bayesian approaches we also include parameter uncertainty that follows from calibration, indicated by “PU”. The top panel shows the results for the data set with 9 historical years, the middle panel concerns the data set with 13 historical years, and the bottom panel the data set with 17 historical years.

| Historical years | Age group | Year | Sim (TS) Mean | Sim (TS) SD | Freq (TS) Mean | Freq (TS) SD | Freq (TS + BN) Mean | Freq (TS + BN) SD | Bayes G (TS + BN + PU) Mean | Bayes G (TS + BN + PU) SD | Bayes LN (TS + BN + PU) Mean | Bayes LN (TS + BN + PU) SD |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 9 | 50 | 2020 | 4.0 | 0.1 | 3.6 | 0.1 | 3.6 | 1.9 | 3.7 | 2.1 | 4.3 | 2.1 |
| 9 | 50 | 2030 | 3.5 | 0.3 | 3.1 | 0.3 | 3.1 | 1.8 | 3.3 | 2.3 | 4.0 | 2.4 |
| 9 | 50 | 2040 | 3.1 | 0.4 | 2.7 | 0.4 | 2.7 | 1.7 | 3.2 | 2.7 | 3.8 | 2.9 |
| 9 | 75 | 2020 | 115.1 | 2.9 | 116.1 | 2.5 | 116.1 | 10.9 | 116.2 | 11.9 | 116.3 | 11.4 |
| 9 | 75 | 2030 | 103.2 | 8.4 | 105.8 | 7.4 | 105.7 | 12.5 | 106.0 | 14.8 | 105.1 | 14.5 |
| 9 | 75 | 2040 | 93.4 | 10.4 | 97.0 | 9.3 | 97.0 | 13.4 | 97.8 | 17.0 | 95.9 | 16.6 |
| 9 | Total | 2020 | 2063.0 | 45.4 | 2110.5 | 48.6 | 2110.5 | 65.9 | 2117.5 | 67.0 | 2105.2 | 68.6 |
| 9 | Total | 2030 | 1876.6 | 133.2 | 1919.0 | 135.2 | 1918.9 | 141.7 | 1943.0 | 133.7 | 1920.0 | 140.9 |
| 9 | Total | 2040 | 1720.5 | 165.1 | 1766.1 | 160.0 | 1766.0 | 165.4 | 1811.7 | 150.5 | 1776.2 | 162.1 |
| 13 | 50 | 2020 | 3.9 | 0.1 | 4.7 | 0.1 | 4.7 | 2.2 | 4.8 | 2.3 | 4.3 | 2.1 |
| 13 | 50 | 2030 | 3.4 | 0.3 | 4.2 | 0.4 | 4.2 | 2.1 | 4.4 | 2.3 | 4.1 | 2.1 |
| 13 | 50 | 2040 | 3.0 | 0.4 | 3.8 | 0.5 | 3.8 | 2.0 | 4.1 | 2.4 | 3.9 | 2.2 |
| 13 | 75 | 2020 | 112.2 | 2.8 | 108.4 | 3.5 | 108.3 | 10.8 | 108.6 | 11.3 | 111.0 | 11.2 |
| 13 | 75 | 2030 | 100.7 | 8.2 | 94.4 | 9.9 | 94.4 | 13.8 | 94.9 | 14.3 | 97.1 | 14.3 |
| 13 | 75 | 2040 | 91.1 | 10.2 | 83.1 | 11.9 | 83.1 | 14.9 | 84.0 | 15.5 | 85.9 | 15.6 |
| 13 | Total | 2020 | 2018.5 | 44.3 | 2055.4 | 43.8 | 2055.4 | 62.1 | 2057.3 | 63.7 | 2049.4 | 64.2 |
| 13 | Total | 2030 | 1836.4 | 130.2 | 1876.6 | 127.6 | 1876.6 | 134.4 | 1880.9 | 133.9 | 1870.2 | 135.9 |
| 13 | Total | 2040 | 1683.9 | 161.4 | 1727.6 | 157.4 | 1727.5 | 162.5 | 1735.3 | 160.7 | 1722.1 | 163.3 |
| 17 | 50 | 2020 | 3.8 | 0.1 | 4.5 | 0.0 | 4.5 | 2.1 | 4.5 | 2.2 | 4.3 | 2.1 |
| 17 | 50 | 2030 | 3.3 | 0.3 | 4.3 | 0.1 | 4.3 | 2.1 | 4.4 | 2.2 | 4.0 | 2.1 |
| 17 | 50 | 2040 | 2.9 | 0.4 | 4.2 | 0.2 | 4.2 | 2.1 | 4.2 | 2.2 | 3.8 | 2.1 |
| 17 | 75 | 2020 | 109.5 | 2.8 | 113.5 | 2.4 | 113.5 | 10.7 | 113.2 | 11.2 | 113.2 | 10.9 |
| 17 | 75 | 2030 | 98.2 | 8.0 | 103.4 | 7.2 | 103.4 | 12.3 | 103.0 | 12.9 | 102.8 | 12.7 |
| 17 | 75 | 2040 | 88.8 | 9.9 | 94.9 | 9.0 | 94.9 | 13.2 | 94.4 | 13.8 | 94.1 | 13.7 |
| 17 | Total | 2020 | 1975.1 | 43.3 | 2024.0 | 46.6 | 2023.9 | 63.9 | 2025.5 | 65.0 | 2020.6 | 64.9 |
| 17 | Total | 2030 | 1797.2 | 127.2 | 1833.5 | 136.0 | 1833.4 | 142.3 | 1836.5 | 141.9 | 1830.9 | 142.3 |
| 17 | Total | 2040 | 1648.1 | 157.7 | 1674.7 | 167.6 | 1674.7 | 172.4 | 1679.6 | 171.4 | 1673.2 | 172.0 |

Table 2

In this table, the means (Mean) and standard deviations (SD) of the predicted future numbers of deaths are presented for three different time horizons, for age group 50, age group 75 and all age groups together. The results are based on the medium-sized data sets in terms of exposure. Age group 50 consists of 1,752 individuals, age group 75 of 3,046 individuals, and the total of all age groups of 80,500 individuals in each year. The results are obtained from the parameters that are used for the simulation of the data (Sim), from the frequentist estimators (Freq), and from the Bayesian models that use the Gamma prior (Bayes G) and the lognormal prior (Bayes LN). We investigate three different approaches that include different uncertainties. The uncertainty indicated by “TS” follows from the uncertainty in future values of $\kappa_t$, “BN” indicates the uncertainty that results from the binomial noise, and for the Bayesian approaches we also include parameter uncertainty that follows from calibration, indicated by “PU”. The top panel shows the results for the data set with 9 historical years, the middle panel concerns the data set with 13 historical years, and the bottom panel the data set with 17 historical years.

| Historical years | Age group | Year | Sim (TS) Mean | Sim (TS) SD | Freq (TS) Mean | Freq (TS) SD | Freq (TS + BN) Mean | Freq (TS + BN) SD | Bayes G (TS + BN + PU) Mean | Bayes G (TS + BN + PU) SD | Bayes LN (TS + BN + PU) Mean | Bayes LN (TS + BN + PU) SD |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 9 | 50 | 2020 | 47.8 | 1.4 | 43.8 | 1.4 | 43.9 | 6.8 | 44.3 | 7.4 | 49.4 | 7.3 |
| 9 | 50 | 2030 | 42.1 | 4.1 | 38.1 | 4.0 | 38.1 | 7.4 | 39.5 | 8.3 | 45.1 | 8.4 |
| 9 | 50 | 2040 | 37.4 | 4.9 | 33.5 | 4.8 | 33.5 | 7.5 | 35.6 | 9.1 | 41.7 | 9.3 |
| 9 | 75 | 2020 | 1381.1 | 34.7 | 1394.6 | 36.3 | 1394.6 | 51.6 | 1392.3 | 54.9 | 1408.1 | 54.3 |
| 9 | 75 | 2030 | 1239.0 | 101.4 | 1246.5 | 105.6 | 1246.4 | 111.1 | 1240.3 | 116.6 | 1255.9 | 116.5 |
| 9 | 75 | 2040 | 1120.7 | 124.9 | 1123.6 | 129.6 | 1123.6 | 133.8 | 1115.0 | 140.5 | 1130.3 | 140.6 |
| 9 | Total | 2020 | 24748 | 544 | 25349 | 540 | 25349 | 562 | 25352 | 564 | 25343 | 566 |
| 9 | Total | 2030 | 22512 | 1598 | 23132 | 1583 | 23132 | 1590 | 23146 | 1585 | 23129 | 1592 |
| 9 | Total | 2040 | 20639 | 1981 | 21278 | 1960 | 21278 | 1965 | 21306 | 1956 | 21281 | 1965 |
| 13 | 50 | 2020 | 46.4 | 1.4 | 47.8 | 0.8 | 47.8 | 7.0 | 47.8 | 7.4 | 49.1 | 7.2 |
| 13 | 50 | 2030 | 40.9 | 3.9 | 44.3 | 2.5 | 44.3 | 7.1 | 44.1 | 7.7 | 45.8 | 7.5 |
| 13 | 50 | 2040 | 36.3 | 4.8 | 41.3 | 3.2 | 41.3 | 7.2 | 41.1 | 8.1 | 43.0 | 7.9 |
| 13 | 75 | 2020 | 1347.1 | 33.9 | 1381.4 | 36.9 | 1381.4 | 51.9 | 1382.8 | 53.0 | 1382.8 | 52.6 |
| 13 | 75 | 2030 | 1208.5 | 99.0 | 1231.2 | 107.0 | 1231.2 | 112.3 | 1234.2 | 112.3 | 1234.5 | 112.2 |
| 13 | 75 | 2040 | 1093.0 | 121.9 | 1107.0 | 130.9 | 1107.0 | 135.1 | 1111.4 | 135.0 | 1111.7 | 134.7 |
| 13 | Total | 2020 | 24215 | 531 | 24803 | 547 | 24803 | 568 | 24812 | 568 | 24808 | 567 |
| 13 | Total | 2030 | 22030 | 1561 | 22557 | 1605 | 22557 | 1612 | 22576 | 1606 | 22573 | 1605 |
| 13 | Total | 2040 | 20200 | 1936 | 20676 | 1989 | 20676 | 1993 | 20703 | 1986 | 20700 | 1986 |
| 17 | 50 | 2020 | 45.1 | 1.3 | 48.1 | 1.0 | 48.1 | 7.0 | 48.3 | 7.3 | 49.4 | 7.2 |
| 17 | 50 | 2030 | 39.7 | 3.8 | 43.8 | 3.1 | 43.8 | 7.3 | 44.3 | 7.6 | 45.1 | 7.5 |
| 17 | 50 | 2040 | 35.3 | 4.6 | 40.1 | 3.8 | 40.1 | 7.4 | 40.8 | 7.8 | 41.6 | 7.8 |
| 17 | 75 | 2020 | 1313.9 | 33.1 | 1347.5 | 33.2 | 1347.5 | 49.1 | 1347.5 | 50.1 | 1347.1 | 49.8 |
| 17 | 75 | 2030 | 1178.6 | 96.5 | 1211.6 | 97.1 | 1211.6 | 102.9 | 1211.5 | 103.7 | 1210.8 | 103.7 |
| 17 | 75 | 2040 | 1066.0 | 118.9 | 1098.1 | 119.8 | 1098.1 | 124.1 | 1097.9 | 125.0 | 1097.4 | 124.9 |
| 17 | Total | 2020 | 23694 | 519 | 24209 | 537 | 24209 | 558 | 24218 | 558 | 24218 | 557 |
| 17 | Total | 2030 | 21559 | 1526 | 22003 | 1576 | 22003 | 1583 | 22019 | 1578 | 22022 | 1576 |
| 17 | Total | 2040 | 19770 | 1892 | 20156 | 1953 | 20156 | 1958 | 20178 | 1952 | 20184 | 1949 |

Table 3

In this table, the means (Mean) and standard deviations (SD) of the predicted future numbers of deaths are presented for three different time horizons, for age group 50, age group 75 and all age groups together. The results are based on the largest data sets in terms of exposure. Age group 50 consists of approximately 21,000 individuals, age group 75 of approximately 37,000 individuals, and the total of all age groups of 966,000 individuals in each year. The results are obtained from the parameters that are used for the simulation of the data (Sim), from the frequentist estimators (Freq), and from the Bayesian models that use the Gamma prior (Bayes G) and the lognormal prior (Bayes LN). We investigate three different approaches that include different uncertainties. The uncertainty indicated by “TS” follows from the uncertainty in future values of $\kappa_t$, “BN” indicates the uncertainty that results from the binomial noise, and for the Bayesian approaches we also include parameter uncertainty that follows from calibration, indicated by “PU”. The top panel shows the results for the data set with 9 historical years, the middle panel concerns the data set with 13 historical years, and the bottom panel the data set with 17 historical years.

By comparing the fourth and the fifth column we analyze the effect of the assumption of dependence among ages in the portfolio factors. From investigating the age-specific factors, we have found two effects of this assumption. On the one hand, the confidence intervals are narrower when we assume the lognormal prior. On the other hand, the intervals are more centred around the parameters that are used for the simulation of the data. Both effects had the most impact for the smallest data sets in terms of exposure. Later on, these effects were verified by analyzing the confidence intervals and prediction intervals of the mortality rates. From the means of the predicted death counts, we learn that under the lognormal prior these are often closer to the means corresponding to the simulation parameters in the first column. Also, for the data sets with the lowest yearly exposures, the resulting standard deviations are slightly smaller. However, for the larger data sets in terms of exposure, where the inclusion of parameter uncertainty does not affect the standard deviations, this effect is smaller.

In conclusion, for short-term predictions of death counts, the inclusion of parameter uncertainty leads to more uncertainty in future death counts, but this effect is dominated by the uncertainty that follows from binomial noise. When the data set is large, the parameters are less uncertain, such that the effect of parameter uncertainty vanishes. For long-horizon predictions, parameter uncertainty has more impact on the standard deviation of the death counts due to the volatility of $\beta_x$. When the data set consists of low exposures or few historical years, $\beta_x$ has more volatility, such that the parameter uncertainty dominates the binomial noise. Also, from comparing both considered Bayesian approaches, we conclude that the results are more credible when we use the lognormal prior. The standard deviations are often lower for smaller data sets, and the resulting means are often closer to those of the death counts that follow from the predictions that correspond to the simulation parameters.

The conclusions that we draw are similar to the conclusions in Van Berkum (2018). The only difference is that in their research, the effect of parameter uncertainty is barely relevant, even when the size of the portfolio decreases. This could be attributed to two differences between the analyses. First, the data set that they used consists of 50 historical years. Therefore, decreasing the portfolio did not lead to more volatility in $\beta_x$. Secondly, they reduced their empirical data set by dividing both the years of exposure and the numbers of deaths by the same factor. As a result, the death rates stayed the same, whereas, as we have derived earlier, decreasing the exposure leads to more volatility in the death rates. If this volatility is not taken into account, the uncertainty in, among others, $\beta_x$ is underestimated. However, as they based their parameters on many more historical years, it is questionable whether this would have been relevant.
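The difference between the two ways of reducing a portfolio can be illustrated with a small simulation. The sketch below is not a reproduction of either analysis: the death probability, the portfolio sizes and the number of years are hypothetical, and the years are simulated independently without any mortality trend. It compares observed death rates when deaths and exposure are divided by the same factor with those obtained when the smaller portfolio is simulated directly.

```python
import random
import statistics

rng = random.Random(7)
Q = 0.04                 # hypothetical one-year death probability
BIG, SMALL = 20000, 200  # hypothetical portfolio sizes (scale factor 100)
N_YEARS = 50


def binomial(n, q):
    return sum(1 for _ in range(n) if rng.random() < q)


# Large portfolio: simulated deaths and the resulting observed death rates.
deaths_big = [binomial(BIG, Q) for _ in range(N_YEARS)]
big_rates = [d / BIG for d in deaths_big]

# Option 1: divide deaths and exposure by the same factor -> identical rates.
factor = BIG / SMALL
scaled_rates = [(d / factor) / (BIG / factor) for d in deaths_big]

# Option 2: simulate the small portfolio directly -> noisier rates.
small_rates = [binomial(SMALL, Q) / SMALL for _ in range(N_YEARS)]

print("SD of rates, large portfolio:", statistics.pstdev(big_rates))
print("SD of rates, scaled down    :", statistics.pstdev(scaled_rates))
print("SD of rates, re-simulated   :", statistics.pstdev(small_rates))
```

Scaling leaves the observed rates, and hence their volatility, unchanged, whereas the directly simulated small portfolio shows much more volatile rates. In theory the standard deviation grows with the square root of the reduction factor.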