
3.3 Pension fund data

4.1.3 Forecasting results

Note that the pension fund under consideration is not used for the baseline mortality in the frequentist setting:

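$$
\Theta^{\mathrm{pop2}}_{x,\mathrm{BLUE}}
= \frac{\frac{1}{\#\mathrm{pop2}} \sum_{j \in \mathrm{pop2}} \dot{\mu}_{k,t,x} \dot{\Theta}^{j}_{x}}
       {\frac{1}{J-1} \sum_{j=1,\, j \neq k}^{J} \dot{\mu}_{k,t,x} \dot{\Theta}^{j}_{x}}
= \frac{\frac{1}{\#\mathrm{pop2}} \sum_{j \in \mathrm{pop2}} \dot{\Theta}^{j}_{x}}
       {\frac{1}{J-1} \sum_{j=1,\, j \neq k}^{J} \dot{\Theta}^{j}_{x}},
$$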

where $\#\mathrm{pop2}$ is the number of years included in group pop2. Since the two pension funds with the most historical years are also the pension funds with the lowest fund-specific factors used in the data generating process, the estimates are very low. The results are very similar to those for $\Theta^{\mathrm{pf}}_x$. The only difference is that the confidence intervals do not narrow at all as the number of historical years increases. This is because, by construction, group pop2 always consists of the same number of years. Thus, increasing the number of simulated years does not increase the number of observations $\{k, t, x\}$ in $O^{\mathrm{pop2}}$.
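The second equality holds because $\dot{\mu}_{k,t,x}$ does not depend on $j$, so it factors out of both averages and cancels. A minimal sketch of the ratio, assuming the fitted factors $\dot{\Theta}^{j}_{x}$ for one age $x$ are stored in an array indexed by $j = 0, \dots, J-1$ and that pop2 is given as a list of such indices; the function name `theta_pop2_blue` and the input values are hypothetical and only illustrate the formula.

```python
import numpy as np


def theta_pop2_blue(theta_dot, pop2, k, mu_dot=1.0):
    """Ratio of the mean fitted factor over the indices in pop2 to the
    mean over all indices j != k (hypothetical helper).

    theta_dot : fitted factors, one per index j = 0, ..., J-1
    pop2      : indices j belonging to group pop2
    k         : index excluded from the reference average
    mu_dot    : common factor mu_dot_{k,t,x}; it cancels in the ratio
    """
    theta_dot = np.asarray(theta_dot, dtype=float)
    J = theta_dot.size
    others = [j for j in range(J) if j != k]
    # (1 / #pop2) * sum over j in pop2
    numerator = np.mean(mu_dot * theta_dot[list(pop2)])
    # (1 / (J - 1)) * sum over j != k
    denominator = np.mean(mu_dot * theta_dot[others])
    return numerator / denominator


# Illustrative values only (not results from this thesis)
theta = [0.8, 0.9, 1.1, 1.2]
print(theta_pop2_blue(theta, pop2=[0, 1], k=3))
print(theta_pop2_blue(theta, pop2=[0, 1], k=3, mu_dot=0.02))
```

Both calls return the same value, which is the numerical counterpart of $\dot{\mu}_{k,t,x}$ cancelling between numerator and denominator.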

Another point worth noting is that the estimated mean reversion parameters $\rho_i$ are very close to one. In light of Van Berkum (2018), this suggests that a random walk would have been more appropriate. However, since the estimates of these parameters are so close to one, a random walk would probably have led to similar results.
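Why a mean reversion parameter close to one behaves almost like a random walk can be seen from the forecast moments. A minimal sketch, assuming a standard AR(1) form $x_t = m + \rho (x_{t-1} - m) + \varepsilon_t$ with $\varepsilon_t \sim N(0, \sigma^2)$; the exact specification governed by $\rho_i$ in this thesis may differ, and the function names and values below are hypothetical.

```python
def ar1_forecast(x_T, m, rho, sigma, h):
    """Mean and variance of x_{T+h} given x_T for the AR(1) process
    x_t = m + rho * (x_{t-1} - m) + eps_t, eps_t ~ N(0, sigma^2)."""
    mean = m + rho**h * (x_T - m)
    if rho == 1.0:
        var = sigma**2 * h  # limiting case: a random walk
    else:
        # sigma^2 * (1 + rho^2 + ... + rho^(2h-2))
        var = sigma**2 * (1 - rho ** (2 * h)) / (1 - rho**2)
    return mean, var


def rw_forecast(x_T, sigma, h):
    """Mean and variance of x_{T+h} for a random walk without drift."""
    return x_T, sigma**2 * h


# Illustrative values: rho close to one, 10-step horizon
for rho in (0.9, 0.99, 0.999):
    m_ar, v_ar = ar1_forecast(x_T=1.0, m=0.0, rho=rho, sigma=1.0, h=10)
    m_rw, v_rw = rw_forecast(x_T=1.0, sigma=1.0, h=10)
    print(f"rho={rho}: mean ratio {m_ar / m_rw:.3f}, variance ratio {v_ar / v_rw:.3f}")
```

As $\rho$ approaches one, both ratios approach one, so over horizons that are short relative to $1/(1-\rho)$ the two processes give nearly identical forecasts.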

We conclude that the model in this thesis performs very well in terms of accuracy. The point estimates obtained using the simple frequentist approach largely coincide with the medians of the posterior distributions that follow from the Bayesian models. However, to obtain reliable estimates of the extent to which the age groups are exposed to the general time trend $\kappa_t$, indicated by $\beta_x$, the data set must be large enough both in terms of the years of exposure per observation $\{k, t, x\}$ and in terms of the number of historical years. For this purpose, the available empirical data set is too small. By contrast, for credible confidence intervals of the other parameters that drive mortality, the yearly exposure has the greater impact. Finally, we find that the confidence intervals resulting from the Bayesian model with the lognormal prior are more useful, as they appear to provide more insight into the age-specific factors used in the data generating process, particularly for the data sets with the fewest historical years.

Figure 27

The forecasted values for $\kappa_t$, based on the original time series estimated from the Lee-Carter model. The values are forecasted using a random walk with drift.
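A minimal sketch of a random walk with drift forecast for $\kappa_t$, assuming the common Lee-Carter practice of estimating the drift and innovation standard deviation from first differences; the estimator actually used in this thesis is not shown in this section, and the function names and the input series are hypothetical.

```python
import numpy as np


def fit_rw_drift(kappa):
    """Estimate drift d and innovation s.d. sigma of
    kappa_t = kappa_{t-1} + d + eps_t from first differences."""
    diffs = np.diff(np.asarray(kappa, dtype=float))
    drift = diffs.mean()  # equals (kappa_T - kappa_1) / (T - 1)
    sigma = diffs.std(ddof=1) if diffs.size > 1 else 0.0
    return drift, sigma


def simulate_rw_drift(kappa_T, drift, sigma, horizon, n_scenarios, seed=0):
    """Simulate future paths of kappa; returns shape (n_scenarios, horizon)."""
    rng = np.random.default_rng(seed)
    steps = rng.normal(loc=drift, scale=sigma, size=(n_scenarios, horizon))
    return kappa_T + np.cumsum(steps, axis=1)


# Illustrative input series (not the thesis's estimates)
kappa = np.array([0.0, -1.2, -1.9, -3.1, -4.0, -5.2])
d, s = fit_rw_drift(kappa)
paths = simulate_rw_drift(kappa[-1], d, s, horizon=20, n_scenarios=1000)
lower, upper = np.percentile(paths, [2.5, 97.5], axis=0)  # pointwise 95% band
```

Percentile bands computed from a small number of simulated scenarios are visibly noisy; increasing `n_scenarios` smooths them at the cost of computation time.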

and the resulting projections for the largest data sets are shown in Figure 29.¹⁶ In each graph, the planes to the left of the vertical dashed line show the estimated mortality rates, and the planes to the right of the dashed line represent the forecasted mortality rates. In the three upper rows and the three lower rows, the confidence intervals are obtained using the smallest and the medium data sets in terms of yearly exposure, respectively. Each row corresponds to one data set size, and within each group of three rows, the rows are ordered by ascending number of historical years. The first and third columns correspond to the baseline mortality, and the second and fourth columns correspond to the fund-specific mortality, obtained by multiplying the baseline mortality by the fund-specific factors. The two left columns depict the mortality rates of individuals in age group 50, and the two right columns relate to age group 75, as indicated. All figures correspond to income class 6. In each graph, the blue plane corresponds to the parameters used for the data generating process, and the red plane corresponds to the frequentist estimators. For these two approaches, the uncertainty in the predictions follows solely from the uncertainty in the projections of $\kappa_t$. The light orange and light green planes are obtained by calibrating the Bayesian model with the Gamma prior and the lognormal prior, respectively. For these confidence intervals, the uncertainty stems from all parameters. The black dots represent the simulated death rates. For comparison purposes, the ranges of the vertical axes are set equal for all data sets.

We first note that the baseline mortality exhibits a shift in every plot. This shift is caused by the change in the composition of the data that is implemented in that year. The correspondingly wider confidence intervals are due to the additional uncertainty in $\Theta^{\mathrm{pop2}}_x$: the projected mortality rates before this change are multiplied by $\Theta^{\mathrm{pop2}}_x$. The simulated death rates are often rather far from these baseline mortality rates. Note that for the lowest exposures, the dots corresponding to the death rates sometimes lie outside the range of the graphs. This is partly because only one seventh of the data is used for the death rates, as we only consider the sixth income class. Moreover, the planes depict the confidence intervals for the mortality rates, not for the death rates. The death rates include Poisson noise, so the intervals for the death rates would be wider than those shown in the figure.

¹⁶ We attribute the noisy intervals of the forecasted mortality rates for the frequentist approach and for the approach based on the parameters used for the data generating process to the low number of simulated scenarios of $\kappa_t$.

Figure 28

The planes depict the confidence intervals of the estimated and the projected logarithm of the mortality rates on the left side and on the right side of the vertical dashed line, respectively. The confidence intervals are obtained by means of the two smallest data sets in terms of yearly exposure. The first and third columns correspond to the baseline mortality, and the second and fourth columns correspond to the fund-specific mortality. The left panel depicts the mortality rates of age group 50, and the right panel those of age group 75. All graphs correspond to the sixth income class. The blue plane corresponds to the parameters used for the data generating process, and the red plane corresponds to the frequentist estimators. The light orange and light green planes are obtained by calibrating the Bayesian model with the Gamma prior and the lognormal prior, respectively. The black dots represent the simulated death rates. The rows are first ordered with respect to the yearly exposure, and then with respect to the number of historical years.

Figure 29

The planes depict the confidence intervals of the estimated and the projected logarithm of the mortality rates on the left side and on the right side of the vertical dashed line, respectively. The confidence intervals are obtained by means of the largest data sets in terms of yearly exposure. The first and third columns correspond to the baseline mortality, and the second and fourth columns correspond to the fund-specific mortality. The left panel depicts the mortality rates of age group 50, and the right panel those of age group 75. All graphs correspond to the sixth income class. The blue plane corresponds to the parameters used for the data generating process, and the red plane corresponds to the frequentist estimators. The light orange and light green planes are obtained by calibrating the Bayesian model with the Gamma prior and the lognormal prior, respectively. The black dots represent the simulated death rates. The rows are ordered in ascending order with respect to the number of historical years.

We observe that the death rates lie closer to the confidence intervals when the age or the exposure is higher. This is expected, as in the previous subsection we derived that the volatility of the death rates is lower when the exposures and mortality rates are higher. For the same reason, in the graphs for the baseline mortality the death rates are also closer and fall within the range of the graphs more often, since the yearly exposure of the data of one pension fund is lower than that of the rest of the data. For the largest data set, the death rates are very close to the confidence intervals. This also agrees with the conclusion drawn in Van Berkum (2018) that, for large data sets, Poisson noise has barely any impact on the total number of deaths.
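The dependence on exposure and mortality can be made concrete under the Poisson assumption for deaths, $D \sim \mathrm{Poisson}(E\mu)$: the death rate $D/E$ has standard deviation $\sqrt{\mu/E}$, so its coefficient of variation is $1/\sqrt{E\mu}$, which by the delta method is also roughly the standard deviation of $\log(D/E)$ on the scale used in the figures. A minimal sketch; the function names and the exposure and mortality values are hypothetical and chosen only for illustration.

```python
import numpy as np


def death_rate_cv(exposure, mu):
    """Coefficient of variation of the death rate D / E when
    D ~ Poisson(E * mu): sd(D / E) / E[D / E] = 1 / sqrt(E * mu)."""
    return 1.0 / np.sqrt(exposure * mu)


def simulate_death_rates(exposure, mu, n, seed=0):
    """Draw n death rates D / E with D ~ Poisson(E * mu)."""
    rng = np.random.default_rng(seed)
    return rng.poisson(lam=exposure * mu, size=n) / exposure


# Illustrative exposures and mortality rates (not taken from the thesis)
for exposure in (1_000, 100_000):
    for mu in (0.004, 0.03):
        print(exposure, mu, round(float(death_rate_cv(exposure, mu)), 3))
```

Multiplying the exposure by 100 reduces the relative noise by a factor of 10, and a higher mortality rate (older age group) has the same effect as a higher exposure.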

Next, we discuss and explain the differences between the confidence intervals for the different data sets and ages. The confidence intervals of the projected mortality rates mostly relate to the uncertainty in $\alpha_x$, $\gamma_k$, $\Theta^{\mathrm{pop2}}_x$ and $\Theta^{\mathrm{pf}}_x$. From the results on their confidence intervals, we concluded that these intervals depend only slightly on the number of historical years (and not at all in the case of $\Theta^{\mathrm{pop2}}_x$), and strongly on the yearly exposure. This is also reflected in the graphs of the mortality rates. However, increasing the number of historical years does reduce the width of the prediction intervals. The extent to which the prediction intervals widen over time is related to the variance of $\beta_x$, and we have seen before that increasing the number of historical years in the data leads to a lower volatility in the posterior distribution of this parameter. Further, the confidence intervals of the mortality rates for age group 75 are much narrower than those for age group 50, which could be attributed to the higher predictability of the death rates due to the higher mortality. Also, the wide intervals of the fund-specific mortality rates compared to those of the baseline mortality rates correspond to the uncertainty in $\Theta^{\mathrm{pf}}_x$. Interestingly, we observe that in almost all graphs the light green plane, associated with the Bayesian model with the lognormal prior, is closer to the blue plane than the light orange plane associated with the model with the Gamma prior. This is directly related to the fact that the mean-reverting process in the age-specific factors moves these factors towards the parameter values used for simulating the deaths.
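The link between the variance of $\beta_x$ and the widening of the prediction intervals can be illustrated with a simplified term $\log m_{x,T+h} = \alpha_x + \beta_x \kappa_{T+h}$. This is a minimal sketch under loud assumptions: the $\gamma_k$ and $\Theta$ factors of the full model are omitted, $\kappa_t$ follows a random walk with drift, and a normal distribution for $\beta_x$ stands in for its posterior. The function name and all parameter values are hypothetical.

```python
import numpy as np


def log_mortality_band_width(alpha, beta_hat, beta_sd, kappa_T, drift, sigma,
                             horizon, n=20_000, seed=1, level=0.95):
    """Pointwise width of the prediction band of log m_{x,T+h} = alpha + beta * kappa_{T+h},
    h = 1, ..., horizon, with kappa a random walk with drift and beta ~ N(beta_hat, beta_sd^2)."""
    rng = np.random.default_rng(seed)
    # Scenarios for kappa_{T+1}, ..., kappa_{T+horizon}
    kappa = kappa_T + np.cumsum(rng.normal(drift, sigma, size=(n, horizon)), axis=1)
    # One beta draw per scenario; beta_sd = 0 treats beta as known
    beta = rng.normal(beta_hat, beta_sd, size=(n, 1))
    log_m = alpha + beta * kappa
    q = (1.0 - level) / 2.0
    lower, upper = np.quantile(log_m, [q, 1.0 - q], axis=0)
    return upper - lower


# Illustrative parameter values (hypothetical, not estimates from the thesis)
params = dict(alpha=-4.0, beta_hat=0.1, kappa_T=0.0, drift=-2.0, sigma=2.0, horizon=30)
width_known = log_mortality_band_width(beta_sd=0.0, **params)
width_uncertain = log_mortality_band_width(beta_sd=0.02, **params)
ratio = width_uncertain / width_known  # grows with the horizon h
```

Because $\kappa_t$ drifts, the contribution of the uncertainty in $\beta_x$ scales with $|\mathbb{E}[\kappa_{T+h}]|$, which grows linearly in $h$; its share of the variance therefore grows quadratically, while the contribution of $\kappa_t$ alone grows only linearly. Using the same seed in both calls keeps the $\kappa_t$ scenarios identical, so the difference in widths isolates the effect of $\beta_x$.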

Lastly, we observe that for the largest data sets, parameter uncertainty hardly affects the prediction intervals. As in Van Berkum (2018), the widths of these intervals are almost entirely caused by the uncertainty in the projections of $\kappa_t$, and they almost coincide with those based on the frequentist estimates.¹⁷

Thus, we conclude that the long-run prediction intervals for the mortality rates depend strongly on the size of the data sets, both in terms of exposure and in terms of the number of historical years. This is caused by the width of the confidence interval of $\beta_x$. We also see that the short-run prediction intervals of the fund-specific mortality rates depend strongly on the confidence interval of $\Theta^{\mathrm{pf}}_x$, which is largely determined by the size of the data set in terms of exposure. In the long run, however, this effect decreases.