

4.1.2 Parameter estimation results

increasing the yearly exposure results in better performance in terms of convergence diagnostics, which conforms to expectations. Apparently, for a constant number of historical years, the lower volatility and improved predictability in the mortality rates due to the increased exposure lead to better mixing properties.

We conclude that for the simple model, which assumes independence across ages in the age-specific factors, the convergence diagnostics are very favourable regardless of the number of historical years and the amount of exposure. However, when the lognormal prior is used for the factors, high yearly exposures are required for satisfactory mixing properties. Also, adding historical years, which can be viewed as tracking the available data for longer without including more pension funds, does not lead to improved convergence diagnostics. Nevertheless, as the densities of the parameters that directly drive the mortality rates largely overlap, mostly for the data sets with medium and high exposure, we believe that the resulting posterior distributions lead to informative insights. The posterior distributions are analysed and discussed in the next subsection.

Figure 24

Estimation results of αx (top panel) and βx (bottom panel) for all different simulated data sets. The blue line shows the values of the parameters that are used for the data generating process, the red line is obtained from estimating the parameters by using the frequentist approach, the confidence intervals that result from the model that uses a Gamma prior are depicted in light orange, and the light green planes show the confidence intervals obtained from the Bayesian model that uses the lognormal prior. In the middle of both intervals, the black line forms the median of the confidence intervals.

that the model with the lognormal prior assumes, has no direct effect on the parameter αx. However, this process affects the values for the age-specific factors, which in turn affect the estimates of αx. One might expect that the median of the confidence interval associated with the model that does not assume this process should exactly coincide with the frequentist estimate. However, the small differences are caused by the fact that the data from the pension fund under consideration is not taken into account for the frequentist estimation of the baseline mortality in order to enable unbiased frequentist estimation of the age-specific factors, as substantiated earlier. The reason why the confidence intervals for both Bayesian models overlap more when the number of historical years increases is that when the number of historical years is higher, the confidence intervals of Θpfx obtained using both versions of the model get closer, on which we will elaborate later on.

The results for βx are shown in the bottom panel of Figure 24. For the smallest data set, which corresponds to the size of the available empirical data set obtained from the pension funds, the frequentist estimates are very volatile. Often, the estimates are negative, which implies an increase in mortality over time, even though the values of βx used for simulation are chosen to be strictly positive, as can be seen from the blue line. As the median closely follows the frequentist estimate, the confidence intervals obtained from the Bayesian approaches are largely below the x-axis. In contrast to αx, the confidence intervals narrow considerably not only when the yearly exposure increases, but also when the number of historical years increases. This is expected, as a longer time horizon provides more insight into time trends. For the largest data set, the frequentist estimates are closer to the parameter values used for the data generating process, the confidence intervals are rather small, and both are strictly positive. Note that even when the data set is very large, the estimates still differ from the βx used for the simulations. This indicates how much data is needed for accurate estimation of this parameter.

In Figure 25, the estimation results for γk are provided. First, note that γ4 is set to zero for the sake of identifiability. Similar to the results for αx, the width of the confidence intervals decreases strongly when the exposure increases, but only weakly when the number of historical years increases. Also, the confidence intervals and the frequentist estimates get closer to the values of γk used for the data generating process when more historical years are simulated. We conclude that when the data set is large enough, the model is able to capture the dependence between income and mortality rates when this is present in the data set.

The estimation results for Θpfx are shown in the top panel of Figure 26. The value of Θpfx that is used for the data generating process is equal to the ratio between the implemented mortality µ̇k,t,x and the implemented mortality rates for the pension fund. However, the actual baseline mortality is based on the aggregated data of all pension funds except the pension fund under consideration. As the mortality rates depend on the pension funds that are included, the blue line is calculated as the ratio between the mortality rates of pension fund k under consideration and the expected baseline mortality:

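\[
\Theta^{\mathrm{pf}}_{x,\mathrm{BLUE}}
= \frac{\dot{\mu}_{k,t,x}\,\dot{\Theta}^{k}_{x}}{\frac{1}{J-1}\sum_{j \neq k}^{J} \dot{\mu}_{k,t,x}\,\dot{\Theta}^{j}_{x}}
= \frac{\dot{\Theta}^{k}_{x}}{\frac{1}{J-1}\sum_{j \neq k}^{J} \dot{\Theta}^{j}_{x}}.
\]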
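The expression above can be checked numerically with a minimal sketch. The function names and the fund factors below are hypothetical illustrations, not values from the simulation study. The sketch only shows that the common mortality term cancels, so that the blue line depends on the fund factors alone.

```python
def expected_fund_factor(theta_dot, k):
    """Factor of fund k divided by the mean factor of all other funds."""
    others = [theta for j, theta in enumerate(theta_dot) if j != k]
    return theta_dot[k] / (sum(others) / len(others))


def expected_fund_factor_with_mu(mu, theta_dot, k):
    """Same ratio, written with the common mortality term mu kept explicit."""
    others = [mu * theta for j, theta in enumerate(theta_dot) if j != k]
    return (mu * theta_dot[k]) / (sum(others) / len(others))


# Hypothetical fund factors for J = 4 funds; fund k = 0 is under consideration.
theta_dot = [0.8, 1.0, 1.2, 1.0]
blue_line_value = expected_fund_factor(theta_dot, k=0)
blue_line_value_mu = expected_fund_factor_with_mu(0.01, theta_dot, k=0)
```

Both functions return the same value up to floating-point rounding, which mirrors the second equality in the expression.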

Figure 25

Estimation results of γk for all different simulated data sets. The blue line shows the values of the parameters that are used for the data generating process, the red line is obtained from estimating the parameters by using the frequentist approach, the confidence intervals that result from the model that uses a Gamma prior are depicted in light orange, and the light green planes show the confidence intervals obtained from the Bayesian model that uses the lognormal prior. In the middle of both intervals, the black line forms the median of the confidence intervals.

As can be seen from the estimation results, the median of the confidence interval for the model that uses the Gamma prior almost exactly coincides with the frequentist estimate of this variable. This is because all data is used for the frequentist estimation of Θpfx. Note that for some low ages, the frequentist estimate is equal to zero for the smallest data set. This corresponds to the situation in which no deaths occurred in the simulated data set for that age.
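The zero estimates can be illustrated with a minimal sketch. It assumes, purely for illustration, that the frequentist estimate of the factor is the ratio of observed deaths to the deaths expected under the baseline mortality. The estimator actually used is defined elsewhere in the thesis and may differ. The function name and the numbers are hypothetical.

```python
def ratio_estimate(deaths, exposure, baseline_rate):
    """Observed deaths divided by the deaths expected under the baseline.

    Assumption: a simple ratio estimator, used here only as an illustration.
    """
    expected_deaths = exposure * baseline_rate
    return deaths / expected_deaths


# Hypothetical low age with little exposure: no deaths are observed,
# so the ratio estimate is exactly zero regardless of the baseline rate.
low_age_estimate = ratio_estimate(deaths=0, exposure=150.0, baseline_rate=0.002)

# Hypothetical higher age: 3 observed deaths against 4 expected deaths.
high_age_estimate = ratio_estimate(deaths=3, exposure=1000.0, baseline_rate=0.004)
```

Under this assumption, a zero estimate says only that no deaths were observed at that age, not that the underlying factor is zero.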

The estimates of Θpfx increase with age and are below one, as intended. Thus, the pension fund under consideration is one of the pension funds that experience lower mortality rates. As expected from the results for αx, the confidence intervals for the model with the Gamma prior narrow considerably when the exposures increase and only slightly when the number of historical years increases. Furthermore, the confidence intervals are the smallest for ages between seventy and eighty, even though the mortality rates are not the highest for these ages. This can be attributed to the fact that the exposures are much higher for these ages than for higher ages, which also leads to smaller confidence intervals. The confidence intervals for the Bayesian model that uses the lognormal prior are much smoother and narrower, as intended. This prior penalizes proposals for Θpfx when their values are far from the estimates of Θpfx−1 and Θpfx+1. Interestingly, regardless of the data set, the confidence intervals that follow from this version lie closer to the parameter values used for simulation than the intervals obtained from the model that uses the Gamma prior.
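The penalizing effect of such a prior can be sketched as follows. The sketch assumes a plain lognormal random walk over age, in which each factor is lognormally distributed around the factor of the previous age. This is a simplification chosen for illustration: the prior in the thesis also involves mean reversion parameters and is specified elsewhere. The function name, the scale `sigma`, and the example values are hypothetical.

```python
import math


def lognormal_walk_log_prior(theta, sigma):
    """Log-density of theta[1:] given theta[0] under a lognormal random walk.

    Assumption: theta[x] | theta[x-1] ~ LogNormal(log(theta[x-1]), sigma).
    """
    total = 0.0
    for prev, curr in zip(theta, theta[1:]):
        diff = math.log(curr) - math.log(prev)
        total += (
            -math.log(curr)                             # Jacobian of the log transform
            - math.log(sigma * math.sqrt(2 * math.pi))  # normalising constant
            - 0.5 * (diff / sigma) ** 2                 # penalty for jumps between ages
        )
    return total


# Hypothetical factor curves over four adjacent ages.
smooth_curve = [0.70, 0.72, 0.74, 0.76]
jagged_curve = [0.70, 0.95, 0.55, 0.76]

smooth_score = lognormal_walk_log_prior(smooth_curve, sigma=0.1)
jagged_score = lognormal_walk_log_prior(jagged_curve, sigma=0.1)
```

The smooth curve receives the higher log-prior, so proposals that jump away from neighbouring ages are less likely to be accepted. This is consistent with the smoother and narrower intervals described above.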

Figure 26

Estimation results of Θpfx (top panel) and Θpopx 2 (bottom panel) for all different simulated data sets. The blue line shows the values of the parameters that are expected for the data generating process. These are calculated as follows:

\[
\Theta^{\mathrm{pop}_2}_{x,\mathrm{BLUE}}
= \frac{\frac{1}{\#\mathrm{pop}_2}\sum_{j \in \mathrm{pop}_2}^{J} \dot{\Theta}^{j}_{x}}{\frac{1}{J-1}\sum_{j \neq k}^{J} \dot{\Theta}^{j}_{x}}
\quad\text{and}\quad
\Theta^{\mathrm{pf}}_{x,\mathrm{BLUE}}
= \frac{\dot{\Theta}^{k}_{x}}{\frac{1}{J-1}\sum_{j \neq k}^{J} \dot{\Theta}^{j}_{x}}.
\]

The red line is obtained from estimating the parameters by using the frequentist approach, the confidence intervals that result from the model that uses a Gamma prior are depicted in light orange, and the light green planes show the confidence intervals obtained from the Bayesian model that uses the lognormal prior. In the middle of both intervals, the black line forms the median of the confidence intervals.

In the bottom panel of Figure 26, the estimation results for Θpopx 2 are shown. In this case, the blue line is calculated as the ratio between the baseline mortality in group pop and the mortality in group pop. Again, note that the pension fund under consideration is not used for the baseline mortality in the frequentist setting:

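\[
\Theta^{\mathrm{pop}_2}_{x,\mathrm{BLUE}}
= \frac{\frac{1}{\#\mathrm{pop}_2}\sum_{j \in \mathrm{pop}_2}^{J} \dot{\mu}_{k,t,x}\,\dot{\Theta}^{j}_{x}}{\frac{1}{J-1}\sum_{j \neq k}^{J} \dot{\mu}_{k,t,x}\,\dot{\Theta}^{j}_{x}}
= \frac{\frac{1}{\#\mathrm{pop}_2}\sum_{j \in \mathrm{pop}_2}^{J} \dot{\Theta}^{j}_{x}}{\frac{1}{J-1}\sum_{j \neq k}^{J} \dot{\Theta}^{j}_{x}},
\]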

where #pop2 is the number of years included in group pop2. As the two pension funds with the most historical years are also the pension funds with the lowest fund-specific factors used for the data generating process, the estimates are very low. The results are very similar to the results for Θpfx. The only difference is that the confidence intervals do not decrease at all when the number of historical years increases. This is because group pop2 always consists of the same number of years by construction. Thus, increasing the number of simulated years does not increase the number of observations {k, t, x} in Opop2.
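The group-level ratio can be sketched in the same way as the fund-level ratio. The function name, the factors, and the group membership below are hypothetical. The numerator is averaged over the number of terms in the sum, which stands in for #pop2 here.

```python
def expected_group_factor(theta_dot, group, k):
    """Mean factor over `group` divided by the mean factor of all funds except k."""
    group_mean = sum(theta_dot[j] for j in group) / len(group)
    others = [theta for j, theta in enumerate(theta_dot) if j != k]
    baseline_mean = sum(others) / len(others)
    return group_mean / baseline_mean


# Hypothetical factors for J = 4 funds. Funds 0 and 1 form the group and have
# the lowest factors; fund k = 2 is excluded from the baseline.
theta_dot = [0.8, 0.9, 1.1, 1.2]
group_value = expected_group_factor(theta_dot, group=[0, 1], k=2)
```

With the lowest factors concentrated in the group, the ratio falls below one, in line with the low estimates described above.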

It should also be noted that the values of the mean reversion parameters ρi are very close to one. In light of Van Berkum (2018), this suggests that a random walk would have been more appropriate. However, as the estimated values for these parameters are very close to one, this would probably have led to similar results.

We conclude that the model in this thesis performs very well in terms of accuracy. The point estimates obtained using the simple frequentist approach largely coincide with the medians of the posterior distributions that follow from the Bayesian models. However, in order to obtain favourable estimates of the extent to which the age groups are exposed to the general time trend κt, indicated by βx, the data set must be large enough both in terms of the years of exposure per observation {k, t, x} and the number of historical years. For this, the available empirical data set is too small. For credible confidence intervals for the other parameters that drive the mortality, however, the yearly exposure has more impact. Also, we find that the confidence intervals that result from the Bayesian model that uses the lognormal prior are more useful, as these appear to provide more insight into the age-specific factors used for the data generating process, mostly for the data sets with the fewest historical years.