3.3 Pension fund data
4.1.1 Convergence diagnostics
The convergence diagnostics indicate how well the chains have converged to the posterior distributions. The results are considered favourable when the trace plots of the four chains, presented in the first column, largely overlap and do not show any trends, which would suggest that convergence has been reached. The second column shows the autocorrelation function, which should take low values. High positive values indicate autocorrelation in the series: the parameters mix slowly and the samples from the resulting posterior distribution are not statistically independent. If the series is autocorrelated, applying more thinning is a possible remedy, as it artificially reduces the autocorrelation. When the chains of all four calibrations converge to the same posterior distribution, the densities of the series, shown in the third column, should largely overlap. Lastly, the Gelman-Rubin statistic compares the variance within the chains with the variance between the chains. When all chains have converged to the same distribution, both yield the same estimate of the posterior variance, so the statistic should converge to one as the sample size increases.
The diagnostics of the relevant parameters obtained by calibrating the model on the smallest and the biggest data set are presented in this section; those obtained from calibration on the other data sets are shown in the Appendix. For all data sets, two versions of the model are applied. The first version uses a Gamma prior for the portfolio-specific factors and the factors for the different groups of years, and assumes independence among ages. The second version, which is slightly more complicated, assumes dependence among ages in the factors in the form of a mean-reverting process and uses a lognormal prior for these factors.
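To make the Gelman-Rubin statistic concrete, the following minimal sketch computes it for a single parameter from several chains. It assumes the common textbook formulation (the potential scale reduction factor); the exact variant used for the figures in this section may differ. The function name `gelman_rubin` and the simulated normal chains are purely illustrative and are not taken from the model.

```python
import numpy as np

def gelman_rubin(chains: np.ndarray) -> float:
    """Potential scale reduction factor for one parameter.

    chains has shape (m, n): m chains with n post-burn-in draws each.
    """
    m, n = chains.shape
    chain_means = chains.mean(axis=1)
    # Between-chain variance B and mean within-chain variance W.
    B = n * chain_means.var(ddof=1)
    W = chains.var(axis=1, ddof=1).mean()
    # Pooled estimate of the posterior variance.
    var_hat = (n - 1) / n * W + B / n
    return float(np.sqrt(var_hat / W))

rng = np.random.default_rng(0)
# Four chains drawn from the same distribution: statistic close to one.
converged = rng.normal(0.0, 1.0, size=(4, 2000))
# Four chains centred at different values: statistic clearly above one.
stuck = converged + np.array([[0.0], [1.0], [2.0], [3.0]])
r_converged = gelman_rubin(converged)
r_stuck = gelman_rubin(stuck)
```

In this toy setting the statistic is close to one for the chains that share a distribution and clearly larger for the chains with shifted means, which is the pattern the fourth column of the diagnostic figures is meant to reveal.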
The smallest data set, with 9 years of simulated historical data and a total yearly exposure of approximately 88,000, is comparable in size to the empirical data set obtained from the pension funds.
The convergence diagnostics for the version associated with the Gamma prior, presented in Figures 16 and 17, show very good mixing properties for the smallest data set. The trace plots of the different chains overlap largely,
Figure 16
Part 1. Convergence diagnostics including an illustration of the trace plots of the four different chains, the autocorrelation function of the series, the densities of the four different chains, and the behaviour of the Gelman-Rubin statistic. The diagnostics are obtained by calibrating the model that uses a Gamma prior for the age-specific factors on the data set with the lowest exposures per year, and nine simulated years.
Figure 17
Part 2. Convergence diagnostics including an illustration of the trace plots of the four different chains, the autocorrelation function of the series, the densities of the four different chains, and the behaviour of the Gelman-Rubin statistic. The diagnostics are obtained by calibrating the model that uses a Gamma prior for the age-specific factors on the data set with the lowest exposures per year, and nine simulated years.
no relevant autocorrelation is observed, and the Gelman-Rubin statistics converge to one rapidly. We conclude that applying this version of the model to data sets of this size leads to valid posterior distributions. This suggests that applying this model to the available empirical data would also lead to useful results. The resulting posterior distributions are analysed in the next section.
The convergence diagnostics presented in Figures 18 and 19, obtained by calibrating the version of the model that uses a lognormal prior for the age-specific factors on this data set, show less favourable mixing properties. Whereas the results for $\alpha_x$, $\beta_x$ and $\gamma_k$ show very good statistics, the convergence results for the chains of the parameters associated with the mean-reverting process among ages in the factors $\Theta^i_x$ indicate that these parameters do not mix very well. This can be seen, for instance, from the trace plots in the first column. The series of $\Theta^i_{50}$, with $i \in \{pf, pop2\}$, show both volatility clustering and a moving average, and are thus clearly not stationary. The moving average also results in positive values of the autocorrelation function. However, as the densities overlap quite well and the Gelman-Rubin statistics tend to one, we believe that these results still provide insight into the posterior distribution.
As relatively more deaths occur at higher ages due to the higher mortality rates, the logs of the death rates are less volatile for these ages. This could be shown by means of the delta method:
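Suppose that $X$ is a random variable. If $g(x)$ is a function of $x$ that is at least once differentiable, the first-order Taylor expansion at $\bar{X}$, the mean of $X$, can be used to find the variance of $g(X)$:
\[
g(X) \approx g(\bar{X}) + \frac{d}{dx} g(\bar{X}) \, (X - \bar{X}).
\]
Therefore, the variance is approximately
\[
\begin{aligned}
\operatorname{Var}(g(X)) &\approx \operatorname{Var}\left( g(\bar{X}) + \frac{d}{dx} g(\bar{X}) \, (X - \bar{X}) \right) \\
&= \operatorname{Var}\left( g(\bar{X}) + \frac{d}{dx} g(\bar{X}) \, X - \frac{d}{dx} g(\bar{X}) \, \bar{X} \right) \\
&= \operatorname{Var}\left( \frac{d}{dx} g(\bar{X}) \, X \right) \\
&= \left( \frac{d}{dx} g(\bar{X}) \right)^2 \operatorname{Var}(X).
\end{aligned}
\]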
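Note that $g(\bar{X})$ and $\frac{d}{dx} g(\bar{X}) \, \bar{X}$ are constants. In this case, $X \sim \text{Poisson}(\lambda)$ represents the number of deaths, where $\lambda = \mu E$, and $g(x) = \log\frac{x}{E} = \log x - \log E$. We get $\bar{X} = \lambda$, $\operatorname{Var}(X) = \lambda$, and $\frac{d}{dx} g(\bar{X}) = \frac{1}{\bar{X}} = \frac{1}{\lambda}$. By filling these in, we obtain the following variance:¹⁵
\[
\operatorname{Var}\left( \log\frac{X}{E} \right) \approx \frac{1}{\lambda} = \frac{1}{\mu E}.
\]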
Thus, the volatility is smaller for both higher exposures and higher mortality rates. Intuitively, this leads to more accurate estimation results. As a result, the chains for $\Theta^i_{75}$ show better convergence properties.
¹⁵ Note that $g(x)$ is not defined at $x = 0$, which is in the state space of $X$. The result holds when we disregard this case. In fact, the extreme negative value of the logarithm near zero leads to even more variance, so the expression can be interpreted as a lower bound for the variance. As the probability of $X$ being zero is also higher when $\lambda$ is low, this phenomenon amplifies the inverse dependence between the mortality rate and the volatility of the death rate.
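The delta-method approximation above can be checked with a small simulation. The sketch below draws Poisson death counts and compares the simulated variance of $\log(X/E)$ with $1/(\mu E)$. The mortality rates and the exposure are hypothetical values chosen for illustration, and zero counts are dropped because the logarithm is undefined there, as noted in footnote 15.

```python
import numpy as np

rng = np.random.default_rng(1)

def log_rate_variance(mu: float, exposure: float, n_sim: int = 200_000):
    """Compare the simulated variance of log(X / E) with 1 / (mu * E)."""
    lam = mu * exposure
    deaths = rng.poisson(lam, size=n_sim)
    # log(0) is undefined (footnote 15), so zero counts are dropped here.
    deaths = deaths[deaths > 0]
    simulated = np.log(deaths / exposure).var()
    return simulated, 1.0 / lam

# Hypothetical values: same exposure, low versus high mortality rate.
sim_low, approx_low = log_rate_variance(mu=0.005, exposure=10_000)   # lambda = 50
sim_high, approx_high = log_rate_variance(mu=0.05, exposure=10_000)  # lambda = 500
```

Under these assumptions the simulated variance lies close to the approximation in both cases and is markedly smaller for the higher mortality rate, in line with the derivation.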
Figure 18
Part 1. Convergence diagnostics including an illustration of the trace plots of the four different chains, the autocorrelation function of the series, the densities of the four different chains, and the behaviour of the Gelman-Rubin statistic. The diagnostics are obtained by calibrating the model that uses a lognormal prior for the age-specific factors on the data set with the lowest exposures per year, and nine simulated years.
Figure 19
Part 2. Convergence diagnostics including an illustration of the trace plots of the four different chains, the autocorrelation function of the series, the densities of the four different chains, and the behaviour of the Gelman-Rubin statistic. The diagnostics are obtained by calibrating the model that uses a lognormal prior for the age-specific factors on the data set with the lowest exposures per year, and nine simulated years.
To gain some intuition for the moving average and the volatility clustering in the trace plots, we analyse the chain corresponding to the purple series in Figures 18 and 19. Just before the 1500th sample, there is a phase in which the variance parameter $\sigma^2_{pf}$ for $O_{pf}$ is very close to zero. This implies that subsequent values of $\Theta^{pf}_x$ are bound to be very close to each other. As the factors in this model are updated one by one, a factor cannot deviate much from its previous value. Hence, the volatility is very low around these samples, as can be seen in the trace plot of $\Theta^{pf}_{75}$. The moving average can be explained from the same example. As all values of $\Theta^{pf}_x$ get closer to each other due to the low value of $\sigma^2_{pf}$ around the 1500th sample, the values of $\Theta^{pf}_{50}$ and $\Theta^{pf}_{75}$ also move towards each other. This leads to a moving average, which can also be observed in the trace plot. This phenomenon is also the reason for the bumps in the density function shown in the third column. Also note that, once the values of $\Theta^{pf}_x$ are very close to each other, the value of $\sigma^2_{pf}$ tends to remain close to zero. Once this value increases again, the process reverses. Apparently, this is a rather slow process, such that there is still autocorrelation in the series after thinning.
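The remark that autocorrelation can survive thinning is easy to reproduce on a toy series. The sketch below uses an AR(1) process as a stand-in for a slowly mixing chain; it is not the sampler used in this work, and the persistence `phi`, the chain length and the thinning interval of ten are hypothetical choices.

```python
import numpy as np

def autocorr(x: np.ndarray, lag: int) -> float:
    """Sample autocorrelation of a one-dimensional series at a lag >= 1."""
    centred = x - x.mean()
    return float(np.dot(centred[:-lag], centred[lag:]) / np.dot(centred, centred))

rng = np.random.default_rng(2)
phi, n = 0.9, 50_000  # hypothetical persistence and chain length
noise = rng.normal(size=n)
chain = np.empty(n)
chain[0] = noise[0]
for t in range(1, n):
    # AR(1) recursion: each draw stays close to the previous one.
    chain[t] = phi * chain[t - 1] + noise[t]

thinned = chain[::10]  # keep every tenth draw
acf_raw = autocorr(chain, lag=1)
acf_thinned = autocorr(thinned, lag=1)
```

For an AR(1) process the autocorrelation at lag ten is `phi**10`, roughly 0.35 here, so the thinned series remains noticeably autocorrelated even though the lag-one autocorrelation has dropped considerably. A slower process would need a correspondingly larger thinning interval.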
The next data set for which we discuss the convergence diagnostics is the biggest data set. This set contains 17 simulated historical years and a yearly exposure of approximately 12,558,000 years. When the model that assumes independence among ages and uses a Gamma prior is applied to this data set, we obtain the diagnostics presented in Figures 20 and 21. The results are comparable to those for the small data set: the diagnostics show very good mixing properties.
The diagnostics obtained by applying the model with the lognormal prior are depicted in Figures 22 and 23. Compared with the smallest data set, the convergence diagnostics are more favourable. As before, the results for $\alpha_x$, $\beta_x$ and $\gamma_k$ are almost perfect. Moreover, although there is a little autocorrelation in the series of the parameters $\Theta^i_{50}$, the densities overlap very well and the Gelman-Rubin statistics converge to one. Only the Gelman-Rubin statistics for the parameters associated with the lognormal prior are not as close to one as for the other parameters. As the Gelman-Rubin statistics of the other parameters are very good, we expect that this barely affects the other parameters, which are the ones we would actually sample from. The difference with the small data set could possibly be attributed to the fact that the higher exposure leads to lower volatility in the death rates, as derived earlier.
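As a rough illustration of the size of this effect, the approximation $\operatorname{Var}(\log(X/E)) \approx 1/(\mu E)$ can be applied to the two total yearly exposures. This is only a sketch: it assumes that the exposure is spread over the ages in the same proportions in both data sets and that the mortality rate $\mu$ at a given age is identical, neither of which is stated explicitly here.

```latex
\[
\frac{\operatorname{Var}\left(\log (X_{\text{small}}/E_{\text{small}})\right)}
     {\operatorname{Var}\left(\log (X_{\text{large}}/E_{\text{large}})\right)}
\approx \frac{1/(\mu E_{\text{small}})}{1/(\mu E_{\text{large}})}
= \frac{E_{\text{large}}}{E_{\text{small}}}
= \frac{12{,}558{,}000}{88{,}000}
\approx 143,
\qquad
\sqrt{143} \approx 12 .
\]
```

Under these assumptions, the standard deviation of the log death rates in the biggest data set would be roughly a factor of twelve smaller than in the smallest data set.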
The convergence diagnostics for the simulated data sets sized between the smallest and the largest data set are presented in the Appendix. As expected, the diagnostics for the model that uses the Gamma prior are very good for all data sets. For these parameters, the series displayed in the first column and the densities in the third column overlap almost perfectly, there is barely any autocorrelation, and the Gelman-Rubin statistics converge to one rapidly. The results for the model with the lognormal prior are more variable. It appears that the mixing properties, measured by the autocorrelation function, depend strongly on the yearly exposure, but hardly on the number of simulated years. Although this result was expected for $\Theta^{pop2}_x$, as the number of years included in group pop2 is kept constant for all data sets, we expected that adding more years to group pf would lead to better mixing properties for the other parameters. After all, including more years leads to more observations $\{k, t, x\}$ in $O_{pf}$. However, note that, mostly for the data sets with medium or high yearly exposures, the densities overlap.
Figure 20
Part 1. Convergence diagnostics including an illustration of the trace plots of the four different chains, the autocorrelation function of the series, the densities of the four different chains, and the behaviour of the Gelman-Rubin statistic. The diagnostics are obtained by calibrating the model that uses a Gamma prior for the age-specific factors on the data set with the highest exposures per year, and seventeen simulated years.
Figure 21
Part 2. Convergence diagnostics including an illustration of the trace plots of the four different chains, the autocorrelation function of the series, the densities of the four different chains, and the behaviour of the Gelman-Rubin statistic. The diagnostics are obtained by calibrating the model that uses a Gamma prior for the age-specific factors on the data set with the highest exposures per year, and seventeen simulated years.
Figure 22
Part 1. Convergence diagnostics including an illustration of the trace plots of the four different chains, the autocorrelation function of the series, the densities of the four different chains, and the behaviour of the Gelman-Rubin statistic. The diagnostics are obtained by calibrating the model that uses a lognormal prior for the age-specific factors on the data set with the highest exposures per year, and seventeen simulated years.
Figure 23
Part 2. Convergence diagnostics including an illustration of the trace plots of the four different chains, the autocorrelation function of the series, the densities of the four different chains, and the behaviour of the Gelman-Rubin statistic. The diagnostics are obtained by calibrating the model that uses a lognormal prior for the age-specific factors on the data set with the highest exposures per year, and seventeen simulated years.
Increasing the yearly exposure results in better performance in terms of convergence diagnostics, which conforms to expectations. Apparently, for a constant number of historical years, the lower volatility and improved predictability of the mortality rates due to the increased exposure lead to better mixing properties.
We conclude that for the simple model, which assumes independence among ages in the age-specific factors, the convergence diagnostics are very favourable regardless of the number of historical years and the amount of exposure. However, when the lognormal prior is used for the factors, high yearly exposures are required for satisfactory mixing properties. Also, adding historical years, which can be seen as continuing to track the available data without including more pension funds, does not lead to improved convergence diagnostics. Nevertheless, as the densities of the parameters that directly drive the mortality rates largely overlap, mostly for the data sets with medium and high exposure, we believe that the resulting posterior distributions provide informative insights. The posterior distributions are analysed and discussed in the next subsection.