Evaluation of the double chain ladder using simulated data


Jelger Roet

Bachelor’s Thesis to obtain the degree in

Actuariële Wetenschappen

University of Amsterdam

Faculty of Economics and Business Amsterdam School of Economics

Author: Jelger Roet

Student no.: 6298907 / 10003949

Email: Jelger Roet@hotmail.com

Date: June 19, 2014


Abstract

In this paper we evaluate the Chain Ladder, Double Chain Ladder and Bornhuetter-Ferguson Double Chain Ladder by using simulated datasets. After defining a standard dataset consisting of three run-off triangles, we measure the performance of the three methods by varying different simulation parameters, resulting in numerous datasets. Performance is measured by the deviations of the exact total reserve estimates and of the 99th percentiles of the bootstrapped total reserve from the corresponding simulated values. The results suggest that the Double Chain Ladder and Bornhuetter-Ferguson Double Chain Ladder are significant improvements on the Chain Ladder, with the largest gain in performance observed at the 99th percentiles. The results also show that these two methods behave differently when particular parameters are varied. An indication was found that the obtained results are generalisable to other datasets.

Keywords Dataset simulation, Bootstrapping, Claim reserves, Chain Ladder, Double Chain Ladder, Bornhuetter-Ferguson Double Chain Ladder


Chapter 1 Introduction

Predicting outstanding liabilities and calculating loss reserves is of great importance to insurance companies. An insurance company’s annual profit does not only depend on the premiums earned and the losses claimed that year, as claims are not always settled or even reported in the year they are incurred. At the end of the year the company has to make predictions for the claims incurred that year that are not yet reported or settled. There is a series of methods that can be used to perform these predictions, the majority of which use aggregate data instead of individual data. The practical advantage of aggregation is that the data become more convenient to handle. However, aggregation also discards information about individual claims, which can worsen the projections. Methods based on individual data are generally not used in practice as these are too complex, and because insurance companies do not hold the detailed data sets required (Verrall et al., 2010).

A widely used method is the Chain Ladder (CL) method. Its popularity can be attributed to the fact that it only uses a run-off triangle of aggregate payments, which is assumed to be available in insurance companies. However, this method can only provide forecasts of the aggregate payments and is incapable of separating the Reported But Not Settled (RBNS) claims from the Incurred But Not Reported (IBNR) claims.

Because of this restriction, Martínez-Miranda et al. (2012) proposed an extension to the CL, building a more accurate model. In their model, claim counts and claim sizes are evaluated separately by using the two corresponding run-off triangles. By applying the CL to both triangles, the prediction of the outstanding claim liabilities can be divided into the IBNR reserve and RBNS reserve, including tail reserve estimates. Because the authors apply the CL twice, they call this new method the Double Chain Ladder (DCL) method. This method has a number of advantages besides separating IBNR and RBNS reserves. Among them are the ability to adjust to the growth of the volume of insurance policies and the ability to explicitly model inflation.

Martínez-Miranda et al. (2013b) argue that the explicit modelling of the annual inflation is a weakness of the DCL as these parameters might have a high degree of uncertainty. By using another run-off triangle, less volatile estimations of the annual inflation parameters can be obtained. These predictions can then be used as a replacement of the original estimations of the inflation parameters in the DCL. As it is similar to the method of Bornhuetter & Ferguson (1972), this new method is called the Bornhuetter-Ferguson Double Chain Ladder (BDCL).

The aim of this thesis is to test the CL, DCL and BDCL by assessing their performance on multiple simulated datasets with different characteristics. This leads to the research question of this thesis: How do the CL, DCL and BDCL perform in predicting outstanding claim liabilities on simulated datasets with different characteristics?

To evaluate the performance of the methods, different simulation parameters are set to vary in a certain interval, thus creating numerous datasets. The total reserve estimations from the different methods are compared with the simulated total reserves. The deviations can then be used to evaluate the methods. In the assessment of the methods, bootstrap estimates of the predictive distributions are also constructed for the simulated datasets that result from varying the parameters. This is done to obtain the 99th percentiles of the total reserve, which can be compared to the 99th percentile of the simulated total reserve. This is done along the lines of Martínez-Miranda et al. (2011). For this the statistical program R is used, for which Martínez-Miranda et al. (2013a) have provided a package that is needed in this research.

The outline of this thesis is as follows. Chapter 2 gives a literature review of previous research relevant to the construction of the Double Chain Ladder method and the Bornhuetter-Ferguson Double Chain Ladder method. The simulation settings, and thus the characteristics of the simulated datasets, as well as the approach to assessing the methods are given in Chapter 3. Chapter 4 discusses an analysis of the results, followed by a final chapter containing the conclusion.


Chapter 2 Literature review

2.1 Chain Ladder

In general insurance a relatively simple method used to forecast outstanding claim liabilities is the Chain Ladder method. To be able to use this method, data from previous years need to be available, specifically a run-off triangle of aggregate payment data. This is assumed to be available in insurance companies. The CL method was originally proposed as a non-stochastic method, just a convenient algorithm. Later research justified the use of this simple method from a statistical point of view. For example, it can be shown that if each number in the aggregate payments run-off triangle, $X_{ij}$, is modelled with a multiplicative Poisson model and the means are estimated using maximum likelihood, which equals finding the Generalized Linear Model estimates, the Chain Ladder parameters are found (Kaas et al., 2008).

A standard run-off triangle of aggregate payments is given in Table 2.1. The accident year is the year in which the accident occurred. The development year is the number of years elapsed between the occurrence of an accident and the settlement of its claim. For example, claims incurred in year one and settled two years later are settled in the same calendar year as claims incurred in year two that are settled one year later. Settled claims per calendar year can therefore also be considered if aggregate payment data is presented in a run-off triangle. The total amount of aggregated payments corresponding to a specific accident year $i$ and development year $j$ is denoted by $X_{ij}$. In Table 2.1, the rows correspond to accident years and the columns correspond to development years. Calendar years can be examined by considering the diagonals. The CL makes use of the trends in the development years, assuming that for a given development year about the same percentage of claims will be settled in every accident year.
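The diagonal structure described above can be illustrated with a short Python sketch. The function name and the example figures are hypothetical and only serve to show that cells with the same value of i + j fall in the same calendar year.

```python
def calendar_year_totals(triangle):
    """Sum the observed cells of a run-off triangle along its diagonals.

    triangle[i][j] holds X_ij for accident year i + 1 and development
    year j + 1 (0-based list indices); row i has m - i observed cells.
    Cells with the same i + j belong to the same calendar year.
    """
    m = len(triangle)
    totals = [0.0] * m
    for i, row in enumerate(triangle):
        for j, value in enumerate(row):
            totals[i + j] += value
    return totals


# Hypothetical 3 x 3 triangle of aggregate payments (not thesis data).
example = [
    [60.0, 30.0, 10.0],
    [66.0, 33.0],
    [72.0],
]
print(calendar_year_totals(example))
```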

Table 2.1: Run-off triangle of aggregate payments. Rows are accident years $i$, columns are development years $j$.

$$
\begin{array}{c|ccccc}
i \backslash j & 1 & 2 & \cdots & m-1 & m \\ \hline
1 & X_{11} & X_{12} & \cdots & X_{1,m-1} & X_{1m} \\
2 & X_{21} & X_{22} & \cdots & X_{2,m-1} & \\
\vdots & \vdots & \vdots & & & \\
m-1 & X_{m-1,1} & X_{m-1,2} & & & \\
m & X_{m1} & & & &
\end{array}
$$

A simple algorithm for the Chain Ladder is Verbeek’s algorithm (Kaas et al., 2008). By applying this algorithm to the run-off triangle of aggregate payments with size $m$, estimators of the parameters $\widetilde{\alpha}_i$ and $\widetilde{\beta}_j$ are found so that the following holds:

$$X_{ij} \approx \widehat{\widetilde{\alpha}}_i \, \widehat{\widetilde{\beta}}_j \quad \text{for } (i, j) \in T_1, \tag{2.1}$$

with the restriction that $\sum_{j=1}^{m} \widehat{\widetilde{\beta}}_j = 1$, where

$$T_1 = \{ (i, j) : i, j = 1, 2, \ldots, m;\ i + j = 2, \ldots, m + 1 \}, \tag{2.2}$$

the set of $i$ and $j$ corresponding to the upper left triangle in Table 2.1. Therefore $\widetilde{\alpha}_i$ can be interpreted as the total payments made to claims that occurred in accident year $i$, and $\widetilde{\beta}_j$ can be interpreted as the claim settlement fraction for development year $j$. The empty lower right triangle can now be estimated by assuming that

$$E[X_{ij}] = \widetilde{\alpha}_i \widetilde{\beta}_j \quad \text{for } (i, j) \in T_2, \tag{2.3}$$

where

$$T_2 = \{ (i, j) : i, j = 1, 2, \ldots, m;\ i + j = m + 2, \ldots, 2m \}, \tag{2.4}$$

the set of $i$ and $j$ corresponding to the lower right triangle in Table 2.1. This completes the square with estimations, the sum of which is a prediction of the reserves needed to meet the outstanding liabilities. As can be seen, the CL is unable to give estimates for development years greater than $m$, which are referred to as tail estimates.
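As an illustration of equations 2.1 to 2.4, the following Python sketch implements Verbeek's algorithm as it is commonly presented: the first accident year is fully observed and fixes the first row parameter, after which row and column parameters are solved alternately from row and column sums. The function names and the toy triangle are assumptions made for this example, and the thesis itself works in R.

```python
def verbeek_chain_ladder(triangle):
    """Estimate chain ladder parameters with Verbeek's algorithm.

    triangle[i][j] holds X_ij for accident year i + 1 and development
    year j + 1; row i contains the m - i observed cells.
    Returns (alpha, beta) with the beta values summing to 1.
    """
    m = len(triangle)
    row_sums = [sum(row) for row in triangle]
    # Column j is observed for the first m - j accident years only.
    col_sums = [sum(triangle[i][j] for i in range(m - j)) for j in range(m)]

    alpha = [0.0] * m
    beta = [0.0] * m
    # The first accident year is fully observed, so its row sum is alpha_1.
    alpha[0] = row_sums[0]
    beta[m - 1] = col_sums[m - 1] / alpha[0]
    for k in range(1, m):
        # Fraction of accident year k + 1 that has not yet developed.
        unseen = sum(beta[m - k:])
        alpha[k] = row_sums[k] / (1.0 - unseen)
        beta[m - k - 1] = col_sums[m - k - 1] / sum(alpha[: k + 1])
    return alpha, beta


def chain_ladder_reserve(triangle):
    """Sum the predicted lower right triangle alpha_i * beta_j."""
    alpha, beta = verbeek_chain_ladder(triangle)
    m = len(triangle)
    return sum(alpha[i] * beta[j] for i in range(m) for j in range(m - i, m))


# Toy triangle built from alpha = (100, 200, 300), beta = (0.5, 0.3, 0.2).
toy = [[50.0, 30.0, 20.0], [100.0, 60.0], [150.0]]
alpha_hat, beta_hat = verbeek_chain_ladder(toy)
print(alpha_hat, beta_hat, chain_ladder_reserve(toy))
```

On a triangle that is exactly multiplicative, as in the toy example, the algorithm recovers the generating parameters up to floating point error.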

2.2 Double Chain Ladder

The Chain Ladder method can only provide forecasts of aggregated payments and is incapable of separating the Reported But Not Settled (RBNS) claims and the Incurred But Not Reported (IBNR) claims from the forecasts. This is because when solving the run-off square of aggregate payments with the CL, only a single development pattern is taken into account. This pattern includes, but is unable to separate, two sources of delay: the delay in reporting the incurred losses (IBNR delay) and the delay in settling the payments (RBNS delay). In order to separate these delays, and thus to be able to model the IBNR claims and the RBNS claims separately, Verrall et al. (2010) propose to use two run-off triangles. A run-off triangle of aggregate reported claim counts is added to the existing CL setup. It is expected that by expanding the CL method with this triangle, the loss reserve estimates will improve. The run-off triangle of paid claim numbers could also have been used. This would have removed the possible need to model zero-claims. However, such a run-off triangle is not as easily obtained as a run-off triangle of aggregate reported claim counts (Verrall et al., 2010).

In order to forecast the IBNR and the RBNS claims, including the tails, Verrall et al. (2010) first explain how the corresponding delays can be estimated. It is assumed that the numbers of reported claims are independently distributed and have an over-dispersed Poisson distribution. Instead of applying the CL to the run-off triangle of aggregate payments, the CL is now used on the run-off triangle of aggregate reported claim counts. This is done to obtain forecasts of the number of claims and to estimate the IBNR delay. The RBNS delay is estimated by assuming that claims are settled with one payment and that the distribution of paid claims follows a Multinomial distribution.

Because the derived model in Verrall et al. (2010) is an approximation of more detailed individual models, restrictions are imposed on the individual claims. It is assumed that these do not depend on the IBNR or RBNS delay. A second assumption is that the individual claims are independent of the number of claims and that they are independently and identically distributed (i.i.d.). However, this last assumption is implausible as larger payments will most likely have longer RBNS delays. The authors acknowledge this, but argue that the model that can be derived from this assumption provides a reasonable starting point. From these assumptions the likelihood function for the set of aggregate payment data given the set of aggregate reported claim counts can be set up to find estimates of the outstanding claims. In Kaas et al. (2008) it is shown that to maximize the likelihood for aggregate reported claim counts, the CL can be used to complete the run-off square of aggregate reported claim counts, as this yields the same results.

Martínez-Miranda et al. (2012) extend the model of Verrall et al. (2010). The model is generalised and claim inflation can be taken into account, whereas Verrall et al. (2010) and Martínez-Miranda et al. (2011) only corrected for it in their sample. This provides better estimates. By applying the CL to both the run-off triangle of aggregate payments and the run-off triangle of aggregate reported claim counts, the total forecast of the loss reserve can be divided into the IBNR reserve and RBNS reserve, including tail estimates. Because the CL is applied twice, the authors call this new method the Double Chain Ladder method. This is an improvement on the method of Verrall et al. (2010), which can also provide these kinds of estimates, as the parameters are more clearly defined and better estimates are obtained because inflation is accounted for. Restrictions similar to those in Verrall et al. (2010) are imposed. For example, claims are settled with a single payment. The main difference is the inclusion of the claim inflation parameters with certain restrictions.

By applying the CL to the run-off triangle of aggregate reported claim counts, estimators of the parameters $\alpha_i$ and $\beta_j$ are found so that the following holds:

$$N_{ij} \approx \widehat{\alpha}_i \widehat{\beta}_j \quad \text{for } (i, j) \in T_1, \tag{2.5}$$

with the restriction that $\sum_{j=1}^{m} \widehat{\beta}_j = 1$. Here $N_{ij}$ is the total number of claims incurred in accident year $i$ and reported in development year $j$. The parameter $\alpha_i$ can be interpreted as the total number of claims that occurred in accident year $i$.

The CL is also applied to the run-off triangle of aggregate payments to estimate the parameters $\widetilde{\alpha}_i$ and $\widetilde{\beta}_j$, as is done in equation 2.1. From these parameters the inflation parameters, $\gamma_i$, the proportion of claims settled $l$ periods after reporting, $\pi_l$, and a mean factor for individual payments in the first accident year, $\mu$, can be estimated. How these estimates are obtained is described in Martínez-Miranda et al. (2012).

Martínez-Miranda et al. (2012) provide several formulas, including separate formulas for the prediction of the IBNR and RBNS claims. The RBNS claim forecasts can be based on either the real claim counts or the fitted claim counts, $\widehat{N}_{ij} = \widehat{\alpha}_i \widehat{\beta}_j$. This gives two formulas for the RBNS claim forecasts:

$$\widehat{X}_{ij}^{\text{rbns1}} = \sum_{l=i-m+j}^{j} N_{i,j-l} \, \widehat{\pi}_l \, \widehat{\mu} \, \widehat{\gamma}_i \tag{2.6}$$

and

$$\widehat{X}_{ij}^{\text{rbns2}} = \sum_{l=i-m+j}^{j} \widehat{N}_{i,j-l} \, \widehat{\pi}_l \, \widehat{\mu} \, \widehat{\gamma}_i. \tag{2.7}$$

As the IBNR claim forecasts must be solely based on the fitted claim counts, there is only one formula:

$$\widehat{X}_{ij}^{\text{ibnr}} = \sum_{l=0}^{i-m+j-1} \widehat{N}_{i,j-l} \, \widehat{\pi}_l \, \widehat{\mu} \, \widehat{\gamma}_i. \tag{2.8}$$

Tail estimates necessary for the full prediction of loss reserves can be obtained by the following formula, similar to the one from Martínez-Miranda et al. (2012):

$$\widehat{X}_{ij}^{\text{tail}} = \sum_{l=0}^{\min(j,d)} \widehat{N}_{i,j-l} \, \widehat{\pi}_l \, \widehat{\mu} \, \widehat{\gamma}_i \quad \text{for } (i, j) \in T_3 \cup T_4. \tag{2.9}$$

Here $d$ denotes the maximum delay and $T_3$ and $T_4$ are defined by

$$T_3 = \{ (i, j) : i = 1, \ldots, m - 1;\ j = m + 1, \ldots, 2m;\ i + j = m + 2, \ldots, 2m + 1 \} \tag{2.10}$$

and

$$T_4 = \{ (i, j) : i = 2, \ldots, m;\ j = m + 1, \ldots, 2m;\ i + j = 2m + 1, \ldots, 3m \}. \tag{2.11}$$

It can be shown that the exact estimates from the CL method for the aggregated payments over $T_2$ can be obtained by adding formula 2.7 and formula 2.8. Even though the new DCL method can produce the same estimates as the CL method, it can be argued that not the fitted but the real claim counts should be used to compute RBNS claim forecasts.
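The split between formulas 2.6 to 2.8 can be made concrete with a small Python sketch for a single future cell. It assumes the index convention of the original DCL paper, with accident years i = 1, ..., m, development delays j = 0, ..., m - 1 and a count observed whenever i + k <= m; this differs by one from the 1-based development years of Table 2.1. The function name and all numbers are hypothetical.

```python
def dcl_cell_forecasts(i, j, m, n_obs, n_fit, pi, mu, gamma):
    """Return (rbns1, rbns2, ibnr) for one future cell (i, j).

    Assumed convention: accident year i = 1..m, development delay
    j = 0..m-1, and a count N_{i,k} is observed when i + k <= m.
    n_obs[i-1][k] are observed counts (ragged triangle), n_fit[i-1][k]
    fitted counts (full square), pi[l] the settlement delay
    proportions and gamma[i-1] the inflation of accident year i.
    """
    first_rbns = i - m + j  # smallest delay l that uses an observed count
    scale = mu * gamma[i - 1]
    delays_rbns = range(first_rbns, j + 1)
    rbns1 = scale * sum(n_obs[i - 1][j - l] * pi[l] for l in delays_rbns)
    rbns2 = scale * sum(n_fit[i - 1][j - l] * pi[l] for l in delays_rbns)
    ibnr = scale * sum(n_fit[i - 1][j - l] * pi[l] for l in range(first_rbns))
    return rbns1, rbns2, ibnr


# Hypothetical inputs for m = 3 (illustration only).
n_fit = [[50.0, 30.0, 20.0], [100.0, 60.0, 40.0], [150.0, 90.0, 60.0]]
n_obs = [[50.0, 30.0, 20.0], [100.0, 60.0], [160.0]]
pi = [0.6, 0.3, 0.1]
gamma = [1.0, 1.0, 1.0]
print(dcl_cell_forecasts(3, 1, 3, n_obs, n_fit, pi, 10.0, gamma))
```

In this sketch, rbns2 + ibnr equals the sum over all delays l = 0, ..., j of the fitted counts times the delay proportions, which is the decomposition referred to in the text; rbns1 differs from rbns2 only where the observed count differs from the fitted one.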

2.3 Bornhuetter-Ferguson Double Chain Ladder

The Double Chain Ladder method can obtain explicit estimates of the annual inflation parameters, whereas the Chain Ladder method models the inflation implicitly. Both methods, however, model the inflation with a degree of uncertainty (Martínez-Miranda et al., 2013b). The reason for this is that near the end of the run-off triangles fewer data are available and the data are more volatile. The Bornhuetter-Ferguson method is used to account for unstable predictions of the CL method (Kaas et al., 2008). This method is capable of incorporating prior knowledge into the model. For example, there could be prior knowledge about the total amount of payments for claims that occurred in year $i$, and thus knowledge about the value of $\widetilde{\alpha}_i$. This information can then be used instead of the estimate from the CL, resulting in more accurate forecasts.

Martínez-Miranda et al. (2013b) propose to also use the Bornhuetter-Ferguson method in the DCL model to account for the uncertainty of the annual inflation parameters. For this purpose another run-off triangle is used, the triangle of aggregate incurred claim amounts. By applying the DCL to this run-off triangle and the run-off triangle of aggregate reported claim counts, new and more realistic estimates of the annual inflation parameters are obtained. While keeping the predictions of the other parameters unchanged, the annual inflation parameters obtained from applying the DCL to the run-off triangles of aggregate payments and aggregate reported claim counts can be replaced with the new parameters, resulting in less volatile forecasts. As this is similar to the Bornhuetter-Ferguson method in the CL, this new method is called the Bornhuetter-Ferguson Double Chain Ladder.


Chapter 3 Method of simulation

In this chapter the underlying assumptions used in the dataset simulations are explained. Additionally, a description of the use of these datasets to evaluate the CL, DCL and BDCL is provided.

3.1 Assumptions

The assumptions underlying the model used to simulate the datasets are similar to the assumptions used to construct the model of Verrall et al. (2010) and Martínez-Miranda et al. (2012), as these are reasonable starting points and can be adjusted relatively easily if desired. Also, several parameters are implemented that can be adjusted to create diversity in the datasets. The standard values for the parameters that are set in this thesis are given in Table A1 and justified in this section. After simulating the different squares and rectangles, the corresponding triangles are obtained by removing the lower right triangles. The assumptions for each simulated square or rectangle, and therefore the corresponding triangles, are discussed separately. The squares and rectangles that result from the standard parameters are provided in Tables A2 to A4.

3.1.1 Square of aggregate claim counts

First the size of the square of aggregate claim counts, and thereby of the corresponding triangle, must be specified. This equals the number of accident years observed and thus, as the aggregate claim count triangle is required to have fully run off, the number of development years observed for the first accident year. Secondly, the total claim count in the first year, $s_0$, and the growth of the claim count total over each accident year, $g_i$, must be specified. For the evaluations in this thesis the triangle size must be large enough to give adequate results, but small enough to avoid increased volatility. Therefore, the triangle size is set to 12, which is deemed suitable. The first year claim count total is set to 7000, corresponding to the first year claim count total in the dataset used in Verrall et al. (2010). In the dataset of Verrall et al. (2010) the claim count grows steadily over the whole period of accident years observed, with declines in some years. This trend is expected to be realistic and is therefore adopted in this thesis. However, as only the trend matters, the individual growth values used in this thesis have no further justification.

The distribution of claim counts for accident year $i$ is generated by a random Multinomial distribution:

$$N_i \sim \text{rMultinomial}(s_i, q). \tag{3.1}$$

Here $q$ corresponds to the vector of IBNR delays for each development year $j$ and $s_i$ corresponds to the total claim count in year $i$. The IBNR delays can add up to more than 1, but will be internally normalised to serve as probabilities. To model the IBNR delays $q_j$, the density function of a Cauchy distribution is used. It is assumed that if $\xi$ is the density function of a Cauchy distribution, the IBNR delay of development year $j$ is given by the following equation:

$$q_j = \xi(j + 0.25;\, x_0^{\text{ibnr}}, \gamma^{\text{ibnr}}) = \frac{1}{\pi \gamma^{\text{ibnr}} \left[ 1 + \left( \dfrac{(j + 0.25) - x_0^{\text{ibnr}}}{\gamma^{\text{ibnr}}} \right)^2 \right]} \quad \text{for } j = 1, 2, \ldots, m, \tag{3.2}$$

where $x_0^{\text{ibnr}}$ is the location parameter and $\gamma^{\text{ibnr}}$ the scale parameter. In this case the location parameter can be interpreted as the year in which the reported claim counts per year peak. The scale parameter can be interpreted as a spread factor, where higher values correspond to a higher degree of spread of claim reporting over the development years. As it is more logical that claim reporting, relative to the peak year, is more biased towards earlier years than towards later years, $j$ is adjusted by adding a constant of 0.25. We expect that the majority of the claims are reported in the first two development years, as is the case in the dataset used in Verrall et al. (2010). Also, we expect that there is a noticeable positive probability of claim reporting in every development year, meaning that there is a non-negligible tail in the delay distribution. Therefore, the standard values of $x_0^{\text{ibnr}}$ and $\gamma^{\text{ibnr}}$ in the simulations are set to 1.5 and 0.75 respectively. The Cauchy density function for these values as well as the corresponding adjusted and non-adjusted IBNR delays are shown in Figure 3.1.

[Figure 3.1. Horizontal axis: years delayed in reporting ($j$), 1 to 12. Vertical axis: delay proportions ($q_j$), 0.0 to 0.5. Series: Cauchy; Cauchy adjusted and normalised.]
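Equations 3.1 and 3.2 can be sketched in Python as follows. This is a minimal illustration using only the standard library, not the thesis's R code; the function names and the random seed are assumptions, while the parameter values are the standard values quoted in the text.

```python
import math
import random
from collections import Counter


def cauchy_delay_probabilities(m, location, scale, shift=0.25):
    """Evaluate the Cauchy density at j + shift for j = 1..m and normalise."""
    raw = [
        1.0 / (math.pi * scale * (1.0 + ((j + shift - location) / scale) ** 2))
        for j in range(1, m + 1)
    ]
    total = sum(raw)
    return [value / total for value in raw]


def simulate_reported_counts(total_claims, probabilities, rng):
    """Draw one multinomial row: spread total_claims over development years."""
    years = rng.choices(
        range(len(probabilities)), weights=probabilities, k=total_claims
    )
    tally = Counter(years)
    return [tally.get(j, 0) for j in range(len(probabilities))]


# Standard values from the text: m = 12, location 1.5, scale 0.75, 7000 claims.
q = cauchy_delay_probabilities(12, 1.5, 0.75)
row = simulate_reported_counts(7000, q, random.Random(2014))
print([round(p, 3) for p in q[:3]], sum(row))
```

With these values the first two development years together receive most of the probability mass, while every later year keeps a small positive probability, in line with the expectations stated above.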

3.1.2 Square of aggregate incurred claim amounts

The square of aggregate claim counts is used to obtain the square of aggregate incurred claim amounts. For each claim reported, $k$, a random claim size $Y_{ij}^{k}$ is simulated. The individual claim sizes are mutually independent and are assumed to follow either a Gamma distribution or a Pareto distribution, as defined in Bain & Engelhardt (1992):

$$Y_{ij}^{k} \sim \text{Gamma}(\kappa^{\text{gamma}}, \theta^{\text{gamma}}) \quad \text{for } k = 1, 2, \ldots, N_{ij}, \tag{3.3}$$

or

$$Y_{ij}^{k} \sim \text{Pareto}(\kappa^{\text{pareto}}, \theta^{\text{pareto}}) \quad \text{for } k = 1, 2, \ldots, N_{ij}, \tag{3.4}$$

where $\kappa^{\text{gamma}}$ and $\kappa^{\text{pareto}}$ are shape parameters and $\theta^{\text{gamma}}$ and $\theta^{\text{pareto}}$ are scale parameters, all assumed to be independent of $i$ and $j$. To account for inflation that depends on the accident year, the mean and variance of the distribution are multiplied by the inflation parameter, $\gamma_i$, and the square of the inflation parameter, $\gamma_i^2$, respectively. This is in line with Martínez-Miranda et al. (2012).
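One way to obtain this scaling, given here as an illustration and not necessarily the construction used in the simulations, is to multiply each uninflated claim size $Y$, with mean $\mu$ and variance $\sigma^2$, by $\gamma_i$:

$$
E[\gamma_i Y] = \gamma_i \, E[Y] = \gamma_i \mu, \qquad
\operatorname{Var}[\gamma_i Y] = \gamma_i^2 \operatorname{Var}[Y] = \gamma_i^2 \sigma^2 .
$$

For a Gamma claim with mean $\kappa\theta$ and variance $\kappa\theta^2$ (assuming $\kappa$ is the shape and $\theta$ the scale), this is the same as keeping $\kappa$ fixed and replacing $\theta$ by $\gamma_i \theta$, since $\kappa(\gamma_i\theta) = \gamma_i \kappa\theta$ and $\kappa(\gamma_i\theta)^2 = \gamma_i^2 \kappa\theta^2$.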

For the mean and variance of the individual claim sizes, excluding the inflation parameter effect, the values for the mean and variance of an individual (non-zero) claim severity found in Verrall et al. (2010) are used. The standard values of inflation are chosen in such a way that there is an upward trend over the whole simulated period, with declines in some years. As with the growth values, only the trend matters, so the individual inflation values have no further justification. To account for the possibility of claim restitution for wrongly paid claims, total claim sizes in a particular accident year and development year, for development years greater than two, have a probability of being negative. As the expectation is that the discovery of false claims could take years and as the triangle size is relatively small for a full incorporation of wrongly paid claims, this probability is set to 0.1.

3.1.3 Rectangle of aggregate payments

The size of the rectangle of aggregate payments is $m$ by $m + d$. Here, $m$ is equal to the size of the aggregate claim count triangle and $d$ is equal to the maximum duration of settlement and thus the length of the RBNS delay vector. It is expected that the maximum time to report a claim does not exceed the maximum time to settle a claim, as there are, for example, prolonged investigations and lawsuits. However, if the maximum settlement delay is set disproportionately high, the estimates for later years become meaningless and volatile. Taking into account that the DCL and BDCL can provide predictions of outstanding claims for up to a maximum of $d = m$, the maximum duration of settlement is set equal to the size of the triangle. Thus, the size of the rectangle of aggregate payments in this thesis is 12 by 24. The RBNS delays $\pi_l$ are modelled and calculated in the same way as the IBNR delays, but the parameters set to model the RBNS delays differ from the parameters used in the calculation of the IBNR delays. This is because, for the above mentioned reasons, the average time needed to settle a claim will generally exceed the average time needed to report a claim. For both $l = 0$ and $l = 1$ a significant number of claims will be settled. Claims reported near the end of the first development year will have a higher chance of being settled in the second development year than in the first development year, as handling claims takes time in general. The standard values in the simulations of $x_0^{\text{rbns}}$ and $\gamma^{\text{rbns}}$ are therefore set to 1 and 1.75 respectively. The corresponding Cauchy density function as well as the adjusted and non-adjusted RBNS delays are shown in Figure 3.2. Zero claims are also accounted for by using a zero claim factor $Q$, which for this thesis is set to 0.2, as was done in Verrall et al. (2010). This is expected to be a reasonable value.

[Figure 3.2: Adjusted and non-adjusted RBNS delays. Horizontal axis: years delayed in settling ($l$), 1 to 12. Vertical axis: delay proportions ($\pi_l$), 0.0 to 0.5. Series: Cauchy; Cauchy adjusted and normalised.]

3.2 Methods

In order to evaluate the performance of the CL, DCL and BDCL, different datasets are simulated. While keeping other parameters equal, a single parameter can be changed to produce different datasets between which the difference is only caused by the change of that single parameter. The simulated exact total reserve that should be kept is then compared with the exact total reserve that is estimated by the different methods on the corresponding triangles. Parameters can thus be varied over a set interval to assess the effect on the methods’ reserve forecasting performance. The different parameter values can then be set against the reserve estimation error, which equals the absolute value of the difference between the simulated and the estimated exact total reserve. Besides the exact total reserves, the 99th percentiles of the estimated total reserve are also considered. These are obtained for the DCL and BDCL using the bootstrapping methods proposed in Martínez-Miranda et al. (2011). For the CL, bootstrapping methods proposed in England & Verrall (1999) and England (2002) are used. By repeatedly simulating the datasets with the parameters fixed, the simulated 99th percentile of the total reserves is obtained. This is then used to determine the 99th percentile reserve estimation error for each of the three methods.
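The two error measures can be written down in a few lines of Python. The function names are hypothetical, and the use of `statistics.quantiles` with the inclusive method is one of several possible empirical percentile definitions; the thesis does not state which definition was used.

```python
import statistics


def reserve_estimation_error(simulated_reserve, estimated_reserve):
    """Absolute difference between simulated and estimated total reserve."""
    return abs(simulated_reserve - estimated_reserve)


def percentile_99(total_reserves):
    """Empirical 99th percentile of a sample of total reserves."""
    # quantiles(..., n=100) returns the 99 cut points; index 98 is the 99th.
    return statistics.quantiles(total_reserves, n=100, method="inclusive")[98]


def percentile_99_error(simulated_totals, bootstrapped_totals):
    """Error between the simulated and the bootstrapped 99th percentile."""
    return reserve_estimation_error(
        percentile_99(simulated_totals), percentile_99(bootstrapped_totals)
    )
```

Here `simulated_totals` would hold the total reserves from repeated simulations with fixed parameters and `bootstrapped_totals` the bootstrap draws of one method; both names are placeholders.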

There are different parameters that, when changed, are expected to have a significant effect on the performance of the estimation. By multiplying the vector of inflation parameters with a factor that varies within set bounds, the effect of low and high inflation regimes can be considered. The vector of the growth of the total claim counts is also multiplied by a varying factor to consider the corresponding effect. The IBNR and RBNS delays can be changed in such a way that claims are reported or settled more evenly spread over the years. This is achieved by varying the scale parameters in the calculation of the delay probabilities. A situation in which the majority of claims are constantly reported or settled after several years is unlikely. Therefore, varying location parameters, for a change in the claim reporting or settling peak year, are not considered. The distribution of individual claim sizes is also expected to make a difference. This can be evaluated by keeping the mean of the distribution constant, while letting the variance vary within set bounds. This is achieved for the Gamma and Pareto distributions by repeatedly recalculating the shape and scale parameters from the constant mean, µ, and

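varying variance, $\sigma^2$, as follows:

$$\kappa = \begin{cases} \dfrac{\mu^2}{\sigma^2} & \text{if } Y_{ij}^{k} \sim \text{Gamma}, \\[2ex] \dfrac{2\sigma^2}{\sigma^2 - \mu^2} & \text{if } Y_{ij}^{k} \sim \text{Pareto}, \end{cases} \tag{3.5}$$

$$\theta = \begin{cases} \dfrac{\sigma^2}{\mu} & \text{if } Y_{ij}^{k} \sim \text{Gamma}, \\[2ex] \dfrac{\mu(\sigma^2 + \mu^2)}{\sigma^2 - \mu^2} & \text{if } Y_{ij}^{k} \sim \text{Pareto}, \end{cases} \tag{3.6}$$

where $\kappa$ and $\theta$ are respectively the shape and scale parameters for either the Gamma or the Pareto distribution.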
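Equations 3.5 and 3.6 can be checked numerically with a short Python sketch. It assumes that the Gamma distribution has mean $\kappa\theta$ and variance $\kappa\theta^2$, and that the Pareto distribution is the two-parameter form with mean $\theta/(\kappa - 1)$ and variance $\theta^2\kappa/((\kappa - 1)^2(\kappa - 2))$, which is the parametrisation consistent with the formulas above. The function names and the example values are hypothetical.

```python
def shape_scale(mean, variance, distribution):
    """Return (kappa, theta) for a given mean and variance (eqs. 3.5, 3.6)."""
    if distribution == "gamma":
        return mean ** 2 / variance, variance / mean
    if distribution == "pareto":
        diff = variance - mean ** 2
        if diff <= 0:
            # kappa > 2 (finite variance) requires variance > mean ** 2.
            raise ValueError("Pareto case needs variance > mean ** 2")
        return 2.0 * variance / diff, mean * (variance + mean ** 2) / diff
    raise ValueError("distribution must be 'gamma' or 'pareto'")


def moments(kappa, theta, distribution):
    """Recover (mean, variance) under the assumed parametrisations."""
    if distribution == "gamma":
        return kappa * theta, kappa * theta ** 2
    mean = theta / (kappa - 1.0)
    variance = theta ** 2 * kappa / ((kappa - 1.0) ** 2 * (kappa - 2.0))
    return mean, variance


# Hypothetical mean 2 and variance 9: the round trip returns the inputs.
for name in ("gamma", "pareto"):
    kappa, theta = shape_scale(2.0, 9.0, name)
    print(name, kappa, theta, moments(kappa, theta, name))
```

The check in the Pareto branch reflects that the formulas only yield a valid shape parameter when the variance exceeds the squared mean.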


Chapter 4 Results and analysis

The results found from applying the methods in Section 3.2 are given and analysed in this chapter. First, we describe the settings under which the results were produced.

4.1 Settings

As different parameters have different interpretations, different bounds are set for the interval in which the corresponding parameter varies. In each of these intervals the varying parameter takes 120 different values for the exact reserve estimation, giving 120 points for each of the three methods. For the estimation of the 99th percentiles, 30 points are used, considering the already high precision of the bootstrap algorithms. The precision is this high because in order to get the 99th percentile of the estimated total reserve, the corresponding distribution must be found. The bootstrap algorithms achieve this by running numerous simulations. For more detail see Martínez-Miranda et al. (2011). For both the DCL and the BDCL the number of simulations in the bootstrap algorithm is set to 999. For the CL it is set to 2000. The latter exceeds the former because the CL bootstrap algorithm was observed to estimate the 99th percentiles less accurately.

To get more accurate estimations of both the exact reserve estimation and the 99th percentiles, multiple simulations are run for each point in the interval, keeping parameters equal. The mean is then taken of the resulting list of values. This is done because even though the parameters are the same, the resulting datasets and thus the resulting estimations differ. This is due to randomness in, for example, the simulated individual claim sizes. The simulations are repeated 100 times for the mean of the exact reserve estimation and 5 times for the mean of the 99th percentiles. Again, this is due to the already higher precision of the bootstrap algorithms.
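The averaging procedure described above amounts to a nested loop over parameter values and repetitions. The Python sketch below shows that structure only; `simulate` and `estimate` are placeholder callables standing in for the dataset simulation and for one of the three reserving methods, and the toy versions at the bottom are hypothetical.

```python
import random
import statistics


def mean_error_over_grid(grid, simulate, estimate, repeats, seed=0):
    """For each parameter value, average the absolute reserve error.

    simulate(value, rng) must return (true_reserve, triangles);
    estimate(triangles) must return the estimated total reserve.
    Both are placeholders supplied by the caller.
    """
    rng = random.Random(seed)
    results = []
    for value in grid:
        errors = []
        for _ in range(repeats):
            true_reserve, triangles = simulate(value, rng)
            errors.append(abs(true_reserve - estimate(triangles)))
        results.append((value, statistics.fmean(errors)))
    return results


# Toy stand-ins: the estimate always misses the true reserve by exactly 1.
def toy_simulate(value, rng):
    return value + 1.0, value


def toy_estimate(triangles):
    return triangles


print(mean_error_over_grid([0.5, 1.0], toy_simulate, toy_estimate, repeats=3))
```

In the setting of the thesis, the grid would contain 120 values with 100 repetitions for the exact reserves, and 30 values with 5 repetitions for the 99th percentiles.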

In this thesis, only the 99th percentiles for the Gamma distribution are considered. The results that can be found for the Pareto distribution should have features similar to the results for the Gamma distribution. However, the individual values should be more extreme due to the heavy tail of the Pareto distribution.

4.2 Results

The results are shown in Figures 4.1 to 4.5. Each figure is arranged as follows. The columns indicate the type of distribution used for the individual claim sizes, which is either the Gamma or the Pareto distribution. The first row shows the results for the exact estimations, whereas the second row shows the results for the 99th percentiles.

The influence of varying variance of the individual claims is shown in Figure 4.1. For the situation in which Gamma claims are used it can be seen that the increase in variance has a negative effect on the exact forecasting performance of the different methods. When Pareto claims are used, no resulting effect is observed. This can be attributed to the heavy tail of the distribution, which is able to result in extreme claim sizes. In this situation, this characteristic outweighs the effect of an increasing variance. It can be seen that the effect, which is not present in the Pareto case, is identical for each of the three methods. However, the effect on the 99th percentiles is evident. Varying variance has a significant effect on the estimation error of the CL 99th percentiles.

Figure 4.2 shows the results when the vector of annual inflation parameters is multiplied by a certain factor. For the exact estimations, the figure shows that for both Gamma and Pareto distributed individual claim sizes, the CL performs worse than the DCL and BDCL. For the 99th percentiles, the CL performs better than the DCL and BDCL for some factor values. This is the case when inflation is low. When inflation is high, the DCL and BDCL perform better. Also, inflation seems to have a more moderate effect on the performance of the DCL than on that of the CL and BDCL.

The results for varying the IBNR spread are provided in Figure 4.3. For exact reserve estimations with Gamma claim sizes, the DCL performs about the same as the CL, and both perform worse than the BDCL. When claim sizes follow a Pareto distribution, the DCL and BDCL perform equally well and better than the CL. For the percentiles, the CL produces significantly worse results.
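To make the meaning of the IBNR spread concrete, the sketch below evaluates the reporting delay probabilities for three spread values. It follows the construction used in the function simN of Appendix B, a Cauchy density evaluated at j + 0.25 for development year j. The helper name reporting_probs and the three spread values are chosen here for illustration only.

```r
# Reporting probabilities over the development years for a given IBNR spread.
# The density values are normalised to sum to one, as rmultinom does internally.
reporting_probs <- function(scale, location = 1.5, n_dev = 12) {
  dens <- dcauchy(seq_len(n_dev) + 0.25, location = location, scale = scale)
  dens / sum(dens)
}

narrow   <- reporting_probs(0.25)  # smallest spread used in the evaluation
standard <- reporting_probs(0.75)  # standard value from Table A1
wide     <- reporting_probs(5)     # an illustrative large spread

# Share of claims reported in the first six development years
print(round(rbind(narrow, standard, wide)[, 1:6], 3))
```

With a small spread almost all claims are reported in the first development years, whereas a large spread moves a substantial share of the reporting to later years.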

Figure 4.4 contains the results for a varying RBNS spread. For exact reserve estimations with Gamma claims, the DCL performs best, while the CL and BDCL perform equally worse. For Pareto claims, the DCL and BDCL perform about the same, which is significantly better than the CL. When percentiles are considered, it is the CL that has the best performance when the RBNS spread is varied. However, the CL errors exceed those of the DCL and BDCL.

The effect of varying claim count growth in the square of aggregate claim counts is shown in Figure 4.5. The exact reserve estimations for both the Gamma and the Pareto claim sizes show that the estimation points are more scattered for higher claim count growth. For each method the error increase is about the same. There is also a negative effect observed for the 99th percentiles, which is positively correlated with the claim count growth. The CL percentile estimates seem to be more volatile than those of the DCL and BDCL. The downward sloping part at the end of the CL curve should therefore have no particular significance other than the confirmation of the volatility of the estimates.
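The size of the claim count growth effect can be gauged with a small sketch. It assumes, as in the code of Appendix B, that the growth vector of Table A1 is multiplied by a single factor and that the claim count total of each accident year is the previous total times the growth of that year. The function name claim_totals is illustrative.

```r
s0     <- 7000   # claim count total in the first accident year (Table A1)
growth <- c(1.2, 1.1, 1.05, 0.9, 0.95, 1.2, 1.1, 0.95, 1.05, 1.05, 0.95)

# Claim count totals per accident year when every growth rate is scaled
claim_totals <- function(growth_factor) {
  s0 * cumprod(c(1, growth * growth_factor))
}

standard <- claim_totals(1)    # standard dataset
high     <- claim_totals(1.2)  # upper end of the factor range in Figure 4.5

print(round(rbind(standard, high)))
```

Because the factor compounds over eleven accident year transitions, a factor of 1.2 multiplies the claim count total of the last accident year by 1.2^11, which is more than seven times the standard value.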

4.3 Analysis

From the results it can be seen that there are just a few points at which the CL performs best. The CL is therefore clearly the worst performing method. Whether the DCL or the BDCL performs better depends on the situation. For a varying variance it does not seem to matter much whether the DCL or BDCL is used, as both methods perform approximately the same. With regard to inflation, the choice of method depends on the rate of inflation. In low inflation regimes the BDCL seems to be the best choice, whereas in high inflation regimes the DCL seems to be the best choice. The two delay spreads have different effects on the optimal method choice. When increasing the IBNR spread, the BDCL has the overall best performance, whereas an increasing RBNS spread results in the DCL having the overall best performance. The largest absolute differences in errors between the DCL and BDCL result from certain factor values for varying inflation and IBNR spread.

The foregoing implies that the DCL and BDCL are a significant improvement on the CL. The choice between the DCL and BDCL methods should be based on the characteristics of the dataset in question. More importantly, if the choice is between the DCL and BDCL, it should be taken into account how the claim reporting and settling is spread over the development years. If 99th percentiles are considered, inflation also plays a significant role. The BDCL method has the absolute best exact forecasting performance when claims are reported with a significant degree of spread over the development years and are mainly settled within a year. In addition, when inflation is decreasing or constant over the accident years, the BDCL also performs best in predicting 99th percentiles. The DCL method has the best performance in the prediction of the 99th percentiles when inflation is increasing significantly. It also has the best exact reserve forecasting performance when claims are reported almost only in the first development years and settling has a high degree of spread over the development years.

It could be argued that the obtained results are largely based on the specific square of aggregate claim counts used in the evaluations. However, Figure 4.5 shows that the claim count growth, and thus the distribution of claims in the aggregate claim counts square, does not have a significant effect on the estimations. This indicates that a generalisation of the results is plausible.


Figure 4.1: Resulting change in the errors due to varying variance. (Panels: Gamma exact, 99th percentiles, Pareto exact; horizontal axis: factor; vertical axis: error; methods: CL, DCL, BDCL.)

Figure 4.2 (inflation): same panels, axes and methods as Figure 4.1.

Figure 4.3: Resulting change in the errors due to varying IBNR spread. (Same panels, axes and methods as Figure 4.1.)

Figure 4.4 (RBNS spread): same panels, axes and methods as Figure 4.1.

Figure 4.5 (growth): same panels, axes and methods as Figure 4.1.

Conclusion

The CL is a popular method to predict outstanding claims, but as this method only uses the run-off triangle of aggregate payments, it is unable to separate IBNR claims from RBNS claims. By using an additional run-off triangle of reported aggregate claim counts, Martínez-Miranda et al. (2012) provide a new method that is able to distinguish these claims, the DCL method. It is argued by Martínez-Miranda et al. (2013b) that both these methods have a high degree of uncertainty in the estimation of the inflation parameters.¹ Martínez-Miranda et al. (2013b) therefore use a third run-off triangle of aggregate incurred claim amounts to provide for a third method, the BDCL.

This thesis evaluated the CL, DCL and BDCL. Simulated datasets with different characteristics were used to determine the performance of the methods in predicting outstanding claim liabilities. By setting standard values for the simulation parameters, a standard dataset was obtained. This dataset consists of the three run-off triangles needed for the different methods. While keeping all other simulation parameters constant, several single simulation parameters were varied in set intervals to determine the corresponding effects on the forecasting performance of the methods. The forecasting performance was measured by considering the exact and 99th percentile errors, where the error is the absolute difference between the simulated value and the values estimated by the different methods.

The results show that the DCL and BDCL are significant improvements on the CL, where the largest gain was found in the predictions of the 99th percentiles. However, the advantages of the BDCL over the DCL were less clear. It was observed that the BDCL performed better than the DCL in certain situations. It was found that the highest gain in prediction accuracy could be obtained in a dataset with specific characteristics. For exact reserve estimation, these include a high degree of spread over the development years for claim reporting and early claim settling for the majority of claims. In the estimation of the 99th percentiles, it was found that a constant or decreasing inflation contributed to the success of the BDCL performance.

In other situations the DCL performed better than the BDCL. An increasing inflation added to the success of the DCL method in the prediction of the 99th percentiles. When the majority of claims are reported in the first development years and the settlements of the claims are spread over several years, the DCL performs best in the exact forecasting of the reserves.

¹ Which is modelled implicitly by the CL.

When different combinations of characteristics are observed, a decision could be made regarding the weights of each characteristic. Possibly, a weighted average of both the DCL and BDCL estimations can be taken into consideration. This could be done by taking a weighted average of the estimated inflation parameters or the reserve estimations. Further research is needed to determine the effectiveness of such methods. Despite the observed indication that the results can be generalised to other datasets, further research is needed to support this theory.
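As an illustration of the weighted average of reserve estimations mentioned above, the following is a minimal sketch. The function combine_reserves, its arguments and the numbers are hypothetical; how the weight should be chosen is precisely the open question left for further research.

```r
# Convex combination of a DCL and a BDCL total reserve estimate.
# weight_dcl is the (hypothetical) weight given to the DCL estimate.
combine_reserves <- function(reserve_dcl, reserve_bdcl, weight_dcl) {
  stopifnot(weight_dcl >= 0, weight_dcl <= 1)
  weight_dcl * reserve_dcl + (1 - weight_dcl) * reserve_bdcl
}

# Purely illustrative numbers, not results from this thesis
combined <- combine_reserves(reserve_dcl = 1.0e6, reserve_bdcl = 1.2e6,
                             weight_dcl = 0.25)
print(combined)
```

The same function could be applied element-wise to two vectors of estimated inflation parameters instead of two total reserves.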


Table A1: Standard values of the simulation parameters.

Variable   Variable per    Value     Description
           accident year
m                          12        Number of accident years
d                          12        Maximum time to settle claims
s_0                        7000      Claim count total in first accident year
Q                          0.2       Zero claim rate, in decimals
µ                          203.01    Mean of the distribution of individual claim sizes
σ^2                        3496125   Variance of the distribution of individual claim sizes
x_0^ibnr                   1.5       The year in which the claim reporting percentage 'peaks'
γ^ibnr                     0.75      Spread factor, indicating spread around the 'peak' year
x_0^rbns                   1         The year in which the claim settling percentage 'peaks'
γ^rbns                     1.75      Spread factor, indicating spread around the 'peak' year
γ_i        γ_1             1         Inflation in accident year i, in decimals
           γ_2             1.5
           γ_3             1.6
           γ_4             1.3
           γ_5             0.8
           γ_6             1.5
           γ_7             1.9
           γ_8             0.75
           γ_9             1.6
           γ_10            1.5
           γ_11            1.8
           γ_12            2
g_i        g_1             1.2       Claim count total growth over accident year i, in decimals
           g_2             1.1
           g_3             1.05
           g_4             0.9
           g_5             0.95
           g_6             1.2
           g_7             1.1
           g_8             0.95
           g_9             1.05
           g_10            1.05
           g_11            0.95
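The way the inflation parameters of Table A1 enter the simulation can be sketched as follows. In the function simX of Appendix B, the mean of the individual claim sizes in accident year i is multiplied by the inflation parameter of that year and the variance by its square. The variable names below are illustrative.

```r
mu        <- 203.01    # standard mean of individual claim sizes
sigma2    <- 3496125   # standard variance of individual claim sizes
inflation <- c(1, 1.5, 1.6, 1.3, 0.8, 1.5, 1.9, 0.75, 1.6, 1.5, 1.8, 2)

year_mean <- inflation * mu          # mean claim size per accident year
year_var  <- inflation^2 * sigma2    # variance of claim size per accident year
cv        <- sqrt(year_var) / year_mean

print(data.frame(accident_year = 1:12, mean = year_mean,
                 sd = sqrt(year_var), cv = cv))
```

Scaling the mean by the inflation parameter and the variance by its square rescales every claim by the same amount, so the coefficient of variation is the same in every accident year.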


Table A2: Standard square of aggregate claim counts.

Accident                       Development year (j)
year (i)      1     2     3     4     5     6     7     8     9    10    11    12
 1         3547  2109   606   298   132    97    54    42    38    40    18    19
 2         4262  2521   730   356   160   116    65    52    46    47    22    23
 3         4692  2768   803   391   178   128    70    58    51    52    25    24
 4         4929  2903   844   409   187   135    98    54    47    49    24    23
 5         4432  2618   759   370   167   120    66    54    48    49    24    24
 6         4209  2490   720   352   158   114    64    52    45    47    22    22
 7         5058  2977   866   420   192   138   100    55    49    50    25    24
 8         5568  3268   954   461   212   152   109    73    46    42    34    30
 9         5288  3108   905   439   202   144   104    58    51    53    25    25
10         5554  3260   951   460   213   151   109    62    53    55    27    27
11         5834  3420   998   483   224   159   114    77    48    44    29    38
12         5540  3252   949   459   212   150   109    58    50    45    30    40

Table A3: Standard square of aggregate incurred claim amounts.

Accident                                        Development year (j)
year (i)        1        2        3        4        5        6        7        8        9       10       11       12
 1         886988   440558   119597    53560    92261    55011    54261     5838     6057     6057      219        3
 2        1295382   757753   360144   154175    12127    -3162    28647     1630      206      290      250    12846
 3        1436872   777683   266476    56427    49403     9578     7640    61360   -64637   -61757     4538      632
 4        1345640   596431   244633    83513    32454    30765    14426    13320    12126    12422      673       93
 5         737954   370992   167949    64691   -18770    11653      248     -351     -437      119       89        0
 6        1266234   512045   368019   -57933     8177    39923    23640    12754      472        2        2    -5366
 7        1987942  1145705   286216   226492    45746    15804    11454   -10405    68069    68123    16806      317
 8         910517   489179   157867    44228   -73111    24815    27745     3086      205       10        3        8
 9        1200000  1313482   226422    93004    83969    25448    17922     1918     4622     4622     4618     1416
10        1707345   980344   311942   244412   -22437    69042    54808     -890    -7088     7088     7083      835
11        1901617  1154888   404779    56132    72555    42938    37215    13253   -28534    18995     3620    -5640
12        2127227  1298559   373511   122209   149063  -112977    66792     5480    18413    -1689     1293      772


Table A4: Standard rectangle of aggregate payments.

Accident                                  Development year (j)
year (i)       1       2       3       4       5       6       7       8       9      10      11      12
 1        253881  296363  216165  145204  116380   91435   76356   53447   38726   29510   22199   17141
 2        370611  465637  390851  281879  177022  111809   83740   60077   44358   33996   26898   25530
 3        411048  498566  383426  247372  166171  110268   77480   73248   34235    8654   12233   13379
 4        384975  429186  332303  225041  148295  104511   75208   56886   45166   37092   28487   22105
 5        211276  247801  200278  140213   80946   55039   38107   27159   20458   16014   12880   10499
 6        362281  389773  342665  193489  119102   89410   68385   52381   38714   28658   22422   16648
 7        568520  709847  520398  376507  247302  163017  113620   78299   77667   77737   61662   45062
 8        260607  314794  239052  156744   75399   55690   47679   35290   26314   19906   15648   12652
 9        343351  606131  449023  293467  203909  138661   98453   69934   52527   41170   33488   27121
10        488340  608562  465124  349979  211139  154070  120343   83696   58420   45971   38181   30501
11        543854  695852  546604  345808  232369  162905  121406   89613   56855   49194   39882   30356
12        608319  780351  590086  389346  282451  150471  120814   89364   71478   53648   42459   34269

Accident                                  Development year (j)
year (i)      13      14      15      16      17      18      19      20      21      22      23      24
 1          8624    4330    2745    1912    1078     549     144      92      40       0       0       0
 2         12910    5802    2410     890     607     492     233     164     113     109      86       0
 3          4465     312    -915    -960    -972    -727    -665    -916    -381      45       2       0
 4         10038    4547    2169    1262     845     505     307     184      77       2       0       0
 5          4536    1677     392     -26      67      -5      -2       0       0       0       0       0
 6          6603    2697     210     527     460     149     -14     -55     -53     -35     -47       0
 7         22612   10427    6164    3476    2441    1763    1369    1196     607     116       0       0
 8          5291    1577     297      19     449     207      21       0       0       0       0       0
 9         14972    4631    2369    1319     613     333     170     147      95      61       5       0
10         14778    6426    3221    1206    1148     529      93      81     111      57       8       0
11         13950    5042    1794    1133     619     259     -42     -49     106      -9     -49       0
12         15803    5448    2357    1106     104     664     225     133       2      12       8       0


Appendix B: R code

# Jelger Roet, UvA 2014
# Run [1], [2] and then the function Initialize() before evaluating.
# [1] Sets the parameters, [2] Sets the functions
# [3] Generates single triangles and shows the corresponding plots

# [0] Evaluation
Q99reps=2000      # Repeats for simulated q99 (the 99th percentile)
Q99reps.CL=50000  # Repeats for CL bootstrap q99 (the 99th percentile)
sim.errors()      # Used to obtain the errors, min factor is set inside the
                  # function.
# Variables in this function are:
# exact=TRUE/FALSE, q99=TRUE/FALSE
# gamma=TRUE/FALSE, pareto=TRUE/FALSE,
# max.var.factor, max.ibnr.spread.factor,
# max.rbns.spread.factor, max.inflation.factor, max.growth.factor,
# total.obs, repeat.amount

# [1] Parameters
# Packages needed:
suppressPackageStartupMessages(library(DCL))
suppressPackageStartupMessages(library(plyr))
suppressPackageStartupMessages(library(reshape2))
suppressPackageStartupMessages(library(ChainLadder))

# Triangle, Claim and Delay Properties

trianglesize=12                    # Number of accident years
settle.length=trianglesize         # Maximum time to settle claims
claimcount.firstyear=7000          # Claim count total in first accident year
zeroclaimfactor=0.2                # Zero claim rate, in decimals
payments.mean=203.01               # Mean distribution individual claim sizes
payments.variance=3496125          # Variance distribution individual claim sizes
growth=data.frame(t(c(1.2, 1.1, 1.05, 0.9, 0.95, 1.2,   # Growth
                      1.1, 0.95, 1.05, 1.05, 0.95)))
inflation=data.frame(t(c(1, 1.5, 1.6, 1.3, 0.8, 1.5,    # Inflation
                         1.9, 0.75, 1.6, 1.5, 1.8, 2)))
IBNR.location=1.5                  # Can be interpreted as the year in which
                                   # the claim reporting percentage 'peaks'
IBNR.scale=0.75                    # Can be interpreted as a spread factor,
                                   # indicating spread around the 'peak' year
RBNS.location=1                    # Can be interpreted as the year in which the
                                   # claim settling percentage 'peaks'
RBNS.scale=1.75                    # Can be interpreted as a spread factor,
                                   # indicating spread around the 'peak' year

# [2] Functions

# Function to set starting values:
Initialize <- function(){
  seed1 <<- 21                     # Seed number for N
  seed2 <<- 22                     # Seed number for X
  seed3 <<- 21                     # Seed number for Xpaid
  randomnumber <<- 10^5*runif(1)   # Ensure random seed where needed
  use.exact <<- TRUE               # Set starting values
  use.q99 <<- FALSE
  use.gamma <<- TRUE
  use.pareto <<- FALSE
  use.variance <<- FALSE
  use.rbns.spread <<- FALSE
  use.ibnr.spread <<- FALSE
  use.inflation <<- FALSE
  use.growth <<- FALSE
  # N rectangle:
  Nrectanglesimulated <<- data.frame(simN(seed1))
  # N triangle:
  Ntrianglesimulated <<- removelowertriangle(Nrectanglesimulated)
}

# Distribution individual claim sizes:
Ydistr <- function (triangle, mean, variance) {
  if(use.gamma == TRUE){
    scale <- variance/mean
    shape <- mean/scale
    return(rgamma(triangle, shape=shape, scale=scale))}
  if(use.pareto == TRUE){
    shape=2*variance/(variance-mean^2)
    scale=mean*(shape-1)
    return(rpareto(triangle, theta=scale, kappa=shape))}
}

# Simulate random Pareto by using the inverse CDF:
rpareto <- function(n, theta, kappa){
  u <- runif(n)
  return(theta/(1-u)^(1/kappa)-theta)
}

# Rotate matrix:
rotate <- function(x){
  t(apply(x, 2, rev))
}

# Removes lower right triangle of a square matrix:
removelowertriangle <- function(rectangle) {
  one <- rotate(rectangle)
  two <- lower.tri(one,diag= FALSE)
  three <- rotate(rotate(rotate(two)))
  rectangle[three] <- NA
  triangle <- rectangle
  return(triangle)
}

# Removes upper left triangle of a square matrix:
removeuppertriangle <- function(rectangle) {
  one <- rotate(rectangle)
  two <- upper.tri(one,diag= TRUE)
  three <- rotate(rotate(rotate(two)))
  rectangle[three] <- NA
  triangle <- rectangle
  return(triangle)
}

# Square of aggregate claim counts:
simN <- function(seed){
  Ntrisim <- c()
  claimcount=claimcount.firstyear
  for (i in 1:trianglesize) {
    est.prob <- c()
    for (j in 1:trianglesize) {
      newprob=dcauchy(j+0.25,location=IBNR.location,scale=IBNR.scale)
      est.prob <- append(est.prob,newprob)
    }
    set.seed(randomnumber)                         # Seed
    randomnumber <<- 10^5*runif(1)                 # Seed
    if (use.variance == TRUE){set.seed(seed)}      # Seed
    if (use.rbns.spread == TRUE){set.seed(seed)}   # Seed
    if (use.inflation == TRUE){set.seed(seed)}     # Seed
    Ntrisim <- rbind(Ntrisim,t(rmultinom(1, size=claimcount, prob = est.prob)))
    # growth holds trianglesize-1 factors, so there is none after the last year
    if (i < trianglesize) {claimcount=claimcount*growth[1,i]}
  }
  return(Ntrisim)
}

# Square of aggregate incurred claim amounts:
simX <- function(seed){
  Xsim <- c()
  for (i in 1:nrow(Nrectanglesimulated)) {
    vectol <- c()
    inflationpar=inflation[1,i]
    set.seed(randomnumber)                           # Seed
    if (use.rbns.spread == TRUE){set.seed(i*seed)}   # Seed
    if (use.ibnr.spread == TRUE){set.seed(i*seed)}   # Seed
    if (use.growth == TRUE){set.seed(i*seed)}        # Seed
    allpaymentsyeari <- Ydistr(claimcount.firstyear*max(growth)^trianglesize,
                               inflationpar*payments.mean,
                               inflationpar^2*payments.variance)
    for (j in 1:ncol(Nrectanglesimulated)) {
      # From the third development year on, amounts are negative with
      # probability 0.1
      if (j > 2) {
        negativefactor = runif(1)-0.1
        negativefactor = negativefactor/abs(negativefactor)
      } else {
        negativefactor=1
      }
      if (j == 1){
        vectol<- append(vectol, negativefactor*(sum(allpaymentsyeari[
          1:Nrectanglesimulated[i,1]])))
      } else {
        vectol<- append(vectol, negativefactor*(sum(allpaymentsyeari[
          Nrectanglesimulated[i,j-1]+1:Nrectanglesimulated[i,j]])))
      }
    }
    Xsim <- rbind(Xsim,1*vectol)
  }
  return(Xsim)
}

# Rectangle of aggregate payments:
simXpaid <- function(seed) {
  claimtotals <- c()
  for (i in 1:nrow(Nrectanglesimulated)) {
    claimtotalyear <- rep(0,(trianglesize+settle.length))
    est.prob <- c()
    for (q in 1:settle.length) {
      newprob=dcauchy(q+0.25,location=RBNS.location,scale=RBNS.scale)
      est.prob <- append(est.prob,newprob)
    }
    for (j in 1:ncol(Nrectanglesimulated)) {
      set.seed(randomnumber)                         # Seed
      if (use.variance == TRUE){set.seed(seed)}      # Seed
      if (use.ibnr.spread == TRUE){set.seed(seed)}   # Seed
      if (use.inflation == TRUE){set.seed(seed)}     # Seed
      if (use.growth == TRUE){set.seed(seed)}        # Seed
      claims <- rep(0,(j-1))
      if ((1-zeroclaimfactor)*Xrectanglesimulated[i,j] < 0) {
        abssize = abs((1-zeroclaimfactor)*Xrectanglesimulated[i,j])
        abssizefactor = -1
      } else {
        abssize = (1-zeroclaimfactor)*Xrectanglesimulated[i,j]
        abssizefactor = 1
      }
      claims <- append(claims,t(abssizefactor*rmultinom(
        1, size=abssize, prob = est.prob)))
      claims <- append(claims,rep(0,(trianglesize-(j-1))))
      claimtotalyear <- claimtotalyear + claims
    }
    claimtotals <- rbind(claimtotals, claimtotalyear)
  }
  return(claimtotals)
}

# Calculates the errors
sim.var.help <- function() {
  # Triangles
  Nrectanglesimulated <<- data.frame(simN(seed1))
  Ntrianglesimulated <<- removelowertriangle(Nrectanglesimulated)
  Xrectanglesimulated <<- data.frame(simX(seed2))
  Xtrianglesimulated <<- removelowertriangle(Xrectanglesimulated)
  Xpaidrectanglesimulated <<- data.frame(simXpaid(seed3))
  Xpaidtrianglesimulated <<- removelowertriangle(Xpaidrectanglesimulated)[,
    -(trianglesize+1):-(trianglesize+settle.length)]
  if (use.exact == TRUE){
    # CL estimation
    preds.cl.diag = clm(Xpaidtrianglesimulated)
    LowerTriangle.CL = removeuppertriangle(preds.cl.diag$triangle.hat)
    LowerTriangle.CL[is.na(LowerTriangle.CL)] <- 0
    TotalRes.CL = sum(LowerTriangle.CL)
    # DCL estimation
    my.dcl.par = dcl.estimation(Xpaidtrianglesimulated,Ntrianglesimulated,
                                Tables = FALSE)
    preds.dcl.diag = dcl.predict(dcl.par=my.dcl.par,Ntriangle=
                                 Ntrianglesimulated,num.dec=0,Tables = FALSE)
    TotalRes.DCL = preds.dcl.diag$Dtotal[nrow(data.frame(
      preds.dcl.diag$Dtotal))]
    # BDCL estimation
    my.bdcl.par = bdcl.estimation( Xpaidtrianglesimulated,
                                   Ntrianglesimulated , Xtrianglesimulated,
                                   Tables=FALSE)
    my.bdcl.par$inflat <- my.bdcl.par$inflat/my.bdcl.par$inflat[1]
    preds.bdcl.diag = dcl.predict(dcl.par=my.bdcl.par,Ntriangle=
                                  Ntrianglesimulated,num.dec=0, Tables=FALSE)
    TotalRes.BDCL = preds.bdcl.diag$Dtotal[nrow(data.frame(
      preds.bdcl.diag$Dtotal))]
    # Reserve error
    Xpaidtrianglesimulated[is.na(Xpaidtrianglesimulated)] <- 0
    TotalRes.Sim = sum(Xpaidrectanglesimulated)-sum(Xpaidtrianglesimulated)
    # Error per method
    TotalError.CL = TotalRes.CL-TotalRes.Sim
    TotalError.DCL = TotalRes.DCL-TotalRes.Sim
    TotalError.BDCL = TotalRes.BDCL-TotalRes.Sim
    TotalErrors = rbind(TotalError.CL, TotalError.DCL, TotalError.BDCL)
  }
  if (use.q99 == TRUE){
    # CL estimation
    Q99.CL = quantile(BootChainLadder(Xpaidtrianglesimulated, R = Q99reps.CL,
      process.distr=c("gamma", "od.pois")), 0.99)$Totals[1,1]
    # DCL estimation
    my.dcl.par = dcl.estimation(Xpaidtrianglesimulated,Ntrianglesimulated,
                                Tables = FALSE)
    dist.dcl <- dcl.boot(dcl.par=my.dcl.par,Ntriangle=Ntrianglesimulated,
                         boot.type=2)
    Q99.DCL = data.frame(dist.dcl$summ.total)$Q99.total[
      nrow(dist.dcl$summ.total)]
    # BDCL estimation
    my.bdcl.par = bdcl.estimation( Xpaidtrianglesimulated ,
                                   Ntrianglesimulated , Xtrianglesimulated,
                                   Tables=FALSE)
    my.bdcl.par$inflat <- my.bdcl.par$inflat/my.bdcl.par$inflat[1]
    dist.bdcl <- dcl.boot(dcl.par=my.bdcl.par,Ntriangle=
                          Ntrianglesimulated, boot.type=2)
    Q99.BDCL = data.frame(dist.bdcl$summ.total)$Q99.total[
      nrow(dist.bdcl$summ.total)]
    # Error per method
    TotalError.CL = Q99.CL-Q99.Sim
    TotalError.DCL = Q99.DCL-Q99.Sim
    TotalError.BDCL = Q99.BDCL-Q99.Sim
    TotalErrors = rbind(TotalError.CL, TotalError.DCL, TotalError.BDCL)
  }
  return(TotalErrors)
}

# Sets up the values needed for "sim.var.help"
sim.errors <- function(max.var.factor, max.rbns.spread.factor,
                       max.ibnr.spread.factor, max.inflation.factor,
                       max.growth.factor, total.obs, repeat.amount,
                       exact, q99, gamma, pareto) {
  use.variance <- c()
  use.rbns.spread<- c()
  use.ibnr.spread<- c()
  use.inflation <- c()
  use.growth <- c()
  use.exact <- c()
  use.q99 <- c()
  use.gamma <- c()
  use.pareto <- c()
  # Store the standard parameter values so they can be restored
  old.ibnr.spread <<- IBNR.scale
  old.rbns.spread <<- RBNS.scale
  old.inflation <<- inflation
  old.variance <<- payments.variance
  old.growth <<- growth
  on.exit( {
    use.variance <<- FALSE
    use.rbns.spread <<- FALSE
    use.ibnr.spread <<- FALSE
    use.inflation <<- FALSE
    use.growth <<- FALSE
    use.gamma <<- FALSE
    use.pareto <<- FALSE
  } )
  if(missing(max.var.factor)){
    variancefactor <<- 1
    use.variance <<- FALSE
  } else {
    max.factor <<- max.var.factor
    min.factor <<- 0.1
    use.variance <<- TRUE
  }
  if(missing(max.ibnr.spread.factor)){
    ibnrspreadfactor <<- IBNR.scale
    use.ibnr.spread <<- FALSE
  } else {
    max.factor <<- max.ibnr.spread.factor
    min.factor <<- 0.25
    use.ibnr.spread <<- TRUE
  }
  if(missing(max.inflation.factor)){
    inflationfactor <<- 1
    use.inflation <<- FALSE
  } else {
    max.factor <<- max.inflation.factor
    min.factor <<- 0.25
    use.inflation <<- TRUE
  }
  if(missing(max.rbns.spread.factor)){
    rbnsspreadfactor <<- 1
    use.rbns.spread <<- FALSE
  } else {
    max.factor <<- max.rbns.spread.factor
    min.factor <<- 0.25
    use.rbns.spread <<- TRUE
  }
  if(missing(max.growth.factor)){
    growthfactor <<- 1
    use.growth <<- FALSE
  } else {
    max.factor <<- max.growth.factor
    min.factor <<- 0.9
    use.growth <<- TRUE
  }
  if(missing(exact)){ use.exact <<- FALSE } else { use.exact <<- TRUE }
  if(missing(q99)){ use.q99 <<- FALSE } else { use.q99 <<- TRUE }
  if(missing(gamma)){ use.gamma <<- FALSE } else { use.gamma <<- TRUE }
  if(missing(pareto)){ use.pareto <<- FALSE } else { use.pareto <<- TRUE }
  R <- total.obs*repeat.amount                       # Progress bar
  progress_bar_text <- create_progress_bar("text")   # Progress bar
  progress_bar_text$init(R)                          # Progress bar
  Totalpercentagevar<<-c(0,0,0)
  List.errors <<- c()
  x <<- seq(min.factor, max.factor, length = total.obs)
  for (q in 1:total.obs) {
    count=0
    List.temp=0
    # Reserve Q99
    if(missing(q99)==FALSE){
      Error.list <<- getq99()
      Q99.Sim <<- quantile(Error.list,0.99)
    }
    for (w in 1:repeat.amount) {
      progress_bar_text$step()                       # Progress bar
      if(missing(max.var.factor)){ payments.variance <<- old.variance
      } else { payments.variance <<- payments.variance*(min.factor +
                 q/(total.obs/(max.factor-min.factor))) }
      if(missing(max.ibnr.spread.factor)){ IBNR.scale <<- old.ibnr.spread
      } else { IBNR.scale <<- min.factor +
                 q/(total.obs/(max.factor-min.factor)) }
      if(missing(max.inflation.factor)){ inflation <<- old.inflation
      } else { inflation <<- inflation*(min.factor +
                 q/(total.obs/(max.factor-min.factor))) }
      if(missing(max.rbns.spread.factor)){ RBNS.scale <<- old.rbns.spread
      } else { RBNS.scale <<- min.factor +
                 q/(total.obs/(max.factor-min.factor)) }
      if(missing(max.growth.factor)){ growth <<- old.growth
      } else { growth <<- growth*(min.factor +
                 q/(total.obs/(max.factor-min.factor))) }
      List.temp=List.temp + abs(sim.var.help())
      count=count+1
      # Restore the standard parameter values
      IBNR.scale <<- old.ibnr.spread
      RBNS.scale <<- old.rbns.spread
      inflation <<- old.inflation
      growth <<- old.growth
      payments.variance <<- old.variance
    }
    List.errors <<- cbind(List.errors, List.temp/count)
  }
  List.errors <- cbind(X=c(seq(from = min.factor, to = max.factor,
                               by = (max.factor-min.factor)/(total.obs-1))),
                       data.frame(t(List.errors)))
  return(List.errors)
}

# Simulates the distribution of the total reserve
getq99 <- function(){
  Errorlist <- c()

  R <- Q99reps                                       # Progress bar
  progress_bar_text <- create_progress_bar("text")   # Progress bar
  progress_bar_text$init(R)                          # Progress bar
  for(i in 1:Q99reps) {
    progress_bar_text$step()                         # Progress bar
    Nrectanglesimulated <<- data.frame(simN(seed1))
    Ntrianglesimulated <<- removelowertriangle(Nrectanglesimulated)
    Xrectanglesimulated <<- data.frame(simX(seed2))
    Xpaidrectanglesimulated <<- data.frame(simXpaid(seed3))
    Xpaidtrianglesimulated <<- removelowertriangle(Xpaidrectanglesimulated)[,
      -(trianglesize+1):-(trianglesize+settle.length)]
    Xpaidtrianglesimulated[is.na(Xpaidtrianglesimulated)]<- 0
    TotalRes.Sim = sum(Xpaidrectanglesimulated)-sum(Xpaidtrianglesimulated)
    Errorlist = rbind(Errorlist, TotalRes.Sim)
  }
  return(Errorlist)
}

# [3] Example of generating single triangles
# Triangles
Initialize()
Nrectanglesimulated=data.frame(simN(seed1))                    # N rectangle
Ntrianglesimulated=removelowertriangle(Nrectanglesimulated)    # N triangle
Plot.triangle(Ntrianglesimulated, Histogram=TRUE)              # N triangle plot
Xrectanglesimulated=data.frame(simX(seed2))                    # X rectangle
Xtrianglesimulated=removelowertriangle(Xrectanglesimulated)    # X triangle
Xpaidrectanglesimulated=data.frame(simXpaid(seed3))            # Xpaid rectangle
Xpaidtrianglesimulated=removelowertriangle(Xpaidrectanglesimulated)[,
  -(trianglesize+1):-(trianglesize+settle.length)]             # Xpaid triangle
Plot.triangle(Xpaidtrianglesimulated, Histogram=TRUE)          # Xpaid triangle plot

# CL + Bootstrap
preds.cl.diag = clm(Xpaidtrianglesimulated)
BootChainLadder(Xpaidtrianglesimulated, R = 50000,
                process.distr=c("gamma", "od.pois"))

# DCL + Bootstrap + Parameter plot
my.dcl.par=dcl.estimation(Xpaidtrianglesimulated,Ntrianglesimulated)
preds.dcl.diag<-dcl.predict(dcl.par=my.dcl.par,Ntriangle=Ntrianglesimulated,
                            num.dec=0)
dist.dcl<-dcl.boot(dcl.par=my.dcl.par,Ntriangle=Ntrianglesimulated,boot.type=2)
Plot.dcl.par(my.dcl.par)

# BDCL + Bootstrap + Parameter plot
my.bdcl.par = bdcl.estimation( Xpaidtrianglesimulated , Ntrianglesimulated,
                               Xtrianglesimulated, Tables=FALSE)
my.bdcl.par$inflat<- my.bdcl.par$inflat/my.bdcl.par$inflat[1]
preds.bdcl.diag = dcl.predict(dcl.par=my.bdcl.par,Ntriangle=
                              Ntrianglesimulated,num.dec=0, Tables=FALSE)
dist.bdcl<-dcl.boot(dcl.par=my.bdcl.par,Ntriangle=Ntrianglesimulated,
                    boot.type=2)
Plot.dcl.par(my.bdcl.par)


References

Auguie, B. (2012). gridExtra: functions in Grid graphics. R package version 0.9.1.

Bain, L. J. & Engelhardt, M. (1992). Introduction to Probability and Mathematical Statistics (Second ed.). Cengage Learning.

Bornhuetter, R. L. & Ferguson, R. E. (1972). The actuary and IBNR. Proceedings of the Casualty Actuarial Society, LIX, 181–195.

England, P. (2002). Addendum to analytic and bootstrap estimates of prediction errors in claims reserving. Insurance: Mathematics and Economics, 31 (3), 461–466.

England, P. & Verrall, R. (1999). Analytic and bootstrap estimates of prediction errors in claims reserving. Insurance: Mathematics and Economics, 25 (3), 281–293.

Gesmann, M., Murphy, D., & Zhang, W. (2013). ChainLadder: Statistical methods for the calculation of outstanding claims reserves in general insurance. R package version 0.1.7.

Kaas, R., Goovaerts, M., Dhaene, J., & Denuit, M. (2008). Modern Actuarial Risk Theory: Using R (Second ed.). Heidelberg: Springer.

Martínez-Miranda, M. D., Nielsen, B., Nielsen, J. P., & Verrall, R. (2011). Cash flow simulation for a model of outstanding liabilities based on claim amounts and claim numbers. ASTIN Bulletin, 41 (1), 107–129.

Martínez-Miranda, M. D., Nielsen, J. P., & Verrall, R. (2012). Double chain ladder. ASTIN Bulletin, 42 (1), 59–76.

Martínez-Miranda, M. D., Nielsen, J. P., & Verrall, R. (2013a). DCL: Claims Reserving under the Double Chain Ladder Model. R package version 0.1.0.

Martínez-Miranda, M. D., Nielsen, J. P., & Verrall, R. (2013b). Double chain ladder and Bornhuetter-Ferguson. North American Actuarial Journal, 17 (2), 101–113.

R Core Team (2014). R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing.

Sharpsteen, C. & Bracken, C. (2013). tikzDevice: R Graphics Output in LaTeX Format. R package version 0.7.0.

Verrall, R., Nielsen, J. P., & Jessen, A. H. (2010). Prediction of RBNS and IBNR claims using claim amounts and claim counts. ASTIN Bulletin, 40 (2), 871–887.

Wickham, H. (2007). Reshaping data with the reshape package. Journal of Statistical Software, 21 (12), 1–20.

Wickham, H. (2009). ggplot2: elegant graphics for data analysis. Springer New York. R package version 0.9.3.1.

Wickham, H. (2011). The split-apply-combine strategy for data analysis. Journal of Statistical Software, 40 (1), 1–29.
