Testing the assumptions of chain ladder method and bootstrap method.


Chain Ladder Method

and

Bootstrap Method

Wella Shindy

Master’s Thesis to obtain the degree in Actuarial Science and Mathematical Finance University of Amsterdam

Faculty of Economics and Business Amsterdam School of Economics

Author: Wella Shindy

Student nr: ovu89482

Email: w.shindy@gmail.com

Date: March 5, 2014

Supervisor: K. Antonio

Second reader: R. Kaas


Abstract

Motivation A non-life insurance company wants to estimate the total required claims reserves and to determine the reliability of these estimates. The Chain Ladder method and the residual bootstrapping method are often used for this purpose. Next to the Chain Ladder method, this thesis covers two residual bootstrapping methods that are often used simultaneously, namely the Over-dispersed Poisson (ODP) Bootstrap and the Mack Bootstrap. For Solvency II requirements it is recommended that the insurance company verifies the assumptions of the methods it applies. This thesis combines the available literature on the assumptions of these methods and on testing these assumptions.

Method In this thesis we start by describing the estimation process of the Chain Ladder method, the ODP Bootstrap method and the Mack Bootstrap method. Next, we provide a complete set of assumptions of these methods. Thereafter we describe some tests that can be used to verify these assumptions and show how the output of these tests can be used in practice. Finally, we summarize the test results for the tested assumptions using Motor Third Party Liability (MTPL) data from a well-known Dutch insurance company and suggest possible areas for future research.

Results Through this thesis we illustrate the practical implementation of testing the assumptions of the Chain Ladder method, the ODP Bootstrap method and the Mack Bootstrap method.

Conclusions This thesis outlines the full set of assumptions of the methods considered in this research and the tests available to the practicing actuary to verify these assumptions.

Availability As part of this thesis a set of Excel 2003 files has been made available for reference. These files contain the calculations illustrated in this thesis. For further insight into the calculation details of these files, please email me at wshindy@gmail.com.

Keywords Check Assumptions, Chain ladder method, Bootstrap method, Over-dispersed


Preface

The thesis that lies in front of you is a final product of my Master’s course Actuarial Science at the University of Amsterdam.

This thesis would not have been possible without the support and guidance of my supervisor at the University of Amsterdam, Katrien Antonio. I was able to finish this thesis thanks to her sharp criticism of my work. I also offer my sincere gratitude to Rob Kaas, who put his time and effort into reading and evaluating this thesis.

Special thanks to my supervisor at Achmea, Jan Willem Vulto for his inspiration and guidance.

Finally, but most importantly I thank my family who has supported me through all these times. Starting with my mother, Candy, my family in-law, the La Rose family, Sandra for their interest, moral support and mostly their flexibility on babysitting my daughter, Aimee. And to my lovely husband, Diego, for his patience and support, most of all, giving me the strength to carry on during the more difficult moments.

Amstelveen, Wella Shindy March 5, 2014


Introduction

1.1 Background

Insurance companies, and specifically Achmea, want to settle their submitted claims as soon as possible. That is why Achmea estimates the total required claims reserves. Such settlements can be split into two parts, namely settlements of claims submitted for previous years and the current year, and expected settlements in future years. There are two cases for the latter. Firstly, damage that occurs in a given year is often not settled in the same year, for various reasons: for example, long legal procedures as in liability insurance claims, spread payments as in disability insurance, or the fact that the size of the claims is hard to predict. Secondly, damage may already have occurred without yet having been reported by the insured. All of these cases make it more difficult to estimate the expected payments. Achmea holds reserves to meet these expected payments (i.e., claims reserves) and determines the expected payments through an analysis of the historical payments. For this purpose Achmea uses the Chain Ladder method (Mack, 1993).

Renshaw and Verrall (1998) showed that the Chain Ladder method can be interpreted as a Generalized Linear Model (GLM). In general, a GLM can be used to suggest how the parameter estimates can be obtained, and also to suggest appropriate goodness-of-fit measures and residual definitions. Moreover, this model makes assumptions about the underlying probability distribution of the estimated settlements. Renshaw and Verrall also mentioned two types of underlying models of the Chain Ladder method, namely the Over-dispersed Poisson (ODP) model and the Mack model. This notion is made clearer in the next paragraphs.

In order to determine the reliability of the estimated settlements, Achmea uses the statistical technique of bootstrapping. Achmea applies a residual resampling bootstrap. The corresponding residuals are obtained from the Chain Ladder estimates based on payments (the Bootstrap method of England and Verrall, 1999). England and Verrall derived this method using GLMs. In general, a non-life insurance company bootstraps the residuals because residuals have a weaker dependence structure than the payments themselves, so that one often assumes that they are independent and identically distributed (i.i.d.) (England and Verrall, 2002). This assumption is, however, not valid in all cases. As addressed earlier, claims payments are often not settled directly and these payments depend on each other.

Using the connection between the Chain Ladder method and GLMs, one can connect the Bootstrap method with the underlying models mentioned before, namely the ODP model and the Mack model. In this thesis we refer to these Bootstrap methods as the Over-dispersed Poisson (ODP) Bootstrap method and the Mack Bootstrap method. Achmea uses both methods.


1.2 Problem statement

For Solvency II requirements it is recommended to verify the assumptions within the applied method. Within Achmea this applies to the Chain Ladder method, the ODP Bootstrap method and the Mack Bootstrap method. In order to comply with this requirement, Achmea has set the objective to make an inventory of all the assumptions of these methods and of the (statistical) tests needed to judge the adequacy of the corresponding assumptions. The aim of this thesis is to provide a complete set of assumptions for these methods and subsequently to provide (statistical) tests that can be used to verify these assumptions.

The research objective can be translated into the following research question:

“What measures are available to test the assumptions of the Chain Ladder and the Bootstrap method in order to assess whether these methods can be applied to the homogeneous risk group of data?”

1.3 Research approach

In order to perform the Chain Ladder method, the ODP Bootstrap method and the Mack Bootstrap method, we need data of triangular shape, a so-called “run-off triangle” with two axes. The vertical axis consists of rows which represent the years in which the damage has occurred, also known as the accident years. The horizontal axis consists of columns which represent the settlement periods of the corresponding accident year. This triangle contains settlement amounts for each combination of accident period and settlement period. For example, when the accident period is yearly and the settlement period is quarterly, we refer to this as an Annually-Quarterly (AQ) triangle.

Within this thesis we will limit our analysis to the Annually-Quarterly (AQ) dimension only. In the AQ triangle data, the payments are recorded on the basis of accident years and the settlements are recorded per quarter for the corresponding accident year. Furthermore, in order to answer our main research question we start with making an inventory of the assumptions and (statistical) tests of the methods concerned. Subsequently we examine whether all the assumptions should be checked in order to perform an appropriate estimation. Thereafter we check whether the data fulfills the assumptions.

In other words, we will answer the following research subquestions:

1. What are the basic assumptions of each underlying method?

2. Should all assumptions be checked in order to perform an appropriate estimation, and if so, which ones?

3. Which measures are available to judge the assumptions of each underlying method?

4. Does the data reasonably fulfill the assumptions?

Our approach towards answering the subquestions begins with making an inventory of the assumptions of each method (chapter 2) and explaining how each method works. Thereafter we discuss the (statistical) measures to judge these assumptions. These measures are described in detail in chapter 3. Based on these two chapters, we will answer subquestions 1 to 3. Chapter 4 is dedicated to answering subquestion 4. Finally, in chapter 5, we draw our conclusions based on the previous chapters and answer our main research question. Furthermore we provide some recommendations.


Reserves Estimation Methods

Within this chapter we will discuss the definitions of the methods in our scope, namely the Chain Ladder method, the ODP Bootstrap method and the Mack Bootstrap method. This chapter consists of four sections.

In the first section, we start with explaining the meaning of claims reserve and the data used to estimate the claims reserve. In order to perform the estimation, we start with a so-called “run-off triangle” with two axes. These axes are generally given in a fiscal time dimension, such as Annually-Annually (AA) and Annually-Quarterly (AQ). As mentioned earlier, we limit our analysis to AQ triangles, whereas the run-off triangles in the available literature are often of AA dimension. Therefore, in order to give a complete picture of the run-off triangle, we discuss both triangles in section 2.1.

In the second section, we give the definition of the Chain Ladder method for an AQ triangle, illustrate the Chain Ladder estimation with an example and obtain results based on this example. Subsequently, we discuss the connection between the Chain Ladder method and the Weighted Least Squares (WLS) method. This connection is needed in order to be able to judge the assumptions of the Chain Ladder method. Next, we discuss the connection between the Chain Ladder method and Generalized Linear Models (GLM). This connection is required in order to give the definition of the ODP Bootstrap method and the Mack Bootstrap method. Finally, we give the underlying assumptions of the Chain Ladder method, which allow us to answer subquestions 1 and 2 for the Chain Ladder method.

The third section is dedicated to the Bootstrap method. We start by discussing the general idea of the Bootstrap method. Thereafter, we specify the Bootstrap method that is used by Achmea. As mentioned before, we use the connection between the Chain Ladder method and GLM to explain the definitions of the ODP Bootstrap method and the Mack Bootstrap method. Next, we give the underlying assumptions of the Bootstrap method. We are then able to answer subquestions 1 and 2 for the Bootstrap method.

In the fourth and final section we will summarize the discussed topics.

2.1 Run-off Triangle

Insurance companies want to predict claims reserves in order to meet their future payments. Claims reserves can be divided into two types: the outstanding claims reserve and the Incurred But Not Reported (IBNR) claims reserve.

A claim that occurs in a given year is often not settled in that same year, for several reasons: for example, long legal procedures as in liability insurance claims, spread payments as in disability insurance, or the fact that the sizes of the claims are hard to predict. All these examples lead to a delay of the actual payments of the claims. This means that an outstanding claims reserve is required for claims that have been reported, but not yet closed.

Moreover, there are cases where a claim already occurred, but the insurer is not yet notified. For such claims IBNR claims reserves are required.

In order to determine claims reserves, one starts with a run-off triangle. There are two types of run-off triangles, namely paid claims triangles and incurred claims triangles. Paid claims triangles contain historical payments, and incurred claims triangles contain the sum of the historical payments and the case reserves for particular claims. Within the scope of this research, we will only look at paid claims triangles since this is currently used by Achmea. From this point onwards, we will refer to paid claims triangle as claims triangle.

In the next sections we will further elaborate on the AA triangle and AQ triangle.

2.1.1 Annually-Annually (AA) Triangle

Denote the claims reported for accident year $i$ and settlement period $k$ by $I_{i,k}$. Then the complete set of data is

\[ \{ I_{i,k} : 1 \le k \le n,\ 1 \le i \le n - k + 1 \}, \tag{2.1} \]

which may be represented as an AA run-off triangle:

\[
\begin{array}{llll}
I_{1,1} & I_{1,2} & \cdots & I_{1,n} \\
I_{2,1} & I_{2,2} & \cdots \; I_{2,n-1} & \\
\vdots & & & \\
I_{n,1} & & &
\end{array}
\]

We also refer to the above triangle as an incremental claims triangle and to $I_{i,k}$ as an incremental claims total.

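Denote the cumulative claims for accident year $i$ and settlement period $j$ by $C_{i,j}$. The cumulative claims $C_{i,j}$ are the incremental claims accumulated along row $i$ up to and including settlement period $j$,

\[ C_{i,j} = \sum_{k=1}^{j} I_{i,k}. \tag{2.2} \]

Thus, the cumulative claims triangle is

\[
\begin{array}{llll}
C_{1,1} & C_{1,2} & \cdots & C_{1,n} \\
C_{2,1} & C_{2,2} & \cdots \; C_{2,n-1} & \\
\vdots & & & \\
C_{n,1} & & &
\end{array}
\]

The aim is to obtain estimates of $\{ C_{i,n} : i = 2, 3, \dots, n \}$.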

2.1.2 Annually-Quarterly (AQ) Triangle

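An AQ incremental claims triangle is presented by:

\[
\begin{array}{llllll}
I_{1,1} & I_{1,2} & I_{1,3} & I_{1,4} & \cdots & I_{1,4(n-i+1)} \\
I_{2,1} & I_{2,2} & I_{2,3} & I_{2,4} & \cdots & I_{2,4(n-i+1)} \\
\vdots & & & \vdots & & \\
I_{n,1} & I_{n,2} & I_{n,3} & I_{n,4} & &
\end{array}
\]

and the complete set of incremental claims totals is

\[ \{ I_{i,k} : 1 \le i \le n,\ 1 \le k \le 4(n - i + 1) \}. \tag{2.3} \]

The cumulative claims $C_{i,j}$ for accident year $i$ and development period $j$ are

\[ C_{i,j} = \sum_{k=1}^{j} I_{i,k}, \quad \text{where } j = 1, \dots, 4(n - i + 1), \tag{2.4} \]

and the AQ cumulative claims triangle can be presented by:

\[
\begin{array}{llllll}
C_{1,1} & C_{1,2} & C_{1,3} & C_{1,4} & \cdots & C_{1,j} \\
C_{2,1} & C_{2,2} & C_{2,3} & C_{2,4} & \cdots & C_{2,j} \\
\vdots & & & \vdots & & \\
C_{n,1} & C_{n,2} & C_{n,3} & C_{n,4} & &
\end{array}
\]

The aim is to obtain estimates of $\{ C_{i,4n} : i = 2, 3, \dots, n \}$.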

2.2 Chain Ladder method

Based on cumulative claims triangles as discussed in section 2.1, we want to make a prediction of claims that will be paid or fulfilled in future years. These future claims are to be found in the lower right triangle. In order to make the prediction, one needs to complete the triangle into a rectangle. The required claims reserve is the total of the future claims payments for each accident year. Achmea uses the Chain Ladder method to complete this cumulative claims triangle.

The Chain Ladder method provides an estimate of the future claims reserve by using certain proportionality factors, also known as development factors. At first this method was considered controversial since it does not have a strong statistical basis. However, Mack (1993) has proven that the Chain Ladder method gives the minimum variance unbiased linear estimator of future payments. Additionally, since this method is relatively simple to understand and quite easy to implement, it is often applied by insurance companies.

2.2.1 Estimation Process

In this subsection we discuss the Chain Ladder method as given by Mack (1993). We define incremental claims and cumulative claims as given in equations (2.3) and (2.4) respectively. The individual development factor is the cumulative claims at settlement quarter $j + 1$ divided by the cumulative claims at the previous settlement quarter $j$,

\[ f_{i,j} = \frac{C_{i,j+1}}{C_{i,j}}, \quad \text{with } j = 1, \dots, 4(n - i + 1). \tag{2.5} \]

The development factor that is needed to estimate the future claims reserve is a weighted average of the individual development factors $f_{i,j}$, i.e.,

\[ F_j = \frac{\sum_{i=1}^{m} C_{i,j+1}}{\sum_{i=1}^{m} C_{i,j}} = \sum_{i=1}^{m} \frac{C_{i,j}}{\sum_{h=1}^{m} C_{h,j}} \, f_{i,j}, \tag{2.6} \]

with $j = 1, \dots, 4(n - i + 1)$ and $m = n - \lfloor (j-1)/4 \rfloor$, where $\lfloor (j-1)/4 \rfloor \in \mathbb{N}$.

We refer to the ratio that multiplies $f_{i,j}$ in equation (2.6), $C_{i,j} / \sum_{h=1}^{m} C_{h,j}$, as the “weight” of $f_{i,j}$. Note that it is also possible to use other weights, such as a simple average or, in some cases, a weighted average of only the most recent accident years.


Venter (1998) restated equation (2.6) by using incremental claims instead of cumulative claims. The development factor according to Venter is

\[ F_j^{\text{venter}} = \frac{\sum_{i=1}^{m} I_{i,j+1}}{\sum_{i=1}^{m} C_{i,j}} = F_j - 1. \tag{2.7} \]

Below we illustrate the estimation process of the Chain Ladder method. For simplicity we only consider an AQ triangle with two accident years ($i = 1, 2$) and at most eight settlement quarters ($k, j \le 8$). The incremental and cumulative claims triangles are presented in tables 2.1 and 2.2 respectively:

Table 2.1: Example of AQ incremental triangle

                 Settlement quarter
Accident Year    1 (3m)  2 (6m)  3 (9m)  4 (12m)  5 (15m)  6 (18m)  7 (21m)  8 (24m)
2004             101     52      17      14       12       9        3        1
2005             99      76      32      3

The figures in table 2.1 are the known claims grouped by accident year (by row) and settlement quarter (by column). The row corresponding to year 2004 contains eight figures. The first figure is the claims amount that is known on March 31, 2004. The sixth figure in this row denotes the claims that occurred in 2004 but were paid in the second quarter of 2005, i.e., the sixth development quarter of accident year 2004.

Table 2.2: Example of AQ cumulative triangle

                 Settlement quarter
Accident Year    1 (3m)  2 (6m)  3 (9m)  4 (12m)  5 (15m)  6 (18m)  7 (21m)  8 (24m)
2004             101     153     170     184      196      205      208      209
2005             99      175     207     210
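The step from table 2.1 to table 2.2 is the row-wise accumulation of equation (2.4). A minimal sketch in Python, assuming NumPy is available and using `nan` as our own convention for quarters that are not yet observed:

```python
import numpy as np

# Incremental claims from table 2.1; np.nan marks quarters not yet observed
# (the nan convention is an assumption of this sketch, not of the thesis).
incremental = np.array([
    [101, 52, 17, 14, 12, 9, 3, 1],
    [99, 76, 32, 3, np.nan, np.nan, np.nan, np.nan],
])

# Equation (2.4): C[i, j] is the running sum of I[i, k] along row i.
# np.cumsum propagates nan, so unobserved cells stay nan in the result.
cumulative = np.cumsum(incremental, axis=1)

print(cumulative)
```

The observed part of `cumulative` reproduces the figures of table 2.2.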

Based on equation (2.5) and (2.6) we determine the following:

Table 2.3: Example of individual development factors triangle and development factors

                 Settlement quarter
Accident Year    1 (3m)  2 (6m)  3 (9m)  4 (12m)  5 (15m)  6 (18m)  7 (21m)  8 (24m)
2004             1.515   1.111   1.082   1.065    1.046    1.015    1.005    1.000
2005             1.768   1.183   1.014
F                1.640   1.149   1.045   1.065    1.046    1.015    1.005    1.000
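The factors in table 2.3 can be recomputed from table 2.2 with equations (2.5) and (2.6). The sketch below assumes NumPy and the `nan` convention for unobserved cells; the function names are ours. It yields seven factors, since the final 1.000 in table 2.3 would require a ninth quarter that is not in the data.

```python
import numpy as np

# Cumulative claims from table 2.2; np.nan marks unobserved quarters.
cumulative = np.array([
    [101, 153, 170, 184, 196, 205, 208, 209],
    [99, 175, 207, 210, np.nan, np.nan, np.nan, np.nan],
])

def individual_factors(cum):
    """Equation (2.5): f[i, j] = C[i, j+1] / C[i, j]; nan where unknown."""
    return cum[:, 1:] / cum[:, :-1]

def development_factors(cum):
    """Equation (2.6): ratio of column sums over rows observed in both columns."""
    nxt, cur = cum[:, 1:], cum[:, :-1]
    both = ~np.isnan(nxt) & ~np.isnan(cur)
    return np.where(both, nxt, 0.0).sum(axis=0) / np.where(both, cur, 0.0).sum(axis=0)

f = individual_factors(cumulative)
F = development_factors(cumulative)

# Weighted-average form of (2.6) for the first quarter: weights C[i, 1] / sum C[h, 1].
weights = cumulative[:, 0] / cumulative[:, 0].sum()
F1_weighted = np.sum(weights * f[:, 0])

print(np.round(f, 3))
print(np.round(F, 3))
```

Both forms of equation (2.6) give the same value for the first quarter, 328/200 = 1.640.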

To estimate the future claims we use a recursive technique that starts from the elements in the most recent quarter, i.e., the diagonal elements. These diagonal elements are also called calendar period elements.

The cumulative claims payments at quarter $j + 1$ are estimated by multiplying the cumulative claims payments at quarter $j$ by the development factor of quarter $j$, i.e.,

\[ \hat{C}_{i,j+1} = F_j \hat{C}_{i,j}, \quad \text{with } \hat{C}_{i,j} = C_{i,j} \text{ on the diagonal}. \tag{2.8} \]

Table 2.4 below shows how to fill table 2.2 into a rectangle of data in which the claims reserve is calculated.

The result of the above calculation is given in table 2.5. The claims reserve for accident year $i$, $R_i$, is determined by subtracting the payments that are already paid from the best estimate of payments, i.e., for accident year $i = 2, \dots, n$,

\[ R_i = \hat{C}_{i,4n} - C_{i,4(n-i+1)}. \tag{2.9} \]

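Table 2.4: Example of the estimation of future claims by using the Chain Ladder method.

\[
\begin{array}{l|cccccccc}
\text{Accident Year} & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 \\
\hline
2004 & 101 & 153 & 170 & 184 & 196 & 205 & 208 & 209 \\
2005 & 99 & 175 & 207 & 210 & 210\,F_4 & 210 \prod_{j=4}^{5} F_j & 210 \prod_{j=4}^{6} F_j & 210 \prod_{j=4}^{7} F_j
\end{array}
\]

Note that we do not consider settlements that may occur later than $k_{\max} = j_{\max} = 8$. Such settlements are dealt with by applying a so-called tail factor.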

Table 2.5: Results of the prediction of future claims by using the Chain Ladder method.

Accident Year    1 (3m)  2 (6m)  3 (9m)  4 (12m)  5 (15m)  6 (18m)  7 (21m)  8 (24m)  R
2004             101     153     170     184      196      205      208      209      0
2005             99      175     207     210      224      234      237      239      29
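The forward recursion of equation (2.8) and the resulting reserve can be checked with a few lines of Python. This is a sketch under our own conventions (NumPy, `nan` for unobserved cells, hypothetical function names); it ignores the tail factor, as the example above does.

```python
import numpy as np

# Cumulative claims from table 2.2; np.nan marks unobserved quarters.
cumulative = np.array([
    [101, 153, 170, 184, 196, 205, 208, 209],
    [99, 175, 207, 210, np.nan, np.nan, np.nan, np.nan],
])

def development_factors(cum):
    """Equation (2.6): ratio of column sums over rows observed in both columns."""
    nxt, cur = cum[:, 1:], cum[:, :-1]
    both = ~np.isnan(nxt) & ~np.isnan(cur)
    return np.where(both, nxt, 0.0).sum(axis=0) / np.where(both, cur, 0.0).sum(axis=0)

def complete_triangle(cum):
    """Equation (2.8): fill each unknown cell with F[j] times the cell to its left."""
    F = development_factors(cum)
    full = cum.copy()
    for j in range(cum.shape[1] - 1):
        missing = np.isnan(full[:, j + 1])
        full[missing, j + 1] = F[j] * full[missing, j]
    return full

full = complete_triangle(cumulative)

# Latest observed cumulative payment per accident year (the diagonal).
latest = np.array([row[~np.isnan(row)][-1] for row in cumulative])

# Reserve per accident year: projected final cumulative claims minus paid to date.
reserve = full[:, -1] - latest

print(np.round(full))
print(np.round(reserve))
```

After rounding, the completed rows and the reserve column agree with table 2.5.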

2.2.2 Connection with Weighted Least Squares (WLS)

In this subsection we discuss the connection between the Chain Ladder method and WLS. This connection is necessary since we are interested in testing the assumptions of the Chain Ladder method. One of the advantages of estimating by regression, in this case WLS, is that both the standard errors of the parameters of the selected averaging method and the standard errors of the forecasts can be obtained. Another important advantage is that the assumptions of the corresponding method can be tested.

Mack (1993) mentioned the connection between the Chain Ladder method and Weighted Least Squares (WLS). Barnett and Zehnwirth (2000) provided a general definition of this relationship.

An individual development factor is the slope of the line passing through the origin and the point $(C_{i,j}, C_{i,j+1})$. So, each ratio is a trend.

Accordingly, averaging the individual development factors (trends) corresponds to the regression

\[ C_{i,j+1} = C_{i,j} F_j + \varepsilon_i. \tag{2.10} \]

Note that in this case $F_j$ is a parameter that represents the slope of the “best” line through the origin and the data points $(C_{i,j}, C_{i,j+1})$. Here $\varepsilon_i$ is the error term with

\[ E(\varepsilon_i) = 0, \qquad \mathrm{Var}(\varepsilon_i) = \sigma^2 C_{i,j}. \tag{2.11} \]

Thus,

\[ E(C_{i,j+1}) = F_j C_{i,j}. \tag{2.12} \]
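To make the connection concrete: with the error variance of equation (2.11), the natural regression weights are $1 / C_{i,j}$, and the weighted least squares slope through the origin then coincides with the Chain Ladder factor of equation (2.6). The sketch below illustrates this for the first two quarters of table 2.2. It assumes NumPy; the function name and the closed-form slope formula are standard WLS algebra and not taken from the cited papers.

```python
import numpy as np

def wls_slope_through_origin(x, y, weights):
    """Slope F that minimises sum(weights * (y - F * x)**2) (closed form)."""
    x, y, weights = (np.asarray(a, dtype=float) for a in (x, y, weights))
    return np.sum(weights * x * y) / np.sum(weights * x ** 2)

# First two settlement quarters of table 2.2.
x = np.array([101.0, 99.0])   # C[i, 1]
y = np.array([153.0, 175.0])  # C[i, 2]

# Variance proportional to C[i, j], equation (2.11): weights 1 / C[i, j].
F_wls = wls_slope_through_origin(x, y, weights=1.0 / x)

# Chain Ladder factor, equation (2.6).
F_chain_ladder = y.sum() / x.sum()

# For comparison: weights 1 / C[i, j]**2 give the simple average of the f[i, j].
F_simple_average = wls_slope_through_origin(x, y, weights=1.0 / x ** 2)

print(F_wls, F_chain_ladder, F_simple_average)
```

Changing the weights therefore changes the averaging method, which is why the variance assumption in (2.11) matters for the choice of development factor.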

2.2.3 Connection with Generalized Linear Model (GLM)

In this subsection we discuss the connection between the Chain Ladder method and GLM. We need this connection since we are interested in two extensions of the Bootstrap method that are derived using GLM.

Renshaw and Verrall (1998) related the Chain Ladder method to the Generalized Linear Model (GLM). In general, GLM can be used to suggest how the parameter estimates can be obtained, and also to suggest appropriate goodness-of-fit measures and residuals definition.

According to Renshaw and Verrall (1998), a GLM comprises two components namely a statistical component and a deterministic component.


The statistical component involves independent response variables $X_{i,j}$ together with the specification of their first two moments:

\[ E(X_{i,j}) = \mu_{i,j}, \qquad \mathrm{Var}(X_{i,j}) = \frac{\phi V(\mu_{i,j})}{\omega_{i,j}}. \tag{2.13} \]

Here $\phi > 0$ denotes a scale parameter, $\omega_{i,j}$ is a weight (often set to 1 for all observations) and $V(\cdot)$ is the variance function.

The deterministic component consists of the same linear predictor as in Kremer (1982):

\[ \eta_{i,j} = c + \alpha_i + \beta_j, \qquad \alpha_1 = \beta_1 = 0. \tag{2.14} \]

The $\alpha$ parameter is the adjustment to the constant parameter, $c$, and the $\beta$ parameter is the adjustment for the development trends after the first development quarter. The two components are linked by relating the mean to the linear predictor through the logarithmic link function

\[ \log(\mu_{i,j}) = \eta_{i,j}. \tag{2.15} \]

Renshaw and Verrall (1998) proposed modelling the incremental claims as the response variable, $X_{i,j} = I_{i,j}$, with $V(\mu_{i,j}) = \mu_{i,j}^{\gamma}$. This gives

\[ E(I_{i,j}) = \mu_{i,j}, \qquad \mathrm{Var}(I_{i,j}) = \phi \mu_{i,j}^{\gamma}. \tag{2.16} \]

The power γ(> 0) is used to specify the error distribution. The Chain Ladder method can be reproduced by substituting φ = 1 and γ = 1.

Renshaw and Verrall also linked the model in equation (2.16) with the “Over-dispersed Poisson (ODP)” error distribution. Then equation (2.13) gives

\[ E(I_{i,j}) = \mu_{i,j}, \qquad \mathrm{Var}(I_{i,j}) = \phi \mu_{i,j}. \tag{2.17} \]

Note that in this case, again, γ = 1.

To judge whether the fit of a model such as this GLM is good enough, and where it can be improved, Renshaw and Verrall (1998) also looked at the residuals. They used two residual definitions, namely deviance residuals and Pearson residuals. We will limit our research to the Pearson residuals:

\[ r^{p}_{i,j} = \frac{I_{i,j} - \mu_{i,j}}{\sqrt{\mu_{i,j}^{\gamma}}}. \tag{2.18} \]
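As a small illustration of equation (2.18), the sketch below computes unscaled Pearson residuals in Python with NumPy. The function name is ours, and the fitted values are chosen for illustration only; they are not output of a fitted GLM.

```python
import numpy as np

def pearson_residuals(incremental, fitted, gamma=1.0):
    """Unscaled Pearson residuals, equation (2.18): (I - mu) / sqrt(mu**gamma)."""
    incremental = np.asarray(incremental, dtype=float)
    fitted = np.asarray(fitted, dtype=float)
    return (incremental - fitted) / np.sqrt(fitted ** gamma)

# First three incremental claims of accident year 2004 (table 2.1).
observed = np.array([101.0, 52.0, 17.0])

# Illustrative fitted means mu[i, j] (an assumption of this sketch).
fitted = np.array([93.0, 60.0, 23.0])

# gamma = 1 corresponds to the (over-dispersed) Poisson variance function.
residuals = pearson_residuals(observed, fitted, gamma=1.0)

print(np.round(residuals, 3))
```

A positive residual means that the observed incremental claim lies above its fitted mean, measured in units of the square root of the variance function.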

2.2.4 Assumptions of the methods

In order to comply with Solvency II requirements, Achmea needs to verify whether the run-off triangle meets the method’s assumptions.

In this subsection we discuss the assumptions of the Chain Ladder method. We start with the original assumptions as given by Mack (1993) with regard to the cumulative claims. Venter (1998) restated Mack’s assumptions in terms of incremental claims. Finally, we give the interpretation of these assumptions.

2.2.4.1 Cumulative claims

Mack (1993) showed three assumptions for the Chain Ladder method. These assumptions are stated below:

1. Proportionality: There are unknown development factors $F_j$ with

\[ E(C_{i,j+1} \mid C_{i,1}, \dots, C_{i,j}) = F_j C_{i,j}, \]

2. Independency: The cumulative claims in the sets $\{C_{i,1}, \dots, C_{i,j}\}$ and $\{C_{h,1}, \dots, C_{h,j}\}$ of different accident years $i \ne h$ are independent,

3. Variance assumption: There are unknown constants $\sigma_j$ with

\[ \mathrm{Var}(C_{i,j+1} \mid C_{i,1}, \dots, C_{i,j}) = \sigma_j^2 C_{i,j}. \]

Mack also noted that these assumptions are not directly testable, but they have testable implications.

Since we based our thesis on AQ claims triangles, we applied the above assumptions for i = 1, . . . , n, and j = 1, . . . , 4(n − i + 1).

2.2.4.2 Incremental claims

Venter (1998) restated the assumptions given in the previous subsection in terms of incremental claims. The reason for this was to emphasize the fact that the reserving method is a process of predicting future incremental claims rather than cumulative claims. These assumptions are as follows:

1. Proportionality: There are unknown development factors $F_j^{\text{venter}}$ with

\[ E(I_{i,j+1} \mid C_{i,1}, \dots, C_{i,j}) = F_j^{\text{venter}} C_{i,j}, \]

2. Independency: The cumulative claims in the sets $\{C_{i,1}, \dots, C_{i,j}\}$ and $\{C_{h,1}, \dots, C_{h,j}\}$ of different accident years $i \ne h$ are independent,

3. Variance assumption: There are unknown constants $\sigma_j$ with

\[ \mathrm{Var}(I_{i,j+1} \mid C_{i,1}, \dots, C_{i,j}) = \sigma_j^2 C_{i,j}. \]

2.2.4.3 Interpretation

Since cumulative claims have a 1-1 relation to incremental claims, both Mack’s and Venter’s assumptions can be interpreted identically. Therefore we will only discuss the original assumptions as given by Mack (1993).

The proportionality assumption says that for a given accident year, the expected value of claims is proportional to the previous claims. Note that equation (2.12) expresses the proportionality assumption. In other words, in order to judge the claims triangle we can simply use the WLS definition since it is equal to the first assumption.

The independency assumption says that the accident years are independent. This assumption can be violated in practice, since dependencies between accident years can arise from calendar period effects (Mack, 1994). Calendar period effects can be caused by internal influences such as changes in claims handling or claims reserving, or by external influences such as inflation, court decisions and so on. So, we need to consider the calendar periods in order to test this assumption.

The variance assumption says that for a given accident year, the variance of claims is proportional to the previous claims with an unknown proportionality constant that depends on the development quarter. This assumption is much harder to understand than the first two. However, this assumption is needed to obtain an unbiased estimator.

2.3 Bootstrap method

In the previous section we showed how to compute a point estimate of the claims reserve using the Chain Ladder method. Now, we want to determine the reliability of this claims reserve. This means that we want to know the prediction intervals for these claims reserves, i.e., we are interested in the prediction error of the sum of random variables and in the predictive distribution.

One way to do this is by resampling the data. The resampling takes place from the data itself. In other words, one should look for an appropriate structure within the model. With the help of this structure one resamples new data sets from the observed data set. This approach generates a distribution of possible outcomes and is called the Bootstrap method (Efron, 1979).

2.3.1 Bootstrapping

England and Verrall (1999) described a method to create bootstrap estimates. These estimates are obtained by sampling (with replacement) from the observed residuals which are based on the known observations. Through sampling we obtain a large set of pseudo-data which allows us to calculate claims reserve from it. The method described by England and Verrall is derived using GLM. The standard deviation of the set of claim reserves estimates obtained in this way provides a bootstrap estimate of the prediction error. England (2002) extended this method by adopting a two-stage simulation process to provide a complete predictive distribution.

England and Verrall (2002) described bootstrapping the Over-dispersed Poisson (ODP) model and the Mack model. In their 2006 paper, they provided a general framework for bootstrapping GLMs and provided detailed results for the ODP model and the Mack model respectively. We refer to bootstrapping the ODP model as the ODP Bootstrap method and to bootstrapping the Mack model as the Mack Bootstrap method.

Below we describe the algorithm of the Bootstrap method.

The algorithm of the Bootstrap method according to England (2010):

step 1: Fit the Chain Ladder method and obtain the fitted values.

step 2: Obtain the scale parameter and calculate (scaled) Pearson residuals.

step 3: Resample residuals and obtain the pseudo data.

step 4: Apply the Chain Ladder method to each simulated pseudo triangle.

step 5: Repeat many times, storing the reserve estimates.

step 6: Obtain the prediction error, i.e., the standard deviation of results.

Note that in step 2 above a scale parameter is necessary in order to determine the scaled Pearson residuals. Furthermore, scaled Pearson residuals are needed in order to determine the process error (England and Verrall, 1999).
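The six steps above can be sketched in Python as follows. This is a simplified illustration, not the procedure used by Achmea and not the full algorithm of England (2010): it assumes NumPy, a small hypothetical cumulative triangle, unscaled Pearson residuals with a constant scale parameter (which then cancels when the pseudo data is formed), no degrees-of-freedom adjustment and no process error. The resulting standard deviation therefore reflects estimation error only. All function names are ours.

```python
import numpy as np

def dev_factors(cum):
    """Chain Ladder factors: column-sum ratios over rows observed in both columns."""
    nxt, cur = cum[:, 1:], cum[:, :-1]
    both = ~np.isnan(nxt) & ~np.isnan(cur)
    return np.where(both, nxt, 0.0).sum(axis=0) / np.where(both, cur, 0.0).sum(axis=0)

def latest_index(cum):
    """Column index of the last observed cell in each row (the diagonal)."""
    return (~np.isnan(cum)).sum(axis=1) - 1

def reserve(cum):
    """Total Chain Ladder reserve: projected final claims minus paid to date."""
    F, last = dev_factors(cum), latest_index(cum)
    return sum(cum[i, j0] * (np.prod(F[j0:]) - 1.0) for i, j0 in enumerate(last))

def fitted_incremental(cum):
    """Step 1: back-cast from the diagonal, then difference along the rows."""
    F, last = dev_factors(cum), latest_index(cum)
    fit = np.full_like(cum, np.nan)
    for i, j0 in enumerate(last):
        fit[i, j0] = cum[i, j0]
        for j in range(j0 - 1, -1, -1):
            fit[i, j] = fit[i, j + 1] / F[j]
    return np.diff(fit, axis=1, prepend=0.0)

def bootstrap_reserves(cum, n_sim=1000, seed=0):
    rng = np.random.default_rng(seed)
    observed = ~np.isnan(cum)
    inc = np.diff(cum, axis=1, prepend=0.0)
    fit = fitted_incremental(cum)
    # Step 2 (simplified): unscaled Pearson residuals of the observed cells.
    resid = (inc[observed] - fit[observed]) / np.sqrt(fit[observed])
    out = np.empty(n_sim)
    for b in range(n_sim):  # Step 5: repeat many times.
        # Step 3: resample residuals with replacement and build pseudo data.
        r_star = rng.choice(resid, size=resid.size, replace=True)
        pseudo_inc = np.full_like(cum, np.nan)
        pseudo_inc[observed] = r_star * np.sqrt(fit[observed]) + fit[observed]
        # Step 4: re-apply the Chain Ladder method to the pseudo triangle.
        out[b] = reserve(np.cumsum(pseudo_inc, axis=1))
    return out

# Hypothetical 4 x 4 cumulative triangle (not data from this thesis).
triangle = np.array([
    [100.0, 150.0, 175.0, 180.0],
    [110.0, 168.0, 192.0, np.nan],
    [95.0, 140.0, np.nan, np.nan],
    [120.0, np.nan, np.nan, np.nan],
])

sims = bootstrap_reserves(triangle, n_sim=1000, seed=1)
# Step 6: the standard deviation of the simulated reserves.
print(reserve(triangle), sims.std(ddof=1))
```

Adding the scale parameter, the bias adjustment and a process-error stage would bring the sketch closer to the ODP Bootstrap method described in the following subsections.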

Achmea uses both the ODP Bootstrap and the Mack Bootstrap method in combination with the Chain Ladder method. First, they use the Chain Ladder approach to complete the claims triangle into a rectangle, then they resample the Pearson residuals randomly over a large number of trials.

The difference between the ODP and the Mack Bootstrap method lies in the underlying distributional assumptions, which determine the definition used for the residuals and, hence, the calculation of the scale parameter. Therefore, bootstrapping cannot strictly be considered “distribution-free”, since distributional assumptions must be made to define the statistical model (England and Verrall, 2006).

The next two subsections will discuss the definition of the ODP and Mack Bootstrap method based on an AQ triangle.

2.3.2 Over-dispersed Poisson (ODP) model

Recall the Pearson residual as given in equation (2.18). This Pearson residual is unscaled in the sense that it does not include the scale parameter, φ.

To estimate the scale parameter, φ, England and Verrall (1999) used the Pearson scale parameter. They used the denominator N − p instead of N in order to reduce bias.

There are two types of scale parameters, namely constant scale parameters and non-constant scale parameters. The constant scale parameter according to England and Verrall (1999) is given by

\[ \phi = \frac{1}{N} \sum \left( \sqrt{\frac{N}{N - p}} \, r^{p} \right)^{2}. \tag{2.19} \]

The non-constant scale parameter discussed in England and Verrall (2006) is given by

\[ \phi_j = \frac{1}{m} \sum_{i=1}^{m} \left( \sqrt{\frac{N}{N - p}} \, r^{p}_{i,j} \right)^{2}, \tag{2.20} \]

where $N$ is the total number of observations, $p$ is the number of parameters estimated, and the summation is over the residuals at development period $j$, thus

\[ m = n - \left\lfloor \frac{j - 1}{4} \right\rfloor, \quad \text{where } \left\lfloor \frac{j - 1}{4} \right\rfloor \in \mathbb{N}. \]

This thesis only considers the non-constant scale parameters, since they give better standardized residuals (England, 2010).

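Under the model in equation (2.17) the scaled Pearson residuals are defined by

\[ r^{p}_{i,j} = \frac{I_{i,j} - \hat{I}_{i,j}}{\sqrt{\phi_j \hat{I}_{i,j}}}, \qquad I_{i,j} \ne \hat{I}_{i,j}, \tag{2.21} \]

where $I_{i,j}$ is as given in equation (2.3), $\phi_j$ is as given in equation (2.20) and $\mu_{i,j} = \hat{I}_{i,j}$ is the fitted incremental claims. The pseudo data, i.e., the pseudo incremental claims, is defined as

\[ I^{*}_{i,j} = r^{p*}_{i,j} \sqrt{\phi_j \hat{I}_{i,j}} + \hat{I}_{i,j}. \tag{2.22} \]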
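The relation between equations (2.20), (2.21) and (2.22) can be illustrated for a single development period. The sketch assumes NumPy; the function names, the fitted values and the counts `n_obs` and `n_params` are assumptions made for illustration only.

```python
import numpy as np

def scale_parameter(unscaled_residuals, n_obs, n_params):
    """Equations (2.19)/(2.20): mean of the squared bias-adjusted residuals."""
    adjusted = np.sqrt(n_obs / (n_obs - n_params)) * np.asarray(unscaled_residuals)
    return np.mean(adjusted ** 2)

def scaled_residuals(incremental, fitted, phi):
    """Equation (2.21): residuals standardised with the scale parameter."""
    return (incremental - fitted) / np.sqrt(phi * fitted)

def pseudo_incremental(resampled_residuals, fitted, phi):
    """Equation (2.22): invert (2.21) for a set of resampled residuals."""
    return resampled_residuals * np.sqrt(phi * fitted) + fitted

# One development period with two accident years (illustrative values).
observed = np.array([101.0, 99.0])
fitted = np.array([93.0, 107.0])
n_obs, n_params = 12, 9  # hypothetical counts N and p

unscaled = (observed - fitted) / np.sqrt(fitted)       # equation (2.18), gamma = 1
phi_j = scale_parameter(unscaled, n_obs, n_params)     # equation (2.20)
r = scaled_residuals(observed, fitted, phi_j)          # equation (2.21)

# Swapping the two residuals mimics one possible resampling outcome.
pseudo = pseudo_incremental(r[::-1], fitted, phi_j)    # equation (2.22)

print(phi_j, r, pseudo)
```

Feeding the residuals back in their original order returns the observed claims, which is a quick check that (2.22) is the inverse of (2.21).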

2.3.3 Mack model

The response variable for the Mack model can be the incremental claims, $I_{i,j}$, the cumulative claims, $C_{i,j}$, or the individual development factors, $f_{i,j}$ (England and Verrall, 2006). This thesis is limited to the individual development factors, $f_{i,j}$, since these are used by Achmea.

When using $f_{i,j}$ as the response variable, equation (2.13) gives the model presented below, according to England and Verrall (2002):

\[ E(f_{i,j}) = \lambda_j, \qquad \mathrm{Var}(f_{i,j}) = \frac{\sigma_j^2}{C_{i,j-1}}. \tag{2.23} \]

Therefore, in terms of equation (2.13), $X_{i,j} = f_{i,j}$, $\mu_{i,j} = \lambda_j$, $\omega_{i,j} = C_{i,j-1}$ and $V(\mu_{i,j}) = 1$, using the parameter $\phi = \sigma_j^2$.

Mack (1993) gives the non-constant scale parameters

$$\theta_j = \frac{1}{m-1} \sum_{i=1}^{m} C_{i,j-1} (f_{i,j} - \lambda_j)^{2}. \tag{2.24}$$

We rename $C_{i,j-1}$ to $w_{i,j}$ to emphasize that it is treated as a weight that is fixed and known. According to England and Verrall (2006) the scaled Pearson residuals are then defined as

$$r^{p}_{i,j} = \frac{\sqrt{w_{i,j}}\,(f_{i,j} - \lambda_j)}{\sigma_j}, \tag{2.25}$$

giving pseudo data

$$f^{*}_{i,j} = \frac{r^{p*}_{i,j}\,\sigma_j}{\sqrt{w_{i,j}}} + \lambda_j. \tag{2.26}$$
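Equations (2.24) to (2.26) can be illustrated with a short Python sketch for a single development period. The data are hypothetical. The sketch assumes that $\lambda_j$ is the volume-weighted Chain Ladder factor and that $\theta_j$ from equation (2.24) plays the role of $\sigma_j^{2}$; neither choice is confirmed by the equations shown in this subsection.

```python
import math

def mack_quantities(prev_cum, next_cum):
    """Return (f, lam, sigma, residuals) for one development period."""
    f = [nxt / prev for prev, nxt in zip(prev_cum, next_cum)]
    lam = sum(next_cum) / sum(prev_cum)  # assumed: volume-weighted factor
    m = len(f)
    # Equation (2.24), with weights w = previous cumulative claims.
    theta = sum(w * (fi - lam) ** 2 for w, fi in zip(prev_cum, f)) / (m - 1)
    sigma = math.sqrt(theta)             # assumed: sigma_j^2 = theta_j
    # Equation (2.25).
    residuals = [math.sqrt(w) * (fi - lam) / sigma for w, fi in zip(prev_cum, f)]
    return f, lam, sigma, residuals

def pseudo_factor(residual, weight, lam, sigma):
    """Equation (2.26)."""
    return residual * sigma / math.sqrt(weight) + lam

# Hypothetical cumulative claims at two adjacent development periods.
prev_cum = [100.0, 120.0, 90.0]
next_cum = [150.0, 170.0, 140.0]
f, lam, sigma, res = mack_quantities(prev_cum, next_cum)

# A residual returned to its own cell reproduces the observed factor.
f_back = pseudo_factor(res[0], prev_cum[0], lam, sigma)
```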


2.3.4 Estimation Process

This subsection gives an example of how to create bootstrap estimates using the ODP Bootstrap method and the Mack Bootstrap method. The algorithm of the Bootstrap method and the definitions of both the ODP and Mack methods are given in the previous subsections. Note that steps 3 to 6 are simulation steps; for this reason we will not give an extensive calculation for these steps.

The example starts with a cumulative claims triangle as given in table 2.2. In the following we split each step in two parts, namely ODP and Mack.

Step 1: Fit the Chain Ladder method and obtain the fitted model.

ODP Bootstrap method To calculate the fitted cumulative claims triangle we start with the diagonal element in the cumulative claims triangle. We put this diagonal element in the diagonal of the fitted triangle. In order to obtain all the elements of the fitted triangle we need to apply the Chain Ladder method as described in section 2.2. Recall equation (2.5). Analogously, starting with the diagonal element, we can calculate the fitted claims at development quarter j by dividing the fitted claims at development quarter j + 1 by the development factor of quarter j, i.e.,

$$\hat{C}_{i,j} = \frac{\hat{C}_{i,j+1}}{F_j}, \qquad \text{with } \hat{C}_{i,j} = C_{i,j} \text{ on the diagonal.}$$

Table 2.6 shows how to fill the fitted cumulative claims triangle.

Table 2.6: Example of the calculation of the fitted cumulative claims triangle.

Accident Year  1 (3m)  2 (6m)  3 (9m)  4 (12m)  5 (15m)  6 (18m)  7 (21m)  8 (24m)

2004  209/(F1F2F3F4F5F6F7)  209/(F2F3F4F5F6F7)  209/(F3F4F5F6F7)  209/(F4F5F6F7)  209/(F5F6F7)  209/(F6F7)  209/F7  209

2005  210/(F1F2F3)  210/(F2F3)  210/F3  210  210F4  210F4F5  210F4F5F6  210F4F5F6F7

Table 2.7 gives the results of these calculations.

Table 2.7: Example of the fitted cumulative claims triangle.

2004 93 153 176 184 196 205 208 209

2005 107 175 201 210 224 234 237 239

Now we can calculate each element in the fitted incremental claims triangle. This is derived from equation (2.4):

$$\hat{I}_{i,j+1} = \hat{C}_{i,j+1} - \hat{C}_{i,j}, \qquad \text{with } \hat{I}_{i,1} = \hat{C}_{i,1}.$$

Table 2.8 gives the fitted incremental claims triangle.

Table 2.8: Example of the fitted incremental claims triangle.

2004 93 60 23 8 12 9 3 1

2005 107 68 26 9 14 10 3 1
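The backward recursion of step 1 and the differencing into incremental claims can be sketched as follows. The development factors below are hypothetical (the factors behind tables 2.6 to 2.8 are not repeated here), and the function names are illustrative only.

```python
def fitted_row(diagonal_value, diagonal_index, factors):
    """Fitted cumulative claims of one accident year, up to the diagonal.

    factors[j] is the development factor from column j to column j + 1
    (0-based), so the recursion is C_hat[j] = C_hat[j + 1] / factors[j].
    """
    row = [0.0] * (diagonal_index + 1)
    row[diagonal_index] = diagonal_value  # the diagonal is copied unchanged
    for j in range(diagonal_index - 1, -1, -1):
        row[j] = row[j + 1] / factors[j]
    return row

def increments(cumulative_row):
    """First increment equals the first cumulative value, then differences."""
    return [cumulative_row[0]] + [
        b - a for a, b in zip(cumulative_row, cumulative_row[1:])
    ]

factors = [1.6, 1.15, 1.05]            # hypothetical development factors
fitted = fitted_row(210.0, 3, factors)  # diagonal value 210 in column 4
incr = increments(fitted)
```

By construction the fitted increments of a row add up to its diagonal value, which is a quick check on any implementation of this step.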

Mack Bootstrap method By using the cumulative triangle as given in table 2.2, we obtain the individual development factors, $f_{i,j}$, as given in table 2.3. After this, we fit the Chain Ladder method to these factors. This is done by subtracting each individual


Note that the values of f and F for development periods 4 to 8 of accident year 2004 in table 2.3 are equal to each other. This means that the resulting values for these development periods are equal to zero, as shown in table 2.9.

Table 2.9: Example of the fitted development factor claims triangle.

2004 -0.125 -0.038 0.037

2005 0.128 0.033 -0.031

Step 2: Obtain the scale parameter and calculate (scaled) Pearson residuals.

ODP Bootstrap method By using equation (2.20), where N = 12 and p = 2, and equation (2.21), we obtain the scaled Pearson residuals for the ODP Bootstrap method:

Table 2.10: Example of scaled Pearson residuals for ODP Bootstrap.

2004 1.130 -0.883 -0.722 0.413

2005 -1.058 0.827 0.676 -0.387

Scale^0.5 0.696 1.139 1.703 5.209

Note that the residuals for development quarters 5 to 8 of accident year 2004 are equal to zero, because $I_{i,j} - \hat{I}_{i,j} = 0$ for these elements.

Mack Bootstrap method By using equation (2.24) we obtain the scaled Pearson residuals for the Mack Bootstrap method:

Table 2.11: Example of scaled Pearson residuals for Mack Bootstrap.

2004 -0.704 -0.730 0.741

2005 0.711 0.683 -0.672

Mack’s theta 1.79 0.65 0.66

Step 3: Resample residuals and obtain the pseudo data.

From this step onwards we will not give an extensive calculation, since it involves simulation steps.

ODP Bootstrap method We resample the scaled Pearson residuals $r^{p}_{i,j}$ as given in equation (2.21) with replacement from the incremental claims triangle. We obtain a new upper left triangle which also contains scaled Pearson residuals as its elements. Subsequently, to determine a pseudo data triangle (as given in equation (2.22)) we need to transform these residuals into an incremental claims triangle. Hereafter we can calculate the cumulative claims triangle.

Mack Bootstrap method We resample the scaled Pearson residuals $r^{p}_{i,j}$ as given in equation (2.25) with replacement from the cumulative triangle. We obtain a new upper left triangle which also contains scaled Pearson residuals as its elements. Subsequently, to determine a pseudo data triangle (as given in equation (2.26)) we need to transform these residuals into a cumulative claims triangle.
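For the ODP case, step 3 can be sketched in Python as below. The inputs are the first four development quarters of tables 2.8 and 2.10; restricting the example to these quarters, the function names and the fixed random seed are choices made for this illustration only.

```python
import math
import random
from itertools import accumulate

fitted_incr = [[93, 60, 23, 8], [107, 68, 26, 9]]   # table 2.8, quarters 1 to 4
residuals = [[1.130, -0.883, -0.722, 0.413],
             [-1.058, 0.827, 0.676, -0.387]]        # table 2.10
sqrt_scale = [0.696, 1.139, 1.703, 5.209]           # table 2.10, Scale^0.5

def resample(residual_rows, rng):
    """Draw a new triangle of residuals with replacement from the pooled residuals."""
    pool = [r for row in residual_rows for r in row]
    return [[rng.choice(pool) for _ in row] for row in residual_rows]

def pseudo_incremental(resampled_rows, fitted_rows, sqrt_phi):
    """Equation (2.22), applied element by element."""
    return [
        [r * sqrt_phi[j] * math.sqrt(m) + m
         for j, (r, m) in enumerate(zip(res_row, fit_row))]
        for res_row, fit_row in zip(resampled_rows, fitted_rows)
    ]

rng = random.Random(2014)  # fixed seed only to make the sketch repeatable
star = resample(residuals, rng)
pseudo_incr = pseudo_incremental(star, fitted_incr, sqrt_scale)
pseudo_cum = [list(accumulate(row)) for row in pseudo_incr]
```

Each run with a different seed gives a different pseudo triangle; repeating this draw many times is what steps 4 and 5 build on.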


Step 4: Apply the Chain Ladder method to each simulated pseudo triangle.

ODP Bootstrap method By applying the Chain Ladder method to the pseudo data triangle, we determine the estimated claims reserve, i.e., the lower right triangle. Together with step 3, we repeat this Chain Ladder technique many times. This gives a series of future incremental payments. The variance of this series gives the estimation variance.

Mack Bootstrap method In this step we use the pseudo data triangle to re-estimate the development factors as given in equation (2.6). Given the cumulative triangle we can obtain the estimated claim reserve by multiplying the previous cumulative claims by the appropriate simulated development factor obtained in step 3. Next, we move to the next period. The forecast cumulative amounts are conditional on the simulated period ahead forecast obtained from the previous step. Now the lower right triangle is completely filled in.
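Applying the Chain Ladder method to one (pseudo) cumulative triangle can be sketched as follows. The sketch assumes volume-weighted development factors, which is how we read equation (2.6); the triangle and the function names are hypothetical.

```python
def development_factors(triangle):
    """Volume-weighted factors from a ragged cumulative triangle.

    triangle is a list of rows, oldest accident year first; row i holds the
    observed cumulative claims of that accident year.
    """
    n_cols = len(triangle[0])
    factors = []
    for j in range(n_cols - 1):
        rows = [row for row in triangle if len(row) > j + 1]
        factors.append(sum(r[j + 1] for r in rows) / sum(r[j] for r in rows))
    return factors

def reserves(triangle):
    """Reserve per accident year: projected ultimate minus latest diagonal value."""
    factors = development_factors(triangle)
    result = []
    for row in triangle:
        ultimate = row[-1]
        for j in range(len(row) - 1, len(factors)):
            ultimate *= factors[j]
        result.append(ultimate - row[-1])
    return result

pseudo_triangle = [[100.0, 150.0, 165.0], [110.0, 168.0], [120.0]]  # hypothetical
reserve_per_year = reserves(pseudo_triangle)
total_reserve = sum(reserve_per_year)
```

In the bootstrap this calculation is repeated for every simulated pseudo triangle, and the resulting total reserves are stored.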

Step 5: Repeat many times, storing the reserve estimates

This step is equal for both the ODP and the Mack Bootstrap method. To add the process variance to the estimation variance, we randomly draw for each element in the future triangle a payment $S_{i,j}$. For ODP this random draw is given in equation (2.17) and for Mack it is given in equation (2.23). We repeat step 3 to 6 many times and store the future expected payments. By summing the simulated future payments per accident year we find the claim reserves for each year. The result of this simulation gives the predictive distribution.

Step 6: Obtain the prediction error

The prediction error is the standard deviation of the results.
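Given the stored reserve estimates, the prediction error can be obtained as in the small sketch below. The simulated values are hypothetical, and the use of the sample standard deviation (denominator n − 1) is an assumption, since the text does not specify the denominator.

```python
import statistics

def prediction_error(simulated_reserves):
    """Sample standard deviation of the simulated total reserves (assumed n - 1)."""
    return statistics.stdev(simulated_reserves)

simulated = [100.0, 110.0, 90.0, 105.0, 95.0]  # hypothetical bootstrap output
mean_reserve = statistics.mean(simulated)
error = prediction_error(simulated)
```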

2.3.5 Assumptions of the methods

In this subsection we discuss the assumptions of the Bootstrap method, the ODP Bootstrap method and the Mack Bootstrap method.

According to England and Verrall (2002) residuals bootstrapping assumes that the residuals are independent and identically distributed (i.i.d.). It does not require the residuals to be normally distributed. This is often cited as an advantage of the Bootstrap method, since whatever distributional form the residuals have will flow through the simulation process.

England and Verrall also refer to an important result with regard to checking the underlying assumptions of the ODP Bootstrap method and the Mack Bootstrap method. It says that a bias of the Pearson residuals should be corrected by multiplying the corresponding residuals by a “degrees of freedom”1 adjustment factor before bootstrapping. This increases the variance and eliminates the need to adjust the estimation variance. Subsequently, the residuals should vary around zero and the spread of the residuals should show no systematic patterns; therefore the residuals need to be standardized. We do not need to check the underlying assumptions of each method separately, since this is already taken care of within the calculation of the scaled Pearson residuals.

2.4 Summary

The aim of this chapter is to answer subquestions 1 and 2. Since we are looking at three methods, we will specify the answer for each method.

1

The number of degrees of freedom is the number of independent observations in a sample of data, N , that are available to estimate a parameter, p, of the population from which that sample is drawn.


The answers of subquestion 1 of Chain Ladder method are given in subsection 2.2.4. This subsection contains the assumptions of Chain Ladder method that are introduced by Mack (1993). These assumptions are:

• Proportionality says that for a given accident period, the expected value of claims is proportional to the previous claims,

• Independency says that the accident periods are independent,

• Variance assumption says that for a given accident period, the variance of claims is proportional to the previous claims with unknown proportionality constants that depend on the development period.

Mack also noted that these assumptions are not directly testable, but they have testable implications. This answers subquestion 2. In other words, all of the Chain Ladder’s assumptions should be checked in order to perform an estimation.

In order to be able to answer the subquestion 1 for the ODP Bootstrap method and Mack Bootstrap method, we started with the definition of the Bootstrap method as given by England and Verrall (1999). England and Verrall obtained the Bootstrap estimates by sampling with replacement from residuals and they derived this method using GLM. In their 2002 paper, they described the ODP Bootstrap method and the Mack Bootstrap method, and subsequently in 2006 they provided a general framework for these Bootstrap methods.

The Bootstrap method as given in England and Verrall (1999) assumes that the residuals are independent and identically distributed (i.i.d.) (England and Verrall, 2002). England and Verrall (2002) also refer to an important result with regard to checking the underlying assumptions of the ODP Bootstrap method and the Mack Bootstrap method. They noted that it is not necessary to check the assumptions of these Bootstrap methods since this is already taken care of within the calculation of the residuals. This paragraph answers subquestion 1 and also subquestion 2. In other words, there is only one assumption that needs to be checked for the ODP Bootstrap method and the Mack Bootstrap method, namely that the residuals are assumed to be i.i.d. These answers are to be found in subsection 2.3.5.

In the next section we will discuss the measurements to judge the adequacy of the above mentioned assumptions and thereby give answers to subquestion 3.


Chapter 3

Testing the assumptions of the Reserve Estimation Methods

In this chapter we will discuss the measurements to judge the adequacy of the assumptions of each method, in order to answer subquestion 3. We already discussed the underlying assumptions for each method in the previous chapter. Recall that the Chain Ladder’s assumptions are mentioned in section 2.2 and the assumptions of the ODP Bootstrap and the Mack Bootstrap are mentioned in section 2.3.

For practical reasons, Achmea requested us to implement testing tools to judge their reserving methods. Therefore we will start this chapter by explaining the implemented testing tool and its uncommon specifications. In addition to this, we will illustrate these specifications using the numerical example as given in chapter 2. Subsequently we will extensively describe each test, illustrate this test based on the example and determine the result. The results in this chapter are calculated by the implemented testing tools.

3.1 Testing tool

We implement the testing tools in Excel VBA since this software is commonly used by insurance companies and it is easy to understand. Most of these testing tools are suited and already tested for AA and AQ claims triangles.

3.1.1 Outliers

In practice we are often dealing with outliers2. We implement the testing tools in such a way that they can work with triangles with outliers. In the testing tools such outliers are indicated with zero or empty values. The table below gives an example of an incremental claims triangle including outliers. In this case the first element of accident year 2004 is an outlier.

Table 3.1: Example of AQ incremental triangle including outliers in millions of euros.

2004 0 52 17 14 12 9 3 1

2005 99 76 32 3

Based on table 3.1 we determine the cumulative claims triangle, see table 3.2.

2 Outliers represent values that are incorrect, extreme or abnormal in the triangle. Such values may not be representative for predicting the future. Therefore, by removing outliers from the triangle one also removes their future impact within the estimation.


Table 3.2: Example of AQ cumulative triangle including outliers in millions of euros.

2004 0 52 69 83 95 104 107 108

2005 99 175 207 210

3.1.2 Input Requirements

The inputs of the implemented testing tool for the Chain Ladder method are the cumulative claims triangle, the individual development factors triangle and the chosen development factors vector. Next to this, the inputs of the testing tool for the ODP Bootstrap method and the Mack Bootstrap method are the fitted cumulative triangle and the standardized Pearson residuals triangle.

In addition to these inputs, the number of accident years, i, the number of development periods, k, and the number of development factor periods, k − 1, are needed as well. In this chapter i = 2 and k = 8.

In order to be able to rely on the results of these testing tools, it is necessary that we have a sufficient number of data points at every development period (column). The number of data points depends on the dimension of the triangle concerned. Within these testing tools we rely on the expertise of the modeler to insert this minimum number of data points for each column, mincol, as an input parameter. Note that in the testing tools mincol does not distinguish whether a data point is an outlier or not. Based on table 3.2, for k = 1, mincol = 2, of which one is an outlier; for k = 5, mincol = 1 since this column consists of only one element.

Depending on the tools, an additional input is necessary. Such additional input will be discussed in each tool’s description.

In practice the claims are usually given in millions of euros. In order to be able to use the testing tool correctly we need to convert table 3.1. For that, we multiply each element by a million euros. Now, the amounts in the table are in millions of euros. The table below only applies for this chapter.

Based on table 3.2 we find the following development factors triangle:

Table 3.3: AQ development factors triangle including outliers.

2004 0 1.327 1.203 1.145 1.095 1.029 1.009 1.000

2005 1.768 1.183 1.014

F 1.768 1.216 1.062 1.145 1.095 1.029 1.009 1.000

Note that the meaning of the first column, k = 1, in the cumulative triangle differs from the column of the individual development factor triangle. The first column in the cumulative triangle contains the payments that are made in the first development quarter. On the other hand, the first column in the individual development factors triangle contains a ratio of the payments that are made in the first development quarter and the payments that are made until the second development quarter.
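The development factors of table 3.3 can be reproduced from table 3.2 with the sketch below. It assumes that an outlier (zero or empty value) is skipped when a factor is calculated and that the chosen factor F is the volume-weighted factor; both assumptions are consistent with the numbers in table 3.3. The function names are illustrative and the final entry 1.000 of table 3.3 is not modelled.

```python
triangle = [
    [0, 52, 69, 83, 95, 104, 107, 108],  # 2004, first element is an outlier
    [99, 175, 207, 210],                 # 2005
]

def is_outlier(value):
    """Outliers are indicated with zero or empty values."""
    return value is None or value == 0

def individual_factors(row):
    """Individual factors of one accident year; 0 where an outlier is involved."""
    return [
        0.0 if is_outlier(prev) or is_outlier(nxt) else nxt / prev
        for prev, nxt in zip(row, row[1:])
    ]

def chosen_factors(rows):
    """Volume-weighted factor per column pair, ignoring pairs with outliers."""
    n_pairs = max(len(row) for row in rows) - 1
    factors = []
    for j in range(n_pairs):
        pairs = [
            (row[j], row[j + 1]) for row in rows
            if len(row) > j + 1
            and not is_outlier(row[j]) and not is_outlier(row[j + 1])
        ]
        if pairs:
            factors.append(sum(n for _, n in pairs) / sum(p for p, _ in pairs))
        else:
            factors.append(None)  # no usable data for this column pair
    return factors

f_2004 = individual_factors(triangle[0])
f_2005 = individual_factors(triangle[1])
F = chosen_factors(triangle)
```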

3.1.3 Inflation

It is necessary to remove the (constant) rate of inflation from the incremental claims when using the testing tools. Removing the inflation from the data before using the Chain Ladder method means that we will not extrapolate the impact of outliers into the future estimation (Mack, 1994).


3.2 Test for Chain Ladder method assumptions

In this section we will discuss the tests that can be used to judge the adequacy of the Chain Ladder’s assumptions. Until now there is limited literature available with regard to this topic. Most of the tests in this section are introduced by Mack (1994) and Venter (1998).

Mack (1993) shows three assumptions for the Chain Ladder method, namely the proportionality assumption, the independency assumption and the variance assumption. In order to judge the validity of the proportionality assumption, there are three tests, namely the linearity test, the significance test and the uncorrelatedness test. These tests are based on the relationship between the Chain Ladder method and WLS. The linearity test proposed by Mack (1994) is a graphical test. The latter two tests are statistical tests and are proposed by Venter (1998). Both the independency test and the variance test are proposed by Mack (1994).

In the next subsection we will further elaborate on these tests. Note that the results of each test will be generated by using the testing tools. Except for the independency testing tool, the testing tools are suited for AA and AQ triangles.

3.2.1 Proportionality: Linearity test

The proportionality assumption says that for a given accident year, the expected value of claims is proportional to the previous claims (Mack, 1993). This proportionality relationship can also be interpreted as a linearity relationship (see equation (2.12)). To detect the linearity relationship Mack (1994) uses the linear regression plot. This thesis uses a scatterplot3 to show this linearity relationship.

Description of the test

The aim of this test is to investigate whether there is an approximately linear relationship around a straight line through the origin with slope $F_k$ as given in equation (2.6) (Mack, 1994). In other words, we judge the randomness of the data points along the regression line.

This test is met if the data shows a clear regression line. In practice these plots will show randomness of the data points in the first development period. Generally, this does not imply that the Chain Ladder approach is inappropriate. This rather implies that the development in the first development period is highly volatile. In other words, we can ignore its result.

Test Calculation

This testing tool needs as input the cumulative claims triangle, the vector of the chosen development factors, and three numbers, namely i, k and mincol. Since we only have a limited number of data points, we set mincol = 1. Then, we plot the cumulative claims $C_{i,j+1}$ against the previous cumulative claims $C_{i,j}$.
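The data behind such a plot can be sketched in Python as follows, using table 3.2. The sketch only collects the points and their deviations from the line through the origin; the treatment of mincol is not modelled, the slope is assumed to be the volume-weighted factor, and the function names are illustrative.

```python
triangle = [
    [0, 52, 69, 83, 95, 104, 107, 108],  # 2004, first element is an outlier
    [99, 175, 207, 210],                 # 2005
]

def pairs_for_period(rows, j):
    """Points (C_{i,j}, C_{i,j+1}) for 1-based development period j, outliers skipped."""
    points = []
    for row in rows:
        if len(row) > j:
            prev, nxt = row[j - 1], row[j]
            if prev not in (0, None) and nxt not in (0, None):
                points.append((prev, nxt))
    return points

def deviations_from_line(points, slope):
    """Vertical distance of each point to the line through the origin."""
    return [nxt - slope * prev for prev, nxt in points]

points_2 = pairs_for_period(triangle, 2)
slope_2 = sum(n for _, n in points_2) / sum(p for p, _ in points_2)
dev_2 = deviations_from_line(points_2, slope_2)
points_1 = pairs_for_period(triangle, 1)  # only one usable point
```

With a volume-weighted slope the deviations add up to zero, so what matters for the linearity test is whether they show a pattern rather than their sum.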

Results

We obtained three scatterplots based on table 3.2, namely the plots of $C_{i,1}$ against $C_{i,2}$, $C_{i,2}$ against $C_{i,3}$, and $C_{i,3}$ against $C_{i,4}$. Only the last two plots are relevant for the analysis. This result is illustrated in figure 3.1.

Note that one of the elements of k = 1 is an outlier, which means that the plot of $C_{i,1}$ against $C_{i,2}$ has only one relevant data point. Thus this plot is not relevant for the analysis. The testing tool does not generate the plot of $C_{i,4}$ against $C_{i,5}$ since it only contains one data point.

3 A scatterplot is a type of mathematical diagram using Cartesian coordinates to display values for two variables for a set of data.

Figure 3.1: Results of Linearity test based on the numerical example.

In spite of the limited number of data points, figure 3.1 shows that there is a clear linear relationship with the corresponding development period.


3.2.2 Proportionality: Significance test

Since there is a connection between WLS and the Chain Ladder method (see section 2.2.2), regression analysis4 can also be used to judge the corresponding assumptions. The following test is based on regression analysis. The linear regression of the Chain Ladder method is given in equation (2.10).

This test investigates whether the estimated parameters are significantly different from zero (Venter, 1998), i.e.,

H0 : the estimated parameters are not significantly different from zero.

The significance is tested by regression of the incremental claims against the previous cumulative claims, i.e.,

$$I_{i,j+1} = C_{i,j} F_j + \varepsilon_i.$$

If the estimated parameters are greater than a tolerance level5, then the null hypothesis is rejected. Thus the results will be thrown out.

Venter also noted that this test is not a strict statistical test, the results only give a level of comfort.

Test Calculation

This testing tool needs as input, the cumulative claims triangle, and three numbers, namely i, k and a critical value, α. The tool calculates the statistics for a line by using the extension of equation (2.10) to calculate a straight line that best fits the data and then returns the estimated parameters and their standard deviations.

Recall the incremental triangle and the cumulative triangle given in tables 3.1 and 3.2 respectively. As an illustration we calculate this test for development quarters 2 to 3. The table below is the input for this calculation. Column X contains the cumulative claims at development quarter 2 and column Y contains the incremental claims at development quarter 3.

Table 3.4: Input for Significance test for development quarter 2 to 3 based on the numerical example.

Accident Year X (2) Y (3)

2004 52 17

2005 175 32

Then we find the following result:

Table 3.5: Results of Significance test for development quarter 2 to 3.

Dev. periods a Std.dev. a b Std.dev. b

2 to 3 10659 0.000 0.122 0.000

The testing tool uses the feature “LINEST” to do the calculation. Note that the amounts in the table 3.5 are in thousands of euros.
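The estimates in table 3.5 can be checked with an ordinary least squares fit of Y = a + bX on the input of table 3.4. The sketch below is plain Python rather than the “LINEST” feature used by the testing tool. Returning a standard deviation of zero when there are only two data points mirrors table 3.5 and is an assumption of this sketch, as is the use of the absolute value in the tolerance check.

```python
import math

def fit_line(x, y):
    """Least squares fit of y = a + b * x; returns (a, b, std_a, std_b)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    a = mean_y - b * mean_x
    if n > 2:
        sse = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
        s2 = sse / (n - 2)
        std_b = math.sqrt(s2 / sxx)
        std_a = math.sqrt(s2 * (1.0 / n + mean_x ** 2 / sxx))
    else:
        std_a = std_b = 0.0  # assumed: two points fit exactly, as in table 3.5
    return a, b, std_a, std_b

def exceeds_tolerance(estimate, std_dev, critical_value=1.65):
    """True if the estimate exceeds critical_value times its standard deviation."""
    return abs(estimate) > critical_value * std_dev

x = [52, 175]  # cumulative claims at development quarter 2 (table 3.4)
y = [17, 32]   # incremental claims at development quarter 3 (table 3.4)
a, b, std_a, std_b = fit_line(x, y)
a_in_thousands = round(a * 1000)
```

Here a is expressed in the units of table 3.4, so it is multiplied by 1000 to compare it with the amount in table 3.5.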

4 Regression analysis is a statistical process for estimating the relationship among variables. This analysis produces estimates for the standard deviation of each parameter estimated.

5 Tolerance level is a level of comfort at which a deviation is still acceptable. Such a level of comfort can be determined by sensitivity analysis.


Results

In the table below we applied a critical value of 1.65. The level of tolerance is 1.65 times the standard deviation, which corresponds to a two-sided 90% interval of the Normal distribution. This means that we are comfortable with a probability of about 10% of getting a factor of this absolute level of tolerance or greater when the true factor is zero. Note that if a factor is considered to be not significant for the Normal distribution, it would probably be even less significant for other distributions.

Table 3.6 shows the estimated parameters and their standard deviations. As can be seen, all of the estimated constants a are thrown out, but only one of the estimated development factors b is thrown out. Thus, the constants are statistically significant and the development factors are not. The significance of the constants and the lack of significance of the development factors suggest that this significance test is not met by this dataset.

Note that the column “Dev. periods” contains the development periods and the column n the number of data points.

Table 3.6: Results of Significance test based on the numerical examples.

Dev. periods  n  a  Std.dev. a  thrown?  b  Std.dev. b  thrown?

1 to 2  1  76000  0.000  YES  0.000  0.000  NO

2 to 3  2  10659  0.000  YES  0.122  0.000  YES

3 to 4  2  19500  0.000  YES  -0.079  0.000  NO

4 to 5  2  12000  0.000  YES  0.000  0.000  NO

5 to 6  1  9000  0.000  YES  0.000  0.000  NO

6 to 7  1  3000  0.000  YES  0.000  0.000  NO


3.2.3 Proportionality: Uncorrelatedness test

Another implication of the proportionality assumption of the Chain Ladder method is that subsequent development factors are uncorrelated (Mack, 1994). This means that after a rather high value of the development factor $F_{j-1}$, the expected size of the next development factor $F_j$ is the same as after a rather low value of the development factor $F_{j-1}$.

Both Mack (1994) and Venter (1998) developed a correlation test for adjacent columns of a development factors triangle. Mack’s test judges the correlation within the triangle as a whole rather than every pair of columns of adjacent development periods separately. If the test fails, the outcome does not show which part of the triangle is correlated. In practice this test fails for a lot of triangles. As this information gives insight in how to improve the triangle projection, we prefer to investigate the pairs of development factors as suggested by Venter (1998).

This test investigates whether a correlation exists within the columns of a development factors triangle. It is done by obtaining the sample correlation coefficient for all pairs of columns in the triangle.

We consider two development factor triangles. Both triangles are derived from the Venter (1998) representation of the Chain Ladder method using incremental instead of cumulative claims. One triangle contains individual development factors based on the incremental claims triangle, i.e.,

$$g_{i,k} = \frac{I_{i,k+1}}{I_{i,k}}. \tag{3.1}$$

Another triangle contains individual development factors derived from equation (2.7), i.e.,

$$h_{i,j} = f_{i,j} - 1. \tag{3.2}$$

In order to be able to calculate the correlation coefficient, we need to obtain the average, the sample variance and the sample covariance of both development factors triangles. The correlation between two columns is the covariance divided by the product of the standard deviations for the first n elements of both columns, where n is the length of the shorter column. We refer to this as the “sample correlation (r)”.

We define

$$T = r \left[ \frac{n-2}{1-r^{2}} \right]^{1/2}. \tag{3.3}$$

A significance test for correlation coefficient is made by considering T to be t-distributed with n − 2 degrees of freedom. If T is greater than the t-critical value for 100(1 − α)% confidence interval at n − 2 degrees of freedom, then r can be considered to be significantly different from zero at the α% level. If this test succeeds, we say that there is a single correlation within the triangle. A single correlation is not considered as a strong indicator of correlation within the triangle.

However, depending on the size of the triangle, if there is quite a number of significant correlations, this strongly suggests that there is actual correlation within the whole triangle. A significance test for the correlation within the whole triangle is as follows:

Let m denote the number of pairs of columns in the triangle. The number that displays significant correlation could be considered a binomial variate with parameters m and α, which has standard deviation $\sqrt{\alpha(1-\alpha)m}$. Thus more than $\alpha m + \sqrt{\alpha(1-\alpha)m}$ significant correlations suggest there is actual correlation within the whole triangle. The algorithm of this test is given below.


The algorithm of Uncorrelatedness test according to Venter (1998):

step 1: Calculate the individual development factors based on the incremental claims,

$$g_{i,k} = \frac{I_{i,k+1}}{I_{i,k}}.$$

Obtain a development factors triangle with the $g_{i,k}$ as its elements and refer to each column of this development factors triangle as X.

step 2: Calculate another individual development factors triangle which is derived from equation (2.7),

$$h_{i,j} = \frac{C_{i,j+1}}{C_{i,j}} - 1.$$

Obtain a second development factors triangle with the $h_{i,j}$ as its elements and refer to each column of this development factors triangle as Y.

step 3: Calculate the column average, the sample variance and the sample covariance for the first n elements of both columns, where n is the length of the shorter column, by using

$$\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i, \qquad \bar{Y} = \frac{1}{n} \sum_{i=1}^{n} Y_i,$$

$$\text{Sample variance of } X = \frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})^{2}, \qquad \text{Sample variance of } Y = \frac{1}{n} \sum_{i=1}^{n} (Y_i - \bar{Y})^{2},$$

$$\text{Sample covariance} = \frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y}).$$

Note that since we want the variance and covariance of a particular list of numbers in a column, we use the denominator n instead of n − 1.

step 4: Calculate the sample correlation coefficient for the two columns, i.e. the sample covariance divided by the product of the standard deviations, that is,

$$r = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^{2} \sum_{i=1}^{n} (Y_i - \bar{Y})^{2}}}.$$

step 5: Determine

$$T = r \left[ \frac{n-2}{1-r^{2}} \right]^{1/2}.$$

A significance test for correlation coefficients is made by considering T to be t-distributed with n − 2 degrees of freedom. If T is greater than the t-critical value for a 100(1 − α)% confidence interval at n − 2 degrees of freedom, then r can be considered to be significantly different from zero at the α% level. We refer to this as “a single correlation within the triangle”.

step 6: Another significance test for the correlation coefficient is as follows:

Let m be the number of pairs of columns in the triangle, then the number that displays significant correlation could be considered a binomial variate with parameters m and α, which has standard deviation $\sqrt{\alpha(1-\alpha)m}$. Thus more than $\alpha m + \sqrt{\alpha(1-\alpha)m}$ significant correlations suggest there is an actual correlation within the whole triangle. We refer to this as “correlation within the whole triangle”.
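Steps 3 to 6 of the algorithm above can be sketched in Python as follows. The function names are illustrative. The t-critical value is left to the caller, and returning zero for r when a column has no variation and for T when n ≤ 2 are assumptions that mirror the output of the testing tool described later in this section.

```python
import math

def sample_correlation(x, y):
    """Steps 3 and 4: correlation over the first n elements, denominator n."""
    n = min(len(x), len(y))
    x, y = x[:n], y[:n]
    mean_x, mean_y = sum(x) / n, sum(y) / n
    var_x = sum((v - mean_x) ** 2 for v in x) / n
    var_y = sum((v - mean_y) ** 2 for v in y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    if var_x == 0 or var_y == 0:
        return 0.0, n  # assumed: no variation, report r = 0
    return cov / math.sqrt(var_x * var_y), n

def t_statistic(r, n):
    """Step 5: T = r * sqrt((n - 2) / (1 - r^2))."""
    if n <= 2:
        return 0.0  # assumed: T is reported as zero when n <= 2
    if 1.0 - r * r <= 0.0:
        return math.copysign(math.inf, r)  # perfect correlation
    return r * math.sqrt((n - 2) / (1.0 - r * r))

def whole_triangle_threshold(m, alpha):
    """Step 6: alpha * m + sqrt(alpha * (1 - alpha) * m)."""
    return alpha * m + math.sqrt(alpha * (1.0 - alpha) * m)

# Two adjacent columns with two data points each (hypothetical values).
r, n = sample_correlation([0.0, 0.768], [0.327, 0.183])
T = t_statistic(r, n)
threshold = whole_triangle_threshold(m=5, alpha=0.10)
```

With only two data points per column the sample correlation is always −1 or 1, which is why n ≤ 2 is treated separately.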


Test Calculation

This testing tool needs as input the cumulative claims triangle and the development factors triangle. Recall that this development factors triangle contains $f_j$ as given in equation (2.5) as its elements. There are four input numbers, namely i, k, k − 1 and a critical value, α.

The table below shows how this test works. Column X contains the individual development factors based on equation (3.1) and column Y contains the individual development factors based on equation (3.2). Note that $\bar{X}$ and $\bar{Y}$ are the column averages, which makes it quite easy to obtain the standard deviation of each column and the covariance of both columns.

Table 3.7: Example step 1 to 3 of Uncorrelatedness test for development periods 2 to 3

Year  X = 1 to 2  Y = 2 to 3  $(X - \bar{X})^{2}$  $(Y - \bar{Y})^{2}$  $(X - \bar{X})(Y - \bar{Y})$

2004 0.000 0.327 0.1473 0.005 -0.028

2005 0.768 0.183 0.1473 0.005 -0.028

Average 0.384 0.255 0.1473 0.005 -0.028

After this we can easily calculate steps 4 to 6 and find the correlation relationships.

Results

Before we calculate the sample correlation, r, we first analyse table 3.2. Based on this table, there are only two single correlations within the triangle, namely the correlation between columns 2 and 3 and between columns 3 and 4. Since these columns contain two data points (n = 2), we expect that r will be either −1 or 1. This implies that these columns are correlated with their neighbours. Given this expectation, we derive that there is correlation within the whole triangle. This means that this test is not met by this dataset.

In order to be able to run the testing tool, it is necessary to provide a value of α as input. In this example we applied α = 10%. This α will however not be used in the calculation since n ≤ 2, which implies that T = 0 and t = 0. Note that if n > 2 then T > 0, and the t-critical value is calculated by using the feature “TInv” in Excel.

The result of the single correlation test is shown in the rightmost column. This result is in conformity with our expectations.

Figure 3.2: Results of Uncorrelatedness test based on the numerical examples.

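Columns: Column (X -> Y); n; Average of X; Average of Y; Average of $(X - \bar{X})^{2}$; Average of $(Y - \bar{Y})^{2}$; Average of $(X - \bar{X})(Y - \bar{Y})$; Sample correlation (r); T; t-statistic; CORRELATION EXIST?

Column (1 -> 2) 1 0.000 0.768 0.000 0.000 0.000 0.000 0.000 0.000 YES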

Column (2 -> 3) 2 0.384 0.255 0.147 0.005 0.028 -1.000 negative corr. 0.000 YES

Column (3 -> 4) 2 0.374 0.109 0.002 0.009 -0.004 -1.000 negative corr. 0.000 YES

Column (4 -> 5) 1 0.824 0.145 0.000 0.000 0.000 0.000 0.000 0.000 YES

Column (5 -> 6) 1 0.857 0.095 0.000 0.000 0.000 0.000 0.000 0.000 YES


3.2.4 Independency test

The second assumption to apply the Chain Ladder method is independency between accident years, i.e. $\{C_{i,1}, \ldots, C_{i,n}\}$ and $\{C_{j,1}, \ldots, C_{j,n}\}$ are independent for $i \neq j$. According to Mack (1994) the main reason why this independence can be violated in practice is the fact that there are certain calendar period effects, such as major changes in claims handling or in case reserving, or external influences, such as substantial changes in court decisions or inflation.

This test is to investigate whether a correlation exists within the diagonal of a cumulative claims triangle (Mack, 1994).

We define diagonals $D_t$ for accident year $i = 1, \ldots, n$ and calendar quarter $t = 1, \ldots, 4(n-1)$,

$$D_t = C_{n-l,\,4l+k}, \qquad l = 0, \ldots, 4(n-i+1), \quad k = 4, \ldots, 1,$$

and group the adjacent development factors of each diagonal

$$A_t = \left\{ \frac{C_{n,4}}{C_{n,3}}, \frac{C_{n-1,8}}{C_{n-1,7}}, \ldots, \frac{C_{1,j}}{C_{1,j-1}} \right\} = \{ f_{n,3}, f_{n-1,7}, \ldots, f_{1,j-1} \},$$

$$A_{t-1} = \left\{ \frac{C_{n,3}}{C_{n,2}}, \frac{C_{n-1,7}}{C_{n-1,6}}, \ldots, \frac{C_{1,j-1}}{C_{1,j-2}} \right\} = \{ f_{n,2}, f_{n-1,6}, \ldots, f_{1,j-2} \}.$$

We start with ranking the elements of the set $F_j$, i.e. the column of all development factors observed between development period j and j + 1, according to their size, starting with the smallest one, then the second smallest and so on. Hereafter, we subdivide those observed development factors into three parts, namely $lF_j$, $sF_j$ and neutral. $lF_j$ contains the individual development factors, $f_{i,j}$, that are greater than the median of $F_j$. $sF_j$ contains the individual development factors, $f_{i,j}$, that are lower than the median of $F_j$. Neutral is when the individual development factor, $f_{i,j}$, is equal to the median of $F_j$, and this part will be eliminated from all further considerations. Thus, for each set $F_j$, every development factor observed that is not eliminated is assigned either to the set l or to the set s, in which

• set $l = lF_1 + \cdots + lF_j$ of larger factors

• set $s = sF_1 + \cdots + sF_j$ of smaller factors

In this way, every development factor which is not eliminated has a 50% chance of belonging to either l or s. So, the number in l has a Binomial distribution.
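The subdivision into larger, smaller and neutral factors can be sketched in Python as follows. This is a minimal illustration, not the testing tool used in this thesis: the function name `classify_factors`, the list-of-rows layout of the triangle and the labels "L", "S" and "*" are assumptions made for the example.

```python
from statistics import median

def classify_factors(factors):
    """Label each development factor relative to its column median.

    `factors` is a list of rows (one per accident period); row i holds the
    observed factors f[i][0], f[i][1], ... and rows get shorter further down,
    as in a run-off triangle (assumed layout).
    Returns labels of the same shape: "L" (larger than the column median),
    "S" (smaller) or "*" (equal to the median, i.e. eliminated).
    """
    n_cols = max(len(row) for row in factors)
    # Median of every column F_j, using only the cells that are observed.
    medians = [
        median(row[j] for row in factors if len(row) > j)
        for j in range(n_cols)
    ]
    return [
        ["L" if f > medians[j] else "S" if f < medians[j] else "*"
         for j, f in enumerate(row)]
        for row in factors
    ]

# Small hypothetical triangle of development factors.
example = [[1.5, 1.2, 1.1],
           [1.7, 1.3],
           [1.6]]
labels = classify_factors(example)
```

In this hypothetical triangle the factor 1.6 equals the median of the first column and the single factor in the last column equals its own median, so both are eliminated.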

Next, we count for every diagonal $A_t$ of development factors the number $l_t$ of large factors, i.e. elements of $l$, and the number $s_t$ of small factors, i.e. elements of $s$. Intuitively, if there is no specific change from calendar period $t$ to calendar period $t + 1$, $A_t$ should have about the same number of small factors as of large factors, i.e. $l_t$ and $s_t$ should be of approximately the same size apart from pure random fluctuation. But if $l_t$ is significantly larger or smaller than $s_t$ or, equivalently, if

$$z_t = \min(l_t, s_t),$$

i.e. the smaller of the two figures, is significantly smaller than $(l_t + s_t)/2$, then there is a specific calendar period influence.
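Counting $l_t$, $s_t$ and $z_t$ per diagonal can be sketched as below. The sketch assumes that every factor has already been labelled "L" (larger), "S" (smaller) or "*" (eliminated) and that the labels are stored as a list of rows. The function name `diagonal_counts` and the parameter `step` are hypothetical: `step` is the number of development periods per accident period, so `step=1` gives an ordinary triangle and `step=4` is one possible way to index a triangle with yearly rows and quarterly columns.

```python
def diagonal_counts(labels, step=1):
    """Return {t: (l_t, s_t, z_t)} for every diagonal t of the label triangle.

    Cell (i, j) is assigned to diagonal t = step * i + j (0-based, assumed
    indexing). Eliminated cells ("*") are not counted.
    """
    counts = {}
    for i, row in enumerate(labels):
        for j, label in enumerate(row):
            t = step * i + j
            l_t, s_t = counts.get(t, (0, 0))
            if label == "L":
                l_t += 1
            elif label == "S":
                s_t += 1
            counts[t] = (l_t, s_t)
    # z_t is the smaller of the two counts on each diagonal.
    return {t: (l, s, min(l, s)) for t, (l, s) in sorted(counts.items())}

# Hypothetical labels for a 3 x 3 triangle.
example_labels = [["S", "S", "*"],
                  ["L", "L"],
                  ["*"]]
result = diagonal_counts(example_labels)
```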

A significance test for diagonal correlation is made by considering the first two moments of the probability distribution of $z_t$ under the hypothesis that each development factor has a 50% probability of belonging to either $l$ or $s$. Under this hypothesis, $l_t$ has a Binomial distribution with parameters $n_t = l_t + s_t$ and $p = 0.5$. Next, we determine the probability

$$\Pr(l = m) = \binom{n}{m} \frac{1}{2^{n}} = \frac{n!}{(n-m)!\,m!} \frac{1}{2^{n}} \qquad (3.4)$$

where $m = \lfloor (n-1)/2 \rfloor$ denotes the largest integer $\le (n-1)/2$.
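Equation (3.4) is straightforward to evaluate numerically. The following sketch uses only the Python standard library; the function name `prob_l_equals_m` is hypothetical.

```python
from math import comb

def prob_l_equals_m(n):
    """Pr(l = m) from equation (3.4), with m the largest integer <= (n - 1) / 2."""
    m = (n - 1) // 2
    return comb(n, m) / 2**n

p3 = prob_l_equals_m(3)  # m = 1, so 3 / 8
p4 = prob_l_equals_m(4)  # m = 1, so 4 / 16
```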

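The first two moments of $z_t$ are given by

$$E(z_t) = \frac{n_t}{2} - \binom{n_t - 1}{m_t} \frac{n_t}{2^{n_t}},$$

$$\operatorname{Var}(z_t) = \frac{n_t (n_t - 1)}{4} - \binom{n_t - 1}{m_t} \frac{n_t (n_t - 1)}{2^{n_t}} + E(z_t) - \left(E(z_t)\right)^2. \qquad (3.5)$$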
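As a check on equation (3.5), the moments can be computed directly and compared with a brute-force enumeration of the Binomial distribution. The sketch below is illustrative only; the function names `z_moments` and `z_moments_brute_force` are hypothetical.

```python
from math import comb

def z_moments(n):
    """E(z_t) and Var(z_t) from equation (3.5) for n = n_t."""
    m = (n - 1) // 2
    c = comb(n - 1, m) * n / 2**n
    mean = n / 2 - c
    var = n * (n - 1) / 4 - c * (n - 1) + mean - mean**2
    return mean, var

def z_moments_brute_force(n):
    """Same moments by enumerating l ~ Binomial(n, 0.5) and z = min(l, n - l)."""
    probs = [comb(n, l) / 2**n for l in range(n + 1)]
    mean = sum(p * min(l, n - l) for l, p in enumerate(probs))
    second = sum(p * min(l, n - l) ** 2 for l, p in enumerate(probs))
    return mean, second - mean**2

mean3, var3 = z_moments(3)  # 0.75 and 0.1875
mean4, var4 = z_moments(4)  # 1.25 and 0.4375
```

For $n_t = 3$ the closed form gives $E(z_t) = 0.75$ and $\operatorname{Var}(z_t) = 0.1875$, which agrees with the enumeration.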

If the probability as given in equation (3.4) is smaller than the α% level or equal to $1/2^{n_t - 1}$, then there is a single calendar period correlation within the triangle. However, many single calendar period correlations are not considered as a strong indicator of correlation within the triangle.

In order to avoid an accumulation of the error possibilities, we consider

$$Z = \sum_{n_t > 2} z_t.$$

Note that in the AQ triangle we have left out $z_1, \dots, z_4$ because $A_1, \dots, A_4$ contain at most one element that is not eliminated, and therefore $z_1, \dots, z_4$ are not random but always equal to 0. Similarly, we have left out $z_t$ if $l_t + s_t \le 2$, because this diagonal $A_t$ contains at most two elements that are not eliminated; therefore $z_t$ is not a random variable but can only be either 1 or 0.

A formal test is made by considering that under the null hypothesis the different $z_t$'s are (almost) uncorrelated, so that we have

$$E(Z) = \sum_{n_t > 2} E(z_t), \qquad \operatorname{Var}(Z) = \sum_{n_t > 2} \operatorname{Var}(z_t) \qquad (3.6)$$

and we assume that $Z$ approximately has a Normal distribution. This means that we reject the hypothesis of having no significant calendar period effect only if $Z$ lies outside its 100(1 − α)% confidence interval $\left(E(Z) - 2\sqrt{\operatorname{Var}(Z)},\; E(Z) + 2\sqrt{\operatorname{Var}(Z)}\right)$, where the factor 2 corresponds to α ≈ 5%. If $Z$ lies outside this interval, we say that there is calendar period correlation within the whole triangle.
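The complete test on $Z$ can be sketched as follows, assuming the counts $(l_t, s_t)$ per diagonal are already available. This is a minimal illustration under stated assumptions and not the Excel tool used in this thesis: the function names and the input list are hypothetical, and the interval uses the fixed factor 2 from the text rather than a quantile derived from α.

```python
from math import comb, sqrt

def z_moments(n):
    """E(z_t) and Var(z_t) from equation (3.5) for n = n_t."""
    m = (n - 1) // 2
    c = comb(n - 1, m) * n / 2**n
    mean = n / 2 - c
    return mean, n * (n - 1) / 4 - c * (n - 1) + mean - mean**2

def calendar_effect_test(counts):
    """Test for calendar period effects.

    `counts` is an iterable of (l_t, s_t) pairs, one per diagonal.
    Diagonals with l_t + s_t <= 2 are left out, as in the text.
    Returns (Z, (lower, upper), reject), where `reject` is True when Z
    falls outside E(Z) +/- 2 * sqrt(Var(Z)).
    """
    z_total, mean_total, var_total = 0, 0.0, 0.0
    for l_t, s_t in counts:
        n_t = l_t + s_t
        if n_t <= 2:
            continue
        mean, var = z_moments(n_t)
        z_total += min(l_t, s_t)
        mean_total += mean
        var_total += var
    half_width = 2 * sqrt(var_total)
    lower, upper = mean_total - half_width, mean_total + half_width
    reject = not (lower <= z_total <= upper)
    return z_total, (lower, upper), reject

# Hypothetical counts: the diagonal (1, 1) is skipped because n_t = 2.
z, interval, reject = calendar_effect_test([(1, 1), (3, 1), (2, 1)])
```

With these hypothetical counts, $Z = 2$, $E(Z) = 2$ and $\operatorname{Var}(Z) = 0.625$, so $Z$ lies inside the interval and the hypothesis of no calendar period effect is not rejected.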

Test Calculation

This testing tool needs as input the development factor triangle and three input numbers, namely i, k − 1 and a significance level α.

We start with arranging each element of development quarter column in ascending order.

Table 3.8: Example of ordering the development factor triangle in ascending order

2004 0 2 2 1 1 1 1

2005 1 1 1
