From Double Chain Ladder To Double GLM


MSc Stochastics and Financial Mathematics

Master Thesis


Author:

Robert T. Steur

Examiner: dr. A.J. Bert van Es

Supervisors: drs. N.R. Valkenburg AAG, drs. L.N. Aid Usman



Abstract

A popular technique for estimating future claim payments and reserves in non-life insurance is the Chain Ladder Method, which assumes a trend in past claim data to estimate future claim figures. This method has been refined into a new framework called the Double Chain Ladder Method, which separates the (outstanding) claim estimates into an IBNR and an RBNS part by using the Chain Ladder Method twice, separately on incurred claim count and paid claim amount triangles. In addition to the separate reserve figures, this structure provides clearer insight into the underlying assumptions than the Chain Ladder Method, as well as an integrated method to estimate tail reserves. The incurred claim counts are assumed to be Poisson distributed, while the claim payments are assumed to be Overdispersed Poisson distributed, in order to estimate parameters with a Chain Ladder Method. The Double Chain Ladder Method still has practical issues.

In this thesis we will upgrade the Double Chain Ladder Method to cope with some important practical issues. We will adapt the Double Chain Ladder framework to work with types of Generalized Linear Models underlying the incurred claim count and paid claim amount data. This allows for additional trends such as calendar year inflation or a shift in payments caused by fewer or more payments in a specific calendar year, and it avoids overparametrization. Parameters will no longer be estimated using the Chain Ladder Method, which is exactly equivalent to estimating parameters with a maximum likelihood procedure in a Poisson model. The new framework allows for a better fit to more types of data while maintaining the benefits of the separate IBNR and RBNS structure.


Contents

1. Introduction
  1.1. Non-Life insurance introduction
  1.2. Double Chain Ladder introduction
  1.3. Thesis goals
  1.4. Thesis chapters overview
2. Double Chain Ladder Framework
  2.1. Current DCLM
    Step 1. Origin and development trends
    Step 2. Settlement delay pattern
    Step 3. Individual claim sizes
    Step 4. Extrapolating RBNS claims
    Step 5. Extrapolating IBNR claims
  2.2. Discussing DCLM
    Settlement delay pattern discussion 1
    Settlement delay pattern discussion 2
    Trend parameters discussion
    Distribution paid amounts discussion
  2.3. DCLM issue summary
3. Double GLM framework
  3.1. Formulating DGLM
    Step 1. Origin, development and calendar trends
    Step 2. Future calendar year effects
    Step 3. Settlement delay pattern
    Step 4. Individual claim sizes
    Step 5. Extrapolating RBNS claims
    Step 6. Extrapolating IBNR claims
  3.2. Discussing DGLM
4. Comparing DGLM and DCLM
  4.1. DCLM and DGLM estimates
  4.2. Comparing DCLM and DGLM estimates
5. Simulation study DGLM
  5.1. Distribution of individual claims
  5.2. Bootstrap Monte Carlo methodology
  5.3. Bootstrap Monte Carlo results
6. Conclusions
  6.1. DGLM future research
7. Popular Summary
Appendix A.
  A.1. Chain Ladder Method
  A.2. Generalized Linear Models
  A.3. Linking CLM to a Poisson GLM
Appendix B.
  B.1. Background definitions


1. Introduction

The main goal of this thesis is to improve a popular existing model for estimating future claim payments at an insurance company. As a necessary introduction to the research we consider the payment and data structure at a non-life insurance company. Additional information regarding commonly used terms by an insurance company is stated in appendix B. Appendix B should be read after the introduction when one is not familiar with Non-Life insurance.

1.1. Non-Life insurance introduction. A policy holder experiences certain damages which he wants to claim from his insurance company. As such the company has datasets of reported and paid claim counts and amounts, which consist of an aggregation of individual claims from all policy holders per year of origin i and per year of development j. For example, we have damages that have occurred in 2011 (year of origin 1). A part of these will be reported and/or paid in 2011 (development year 0), a part will be reported and/or paid in 2012 (development year 1), etc. So we get an upper left triangular data matrix, because we only have data up to the current calendar year m, i.e. for year of origin i and year of development j such that i + j ≤ m. The matrix is usually referred to as a run-off triangle, see Figure 1 for an example with incurred claim counts. The value 17 at index (2013, 2), for example, means that of all accidents happening in 2013, 17 were reported 2 years later. This also means they were reported in calendar year 2015. If it had been a paid amount data triangle, it would mean that an amount of 17 was paid out by the insurer after 2 years for accidents occurring in 2013. Data on a diagonal corresponds to the same calendar year. The data displayed here is part of a motor insurance run-off triangle with incremental data, which we will use throughout the thesis for illustrative purposes. Insurers often use cumulative data in their run-off triangles; results in this thesis can be extended to such a format if necessary.

i\j         0      1     2    3    4
2011     6238    831    49    7    1
2012     7773   1381    23    4
2013    10306   1093    17
2014     9639    995
2015     9511

Figure 1. Run-off triangle incurred claim counts
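As a minimal sketch of the data structure described above, the triangle of Figure 1 can be held in a NumPy array with NaN in the cells that are not yet observed. The variable names are illustrative only and are not taken from the thesis.

```python
import numpy as np

# Incremental incurred claim counts from Figure 1; NaN marks cells not yet observed.
years = [2011, 2012, 2013, 2014, 2015]
nan = np.nan
triangle = np.array([
    [6238.0,   831.0, 49.0, 7.0, 1.0],
    [7773.0,  1381.0, 23.0, 4.0, nan],
    [10306.0, 1093.0, 17.0, nan, nan],
    [9639.0,   995.0, nan,  nan, nan],
    [9511.0,   nan,   nan,  nan, nan],
])

m = triangle.shape[0]
# With origin years numbered 1..m, cell (i, j) is observed when i + j <= m.
observed = np.array([[(i + 1) + j <= m for j in range(m)] for i in range(m)])

# Cell (2013, 2): claims from accident year 2013 reported two years later.
row = years.index(2013)
count = triangle[row, 2]
calendar_year = years[row] + 2

# The latest diagonal (i + j = m) holds everything reported in the current calendar year.
latest_diagonal = np.array([triangle[i, m - 1 - i] for i in range(m)])

# Cumulative format: running total along each origin year, unobserved cells stay NaN.
cumulative = np.nancumsum(triangle, axis=1)
cumulative[~observed] = np.nan

print(count, calendar_year)
print(latest_diagonal)
print(cumulative[0])
```

Switching between the incremental and cumulative formats is a running sum (or difference) along each row, which is why results stated for incremental data carry over to cumulative triangles.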

By regulation, the company has to estimate future claim development. It needs to set aside capital, usually referred to as reserves, in order to fulfill future payments arising from past years of origin. These reserves can be split into two required parts: reserves for damages that have been Incurred But Not yet Reported to the insurance company (IBNR reserve), and reserves for damages that have been Reported to the company But Not yet Settled or fully paid (RBNS reserve). So the timeline for a claim is:

accident happens --(reporting delay)--> accident reported --(settlement delay)--> final payment made.

First the damage occurs (referred to as incurred), then the policyholder reports a claim, and finally the claim is settled and paid by the insurance company. So first there is a reporting delay after the occurrence and secondly a settlement delay after the reporting. RBNS can have different meanings depending on what is defined as 'Settled', but in our case we simply assume that settling and paying the total claim occur at the same time, so settling means the end of the claim timeline. Following this timeline, claims in an incurred triangle will ultimately be present in a paid triangle as well, since they will be paid eventually. These two types of triangles are thus linked.

The Chain Ladder Method (CLM) is a popular method to analyze and extrapolate a trend for year of origin and a trend for development year in the data, see appendix A.1. Suppose, for example, that we have a triangular dataset consisting of total claim payments. The CLM is only able to extrapolate estimates into the remainder of the data matrix, that is, the lower right triangle. It cannot be used to estimate payments beyond the final development year in the data. The CLM is also unable to separate claim estimates into an IBNR part and an RBNS part; it only estimates the combined payments. The CLM does have a statistical justification: it can be shown that estimating with the CLM is equivalent to a maximum likelihood procedure when a Poisson distribution is assumed for the claim data, see Appendix A.3.

1.2. Double Chain Ladder introduction. The CLM has been refined to work in a new framework called the Double Chain Ladder Method (DCLM), which separates the future claim estimates into an IBNR and an RBNS part by using the Chain Ladder Method twice, separately on incurred claim count and paid claim amount triangles. The difference between incurred claim count and paid claim amount figures is explained in Appendix B. In addition to the separate reserve figures, this structure provides insight into the underlying assumptions as well as an integrated method to estimate tail reserves, that is, reserves for claims beyond the final development year in the data. The incurred claim counts are assumed to be Poisson distributed, while the claim payments are assumed to be Overdispersed Poisson distributed, in order to estimate parameters reliably with a Chain Ladder Method. These distributions are required to arrive at the equivalence to maximum likelihood estimation mentioned earlier. It would still be possible to use the CLM for other distributions, but this would not be statistically justified. Under a specific assumption, the combined future claim estimate produced by DCLM, excluding the tail estimate, is exactly equal to the future estimate produced by a single CLM procedure on a paid claim amount triangle. This result will be proven in a later section, and it will be useful for validating CLM estimates, since the DCLM model provides richer insight into the underlying risks by adding more parameters with a realistic interpretation.


1.3. Thesis goals. In this thesis we will adapt the DCLM framework to work with types of Generalized Linear Models (GLM) underlying the incurred claim count and paid claim amount data. The main goal and benefit of this adaptation will be:

The inclusion of a calendar year effect in the paid amount triangle. (1.1)

This will constitute an approach with better predictions, as we believe that the most important effects in the data are best explained with a calendar year parameter. It will allow us to separate a trend in paid amount data into a trend for individual claim sizes and a trend for paid numbers. We will call this Gamma Factoring. We will demonstrate that including this parameter still allows the core DCLM framework to retain all of its properties and parameter interpretations. The trend in year of origin should then only give a measure of exposure, reflecting the number of policy holders and a base level of paid claim amounts. Typically, the trend in origin years should be constant, as we would not expect much change in exposure. If there are sudden changes in exposure, the results can be rescaled for exposure, but this will not be included in the research. The details of the new approach will be explained in chapter 3. Parameters will then no longer be estimated using the popular Chain Ladder Method, which was appropriate for a Poisson distribution with only two explanatory variables. Instead, the new framework allows for a better fit to more types of data while maintaining the benefits of the IBNR and RBNS structure and the method for a tail reserve. The second goal of using a more general GLM is:

Preventing overparametrization for development and origin trends. (1.2)

For some years in the data triangle there is only a limited amount of data, so there it is not appropriate to estimate a separate parameter value, as the CLM does. A GLM allows for a trend across all years, so estimates for individual years draw on the complete dataset. Since the Chain Ladder Method is then no longer used, we will call the new model a Double Generalized Linear Model (DGLM). The current DCLM is actually a DGLM as well, as a Poisson model is a specific case of a Generalized Linear Model, see Appendix A.2. Finally, we will:

Discuss shortcomings of the settlement delay pattern in the DCLM. (1.3)

These shortcomings will become apparent when we discuss the current DCLM.

1.4. Thesis chapters overview. In section 2.1 we will review the current DCLM framework, formulated by Martínez-Miranda et al. (2012). All sections following section 2.1 consist of new material. We will discuss shortcomings of DCLM in more detail in section 2.2 and propose solutions. In chapter 3 we will then formulate the DGLM framework, where our thesis goals will be accomplished. In chapter 4 we will compare reserve estimates between DCLM and DGLM for a real dataset. In chapter 5 we will do a simulation study for the DGLM reserve estimates using bootstrap techniques, to illustrate the variance of the estimates and to compare the reserve distributions between DCLM and DGLM. Finally, we will give conclusions in chapter 6, following the discussion sections in chapters 3, 4 and 5.


2. Double Chain Ladder Framework

First we will review the current Double Chain Ladder Method (DCLM) and its underlying assumptions. Challenging the assumptions and logic in the current model will help to get a better understanding of the overall structure. After this we will discuss some properties of DCLM in more detail to see if the current framework can be refined, before we move on in the next chapter to adapting the framework to GLMs and a calendar year effect.

2.1. Current DCLM. Following Martínez (2012), we will define the assumptions and structure of the current DCLM in this section. The overall structure of the DCLM can be divided into five parts which sequentially lead to an estimate of the RBNS claim amounts and IBNR claim amounts, and thus to the total combined future payments estimate. The five steps are:

(1) Estimate factors for origin and development trend using CLM separately on incurred claim counts and paid claim amounts, and extrapolate future incurred claim counts;

(2) Estimate settlement delay pattern using both incurred and paid development trends;

(3) Estimate average individual claim sizes using both incurred and paid origin trends;

(4) Extrapolate RBNS claim amounts by applying a settlement delay pattern and average claim size to the incurred claim counts data; and

(5) Extrapolate IBNR claim amounts by applying a settlement delay pattern and average claim size to the estimated future incurred claim counts.

The first three steps will determine the pattern in which incurred claims will be paid in the future, after which the last two steps apply these patterns to the incurred counts data and estimates of future incurred count data to get RBNS and IBNR figures. We will need definitions for data, and assumptions for data distributions, settlement delay pattern, and independence structure, which will all be stated below. We assume that two data run-off triangles are available: paid amounts and incurred counts defined as follows.

Definition. Incurred counts: $\aleph_m = \{N_{ij} : (i, j) \in I\}$, with $N_{ij}$ being the total number of claims of insurance incurred in year $i$ which have been reported in development year $j$, and $I = \{(i, j) : i = 1, \dots, m,\ j = 0, \dots, m-1;\ i + j \le m\}$. $N_{ij}$ takes values in $\mathbb{N}$.

Definition. Paid amounts: $\Delta_m = \{X_{ij} : (i, j) \in I\}$, with $X_{ij}$ being the total payment from claims incurred in year $i$ and paid in development year $j$, with $I$ as before.

Definition. Paid counts: $\aleph_m^{paid} = \{N_{ij}^{paid} : (i, j) \in I\}$, with $N_{ij}^{paid}$ being the total number of claims incurred in year $i$ and paid in development year $j$. Defining $d$ as the maximum number of years of delay until payment is made after the claim is reported, $d \le m - 1$, we can also write

$$N_{ij}^{paid} = \sum_{l=0}^{\min(j,d)} N_{i,j-l,l}^{paid},$$

where $N_{ijl}^{paid}$ is the number of future payments originating from the $N_{ij}$ reported claims that are paid with a delay of $l$ periods.

With these definitions, the DCLM which we use in this thesis is formulated under the distributional assumptions A given below.

A 2.1. Settlement delay pattern. Given $N_{ij}$, the numbers of paid claims follow a multinomial distribution, so the random vector $(N_{i,j,0}^{paid}, \dots, N_{i,j,d}^{paid}) \sim \mathrm{Mult}(N_{ij}; p_0, \dots, p_d)$ for each $(i, j) \in I$. The probabilities $p_0, \dots, p_d$ denote the delay probabilities, such that $\sum_{l=0}^{d} p_l = 1$ and $0 < p_l < 1$ for all $l$.

A 2.2. Individual claim size. The individual claim sizes $Y_{ij}^{(k)}$ per incurred claim are mutually independent with distributions $f_i$ with mean $\mu_i$ and variance $\sigma_i^2$. Assume that $\mu_i = \mu\gamma_i$, with $\mu$ being a mean factor and $\gamma_i$ the inflation in the accident years. Also the variances are $\sigma_i^2 = \sigma^2\gamma_i^2$, with $\sigma^2$ being a variance factor. It is also assumed that the claims are settled with a single payment or as a zero claim. The paid amounts can be written as

$$X_{ij} = \sum_{k=1}^{N_{ij}^{paid}} Y_{ij}^{(k)}, \quad \forall (i, j) \in I.$$

A 2.3. Claim counts. The counts $N_{ij}$ are independent random variables from a Poisson distribution with multiplicative parametrization $E[N_{ij}] = \alpha_i\beta_j$ and identification $\sum_{j=0}^{m-1} \beta_j = 1$. Using this identification, the interpretation of $\beta_j$ is the proportion of total claims allocated to or reported in development year $j$, and $\alpha_i$ is the expected total number of claims originating from origin year $i$.

A 2.4. Independence. We assume that the variables $Y_{ij}^{(k)}$ are independent of the counts $N_{ij}$ and of the settlement delay pattern.
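To make the assumptions concrete, the following is a minimal simulation sketch of the data-generating model. All parameter values are hypothetical, and the gamma family for the claim sizes is an assumption made here for illustration only, since A 2.2 fixes just the mean and variance of $f_i$. The sketch generates the full development of every origin year; the observed triangles would be the cells with $i + j \le m$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical parameter values, chosen only for illustration.
m = 5
alpha = np.array([7000.0, 9000.0, 11000.0, 10500.0, 10400.0])  # expected claims per origin year
beta = np.array([0.88, 0.11, 0.007, 0.002, 0.001])             # reporting proportions, sum to 1
p = np.array([0.4, 0.3, 0.2, 0.1])                             # delay probabilities p_0..p_d
mu, sigma = 60.0, 90.0                                         # mean and standard deviation factors
gamma = np.array([1.0, 1.02, 1.05, 1.07, 1.10])                # claim inflation per origin year
d = len(p) - 1

N = np.zeros((m, m), dtype=int)            # incurred counts N_ij
N_paid = np.zeros((m, m + d), dtype=int)   # paid counts per payment development year
X = np.zeros((m, m + d))                   # paid amounts per payment development year

for i in range(m):
    # Gamma claim sizes with mean mu * gamma_i and variance sigma^2 * gamma_i^2 (assumed family).
    shape = (mu / sigma) ** 2
    scale = gamma[i] * sigma ** 2 / mu
    for j in range(m):
        N[i, j] = rng.poisson(alpha[i] * beta[j])      # A 2.3: Poisson counts
        delays = rng.multinomial(N[i, j], p)           # A 2.1: multinomial settlement delay
        for l, n in enumerate(delays):
            N_paid[i, j + l] += n
            # A 2.2 and A 2.4: one payment per claim, sizes independent of counts and delays.
            X[i, j + l] += rng.gamma(shape, scale, size=n).sum()

print(N.sum(), N_paid.sum())
```

Every reported claim is paid exactly once in this sketch, so the total of the paid counts equals the total of the incurred counts, which is the link between the two triangles that the DCLM exploits.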

We will now explain the five steps, following Martínez, which will lead to an estimate of the RBNS claim amounts and IBNR claim amounts, noting the assumptions used in each step.

2.1.1. Step 1. Origin and development trends. Estimate factors for origin and development trend using CLM separately on incurred claim counts and paid claim amounts, and extrapolate future incurred claim counts.

Following assumption A 2.3 we apply the CLM to the triangle of incurred counts, which follow a Poisson distribution. This leads to estimates $\hat\alpha_i$ and $\hat\beta_j$ for trend parameters $\alpha_i$ and $\beta_j$ such that

$$E[N_{ij}] = \alpha_i\beta_j. \tag{2.1}$$

For more details see appendix A.1. Estimates denoted with a circumflex can be used to estimate future incurred claim numbers, which will be used to extrapolate the IBNR figures, so for $i + j > m$ we get $\hat N_{ij} := \hat\alpha_i\hat\beta_j$. The paid amounts are not assumed to follow a Poisson distribution, but the CLM is applied here as well. So for the $X_{ij}$ we also get estimates $\hat\alpha_i^p$ and $\hat\beta_j^p$ for parameters $\alpha_i^p$ and $\beta_j^p$ with assumption

$$E[X_{ij}] = \alpha_i^p\beta_j^p, \tag{2.2}$$

using the same identification $\sum_{j=0}^{m-1} \beta_j^p = 1$ as before. The p in the exponent marks the parameters of the paid amounts triangle.

Remark 2.1. CLM is applied for ease of use and is not completely statistically justified. The underlying justification comes from the fact that the variance of the paid amounts can be shown to be roughly proportional to the expectation under certain assumptions, so an Overdispersed Poisson model can be fitted to the data to get maximum likelihood estimates for the parameters, see Verrall et al. (2010). An Overdispersed Poisson model assumes a multiple of a Poisson distributed random variable, which has a variance higher than its mean. Applying CLM is therefore not an unreasonable choice, but we will describe an issue with this method at the end of the chapter. Also, as mentioned earlier, CLM causes overparametrization of the model by estimating all trend parameters separately. This will be remedied by adding a more general GLM.
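Step 1 can be sketched in code as follows. This is a minimal sketch that assumes the standard volume-weighted development factors on cumulative data; the thesis's own formulation is in appendix A.1 and is not reproduced here. The function name `chain_ladder` is hypothetical.

```python
import numpy as np

def chain_ladder(incremental):
    """Return (alpha_hat, beta_hat) from an incremental triangle with NaN in unobserved cells."""
    m = incremental.shape[0]
    observed = ~np.isnan(incremental)
    cum = np.nancumsum(incremental, axis=1)
    # Volume-weighted development factors f_j, using the rows observed in both columns.
    f = np.empty(m - 1)
    for j in range(m - 1):
        rows = observed[:, j + 1]
        f[j] = cum[rows, j + 1].sum() / cum[rows, j].sum()
    # Share of the ultimate total reached by the end of development year j.
    cum_share = np.ones(m)
    for j in range(m - 2, -1, -1):
        cum_share[j] = cum_share[j + 1] / f[j]
    beta_hat = np.diff(cum_share, prepend=0.0)   # incremental proportions, sum to 1
    # Ultimate per origin year: latest observed cumulative value grossed up.
    last = observed.sum(axis=1) - 1
    alpha_hat = np.array([cum[i, last[i]] / cum_share[last[i]] for i in range(m)])
    return alpha_hat, beta_hat

nan = np.nan
incurred = np.array([
    [6238.0,   831.0, 49.0, 7.0, 1.0],
    [7773.0,  1381.0, 23.0, 4.0, nan],
    [10306.0, 1093.0, 17.0, nan, nan],
    [9639.0,   995.0, nan,  nan, nan],
    [9511.0,   nan,   nan,  nan, nan],
])
alpha_hat, beta_hat = chain_ladder(incurred)
N_fit = np.outer(alpha_hat, beta_hat)                     # fitted values alpha_i * beta_j
N_future = np.where(np.isnan(incurred), N_fit, np.nan)    # estimates for i + j > m only
print(alpha_hat)
print(beta_hat)
```

Under the identification $\sum_j \beta_j = 1$, `alpha_hat` is the estimated ultimate number of claims per origin year and `beta_hat` the proportion reported in each development year. The fitted values reproduce the observed row and column totals, which is the marginal-sum property of the Poisson maximum likelihood solution.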

2.1.2. Step 2. Settlement delay pattern. Estimate settlement delay proportions using both incurred and paid development trends.

Following assumption A 2.1, we will estimate a delay pattern $\pi_0, \dots, \pi_{m-1}$, from which very similar delay probabilities $p_0, \dots, p_d$ can be derived. The only difference is that the probabilities are defined to sum to 1, with $0 < p_l < 1$ for all $l \le d$ as a practical assumption, whereas the $\pi_l$ can be chosen freely. The estimated delay pattern will be more important for the DCLM, and it can be derived by solving the following linear system once we have estimates $\hat\beta_j$ for the incurred development trend and $\hat\beta_j^p$ for the paid development trend as introduced in step 1:

$$\begin{pmatrix} \beta_0^p \\ \vdots \\ \vdots \\ \beta_{m-1}^p \end{pmatrix} = \begin{pmatrix} \beta_0 & 0 & \cdots & 0 \\ \beta_1 & \beta_0 & \ddots & 0 \\ \vdots & \ddots & \ddots & 0 \\ \beta_{m-1} & \cdots & \beta_1 & \beta_0 \end{pmatrix} \begin{pmatrix} \pi_0 \\ \vdots \\ \vdots \\ \pi_{m-1} \end{pmatrix}. \tag{2.3}$$

This will result in a delay pattern, since the equations are formulated to express a single payment in period $l$ as a combination of past incurred claims, each delayed individually so that payment falls in period $l$. Denoting the solution by $\hat\pi$, we note that the values $\hat\pi_l$ could be negative and/or sum to more than 1. Ideally, we would want to solve the linear equations given by (2.3) directly for the probability vector $p$, so solve $\beta^p = Bp$, where $B$ corresponds to the matrix with values of $\beta$ as in (2.3). This could be done using a constrained least squares minimization. But because of the structure of the matrix $B$, $p_0$ will have the biggest influence in the optimization, and the subsequent values of $p$ up to $p_{m-1}$ will have a successively smaller influence. This means that the resulting values will be very similar to a direct linear solution $\hat\pi$ as seen in chapter 2. Therefore we will just follow Martínez and estimate the maximum delay period $d$ by counting the number of successive $\hat\pi_l \ge 0$ that we get by solving (2.3), such that $\sum_{l=0}^{d-1} \hat\pi_l < 1 \le \sum_{l=0}^{d} \hat\pi_l$, and then define the estimated delay pattern parameters as:

$$\hat p_l = \hat\pi_l, \quad l = 0, \dots, d-1, \quad \text{and} \quad \hat p_d = 1 - \sum_{l=0}^{d-1} \hat p_l. \tag{2.4}$$
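A minimal sketch of this step: because the matrix in (2.3) is lower triangular, $\hat\pi$ follows by forward substitution, after which (2.4) truncates it to delay probabilities. The function names are hypothetical, and raising an error when no suitable $d$ exists is an assumption of this sketch, since the text does not say what to do in that case. The input values are the illustrative development proportions used in the discussion in section 2.2.1.

```python
import numpy as np

def delay_pattern(beta, beta_p):
    """Solve the lower-triangular system (2.3) for pi by forward substitution."""
    m = len(beta)
    pi = np.zeros(m)
    for j in range(m):
        # Row j of (2.3): beta_p[j] = sum_{l=0}^{j} beta[j - l] * pi[l]
        already = sum(beta[j - l] * pi[l] for l in range(j))
        pi[j] = (beta_p[j] - already) / beta[0]
    return pi

def delay_probabilities(pi):
    """Apply (2.4): keep leading non-negative pi_l until their running sum reaches 1."""
    total = 0.0
    for d, value in enumerate(pi):
        if value < 0:
            break
        if total + value >= 1.0:
            # p_l = pi_l for l < d, and p_d takes up the remainder.
            return np.append(pi[:d], 1.0 - total)
        total += value
    # Assumption of this sketch: treat a pattern that never reaches 1 as an error.
    raise ValueError("no d with sum_{l<d} pi_l < 1 <= sum_{l<=d} pi_l")

beta_hat = np.array([0.6, 0.35, 0.05])     # incurred development proportions
beta_p_hat = np.array([0.3, 0.55, 0.15])   # paid development proportions

pi_hat = delay_pattern(beta_hat, beta_p_hat)   # approximately (0.5, 0.625, -0.15625)
p_hat = delay_probabilities(pi_hat)            # approximately (0.5, 0.5), so d = 1
print(pi_hat, p_hat)
```

In this example the third entry of $\hat\pi$ is negative and the first two already sum to more than 1, so (2.4) gives a maximum delay of $d = 1$.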


Remark 2.2. On closer inspection, there are two components in this estimation that challenge the assumptions. They will be introduced here and discussed in more detail in section 2.2. In step 5, every delay proportion $\hat\pi_l$ will be applied to every development period, so we assume homogeneity in A 2.1, but the first delay effect $\hat\pi_0$ is only determined by data in year $j = 0$. Data in this first year might be a poor representation for consecutive years. A similar observation applies to the remaining delay effects.

Furthermore, when a run-off triangle is updated with a new diagonal, the new value for a given origin year $i$ might have a paid/incurred ratio in the data that is very different from the estimated zero delay effect $\hat\pi_0$ which corresponds to this ratio. This is very important for the origin year equal to the current calendar year, where only one data point is available. Applying $\hat\pi$ to this new year might then produce a bad estimate for future payments originating from this datapoint.

2.1.3. Step 3. Individual claim sizes. Estimate average individual claim sizes using both incurred and paid origin trends.

Following assumption A 2.2, we will determine the mean of the distribution of individual claim sizes, including the parameters $\gamma_i$ which measure the inflation in accident years. We can set $\gamma_1 = 1$ for identifiability, so we can estimate $\mu$ by $\hat\mu = \hat\alpha_1^p / \hat\alpha_1$. As discussed in A 2.3, the interpretation of the numerator here is the total paid amount originating from year $i = 1$, and the interpretation of the denominator is the total number of incurred claims originating from year $i = 1$, so $\hat\mu$ represents the average claim payment per incurred claim accordingly. We can then estimate $\gamma_i$ by:

$$\hat\gamma_i = \frac{\hat\alpha_i^p}{\hat\alpha_i\hat\mu}, \quad i = 2, \dots, m. \tag{2.5}$$

2.1.4. Step 4. Extrapolating RBNS claims. Extrapolate RBNS claim amounts by applying a settlement delay pattern and average claim size to the incurred claim counts data.

We can now estimate future RBNS claim payments by extracting payment numbers from the reported incurred claim numbers, and then multiplying these figures by an average claim size to get total claim amounts. Looking at Figure 2, we see the incurred data, as well as the future incurred count estimates. CLM only allows for completing the lower triangle. Looking at Figure 3, we see the claim payments data and the location of the RBNS estimates. The estimates only run up to calendar year 5 + d, with d = 4 the maximum settlement delay. The farthest cell that can be reached in the triangle is obtained by taking the latest incurred claim count in the current calendar year m = 5 and delaying the payment for as long as possible.


i\j        0                1      2      3                4      5    6    7    8
1       6238              831     49      7                1
2       7773             1381     23      4    $\hat N_{24}$
3      10306             1093     17    ...    $\hat N_{34}$
4       9639              995    ...    ...              ...
5       9511    $\hat N_{51}$    ...    ...    $\hat N_{54}$

Figure 2. Incurred claim counts and CLM estimates

i\j         0                       1         2         3                       4                       5                       6                       7                       8
1      451288                  339519    333371    144988                   93243    $\hat X_{15}^{rbns}$                     ...                     ...    $\hat X_{18}^{rbns}$
2      448627                  512882    168467    130674    $\hat X_{24}^{rbns}$                     ...                     ...    $\hat X_{27}^{rbns}$
3      693574                  497737    202272       ...                     ...    $\hat X_{35}^{rbns}$    $\hat X_{36}^{rbns}$
4      652043                  546506       ...       ...                     ...                     ...
5      606606    $\hat X_{51}^{rbns}$       ...       ...    $\hat X_{54}^{rbns}$

Figure 3. Paid claim amounts and DCLM RBNS estimates

Take $\hat X_{35}^{rbns}$ as an example. Intuitively, the estimated payment at this timepoint will be the average claim size in year $i = 3$ multiplied by the sum of the claims at (3, 1) multiplied by the 4-period delay proportion $\pi_4$, and the claims at (3, 2) multiplied by the 3-period delay proportion $\pi_3$. These are the only two incurred datapoints that can reach (3, 5), since one would need a delay of 5 periods to delay payment from (3, 0) to (3, 5), which exceeds the maximum delay assumption $d = 4$ in this example.

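Using assumptions A 2.1 to A 2.4, we finally estimate $\hat X_{ij}^{rbns}$ by its expectation:

$$\begin{aligned}
E[X_{ij} \mid \aleph_m] &= E\big[E[X_{ij} \mid N_{ij}^{paid}] \mid \aleph_m\big] = E\Big[E\Big[\sum_{k=1}^{N_{ij}^{paid}} Y_{ij}^{(k)} \,\Big|\, N_{ij}^{paid}\Big] \,\Big|\, \aleph_m\Big] \\
&= E\big[N_{ij}^{paid} E[Y_{ij}^{(k)}] \mid \aleph_m\big] = \mu\gamma_i E[N_{ij}^{paid} \mid \aleph_m] \\
&= \mu\gamma_i E\Big[\sum_{l=0}^{\min\{j,d\}} N_{i,j-l,l}^{paid} \,\Big|\, \aleph_m\Big] = \mu\gamma_i \sum_{l=0}^{\min\{j,d\}} E[N_{i,j-l,l}^{paid} \mid \aleph_m] \\
&= \mu\gamma_i \sum_{l=0}^{\min\{j,d\}} N_{i,j-l}\, p_l.
\end{aligned}$$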

Use of the tower property of conditional expectation is justified since paid claims are a function of incurred claims through assumption A 2.1, and independence from A 2.4 is used in the third step. We will use the $\pi$ pattern instead of the $p_l$ for the settlement delay in order to show the equivalence of DCLM with a single CLM procedure on the paid claim amount triangle in the next section. This matches the formula with our intuition and as such we arrive at the following expression for an RBNS estimate:

$$\hat X_{ij}^{rbns} = \hat\mu\hat\gamma_i \sum_{l=i-m+j}^{j} N_{i,j-l}\hat\pi_l, \tag{2.6}$$

where the summing indices are chosen to sum the appropriate datapoints.

2.1.5. Step 5. Extrapolating IBNR claims. Extrapolate IBNR claim amounts by applying a settlement delay pattern and average claim size to the estimated future incurred claim counts.

The future IBNR claim payments can be estimated exactly like the RBNS payments, only we use the future incurred counts from Figure 2, since IBNR corresponds to claims that are not yet reported and so are not in the dataset. The IBNR estimates will also reach farther in the triangle, since the estimates $\hat N_{ij}$ are in later calendar years, so applying the maximum delay period to these claims, we get estimates as far as seen in Figure 4.

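i\j         0                       1         2         3                       4                       5      6      7                       8
1      451288                  339519    333371    144988                   93243
2      448627                  512882    168467    130674    $\hat X_{24}^{ibnr}$                     ...    ...    ...    $\hat X_{28}^{ibnr}$
3      693574                  497737    202272       ...                     ...    $\hat X_{35}^{ibnr}$    ...    ...    $\hat X_{38}^{ibnr}$
4      652043                  546506       ...       ...                     ...                     ...    ...    ...                     ...
5      606606    $\hat X_{51}^{ibnr}$       ...       ...    $\hat X_{54}^{ibnr}$                     ...    ...    ...    $\hat X_{58}^{ibnr}$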

Figure 4. Paid claim amounts and DCLM IBNR estimates

Notice that we do not get any estimates for year $i = 1$, since there are no incurred estimates $\hat N_{1j}$ to extract payments from.

Taking again $\hat X_{35}^{ibnr}$ as an example, we now delay the future incurred claims in (3, 3) by 2 periods and in (3, 4) by 1 period to estimate payments in (3, 5). Note that for timepoints such as $\hat X_{34}^{ibnr}$ in the lower triangle, we use future incurred claims with a zero delay as well, since they can also be paid in the same year. The final expression for an IBNR estimate is then:

$$\hat X_{ij}^{ibnr} = \hat\mu\hat\gamma_i \sum_{l=0}^{i-m+j-1} \hat N_{i,j-l}\hat\pi_l, \tag{2.7}$$

where the summing indices are chosen to sum the appropriate future claim counts. Finally, summing the RBNS and IBNR components gives us the total future payment estimates $\hat X_{ij}^{DCLM} = \hat X_{ij}^{rbns} + \hat X_{ij}^{ibnr}$.
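Steps 3 to 5 can be sketched together as follows. This is a minimal sketch under stated assumptions: the function names and the toy numbers are hypothetical, rows are 0-based (so thesis origin year $i$ is row $i - 1$), and delay proportions beyond the supplied vector are treated as zero.

```python
import numpy as np

def claim_size_parameters(alpha, alpha_p):
    """Step 3: mu_hat = alpha_p_1 / alpha_1 and gamma_hat from (2.5), with gamma_1 = 1."""
    mu = alpha_p[0] / alpha[0]
    gamma = alpha_p / (alpha * mu)
    return mu, gamma

def dclm_future_payments(N, N_future, mu, gamma, pi):
    """RBNS (2.6) and IBNR (2.7) estimates for all cells with i + j > m.

    N        : m x m incurred counts, NaN where unobserved
    N_future : m x m CLM count estimates, read only where N is unobserved
    pi       : delay pattern pi_0, pi_1, ...
    """
    m = N.shape[0]
    rbns = np.zeros((m, 2 * m - 1))
    ibnr = np.zeros((m, 2 * m - 1))
    for r in range(m):
        for j in range(2 * m - 1):
            if (r + 1) + j <= m:
                continue                      # past cell, nothing to predict
            for l in range(min(j, len(pi) - 1) + 1):
                k = j - l                     # development year in which the claim is reported
                if k > m - 1:
                    continue                  # no counts beyond the width of the triangle
                if (r + 1) + k <= m:          # already reported: RBNS, as in (2.6)
                    rbns[r, j] += mu * gamma[r] * N[r, k] * pi[l]
                else:                         # reported in the future: IBNR, as in (2.7)
                    ibnr[r, j] += mu * gamma[r] * N_future[r, k] * pi[l]
    return rbns, ibnr

# Toy 3 x 3 example with hypothetical numbers.
nan = np.nan
N = np.array([[10.0, 5.0, 1.0], [12.0, 6.0, nan], [11.0, nan, nan]])
N_future = np.array([[nan, nan, nan], [nan, nan, 1.2], [nan, 5.5, 1.1]])
mu_hat, gamma_hat = claim_size_parameters(np.array([16.0, 19.2, 17.6]),
                                          np.array([32.0, 38.4, 35.2]))
pi_hat = np.array([0.5, 0.3, 0.2])

rbns, ibnr = dclm_future_payments(N, N_future, mu_hat, gamma_hat, pi_hat)
total = rbns + ibnr
print(rbns.round(2))
print(ibnr.round(2))
```

The split between the two reserves depends only on whether the contributing count lies inside the observed triangle, which is why the first origin year has RBNS estimates but no IBNR estimates.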

Comparing our derived DCLM estimates with CLM estimates, it can be shown that $\hat X_{ij}^{DCLM}$, using (2.7) and (2.6) with an adjustment $\hat X_{ij}^{rbns(2)}$, gives the same estimate as CLM when applied solely to the paid claim data as seen in Figure 3. For the RBNS estimate in (2.6) we need to use fitted values for the claim counts instead of the given data $N_{ij}$. So we have used our CLM estimates of $\alpha$ and $\beta$ to estimate future counts in step 1, but they can also be used to calculate fitted values instead of the datapoints, although the differences will be small.
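The equivalence can also be checked numerically. The sketch below uses hypothetical parameter values (they are not estimates from the thesis data) and fitted counts $\hat\alpha_i\hat\beta_j$ in every cell, and confirms that the DCLM expression then reproduces $\hat\alpha_i^p\hat\beta_j^p$.

```python
import numpy as np

# Hypothetical CLM estimates for a 3 x 3 example.
alpha = np.array([100.0, 120.0, 110.0])        # incurred origin trend
beta = np.array([0.6, 0.35, 0.05])             # incurred development trend
alpha_p = np.array([5000.0, 6600.0, 5800.0])   # paid origin trend
beta_p = np.array([0.3, 0.55, 0.15])           # paid development trend
m = len(beta)

# Lower-triangular matrix of (2.3) and its solution pi.
B = np.array([[beta[j - l] if l <= j else 0.0 for l in range(m)] for j in range(m)])
pi = np.linalg.solve(B, beta_p)

# Step 3: average claim size and inflation parameters.
mu = alpha_p[0] / alpha[0]
gamma = alpha_p / (alpha * mu)

# DCLM with fitted counts in every cell: mu * gamma_i * sum_l N_fit[i, j - l] * pi[l].
N_fit = np.outer(alpha, beta)
dclm = np.array([[mu * gamma[i] * sum(N_fit[i, j - l] * pi[l] for l in range(j + 1))
                  for j in range(m)] for i in range(m)])

clm = np.outer(alpha_p, beta_p)   # single CLM on the paid triangle
print(np.allclose(dclm, clm))
```

The check holds even though $\hat\pi$ contains a negative entry here, because the equivalence only uses $\hat\pi$ as the solution of (2.3) and the definition of $\hat\gamma_i$.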

The CLM only estimates for development years stretching as far as the data, so no further than year $j = 4$ in the example. The DCLM estimates for development years $j > 4$ are a useful extension of the CLM model, eliminating the need for a tail factor to model these remaining payments. The equivalence for years $j \le 4$ can be shown as follows:

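$$\begin{aligned}
\hat X_{ij}^{rbns(2)} + \hat X_{ij}^{ibnr} &= \Big(\hat\mu\hat\gamma_i \sum_{l=i-m+j}^{j} \hat N_{i,j-l}\hat\pi_l\Big) + \Big(\hat\mu\hat\gamma_i \sum_{l=0}^{i-m+j-1} \hat N_{i,j-l}\hat\pi_l\Big) \\
&= \hat\mu\hat\gamma_i \sum_{l=0}^{j} \hat N_{i,j-l}\hat\pi_l \\
&= \hat\mu\hat\gamma_i \sum_{l=0}^{j} \hat\alpha_i\hat\beta_{j-l}\hat\pi_l \\
&= (\hat\alpha_i\hat\mu\hat\gamma_i) \sum_{l=0}^{j} \hat\beta_{j-l}\hat\pi_l \\
&= \hat\alpha_i^p \sum_{l=0}^{j} \hat\beta_{j-l}\hat\pi_l \\
&= \hat\alpha_i^p\hat\beta_j^p = \hat X_{ij}^{CLM},
\end{aligned}$$

where in the fifth and sixth step we used the definition of $\hat\gamma_i$ and the definition of $\hat\pi_l$ as the solution to a linear equation. The preferred DCLM thus deviates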

2.2. Discussing DCLM. As mentioned before, the DCLM has many observable variables with a practical interpretation, which allows for transparent application of expert judgement or adjustments. There are still some issues for a number of variables however, which we will discuss here.

2.2.1. Settlement delay pattern discussion 1. The only variable that might not accurately represent a realistic effect is the settlement delay $\pi$. By solving for $\pi$ in a linear equation we get a delay effect that is intuitively correct, but analyzing its application in depth reveals two inconsistencies, as discussed in remark 2.2. For the first one, we consider the way in which the linear equation is solved. We start with $\pi_0 = \beta_0^p / \beta_0$, so the ratio of paid and incurred. This means that the zero delay effect is solely based on data in the first year. Next we get the 1-year delay $\pi_1 = (\beta_1^p - \pi_0\beta_1) / \beta_0$. This formula can be interpreted as:

(1) You take the amount paid in the second year, $\beta_1^p$;

(2) You subtract the amount paid which was reported in the second year, $\pi_0\beta_1$, so only payments originating from the first year remain; and

(3) You divide the payment originating from the first year by the incurred in the first year, $\beta_0$, to get the needed proportion.

Two problems arise from this method. You assume that the zero delay portion paid in the second year is exactly the same as the zero delay portion in the first year, since $\pi_0$ is applied here as well, which is based solely on data from the first year. But in reality this portion can be very different in the second year. This means that the estimate of remaining payments originating from the first year in the second step is too large or too small, so $\pi_1$ does not represent a proper 1-year delay. It is easy to construct examples where this issue leads to negative values in $\pi$, which is usually not realistic, since it would mean that the insurer receives payments at some point. And again, the 1-year delay is only based on data from the first and second year, so it might not be a good representation for later years. We give a more detailed example with some imaginary values for $\beta$ and $\beta^p$ that illustrate the issue.

Suppose we have run-off triangles with 3 years of data, and we get the following incurred and paid effects:

b

β = (0.6, 0.35, 0.05), cβp _{= (0.3, 0.55, 0.15).}

So for this product most claims get reported in the first year, but most payments occur in the second year. This might happen because damages occur on average midyear, so they are reported towards the end of the year, which means that they are probably settled beginning next year. We would get π_b0 = 0.3_0.6 = 0.5 in the first year. But it might be entirely possible that

most claims that get reported in the second year, are in fact reported in the beginning of the year, so that they are also settled in that year. The zero delay in this year is very high, say 0.8 for example. An estimate of the 1 year delay should then be π_b1= 0.55−0.35·0.8_0.6 = 0.45. But assuming the same zero

delay 0.5 as the first year while applying the DCLM approach, we actually get bπ = (0.5, 0.625, −0.15625). This happens because we overestimate the

(15)

payments originating from the first year. Evidently, this is not a good representation of the delay effect. A more appropriate approach would be to use actual data about which proportion of the payments in the second year originates from the first year, but most companies are reluctant and unable to store this level of detailed information. Using $\hat\pi$ will, however, still result in sensible estimates. This is because an estimate such as in 2.7 usually contains more than one entry of the vector $\hat\pi$. The incorrect shift included in the calculation of one $\hat\pi_i$ is countered by a similar shift in the calculation of $\hat\pi_{i+1}$, so the mistakes cancel each other out. It is thus not a very big issue to use $\hat\pi$, but the single entries $\hat\pi_i$ themselves do not have a realistic interpretation, so one should be careful when adjusting these separate values according to expert judgement.

2.2.2. Settlement delay pattern discussion 2. For the second inconsistency, we have fitted delay proportions $\pi_0, \dots, \pi_l$ to the data, but for RBNS

estimates, we already have available data for the past delay effects. Take for example datapoint (5,0) in Figure 3. We may have fitted a $\hat\pi_0$ to all the data, but we should use actual data about the ratio of paid and incurred in (5,0) to see which portion of the incurred claims still remains to be paid in the future. We could have for example that $\hat\pi_0 = 0.6$, so the total future proportion will be 0.4. In the current DCLM, this total effect of 0.4 is used regardless of the data. If the actual payment for (5,0) indicates a zero delay effect of 0.75, then only 0.25 remains to be paid in the future, so there is a payment shift relative to the average $\pi$ effect. Therefore we would advise to rescale the $\hat\pi_l$ for $l > 0$ to arrive at a total effect of 0.25, otherwise the total payment proportion would be 0.75 + 0.4 = 1.15 instead of 1. Rescaling would still retain the same proportions between the estimated $\hat\pi_l$. There are no issues for IBNR estimates, since these are based on future incurred counts, so there is no data to update $\hat\pi$. In most situations however, updating $\hat\pi$ with actual data ratios of paid counts / incurred counts might not be feasible. An insurer should be able to provide a run-off triangle with paid counts, which can be used in combination with incurred counts to derive information about past delays. But assumption A 2.4 in the DCLM, that every incurred claim will be settled with 1 claim payment, will usually not hold in practice. There might be many paid counts arising from a single incurred count, for example in disability insurance. A ratio in the first development year of paid counts / incurred counts then does not represent the portion of incurred claims settled in that year, so it does not represent a zero delay portion.
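The rescaling suggested in this discussion can be sketched in a few lines. The fitted pattern `pi_hat` below is hypothetical (only $\hat\pi_0 = 0.6$ and the observed zero-delay share of 0.75 come from the text), and `rescale_remaining` is an illustrative name, not a routine from the thesis.

```python
def rescale_remaining(pi_hat, observed_zero_delay):
    """Replace pi_hat[0] by the observed share and rescale pi_hat[1:] so the total is 1."""
    remaining_fitted = sum(pi_hat[1:])            # 0.4 in the example
    remaining_actual = 1.0 - observed_zero_delay  # 0.25 in the example
    factor = remaining_actual / remaining_fitted
    return [observed_zero_delay] + [p * factor for p in pi_hat[1:]]


pi_hat = [0.6, 0.3, 0.1]  # hypothetical fitted pattern with pi_hat[0] = 0.6
adjusted = rescale_remaining(pi_hat, observed_zero_delay=0.75)
print(adjusted)
```

The later delays keep their mutual proportions (here 3 to 1), while the total payment proportion is brought back to 1 instead of 1.15.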

We propose to establish a relation between total incurred counts and total paid counts. One could show a significant trend implying fixed proportions of incurred counts and paid counts, for example 3 payments on average for every incurred count. Then it is possible to derive valid information from a ratio of paid counts / incurred counts. We can analyze the statistical significance about whether including this data would result in better es-timates. A procedure which derives information from paid counts versus incurred counts can be linked to the Munich Chain Ladder method. Munich Chain Ladder combines incurred amounts and paid amounts to reduce the estimated parameter variance, see Quarg and Mack (2004).


2.2.3. Trend parameters discussion. A CLM does not produce a trend for a calendar year effect, that is, a parameter that depends on $i + j$. There are a number of effects on payment patterns that are better reflected by a calendar year effect, so it would be useful to include it in the model. We will include this in chapter 3 when we introduce a more general GLM structure to estimate parameters. Aside from inflation, which impacts individual claim sizes, we will use calendar year effects to identify payment shifts in paid counts in the data. The exact notion of a payment shift will become apparent in section 3 when we formulate the DGLM and Gamma Factoring.

2.2.4. Distribution paid amounts discussion. As mentioned briefly in remark 2.1, an Overdispersed Poisson model can be fitted to the paid amounts $X_{ij}$. But this is different from the compound distribution assumption

\[ X_{ij} = \sum_{k=1}^{N^{paid}_{ij}} Y^{(k)}_{ij}, \]

which is used to derive the DCLM reserve estimates. So when the variance of random variables is derived by means of overdispersed $X_{ij}$, there is a theoretical mismatch in the model. The mismatch will be small if both distributions are similar, but it is worth noting.

2.3. DCLM issue summary. As discussed in the sections above, there are a number of shortcomings in the DCLM that could be resolved:

(1) There is overparametrization for estimated trends in the CLM;

(2) There is no interpretation for different calendar year effects;

(3) CLM is only statistically justified for Poisson distributed data;

(4) The settlement delay pattern $\pi$ does not reflect past delay data per year of origin for RBNS estimates;

(5) The settlement delay pattern $\pi$ does not have a realistic interpretation on an individual parameter level; and

(6) There is a mismatch between different assumptions for $X_{ij}$.

We will now formulate the DGLM in the next section, where we will address a number of these issues.


3. Double GLM framework

We will state the Double Generalized Linear Model (DGLM) here in the same way we stated the DCLM in chapter 2. First we will state the different steps of the new framework, followed by a more detailed explanation. This will allow us to compare the DGLM with the DCLM more easily. Assumptions A.1 and A.4 about the settlement delay pattern and independence will remain the same, while we provide an update for A.2 and A.3 to reflect the use of GLMs. We will also provide a separate summary of parameter interpretations.

3.1. Formulating DGLM. The overall structure of the DGLM can be divided into six parts which sequentially lead to an estimate of the RBNS claim amounts and IBNR claim amounts and thus to the total combined future payments estimate. The six steps are:

(1) Estimate factors for origin and development trend using a GLM separately for incurred claim counts and paid claim amounts, and extrapolate future incurred claim counts. Include a calendar year trend in the GLM for paid claim amounts as well;

(2) Estimate future calendar year effects based on past calendar year effects given by the paid amount GLM, and split calendar year effects into inflation and payment shifts;

(3) Estimate settlement delay pattern using both incurred and paid development trends, and calendar year shift effects;

(4) Estimate average individual claim sizes using both incurred and paid origin trends, and calendar year inflation effects;

(5) Extrapolate RBNS claim amounts by applying a settlement delay pattern and average claim size to the incurred claim counts data; and

(6) Extrapolate IBNR claim amounts by applying a settlement delay pattern and average claim size to the estimated future incurred claim counts.

The first four steps will determine the pattern in which incurred claims will be paid in the future, after which the last two steps apply these patterns to the incurred counts data and to estimates of future incurred count data to get RBNS and IBNR figures. Steps 1, 2, 3 and 4 in particular are different from the DCLM. Assumptions for the independence structure, and definitions for paid amounts and incurred counts, are as in the DCLM. We will provide our own assumptions for the data distributions, which will be slightly different than before.

A 3.1. Settlement delay pattern. Given $N_{ij}$, the distribution of paid claim numbers conditional on paid amounts follows a multinomial distribution, so

\[ (N^{paid}_{i,j,0}, \dots, N^{paid}_{i,j,d} \mid \Delta_m) \sim \mathrm{Mult}(N_{ij};\ p_{i+j,0}, \dots, p_{i+j,d}), \quad \text{for each } (i, j) \in I. \]

The probabilities $p_{i+j,0}, \dots, p_{i+j,d}$ denote the delay probabilities such that $\sum_{l=0}^{d} p_{i+j,l} = 1$ and $0 < p_{i+j,l} < 1$, $\forall l$. The $p_{i+j,l}$ will be based on the delay probabilities $\pi_l$ like in assumption A 2.1, as well as $\gamma^{shift}_{i+j}$ to be explained in the steps below.


A 3.2. Individual claim size. The individual claim sizes $Y^{(k)}_{ij}$ per incurred claim are mutually independent with distributions $f_{ij}$ with mean $\mu_{ij}$ and variance $\sigma^2_{ij}$. Assume that $\mu_{ij} = \mu w_i \gamma^{infl}_{i+j}$, with $\mu$ being a mean factor, $w_i$ a factor in accident years and $\gamma^{infl}_{i+j}$ a calendar year inflation effect to be explained in step 2. Also the variances are $\sigma^2_{ij} = \sigma^2 (w_i \gamma^{infl}_{i+j})^2$ with $\sigma^2$ being a variance factor. Moreover, it is assumed that the claims are settled with a single payment or as a zero claim.

A 3.3. Claim counts and amounts. Incurred counts $N_{ij}$ are independent random variables from a distribution in the exponential family with multiplicative parametrization $E[N_{ij}] = \alpha_i \beta_j$ and identification $\sum_{j=0}^{m-1} \beta_j = 1$. The paid amounts $X_{ij}$ are independent random variables from a distribution in the exponential family with multiplicative parametrization $E[X_{ij}] = \alpha^p_i \beta^p_j \gamma_{i+j}$ and identification $\sum_{j=0}^{m-1} \beta^p_j = 1$. The $p$ in the exponent denotes that we are dealing with the paid triangle, and

\[ X_{ij} = \sum_{k=1}^{N^{paid}_{ij}} Y^{(k)}_{ij} \quad \forall (i, j) \in I. \]

Using the parametrizations from the assumptions, we want to arrive at the following interpretations for all parameters to be estimated:

$\alpha_i$ : Expected total number of incurred claims for origin year $i$,

$\beta_j$ : Expected proportion of claims reported in development year $j$,

$\alpha^p_i$ : Expected total paid amount for origin year $i$ at base level, so without inflation effects, thus measuring mostly an exposure level,

$\beta^p_j$ : Expected proportion of claim amounts paid in development year $j$,

$\gamma_{i+j}$ : Calendar year effect including both economic inflation and payment shifts such as an increase in settlements by the insurer,

$p_d$ : Expected proportion of claim amounts arising from incurred claims $N_{ij}$ delayed for $d$ periods till payment,

$\mu_i$ : Expected average paid amount per incurred claim for origin year $i$ at base level, so without inflation effects, thus measuring exposure.

We will show that these interpretations are correct in the sections below.

3.1.1. Step 1. Origin, development and calendar trends. Estimate factors for origin and development trend using a GLM separately for incurred claim counts and paid claim amounts, and extrapolate future incurred claim counts. Include a calendar year trend in the GLM for paid claim amounts as well.

We will formulate a GLM for the incurred counts that complies with assumption A 3.3. In the following we will refer to this model as the Incurred GLM:

Stochastic component: Observations $N_{ij}$ have a density in the exponential family with a mean $\mu_{ij}$.

Systematic component: We have a linear predictor $\eta_{ij} = \alpha'_i + \beta'_j$. We will use a ResQ representation as explained in remark A.1 in the appendix, because we will use Towers Watson projection software ResQ to calibrate our GLMs. Therefore we get $\eta_{ij} = \sum_{n=1}^{i} a_n + \sum_{m=0}^{j} b_m$.

Link function: We define link function $g(\mu_{ij}) = \log(\mu_{ij}) = \eta_{ij}$. So we get

\[ \mu_{ij} = \exp\Big(\sum_{n=1}^{i} a_n + \sum_{m=0}^{j} b_m\Big) = \exp\Big(\sum_{n=1}^{i} a_n\Big) \cdot \exp\Big(\sum_{m=0}^{j} b_m\Big) = \alpha_i \beta_j. \]

Remark 3.1. Before, in this multiplicative representation, $\alpha_i$ and $\beta_j$ would just be arbitrary numbers derived by applying an algorithm such as CLM. This means that parameters used to fit claims in the lower part of a run-off triangle, such as cell (5,0) in figure 2, are only based on very few data points. The resulting estimates can be very unstable. In order to avoid this type of overparametrization, we will define an actual trend underlying the number sequences $\alpha_i$ and $\beta_j$ for the origin and development trends. We will do this by using the ResQ representation. One might want $\alpha_i = \alpha^i$ for some real number $\alpha$, so an exponential trend instead of a number sequence. This amounts to keeping $a_n = a$ constant in the linear predictor, because then we get:

\[ \alpha_i = \exp\Big(\sum_{n=1}^{i} a_n\Big) = \exp\Big(\sum_{n=1}^{i} a\Big) = \exp(i \cdot a) = \exp(a)^i = \alpha^i. \]

We will continue using the ResQ representation, which is very convenient for constructing different types of trends. One might choose to create a fit with an arbitrary parameter for the first value, a constant for a number of consecutive values and zero for the remaining values. This will also make it easy to extrapolate values for the trends beyond the data periods. This will be especially useful for extrapolating future calendar year effects, since normal estimation only results in values for past and current calendar years. Which type of trend fits the data best, such as $a$ and $b$ constant or arbitrary, will not be discussed in this thesis, but there are plenty of methods to analyze and compare different choices. Furthermore, a GLM is only a specification of a model and does not prescribe a standard method for estimating $a$ and $b$. We will use the Weighted Least Squares method to fit $a$ and $b$ to the data in the run-off triangle, with identification $\sum_{j=0}^{m-1} \beta_j = 1$.
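To make the cumulative-sum representation concrete, the sketch below builds $\alpha_i$ and $\beta_j$ from incremental parameters $a_n$ and $b_m$. All numerical values are hypothetical and not fitted to any data, no ResQ functionality is used, and the way the identification $\sum_j \beta_j = 1$ is imposed (by moving the scale into $\alpha$) is an assumption made for illustration.

```python
import math


def cumulative_effects(increments):
    """Return exp of the running sums of the incremental parameters."""
    total, effects = 0.0, []
    for inc in increments:
        total += inc
        effects.append(math.exp(total))
    return effects


# Hypothetical incremental parameters (not fitted to any data).
a = [0.05, 0.05, 0.05, 0.05]  # constant a_n for origin years i = 1..4
b = [0.0, -1.2, -1.2, -1.2]   # free first value, then a constant trend

alpha_raw = cumulative_effects(a)  # alpha_i = exp(a)^i, an exponential trend
beta_raw = cumulative_effects(b)

# Identification: scale beta to sum to 1 and move the scale into alpha.
scale = sum(beta_raw)
beta = [x / scale for x in beta_raw]
alpha = [x * scale for x in alpha_raw]

# The fitted means mu_ij = alpha_i * beta_j are unchanged by the rescaling.
mu = [[ai * bj for bj in beta] for ai in alpha]
```

Extrapolating a trend beyond the data then only requires appending further increments to `a` or `b` before taking the running sums.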

We will now formulate a GLM for the paid amounts that complies with assumption A 3.3. For the paid amounts we will also include a calendar year effect $\gamma$. In the following we will refer to this model as the Paid GLM:

Stochastic component: Observations $X_{ij}$ have a density in the exponential family with a mean $\mu_{ij}$. A distribution will be chosen that is very similar to $X_{ij} = \sum_{k=1}^{N^{paid}_{ij}} Y^{(k)}_{ij}$, which unfortunately is not in the exponential family.

Systematic component: We have a linear predictor $\eta_{ij} = \alpha'_i + \beta'_j + \gamma'_{i+j}$. We will use a ResQ representation as explained in remark A.1 in the appendix. So we get $\eta_{ij} = \sum_{n=1}^{i} a^p_n + \sum_{m=0}^{j} b^p_m + \sum_{l=1}^{i+j} c_l$.

Link function: We define link function $g(\mu_{ij}) = \log(\mu_{ij})$. This means we get

\[ \mu_{ij} = E[X_{ij}] = \exp(\alpha'_i + \beta'_j + \gamma'_{i+j}) = \exp\Big(\sum_{n=1}^{i} a^p_n + \sum_{m=0}^{j} b^p_m + \sum_{l=1}^{i+j} c_l\Big) = \exp\Big(\sum_{n=1}^{i} a^p_n\Big) \cdot \exp\Big(\sum_{m=0}^{j} b^p_m\Big) \cdot \exp\Big(\sum_{l=1}^{i+j} c_l\Big) = \alpha^p_i \beta^p_j \gamma_{i+j}. \]

Again we define an actual trend for a ResQ representation $a^p$, $b^p$ and $c$. Weighted Least Squares will be used to fit $a^p$, $b^p$ and $c$ to the data in the run-off triangle, with identification $\sum_{j=0}^{m-1} \beta^p_j = 1$ and $\gamma_1 = 1$. It is now very important that the parameters produced by the GLM still have a realistic interpretation, because we use these interpretations to formulate estimates for settlement delay pattern probabilities and average claim sizes as seen in the DCLM. It might happen that parameter optimization in the GLM results in development and origin effects being taken into the calendar year parameters. This may happen when a poor GLM is chosen, so when wrong trends are modelled for $a^p$, $b^p$ and $c$. We believe that $\alpha^p$ or $a^p$ should be a measure for exposure, and should thus be fairly or entirely constant over different years of origin. Choosing a sensible trend $c$ for calendar year effects will then really assure us of a realistic interpretation for $\gamma$ and $\beta^p$. This can be seen by considering a small triangle with fitted values where the $\alpha^p_i$ are constant. For explaining interpretations we can just leave $\alpha^p$ out of the triangle since it will not alter any proportions between $\beta^p$ and $\gamma$.

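i\j   0                      1                      2
1     $\beta^p_0\gamma_1$    $\beta^p_1\gamma_2$    $\beta^p_2\gamma_3$
2     $\beta^p_0\gamma_2$    $\beta^p_1\gamma_3$
3     $\beta^p_0\gamma_3$

Figure 5. Fitted paid amounts for constant exposure $\alpha^p$

We can also portray this triangle with an index $k$ for calendar year effects.

k\j   0                      1                      2
1     $\beta^p_0\gamma_1$
2     $\beta^p_0\gamma_2$    $\beta^p_1\gamma_2$
3     $\beta^p_0\gamma_3$    $\beta^p_1\gamma_3$    $\beta^p_2\gamma_3$

Figure 6. Fitted paid amounts for constant exposure $\alpha^p$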

We can see that we have the same interpretation for $\beta^p$ as before, so the expected portions of amounts paid in all development years. We can see this since for every calendar year $k$, we have a total amount $\gamma_k \alpha^p$. This amount is divided in portions $\beta^p_j$ over claims originating from the year of origin in the same year, claims originating from the year of origin a year earlier, and so on. These are the same kind of portions as seen before. Even if $\alpha^p$ is not completely constant, but has a slight upward or downward trend, the skewness caused by the $\alpha^p_i$ in the illustration should not have a large impact on the interpretation that we explained. In a later chapter we will show for an example data triangle that the difference between $\beta^p$ in DCLM and DGLM is minimal. In line with the above illustration, $\gamma$ is then really a calendar year effect and does not include any effects from origin or development trends. So we start with a total base level exposure $\alpha^p$, which is then divided over development years with $\beta^p$ and then corrected for positive or negative calendar year effects with $\gamma$.

3.1.2. Step 2. Future calendar year effects. Estimate future calendar year effects based on past calendar year effects given by the paid amount GLM, and split calendar year effects into inflation and payment shifts.

We estimate calendar year effects $\gamma_{i+j}$ with a trend in the Paid GLM. We only have data on past calendar years in a run-off triangle, so we need to extrapolate future values of $\gamma_{i+j}$ for $i + j > m$. Before we do this, we will split the estimated calendar year trend in two effects. Following assumption A 3.3, we have $X_{ij} = \sum_{k=1}^{N^{paid}_{ij}} Y^{(k)}_{ij}$. Total amounts are thus dependent on individual claim sizes and on numbers of paid claims. The total amount might decrease or increase, which is caused either by a change in claim sizes or a change in paid claim numbers. Depending on which of these components has decreased or increased, the extrapolation procedure for future values is very different. Identifying and splitting these different trends we will call Gamma Factoring. We will then have the following trends:

Economic inflation $\gamma^{infl}$: If there is a decreasing or increasing inflation trend for claim sizes such as higher medical costs, it can be expected that this trend will continue to some degree. We can extrapolate an increasing trend for this parameter. Formally, we can define

\[ \gamma^{infl}_{i+j} := \exp\Big(\sum_{l=1}^{i+j} c^{infl}_l\Big), \quad i + j > m, \quad c^{infl} \text{ some real vector}, \]

with $m$ being the current calendar year. These values will be included in the average claim sizes in step 4.

Payment shifts $\gamma^{shift}$: If the paid numbers have strongly increased, it might mean that more claims than expected have been settled from the total incurred claims. This can happen when insurers decide to catch up on old outstanding claims to be paid, so paid amounts will be higher in a specific calendar year $s$. The Paid GLM will reflect this with a very high value for $\gamma_s$. For future years, there are then fewer claims remaining to be paid, so we would need to extrapolate a decreasing trend. Extrapolation for payment shifts will be included in the settlement delay pattern in step 3.
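The inflation definition above can be illustrated by extending the log-increments $c^{infl}_l$ with a constant value for future calendar years. The increments and the choice of a constant future increment are hypothetical; `extend_inflation` is an illustrative name only.

```python
import math


def extend_inflation(c_infl_past, future_increment, n_future):
    """Return gamma_infl for calendar years 1..m+n_future from log-increments c_l."""
    c_all = list(c_infl_past) + [future_increment] * n_future
    gammas, running = [], 0.0
    for c in c_all:
        running += c
        gammas.append(math.exp(running))
    return gammas


# Hypothetical past increments; c_1 = 0 matches the identification gamma_1 = 1.
c_infl_past = [0.0, 0.02, 0.02, 0.02]
gamma_infl = extend_inflation(c_infl_past, future_increment=0.02, n_future=3)
```

Here the first four entries of `gamma_infl` cover the observed calendar years and the last three are the extrapolated values for $i + j > m$.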

Both effects have to be identified in one paid amount data triangle. Smooth trends in $\gamma$ should be recognized as economic inflation, and strong outlying values in $\gamma$ should be recognized as payment shifts. One could use an additional paid counts triangle to analyze paid numbers in order to separate the calendar year effects. For now we will use a simple Gamma Factoring in chapter 4. First define $\gamma^{infl}$ by taking the entries in $\gamma$ that constitute an identifiable smooth trend, and extrapolate the trend to replace entries that show strong outlying values. Secondly, you can then define $\gamma^{shift} := \gamma$


Remark 3.2. We investigated a split between claim numbers and individual claim sizes in step 2, so we can use $\gamma$ for both numbers and sizes. The remaining parameters estimated in step 1 can be used for one or the other as well. We will summarize which parameters can be used for claim numbers, and which parameters can be used for claim sizes. These parameters will then be used accordingly in steps 3 and 4. For numbers we can use the interpretations of $\alpha$, $\beta$, $\gamma^{shift}$ and $\beta^p$. For individual claim sizes we can use the interpretations of $\alpha$, $\alpha^p$ and $\gamma^{infl}$.

3.1.3. Step 3. Settlement delay pattern. Estimate settlement delay pattern using both incurred and paid development trends, and calendar year shift effects.

Following assumption A 3.1, we want to estimate settlement delay patterns $p_{i+j,0}, \dots, p_{i+j,d}$ for $N_{ij}$. It can be seen that the pattern is dependent on the calendar year $i + j$. We will first estimate a base pattern $\hat\pi_0, \dots, \hat\pi_{m-1}$ in exactly the same way as before by solving (2.3) as in step 2 from DCLM. This is an average pattern for all years of origin since we use column effects $\beta$. However, one might observe differences between payment patterns for different years of origin in the paid amount data. As explained in step 2, we can identify these payment shifts with $\gamma^{shift}_{i+j}$. So conditional on the information $\Delta_m$ in the paid amount triangle, we can correct the base settlement delay pattern $\pi$ for each $N_{ij}$ with a parameter vector $\theta$ to include calendar year effects. Since we are dealing with calendar year effects, the corrections $\hat\theta$ will be the same for $N_{ij}$ in the same calendar year. So for $N_{ij}$ we will first estimate patterns

\[ p'_{i+j,0}, \dots, p'_{i+j,m-1} := \pi_0\theta_{i+j,0}, \dots, \pi_{m-1}\theta_{i+j,m-1}. \]

These sequences may contain negative values or not sum to 1. Using the same procedure for every sequence as in (2.4) at the end of step 2 in DCLM, we finally arrive at a well defined probability vector $p_{i+j,0}, \dots, p_{i+j,d}$.

Formally, we will use a type of moment estimation to derive a correction pattern $\hat\theta$, a method which also underlies the estimate $\hat\pi$. For the theory of moment estimators, we refer to Serfling (2001). If we would have a sample $N^{paid}_{ij,(1)}, \dots, N^{paid}_{ij,(n)}$ for $N^{paid}_{ij}$, we would estimate $\theta$ of dimension $m$ by a solution to a system of equations:

\[ \frac{1}{n}\sum_{k=1}^{n} f_t\big(N^{paid}_{ij,(k)}\big) = E_\theta f_t\big(N^{paid}_{ij}\big), \quad t = 1, \dots, m, \]

for given functions $f_1, \dots, f_m$ which usually correspond to different moments of the random variable. However, we do not have samples for the paid numbers, nor do we have multiple samples of a single paid amount $X_{ij}$. We therefore consider the following system of equations as an alternative, where we choose $f_t(x) = x$:

\[ \begin{aligned} \frac{1}{n}\sum_{k=1}^{n} N^{paid}_{ij,(k)} &= E_\theta\big[N^{paid}_{ij} \mid \aleph_m\big], \quad i, j \text{ s.t. } i + j = 1, \dots, m, \\ &= E_\theta\Big[\sum_{l=0}^{\min(j,m-1)} N^{paid}_{i,j-l,l} \,\Big|\, \aleph_m\Big] = \sum_{l=0}^{\min(j,m-1)} N_{i,j-l}\,\pi_l\,\theta_{i+j-l,l} \end{aligned} \]

with $m$ the most recent calendar year. Not all $\theta_{i+j,l}$ will appear in the equations, because we only have data up to the current calendar year $m$, and some delays $\pi_l\theta_{i+j,l}$ to future years have yet to take place. The $\theta_{i+j,l}$ excluded from the equations will be defined to make the sequence $\hat\pi_0\hat\theta_{i+j,0}, \dots, \hat\pi_{m-1}\hat\theta_{i+j,m-1}$ sum to 1. We do not have actual samples for $N^{paid}$, but we need an average of the data, so we can use the average effects estimated by the GLMs for the left hand term. And we want estimators based on parameters to improve transparency of the model, so for $N_{ij}$ we will use the estimated effects by the GLM as well. We then arrive at the following system of equations:

\[ \begin{aligned} \hat\alpha_i \hat\beta^p_j \gamma^{shift}_{i+j} &= \sum_{l=0}^{\min(j,m-1)} \hat\alpha_i \hat\beta_{j-l}\,\pi_l\,\theta_{i+j-l,l}, \quad i, j \text{ s.t. } i + j = 1, \dots, m, \\ \hat\beta^p_j \gamma^{shift}_{i+j} &= \sum_{l=0}^{\min(j,m-1)} \hat\beta_{j-l}\,\pi_l\,\theta_{i+j-l,l} \end{aligned} \]

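This amounts to similar linear equations as seen in (2.3), so we want to explain the paid effect in terms of delayed incurred effects. We can again write the linear equations as a product of matrices:

\[ \begin{pmatrix} \beta^p_0\gamma^{shift}_1 & \beta^p_1\gamma^{shift}_2 & \cdots & \beta^p_{m-1}\gamma^{shift}_m \\ 0 & \beta^p_0\gamma^{shift}_2 & \cdots & \beta^p_{m-2}\gamma^{shift}_m \\ \vdots & \ddots & \ddots & \vdots \\ 0 & \cdots & 0 & \beta^p_0\gamma^{shift}_m \end{pmatrix} = \begin{pmatrix} \beta_0 & \beta_1 & \cdots & \beta_{m-1} \\ 0 & \beta_0 & \ddots & \beta_{m-2} \\ \vdots & \ddots & \ddots & \vdots \\ 0 & \cdots & 0 & \beta_0 \end{pmatrix} \begin{pmatrix} \pi_0\theta_{1,0} & \pi_1\theta_{1,1} & \cdots & \pi_{m-1}\theta_{1,m-1} \\ 0 & \pi_0\theta_{2,0} & \cdots & \pi_{m-2}\theta_{2,m-2} \\ \vdots & \ddots & \ddots & \vdots \\ 0 & \cdots & 0 & \pi_0\theta_{m,0} \end{pmatrix} \]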

We can then solve for the last matrix if we use our GLM estimates $\hat\beta^p$, $\hat\beta$ and $\gamma^{shift}$, and values for $\theta$ can then be calculated by dividing the last matrix by values of $\pi$. As mentioned earlier, the sequences $\hat\pi_0\hat\theta_{i+j,0}, \dots, \hat\pi_{m-1}\hat\theta_{i+j,m-1}$ are incomplete in the last matrix. We can complete the remaining values for $\theta$ such that the total sequences sum to 1. This is a very sensible choice. Consider for example a shift in payments to the front of the development of this $N_{ij}$, so with high values for $\theta$ in the estimation. This means that there are fewer remaining payments, so the remainder of the sequence should be downscaled.

By definition of $\hat\pi$, which also solves a linear equation, it can easily be shown that the solution for the values of $\theta$ in the last matrix is actually equal to the values of $\gamma^{shift}$ itself. Combining this with the remaining values of $\theta$, we get the following values for $\hat\theta$ for $l = 0, \dots, m - 1$:

\[ \hat\theta_{i+j,l} := \begin{cases} \gamma^{shift}_{i+j+l} & \text{if a moment estimator is available,} \\[1ex] \dfrac{1 - \sum_{k<l} \hat\pi_k \hat\theta_{i+j,k}}{\sum_{k \ge l} \hat\pi_k} & \text{otherwise.} \end{cases} \tag{3.1} \]

Note that though the definition of the second term is dependent on $l$, consecutive values of the second term are always equal to each other. These consecutive corrections constitute a constant downscaling or upscaling to counteract the effect of the calendar shift and are defined to make the sequence sum to 1. Also we can see that the definition of $\theta$ depends on $i + j$, so $N_{ij}$ in the same calendar year $i + j = k$ will have the same settlement delay pattern.
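The completion rule in (3.1) can be checked numerically. The sketch below takes a hypothetical base pattern and a single known shift effect, fills in the remaining corrections with the second case of (3.1), and confirms that the completed sequence sums to 1. The names and values are illustrative, and the base pattern is assumed to be positive and to sum to 1.

```python
def correction_pattern(pi_hat, gamma_shift_known):
    """Build theta for one calendar year following (3.1).

    pi_hat            : base delay pattern (assumed positive, summing to 1)
    gamma_shift_known : shift effects for the delays where a moment
                        estimator is available (first case of (3.1))
    """
    theta = list(gamma_shift_known)
    for l in range(len(theta), len(pi_hat)):
        # Second case of (3.1): spread the remaining mass over delays >= l.
        used = sum(pi_hat[k] * theta[k] for k in range(l))
        theta.append((1.0 - used) / sum(pi_hat[l:]))
    return theta


pi_hat = [0.5, 0.3, 0.2]                   # hypothetical base pattern
theta = correction_pattern(pi_hat, [1.2])  # payments pulled forward at delay 0
p_prime = [p * t for p, t in zip(pi_hat, theta)]
```

In this example the two completed corrections coincide, which illustrates the remark that consecutive values of the second term are equal.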

For the IBNR estimates, which are extrapolated from $\hat N_{ij}$, there is no data yet for payment shifts, nor are these future incurred claims impacted by payment shifts from the past. This means that all values of $\theta$ will be equal to 1, and the pattern will just coincide with a final pattern $p_0, \dots, p_d$ independent of calendar years like in DCLM.

3.1.4. Step 4. Individual claim sizes. Estimate average individual claim sizes using both incurred and paid origin trends, and calendar year inflation effects.

Following assumption A 3.2, we will determine the mean of the distribution of individual claim sizes, including the parameters $w_i$ which measure effects specific to accident years and $\gamma^{infl}_{i+j}$ which measures calendar year inflation. We can set $w_1 = 1$ for identifiability, so we can estimate $\mu$ by $\hat\mu = \hat\alpha^p_1 / \hat\alpha_1$. As discussed after A 3.3, the interpretation of the numerator here is the base expected total paid amount for origin year 1 without calendar year effects, and the interpretation of the denominator is the expected total number of incurred claims for origin year 1, so $\hat\mu$ represents the base expected average paid amount per incurred claim without inflation accordingly. We can then estimate $w_i$ by:

\[ \hat w_i = \frac{\hat\alpha^p_i}{\hat\alpha_i \hat\mu}, \quad \text{and} \quad \hat\mu_{ij} = \hat\mu\,\hat w_i\,\gamma^{infl}_{i+j}, \quad i = 1, \dots, m, \ j = 1, \dots, 2m - 2. \tag{3.2} \]

Like in step 3, this estimation method can be written as a type of moment estimation where we use average effects from the GLMs instead of actual samples, but we omit this explanation here.
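The estimators in (3.2) amount to a few divisions. The sketch below uses hypothetical origin effects and inflation values purely for illustration; `claim_size_factors` and `mean_claim_size` are illustrative names.

```python
def claim_size_factors(alpha, alpha_p):
    """Return (mu_hat, w_hat) as in (3.2), with w_hat equal to 1 for origin year 1."""
    mu_hat = alpha_p[0] / alpha[0]
    w_hat = [ap / (a * mu_hat) for a, ap in zip(alpha, alpha_p)]
    return mu_hat, w_hat


alpha = [7000.0, 9000.0, 11000.0]            # hypothetical incurred origin effects
alpha_p = [1400000.0, 1890000.0, 2200000.0]  # hypothetical paid origin effects
mu_hat, w_hat = claim_size_factors(alpha, alpha_p)

gamma_infl = {4: 1.06, 5: 1.08}  # hypothetical inflation by calendar year i + j


def mean_claim_size(i, j):
    """mu_hat_ij = mu_hat * w_hat_i * gamma_infl_{i+j}, with origin years numbered from 1."""
    return mu_hat * w_hat[i - 1] * gamma_infl[i + j]
```

With these inputs the base claim size is 200 per incurred claim, and the second origin year carries a factor of 1.05 on top of it.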


3.1.5. Step 5. Extrapolating RBNS claims. Extrapolate RBNS claim amounts by applying a settlement delay pattern and average claim size to the incurred claim counts data.

Following the same method as in DCLM while using the new assumptions and steps from DGLM, we get the following formula for our RBNS estimates:

\[ \hat X^{rbns}_{ij} = \hat\mu\,\hat w_i\,\gamma^{infl}_{i+j} \sum_{l=i-m+j}^{j} N_{i,j-l}\,\hat p_{i+j-l,l}, \tag{3.3} \]

where the summing indices are chosen to sum the appropriate claim counts.

3.1.6. Step 6. Extrapolating IBNR claims. Extrapolate IBNR claim amounts by applying a settlement delay pattern and average claim size to the estimated future incurred claim counts.

Following the same method as in DCLM while using the new assumptions and steps from DGLM, we get the following formula for our IBNR estimates:

\[ \hat X^{ibnr}_{ij} = \hat\mu\,\hat w_i\,\gamma^{infl}_{i+j} \sum_{l=0}^{i-m+j-1} \hat N_{i,j-l}\,\hat p_l. \tag{3.4} \]

Finally, summing the RBNS and IBNR components gives us the total future payment estimates $\hat X^{DGLM}_{ij} = \hat X^{rbns}_{ij} + \hat X^{ibnr}_{ij}$. The DGLM estimate formulas are very similar to the formulas in the DCLM, now including $\gamma^{infl}$ directly and $\gamma^{shift}$ indirectly in $\hat p$. The estimates will probably be quite different however, which we will analyze in chapter 4. First we will discuss the DGLM in the next section and investigate which issues from DCLM have been resolved and which issues still remain.
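Formulas (3.3) and (3.4) can be sketched on a toy triangle. The sketch assumes origin years $i = 1, \dots, m$, development years $j = 0, \dots, m-1$, observed cells with $i + j \le m$, and a delay pattern that is zero beyond its last entry. All counts, the pattern and the unit claim size are hypothetical, and the function names are illustrative.

```python
def rbns_cell(i, j, m, counts, p, size):
    """Formula (3.3): payments in future cell (i, j) from claims already reported."""
    total = 0.0
    for l in range(max(0, i - m + j), j + 1):
        # p(k, l): delay-l probability for claims reported in calendar year k.
        total += counts[(i, j - l)] * p(i + j - l, l)
    return size(i, j) * total


def ibnr_cell(i, j, m, future_counts, p_final, size):
    """Formula (3.4): payments in future cell (i, j) from claims not yet reported."""
    total = 0.0
    for l in range(0, i - m + j):
        if l < len(p_final) and (i, j - l) in future_counts:
            total += future_counts[(i, j - l)] * p_final[l]
    return size(i, j) * total


m = 3
counts = {(1, 0): 10, (1, 1): 20, (1, 2): 30, (2, 0): 40, (2, 1): 50, (3, 0): 60}
future_counts = {(2, 2): 5, (3, 1): 6, (3, 2): 4}  # hypothetical extrapolated counts
p_final = [0.5, 0.3, 0.2]


def p(k, l):
    # No payment shifts in this toy case: same pattern for every calendar year k.
    return p_final[l] if l < len(p_final) else 0.0


def size(i, j):
    # Unit claim size, so the totals below are expected numbers of payments.
    return 1.0


future_cells = [(i, j) for i in range(1, m + 1) for j in range(2 * m - 1) if i + j > m]
rbns_total = sum(rbns_cell(i, j, m, counts, p, size) for i, j in future_cells)
ibnr_total = sum(ibnr_cell(i, j, m, future_counts, p_final, size) for i, j in future_cells)
```

With unit claim sizes, `ibnr_total` equals the sum of the future counts, since every future claim is paid within the pattern length, while `rbns_total` collects only the part of each observed count whose payment falls after the latest diagonal.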


3.2. Discussing DGLM. As summarized in section 2.3, there were a number of remaining issues in the DCLM. We will discuss here how the DGLM addresses these issues and comment on possibilities to improve the DGLM with further research.

3.2.1. Overparametrization for estimated trends in CLM. In the DCLM, every parameter in CLM is estimated separately as a real number. This leads to overparametrization of the model and unstable results for some estimates. DGLM addresses this issue by introducing GLMs instead of CLM, which allow for a trend estimation for the parameters. This will provide more stable estimates since more data is used per estimate. More stable estimates also mean that the interpretation of the parameters is more realistic. There is however no prescribed method on how to model a trend in the GLM, so this does require additional effort from the user.

3.2.2. Interpretation calendar year effects. In the DCLM, there is no parameter to reflect calendar year effects. The DGLM remedies this by introducing a parameter $\gamma$ in the paid GLM, and further separates the effect with Gamma Factoring into economic inflation $\gamma^{infl}$ and payment shifts $\gamma^{shift}$. A realistic interpretation for the remaining parameters is still maintained after introducing $\gamma$ and we believe that better estimates for claim reserves are possible by including calendar year effects. Gamma Factoring might be difficult depending on the dataset however, and no separation method is imposed here. Future research might establish a more exact approach incorporating additional data such as a paid counts triangle. The final reserves will also be impacted by how $\gamma^{infl}$ is extrapolated to future years, so it will be important to decide on a sensible inflation trend. We don't believe this will require too much effort. We will see how $\gamma$ influences the reserve estimates in chapter 4 for our simple Gamma Factoring.

3.2.3. Statistical justification of CLM. Using CLM is only statistically justified for Poisson distributed data by means of maximum likelihood estimation. Using a GLM in the new framework does allow for other types of data, but a GLM does not specify an estimation procedure. One does therefore need to consider which estimation method is most appropriate for the data, but it is useful to have different methods to choose from.

3.2.4. Settlement delay pattern per year of origin. In the DCLM, the settlement delay pattern $\pi$ or $p$ does not reflect past delay data per year of origin for RBNS estimates. The base pattern is an average column effect, so if the data for a specific year of origin indicates a different pattern, the reserve estimate will be off. This issue is addressed in DGLM by including a calendar year effect $\gamma^{shift}$ in the correction $\theta$ for the base settlement delay pattern. But this is an effect derived from the paid amounts. It would be better to derive a similar correction from a paid counts triangle in which ratios of paid counts and incurred counts are used to predict payment shifts. This does however require an extra triangle of data and further research to relax the assumption that every incurred claim is settled with a single payment. This would be a very useful extension to the model in combination with the Munich Chain Ladder method.


3.2.5. Settlement delay pattern interpretation. In the DCLM, the settlement delay pattern $\pi$ or $p$ does not have a realistic interpretation on an individual parameter level. This problem still persists in the DGLM. The settlement delay pattern is estimated uniformly for the data, so it does not reflect the real settlement delay pattern for a single incurred datapoint. Adding the correction $\theta$ in DGLM does add extra information for the single datapoints, but it does not remedy the issue completely. As discussed in section 2.2.1, it is not a very big issue, but you should still be cautious when manually adjusting individual parameters of the settlement delay pattern. Unless an insurer stores more detailed information, it is probably impossible to solve this issue completely, so we should be content with our improvement.

3.2.6. Mismatch paid amount distributions. There is still a mismatch between different assumptions for $X_{ij}$ in the DGLM. As seen in step 1, we have to assume a distribution for $X_{ij}$ in the GLM in order to estimate parameters, which is probably not the same as the compound distribution. Depending on which model we choose, the parameter estimations will be different, so we will have to choose a distribution with similar properties to the compound distribution. The choice for the distribution in the paid GLM can have a large impact on the reserve estimates. We will give an example in chapter 5.

A summary of our conclusions will be given in chapter 6. We will now compare DGLM with DCLM in the next section by evaluating all estimates in the models.


4. Comparing DGLM and DCLM

For both DCLM and a DGLM with a calendar year effect, we will consider an example and give all estimated values of α, β, αp, βp, γ, p, w, µ and the reserve estimates X_ij^rbns and X_ij^ibnr to see how the methods differ. The complete incurred and paid datasets for this example originate from the general insurer RSA, are based on a portfolio of motor third party liability policies, and are as follows:

i\j      0     1    2    3    4    5    6    7    8    9
1     6238   831   49    7    1    1    2    1    2    3
2     7773  1381   23    4    1    3    1    1    3
3    10306  1093   17    5    2    0    2    2
4     9639   995   17    6    1    5    4
5     9511  1386   39    4    6    5
6    10023  1342   31   16    9
7     9834  1424   59   24
8    10899  1503   84
9    11954  1704
10   10989

Figure 7. Incurred claim counts

i\j       0       1       2       3       4       5       6       7       8       9
1    451288  339519  333371  144988   93243   45511   25217   20406   31482    1729
2    448627  512882  168467  130674   56044   33397   56071   26522   14346
3    693574  497737  202272  120753  125046   37154   27608   17864
4    652043  546406  244474  200896  106802  106753   63688
5    566082  503970  217838  145181  165519   91313
6    606606  562543  227374  153551  132743
7    536976  472525  154205  150564
8    554833  590880  300964
9    537238  701111
10   684944

Figure 8. Paid claim amounts

For comparison, we will choose an incurred GLM with a Poisson distribution and a paid GLM with an overdispersed Poisson distribution. This coincides with the DCLM structure. For DGLM we will choose a particular way to model the parameters in the GLM, whilst including a calendar year effect. The model was calibrated in the Towers Watson software ResQ by first using our own expert judgement as a starting point and then applying an optimizing solution within ResQ. ResQ uses the representation for the linear predictor as described in remark A.1 in the appendix. For the incurred and paid GLM, we finally arrived at the following choices for modelling a, b and c in the linear predictor of the GLM:


i/j   a         b      ap        bp        c
1     Free      Free   Free      Free      Free
2     Free      Free   Constant  Constant  Constant
3     Trend     Free   Free      Free      Constant
4     Constant  Free   Trend     Free      Free
5     Constant  Trend  Constant  Trend     Trend
6     Constant  Free   Constant  Trend     Trend
7     Constant  Trend  Constant  Trend     Trend
8     Free      Trend  Constant  Trend     Trend
9     Constant  Trend  Constant  Trend     Free
10    Constant  Trend  Constant  Trend     Trend

Figure 9. Incurred and paid GLM structure

‘Free’ means the parameter can be chosen freely, ‘Trend’ means that the parameter will have the same value as the previous parameter, and ‘Constant’ means that the parameter will have value zero. Values of b1, bp1 and c1 will be automatically set to zero by ResQ. Looking for example at a, there will only be three different parameter values, so this is a significant reduction from estimating all parameters separately. After all values for a, b and c are made available, the values for α, β and γ can be evaluated by summation and taking the exponent as seen in appendix A.2.
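As a rough illustration of how such a structure reduces the number of parameters, the following Python sketch expands column a of Figure 9 into a full parameter vector and then applies "summation and taking the exponent". The helper names expand_structure and to_factors are hypothetical, and the reading that α_i is the exponent of the running sum a_1 + ... + a_i is an assumption on our part; appendix A.2 remains the authoritative definition. The three free values are not ResQ output but are backed out from the DGLM column for α in Figure 10.

```python
import math

def expand_structure(structure, free_values):
    """Turn a 'Free'/'Trend'/'Constant' column into a full parameter vector."""
    values, free_iter = [], iter(free_values)
    for kind in structure:
        if kind == "Free":
            values.append(next(free_iter))  # a new, freely estimated value
        elif kind == "Trend":
            values.append(values[-1])       # same value as the previous parameter
        elif kind == "Constant":
            values.append(0.0)              # parameter fixed at zero
        else:
            raise ValueError(f"unknown structure type: {kind}")
    return values

def to_factors(params):
    """ASSUMPTION: factor_i = exp(params_1 + ... + params_i)."""
    total, factors = 0.0, []
    for p in params:
        total += p
        factors.append(math.exp(total))
    return factors

# Column a of Figure 9.
a_structure = ["Free", "Free", "Trend", "Constant", "Constant",
               "Constant", "Constant", "Free", "Constant", "Constant"]

# Hypothetical free values, backed out from the DGLM alpha column of Figure 10.
a_free = [math.log(7233), math.log(8997 / 7233), math.log(12956 / 11192)]

a = expand_structure(a_structure, a_free)
alpha = to_factors(a)
for i, value in enumerate(alpha, start=1):
    print(i, round(value))
```

Under this reading, only three numbers determine all ten values of α, and the resulting values agree with the DGLM column of Figure 10 up to rounding.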

Using GLMs for DCLM would mean assigning ‘Free’ to every parameter in a, ap, b and bp, and choosing all values for c equal to ‘Constant’. By equivalence, this would result in CLM estimates as explained in the appendix.

4.1. DCLM and DGLM estimates. We will now show all parameter estimates for the six steps of the DGLM framework and give some remarks about the different outcomes.

4.1.1. Step 1. Origin, development and calendar trends. Estimate factors for origin and development trend using a GLM separately for incurred claim counts and paid claim amounts, and extrapolate future incurred claim counts. Include a calendar year trend in the GLM for paid claim amounts as well. We can see in Figure 11 that only αp is significantly different in DGLM. This is because the decreasing calendar year effects γ have yet to be applied to the DGLM values, whereas in DCLM they are absorbed in αp. We can also see that the DGLM estimates in Figures 10 and 11 follow the structure imposed by the choices in Figure 9.


       α               β
i/j    DCLM    DGLM    DCLM     DGLM
1      7135    7233    0.87520  0.87491
2      9194    8997    0.11841  0.11885
3     11435   11192    0.00377  0.00371
4     10676   11192    0.00091  0.00099
5     10963   11192    0.00033  0.00027
6     11437   11192    0.00028  0.00026
7     11361   11192    0.00023  0.00026
8     12519   12956    0.00014  0.00025
9     13746   12956    0.00031  0.00025
10    12556   12956    0.00042  0.00025

Figure 10. Incurred origin and development trends
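The DCLM columns of Figure 10 can be checked against the incurred triangle of Figure 7 with a classical chain ladder calculation. The Python sketch below is a minimal version under the usual assumptions (volume-weighted development factors, no tail factor); the function name chain_ladder is ours and does not refer to ResQ or to any particular library.

```python
# Incurred claim counts from Figure 7 (rows: origin years 1..10, columns: delays 0..9).
incurred = [
    [6238, 831, 49, 7, 1, 1, 2, 1, 2, 3],
    [7773, 1381, 23, 4, 1, 3, 1, 1, 3],
    [10306, 1093, 17, 5, 2, 0, 2, 2],
    [9639, 995, 17, 6, 1, 5, 4],
    [9511, 1386, 39, 4, 6, 5],
    [10023, 1342, 31, 16, 9],
    [9834, 1424, 59, 24],
    [10899, 1503, 84],
    [11954, 1704],
    [10989],
]

def chain_ladder(triangle):
    """Return (alpha, beta): ultimate per origin year and development proportions."""
    m = len(triangle)
    # Cumulate each row of the incremental triangle.
    cum = []
    for row in triangle:
        total, acc = 0, []
        for value in row:
            total += value
            acc.append(total)
        cum.append(acc)
    # Volume-weighted development factor from delay j to delay j + 1.
    factors = []
    for j in range(m - 1):
        rows = [r for r in cum if len(r) > j + 1]
        factors.append(sum(r[j + 1] for r in rows) / sum(r[j] for r in rows))
    # Proportion of the ultimate that is developed up to and including delay j.
    developed = [1.0] * m
    for j in range(m - 2, -1, -1):
        developed[j] = developed[j + 1] / factors[j]
    beta = [developed[0]] + [developed[j] - developed[j - 1] for j in range(1, m)]
    # Ultimate = latest cumulative value divided by the proportion developed so far.
    alpha = [r[-1] / developed[len(r) - 1] for r in cum]
    return alpha, beta

alpha, beta = chain_ladder(incurred)
for i, (a_i, b_j) in enumerate(zip(alpha, beta), start=1):
    print(i, round(a_i), round(b_j, 5))
```

For the first origin year this gives α = 7135, the row total of Figure 7, and the first development proportion is approximately 0.8752, in line with the DCLM columns above. Applied to the paid triangle of Figure 8, the same function should give the DCLM values of αp and βp in Figure 11.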

        αp                  βp               γ
i/j/k   DCLM      DGLM      DCLM    DGLM     DGLM
1       1486754   1557363   0.3194  0.3001   1.000
2       1448715   1557363   0.2991  0.3001   1.000
3       1751387   1914913   0.1340  0.1338   1.000
4       1981700   2354551   0.0881  0.0911   0.930
5       1791061   2354551   0.0659  0.0620   0.864
6       1856619   2354551   0.0371  0.0422   0.804
7       1563619   2354551   0.0259  0.0287   0.747
8       1922669   2354551   0.0138  0.0196   0.695
9       2002268   2354551   0.0156  0.0133   0.799
10      2144804   2354551   0.0012  0.0091   0.919

Figure 11. Paid origin, development and calendar year trends

4.1.2. Step 2. Future calendar year effects. Estimate future calendar year effects based on past calendar year effects given by the paid amount GLM, and split calendar year effects into inflation and payment shifts.
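As a numerical sketch of this split, the Python fragment below takes the γ column of Figure 11 and applies the choice made in this example: the inflation part consists of the first 8 values of γ and is held flat at the year-8 value afterwards. The function name split_gamma is hypothetical, and defining the payment shift as the ratio γ / γ_infl is an assumption here; the definition in chapter 3 is authoritative.

```python
# Calendar year effects gamma from the DGLM column of Figure 11.
gamma = [1.000, 1.000, 1.000, 0.930, 0.864, 0.804, 0.747, 0.695, 0.799, 0.919]

def split_gamma(gamma, n_inflation):
    """Split gamma into an inflation part and a payment shift part."""
    # Inflation: the first n_inflation values, then held flat at the last of them.
    last = gamma[n_inflation - 1]
    infl = gamma[:n_inflation] + [last] * (len(gamma) - n_inflation)
    # ASSUMPTION: the payment shift is the ratio gamma / gamma_infl.
    shift = [g / f for g, f in zip(gamma, infl)]
    return infl, shift

gamma_infl, gamma_shift = split_gamma(gamma, n_inflation=8)
for k, (f, s) in enumerate(zip(gamma_infl, gamma_shift), start=1):
    print(k, f, round(s, 3))
```

With these inputs the shift equals 1 for the first eight calendar years and is roughly 1.15 and 1.32 for years 9 and 10, so the recent increase in γ is attributed entirely to the payment shift.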

We can see from the estimated values of γ that there is a decreasing trend followed by a strong upward trend in the last two calendar years. We can assume that the decreasing trend is inflation-based, and that the sudden increase is due to a payment shift. Therefore, we will define γ_infl as the first 8 values of γ and keep the following years at the same value as year 8, so as not to underestimate future payments. So we will not extrapolate the decreasing trend any further. Calculating γ_shift then as defined in chapter 3 by γ_shift = γ / γ_infl, we get the Gamma Factoring as seen in figure 12. Deciding on a Gamma Factoring method is of course dependent on the model structure and on additional information such as a paid counts triangle or company intelligence. We chose our assumptions for the Gamma Factoring in order to provide a simple example. It might be possible that there are payment