
Loss reserving for a ship crew insurance

Pieter Tjerkstra


Master thesis Econometrics, Operations Research and Actuarial Science (EORAS)

Supervisor: prof. dr. R.H. Koning

Co-assessor: prof. dr. L. Spierdijk

Abstract

In this thesis three different loss reserving models are estimated and compared: the Chain ladder method, the Bornhuetter-Ferguson method and a GLM reserving method. We estimate the three methods using small samples containing missing values and illustrate how techniques such as the bootstrap and the EM algorithm can be used in such situations. Furthermore we make use of a smearing estimate in order to deal with fat-tailed data. The reserving methods are compared by back testing: out-of-sample predictions are compared to observed values. A conclusion that can be drawn is that loss reserving techniques that include exposure data tend to produce more stable results. Due to limited data availability, strong conclusions cannot be drawn.


Contents

1 Introduction
  1.1 Thesis outline
  1.2 Solvency II

2 Literature review
  2.1 Theory of Generalized Linear Models
    2.1.1 Definition GLM
    2.1.2 Diagnostic testing
    2.1.3 Rate making using GLM
  2.2 Theory of reserving methods
    2.2.1 Chain ladder method
    2.2.2 Chain ladder as GLM
    2.2.3 Bornhuetter-Ferguson method
    2.2.4 Loss reserving using GLM
  2.3 Theory of missing data and GLM
  2.4 Theory of bootstrapping
    2.4.1 Bootstrap in general
    2.4.2 When does the bootstrap work?
    2.4.3 Bootstrap and GLM
    2.4.4 Bootstrap and traditional loss reserving
    2.4.5 Bootstrap and missing values
  2.5 Theory of log transformation

3 Data description
  3.1 Coverage data
  3.2 Claim data

4 Estimation results
  4.1 Risk premiums
    4.1.1 Frequency model
    4.1.2 Severity model
    4.1.3 Parsimonious model
  4.2 Chain ladder
  4.3 Bornhuetter-Ferguson
  4.4 G-R Method
  4.5 Model selection

5 Conclusion

6 References

A IRWLS algorithm

B Data description tables
  B.1 Coverage data

C Model fits
  C.1 Risk premium full model
  C.2 Risk premium parsimonious model
  C.3 Chain Ladder
  C.4 Bornhuetter-Ferguson
  C.5 G-R Reserving


1 Introduction

In 1602 the Dutch East India Company was founded. This is a special historical event in many ways. From 1602 onwards the Netherlands was the world leader in merchant sea shipping. It was the time of pirates and scurvy. Merchant shipping has evolved since then and people have become more aware of the risks that come along with it. Nowadays the risk of piracy in the Gulf of Aden is still very relevant. Health care on board of ships has improved, but it remains more difficult than on shore, and one also has to take care of repatriation. A health care insurance for ship crew is therefore a must.

Due to new legislation, insurers have to provide best estimates for their claim reserves. All policies have a yearly coverage, but not all claims and payments occur in the same year. Claims that will be reported or settled during later years also have to be paid. At the end of the year insurers therefore have to predict the amount that still has to be paid for insured events that happened in the past. This amount must be put as a reserve on the balance sheet, and its level has to be determined.

In the light of the economic circumstances of the last few years, regulators have become stricter in order to prevent insurers from going bankrupt. If the reserves of an insurer are not set adequately and large claims arise, the insurance company risks bankruptcy. The consequences for individual consumers can be severe, since their losses can no longer be covered. At the European level new legislation concerning capital requirements for insurers will be introduced. The plan is to introduce this legislation, called Solvency II, on 1 January 2014. More on Solvency II can be found in section 1.2.

Due to Solvency II insurers have to monitor their risks and they have to provide best estimate claim reserves. The best estimates in this context are the expected sum of future claim payments for insurance coverage in the past, taking into account the time value of money. Insurers thus face a very practical question: how to model these future claim payments. We will examine this question for a health care insurance for ship crew. The research question we formulate in this context is the following.

What is the most appropriate loss reserving model for Solvency II best estimate claim reserves for a ship crew insurance?

Subquestions that arise are

1. What is an appropriate model?

2. Which models are allowed according to the legislation? Does legislation impose restrictions?

3. Can we improve the forecast ability of the models using various risk factors?


We have to define what we mean by the 'most appropriate' loss reserving model. Since our goal is to predict the amount that has to be paid, the prediction performance of the model is one of the model selection criteria. The prediction performance can be assessed by a rolling estimation on historical data and comparing the forecasts with the observed values. The mean percentage error, the mean squared error and the usual sample statistics of the forecast error can be used to evaluate the forecasts.

Furthermore we would like to assess the variance of the predictor. Since the reserve amount is only a point estimate, we would like to have an idea of the uncertainty around this point estimate. Due to the limited amount of data at hand we have to be careful with methods that rely on asymptotic results. Therefore we will make use of the bootstrap method. Drawing samples with replacement from the residuals of an estimated model and re-estimating the reserve amount based on these samples gives an idea of the distribution of the predictor.

The best estimate is defined as the probability-weighted average of future cash flows taking the time value of money into account, using the relevant risk-free interest rate term structure. In other words, it is the mean of the present value of the claim payments. We will elaborate more on the exact definition in the law in section 1.2.


1.1 Thesis outline

After briefly describing the relevant Solvency II legislation in section 1.2, we will discuss various modeling techniques. The goal is to get from data to a best estimate of the claim reserves. In Figure 1 the main line of the techniques used in this thesis is given.

[Figure 1: flowchart from data to the Solvency II best estimate claim reserves via three reserving methods: Chain Ladder (GLM), Bornhuetter-Ferguson (GLM) and GLM Reserving, the latter built from a frequency model (GLM, missing values), a severity model (GLM, missing values, log transform) and a development model (GLM, missing values).]

Figure 1: General scheme of the thesis. The gray area lists the modeling techniques used in the corresponding reserving method. The bootstrap technique is applied to every reserving method.

Figure 1 gives the scheme from data to best estimate. We will test three reserving methods using data of a ship crew insurance. The reserving methods are discussed in section 2. All three reserving methods make use of Generalized Linear Models (GLM). In order to assess the margin of error in the estimation of the models, we apply the bootstrap technique. Therefore we will discuss the GLM and bootstrap techniques in section 2.

The input for the scheme in Figure 1 is data from a ship crew insurance. From the data description in section 3.1 it will become clear that the data contain missing values and heavy-tailed claim sizes. We will discuss a technique that deals with the missing values in the ship crew data. Furthermore we will discuss a log transformation in order to cope with the heavy-tailed claims.


1.2 Solvency II

In 2009 the European Union introduced new legislation concerning insurers (European Parliament [2009]). The plan is to finalize the new legislation on 1 January 2014. The goal of the legislation is to unify the EU insurance market and to enhance consumer protection. The legislation consists of three pillars. The first pillar describes quantitative requirements. The second sets out governance and risk management requirements. The last pillar focuses on disclosure and transparency requirements. So next to quantitative requirements, Solvency II also sets guidelines for various processes within insurance companies. In this thesis we will look at the quantitative requirements.

The first pillar prescribes that insurers should hold more capital than a minimum capital requirement (MCR). If the reserves fall below the MCR, the supervisor will intervene. For an insurer to be solvent its capital should exceed the solvency capital requirement (SCR). The SCR consists of a technical provision for the claim payments plus a capital requirement based on various characteristics of the insurance; this capital requirement reflects both the characteristics of the assets and of the insurance liabilities. The SCR is the capital required to ensure that the insurance obligations over the next 12 months are met with a probability of at least 99.5%. The SCR can be determined using an internal model or the standard model given by the supervisor. The technical provisions are part of the SCR. They represent the current amount the insurer would have to pay if all insurance obligations were immediately transferred to a third party. The technical provisions consist of the sum of a best estimate of the claim provisions and a risk margin reflecting the cost of capital. Such a third party would itself face solvency capital requirements; the risk margin reflects the cost of capital for the SCR of this hypothetical third party taking over the insurance obligations.

We will focus on the best estimate of the claim provisions. Article 77 of the Solvency II directive (European Parliament [2009]) states the following: 'The best estimate shall correspond to the probability-weighted average of future cash flows, taking account of the time value of money (expected present value of future cash flows), using the relevant risk-free interest rate term structure.' CEIOPS [2009a] and CEIOPS [2009b] provide more detailed requirements concerning the calculation of the claim provisions.

The legislation does not prescribe any reserving model; only suggestions and examples are given. Frequency/severity models, the Chain Ladder and the Bornhuetter-Ferguson method are all methods suggested by CEIOPS [2009b]. The choice of the reserving model has to be made using expert judgment. So in that sense we are free to choose a model, as long as it is based on reasonable assumptions which are in line with adequately collected data.

The legislation does however require that the method projects future cash flows. Moreover, CEIOPS [2009b] state that obligations in different currencies should be calculated separately. We will keep the requirements of the Solvency II legislation in mind when modeling the claims. Solvency II requires that the best estimates of the claim reserves take into account the time value of money. This means that projected future cash flows need to be discounted using a risk-free yield curve. For the comparison of the different reserving methods the time value of money is not very relevant, so we leave out discounting in this thesis.


2 Literature review

In this section we will describe the literature concerning the main building blocks of the reserving models we will use. We start by describing an often used modeling technique in actuarial science, namely Generalized Linear Models in section 2.1. We will describe how such models are used when pricing an insurance in section 2.1.3. Next we will elaborate on the reserving techniques in section 2.2. Furthermore in section 2.3 we will describe a way of dealing with missing values in a GLM context. In section 2.4 we will describe the bootstrap technique and how to apply it in our case. Finally we discuss a modeling technique that deals with fat-tailed claims, since this technique will be applied to such claims in the claim severity model later on.

2.1 Theory of Generalized Linear Models

Generalized Linear Models (GLM) were introduced by Nelder and Wedderburn [1972]. The model extends the traditional linear model by allowing other distributions for the dependent variable. This is extremely useful in actuarial science, since we would like to model (non-negative) claims and the number of claims. Faraway [2006] gives a very practical discussion of the topic.

2.1.1 Definition GLM

A GLM is built on two main building blocks: an exponential family distribution for the dependent variable and a link function describing the relation of the independent variables with the mean of the dependent variable (Y). We write a sample (size n) of the dependent variable as y_1, . . . , y_n.

The first building block is the density of an exponential family distribution, which is given in (1),

f_Y(y_i | θ_i, φ) = exp{ [y_i θ_i − b(θ_i)] / a_i(φ) + c_i(y_i, φ) }.   (1)

θ_i is called the canonical parameter and φ the dispersion parameter. The functions a_i(φ), b(θ_i) and c_i(y_i, φ) specify the distribution. Usually a_i(φ) is of the form a_i(φ) = φ/m_i, where m_i is the weight for observation i = 1, . . . , n. For the Gamma and Poisson distributions the choice of the specifying functions is given in Table 1.

Distribution   a_i(φ)   b(θ_i)       c_i(y_i, φ)                                        Domain Y
Poisson        1        e^{θ_i}      −ln(y_i!)                                          {0, 1, . . .}
Gamma          φ/m_i    −ln(−θ_i)    (m_i/φ) ln(m_i y_i/φ) − ln(y_i) − ln Γ(m_i/φ)      (0, ∞)

Table 1: Specifying GLM distributions

Based on the distributions in Table 1, Y has mean and variance equal to

E(Y_i) = b′(θ_i) = µ_i,
Var(Y_i) = b′′(θ_i) a_i(φ).

Since we have specified the first building block, we can introduce the second building block: the relation of the independent variables with the mean of the dependent variable. Let x_1, . . . , x_p denote p covariate vectors of size n and let β_1, . . . , β_p be their coefficients, describing the linear predictor η := β_0 + Σ_{j=1}^p β_j x_j, where β_0 is the intercept. The relation of the linear predictor and the dependent variable is modeled via a link function g, such that

η = g(µ),   µ = g^{-1}(η).

We require g to be a monotone, continuous and differentiable function. In order to apply the model, µ has to be in the parameter space of the distribution used; for example, for the Poisson distribution we require g^{-1}(η) to be positive. The function g is called the canonical link if η = g(µ) = θ. For the Poisson distribution this holds if g(µ) = ln(µ) and for the Gamma distribution if g(µ) = 1/µ.

Using the expressions for the mean and the variance, one can express the variance in terms of the mean. This relation is called the variance function and is denoted by V(µ). For the Poisson distribution with canonical link we get V(µ) = µ and for the Gamma distribution with canonical link we have V(µ) = µ². The β parameters are usually estimated by maximum likelihood. The likelihood equations are solved by a Newton-Raphson procedure called iterative reweighted least squares (IRWLS). The IRWLS algorithm is described in appendix A. As we will see in section 2.3, the model is sometimes estimated using a weighted likelihood approach. This approach is also described in appendix A.

2.1.2 Diagnostic testing

Breslow [1996] lists the critical assumptions one makes when applying a GLM. He notes:

• Statistical independence of the n observations;
• Correct specification of the variance function V(µ);
• Correct specification of φ;
• Correct specification of the link function g(µ);
• Correct form for the explanatory variables x;
• Lack of undue influence of individual observations on the fit.

Breslow [1996] also suggests methods to check these assumptions and methods that may help fix violated assumptions. Consider for example the assumption concerning the variance function.

For the Poisson GLM the mean is equal to the variance. In practice this can be quite a strong assumption: often the variance appears to be larger than the mean, which is called overdispersion. A solution is to use quasi-likelihood. For the Poisson distribution one assumes that Y_i/φ ∼ Poisson(µ_i/φ). The β coefficients can be estimated using the usual IRWLS algorithm. The dispersion parameter has to be estimated separately. Faraway [2006] recommends to estimate φ using

φ̂ = Σ_{i=1}^n (r*_{i,p})² / (n − p),

where the r*_{i,p} denote the Pearson residuals as in equation (2),

r*_{i,p} = (y_i − µ̂_i) / √Var̂(Y_i),   i = 1, . . . , n.   (2)
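As a small illustration (with simulated data, so all names are illustrative rather than taken from the thesis), the dispersion estimate above can be computed directly from the Pearson residuals of a fitted quasi-Poisson GLM in R; R's own summary reports the same estimator.

```r
set.seed(1)
d <- data.frame(x = factor(rep(1:3, each = 40)))
d$y <- rpois(120, lambda = c(1, 2, 4)[d$x])

fit <- glm(y ~ x, family = quasipoisson(link = "log"), data = d)
r_p <- residuals(fit, type = "pearson")   # Pearson residuals, equation (2)
sum(r_p^2) / df.residual(fit)             # phi-hat = sum of squared residuals / (n - p)
summary(fit)$dispersion                   # R reports the same estimate
```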

For a GLM there are multiple residual definitions. Next to the Pearson residuals, Davidson and Hinkley [1997] also list the deviance residuals (denoted by r*_{i,d}) and the standardized residuals on the linear predictor scale (denoted by r*_{i,l}). Their definitions are given in equations (3) and (4) respectively,

r*_{i,d} = sign(y_i − µ̂_i) √( 2(ℓ_i(y_i) − ℓ_i(µ̂_i)) ),   i = 1, . . . , n.   (3)

r*_{i,l} = (g(y_i) − g(µ̂_i)) / ( g′(µ̂_i) √Var̂(Y_i) ),   i = 1, . . . , n.   (4)

In equation (4), g′(·) denotes the first derivative of the link function. Using these residual definitions, various bootstrap methodologies can be constructed. We will describe these methodologies in section 2.4.3.

2.1.3 Rate making using GLM

In rate making, Generalized Linear Models are often used to determine the risk premium of an insurance. This is for example described in Ohlsson and Johansson [2010]. The risk premium is the expected value of the loss of a particular insurance product. In rate making the goal is to divide the insurance portfolio into homogeneous groups using various risk factors. One can divide the portfolio into k subgroups that share certain characteristics. Let i = 1, . . . , k denote the homogeneous subgroups, called cells. The characteristics are the independent variables used in the GLM.

Let w_i(t) denote the number of exposure units in cell i at time t. Exposure is a measure of the number of policies/persons that are insured. Usually we take the full year equivalent of the number of policies; this quantity is computed in the same way as FTE is computed from the number of employees. Persons/policies that are insured for half a year are measured as 0.5 exposure units. The claim frequency in period [t_1, t_2] for cell i is defined as

f_i(t_1, t_2) = N_i(t_1, t_2) / ∫_{t_1}^{t_2} w_i(s) ds,

where N_i(t_1, t_2) denotes the number of claims arising in period [t_1, t_2] for the policies in cell i. The claim severity is an often used term for the average claim size.

Murphy et al. [2000] recommend to model the claim frequency and claim severity separately. This method yields greater insight and more flexibility compared to modeling the total claim amount at once. The claim frequency f_i is often modeled using a Poisson GLM with a log link. The usual way to proceed is to take N_i(t_1, t_2) as dependent variable and the exposure, α_i(t_1, t_2) := ∫_{t_1}^{t_2} w_i(s) ds, as log offset. This formulation means the following:

ln(N_i(t_1, t_2)) = η_i + ln(α_i(t_1, t_2)),
N_i(t_1, t_2) = α_i(t_1, t_2) exp(η_i),

where η_i denotes the linear predictor in cell i. So the data we need for the claim frequency model are the number of claims in every cell and the total exposure in each cell for the same period. For the claim severity one usually opts for a Gamma GLM with a log link. This implies that the claim size is modeled in such a way that it is always positive. The data used for the severity model is a data frame with one row for each claim, with covariates explaining to which cell the claim belongs and its size.

The two models yield predictors in every cell i for the average claim size and the expected number of claims. Assuming independence between the claim frequency and the average claim size allows calculation of the risk premium in every cell as the product of the two fitted values. If we denote the claim severity in cell i by z_i, we can write the risk premium P_i in cell i as

P_i = g_f^{-1}(η_{i,f}) g_z^{-1}(η_{i,z}),   i = 1, . . . , k.   (5)

g_f(·) and g_z(·) denote the link functions for the claim frequency and claim severity GLMs respectively. η_{i,f} and η_{i,z} denote the linear predictors in cell i for the claim frequency and claim severity respectively. The risk premium is estimated by the obvious estimator given in equation (6),

P̂_i = g_f^{-1}(η̂_{i,f}) g_z^{-1}(η̂_{i,z}),   i = 1, . . . , k.   (6)

If we use data based on yearly intervals [t_1, t_2] for α_i(t_1, t_2) and N_i(t_1, t_2), the estimated risk premium P̂_i can be interpreted as the expected loss for a policy in cell i during one year. This approach is often applied in practice, see for example Anderson et al. [2007].
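As a minimal sketch of this frequency/severity approach in R, using simulated data (so all variable names and risk factors are illustrative and not those of the thesis), the offset formulation and the product in equation (6) look as follows.

```r
set.seed(2)
cells <- expand.grid(age = factor(1:3), product = factor(c("P", "H")))
cells$exposure <- runif(nrow(cells), 50, 200)
cells$n_claims <- rpois(nrow(cells), 0.3 * cells$exposure)
claims <- data.frame(age = sample(cells$age, 500, replace = TRUE),
                     product = sample(cells$product, 500, replace = TRUE))
claims$size <- rgamma(500, shape = 2, rate = 0.002)

# frequency: Poisson GLM with log(exposure) as offset; severity: Gamma GLM, log link
freq_fit <- glm(n_claims ~ age + product + offset(log(exposure)),
                family = poisson(link = "log"), data = cells)
sev_fit  <- glm(size ~ age + product, family = Gamma(link = "log"), data = claims)

# expected frequency per exposure unit, expected severity, risk premium (equation (6))
cells$freq <- predict(freq_fit, newdata = transform(cells, exposure = 1), type = "response")
cells$sev  <- predict(sev_fit, newdata = cells, type = "response")
cells$risk_premium <- cells$freq * cells$sev
```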

2.2 Theory of reserving methods

Claims arising from a particular year of coverage (denoted the claim year) are not always settled in the same year. Due to, for example, legal procedures or long waiting times for invoices, payments can take place years after the actual claim date. The theory of reserving is all about setting reserves for the payments that are still to come for claims from the past. These claims are also called IBNR claims (Incurred But Not Reported) or RBNS claims (Reported But Not Settled). Note that the reserves only have to be made for the periods in which there was coverage. This means that at the end of the year all premiums are collected, but not all claims are paid. Based on historical data we would like to estimate the payments that are still to come.

The traditional IBNR approaches make use of run-off triangles. In Table 2 an example of a run-off triangle is given.

                        Development year (j)
Year of origin (i)      1      2      3      4      5      6
2006 (i = 1)           X_11   X_12   X_13   X_14   X_15   X_16
2007 (i = 2)           X_21   X_22   X_23   X_24   X_25
2008 (i = 3)           X_31   X_32   X_33   X_34
2009 (i = 4)           X_41   X_42   X_43
2010 (i = 5)           X_51   X_52
2011 (i = 6)           X_61

Table 2: Example of a run-off triangle

In Table 2 the X_ij denote the total amount of payments for claims arising from claim year i paid within j years. At the end of year t all X_ij are known for which i + j ≤ t + 1, resulting in a triangle as depicted in Table 2. We would like to estimate the lower right part of the table, completing the triangle into a square. So at the end of the last available year (in this example 2011, t = 6), we would like to estimate the X_ij with i + j > t + 1. We will denote the estimates by X̂_ij. Two traditional approaches to provide values for X̂_ij are the Chain ladder technique and the Bornhuetter-Ferguson approach (Bornhuetter and Ferguson [1972]). We will describe the two traditional approaches and also a relatively new approach proposed by Zhou and Garrido [2009]. In the latter, Generalized Linear Models are used in a similar way as in pricing an insurance, as described in section 2.1.3.

2.2.1 Chain ladder method

The Chain ladder method assumes that the columns of the run-off triangle are proportional. The proportionality assumption implies that in Table 2 the elements to the right of the diagonal can be estimated. These are the values of X̂_ij for which i + j = t + 2.

We define A_ij = Σ_{k<i} Σ_{l<j} X_kl, B_ij = Σ_{k<i} X_kj and C_ij = Σ_{l<j} X_il. A_ij is the sum of all elements X_kl above and to the left of X_ij, B_ij the sum of all elements above X_ij in column j and C_ij the sum of all elements to the left of X_ij in row i.

For the X̂_ij for which i + j = t + 2 (the elements next to the diagonal, which we will call the second diagonal), A_ij, B_ij and C_ij can be computed. The estimate reads, for i + j = t + 2, i ∈ {1, . . . , t}, j ∈ {1, . . . , t},

X̂_ij = C_ij × B_ij / A_ij.

Once these values are computed, we can extend the triangle with the second diagonal. Treating these estimates in the same way as the observations, we can compute the X̂_ij for which i + j = t + 3 in the same way. Repeating this procedure leads to a square.
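A minimal sketch of this recursion in R, assuming the triangle is stored as a square matrix with the unknown lower right part set to NA (the function name is illustrative):

```r
# Complete a run-off triangle with the Chain ladder recursion described above.
# 'tri' is a t x t matrix holding the observed X_ij (i + j <= t + 1) and NA elsewhere.
chain_ladder_complete <- function(tri) {
  t <- nrow(tri)
  for (j in 2:t) {
    for (i in (t - j + 2):t) {
      A <- sum(tri[1:(i - 1), 1:(j - 1)])  # elements above and to the left of (i, j)
      B <- sum(tri[1:(i - 1), j])          # elements above (i, j) in column j
      C <- sum(tri[i, 1:(j - 1)])          # elements to the left of (i, j) in row i
      tri[i, j] <- C * B / A               # estimate, then treat it as an observation
    }
  }
  tri
}
# usage: filled <- chain_ladder_complete(tri); the reserve per origin year follows
# from summing the estimated cells in each row.
```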

The classic Chain ladder method is a deterministic method yielding point estimates only. In the literature there are various extensions in order to obtain confidence intervals around these point estimates. These are stochastic approaches yielding the same point estimates. For example Mack [1993] proposes a non-parametric approach. England and Verrall [1999] use a quasi-Poisson GLM and the bootstrap method in order to obtain confidence intervals for the Chain ladder point estimates.

2.2.2 Chain ladder as GLM

A quasi-Poisson GLM with a log link yields the same point estimates as the Chain ladder method. This method was introduced by Renshaw and Verrall [1994]. Here we use the same notation X_ij as in section 2.2.1 and assume X_ij/φ ∼ Poisson(m_ij/φ) with

ln(m_ij) = β_0 + α_i + β_j,   α_1 = β_1 = 0.

φ is called the dispersion parameter and will be estimated separately from β_0, α_i and β_j. Now we are back in the GLM framework described in section 2.1.

In the example in Table 2 one needs to estimate 1 + 5 + 5 + 1 = 12 parameters, using 21 data points. The risk of over-fitting and unstable estimation is therefore clearly present. In section 2.2.3 we will describe a method trying to stabilize the estimation results.
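A minimal sketch of this formulation in R, reusing the matrix layout of the previous sketch (the quasi-Poisson family reproduces the Chain ladder point estimates; names are illustrative):

```r
# Fit ln(m_ij) = beta_0 + alpha_i + beta_j to the observed cells of the triangle.
cl_as_glm <- function(tri) {
  t <- nrow(tri)
  long <- expand.grid(origin = factor(1:t), dev = factor(1:t))
  long$X <- as.vector(tri)                 # column-major order matches expand.grid
  long <- long[!is.na(long$X), ]
  glm(X ~ origin + dev, family = quasipoisson(link = "log"), data = long)
}
# predicted values for the future cells are obtained with
# predict(cl_as_glm(tri), newdata = future_cells, type = "response")
```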

2.2.3 Bornhuetter-Ferguson method

Bornhuetter and Ferguson [1972] propose a method related to the Chain Ladder method. This method also assumes proportionality of the columns of the loss triangle. An extra assumption concerns the 'row parameters' α_i as described in section 2.2.2. The assumption is that we have prior knowledge of the final total amount of loss arising from claim year i; in other words, E[Σ_{j=1}^t X_ij] = M_i for some known parameter M_i. Bornhuetter and Ferguson [1972] use the total premium earned in year i, net of commission, to determine values for M_i.

Kaas et al. [2008] describe that the model can be estimated using the same approach as in section 2.2.2, only with different covariates, namely ln(m_ij) = α n_i + β_j. In this equation n_i is a measure of the exposure in year i. By exposure we usually mean the full year equivalent number of persons that was insured in a particular year. This parameterization leads to the assumption that M_i = α̂ n_i/n_1; in other words, the assumption is that the total loss arising from accident year i is proportional to the measure of exposure n_i.


Verrall [2004] shows that the Bornhuetter-Ferguson method can be interpreted as a Bayesian method within the framework of Generalized Linear Models.

2.2.4 Loss reserving using GLM

Zhou and Garrido [2009] propose a loss reserving technique different from the traditional approaches. The traditional approaches use aggregated data and therefore do not take all available information into account. The method proposed by Zhou and Garrido [2009] extends the approach that is often applied in rate making. The rate making approach is described in section 2.1.3.

The model proposed by Zhou and Garrido [2009] is the following. First we define a loss function l(t), a stochastic process describing the amount of loss at time t. l(t) is not directly observed in practice. The total aggregated loss in the interval [t_1, t_2] can be described as

L_tot(t_1, t_2) = ∫_{t_1}^{t_2} l(s) ds.

However, in practice there is time between the occurrence date of a loss and the date at which the final payment is made. In order to model this delaying process we introduce another stochastic process D(t), called the loss development function. This function describes the percentage of a loss that has been paid within t years. Evidently we have D(t) = 0 for t ≤ 0 and lim_{t→∞} D(t) = 1.

We can describe the aggregate paid loss at time T ≥ t_2 > t_1 of losses that occurred in the interval [t_1, t_2] as

L(T, t_1, t_2) = ∫_{t_1}^{t_2} l(s) D(T − s) ds.   (7)

So in period [t_1, t_2] the total amount of loss that occurred equals L_tot(t_1, t_2), but at time T only L(T, t_1, t_2) has been paid. For the unpaid loss L_tot(t_1, t_2) − L(T, t_1, t_2) one has to set a reserve. We denote this amount by

R(T, t_1, t_2) := L_tot(t_1, t_2) − L(T, t_1, t_2) = ∫_{t_1}^{t_2} l(s) [1 − D(T − s)] ds.

In order to use this framework for actual loss reserving we make some assumptions. First we assume that the loss function l(t) is independent of the loss development function D(t). This assumption allows us to find the first moment of the reserves, if the first moments of l(t) and D(t) exist. We find

E[R(T, t_1, t_2)] = ∫_{t_1}^{t_2} E[l(s)] {1 − E[D(T − s)]} ds,   for T ≥ t_2 > t_1.

Now we see that the expected loss reserve only depends on the expected loss function and the expected loss development function. Zhou and Garrido [2009] propose to use GLMs to estimate those two functions. The independence assumption allows us to estimate the two functions separately. Next to the independence assumption we make two extra simplification assumptions, namely:

• The amount of exposure is spread uniformly over the year. Exposure is the full time equivalent number of persons/policies that are insured;

• The expected loss development function has the parametric form E[D(t)] = 1 − a^{−t} for t ≥ 0, for some a > 1.

Let τ denote the average settlement time of a claim. From the last assumption we have

E[τ] = ∫_0^∞ E[1 − D(s)] ds = ∫_0^∞ a^{−s} ds = 1/ln(a).   (8)

The first equality sign in equation (8) uses the relation between expectation and the cumulative distribution function for non-negative random variables. The average settlement parameter E[τ ] is a parameter that can be estimated as a dependent variable in the GLM framework. The model we will use for l(t) is the same model that is used in rate making. So we take the model for the risk premium to describe E[l(t)], see section 2.1.3.

We assume that L_tot(t_1, t_2) = Σ_{l=1}^{N(t_1,t_2)} X_l, where N(t_1, t_2) denotes the number of claims arising in period [t_1, t_2] and X_l denotes the size of the individual claims. Using two extra assumptions we find that the expected loss equals E[L_tot(t_1, t_2)] = E[N(t_1, t_2)] E[X]. Here we assume that the number of claims N(t_1, t_2) and the claim size X are independent and that the X_l are independently and identically distributed within a cell.

Let f_i(t_1, t_2) denote the claim frequency in cell i = 1, . . . , k and period [t_1, t_2], z_i[t_1, t_2] the claim severity and τ_i[t_1, t_2] the average settlement time in period [t_1, t_2] and cell i = 1, . . . , k. As in section 2.1.3, let w_i(t) denote the number of (full year equivalent) exposure units in cell i at time t and let α_i(t_1, t_2) := ∫_{t_1}^{t_2} w_i(s) ds denote the exposure in period [t_1, t_2].

For the quantities f_i(t_1, t_2), z_i[t_1, t_2] and τ_i[t_1, t_2] one can estimate a GLM. For every cell we obtain linear predictors, denoted respectively by η_{f_i}, η_{z_i} and η_{τ_i}. Let g_f, g_z and g_τ denote the link functions for the GLMs. Zhou and Garrido [2009] propose to use a log-Weibull GLM for τ. Based on the GLMs we obtain the expected value of the claim frequency, severity and average settlement time for cell i as follows:

E[f_i] = g_f^{-1}(η_{f_i}),   E[z_i] = g_z^{-1}(η_{z_i})   and   E[τ_i] = g_τ^{-1}(η_{τ_i}).

Now using equation (8) we find that 1/ln(a_i) = g_τ^{-1}(η_{τ_i}), which is equivalent to a_i = exp{1/g_τ^{-1}(η_{τ_i})}. Combining the expression for a_i with the assumed parametric form for E[D(t)], we find the following expression for the loss development function at time t ≥ 0 in cell i:

E[D_i(t)] = 1 − exp{ −t / g_τ^{-1}(η_{τ_i}) },   i = 1, . . . , k.

Combining the frequency and severity model we find the following expression for the expected loss at time t for cell i:

E[l_i(t)] = w_i(t) g_f^{-1}(η_{f_i}) g_z^{-1}(η_{z_i}),   i = 1, . . . , k.   (9)

Finally, the reserves for cell i and losses occurred in time interval [t_1, t_2] at time T are described in equation (10),

E[R_i(T, t_1, t_2)] = ∫_{t_1}^{t_2} E[l_i(s)] {1 − E[D_i(T − s)]} ds,   i = 1, . . . , k and T ≥ t_2 > t_1.   (10)

Summing over the cells we obtain the total reserve amount,

E[R(T, t_1, t_2)] = Σ_{i=1}^k E[R_i(T, t_1, t_2)].


In order to make the connection between the above reserving model and the usual loss triangles, we introduce the following notation. Let R^Δ(T_1, T_2, t_1, t_2) = R(T_1, t_1, t_2) − R(T_2, t_1, t_2), where T_2 ≥ T_1 ≥ t_2 ≥ t_1. R^Δ(T_1, T_2, t_1, t_2) can be interpreted as the total amount of losses arising from period [t_1, t_2] which has to be paid in the interval [T_1, T_2]. Evidently we have

E[R^Δ(T_1, T_2, t_1, t_2)] = E[R(T_1, t_1, t_2)] − E[R(T_2, t_1, t_2)].   (11)

The estimated values for the lower right part of the traditional loss triangle of Table 2 are the values E[R^Δ(T_1, T_2, t_1, t_2)]. In Table 3 this is illustrated in a similar way as in Table 2.

                        Development year (j)
Year of origin (i)      1      2                   3                   4                   5                   6
2006 (i = 1)           X_11   X_12                X_13                X_14                X_15                X_16
2007 (i = 2)           X_21   X_22                X_23                X_24                X_25                E[R^Δ(6,7,1,2)]
2008 (i = 3)           X_31   X_32                X_33                X_34                E[R^Δ(6,7,2,3)]     E[R^Δ(7,8,2,3)]
2009 (i = 4)           X_41   X_42                X_43                E[R^Δ(6,7,3,4)]     E[R^Δ(7,8,3,4)]     E[R^Δ(8,9,3,4)]
2010 (i = 5)           X_51   X_52                E[R^Δ(6,7,4,5)]     E[R^Δ(7,8,4,5)]     E[R^Δ(8,9,4,5)]     E[R^Δ(9,10,4,5)]
2011 (i = 6)           X_61   E[R^Δ(6,7,5,6)]     E[R^Δ(7,8,5,6)]     E[R^Δ(8,9,5,6)]     E[R^Δ(9,10,5,6)]    E[R^Δ(10,11,5,6)]

Table 3: Illustration of the practical implementation; the numbers t_1, t_2, T_1, T_2 are measured in years after 1 January 2006.

In Table 3 we aggregate over years: we take the intervals [t_1, t_2] in which the claims occur equal to calendar years. Zhou and Garrido [2009] suggest to use aggregation over quarters. Since this approach relies heavily on GLMs, it is called GLM Reserving (G-R). The G-R approach can be summarized in the following steps.

1. Construct a log-Gamma GLM to project the claim severity for each risk cell;

2. Construct a log-Poisson GLM to model the claim frequency for each risk cell;

3. Combine the two GLMs in order to calculate a risk premium, as described in section 2.1.3;

4. Model the time between the claim date and the claim payment for each risk cell using a log-Weibull GLM;

5. Combine steps 3 and 4 using equation (10) to compute the values E[R(T_1, t_1, t_2)] (a numerical sketch is given below);

6. Use the reserve values to complete the lower right part of the triangle using equation (11).
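A minimal numerical sketch of step 5 for a single cell under the assumptions stated above (uniform exposure over the interval and E[D_i(t)] = 1 − exp(−t/E[τ_i])); the function and argument names are illustrative.

```r
# Expected reserve E[R_i(T, t1, t2)] for one cell, by numerical integration of (10).
# w   : constant exposure rate of the cell per year
# f, z: fitted claim frequency (per exposure unit) and severity for the cell
# tau : fitted average settlement time E[tau_i] for the cell
cell_reserve <- function(w, f, z, tau, T, t1, t2) {
  integrand <- function(s) w * f * z * exp(-(T - s) / tau)  # E[l_i(s)] * (1 - E[D_i(T - s)])
  integrate(integrand, lower = t1, upper = t2)$value
}
# example: reserve at T = 6 for losses occurred in [5, 6]
# cell_reserve(w = 100, f = 0.3, z = 1000, tau = 0.5, T = 6, t1 = 5, t2 = 6)
```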

2.3 Theory of missing data and GLM

Suppose now that we would like to estimate a GLM with dependent variable y ∈ R^n and independent variables x_1, . . . , x_p ∈ R^n. However, for some reason the variables x_j, j ∈ {1, . . . , p}, are not completely observed: the independent variables contain missing values. The dependent variable is completely observed. A naive solution would be to delete the observations with missing values; this approach is called case deletion. Another approach would be to replace the missing value by, for example, its mean. This is called imputation; possible methods are described in Rubin [1987]. We will take another approach.

With missing covariate values the maximum likelihood estimates cannot, in general, be obtained in closed form for GLMs. Hence the EM algorithm offers a good way out. If a substantial number of missing values is present in the data, this procedure tends to give smaller standard errors for the covariate parameters compared to case deletion.

We assume covariate x_j is missing at random (MAR). This means that the missing data mechanism is independent of x_j and of other covariates having missing values. The mechanism may depend on y or on other covariates without missing values.

Furthermore we restrict ourselves to the case of discrete covariates. So the covariates x = (x_1, . . . , x_p) have a finite range, which can be parameterized by a vector γ = (γ_1, . . . , γ_r). In other words, we limit ourselves to factor covariates and assume we do not have any continuous covariates.

As described in section 2.1, y|x has a distribution depending on the GLM parameters (β_1, . . . , β_p, φ) = (β, φ). We assume that the GLM parameters are distinct from γ, meaning that they can be estimated separately. We are mainly interested in the β parameters; the γ parameters themselves are not of interest and will only be used in order to estimate the β parameters.

Finally we assume that the observations y_i|x_i, i = 1, . . . , n, are independent and that the x_i are i.i.d. Let ψ = (β, φ, γ) denote a vector containing all parameters. The log-likelihood function for all n observations can be written as follows,

l(ψ; x, y) = Σ_{i=1}^n l(ψ; x_i, y_i) = Σ_{i=1}^n [ l_{y_i|x_i}(β, φ) + l_{x_i}(γ) ].   (12)

l(ψ; x_i, y_i) denotes the complete data log-likelihood of observation i. In the first step we use the assumption on the independence of the observations. l_{y_i|x_i}(β, φ) denotes the GLM likelihood of y|x and l_{x_i}(γ) denotes the marginal likelihood of covariate x_i. In the second step we use the assumption that the γ parameters are distinct from the other parameters and a standard result from probability theory, namely that for two events A, B we have P(A ∩ B) = P(A|B)P(B).

Now we introduce missing values in the data. We write every i-th observation as x_i = (x_obs,i, x_mis,i), so every observation is split up into an observed part and a missing part. The E step of the EM algorithm consists of computing the expected likelihood. Given a current estimate of ψ, denoted by ψ^(s), we write the E step as follows for a single observation i:

Q_i(ψ | ψ^(s)) = E[ l(ψ; x_i, y_i) | x_obs,i, y_i, ψ^(s) ] = Σ_{x_mis,i} p(x_mis,i | x_obs,i, y_i, ψ^(s)) l(ψ; x_i, y_i).

s denotes the number of the current iteration in the EM algorithm. p(x_mis,i | x_obs,i, y_i, ψ^(s)) denotes the conditional distribution of the missing covariates, conditional on the observed covariates, the dependent variable and the current estimate of ψ. The sum is taken over all possible values the missing covariates can take; for the completely observed covariates this amounts to only one term in the summation. If we write w_i,(s) = p(x_mis,i | x_obs,i, y_i, ψ^(s)), then the E step for all observations can be written as

Q(ψ | ψ^(s)) = Σ_{i=1}^n Σ_{x_mis,i} w_i,(s) l(ψ; x_i, y_i).   (13)

Equation (13) has the form of a weighted complete data log-likelihood. If an observation contains missing values, then the observation is replaced by all possible combinations the missing values could take, so extra rows are added to the data. The augmented data now consist of N = Σ_{i=1}^n k_i rows, where k_i denotes the number of distinct covariate combinations for observation i. If there are no missing values for observation i ∈ {1, . . . , n}, then k_i = 1.

In the M step equation (13) has to be optimized. This can be done using several algorithms; a natural candidate is the IRWLS algorithm with which a GLM is usually fitted. Using the augmented data and prior weights w_i,(s), the GLM can be fitted using the IRWLS algorithm as described in appendix A.

In order to obtain the weights one can use Bayes' theorem as follows,

w_i,(s) = p(x_mis,i | x_obs,i, y_i, ψ^(s)) = p(y_i | x_i, ψ^(s)) p(x_i | ψ^(s)) / Σ_{x_mis,i} p(y_i | x_i, ψ^(s)) p(x_i | ψ^(s)).   (14)

The term p(y_i | x_i, ψ^(s)) is direct output of the GLM estimated in the last M step. For the covariate distribution, denoted by p(x_i | ψ^(s)), we use a multinomial distribution. As described earlier, this distribution is parameterized by γ. Based on the fully observed observations we estimate the parameters γ by maximum likelihood. In the computation of p(x_i | ψ^(s)) for observation i, the information regarding the observed independent variables also has to be used, so in this computation we condition on the observed independent variables for observation i.

Executing the EM algorithm in the way described above can be summarized as follows (a sketch in R is given after the list).

1. Augment the data

• For every row containing missing values, rows are added: for every possible combination of values the missing covariates can take, a row is added.

2. Initialization

• The conditional distribution of the missing covariates is computed based on the multinomial distribution;
• Initial values are computed for p(y_i | x_i, ψ^(s)); we use p(y_i | x_i, ψ^(s))_initial = 1/N;
• The initial weights for every row in the augmented data are computed using equation (14);
• Using these initial weights, the first GLM is estimated by a weighted likelihood procedure as described in appendix A. In the software package R one can use the glm command with prior weights.

3. E step

• Compute new values for p(y_i | x_i, ψ^(s)), using the current GLM estimate;
• The weights for every row in the augmented data are recomputed using equation (14).

4. M step

• Using the weights computed in the E step, the GLM is estimated by a weighted likelihood procedure as described in appendix A.

5. Convergence

• As described in McLachlan and Krishnan [1997], chapter 5, we use the convergence measure ||w_(s+1) − w_(s)||_2 / ||w_(s)||_2, where w_(s) is the vector with elements w_i,(s). McLachlan and Krishnan [1997] suggest to use a convergence precision of 10^{-10} based on a simulation study. So the EM algorithm has converged if the following holds:

||w_(s+1) − w_(s)||_2 / ||w_(s)||_2 < 10^{-10}.
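A self-contained sketch of this procedure in R, for a Poisson GLM with a single binary factor covariate that is partly missing at random; the data are simulated and all names are illustrative, so this is not the implementation used in the thesis.

```r
set.seed(3)
n <- 200
x_full <- factor(sample(c("a", "b"), n, replace = TRUE, prob = c(0.6, 0.4)))
y_obs  <- rpois(n, lambda = ifelse(x_full == "a", 1, 3))
x_obs  <- x_full
x_obs[sample(n, 40)] <- NA                                   # covariate values missing at random

levs      <- levels(factor(x_obs[!is.na(x_obs)]))
gamma_hat <- prop.table(table(factor(x_obs[!is.na(x_obs)], levels = levs)))  # multinomial MLE

# 1. augment the data: one row per possible value of a missing covariate
aug <- do.call(rbind, lapply(seq_len(n), function(i) {
  vals <- if (is.na(x_obs[i])) levs else as.character(x_obs[i])
  data.frame(id = i, y = y_obs[i], x = factor(vals, levels = levs))
}))
p_x <- ifelse(is.na(x_obs[aug$id]), as.numeric(gamma_hat[as.character(aug$x)]), 1)

# 2. initial weights (the constant initial value of p(y|x) cancels in the normalization)
aug$w <- p_x / ave(p_x, aug$id, FUN = sum)

for (s in 1:200) {
  fit   <- glm(y ~ x, family = poisson(link = "log"),
               data = aug, weights = w)                      # M step: weighted GLM
  num   <- dpois(aug$y, fitted(fit)) * p_x                   # E step: equation (14)
  w_new <- num / ave(num, aug$id, FUN = sum)
  if (sqrt(sum((w_new - aug$w)^2)) / sqrt(sum(aug$w^2)) < 1e-10) { aug$w <- w_new; break }
  aug$w <- w_new
}
coef(fit)   # estimates of the GLM parameters beta
```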

2.4 Theory of bootstrapping

Efron [1979] introduced the sample re-use technique called the bootstrap. Davidson and Hinkley [1997] provide an extensive overview on the applications of the bootstrap method. We will first describe the bootstrap technique in general in section 2.4.1. In the next subsection we will describe the assumptions of the bootstrap. In sections 2.4.3, 2.4.4 and 2.4.5 we describe the applications of the bootstrap used in this thesis.

2.4.1 Bootstrap in general

Let y_1, . . . , y_n denote an i.i.d. sample from a random variable Y ∼ F. Suppose that on the basis of this sample we compute a test statistic, denoted by t(y_1, . . . , y_n). Using the bootstrap technique we can acquire some insight into the distribution of t(y_1, . . . , y_n). When applying the bootstrap method one takes B samples of size n with replacement from the original sample y_1, . . . , y_n. We denote those samples by y_1^{*b}, . . . , y_n^{*b}, b = 1, . . . , B. Using these B samples we can compute B values of the statistic, denoted by t^{*b} = t(y_1^{*b}, . . . , y_n^{*b}), b = 1, . . . , B.

The values t^{*b} can be used to compute standard errors and confidence intervals for t(y_1, . . . , y_n). The bootstrap variance estimator is given by V_B = 1/(B − 1) Σ_{b=1}^B (t^{*b} − t̄^*)², where t̄^* = (1/B) Σ_{b=1}^B t^{*b}. A simple 100(1 − α)% confidence interval is usually taken as (t^*_{((B+1)(α/2))}, t^*_{((B+1)(1−α/2))}), where t^*_{(k)} denotes the k-th order statistic of the bootstrap sample t^{*b}, b = 1, . . . , B. Davidson and Hinkley [1997] advise to use B ≥ 1000 to be on the safe side.

2.4.2 When does the bootstrap work?

Davidson and Hinkley [1997] discuss two senses in which the appropriateness of the bootstrap method should be assessed. First, does the bootstrap method give reliable results with the data at hand? And second, when and how might the bootstrap calculations fail?

The first question can be addressed using asymptotic results. Suppose the sample size n → ∞. Let F̂ denote the empirical distribution of the sample y_1, . . . , y_n. The empirical distribution function is given by F̂(x) = (1/n) Σ_{i=1}^n I(y_i ≤ x). Suppose we would like to estimate properties of the test statistic t(Y_1, . . . , Y_n; F), for example its distribution,

G_{t,F,n}(t) = P(t(Y_1, . . . , Y_n; F) ≤ t | F).   (15)

The reason why we condition on F is to indicate that we are dealing with a simple random sample from F. The bootstrap uses the empirical distribution function F̂ in order to estimate (15). The bootstrap estimate would be

G_{t,F̂,n}(t) = P(t(Y_1, . . . , Y_n; F̂) ≤ t | F̂).   (16)

We would like to have that lim_{n→∞} G_{t,F̂,n} = G_{t,F,n}. In order for this convergence to hold, certain technical conditions have to be met; these are discussed in Davidson and Hinkley [1997]. We will give an example in which the technical conditions are met and one in which they are not. For example, if we estimate the mean of a normally distributed variable from an i.i.d. sample, the bootstrap works fine. In this setting the bootstrap can be used to obtain estimates for the variance of the estimator and confidence intervals as described in section 2.4.1.

A setting in which this approach would fail is for example bootstrapping the estimation of the range of a uniform distribution. Usually one would take the sample maximum as estimator for the upper bound of the range of the uniform distribution. In this case bootstrapping is inappropriate.

Davidson and Hinkley [1997] also describe some other situations in which the bootstrap might fail. They list incomplete data, dependent data and dirty data. We will come back to the case of incomplete data in section 2.4.5.

In general the bootstrap will not work for dependent data. For instance, suppose we have realizations of a correlated time series {Y_j}, denoted by y_1, . . . , y_n. The time series has marginal variance σ² = Var(Y_j) and autocorrelations ρ_h = corr(Y_j, Y_{j+h}). If we would take the approach of section 2.4.1 to approximate the variance of the sample mean Ȳ, then the estimated variance would approach σ²/n for large n. However, the true variance of Ȳ equals

Var(Ȳ) = (σ²/n) Σ_{h=−(n−1)}^{n−1} (1 − |h|/n) ρ_h.

Since the summation is often different from one, the bootstrap would estimate the variance wrongly. Davidson and Hinkley [1997] state that dependence in finite population sampling has a smaller impact on the bootstrap outcomes.

Dirty data causes severe damage to any statistical device. This holds for the bootstrap as well. Obvious errors can be corrected, but in general one has to assume that the data are collected in the right way.

2.4.3 Bootstrap and GLM

In section 2.4.1 we discussed the bootstrap in a univariate setting. Applying the bootstrap to a GLM is also possible. Davidson and Hinkley [1997] describe two different sampling plans. The first one uses the Pearson residuals as described in equation (2). Using the notation of section 2.1, we have a sample of the dependent variable y_1, . . . , y_n and a covariate matrix X ∈ R^{n×p}. We can draw a sample of size n with replacement from the Pearson residuals r*_{j,p}, j = 1, . . . , n, as in equation (2). We denote the sample by ε*_1, . . . , ε*_n. Using these residuals we can construct a new bootstrap sample for the response variable by

y*_j = µ̂_j + √Var̂(Y_j) ε*_j,   j = 1, . . . , n.

Using the bootstrap sample for the response variable and the original covariate matrix, one can re-estimate the model coefficients, denoted by β̂*.

Repeating the above procedure B times yields a bootstrap sample of the model coefficients. This sample can be used in similar ways as in section 2.4.1 to analyze the distribution of β̂. Similar approaches can be taken using other residual definitions, for example the deviance residuals or the linear predictor residuals.

A second approach is to re-sample both the dependent and the independent variables. So we sample with replacement n rows from the matrix [y X], yielding y* and X*. Using y* and X* the model can be re-estimated, giving β̂*. Repeating this procedure B times will also give a bootstrap sample of the model coefficients. Davidson and Hinkley [1997] compare both sampling plans using a simulation for different GLMs. The conclusion is that the difference in outcomes is very minimal; there is little to choose between the sampling plans.
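A minimal sketch of the second sampling plan in R (a generic helper; the formula, data and family are whatever model is being bootstrapped, so all names are illustrative):

```r
# Case resampling: draw n rows of [y X] with replacement and refit the GLM.
boot_glm_cases <- function(formula, data, family, B = 1000) {
  t(replicate(B, {
    idx <- sample(nrow(data), replace = TRUE)
    coef(glm(formula, family = family, data = data[idx, ]))
  }))
}
# usage, with a hypothetical data frame d holding response y and covariate x:
# beta_star <- boot_glm_cases(y ~ x, d, quasipoisson(link = "log"), B = 1000)
# apply(beta_star, 2, sd)   # bootstrap standard errors of the coefficients
```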


A drawback of re-sampling residuals is that the reconstructed responses y*_j can take values outside the domain of the assumed distribution; for a Poisson response, for example, negative values can arise. Furthermore, we would like the response variables to be integer, which is also not guaranteed when re-sampling residuals. Rounding off to the nearest appropriate value could be an option. Fixing the negative values in y* by truncation at 0 is generally not recommended.

Davidson and Hinkley [1997] propose two possible fixes for the negative values in the bootstrap samples. The first one is to use stratification. That is, we recognize that the residuals are not homogeneous and divide the sample into homogeneous groups. The bootstrap samples are then also created in groups, by only sampling from the residuals in the same group.

Another solution is to use the linear predictor residuals as defined in equation (4). That is, we take a bootstrap sample from the residuals in (4), denoted by ε*_{1,l}, . . . , ε*_{n,l}. The bootstrap sample for the dependent variable can then be constructed as

y*_i = g^{-1}( x_i β̂ + g′(µ̂_i) Var̂(Y_i)^{1/2} ε*_{i,l} ),   i = 1, . . . , n.

g^{-1}(·) denotes the inverse of the link function and g′(·) its derivative. If the link function g(·) is the identity link, the Pearson and the linear predictor re-sampling schemes are the same.

For example, for the Poisson GLM with a log link we have g^{-1}(η) = exp(η). This transformation guarantees positive values in the bootstrap sample for the dependent variable. Davidson and Hinkley [1997] also show in a simulation study that there is little difference in the bootstrap outcomes for the different residual definitions.
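A sketch of this resampling scheme in R for a fitted log-link GLM with strictly positive responses (a simple formula with an untransformed response is assumed; names are illustrative):

```r
# Bootstrap a GLM by resampling the linear predictor residuals of equation (4).
boot_glm_linpred <- function(fit, B = 1000) {
  fam  <- family(fit)
  y    <- model.response(model.frame(fit))
  mu   <- fitted(fit)
  eta  <- fam$linkfun(mu)
  v    <- summary(fit)$dispersion * fam$variance(mu)   # estimated Var(Y_i)
  dmu  <- fam$mu.eta(eta)                              # d mu / d eta = 1 / g'(mu)
  r_l  <- (fam$linkfun(y) - eta) * dmu / sqrt(v)       # residuals of equation (4)
  t(replicate(B, {
    eps      <- sample(r_l, length(r_l), replace = TRUE)
    dat      <- model.frame(fit)
    dat[[1]] <- fam$linkinv(eta + eps * sqrt(v) / dmu) # y* stays positive for a log link
    coef(update(fit, data = dat))
  }))
}
```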

2.4.4 Bootstrap and traditional loss reserving

England and Verrall [1999] describe the bootstrap method for the Chain ladder. They use the same model specification as in section 2.2.2, so we have a quasi-Poisson model with a log link. England and Verrall [1999] suggest to use bootstrap samples from the Pearson residuals. Their approach is implemented by Kaas et al. [2008] in the software package R. This is the same bootstrap-and-GLM approach as described in section 2.4.3.

England [2002] suggests to adjust the residual definition slightly for the degrees of freedom and to use

r′*_{i,p} = √(n/(n − p)) (y_i − µ̂_i) / √Var̂(Y_i),   i = 1, . . . , n.   (17)

n is the number of observations used and p the number of parameters. The reason why the residuals are adjusted with the factor √(n/(n − p)) is to enable a proper comparison between estimation variance and process variance. England and Verrall [2002] propose an estimator of the bootstrap reserve prediction error. The prediction error is estimated by

PÊ_bs(R) = √( φ̂_p R + (SE_bs(R))² ).   (18)

R denotes the total reserve and SE_bs(R) denotes the bootstrap standard error of the total reserve. This method can also be applied with R denoting the reserve with respect to a particular accident year. φ̂_p is the same estimator for the dispersion parameter as in section 2.1. Using the expression in (18), we can relate the estimation variance (SE_bs(R))² and the process variance to the prediction error PÊ_bs(R).

The approach is extended in an addendum, England [2002]. The extension is made in order to obtain the full predictive distribution of the reserves. He proposes to simulate the total reserve in every bootstrap iteration. The approach can be summarized in the following steps.

1. Estimate the Chain Ladder as a GLM as in section 2.2.2 and the dispersion parameter by φ̂ = Σ_{i=1}^n (r*_{i,p})² / (n − p).

2. Calculate the adjusted Pearson residuals as in equation (17).

3. Take bootstrap samples from the adjusted Pearson residuals, B times (England [2002] proposes B = 1,000), and apply the following computations repeatedly (a condensed sketch in R is given below).

(a) Construct a pseudo sample using the bootstrapped residuals and re-estimate the quasi-Poisson GLM as in section 2.4.3.

(b) Project the lower right corner of the triangle, called the future cells; let the projection in cell (i, j) be denoted by µ̂*_ij.

(c) For each future cell simulate a payment from a Poisson distribution with mean µ̂*_ij/φ̂ and multiply this simulated value by φ̂.

(d) Sum over the future payments.

(e) Save the future payments.
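A condensed sketch of steps 3(a)-(e) in R, assuming fit is the quasi-Poisson fit of the observed triangle in long format (as in the sketch of section 2.2.2), long its data and future a data frame with the origin/dev factors of the lower right cells; all names are illustrative. Note that negative pseudo responses can occur, which is exactly the problem discussed further below.

```r
boot_reserve <- function(fit, long, future, B = 1000) {
  mu    <- fitted(fit)
  phi   <- summary(fit)$dispersion
  n     <- length(mu)
  p     <- length(coef(fit))
  r_adj <- sqrt(n / (n - p)) * (long$X - mu) / sqrt(phi * mu)        # equation (17)
  replicate(B, {
    dat   <- long
    dat$X <- mu + sqrt(phi * mu) * sample(r_adj, n, replace = TRUE)  # (a) pseudo triangle
    refit <- glm(X ~ origin + dev, family = quasipoisson(link = "log"), data = dat)
    mu_f  <- predict(refit, newdata = future, type = "response")     # (b) future cells
    sum(phi * rpois(length(mu_f), mu_f / phi))                       # (c)+(d) process error
  })
}
# the returned vector is the bootstrap sample of the total future payments (reserve)
```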

This procedure gives a sample of the future payments which we can use to make inferences on the predictive distribution of the claim reserves.

One Pearson residual of the Chain ladder model has value 0. The last accident year (the lower left value of the triangle) has only one observation, so the parameter concerning the last accident year is estimated using one observation. This yields a 'perfect' fit for this observation and therefore a zero residual. This effect is also known as the 'corner constraint'. If the loss triangle also has only one observation for the last development year, this leads to an extra residual of value 0.

England [2002] argues that the zero residual could be deleted. However, it is included since in this way the calculation method for the prediction errors is analogous to the analytical prediction error. Moreover, excluding the zero residual would have minimal effect on the mean of the bootstrap reserves, but would inflate the standard errors.

Using the approach of England and Verrall [1999], negative values can arise in the bootstrap samples. The fixes described in section 2.4.3 can be applied. Stratification may not be very appropriate: for the Chain ladder method this would amount to grouping the residuals by development year, but the data used is on an aggregated basis, leading to a relatively small number of observations. Dividing the residuals into groups would therefore lead to even smaller samples. The solution we will try is to use the linear predictor residuals as defined in equation (4).

2.4.5 Bootstrap and missing values

As described in section 2.4.2, the bootstrap might fail in the presence of missing data. Efron [1994] describes applying the bootstrap technique to datasets with missing data. He introduces two approaches: the nonparametric bootstrap and the full-mechanism bootstrap. McLachlan and Krishnan [1997] also briefly discuss the bootstrap in combination with the EM algorithm; they propose to use the nonparametric bootstrap method. We will therefore discuss and apply the nonparametric bootstrap method to the GLM in the presence of missing data.

McLachlan and Krishnan [1997] and Efron [1994] do not discuss the nonparametric bootstrap for the case of a GLM, but the main principle can be generalized. Suppose we have a sample y_1, . . . , y_n and a missing value mechanism κ(·). A missing value mechanism models the missingness in a sample by a mapping from the true values to the observed data. The missing value mechanism conceals certain values, yielding the observed data, denoted by o_i = κ(y_i), i = 1, . . . , n.

For a sample statistic based on the observed values, s(o_1, . . . , o_n), the nonparametric bootstrap works as follows. Draw a bootstrap sample of size n with replacement from o_i, i = 1, . . . , n, denoted by o*_1, . . . , o*_n. Calculating s* = s(o*_1, . . . , o*_n) gives a pseudo value for s(·). Repeating this procedure B times gives a bootstrap sample of size B for the sample statistic s(·). Efron [1994] argues that the main advantage of this approach is that one does not have to specify the missing-data mechanism. At the same time this is also a serious potential disadvantage, since the concealment process can introduce a bias.

In the context of GLM and missing values the nonparametric approach can be applied in a similar way as the second approach in section 2.4.3. Suppose we have a completely observed dependent variable y and a covariate matrix X containing missing values. As in section 2.3 we assume the covariates are missing at random (MAR). The nonparametric bootstrap approach in this context is to re-sample both the dependent and the independent variables.

So we sample with replacement n rows from the matrix [y X], yielding y* and X*. Note that the matrix X* can contain missing values. Using y* and X* the model can be re-estimated with the EM algorithm as in section 2.3, giving β̂*. Repeating this procedure B times will give a bootstrap sample of the model coefficients. When applying this bootstrap procedure to the EM algorithm, it is wise to add an identifier to each observation before one augments the data. In the bootstrap procedure one can then sample from the identifiers and select the appropriate rows in the augmented data. This approach substantially reduces the computation time, since the augmentation does not have to be applied in every bootstrap iteration.
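A sketch of this identifier trick in R, reusing the augmented data frame aug (with its id column) from the EM sketch in section 2.3; run_em stands for a hypothetical function that runs the E and M steps on a pre-augmented data frame and returns the fitted GLM.

```r
ids <- unique(aug$id)
beta_star <- replicate(1000, {
  star      <- sample(ids, replace = TRUE)                 # resample observation ids
  rows_list <- lapply(star, function(i) which(aug$id == i))
  boot_dat  <- aug[unlist(rows_list), ]
  boot_dat$id <- rep(seq_along(star), lengths(rows_list))  # keep repeated draws distinct
  coef(run_em(boot_dat))                                   # re-run the EM on the pseudo sample
})
# apply(beta_star, 1, sd) then gives bootstrap standard errors for the coefficients
```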

McLachlan and Krishnan [1997] argue that 50 to 100 bootstrap samples should be sufficient in order to estimate standard errors. We will use 1,000 bootstrap samples to be on the safe side.

2.5 Theory of log transformation

Suppose we have a GLM framework in which the dependent variable cannot be modeled properly by an exponential family distribution. This is for example the case when we are dealing with heavy tailed data. This problem is, amongst others, discussed by Cantoni and Ronchetti [2006], who introduce a way to estimate a GLM on data containing thick tails; their estimation procedure is set up in such a way that the effect of outliers is limited. In the context of claim reserving we would like to allow for the possibility of large claims, so we will take a look at a more traditional approach, namely the log transformation. Manning and Mullahy [2001] discuss this approach. Here one takes ln(y) as dependent variable in a (Generalized) Linear Model instead of the y observations.

In section 2.1 the definition of a GLM was covered. The goal is to say something about E[y|X], where X denotes the covariate matrix. The estimate for E[y_i|X_i] is usually given by Ê(y_i|X_i) = g^{-1}(η̂_i), with η̂_i denoting the estimated linear predictor for observation i = 1, . . . , n. So we have η̂_i = X_i β̂.

If one estimates a GLM on the log transformed dependent variable, we get an estimate for E[ln(y)|X]. We then have to find a way to obtain an estimate for E[y|X] based on E[ln(y)|X]. A naive approach on the basis of the log-normal distribution would be to use Ê[y_i|X_i] = exp( Ê[ln(y_i)|X_i] + σ̂²/2 ), where σ̂² is an estimate of the variance of ln(y). However, this only holds if y is log-normally distributed.

Duan [1983] proposes to use the ‘smearing estimate’. This is a nonparametric retransformation method that solves the above described problem. Suppose we have estimated a model for a transformed dependent variable $h(y)$, where $h(\cdot)$ is an invertible function. Based on a GLM we have estimates for $E[h(y_i)|X_i]$, namely $\hat{\mu}^h_i = g^{-1}(\hat{\eta}_i)$, $i = 1, \ldots, n$. Here we use the same notation for the link function and the linear predictor as in section 2.1. The superscript $h$ indicates that $\hat{\mu}^h_i$ is the fitted value for $h(y_i)$ instead of $y_i$ itself.

Let us define the prediction error term by $\epsilon = h(y) - \mu^h$, or $h(y) = \mu^h + \epsilon$. For a new observation with covariates $X_0$ we have the following predictor: $\mu^h_0 = g^{-1}(\eta_0) = g^{-1}(X_0\beta)$. The expectation of $y_0$ is given by

$E[y_0] = E\bigl[h^{-1}\{h(y_0)\} \mid X_0\bigr] = E\bigl[h^{-1}\{\mu^h_0 + \epsilon\} \mid X_0\bigr] = \int h^{-1}\{\mu^h_0 + \epsilon\}\, dF_\epsilon(\epsilon). \quad (19)$

$F_\epsilon$ is the distribution function of the error term $\epsilon$. One could estimate $F_\epsilon$ by its empirical distribution function. If we denote $\hat{\epsilon}_i = h(y_i) - \hat{\mu}^h_i$, $i = 1, \ldots, n$, then we have

$\widehat{F}_{n,\epsilon}(x) = \frac{1}{n}\sum_{i=1}^{n} I_{\{e \in \mathbb{R}:\, e \le x\}}(\hat{\epsilon}_i). \quad (20)$

$I_{\{e \in \mathbb{R}:\, e \le x\}}(\cdot)$ is an indicator function. Using the empirical CDF in (20) to estimate the expectation in equation (19), we get

$\widetilde{E}[y_0] = \int h^{-1}\{\mu^h_0 + \epsilon\}\, d\widehat{F}_\epsilon(\epsilon) = \frac{1}{n}\sum_{i=1}^{n} h^{-1}\{\mu^h_0 + \hat{\epsilon}_i\}. \quad (21)$

If we insert the estimated model coefficients $\hat{\beta}$ into the fitted values, we obtain the smearing estimate for $E[y_0]$, given by

$\widehat{E}[y_0] = \frac{1}{n}\sum_{i=1}^{n} h^{-1}\{\hat{\mu}^h_0 + \hat{\epsilon}_i\}, \qquad \hat{\mu}^h_0 = g^{-1}(X_0\hat{\beta}). \quad (22)$

So in order to transform the fitted values back to the original scale of the dependent variable, we need an estimate of the distribution of the error term. If there are missing values in either the dependent variable $y$ or the covariate matrix $X$ for observation $k \in \{1, \ldots, n\}$, we are unable to compute $\hat{\epsilon}_k$.
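To illustrate the retransformation, the following minimal Python sketch implements the smearing estimate of equation (22) for $h = \ln$. It is not taken from the original analysis: for simplicity an ordinary least squares fit on the log scale plays the role of the GLM, so that $\hat{\mu}^h_i$ is simply $X_i\hat{\beta}$.

import numpy as np

def smearing_predict(y, X, X0):
    # Duan's smearing estimate with h = ln: fit a linear model to ln(y) and
    # retransform using the empirical distribution of the residuals.
    ly = np.log(y)
    beta, *_ = np.linalg.lstsq(X, ly, rcond=None)    # fit on the log scale
    eps_hat = ly - X @ beta                          # residuals eps_i = h(y_i) - mu_i^h
    mu0 = np.atleast_1d(X0 @ beta)                   # linear predictor(s) for new data
    # Equation (22): average h^{-1}(mu0 + eps_i) over the empirical residuals.
    return np.exp(mu0[:, None] + eps_hat[None, :]).mean(axis=1)

In contrast with the naive lognormal correction $\exp(\hat{\mu}^h_0 + \hat{\sigma}^2/2)$, this estimate does not require the error term on the log scale to be normally distributed.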


3 Data description

The practical application of this thesis is loss reserving for a health care insurance for ship crew. The data consist of coverage data on the number of persons insured and of claim payment data for a Dutch ship crew insurance. Using an Open Database Connectivity (ODBC) connection and SQL queries we selected the coverage and claim data from the administration system.
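A minimal sketch of such an extraction in Python is shown below; the data source name, table names and column names are hypothetical placeholders for the actual administration system.

import pandas as pd
import pyodbc

conn = pyodbc.connect("DSN=insurance_admin")   # hypothetical data source name

coverage = pd.read_sql(
    "SELECT policy_id, product, rank, start_date, end_date, n_insured "
    "FROM coverage WHERE start_date >= '2004-01-01'",
    conn,
)
claims = pd.read_sql(
    "SELECT claim_id, product, occurrence_date, paid_amount, currency "
    "FROM claim_payments WHERE occurrence_date >= '2004-01-01'",
    conn,
)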

The insurance covers the medical costs if an insured person becomes ill or has an accident. The insurance has four different products, each with different terms and conditions. The products with code P concern health care and repatriation. The products with code H concern health care in the homeland of the insured. The exact product codes are described in Table 4.

The coverage data concern the administration of the policies, product types and the insured persons. We will describe these data in section 3.1. The claim data concern bookkeeping data of all payments made in relation to the occurred claims. The claim data also include the occurrence date of a claim and for which product the claim is filed. The claim data are described in section 3.2. Since the data contain missing values we will use the reporting standards suggested by Horton and Kleinman [2007].

3.1 Coverage data

The coverage data consist of entries for every (group of) insured persons, type of product and every coverage period. Each row in the coverage data frame lists a start and an end date of the coverage and the number of persons that are insured. If a person is insured for multiple product lines, this leads to extra entries in the data frame for this person. There might be some dependence between the entries for the same person, since an insured crew member may receive health care abroad and later in his or her homeland. We will look into this further when discussing the claim data in section 3.2.

The administration distinguishes between three policy types. The first two policy types are the Insured Persons (IP) policies and the Crew Complement (CC) policies. For these policy types the date on which the coverage starts is recorded for every insured person, as well as the date on which the person leaves the ship and therefore ends the coverage period. The difference between the first two policy types is that for the IP policies more characteristics are collected, for instance the name of the insured, rank, nationality and age. For the CC policies only the rank is known.

The third policy type is the Number Insured (NI) policy. For this policy type the persons are grouped per rank, so for every period we know the number of officers and the number of crew members that are insured. If a coverage period spans multiple years, we created an entry in the coverage data for every calendar year. The number of coverage days therefore ranges between 1 and 366. The covariates collected are described in Table 5.

For the coverage data we are mostly interested in the exposure. As exposure we take the total number of insured persons measured in full year equivalents (FYE). In other words, we sum the number of coverage days and divide by 365. Since the number of rows of the coverage data does not have a direct relation to the exposure, we present the data in exposure units instead of the number of rows/observations.
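A minimal Python sketch of this calculation is given below; it splits a coverage period over calendar years and converts the coverage days into full year equivalents. The column names and values are hypothetical.

import pandas as pd

cov = pd.DataFrame({
    "policy_id": [1, 2],
    "start_date": pd.to_datetime(["2010-11-15", "2011-03-01"]),
    "end_date":   pd.to_datetime(["2011-02-10", "2011-06-30"]),
    "n_insured":  [4, 10],
})

def split_by_calendar_year(row):
    # Create one entry per calendar year that the coverage period touches.
    out = []
    for year in range(row.start_date.year, row.end_date.year + 1):
        start = max(row.start_date, pd.Timestamp(year, 1, 1))
        end = min(row.end_date, pd.Timestamp(year, 12, 31))
        days = (end - start).days + 1
        out.append({"policy_id": row.policy_id, "year": year,
                    "coverage_days": days, "n_insured": row.n_insured})
    return out

entries = pd.DataFrame([e for _, r in cov.iterrows() for e in split_by_calendar_year(r)])
# Exposure in full year equivalents: person-days of coverage divided by 365.
entries["fye"] = entries["coverage_days"] * entries["n_insured"] / 365
print(entries.groupby("year")["fye"].sum())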

The coverage data consist of 69,865 rows. The total exposure equals 33,322 full year equivalents in the period 2004 until 2011. The table with descriptive statistics can be found in Table 6.


The number of insured persons is quite constant over time.

By construction of the policy administration for the different policy types, the covariates for the Insured Persons policies contain fewer missing values. For the IP policies the descriptive statistics are given in Table 7. From this table we see that for the IP policies the nationality and age are collected correctly. Gender still contains a relatively large number of missing values.

3.2 Claim data

We have claim data for claims that occurred in the period 2004 until 2011. The claim data frame consists of claim payments, so an individual claim is likely to occur more than once in the data. The covariates collected for the claims are very similar to the covariates for the coverage data. The description of the variables is given in Table 8.

The data consist of 15,949 claim payments originating from 6,274 distinct claims. If we aggregate over the claim payments we obtain 7,133 claims. This number differs from the number of distinct claims, since one claim can consist of payments for different products and in different currencies. If one claim consists of payments for multiple products, then multiple entries occur in the aggregation for this claim.
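As an illustration, the aggregation step can be sketched in Python as follows; the column names and values are hypothetical.

import pandas as pd

payments = pd.DataFrame({
    "claim_id":    [101, 101, 101, 102],
    "product":     ["H1", "H1", "P2", "P1"],
    "currency":    ["EUR", "EUR", "USD", "EUR"],
    "paid_amount": [250.0, 75.0, 1200.0, 40.0],
})

# Aggregating payments per claim, product and currency: claim 101 yields two
# rows (H1/EUR and P2/USD), which is why the aggregated count exceeds the
# number of distinct claims.
claims = (payments
          .groupby(["claim_id", "product", "currency"], as_index=False)["paid_amount"]
          .sum())
print(claims)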

Furthermore there is a variable for the status of a claim. In order to model the claim size and the claim development, we are mainly interested in closed claims, since for the closed claims we are sure that there will be no future payments and they are fully observed.

In the original claim administration no information on rank, age and nationality was tracked for most claims. The administration is set up in such a way that we only know from which policy the claims arise and not from which person. Some claims could be linked to an Insured Person via the employee number. These claims have link_type equal to ‘empl nr’. For some other claims we could link the claim to the original coverage via the claimant's name. This is possible for some claims arising from an IP policy. The claims for which this name linking succeeded have link_type equal to ‘search’. If the link_type is missing, the personal characteristics like rank, age and nationality are also missing.


In Table 11 descriptive statistics of the claim size (paid amount) are given. Here we use claim data aggregated over the claim payments. From Table 11 we see that the claim size has a skewed and fat-tailed distribution. Furthermore we observe negative claim payments. The main reason why claim payments can be negative is third party coverage. For example multiple insurers send a single medical advisor to a ship. The costs of the medical advisor are later refunded.

In our further analysis we will delete the negative payments, since the reserving methods described in section 2.2 are not always able to deal with negative values. There are 186 negative payments, which is approximately 1% of the total number of payments. The total negative paid amount equals -109,684.68 and the total positive paid amount equals 6,502,855.49. In other words, the absolute value of the negative paid amount is approximately 1.7% of the sum of the absolute values of the paid amounts. We consider this negligible compared to the estimation error of the different reserving methods.

In order to get a better idea of the possible dependence between the products, we constructed Table 12. In this table we count for every product how many claims also contain payments for other products. We see that mainly for the homeland health care product H1 payments are also made for the repatriation product P2. This is an indication of a violation of the assumption of independent observations. Breslow [1996] states that the independence assumption for GLM estimation mainly affects the asymptotic results for the standard errors. A possible fix would be to use White standard errors. Since we use the bootstrap we will not use White standard errors. As stated earlier, the bootstrap also needs the independence assumption. Davidson and Hinkley [1997] state that the nonparametric re-sampling method works reasonably well in finite population sampling. So we will continue under the independence assumption needed in order to bootstrap the claim data.


4 Estimation results

In this section we estimate the models described in the literature review using the data at hand. In section 4.1 we determine a model for the risk premium in accordance with section 2.1.3. Here we make use of the Ibrahim approach for missing values as described in section 2.3. Furthermore we use the smearing estimate for the log transformation as described in section 2.5.

Next we estimate the three reserving methods. The estimation of the Chain Ladder (CL) method is described in section 4.2. The risk premium model is needed for two reserving methods, namely the Bornhuetter-Ferguson (BF) approach and the GLM reserving (GR) method. The estimation of these two methods is described in sections 4.3 and 4.4, respectively. For the three reserving methods we use yearly run-off triangles, since run-off triangles based on quarters contain a large number of zeros within three development years. Finally we compare the three reserving methods in section 4.5.

In order to compute standard errors the bootstrap is applied, using 1,000 bootstrap samples.

4.1 Risk premiums

As can be seen from Table 6, the covariates concerning nationality and age contain a relatively large number of missing values in the exposure data. If we use these two covariates to construct the different cells in the GLM reserving technique as described in section 2.2.4, the missing values would cause trouble: the total exposure in each cell (denoted by $\alpha_i(t_1, t_2)$ in section 2.1.3) would contain missing values. Hence we leave out the nationality and age covariates in our analysis. We also leave out the gender variable, since there are almost no women insured. So we will model the risk premium with the following covariates: product, rank and currency.

We would like to keep the rank variable in our analysis, since it seems intuitive that a crew member is more exposed to operational risks than an officer. As we can see from Table 6, the number of missing values for rank in the coverage data is very small. However, in the claim data, summarized in Table 10, we see that the rank variable is missing for 60% of the claims. For the claim frequency model this means that the number of claims is not fully observed. In section 4.1.1 we elaborate on how we deal with this missingness in the dependent variable. In the claim severity model we deal with the missing values by using the EM algorithm described in section 2.3.

The coverage data as well as the claim data contain a currency variable. The currency variable in the coverage data relates to the currency in which the premium is paid. The currency in the claim data is the currency of the claim payment. It is important to note that the premium and claim currencies are not necessarily the same. As described in section 1.2, the legislation requires separate reserves for different currencies. So we will model the risk premium for both currencies separately.

4.1.1 Frequency model

In order to reduce the percentage of missing values in the rank variable, we assume that the policies of type Insured Persons (IP) are representative of the complete portfolio of insured persons. This seems reasonable since the policy type variable is only of an administrative nature. The IP policies are the policies for which most of the information is known. Instead of 60% missing values in the rank variable, we now have 18%.
