A comprehensive pricing, reserving and risk model: a complete micro-level Solvency II approach


Robert Kroon

Graduation Thesis for the Master's Actuarial Science and Mathematical Finance
University of Amsterdam, Faculty of Economics & Business, Amsterdam School of Economics

Author: Robert Kroon (student number 10171428)
E-mail: Robertkroon2@hotmail.com
Date: June 23, 2016
Supervisor: dr. K. Antonio
Second reader: dr. Sami Umut Can


Abstract

For insurance companies, three core components characterize the business: pricing, reserving and risk management. These components are often modeled independently and at different levels. In this thesis we propose a model that combines all three aspects at an individual policy level, using characteristics of both the policyholder and the claim itself, and apply its results in a Solvency II framework. We aim to obtain a model usable by non-life insurers for reserve and catastrophe risk. To do so, we expand the model used in Attema (2014) by adding additional risk types and a 1-year risk horizon. Furthermore, when we look at claim history and future development, we allow for multiple payments on one claim, so we can look at claims with a longer period of development. Next, we compare our results to a chain ladder bootstrap. We estimate the effect of a bad year on next year's reserve, for both the individual and the triangle methods. We find that such a catastrophe significantly increases the best estimate for next year. Additionally, it increases next year's 95th or 99.5th percentile reserve required for Solvency II in both models. We find that the mean of our individual model is higher than that of our benchmark model, meaning the individual model is more susceptible to catastrophes. However, we find that next year's 99.5th percentile calculated by our benchmark model is higher than that of our proposed model, meaning the chain ladder method has a larger variance when confronted with a bad year. With some minor extensions, our model can be used by insurers to calculate all their Solvency II risks related to reserve and catastrophe risk for non-life products.


Contents

Preface
1 Introduction
2 Methodology
   2.1 The Chain Ladder Bootstrap
   2.2 Terminology
   2.3 Claim Frequency
       2.3.1 The Number of NINR Claims
       2.3.2 The Number of IBNR Claims
       2.3.3 Seasonality
   2.4 Claim Sizes
   2.5 Multiple Payments
   2.6 Solvency II one-year-view
       2.6.1 Individual Claim Reserving
       2.6.2 Chain Ladder Bootstrap
3 Case Study
   3.1 Dataset
   3.2 The Number of Claims
       3.2.1 The Number of NINR Claims
       3.2.2 The Number of IBNR Claims
       3.2.3 Seasonality
   3.3 Claim Sizes
   3.4 Solvency II one-year view
4 Results
   4.1 Simulation Results
   4.2 Comparison to Chain Ladder Bootstrap
5 Conclusion
Appendix A: Extra Regressions
Appendix B: Triangle Data
References


Preface

In February 2015 one of the final projects of my Master's Actuarial Science and Mathematical Finance started: my master's thesis. For this thesis I formulated and developed a model that combines pricing, reserving and risk management in a single model and applies it to the current Solvency II guidelines.

First off, I’d like to thank my colleagues Yoeri Arnoldus and Cees Attema for bringing this subject to my attention. It was a challenging and interesting topic and I learned a lot. Furthermore I’d like to thank them for their technical and practical support where it was necessary. Finally I want to thank my employer, Triple A - Risk Finance, for giving me the opportunity to write this thesis, and more generally for giving me the chance to develop my actuarial skills.

For the assistance at the university I want to express my gratitude to Professor Katrien Antonio; her comments and assistance helped me a great deal.

Although the writing of this thesis took longer than I initially planned, I am happy with the way it turned out.

Robert Kroon

Amsterdam, June 23, 2016.


1 Introduction

For insurers, risk is a key component of all aspects of business. To correctly handle this exposure to risk, insurers often take three aspects of these risks into account: pricing, reserving and risk management. One of the problems with current models is that the three aspects are considered separately and at different levels. Pricing is usually done at an individual policy level, using simple regressions of frequency and severity, or more sophisticated methods such as GLM, as in Nelder and Wedderburn (1972), and GAMLSS, as in Stasinopoulos and Rigby (2007). These methods use individual policy and claim data to obtain the best possible estimate for each risk factor.

Reserving and risk management are however often reviewed at a different level. Most reserving models are based on aggregated data, usually on so-called claim triangles. The most widely used methods are in essence extensions of the simple chain ladder method, like the Mack method described in Mack (1993) and the bootstrap method described in England and Verrall (1999). Finally, risk management is usually executed at a company level using balance sheets. Since 2012 insurers have to abide by the Solvency II guidelines of capital requirements and 1-year horizons, but so far there has not been a single comprehensive model combining these three facets of risk.

Because there are three separate frameworks for these aspects, unification is often difficult, or even impossible. That is why, in this thesis, we develop a comprehensive model at an individual level that combines the pricing of policies with reserving and risk management at the lowest possible level, using all individual client, policy and claim data available. We hope this results in a model that can calculate important risk measures relevant for insurance companies, such as reserve risk and catastrophe risk, preferably unbiased and with a low variance. To assess the outcomes of our model, we will compare our results with a benchmark model described in Section 2.1.

Over the last few years, researchers have developed methods that can calculate reserves on an individual level, which should make unification of all three aspects of risk somewhat easier. These models are often referred to as Individual Claim Reserving models. Notable examples are Drieskens et al. (2012), Pigeon et al. (2013) and Antonio and Plat (2014). In these articles methods are developed to calculate claim reserves at a granular level, i.e. they follow an individual claim's development over time. The results reported in these papers are promising, as these methods use more of the available data than methods based on claim triangles. These methods give a more realistic description of claim development and have a sound statistical foundation. Furthermore, Huang et al. (2015) and Huang et al. (2016) show that these models result in more accurate estimates of the reserve, with lower variances, meaning they are preferable to triangle methods in terms of accuracy and efficiency.


Our starting point is the model proposed in Attema (2014), which combines both pricing and reserving into a single model. To combine these two aspects, they first estimate the risk factors of the portfolio to obtain frequency and severity estimates usable for pricing. Using these estimates they fit point processes, given the frequency and severity of each policyholder, to simulate claims and their development. These can be used to calculate and obtain confidence intervals of the reserve, and potentially other risk measures such as the Value at Risk (VaR) and the Conditional Tail Expectation (CTE). However, this model does not incorporate a 1-year risk horizon, so it is unsuitable for risk management purposes. For a more extensive description of the model, we refer to Section 2.2 and Attema (2014). We aim to extend this model in order to make it suitable for Solvency II.

That is why we propose to add a 1-year horizon check in line with Solvency II standards. With this addition a Solvency II capital requirement for both reserve and catastrophe risk can be calculated. We also implement additional extensions in order to make the model more complete and realistic. We propose to implement the following:

• Using Extreme Value Theory (EVT) in order to separate small claims from large claims;

• Making multiple payments per claim possible; this is especially interesting for claim types that tend to have many payments, like Bodily Injury claims.

We will then compare our model to a benchmark model described in Section 2.1 and Section 2.6.

The rest of this thesis is structured as follows. Chapter 2 describes the theoretical basis of Attema's model and our extensions. Chapter 3 describes the data, which distributions will be fitted, and how we approach the simulations. Chapter 4 summarizes the results of our model and compares them to a benchmark model. The last chapter concludes.


2 Methodology

In this chapter we will review the most important methods used in this thesis. First we will briefly describe the chain ladder bootstrap, which we use as a benchmark model. Next we will describe our terminology. Lastly, we will describe the model set up by Attema (2014) and our extensions of this model.

2.1 The Chain Ladder Bootstrap

For an insurance company it is important to reserve money for future claims, even if they have not occurred yet. This is due to its exposure to potential claims, which result in losses for the company. Over the years, many methods have been described to calculate the ultimate value of future claims, and thus the value of the required reserve. One of the best known methods, still widely used by insurers, is the chain ladder method. The chain ladder method is a means to calculate the (value of the) ultimate claim given a claim triangle, as shown in Appendix B. Using this method results in a point estimate for the total claim reserve. The chain ladder method does not incorporate information about claims that have not occurred yet. It can be summarized in the following equations:

$$C_{i,j} = C_{i,j-1}\, f_{j-1} \qquad \text{for } i + j - 1 > N \tag{2.1}$$

$$f_j = \frac{\sum_{i=1}^{N-j} C_{i,j+1}}{\sum_{i=1}^{N-j} C_{i,j}} \qquad \text{for } j = 1, 2, \ldots, N-1 \tag{2.2}$$

with $C_{i,j}$ the cumulative payments made in claim year $i \in \{1, \ldots, N\}$ and development year $j \in \{1, \ldots, N\}$, $f_j$ the development factor from year $j$ to $j+1$, and $N$ the size of the triangle. In essence, the chain ladder method uses the cumulative payments of previous reporting years to estimate a development factor for future development years. These factors are multiplied with the cumulative claim of the previous development year to determine the ultimate. We only use this factor for payments in the lower triangle that are still unknown, so when $i + j - 1 > N$.
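To make Equations (2.1) and (2.2) concrete, the following sketch computes the development factors from a cumulative run-off triangle and fills in the unknown lower triangle. It is an illustrative implementation under stated assumptions (a square triangle stored as a NumPy array with the unobserved cells set to NaN), not the code used in this thesis.

```python
import numpy as np

def chain_ladder(triangle):
    """Complete a cumulative run-off triangle with the chain ladder method.

    triangle : (N, N) array of cumulative payments C[i, j], with the unknown
               lower-right part (future development) stored as np.nan.
    Returns the completed triangle and the development factors f_j.
    """
    C = triangle.astype(float).copy()
    N = C.shape[0]
    factors = np.ones(N - 1)
    for j in range(N - 1):
        # Equation (2.2): ratio of column sums over the accident years for
        # which both development years j and j + 1 have been observed.
        rows = slice(0, N - j - 1)
        factors[j] = C[rows, j + 1].sum() / C[rows, j].sum()
    for i in range(N):
        for j in range(1, N):
            if np.isnan(C[i, j]):
                # Equation (2.1): roll the last known cumulative value forward.
                C[i, j] = C[i, j - 1] * factors[j - 1]
    return C, factors

# Fabricated 3x3 example: three accident years, three development years.
tri = np.array([[100.0, 150.0, 160.0],
                [110.0, 170.0, np.nan],
                [120.0, np.nan, np.nan]])
completed, f = chain_ladder(tri)
latest_diagonal = np.array([tri[i, 2 - i] for i in range(3)])
reserve = completed[:, -1].sum() - latest_diagonal.sum()
print(f, reserve)
```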

Over the years, many improvements and extensions of the method have been proposed. Examples are allowing for outliers in the model, and scaling ultimates with predefined loss ratios of the premium, as in Bornhuetter and Ferguson (1972). A major extension of the chain ladder method is the Mack method as described in Mack (1993). The main advantage of the Mack method is that it allows for the calculation of a standard deviation of the reserve, without assuming an explicit distribution of said reserve. This can be used to approximate the (theoretical) distribution of the reserve. Since our individual claim reserving model gives us an empirical distribution, we wish for our benchmark model to do the same. That is why, instead of using the Mack method, we choose to use the chain ladder bootstrap as our benchmark model.

The chain ladder bootstrap is a method for calculating (an empirical distribution of) the ultimate claim of an insurance portfolio. Together with the most recent data of paid claims and the case reserve, the total required reserve can be calculated. The chain ladder bootstrap method is derived from the chain ladder method, hence its name. In order to use this method the claim data needs to be aggregated and presented in so-called claim triangles, as shown in Appendix B.

The chain ladder bootstrap, first described by Efron and Tibshirani (1993) and applied to claim triangles by England and Verrall (1999), expanded the chain ladder method to obtain an empirical distribution of the reserve. Their method creates so-called pseudo-data to simulate the reserve. This allows for the calculation of risk measures, such as the variance and the Value-at-Risk, which is the main advantage over the chain ladder method.

For this thesis, we use the chain ladder bootstrap as described in England and Verrall (2002), which uses the chain ladder factors described in Equations (2.1) and (2.2), instead of the overdispersed Poisson distribution as in England and Verrall (1999). We use the empirical distributions generated by the chain ladder bootstrap as a benchmark for our own individual model. For a more detailed description we refer to England and Verrall (2002).
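As a rough illustration of the resampling idea behind this benchmark model, the sketch below computes Pearson residuals between observed and chain-ladder-fitted incremental payments, resamples them to build pseudo-triangles, and re-applies the chain ladder to each pseudo-triangle. It is a simplified sketch only: it omits the bias correction and the process-variance step of England and Verrall (2002), and the triangle and simulation count are placeholders.

```python
import numpy as np

def cl_factors(C):
    """Chain ladder development factors, Equation (2.2), for a triangle with NaNs."""
    N = C.shape[0]
    return np.array([C[:N - j - 1, j + 1].sum() / C[:N - j - 1, j].sum()
                     for j in range(N - 1)])

def cl_reserve(C):
    """Chain ladder point estimate of the total reserve for a cumulative triangle."""
    C = C.copy()
    N = C.shape[0]
    f = cl_factors(C)
    latest = np.array([C[i, N - 1 - i] for i in range(N)])
    for i in range(N):
        for j in range(N - i, N):          # unobserved development years of row i
            C[i, j] = C[i, j - 1] * f[j - 1]
    return C[:, -1].sum() - latest.sum()

def bootstrap_reserves(C, n_sim=1000, seed=1):
    """Simplified chain ladder bootstrap: resample Pearson residuals of the
    incremental payments and refit the chain ladder on each pseudo-triangle."""
    rng = np.random.default_rng(seed)
    N = C.shape[0]
    f = cl_factors(C)
    # Back-fit cumulative values from the latest diagonal using the factors.
    fitted = np.full_like(C, np.nan)
    for i in range(N):
        fitted[i, N - 1 - i] = C[i, N - 1 - i]
        for j in range(N - 1 - i, 0, -1):
            fitted[i, j - 1] = fitted[i, j] / f[j - 1]
    # Observed and fitted incremental payments; Pearson residuals on observed cells.
    inc_obs = np.diff(np.c_[np.zeros(N), C], axis=1)
    inc_fit = np.diff(np.c_[np.zeros(N), fitted], axis=1)
    observed = ~np.isnan(inc_obs)
    resid = (inc_obs[observed] - inc_fit[observed]) / np.sqrt(np.abs(inc_fit[observed]))
    reserves = np.empty(n_sim)
    for s in range(n_sim):
        r_star = rng.choice(resid, size=observed.sum(), replace=True)
        inc_star = inc_fit.copy()
        inc_star[observed] = inc_fit[observed] + r_star * np.sqrt(np.abs(inc_fit[observed]))
        C_star = np.cumsum(inc_star, axis=1)   # NaNs remain in the unobserved cells
        reserves[s] = cl_reserve(C_star)
    return reserves

tri = np.array([[100.0, 150.0, 160.0],
                [110.0, 170.0, np.nan],
                [120.0, np.nan, np.nan]])
res = bootstrap_reserves(tri, n_sim=2000)
print(np.mean(res), np.percentile(res, 99.5))
```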

2.2 Terminology

One of the main differences between the chain ladder bootstrap and more recent models is the way data is used. The chain ladder bootstrap is based on aggregated data, presented in claim triangles. Over the last few years, researchers have derived models that can be used to calculate reserves on an individual level. These models use a more detailed dataset, with personal characteristics and individual claim data. Notable examples are Drieskens et al. (2012), Rosenlund (2012), Pigeon et al. (2013), Pigeon et al. (2014) and Godecharle and Antonio (2015), who have built their models using discrete time intervals, and Larsen (2007), Antonio and Plat (2014) and Attema (2014), who use continuous time instead. Since we use Attema's model as a basis we will also work in continuous time. This makes the model more realistic, but also increases our computing time significantly.

In order to calculate the reserve, we simulate the number of claims for each individual policy, based on their characteristics. If any claim occurs, we simulate the payments for each claim. We also allow the possibility of multiple payments on a single claim. To illustrate this we include Figure 2.1, taken from Attema (2014).

In this figure we see the simulation of an imaginary policy with several claims. For each policy the insurer has to reserve money for (potential) future payments on a claim. Since we assume independence between different claims on the same policy we can simulate each claim independently and sum the reserves of each claim in order to obtain the reserve of a single policy. We can then sum over the reserves of all policies to obtain the total reserve in a similar way.


Figure 2.1: The development of a fabricated policy

For a single claim we distinguish three types of reserves, similar to Attema (2014):

• The Not yet Incurred and thus Not Reported reserve (NINR reserve), also known as the premium reserve. Every insurer has to reserve money for the claim exposure on open policies, since there is a risk of a claim occurring while the policy offers coverage. This means the reserve is positive as long as the policy is open and zero afterwards, because no more claims can occur after the policy has ended. For a single policy, this reserve tends to decrease over time due to the decline in exposure and thus the probability of a claim over the remaining period of coverage. The number of NINR claims as well as their payments are unknown and therefore have to be simulated. Simply put, these are claims that will occur in the future and can be linked to existing policies and their exposure.

• The Incurred But Not Reported reserve (IBNR reserve), for claims that have occurred but are not yet reported. Between the occurrence of an accident and the notification of the claim to the insurer usually a period of time elapses. That is the so-called reporting delay. Since claims have already occurred, insurers already have to reserve for these claims. Similarly to NINR claims both the number of claims as well as their payments are unknown.

• The Reported But Not Settled reserve (RBNS reserve), for claims that have been reported but have not yet been settled, meaning there are potentially still payments to come. This reserve tends to be the largest for most insurers, especially for lines of business with long-developing claims. For these claims, part of the payments have already been settled before the date of evaluation. The remainder of the payments (both frequency and severity) still have to be simulated.

As we can see in Figure 2.1, a single (policy-level) reserve generally has a NINR, an IBNR and an RBNS component, except at the start and the end of the policy, where respectively the IBNR and RBNS reserves or the NINR reserve are zero. Since we assume the three components are independent, we will model them separately. We can immediately note that at the policy start the entire premium (if priced with the net premium) will be NINR reserve, since no claims have yet occurred. We also know that at some unknown time in the future all reserves equal zero, since all claim cases will be settled and closed. This means the values of the reserves are heavily influenced by the time of evaluation.

In order to calculate these reserves we need at least three components in our model.


First we need to model the number of (NINR) claims in the future. Next, we need a model for the number of IBNR claims. This model uses the simulation of NINR claims and adds the reporting delay to determine when claims are reported. Finally, we need a model to simulate claim sizes and the number of payments for both NINR/IBNR claims and RBNS claims. We will address these three components in the next sections.

2.3 Claim Frequency

Since we do not know the number of NINR and IBNR claims, simply because they have not been reported yet, we have to model their frequency. In this subsection we will explain a way to model both the NINR and the IBNR claims. We will mostly follow the approach of Attema (2014).

2.3.1 The Number of NINR Claims

Before we can simulate the number of claims, we have to make some assumptions regarding the distribution of the frequency of claims. As our starting point, we assume the number of claims, regardless of which type, is homogeneously Poisson distributed. In a continuous time frame this means we assume the intensity of the Poisson process, $\lambda$, to be constant. As a consequence the process behind the simulations is memoryless, so the occurrence of future claims does not depend on past occurrences.

Furthermore, this implies that if we assume the number of claims $N \sim \text{Poisson}(\lambda)$, and the claim sizes $X_1, X_2, X_3, \ldots$ follow an unknown distribution $f_X(x)$ and are mutually independent and independent of $N$, then the sum of all claims $Y = \sum_{n=1}^{N} X_n$ is compound Poisson distributed. This means the distribution of the sum of claims is well-defined. The exact form can be derived using probability generating functions or cumulant generating functions.

These two attributes are both quite desirable, as we assume independence of claims in our model as well, and because a compound Poisson distribution is fairly simple to simulate, since it only uses well-defined functions.

However, assuming the number of claims is homogeneously Poisson distributed has some major drawbacks. It means we cannot allow for any seasonality, and we cannot let the hazard rate vary over time. Using a constant hazard rate therefore makes our model less realistic and inflexible. That is why we will allow our hazard rate, $\lambda$, to differ over time, and more specifically, per season. We elaborate on this in Section 2.3.3. We also assume the number of claims depends on the characteristics of the policy. Therefore we define $\lambda_{i,t}$ as the continuous hazard rate for each policy $i$, for every point in time $t > 0$, dependent on the information set $\mathcal{F}_T$ at $t = T$, with $T$ the evaluation date (see Footnote 3). We will simulate the number of claims using this lambda, after it has been corrected for seasonality. We can simulate these claims fairly easily.

Note that, since we use a $\lambda_{i,t}$ that is stepwise constant (we assume it to be constant per season), the total number of claims will still be Poisson distributed, because if $n_1, n_2, n_3, \ldots$ are independent with $n_i \sim \text{Poisson}(\lambda_i)$, then $\sum_i n_i \sim \text{Poisson}\left(\sum_i \lambda_i\right)$. The expected time between claims will however not always be exponentially distributed, due to the seasonal shifts.

Footnote 3: We assume that after $t = T$, the characteristics that do not depend on time and claim history, such as car type and region, do not change. Characteristics such as age and claim-free years will be updated using the simulated data at the end of each evaluation year.


Using a Generalized Linear Model, or GLM, we fit the claim frequency $\lambda_{i,t}$ for each policy. To estimate this frequency we condition on all available policy information, using all risk factors that are significant. This is equivalent to a Poisson maximum likelihood estimator, as seen in Equation (2.3). For a more complete derivation we refer to Cook and Lawless (2007).

$$L \propto \prod_{i=1}^{N} \lambda_{i,t} \exp\left(-\int_0^t \lambda_{i,u}\,du\right) \tag{2.3}$$

Using Equation (2.3) we fit the frequency $\lambda_{i,t}$ at time $t$ for each policy $i$, assuming there are $N$ policies. Note that although we can fit $\lambda_{i,t}$ with a continuous distribution, we choose not to do so, and instead assume it to be stepwise constant per season for each policy. We maximize the log-likelihood for a policy-level dataset with individual characteristics such as age and vehicle type, described in Section 3.2.1. Solving this Poisson maximum likelihood problem results in the estimated claim frequency for each policy $i$ at time $t$ (see Footnote 4).

Footnote 4: Note that we do not actually use this likelihood to solve for the parameters; it is merely added for illustrative purposes. For the actual fitting of the parameters we use the GAMLSS package in R, which solves this likelihood with an algorithm.
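As an illustration of how such a frequency GLM can be fit in practice, the sketch below uses a Poisson GLM with a log link and the log of the exposure as an offset, so that the fitted coefficients describe a claim rate per policy year. The data frame and its column names are purely hypothetical, and the thesis itself fits the model with the GAMLSS package in R rather than with Python/statsmodels.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Hypothetical policy-level data; the column names are illustrative only.
rng = np.random.default_rng(0)
n = 5000
policies = pd.DataFrame({
    "n_claims":   rng.poisson(0.12, size=n),
    "exposure":   rng.uniform(0.1, 1.0, size=n),
    "age_driver": rng.integers(18, 80, size=n),
    "region":     rng.integers(0, 6, size=n).astype(str),
})

# Poisson GLM with log link; log(exposure) enters as an offset so that the
# coefficients describe the claim frequency per full policy year (lambda_i).
fit = smf.glm(
    "n_claims ~ C(region) + age_driver",
    data=policies,
    family=sm.families.Poisson(),
    offset=np.log(policies["exposure"]),
).fit()
print(fit.summary())

# Estimated claim frequency per policy year (i.e. for an exposure of one).
lambda_hat = fit.predict(policies, offset=np.zeros(len(policies)))
```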

2.3.2 The Number of IBNR Claims

Similar to the NINR claims, the number of IBNR claims has to be simulated as well. In order to do so we need to calculate the intensity of the reporting delay. We assume the reporting delay to have a certain distribution $f_r(\cdot)$, which we will explore in Section 3.3. We can then use this distribution of the reporting delay and the hazard rate of the NINR claims to determine the intensity of the IBNR claims. Here we follow the same approach as Attema (2014).

For a certain point in time $t = T$, with $t = 0$ the start of the policy, the probability that a claim is reported is equal to the sum of all probabilities of claim occurrences with a reporting delay equal to $T - T_c$, with $T_c$ the moment in time the claim occurred. In other words, the probability that a claim is reported for a policy $i$, at a certain time $t$, is equal to a so-called convolution of the NINR hazard rate and the reporting delay:

$$\rho_i(t) = \int_0^t \lambda_i(u)\, f_r(t-u)\,du \tag{2.4}$$

Here $t$ denotes the time since the start of the policy. Since we assume $\lambda_i(t)$ to be stepwise constant for each season, we can rewrite $\rho_i(t)$ as:

$$\rho_i(t) = \sum_{j=1}^{N_t} \lambda_i(t)\,\bigl(F_r(t_j) - F_r(t_{j-1})\bigr) \tag{2.5}$$

with $t_0 = 0$, $N_t$ such that $t_{N_t} = t$, and $t_j - t_{j-1}$ the time step over which $\lambda_i$ is constant. This way we can see $\lambda_i$ as a step function with jumps at every time step $t_j$. Because we assume a season to last three months, a time step $t_j - t_{j-1}$ is usually equal to 1/4. Note that this is not true at the start and end of a policy, since the season is cut short there. Writing $\rho_i(t)$ in the way described above reduces our computation time significantly.
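The sketch below evaluates the convolution of Equation (2.4) numerically for a piecewise constant hazard rate, which is the idea behind Equation (2.5). The exponential reporting-delay cdf and the base frequency are placeholders for the spliced distribution and the GLM output used in the thesis; the seasonal factors are those of Table 3.4.

```python
import numpy as np

def rho(t, season_rates, F_r, step=0.25):
    """Discretized convolution of a piecewise constant hazard rate with the
    reporting-delay distribution, in the spirit of Equations (2.4)-(2.5).

    t            : evaluation time in years since the policy start
    season_rates : hazard rate lambda_i on each season step (length >= t / step)
    F_r          : cdf of the reporting delay, measured in years
    step         : length of one season step (a quarter of a year)
    """
    grid = np.arange(0.0, t + 1e-9, step)            # t_0, t_1, ..., t_{N_t}
    total = 0.0
    for j in range(1, len(grid)):
        lam_j = season_rates[j - 1]                   # constant on (t_{j-1}, t_j]
        # contribution of this season step to the reporting intensity at time t
        total += lam_j * (F_r(t - grid[j - 1]) - F_r(t - grid[j]))
    return total

# Placeholder reporting-delay cdf: exponential with a mean of ~18 days (0.05 years).
F_exp = lambda x: np.where(x > 0.0, 1.0 - np.exp(-x / 0.05), 0.0)
# One policy year of seasonal hazard rates: base frequency times seasonal factors.
rates = [0.12 * s for s in (1.061, 1.045, 0.884, 1.010)]
print(rho(1.0, rates, F_exp))
```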

Since the IBNR reserve depends on claims that have occurred but have not yet been reported, we do not integrate from zero, but from the time of the last reported claim. This means we implicitly assume that once a policyholder reports a claim, all previous claims are reported as well. We believe this is a reasonable assumption. Then, the number of IBNR claims for a certain policy $i$ becomes equal to:

$$N_{IBNR} = \int_{T_r}^{T_e + 5} \rho_i(t\,|\,\mathcal{F}_t)\,dt \tag{2.6}$$

with $T_r$ the time of the last known (reported) claim and $T_e$ the end of the policy. We do not integrate to infinity because the insurer no longer has any exposure five years after the policy has ended, since Dutch legislation prohibits reporting delays longer than five years. We also account for this in the distribution of the reporting delay in Section 3.2.2. For $\rho_i(t)$ we now have:

$$\rho_i(t\,|\,\mathcal{F}_t) = \int_{T_r}^{t} \lambda_i(u)\, f_r(t-u)\,du \tag{2.7}$$

where $t = 0$ denotes the start of the policy. If no claims have been reported we set $T_r$ equal to zero.

Due to the assumption made above, the process is no longer stationary, because the hazard rate for IBNR claims now depends on the time of the last reported claim $T_r$. This is due to the fact that if a claim is reported, the probability of another (unreported) claim instantly jumps to zero, because we assume all other claims are reported as well. This changes Equation (2.5) as follows:

$$\rho_i(t) = \sum_{j=k}^{N_t} \lambda_i(t)\,\bigl(F_r(t_j) - F_r(t_{j-1})\bigr) \tag{2.8}$$

with $k$ such that $t_{k-1} = T_c$. Now we have a relatively simple formula for the hazard rate of IBNR claims at any given time. Integrating this hazard rate over the course of the policy (plus five years for extreme reporting delays) gives us the total number of reported claims.

An alternative way to calculate $\rho_i(t)$ would be to take an approach similar to Section 2.3.1. Let $f_{W_i}$ denote the distribution of the waiting times between claims, $W_i = T_i - T_{i-1}$, and $F_W$ its survival function. We can then write the likelihood of this process, as in Lawless and Thiagarajah (1996), as:

$$L \propto \prod_{i \in \{1,\ldots,N(t)\}} f_{W_i}(T_i) \times F_W(t - T_{N(t)}) = \prod_{i \in \{1,\ldots,N(t)\}} \rho_i(T_i\,|\,\mathcal{F}_t) \exp\left(-\int_{T_{i-1}}^{T_i} \rho_i(u\,|\,\mathcal{F}_t)\,du\right) \times \exp\left(-\int_{T_{N(t)}}^{t} \rho_i(u\,|\,\mathcal{F}_t)\,du\right) \tag{2.9}$$

Unfortunately, this results in a set of equations that cannot be solved analytically (see Footnote 5). Therefore, we will approximate the number of IBNR claims using Equation (2.6). Since we can compute both the hazard rate of the NINR claims and the distribution of the reporting delay, we can use these to fill in the right hand side of Equation (2.8) and estimate $\rho(\cdot)$ indirectly.

Footnote 5: Note that although the reporting delay is not directly shown in Equation (2.9), it enters the likelihood indirectly through the definition of $\rho_i(t)$ in Equation (2.7).


2.3.3 Seasonality

In order to account for seasonality, we scale the hazard rate for NINR claims (and consequently for IBNR claims as well) according to the season we are evaluating. We assume that each season lasts exactly three months and that spring starts on March 1 (see Footnote 6). Next we assume that the seasonal effect stays the same over the years, as this makes our factors more stable. Therefore we multiply each hazard rate $\lambda_i$ with a factor $S_j$. Note that this does not change our log-likelihood in Equation (2.9), but instead defines our $\lambda_i(t)$. To make sure the total number of claims remains unchanged we have to scale the factors in such a way that:

$$\sum_{j=1}^{4} S_j = 4 \tag{2.10}$$

Footnote 6: Although Attema corrects for seasonality in the same way, it is unclear which starting point he uses.

Since we assume the effect stays the same over the years, we can calculate these factors fairly quickly. We sum all claims made in each season and then calculate the total exposure for each season. Dividing them, we obtain the appropriate factor for each season. We then scale these factors so Equation (2.10) holds. Note that this is equivalent to a Poisson GLM, but since we do not need standard deviations we will calculate the factors using the method described above, as it simplifies our computations.
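The factor calculation described above is straightforward to reproduce: divide the claim count per season by the exposure per season and rescale so that Equation (2.10) holds. A minimal sketch with hypothetical per-season totals:

```python
import numpy as np

# Hypothetical totals per season (winter, spring, summer, autumn).
claims_per_season   = np.array([15600.0, 15400.0, 13000.0, 14900.0])
exposure_per_season = np.array([141000.0, 141500.0, 141200.0, 141300.0])

raw = claims_per_season / exposure_per_season   # claims per unit of exposure
factors = 4.0 * raw / raw.sum()                 # rescale so the factors sum to 4
print(dict(zip(["winter", "spring", "summer", "autumn"], factors.round(3))))
```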

2.4 Claim Sizes

After simulating the claim numbers $N$ for NINR and IBNR claims, we need to simulate the severity of the total claim of an accident. We have to do so for all types of claims (NINR, IBNR and RBNS). Here we denote the size of a claim with a subscript $j$. We will describe our approach in this section.

When estimating the claim sizes, we want to use as many relevant risk factors as possible, but we do not have the same information for every type of claim. For example, for both NINR and IBNR claims we have no (additional) claim data to condition on, simply because the claims have not occurred yet. So for these claims we only have the policy data available. We denote this information for a policy $i$ as $\mathcal{F}_{p_i}$.

For RBNS claims, however, we can also condition on known characteristics of the claim in order to make a more precise estimate of the claim sizes. We denote the claim information for a claim $j$ as $\mathcal{F}_{c_j}$. Because of the difference in available data we will run two separate sets of regressions, one for NINR and IBNR claims and one for RBNS claims.

So, when simulating, we can write the (total) claim size $S_j$ for a claim $j$, for NINR/IBNR claims and RBNS claims respectively, as:

$$S_j^{NINR/IBNR} = F_c^{-1}(U\,|\,\mathcal{F}_{p_i}) \tag{2.11}$$

$$S_j^{RBNS} = F_c^{-1}(U\,|\,\mathcal{F}_{p_i}, \mathcal{F}_{c_j}) \tag{2.12}$$

with $F_c^{-1}(\cdot)$ the inverse cdf of the claim sizes, given the policy characteristics of a policy $i$ and the claim characteristics of a claim $j$, if known, and $U \sim U(0,1)$. The exact claim distribution $F_c$ we use will be derived in Section 3.3. With this distribution we estimate the total claim of an accident, taking the case reserve and earlier payments into account, where possible. This means we do not estimate each individual payment, but rather the sum of these payments.

For both sets of regressions we take the possibility of zero payments and extreme values into account. Correcting for claims without payments means using a zero-inflated distribution to fit the claim sizes, such as the Zero Adjusted Gamma (ZAGA) distribution or the Zero Adjusted Inverse Gaussian (ZAIG) distribution. We estimate a probability of zero payments, using all known characteristics of a claim, and incorporate this in our simulation process.

In order to correct for extreme values we use Extreme Value Theory to determine which claims are extreme values, using Hill plots to determine the exact threshold. Then we will fit an extreme value distribution such as the Generalized Extreme Value (GEV) or the Generalized Pareto (GP) distribution to the tail of the data. We will do so in Section 3.3. Using the assumptions made in this section we can give an expression for the probability density function of any claim size. If we define $p$ as the probability of a zero payment, $q$ as the probability of an extreme payment, $\mathcal{F}$ as all known information of a claim ($\mathcal{F}_{p_i}$ and $\mathcal{F}_{c_j}$) and $u$ as the threshold for extreme claims, we can write:

$$f_c(x) = \begin{cases} 0 & \text{if } x < 0 \\[1ex] p & \text{if } x = 0 \\[1ex] \dfrac{f_{nc}(x\,|\,\mathcal{F})}{F_{nc}(u\,|\,\mathcal{F}) - F_{nc}(0\,|\,\mathcal{F})}\,(1 - p - q) & \text{if } 0 < x < u \\[2ex] \dfrac{f_{ec}(x\,|\,\mathcal{F})}{1 - F_{nc}(u\,|\,\mathcal{F})}\,q & \text{if } x \geq u \end{cases} \tag{2.13}$$

with $f_{nc}(x\,|\,\mathcal{F})$ the pdf for normal (bulk) claims and $f_{ec}(x\,|\,\mathcal{F})$ the pdf for extreme claims. Note that $F_{nc}(0\,|\,\mathcal{F})$ is zero for most distributions used for claim sizes (such as Gamma and log-normal), but this does not always have to be the case.
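To show how Equation (2.13) is used in simulation, the sketch below draws total claim sizes from a zero-inflated, spliced severity density with inverse-cdf sampling for the bulk (cf. Equations (2.11)-(2.12)). The Gamma body, generalized Pareto tail, probabilities p and q, and threshold u are all illustrative placeholders for the distributions and estimates derived in Section 3.3.

```python
import numpy as np
from scipy import stats

def simulate_claim_sizes(n, p=0.10, q=0.02, u=50_000.0, seed=0):
    """Draw total claim sizes from a density of the form of Equation (2.13):
    an atom at zero, a truncated bulk distribution on (0, u) and an extreme
    value tail above u. The chosen distributions are placeholders."""
    rng = np.random.default_rng(seed)
    body = stats.gamma(a=1.5, scale=2_000.0)                 # bulk claims
    tail = stats.genpareto(c=0.4, loc=u, scale=20_000.0)     # extreme claims
    sizes = np.zeros(n)
    # Component per claim: 0 = zero payment, 1 = bulk claim, 2 = extreme claim.
    comp = rng.choice(3, size=n, p=[p, 1.0 - p - q, q])
    # Bulk claims: inverse-cdf sampling restricted to (0, u), cf. Equation (2.11).
    n_bulk = int(np.sum(comp == 1))
    v = rng.uniform(body.cdf(0.0), body.cdf(u), size=n_bulk)
    sizes[comp == 1] = body.ppf(v)
    # Extreme claims: drawn from the tail distribution, which lives above u.
    sizes[comp == 2] = tail.rvs(size=int(np.sum(comp == 2)), random_state=rng)
    return sizes

claims = simulate_claim_sizes(10_000)
print(claims.mean(), np.quantile(claims, 0.995), np.mean(claims == 0.0))
```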

Since it is possible for multiple payments to occur on a single claim, we also add the height of the current case reserve (or its logarithm) as an explanatory variable for the claim size, as having a higher reserve often leads to a higher total claim amount. We discuss this in the next section.

2.5 Multiple Payments

Since Bodily Injury claims often have multiple payments, our model needs to be able to simulate the total claim conditional on previous payments, since these impact the total payment on a claim. Additionally, for RBNS claims there already exists a case reserve, which is an indication of the total expected claim. To account for this extra information, we add the sum of all previous payments and the case reserve, the so-called incurred claim $R$, or its logarithm, as an explanatory variable in the claim size regression. The higher $R$, the higher the expected payment.

The incurred claims also have an impact on the probability of a zero payment. We expect that a large case reserve and multiple payments reduce the chance that the claim becomes a zero payment claim. This means our probability of zero payment p becomes dependent on the claim j.

Additionally, we make assumptions regarding extreme values, based on the case reserve and previous payments. In Section 3.3 we define the threshold $u$ for extreme claims. We assume that if our incurred claim is larger than this threshold it is automatically an extreme claim. We argue that if the total payment for a claim is above the threshold for extreme claims, this will most likely mean the payment itself is an extreme claim as well.


2.6 Solvency II one-year-view

Now that we have the mathematical basis for our model laid down, we can make some final adjustments for Solvency II legislation. In this section we will discuss a method to incorporate the Solvency II one-year-view in both our model and the benchmark model. Because of Solvency II regulations, insurers have to take the one-year risk of their liabilities into account. To do so, they have to look at a 1-in-200 year occurrence. This is known as catastrophe risk, and it requires an insurer to reserve extra capital. In terms of modeling, this comes down to using the results of the 99.5th percentile (simulation) of the reserve.

When such an event occurs, it will also impact the best estimate for the next year, as the ’bad year’ will be information used to estimate the new best estimate reserve. Effectively, this means the GLM parameters are shifted, so that more and higher claims are expected.

To quantify this effect in both our model and the chain ladder bootstrap model, we need to simulate one year ahead for both models. We will discuss our approach in the next subsections.

2.6.1 Individual Claim Reserving

Accounting for a one-year view in our individual reserving model is fairly simple. If we assume we have data up until point $t = T$, we can fit our model using $t = T - 1$ as evaluation date and run our simulations following the model described above. Sigmundsdóttir (2015) uses a similar approach in their article. The only difference is that instead of simulating the entire history of a claim, as described in earlier sections, we now only have to make simulations for a single year. We then take the 99.5th percentile worst scenario of the reserve and assume this (claim) simulation is our new dataset at evaluation date $t = T$.

Using the data of this simulation we refit the model and simulate the complete development of all claims, assuming the same policies are contracted as in our current dataset (see Footnote 8). Since this model has more and higher claims than the actual data, the new estimated best estimate (for $t = T$) will be higher than that of the model using the actual dataset with $t = T$ as evaluation date. The difference between the reserves will be the capital required to account for the risk of an increased best estimate due to the 1-in-200 year event.

Footnote 8: We use the same policies because we want to see the effect of solely the 'bad' year. Since insurers do not have policy data available for future years, they have to make assumptions about lapse and new policies. This is outside the scope of this thesis.

We also compare the best estimate and the distribution of the $t = T$ reserve, both with the actual data and the altered data, to those calculated by the chain ladder bootstrap. Summarizing, we need to run three separate simulations in order to obtain and compare our final results:

1. First off, we need to run the model with the evaluation date $t = T - 1$ and simulate one year ahead. We need the simulation that gives the 99.5th worst percentile of the reserve, to see the effect of a bad year on next year's reserve.

2. Using this 99.5th percentile we can estimate the best estimate reserve with evaluation date $t = T$. To do so we use the policy data we have but alter the claim development to our chosen simulation, so we can compare it to the actual best estimate with evaluation date $t = T$. Note that an insurer usually uses all available data. This means an insurer generally does not have next year's policy data available, so either assumptions have to be made on new policies or this model as it is can be used solely for backtesting.

3. Finally, to compare our ’worst case’ best estimate to the actual best estimate we need to run our second model again, but with the actual unaltered data. This gives us the best estimate for the reserve given all available data. The difference between the two reserves gives us the required capital to keep in case of a 1-in-200 year occurrence.

Since the first model requires us to simulate one year ahead instead of the full claim development, we need to distinguish between claims that close within that year and claims that are still open at $t = T$. We will elaborate on this in Section 3.4 and Appendix A. To compare our simulations to the chain ladder bootstrap we require a similar method to simulate one year ahead for our triangle approach. We will describe this in the next subsection.

2.6.2 Chain Ladder Bootstrap

For extrapolating a year in the chain ladder bootstrap we need to estimate values for the next triangle diagonal. To do so we take the 99.5th percentile simulation of the chain ladder bootstrap, in a similar way as we do for our individual model. We then use the simulated data for the new calendar year from this simulation as new data. Note that since we are restricted to calendar years, we can only choose an evaluation date $t = T$ at the end of a calendar year. Otherwise our triangle data would need to be extrapolated to a full year.

The method described above leaves us with one unknown value, as the bootstrap method does not predict the value for claims made on new policies. To estimate this last value, we assume that the percentage of claims paid in the same year as the claim occurs is always the same percentage of the total payments made. In our triangle shown in Appendix B we can see this percentage is 15.9% for 2011 and 15.2% for 2012, our last known year. Since the percentage is fairly close in the actual dataset, we use 15.9% as the estimate for the final value in our triangle.

Using this new diagonal we can simply use the chain ladder method to obtain the new best estimate of the reserve and the chain ladder bootstrap to obtain an empirical distribution. To be able to compare the triangle to our individual model we use the 'worst case' triangle as input for our chain ladder bootstrap. This gives us a new reserve distribution, much like our individual model does. For more details on the triangle approach we refer to Section 4.2 and Appendix B.


3 Case Study

In this chapter we will discuss the dataset we have used and shed more light on the distributions we have chosen. Then we will explain the steps we take to make our model suitable to calculate Solvency II related risk measures for reserve and catastrophe risk. We focus on the one-year view in particular. This chapter has a similar setup as Chapter 2.

3.1 Dataset

The dataset we have used is one of a European motor insurance company with several lines of business. Whereas Attema (2014) uses data of a so-called all-risk insurance product, we will use claims related to bodily injury, which tend to have a longer development pattern, since personal injury and lawsuits are involved. This is why we have allowed for multiple payments to be made on a single claim.

Our dataset spans from the start of 2008 to halfway through 2013. Since the product was fairly new, the number of policies increases every year, reaching its maximum in 2012 with an exposure of 126,000 policy years in that year. With this we mean that the sum of all exposures for policies in 2012 equals 126,000. For these policies we have both claim data and characteristic data of the policyholders available, such as their age, car type, mileage, experience and type of claim.

We use the policy data to estimate the frequency of claims and the more detailed claim data for estimating their severity. The complete dataset contains data of 652,400 unique policies and 61,823 claims occurring between 2008 and 2013. For the claims we have data available for every development month. In Table 3.1 we have summarized the available risk factors in the dataset.

We choose to use the dataset up until the end of 2012, since it will make computation more intuitive and more easily programmable. It will also enable us to compare our results to the chain ladder bootstrap, as that method requires year-to-year triangles. Doing so will leave us with a dataset of slightly over 500,000 policies. We will first fit our model using 31-12-2011 as a benchmark for our Solvency II one-year view. After simulating one year ahead and choosing the 99.5th percentile reserve, we simulate the full claim development for both models, using the unaltered dataset with evaluation date 31-12-2012 as a comparison.


Risk factor (policy) | Description | Minimum | Maximum
Starting date | Date the policy and coverage start | 22-8-2007 | 16-2-2015
End date | Date the policy and coverage end | 29-8-2007 | 27-2-2015
Age driver | The age of the policyholder | 17 | 104
Credit score | Credit score of the policyholder | 1 | 3
Claim free years | Claim-free years of the policyholder | 0 | 26
Years driver license | Years the policyholder has held a driver's license | 0 | 72
Home owner | Dummy home owner | 0 | 1
Usage | Usage of the car (private/work) | |
Number of cars | Number of cars owned | 1 | 11
Age vehicle | Age of the vehicle | 0 | 32
Fuel type | Fuel type of the vehicle (petrol/diesel/other) | |
Make/model | Make and model of the car | |
Purchase price | Purchase price of the car | 1,100 | 195,000
Mileage | Mileage in kilometers per year | < 10,000 | > 30,000
Vehicle power | Vehicle power (in kW) | 21 | 196
Vehicle weight | Vehicle weight (in kg) | 550 | 2,984
Region | Region of the policy | 0 | 5
Exposure | Exposure of the policy (in years) | 0 | 4
Claim frequency | Number of claims made on the policy | 0 | 7

Risk factor (claim) | Description | Minimum | Maximum
Claim date | Date the accident occurred | 7-9-2007 | 28-6-2013
Reporting date | Date the claim was reported | 7-9-2007 | 28-6-2013
Claim cause | The cause of the claim, divided into subtypes | |
Closing date | Date the claim is closed (if applicable) | 7-9-2007 | 28-6-2013
Development month | Month of development of the claim | 0 | 61
Total payments | Payments made (if applicable) | -15,400 | 278,000
Total reserve | Reserve (if applicable) | -15,400 | 3,636,000

Table 3.1: Available risk factors including descriptions. For claims additional data is available for every development month. If policies are renewed without changes, they are treated as a single policy.

3.2 The Number of Claims

In this section we will discuss the distributional assumptions we have made to simulate the number of claims. First we will discuss the NINR claims, afterwards we will talk about the IBNR claims and finally we make a short remark on our seasonality assumptions.

3.2.1 The Number of NINR Claims

For the number of NINR claims we have used a Poisson model, as described in Section 2.3.1. However, we still need to specify our model further. Before we can do so we have to conduct an analysis of the data. We do so in this section.

First we look at the average number of claims reported per year. As we can see in Table 3.2 below, the number of claims per policy steadily decreases over the years. That is why we have also decided to add the year of the policy as an explanatory variable. Note that the value for 2007 may be inaccurate due to the lack of exposure. Additionally we only have data up until 30 June 2013, so the number of claims should be doubled for the year 2013.

Year | Number of claims | Exposure (years) | Claims/year
2007 | 46 | 214 | 0.215
2008 | 3898 | 27103 | 0.144
2009 | 10463 | 73466 | 0.142
2010 | 12901 | 98834 | 0.131
2011 | 13805 | 114081 | 0.121
2012 | 14061 | 126146 | 0.111
2013 | 6649 | 130331 | 0.051

Table 3.2: Number of claims vs. exposure per year

The estimated lambdas for the complete set and the open policies (those that have not ended at the evaluation date) are shown in Figure 3.1 below. The analytical formulas have been derived in Section 2.3.1 and the actual coefficients are shown in Table 3.3 below. To make the graph more readable we scaled the frequency of the open claims to match that of the entire set. As we can see, the average claim frequency of the open claims is substantially lower than that of all policies, as also shown in Table 3.2. This further strengthens our assumption that the year of the policy should be added as an explanatory variable.

Figure 3.1: Estimated lambdas of the data available up to 31-12-2012, open policies in black, closed claims in gray

Since we allow our claim hazard to be dependent on time, we will add the year of the policy as an explanatory variable. For our claim numbers we use a Poisson distribution with a log link function, resulting in a multiplicative model. We can write the frequency for a claim, $\lambda_i$, as:

$$\lambda_i = \exp(\beta X_i) \tag{3.1}$$

with $\beta$ the estimated parameters and $X_i$ the characteristics of each policy. The results of the regression are shown in Table 3.3.
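To illustrate the multiplicative structure of Equation (3.1): the estimated frequency of a policy is the exponential of the intercept plus the coefficients of its risk-factor levels, i.e. a baseline frequency multiplied by one factor per characteristic. A toy calculation with a few of the coefficients from Table 3.3 (year 2011, credit score 2, mileage below 10,000, all other characteristics at their base levels):

```python
import numpy as np

# Selected coefficients from Table 3.3; base levels contribute zero.
intercept      = -2.20
year_2011      = 0.182
credit_score_2 = 0.158
mileage_lt_10k = -0.112

lambda_i = np.exp(intercept + year_2011 + credit_score_2 + mileage_lt_10k)
print(lambda_i)   # expected number of claims per year of exposure for this policy
```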

In this table we can see the results of our Poisson regression using our data up until 31-12-2012. Using the coefficients from Table 3.3 we can estimate a $\lambda_i$ for each policy to simulate the NINR claims. For additional regressions regarding our one-year view we refer to Appendix A. In this Appendix we have summarized the regressions we use for our one-year view (see Section 3.4).

Since many factors are significant when estimating claim frequency (and severity) we have obtained a model with many parameters. We justify this because we have a sufficiently large dataset of more than 500,000 policies, which leaves us with more than enough degrees of freedom to fit.

Parameter | Estimate | Std. error | p-value | t-value
Intercept | -2.20 | 0.018 | <2e-16 | -123.23
Year 2007 (base = 2012) | 0.308 | 0.060 | 2.6e-07 | 5.15
Year 2008 | 0.294 | 0.015 | <2e-16 | 19.39
Year 2009 | 0.332 | 0.013 | <2e-16 | 25.14
Year 2010 | 0.230 | 0.013 | <2e-16 | 18.01
Year 2011 | 0.182 | 0.012 | <2e-16 | 14.66
Car 5 to 15y old | -0.041 | 0.010 | 3.6e-05 | -4.13
Car >15y old | -0.068 | 0.011 | 2.1e-09 | -6.00
Credit score 2 | 0.158 | 0.009 | <2e-16 | 17.35
Credit score 3 | 0.217 | 0.014 | <2e-16 | 15.44
Number of cars >1 | -0.080 | 0.009 | <2e-16 | -9.30
Fuel type petrol (base = diesel) | -0.133 | 0.011 | <2e-16 | -12.43
Fuel type other | -0.068 | 0.080 | 0.40 | -0.85
Type American | 0.004 | 0.015 | 0.80 | 0.25
Type Asian | 0.008 | 0.011 | 0.51 | 0.66
Type large | -0.024 | 0.042 | 0.57 | -0.57
Type prestige | -0.592 | 0.190 | 0.002 | -3.12
Type sports | -0.361 | 0.076 | 2.3e-06 | -4.72
Car value 20000-40000 | 0.043 | 0.013 | 0.001 | 3.20
Car value >40000 | 0.132 | 0.040 | 0.001 | 3.26
Years license <10 | 0.170 | 0.012 | <2e-16 | 14.73
Age driver <23 | 0.182 | 0.032 | 8.8e-09 | 5.75
Age driver >75 | 0.039 | 0.040 | 0.33 | 0.97
Mileage <10000 | -0.112 | 0.010 | <2e-16 | -11.52
Mileage >30000 | 0.061 | 0.023 | 0.009 | 2.63
Vehicle weight <1000 | -0.033 | 0.015 | 0.027 | -2.21
Vehicle weight >1500 | 0.023 | 0.014 | 0.089 | 1.70
Claim free years | -0.020 | 0.0007 | <2e-16 | -30.59
Region 0 | 0.196 | 0.019 | <2e-16 | 10.21
Region 1 | 0.375 | 0.015 | <2e-16 | 25.80
Region 2 | 0.270 | 0.016 | <2e-16 | 16.61
Region 3 | 0.145 | 0.011 | <2e-16 | 13.12
Region 5 | -0.067 | 0.013 | 6.6e-07 | -4.97

Table 3.3: Table with frequency coefficients of the NINR regression, data up to 31-12-2012, corrected for exposure. Policy data was available for 509533 policies

We have added and deleted variables using type 1 and type 3 (deviance) tests and selected certain categorizations (groupings) using the Akaike Information Criterion (AIC) as described in Akaike (1974), a statistical measure defined as:

$$AIC = 2k - 2\ln(L) \tag{3.2}$$

where the model with the smallest AIC is preferred. The idea behind the AIC is that it chooses the model with the best fit, while correcting for the number of parameters. This has resulted in Table 3.3. Note that we have selected the grouping on a trial and error basis. This can be improved with grouping algorithms, but that is out of scope for this thesis.

As we can see in the table above, most factors are (individually) significant. For the variables that are not (such as type and age), F-tests on the entire categorical variables prove they are still jointly significant. Note that every (categorical) variable has a base. In all cases this is the level with the most exposure, as this reduces variance (see Footnote 9). Now we have a complete framework to simulate NINR claims. To simulate these NINR claims we have to correct the $\lambda_i$ for each policy depending on the season and integrate over each season until the end of the policy. For the IBNR claims, however, we still need to assess the reporting delay. We will do so in the next section.

Footnote 9: The grouping of the risk factors is done on a trial and error basis. They have been selected such that there is enough exposure in each group and so that there are not too many groups with similar coefficients. The AIC is taken into account as well.

3.2.2 The Number of IBNR claims

To simulate the number of IBNR claims, we need to make some assumptions regarding the reporting delay. First off, we assume the reporting delay remains the same over time, meaning the behaviour of policyholders regarding their claim reporting will not change. Next we assume these reporting delays are random variables that do not depend on any claim characteristics, i.e. they are the same for every policy. We make this assumption because the most important factors that determine the reporting delay (such as accident type and the size of the claim) are unknown if the claim has not yet been reported. The (empirical) distribution of the reporting delay is shown in Figure 3.2 below.

Figure 3.2: Distribution of the reporting delay

In our dataset, we only have the number of days of reporting delay available, so we would have to fit a discrete distribution to simulate the delay. Since we want to fit a continuous distribution on the reporting delay, we assume that claims are reported uniformly over the day. Therefore, we add a standard uniform variable to the reporting delay, as shown in Equation (3.3). Additionally, by doing so we ensure that this distribution does not need to be zero-inflated. So we define the adjusted, continuous reporting delay $T_{rd,con}$ as:

$$T_{rd,con} = T_{rd} + U \tag{3.3}$$

with $T_{rd}$ the reporting delay as obtained from the dataset and $U \sim U(0,1)$. We fit our continuous distribution on this adjusted reporting delay.

As we can see, the majority of the claims are reported in the first ten days, and afterwards the density becomes much thinner. If we fit the Log-normal, Exponential and Weibull distributions on this set we find that the Log-normal distribution fits best, as shown in Figure 3.3. However, if we look at the Q-Q plots, we see that there appears to be a large tail that the Exponential and Weibull distributions cannot fit, whereas the Log-normal distribution fits poorly for the bulk of the data.

Figure 3.3: Q-Q plots of Exponential, Weibull and Log-normal distribution and the reporting delay

That is why we decide to use a spliced distribution, one for the tail and one for the body (as described in Equation (2.13)). To determine what the appropriate threshold is for 'extreme' reporting delays, we use a Hill plot, as shown in Figure 3.4. As described in Hill (1975) and Tsay (2009), one way to determine the threshold for splicing the distributions is to see where the Hill plot becomes horizontal.

Figure 3.4: Hill plot of the reporting delay
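The Hill plot itself is easy to reproduce: for each k, the Hill estimator is the average log-ratio of the k largest observations against the (k+1)-th order statistic, plotted against k. A minimal sketch, applied here to a simulated heavy-tailed sample standing in for the observed reporting delays:

```python
import numpy as np
import matplotlib.pyplot as plt

def hill_estimates(x, k_max):
    """Hill estimator of the tail index for k = 2, ..., k_max (Hill, 1975)."""
    xs = np.sort(np.asarray(x, dtype=float))[::-1]      # descending order statistics
    ks = np.arange(2, k_max + 1)
    hills = np.array([np.mean(np.log(xs[:k]) - np.log(xs[k])) for k in ks])
    return ks, hills

# Illustrative heavy-tailed sample standing in for the reporting delays (days).
rng = np.random.default_rng(42)
delays = rng.pareto(1.2, size=20_000) * 5.0 + 1.0
ks, hills = hill_estimates(delays, k_max=6_000)

plt.plot(ks, hills)
plt.xlabel("order statistic k")
plt.ylabel("Hill estimate")
plt.show()
```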

From this Hill plot we can see that the graph becomes horizontal at approximately the 4500th order statistic, corresponding to a reporting delay of 30 days. This is why we fit the Exponential, Weibull and Log-normal distribution again, but on the dataset with reporting delays smaller than 30 days. For the remaining dataset we will fit other distributions. So we can write the density as:

$$f(x) = \begin{cases} \dfrac{f_{body}(x)}{F_{body}(30)}\,(1-p) & \text{if } 0 < x < 30 \\[2ex] \dfrac{f_{tail}(x)}{1 - F_{tail}(30)}\,p & \text{if } x \geq 30 \end{cases} \tag{3.4}$$

with $p$ the probability of an extreme reporting delay, $f_{body}$ and $f_{tail}$ the distributions for respectively the body and the tail, and $x$ the reporting delay. The actual density is described later in this section.

As we can see in Figure 3.5, the three distributions fit significantly better on the body of the dataset. Although it looks like the Log-normal distribution fits best, both the Exponential and Weibull distribution have a higher log-likelihood. Incidentally these are exactly the same, so for the body of the data we will use the Exponential distribution. Note that we will only use this distribution to fit reporting delays up until 30 days.

Figure 3.5: Q-Q plots of the Exponential, Weibull and Log-normal distribution and the reporting delay capped at 30 days. We split the Q-Q plots between tail and body, to see which distribution fits best for each part.

As for the tail of the data, we consider three distributions often used for extreme values, namely the Generalized Extreme Value (GEV) distribution, the Generalized Pareto (GP) distribution and the Gumbel distribution. The Q-Q plots shown in Figure 3.6 suggest the GEV distribution fits best for the tail of the data, with an overall decent fit. Although the log-likelihood is higher for the Gumbel distribution, we choose the GEV distribution as it has a better fit for the larger part of the dataset. The only downside is that we slightly overestimate the reporting delay for the 150 policies with a reporting delay longer than a year. We feel this is a better choice than severely underestimating most of the delays, as the Gumbel distribution does.

We have to note that, due to the nature of the dataset, reporting delays beyond 1517 days have not occurred, simply because the portfolio has no policies that have had an accident more than 1500 days in the past. We note that this might result in an underestimation of the reporting delay, because higher reporting delays have not been observed yet. Looking at the Q-Q plot of the GEV distribution we believe this will not be a problem, as it has a more right-skewed fit than our dataset, meaning the fit on the actual set will be closer than the Q-Q plot indicates.


Figure 3.6: Q-Q plots of the GEV, GP and Gumbel distribution and extreme values of the reporting delay

Due to Dutch legislation, claims reported after 5 years are not valid, meaning an insurer does not have to pay on claims that have occurred more than 1825 days ago. That is why we fit the GEV distribution for values between 30 days and 1825 days.

Summarizing, we will use a spliced density to determine the reporting delay. For reporting delays below a month (30 days) we use the Exponential distribution to fit the delays, and for the tail we use the GEV distribution.

As estimated from our dataset, the probability $p$ of a long (extreme) reporting delay equals 4492/61823. The (empirical) distribution of the reporting delay and the fit of Equation (3.5) are shown in Figure 3.7. The pdf of this spliced distribution can be written as:

$$f(x) = \begin{cases} \dfrac{f_e(x;\lambda)}{F_e(30;\lambda)}\,(1-p) & \text{if } 0 < x < 30 \\[2ex] \dfrac{f_{GEV}(x;\mu,\sigma,\xi)}{F_{GEV}(1825;\mu,\sigma,\xi) - F_{GEV}(30;\mu,\sigma,\xi)}\,p & \text{if } 30 \leq x < 1825 \\[2ex] 0 & \text{elsewhere} \end{cases} \tag{3.5}$$

with:

$$F_e(x;\lambda) = 1 - e^{-\lambda x} \tag{3.6}$$

$$F_{GEV}(x;\mu,\sigma,\xi) = \exp\bigl(-t(x)\bigr) \tag{3.7}$$

$$t(x) = \left(1 + \xi\,\frac{x-\mu}{\sigma}\right)^{-1/\xi} \tag{3.8}$$

$$f_\cdot(x;\cdot) = \frac{\partial F_\cdot(x;\cdot)}{\partial x} \tag{3.9}$$

and $p$ as above. In order to obtain the numeric values of our parameters we have to maximize the following likelihood:

$$L = \prod_{x} f(x) = \prod_{0 < x < 30} f_e(x;\lambda) \times \prod_{30 \leq x < 1825} f_{GEV}(x;\mu,\sigma,\xi) = \prod_{0 < x < 30} \lambda e^{-\lambda x} \times \prod_{30 \leq x < 1825} \frac{1}{\sigma}\,t(x)^{\xi+1}\,e^{-t(x)} \tag{3.10}$$


Solving our maximum likelihood problem results in the following parameter estimates: $\hat{\lambda} = 0.168$, $\hat{\mu} = 45.74$, $\hat{\sigma} = 21.25$, $\hat{\xi} = 1.10$. Now that we have constructed this distribution, we can simulate the IBNR hazard rate $\rho(t)$ using the method described in Section 2.3.2.
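Because λ only appears in the first product of Equation (3.10) and (μ, σ, ξ) only in the second, the two parts can be estimated separately: the exponential rate has the closed-form estimate 1 over the mean of the body observations, and the GEV parameters can be found by numerically minimizing the negative log-likelihood built from Equations (3.7)-(3.8). The sketch below illustrates this on simulated stand-in data; it is not the thesis code and, like Equation (3.10), it ignores the truncation constants of Equation (3.5).

```python
import numpy as np
from scipy.optimize import minimize

# Simulated stand-in for the (jittered) reporting delays in days.
rng = np.random.default_rng(7)
delays = np.concatenate([rng.exponential(6.0, size=55_000),
                         30.0 + 25.0 * rng.pareto(1.1, size=4_500)])
delays = delays[delays < 1825.0]

body = delays[delays < 30.0]
tail = delays[delays >= 30.0]

# Body: exponential rate, closed-form ML estimate from the first factor of (3.10).
lam_hat = 1.0 / body.mean()

# Tail: numerically maximize the GEV log-density of Equations (3.7)-(3.8).
def neg_loglik(theta):
    mu, sigma, xi = theta
    if sigma <= 0.0:
        return np.inf
    z = 1.0 + xi * (tail - mu) / sigma
    if np.any(z <= 0.0):
        return np.inf
    t = z ** (-1.0 / xi)
    # log f_GEV(x) = -log(sigma) + (xi + 1) * log(t(x)) - t(x)
    return -np.sum(-np.log(sigma) + (xi + 1.0) * np.log(t) - t)

res = minimize(neg_loglik, x0=np.array([45.0, 20.0, 1.0]), method="Nelder-Mead")
mu_hat, sigma_hat, xi_hat = res.x
print(lam_hat, mu_hat, sigma_hat, xi_hat)
```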

Figure 3.7: The empirical probabilities of reporting delay of all policies up until 31-12-2012 and its fitted distribution

To calculate the number of IBNR claims $N_{IBNR}$ at $t = T$, we would have to integrate over the (adjusted) hazard rate in Equation (2.7), which is an integral in itself. Then we get:

$$N_{IBNR}^{T} = \int_0^T \rho_i(t\,|\,\mathcal{F}_t)\,dt = \int_0^T \int_{T_r}^{t} \lambda_i(u)\,f_r(t-u)\,du\,dt \tag{3.11}$$

Because the hazard rate $\rho_i(t)$ is different for each policy, this would increase computation time quite drastically. Therefore we simulate the number of IBNR claims using an adjustment of a homogeneous Poisson point process, as described in Rao and Teh (2011) and Attema (2014). We will elaborate on this method below.

To reduce our computation time, we do not simulate the actual claim occurrences. Instead, we choose to simulate a number of claim realizations using a Poisson process with a parameter $\Omega_i$ larger than or equal to the maximum rate of our IBNR process, so $\Omega_i \geq \max_t(\rho_i(t))$. This means we simulate more claims than our process would predict. To account for using a higher hazard rate we have to thin the realizations afterwards.

A logical value for this Ω_i is the maximum value of the NINR process, max_t(λ_i(t)), since it is easily computed (it is λ_i corrected with the highest seasonal factor) and because it is always larger than ρ_i(t), as ρ_i(t) is a thinned version of λ_i(t), see Equation (2.4). After simulating k occurrences E_k, we thin the claims generated by this Poisson process. The probability that a claim actually occurs is equal to:

P(E_k) = \frac{\rho_i(T_{E_k} \mid \mathcal{F}_i)}{\lambda_{\max}} \qquad (3.12)

with \mathcal{F}_i our information set up until occurrence E_i, and T_{E_k} the time of the k-th occurrence. In essence, we thin our IBNR claims using the probability that they would actually occur. We note that we only simulate claim reports up until five years after the end of the policy, as the right to claim expires after five years in the Netherlands.
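The sketch below illustrates this thinning step for a single policy. The hazard function rho_i and the dominating rate lam_max used in the example are hypothetical; this is a minimal sketch of the procedure, not the implementation used in the thesis.

```python
import numpy as np

def simulate_ibnr_reports(rho_i, lam_max, horizon, rng):
    """Simulate IBNR report times on [0, horizon] by thinning a homogeneous
    Poisson process with rate lam_max >= max_t rho_i(t)."""
    # Step 1: candidate report times from the dominating Poisson process
    n_cand = rng.poisson(lam_max * horizon)
    candidates = np.sort(rng.uniform(0.0, horizon, size=n_cand))
    # Step 2: keep each candidate with probability rho_i(t) / lam_max
    keep = rng.uniform(size=n_cand) < rho_i(candidates) / lam_max
    return candidates[keep]

# Example with a hypothetical, decaying IBNR hazard rate (assumption)
rng = np.random.default_rng(42)
rho = lambda t: 0.02 * np.exp(-t / 400.0)        # claims per day, made up
reports = simulate_ibnr_reports(rho, lam_max=0.02, horizon=5 * 365, rng=rng)
```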

3.2.3 Seasonality

As mentioned before, we correct our claim occurrence process for seasonality. We do so by multiplying the λ_i of each policy by a season-specific factor. Since we force Equation (2.10) to hold, the average claim occurrence remains the same. The factors we use are shown in Table 3.4 and are derived as described in Section 2.3.

Season    Estimate
Winter    1.061
Spring    1.045
Summer    0.884
Autumn    1.010

Table 3.4: Seasonal correction factors

Since λ_i appears in both the formula for NINR claims and that for IBNR claims, we correct for seasonality in the same way for both types of claims. The results are fairly intuitive, with the fewest claims in summer and the most in winter. The relatively high estimate for spring compared to the other factors can be explained by the Dutch spring climate, which tends to be very rainy, leading to relatively many accidents.
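As a small illustration, the snippet below applies these factors to a base intensity; because the four factors average to one, the yearly mean intensity stays equal to λ_i. The dictionary keys and function name are our own.

```python
SEASON_FACTOR = {"winter": 1.061, "spring": 1.045, "summer": 0.884, "autumn": 1.010}

def seasonal_intensity(lam_i: float, season: str) -> float:
    """Season-adjusted claim intensity; the factors average to 1, so the
    yearly mean intensity remains lam_i."""
    return lam_i * SEASON_FACTOR[season]

# e.g. a policy with a yearly-average intensity of 0.05 claims per unit time
print(seasonal_intensity(0.05, "summer"))   # 0.0442
```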

3.3 Claim Sizes

Since an insurer is exposed on all open policies until the end of the contract, it has to reserve money for all claims that may still occur or have already occurred on these policies. That is why we have to look at both the size of NINR/IBNR claims and the size of RBNS claims to estimate the complete value of the reserve.10

Once we have simulated the number of NINR/IBNR claims we need to simulate the size of the claims for both these claims and the (open) RBNS claims. With claim size we mean the cumulative payments once a claim has been fully closed. To be able to simulate this, we run two separate regressions, as we have more data available for the RBNS claims (type of accident etc.).

Additionally, we have to use a different dataset for each type of claim. For the regression of NINR/IBNR claims we use the data of all closed claims available, as they give a good prediction of claim sizes for claims that have not occurred yet. For the RBNS claims, we use the data of all closed claims that were still open at the end of the year in which they occurred. We do so because we believe that claims that are open for a longer period of time have a different development pattern than claims that are closed more quickly. Here we use 'open at 31-12 of the year of occurrence' as a proxy for long-developing claims. One could argue this is a rather crude way to separate the data, but we believe it is the most accurate we have available.

Furthermore, we need a way to model the extreme claims. Similar to the extreme values of the reporting delay, we use a Hill plot to determine at what point a claim becomes

10 In this case, the reserve for NINR claims can be seen as a premium reserve, whereas the reserve for IBNR and RBNS claims can be seen as the outstanding claim reserve.


’extreme’. In Figure 3.8 below, we show the Hill plot of the claim sizes. We assume that the threshold for ’extreme claims’ is the same for NINR/IBNR claims as for RBNS claims.

If we look at the Hill plot and zoom in on the extreme values, we see that all (completely developed) claims above 8.000 have a similar tail behaviour and can therefore be classified as extreme values. So in both models, claims above 8.000 will be modelled using the GEV distribution, and the rest will be fitted with a Zero Adjusted Gamma distribution. For the GEV distribution we assume the parameters do not depend on claim characteristics.11 The cdf for extreme claims can be written as:

F_{GEV}(x;\mu,\sigma,\xi) = \exp(-t(x)) \qquad (3.13)

with:

t(x) = \left(1 + \xi\,\frac{x-\mu}{\sigma}\right)^{-1/\xi} \qquad (3.14)

and µ, σ and ξ the parameters of the GEV distribution.

Figure 3.8: Hill plot of the claim sizes
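For readers who want to reproduce a plot like Figure 3.8, the sketch below computes the Hill estimator of the tail index for a range of numbers of upper order statistics. The variable names are ours, and the claim sizes used here are simulated stand-ins, not the thesis data.

```python
import numpy as np

def hill_estimates(x, k_max):
    """Hill estimator of the (inverse) tail index for k = 2..k_max upper order statistics."""
    xs = np.sort(np.asarray(x, dtype=float))[::-1]          # descending
    ks = np.arange(2, k_max + 1)
    est = [np.mean(np.log(xs[:k - 1]) - np.log(xs[k - 1])) for k in ks]
    return ks, np.array(est)

# Stand-in data: heavy-tailed claim sizes (Pareto-like), for illustration only
rng = np.random.default_rng(0)
claims = rng.pareto(1.2, size=50_000) * 1_000 + 100
ks, hill = hill_estimates(claims, k_max=2_000)
# A stable plateau in 'hill' over k suggests where the Pareto-type tail starts.
```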

As for the probability p_i of a claim being an extreme value, we also assume there are no policy characteristics determining this probability, except for the type of the claim. We choose to estimate a different p_i for NINR/IBNR claims and RBNS claims, because we have additional data available for RBNS claims: we know the value of the case reserve for RBNS claims and incorporate this in the probability.

For RBNS claims we assume that if the total cost of a claim R (payments + case reserve) is over the threshold, it is automatically classified as an extreme claim and the total claim size is simulated from the GEV distribution. For claims with a total cost below the threshold we estimate a separate probability. This means we can simulate our claim sizes C_i as follows:

11 We also assume that this insurer is re-insured for claims above 5.000.000. We make this (reasonable) assumption because of the infinite variance of the GEV distribution, which sometimes leads to simulated claims of over a billion, which is very unlikely in real life.


C_i = \begin{cases}
F_{GEV}^{-1}(U;\mu,\sigma,\xi) & \text{if } V \le p_i \text{ or } R > 8000 \\
F_{body}^{-1}(U) & \text{otherwise}
\end{cases} \qquad (3.15)

with F_{GEV}^{-1}(U;\mu,\sigma,\xi) the inverse cdf of the GEV distribution, F_{body}^{-1}(U) the inverse cdf of the body, and U, V ∼ U(0,1) independent. Note that we assume one distribution for extreme claims (albeit with a different probability p_i for NINR/IBNR and RBNS claims), but two different distributions for the body, depending on whether the claim is NINR/IBNR or RBNS.

This is because for RBNS claims we can add additional explanatory variables, as characteristics such as the claim cause and the case reserve are known for claims that have already been reported. In Table 3.5 below we list the claim causes we distinguish:

Claim cause   Description
Cause A       Unskilled Driver
Cause B       Reckless Driving
Cause C       Missed Stop/Right-of-Way Sign
Cause D       Car Collision
Cause E       Minor Damage
Cause F       Damage Caused by Parking
Cause G       Nature Damage (storm etc.)

Table 3.5: Accident Causes

As mentioned before, for fitting the total claim size of the NINR/IBNR claims and the RBNS claims we need two separate regressions. For both regressions we choose to use a Zero Adjusted Gamma distribution (or ZAGA), to adjust for zero payments. We can write the regressions as:

\mu_i = \exp(\beta_1 X_{i,1}), \qquad \sigma_i = \exp(\beta_2 X_{i,2}), \qquad \nu_i = \exp(\beta_3 X_{i,3}) \qquad (3.16)

with β_j the parameter estimates and X_{i,j} the policy and claim characteristics for claim i and j = 1, 2, 3. Furthermore, µ_i is the expected value for the Gamma distribution, σ_i its standard deviation and ν_i the probability of a zero payment. The estimates from these regressions can be found in Table 3.6 and Table 3.7.

Note that in the RBNS regression we have also added the logarithm of the reserve at the time of evaluation as an explanatory variable. We have done so because the payments made so far on a claim affect the final claim size. We can only do this for RBNS claims, as this information is unavailable for NINR/IBNR claims.

We have chosen to add the logarithm of the reserve instead of its actual value because doing so gave a better fit. The +1 has to be added because the reserve at the evaluation date might also be zero, which cannot be processed by a logarithm.
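As an illustration of how these fitted regressions feed into the simulation, the sketch below evaluates the three linear predictors of Equation (3.16) for one claim and draws a claim size from the resulting zero-adjusted gamma distribution. The log links, the mean/dispersion parameterization (shape 1/σ², scale σ²µ), the clipping of ν to at most one, and all coefficient values are our own assumptions for illustration, not the thesis code.

```python
import numpy as np

def zaga_linear_predictors(x_mu, x_sigma, x_nu, beta_mu, beta_sigma, beta_nu):
    """Evaluate the three ZAGA predictors of Equation (3.16) for one claim."""
    mu = np.exp(x_mu @ beta_mu)            # expected claim size
    sigma = np.exp(x_sigma @ beta_sigma)   # dispersion
    nu = np.exp(x_nu @ beta_nu)            # probability of a zero payment
    return mu, sigma, min(nu, 1.0)         # clip nu, since exp() is not bounded by 1

def sample_zaga(mu, sigma, nu, rng):
    """Draw one claim size: zero with probability nu, otherwise Gamma with
    mean mu and shape 1/sigma^2 (so scale = sigma^2 * mu)."""
    if rng.uniform() < nu:
        return 0.0
    return rng.gamma(shape=1.0 / sigma**2, scale=sigma**2 * mu)

rng = np.random.default_rng(7)
# Made-up design vectors and coefficients, for illustration only
mu_i, sigma_i, nu_i = zaga_linear_predictors(
    x_mu=np.array([1.0, 1.0]), x_sigma=np.array([1.0]), x_nu=np.array([1.0, 3.0]),
    beta_mu=np.array([6.8, -0.1]), beta_sigma=np.array([0.27]),
    beta_nu=np.array([-3.0, 0.1]))
print(sample_zaga(mu_i, sigma_i, nu_i, rng))
```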


µ Parameters                  Estimates   Std. error   p-value    t-value
Intercept                      6.80        0.085       <2e-16      79.79
Year 2007 (base=2012)          0.637       0.099       1.0e-10      6.65
Year 2008                      0.420       0.082       3.2e-07      5.11
Year 2009                      0.391       0.083       2.1e-06      4.75
Year 2010                      0.343       0.084       3.9e-05      4.11
Year 2011                      0.309       0.085       0.0003       3.64
Car 5 to 15y old              -0.094       0.019       8.6e-07     -4.92
Car >15y old                  -0.012       0.020       0.53        -0.62
Credit score 2                -0.078       0.016       1.46e-06    -4.82
Credit score 3                 0.020       0.024       0.42         0.80
Type American                  0.005       0.027       0.86         0.18
Type Asian                    -0.055       0.020       0.007       -2.69
Type large                     0.294       0.069       2.2e-05      4.24
Type prestige                 -0.344       0.269       0.20        -1.28
Type sports                   -0.061       0.130       0.64        -0.13
Car value 20000-40000         -0.061       0.024       0.012       -2.52
Car value >40000               0.199       0.079       0.011        2.51
Years license <10             -0.043       0.021       0.038       -2.08
Age driver <23                 0.211       0.045       2.9e-06      4.68
Age driver >75                 0.292       0.077       0.0001       3.81
Vehicle weight <1000          -0.061       0.022       0.006       -2.78
Vehicle weight >1500           0.033       0.025       0.20         1.29
Claim free years              -0.014       0.001       <2e-16     -11.67
Region 0                      -0.049       0.034       0.16        -1.41
Region 1                      -0.114       0.025       5.4e-06     -4.55
Region 2                      -0.073       0.028       0.010       -2.59
Region 3                      -0.037       0.020       0.060       -1.88
Region 5                       0.130       0.024       6.1e-08      5.42

σ Parameters                  Estimates   Std. error   p-value    t-value
Intercept                      0.267       0.004       <2e-16      61.21
Car value 20000-40000          0.075       0.033       0.022        2.30
Car value >40000               0.018       0.011       0.097        1.66
Age driver <23                 0.028       0.019       0.14         1.50
Age driver >75                -0.119       0.040       0.003       -3.00
Vehicle weight <1000          -0.038       0.010       0.0001      -3.86
Vehicle weight >1500          -0.013       0.011       0.27        -1.11

ν Parameters                  Estimates   Std. error   p-value    t-value
Intercept                     -0.062       0.024       <2e-16     -26.22
Car 5 to 15y old              -0.459       0.026       <2e-16     -17.87
Car >15y old                  -0.718       0.026       <2e-16     -26.75
Claim free years               0.007       0.015       1.5e-06      4.82

Table 3.6: Gamma estimates for NINR/IBNR claims


µ Parameters                      Estimates   Std. error   p-value    t-value
Intercept                          6.08        0.052       <2e-16     117.13
log(Reserve at evaluation+1)       0.129       0.005       <2e-16      28.15
Accident type A (base=B)           0.706       0.063       <2e-16      11.13
Accident type C                    0.422       0.032       <2e-16      13.24
Accident type D                    0.057       0.025       0.021        2.31
Accident type E                    0.102       0.057       0.072        1.80
Accident type F                   -0.247       0.027       <2e-16      -9.11
Accident type G                    1.337       0.057       0.019        2.35
Year 2007 (base=2011)              0.379       0.082       4.5e-06      4.59
Year 2008                          0.213       0.038       2.3e-08      5.59
Year 2009                          0.151       0.039       0.0001       3.87
Year 2010                          0.098       0.042       0.021        2.31
Car 5 to 15y old                  -0.085       0.025       0.0007      -3.38
Car >15y old                       0.048       0.025       0.062        1.87
Car value 20000-40000             -0.031       0.024       0.19        -1.30
Car value >40000                   0.287       0.091       0.002        3.16
Years license <10                  0.033       0.024       0.14         1.36
Age driver <23                     0.283       0.059       1.9e-06      4.77
Age driver >75                     0.229       0.101       0.023        2.27
Mileage <10000                    -0.040       0.022       0.063       -1.86
Mileage >30000                     0.141       0.054       0.009        2.60
Region 0                          -0.052       0.046       0.25        -1.14
Region 1                          -0.061       0.032       0.061       -1.87
Region 2                          -0.097       0.036       0.008       -2.67
Region 3                          -0.092       0.026       0.0004      -3.52
Region 5                           0.096       0.032       0.002        3.03

σ Parameters                      Estimates   Std. error   p-value    t-value
Intercept                          0.229       0.005       <2e-16      49.26
Age driver <23                     0.052       0.025       0.035        2.11
Age driver >75                    -0.128       0.055       0.021       -2.31

ν Parameters                      Estimates   Std. error   p-value    t-value
Intercept                          0.079       0.038       0.039        2.07
log(Reserve at evaluation+1)      -0.196       0.005       <2e-16     -35.70
Accident type A (base=B)           0.351       0.092       0.0001       3.82
Accident type C                   -0.483       0.058       <2e-16      -8.33
Accident type D                   -0.129       0.039       0.001       -3.29
Accident type E                    0.917       0.069       <2e-16      13.24
Accident type F                    0.266       0.040       1.9e-11      6.72
Accident type G                    0.047       0.894       0.96         0.05

Table 3.7: Gamma estimates for RBNS claims. Evaluation date is 31-12-2012. Note that we have also conditioned on claim cause and case reserve.

Now that we have a distribution for the normal claims we can determine the complete claim distribution. Since we use a spliced distribution of ZAGA and GEV for both claim types, we can simulate the claim size for claim i as:

C_i = \begin{cases}
F_{GEV}^{-1}(U;\mu,\sigma,\xi) & \text{if } V \le p_i \text{ or } R > 8000 \\
F_{ZAGA}^{-1}(U;\mu_i,\sigma_i,\nu_i) & \text{otherwise}
\end{cases} \qquad (3.17)

with:


f_{ZAGA}(x;\mu_i,\sigma_i,\nu_i) = \begin{cases}
0 & \text{if } x < 0 \\
\nu_i & \text{if } x = 0 \\[1ex]
(1-\nu_i)\,\dfrac{1}{(\sigma_i^2\mu_i)^{1/\sigma_i^2}}\,\dfrac{x^{1/\sigma_i^2 - 1}\, e^{-x/(\sigma_i^2\mu_i)}}{\Gamma(1/\sigma_i^2)} & \text{if } x > 0
\end{cases} \qquad (3.18)

F_{GEV}(x;\mu,\sigma,\xi) = \exp(-t(x)) \qquad (3.19)

and R the outstanding claim reserve (zero for NINR/IBNR claims), U, V ∼ U(0,1), p_i = 0.00954 for NINR/IBNR claims and p_i = 0.00502 for RBNS claims. Fitting the GEV distribution on our extreme claims (> 8000) gives the following parameters: µ = 13896.48, σ = 7111.15, ξ = 1.035. These parameters result from maximizing the following likelihood:

L = \prod_x f(x) = \prod_{x \le 8000} f_{ZAGA}(x;\mu_i,\sigma_i,\nu_i) \times \prod_{x > 8000} f_{GEV}(x;\mu,\sigma,\xi) = \prod_{x=0} \nu_i \times \prod_{0 < x \le 8000} (1-\nu_i)\,\frac{1}{(\sigma_i^2\mu_i)^{1/\sigma_i^2}}\,\frac{x^{1/\sigma_i^2-1}\, e^{-x/(\sigma_i^2\mu_i)}}{\Gamma(1/\sigma_i^2)} \times \prod_{x > 8000} \frac{1}{\sigma}\, t(x)^{\xi+1} e^{-t(x)} \qquad (3.20)

Using the distributions and parameters above we can simulate the number of claims and their sizes. The results of our simulations, as well as those of our benchmark model, are described in Chapter 4.
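To make this concrete, the sketch below simulates one claim size according to Equation (3.17): a GEV draw for extreme claims (threshold 8.000, capped at the 5.000.000 reinsurance limit assumed in footnote 11) and a zero-adjusted gamma draw otherwise. Only the GEV parameters and the p_i values come from the thesis; the helper names and the illustrative ZAGA inputs are our own.

```python
import numpy as np
from scipy.stats import genextreme

THRESHOLD, REINSURANCE_CAP = 8_000, 5_000_000
# Fitted GEV tail from Section 3.3; scipy uses shape c = -xi
GEV_TAIL = genextreme(c=-1.035, loc=13_896.48, scale=7_111.15)

def simulate_claim_size(mu_i, sigma_i, nu_i, p_i, case_reserve, rng):
    """Spliced claim size as in Equation (3.17): GEV for extreme claims,
    zero-adjusted gamma for the body."""
    if rng.uniform() < p_i or case_reserve > THRESHOLD:
        # Extreme claim, capped at the assumed reinsurance limit (footnote 11)
        return min(GEV_TAIL.ppf(rng.uniform()), REINSURANCE_CAP)
    if rng.uniform() < nu_i:               # zero payment
        return 0.0
    # Gamma body with mean mu_i and shape 1/sigma_i^2, as in Equation (3.18)
    return rng.gamma(shape=1.0 / sigma_i**2, scale=sigma_i**2 * mu_i)

rng = np.random.default_rng(3)
# Illustrative inputs: an RBNS claim (p_i = 0.00502) with a 2,500 case reserve
print(simulate_claim_size(mu_i=900.0, sigma_i=1.3, nu_i=0.05,
                          p_i=0.00502, case_reserve=2_500, rng=rng))
```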

3.4 Solvency II one-year view

For our Solvency II one-year view we will take 31 December 2011 as our evaluation date (t = T − 1). We will fit the entire model using the same assumptions as above and simulate one year ahead. When doing so, we have to take into account that some claims will be open at the end of 2012 and some will be closed. This will mean fitting two extra severity regressions, which we describe in more detail in Appendix A.

For both NINR/IBNR claims and RBNS claims, we estimate the probability a claim will be closed or open by the end of the year, at 31 December 2012. We estimate this probability empirically, using all claim data we have available, by calculating how many claims that were open at 31 December of their year of occurrence are still open one year later. Using this probability we estimate the claim size using either the ’closed’ regression or the ’open’ regression as described in Appendix A.
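A minimal sketch of this empirical estimate is given below, assuming a claims table with an 'open at year-end' flag and an 'open one year later' flag; the column names and example data are hypothetical.

```python
import pandas as pd

# Hypothetical columns: year of occurrence and two open/closed indicators
claims = pd.DataFrame({
    "occ_year":            [2009, 2009, 2010, 2010, 2011],
    "open_at_year_end":    [True, True, True, False, True],
    "open_one_year_later": [True, False, False, False, True],
})

# P(claim still open one year later | open at 31-12 of its year of occurrence);
# this probability decides whether the 'open' or 'closed' severity regression is used
open_at_ye = claims[claims["open_at_year_end"]]
p_still_open = open_at_ye["open_one_year_later"].mean()
print(p_still_open)
```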

Then we take the 99.5th percentile reserve and use this simulation as a ’worst case’ situation. We use the data of this simulation to refit the model and simulate the claim development as described in this section. The difference between the best estimate reserve of this refitted model and that of the model with unaltered data then equals the capital required to absorb the effect of a bad year on next year’s best estimate. We do the same for our benchmark model.

For this new 2012 situation we also have to refit the GEV distribution and the probability of an extreme claim. All other regressions have to be re-calibrated as well. We do so using the models from Section 3.2 and Section 3.3 as a starting point. This is described in more detail in Appendix A.


4 Results

In this chapter we discuss the results of our simulations. First we discuss the results from the simulations with evaluation dates 31-12-2011 and 31-12-2012. Next we compare the one-year 99.5th percentile view with the best estimate of 2012. Finally we compare the results of our simulations with a benchmark model.

4.1 Simulation Results

In this section we will discuss the results we have obtained using the model described in Chapter 2 and Chapter 3. Using this simulation framework we can simulate claim occurrences and claim sizes, both for new claims and claims that have already occurred, given a reserve and previous payments.

Doing so and summing over all payments made on open claims and new claims, we obtain the reserve at the evaluation date. In Figure 4.1 we show the empirical distribution of the reserve (IBNR and RBNS), evaluated at 31-12-2012. Here we have simulated the complete claim development of all IBNR and RBNS claims and calculated how much money should be reserved. We see that the distribution is slightly right-skewed and has some extreme values. For 10000 simulations the reserve has a mean of 72.5 million (red line) and a 99.5th percentile of 106.1 million (blue line).

Figure 4.1: Full simulation results, using claim and policy data up until 2012, the mean in red (72.5 mln.), the 99.5th percentile in blue (106.1 mln.)
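The summary statistics quoted above follow directly from the vector of simulated reserves; a short sketch (the file name and array name are hypothetical):

```python
import numpy as np

# 'reserves' holds the 10,000 simulated total reserves (IBNR + RBNS)
reserves = np.loadtxt("simulated_reserves.txt")    # hypothetical input file
best_estimate = reserves.mean()                    # ~72.5 mln in the run shown above
p99_5 = np.percentile(reserves, 99.5)              # ~106.1 mln in the run shown above
```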
