
Estimating the Components of a Stochastic Model for Claim Reserving Based on Micro-Level Data

Master’s Thesis Financial Econometrics

University of Amsterdam

Faculty of Economics and Business

July 4, 2015

Author: Caroline Goedhart


Abstract

In this thesis data from a Dutch non-life insurance company is used to analyze a stochastic model for the complete claims process. Micro-level claim data is used, consisting of all transaction dates and payment severities. The data file contains more than one million claim records, which are used to estimate parametric models. Loss reserving techniques are usually based on aggregated data, in which specific claim information is not taken into account. By implementing a model based on micro-level data, future cash flows might be estimated more accurately. Furthermore, many limitations of the aggregated models are addressed by using a micro-level model. In this thesis the components of the statistical model of Antonio & Plat (2014) are analyzed, reproduced and extended with the available data.


Contents

1 Introduction
2 Literature
3 Data description
   3.1 Type of claims
   3.2 Occurrence of claims
   3.3 Reporting delay
   3.4 Settlement delay
   3.5 Type of events
   3.6 Payments
   3.7 Initial case reserve estimates
   3.8 Exposure
4 The model
   4.1 Marked Poisson process
   4.2 The likelihood function
      4.2.1 Background of the likelihood function: multitype recurrent events
5 Distributional Assumptions
   5.1 Reporting delay
   5.2 Occurrence of claims
   5.3 The development process of claims: hazard rates for the event types
   5.4 Payments
6 Conclusion
Appendix A: Additional graphs


1. Introduction

Due to new solvency guidelines the accuracy of the loss reserve estimate for insurance companies has become more important. The loss reserve is the amount of money that is put aside by an insurance company to be able to satisfy all outstanding liabilities for claims that occurred on or before a certain valuation date. The new guidelines in Solvency II set out many requirements for the estimation of the reserve. Recent literature gives critical views about the disadvantages of reserving methods that are currently used in practice.

In this introduction the basic concept of the development of a claim is explained first, followed by a short overview of the main ideas behind the reserving methods currently used in practice and of reserving methods based on micro-level data. Finally, the aim of this thesis is set out.

The development pattern of a claim is shown in Figure 1.1. The timeline starts when the event occurs (time t1 in Figure 1.1). After the occurrence, the insured has to report the event to the insurer. This moment is called the reporting time (time t2 in Figure 1.1). After a claim is reported to the insurer, one or more payments may follow until the settlement of the claim.

Figure 1.1: Timeline of the development of a claim.

For estimating the loss reserve, the insurer must take into account all claims with an occurrence date before the valuation date; claims with an occurrence date after the valuation date are not part of the outstanding liabilities at this valuation date. A distinction is made between two types of claims: reported claims and Incurred But Not Reported (IBNR) claims. IBNR claims have an occurrence date that lies before the valuation date, but they are not yet reported to the insurer at the valuation date. When the valuation time lies before t2 in Figure 1.1, the claim is an IBNR claim. Even though the claim is invisible to the insurer at the valuation date, it is part of the insurer's liabilities. The losses for IBNR claims are the most difficult to estimate because there is even uncertainty about their occurrence.

In practice, stochastic loss reserving methods are often based on aggregated data collected in run-off triangles. A run-off triangle is a summary of the underlying claim data, in which claims are grouped by properties like accident year, reporting year or payment year. A famous and commonly used triangle-based reserving method is the chain ladder method; its principles are explained in Mack (1993). However, some problems may arise when using triangles with aggregated data. An example of such a problem is the occurrence of cells in the triangles with values that are negative or equal to zero. Further disadvantages of this way of modeling are that the sample sizes are small and that the model does not allow the use of individual claim data.

An alternative to working with aggregated data is to use the underlying micro-level data, so that more of the available information about the claims is used. In recent literature a lot has been written about theoretical models for micro-level stochastic loss reserving. In England & Verrall (2002) it is mentioned that the traditional techniques, based on aggregated data, were developed before the existence of extensive computing facilities. One could question whether the traditional methods are outdated given the current state of knowledge. The current literature is mainly based on theoretical models; not many papers include a case study. Therefore the practical use of individual claim reserve modeling needs to be tested before one can think about implementing this method in real life. In Antonio & Plat (2014) a micro-level stochastic loss reserving model in continuous time is tested on a real-life dataset from a non-life insurance company. The exposure of policies, the occurrence date of a claim, the reporting date, the payment dates and the settlement date are modeled by likelihood functions and used as variables to predict outstanding liabilities. Out-of-sample predictions were made to test the performance of the individual claim model, and the results show that the micro-level stochastic loss model gives more accurate estimates of the reserve than models based on aggregated data.

In this thesis the components of a stochastic loss reserving model are estimated on real-life micro-level data. Data from a Dutch legal insurance company is used to build parametric models on this micro-level data, within the framework of Antonio & Plat (2014). The data comes from a very different kind of non-life insurer than the data used in Antonio & Plat (2014); therefore the practical implementation of their models is tested here. This thesis focuses on the specifications of the building blocks of the framework of Antonio & Plat (2014). All components of their model are estimated on the available data, and the results are analyzed extensively. The data is split into groups to improve the modeling accuracy. Furthermore, the effects of missing data and of using different kinds of data than in Antonio & Plat (2014) are explored.

In Section 2 some important findings in the literature about micro-level stochastic modeling of reserves are described. In Section 3 the available data is described. The theoretical model is introduced in Section 4. In Section 5 the distributional assumptions are explained together with the implementation and fit of these distributions. This thesis ends with a conclusion about the estimation results of all components of the stochastic model in Section 6, where the potential of implementing stochastic reserving models based on micro-level data is discussed as well.


2. Literature

The current reserving techniques for non-life insurers have some drawbacks. One of these drawbacks, already mentioned in the introduction, is the loss of information when individual claim data is summarized into run-off triangles. Another disadvantage of the triangle methods is the assumption of independence between early and late development years, as was mentioned by Zhao et al. (2009). Another drawback they mention is the assumption of identical distributions for claim frequencies, severities and delays, which may be violated due to changes over time. As is mentioned in Wüthrich & Merz (2008), most of the commonly used methods do not make a distinction between reported claims and IBNR claims. They give mathematical ideas on how one can model the different classes and how one could study development processes for individual claims by using the approaches of Arjas (1989) and Norberg (1993, 1999). In Arjas (1989) it is mentioned that most authors at that time agreed that there are important benefits from using structurally descriptive probabilistic models in insurance. Wüthrich & Merz (2008) mention that most of the research on individual claims development modeling is still in its infancy, as it mainly consists of mathematical and statistical properties of the models, and the framework is seldom used in practice in a mathematically consistent way.

Arjas (1989) formulates a probabilistic framework for the individual claim development process by using point processes and martingales. Norberg (1993) introduced a theoretical framework where claims are assumed to be generated by a non-homogeneous marked Poisson process, with the marks representing the developments of the individual claims. These frameworks were implemented and tested by Antonio & Plat (2014), one of the first publications in which a real-life dataset was used to test an implementation of micro-level stochastic loss modeling. They obtained more accurate results with the micro-level model than with methods based on aggregated data. The policy information used is the exposure of the policies: the number of policies for which the insurer carries liabilities, measured per month. In addition, claim-level data is used: the occurrence date, the reporting date, the payment dates and the settlement date. A likelihood function is obtained that can be split into different parts: a part for the reporting delay, a part for the occurrence process, and a part for the development process. Because the framework of Antonio & Plat (2014) is used in this thesis, a description of these processes is given in Sections 3 and 4. For every separate process an underlying distribution is estimated.

In Jin (2013) the framework of (a previous edition of) Antonio & Plat (2014) was used and tested with another dataset. The conclusion from this paper was that the micro-level model performs better than the macro-level model. The distributional assumptions in Jin (2013) are similar to those in Antonio & Plat (2014). A difference between these papers is that in Jin (2013) the Pareto distribution is used for modeling the payment density instead of the lognormal distribution used in the original framework. Another notable difference between the models is that in the original model from Antonio & Plat (2014), exposure data is used. The framework of Norberg (1993) takes the time-varying risk exposure as the non-homogeneous Poisson intensity measure. For the dataset used by Jin (2013) there was no exposure data available. Instead of using exposure data, they defined a new variable that was estimated by the model. In Jin (2013) it is written that the lack of exposure data does not have to be a problem when the book of business of an insurer is known. In Section 3 the exposure data available for this thesis is explained.

In Pigeon et al. (2013), the same dataset is used as in Antonio & Plat (2014). Here the development over time is assumed to be discrete, and therefore discrete probability distributions are used. They define a reporting delay, a first payment delay and a later payments delay, all expressed as a number of periods. They define development factors, where they distinguish between development processes that consist of a single payment and development processes that consist of more than one payment. The results from the out-of-sample predictions for the stochastic losses are similar to those of Antonio & Plat (2014): the micro-level model gave more accurate results than several implemented aggregated data methods.

Zhao et al. (2009) present a semi-parametric model to estimate individual claim loss reserving. Their simulation study indicated that their obtained procedure can produce efficient estimates and predictions for claim loss reserving. However, there is no real life data used in their approach to test these indications.


3. Data description

The dataset used in this thesis comes from a Dutch legal services insurance company. A claim file is available that provides detailed information of all payments that are made from 1995 until 2013. The dataset consists of 445,720 claims, and 1,174,308 payments are registered for these claims. For all claims in this dataset, the type of case treatment (internal or external), the occurrence date, the reporting date, the payment date, the payment severity and the settlement date (in case the claim is closed) are known. Besides the claim file there is some policy information available for the years 2001 until 2013. A detailed description of the available information will be given in this section.

3.1 Type of claims

A distinction is made between two types of treatment for claims: internal and external treatment. Internal treatment means that the case can be handled by a paralegal who is employed by the insurance company. External treatment indicates that the claim is handled by an external party, which is often a law firm. Claims that are treated externally are in general more complex, tend to incur higher costs because of the high hourly rates, and have a different development pattern compared to internally treated claims. For that reason the two groups are modeled separately in this thesis. In the loss reserving method currently used by the insurer there are more than two distinct groups, with the result that some groups contain little data. In order to keep the group sizes large, only two groups are distinguished in this thesis.

3.2 Occurrence of claims

Figure 3.1 shows the total number of claims per occurrence month as well as the number of closed claims (at the reference date 31-12-2013). It is important to mention that the decline in the last months of the graph is mainly caused by the fact that there are IBNR claims. Another reason is that zero claims are not included in the dataset that was available for this thesis. Zero claims are claims that are reported but for which no payment has taken place


external treatment, this explains why the graph for internally treated claims looks more stable.

Figure 3.1: Number of claims occurred per month for the internal group (upper) and the external group (lower).

3.3 Reporting delay

The difference between the occurrence date and the reporting date of a claim is called the reporting delay. The reporting delay is an important driver of the IBNR reserves. In Figure 3.2 the reporting delay of the claims is shown in months.

The reporting delay distribution seems to be highly skewed to the right. Most of the claims are reported shortly after the occurrence. Table 3.1 shows the empirical probabilities of reporting delays from 0 until 8 days after the occurrence date. As the table shows, almost half of the claims in the dataset are already reported within these first days.

(11)


Figure 3.2: Reporting delay frequencies for claims with internal treatment (left) and external treatment (right).

Internal treatment

Days  Frequency  Percentage  Cum. percentage  Rel. freq.
0     18885      5.86        5.86             0.0586
1     24460      7.59        13.45            0.0759
2     16058      4.98        18.43            0.0498
3     15016      4.66        23.09            0.0466
4     13612      4.22        27.32            0.0422
5     12574      3.90        31.22            0.0390
6     13106      4.07        35.28            0.0407
7     12554      3.90        39.18            0.0390
8     10137      3.15        42.33            0.0315

External treatment

Days  Frequency  Percentage  Cum. percentage  Rel. freq.
0     6946       10.4        10.40            0.1040
1     6735       10.1        20.48            0.1008
2     3747       5.61        26.09            0.0561
3     3371       5.05        31.14            0.0505
4     2901       4.34        35.48            0.0434
5     2546       3.81        39.30            0.0381
6     2491       3.73        43.03            0.0373
7     2373       3.55        46.58            0.0355
8     1706       2.55        49.13            0.0255

Table 3.1: Empirical probabilities for claims which are reported within the first nine days since occurrence.
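As an illustration, a frequency table like Table 3.1 can be built from the claim records in a few lines of base R; the data frame `claims` and its date columns below are hypothetical stand-ins for the actual claim file.

```r
# Sketch: empirical reporting-delay probabilities for the first nine days,
# assuming a hypothetical data frame `claims` with Date columns
# `occurrence_date` and `reporting_date`.
delay_days <- as.integer(claims$reporting_date - claims$occurrence_date)
freq       <- table(factor(delay_days[delay_days <= 8], levels = 0:8))
rel_freq   <- as.numeric(freq) / length(delay_days)
delay_tab  <- data.frame(days            = 0:8,
                         frequency       = as.numeric(freq),
                         percentage      = 100 * rel_freq,
                         cum_percentage  = cumsum(100 * rel_freq),
                         rel_freq        = rel_freq)
delay_tab
```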

3.4 Settlement delay

The settlement delay is defined as the time difference between the reporting time of a claim and its closure date. Figure 3.3 shows the histograms of the settlement delays for the group with internal treatment and the group with external treatment. The difference between the settlement delay distributions of these two groups is large, as can be seen from the figures. Claims that are handled externally have a longer development time than claims that are treated internally. The main reasons are that these cases are more complicated or have a higher financial interest.


Figure 3.3: Settlement delay frequencies for internal treatment (left) and external treatment (right).

3.5 Type of events

During the development of a claim, which for now is defined as the time from notification until settlement, events like intermediate transactions might occur. Three types of events are distinguished in this thesis, following the work of Antonio & Plat (2014). These three types of events are defined as:

• type 1: settlement without a payment;

• type 2: payment with settlement at the same time;
• type 3: payment without settlement.

Figure 3.4 shows the cumulative number of events during the development in months. For both groups, type 3 events (intermediate payments) occur with the highest frequency and type 2 events (payment with settlement) occur with the lowest frequency.

For the claims with external treatment, the gap between the number of type 3 and the number of type 1 events is larger than for claims with internal treatment. In Table 3.2 the percentages of the three event types relative to the total number of events are given. The percentage of type 1 events is almost twice as large for the internal group as for the external group. For the claim group with external treatment only four percent of the events are a settlement together with a payment.
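The mapping of transactions to these three event types is mechanical once payment and settlement indicators are available; the following is a minimal R sketch in which the data frame `transactions` and its two indicator columns are hypothetical.

```r
# Sketch: assigning the three event types to transaction records, assuming a
# hypothetical data frame `transactions` with logical columns `has_payment`
# (a payment is made at this transaction) and `is_settlement` (the claim is
# settled at this transaction).
event_type <- with(transactions,
  ifelse(is_settlement & !has_payment, 1L,      # type 1: settlement without payment
  ifelse(is_settlement &  has_payment, 2L,      # type 2: payment with settlement
                                       3L)))    # type 3: payment without settlement
table(event_type) / length(event_type)          # shares per type, cf. Table 3.2
```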


Figure 3.4: Cumulative number of each event over development months for the internal group (left) and for the external group (right).

Internal treatment

Type          Number    Percentage
Type 1        331306    29%
Type 2        127640    11%
Type 3        700382    60%
Total events  1159328   100%

External treatment

Type          Number    Percentage
Type 1        60405     15%
Type 2        15055     4%
Type 3        331231    81%
Total events  406691    100%

Table 3.2: Numbers of events per type and their percentages of the total number of events.

3.6 Payments

The severity of payments is of great importance for estimating the future cash flows. The costs that are analyzed in this thesis are external costs, which are all costs where another party is involved. Examples are court fees, lawyer costs and payments directly to the policy holders. Internal costs, such as personnel costs and office rental costs, are not taken into account in this thesis. There is a large difference in payment size between the internally treated and the externally treated group. Claims that are treated externally involve in general remarkably higher external costs than claims in the internally treated group.

The payments are transformed to log-payments. A reason for this transformation is that the payments are right skewed, and by performing a log-transformation the distribution becomes more symmetric. Furthermore, the payment amounts lie closer to each other on the log scale. In order to perform the log-transformation, 'negative costs' were excluded from the data. Sometimes money is received instead of paid; a reason for this could be a transaction mistake or a deposit repayment. In case of a transaction mistake the paid as well as the received amount are excluded from the data. Because there are not many 'negative cost' transactions, they are excluded from the data. Payments that are greater than the coverage limit are excluded as well. Figure 3.5 shows the histograms of the payments.

Figure 3.5: Histograms of the log-payments, the left figure represents the internally treated group and the right figure the externally treated group.

The histograms show several spikes, which are caused by frequently occurring payment amounts. The payment amount that is most frequently paid in the internal group is 73 euros; nine percent of all internal payment transactions were of this size. Further, many payment amounts lie between 74 and 83 euros, which causes the second highest spike. The other large spike lies around the amount of 40 euros. These spikes are caused by court fees, which are fixed costs one has to pay when starting a lawsuit. Court fees are specified for different types of cases. This year, the court fee for the type of court case that is mostly treated internally is 78 euros[1]. The court fee for another type of case[2] that often has internal treatment is 45 euros this year. The peaks are caused by these court fee payments. These fees change occasionally and therefore the peaks are not exactly equal to the current court fees. Further payments in this group are expert fees, which can differ a lot between cases. However, some outliers might occur in this group. For example, when the insurer makes a mistake during the case treatment and is held liable, it might be obligated to pay for the loss. These loss amounts can be very large, depending on the case. Another situation where higher payments are made is when a case is surrendered for an agreed amount to prevent outsourcing of the case.

[1] The current court fees can be found at: http://www.rechtspraak.nl/naar-de-rechter/kantonrechter/kosten/pages/griffierechten.aspx
[2] The current fees from other departments can be found at: http://www.rechtspraak.nl/procedures/tarieven-griffierecht/pages/griffierecht-bij-de-rechtbank.aspx

The highest costs for the external group are caused by declarations of external lawyers that handle the case. The two largest spikes in the histogram of the external group lie around 1000 and 2000 euros. These amounts correspond to fixed fees that are agreed with the external lawyers. Other spikes lie around 40 and 80 euros, which are perhaps caused by court fees.

In Figure 3.6 the kernel density functions for both groups are shown together. A Gaussian smoothing kernel is used. As explained in this section, the payments for both groups are distributed very differently and therefore they need to be modeled separately.


Figure 3.6: Kernel densities of the payments for the internal group (red) and the external group (blue).

3.7 Initial case reserve estimates

Some information was available about the initial case reserve estimates, which is the estimated amount that will be paid for a claim. When these estimates were investigated it turned out that there was a great lack of information. For the largest part of the claims, the claim handler did not fill in a case reserve estimate. In cases where the claim estimate was available, the values were very often not representative. For this reason the initial case reserve estimates are not used in the models that are described in the next section.


3.8 Exposure

As mentioned in Norberg (1993), the exposure rate may be thought of as a simple measure of volume or size of the business. Antonio & Plat (2014) used the 'earned' exposure, which can be defined as the number of policies that is insured in the observed period. This type of exposure data was not available for this thesis. Instead, data is available about the total premium earned, corrected for inflation, for the years 2001 until 2013. Because there is no monthly or quarterly data available, seasonal effects cannot be extracted from this exposure information.


4. The model

In this section the statistical models implemented from Antonio & Plat (2014) are described. The variables that are used in the models are defined below:

• $\tau$ is the valuation date
• $T_i$ is the occurrence time
• $U_i$ is the reporting delay
• $E_{ij}$ is the type of the $j$th transaction:
   – type 1: settlement without payment
   – type 2: settlement and payment at the same time
   – type 3: payment without settlement
• $P_{ij}$ is the amount paid at the $j$th transaction
• $V_{ij}$ is the time at which the $j$th event occurs, measured since notification
• $X_i$ is the development process after notification: $(V_{ij}, E_{ij}, P_{ij})$ for $j = 1, \ldots, J_i$

Here the index $i$ indicates claim $i$. For the prediction of outstanding liabilities a distinction is made between three types of claims. As explained in the introduction there is a clear distinction between IBNR and reported claims. Within the group of reported claims, a further distinction is made between settled and open claims. These three groups are defined as:

• IBNR claims: $T_i + U_i > \tau$ and $T_i < \tau$;
• open claims: $T_i + U_i \leq \tau$ and the development $X_i$ is censored at $(\tau - T_i - U_i)$, the time between the reporting date and the valuation date, because this is the only part observed;
• settled claims: $T_i + U_i \leq \tau$ and the development $X_i$ is fully observed, because the claim is settled on or before the valuation date.


4.1 Marked Poisson process

Similar to Antonio & Plat (2014), claims are treated as a Position Dependent Marked Poisson Process (PDMPP). The marked Poisson process is explained in Karr (1991). Arjas (1989) and Norberg (1993, 1999) developed a theoretical approach for the claim process, which is used by Antonio & Plat (2014). The occurrence time is taken as a point and the reporting delay and development together are the associated mark. The occurrence of claims is assumed to be a non-homogeneous Poisson process with $\lambda(t)$ as its intensity measure. The associated mark distribution consists of the reporting delay distribution $P_{U|t}$ and the development distribution $P_{X|t,u}$, together denoted as $P_{Z|t}$. The intensity measure with associated mark distribution can now be defined as:

$$\lambda(dt) \cdot P_{Z|t} = \lambda(dt) \cdot P_{U|t}(du) \cdot P_{X|t,u} \quad (4.1)$$

A distinction is made between reported claims and IBNR claims: $\mathcal{C}^r = \{(t, u, x) \in \mathcal{C} \mid t + u \leq \tau\}$ is the set of reported claims, and $\mathcal{C}^{IBNR} = \{(t, u, x) \in \mathcal{C} \mid t \leq \tau,\ t + u > \tau\}$ is the set of IBNR claims. Since $\mathcal{C}^r \cap \mathcal{C}^{IBNR} = \emptyset$, the processes are independent. The process for reported claims is a Poisson process with intensity measure and associated mark distribution defined as:

$$\lambda(dt)\, P_{U|t}(\tau - t)\, 1_{(t \in [0,\tau])} \quad (4.2a)$$
$$\cdot\, \frac{P_{U|t}(du)\, 1_{(u \leq \tau - t)}}{P_{U|t}(\tau - t)} \quad (4.2b)$$
$$\cdot\, P_{X|t,u}(dx) \quad (4.2c)$$

where equation (4.2a) is the occurrence measure, (4.2b) is the conditional distribution of the reporting delay and (4.2c) is the conditional distribution of the development process of a claim.

The intensity measure with associated mark distribution for IBNR claims is defined as:

$$\lambda(dt)\, \bigl(1 - P_{U|t}(\tau - t)\bigr)\, 1_{(t \in [0,\tau])} \quad (4.3a)$$
$$\cdot\, \frac{P_{U|t}(du)\, 1_{(u > \tau - t)}}{1 - P_{U|t}(\tau - t)} \quad (4.3b)$$
$$\cdot\, P_{X|t,u}(dx) \quad (4.3c)$$

where equation (4.3a) is the occurrence measure, (4.3b) is the conditional distribution of the reporting delay and (4.3c) is the conditional distribution of the development process of a claim.

4.2 The likelihood function

A parametric approach is taken in this thesis. A likelihood function obtained by Antonio & Plat (2014) will be optimized over the unknown parameters. The observed values of the occurrence date, the reporting delay and the claim development process are denoted by $T_i^o$, $U_i^o$ and $X_i^o$ respectively.

The likelihood of the observed claims is defined as:

$$\left\{ \prod_i \lambda(T_i^o)\, P_{U|t}(\tau - T_i^o) \right\} \exp\!\left(-\int_0^{\tau} w(t)\,\lambda(t)\, P_{U|t}(\tau - t)\, dt\right) \quad (4.4a)$$
$$\cdot\, \prod_i \frac{P_{U|t}(dU_i^o)}{P_{U|t}(\tau - T_i^o)} \quad (4.4b)$$
$$\cdot\, \prod_i P^{\,\tau - T_i^o - U_i^o}_{X|t,u}(dX_i^o) \quad (4.4c)$$

This likelihood function consists of three parts that can be optimized separately. Equation (4.4a) denotes the part of the likelihood for the occurrence process; this part needs to be optimized over the unknown parameter $\lambda$. The variable $w(t)$ indicates the exposure, which is a measure for the size of the business. Equation (4.4b) is the likelihood for obtaining the parameters of the underlying distribution of the reporting delay. Equation (4.4c) is the likelihood function for the development process. This likelihood is censored $\tau - T_i - U_i$ time units after notification, which is the time between the notification date and the valuation date. The development observables consist of the event types, the corresponding event occurrence times measured since notification and the payment severities, with observed values denoted by $E_{ij}$, $V_{ij}$ and $P_{ij}$ respectively. The likelihood function of equation (4.4c) can be split up into a likelihood for the event types and a likelihood for the payment severities. By specifying the likelihood function for the development process in more detail, the full likelihood function is defined as:

$$\left\{ \prod_i \lambda(T_i^o)\, P_{U|t}(\tau - T_i^o) \right\} \exp\!\left(-\int_0^{\tau} w(t)\,\lambda(t)\, P_{U|t}(\tau - t)\, dt\right) \quad (4.5a)$$
$$\cdot\, \left\{ \prod_i \frac{P_{U|t}(dU_i^o)}{P_{U|t}(\tau - T_i^o)} \right\} \quad (4.5b)$$
$$\cdot\, \prod_i \left\{ \prod_{j=1}^{N_i} h_1^{\delta_{ij1}}(V_{ij})\, h_2^{\delta_{ij2}}(V_{ij})\, h_3^{\delta_{ij3}}(V_{ij}) \right\} \cdot \exp\!\left(-\int_0^{\tau_i} h_1(w) + h_2(w) + h_3(w)\, dw\right) \quad (4.5c)$$
$$\cdot\, \prod_i \prod_j P_p(dP_{ij}) \quad (4.5d)$$

where equation (4.5c) is the likelihood for the event types and the durations of these events. This likelihood will be optimized over the underlying distributions of the hazard rates. The hazard functions for type 1, 2 and 3 events are denoted by $h_1(t)$, $h_2(t)$ and $h_3(t)$ respectively. In this likelihood, $\delta_{ijk}$ (with $k = 1, 2, 3$) is equal to 1 if the $j$th event of claim $i$ is of type $k$:

• If $E_{ij} = k$ then $\delta_{ijk} = 1$
• If $E_{ij} \neq k$ then $\delta_{ijk} = 0$

Further, $N_i$ equals the total number of events registered for claim $i$. The variable $\tau_i$ is defined as $\tau_i = \min\{\tau - T_i - U_i, V_i\}$, which indicates that the observation is censored $\tau - T_i - U_i$ time units after notification. Equation (4.5d) shows the likelihood function for the payment severity; this likelihood will be optimized over the parameters of the chosen payment density.

4.2.1 Background of the likelihood function: multitype recurrent events

The framework defined in the previous section for the development process is a framework of multitype recurrent events. Cook & Lawless (2007) give mathematical implementations for frameworks with recurrent events. They make the mathematically convenient assumption that two events cannot occur simultaneously.


The hazard rate for recurrent events is defined as the instantaneous probability of an event occurring at $t$, conditional on the process history $H(t)$:

$$\lambda(t \mid H(t)) = \lim_{\Delta t \downarrow 0} \frac{\Pr\{\Delta N(t) = 1 \mid H(t)\}}{\Delta t} \quad (4.6)$$

Theorem 2.1 from Cook & Lawless (2007) states that, conditional on $H(\tau_0)$, the probability density of the outcome of $n$ events at times $t_1 < t_2 < \ldots < t_n$, where $n \geq 0$, for a process with intensity as defined in equation (4.6), over the interval $[\tau_0, \tau]$, is:

$$\prod_{j=1}^{n} \lambda(t_j \mid H(t_j)) \cdot \exp\!\left\{-\int_{\tau_0}^{\tau} \lambda(u \mid H(u))\, du\right\} \quad (4.7)$$

which is the hazard rate times the survival function, both continuous functions.

These definitions can be extended to multitype recurrent events. The hazard intensity for subject $i$ (in this thesis, claim $i$) and event type $j$ is then defined as:

$$\lambda_{ij}(t \mid H_i(t)) = \lim_{\Delta t \downarrow 0} \frac{\Pr\{\Delta N_{ij}(t) = 1 \mid H_i(t)\}}{\Delta t} \quad (4.8)$$

In equation (6.2) of Cook & Lawless (2007), the likelihood for multitype recurrent events is defined as:

$$L_i = \left\{ \prod_{j=1}^{J} \prod_{k=1}^{n_{ij}} \lambda_{ij}(t_{ijk} \mid H_i(t_{ijk})) \right\} \cdot \exp\!\left(-\sum_{j=1}^{J} \int_0^{\tau_i} \lambda_{ij}(u \mid H_i(u))\, du\right) = \prod_{j=1}^{J} \left\{ \prod_{k=1}^{n_{ij}} \lambda_{ij}(t_{ijk} \mid H_i(t_{ijk})) \cdot \exp\!\left(-\int_0^{\tau_i} \lambda_{ij}(u \mid H_i(u))\, du\right) \right\} \quad (4.9)$$

Here $k = 1, \ldots, n_{ij}$ indicates the event number. This result is used to obtain the likelihood function for the event types defined in equation (4.5c). The likelihood function over all $i$ is defined as:

$$L = \prod_i \prod_{j=1}^{J} \left\{ \prod_{k=1}^{n_{ij}} \lambda_{ij}(t_{ijk} \mid H_i(t_{ijk})) \cdot \exp\!\left(-\int_0^{\tau_i} \lambda_{ij}(u \mid H_i(u))\, du\right) \right\} \quad (4.10)$$

The likelihood from equation (4.5c) can be obtained from equation (4.10). Because three event types are distinguished in this thesis, $J = 3$ in equation (4.10). For simplicity, in the following $t_{ijk} \mid H_i(t_{ijk})$ is written as $t_{ijk}$ and $\lambda_{ij}(u \mid H_i(u))$ as $\lambda_{ij}(u)$.

$$L = \prod_i \prod_{j=1}^{3} \left\{ \prod_{k=1}^{n_{ij}} \lambda_{ij}(t_{ijk}) \cdot \exp\!\left(-\int_0^{\tau_i} \lambda_{ij}(u)\, du\right) \right\}$$
$$= \prod_i \left\{ \prod_{k=1}^{N_i} \lambda_{i1}(t_{i1k})^{\delta_{i1k}}\, \lambda_{i2}(t_{i2k})^{\delta_{i2k}}\, \lambda_{i3}(t_{i3k})^{\delta_{i3k}} \right\} \cdot \exp\!\left(-\int_0^{\tau_i} \lambda_{i1}(u) + \lambda_{i2}(u) + \lambda_{i3}(u)\, du\right)$$

In this expression one should recognize the likelihood of (4.5c). Note that the definitions of $j$ and $k$ are reversed in this thesis compared to the definitions of Cook & Lawless (2007).


5. Distributional Assumptions

The likelihood function shown in Section 4 needs to be optimized. As stated before, the likelihood function can be split into four partial likelihood functions which can be optimized separately. Distributions need to be specified for the reporting delay, the event occurrence times and the payment severities. The distribution for the reporting delay is used in the likelihood for the occurrence of claims as well. In this section parametric distributions are specified based on the available data. All data until 31-12-2011 is used to fit the distributions. Graphical analysis and data preparation are done with the statistical software packages SAS and R.

5.1 Reporting delay

The reporting delay can be modeled using standard distributions from survival analysis, like the Weibull, the Gamma or the Exponential distribution. From graphical analysis, where the three mentioned distributions were fit to the data with the ‘MASS’ package in R, it followed that the Weibull distribution has a better fit to the reporting delay data than other continuous distributions that are commonly used in survival analysis. In Appendix A the P-P plots are shown for the fitted Weibull, Gamma and Exponential distribution and the empirical distribution function (ECDF) of the reporting delay.
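A minimal sketch of this comparison with the 'MASS' package is given below; the vector `delay` of strictly positive reporting delays (in months) is hypothetical, and the AIC comparison is an illustration rather than the exact procedure followed in the thesis.

```r
# Sketch: comparing candidate survival distributions for the reporting delay
# with MASS::fitdistr(); `delay` is a hypothetical vector of strictly positive
# reporting delays in months.
library(MASS)

fit_weibull <- fitdistr(delay, "weibull")
fit_gamma   <- fitdistr(delay, "gamma")
fit_exp     <- fitdistr(delay, "exponential")

# Compare the fits; a lower AIC indicates a better trade-off between fit and
# the number of parameters.
sapply(list(weibull = fit_weibull, gamma = fit_gamma, exponential = fit_exp), AIC)
```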

Antonio & Plat (2014) used a mixture of a truncated Weibull distribution with nine degenerate components corresponding to reporting 0 until 8 days after the occurrence. The reason for adding these discrete components for the first days is that a large part of the claims is reported within a few days after the occurrence. The shape of the reporting delay histogram in this thesis (as shown in Figure 3.2) looks similar to that of the histograms in Antonio & Plat (2014), but the reporting delays for the data used in this thesis tend to be larger in general. Therefore a similar density structure is used: a mixture of a truncated continuous distribution and some discrete degenerate components. The corresponding density is defined by:

$$\sum_{k=0}^{n-1} p_k\, I_k(U_i^o) + \left(1 - \sum_{k=0}^{n-1} p_k\right) \frac{f(U_i^o)}{1 - F\!\left(\frac{n-1}{365.25} \cdot 12\right)} \quad (5.1)$$

where $I_k = 1$ for the $k$th day after occurrence and 0 otherwise. Further, $f$ denotes the probability density function and $F$ the cumulative distribution function of the continuous distribution. The term to the right of the + sign is the truncated continuous distribution, which is left-truncated at $\frac{n-1}{365.25} \cdot 12$ months. Note that the reporting delay is modeled in months but the degenerate components represent days. The likelihood for the reporting delay was defined in (4.5b) as:

$$\prod_i \frac{P_{U|t}(dU_i^o)}{P_{U|t}(\tau - T_i^o)}$$

The numerator in this likelihood is already defined in equation (5.1). The denominator of this likelihood is defined by:

$$P_{U|t}(\tau - T_i^o) = \begin{cases} p_0 & \text{for day } 0 \\ p_0 + p_1 & \text{for day } 1 \\ \vdots & \\ \sum_{k=0}^{n-1} p_k & \text{for day } (n-1) \\ \sum_{k=0}^{n-1} p_k + \left(1 - \sum_{k=0}^{n-1} p_k\right) \cdot \dfrac{F(\tau - T_i^o) - F\!\left(\frac{n-1}{365.25} \cdot 12\right)}{1 - F\!\left(\frac{n-1}{365.25} \cdot 12\right)} & \text{for the } n\text{th day or later} \end{cases} \quad (5.2)$$

where 'day $k$' means the $k$th day after occurrence.

The starting point in this thesis was a mixture of a Weibull distribution with nine degenerate probability masses, as in Antonio & Plat (2014). Maximum likelihood optimization is performed in SAS to obtain parameter estimates from the likelihood based on equations (5.1) and (5.2). The optimization routine 'Proc NLmixed' was used for the optimization of the log-likelihood for the reporting delay. The results of the model with nine degenerate components are shown in Table 5.1.


Parameter estimates for the internal group

Parameter  Estimate  Standard Error  t-value  Pr > |t|  Lower     Upper
a          0.39350   0.00111         353.76   < .0001   0.39130   0.39570
b          1.49400   0.00316         404.27   < .0001   1.61970   1.63550
p0         0.05761   0.00040         143.52   < .0001   0.05683   0.05840
p1         0.07463   0.00045         165.57   < .0001   0.07375   0.07552
p2         0.04898   0.00037         131.45   < .0001   0.04825   0.04971
p3         0.04580   0.00036         126.80   < .0001   0.04509   0.04651
p4         0.04151   0.00035         120.33   < .0001   0.04084   0.04219
p5         0.03834   0.00033         115.37   < .0001   0.03769   0.03900
p6         0.03997   0.00034         117.93   < .0001   0.03930   0.04063
p7         0.03828   0.00033         115.27   < .0001   0.03763   0.03893
p8         0.03090   0.00030         103.00   < .0001   0.03032   0.03149

Parameter estimates for the external group

Parameter  Estimate  Standard Error  t-value  Pr > |t|  Lower     Upper
a          0.34200   0.00526         65.10    < .0001   0.33170   0.35230
b          1.18630   0.02118         56.01    < .0001   1.14480   1.22780
p0         0.10200   0.00116         87.78    < .0001   0.09976   0.10430
p1         0.09894   0.00115         86.29    < .0001   0.09669   0.10120
p2         0.05504   0.00088         62.91    < .0001   0.05333   0.05676
p3         0.04952   0.00083         59.50    < .0001   0.04789   0.05115
p4         0.04262   0.00078         55.00    < .0001   0.04110   0.04414
p5         0.03740   0.00073         51.39    < .0001   0.03598   0.03883
p6         0.03659   0.00072         50.82    < .0001   0.03518   0.03800
p7         0.03486   0.00070         49.55    < .0001   0.03348   0.03624
p8         0.02506   0.00060         41.81    < .0001   0.02388   0.02623

Table 5.1: Estimation results from the optimization routine in SAS for a Weibull(a,b) distribution with nine degenerate components.
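The thesis carries out this optimization in SAS with 'Proc NLmixed'; purely as an illustration, a minimal R sketch of maximizing the mixed density (5.1) with optim() could look as follows. The vectors `delay` and `day` are hypothetical, and for simplicity the sketch ignores the conditioning on reporting before the valuation date (the denominator in (4.5b)).

```r
# Sketch: maximum likelihood for the mixed reporting-delay density (5.1).
# Hypothetical inputs: `delay` = reporting delays in months, `day` = the same
# delays in whole days, n_deg = number of degenerate components (days 0..n_deg-1).
n_deg    <- 2
trunc_pt <- (n_deg - 1) * 12 / 365.25       # truncation point in months

negloglik <- function(par) {
  a <- exp(par[1]); b <- exp(par[2])        # Weibull shape and scale (> 0)
  p <- exp(par[3:(2 + n_deg)])              # degenerate masses p_0, ..., p_{n_deg-1}
  if (sum(p) >= 1) return(1e10)             # keep the mixture weights valid
  dens <- ifelse(day < n_deg,
                 p[day + 1],                # point mass for the first days
                 (1 - sum(p)) * dweibull(delay, a, b) /
                   (1 - pweibull(trunc_pt, a, b)))   # truncated Weibull part
  -sum(log(pmax(dens, 1e-300)))
}

fit <- optim(c(0, 0.5, rep(log(0.05), n_deg)), negloglik)
exp(fit$par)                                # back-transformed: a, b, p_0, p_1
```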

An explanation for the high t-values, which indicate a strong rejection of the null hypothesis that a parameter equals zero, lies in the definition of the t-statistic. As the t-statistic is defined by $\frac{\hat{\theta} - \theta_0}{SE(\hat{\theta})}$, where $SE(\hat{\theta})$ equals $\frac{s}{\sqrt{n}}$, the t-statistic is increasing in $\sqrt{n}$. Because the sample size $n$ is very large, a small difference between $\hat{\theta}$ and $\theta_0$ can already cause large t-values. When the t-value is large, the corresponding P-value will be very small.


P-values have some practical difficulties, and these are most apparent with large samples. In Raftery (1995) it is mentioned that when large samples are used, P-values tend to indicate rejection of the null hypothesis even when the null model seems reasonable theoretically and inspection of the data fails to reveal any striking discrepancies with it. Information criteria, which are log-likelihood criteria with a degrees-of-freedom adjustment, are often used for model selection. In this thesis two information criteria are used to select the number of components. One information criterion that is used to compare different models is the Akaike Information Criterion (AIC), which is defined as:

$$\mathrm{AIC} = -2 f(\hat{\theta}) + 2q \quad (5.3)$$

where $f$ is the log-likelihood function, $\hat{\theta}$ is the vector of parameter estimates and $q$ is the number of parameters. The other information criterion that is used is the Bayesian Information Criterion (BIC), which is defined as:

$$\mathrm{BIC} = -2 f(\hat{\theta}) + q \cdot \log(n) \quad (5.4)$$

where $f$ is the log-likelihood function, $\hat{\theta}$ is the vector of parameter estimates, $q$ is the number of parameters and $n$ indicates the number of observations.
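As a small illustration, both criteria can be computed directly from any fitted object that carries a log-likelihood; the object `fit` and the sample size `n` below are hypothetical.

```r
# Sketch: AIC (5.3) and BIC (5.4) for a fitted model; `fit` is a hypothetical
# fitted object with a logLik() method (e.g. a MASS::fitdistr fit) and `n`
# the number of observations.
ll  <- logLik(fit)
q   <- attr(ll, "df")                 # number of estimated parameters
aic <- -2 * as.numeric(ll) + 2 * q
bic <- -2 * as.numeric(ll) + q * log(n)
c(AIC = aic, BIC = bic)
```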

The AIC and BIC values are shown in Table 5.2 for the models with two, four and nine degenerate components. Based on these values, the model with two degenerate components is chosen.

Fit statistics for the internal group

Statistic n=9 n=4 n=2

-2 Log Likelihood 1.55E6 1.14E6 949215

AIC 1.55E6 1.14E6 949223

BIC 1.55E6 1.14E6 949266

Fit statistics for the external group

Statistic n=9 n=4 n=2

-2 Log Likelihood 381533 295362 257472

AIC 381555 295374 257480

BIC 381655 295428 257517

Table 5.2: Values of the information criteria for the Weibull model with nine, four and two degenerate components.

The parameter estimates for the model with two degenerate components are shown in Table 5.3. For the model with four degenerate components the parameter estimates can be found in Appendix B.


Parameter estimates for the internal group

Parameter Estimate Standard Error t Value Pr > |t| Lower Upper

a 0.5008 0.00119 419.38 < .0001 0.4985 0.5032

b 1.4940 0.00316 395.67 < .0001 1.2432 1.2556

p0 0.0579 0.00046 141.50 < .0001 0.0571 0.0587

p1 0.0750 0.00046 162.48 < .0001 0.0741 0.0759

Parameter estimates for the external group

Parameter Estimate Standard Error t Value Pr > |t| Lower Upper

a 0.3741 0.00271 138.20 < .0001 0.6880 0.3794

b 1.1356 0.00831 136.69 < .0001 1.1193 1.1519

p0 0.1023 0.00116 87.83 < .0001 0.0100 0.1045

p1 0.0992 0.00115 86.34 < .0001 0.0969 0.1014

Table 5.3: Estimation results from the optimization routine in SAS for the model with two degenerate components.

To check the calibration of the model chosen on the basis of the AIC and BIC values, the model with two degenerate components, the ML parameter estimates were used to perform some analysis in R. The maximum distance between the cumulative distribution function (CDF) and the empirical cumulative distribution function (ECDF), $\max_x |F_n(x) - F(x)|$ where $F_n(x)$ is the ECDF and $F(x)$ the CDF, is compared for distributions with different numbers of degenerate components. It turned out that this distance is the smallest for the mixed distribution with two degenerate components, which is also the distribution with the lowest AIC/BIC values. The CDF and ECDF of the mixed distribution with two degenerate components, together with the 95 percent confidence intervals of the ECDF of the reporting delay, are shown in Figure 5.1.
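A sketch of this distance check in R, reusing the (hypothetical) fitted quantities from the earlier optim() sketch, could look as follows.

```r
# Sketch: maximum distance between the fitted mixed CDF (5.2) and the ECDF.
# Hypothetical inputs: fitted Weibull parameters `a`, `b`, degenerate masses
# `p` (length n_deg), truncation point `trunc_pt`, and the observed delays
# `delay` (months) with `day` the same delays in whole days.
mixed_cdf <- function(u_months, u_days) {
  p_cum <- cumsum(p)
  ifelse(u_days < n_deg,
         p_cum[pmin(u_days, n_deg - 1) + 1],
         sum(p) + (1 - sum(p)) *
           (pweibull(u_months, a, b) - pweibull(trunc_pt, a, b)) /
           (1 - pweibull(trunc_pt, a, b)))
}
ecdf_delay <- ecdf(delay)
max(abs(ecdf_delay(delay) - mixed_cdf(delay, day)))   # cf. max |F_n(x) - F(x)|
```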


Figure 5.1: The CDF of the mixture of a truncated Weibull and two degenerate components (blue) and the ECDF (black) of the reporting delay for claims with internal treatment (left) and external treatment (right). The red lines are the 95% confidence intervals of the ECDF.

The graphs show for both groups that the CDF lies outside the confidence intervals for some regions. The confidence intervals are quite small because the number of observations n is large for both groups.

For further calibration checking, P-P plots are made. As Figure 5.2 shows, for both groups the CDF deviates from the ECDF at some points.

Figure 5.2: P-P plots of the CDF and the ECDF of the reporting delay for the internal group (left) and external group (right).

The reporting delay histograms together with the fitted mixed density are shown in Figure 5.3. The fitted mixed densities tend to give a reasonable fit.


Figure 5.3: The reporting delays and the fitted mixed density of a truncated Weibull density with two degenerate components for claims with internal treatment (left) and external treatment (right).

5.2 Occurrence of claims

The likelihood for the occurrence of claims was defined in (4.5a) as:

$$\left\{ \prod_i \lambda(T_i^o)\, P_{U|t}(\tau - T_i^o) \right\} \exp\!\left(-\int_0^{\tau} w(t)\,\lambda(t)\, P_{U|t}(\tau - t)\, dt\right).$$

The obtained distribution of the reporting delay is used in this likelihood function, as well as the exposure $w(t)$. The likelihood of the occurrence of claims needs to be optimized over $\lambda(t)$. As in Antonio & Plat (2014), a piecewise constant specification for the occurrence rate is used:

$$\lambda(t) = \begin{cases} \lambda_1 & \text{if } 0 \leq t < d_1 \\ \lambda_2 & \text{if } d_1 \leq t < d_2 \\ \vdots & \\ \lambda_m & \text{if } d_{m-1} \leq t < d_m \end{cases} \quad (5.5)$$

where $m$ is chosen such that there is one interval for every month, which means $m = 204$. Furthermore, the intervals are chosen such that $d_{m-1} \leq \tau < d_m$. For the exposure, $w(t) = w_l$ for $d_{l-1} \leq t < d_l$. To indicate whether $d_{l-1} \leq t_i < d_l$, an indicator variable $\delta_t(l, t_i)$ is included. The number of claims that have an occurrence date between $d_{l-1}$ and $d_l$ is then expressed by:

$$N^{oc}(l) = \sum_i \delta_t(l, t_i) \quad (5.6)$$

The likelihood with the piecewise constant specification of the occurrence rate is defined as:

$$\prod_{l=1}^{m} \lambda_l^{N^{oc}(l)} \prod_i P_{U|t}(\tau - t_i) \cdot \exp\!\left(-\lambda_1 w_1 \int_0^{d_1} P_{U|t}(\tau - t)\, dt\right) \cdot \exp\!\left(-\lambda_2 w_2 \int_{d_1}^{d_2} P_{U|t}(\tau - t)\, dt\right) \cdots \exp\!\left(-\lambda_m w_m \int_{d_{m-1}}^{d_m} P_{U|t}(\tau - t)\, dt\right) \quad (5.7)$$

Optimization over $\lambda$ leads to the following estimator:

$$\hat{\lambda}_l = \frac{N^{oc}(l)}{w_l} \cdot \frac{1}{\int_{d_{l-1}}^{d_l} P_{U|t}(\tau - t)\, dt} \quad (5.8)$$

This likelihood is optimized with the 'Proc NLmixed' procedure from SAS. In Antonio & Plat (2014) the ratio $N^{oc}(l)/w_l$ indicates the number of claims divided by the number of valid policies for a certain period. For the exposure data used in this thesis, there was no information available about the number of valid policies. Instead there was yearly information available about the total gross premium earned, corrected for inflation. Therefore seasonal effects are not taken into account with this exposure measure. In this thesis the ratio $N^{oc}(l)/w_l$ indicates the number of claims divided by the relative size, which is the relative growth or reduction of the occurrence year compared to the index year 2001. For all months in a year this exposure measure is assumed to be the same. For the years 1995 until 2000 the exposure information is missing, and for these years the exposure is assumed to be equal to that of the index year 2001.
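Given the fitted reporting-delay distribution and the exposure weights, the estimator (5.8) can be evaluated directly; a minimal R sketch, in which all inputs (`n_oc`, `w`, `d`, `tau`, `report_cdf`) are hypothetical, is given below.

```r
# Sketch: piecewise-constant occurrence rates via estimator (5.8).
# Hypothetical inputs: n_oc[l] = number of claims occurring in month l,
# w[l] = exposure weight of month l, d = vector of m + 1 interval boundaries
# (in months), tau = valuation time, report_cdf = fitted reporting-delay CDF
# (vectorized, e.g. the mixed_cdf sketched in Section 5.1).
lambda_hat <- sapply(seq_along(n_oc), function(l) {
  # probability mass of claims from month l that are reported before tau
  reported <- integrate(function(t) report_cdf(tau - t),
                        lower = d[l], upper = d[l + 1])$value
  n_oc[l] / (w[l] * reported)
})
```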


Figure 5.4: Maximum Likelihood Estimates of the piecewise specification of λ(t) together with their confidence bounds plotted over accident months for the group with internal treatment (above) and external treatment (below).


5.3 The development process of claims: hazard rates for the event types

The development process of claims consists of two parts: an event type part and a payment part. The likelihood of the types of events was defined in (4.5c) as:

$$\prod_i \left\{ \prod_{j=1}^{N_i} h_1^{\delta_{ij1}}(V_{ij})\, h_2^{\delta_{ij2}}(V_{ij})\, h_3^{\delta_{ij3}}(V_{ij}) \right\} \cdot \exp\!\left(-\int_0^{\tau_i} h_1(w) + h_2(w) + h_3(w)\, dw\right)$$

In Antonio & Plat (2014), a piecewise constant specification is used for the hazard rates $h_1(t)$, $h_2(t)$ and $h_3(t)$. They suggest Weibull hazard rates as a good alternative for the piecewise constant specification. In this thesis the use of Weibull hazard rates is examined. The Weibull hazard function is defined as:

$$h_k(w) = \frac{\alpha_k}{\beta_k^{\alpha_k}}\, w^{\alpha_k - 1} \quad (5.9)$$

where $\alpha_k$ is the shape parameter and $\beta_k$ the scale parameter of the Weibull distribution for event type $k$ ($k = 1, 2, 3$). The integral term in the likelihood function can be split into three parts:

$$\int_0^{\tau_i} h_1(w) + h_2(w) + h_3(w)\, dw = \int_0^{\tau_i} h_1(w)\, dw + \int_0^{\tau_i} h_2(w)\, dw + \int_0^{\tau_i} h_3(w)\, dw \quad (5.10)$$

By filling in the Weibull hazard rates, these integrals can be evaluated as:

$$\int_0^{\tau_i} h_k(w)\, dw = \int_0^{\tau_i} \frac{\alpha_k}{\beta_k^{\alpha_k}}\, w^{\alpha_k - 1}\, dw = \frac{\alpha_k}{\beta_k^{\alpha_k}} \left[ \frac{1}{\alpha_k}\, w^{\alpha_k} \right]_0^{\tau_i} = \beta_k^{-\alpha_k}\, \tau_i^{\alpha_k} \quad (5.11)$$

The complete likelihood for the types of events can now be defined as:

$$\prod_i \left\{ \prod_{j=1}^{N_i} \left( \frac{\alpha_1}{\beta_1^{\alpha_1}} V_{ij}^{\alpha_1 - 1} \right)^{\delta_{ij1}} \cdot \left( \frac{\alpha_2}{\beta_2^{\alpha_2}} V_{ij}^{\alpha_2 - 1} \right)^{\delta_{ij2}} \cdot \left( \frac{\alpha_3}{\beta_3^{\alpha_3}} V_{ij}^{\alpha_3 - 1} \right)^{\delta_{ij3}} \right\} \cdot \exp\!\left(-\left(\beta_1^{-\alpha_1} \tau_i^{\alpha_1} + \beta_2^{-\alpha_2} \tau_i^{\alpha_2} + \beta_3^{-\alpha_3} \tau_i^{\alpha_3}\right)\right) \quad (5.12)$$

In Antonio & Plat (2014) different parameters were estimated for initial claim reserve categories, and a distinction was made between first and later events. Because there is no appropriate data available about the initial case reserve estimates, this covariate could not be included in the model used in this thesis. The obtained likelihood is optimized with the 'Proc NLmixed' optimization routine from SAS. The estimation results are shown in Table 5.4.

Parameter estimates for the internal group

Parameter  Estimate  Standard Error  t-value  Pr > |t|  Lower    Upper
α1         1.032     0.00156         661.80   < .0001   1.029    1.035
α2         0.648     0.00167         388.44   < .0001   0.645    0.651
α3         0.633     0.00070         909.16   < .0001   0.632    0.635
β1         18.271    0.03916         466.60   < .0001   18.194   18.347
β2         55.862    0.33220         168.17   < .0001   55.210   56.513
β3         3.845     0.00983         391.03   < .0001   3.826    3.864

Parameter estimates for the external group

Parameter  Estimate  Standard Error  t-value  Pr > |t|  Lower    Upper
α1         1.256     0.00437         287.74   < .0001   1.248    1.265
α2         1.189     0.00831         143.08   < .0001   1.173    1.206
α3         0.836     0.00141         591.12   < .0001   0.833    0.839
β1         39.759    0.14550         273.34   < .0001   39.474   40.044
β2         124.430   1.31900         94.34    < .0001   121.845  127.015
β3         5.431     0.02168         250.52   < .0001   5.389    5.474

Table 5.4: Estimation results from the optimization routine in SAS for the Weibull hazard rates.

As was already mentioned in subsection 5.1, the t-statistics are large because the dataset is very large.
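As with the reporting delay, the Weibull hazard parameters can also be obtained outside SAS; the following minimal R sketch maximizes the likelihood (5.12) with optim(), with all input vectors hypothetical.

```r
# Sketch: ML estimation of the Weibull hazard parameters in likelihood (5.12).
# Hypothetical inputs: `v` = strictly positive event times since notification,
# `type` = the corresponding event types (1, 2 or 3), and `tau_i` = the
# censoring time of each claim (one entry per claim).
weibull_haz_negloglik <- function(par) {
  alpha <- exp(par[1:3]); beta <- exp(par[4:6])     # shape/scale per event type
  # log-hazard contributions of the observed events
  log_h <- log(alpha[type]) - alpha[type] * log(beta[type]) +
           (alpha[type] - 1) * log(v)
  # cumulative hazards of all three event types up to each claim's tau_i
  cum_h <- sapply(1:3, function(k) sum((tau_i / beta[k])^alpha[k]))
  -(sum(log_h) - sum(cum_h))
}

fit_haz <- optim(rep(0, 6), weibull_haz_negloglik, method = "BFGS")
exp(fit_haz$par)        # back-transformed: alpha_1..3, beta_1..3
```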

The Weibull hazard rates that follow from the optimization routine are compared to piecewise constant specified hazard rates. The intervals for the piecewise constant hazard estimation are selected such that they are of an equivalent size. Therefore, the intervals for the piecewise constant specification were chosen differently for the internal and the external group.


The piecewise constant hazard function for event $k$ for the internal group is defined as:

$$h_k(t) = \begin{cases} h_{k1} & \text{if } t \in [0, 0.2) \\ \vdots & \\ h_{k15} & \text{if } t \in [2.8, 3) \\ h_{k16} & \text{if } t \in [3, 4) \\ \vdots & \\ h_{k18} & \text{if } t \in [5, 6) \\ h_{k19} & \text{if } t \in [7, 9) \\ h_{k20} & \text{if } t \in [9, 12) \\ h_{k21} & \text{if } t \in [12, \infty) \end{cases} \quad (5.13)$$

For the external group the piecewise constant hazard function is defined by:

$$h_k(t) = \begin{cases} h_{k1} & \text{if } t \in [0, 4) \\ \vdots & \\ h_{k8} & \text{if } t \in [28, 32) \\ h_{k9} & \text{if } t \in [32, 40) \\ \vdots & \\ h_{k11} & \text{if } t \in [48, 56) \\ h_{k12} & \text{if } t \in [56, 72) \\ h_{k13} & \text{if } t \in [72, \infty) \end{cases} \quad (5.14)$$

The piecewise constant hazard rates are estimated with the 'Proc NLmixed' optimization routine as well. In Figure 5.5 the piecewise constant hazard estimates are shown together with the Weibull hazard estimates.


Figure 5.5: The Weibull hazards (red line) and the piecewise constant hazards (black line) for the internal group (upper) and for the external group (lower).

For some events, the estimates from the two different hazard specifications are quite close. However, for others there are remarkable differences. For the internal group, the hazard rates for events 2 and 3 are first increasing and later decreasing. This behaviour cannot be captured by the Weibull hazard rate. In Jin (2013), the hazard rates obtained by piecewise constant specification are likewise increasing for small $t$ and decreasing for larger values of $t$; the use of Weibull hazard rates is not evaluated there. If one compares the obtained hazards to those of Antonio & Plat (2014), one sees that they are quite different: all hazards obtained by Antonio & Plat (2014) are decreasing functions, which is not the case for the hazards obtained in this thesis. For both groups, the hazard function of event 1 is an increasing function. This means that the instantaneous probability of the event occurring at $t$, conditional on the process history, increases in $t$. This makes sense because the greatest part of the settled claims in the dataset had a type 1 event as terminal event, and as every claim has to settle at some point, the probability of settlement for a claim that is not yet settled increases over time.


In order to compare these outcomes, a nonparametric method for obtaining the survival functions was used as well. In Cameron & Trivedi (2009) nonparametric estimators are presented of the survival, hazard and cumulative hazard functions in the presence of independent censoring. They define the Kaplan-Meier estimator (or product limit estimator) as:

$$\hat{S}(t) = \prod_{j \mid t_j \leq t} \frac{r_j - d_j}{r_j} \quad (5.15)$$

where $d_j$ is the number of events at time $t_j$ and $r_j$ is the number of spells at risk at time $t_j$. Here $t_1 < t_2 < \ldots < t_j < \ldots < t_k$ denote the discrete failure times. In this thesis the data is continuous, so time $t_j$ is a floating point number. The Kaplan-Meier survival function is obtained with the 'survival' package for R. The Kaplan-Meier survival functions are compared to the Weibull survival functions and to the survival functions based on piecewise constant estimation of the hazards. The obtained survival functions are shown together in Figure 5.6 for both groups and each event type.
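A minimal sketch of obtaining such a Kaplan-Meier curve and overlaying a fitted Weibull survival function in R is shown below; the vectors `time` and `status` and the parameters `alpha` and `beta` are hypothetical.

```r
# Sketch: Kaplan-Meier survival curve for one event type with the 'survival'
# package, assuming hypothetical vectors `time` (months since notification
# until the event or censoring) and `status` (1 = event of this type observed,
# 0 = censored).
library(survival)

km_fit <- survfit(Surv(time, status) ~ 1, conf.int = 0.95)
plot(km_fit, xlab = "Months", ylab = "Survival function")

# Overlay the fitted Weibull survival function S(t) = exp(-(t / beta)^alpha),
# with `alpha` and `beta` hypothetical estimates such as those in Table 5.4.
curve(exp(-(x / beta)^alpha), from = 0, to = max(time), add = TRUE, col = "red")
```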


Figure 5.6: The piecewise constant survival functions (blue lines), the Weibull survival functions (red lines) and the Kaplan-Meier survival functions with their confidence intervals (black lines); the results for the internal group are shown on the left and for the external group on the right.

For type 1 and type 3 events, all three survival functions look very similar. However, the differences for type 2 events are larger. Type 2 events occur with the lowest frequency, as most of the claims have a type 1 event as terminal event and therefore a type 2 event will never occur for these claims.

A piecewise constant specification of the hazard rates might be preferred over Weibull hazard rates, as a Weibull hazard rate cannot be partly increasing and partly decreasing. However, in general the Weibull hazard estimates lie close to the piecewise constant hazards. An alternative could be to use another distribution for the hazards, like the log-normal distribution.


5.4 Payments

The likelihood for the payments was defined in (4.5d) as:

$$\prod_i \prod_j P_p(dP_{ij}) = \prod_i \prod_j f_p(p_{ij}) \quad (5.16)$$

with $f_p$ denoting the density function of the payments.

For both groups in the dataset, the payment density $P_p(dP_{ij})$ needs to be estimated. A log transformation is performed on the payments, because the differences in size of the transactions are much smaller on the log scale. Furthermore, the payments are right skewed, and a natural log-transformation of the payments results in more symmetric distributions.

In both groups there are some standard payment amounts, which have a high occurrence frequency. The payments therefore seem to consist of subgroups that follow different distributions. The payment data used in this thesis was neither discounted nor corrected for inflation. Antonio & Plat (2014) used the normal distribution to model the log-payments; they estimated parameters of the lognormal distribution for different reserve categories and development years. Because there are no initial case estimates available for the dataset used in this thesis, a finite mixture model is used instead. Finite mixture models provide flexible ways to account for unobserved heterogeneity in the underlying data. A finite mixture distribution with $k$ components is generally defined as:

$$f_p(p) = \sum_{n=1}^{k} f_{p_n}(p) \cdot \lambda_n \quad (5.17)$$

where $f_{p_n}(p)$ is the density of the $n$th subgroup and $\lambda_n$ is the corresponding weight; all weights together sum to 1. With the 'Mixtools' package for R multiple mixture models were tested for the payments. The 'Mixtools' package contains an expectation-maximization algorithm for several distributions, including the gamma and normal distributions. The Gaussian mixture model for the log-payments is chosen as the best fit, based on multiple trials with the EM algorithm in the 'Mixtools' package with different numbers of components $k$. The lognormal distribution is a commonly used distribution in finance for modeling losses and prices of financial products. As stated in McNeil (1997), the lognormal distribution has been a popular model for loss severity distributions. Further, McNeil (1997) states that the lognormal distribution is heavier tailed than the normal distribution, but it is not technically a heavy tailed distribution; it is moderately heavy tailed. In many applications it is quite a good loss severity model. The use of a lognormal distribution for loss modeling therefore seems to be a reasonable choice for this thesis.

When the distribution of the mixture components $f_{p_n}(p)$ is assumed to be normal, the finite mixture distribution is defined more specifically as:

$$f_p(p) = \sum_{n=1}^{k} f_{p_n}(p; \mu_n, \sigma_n) \cdot \lambda_n \quad (5.18)$$

where $f_{p_n}(p; \mu_n, \sigma_n)$ denotes the normal density with parameters $\mu_n$ and $\sigma_n$, $k$ denotes the number of mixture components and $\lambda_n$ denotes the weight of the corresponding component.

As a next step, the number of components $k$ and their weights $\lambda_n$ need to be obtained. In general, adding more components will improve the likelihood, but adding too many components can cause over-fitting. In order to choose the number of mixture components, a comparison is made based on BIC values. When basing model selection on the BIC, which was defined in equation (5.4), the model with the lowest BIC value is preferred. The BIC values for $k = 1$ until $k = 6$ are shown in Figure 5.7. The best choice for the internal group is the Gaussian mixture model with five components, and for the external group the mixture model with four components is preferred.
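A minimal sketch of this EM fitting and BIC comparison with the 'mixtools' package is given below; the vector `log_pay` of log-payments is hypothetical, and the parameter count uses the standard 3k - 1 for a k-component Gaussian mixture.

```r
# Sketch: fitting Gaussian mixtures to the log-payments with the 'mixtools'
# EM algorithm and selecting k on BIC; `log_pay` is a hypothetical numeric
# vector of log-transformed payments.
library(mixtools)

bic_values <- sapply(2:6, function(k) {
  fit <- normalmixEM(log_pay, k = k, maxit = 2000)
  q   <- 3 * k - 1                     # k means, k sds, k - 1 free weights
  -2 * fit$loglik + q * log(length(log_pay))
})
names(bic_values) <- paste0("k=", 2:6)
bic_values                             # the smallest BIC indicates the preferred k
```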

Figure 5.7: BIC values for different numbers of mixture components for the internal group (left) and the external group (right).

Looking at these BIC values, one sees a large improvement for the first few components. However, the improvements become smaller, or the BIC values even worsen, as more components are added.
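A possible sketch of this BIC comparison in R, assuming the log-payments of one group are stored in a vector logpay (a hypothetical name) and using the normalmixEM routine from the ‘Mixtools’ package, is:

    ## BIC comparison for k = 1, ..., 6 mixture components (sketch).
    library(mixtools)
    set.seed(1)
    n   <- length(logpay)
    ## k = 1 is an ordinary normal fit, not a mixture.
    ll1 <- sum(dnorm(logpay, mean(logpay), sd(logpay), log = TRUE))
    bic <- c("k=1" = -2 * ll1 + 2 * log(n))
    for (k in 2:6) {
      fit  <- normalmixEM(logpay, k = k, maxit = 2000)
      npar <- 3 * k - 1                  # k means + k sds + (k - 1) weights
      bic[paste0("k=", k)] <- -2 * fit$loglik + npar * log(n)
    }
    bic                                  # the lowest BIC value is preferred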

To further check the choice of the number of mixture components, five-fold cross-validation was performed. The data was split into five equal-sized parts,


each part being used once as validation data. The optimization routine was run five times, with 80 percent of the data used as a training set to fit the model and the remaining 20 percent used to validate it. This cross-validation was performed for k = 2 to k = 5; the resulting log-likelihoods are shown in Table 5.5. For each number of components, the table reports the average of the five log-likelihood values, and the model with the highest average log-likelihood is preferred. These average values are presented graphically in Figure 5.8.

          Average log-likelihood
          Internal group    External group
    k=1*  -81601            -50889
    k=2   -76293            -48683
    k=3   -69919            -48541
    k=4   -68032            -47859
    k=5   -63967            -47870

Table 5.5: The average log-likelihoods from five-fold cross-validation for the Gaussian mixture model with k = 1, ..., 5 components. *For k = 1 the model is not a mixture distribution.

Figure 5.8: Average Log-likelihood values from cross-validation for different numbers of mixture components for the internal group (left) and the external group (right).
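A minimal sketch of this five-fold cross-validation, under the assumption that the held-out log-likelihood is what is averaged (logpay again denotes one group's log-payments), is:

    ## Five-fold cross-validation sketch: average held-out log-likelihood per k.
    library(mixtools)
    set.seed(1)
    folds <- sample(rep(1:5, length.out = length(logpay)))
    heldout_ll <- function(p, fit) {     # log-likelihood of validation data
      sum(log(sapply(p, function(x) sum(fit$lambda * dnorm(x, fit$mu, fit$sigma)))))
    }
    cv_ll <- sapply(2:5, function(k) {
      mean(sapply(1:5, function(f) {
        fit <- normalmixEM(logpay[folds != f], k = k, maxit = 2000)
        heldout_ll(logpay[folds == f], fit)
      }))
    })
    names(cv_ll) <- paste0("k=", 2:5)
    cv_ll                                # the highest average value is preferred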

Based on these log-likelihoods, the number of mixture components was chosen to be k = 5 for the internal group, because this model had the best average log-likelihood value over the five optimization routines. For the external group k = 4 is chosen, as the model with five mixture components does not seem to perform better. These choices are in line with the choices based on the BIC values obtained from fitting on the complete dataset. To check the outcomes, the log-likelihood function for the candidate payment distributions with several numbers of mixture components was also optimized with the PROC NLMIXED procedure in SAS.


The results were in line with those from the ‘Mixtools’ package for R. The parameter estimates for the selected mixture models are given in Table 5.6.

Parameter estimates for the internal group (k = 5)

    Parameter   Estimate    Bootstrap s.e.   Lower bound CI   Upper bound CI
    λ1          0.06425     0.00040          0.06359          0.06503
    µ1          3.61797     0.00014          3.61773          3.61824
    σ1          0.02030     0.00013          0.02011          0.02053
    λ2          0.53999     0.00317          0.53367          0.54530
    µ2          3.84059     0.00584          3.82909          3.85017
    σ2          0.959109    0.00269          0.95460          0.96371
    λ3          0.08230     0.00039          0.08171          0.08298
    µ3          4.29180     9.95E-6          4.29178          4.29182
    σ3          0.00187     7.39E-6          0.00185          0.00188
    λ4          0.16979     0.00107          0.16809          0.17183
    µ4          4.32974     0.00051          4.32884          4.33074
    σ4          0.11501     0.00056          0.11407          0.1162
    λ5          0.14368     0.00247          0.13909          0.14900
    µ5          6.07908     0.01642          6.04801          6.10784
    σ5          0.95101     0.00660          0.94001          0.96228

Parameter estimates for the external group (k = 4)

    Parameter   Estimate    Bootstrap s.e.   Lower bound CI   Upper bound CI
    λ1          0.32284     0.00559          0.31473          0.33375
    µ1          6.97871     0.00701          6.96473          6.99089
    σ1          0.79579     0.00662          0.78571          0.81001
    λ2          0.54206     0.00545          0.53122          0.54974
    µ2          5.21093     0.01587          5.17349          5.23236
    σ2          1.50298     0.00592          1.48885          1.51189
    λ3          0.06591     0.00081          0.06438          0.06742
    µ3          6.82782     0.00082          6.82628          6.82925
    σ3          0.06185     0.00080          0.06050          0.06331
    λ4          0.06918     0.00082          0.06752          0.07083
    µ4          7.57007     0.00080          7.56858          7.57165
    σ4          0.06768     0.00078          0.06627          0.06904

Table 5.6: Parameter estimates of the mixture models for the internal (upper) and external group (lower), together with the bootstrap standard errors and confidence intervals.


The estimates shown in Table 5.6 seem quite reasonable. Figure 5.9 shows the histograms of the log-payments together with the kernel density function and the mixture components.

Figure 5.9: Histogram, kernel density and mixture components for the log-payments (x-axis: log-payments; y-axis: density). Left: internal group with k = 5; right: external group with k = 4.

The largest spike of the internal group is captured by the third component of the mixture model. The estimate of µ3 is 4.291797 with λ3 equal to 0.082301 and a very small standard deviation; because of this very small standard deviation, the component cannot be seen in the figure. As was mentioned in Section 3, this amount was paid most frequently (the large peak in the histogram). Nine percent of all payment transactions for the internal group were of this size, which lies close to the estimated λ3. The remaining components are likewise centred around commonly observed payment amounts.

For the external group, the two large spikes are modeled by the third and fourth components; the estimates of µ3 and µ4 lie close to the observed values of the spikes. The somewhat lower payments are captured by the normal distribution of the second component. As there are many small intermediate payments in the data, it makes sense that this component has the largest weight (λ2). The many larger payments within this group are modeled by the first component.

In Figure 5.10 the histograms are shown together with the mixture density for both groups. The mixture density captures the large spikes well. One could argue whether additional components should be added for the external group, as not all peaks are modeled precisely. However, adding more parameters might cause overfitting, and neither the BIC values nor the cross-validation analysis indicated that an extra component improves the log-likelihoods.


Figure 5.10: The density of the mixture distribution together with the histograms for the internal group (left) and the external group (right) (x-axis: log-payments; y-axis: density).

To check the calibration of the chosen mixture models, Figure 5.11 shows the fitted CDF together with the ECDF and its 95 percent confidence bounds. The graphs show that the CDF sometimes lies outside the confidence region, but the estimates seem quite reasonable.

Figure 5.11: The CDF and ECDF for the log-payments with 95 percent confidence bounds.

P-P plots of the fitted CDF against the ECDF of the underlying data are shown in Figure 5.12 for the mixture models with five and four components for the internal and external group, respectively. The P-P plots for k = 3 are given as well, to show the improvement in fit obtained by adding the extra components: for the chosen models the CDF and ECDF lie closer to the diagonal.


Figure 5.12: P-P plots of the CDF from the mixture distribution and the ECDF for the internal group (left) and the external group (right).
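The diagnostics in Figures 5.11 and 5.12 could be reproduced along the following lines; the 95 percent bounds are sketched here with a Dvoretzky-Kiefer-Wolfowitz (DKW) band, which is one common construction and not necessarily the one used for the figures, and logpay and fit again denote the log-payments and a fitted ‘Mixtools’ object.

    ## Sketch of the CDF/ECDF comparison and the P-P plot for one group.
    pmix <- function(p, fit) {
      sapply(p, function(x) sum(fit$lambda * pnorm(x, fit$mu, fit$sigma)))
    }
    x    <- sort(logpay)
    Femp <- ecdf(logpay)(x)                           # empirical CDF
    Fhat <- pmix(x, fit)                              # fitted mixture CDF
    eps  <- sqrt(log(2 / 0.05) / (2 * length(x)))     # DKW 95% band half-width

    plot(x, Femp, type = "s", xlab = "log-payments", ylab = "CDF")
    lines(x, pmin(Femp + eps, 1), lty = 2)
    lines(x, pmax(Femp - eps, 0), lty = 2)
    lines(x, Fhat, col = "red")

    plot(Fhat, Femp, xlab = "Fitted CDF", ylab = "Empirical CDF")   # P-P plot
    abline(0, 1)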

A suggestion for model improvement might be to add the development year as a covariate, as there was no correction for inflation on the payments.

Another suggestion for modeling the payments is the Pareto distribution, which is a heavy-tailed distribution. The Pareto distribution is commonly used for modeling losses in insurance because of this heavy tail; in Jin (2013) it is used for modeling the payments. However, Cooray & Ananda (2005) state that the Pareto model covers the behavior of large losses well but fails to cover the behavior of small losses. This is in contrast to the lognormal model, which covers the behavior of small losses well but fails to cover the behavior of large losses. They suggest a spliced model, where the two-parameter lognormal density is used up to some threshold value and the two-parameter Pareto density for the remainder of the support.

Because of the different peaks in the payment histograms it is a logical choice to use a mixture model, and the Gaussian mixture model seems to fit the data well. If it turns out that the model does not predict the large losses well, a suggestion could be to use a spliced distribution with a Gaussian mixture distribution up to a certain threshold value and the Pareto distribution for the remainder. A notable property of splicing is that it does not ensure that the spliced density function is continuous: the resulting density is continuous only when the components meet at the breakpoint(s).
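As an illustration of such a spliced model, a sketch of a lognormal body spliced with a Pareto tail on the original payment scale is given below; the parameters (threshold theta, Pareto shape alpha, body weight w) are illustrative assumptions and not estimates from this thesis.

    ## Lognormal body (truncated at theta) spliced with a Pareto tail above theta.
    ## The density is continuous only if the two pieces happen to meet at theta.
    dspliced <- function(x, w, meanlog, sdlog, alpha, theta) {
      body <- w * dlnorm(x, meanlog, sdlog) / plnorm(theta, meanlog, sdlog)
      tail <- (1 - w) * alpha * theta^alpha / x^(alpha + 1)
      ifelse(x <= theta, body, tail)
    }

    ## Example: mainly lognormal body, Pareto tail above a threshold of 5000.
    dspliced(c(100, 10000), w = 0.9, meanlog = 5, sdlog = 1, alpha = 2, theta = 5000)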


6. Conclusion

This thesis evaluated the implementation of the framework from Antonio & Plat (2014) for a different type of non-life insurer. The dataset of a legal insurance company was available for this thesis; it consists of more than one million records. The building blocks from the framework of Antonio & Plat (2014) were evaluated and estimated. Within the data a distinction is made between claims with internal treatment and claims with external treatment, because these claims have very different development patterns and payment severities. Antonio & Plat (2014) was one of the first publications in which stochastic micro-level models were implemented on a real-life dataset. The dataset used in this thesis comes from a very different type of non-life insurance, which makes this new implementation of the framework useful from a practical perspective.

A likelihood function for the complete claims process is introduced in this thesis. This likelihood can be split into four building blocks which can be evaluated separately: the reporting delay, the occurrence of claims, the development process and the payment severities. When optimizing the likelihood for the occurrence process of claims, the distribution of the reporting delay is used; this building block was therefore obtained after modeling the reporting delay. For all components, a parametric model was estimated and analyzed. Because of the very large dataset, confidence intervals tend to be very small. Information criteria and probability plots were used for model selection.

For the reporting delay, a mixture of a truncated Weibull model with two degenerate components is obtained by maximum likelihood estimation. The obtained model has a structure similar to the one used in Antonio & Plat (2014), but the number of degenerate components differs.

The occurrence of claims is modeled with a piecewise constant specification. Within the likelihood function for the occurrence process, exposure data is used. No exposure data was available in the form of monthly policy numbers; instead, the yearly gross earned premium is used to measure the size of the business.

For the development process, it is investigated whether Weibull hazard rates can be used to model the hazard rates for the three different types of events. Besides the Weibull hazard rates, piecewise constant hazard rates are estimated as well. When comparing the two specifications, some differences came forward. The piecewise constant hazards for events 2 and 3 for the internal group showed an increasing pattern for small values of t and a decreasing pattern for higher values. As the Weibull hazards cannot capture this, a suggestion is to evaluate the use of lognormal hazard rates as well. The Weibull survival functions and the piecewise constant survival functions are compared to the nonparametric Kaplan-Meier survival functions. For both groups, all survival functions for event 1 and event 3 lie very close together. For event 2 the differences are larger; event 2 seems to be the most difficult event to model, possibly because it is the smallest group.

The log-payments are modeled with a Gaussian mixture model. Both groups have several spikes in their payment distributions, which a mixture model captures quite well. The number of mixture components is selected based on BIC values, and the log-likelihood values obtained with cross-validation are used for confirmation. The chosen models fit the data reasonably well. Compared to the work of Antonio & Plat (2014), there was a lack of data on initial claim reserve estimates; therefore, a mixture model is used instead of a normal distribution with covariate information as in Antonio & Plat (2014).

By using a micro-level stochastic loss model, the data is used more fully, as detailed information on each claim is taken into account. Furthermore, there are fewer limitations compared to aggregated data methods. The data that was available for this thesis seems extensive and sufficient to estimate all the model components. As a further suggestion, the framework obtained in this thesis should be implemented in a prediction routine in which future events and payments are simulated. An estimate of the total future losses can then be made and compared to the reserve estimate from a method based on aggregated data. This implementation would increase the practical value of this research, and some deficiencies of the obtained model might come forward.
