
An integrated pricing, reserving and risk quantification model


C.S. Attema

University of Groningen

December 19, 2014

Abstract


Contents

1 Introduction

1.1 General theory on insurance reserves

1.1.1 Aggregated claim reserving methods

1.1.2 Micro-level claim reserving

1.1.3 Premium reserve

1.1.4 Discussion

2 Methodology

2.1 Model assumptions

2.2 Claim frequency

2.2.1 The number of NINR claims

2.2.2 The number of IBNR claims


1

Introduction

An important part of the job of an actuary in an insurance company is balancing income, costs and reserves and managing their risks. This is usually done based on past data using statistical methods. Income consists of insurance premium which is set for individual policy holders, for which an actuary can create a model based on their characteristics and individual claim history. When insurance premium is paid this does not equal immediate profit or a sudden increase in equity, but it is added to a reserve to cover for future expenditures. For a non-life insurer this reserve consists of two parts: the outstanding claims reserve and the premium reserve, which are together referred to as the technical provisions.

The size of the premium reserve should be such that it covers future claims arising from current insurance contracts, while the outstanding claims reserve is meant to cover claims which have occurred in the past. The size of these provisions does not affect the insurer's losses directly; it only affects the time at which the insurer is allowed to add the premium to its equity, assuming the insurer does not default on its obligations. However, return on equity is a significant source of profit, and the sooner an insurer can add the insurance premium to its equity, the higher this profit. This is in contrast to money which is held in provisions and for which the insurer is required to hold additional capital, which means that the return on this capital is limited.

These provisions and premium income are closely related to each other, but in practice there is little communication between the models used for each of them. While insurance premium is calculated per policy holder, outstanding claims are modelled on a portfolio level and premium reserves on a company level. This leads us to our main research question:


Here we focus specifically on property insurance, excluding income and health insurance, and we take premium to be the net premium or expected loss. To illustrate the proposed model, we perform a case study on a relatively small motor insurance portfolio for which micro-level data on both policy holder characteristics and claim developments and payments are available. To answer our main research question, we address several sub-questions:

• Should we estimate reserves for each individual policy, for each individual claim, or on an aggregated level?

• Should the model be set in continuous time or in discrete time?

• How should we model the number of insurance claims?

• What is an appropriate model for the size of insurance claims?

• To what extent is the use of micro-level data justified?

• How does the model relate to alternative methods?

In section 1.1 we further introduce the terminology and framework and provide an overview of relevant literature and theory. Section 2 covers the methodology, and in section 3 we define the necessary distributional assumptions. In section 4 we apply the approach to a data set obtained from a European motor all-risk insurance portfolio. We discuss our findings and conclude in section 5.

1.1

General theory on insurance reserves

In this section we briefly discuss prevalent reserving methods based on recent literature. We first introduce a number of concepts and definitions which recur throughout the literature and in this paper in particular.

Reporting Delay The time it takes an individual to report an insurance claim after the damage has occurred

IBNR Claims which have been Incurred But Not Reported


GLM Generalized Linear Model

Premium reserve The reserve an insurer is required to keep for future claims from currently in force policies

Outstanding claims reserve A reserve which is meant to cover future payments from claims which have already occurred

1.1.1 Aggregated claim reserving methods

A common way to estimate outstanding claim liabilities is the chain ladder method: a deterministic method which uses so-called claim triangles, or run-off triangles, and development factors to derive an estimate of outstanding claim liabilities. The model is set in discrete time and is based on the assumption that the composition and behaviour of the underlying portfolio do not change over time. The chain ladder method can be shown to produce the same reserve estimates as an over-dispersed Poisson GLM in which the covariates are the arrival year and the development year. A mathematical framework for the chain ladder model can be found in Mack (1999) [10].
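As an illustration of how development factors are used, the sketch below completes a small cumulative run-off triangle with volume-weighted chain ladder factors (the estimator underlying Mack's framework) and sums the resulting outstanding amounts. It is a minimal sketch under stated assumptions: the triangle values are hypothetical, the function name `chain_ladder` is ours, and no tail factor or variance estimate is included.

```python
import numpy as np

def chain_ladder(triangle):
    """Complete a cumulative run-off triangle with volume-weighted development factors.

    Rows are arrival years, columns are development years; unobserved cells are np.nan.
    Returns the development factors and the completed triangle.
    """
    tri = np.asarray(triangle, dtype=float)
    n_dev = tri.shape[1]
    factors = np.empty(n_dev - 1)
    for j in range(n_dev - 1):
        # Arrival years observed in development year j + 1 (and hence also in year j).
        observed = ~np.isnan(tri[:, j + 1])
        factors[j] = tri[observed, j + 1].sum() / tri[observed, j].sum()
    full = tri.copy()
    for i in range(full.shape[0]):
        for j in range(1, n_dev):
            if np.isnan(full[i, j]):
                full[i, j] = full[i, j - 1] * factors[j - 1]
    return factors, full

# Hypothetical cumulative payments (in thousands) for three arrival years.
triangle = [[100.0, 150.0, 165.0],
            [110.0, 170.0, np.nan],
            [120.0, np.nan, np.nan]]
factors, full = chain_ladder(triangle)
latest = np.array([165.0, 170.0, 120.0])  # latest observed cumulative payment per arrival year
reserve = (full[:, -1] - latest).sum()    # outstanding claims reserve
print(factors, reserve)
```

Each factor is a ratio of column sums over the arrival years observed in both columns, so more recent arrival years are projected with the average development of older ones.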

1.1.2 Micro-level claim reserving

Individual claim reserving models estimate the outstanding liability for each insurance claim and set the total claims reserve equal to the sum of the reserves for all claims. Since these models use data about specific claims, they are also referred to as micro-level reserving methods. Several researchers estimate development factors in a similar way to the chain ladder method. Guszcza and Lommele (2006)[5] propose a micro-level model which is similar to the aggregated GLM described by England and Verrall (2002)[4], but instead construct a GLM for each claim based on claim-specific risk factors in addition to covariates indicating the arrival and development year. Pigeon, Antonio and Denuit (2013) [11] suggest an extension of the development factor method which estimates development factors for individual claims. Key elements of this model are that (i) the sizes of the development factors are modelled by a multivariate skew normal (MSN) distribution and (ii) the time between development periods is stochastic.


in their approach, the most comprehensive and modern is the model proposed by Antonio and Plat. In addition to using micro-level data, their model includes seasonal effects and allows the modelling of individual claim developments. We will introduce their approach below.

Antonio and Plat use a time-dependent marked Poisson process to model claim sizes, occurrences and developments jointly, and estimate reserves through simulation. With $t, u \in \mathbb{R}^+$ and $x \in \mathcal{X}$, where $\mathcal{X}$ is the mark space and marks are variables such as the number of payments and claim sizes, they describe the development process of a claim by the intensity measure in equation (1):

$$\lambda(dt) \times P_{U \mid t}(du) \times P_{X \mid t,u}(dx) \quad \text{on} \quad [0, \infty) \times [0, \infty) \times \mathcal{X} \qquad (1)$$

Here $\lambda(t)$ denotes the hazard rate, which is constant on specific intervals, $P_{U \mid t}(u)$ denotes a probability distribution for the reporting delay and $P_{X \mid t,u}(x)$ a joint distribution for the claim marks. Claims are then defined by a time of occurrence and a mark $x \in \mathcal{X}$, where x consists of variables like the number of events, event types and corresponding payments. This equation then gives the intensity of a claim with marks $x^*$ that occurs at time $t^*$ and is reported after $u^*$ time units. The claim size, occurrence time and reporting delay are assumed to be independent.

They derive results numerically by simulating a large number of realisations of the process defined by equation (1), conditioning, where possible, on the part of the development of a claim that has already been observed. Antonio and Plat conclude that loss reserving using micro-level data is feasible and reflects real claim payments more realistically than aggregated methods.

1.1.3 Premium reserve


require a model for future claims. In order to underline the relation of the premium reserve to the IBNR reserve and RBNS reserve in our framework we will in our model refer to these claims as Not Incurred and thus Not Reported (NINR).

1.1.4 Discussion

A more exhaustive overview of claim reserving methods can be found in Chukova and Anastasiadis (2010) [12]. In the majority of the methods discussed above, claim sizes are modelled by a gamma or log-normal distribution, and researchers propose modelling very large claims with a Pareto or GEV distribution. We find that much research has been done on modelling outstanding claims, or equivalently RBNS claims. While individual claim reserving has become popular, especially in recent literature, the use of micro-level data on individual policy holders, such as age and place of residence, is limited. This data, however, is often available, since it is used to model insurance premium. We find that the most flexible approach is the point process model proposed by Antonio and Plat (2010)[1] and Haastrup and Arjas (1996)[2]. Compared to aggregated methods, this approach is better suited to model claims when the composition of the insurance portfolio is subject to change, and it allows for more analysis of the underlying causes of financial gains or losses. While the RBNS reserve is modelled on a micro level for individual claims, IBNR claims are modelled for individual policies or for portfolios, because the number of IBNR claims is itself stochastic.


2

Methodology

In order to model the total losses for an individual policy, we will first examine the development of a single insurance claim. During its lifetime, an insurance claim goes through several phases. Firstly, before a claim is incurred, it is reserved for in the premium reserve. When an accident occurs it likely takes some time before the insurance company is notified. Once reported the size of the payments has to be set and in some cases an insurance agent has to be sent to verify the damage that has been caused. When the size of the claim has been agreed upon, the insured is reimbursed through one or several payments, and eventually the claim is settled. An illustration of this process is shown below, followed by a specification of the terminology we will use in this paper.

Figure 1: The claim process

Occurrence The time at which the claim has occurred. Claims before this point in time are Not yet Incurred and thus Not Reported (NINR). Once they have occurred they are referred to as Incurred But Not Reported (IBNR) claims.

Notification The insurer is notified of the damage. Claims are now Reported But Not Settled (RBNS), and one or more payments to the policy holder are made.

Settlement Once all payments have been made and all issues have been resolved, the claim is settled. This is the end of the development of the claim.


Figure 2: Claim processes for a single policy

Policy holders can report claims until several years after the claims have actually occurred; only after this time does an insurer know for certain the number of claims that originate from a specific policy. At any time, every insurance claim is hence in one of the categories shown in figure 2. Therefore, if the technical provision covers each of these categories, it covers all claims for a single policy, and for each policy we estimate a reserve for each of these claim categories.

Over the lifetime of a policy, these reserves change in the following way. When a policy starts, no claims have occurred yet and the reserve for IBNR claims is zero; all claims corresponding to the policy are at this time NINR. As time passes, the expected number of future claims for the policy holder decreases, and hence so does the NINR reserve. Similarly, the expected number of claims which could have occurred increases, and hence so does the IBNR reserve. When the policy ends, the NINR reserve is necessarily zero, as any future claims are no longer covered. After that, the IBNR reserve decays, as any claims the policy holder could have incurred are likely to have been reported at some point. If the policy holder reports a claim during the policy, future payments for that claim are covered by the RBNS reserve.
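The evolution described above can be illustrated with expected claim counts rather than amounts. The sketch assumes, purely for illustration, a one-year policy on [0, 1] with a constant claim rate `LAM` and exponentially distributed reporting delays with rate `MU`; it ignores the renewal adjustment of Assumption 1 introduced later, and all names and parameter values are hypothetical. It also checks that NINR, IBNR and already reported claims always add up to the expected total number of claims.

```python
import math
from scipy.integrate import quad

LAM, MU = 0.1, 6.0  # hypothetical: 0.1 claims per policy year, mean reporting delay of two months

def expected_ninr(t, lam=LAM):
    """Expected number of claims not yet incurred at time t for a policy covering [0, 1]."""
    return lam * (1.0 - min(max(t, 0.0), 1.0))

def expected_ibnr(t, lam=LAM, mu=MU):
    """Expected number of claims incurred in (0, min(t, 1)] but not yet reported at time t."""
    if t <= 0.0:
        return 0.0
    end = min(t, 1.0)  # claims can only occur while the policy is in force
    return lam / mu * (math.exp(-mu * (t - end)) - math.exp(-mu * t))

def reporting_rate(u, lam=LAM, mu=MU):
    """Rate at which claims of this policy are reported at time u."""
    if u <= 0.0:
        return 0.0
    end = min(u, 1.0)
    return lam * (math.exp(-mu * (u - end)) - math.exp(-mu * u))

def expected_reported(t):
    """Expected number of claims reported by time t, by integrating the reporting rate."""
    if t <= 0.0:
        return 0.0
    breaks = [1.0] if t > 1.0 else None  # the rate has a kink at the end of the policy
    return quad(reporting_rate, 0.0, t, points=breaks)[0]

for t in [0.0, 0.25, 0.5, 1.0, 1.5, 3.0]:
    ninr, ibnr, rep = expected_ninr(t), expected_ibnr(t), expected_reported(t)
    print(f"t={t:4.2f}  NINR={ninr:.4f}  IBNR={ibnr:.4f}  reported={rep:.4f}  total={ninr + ibnr + rep:.4f}")
```

The NINR count falls linearly to zero at the end of the policy, the IBNR count first rises and then decays, and the total stays constant, which is the kind of consistency the following paragraph asks of the reserves.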


number of claims should not change on average. For example, suppose we expect $n_1 \geq 0$ claims per policy year and that after three months we expect $n_2$ claims to remain, where $0 \leq n_2 \leq n_1$. On average, the number of claims incurred during these three months, reported or not, should then be $n_1 - n_2$. The total expected claim size of NINR claims should on average be equal to the total expected claim size of all IBNR claims; a similar argument holds for NINR claims and net premium, and for IBNR and RBNS claims. If this is not the case, then on average either (i) the estimate of the total number of claims per policy decreases over time as claims occur or are reported, or (ii) it increases over time. In situation (i) the NINR reserve or net premium is too high or the outstanding claims reserve too low; in situation (ii) the NINR reserve or net premium is too low or the outstanding claims reserve too high. Either situation is undesirable, for example due to extra costs because the reserve held is too high, or due to a higher risk that a reserve is insufficient.

2.1

Model assumptions

Having defined the underlying terminology and setting, we find that a model for insurance premium, premium reserve and outstanding claims requires at least the following elements:

• A distribution for the number of future claims

• A model for the number of incurred but not reported claims

• A model for the claim size


frequencies are independent of claim sizes conditionally on known risk factors.

2.2

Claim frequency

In order to derive an estimate for the total number of claims for an individual policy, recall that any insurance claim can be classified as RBNS, IBNR or NINR. Because NINR and IBNR claims have not yet been reported, both their number and their sizes are unknown. In this section we derive models to estimate their frequency.

2.2.1 The number of NINR claims

The number of insurance claims in a certain time period can be assumed to follow a Poisson distribution. In this case the occurrence of a claim does not directly affect the expected number of remaining claims. If we apply this concept in continuous time and assume the hazard rate is constant over time, the time between two events follows an exponential distribution. If the number of claims $N_c$ follows a Poisson distribution and the claim sizes $X_i$, $i = 1, \ldots, N_c$, are independent of $N_c$ and of each other with some density $f_X(x)$, the total claim amount

$$S = \sum_{i=1}^{N_c} X_i$$

follows a compound Poisson distribution.
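A compound Poisson total can be simulated directly, which is a useful check on the reasoning above. The sketch assumes gamma-distributed claim sizes (one of the choices mentioned in section 1.1.4) with hypothetical parameter values, and compares the simulated mean and variance with the standard compound Poisson moments $E[S] = \lambda E[X]$ and $\operatorname{Var}[S] = \lambda E[X^2]$. The function name is hypothetical.

```python
import numpy as np

def simulate_compound_poisson(lam, shape, scale, n_sims, rng):
    """Simulate S = X_1 + ... + X_N with N ~ Poisson(lam) and X_i i.i.d. Gamma(shape, scale),
    independent of N."""
    counts = rng.poisson(lam, size=n_sims)
    totals = np.zeros(n_sims)
    positive = counts > 0
    # The sum of n i.i.d. Gamma(shape, scale) variables is Gamma(n * shape, scale).
    totals[positive] = rng.gamma(shape * counts[positive], scale)
    return totals

rng = np.random.default_rng(2014)
lam, shape, scale = 0.1, 2.0, 1500.0                  # hypothetical frequency and severity parameters
s = simulate_compound_poisson(lam, shape, scale, 200_000, rng)
mean_x = shape * scale                                # E[X]
second_moment_x = shape * (shape + 1.0) * scale**2    # E[X^2]
print(s.mean(), lam * mean_x)                         # simulated vs theoretical E[S]
print(s.var(), lam * second_moment_x)                 # simulated vs theoretical Var[S]
```

Drawing one gamma variable per policy with shape `n * shape` avoids looping over individual claims; this shortcut relies on all claim sizes sharing the same scale.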

Because the hazard rate is constant over time, this model does not incorporate seasonal influences. If we allow the hazard rate to change over time, however, the claims follow an inhomogeneous Poisson process: the number of claims in a given period is still Poisson distributed, but the time between claims is no longer exponentially distributed. To model claims in such a setting we can use a point process. A point process is a stochastic process which describes the occurrence of events in some space, in this case time. There are three equivalent ways to specify a point process: (i) by the occurrence times of the events, $T_i$, $i \in \mathbb{N}$; (ii) by the inter-occurrence or waiting times $W_i = T_{i+1} - T_i$; (iii) by the counting process $N(A)$, which specifies the number of events in the interval $A = (a_1, a_2)$. In this way we can define a homogeneous Poisson process with rate $\lambda^*$ by any one of (1) $T_i \sim \Gamma(i, \lambda^*)$ (shape $i$, rate $\lambda^*$), (2) $W_i \sim \text{Exp}(\lambda^*)$, or (3) $N(t) \sim \text{Poisson}(t\lambda^*)$. A comprehensive coverage of point processes can be found


Stationarity A point process is called stationary if, for any two disjoint intervals of equal length on the real line, say A and B, the counts $N(A)$ and $N(B)$ are independent and identically distributed.

$$A \cap B = \emptyset,\ |A| = |B| \;\Rightarrow\; N(A) \perp\!\!\!\perp N(B) \;\wedge\; N(A) \overset{d}{=} N(B)$$

Orderliness A point process is called orderly if, at any point on the real line, the probability that more than one event occurs in a small interval tends to zero faster than the length of that interval.

$$N \text{ is orderly if for all } t \in \mathbb{R}: \; P\big(N(t, t + \delta) > 1\big) = o(\delta) \text{ as } \delta \to 0$$
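The equivalent descriptions of a homogeneous Poisson process given earlier in this section can be checked by simulation. The sketch below builds event times from i.i.d. exponential waiting times and verifies that the counts $N(t)$ have mean and variance close to $t\lambda^*$, and that the third event time, which follows a gamma distribution with shape 3 and rate $\lambda^*$, has mean close to $3/\lambda^*$. The rate, horizon and function name are hypothetical; note that NumPy parameterises the exponential distribution by its scale $1/\lambda^*$.

```python
import numpy as np

def simulate_hpp(rate, horizon, rng):
    """Event times of a homogeneous Poisson process on (0, horizon],
    built by accumulating i.i.d. Exp(rate) waiting times."""
    times = []
    t = rng.exponential(1.0 / rate)
    while t <= horizon:
        times.append(t)
        t += rng.exponential(1.0 / rate)
    return np.array(times)

rng = np.random.default_rng(1)
rate, horizon, n_sims = 2.0, 1.0, 20_000
counts = np.array([len(simulate_hpp(rate, horizon, rng)) for _ in range(n_sims)])
# T_3 is the sum of three waiting times, i.e. Gamma(3, rate) with mean 3 / rate.
third_event = rng.exponential(1.0 / rate, size=(n_sims, 3)).sum(axis=1)
print(counts.mean(), counts.var(), third_event.mean())  # each close to 2, 2 and 1.5 respectively
```

Because the process is orderly, ties between event times occur with probability zero, so counting events and accumulating waiting times describe the same process.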

These assumptions mean that for an individual policy holder the hazard rate is not influenced directly by the times of past claims (stationarity) and that the policy holder cannot incur multiple accidents simultaneously (orderliness). Under these assumptions, the number of insurance claims per policy can be approximated by a Poisson distribution. However, to incorporate seasonality we require that $N(T_{S_1}, T_{S_2}) \neq N(T_{S_2}, T_{S_3})$ in distribution, where $T_{S_i}$, $i = 1, \ldots, 4$, indicate the start of the first, second, third and fourth quarter of the year, which we consider equivalent to winter, spring, summer and autumn respectively. In this case the hazard rate is not constant over time, and the resulting process is no longer a homogeneous Poisson process but an inhomogeneous Poisson process, which allows us to incorporate seasonality. While the times between events are no longer exponentially distributed, the total number of claims still follows a Poisson distribution. Let the number of events observed up until time t be defined as $N(t)$, with corresponding occurrence times $T_1, T_2, \ldots, T_{N(t)}$. We refer to the hazard rate of occurrence of insurance claims as $\lambda(\cdot)$. For ease of notation we suppress any conditioning in $\lambda(\cdot)$ and write $\lambda(t \mid \mathcal{F}_t)$ as $\lambda(t)$, where $\mathcal{F}_t$ denotes the information set at time t. At time t we then have the following likelihood for the hazard rate, where $\lambda(\cdot)$ is some time-varying positive function. For a complete derivation we refer to Cook and Lawless (2007)[3].
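Assuming the likelihood takes the standard form for an inhomogeneous Poisson process observed on $(0, t]$, namely $\log L = \sum_{i=1}^{N(t)} \log \lambda(T_i) - \int_0^t \lambda(u)\, du$, the sketch below evaluates it for a hazard that is constant within each quarter of the year. For such a piecewise-constant hazard the maximum likelihood estimate in each season is the number of claims in that season divided by the exposure spent in it, which the example checks numerically. The quarter boundaries, claim times and function names are hypothetical.

```python
import numpy as np

QUARTER_EDGES = np.array([0.0, 0.25, 0.5, 0.75, 1.0])  # hypothetical season boundaries (years)

def season_of(t):
    """Season index 0..3 of time(s) t, with time measured in years."""
    frac = np.mod(np.asarray(t, dtype=float), 1.0)
    return np.minimum(np.searchsorted(QUARTER_EDGES, frac, side="right") - 1, 3)

def exposure_by_season(start, end):
    """Time spent in each season during the observation window (start, end]."""
    exposure = np.zeros(4)
    for year in range(int(np.floor(start)), int(np.ceil(end)) + 1):
        for j in range(4):
            a = max(start, year + QUARTER_EDGES[j])
            b = min(end, year + QUARTER_EDGES[j + 1])
            exposure[j] += max(0.0, b - a)
    return exposure

def log_likelihood(rates, event_times, start, end):
    """sum_i log lambda(T_i) minus the integral of lambda over (start, end],
    for the piecewise-constant hazard lambda(t) = rates[season_of(t)]."""
    rates = np.asarray(rates, dtype=float)
    return np.log(rates[season_of(event_times)]).sum() - rates @ exposure_by_season(start, end)

events = np.array([0.1, 0.2, 0.6, 0.9, 1.3])   # hypothetical claim times of one policy holder
start, end = 0.0, 1.5
counts = np.bincount(season_of(events), minlength=4)
mle = counts / exposure_by_season(start, end)   # claims per unit of exposure in each season
print(mle, log_likelihood(mle, events, start, end))
```

Scaling all rates by a common factor lowers the log-likelihood on either side of the estimate, which is a quick way to confirm that the closed-form estimate is a maximum.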


2.2.2 The number of IBNR claims

While the number of claims is assumed to follow a Poisson distribution, with claims occurring at times $T_i$, $i = 1, \ldots, n$, the insurer is not immediately notified of these claims. Instead, notification is delayed by a reporting delay $Rd_i$, such that an insurer learns of a claim that occurred at time $T_i$ only at time $T_i + Rd_i$. Therefore, at time t we can define the IBNR claims as the difference between the number of claims that have occurred and the number of claims that have been reported, or equivalently as the claims for which $T_i \leq t$ and $T_i + Rd_i > t$.

At any time $t^*$, the probability that a policy holder reports a claim is the sum of the probabilities that (1) a claim is incurred today ($T_i = t^*$) and reported immediately ($Rd_i = 0$), (2) a claim was incurred yesterday ($T_i = t^* - 1$) and is reported today ($Rd_i = 1$), and so on. In continuous time we replace the sum with an integral:

$$\rho(t) = \int_0^t \lambda(\tau)\, f_{Rd}(t - \tau)\, d\tau \qquad (2)$$

Here $\rho(t)$ denotes the expected rate of claims notified to the insurer for a single policy at time t. Since an insurance company is only interested in claims that will ever be reported, the above equation should produce the same expected number of claim notifications over the entire time line as the Poisson process produces claim occurrences. Because $f_{Rd}$ is a probability density function, this is indeed the case: exchanging the order of integration gives $\int_0^\infty \rho(t)\, dt = \int_0^\infty \lambda(\tau) \int_0^\infty f_{Rd}(s)\, ds\, d\tau = \int_0^\infty \lambda(\tau)\, d\tau$, so the expected number of claims generated by the process defined by equation (2) equals that generated by the original intensity $\lambda(\cdot)$. Note that at a time t after the policy has ended, $\rho(t)$ can still be positive while $\lambda(t) = 0$. Having defined the hazard rate, we can specify the waiting times between events, which in turn allow us to specify the expected number of events generated by $\rho(\cdot)$ over a particular period of time. While a constant hazard rate results in a Poisson distribution, the distribution of the number of events generated by $\rho(\cdot)$ is not straightforward to derive.
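Equation (2) is a convolution, so on a regular time grid it can be approximated with `numpy.convolve`. The sketch below assumes, for illustration, a one-year policy with a constant claim rate of 0.1 per year and exponential reporting delays with a mean of two months (both hypothetical), and compares the expected number of occurrences with the expected number of notifications. The two agree up to a small discretisation error of the daily grid, and notifications continue after the policy has ended. The function name is hypothetical.

```python
import numpy as np

def reporting_rate(lam_grid, delay_density_grid, dt):
    """Discretised rho(t) = int_0^t lambda(tau) f_Rd(t - tau) d tau on a regular time grid."""
    n = len(lam_grid)
    return np.convolve(lam_grid, delay_density_grid)[:n] * dt

dt = 1.0 / 365.0                      # daily grid, time measured in years
t = np.arange(0.0, 5.0, dt)           # long enough for (almost) all claims to be reported
lam = np.where(t < 1.0, 0.1, 0.0)     # hypothetical: 0.1 claims per year while insured
mu = 6.0                              # hypothetical Exp(mu) reporting delay, mean two months
f_rd = mu * np.exp(-mu * t)
rho = reporting_rate(lam, f_rd, dt)

print(lam.sum() * dt, rho.sum() * dt)  # expected occurrences vs expected notifications
print(rho[t >= 1.0].max())             # notifications continue after the policy has ended
```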

When a policy generates multiple accident occurrences in the same policy year, the above definition allows the reporting delays and occurrence times of these claims to overlap. Consider a policy holder who incurs damage at times $T_1$ and $T_2$ with $T_1 < T_2$ and reports them after random delays $Rd_1$ and $Rd_2$ respectively. The above model does not restrict the values of $Rd_1$ and $Rd_2$, so it is possible that $T_1 + Rd_1 > T_2 + Rd_2$, which means the policy holder notifies his insurance company of the


This means that whenever an IBNR claim is reported there are no additional IBNR claims for that policy holder and hence we set the hazard rate ρ(·) to zero. This leads to the following assumption:

Assumption 1 The expected number of IBNR claims for an individual policy is set to zero at the time a claim is reported to the insurer. Claims are hence reported to the insurer from a point process with an intensity at time t as shown below. Let $H_t$ be the set of relevant information at time t, which consists of the values of any relevant risk factors and the most recent notification time $T_{N(t)}$.

$$\rho(t \mid H_t) = \int_0^{t - T_{N(t)}} \lambda(t - s)\, f_{Rd}(s)\, ds \qquad (3)$$

Here s denotes the reporting delay, so that only claims which occurred after $T_{N(t)}$ contribute.

Intuitively, this means that for the process of reported claims we only consider claims that have occurred since the last time the policy holder reported a claim. It is important to note that with this assumption the point process is no longer stationary. In this equation, the upper limit ensures that only claims which occurred at most $t - T_{N(t)}$ time units in the past are taken into account in the hazard rate $\rho(t \mid H_t)$. The occurrence rate depends on the history of the process through the time since the last event $T_{N(t)}$; however, the occurrence times of events prior to the most recent one have no effect on $\rho(t \mid H_t)$. This process is therefore referred to as a renewal process, as the process is renewed at each occurrence. Note that in this process, these occurrences are claim notifications as opposed to accident occurrences. This assumption changes the likelihood of the point process under consideration. Let $f_{W_i}$ denote the density of the waiting times $W_i = T_i - T_{i-1}$ of the renewal process and $\bar{F}_W$ its survival function. We denote the start of the policy as time $T_0$; under the assumption that the duration of a policy is one time unit, the end of a policy is at time $T_0 + 1$. Following the approach of Lawless and Thiagarajah (1996) [9], we specify the likelihood of the point process defined by $\rho(\cdot)$ in equation (3) at time t:

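$$L \propto \prod_{i=1}^{N(t)} f_{W_i}(T_i - T_{i-1}) \times \bar{F}_W\big(t - T_{N(t)}\big) = \prod_{i=1}^{N(t)} \rho(T_i \mid H_{T_i}) \exp\left(-\int_{T_{i-1}}^{T_i} \rho(u \mid H_u)\, du\right) \times \exp\left(-\int_{T_{N(t)}}^{t} \rho(u \mid H_u)\, du\right) \qquad (4)$$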
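Equation (4) can be evaluated numerically once $\rho(t \mid H_t)$ is available. The sketch below assumes, for illustration only, a policy on [0, 1] with a constant claim rate `lam` and exponential reporting delays with rate `mu`. If only claims that occurred since the most recent notification can still be reported, the reporting intensity then has the closed form $\rho(t \mid H_t) = \lambda\big(e^{-\mu(t - \min(t,1))} - e^{-\mu(t - T_{N(t)})}\big)$ when $T_{N(t)} < \min(t, 1)$, and zero otherwise. The policy start $T_0 = 0$ serves as the initial renewal point, the integrals are computed with SciPy's `quad`, and the parameter values and function names are hypothetical.

```python
import numpy as np
from scipy.integrate import quad

def rho(t, last, lam, mu, policy_end=1.0):
    """Reporting intensity at time t given the last notification time `last`:
    only claims that occurred in (last, min(t, policy_end)] can still be reported."""
    upper = min(t, policy_end)
    if upper <= last:
        return 0.0
    return lam * (np.exp(-mu * (t - upper)) - np.exp(-mu * (t - last)))

def renewal_log_likelihood(report_times, t_obs, lam, mu, policy_end=1.0):
    """Log of equation (4): each notification contributes log rho(T_i | T_{i-1}) minus the
    integrated intensity since T_{i-1}; the final term covers (T_N(t), t_obs]."""
    loglik, last = 0.0, 0.0  # T_0 = 0 is the start of the policy
    for t_i in report_times:
        loglik += np.log(rho(t_i, last, lam, mu, policy_end))
        loglik -= quad(rho, last, t_i, args=(last, lam, mu, policy_end))[0]
        last = t_i
    loglik -= quad(rho, last, t_obs, args=(last, lam, mu, policy_end))[0]
    return loglik

print(renewal_log_likelihood([0.3, 0.7], t_obs=2.0, lam=2.0, mu=3.0))
```

Maximising this function over `lam` and `mu` (for example with a generic optimiser) would give estimates under these illustrative assumptions; the sketch only evaluates it.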


adjusted hazard rate $\rho(t \mid H_t)$, which is influenced by the most recent occurrence. The expected number of reported claims can then be derived by integrating the expected value of equation (3) over the duration of the policy. Clearly, $\rho(\cdot)$ is no longer stationary and does not have the memorylessness property. Under these assumptions the number of events generated by $\rho(\cdot)$ is therefore not Poisson distributed. By defining the number of insurance claims as a point process and adjusting it for the reporting delay, the relation in equation (3) allows us to estimate the frequency of claims being reported for a given claim rate $\lambda(\cdot)$.

Since this equation is hard to solve analytically, we describe a method to approximate the number of occurrences numerically. Using this relation, the number of IBNR claims at time r can be derived by setting $\lambda(t) \equiv 0$ for $t \geq r$, which yields the expected number of claims that will be reported in the future due to claim occurrences generated by the claim occurrence process $\lambda(\cdot)$ in the past. Since $\lambda(\cdot)$ and $f_{Rd}(\cdot)$ can be estimated directly from the data, a straightforward way to estimate $\rho(\cdot)$ in equation (3) indirectly is to insert the estimates of $\lambda(\cdot)$ and $f_{Rd}(\cdot)$ in the right-hand side.
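The procedure of switching off $\lambda(\cdot)$ after the valuation time r can be sketched numerically. For simplicity the sketch works with the unconditioned reporting rate of equation (2), i.e. before the renewal adjustment of Assumption 1, and compares the result with the expression $\int_0^r \lambda(\tau)\,\big(1 - F_{Rd}(r - \tau)\big)\, d\tau$ that holds in that setting, which for a constant rate and exponential delays equals $\lambda(1 - e^{-\mu r})/\mu$. The claim rate, delay distribution, valuation time and function names are hypothetical.

```python
import numpy as np

MU = 6.0  # hypothetical Exp(MU) reporting delay, mean two months

def expected_ibnr_at(r, lam_func, delay_density, horizon=5.0, dt=1e-3):
    """Expected number of IBNR claims at time r: switch the occurrence rate off for t >= r,
    convolve with the delay density to get rho, and integrate rho over [r, horizon)."""
    t = np.arange(0.0, horizon, dt)
    lam = np.where(t < r, lam_func(t), 0.0)  # lambda(t) := 0 for t >= r
    rho = np.convolve(lam, delay_density(t))[: len(t)] * dt
    return rho[t >= r].sum() * dt

def claim_rate(t):
    """Hypothetical occurrence rate: 0.1 claims per year while the policy (0, 1) is in force."""
    return np.where(t < 1.0, 0.1, 0.0)

def delay_density(t):
    """Hypothetical exponential reporting-delay density."""
    return MU * np.exp(-MU * t)

r = 0.5
numerical = expected_ibnr_at(r, claim_rate, delay_density)
closed_form = 0.1 * (1.0 - np.exp(-MU * r)) / MU
print(numerical, closed_form)
```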

2.2.3 Seasonality

To model seasonal influences we assume there exist weights $S_j$, $j = 1, 2, 3, 4$, such that

$$\sum_{j=1}^{4} S_j = 4 \qquad (5)$$

In this way the claim intensity for policy k in season j equals $S_j \lambda_k$, where $\lambda_k$ is the yearly claim rate of the policy. The expected claim frequency per policy is thus unaffected by seasonality, as all policies are assumed to have a duration of exactly one year and therefore cover every season exactly once.
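The statement that seasonality does not change the yearly claim frequency can be checked numerically. The sketch below integrates the seasonal intensity $\lambda S_j$ over one-year policy periods with different start dates; because the weights satisfy equation (5), every full year yields the yearly rate, while a half year does not. It assumes that season j is the j-th quarter of each year, and the weights, rate and function name are hypothetical.

```python
import numpy as np

def expected_claims(lam, weights, start, end, n_grid=100_000):
    """Integrate the seasonal intensity lam * S_j over (start, end) with the midpoint rule,
    where season j is the j-th quarter of each year and time is measured in years."""
    weights = np.asarray(weights, dtype=float)
    if not np.isclose(weights.sum(), 4.0):
        raise ValueError("seasonal weights must sum to 4, as in equation (5)")
    dt = (end - start) / n_grid
    t = start + (np.arange(n_grid) + 0.5) * dt
    season = np.minimum((np.mod(t, 1.0) * 4).astype(int), 3)
    return lam * weights[season].sum() * dt

weights = [1.3, 0.9, 0.8, 1.0]  # hypothetical winter, spring, summer and autumn multipliers
for start in [0.0, 0.37, 0.81]:
    print(start, expected_claims(0.1, weights, start, start + 1.0))  # close to 0.1 each time
print(expected_claims(0.1, weights, 0.0, 0.5))  # first half year only: 0.1 * (1.3 + 0.9) / 4
```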

2.3

Claim size

In the literature several ways to estimate claim sizes have been suggested. Haastrup and Arjas (1996)[6], Larsen (2007)[8] and Antonio and Denuit (2013) [11] model payments through a point process, where the generated occurrences correspond to events such as payment and settlement. In this case the point process for claim occurrences initiates a new point process at every claim occurrence $T_i$. In these models payment sizes and the number of payments are independent and identically


expected number of payments based on the point process for that claim, distributed approximately as an inhomogeneous Poisson distribution. Payment sizes are often assumed to follow a gamma, normal, log-normal or Pareto distribution. An important aspect of this model is that the number of payments directly affects the total claim size: the expected claim size for claims with two payments is twice as large as the size of single payment claims.

Alternatively, we can disregard the number of payments and only model the total claim size. The idea behind this approach is that whether the insurer reimburses the policy holder through one or multiple payments might depend on factors such as risk aversion, the behaviour of the policy holder, a third party which handles repairs, or some other unobserved process. For instance, a very risk averse insurance company might hold off on payments until all receipts have been received and pay everything at once, while others might pay each expense as soon as it has been reported. Whether the insurer pays its customer in one large payment or in two smaller payments does not necessarily affect the total amount for which it is liable. Whether the number of payments should be modelled therefore depends on the underlying data. For example, for injury claims the payments could correspond to hospital bills or medicine, which continue until the policy holder is cured.


3

Distributional assumptions

In sections 3.1-3.2 we discuss our assumptions on the expected frequencies and severities of the claim types in our model. Since the number of claims and the sizes of these claims are assumed to be independent, we derive estimates for them from separate distributions. In section 3.3 we cover the model we use to benchmark our results.

3.1

Claim frequencies

NINR Claims For a single policy, any claim whose occurrence time lies in the future is an NINR claim. We assume that claim occurrences follow an inhomogeneous Poisson point process with intensity or hazard rate $\lambda(t)$:

$$\lambda(t) = \lambda \times \sum_{j=1}^{4} \mathbb{1}_{(T_{S_j}, T_{S_{j+1}})}(t)\, S_j \qquad (6)$$

Here $\lambda$ is the estimated yearly claim rate for the policy, $\lambda(t)$ the resulting claim rate at time t, and $\mathbb{1}_{(T_{S_j}, T_{S_{j+1}})}(t)$ an indicator function which equals 1 if t lies in season j, with $S_j$ the multiplier for season j. It then follows that the number of claim occurrences from $T_a$ until $T_b$ for policy k follows a Poisson distribution:

$$\Lambda_k(T_a, T_b) \equiv \int_{T_a}^{T_b} \lambda(\tau)\, d\tau, \qquad N_{NINR} \sim \text{Poisson}\big(\Lambda_k(T_a, T_b)\big) \qquad (7)$$
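Equation (7) can be evaluated exactly for the piecewise-constant intensity of equation (6) by splitting $(T_a, T_b)$ at the quarter boundaries. The sketch below does this and then draws the number of NINR claims for the remaining term of a policy. The yearly rate, seasonal weights, valuation time, function name and the assumption that seasons are the quarters of each year starting at time 0 are all hypothetical.

```python
import numpy as np

def cumulative_hazard(lam, weights, t_a, t_b):
    """Lambda_k(T_a, T_b) for lambda(t) = lam * S_j on season j (the j-th quarter of a year)."""
    weights = np.asarray(weights, dtype=float)
    # Quarter boundaries strictly inside (t_a, t_b), together with the two endpoints.
    inner = np.arange(np.floor(t_a * 4) + 1, np.ceil(t_b * 4)) / 4.0
    cuts = np.concatenate(([t_a], inner, [t_b]))
    total = 0.0
    for a, b in zip(cuts[:-1], cuts[1:]):
        season = min(int(np.mod(0.5 * (a + b), 1.0) * 4), 3)  # season of the piece's midpoint
        total += weights[season] * (b - a)
    return lam * total

rng = np.random.default_rng(7)
weights = [1.3, 0.9, 0.8, 1.0]  # hypothetical seasonal multipliers, summing to 4
lam_k = 0.12                    # hypothetical yearly claim rate of policy k
remaining = cumulative_hazard(lam_k, weights, 0.6, 1.0)  # policy runs over (0, 1), valued at 0.6
n_ninr = rng.poisson(remaining, size=5)                  # simulated NINR claim counts
print(remaining, n_ninr)
```

For a full policy year the function returns the yearly rate itself, consistent with equation (5).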

IBNR Claims Recall that the conditional hazard rate of IBNR claims at any time t for a single policy can be written as follows, where s denotes the reporting delay:

$$\rho(t \mid H_t) = \int_0^{t - T_{N(t)}} f_{Rd}(s)\, \lambda(t - s)\, ds \qquad (8)$$

where $N(\cdot)$ is the number of events up until time t, and $T_{N(t)}$ is the most recent claim notification at time t. In order to simulate the occurrence times $T_i$ of claims being reported, we draw a series of realisations $E = (E_i)_{i \geq 1}$ from a homogeneous Poisson point process with some rate $\Omega \geq \max_t \rho(t)$.


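it is a realisation of the renewal process described in the previous section. To do so we thin the set of occurrences E by removing a number of its elements based on the modified hazard rate $\rho(E_i \mid H_{E_i})$, such that the remaining elements of E are a realisation of the process for reported claims in equation (8). Since $f_{Rd}(\cdot)$ is a probability density function, it is non-negative and integrates to one, so that for every t

$$\rho(t \mid H_t) \leq \max_\tau \lambda(\tau) \int_0^\infty f_{Rd}(s)\, ds = \max_\tau \lambda(\tau),$$

and hence the following relation holds:

$$\max_t \rho(t \mid H_t) \leq \max_t \lambda(t)$$

We can therefore select Ω as follows:

$$\Omega = \max_t \lambda(t) \equiv \lambda_{\max} \geq \max_t \rho(t \mid H_t).$$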

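We then construct a Markov chain $(Y_i)_{i \geq 0}$, where $Y_0 = 0$ and $Y_i$ is the index of the most recently accepted event, with transition probabilities:

$$P(Y_i = i \mid Y_{i-1} = j) = \frac{\rho\big(E_i \mid H_{E_i},\, T_{N(E_i)} = E_j\big)}{\lambda_{\max}} \qquad (9)$$

$$P(Y_i = j \mid Y_{i-1} = j) = 1 - \frac{\rho\big(E_i \mid H_{E_i},\, T_{N(E_i)} = E_j\big)}{\lambda_{\max}} \qquad (10)$$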

where $E_i$ are the event times of the Poisson process with rate Ω. The set $F = \{E_i \in E : Y_i = i\}$ then consists of realisations of a modulated renewal process with hazard rate $\rho(t \mid H_t)$ as in equation (8). Note that any value of Ω such that $\Omega \geq \max_t \rho(t \mid H_t)$ is appropriate. Suppose we select $\Omega^* = a \times \Omega$ with $a \geq 1$: the expected number of events generated by a Poisson process with intensity $a \times \Omega$ is a factor a larger, but the probability of an event being selected by the Markov chain defined in equation (9) is a factor a smaller, so the resulting expected number of events is the same. If, however, Ω is selected such that $\Omega < \max_t \rho(t \mid H_t)$, the transition probability in equation (9) can exceed 1 and the scheme is no longer valid.

In practice, we restrict F by disregarding any occurrence times which exceed the term of limitation for insurance claims. For a more detailed derivation and a proof we refer to Rao and Teh (2011) [14].
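A minimal sketch of this thinning scheme follows. It assumes, for illustration only, a policy on (0, 1) with a constant claim rate `lam` and exponential reporting delays with rate `mu`, so that the reporting intensity, counting only claims that occurred since the most recent notification, has a simple closed form. Here `omega = lam` plays the role of $\lambda_{\max}$, the policy start serves as the initial renewal point, and `limitation` is a hypothetical term of limitation beyond which candidate times are discarded. The variable `last` tracks the state of the Markov chain in equations (9) and (10), i.e. the most recently accepted event. All names and values are hypothetical.

```python
import numpy as np

def rho(t, last, lam, mu, policy_end=1.0):
    """Reporting intensity at time t when only claims that occurred in
    (last, min(t, policy_end)] can still be reported."""
    upper = min(t, policy_end)
    if upper <= last:
        return 0.0
    return lam * (np.exp(-mu * (t - upper)) - np.exp(-mu * (t - last)))

def simulate_reports(lam, mu, limitation, rng, policy_end=1.0):
    """Thinning: candidates E_i come from a Poisson process with rate omega = lam >= max rho;
    each is accepted with probability rho(E_i | last accepted event) / omega."""
    omega = lam
    horizon = policy_end + limitation           # disregard times beyond the term of limitation
    n_candidates = rng.poisson(omega * horizon)
    candidates = np.sort(rng.uniform(0.0, horizon, size=n_candidates))
    last = 0.0                                  # start of the policy acts as the first renewal
    reports = []
    for e in candidates:
        if rng.uniform() < rho(e, last, lam, mu, policy_end) / omega:
            reports.append(e)
            last = e                            # renewal: earlier history is forgotten
    return np.array(reports)

rng = np.random.default_rng(42)
fast = np.mean([len(simulate_reports(2.0, 1000.0, 5.0, rng)) for _ in range(5000)])
slow = np.mean([len(simulate_reports(2.0, 2.0, 5.0, rng)) for _ in range(5000)])
print(fast, slow)
```

With near-immediate reporting (`mu = 1000`) the average number of notifications is close to `lam`, whereas with slow reporting each notification removes the pending IBNR claims, so the average falls noticeably below `lam`.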

3.2

Claim Sizes


such as a logistic distribution. Alternatively, we can use a Generalized Additive Model for Location, Scale and Shape (GAMLSS), which is an extension of the GLM. Such a model allows jointly estimating an additional predictor, such as the probability of payment, alongside the size of the payment, and as such does not require a secondary regression. The GAMLSS family includes a large number of distributions outside the exponential family, as well as all members of the GLM family.

The type of GAMLSS we consider in this thesis is referred to as a simple linear parametric GAMLSS, which can be seen as a GLM with additional parameters and linear predictors. Given independent observations $y_i$, $i = 1, \ldots, n$, with probability density function $f(y_i \mid \theta_i)$, where $\theta_i = (\mu_i, \sigma_i, \nu_i)$, a GAMLSS models each element of θ as a function of some explanatory variables. Let $y^T = (y_1, y_2, \ldots, y_n)$ and $k = 1, 2, 3$. Then a GAMLSS incorporates monotonic link functions, similar to those of a GLM:

$$g_k(\theta_k) = \eta_k = X_k \beta_k$$

Here μ, σ, ν and $\eta_k$ are vectors of length n, and $X_k$ are design matrices of size $n \times J_k$ with corresponding parameter vectors $\beta_k$ of length $J_k$. Note that the GAMLSS can be greatly extended to cover a wide range of cases, such as random effects or larger numbers of parameters. For an extensive coverage of GAMLSS we refer to Stasinopoulos and Rigby (2007) [13].
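In a simple linear parametric GAMLSS, the link functions turn linear predictors into distribution parameters, after which the log-likelihood is an ordinary sum over observations. The sketch below evaluates that log-likelihood for a gamma response with log links for both μ and σ, parameterising the gamma by its mean μ and coefficient of variation σ (so shape $1/\sigma^2$ and scale $\mu\sigma^2$). It only evaluates the likelihood and does not fit the model; the covariate (vehicle age), the coefficients, the simulated claim sizes and the function name are hypothetical.

```python
import numpy as np
from scipy import stats

def gamma_gamlss_loglik(y, X_mu, beta_mu, X_sigma, beta_sigma):
    """Log-likelihood of a gamma GAMLSS with log links:
    log(mu) = X_mu @ beta_mu and log(sigma) = X_sigma @ beta_sigma,
    where mu is the mean and sigma the coefficient of variation of each y_i."""
    mu = np.exp(X_mu @ beta_mu)           # inverse link for the location parameter
    sigma = np.exp(X_sigma @ beta_sigma)  # inverse link for the scale parameter
    shape = 1.0 / sigma**2                # mean = shape * scale = mu
    scale = mu * sigma**2                 # variance = shape * scale**2 = (sigma * mu)**2
    return stats.gamma.logpdf(y, a=shape, scale=scale).sum()

rng = np.random.default_rng(3)
n = 2000
vehicle_age = rng.uniform(0.0, 15.0, n)           # hypothetical covariate
X = np.column_stack([np.ones(n), vehicle_age])    # same design matrix for mu and sigma
beta_mu, beta_sigma = np.array([7.5, -0.03]), np.array([-0.2, 0.01])
mu = np.exp(X @ beta_mu)
sigma = np.exp(X @ beta_sigma)
y = rng.gamma(1.0 / sigma**2, mu * sigma**2)      # simulated claim sizes

ll_true = gamma_gamlss_loglik(y, X, beta_mu, X, beta_sigma)
ll_wrong = gamma_gamlss_loglik(y, X, np.array([7.0, 0.0]), X, beta_sigma)  # misspecified mean
print(ll_true, ll_wrong)
```

Each parameter gets its own design matrix and coefficient vector, which is what distinguishes this setup from a GLM with a fixed dispersion parameter.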

As the number of RBNS claims is known at any time, we can directly apply the GAMLSS to each reported claim to estimate the RBNS reserve. In this model, however, we can use additional information known about reported claims, such as the case reserve, as risk factors to improve the model where appropriate.

3.3

A benchmark model

In order to assess the quality of the derived estimates for the outstanding claims reserve, we compare them to estimates based on conventional methods. Denote the set of all in-force policies at a certain time as $K_i$, the aggregated payments from arrival year i and development year j as $C_{i,j}$, the cumulative payments corresponding to claim l from policy k as $Y_{k,l}$, and the number of claims for policy k originating from the time interval (a, b) as $N_k(a, b)$. We can write:


This allows us to reconstruct run-off triangles and directly compare the suggested model to the chain ladder method and the related development factors. For a more extensive coverage of the chain ladder method we refer to Mack (1999) [10] and England and Verrall (2002) [4]. To calculate the premium reserve for a single policy, we multiply the insurance premium paid for that policy by the ratio between the remaining exposure and the total policy duration, as per IFRS and the Solvency II technical specifications for the preparatory phase by EIOPA (2014) [7]. For instance, the premium reserve for a policy with one month of its duration left is 1/12 of the insurance premium for that policy. Therefore, when the reporting time is not synchronised with the start or end of individual policies, the premium reserve can be large relative to the RBNS and IBNR reserves. Let $R_{RBNS}$, $R_{IBNR}$, $R_{NINR}$, $R_p$ and $R_o$ denote estimates for the RBNS, IBNR, NINR, premium and outstanding claims reserves respectively. We can then use the following relation to derive a benchmark for our results:

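As a point of reference for the benchmark, the following Python sketch aggregates claim-level payments into a cumulative run-off triangle, estimates volume-weighted chain-ladder development factors, and computes the pro-rata premium reserve described in this section. It is a minimal illustration under our own assumptions rather than the implementation used in this thesis: the record layout `(arrival_year, dev_year, amount)`, the 0-based year indices and all function names are hypothetical, the triangle is assumed to be square, and no tail factor is applied.

```python
from collections import defaultdict

def build_cumulative_triangle(payments, n_years):
    """Aggregate claim-level payments into a cumulative run-off triangle.

    payments: iterable of (arrival_year, dev_year, amount) tuples with
    0-based year indices (a hypothetical record layout). Cells with
    arrival_year + dev_year >= n_years lie in the future and are None.
    """
    incremental = defaultdict(float)
    for i, j, amount in payments:
        incremental[(i, j)] += amount
    triangle = []
    for i in range(n_years):
        row, running = [], 0.0
        for j in range(n_years):
            if i + j < n_years:
                running += incremental[(i, j)]
                row.append(running)
            else:
                row.append(None)
        triangle.append(row)
    return triangle

def chain_ladder_factors(triangle):
    """Volume-weighted factors f_j = sum_i C[i][j+1] / sum_i C[i][j],
    summing over the arrival years observed in both columns."""
    n = len(triangle)
    factors = []
    for j in range(n - 1):
        rows = range(n - j - 1)
        numerator = sum(triangle[i][j + 1] for i in rows)
        denominator = sum(triangle[i][j] for i in rows)
        factors.append(numerator / denominator)
    return factors

def chain_ladder_reserve(triangle):
    """Outstanding claims reserve: projected ultimate minus latest diagonal
    (no tail factor)."""
    n = len(triangle)
    factors = chain_ladder_factors(triangle)
    reserve = 0.0
    for i in range(n):
        last_j = n - 1 - i                 # latest observed development year
        latest = triangle[i][last_j]
        ultimate = latest
        for f in factors[last_j:]:
            ultimate *= f
        reserve += ultimate - latest
    return reserve

def premium_reserve(premium, remaining_exposure, total_duration):
    """Pro-rata premium reserve: premium * remaining exposure / total duration."""
    return premium * remaining_exposure / total_duration
```

For instance, `premium_reserve(1200.0, 1, 12)` returns 100.0, matching the one-month example above. A volume-weighted factor is the usual chain-ladder choice; Mack (1999) [10] discusses its standard error.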

4 Empirical study

The data set we use for our case study consists of the entire claim history of a motor insurance company, including occurrence times, reporting delays, the number of payments for each claim and payment sizes, as well as information such as the type of accident and claim. In addition, we have information about the characteristics of policyholders over time, such as age, car weight, car value and driving experience. The data set spans nearly six years, starting at the creation of the insurance portfolio in January 2008 and ending in June 2013. The number of policyholders starts at zero and grows over time to 44000 policyholders after nearly six years. Table 1 below shows the risk factors that are tracked for each policy. In the data set at our disposal there is no lapse risk, i.e. all policies run for their full duration.

| Risk factor | Variable type and description |
|---|---|
| Vehicle weight | Positive, real; the weight of the insured vehicle |
| Number of vehicles | Positive integer; the number of vehicles the policyholder owns |
| Vehicle brand | Categorical, with levels: European, American, Asian, sports and truck |
| Yearly mileage | Categorical, with levels: less than 10000 km, 10000-30000 km and more than 30000 km |
| Vehicle power | Integer; the engine capacity of the vehicle in cc |
| Vehicle value | Positive, real; the sale price of the vehicle |
| Fuel type of vehicle | Categorical, with levels: diesel, petrol, LPG, electric and hybrid |
| Building year of vehicle | Date; used to derive vehicle age |
| Usage | Categorical, with levels: "private" or "professional and private" |
| Experience | Positive, real; the number of years since the acquisition of a driving license |
| Deductible | Integer; the yearly deductible for the policy |
| Net premium | Positive, real; the net premium for the policy per year |
| Policy year | Integer; the year the policy expires |
| Region | Categorical, with 15 anonymised region indicators |
| Number of claim-free years | Positive integer; the number of years since the last claim |
| Policy expiration date | Date; the date the policy expires |
| Policy holder owns a house | Logical; whether the policyholder owns a house |
| Birthday | Date; the date of birth of the policyholder |


The information known about each claim is shown in Table 2 below.

| Parameter | Description |
|---|---|
| Type of accident | One of eight categories, A through H |
| Occurrence time | The time of the accident |
| Reporting delay | The time between the occurrence of the accident and its notification |
| Size of payments | The size of each payment |
| Time of payments | The date at which each payment occurred |
| Time of settlement | The date at which the claim is settled |
| Total payment | The sum of all observed payments |
| Case reserve | The case reserve |
| Injury | Whether any bodily injury occurred during the accident |

Table 2: Risk factors known for each claim

4.1 Reporting delay


Figure 3: Reporting delay over time


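Figure 4: Plots of a Weibull distribution and the observed reporting delays

$$f_{R_d}(x) = \frac{\mu}{\sigma}\left(\frac{x}{\sigma}\right)^{\mu-1}\exp\!\left(-\left(\frac{x}{\sigma}\right)^{\mu}\right), \qquad x > 0$$

| Parameter | µ | σ |
|---|---|---|
| Estimate | 0.399 | 4.095 |
| Standard error | 0.001 | 0.034 |

Table 3: Weibull distribution parameter estimates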
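A minimal Python sketch of how the fitted reporting delay distribution can be used, reading µ in Table 3 as the Weibull shape and σ as the scale parameter (our interpretation). The function names are hypothetical, and since the time unit of the delays is not stated in this section, all inputs and outputs are in that same, unstated unit. The survival function `prob_still_unreported` is the quantity that determines whether a claim that has already occurred is still IBNR at the evaluation date.

```python
import math
import random

# Shape (mu) and scale (sigma) as estimated in Table 3.
MU, SIGMA = 0.399, 4.095

def weibull_pdf(x, mu=MU, sigma=SIGMA):
    """f(x) = (mu/sigma) * (x/sigma)**(mu-1) * exp(-(x/sigma)**mu), x > 0."""
    return (mu / sigma) * (x / sigma) ** (mu - 1) * math.exp(-((x / sigma) ** mu))

def weibull_cdf(x, mu=MU, sigma=SIGMA):
    """P(R_d <= x)."""
    return 1.0 - math.exp(-((x / sigma) ** mu))

def prob_still_unreported(elapsed, mu=MU, sigma=SIGMA):
    """P(R_d > elapsed): chance that a claim which occurred `elapsed`
    time units before the evaluation date has not been reported yet."""
    return math.exp(-((elapsed / sigma) ** mu))

def weibull_quantile(p, mu=MU, sigma=SIGMA):
    """Inverse CDF: x = sigma * (-log(1 - p))**(1/mu)."""
    return sigma * (-math.log(1.0 - p)) ** (1.0 / mu)

def sample_delays(n, rng):
    """Draw n reporting delays by inverse-transform sampling."""
    return [weibull_quantile(rng.random()) for _ in range(n)]
```

Python's standard library also provides `random.weibullvariate(alpha, beta)` with scale `alpha` and shape `beta`, so `rng.weibullvariate(SIGMA, MU)` is an equivalent way to draw a delay; the explicit inverse CDF is shown because the simulation step relies on inverting distributions.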

4.2 Hazard rate


found in the appendix in Section 7.1. We do not see large differences in characteristics over the time span of the data set. The change in claim frequency is therefore possibly due to some unobserved quantity, such as the terms and conditions of the insurance or a change in the insurer's claim handling. We therefore include an extra indicator for the policy year in our model to account for this trend. In this way the unobserved effect is captured by an additional parameter, which ensures that it is not absorbed into the intercept or the estimates of the other parameters. Because of the large number of observations, the standard errors of the parameter estimates are relatively low. In the plots below we can see a slight decrease in the exposure and the number of claims at the far right; this might be either because there actually are fewer policies in force or because these policies have not yet been included in the data set. Estimation output for the claim rate is shown in Table 4.


| Coefficient | Estimate | Std. error | t-value | p-value |
|---|---|---|---|---|
| intercept | 82.5 | 3.610 | 22.8 | 0 |
| policy year | -0.041 | 9.85E-04 | -42.1 | 0 |
| car value - cheap | -0.091 | 0.017 | -5.14 | 2.81E-07 |
| car value - expensive | 0.228 | 0.016 | 14.3 | 3.32E-46 |
| car value - high | 0.142 | 0.014 | 9.64 | 5.69E-22 |
| car value - middle | 0.086 | 0.013 | 6.31 | 2.73E-10 |
| car age - medium | -0.152 | 0.012 | -12.8 | 3.22E-37 |
| car age - old | -0.255 | 0.028 | -9.05 | 1.44E-19 |
| fuel type - other | -0.170 | 0.014 | -11.7 | 1.38E-31 |
| region 0 | 0.050 | 0.021 | 2.35 | 0.0186 |
| region 1 | 0.296 | 0.017 | 16.8 | 0 |
| region 2 | 0.115 | 0.019 | 5.82 | 5.75E-09 |
| region 4 | -0.073 | 0.015 | -5.88 | 4.21E-09 |
| region 5 | -0.141 | 0.014 | -9.53 | 1.61E-21 |
| deductible | -5.13E-04 | 2.74E-05 | -18.7 | 0 |
| mileage - less than 10.000 km | -0.222 | 0.013 | -15.9 | 0 |
| mileage - more than 30.000 km | 0.281 | 0.019 | 14.7 | 6.47E-49 |
| experienced - false | 0.138 | 0.018 | 7.30 | 2.86E-13 |
| claim free years | -0.013 | 7.93E-04 | -17.2 | 0 |
| age - senior | -0.063 | 0.012 | -4.93 | 8.31E-07 |
| age - young | 0.092 | 0.013 | 6.67 | 2.49E-11 |

Table 4: Parameter estimates for the GLM for the number of claims per policy year
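With a log link, each coefficient in Table 4 acts multiplicatively on the expected claim count. The Python sketch below, assuming the usual Poisson GLM form E[N] = exposure × exp(x′β) with exposure as an offset (the model equation itself is not reproduced in this section), turns a few of the Table 4 coefficients into relativities. The dictionary and function names are hypothetical, and the reference levels of the categorical factors are not stated here, so each relativity is relative to whichever level served as the baseline.

```python
import math

# A few Poisson GLM coefficients copied from Table 4 (log link).
TABLE4 = {
    "car value - expensive": 0.228,
    "car age - old": -0.255,
    "mileage - less than 10.000 km": -0.222,
    "mileage - more than 30.000 km": 0.281,
    "claim free years": -0.013,   # per additional claim-free year
    "deductible": -5.13e-4,       # per unit of deductible
}

def relativity(coef, change=1.0):
    """Multiplicative change in the expected claim count when a covariate
    increases by `change` units (change=1 switches a dummy on)."""
    return math.exp(coef * change)

def expected_claims(linear_predictor, exposure_years):
    """E[N] = exposure * exp(x'beta) under a log link with exposure offset."""
    return exposure_years * math.exp(linear_predictor)

for name, coef in TABLE4.items():
    print(f"{name}: x{relativity(coef):.3f}")
```

For example, `relativity(0.281)` is about 1.324, i.e. roughly 32% more expected claims for a yearly mileage above 30.000 km than at the reference mileage level, and raising the deductible by 500 units multiplies the expected claim count by `relativity(-5.13e-4, 500)`, about 0.774, all else being equal.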


be covered by another insurance policy. Policyholders might therefore be more inclined to report damage such as scratches to a newer or more expensive car than to a cheaper or older car. Unfortunately, we do not have information about the underlying properties of the region indicators.

We find a strong and significant negative relation between the deductible and the claim rate, as well as between the number of claim-free years and the claim rate, showing that both are valuable predictors for an insurer. This is potentially because they serve (i) as a proxy for driver skill or (ii) as a deterrent against reporting accidents. As would be expected based on Figure 5, the parameter for the policy year is highly significant, indicating that there might be an unobserved factor influencing the claim rate, or a trend in the fitted coefficients. Estimates for the seasonal parameters are shown in Table 5 below; they have been derived by maximum likelihood optimisation based on the fitted Poisson model. Since the portfolio consists of all-risk car insurance claims, it is to be expected that most claims are reported in winter. Somewhat surprising is the relatively high rate in spring, as the weather is then generally relatively mild.

| Season | Estimate | Standard error |
|---|---|---|
| Winter | 1.159 | 0.006 |
| Spring | 0.992 | 0.006 |
| Summer | 0.913 | 0.006 |
| Autumn | 0.936 | 0.006 |

Table 5: Estimates for seasonal parameters

4.3 Claim size


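Gamma distribution. Its probability density function is shown below.

$$
f_{\mathrm{ZAGA}}(x;\mu,\sigma,\nu) =
\begin{cases}
\nu, & x = 0,\\[4pt]
(1-\nu)\,\dfrac{1}{(\sigma^2\mu)^{1/\sigma^2}}\,\dfrac{x^{1/\sigma^2-1}\exp\!\left(-x/(\sigma^2\mu)\right)}{\Gamma(1/\sigma^2)}, & x > 0.
\end{cases}
\tag{12}
$$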

Each parameter is modelled as a function of a number of risk factors: µ and σ through a log link function, and ν through a logit link function. Estimates are derived by maximising the likelihood function. Let $X_i$, $i = 1, 2, 3$, be design matrices with corresponding coefficient vectors $\beta_i$; then:

$$\mu = \exp(\eta_1) = \exp(X_1\beta_1) \tag{13}$$

$$\sigma = \exp(\eta_2) = \exp(X_2\beta_2) \tag{14}$$

$$\nu = \frac{1}{1+\exp(-\eta_3)} = \frac{1}{1+\exp(-X_3\beta_3)} \tag{15}$$
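The Python sketch below maps linear predictors to ZAGA parameters through the links in equations (13) to (15) and draws claim sizes, which is what the simulation step requires. It assumes the gamlss parameterisation of the gamma part, with mean µ and variance σ²µ², i.e. shape 1/σ² and scale σ²µ, which matches the density in equation (12); the function names are hypothetical. As an illustration it uses only the intercepts of Table 7, i.e. a reported claim at the reference level of every factor with zero reporting delay.

```python
import math
import random

def zaga_params(eta1, eta2, eta3):
    """Linear predictors -> (mu, sigma, nu): log links for mu and sigma,
    logit link for nu, as in equations (13) to (15)."""
    mu = math.exp(eta1)
    sigma = math.exp(eta2)
    nu = 1.0 / (1.0 + math.exp(-eta3))
    return mu, sigma, nu

def zaga_mean(mu, nu):
    """E[X] = (1 - nu) * mu, because the gamma part has mean mu."""
    return (1.0 - nu) * mu

def zaga_sample(mu, sigma, nu, rng):
    """One claim size: exactly 0 with probability nu, otherwise a gamma
    variate with shape 1/sigma^2 and scale sigma^2 * mu (mean mu)."""
    if rng.random() < nu:
        return 0.0
    shape = 1.0 / sigma ** 2
    scale = sigma ** 2 * mu
    return rng.gammavariate(shape, scale)   # gammavariate(alpha=shape, beta=scale)

# Intercept-only linear predictors from Table 7 (reference-level reported claim).
mu, sigma, nu = zaga_params(7.59, -0.0814, -2.81)
rng = random.Random(2014)
sizes = [zaga_sample(mu, sigma, nu, rng) for _ in range(20000)]
print(f"nu = {nu:.4f}, theoretical mean = {zaga_mean(mu, nu):.0f}, "
      f"simulated mean = {sum(sizes) / len(sizes):.0f}")
```

Sampling the zero mass and the gamma part separately avoids numerically inverting the mixed distribution; the simulated mean should agree with (1 − ν)µ up to Monte Carlo error.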


Figure 6: Several plots of the ZAGA-distribution


Akaike Information Criterion (AIC). Estimation output for the unreported claims model is shown in Table 6 below.

| Parameter | Coefficient | Estimate | Std. error | t-value | p-value |
|---|---|---|---|---|---|
| µ | intercept | -122.61 | 9.882 | -12.407 | 2.740E-35 |
| µ | car age - new | 0.191 | 0.015 | 1.26 | 0 |
| µ | car age - old | -0.081 | 0.037 | -2.18 | 0.0293 |
| µ | policy year | 0.064 | 0.004 | 1.32 | 1.73E-39 |
| µ | claim free years | -0.012 | 8.98E-04 | -14.0 | 1.68E-44 |
| µ | car value - cheap | -0.066 | 0.021 | -3.09 | 0.00203 |
| µ | car value - expensive | 0.296 | 0.020 | 14.8 | 2.23E-49 |
| µ | car value - high | 0.096 | 0.018 | 5.24 | 1.59E-07 |
| µ | car value - middle | 0.048 | 0.017 | 2.83 | 0.00467 |
| µ | experienced - false | 0.170 | 0.023 | 7.45 | 9.49E-14 |
| σ | intercept | 0.117 | 0.003 | 35.4 | 0 |
| ν | intercept | -1.14 | 0.022 | -51.5 | 0 |
| ν | claim free years | -0.139 | 0.002 | -8.73 | 2.68E-18 |

Table 6: Parameter estimates for the GAMLSS for the claim size of unreported claims


| Parameter | Coefficient | Estimate | Std. error | t-value | p-value |
|---|---|---|---|---|---|
| µ | intercept | 7.59 | 0.014 | 544 | 0 |
| µ | car value - cheap | -0.055 | 0.020 | -2.77 | 0.006 |
| µ | car value - expensive | 0.265 | 0.018 | 14.5 | 0 |
| µ | car value - high | 0.159 | 0.017 | 9.42 | 4.83E-21 |
| µ | car value - middle | 0.050 | 0.016 | 3.18 | 0.001 |
| µ | case reserve - middle | 0.177 | 0.040 | 4.47 | 7.73E-06 |
| µ | case reserve - low | 0.396 | 0.024 | 16.6 | 0 |
| µ | case reserve - high | 1.19 | 0.101 | 11.9 | 1.98E-32 |
| µ | experienced - false | 0.179 | 0.0207 | 8.63 | 6.49E-18 |
| µ | category A | 5.16 | 0.020 | 2.54 | 0.011 |
| µ | category B | 0.056 | 0.040 | 1.39 | 0.165 |
| µ | category C | 0.887 | 0.039 | 23.7 | 0 |
| µ | category D | 0.302 | 0.058 | 5.18 | 2.26E-07 |
| µ | category E | -1.44 | 0.013 | -113 | 0 |
| µ | category F | -0.591 | 0.032 | -18.3 | 0 |
| µ | category G | 0.308 | 0.048 | 6.38 | 1.78E-10 |
| σ | intercept | -0.0814 | 0.0038 | -0.211 | 0 |
| ν | intercept | -2.81 | 0.027 | -104 | 0 |
| ν | reporting delay | 0.005 | 5.31E-04 | 9.40 | 0 |
| ν | case reserve - middle | 4.91 | 0.052 | 94.8 | 0 |
| ν | case reserve - low | 1.90 | 0.053 | 35.8 | 0 |
| ν | case reserve - high | 2.59 | 0.163 | 15.9 | 0 |

Table 7: Parameter estimates for the GAMLSS for the claim size of reported claims


Based on the low p-values and the large magnitude of several coefficients, we find that including micro-level data in our model improves its predictive quality. Including micro-level data is particularly useful when the composition of the portfolio changes significantly over time, in which case it provides an additional advantage over aggregated methods. In the portfolio under consideration, however, the composition is relatively stable over time, so this effect is not present in the case study above. Figures of the risk factors at several points in time can be found in Section 7.1 of the appendix.

4.4 Simulation procedure

Using the distributions discussed in the previous sections, we can estimate the value and number of NINR, RBNS and IBNR claims for an individual policy. In addition to these quantities, an insurer would be interested in the total liabilities for an individual policy and for the entire portfolio. While we have an expression for the occurrence rate of IBNR claims, deriving the distribution of the total number of IBNR claims analytically is complex, and it might not have a closed form. A similar argument holds for the distribution of the sum of the IBNR, RBNS and NINR reserves. We therefore approximate these reserves by simulation. As an added benefit, simulation is a more flexible approach than deriving results analytically, although the calculations can be very time-consuming. To generate a realisation of each of the reserves for an individual policy $k$, we proceed as follows. Because claim size is assumed to be independent of claim frequency, and the NINR, IBNR and RBNS reserves are assumed to be independent, the steps do not necessarily have to be followed in this order.

Initialisation. Fit models to the following quantities, as described in the previous sections: (i) the yearly claim frequency $\lambda_k$ for policy $k$, (ii) NINR and IBNR claim sizes, (iii) the reporting delay, (iv) seasonal influences, (v) the process for reported claims $\rho(\cdot)$ and (vi) RBNS claim sizes.

Generate the number of NINR claims. Integrate the hazard rate in equation (6) over the remaining duration of the policy, and draw a realisation from a Poisson distribution with mean equal to the resulting integrated hazard to obtain $N_{NINR}$.


to be piecewise linear, a straightforward choice of $\Omega$ is $\Omega = \lambda \times \max_j S_j$, $j = 1, \dots, 4$. The number of elements of this set is then equal to $N_{IBNR}$, the number of IBNR claims generated.

Generate IBNR and NINR claim sizes. Invert the claim size distribution numerically or analytically and generate $N_{IBNR} + N_{NINR}$ realisations.

Generate RBNS claim sizes. For each of the $N_{RBNS}$ open RBNS claims, generate a claim size from the fitted RBNS claim size distribution.
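The two counting steps can be sketched in Python as follows. This is an illustration under our own assumptions, not the thesis implementation: we assume the hazard in equation (6) takes the form λ_k S_j, with the seasonal factor S_j from Table 5 constant within quarter j of each calendar year (the quarter-to-season mapping is hypothetical), and with time measured in years. IBNR occurrences are generated by thinning a homogeneous Poisson process with rate Ω = λ max_j S_j, keeping a candidate time t with probability S(t) P(R_d > τ − t) / max_j S_j, where τ is the evaluation date; the Weibull delay uses µ from Table 3 as shape and σ as scale, and the hypothetical parameter `delay_units_per_year` converts years into the unit of the delays. Note that the four factors in Table 5 average to exactly 1.000, so over a full year the integrated hazard equals λ_k.

```python
import math
import random

SEASONS = [1.159, 0.992, 0.913, 0.936]   # Table 5: winter, spring, summer, autumn
MU, SIGMA = 0.399, 4.095                 # Table 3: Weibull shape, scale

def seasonal_factor(t):
    """Hypothetical mapping: t in years, season j = j-th quarter of the year."""
    return SEASONS[int(math.floor(4 * (t % 1.0))) % 4]

def integrated_hazard(lam, t0, t1):
    """Integral of lam * S(t) over [t0, t1], S piecewise constant on quarters."""
    total, t = 0.0, t0
    while t < t1:
        end = min((math.floor(4 * t) + 1) / 4.0, t1)   # next quarter boundary
        total += lam * seasonal_factor(t) * (end - t)
        t = end
    return total

def poisson_draw(mean, rng):
    """Knuth's multiplication method; adequate for small per-policy means."""
    limit, k, p = math.exp(-mean), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def draw_n_ninr(lam, now, policy_end, rng):
    """NINR step: Poisson draw whose mean is the integrated hazard over
    the remaining policy duration."""
    return poisson_draw(integrated_hazard(lam, now, policy_end), rng)

def ibnr_times(lam, t_start, now, delay_units_per_year, rng):
    """IBNR step by thinning: candidates from a Poisson process with rate
    omega = lam * max(S_j) on [t_start, now]; keep t with probability
    S(t) * P(R_d > now - t) / max(S_j)."""
    omega = lam * max(SEASONS)
    kept, t = [], t_start
    while True:
        t += rng.expovariate(omega)                  # next candidate time
        if t > now:
            return kept
        delay = (now - t) * delay_units_per_year     # elapsed time in delay units
        p_unreported = math.exp(-((delay / SIGMA) ** MU))
        if rng.random() < seasonal_factor(t) * p_unreported / max(SEASONS):
            kept.append(t)

rng = random.Random(1)
print(draw_n_ninr(0.3, 2013.5, 2014.0, rng))
print(ibnr_times(0.3, 2013.0, 2013.5, 365.25, rng))  # delays assumed in days
```

Thinning keeps the simulation exact for any bounded intensity, at the cost of discarding some candidates; with factors this close to 1 almost all seasonal rejections are cheap.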

The simulated NINR reserve is then calculated by summing the $N_{NINR}$ realisations of the NINR claim size distribution. The IBNR and RBNS reserves can be derived in a similar fashion. We repeat the steps above sufficiently many times to generate a set of realisations, from which we derive sample statistics.
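The repetition and summary step can be sketched in Python as below. The three `draw_*` callables are hypothetical placeholders for the portfolio-level NINR, IBNR and RBNS steps; the reserves are drawn independently, as assumed in the text, and the ultimate loss is taken as their sum, which is consistent with the columns of Tables 8 to 11. The quartiles use the `inclusive` method of Python's `statistics.quantiles`; other quantile conventions give slightly different quartiles for small samples.

```python
import random
import statistics

def simulate_reserves(draw_ninr, draw_ibnr, draw_rbns, n_sims=100, seed=0):
    """Repeat the simulation n_sims times. Each draw_* callable takes a
    random.Random and returns one simulated portfolio-level reserve."""
    rng = random.Random(seed)
    runs = {"Ultimate loss": [], "IBNR": [], "NINR": [], "RBNS": []}
    for _ in range(n_sims):
        ibnr, ninr, rbns = draw_ibnr(rng), draw_ninr(rng), draw_rbns(rng)
        runs["IBNR"].append(ibnr)
        runs["NINR"].append(ninr)
        runs["RBNS"].append(rbns)
        runs["Ultimate loss"].append(ibnr + ninr + rbns)
    return runs

def summarise(sample):
    """Min, quartiles, mean and max, the statistics reported in Tables 8 to 11."""
    q1, median, q3 = statistics.quantiles(sample, n=4, method="inclusive")
    return {"Min.": min(sample), "1st Qu.": q1, "Median": median,
            "Mean": statistics.fmean(sample), "3rd Qu.": q3, "Max.": max(sample)}

# Usage with placeholder draws (not fitted to the thesis data):
runs = simulate_reserves(lambda r: r.gauss(5.0e6, 1.0e5),
                         lambda r: r.expovariate(1 / 1.8e5),
                         lambda r: r.lognormvariate(14.6, 0.2))
for column, sample in runs.items():
    print(column, {k: round(v) for k, v in summarise(sample).items()})
```

The default of 100 simulations mirrors the sample size used in the results below; larger values of `n_sims` are needed for stable tail statistics.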


5 Results

To test the methods described in this thesis, we derived estimates at several points in time based on the data available at each of those times. Since it is likely that not all claims have been observed yet, even for past arrival years, the observed set of claims is not equal to the true set of claims. However, due to short reporting delays, the difference is unlikely to be very large. As a benchmark we derive results from the chain-ladder method and the premium reserve. Figures 7 to 10 and Tables 8 to 11 show estimates of the reserves at several evaluation dates. The dashed orange and white lines correspond to the observed values of the estimated quantities, and the dashed black and white lines indicate the outcome of the benchmark model. The simulated samples consist of 100 realisations of the simulation model; for actual applications we would suggest generating larger samples.


Figure 7: Simulation results and benchmark at 2013-07-01

| | Ultimate loss | IBNR | NINR | RBNS |
|---|---|---|---|---|
| Min. | 7054735 | 132772 | 5069899 | 1505134 |
| 1st Qu. | 7536454 | 165893 | 5393601 | 1904637 |
| Median | 7836832 | 183753 | 5465436 | 2147680 |
| Mean | 7860820 | 186717 | 5469432 | 2204670 |
| 3rd Qu. | 8166522 | 203400 | 5552801 | 2421081 |
| Max. | 9265840 | 280571 | 5804955 | 3657285 |
| Benchmark | 8238641 | | | |

Table 8: Simulation results and benchmark at 2013-07-01


Figure 8: Simulation results, observed value and benchmark at 2012-07-01

| | Ultimate loss | IBNR | NINR | RBNS |
|---|---|---|---|---|
| Min. | 9933084 | 129643 | 7911262 | 1376148 |
| 1st Qu. | 10419477 | 157179 | 8190240 | 1964964 |
| Median | 10675616 | 169418 | 8301606 | 2202552 |
| Mean | 10758458 | 172710 | 8326906 | 2258843 |
| 3rd Qu. | 11056511 | 183800 | 8445452 | 2522496 |
| Max. | 12477356 | 240517 | 8798573 | 3603357 |
| Benchmark | 11579956 | | | |
| Observed values | 9511504 | 162431 | 6935843 | 1867701 |

Table 9: Simulation results, observed value and benchmark at 2012-07-01


Figure 9: Simulation results, observed value and benchmark at 2012-01-01

| | Ultimate loss | IBNR | NINR | RBNS |
|---|---|---|---|---|
| Min. | 8520701 | 65430 | 7464078 | 669632 |
| 1st Qu. | 8959501 | 94319 | 7805809 | 998135 |
| Median | 9151775 | 108529 | 7912404 | 1088989 |
| Mean | 9140785 | 109187 | 7921872 | 1109726 |
| 3rd Qu. | 9340828 | 121302 | 8034555 | 1217276 |
| Max. | 9860947 | 170512 | 8381307 | 1699961 |
| Benchmark | 10382557 | | | |
| Observed values | 8883877 | 123700 | 6935843 | 1824334 |

Table 10: Simulation results, observed value and benchmark at 2012-01-01


reserves. An NINR and an IBNR reserve are estimated for every policy, but only policies with claims have an RBNS reserve.

Figure 10: Simulation results, observed value and benchmark at 2011-07-01

| | Ultimate loss | IBNR | NINR | RBNS |
|---|---|---|---|---|
| Min. | 9262666 | 42808 | 7874828 | 705633 |
| 1st Qu. | 9714851 | 77496 | 8249329 | 1311861 |
| Median | 9993567 | 95610 | 8368464 | 1532062 |
| Mean | 10031318 | 92459 | 8362721 | 1576138 |
| 3rd Qu. | 10274773 | 106686 | 8491497 | 1830344 |
| Max. | 11471130 | 133413 | 8822136 | 2851801 |
| Benchmark | 10117804 | | | |
| Observed values | 10050641 | 215843 | 7775230 | 2059568 |

Table 11: Simulation results, observed value and benchmark at 2011-07-01


Calculating the 99.5% Value at Risk (VaR) for the Solvency II capital requirement based on this model is straightforward, but would require a larger number of simulations. Note that because the VaR risk measure is not subadditive in general, the 99.5% VaR for individual policyholders can unfortunately not be related directly to the Solvency II capital requirements. Premium risk is straightforward to assess from this model, given a particular premium reserve $R_p$ and simulated NINR outcomes. Based on the estimation output and the simulated samples, we find that the extra information provided by the micro-level data outweighs the increase in uncertainty from the larger model size. In addition, the proposed model does not make assumptions about the composition of the underlying portfolio, which means it is appropriate even when the policyholder characteristics change significantly.
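A small Python sketch of the risk measures mentioned above. `empirical_var` takes the 99.5% quantile of simulated losses (the lower empirical quantile, one of several conventions), and `shortfall_probability` is one possible premium risk measure, the share of simulated NINR outcomes that exceed $R_p$; both names and the choice of measure are ours. The toy two-policy example, which is illustrative and not based on the data, shows why VaR is not subadditive: each policy alone has a 99.5% VaR of zero, yet the VaR of their sum is positive.

```python
import math

def empirical_var(losses, level=0.995):
    """Lower empirical quantile: smallest simulated loss x such that the
    share of losses <= x is at least `level`."""
    ordered = sorted(losses)
    k = max(math.ceil(level * len(ordered)) - 1, 0)
    return ordered[k]

def shortfall_probability(ninr_samples, premium_reserve):
    """Share of simulated NINR reserves that exceed the premium reserve R_p."""
    return sum(x > premium_reserve for x in ninr_samples) / len(ninr_samples)

def var_discrete(dist, level=0.995):
    """VaR of a discrete loss distribution given as {loss: probability}."""
    cumulative = 0.0
    for loss in sorted(dist):
        cumulative += dist[loss]
        if cumulative >= level - 1e-12:   # tolerance for rounding error
            return loss
    return max(dist)

def add_independent(d1, d2):
    """Distribution of the sum of two independent discrete losses."""
    out = {}
    for x, p in d1.items():
        for y, q in d2.items():
            out[x + y] = out.get(x + y, 0.0) + p * q
    return out

# Toy policy: loss of 100 with probability 0.4%, otherwise 0.
policy = {0.0: 0.996, 100.0: 0.004}
portfolio = add_independent(policy, policy)
print(var_discrete(policy), var_discrete(portfolio))   # prints: 0.0 100.0
```

Because the probability that at least one of the two policies claims (about 0.8%) exceeds 0.5%, the portfolio VaR is larger than the sum of the stand-alone VaRs, which is exactly why policy-level VaRs cannot simply be added up to a capital requirement.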

6 Concluding remarks

In this thesis we have presented a way to calculate technical provisions for individual non-life insurance policies based on the individual properties of policyholders and their claim history. We introduce the concept of Not Incurred and thus Not Reported (NINR) claims as a counterpart to the premium reserve, which can be used to assess premium risk. Using this notion we present a coherent framework in which non-life technical provisions can be calculated. To estimate the reserve we use a frequency-severity approach based on the model presented in Antonio and Plat (2010) [1]. We model future claim occurrences with a point process in continuous time based on the Poisson distribution. Incurred But Not Reported (IBNR) claim occurrences are modelled with a renewal process based on claim frequency and reporting delay. A Generalised Additive Model for Location, Scale and Shape (GAMLSS) is used to estimate claim sizes. In an empirical study on a small all-risk car insurance portfolio, we have compared the approach to a benchmark model and to out-of-sample observations.

By modelling the expected number and size of future claims such that the number and size of reported claims are a realisation of the modelled claims, the proposed model provides a coherent way to derive both technical provisions and insurance premiums. We find that this coherence improves the interpretability of the model and allows for a better assessment of the total risk associated with an insurance policy.


or claim level. In addition, one of the main assumptions of the chain-ladder method, namely that the composition of the underlying portfolio does not change over time, is not required. While the higher complexity and larger number of fitted parameters compared to aggregated methods cause more parameter and model uncertainty, we feel this effect is offset by the much larger amount of available data, provided the model is selected parsimoniously. Because claims are modelled in continuous time, the method can be used to derive results at any point in time.

Based on our findings, the results are in general more accurate than those of the benchmark model, but they are computationally intensive to derive. The approach has the following advantages over aggregated claim reserving: (i) it does not require observations from many arrival years; (ii) it provides valid results when the underlying insurance portfolio changes over time; (iii) since the model is set in continuous time, there is no need to remove observations or to approximate data to the nearest unit of measurement; (iv) by including seasonality, the model can be used to estimate reserves by quarter or by month; (v) as opposed to aggregated methods, micro-level models allow for more analysis of the causes of loss and profit; and (vi) micro-level methods allow different treatment of subsets of the data, which potentially allows us to model reinsurance contracts. The main disadvantages are the large amount of data required, the increased complexity and the long calculation time.

An important factor in determining the quality of the model outcome is the fit of the underlying distributions for claim size, occurrence times and reporting delay. In the model described in this thesis, each of these distributions is easily adjusted or replaced based on the underlying data, while the overall framework remains appropriate. For instance, for lines of business where the number of payments per claim is relevant, a Poisson process could be fitted to the development of claims in addition to modelling only payment sizes, such as the one presented by Antonio and Plat (2010) [1] and Haastrup and Arjas (1996) [6]. We expect that a micro-level model could be especially valuable when the composition or behaviour of the insurance portfolio changes over time; however, we have not yet been able to test this expectation.


7 Appendix

7.1 Risk factors over time


References

[1] K. Antonio and R. Plat. Micro-level stochastic loss reserving for general insurance. Scandinavian Actuarial Journal, pages 1–21, 2010.

[2] E. Arjas. The claims reserving problem in non-life insurance: some structural ideas. ASTIN Bulletin, 19(2):139–152, 1989.

[3] R. J. Cook and J. F. Lawless. The Statistical Analysis of Recurrent Events. Springer, 2007.

[4] P. D. England and R. J. Verrall. Stochastic claims reserving in general insurance. British Actuarial Journal, 8:443–544, 2002.

[5] J. Guszcza and J. Lommele. Loss reserving using claim-level data. Casualty Actuarial Society Forum, Fall, 2006.

[6] S. Haastrup and E. Arjas. Claims reserving in continuous time; a nonparametric Bayesian approach. ASTIN Bulletin, 26(2):139–165, 1996.

[7] European Insurance and Occupational Pensions Authority (EIOPA). Technical specification for the preparatory phase. 2014.

[8] C. R. Larsen. An individual claims reserving model. ASTIN Bulletin, 37(1):113–132, 2007.

[9] J. F. Lawless and K. Thiagarajah. A point-process model incorporating renewals and time trends, with application to repairable systems. Technometrics, 38(2):131–138, 1996.

[10] T. Mack. The standard error of chain-ladder reserve estimates: recursive calculation and inclusion of a tail-factor. ASTIN Bulletin, 29:361–366, 1999.

[11] M. Pigeon, K. Antonio, and M. Denuit. Individual loss reserving with the multivariate skew normal distribution. ASTIN Bulletin, 43:399–428, 2013.

[12] S. Chukova S. Anastasiadis. Dependencies in insurance modeling: An overview. 2010.

[13] D. M. Stasinopoulos and R. A. Rigby. Generalized additive models for location scale and shape (GAMLSS) in R. Journal of Statistical Software, 23(7), 2007.
