Crew Insurance

A quantitative analysis to investigate the risk premiums for various insurance products

Claudia Zunnebeld

Master Thesis Econometrics, Operations Research and Actuarial Studies
Specialization: Actuarial Studies - Crew Insurance


Crew Insurance

A quantitative analysis to construct premiums for various insurance products

Claudia Zunnebeld

August 21, 2014

Abstract

In this thesis, we investigate how to model the risk of various products sold by a crew insurance company. Using rating factors to explain the risk, we model the risk premium in the form of claim incidence and claim severity. The claim incidence denotes the number of claims per day at sea insured and is modeled by a bivariate gamma or marginal gamma distribution, depending on the number of bivariate observations across products. Using MLE we also model the proportion of zero-claim incidences. Here we observe dependence between the bivariate zero-claim incidences of two products. Furthermore, these models were proven not to be stable between currencies and over the years. Next, the log claim amount is modeled using a gamma GLM without the rating factors region and rank, as they are not significant. We find that the claim incidence of policies written in USD is higher than the claim incidence of policies written in EUR. Also, the claim severity of claims paid in USD is higher than that of claims paid in EUR. Finally, we observe large confidence intervals which, in combination with limited data availability, make strong conclusions impossible.


Preface

Within this thesis the reader will find the results of my final project for the Master Actuarial Studies at the University of Groningen. In a period of six months I performed research in the crew insurance business. It has given me insight into the field of insurance, which will definitely be of value in my further career. Before I started in February, I was unacquainted with this specific field of insurance. By a process of trial and error I found out how actuarial theory is applied in the practical field as well as at management level.

The knowledge gained during my Bachelor and Master at the University of Groningen was very important in writing this thesis. Especially during the last year I have studied a wide, though very specialized, spectrum of courses in which I've extended my econometric knowledge and way of thinking. Throughout the entire year I have been inspired by the professors of the Master program. They not only took the time to teach us valuable theory, they also supported me to get the best out of myself.

My special thanks go out to Professor R.H. Koning, who had a prominent role in this thesis and with whom I drank so many inspiring cups of coffee this year. While giving me the freedom to conduct my own research, he supported me in the content of this thesis. He also gave me personal support and helped me out when necessary. He is an expert on the theory of risk and statistics, but above all he is a professor constantly questioning the practical application of theory to my subject. This showed me how important it is to make well-considered choices, the choices I made in writing this thesis.

Furthermore, I would like to acknowledge the contribution of an anonymous insurance company who gave me the freedom to do my research and obtain my data. I hope my thesis will contribute to the process of renewing their insurance premiums for several products. A correct assessment of the policy portfolio will be of high value in identifying the risk in their portfolio. During these six months they have always involved me in the project and given me the feeling I was a member of their company. Therefore I would like to thank all my former colleagues, who were all so willing to answer my questions. In particular I would like to thank Armen and Aad. They introduced me to the business, helped me view the project from different perspectives and also encouraged my personal development.

This thesis is the end of a valuable period as a student of the University of Groningen in which I've learned so much in many respects. Last but not least, I want to thank my family, Mark and my friends, who did not see me often this year. They were always there to support and motivate me when I needed it.


Contents

1 Introduction
2 Research Methodology
  2.1 Terminology
  2.2 Individual or collective risk model?
3 Theory on the insurance premium
  3.1 Rating Factors
  3.2 The insurance premium
4 Modeling theory
  4.1 Modeling the claim incidence
    4.1.1 Dependence in claim behavior
    4.1.2 MLE to model the claim incidence
  4.2 Modeling the claim severity
    4.2.1 General overview of the GLM
    4.2.2 Mathematical details of the GLM
  4.3 Ratemaking
  4.4 Model selection
5 Data description
  5.1 Coverages and claims for the N-group
  5.2 Coverages and claims for the C-group
  5.3 Coverages and claims for the I-group
6 Estimation Results
  6.1 Claim incidence
    6.1.1 Claim incidence N-group
    6.1.2 Claim incidence I-group
  6.2 Claim severity
    6.2.1 Claim severity N-group
    6.2.2 Claim severity I-group
  6.3 Risk premium
7 Concluding remarks

1 Introduction

Imagine what happens if an accident occurs on a ship at sea and one of the crew members gets seriously injured. Who is responsible for the costs of transporting this crew member to a nearby hospital? Who pays for the medical care he needs? What happens if the injured crew member still needs medical care in his own country?

In this thesis we conduct research on an insurance company in the maritime business. The maritime business is a high-risk business where accidents happen on a regular basis. Therefore, the employer needs to insure the risk of high costs of medical care to prevent bankruptcy. Insurance refers to the business of transferring the economic risk from the insured to the insurance company. In exchange for an insurance premium, the insurance company is now responsible for the costs of medical care and other related costs. Denuit et al. (2006) define this risk as a non-negative random variable which represents the amount of money paid by the insurance company to indemnify the policyholder. The price paid to insure this risk is the insurance premium.

In order for an insurance company to be profitable, this insurance premium needs to be high enough to cover the potential claims and other costs incurred by the insurance company. To determine the insurance premium that should be paid by the employer, it is required to know more about the claims the employers file. The filed claims can be described in terms of claim frequency and claim severity.

In this thesis we work with insurance data consisting of different insurance modules. The data come from an insurance company with a specialized department in crew insurance. An insurance module insures a certain part of the care and aftercare needed in case of an accident or illness during the working period on a ship. The details concerning the specific module are stated in the policy contract. If the employer is insured by the insurance company, he becomes a policyholder. In this thesis we will call the insurance modules products. The policyholder can choose to insure his crew for more than one product. The contribution of this thesis is to determine which variables can be used in a stochastic model to model the risk of the policyholders. These variables can then be used to model the claim frequency and the claim severity for the products. We propose the following research question:

Which stochastic model is appropriate to model the risk of crew insurance?

To answer this question several sub-questions arise:

1. Are the risks of the various products independent or dependent?

2. Can we perform a more precise risk assessment using the heterogeneity of the claims?
3. What is an appropriate model selection criterion?

Section 4 provides the reader with literature on the models used in this thesis. In Section 5 the data are described, and in Section 6 the analysis of the data is explained and the results of the tests are provided. The last section, Section 7, provides the reader with concluding remarks and suggestions for further research.

2 Research Methodology

Crew insurance is a niche in the insurance market. Most procedures used to model general insurance risk can be applied to crew insurance; however, we have to take into account some special characteristics of this business. For example, the policyholder does not insure his crew for a fixed period and he can change the composition of his crew during the contract period. We therefore conduct exploratory and explanatory research in this thesis. We perform exploratory research by applying the existing theory on general insurance topics to the specific characteristics of crew insurance. Furthermore, we obtained data from a crew insurance company and use these data for our explanatory research. The use of several models is discussed and we will show which model is most useful in our situation.

2.1 Terminology

Table 1 provides an overview of the terminology used in this thesis. These terms will help us understand the fundamentals of the insurance business.

Definition: Description
Claim frequency: a measure of the rate at which claims occur
Claim severity: a measure of the size of the claim (EUR or USD)
Risk premium: a measure of the average expected loss per coverage (EUR or USD)
Insurance premium: the price to obtain insurance as determined by the insurer (EUR and USD)
Policyholder: the person/company in whose name an insurance policy is written
Coverage: the number of insured days at sea
Insured days at sea: the period determined in the collective bargaining agreement (CBA) in which the crew member works for the employer, including the traveling period to and from the ship
Policy: the document embodying the insurance contract; it specifies the number of crew members insured, the coverage and the products for which the crew is insured

Table 1: Terms with a description as they are used in this thesis.

2.2 Individual or collective risk model?

The portfolio of claims can be regarded as an individual risk model (IRM) or as a collective risk model (CRM). The IRM considers the claims on all policies and sums over all policies in the portfolio, while the CRM is derived from the portfolio as a whole (see Verrall (1989)). The IRM considers a portfolio of policies, indexed by i, and a claim random variable Y that combines two random events: the event that a claim occurs and the size of the claim given that it occurs. The variable Y_i can be defined as follows:

\[
Y_i = I_i B_i \tag{1}
\]

where

\[
I_i =
\begin{cases}
1 & \text{in case of a claim, with probability } q \\
0 & \text{in case of no claim, with probability } 1 - q
\end{cases}
\]

and B_i is the size of the claim. The total losses for the insurer are:

\[
S = \sum_{i=1}^{n} I_i B_i \tag{2}
\]

where the summation is over all policies, so n is the number of policies in the portfolio. A limitation of the IRM is that it allows only one claim per policy, because the Bernoulli distributed random variable I_i can only be 1 or 0. In crew insurance more than one claim can be filed on one policy. Therefore we consider the portfolio as a whole and use the CRM as the basic risk model in this thesis.

The CRM can be represented as in Equation (3):

\[
S = Y_1 + Y_2 + \cdots + Y_N, \quad \text{where } N \in \mathbb{N}^+ \tag{3}
\]

S is again the total loss random variable. In the collective risk model, the summation is over the claims, so N is the random number of claims in the portfolio. Let Y_i be the size of the ith claim and assume the Y_i to be independent and identically distributed (i.i.d.) random variables, unless stated otherwise. The distribution of N is assumed to be independent of the payments Y_i. Using this model, the claim frequency and the claim severity can be modeled separately. This has seven advantages according to Klugman et al. (2012). We name the ones relevant for this paper:

• When the number of policies in the portfolio changes, the number of expected claims changes. We need to account for this change when forecasting the number of claims in future years based on past years' data. By modeling the claim frequency and claim severity separately this can be done easily.

• Heterogeneity of the data can be combined to obtain hypothetical loss distributions.
• The distribution of S depends on the distribution of N and Y. By observing these

In accordance with the CRM, let N be the claim count random variable with frequency distribution f. The size of the claim payments is denoted by the random variable Y with severity distribution s.
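To make the collective risk model concrete, the following minimal sketch simulates the aggregate loss S of Equation (3). The Poisson claim count, the gamma claim sizes and all parameter values are illustrative assumptions only; they are not the distributions or figures used in this thesis.

```python
import numpy as np

rng = np.random.default_rng(42)

def simulate_aggregate_loss(n_sims=10_000, lam=3.0, shape=2.0, scale=500.0):
    """Simulate the CRM aggregate loss S = Y_1 + ... + Y_N.

    N ~ Poisson(lam) is the random number of claims in the portfolio and the
    Y_i ~ Gamma(shape, scale) are i.i.d. claim sizes, independent of N.
    All distributional choices and parameter values are illustrative only.
    """
    losses = np.empty(n_sims)
    for s in range(n_sims):
        n_claims = rng.poisson(lam)                      # draw the claim count N
        claims = rng.gamma(shape, scale, size=n_claims)  # draw the claim sizes Y_i
        losses[s] = claims.sum()                         # aggregate loss S
    return losses

S = simulate_aggregate_loss()
print(f"simulated E[S] ~ {S.mean():.1f}, sd(S) ~ {S.std():.1f}")
```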

3 Theory on the insurance premium

The maritime business has always been a dangerous business for both the employer and the crew members. The employer can lose his ship and cargo, and the crew members risk their lives. In 1847 the Fatal Accidents Act (FAA) was signed. This act gave crew members and their families the right to compensation to cover their damages in case of an accident or illness. From that moment the probability of claims increased for employers. To protect the companies and employers from insupportable costs of claims, Protection and Indemnity (P&I) Clubs were set up. The P&I Clubs offer liability insurance. Liability insurance covers the risk of damages to a third party. Hence the P&I insurance protects a policyholder from claims initiated by another party, not against the policyholder's own damage.

In the maritime business crew members are covered by the agreements of a Collective Bargaining Agreement (CBA). A CBA is a document which details all the terms and conditions of the crew employed on a ship. In the CBA it is stated that an employer is responsible for his crew members, meaning that he has to pay for medical help and repatriation of his crew in case of accidents or illness during the period they work for him. P&I insurers offer crew insurance. However, in most cases it is cheaper for an employer to insure his ship and cargo at a P&I club and to insure his crew members through a separate crew insurance. Crew insurance nowadays is also very effective for temporary employment agencies, where the agency is responsible for the temporary employees and not the employer who hires them.

In order for insurance companies to operate effectively, the policyholders contribute an amount of their income in the form of a premium into the insurer's fund. This amount is used to compensate the policyholders who suffer a loss. According to Bühlmann (1985) all insurance activities of an insurance company can be summarized by an input-output system as in Figure 1.

To manage the risk of an insurance company, a distinction is made between risk factors and rating factors (see Sjöberg (2000)). Risk factors are the factors that directly influence the claim frequency and claim severity of a claim. Unfortunately, they are often difficult to obtain or measure. This is where rating factors come in. Instead of measuring the risk factors, the company selects rating factors. Rating factors are factors that are correlated with the true underlying risk factors. Rating factors are cheap to obtain and they can be easily verified (see Booth et al. (2004)). Take for example a car insurance: driving at high speed increases the probability of an accident and is a risk factor; however, it is not directly measurable. A rating factor in this case can be age, as young drivers tend to be more reckless and drive faster. Rating factors are essential to the insurer to determine the level of the risk premium. After having obtained the rating factors, an insurance company can base its premium on two methods:

Exposure rating: this approach uses the characteristics of a broad group of policyholders to make an estimate of the expected claim per individual. Insurance companies generalize and collect data over many years and many policyholders. They sometimes use wider industry or government data to assess the possible future behavior of policyholders and the factors associated with the loss experience. This approach results in general premiums, i.e. not individual premiums.

Experience rating: this approach differentiates between policyholders, and charges different premiums to individuals with different risk profiles. By grouping policyholders into groups with the same risk profile, the insurer will make an assessment of the possible risk factors likely to affect the claim frequency and the claim severity based on the past experience of this group of policyholders (see Booth et al. (2004)).

In this thesis we work with data where the policyholders have been registered into three groups. The groups are based on the information registered about the insured persons. The policyholder can decide which characteristics he wants to register.

The first group are the policyholders who insure a group of individuals for a certain period. They do not register any characteristics of the individuals. We only know the number of crew members they insure. We name this group the N-group. The second group of policyholders insures individual crew members and registers the rank of the crew members. The registered rank is grouped into 'officer' or 'crew'. We name this group the C-group. The last group also insures individual crew members and registers more specific characteristics. This group of policyholders, which we name the I-group, registers characteristics such as rank, age and region. Being able to group the policyholders based on this risk information, we use the method of experience rating.

3.1 Rating Factors

1. factors associated with the insured individual
2. factors relating to the coverage being sought

In crew insurance, the rank and the region the individual comes from are rating factors associated with the insured individual. The rank and region can affect the probability of an accident or illness occurring during the coverage. The second factor, the coverage being sought, is important for the products that insure disability and loss of life. If a crew member is insured for a loss of life product, a lump sum will be paid out to the relatives if he dies. The same applies to temporary or permanent disability. If a crew member becomes temporarily disabled, a percentage of his salary will be paid during this period. If the crew member is permanently disabled, a lump sum will be paid. The lump sum, which we name the insured sum, is documented in the specific CBA under which the crew member works. The size of the amount to be paid determines the loss for the insurance company. In a general CBA there are five aspects for which the employer is responsible:

• medical and dental services

• repatriation and crew effects (possessions)
• temporary disability
• permanent disability
• loss of life

In this thesis we will focus on the first two aspects. When the policyholder insures his crew for these aspects, he can insure them for a certain amount of days at sea. In Section 5 we will elaborate on the products that insure these two aspects.

3.2 The insurance premium

The process of calculating the insurance premium is described by Booth et al. (2004). They state that an insurance premium depends on several aspects. These are:

• the exposure of the policyholder and the insured individual towards risks • the degree of risk associated with the policyholder

• the costs administering the claims and policies • the profit required by the insurer

The first is the costing stage. In this stage the calculation of a theoretical price for the risk and all associated expenses is done. The second stage considers the commercial adjustment to the theoretical cost that takes the broader corporate objectives into account. In this thesis we focus on the costing stage, as the second stage depends on the character of the insurance company and its competitive position.

The theoretical premium formula in the costing stage can be expressed as follows:

OP = R + lv + u + O/T (4)

where:

OP  the office premium
R   the risk premium, R = a × v
v   the expected number of claims per policy
a   the expected claim severity (expected amount per claim)
l   the expense of handling each claim, dependent on the number of claims
u   the cost per policy set up, underwriting expenses
O   the total overhead expense allocated to this class of business
T   the expected number of policies to be sold

By decomposing the insurance premium into several components, all components can be explained and calculated separately. However, most companies combine some of the terms mentioned above into an overall proportional addition to the risk premium as it is not possible to determine the costs per claim.

The insurance premium stated in Equation (4) is a basic formula. To apply this formula in the insurance business, three additional terms have to be included: a reservation term, a reinsurance term and a profit margin.

Insurance companies are controlled by regulators and have to fulfill several conditions to reduce the risk of insolvency. These conditions are stated in Solvency II, the risk management regulation for insurance companies. According to Solvency II ’all possible claims’ must be included in the insurance premium. This means we have to take into account claims that have occurred, but have not been reported yet: incurred but not reported (IBNR). Furthermore we have to take into account claims that have been reported, but not fully settled: reported but not settled (RBNS). Modeling IBNR and RBNS claims is outside the scope of this thesis. We include a variable for the reservation to be made to cover the loss of these claims. This term is named res.

The final term included is the profit margin k, which is a percentage of the office premium as stated in Equation (4). When the OP has been derived, one needs to multiply by 1 + k. This leads to the following insurance premium formula:

P = (R + lv + u + O/T + res + D(va))(1 + k)   (5)

The insurance premium P in Equation (5) is a general formula which can be applied to different lines of business. In this thesis we apply the formula to determine the prices of various products. We will elaborate on the various products in Section 5.

Now that the premium formula has been stated, the uncertainty about the expected claim frequency and claim severity has to be modeled. As the composition of the portfolio changes over time, the expected losses change. Hence, we need to obtain insight into the claim behavior of the policyholders in the different groups and into the claim behavior on the different products. This can be done by modeling the claim frequency and claim severity separately, as described in Section 2.2. In the following section, the theory to model the claim frequency and claim severity is discussed. Together these determine the risk premium R.
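As a small illustration of Equations (4) and (5), the sketch below computes the office premium and the insurance premium from made-up component values. It assumes, as one possible reading of Equation (5), that the reinsurance term enters as a fraction D of the risk premium va and that the loading (1 + k) applies to the whole sum; neither assumption is confirmed by the thesis.

```python
# Illustrative computation of the office premium (Equation 4) and the
# insurance premium (Equation 5); every value below is made up.
v = 0.8        # expected number of claims per policy
a = 1_250.0    # expected claim severity per claim
l = 75.0       # expense of handling each claim
u = 40.0       # underwriting / set-up cost per policy
O = 50_000.0   # overhead allocated to this class of business
T = 1_000      # expected number of policies to be sold
res = 60.0     # reservation loading for IBNR/RBNS claims
D = 0.05       # assumed reinsurance loading, taken here as a fraction of v * a
k = 0.10       # profit margin

R = a * v                                  # risk premium
OP = R + l * v + u + O / T                 # office premium, Equation (4)
P = (OP + res + D * (v * a)) * (1 + k)     # insurance premium, Equation (5)
print(f"R = {R:.2f}, OP = {OP:.2f}, P = {P:.2f}")
```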

4 Modeling theory

As discussed in the previous section, one needs to obtain insight into the claim frequency and the claim severity to derive the risk premium R. Instead of modeling the claim frequency, we model the claim incidence. In this section we will describe the theory used as building blocks to model the claim incidence and the claim severity. We start with a model for the claim incidence and the dependence between the claim incidences of various products in Section 4.1. Next, we will discuss the generalized linear model (GLM) in Section 4.2, a model traditionally used in risk modeling. We end this section with theory on the risk premium.

4.1 Modeling the claim incidence

The claim incidence is defined as the number of claims per coverage, which results in a number of claims per day at sea insured. Usually the claim frequency is modeled using a discrete number of claims. However, we have decided to model the claim incidence. A motivation to model the claim incidence instead of the claim frequency is the large number of zero claims and the possible dependence between the claim incidences of various products, which we would like to model by a bivariate distribution. Furthermore, as the coverage periods are highly varying and easy to obtain as a variable, it is interesting to know more about the distribution of the claim incidence.

on these products. If two products are insured, the dependence between the claim behavior on them is important for a correct assessment of the risks in the portfolio and of the quality of a policyholder. By quality we refer to the claim behavior of the policyholder, which is bad in case of a high claim incidence and claim severity.

The claim incidence is calculated as the number of claims per coverage. Let w_{ip} be the coverage, in days, for product i and policyholder p, and let N_{ip} be the number of claims for this coverage. The claims per coverage can then be calculated as n_{ip} = N_{ip} / w_{ip}. Further, let μ_{ip} be the mean, which differs per policyholder.

4.1.1 Dependence in claim behavior

We are interested in the dependence between the claim incidence on product 1 and on product 2. Therefore we model a bivariate distribution. As the claim incidence might be skewed to the right, we assume n_{ip} to be log-normally distributed. Note that when X is log-normally distributed, log(X) is normally distributed. Let Σ be the variance-covariance matrix with diagonal elements ψ_i^2 and off-diagonal element ψ_{12} = ρ_{12} ψ_1 ψ_2:

\[
\log \begin{pmatrix} n_1 \\ n_2 \end{pmatrix} \sim N\!\left( \begin{pmatrix} \tau_1 \\ \tau_2 \end{pmatrix}, \begin{pmatrix} \psi_1^2 & \psi_{12} \\ \psi_{12} & \psi_2^2 \end{pmatrix} \right) \tag{6}
\]

If ψ_{12} > 0, a policyholder who tends to claim a lot on product 1 also tends to claim above average on product 2. To implement this model empirically one needs a large number of claims where policyholders claimed on both products.

In Section 5 it will be shown that we have a substantial number of claims on product 1 and product 2; however, we do not have enough bivariate observations from one policyholder. Therefore, in the next section we elaborate on the proportion of zero claims and the maximum likelihood estimation (MLE) used to model the claim incidence.
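Had enough paired non-zero observations been available, fitting the bivariate log-normal model of Equation (6) would amount to estimating the mean vector and covariance matrix of the log incidences. A minimal sketch on made-up data (the incidence values below are hypothetical):

```python
import numpy as np

# Hypothetical paired non-zero claim incidences (claims per insured day at sea)
# for product 1 and product 2 from the same policyholders; values are made up.
n1 = np.array([0.004, 0.012, 0.007, 0.020, 0.009])
n2 = np.array([0.006, 0.015, 0.005, 0.018, 0.011])

log_n = np.log(np.column_stack([n1, n2]))     # (log n1, log n2) per policyholder

tau = log_n.mean(axis=0)                      # estimate of (tau_1, tau_2)
Sigma = np.cov(log_n, rowvar=False)           # estimate of the covariance matrix
rho12 = Sigma[0, 1] / np.sqrt(Sigma[0, 0] * Sigma[1, 1])  # correlation

print("tau:", tau)
print("Sigma:\n", Sigma)
print("rho_12:", rho12)   # rho_12 > 0 suggests positive dependence in claim behavior
```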

4.1.2 MLE to model the claim incidence

We use a method of zero-claim proportions and MLE because the data do not contain enough non-zero bivariate claims. A next step is to model the claim incidence n for both products using a bivariate distribution. Although we do not have many non-zero observations, there is more useful data at hand: a large number of observations occurring at point (0,0), indicating that policyholders claim on neither product. Hence, another way to measure dependence is to use the number of zero claims on both products. In this section we elaborate on a bivariate zero-inflated model instead of two univariate models.

products. Four cases can be identified with zero or non-zero claim incidences:

1: n_1 = 0, n_2 = 0 with probability 1 − π_{12} − π_{·2} − π_{1·} and indicator I_1   (7)
2: n_1 > 0, n_2 = 0 with probability π_{1·} and indicator I_2   (8)
3: n_1 = 0, n_2 > 0 with probability π_{·2} and indicator I_3   (9)
4: n_1 > 0, n_2 > 0 with probability π_{12} and indicator I_4   (10)

The indicator functions described above are 1 if we have observations that fulfill the characteristics of the case and zero if not. Let us assume the non-zero observations described in case 2 follow a distribution which we denote by f_1(n_1; θ_1), where θ_1 denotes the parameter value of the distribution. Let the non-zero observations in case 3 follow a distribution denoted by f_2(n_2; θ_2). Let n_j be the jth entry for which we observe the claim incidence. As it is likely that we do not have many observations of case 4 to derive a separate distribution, let the n_1 observations of case 4 have an f_1(n_1; θ_1) distribution and the n_2 observations of case 4 an f_2(n_2; θ_2) distribution. We now derive the following likelihood function consisting of the four cases:

\[
L(\pi, \theta; n) = \prod_{j: I_1(n_j)=1} \left(1 - \pi_{12} - \pi_{\cdot 2} - \pi_{1\cdot}\right)
\prod_{j: I_2(n_j)=1} \pi_{1\cdot} f_1(n_{1j}; \theta_1) \times
\prod_{j: I_3(n_j)=1} \pi_{\cdot 2} f_2(n_{2j}; \theta_2)
\prod_{j: I_4(n_j)=1} \pi_{12} f_1(n_{1j}; \theta_1) f_2(n_{2j}; \theta_2) \tag{11}
\]

The claims have a non-negative domain and we will later see in Section 6 that the claim incidences have a skewed distribution. We therefore use a gamma distribution with parameters θ_i as a candidate for f_i(n_i; θ_i). Let f_1(n_1; α_1, β_1) and f_2(n_2; α_2, β_2) be the gamma distributions for the observations for which respectively I_2 and I_3 are equal to one. Using the gamma distribution we obtain the following likelihood function:

\[
L(\pi, \alpha, \beta; n) = \prod_{j: I_1(n_j)=1} \left(1 - \pi_{12} - \pi_{\cdot 2} - \pi_{1\cdot}\right) \times
\prod_{j: I_2(n_j)=1} \pi_{1\cdot} \frac{\beta_1^{\alpha_1}}{\Gamma(\alpha_1)} n_{1j}^{\alpha_1 - 1} e^{-\beta_1 n_{1j}} \times
\prod_{j: I_3(n_j)=1} \pi_{\cdot 2} \frac{\beta_2^{\alpha_2}}{\Gamma(\alpha_2)} n_{2j}^{\alpha_2 - 1} e^{-\beta_2 n_{2j}} \times
\prod_{j: I_4(n_j)=1} \pi_{12} \frac{\beta_1^{\alpha_1}}{\Gamma(\alpha_1)} n_{1j}^{\alpha_1 - 1} e^{-\beta_1 n_{1j}} \frac{\beta_2^{\alpha_2}}{\Gamma(\alpha_2)} n_{2j}^{\alpha_2 - 1} e^{-\beta_2 n_{2j}} \tag{12}
\]

In Equation (12) it can be clearly observed that case 4, where both n_1 and n_2 are positive, depends on the distributions and the parameters of f_1(n_1; α_1, β_1) and f_2(n_2; α_2, β_2).

Taking the logarithm of Equation (12), the following log-likelihood is derived:

\[
\begin{aligned}
\ell(\pi, \alpha, \beta; n) = {} & \sum_{j: I_1(n_j)=1} \log\!\left(1 - \pi_{12} - \pi_{\cdot 2} - \pi_{1\cdot}\right) \\
& + \sum_{j: I_2(n_j)=1} \Big[ \log \pi_{1\cdot} + \alpha_1 \log \beta_1 - \log \Gamma(\alpha_1) + (\alpha_1 - 1)\log n_{1j} - \beta_1 n_{1j} \Big] \\
& + \sum_{j: I_3(n_j)=1} \Big[ \log \pi_{\cdot 2} + \alpha_2 \log \beta_2 - \log \Gamma(\alpha_2) + (\alpha_2 - 1)\log n_{2j} - \beta_2 n_{2j} \Big] \\
& + \sum_{j: I_4(n_j)=1} \Big[ \log \pi_{12} + \alpha_1 \log \beta_1 - \log \Gamma(\alpha_1) + (\alpha_1 - 1)\log n_{1j} - \beta_1 n_{1j} \\
& \qquad\qquad + \alpha_2 \log \beta_2 - \log \Gamma(\alpha_2) + (\alpha_2 - 1)\log n_{2j} - \beta_2 n_{2j} \Big] \tag{13}
\end{aligned}
\]

Estimating the parameters can be done by taking the derivative of the log-likelihood with respect to the specific parameter, known as the score function, and setting it equal to zero. Using the assumption of independence between the claim incidences of products 1 and 2, we have reformulated the distribution of case 4 in terms of f_1(n_1; α_1, β_1) and f_2(n_2; α_2, β_2).
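A minimal sketch of this estimation step follows. It uses the fact that the log-likelihood in Equation (13) separates into a multinomial part for the case probabilities and two gamma parts, so the case probabilities can be estimated by the observed case proportions and the gamma parameters by fitting a gamma distribution to the pooled non-zero incidences of each product. The data below are simulated stand-ins, and note that scipy parameterizes the gamma by shape and scale = 1/β:

```python
import numpy as np
from scipy import stats

# Hypothetical claim incidences per coverage for two products; a zero means no claim.
# In the thesis these would be the observed n_1 and n_2 per coverage.
rng = np.random.default_rng(0)
m = 500
n1 = np.where(rng.random(m) < 0.35, rng.gamma(14.0, 0.01, m), 0.0)
n2 = np.where(rng.random(m) < 0.36, rng.gamma(12.0, 0.01, m), 0.0)

# Multinomial part of Equation (13): the MLEs of the case probabilities are the
# observed proportions of the four (zero / non-zero) cases.
case1 = (n1 == 0) & (n2 == 0)
case2 = (n1 > 0) & (n2 == 0)
case3 = (n1 == 0) & (n2 > 0)
case4 = (n1 > 0) & (n2 > 0)
pi_00, pi_1, pi_2, pi_12 = case1.mean(), case2.mean(), case3.mean(), case4.mean()

# Gamma parts: because the log-likelihood separates, the gamma parameters for
# product 1 are fitted on all non-zero n1 (cases 2 and 4), likewise for product 2.
shape1, _, scale1 = stats.gamma.fit(n1[n1 > 0], floc=0)
shape2, _, scale2 = stats.gamma.fit(n2[n2 > 0], floc=0)

print(f"pi_00={pi_00:.3f}, pi_1.={pi_1:.3f}, pi_.2={pi_2:.3f}, pi_12={pi_12:.3f}")
print(f"product 1: shape={shape1:.2f}, scale={scale1:.4f}")
print(f"product 2: shape={shape2:.2f}, scale={scale2:.4f}")
```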

4.2 Modeling the claim severity

To model the claim severity we use the generalized linear model (GLM). In this section a brief introduction to the GLM will be provided. A GLM provides a flexible stochastic specification of the response variable, where the mean is dependent on covariates. In this section we will elaborate on the properties of the GLM. The mathematical details of the GLM will be provided in Section 4.2.2.

4.2.1 General overview of the GLM

A general technique used to model a response variable using explanatory variables is the linear model. A general linear model is given by:

\[
E(Y_i) = \mu_i = x_i'\beta \tag{14}
\]

where the random variables Y_i are independent and are named the response variable, x_i' represents the ith row of the design matrix X, and β is the parameter vector. The linear model is usually written in the form y = Xβ + e, where e is the vector of independent identically distributed (i.i.d.) random variables with e ∼ N(0, σ²). The linear combination of the explanatory variables and the associated parameters β is named the linear predictor. The standard linear model relies on some assumptions:

• the response variable Y_i is normally distributed
• there exists a direct or 'identical' relationship between the linear predictor and the expected value of the model: x_i'β = μ_i, see Dobson and Barnett (2008)

However, the assumption that the response variable has a normal distribution might not always be appropriate. Furthermore, the relationship between the expected value and the linear predictor is not always a direct relationship. To overcome these problems, one can use a GLM. The first research in the field of GLMs was done by Nelder and Wedderburn (1972). They found a general model by relaxation of the assumptions mentioned above. The first assumption was relaxed by requiring the distribution of the response variable to be a member of the exponential family, not only a normal distribution. The normal distribution is a member of this exponential family. The assumption of a direct relationship was relaxed by restructuring the relationship between the linear predictor and the expected value. Instead of a direct relationship, a linear relationship was proposed. Antonio and Beirlant (2007) summarize the relaxations into two advantages of the GLM over the linear model:

• linear regression assumes normally distributed data, whereas in a GLM it is allowed that the random deviations from the mean have another distribution than the normal distribution. This makes the model suitable for more cases
• a GLM provides the possibility to model the effect of explanatory variables on a transformation of the mean on another scale

The mathematical details of the GLM will be provided in Section 4.2.2. Extensive research has been performed on GLMs, and the book of McCullagh and Nelder (1989) is a valuable handbook on the GLM.

In the last decades, the GLM has gained popularity in the area of insurance to model risk premiums. An advantage of the GLM over one-way analysis is that more variables can be added to the model at the same time. Adding more variables simultaneously also allows for interaction effects between the variables, which is not possible in one-way analysis. This makes the GLM attractive to model the risk premium using a data set where several explanatory variables are available. Ohlsson and Johansson (2010) discuss the use of GLMs in non-life insurance pricing. They elaborate on the GLM to model the claim severity.

4.2.2 Mathematical details of the GLM

A GLM is a linear model that expresses the relationship between a response variable and a number of explanatory variables. This relation is expressed in Equation (15):

\[
g(E(Y_i)) = g(\mu_i) = x_i'\beta \tag{15}
\]

A GLM consists of three components:

1. a random component: Y_i is the response variable; the distribution of Y_i is a member of the exponential family
2. a systematic component: η_i = x_i'β, linear in the parameters β_1, . . . , β_k, where k denotes the number of covariates
3. a link function: the link function g links the random component to the systematic component, where η_i = g(μ_i).

The density of an exponential family distribution is given in Equation (16):

\[
f_Y(y_i \mid \theta_i, \phi) = \exp\!\left\{ \frac{y_i \theta_i - b(\theta_i)}{a(\phi)} + c(y_i, \phi) \right\} \tag{16}
\]

where θ_i is the canonical parameter and φ is the dispersion parameter. The various members of the family are specified by the functions a, b and c.

Now it can be shown that if Y has a distribution in the exponential family, the mean and variance are equal to:

\[
E(Y_i) = b'(\theta_i) = \mu_i \tag{17}
\]

\[
\mathrm{Var}(Y_i) = b''(\theta_i)\, a(\phi) \tag{18}
\]
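As a standard textbook illustration (not taken from this thesis), the gamma distribution with mean μ and shape ν can be written in the form of Equation (16), after which Equations (17) and (18) follow directly:

\[
f(y; \mu, \nu) = \exp\!\left\{ \frac{-y/\mu - \log\mu}{1/\nu} + \nu\log\nu + (\nu - 1)\log y - \log\Gamma(\nu) \right\},
\qquad \theta = -\frac{1}{\mu},\quad b(\theta) = -\log(-\theta),\quad a(\phi) = \phi = \frac{1}{\nu},
\]

so that E(Y) = b'(θ) = −1/θ = μ and Var(Y) = b''(θ)a(φ) = μ²/ν.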

The link function g describes the relation between the linear predictor and the mean response according to η = g(μ), where η := β_0 + Σ_{i=1}^{k} β_i x_i and β_0 is the intercept. As the link function is one-to-one, we can invert it: μ = g^{-1}(η). Any monotone, continuous and differentiable function can be a link function, but there are common choices for standard GLMs. The link function g is called canonical if it equates the linear predictor η to the canonical parameter θ.

Maximum likelihood estimation is used to fit the βs of the GLM. McCullagh and Nelder (1989) show, using the Newton-Raphson method with Fisher scoring, how the optimization can be achieved using iteratively re-weighted least squares (IRLS). This optimization method is an important practical feature of the GLM: all GLMs can be fit to data using the IRLS algorithm. Software packages can be used to apply this method. Extensive research has been performed on the IRLS algorithm and a detailed explanation is provided in McCullagh and Nelder (1989).
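To make the IRLS idea concrete, the following minimal sketch fits a gamma GLM with a log link by iteratively re-weighted least squares; for this particular link and variance function the working weights are identically 1, so every step reduces to an ordinary least-squares fit of the working response. The data are simulated and purely illustrative:

```python
import numpy as np

def fit_gamma_glm_log_link(X, y, n_iter=25, tol=1e-8):
    """Fit a gamma GLM with a log link by IRLS (a minimal sketch).

    For the log link, g'(mu) = 1/mu and the gamma variance function is
    V(mu) = mu**2, so the IRLS weights are identically 1 and each step is an
    ordinary least-squares regression of the working response on X.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = X @ beta
        mu = np.exp(eta)
        z = eta + (y - mu) / mu                 # working response
        beta_new, *_ = np.linalg.lstsq(X, z, rcond=None)
        if np.max(np.abs(beta_new - beta)) < tol:
            beta = beta_new
            break
        beta = beta_new
    return beta

# Toy usage with simulated data (intercept plus one covariate; values made up).
rng = np.random.default_rng(1)
x = rng.normal(size=200)
X = np.column_stack([np.ones(200), x])
mu_true = np.exp(0.5 + 0.3 * x)
y = rng.gamma(shape=2.0, scale=mu_true / 2.0)   # gamma responses with mean mu_true
beta_hat = fit_gamma_glm_log_link(X, y)
print(beta_hat)   # estimates should be roughly close to (0.5, 0.3)
```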

The large framework of GLMs requires several procedures to check the model. Breslow (1996) lists the assumptions made when applying a GLM. Violating one of the assumptions may seriously influence the outcome of the model.

• correct specification of the link function g

• correct form of the explanatory variables x

• lack of undue influence of individual observations on the fit

In his paper he suggests methods to check the above assumptions and describes how to deal with violated assumptions. A frequently occurring problem is the problem of overdispersion. We speak of overdispersion when the variation in the data is larger than the variation predicted by the model. Underestimation of the variance might influence the selection process of the right model. To estimate the variance correctly, one could use heteroscedasticity-consistent (HC) standard errors (see White (1980)). HC standard errors have the advantage of being asymptotically consistent. One does not need to make an assumption about the distribution. HC standard errors have been shown to be consistent even when the assumed model underlying the parameters was incorrect.

In the following we use the GLM to model the claim severity. Several choices are possible in this case. Ohlsson and Johansson (2010) note that the distribution has to be positive and right skewed, hence a normal distribution is not suitable. The gamma and the log-normal distributions fulfill these requirements. Fu and Moncher (2004) ran several simulations and found that the gamma distribution performs slightly better than the log-normal distribution in case of claim severity modeling. This is in line with other literature, where the gamma GLM has become the standard distribution to model the claim severity, see also De Jong and Heller (2008). Therefore we use the gamma GLM with a log link function. To apply the gamma GLM we must delete the negative claim amounts from our data set (see Promislow (2011)), since the gamma distribution is only defined on the positive domain. In Section 5 we will elaborate on possible negative claims. The gamma distribution used in the GLM is defined as follows:

\[
f(y; \alpha, \beta) = \frac{\beta^{\alpha}}{\Gamma(\alpha)} y^{\alpha - 1} e^{-\beta y} \tag{19}
\]

As the log link function is included in the model, a multiplicative structure is obtained. In Section 4.2.2 the details of the exponential family, of which the gamma distribution is a member, have been discussed.
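A hedged usage sketch, assuming the statsmodels package is available: a gamma GLM with a log link can be fitted with categorical rating factors as covariates, which yields the multiplicative structure for the expected claim severity mentioned above. The data frame, its column names and the simulated values are hypothetical and only illustrate the call:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Hypothetical claim-severity data with rating factors; column names are made up.
rng = np.random.default_rng(2)
n = 300
df = pd.DataFrame({
    "claim_amount": rng.gamma(2.0, 900.0, n),
    "currency": rng.choice(["EUR", "USD"], n),
    "rank": rng.choice(["crew", "officer"], n),
    "region": rng.choice(["EU", "AS", "PH"], n),
})

# Gamma GLM with a log link: the rating factors enter as categorical covariates,
# giving a multiplicative structure for the expected claim severity.
model = smf.glm(
    "claim_amount ~ C(currency) + C(rank) + C(region)",
    data=df,
    family=sm.families.Gamma(link=sm.families.links.Log()),
)
result = model.fit()
print(result.summary())
```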

4.3 Ratemaking

Ratemaking is the term used for calculating the risk premium. The risk premium is defined as the expected value of the loss of a specific insurance product. By multiplying the claim incidence and the claim severity, one finds the expected loss per day at sea insured. As we did not model the claim frequency with a Poisson GLM, we cannot directly apply the theory of Ohlsson and Johansson (2010) to determine R̂_p, the expected value of R for policyholder


4.4 Model selection

As we are modeling the claim incidence and claim severity in this thesis, we are interested in which variables to include in or exclude from the model. Including more variables can improve the goodness of fit; however, adding too many variables may overfit the model to the data. In this section we will discuss test statistics to test hypotheses and select models.

A renowned model selection criterion is the Akaike Information Criterion (AIC) by Akaike (1987). It uses the expected log-likelihood as its basic criterion. The AIC is a relative goodness-of-fit criterion, which means that it selects the best model among several models. The best model is the model with the least complexity and highest information gain, see Bozdogan (1987). The AIC is given by:

AIC = −2ℓ + 2k

where ℓ is the value of the log-likelihood of the model and k is the number of parameters in the model. When the likelihoods of several competing models have been estimated, the AICs can be compared. The model with the minimal AIC is chosen as the best model. Besides the AIC, one can use the Bayesian Information Criterion (BIC) by Schwarz. The BIC penalizes the model by another term, k log(n), where n is the number of observations. When log(n) > 2, the BIC penalizes the model more than the AIC does. In the literature extensive research has been performed on both criteria and no model selection procedure has been found that combines the advantages of AIC and BIC (see Yang (2005)). In the paper of Yang the AIC is preferred over the BIC, and we will use the AIC as selection criterion in this thesis.

Where the AIC and BIC are relative goodness-of-fit criteria, the deviance can be used as a measure of absolute goodness of fit. The deviance is an important measure when fitting GLMs and is based on the logarithm of the likelihood. Let a saturated model be a model with n observations and n parameters. The saturated model achieves the maximal achievable log-likelihood; it is a 'full' model. Now suppose we have fitted a GLM to our data. To assess the quality of this model, we compare it with the saturated model by means of their log-likelihoods. Let L_S(y_i; θ) and L(y_i; θ̂) be the likelihoods corresponding to the saturated model and the proposed model. The deviance is defined by:

\[
D = -2\left[ \log L(y_i; \hat{\theta}) - \log L_S(y_i; \theta) \right] \tag{20}
\]

As in Kaas et al. (2008), the scaled deviance can be computed by dividing the deviance by the dispersion parameter φ. D/φ is approximately χ²_{n−k} distributed, where n is the number of parameters in the saturated model and k the number of parameters in the fitted model. If the proposed model does not fit the data well, the value of D/φ will be larger than the value predicted by the χ²_{n−k} distribution.

When we would like to test a hypothesis that involves multiple parameters, we can use a likelihood ratio test. Let the null hypothesis H_0 : θ ∈ Θ_0 be tested against the unrestricted alternative.

Let θ̃ be the constrained maximum-likelihood estimator and let ℓ(θ) be the log-likelihood:

\[
\tilde{\theta} = \underset{\theta \in \Theta_0}{\operatorname{argmax}} \; \ell(\theta) \tag{21}
\]

In Cramer (1989) it is shown that:

\[
2\left(\ell(\hat{\theta}) - \ell(\tilde{\theta})\right) \sim \chi^2_p \tag{22}
\]

with p = dim(Θ) − dim(Θ_0), the number of independent restrictions.
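A small numerical sketch of the likelihood ratio test in Equation (22), using made-up log-likelihood values:

```python
from scipy.stats import chi2

# Twice the log-likelihood difference between the unrestricted and restricted
# model is compared with a chi-square distribution with p degrees of freedom.
# The log-likelihood values below are illustrative only.
loglik_unrestricted = -1234.5
loglik_restricted = -1260.2
p = 7                      # number of independent restrictions

lr_stat = 2 * (loglik_unrestricted - loglik_restricted)
p_value = chi2.sf(lr_stat, df=p)
print(f"LR statistic = {lr_stat:.2f}, p-value = {p_value:.4f}")
```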

5 Data description

In this section we apply the models described in Section 4 to the data. The data used in this thesis are provided by an insurance company. In this section we describe the process of collecting the data. The raw data were extracted from the database by using an Open Database Connection and SQL. The data were divided into a claim data set and a coverage data set. Both data sets contain a variable which indicates whether the observation comes from the N-, C- or I-group as described in Section 3. We describe the three groups in separate subsections and elaborate on the matching of coverages and claims.

The data were collected with 2005 as the starting year. This year was chosen since it provides us with eight reliable years of insurance data and a recent overview of the policyholders. 2012 was chosen as the last claim year to correct for the IBNR and RBNS claims, which make the calculations more complex. The final year of payment has been chosen to be 2014, where the data have been collected until 31-03-2014. Including this year of payment we get the best estimate of the claims filed between 2005 and 2012.

The coverage data include the administration of the policies, information on the insured individuals/group on these policies and the insured products. The policy administration works with a payment in advance for a certain coverage. Later, the actual days at sea are counted and, if necessary, part of the payment is refunded. This is denoted by a negative invoice amount. To derive the actual insured days at sea, a correct specification of the premium paid in advance and of the corrected days at sea is necessary. As the registration of these dates was not always correct, the coverage was derived from the invoice payments for the coverage. By summing up the positive and, in case of a repayment, negative invoice amounts, the net payment was calculated per person/group, product and year. By dividing the net invoice payment by the premium per day for the product, the insured days at sea were calculated.
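A minimal sketch of this derivation, assuming hypothetical column names and premium rates: refunds enter as negative invoice amounts, the net payment per policyholder, product and year is computed, and the insured days at sea follow by dividing by the premium per day.

```python
import pandas as pd

# Hypothetical invoice records; column names and values are made up for illustration.
invoices = pd.DataFrame({
    "policyholder": ["A", "A", "B", "B"],
    "product": ["P1", "P1", "P2", "P2"],
    "year": [2010, 2010, 2011, 2011],
    "invoice_amount": [1200.0, -200.0, 800.0, 150.0],  # negative = refund
})
premium_per_day = {"P1": 2.0, "P2": 2.5}  # assumed premium per day at sea

# Net payment per policyholder, product and year (refunds reduce the payment),
# then insured days at sea = net payment / premium per day for that product.
net = (invoices
       .groupby(["policyholder", "product", "year"], as_index=False)["invoice_amount"]
       .sum())
net["days_at_sea"] = net["invoice_amount"] / net["product"].map(premium_per_day)
print(net)
```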

The claim data include all information on the claims and the payments. This includes a specification of the payment, information on the individual (if collected) and the products on which the claim was filed. Unfortunately, not all data entries are reported and missing values occur.

Product      Description
H    H1      covers the medical costs in the homeland of the insured as a result of accidents and illness, up to a maximum of 250,000 (EUR or USD)
     H2      additional insurance for the health care in the homeland of the insured
P    P1      European coverage with a maximum coverage of 200% of the Dutch tariff for treatments
     P2      covers all health care and repatriation expenses

Table 2: Product description of health care products P (direct medical care) and H (medical care homeland).

Important to note is the difference between EUR and USD, which is registered for the coverages and claims. If the currency of the policy is USD, EUR claims can still occur on this policy, for example when intermediaries need to be paid in a specific currency. To simplify the data description, the claimed amounts have been summed up irrespective of the currency. When estimating the models we do differentiate between the currencies. In Table 3 an overview of the claims per year and product is given:

Product  2005  2006  2007  2008  2009  2010  2011  2012
H1         62    73    98    50    64    36    32    47
H2          5    20    32    24    55    40    56    54
P1        237   256   270   319   305   277   284   152
P2        579   610   633   565   555   617   581   735

Table 3: Number of claims per product and year over the years 2005-2012 for all policy groups.

When a wrongful claim has been paid out, the costs are refunded by the policyholder. This results in negative claims in our data set.

An important note for the coming sections is the existence of mismatches. To match a coverage to a claim, the claim has to be filed on the insured products from the coverage. However, during the data collection process it occurred that claims were filed on uninsured products. We name these matches mismatches. Mismatches give a wrong overview of the data.

5.1 Coverages and claims for the N-group

After aggregating the coverages per policyholder and per year, the data set consists of 2,587 coverages. In this data set we see that some policyholders are insured for more than one product.

The claim data contain 6,012 payments. We are interested in the claims per product and claim number. By aggregating the payments per claim number and product, the data are reduced to 2,838 claims with 2,730 unique claim numbers. There is a difference between these numbers as some claims are filed on more than one product. By aggregating the payments, some of the claims remain negative or become zero due to reversal bookings. The negative claims amount to 54,004, accounting for 2.68% of the total positive claimed amount of 2,014,272. Because of the non-negative character of the claim models, the zero and negative claim amounts were removed from the data set. The final data set consists of 2,754 claims.
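A sketch of the aggregation step, with hypothetical column names and made-up payments: payments are summed per claim number and product, after which zero and negative claim amounts are dropped because the gamma severity model requires positive amounts.

```python
import pandas as pd

# Hypothetical claim payments; column names and values are made up for illustration.
payments = pd.DataFrame({
    "claim_number": [101, 101, 102, 103, 103],
    "product": ["P1", "P1", "P2", "P1", "P1"],
    "amount": [500.0, -500.0, 750.0, 300.0, -50.0],  # negatives are refunds/reversals
})

# Aggregate payments per claim number and product, then drop claims whose
# aggregated amount is zero or negative.
claims = (payments
          .groupby(["claim_number", "product"], as_index=False)["amount"]
          .sum())
claims = claims[claims["amount"] > 0]
print(claims)
```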

When matching the claims to the coverages, the difference in currency has been omitted, since it is possible to have a EUR claim on a USD policy. 2,618 claims can be matched to a coverage, with a total claim amount of 1.91 million. This is 94.97% of the total claimed amount. Claims can only occur on the insured products. In Table 4 an overview is given of the combinations of insured products as rows and the products on which claims have been filed as columns.

Claim combinations (columns): No claim / H1 / H1,P1 / H1,P1,P2 / H1,P2 / H2 / H2,P1 / H2,P1,P2 / P1 / P1,P2 / P2 / Total

H1,H2,P1:  - - (1) - - - 3 - - 4
H1,P1:     24 1 - (1) - - - - 10 (5) - 41
H1,P2:     98 - - 2 - - - - (4) 109 213
P1:        835 - (1) - (1) (1) - 497 (22) (18) 1,377
P1,P2:     47 - - (2) - - - (2) 4 16 11 80
P2:        623 - - (1) - (1) - (2) (10) 235 872
Total:     1,627 1 2 3 3 1 2 2 516 57 373 2,587

Table 4: Insured product combinations per coverage (rows) and matched claim combinations (columns) per coverage for the N-group; incorrect claims are denoted by brackets.

One should interpret Table 4 as follows: there are 2,587 coverages with different product combinations over eight years. When observing the policyholders who insured their crew for products H1 and P1, 24 of them did not file a claim in the insured year. If they filed a claim, one of them claimed on H1, one has a mismatch registered on H1,P1,P2, another five mismatches are registered on P1,P2 and there are 10 coverages with a claim on only product P1. The values between brackets are the incorrectly registered claims, which cannot be matched to a coverage.

5.2 Coverages and claims for the C-group

registered, where 'C' denotes the rank crew and 'O' the rank officer.

Similar to the claims of the N-group, the payments were aggregated. The size of the negative claims is 1.98% of the positive amount of 2.47 million and is therefore neglected. In the final data set of 2,135 claims, the characteristic rank was not registered properly for some claims. This happened since rank is not of interest for the premium or claim payments for products P and H. For both ranks the policyholder has to pay the same premium amount and he gets paid the same amount in case of illness or an accident. 1,323 values for rank were missing, which is 62.0% of the total number of claims in the C-group.

Because the rank was not registered properly, matching coverages to claims based on this information was not possible. Therefore we leave out the rank variable and match based on policy number and year. 278 policyholders with different products were insured in the period 2005-2012. Aggregating the coverages per year, currency and product resulted in 657 unique entries divided over 8 years. Matching the coverages to the claims resulted in 2,024 matches, 95% of the total claimed amount. In Table 5 one finds the combinations of products insured as rows and the claim combinations as columns. Again we observe few bivariate observations.

Claim combinations (columns): No claim / H1 / H1,P1 / H1,P2 / H2 / H2,P1 / P1 / P1,P2 / P2 / Total

H1:        11 - - - (2) - (1) 14
H1,H2,P1:  - - - (1) - 1
H1,H2,P2:  2 - - 1 - - - 3
H1,P1:     55 - 5 - - - 26 (3) (3) 92
H1,P1,P2:  1 - - - 1
H1,P2:     87 1 (1) 19 - - - (6) 63 177
H2:        46 - - (1) 8 (3) (1) - - 59
H2,P1:     9 - - - - (1) 3 (4) - 17
H2,P1,P2:  1 - - - 1
H2,P2:     50 - - (3) - 12 - (2) 46 113
P1:        17 - - - 17 (9) (1) 44
P1,P2:     1 - - - 1 - - 2
P2:        55 - - 5 - 1 - (3) 69 133
Total:     335 1 6 34 8 17 50 28 183 657

Table 5: Insured product combinations per coverage (rows) and matched claim combinations (columns) per coverage for the C-group; incorrect claims are denoted by brackets.

5.3 Coverages and claims for the I-group

Rank, Region    AF    AS     EUR    ME    NA   OC   PH     SA     Unknown
C               86    1,985  6,547  164   3    2    9,140  1,126  2
O               95    197    6,334  108   6    1    778    591    0

Table 6: Number of coverages per rank and region in the I-group.

Aggregating the claim payments was done similarly as before. There were 4 payments with another currency than EUR or USD; these were removed. 2,607 claims remain in the data set for the matching process, with a total claimed amount of 3.10 million. As rank and region are not of interest for the claim payments, we observe many missing values in the claim data.

It is possible to be insured for more than one period per year; therefore we aggregated the coverages by employee number and year. After the aggregation 15,803 coverages remain. The name and employee number were used to match the coverages to the claims. A match on name resulted in 1,749 matches. Some names were spelled differently and therefore could not be matched. As the employee number was not always registered, another match based on employee number resulted in 47 extra matches. 68.9% of the claims were matched to a coverage. The claim amount matched is 2.26 million. Similar to the other tables for the N-group and C-group, one can find the insured products and claims in Table 7.

Claim combinations (columns): No claim / H1 / H1,H2 / H1,H2,P2 / H1,P1 / H1,P1,P2 / H1,P2 / H2 / H2,P1,P2 / H2,P2 / P1 / P1,P2 / P2 / Total

H1:      179 1 - - (1) (1) - - - - (3) (1) - 186
H1,P1:   502 1 - - 2 (1) - (1) - - 16 - (1) 524
H1,P2:   5,279 10 - - (1) (6) 103 - - - (30) (14) 622 6,065
H2,P1:   47 - - - (3) 50
H2,P2:   2,579 (2) (7) (27) - - (30) 1 (1) 13 (1) - 211 2,872
P1:      1,102 - - - 19 (1) - 1,122
P1,P2:   33 - - - 33
P2:      4,756 (2) (1) - - - (5) - - - (2) (2) 183 4,951
Total:   14,477 16 8 27 4 8 138 2 1 13 71 18 1,020 15,803

Table 7: Insured product combinations per coverage (rows) and matched claim combinations (columns) per coverage for the I-group; incorrect claims are denoted by brackets.

6 Estimation Results

In this section we apply the models from Section 4 to the data of Section 5. We start modeling the claim incidence, using the log-likelihood to find the parameter values of the claim incidence distribution. Furthermore, we describe the dependence in claim incidence between different products. Thereafter we model the claim severity using a gamma GLM. Finally we derive the risk premium for different products.

In Section 5 three groups of policyholders have been described. The matching processes of the N-group and the C-group are the same; therefore we only analyze the data of the N-group and the I-group in this section. We do not have information on individual persons filing a claim in the N-group. From the policyholders in the I-group we know the region and the rank of each individual.

6.1 Claim incidence

In Section 4.1.2 the log-likelihood model is described which we use to find the parameter values of the distribution of the claim incidences. As there exists a large number of zero claims, first the proportions of zero and non-zero claims have been calculated. In the previous section the claims in the I-group were aggregated per individual; we aggregate them further per policy number, region and rank. Now the proportions of zero claims per year are obtained in a similar way to the proportions of zero claims in the N-group. In Table 8 the proportions of non-zero claims per coverage, π_{it}, are given. For the N-group only data for products P1 and P2 are given, as only four claims occurred on product H1 and zero on H2 in the period from 2005 until 2008. As an example: Table 8 shows that in the year 2005, 34% of the N-group of policyholders claimed on product P1.

Group  Product, Year   2005  2006  2007  2008  2009  2010  2011  2012
N      P1              0.34  0.35  0.43  0.42  0.43  0.40  0.38  0.25
N      P2              0.43  0.34  0.31  0.34  0.29  0.31  0.39  0.34
I      H1              0.14  0.14  0.19  0.16  0.05  0.12  0.22  0.05
I      H2              0.11  0.30  0.17  0.29  0.00  0.00  0.00  0.06
I      P1              0.21  0.15  0.15  0.11  0.03  0.10  0.14  0.00
I      P2              0.26  0.31  0.30  0.33  0.26  0.25  0.26  0.22

Table 8: Proportion of non-zero claims per coverage for the N-group and I-group of policyholders.

In the table we observe varying non-zero claim proportions for the N-group and the I-group. The difference in the non-zero claim proportion compared with the previous year is sometimes large. The reason is the small number of non-zero claims: a small increase or decrease in the number of non-zero claims can cause a large proportion difference.

bivariate distribution is larger than 0. To discover whether a policyholder who claims on product 1 also tends to claim above average on product 2, we need enough bivariate coverages and bivariate claims.

From the data tables in Section 5 we observe a high proportion of zero claims. Claims and coverages for more than one product are rare events and have a high chance of mismatches. For the N-group and I-group we have 18 and 118 bivariate claims, respectively. The 118 matches in the I-group consist of 103 matches from the product combination H1,P2 on a total of 6,065 coverages in this combination. Most of the 103 matches consist of only two claims, one on each product. Hence, we do not have enough observations to model ψ_{12} and we continue modeling the claim incidence as described in Section 4.1.2.

One can obtain the proportion of zero claim incidences and the parameters of the gamma distribution by use of maximum likelihood estimation. The claim incidence is calculated by dividing the number of claims for a product per policyholder, N_{ipt}, by the coverage w_{ipt}, the days at sea insured. We first model the claim incidence of the N-group.

6.1.1 Claim incidence N-group

To model the claim incidence for the claims in the N-group, we use (n_1, n_2) to denote the claim incidences on products P1 and P2.


Figure 2: Kernel density of the untransformed and the cubic root transformed claim incidence of the N-group.

The plot shows that the cube-root-transformed data are close to a gamma distribution, which partly accounts for the long right tail.
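A brief sketch of the transformation behind Figure 2, on simulated stand-in data: the claim incidences are cube-root transformed and kernel density estimates of both the untransformed and the transformed values are computed.

```python
import numpy as np
from scipy.stats import gaussian_kde

# Hypothetical non-zero claim incidences (claims per insured day at sea); made up.
rng = np.random.default_rng(3)
incidence = rng.gamma(0.6, 0.02, 500)          # strongly right-skewed stand-in data

transformed = np.cbrt(incidence)               # cube-root transform

# Kernel density estimates of the untransformed and transformed incidences;
# the transform pulls in the long right tail.
grid_raw = np.linspace(incidence.min(), incidence.max(), 200)
grid_trf = np.linspace(transformed.min(), transformed.max(), 200)
kde_raw = gaussian_kde(incidence)(grid_raw)
kde_trf = gaussian_kde(transformed)(grid_trf)
print(kde_raw[:3], kde_trf[:3])
```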

The insurance policies can be sold in EUR or in USD; this means that we have to test whether they have a different distribution. By testing the following hypothesis we test whether the claim incidences of EUR and USD policies can be modeled by the same bivariate distribution or whether we have to split the claim incidences per policy currency and model them separately. The alternative hypothesis is that H_0 is not true.

\[
\begin{aligned}
H_0:\ & \pi^{EUR,N} = \pi^{USD,N}, \quad \pi_{1\cdot}^{EUR,N} = \pi_{1\cdot}^{USD,N}, \quad \pi_{\cdot 2}^{EUR,N} = \pi_{\cdot 2}^{USD,N}, \quad \pi_{12}^{EUR,N} = \pi_{12}^{USD,N}, \\
& \alpha_1^{EUR,N} = \alpha_1^{USD,N}, \quad \beta_1^{EUR,N} = \beta_1^{USD,N}, \quad \alpha_2^{EUR,N} = \alpha_2^{USD,N}, \quad \beta_2^{EUR,N} = \beta_2^{USD,N}
\end{aligned} \tag{23}
\]

The notation π in the hypothesis in Equation 23 is short for the bivariate zero-claim probability 1 − π12 − π1· − π·2. Using the likelihood function described in Equation 13 we calculate the log-likelihood of the restricted model, in which the parameters of the EUR and USD claim incidences are equal. By separating the data by policy currency we calculate the log-likelihood of the unrestricted model. The value of the test statistic is 51.35, which has to be compared to a χ2(7)-distribution since there are 7 restrictions; the zero proportion is calculated as the complement of the other proportions, so there are not eight restrictions. The null hypothesis of equal parameters for both policy currencies is rejected. We therefore split the data set based on the policy currency and use two bivariate models to model the claim incidence in the N-group.
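The mechanics of such a likelihood-ratio test are straightforward once the restricted and unrestricted models have been fitted. The log-likelihood values below are arbitrary placeholders, not results from the thesis; they only show how the statistic and the p-value are obtained.

```python
from scipy import stats

# Placeholder maximized log-likelihoods; in practice they come from fitting the
# bivariate model of Equation 13 to the pooled data and to each currency separately.
ll_restricted = -1250.0          # pooled EUR + USD data, common parameters
ll_eur, ll_usd = -1130.0, -95.0  # separate fits per policy currency

lr = 2 * ((ll_eur + ll_usd) - ll_restricted)
p_value = stats.chi2.sf(lr, df=7)   # 7 restrictions, cf. Equation 23
print(f"LR = {lr:.2f}, p-value = {p_value:.4f}")
```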

Before calculating the bivariate distribution of the claim incidence (n1, n2) per policy currency, we first estimate the marginal distributions, which allows us to check for independence. We start with the EUR policies. When calculating the marginal probabilities of zero and non-zero claim incidences, let 1 − ψ0 be the probability of a zero claim incidence n1 and ψ0 the probability of a non-zero claim incidence n1, which follows a gamma distribution. Similarly, let 1 − φ0 be the probability of a zero claim incidence n2 and φ0 the probability of a non-zero claim incidence n2, which follows a gamma distribution with parameters different from those of the gamma distribution for n1. If we assume independence, the probability of (0,0) under the bivariate distribution of (n1, n2) should be equal to (1 − ψ0) × (1 − φ0).

In Table 9 one finds the marginal distributions of the non-zero claim incidences n1 and n2, together with the non-zero claim incidence probabilities ψ0 and φ0 and their standard errors.

Product P1                          Product P2
            value     sd                        value     sd
1 − ψ0       0.62      -           1 − φ0        0.63      -
ψ0           0.38   0.01           φ0            0.37   0.01
shape       14.45   0.43           shape        11.99   0.54
scale        0.01   0.00           scale         0.01   0.00

Table 9: Marginal distributions of the EUR policies N-group for products P1 and P2.
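Estimates such as those in Table 9 can be obtained by maximizing the log-likelihood of a model with a point mass at zero and a gamma distribution for the positive incidences. The sketch below is a simplified marginal version of the model in Equation 13 (the bivariate case adds the joint probabilities π1·, π·2 and π12 but follows the same pattern), fitted to simulated stand-in data.

```python
import numpy as np
from scipy import optimize, stats

rng = np.random.default_rng(0)

# Simulated stand-in for the (cube-root transformed) claim incidences of one product:
# zero with probability 1 - psi0, gamma distributed otherwise.
true_psi0, true_shape, true_scale = 0.38, 14.0, 0.01
n = 5000
nonzero = rng.random(n) < true_psi0
y = np.where(nonzero, rng.gamma(true_shape, true_scale, size=n), 0.0)

def neg_loglik(params, y):
    psi0, shape, scale = params
    if not (0 < psi0 < 1 and shape > 0 and scale > 0):
        return np.inf
    ll_zero = np.log(1 - psi0) * np.sum(y == 0)
    ll_pos = np.sum(np.log(psi0) + stats.gamma.logpdf(y[y > 0], shape, scale=scale))
    return -(ll_zero + ll_pos)

res = optimize.minimize(neg_loglik, x0=[0.5, 5.0, 0.05], args=(y,), method="Nelder-Mead")
psi0_hat, shape_hat, scale_hat = res.x
print(psi0_hat, shape_hat, scale_hat)
```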

Using the likelihood function described in Equation 13 we estimate, by maximum likelihood, the parameters of the bivariate gamma distribution for the claim incidence of the EUR policies in the N-group.

                           θ̂MLE     sd
1 − π12 − π1· − π·2        0.63      -
π1·                        0.22   0.01
π·2                        0.14   0.01
π12                        0.01   0.00
shape n1                  14.60   0.44
scale n1                   0.01   0.00
shape n2                  11.34   0.55
scale n2                   0.01   0.00

Table 10: Estimated parameters and probabilities for the bivariate distribution of (n1, n2) of products P1 and P2 for the EUR policies of the N-group.

The standard deviations of the probabilities of a zero claim incidence are not given, because these proportions are calculated as the complement of the other proportions. As stated before, two variables are independent if and only if their joint probability equals the product of their marginal probabilities. In our case this would require 1 − π12 − π1· − π·2 = (1 − ψ0) × (1 − φ0). However, 0.63 ≠ 0.62 × 0.63 = 0.39. Hence, the bivariate distribution is more appropriate here, as there is dependence between the claim incidences of products P1 and P2 for the EUR policies.


Figure 3 shows the QQ-plots of the claim incidences of products P1 and P2 for the EUR policies. The gamma distribution does not fit the tails well, as can be seen from the observations above the 45 degree line. This agrees with Table 2, where we observe a very long tail that is difficult to model by a gamma distribution.

Figure 3: QQ-plot of the claim incidence of products P1 and P2 for EUR policies in the N-group.
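A QQ-plot of this kind can be constructed by plotting the empirical quantiles of the non-zero claim incidences against the theoretical quantiles of the fitted gamma distribution. The sketch below uses simulated stand-in data rather than the data behind Figure 3.

```python
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
# Placeholder for the non-zero (cube-root transformed) claim incidences of one product.
incidence = rng.gamma(14.6, 0.01, size=1000)

shape, loc, scale = stats.gamma.fit(incidence, floc=0)

# Theoretical quantiles of the fitted gamma against the empirical quantiles.
probs = (np.arange(1, len(incidence) + 1) - 0.5) / len(incidence)
theoretical = stats.gamma.ppf(probs, shape, loc=0, scale=scale)
empirical = np.sort(incidence)

plt.scatter(theoretical, empirical, s=5)
plt.plot(theoretical, theoretical, color="red")   # 45 degree line
plt.xlabel("theoretical gamma quantiles")
plt.ylabel("empirical quantiles")
plt.show()
```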

Performing the same tests for the USD policies results in a different model. We observe fewer coverages with USD currency in the N-group. Furthermore, there are no bivariate observations for the USD policies: policyholders claim either on product P1 or on product P2. The bivariate model is therefore not applicable, and we model the USD claim incidences of the N-group using two marginal gamma distributions.

Product P1                          Product P2
            value     sd                        value     sd
1 − ψ0       0.71      -           1 − φ0        0.82      -
ψ0           0.29   0.11           φ0            0.18   0.04
shape       28.38   3.24           shape        11.78   2.12
scale        0.00   0.00           scale         0.01   0.00

Table 11: Marginal distributions n1 and n2 of product P1 and P2 for the USD policies in the N-group.


To test whether the bivariate gamma distribution of the EUR claim incidences is stable over time, we estimate a separate parameter vector θi for every year i = 1, . . . , 8. The log-likelihood ℓ(θ) is the sum of the log-likelihoods in Equation 13 over the eight years. The vector θ has 64 elements, with different parameters for every year; however, as the probability of zero claims is the complement of the other three probabilities, only 56 of these parameters are estimated freely. The likelihood is optimized year by year, starting with the first year. To examine whether the parameters are stable over time, we test the following null hypothesis:

\[
\begin{aligned}
& \pi^{1} = \pi^{2} = \dots = \pi^{8}, \\
& \pi_{1\cdot}^{1} = \pi_{1\cdot}^{2} = \dots = \pi_{1\cdot}^{8}, \quad \pi_{\cdot 2}^{1} = \pi_{\cdot 2}^{2} = \dots = \pi_{\cdot 2}^{8}, \quad \pi_{12}^{1} = \pi_{12}^{2} = \dots = \pi_{12}^{8}, \\
& \alpha_{1}^{1} = \alpha_{1}^{2} = \dots = \alpha_{1}^{8}, \quad \beta_{1}^{1} = \beta_{1}^{2} = \dots = \beta_{1}^{8}, \\
& \alpha_{2}^{1} = \alpha_{2}^{2} = \dots = \alpha_{2}^{8}, \quad \beta_{2}^{1} = \beta_{2}^{2} = \dots = \beta_{2}^{8}
\end{aligned} \tag{24}
\]

Here π^i is short notation for the probability 1 − π^i_{12} − π^i_{1·} − π^i_{·2}; the superscripts refer to the years 2005 up to and including 2012. The restricted log-likelihood has already been calculated above. The log-likelihood of the unrestricted model is the sum of the log-likelihoods of the separate years. The likelihood-ratio test for stability of the parameters is:

\[
2\left( \ell_1(\hat{\theta}_1) + \ell_2(\hat{\theta}_2) + \ell_3(\hat{\theta}_3) + \ell_4(\hat{\theta}_4) + \ell_5(\hat{\theta}_5) + \ell_6(\hat{\theta}_6) + \ell_7(\hat{\theta}_7) + \ell_8(\hat{\theta}_8) - \ell(\tilde{\theta}) \right) \sim \chi^2(49) \tag{25}
\]

The value of the likelihood-ratio statistic is 75.25, which is compared to the χ2(49)-distribution, as there are 49 restrictions (56 free parameters in the unrestricted model against 7 in the restricted model). The null hypothesis of equal parameters is rejected. Hence, we are not able to model the bivariate claim incidences on P1 and P2 for the EUR policies in all years with a single model.
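The degrees of freedom and the p-value of this stability test follow directly from the yearly and pooled fits. The maximized log-likelihood values below are arbitrary placeholders, not results from the thesis; the sketch only illustrates how the test is assembled.

```python
from scipy import stats

# Placeholder maximized log-likelihoods for the eight yearly models (2005-2012)
# and for the pooled model with common parameters.
ll_per_year = [-210.0, -196.0, -201.0, -188.0, -177.0, -181.0, -190.0, -151.0]
ll_pooled = -1530.0

n_years, free_params_per_year = 8, 7            # 3 probabilities + 4 gamma parameters
df = free_params_per_year * (n_years - 1)       # 49 restrictions

lr = 2 * (sum(ll_per_year) - ll_pooled)
print(df, lr, stats.chi2.sf(lr, df=df))
```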

The estimated parameter values of the unrestricted model for the different years are shown in Figure 4. The parameters p1, p2, p3, p4 are the probabilities 1 − π12 − π1· − π·2, π1·, π·2 and π12 respectively. From Figure 4, we do not observe a trend. When taking the confidence intervals into account, it is difficult to draw strong conclusions about the differences between the years.


Figure 4: Estimated MLE parameters and probabilities of the bivariate gamma distribution for the claim incidences of the EUR policies from the N-group over the years 2005-2012 with 95% confidence intervals.

We do not have enough data to test the stability of the marginal parameters per year for the USD policies. To determine the risk premium in Section 6.3 we assume the USD marginal distributions to be stable.

6.1.2 Claim incidence I-group


Figure 5: Kernel density of the cubic root transformed claim incidence for the I-group.

As one observes from Table 7 in the data section, the possible bivariate claims to match are the claims on products H1 and P2. We denote the claim incidences of products H1 and P2 by n1 and n2 respectively. We first test the null hypothesis that there is no significant difference in the parameter values of the bivariate distribution for EUR and USD policies:

\[
\begin{aligned}
H_0:\ & \pi^{EUR,I} = \pi^{USD,I}, \quad \pi_{1\cdot}^{EUR,I} = \pi_{1\cdot}^{USD,I}, \quad \pi_{\cdot 2}^{EUR,I} = \pi_{\cdot 2}^{USD,I}, \quad \pi_{12}^{EUR,I} = \pi_{12}^{USD,I}, \\
& \alpha_1^{EUR,I} = \alpha_1^{USD,I}, \quad \beta_1^{EUR,I} = \beta_1^{USD,I}, \quad \alpha_2^{EUR,I} = \alpha_2^{USD,I}, \quad \beta_2^{EUR,I} = \beta_2^{USD,I}
\end{aligned} \tag{26}
\]

The calculated likelihood-ratio test statistic has a value of 112.87, which gives reason to reject the null hypothesis when compared with the χ2(7)-distribution. We split the data set based on the policy currency and estimate two bivariate distributions to model the claim incidence of products H1 and P2 in the I-group. In Table 12 one finds the estimated parameters of the bivariate gamma distribution for products H1 and P2.

EUR policies                               USD policies
                     θ̂MLE     sd                                θ̂MLE     sd
1 − π12 − π1· − π·2   0.90      -          1 − π12 − π1· − π·2   0.66      -
π1·                   0.01   0.01          π1·                   0.01   0.00
π·2                   0.08   0.01          π·2                   0.22   0.02
π12                   0.01   0.00          π12                   0.11   0.01
shape n1              5.30   2.09          shape n1              7.28   1.04
scale n1              0.02   0.01          scale n1              0.01   0.00
shape n2             10.10   1.49          shape n2             11.51   0.84
scale n2              0.01   0.00          scale n2              0.01   0.00

Table 12: Estimated parameters and probabilities for the bivariate distribution of (n1, n2) of products H1 and P2 for the EUR and USD policies of the I-group.


In Figure 6 one finds the QQ-plots of the bivariate distribution of products H1 and P2 for the USD policies. The distribution does not fit the tails.

Figure 6: QQ-plot of the claim incidence of products H1 and P2 for USD policies in the I-group.

We estimate the parameters of the marginal distributions of the claim incidences of products H1 and P2 for the EUR and USD policies, where 1 − γ0 and γ0 and 1 − λ0 and λ0 denote the probabilities of zero and non-zero claim incidences for products H1 and P2 respectively. For both currencies we find that (1 − γ0) × (1 − λ0) ≠ 1 − π12 − π1· − π·2. For the EUR policies we calculate 1 − γ0,EUR = 0.85 and 1 − λ0,EUR = 0.95; their product is 0.81, which differs from 1 − π12 − π1· − π·2 = 0.90. Likewise, for the USD policies we find 0.78 × 0.64 = 0.50 ≠ 0.66. Hence we model the bivariate claim incidence of products H1 and P2 by the bivariate gamma distributions with the parameters estimated in Table 12.

Besides the products H1 and P2, we also have to model the claim incidences of products H2 and P1 by marginal distributions. We first test whether we can model the claim incidences of the EUR and USD policies by the same marginal distribution. We therefore test the hypothesis in Equation 27 for each product separately, where ζ0 denotes the proportion of zero claim incidences. The alternative hypothesis is that H0 is not true.

\[
\begin{aligned}
H_0:\ & \zeta_0^{EUR,I} = \zeta_0^{USD,I}, \quad 1 - \zeta_0^{EUR,I} = 1 - \zeta_0^{USD,I}, \\
& \alpha^{EUR,I} = \alpha^{USD,I}, \quad \beta^{EUR,I} = \beta^{USD,I}
\end{aligned} \tag{27}
\]


For product P1 the likelihood-ratio test does not give enough evidence to reject the null hypothesis, and we model the claim incidence of product P1 by a single marginal distribution for both currencies. We test the same hypothesis for product H2. However, in the 11 observations from the USD policies we find no non-zero claim incidences. We therefore assume the marginal models of the USD and EUR policies for product H2 to be the same. The parameters of the two marginal distributions for products H2 and P1 can be found in Table 13. In this table ζ0 and ξ0 denote the probabilities of a zero claim incidence for products H2 and P1 respectively. Had one of the 11 USD observations on product H2 contained a claim, the zero-claim proportion for the USD policies would have been 10/11 ≈ 0.91, close to the estimated 0.89.

Product H2                          Product P1
            value     sd                        value     sd
ζ0           0.89      -           ξ0            0.88      -
1 − ζ0       0.11   0.03           1 − ξ0        0.12   0.02
shape       10.29   2.41           shape        12.29   2.02
scale        0.01   0.00           scale         0.01   0.00

Table 13: Marginal distribution of products H2 (left) and P1 (right) for both EUR and USD policies of the I-group.

We would also like to test the stability over the years of the bivariate distribution of products H1 and P2 for the EUR and USD policies. Unfortunately we do not have enough observations per year to perform this test. From the proportions of zero and non-zero claim incidences in Figure 7 we already see that the bivariate distribution for the USD policies is not stable over the years. The same applies to the EUR claim incidences.


As a preliminary to the following section on claim severity, the final test we perform in this section is a test of the stability of the parameters across regions. In Figure 8 one finds a histogram of the log claim amounts paid in EUR for the various nationalities in the I-group.

Figure 8: Histogram showing the log claim amount of claims with EUR currency and frequency of a claim from the various regions.

The figure shows that many claims in the data come from individuals with a Philippine nationality. To test whether a Philippine nationality influences the claim incidence, we test the null hypothesis that for EUR policies the bivariate claim incidence of the Philippines and of the other regions can be modeled with the same parameters. The alternative hypothesis is that the null hypothesis is not true. Our likelihood-ratio test statistic has a value of 8.31, which does not give enough evidence to reject the null hypothesis when compared to the χ2(8)-distribution, with a p-value of 0.404.

When testing the same hypothesis for the USD policies, we calculate a likelihood-ratio statistic with a value of 112.87. Compared to a χ2(7)-distribution, this gives enough evidence to reject the null hypothesis, with a p-value of 0.000. Hence the bivariate claim incidence of Filipinos and individuals with another nationality cannot be modeled by the same bivariate distribution. This can also be observed from the parameters: the probability of a non-zero bivariate claim incidence is much larger in the group of Filipinos (π12 = 0.240) than in the group of other regions (π12 = 0.077).


6.2 Claim severity

In this section we model the claim severity for both the N-group and the I-group by a gamma GLM with a log link function. As with the claim incidence, we observe a skewed distribution for the claim amounts. Therefore we apply a log transformation to the claim amount and take this transformed claim amount as the dependent variable, see Section 4.2.2.

6.2.1 Claim severity N-group

To model the claim severity we first split the claims based on the currency of the claim. The majority of the claims in the N-group are in EUR. As described in Section 5, we have no specific information about the individual who filed the claim; we only have information about the claim payment. We know the claim year, the product on which the claim was filed and the currency of the policy, which is the currency in which the policyholder pays the premium (EUR or USD). We include these variables in the GLM for the log-transformed EUR claim amounts. The parameters of the model are shown in Table 14, together with their robust standard errors and confidence intervals. The policy currency is included as well, as it is possible to have EUR claims on a USD policy and vice versa.
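A model of this form can be fitted along the following lines. The sketch uses simulated stand-in data and illustrative variable names (log_claim, claim_year, policy_cur, product); it is not the code or the data used in the thesis, but shows one way to fit a gamma GLM with a log link and heteroskedasticity-robust standard errors, mirroring the "Robust SE" column of Table 14.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
n = 500

# Simulated stand-in for the EUR claims of the N-group.
df = pd.DataFrame({
    "claim_year": rng.integers(2005, 2013, n),
    "policy_cur": rng.choice(["EUR", "USD"], n),
    "product":    rng.choice(["P1", "P2"], n),
})
eta = -24.0 + 0.013 * df["claim_year"] + 0.23 * (df["policy_cur"] == "USD")
df["log_claim"] = rng.gamma(shape=1 / 0.06, scale=0.06 * np.exp(eta))

# Gamma GLM with log link for the log-transformed claim amount,
# with robust (HC0) standard errors.
model = smf.glm(
    "log_claim ~ claim_year + policy_cur + product",
    data=df,
    family=sm.families.Gamma(link=sm.families.links.Log()),
).fit(cov_type="HC0")

print(model.summary())
print(np.exp(model.params))   # multiplicative effects on the expected log claim amount
```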

                 Estimate  Robust SE     q2.5    q97.5  exp(coef)
(Intercept)       -24.176      4.197  -32.403  -15.977      0.000
Claim year          0.013      0.002    0.009    0.017      1.013
Policy cur: USD     0.228      0.061    0.108    0.312      1.257
Product: P2         0.013      0.010   -0.007    0.033      1.013

Table 14: Severity model for EUR N-group policies based on all claims in the years 2005-2012 with 95% confidence intervals.

From the positive parameter for claim year we see that the claim amounts grow over the years. Furthermore, when the policy currency is USD, a claim in EUR will be higher than when the policy currency is EUR. Since we model the claim severity using a log link function, the exponent of each coefficient has been included in the last column to ease interpretation. Note that we model the log claim amount. For example, the log claim amount of claims filed on product P2 is 1.013 times as high as the log claim amount of claims filed on product P1, ceteris paribus; this is an increase of just over 1%. We compare the deviance of this model with the deviance of the saturated model, the model with n observations and n parameters. The residual deviance is 144.00 with a dispersion parameter of 0.06; compared to the χ2-distribution with 2552 degrees of freedom, we do not reject the model.
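The coefficient interpretation and the goodness-of-fit check can be reproduced as follows. The snippet only re-uses the numbers reported in Table 14 and in the text, and assumes, as is common practice, that the scaled deviance is the quantity compared with the χ2-distribution.

```python
import numpy as np
from scipy import stats

# Multiplicative interpretation of the coefficients, cf. the exp(coef) column of Table 14.
coef_year, coef_usd, coef_p2 = 0.013, 0.228, 0.013
print(np.exp([coef_year, coef_usd, coef_p2]))

# Goodness-of-fit check against the saturated model: compare the scaled deviance
# with a chi-squared distribution on the residual degrees of freedom.
deviance, dispersion, df_resid = 144.00, 0.06, 2552
scaled_deviance = deviance / dispersion
print(stats.chi2.sf(scaled_deviance, df_resid))   # large p-value: model not rejected
```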
