Non-life insurance pricing using the generalized linear model and the generalized additive model


Non-life Insurance Pricing Using the Generalized Linear Model and the Generalized Additive Model

Wenwei Wu

Master’s Thesis to obtain the degree in Actuarial Science and Mathematical Finance
University of Amsterdam
Faculty of Economics and Business
Amsterdam School of Economics

Author: Wenwei Wu
Student nr: 10912487
Email: wenwei.wu@foxmail.com
Date: October 3, 2017
Supervisor: dr. S.U. (Umut) Can
Second reader: Roger J.A. Laeven


Abstract

In non-life insurance, GLMs are widely used for pricing. In the generalized linear model, continuous rating variables are categorized into intervals and all values within an interval are treated as identical. But if two policies have different but close values, categorizing those values into different intervals may result in significantly different premiums. Given these disadvantages of GLMs, we introduce the generalized additive model for non-life insurance pricing. This thesis discusses both the GLM and the GAM; a comparison of the results of numerical computations on public insurance data shows that the GAM performs better than the GLM.

Keywords Non-life insurance, Generalized linear model, Generalized additive model, AIC, Poisson distribution, Gamma distribution, Claim frequency, Claim severity, Deviance


Contents

Preface

1 Introduction

2 Non-life Insurance Pricing
  2.1 Tariff analysis
  2.2 Rating variables
  2.3 Multiplicative model

3 Non-life Insurance Pricing Using The GLM
  3.1 Exponential Dispersion Models
    3.1.1 Normal Distribution
    3.1.2 Poisson Distribution
    3.1.3 Gamma Distribution
  3.2 Link Function
  3.3 Estimation of the GLM
    3.3.1 Claim Frequency
    3.3.2 Claim Severity

4 Non-life Insurance Pricing Using The GAM
  4.1 Penalized Deviances
    4.1.1 The Deviance
    4.1.2 The Regularization
    4.1.3 The Smoothing Parameter
  4.2 Smoothing Splines
  4.3 Price Relativities - One Rating Variable
    4.3.1 Claim Frequency
    4.3.2 Claim Severity
  4.4 Price Relativities - Several Rating Variables
    4.4.1 Claim Frequency
    4.4.2 Claim Severity

5 Cross Validation Method of Choosing the Smoothing Parameter

6 Results
  6.1 Data
  6.2 Testing Models and Results
    6.2.1 AIC
    6.2.2 Explained Deviance
    6.2.3 Deviance
    6.2.4 GAM UBRE scores
  6.3 Results


Preface

First of all, I want to thank my supervisor dr. S.U. (Umut) Can for all he did to help me finish this thesis. He showed me how to sort the data appropriately and taught me non-life insurance pricing theory. I also appreciate that he was always reachable and replied to my e-mails very quickly. Thanks also to Prof. Kaas, lecturer of the course Non-life Insurance: Statistical Techniques. Through that course I found my interest in non-life insurance, and this is the main reason I chose this topic. I also want to thank all my friends and family for their support.

I would like to dedicate this paper to my parents. My father died in a car accident while I was writing this paper, and my mother died of cancer one month later. Suddenly, I had to live on my own. I miss you so much and I will be strong for the rest of my life.


Chapter 1

Introduction

A non-life insurance policy is an agreement between an insurance company and a customer that covers losses on cars, houses, or other assets of the insured. For companies, non-life insurance policies may cover the cost of damage to property, the cost of business interruption, and even the cost of employee health problems. In general, any insurance which is not life insurance can be classified as non-life insurance. For each non-life insurance policy, the premium is calculated using a number of variables such as the age of the policyholder, the gender of the policyholder, the age of the car, etc. All of these variables are rating factors. The rating factors come from one of the following categories, see Ohlsson and Johansson (2010):

1. Properties of the policyholders: age or gender if the policyholder is a private person, line of business for a company, etc.

2. Properties of the insured objects: age or model of a car, type of building, etc.

3. Properties of the geographic region: per capita income, population density of the policyholder's residential area, etc.

In the generalized linear model (GLM), the rating variables are categorized into intervals, e.g. ages are usually categorized into three classes: young, middle-aged and old. Nowadays, many tariff analyses are based on such categorized rating variables. For example, the data used in this thesis contains the total kilometres driven for each vehicle, but the choice was made to use just a few kilometre classes. We used a similar process for other variables such as zone, bonus and claims made, but for variables like insured, claims and payment, we kept the specific values.

GLMs are simple and widely used, but the disadvantage is that if two policies have different but close values, they may end up with significantly different premiums after the values are categorized into different intervals. Categorizing the intervals effectively can also be very time consuming. The intervals must be large enough to provide good precision in the price relativities, but must also be small enough that the rating variables do not vary too much within an interval. In the real world, it is sometimes hard to satisfy both requirements when dealing with the data sets.

As an extension of generalized linear models, there are several possible approaches to working with generalized additive models (GAMs). A well-known approach


is smoothing splines. With GAMs, we can avoid the disadvantages of GLMs. This thesis prices non-life insurance using both models and compares the results; see Guisan et al. (2002).

In this thesis, we will discuss the definitions of GLMs and GAMs, then solve non-life insurance pricing problems with the different models based on a public data set. Finally, we will compare the estimation results of the GLMs and GAMs to see which one performs best.


Chapter 2

Non-life Insurance Pricing

2.1 Tariff analysis

Tariff analysis is the statistical study an actuary performs to obtain a tariff. This kind of analysis is based on insurance policy data and claims for some portfolio. The data is usually provided by the company itself, but sometimes it is supplemented by external institutions like the Centraal Bureau voor de Statistiek (Statistics Netherlands). Several key pieces of information are required for a tariff analysis (Kaas et al. (2008)):

1. Duration of the policy: This is the amount of time that the policy covers the risk. It is usually counted in years.

2. Claim: The insured reports to the insurer that they want compensation for their loss, based on the insurance contract.

3. Claim frequency: This is the number of claims divided by the duration.

4. Claim severity (average cost per claim): This is the total claim amount divided by the total number of claims.

5. Pure premium: This is the average cost per time period. The pure premium is the product of the claim frequency and the claim severity.

6. Loss ratio: This is the claim amount divided by the earned premium.
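The quantities above can be illustrated with a short numerical sketch; every figure below is made up for illustration and is not taken from the thesis data:

```python
# Hypothetical portfolio figures (illustrative only).
duration = 250.0           # total policy duration in years
n_claims = 20              # total number of claims
claim_amount = 50_000.0    # total claim amount
earned_premium = 62_500.0  # total earned premium

claim_frequency = n_claims / duration            # claims per policy year
claim_severity = claim_amount / n_claims         # average cost per claim
pure_premium = claim_frequency * claim_severity  # average cost per policy year
loss_ratio = claim_amount / earned_premium

print(claim_frequency)  # 0.08
print(claim_severity)   # 2500.0
print(pure_premium)     # 200.0
print(loss_ratio)       # 0.8
```

Note that the pure premium also equals claim_amount / duration, since the number of claims cancels out of the product.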

An easy analysis can be conducted using the pure premium directly, but it is preferable to split the model into the product of claim frequency and claim severity. The advantage of this is that the claim frequency and the claim severity may depend differently on different variables, and because the data is not always complete, a combined analysis can be inconclusive; the split model, conversely, identifies the weak link. In order to perform reasonable analyses, there are some basic assumptions that provide a foundation for statistical model building, see McCullagh and Nelder (1989):

1. Policy independence: assume there are n different policies. Let Xi be the claim for policy i; then X1, ..., Xn are independent.

2. Time independence: assume there are n disjoint time intervals. Let Xi be the claim in time interval i, then we have X1, ..., Xn are independent.

3. Homogeneity: assume there are two policies in the same tariff cell with the same exposure. Let Xi be the claim for policy i; then X1 and X2 have the same probability distribution.

2.2 Rating variables

The claim frequency and the claim severity differ between policies, and we can estimate them with a set of rating variables. The rating variables can be continuous or categorical. To make them easier to analyse, we usually split continuous rating variables into intervals to create categorical rating variables. This can also help with the significance of the statistical results. If two or more policies share the same interval for every rating variable, we say that they come from the same tariff cell and have the same premium.

2.3 Multiplicative model

In principle, any tariff analysis is based on the insurer's own data. We can use the data in every tariff cell to determine the premium for that cell by estimating the expected cost with the observed pure premium. However, in the real world there may be empty cells (cells with no claims, or cells for which data has not been collected successfully). In these cases, we have to find methods to calculate an expected pure premium that varies smoothly over tariff cells, with good precision in the cell estimates. We can achieve this by using a model that assumes the expected pure premium depends on a number of rating factors. The multiplicative model is widely used for this purpose and is considered the standard for insurance pricing. There is also another method, based on the additive model.

First, let us focus on the multiplicative model. Before we start calculating, we need some preliminaries. Assume we have M rating factors, and let mi denote the number of classes for the i-th rating factor. To make it easier to understand, let us first look at the case of two rating factors, each with two classes. We have cells (i, j), where i and j denote the classes of the two rating factors, so i = 1, 2 and j = 1, 2. In cell (i, j), wij denotes the exposure and Xij the response, so the key ratio Yij is Yij = Xij/wij.

µ stands for the mean of the ratio Y, so we have E(Yij) = µij. Thus, the multiplicative model is:

µij = γ0γ1iγ2j (2.1)

Here, γ1i for i = 1, ..., m1 are the parameters corresponding to the different classes of the first rating factor; similarly, γ2j for j = 1, ..., m2 are those of the second rating factor, and the parameter γ0 is the base value. We can then state the general case of the multiplicative model:

µ_{i1,i2,...,iM} = γ0 ∏_{k=1}^{M} γ_{k,ik}


where γ0 is again the base value and the remaining parameters on the right-hand side are the price relativities for the rating variables. We can interpret the general multiplicative model as follows: the overall level of the model is set by the base value γ0, and the rest of the parameters determine how much to charge for each policy once γ0 is known. In practice, we usually determine the price relativities γ_{k,ik} first, and then set the base value γ0 to give the required overall premium level.

In the next chapter, we are going to apply the multiplicative model to our data set and show how to determine the base value and the price relativities.
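As a hedged sketch of the multiplicative model (2.1), with a made-up base value and relativities (none of these numbers come from the thesis data):

```python
# Sketch of the multiplicative tariff (2.1) for two rating factors.
# The base value and relativities below are made-up numbers.
gamma0 = 500.0                      # base value (pure premium of the base cell)
gamma1 = {1: 1.0, 2: 1.3}           # relativities of rating factor 1 (class 1 is the base)
gamma2 = {1: 1.0, 2: 0.8, 3: 1.5}   # relativities of rating factor 2 (class 1 is the base)

def expected_pure_premium(i, j):
    """mu_ij = gamma0 * gamma1_i * gamma2_j."""
    return gamma0 * gamma1[i] * gamma2[j]

print(expected_pure_premium(1, 1))  # base cell: 500.0
print(expected_pure_premium(2, 3))  # 500 * 1.3 * 1.5 = 975.0
```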


Chapter 3

Non-life Insurance Pricing Using The GLM

The focus of tariff analysis is to correctly determine one or more key ratios, Y, that vary with a number of rating variables. This is similar to analysing how a dependent variable, Y, varies with a number of explanatory variables, x, in a multiple linear regression. But neither linear regression nor the slightly larger general linear model is fully applicable in non-life insurance pricing, because:

(i) The number of insurance claims follows a discrete probability distribution on the non-negative integers, and claim costs are also non-negative, whereas the linear models mentioned previously assume normally distributed random errors.

(ii) Multiplicative models for the mean are more reasonable for pricing than the means of linear models, which are linear functions of the covariates.

The generalized linear model (GLM) generalizes the ordinary linear model in two ways, solving the two problems above:

(i) Probability distribution: Instead of the normal distribution, the GLM works with a general class of distributions which contains a number of discrete and continuous distributions, such as the normal, Poisson, and gamma distributions.

(ii) Model for the mean: As discussed before, the means of linear models are linear functions of the covariates x. In GLMs, some monotone transformation of the mean is a linear function of x; this contains the linear and multiplicative models as special cases.

Using GLMs rather than other methods has some advantages:

(i) GLMs form a general statistical theory with well-established techniques for estimating standard errors, constructing confidence intervals, testing, model selection, and other statistical features.

(ii) GLMs are used not only in non-life insurance pricing, but also in many other areas of statistics. Thus, the theory is useful both inside and outside actuarial science.

(iii) We do not have to compute the fitted values of GLMs manually; they can be calculated by various software packages such as SAS, GLIM and R. R was the software utilized in this thesis.


3.1 Exponential Dispersion Models

Exponential dispersion models (EDMs) are the part of GLMs that generalizes the normal distribution used in linear models. Under the assumption from Chapter 2 that all the variables Y1, ..., Yn are independent, the probability distribution of an EDM can be presented by the density function:

fYi(yi; θi, φ) = exp{ [ yi θi − b(θi) ] / (φ/wi) + c(yi, φ, wi) }

where wi is the exposure weight of the tariff cell, θi is a parameter that may depend on i, and φ > 0 is the dispersion parameter, which is the same for all i. If φ is fixed, we have a so-called one-parameter exponential family; otherwise we have an EDM. b(θi) is the cumulant function, which is assumed to be twice continuously differentiable with invertible first derivative. Note that the function c does not depend on θi. The following subsections demonstrate that the normal, Poisson and gamma distributions are included in this family of probability distributions.

3.1.1 Normal Distribution

In this section, we will show that the normal distribution used in weighted linear models (LMs) is a member of the EDM class, even though the normal distribution is rarely used in actuarial applications.

First assume that the key ratios Yi are normally distributed. The mean of the i-th key ratio is µi = E(Yi), and in linear models the variance σ² is the same for every i. Then Yi ∼ N(µi, σ²/wi), where wi is the exposure, and we can conclude that the frequency function of this distribution is:

fYi(yi) = √(1/(2πσ²/wi)) e^{−(yi−µi)² wi/(2σ²)}
        = exp{ log √(1/(2πσ²/wi)) − (wi/(2σ²))( yi² − 2yiµi + µi² ) }
        = exp{ (yiµi − µi²/2) / (σ²/wi) − (1/2)( wiyi²/σ² + log(2πσ²/wi) ) }   (3.1)

The part which does not depend on µi equals c(yi, σ², wi) = −(1/2)( wiyi²/σ² + log(2πσ²/wi) ), thus we can see this is an EDM with θi = µi, φ = σ², and b(θi) = θi²/2. This is a member of the EDM class even when wi = 1.

3.1.2 Poisson Distribution

To prove that the Poisson distribution is also a member of the EDM class, we first make the additional assumption that claims do not cluster. Let Xi be the number of claims in a tariff cell with duration wi, so that E(Xi) = wiµi, and let Yi = Xi/wi be the claim frequency. The probability distribution of Yi is as follows:

fYi(yi) = P(Xi = wiyi) = e^{−wiµi} (wiµi)^{wiyi} / (wiyi)!
        = exp{ wi [ yi log(µi) − µi ] + wiyi log(wi) − log((wiyi)!) },  wiyi = 0, 1, 2, ...   (3.2)


As we did for the normal distribution, we let the part which does not depend on µi equal c(yi, wi) = wiyi log(wi) − log((wiyi)!). Thus this is an EDM with θi = log(µi), φ = 1, and b(θi) = e^{θi}.
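A small numerical check (with made-up values for the duration, the mean and the claim count) that the Poisson probability function can indeed be written in the EDM form with these parameters:

```python
import math

# Numerical check (illustrative values) that the Poisson probability function
#   e^{-w*mu} (w*mu)^x / x!
# can be written in the EDM form exp{ (y*theta - b(theta)) / (phi/w) + c }
# with theta = log(mu), b(theta) = e^theta, phi = 1 and key ratio y = x/w.
w, mu, x = 2.0, 0.7, 3
y = x / w  # observed claim frequency

direct = math.exp(-w * mu) * (w * mu) ** x / math.factorial(x)

theta = math.log(mu)
b = math.exp(theta)
c = x * math.log(w) - math.log(math.factorial(x))  # part not depending on mu
edm = math.exp((y * theta - b) / (1.0 / w) + c)

assert abs(direct - edm) < 1e-12
```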

3.1.3 Gamma distribution

To demonstrate that the gamma distribution is also a member of the EDM class, we assume that the cost of an individual claim is gamma distributed and that the total claim cost of a cell with w claims is X. We then conclude that X ∼ G(wα, β), so the frequency function of the claim severity Y = X/w is

fY(y) = w fX(wy) = ( (wβ)^{wα} / Γ(wα) ) y^{wα−1} e^{−wβy}   (3.3)

so Y is also gamma distributed, with G(wα, wβ). Next, we re-parameterize the function by setting µ = α/β and φ = 1/α. Then the frequency function is:

fY(y) = fY(y; µ, φ) = ( 1/Γ(w/φ) ) ( w/(µφ) )^{w/φ} y^{(w/φ)−1} e^{−wy/(µφ)}
      = exp{ (−y/µ − log(µ)) / (φ/w) + log(wy/φ)·(w/φ) − log(y) − log Γ(w/φ) }   (3.4)

As before, we let c(y, φ, w) denote the sum of the last three elements which do not depend on µ. Finally, we let θ = −1/µ. Bringing the index i back to the frequency function, the frequency function of claim severity Yi is:

fYi(yi; θi, φ) = exp{ ( yiθi + log(−θi) ) / (φ/wi) + c(yi, φ, wi) }   (3.5)

We can then conclude that with b(θi) = −log(−θi) and all the parameters previously identified, the gamma distribution is a member of the EDM class.
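The gamma reparameterization can also be checked numerically; the shape, rate and observation below are made-up values:

```python
import math

# Numerical check (illustrative values) that the gamma density (3.3) with
# shape w*alpha and rate w*beta equals the EDM form (3.5) under the
# reparameterization mu = alpha/beta, phi = 1/alpha, theta = -1/mu,
# b(theta) = -log(-theta).
w, alpha, beta = 2.0, 3.0, 1.5
y = 1.1

# Direct log-density of Y ~ G(w*alpha, w*beta)
log_direct = ((w * alpha) * math.log(w * beta) - math.lgamma(w * alpha)
              + (w * alpha - 1.0) * math.log(y) - w * beta * y)

# EDM form
mu, phi = alpha / beta, 1.0 / alpha
theta = -1.0 / mu
b = -math.log(-theta)
c = (w / phi) * math.log(w * y / phi) - math.log(y) - math.lgamma(w / phi)
log_edm = (y * theta - b) / (phi / w) + c

assert abs(log_direct - log_edm) < 1e-10
```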

3.2 Link Function

Section 3.1.1 demonstrated that the normal distribution is a member of the EDM class. In a standard linear model, the observations are assumed to be normally distributed around a mean which is a linear function of parameters and covariates. However, the random variables are not necessarily normally distributed, and their variance is not necessarily independent of the mean. The scale on which the means are linear in the covariates may also vary, as for loglinear distributed variables.

The generalized linear models have three important components (Kaas et al. (2008)):

1. Stochastic component: the independent random variable Yi with a density in the exponential dispersion family.


2. Systematic component: the linear predictor η = Xβ, with X the matrix of regressors and β the parameter vector.

3. Link function: the function links the expected value µi of Yi to the linear predic-tors as ηi = g(µi).

We have demonstrated that EDMs can generalise the normal distribution used in a LM, so in this subsection we are going to focus on other generalisations of ordinary linear models by studying the link function (linear structure of the mean).

First, let’s look into a simple case with only two rating variables, assuming that one has two classes and the other has three classes. We assume a multiplicative model structure for the mean, and let µij denote the expectation of Yij (key ratio) of cell (i, j), where i = 1, 2 and j = 1, 2, 3:

µij = γ0γ1iγ2j

To make it easier to estimate, we are going to transfer the function above into a linear structure by taking the logarithms on both sides:

log µij = log γ0 + log γ1i + log γ2j   (3.6)

From previous sections we know we have to choose a base cell, for example (1,1), so that γ11 = γ21 = 1. On the other hand, (3.6) demonstrates that µ11 = γ0. To make it easier to handle as an additive model, we sort the cells in the order (1,1), (1,2), (1,3), (2,1), (2,2), (2,3) and let β1 ≡ log γ0, β2 ≡ log γ12, β3 ≡ log γ22, β4 ≡ log γ23. For the rest of the parameters, we introduce a set of dummy variables to express them in terms of the parameters we just defined.

Before we introduce the set of dummy variables, we have to study all the possible relationships between log µij and the parameters β1, ..., β4, shown in Table 3.1.

Table 3.1: Relations of our example with base cell (1,1)

Tariff cell   i   j   Base cell   log µij
1             1   1       √       β1
2             1   2               β1 + β3
3             1   3               β1 + β4
4             2   1               β1 + β2
5             2   2               β1 + β2 + β3
6             2   3               β1 + β2 + β4

The dummy variables xij are defined as:

xij = 1 if βj is in the expression of log µi, and 0 otherwise.   (3.7)

We can then show that log µi can be presented by the dummy variables as:

log µi = Σ_{j=1}^{4} xij βj   (3.8)


and present the xij in the matrix X, which is called the design matrix:

X =
    1 0 0 0
    1 0 1 0
    1 0 0 1
    1 1 0 0
    1 1 1 0
    1 1 0 1

With the design matrix above, let ηi = log µi for i = 1, 2, ..., 6. Then the ηi can be written in matrix form as η = Xβ, with η = (η1, η2, η3, η4, η5, η6)′ and β = (β1, β2, β3, β4)′.
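The 2×3 example can be sketched numerically; the parameter values below are made-up relativities on the log scale:

```python
import math

# The design matrix of the 2x3 example (Table 3.1); rows are the cells
# (1,1),(1,2),(1,3),(2,1),(2,2),(2,3), columns correspond to beta_1..beta_4.
X = [
    [1, 0, 0, 0],
    [1, 0, 1, 0],
    [1, 0, 0, 1],
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [1, 1, 0, 1],
]

# Made-up parameters: beta_1 = log gamma0, beta_2 = log gamma12, etc.
gamma0, gamma12, gamma22, gamma23 = 400.0, 1.2, 0.8, 1.5
beta = [math.log(g) for g in (gamma0, gamma12, gamma22, gamma23)]

eta = [sum(x * b for x, b in zip(row, beta)) for row in X]  # eta = X beta
mu = [math.exp(e) for e in eta]                             # mu = exp(eta)

# Cell (2,3) is row 6: mu = gamma0 * gamma12 * gamma23 = 720
print([round(m, 2) for m in mu])  # [400.0, 320.0, 600.0, 480.0, 384.0, 720.0]
```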

Based on the simple example we used before, we can describe how the key ratio Yi is affected by the covariates x1, x2, ..., xr, where r is the total number of regression parameters (β1 is not included):

ηi = Σ_{j=1}^{r} xij βj,  for i = 1, 2, ..., n   (3.9)

where the xij and βj are similar to those in the simple example and n is the number of cells. Assume there are M rating variables and let mk be the number of intervals for the corresponding variable k; then the values of r and n are:

r = Σ_{j=1}^{M} mj − M,   n = ∏_{j=1}^{M} mj   (3.10)

Unlike the link function in the ordinary linear models (the identity link, µi = ηi), the link function in GLMs can be an arbitrary function g with g(µi) = ηi, as long as g(·) is differentiable and monotone (Ohlsson and Johansson (2010)). The link function of the multiplicative model is the logarithmic function (log-link function):

ηi = log(µi)   (3.11)


The log-link function is the most widely used in the pricing of non-life insurance, and the multiplicative model is always reasonable in practice. Thus, we are going to assume a multiplicative model throughout the rest of this thesis, with some interactions constructed by combining the variables.

3.3 Estimation of the GLM

Using the basic knowledge and properties of GLMs introduced in previous sections, we are going to study the estimation of the regression parameters. From the estimation results we can estimate the base value and the price relativities. Before we address the general case, let us first focus on an important special case.

3.3.1 Claim Frequency

We will start with the simple two-rating-factor case mentioned in Section 3.2 and the multiplicative model (2.1). To make reading easier, we repeat the multiplicative model here:

µij = γ0γ1iγ2j (3.12)

We will use the log-link function and assume that Yij is Poisson distributed. Based on the likelihood of the Poisson distribution (3.2) derived previously, we can derive the log-likelihood of the whole sample over all tariff cells:

l(y, µ) = Σ_i wi { yi [log(γ0) + log(γ1) + log(γ2)] − γ0 γ1 γ2 } + c
        = Σ_i wi [ yi log(µi) − µi ] + c   (3.13)

where c does not depend on the γ's, which means it does not depend on µ either. From the equations we derived before, (3.9) and (3.11), we can conclude that:

µi = exp{ Σ_{j=0}^{r} xij βj }   (3.14)

then we apply equation (3.14) in the equation (3.13) to get:

l(y, µ) = Σ_i wi [ yi Σ_{j=0}^{r} xij βj − exp{ Σ_{j=0}^{r} xij βj } ] + c   (3.15)

To find the optimal value of the log-likelihood function (3.15), we will use the most common approach: differentiate the log-likelihood function with respect to βj in order to maximize it. The partial derivative with respect to βj is (notice that c in equation (3.13) does not depend on β either):

∂l/∂βj = Σ_i wi [ yi xij − exp{ Σ_{k=0}^{r} xik βk } xij ]
       = Σ_i wi (yi − µi) xij   (3.16)

Equation (3.16) is for a single j out of the r + 1 values of j in total. In order to find the optimum, we set those r + 1 partial derivatives to 0, so equation (3.16) becomes:

∂l/∂βj = Σ_i wi (yi − µi) xij = 0   (3.17)

The ML equations above can be solved numerically. Based on earlier research, we know we can find the stationary point of the log-likelihood function by using the Newton-Raphson method or Fisher's scoring method. After the maximum likelihood equations have been solved (using statistical software), we can calculate the base value and price relativities by using:

γj = eβj (3.18)
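As a sketch of solving (3.17) for the multiplicative Poisson model (with made-up exposures and claim frequencies): for the log link, each relativity has a closed-form update given the others, and cycling these updates (the classical method of marginal totals) converges to the maximum likelihood estimates.

```python
# Solving the ML equations (3.17) for a made-up 2x3 multiplicative tariff.
w = {(1, 1): 100.0, (1, 2): 80.0, (1, 3): 50.0,
     (2, 1): 120.0, (2, 2): 60.0, (2, 3): 40.0}   # exposures (policy years)
y = {(1, 1): 0.05, (1, 2): 0.06, (1, 3): 0.09,
     (2, 1): 0.07, (2, 2): 0.08, (2, 3): 0.12}    # observed claim frequencies

g0, g1, g2 = 1.0, {1: 1.0, 2: 1.0}, {1: 1.0, 2: 1.0, 3: 1.0}
for _ in range(100):
    # closed-form update of each row relativity given the column relativities
    for i in (1, 2):
        g1[i] = (sum(w[i, j] * y[i, j] for j in (1, 2, 3))
                 / (g0 * sum(w[i, j] * g2[j] for j in (1, 2, 3))))
    # closed-form update of each column relativity given the row relativities
    for j in (1, 2, 3):
        g2[j] = (sum(w[i, j] * y[i, j] for i in (1, 2))
                 / (g0 * sum(w[i, j] * g1[i] for i in (1, 2))))
    # re-normalise so the base cell (1,1) has relativities 1
    b1, b2 = g1[1], g2[1]
    g0 *= b1 * b2
    g1 = {i: v / b1 for i, v in g1.items()}
    g2 = {j: v / b2 for j, v in g2.items()}

mu = {c: g0 * g1[c[0]] * g2[c[1]] for c in w}
# The fitted model reproduces the total observed claims, i.e. the ML
# equation (3.17) for the intercept dummy holds:
assert abs(sum(w[c] * (y[c] - mu[c]) for c in w)) < 1e-8
```

In practice these estimates are obtained with GLM software, as noted above; the loop is only a transparent stand-in for Newton-Raphson or Fisher scoring.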

3.3.2 Claim Severity

In Section 3.3.1 we discussed the theory of estimating claim frequency by assuming the Poisson distribution. In this section, we are going to explore another important factor: claim severity. To estimate claim severity, we will assume that in the data there are wi claims and an observed claim severity Yi in every tariff cell i.

Assume the total claim cost Xi in each tariff cell i is gamma distributed, with Yi = Xi/wi. Let µi be the mean value of Xi if wi = 1, so the expected value of Xi is E(Xi) = wiµi, and the distribution function of Xi is:

fXi(xi) = ( βi^{wiα} / Γ(wiα) ) xi^{wiα−1} e^{−βi xi},  xi > 0   (3.19)

With equation (3.19) and Yi = Xi/wi, we can deduce the frequency function of the claim severity Yi:

fYi(yi) = ( (βi wi)^{wiα} / Γ(wiα) ) yi^{wiα−1} e^{−βi wi yi},  yi > 0   (3.20)

As we did in Section 3.1.3, we let µi = α/βi and φ = 1/α. Then we can conclude a re-parameterized frequency function for Yi as stated in equation (3.4):

fYi(yi) = exp{ (−yi/µi − log(µi)) / (φ/wi) + log(wi yi/φ)·(wi/φ) − log(yi) − log Γ(wi/φ) }
        = ( 1/Γ(wi/φ) ) ( wi/(µiφ) )^{wi/φ} yi^{wi/φ−1} e^{−wi yi/(µiφ)},  yi > 0   (3.21)


The next step is to use the MLE method to estimate the regression parameters βj, using our assumption that all policies are independent. Thus, the log-likelihood function of the sample is:

l(y) = log L(y) = log ∏_i { ( 1/Γ(wi/φ) ) ( wi/(µiφ) )^{wi/φ} yi^{wi/φ−1} e^{−wi yi/(µiφ)} }
     = (1/φ) Σ_i wi [ log(1/µi) − yi/µi + log(wi yi/φ) − (φ/wi)( log(yi) + log Γ(wi/φ) ) ]   (3.22)

We know from equation (3.14) that µi = exp{ Σ_{j=0}^{r} xij βj }. By replacing all µi in equation (3.22) with this expression, we can conclude:

l(y) = (1/φ) Σ_i wi [ log(1/µi) − yi/µi + log(wi yi/φ) − (φ/wi)( log(yi) + log Γ(wi/φ) ) ]
     = (1/φ) Σ_i wi [ −Σ_{j=0}^{r} xij βj − yi / exp{ Σ_{j=0}^{r} xij βj } + c ]   (3.23)

where the expression of c does not depend on βj.

Thus, the partial derivative of the log-likelihood function above with respect to each βj is:

∂l/∂βj = (1/φ) Σ_i wi [ yi / exp{ Σ_{k=0}^{r} xik βk } − 1 ] xij = (1/φ) Σ_i wi ( yi/µi − 1 ) xij   (3.24)

Then we set equation (3.24) to 0 in order to obtain the maximum likelihood equations:

∂l/∂βj = (1/φ) Σ_i wi ( yi/µi − 1 ) xij = 0  ⇒  Σ_i wi ( yi/µi − 1 ) xij = 0   (3.25)

A great deal of research demonstrates that these maximum likelihood equations can be solved numerically. Once equation (3.25) has been solved, we can calculate the base value and price relativities by using the same fact as in equation (3.18):

γj = eβj
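For the simplest instance of (3.25), an intercept-only model µi = e^{β0}, the equation has a closed-form solution: the claims-weighted mean severity. A minimal sketch with made-up numbers:

```python
import math

# For the intercept-only gamma model (mu_i = e^{beta_0}), equation (3.25)
#   sum_i w_i (y_i / mu - 1) = 0
# has the closed-form solution mu = sum_i w_i y_i / sum_i w_i,
# i.e. the claims-weighted mean severity.
w = [3, 5, 2]                  # number of claims per tariff cell (made-up)
y = [1800.0, 2200.0, 2600.0]   # observed average claim severities (made-up)

mu_hat = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
beta0 = math.log(mu_hat)

# Check that (3.25) holds at the estimate:
assert abs(sum(wi * (yi / mu_hat - 1.0) for wi, yi in zip(w, y))) < 1e-9
print(round(mu_hat, 1))  # 2160.0
```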


Chapter 4

Non-life Insurance Pricing Using The GAM

Many pricing problems in non-life insurance involve continuous rating variables, for example the age of each policyholder or the miles driven by an insured vehicle. From previous chapters, we know that the first step towards solving these pricing problems is to separate those rating variables into intervals and to treat different values from the same interval as identical. This method is advantageous because it is easy to use and usually works well in most pricing problems. However, the disadvantage is that two policies with very close values of a rating variable can end up in different intervals and hence receive substantially different premiums. For the most common continuous rating variable, the age of each policyholder, it is unsatisfactory if the premium stays constant for many years and then jumps up or down at the end of an interval. Thus, it is hard to find appropriate intervals for rating variables such as age: intervals must be large enough to achieve good precision in the estimates, but on the other hand they have to be small enough that the variable does not vary too much within an interval.

In Hastie and Tibshirani (1986), Trevor Hastie and Robert Tibshirani introduced the concept of generalized additive models, which use a sum of smooth functions Σj sj(Xj) instead of the linear form Σj βjXj used in generalized linear models. The unspecified functions sj(·) depend on j and are estimated by a scatterplot smoother.

Previous studies have demonstrated a number of methods when using the GAM. The method of smoothing splines will be utilised for the rest of this thesis.

4.1 Penalized Deviances

To get a better understanding of this topic, assume that there is a set of rating variables xi1, xi2, ..., xiJ, where the rating variable xij represents the value of variable j for observation i. These rating variables can be categorical or continuous. Thus,


we look back at the general problem of how to represent the situation where the key ratio Y is affected by rating variables (specifically, a model for the means):

ηi = Σ_{j=1}^{r} xij βj,  i = 1, 2, ..., n   (4.1)

First we assume that the model has no interactions, so it can be written as:

ηi = β0 + Σ_{k1=1}^{K1} β1k1 φ1k1(xi1) + ... + Σ_{kJ=1}^{KJ} βJkJ φJkJ(xiJ)   (4.2)

To get the β's in (4.2), we took the βj's in (4.1) and renumbered them; the notation makes it clear which variable a certain β parameter is associated with. The variable φjk(xij) is a dummy variable based on whether xij equals zk or not (if variable j is categorical, we assume there is a finite set of values {z1, ..., zKj}). On the other hand, if j is continuous, we use a subdivision into intervals, which means φjk(xij) = 1 if xij belongs to the k-th interval, and zero otherwise.

According to the concept introduced in Hastie and Tibshirani (1986), we can present (4.2) in a new way. There are some functions fj such that:

ηi = β0 + f1(xi1) + ... + fJ(xiJ) = β0 + Σ_{j=1}^{J} fj(xij)   (4.3)

Here, xij is the value of rating variable j for observation i, and (4.3) contains the additive effect of each variable, although some of the fj may have the traditional form as in (4.2). Because the mean is allowed to depend more freely on the value of every rating variable, the model above is more general.

Now, let’s assume there is a simple example where we model all the variables except the first one in the usual way. This is given by:

ηi = r X j=1 βjx 0 ij+ f (xi1) (4.4)

Because insurance data may consist of a huge set of observations, and we aim to find the function f that describes the effect of the rating variable most appropriately, we need to find a function f that fits the insurance data well and is smooth at the same time.

4.1.1 The Deviance

We measure the fit of the estimated means µ̂ = exp{f(x)} to our data y by the deviance D(y, µ̂), which can be defined as below:

D(y, µ̂) = 2φ Σ_{i=1}^{n} [ l(yi, yi) − l(yi, µ̂i) ]   (4.5)

where the parameter φ was introduced in Chapter 3 as the dispersion parameter. The deviance can be interpreted as a weighted sum of distances between the estimated means µ̂ and the data y. Thus, for the Poisson distribution, we have:

D(y, exp{f(x)}) = 2 Σ_{i=1}^{n} wi [ yi log(yi) − yi f(xi) − yi + e^{f(xi)} ]   (4.6)

and for the gamma distribution, we have:

D(y, exp{f(x)}) = 2 Σ_{i=1}^{n} wi [ yi/e^{f(xi)} − 1 − log(yi) + f(xi) ]   (4.7)

4.1.2 The Regularization

The research in Ohlsson and Johansson (2010) demonstrates that the regularization is a measure of the variability of the function f, and it can be presented as:

R(f(x)) = ∫_a^b (f''(x))² dx   (4.8)

where a is a value below the minimum of x and b is a value above the maximum of x. From (4.8) we can see that if a function varies widely, its regularization will be large; conversely, if a function has limited variation, its regularization will be small.

4.1.3 The Smoothing Parameter

With the deviance and the regularization defined in the previous sections, we can now look for the function $f$ that minimizes the penalized deviance:

$$\Delta(f(x)) = D(y, \exp\{f(x)\}) + \lambda R(f(x)) \qquad (4.9)$$

Here the parameter $\lambda$ is the smoothing parameter, which balances a good fit to the data against the variability of the function $f$. A small $\lambda$ puts more weight on the data and lets the function vary freely, while a large $\lambda$ forces the regularization term (4.8) to be small, producing a smooth function with low variability. In the next chapter, we will discuss different methods for finding the optimal smoothing parameter $\lambda$.
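To make these definitions concrete, the Poisson deviance (4.6), the regularization (4.8), and the penalized deviance (4.9) can be sketched numerically. This is only an illustration: the finite-difference approximation of the integral on a grid is an assumption of this sketch, whereas later sections evaluate the integral exactly through B-splines.

```python
import numpy as np

def poisson_deviance(y, log_mu, w):
    """Poisson deviance (4.6) for log-means log_mu = f(x_i), with y log y := 0 at y = 0."""
    ylogy = np.where(y > 0, y * np.log(np.where(y > 0, y, 1.0)), 0.0)
    return 2.0 * np.sum(w * (ylogy - y * log_mu - y + np.exp(log_mu)))

def regularization(f_vals, grid):
    """Approximate R(f) = int (f'')^2 dx by second differences on an equispaced grid."""
    h = grid[1] - grid[0]
    f2 = np.diff(f_vals, n=2) / h**2      # second differences approximate f''
    return np.sum(f2**2) * h              # rectangle-rule quadrature

def penalized_deviance(y, x, w, f, lam, grid):
    """Delta(f) = D(y, exp{f}) + lambda * R(f), as in (4.9); f is a callable."""
    return poisson_deviance(y, f(x), w) + lam * regularization(f(grid), grid)
```

A linear $f$ has zero second derivative, so its regularization vanishes and the penalized deviance reduces to the deviance alone, whatever the value of $\lambda$.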

4.2 Smoothing Splines

A spline function is built from a number of polynomials, defined on a connected set of disjoint intervals, which are tied together at the points where the intervals adjoin.

There is an important theorem that we need to know:

Theorem 4.1 Consider a set of $m$ ordered points $u_1 < \dots < u_m$ and $m$ real numbers $y_1, \dots, y_m$. There is a unique natural cubic spline $s$ such that $s(u_k) = y_k$, $k = 1, \dots, m$. Furthermore, if $f$ is a twice continuously differentiable function with $f(u_k) = y_k$, $k = 1, \dots, m$, then for any $a \le u_1$ and $b \ge u_m$ we have:

$$\int_a^b \left( s''(x) \right)^2 dx \le \int_a^b \left( f''(x) \right)^2 dx \qquad (4.10)$$

The theorem above connects splines to penalized deviances: among all twice continuously differentiable functions interpolating the given points, the natural cubic spline minimizes the integrated squared second derivative.

By Theorem 4.1, for the unique natural cubic spline $s$ with $s(u_k) = f(u_k)$, $k = 1, \dots, m$, we also have $D(y, \exp\{s(x)\}) = D(y, \exp\{f(x)\})$, and together with (4.10) this leads to:

$$\Delta(s(x)) \le \Delta(f(x)) \qquad (4.11)$$

Equation (4.11) shows that, when looking for the twice continuously differentiable function which minimizes (4.9), we only need to consider the set of natural cubic splines. As we learn from Appendix B.2 of Ohlsson and Johansson (2010), every spline function of a given degree can be expressed as a linear combination of B-splines of that degree. With this property, we can use B-splines to parameterize the set of natural cubic splines $s$. Let $B_{3k}(x)$ denote the $k$-th B-spline of third order. The following theorem will be used in the rest of this work:

Theorem 4.2 For a given set of $m$ knots, a cubic spline $s$ can be written as:

$$s(x) = \sum_{k=1}^{m+2} \beta_k B_{3k}(x) \qquad (4.12)$$

for a unique set of constants $\{\beta_1, \dots, \beta_{m+2}\}$.

These constants determine the weight of each B-spline and must not be confused with the regression parameters defined in previous sections.
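Theorem 4.2 can be checked numerically. The sketch below assumes scipy's B-spline implementation (the thesis does not prescribe a library): with a clamped knot vector on $m$ knots, there are exactly $m+2$ cubic B-splines, and a cubic spline is a linear combination of them.

```python
import numpy as np
from scipy.interpolate import BSpline

def cubic_design_matrix(x, z):
    """Evaluate the m+2 cubic B-splines on knots z_1 < ... < z_m at the points x."""
    t = np.r_[[z[0]] * 4, z[1:-1], [z[-1]] * 4]   # clamped knot vector, length m+6
    n_basis = len(t) - 4                          # = m + 2 for cubic splines (k = 3)
    B = np.empty((len(x), n_basis))
    for k in range(n_basis):
        coef = np.zeros(n_basis)
        coef[k] = 1.0
        B[:, k] = BSpline(t, coef, 3)(x)          # k-th basis function B_{3k}
    return B

z = np.linspace(0.0, 1.0, 8)                      # m = 8 knots
x = np.linspace(0.0, 1.0, 200)
B = cubic_design_matrix(x, z)
print(B.shape)                                    # (200, 10): m + 2 = 10 basis functions
```

On $[z_1, z_m]$ the basis sums to one at every point, so a constant function is exactly representable with all $\beta_k$ equal, which is a quick sanity check on the parameterization.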

4.3 Price Relativities - One Rating Variable

We estimate the price relativities by finding the natural cubic spline $s$ which minimizes the penalized deviance:

$$\Delta(s(x)) = D(y, \exp\{s(x)\}) + \lambda R(s(x)) \qquad (4.13)$$

We begin with the problem of a single rating variable and consider two cases: claim frequency and claim severity.

4.3.1 Claim Frequency

For claim frequency, we first assume that we have one continuous rating variable and that the observations of the key ratio $Y$ are Poisson distributed. Continuous rating variables are usually rounded when insurance data is collected or stored: a relatively small number of distinct values is used instead of millions of different values. For example, the age of a policyholder is usually stored as a whole number of years, rounded down to the nearest integer (23 years and 6 months is recorded as 23 years), so the number of distinct values that actually occur is typically less than one hundred.

Assume $x_i$ represents the value of the rating variable for the $i$-th observation, and let $z_1 < z_2 < \dots < z_m$ be the possible values of the $x_i$. If $Y$ is Poisson distributed, we can derive the deviance as:

$$D(y, \exp\{s(x)\}) = 2 \sum_{i=1}^{n} w_i \left( y_i \log y_i - y_i s(x_i) - y_i + e^{s(x_i)} \right) \qquad (4.14)$$

If there exist $a \le z_1$ and $b \ge z_m$, then we can write the regularization as:

$$R(s(x)) = \int_a^{z_1} (s''(x))^2 dx + \int_{z_1}^{z_m} (s''(x))^2 dx + \int_{z_m}^{b} (s''(x))^2 dx \qquad (4.15)$$

Because $s$ is a natural cubic spline, it is linear outside the interval $[z_1, z_m]$, so the terms $\int_a^{z_1} (s''(x))^2 dx$ and $\int_{z_m}^{b} (s''(x))^2 dx$ are both zero. We can therefore rewrite equation (4.15) as:

$$R(s(x)) = \int_{z_1}^{z_m} (s''(x))^2 dx \qquad (4.16)$$

Now the penalized deviance can be written as:

$$\Delta(s(x)) = D(y, \exp\{s(x)\}) + \lambda R(s(x)) = 2 \sum_{i=1}^{n} w_i \left( y_i \log y_i - y_i s(x_i) - y_i + e^{s(x_i)} \right) + \lambda \int_{z_1}^{z_m} (s''(x))^2 dx \qquad (4.17)$$

According to Theorem 4.2, we can express the natural cubic spline $s$ as a sum of B-splines, with a unique set of constants $\beta_1, \dots, \beta_{m+2}$ and cubic B-splines $B_1(x), \dots, B_{m+2}(x)$ on the knots $z_1, \dots, z_m$. Thus, we can rewrite equation (4.12) as:

$$s(x) = \sum_{j=1}^{m+2} \beta_j B_j(x) \qquad (4.18)$$

Thus, from equation (4.17) we obtain:

$$\Delta(\beta) = 2 \sum_{i=1}^{n} w_i \left( y_i \log y_i - y_i \sum_{j=1}^{m+2} \beta_j B_j(x_i) - y_i + \exp\Big\{\sum_{j=1}^{m+2} \beta_j B_j(x_i)\Big\} \right) + \lambda \sum_{j=1}^{m+2} \sum_{k=1}^{m+2} \beta_j \beta_k \, \Omega_{jk} \qquad (4.19)$$

where

$$\Omega_{jk} = \int_{z_1}^{z_m} B_j''(x) B_k''(x) \, dx \qquad (4.20)$$

From Appendix B.2 of Ohlsson and Johansson (2010), we know that the basic properties of B-splines can be used to compute the values $\Omega_{jk}$. We now take partial derivatives to find the parameters $\beta_1, \dots, \beta_{m+2}$ which minimize the function $\Delta(\beta)$. The steps are given as follows:

$$\frac{\partial \Delta}{\partial \beta_l} = 2 \sum_{k=1}^{m} \tilde{w}_k \left( -\tilde{y}_k + \exp\Big\{\sum_{j=1}^{m+2} \beta_j B_j(z_k)\Big\} \right) B_l(z_k) + 2\lambda \sum_{j=1}^{m+2} \beta_j \Omega_{jl} = 0 \qquad (4.21)$$

where $\tilde{w}_k$ denotes the total weight of the observations with $x_i = z_k$, and $\tilde{y}_k$ the corresponding weighted average of the $y_i$.

After some simplification, we obtain the equation:

$$\sum_{k=1}^{m} \tilde{w}_k \exp\Big\{\sum_{j=1}^{m+2} \beta_j B_j(z_k)\Big\} B_l(z_k) + \lambda \sum_{j=1}^{m+2} \beta_j \Omega_{jl} - \sum_{k=1}^{m} \tilde{w}_k \tilde{y}_k B_l(z_k) = 0, \quad l = 1, 2, \dots, m+2 \qquad (4.22)$$

Equation (4.22) shows that the $\beta_l$, $l = 1, 2, \dots, m+2$, enter in a non-linear way. To solve this system we apply the Newton-Raphson method. The first step is to define $h_l(\beta_1, \dots, \beta_{m+2})$ as the left-hand side of (4.22), such that:

$$h_l(\beta_1, \dots, \beta_{m+2}) = \sum_{k=1}^{m} \tilde{w}_k \exp\Big\{\sum_{j=1}^{m+2} \beta_j B_j(z_k)\Big\} B_l(z_k) + \lambda \sum_{j=1}^{m+2} \beta_j \Omega_{jl} - \sum_{k=1}^{m} \tilde{w}_k \tilde{y}_k B_l(z_k), \quad l = 1, 2, \dots, m+2 \qquad (4.23)$$

We then solve the system $h_l = 0$ iteratively, at each step solving a linear system for the parameters $\beta_1^{(n+1)}, \dots, \beta_{m+2}^{(n+1)}$:

$$h_l(\beta_1^{(n)}, \dots, \beta_{m+2}^{(n)}) + \sum_{j=1}^{m+2} \left( \beta_j^{(n+1)} - \beta_j^{(n)} \right) \frac{\partial h_l}{\partial \beta_j}(\beta_1^{(n)}, \dots, \beta_{m+2}^{(n)}) = 0 \qquad (4.24)$$

where the partial derivative $\partial h_l / \partial \beta_j$ can be expressed as:

$$\frac{\partial h_l}{\partial \beta_j} = \sum_{k=1}^{m} \tilde{w}_k \exp\Big\{\sum_{j'=1}^{m+2} \beta_{j'} B_{j'}(z_k)\Big\} B_j(z_k) B_l(z_k) + \lambda \Omega_{jl} \qquad (4.25)$$

To simplify the expressions above, let us define:

$$\gamma_k^{(n)} = \exp\Big\{\sum_{j=1}^{m+2} \beta_j^{(n)} B_j(z_k)\Big\} \qquad (4.26)$$

Substituting (4.23), (4.25) and (4.26) into (4.24) gives:

$$\sum_{k=1}^{m} \tilde{w}_k \gamma_k^{(n)} B_l(z_k) - \sum_{k=1}^{m} \tilde{w}_k \tilde{y}_k B_l(z_k) + \lambda \sum_{j=1}^{m+2} \beta_j^{(n)} \Omega_{jl} + \sum_{j=1}^{m+2} \left( \beta_j^{(n+1)} - \beta_j^{(n)} \right) \left( \sum_{k=1}^{m} \tilde{w}_k \gamma_k^{(n)} B_j(z_k) B_l(z_k) + \lambda \Omega_{jl} \right) = 0 \qquad (4.27)$$

Collecting the $\beta_j^{(n+1)}$ terms on the left-hand side of (4.27) and the $\beta_j^{(n)}$ terms on the right-hand side, we get:

$$\sum_{j=1}^{m+2} \sum_{k=1}^{m} \tilde{w}_k \gamma_k^{(n)} B_j(z_k) B_l(z_k) \, \beta_j^{(n+1)} + \lambda \sum_{j=1}^{m+2} \beta_j^{(n+1)} \Omega_{jl} = \sum_{k=1}^{m} \tilde{w}_k \gamma_k^{(n)} \left( \tilde{y}_k / \gamma_k^{(n)} - 1 + \sum_{j=1}^{m+2} \beta_j^{(n)} B_j(z_k) \right) B_l(z_k), \quad l = 1, 2, \dots, m+2 \qquad (4.28)$$

We can collect the $B_l(z_k)$'s and the $\tilde{w}_k \gamma_k^{(n)}$'s in two matrices. Let $\mathbf{B}$ be the $m \times (m+2)$ matrix

$$\mathbf{B} = \begin{pmatrix} B_1(z_1) & B_2(z_1) & \cdots & B_{m+2}(z_1) \\ B_1(z_2) & B_2(z_2) & \cdots & B_{m+2}(z_2) \\ \vdots & \vdots & & \vdots \\ B_1(z_m) & B_2(z_m) & \cdots & B_{m+2}(z_m) \end{pmatrix}$$

and let $\mathbf{W}^{(n)}$ be the diagonal $m \times m$ matrix

$$\mathbf{W}^{(n)} = \begin{pmatrix} \tilde{w}_1 \gamma_1^{(n)} & 0 & \cdots & 0 \\ 0 & \tilde{w}_2 \gamma_2^{(n)} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \tilde{w}_m \gamma_m^{(n)} \end{pmatrix}$$

As we have done with $\mathbf{B}$ and $\mathbf{W}^{(n)}$, we form the symmetric $(m+2) \times (m+2)$ matrix $\mathbf{\Omega}$ with elements $\Omega_{jk}$, the vector $\boldsymbol{\beta}^{(n)}$ with elements $\beta_j^{(n)}$, and the vector $\mathbf{y}^{(n)}$ with elements $\tilde{y}_k / \gamma_k^{(n)} - 1 + \sum_{j=1}^{m+2} \beta_j^{(n)} B_j(z_k)$. With these matrices, the system of linear equations (4.28) can be written in matrix form:

$$(\mathbf{B}' \mathbf{W}^{(n)} \mathbf{B} + \lambda \mathbf{\Omega}) \boldsymbol{\beta}^{(n+1)} = \mathbf{B}' \mathbf{W}^{(n)} \mathbf{y}^{(n)} \qquad (4.29)$$

The Newton-Raphson method now amounts to solving (4.29) repeatedly for $\boldsymbol{\beta}^{(n+1)}$ until convergence. After calculating $\boldsymbol{\beta}$, the natural cubic spline $s(x)$ follows from equation (4.18), and the exponentials $e^{s(z_i)}$, $i = 1, \dots, m$, are the price relativities.
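The iteration (4.29) can be sketched numerically. This is a simplified illustration, not the thesis implementation: it assumes scipy for the B-spline basis, approximates $\Omega_{jk}$ by trapezoidal quadrature on a fine grid rather than exactly, and takes one observation per knot with unit weights.

```python
import numpy as np
from scipy.interpolate import BSpline

def basis_and_penalty(z, n_grid=400):
    """B-spline design matrix B (m x (m+2)) at the knots and the penalty matrix Omega,
    with Omega_{jl} = int B_j'' B_l'' dx approximated by trapezoidal quadrature."""
    t = np.r_[[z[0]] * 4, z[1:-1], [z[-1]] * 4]      # clamped cubic knot vector
    p = len(t) - 4                                    # m + 2 basis functions
    splines = [BSpline(t, np.eye(p)[j], 3) for j in range(p)]
    B = np.column_stack([s(z) for s in splines])
    grid = np.linspace(z[0], z[-1], n_grid)
    D2 = np.column_stack([s.derivative(2)(grid) for s in splines])
    qw = np.full(n_grid, grid[1] - grid[0])
    qw[0] /= 2; qw[-1] /= 2                           # trapezoid weights
    return B, D2.T @ (D2 * qw[:, None])

def fit_poisson_spline(z, y_tilde, w_tilde, lam, n_iter=50, tol=1e-10):
    """Penalized Newton-Raphson iteration (4.29) for the Poisson case."""
    B, Omega = basis_and_penalty(z)
    beta = np.zeros(B.shape[1])
    for _ in range(n_iter):
        eta = B @ beta
        gamma = np.exp(eta)                           # gamma_k^(n) of (4.26)
        W = w_tilde * gamma                           # diagonal of W^(n)
        y_work = y_tilde / gamma - 1.0 + eta          # working response y^(n)
        beta_new = np.linalg.solve(B.T @ (W[:, None] * B) + lam * Omega,
                                   B.T @ (W * y_work))
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new, B
        beta = beta_new
    return beta, B

# Noiseless example: the true log-mean is linear, so the spline can fit it exactly
# with zero penalty, and the fitted relativities exp{s(z_k)} reproduce the data.
z = np.linspace(1.0, 2.0, 10)
y = np.exp(0.5 + 0.3 * z)
beta, B = fit_poisson_spline(z, y, np.ones_like(z), lam=1e-4)
relativities = np.exp(B @ beta)
```

Because both the deviance and the penalty are zero at the linear interpolant, the iteration recovers the data exactly in this example; with noisy data and larger $\lambda$, the fitted relativities would instead be a smoothed version of the observed ratios.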

4.3.2 Claim Severity

In the previous section, we assumed the observations of the key ratio $Y$ were Poisson distributed. Now we assume they are gamma distributed, in which case the deviance is given by:

$$D(y, \exp\{s(x)\}) = 2 \sum_{i=1}^{n} w_i \left( y_i / e^{s(x_i)} - 1 - \log y_i + s(x_i) \right) \qquad (4.30)$$

Combining the equation above with the conclusion derived in (4.16), the penalized deviance becomes:

$$\Delta(s(x)) = 2 \sum_{i=1}^{n} w_i \left( y_i / e^{s(x_i)} - 1 - \log y_i + s(x_i) \right) + \lambda \int_{z_1}^{z_m} (s''(x))^2 dx \qquad (4.31)$$

Now, we again express the natural cubic spline $s$ as a sum of B-splines. With this property, the penalized deviance becomes a function of the parameters $\beta_1, \dots, \beta_{m+2}$:

$$\Delta(\beta) = 2 \sum_{i=1}^{n} w_i \left( y_i \Big/ \exp\Big\{\sum_{j=1}^{m+2} \beta_j B_j(x_i)\Big\} - 1 - \log y_i + \sum_{j=1}^{m+2} \beta_j B_j(x_i) \right) + \lambda \sum_{j=1}^{m+2} \sum_{k=1}^{m+2} \beta_j \beta_k \, \Omega_{jk} \qquad (4.32)$$

Based on equation (4.32), we take partial derivatives to find the parameters $\beta_1, \dots, \beta_{m+2}$ which minimize it:

$$\frac{\partial \Delta}{\partial \beta_l} = 2 \sum_{i=1}^{n} w_i \left( -y_i \Big/ \exp\Big\{\sum_{j=1}^{m+2} \beta_j B_j(x_i)\Big\} + 1 \right) B_l(x_i) + 2\lambda \sum_{j=1}^{m+2} \beta_j \Omega_{jl} = 2 \sum_{k=1}^{m} \tilde{w}_k \left( -\tilde{y}_k \Big/ \exp\Big\{\sum_{j=1}^{m+2} \beta_j B_j(z_k)\Big\} + 1 \right) B_l(z_k) + 2\lambda \sum_{j=1}^{m+2} \beta_j \Omega_{jl} \qquad (4.33)$$

Setting equation (4.33) equal to zero yields:

$$-\sum_{k=1}^{m} \tilde{w}_k \tilde{y}_k B_l(z_k) \Big/ \exp\Big\{\sum_{j=1}^{m+2} \beta_j B_j(z_k)\Big\} + \sum_{k=1}^{m} \tilde{w}_k B_l(z_k) + \lambda \sum_{j=1}^{m+2} \beta_j \Omega_{jl} = 0, \quad l = 1, 2, \dots, m+2 \qquad (4.34)$$

Thus, we have:

$$-\sum_{k=1}^{m} \tilde{w}_k \tilde{y}_k B_l(z_k) \Big/ \exp\Big\{\sum_{j=1}^{m+2} \beta_j B_j(z_k)\Big\} + \sum_{k=1}^{m} \tilde{w}_k B_l(z_k) + \lambda \sum_{j=1}^{m+2} \beta_j \Omega_{jl} = 0, \quad l = 1, 2, \dots, m+2 \qquad (4.35)$$

As in the previous section, the term $-\sum_{k=1}^{m} \tilde{w}_k \tilde{y}_k B_l(z_k) / \exp\{\sum_{j=1}^{m+2} \beta_j B_j(z_k)\}$ depends on the parameters $\beta_1, \dots, \beta_{m+2}$ in a non-linear way, so we again apply the Newton-Raphson method. We start by defining $h_l(\beta_1, \dots, \beta_{m+2})$ such that:

$$h_l(\beta_1, \dots, \beta_{m+2}) = -\sum_{k=1}^{m} \tilde{w}_k \tilde{y}_k B_l(z_k) \Big/ \exp\Big\{\sum_{j=1}^{m+2} \beta_j B_j(z_k)\Big\} + \sum_{k=1}^{m} \tilde{w}_k B_l(z_k) + \lambda \sum_{j=1}^{m+2} \beta_j \Omega_{jl}, \quad l = 1, 2, \dots, m+2 \qquad (4.36)$$

Setting (4.36) equal to zero, we solve for the unknown parameters $\beta_1^{(n+1)}, \dots, \beta_{m+2}^{(n+1)}$ by iteratively solving the linear systems:

$$h_l(\beta_1^{(n)}, \dots, \beta_{m+2}^{(n)}) + \sum_{j=1}^{m+2} \left( \beta_j^{(n+1)} - \beta_j^{(n)} \right) \frac{\partial h_l}{\partial \beta_j}(\beta_1^{(n)}, \dots, \beta_{m+2}^{(n)}) = 0, \quad l = 1, 2, \dots, m+2 \qquad (4.37)$$

where

$$\frac{\partial h_l}{\partial \beta_j} = \sum_{k=1}^{m} \tilde{w}_k \tilde{y}_k B_j(z_k) B_l(z_k) \Big/ \exp\Big\{\sum_{j'=1}^{m+2} \beta_{j'} B_{j'}(z_k)\Big\} + \lambda \Omega_{jl} \qquad (4.38)$$

Now we simplify equation (4.37) using (4.26), (4.36) and (4.38):

$$-\sum_{k=1}^{m} \tilde{w}_k \tilde{y}_k B_l(z_k) / \gamma_k^{(n)} + \sum_{k=1}^{m} \tilde{w}_k B_l(z_k) + \lambda \sum_{j=1}^{m+2} \beta_j^{(n)} \Omega_{jl} + \sum_{j=1}^{m+2} \left( \beta_j^{(n+1)} - \beta_j^{(n)} \right) \left( \sum_{k=1}^{m} \tilde{w}_k \tilde{y}_k B_j(z_k) B_l(z_k) / \gamma_k^{(n)} + \lambda \Omega_{jl} \right) = 0, \quad l = 1, 2, \dots, m+2 \qquad (4.39)$$

Here we introduce the diagonal $m \times m$ matrix $\mathbf{W}^{(n)}$:

$$\mathbf{W}^{(n)} = \begin{pmatrix} \tilde{w}_1 \tilde{y}_1 / \gamma_1^{(n)} & 0 & \cdots & 0 \\ 0 & \tilde{w}_2 \tilde{y}_2 / \gamma_2^{(n)} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \tilde{w}_m \tilde{y}_m / \gamma_m^{(n)} \end{pmatrix}$$

As in the previous section, we also introduce the vector $\boldsymbol{\beta}^{(n)}$ with elements $\beta_j^{(n)}$ and the vector $\mathbf{y}^{(n)}$ with elements $1 - \gamma_k^{(n)} / \tilde{y}_k + \sum_{j=1}^{m+2} \beta_j^{(n)} B_j(z_k)$. With all these matrices, the system of linear equations (4.39) can be written in matrix form:

$$(\mathbf{B}' \mathbf{W}^{(n)} \mathbf{B} + \lambda \mathbf{\Omega}) \boldsymbol{\beta}^{(n+1)} = \mathbf{B}' \mathbf{W}^{(n)} \mathbf{y}^{(n)} \qquad (4.40)$$

This completes the setup of the Newton-Raphson iteration for the gamma case. After iterating (4.40) to convergence, the natural cubic spline $s(x)$ follows from equation (4.18), and the price relativities are $e^{s(z_1)}, \dots, e^{s(z_m)}$.
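The gamma iteration (4.40) differs from the frequency case only in the diagonal weight matrix and the working response. A numerical sketch follows; as before, scipy B-splines and a quadrature-based $\Omega$ are assumptions of this illustration, not the thesis implementation.

```python
import numpy as np
from scipy.interpolate import BSpline

def basis_and_penalty(z, n_grid=400):
    """Clamped cubic B-spline design matrix at the knots and quadrature-based Omega."""
    t = np.r_[[z[0]] * 4, z[1:-1], [z[-1]] * 4]
    p = len(t) - 4
    splines = [BSpline(t, np.eye(p)[j], 3) for j in range(p)]
    B = np.column_stack([s(z) for s in splines])
    grid = np.linspace(z[0], z[-1], n_grid)
    D2 = np.column_stack([s.derivative(2)(grid) for s in splines])
    qw = np.full(n_grid, grid[1] - grid[0])
    qw[0] /= 2; qw[-1] /= 2
    return B, D2.T @ (D2 * qw[:, None])

def fit_gamma_spline(z, y_tilde, w_tilde, lam, n_iter=50, tol=1e-10):
    """Penalized Newton-Raphson iteration (4.40) for the gamma case."""
    B, Omega = basis_and_penalty(z)
    beta = np.zeros(B.shape[1])
    for _ in range(n_iter):
        eta = B @ beta
        gamma = np.exp(eta)
        W = w_tilde * y_tilde / gamma                 # diagonal of W^(n), gamma case
        y_work = 1.0 - gamma / y_tilde + eta          # working response y^(n)
        beta_new = np.linalg.solve(B.T @ (W[:, None] * B) + lam * Omega,
                                   B.T @ (W * y_work))
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new, B
        beta = beta_new
    return beta, B

# Noiseless severities with a linear log-mean are reproduced exactly, since both
# the gamma deviance and the penalty can then be driven to zero simultaneously.
z = np.linspace(1.0, 2.0, 10)
y = np.exp(1.0 - 0.4 * z)
beta, B = fit_gamma_spline(z, y, np.ones_like(z), lam=1e-4)
relativities = np.exp(B @ beta)
```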

4.4 Price Relativities - Several Rating Variables

Considering all the information in the previous sections, we now turn to several rating variables. A standard algorithm for estimating the price relativities within the GAM is the backfitting algorithm, whose core idea is to reduce the estimation problem to one dimension by considering only one continuous rating variable at a time. We assume we have a number of categorical rating variables together with two continuous rating variables; the generalization to an arbitrary number of continuous rating variables is completely straightforward. Denote by $x_{i1}$ and $x_{i2}$ the values of observation $i$ for the first and second continuous rating variables, and by $z_{11}, \dots, z_{1m_1}$ and $z_{21}, \dots, z_{2m_2}$ their possible values. The linear predictor of this model is:

$$\eta_i = \sum_{j=0}^{r} x'_{ij} \beta_j + \sum_{k=1}^{m_1+2} \beta_{1k} B_{1k}(x_{i1}) + \sum_{l=1}^{m_2+2} \beta_{2l} B_{2l}(x_{i2}), \quad i = 1, 2, \dots, n \qquad (4.41)$$

We will again consider two cases: claim frequency, and claim severity.

4.4.1 Claim Frequency

In this case, we first assume that the observations of the key ratio $Y$ are Poisson distributed. The penalized deviance is then:

$$\Delta(\beta_1, \beta_2) = 2 \sum_{i=1}^{n} w_i \left( y_i \log y_i - y_i \eta_i - y_i + e^{\eta_i} \right) + \lambda_1 \sum_{j=1}^{m_1+2} \sum_{k=1}^{m_1+2} \beta_{1j} \beta_{1k} \Omega^{(1)}_{jk} + \lambda_2 \sum_{j=1}^{m_2+2} \sum_{k=1}^{m_2+2} \beta_{2j} \beta_{2k} \Omega^{(2)}_{jk} \qquad (4.42)$$

The smoothing parameters in equation (4.42) are $\lambda_1$ and $\lambda_2$, for the first and second continuous rating variable respectively. We introduce the notation:

$$v_{0i} = \exp\Big\{\sum_{j=0}^{r} x'_{ij} \beta_j\Big\} \qquad (4.43)$$

$$v_{1i} = \exp\Big\{\sum_{j=1}^{m_1+2} \beta_{1j} B_{1j}(x_{i1})\Big\} \qquad (4.44)$$

$$v_{2i} = \exp\Big\{\sum_{j=1}^{m_2+2} \beta_{2j} B_{2j}(x_{i2})\Big\} \qquad (4.45)$$

With this notation, and assuming $\eta_i = \log \mu_i$ with $\mu_i = v_{0i} v_{1i} v_{2i}$, the deviance can be rewritten as:

$$D = 2 \sum_{i=1}^{n} w_i \left( y_i \log y_i - y_i \log \mu_i - y_i + \mu_i \right) = 2 \sum_{i=1}^{n} w_i v_{1i} v_{2i} \left( \frac{y_i}{v_{1i} v_{2i}} \log \frac{y_i}{v_{1i} v_{2i}} - \frac{y_i}{v_{1i} v_{2i}} \log v_{0i} - \frac{y_i}{v_{1i} v_{2i}} + v_{0i} \right)$$

$$= 2 \sum_{i=1}^{n} w_i v_{0i} v_{2i} \left( \frac{y_i}{v_{0i} v_{2i}} \log \frac{y_i}{v_{0i} v_{2i}} - \frac{y_i}{v_{0i} v_{2i}} \log v_{1i} - \frac{y_i}{v_{0i} v_{2i}} + v_{1i} \right) = 2 \sum_{i=1}^{n} w_i v_{0i} v_{1i} \left( \frac{y_i}{v_{0i} v_{1i}} \log \frac{y_i}{v_{0i} v_{1i}} - \frac{y_i}{v_{0i} v_{1i}} \log v_{2i} - \frac{y_i}{v_{0i} v_{1i}} + v_{2i} \right) \qquad (4.46)$$

If $\beta_{11}, \dots, \beta_{1,m_1+2}$ and $\beta_{21}, \dots, \beta_{2,m_2+2}$ were known to us, estimating $\beta_0, \dots, \beta_r$ would be the same problem as having the categorical rating variables only (the regularization terms in (4.42) do not involve these parameters), with observations $y_i/(v_{1i} v_{2i})$ and weights $w_i v_{1i} v_{2i}$, as the first rewriting in (4.46) shows. Similarly, if $\beta_0, \dots, \beta_r$ and $\beta_{21}, \dots, \beta_{2,m_2+2}$ were known, the estimation of $\beta_{11}, \dots, \beta_{1,m_1+2}$ would be exactly the one-variable problem solved in Section 4.3.1, with observations $y_i/(v_{0i} v_{2i})$ and weights $w_i v_{0i} v_{2i}$; the analogous statement holds for the second continuous rating variable.

Before implementing the backfitting algorithm, some initial estimates are required. A good starting point is to derive $\hat\beta_0^{(0)}, \dots, \hat\beta_r^{(0)}$ from a model excluding the continuous rating variables, while setting all $\beta_{11}, \dots, \beta_{1,m_1+2}$ and $\beta_{21}, \dots, \beta_{2,m_2+2}$ equal to zero. This gives:

$$\hat{v}_{0i}^{(0)} = \exp\Big\{\sum_{j=0}^{r} x'_{ij} \hat\beta_j^{(0)}\Big\} \qquad (4.47)$$

$$\hat{v}_{1i}^{(0)} = \exp\Big\{\sum_{j=1}^{m_1+2} \hat\beta_{1j}^{(0)} B_{1j}(x_{i1})\Big\} = 1 \qquad (4.48)$$

$$\hat{v}_{2i}^{(0)} = \exp\Big\{\sum_{j=1}^{m_2+2} \hat\beta_{2j}^{(0)} B_{2j}(x_{i2})\Big\} = 1 \qquad (4.49)$$

With these initial estimates, we iterate the following three steps until the estimates converge (a tolerance of 1e-10 is used in this thesis):

Step 1. Use the observations $y_i / (\hat{v}_{0i}^{(q-1)} \hat{v}_{2i}^{(q-1)})$ with weights $w_i \hat{v}_{0i}^{(q-1)} \hat{v}_{2i}^{(q-1)}$ and $x_{i1}$ as the only explanatory rating variable to compute $\hat\beta_{11}^{(q)}, \dots, \hat\beta_{1,m_1+2}^{(q)}$. Based on equation (4.44), compute $\hat{v}_{1i}^{(q)} = \exp\{\sum_{j=1}^{m_1+2} \hat\beta_{1j}^{(q)} B_{1j}(x_{i1})\}$.

Step 2. Use the observations $y_i / (\hat{v}_{0i}^{(q-1)} \hat{v}_{1i}^{(q)})$ with weights $w_i \hat{v}_{0i}^{(q-1)} \hat{v}_{1i}^{(q)}$ and $x_{i2}$ as the only explanatory rating variable to compute $\hat\beta_{21}^{(q)}, \dots, \hat\beta_{2,m_2+2}^{(q)}$. Based on equation (4.45), compute $\hat{v}_{2i}^{(q)} = \exp\{\sum_{j=1}^{m_2+2} \hat\beta_{2j}^{(q)} B_{2j}(x_{i2})\}$.

Step 3. Estimate $\hat\beta_0^{(q)}, \dots, \hat\beta_r^{(q)}$ from the observations $y_i / (\hat{v}_{1i}^{(q)} \hat{v}_{2i}^{(q)})$ with weights $w_i \hat{v}_{1i}^{(q)} \hat{v}_{2i}^{(q)}$, using only the categorical rating variables. With equation (4.43), compute $\hat{v}_{0i}^{(q)} = \exp\{\sum_{j=0}^{r} x'_{ij} \hat\beta_j^{(q)}\}$.

Once the iteration has converged, we use the resulting estimates to obtain the price relativities for the categorical rating variables:

$$\gamma_j = e^{\beta_j}, \quad j = 0, 1, \dots, r \qquad (4.50)$$

The natural cubic splines for the continuous rating variables are:

$$s_1(x) = \sum_{j=1}^{m_1+2} \beta_{1j} B_{1j}(x) \qquad (4.51)$$

$$s_2(x) = \sum_{j=1}^{m_2+2} \beta_{2j} B_{2j}(x) \qquad (4.52)$$

With equations (4.51) and (4.52), the price relativities for the continuous rating variables are, as before, $e^{s_1(z_{11})}, \dots, e^{s_1(z_{1m_1})}$ and $e^{s_2(z_{21})}, \dots, e^{s_2(z_{2m_2})}$.
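The three-step loop above can be sketched in a deliberately simplified form. In this sketch, both continuous variables are replaced by categorical ones, so each "smoother" reduces to the weighted Poisson mean per level (the per-level MLE), and the penalty terms are dropped; these simplifications are assumptions of the illustration, but the structure of the backfitting iteration is the same.

```python
import numpy as np

def level_factors(y, offset, w, levels):
    """Poisson MLE of a multiplicative factor per level, given the current offset:
    f[g] = sum(w*y over level g) / sum(w*offset over level g)."""
    f = np.ones(levels.max() + 1)
    for g in np.unique(levels):
        idx = levels == g
        f[g] = np.sum(w[idx] * y[idx]) / np.sum(w[idx] * offset[idx])
    return f

def backfit(y, x1, x2, w, n_iter=100, tol=1e-10):
    """Backfitting for mu_i = v0 * v1[x1_i] * v2[x2_i] (simplified Steps 1-3)."""
    v0, v1, v2 = 1.0, np.ones(x1.max() + 1), np.ones(x2.max() + 1)
    for _ in range(n_iter):
        v1_new = level_factors(y, v0 * v2[x2], w, x1)                 # Step 1
        v2_new = level_factors(y, v0 * v1_new[x1], w, x2)             # Step 2
        v0_new = np.sum(w * y) / np.sum(w * v1_new[x1] * v2_new[x2])  # Step 3
        change = (abs(v0_new - v0) + np.max(np.abs(v1_new - v1))
                  + np.max(np.abs(v2_new - v2)))
        v0, v1, v2 = v0_new, v1_new, v2_new
        if change < tol:
            break
    return v0, v1, v2

# Noiseless multiplicative data: the iteration recovers the fitted means exactly.
x1 = np.array([0, 0, 1, 1, 0, 1])
x2 = np.array([0, 1, 0, 1, 1, 0])
mu = 0.1 * np.where(x1 == 1, 1.5, 1.0) * np.where(x2 == 1, 2.0, 1.0)
v0, v1, v2 = backfit(mu, x1, x2, np.ones(6))
fitted = v0 * v1[x1] * v2[x2]
```

Note that $v_0$, $v_1$ and $v_2$ are only identified up to scale factors, which is why base levels are fixed in practice; the fitted product is unique.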

4.4.2 Claim Severity

Instead of assuming the key ratio $Y$ is Poisson distributed, we now assume it is gamma distributed. The penalized deviance is then:

$$\Delta(\beta_1, \beta_2) = 2 \sum_{i=1}^{n} w_i \left( y_i / e^{\eta_i} - 1 - \log(y_i / e^{\eta_i}) \right) + \lambda_1 \sum_{j=1}^{m_1+2} \sum_{k=1}^{m_1+2} \beta_{1j} \beta_{1k} \Omega^{(1)}_{jk} + \lambda_2 \sum_{j=1}^{m_2+2} \sum_{k=1}^{m_2+2} \beta_{2j} \beta_{2k} \Omega^{(2)}_{jk} \qquad (4.53)$$

Let us again rewrite the deviance. With $\mu_i = e^{\eta_i} = v_{0i} v_{1i} v_{2i}$:

$$D = 2 \sum_{i=1}^{n} w_i \left( y_i / \mu_i - 1 - \log(y_i / \mu_i) \right) = 2 \sum_{i=1}^{n} w_i \left( \frac{y_i/(v_{1i} v_{2i})}{v_{0i}} - 1 - \log \frac{y_i/(v_{1i} v_{2i})}{v_{0i}} \right)$$

$$= 2 \sum_{i=1}^{n} w_i \left( \frac{y_i/(v_{0i} v_{2i})}{v_{1i}} - 1 - \log \frac{y_i/(v_{0i} v_{2i})}{v_{1i}} \right) = 2 \sum_{i=1}^{n} w_i \left( \frac{y_i/(v_{0i} v_{1i})}{v_{2i}} - 1 - \log \frac{y_i/(v_{0i} v_{1i})}{v_{2i}} \right) \qquad (4.54)$$

As before, if we know the values of $\beta_{11}, \dots, \beta_{1,m_1+2}$ and $\beta_{21}, \dots, \beta_{2,m_2+2}$, then the estimation of $\beta_0, \dots, \beta_r$ is the same as the problem with only categorical rating variables. If we know the values of $\beta_0, \dots, \beta_r$ and $\beta_{21}, \dots, \beta_{2,m_2+2}$, the estimation of $\beta_{11}, \dots, \beta_{1,m_1+2}$ is the one we solved in the previous section, with observations $y_i/(v_{0i} v_{2i})$. A similar result holds for the second continuous rating variable. We then use the same initial estimates as in the previous case and iterate until the results converge. The steps are:

Step 1. Use the observations $y_i / (\hat{v}_{0i}^{(q-1)} \hat{v}_{2i}^{(q-1)})$ with $x_{i1}$ as the only explanatory rating variable to compute $\hat\beta_{11}^{(q)}, \dots, \hat\beta_{1,m_1+2}^{(q)}$. Based on equation (4.44), compute $\hat{v}_{1i}^{(q)} = \exp\{\sum_{j=1}^{m_1+2} \hat\beta_{1j}^{(q)} B_{1j}(x_{i1})\}$.

Step 2. Use the observations $y_i / (\hat{v}_{0i}^{(q-1)} \hat{v}_{1i}^{(q)})$ with $x_{i2}$ as the only explanatory rating variable to compute $\hat\beta_{21}^{(q)}, \dots, \hat\beta_{2,m_2+2}^{(q)}$. Based on equation (4.45), compute $\hat{v}_{2i}^{(q)} = \exp\{\sum_{j=1}^{m_2+2} \hat\beta_{2j}^{(q)} B_{2j}(x_{i2})\}$.

Step 3. Estimate $\hat\beta_0^{(q)}, \dots, \hat\beta_r^{(q)}$ from the observations $y_i / (\hat{v}_{1i}^{(q)} \hat{v}_{2i}^{(q)})$, using only the categorical rating variables. With equation (4.43), compute $\hat{v}_{0i}^{(q)} = \exp\{\sum_{j=0}^{r} x'_{ij} \hat\beta_j^{(q)}\}$.


After the estimation results have converged, we can use the fact:

$$\gamma_j = e^{\beta_j}, \quad j = 0, 1, \dots, r \qquad (4.55)$$

to calculate the price relativities for the categorical rating variables. The natural cubic splines for the continuous rating variables are:

$$s_1(x) = \sum_{j=1}^{m_1+2} \beta_{1j} B_{1j}(x) \qquad (4.56)$$

$$s_2(x) = \sum_{j=1}^{m_2+2} \beta_{2j} B_{2j}(x) \qquad (4.57)$$

Finally, exponentiating these functions gives the price relativities for the continuous rating variables: $e^{s_1(z_{11})}, \dots, e^{s_1(z_{1m_1})}$ and $e^{s_2(z_{21})}, \dots, e^{s_2(z_{2m_2})}$.

Chapter 5

Cross Validation Method of Choosing the Smoothing Parameter

This chapter focuses on how to choose the smoothing parameter. As mentioned at the beginning of Chapter 4, the smoothing parameter generates a trade-off between a good fit to the data and low variability of the function $f$. If the value of $\lambda$ is small, more weight is placed on the data and $f$ is allowed to vary freely. Conversely, a large $\lambda$ decreases the weight put on the data, and the integrated squared second derivative of $f$ will be small.

There are many methods we can use to find the optimal smoothing parameter, but in this case we will discuss the cross validation method only. Other methods like the L-curve will not be addressed in this study.

The cross validation method is a way of measuring the predictive performance of a statistical model. While there are a number of variations on the method, we will use leave-one-out cross validation because of its simplicity, and we will consider only a single continuous rating variable. Because the calculations for the Poisson case and the gamma case are very similar, we consider the Poisson case only. The fitted natural cubic spline $s$ minimizes the function below:

$$\Delta(s(x)) = 2 \sum_{k=1}^{m} \tilde{w}_k \left( \tilde{y}_k \log \tilde{y}_k - \tilde{y}_k s(z_k) - \tilde{y}_k + e^{s(z_k)} \right) + \lambda \int_{z_1}^{z_m} (s''(x))^2 dx \qquad (5.1)$$

If we delete one particular $z_k$ and the corresponding $\tilde{y}_k$ from the data set, then for any $\lambda$ we can find the natural cubic spline $s_k^\lambda(x)$ minimizing (5.1) for the reduced data set. A good $\lambda$ should make $s_k^\lambda(z_k)$ a good predictor of the removed data point $\tilde{y}_k$, and this must hold for every $k$. The cross validation score measures the overall ability to predict all the removed $\tilde{y}_k$'s, and is defined as:

$$C(\lambda) = 2 \sum_{k=1}^{m} \tilde{w}_k \left( \tilde{y}_k \log \tilde{y}_k - \tilde{y}_k s_k^\lambda(z_k) - \tilde{y}_k + \exp\{s_k^\lambda(z_k)\} \right) \qquad (5.2)$$

A smaller value of $C(\lambda)$ indicates a better ability to predict the deleted data points. The core concept of cross validation is to select the $\lambda$ which minimizes (5.2). Computing $C(\lambda)$ requires finding all the minimizing natural cubic splines $s_1^\lambda, \dots, s_m^\lambda$, which can be time-consuming; more efficient methods have been developed in the literature, but they will not be covered in this thesis.
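A brute-force sketch of the leave-one-out score (5.2): as in Section 4.3.1, each fit is a penalized Poisson Newton iteration, with scipy B-splines and a quadrature-based $\Omega$ as assumptions of this illustration. For every candidate $\lambda$ and every $k$, the spline is refitted without $(z_k, \tilde{y}_k)$ and then used to predict the deleted point.

```python
import numpy as np
from scipy.interpolate import BSpline

def basis_and_penalty(z_knots, x):
    """Design matrix of clamped cubic B-splines at x, and quadrature-based Omega."""
    t = np.r_[[z_knots[0]] * 4, z_knots[1:-1], [z_knots[-1]] * 4]
    p = len(t) - 4
    splines = [BSpline(t, np.eye(p)[j], 3) for j in range(p)]
    B = np.column_stack([s(x) for s in splines])
    grid = np.linspace(z_knots[0], z_knots[-1], 300)
    D2 = np.column_stack([s.derivative(2)(grid) for s in splines])
    qw = np.full(len(grid), grid[1] - grid[0])
    qw[0] /= 2; qw[-1] /= 2
    return B, D2.T @ (D2 * qw[:, None])

def fit_beta(B, Omega, y, w, lam, n_iter=50):
    """Penalized Poisson Newton iteration, as in (4.29)."""
    beta = np.zeros(B.shape[1])
    for _ in range(n_iter):
        gamma = np.exp(B @ beta)
        W = w * gamma
        y_work = y / gamma - 1.0 + B @ beta
        beta = np.linalg.solve(B.T @ (W[:, None] * B) + lam * Omega,
                               B.T @ (W * y_work))
    return beta

def cv_score(z, y, w, lam):
    """Leave-one-out score C(lambda) of (5.2); the knot set is kept fixed."""
    B_all, Omega = basis_and_penalty(z, z)
    score = 0.0
    for k in range(len(z)):
        keep = np.arange(len(z)) != k
        beta = fit_beta(B_all[keep], Omega, y[keep], w[keep], lam)
        pred = B_all[k] @ beta                    # s_k^lambda(z_k)
        score += 2 * w[k] * (y[k] * np.log(y[k]) - y[k] * pred
                             - y[k] + np.exp(pred))
    return score

z = np.linspace(1.0, 2.0, 8)
y = np.exp(0.5 + 0.3 * z)
w = np.ones_like(z)
lams = [1e-4, 1e-2, 1.0]
scores = [cv_score(z, y, w, lam) for lam in lams]
best = lams[int(np.argmin(scores))]
```

Each term of the score is a unit Poisson deviance contribution, so $C(\lambda)$ is non-negative; in real applications the candidate grid of $\lambda$ values would of course be much finer.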


Chapter 6

Results

In this chapter, we will use some public insurance data to calculate and compare the results of different models, using the R software. The data was published as a package for R.

6.1 Data

Using the R package 'insuranceData' (see Wood (2006)), we load the data set named 'dataCar', which is based on one-year vehicle insurance policies taken out in 2004 or 2005. There are 67856 policies, and the data was used by Jong and Heller (2008). The data frame consists of 67856 observations on the following 11 variables:

1. veh_value: vehicle value, in 10,000 dollars
2. exposure: exposure, between 0 and 1
3. clm: occurrence of claim (0 = no, 1 = yes)
4. numclaims: number of claims
5. claimcst0: claim amount (0 if no claim)
6. veh_body: vehicle body, coded as BUS CONVT COUPE HBACK HDTOP MCARA MIBUS PANVN RDSTR SEDAN STNWG TRUCK UTE
7. veh_age: vehicle age, 1 (youngest), 2, 3, 4
8. gender: a factor with levels F M
9. area: a factor with levels A B C D E F
10. agecat: driver age category, 1 (youngest), 2, 3, 4, 5, 6
11. X_OBSTAT_: a factor with levels 01101 0 0 0

We will use the variables veh_value, clm, veh_body, veh_age, gender, area and agecat from the data set to construct our GLM-logistic and GAM-logistic testing models and analyze the results.

Except for the continuous explanatory variable veh_value, we set base levels for the other variables: agecat = 1, gender = Female, area = C, veh_age = 1, veh_body = SEDAN.

We assume there is a probability $\pi$ that a claim occurs (clm = 1).

6.2 Testing Models and Results

The GLM will be constructed as:

$$\text{logit}\,\pi = \beta_0 + \beta_1 \text{agecat} + \beta_2 \text{area} + \beta_3 \text{veh\_body} + \beta_4 \text{gender} + \beta_5 \text{veh\_age} + \beta_6 \text{veh\_value} + \beta_7 (\text{veh\_value})^2 \qquad (6.1)$$

and the GAM as:

$$\text{logit}\,\pi = \beta_0 + \beta_1 \text{agecat} + \beta_2 \text{area} + \beta_3 \text{veh\_body} + \beta_4 \text{gender} + \beta_5 \text{veh\_age} + s(\text{veh\_value}) \qquad (6.2)$$

where $\beta_0$ is the intercept and the other $\beta_i$'s are the coefficients of the explanatory variables. In equation (6.2), $s(\cdot)$ is the smoothing spline function of the continuous rating variable veh_value (Segurado et al. (2006)).

We will use the function 'glm' from base R and the function 'gam' from the package 'mgcv' to analyze 6 different models: 3 GLMs and 3 GAMs.

Model 1: uses agecat, area, veh_body and veh_value as explanatory variables.
Model 2: uses agecat, area, veh_body, veh_value and veh_age as explanatory variables.
Model 3: uses agecat, area, veh_body, veh_value, veh_age and gender as explanatory variables.
Model 4: we assume the variables agecat, area and veh_body have linear effects and the variable veh_value has a non-linear effect.
Model 5: we assume the variables agecat, area, veh_body and veh_age have linear effects and the variable veh_value has a non-linear effect.
Model 6: we assume the variables agecat, area, veh_body, veh_age and gender have linear effects and the variable veh_value has a non-linear effect.

The estimation results of the GAM-logistic model are presented in Table 6.1 (the signs ***, ** and * indicate significance at the 1%, 5% and 10% levels respectively). The results show that the variable gender has an insignificant effect on clm, so we should delete this variable.

6.2.1 AIC

To determine which model is best, we will use the Akaike information criterion (AIC). The AIC examines the complexity of the model together with the goodness of its fit to the sample data, and produces a measure which balances the two; the better model has the lower AIC value. We can write the AIC

Table 6.1: Estimation results of GAM-logistic model

Variable     β̂            Variable        β̂          Variable       β̂
β0           -2.6889***    veh_bodyBUS     1.0847**    veh_bodyTRUCK  -0.1035
agecat1      0.2293***     veh_bodyCONVT   -0.6367     veh_bodyUTE    -0.2933***
agecat2      0.0245        veh_bodyCOUPE   0.2409      areaA          -0.0391
agecat3      0.0000        veh_bodyHBACK   0.0348      areaB          0.0605
agecat4      -0.0288       veh_bodyHDTOP   0.0766      areaC          0
agecat5      -0.2201***    veh_bodyMCARA   0.4319      areaD          -0.1281*
agecat6      -0.2207***    veh_bodyMIBUS   -0.1892     areaE          -0.0582
veh_age1     0.0000        veh_bodyPANVN   0.1789      areaF          0.0463
veh_age2     0.1664***     veh_bodyRDSTR   -0.0698     s(veh_value)   ***
veh_age3     0.1450**      veh_bodySEDAN   0
veh_age4     0.1984**      veh_bodySTNWG   -0.0884

in terms of $k$ (the number of parameters) and $l$ (the log-likelihood):

$$\text{AIC} = -2l + 2k \qquad (6.3)$$

The resulting AICs are presented in Table 6.2.

Table 6.2: Results of AICs

                     Model 1    Model 2    Model 3    Model 4    Model 5    Model 6
Degrees of freedom   25         28         29         25.9781    29.2036    30.2083
AIC                  33642.56   33634.54   33636.32   33641.79   33633.15   33634.91

We can see that Model 5 has the lowest AIC value, so its estimation results are expected to be more accurate than those of the other models.
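Formula (6.3) is straightforward to compute once the log-likelihood is known. The sketch below, with made-up log-likelihood values (purely illustrative, not the thesis results), shows how AIC can prefer a smaller model even when the richer one has a slightly higher log-likelihood.

```python
def aic(loglik, k):
    """Akaike information criterion, AIC = -2l + 2k as in (6.3)."""
    return -2.0 * loglik + 2.0 * k

# Hypothetical fits: the richer model gains little log-likelihood
# relative to its extra parameters, so the smaller model wins.
models = {"small": (-16789.3, 25), "large": (-16788.2, 29)}
best = min(models, key=lambda m: aic(*models[m]))
print(best)   # prints: small
```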

6.2.2 Explained Deviance

Another way of comparing the models is to analyze the explained deviances. The explained deviances of the models are 0.52%, 0.56%, 0.56%, 0.52%, 0.57% and 0.57% respectively. Model 5 and Model 6 have the highest explained deviances, which means their results are more accurate than those of the other models.

6.2.3 Deviance

Another method of comparing the models is to analyze the deviance using the function anova in R, which presents the residual deviances of the models together with $\chi^2$ tests on the differences in degrees of freedom. Table 6.3 presents these results; the presence of '**' means the null hypothesis is rejected at the 1% significance level.

Table 6.3: Analysis of the Models' Deviances

          Residual df.   Residual deviance   Diff. df.   Diff. deviance   Pr(> χ²)
Model 1   67831          33593               -           -                -
Model 2   67828          33579               3           14.024           0.0029**
Model 3   67827          33578               1           0.218            0.6406
Model 4   67830          33590               -3.0219     -11.521          0.0094**
Model 5   67827          33575               3.2255      15.099           0.0022**
Model 6   67826          33574               1.0047      0.251            0.6183

From the results in Table 6.3, we can see that Model 2 is better than Model 1 and Model 3, Model 4 is better than Model 3, and Model 5 is better than Model 4 and Model 6.

6.2.4 GAM UBRE scores

To select the best GAM model, we can take a look at their UBRE scores. A lower UBRE score means the estimation results are more accurate. The UBRE scores for Models 4, 5 and 6 are -0.5042, -0.5044 and -0.5043 respectively, so Model 5 is the best.

6.3 Results

Based on these comparison results, Model 5 is the best of the 6 models. We can apply the estimation results from Table 6.1 to equation (6.2) to get the final GAM-logistic regression function:

$$\ln \frac{\hat\pi}{1 - \hat\pi} = -2.6689 + 0.2293 x_1 + \dots + 0.1450 x_{24} + 0.1984 x_{25} + s(x_{26}) \qquad (6.4)$$

Using the function 'predict.gam' in R, we obtain 67856 linear predictions (the values of $\text{logit}\,\hat\pi$). The first 5 values are: -2.5948, -2.6730, -2.5957, -2.4282 and -2.6251 (Lee and Carter (1992)). We can then apply the function:

$$\hat\pi = \frac{\exp(\text{logit}\,\hat\pi)}{1 + \exp(\text{logit}\,\hat\pi)} \qquad (6.5)$$

The corresponding values of $\hat\pi$ are: 0.0695, 0.0646, 0.0694, 0.0810 and 0.0675. Summary statistics of all prediction results are presented in Table 6.4:

Table 6.4: Summary statistics of the prediction results

Minimum   25% quantile   Median   Mean   75% quantile   Maximum

Chapter 7

Summary and Conclusion

Since the 1990s, when British actuaries introduced GLMs into the pricing of non-life insurance, GLMs have become the most widely used models in many insurance companies. However, due to the complexity of insurance data, GLMs have some disadvantages which lead to challenges in practice. In this thesis, we discussed GAMs alongside GLMs in order to provide a new perspective on non-life insurance pricing.

We discussed both GLMs and GAMs in previous chapters and can draw some conclusions regarding the use of both models to predict claim frequency for car insurance:

From the results presented in Chapter 6, we can see that:

1. With either GAMs or GLMs, the predictions of claim frequency are affected by variables such as car value, driver age, area, car type, and car age. This is in line with many previous, related studies, although the variables' impact on the predictions differs by country, area, or other characteristics.

2. We compared 6 different models using AIC, deviance, and the UBRE score. The best model was determined to be Model 5, so the conclusion in this case is that the semi-parametric GAM has advantages over the GLM: its prediction results are more reliable and accurate.

For GAMs there are many methods for choosing the optimal smoothing parameter beyond cross validation. In future work, we would like to test the L-curve method alongside the cross validation method to see whether this has any impact on the estimation results of the GAM.

References

A. Guisan, T. C. Edwards Jr, and T. Hastie. Generalized linear and generalized additive models in studies of species distributions: setting the scene. Ecological Modelling, 157(2-3):89-100, 2002.

T. Hastie and R. Tibshirani. Generalized additive models. Statistical Science, 1(3):297-310, 1986.

P. de Jong and G. Z. Heller. Generalized Linear Models for Insurance Data. Cambridge University Press, 2008.

R. Kaas, M. Goovaerts, J. Dhaene, and M. Denuit. Modern Actuarial Risk Theory: Using R. Springer, Heidelberg, second edition, 2008.

R. Lee and L. Carter. Modeling and forecasting US mortality. Journal of the American Statistical Association, 87(419):659-671, 1992.

P. McCullagh and J. A. Nelder. Generalized Linear Models, 2nd ed. Chapman & Hall, London, 1989.

E. Ohlsson and B. Johansson. Non-Life Insurance Pricing with Generalized Linear Models. Springer, 2010.

P. Segurado, M. B. Araújo, and W. E. Kunin. Consequences of spatial autocorrelation for niche-based models. Journal of Applied Ecology, 43(3):433-444, 2006.

S. N. Wood. Generalized Additive Models: An Introduction with R. Chapman & Hall/CRC, 2006.
