
Pricing insurance products in the presence of multi-level factors

Aart Valkhof

Master’s Thesis to obtain the degree in Actuarial Science and Mathematical Finance
University of Amsterdam
Faculty of Economics and Business
Amsterdam School of Economics

Author: Aart Valkhof
Student nr: 0182737
Email: valkhof@hotmail.com
Date: May 22, 2016
Version: 1.0.0
Supervisor: Dr. K. Antonio
Second reader: Dr. S.U. Can


Abstract

This thesis discusses the pricing of multi-level factors for non-life insurance products. We explore several pricing techniques: credibility, GLM and mixed models. We question whether the latter have additional value for the pricing. We investigate four different implementations of mixed models: the backfitting algorithm, the Laplace approximation, the penalized quasi-likelihood method and the Gauss-Hermite quadrature method. The backfitting algorithm requires a custom implementation and has no statistical framework, but provides valuable credibility factors. The other three techniques are offered by standard software. They come with a powerful statistical framework, but lack credibility factors. We apply the techniques to a commercial general liability portfolio. The data that we use for categorizing the company activities contain a nested structure. We conclude that using this structure in the models improves the pricing of the multi-level factor.

Keywords Insurance, Actuarial, Liability, Pricing, GLM, Credibility, Backfitting algorithm, GLMM, GLMC, Hierarchical models, Multi-level factor, MLF


Contents

1 Introduction
  1.1 Aim of thesis
  1.2 Outline of thesis

2 Commercial general liability
  2.1 Introduction
  2.2 Coverage
  2.3 Product characteristics
  2.4 Standard Industrial Classification

3 Data
  3.1 Descriptive statistics
  3.2 Data fields
    3.2.1 Claim year
    3.2.2 Business sector
    3.2.3 Standard Industrial Classification
    3.2.4 Revenue
    3.2.5 Limit

4 Theoretical framework
  4.1 Basic concepts of tariff analysis
  4.2 Data structure
  4.3 Fixed versus random effects
  4.4 Multi-level factors (MLF)
  4.5 Generalized linear models
  4.6 Actuarial models for claim frequencies
  4.7 Credibility theory
  4.8 GLMs with random effects
    4.8.1 Backfitting algorithm
    4.8.2 Generalized linear mixed models

5 Modeling of the MLF
  5.1 Actuarial modeling of claim frequencies
  5.2 Generalized linear model
    5.2.1 Complete pooling
    5.2.2 Semi-complete pooling
    5.2.3 No pooling
  5.3 Non-hierarchical models
    5.3.1 Backfitting algorithm
    5.3.2 Adaptive Gauss-Hermite quadrature
    5.3.3 A comparison of the methods
  5.4 Hierarchical models
    5.4.1 Backfitting algorithm
    5.4.2 Laplace approximation
    5.4.3 Penalized quasi likelihood
    5.4.4 A comparison of the methods
    5.4.5 Non-hierarchical versus hierarchical models

6 Conclusions

References

A Tables
  A.1 Revenue classes
  A.2 Hospitality SIC codes
  A.3 GLM results for basic model
  A.4 Factors non-hierarchical models
  A.5 Factors hierarchical models

B R-code
  B.1 Data samples


Chapter 1

Introduction

The task of the pricing actuary is to develop a premium based on factors that represent the risk of the policy. The actuary takes into account two types of rating factors (Mano and Rasa (2006)).

1. Continuous factors like driver’s age or the reinstatement value of a building;

2. Factors that are categorical without an inherent ordering, such as type of fuel, occupation and economic activity.

The latter can take on a certain number of values, which are referred to as the levels of a factor. Pricing is straightforward when the number of levels of the categorical factor is small, as is the case with type of fuel. When the number of levels is large and there is no logical way to order them, the pricing becomes problematic. Ohlsson and Johansson (2004) refer to rating factors with a large number of levels as multi-level factors (MLFs). We want to include this type of information in the price list - besides the non-MLFs, also called ordinary rating factors - even though the factor consists of many levels that do not have a sufficient amount of data. The model of a car is an example of a MLF. It is an important rating factor in motor insurance. There are several thousand car model classes. Some represent popular cars with sufficient data available, some have sparse data. Another example is geographical zones. Densely populated zones within cities have enough data; zones in rural territories have less data.

It is common to use generalized linear models (GLMs) to estimate non-MLF rating factors. McCullagh and Nelder (1989) started the rise of the GLM as the most important statistical technique for the non-life pricing actuary. Many textbooks give a general treatment of GLMs, see Kaas et al. (2008), Ohlsson and Johansson (2010) and Frees et al. (2014). GLMs are very suitable to estimate claim frequencies and claim severities in the presence of risk factors. Firstly, they enable the actuary to use a distribution from the exponential family. Secondly, GLMs offer opportunities for multiplicative rather than additive models. In this way relative increments are obtained, which is desirable for pricing. Ohlsson and Johansson (2004) and Mano and Rasa (2006) illustrate the problems with GLM pricing of a MLF. In GLMs the effects are fixed, which makes the GLM only suitable for categorical covariates with a limited set of levels, like type of fuel and age class. A MLF will have levels where the data is too sparse to estimate a fixed regression parameter. Estimating parameters for these sparse levels will lead to strange and unreliable results. Another way to deal with a MLF is to cluster the levels into groups with enough data points. Often this clustering is done in an ad hoc way, or requires a lot of time and manual intervention to execute many hypothesis tests. The actuary ends up merging levels to get more reliable estimates at the price of a less detailed tariff.

Credibility theory is a very old pricing technique. Credibility aims to price individual insurance contracts taking into account the whole portfolio as well. It is a trade-off between an insured's own loss experience and the experience of the whole portfolio. The expression credibility alludes to the weight given to the experience of the individual. If this individual experience is credible, then the individual experience will determine the premium rate. Vice versa, if the individual experience is not credible, then the collective experience will determine the premium. These are two extreme positions. In practice a compromise between the extreme positions determines the premium. We distinguish two different types of credibility: limited fluctuations credibility and greatest accuracy credibility. Limited fluctuations credibility dates from the beginning of the twentieth century. Although this theory provides simple solutions, it suffers from a lack of theoretical justification (Denuit et al. (2007)). In this thesis we will focus on the second type, introduced by Bühlmann (1967) in the 1960s. See Bühlmann and Gisler (2005) for an extensive exposition on credibility. Credibility is a pricing technique that can solve the rating for MLFs, but it does not have the advanced statistical framework that GLMs have. Ohlsson and Johansson (2004) propose to use a combination of generalized linear models and credibility in order to treat the situation where we have MLFs besides non-MLFs.

The idea of combining GLM and credibility (GLMC) was introduced in Nelder and Verrall (1997). They showed how credibility theory can be a building block for a hierarchical generalized linear model (HGLM). Ohlsson (2008) presents the GLMC model assumptions and estimators for MLFs. The article also provides several examples, such as the MLFs car brand and car model for a motor insurance product. This is an example of a nested data structure, where car model is hierarchically ordered under car brand. Mano and Rasa (2006) work out an application of a GLMC model with spatial data as a MLF. Frees et al. (2014), chapter 16, discuss the pricing of MLFs with generalized linear mixed models (GLMMs). The GLMM is an extension of the GLM. A GLMM can treat the situation where there are both fixed and random effects. The theory of GLMMs is covered in Breslow and Clayton (1993). The non-MLFs are seen as fixed effects and are modeled by the GLM part. These explanatory variables are fixed, but unknown. The MLFs are seen as random effects. These are random variables that capture the effects that are not observable in the other explanatory variables. By using GLMC and GLMM the advantages of GLM and credibility can contribute to the solution of our problem with the pricing of MLFs. The mathematical calculations of GLMMs are complex. Frees et al. (2014), chapter 16, tackle three methods to estimate the regression parameters and the random effects of a GLMM: approximating the integrand with the Laplace method; approximating the data; and approximating the integral through numerical integration. Including the approach of Ohlsson, this gives us four methods to determine the estimators of a MLF. In these methods we distinguish between non-hierarchical and hierarchical models, to investigate the added value of the nested structure of the data.

1.1 Aim of thesis

The aim of this master’s thesis is to compare the estimates obtained by GLMC to the estimates obtained by the three mentioned estimation methods for GLMMs. We do this by making use of a portfolio of a commercial-line insurance product, namely commercial general liability (CGL). The data of the portfolio is made available by a Dutch insurer. Currently the insurer constructs the tariff of the product with a GLM that contains only non-MLFs. With the help of expert judgment this tariff is transformed into a tariff for the MLF levels. This transformation costs a lot of time and the expert judgment is not without subjectivity. We will discuss the current model and investigate whether mixed models lead to improvements. We will construct a GLM with the MLF as a fixed effect to demonstrate the problem of pricing the MLF. Subsequently we will propose GLMC and GLMM models as an improved alternative. Finally we ask ourselves whether the nested structure of the data could contribute to an improved tariff.


1.2 Outline of thesis

In chapter 2 we explain the product of commercial general liability (CGL). CGL is a product that is not commonly used for research questions, because of the shortage of data. Motor products are more likely to appear in actuarial literature. However, the lack of data is exactly the reason why it is interesting to use a CGL portfolio for the pricing of a MLF. This chapter also introduces the MLF that we will focus on: the economic activity of an enterprise, indicated by a code. The data of the portfolio is described in chapter 3. Chapter 4 contains the theoretical framework. We define some elementary terms and elaborate on different basic concepts. Furthermore we discuss the main theory of GLM, credibility and GLMM. We present the results of the analysis in chapter 5 and compare the outcomes of the different estimation techniques. We present our conclusions in chapter 6.


Chapter 2

Commercial general liability

This chapter describes the commercial general liability (CGL) insurance product. For the literature related to this chapter we refer to Kealy (2015). After the introduction in section 2.1 we enumerate the main coverages in section 2.2. Every coverage is illustrated with an example. We continue in section 2.3 with exploring some special characteristics of the product, like the occurrence limit and long settlement periods. Finally, section 2.4 explains the role of the economic activity as an important MLF in the price list of this product.

2.1 Introduction

General liability is an insurance product that protects the insured from suffering major financial losses if the insured entity is held legally responsible for bodily injury or damage. For instance, if a customer enters a shop and slips due to a wet floor, then the shop owner is held liable for the customer's damage. General liability can involve private and commercial entities. In case of a private entity we talk about private liability; in case of a commercial entity we talk about commercial liability. This thesis is about commercial general liability. Wherever this thesis states liability, it means commercial general liability.

2.2 Coverage

The product consists of three main coverages as expressed in the list below. Every coverage is clarified with an example.

1. Third-party liability. The enterprise is protected against general liability claims from third parties. Section 2.1 gives an example.

2. Employer’s liability. The enterprise is protected against financial losses in case of a claim from an employee. For example, a construction worker falls from a scaffold. This is a tragic event, especially because the safety net was not in place. The employer of the injured worker is held liable for the cost of the injuries.

3. Product liability. The enterprise is protected against claims that are caused by a delivered product or service. For example, a plastic packaging supplier sells one-liter containers to a mouthwash manufacturer. The containers have special caps with child safety locks. The customers of the mouthwash manufacturer complain because the mouthwash containers have defective child safety locks. The mouthwash manufacturer is forced to recall all containers and to compensate its customers. The mouthwash manufacturer holds the packaging supplier responsible for the incorrect product and claims the cost of the recall activities and of the compensation.


In practice there are far more coverages, but this goes beyond the scope of this thesis.

2.3 Product characteristics

Property and motor hull insurance products have a known maximum loss beforehand: the reinstatement value of a building or the catalog value of a vehicle. With general liability, however, there is no intrinsic maximum loss. The loss depends on the claim events and the maximum size of the claim is not known a priori. In order to limit the risk of the insurer and to cap the loss, every policy is subject to a predefined occurrence limit and to an annual aggregate limit. The occurrence limit is the maximum reimbursement that the insurance policy will pay for a single event. The annual aggregate limit is the most that an insurance policy will pay regardless of the number of events in a year. It is common to define a limit with round numbers like 0.5, 1.0, 1.25 or 2.5 million. In general the annual aggregate limit is twice the occurrence limit. If the claim is higher than the limit, then the insurer is not liable for the excess.

General liability is characterized by long settlement periods and long reporting delays. Even years after the policy year has ended, a claim can be reported. In contrast to property insurance, where a claim is typically observed together with the occurrence of the insured event, the insurer can face the claim long after the event actually happened. A well-known example is asbestosis claims, see Ratner (1983). It can take decades before the bodily injury of a worker who was exposed to asbestos manifests itself. Settlements in court can drag on for years and the bodily injury can worsen over the years. Such claims go together with a large claim size and it can take years before the exact claim size is known. This makes it difficult to determine the ultimate claim cost and number of claims. Kaas et al. (2008), chapter 10, explain the modeling of the ultimate claim size and claim cost by using reserving techniques, but this goes beyond the scope of this thesis.

Table 2.1 demonstrates the claim handling of CGL by means of an example. It shows the information of the policyholder and the claim information, like claim cause and financial settlement. Although the claim cost is not that large it takes more than a year to settle the claim.

Information of policyholder
Economic activity (SIC code)   Wholesale of agricultural machinery, equipment and tractors (4661)
Business sector                Wholesale
Revenue                        7,786,000
Per occurrence limit           2,500,000
Annual aggregate limit         5,000,000

Claim information
Cause        Improper adjustment of a cattle concentrate feed making machine, which disrupts the milk production. The concentrate is destroyed.
Claim date   15-04-2013

Date         Remark            Payment     Reserve     Incurred
15-05-2013   Claim reported                 5,000.00    5,000.00
09-12-2013   Payment           5,806.18       806.18    5,806.18
23-04-2014   Payment           6,125.36     6,125.36   11,931.54
29-04-2014   Closed                                    11,931.54

Table 2.1: Example of the claim handling for a CGL policy.


2.4 Standard Industrial Classification

The economic activity of an enterprise is an important risk driver in several non-life products for commercial lines. The Standard Industrial Classification of Economic Activities (SIC) - or Standaard Bedrijfsindeling (SBI) in Dutch - classifies enterprises by the type of economic activity in which they are engaged. The list of classifications consists of approximately 1200 levels and is maintained by Statistics Netherlands. For more information on SIC codes and a complete list of the SIC codes we refer to Statistics Netherlands (2015).

The importance of the economic activity for the CGL product is illustrated with the following examples.

• Illustration 1: shops selling kitchens have more third-party risk than shops selling books. Workers will install the kitchens at the customer’s (third-party) house and the customer’s property can be damaged. This increases the risk of a claim. For example, the plumbing system may start leaking because of the activity of the workers. This damages the customer’s floor. The kitchen shop is responsible for the damage.

• Illustration 2: an enterprise that has an activity in the construction business has more employer’s liability risk than an enterprise with an administrative activity. The probability of an accident is higher at a construction site than at an office.

• Illustration 3: shops selling medical goods have more product liability risk than shops selling vegetables. In case there is something wrong with the medicine, the probability of bodily injury is higher.

The SIC code has a hierarchical structure. At the top of the structure there is the business sector, like Retail, Wholesale, Construction, Industry, etc. Below this top there are four nested layers. The top layer is indicated by two digits; the deepest layer is indicated by five digits. Table 2.2 shows a part of the SIC code listing (Statistics Netherlands (2015)). The example displays a part of the business sector Retail. SIC codes that are indicated with an arrow have a deeper layer. SIC codes that are indicated with a diamond are the deepest or end layer. The example shows that the deepest layer does not have to be a five-digit SIC code.

SIC        Description
→ 47       Retail trade
→ 472      Specialized shops selling food and beverages
◆ 4721     Shops selling potatoes, fruit and vegetables
→ 4722     Shops selling meat and meat products, game and poultry
◆ 47221    Shops selling meat and meat products
◆ 47222    Shops selling game and poultry
◆ 4723     Shops selling fish
→ 4724     Shops selling bread, pastry, chocolate and sugar confectionery
◆ 47241    Shops selling bread and pastry
◆ 47242    Shops selling chocolate and sugar confectionery

Table 2.2: Structure example of the SIC coding in the retail sector.

In 2008 Statistics Netherlands reformed the structure of the SIC codes. The reform mainly involved the transformation from a national coding to a more international coding. Nowadays the SIC codes are almost one hundred percent in line with the international standard of the European Union. One of the reforms eliminated the codes with six digits. See Eurostat (2015) for information on the European coding.

Chapter 3

Data

The data used in this thesis is from a general liability insurance product of a commercial portfolio from a Dutch insurance company. The data covers the years 2012-2015, but for 2015 the data only covers the first three quarters of the year. The targeted customers are independent contractors and small and medium-sized enterprises (SMEs)¹. Besides the main risk drivers of the premium - revenue, business sector, occurrence limit and economic activity - the year of occurrence of the claim is available. The available claim statistics are the claim amount and the number of claims. We will analyze these statistics per claim year. The data set does not distinguish between the three main coverages as described in section 2.2.

3.1 Descriptive statistics

In this section we describe the explanatory variables. Table 3.1 summarizes all covariates of the data set. Then we discuss every variable briefly in section 3.2.

Continuous covariates
Field name      Description                    Min/Max        Mean      Std.dev.
Gross Revenue   Yearly revenue of enterprise   1/64,000,000   466,771   1.340×10^6

Categorical covariates
Field name        Description                         Levels                               Mode
Claim year        Year in which claim occurs          2012, ..., 2015                      2014
Business sector   Clustering of economic activities   Wholesale, Retail, Construction,     Hospitality
                                                      Garage, Hospitality, Manufacturing
SIC               Code for economic activity          349 levels                           56101
Limit             The maximum loss per event          1.25/2.50 mio.                       2.50 mio.

Table 3.1: Description of covariates in the data set.

¹ According to the definition of the European Union (2003), SMEs have an annual revenue of less than 50 million euro.


3.2 Data fields

In this section we discuss in detail the covariates, or risk factors, present in the data set. In the tables we use the exposure as the sum of the durations per policy in years. N is the sum of the number of reported claims. The frequency is N divided by the exposure.

3.2.1 Claim year

The claim year is the year in which the claim occurred. The data set contains four consecutive years of data: 2012, 2013, 2014 and 2015. Table 3.2 shows the frequency per claim year. The frequency drops in 2015. This is not because 2015 is a year with fewer claims, but a consequence of the delay in claim reporting that comes with a liability product, as discussed in section 2.3. We can conclude that the claim statistics for the year 2015 are incomplete. Therefore the claim year 2015 will be excluded from all analyses in this thesis.

Claim year   Exposure     N   Frequency
2012            2,685   105       0.039
2013            3,668   150       0.041
2014            6,898   332       0.048
2015            6,377   230       0.036

Table 3.2: Claim statistics per claim year.

3.2.2 Business sector

The economic activities are clustered into business sectors. The data set contains six business sectors. More sectors exist, but they are not present for two reasons. Firstly, several sectors are unacceptable risks because of their high risk. Secondly, sectors with negligible exposure are excluded from the data. The business sector has a one-to-one relation with the economic activity. This one-to-one relation is defined by Statistics Netherlands. Table 3.3 states the exposure and claim statistics per cluster. We see that the business sector Construction has the highest frequency. This corresponds with the risk profile of this sector. Due to the heavy labor on construction sites more accidents happen compared to other sectors.

Business sector   Exposure     N   Frequency
Construction         3,541   303       0.086
Garage                 192     5       0.026
Hospitality          4,191   122       0.029
Manufacturing          722    36       0.050
Retail               3,335    73       0.022
Wholesale            1,271    48       0.038
Total               13,252   587       0.044

Table 3.3: Claim statistics per business sector over the claim years 2012 till 2014.

3.2.3 Standard Industrial Classification

The economic activity of an enterprise is expressed in a Standard Industrial Classification. This classification is described in detail in section 2.4. Although there are approximately 1200 SIC codes, the data only contain 349 codes. This is because the insurer rejects enterprises within some specific economic activities. For instance, Roofing (SIC code 4391) has a high liability risk and is rejected. Our data still contain SIC codes with six digits. This is a legacy from the old coding as described in section 2.4.

Figure 3.1: Mosaic graph per SIC clustered by business sector over the claim years 2012 till 2014. The color indicates the frequency as displayed in the bar. The size of the rectangles indicates the exposure.

Figure 3.1 plots the exposure and claim frequency corresponding to the SIC codes present in our portfolio. Every rectangle represents a SIC code (349 codes). The size of the rectangle is proportional to the exposure. Some rectangles are so small that only a part of (or no) SIC code can be displayed. The color of the SIC indicates the observed claim frequency over the total observation period. Close to white means a claim frequency of zero; dark red means a frequency of one. The SIC codes are clustered by business sector. The cluster Hospitality, which based on exposure is the biggest sector, consists of only eleven SIC codes. Manufacturing, which has a much smaller exposure, consists of 116 SIC codes. Manufacturing has a few SIC codes with a very high frequency, but most of its SIC codes have a very low or zero frequency. These high and low frequencies do not tell the full story. For instance, SIC code 3230 (Manufacture of sports goods) from the sector Manufacturing shows dark red and has a frequency of 0.88. It has three claims on an exposure of 3.4. Is this due to the fact that this economic activity involves many risks? Or is this just one bad policy?

3.2.4 Revenue

Revenue is the annual revenue of an enterprise and is expressed in euro. It is a continuous variable, but in this thesis we transform this information into a categorical variable. The revenue is classified into 10 percentile classes: 0% to 10%, 10% to 20%, 20% to 30%, etc. The classes are sequentially numbered 1, 2, 3 to 10. This is done per business sector, because the magnitude of the revenue differs per sector. For example, the sector Wholesale has higher revenue ranges compared to the other sectors. This is shown by figure 3.2: the deciles of Wholesale, piled up on top of one another, reach a higher revenue than those of the other business sectors. Table A.1 shows the exact borders of the percentiles per sector. The advantage of this classification is that the number of observations in every interval is the same. The disadvantage is that this classification may assign almost identical observations to consecutive classes and observations with widely different values to the same class, see Fischer and Wang (2011). Although the targeted enterprises are independent contractors and SMEs, it is possible that a non-SME company is in the portfolio due to product replacement, portfolio take-overs or simple mistakes. The revenue of an enterprise is a risk driver for a general liability product. Table 3.4 shows a clear link between revenue and frequency: the higher the revenue, the higher the frequency.

Figure 3.2: Revenue deciles per business sector over the claim years 2012 till 2014. The top decile is omitted due to lack of space.

Revenue class   Exposure     N   Frequency
1                  1,454    29       0.020
2                  1,800    36       0.020
3                    855    21       0.025
4                  1,339    31       0.023
5                  1,468    48       0.033
6                  1,162    40       0.034
7                  1,234    54       0.044
8                  1,308    59       0.045
9                  1,324    91       0.069
10                 1,307   178       0.136

Table 3.4: Claim statistics per revenue class over the claim years 2012 till 2014.


3.2.5 Limit

In section 2.3 the role of the occurrence limit is explained. There are only two values of the occurrence limit in the data: 1.25 and 2.5 million euro. The policyholder only has these two options to choose from. The 2.5 million option has the highest exposure and claim frequency, namely 9,354 versus 3,897 for exposure and 0.0474 versus 0.0369 for frequency. The annual aggregate limits are twice as large as the occurrence limits, so 2.5 and 5.0 million euro respectively.


Chapter 4

Theoretical framework

After the description of the data in the previous chapter, we dive into the theoretical framework. We start this chapter with an introduction (section 4.1) of fundamental definitions for non-life insurance products. Then we explain the notation of the data structure in section 4.2. In section 4.3 we explain the concepts of fixed and random effects. In section 4.4 we explain several data structures for random effects and introduce our multi-level factor (MLF). Section 4.5 summarizes the main principles of the pricing technique GLM. We review another pricing technique, credibility theory, in section 4.7. The theories of GLM and credibility come together in subsection 4.8.1, where Ohlsson’s backfitting algorithm is described. We end with the theory of mixed models in subsection 4.8.2.

4.1 Basic concepts of tariff analysis

We highlight the main concepts for the pricing of non-life insurance products. Detailed information can be found in Denuit et al. (2007) and Ohlsson and Johansson (2010). The exposure of a policy is the duration of the policy: the amount of time it is in force. The exposure is calculated for every policy per year or per period of insurance. This means that the exposure has a minimum value of zero and a maximum of one. The exposure of a group of policies is obtained by adding the durations of the individual policies. A claim represents an event reported by the policyholder, for which he demands economic compensation. We assume that the claim is actually justified. The claim frequency is the average number of claims per year on one policy. It is the number of claims reported by the policyholder divided by the exposure. The claim cost is the amount paid by the insurer to the insured in case of a claim. The claim severity is the average cost per claim. It is the total claim cost divided by the number of claims. The earned premium is the annual premium times the exposure. It is the amount of premium income paid by the insured for the period that the policy is in force. The pure premium or risk premium is the claim frequency multiplied by the average cost per claim. The actual premium is the premium for one year according to the tariff in force. This premium includes loadings for expenses and capital cost and is not directly comparable to the pure premium. The policyholder can be seen as a single risk with a risk profile expressed by the values for the levels of the rating factors. If several policyholders have identical values for the levels of the rating factors - an identical risk profile - then they form a risk class. The non-life pricing actuary has the task to compute the pure premium for every risk class based on a statistical model incorporating all the available information about the policyholders in such a class.

In liability insurance, modeling the claim costs is much more difficult than modeling the claim frequencies. Denuit et al. (2007) explain this with three reasons. Firstly, claim costs are often a mix of small and large claims. Large liability claims need several years to be settled. Only estimates of the final cost appear in the data until the claim is closed. Secondly, the statistics available to fit a model for claim severities are much more limited than for claim frequencies, since only four percent of the policies produce claims. Finally, the cost of an accident is for the most part beyond the control of a policyholder, since the payments of the insurance company are determined by third-party characteristics. The information contained in the available observed covariates is usually much less relevant for claim sizes than for claim counts. Our goal in this thesis is therefore to compare different methods to analyze the claim frequencies reported on this product.

4.2 Data structure

Our data set has the structure of panel data, involving information on policyholders over time. We denote our response of interest - the number of claims - by $N_{it}$, and $x_{it}$ is a vector of $p$ explanatory variables. The subscripts indicate the policyholder $i$ and the claim year $t$. Panel data is very suitable for a posteriori ratemaking. This tariff predicts the next year's loss for a particular policyholder, using the dependence between the current year's loss and the losses reported by this policyholder in previous years. We aggregate our data to the level of risk classes. We do this to save computational time in our analysis while we do not get different results; see Denuit et al. (2007), page 66, for a theoretical justification. This is only applicable in our situation with a Poisson likelihood. The number of claims and the exposure are aggregated as

$$N_{rt} = \sum_{i=1}^{n} N_{irt}, \qquad w_{rt} = \sum_{i=1}^{n} w_{irt}, \qquad (4.1)$$

where $r$ is the risk class, $w$ is the exposure and $n$ is the number of policyholders. Section B.1 in the appendix shows data samples of the panel data per policyholder and per risk class.
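To make the aggregation in (4.1) concrete, the following minimal R sketch sums claims and exposure per risk class and claim year. The data frame policy_data and its column names are hypothetical stand-ins for the policy-level panel data shown in appendix B.1.

# Hypothetical policy-level panel data: one row per policyholder and claim year.
policy_data <- data.frame(
  risk_class = c("A", "A", "B", "B", "B"),
  claim_year = c(2012, 2012, 2012, 2013, 2013),
  N          = c(0, 1, 2, 0, 1),            # number of reported claims
  exposure   = c(0.50, 1.00, 1.00, 0.75, 1.00)
)

# Aggregation (4.1): sum claims and exposure over policyholders
# within every (risk class, claim year) cell.
risk_class_data <- aggregate(cbind(N, exposure) ~ risk_class + claim_year,
                             data = policy_data, FUN = sum)
risk_class_data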

4.3 Fixed versus random effects

In this chapter we will use the terms fixed effects and random effects. We use linear models to explain the differences between these two terms, although linear models are not a subject of this thesis. The given models are from Frees et al. (2014), chapter 8. A basic linear model with no clustering of the data is model (4.2),

$$E[N_{rt}] = \alpha + x_{rt}'\beta, \qquad Var(N_{rt}) = \sigma^2, \qquad (4.2)$$

where $N_{rt}$ is the observed response variable, $x_{rt} = (x_{rt,1}, \ldots, x_{rt,p})'$ is a vector of $p$ explanatory variables and $\beta = (\beta_1, \ldots, \beta_p)'$ is a vector of $p$ corresponding parameters to be estimated by the model. Here $r$ denotes a risk class and $t$ is the time period. This model produces identical estimates for all risk classes $r$ given an $x_{rt}$, because it ignores the panel structure. An example of a linear fixed effects model is (4.3). It is the same model as (4.2), but with a risk class specific intercept,

$$E[N_{rt}] = \alpha_r + x_{rt}'\beta, \qquad Var(N_{rt}) = \sigma^2. \qquad (4.3)$$

Each risk class $r$ has its own fixed - but unknown - intercept $\alpha_r$. There is no pooling of information, because the data is clustered per risk class. The intercept captures all of the effects that are not observable in the other explanatory variables. We assume independence among all observations. The model is called a fixed effects model because the quantities $\alpha_r$ are fixed parameters to be estimated. Another approach is the linear mixed model. This model allows for random intercepts, with model equation

$$N_{rt} = \alpha_r + x_{rt}'\beta + \epsilon_{rt}, \qquad (4.4)$$

where $\epsilon_{rt}$ is an identically and independently distributed error term with $E[\epsilon_{rt}] = 0$ and $Var(\epsilon_{rt}) = \sigma^2$. The intercept $\alpha_r$ is now a random variable with variance $\sigma_\alpha^2$ that represents the variation between risk classes. These random intercepts capture the heterogeneity among the risk classes and structure the dependence between observations on the same risk class. These random effects represent characteristics that are unobserved by the actuary. The variance $\sigma^2$ represents the variability within risk class $r$. The random effects are mixed with the regression parameters $\beta$, which are considered as fixed effects; hence the term mixed model. Two extremes exist. When $\sigma_\alpha^2 \to 0$, there is complete pooling. When $\sigma_\alpha^2 \to \infty$, we speak of no pooling. A mixed model is a compromise between these two extremes, balancing between the complete pooling and no pooling models. This is known as partial pooling. In this balancing between two extreme positions actuaries will recognize the resemblance with credibility theory, which we will discuss in section 4.7.
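The pooling terminology can be made concrete with a small R sketch. It is stated directly in the Poisson setting used later in this thesis rather than the linear models (4.2)-(4.4), and it reuses the hypothetical risk_class_data frame from the sketch in section 4.2; the lme4 package is one common implementation of mixed models.

# Complete pooling: one common intercept for the whole portfolio.
fit_pooled <- glm(N ~ 1 + offset(log(exposure)),
                  family = poisson, data = risk_class_data)

# No pooling: a separate fixed intercept per risk class, cf. model (4.3).
fit_no_pool <- glm(N ~ 0 + risk_class + offset(log(exposure)),
                   family = poisson, data = risk_class_data)

# Partial pooling: a random intercept per risk class, the GLMM analogue of the
# linear mixed model (4.4). A real portfolio has far more risk classes than
# this toy example.
library(lme4)
fit_partial <- glmer(N ~ 1 + (1 | risk_class) + offset(log(exposure)),
                     family = poisson, data = risk_class_data)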

Figure 4.1: Single random effects per level. The pink nodes are SIC codes.

4.4 Multi-level factors (MLF)

Frees et al. (2014), chapter 8, enumerate four kinds of structures for the random effects: 1) a single random effect per level; 2) multiple random effects per level; 3) nested random effects; and 4) crossed random effects. The random effects in this thesis have the first or the third structure, and we restrict ourselves to these two structures. The single random effect per level is the most common structure. The random effect corresponds to a certain level of a single grouping factor. For example, Ohlsson and Johansson (2004) use bus companies as an example of a MLF that has this particular structure. Their example consists of data on 624 bus companies and two ordinary rating factors, namely age and zone: the bus age with five classes and a standard subdivision of Swedish parishes into seven zones. The MLF is the bus company itself and it is added to the model as a random effect. All levels, meaning all bus companies, are listed side by side next to each other. In case of nested random effects some levels of one factor occur only within certain levels of a first factor. For example, the MLFs car brand and car model in Ohlsson (2008) are nested random effects. Car models (Volvo V70, Volvo S60, Volvo XC90) are hierarchically ordered under car brands (Renault, Volvo, etc.). Cars of the same brand have risk characteristics in common, even if they represent different models. This gives an advantage when new models are introduced by a brand. For example, Swedish cars are well known for their safety. A new model from the brand Volvo will have this risk characteristic, even when there is no data available for the new model.

The MLFs of our CGL portfolio are business sector and SIC code. Figure 4.1 shows the SIC code as a single random effect per level: all SIC codes are listed on the same hierarchical level. If a model includes a random effect according to this structure, then we call this model non-hierarchical. Figure 4.2 shows the SIC code hierarchically ordered under business sector as nested random effects. The business sectors are listed next to each other at the same hierarchical level. Every SIC code is listed under one and only one business sector. A model with nested random effects is called hierarchical or multilevel.

Figure 4.2: Nested random effects. The green nodes are business sectors. The pink nodes are SIC codes.

4.5 Generalized linear models

Linear regression models like the examples in section 4.3 assume normally distributed random errors and a mean structure that is linear in the regression parameters. This conflicts with the responses of interest in non-life pricing models. Firstly, because the number of claims follows a discrete probability distribution on the non-negative integers. Secondly, because the mean number of claims is not linear in the covariates: we typically want to impose a multiplicative rather than the additive structure which linear regression brings us. In actuarial science the generalized linear model (GLM) is the main regression technique to find the relation between the response and the explanatory variables. It provides solutions to the two drawbacks of the linear models. The technique is thoroughly discussed in Kaas et al. (2008), where the following three components are described.

1. A stochastic component: a set of independent random variables $Y_i$, $i = 1, \ldots, n$, with a density in the exponential dispersion family of the form

$$f_{Y_i}(y; \theta_i, \phi) = \exp\left( \frac{y\theta_i - b(\theta_i)}{\phi/w_i} + c(y; \phi/w_i) \right), \qquad (4.5)$$

where $b(.)$ and $c(.)$ are real functions, $\theta_i$ is the natural parameter and $\phi$ is the scale parameter. Here $i$ represents the policyholder and $w_i$ is the weight of policyholder $i$ in the policy year under consideration.

2. A systematic component that attributes a linear predictor $\eta_i = x_i'\beta = \sum_j x_{ij}\beta_j$ to every observation. The $\beta_j$ are fixed regression coefficients and the $x_{ij}$ are covariates.

3. A link function $g(.)$ that links the expected value $\mu_i$ of the response variable to the linear predictor: $g(\mu_i) = \eta_i$.

GLM gives us the advantage of using a distribution other than the normal distribution for the random deviations from the mean. Another advantage is that the mean of the random variable need not be a linear function of the explanatory variables, but may be expressed on a logarithmic scale. In this case we get a multiplicative model instead of an additive model. The outcome of the GLM is the so-called a priori tariff (Denuit et al. (2007)), meaning that the actuary only uses covariates that are known in advance. A disadvantage is that GLM cannot directly include random effects to take dependencies or hierarchically structured data into account, or to create an a posteriori tariff. This makes GLM an example of a fixed effects model. Another disadvantage is that GLM assumes homogeneity of the underlying portfolio. We will discuss this further in the next section.
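As an illustration of these components, a minimal R sketch of a Poisson GLM with log link and exposure offset follows; the data frame tariff_data and its covariates are hypothetical placeholders for the ordinary rating factors of this portfolio.

# Hypothetical aggregated data with two ordinary rating factors.
tariff_data <- data.frame(
  N               = c(0, 1, 3, 0, 2, 5),
  exposure        = c(10, 12, 25, 8, 15, 30),
  revenue_class   = factor(c(1, 2, 3, 1, 2, 3)),
  business_sector = factor(c("Retail", "Retail", "Construction",
                             "Hospitality", "Construction", "Hospitality"))
)

# Poisson GLM with log link; log(exposure) enters as an offset.
fit_glm <- glm(N ~ revenue_class + business_sector + offset(log(exposure)),
               family = poisson(link = "log"), data = tariff_data)

# On the log scale the predictor is additive, so exponentiating the
# coefficients yields the multiplicative relativities of the a priori tariff.
exp(coef(fit_glm))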

4.6 Actuarial models for claim frequencies

Before using the GLM we should determine which distribution is valid for our response variable N. In this section we do not use the indices i and t, for readability reasons. The Poisson distribution is often used to model count data in general and the number of claims in particular. Of course the number of claims is a discrete variable, so we are restricted to discrete distributions. Another candidate is the negative binomial distribution. Here the mean is no longer equal to the variance - as in the case of Poisson - but the variance exceeds the mean. We call this over-dispersion. The Poisson distribution with exposure has the following probability mass function:

$$P(N = k) = \frac{(\lambda w)^k}{k!}\, e^{-\lambda w}, \qquad (4.6)$$

where $w$ is the exposure measure and $k$ the number of claims, $k = 0, 1, 2, \ldots$. Denuit et al. (2007) state that the use of the Poisson distribution is obvious, but only when the underlying population is homogeneous. Unfortunately, in practice this is not always the case. The difference in behavior among individual policyholders that cannot be observed by the actuary leads to unobserved heterogeneity. Over-dispersion is a well-known consequence of unobserved heterogeneity in count data analysis. This means that the variance of the number of claims is larger than the mean. A way to manage this unobserved heterogeneity is to impose a random variable (called $\Theta$) on the mean parameter of the Poisson distribution. Denuit et al. (2007) call this a mixed Poisson distribution with parameters $\lambda$ for the mean frequency and $\Theta$ for the positive random effect. In a mixed Poisson model the annual expected claim frequency itself becomes a random variable. The obtained distribution is defined as

$$P(N = k \mid \Theta) = \frac{(\lambda w \Theta)^k}{k!}\, e^{-\lambda w \Theta}. \qquad (4.7)$$

We obtain unconditional probabilities by integrating (4.7) over the random variable $\Theta$:

$$P(N = k) = E[P(N = k \mid \Theta)] = \int_0^\infty e^{-\lambda w \theta}\, \frac{(\lambda w \theta)^k}{k!}\, dF_\Theta(\theta), \qquad (4.8)$$

where $F_\Theta(\theta)$ is the distribution function of the random variable $\Theta$. The mixed Poisson model (4.8) is an accident-proneness model: it assumes that a policyholder’s mean claim frequency does not change over time, but allows some insured persons to have higher mean claim frequencies than others. We will say that $N$ is mixed Poisson distributed with parameter $\lambda$ and risk level $\Theta$, denoted as $N \sim \mathrm{MPoi}(\lambda, \Theta)$, when it has probability mass function (4.8). The condition $E[\Theta] = 1$ ensures that

$$E[N] = E[E[N \mid \Theta]] = \int_0^\infty \sum_{k=0}^{\infty} k\, e^{-\lambda w \theta}\, \frac{(\lambda w \theta)^k}{k!}\, dF_\Theta(\theta) = \lambda w. \qquad (4.9)$$


A well-known candidate for the distribution of $\Theta$ is a Gamma$(a, a)$-distribution for some $a > 0$, so that the expectation of $\Theta$ meets the condition of being equal to one. The density function of this distribution is

$$f_\Theta(\theta) = \frac{1}{\Gamma(a)}\, a^a \theta^{a-1} e^{-a\theta}, \qquad \theta > 0. \qquad (4.10)$$

The unconditional probability mass function then becomes

$$P(N = k) = E_\Theta\left[ P(N = k \mid \Theta) \right] = \int_0^\infty e^{-\lambda w \theta}\, \frac{(\lambda w \theta)^k}{k!}\, dF_\Theta(\theta).$$

Replacing $dF_\Theta(\theta)$ by $f_\Theta(\theta)\,d\theta = \frac{1}{\Gamma(a)} a^a \theta^{a-1} e^{-a\theta}\, d\theta$, we obtain

$$P(N = k) = \int_0^\infty e^{-\lambda w \theta}\, \frac{(\lambda w \theta)^k}{k!}\, \frac{1}{\Gamma(a)} a^a \theta^{a-1} e^{-a\theta}\, d\theta = \frac{a^a (\lambda w)^k}{\Gamma(a)\, k!} \int_0^\infty e^{-(\lambda w + a)\theta}\, \theta^{k+a-1}\, d\theta = \frac{\Gamma(a+k)}{\Gamma(a)\Gamma(k+1)} \left( \frac{a}{a + \lambda w} \right)^a \left( \frac{\lambda w}{a + \lambda w} \right)^k.$$

In the last line we recognize the probability mass function of the negative binomial distribution with expectation $\lambda w$ and variance $\lambda w + (\lambda w)^2/a$.

The usual way to find estimates for the parameters of these distributions is the method of maximum likelihood. This method defines the likelihood as the product of the probabilities of observing all outcomes in the data, regarded as a function of the distribution parameters. The maximum likelihood estimator (MLE) of the parameters is the value for which the likelihood is maximal. GLM also uses this method to find an estimate for $\beta$.
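The over-dispersion induced by the Gamma mixing can be checked numerically. The R sketch below simulates from the mixed Poisson model with Θ ∼ Gamma(a, a); the parameter values are arbitrary illustration choices.

# Simulate N ~ Poisson(lambda * w * Theta) with Theta ~ Gamma(a, a),
# so that E[Theta] = 1 and Var(Theta) = 1/a.
set.seed(1)
n_sim  <- 1e5
lambda <- 0.05   # mean claim frequency (arbitrary)
w      <- 1      # exposure
a      <- 2      # Gamma shape and rate

theta <- rgamma(n_sim, shape = a, rate = a)
N     <- rpois(n_sim, lambda * w * theta)

mean(N)                           # close to lambda * w, cf. (4.9)
var(N)                            # exceeds the mean: over-dispersion
lambda * w + (lambda * w)^2 / a   # negative binomial variance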

4.7 Credibility theory

Credibility theory

Credibility theory is one of the oldest actuarial techniques. We will focus on the greatest accuracy credibility, introduced by B¨uhlmann (1967) in the 1960s. Many textbooks explore this subject, likeKaas et al.(2008), chapter 8 andB¨uhlmann and Gisler(2005). Credibility is useful if the actuary has to set a premium for a group of policies for which there is limited claim experience on a smaller group, but a lot more experience on a larger group of policies that are more or less related. Credibility can set up an experience rating system to determine the pure premium, taking into account not only the individual experience with the group, but also the collective experience. There are two extreme positions. One is to charge every policy the same premium, calculated over all policies that are present in the data. The other extreme is to charge group or policy r its own premium based on its own claim average. Credibility provides the following formula to obtain a weighted average of the two extreme positions.

$$z_r \bar{N}_r + (1 - z_r)\, \bar{N}, \qquad 0 \le z_r \le 1, \qquad (4.11)$$

where $r$ is a group of policyholders or risk class, $\bar{N}_r$ is the average number of claims of risk class $r$ and $\bar{N}$ is the average number of claims of the portfolio. $z_r$ is the well-known credibility factor or credibility weight.
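For concreteness, a minimal R sketch of formula (4.11) follows; the numbers are made up, and in practice the credibility weights z_r follow from estimated variance components rather than being chosen by hand.

# Credibility-weighted claim frequency per risk class, formula (4.11).
credibility_estimate <- function(N_bar_r, N_bar, z_r) {
  stopifnot(all(z_r >= 0 & z_r <= 1))
  z_r * N_bar_r + (1 - z_r) * N_bar
}

# A sparse class (low z_r) is pulled towards the portfolio mean; a
# well-documented class (high z_r) keeps most of its own experience.
credibility_estimate(N_bar_r = c(0.10, 0.02), N_bar = 0.044, z_r = c(0.15, 0.80))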

We now switch to a chronological overview of the different credibility models to envision this theory. We start with the Bühlmann model, which decomposes the number of claims of risk class $r$ in year $t$ as

$$N_{rt} = m + \Xi_r + \Xi_{rt}. \qquad (4.12)$$

We interpret $m$ as the overall mean. This is the expected number of claims for an arbitrary risk class. $\Xi_r$ and $\Xi_{rt}$ are independent random variables with expectation zero. The variance of $\Xi_r$ describes the variation between risk classes; $\Xi_r$ denotes a random deviation from the mean $m$, specific for risk class $r$. The components $\Xi_{rt}$ denote the deviation for year $t$ from the long-term average of risk class $r$. They describe the within-variation of a risk class. After the introduction of the Bühlmann model, enhancements have been added. Bühlmann and Straub (1970) created a model where the weight of a risk is included. The Bühlmann-Straub model has the same decomposition as (4.12), but the variance of $\Xi_{rt}$ is $s^2/w_{rt}$, where $w_{rt}$ is the weight attached to observation $N_{rt}$. Bühlmann and Jewell (1987) introduced Jewell's hierarchical model. This is an improvement of the Bühlmann-Straub model that is compatible with the modeling of hierarchically structured data. Antonio et al. (2010) implement a hierarchical model for data of insurance companies, fleets of vehicles and vehicles. The number of claims for risk class $r$ in sector $p$ in year $t$ can be decomposed as follows:

$$N_{prt} = m + \Xi_p + \Xi_{pr} + \Xi_{prt}. \qquad (4.13)$$

Again $m$ is the overall mean and $\Xi_p$ is the deviation of sector $p$ from the mean $m$. In our example $p$ could be the insurance company. Splitting up insurance company $p$ into fleets $q$ and each fleet $q$ into vehicles $v$, each with its own deviation $\Xi_p + \Xi_{pq} + \Xi_{pqv}$, leads to a hierarchical chain of models. One can also use this model to include fixed effects. For example, denote $p$ as the region and $j$ as the gender of the driver. By adding the term $\Xi_{0j}$ one can describe the risk characteristics of group $j$. This model is a cross-classification model.

While GLMs make use of a distribution that is specified beforehand, credibility theory does not use a distribution. Due to this distribution-free property the estimation of the parameters is difficult. This estimation depends on moment estimation, which is a cumbersome method, especially compared to the maximum likelihood method. Frees et al. (1999) contributed to the renaissance of credibility theory by showing that the classical credibility model can be reformulated as a linear mixed model (LMM). For the modeling of claim counts the GLM is a better framework than the linear model. Therefore we want to switch to GLMs with fixed and random effects in a GLMM framework.

4.8 GLMs with random effects

In this section we extend the GLM with random effects. In subsection 4.8.1 we discuss the combination of GLM and credibility. In subsection 4.8.2 we elaborate on GLMMs.

4.8.1 Backfitting algorithm

Ohlsson (2008) describes the ideas underlying the combination of GLM with credibility theory (GLMC). It is a distribution-free and simple approach, similar to credibility theory. Besides models for random variables with a single random effect per level, Ohlsson also describes models with nested random effects. Ohlsson poses that GLMC is especially suited for the estimation of multi-level factors (MLFs). He incorporates the two structures of random effects which are discussed in section 4.4. The first structure is the single random effect per level, denoted as $U_j$. In this subsection we use the notation of Ohlsson (2008). The multiplicative model then looks like this:

$$E[Y_{ijt} \mid U_j] = \mu \gamma_i U_j, \qquad (4.14)$$

where $Y_{ijt}$ is the observed response variable, $i$ is an a priori tariff cell and $j$ is a group of risks, like the MLF level. Repeated observations are indexed by $t$. $\mu$ is the base premium and $\gamma_i$ is the product of the price relativities for tariff cell $i$, so $\gamma_i = \gamma_{1i}\gamma_{2i}\cdots\gamma_{Ri}$, with $R$ denoting the number of ordinary rating factors. $\mu$ and the $\gamma_{ri}$ can be estimated by standard GLM methods by initially disregarding $U_j$. $U_j$ is the random effect, with $E[U_j] = 1$, and it is estimated with the following iterative backfitting algorithm.

Step 0: Initially, let $\hat{U}_j = 1$ for all $j$;

Step 1: Estimate the parameters for the ordinary rating factors by a GLM using Poisson with log link, using $\log(\hat{U}_j)$ as an offset variable. This yields $\hat{\mu}$ and $\hat{\gamma}_i$;

Step 2: Compute $\sigma^2$ and $\tau^2$ using formulas (2.9) and (2.10) of Ohlsson (2008) and the outcome of Step 1;

Step 3: Use equation (4.15) to compute $\hat{U}_j$ using the estimates from Steps 1 and 2;

Step 4: Return to Step 1 with the new $\hat{U}_j$ from Step 3, until convergence.

$\hat{U}_j$ is estimated with the following formula:

$$\hat{U}_j = z_j\, \frac{\tilde{Y}_{\cdot j \cdot}}{\mu} + (1 - z_j), \qquad (4.15)$$

where $\tilde{Y}_{\cdot j \cdot}$ is the average response over group $j$. The tilde symbol means that $Y$ is transformed by $\gamma_i$; the bar means that it is the average. $z_j$ is the well-known credibility factor. If $z_j$ is zero, then $\hat{U}_j$ will be equal to 1. If $z_j$ is 1, then $\hat{U}_j$ will be the average of group $j$. $z_j$ depends on the total exposure of $j$, $\sigma^2$ and $\tau^2$, as specified in formula (2.5) of Ohlsson (2008). For groups with small variation between the observations of that group in comparison with the variation between the groups, $z_j$ tends to 1.
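The structure of the iteration can be sketched in R as below. This is a simplified sketch only: the data frame dat and its columns (N, exposure, SIC, revenue_class) are hypothetical, and the credibility update uses a Bühlmann-Straub-style weight with a fixed constant k in place of Ohlsson's exact estimators (2.5), (2.9) and (2.10).

# Simplified sketch of the backfitting iteration for a single MLF (the SIC code).
# 'k' plays the role of sigma^2/tau^2; in the real algorithm the variance
# components are re-estimated in every iteration (Ohlsson, 2008).
backfit_mlf <- function(dat, k = 50, max_iter = 100, tol = 1e-8) {
  dat$SIC <- factor(dat$SIC)
  U_hat   <- setNames(rep(1, nlevels(dat$SIC)), levels(dat$SIC))
  fit     <- NULL
  for (iter in seq_len(max_iter)) {
    # Step 1: Poisson GLM for the ordinary rating factors, log(U_j) as offset.
    dat$off_term <- log(dat$exposure) + log(U_hat[as.character(dat$SIC)])
    fit <- glm(N ~ revenue_class + offset(off_term),
               family = poisson(link = "log"), data = dat)
    # Observed and expected claims per SIC code, with the random effect reset to 1.
    expected <- tapply(fitted(fit) / U_hat[as.character(dat$SIC)], dat$SIC, sum)
    observed <- tapply(dat$N, dat$SIC, sum)
    # Step 3: credibility-weighted update of U_j (simplified stand-in for (4.15)).
    z_j   <- expected / (expected + k)
    U_new <- z_j * (observed / expected) + (1 - z_j)
    # Step 4: iterate until the U_j stabilise.
    if (max(abs(U_new - U_hat)) < tol) { U_hat <- U_new; break }
    U_hat <- U_new
  }
  list(glm_fit = fit, U_hat = U_hat)
}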

Ohlsson (2008) also considers the use of nested random effects. This is the multiplicative model:

$$E[Y_{ijkt} \mid U_j, U_{jk}] = \mu \gamma_i U_j U_{jk}. \qquad (4.16)$$

Here we have two random effects, namely $U_j$ for sector $j$ and $U_{jk}$ for group $k$ within sector $j$. The other indices correspond to those in the non-hierarchical model in expression (4.14). The assumptions are that $E[U_j] = 1$ and $E[U_{jk} \mid U_j] = 1$. We illustrate the hierarchical model with the example with car brand and car model from Ohlsson (2008). In the example the key ratio $Y_{ijkt}$ is the claim frequency, $i$ is a policyholder, $j$ is a car brand, $k$ is a car model and $t$ is the time period. The ordinary rating factors $\gamma_i = \gamma_{1i}\gamma_{2i}\cdots\gamma_{Ri}$ are well-known factors, like vehicle class, vehicle age and geographic zone.

To find the estimates of the hierarchical model the same iterative backfitting algorithm is used. The random effects $U_j$ and $U_{jk}$ are both initially set to 1 for all $j$ and $k$. $\mu$ and the $\gamma_{ri}$ can be estimated by standard GLM methods by incorporating $\log(\hat{U}_j)$ and $\log(\hat{U}_{jk})$ as an offset variable. The formulas of the estimators are different from the non-hierarchical case, but are provided by Ohlsson (2008).

4.8.2 Generalized linear mixed models

Generalized linear mixed models (GLMMs) extend GLMs by including random effects in the linear predictor. The random effects reflect the idea that there is natural heterogeneity across risk classes and that observations on the same subject share common characteristics. In section 4.3 the advantages of linear mixed models are discussed. These advantages also apply to GLMMs. The idea of combining credibility and GLMs was introduced in the actuarial literature by Nelder and Verrall (1997). Frees et al. (2014), chapter 16, and Antonio and Beirlant (2007) elaborate on this topic and explain several examples.

GLMMs extend GLMs by adding a random effect $z_{ij}'u_i$ to the linear predictor $x_{ij}'\beta$. Here $z_{ij}'$ is a vector of known covariates of the random effects and $u_i$ is a parameter vector of random effects for subject $i$. Conditional on $u_i$, the GLMM assumptions for the $j$th response on subject $i$, response variable $Y_{ij}$, are

$$Y_{ij} \mid u_i \sim f_{Y_{ij} \mid u_i}(y_{ij} \mid u_i), \qquad f_{Y_{ij} \mid u_i}(y_{ij} \mid u_i) = \exp\left( \frac{y_{ij}\theta_{ij} - b(\theta_{ij})}{\phi} - c(y_{ij}, \phi) \right), \qquad u_i \sim f_U(u_i). \qquad (4.17)$$


Like with GLM there is a link function $g(.)$ that relates the mean $\mu_{ij}$ to the fixed ($\beta$) and random effect ($u_i$) parameter vectors:

$$g(\mu_{ij}) = x_{ij}'\beta + z_{ij}'u_i, \qquad (4.18)$$

where $u_i$ is the vector of random effects for cluster $i$, $\beta$ is the vector with the fixed effects parameters, and $x_{ij}$ and $z_{ij}$ are $p$- and $q$-dimensional vectors of known covariates corresponding with the fixed and random effects, respectively.

When the response variable $Y_{ij}$ follows a Poisson distribution, as in our case with the number of claims, we use the logarithm as the link function. So,

$$\log(\mu_{ij}) = x_{ij}'\beta + z_{ij}'u_i, \qquad \mu_{ij} = e^{x_{ij}'\beta + z_{ij}'u_i}. \qquad (4.19)$$

The likelihood of the GLMM with specification (4.17) is

$$L(\beta, D \mid y_{ij}) = \int f_{Y_{ij} \mid u_i}(y_{ij} \mid u_i)\, f_U(u_i)\, du_i, \qquad (4.20)$$

where the integral goes over the random effects vector $u_i$ (with covariance matrix $D$). Frees et al. (2014), chapter 16, state that due to the integral in (4.20) there are no explicit expressions for estimators and predictors. Approximations to the likelihood or numerical integration techniques are required to maximize (4.20) with respect to the unknown parameters. Three approaches are distinguished to approximate the estimates of $\beta$ and $D$ and the predictions of the random effects $u_i$ for clusters $i$.

1. The Laplace approximation
The Laplace method approximates integrals of the form

$$\int e^{h(u)}\, du \qquad (4.21)$$

for some function $h$ of a $q$-dimensional vector $u$; see Tierney and Kadane (1986). Then we have, by Taylor expansion around the mode $\hat{u}$ of $h$,

$$h(u) \approx h(\hat{u}) + \tfrac{1}{2}(u - \hat{u})'\, h''(\hat{u})\, (u - \hat{u}). \qquad (4.22)$$

This expansion is used to approximate (4.20).

2. The penalized quasi-likelihood (PQL)
This method is also called pseudo-likelihood (PL). It is based on an algorithm with a working variate. The algorithm starts with initial estimates of $\beta$, $u$ and the variance components. By using linear mixed model techniques the working variates and variances are updated. These steps are repeated until the estimates converge. Breslow and Clayton (1993) give justifications of the approach.
3. The adaptive Gauss-Hermite quadrature (GHQ)
The non-adaptive Gauss-Hermite quadrature approximates integrals of the kind stated in (4.23) with a weighted sum:
$$\int_{-\infty}^{\infty} h(z)\, e^{-z^2}\, dz \approx \sum_{l=1}^{Q} w_l\, h(z_l). \qquad (4.23)$$
Here $Q$ is the order of the approximation, the $z_l$ are the zeros of the $Q$th order Hermite polynomial, and the $w_l$ are the corresponding weights. With the adaptive Gauss-Hermite quadrature rule, the nodes are rescaled and shifted such that the integrand is sampled in a suitable range. The integral in (4.20) is approximated with the adaptive Gauss-Hermite quadrature rule for numerical integration. This numerical integration technique still enables, for instance, a likelihood ratio test. Moreover, the estimation process is just singly iterative. On the other hand, at present, the procedure can only deal with a small number of random effects, which limits its general applicability. When $Q = 1$, $z_1 = 0$ and $w_1 = 1$, this method corresponds with the Laplace approximation method. Liu and Pierce (1994) give more details on GHQ; a small numerical illustration of (4.23) follows below.
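As a minimal illustration of the non-adaptive rule (4.23), the sketch below applies the standard three-point Gauss-Hermite rule to $\int_{-\infty}^{\infty} \cos(z)\, e^{-z^2}\, dz$, whose exact value is $\sqrt{\pi}\, e^{-1/4}$. The integrand, nodes and weights are chosen purely for illustration.

    # three-point Gauss-Hermite rule for the weight function exp(-z^2)
    z <- c(-sqrt(3 / 2), 0, sqrt(3 / 2))
    w <- c(sqrt(pi) / 6, 2 * sqrt(pi) / 3, sqrt(pi) / 6)

    h <- function(z) cos(z)               # integrand apart from exp(-z^2)
    approx_val <- sum(w * h(z))           # weighted sum as in (4.23)
    exact_val  <- sqrt(pi) * exp(-1 / 4)  # closed-form value of the integral

    c(approximation = approx_val, exact = exact_val)
    # approximation ~ 1.382, exact ~ 1.380: already close for Q = 3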

Frees et al. (2014), chapter 16, discuss some pros and cons of the three methods. Laplace and PQL rely on quite a few approximations, so their accuracy is relatively low. The advantage of PQL is that it can handle a large number of random effects as well as crossed and nested random effects. The approximation through numerical integration is more accurate, but this method is limited to GLMMs with a small number of nested random effects. Moreover, Gauss-Hermite quadrature is explicitly designed for normally distributed random effects, which gives less flexibility.
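In R the three approaches can, for example, be invoked as sketched below, assuming a data frame ds with (hypothetical) columns N, exposure, revenue_cat and SIC. In lme4, glmer with nAGQ = 1 uses the Laplace approximation and a larger nAGQ switches to adaptive GHQ (only available for a single scalar random effect), while MASS::glmmPQL implements PQL.

    library(lme4)
    library(MASS)

    # Laplace approximation (nAGQ = 1 is the default in glmer)
    fit_laplace <- glmer(N ~ revenue_cat + (1 | SIC), data = ds,
                         family = poisson(link = "log"),
                         offset = log(exposure), nAGQ = 1)

    # adaptive Gauss-Hermite quadrature with 25 quadrature points
    fit_aghq <- glmer(N ~ revenue_cat + (1 | SIC), data = ds,
                      family = poisson(link = "log"),
                      offset = log(exposure), nAGQ = 25)

    # penalized quasi-likelihood; the offset is part of the fixed formula here
    fit_pql <- glmmPQL(N ~ revenue_cat + offset(log(exposure)),
                       random = ~ 1 | SIC,
                       family = poisson(link = "log"), data = ds)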


Chapter 5

Modeling of the MLF

Before we start the analysis, we describe the current practice for the CGL portfolio at the insurer concerned. The insurer uses a GLM to model the claim frequency. Besides an overall intercept, the model includes the covariates revenue class and business sector. Although this model is technically correct, it is not accurate enough. The insurer projects the tariff per business sector onto the appropriate level of SIC codes. This projection is done by risk experts. It is a time-consuming and subjective process, and the resulting tariff depends heavily on human preferences. In this chapter we relate to the current practice and explore alternatives.

We analyze the data described in chapter 3. The variable of interest is $N_{rt}$, the number of claims observed per risk class $r$ and time period $t$. The number of policy years $w_{rt}$ is the measure of exposure. We perform the data preparation and analysis in the statistical software package R. Section 5.1 presents the modeling of the claim frequency: several well-known distributions for claim counts are fitted to the data, and the goal is to choose a distribution that we can use in the GLM framework presented in section 5.2. In that section we search for the best GLM models using only fixed effects; we approach all available covariates as fixed effects, including the MLF. Treating the MLF in this manner will show the difficulties of using the MLF. Section 5.3 introduces the MLF as a random effect: we incorporate the SIC code in a GLMM with a non-hierarchical structure. Finally, in section 5.4 we incorporate the business sector and SIC code in a GLMM with a hierarchical structure in order to show the added value of such a nested structure of our random effects.

5.1 Actuarial modeling of claim frequencies

We start by fitting negative binomial, Poisson and over-dispersed Poisson (quasipoisson) distributions. We do not take any regression parameters into account, but we do incorporate the exposure. To get a first impression of our data we made table 5.1, which compares the fit of several distributions to the data. Hereby we follow the procedure from Kaas et al. (2008), page 65, to fit the negative binomial distribution to the data. First we calculate the estimates for the parameters of the fitted distributions. Then we tabulate the empirical distribution of the data as well as the fitted distributions. For the Poisson distribution $\hat{\lambda} = 0.0443$ is used. We estimate $\lambda$ as follows:
$$\hat{\lambda} = \frac{\sum_{rt} N_{rt}}{\sum_{rt} w_{rt}}, \qquad (5.1)$$
where $r$ and $t$ indicate the risk classes and the repeated observations. Table 5.1 shows that the data have a right-skewed tail; the negative binomial distribution fits this tail better than the Poisson. After this first impression we fit the data with the help of the glm function of the R library stats and the glm.nb function of the library MASS.


                                                Number of claims
    Distribution        Parameter                   0    1   2   3   4  5  6  7  8  9 10 11 12 13
    Empirical                                   5,684  294  56  27   9  4  0  2  1  1  0  0  0  1
    Negative binomial   r̂ = 0.0967; p̂ = 0.5004  5,685  275  75  26  10  4  2  1  0  0  0  0  0  0
    Poisson             λ̂ = 0.0443              5,816  258   6   0   0  0  0  0  0  0  0  0  0  0

Table 5.1: Comparing different distributions for the number of claims.
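A rough version of such a comparison table can be tabulated in R as sketched below. The data frame ds_glm and its columns N and exposure are assumed names, and for simplicity every record is given the same fitted claim-count distribution instead of an exposure-specific one.

    k   <- 0:13
    obs <- table(factor(ds_glm$N, levels = k))           # empirical counts per number of claims
    n   <- nrow(ds_glm)

    lambda_hat <- sum(ds_glm$N) / sum(ds_glm$exposure)   # equation (5.1)

    fit_pois <- n * dpois(k, lambda = lambda_hat)
    fit_nb   <- n * dnbinom(k, size = 0.0967, prob = 0.5004)

    round(rbind(Empirical = obs, `Negative binomial` = fit_nb, Poisson = fit_pois))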

We used the following model and commands:
$$\lambda_{rt} = \exp(\beta_0 + \log(w_{rt})), \qquad (5.2)$$

    glm(N ~ 1, offset = log(exposure), family = poisson(link = log))   (5.3)
    glm.nb(N ~ 1 + offset(log(exposure)), link = log)                  (5.4)
    glm(N ~ 1, offset = log(exposure), family = quasipoisson)          (5.5)

Figure 5.1 and table 5.2 show the resulting estimates; 95% confidence intervals, estimates and the Akaike information criterion (AIC) are given. The AIC is derived as stated in Kaas et al. (2008), page 248, and is calculated with the following equation:
$$\mathrm{AIC} = -2\ell + 2k, \qquad (5.6)$$
where $k$ is the number of parameters and $\ell$ is the logarithm of the likelihood. The routines glm and glm.nb include the AIC in their output. The quasipoisson distribution does not report an AIC because there is no log-likelihood for this distribution.

Figure 5.1: Estimate (+/- 1.96 * s.e.) for the negative binomial, Poisson and quasipoisson fits.

    Distribution    Estimate   s.e.    AIC
    Neg. binomial   -3.123     0.052   2,966.7
    Poisson         -3.117     0.041   3,118.1
    Quasipoisson    -3.117     0.046

Table 5.2: Comparing distributions.

The confidence intervals show small differences, though the confidence interval of the Poisson distribution is the smallest. The negative binomial distribution returns the lowest AIC. Based on the fits obtained above, where no covariates are taken into account, we should opt for a negative binomial distribution to fit the data. Unfortunately the R library that we will use for the GLMM modeling does not support this distribution. The library lme4 contains the function glmer.nb, where nb stands for negative binomial, but the package manual (lme4, 2015) declares this function experimental. Taking this into account, we continue our analysis with the Poisson distribution and extend the basic fit obtained here with fixed and random effects.

5.2 Generalized linear model

In this section we discuss the GLM analyses and outcomes. We fit the GLMs using the glm function in R. For all models we choose a Poisson distribution in combination with a log link function, and we use the logarithm of the exposure as an offset. First we propose a basic model: a model with the ordinary rating factors, without the use of the MLF. We refer to this model as complete pooling because it ignores the clustering of the data in economic activities. In a next step we extend the basic model


by the MLF as if it were a fixed effect. We first add business sector to the basic model and after that we replace business sector with SIC code. The latter is called the no pooling model, because there is no pooling of the data anymore: every SIC code has its own estimate. The basic model extended by the business sector is called the semi-complete pooling model; the pooling of the data is in between complete pooling and no pooling.

5.2.1 Complete pooling

The starting point of our analyses is the basic GLM model, in which we ignore the clustering of the data by economic activity. We refer to this model as a complete pooling model. The covariates under consideration are the ordinary rating factors, which we approach as fixed effects, namely revenue class, occurrence limit and claim year. We use $\beta_0$ for an overall intercept. $\beta_1$ is the parameter for the revenue class, the classification of the revenue of the policyholder as described in subsection 3.2.4. This covariate contains numeric values and is estimated by a single parameter $\beta_1$, inspired by the approach Kaas et al. (2008) apply to the bonus-malus factor in chapter 9. In order to retrieve the appropriate frequency we multiply $\beta_1$ by the revenue class. For example, if a policyholder has revenue classification 5, then the frequency (apart from the other parameters) will be $\exp(\hat{\beta}_1 \cdot 5)$. The vector $\beta_2$ consists of two parameters for the levels of the explanatory variable occurrence limit. Vector $\beta_3$ contains three parameters for the three levels of the explanatory variable claim year. We investigate three GLM models:
$$N_{rt} \sim \mathrm{POI}(w_{rt} \cdot \lambda_{rt}),$$
$$\lambda_{rt} = \exp(\beta_0 + \beta_1 \cdot \text{revenue class}_{rt}), \qquad (5.7)$$
$$\lambda_{rt} = \exp(\beta_0 + \beta_1 \cdot \text{revenue class}_{rt} + \beta_2 \cdot \text{limit}_{rt}), \qquad (5.8)$$
$$\lambda_{rt} = \exp(\beta_0 + \beta_1 \cdot \text{revenue class}_{rt} + \beta_2 \cdot \text{limit}_{rt} + \beta_3 \cdot \text{claim year}_{rt}). \qquad (5.9)$$

Here $r$ is a risk class and $t$ indexes the repeated observations. The covariates occurrence limit and claim year need to be treated as categorical variables, which is achieved in R by the factor instruction. We define:

    limit = factor(occurrence limit),  claimyear = factor(claim year).   (5.10)

We create model (5.9) in R with the following instruction:

    glm(N ~ revenue_cat + limit + claimyear, offset = log(exposure),
        family = poisson(link = log), data = ds_glm)   (5.11)

We commence our quest for the relevant rating factors by comparing the models (5.7) and (5.8). Model (5.7) is the most restrictive, because it does not include the covariates occurrence limit and claim year and thus presumes that the elements of the vectors $\beta_2$ and $\beta_3$ are equal to zero. In a hypothesis test, the null hypothesis is $H_0: \beta_2 = 0$ and the alternative hypothesis is $H_1: \beta_2 \neq 0$. The hypothesis testing is done by the function anova with the following call:

    anova(glm.fit1, glm.fit2, test = "Chisq")   (5.12)

Here glm.fit1 stands for (5.7) and glm.fit2 for (5.8). The function returns a p-value of 0.01864, which tells us that the null hypothesis is rejected. The same test for model (5.8) against model (5.9) gives a p-value of 0.2634; the null hypothesis is not rejected. In table A.3 of the appendix the estimates, standard errors and AIC of all three models are listed.

Based on the hypothesis testing, the relevant covariates are revenue class and occurrence limit. Later in this chapter we will find that one of the mixed models does not converge if the covariate occurrence limit is included. Because our main goal is


to compare the different mixed model methods, we decide to use revenue class as our only fixed effect. We call this the basic model (5.13). In this model every policyholder within a revenue class has the same rate, independent of the economic activity. For practical use by an insurer the rate needs more differentiation. We will add this in the next subsection by investigating the covariates business sector and SIC code as fixed effects.

$$\lambda_{rt} = \exp(\beta_0 + \beta_1 \cdot \text{revenue class}_{rt}). \qquad (5.13)$$
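Analogous to (5.11), the basic model (5.13) can be fitted with a call along the following lines; the data set and column names are the same assumed names as before.

    glm.basic <- glm(N ~ revenue_cat, offset = log(exposure),
                     family = poisson(link = log), data = ds_glm)
    summary(glm.basic)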

                          Complete (5.7)              Semi-complete (5.14)
    Covariate             Estimate (s.e.)   p-value   Estimate (s.e.)   p-value
    Intercept             -4.541 (0.121)    0.000     -4.944 (0.145)    0.000
    Revenue cat            0.225 (0.016)    0.000      0.224 (0.016)    0.000
    Business sector
      Hospitality                                      ref. group
      Construction                                      1.063 (0.107)   0.000
      Garage                                           -0.109 (0.456)   0.812
      Manufacturing                                      0.517 (0.190)   0.007
      Retail                                            -0.302 (0.148)   0.042
      Wholesale                                          0.251 (0.170)   0.141
    Observations           6,079                       6,079
    Log Likelihood        -1,443.877                  -1,354.026
    Akaike Inf. Crit.      2,891.753                   2,722.053

Table 5.3: Estimates, s.e. and statistics for the (semi-)complete pooling models.

5.2.2 Semi-complete pooling

Starting from the model in (5.13), we analyze the candidate random effects as if they were fixed effects. In this way we create models that we can use for comparison. We execute our strategy in two steps: first we extend our basic model with the business sector, and second we take the SIC code into account. We call the model with the extension of business sector the semi-complete pooling model (5.14), because the pooling of data is in between complete pooling and no pooling. The business sector is taken as a fixed effect besides the other fixed effect revenue class. We use the R function factor to establish that business sector is a categorical variable.

$$\lambda_{rt} = \exp(\beta_0 + \beta_1 \cdot \text{revenue class}_{rt} + \beta_4 \cdot \text{sector}_{rt}). \qquad (5.14)$$

Here $\beta_4$ is a vector with the regression parameters for the six levels of the covariate business sector, with Hospitality as the reference group. Table 5.3 states the outcomes of the complete and semi-complete pooling models. The estimates for the covariate revenue class are almost equal, but the overall intercept of the semi-complete pooling model is considerably lower. This is compensated by the estimates for the covariate business sector. We performed a hypothesis test with $H_0: \beta_4 = 0$ and $H_1: \beta_4 \neq 0$. With a p-value of 0, $H_0$ is rejected and we include business sector in our model.

With a simple example we demonstrate the risk of the complete pooling model. According to the complete model the annual frequency for a policyholder in the Construction sector with a revenue class of 4 is $\exp(-4.541 + 4 \cdot 0.225) = 0.0262$. Note the calculation for the revenue class factor: we multiply $\beta_1$ by four because revenue class
