
Statistical modelling in Motor Own Damage insurance : determining claim frequency and claim severity factors



Elena Mironova

Master’s Thesis to obtain the degree in Actuarial Science and Mathematical Finance

University of Amsterdam
Faculty of Economics and Business
Amsterdam School of Economics

Author: Elena Mironova
Student nr: 10825207
Email: e.mironova1504@gmail.com
Date: July 17, 2015
Supervisor: Assistant Professor Dr. Sami Umut Can
Second reader: Professor Dr. Rob Kaas


the contents of this document.

I declare that the text and the work presented in this document is original and that no sources other than those mentioned in the text and its references have been used in creating it.

The Faculty of Economics and Business is responsible solely for the supervision of completion of the work, not for the contents.


Abstract

This research focuses on the MOD insurance market in Russia. In 2012–2015 a number of negative tendencies caused the loss ratio in the Russian MOD insurance market to increase significantly. Therefore, the insurers had to resort to the simplest way of improving their financial position: they raised the MOD tariffs. However, a more accurate risk assessment seems to be a better way of avoiding extra costs and ensuring profitability. Thus, the main goal of this research is to determine the factors significantly influencing insurance claim frequency and severity in MOD insurance in Russia. The following models are employed: the hurdle model for claim frequency analysis and the gamma specification of GLMs for claim severity analysis. As a result, using a resampling framework, it was found that driving experience and vehicle class do affect claim severity significantly, and, in addition to these factors, franchise amount also has a significant influence on claim frequency in the MOD insurance.

Keywords MOD insurance, Russian motor insurance market, claim frequency, claim severity, generalized linear models, hurdle model, gamma distribution.


Introduction

1 Modern motor insurance market in Russia
1.1 Russian motor insurance market: peculiarities and current structure
1.1.1 CMTPL in Russian motor insurance market
1.1.2 MOD in Russian motor insurance market
1.1.3 The relation between CMTPL and MOD insurance markets
1.2 Current structure of Russian motor insurance market
1.3 Current tendencies in motor insurance market
1.3.1 Influence of changes in legislation on Russian motor insurance market
1.3.2 Influence of the current macroeconomic situation on Russian motor insurance market
1.4 Chapter conclusions

2 The models for claim frequency and severity analysis
2.1 Literature review
2.1.1 Literature on GLMs
2.1.2 Literature on frequency models
2.1.3 Literature on severity models
2.2 Generalized Linear Models
2.2.1 The general setting
2.2.2 Rating factors in a multiplicative model
2.2.3 Link function
2.3 Specifications for claim frequency modelling
2.3.1 Specific features of count variable analysis
2.3.2 Hurdle model
2.4 Specifications for claim severity modelling
2.4.1 Specific features of claims severity analysis
2.4.2 Gamma distribution
2.5 Chapter conclusions

3 Empirical research
3.1 Data description
3.1.1 Data characteristics and sample composition
3.1.2 Exploratory analysis
3.1.3 Methodology
3.2 Hypotheses
3.3 Models estimation results
3.3.1 Determinants of claim frequency
3.3.2 Determinants of claim severity
3.4 Chapter conclusions

Conclusion

Appendix


This research is the final proof of competence to obtain a Master of Science degree in Actuarial Science and Mathematical Finance. The research is a continuation of my bachelor’s thesis. The previous stage only focused on claim frequency analysis of Russian MOD insurance market. This research also considered claim severity analysis, but most importantly, it applied a resampling methodology.

I would like to take this opportunity to express my gratitude to Dr. S. U. Can, assistant professor at the University of Amsterdam, the Netherlands, for supervising this research. Thank you for suggesting numerous ways of tracking modelling mistakes and providing very thorough and detailed comments on the text. I would also like to thank my bachelor thesis supervisor, Prof. E.V. Gilenko, associate professor at the St. Petersburg State University, Russia, for fostering the interest in the topic, introducing me to R and LaTeX and setting the standards of high-quality text writing. I appreciate the enormous amount of time and effort invested at the foundational stage of the research. I also express my gratitude to the company that provided the data at the initial stage of the research and introduced me to the practical matters of the insurance business.


• GLM: Generalized Linear Models
• EDM: Exponential Dispersion Models
• CMTPL insurance: Compulsory Motor Third Party Liability insurance
• MOD insurance: Motor Own Damage insurance
• YoY: year over year


Introduction

Over the period from 2012 to 2015, the motor insurance market in Russia was influenced by significant changes both in motor insurance legislation and consumer protection laws, as well as the macroeconomic situation in the country.

First of all, since the decisions of the Russian Supreme Court Plenum made in 2012–2013, motor insurance services have been covered by the Consumer Protection Law (see Plenum (2013)). By the end of 2013, these decisions resulted in a dramatic increase in the legal costs of Russian motor insurance companies because in most cases court decisions were in favour of the customers. Consequently, in early 2014, the CMTPL loss ratio in several Russian regions exceeded 100% (see Yumabaev (2015)).

This forced some motor insurance companies to leave the loss-generating regions. Other motor insurance companies had to compensate for their CMTPL losses through other types of motor insurance, in particular motor own damage insurance (MOD), and to impose additional insurance products on their customers, which in turn caused a negative reaction from the customers.

Simultaneously, from the middle of 2012, there was an extended discussion on the necessity to increase the CMTPL limits of indemnity, which had remained unchanged since their first introduction in 2002 (see 40-FZ (2002)). Finally, in October 2014 and April 2015 the CMTPL limits of indemnity were increased by more than 200% in total (see 223-FZ (2014)) in order to take into account the increased prices for spare parts and for maintenance and repair services.

Correspondingly, Russian motor insurance companies lobbied for an increase in the CMTPL tariffs, which are rigid and set centrally by the Central Bank of Russia (put in charge of CMTPL insurance in Russia in September 2013). It was only in October 2014 that the CMTPL tariffs were increased, initially by 30%, and then again by almost 60% in April 2015.

But a sharp depreciation of the Russian national currency (rouble) at the end of 2014 worsened the situation in Russian automotive market dramatically. Vehicles and repair parts of foreign production became much more expensive, thus increasing the expenses of Russian motor insurance companies. Moreover, the currency depreciation caused a significant decrease in the demand for new cars: in the 1st quarter of 2015 it dropped by 40% YoY (see Lomakin (2015)). This virtually ceased the inflow of new clients to the motor insurance companies. As a result, in the 1st quarter of 2015 the total premium income from MOD policies decreased by 12% YoY (see Fig. 1.1).

The circumstances described above, along with low concentration and high competition levels in the Russian motor insurance market, pushed the motor insurance companies into a very difficult position in early 2015. Most of the motor insurance companies had to resort to what was, in fact, the only available way to compensate for their losses: a significant increase in the prices of MOD policies. In some cases, the increase in MOD policy prices amounted to about 50% (see Lavrentev (2015)).

Thus, in this research we focus on the MOD insurance in the Russian motor insurance market. As shown above, the prominent increase in the prices of policies in the MOD insurance market was influenced by a number of factors, including changes in the CMTPL insurance market (although the CMTPL prices themselves are rigid and strictly regulated by the Central Bank of Russia). Nevertheless, we share the opinion of some experts in the Russian insurance business that the best way to avoid over-pricing of MOD policies is a more accurate risk assessment that takes into account an extended range of possible factors influencing claim frequency and claim severity in the Russian MOD insurance market (see Zhilkina (2014)).

Nowadays, in Russian MOD insurance the following factors are typically taken into account: driver’s attributes (experience, age, claim records); vehicle’s characteristics (brand, model, year of production, whether it was bought on credit, etc.); policy conditions (franchise, vehicle’s value).

Sometimes Russian motor insurance companies consider additional factors such as driver’s gender (which is legal in Russia, as compared to, for example, the European countries where it is forbidden to discriminate between drivers by gender, see Commission (2011)), and vehicle’s mileage (which is known as pay-as-you-drive insurance in the UK and the USA).

Thus, a better understanding of the actual impact of all these factors on claim frequency and claim severity could help Russian motor insurance companies to improve their financial position in the current critical situation. It is important to stress that Russian motor insurance companies have flexibility only in changing the MOD tariffs, thus compensating, in particular, for the negative situation in the CMTPL insurance market.

Thus, the aim of this research is to determine the factors significantly influencing insurance claim patterns in MOD insurance in Russia. To this end, we apply appropriate statistical models of both claim frequency and claim severity classes to a modern MOD insurance dataset. We also seek to find out which of these factors should be considered by Russian motor insurance companies to optimize their client portfolios.

The data for this research was provided by one of the largest motor insurance companies based in St. Petersburg, Russia. The name of this company remains undisclosed according to the company’s confidentiality policy.

The structure of this thesis is as follows. Chapter 1 focuses on the Russian motor insurance market. The peculiarities of the CMTPL and the MOD insurance markets are listed and the structure of the MOD market is discussed. Moreover, the current tendencies in the motor insurance market are considered. Namely, the influence of major changes in legislation and of the macroeconomic situation on the Russian motor insurance market is described in detail.

Chapter 2 focuses on the statistical theory that forms the basis of this research. It reviews the relevant literature on the applications of GLMs in motor insurance and on claim frequency and severity models. The chapter also presents a theoretical background on the GLMs, the hurdle model and the gamma specification for severity models.

Chapter 3 focuses on the empirical part of the research. It describes the data characteristics, the methodology of resampling and the results of exploratory analysis. Furthermore, it discusses the research hypotheses about the influence of different factors on claim frequency and claim severity. Finally, the chapter contains the results of estimated models within a resampling framework and the comments on interpretation of these results.

Our conclusions suggest a number of recommendations for Russian motor insurance companies regarding the factors that should be taken into account in MOD tariffs composition.


1 Modern motor insurance market in Russia

1.1 Russian motor insurance market: peculiarities and current structure

This section focuses on peculiarities of the Russian motor insurance market. In particular, the section briefly describes CMTPL and MOD insurance in Russia and illustrates a close relation between the two insurance segments. The reason for focusing here on the link between the compulsory and voluntary motor insurance in Russia is that these segments are closely connected (as discussed below). They are not only affected by common factors, but they also have influence on each other.

1.1.1 CMTPL in Russian motor insurance market

Compulsory motor third party liability insurance (CMTPL) is regulated by the Federal Law No. 40 ‘On Compulsory motor third party liability insurance’, the edition of 04.11.2014. The law covers vehicles that travel at a speed of over 20 km per hour. The object of the CMTPL insurance is a property interest that concerns vehicle owner’s liabilities to a third party. The law addresses risks of personal injuries as well as risks of damage to a property of third party when the vehicle is in use in the Russian Federation. Special military equipment is excluded (see 40-FZ (2002)).

Currently, the CMTPL insurance coverage for the risk of personal injuries is 500 000 rub. (approximately €8 300, as of May 2015); for the risk of property damage the coverage is 400 000 rub. (approximately €6 700, as of May 2015). Generally, the duration of an insurance contract is 1 year.

CMTPL premium calculation is based on the information provided, taking into account the data from the Automated Information System of Russian CMTPL insurance. The system was brought into action on 01.01.2013. It gathers data on every driver that has ever been insured by CMTPL insurance. The premium depends on a base tariff rate and a number of coefficients. The value of the base tariff ranges from 3 432 roubles to 4 118 roubles for individual clients (approximately €57–€69, as of May 2015). The values of coefficients are determined by:

• vehicle motor power;
• usage period;
• policy duration;
• the number of drivers;
• age and driving experience of every driver;
• region of prior use;
• claim record.
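The premium rule described above (a base tariff adjusted by a set of coefficients) can be sketched as follows. This is only an illustration: it assumes that the coefficients enter multiplicatively, and the function name `cmtpl_premium` as well as all coefficient values are hypothetical placeholders rather than figures from the regulation. Only the base tariff range is taken from the text.

```python
from math import prod

# Base tariff range for individual clients quoted in the text (roubles).
BASE_TARIFF_MIN = 3432
BASE_TARIFF_MAX = 4118


def cmtpl_premium(base_tariff, coefficients):
    """Return the base tariff times the product of all rating coefficients.

    Assumes a multiplicative structure; `coefficients` maps a factor
    name to its (hypothetical) coefficient value.
    """
    if not BASE_TARIFF_MIN <= base_tariff <= BASE_TARIFF_MAX:
        raise ValueError("base tariff outside the quoted range")
    return base_tariff * prod(coefficients.values())


# Hypothetical coefficient values, for illustration only.
example_coefficients = {
    "motor_power": 1.1,
    "usage_period": 1.0,
    "policy_duration": 1.0,
    "number_of_drivers": 1.0,
    "age_and_experience": 1.7,
    "region": 1.8,
    "claim_record": 0.95,
}

example_premium = cmtpl_premium(3432, example_coefficients)
print(f"Illustrative premium: {example_premium:.2f} roubles")
```

Under this assumed structure, a coefficient equal to 1 leaves the premium unchanged, while values above or below 1 act as a surcharge or a discount respectively.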


The CMTPL insurance segment is strictly controlled by the government. However, a single fixed value of the base tariff was first extended to a 5% value range from 01.10.2014 and then to a 20% value range from 12.04.2015 (for details refer to section 1.3.1). Thus, more liberty was given to the insurance companies in terms of tariff calculation. Liberalization of tariffs was aimed at improving the financial state of the CMTPL market and creating a healthy level of competition.

One of the peculiarities of the CMTPL insurance market in Russia is the fact that it has become unprofitable. According to the data of the Russian Association of Motor Insurers (RAMI), the financial troubles of the CMTPL market may cause over 3.5 billion roubles of losses (see Smirnov (2015)).

1.1.2 MOD in Russian motor insurance market

In this study, by MOD insurance we understand the insurance of road vehicles with coverage of any damage to the insured vehicle with the exception of the damage wilfully inflicted by a policy holder. The two major groups of risks covered by the MOD insurance are ‘Theft’ and ‘Damage’. The ‘Theft’ risk covers accidents when the insured vehicle is stolen and the insurer is obliged to compensate full amount of coverage. Insurance companies often limit the coverage, for instance, by excluding cases when a start key is left inside the car.

The ‘Damage’ group of risks differs significantly from one insurer to another. The MOD insurance is voluntary; therefore, it is not so heavily regulated by legislation. However, the minimum standards of MOD insurance are now being developed (see also section 1.3.1). The basic risks from the ‘Damage’ category are road accidents and third-party wrongdoing risks. Natural disasters are not always covered by an MOD insurance policy; however, this option is offered by several insurance companies.

Typical exceptions to MOD insurance coverage are vehicle damages caused by additional freight loading or pets transported in passenger compartment, damage that occurred during racing or other competitions that involve vehicle usage. Compensation is not provided if the driver does not have the driving license at the moment of the accident or if the driver is not included in the list of drivers in the insurance contract.

The MOD tariffs depend on a number of factors that are set by the insurance company. The most typical factors are the brand, the model and the year of production of the vehicle, the number of drivers listed in the insurance contract as well as the age and driving experience of each driver. Naturally, policy terms such as franchise also affect the tariff. Less widely used factors in Russian motor insurance companies include power of the motor, driver’s gender, mileage of the vehicle. There are also quite unusual factors, such as driver’s marital status and colour of the vehicle.

As shown in Fig. 1.1, the growth rate of compensation payments in the MOD insurance market exceeded that of premium income in the fourth quarter of 2012, when negative consequences of the recent developments started to affect the market. Only in the 4th quarter of 2014 insurance companies managed to stimulate the premiums growth enough to cover the growth rates of compensations. It should also be noted that the general downward trend in growth rates illustrates a slowdown in the market development. Moreover, negative values of premiums growth rates in 2014 and 2015 indicate shrinking market in the critical conditions.

1.1.3 The relation between CMTPL and MOD insurance markets

In this subsection we consider the relation between the CMTPL and MOD insurance markets. The recent changes in legislation described below in section 1.3.1 led to a significant increase in loss ratios of both CMTPL and MOD insurance markets. The current macroeconomic tendencies discussed in section 1.3 led insurance companies into a critical situation. Therefore, the insurers had to avoid extra losses by all means available.

First, some insurance companies insistently urged customers to purchase additional insurance services along with the CMTPL policies. In particular, the suggested services included MOD policies and other voluntary insurance services. Insurers also tried to limit the number of CMTPL policies sold per day and even refused to sell CMTPL policies. However, it should be noted that these desperate measures were only applied in loss-generating regions.


Figure 1.1: Premium income and compensation payments growth, Russian MOD insurance market

Source: Media-Information Group ‘Strakhovanie Segodnya’ (see MIG (2015))

Second, in order to compensate for extra losses in the CMTPL segment, insurers raised the MOD tariffs. On the one hand, CMTPL tariffs are controlled by the Central Bank of the Russian Federation, so there is no opportunity to compensate for the new losses by raising the CMTPL tariffs. On the other hand, the MOD insurance has the benefit of non-regulated tariffs. However, high MOD tariffs negatively affect demand for the insurance services. Thus, although increasing the MOD tariffs is the simplest way to cover extra losses, insurers can also try to attract new clients.

One of the most applicable and efficient ways of avoiding extra losses is to apply more accurate client selection procedures. Many experts hold the opinion that a more thorough risk assessment contributes greatly to better financial results (see Barykina (2015)). This approach encourages companies to create fairer and more accurate tariffs by extending the number of factors considered for tariff calculation.

Thus, CMTPL and MOD insurance markets are closely connected. Changes in the legislation affect both segments. The critical situation in the CMTPL market triggers an increase in MOD tariffs. Therefore, compulsory and voluntary motor insurance markets should be considered together.

1.2 Current structure of Russian motor insurance market

The motor insurance market in Russia has a number of features that are relevant to this research. The first feature is a large number of companies operating in the market. In the 1st quarter of 2015 there were over 130 companies licensed for MOD policies sales (see MIG (2015)). The number of the insurance companies is comparable to the UK motor insurance market: in 2014 there were 140 MOD insurance companies in the UK (see Miller (2015)). Evidently, in terms of territory Russia is much larger than the UK, but the difference in number of cars between the two countries is not so profound. In 2014 there were almost 30 mln. passenger vehicles registered in the UK (see DTUK (2015)) and 43.3 mln. vehicles registered in Russia (see SAIRF (2015)).

The second feature is the structure of the MOD insurance market in Russia. The pie chart in Fig. 1.2 represents the structure of the MOD insurance market in Russia in 2014 in terms of the total premium income. As can be seen from the diagram, the five largest companies take over 50% of the market. The ten largest insurers hold about 75% of the market. Thus, a quarter of the MOD market is shared by over 130 small companies. Consequently, the competition level among these small companies is very high and competition among the large insurers is also considerable.
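The concentration figures above can be turned into a rough back-of-the-envelope bound on the average market share of a small insurer. The sketch below uses only the approximate percentages quoted in the text; the variable names are illustrative, and because the text gives "over 50%" and "over 130" the results are bounds rather than exact values.

```python
# Approximate shares of total MOD premium income quoted in the text (percent).
top5_share = 50.0           # "over 50%", treated as a lower bound
top10_share = 75.0          # "about 75%"
small_company_count = 130   # "over 130", treated as a lower bound

# Share left for the companies outside the ten largest insurers.
remaining_share = 100.0 - top10_share

# With at least 130 small companies, the average share of one of them
# cannot exceed this value.
avg_small_share = remaining_share / small_company_count

# Insurers ranked sixth to tenth hold at most this much on average.
avg_rank_6_to_10 = (top10_share - top5_share) / 5

print(f"Average small-company share: at most {avg_small_share:.2f}%")
print(f"Average share, ranks 6-10: at most {avg_rank_6_to_10:.1f}%")
```

The gap between the two averages gives a sense of how fragmented the lower end of the market is compared with the group of large insurers.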


Figure 1.2: Russian MOD insurance market structure, total premium income in 2014.

Source: Media-Information Group ’Strakhovanie Segodnya’ (see MIG (2015))

It should be noted that lack of strict regulation on MOD tariffs encourages a significant level of competition in the Russian MOD insurance market. Price-based competition tends to be destructive in the insurance field: there is a minimum justified tariff based on actuarial calculations that limits discount opportunities. Therefore, insurance companies try to attract new customers by offering additional services such as average surveyor advice and a simplified claim application process. Moreover, according to the data of the Russian Automotive Market Research, in 2014 only 12% of car owners in Russia had a MOD policy (see Morozov and Nefedov (2015)). This number demonstrates that market reach is still relatively low and insurance companies have a lot of opportunities to attract new clients and improve the financial state of the company.

The relative sizes of CMTPL and MOD markets are quite different in Russia and the European countries. The CMTPL insurance market in Russia comprises around 40% of the premium income of the motor insurance market in total, while in European countries it accounts for approximately 60% of the market (see Miller (2015)). Moreover, the number of cars per capita differs: in Russia every third person on average owns a car, while in the UK every second person does, so the market exposure in the motor segment is lower in Russia (see DTUK (2015) and SAIRF (2015)).

It is also noteworthy that in Russia the reaction of insurance companies to both internal and external shocks is mainly reflected through changes in MOD tariffs. Premium income from the MOD segment exceeds 50% of the total premium income from voluntary non-life insurance in 5 out of the 10 largest insurance companies in Russia (see MIG (2015)). Consequently, the MOD insurance market in Russia does not only reflect the dynamics of the motor insurance market, but it also constitutes an important element of the insurance industry as a whole.

1.3 Current tendencies in motor insurance market

This section describes recent changes in legislation and major macroeconomic tendencies, and lists the resulting trends in motor insurance. It also draws conclusions from the current situation in the market and suggests how the current negative tendencies and trends could be avoided by applying appropriate data analysis.


1.3.1 Influence of changes in legislation on Russian motor insurance market

A number of significant changes in legislation were introduced in 2012–2015, which influenced both CMTPL and MOD insurance sectors. The following essential innovations were adopted by the legislative bodies: increases in limits of indemnity and tariffs in the CMTPL insurance, extension of consumer protection law to cover CMTPL and MOD insurance contracts, and also a number of minor but influential changes.

Indemnity limits and tariffs in CMTPL. The most outstanding and controversial alteration implied a large increase in limits of indemnity and tariffs for the CMTPL insurance. Essential points of a two-year-long discussion on indemnity limits and CMTPL tariffs follow below.

The CMTPL legislation reform act was adopted on 21.07.2014 (see 223-FZ (2014)). The changes in the limits of indemnity were suggested because the limits had not been modified since the initial introduction of the Law on CMTPL in 2002. For over 10 years they had been adjusted neither to inflation nor to the structural changes in the Russian automotive market. The main goal of the CMTPL insurance is to protect the interests of the damaged party in a road accident; therefore, the suggested changes can be seen as very beneficial from the customer’s side.

However, insurance companies were not optimistic about the coming changes, since the base CMTPL tariff had not been adjusted for over 10 years. Insurers suggested increasing the base tariff in order to compensate for the new limits and the negative tendencies that took place in the Russian insurance market (more details follow in section 1.3). The estimates of the necessary increase ranged from 40% to 70% (see Evplanov (2013)). Only external experts from ‘Towers Watson’ were able to put an end to the long-lasting discussion between the regulator (the Central Bank of the Russian Federation) and insurance companies. These experts demonstrated that the average CMTPL premium should be raised by 26%–31%, while the base tariff should only be increased by 11.8% in order to reach a break-even point (see Nehaichuk (2014b)). As a result, the following changes were implemented.

The first increase in limits of indemnity concerned the vehicle damage risk. It came into force on 01.10.2014 and increased the maximum of potential compensation to over three times the initial amount: from 120 000 roubles to 400 000 roubles. The second increase in indemnity limits concerned the risk of personal injuries. It came into force on 01.04.2015 and multiplied the previous compensation limit by 3.125: the limit increased from 160 000 roubles to 500 000 roubles.

Both changes in the limits of indemnity were followed by a respective rise of the base CMTPL tariff. Before these changes became operational, the base tariff for CMTPL was fixed to a value of 1980 roubles. The first tariff change that came into force on 01.10.2014 comprised a 23%–30% upturn in the base tariff and set it to a tariff range of 2 440–2 574 roubles. The second change came into force on 12.04.2015 and increased the tariffs by 40%-60% up to the current level.
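The percentages quoted for the indemnity limits and the base tariff can be cross-checked with a few lines of arithmetic. The sketch below uses only the rouble amounts given in the text; it assumes that the "current level" of the base tariff is the 3 432–4 118 rouble range quoted in section 1.1.1, and the helper name `pct_change` is illustrative.

```python
def pct_change(old, new):
    """Percentage change from old to new."""
    return (new - old) / old * 100.0


# Limits of indemnity (roubles), as quoted in the text.
property_limit_change = pct_change(120_000, 400_000)
injury_limit_change = pct_change(160_000, 500_000)

# Base tariff (roubles): fixed value before the reform, range from
# 01.10.2014, and range from 12.04.2015 (assumed from section 1.1.1).
old_base = 1980
first_min, first_max = 2440, 2574
second_min, second_max = 3432, 4118

first_rise = (pct_change(old_base, first_min), pct_change(old_base, first_max))
second_rise = (pct_change(first_min, second_min), pct_change(first_max, second_max))

# Width of each tariff range relative to its lower bound.
first_width = pct_change(first_min, first_max)
second_width = pct_change(second_min, second_max)

print(f"Property damage limit: +{property_limit_change:.1f}%")
print(f"Personal injury limit: +{injury_limit_change:.1f}%")
print(f"First tariff rise: {first_rise[0]:.1f}% to {first_rise[1]:.1f}%")
print(f"Second tariff rise: {second_rise[0]:.1f}% to {second_rise[1]:.1f}%")
print(f"Range widths: {first_width:.1f}% and {second_width:.1f}%")
```

Under this assumption the computed rises fall within the 23%–30% and 40%–60% intervals stated above, and the range widths agree with the 5% and 20% value ranges mentioned in section 1.1.1.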

Consumer protection law coverage. According to the decision of the Russian Supreme Court Plenum which was made on 27.06.2013, motor insurance services are also covered by the consumer protection law (see Plenum (2013)). In fact, it implied that insurance companies are obliged to pay out compensations even for claims that do not amount to any insured event and do not satisfy policy wording. Thus, it affected the whole motor insurance market greatly. This decision led to an enormous increase of legal costs among insurers. Fig. 1.3 illustrates the dynamics of the share of CMTPL claims settled in court. As shown in Fig. 1.3, the plenum decision described above caused the share of CMTPL claims settled in court to more than double in three years.

Unfavourable legal practice affected both CMTPL and MOD markets significantly. Premium income of these segments sums up to 40%–50% of non-life insurance portfolios in many companies. The first way for insurers to avoid a sudden rise in costs was to increase MOD tariffs (in order to compensate for losses in both compulsory and voluntary insurance). Thus, over 2014 MOD tariffs rocketed by 40% relative to the previous year (see Lavrentev (2015)). The legal practice was not the only reason for insurers to increase tariffs (see section 1.3); however, it contributed greatly towards this tendency.

Automotive lawyers. It is worth noting that the adjustments to the consumer protection law are very controversial. On the one hand, they were aimed at protecting the policy holders’ interests. Indeed, after the introduction of these adjustments the average CMTPL compensation increased by 20% YoY in the 3rd quarter of 2013, while the average CMTPL premium only increased by 7% (see Morozov and Nefedov (2015)).


Figure 1.3: Proportion of CMTPL claims settled in court, %

Source: an annual report by the Russian Association of Motor Insurers (see RAMI (2015))

On the other hand, experts believe that most of the penalties paid as a result of a legal action against an insurance company are not received by the insured clients directly, but they become a vast income of ‘automotive lawyers’ (see Komleva and Yanin (2013)). Activities of such intermediaries are based on massive collection of claims from dissatisfied clients of insurance companies. Usually lawyers buy the claims for a price equal to a compensation that would satisfy the client. However, by winning numerous lawsuits lawyers get not only the adjusted compensation, but also fines and penalties that the insurance companies are obliged to pay. Thus, the aforementioned change in legislation encouraged significant growth of the ‘automotive lawyers’ industry.

Limiting the CMTPL market share. Moreover, the changes discussed above prompted insurance companies to cut their CMTPL business in unprofitable regions (subordinate entities in the Russian Federation). Several large companies decreased their share of the market. For instance, the third largest company ‘Ingosstrakh’ decreased its market share by almost 30% from 2013 to 2014 (see MIG (2015)). Other companies closed their offices in unprofitable regions and left regional markets. Although CMTPL insurance accounts for around 15% of the total insurance market, it is very influential (see RAMI (2015)). In fact, the CMTPL sector defines the regional presence of insurance companies; therefore, it is especially significant for the regional network.

Pre-court dispute resolution in CMTPL. After the regulator considered the situation, the following adjustments to the Law on CMTPL were introduced in July 2014 (along with changes in indemnity limits and tariffs). First, on 01.09.2014 pre-court dispute resolution procedures came into action and became obligatory in CMTPL insurance. Thus, a customer who is not satisfied with the compensation has to notify the insurance company before starting a legal action.

Standards of compensation calculation in CMTPL. Second, since 01.12.2014 calculation of CMTPL compensation has to be based on the unified standard methodology. Some studies predict that the introduction of the new calculation methodology will decrease the number of legal actions by 70%–80% (see Nehaichuk (2014a)). After these modifications insurers have taken a more optimistic view of established practice; however, the MOD insurance market is still in a critical situation.

Single option direct claim settlement in CMTPL. A number of additional developments were implemented in 2013-2015. One of them involves the introduction of the single option direct claim settlement in the CMTPL sector. According to the newly introduced amendments to the law ‘On Compulsory motor third party liability insurance’, since 02.08.2014 a policy holder can only submit a claim to the company where he is insured (see 223-FZ (2014)). Before this innovation customers could choose between submitting a claim to their insurance company or to the company where the guilty person is insured.

This change influences the motor insurance market in two ways. First, policy holders should be more careful while choosing the insurance company. Consequently, insurance companies should increase the quality of their services in order to attract new clients. Second, insurers themselves should employ more careful selection procedures while constructing their portfolios. Now the companies do not have an option to refer the insured person to the guilty party’s insurance company in order to settle the claim. Therefore, this development is beneficial not only for customers, but for the motor insurance market in general.

Loss of commodity value compensation in CMTPL. Another innovation concerns the loss of commodity value. In accordance with the decision of the Supreme Court Plenum from 29.01.2015, insurance companies are obliged to include the loss of commodity value into the CMTPL compensation (see Plenum (2015)). Moreover, insurers are also obliged to compensate damage to any freight that was carried by the vehicle at the moment of an accident. The new CMTPL tariffs did not take these developments into account; therefore, the decision of the Supreme Court encouraged insurers to raise the MOD tariffs in order to compensate for extra losses. The head of the Russian Insurers Union, Igor Yurgens, argued that the effect of this decision on the CMTPL market should be analysed carefully and CMTPL tariffs should develop accordingly (see Kostromina (2015)).

Universal MOD insurance standards. One of the longest-awaited changes is the introduction of the universal MOD insurance standards. At the moment this document is under development. However, the main goal of this regulation and its major aspects are already clear. The universal MOD insurance standards are analogous to CMTPL regulations: they are aimed at a clarification of the relationships between the insured person and the insurance company.

On the one hand, the document establishes claim settlement procedures and minimum covered risks, identical for all insurers. Thus, every policy holder will be aware of the rules applying to the insurance contract regardless of the insurance company. On the other hand, exemptions from coverage are specified, so that insurers will not suffer extra losses due to numerous legal actions. The universal standards should lower the number of legal actions that concern claim settlements on the MOD insurance market. Moreover, they will form a base for a clearer and more transparent tariff formulation.

1.3.2 Influence of the current macroeconomic situation on Russian motor insurance market

The insurance industry is tightly linked to the global macroeconomic conditions and the influence of the global environment becomes most obvious during crisis periods. For instance, decreasing prices of crude oil caused an enormous drop in currency exchange rates in December 2014 — January 2015, which weakened the rouble significantly.

Depreciation of national currency. The weak national currency affected the motor insurance market in several ways.

1. It caused a significant upturn in prices of foreign manufactured vehicles: prices of new cars in Moscow increased by up to 30% from October 2014 to February 2015 (see Noskova (2015)).

2. It caused a sharp increase in repair costs (including higher prices for repair parts).

3. It also caused a noticeable decrease in demand for new cars. Thus, in January–April 2015 the number of new vehicles sold in Russia dropped by almost 40% YoY (see Lomakin (2015)).

4. As a result of the tendencies mentioned above, MOD tariffs increased significantly. On average, MOD tariffs were increased by 40% over 2014 (see Rilling et al. (2015)).

Lower real income. Moreover, the macroeconomic situation affected real income. Lower real income combined with higher prices of MOD policies led to a decline in demand for MOD insurance services. Experts predict a 5% drop in premium income from MOD insurance in 2015 (see MIG (2015)). Furthermore, the negative macroeconomic environment caused a significant increase in interest rates, thus leading to a decrease in demand for loans. According to the data provided by the Russian Public Opinion Research Centre, only 35% of respondents who took part in a public opinion poll in 2015 expressed their wish to buy a car on credit, in comparison to 58% in 2014 (see Noskova (2015)). This tendency caused another drop in demand for MOD policies, since a MOD policy is a customary requirement within a loan agreement.

Foreign insurers are leaving Russian motor insurance market. The tendencies listed above combined with the crisis in the CMTPL insurance market caused several foreign companies to leave the Russian market: Zurich and AIG left the Russian retail insurance market in the second half of 2014 (see Anisina (2014)). Certainly, such developments do not appear to be very promising. However, the critical situation stimulates novel approaches. Thus, considering the difficult circumstances, insurance companies launched and promoted more customized insurance products, such as incomplete MOD policies with a limited number of risks covered or with a wider choice of franchise amounts. Moreover, telematics based insurance (so called ‘smart MOD insurance’) is offered by a number of insurers as a way to lower the price of the MOD policy.

1.4 Chapter conclusions

This chapter described the modern state of the Russian motor insurance market. The chapter focused first on peculiarities of the CMTPL and MOD insurance markets as separate segments and then determined the relation between the two markets. The current structure of the Russian motor insurance market is characterized by low concentration and high competition level. Recently introduced changes in legislation and unfavourable macroeconomic tendencies caused losses in both CMTPL and MOD insurance markets.

Therefore, insurance companies had to take measures in order to prevent extra losses in motor insurance. Moreover, insurers increased MOD tariffs in order to compensate for the negative tendencies. Increasing tariffs is the most obvious, but not necessarily the best way of dealing with the critical situation. In particular, studies suggest that accounting for additional factors while pricing MOD policies contributes to better financial results.

The models for claim frequency and severity analysis

This chapter is focused on the theoretical aspects of generalized linear models that constitute the basis of this research. The chapter is structured as follows. First, a literature review is presented. Then, a general framework of the generalized linear models is described. Furthermore, claim frequency modelling is considered, in particular, the hurdle model is discussed. The next part contains specifications of claim severity analysis, namely, the gamma distribution in the GLM setting. Assumptions and modelling procedures are described for each model, as well as peculiar features that make these models especially useful in actuarial science.

Consider the following terms and notation for the two types of models employed in this research:

Table 2.1: Modelling framework terms

| Term | Notation | Frequency models | Severity models |
|---|---|---|---|
| Observation | $i = 1, \ldots, n$ | one policy | one policy |
| Dependent variable | $X_i$ | number of claims | total sum of claims |
| Weight (exposure) | $\omega_i$ | policy years | number of claims |
| Key indicator | $Y_i$ | claims frequency | claims severity |

This notation will be used throughout the whole thesis. Thus, if the dependent variable is the number of claims, it is reasonable to take policy years as the weight. For instance, if the duration of the policy is 1 year and the whole year is included in the time period under consideration, then $\omega = 1$. This being the case, $Y_i = X_i/\omega_i$ is the frequency of the claims. For severity of claims as the key indicator, the total sum of claims for a particular policy would be the dependent variable and the number of claims would be the weight. Independent variables in both types of models include driver’s attributes (experience, age, etc.), vehicle’s characteristics (brand, model, whether it was bought on credit, etc.), and policy conditions (franchise, for instance).
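The relation between the dependent variable, the weight and the key indicator can be illustrated with a minimal Python sketch. The `Policy` record, its field names and the three policies below are hypothetical and serve only to show how $Y_i = X_i/\omega_i$ is computed in the two settings; they are not taken from the data used in this research.

```python
from dataclasses import dataclass


@dataclass
class Policy:
    policy_years: float   # weight (exposure) in the frequency model
    n_claims: int         # dependent variable X_i in the frequency model
    total_claims: float   # dependent variable X_i in the severity model


def claim_frequency(p: Policy) -> float:
    """Y_i = X_i / w_i with X_i = number of claims, w_i = policy years."""
    return p.n_claims / p.policy_years


def claim_severity(p: Policy):
    """Y_i = X_i / w_i with X_i = total claim amount, w_i = number of claims.

    Returns None when the policy has no claims (weight zero), since the
    average claim amount is then undefined.
    """
    if p.n_claims == 0:
        return None
    return p.total_claims / p.n_claims


# Hypothetical policies, invented for illustration only.
portfolio = [
    Policy(policy_years=1.0, n_claims=2, total_claims=90_000.0),
    Policy(policy_years=0.5, n_claims=1, total_claims=30_000.0),
    Policy(policy_years=1.0, n_claims=0, total_claims=0.0),
]

frequencies = [claim_frequency(p) for p in portfolio]
severities = [claim_severity(p) for p in portfolio]
```

Note that a policy without claims contributes a zero to the frequency analysis but carries zero weight in the severity analysis, which is why the sketch returns no severity value for it.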

2.1 Literature review

This section focuses on the literature relevant to the research. It reviews associated literature: text-books and articles from periodic publications on GLMs, frequency and severity models. Frequency models are based on count data adjusted for duration of the insurance contract, while severity models represent continuous data with excess zeros.

2.1.1 Literature on GLMs

GLMs became a widely popular concept in the 1980s. Therefore, the GLMs framework is covered in a large amount of literature in different fields. However, this research focuses on the application of GLMs in the insurance context.

Renshaw A. Modelling the claims process in the presence of covariates (1994). The main purpose of this article is to illustrate the significant potential of GLMs for modelling the claims process. It discusses a number of beneficial features of these models. In particular, the article lists a variety of distributions available within the framework. Due to a large number of fitting distributions, a variety of modelling techniques and some specific components in GLMs (such as the link function) the general modelling concept is presented as very flexible.

Two major types of models are discussed in detail: claim frequency and claim severity models. Theoretical contemplations are supported by an illustrative example of a leading UK insurance company’s data. Different sets of risk factors are considered for the two types of models, while goodness of fit is being assessed. The author also examines interaction effects and comments on the significance of interaction between the vehicle type and the policy holder’s age.

Relevance of this paper to the current research is quite obvious, as it covers the most essential theoretical background. Moreover, it is one of the first comprehensive studies that employ motor insurance data as an illustration of the GLMs framework. For more detail refer to Renshaw (1994).

Haberman S., Renshaw A. Generalized Linear Models and Actuarial Science (1996). This paper takes a global perspective on the role of GLMs in actuarial science. The main goal of this article is to demonstrate the high potential of GLMs, which is not restricted to modelling only motor insurance premiums. Thus, the authors consider the following practical applications of GLMs:

• survival modelling;

• multiple-state models in health insurance;

• claim severity modelling in non-life insurance;

• mortality and life-insurance lapse modelling;

• non-life insurance premium rating;

• reserves modelling in non-life insurance.

In total, the authors present six major implementations of the GLMs framework. Thus, they implement the Gompertz-Makeham survival model within the GLMs framework, describe a permanent health insurance model, model a number of parametrized distributions, implement Cox’s model, analyse frequency and severity of claims and calculate reserves in non-life insurance.

This paper is especially relevant to the current research for two main reasons. First, the broad perspective taken in the study reflects the high potential of GLMs and allows examining the framework from a different perspective. Second, the part on non-life insurance provides essential theoretical background.

Evidently, the list of references on GLMs presented above is not comprehensive. One of the first major monographs on the subject is ‘Generalized Linear Models’ by McCullagh and Nelder (1989). There are also a number of more recent textbooks that are not focused solely on GLMs, but pay a great deal of attention to the concept. Among them are Ohlsson et al. (2010), Kaas et al. (2008), Klugman et al. (2004), Zeileis et al. (2008). For more detail on the theory of GLMs and its modifications refer to the textbooks listed above.

2.1.2 Literature on frequency models

Mullahy J., Specification and Testing of Some Modified Count Data Models (1986). This paper provides a very detailed introduction to modified count distributions. In particular, it suggests that data generating processes underlying zero observations might differ from those underlying the positive count observations. The author pays special attention to the properties of the alternative models, such as the ability to cover underdispersion and overdispersion, which are typical of data in economic and social studies.

The theoretical background of hurdle models is described in much detail. Maximum likelihood estimates for model parameters and specification tests are also discussed in the paper. Poisson and negative binomial specifications are suggested for every type of model. Data on beverage consumption is employed in order to illustrate the formulated models and tests. The results obtained from the practical example demonstrate that hurdle models and similar specifications provide a better fit to data with excess zeros. They also demonstrate that estimates from different model types are comparable.

The paper is highly relevant because it explains the fundamental theory of the models employed in this research. It is also especially valuable due to the application of specification tests and the demonstration of benefits of the augmented count models. For more detail refer to Mullahy (1986).

Zeileis A., Kleiber C., Jackman S. Regression Models for Count Data in R (2008). In comparison to the previous paper, this article is rather technical. The main purpose of the paper is to discuss properties of count variable models within the R programming language. The primary focus of this paper is on hurdle and zero-inflation models, which handle such problems as overdispersion and excess zeros typical of economic data.

The authors provide a brief description of the underlying theory in traditional count data models as well as zero-augmented models. The GLMs framework is introduced and followed by special cases of Poisson, quasi-Poisson, negative binomial, hurdle and zero-inflation models. Moreover, the R package pscl is introduced. The package contains implementation of suggested models in R. It does not only include basic commands for model building, but also a number of prediction and validity assessment techniques.

In order to illustrate the application of the models, the authors employ an example on demand for medical care. Based on the example they introduce some basics of exploratory analysis in R, build the models under consideration and interpret the results. A comprehensive comparison of the models is also presented in the paper. It illustrates that hurdle and zero-inflation models provide a significantly better fit for data with a large number of zero observations. In general, the authors conclude that the hurdle model appears to be the best for the particular data sample, since its quality is as high as that of the zero-inflation model, but the interpretation of the hurdle model is easier to understand.

This paper is highly relevant to the research since it not only describes the theoretical basis, but also offers particular ways of implementation for the models under consideration. It is especially important due to the lack of literature relevant to this subject. Moreover, the suggested validity assessment techniques for these models are valuable, as they are not described in detail in any other source. For more detail refer to Zeileis et al. (2008).

A number of scientific studies on relevant topics are also considered in this research. Among them are Sant (1980) on multiplicative multivariate loss models, the second edition of ‘Regression analysis of count data’ by Cameron and Trivedi (2013), and others. For more detail on the count data models refer to the sources listed above.

2.1.3 Literature on severity models

Frees E., Valdez E., Hierarchical Insurance Claims Modelling (2007). This paper describes a comprehensive research on claims modelling. Similarly to most of the papers on claim severity, this article does not only discuss severity distributions, but also combines it with frequency models. The main goal of the paper is to determine risk factors that influence features of claim processes. The authors build a hierarchical model that consists of the three corresponding components: frequency, severity and type of motor insurance claim.

The authors pay special attention to the detailed panel data on insurance records. The complex structure of this data allows constructing a comprehensive hierarchical model of motor insurance claims. It is shown in the paper that driver’s age and gender, as well as the age and type of the vehicle are statistically significant in the analysis of claim frequency within the negative-binomial claim counts model. Moreover, year of the claim, vehicle age and driver’s age are significant predictors of claim severity fitted with the generalized beta distribution of the second type. Again, year of the claim and vehicle age in combination with vehicle type are significant for determining the type of the claim.

The paper under consideration contributes to the current research in quite a general manner. Thus, the authors use the generalized beta distribution of the second type, while a special case of it (gamma distribution) is employed in this research. However, this paper provides a global, but detailed view on the subject matter: considering all three components illustrates the interrelation between them, which could become a subject for future research. For more detail refer to Frees and Valdez (2008).

Resti J., Ismail N., Jamaan S. H. Estimation of claim cost data using zero adjusted gamma and inverse Gaussian regression models (2013). This recent article, unlike the previous study, focuses solely on claim severity models. Most of the models based on the gamma distribution do not account for zero observations that often make up the majority of insurance data. Tweedie models allow for zeros, but the probability of a zero observation is not fitted as a function of the risk factors. Therefore, the main purpose of this paper is to introduce zero-augmented gamma and inverse Gaussian models for insurance claim severity modelling.

These models not only account for zeros, but they also allow dependence of zero observations generating process on a set of regressors. Each model consists of two parts: claim probability part (probability of a zero-observation is examined) and claim cost part (when the claim amount is not a zero-observation). The authors come to the conclusion that zero-adjusted inverse Gaussian model is preferable for their dataset. They also note that for both claim frequency and claim severity analyses explanatory variables play an important role.

Despite the conciseness of the paper, it provides a very valuable insight into zero-augmented models for claim severity in motor insurance, which is somewhat analogous to the zero-augmented frequency models discussed in the previous section. It is not the main purpose of this research to employ such sophisticated models, but it might well be a subject for future consideration. For more detail refer to Resti et al. (2013).

As was mentioned earlier, most of the papers that study severity of claims also consider claim frequency. Thus, Brockman and Wright (1992) on statistical motor rating and Renshaw (1994) on modelling the claims process serve as examples of modelling claim severity within the GLMs framework. For more detail on the theoretical background of claim severity models refer to Ohlsson et al. (2010) and Klugman et al. (2004).

2.2 Generalized Linear Models

The generalized linear models (GLMs further on) represent a broad class of statistical methods applied in numerous fields. Even within one field these models are also often employed to obtain solutions for different kinds of problems. For instance, in motor insurance, this group of models can be applied for tariff justification and target marketing. This research focuses on the usage of the GLMs in order to determine significant risk factors within tariff analysis.

The aim of tariff analysis in motor insurance is to specify how a particular set of risk factors affects an indicator, such as frequency or severity of claims. At first glance, this setting resembles linear regression, where the effects of regressors on the dependent variable are studied. However, linear models are not very suitable for tariff formulation in motor insurance for the following reasons.

1. They assume normally distributed random disturbances, while the number of claims in insurance corresponds to a discrete probability distribution and severity of claims is characterized by a positively skewed probability distribution.

2. In linear models the mean of the dependent variable is a linear function of the regressors, while a multiplicative model is more relevant for insurance applications (for details refer to page 16).

E. Ohlsson and B. Johansson in “Non-Life Insurance Pricing with Generalized Linear Models” (2010) argue that GLMs generalize the traditional linear regression in two ways, each of them corresponding to one of the problems stated above.

1. Probability distribution. GLMs allow not only for a Gaussian distribution of random errors, but also for Poisson, negative binomial, gamma and several other distributions.

2. Dependent variable mean modelling. The GLM framework allows employing any monotonic transformation of the mean as a linear function of regressors in a manner of either an additive or a multiplicative model.

2.2.1 The general setting

This part of the chapter focuses on a group of models that address the first problem stated above. These are the exponential dispersion models. First, a number of assumptions suggested in Ohlsson et al. (2010) should be introduced.

1. Independence of policies. Consider a number of policies (observations). Claim frequency of one particular policy is independent of that of another. Similarly, claim severities of two different policies are independent of each other.

2. Independence over time. Claim frequencies of one policy over different non-overlapping periods of time are independent of each other, as well as claim severities.

3. Homogeneity. Consider two policies (observations) with the same values of all risk factors (regressors). Then the corresponding values of key factors of these two policies (frequency or severity of claims) are characterized by the same distribution.

If the assumptions presented above hold, then, according to Ohlsson et al. (2010), the probability distribution in exponential dispersion models is determined by the following probability density function:

$$f_{Y_i}(y_i; \theta_i, \phi) = \exp\left\{ \frac{y_i\theta_i - b(\theta_i)}{\phi/\omega_i} + c(y_i, \phi, \omega_i) \right\}, \tag{2.1}$$

where $y_i$ is a realization of a random value $Y_i$; $\theta_i$ is the fundamental distribution parameter, different for each $i$ (open interval domain), while the dispersion parameter $\phi > 0$ has the same value for all $i$; $\omega_i \geq 0$ is the “weight” of the $i$-th observation; $b(\theta_i)$ is the cumulant function (twice continuously differentiable and invertible function).

A given cumulant function and parameters $\phi$ and $\theta_i$ fully determine the distribution family (binomial, Gaussian, Poisson). The function $c(y_i, \phi, \omega_i)$ does not depend on $\theta_i$, therefore, it is not considered in much detail within the GLM framework. It includes all the components of the probability density function which do not depend on $\theta_i$.
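As an illustration of the density (2.1), the following minimal Python sketch checks numerically that the Poisson claim frequency fits the exponential dispersion form. It assumes the standard Poisson choices $\theta = \log\mu$, $b(\theta) = e^{\theta}$ and $\phi = 1$; the function names and the numerical values are hypothetical and chosen only for this example.

```python
import math


def edm_log_density(y, theta, phi, w, b, c):
    """Log of density (2.1): (y*theta - b(theta)) / (phi / w) + c(y, phi, w)."""
    return (y * theta - b(theta)) / (phi / w) + c(y, phi, w)


def poisson_b(theta):
    # Cumulant function of the Poisson case.
    return math.exp(theta)


def poisson_c(y, phi, w):
    # Collects the terms that do not depend on theta (phi = 1 here).
    x = w * y  # claim count recovered from the frequency
    return x * math.log(w) - math.lgamma(x + 1.0)


def poisson_log_pmf(x, lam):
    """Direct Poisson log-probability of x claims with mean lam."""
    return -lam + x * math.log(lam) - math.lgamma(x + 1.0)


# Illustrative values: mean frequency, policy years, observed number of claims.
mu, w, x = 0.2, 3.0, 2
y = x / w  # observed claim frequency
edm = edm_log_density(y, math.log(mu), 1.0, w, poisson_b, poisson_c)
direct = poisson_log_pmf(x, w * mu)
```

Both routes give the same log-probability, which shows how the weight $\omega_i$ (policy years) enters the density of the frequency $Y_i$ while $c(\cdot)$ absorbs everything that is free of $\theta_i$.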

2.2.2 Rating factors in a multiplicative model

Generally, tariff analysis is based on the insurer’s own data. Define a tariff cell as a set of policies with the same values of all the risk factors. Then the premium for such a cell could be estimated as an average claim amount per policy, if there is enough data in every tariff cell. However, in practice this is rarely the case, as Russian insurance companies often lack sufficient data of their own in some tariff cells. Therefore, a need for specific methods arises.

These methods should provide the expected value of premium that would be close to the estimates for tariff cells and would not differ too much between similar cells. It is suggested in Ohlsson et al. (2010) that one of the models fitting the description above is a multiplicative model, where the expected value of the dependent variable is determined by a set of rating factors.

Consider $M$ rating factors with $m_i$ denoting the number of classes (values) for every rating factor $i$. For simplicity, consider the case of two rating factors ($M = 2$), and let $i$ and $j$ denote the class of the first and the second rating factors accordingly. The tariff cell $(i, j)$ has a weight (exposure) of $\omega_{ij}$ and a value of the dependent variable $X_{ij}$. The key indicator is calculated as $Y_{ij} = X_{ij}/\omega_{ij}$.

Note that $E(Y_{ij}) = \mu_{ij}$, where $\mu_{ij}$ is the expected value, if $\omega_{ij} = 1$. Then the multiplicative model can be described as follows:

$$\mu_{ij} = \gamma_0 \gamma_{1i} \gamma_{2j}. \tag{2.2}$$

Here $\{\gamma_{1i};\ i = 1, \ldots, m_1\}$ are the class parameters of the first rating factor, $\{\gamma_{2j};\ j = 1, \ldots, m_2\}$ are the class parameters of the second rating factor, and $\gamma_0$ is a constant basic value.

The case of 2 rating factors can be easily generalized to any number of factors:

$$\mu_{i_1 i_2 \ldots i_M} = \gamma_0 \gamma_{1 i_1} \gamma_{2 i_2} \cdots \gamma_{M i_M}. \tag{2.3}$$

The two-factor model is over-parametrized in the setting above. If all the $\gamma_{1i}$ are multiplied by 10 and all the $\gamma_{2j}$ are divided by 10, then the same values of $\mu_{ij}$ are obtained with new values of parameters. In order to obtain unique values of parameters, a reference tariff cell should be specified, preferably one with a large weight. All the other tariff cells are then compared to the reference cell, which provides a relative formulation. Let the cell $(1, 1)$ be the reference cell, then $\gamma_{11} = \gamma_{21} = 1$.

Now the parameter $\gamma_0$ can be interpreted as a basic reference value: it is the value of the key indicator for the reference tariff cell, while other parameters represent a relative difference with respect to the reference cell. For instance, if $\gamma_{12} = 1.25$, then the mean in the cell $(2, 1)$ is 25% higher in comparison to the cell $(1, 1)$.
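The role of the reference cell can be illustrated with a minimal Python sketch of the two-factor model (2.2). The helper names `tariff_means` and `normalise` and all parameter values are hypothetical; the sketch only demonstrates that rescaling the class parameters leaves the cell means unchanged and that fixing the reference cell restores uniqueness.

```python
def tariff_means(gamma0, gamma1, gamma2):
    """Return mu[i][j] = gamma0 * gamma1[i] * gamma2[j], as in (2.2)."""
    return [[gamma0 * g1 * g2 for g2 in gamma2] for g1 in gamma1]


def normalise(gamma0, gamma1, gamma2):
    """Rescale so that the reference cell (1, 1) has relativities equal to 1."""
    r1, r2 = gamma1[0], gamma2[0]
    return (gamma0 * r1 * r2,
            [g / r1 for g in gamma1],
            [g / r2 for g in gamma2])


# Illustrative values only; list index 0 corresponds to class 1.
gamma0 = 0.10
gamma1 = [1.0, 1.25]
gamma2 = [1.0, 0.8, 1.5]
mu = tariff_means(gamma0, gamma1, gamma2)

# Multiplying gamma1 by 10 and dividing gamma2 by 10 leaves mu unchanged.
scaled1 = [10 * g for g in gamma1]
scaled2 = [g / 10 for g in gamma2]
mu_rescaled = tariff_means(gamma0, scaled1, scaled2)

# Normalising the rescaled parameters recovers the reference-cell formulation.
g0_n, g1_n, g2_n = normalise(gamma0, scaled1, scaled2)
```

With these values the ratio `mu[1][0] / mu[0][0]` equals 1.25, which corresponds to the interpretation of $\gamma_{12}$ given above.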

The multiplicative model assumption implies that there are no interactions between factors within one model. Let the first rating factor be the driver’s age group, the second rating factor be the geographical region, and the key indicator be the net premium. Then the following interpretation applies: if the net premium is 20% higher for drivers aged 26–30 in one region, then the same ratio holds in every other region. Usually, this setting is much more relevant and reasonable in tariff analysis.

The following example illustrates the model. Let the premium for drivers 18–25 years old in the first region be €1 000, while in the second region this premium is €2 000. Then an increase of 25% in premium within the multiplicative framework implies an increase of €250 in the first region and €500 in the second region. It seems fair, since relatively young drivers usually imply a greater risk for the insurance companies. Within the additive framework $\mu_{ij} = \gamma_0 + \gamma_{1i} + \gamma_{2j}$, the same example means that the premium in both regions would increase by an absolute value of €250. It does not seem to be fair now: for the residents of the first region it is 25% of their premium and it might affect their welfare, while for the residents of the second region it is only 12.5% of the premium and it does not affect their welfare on the same level.
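The arithmetic of this example can be reproduced with a short Python sketch. The dictionary keys are illustrative labels; the premium values are those of the example.

```python
premiums = {"region 1": 1000.0, "region 2": 2000.0}  # euros, from the example

# Multiplicative framework: the same relative increase of 25% in every region.
multiplicative_increase = {r: 0.25 * p for r, p in premiums.items()}

# Additive framework: the same absolute increase of 250 euros in every region.
additive_increase = {r: 250.0 for r in premiums}
additive_relative = {r: additive_increase[r] / p for r, p in premiums.items()}
```

The multiplicative increase scales with the base premium (250 and 500 euros), whereas the additive increase of 250 euros corresponds to different relative changes (25% and 12.5%).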

2.2.3 Link function

The link function describes a connection between the mean of the dependent variable and the regressors. Consider a model with 2 classes of the first rating factor and 3 classes of the second factor. Applying a logarithmic transformation to the multiplicative model provides the following equation (see Ohlsson et al. (2010)):

$$\log(\mu_{ij}) = \log(\gamma_0) + \log(\gamma_{1i}) + \log(\gamma_{2j}), \tag{2.4}$$

where $\mu_{ij}$ is the mean claim number in the cell $(i, j)$. Again, take the tariff cell $(1, 1)$ as a reference cell, where $\gamma_{11} = \gamma_{21} = 1$. In order to present the model in a list form, redefine the parameters in the following manner:

$$\beta_1 = \log(\gamma_0), \quad \beta_2 = \log(\gamma_{12}), \quad \beta_3 = \log(\gamma_{22}), \quad \beta_4 = \log(\gamma_{23}).$$

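Consider the following parametrization of a multiplicative model with 2 rating factors in Tab. 2.2.

Table 2.2: Parametrization of a multiplicative model with 2 rating factors

| $i$ | Tariff cell | $\log(\mu_i)$ |
|---|---|---|
| 1 | (1, 1) | $\beta_1$ |
| 2 | (1, 2) | $\beta_1 + \beta_3$ |
| 3 | (1, 3) | $\beta_1 + \beta_4$ |
| 4 | (2, 1) | $\beta_1 + \beta_2$ |
| 5 | (2, 2) | $\beta_1 + \beta_2 + \beta_3$ |
| 6 | (2, 3) | $\beta_1 + \beta_2 + \beta_4$ |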

Introduce a dummy variable $x_{ij}$:

$$x_{ij} = \begin{cases} 1, & \text{if } \beta_j \text{ is included in } \log(\mu_i), \\ 0, & \text{otherwise.} \end{cases} \tag{2.5}$$

Then the dummy variable representation of parametrization is defined in Tab. 2.3.

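Table 2.3: Dummy variables in a multiplicative model with 2 rating factors

| $i$ | Tariff cell | $x_{i1}$ | $x_{i2}$ | $x_{i3}$ | $x_{i4}$ |
|---|---|---|---|---|---|
| 1 | (1, 1) | 1 | 0 | 0 | 0 |
| 2 | (1, 2) | 1 | 0 | 1 | 0 |
| 3 | (1, 3) | 1 | 0 | 0 | 1 |
| 4 | (2, 1) | 1 | 1 | 0 | 0 |
| 5 | (2, 2) | 1 | 1 | 1 | 0 |
| 6 | (2, 3) | 1 | 1 | 0 | 1 |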

Then equation (2.4) can be transformed as follows:

$$\log(\mu_i) = \sum_{j=1}^{4} x_{ij}\beta_j; \quad i = 1, 2, \ldots, 6. \tag{2.6}$$
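The dummy variable representation can be checked with a minimal Python sketch for the case of 2 classes of the first rating factor and 3 classes of the second one. The function `dummy_row` and the relativities below are hypothetical and used only for illustration; the sketch verifies that the linear form (2.6) reproduces the multiplicative means.

```python
import math

# Tariff cells (class of factor 1, class of factor 2), with cell (1, 1) as reference.
cells = [(1, 1), (1, 2), (1, 3), (2, 1), (2, 2), (2, 3)]


def dummy_row(i, j):
    """Return (x_1, x_2, x_3, x_4) for cell (i, j) with reference cell (1, 1)."""
    return [1, int(i == 2), int(j == 2), int(j == 3)]


design = [dummy_row(i, j) for i, j in cells]

# Illustrative relativities, not estimates from the thesis data.
gamma0, gamma1, gamma2 = 0.10, {1: 1.0, 2: 1.25}, {1: 1.0, 2: 0.8, 3: 1.5}
beta = [math.log(gamma0), math.log(gamma1[2]),
        math.log(gamma2[2]), math.log(gamma2[3])]

# Linear predictor of (2.6): log(mu_i) = sum_j x_ij * beta_j.
log_mu = [sum(x * b for x, b in zip(row, beta)) for row in design]
mu_from_glm = [math.exp(v) for v in log_mu]
mu_direct = [gamma0 * gamma1[i] * gamma2[j] for i, j in cells]
```

The rows of `design` coincide with the dummy variables of the six tariff cells, and `mu_from_glm` agrees with `mu_direct` up to rounding error.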

Any monotonic function $g(\cdot)$ of $\mu_i$ may be placed on the left-hand side of the equation within the GLM setting. Consider a general case of examining the effect of $r$ explanatory variables $x_1, x_2, \ldots, x_r$ on the dependent variable $Y_i$. A new variable should be introduced:

$$\eta_i = \sum_{j=1}^{r} x_{ij}\beta_j; \quad i = 1, 2, \ldots, n, \tag{2.7}$$

where $x_{ij}$ is redefined as the value of the $j$-th regressor in the $i$-th observation. In standard linear models it is implied that $\mu_i \equiv \eta_i$, while in generalized linear models $g(\mu_i) = \eta_i$, where $g(\cdot)$ is a monotonic differentiable function. This function is one of the key components of the GLM framework. The link function links the mean of the dependent variable with the linear structure through the newly introduced parameter:

$$g(\mu_i) = \eta_i = \sum_{j=1}^{r} x_{ij}\beta_j. \tag{2.8}$$

Thus, in multiplicative models the link function is logarithmic:

$$g(\mu_i) = \log(\mu_i), \tag{2.9}$$

which can be explained by construction of the model. It should be noted here that, for the reasons stated above, the multiplicative model and the log-link function play a special role in motor insurance model building.
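A minimal Python sketch of the relation (2.8) between the mean and the linear predictor is given below. The dictionary `LINKS`, the function `mean_from_predictor` and the coefficient values are assumptions made for illustration; only the identity and logarithmic links are included.

```python
import math

# Each link is stored as a pair (g, g_inverse).
LINKS = {
    "identity": (lambda mu: mu, lambda eta: eta),
    "log": (math.log, math.exp),
}


def mean_from_predictor(x_row, beta, link="log"):
    """Compute mu_i = g^{-1}(eta_i) with eta_i = sum_j x_ij * beta_j, as in (2.8)."""
    eta = sum(x * b for x, b in zip(x_row, beta))
    _, g_inv = LINKS[link]
    return g_inv(eta)


beta = [math.log(0.10), math.log(1.25)]    # illustrative coefficients
base = mean_from_predictor([1, 0], beta)   # reference class
other = mean_from_predictor([1, 1], beta)  # second class of the factor
```

Under the log link the coefficient of a dummy variable acts as a multiplicative relativity: here `other / base` equals 1.25, whereas under the identity link the same coefficient would be added to the mean.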

2.3 Specifications for claim frequency modelling

This section describes a group of models that allow modelling count variables. First, it lists several specific features that should be considered when applying these models. Then the section focuses on the hurdle model, as it constitutes the foundation of this research. It should be noted here that this model does not belong to the generalized linear models as such; however, its separate components can be described in terms of the GLMs setting.

2.3.1 Specific features of count variable analysis

The general setting briefly described in Table 2.1 is applied throughout this section. It is noteworthy that the key indicator and the dependent variable are connected through the exposure element, just as it is described in section 1 of this chapter (see page 16). This framework determines usage of spatial data.

One of the most classical specifications for count variables analysis is the Poisson distribution in the GLM setting. It allows describing rare random events processes. Moreover, it offers several beneficial properties, such as additivity and equality of mean and variance to the value of the distribution parameter. However, motor insurance data are often characterized by overdispersion, which is not covered by the Poisson distribution.

Therefore, the negative binomial distribution is often employed for claim number modelling. It allows capturing overdispersion in random disturbances, which is common in distributions of the number of claims. Similarly to the Poisson distribution, it also captures positive skewness of claim numbers. Nevertheless, neither the Poisson nor the negative binomial distribution incorporates an excess number of zero-observations, if each policy is taken as an observation. Hence, more advanced methods, such as the hurdle model, should be applied to capture this peculiarity.
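The difference between the two distributions with respect to dispersion can be illustrated numerically with a minimal Python sketch. It assumes the common parametrization of the negative binomial distribution with mean $\mu$ and variance $\mu + \mu^2/k$; the values of $\mu$ and $k$ are hypothetical.

```python
import math


def poisson_pmf(y, mu):
    return math.exp(-mu + y * math.log(mu) - math.lgamma(y + 1))


def negbin_pmf(y, mu, k):
    """Negative binomial with mean mu and variance mu + mu**2 / k."""
    log_p = (math.lgamma(y + k) - math.lgamma(k) - math.lgamma(y + 1)
             + k * math.log(k / (k + mu)) + y * math.log(mu / (k + mu)))
    return math.exp(log_p)


def mean_and_variance(pmf, upper=500):
    """Moments computed by summing the probability function up to `upper`."""
    probs = [pmf(y) for y in range(upper)]
    mean = sum(y * p for y, p in enumerate(probs))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
    return mean, var


mu, k = 0.15, 0.5  # illustrative claim-count mean and shape parameter
pois_mean, pois_var = mean_and_variance(lambda y: poisson_pmf(y, mu))
nb_mean, nb_var = mean_and_variance(lambda y: negbin_pmf(y, mu, k))
```

Both distributions have the same mean, but the variance of the negative binomial distribution exceeds the mean, which is how overdispersion is captured.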

2.3.2 Hurdle model

The main specific feature of the hurdle model is that it describes the link between regressors and a count dependent variable that has a large number of zero-observations (see Mullahy (1986)). This property is very important for insurance data. The model distinguishes between the data-generating processes for zero-observations and positive observations. It consists of two components: a zero-values process and a positive count values process (starting from 1). It should be noted that this model separates the two processes very clearly and both processes are observable, unlike the components of the zero-inflated model, for instance.

Assumptions of the hurdle model. According to Mullahy (1986), this model is based on the properties listed below.

1. The zero-values process has a binomial distribution or it is described by a count distribution censored from the right at y = 1.

2. The count component of the model is based on a zero-truncated Poisson or negative binomial distribution and it is bounded by 1 from below (the negative binomial specification is considered in this research).

3. The two components of the model may include different sets of regressors.

4. Depending on the distribution applied in the count component, the corresponding distribution assumptions should be satisfied.


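Model building. The technical setting of the hurdle model can be presented as follows:

$$
f_{\text{hurdle}}(y; x, z, \eta, \gamma) =
\begin{cases}
f_{\text{zero}}(0; z, \gamma), & \text{if } y = 0, \\[4pt]
\dfrac{\bigl(1 - f_{\text{zero}}(0; z, \gamma)\bigr)\, f_{\text{count}}(y; x, \eta)}{1 - f_{\text{count}}(0; x, \eta)}, & \text{if } y > 0,
\end{cases}
\tag{2.10}
$$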

where $x$ and $z$ are the sets of regressors for the two components (the index $i$ is dropped for the sake of simplicity); $\eta$ and $\gamma$ are the vectors of coefficients in the corresponding components; $f_{\text{zero}}(0; z, \gamma)$ is the zero-values component, bounded from the right by $y = 1$; and $f_{\text{count}}(y; x, \eta)$ is the count values component, bounded from the left by $y = 1$.

The parameters $\eta$, $\gamma$ and one or more dispersion parameters (if $f_{\text{zero}}(0; z, \gamma)$, or $f_{\text{count}}(y; x, \eta)$, or both components are based on the negative binomial distribution) are estimated by the maximum likelihood method. It is noteworthy that the two components are estimated separately and independently.

Thus, the hurdle model combines two models: a logit-regression for separation of the zero-values process and a Poisson or a negative binomial model bounded by the unit value within the count component.
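As an illustration of this construction, the following minimal sketch evaluates the hurdle probability function for a logit zero component. For brevity it assumes a zero-truncated Poisson count component instead of the negative binomial specification used in this research. All regressor and coefficient values are hypothetical and chosen only so that the code runs.

```python
import math

def poisson_pmf(y, lam):
    """Untruncated Poisson probability P(Y = y) with mean lam."""
    return math.exp(-lam + y * math.log(lam) - math.lgamma(y + 1))

def dot(a, b):
    """Inner product of two equally long sequences."""
    return sum(ai * bi for ai, bi in zip(a, b))

def hurdle_pmf(y, x, z, eta, gamma):
    """Hurdle probability of observing y claims (logit zero part, Poisson count part)."""
    # Zero component: the logit model gives the probability of crossing the
    # hurdle, P(Y > 0), so f_zero(0) is its complement.
    p_positive = 1.0 / (1.0 + math.exp(-dot(z, gamma)))
    f_zero_0 = 1.0 - p_positive
    if y == 0:
        return f_zero_0
    # Count component: Poisson with log link, truncated at zero by dividing
    # by 1 - f_count(0).
    lam = math.exp(dot(x, eta))
    return (1.0 - f_zero_0) * poisson_pmf(y, lam) / (1.0 - poisson_pmf(0, lam))

# Hypothetical regressors (intercept plus one covariate) and coefficients.
x = [1.0, 0.5]
z = [1.0, 0.5]
eta = [-1.0, 0.4]
gamma = [-2.0, 0.6]

probs = [hurdle_pmf(y, x, z, eta, gamma) for y in range(60)]
total = sum(probs)
print("P(Y = 0):", probs[0])
print("Sum of probabilities over y = 0..59:", total)
```

The probabilities sum to one (up to rounding) because the truncated count component redistributes exactly the mass 1 − f_zero(0) over the positive counts.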

The following equality represents the foundation of the model:

$$
\log(\mu_i) = x_i^{T}\eta + \log\bigl(1 - f_{\text{zero}}(0; z_i, \gamma)\bigr) - \log\bigl(1 - f_{\text{count}}(0; x_i, \eta)\bigr), \tag{2.11}
$$

where $\mu_i$ is the expected value of the dependent variable given the regressors' values $x_i$ and $z_i$.
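The mean relation can be checked numerically. The sketch below assumes a negative binomial count component in the mean/size parametrization with a log link, so that the untruncated count mean is the exponential of the linear predictor. The linear predictor value, the size parameter `r` and the zero probability `f_zero_0` are hypothetical numbers, not estimates from the data.

```python
import math

def negbin_pmf(y, mu, r):
    """Negative binomial P(Y = y) with mean mu and size r (assumed parametrization)."""
    log_p = (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
             + r * math.log(r / (r + mu)) + y * math.log(mu / (r + mu)))
    return math.exp(log_p)

linear_predictor = 0.2   # hypothetical value of the count linear predictor
r = 1.5                  # hypothetical size (dispersion) parameter
f_zero_0 = 0.85          # hypothetical probability of zero claims

mu_count = math.exp(linear_predictor)   # mean of the untruncated count part
f_count_0 = negbin_pmf(0, mu_count, r)  # mass removed by the truncation

def hurdle_pmf(y):
    """Hurdle probability with the zero-truncated negative binomial count part."""
    if y == 0:
        return f_zero_0
    return (1.0 - f_zero_0) * negbin_pmf(y, mu_count, r) / (1.0 - f_count_0)

# Right-hand side of the mean relation.
log_mu_formula = (linear_predictor
                  + math.log(1.0 - f_zero_0)
                  - math.log(1.0 - f_count_0))

# Expected value by direct summation; the tail beyond 500 is negligible here.
mean_by_sum = sum(y * hurdle_pmf(y) for y in range(1, 500))

print("log of summed mean:", math.log(mean_by_sum))
print("log mean by formula:", log_mu_formula)
```

The two printed values agree, which reflects that the hurdle mean is the untruncated count mean rescaled by the ratio of the two positive-outcome probabilities.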

In order to see whether the suggested model is relevant, a test of joint equality of the coefficients in both components should be conducted. The null hypothesis is $\eta_i = \gamma_i$ for every $i$ (here $i$ indexes the coefficients). Depending on the resulting p-value, a conclusion can be made whether separate processes need to be considered for zero-values and count values of the dependent variable. The result of this test could also be interpreted as a general goodness of fit indicator: if the null hypothesis is rejected, the hurdle model fits the data better than traditional models. For details on theoretical implications of the model refer to Zeileis et al. (2008).

2.4 Specifications for claim severity modelling

This section describes theoretical implications for a group of models usually employed for claim severity modelling. First, several specific features of these models are described. Then the section focuses on the gamma distribution, as it forms the basis of this research. The gamma specification can be described within the GLMs setting and, in particular, as a special case of the exponential dispersion models (as described in section 2.1.1).

2.4.1 Specific features of claims severity analysis

Similarly to section 2.2.1, the terms applied in this part of the thesis are defined in table 1. It should be noted here that the claim severity distribution takes positive continuous values. Moreover, the claim severity distribution should be skewed to the right, since large claims usually appear infrequently, while a larger number of relatively small claims is common in motor insurance.

The following example illustrates one of the specific features that should be satisfied by severity models. Consider two different tariff cells, with mean claim severity €500 and €2 000. Let the standard deviation of claim severity in the first cell be €100. Claims in the second cell tend to be four times as severe as the ones in the first cell. Therefore, it seems reasonable to assume that the standard deviation of the claim sizes scales by the same factor, so that the standard deviation in the second cell would be €400 (see Brockman and Wright (1992)). Thus, the claim severity distributions have a constant coefficient of variation CV = σ/µ (where µ is the mean and σ is the standard deviation), rather than a constant variance across the tariff cells.


2.4.2 Gamma distribution

One of the most commonly used specifications for claim severity modelling is the gamma distribution. It satisfies the distribution requirements listed above and can be easily defined within the Exponential Dispersion Models setting (EDM further on).

Assumptions of the gamma distribution model. The following assumptions are suggested by Ohlsson et al. (2010) in order to employ the gamma distribution specification for claim severity modelling.

1. Individual claims are gamma distributed and mutually independent. The distribution of individual claims is fully determined by the probability density function f(x; α, β), where α > 0 is a shape parameter and β > 0 is a scale parameter (it enters the density as a rate, that is, an inverse scale).

2. The coefficient of variation is constant across the tariff cells.
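The second assumption is consistent with the first one. For a gamma distribution with shape α and rate β, the mean is α/β and the standard deviation is √α/β, so the coefficient of variation equals 1/√α and does not depend on β. The sketch below applies this to the two tariff cells from the example in section 2.4.1; the helper names are hypothetical and introduced only for illustration.

```python
import math

def gamma_params_from_mean_cv(mean, cv):
    """Moment matching: shape alpha = 1 / CV^2 and rate beta = alpha / mean."""
    alpha = 1.0 / cv ** 2
    beta = alpha / mean
    return alpha, beta

def gamma_sd(alpha, beta):
    """Standard deviation of a gamma distribution with shape alpha and rate beta."""
    return math.sqrt(alpha) / beta

cv = 100.0 / 500.0  # coefficient of variation implied by the first tariff cell

sds = {}
for mean in (500.0, 2000.0):
    alpha, beta = gamma_params_from_mean_cv(mean, cv)
    sds[mean] = gamma_sd(alpha, beta)
    # The shape is shared by both cells; only the rate changes with the mean.
    print(f"mean={mean:.0f}  alpha={alpha:.2f}  beta={beta:.4f}  sd={sds[mean]:.2f}")
```

Both cells share the same shape parameter, and the standard deviation scales with the mean, as the example in section 2.4.1 requires.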

Based on the assumptions above, first, some aspects of the model for individual claim sizes should be introduced.

Model building. The probability density function is defined as follows:

$$
f(x) = \frac{\beta^{\alpha}}{\Gamma(\alpha)}\, x^{\alpha - 1} \exp(-\beta x); \quad x > 0. \tag{2.12}
$$

It should be noted here that the expectation for this probability density function is $\alpha/\beta$ and the variance is $\alpha/\beta^{2}$. Moreover, a sum of independent gamma distributed variables with the same scale parameter $\beta$ and shape parameters $\alpha_1$, $\alpha_2$ also has a gamma distribution with the same scale parameter and a shape parameter $\alpha = \alpha_1 + \alpha_2$. Let $X$ be the sum of $\omega$ claims (independent $G(\alpha, \beta)$ distributed random variables), then $X \sim G(\omega\alpha, \beta)$. Then the probability density function for $Y = X/\omega$ (index $i$ is omitted for the sake of simplicity) is computed as follows:

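$$
f_Y(y) = \omega f_X(\omega y) = \frac{(\omega\beta)^{\omega\alpha}}{\Gamma(\omega\alpha)}\, y^{\omega\alpha - 1} \exp(-\omega\beta y); \quad y > 0. \tag{2.13}
$$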

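Thus, $Y \sim G(\omega\alpha, \omega\beta)$ with expectation $\alpha/\beta$. We can re-parametrize $f_Y$ in terms of $\mu = \alpha/\beta > 0$ and $\phi = 1/\alpha > 0$. Then the probability density function in terms of the new parameters is determined as follows:

$$
f_Y(y) = f_Y(y; \mu, \phi) = \frac{1}{\Gamma(\omega/\phi)} \left(\frac{\omega}{\mu\phi}\right)^{\omega/\phi} y^{(\omega/\phi) - 1} \exp\!\left(-\frac{\omega y}{\mu\phi}\right) = \exp\!\left(\frac{-y/\mu - \log(\mu)}{\phi/\omega} + c(y, \phi, \omega)\right); \quad y > 0, \tag{2.14}
$$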

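where

$$
c(y, \phi, \omega) = \frac{\omega}{\phi}\log\!\left(\frac{\omega y}{\phi}\right) - \log(y) - \log\Gamma(\omega/\phi). \tag{2.15}
$$

Thus, $E(Y) = \omega\alpha/(\omega\beta) = \mu$ and $Var(Y) = \omega\alpha/(\omega\beta)^{2} = \phi\mu^{2}/\omega$ are obtained. In order to switch to the EDM notation, another parameter $\theta = -1/\mu < 0$ should be introduced. Restoring the index $i$, the probability density function for the claim severity $Y_i$ is constructed as follows:

$$
f_{Y_i}(y_i; \theta_i, \phi) = \exp\!\left(\frac{y_i\theta_i + \log(-\theta_i)}{\phi/\omega_i} + c(y_i, \phi, \omega_i)\right). \tag{2.16}
$$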

Thus, the gamma distribution specification is indeed a special case of the exponential dispersion model with $b(\theta_i) = -\log(-\theta_i)$; therefore, it can be applied within the GLM framework. The parameters and coefficients of the generalized linear model based on the gamma distribution can be estimated by the maximum likelihood method. For details on theoretical aspects of this model refer to Ohlsson et al. (2010).
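The re-parametrization can be verified numerically. The sketch below compares the logarithm of the gamma density of the average claim, with shape ωα and rate ωβ, against the logarithm of the exponential dispersion form written with θ, φ and ω. The values of α, β, ω and the evaluation point y are hypothetical.

```python
import math

def log_gamma_pdf(y, shape, rate):
    """Log density of a gamma distribution in the shape/rate parametrization."""
    return (shape * math.log(rate) - math.lgamma(shape)
            + (shape - 1.0) * math.log(y) - rate * y)

def log_edm_pdf(y, theta, phi, w):
    """Log density in exponential dispersion form with b(theta) = -log(-theta)."""
    c = (w / phi) * math.log(w * y / phi) - math.log(y) - math.lgamma(w / phi)
    return (y * theta + math.log(-theta)) / (phi / w) + c

alpha, beta, w = 25.0, 0.05, 3.0    # hypothetical shape, rate and number of claims
mu, phi = alpha / beta, 1.0 / alpha  # mean and dispersion parameter
theta = -1.0 / mu                    # canonical parameter

y = 450.0  # hypothetical observed average claim size
lhs = log_gamma_pdf(y, w * alpha, w * beta)
rhs = log_edm_pdf(y, theta, phi, w)
print("log density, shape/rate form:", lhs)
print("log density, EDM form:       ", rhs)

# Moments of the average claim: mean mu and variance phi * mu^2 / w.
var_y = w * alpha / (w * beta) ** 2
print("variance from shape and rate:", var_y)
print("variance from phi, mu, w:    ", phi * mu ** 2 / w)
```

The two log densities coincide, and the variance decreases in proportion to the number of claims ω that are averaged, which is why ω acts as a weight in the GLM.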


2.5 Chapter conclusions

This chapter focused on the theoretical foundations of the models that form the core of this research. First, a literature review of the relevant topics was provided. Then the GLMs were discussed briefly. Finally, the hurdle model for claim frequency analysis and the gamma specification of GLMs for claim severity analysis were discussed. These two models are fitted to the data within this research.
