
Price Dispersion in the Car Insurance Market


Academic year: 2021


Contents

1 Introduction

2 Theory/Literature

2.1 Price dispersion

2.2 Key Concepts of Non-Life Insurance

2.2.1 Rating factors, tariffs and risk analysis

2.2.2 Generalized Linear Models

2.2.3 Vehicle insurance

2.3 Allowing for non-linear effects of the covariates

2.4 Mixed-Effects Models

3 Data

3.1 Construction of our data set

3.2 Descriptive statistics

4 Empirical Results

4.1 Separate Premium Models

4.2 Linear Mixed-Effects Model

5 Conclusions

References

Appendices

A Categorization

B Regression results

C Linear Mixed Effect Estimations

C.1 Fixed Effects


Abstract

In this thesis, we study price dispersion in the Dutch car insurance industry. We are interested in the influence of insurer heterogeneity on the variation in premiums. A mixed-effects model is used to capture the differences between the pricing structures of the insurers. We find that all insurers use the proposed rating factors in their premium determination and that their pricing structures differ significantly from each other; that is, insurer heterogeneity exists for all rating factors, leading to different premium dispersion across and within factors. Our results indicate that premium dispersion increases with the age and mass of the car.

1 Introduction

The market for motor vehicle insurance in the Netherlands is extensive and highly competitive. In 2014, 34 insurance companies competed to attract new policyholders and retain existing ones, together generating a turnover of 3.9 billion euros. The potential is high, as there are almost 8 million passenger cars in the Netherlands (CBS, 2015), all of which need mandatory third-party liability insurance. However, with the advent of many comparison sites, retaining customers has become a greater challenge: car owners can now inform themselves better about the different rates, which gives them more choice in the market.

Insurance companies realize this as well and continuously try to determine their position in the market. Comparing the premiums of all their policies with those of the competition would give an indication of this position, but this method is laborious and lacks overview. As all insurance companies determine their premiums using a pricing structure related to policy characteristics, it would be extremely helpful for these companies to have models that can estimate the pricing structures of their competitors. In that way, competitors' premiums can be predicted and compared for any combination of policy characteristics. Moreover, with detailed information about the pricing structure, insurance companies can also detect for which policy characteristics there is more room to price competitively and attract more customers. One way to estimate these pricing structures is by reverse engineering: for a large set of policy profiles, the corresponding rates of competitors are collected in the same way as is done by comparison sites, and an appropriate econometric model is then used to find the relation between the profiles and the rates.

In general, non-life insurance rate making relies heavily on Generalized Linear Models (GLMs), introduced by Nelder and Wedderburn (1972). GLMs generalize linear regression models in two important ways. First, the distributional assumption is relaxed by allowing any member of the exponential dispersion family, such as the Poisson, Gamma, Normal and Binomial distributions. Second, they allow for non-linear relations between the conditional mean and the explanatory variables.


and are extremely useful in rate making in the sense that a riskier policy characteristic should be charged a higher premium. Premiums for all policies can then be calculated by multiplying a base premium by the relativities.

This step of the rate-making process actually only accounts for the expected loss (total claim amount per year) of the insurer. To arrive at a final production premium, operational costs and competitive considerations are also incorporated by way of additions made to the relativities.

In this thesis we are interested in predicting premiums and in the way in which the heterogeneity of the insurers, i.e. their different pricing structures, influences premium dispersion. To that end we pose the following research questions: Which factors are relevant for car insurance premiums? And which relativities have the greatest influence on premium dispersion?

Essentially, predicting the rates of a competitor reduces to estimating its relativities. Using GLMs for this purpose would seem an obvious choice. However, this approach has a few shortcomings. First, as the final production premiums are being predicted, there is no way to identify the separate stages of the rate-making process. For the purpose of rate prediction it is not really important to know these components, even though they would give greater insight into a competitor's strategy and portfolio behaviour.

Second, the categorization of rating factors may differ between insurers. Using the same categorization for every insurer may bias the estimated relativities. Estimating GLMs with a different, possibly correct, categorization per insurer will result in better predictions, but it also makes comparing the relativities and their effects on premium dispersion more complicated.

To have the best of both worlds, we propose to estimate the pricing structures of the insurers jointly using a mixed-effects model that allows for heterogeneity between the insurers. This model consists of fixed and random components: the former can be interpreted as industry-standard relativities and the latter as variation around that standard, i.e. as a measure of variation between the insurance companies. To cope with possible categorization problems we introduce restricted cubic splines for a number of covariates. These allow for non-linear relations between the response and the covariate while, unlike categorization, preserving the continuous nature of the covariates.


2 Theory/Literature

2.1 Price dispersion

Price dispersion is the variation in prices of homogeneous goods across companies (Stigler, 1961). Even though economic theory predicts identical prices for such goods (the law of one price), in reality this is not the case because certain conditions for perfect competition do not hold. Most studies focus on consumer heterogeneity, market structure and competition as causes of price dispersion. For instance, Stigler finds that imperfect information due to consumer search costs leads to price dispersion. The advent of the internet should reduce or even eliminate consumer search costs. However, several studies find that price dispersion persists among internet retailers and can be either higher or lower than among conventional retailers (Bailey, 1998; Lee, 1998).

There are also indications that other sources of consumer heterogeneity like preference, brand loyalty, experience and willingness to pay influence price dispersion (e.g. Chen and Hitt (2001), Diamond (1987) and Johnson et al. (2004)).

Price dispersion can also exist within a company, which then charges different prices for the same product, usually to attract different demographic groups. This form of price dispersion is also called price discrimination. Examples are age-based discounts such as lower public transport fares for the elderly, student discounts, grocery stores selling different brands within a product group, and different airline tickets for the same flight (economy and business class).

Microeconomic theory predicts that as the market moves toward perfect competition, price dispersion both across and within firms should decrease. Empirical evidence on the effect of competition, however, is mixed. In a study of airline fares, Borenstein and Rose (1994) find that an increase in competition leads to more price discrimination. They explain this with a brand loyalty argument: frequent flyers and business class flyers are less price elastic than holiday flyers, as the former are often tied to certain airlines through loyalty programmes. Increased competition on a route therefore affects the top end (business class) of the fare distribution less than the lower end, with the result that price discrimination increases.

In another airline study, Gerardi and Shapiro (2009) find the opposite. Consistent with theory, price discrimination decreases with additional entry into the market. This decrease is larger if the client base for a route is more heterogeneous, that is, with a large number of both business and leisure travellers. The difference between the two studies lies in the estimation method. Borenstein and Rose (1994) use cross-sectional data, whereas Gerardi and Shapiro (2009) use panel data. The latter identifies changes in competition directly through the number of carriers active on a route, while the former has to instrument for the degree of competition. Gerardi and Shapiro show that the instrument used by Borenstein and Rose is correlated with the error term of the regression, which biases the estimated effect of competition on price dispersion upward.

Another reason for a possible positive effect of competition on price dispersion is that, as competition increases, companies make an effort to distinguish themselves 'vertically' (Tirole, 1988), for example by offering extra services or product choices. In this way, companies create niches for themselves, giving them some monopolistic power to differentiate prices. This effect can temper or even outweigh the initial negative effect of extra competition on price dispersion. Empirical evidence from a study of competition among grocery stores is consistent with this theory (Zhao, 2006).


focus on the existence of search costs as the reason for the price differences (Carlson and McAfee, 1983; Dahlby and West, 1986). However, Schlesinger and Von der Schulenburg (1991) find that product heterogeneity and switching costs are also important causes. They distinguish between homogeneous insurance contracts and the final insurance products, which differ in terms of service among insurers. There is also empirical evidence of the importance of brand loyalty, or perceived product heterogeneity, to consumers in the insurance industry (Cummins et al., 1974; Schlesinger and von der Schulenburg, 1990).

Switching costs are also relevant, as consumers have to inform their former insurer of the switch, incur the costs of entering into a new contract and learn the possibly different claim procedures of the new insurer. These costs enable insurers to exercise a degree of monopolistic pricing.

Honka (2014) finds that search and switching costs are relevant for the retention rate in the US automobile insurance market and concludes that consumer welfare would benefit most from a reduction in search costs.

This thesis adds to the existing literature on price dispersion in the insurance market by investigating another aspect of heterogeneity of insurers, their pricing structure. Generally, an insurance premium is a combination of risk assessment of expected loss, operational costs and competitive considerations of the insurer. If we assume that all insurers have more or less the same risk assessments, any difference in premiums will be the result of differences in the latter two premium components.

Competitive considerations are for instance package deals for consumers buying multiple insurance products or employee discounts for companies. It also involves pricing strategies where premiums for policies with low risk are made more attractive with discounts while riskier policies get a premium mark-up. These strategies could indeed be different per rating factor and per factor level. Thus, premium dispersion might not only be different across factors but also within factors. This motivates us to use the mixed effects model approach described in Section 2.4 where insurer heterogeneity is allowed for multiple rating factors.

2.2 Key Concepts of Non-Life Insurance

2.2.1 Rating factors, tariffs and risk analysis


as the non-random measure of exposure. It actually measures the loss (monetary damage) per claim conditional on a non-zero claim count. The product of frequency and severity gives the expected claim costs per year, also called the pure premium. In a perfect world without any additional costs, the pure premium is the rate the insurer needs to ask in order to break even.

The rating factors are usually measured in categories, even if they are continuous by nature. There are two reasons for this: the relation of the premiums to these factors is seldom linear, and historically premiums were calculated from so-called tariffs, which were price listings for every policy. Categorizing continuous variables is generally a bad idea, as it results in, among other things, a loss of information, a loss of power and possible overestimation of the effects of other variables (Harrell, 2015). Moreover, even when performed by experienced actuaries, categorization is arbitrary in nature, as natural rules for determining the level bounds are often missing. It is possible, however, to allow for non-linear effects of continuous variables in other ways, which will be addressed in Section 2.3.

The idea behind most existing tariffs is to use the rating factors to divide the portfolio of insurance policies into homogeneous groups, or tariff cells. One tariff cell corresponds to one value (level) of each of the rating factors. The key ratios of the tariff cells are jointly modelled under three basic assumptions:

1. Policy independence. The response variables (claim count and claim amount) of two different policies are independent.

2. Time independence. The response variables of a policy in disjoint time periods are independent.

3. Homogeneity. The response variables of two policies belonging to the same cell and having the same exposure follow the same probability distribution.
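The tariff-cell idea, together with the key ratios described above, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not code from the thesis: the policy records, the field names `cell`, `exposure`, `claims` and `amount`, and all numbers are hypothetical. Policies are pooled per cell (which is exactly what the homogeneity assumption licenses), and frequency, severity and pure premium follow from the cell totals.

```python
from collections import defaultdict


def tariff_cell_ratios(policies):
    """Aggregate policies into tariff cells and compute key ratios per cell.

    Each policy is a dict with hypothetical keys:
      'cell'     tuple of rating-factor levels identifying the tariff cell
      'exposure' policy-years in force (the exposure w)
      'claims'   number of claims
      'amount'   total claim cost
    """
    totals = defaultdict(lambda: {"exposure": 0.0, "claims": 0, "amount": 0.0})
    for p in policies:
        t = totals[p["cell"]]
        t["exposure"] += p["exposure"]
        t["claims"] += p["claims"]
        t["amount"] += p["amount"]

    ratios = {}
    for cell, t in totals.items():
        frequency = t["claims"] / t["exposure"]  # claims per policy-year
        # severity: average cost per claim, undefined for a cell without claims
        severity = t["amount"] / t["claims"] if t["claims"] else float("nan")
        # pure premium: expected claim cost per policy-year
        # (equals frequency * severity whenever the cell has claims)
        pure_premium = t["amount"] / t["exposure"]
        ratios[cell] = {"frequency": frequency, "severity": severity,
                        "pure_premium": pure_premium}
    return ratios


# Hypothetical example: two cells defined by (age band, fuel type)
policies = [
    {"cell": ("18-24", "diesel"), "exposure": 1.0, "claims": 1, "amount": 3000.0},
    {"cell": ("18-24", "diesel"), "exposure": 0.5, "claims": 0, "amount": 0.0},
    {"cell": ("45-54", "gasoline"), "exposure": 2.0, "claims": 0, "amount": 0.0},
]
for cell, r in tariff_cell_ratios(policies).items():
    print(cell, r)
```

Computing the pure premium directly as total cost over exposure avoids a division by zero in cells without claims.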

For all three assumptions one can find counterexamples where they do not hold. Ohlsson and Johansson (2010) argue that they are still reasonable and that insurers have found ways to reduce the impact of such situations, e.g. reinsurance against calamities that affect many policies at once and thereby violate the first assumption.

2.2.2 Generalized Linear Models

It would be tempting to model the expected key ratios using linear regression. However, inference based on linear regression assumes a normal distribution for the random errors. The random variables for frequency (claim count) and severity (claim amount) are both non-negative, in the first case discrete and in the second case continuous and skewed to the right, which makes the normality assumption less plausible. Moreover, the relation of the mean to the rating factors is multiplicative rather than additive.

Instead, the common practice in non-life insurance is to use GLMs (Generalized Linear Models), developed by Nelder and Wedderburn (1972). GLMs relax the distributional assumptions to allow for any distribution that is a member of the Exponential Dispersion Family (EDF), for example the Poisson, Gamma, Normal and Binomial distributions. The probability density function, or probability mass function, of the EDF is given by

$$
f_{Y_i}(y_i; \theta_i, \phi) = \exp\left\{ \frac{y_i \theta_i - b(\theta_i)}{\phi / w_i} + c(y_i, \phi, w_i) \right\}, \tag{1}
$$


where $\theta_i$ is a parameter that is allowed to depend on $i$, while the dispersion parameter $\phi > 0$ is the same for all $i$. The cumulant function $b(\theta_i)$ is assumed to be twice continuously differentiable, with an invertible second derivative. Every choice of $b(\cdot)$ corresponds to a family of probability distributions, such as the normal, Poisson and gamma distributions. Given $b(\cdot)$, the distribution is completely specified by the parameters $\theta_i$ and $\phi$. The function $c(\cdot)$ does not depend on $\theta_i$ and is of little interest in GLM theory. In the insurance context, the response variables $Y_i$ are the key ratios and $w_i$ is the exposure of tariff cell $i$. The parameter $\theta_i$ is a function of the mean of the key ratio, $\mu_i = E(Y_i)$.

The most common choice for claim frequency is the Poisson distribution; for severity it is the Gamma distribution. It is also possible to model the pure premium directly using a compound Poisson distribution (Jørgensen and Paes De Souza, 1994). However, separate models for frequency and severity give more insight into the extent to which the rating factors affect the premiums (Ohlsson and Johansson, 2010).

Another relaxation in GLMs is that the mean is linked to the explanatory variables $x_i$ through

$$
g(\mu_i) = x_i'\beta = \eta_i, \tag{2}
$$

where the link function $g(\cdot)$ is monotone and differentiable, and not necessarily the identity (as it is for linear regression models). $\eta_i$ is called the linear predictor, as it is a linear function of $x_i$. In the special case where $\theta_i = g(\mu_i)$, the link function is called canonical.

The levels of the rating factors are translated into dummy variables in $x_i$, where one level per rating factor is left out to prevent collinearity in the design matrix $X = (x_1, \ldots, x_n)'$. Usually, the level with the most observations is chosen as the base level. In a sense, the base level is incorporated in the first column of the design matrix (the intercept term).
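The dummy coding just described can be sketched in Python for a single rating factor. This is an illustrative helper, not the thesis's code; the function name `dummy_encode` and the example fuel data are hypothetical. The most frequent level becomes the base level and gets no column, so a full design matrix would prepend an intercept column of ones.

```python
from collections import Counter


def dummy_encode(levels):
    """Encode one categorical rating factor as 0/1 dummy columns.

    The most frequent level is taken as the base level and gets no column;
    its effect is absorbed by the intercept. Returns the base level, the
    names of the dummy columns and the 0/1 rows.
    """
    counts = Counter(levels)
    base = counts.most_common(1)[0][0]
    # Non-base levels in a stable, sorted order
    others = sorted(level for level in counts if level != base)
    rows = [[1 if level == other else 0 for other in others] for level in levels]
    return base, others, rows


# Hypothetical fuel types for five policies
fuel = ["gasoline", "diesel", "gasoline", "electric", "gasoline"]
base, columns, rows = dummy_encode(fuel)
print(base, columns)   # gasoline ['diesel', 'electric']
print(rows)            # [[0, 0], [1, 0], [0, 0], [0, 1], [0, 0]]
```

Leaving out the base column is what prevents the dummies from summing to the intercept column, i.e. the collinearity mentioned above.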

As already mentioned, the effect of the rating factors is usually multiplicative rather than additive. For this reason, the log link function is the usual choice in this context, that is

$$
g(\mu_i) = \log \mu_i = x_i'\beta. \tag{3}
$$
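Under the log link in (3) the mean becomes a product of factors, which is why exponentiated coefficients act as relativities. A minimal Python sketch, assuming a hypothetical base premium of 300 and hypothetical coefficients for two non-base levels named `age_18` and `diesel`:

```python
import math


def relativities(coefficients):
    """Map log-link coefficients beta to multiplicative relativities exp(beta)."""
    return {name: math.exp(beta) for name, beta in coefficients.items()}


def tariff_premium(base_premium, rel, profile):
    """Base premium times the relativity of every non-base level in the profile.

    Base levels simply do not appear in `profile` (their relativity is 1).
    """
    premium = base_premium
    for level in profile:
        premium *= rel[level]
    return premium


# Hypothetical estimates; exp(intercept) = 300 is the base-level premium
beta = {"age_18": math.log(2.5), "diesel": math.log(1.2)}
rel = relativities(beta)
print(round(tariff_premium(300.0, rel, ["age_18", "diesel"]), 2))  # 900.0
```

Multiplying relativities is equivalent to exponentiating the summed linear predictor, $\exp(\log 300 + \beta_{\text{age\_18}} + \beta_{\text{diesel}})$.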

Taking exponentials of the parameters $\beta$ gives the relativities of the rating factors, as they describe the premium relative to the base level. By multiplying the base premium by the relativities, an actuary can easily calculate the risk premium for any profile.

2.2.3 Vehicle insurance

For vehicle insurance in particular, there are three main types of coverage: TPL (third-party liability), partial casco and hull. Only the first (in Dutch: WA, or Wet Aansprakelijkheid) is mandatory; it covers physical and property damage incurred by another person in a traffic accident. Partial casco mainly covers theft, weather and windscreen damage. Hull, or full casco, also covers damage to the policyholder's own vehicle in an accident. Having additional coverage (a package deal) generally reduces the TPL premium by about 10%. This is explained by a self-selection argument: people with additional insurance are usually more careful with their vehicle and hence less likely to cause an accident.

2.3 Allowing for non-linear effects of the covariates


use a transformation of the predictor to induce linearity (e.g., log(x)). Another option is to include higher powers of the covariate in the model, but polynomials have some undesirable properties, such as spurious peaks and valleys (Magee, 1998).

In this thesis we use restricted cubic splines as described by Harrell (2015) to model non-linear effects. Splines are piecewise polynomial functions over intervals of the covariate. They are joined at the endpoints of the intervals, the knots. For instance, the linear spline function for a covariate $x$ is given by

$$
f(x) = \beta_0 + \beta_1 x + \beta_2 (x - t_1)_+ + \beta_3 (x - t_2)_+ + \cdots + \beta_{k+1} (x - t_k)_+, \tag{4}
$$

where $t_p$, $p \in \{1, \ldots, k\}$, are the knots and $(x - t_p)_+ = \max\{0, x - t_p\}$.

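The problem with linear splines is that they are not smooth. This can be overcome by using cubic splines

$$
f(x) = \beta_0 + \beta_1 x + \beta_2 x^2 + \beta_3 x^3 + \beta_4 (x - t_1)_+^3 + \cdots + \beta_{k+3} (x - t_k)_+^3 = \mathbf{x}'\beta,
$$

where $\mathbf{x} = (1, x_1, \ldots, x_{k+3})'$, $\beta = (\beta_0, \ldots, \beta_{k+3})'$ and

$$
x_1 = x, \quad x_2 = x^2, \quad x_3 = x^3, \quad x_{3+p} = (x - t_p)_+^3, \quad p = 1, \ldots, k.
$$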
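The truncated power bases behind the linear spline (4) and the cubic spline can be written as one Python helper with a `degree` argument. This is a sketch for illustration only; the function names `truncated_power_basis` and `spline_value` are hypothetical and not taken from the thesis or any library.

```python
def truncated_power_basis(x, knots, degree):
    """Basis row (1, x, ..., x**degree, (x - t_1)_+**degree, ..., (x - t_k)_+**degree).

    degree=1 gives the linear spline basis of (4) (k + 2 columns);
    degree=3 gives the cubic spline basis (k + 4 columns, beta_0..beta_{k+3}).
    """
    powers = [float(x) ** d for d in range(degree + 1)]
    truncated = [max(0.0, x - t) ** degree for t in knots]
    return powers + truncated


def spline_value(x, knots, degree, beta):
    """Evaluate f(x) as the inner product of the basis row and beta."""
    basis = truncated_power_basis(x, knots, degree)
    if len(beta) != len(basis):
        raise ValueError("beta must have one coefficient per basis column")
    return sum(b * z for b, z in zip(beta, basis))


print(truncated_power_basis(2.0, [1.0, 3.0], 3))  # [1.0, 2.0, 4.0, 8.0, 1.0, 0.0]
```

For example, with knots 1 and 3, degree 1 and coefficients (0, 1, -1, 0), `spline_value` follows $f(x) = x$ up to the first knot and stays flat at 1 after it, a kink that the cubic version smooths out.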

Smoothness is achieved by forcing the first and second derivatives of the polynomials to agree at the knots. Stone and Koo (1985) found that cubic splines behave poorly in the tails and that it is advantageous to constrain the tails to be linear. This gives the restricted cubic spline for $k$ knots $t_1, \ldots, t_k$:

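$$
f(x) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_{k-1} x_{k-1}, \tag{5}
$$

where $x_1 = x$ and, for $j = 1, \ldots, k-2$,

$$
x_{j+1} = (x - t_j)_+^3 - (x - t_{k-1})_+^3 \, \frac{t_k - t_j}{t_k - t_{k-1}} + (x - t_k)_+^3 \, \frac{t_{k-1} - t_j}{t_k - t_{k-1}}.
$$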

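It can be shown that $f(x)$ is linear in $x$ for $x > t_k$ (and trivially for $x < t_1$, where all truncated terms vanish). Notice that for the restricted cubic spline only $k - 1$ parameters (apart from the intercept) have to be estimated, instead of $k + 3$.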
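The restricted cubic spline columns of (5) can be computed directly from the formula. The sketch below implements (5) literally and then checks the linear-tail property numerically; it is an illustration, not the thesis's code, and the function name `rcs_basis` is hypothetical. Some software may additionally rescale these columns; that changes the coefficients but not the fitted curve.

```python
def rcs_basis(x, knots):
    """Restricted cubic spline columns (x_1, ..., x_{k-1}) of (5), intercept excluded."""
    t = sorted(knots)
    k = len(t)
    if k < 3:
        raise ValueError("a restricted cubic spline needs at least 3 knots")

    def cube_plus(u):
        return max(0.0, u) ** 3

    denom = t[k - 1] - t[k - 2]          # t_k - t_{k-1}
    columns = [float(x)]                 # x_1 = x
    for j in range(k - 2):               # 0-based j corresponds to knot t_{j+1}
        columns.append(
            cube_plus(x - t[j])
            - cube_plus(x - t[k - 2]) * (t[k - 1] - t[j]) / denom
            + cube_plus(x - t[k - 1]) * (t[k - 2] - t[j]) / denom
        )
    return columns


knots = [1.0, 2.0, 3.0, 4.0, 5.0]
print(len(rcs_basis(3.5, knots)))  # 4, i.e. k - 1
# Beyond the last knot every column is linear in x: equal steps give equal differences
a, b, c = (rcs_basis(x, knots) for x in (6.0, 7.0, 8.0))
print([round((c[i] - b[i]) - (b[i] - a[i]), 9) for i in range(4)])  # [0.0, 0.0, 0.0, 0.0]
```

Below the first knot all truncated terms are zero, so only the linear column $x_1 = x$ is non-zero there.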

It is usually impossible to specify the locations of the knots in advance. Fortunately, Stone (1986) found that the location of the knots for restricted cubic splines is not crucial in most situations and that the number of knots should be between 3 and 5, depending on the sample size. For large samples (N > 100) he advises choosing 5 knots at the quantiles Q = {0.05, 0.275, 0.5, 0.725, 0.95}.
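Placing the knots at the sample quantiles in Q takes only a few lines. This sketch uses linear interpolation between order statistics (the same rule as NumPy's default `numpy.quantile`); the helper name `quantile_knots` and the example data are hypothetical, and other quantile definitions would shift the knots slightly.

```python
def quantile_knots(values, probs=(0.05, 0.275, 0.5, 0.725, 0.95)):
    """Place spline knots at sample quantiles using linear interpolation.

    The default probabilities are the five quantiles in Q.
    """
    xs = sorted(values)
    n = len(xs)
    if n == 0:
        raise ValueError("need at least one observation")
    knots = []
    for p in probs:
        h = p * (n - 1)            # fractional position among the order statistics
        lo = int(h)                # floor, since h >= 0
        hi = min(lo + 1, n - 1)
        knots.append(xs[lo] + (h - lo) * (xs[hi] - xs[lo]))
    return knots


ages = list(range(18, 82))  # hypothetical: one policyholder of every age 18..81
print([round(k, 2) for k in quantile_knots(ages)])
```

The resulting knots can be passed straight to a restricted cubic spline basis such as the one for (5).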


2.4 Mixed-Effects Models

In this thesis we aim to determine the influence on premium dispersion of the heterogeneity between the insurers, i.e. the differences in relativities. We do this by using a linear mixed effects model. Mixed-effects models are primarily used to describe relationships between a response variable and some covariates in data that are grouped according to one or more classification factors. They incorporate both fixed effects that apply to the entire population under study and random effects which are associated with individuals or groups and allow for heterogeneity of the effects. Examples of grouped data include longitudinal data, repeated measures data, multilevel data, and block designs.

Our response variable is the log premium of the insurers, which we model on the rating factors; the data are grouped by one classification factor, namely the insurer $i$. The heterogeneity of the insurers is captured by including random effects $b_i$ for every insurer on the effects of the rating factors, i.e. the relativities. The random effects describe the variation around the mean relativities of the population of insurers; these mean relativities are the same for every insurer and are given by the fixed effects $\beta$.

A linear mixed model with one grouping factor is formulated as follows (Laird and Ware, 1982; Pinheiro and Bates, 2000):

$$
y_i = X_i \beta + Z_i b_i + \varepsilon_i, \quad i = 1, \ldots, M, \tag{6}
$$

$$
b_i \sim N(0, \Psi), \quad \varepsilon_i \sim N(0, \sigma^2 I), \tag{7}
$$

where $y_i$ is the $n_i$-dimensional response vector of group $i$, $\beta$ a $p$-dimensional vector of fixed effects and $b_i$ a $q$-dimensional vector of random effects. $X_i$ and $Z_i$ are $n_i \times p$ and $n_i \times q$ matrices of known covariates. In our case, the rows of $X_i$ and $Z_i$ are insurance policies with values for every rating factor, and $y_i$ contains the corresponding log premiums of insurer $i$. These policies were constructed (see Section 3), so $X_i$ and $Z_i$ are nearly the same for each insurer. However, some policies are not insured by all insurers, leading to small differences in $X_i$ and $Z_i$ and in the number of observations per insurer, $n_i$. The within-group errors are denoted by $\varepsilon_i$.

The random effects bi and the within group errors εi are assumed to be independent of each other and of the errors and random effects of different groups. Although the bi behave like parameters, they are actually an extra source of random variation (between the groups) in the model. We do not estimate the bi but can make predictions of the values of these random variables given the data. The parameters of this model that will be estimated are thus β, Ψ and σ2.

The random effects bi are restricted to have a mean of 0 and therefore any nonzero mean for a term in the random effects must be expressed as part of the fixed-effects terms. The columns of Zi are usually a subset of the columns of Xi. In this sense a mixed model allows us to make inferences about fixed effects which represent the average characteristics of the population represented by the subjects i and the random effects representing the variability amongst the subjects.
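To make the roles of $\beta$, $b_i$, $\Psi$ and $\sigma^2$ in (6)-(7) concrete, the sketch below simulates log premiums from the model with NumPy. It illustrates the data-generating process only, not estimation; REML fitting is left to dedicated software such as the R package nlme that accompanies Pinheiro and Bates (2000). The design matrix, the parameter values and the choice $Z_i = X_i$ are hypothetical.

```python
import numpy as np


def simulate_lmm(X, beta, Psi, sigma, n_groups, rng):
    """Draw y_i = X_i beta + Z_i b_i + eps_i for i = 1..M as in (6)-(7).

    For simplicity every group shares the same design and Z_i = X_i,
    i.e. every fixed effect also gets a random effect.
    """
    n, q = X.shape
    ys, bs = [], []
    for _ in range(n_groups):
        b = rng.multivariate_normal(np.zeros(q), Psi)   # b_i ~ N(0, Psi)
        eps = rng.normal(0.0, sigma, size=n)             # eps_i ~ N(0, sigma^2 I)
        ys.append(X @ beta + X @ b + eps)
        bs.append(b)
    return np.vstack(ys), np.vstack(bs)


rng = np.random.default_rng(2016)
# Hypothetical design: intercept plus one diesel dummy, four policies
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
beta = np.array([np.log(300.0), np.log(1.2)])      # industry-standard (fixed) part
Psi = np.diag([0.05 ** 2, 0.10 ** 2])               # between-insurer variances
y, b = simulate_lmm(X, beta, Psi, sigma=0.02, n_groups=7, rng=rng)
# Insurer-specific diesel relativities exp(beta_1 + b_i1) scatter around 1.2
print(np.round(np.exp(beta[1] + b[:, 1]), 3))
```

Each row of `y` plays the role of one insurer's log premiums; the spread of the insurer-specific relativities is governed by the diagonal of `Psi`.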

Several methods of parameter estimation have been proposed for linear mixed models; for a comparison, see Searle et al. (1992) and Vonesh and Chinchilli (1997). We use Restricted Maximum Likelihood (REML), as it compensates for the negative bias in the variance estimates of unrestricted Maximum Likelihood (ML).


per covariate. The disadvantage of this approach is that the effect, or relativity, of such a covariate is now the combined effect of four spline components (as we use the recommended five knots per spline), which makes determining the variation of this relativity less straightforward. Moreover, the variation of the relativities alone is not enough to compare the effects of heterogeneity in these relativities on premium dispersion, since these effects also change with the values in $X_i$ and $Z_i$. To make a sensible comparison of the heterogeneity effects of different rating factors, we need a common scale.

To this end we propose the following approach. We will predict the premiums of all the insurers for the first and third quartiles (q25, q75) of the continuous covariates, while keeping the other covariates fixed. Per quartile we calculate two dispersion measures: the ratio of the maximum predicted to the minimum predicted premium and the coefficient of variation of the premiums. This approach accomplishes two things. It isolates each heterogeneity effect per rating factor and allows a comparison of these effects on the same scale, namely the same quartile of their respective distributions. This allows us to infer if a source of insurer heterogeneity has a larger impact on premium dispersion, given a quartile. Moreover, we can also see which factors create the largest effects for the premiums of individual insurers by comparing their predicted premiums at both quartiles.
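The two dispersion measures can be sketched as follows. The thesis does not state whether the population or the sample standard deviation enters the coefficient of variation; this sketch assumes the population version (`statistics.pstdev`). The function name and the predicted premiums of the seven insurers at the two quartiles are hypothetical.

```python
import statistics


def dispersion_measures(premiums):
    """Max/min ratio and coefficient of variation of premiums across insurers.

    Assumes the population standard deviation in the coefficient of variation.
    """
    ratio = max(premiums) / min(premiums)
    cv = statistics.pstdev(premiums) / statistics.mean(premiums)
    return ratio, cv


# Hypothetical predicted premiums of seven insurers for one profile, with one
# covariate set to its first (q25) or third (q75) quartile, others held fixed
predicted = {
    "q25": [250.0, 265.0, 280.0, 300.0, 310.0, 330.0, 360.0],
    "q75": [260.0, 290.0, 300.0, 340.0, 360.0, 400.0, 450.0],
}
for quartile, premiums in predicted.items():
    ratio, cv = dispersion_measures(premiums)
    print(f"{quartile}: max/min = {ratio:.3f}, CV = {cv:.3f}")
```

Both measures are scale-free, which is what allows comparisons across rating factors at the same quartile.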

Note that, as our response variable is the log premium, which we assume to be normally distributed, the prediction of the original premiums is not obtained by simply taking exponentials of the predicted log premiums. That is, if $\log Y \sim N(\mu, \sigma^2)$, then $E(Y) = \exp(\mu + \tfrac{1}{2}\sigma^2)$. Thus, if the normality assumption for the log premiums is correct, then

$$
E(y_{ij} \mid x_{ij}, z_{ij}, b_i) = \exp\left( x_{ij}'\beta + z_{ij}'b_i + \tfrac{1}{2}\sigma^2 \right)
$$

is the predicted premium for insurer $i$ and policy $j$, which we estimate by replacing $\beta$ and $\sigma^2$ by their estimates and $b_i$ by its prediction.
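The lognormal correction in the prediction formula is easy to get wrong, so a small numerical sketch may help. The function name and all inputs below (an intercept-only policy, a base premium of 300, an insurer effect of 0.1 and $\sigma^2 = 0.04$) are hypothetical.

```python
import math


def predicted_premium(x, beta, z, b, sigma2):
    """Mean premium exp(x'beta + z'b + sigma^2 / 2) when the log premium is normal.

    x, z: covariate vectors for one policy; beta: fixed effects;
    b: predicted random effects of the insurer; sigma2: residual variance.
    """
    eta = sum(xk * bk for xk, bk in zip(x, beta)) + sum(zk * uk for zk, uk in zip(z, b))
    return math.exp(eta + 0.5 * sigma2)


# Hypothetical values: intercept-only policy, insurer effect +0.1 on the log scale
naive = math.exp(math.log(300.0) + 0.1)                  # exp of predicted log premium
mean = predicted_premium([1.0], [math.log(300.0)], [1.0], [0.1], sigma2=0.04)
print(round(mean / naive, 6))  # exp(0.02) = 1.020201
```

Ignoring the $\tfrac{1}{2}\sigma^2$ term would understate every predicted premium by the same factor, here about 2%.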

3 Data

3.1 Construction of our data set

Our data consist of 88,586 fictitious policies constructed by Achmea, the largest company in the Netherlands offering insurance for health care, income and property loss. Achmea uses this set monthly to compare its premiums with those of the competition, with the help of Rolls. This software has become the de facto comparison tool for car insurance in the Netherlands, as all companies have submitted their premium models to its distributor; it is also used by comparison sites for consumers. The data are augmented with car specifics such as weight, type and value, provided by RDW, a public sector company specialized in mobility. The TPL premiums of seven insurance firms active in the Dutch automobile insurance market are generated using Rolls and added to our set. These seven firms are the main brand of Achmea, Centraal Beheer, and six main competitors of Achmea. All premiums date from February 2016.


5 covariates linked to the vehicle: age, fuel type, mass, power and original cost price. This price is not indexed. For cars older than 15 years, the economic or day value is used instead. This price measure is used in the determination of tax related to leased cars.

Two factors are held constant: gender and postal code. All policyholders are set to be male, as in the EU it is prohibited to differentiate rates according to gender. The postal code is held constant for reasons of representativeness: if it were to vary, the number of observations would be insufficient to cover all relevant car-person-postal code profiles, leading to inaccurate estimates of the effects of these factors.

3.2 Descriptive statistics

Tables 1-4 give summary statistics and correlations of the covariates and the response variables. The positive correlations between claim-free years and age, and between vehicle characteristics such as original cost price, power and mass, come as no surprise. Price is highly correlated with mass and power, which might indicate that it has no additional value in our model for TPL. Indeed, TPL coverage does not cover damage to one's own car. However, some insurers may use this variable as a proxy for driving style: owners of expensive cars could be more cautious drivers and cause less damage. As can be seen from Table 4, the number of observations is not the same for each insurer, since some 'policies' are not covered by all insurers. For instance, ANWB does not provide coverage for vehicles with a price higher than €75,000.

Table 1: Summary statistics of the continuous covariates

| Variable | Description | Obs | Mean | Std. Dev. | Min | Max | Median |
|---|---|---|---|---|---|---|---|
| age | Age insured (years) | 88,586 | 51.7 | 15.85 | 18 | 81 | 53 |
| claimfree | Claim-free years | 88,586 | 16.76 | 13.52 | 0 | 62 | 14 |
| agecar | Age car (years) | 88,586 | 10.79 | 5.79 | 1 | 65 | 10 |
| powercar | Power (kW) | 86,420 | 78.91 | 34.24 | 17 | 430 | 74 |
| mass | Mass (kg) | 88,586 | 1,135.15 | 255.14 | 512 | 2,960 | 1,115 |
| price | Original price (€) | 88,586 | 23,933.32 | 17,032.13 | 1,954 | 287,658 | 19,660.5 |

Table 2: Fuel

| Type | Frequency | Percent |
|---|---|---|
| Gasoline | 77,396 | 87.37 |
| Diesel | 8,674 | 9.79 |
| Electricity | 1,051 | 1.19 |
| Liquid Petroleum Gas | 1,465 | 1.65 |
| Total | 88,586 | 100 |

4 Empirical Results

4.1 Separate Premium Models


Table 3: Correlations of the continuous covariates

| | age | claimfree | agecar | powercar | mass | price |
|---|---|---|---|---|---|---|
| age | 1.00 | | | | | |
| claimfree | 0.58 | 1.00 | | | | |
| agecar | -0.27 | -0.16 | 1.00 | | | |
| powercar | 0.02 | 0.01 | -0.06 | 1.00 | | |
| mass | 0.08 | 0.05 | -0.10 | 0.82 | 1.00 | |
| price | 0.02 | 0.01 | -0.07 | 0.91 | 0.80 | 1.00 |

Table 4: Summary statistics of the premiums

| Insurer | Obs | Mean | Std. Dev. | Min | Max | Median |
|---|---|---|---|---|---|---|
| Reaal | 88,335 | 645.71 | 479.41 | 165 | 8,582 | 502 |
| ANWB | 86,540 | 419.23 | 298.60 | 119 | 4,160 | 306 |
| Univé | 88,335 | 341.40 | 211.22 | 137 | 2,817 | 273 |
| A.S.R. | 88,335 | 265.72 | 183.21 | 43 | 2,865 | 213 |
| Allsecur | 88,586 | 248.89 | 146.08 | 112 | 2,303 | 204 |
| Nationale Nederlanden | 88,335 | 319.65 | 179.67 | 119 | 2,204 | 255 |
| Centraal Beheer Achmea | 88,292 | 288.27 | 138.80 | 109 | 1,553 | 235 |

continuous covariates are categorized in a crude manner, where all levels are chosen to contain at least 0.5% of the observations. The idea behind this is that, without knowledge of the insurers' categorization structures, a model with many levels is a good starting point for determining the actual levels: levels whose relativities do not differ significantly can be pooled to form a larger level. The categorization can be found in Table A1. For every factor, one level is left out in the estimation to avoid collinearity; this is the level with the most observations. In total, our categorization combined with the fuel types results in 284 indicator variables. The model for ANWB has three fewer, as ANWB does not offer insurance for vehicles with a price higher than €75,000.
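The text does not describe how the crude levels were formed, only that each holds at least 0.5% of the observations. One possible greedy scheme is sketched below as an assumption, not as the thesis's procedure: distinct values are scanned from low to high and accumulated until the threshold is reached, and a too-thin upper tail is merged into the last level. The function name `crude_levels` and the example data are hypothetical.

```python
from collections import Counter


def crude_levels(values, min_share=0.005):
    """Split a discrete covariate into consecutive levels with >= min_share of observations.

    Distinct values are scanned from low to high and accumulated until the
    current level holds at least min_share of all observations. A remaining
    tail that is too small is merged into the last level.
    Returns a list of (lower, upper) bounds, both inclusive.
    """
    counts = Counter(values)
    need = min_share * len(values)
    levels, start, acc = [], None, 0
    for v in sorted(counts):
        if start is None:
            start = v
        acc += counts[v]
        if acc >= need:
            levels.append((start, v))
            start, acc = None, 0
    if start is not None:                  # leftover tail below the threshold
        top = max(counts)
        if levels:
            levels[-1] = (levels[-1][0], top)
        else:
            levels.append((start, top))
    return levels


# Hypothetical claim-free years with a thin upper tail (5% threshold for brevity)
years = [0] * 40 + [1] * 30 + [2] * 26 + [3] * 3 + [4] * 1
print(crude_levels(years, min_share=0.05))  # [(0, 0), (1, 1), (2, 4)]
```

Starting from such fine levels, adjacent levels with insignificantly different relativities could then be pooled, as described above.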

In this section and the next we use the subscripts $i = 1, \ldots, 7$ to denote the insurer and $j = 1, \ldots, n_i$ to denote the observation per insurer. The number of observations per insurer, $n_i$, can be found in Table 4. The seven separate linear models are specified as follows:

$$
\log y_{ij} = x_{ij}'\beta + \varepsilon_{ij}, \tag{8}
$$

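where $y_{ij}$ is the premium for the $j$-th policy of the $i$-th company and $x_{ij}$ is the vector of indicator variables. The $\varepsilon_{ij}$ are error terms, assumed to be independently and identically distributed. These models are closely related to GLMs with the log link in (3); the difference is that normality is assumed for the log premiums (i.e. a lognormal model for the premiums) rather than for the premiums themselves.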
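Each separate model (8) is an ordinary least-squares regression of log premiums on indicator variables. A minimal NumPy sketch, assuming a hypothetical insurer whose tariff is exactly $300 \cdot 1.2^{\text{diesel}} \cdot 2.0^{\text{young}}$; with noise-free data the fit recovers the base premium and relativities exactly.

```python
import numpy as np


def fit_log_linear(X, premiums):
    """Least-squares fit of log premium on a design matrix X, as in (8).

    X holds an intercept column followed by 0/1 indicator columns;
    exp(beta) gives the base premium and the relativities.
    """
    y = np.log(np.asarray(premiums, dtype=float))
    beta, *_ = np.linalg.lstsq(np.asarray(X, dtype=float), y, rcond=None)
    return beta


# Hypothetical design: intercept, diesel dummy, young-driver dummy
X = np.array([
    [1, 0, 0],
    [1, 1, 0],
    [1, 0, 1],
    [1, 1, 1],
])
premiums = [300.0, 360.0, 600.0, 720.0]
# Approximately [300, 1.2, 2.0]: base premium and two relativities
print(np.round(np.exp(fit_log_linear(X, premiums)), 6))
```

With real premiums the fit is no longer exact, and the residuals feed the diagnostics discussed next.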

As noted in Section 2.2.2, a normal distribution for premiums might not be accurate given the non-normal risk components of premiums: claim frequency and severity. A different member of the EDF, such as the Gamma or compound Poisson distribution, might be more appropriate. We will assess the normality of the errors in (8) and, if needed, try to model premiums with a different GLM.


of the fitted values. Moreover, the Breusch-Pagan test rejected the null hypothesis of homoskedasticity for all insurance companies (Breusch and Pagan, 1979). Consequently, we re-estimated (8) with heteroskedasticity-robust standard errors. The estimation results can be found in Table B1 in the appendix. As inspecting this table is quite tedious, we list the main results here. In every model, all factors are significant: the Wald tests of the null hypothesis that the coefficients of all levels of a factor are jointly zero are rejected. This indicates that all insurance companies use the proposed variables for their TPL rate making. The models fit the data well, with an adjusted R² above 94% for Reaal and around 99% for the other models. This is of course not surprising given this number of predictor variables.
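The two diagnostics mentioned above can be sketched with NumPy. The Breusch-Pagan statistic below is the original 1979 form (half the explained sum of squares from regressing scaled squared residuals on the regressors), and the robust standard errors are White's HC0 variant. The thesis does not say which robust variant it used, so HC0 is an assumption, and the deterministic example data are hypothetical. For production use, tested implementations in statistical libraries are preferable.

```python
import numpy as np


def ols(X, y):
    """OLS coefficients and residuals."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta, y - X @ beta


def breusch_pagan(X, resid):
    """Original Breusch-Pagan LM statistic (half the explained sum of squares).

    Regresses g = e^2 / (sum(e^2)/n) on the columns of X (which must contain an
    intercept); under homoskedastic normal errors the statistic is approximately
    chi-squared with X.shape[1] - 1 degrees of freedom.
    """
    n = len(resid)
    g = resid ** 2 / (resid @ resid / n)
    gamma, _ = ols(X, g)
    fitted = X @ gamma
    return 0.5 * np.sum((fitted - g.mean()) ** 2)


def hc0_standard_errors(X, resid):
    """White (HC0) heteroskedasticity-robust standard errors of OLS coefficients."""
    bread = np.linalg.inv(X.T @ X)
    meat = X.T @ (X * resid[:, None] ** 2)
    return np.sqrt(np.diag(bread @ meat @ bread))


# Hypothetical data whose error spread grows with x
x = np.arange(1.0, 101.0)
X = np.column_stack([np.ones_like(x), x])
signs = np.where(np.arange(100) % 2 == 0, 1.0, -1.0)
y = 1.0 + 2.0 * x + 0.5 * x * signs
beta, resid = ols(X, y)
lm = breusch_pagan(X, resid)
se = hc0_standard_errors(X, resid)
print(f"BP LM = {lm:.1f} (5% critical value for 1 df: 3.84), HC0 s.e. = {se}")
```

The robust standard errors leave the coefficient estimates unchanged; only the inference (including the Wald tests) is adjusted.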

The results of Table B1 come to life when the coefficient estimates are plotted against the factor levels. Taking exponentials of these estimates gives estimates of the relativities. The relativities per age level and fuel type are depicted in Figures 1 and 2. The age relativities illustrate the different premium strategies in the market well. Especially for young ages there appear to be considerable differences. For instance, relative to the base level of a 54-year-old, an 18-year-old pays between 1.6 (CB Achmea) and 3.8 (Reaal) times as much premium, all other factors held constant. Also, some insurance companies seem to have a “daddy, can I borrow your car” hump: between the ages of 38 and 54 the relativities rise to compensate for the higher expected damage when children drive their parents' car.
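The step from coefficients to relativities follows from exponentiating (8). Ignoring the error term, the premium of a profile with level $l_k$ of factor $k$ is a base premium multiplied by one relativity per factor (the coefficient, and hence the log-relativity, of a reference level is zero):

$$
\hat y = e^{\hat\beta_0}\prod_{k=1}^{K} e^{\hat\beta_{k,l_k}}, \qquad r_{k,l} = e^{\hat\beta_{k,l}}.
$$

For example, the age relativities of 1.6 (CB Achmea) and 3.8 (Reaal) for an 18-year-old correspond to coefficients of $\ln 1.6 \approx 0.470$ and $\ln 3.8 \approx 1.335$ on the log-premium scale.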

For fuel, all insurance companies charge diesel drivers a surcharge. Because of Dutch tax regulations, a diesel car mainly becomes attractive when the owner expects to drive a lot. Hence, on average, diesel drivers drive more kilometres and are likely to make more claims than gasoline drivers. The attitude toward electric cars differs considerably among the insurers. Some, like Reaal and Univé, charge less than for gasoline cars, whereas Allsecur charges more. The reason for this is unclear; if the insurers assess the risk of electric cars in the same way, the difference might result from different competitive strategies. LPG seems to be priced the same as gasoline across the board. For this reason LPG vehicles will be pooled with gasoline in the general market model.

Figure 1 also clearly shows the non-linear relation between age and premiums. For the relativities of the other factors we find different patterns. Claim-free years (Figure 3) form a linear spline whose knots (both in number and location) and slopes differ per insurer. ANWB and Nationale Nederlanden are relatively expensive for policy holders with few claim-free years, but their strategies diverge after about 4 years. Most insurers stop differentiating before 15 claim-free years. Only ASR appears to use different relativities beyond 15 years, as indicated by the unusual change in slope at 16+.
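A linear spline like the one described for claim-free years can be written with hinge terms max(0, x − κ), one per knot κ; the slope changes by the hinge coefficient at each knot. The sketch below is only an illustration: the knots at 4 and 10 years and the slopes are hypothetical, not estimates for any insurer.

```python
import numpy as np


def linear_spline_basis(x, knots):
    """Columns: intercept, x, and one hinge max(0, x - k) per knot."""
    x = np.asarray(x, dtype=float)
    cols = [np.ones_like(x), x] + [np.maximum(0.0, x - k) for k in knots]
    return np.column_stack(cols)


years = np.arange(0, 17)   # 0, 1, ..., 16 claim-free years
knots = [4.0, 10.0]        # hypothetical knot locations
# Hypothetical log-relativity curve: slope -0.10 up to 4 years,
# -0.05 between 4 and 10 years, and flat afterwards.
coef = np.array([0.0, -0.10, 0.05, 0.05])
log_rel = linear_spline_basis(years, knots) @ coef

# Recover the coefficients from the curve by least squares.
fit, *_ = np.linalg.lstsq(linear_spline_basis(years, knots), log_rel, rcond=None)
slopes = np.cumsum(fit[1:])   # slope in each segment between knots
print(np.round(slopes, 3))
```

Insurer-specific knots would simply mean a different `knots` list per insurer.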

The relativities for mass seem to increase linearly; the large shifts at the end are due to the wider levels at the upper end of the mass range (Table A1). The patterns for price and power differ considerably among insurers. For Allsecur and Nationale Nederlanden, power seems to become important only above a certain threshold. For the others, the relativities increase only slightly, but significantly, with power. The relativities for car price, although jointly significant for all insurers, seem to matter only for Reaal and Univé.


Figure 1: Relativities per age


Figure 2: Relativities per fuel type


Figure 4: Relativities per mass


Figure 6: Relativities per age of the car


Figure 8: Normal quantile plot of the Centraal Beheer Achmea model

4.2 Linear Mixed-Effects Model

We use the results of the separate linear models as a basis for a single model in which we jointly estimate the premium models of the seven insurance companies, using a linear mixed-effects model like (6). To reduce the number of parameters, we use restricted cubic splines as described in (5) for all our variables except fuel type and claim-free years. For these two covariates we use the same indicators as in the separate models. As our data set contains far more than 100 observations, we use the recommended five knots per cubic spline.
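A minimal NumPy sketch of a restricted cubic spline basis, following Harrell's (2015) truncated-power formulation: with k knots t_1 < … < t_k it yields k − 1 columns (x itself plus k − 2 nonlinear terms), so five knots give four terms, matching the names cub_lft1 to cub_lft4 in Section C.2. The knot quantiles (0.05, 0.275, 0.5, 0.725, 0.95) are the placements commonly recommended for five knots; whether exactly these were used here, the normalization by (t_k − t_1)², and the simulated ages are assumptions of this sketch.

```python
import numpy as np

# Knot quantiles commonly recommended for five knots (assumption).
QUANTILES_5 = (0.05, 0.275, 0.5, 0.725, 0.95)


def rcs_basis(x, knots=None):
    """Restricted cubic spline basis with k knots -> k - 1 columns.

    Column 1 is x itself; the k - 2 nonlinear columns are constrained so the
    fitted function is linear below the first and above the last knot.
    Nonlinear terms are divided by (t_k - t_1)^2 to keep them on x's scale.
    """
    x = np.asarray(x, dtype=float)
    if knots is None:
        knots = np.quantile(x, QUANTILES_5)
    t = np.asarray(knots, dtype=float)
    k = len(t)

    def cube(u):
        return np.maximum(u, 0.0) ** 3

    scale = (t[-1] - t[0]) ** 2
    cols = [x]
    for j in range(k - 2):
        term = (cube(x - t[j])
                - cube(x - t[-2]) * (t[-1] - t[j]) / (t[-1] - t[-2])
                + cube(x - t[-1]) * (t[-2] - t[j]) / (t[-1] - t[-2]))
        cols.append(term / scale)
    return np.column_stack(cols)


age = np.linspace(18, 90, 1000)   # hypothetical policy-holder ages
B = rcs_basis(age)
print(B.shape)                    # four columns, like cub_lft1 .. cub_lft4
```

Below the first knot all nonlinear columns are zero, and above the last knot the cubic and quadratic parts cancel, so any fitted combination is linear in both tails.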

The introduction of the splines reduces the number of covariates (including the intercept) to 40. We aim to investigate whether random effects exist for all of these covariates. To get an indication of the variation between the insurance companies per factor, we re-estimated the individual linear models with the restricted cubic splines. We only report the confidence intervals of the coefficient estimates, in Figure 9. The small number of overlapping intervals makes clear that the insurers use quite different pricing structures for all factors. The most overlap occurs for policy holders with a large number of claim-free years.

Ideally, we would incorporate all these covariates as random effects in the linear mixed-effects model and allow for covariances between the random effects. Unfortunately, this leads to serious computational problems, mainly because the algorithm for maximum likelihood estimation does not converge. To cope with this, we restricted the covariance matrix of the random effects (Ψ in (6)) to be diagonal and reduced the number of random effects in the model. Besides a random intercept, we only allowed for random effects in the age of the policy holder (cub_lft), fuel type and mass (cub_gw). Consequently, the other covariates are treated as fixed effects.
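One way to write the specification just described, in the Laird and Ware (1982) form for insurer $i$ with $n_i$ observations, is given below. The symbols $Z_i$, $\mathbf{b}_i$ and $\psi_k$ are introduced here for illustration and need not match the notation of (6):

$$
\log \mathbf{y}_i = X_i\boldsymbol{\beta} + Z_i\mathbf{b}_i + \boldsymbol{\varepsilon}_i, \qquad
\mathbf{b}_i \sim N(\mathbf{0}, \Psi), \qquad
\boldsymbol{\varepsilon}_i \sim N(\mathbf{0}, \sigma^2 I_{n_i}), \qquad
\Psi = \operatorname{diag}(\psi_1^2, \ldots, \psi_{12}^2),
$$

with $\mathbf{b}_i$ independent of $\boldsymbol{\varepsilon}_i$. The 12 columns of $Z_i$ are the intercept, cub_lft1 to cub_lft4, cub_gw1 to cub_gw4, diesel, electric and lpg (the terms listed in Section C.2). Insurer $i$'s coefficient on random-effect term $k$ is then $\beta_k + b_{ik}$, while every other covariate has the same coefficient $\beta_k$ for all insurers; the diagonal $\Psi$ makes the $b_{ik}$ independent across $k$.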


random effects is strong, as we could easily imagine covariation between, for instance, the random effects related to car characteristics, and the random effects corresponding to the four terms of a restricted cubic spline are probably correlated as well. However, our specification is the largest linear mixed-effects model we could estimate for the car insurance premiums. The resulting effects on premium dispersion of the three covariates with random effects should therefore be seen as merely indicative.

The estimation results of our linear mixed-effects model can be found in Table C1 and Section C.2. The fixed-effects estimates show that all covariates are significant except the indicators for electric and LPG cars; that is, their ‘industry’ mean premiums do not differ significantly from those of the reference level, gasoline. Diesel cars are again charged more than gasoline cars. Almost all cubic terms are relevant; only the power of the car (cub_kwh) might be better represented by a linear effect.

We checked the main assumptions about the variance components of (6), that is, normality and homoskedasticity of the within-group residuals $\varepsilon_i$ and normality of the random effects. Figure 10 shows that the within-group residuals are nicely centred around zero, but homoskedasticity may not hold, as the residuals for Reaal vary more than those for Univé. Normality of the within-group errors seems reasonable, as can be seen in the QQ plots in Figure 11. Normality of the random effects is questionable (Figure 12); especially the QQ plot of the intercept shows an irregular pattern. The figure identifies two outliers whose standardized random effect exceeds, in absolute value, the 95% quantile of the standard normal distribution: ANWB in the QQ plot of the intercept and NN in the plot for the second term of the cubic spline for age. The omission of claim-free years and age of the vehicle as random effects, both very relevant in the individual model of ANWB, may have a large impact on the estimated random intercept of this insurer.
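The checks in this paragraph rely on QQ plots and on standardized random effects. The sketch below, using only Python's standard library, computes QQ-plot coordinates and flags groups whose standardized effect exceeds the 95% quantile of the standard normal (about 1.645), mirroring the criterion in the text. The predicted random intercepts and insurer labels are hypothetical; only the standard deviation 0.437081 is the sd((Intercept)) estimate reported in Section C.2. Standardizing by the estimated standard deviation and using the plotting positions (i − 3/8)/(n + 1/4) are assumptions about how such plots are usually drawn.

```python
from statistics import NormalDist


def qq_points(values):
    """Pairs (theoretical standard-normal quantile, ordered value) for a QQ plot.

    Uses plotting positions (i - 3/8) / (n + 1/4), i = 1..n.
    """
    n = len(values)
    probs = [(i - 0.375) / (n + 0.25) for i in range(1, n + 1)]
    std_normal = NormalDist()
    return [(std_normal.inv_cdf(p), v) for p, v in zip(probs, sorted(values))]


def flag_outliers(effects, sd, level=0.95):
    """Names whose standardized random effect |b / sd| exceeds the level quantile."""
    cutoff = NormalDist().inv_cdf(level)
    return [name for name, b in effects.items() if abs(b / sd) > cutoff]


# Hypothetical predicted random intercepts for seven insurers.
b_hat = {"insurer_A": 0.80, "insurer_B": -0.20, "insurer_C": 0.10,
         "insurer_D": -0.30, "insurer_E": 0.05, "insurer_F": -0.25,
         "insurer_G": -0.20}
print(flag_outliers(b_hat, sd=0.437081))
print(qq_points(list(b_hat.values()))[:2])
```

With only seven insurers per random-effect term, each QQ plot has seven points, so a single outlier can dominate its visual pattern.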

Figure 13 compares the coefficient estimates of the individual models (blue circles) with those of the linear mixed model (pink circles). The fixed effects of the linear mixed model are shown as dashed vertical lines. For the covariates without a random-effect component, the coefficient estimates of the linear mixed model coincide with the dashed line. The differences in the estimates of the intercept, claim-free years and age of the car for ANWB are quite notable.

We tested whether our random effects are relevant with a likelihood ratio test. Let $L_2$ be the likelihood of the more general model and $L_1$ the likelihood of the restricted model; the likelihood ratio test statistic Λ is then

$$\Lambda = 2\log(L_2/L_1) = 2\left[\log L_2 - \log L_1\right]$$

Let $k_i$ be the number of parameters to be estimated in model i. Under the null hypothesis that the restricted model is adequate, Λ asymptotically follows a $\chi^2$ distribution with $k_2 - k_1$ degrees of freedom (df).
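The statistic and its asymptotic p-value can be computed as below. Because the pooled-model test reported next has 12 (an even number of) degrees of freedom, the χ² tail probability has a closed form, so the sketch needs only Python's standard library; the function names are hypothetical. One caveat applies to testing random effects: the null hypothesis sets variances to zero, on the boundary of the parameter space, so the χ² reference distribution is only approximate and tends to be conservative (Pinheiro and Bates, 2000). With statistics as large as those reported here, this does not affect the conclusions.

```python
import math


def lr_statistic(loglik_general, loglik_restricted):
    """Lambda = 2 [log L2 - log L1]."""
    return 2.0 * (loglik_general - loglik_restricted)


def chi2_sf_even_df(x, df):
    """P(chi2_df >= x) for even df via the Poisson identity.

    For df = 2m: P(X >= x) = exp(-x/2) * sum_{i<m} (x/2)^i / i!.
    Each term is evaluated in log space, so a huge x underflows cleanly to 0.0.
    """
    if df <= 0 or df % 2:
        raise ValueError("this closed form needs a positive even df")
    if x <= 0:
        return 1.0
    half = x / 2.0
    total = 0.0
    for i in range(df // 2):
        total += math.exp(-half + i * math.log(half) - math.lgamma(i + 1))
    return min(total, 1.0)


# Pooled model versus linear mixed model: Lambda = 1065479 with 12 df.
p_pooled = chi2_sf_even_df(1065479, 12)
print(p_pooled < 0.0001)
```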

To test whether all insurers use the same pricing structure, i.e. whether all random effects are irrelevant, the restricted model is simply a pooled linear model with fixed effects only, and the general model is the estimated linear mixed model. The statistic then has 12 degrees of freedom, equal to the number of random effects in our linear mixed-effects specification. Λ has a value of 1065479, with p-value $\Pr(\chi^2_{12} \ge 1065479) < 0.0001$, so we firmly reject the pooled model.


In this case, Λ = 162103, with p < 0.0001, so we also reject the random intercept model. These results do not imply that our model is optimal, but it is at least better than these two simpler models.

Table 5 gives an impression of how the insurers' different factor pricing schemes influence premium dispersion. We predicted the premiums at the first and third quartiles (q25, q75) of each continuous covariate, keeping the other covariates fixed. For fuel we compare the predicted premiums for gasoline and diesel. As the reference profile we use a 55-year-old driver with zero claim-free years, driving a 5-year-old gasoline car of 850 kg with 60 kW of power and an original price of 10,000€. This is an average hatchback (e.g. a Renault Twingo). The predicted premiums for this profile are those in the gasoline row of Table 5. We chose these factor levels because they had the highest frequency in our data. As the claim-free data were created manually, we chose zero as the reference level.

Moving from the first to the third age quartile raises predicted premiums for some insurers and lowers them for others. For Nationale Nederlanden, the predicted premium of a 64-year-old is almost 6% lower than that of a 40-year-old. ANWB, on the other hand, charges a 21% higher premium. These predictions correspond well with Figure 1, where 40 and 64 lie on either side of the ‘hump’, and we find similar differences in the estimated age relativities of the separate linear models.

For mass, the other continuous covariate included as a random effect, the predicted premiums of all insurers are higher for the heavier vehicle. The increase is 15% for CB Achmea and almost 38% for Reaal.

Age of the car, power, price and number of claim-free years are fixed effects in our model, so moving from the first to the third quartile changes the premiums by the same percentage for every insurer. Of these four, the age of the car and especially claim-free years have the most impact on premiums: it pays to drive safely, or at least to report no claims. It is interesting that policy holders with older cars need to pay higher third party liability premiums. It might be that these drivers are less protective of their cars and therefore cause more damage than owners of newer cars. Having a diesel car instead of a gasoline car raises predicted premiums by 19% (Univé) to 48% (ASR). Two insurers with, on average, low-end premiums (CB Achmea and ASR, cf. Table 4) have the highest premium increase. This might be the result of competitive pricing, where the relative increase in premium for the higher risk (diesel) compensates for a more attractive premium for the lower risk (gasoline). Recall that, due to the Dutch tax system, a diesel car is more attractive if one expects to drive a lot (≥15,000 km), and more frequent use of the car means a higher risk of incurring damage.
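Why fixed effects produce the same percentage change for every insurer, while random effects do not, follows from the log-linear form: the change in log premium between two covariate values is (β + b_i)·Δx, and any insurer-specific intercept cancels. The sketch uses a single linear term as a stand-in for the four spline terms of the actual model (where the change would be Σ_k (β_k + b_ik)[s_k(q75) − s_k(q25)]); all coefficients, quartiles and insurer labels are hypothetical.

```python
import math

# Hypothetical log-scale slopes: mass has an insurer-specific (random)
# deviation b_i, age of the car only a common (fixed) slope.
beta_mass, beta_agecar = 0.0020, 0.0150
b_mass = {"insurer_A": -0.0005, "insurer_B": 0.0000, "insurer_C": 0.0010}

q25_mass, q75_mass = 850.0, 1000.0  # hypothetical quartiles (kg)
q25_age, q75_age = 5.0, 12.0        # hypothetical quartiles (years)


def pct_change(delta_eta):
    """Percentage change in premium for a change delta_eta in log premium."""
    return 100.0 * (math.exp(delta_eta) - 1.0)


# Random slope: the percentage change differs per insurer.
mass_change = {name: pct_change((beta_mass + b) * (q75_mass - q25_mass))
               for name, b in b_mass.items()}
# Fixed slope: the same percentage change for every insurer.
agecar_change = {name: pct_change(beta_agecar * (q75_age - q25_age))
                 for name in b_mass}
print(mass_change)
print(agecar_change)
```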

In the last two columns of Table 5 we report the ratio of the highest to the lowest predicted premium and the coefficient of variation of the predicted premiums per covariate value. For instance, for a 40-year-old policy holder the highest premium is 2.15 times the lowest, and the standard deviation of the predicted premiums is 26% of their mean. Both can be seen as measures of premium dispersion.
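Both dispersion measures are simple to compute from the predicted premiums of the seven insurers for one profile. The text does not state whether the coefficient of variation uses the sample or the population standard deviation, so the sketch makes this a parameter; the premiums are hypothetical.

```python
from statistics import mean, pstdev, stdev


def dispersion(premiums, sample=True):
    """Max/min ratio and coefficient of variation of predicted premiums.

    sample=True uses the n - 1 denominator for the standard deviation;
    sample=False uses the population standard deviation.
    """
    sd = stdev(premiums) if sample else pstdev(premiums)
    return max(premiums) / min(premiums), sd / mean(premiums)


# Hypothetical predicted premiums (euro) of seven insurers for one profile.
predicted = [300.0, 380.0, 420.0, 450.0, 500.0, 560.0, 600.0]
ratio, cv = dispersion(predicted)
print(f"max/min = {ratio:.2f}, CV = {cv:.0%}")
```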


For the fixed covariates the dispersion is the same at both quartiles: as we do not allow for heterogeneity in these factors, different values do not change the dispersion. For the random effects, the dispersion of the predicted premiums is slightly higher for mass than for age, and dispersion for diesel is a bit lower than for gasoline.

Figure: Within-group residuals per insurer (vz: CB Achmea, NN, Allsecur, Univé, ASR, ANWB, Reaal)

Figure: Normal QQ plots of the random effects against quantiles of the standard normal; panels (Intercept), cub_lft1 to cub_lft4, cub_gw1 to cub_gw4, diesel, electric and lpg; labelled points ANWB (Intercept) and NN (cub_lft2)

5 Conclusions

In this thesis we have estimated and compared the premium models of seven insurance companies active in the Dutch market. Using a constructed data set with characteristics of the vehicle and the insured, also called rating factors, we retrieved the Third Party Liability (TPL) premiums of these insurance companies with the same software that internet comparison sites use. By estimating linear models, we find that every insurer uses the factors we proposed in its rate-making process: the two characteristics of the insured (age and claim-free years) and the five vehicle factors (age, mass, power, fuel type and price) are all relevant. Our results indicate that the insurers use different premium models, that is, their relativities and base premiums are not the same given our categorization of the factors. We did not investigate the main reasons for these differences, although different categorization schemes seem to be one of the culprits. Other candidates are company structure (e.g. the possibility of cross-selling deals) and competitive strategy. This can be an interesting starting point for further research.

We re-estimated the linear models using restricted cubic splines for five covariates: age, mass, power, price and age of the car. We did this to capture possible non-linear effects of these covariates better than categorization does. The main issues with categorization are that there are no natural rules for creating the levels and that all covariate values in the same category are assumed to have the same effect on the response variable. This can lead to undesirable results such as loss of information, loss of power and possible overestimation of the effects of other variables. Restricted cubic splines are in essence smooth piecewise polynomials over intervals of the covariate, and therefore retain more of the covariate's information by allowing it to remain continuous. Moreover, with the five knots recommended for large samples, only four parameters need to be estimated per covariate spline. For many covariates, this is a vast reduction compared to categorization schemes. With this model specification we also find that the insurers use different pricing structures for all the policy characteristics we used.

To investigate the effects of this variation in factor pricing on premium dispersion we have estimated a linear mixed effects model. This model incorporates fixed effects pertaining to the entire population under study and random effects which allow for heterogeneity of individuals or groups in the population. These random effects can involve random intercept terms but also random coefficients.

As our preliminary results indicate that the heterogeneity of the insurers is spread over all our factors, it is desirable to incorporate random effects for all these factors. However, this vastly increases the number of parameters to be estimated (especially if covariances between the random effects are allowed) and runs into computational problems, such as non-convergence of the maximization algorithm. To cope with this, we estimated a simpler model with a random intercept and random effects for age, mass and fuel type. The other factors are treated as fixed. Moreover, we assumed that the random effects are uncorrelated. This assumption is quite strict, as the random effects related to the terms of a spline are likely to be correlated.


and mass. Dispersion seems slightly higher for mass than for age, and the dispersion of premiums for diesel cars is slightly lower than for gasoline cars.

Comparing the predicted premiums at the first and third quartiles of the continuous variables, we see that price and power of the car, although relevant, do not have a large impact on the premiums. For age of the policy holder, mass, age of the car and claim-free years we do see large differences between the predicted premiums at these quartiles.


References

Bailey, Joseph P (1998). Intermediation and electronic markets: Aggregation and pricing in Internet commerce. Ph.D. thesis, Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science.

Borenstein, Severin and Nancy L Rose (1994). Competition and price dispersion in the US airline industry. Journal of Political Economy 102 (4), 653–683.

Breusch, T. S. and A. R. Pagan (1979). A simple test for heteroscedasticity and random coefficient variation. Econometrica 47 (5), 1287–1294.

Carlson, John A and R Preston McAfee (1983). Discrete equilibrium price dispersion. Journal of Political Economy 91 (3), 480–493.

Chen, Pei-Yu and Lorin Hitt (2001). Brand awareness and price dispersion in electronic markets. ICIS 2001 Proceedings, 26.

Cummins, J David, Dan M McGill, Howard E Winklevoss, and Robert A Zelten (1974). Consumer attitudes toward auto and homeowners insurance. Department of Insurance, The Wharton School.

Dahlby, Bev and Douglas S West (1986). Price dispersion in an automobile insurance market. Journal of Political Economy 94 (2), 418–438.

Diamond, Peter (1987). Consumer differences and prices in a search model. The Quarterly Journal of Economics 102 (2), 429–436.

Gerardi, Kristopher S and Adam Hale Shapiro (2009). Does competition reduce price dispersion? New evidence from the airline industry. Journal of Political Economy 117 (1), 1–37.

Harrell, Frank (2015). Regression modeling strategies: with applications to linear models, logistic and ordinal regression, and survival analysis. Springer.

Hastie, T. J. and R. J. Tibshirani (1990). Generalized Additive Models. CRC Press.

Honka, Elisabeth (2014). Quantifying search and switching costs in the US auto insurance industry. The RAND Journal of Economics 45 (4), 847–884.

Johnson, Eric J, Wendy W Moe, Peter S Fader, Steven Bellman, and Gerald L Lohse (2004). On the depth and dynamics of online search behavior. Management Science 50 (3), 299–308.

Jørgensen, Bent and Marta C Paes De Souza (1994). Fitting Tweedie's compound Poisson model to insurance claims data. Scandinavian Actuarial Journal 1994 (1), 69–93.

Laird, Nan M and James H Ware (1982). Random-effects models for longitudinal data. Biometrics, 963–974.

Magee, Lonnie (1998). Nonlocal behavior in polynomial regressions. The American Statistician 52 (1), 20–22.

Nelder, JA and RWM Wedderburn (1972). Generalized linear models. Journal of the Royal Statistical Society. Series A (General) 135 (3), 370–384.

Ohlsson, Esbjörn and Björn Johansson (2010). Non-life insurance pricing with generalized linear models. Springer.

Pinheiro, J.C. and D.M. Bates (2000). Mixed-effects models in S and S-PLUS. Springer, New York.

Schlesinger, Harris and J-Matthias Graf von der Schulenburg (1990). Consumer information and the purchase of insurance. WZB Working Paper, Berlin.

Schlesinger, Harris and J-Matthias Graf von der Schulenburg (1991). Search costs, switching costs and product heterogeneity in an insurance market. Journal of Risk and Insurance, 109–119.

Searle, S.R., G. Casella, and C.E. McCulloch (1992). Variance Components. Wiley, New York.

Stigler, George J (1961). The economics of information. Journal of Political Economy 69 (3), 213–225.

Stone, Charles J (1986). Comment: Generalized additive models. Statistical Science 1 (3), 312–314.

Stone, Charles J and Cha-Yong Koo (1985). Additive splines in statistics. Proc Stat Comp Sect Am Statist Assoc 27, 45–48.

Tirole, Jean (1988). The Theory of Industrial Organization. MIT Press.

Vonesh, Edward F and Vernon G Chinchilli (1997). Crossover experiments. In Linear and Nonlinear Models for the Analysis of Repeated Measurements, 111–202. London: Chapman and Hall.

(33)

Appendices

A Categorization

Table A1: Categorization of continuous variables

variable | levels
age | [18(1)80), 80+
claimfree | 0, 1, ..., 15, 16+
agecar | [1(1)25), [25, 29), 30+
price (×100 €) | [45, 75), [75(5)350), [350(10)450), [450(20)550), [550(40)670), [670, 750), [750, 900), [900, 1200), 1200+
mass (×10 kg) | [60, 70), [70, 74), [75(1)150), [150(2)160), [160(5)170), [170, 180), [180, 200), 200+
powercar | [17, 40), [40, 60), [60(10)180), [180, 210), 210+

Note: The notation [a(c)b) denotes disjoint intervals of length c, starting at a and ending at, but not including, b. The union of these intervals forms [a, b).
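The interval notation can be expanded mechanically. A small Python sketch (the function name `expand_intervals` is hypothetical) that turns a specification such as [75(5)350) into its consecutive half-open intervals:

```python
import re


def expand_intervals(spec):
    """Expand '[a(c)b)' into consecutive half-open intervals [a, a+c), ..., [.., b).

    A plain interval such as '[25, 29)' is returned as a single pair.
    """
    m = re.fullmatch(r"\[\s*(\d+)\s*\(\s*(\d+)\s*\)\s*(\d+)\s*\)", spec)
    if m:
        a, c, b = (int(g) for g in m.groups())
        return [(lo, min(lo + c, b)) for lo in range(a, b, c)]
    m = re.fullmatch(r"\[\s*(\d+)\s*,\s*(\d+)\s*\)", spec)
    if m:
        return [(int(m.group(1)), int(m.group(2)))]
    raise ValueError(f"unrecognised interval spec: {spec!r}")


price_levels = expand_intervals("[75(5)350)")   # price in units of 100
print(len(price_levels), price_levels[0], price_levels[-1])
```

For example, the price levels [75(5)350) expand into 55 intervals, from [75, 80) to [345, 350).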

B Regression results

Table B1: Regression table

Variable | (1) log Reaal | (2) log ANWB | (3) log Univé | (4) log ASR | (5) log Allsecur | (6) log Nationale Nederlanden | (7) log CB Achmea


powercar = 40 | 0.00586** (2.68) | -0.00242*** (-8.59) | -0.00173*** (-4.02) | -0.00440*** (-6.66) | 0.00236*** (4.28) | 0.0202*** (48.53) | -0.00918*** (-15.96)
powercar = 50 | 0 (.) | 0 (.) | 0 (.) | 0 (.) | 0 (.) | 0 (.) | 0 (.)
powercar = 60 | -0.0285*** (-13.18) | -0.000616* (-2.48) | -0.00164*** (-3.76) | 0.00587*** (8.68) | 0.000305 (0.46) | -0.0102*** (-16.41) | -0.00193** (-3.03)
powercar = 70 | -0.0376*** (-15.22) | -0.000936** (-3.17) | -0.00470*** (-9.42) | 0.00263*** (3.49) | 0.00394*** (5.04) | -0.0102*** (-12.79) | 0.00997*** (13.90)
powercar = 80 | -0.0204*** (-7.77) | 0.000229 (0.60) | 0.0106*** (15.43) | 0.00339*** (3.67) | -0.00877*** (-8.72) | -0.0163*** (-15.24) | 0.00837*** (9.31)
powercar = 90 | -0.0515*** (-16.58) | 0.000451 (0.99) | 0.000781 (0.86) | 0.00808*** (7.28) | 0.0167*** (12.53) | -0.00891*** (-6.64) | 0.0128*** (11.80)
powercar = 100 | -0.0308*** (-9.82) | 0.000996* (2.02) | 0.0148*** (14.08) | 0.0127*** (10.63) | 0.0454*** (31.46) | -0.00759*** (-4.92) | 0.00513*** (4.38)
powercar = 110 | -0.0731*** (-20.20) | -0.000495 (-0.84) | 0.0283*** (20.80) | 0.0208*** (15.48) | 0.0801*** (44.13) | 0.00700*** (3.64) | 0.0184*** (12.84)
powercar = 120 | -0.0354*** (-9.17) | 0.00223** (2.80) | 0.0345*** (20.36) | 0.0185*** (11.91) | 0.144*** (73.41) | 0.0369*** (15.06) | 0.0167*** (9.98)
powercar = 130 | -0.0475*** (-9.73) | -0.00218* (-2.44) | 0.0250*** (11.08) | 0.0154*** (8.47) | 0.156*** (60.86) | 0.0612*** (17.82) | 0.0135*** (6.51)
powercar = 140 | -0.0461*** (-9.27) | 0.00348*** (3.44) | 0.0234*** (10.81) | 0.0277*** (13.28) | 0.151*** (54.46) | 0.102*** (26.17) | 0.00808*** (3.38)
powercar = 150 | -0.0390*** (-7.02) | 0.00439* (2.50) | 0.0264*** (9.35) | 0.0355*** (12.87) | 0.323*** (57.28) | 0.121*** (24.84) | 0.00630 (1.77)
powercar = 160 | -0.0760*** (-15.32) | -0.000389 (-0.25) | 0.0218*** (7.82) | 0.0273*** (10.97) | 0.390*** (109.68) | 0.186*** (33.35) | 0.0000252 (0.01)
powercar = 170 | -0.0740*** (-12.55) | -0.000254 (-0.13) | 0.0386*** (10.63) | 0.0277*** (9.15) | 0.398*** (96.90) | 0.217*** (40.28) | 0.00821* (2.05)
powercar = 180 | -0.0670*** (-12.05) | -0.00132 (-0.68) | 0.0285*** (8.31) | 0.0308*** (9.65) | 0.402*** (96.20) | 0.302*** (50.64) | 0.00155 (0.39)
powercar = 210 | -0.0585*** (-8.59) | -0.00426 (-1.30) | 0.0755*** (14.91) | 0.0378*** (8.65) | 0.547*** (102.87) | 0.415*** (56.61) | -0.00131 (-0.25)
Constant | 5.503*** (1192.84) | 5.165*** (10520.42) | 5.137*** (3795.62) | 4.842*** (2628.88) | 4.863*** (3145.50) | 5.051*** (3483.86) | 5.230*** (3462.00)
Observations | 86169 | 84407 | 86169 | 86169 | 86420 | 86169 | 86138
Adjusted R² | 0.943 | 0.999 | 0.990 | 0.990 | 0.982 | 0.986 | 0.985
F | 5581.3 | 527355.1 | 43841.0 | 46570.7 | 19866.9 | 26186.6 | 24338.3
df m | 284 | 281 | 284 | 284 | 284 | 284 | 284
df r | 85884 | 84125 | 85884 | 85884 | 86135 | 85884 | 85853
rmse | 0.128 | 0.0179 | 0.0436 | 0.0483 | 0.0549 | 0.0546 | 0.0475

t statistics in parentheses; * p < 0.05, ** p < 0.01, *** p < 0.001

C Linear Mixed Effect Estimations

C.1 Fixed Effects

Table C1: Fixed-effects estimates of the linear mixed-effects model


C.2 Random Effects

Approximate 95% confidence intervals for the random effects (level: vz)

term | lower | est. | upper
sd((Intercept)) | 0.256083 | 0.437081 | 0.746005
sd(cub_lft1) | 0.005777 | 0.010929 | 0.020678
sd(cub_lft2) | 0.013436 | 0.024327 | 0.044045
sd(cub_lft3) | 0.047557 | 0.092082 | 0.178294
sd(cub_lft4) | 0.122815 | 0.251775 | 0.516147
sd(cub_gw1) | 0.000201 | 0.000362 | 0.000654
sd(cub_gw2) | 0.000419 | 0.000757 | 0.001369
sd(cub_gw3) | 0.000931 | 0.001709 | 0.003135
sd(cub_gw4) | 0.001626 | 0.003032 | 0.005653
sd(diesel) | 0.043249 | 0.076803 | 0.136390
sd(electric) | 0.046439 | 0.109168 | 0.256631
sd(lpg) | 0.011036 | 0.021391 | 0.041462

Within-group standard error:

lower est. upper
