


Combining utility maximization with regret minimization: an exploratory analysis in the field of marketing research

Maura Schreurs

November 29, 2014

A thesis in partial fulfillment of the requirements for the degree of Master in Econometrics

Department of Quantitative Economics
University of Amsterdam

Supervisor 1: Dr. J.C.M. van Ophem
Supervisor 2: Dr. M.J.G. Bun


Abstract

Background The Random Utility Model (RUM) is the dominant modelling approach in marketing research. A restrictive property of the RUM is its assumption that context is irrelevant in a pairwise comparison, whereas marketeers often manipulate the context to influence consumer decision making. The Random Regret Model (RRM) has been proposed as an alternative to the RUM. The RRM acknowledges the context in a pairwise comparison, while remaining as parsimonious as the RUM. Recently, the Generalized RRM (G-RRM), which nests both models as special cases, has been introduced, as well as a latent class model that combines both decision rules.

Objectives First, we introduce the RRM and G-RRM into marketing research and explore the opportunities for a combined RUM-RRM model in marketing research. Second, we make mathematical contributions to the emerging literature: we provide a mathematical foundation for the RRM, show that the comparison of the RUM and the new version of the RRM is valid, and show that the same properties of marginal effects hold for the RRM as for the RUM.

Methodology Using a stated preference data set on Alzheimer treatment and a revealed preference data set on saltine crackers, we provide an empirical comparison by looking at model fit, trade-offs and forecasting performance.

Results Of the single-decision-rule models, the RUM has the best model fit for both data sets, whereas overall the latent RUM-RRM and latent G-RRM fit both data sets best. Parameter ratios did not differ significantly across models, but the interpretation of trade-offs differs between utility and regret models. The differences in forecasting errors were very small for both data sets.

Discussion Although a conceptual comparison shows promising results for the RRM, the empirical comparison using data from discrete choice experiments in the field of marketing shows that the RUM outperforms the RRM. The latent RUM-RRM could nevertheless reveal additional insight in marketing research, although the differences in forecasts are small. Additional analysis using more sophisticated model specifications is necessary to determine whether the differences in model fit and forecasting performance increase.

Keywords Random Utility Model, Random Regret Model, Generalized Random Regret Model, Hybrid RUM-RRM, Latent class modeling, Independence of irrelevant alternatives, Stated preferences data, Revealed preferences data


Preface

The content of this thesis is based on an internship at SKIM, an international market research agency with expertise in choice-based conjoint analysis. The internship was jointly supervised by research director M. Hoogerbrugge and project leader J. Hardon. It should be emphasized that the views expressed in this thesis are those of the author; no responsibility for them should be attributed to SKIM.


Contents

1 Introduction 1

2 Unordered multinomial choice models 2

2.1 General concept . . . 2

2.2 Random utility model . . . 2

2.2.1 General framework . . . 2

2.2.2 Logit specification . . . 3

2.2.3 Panel logit specification . . . 5

2.2.4 Interpretation of parameters . . . 6

2.3 Random Regret Model . . . 7

2.3.1 General framework . . . 7

2.3.2 Logit specification . . . 9

2.3.3 Interpretation of parameters . . . 10

2.4 A proof for a more general framework of the RRM . . . 11

2.5 Theoretical comparison of RUM and RRM . . . 13

2.5.1 Independence of Irrelevant Alternatives . . . 14

2.5.2 Fully- versus semi-compensatory decision rule . . . 16

2.5.3 Compromise effect . . . 17

2.5.4 Policy implications . . . 18

2.6 Combined RUM-RRM models . . . 19

2.6.1 Hybrid RUM-RRM model . . . 19

2.6.2 Generalized Random Regret Model . . . 20

2.6.3 Latent class RUM-RRM model . . . 22

3 Model Estimation Procedure 24

3.1 General procedure . . . 24

3.2 Global concavity in the (G-)RRM . . . 25

3.3 Latent class models . . . 25

4 Data analysis 26

4.1 Stated versus revealed preference data . . . 26

4.2 Medicine brand choice data . . . 26

4.2.1 Choice variable . . . 26

4.2.2 Explanatory variables . . . 26

4.3 Saltine crackers data . . . 27

4.3.1 Dependent variable . . . 28

4.3.2 Explanatory variables . . . 28

5 Methodology for the empirical comparison 29

5.1 Goodness of fit and estimates . . . 29

5.2 Parameter ratios and trade-offs at attribute level . . . 30

5.3 Predicted choice probabilities . . . 32

5.4 Forecasting measures . . . 32


6 Results of the empirical comparison 34

6.1 Alzheimer treatment . . . 34

6.1.1 Goodness of fit and estimates . . . 34

6.1.2 Parameter ratios and trade-offs at attribute level . . . 38

6.1.3 Predicted choice probabilities for specific choice task . . . 39

6.1.4 Forecasting measures . . . 40

6.1.5 Regret weight in the generalized regret models . . . 41

6.2 Saltine crackers . . . 44

6.2.1 Goodness of fit and estimates . . . 44

6.2.2 Parameter ratios . . . 46

6.2.3 Average predicted choice probabilities . . . 46

6.2.4 Forecasting measures . . . 46

7 Discussion 49

8 Bibliography 51

A Appendix 54

A.1 Derivation marginal effects and elasticities RUM . . . 54

A.2 Derivation marginal effects and elasticities RRM . . . 56

A.3 Proof of equivalence RUM and RRM in binary choice set . . . 58

A.4 Proof original and new framework are both special cases of a more general framework 59

A.5 The RUM is a restricted version of the G-RRM . . . 61

A.6 Data analysis . . . 62

A.7 Ben-Akiva Swait test p-values . . . 63

A.7.1 Alzheimer Data . . . 63

A.7.2 Saltine Data . . . 64

A.8 Parameter ratios . . . 65

A.8.1 Alzheimer data . . . 65

A.8.2 Saltine data . . . 68

A.9 Prediction realization tables for Alzheimer data . . . 69

A.9.1 In-sample prediction tables . . . 69

A.9.2 Out-of-sample prediction tables . . . 71


1 Introduction

For marketeers it is of great importance to gain insight into the behavioural process underlying the choices made by their potential customer base. This knowledge can be utilized to persuade potential customers to buy their product. For decades, market researchers have provided marketeers with such insight by investigating, from a causal perspective, what drives a consumer to choose one product over another. The contributions of McFadden gave an enormous boost to market research by introducing the Random Utility Model (RUM), which uses a conditional logit model; for this he received the Nobel Memorial Prize in Economic Sciences in 2000. Today the RUM is still the dominant discrete choice model used in marketing research.

The RUM assumes that consumers maximize their utility when encountering a decision. The decision rule of utility maximization is rooted in classical economics and assumes that consumers evaluate a choice situation as rational economic agents. Today there is a large body of evidence that we act predictably irrationally rather than rationally; based on this evidence the discipline of behavioural economics has emerged. There is growing interest in decision rules motivated by behavioural economics that increase the behavioural realism of discrete choice models.

Recently, a new behavioural process has been introduced in the field of choice modeling: the Random Regret Model (RRM) (Chorus et al., 2008). The RRM is motivated by behavioural economics and is almost as parsimonious as the RUM. It assumes that people minimize anticipated regret, and it models semi-compensatory behaviour, resulting in the prediction of the compromise effect. In addition, an encompassing model named the Generalized Random Regret Model (G-RRM) has been introduced by Chorus (2014) that allows one to estimate which decision rule (utility maximization or regret minimization) fits the data better.

In recent years several empirical comparisons of the RUM and RRM have been provided (e.g. Hensher et al. (2011), Thiene et al. (2012), de Bekker-Grob et al. (2013)), of which one has been performed in the field of marketing research (Chorus et al., 2014). However, to the best of our knowledge, no empirical comparisons have been performed in the context of discrete choice experiments with many attributes, and in particular with many categorical attributes. For example, the G-RRM has only been estimated for one data set with a few numerical attributes. In marketing research, choice experiments are often complex, meaning that the data contain many attributes and constraints. This thesis studies the possibilities of the RRM in this type of discrete choice experiment by performing an empirical comparison between the RUM and RRM.

Furthermore, we will address a second trend in the literature, namely the growing interest in models that capture potential heterogeneity in decision rules. This is done by analyzing latent class models that combine the utility maximizing and the regret minimizing decision rules.

The remainder of this thesis is organized as follows. In Section 2 the RUM and RRM are presented and interpreted; the theoretical differences between the two models are discussed, as well as the theory for the models that combine both decision rules. In Section 3 the model estimation procedure is discussed, and in Section 4 we analyze the data used for the empirical comparison. The methodology for the empirical comparison is explained in Section 5, and the results of this comparison are discussed in Section 6. The findings and suggestions for further research are discussed in Section 7.


2 Unordered multinomial choice models

2.1 General concept

An unordered multinomial choice model is used to analyze the choice between a finite number of alternatives that do not have a natural ranking. There are numerous examples in our daily lives where we encounter such a discrete choice situation, for example each time you choose between different shampoos or between take-away meals. Using a multinomial choice model, the aim of the researcher is to gain an understanding of the behavioural process underlying the choice decision from a causal perspective.

In discrete choice modeling it is assumed that several factors together determine the choice y of a decision maker. We can represent the behavioural process that causes the decision maker to choose product y, based on observed factors x and unobserved factors ε, by h(x, ε). Now, a consumer chooses product y if

y = h(x, ε).  (1)

This behavioural process h(·) is deterministic in the sense that, given (x, ε), the chosen alternative y is fully known. But ε is unobserved, so a density f(ε) has to be assumed in order to determine a probability for each choice.

Using the indicator function I(·), which equals one if the argument holds and zero otherwise, we can write the probability as

P(y|x) = P(I{h(x, ε) = y} = 1) = ∫ I{h(x, ε) = y} f(ε) dε,  (2)

using the density f(ε).

So, we need to specify a behavioural process h(·), and we need to impose a density for ε. Depending on the density f(ε), the integral can be solved in closed form, by simulation, or by a combination of the two. Train (2009) provides an overview of advances in the specification of the error term. The focus of this thesis lies on the observable factors x and the assumed behavioural process underlying the choice. To maintain this focus, assumptions regarding ε will be made such that the integral in (2) can be solved in closed form. The first and dominant decision rule, discussed in the next subsection, is utility maximization.
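As an illustration of equation (2), the integral can also be approximated by simulation. The sketch below (all attribute values and parameters are made up for illustration) draws the unobserved factors from a standard Gumbel density, applies a utility-maximizing behavioural process as h(x, ε), and counts choice frequencies; with this particular density the integral also has the closed logit form derived later in Section 2.2.2, which the simulation should reproduce.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up systematic utilities V_j = x_j' beta for J = 3 alternatives.
beta = np.array([1.0, -0.5])
X = np.array([[1.0, 2.0],
              [0.5, 1.0],
              [2.0, 3.0]])
V = X @ beta

# Approximate P(y = j | x) = ∫ I{h(x, ε) = j} f(ε) dε by drawing ε from an
# assumed density f (standard Gumbel here) and counting how often each
# alternative attains the maximum of V_j + ε_j (utility maximization as h).
R = 200_000
eps = rng.gumbel(size=(R, 3))
p_sim = np.bincount(np.argmax(V + eps, axis=1), minlength=3) / R

# With Gumbel errors the integral has a closed (logit) form, so the
# simulated shares should match the softmax of V.
p_logit = np.exp(V) / np.exp(V).sum()
print(p_sim, p_logit)
```

With other densities for ε no closed form exists and the simulated shares are all one has; this is the "by simulation" route mentioned above.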

2.2 Random utility model

2.2.1 General framework

The Random Utility Model (RUM) is based on utility theory (McFadden, 1974; Thurstone, 1927; Manski, 1977) and states that economic agents try to maximize their utility when they encounter a decision. In doing so, an agent evaluates each alternative in isolation, attaches a utility value to it, and always rationally chooses the one that provides the highest utility.

Utility is an abstract concept that we cannot completely observe, so the RUM belongs to the class of latent variable models. It states that the utility in choice situation i associated with product j, U_{ij}, is composed of a deterministic component V_{ij} (= h(x_{ij})), which the researcher can measure, and a random component ε_{ij}. The dominant model is the additive random utility model (ARUM). The ARUM states that the utility perceived for a product equals the sum of the utilities attached to each of the attributes that together define the product. The utility of an attribute is in


turn determined by its performance multiplied by a taste parameter that indicates how important the attribute is to the decision maker. For a J-alternative multinomial model the utility is specified as

U_{ij} = V_{ij} + ε_{ij} = Σ_m β_m · x_{ijm} + ε_{ij},  i = 1, …, n, j = 1, …, J.  (3)

This thesis only discusses the ARUM, also known as linear-in-parameters RUM, so from now on the A will be suppressed for the sake of simplicity.

The consumer maximizes utility, implying that alternative j in choice task i will be chosen if the utility of product j exceeds the utility attached to every other alternative in choice set i. So, product j is chosen with probability

P(y_i = j) = P(U_{ij} > U_{ik}, ∀k ≠ j)
          = P(V_{ij} + ε_{ij} > V_{ik} + ε_{ik}, ∀k ≠ j)
          = P(ε_{ik} − ε_{ij} < V_{ij} − V_{ik}, ∀k ≠ j).  (4)

Recall that the researcher cannot observe ε, so we must make an assumption about the density f(ε_i). Using this density combined with equation (2), the cumulative probability in equation (4) equals

P(y_i = j) = ∫ I(ε_{ik} − ε_{ij} < V_{ij} − V_{ik}, ∀k ≠ j) f(ε_i) dε_i,  (5)

i.e. a multidimensional integral over the density f(ε_i). The density f(ε_i) is the distribution of the unobserved portion of utility within the population of people who share the same observed portion of utility (Train, 2009). This interpretation of the unobserved part implies that the probability in equation (4) can also be interpreted as the share of people who choose product j within that population (Train, 2009).

Different assumptions for f(ε) yield different RUM specifications. We will use the logit specification because the independence assumption on the error terms yields a simple form for the first order conditions and the asymptotic results. In the remainder, we suppress the subscript i for notational convenience.

2.2.2 Logit specification

The key assumption for the logit model is that each idiosyncratic error ε_j from (3) is independent and identically distributed (i.i.d.) with a type I extreme value distribution, also known as the standard Gumbel distribution, i.e.

F(ε_j) = e^{−e^{−ε_j}}.  (6)

The scale is normalized such that the variance of ε_j equals π²/6. Now, since each ε_j is i.i.d. extreme value type I distributed, the difference ε_k − ε_j is logistically distributed, i.e.

F(ε_k − ε_j) = e^{ε_k − ε_j} / (1 + e^{ε_k − ε_j}).  (7)
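The claim that the difference of two i.i.d. type I extreme value draws is logistically distributed can be checked numerically; the following sketch compares the empirical CDF of simulated Gumbel differences with the logistic CDF of (7) at a few arbitrary points.

```python
import numpy as np

rng = np.random.default_rng(1)

# Draw two independent standard Gumbel (type I extreme value) samples and
# compare the empirical CDF of their difference with the logistic CDF.
n = 500_000
diff = rng.gumbel(size=n) - rng.gumbel(size=n)

for x in (-2.0, 0.0, 1.5):
    empirical = (diff <= x).mean()
    logistic = np.exp(x) / (1 + np.exp(x))  # F(x) = e^x / (1 + e^x)
    print(x, empirical, logistic)
```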

Recall that the error term represents the unobserved part of the utility of the decision maker with respect to alternative j. Under this assumption about the unobserved part of the model, the probability of choosing alternative j in (4) can be rewritten as the closed-form formula

P(y = j) = e^{V_j} / Σ_{k=1..J} e^{V_k}.  (8)


There are several different logit models that share the assumption on ε_j given in (6): the multinomial logit, the conditional logit and the mixed conditional logit. What distinguishes these models is the type of explanatory variables used to estimate the predicted choice probabilities. There are two types of explanatory variables used in stated preference data sets:

• The first type of explanatory variables is case specific, meaning that they vary across individuals but are the same for each alternative within a choice task. Examples are sociodemographic variables like gender, age and profession. These variables are denoted by w_i, i = 1, …, n.

• The second type of explanatory variables varies both across decision makers and across alternatives. An example is the price of a train ticket for different train routes. These variables are denoted by x_{ij}, i = 1, …, n, j = 1, …, J.

The multinomial logit model is applicable when the data only contain the first type of variables, w_i. In this thesis we analyze two data sets that both contain attributes that vary across alternatives, so we will not discuss the multinomial logit any further. The conditional logit and mixed conditional logit are described next.

Conditional logit

In a discrete choice experiment attributes vary across alternatives by construction. For example, the price of a product is varied across alternatives and choice tasks to obtain pricing insights. The model for this type of data was already given in Subsection 2.2.1 in (3) and is known as the conditional logit model. The attributes contain information about the decision maker relative to each alternative. The parameter vector β is constant across all alternatives, so no normalization is necessary for identification. It represents the importance of a particular attribute to the decision maker: for example, β_price represents how important the price of a product is to the decision maker. The predicted choice probability is given by

P(y = j) = e^{x_{ij}β} / Σ_{k=1..J} e^{x_{ik}β}.  (9)

Note that we do not have a constant in this model. This is because we are evaluating differences, i.e. V_{ij} − V_{ik}, which means that non-varying variables such as a constant drop out. However, one can include an alternative specific constant β_{0,j}, where β_{0,J} is normalized to zero for identification. The constant measures how much more likely alternative j is to be chosen compared to the base alternative. When an alternative specific constant is added, we no longer have a conditional logit but a special case of the mixed conditional logit, which is discussed next.

Mixed conditional logit

In the conditional logit model we assumed all variables differ across alternatives. However, it may be useful to explain choice behaviour by individual specific variables like age or gender. These variables may correlate with the choice, since brand A may be more appealing to young women than brand B, which is perhaps preferred by older men. When all variables are of this type, we arrive at the multinomial logit. If one wants to add variables like gender to the attributes



presented in the hypothetical choice task, the suitable model is the mixed conditional logit model. This model combines both types of explanatory variables and is given by

U_{ij} = V_{ij} + ε_{ij} = x_{ij}β + w_i α_j + ε_{ij}.  (10)

Note that for the case specific variables we obtain J sets of estimates α_j, where we normalize the set α_J to zero for identification. When we set w_i = 1 we estimate an alternative specific constant.
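A minimal numerical sketch of the mixed conditional logit utilities in (10), with made-up values for β and α_j (normalizing α_J = 0) and a single case-specific variable w_i:

```python
import numpy as np

# Illustrative mixed conditional logit utilities, eq. (10):
# V_ij = x_ij beta + w_i alpha_j, with alpha_J normalized to zero.
beta = np.array([-0.8, 0.3])        # attribute weights (assumed values)
alpha = np.array([0.5, -0.2, 0.0])  # alpha_J = 0 for identification
w_i = 1.0                           # w_i = 1 yields alternative specific constants
X = np.array([[3.0, 1.0],
              [2.5, 0.0],
              [4.0, 1.0]])          # attributes of J = 3 alternatives

V = X @ beta + w_i * alpha          # systematic utilities
P = np.exp(V) / np.exp(V).sum()     # logit choice probabilities
print(P)
```

Setting w_i to an actual case-specific variable (e.g. age) instead of 1 shifts each alternative's utility by α_j per unit of that variable, which is exactly how the two variable types combine in (10).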

2.2.3 Panel logit specification

In the logit specification discussed above we assumed i.i.d. error terms. In a discrete choice experiment a respondent is faced with multiple choice tasks, which makes this assumption invalid. More specifically, the n observations consist of R respondents filling out Q questions, yielding n = RQ responses. A pooled cross-section contains different cross-sections at multiple moments, whereas with panel data the same cross-section is observed at multiple moments in time. Although the independence assumption may be reasonable for pooled cross-sections, observations from the same respondent are likely to be correlated, since R people providing Q responses provide less information than RQ people providing one response (Louviere and Woodworth, 1983). Hence, multiple choices from one respondent or buyer cannot be assumed to be independent.

The panel data structure enables us to include individual heterogeneity in the discrete choice model. We illustrate this for the RUM, but the extension applies to the regret minimizing decision rule as well. Denoting the individual heterogeneity by c_{rj}, the general framework for a discrete choice model with panel data is given by

U_{rjq} = x_{rjq}β + c_{rj} + ε_{rjq}.  (11)

Note that this heterogeneity is often not observed, so we cannot simply add a regressor. By using an effects model, one includes an extra variable that captures the effect of a respondent, thereby accounting for the unobserved variance across respondents. This way we can account for neglected heterogeneity. To do so, some assumptions need to be imposed. The first assumption is strict exogeneity of the regressors, given by

E[ε_{rjq} | x_{rj1}, …, x_{rjQ}, c_{rj}] = 0.  (12)

Moreover, we need to impose an assumption about the relation between the unobserved heterogeneity and the observed regressors. The assumption of mean independence, given by

E[c_{rj} | x_{rj1}, …, x_{rjQ}] = α,  (13)

is a strong assumption. A less restrictive assumption is

E[c_{rj} | x_{rj1}, …, x_{rjQ}] = h(x_{rj1}, …, x_{rjQ}),  (14)

where we allow correlation between the unobserved heterogeneity and the regressors in the discrete choice model. Imposing assumptions (12) and (13) yields the random effects model. The random effects model has a compound error term: besides the idiosyncratic error, there is an error term that is identical across the choice tasks of a respondent. Assumptions (12) and (14) yield the fixed effects model. The fixed effects model is less restrictive, since it allows for correlation between the regressors and the unobserved heterogeneity. However, estimating a fixed effects model by maximum likelihood is not possible due to the inconsistency of


the joint maximum likelihood estimator; see Chamberlain (1982). One could instead estimate the fixed effects model via a random effects model in which a dummy is included for each respondent, normalizing the Rth dummy to 0 for identification, so that the dummies capture the unobserved respondent effects. Given these complications, in this thesis we will impose the assumption stated in (13) about the unobserved heterogeneity.

Under the assumption of random effects, we can adopt a simpler approach that avoids advanced modelling and still produces consistent results. In fact, the logit specification with i.i.d. errors discussed above provides consistent but inefficient estimates. In this thesis we will correct the standard errors a posteriori by use of the sandwich estimator (e.g. White (1980)). The method for obtaining the corrected standard errors will be explained in Section 3, together with the naive estimation procedure, i.e. under the assumption of i.i.d. error terms.²
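The sandwich correction can be sketched for a simple binary conditional logit on simulated data (all numbers below are illustrative, not taken from the thesis data). The respondent-level score sums form the "meat" B and the observed information forms the "bread" A, giving the cluster-robust variance A⁻¹BA⁻¹:

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulated binary conditional logit with a respondent random effect; all
# numbers are illustrative. R respondents answer Q tasks each, so responses
# within a respondent are correlated through c_r.
R_resp, Q, beta_true = 200, 10, 1.0
x = rng.normal(size=(R_resp, Q))              # attribute difference per task
c = rng.normal(scale=0.8, size=(R_resp, 1))   # unobserved heterogeneity c_r
y = (beta_true * x + c + rng.logistic(size=(R_resp, Q)) > 0).astype(float)

# Naive pooled maximum likelihood via Newton-Raphson for P(y = 1) = Λ(βx).
beta = 0.0
for _ in range(25):
    p = 1 / (1 + np.exp(-beta * x))
    score = ((y - p) * x).sum()
    hess = -(p * (1 - p) * x**2).sum()
    beta -= score / hess

# Sandwich variance A^{-1} B A^{-1}: A is the observed information, B sums
# the squared score aggregated per respondent (the cluster).
p = 1 / (1 + np.exp(-beta * x))
A = (p * (1 - p) * x**2).sum()
s_r = ((y - p) * x).sum(axis=1)     # score summed within each respondent
B = (s_r**2).sum()
se_naive = 1 / np.sqrt(A)
se_cluster = np.sqrt(B) / A
print(beta, se_naive, se_cluster)
```

The point estimate is the same under both approaches; only the standard error changes, which is the a posteriori correction described above.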

2.2.4 Interpretation of parameters

Due to the non-linearity of the logit model, the interpretation of the parameters is not straightforward. To evaluate the effect of the attributes, one often considers the odds ratio, which equals the ratio of two choice probabilities. For alternative j with respect to alternative l it is given by

Ω_{j|l} = P(y = j) / P(y = l) = e^{β_{0,j} + x_{ij}β} / e^{β_{0,l} + x_{il}β} = e^{[β_{0,j} − β_{0,l}] + β[x_{ij} − x_{il}]},  (15)

with j = 1, …, J. The intercept of the odds ratio shows how on average one alternative is preferred over another: if β_{0,j} > β_{0,l}, then alternative j is preferred over alternative l. For e.g. β_price (< 0), alternative j is preferred more strongly the greater the negative difference x_{ij,price} − x_{il,price}. The odds ratio also shows that, due to symmetry, a gain of β for product j versus l corresponds to a loss of β for product l versus j.

In order to interpret the total effect of a one unit increase in an attribute, ceteris paribus, one generally calculates the marginal effect. The marginal effect is obtained by taking the partial derivative of the choice probability of alternative j, either with respect to x_{ij}, which gives

∂P(y = j)/∂x_{ij} = β P(y = j) [1 − P(y = j)],  (16)

or with respect to x_{il}, l ≠ j, which yields

∂P(y = j)/∂x_{il} = −β P(y = j) P(y = l).  (17)

The direct marginal effect should be interpreted as follows: a one unit increase in x_{ij} increases or decreases the probability that alternative j is chosen by the marginal effect expressed as a percentage. Note that due to the non-linearity, the marginal effect depends on all attributes, so there is no single marginal effect per attribute. Two methods are applied to obtain a marginal effect per attribute. First, one can calculate the marginal effects at the mean, i.e. take the mean of every explanatory variable and calculate the marginal effect at those attribute values; note that this only makes sense for numeric attributes, since the mean of a categorical variable has no meaning. Second, one can calculate the mean of the marginal effects. Note that the marginal effects sum to zero, as shown in Appendix A.1.
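The zero-sum property of the marginal effects in (16) and (17) is easy to verify numerically for an illustrative three-alternative conditional logit (β and the attribute values below are made up):

```python
import numpy as np

# Marginal effects of one attribute (e.g. price) in a conditional logit:
# the own effect (16) plus the cross effects (17) on the remaining
# alternatives must sum to zero.
beta = -0.6                      # assumed price coefficient
x = np.array([2.0, 3.0, 2.5])    # price of each of J = 3 alternatives
P = np.exp(beta * x) / np.exp(beta * x).sum()

j = 0
own = beta * P[j] * (1 - P[j])                             # eq. (16)
cross = [-beta * P[j] * P[l] for l in range(3) if l != j]  # eq. (17)
print(own + sum(cross))  # ≈ 0
```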

² Note that when the mean independence assumption does not hold, our analysis will produce biased and inconsistent results.


To deal with scale sensitivities one often prefers elasticities. The elasticity measures the percentage change in the choice probability of alternative j caused by a one percent increase in x_{ij}. For the conditional logit model, the direct elasticity is given by

∂P(y = j)/∂x_{ij} · x_{ij} = β x_{ij} P(y = j) [1 − P(y = j)],  (18)

and the cross elasticity, the effect of x_{ij} on the choice probability of another alternative l, is given by

∂P(y = l)/∂x_{ij} · x_{ij} = −β x_{ij} P(y = j) P(y = l).  (19)

It is straightforward to show that the elasticities also sum to zero.

2.3 Random Regret Model

2.3.1 General framework

The Random Regret model (RRM) is based on regret theory (Loomes and Sugden, 1982; Bell, 1982; Fishburn, 1982) and assumes that instead of maximizing utility, respondents seek alternatives that promise to minimize anticipated regret. Regret is a negative, cognitively based emotion that we experience when realizing or imagining that our present situation would have been better, had we decided differently (Zeelenberg, 1999).

If one considers the possibility of regret before making a decision, this is known as anticipated regret. Originally, regret theory was based on risky decisions, meaning that the outcome of an event is uncertain at the moment one has to choose; an example of a risky decision is deciding on your lottery ticket number. Over the years, however, evidence of a regret minimizing behavioural process has been provided for both risky and riskless decision making in different disciplines, including marketing (Zeelenberg and Pieters, 2007; Simonson and Tversky, 1992), psychology (Zeelenberg, 1999) and management science (Bell, 1982). Coricelli et al. (2005) also found neurological evidence by showing that when people make choices, brain regions associated with anticipated regret are activated. A mathematical model based on this paradigm of anticipated regret was recently proposed by Chorus et al. (2008) and named the RRM.

The level of anticipated regret associated with the considered alternative j is composed of a systematic regret R_j and a random error ε_j representing unobserved heterogeneity (Chorus, 2010). Hence, the RRM belongs to the latent variable models as well. This can be written as

RR_j = R_j + ε_j,  j = 1, …, J.  (20)

The systematic regret R_j equals the sum of all binary regrets that arise from the pairwise comparison of the alternative under consideration with each of the other alternatives, i.e. R_j = Σ_{k≠j} R_{j↔k}. This bilateral comparison is performed for each attribute of the product under consideration, so that R_{j↔k} equals the sum of the comparisons in terms of each of the m attributes: R_{j↔k} = Σ_m R^m_{j↔k}.

The attribute level regret R^m_{j↔k} in turn either equals zero (when the considered alternative performs better than the other alternative on that particular attribute) or equals the weighted difference in attribute performance (Chorus, 2010). This weight equals the parameter β_m, which indicates how important the attribute is to the decision maker. Now we have arrived at the model originally proposed by Chorus et al. (2008), given by

RR_j = Σ_{k≠j} Σ_m max{0, β_m · (x_{km} − x_{jm})} + ε_j.  (21)

However, this behavioural process is discontinuous at the point 0, complicating estimation procedures as well as the calculation of elasticities, due to the inability to compute partial derivatives around the discontinuity. Therefore, Chorus further developed the RRM by using a logarithmic function as an approximation of the max-function. The derivation of the new RRM is as follows. First, two i.i.d. extreme value type I distributed error terms ν_{0m} and ν_{xm} are added, which can be interpreted as representing heterogeneity in attribute perceptions and weights (Chorus, 2010). In formula,

R^m_{j↔k} = max{0 + ν_{0m}, β_m · (x_{km} − x_{jm}) + ν_{xm}}.  (22)

Next, the error terms are integrated out using the extreme value type I density shown in (6). Ben-Akiva and Lerman (1985) show that the integral for the expected value equals the logsum formulation

∫ R^m_{j↔k} · f(ν) dν = ln[1 + e^{β_m·(x_{km} − x_{jm})}].  (23)

This yields the final formulation of the RRM used today,

RR_j = Σ_{k≠j} Σ_m ln[1 + e^{β_m·(x_{km} − x_{jm})}] + ε_j.  (24)
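Equation (24) can be sketched in a few lines of code. With the illustrative attribute levels below (all numbers are made up), the third alternative lies in between the other two on both attributes and attains the lowest regret, previewing the compromise effect discussed in Section 2.5.3:

```python
import numpy as np

# Systematic regrets R_j of eq. (24) and the implied logit probabilities,
# for three made-up alternatives described by price and quality.
beta = np.array([-1.0, 0.5])     # assumed taste parameters
X = np.array([[2.0, 3.0],        # cheap, low quality
              [3.0, 5.0],        # expensive, high quality
              [2.5, 4.0]])       # the in-between (compromise) alternative
J = X.shape[0]

R = np.zeros(J)
for j in range(J):
    for k in range(J):
        if k != j:
            # sum over attributes m of ln(1 + exp(beta_m (x_km - x_jm)))
            R[j] += np.log1p(np.exp(beta * (X[k] - X[j]))).sum()

P = np.exp(-R) / np.exp(-R).sum()
print(R, P)   # the compromise alternative has the lowest regret
```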

Now the regret function is smooth and monotonically increasing in its argument, making the RRM and the RUM (Section 2.2) equally parsimonious. Figure 1 shows the binary regret curve for both the original and the smoothed version of the RRM. As illustrated in Figure 1, the binary regret

Figure 1: The binary regret curve of the RRM


under the smoothed specification is slightly larger than zero in the case of a winning comparison, instead of exactly equal to zero. Moreover, the slope of the curve does not necessarily equal β_m, as is the case in the RUM. The parameter β_m is an upper bound for the regret caused by a one unit increase of the corresponding attribute m. So, instead of measuring the actual contribution to regret, it measures the potential regret caused by a unit increase of attribute m. Whether this potential level of regret is reached depends on the performance of alternative j on attribute m relative to alternative k. Assume β_m > 0, so that a higher value of the attribute is preferred over a lower value, and assume x_{jm} performs better than x_{km}, so that x_{km} − x_{jm} < 0. When x_{km} − x_{jm} increases by one unit, we see in the figure that the increase in regret is smaller than β_m. If instead x_{km} − x_{jm} > 0, so that alternative k performs relatively better on attribute m, a unit increase in x_{km} − x_{jm} leads to a larger increase in regret, which approaches β_m for larger values of x_{km} − x_{jm}. Note that this analysis shows that a deterioration on a relatively poorly performing attribute influences the level of regret more than the same deterioration on a well performing attribute. To give an example, imagine the attribute is storage capacity, so m = storage, measured in GB. We expect β_storage > 0, since a higher storage capacity is preferred over a lower one, everything else equal. If alternative j has more storage than k, the regret is close to zero. When this alternative decreases its number of GB, the increase in regret will be minimal, since it still offers the most storage around. However, if alternative j already had fewer GB than k, the additional regret caused by a unit decrease in GB will be higher. The sign of β_m is estimated simultaneously with its magnitude. When the sign is positive, an increase in the difference between the unchosen and chosen alternative yields a (potential) increase in regret. Similarly, for e.g. a price attribute where a negative coefficient is expected, we get a positive contribution to regret for alternative j when it is more expensive than the other alternatives, since both β_m and x_{k,price} − x_{j,price} are negative; regret then increases with the price difference between the two alternatives.

We find an asymmetry in the effects of a unit change in an attribute value when using the RRM. So, the preferences of a consumer depend on the other alternatives in the choice set, which is consistent with findings in consumer psychology (e.g. Simonson (1989), Müller et al. (2010), de Clippel and Eliaz).
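The asymmetry described above can be illustrated with a short numerical sketch. The function name, the β value and the storage framing are illustrative choices, not part of the estimation procedure described in this thesis.

```python
import math

def binary_regret(beta, delta_x):
    """Binary regret ln(1 + exp(beta * delta_x)) for one attribute,
    where delta_x = x_km - x_jm (competitor minus considered alternative)."""
    return math.log(1.0 + math.exp(beta * delta_x))

beta = 0.1  # hypothetical taste parameter for, say, storage capacity

# When alternative j is already ahead (delta_x = -10), a further one-unit
# deterioration barely raises regret ...
extra_regret_when_ahead = binary_regret(beta, -9) - binary_regret(beta, -10)
# ... while the same deterioration when j is already behind (delta_x = +10)
# raises regret by almost the full beta.
extra_regret_when_behind = binary_regret(beta, 11) - binary_regret(beta, 10)

print(extra_regret_when_ahead < extra_regret_when_behind)  # True
```

This reproduces the reference dependent asymmetry: the increase in regret approaches βm only on the losing side of the comparison.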

Alternative j is chosen when it minimizes anticipated regret, which is equivalent to maximizing the negative of anticipated regret, so that the choice probability is given by

P(y = j) = P(RRj ≤ RRk, ∀k ≠ j)
         = P(−(Rj + εj) ≥ −(Rk + εk), ∀k ≠ j)
         = P(−(εj − εk) ≥ Rj − Rk, ∀k ≠ j)
         = P(−(εk − εj) ≤ −(Rj − Rk), ∀k ≠ j).   (25)

2.3.2 Logit specification

Assuming the negatives of the error terms to be i.i.d. extreme value type I distributed, it can be shown that we again have the logit formulation of the choice probabilities, given by

P(y = j) = e^(−Rj) / Σ_{k=1..J} e^(−Rk).   (26)

The resulting model has a smooth regret function and is equally parsimonious as the RUM, with an equal number of parameters that can be estimated by maximum likelihood, as will be discussed in Section 3. The model can be estimated using software packages like Biogeme (Bierlaire et al., 2006), Matlab or R. Please note that, when a choice task consists of only two alternatives, the RRM logit model reduces to the RUM logit presented in Section 2.2.2. A proof based on Chorus (2010) is provided in Appendix A.3.
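As a minimal sketch of this specification (illustrative code, not the Biogeme/Matlab/R estimation software referred to above), the probabilities in (26) can be computed directly from the attribute matrix:

```python
import math

def rrm_probabilities(X, betas):
    """RRM logit probabilities P(y=j) = exp(-R_j) / sum_k exp(-R_k).
    X: list of alternatives (each a list of attribute values); betas: taste
    parameters. A sketch of the smoothed RRM of Chorus (2010)."""
    J = len(X)
    R = []
    for j in range(J):
        regret = 0.0
        for k in range(J):
            if k == j:
                continue
            for m, beta in enumerate(betas):
                regret += math.log(1.0 + math.exp(beta * (X[k][m] - X[j][m])))
        R.append(regret)
    denom = sum(math.exp(-r) for r in R)
    return [math.exp(-r) / denom for r in R]

# Symmetric two-alternative task: probabilities are 0.5 each, and with J = 2
# the RRM logit coincides with the RUM logit.
print(rrm_probabilities([[20, 40], [40, 20]], [0.1, 0.1]))  # [0.5, 0.5]
```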

Due to the correspondence, we will provide a theoretical comparison of RUM and RRM in Section 2.5, after we have discussed the interpretation of parameters in the next subsection.

2.3.3 Interpretation of parameters

Also for the RRM, the parameters cannot be interpreted directly due to the non-linearity of the model. The sign of the parameters, however, can be interpreted. For the RRM, the odds ratio for alternative j with respect to alternative l is given by

P(y = j) / P(y = l) = [e^(−Rj) / Σ_{k=1..J} e^(−Rk)] / [e^(−Rl) / Σ_{k=1..J} e^(−Rk)] = e^(−Rj) / e^(−Rl)
                    = e^(Σ_{k≠j} Σ_m −ln[1 + e^(βm·(xkm−xjm))]) / e^(Σ_{k≠l} Σ_m −ln[1 + e^(βm·(xkm−xlm))]).   (27)

Recall that the odds ratio shows the effect of a unit increase in an attribute value on the likelihood that alternative j is chosen relative to alternative l. Again we see that the choice probability is a monotone function of the attribute values.

For the total effect we again need to consider the marginal effects. The derivation is less straightforward than for the RUM and is provided in Appendix A.2. The direct marginal effect is equal to

∂P(Y = l)/∂xlm = P(Y = l) · βm [Σ_{j=1..J} P(Y = j) · q(l, j, m)]   (28)

and the cross marginal effect is equal to

∂P(Y = i)/∂xlm = P(Y = i) · βm [Σ_{j=1..J} P(Y = j) · q(l, j, m) − q(l, i, m)],   (29)

with

q(l, j, m) = e^(βm·(xlm−xjm)) / [1 + e^(βm·(xlm−xjm))].

Although it is less straightforward to see, the sign of the marginal effect is fully determined by the sign of the parameter. The interpretation is as follows: if attribute m of alternative l increases by one unit, this changes the probability that alternative l itself is chosen (the direct effect) and the probabilities that the other alternatives are chosen (the cross effects). The marginal effects need to sum to zero; we will prove in this thesis that this is indeed the case. To see this, note that

Σ_{j=1..J} ∂P(Y = j)/∂xlm = ∂P(Y = l)/∂xlm + Σ_{i≠l} ∂P(Y = i)/∂xlm.   (30)

By substituting the direct and indirect marginal effects of the RRM, it follows that the sum of the marginal effects equals zero, see Appendix A.2 for the full proof.
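The sum-to-zero property can also be checked numerically: since the probabilities always sum to one, a finite-difference approximation of the derivatives with respect to any single attribute must cancel. The sketch below re-implements the RRM probabilities so it is self-contained; all names and attribute values are illustrative.

```python
import math

def rrm_probabilities(X, betas):
    """Smoothed-RRM logit probabilities (illustrative re-implementation)."""
    J = len(X)
    R = [sum(math.log(1.0 + math.exp(b * (X[k][m] - X[j][m])))
             for k in range(J) if k != j
             for m, b in enumerate(betas))
         for j in range(J)]
    denom = sum(math.exp(-r) for r in R)
    return [math.exp(-r) / denom for r in R]

# Because the probabilities always sum to one, their derivatives with respect
# to any single attribute x_lm must cancel. Finite-difference check:
X = [[20.0, 40.0], [40.0, 20.0], [30.0, 30.0]]
betas = [0.1, 0.1]
h = 1e-6
X_up = [row[:] for row in X]
X_up[0][0] += h  # perturb attribute m = 0 of alternative l = 0
grads = [(pu - p) / h for pu, p in zip(rrm_probabilities(X_up, betas),
                                       rrm_probabilities(X, betas))]
print(abs(sum(grads)) < 1e-8)  # True
```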

Again one can calculate the marginal effect at the mean, or choose to calculate the mean of the marginal effects. However, due to the advantage of scale insensitivity, we look at the elasticities. For the elasticity we multiply the marginal effect by respectively xjm/P(y = j) and xlm/P(y = j), yielding the direct elasticity given by

∂P(y = j)/∂xjm · xjm/P(y = j) = βm xjm [Σ_{k=1..J} q(k, j, m) − Σ_{k=1..J} P(y = k) · q(k, j, m)]   (31)

and the cross elasticity given by

∂P(y = j)/∂xlm · xlm/P(y = j) = −βm xlm [q(j, l, m) − Σ_{k=1..J} P(y = k) · q(k, l, m)].   (32)

They show the percentage change in the choice probability of alternative j caused by a one percent increase of the attribute value of respectively j and l ≠ j. As (31) and (32) illustrate, the elasticities are calculated for each alternative and each choice scenario. In the present study we will present the average of the elasticities per attribute. The elasticities are comparable with RUM due to scale insensitivity.

At this point an important note needs to be made. The systematic regret associated with alternative j, Rj, is a summation of non-negative (in fact strictly positive) binary regrets. The number of binary comparisons grows with the size of the choice set, which implies that the size of βm will depend on the choice set size. If the parameters would not decrease when choice sets become larger, this would mean that in a large choice set a change in an attribute value xjm would have a bigger impact on the anticipated regret than in a small choice set. So, one cannot directly compare parameters from the RRM and RUM model, since the differences in parameter size cannot be interpreted in a meaningful way. Also, when one wants to simulate market shares, the choice set must have the same size for both estimation and simulation, to prevent biased forecasts. In case the choice set for simulation is bigger than for estimation, probability forecasts will be biased due to overestimation of the parameters; similarly, a smaller choice set would lead to underestimation. This has important empirical implications: when there are six brands active in the market and you are studying brand choice, you should present respondents with all six brands at once in a choice set.
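A small sketch illustrates why parameter sizes are tied to choice set size: with the same β, every additional alternative adds strictly positive binary regret terms (all values below are hypothetical).

```python
import math

def systematic_regret(j, X, betas):
    """Systematic regret R_j: sum of binary regrets over all competitors (sketch)."""
    return sum(math.log(1.0 + math.exp(b * (X[k][m] - X[j][m])))
               for k in range(len(X)) if k != j
               for m, b in enumerate(betas))

betas = [0.1]
small = [[30.0], [35.0], [40.0]]            # choice set with J = 3
large = small + [[32.0], [36.0], [38.0]]    # same alternatives plus three more, J = 6

# Every extra alternative adds strictly positive binary regret terms, so the
# same beta implies a larger regret level in the larger choice set; estimated
# parameters therefore shrink as the choice set grows.
print(systematic_regret(0, large, betas) > systematic_regret(0, small, betas))  # True
```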

2.4 A proof for a more general framework of the RRM

The smoothened RRM shown in (24) was introduced by Chorus (2010) for analytical convenience. Chorus showed that the smoothened RRM is related to the original framework: when we add extreme value type I error terms to each term in the max-function and integrate them out to obtain the mean, this mean equals the binary regret term in the smoothened RRM. This raises the question whether a comparison of the RUM and the new version of the RRM is a valid one. To investigate this, an extreme value type I error term is added to the observed utility of one attribute, which yields

∫_{−∞}^{+∞} Vm f(v) dv = ∫_{−∞}^{+∞} (βm·xm + v) f(v) dv = βm·xm · 1 + ∫_{−∞}^{+∞} v·f(v) dv = βm·xm + E[v] = βm·xm + γ,   (33)

where γ denotes the Euler constant. We find that due to the linearity of the utility maximizing decision rule a constant term will be added which is independent of the alternative. Since only differences in utility matter, the Euler constant γ will drop out. Hence we conclude we can compare the 2010 version of RRM and RUM based on the derivation by Chorus.
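The appearance of the Euler constant can be checked with a quick Monte Carlo sketch, using the inverse-transform identity that −ln(−ln(U)) follows a standard extreme value type I (Gumbel) distribution when U is uniform. The sample size and seed below are arbitrary.

```python
import math
import random

# Monte Carlo check (sketch) that E[v] for a standard extreme value type I
# variate equals the Euler-Mascheroni constant, gamma ~ 0.5772.
random.seed(0)  # fixed seed for reproducibility
n = 200_000
mean = sum(-math.log(-math.log(random.random())) for _ in range(n)) / n
euler_gamma = 0.577216
print(abs(mean - euler_gamma) < 0.05)  # standard error is ~0.003 here
```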

However, the two models are much more related than suggested before. We will provide a proof that both the smoothened RRM in (24) and the original framework in (21) are special cases of a more general framework.

First we will show that the smoothened RRM is a special case of the more general framework. The binary regret in the 2010 version of the RRM is given by

R^m_{j↔k} = ln[1 + e^(βm·(xkm−xjm))] = ln[1 + e^(βm·∆xm)],   (34)

where ∆xm = xkm − xjm, so that βm·∆xm is a scalar constant. The subscript m is deleted in the remainder of the proof for the sake of tractability. The proof uses two basic properties of the Gumbel distribution:

i. The Gumbel distribution is preserved under linear transformation, i.e. if ε ∼ Gumbel(η, µ) and θ and α > 0 are scalar constants, then αε + θ ∼ Gumbel(αη + θ, µ/α).

ii. The maximum of two independent Gumbel distributed error terms with a common scale parameter is also Gumbel distributed, i.e. if εi ∼ Gumbel(ηi, µ) for i ∈ {1, 2}, then

max{ε1, ε2} ∼ Gumbel( (1/µ)·ln(e^(µη1) + e^(µη2)), µ ).

Combining properties i and ii, we get that if ε1, ε2 follow the Gumbel distribution with parameters (ηi, µ) for i ∈ {1, 2}, then max{ε1, ε2 + β·∆x} also follows the Gumbel distribution with parameters ( (1/µ)·ln(e^(µη1) + e^(µ(η2+β·∆x))), µ ). Now, since

ln(1 + e^(β·∆x)) = (1/µ)·ln(e^(µη1) + e^(µ(η2+β·∆x)))   for η1 = η2 = 0, µ = 1,   (35)

we find that the binary regret of the smoothened RRM is a special case for µ = 1.

Next we will show that the limiting case when µ → ∞ yields the binary regret in the original framework. Under the earlier imposed restriction η1 = η2 = 0, but no restriction on µ, we obtain

(1/µ)·ln(1 + e^(µ·β·∆x)).   (36)

By distinguishing three cases for the limit when µ → ∞ and applying L'Hôpital's rule4, we show in Appendix A.4 that the limiting case is given by

lim_{µ→∞} (1/µ)·ln(1 + e^(µ·β·∆x)) = 0 if β·∆x ≤ 0, and β·∆x if β·∆x > 0.

This shows that the original RRM (Chorus et al., 2008) is a limiting case of the new smoothened version of the RRM (Chorus, 2010) when µ → ∞, meaning that the new version of the RRM converges to the original framework when the variance approaches zero. Moreover, this proof shows that both models are special cases of a more general framework. This allows us to approximate the original RRM model much more precisely, as is illustrated in Figure 2.

Figure 2: The general framework for the RRM for different values of µ

Here we see that for e.g. µ = 10 the original framework is approximated very closely. Finally, the proof for the general framework allows us to test whether the smoothened version estimates the same decision rule as the original framework.5 Next we will provide a theoretical comparison of the RUM and RRM model, where we will elaborate on the similarities and differences of the models.
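The convergence of the general framework to the original RRM as µ grows can also be verified numerically; the function below is an illustrative sketch of the generalized binary regret with arbitrary example values.

```python
import math

def general_binary_regret(beta_dx, mu):
    """Binary regret in the general framework: (1/mu) * ln(1 + exp(mu * beta_dx)).
    mu = 1 recovers the smoothed RRM; mu -> infinity recovers max(0, beta_dx)."""
    return math.log(1.0 + math.exp(mu * beta_dx)) / mu

beta_dx = 0.5
target = max(0.0, beta_dx)  # binary regret of the original framework
for mu in (1, 10, 100):
    print(mu, abs(general_binary_regret(beta_dx, mu) - target))
# the approximation error shrinks quickly: already below 0.001 at mu = 10
```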

2.5 Theoretical comparison of RUM and RRM

The RUM and RRM have several similarities. Both models are equally parsimonious due to the equal number of parameters that need to be estimated using a similar logit specification. This implies that known extensions of error terms for the RUM can be applied to the RRM as well. Finally, for a binary choice task the RRM is equal to the RUM, as proven in Appendix A.3.

Despite these similarities, there are several conceptual differences between the RUM and RRM that arise from the different assumed behavioral process underlying the decision.6 To illustrate these differences between the RUM and RRM we will often refer to the hypothetical choice task

4. L'Hôpital's rule: if lim_{x→c} f(x) = lim_{x→c} g(x) = 0 or ±∞, and lim_{x→c} f′(x)/g′(x) exists with g′(x) ≠ 0, then lim_{x→c} f(x)/g(x) = lim_{x→c} f′(x)/g′(x).

5. Note that for this we need to test at the boundary, so that only specific tests apply.

6. In this section the difference in parameter interpretation in the RUM and the RRM is not discussed, as this is


with three alternatives, shown in Table 1. Here the assumption is made that βx = βy = 0.1, so that for ∆ = 0, alternative C is a perfect compromise between A and B.

        A     B     C
x      20    40    30 + ∆
y      40    20    30 − ∆

Table 1: A hypothetical choice task

2.5.1 Independence of Irrelevant Alternatives

An important restriction of the RUM following from the independence of the error terms is that the relative probabilities of two alternatives depend only on the attributes of those two alternatives, see Wooldridge (2012). In the context of the choice task presented in Table 1, the relative probabilities of two alternatives, i.e. the odds ratio, for the RUM equal

Ω^RUM_{A|B} = P(y = A) / P(y = B) = e^(βx·xA + βy·yA) / e^(βx·xB + βy·yB).   (37)

We see in (37) that alternative C is disregarded, so that the odds ratio depends only on the attributes of A and B. This implies that the RUM reduces to a binary logit for each pairwise comparison, so that information about a third alternative has no influence on the comparison of the other two. This is known as the Independence of Irrelevant Alternatives (IIA) property.

Since the RRM also assumes i.i.d. error terms, we might suspect the IIA property to hold for the RRM as well. However, consider the odds ratio given by

Ω^RRM_{A|B} = P(y = A) / P(y = B) = e^(−RA) / e^(−RB),   (38)

with

RA = ln[1 + e^(βx·(xB−xA))] + ln[1 + e^(βx·(xC−xA))] + ln[1 + e^(βy·(yB−yA))] + ln[1 + e^(βy·(yC−yA))],
RB = ln[1 + e^(βx·(xA−xB))] + ln[1 + e^(βx·(xC−xB))] + ln[1 + e^(βy·(yA−yB))] + ln[1 + e^(βy·(yC−yB))].

We find that the RRM does acknowledge the presence of alternative C in the binary comparison of alternatives A and B. More generally, in the RRM the odds ratios include all alternatives within a choice set, even when the errors are assumed i.i.d. as in the logit specification discussed in Section 2.3.2.

A graphical illustration of this difference is shown in Figure 3, where the parameters are assumed, without loss of generality, to equal βx = βy = 0.1. The figure displays the odds ratio of alternative A with respect to alternative B for both the RUM and RRM as a function of the attribute values of alternative C. Since under IIA every pairwise comparison reduces to a binary logit, varying alternative C has no impact on Ω^RUM_{A|B}. For the RRM, however, we do see a change in Ω^RRM_{A|B} as alternative C varies.


Figure 3: Odds ratios of RUM and RRM
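The pattern in Figure 3 can be reproduced with a short sketch: the RUM odds ratio of A versus B is unaffected by C, while the RRM odds ratio is not. Parameter values follow the hypothetical example above; the function names are illustrative.

```python
import math

def rum_probs(X, betas):
    """RUM logit probabilities from linear-in-attributes utilities (sketch)."""
    V = [sum(b * x for b, x in zip(betas, row)) for row in X]
    denom = sum(math.exp(v) for v in V)
    return [math.exp(v) / denom for v in V]

def rrm_probs(X, betas):
    """Smoothed-RRM logit probabilities (sketch)."""
    J = len(X)
    R = [sum(math.log(1.0 + math.exp(b * (X[k][m] - X[j][m])))
             for k in range(J) if k != j for m, b in enumerate(betas))
         for j in range(J)]
    denom = sum(math.exp(-r) for r in R)
    return [math.exp(-r) / denom for r in R]

betas = [0.1, 0.1]
for C in ([30, 30], [50, 10]):            # vary only the third alternative
    task = [[20, 40], [40, 20], C]
    pu, pr = rum_probs(task, betas), rrm_probs(task, betas)
    print(round(pu[0] / pu[1], 4), round(pr[0] / pr[1], 4))
# The RUM odds ratio A vs B is the same in both rows (IIA);
# the RRM odds ratio shifts as alternative C changes.
```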

It is important to understand that the assumption of i.i.d. error terms can still yield inconsistent results in the RRM model. We will illustrate this by applying the famous Red-bus Blue-bus problem to the RRM. The Red-bus Blue-bus problem is an often used illustration of the IIA property. It concerns a situation with initially two travel modes, a car and a red bus, with predicted choice probabilities equal to PCar = 0.7 and PRbus = 0.3. After the introduction of a blue bus with choice probability PBbus, the choice probability ratio of the car and red bus remains

PCar / PRbus = 0.7 / 0.3 = 2.33   (39)

due to the IIA property. This odds ratio, combined with the two restrictions PRbus = PBbus and PCar + PRbus + PBbus = 1, yields the new choice probabilities PRbus = PBbus = 0.23 and PCar = 0.54. In words, the introduction of a blue bus cannibalizes the share of people preferring the car as a travel mode, whereas it is expected that the color of a bus has a negligible influence on the decision to drive by car or not.

The Red-bus Blue-bus problem also exists for the RRM despite its acknowledgement of the context, since this is a limitation of the logit specification. To see this, consider the choice task shown in Table 2 and assume that {βprice, βtime} = {−0.04, −0.06}.

            Car   Rbus   Bbus
price         4    2.5    2.5
time         30     45     45

Table 2: A hypothetical choice task for the Red-bus Blue-bus problem

In Table 3 the predicted choice probabilities are shown. Recall that for a binary choice task the RRM is equivalent to the RUM, so that in the initial situation with two travel modes the predicted choice probabilities are equal for the RUM and RRM. When the blue bus is introduced, we see that for the RRM the car travel mode is cannibalized as well, although to a lesser extent.7



           Before   Expected   RUM-CL   RRM-CL
PCar         0.70      0.70      0.54     0.62
PRbus        0.30      0.15      0.23     0.19
PBbus         -        0.15      0.23     0.19

Table 3: Predicted choice probabilities for the Red-bus Blue-bus problem
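The RUM-CL column of Table 3 follows mechanically from the IIA property, as this sketch shows (the utilities below are reverse-engineered so that the initial shares are 0.7 and 0.3, not estimated values):

```python
import math

def logit_probs(V):
    """Standard logit probabilities from systematic utilities (sketch)."""
    denom = sum(math.exp(v) for v in V)
    return [math.exp(v) / denom for v in V]

# Utilities chosen so the two-mode shares are 0.7 / 0.3:
v_car, v_bus = math.log(0.7), math.log(0.3)
before = logit_probs([v_car, v_bus])

# Duplicate blue bus: IIA keeps the car/red-bus odds at 7/3, so the car share
# drops to 7/13 ~ 0.54 instead of the intuitively expected 0.70.
after = logit_probs([v_car, v_bus, v_bus])
print([round(p, 2) for p in before], [round(p, 2) for p in after])
# [0.7, 0.3] [0.54, 0.23, 0.23]
```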

Altogether this means that the definition of the IIA property stated at the beginning of this section is not precise in the context of the RRM. In the RRM we still have the Red-bus Blue-bus problem due to the assumption of i.i.d. errors, whereas in the systematic part of the RRM we assume all alternatives are relevant, which allows us to account for context effects. To deal with violations like the Red-bus Blue-bus example, the logit model needs to be extended such that correlation between error terms is allowed. Examples are the nested logit and the mixed logit, which are described in e.g. Train (2009).

The acknowledgement of the context in the RRM is consistent with the numerous pieces of evidence for the existence of context effects from the field of psychology. Simonson (1989) showed that adding extra alternatives to a decision task can influence preferences. On the other hand, removing information about a third characteristic can also influence choices (Kivetz and Simonson, 2000). Many other experiments have buttressed the finding that irrelevant alternatives do influence the choice of a consumer. The context specific choice probabilities in the RRM can be seen as a drawback as well, because we need to observe the correct choice set: a subset of the alternatives is no longer sufficient, as it is with the RUM (provided, in that case, that the preferred alternative is included in the subset). To illustrate, suppose the consideration set of a consumer consists of eight brands. It is well known that the optimal number of alternatives presented in a single choice task in a discrete choice experiment is around three. The RUM produces unbiased estimates when only three alternatives are shown in one choice task; for unbiased RRM estimates we need to incorporate all eight brands.

2.5.2 Fully versus semi-compensatory decision rule

The assumption of utility maximizing behavior implies fully compensatory decision making. This means that a decrease in the performance of one attribute can be fully compensated by another attribute. For example, suppose we have a product with two attributes x and y, which are evaluated as equally important so that βx = βy. Then a one unit increase in x can be fully compensated by a one unit decrease in y. Also, the level of compensation does not depend on the initial performance of the attributes, meaning it makes no difference whether an attribute is the best or worst performer in the choice set.

In contrast, the decision rule of minimizing anticipated regret is semi-compensatory: the level of compensation in y needed for a one unit increase in x depends on the initial relative performance of both attributes of this alternative compared to the other options in the choice set. The choice probabilities in the RRM are choice set specific, since they depend on the performance of all alternatives. This implies that a superior performance on attribute y does not necessarily make up for a relatively poor performance on x.

differ from RUM estimates. The robustness of respectively RUM and RRM with respect to the IIA property is an interesting area for future research.


2.5.3 Compromise effect

The semi-compensatory decision rule implies what is known as the compromise effect (Simonson, 1989). Numerous empirical studies have confirmed the existence of the compromise effect, which stipulates that options positioned between extreme alternatives in a product space are perceived as more attractive, and hence become more likely to be chosen by consumers (Müller et al., 2010). The RRM predicts the compromise effect by assigning a larger choice probability to alternatives with a mediocre performance on all attributes, as is shown in Figure 4.

Figure 4: Predicted choice probabilities of the RRM for the hypothetical choice task as a function of ∆

In Figure 4 we find the choice probabilities predicted by the RRM belonging to the hypothetical choice task shown in Table 1, as a function of ∆. We see that alternative C has a market share bonus for having a mediocre performance on both attributes, relative to the other alternatives in the choice set. For ∆ = 0, alternative C is a perfect compromise and therefore has the largest market share bonus at this value. In Table 4 the choice probabilities predicted by the RUM and the RRM are shown for different values of ∆. Here we find that the RUM predicts equal choice

∆ = 0:        A     B     C
Prum        0.33  0.33  0.33
Prrm        0.26  0.26  0.48

∆ = 5:        A     B     C
Prum        0.33  0.33  0.33
Prrm        0.21  0.23  0.46

∆ = 20:       A     B     C
Prum        0.33  0.33  0.33
Prrm        0.14  0.60  0.26

Table 4: Predicted choice probabilities of the RUM and the RRM for different values of ∆


probabilities for all three alternatives, i.e. P(A) = P(B) = P(C) = 0.33, irrespective of the value of ∆. In contrast, the RRM predicts a market share bonus for the alternative that is positioned as a compromise alternative given the value of ∆. For ∆ = 0, alternative C is a perfect compromise between alternatives A and B, and the RRM predicts a choice probability of 0.48, which is 15 percentage points higher than the RUM. When ∆ = 5, so that {xC, yC} = {35, 25}, we find that the predicted choice probability of the compromise alternative C is still 13 percentage points higher than under the RUM. We also see that the predicted choice probability in the RRM for alternative A is two percentage points lower than for alternative B, which can be explained by the particular choice set composition. For ∆ = 20, alternative C attains the values {xC, yC} = {50, 10}, so that alternative B is positioned as the compromise alternative, and indeed we find that alternative B gets a market share bonus in the RRM. Hence, we find that for the RRM the choice probabilities depend on the choice set composition: the model predicts extremeness aversion, resulting in the compromise effect. The RUM, in contrast, is insensitive to the choice set context.
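The ∆ = 0 row of Table 4 can be reproduced with a short sketch of the RRM logit (illustrative code, not the estimation software used in this thesis):

```python
import math

def rrm_probs(X, betas):
    """Smoothed-RRM logit probabilities (illustrative sketch)."""
    J = len(X)
    R = [sum(math.log(1.0 + math.exp(b * (X[k][m] - X[j][m])))
             for k in range(J) if k != j for m, b in enumerate(betas))
         for j in range(J)]
    denom = sum(math.exp(-r) for r in R)
    return [math.exp(-r) / denom for r in R]

# Table 1 with Delta = 0: A = (20, 40), B = (40, 20), C = (30, 30), beta = 0.1
probs = rrm_probs([[20, 40], [40, 20], [30, 30]], [0.1, 0.1])
print([round(p, 2) for p in probs])  # [0.26, 0.26, 0.48]: C gets the bonus
```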

2.5.4 Policy implications

The RRM could yield different policy implications due to the above mentioned differences. We will illustrate this by means of an example where both models predict a different alternative as the market leader. Suppose you want to buy a new iPod touch, and you visit the Apple online store. You encounter the choice situation shown in Figure 5, which requires you to make a trade-off between the capacity of the iPod (16, 32, or 64 GB) and the price you pay for it (199, 249, or 300 euros).

Figure 5: Apple iPod touch

We expect that the RRM predicts a higher probability for the iPod with a capacity equal to 32 GB compared to the RUM, because of the compromise effect predicted by the RRM. The predicted choice probabilities shown in Table 5 are consistent with this expectation. For the calculation of the predicted choice probabilities we assume that the taste parameters for capacity and price equal βCapacity = 0.04 and βPrice = −0.08 respectively. We see that under the assumption

          16 GB   32 GB   64 GB
Prum       0.43    0.21    0.36
Prrm       0.34    0.50    0.17

Table 5: Predicted choice probabilities for the iPod choice task


of regret minimization, we predict that 50% of the iPods that are sold will have 32 GB capacity. In contrast, the RUM predicts that the 32 GB iPod will account for only 21% of the sales and that instead the 16 GB will be bought most often: 43%. Hence we find that the different decision rules yield different policy implications, namely how many iPods to produce with each capacity.

At this point it is important to note that the finding of significant differences in policy implications is not limited to theoretical examples. Since the introduction of the RRM (Chorus et al., 2008, 2010), both theoretical and empirical evidence has been provided for differences in model fit, choice probability forecasts and elasticities (e.g. Boeri et al. (2013), Thiene et al. (2012), Kaplan and Prato (2012)) when compared to the common standard, the RUM. This means that a different choice of decision rule could yield significantly different policy implications. Because of these differences, several models have been proposed that combine both decision rules, which will be discussed next.

2.6 Combined RUM−RRM models

So far, several attempts have been made to combine both models, to obtain a better fit and to provide robustness against the differences in policy implications that result from the RUM and the RRM. Three types of models that aim to combine both decision rules into one model have been introduced to date, namely the hybrid RUM-RRM, the G-RRM and the latent class RUM-RRM, which will be discussed in Subsections 2.6.1, 2.6.2 and 2.6.3 respectively.

2.6.1 Hybrid RUM−RRM model

The first model that combines the RUM and RRM is proposed by Chorus et al. (2013). It assumes that some attributes are processed according to the decision rule of minimizing regret, whereas other attributes are processed in a utility maximizing manner. For example, a consumer minimizes regret when evaluating the price of a camera, but uses a utility maximizing rule for the evaluation of the colour of the camera.

The derivation of this model is straightforward. Note that we can assume, without loss of generality, that when evaluating alternative j in a choice set, the first q ≤ M attributes are processed in a RUM fashion, whereas the remaining M − q attributes are processed by a regret minimizing rule. This results in the random modified utility (RMUj). Using the RUM and RRM frameworks given in (3) and (24) respectively, we get

RMUj = MUj + εj = Σ_{m=1..q} βm·xjm − Σ_{m=q+1..M} Σ_{k≠j} ln[1 + e^(βm·(xkm−xjm))] + εj.   (40)

At this point it is not specified which attributes should be assigned to a particular decision rule. To assign attributes to either decision rule, Chorus et al. (2013) adopt the following strategy. First, both the RUM and RRM are estimated, after which the model with the superior model fit is selected. For the sake of illustration, assume the RRM has a better fit in a choice set where each alternative has three attributes. Next, the model is re-estimated with the first attribute assigned to utility maximization. If the model fit increases, the first attribute stays assigned to RUM; otherwise it is assigned to RRM. This process is repeated for the second and third attribute, yielding the final model. Note that this way one is not guaranteed to arrive at the hybrid model with the best model fit. To see this, note that in case of m attributes we can construct 2^m − 2 different hybrid models. So, an alternative approach is to try all 2^m − 2 models and see which has the best fit, which could increase the model fit at the cost of an even more cumbersome modeling process.
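The count of 2^m − 2 candidate hybrid specifications follows from assigning each attribute to one of two rules and excluding the two pure models; a small sketch with hypothetical rule labels:

```python
from itertools import product

# Each of the m attributes is assigned to utility maximization ('U') or regret
# minimization ('R'); excluding the two pure models leaves 2**m - 2 hybrids.
def hybrid_assignments(m):
    return [rules for rules in product("UR", repeat=m)
            if not all(r == "U" for r in rules)
            and not all(r == "R" for r in rules)]

print(len(hybrid_assignments(3)))  # 6 candidate hybrid models for m = 3
```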


Again, for the logit specification we assume that the error terms εj are i.i.d. extreme value type I distributed, yielding the choice probability given by

P(y = j) = e^(MUj) / Σ_{k=1..J} e^(MUk).   (41)

The hybrid RUM-RRM allows for each attribute to be either estimated by RUM or by RRM, without use of additional parameters. Empirical comparisons show a significant increase in model fit for this hybrid RUM-RRM compared to both RUM and RRM (e.g. de Bekker-Grob et al. (2013), Chorus et al. (2013)). Nonetheless, the process of model selection is cumbersome.

The next model we discuss is the G-RRM, which will provide a solution for the cumbersome process of model construction at the cost of m additional parameters.

2.6.2 Generalized-Random Regret Model

The Generalized Random Regret Model (G-RRM) is proposed by Chorus (2014). To formulate the G-RRM, we replace the constant 1 in the systematic regret Rj given in (24) by a so-called regret weight variable γm. The regret weight estimates the extremity of the semi-compensatory decision making and thereby also the extremity of the compromise effect. This yields the framework

GRRj = GRj + εj = Σ_{k≠j} Σ_m ln[γm + e^(βm·(xkm−xjm))] + εj.   (42)

The regret weight variable determines the degree of convexity of the regret function. So, it determines how pronounced the reference dependent asymmetry and the compromise effect discussed in Section 2.5 are. To gain a better understanding of the role of γm, we plot ln[γm + e^(βm·(xkm−xjm))] with βm = 1 and γm ∈ [0, 1] in Figure 6.8

Figure 6: Generalized binary regret function for several values of the regret parameter

We see that for γm = 1, i.e. the conventional RRM specification, we have the highest degree of convexity. As γm becomes smaller, the degree of convexity decreases. For γm = 0, the preference function is linear, so that the impact on regret of a change in an alternative's attribute no longer depends on the alternative's initial performance in terms of that attribute (Chorus, 2014).

8. The regret weight could in theory attain values outside the interval [0, 1], but these cases are disregarded since

For γ ∈ (0, 1) the G-RRM still has the properties of the conventional RRM: the compromise effect and the reference dependent asymmetry are still present, but in a less pronounced manner. We will illustrate this graphically by using the hypothetical choice task shown in Table 1 and the graphical illustration of the compromise effect shown in Figure 4. We compare the market share bonus for the compromise alternative in Figure 7 for γ = 1 (the solid lines) with γ = 0.1 (the dashed lines). We see that the market share bonus for alternative C is still present, but its magnitude has decreased.


Figure 7: The compromise effect illustrated for γ = 1 and γ = 0.1

In fact, Chorus (2014) has proven that for γm = γ = 0, we get the same choice probabilities as with the common standard RUM. This is stated more precisely in the next proposition, for which a proof is provided in Appendix A.5.

Proposition. Assume we have a choice set with J alternatives. Analysis using the G-RRM with γm = γ = 0 and parameters βm will produce identical choice probabilities as analysis using the RUM with parameters J · βm.
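The proposition can be checked numerically for an arbitrary choice task; the sketch below compares G-RRM probabilities at γ = 0 with RUM probabilities at J·β (all attribute values and parameters are illustrative):

```python
import math

def softmax(vals):
    m = max(vals)  # stabilize the exponentials
    exps = [math.exp(v - m) for v in vals]
    s = sum(exps)
    return [e / s for e in exps]

def grrm_probs(X, betas, gamma):
    """G-RRM probabilities (sketch): binary regret ln(gamma + exp(beta * dx))."""
    J = len(X)
    GR = [sum(math.log(gamma + math.exp(b * (X[k][m] - X[j][m])))
              for k in range(J) if k != j for m, b in enumerate(betas))
          for j in range(J)]
    return softmax([-r for r in GR])

def rum_probs(X, betas):
    return softmax([sum(b * x for b, x in zip(betas, row)) for row in X])

X = [[20, 40], [40, 20], [35, 25]]
betas = [0.1, 0.05]
J = len(X)
p_grrm = grrm_probs(X, betas, gamma=0.0)
p_rum = rum_probs(X, [J * b for b in betas])
print(max(abs(a - b) for a, b in zip(p_grrm, p_rum)) < 1e-9)  # True
```

With γ = 0 each binary regret collapses to the linear term β·(xkm − xjm), and the alternative-independent part cancels in the logit, which is exactly why the RUM parameters are scaled by J.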

It follows that for γm = γ = 1 we have the RRM, for γm = γ = 0 we arrive at the RUM, and by setting γm ∈ {0, 1} for all m we get the hybrid RUM-RRM. Therefore, the G-RRM nests these three models in terms of predicted choice probabilities and related metrics such as elasticities. This allows us to estimate which decision rule fits the data best through estimation of the regret weight, instead of assuming the outcome beforehand. To do so, due to the difficulties with bounded optimization, γm is parametrized in binary logit form. More precisely,

γm = e^(δm) / (1 + e^(δm)),   (43)



where we estimate δm that can attain values between plus and minus infinity.10

Different restrictions can be applied to the regret weight γm. When we restrict the regret weight to be constant across attributes, i.e. γm = γ for all m, the G-RRM nests the RUM and RRM and can be used as a diagnostic tool to investigate which degree of convexity fits the data best. When we allow the regret weight to be attribute specific, i.e. γm, m = 1, ..., M, the G-RRM can also be used to estimate the optimal hybrid RUM-RRM.

When we assume the errors to be i.i.d. extreme value type I distributed once more, the choice probability equals

P(y = j) = e^(−GRj) / Σ_{k=1..J} e^(−GRk).   (44)

The G-RRM has the advantage that we can estimate from the data which behavioral process fits best, allowing for more flexibility in the behavioral process underlying the choice of a consumer. However, it still rests on the restrictive assumption that everyone adopts the same decision rule when encountering a decision. It is well known that different people can view the same situation differently; as an example, think of optimists versus pessimists, or risk-seekers versus people with pronounced risk aversion. The latent class model, discussed next, relaxes this assumption of respondent homogeneity, i.e. it relaxes the assumption that either all consumers are regret minimizers or all are utility maximizers.

2.6.3 Latent class RUM−RRM model

Hess et al. (2012) propose a model that is motivated by the hypothesis that perhaps some people aim to maximize utility, whereas other people are more likely to minimize regret when encountering a decision. So this model allows that some customers are indeed utility maximizers while others are regret minimizers, and thereby introduces respondent heterogeneity. More specifically, suppose we have R respondents, and respondent r is faced with Qr choices. Let Pr(βRUM, RUM) and Pr(βRRM, RRM) be the probability of the sequence of Qr choices for respectively the RUM and RRM:

Pr(βRUM, RUM) = Π_{q=1..Qr} P(Ujrq ≥ Ukrq, ∀k ≠ j) and   (45)
Pr(βRRM, RRM) = Π_{q=1..Qr} P(RRjrq ≤ RRkrq, ∀k ≠ j),   (46)

where j represents the chosen alternative, βRUM and βRRM represent vectors of parameters, and the choice probabilities are as described in (4) and (25) respectively. Here we have assumed that consumers choose between the two different behavioral processes of utility maximizing and regret minimizing. However, we cannot know with certainty which decision rule a decision maker is using. Therefore we estimate the probabilities πRUM, πRRM that a consumer is using a particular

decision rule. Now the probability for a sequence of choices observed for respondent r is given by Pr = πr,RU MPr(βRU M, RU M ) + πr,RRMPr(βRRM, RRM ) , (47)

where 0 ≤ πRU M, πRRM ≤ 1, πRU M + πRRM = 1 must hold to meet the conditions for a

probabil-ity. Hence, in this model we need to estimate the parameters of the choice models in each class, (βRU M, βRRM), together with the class probabilities πRU M and πRRM = 1 − πRU M.
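As a minimal numerical sketch of (45)-(47), the hypothetical helper below combines the per-task probabilities of the chosen alternatives under each decision rule into the mixture probability of a respondent's choice sequence; the inputs are assumed to be precomputed from the choice probabilities in (4) and (25).

```python
import numpy as np

def latent_class_prob(probs_rum, probs_rrm, pi_rum):
    """Probability of a respondent's observed choice sequence under the
    latent class RUM-RRM model, cf. eq. (47).

    probs_rum / probs_rrm : per-choice-task probabilities of the chosen
                            alternative under each decision rule.
    pi_rum                : class probability of being a utility maximizer.
    """
    p_seq_rum = np.prod(probs_rum)   # eq. (45): product over the Q_r tasks
    p_seq_rrm = np.prod(probs_rrm)   # eq. (46)
    return pi_rum * p_seq_rum + (1.0 - pi_rum) * p_seq_rrm
```

Setting pi_rum to 1 or 0 recovers the pure RUM or pure RRM sequence probability, which is a useful sanity check when estimating the class shares.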

10 However, a critical note must be made at this point: in this way \gamma can only approach 0 and never equal 0, while only \gamma = 0 exactly represents utility-maximizing behavior, since \lim_{\delta \to -\infty} \gamma = 0. A similar argument holds for \gamma = 1.

The model of Hess et al. (2012) has two latent classes, which correspond to the G-RRM with \gamma = 0 and \gamma = 1. One could easily generalize this latent class model by allowing the regret parameter of one of the classes to take values \gamma \in (0, 1) instead of \gamma \in \{0, 1\}, yielding a RUM-(G-RRM) or RRM-(G-RRM) model respectively.11 Numerous combinations are possible, and one could also introduce a third class with e.g. \gamma \in (0, 1), or another decision rule available in the literature.


3 Model Estimation Procedure

3.1 General procedure

For the estimation of the parameters we use the procedure of Maximum Likelihood Estimation (MLE). Although technological advances make several alternative estimation techniques possible, such as nonparametric and Bayesian estimation, MLE is still a widely used approach.

The probability density function of a sample y given a set of parameters \theta is denoted f(y|\theta). For a discrete choice model, this equals

f(y|\theta) = \prod_{j=1}^{J} P(y = j)^{I(y=j)}, \qquad (48)

where I(·) denotes the indicator function, which equals one if alternative j is chosen and zero otherwise. The likelihood function equals the joint density of the sample. Under the i.i.d. assumption, the likelihood is simply the product of the individual probability density functions in (48), yielding

L(\theta) = \prod_{i=1}^{n} \prod_{j=1}^{J} P(Y_i = j)^{I(Y_i = j)}. \qquad (49)

However, as discussed in Section 2.2.3, the assumption of independence does not hold for responses of the same individual r, so that the likelihood function equals

L(\theta) = \prod_{r=1}^{R} \prod_{j=1}^{J} P(Y_{\{Q_r\},r} = j)^{I(Y_{\{Q_r\},r} = j)}, \qquad (50)

where Q_r represents the sequence of choice situations observed for respondent r. The likelihood in (50) is equivalent to the likelihood function in (49) if and only if the following equation holds:

\prod_{j=1}^{J} P(Y_{\{Q_r\},r} = j)^{I(Y_{\{Q_r\},r} = j)} = \prod_{q=1}^{Q_r} \prod_{j=1}^{J} P(Y_{rq} = j)^{I(Y_{rq} = j)}, \qquad (51)

i.e. when we assume independence, so that the joint likelihood is simply the product of the individual probabilities, which is an invalid assumption in the context of repeated observations. Optimizing the likelihood in (50) requires assumptions about the correlation between the Q_r observations and solving complex integrals. We will deal with the dependence between the repeated choices by correcting for it after the likelihood optimization, as discussed in Section 2.2.3, so that we continue with the likelihood function shown in (49), where the i.i.d. assumption is imposed. Often, the log-likelihood function, given by

LL(\theta) = \sum_{i=1}^{n} \sum_{j=1}^{J} I(Y_i = j) \log(P(Y_i = j)), \qquad (52)

is used in optimization for mathematical convenience. Since the logarithm is a monotonic function, the transformation does not affect the location of the maximum. To obtain \hat{\theta}_{MLE}, we maximize the log-likelihood with respect to \theta, i.e. solve the likelihood equation, also known as the first order condition, given by

\frac{\partial LL(\theta)}{\partial \theta} = \sum_{i=1}^{n} \sum_{j=1}^{J} I(Y_i = j) \frac{1}{P(Y_i = j)} \frac{\partial P(Y_i = j)}{\partial \theta} = 0. \qquad (53)


Solving (53) yields the MLE estimate \hat{\theta}_{MLE}. The log-likelihood function is nonlinear in the parameters \theta, so that the first order conditions in (53) cannot be solved analytically. For estimation, a numerical optimization technique must be used to obtain \hat{\theta}_{MLE}. In this thesis we will use the optimization

algorithms BIO (Conn et al., 2000) and CFSQP (Lawrence et al., 1994).
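The objective in (52) is straightforward to evaluate once the choice probabilities are computed. A minimal sketch, where the function name `log_likelihood` and the array layout (one row of probabilities per observation, chosen alternatives coded 0..J-1) are assumptions for illustration:

```python
import numpy as np

def log_likelihood(P, y):
    """Log-likelihood (eq. 52) for n observations over J alternatives.

    P : (n, J) matrix of choice probabilities per observation.
    y : (n,) array with the index of the chosen alternative.
    """
    n = len(y)
    # The indicator in (52) selects the probability of the chosen alternative
    return float(np.sum(np.log(P[np.arange(n), y])))
```

This is the quantity a numerical optimizer such as BIO or CFSQP maximizes over \theta.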

Next, we correct the standard errors for the violation of the independence assumption by calculating the so-called sandwich estimator (e.g. White (1982)), which is given by

S = (-H)^{-1} B (-H)^{-1}. \qquad (54)

The sandwich estimator is calculated by pre- and post-multiplying the Berndt-Hall-Hall-Hausman matrix B with the inverse of the negative Hessian, (-H)^{-1}. The Hessian H is the matrix of second derivatives of the log-likelihood function with respect to the parameters. Under the assumption of i.i.d. errors, the jk-th cell of B is given by

B_{jk} = \sum_{n=1}^{N} L_{jn} L_{kn}, \qquad (55)

where L_{jn} is the derivative with respect to model parameter j of the contribution to the log-likelihood function from observation n. Acknowledging the panel structure of the data, the jk-th cell of the Berndt-Hall-Hall-Hausman matrix is given by

B_{jk} = \sum_{r=1}^{R} \left( \sum_{q=1}^{Q_r} L_{jrq} \right) \left( \sum_{q=1}^{Q_r} L_{krq} \right), \qquad (56)

which is not equal to \sum_{r=1}^{R} \sum_{q=1}^{Q_r} L_{jrq} L_{krq}, the expression B_{jk} would reduce to in case of independent error terms.
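The computation in (54)-(56) can be sketched as follows, assuming the Hessian and the per-observation score contributions are available from the optimizer; the function name and argument layout are illustrative. Grouping the scores by respondent before taking outer products implements the panel-robust B of (56).

```python
import numpy as np

def sandwich_covariance(H, scores, respondent_id):
    """Panel-robust sandwich estimator S = (-H)^{-1} B (-H)^{-1}, cf. (54).

    H             : (K, K) Hessian of the log-likelihood at the optimum.
    scores        : (n, K) per-observation score contributions L_n.
    respondent_id : (n,) labels grouping the repeated choices per respondent.
    """
    K = scores.shape[1]
    B = np.zeros((K, K))
    for r in np.unique(respondent_id):
        # sum the scores over the Q_r choices of respondent r, then take
        # the outer product, as in eq. (56)
        g = scores[respondent_id == r].sum(axis=0)
        B += np.outer(g, g)
    Hinv = np.linalg.inv(-H)
    return Hinv @ B @ Hinv
```

With one observation per respondent, the loop reduces B to the i.i.d. form in (55), so the classical and panel-robust estimators coincide in that case.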

3.2 Global concavity in the (G-)RRM

Global concavity of the log-likelihood function implies the existence of a unique global maximum. It has been proven that the log-likelihood function of the RUM is globally concave. For the RRM, no mathematical proof of global concavity of the log-likelihood has been published, and in this thesis we were not able to derive one: due to the non-linearity of the regret function, the Hessian still contains the attribute values, which complicates the proof. Empirical analysis suggests the existence of a global maximum, since estimation with random starting values results in the same final log-likelihood every time. A mathematical proof to support this observation is an interesting direction for future research.

3.3 Latent class models

The latent class models will be estimated using the CFSQP algorithm developed by Lawrence et al. (1994). The log-likelihood function of the latent class model is not globally concave, so that convergence may occur at a local optimum. All latent class models are therefore estimated one hundred times with random starting values to deal with local optima.
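The multi-start strategy described above can be sketched as follows; the local optimizer (here SciPy's BFGS, standing in for CFSQP) and the function names are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize

def multistart(neg_ll, draw_start, n_starts=100, seed=0):
    """Re-run a local optimizer from random starting values and keep the
    best solution, to guard against local optima of a non-concave
    log-likelihood (passed here as its negative, neg_ll, to be minimized).
    """
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(n_starts):
        res = minimize(neg_ll, draw_start(rng), method="BFGS")
        if best is None or res.fun < best.fun:
            best = res
    return best
```

Because each restart is an independent local search, the best objective value over the restarts is what the thesis compares across the one hundred runs.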
