
Experimental Design in a Discrete Choice

Experiment: A simulation study

Master Thesis

Tiago M.C. Delgado Marques
MSc EORAS, specialization Econometrics
Faculty of Economics and Business, University of Groningen
Supervisor: Professor Viola Angelini

Co-Assessor: Professor Rob Alessie

February 27, 2012

Abstract

In this paper, a simulation study concerning experimental design in a discrete choice experiment (DCE) is carried out. Instead of asking respondents to answer a set of choice scenarios, a utility function is defined based on the estimates obtained in a published DCE study, and the simulated responses are generated from this utility function. The types of designs studied include orthogonal arrays, A-, D- and I-optimal designs, and Bayesian designs. The designs are evaluated based on their bias, prediction error and efficiency. The main issues treated here are: How do commonly used experimental designs perform across small and large sample sizes? Should one include interaction terms among attributes in a design applied to a DCE? And in which situations should one use Bayesian designs? A major goal is to verify whether commonly used experimental designs for DCEs perform well and where standard methods can be improved.


Acknowledgements

This thesis is a research project carried out in order to obtain a Master's degree in EORAS, specialization Econometrics, from the University of Groningen. It was also part of my work as a Research Assistant at Pharmerit B.V.

First of all, I would like to thank my supervisor, Professor Viola Angelini, for her useful comments and suggestions. Second, I would like to thank Bart Heeg, Director of Health Economics & Outcomes Research at Pharmerit, for proposing the Discrete Choice Experiment as a research topic, and my colleagues at Pharmerit for their enthusiasm.

Obviously, it would not have been possible to arrive at this late stage of the Master without the support of all my friends in Groningen. The three years that I spent there and the people that I met taught me valuable lessons.

Additionally, I would like to thank my mother for her moral support in the most critical times during my stay in the Netherlands.


Contents

1 Introduction
2 What is a Discrete Choice Experiment?
   2.1 Stages of a Discrete Choice Experiment
   2.2 Willingness to Pay
3 Experimental Design Methods
   3.1 Orthogonal Arrays
   3.2 Optimal Design of Experiments
      3.2.1 D-Optimality
      3.2.2 G-Optimality
      3.2.3 A-Optimality
      3.2.4 I-Optimality
      3.2.5 Optimal Design Algorithms
      3.2.6 Bayesian Design with Fixed Prior
      3.2.7 Bayesian Designs with Non-Fixed Prior
      3.2.8 Smart Draws
   3.3 Discussion
4 Econometric Models for Discrete Choice
   4.1 Binary Choice Models
   4.2 Multinomial Choice Models
      4.2.1 Multinomial Logit
      4.2.2 Mixed Logit
5 Methodology
   5.1 Experiment I: Are frequently used Experimental Designs good?
   5.2 Experiment II: Experimental Designs in the presence of interaction terms
   5.3 Experiment III: How good are Bayesian designs?
6 Results
   6.1 Experiment I
      6.1.1 Efficiency Criteria, Orthogonal Arrays and Full-Factorial Designs
      6.1.2 Improvements on D-optimal designs
   6.2 Experiment II
   6.3 Experiment III
7 Discussion
8 Conclusion
10 Appendix
   10.1 Number of Simulation Runs
   10.2 Design of Experiments: Coding of the Levels
   10.3 Experiment I: How does bias affect each coefficient?
   10.4 Experiment I
   10.5 Experiment II
   10.6 Experiment III
   10.7 Computer Programs
      10.7.1 Exp I and Exp II


List of Tables

1 An example of an "easy choice"
2 A more informative choice scenario
3 DCE model estimation in the last 20 years, adapted from De Bekker-Grob et al (2010)
4 Attributes and their levels, Ryan and Hughes (1997)
5 Treatment alternatives, Ryan and Hughes (1997)
6 List of Designs Experiment I
7 List of simulation experiments
8 List of performance measures
9 List of simulation experiments
10 List of simulation experiments
11 Average Bias over 50 simulations, with N = 25
12 Efficiency and Prediction Error for N = 25
13 Efficiency and Prediction Error for N = 25 over 50 simulation runs
14 Efficiency and Prediction Error for N = 100
15 MAE, D-efficiency over 10 simulation runs
16 Bias, 50 runs with different seed numbers
17 Bias, 50 runs with different seed numbers
18 MAE, 50 runs with different seed numbers
19 Bias, Prediction Error and Efficiency with 100 respondents
20 Performance measures with 100 respondents by design
21 Bias and Prediction Error, with MXL model for 100 respondents
22 Coefficients for different numbers of simulation runs
23 Coding of the levels per attribute
24 Bias in Absolute Value per Coefficient, N = 25
25 Bias in Percentage of the True Coefficient, N = 25
26 Estimated coefficients, Full Factorial design
27 Estimated WTP
28 Prediction Error
29 Estimated coefficients, Orthogonal Array (OMEP) design
30 Estimated WTP
31 Prediction Error
32 Estimated coefficients, A-optimal design
33 Estimated WTP
34 Prediction Error
35 A-optimal design sensitivity analysis
36 Estimated coefficients, I-optimal design
37 Estimated WTP
38 Prediction Error
39 Estimated coefficients, D-optimal design
40 Estimated WTP
41 Prediction Error
42 Estimated coefficients, Dmax-optimal design
43 Estimated WTP
44 Prediction Error
45 Sensitivity Analysis, D-max
46 Estimated coefficients, Dchoices-optimal design
47 Prediction Error
48 Estimated WTP
49 Sensitivity Analysis, D-choices
50 Estimated coefficients, OAinter design
51 Estimated coefficients, Dquad design
52 Estimated coefficients, Dquad design
53 Estimated coefficients, Dinter design
54 Estimated coefficients, Dnointer design
55 Prediction Error
56 Estimated coefficients, Dnointer design
57 Sensitivity Analysis, D-max design
58 Estimated coefficients, Bayesian D design with wrong prior values


List of Figures

1 An example of a choice experiment question, reproduced from Kjaer (2005)


1 Introduction

The region of Zuid-Holland wants to know which attributes a commuter values most when deciding which transportation mode to use from home to work. Doctors investigate, for a particular medical condition, whether patients prefer to stay longer in hospital at a higher cost with a low risk of complications, or to stay at home and pay less, but with a higher risk of complications. A fast-food company wants to maximize its revenue by learning which characteristics of its pizzas are valued most.

Discrete Choice Experiments (DCE) are a method to answer these questions. First, the researcher decides which variables are relevant to the problem. A questionnaire is then constructed, possibly using statistical techniques. Each question comprises two or more choice alternatives (different modes of transportation or different medical treatments) with the variables fixed at different values or levels. In each question, the respondent chooses the alternative he prefers. Using these answers and the questionnaire, estimates of the importance of each variable in the utility of a consumer/patient are obtained.

In health economics, DCE's can be used as a way to measure the importance of factors which traditional economic evaluation does not cover. These factors might have no impact on a patient's health state, or only a short-term impact on his health state but a high impact on his utility (Ryan et al (2006)). Alternatively, DCE's can be used as a tool to measure outcomes to be included in the economic evaluation of treatments (Lancsar and Louviere (2008)). One of the stages of a DCE is to find the best possible questionnaire to present to respondents (also referred to as the "experimental design" or simply "design"). This is carried out in a way that optimizes certain statistical properties: for instance, one wants the estimates resulting from a DCE to be as precise as possible, or to predict consumer behaviour accurately.

The goal of this paper is to verify whether commonly used techniques to obtain a design perform well and where they can be improved. Rather than asking respondents about their preferences, a utility function is defined based on the estimates of a published DCE paper by Ryan and Hughes (1997), from which a vector of responses is generated. The designs are evaluated based on their bias, prediction error and efficiency (see section 5).

To perform a DCE study several steps are needed. First, one needs to find which attributes are relevant for the consumer, that is, which regressors enter the utility function. Then, the levels for each attribute need to be defined.

These should be realistic (Ryan and Hughes (1997)) in order to simulate an actual choice situation for the respondent. More importantly, the attributes and levels must be designed "so that they force the respondent to trade" (Kjaer (2005)). That is, one infers the importance of each factor from choice scenarios where some variables are fixed at "good" values and some at "bad" values, which should differ per choice alternative.


Introducing a cost attribute in a choice experiment allows the researcher to directly estimate the willingness to pay (WTP) for an improvement in a certain attribute and, at the same time, infer its relative importance. Additionally, pairwise comparisons can be performed by calculating the marginal rate of substitution (MRS) between attributes.

In the third stage of the experiment, scenarios are presented to individuals. Each of them contains different combinations of levels of the attributes. It is usually not feasible to carry out the experiment using all possible scenarios (also known in the literature as the full factorial design). For instance, in a choice experiment with four attributes, each with four levels, we get 256 scenarios (4 x 4 x 4 x 4 = 256). The number of scenarios to be presented to individuals needs to be reduced, while not losing too much information.

The last stage of a DCE is the data analysis. If there are only two choice alternatives, probit and logit, possibly with random effects or fixed effects to control for individual heterogeneity, are frequently used. With three or more choice alternatives, the complexity of the modeling task increases. There are many models to choose from, including, but not limited to, multinomial logit, multinomial probit, nested logit, random parameters logit (also known as mixed logit) (Greene and Hensher (2003)), the latent class multinomial model (Caudill et al (2006)) and the generalized multinomial logit model (Fiebig et al (2009)).

In short, the goal of a discrete choice experiment (DCE) is to make inference about the utility of a discrete set of choice alternatives. These alternatives usually denote different products, medical treatments or means of transportation. In an ideal world, companies or policy makers can then base their products or policies on the results of a DCE. But how reliable is this methodology?

One angle from which to look at DCE's is to investigate the validity of the statistical methodologies used, namely the choice of experimental design. In most DCE's an orthogonal main effects plan (OMEP) is used to generate the design, that is, to generate the questionnaire to be answered by the respondents. The main advantage of this type of design is that, for linear models, the correlation between any two components of the parameter to estimate is zero (Hedayat et al (1999)). This yields more precise estimates. At the same time, it confounds interaction terms among attributes and nonlinearities.

Although Louviere et al (2000) find that such terms do not account for a large part of the explained variance, one could argue that "the existence of diminishing marginal utilities or gain-loss asymmetries" (Hoyos (2010)) makes it difficult to believe, without further testing, that utility functions in a DCE should be linear.

An additional problem is that this orthogonality does not extend to the generalized linear models used in the estimation phase. Moreover, the optimal design criteria assume that the model used in the estimation phase is linear, while that is seldom the case. The reason for this is that the covariance matrix of models like probit, logit and multinomial logit depends on the parameter to estimate, which the researcher does not know a priori.


Two proposed remedies are designs with a fixed prior value for the parameter to estimate (Huber and Zwerina (1996)) and Bayesian designs (Sandor and Wedel (2001), Chaloner and Verdinelli (1995), Goos et al (2004)). They allow the researcher to construct a design optimized for the model that will be used in estimation, based on prior values for the parameter to estimate. However, there is little information about whether these designs are indeed superior to orthogonal and linear designs. One would expect this to be true if we knew the true parameters in advance. In reality there is substantial uncertainty about the estimates of the parameters.

Few papers have previously analyzed the choice of the best experimental design. In a simulation study, Carlsson and Martinsson (2003) compare D-optimal designs and semi-Bayesian D-optimal designs, using only two sample sizes with bias as the main performance measure. Lusk and Norwood (2005) compare several design strategies, although not different efficiency criteria, and with a very small number of respondents in their simulation. Goos et al (2006) show that Bayesian designs outperform optimal linear design criteria in terms of efficiency, but bias and prediction power were not investigated. Sandor and Wedel (2001) compared Bayesian and semi-Bayesian designs. Finally, Rodriguez et al (2010) studied only linear efficiency criteria.

However, there is not yet a study that compares the three design paradigms, orthogonal main effects, optimal linear criteria and Bayesian designs, evaluating bias, prediction power and efficiency. Especially for Bayesian designs, it is interesting to know whether the extra computational effort is really worth it.

There are thus some unresolved issues in the current literature on design for choice experiments: What are the best design criteria in small and large sample sizes? Should we build designs taking into account interaction terms? And are Bayesian designs worth the extra computational effort?

To answer these questions (and possibly raise some new ones!) I compare the full-factorial design and an orthogonal main effects plan with designs based on efficiency criteria, namely A-, D- and I-optimal designs. This comparison is carried out for different sample sizes with bias, efficiency and prediction power as the main performance measures. Additionally, the effect of adding interaction terms and non-linearities is analyzed. The assumption, frequently made in DCE's, that linear designs are good for nonlinear models will also be tested. Finally, linear optimal designs will be compared with Bayesian designs.

In section 2 an introduction to discrete choice experiments is given. The stages of the DCE are described along with the choice techniques.

In section 3, different experimental design methods are described, including orthogonal arrays and orthogonal main effects plans, optimal designs, optimal design algorithms, and Bayesian designs.


2 What is a Discrete Choice Experiment?

Why should one care about DCE's? Measuring the utility of a certain set of treatments and/or products has long been a task that has occupied economists, marketers, psychologists and other social scientists. Their goals vary from finding which characteristics of a product are valued most by consumers, to understanding people's preferences for commuting to work, to measuring, in health economics, the quality of life given by a certain treatment.

In health economics, cost-benefit analysis and cost-effectiveness analysis are the most used techniques to measure the economic value of a treatment, both in monetary terms and in utilities. Different treatments for the same disease are compared based on their cost per unit of benefit. This benefit might be measured in life years (LY) or in quality-adjusted life years (QALY) (Drummond et al (1997)).

However, the QALY paradigm falls apart when one is interested in measuring non-health outcomes or treatment process attributes (Ryan et al (2006)). Non-health outcomes relate to the quality of the given care, including staff attitude, level of pain and waiting time, among others. These non-health outcomes/treatment attributes possibly have a high impact on a patient's utility but not on his health state.

For instance, different waiting times might not give significant differences with respect to QALY's, but they might have a significant impact on the patient's utility. In the limit, a DCE might lead to a different recommendation than a QALY. This is more likely if the procedures or treatments being compared differ with respect to factors that are not measured in the QALY and/or some factor produces only short-term QALY gains (Ryan et al (2006)). Alternatively, DCE's can be used to measure outcomes to be included in the economic evaluation of treatments and to infer patients' preferences (Lancsar and Louviere (2008)).

In order to properly value non-health outcomes, health economists followed other social scientists in the development of stated preference methodology. Stated preference data is obtained by directly asking consumers about their preferences for products or treatments.

The Discrete Choice Experiment is thus a methodology in the field of stated preference techniques. It is mostly used in marketing, environmental and transport economics, and health economics. Its goal is to make inference about the utility of a discrete set of choice alternatives. These alternatives usually denote different products, medical treatments or means of transportation.

2.1 Stages of a Discrete Choice Experiment

Before starting the choice experiment, the researcher should decide which kind of choice technique to use. Choice techniques can be divided into three categories (Kjaer (2005)): discrete choice experiment, contingent ranking and contingent rating.


Figure 1: An example of a choice experiment question, reproduced from Kjaer (2005).

These techniques differ in the ordering of the responses. In a contingent rating, the respondent rates the alternatives using a predefined scale. In contingent ranking, the respondent ranks the alternatives. In a DCE, one chooses the most preferred alternative. A naive researcher would rather use contingent rating or ranking, since they provide the most information. However, these are seldom used (Kjaer (2005)), the reason being the high degree of complexity for the respondent (Kjaer (2005)). This, in turn, can undermine the quality of the data used to obtain the estimates.

A compromise between contingent ranking and contingent rating, which are cognitively demanding, and the simple discrete choice is proposed by Burgess et al (2008).

This approach is less cognitively demanding than contingent rating and contingent ranking, and at the same time it extracts more information than the regular discrete choice. The main idea is that the respondent indicates in succession which scenarios are the most and least preferred. From this, the researcher can infer the ranking of the choice alternatives for each scenario. In the estimation phase, a higher weight is given to a choice alternative with a higher ranking in a particular question (Burgess et al (2008)). Notice that in this setting the number of observations increases with the number of choice alternatives, unlike with the regular discrete choice.

In the first stage of a DCE, the researcher defines which attributes are relevant to include in the experiment. There are several ways of doing this. For instance, one can look into previously published DCE's in the same field, or carry out a pilot study.


The more levels and attributes there are, the larger the number of resulting scenarios. In principle, this should provide the researcher with more information. But the primary concern is that the levels are realistic (Ryan and Hughes (1997)). That is, they should reflect an actual choice situation. For instance, if in figure 1 the attribute "walk to/from transportation" equals 60 minutes for Car, then one can guess that every respondent will choose Bus for this scenario. However, this does not reflect reality, as no one would park their car at a 60-minute walking distance! Scenarios like this could imply poor reliability of the estimates resulting from the DCE.

After the attributes and their levels are defined, an experimental design algorithm is used. The goal is to reduce the number of scenarios presented to the respondent to a cognitively feasible number. How high this number is depends on the subject of the DCE. A good way to test this is simply to ask the respondents whether they found the questionnaire difficult (Ryan and Hughes (1997)).

For instance, in a study with 5 attributes with 4 levels each, there are $4^5 = 1024$ possible scenarios to present. This is too many for a single respondent. The number of scenarios can be reduced in a way that optimizes particular statistical properties. Usually, this means having the smallest covariance matrix possible. The resulting estimates are then efficient, that is, they have the smallest standard errors, and one can build the smallest confidence interval for a given confidence level. Popular methods are to use an orthogonal array of type OMEP (orthogonal main effects plan) or an optimal experimental design using the D-efficiency criterion.

An important aspect that needs to be emphasized is that even if the scenarios have optimal statistical properties, they need to be informative. Here is a hypothetical example of a particularly easy choice:

Example of a choice scenario

Brand      Comfort   Price   Speed
BMW        high      50      fast
Mercedes   high      100     fast

Table 1: An example of an "easy choice"

Unless the consumer has lexicographic preferences for Mercedes, a rational consumer would clearly choose the BMW. Consequently, this choice provides little information for the researcher. A more informative choice would be the following situation:

Example of a choice scenario

Brand      Comfort   Price   Speed
BMW        high      75      medium
Mercedes   high      100     fast

Table 2: A more informative choice scenario


These two examples introduce a key aspect of a DCE: there needs to be trading between attributes. In the example above, the respondent has to choose between a cheaper but slower car and a faster but more expensive one. From this trading, the investigator can infer which attribute is more important to the consumer.

The output given by the statistical package of the experimental design algorithm is usually ordered. If one were to pair the scenarios in the order given by the statistical package, the problem of having non-informative choices would be encountered often, since in consecutive rows only the level of one or two attributes changes. There is thus a need to randomize the rows first. Ultimately, the main objective is that there is trading between attributes. The output of this stage is a small questionnaire to present to each respondent.

Alternatively, one might consider creating a design where the optimal statistical properties extend not only across questions, but also across choice alternatives. In many situations, it is more realistic to assume that product A has a certain set of attributes and levels that is different from the set of attributes and levels of product B.

This approach implies, for an OMEP for instance, that each scenario for each choice alternative has optimal statistical properties, which does not hold if we instead generate a design optimally and randomly draw scenarios for each alternative. With this method, one expects to get more efficient estimates (Lusk and Norwood (2005)), but at the expense of an increased computing time. For instance, if we assume three choice alternatives where each choice alternative has 4 attributes and each attribute has 4 levels, we get $4^4 \times 4^4 \times 4^4$ scenarios in the full factorial design. Also, the high number of scenarios makes it harder to find an appropriate orthogonal array.

Another feature of experimental designs is blocking. If the number of scenarios resulting from the design is considered too large, one can divide the respondents into several groups, each of them answering only part of the questionnaire. However, maintaining similar properties across groups can be a challenging task. Some researchers might be interested in this for its own sake: dividing the sample into groups according to characteristics of particular interest. In order to ensure that the questionnaire is effective, a number of practical "tricks" are used by researchers. One can perform a test of consistency by putting two similar questions in the choice experiment. If the respondent answers them differently, the researcher might want to exclude him, since there is a danger that the respondent did not understand the situation, or answered randomly or strategically (Ryan and Hughes (1997)).

Additionally, the questionnaire can be validated by excluding respondents that, in a few selected questions, made the "wrong choice", assuming that all respondents are utility maximizers (Ryan and Hughes (1997)). An example of this is when a respondent selects an alternative that has worse levels in all attributes than another alternative. The researcher might also want to exclude respondents exhibiting lexicographic preferences for a particular choice alternative. These respondents will not trade, and thus they add little information to the dataset.

If there are only two choice alternatives per question, binary response models are used, namely logit and probit models (Cameron and Trivedi (2005)). If the researcher believes there is individual taste heterogeneity, random effects probit can be used. Most DCE studies carried out until now have only two alternatives per question, but there is a trend towards more studies using three or more choice alternatives (De Bekker-Grob et al (2010)).

Type of Model              1990-2000   2001-2008
Binary Choice Model        76%         64%
Multinomial Choice Model   21%         32%

Table 3: DCE model estimation in the last 20 years, adapted from De Bekker-Grob et al (2010).

If the DCE poses three or more choice alternatives per question, the modeling task becomes more challenging. Then, we use multinomial models. There is a wide range of models to choose from, depending on the assumed shape of the variance, the presence of taste and/or scale heterogeneity, and whether the IIA (independence of irrelevant alternatives) assumption holds (De Bekker-Grob et al (2010)). The latter problem is known in the literature as the "red-bus, blue-bus problem" (Cameron and Trivedi (2005)).

IIA implies that the conditional probability of traveling by car, given that traveling by car and by red bus are the options, is unchanged if the option of traveling by blue bus is added. Since IIA holds in the MNL, if traveling by red bus and by car each have probability 0.5, the addition of the blue bus option gives each of the three options a probability of 0.33. In reality, we expect the introduction of the blue bus to halve the usage of the red bus. The IIA assumption is frequently violated in multinomial models. This is usually the case if alternatives are similar, or belong to a group. In general, one should carefully consider whether the addition of a (J + 1)-th alternative changes the conditional probabilities of the pairwise comparisons of the other J alternatives.

The violation of any of these assumptions yields inconsistent estimates; on the other hand, there is no simple and general model in which each of these assumptions can be relaxed.

Even setting aside the extra effort that it takes to estimate (and understand!) models like the MXL, latent class models (Caudill et al (2006)) or the generalized multinomial logit model (Fiebig et al (2009)), one would expect such models to produce less efficient estimates than the multinomial logit, provided that all assumptions on which the MNL is based are satisfied. Therefore, it is highly recommended to test each of these assumptions and to compare different model specifications.

2.2 Willingness to Pay

A distinct feature of a DCE is the ability to calculate the marginal rate of substitution (MRS) between attributes. In general, the MRS between two attributes $x_i$ and $x_j$ can be interpreted as the amount of $x_j$ the consumer is willing to give up in exchange for an extra unit of $x_i$. When one of the attributes is a cost or price attribute, we can calculate the so-called willingness to pay for an extra level of a particular attribute.

The willingness to pay measure allows the researcher to calculate an approximation to the price of certain intangible goods or situations. Examples of research questions where WTP is useful include: how much would one be willing to pay to avoid traffic jams, to spend less time in hospital after a treatment, or to have more policemen patrolling, among many others. In health economics, this allows non-health authorities to measure, in monetary terms, the health benefits given by a particular treatment or product.

The calculation of the marginal WTP in a linear model is quite straightforward. The researcher simply has to include a cost attribute in the utility function. If $\gamma$ is the estimated coefficient for the cost attribute and $\beta$ the vector of coefficients, then the willingness to pay for attribute $i$ is

$$\mathrm{WTP}_i = \frac{\beta_i}{\gamma}.$$

The total WTP for choice alternative $j$ is

$$\sum_{i=1}^{k} \frac{\beta_i}{\gamma}. \qquad (1)$$

In a general model, the marginal WTP for a certain attribute depends on its own level. For the general model

$$y = f(X\beta + \epsilon), \qquad (2)$$

the marginal WTP equals

$$\mathrm{WTP}_i = \frac{\partial f / \partial x_i}{\partial f / \partial x_\gamma}, \qquad (3)$$

where $x_\gamma$ is the cost attribute.

For models other than the linear one, the marginal WTP depends on the level of attribute $i$. In probit, logit and multinomial models, the marginal WTP for attribute $i$ depends not only on its own level, but also on the levels of the other attributes.
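To make the linear-model calculation concrete, here is a minimal sketch of the WTP computation; the attribute names and coefficient values are hypothetical, not the estimates of Ryan and Hughes (1997).

```python
# Hypothetical estimated coefficients; illustrative values only,
# not the Ryan and Hughes (1997) estimates.
beta = {"waiting_time": -0.40, "staff_attitude": 0.55, "complication_risk": -0.90}
gamma = -0.02  # estimated coefficient on the cost attribute

# Marginal WTP per attribute: WTP_i = beta_i / gamma.
wtp = {name: b / gamma for name, b in beta.items()}

# Total WTP for a choice alternative: sum of beta_i / gamma, as in equation (1).
total_wtp = sum(wtp.values())
print(wtp, total_wtp)
```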


3 Experimental Design Methods

A full-factorial design is a design where all possible combinations of attributes and their levels are included as scenarios in the experiment. It is usually not feasible to carry out the experiment using all possible scenarios. For instance, in a choice experiment with four attributes, each with four levels, we get 256 scenarios (4 x 4 x 4 x 4 = 256). The number of scenarios to be presented to individuals needs to be reduced, while maintaining a certain degree of variability. That is the main purpose of using experimental design methods.

A fractional factorial design denotes a design where only a subset of all possible combinations of attributes and their levels is used. Three types of designs are discussed in the next sections: orthogonal arrays/OMEPs, optimal designs, and Bayesian optimal designs.

An orthogonal array (OA), and in particular a special type of OA called an orthogonal main effects plan (OMEP), allows the estimation of all main effects, while every component of the parameter to estimate is uncorrelated with every other. A problem with this type of design is that it confounds interactions among attributes. Additionally, it is not possible to obtain an OMEP for every number of scenarios.

Optimal designs are designs which are optimal with respect to a certain optimality criterion. Since one would like to have precise estimates, these optimality criteria usually attempt to minimize the variance of the parameter estimates. These designs do not confound interaction terms among attributes, and it is possible to compute them for any number of scenarios. A Bayesian design is an optimal design that allows the researcher to insert prior knowledge about the parameters to estimate into the design. It assumes that the variance of the parameters to estimate is computed based on the model used in the estimation, while both orthogonal arrays and optimal designs assume that the variance of the parameters to estimate is based on the normal linear model.

Four characteristics are desirable in a design: level balance, orthogonality, minimal overlap and utility balance (Kjaer (2005)).

Level balance means that the levels of each attribute occur with equal frequency. Orthogonality implies that the attribute vectors are orthogonal to each other. Minimal overlap is related to the pairing of alternatives: if a design has minimal overlap, then the probability of an attribute level repeating itself within a choice is zero (Kjaer (2005)).

A utility balanced design is a design where the utilities for each choice set are equal or approximately so (Kjaer (2005)). The main reason to take this into account is to guarantee that there is trading between the attributes. However, this is not straightforward. One cannot compute the individual utility scores without knowing the true coefficients for each attribute, and the researcher does not know them in advance. A solution might be to generate provisional estimates from a pilot study or to use Bayesian designs.


Ultimately, what matters is that there is trading between attributes, so that one can make inference on the importance of each attribute.

3.1 Orthogonal Arrays

In this section I give a description of OA's as mathematical objects. In contrast to the optimal design sections, I will not discuss methods to obtain the design, since in practice software packages like R have libraries (see the DoE.base package in R) with thousands of OA's stored. Thus, in principle, the reader will not have to construct his own design. If the reader would like to learn how to build a suitable OA, then I recommend the book of Hedayat et al (1999), where several techniques to obtain an OA are discussed.

A popular technique to generate efficient designs is to use orthogonal arrays (OA). The main reason for their popularity lies in the fact that by using a type of OA known as an orthogonal main effects plan (OMEP) as a design, all attributes are orthogonal to each other. This, in turn, should result in parameter estimates with smaller standard errors.

I reproduce here the definition of an orthogonal array given in Hedayat et al (1999).

Definition 1. Let S be a set of S levels, s = 1, 2, ..., S. An N by k array A is said to be an orthogonal array with N runs, k factors, S levels, index λ and strength t (for some 0 ≤ t ≤ k), if every N by t subarray of A contains each t-tuple based on S exactly λ times as a row. It is denoted as OA(N, k, S, λ, t).

In other words, if A is an orthogonal array, then each N by t submatrix of A contains all possible t-row vectors with the same frequency λ (Tang (1993)). For instance, A is an OA of strength 2 if in any pair of columns of A we see every possible 2-vector that can occur as a row, and these row vectors occur with the same frequency.

Additionally, for any orthogonal array it holds that if the array is of strength t, then it is also of strength T = 1, . . . , t − 1.

The index λ denotes the number of times each combination (row of the design matrix) is repeated in the design. In a discrete choice experiment λ is usually equal to one, as a researcher will not want a respondent to answer the same question several times.

If there is at least one factor which has a different number of levels than the other factors, or one factor has different levels than another, then a more general notation to denote an orthogonal array is $OA(N, S_1 S_2 \cdots S_k, \lambda, t)$, where $S_i$ denotes the set of levels belonging to the i-th attribute.


$$OA_1 = \begin{pmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 1 & 1 & 1 \\ 1 & 0 & 0 & 0 \\ 1 & 0 & 1 & 1 \\ 1 & 1 & 0 & 1 \\ 1 & 1 & 1 & 0 \end{pmatrix} \qquad (4)$$

To give the reader a deeper understanding of the definition of an orthogonal array, I show briefly that this array is of strength 3.

It follows from the definition of an orthogonal array that the maximum strength of an N by k orthogonal array is k. Suppose that this orthogonal array were of strength 4 (that is, k). A simple example of a possible combination not present in this array is the row (0 1 1 0). This contradicts the definition of strength of an orthogonal array, so it cannot be of strength 4. To prove that this array is of strength 3, I simply note, first, that there are 8 possible combinations of 3 factors each with the same two levels (2 x 2 x 2), and second, that in any 3 columns every possible row appears exactly once.
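The same check can be automated. The sketch below verifies the strength of matrix (4) by brute force, counting every t-tuple in each set of t columns; this is an illustrative utility, not a program from the thesis appendix.

```python
from itertools import combinations, product
import numpy as np

OA1 = np.array([[0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1], [0, 1, 1, 1],
                [1, 0, 0, 0], [1, 0, 1, 1], [1, 1, 0, 1], [1, 1, 1, 0]])

def has_strength(A, t, levels=2):
    """True if every N x t subarray contains each t-tuple equally often."""
    for cols in combinations(range(A.shape[1]), t):
        counts = {tup: 0 for tup in product(range(levels), repeat=t)}
        for row in A[:, cols]:
            counts[tuple(row)] += 1
        if len(set(counts.values())) > 1:  # all t-tuples must share one frequency
            return False
    return True

print(has_strength(OA1, 3))  # True: strength 3, with index lambda = 1
print(has_strength(OA1, 4))  # False: the row (0, 1, 1, 0), for example, never occurs
```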

The strength of an array is an important characteristic from a statistical point of view. Depending on the strength, one can estimate main effects only, or also interactions up to a particular order. Basically, an orthogonal array of strength 2 or higher is sufficient if one is interested in estimating main effects only (Hedayat et al (1999)). Other types of OA's, which are not necessarily of strength 2 or higher, are also suitable for estimating main effects. These are called orthogonal main effects plans (OMEP).

An OMEP allows the estimation of all main effects but confounds interaction terms among attributes and nonlinearities. Additionally, it has the property that the ordinary least squares estimators of any two components from two different main effects are uncorrelated (Hedayat et al (1999)).

An example of an OMEP is matrix 4; this holds since any array of strength 3 is also an array of strength 2. But an OMEP is not necessarily an OA of strength 2 or higher. An example is an OMEP taken from Hedayat et al (1999), referred to below as matrix 5: an array with 9 rows and two levels per column.


Matrix 5 has 9 rows, while there are only two levels in each column. Therefore, there is no way that we can see each possible level combination equally often; it is thus an OA of strength 0. What sort of property makes matrix 5 an OMEP, then?

An OMEP is a matrix that exhibits the property of proportional frequencies. If one selects any pair of columns from an OMEP, then each possible ordered row combination occurs a number of times which is proportional to the product of the frequencies with which the individual levels occur (see Hedayat et al (1999) for details).

Let $n_i^j$ denote the number of times level $i$ appears in column $j$, and let $n_{i'}^{j'}$ denote the same for a different column $j'$ and level $i'$. Let $N$ be the number of choice questions. Then the condition of proportional frequencies states that the pair $(i, i')$ occurs as a row in columns $(j, j')$ exactly

$$\frac{n_i^j \, n_{i'}^{j'}}{N} \qquad (6)$$

times.

In matrix 5, the pairs {0 0}, {1 0}, {0 1} and {1 1} occur with frequencies 4, 2, 2 and 1, respectively, as rows in any pair of columns (each level 0 occurs six times and each level 1 three times per column). Computing formula 6 also gives 4, 2, 2 and 1 for any pair of columns. I can then conclude that matrix 5 satisfies the condition of proportional frequencies, and consequently it is an OMEP. Although matrix 5 is an orthogonal array of strength 0, it suffices if one is interested in estimating main effects only. An orthogonal array where the levels occur with proportional frequencies is an orthogonal main effects plan, and such an array allows main-effects-only estimation. Obviously, if one is interested in estimating interactions among attributes, orthogonal arrays of strength higher than two can be used. The disadvantage is an increased number of runs. Also, it becomes difficult to find an appropriate orthogonal array for a specific number of attributes and levels as the number of attributes and levels increases, and when each attribute has a different number of levels. Even if such an array is found, the number of runs might be higher than desired, and blocking might not be an option.
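The proportional-frequency condition is also easy to verify numerically. The sketch below builds a 9-run, two-level array with exactly the pair frequencies 4, 2, 2 and 1 discussed above (an illustrative array satisfying the stated properties, not necessarily the one printed in Hedayat et al (1999)) and checks formula (6) for every pair of columns.

```python
from itertools import combinations, product
import numpy as np

# A 9-run array with two levels per column: level 0 occurs 6 times and
# level 1 occurs 3 times in each column. Illustrative only.
A = np.array([[0, 0, 0, 0], [0, 0, 0, 1], [0, 1, 1, 0],
              [0, 0, 0, 0], [0, 0, 1, 0], [0, 1, 0, 1],
              [1, 0, 1, 1], [1, 0, 0, 0], [1, 1, 0, 0]])

def proportional_frequencies(A):
    """Check formula (6): pair (i, i') occurs n_i^j * n_i'^j' / N times in columns (j, j')."""
    N = A.shape[0]
    for j, jp in combinations(range(A.shape[1]), 2):
        for i, ip in product(np.unique(A), repeat=2):
            observed = np.sum((A[:, j] == i) & (A[:, jp] == ip))
            expected = np.sum(A[:, j] == i) * np.sum(A[:, jp] == ip) / N
            if observed != expected:
                return False
    return True

print(proportional_frequencies(A))  # True: pair frequencies are 4, 2, 2 and 1
```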

Another important point is that, while OA's permit a design with orthogonal attributes for linear models, linear models are seldom used to estimate discrete choice experiments, and this orthogonality does not extend to the nonlinear models used to estimate the β parameter (Bliemer et al (2008)). Therefore, it is arguable whether the resulting estimates from an orthogonal array design are truly efficient.


3.2 Optimal Design of Experiments

I now introduce the problem of optimal experimental design. Let X be an m by k matrix, y an m by 1 vector, and β a k by 1 vector of parameters. Then we have

$$y = X\beta + \epsilon, \qquad (7)$$

where $\epsilon$ is an m by 1 vector of i.i.d. error terms, with zero mean and variance $\sigma^2$.

The goal is to select the m rows $x_i$ of X from the full factorial design.

Different information-based optimality criteria are based on one or both of the following:

$$\widehat{Var}(\hat\beta) = \hat\sigma^2 (X'X)^{-1}, \qquad (8)$$

and/or

$$\widehat{Var}(\hat y_i) = \hat\sigma^2\, x_i (X'X)^{-1} x_i', \qquad (9)$$

where $\hat y_i$ denotes the predicted value for the i-th individual.

Optimality criteria based only on the first equation above have as their main objective to minimize the covariance matrix of the parameter estimates. Other criteria, based on $Var(\hat y)$, try to minimize the variance of the prediction error. The criteria presented in the following sections belong to the class of information-based criteria. For a rigorous treatment of these, the reader is invited to study the book of Pukelsheim (2006).

Other types of optimization criteria are the so-called distance-based criteria (these are not treated in the simulation study).

Let $z$, $y$ be two m by 1 vectors and let $\mathcal{V} \subset \mathbb{R}^m$. Distance-based criteria are based on the distance function

$$\partial(z, \mathcal{V}) = \min_{y \in \mathcal{V}} \|z - y\|, \qquad (10)$$

where

$$\|z - y\| = \sqrt{(z_1 - y_1)^2 + \cdots + (z_m - y_m)^2}. \qquad (11)$$

For instance, U-optimality minimizes the distance between the points $x_c$ in the candidate set $\mathcal{C}$ and the points $x_i$ in the design matrix X. For an exhaustive list of information- and distance-based efficiency criteria, see El-Monsef et al (2009).

The first step in obtaining an optimal design is to apply a particular algorithm to the full-factorial design matrix, according to a certain efficiency criterion. The second step is to pair the alternatives through randomization.


If one instead optimizes the design across scenarios and choice alternatives, there is no need for pairing, since the pairing is already optimized. The disadvantage of this approach is that the full-factorial matrix has $4^4 \times 4^4 \times 4^4$ rows and 4 x 3 = 12 columns, leading to a higher computation time. Notice that this approach is mandatory if at least one of the attributes has a different set of levels per choice alternative.

3.2.1 D-Optimality

Let $\mathcal{X}$ be the M by k matrix resulting from a full-factorial design. Each of the k columns of $\mathcal{X}$ denotes an attribute and each row a scenario.

D-efficiency is a popular criterion to optimize a design. It aims to minimize the determinant of the covariance matrix or, equivalently,

$$\max |X'X|.$$

That is, m rows of $\mathcal{X}$ are chosen, forming the design matrix X, such that the determinant of $X'X$ is maximized, which is equivalent to minimizing the size of the covariance matrix.

This criterion assumes that the true model is linear, while estimates in a discrete choice experiment are obtained from discrete choice models, which are nonlinear. In general, the D-optimality criterion is

$$\max |I(\hat\beta)|,$$

where $I$ denotes the Fisher information matrix. For the linear model this matrix equals

$$I(\hat\beta) = \frac{X'X}{\sigma^2}, \qquad (12)$$

which does not depend on β.

In general, this matrix depends on the vector of coefficients β. This introduces an additional difficulty into the optimization process. Therefore, it is frequently assumed that a good design for a linear model is also a good design for a non-linear model.

In a non-linear model the information matrix is the negative expected value of the second derivative of the log-likelihood function:

$$I(\beta) = -E\left[\frac{\partial^2 L}{\partial \beta\, \partial \beta'}\right]. \qquad (13)$$

For an illustration of how this expression depends on β, see section 4.

3.2.2 G-Optimality

Another popular criterion is G-optimality. It is defined as

$$\min_X \max_{x \in \mathcal{X}}\; x'(X'X)^{-1}x. \qquad (14)$$

That is, we minimize the maximum prediction variance over the design space. This objective is important, since a key goal of a choice experiment is to predict the utility of a certain individual given a set of treatments or products.

The problem with G-optimal designs is that the optimization is more difficult than for A-, D- and I-optimal designs (Rodriguez et al (2010)); in particular, the exchange algorithms described in section 3.2.5 do not work here. For this reason I do not use this criterion in the simulation study. The General Equivalence Theorem gives the relationship between D- and G-optimal designs (see Kiefer and Wolfowitz (1960)).

For a discussion of algorithms for G-optimal designs see Rodriguez et al (2010).

3.2.3 A-Optimality

A-optimality's goal is to minimize the trace of $(X'X)^{-1}$, which is equivalent to minimizing the average variance of the parameter estimates.

The disadvantage of this criterion is that different designs can be optimal under different non-singular codings of the levels.

3.2.4 I-Optimality

The criterion of I-optimality minimizes the integrated prediction variance. Let $\mathcal{X}$ denote the full factorial design region and $\hat y$ the vector of fitted values. Then I-optimality is defined as

$$I = \int_{\mathcal{X}} \frac{N}{\sigma^2}\, Var(\hat y)\, dx. \qquad (15)$$

In other words, we minimize the average prediction variance over the design region.

3.2.5 Optimal Design Algorithms

I have discussed the most popular optimization criteria for choosing an experimental design. Now, the question is how the researcher performs the optimization. The problem is how to choose m scenarios out of M possibilities. This is a discrete optimization problem, so standard rules of differentiation do not apply.

A first approach involved direct brute-force optimization. The main problem with this approach is that the problem easily becomes too large. For instance, suppose that in a discrete choice experiment we have four attributes, each with four levels, and the researcher wants to have 30 scenarios. There are 256 ($4^4$) scenarios to choose from. Thus, to find a D-optimal design, we would need to evaluate the determinant of the covariance matrix $\frac{256!}{30!\,226!}$ times.

A computationally feasible alternative is given by exchange algorithms. The basic idea is to exchange a point in the design with a point in the candidate set, and check whether the determinant increases.

Let $\mathcal{X}$ denote the full-factorial matrix. Define $\eta(x_j)$, $x_j \in \mathcal{X}$, as

$$\eta(x_j) = \begin{cases} 0 & \text{if there are no observations at point } x_j, \\ n_j/m & \text{if there are } n_j \text{ observations at point } x_j, \end{cases} \qquad (16)$$

with $\sum_{j=1}^{M} n_j = m$. We can interpret this definition as follows: if $\eta(x_j) = 0$, then $x_j \in \mathcal{X}$ is not in the design matrix X. If $\eta(x_j)$ is nonzero, point $x_j$ is in the design matrix. If $n_j > 1$, then scenario $x_j$ is repeated in the design matrix X.

The candidate set can then be defined as the set of all $x_j \in \mathcal{X}$ such that $\eta(x_j) = 0$ and that are not excluded from the design in the current iteration.

The Fedorov algorithm is defined as follows. It starts with an m-point design $\eta_1$. This starting design can be arbitrarily chosen, as long as it is nonsingular. If we choose to optimize the design using the D-optimality criterion, then the function of interest is $M(\eta)$, defined as

$$M(\eta) = \frac{X'X}{m} = \int_{\mathcal{X}} x x'\, d\eta(x). \qquad (17)$$

Additionally, define $d(x_i, \eta)$ as

$$d(x_i, \eta) = x_i'\, M(\eta)^{-1} x_i, \qquad (18)$$

where $x_i$ denotes the i-th row of the design matrix X. Notice this is just the variance of the predicted response (without the scale parameter $\sigma^2$).

Finally, define $d(x_i, x_c, \eta_l)$ as

$$d(x_i, x_c, \eta_l) = x_i'\, M(\eta_l)^{-1} x_c. \qquad (19)$$

Fedorov (1972) showed that the l-th iteration step is given by

$$|M(\eta_{l+1})| = \big(1 + \Delta(x_i, x_c, \eta_l)\big)\, |M(\eta_l)|, \qquad (20)$$

where

$$\Delta(x_i, x_c, \eta_l) = d(x_c, \eta_l) - d(x_i, \eta_l) - \big(d(x_i, \eta_l)\, d(x_c, \eta_l) - d^2(x_i, x_c, \eta_l)\big). \qquad (21)$$

Notice that $d^2(\cdot)$ is well defined, since $d(\cdot)$ is a scalar.

In the i-th step of the l-th iteration, a point $x_i$, $i = 1, \ldots, m$, is deleted and a point $x_c$ from the candidate set is added. If the $\Delta$ function is positive, the exchange increases the determinant and the change is accepted; otherwise the change is rejected. Therefore, in each iteration l, m maximizations of the $\Delta$ function must be carried out.

In the original Fedorov algorithm, only the exchange $(x_i, x_c)$ that results in the maximum of the $\Delta$ function, out of all possible pairs of candidate points and design points, is accepted. Therefore, after at most $m(M - m)$ steps, only one exchange is done. In the modified Fedorov algorithm of Cook and Nachtsheim (1983), for each design point an exchange is made as soon as the candidate point that optimizes the $\Delta$ function is found. Thus, an exchange is done after at most $(M - m)$ steps. This modification outperforms the original Fedorov algorithm in terms of speed (Cook and Nachtsheim (1983)).

The procedure stops when

$$\frac{|M(\eta_{l+1})| - |M(\eta_l)|}{|M(\eta_l)|} \leq \delta, \qquad (22)$$

where $\delta$ is a sufficiently small positive number.
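As an illustration of the algorithm just described, the following sketch implements the modified Fedorov exchange for D-optimality under the linear-model assumption: for each design row, the candidate maximizing the Δ function of equation (21) is found and the exchange is made when Δ > 0. It is a simplified illustration, not the program used for the simulation experiments, and it assumes the random starting design is nonsingular (retry with another seed otherwise).

```python
import numpy as np

def modified_fedorov(candidates, m, delta_tol=1e-6, seed=0, max_iter=100):
    """Modified Fedorov exchange (Cook and Nachtsheim (1983)) for D-optimality.

    candidates: M x k candidate (full-factorial) matrix; m: design size.
    """
    rng = np.random.default_rng(seed)
    idx = rng.choice(candidates.shape[0], size=m, replace=False)  # arbitrary start
    for _ in range(max_iter):
        det_old = np.linalg.det(candidates[idx].T @ candidates[idx])
        for pos in range(m):
            Minv = np.linalg.inv(candidates[idx].T @ candidates[idx])
            xi = candidates[idx[pos]]
            d_i = xi @ Minv @ xi
            # Delta of equation (21), evaluated for every candidate point at once.
            d_c = np.einsum("ij,jk,ik->i", candidates, Minv, candidates)
            d_ic = candidates @ Minv @ xi
            delta = d_c - d_i - (d_i * d_c - d_ic ** 2)
            best = int(np.argmax(delta))
            if delta[best] > 0:  # the exchange increases |X'X|
                idx[pos] = best
        det_new = np.linalg.det(candidates[idx].T @ candidates[idx])
        if (det_new - det_old) / det_old <= delta_tol:  # stopping rule, equation (22)
            break
    return idx

# Example: 2-level, 4-attribute full factorial with an intercept; select 8 runs.
from itertools import product
cand = np.array(list(product([0.0, 1.0], repeat=4)))
design_rows = modified_fedorov(np.column_stack([np.ones(len(cand)), cand]), m=8)
```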

Other algorithms, like Wynn-Mitchell and Van Schalkwyk, are similar to Fedorov's algorithm, but treat the optimization of the $\Delta$ function in a different way (Cook and Nachtsheim (1980)). A different and more recent proposal is the RSC algorithm by Sandor and Wedel (2001), where RSC stands for relabeling, swapping and cycling.


3.2.6 Bayesian Design with Fixed Prior

In commonly used models for discrete choice, the covariance matrix depends on the unknown parameter vector β (see sections 4.1, 4.2.1, 4.2.2), which is usually not known with certainty a priori.

A possible approach to deal with this problem was first introduced by Huber and Zwerina (1996). It amounts to using the true covariance matrix of the model to be estimated, where prior estimates of β are supplied. An algorithm can then optimize the design based on these values. The main issue with this approach is that if the values provided are not close enough to the true value of the parameter β, the resulting estimates might be biased.

Carlsson and Martinsson (2003) study this approach using a simulation experiment, with a design that assumes a multinomial logit model. They found that, even with a small bias in the fixed prior values, this design still gives better results than a standard D-optimal design or a design using orthogonal arrays. However, in a real-life situation, prior values might in some cases be far from the true parameters.

Sandor and Wedel (2001) compare this type of design with "pure" Bayesian designs (random prior values based on a particular distribution with parameters supplied by the user) and found that Bayesian designs with a non-fixed prior can have a predictive validity which is 20% higher, and standard errors which are 30% to 50% lower, than Bayesian designs with a fixed prior.

3.2.7 Bayesian Designs with Non-Fixed Prior

With substantial uncertainty in the parameter values, it might be worth assuming that the β parameter is random and follows a particular distribution. This is the subject of Bayesian designs: we draw a vector of prior parameters from a certain distribution. These draws become inputs in the design, to be optimized using a Bayesian version of the modified Fedorov algorithm.

The regular modified Fedorov algorithm chooses the exchange between a point in the candidate set and the design matrix that maximizes the efficiency, according to a certain criterion. The Bayesian version chooses the exchange that is more efficient on average, given draws $\tilde\beta$ from a prior distribution defined by the user.

In this section I discuss optimal Bayesian design, following the review by Chaloner and Verdinelli (1995).

The problem still consists of choosing a design matrix X that optimizes a certain criterion. A popular efficiency criterion is $D_B$-efficiency, the Bayesian version of D-optimality. Analogous versions of A- and I-optimality can also be used.

A natural utility function for design selection is the expected gain in information from prior to posterior,

$$U = \int \ln\!\left(\frac{p(\theta \mid y, \eta)}{p(\theta)}\right) p(y, \theta \mid \eta)\, d\theta\, dy, \qquad (23)$$

where θ denotes the parameter of interest. For instance, in a normal linear regression model, $\theta = (\beta, \sigma^2)$.

This utility function is just a measure of the distance between the prior and posterior distributions for θ. It is known in the statistics literature as the Kullback-Leibler distance.

For the normal linear regression model with prior design matrix $X_0$ we get

$$U = -\frac{k}{2}\ln(2\pi) - \frac{k}{2} + \frac{1}{2}\ln\left( \left| \frac{nM(\eta) + X_0'X_0}{\sigma^2} \right| \right). \qquad (24)$$

Only the last term in the expression depends on η. Therefore this optimization problem reduces to

$$\max_\eta\; \big| nM(\eta) + X_0'X_0 \big|. \qquad (25)$$

From the last expression it becomes clear that the inclusion of the matrix $X_0'X_0$ in the optimization is the only difference between Bayesian and non-Bayesian designs in the normal linear model. Another interesting conclusion that can be drawn from the last expression is that for large sample sizes the difference between Bayesian and non-Bayesian designs vanishes. This asymptotic result also has a common-sense translation: the more data you have, the less important the prior information used in the model becomes.
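In the normal linear case, expression (25) is a one-line change to the usual D-criterion, as this sketch illustrates; the prior design matrix X0 and the convention nM(η) = X'X (one observation per design run) are assumptions of the example.

```python
import numpy as np

def log_d_criterion(X):
    """Usual D-criterion on the log scale: log |X'X|."""
    return np.linalg.slogdet(X.T @ X)[1]

def log_bayesian_d_criterion(X, X0):
    """Bayesian version, expression (25): log |n M(eta) + X0'X0|,
    taking n M(eta) = X'X (one observation per design run)."""
    return np.linalg.slogdet(X.T @ X + X0.T @ X0)[1]
```

As the number of rows of X grows, the $X_0'X_0$ term is dominated by $X'X$, which is exactly the asymptotic equivalence noted above.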

Of course, other utility functions can be used to derive D-optimality. For more details on these utility functions I refer to Chaloner and Verdinelli (1995).

3.2.8 Smart Draws

One aspect of Bayesian designs that needs to be emphasized is that they are computationally more intensive than regular optimal designs. Basically, we need to draw the β parameter several times and optimize the design using the modified Fedorov algorithm for each of the drawn βs. In order to reduce the computation time, it becomes imperative to use smart draws. Examples are Halton draws, modified Latin hypercube sampling, Sobol sequences and Gaussian quadrature. Bliemer et al (2008) compare several of these drawing techniques with common pseudo-Monte Carlo sampling.
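As an illustration of such quasi-random draws, the sketch below generates Halton sequences and transforms them into draws from a normal prior for β. The hand-rolled generator and the prior mean and standard deviation are for illustration only; libraries such as scipy.stats.qmc offer equivalent functionality.

```python
import numpy as np
from scipy.stats import norm

def halton(n, base):
    """First n points of the one-dimensional Halton sequence with a prime base."""
    seq = np.empty(n)
    for i in range(1, n + 1):
        f, k, x = 1.0, i, 0.0
        while k > 0:
            f /= base
            x += f * (k % base)
            k //= base
        seq[i - 1] = x
    return seq

# Quasi-random draws of a two-dimensional beta ~ N(mu, sigma^2 I) prior;
# mu and sigma are hypothetical prior values supplied by the researcher.
mu, sigma = np.array([0.5, -0.2]), 0.25
uniforms = np.column_stack([halton(100, b) for b in (2, 3)])  # one prime per dimension
beta_draws = mu + sigma * norm.ppf(uniforms)
```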


Bayesian efficiency criteria are not very different from the usual criteria; the main difference is that we use an information matrix that is a function of the unknown β parameter. The information matrix is still optimized using a modified Fedorov algorithm or, alternatively, using the RSC algorithm by Sandor and Wedel (2001). A comparison between these two algorithms for different efficiency criteria is carried out by Goos et al (2004).

3.3 Discussion

Three types of designs were reviewed in this section: orthogonal arrays, optimal designs and optimal Bayesian designs. In a certain sense, one could say that these designs represent, respectively, the past, the present and the future of experimental design for discrete choice experiments.

The advantage of orthogonal arrays is that they allow designs where all attributes are orthogonal. One problem with this approach is that, as the number of attributes with different numbers of levels increases, it becomes increasingly harder to find a parsimonious design. This is even harder if one wants to include interaction terms; then one needs to use orthogonal arrays of strength higher than two, implying a larger number of runs.

Optimal designs do not have the disadvantage of orthogonal arrays: they allow the researcher to find a design for any number of runs, attributes and levels, though orthogonality is dropped. However, both orthogonal arrays and optimal designs assume that the model to be estimated is linear.


4 Econometric Models for Discrete Choice

In this section I discuss only the models used in the simulation experiments. For other models, the reader is referred to Cameron and Trivedi (2005).

4.1 Binary Choice Models

Models for binary choice arise when the response variable can take only two values. These can denote a yes or no answer, the choice between two different treatments or products, labour force participation, or whether or not a client of a bank defaults. When the dependent variable takes only two values, linear models become inadequate, since they assume that the error term is normally distributed, and the error term in this setting is clearly non-normal. Furthermore, if one interprets the fitted values as probabilities, the linear model does not guarantee that one will get probabilities inside the [0, 1] interval.

Two models are frequently used: the logit and the probit model. The difference between them lies in the specification of the error term. While in the logit model the error term is extreme value distributed, in the probit model the error term is normal. Other functions could be used, provided that they restrict the predicted probabilities to the [0, 1] interval. In practice, there is not much difference between logit and probit models; they produce similar predicted probabilities, with slight differences in the tails of the distribution (Cameron and Trivedi (2005)).

In discrete choice experiments we often see that individual respondents exhibit taste heterogeneity, but usually the researcher does not know whether this is the case. A possible route is to estimate a model that takes such effects into account, like the random effects probit model, and compare its estimates with those of a model that does not, like the probit model. Another approach is to remove the individual effects by using a fixed effects model. The choice between random effects and fixed effects depends on assumptions about the correlation between the individual effects and the covariates (Cameron and Trivedi (2005)).

In Experiment I and in Experiment II there are two choice alternatives per scenario. I chose to use the probit model, since it is the model used in the paper from which the utility function is taken (see Ryan and Hughes (1997)).

Here, we model the probability of individual i choosing alternative B in a given scenario as

$$p_i = \Phi(x_i'\beta), \qquad (26)$$

where $\Phi$ is the standard normal cdf, $x_i$ is the $i$-th row of the design matrix, and $\beta$ is the parameter to estimate.

The first order conditions are given by

$$\sum_{i=1}^{N} w_i\, \frac{y_i - \Phi(x_i'\hat\beta)}{\phi(x_i'\hat\beta)}\, x_i = 0, \qquad (27)$$

where $w_i$ equals

$$w_i = \frac{\phi(x_i'\hat\beta)^2}{\Phi(x_i'\hat\beta)\big(1 - \Phi(x_i'\hat\beta)\big)}, \qquad (28)$$

and $\phi(\cdot)$ denotes the standard normal pdf. The asymptotic covariance matrix is just

$$\hat V(\hat\beta) = \left(\sum_{i=1}^{N} w_i\, x_i x_i'\right)^{-1}. \qquad (29)$$

Notice that, through $w_i$, this matrix depends on $\beta$, the parameter to estimate.
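Since (29) depends only on the design matrix and β, the expected covariance matrix of the probit estimator can be evaluated before any data are collected, which is exactly what design optimization exploits. A minimal sketch:

```python
import numpy as np
from scipy.stats import norm

def probit_covariance(X, beta):
    """Asymptotic covariance matrix of the probit estimator,
    equations (28)-(29): V = (sum_i w_i x_i x_i')^{-1}."""
    xb = X @ beta
    phi, Phi = norm.pdf(xb), norm.cdf(xb)
    w = phi**2 / (Phi * (1.0 - Phi))
    return np.linalg.inv((X * w[:, None]).T @ X)
```

Evaluating, for instance, the determinant or the trace of this matrix at a pilot estimate of β gives the D- and A-criteria of a candidate design.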

4.2 Multinomial Choice Models

In Experiment III, the Multinomial Logit and Mixed Logit models are used. The number of DCE applications with three or more choice alternatives, though still a minority, has been rising in the last decade (De Bekker-Grob (2008)). One factor that hinders the widespread use of multinomial choice models in DCEs is the additional modelling complexity: there are many models to choose from (see De Bekker-Grob et al (2010) for an overview), and making the wrong assumption can be "fatal", unlike in the linear model.

4.2.1 Multinomial Logit

The simplest multinomial choice model of the lot (see De Bekker-Grob et al (2010)) is the multinomial logit, where we assume homoskedasticity, homogeneity in tastes and scale, and independence of irrelevant alternatives (IIA).

The first decision the researcher makes is which regressors enter the model as alternative-specific and which as alternative-invariant. This gives rise to different specifications of the regression function. With alternative-specific regressors the model is called conditional logit, and the probability that individual i chooses alternative j is given by

$$p_{ij} = P(y_i = j) = \frac{\exp(x_{ij}'\beta)}{\sum_{l=1}^{J} \exp(x_{il}'\beta)}, \qquad j = 1, \dots, J. \qquad (30)$$

If the regressors are invariant across alternatives, we call the model multinomial logit. Then the probability that individual i chooses alternative j equals

$$p_{ij} = P(y_i = j) = \frac{\exp(x_i'\beta_j)}{\sum_{l=1}^{J} \exp(x_i'\beta_l)}, \qquad j = 1, \dots, J, \qquad (31)$$

with a normalization such as $\beta_1 = 0$ imposed for identification.

In general, we can have models with a mix of the two specifications. In a DCE the regressors are usually the same across alternatives, but they do not take the same levels, except for the constant term. Thus, in a DCE context there are (k − 1) + (J − 1) regression coefficients to estimate: k − 1 attribute coefficients and J − 1 intercepts.

Both the conditional and the multinomial logit can be estimated by maximum likelihood. The first order conditions for the conditional logit model equal

$$\sum_{i=1}^{N} \sum_{j=1}^{J} y_{ij}\,(x_{ij} - \bar x_i) = 0, \qquad (32)$$

where $\bar x_i = \sum_{j=1}^{J} p_{ij}\, x_{ij}$ denotes the probability-weighted average of the regressors,

and the Fisher Information Matrix is,

$$\sum_{i=1}^{N} \sum_{j=1}^{J} p_{ij}\,(x_{ij} - \bar x_i)(x_{ij} - \bar x_i)', \qquad (33)$$

where $p_{ij}$ is given by equation (30).
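As a minimal sketch in code, equation (30) can be evaluated for one individual as follows, where X_i stacks the attribute rows of the J alternatives; subtracting the maximum utility before exponentiating guards against numerical overflow.

```python
import numpy as np

def clogit_probs(X_i, beta):
    """Conditional logit choice probabilities, equation (30).
    X_i has shape (J, k): one attribute row per alternative."""
    v = X_i @ beta
    ev = np.exp(v - v.max())   # stabilized exponentials
    return ev / ev.sum()
```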

For the multinomial logit model the first order conditions are:

$$\sum_{i=1}^{N} (y_{ik} - p_{ik})\, x_i = 0, \qquad k = 1, \dots, J. \qquad (34)$$

The kj-th block of the Information Matrix is just,

$$\sum_{i=1}^{N} p_{ij}\,(\delta_{jk} - p_{ik})\, x_i x_i', \qquad k = 1, \dots, J, \quad j = 1, \dots, J, \qquad (35)$$

where $\delta_{jk}$ equals one if j = k and zero otherwise.

Notice that in both specifications the Information Matrix depends on $\beta$ through $p_{ij}$.

A major issue with MNL is the independence of irrelevant alternatives (IIA) assumption. IIA follows from the assumption that the $\epsilon_{ij}$ are iid, which implies $Cov(\epsilon_{ij}, \epsilon_{ik}) = 0$ for any $j \neq k$. An extreme example of how IIA can yield unrealistic results is the "red bus/blue bus problem". IIA implies that the conditional probability of traveling by car, given that traveling by car and by red bus are the options, is unchanged if the option of traveling by blue bus is added. Since IIA holds in MNL, if the probabilities of traveling by red bus and by car both equal 0.5, the addition of the blue bus option yields a probability of 1/3 for each of the three options. In reality we expect the introduction of the blue bus to halve the usage of the red bus and leave the probability of travelling by car unchanged.
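The red bus/blue bus effect is easy to reproduce with clogit_probs from the sketch above; the single-attribute setup below is purely illustrative.

```python
import numpy as np

beta = np.array([1.0])
car_redbus = np.array([[0.0], [0.0]])   # equal utilities: probabilities 0.5 each
# clogit_probs(car_redbus, beta) -> [0.5, 0.5]

car_both_buses = np.array([[0.0], [0.0], [0.0]])   # add an identical blue bus
# clogit_probs(car_both_buses, beta) -> [1/3, 1/3, 1/3], not [0.5, 0.25, 0.25]
```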


Alternative models in the literature are the Multinomial Probit (MNP), Mixed Logit (MXL) and Nested Logit (Cameron and Trivedi (2005)). Unfortunately, estimation of any of these models is more complicated than that of the Multinomial Logit; in the case of MXL and MNP we have to resort to simulation methods.

4.2.2 Mixed Logit

The Mixed Logit (MXL) model (also called Random Parameters Logit) relaxes the IIA assumption. MXL can be specified in two ways (Hensher and Greene (2002)). The first is called the error components model, and it defines the utility of the j-th choice alternative to the i-th individual as

$$u_{ij} = x_{ij}'\beta + \xi_i + \epsilon_{ij}, \qquad j = 1, 2, \dots, J. \qquad (36)$$

Here the $\epsilon_{ij}$ are iid and extreme-value distributed, as in the multinomial logit model. However, the $\beta_i$ parameters are allowed to be random,

$$\beta_i = \beta + \xi_i. \qquad (37)$$

The distribution of the $\xi_i$'s needs to be specified by the researcher; usually they are assumed to be normally distributed.

The random parameter specification is defined as

$$u_{ij} = x_{ij}'\beta + v_{ij}, \qquad j = 1, 2, \dots, J, \qquad (38)$$

where $v_{ij}$ equals

$$v_{ij} = x_{ij}'\xi_i + \epsilon_{ij}. \qquad (39)$$

In both the error components and the random parameter specifications the IIA assumption is relaxed. I show this for the random parameter specification (for the error components specification the proof is similar) by computing $Cov(v_{ij}, v_{ik})$ for $j \neq k$:

$$\begin{aligned}
Cov(v_{ij}, v_{ik}) &= E(v_{ij} v_{ik}) \\
&= E\big((x_{ij}'\xi_i + \epsilon_{ij})(x_{ik}'\xi_i + \epsilon_{ik})\big) \\
&= E(x_{ij}'\xi_i\, x_{ik}'\xi_i) \\
&= E\big(x_{ij}'\xi_i\, tr(x_{ik}'\xi_i)\big) \\
&= E(x_{ij}'\xi_i \xi_i' x_{ik}) \\
&= x_{ij}'\,\Sigma_\beta\, x_{ik}. \qquad (40)
\end{aligned}$$

Here $tr(\cdot)$ denotes the trace operator. The second step follows since the $\epsilon_{ij}$ are iid with mean zero and $E(x_{ij}'\xi_i\, \epsilon_{ik}) = 0$.


Since the $\beta_i$ are now random, the probability that individual i chooses alternative j equals

$$p_{ij} = \int \frac{\exp(x_{ij}'\beta_i)}{\sum_{m=1}^{J} \exp(x_{im}'\beta_i)}\, f(\beta_i \mid \beta, \Sigma_\beta)\, d\beta_i, \qquad (41)$$

where $f(\cdot)$ denotes the probability distribution specified for $\beta_i$ and $\Sigma_\beta$ is its covariance matrix.

This integral can be approximated through simulated maximum likelihood (SML) estimation. The simulated log-likelihood function is just

$$\ln L(\beta, \Sigma_\beta) = \sum_{i=1}^{N} \sum_{j=1}^{J} y_{ij}\, \ln\!\left(\frac{1}{B} \sum_{b=1}^{B} \frac{\exp(x_{ij}'\beta^b)}{\sum_{m=1}^{J} \exp(x_{im}'\beta^b)}\right). \qquad (42)$$

In order to approximate this integral, $\beta_i$ must be drawn B times from its specified distribution. How large B needs to be depends on the specific situation, but for the SML estimator to be asymptotically equivalent to maximum likelihood, B must grow faster than $\sqrt{N}$ (Cameron and Trivedi (2005)). Therefore the usage of smart draws is advised to speed up computations.

The most frequently used distributions for MXL are the normal, log-normal, uniform and triangular (Hensher and Greene (2002)). Each of them has its own advantages and disadvantages. The major issue is the behaviour of the distribution at its tails. For instance, we might get an estimated parameter with the wrong sign (a positive cost coefficient) if we specify the random parameters to be normally distributed, or get a very large estimate (a price coefficient close to zero will yield an extremely large willingness to pay) if we specify the parameter to be lognormally distributed.

Solutions to this problem are to constrain the standard deviation to be a function of β, to specify β as a function of some characteristics, or to specify a discrete distribution for the random parameters, giving rise to so-called latent class models. Using bounded distributions like the uniform or the triangular, or using the truncated normal, might be convenient alternatives (Hensher and Greene (2002)). Another approach is simply to hold one or several coefficients fixed (Train (2001)). This is advised especially for the cost coefficient, if one is interested in the WTP.

The MXL model might not be the "remedy for all evils". Namely, there may exist other sources of randomness, such as scale heterogeneity (Fiebig et al (2009)). Scale heterogeneity means that for some respondents the scale of the error term is larger than for others. If individuals exhibit scale heterogeneity, the distribution of β is misspecified and MXL might give poor estimates. An attempt to model scale heterogeneity is the Generalized Multinomial Logit (G-MNL) model of Fiebig et al (2009).

In the G-MNL model $\beta_i$ is specified as

$$\beta_i = \sigma_i \beta + \gamma \eta_i + (1 - \gamma)\, \sigma_i \eta_i. \qquad (43)$$


In (43), $\sigma_i$ is an individual-specific scale parameter, $\eta_i$ captures taste heterogeneity, and $\gamma$ governs how the taste heterogeneity scales with $\sigma_i$. The advantage of this approach is that it allows one to model both scale and taste heterogeneity by including only two extra parameters.


5 Methodology

To carry out a simulation study, a data generating process must be pre-defined; namely, the matrix of scenarios X and the vector β must be specified. In order to add realism I use the results of a published DCE as the true coefficients: the true data generating process is based on the results of the probit regression found in Ryan and Hughes (1997).

In that study, women's preferences for miscarriage management were analyzed. Two alternative treatments were considered: surgical and medical management of miscarriage. Medical management involves taking drugs, while surgical management involves an operation (Ryan and Hughes (1997)).

This model contains monetary measure, risk, health status and time attributes, as defined in De Bekker-Grob et al (2010). These attributes are typically part of a DCE study: of the DCE studies performed in the period 2001-2008 and reviewed by De Bekker-Grob et al (2010), 56% had a monetary measure attribute, 51% a time attribute, 31% a risk attribute, and 54% a health status domain attribute.

The following table shows the attributes and their levels, as in Ryan and Hughes (1997):

Attribute          Reference   Description                                  Levels
Health status      x1          the level of pain you will experience        Low, Moderate, Severe
Time               x2          time in hospital receiving treatment         1 day, 2 days, 3 days, 4 days
Time               x3          time taken to return to normal household     1-2 days, 3-4 days, 5-6 days, ≥ 7 days
Monetary measure   x4          cost to you of treatment                     100, 200, 350, 500, 600 pounds
Risk               x5          complications following treatment            No, Yes

Table 4: Attributes and their levels, Ryan and Hughes (1997)

Additionally, the two treatments are referred to in this section as:

Reference       Treatment Description
Alternative 1   Medical Management of Miscarriage
Alternative 2   Surgical Management of Miscarriage

Table 5: Treatment alternatives, Ryan and Hughes (1997)

The estimated regression equation in Ryan and Hughes (1997) is the utility function from which I wish to simulate. In the rest of this section I refer to this utility function as the basecase. It is given by

$$u_i^* = -0.236 - 0.462\, x_{1i} - 0.25\, x_{2i} - 0.07\, x_{3i} - 0.002\, x_{4i} - 0.983\, x_{5i} + \epsilon_i^*, \qquad i = 1, \dots, Nm. \qquad (44)$$
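As a sketch of how the simulated responses can be generated from (44): draw one standard normal error per respondent-scenario pair and record whether the latent utility favours alternative B. Treating the rows of X as attribute differences between the two alternatives is an assumption of this illustration.

```python
import numpy as np

beta = np.array([-0.462, -0.25, -0.07, -0.002, -0.983])
const = -0.236

def simulate_responses(X, rng):
    """Simulate probit responses from the basecase utility (44).
    X: one row of attribute differences per respondent-scenario pair.
    Returns 1 when alternative B is chosen, 0 otherwise."""
    u = const + X @ beta + rng.standard_normal(len(X))
    return (u > 0).astype(int)

rng = np.random.default_rng(1)
```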


In this DCE it is assumed that the attributes and their levels are the same for the two choice alternatives. In some applications this does not apply. For instance, in this DCE the full-factorial design contains 3 × 4 × 4 × 5 × 2 = 480 rows. If the levels for the two choice alternatives differ, then the full-factorial design contains 480 × 480 rows and 10 columns instead of 480 rows and 5 columns.

Even if the levels are the same, one might expect an efficiency gain if the design is optimized using the matrix containing 480 × 480 rows rather than the one with 480. For my simulation study this would be computationally troublesome for the full-factorial design due to the large size of the matrix. The same holds for OAs, since it would be difficult to find an OA without too many rows. Therefore I decided not to pursue this route in the basecase.

Still, I do compare the two approaches: namely, I compare a D-optimal design that is optimized from the 480 by 5 matrix with a D-optimal design that is optimized from the 480² by 10 matrix, that is, a design optimized across scenarios and choice alternatives. Additionally, in Experiment III all designs are optimized across scenarios and choice alternatives.

5.1 Experiment I: Are frequently used Experimental Designs good?

In the first experiment of this study, the effects of using different experimental designs across sample sizes are examined. Frequently used experimental design methods are a type of orthogonal array known as an orthogonal main-effects plan and D-optimal designs. I compare their performance with a full-factorial design and with designs based on the A- and I-optimality criteria. Later I study ways to improve the designs obtained, with the D-max and D-choices designs.

At first, I would like to know whether frequently used experimental designs are good. By good I mean: compared to the experimental design that uses all the data (the full-factorial design), how do experimental designs that use only a fraction of all combinations perform, in terms of bias, efficiency and prediction error?

A second question is how experimental designs behave in small and large sample sizes. Obviously, in frequentist statistics the larger the sample size, the better. But in health economics sample sizes are usually not very large, so it is important to verify whether experimental designs produce good results even in samples smaller than 100. In particular, the estimates resulting from the experimental design should have a low bias even in small sample sizes.

The utility function is defined as

$$u = X\beta + \epsilon. \qquad (45)$$
