
Modeling US individuals' selection of health insurance policies not offered by their employers


Academic year: 2021


Faculty of Economics and Business, Amsterdam School of Economics, University of Amsterdam

Modeling US individuals’ selection of health insurance policies not offered by their employers

Bachelor Thesis Econometrics

C.E. Maijers

10646426

26 June 2018

Supervisor: Dr. Eleni Aristodemou

Abstract

This paper describes the modeling of the decisions made by US individuals with regard to the selection of an appropriate health insurance policy in the case where such coverage is not provided by their employers. For the analysis a multinomial logit model is deployed. The data from the Medical Expenditure Panel Survey used for the analysis comprise personal characteristics. A combination of individual and health-related characteristics is selected as determinants.


Statement of Originality

This document is written by Student Camilla Maijers who declares to take full responsibility for the contents of this document. I declare that the text and the work presented in this document are original and that no sources other than those mentioned in the text and its references have been used in creating it. The Faculty of Economics and Business is responsible solely for the supervision of completion of the work, not for the contents.


Contents

1 Introduction

2 Economic theory of demand models
2.1 The multinomial and the conditional logit model
2.2 Extensions and estimation of the logit model

3 An application
3.1 The health insurance market
3.2 The data
3.3 The model and method

4 Empirical results

5 Conclusion

6 References

List of Tables

1 Description of study population
2 Estimation results
3 Estimated average marginal effects
4 Estimated probabilities
5 The percentage correctly predicted outcomes, the hit rates
6 Hausman specification test results


1 Introduction

US individuals are faced with multiple choices regarding their health insurance plan. For years the majority of the nonelderly population in the United States have obtained their health insurance through their employer (Feldman, Finch, Dowd, & Cassou, 1989). The question arises which establishment individuals would choose to obtain private health insurance from if this were not the case. They will choose the health insurance policy that maximizes their utility from the available set of alternative policies offered by the various establishments. The utility function of policyholders is expressed in terms of the characteristics that define the offerings of different establishments combined with individual characteristics.

In this paper, the choice for a health insurance plan is modeled for individuals who are not getting health insurance coverage through their employer, using individual data offered by the Medical Expenditure Panel Survey from 2015. Analysis of the behavior of consumers in the health insurance market is useful because understanding consumer taste heterogeneity is considered to be a cornerstone of an optimal design of a market that is hampered by an inefficient competitive equilibrium (Keane, 2004).

In the past, health plan choices have been modeled with various logit models. Feldman, Finch, Dowd and Cassou (1989) modeled the choice between employer-sponsored health insurance plans using a nested conditional logit model. Harris and Keane (1998) developed an extended heterogeneous logit model to model the elderly’s choice between health insurance plans. Royalty and Solomon (1999) estimated a multinomial logit and a nested multinomial logit model of health plan choices to subsequently derive price elasticities. In this paper a multinomial logit model is used to model the choice of an individual between establishments offering health insurance policies when a health insurance plan is not offered by an employer. To obtain health insurance the policyholder can choose between different establishments: a union, a group, an insurance agent, an insurance company, an HMO and the State Exchange market. The data used to estimate the multinomial logit model contain information on the population’s health insurance, health expenditures and health status. The subsample that remains after elimination contains a total of 6,367 observations. Based on the existing literature, determinants are selected to draw up the model. A combination of individual and health-related characteristics is used (Shen, 2013), such as age, gender and race combined with the number of comorbidities and self-reported health status, among others. In the end the probabilities of choosing establishment j are estimated for each individual i. For each individual i the outcome with the highest estimated probability is considered the predicted outcome. The hit rate is defined as the percentage of correctly predicted outcomes. The overall hit rate of the estimated model is 51 percent.

In the following section the logit models that have been used in the existing literature are discussed. Section 3 describes the model’s application to the health insurance market. In addition, the dataset used to model the individual’s health plan choice is described, and the multinomial logit model is presented as well as the chosen characteristics. In Section 4 the results are analyzed and discussed. Section 5 concludes, summarizes the results, and provides some side notes for future research.

2 Economic theory of demand models

The estimation of demand is considered essential in previous studies that have examined different events that occur on differentiated product markets (Nevo, 2000). One method to estimate demand is with discrete choice models. Discrete choice models explain and predict the choices made by individuals within a finite set of discrete differentiated products (Wooldridge, 2010). Such models of product demand are most significantly influenced by McFadden (1974). This section outlines the multinomial logit model and McFadden’s conditional logit model, and describes the further development of the logit model. Various modeling assumptions and their implications for the model are discussed.

2.1 The multinomial and the conditional logit model

If the discrete response of the individual in the data has more than two outcomes, it is a case of multinomial data. The discrete dependent variable $y_i = j$, with $j = 1, \dots, J$, representing the choice, can be estimated with the multinomial logit model. The model is based on the assumption that individual $i$ will choose the option $j$, from a set of $J$ available alternatives, that yields the highest utility $u_{ij}$. This utility $u_{ij}$ of individual $i$ for option $j$ is a function of the characteristics of the individual (Wooldridge, 2010):

$$u_{ij} = x_i'\beta_j + \varepsilon_{ij}, \qquad j = 1, \dots, J, \quad i = 1, \dots, N \tag{2.1.1}$$

where $x_i$ represents the $K$ observed individual characteristics and $\beta_j$ is a $K \times 1$ vector representing the $K$ unknown parameters that are to be estimated. The term $\varepsilon_{ij}$ reflects the effect of the taste variables that are unobserved by the researcher.

The probability $P(y_i = j \mid x_i)$ is defined as the probability that individual $i$ chooses option $j$, the option that maximizes utility $u_{ij}$:

$$P(y_i = j \mid x_i) = P(u_{ij} > u_{ih} \ \forall\, h \neq j), \qquad j, h = 1, \dots, J \tag{2.1.2}$$

The probability $P(y_i = j \mid x_i)$ is also known as the market share of product $j$, and in the multinomial logit model this market share is equal to the response probability:

$$P(y_i = j \mid x_i) = \frac{e^{x_i'\beta_j}}{1 + \sum_{h=1}^{J} e^{x_i'\beta_h}} \tag{2.1.3}$$

Of particular interest is how ceteris paribus changes in $x_i$ affect the response probabilities (Wooldridge, 2010).

Lastly, an outside good with utility $u_{i0}$ is defined: individuals may decide not to buy any of the available product options (Nevo, 2000). That is, in addition to the available alternatives $j = 1, \dots, J$, the alternative $j = 0$ is defined, since a general price increase would otherwise not change the quantity of purchases (Nevo, 2000). Because the response probabilities must sum to unity, the response probability for the outside good is (Wooldridge, 2010):

$$P(y_i = 0 \mid x_i) = \frac{1}{1 + \sum_{h=1}^{J} e^{x_i'\beta_h}} \tag{2.1.4}$$
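As an illustration of equations (2.1.3) and (2.1.4), the response probabilities for a single individual can be computed as in the following minimal sketch. The function name `response_probabilities` and all numerical values are assumptions made for this example only; they are not taken from the data or the estimation in this paper.

```python
import math

def response_probabilities(x, betas):
    """Return [P(y=0), P(y=1), ..., P(y=J)] for one individual.

    x     : list of K observed characteristics (hypothetical values)
    betas : list of J coefficient vectors, one per alternative j = 1..J;
            the outside good j = 0 has its utility index normalised to 0.
    """
    # Linear index x'beta_j for every inside alternative.
    indices = [sum(xk * bk for xk, bk in zip(x, beta)) for beta in betas]
    # Denominator 1 + sum_h exp(x'beta_h); the 1 is exp(0) for the outside good.
    denom = 1.0 + sum(math.exp(v) for v in indices)
    return [1.0 / denom] + [math.exp(v) / denom for v in indices]

# Hypothetical individual: a constant and one characteristic.
x_i = [1.0, 0.5]
# Hypothetical coefficients for J = 2 inside alternatives.
betas = [[0.2, -1.0], [-0.4, 0.8]]

probs = response_probabilities(x_i, betas)
print(probs, sum(probs))
```

The probabilities of the outside good and the inside alternatives share one denominator, which is why they sum to one by construction.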

The multinomial logit model is estimated with maximum likelihood. The standard practice for identification is to set $\beta_0 = 0$ for the first (reference) category (Heij, De Boer, Franses, Kloek, & Van Dijk, 2004). The conditional log likelihood of observation $i$ can be written as

$$\ell_i(\beta) = \sum_{j=0}^{J} 1[y_i = j] \log P(y_i = j \mid x_i) \tag{2.1.5}$$

where the relevant response probability for each observation $i$ is determined by the indicator function (Wooldridge, 2010). The log likelihood of the sample is then obtained by adding the conditional log likelihoods of all observations, and it is maximized with respect to the parameters $\beta_j$, $j = 1, \dots, J$. With $y_{ij} = 1[y_i = j]$, the log likelihood becomes:

$$\log L(\beta_1, \dots, \beta_J) = \sum_{i=1}^{N} \left( \sum_{j=1}^{J} y_{ij}\, x_i'\beta_j - \log\Big(1 + \sum_{h=1}^{J} e^{x_i'\beta_h}\Big) \right) \tag{2.1.6}$$
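The log likelihood in (2.1.6) can be evaluated directly for a given set of parameters. The sketch below is an illustration under stated assumptions: the function name `log_likelihood` and the three made-up observations are hypothetical, and the maximization step itself (which in practice is done numerically by statistical software) is not shown.

```python
import math

def log_likelihood(X, y, betas):
    """Sample log likelihood of the multinomial logit model, as in (2.1.6).

    X     : list of N characteristic vectors x_i
    y     : list of N observed choices in {0, 1, ..., J}; 0 is the reference
    betas : list of J coefficient vectors (beta_0 is fixed at zero)
    """
    total = 0.0
    for x_i, y_i in zip(X, y):
        # Linear indices x_i'beta_j for j = 1..J.
        idx = [sum(a * b for a, b in zip(x_i, beta)) for beta in betas]
        log_denom = math.log(1.0 + sum(math.exp(v) for v in idx))
        # y_ij selects the index of the chosen alternative; the reference adds 0.
        chosen = idx[y_i - 1] if y_i > 0 else 0.0
        total += chosen - log_denom
    return total

# Hypothetical data: a constant plus one characteristic, J = 2 alternatives.
X = [[1.0, 0.2], [1.0, -1.0], [1.0, 0.7]]
y = [0, 2, 1]

# With all coefficients at zero every alternative has probability 1/3.
ll_zero = log_likelihood(X, y, [[0.0, 0.0], [0.0, 0.0]])
ll_other = log_likelihood(X, y, [[0.1, 0.2], [-0.3, 0.4]])
print(ll_zero, ll_other)
```

Evaluating the function at zero coefficients gives a simple check, because each of the three outcomes is then equally likely for every observation.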

McFadden (1974) investigates the relation between individual behavioral models and the distribution of population choices. He shows that the conditional logit model, which is related to the multinomial logit model, can be obtained from an underlying utility function, and he discusses its statistical properties. Instead of focusing on the individual characteristics, he focuses mainly on the product characteristics. McFadden (1974) states that the utility $u_{ij}$ of individual $i$ for product $j$ can be written as follows:

$$u_{ij} = z_{ij}\gamma + w_i\delta_j + a_{ij}, \qquad j = 1, \dots, J, \quad i = 1, \dots, N \tag{2.1.7}$$

where $z_{ij}$ represents the product characteristics that differ across alternatives (and possibly also across individuals) and $w_i$ represents the individual-specific characteristics. Here too the residual term $a_{ij}$ stands for the effect of the unobserved taste variables. If $\delta_j$ equals $\delta$ for all products, then $\delta$ drops out of the equation. The conditional logit model is also estimated with maximum likelihood.

The logit model is an attractive model to work with because it is computationally easy to solve. However, some of its assumptions place strong restrictions on it. The residual term $\varepsilon_{ij}$ is assumed to be independently and identically distributed across options and individuals, following a Type I extreme value distribution (Wooldridge, 2010). As a result, the ratio of the probabilities of two choices does not depend on the presence or characteristics of any other alternative. This property is called the Independence of Irrelevant Alternatives (IIA), and it can result in implausible substitution patterns.
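The IIA property can be made concrete with a small numerical sketch. The alternatives A, B and C and their utility values are hypothetical and serve only to show that the ratio of two logit probabilities is unchanged when a third alternative is added to the choice set.

```python
import math

def logit_probs(utilities):
    """Logit choice probabilities for a dict {alternative: deterministic utility}."""
    denom = sum(math.exp(v) for v in utilities.values())
    return {name: math.exp(v) / denom for name, v in utilities.items()}

# Hypothetical deterministic utilities.
small = logit_probs({"A": 1.0, "B": 0.5})
large = logit_probs({"A": 1.0, "B": 0.5, "C": 0.9})

ratio_small = small["A"] / small["B"]
ratio_large = large["A"] / large["B"]

# Both ratios equal exp(1.0 - 0.5), although adding C lowers P(A) and P(B).
print(ratio_small, ratio_large)
```

The new alternative draws probability from A and B in proportion to their original probabilities, regardless of how similar it is to either of them, which is the source of the implausible substitution patterns.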

Another limitation is that the cross price elasticities only depend on price and market share (Nevo, 2000). That is, if two products are similarly priced and have similar market shares, it is implied that both products will have the same cross price elasticity with a third product, even if one product resembles the third product more than the other product. This can result in implausible outcomes as well.

The last limitation is due to the way heterogeneity is modeled (Nevo, 2000). In both the multinomial and the conditional logit model it is assumed that the variation in individual tastes only enters through the residual term $\varepsilon_{ij}$. The individual parameters representing the observed variables that capture the responses of the individuals are therefore assumed to be homogeneous. Furthermore, it can be argued that because of this too much of the variety in taste preferences is represented by the residual term $\varepsilon_{ij}$. The majority of the utility should not come from an implausibly large draw of $\varepsilon_{ij}$ (Petrin, 1999; cited in Nevo, 2000).

Another problem that has to be considered when modeling the demand for differentiated products is that, according to the literature, prices correlate with unobserved product characteristics captured in the residual term. Nevo (2000) explains that producers take product characteristics and quality into account when setting prices. In other words, a higher price might imply that the consumer gets more value for his money (Berry, 1994). This unobservable value is incorporated in the residual term. If this endogeneity is not accounted for, the estimates for $\beta_j$ will be biased. One solution to deal with potential endogeneity might be to estimate the model using instruments. Unfortunately, it is not possible to use a straightforward instrumental variables estimation method in this case. Instead, an estimation method using inverted market shares (suggested by Berry, 1994) should be used. This is discussed in the next section.

The multinomial and the conditional logit model are commonly used for discrete choice modeling because of their convenient application of response probabilities. However, as discussed, both models have their limitations due to certain assumptions. Extensions have been suggested that expand the multinomial logit model. Both Berry (1994) and Nevo (2000) discuss the logit model in detail in their research and compare different models that are based on the logit framework. In the next section their research is discussed.


2.2 Extensions and estimation of the logit model

The multinomial logit model computes the response probability of an individual choosing an option from a set of alternative options as a function of individual characteristics (Wooldridge, 2010). This model does have its limitations due to the implications of some of the assumptions made. Substitution patterns rely on the IIA property and are not influenced by additional effects from characteristics or prices of other products or by the similarity of products (Nevo, 2000). Extending the classic logit model to the random coefficients model alleviates these restrictive assumptions while maintaining the advantage of the logit model in dealing with dimensionality issues. This section presents the model extensions developed by Berry (1994) and Nevo (2000) and subsequently describes how to estimate these models.

Berry (1994) and Nevo (2000) both provide a similar discussion of ways to extend the conditional logit model proposed by McFadden (1974). Berry (1994) assumes a random coefficients specification for the utility function. He states that the utility for a given individual depends on the characteristics of the chosen product, on random consumer taste variables, and on a set of parameters to be estimated. Individual $i$ chooses the option $j$ that maximizes utility $u_{ij}$, which according to Berry (1994) is given by:

$$u_{ij} = x_j\tilde{\beta}_i - a p_j + \xi_j + \varepsilon_{ij}, \qquad i = 1, \dots, N, \quad j = 1, \dots, J \tag{2.2.1}$$

Here $x_j$ are the observed product characteristics, $p_j$ is the price of product $j$ with price coefficient $a$, and $\xi_j$ are the unobserved product characteristics. The individual-specific taste variables are represented by the $K \times 1$ random coefficients vector $\tilde{\beta}_i$ and the residual term $\varepsilon_{ij}$. $\xi_j$ is interpreted as the mean of the individuals’ perception of unobservable product characteristics, such as the product quality of option $j$, and $\varepsilon_{ij}$ is interpreted as the distribution of individual preferences about these unobservable product characteristics. Berry (1994) decomposes individual $i$’s random taste coefficient $\tilde{\beta}_{ik}$ for characteristic $k$ as

$$\tilde{\beta}_{ik} = \beta_k + \sigma_k\zeta_{ik} \tag{2.2.2}$$

where $\beta_k$ represents the mean level of the taste parameter for characteristic $k$. The individual-specific differences in taste per product characteristic are represented by the mean-zero $\zeta_{ik}$, which is scaled by $\sigma_k$ and is assumed to follow an independent and identically distributed standard normal distribution across individuals and characteristics.

The random coefficients model differs from the conditional logit model because it allows for the influence of individual-specific taste effects as well as the influence of unobservable product characteristics such as product quality. Also, the risk of unreasonable substitution effects is averted with this random coefficients model. Berry (1994) explains that with this model, if individuals experience an increase in price for product j, they will substitute towards a product that resembles product j. But these benefits come at a cost, because the random coefficients model is computationally less convenient to estimate than the logit model. When deciding on a suitable model for selected data, a choice has to be made between the larger computational burden of the random coefficients model and the basic but restrictive multinomial logit model. The logit model will be favored when computational complexity is the main concern, whereas the random coefficients model will be preferred when the emphasis is on estimating the more specific patterns of demand (Berry, 1994).

Berry (1994) states that the prices as well as the unobserved product characteristics enter the demand equation in a nonlinear fashion. He suggests a method to estimate the model whereby he inverts the market shares to obtain the mean utility levels, which are linear in the prices and unobserved product characteristics. This method can be applied to a classic conditional logit model as well. The log difference between the estimated market share of a product and the estimated market share of the outside good is then equal to the mean utility of that product. This mean utility is linear in the product characteristics and can be estimated with OLS. The residual term is interpreted as the term for the unobserved product characteristics. In the random coefficients model the market share equation is difficult to calculate, but the method is similar.
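For the plain logit case, the inversion described above amounts to taking log differences of market shares. The following sketch illustrates this under stated assumptions: the function names and the three market shares are hypothetical, and the subsequent regression of the mean utilities on product characteristics is not shown.

```python
import math

def mean_utilities(shares, outside_share):
    """Mean utility of each inside product: ln(s_j) - ln(s_0)."""
    return [math.log(s) - math.log(outside_share) for s in shares]

def implied_shares(deltas):
    """Logit market shares implied by the mean utilities (outside good normalised to 0)."""
    denom = 1.0 + sum(math.exp(d) for d in deltas)
    return [math.exp(d) / denom for d in deltas]

# Hypothetical market shares of three inside products.
shares = [0.30, 0.25, 0.15]
s0 = 1.0 - sum(shares)            # share of the outside good

deltas = mean_utilities(shares, s0)
round_trip = implied_shares(deltas)
print(deltas)
print(round_trip)
```

Feeding the recovered mean utilities back into the logit share formula reproduces the original shares, which shows that the inversion is exact in the logit model.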

Berry (1994) and Nevo (2000) both suspect the presence of endogeneity in prices. Therefore the model should be estimated with the instrumental variables method. It is not possible to use a straightforward application of the instrumental variables method because the demand equation is not linear in the unobservables. This is resolved with the inverted market share method, which allows the mean utility to be estimated with an instrumental variables method. Cost shifters are seen as good instruments (Nevo, 2000) but it is often difficult to obtain appropriate instruments within the data.

The following section describes the application of the multinomial logit model to the choice of individuals to obtain health insurance from various establishments, when health insurance is not offered by their employer. The health insurance market and relevant research are discussed. Thereafter follows a specification of the dataset, the model and the estimation method that are used in this paper.

3 An application

In this paper a model is applied to the decision by an individual from which establishment to obtain a health insurance plan, when the individual is not offered a health insurance plan by his or her employer. This section first describes the health insurance market and the different models that are used in the existing literature for health plan choices. After that, the data and the variables are described that are utilised in the analysis presented in this paper. Finally, the approach that is used to model the choice for a health insurance establishment is described, including the econometric methods that are used for estimating the unknown parameters that influence the choice for a particular establishment.

3.1 The health insurance market

The individual has to decide from which establishment he or she obtains private health insurance if such a plan is not offered by his or her employer. This establishment can be either a union, an insurance agent, an insurance company, a health maintenance organization (HMO), a group, or a State Exchange (MEPS, 2018). Health insurance provided by a union is defined (in the documentation of the data) as insurance offered through a labor-management committee. Private health insurance obtained through a group is for small businesses that cannot offer the same benefits as large companies (Wikipedia). HMOs cover health care services only from doctors, hospitals and other providers that the HMO has a contract with. This puts certain restrictions on the patient’s freedom to choose (Feldman, Finch, Dowd, & Cassou, 1989). State Exchanges are marketplaces that make private health insurance more accessible to individuals (MEPS, 2018).

There are several studies that have estimated a model for such a health plan choice. Harris and Keane (1998) model the choice of health plans by senior citizens, developing an extended heterogeneous logit model to do so. Feldman, Finch, Dowd and Cassou (1989) investigate the demand for employment-based health insurance plans using a nested logit model to model health plan choices and segregating their sample into subsets based on whether the policy offers the policyholder the freedom to choose his or her own doctor. Royalty and Solomon (1999) model the health plan choices of employees at a single firm and focus on estimating price elasticities in a managed competition setting.

When modeling the demand for health insurance, the existing literature takes into consideration that obtaining health insurance is subject to asymmetric information (Keane, 2004). The individual knows more about the state of his health than the insurance company, which may lead to adverse selection, as the choice for a health insurance plan may be influenced by the health care expenditures a policyholder expects to make in the future. However, evidence of the presence of such adverse selection is weak.

In the next section the dataset obtained from the Medical Expenditure Panel Survey 2015 that is used in this paper is described. A comprehensive description of the subsample is provided and its particular characteristics are discussed.

3.2 The data

For this analysis, data from the Medical Expenditure Panel Survey (MEPS) from 2015 is used. MEPS provides surveys of families, individuals, their doctors, pharmacies, hospitals and employers from all of the United States combined. The aggregated survey offers a comprehensive source of data with regard to costs and utilization of the health care system and the coverage of the health insurance providers since personal information is combined with information about health insurance policies as well as relevant expenditures (MEPS, 2018). Consequently, the cross sectional dataset used in this article contains information on privately insured individuals and their private health insurance plans, as well as demographic and socioeconomic characteristics.


A subsample is used of individuals who have private health insurance that was not arranged through their employer. Individuals who have public insurance are excluded because having public health insurance is not considered to be an individual’s choice (Shen, 2013). Health insurance in the United States is normally linked to employment. When this is not the case, this paper examines the choice from which establishment an individual obtains his or her private health insurance policy. Only the observations of policyholders are kept. The subsample contains 6,367 observations after elimination. The establishment can be either a union, an insurance agent, an insurance company, an HMO, a group or a State Exchange. This variable acts as the dependent variable. The outside good is defined as the individuals in the sample who are not offered a health insurance plan by their employer and remain uninsured.

The dataset consists of consumer-level data. The observed variables thus vary across individuals. Personal characteristics such as age, sex and the self-reported health status, which affect the demand for medical care (Feldman, Finch, Dowd, & Cassou, 1989), are added to the model. Income, marital status and the region are added as socioeconomic variables following Shen (2013). Health-related variables are given by the number of comorbidities an individual has, and whether the individual is a current smoker (Shen, 2013). In the survey the individuals are asked if they have any of the following conditions: Alzheimer’s disease, asthma, arthritis, cancer, diabetes, heart disease, high blood pressure, osteoarthritis, or have suffered a stroke. The comorbidity variable counts the number of conditions the policyholder has. The variable takes the value 0, 1 or 2 if the individual has no comorbidities, one comorbidity, or two or more comorbidities, respectively.

Shen (2013) states that the benefits and costs of the health insurance plan should be added. These variables are given by the out-of-pocket premium payments, the annual deductibles and whether the health insurance plan is associated with a Health Savings Account (HSA). Feldman, Finch, Dowd, and Cassou (1989) find a significant effect of these variables on the preference for a health plan. Keane (2004) adds the key attributes of plans, that is, whether the policy includes dental insurance, vision insurance and prescription drugs. Inspection of the data showed that these variables were recorded inconsistently and were missing for most of the observations. Unfortunately, they could not be added to the model. Therefore, the assumption is made that there is no endogeneity.


Adverse selection arises when there is an asymmetry in information between the individual and the health insurance establishment. That is, the individual knows more about his or her health status than the health insurance establishment (Keane & Stavrunova, 2016). As a way to control for adverse selection the variable for the self-reported health status of the individual is added, as well as the variable for the number of comorbidities and whether the individual smokes.

In Table 1 some summary statistics of the data are given. Continuous variables like income are categorized in groups to show the distribution. When estimating the model the variables remain continuous. The total sample contains 6,367 observations, of which 3.7 percent chose a union, 7.4 percent a group, 13.55 percent an agent of an insurance company, 20.2 percent an insurance company, 3.9 percent an HMO and 24.23 percent a State Exchange market, while 27.1 percent remained uninsured.

On the basis of the dataset and the chosen variables the model is defined to model the choice for the health insurance establishment.

3.3 The model and method

In this section the model and methods that are used in Section 4 are discussed. The model that is chosen is the multinomial logit model (Wooldridge, 2010):

$$u_{ij} = x_i'\beta_j + \varepsilon_{ij}, \qquad j = 1, \dots, J, \quad i = 1, \dots, N \tag{3.3.1}$$

where $u_{ij}$ is the utility that person $i$ obtains from choice option $j$, $x_i$ is a vector of observed characteristics and $\varepsilon_{ij}$ is a random residual, assumed to be independently and identically distributed following a Type I extreme value distribution. The probability that the $j$th establishment will be chosen is estimated. The characteristics that are described in the previous section are included in the $x_i$ vector. Age squared is added following Shen (2013), as well as dummies for region and self-reported health status. The probability of choosing establishment $j$ to obtain health insurance is given by

$$P(y_i = j \mid x_i) = \frac{e^{x_i'\beta_j}}{1 + \sum_{h=1}^{J} e^{x_i'\beta_h}} \tag{3.3.2}$$

An empirical analysis is performed to estimate this model on the data obtained from the Medical Expenditure Panel Survey.

The estimated parameters in the model are difficult to interpret. One way is to look at the relative probabilities or the relative odds. Another way is to look at marginal effects to determine the effect of a characteristic on the probability scale. Marginal effects with respect to the $k$th predictor are given by (Heij, De Boer, Franses, Kloek, & Van Dijk, 2004):

$$\frac{\partial P(y_i = j)}{\partial x_{ik}} = \hat{p}_{ij}\Big(\beta_{jk} - \sum_{r} \hat{p}_{ir}\beta_{rk}\Big) \tag{3.3.3}$$

where $\hat{p}_{ij}$ is the predicted probability that individual $i$ chooses establishment $j$ and $\beta_{jk}$ is the $k$th element of $\beta_j$.
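The marginal effect formula can be illustrated with a short sketch. The function name `marginal_effects`, the predicted probabilities and the coefficients below are hypothetical; the sum over $r$ is taken over all alternatives including the reference category, whose coefficient is zero.

```python
def marginal_effects(probs, betas_k):
    """Marginal effect of characteristic k on each choice probability.

    probs   : predicted probabilities [p_0, ..., p_J] of one individual
    betas_k : coefficients of characteristic k, [beta_0k, ..., beta_Jk],
              with beta_0k = 0 for the reference category
    """
    # Probability-weighted average coefficient: sum_r p_r * beta_rk.
    weighted_avg = sum(p * b for p, b in zip(probs, betas_k))
    return [p * (b - weighted_avg) for p, b in zip(probs, betas_k)]

# Hypothetical predicted probabilities and coefficients (reference first).
probs = [0.5, 0.3, 0.2]
betas_k = [0.0, 1.0, -0.5]

effects = marginal_effects(probs, betas_k)
print(effects, sum(effects))
```

Because the probabilities always sum to one, the marginal effects across all alternatives sum to zero, and the sign of an effect can differ from the sign of the corresponding coefficient.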

In the next section different diagnostic tests are performed to test the model. It might be the case that the individuals who choose the insurance company and the individuals who choose an agent at an insurance company can be combined in the same group. It could also be the case that other groups should be combined. This is tested in the next section, as well as the overall significance of the model.

Furthermore, the success of classification is estimated (Heij, De Boer, Franses, Kloek, & Van Dijk, 2004). First the probabilities per choice are estimated. For each individual $i$ the establishment with the highest estimated probability is considered to be the predicted establishment choice. These predicted outcomes can be compared with the actual observed outcomes $y_i$. The hit rate $h$ is defined as the percentage of correctly predicted outcomes. The hit rate is compared to that of random predictions, whereby for each individual $i$ an outcome is predicted with probabilities equal to the observed fractions in the sample. The expected hit rate $\hat{q}$ of these random predictions is the sum of the squared observed fractions. The estimated model provides better-than-random predictions if

$$z = \frac{h - \hat{q}}{\sqrt{\hat{q}(1 - \hat{q})/n}} = \frac{nh - n\hat{q}}{\sqrt{n\hat{q}(1 - \hat{q})}} \tag{3.3.4}$$

is larger than 1.645 at the 5 percent significance level (Heij, De Boer, Franses, Kloek, & Van Dijk, 2004).


The empirical analysis is given in the next section. The multinomial logit model is estimated for the choice of an individual between establishments offering health insurance policies when a health insurance plan is not offered by an employer. Multiple tests are conducted and marginal effects are analyzed.

4 Empirical results

The results of the multinomial logit model estimated for the choice of an individual between establishments offering health insurance policies when a health insurance plan is not offered by an employer are given in Table 2. The parameters are estimated for the different establishments. The choice of not being insured serves as the reference group. The variable for self-reported mental health was initially added to the model, but because its effect was insignificant for all the establishment groups it was ultimately left out of the model. Family size, income, years of education, age squared and number of comorbidities have a significant effect for all the establishments at the 5 percent significance level. Of the region dummies, the region west with respect to northeast is mostly significant. Age and gender are significant for all the establishments, except union. Being white is not significant for the group establishment and being married is not significant for HMO, but both are significant otherwise. Current smoker is only significant for agent, insurance company and HMO. The self-reported health statuses Excellent and Very good with respect to Poor health status have a significant effect for all the establishments except the union.

The estimated parameters in the model are difficult to interpret. One way is to look at the relative probabilities or the relative odds. Take, for example, the characteristic of being married. Exponentiating the estimated parameters shows that the relative probabilities of choosing the insurance company, the agent at the insurance company, or the State Exchange market rather than being uninsured are on average double for married individuals, ceteris paribus. The relative probability of choosing a union over being uninsured is 35 percent lower for married individuals than for unmarried individuals.
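The step from an estimated coefficient to a relative probability can be sketched as follows. The two coefficient values are hypothetical and were chosen only to mirror the magnitudes mentioned in the text; they are not the estimates reported in the results table.

```python
import math

def relative_odds(coefficient):
    """Ratio P(establishment) / P(uninsured) for a one-unit change in a regressor."""
    return math.exp(coefficient)

# Hypothetical coefficients of a dummy variable such as being married.
doubling = relative_odds(math.log(2.0))   # a coefficient of about 0.693
lower = relative_odds(-0.43)              # a negative coefficient

# Percentage change in the relative probability implied by the second coefficient.
percent_change = (lower - 1.0) * 100.0
print(doubling, lower, percent_change)
```

A coefficient of about 0.69 corresponds to a doubling of the relative probability, while a coefficient of about −0.43 corresponds to a relative probability that is roughly 35 percent lower.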

Another way is to use the marginal effects defined in equation (3.3.3) to determine the effect of being married on the probability scale. The average estimated marginal effects of being married on the different establishment choices are given in Table 3. The results show that the average marginal effect of being married on union, HMO and being uninsured is negative. This means that the probability of choosing these outcomes is on average a few percentage points lower for married individuals than for unmarried individuals, ceteris paribus.

The overall specification is tested with a likelihood ratio (LR) test, with the null hypothesis that all parameters are zero. The multinomial logit model estimated without the variables has a log likelihood value of −11022.38. The corresponding LR test on the joint significance of the chosen variables has a value LR = 2(−8567.044 + 11022.378) = 4910.67. The null hypothesis is rejected because the LR statistic is far larger than the critical value of the $\chi^2(102)$ distribution. The variables are jointly significant.

It might be the case that the groups of individuals that choose the insurance company and the agent of an insurance company can be combined. A Wald test is performed to test whether the estimated coefficients are the same for these two groups. The Wald test statistic has a value of 43.68 (p-value 0.0004). This value exceeds the critical value of the χ2(17) distribution, thus the coefficients are significantly different. The Wald test is performed for all pairs of groups. All the performed Wald tests indicate that none of the groups should be combined.

The probabilities of choosing establishment j are estimated for each individual i. Table 4 shows that the means of these probabilities closely match the sample shares per group. For each individual i the outcome with the highest estimated probability is considered the predicted outcome. The hit rate is defined as the percentage of correctly predicted outcomes. The hit rates are given in table 5. The overall hit rate is 51 percent. The model predicts most outcomes correctly for the individuals that choose the insurance company, the State Exchange market and the individuals that remain uninsured, with hit rates of 65, 71 and 80 percent respectively, and far fewer for the other groups. For the individuals that chose an HMO as establishment to obtain health insurance the predicted outcome was never correct.
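The prediction rule and the hit rate can be sketched as follows. The probabilities and observed outcomes are hypothetical toy values, and the function names are introduced only for this illustration.

```python
def predicted_outcome(probs):
    """Index of the outcome with the highest estimated probability."""
    return max(range(len(probs)), key=lambda j: probs[j])


def hit_rates(prob_rows, observed):
    """Overall hit rate and hit rate per observed outcome."""
    hits, counts = {}, {}
    for probs, y in zip(prob_rows, observed):
        counts[y] = counts.get(y, 0) + 1
        hits[y] = hits.get(y, 0) + int(predicted_outcome(probs) == y)
    overall = sum(hits.values()) / len(observed)
    per_group = {y: hits[y] / counts[y] for y in counts}
    return overall, per_group


# Hypothetical estimated probabilities for five individuals, three outcomes.
prob_rows = [
    [0.6, 0.3, 0.1],
    [0.2, 0.5, 0.3],
    [0.5, 0.1, 0.4],
    [0.3, 0.3, 0.4],
    [0.7, 0.2, 0.1],
]
observed = [0, 1, 2, 2, 1]
overall, per_group = hit_rates(prob_rows, observed)
print(overall, per_group)
```

A group whose estimated probability is never the largest for any individual, as with a small group in an unbalanced sample, ends up with a hit rate of zero under this rule.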


The hit rate of the model is compared with the hit rate of random predictions using equation (3.3.4) given in section 3.3. The hit rate multiplied by the number of observations, $nh$, is equal to the number of correctly predicted observations: 3,195. The hit rate of random predictions is equal to the sum of squared sample shares:

$$(1725/6367)^2 + (233/6367)^2 + (471/6367)^2 + (863/6367)^2 + (1284/6367)^2 + (248/6367)^2 + (1543/6367)^2 = 0.1994.$$

The z statistic is then given by:

$$z = \frac{3195 - 0.1994 \cdot 6367}{\sqrt{6367 \cdot 0.1994\,(1 - 0.1994)}} = 60.39 \qquad (4.0.1)$$

This result shows that the predictions made by the model are significantly better than random predictions.
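The comparison with random predictions can be checked numerically with a minimal sketch that uses the group sizes and the number of correct predictions reported in the text. The variable names are assumptions; the result should come out close to the reported z value, with small differences possible because of rounding in the random hit rate.

```python
import math

# Group sizes as listed in the sum of squared sample shares.
counts = [1725, 233, 471, 863, 1284, 248, 1543]
n = sum(counts)   # total number of observations
correct = 3195    # number of correctly predicted observations

# Hit rate of random predictions: sum of squared sample shares.
p_random = sum((c / n) ** 2 for c in counts)

# z statistic comparing the model's hits with the expected random hits.
z = (correct - n * p_random) / math.sqrt(n * p_random * (1.0 - p_random))
print(round(p_random, 4), round(z, 2))
```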

Lastly, a Hausman test is performed to test whether or not the IIA assumption is valid here. Under the IIA assumption, there should be no systematic change in the estimated parameters if one of the establishments is excluded from the model. A restricted model is estimated, excluding one of the possible establishments, and these parameters are tested against the parameters of the full model with a Hausman test. The results are given in table 6. According to these results, the IIA assumption is not rejected only when the State Exchange group is excluded (p = 0.260). For four groups the χ2 statistic is negative, which can be interpreted as strong evidence that the IIA assumption is violated. This could be a result of the fact that the sample groups are quite small in comparison.
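To see how a negative test statistic can arise, recall that the Hausman statistic is the quadratic form H = (b_r − b_f)'(V_r − V_f)^(-1)(b_r − b_f), where b_r and b_f are the restricted and full estimates and V_r and V_f their covariance matrices. The sketch below uses hypothetical two-parameter inputs; all numbers are invented for illustration and do not come from the estimated model.

```python
def hausman_statistic(b_r, b_f, v_r, v_f):
    """Quadratic form d' (V_r - V_f)^(-1) d for two parameters, d = b_r - b_f."""
    d = [r - f for r, f in zip(b_r, b_f)]
    # Difference of the two 2x2 covariance matrices.
    m = [[v_r[i][j] - v_f[i][j] for j in range(2)] for i in range(2)]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    inv = [[m[1][1] / det, -m[0][1] / det],
           [-m[1][0] / det, m[0][0] / det]]
    return sum(d[i] * inv[i][j] * d[j] for i in range(2) for j in range(2))


# Hypothetical estimates and covariance matrices.
b_full = [0.50, -0.30]
b_restricted = [0.60, -0.50]
v_full = [[0.02, 0.0], [0.0, 0.05]]

# Restricted estimates less precise in every direction: positive statistic.
h_pos = hausman_statistic(b_restricted, b_full,
                          [[0.04, 0.0], [0.0, 0.09]], v_full)
# Restricted variance smaller in one direction: the difference matrix is
# not positive definite and the statistic can turn negative.
h_neg = hausman_statistic(b_restricted, b_full,
                          [[0.04, 0.0], [0.0, 0.03]], v_full)
print(round(h_pos, 3), round(h_neg, 3))
```

In finite samples the estimated difference V_r − V_f need not be positive definite, which is more likely when some groups contain few observations.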

The next section summarizes this paper and its analysis and offers some suggestions for future research.

5 Conclusion

In this paper, the choice for a health insurance plan is modeled for individuals who are not getting health insurance coverage through their employer, using individual data offered by the Medical Expenditure Panel Survey from 2015. Analysis of the behavior of consumers in the health insurance market is useful because understanding consumer taste heterogeneity is considered to be a cornerstone of an optimal design of a market that is hampered by an inefficient competitive equilibrium.


A multinomial logit model is used to model the choice of an individual between establishments offering health insurance. The data obtained from the Medical Expenditure Panel Survey that are used to estimate the multinomial logit model contain information on the population’s health insurance, health expenditures and health status. The determinants included in the model are selected on the basis of the existing literature. Personal characteristics are given by age, sex, marital status, income and self-reported health status, among others.

The multinomial logit model is estimated and it is found that family size, income, years of education, age squared and number of comorbidities have a significant effect at the 5 percent significance level. Furthermore, the probabilities of choosing establishment j are estimated for each individual i. For each individual i the outcome with the highest estimated probability is considered the predicted outcome. The hit rate is defined as the percentage of correctly predicted outcomes. The overall hit rate of the estimated model is 51 percent. However, the hit rates differ considerably between the establishments. The hit rate for the individuals that chose the State Exchange market, for example, is 71%, but for the individuals that chose an HMO the predicted outcome was never correct.

The numbers of observations per establishment in the subsample used for this paper are quite uneven, leaving some of the groups, such as the individuals that chose an HMO or a union, rather small compared to the groups of individuals that chose an insurance company or a State Exchange market. Also, the group of individuals that remain uninsured is quite large by comparison: because most people obtain health insurance through their employer, there are not many observations of individuals that choose another establishment. The estimated parameters for the HMO group in particular suffered from this. Furthermore, the product characteristics that were available in the data turned out to be inconsistent and had many missing observations, and had to be left out of the model. Therefore it was not possible to estimate a conditional logit model. This should be considered in a future attempt to model the choice of health insurance establishment.


6

References

Berry, S. (1994). Estimating discrete-choice models of product differentiation. The RAND Journal of Economics, 25 (2), 242-262.

Feldman, R., Finch, M., Dowd, B., & Cassou, S. (1989). The demand for employment-based health insurance plans. Journal of Human Resources, 115-142.

Harris, K. M., & Keane, M. P. (1998). A model of health plan choice:: Inferring preferences and perceptions from a combination of revealed preference and attitudinal data. Journal of Econometrics, 89 (1-2), 131-157.

Heij, C., de Boer, P., Franses, P.H., Kloek, T., & van Dijk, H.K. (2004). Econometric methods with applications in business and economics (1st ed.). New York: Oxford University Press.

Keane, M. (2004). Modeling Health Insurance Choice Using the Heterogenous Logit Model, Manuscript, Department of Economics, Yale University

Keane, M., Stavrunova, O. (2016). Adverse selection, moral hazard and the demand for Medigap insurance. Journal of Econometrics, 190 (1), 62-78.

McFadden, D. (1974). Conditional logit analysis of qualitative choice analysis. Frontiers in Econometrics, pp. 105-142.

Nevo, A. (2000). A practitioner’s guide to estimation of random-coefficients logit models of demand, Journal of economics management strategy, 9 (4), 513-548.

Shen, C. (2013). Determinants of health care decisions: insurance, utilization, and expendi-tures. Review of Economics and Statistics, 95 (1), 142-153.

Royalty, A. B., & Solomon, N. (1999). Health plan choice: price elasticities in a man-aged competition setting. Journal of Human Resources, 1-41.


                                         N         %
All                                  6,367     100.0
Insurance establishments
  Union                                233      3.66
  Group                                471      7.40
  Insurance company from an agent      863     13.55
  Insurance company                  1,284     20.17
  HMO                                  248      3.90
  State Exchange                     1,543     24.23
  Uninsured                          1,725     27.09
Gender
  Male                               3,158     49.60
  Female                             3,209     50.40
Age
  Below 49                           3,009     47.25
  50 - 65                            1,778     27.93
  66 or older                        1,580     24.82
Income
  Less than $20,000                  3,085     48.45
  $20,000–$30,000                    1,168     18.34
  $30,000–$50,000                    1,077     16.92
  Over $50,000                       1,037     16.29
Marital status
  Married                            2,884     45.30
  Other                              3,483     54.70
Family size
  1                                  1,633     25.65
  2                                  2,044     32.10
  3 or more                          2,690     42.25
Years of education
  Less than high school              3,289     51.66
  High school                        1,353     21.25
  College or higher                  1,725     27.09
Region
  Northeast                            780     12.25
  Midwest                            1,285     20.18
  South                              2,674     42.00
  West                               1,628     25.57
Number of comorbidities
  0                                  2,787     43.77
  1                                  1,434     22.52
  2 or more                          2,146     33.71
Smoker
  Yes                                  704     11.06
  No                                 5,663     88.94
Self-reported health status
  Excellent                          1,740     27.33
  Very good                          1,917     30.11
  Good                               1,682     26.42
  Fair                                 776     12.19
  Poor                                 252      3.96

Table 1: Description of study population


                         Union            Group            Insurance        Insurance        HMO              State
                                                           company from     company                           Exchange
                                                           an agent
                         Estimate (SE)    Estimate (SE)    Estimate (SE)    Estimate (SE)    Estimate (SE)    Estimate (SE)
Age                       -.052 (.030)     -.121 (.019)     -.132 (.014)     -.148 (.013)     -.118 (.019)      .043 (.012)
Age²                       .001 (.0002)     .002 (.0001)     .002 (.0001)     .002 (.0001)     .001 (.0002)   -.0003 (.0001)
Gender                    -.186 (.170)      .632 (.129)      .587 (.105)      .565 (.0988)     .585 (.158)      .721 (.085)
White                    -1.174 (.181)     -.111 (.163)     -.310 (.125)     -.466 (.112)     -.903 (.168)     -.828 (.092)
Married                    .738 (.184)      .291 (.144)     1.042 (.119)      .710 (.111)     -.040 (.186)      .580 (.093)
Family size               -.429 (.074)     -.455 (.059)     -.392 (.040)     -.466 (.038)     -.647 (.077)     -.356 (.028)
Income                   .00004 (.000)    .00004 (.000)    .00003 (.000)    .00003 (.000)    .00003 (.000)    .00002 (.000)
Years of education        -.151 (.011)     -.123 (.009)     -.130 (.008)     -.136 (.007)     -.156 (.011)     -.140 (.007)
Region
  Midwest                 -.111 (.236)      .452 (.212)     1.106 (.192)     1.140 (.180)     -.039 (.286)      .176 (.162)
  South                  -1.842 (.231)     -.601 (.191)     -.104 (.173)     -.061 (.160)     -.516 (.235)     -.272 (.134)
  West                    -.759 (.232)      .028 (.204)      .417 (.183)      .441 (.170)      .683 (.241)      .120 (.145)
Number of comorbidities    .504 (.120)      .536 (.096)      .370 (.079)      .349 (.074)      .584 (.119)      .452 (.064)
Current smoker            -.178 (.242)      .121 (.184)     -.479 (.168)     -.461 (.151)     -.808 (.286)     -.282 (.123)
Health status
  Fair                    -.155 (.389)      .102 (.347)     -.052 (.300)     -.460 (.258)      .016 (.377)      .318 (.257)
  Good                     .135 (.369)      .603 (.328)      .556 (.281)     -.118 (.243)    -.0631 (.367)      .999 (.244)
  Very good                .303 (.379)     1.190 (.331)     1.040 (.285)      .542 (.246)      .902 (.361)     1.433 (.249)
  Excellent                .323 (.392)     1.108 (.341)     1.204 (.290)      .634 (.251)      .921 (.372)     1.204 (.254)
Intercept                 -.802 (.971)    -1.690 (.658)     -.519 (.487)     1.119 (.432)      .200 (.688)    -1.392 (.407)

Number of observations   6367
Log-likelihood value     -8567.0437
Pseudo R-squared         0.2228

Table 2: Estimation results


                                   Married
                                   Average     (SE)
Union                                .0051     .005
Group                               -.0219     .007
Insurance company from an agent      .0618     .009
Insurance company                    .0249     .011
HMO                                 -.0229     .006
State Exchange                       .0262     .011
Uninsured                           -.0733     .009

Table 3: Estimated average marginal effects

                                   Mean estimated probabilities    Sample ratio in %
Union                              .0365949                         3.66
Group                              .0739752                         7.40
Insurance company from an agent    .1355426                        13.55
Insurance company                  .2016648                        20.17
HMO                                .0389508                         3.90
State Exchange                     .2423433                        24.23
Uninsured                          .2709282                        27.09

Table 4: Estimated probabilities

                                   Hit rate
Overall                            51%
Union                               7%
Group                               6%
Insurance company from an agent     4%
Insurance company                  65%
HMO                                 -
State Exchange                     71%
Uninsured                          80%

Table 5: The percentage correctly predicted outcomes, the hit rates

If excluded:                       Hausman test statistic    df    P > χ2
Union                               -36.10                   80    -
Group                               -22.66                   80    -
Insurance company from an agent    -271.88                   80    -
Insurance company                   234.69                   80    0.000
HMO                                 -48.46                   80    -
State Exchange                       87.73                   80    0.260

Table 6: Hausman specification test results
