
UNIVERSITY OF AMSTERDAM
BSc Econometrics

Stability in the relation between health insurance and health care expenditures

Author: Janelle Zoutkamp, 10441670
Supervisors: dr. Hans van Ophem & drs. Rob van Hemert

Contents

1 Introduction
2 Theory about insurance decisions and medical expenditures
  2.1 Insurance as an endogenous variable
  2.2 Medical expenses estimation
3 Model
  3.1 Insurance coverage prediction
  3.2 Expenditure Estimation
    3.2.1 Two-Part model
    3.2.2 Heckman two-step estimation
    3.2.3 Comparing the two-part model and the Heckman two-step estimation
  3.3 Empirical Specification
4 Data
  4.1 Sample description
  4.2 Summary statistics
5 Empirical results
  5.1 Insurance decision
  5.2 Expenditure equation
  5.3 Unconditional Marginal Effects
  5.4 Stability over time
6 Conclusion
References
Appendix A - Descriptive statistics
Appendix B - Regression output 2010
Appendix C - Regression output 2008
Appendix D - Regression output 2006
Appendix E - Regression output 2004

1 Introduction

In periods of rising income uncertainty, acquiring health insurance becomes less self-evident. During the past decade the share of uninsured in the U.S. has increased steadily (Long et al., 2014). This is problematic because insurance can play an important role in positive lifestyle decisions and in citizens' feeling of security. Moreover, health insurance is related to higher use of preventive care, which results, in the long run, in a longer and healthier life (Cohen, Neumann, & Weinstein, 2008). These concerns partially motivated U.S. policy makers to consider lowering health insurance prices or extending coverage universally. However, estimating the financial effects of such decisions is challenging for several reasons.

First, there is a simultaneous connection between the insurance decision and health care expenditures. On the one hand, there might be an asymmetric information problem. Individuals buying health insurance are likely to be those who anticipate a greater need for health care due to, for example, their higher health risk. This is frequently referred to as adverse selection. In general, insurers do not have full information on the health of potential customers. It is therefore common that insured clients have a higher expense risk profile, which consequently results in higher medical expenses (Arrow, 1963). The other connection between insurance and health care expenditures is due to moral hazard. Individuals, once insured, may become less cautious about unhealthy or risky behavior, which could lead to more health problems that require more health care. In addition, medical insurance lowers the marginal cost of care to the individual, who may then choose to overuse certain medical services. This paper focuses on the second link: how insurance affects medical expenses. However, not taking into account the first link, insurance coverage as a function of health expenditures, might lead to endogeneity bias in estimates of the effect of insurance on health expenditures.

A second challenge that must be accounted for when modeling health care expenditures is the large number of individuals who do not incur any expenditures at all. In economic terms this is often described as a medical utilization choice (Jerant, Fiscella, Tancredi, & Franks, 2013). First, a person decides whether to visit a doctor and so incur positive expenditures. Next, the doctor and patient jointly decide on the treatment and the related costs. Health expenditure data can therefore be regarded as semi-continuous, with a discrete part for zero expenditures and a continuous part for positive expenses. Linear regression on the censored full sample, or on the truncated sample of positive expenditures, would then again lead to biased estimates (Heij et al., 2004).

This thesis examines the effect of insurance on expenditures for health care services. The instrumental variable estimation technique is used to control for the endogeneity bias caused by the simultaneous relation between insurance and expenses. To model the expenditure equation, two models are used that have been extensively applied in previous literature: the two-part model and the Heckman two-step model. Moreover, the marginal effect of insurance on medical expenses is calculated for the last decade to analyze whether this effect is stable over time.

The rest of the paper proceeds as follows. The next section reviews the literature. Section 3 sets up the theoretical model. Section 4 discusses empirical specifications and data, and summarizes the most important statistics. Section 5 presents the empirical results. Concluding remarks and suggestions for further discussion are provided in Section 6.

2 Theory about insurance decisions and medical expenditures

Researchers have addressed the challenges that arise in modelling expenditure data in a variety of study designs. This section first discusses the instrumental variable approach to control for endogeneity. Earlier research on modelling medical expenditures is discussed next.

2.1 Insurance as an endogenous variable

The first challenge, endogeneity, arises from the simultaneous causality between health insurance and medical expenditures. On the one hand, people who have insurance are much more likely to use health care than their uninsured counterparts. On the other hand, people who have a greater demand for health care may have more incentive to obtain insurance coverage (Shen, 2013). This paper therefore uses methods that deal with this endogeneity.

There are several econometric techniques that deal with this challenge. One such technique is the use of instrumental variables. Instruments must satisfy three requirements. First, an instrument must be powerful: it should be sufficiently correlated with the insurance variable. The second requirement is exogeneity, which means that the instrument must be uncorrelated with the error term in the expenditure equation. Third, the instrument should not be directly linked to health care expenditures.

Plausible instruments for an individual's insurance status must first be proposed from an economic point of view. Several job characteristics have previously been used. Variables such as employment status and, for the employed, industry, firm size and union membership are likely to affect insurance coverage because most persons with health insurance at mid-life receive health benefits from their employers (Crystal, Johnson, Harman, Sambamoorthi, & Kumar, 2000). In addition, these characteristics are unlikely to directly affect the amount of health services used. Another variable that would be valid from an economic point of view is whether an employer offers an insurance plan to its employees. This variable is powerful because most employed individuals buy the insurance plan if it is offered by their employer (McDonalds, 2010). Moreover, health expenses are not expected to be directly affected by the employer's insurance offer.

Once economically plausible instruments have been found, there are several methods to implement these variables in the original estimation equation. Among these approaches, the two-stage least squares (2SLS) method is the most common. It is used by Meer and Rosen (2004) to study the effect of health insurance on the utilization of different types of medical services. In the first stage, a linear regression on a set of exogenous variables that includes the instruments is used to estimate the insurance variable. In the second stage, the actual insurance status is replaced by its instrumental variable estimate. Another frequently used approach replaces the first-stage linear regression by a probability model. This method is used by Crystal et al. (2000) to estimate the effect of being insured on out-of-pocket costs. They used a probit model to predict the individual's coverage choice. This prediction then replaced the original coverage status in their expenditure equation.
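The bias that instruments are meant to remove can be illustrated with simulated data. The sketch below is not the estimation procedure of this thesis: it assumes a single binary instrument and no control variables, in which case 2SLS reduces to the ratio cov(z, y) / cov(z, I). All names, the data-generating process and the true effect of 1.0 are assumptions made purely for illustration.

```python
import random


def cov(a, b):
    """Sample covariance of two equally long lists."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (len(a) - 1)


def simulate(n=20000, true_effect=1.0, seed=0):
    """Simulate an endogenous insurance dummy with one binary instrument."""
    rng = random.Random(seed)
    z, ins, y = [], [], []
    for _ in range(n):
        zi = 1.0 if rng.random() < 0.5 else 0.0   # instrument, e.g. plan offered
        need = rng.gauss(0, 1)                    # unobserved health need
        # Insurance depends on the instrument AND on the unobserved need.
        ii = 1.0 if 1.5 * zi - 0.75 + need + rng.gauss(0, 1) > 0 else 0.0
        # The outcome shares the unobserved need with the insurance decision.
        yi = true_effect * ii + need + rng.gauss(0, 1)
        z.append(zi)
        ins.append(ii)
        y.append(yi)
    return z, ins, y


z, ins, y = simulate()
ols = cov(ins, y) / cov(ins, ins)  # biased: insurance correlates with the error
iv = cov(z, y) / cov(z, ins)       # instrumental variable (ratio) estimate
print(f"OLS: {ols:.2f}  IV: {iv:.2f}")
```

In such a simulation the OLS coefficient absorbs the unobserved health need, whereas the instrumental variable estimate stays close to the assumed effect.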

If the instruments satisfy the requirements, the 2SLS method and the probit model are both valid ways to correct the endogeneity bias mentioned earlier. However, attention must be paid to their statistical differences when choosing between them. The error terms in the probit model are assumed to be normally distributed, whereas this assumption is not required for the linear regression used in 2SLS. Another difference lies in their outcomes. The probit model estimates each individual's probability of insurance coverage. In contrast, the predicted insurance variable in the 2SLS model does not necessarily lie between zero and one and cannot be given an economic interpretation. In the present context, the probit's insurance estimate is more informative and is therefore preferred despite being more restrictive.

2.2 Medical expenses estimation

After controlling for the endogeneity problem, the insurance effect can be obtained by modelling the health expenditure equation. The equation must include an array of related variables to isolate the effect of insurance coverage. This is because the simple correlation between insurance status and medical expenditures may reflect advantageous selection effects rather than causal effects of health insurance.

First, the insured are on average older than the uninsured. Age differences may partly explain their difference in health care expenditures because responsibility, which implies taking care of oneself and, for example, going to the doctor when needed, tends to increase with age (Friedman, 1974). A second characteristic, also associated with age, is health. Poor health is more likely among the insured. The unhealthy visit a doctor more frequently and therefore have higher medical costs on average. Other variables that the model should control for include socio-economic characteristics. The insured on average have a higher household income and educational level, which are associated with better transportation options to a medical institution and higher health consciousness respectively (Friedman, 1974). This could similarly result in a higher probability and level of health care demand for the insured.

After controlling for relevant characteristics, it is hypothesized that insurance still has a

positive effect on medical expenditures. Insurance lowers the access threshold to medical care and

certain treatment for individuals as the costs are covered by the insurer.

A comparable line of reasoning, and the resulting selection of control variables, is used in most previous studies on medical expenditures. In these studies, several methods to model health expenditure data have been applied. Modeling such data is challenging as it typically includes a large number of observations for individuals who do not incur any expenses at all. The data can be regarded as semi-continuous, with a discrete part for the zero observations and a continuous part for the positive expenses. Linear regression on the censored full sample, or only on the truncated sample of positive expenditures, would then again lead to biased estimates (Heij et al., 2004).

property is their interpretative characteristic. In economic terms, the two-step approach is often described as a medical utilization choice. First, a person decides whether to visit a doctor and so incur positive expenditures. Next, the doctor and patient jointly decide on the treatment and the related costs (Jerant et al., 2013). This intuition translates into a simple computational form. One equation estimates the probability that a person incurs any positive medical expenses and a second equation estimates the level of positive expenses. The individual's final expected expenditure is obtained by multiplying these two estimates.
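The multiplication of the two parts can be made concrete with a minimal sketch. It assumes the simplest possible case without covariates, so that the first part is the share of positive observations and the second part is the mean of the positive observations. The function name and the expenditure figures are hypothetical.

```python
def two_part_mean(expenditures):
    """Two-part estimate of mean spending without covariates."""
    positive = [y for y in expenditures if y > 0]
    p_positive = len(positive) / len(expenditures)   # part 1: P(y > 0)
    mean_positive = sum(positive) / len(positive)    # part 2: E[y | y > 0]
    return p_positive, mean_positive, p_positive * mean_positive


# Hypothetical annual expenditures in dollars; half of them are zero.
sample = [0, 0, 0, 120, 0, 450, 80, 0, 2300, 50]
p, m, expected = two_part_mean(sample)
print(p, m, expected)
```

Without covariates the product of the two parts reproduces the overall sample mean. The added value of the two-part model comes from letting covariates such as insurance status affect the probability and the level separately.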

In the existing literature, the RAND Health Insurance Experiment is the only randomized health insurance study and serves as the benchmark. This experiment randomly assigned people to four types of insurance with varying degrees of coverage (Manning, Newhouse, Duan, Keeler, & Leibowitz, 1987). One group had free care; the others were responsible for 25%, 50% and 95% of the costs of their care. The two-part model was then used to estimate their differences in expected health care expenditures. A probit model predicted the probability of positive expenditures, and a linear regression was used to estimate the level of positive expenditures. The results showed that insurance coverage, for example free health care, has a positive impact on medical expenditures. In fact, individuals in the group with complete coverage of medical expenditures incurred approximately 60% higher expenses than the group that paid 95% of their medical bill.

Miller, Banthin, and Moeller (2004) study the effect of insurance on medical expenditures using data from the Medical Expenditure Panel Survey (MEPS). As the distribution of inpatient expenditures differs from that of outpatient expenditures, the four-part model was applied. This model is an extension of the two-part model mentioned earlier. The probability of hospital use is estimated among all users, and then the costs for users who were hospitalized and for those who were not are estimated separately. In line with the theoretically expected outcome, they also find that insurance has a positive effect on health expenses. Health expenditures increase by approximately 60% to 90% when an expansion of insurance is simulated. Note that in this study the individual's insurance status is treated as an exogenous variable, which might have caused biased estimates.

The two-part model assumes that there is no correlation between the error terms of the two equations. However, in health economics it could be argued that a high probability of incurring any positive expenditures might be related to a higher expected level of expenditures. Shen (2013) takes this relation into account using the Heckman two-step model. Here again, a positive effect of insurance is obtained. However, Shen (2013) defends the use of an adjusted semi-parametric estimation approach over parametric approaches, arguing that parametric approaches make distributional assumptions that are not substantiated by economic theory. The parametric model estimated that individuals with insurance have 125% higher medical expenses than the uninsured, whereas the semi-parametric approach predicts an increase of 48%, a number closer to that found in the RAND experiment.

To summarize, the existing literature provides various alternatives to estimate the insurance

coverage effect on health care expenditure. Among these approaches the two-part model and the

Heckman two-step model have been extensively used. Both models will therefore be discussed in

the next section.

3 Model

The economic model used in this thesis combines the techniques found in previous literature. First, a probit is used to predict the individual's insurance decision. Next, the two-part model and the Heckman two-step model are explained and compared.

3.1 Insurance coverage prediction

The first model deals with the health insurance choice as an endogenous variable. Let $I_i$ be an indicator of whether an individual selects any kind of health insurance coverage. In the model, an individual selects insurance if the probability $p_{Ii}$ of doing so is greater than 50% [1]. This probability is determined by a set of exogenous variables $z_i$ and is estimated with a binary probit. The choice of being insured can be given an interpretation in terms of an unobserved variable $I_i^*$ that represents the latent preference of individual $i$ for the choice $I_i = 1$:

\[
I_i^* = z_i'\alpha + \sigma_\nu \nu_i, \qquad \nu_i \mid z_i \sim \mathrm{NID}(0, 1). \tag{1}
\]

[1] Further research on the optimal threshold value is recommended. However, for this application the threshold value

The observed choice $I_i$ is related to the index $I_i^*$ by means of the equation

\[
I_i = \begin{cases} 1 & \text{if } I_i^* > 0 \\ 0 & \text{if } I_i^* \le 0 \end{cases} \tag{2}
\]

which is used to estimate the insurance probability:

\[
p_{Ii} = P[I_i = 1] = P[I_i^* = z_i'\alpha + \sigma_\nu \nu_i > 0] = P\left[\nu_i < \frac{z_i'\alpha}{\sigma_\nu}\right] = \Phi\left(\frac{z_i'\alpha}{\sigma_\nu}\right). \tag{3}
\]

Then the predicted insurance status is estimated:

\[
\hat{I}_i = \begin{cases} 1 & \text{if } \hat{p}_{Ii} > 0.5 \\ 0 & \text{if } \hat{p}_{Ii} < 0.5 \end{cases} \tag{4}
\]

It is important to remark that the error terms are assumed to be normally distributed with zero mean. Further, only ratios of the index parameters ($\theta = \alpha / \sigma_\nu$) are identified; they are estimated with the Maximum Likelihood (ML) technique. The scaled parameters can be used to recover probabilities and predict the individual's insurance status. This prediction is assumed to be a valid exogenous substitute for the original insurance status.
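Given estimated scaled parameters, equations (3) and (4) amount to evaluating the standard normal distribution function at a linear index and comparing the result with 0.5. The following sketch shows this step only; it does not estimate the probit. The coefficient values and the choice of regressors (a constant, an employment dummy and an offer dummy) are hypothetical.

```python
import math


def std_normal_cdf(x):
    """Standard normal distribution function, written with the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def insurance_probability(z, theta):
    """Equation (3): probit probability for regressors z and scaled parameters theta."""
    index = sum(z_k * theta_k for z_k, theta_k in zip(z, theta))
    return std_normal_cdf(index)


def predicted_status(p_hat, threshold=0.5):
    """Equation (4): predict insurance when the probability exceeds the threshold.

    A probability exactly equal to the threshold is assigned 0 here,
    which is an assumption; equation (4) leaves that case open.
    """
    return 1 if p_hat > threshold else 0


theta = [-0.4, 0.3, 1.1]   # hypothetical: constant, employed, plan offered
offered = [1, 1, 1]        # employed and offered a plan
not_offered = [1, 1, 0]    # employed but not offered a plan

for z in (offered, not_offered):
    p_hat = insurance_probability(z, theta)
    print(round(p_hat, 3), predicted_status(p_hat))
```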

3.2 Expenditure Estimation

The predicted insurance status enters the expenditure estimation equation as an exogenous variable.

Recall that linear regression provides biased coefficients when estimating health expenditure data

as it is typically semi-continuous. Two alternative models are therefore discussed: the two-part

model and the Heckman two-step equation.

3.2.1 Two-Part model

The two-part model separately estimates the probability and the level of positive expenditures. The first part estimates the individual's probability of incurring any positive expenditures. Denote $y_i$ as the level of medical expenditures and $x_i$ as a set of exogenous variables that affect the probability of incurring positive expenditures. Then, similar to the method used for the insurance prediction, the probability is estimated with a binary probit. The event of incurring any positive expenditures can be given an interpretation in terms of an unobserved variable $y_i^*$:

\[
y_i^* = x_i'\beta_1 + \hat{I}_i\delta_1 + \sigma_\varepsilon \varepsilon_{1i}, \qquad \varepsilon_{1i} \mid x_i \sim \mathrm{NID}(0, 1). \tag{5}
\]

The positive expenditure probability can be written as

\[
p_i = P[y_i > 0] = P\left[\varepsilon_{1i} < \frac{x_i'\beta_1 + \hat{I}_i\delta_1}{\sigma_\varepsilon}\right] = \Phi\left(x_i'\frac{\beta_1}{\sigma_\varepsilon} + \hat{I}_i\frac{\delta_1}{\sigma_\varepsilon}\right) = \Phi(x_i'\gamma_1 + \hat{I}_i\gamma_2) \tag{6}
\]

where $\gamma_1 = \beta_1 / \sigma_\varepsilon$ and $\gamma_2 = \delta_1 / \sigma_\varepsilon$. The second equation is a linear model on the log scale for positive expenses, given that the person is a positive user of medical services,

\[
\log(y_i \mid y_i > 0) = x_i'\beta_2 + \hat{I}_i\delta_2 + \sigma_\varepsilon \varepsilon_{2i}, \qquad E(\varepsilon_{2i} \mid y_i > 0, x_i) = 0. \tag{7}
\]

In this equation, the error terms are assumed to be identically but not necessarily normally distributed. Moreover, in the two-part model the error terms of the two equations ($\varepsilon_{1i}$ and $\varepsilon_{2i}$) are assumed to be independent. This makes it possible to estimate the coefficients of each equation separately (Duan, Manning, Morris, & Newhouse, 1984). The coefficients in the probit equation can be estimated with the ML method; ordinary least squares (OLS) may be used for the second equation [2].

The final two-part expenditure estimate is obtained by combining the previous equations:

\[
\begin{aligned}
E[\log(y_i)] &= P[y_i > 0]\, E[\log(y_i) \mid y_i > 0] \\
&= \Phi(x_i'\hat{\gamma}_1 + I_i\hat{\gamma}_2)\,(x_i'\hat{\beta}_2 + I_i\hat{\delta}_2).
\end{aligned} \tag{8}
\]

Note that the predicted insurance status was only used to obtain consistent coefficients. In the final equation and the subsequent calculations of the marginal effect, the original insurance status $I_i$ is used.

[2] It is assumed that the second equation, the estimation of positive expenditures, satisfies the requirements needed

In this thesis the incremental effect of having insurance on medical expenses is of interest. This effect is

\[
\begin{aligned}
\Delta(\log(y_i) \mid I_i) &= E[\log(y_i) \mid I_i = 1] - E[\log(y_i) \mid I_i = 0] \\
&= \Phi(x_i'\hat{\gamma}_1 + \hat{\gamma}_2)\,(x_i'\hat{\beta}_2 + \hat{\delta}_2) - \Phi(x_i'\hat{\gamma}_1)\,(x_i'\hat{\beta}_2).
\end{aligned} \tag{9}
\]

The insurance coverage effect can be summarized by the mean marginal effect over the full sample of $N$ individuals.
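As a rough sketch of how equation (9) and its sample average could be evaluated once coefficient estimates are available, consider the code below. It takes the estimates as given and does not perform the probit or OLS estimation. All function names and coefficient values are hypothetical.

```python
import math


def Phi(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def dot(a, b):
    """Inner product of two equally long lists."""
    return sum(a_k * b_k for a_k, b_k in zip(a, b))


def two_part_increment(x, gamma1, gamma2, beta2, delta2):
    """Equation (9): expected log expenditure with insurance minus without."""
    index = dot(x, gamma1)   # probit index without the insurance term
    level = dot(x, beta2)    # log-expenditure level without the insurance term
    insured = Phi(index + gamma2) * (level + delta2)
    uninsured = Phi(index) * level
    return insured - uninsured


def mean_increment(rows, gamma1, gamma2, beta2, delta2):
    """Average of equation (9) over all individuals in the sample."""
    effects = [two_part_increment(x, gamma1, gamma2, beta2, delta2) for x in rows]
    return sum(effects) / len(effects)


# Hypothetical estimates; each x holds a constant and age in decades.
gamma1, gamma2 = [-0.5, 0.2], 0.6
beta2, delta2 = [5.0, 0.3], 0.4
rows = [[1.0, 2.5], [1.0, 4.0], [1.0, 6.0]]
print(mean_increment(rows, gamma1, gamma2, beta2, delta2))
```

Because the effect depends on $x_i$ through both factors, it differs across individuals, which is why the sample mean is used as a summary.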

3.2.2 Heckman two-step estimation

The two-part model gives consistent estimators if the separability condition holds: there must be no correlation between the probability of having positive medical expenses and the level of these expenses. However, as described in the literature review, this does not necessarily hold. A high expense probability is usually related to higher expenditures. In the second equation of the two-part model, the error terms come from the truncated sample and consequently do not have zero mean. Therefore the estimation should be corrected with the inverse Mills ratio $\lambda_i$. This term gives an indication of how the probability and the level of expenditures are correlated. The correlation is denoted by $\rho$. Note that normality of the error terms in the second equation is a necessary condition.

\[
\rho\lambda_i = E[\varepsilon_{2i} \mid y_i > 0], \qquad \lambda_i = \frac{\phi(x_i'\gamma_1 + I_i\gamma_2)}{\Phi(x_i'\gamma_1 + I_i\gamma_2)} \tag{10}
\]

The first equation in the Heckman two-step method, which estimates the probability of incurring any health expenditures, corresponds to the probit estimation in the two-part model: $P[y_i > 0] = \Phi(x_i'\gamma_1 + \hat{I}_i\gamma_2)$. Thus, $\gamma$ can be estimated consistently by ML. As a second step, the equation for the level of positive expenses can be written as

\[
E[\log(y_i) \mid y_i > 0] = x_i'\beta_2 + \hat{I}_i\delta_2 + \rho\sigma_\varepsilon\lambda_i. \tag{11}
\]

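The final Heckman two-step expenditure estimate is obtained by combining both steps:

\[
\begin{aligned}
E[\log(y_i)] &= P[y_i > 0]\, E[\log(y_i) \mid y_i > 0] \\
&= \Phi(x_i'\hat{\gamma}_1 + I_i\hat{\gamma}_2)\,(x_i'\hat{\beta}_2 + I_i\hat{\delta}_2 + \hat{\lambda}_i\hat{\rho}\hat{\sigma}_\varepsilon) \\
&= \Phi(x_i'\hat{\gamma}_1 + I_i\hat{\gamma}_2)\,(x_i'\hat{\beta}_2 + I_i\hat{\delta}_2) + \hat{\rho}\hat{\sigma}_\varepsilon\,\phi(x_i'\hat{\gamma}_1 + I_i\hat{\gamma}_2).
\end{aligned} \tag{12}
\]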

And the corrected incremental effect is now given by

\[
\begin{aligned}
\Delta(\log(y_i) \mid I_i) &= E[\log(y_i) \mid I_i = 1] - E[\log(y_i) \mid I_i = 0] \\
&= \Phi(x_i'\hat{\gamma}_1 + \hat{\gamma}_2)\,(x_i'\hat{\beta}_2 + \hat{\delta}_2) - \Phi(x_i'\hat{\gamma}_1)\,(x_i'\hat{\beta}_2) \\
&\quad + \hat{\rho}\hat{\sigma}_\varepsilon\,\phi(x_i'\hat{\gamma}_1 + \hat{\gamma}_2) - \hat{\rho}\hat{\sigma}_\varepsilon\,\phi(x_i'\hat{\gamma}_1).
\end{aligned} \tag{13}
\]

Note that in practice the coefficient of the inverse Mills ratio, $\hat{\rho}\hat{\sigma}_\varepsilon$, is estimated jointly by ML. A test on the significance of the selection bias (which occurs only if $\rho \neq 0$) can be performed by testing whether the coefficient of $\lambda_i$ is significant.
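The role of the inverse Mills ratio in equations (12) and (13) can be sketched as follows. The code takes the probit index, the level and the coefficient of $\lambda_i$ as given numbers and does not carry out the two estimation steps. All names and input values are hypothetical.

```python
import math


def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def Phi(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def inverse_mills(index):
    """Inverse Mills ratio: density divided by distribution function."""
    return phi(index) / Phi(index)


def heckman_expected_log(index, level, rho_sigma):
    """Equation (12): probability times the selection-corrected level."""
    return Phi(index) * (level + rho_sigma * inverse_mills(index))


def heckman_increment(index0, gamma2, level0, delta2, rho_sigma):
    """Equation (13): two-part increment plus the selection correction.

    index0 and level0 are the probit index and the log-expenditure level
    evaluated without the insurance terms.
    """
    two_part = Phi(index0 + gamma2) * (level0 + delta2) - Phi(index0) * level0
    correction = rho_sigma * (phi(index0 + gamma2) - phi(index0))
    return two_part + correction


# Hypothetical values for one individual.
print(inverse_mills(0.0))
print(heckman_increment(0.2, 0.5, 6.0, 0.4, 0.8))
```

Setting `rho_sigma` to zero removes the correction, in which case the increment coincides with the two-part expression in equation (9).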

3.2.3 Comparing the two-part model and the Heckman two-step estimation

The Heckman model uses the inverse Mills ratio, assuming the error terms are normally distributed, to correct for the correlation between the probability of positive expenditures and their level. From an economic point of view, this term is expected to be significant and positive. The inverse Mills ratio, however, acts as an omitted variable in the second equation of the two-part model. This exclusion has several consequences.

Consider the possibility that $\lambda$ correlates with the exogenous explanatory variables. A positive covariance of the omitted variable $\lambda$ with both a covariate and expenditures will cause the two-part estimate for that regressor to be higher than the true coefficient value.

Now suppose that $\lambda_i$ is orthogonal to the explanatory variables ($x_i$, $z_i$). Then the estimated coefficients of both models would be equal, but the marginal effects in the two-part model are underestimated compared to the Heckman method. Subtracting equation (9) from equation (13) gives the difference:

\[
\Delta_{\text{Heckman}} - \Delta_{\text{two-part}} = \hat{\rho}\hat{\sigma}_\varepsilon\left[\phi(x_i'\hat{\gamma}_1 + \hat{\gamma}_2) - \phi(x_i'\hat{\gamma}_1)\right].
\]

This difference increases as the coefficient of $\lambda$ ($\hat{\rho}\hat{\sigma}_\varepsilon$), and thus the correlation, increases in absolute value. For this reason the results estimated with the Heckman model are preferred despite being more restrictive.

3.3 Empirical Specification

The previous section models the incremental effect of insurance on medical expenses with several estimation equations. The second equation in both the two-part model and the Heckman model estimates the level of positive expenditures. Utilization data are usually non-normal, right-skewed and heteroskedastic, with a variance that increases with the mean (Diehr, Yanez, Ash, Hornbrook, & Lin, 1999). These features do not necessarily cause problems. If the data set is large, OLS regression on the untransformed data will provide consistent estimates of the regression parameters. The standard errors, however, will typically be too small and make hypothesis tests appear overly significant. A log transformation is applied to the positive level of expenditures to improve its distribution. In addition, it reduces the influence of outliers and increases the precision of the estimates (Diehr et al., 1999). The final expenditure equation is a combination of a probit and a linear regression. Due to this combination, the traditional standard errors reported by statistical programs are no longer valid. In this thesis the bootstrap technique is used to estimate standard errors, confidence intervals and p-values.
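The bootstrap idea can be sketched in a few lines: individuals are resampled with replacement, the combined statistic is recomputed on every resample, and the spread of the recomputed values serves as the standard error. The sketch below assumes a nonparametric bootstrap with percentile intervals and uses a covariate-free two-part mean as the statistic; the thesis does not specify these details, and the function names and data are hypothetical.

```python
import random
import statistics


def two_part_mean(sample):
    """Share of positive values times the mean of the positive values."""
    positive = [y for y in sample if y > 0]
    if not positive:
        return 0.0
    return (len(positive) / len(sample)) * (sum(positive) / len(positive))


def bootstrap_se(data, statistic, n_boot=1000, seed=0):
    """Bootstrap standard error and 95% percentile interval of a statistic."""
    rng = random.Random(seed)
    n = len(data)
    draws = []
    for _ in range(n_boot):
        # Resample individuals with replacement, keeping the sample size fixed.
        resample = [data[rng.randrange(n)] for _ in range(n)]
        draws.append(statistic(resample))
    draws.sort()
    se = statistics.stdev(draws)
    ci = (draws[int(0.025 * n_boot)], draws[int(0.975 * n_boot) - 1])
    return se, ci


# Hypothetical annual expenditures in dollars.
sample = [0, 0, 0, 120, 0, 450, 80, 0, 2300, 50]
se, ci = bootstrap_se(sample, two_part_mean)
print(round(se, 1), ci)
```

The same resampling loop works for any statistic that combines the two equations, such as a mean marginal effect, because both parts are re-estimated on each resample.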

4 Data

In this section the dataset is described along with the variables used.

4.1 Sample description

This study's empirical analysis uses data from the Medical Expenditure Panel Survey (MEPS) for all even years between 2002 and 2012. The MEPS is a representative survey of the U.S. civilian population that started in 1996. It collects data on demographic characteristics, health insurance coverage and medical expenditures of individual participants, who are drawn from a sample of households. The survey involves five interview rounds, a self-administered questionnaire and data collected from participants and health care providers. Moreover, it has been tested for robustness, bias and predictive accuracy, which makes it very suitable for studying health expenditures (Hill & Miller, 2010).

The sample is limited to adults between the ages of 19 and 64 because individuals of these ages are involved in the choice of a health insurance plan (Holl, Szilagyi, Rodewald, Byrd, & Weitzman, 2012). Children up to the age of 19 are excluded as they participate in their parents' insurance plan, if there is one. Adults aged 65 or above are likewise excluded as most of them are covered by Medicare in the U.S. (Holl et al., 2012). Individuals with missing values on the most relevant exogenous variables are also excluded. The number of observations per year for the period 2002-2012 ranges between 10043 and 18101.

The characteristics of these observations are collected into variables that can be used to study the incremental effect of insurance on medical expenditures. Definitions of the variables can be found in Appendix A. In the MEPS database, health expenditures are the total amount paid for health services during a year, excluding dental services. Insurance coverage as provided by MEPS is a dummy variable indicating whether a person was insured for at least one month during the year, including both private and public insurance. The individual characteristics controlled for are demographic, socioeconomic and health related. The demographics are age, gender, ethnicity (white, nonwhite), marital status (married, other), family size, and region (North-East, Midwest, South, West). The socioeconomic variables are income and education. Education is measured in three categories indicating the highest grade achieved: college or above, high school or an equivalent degree, and less than high school (the default). Income is given by the logarithm of the total salary income in U.S. dollars that a person received during the year. The health-related characteristics are self-reported health status, mental health and suffering from priority diseases. Self-reported health status is included in three categories indicating whether a person feels very healthy (survey score >3 on a five-point scale), not healthy (survey score <3 on a five-point scale) or average (default). Poor mental health indicates whether a person's perceived mental health is surveyed as poor or fair. The priority diseases are defined as such by MEPS and include coronary heart disease, heart attack or myocardial infarction, any heart disease condition, stroke or transient ischemic attack, emphysema, high cholesterol, chronic pain or swelling, arthritis, asthma, diabetes and cancer.

Recalling the endogeneity bias caused by the simultaneous relation between insurance coverage and health expenses discussed in the previous section, additional variables are used as instruments to predict an individual's insurance status. Two dummies indicate whether an individual is or has been employed during the current year and whether the employer offers an insurance coverage plan to its employees.

4.2 Summary statistics

Summary statistics of the sample by insurance status for 2012 are reported in Table 1. The first column describes the statistics for the full sample. The second and third columns contain statistics for the insured and uninsured respectively. Out of the 10043 individuals in the data set, 1406 (14%) are uninsured and 2109 (21%) had zero expenditures during 2012. The mean expenses of the insured are twice as high as those of the uninsured. Note that nearly 20% of the insured had no medical expenditures at all, in contrast to almost 50% of the uninsured.

Recall that these differences in health expenditure may reflect advantageous selection effects rather than causal effects of health insurance. The insured are on average approximately five years older than the uninsured. Perceived health and mental health do not differ much between the two groups. However, the insured are less healthy, as 55% of them suffer from a priority disease, in contrast to 40% of the uninsured. Moreover, the insured have a favorable socioeconomic position. Their average yearly income is $32429, which is twice as high as the income of the uninsured. On average 32% of the insured have a college degree, in contrast to 12% of the uninsured.

Similarly, differences appear when analyzing the statistics of the instrumental variables. The employment rate of both groups is almost equal (approximately 70%). However, 30% of the employers of the uninsured offer an insurance coverage plan, in contrast to 61% of the employers of the insured. This might indicate a consumer's preference for any insurance coverage. Both variables are regarded as economically suitable instruments for the insurance decision.

Table 1. Summary statistics of the total sample and by health insurance status

                          Full sample (n=10043)    The insured (n=8637)     The uninsured (n=1406)
                          N      Mean (S.E.)       N      Mean (S.E.)       N      Mean (S.E.)
Insured                          0.84 (0.47)
Expenditures
  Total                   10043  3180 (9828)       6804   4121 (11274)      3239   1203 (5170)
  Positive expenditures   8458   4293 (11207)      5730   4894 (12131)      1710   2279 (6942)
  Zero expenditures       1585   0.16 (0.36)       1074   0.16 (0.36)       1529   0.47 (0.5)
Demographics
  Age                            40.72 (13.17)            42.09 (13.22)            37.85 (12.6)
  Male                    4659   0.46 (0.5)        3025   0.44 (0.5)        3432   0.50 (0.5)
  White                   6799   0.68 (0.47)       4518   0.66 (0.47)       4792   0.70 (0.46)
  Married                 5036   0.50 (0.5)        3706   0.54 (0.5)        2794   0.41 (0.49)
  Family size                    3.25 (1.72)              3.11 (1.57)              3.55 (1.97)
Health
  Excellent health        1905   0.19 (0.39)       1273   0.19 (0.39)       1328   0.20 (0.4)
  Poor health             1601   0.16 (0.37)       992    0.15 (0.35)       1279   0.19 (0.39)
  Poor mental health      1355   0.13 (0.34)       918    0.13 (0.34)       918    0.13 (0.34)
  Priority health         4986   0.50 (0.5)        3734   0.55 (0.5)        2630   0.39 (0.49)
  Smoke                   1895   0.19 (0.39)       1182   0.17 (0.38)       1498   0.22 (0.41)
Socioeconomic status
  Income                         27174 (32988)            32469 (36341)            16053 (20435)
  High school education   5109   0.51 (0.5)        3437   0.51 (0.5)        3512   0.52 (0.5)
  College education       2536   0.25 (0.43)       2156   0.32 (0.47)       798    0.12 (0.32)
Region
  Midwest                 1735   0.17 (0.38)       1284   0.19 (0.38)       947    0.14 (0.35)
  South                   3739   0.37 (0.48)       2353   0.35 (0.48)       2911   0.43 (0.49)
  West                    2911   0.29 (0.45)       1910   0.28 (0.45)       2103   0.31 (0.46)
Instruments
  Insurance offered       5121   0.51 (0.5)        4184   0.61 (0.49)       1968   0.29 (0.45)
  Employed                7601   0.76 (0.43)       5197   0.76 (0.42)       5050   0.74 (0.44)


5 Empirical results

This section reports the estimation outcomes for the insurance choice and medical costs in 2012.⁶ Then the marginal effects for all even years between 2002 and 2012 are reported and compared.

5.1 Insurance decision

As discussed, the individual's insurance coverage decision is assumed to be endogenous because of simultaneous causality. Whether this line of reasoning holds statistically can be checked with the Durbin-Wu-Hausman (DWH) test. Outcomes for this test are shown in Table 2. Under the null hypothesis that all regressors are exogenous, the test statistic $nR^2$ asymptotically has the $\chi^2(1)$ distribution. As expected, the test significantly rejects the null hypothesis of exogeneity (p-value=0.003).
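The regression-based form of this test can be sketched as follows. This is a minimal illustration on simulated data, not the estimation code used in this thesis: the helper names (`ols`, `dwh_nr2`), the simulated variables and the continuous stand-in for the insurance dummy are all assumptions made for the example. The sketch follows the three regressions reported in Table 2: a reduced form for the suspect regressor, the outcome equation, and an auxiliary regression of the outcome residuals on all regressors plus the reduced-form residuals, from which $nR^2$ is computed.

```python
import math
import numpy as np


def ols(X, y):
    """Return OLS coefficients and residuals of y on the columns of X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta, y - X @ beta


def dwh_nr2(y, d, W, Z):
    """n*R^2 Durbin-Wu-Hausman statistic for one suspect regressor d.

    y: outcome, d: possibly endogenous regressor,
    W: exogenous controls (including a constant), Z: excluded instruments.
    """
    # (1) Reduced form: d on instruments and controls; keep the residuals.
    _, v_hat = ols(np.column_stack([Z, W]), d)
    # (2) Outcome equation estimated by OLS, treating d as exogenous.
    X = np.column_stack([d, W])
    _, e = ols(X, y)
    # (3) Auxiliary regression of e on X and the reduced-form residuals.
    _, r = ols(np.column_stack([X, v_hat]), e)
    centered = e - e.mean()
    r2 = 1.0 - (r @ r) / (centered @ centered)
    stat = max(len(y) * r2, 0.0)
    # Survival function of chi-square(1): P(chi2_1 > s) = erfc(sqrt(s / 2)).
    return stat, math.erfc(math.sqrt(stat / 2.0))


# Simulated data in which d is endogenous by construction (hypothetical values).
rng = np.random.default_rng(0)
n = 5000
W = np.column_stack([np.ones(n), rng.normal(size=n)])
Z = rng.normal(size=(n, 2))
u = rng.normal(size=n)  # outcome error that also enters d
d = Z @ np.array([0.8, -0.5]) + 0.5 * W[:, 1] + 0.7 * u + rng.normal(size=n)
y = 1.0 + 2.0 * d + 0.3 * W[:, 1] + u

stat, p_value = dwh_nr2(y, d, W, Z)
print(f"DWH statistic = {stat:.2f}, p-value = {p_value:.4f}")
```

Because the simulated regressor shares the error term with the outcome, the statistic should be far in the right tail of the $\chi^2(1)$ distribution, which mirrors the rejection of exogeneity reported above.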

Employment and whether the individual's employer offers an insurance plan are economically plausible instruments. For statistical validity they must be correlated with the insurance variable, they should not directly affect the level of expenditures, and they must not correlate with the error term in the expenditure equation. Following the example of previous studies (e.g., Meer and Rosen, 2004), this thesis relies on the economic argument mentioned earlier for the first two requirements: relevance and no direct effect on medical expenditures. To check whether the exogeneity assumption is satisfied, the Sargan test for over-identification is used. Outcomes for this test are shown in Table 3. The p-value for this test is 0.3163, so the exogeneity of the employment and insurance offer dummies is not rejected.⁷

⁶ The outputs for other years are bundled in Appendix B-F.

⁷ Note that other methods to test for endogeneity might be preferred because of the semi-continuous character of medical expenditure data. However, it is assumed that the DWH and Sargan tests performed give enough evidence to confirm endogeneity and to accept or reject the proposed instrumental variables.
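The over-identification check can be illustrated with a short sketch. It is an assumption-laden example rather than the code behind Table 3: the function names, the simulated data and the restriction to exactly two instruments for one endogenous regressor (so that the statistic has one degree of freedom) are choices made for the illustration. The IV residuals are regressed on all instruments and controls, and $nR^2$ is compared with the $\chi^2(1)$ distribution.

```python
import math
import numpy as np


def ols(X, y):
    """Return OLS coefficients and residuals of y on the columns of X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta, y - X @ beta


def sargan_nr2(y, d, W, Z):
    """n*R^2 Sargan statistic for one endogenous regressor, two instruments."""
    if Z.shape[1] != 2:
        raise ValueError("this sketch covers exactly two instruments (df = 1)")
    ZW = np.column_stack([Z, W])
    # First stage: fitted values of d from instruments and controls.
    gamma, _ = ols(ZW, d)
    d_hat = ZW @ gamma
    # Second stage: 2SLS coefficients from regressing y on d_hat and W.
    beta, _ = ols(np.column_stack([d_hat, W]), y)
    # IV residuals use the observed d, not the fitted d_hat.
    e_iv = y - np.column_stack([d, W]) @ beta
    # Auxiliary regression of the IV residuals on instruments and controls.
    _, r = ols(ZW, e_iv)
    centered = e_iv - e_iv.mean()
    r2 = 1.0 - (r @ r) / (centered @ centered)
    stat = max(len(y) * r2, 0.0)
    # Survival function of chi-square(1): P(chi2_1 > s) = erfc(sqrt(s / 2)).
    return stat, math.erfc(math.sqrt(stat / 2.0))


# Simulated data (hypothetical values): one case with valid instruments and
# one in which the second instrument affects the outcome directly.
rng = np.random.default_rng(1)
n = 5000
W = np.column_stack([np.ones(n), rng.normal(size=n)])
Z = rng.normal(size=(n, 2))
u = rng.normal(size=n)
d = Z @ np.array([0.8, -0.5]) + 0.5 * W[:, 1] + 0.7 * u + rng.normal(size=n)
y_valid = 1.0 + 2.0 * d + 0.3 * W[:, 1] + u
y_invalid = y_valid + 1.0 * Z[:, 1]

stat_valid, p_valid = sargan_nr2(y_valid, d, W, Z)
stat_invalid, p_invalid = sargan_nr2(y_invalid, d, W, Z)
print(f"valid instruments:   stat = {stat_valid:.2f}, p = {p_valid:.3f}")
print(f"invalid instrument:  stat = {stat_invalid:.2f}, p = {p_invalid:.3f}")
```

A large p-value, as in Table 3, means the test finds no evidence against the instruments; it does not prove their validity, since the test has no power when all instruments are invalid in the same direction.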


Table 2. Durbin-Wu-Hausman test 2012

(1) type: linear regression; dependent variable: Insured
(2) type: -; dependent variable: -
(3) type: linear regression; dependent variable: Residuals Regression (2)

                          (1)                     (2)                     (3)
                          estimate (s.d.) p       estimate (s.d.) p       estimate (s.d.) p
Insurance offered          0.336 (0.01) 0.000
Employment                -0.282 (0.02) 0.000
Insured                                            2.060 (0.06) 0.000      0.714 (0.21) 0.001
Age                       -0.011 (0) 0.000        -0.027 (0.02) 0.086      0.007 (0.02) 0.673
Age2                       0.000 (0) 0.000         0.001 (0) 0.003         0.000 (0) 0.606
Male                      -0.045 (0.01) 0.000     -1.171 (0.06) 0.000      0.039 (0.06) 0.500
White                     -0.032 (0.01) 0.000      0.433 (0.06) 0.000      0.032 (0.06) 0.603
Married                    0.088 (0.01) 0.000      0.329 (0.06) 0.000     -0.076 (0.07) 0.263
Family size               -0.012 (0) 0.000        -0.213 (0.02) 0.000      0.014 (0.02) 0.449
Health Excellent           0.010 (0.01) 0.361     -0.421 (0.07) 0.000     -0.008 (0.07) 0.917
Health Poor               -0.044 (0.01) 0.001      0.797 (0.09) 0.000      0.034 (0.09) 0.696
Mental health Poor         0.040 (0.01) 0.004      0.754 (0.09) 0.000     -0.030 (0.09) 0.747
Health priority            0.084 (0.01) 0.000      1.498 (0.06) 0.000     -0.071 (0.07) 0.286
Smoke                     -0.027 (0.01) 0.016     -0.088 (0.07) 0.237      0.021 (0.07) 0.780
Education HS               0.105 (0.01) 0.000      0.516 (0.07) 0.000     -0.099 (0.08) 0.202
Education College          0.208 (0.01) 0.000      1.009 (0.09) 0.000     -0.194 (0.1) 0.060
LogWage                    0.008 (0) 0.000        -0.023 (0.01) 0.001     -0.003 (0.01) 0.641
Region MW                 -0.052 (0.01) 0.000      0.350 (0.1) 0.000       0.026 (0.1) 0.791
Region S                  -0.129 (0.01) 0.000     -0.032 (0.08) 0.695      0.090 (0.09) 0.295
Region W                  -0.077 (0.01) 0.000     -0.148 (0.09) 0.087      0.060 (0.09) 0.493
Residual Regression (1)                                                   -0.789 (0.22) 0.000
Constant term              0.788 (0.05) 0.000      3.428 (0.32) 0.000     -0.524 (0.35) 0.135
Observations               10043                   10043                   10043
R2                         0.206                   0.333                   0.001

Test outcome: test statistic 13.245, p value 0.000


Table 3. Sargan test 2012

(1) type: linear regression; dependent variable: LogExpenditures
(2) type: linear regression; dependent variable: Residuals Regression (1)

                          (1)                     (2)
                          estimate (s.d.) p       estimate (s.d.) p
Insured IV                 2.774 (0.22) 0.000
Insurance offered                                 -0.020 (0.08) 0.785
Employment                                        -0.146 (0.16) 0.353
Age                       -0.020 (0.02) 0.218      0.000 (0.02) 0.999
Age2                       0.000 (0) 0.022         0.000 (0) 0.984
Male                      -1.132 (0.06) 0.000     -0.002 (0.06) 0.972
White                      0.465 (0.06) 0.000      0.000 (0.06) 0.994
Married                    0.254 (0.07) 0.000      0.000 (0.07) 0.998
Family size               -0.199 (0.02) 0.000      0.000 (0.02) 0.991
Health Excellent          -0.429 (0.08) 0.000      0.000 (0.08) 0.998
Health Poor                0.831 (0.09) 0.000     -0.004 (0.09) 0.963
Mental health Poor         0.724 (0.1) 0.000      -0.005 (0.1) 0.957
Health priority            1.426 (0.07) 0.000      0.001 (0.07) 0.991
Smoke                     -0.067 (0.08) 0.388      0.002 (0.08) 0.981
Education HS               0.418 (0.08) 0.000      0.002 (0.08) 0.976
Education College          0.815 (0.11) 0.000     -0.001 (0.09) 0.995
LogWage                   -0.026 (0.01) 0.000      0.014 (0.02) 0.373
Region MW                  0.376 (0.1) 0.000       0.004 (0.1) 0.969
Region S                   0.058 (0.09) 0.520      0.002 (0.09) 0.985
Region W                  -0.088 (0.09) 0.340      0.001 (0.09) 0.992
Constant term              2.904 (0.37) 0.000
Observations               10043                   10043
R2                         0.275                   0.000

Test outcome: test statistic 1.018, p value 0.313

As the proposed instruments are not rejected, a probit estimation can be performed to predict each individual's insurance coverage status. The purpose of this estimation is merely to correct for the endogeneity bias. However, the results provide information about the individual's decision process and are therefore worth commenting on.

The results for this estimation are shown in Table 4. The highest marginal effect on the probability of having insurance comes from the insurance offer variable, with a p-value below 0.01. An employer offering insurance to its employees increases the probability of having insurance by more than 30 percentage points. Education also has a significant effect on insurance coverage. People with a high school or college degree on average have a 10 and 21 percentage point higher chance of being insured, respectively, compared to individuals with no high school degree (coefficient p-value<0.01). Poor mental health and suffering from a priority disease have a significant positive impact on insurance coverage. Other significant variables with a positive effect are age, wage and marital status.

For most of the coefficients, the marginal effects on the probability of having insurance are as expected from economic theory, so the model should be suitable for generating the insurance prediction. The predicted share of insured individuals is 87%, which is only slightly higher than the share in the original sample (84%). Moreover, the share of observations classified correctly is approximately 80%.
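How the predicted insurance share and the hit ratio are obtained from fitted probabilities can be shown with a small sketch. The probabilities and outcomes below are hypothetical, and the 0.5 cutoff is an assumption: the text does not state which cutoff was used to turn probit probabilities into a 0/1 prediction.

```python
def classify(probabilities, threshold=0.5):
    """Turn predicted probabilities into 0/1 predictions (cutoff is assumed)."""
    return [1 if p >= threshold else 0 for p in probabilities]


def hit_ratio(predicted, observed):
    """Share of observations whose predicted status equals the observed one."""
    hits = sum(1 for p, o in zip(predicted, observed) if p == o)
    return hits / len(observed)


# Hypothetical fitted probabilities and observed insurance status.
probabilities = [0.91, 0.78, 0.42, 0.66, 0.35]
observed = [1, 1, 1, 0, 0]

predicted = classify(probabilities)
share_predicted_insured = sum(predicted) / len(predicted)
ratio = hit_ratio(predicted, observed)
print(f"predicted insured share = {share_predicted_insured:.2f}")
print(f"hit ratio = {ratio:.2f}")
```

Comparing the two numbers, as done above for the 87% predicted share and the 80% hit ratio, separates how many individuals are predicted to be insured from how many are predicted correctly.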

Table 4. Insurance prediction

type: probit regression
dependent variable: Insured

                          estimate  s.d.     p-value   ME
Insurance offered          0.995    (0.03)   0.000      0.926
Employment                -0.820    (0.07)   0.000     -0.961
Age                       -0.042    (0.01)   0.000     -0.057
Age2                       0.001    (0)      0.000      0.000
Male                      -0.142    (0.03)   0.000     -0.199
White                     -0.095    (0.03)   0.002     -0.156
Married                    0.290    (0.03)   0.000      0.227
Family size               -0.035    (0.01)   0.000     -0.053
Health Excellent           0.030    (0.04)   0.426     -0.044
Health Poor               -0.150    (0.04)   0.000     -0.234
Mental health Poor         0.110    (0.05)   0.015      0.021
Health priority            0.275    (0.03)   0.000      0.211
Smoke                     -0.095    (0.04)   0.010     -0.167
Education HS               0.300    (0.03)   0.000      0.232
Education College          0.715    (0.05)   0.000      0.627
LogWage                    0.027    (0.01)   0.000      0.013
Region MW                 -0.188    (0.05)   0.000     -0.288
Region S                  -0.441    (0.04)   0.000     -0.526
Region W                  -0.272    (0.05)   0.000     -0.360
Constant term              1.006    (0.16)   0.000      0.694
Observations               10043
Pseudo R2                  0.175
hitratio                   0.80


5.2 Expenditure equation

In the previous section the individual's insurance status has been predicted. Recall that this step is required to address the endogeneity problem arising from the simultaneous causality between health insurance and expenditures. Consequently, we assume that substituting the original insurance variable with its prediction in the following equations deals with the endogeneity issue. The expenditure equations are estimated with the two-part model and the Heckman two-step model.

First stage: Positive expenditure probability

Table 5 shows the estimates for the individual probability of having any positive medical expenditures. One of the most important questions here is how this probability is affected by insurance status. The average marginal effect of insurance coverage on this probability is 9.5 percentage points and significant (p-value<0.05). This means that if everyone in the sample were moved from uninsured to insured, the probability of having any positive medical expenses would increase by 9.5 percentage points on average. The health variables also have a significant effect. People with poor mental health and poor perceived health have a 6 to 7 percentage point higher probability of accessing health care than individuals of average health. Moreover, suffering from a high priority disease increases the expenditure probability by approximately 18 percentage points. An interesting finding is that individuals that feel very healthy are not less likely to have any positive expenditures than individuals of average health. As expected from an economic point of view, high school and college education both have a positive effect. Another interesting finding is that income does not have a significant impact (p-value>0.05) on the probability of positive health expenditures once the insurance decision is taken into account. Being male and family size have a significant negative effect, while smoking is not significant.
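The thought experiment of moving everyone from uninsured to insured corresponds to an average discrete effect in the probit model. The sketch below shows that calculation under stated assumptions: the index values are hypothetical, the function names are illustrative, and only the insurance coefficient (0.305) is taken from Table 5. It is not the routine that produced the reported 9.5 percentage points.

```python
import math


def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def average_discrete_effect(index_other, coef_dummy):
    """Average change in P(y > 0) when a dummy switches from 0 to 1.

    index_other: each individual's probit index x'b without the dummy's term.
    """
    changes = [norm_cdf(v + coef_dummy) - norm_cdf(v) for v in index_other]
    return sum(changes) / len(changes)


# Hypothetical index values; 0.305 is the insurance coefficient in Table 5.
index_other = [0.2, 0.6, 1.0, 1.4]
effect = average_discrete_effect(index_other, 0.305)
print(f"average effect on P(expenditures > 0): {effect:.3f}")
```

The effect differs per individual because the normal distribution function is nonlinear: the same coefficient moves the probability more for someone with an index near zero than for someone who already has a high probability of positive expenditures.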


Table 5. Probability positive expenditures

type: probit regression
dependent variable: Expenditures

                          estimate  s.d.     p-value   ME
Insured prediction         0.305    (0.04)   0.000      0.095
Age                       -0.019    (0.01)   0.026     -0.006
Age2                       0.000    (0)      0.003      0.000
Male                      -0.550    (0.03)   0.000     -0.162
White                      0.142    (0.03)   0.000      0.042
Married                    0.194    (0.04)   0.000      0.057
Family size               -0.095    (0.01)   0.000     -0.028
Health Excellent          -0.122    (0.04)   0.001     -0.037
Health Poor                0.223    (0.05)   0.000      0.061
Mental health Poor         0.279    (0.05)   0.000      0.075
Health priority            0.624    (0.03)   0.000      0.181
Smoke                     -0.052    (0.04)   0.187     -0.015
Education HS               0.208    (0.04)   0.000      0.061
Education College          0.510    (0.05)   0.000      0.134
LogWage                    0.003    (0)      0.464      0.001
Region MW                  0.230    (0.05)   0.000      0.063
Region S                  -0.028    (0.04)   0.527     -0.008
Region W                  -0.036    (0.05)   0.438     -0.010
Constant term              0.498    (0.17)   0.003      0.000
Observations               10043
Pseudo R2                  0.181
hitratio                   0.80

Second stage: Estimating positive expenditures

Table 6 reports the estimated medical utilization equations, conditional on expenditures being positive. The first part shows the estimates for the two-part model, the second part those for the Heckman two-step model.

First note that the coefficients in the level equation for positive expenditures have, for most of the variables, the same sign as in the previous stage. Moreover, insurance coverage increases the level of expenditures by about 20% in the two-part model and 35% in the Heckman model. It is remarkable that income has a significant but negative impact on the amount of medical expenditures, contrary to what economic theory suggests. The impact of this finding, which is similar to that found by Shen (2013), is relatively small because a log-transformation has been applied to the raw wage data. Health variables have the biggest impact. Poor mental health, poor perceived health and suffering from a priority disease all increase the expected positive expenses by approximately 50% in both the two-part and the Heckman model.

Moreover, the outcomes show that the inverse Mills ratio has a significant and positive effect. This indicates that there is indeed a positive correlation between the probability of incurring positive expenses and their level. According to the theory discussed, the two-part model's estimates are biased if λ correlates with the covariates and expenditures. Moreover, the marginal effects discussed in the next section are underestimated.
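For reference, the inverse Mills ratio that enters the Heckman second stage is the standard normal density divided by the standard normal distribution function, evaluated at the first-stage probit index. The sketch below computes it for a few hypothetical index values; the function names are illustrative and the code is not taken from the estimation routine used for Table 6.

```python
import math


def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def inverse_mills(z):
    """Inverse Mills ratio phi(z) / Phi(z) for a first-stage probit index z."""
    return norm_pdf(z) / norm_cdf(z)


# Hypothetical probit index values for individuals with positive expenditures.
for z in (-1.0, 0.0, 1.0, 2.0):
    print(f"index {z:+.1f} -> lambda {inverse_mills(z):.3f}")
```

The ratio falls as the index rises, so individuals who were unlikely to have positive expenditures receive the largest correction in the level equation. This plain formula loses precision for very negative index values, where the distribution function is close to zero.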

Table 6. Level of positive expenditures

type: linear regression
dependent variable: LogExpenditures

                          (1)                     (2)
                          estimate (s.d.) p       estimate (s.d.) p
Insured prediction         0.201 (0.06) 0.000      0.351 (0.08) 0.000
Age                       -0.038 (0.01) 0.000     -0.038 (0.01) 0.000
Age2                       0.001 (0) 0.000         0.001 (0) 0.000
Male                      -0.334 (0.04) 0.000     -0.530 (0.08) 0.000
White                      0.208 (0.04) 0.000      0.256 (0.05) 0.000
Married                    0.091 (0.04) 0.037      0.165 (0.05) 0.001
Family size               -0.072 (0.01) 0.000     -0.108 (0.02) 0.000
Health Excellent          -0.247 (0.05) 0.000     -0.299 (0.06) 0.000
Health Poor                0.467 (0.06) 0.000      0.542 (0.06) 0.000
Mental health Poor         0.453 (0.06) 0.000      0.533 (0.07) 0.000
Health priority            0.578 (0.04) 0.000      0.811 (0.09) 0.000
Smoke                     -0.042 (0.05) 0.405     -0.058 (0.05) 0.263
Education HS               0.336 (0.05) 0.000      0.415 (0.06) 0.000
Education College          0.571 (0.06) 0.000      0.744 (0.09) 0.000
LogWage                   -0.023 (0) 0.000        -0.021 (0) 0.000
Region MW                 -0.075 (0.06) 0.230      0.000 (0.07) 0.999
Region S                  -0.169 (0.06) 0.002     -0.176 (0.06) 0.002
Region W                  -0.243 (0.06) 0.000     -0.253 (0.06) 0.000
Constant term              6.955 (0.22) 0.000      6.316 (0.31) 0.000
Lambda                                             0.915 (0) 0.000
Observations               10043                   10043

5.3 Unconditional Marginal Effects

The unconditional marginal effects for both models are shown in Table 7. The two-part model approach estimates the marginal impact of insurance on expenditures to be 67% on average. The Heckman estimate is slightly higher, namely 76%. The difference between the marginal effects in the two models is as expected, because the inverse Mills ratio is positive and significantly different from zero. The two-part model, which has been used extensively in the existing literature, therefore consistently underestimates the coefficients of the explanatory variables. The estimates provided by the Heckman model depend on the (joint) normality assumption. If this assumption is incorrectly imposed, the resulting estimator is typically inconsistent.
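The logic of an unconditional marginal effect in a two-part setting can be sketched with the product rule on E[y] = P(y > 0) · E[y | y > 0]. The sketch below is an assumption about the general form, not the exact formula of the modeling section, and it is not expected to reproduce Table 7: the function name is illustrative, and the mean of log expenditures among spenders is a hypothetical placeholder, while the other inputs are read from Tables 1, 5 and 6.

```python
def unconditional_effect(p_positive, dp_dx, mean_level, beta_level):
    """Product-rule effect of x on E[y] = P(y > 0) * E[y | y > 0].

    p_positive: P(y > 0); dp_dx: marginal effect of x on that probability;
    mean_level: E[y | y > 0]; beta_level: marginal effect on E[y | y > 0].
    """
    participation_part = dp_dx * mean_level
    level_part = p_positive * beta_level
    return participation_part + level_part


effect = unconditional_effect(
    p_positive=8458 / 10043,  # share with positive expenditures (Table 1)
    dp_dx=0.095,              # first-stage marginal effect (Table 5)
    mean_level=6.0,           # HYPOTHETICAL mean log expenditure if positive
    beta_level=0.201,         # two-part level coefficient (Table 6)
)
print(f"unconditional effect: {effect:.3f}")
```

The decomposition makes visible that insurance works through two channels: it raises the chance of incurring any expenditure and it raises the amount spent once expenditure is positive. In the Heckman model the conditional mean additionally contains the inverse Mills ratio term, which this sketch leaves out.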

Another way to assess the validity of the obtained results is to compare them with outcomes from the research discussed earlier. Recall that earlier experimental research estimated the effect at approximately 60% (Manning et al., 1987), parametric research found an effect of approximately 80%, and semi-parametric research about 50%. The estimates derived in this paper are similar to those of the studies mentioned. However, caution is required, as earlier research is based on different data sets, models and strategies. For example, Shen uses MEPS data for 2015, limits its sample to working individuals and only estimates the effect on expenditure for private insurance coverage. The above results are based on a sample that includes both people with outpatient use and people with inpatient use. The RAND experiment study notes that the distribution of medical expenditures differs for these two groups (Manning et al., 1987). To address this issue the model has been re-estimated for individuals with only outpatient use, and the results are similar. Detailed results are available on request.


Table 7. Unconditional marginal effect on expenditures (2012)

                          (1)                 (2)
                          ME (s.d. ME)        ME (s.d. ME)
Insured                    0.689 (0.08)        0.772 (0.09)
Age                       -0.062 (0.02)       -0.065 (0.02)
Age2                       0.001 (0)           0.001 (0)
Male                      -1.220 (0.06)       -1.349 (0.07)
White                      0.407 (0.06)        0.448 (0.07)
Married                    0.410 (0.07)        0.456 (0.08)
Family size               -0.222 (0.02)       -0.246 (0.02)
Health Excellent          -0.400 (0.08)       -0.439 (0.09)
Health Poor                0.742 (0.1)         0.812 (0.11)
Mental health Poor         0.830 (0.1)         0.902 (0.11)
Health priority            1.533 (0.06)        1.693 (0.07)
Smoke                     -0.123 (0.08)       -0.140 (0.09)
Education HS               0.619 (0.08)        0.678 (0.09)
Education College          1.327 (0.1)         1.457 (0.11)
LogWage                   -0.012 (0.01)       -0.012 (0.01)
Region MW                  0.350 (0.11)        0.399 (0.12)
Region S                  -0.176 (0.09)       -0.186 (0.1)
Region W                  -0.245 (0.09)       -0.258 (0.1)

5.4 Stability over time

Another way to interpret the results is to examine the insurance effect on health expenditures over time. The method applied in this paper is applied in the same way to all even years between 2002 and 2012. First the individual's insurance status is estimated. Then this prediction replaces the original insurance status in the expenditure equations. Descriptive statistics and outcomes for each estimation equation are displayed in Appendix B-F. Table 8 shows the final result, the marginal effect of insurance status on medical expenditures for each analyzed year. The effect does not seem to differ extremely within the last decade. The lowest effect is found in 2002 and 2008, where being insured affects health care expenditure on average by 57% and 59%, estimated with the two-part model and the Heckman model respectively. The highest effect is found in 2004 and 2012, where the effect is approximately 67% and 73%. Note that these differences must be compared with each year's specific descriptive statistics and regression outcomes. In addition, the hypothesis that employment and insurance offered are exogenous, and therefore valid instruments for the insurance status, is rejected in 2008 and 2010. Estimates for these years are therefore biased.


Table 8: Marginal effect insurance on expenditures over time

          (1)                   (2)
          ME TPM (s.d. ME)      ME Heckman (s.d. ME)
2002      0.617 (0.07)          0.638 (0.08)
2004      0.727 (0.07)          0.742 (0.08)
2006      0.729 (0.07)          0.764 (0.07)
2008      0.621 (0.07)          0.633 (0.08)
2010      0.509 (0.07)          0.528 (0.08)
2012      0.689 (0.08)          0.772 (0.09)

Figure 1: Graph marginal effect insurance on expenditure over time
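One simple way to put a number on "stability" is to compare two yearly estimates with a z statistic. The sketch below does this for the highest and lowest two-part estimates in Table 8. It is an illustration only and rests on assumptions that are not made in the text: the yearly estimates are treated as independent and approximately normal (MEPS panels overlap across years, so independence is an approximation), and picking the extreme years after seeing the data overstates the evidence against stability.

```python
import math

# Two-part model estimates and standard deviations from Table 8.
tpm = {
    2002: (0.617, 0.07), 2004: (0.727, 0.07), 2006: (0.729, 0.07),
    2008: (0.621, 0.07), 2010: (0.509, 0.07), 2012: (0.689, 0.08),
}


def z_difference(first, second):
    """z statistic and two-sided p-value for the gap between two estimates.

    Assumes the two estimates are independent and approximately normal.
    """
    (est_a, sd_a), (est_b, sd_b) = first, second
    z = (est_a - est_b) / math.sqrt(sd_a ** 2 + sd_b ** 2)
    return z, math.erfc(abs(z) / math.sqrt(2.0))


year_high = max(tpm, key=lambda year: tpm[year][0])
year_low = min(tpm, key=lambda year: tpm[year][0])
z, p_value = z_difference(tpm[year_high], tpm[year_low])
print(f"{year_high} vs {year_low}: z = {z:.2f}, p = {p_value:.3f}")
```

The same function can be applied to any pair of years, or to the Heckman column, to see which differences are large relative to their sampling uncertainty.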

6 Conclusion

Using the Medical Expenditure Panel Survey, this paper has examined the effect of health insurance on health care expenditures. From an economic point of view, insurance status is likely to be an endogenous variable; not taking this into account would make the interpretation of the statistical relationship problematic. The Durbin-Wu-Hausman test confirms this intuition: exogeneity is rejected. To address this problem, instrumental variable estimation was applied. Economically and statistically, employment and whether an employer offers an insurance plan to its employees seem to be suitable instruments for the insurance variable. The insurance status is predicted with a probit estimation, and this prediction substitutes the original insurance status in the expenditure equations.

Two models have been used extensively in previous literature: the two-part model and the Heckman two-step model. Both estimations find a positive and statistically significant effect of insurance. However, before commenting on the final results, the differences between their estimates can be interpreted. The covariate coefficients in the two-part model are lower than the Heckman estimates in the expenditure part of both models.

Moreover, the marginal effects, which are of main interest in this paper, also differ between the two models. As expected from the modeling section, the two-part marginal effects underestimate the insurance effect. The Heckman model estimates are, on average, 4% higher. However, it is remarkable that the differences are not extremely large. Semi-parametric models have estimated the effect of insurance and private insurance at between 60% and 80%. This coincides with the results found in this paper. Another approach to interpreting the estimated impact is to analyze whether it has been stable within the past decade. For all years we find similar results. The two-part model marginal effects are lower than the Heckman ones, but the differences and effects do not seem to change extremely over time. Insurance increases medical costs by between 57% and 68% with the two-part model, and by between 59% and 78% in the Heckman model. These results are, again, comparable to those found in previous literature and therefore seem plausible.

There are some limitations, and consequently some directions for future research, that must be pointed out. First, by using parametric models this thesis assumes that the error terms are normally distributed. However, in practice there is no reason to make such an assumption. A further analysis of the use of semi-parametric or non-parametric models is highly recommended.

Secondly, if the event of incurring any positive expenditures is regarded as a choice to visit a doctor, then the explanatory variables might be different in the first and second equation for both the two-part and the Heckman two-step model. Research on this property is advised. Further, this study relies on data from the U.S.; therefore the outcomes might not apply to other populations.

Moreover, as Shen (2013) and others did, it might be interesting to consider only private insurance as an endogenous variable. This is because other types of insurance might not be a choice for the customer, as accepting it for free might not be a choice.


References

Arrow, K. J. (1963). Uncertainty and the welfare economics of medical care. The American Economic Review, 941–973.

Cohen, J. T., Neumann, P. J., & Weinstein, M. C. (2008). Does preventive care save money? Health economics and the presidential candidates. New England Journal of Medicine, 358(7), 661–663.

Crystal, S., Johnson, R. W., Harman, J., Sambamoorthi, U., & Kumar, R. (2000). Out-of-pocket health care costs among older Americans. Journal of Gerontology: Social Sciences, 55(1), S51–S62.

Diehr, P., Yanez, D., Ash, A., Hornbrook, M., & Lin, D. (1999). Methods for analyzing health care utilization and costs. Annual Review of Public Health, 20(1), 125–144.

Duan, N., Manning, W. G., Morris, C. N., & Newhouse, J. P. (1984). Choosing between the sample-selection model and the multi-part model. Journal of Business & Economic Statistics, 2(3), 283–289.

Friedman, B. (1974). Risk aversion and the consumer choice of health insurance option. The Review of Economics and Statistics, 209–214.

Heij, C., De Boer, P., Franses, P. H., Kloek, T., Van Dijk, H. K., et al. (2004). Econometric methods with applications in business and economics. Oxford University Press.

Hill, S. C., & Miller, G. E. (2010). Health expenditure estimation and functional form: applications of the generalized gamma and extended estimating equations models. Health Economics, 19(5), 608–627.

Holl, J. L., Szilagyi, P. G., Rodewald, L. E., Byrd, R. S., & Weitzman, M. L. (2012). Profile of uninsured in the United States. Archives of Pediatrics & Adolescent Medicine, 149(4), 398–406.

Jerant, A., Fiscella, K., Tancredi, D. J., & Franks, P. (2013). Health insurance is associated with preventive care but not personal health behaviors. The Journal of the American Board of Family Medicine, 26(6), 759–767.

Long, S. K., Kenney, G. M., Zuckerman, S., Wissoker, D., Shartzer, A., Karpman, M., & Anderson, N. (2014). Quicktake: Number of uninsured adults continues to fall under the ACA: Down by 8.0 million in June 2014. Washington, DC: Urban Institute.

Manning, W. G., Newhouse, J. P., Duan, N., Keeler, E. B., & Leibowitz, A. (1987). Health insurance and the demand for medical care: evidence from a randomized experiment. The American Economic Review, 251–277.

McDonalds, M. (2010). A profile of the uninsured persons in the United States. Pfizer Facts.

Meer, J., & Rosen, H. S. (2004). Insurance and the utilization of medical services. Social Science & Medicine, 58(9), 1623–1632.

Miller, G. E., Banthin, J. S., & Moeller, J. F. (2004). Covering the uninsured: Estimates of the impact on total health expenditures for 2002.

Shen, C. (2013). Determinants of health care decisions: Insurance, utilization, and expenditures. The Review of Economics and Statistics, 95(1), 142–153.


Appendix A - Variable definitions

Table A.1. Variable Definitions

Variable              Definition
Age                   Individual's age by end of current year
Age2                  Squared individual's age
Education College     Dummy=1 if individual's highest degree is College
Education HS          Dummy=1 if individual's highest degree is High School
Employment            Dummy=1 if individual has been employed for at least one month in current year
Family Size           Individual's family size
Health Excellent      Perceived health. Dummy=1 if individual reported score 4 or higher on a 5 point scale
Health Poor           Perceived health. Dummy=1 if individual reported score 2 or lower on a 5 point scale
Health Priority       Dummy=1 if individual has ever been diagnosed with one of the following conditions: High blood pressure, Coronary heart disease, Angina, Heart attack, Myocardial infarction, Stroke, Emphysema, Chronic bronchitis, High cholesterol, Cancer, Diabetes, Joint pain, Arthritis, Asthma, Attention deficit disorder
Insurance offered     Dummy=1 if individual's employer offers insurance plan to its employees
Insured               Dummy=1 if individual has been insured for at least one month in current year
Insured prediction    Dummy=1 if probit model predicts that the individual has been insured for at least one month in current year
LogExpenditure        Logarithm of health care expenditures in current year US dollars
LogWage               Logarithm of yearly wage income in current year US dollars
Male                  Gender. Dummy=1 if individual is male
Married               Marital status. Dummy=1 if individual has ever been married during current year
Mental health Poor    Perceived mental health. Dummy=1 if individual reported score 4 or higher on a 5 point scale
Region MW             Dummy=1 if individual lives in region Mid-West
Region S              Dummy=1 if individual lives in region South
Region W              Dummy=1 if individual lives in region West
Smokes                Dummy=1 if individual currently smokes
White                 Ethnicity. Dummy=1 if individual is white


Table B.1. Summary Statistics of the total Sample and by Health Insurance Status 2010

                          Full sample (n=16210)   The insured             The uninsured
                          Mean (S.E.)             Mean (S.E.)             Mean (S.E.)
Insured                   0.70 (0.46)
Expenditures
  Total                   3341 (9510)             4165 (10674)            1422 (5519)
  Zero expenditures       0.14 (0.35)             0.14 (0.35)             0.44 (0.50)
  Positive expenditures   4345 (10642)            4844 (11368)            2552 (7197)
Demographics
  Age                     41.20 (13.08)           42.66 (12.98)           37.82 (12.68)
  Male                    0.46 (0.50)             0.44 (0.50)             0.51 (0.50)
  White                   0.70 (0.46)             0.70 (0.46)             0.72 (0.45)
  Married                 0.52 (0.50)             0.57 (0.50)             0.40 (0.49)
  Family size             3.17 (1.67)             3.06 (1.57)             3.43 (1.86)
Health
  Excellent health        0.20 (0.40)             0.20 (0.40)             0.20 (0.40)
  Poor health             0.14 (0.34)             0.13 (0.34)             0.16 (0.36)
  Poor mental health      0.13 (0.34)             0.13 (0.34)             0.14 (0.35)
  Priority health         0.52 (0.50)             0.57 (0.50)             0.41 (0.49)
  Smoke                   0.20 (0.40)             0.18 (0.38)             0.25 (0.44)
Socioeconomic status
  High school education   0.44 (0.50)             0.39 (0.49)             0.54 (0.50)
  College education       0.49 (0.50)             0.56 (0.50)             0.32 (0.47)
  Income                  26890 (31415)           31923 (34288)           15175 (18748)
Region
  Midwest                 0.21 (0.40)             0.23 (0.42)             0.16 (0.36)
  South                   0.38 (0.48)             0.35 (0.48)             0.45 (0.50)
  West                    0.27 (0.44)             0.26 (0.44)             0.28 (0.45)
Instruments
  Insurance offered       0.51 (0.50)             0.61 (0.49)             0.26 (0.44)
  Employed                0.76 (0.42)             0.78


Table B.2. Durbin-Wu-Hausman test 2010

(1) type: linear regression; dependent variable: Insured
(2) type: -; dependent variable: -
(3) type: linear regression; dependent variable: Residuals Regression (2)

                          (1)                     (2)                     (3)
                          estimate (s.d.) p       estimate (s.d.) p       estimate (s.d.) p
Insurance offered          0.327 (0.01) 0.000
Employment                -0.307 (0.02) 0.000
Insured                                            1.990 (0.05) 0.000      0.316 (0.16) 0.046
Age                       -0.005 (0) 0.010        -0.038 (0.01) 0.002      0.001 (0.01) 0.958
Age2                       0.000 (0) 0.000         0.001 (0) 0.000         0.000 (0) 0.892
Male                      -0.052 (0.01) 0.000     -1.026 (0.04) 0.000      0.017 (0.04) 0.700
White                     -0.025 (0.01) 0.000      0.327 (0.05) 0.000      0.011 (0.05) 0.815
Married                    0.111 (0.01) 0.000      0.390 (0.05) 0.000     -0.038 (0.05) 0.482
Family size               -0.010 (0) 0.000        -0.208 (0.01) 0.000      0.005 (0.01) 0.723
Health Excellent           0.023 (0.01) 0.006     -0.423 (0.06) 0.000     -0.005 (0.06) 0.924
Health Poor               -0.026 (0.01) 0.013      0.960 (0.07) 0.000      0.010 (0.07) 0.884
Mental health Poor         0.035 (0.01) 0.001      0.691 (0.07) 0.000     -0.011 (0.07) 0.881
Health priority            0.072 (0.01) 0.000      1.468 (0.05) 0.000     -0.027 (0.05) 0.590
Smoke                     -0.051 (0.01) 0.000     -0.207 (0.06) 0.000      0.019 (0.06) 0.735
Education HS               0.174 (0.01) 0.000      0.586 (0.09) 0.000     -0.069 (0.09) 0.469
Education College          0.264 (0.01) 0.000      1.165 (0.09) 0.000     -0.107 (0.1) 0.305
LogWage                    0.013 (0) 0.000        -0.015 (0.01) 0.005     -0.002 (0.01) 0.711
Region MW                 -0.010 (0.01) 0.333      0.306 (0.07) 0.000      0.001 (0.07) 0.985
Region S                  -0.110 (0.01) 0.000     -0.095 (0.07) 0.150      0.034 (0.07) 0.617
Region W                  -0.057 (0.01) 0.000     -0.158 (0.07) 0.024      0.020 (0.07) 0.780
Residual Regression (1)                                                   -0.352 (0.17) 0.036
Constant term              0.524 (0.04) 0.000      3.428 (0.26) 0.000     -0.133 (0.27) 0.623
Observations               16210                   16210                   16210
R2                         0.220                   0.329                   0.000

Test outcome: test statistic 4.411, p value 0.036


Table B.3. Sargan test 2010

(1) type: linear regression; dependent variable: LogExpenditures
(2) type: linear regression; dependent variable: Residuals Regression (1)

                          (1)                     (2)
                          estimate (s.d.) p       estimate (s.d.) p
Insured IV                 2.307 (0.17) 0.000
Insurance offered                                 -0.045 (0.06) 0.429
Employment                                        -0.327 (0.13) 0.012
Age                       -0.037 (0.01) 0.003     -0.001 (0.01) 0.945
Age2                       0.001 (0) 0.000         0.000 (0) 0.992
Male                      -1.009 (0.05) 0.000     -0.003 (0.05) 0.949
White                      0.338 (0.05) 0.000     -0.001 (0.05) 0.991
Married                    0.352 (0.06) 0.000      0.001 (0.05) 0.981
Family size               -0.202 (0.02) 0.000      0.000 (0.02) 0.984
Health Excellent          -0.428 (0.06) 0.000     -0.003 (0.06) 0.961
Health Poor                0.970 (0.07) 0.000     -0.007 (0.07) 0.920
Mental health Poor         0.681 (0.07) 0.000     -0.006 (0.07) 0.937
Health priority            1.441 (0.05) 0.000      0.001 (0.05) 0.986
Smoke                     -0.187 (0.06) 0.002     -0.001 (0.06) 0.987
Education HS               0.517 (0.1) 0.000       0.002 (0.09) 0.979
Education College          1.058 (0.11) 0.000      0.000 (0.09) 0.997
LogWage                   -0.017 (0.01) 0.003      0.032 (0.01) 0.014
Region MW                  0.307 (0.08) 0.000      0.008 (0.08) 0.913
Region S                  -0.061 (0.07) 0.389      0.006 (0.07) 0.932
Region W                  -0.138 (0.07) 0.059      0.003 (0.07) 0.971
Constant term              3.295 (0.28) 0.000      0.057 (0.27) 0.837
Observations               16210                   16210
R2                         0.273                   0.000

Test outcome: test statistic 7.465, p value 0.006


Table B.4. Insurance prediction 2010

type: probit regression
dependent variable: Insured

                          estimate  s.d.     p-value   ME
Insurance offered          1.029    (0.03)   0.000      0.975
Employment                -0.935    (0.06)   0.000     -1.058
Age                       -0.024    (0.01)   0.000     -0.037
Age2                       0.000    (0)      0.000      0.000
Male                      -0.170    (0.02)   0.000     -0.215
White                     -0.076    (0.03)   0.003     -0.125
Married                    0.384    (0.03)   0.000      0.332
Family size               -0.026    (0.01)   0.001     -0.040
Health Excellent           0.079    (0.03)   0.009      0.020
Health Poor               -0.104    (0.04)   0.004     -0.175
Mental health Poor         0.108    (0.04)   0.003      0.036
Health priority            0.249    (0.03)   0.000      0.198
Smoke                     -0.173    (0.03)   0.000     -0.229
Education HS               0.506    (0.04)   0.000      0.420
Education College          0.827    (0.05)   0.000      0.739
LogWage                    0.047    (0.01)   0.000      0.034
Region MW                 -0.055    (0.04)   0.175     -0.135
Region S                  -0.402    (0.04)   0.000     -0.472
Region W                  -0.226    (0.04)   0.000     -0.301
Constant term              0.190    (0.14)   0.160
Observations               16210
Pseudo R2                  0.192
hitratio                   0.80

Table B.5. Probability positive expenditures 2010

type: probit regression

dependent variable: Expenditures

                     estimate  s.d.    p-value      ME
Insured prediction      0.227  (0.03)  0.000     0.064
Age                    -0.035  (0.01)  0.000    -0.009
Age2                    0.001  (0)     0.000     0.000
Male                   -0.501  (0.02)  0.000    -0.136
White                   0.107  (0.03)  0.000     0.029
Married                 0.247  (0.03)  0.000     0.067
Family size            -0.094  (0.01)  0.000    -0.025
Health Excellent       -0.125  (0.03)  0.000    -0.035
Health Poor             0.275  (0.04)  0.000     0.067
Mental health Poor      0.272  (0.04)  0.000     0.066
Health priority         0.635  (0.03)  0.000     0.172
Smoke                  -0.089  (0.03)  0.004    -0.024
Education HS            0.272  (0.05)  0.000     0.072
Education College       0.582  (0.05)  0.000     0.155
LogWage                 0.013  (0)     0.000     0.003
Region MW               0.164  (0.04)  0.000     0.042
Region S               -0.093  (0.04)  0.012    -0.025
Region W               -0.074  (0.04)  0.057    -0.020
Constant term           0.661  (0.14)  0.000     0.000
Observations            16210
Pseudo R2               0.177
hitratio                 0.80

Table B.6. Level of positive expenditures

type: linear regression

dependent variable: LogExpenditures

                    (1)                       (2)
                    estimate  (s.d.)  p       estimate  (s.d.)  p
Insured prediction     0.110  (0.05)  0.016      0.207  (0.06)  0.000
Age                   -0.012  (0.01)  0.125     -0.016  (0.01)  0.050
Age2                   0.000  (0)     0.000      0.000  (0)     0.000
Male                  -0.331  (0.03)  0.000     -0.464  (0.05)  0.000
White                  0.160  (0.03)  0.000      0.187  (0.03)  0.000
Married                0.142  (0.03)  0.000      0.209  (0.04)  0.000
Family size           -0.071  (0.01)  0.000     -0.099  (0.01)  0.000
Health Excellent      -0.257  (0.04)  0.000     -0.293  (0.04)  0.000
Health Poor            0.587  (0.04)  0.000      0.653  (0.05)  0.000
Mental health Poor     0.396  (0.04)  0.000      0.456  (0.05)  0.000
Health priority        0.563  (0.03)  0.000      0.745  (0.07)  0.000
Smoke                 -0.193  (0.04)  0.000     -0.216  (0.04)  0.000
Education HS           0.470  (0.07)  0.000      0.542  (0.07)  0.000
Education College      0.747  (0.07)  0.000      0.899  (0.08)  0.000
LogWage               -0.029  (0)     0.000     -0.025  (0)     0.000
Region MW              0.062  (0.05)  0.175      0.102  (0.05)  0.036
Region S              -0.111  (0.04)  0.009     -0.133  (0.04)  0.003
Region W              -0.145  (0.04)  0.001     -0.164  (0.05)  0.000
Constant term          6.253  (0.17)  0.000      5.798  (0.23)  0.000
Lambda                                           0.739  (0)     0.000
Observations           16210                     16210

Table B.7. Unconditional marginal effect on expenditures (2010)

                         (1)                   (2)
                          ME  (s.d. ME)         ME  (s.d. ME)
Insured                0.509  (0.07)         0.528  (0.08)
Age                   -0.074  (0.01)        -0.078  (0.01)
Age2                   0.001  (0)            0.001  (0)
Male                  -1.199  (0.05)        -1.266  (0.05)
White                  0.328  (0.06)         0.346  (0.06)
Married                0.574  (0.06)         0.607  (0.06)
Family size           -0.232  (0.02)        -0.244  (0.02)
Health Excellent      -0.441  (0.06)        -0.467  (0.06)
Health Poor            0.988  (0.09)         1.015  (0.08)
Mental health Poor     0.827  (0.09)         0.834  (0.08)
Health priority        1.635  (0.06)         1.716  (0.06)
Smoke                 -0.321  (0.06)        -0.338  (0.07)
Education HS           0.887  (0.1)          0.932  (0.1)
Education College      1.686  (0.11)         1.762  (0.11)
LogWage                0.000  (0.01)         0.025  (0.01)
Region MW              0.355  (0.09)         0.363  (0.08)
Region S              -0.262  (0.08)        -0.276  (0.08)
Region W              -0.255  (0.08)        -0.272  (0.08)
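One common way to obtain an unconditional marginal effect from a participation equation and a level equation is the product rule applied to E[y] = P(y > 0) · E[y | y > 0]. The sketch below illustrates that rule only. It is not necessarily the formula used for this table and is not meant to reproduce the reported values. The participation marginal effect (0.064) and the level coefficient (0.110) for insurance are taken from Tables B.5 and B.6; the probability of positive expenditures (0.8) and the conditional mean of log expenditures (7.0) are hypothetical placeholders.

```python
def unconditional_me(prob_positive, me_prob, cond_mean, beta):
    """Product-rule marginal effect on E[y] = P(y > 0) * E[y | y > 0].

    prob_positive: P(y > 0) at the evaluation point
    me_prob:       marginal effect of the regressor on P(y > 0)
    cond_mean:     E[y | y > 0] at the evaluation point
    beta:          marginal effect of the regressor on E[y | y > 0]
    """
    return me_prob * cond_mean + prob_positive * beta


# 0.064 and 0.110 come from Tables B.5 and B.6 (column 1);
# 0.8 and 7.0 are hypothetical placeholders, not values from the thesis.
effect = unconditional_me(prob_positive=0.8, me_prob=0.064,
                          cond_mean=7.0, beta=0.110)
print(round(effect, 3))
```

The first term captures the change in the probability of having any expenditures, the second the change in the level of expenditures among those who have them.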
