University of Amsterdam
BSc Econometrics

Stability in the relation between health insurance and health care expenditures

Author: Janelle Zoutkamp, 10441670
Supervisors: Dr. Hans van Ophem & Drs. Rob van Hemert

Contents
1 Introduction
2 Theory about insurance decisions and medical expenditures
2.1 Insurance as an endogenous variable
2.2 Medical expenses estimation
3 Model
3.1 Insurance coverage prediction
3.2 Expenditure Estimation
3.2.1 Two-Part model
3.2.2 Heckman two-step estimation
3.2.3 Comparing the two-part model and the Heckman two-step estimation
3.3 Empirical Specification
4 Data
4.1 Sample description
4.2 Summary statistics
5 Empirical results
5.1 Insurance decision
5.2 Expenditure equation
5.3 Unconditional Marginal Effects
5.4 Stability over time
6 Conclusion
References
Appendix A - Descriptive statistics
Appendix B - Regression output 2010
Appendix C - Regression output 2008
Appendix D - Regression output 2006
Appendix E - Regression output 2004
1 Introduction
In periods of rising income uncertainty, acquiring health insurance becomes less self-evident. During the past decade the share of uninsured in the U.S. has increased steadily (Long et al., 2014). This is problematic because insurance can play an important role in positive lifestyle decisions and in citizens' feeling of security. Moreover, health insurance is related to higher use of preventive care, which in the long run results in a longer and healthier life (Cohen, Neumann, & Weinstein, 2008). These concerns partially motivated U.S. policy makers to consider lowering health insurance prices or extending universal coverage. However, estimating the financial effects of such decisions is challenging for several reasons.
First, there is a simultaneous connection between the insurance decision and health care expenditures. On the one hand, there might be an asymmetric information problem. Individuals buying health insurance are likely to be those who anticipate a greater need for health care due to, for example, a higher health risk. This is frequently referred to as adverse selection. In general, insurers do not have full information on the health of potential customers. It is therefore common that insured clients have a higher expense risk profile, which consequently results in higher medical expenses (Arrow, 1963). The other connection between insurance and health care expenditure is due to moral hazard. Individuals, once insured, may become less cautious about unhealthy or risky behaviors, which could lead to more health problems requiring more health care. In addition, medical insurance lowers the marginal cost of care to the individual, who may then choose to overuse a certain medical service. This paper focuses on the second link: how insurance affects medical expenses. However, not taking into account the first link, insurance coverage as a function of health expenditures, might lead to endogeneity bias in estimates of the effect of insurance on health expenditures.
A second challenge that must be accounted for when modeling health care expenditures is the large number of individuals who do not incur any expenditures at all. In economic terms this is often described as a medical utilization choice (Jerant, Fiscella, Tancredi, & Franks, 2013). First, a person decides whether to visit a doctor and so incur positive expenditures. Next, the doctor and patient jointly decide about the treatment and related costs. The health expenditure data can therefore be regarded as semi-continuous, with a discrete part for zero expenditures and a continuous part for positive expenses. Linear regression estimates on the censored full sample, or on the truncated positive expenditure sample, would then again lead to biased estimates (Heij et al., 2004).
This thesis examines the effect of insurance on expenditures for health care services. The instrumental variable estimation technique will be used to control for the endogeneity bias caused by the simultaneous relation between insurance and expenses. To model the expenditure equation, two models that have been extensively applied in previous literature will be used: the two-part model and the Heckman two-step model. Moreover, marginal effects of insurance on medical expenses over the last decade will be calculated to analyze whether this effect is stable over time.
The rest of the paper proceeds as follows. The next section reviews the literature. Section 3 sets up the theoretical model and the empirical specification. Section 4 describes the data and summarizes the most important statistics. Section 5 presents the empirical results. Concluding remarks and suggestions for further research are provided in Section 6.
2 Theory about insurance decisions and medical expenditures
Researchers have addressed the challenges that arise in modelling expenditure data in a variety of study designs. This section first discusses the instrumental variable approach to control for endogeneity. Then earlier research on modelling medical expenditures is discussed.
2.1 Insurance as an endogenous variable
The first challenge, endogeneity, arises from the simultaneous causality between health insurance and medical expenditures. On the one hand, people who have insurance are much more likely to use health care than their uninsured counterparts. On the other hand, people who have greater demand for health care may have more incentive to obtain insurance coverage (Shen, 2013). Consequently, this paper uses methods that deal with this endogeneity.
There are several econometric techniques that deal with this challenge. One such technique is the use of instrumental variables, which must satisfy three requirements. First, an instrument must be powerful: it should be sufficiently correlated with the insurance variable. The second requirement is exogeneity, which means that the instrument must be uncorrelated with the error term in the expenditure equation. Third, the instruments should not be directly linked to health care expenditures.
Plausible instruments for an individual's insurance status must first be motivated from an economic point of view. Several job characteristics have been used previously. Variables such as employment status and, for the employed, the industry, the firm size and union membership are likely to affect insurance coverage, because most persons with health insurance at mid-life receive health benefits from their employers (Crystal, Johnson, Harman, Sambamoorthi, & Kumar, 2000). In addition, these characteristics are unlikely to directly affect the amount of health services used.
Another variable that is valid from an economic point of view is whether an employer offers an insurance plan to its employees. This variable is powerful because most employed individuals buy the insurance plan if their employer offers one (McDonalds, 2010). Moreover, health expenses are not expected to be directly affected by the employer's insurance offer.
Once economically plausible instruments have been found, there are several methods to implement these variables in the original estimation equation. Among these approaches, the two-stage least squares method (2SLS) is the most common. It is used by Meer and Rosen (2004) to study the effect of health insurance on the utilization of different types of medical services. In the first stage, the insurance variable is estimated by a linear regression on a set of exogenous variables that includes the instruments. In the second regression, the actual insurance status is replaced by its first-stage prediction. Another frequently used approach replaces the first stage's linear regression by a probability estimation. This method is used by Crystal et al. (2000) to estimate the effect of being insured on out-of-pocket costs. They used a probit estimation to predict the individual's coverage choice, and this prediction then replaced the original coverage status in their expenditure equation.
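The two-stage logic can be illustrated with a minimal Python sketch (standard library only). It assumes a single continuous endogenous regressor, a single instrument and no other controls, and uses simulated data in which an unobserved factor `u` drives both the regressor and the outcome. The function names and the data-generating process are hypothetical and are not the specification used in this thesis; the probit-based variant would replace the linear first stage by a probit.

```python
import random


def simple_ols(x, y):
    """Intercept and slope of an OLS regression of y on a constant and x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope


def two_stage(y, x, z):
    """Two-stage estimate of the effect of x on y using instrument z."""
    a, b = simple_ols(z, x)           # first stage: x on the instrument
    x_hat = [a + b * zi for zi in z]  # fitted (instrumented) x
    return simple_ols(x_hat, y)[1]    # second stage: y on fitted x


# Hypothetical simulated data: u (e.g. unobserved health need) drives both
# x and y; the true effect of x on y is 1.
rng = random.Random(42)
n = 20_000
z = [rng.gauss(0, 1) for _ in range(n)]
u = [rng.gauss(0, 1) for _ in range(n)]
x = [0.8 * zi + ui + rng.gauss(0, 1) for zi, ui in zip(z, u)]
y = [xi + 2 * ui + rng.gauss(0, 1) for xi, ui in zip(x, u)]

ols_slope = simple_ols(x, y)[1]  # biased upward because cov(x, u) > 0
iv_slope = two_stage(y, x, z)    # should be close to the true value 1
print(f"OLS: {ols_slope:.3f}  two-stage: {iv_slope:.3f}")
```

In this simulation the plain OLS slope is pulled upward by the common factor `u`, whereas the two-stage estimate should lie close to the true coefficient of 1, because `z` is correlated with `x` but not with `u`.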
If the instruments satisfy the requirements, the 2SLS method and the probit approach are both valid mechanisms to correct the endogeneity bias mentioned earlier. However, attention must be paid to their statistical differences when choosing between them. The error terms in the probit model are assumed to be normally distributed, whereas this assumption is not required for the linear regression used in 2SLS. Another difference lies in their outcomes. The probit model estimates each individual's insurance coverage probability. In contrast, the predicted insurance variable in the 2SLS model does not necessarily lie between zero and one and cannot be given an economic interpretation. In the present context, the probit's insurance estimate has a higher informative value and is therefore preferred despite being more restrictive.
2.2 Medical expenses estimation
After controlling for the endogeneity problem, the insurance effect can be obtained by modelling the health expenditure equation. The equation must include an array of related variables to isolate the effect of insurance coverage. This is because the simple correlation between insurance status and medical expenditures may reflect advantageous selection effects rather than causal effects of health insurance.
First, the insured are on average older than the uninsured. Age differences may partly explain their difference in health care expenditure because responsibility, which implies taking care of oneself and, for example, going to the doctor when needed, tends to increase with age (Friedman, 1974). A second characteristic, also associated with age, is health. Poor health is more likely among the insured. The unhealthy visit the doctor more frequently and therefore have, on average, higher medical costs. Other variables that the model should control for include socio-economic characteristics. The insured on average have a higher household income and educational level, which are associated with better transportation possibilities to a medical institution and higher health consciousness, respectively (Friedman, 1974). This could similarly result in an increased probability and level of health care demand for the insured.
After controlling for relevant characteristics, insurance is hypothesized to still have a positive effect on medical expenditures. Insurance lowers the access threshold to medical care and certain treatments, as the costs are covered by the insurer.
A comparable argument, and the consequent selection of control variables, is used in most previous studies on medical expenditures. In these studies, several methods to model health expenditure data have been applied. Modeling such data is challenging, as it typically includes a large number of observations for individuals who do not incur any expenses at all. Such data can be regarded as semi-continuous, with a discrete part for the zero observations and a continuous part for the positive expenses. Linear regression estimates on the censored full sample, or only on the truncated positive expenditure sample, would then again lead to biased estimates (Heij et al., 2004).
property is their interpretative characteristic. In economic terms, the two-step approach is often described as a medical utilization choice. First, a person decides whether to visit a doctor and so incur positive expenditures. Next, the doctor and patient jointly decide about the treatment and related costs (Jerant et al., 2013). This intuition translates into a simple computational form. One equation estimates the probability that a person incurs any positive medical expenses and a second equation estimates the level of positive expenses. The individual's final expected expenditure is obtained by multiplying these two estimates.
In the existing literature, the RAND Health Insurance Experiment is the only randomized health insurance study and serves as the benchmark. This experiment randomly assigned people to four types of insurance with varying degrees of coverage (Manning, Newhouse, Duan, Keeler, & Leibowitz, 1987). One group had free care; the others were responsible for 25%, 50% and 95% of the costs of their care. The two-part model was then used to estimate their differences in expected health care expenditures: a probit model predicted the probability of positive expenditures, and a linear regression was used to estimate the level of positive expenditures. The results showed that insurance coverage, e.g. free health care, does have a positive impact on medical expenditures. In fact, individuals in the group with complete coverage of medical expenditures incurred approximately 60% higher expenses than the group that paid 95% of their medical bill.
Miller, Banthin, and Moeller (2004) study the effect of insurance on medical expenditures using data from the Medical Expenditure Panel Survey (MEPS). As the distribution of inpatient expenditures differs from that of outpatient expenditures, they apply a four-part model. This model is an extension of the two-part model mentioned earlier. The probability of hospital use is estimated among all users, and then the costs for users who were hospitalized and for those who were not are estimated separately. In line with the theoretically expected outcome, they also find that insurance has a positive effect on health expenses. Health expenditures increase by approximately 60% to 90% when an expansion of insurance is simulated. Note that in this study the individual's insurance status is treated as an exogenous variable, which might have caused biased estimates.
The two-part model assumes that there is no correlation between the error terms of the two equations. However, in health economics it could be argued that a high probability of incurring any positive expenditures is related to a higher expected level of expenditures. Shen (2013) takes this relation into account using the Heckman two-step model. Here again, a positive effect of insurance is obtained. However, Shen (2013) favors an adjusted semi-parametric estimation approach over parametric approaches. He argues that parametric approaches make distributional assumptions that are not substantiated by economic theory. The parametric model estimated that individuals with insurance have 125% higher medical expenses than the uninsured, whereas the semi-parametric approach predicts an increase of 48%, a number closer to that found in the RAND experiment.
To summarize, the existing literature provides various alternatives to estimate the insurance
coverage effect on health care expenditure. Among these approaches the two-part model and the
Heckman two-step model have been extensively used. Both models will therefore be discussed in
the next section.
3 Model
The economic model used in this thesis combines the techniques found in previous literature. First, a probit is used to predict the individual's insurance decision. Next, the two-part model and the Heckman two-step model are explained and compared.
3.1 Insurance coverage prediction
The first model treats the health insurance choice as an endogenous variable. Let $I_i$ be an indicator of whether individual $i$ selects any kind of health insurance coverage. In the model, an individual selects insurance if the probability $p_{Ii}$ of doing so is greater than 50%¹. This probability is determined by a set of exogenous variables $z_i$ and is estimated with a binary probit. The choice of being insured can be interpreted in terms of an unobserved variable $I_i^*$ that represents the latent preference of individual $i$ for the choice $I_i = 1$:
$$I_i^* = z_i'\alpha + \sigma_\nu \nu_i, \qquad \nu_i \mid z_i \sim \mathrm{NID}(0, 1) \tag{1}$$
¹ Further research on the optimal threshold value is recommended. However, for this application the threshold value
The observed choice $I_i$ is related to the index $I_i^*$ by means of the equation
$$I_i = \begin{cases} 1 & \text{if } I_i^* > 0 \\ 0 & \text{if } I_i^* \le 0 \end{cases} \tag{2}$$
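which is used to estimate the insurance probability:
$$p_{Ii} = P[I_i = 1] = P[I_i^* = z_i'\alpha + \sigma_\nu \nu_i > 0] = P\!\left[\nu_i < \frac{z_i'\alpha}{\sigma_\nu}\right] = \Phi\!\left(\frac{z_i'\alpha}{\sigma_\nu}\right) \tag{3}$$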
Then the predicted insurance status is estimated:
$$\hat I_i = \begin{cases} 1 & \text{if } \hat p_{Ii} > 0.5 \\ 0 & \text{if } \hat p_{Ii} \le 0.5 \end{cases} \tag{4}$$
It is important to note that the error terms are assumed to be normally distributed with zero mean. Further, only the ratios of the index parameters, $\theta = \alpha/\sigma_\nu$, are identified; they are estimated with the Maximum Likelihood (ML) technique. The scaled parameters can be used to recover probabilities and predict the individual's insurance status. This prediction is assumed to be a valid exogenous substitute for the original insurance status.
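The prediction step in equations (3) and (4) can be sketched in Python, assuming the scaled coefficients $\hat\theta$ have already been estimated by probit ML (the estimation itself is not shown). The function name, the choice of regressors (a constant, employment and insurance offer) and the coefficient values are hypothetical and only illustrate the mapping from index to probability to predicted status.

```python
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal CDF


def predict_insurance(Z, theta, threshold=0.5):
    """Equations (3)-(4): p_hat = Phi(z_i' theta), I_hat = 1 if p_hat > threshold.

    Z:     list of rows z_i (first entry 1.0 for the constant)
    theta: scaled probit coefficients alpha / sigma_nu (assumed already estimated)
    """
    p_hat, i_hat = [], []
    for z in Z:
        index = sum(zj * tj for zj, tj in zip(z, theta))
        p = PHI(index)
        p_hat.append(p)
        # With threshold 0.5 this equals index > 0; p == 0.5 maps to 0, as in (4).
        i_hat.append(1 if p > threshold else 0)
    return p_hat, i_hat


# Hypothetical coefficients for (constant, employed, insurance offered).
theta = [-0.5, 1.2, 0.8]
Z = [[1.0, 0, 0], [1.0, 1, 0], [1.0, 0, 1], [1.0, 1, 1]]
p_hat, i_hat = predict_insurance(Z, theta)
print([round(p, 3) for p in p_hat], i_hat)
```

Because $\Phi$ is increasing and $\Phi(0) = 0.5$, the 0.5 threshold is equivalent to predicting insurance whenever $z_i'\hat\theta > 0$; another threshold only requires changing the `threshold` argument.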
3.2 Expenditure Estimation
The predicted insurance status enters the expenditure equation as an exogenous variable. Recall that linear regression provides biased coefficients when applied to health expenditure data, as such data are typically semi-continuous. Two alternative models are therefore discussed: the two-part model and the Heckman two-step model.
3.2.1 Two-Part model
The two-part model separately estimates the probability and the level of positive expenditures. The first part estimates the individual's probability of incurring any positive expenditures. Denote $y_i$ as the level of medical expenditures and $x_i$ as a set of exogenous variables that affects the probability of
Then, similar to the method used for the insurance prediction, the probability is estimated with a binary probit. The event of incurring any positive expenditures can be interpreted in terms of an unobserved variable $y_i^*$:
$$y_i^* = x_i'\beta_1 + \hat I_i\delta_1 + \sigma_\varepsilon\varepsilon_{1i}, \qquad \varepsilon_{1i} \mid x_i \sim \mathrm{NID}(0, 1) \tag{5}$$
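The positive expenditure probability can be written as
$$p_i = P[y_i^* > 0] = P\!\left[\varepsilon_{1i} < \frac{x_i'\beta_1 + \hat I_i\delta_1}{\sigma_\varepsilon}\right] = \Phi\!\left(\frac{x_i'\beta_1}{\sigma_\varepsilon} + \frac{\hat I_i\delta_1}{\sigma_\varepsilon}\right) = \Phi(x_i'\gamma_1 + \hat I_i\gamma_2) \tag{6}$$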
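where $\gamma_1 = \beta_1/\sigma_\varepsilon$ and $\gamma_2 = \delta_1/\sigma_\varepsilon$. The second equation is a linear model on the log scale for positive expenses, given that the person is a positive user of medical services:
$$\log(y_i \mid y_i > 0) = x_i'\beta_2 + \hat I_i\delta_2 + \sigma_\varepsilon\varepsilon_{2i}, \qquad E(\varepsilon_{2i} \mid y_i > 0, x_i) = 0 \tag{7}$$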
In this equation, the error terms are assumed to be identically distributed, but not necessarily normally distributed. Moreover, in the two-part model the error terms of both equations ($\varepsilon_{1i}$ and $\varepsilon_{2i}$) are assumed to be independent. This makes it possible to estimate the coefficients of each equation separately (Duan, Manning, Morris, & Newhouse, 1984). The coefficients in the probit equation can be estimated with the ML method, while ordinary least squares (OLS) may be used for the second equation².
The final two-part expenditure estimate is obtained by combining the previous equations:
$$\begin{aligned} E[\log(y_i)] &= P[y_i > 0]\,E[\log(y_i) \mid y_i > 0] \\ &= \Phi(x_i'\hat\gamma_1 + I_i\hat\gamma_2)\,(x_i'\hat\beta_2 + I_i\hat\delta_2) \end{aligned} \tag{8}$$
Note that the predicted insurance status is only used to obtain consistent coefficients. In the final equation and the subsequent calculations of the marginal effect, the original insurance status $I_i$ is used.
In this thesis, the incremental effect of having insurance on medical expenses is of interest. This effect is
$$\begin{aligned} \Delta(\log(y_i) \mid I_i) &= E[\log(y_i) \mid I_i = 1] - E[\log(y_i) \mid I_i = 0] \\ &= \Phi(x_i'\hat\gamma_1 + \hat\gamma_2)\,(x_i'\hat\beta_2 + \hat\delta_2) - \Phi(x_i'\hat\gamma_1)\,(x_i'\hat\beta_2) \end{aligned} \tag{9}$$
The insurance coverage effect can be summarized by the mean marginal effect over the full sample of $N$ individuals, $\frac{1}{N}\sum_{i=1}^{N}\Delta(\log(y_i) \mid I_i)$.
² It is assumed that the second equation, the estimation of positive expenditures, satisfies the requirements needed
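Given estimated coefficients, the sample mean of equation (9) is a simple average of individual differences. The Python sketch below assumes that the linear indices $x_i'\hat\gamma_1$ and $x_i'\hat\beta_2$ (excluding the insurance term) have already been computed for each individual; the function name and all numeric inputs are hypothetical.

```python
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal CDF


def two_part_effect(xg1, xb2, g2, d2):
    """Sample mean of equation (9).

    xg1[i]: x_i' gamma1_hat, probit index excluding insurance
    xb2[i]: x_i' beta2_hat, log-level index excluding insurance
    g2, d2: estimated insurance coefficients gamma2_hat and delta2_hat
    """
    effects = [
        PHI(a + g2) * (b + d2) - PHI(a) * b  # insured minus uninsured prediction
        for a, b in zip(xg1, xb2)
    ]
    return sum(effects) / len(effects)


# Hypothetical indices for three individuals and hypothetical coefficients.
print(two_part_effect(xg1=[0.0, 0.4, -0.3], xb2=[7.0, 7.5, 6.8], g2=0.5, d2=0.4))
```

Note that insurance enters both factors, so the effect combines a higher probability of any expenditure with a higher expected log level among users.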
3.2.2 Heckman two-step estimation
The two-part model gives consistent estimators if the separability condition holds: there must be no correlation between the probability of having positive medical expenses and the level of these expenses. However, as described in the literature review, this does not necessarily hold. A high expense probability is usually related to higher expenditures. In that case, the error terms in the two-part model's second equation come from a truncated sample and consequently do not have zero mean. The estimation should therefore be corrected with the inverse Mills ratio $\lambda_i$. Its coefficient reflects how the probability and the level of expenditures are correlated; this correlation is denoted by $\rho$. Note that normality of the error terms of the second equation is a necessary condition.
$$\rho\lambda_i = E[\varepsilon_{2i} \mid y_i > 0] = \rho\,\frac{\phi(x_i'\gamma_1 + I_i\gamma_2)}{\Phi(x_i'\gamma_1 + I_i\gamma_2)} \tag{10}$$
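Equation (10) can be checked numerically. The Python sketch below computes $\lambda(a) = \phi(a)/\Phi(a)$ at a selection index $a = x_i'\gamma_1 + I_i\gamma_2$ and compares $\rho\lambda(a)$ with a Monte Carlo average of the level-equation error among simulated individuals who pass the selection rule $y_i^* > 0$. It assumes standard bivariate normal errors with correlation $\rho$; the function names and the values of $a$ and $\rho$ are arbitrary illustrations.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution


def inverse_mills(a):
    """Inverse Mills ratio lambda(a) = phi(a) / Phi(a) at selection index a."""
    return STD_NORMAL.pdf(a) / STD_NORMAL.cdf(a)


def selected_error_mean(a, rho, draws=200_000, seed=0):
    """Monte Carlo estimate of E[eps2 | eps1 > -a] for standard bivariate normal
    (eps1, eps2) with correlation rho; eps1 > -a is the selection rule
    y* = a + eps1 > 0, i.e. an individual with positive expenditures."""
    rng = random.Random(seed)
    total, count = 0.0, 0
    for _ in range(draws):
        eps1 = rng.gauss(0.0, 1.0)
        eps2 = rho * eps1 + (1.0 - rho ** 2) ** 0.5 * rng.gauss(0.0, 1.0)
        if eps1 > -a:
            total += eps2
            count += 1
    return total / count


a, rho = 0.5, 0.6
print(f"rho * lambda(a) = {rho * inverse_mills(a):.4f}")
print(f"simulated mean  = {selected_error_mean(a, rho):.4f}")
```

With $\rho > 0$ the selected individuals have a positive average error in the level equation, which is the nonzero mean that the two-part model's second equation ignores.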
The first equation in the Heckman two-step method, which estimates the probability of incurring any health expenditures, corresponds to the probit estimation in the two-part model: $P[y_i > 0] = \Phi(x_i'\gamma_1 + \hat I_i\gamma_2)$. Thus, $\gamma$ can be estimated consistently by ML. As a second step, the equation for the level of positive expenses can be written as
$$E[\log(y_i) \mid y_i > 0, x_i] = x_i'\beta_2 + \hat I_i\delta_2 + \rho\sigma_\varepsilon\lambda_i \tag{11}$$
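The final Heckman two-step expenditure estimate is obtained by combining both steps:
$$\begin{aligned} E[\log(y_i)] &= P[y_i > 0]\,E[\log(y_i) \mid y_i > 0] \\ &= \Phi(x_i'\hat\gamma_1 + I_i\hat\gamma_2)\,(x_i'\hat\beta_2 + I_i\hat\delta_2 + \hat\lambda_i\hat\rho\hat\sigma_\varepsilon) \\ &= \Phi(x_i'\hat\gamma_1 + I_i\hat\gamma_2)\,(x_i'\hat\beta_2 + I_i\hat\delta_2) + \hat\rho\hat\sigma_\varepsilon\,\phi(x_i'\hat\gamma_1 + I_i\hat\gamma_2) \end{aligned} \tag{12}$$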
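The corrected incremental effect is then given by
$$\begin{aligned} \Delta(\log(y_i) \mid I_i) &= E[\log(y_i) \mid I_i = 1] - E[\log(y_i) \mid I_i = 0] \\ &= \Phi(x_i'\hat\gamma_1 + \hat\gamma_2)\,(x_i'\hat\beta_2 + \hat\delta_2) - \Phi(x_i'\hat\gamma_1)\,(x_i'\hat\beta_2) \\ &\quad + \hat\rho\hat\sigma_\varepsilon\,\phi(x_i'\hat\gamma_1 + \hat\gamma_2) - \hat\rho\hat\sigma_\varepsilon\,\phi(x_i'\hat\gamma_1) \end{aligned} \tag{13}$$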
Note that in practice the coefficient of the inverse Mills ratio, $\hat\rho\hat\sigma_\varepsilon$, is estimated jointly by ML. A test of the significance of the selection bias (which occurs only if $\rho \neq 0$) can be performed by testing whether the coefficient of $\lambda_i$ is significant.
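Equation (13) differs from the two-part effect only by the term $\hat\rho\hat\sigma_\varepsilon[\phi(x_i'\hat\gamma_1 + \hat\gamma_2) - \phi(x_i'\hat\gamma_1)]$. A minimal Python sketch of the sample mean of equation (13) follows, assuming the linear indices are precomputed for each individual; the function name and all numbers are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()
PHI, phi = STD_NORMAL.cdf, STD_NORMAL.pdf  # standard normal CDF and density


def heckman_effect(xg1, xb2, g2, d2, rho_sigma):
    """Sample mean of equation (13).

    xg1[i]:    x_i' gamma1_hat, probit index excluding insurance
    xb2[i]:    x_i' beta2_hat, log-level index excluding insurance
    g2, d2:    insurance coefficients gamma2_hat and delta2_hat
    rho_sigma: estimated coefficient of the inverse Mills ratio
    """
    effects = [
        PHI(a + g2) * (b + d2) - PHI(a) * b      # terms shared with equation (9)
        + rho_sigma * (phi(a + g2) - phi(a))      # selection-correction terms
        for a, b in zip(xg1, xb2)
    ]
    return sum(effects) / len(effects)


# Hypothetical inputs for three individuals.
xg1, xb2 = [0.0, 0.4, -0.3], [7.0, 7.5, 6.8]
print(heckman_effect(xg1, xb2, g2=0.5, d2=0.4, rho_sigma=0.0))  # no correction
print(heckman_effect(xg1, xb2, g2=0.5, d2=0.4, rho_sigma=0.3))  # with correction
```

Setting `rho_sigma` to zero reproduces the two-part effect, which makes the size of the correction easy to read off.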
3.2.3 Comparing the two-part model and the Heckman two-step estimation
The Heckman model uses the inverse Mills ratio, under the assumption of normally distributed error terms, to correct for the correlation between the probability of positive expenditures and their level. From an economic point of view, this term is expected to be significant and positive. When it is excluded, as in the two-part model's second equation, the inverse Mills ratio acts as an omitted variable. This exclusion has several consequences.
Consider the possibility that $\lambda$ correlates with the exogenous explanatory variables. If the omitted variable $\lambda$ covaries positively both with a covariate and with expenditures, the two-part estimate for that regressor will be higher than the true coefficient value.
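The direction of this bias follows the standard omitted-variable formula. As an illustration only, the Python sketch below simulates a hypothetical regressor `lam` standing in for $\lambda$ that is linear in a covariate `x` (the actual inverse Mills ratio is a nonlinear function of the index) and shows that omitting it inflates the estimated coefficient of `x`.

```python
import random


def slope(x, y):
    """OLS slope of y on a constant and x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


rng = random.Random(7)
n = 20_000
x = [rng.gauss(0, 1) for _ in range(n)]
# Hypothetical stand-in for lambda: cov(x, lam) = 0.5 and lam raises y (coefficient 1.0).
lam = [0.5 * xi + rng.gauss(0, 1) for xi in x]
y = [1.0 * xi + 1.0 * li + rng.gauss(0, 1) for xi, li in zip(x, lam)]

short = slope(x, y)  # regression of y on x that omits lam
# Omitted-variable formula: 1.0 + 1.0 * cov(x, lam) / var(x) = 1.0 + 0.5 = 1.5
print(f"slope without lam: {short:.3f} (true coefficient: 1.0)")
```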
Now suppose that $\lambda_i$ is orthogonal to the explanatory variables $(x_i, z_i)$. Then the estimated coefficients of both models would be equal, but the marginal effects in the two-part model are underestimated compared to the Heckman method. This difference increases as the coefficient of $\lambda$, $\hat\rho\hat\sigma_\varepsilon$, and thus the correlation, increases in absolute value. For this reason the results estimated with the Heckman model are preferred despite being more restrictive.
3.3 Empirical Specification
The previous section models the incremental effect of insurance on medical expenses with several estimation equations. The second equation in both the two-part model and the Heckman model estimates the level of positive expenditure. Utilization data are usually non-normal, right-skewed and heteroskedastic, with a variance that increases with the mean (Diehr, Yanez, Ash, Hornbrook, & Lin, 1999). These features do not necessarily cause problems. If the data set is large, OLS regression on the untransformed data will provide consistent estimates of the regression parameters. The standard errors, however, will typically be too small and give overly significant hypothesis tests. A log transformation is applied to the positive level of expenditures to improve its distribution. In addition, it reduces the influence of outliers and increases the precision of the estimates (Diehr et al., 1999). The final expenditure equation is a combination of a probit and a linear regression. Due to this combination, the standard errors reported by statistical programs are no longer valid. In this thesis the bootstrap technique will be used to estimate standard errors, confidence intervals and p-values³.
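A nonparametric bootstrap resamples individuals with replacement, re-estimates the full model on each resample and uses the spread of the replicated estimates. The Python sketch below shows the generic procedure with a percentile confidence interval. In the thesis setting the `estimator` argument would refit the probit and the log-level regression and return the mean incremental effect; here, purely for illustration, it is a sample mean on hypothetical data so that the result can be compared with the textbook standard error of 0.5 / sqrt(400) = 0.025.

```python
import random
import statistics


def bootstrap(rows, estimator, B=2000, alpha=0.05, seed=0):
    """Resample individuals (rows) with replacement, re-run the estimator on each
    resample, and return the bootstrap standard error and a percentile interval."""
    rng = random.Random(seed)
    n = len(rows)
    reps = sorted(estimator(rng.choices(rows, k=n)) for _ in range(B))
    se = statistics.stdev(reps)
    lower = reps[int(alpha / 2 * B)]
    upper = reps[int((1 - alpha / 2) * B) - 1]
    return se, (lower, upper)


def mean(sample):
    return sum(sample) / len(sample)


data = [0, 1] * 200  # hypothetical sample: 400 individuals, half with positive expenses
se, ci = bootstrap(data, mean)
print(f"bootstrap s.e. {se:.4f}, 95% interval {ci}")
```

The number of replications and the percentile interval are illustrative choices, not settings taken from this thesis.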
4 Data
This section describes the dataset and the variables used.
4.1 Sample description
This study's empirical analysis uses data from the Medical Expenditure Panel Survey (MEPS) for all even years between 2002 and 2012. The MEPS is a representative survey of the U.S. civilian population that was started in 1996 by the U.S. It collects data on demographic characteristics, health insurance coverage and medical expenditures of individual participants, who are drawn from a sample of households. The survey involves five interview rounds and a self-administered questionnaire, and collects data from both participants and health care providers. Moreover, it has been tested for robustness bias and predictive accuracy, which makes it very suitable for health expenditure research (Hill & Miller, 2010).
The sample is limited to adults between the ages of 19 and 64, because individuals in this age range are involved in the choice of a health insurance plan (Holl, Szilagyi, Rodewald, Byrd, & Weitzman, 2012). Children under the age of 19 are excluded, as they participate in their parents' insurance plan if they have any. Adults aged 65 or above are likewise excluded, as in the U.S. most of them are covered by Medicare (Holl et al., 2012). Other exclusion criteria include missing values on the most relevant exogenous variables⁴. The number of observations for the period 2002-2012 ranges between 10043 and 18101.
These observations' characteristics are collected into variables that can be used to study the incremental effect of insurance on medical expenditures. Definitions of the variables can be found in Appendix A. In the MEPS database, health expenditures are the total amount paid for health services during a year, excluding dental services. Insurance coverage provided by MEPS is a dummy variable indicating whether a person was insured for at least one month during the year, including both private and public insurance. The individual characteristics controlled for are demographic, socio-economic and health related. The demographics are age, gender, ethnicity (white, nonwhite), marital status (married, other), family size, and region (North-East, Midwest, South, West). The socioeconomic variables are income and education. Education is included in three categories indicating the highest grade achieved: college or above, high school or an equivalent degree, and less than high school education (the default). Income is given by the logarithm of the total wage income in U.S. dollars a person received during the year. The health related characteristics are self-reported health status, mental health and suffering from priority diseases. Self-reported health status is included in three categories indicating whether a person feels very healthy (survey score >3 on a five-point scale), not healthy (survey score <3 on a five-point scale) or average (default). Poor mental health indicates whether a person's perceived mental health is reported as poor or fair. The priority diseases are defined as such by MEPS and include coronary heart disease, heart attack or myocardial infarction, any heart disease condition, stroke or transient ischemic attack, emphysema, high cholesterol, chronic pain or swelling, arthritis, asthma, diabetes and cancer.
Recalling the endogeneity bias caused by the simultaneous relation between insurance coverage and health expenses discussed in the previous section, additional variables are used as instruments to predict an individual's insurance status. Two dummies indicate whether an individual is or has been employed during the current year and whether his or her employer offers an insurance coverage plan to its employees.
4.2 Summary statistics
Summary statistics of the sample by insurance status for 2012 are reported in Table 1⁵. The first column describes the statistics for the full sample. The second and third columns contain statistics for the insured and uninsured, respectively. Out of the 10043 individuals in the data set, 1406 (14%) are uninsured and 2109 (21%) have had zero expenditures during 2012. The mean expenses of the insured are twice as high as those of the uninsured. Note that nearly 20% of the insured have had no medical expenditures at all, in contrast to almost 50% of the uninsured.
Recall that these differences in health expenditure may reflect advantageous selection effects rather than causal effects of health insurance. The insured are on average approximately five years older than the uninsured. Perceived health and mental health do not differ much between the two groups. However, the insured are less healthy, as 55% of them suffer from a priority disease, in contrast to 40% of the uninsured. Moreover, the insured have a more favorable socioeconomic position. Their average yearly income is $32429, which is twice as high as the uninsured's income. On average 32% of the insured have a college degree, in contrast to 12% of the uninsured.
Similar differences appear in the statistics of the instrumental variables. The employment rate is almost equal in both groups (approximately 70%). However, 30% of the uninsured have an employer that offers an insurance coverage plan, in contrast to 61% of the insured. This might indicate a preference for insurance coverage when it is offered. Both variables are regarded as economically suitable instruments for the insurance decision.
Table 1. Summary Statistics of the Total Sample and by Health Insurance Status

| Variable | Full sample (n=10043) N | Mean (S.E.) | The insured (n=8637) N | Mean (S.E.) | The uninsured (n=1406) N | Mean (S.E.) |
|---|---|---|---|---|---|---|
| Insured | | 0.84 (0.47) | | | | |
| Expenditures | | | | | | |
| Total | 10043 | 3180 (9828) | 6804 | 4121 (11274) | 3239 | 1203 (5170) |
| Positive expenditures | 8458 | 4293 (11207) | 5730 | 4894 (12131) | 1710 | 2279 (6942) |
| Zero expenditures | 1585 | 0.16 (0.36) | 1074 | 0.16 (0.36) | 1529 | 0.47 (0.5) |
| Demographics | | | | | | |
| Age | | 40.72 (13.17) | | 42.09 (13.22) | | 37.85 (12.6) |
| Male | 4659 | 0.46 (0.5) | 3025 | 0.44 (0.5) | 3432 | 0.50 (0.5) |
| White | 6799 | 0.68 (0.47) | 4518 | 0.66 (0.47) | 4792 | 0.70 (0.46) |
| Married | 5036 | 0.50 (0.5) | 3706 | 0.54 (0.5) | 2794 | 0.41 (0.49) |
| Family size | | 3.25 (1.72) | | 3.11 (1.57) | | 3.55 (1.97) |
| Health | | | | | | |
| Excellent health | 1905 | 0.19 (0.39) | 1273 | 0.19 (0.39) | 1328 | 0.20 (0.4) |
| Poor health | 1601 | 0.16 (0.37) | 992 | 0.15 (0.35) | 1279 | 0.19 (0.39) |
| Poor mental health | 1355 | 0.13 (0.34) | 918 | 0.13 (0.34) | 918 | 0.13 (0.34) |
| Priority health | 4986 | 0.50 (0.5) | 3734 | 0.55 (0.5) | 2630 | 0.39 (0.49) |
| Smoke | 1895 | 0.19 (0.39) | 1182 | 0.17 (0.38) | 1498 | 0.22 (0.41) |
| Socioeconomic status | | | | | | |
| Income | | 27174 (32988) | | 32469 (36341) | | 16053 (20435) |
| High school education | 5109 | 0.51 (0.5) | 3437 | 0.51 (0.5) | 3512 | 0.52 (0.5) |
| College education | 2536 | 0.25 (0.43) | 2156 | 0.32 (0.47) | 798 | 0.12 (0.32) |
| Region | | | | | | |
| Midwest | 1735 | 0.17 (0.38) | 1284 | 0.19 (0.38) | 947 | 0.14 (0.35) |
| South | 3739 | 0.37 (0.48) | 2353 | 0.35 (0.48) | 2911 | 0.43 (0.49) |
| West | 2911 | 0.29 (0.45) | 1910 | 0.28 (0.45) | 2103 | 0.31 (0.46) |
| Instruments | | | | | | |
| Insurance offered | 5121 | 0.51 (0.5) | 4184 | 0.61 (0.49) | 1968 | 0.29 (0.45) |
| Employed | 7601 | 0.76 (0.43) | 5197 | 0.76 (0.42) | 5050 | 0.74 (0.44) |
5 Empirical results
This section reports the estimation outcomes for the insurance choice and medical costs in 2012⁶. Then the marginal effects for all even years between 2002 and 2012 are reported and compared.
5.1 Insurance decision
As discussed, the individual's insurance coverage decision is assumed to be endogenous because of the simultaneous causality argument. Whether this reasoning holds statistically can be checked with the Durbin-Wu-Hausman (DWH) test. Outcomes for this test are shown in Table 2. Under the null hypothesis that all regressors are exogenous, the test statistic $nR^2$ asymptotically has a $\chi^2(1)$ distribution. As expected, the test rejects the null hypothesis of exogeneity (p-value = 0.003).
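The regression-based version of the test mirrors the three regressions in Table 2: a first-stage regression of the endogenous variable on the exogenous variables and instruments, the outcome regression, and an auxiliary regression of the outcome residuals on the regressors plus the first-stage residuals, whose $nR^2$ is the test statistic. The Python sketch below reproduces this logic with the standard library on simulated data with one continuous endogenous regressor, one instrument and no further controls; the data-generating process and all function names are hypothetical.

```python
import random
from statistics import NormalDist


def ols(X, y):
    """OLS coefficients from the normal equations (X'X) b = X'y, solved by Gauss-Jordan."""
    k = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda row: abs(A[row][c]))  # partial pivoting
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [ar - f * ac for ar, ac in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]


def residuals(X, y):
    """Residuals of an OLS regression of y on the columns of X."""
    b = ols(X, y)
    return [yi - sum(xj * bj for xj, bj in zip(row, b)) for row, yi in zip(X, y)]


def r_squared(X, y):
    """Centered R^2 of an OLS regression of y on X (X must contain a constant)."""
    e = residuals(X, y)
    ybar = sum(y) / len(y)
    return 1 - sum(ei ** 2 for ei in e) / sum((yi - ybar) ** 2 for yi in y)


def dwh_test(y, x, z):
    """Regression-based DWH test for one endogenous regressor x and one
    instrument z. Returns the n*R^2 statistic and its chi2(1) p-value."""
    n = len(y)
    v = residuals([[1.0, zi] for zi in z], x)   # (1) first-stage residuals
    e = residuals([[1.0, xi] for xi in x], y)   # (2) outcome-equation residuals
    aux = [[1.0, xi, vi] for xi, vi in zip(x, v)]
    stat = max(n * r_squared(aux, e), 0.0)      # (3) auxiliary regression
    p = 2 * (1 - NormalDist().cdf(stat ** 0.5))  # P(chi2(1) > stat)
    return stat, p


# Hypothetical simulated data: u is an unobserved factor driving both x and y.
rng = random.Random(1)
n = 2000
z = [rng.gauss(0, 1) for _ in range(n)]
u = [rng.gauss(0, 1) for _ in range(n)]
x = [zi + ui + rng.gauss(0, 1) for zi, ui in zip(z, u)]
y = [xi + ui + rng.gauss(0, 1) for xi, ui in zip(x, u)]

stat, p = dwh_test(y, x, z)
print(f"n*R^2 = {stat:.1f}, p-value = {p:.4f}")
```

The $\chi^2(1)$ tail probability is computed from the normal distribution, since a $\chi^2(1)$ variable is the square of a standard normal one.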
Employment and whether the individual's employer offers an insurance plan are economically plausible instruments. To be statistically valid, they must be correlated with the insurance variable, they should not directly affect the level of expenditures, and they must be uncorrelated with the error term in the expenditure equation. Following previous studies (e.g. Meer and Rosen, 2004), this thesis relies on the economic arguments given earlier for the first two requirements: relevance and no direct effect on medical expenditures. To check whether the exogeneity assumption is satisfied, the Sargan test for over-identification is used; its outcomes are shown in Table 3. The p-value of this test is 0.313, so the exogeneity of the employment and insurance offer dummies is not rejected.
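The decision rule behind the Sargan test is simple enough to write out. The statistic is $nR^2$ from regressing the IV residuals on all exogenous variables, and it is compared with a $\chi^2$ distribution whose degrees of freedom equal the number of instruments minus the number of endogenous regressors (here 2 - 1 = 1). A minimal sketch, assuming the 5% critical values from standard $\chi^2$ tables; the function names are hypothetical:

```python
def sargan_statistic(n_obs: int, r_squared: float) -> float:
    """Sargan over-identification statistic n * R^2 from the auxiliary regression
    of the IV residuals on all exogenous variables (instruments included)."""
    return n_obs * r_squared

def overid_df(n_instruments: int, n_endogenous: int) -> int:
    """Degrees of freedom: excluded instruments minus endogenous regressors."""
    df = n_instruments - n_endogenous
    if df < 1:
        raise ValueError("the model must be over-identified for a Sargan test")
    return df

# 5% critical values of the chi-square distribution (standard tables).
CHI2_CRIT_5PCT = {1: 3.841, 2: 5.991, 3: 7.815}

def reject_exogeneity(stat: float, df: int) -> bool:
    """True if instrument exogeneity is rejected at the 5% level."""
    return stat > CHI2_CRIT_5PCT[df]

df = overid_df(n_instruments=2, n_endogenous=1)  # employment + insurance offer
print(reject_exogeneity(1.018, df))  # statistic reported in Table 3 -> False
print(reject_exogeneity(7.465, df))  # statistic reported in Table B.3 (2010) -> True
```

The reported statistic of 1.018 with n = 10043 implies an auxiliary $R^2$ of about 0.0001, which is why Table 3 prints it as 0.000.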
⁶ The outputs for other years are bundled in Appendices B-F.
⁷ Other methods to test for endogeneity might be preferred because of the semi-continuous character of medical expenditure data. However, it is assumed that the DWH and Sargan tests performed here give sufficient evidence to confirm endogeneity and to accept or reject the proposed instrumental variables.
Table 2. Durbin-Wu-Hausman test 2012
(1) type: linear regression; dependent variable: Insured
(2) type: linear regression; dependent variable: LogExpenditures
(3) type: linear regression; dependent variable: Residuals Regression (2)
Each cell reports estimate (s.d.) p.
Variable | (1) | (2) | (3)
Insurance offered | 0.336 (0.01) 0.000 | |
Employment | -0.282 (0.02) 0.000 | |
Insured | | 2.060 (0.06) 0.000 | 0.714 (0.21) 0.001
Age | -0.011 (0) 0.000 | -0.027 (0.02) 0.086 | 0.007 (0.02) 0.673
Age2 | 0.000 (0) 0.000 | 0.001 (0) 0.003 | 0.000 (0) 0.606
Male | -0.045 (0.01) 0.000 | -1.171 (0.06) 0.000 | 0.039 (0.06) 0.500
White | -0.032 (0.01) 0.000 | 0.433 (0.06) 0.000 | 0.032 (0.06) 0.603
Married | 0.088 (0.01) 0.000 | 0.329 (0.06) 0.000 | -0.076 (0.07) 0.263
Family size | -0.012 (0) 0.000 | -0.213 (0.02) 0.000 | 0.014 (0.02) 0.449
Health Excellent | 0.010 (0.01) 0.361 | -0.421 (0.07) 0.000 | -0.008 (0.07) 0.917
Health Poor | -0.044 (0.01) 0.001 | 0.797 (0.09) 0.000 | 0.034 (0.09) 0.696
Mental health Poor | 0.040 (0.01) 0.004 | 0.754 (0.09) 0.000 | -0.030 (0.09) 0.747
Health priority | 0.084 (0.01) 0.000 | 1.498 (0.06) 0.000 | -0.071 (0.07) 0.286
Smoke | -0.027 (0.01) 0.016 | -0.088 (0.07) 0.237 | 0.021 (0.07) 0.780
Education HS | 0.105 (0.01) 0.000 | 0.516 (0.07) 0.000 | -0.099 (0.08) 0.202
Education College | 0.208 (0.01) 0.000 | 1.009 (0.09) 0.000 | -0.194 (0.1) 0.060
LogWage | 0.008 (0) 0.000 | -0.023 (0.01) 0.001 | -0.003 (0.01) 0.641
Region MW | -0.052 (0.01) 0.000 | 0.350 (0.1) 0.000 | 0.026 (0.1) 0.791
Region S | -0.129 (0.01) 0.000 | -0.032 (0.08) 0.695 | 0.090 (0.09) 0.295
Region W | -0.077 (0.01) 0.000 | -0.148 (0.09) 0.087 | 0.060 (0.09) 0.493
Residual Regression (1) | | -0.789 (0.22) 0.000 |
Constant term | 0.788 (0.05) 0.000 | 3.428 (0.32) 0.000 | -0.524 (0.35) 0.135
Observations | 10043 | 10043 | 10043
R2 | 0.206 | 0.333 | 0.001
Test outcome: test statistic 13.245, p value 0.000
Table 3. Sargan test 2012
(1) type: linear regression; dependent variable: LogExpenditures
(2) type: linear regression; dependent variable: Residuals Regression (1)
Each cell reports estimate (s.d.) p.
Variable | (1) | (2)
Insured IV | 2.774 (0.22) 0.000 |
Insurance offered | | -0.020 (0.08) 0.785
Employment | | -0.146 (0.16) 0.353
Age | -0.020 (0.02) 0.218 | 0.000 (0.02) 0.999
Age2 | 0.000 (0) 0.022 | 0.000 (0) 0.984
Male | -1.132 (0.06) 0.000 | -0.002 (0.06) 0.972
White | 0.465 (0.06) 0.000 | 0.000 (0.06) 0.994
Married | 0.254 (0.07) 0.000 | 0.000 (0.07) 0.998
Family size | -0.199 (0.02) 0.000 | 0.000 (0.02) 0.991
Health Excellent | -0.429 (0.08) 0.000 | 0.000 (0.08) 0.998
Health Poor | 0.831 (0.09) 0.000 | -0.004 (0.09) 0.963
Mental health Poor | 0.724 (0.1) 0.000 | -0.005 (0.1) 0.957
Health priority | 1.426 (0.07) 0.000 | 0.001 (0.07) 0.991
Smoke | -0.067 (0.08) 0.388 | 0.002 (0.08) 0.981
Education HS | 0.418 (0.08) 0.000 | 0.002 (0.08) 0.976
Education College | 0.815 (0.11) 0.000 | -0.001 (0.09) 0.995
LogWage | -0.026 (0.01) 0.000 | 0.014 (0.02) 0.373
Region MW | 0.376 (0.1) 0.000 | 0.004 (0.1) 0.969
Region S | 0.058 (0.09) 0.520 | 0.002 (0.09) 0.985
Region W | -0.088 (0.09) 0.340 | 0.001 (0.09) 0.992
Constant term | 2.904 (0.37) 0.000 |
Observations | 10043 | 10043
R2 | 0.275 | 0.000
Test outcome: test statistic 1.018, p value 0.313
As the proposed instruments are not rejected, a probit model can be estimated to predict each individual's insurance coverage status. The purpose of this estimation is merely to correct for the endogeneity bias. However, its results also provide information about the individual's decision process and are therefore worth discussing.
The results of this estimation are shown in Table 4. The insurance offer variable has the largest marginal effect on the probability of being insured (p-value < 0.01): an employer offering insurance to its employees increases the probability of being insured by more than 30%. Education also has a significant effect on insurance coverage. People with a high school or a college degree have, on average, a 10% and 21% higher probability of being insured, respectively, than individuals without a high school degree (p-value < 0.01). Poor mental health and suffering from a priority disease both have a significant positive impact on insurance coverage. Other significant variables with a positive effect are age, wage and marital status.
For most coefficients, the marginal effects on the probability of being insured are in line with economic theory, so the model should be suitable for generating the insurance prediction. The share of individuals predicted to be insured is 87%, only slightly higher than the share observed in the sample (84%). Moreover, approximately 80% of the observations are correctly classified (the hit ratio in Table 4).
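Both summary figures follow directly from the fitted probit probabilities. A minimal sketch, assuming the usual 0.5 cut-off for classifying an individual as insured (the threshold used in the thesis is not stated here); the data are a toy illustration, not MEPS observations:

```python
from typing import Sequence

def classify(probs: Sequence[float], threshold: float = 0.5) -> list[int]:
    """Turn fitted probabilities into a 0/1 insurance prediction."""
    return [1 if p > threshold else 0 for p in probs]

def predicted_share(pred: Sequence[int]) -> float:
    """Share of individuals predicted to be insured."""
    return sum(pred) / len(pred)

def hit_ratio(pred: Sequence[int], actual: Sequence[int]) -> float:
    """Share of observations whose predicted status equals the observed status."""
    if len(pred) != len(actual):
        raise ValueError("prediction and outcome vectors differ in length")
    return sum(p == a for p, a in zip(pred, actual)) / len(actual)

# Toy illustration: fitted probabilities and observed insurance status.
probs = [0.92, 0.81, 0.64, 0.47, 0.33, 0.71, 0.58, 0.12]
insured = [1, 1, 0, 1, 0, 1, 1, 0]
pred = classify(probs)
print(predicted_share(pred), hit_ratio(pred, insured))  # 0.625 0.75
```

The predicted share can exceed the observed share (as with 87% versus 84% above) while the hit ratio stays well below one, because errors in both directions count against the hit ratio.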
Table 4. Insurance prediction
type: probit regression
dependent variable: Insured
Variable | estimate (s.d.) | p-value | ME
Insurance offered | 0.995 (0.03) | 0.000 | 0.926
Employment | -0.820 (0.07) | 0.000 | -0.961
Age | -0.042 (0.01) | 0.000 | -0.057
Age2 | 0.001 (0) | 0.000 | 0.000
Male | -0.142 (0.03) | 0.000 | -0.199
White | -0.095 (0.03) | 0.002 | -0.156
Married | 0.290 (0.03) | 0.000 | 0.227
Family size | -0.035 (0.01) | 0.000 | -0.053
Health Excellent | 0.030 (0.04) | 0.426 | -0.044
Health Poor | -0.150 (0.04) | 0.000 | -0.234
Mental health Poor | 0.110 (0.05) | 0.015 | 0.021
Health priority | 0.275 (0.03) | 0.000 | 0.211
Smoke | -0.095 (0.04) | 0.010 | -0.167
Education HS | 0.300 (0.03) | 0.000 | 0.232
Education College | 0.715 (0.05) | 0.000 | 0.627
LogWage | 0.027 (0.01) | 0.000 | 0.013
Region MW | -0.188 (0.05) | 0.000 | -0.288
Region S | -0.441 (0.04) | 0.000 | -0.526
Region W | -0.272 (0.05) | 0.000 | -0.360
Constant term | 1.006 (0.16) | 0.000 | 0.694
Observations 10043
Pseudo R2 0.175
Hit ratio 0.80
5.2
Expenditure equation
In the previous section, each individual's insurance status was predicted. Recall that this is a required step to address the endogeneity problem arising from the simultaneous causality between health insurance and expenditures. We therefore assume that replacing the original insurance variable with its prediction in the following equations deals with the endogeneity issue. The expenditure equations are estimated with the two-part model and the Heckman two-step model.
First stage: Positive expenditure probability
Table 5 shows the estimates for the individual probability of having any positive medical expenditures. One of the most important questions here is how this probability is affected by insurance status. The average marginal effect of insurance coverage on this probability is 9.5 percentage points and significant (p-value < 0.05). This means that if everyone in the sample moved from uninsured to insured, the probability of having any positive medical expenses would increase by 9.5 percentage points on average. Health variables also have a significant effect. People with poor mental health or poor perceived health have a 6 to 7.5 percentage point higher probability of accessing health care than individuals in average health. Moreover, suffering from a priority disease increases the expenditure probability by approximately 18 percentage points. Individuals who feel very healthy are slightly less likely (by about 4 percentage points) to have any positive expenditures than individuals in average health. In line with economic reasoning, high school and college education both have a positive effect. Another interesting finding is that income does not have a significant impact (p-value > 0.05) on the probability of positive health expenditures once the insurance decision is taken into account. Being male and family size have a significant negative effect, while smoking is not significant.
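The interpretation given above, moving everyone from uninsured to insured, corresponds to an average discrete-change effect of a dummy regressor in a probit model: for each individual the predicted probability is computed with the dummy set to 1 and to 0, and the differences are averaged. A minimal sketch, assuming a probit index with a constant, the dummy and one further covariate; the coefficients and covariate values are hypothetical, not the estimates of Table 5:

```python
import math
from typing import Sequence

def norm_cdf(z: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def probit_ame_dummy(const: float, beta_d: float, beta_x: float,
                     xs: Sequence[float]) -> float:
    """Average discrete-change effect of a 0/1 regressor d in the probit model
    P(y = 1) = Phi(const + beta_d * d + beta_x * x), averaged over observed x."""
    diffs = [norm_cdf(const + beta_d + beta_x * x) - norm_cdf(const + beta_x * x)
             for x in xs]
    return sum(diffs) / len(diffs)

# Hypothetical coefficients and covariate values, for illustration only.
xs = [-1.0, 0.0, 0.5, 1.0, 2.0]
ame = probit_ame_dummy(const=0.5, beta_d=0.3, beta_x=0.4, xs=xs)
print(round(ame, 3))
```

Because the normal density never exceeds $1/\sqrt{2\pi} \approx 0.40$, the average effect is always smaller than 0.40 times the probit coefficient; this is why the coefficient of 0.305 in Table 5 translates into a marginal effect of only 0.095.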
Table 5. Probability positive expenditures
type: probit regression
dependent variable: Expenditures
Variable | estimate (s.d.) | p-value | ME
Insured prediction | 0.305 (0.04) | 0.000 | 0.095
Age | -0.019 (0.01) | 0.026 | -0.006
Age2 | 0.000 (0) | 0.003 | 0.000
Male | -0.550 (0.03) | 0.000 | -0.162
White | 0.142 (0.03) | 0.000 | 0.042
Married | 0.194 (0.04) | 0.000 | 0.057
Family size | -0.095 (0.01) | 0.000 | -0.028
Health Excellent | -0.122 (0.04) | 0.001 | -0.037
Health Poor | 0.223 (0.05) | 0.000 | 0.061
Mental health Poor | 0.279 (0.05) | 0.000 | 0.075
Health priority | 0.624 (0.03) | 0.000 | 0.181
Smoke | -0.052 (0.04) | 0.187 | -0.015
Education HS | 0.208 (0.04) | 0.000 | 0.061
Education College | 0.510 (0.05) | 0.000 | 0.134
LogWage | 0.003 (0) | 0.464 | 0.001
Region MW | 0.230 (0.05) | 0.000 | 0.063
Region S | -0.028 (0.04) | 0.527 | -0.008
Region W | -0.036 (0.05) | 0.438 | -0.010
Constant term | 0.498 (0.17) | 0.003 | 0.000
Observations 10043
Pseudo R2 0.181
Hit ratio 0.80
Second stage: Estimating positive expenditures
Table 6 reports the estimated medical utilization equations, conditional on expenditures being positive. Column (1) shows the estimates for the two-part model; column (2) shows those for the Heckman two-step model.
First, note that for most variables the coefficients on the level of positive expenditures have the same sign as in the previous stage. Insurance coverage increases the level of expenditures by about 0.20 log points in the two-part model and 0.35 log points in the Heckman model. Remarkably, income has a significant but negative impact on the amount of medical expenditures, contrary to what economic theory suggests. This finding is similar to that of Shen (2013), and its magnitude is small: since wage enters the model in logarithms, its coefficient of about -0.02 is an elasticity. Health variables have the largest impact. Poor mental health, poor perceived health and suffering from a priority disease increase expected positive expenses by roughly 0.45 to 0.58 log points in the two-part model and 0.53 to 0.81 log points in the Heckman model.
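Because the dependent variable in Table 6 is the logarithm of expenditures, its coefficients are changes in log points. For a dummy such as the insurance prediction, the exact proportional change is $e^{\beta} - 1$, which is close to $\beta$ only when $\beta$ is small; for a regressor that is itself in logs, such as LogWage, the coefficient is an elasticity. A short sketch of both conversions, using the coefficients reported in Table 6:

```python
import math

def dummy_pct_change(beta: float) -> float:
    """Exact % change in expenditures when a 0/1 regressor switches on,
    in a regression of log(expenditures)."""
    return 100.0 * (math.exp(beta) - 1.0)

def elasticity_pct_change(beta: float, pct_change_x: float) -> float:
    """Approximate % change in expenditures for a pct_change_x % change in a
    log-transformed regressor (log-log coefficient = elasticity)."""
    return beta * pct_change_x

# Insurance coefficients from Table 6: two-part model 0.201, Heckman 0.351.
for label, beta in [("two-part", 0.201), ("Heckman", 0.351)]:
    print(f"{label}: {100 * beta:.1f} log points = {dummy_pct_change(beta):.1f}% exact")

# LogWage coefficient (two-part model): effect of a 10% higher wage.
print(f"{elasticity_pct_change(-0.023, 10.0):.2f}%")
```

The exact effects of insurance on the level of positive expenditures are about 22% (two-part) and 42% (Heckman); the gap to the log-point reading grows with the size of the coefficient. The wage elasticity implies that a 10% higher wage goes with roughly 0.23% lower positive expenditures.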
Moreover, the outcomes show that the inverse Mills ratio (λ) has a significant and positive coefficient. This indicates a positive correlation between the unobserved factors that drive the probability of positive expenses and those that drive their level. According to the theory discussed earlier, the two-part model's estimates are biased if λ is correlated with the covariates and expenditures. Consequently, the two-part marginal effects discussed in the next subsection are underestimated.
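The role of λ can be made explicit. As an illustration (this notation is assumed here and may differ from the symbols used in Section 3.2.2), write the selection equation as $s^* = z\gamma + u$ and the level equation as $y = x\beta + \varepsilon$, with $(u, \varepsilon)$ jointly normal, $\mathrm{Var}(u) = 1$, $\mathrm{Var}(\varepsilon) = \sigma^2$ and $\mathrm{Corr}(u, \varepsilon) = \rho$. Conditional on positive expenditures:

```latex
\mathbb{E}[\,y \mid x, z, s^* > 0\,]
  = x\beta + \mathbb{E}[\,\varepsilon \mid u > -z\gamma\,]
  = x\beta + \rho\sigma\,\lambda(z\gamma),
\qquad
\lambda(z\gamma) = \frac{\phi(z\gamma)}{\Phi(z\gamma)} .
```

The coefficient on λ therefore estimates $\rho\sigma$, so a significant positive estimate (0.915 in Table 6) implies $\rho > 0$. A level regression that omits λ, as the second part of the two-part model does, suffers from omitted-variable bias whenever $\lambda(z\gamma)$ is correlated with the covariates $x$.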
Table 6. Level of positive expenditures
type: linear regression
dependent variable: LogExpenditures
(1) (2) estimate (s.d.) p estimate (s.d.) p Insured prediction 0.201 (0.06) 0.000 0.351 (0.08) 0.000 Age -0.038 (0.01) 0.000 -0.038 (0.01) 0.000 Age2 0.001 (0) 0.000 0.001 (0) 0.000 Male -0.334 (0.04) 0.000 -0.530 (0.08) 0.000 White 0.208 (0.04) 0.000 0.256 (0.05) 0.000 Married 0.091 (0.04) 0.037 0.165 (0.05) 0.001 Family size -0.072 (0.01) 0.000 -0.108 (0.02) 0.000 Health Excellent -0.247 (0.05) 0.000 -0.299 (0.06) 0.000 Health Poor 0.467 (0.06) 0.000 0.542 (0.06) 0.000 Mental health Poor 0.453 (0.06) 0.000 0.533 (0.07) 0.000 Health priority 0.578 (0.04) 0.000 0.811 (0.09) 0.000 Smoke -0.042 (0.05) 0.405 -0.058 (0.05) 0.263 Education HS 0.336 (0.05) 0.000 0.415 (0.06) 0.000 Education College 0.571 (0.06) 0.000 0.744 (0.09) 0.000 LogWage -0.023 (0) 0.000 -0.021 (0) 0.000 Region MW -0.075 (0.06) 0.230 0.000 (0.07) 0.999 Region S -0.169 (0.06) 0.002 -0.176 (0.06) 0.002 Region W -0.243 (0.06) 0.000 -0.253 (0.06) 0.000 Constant term 6.955 (0.22) 0.000 6.316 (0.31) 0.000 Lambda 0.915 (0) 0.000 Observations 10043 10043
5.3
Unconditional Marginal Effects
The unconditional marginal effects for both models are shown in Table 7. The two-part model estimates the marginal impact of insurance on expenditures to be 69% on average. The Heckman estimate is slightly higher, at 77%. The difference between the marginal effects of the two models is as expected, because the coefficient on the inverse Mills ratio is positive and significantly different from zero. The two-part model, which has been used extensively in the existing literature, therefore consistently underestimates the effects of the explanatory variables. The Heckman estimates, on the other hand, depend on the (joint) normality assumption; if this assumption is incorrectly imposed, the resulting estimator is typically inconsistent.
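An unconditional effect combines both parts of a model: the expected outcome is the probability of positive expenditures times the conditional level, and a change in insurance status moves both factors. The sketch below assumes that structure, with the conditional level equal to $x\beta$ in the two-part model and $x\beta + \theta\lambda(z\gamma)$ in the Heckman model, and the outcome measured on the scale of the second-part dependent variable. It is an illustration of the decomposition, not necessarily the exact formula behind Table 7. Only the insurance coefficients (0.305 from Table 5; 0.201, 0.351 and λ = 0.915 from Table 6) come from the thesis; the individual indices are hypothetical.

```python
import math
from typing import Sequence

def norm_cdf(z: float) -> float:
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def norm_pdf(z: float) -> float:
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def conditional_level(xb: float, zg: float, theta: float) -> float:
    """E[y | y > 0]: x*beta plus theta * lambda(z*gamma), lambda = phi / Phi.
    theta = 0 gives the two-part model."""
    return xb + theta * norm_pdf(zg) / norm_cdf(zg)

def dummy_effect(zg0: float, xb0: float, gamma_d: float, beta_d: float,
                 theta: float = 0.0) -> tuple[float, float, float]:
    """Change in E[y] = P(y > 0) * E[y | y > 0] when a dummy switches 0 -> 1.

    Returns (total, participation part, level part) via the exact split
    P1*L1 - P0*L0 = (P1 - P0)*L0 + P1*(L1 - L0).
    """
    zg1, xb1 = zg0 + gamma_d, xb0 + beta_d
    p0, p1 = norm_cdf(zg0), norm_cdf(zg1)
    l0 = conditional_level(xb0, zg0, theta)
    l1 = conditional_level(xb1, zg1, theta)
    participation = (p1 - p0) * l0
    level = p1 * (l1 - l0)
    return p1 * l1 - p0 * l0, participation, level

def average_dummy_effect(zg0s: Sequence[float], xb0s: Sequence[float],
                         gamma_d: float, beta_d: float, theta: float = 0.0) -> float:
    """Average total effect; zg0s and xb0s are each individual's probit and
    level indices evaluated with the dummy set to 0."""
    totals = [dummy_effect(zg, xb, gamma_d, beta_d, theta)[0]
              for zg, xb in zip(zg0s, xb0s)]
    return sum(totals) / len(totals)

# Hypothetical indices for four individuals (dummy set to 0).
zg0s = [0.2, 0.8, 1.2, 1.6]
xb0s = [6.0, 6.5, 7.0, 7.5]
tpm = average_dummy_effect(zg0s, xb0s, gamma_d=0.305, beta_d=0.201)
heckman = average_dummy_effect(zg0s, xb0s, gamma_d=0.305, beta_d=0.351, theta=0.915)
print(f"two-part: {tpm:.3f}  Heckman-type: {heckman:.3f}")
```

The split into a participation part and a level part makes it visible that a larger level coefficient, as in the Heckman column of Table 6, feeds directly into a larger unconditional effect.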
Another way to assess the validity of the results is to compare them with the outcomes of the research discussed earlier. Recall that earlier experimental research estimated the effect at approximately 60% (Manning et al., 1987), parametric research found an effect of approximately 80%, and semi-parametric research about 50%. The estimates derived in this paper are similar to those of these studies. However, caution is needed, as earlier research is based on different data sets, models and strategies. For example, Shen uses MEPS data for 2015, limits the sample to working individuals and only estimates the effect of private insurance coverage on expenditures. The results above are based on a sample that includes both people with outpatient use and people with inpatient use. The RAND experiment study notes that the distribution of medical expenditures differs between these two groups (Manning et al., 1987). To address this issue, the model has been re-estimated for individuals with only outpatient use, and the results are similar. Detailed results are available on request.
Table 7. Unconditional marginal effect on expenditures (2012)
Variable | (1) two-part model: ME (s.d. ME) | (2) Heckman two-step: ME (s.d. ME)
Insured | 0.689 (0.08) | 0.772 (0.09)
Age | -0.062 (0.02) | -0.065 (0.02)
Age2 | 0.001 (0) | 0.001 (0)
Male | -1.220 (0.06) | -1.349 (0.07)
White | 0.407 (0.06) | 0.448 (0.07)
Married | 0.410 (0.07) | 0.456 (0.08)
Family size | -0.222 (0.02) | -0.246 (0.02)
Health Excellent | -0.400 (0.08) | -0.439 (0.09)
Health Poor | 0.742 (0.1) | 0.812 (0.11)
Mental health Poor | 0.830 (0.1) | 0.902 (0.11)
Health priority | 1.533 (0.06) | 1.693 (0.07)
Smoke | -0.123 (0.08) | -0.140 (0.09)
Education HS | 0.619 (0.08) | 0.678 (0.09)
Education College | 1.327 (0.1) | 1.457 (0.11)
LogWage | -0.012 (0.01) | -0.012 (0.01)
Region MW | 0.350 (0.11) | 0.399 (0.12)
Region S | -0.176 (0.09) | -0.186 (0.1)
Region W | -0.245 (0.09) | -0.258 (0.1)
5.4
Stability over time
Another way to interpret the results is to examine the effect of insurance on health expenditures over time. The method of this paper is applied in the same way to all even years between 2002 and 2012: first, each individual's insurance status is predicted; then, this prediction replaces the original insurance status in the expenditure equations. Descriptive statistics and outcomes for each estimation equation are displayed in Appendices B-F. Table 8 shows the final result: the marginal effect of insurance status on medical expenditures for each year analyzed. The effect does not appear to vary greatly over the last decade. The lowest effect is found in 2010, where being insured raises health care expenditures on average by 51% according to the two-part model and by 53% according to the Heckman model. The highest effects are found in 2004 and 2006 for the two-part model (approximately 73%) and in 2012 for the Heckman model (77%). Note that these differences must be interpreted in light of each year's descriptive statistics and regression outcomes. In addition, the hypothesis that employment and insurance offered are exogenous, and therefore valid instruments for insurance status, is rejected in 2008 and 2010. The estimates for these years are therefore likely to be biased.
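The year-by-year comparison can be reproduced directly from the figures in Table 8. A small sketch that finds, for each model, the years with the lowest and highest marginal effect, and the average gap between the Heckman and two-part estimates (in the units of the table):

```python
# Marginal effect of insurance from Table 8: year -> (two-part, Heckman).
TABLE_8 = {
    2002: (0.617, 0.638),
    2004: (0.727, 0.742),
    2006: (0.729, 0.764),
    2008: (0.621, 0.633),
    2010: (0.509, 0.528),
    2012: (0.689, 0.772),
}

def extremes(column: int) -> tuple[int, int]:
    """Years with the lowest and highest effect for one model
    (column 0 = two-part model, column 1 = Heckman model)."""
    low = min(TABLE_8, key=lambda year: TABLE_8[year][column])
    high = max(TABLE_8, key=lambda year: TABLE_8[year][column])
    return low, high

def mean_gap() -> float:
    """Average Heckman minus two-part difference across years."""
    gaps = [heck - tpm for tpm, heck in TABLE_8.values()]
    return sum(gaps) / len(gaps)

for col, name in [(0, "two-part"), (1, "Heckman")]:
    low, high = extremes(col)
    print(f"{name}: lowest {low} ({TABLE_8[low][col]:.3f}), "
          f"highest {high} ({TABLE_8[high][col]:.3f})")
print(f"mean Heckman - two-part gap: {mean_gap():.3f}")
```

For both models the lowest effect falls in 2010; the highest two-part effect is in 2006 (with 2004 almost identical) and the highest Heckman effect in 2012. The Heckman estimates exceed the two-part estimates in every year, by about 0.03 on average.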
Table 8: Marginal effect insurance on expenditures over time
Year | (1) ME TPM (s.d. ME) | (2) ME Heckman (s.d. ME)
2002 | 0.617 (0.07) | 0.638 (0.08)
2004 | 0.727 (0.07) | 0.742 (0.08)
2006 | 0.729 (0.07) | 0.764 (0.07)
2008 | 0.621 (0.07) | 0.633 (0.08)
2010 | 0.509 (0.07) | 0.528 (0.08)
2012 | 0.689 (0.08) | 0.772 (0.09)
Figure 1: Marginal effect of insurance on expenditures over time
6
Conclusion
Using the Medical Expenditure Panel Survey, this paper has examined the effect of health insurance on health care expenditures. From an economic point of view, insurance status is likely to be an endogenous variable; not taking this into account would make the statistical relationship difficult to interpret. The Durbin-Wu-Hausman test confirms this intuition: exogeneity is rejected. To address this problem, instrumental variable estimation was applied. Economically and statistically, employment and whether an employer offers an insurance plan to its employees appear to be suitable instruments for the insurance variable. Insurance status is predicted with a probit model, and this prediction replaces the original insurance status in the expenditure equations.
Two models that have been used extensively in the previous literature were estimated: the two-part model and the Heckman two-step model. Both find a positive and statistically significant effect of insurance. Before turning to the final results, the differences between the two sets of estimates are worth noting: in the expenditure part, the covariate coefficients of the two-part model are mostly smaller in absolute value than the Heckman estimates.
Moreover, the marginal effects, which are of main interest in this paper, also differ between the two models. As anticipated in the modeling section, the two-part marginal effects underestimate the insurance effect. The Heckman model estimates are, on average, 4% higher. However, it is remarkable that the differences are not larger. Previous experimental and parametric studies have estimated the effect of insurance, and of private insurance, at between 60% and 80%. This coincides with the results found in this paper. Another way to interpret the estimated impact is to analyze whether it has been stable over the past decade. For all years we find similar results: the two-part marginal effects are lower than the Heckman ones, but neither the differences nor the effects change greatly over time. Insurance increases medical costs by between 51% and 73% according to the two-part model, and by between 53% and 77% according to the Heckman model. These results are, again, comparable to those found in the previous literature and therefore seem plausible.
There are some limitations, and consequently some directions for future research, that must be pointed out. First, this thesis uses parametric models that assume normally distributed error terms. In practice, however, there is no particular reason to make such an assumption, so further analysis with semi-parametric or non-parametric models is highly recommended.
Secondly, if the event of incurring any positive expenditures is regarded as a choice to visit a doctor, then the explanatory variables might differ between the first and the second equation of both the two-part and the Heckman two-step model. Research on this property is advised. Further, this study relies on data from the U.S., so the outcomes might not apply to other populations. Moreover, as Shen (2013) and others did, it might be interesting to treat only private insurance as an endogenous variable, because other types of insurance might not be a choice for the consumer: accepting coverage that is offered for free is hardly a decision.
References
Arrow, K. J. (1963). Uncertainty and the welfare economics of medical care. The American
economic review, 941–973.
Cohen, J. T., Neumann, P. J., & Weinstein, M. C. (2008). Does preventive care save money? health
economics and the presidential candidates. New England Journal of Medicine, 358(7), 661–
663.
Crystal, S., Johnson, R. W., Harman, J., Sambamoorthi, U., & Kumar, R. (2000). Out-of-pocket health care costs among older Americans. Journal of Gerontology: Social Sciences, 55(1), S51–S62.
Diehr, P., Yanez, D., Ash, A., Hornbrook, M., & Lin, D. (1999). Methods for analyzing health care
utilization and costs. Annual review of public health, 20(1), 125–144.
Duan, N., Manning, W. G., Morris, C. N., & Newhouse, J. P. (1984). Choosing between the
sample-selection model and the multi-part model. Journal of Business & Economic Statistics, 2(3),
283–289.
Friedman, B. (1974). Risk aversion and the consumer choice of health insurance option. The
Review of Economics and Statistics, 209–214.
Heij, C., De Boer, P., Franses, P. H., Kloek, T., Van Dijk, H. K., et al. (2004). Econometric methods
with applications in business and economics. Oxford University Press.
Hill, S. C., & Miller, G. E. (2010). Health expenditure estimation and functional form: applications
of the generalized gamma and extended estimating equations models. Health economics,
19(5), 608–627.
Holl, J. L., Szilagyi, P. G., Rodewald, L. E., Byrd, R. S., & Weitzman, M. L. (2012). Profile of
uninsured in the united states. Archives of pediatrics & adolescent medicine, 149(4), 398–
406.
Jerant, A., Fiscella, K., Tancredi, D. J., & Franks, P. (2013). Health insurance is associated with
preventive care but not personal health behaviors. The Journal of the American Board of
Family Medicine, 26(6), 759–767.
Long, S. K., Kenney, G. M., Zuckerman, S., Wissoker, D., Shartzer, A., Karpman, M., & Anderson,
N. (2014). Quicktake: Number of uninsured adults continues to fall under the aca: Down by
8.0 million in june 2014. Washington, DC: Urban Institute.
Manning, W. G., Newhouse, J. P., Duan, N., Keeler, E. B., & Leibowitz, A. (1987). Health
insurance and the demand for medical care: evidence from a randomized experiment. The
American economic review, 251–277.
McDonalds, M. (2010). A profile of the uninsured persons in the united states. Pfizer facts.
Meer, J., & Rosen, H. S. (2004). Insurance and the utilization of medical services. Social Science
& Medicine, 58(9), 1623–1632.
Miller, G. E., Banthin, J. S., & Moeller, J. F. (2004). Covering the uninsured: Estimates of the
impact on total health expenditures for 2002.
Shen, C. (2013). Determinants of health care decisions: Insurance, utilization, and expenditures.
The Review of Economics and Statistics, 95(1), 142–153.
Appendix A - Variable definitions
Table A.1. Variable Definitions
Variable | Definition
Age | Individual's age at the end of the current year
Age2 | Individual's age squared
Education College | Dummy=1 if the individual's highest degree is College
Education HS | Dummy=1 if the individual's highest degree is High School
Employment | Dummy=1 if the individual has been employed for at least one month in the current year
Family size | Individual's family size
Health Excellent | Perceived health. Dummy=1 if the individual reported a score of 4 or higher on a 5-point scale
Health Poor | Perceived health. Dummy=1 if the individual reported a score of 2 or lower on a 5-point scale
Health priority | Dummy=1 if the individual has ever been diagnosed with one of the following conditions: high blood pressure, coronary heart disease, angina, heart attack, myocardial infarction, stroke, emphysema, chronic bronchitis, high cholesterol, cancer, diabetes, joint pain, arthritis, asthma, attention deficit disorder
Insurance offered | Dummy=1 if the individual's employer offers an insurance plan to its employees
Insured | Dummy=1 if the individual has been insured for at least one month in the current year
Insured prediction | Dummy=1 if the probit model predicts that the individual has been insured for at least one month in the current year
LogExpenditure | Logarithm of health care expenditures in the current year (US dollars)
LogWage | Logarithm of yearly wage income in the current year (US dollars)
Male | Gender. Dummy=1 if the individual is male
Married | Marital status. Dummy=1 if the individual has been married at any time during the current year
Mental health Poor | Perceived mental health. Dummy=1 if the individual reported a score of 4 or higher on a 5-point scale
Region MW | Dummy=1 if the individual lives in the Midwest region
Region S | Dummy=1 if the individual lives in the South region
Region W | Dummy=1 if the individual lives in the West region
Smoke | Dummy=1 if the individual currently smokes
White | Ethnicity. Dummy=1 if the individual is white
Table B.1. Summary statistics of the total sample and by health insurance status 2010
Variable | Full sample (n=16210) Mean (S.D.) | The insured Mean (S.D.) | The uninsured Mean (S.D.)
Insured | 0.70 (0.46) | |
Expenditures
Total | 3341 (9510) | 4165 (10674) | 1422 (5519)
Zero expenditures | 0.14 (0.35) | 0.14 (0.35) | 0.44 (0.50)
Positive expenditures | 4345 (10642) | 4844 (11368) | 2552 (7197)
Demographics
Age | 41.20 (13.08) | 42.66 (12.98) | 37.82 (12.68)
Male | 0.46 (0.50) | 0.44 (0.50) | 0.51 (0.50)
White | 0.70 (0.46) | 0.70 (0.46) | 0.72 (0.45)
Married | 0.52 (0.50) | 0.57 (0.50) | 0.40 (0.49)
Family size | 3.17 (1.67) | 3.06 (1.57) | 3.43 (1.86)
Health
Excellent health | 0.20 (0.40) | 0.20 (0.40) | 0.20 (0.40)
Poor health | 0.14 (0.34) | 0.13 (0.34) | 0.16 (0.36)
Poor mental health | 0.13 (0.34) | 0.13 (0.34) | 0.14 (0.35)
Priority health | 0.52 (0.50) | 0.57 (0.50) | 0.41 (0.49)
Smoke | 0.20 (0.40) | 0.18 (0.38) | 0.25 (0.44)
Socioeconomic status
High school education | 0.44 (0.50) | 0.39 (0.49) | 0.54 (0.50)
College education | 0.49 (0.50) | 0.56 (0.50) | 0.32 (0.47)
Income | 26890 (31415) | 31923 (34288) | 15175 (18748)
Region
Midwest | 0.21 (0.40) | 0.23 (0.42) | 0.16 (0.36)
South | 0.38 (0.48) | 0.35 (0.48) | 0.45 (0.50)
West | 0.27 (0.44) | 0.26 (0.44) | 0.28 (0.45)
Instruments
Insurance offered | 0.51 (0.50) | 0.61 (0.49) | 0.26 (0.44)
Employed | 0.76 (0.42) | 0.78 |
Table B.2. Durbin-Wu-Hausman test 2010
(1) type: linear regression; dependent variable: Insured
(2) type: linear regression; dependent variable: LogExpenditures
(3) type: linear regression; dependent variable: Residuals Regression (2)
Each cell reports estimate (s.d.) p.
Variable | (1) | (2) | (3)
Insurance offered | 0.327 (0.01) 0.000 | |
Employment | -0.307 (0.02) 0.000 | |
Insured | | 1.990 (0.05) 0.000 | 0.316 (0.16) 0.046
Age | -0.005 (0) 0.010 | -0.038 (0.01) 0.002 | 0.001 (0.01) 0.958
Age2 | 0.000 (0) 0.000 | 0.001 (0) 0.000 | 0.000 (0) 0.892
Male | -0.052 (0.01) 0.000 | -1.026 (0.04) 0.000 | 0.017 (0.04) 0.700
White | -0.025 (0.01) 0.000 | 0.327 (0.05) 0.000 | 0.011 (0.05) 0.815
Married | 0.111 (0.01) 0.000 | 0.390 (0.05) 0.000 | -0.038 (0.05) 0.482
Family size | -0.010 (0) 0.000 | -0.208 (0.01) 0.000 | 0.005 (0.01) 0.723
Health Excellent | 0.023 (0.01) 0.006 | -0.423 (0.06) 0.000 | -0.005 (0.06) 0.924
Health Poor | -0.026 (0.01) 0.013 | 0.960 (0.07) 0.000 | 0.010 (0.07) 0.884
Mental health Poor | 0.035 (0.01) 0.001 | 0.691 (0.07) 0.000 | -0.011 (0.07) 0.881
Health priority | 0.072 (0.01) 0.000 | 1.468 (0.05) 0.000 | -0.027 (0.05) 0.590
Smoke | -0.051 (0.01) 0.000 | -0.207 (0.06) 0.000 | 0.019 (0.06) 0.735
Education HS | 0.174 (0.01) 0.000 | 0.586 (0.09) 0.000 | -0.069 (0.09) 0.469
Education College | 0.264 (0.01) 0.000 | 1.165 (0.09) 0.000 | -0.107 (0.1) 0.305
LogWage | 0.013 (0) 0.000 | -0.015 (0.01) 0.005 | -0.002 (0.01) 0.711
Region MW | -0.010 (0.01) 0.333 | 0.306 (0.07) 0.000 | 0.001 (0.07) 0.985
Region S | -0.110 (0.01) 0.000 | -0.095 (0.07) 0.150 | 0.034 (0.07) 0.617
Region W | -0.057 (0.01) 0.000 | -0.158 (0.07) 0.024 | 0.020 (0.07) 0.780
Residual Regression (1) | | -0.352 (0.17) 0.036 |
Constant term | 0.524 (0.04) 0.000 | 3.428 (0.26) 0.000 | -0.133 (0.27) 0.623
Observations | 16210 | 16210 | 16210
R2 | 0.220 | 0.329 | 0.000
Test outcome: test statistic 4.411, p value 0.036
Table B.3. Sargan test 2010
(1) type: linear regression; dependent variable: LogExpenditures
(2) type: linear regression; dependent variable: Residuals Regression (1)
Each cell reports estimate (s.d.) p.
Variable | (1) | (2)
Insured IV | 2.307 (0.17) 0.000 |
Insurance offered | | -0.045 (0.06) 0.429
Employment | | -0.327 (0.13) 0.012
Age | -0.037 (0.01) 0.003 | -0.001 (0.01) 0.945
Age2 | 0.001 (0) 0.000 | 0.000 (0) 0.992
Male | -1.009 (0.05) 0.000 | -0.003 (0.05) 0.949
White | 0.338 (0.05) 0.000 | -0.001 (0.05) 0.991
Married | 0.352 (0.06) 0.000 | 0.001 (0.05) 0.981
Family size | -0.202 (0.02) 0.000 | 0.000 (0.02) 0.984
Health Excellent | -0.428 (0.06) 0.000 | -0.003 (0.06) 0.961
Health Poor | 0.970 (0.07) 0.000 | -0.007 (0.07) 0.920
Mental health Poor | 0.681 (0.07) 0.000 | -0.006 (0.07) 0.937
Health priority | 1.441 (0.05) 0.000 | 0.001 (0.05) 0.986
Smoke | -0.187 (0.06) 0.002 | -0.001 (0.06) 0.987
Education HS | 0.517 (0.1) 0.000 | 0.002 (0.09) 0.979
Education College | 1.058 (0.11) 0.000 | 0.000 (0.09) 0.997
LogWage | -0.017 (0.01) 0.003 | 0.032 (0.01) 0.014
Region MW | 0.307 (0.08) 0.000 | 0.008 (0.08) 0.913
Region S | -0.061 (0.07) 0.389 | 0.006 (0.07) 0.932
Region W | -0.138 (0.07) 0.059 | 0.003 (0.07) 0.971
Constant term | 3.295 (0.28) 0.000 | 0.057 (0.27) 0.837
Observations | 16210 | 16210
R2 | 0.273 | 0.000
Test outcome: test statistic 7.465, p value 0.006
Table B.4. Insurance prediction 2010
type: probit regression
dependent variable: Insured
Variable | estimate (s.d.) | p-value | ME
Insurance offered | 1.029 (0.03) | 0.000 | 0.975
Employment | -0.935 (0.06) | 0.000 | -1.058
Age | -0.024 (0.01) | 0.000 | -0.037
Age2 | 0.000 (0) | 0.000 | 0.000
Male | -0.170 (0.02) | 0.000 | -0.215
White | -0.076 (0.03) | 0.003 | -0.125
Married | 0.384 (0.03) | 0.000 | 0.332
Family size | -0.026 (0.01) | 0.001 | -0.040
Health Excellent | 0.079 (0.03) | 0.009 | 0.020
Health Poor | -0.104 (0.04) | 0.004 | -0.175
Mental health Poor | 0.108 (0.04) | 0.003 | 0.036
Health priority | 0.249 (0.03) | 0.000 | 0.198
Smoke | -0.173 (0.03) | 0.000 | -0.229
Education HS | 0.506 (0.04) | 0.000 | 0.420
Education College | 0.827 (0.05) | 0.000 | 0.739
LogWage | 0.047 (0.01) | 0.000 | 0.034
Region MW | -0.055 (0.04) | 0.175 | -0.135
Region S | -0.402 (0.04) | 0.000 | -0.472
Region W | -0.226 (0.04) | 0.000 | -0.301
Constant term | 0.190 (0.14) | 0.160 |
Observations 16210
Pseudo R2 0.192
Hit ratio 0.80
Table B.5. Probability positive expenditures 2010
type: probit regression
dependent variable: Expenditures
Variable | estimate (s.d.) | p-value | ME
Insured prediction | 0.227 (0.03) | 0.000 | 0.064
Age | -0.035 (0.01) | 0.000 | -0.009
Age2 | 0.001 (0) | 0.000 | 0.000
Male | -0.501 (0.02) | 0.000 | -0.136
White | 0.107 (0.03) | 0.000 | 0.029
Married | 0.247 (0.03) | 0.000 | 0.067
Family size | -0.094 (0.01) | 0.000 | -0.025
Health Excellent | -0.125 (0.03) | 0.000 | -0.035
Health Poor | 0.275 (0.04) | 0.000 | 0.067
Mental health Poor | 0.272 (0.04) | 0.000 | 0.066
Health priority | 0.635 (0.03) | 0.000 | 0.172
Smoke | -0.089 (0.03) | 0.004 | -0.024
Education HS | 0.272 (0.05) | 0.000 | 0.072
Education College | 0.582 (0.05) | 0.000 | 0.155
LogWage | 0.013 (0) | 0.000 | 0.003
Region MW | 0.164 (0.04) | 0.000 | 0.042
Region S | -0.093 (0.04) | 0.012 | -0.025
Region W | -0.074 (0.04) | 0.057 | -0.020
Constant term | 0.661 (0.14) | 0.000 | 0.000
Observations 16210
Pseudo R2 0.177
Hit ratio 0.80
Table B.6. Level of positive expenditures
type: linear regression
dependent variable: LogExpenditures
(1) (2) estimate (s.d.) p estimate (s.d.) p Insured prediction 0.110 (0.05) 0.016 0.207 (0.06) 0.000 Age -0.012 (0.01) 0.125 -0.016 (0.01) 0.050 Age2 0.000 (0) 0.000 0.000 (0) 0.000 Male -0.331 (0.03) 0.000 -0.464 (0.05) 0.000 White 0.160 (0.03) 0.000 0.187 (0.03) 0.000 Married 0.142 (0.03) 0.000 0.209 (0.04) 0.000 Family size -0.071 (0.01) 0.000 -0.099 (0.01) 0.000 Health Excellent -0.257 (0.04) 0.000 -0.293 (0.04) 0.000 Health Poor 0.587 (0.04) 0.000 0.653 (0.05) 0.000 Mental health Poor 0.396 (0.04) 0.000 0.456 (0.05) 0.000 Health priority 0.563 (0.03) 0.000 0.745 (0.07) 0.000 Smoke -0.193 (0.04) 0.000 -0.216 (0.04) 0.000 Education HS 0.470 (0.07) 0.000 0.542 (0.07) 0.000 Education College 0.747 (0.07) 0.000 0.899 (0.08) 0.000 LogWage -0.029 (0) 0.000 -0.025 (0) 0.000 Region MW 0.062 (0.05) 0.175 0.102 (0.05) 0.036 Region S -0.111 (0.04) 0.009 -0.133 (0.04) 0.003 Region W -0.145 (0.04) 0.001 -0.164 (0.05) 0.000 Constant term 6.253 (0.17) 0.000 5.798 (0.23) 0.000 Lambda 0.739 (0) 0.000 Observations 16210 16210
Table B.7. Unconditional marginal effect on expenditures (2010)
Variable | (1) two-part model: ME (s.d. ME) | (2) Heckman two-step: ME (s.d. ME)
Insured | 0.509 (0.07) | 0.528 (0.08)
Age | -0.074 (0.01) | -0.078 (0.01)
Age2 | 0.001 (0) | 0.001 (0)
Male | -1.199 (0.05) | -1.266 (0.05)
White | 0.328 (0.06) | 0.346 (0.06)
Married | 0.574 (0.06) | 0.607 (0.06)
Family size | -0.232 (0.02) | -0.244 (0.02)
Health Excellent | -0.441 (0.06) | -0.467 (0.06)
Health Poor | 0.988 (0.09) | 1.015 (0.08)
Mental health Poor | 0.827 (0.09) | 0.834 (0.08)
Health priority | 1.635 (0.06) | 1.716 (0.06)
Smoke | -0.321 (0.06) | -0.338 (0.07)
Education HS | 0.887 (0.1) | 0.932 (0.1)
Education College | 1.686 (0.11) | 1.762 (0.11)
LogWage | 0.000 (0.01) | 0.025 (0.01)
Region MW | 0.355 (0.09) | 0.363 (0.08)
Region S | -0.262 (0.08) | -0.276 (0.08)
Region W | -0.255 (0.08) | -0.272 (0.08)