
The relationship between insurance status and total health expenditures in the United States of America : a semiparametric approach of an extended Cosslett-Model


Faculty of Economics and Business

Amsterdam School of Economics

Requirements thesis MSc in Econometrics.

1. The thesis should have the nature of a scientific paper. Consequently the thesis is divided

up into a number of sections and contains references. An outline can be something like (this

is an example for an empirical thesis, for a theoretical thesis have a look at a relevant paper

from the literature):

(a) Front page (requirements see below)

(b) Statement of originality (compulsory, separate page)

(c) Introduction

(d) Theoretical background

(e) Model

(f) Data

(g) Empirical Analysis

(h) Conclusions

(i) References (compulsory)

If preferred you can change the number and order of the sections (but the order you

use should be logical) and the heading of the sections. You have a free choice how to

list your references but be consistent. References in the text should contain the names

of the authors and the year of publication. E.g. Heckman and McFadden (2013). In

the case of three or more authors: list all names and year of publication in case of the

first reference and use the first name and et al. and year of publication for the other

references. Provide page numbers.

2. As a guideline, the thesis usually contains 25-40 pages using a normal page format. All that

actually matters is that your supervisor agrees with your thesis.

3. The front page should contain:

(a) The logo of the UvA, a reference to the Amsterdam School of Economics and the Faculty

as in the heading of this document. This combination is provided on Blackboard (in

MSc Econometrics Theses & Presentations).

(b) The title of the thesis

(c) Your name and student number

(d) Date of submission final version

(e) MSc in Econometrics

(f) Your track of the MSc in Econometrics

The Relationship between Insurance Status

and Total Health Expenditures in the United

States of America

A Semiparametric Approach of an Extended Cosslett-Model

R.J. Garssen

(10319476)

MSc in Econometrics

Track: Econometrics

Date of final version: December 23, 2016

Supervisor: Dr J.C.M. van Ophem


Abstract

The American health care system has two major problems: the increasing GDP share of medical expenses and the size of the uninsured population. Health insurance is seen as too expensive and insurers have too much power. This research aims to provide a detailed analysis of the relation between insurance and utilization decisions and total health expenditures. Moral hazard and adverse selection influence these decisions. A non-random selection is made based on insurance status and utilization of medical care. As a consequence, the model includes correction terms to correct for possible correlations and sample selectivity. The whole process is estimated in both a parametric and a semiparametric way. The correction terms are highly significant. Furthermore, being insured has a positive impact on the use of medical care. In addition, total health expenditures seem to be higher for individuals with insurance. However, a detailed impact analysis is hard to give due to the effect of the Cosslett dummies.


Statement of Originality

This document is written by Student Rens Garssen who declares to take full responsibility for the contents of this document. I declare that the text and the work presented in this document is original and that no sources other than those mentioned in the text and its references have been used in creating it. The Faculty of Economics and Business is responsible solely for the supervision of completion of the work, not for the contents.


Contents

List of Figures
List of Tables
1 Introduction
2 Theoretical Background
   2.1 Comparing Estimation Methods
      2.1.1 Variables
      2.1.2 ANOVA
      2.1.3 ANOCOVA
      2.1.4 One-Part Model
      2.1.5 Two-Part Model
      2.1.6 Four-Part Model
      2.1.7 Results
   2.2 Alternative Four-Part Models
   2.3 Adverse Selection and Moral Hazard
   2.4 Instrumental Variables
   2.5 Shen
3 Methodology and Techniques
   3.1 Cosslett’s Selection Model
   3.2 An Extended Cosslett Selection Model
      3.2.1 Parametric Approach
      3.2.2 Semiparametric Approach
4 Data
   4.1 Medical Expenditure Panel Survey
   4.2 Dependent Variables
   4.4 Selected Variables and their Expected Impact
      4.4.1 Corresponding Explanatory Variables
         Health Status
         Demographics
         Control Dummies
      4.4.2 Specific Explanatory Variables
         Insurance Status
         Utilization
         Total Health Expenditures
   4.5 Descriptive Statistics
      4.5.1 All Years
      4.5.2 Year 2014
5 Results and Analysis
   5.1 Insurance Status
   5.2 Utilization
   5.3 Total Health Expenditures
      5.3.1 Positive Utilization and Insured
      5.3.2 Positive Utilization and Uninsured
   5.4 Other Results
6 Conclusion and Discussion
References
A Results
   A.1 Insurance Coverage
   A.2 Utilization
   A.3 Total Health Expenditures (Insured)


List of Figures

4.1 Opinion about Health Insurance
4.2 Distribution of Total Health Expenditures
4.3 Percentage Privately Insured
5.1 Frequency Insurance dummies


List of Tables

4.1 Description of Study Population
4.2 Insurance and Utilization Comparison
5.1 Semiparametric Estimation Results: Insurance Coverage
5.2 Semiparametric Estimation Results: Utilization
5.3 Semiparametric Estimation Results: Total Health Expenditures (Positive Utilization and Insured)
5.4 Semiparametric Estimation Results: Total Health Expenditures (Positive Utilization and Uninsured)
5.5 Polynomial Insurance
5.6 Polynomial Utilization
A.1 Parametric Estimation Results: Insurance Coverage
A.2 Semiparametric Estimation Results: Insurance Coverage
A.3 Parametric Estimation Results: Utilization
A.4 Semiparametric Estimation Results: Utilization
A.5 Parametric Estimation Results: Total Health Expenditures (Positive Utilization and Insured)
A.6 Semiparametric Estimation Results: Total Health Expenditures (Positive Utilization and Insured)
A.7 Parametric Estimation Results: Total Health Expenditures (Positive Utilization and Uninsured)
A.8 Semiparametric Estimation Results: Total Health Expenditures (Positive Utilization and Uninsured)


Chapter 1

Introduction

In the American health care system, private and public health insurance can be distinguished. The definitions of these two options are clearly described by Smith and Medalia (2015):

• Private health insurance includes an insurance plan provided by an employer, a union or an insurance company.

• Public health insurance includes federal programs like Medicaid, Medicare, the Children’s Health Insurance Program (CHIP), the Civilian Health and Medical Program of the Department of Veterans Affairs (CHAMPVA), individual state health plans, etc.

There also exists a significant group of uninsured people. The American media call the present situation “the health care crisis”. As stated in the report of the Council of Economic Advisors (2009), a reform of the health care system should be a high priority in the United States. This reform is seen as crucial due to two major problems: the increasing GDP share of medical expenses and the size of the uninsured population (Council of Economic Advisors, 2009; Fox et al., 1993). In 2013, health expenditures accounted for 16.9% of American GDP (World Bank Group, 2016). Furthermore, 13.3% of the population was without health insurance in the same year (Smith and Medalia, 2015). In addition, many Americans are struggling with medical bills, resulting in large debts. This goes together with many bankruptcies in all age categories in the United States (Mangan, 2013).

The increasing GDP share of medical expenses and the size of the uninsured population are caused by several, mainly financial, reasons. According to Fronstin (2005), the financial aspect is the most important factor people take into consideration when they decide whether or not to get insurance. This suggests that, for some, it is more affordable not to be insured than to pay monthly premiums. Another financial problem is the so-called ‘Donut Hole’, which a major part of the seniors insured by Medicare has to deal with. It is called a donut hole because at first the prescribed medicines are covered, but once a certain amount has been exceeded, all medicines have to be paid out of pocket up to the yearly limit, where the coverage gap ends and everything is covered again. A further financial reason is that insurance companies could raise premiums at any rate whenever they wanted. They were also allowed to reject customers and to drop customers when they became sick or reached their yearly limit. Health care has also become more expensive (Fox et al., 1993). Until 1980, health care prices increased at the same rate as the general consumer price index. After 1980, however, medical care prices increased at a much higher rate than the consumer price index. A final reason, although more could be listed, is that millions of people are too poor to afford health insurance but at the same time earn too much to apply for Medicaid. The eligibility restrictions for public insurance should therefore be revisited too (Zwelling and Kantarjian, 2014).

After multiple failed attempts, new developments regarding health care reform started on March 23, 2010, when President Barack Obama signed the Affordable Care Act (ACA), also known as ObamaCare (Orszag and Emanuel, 2010). Since 2014, the health insurance system has shown improvement (Blumenthal and Collins, 2014). The insurance coverage expansion began in 2014. One of the causes is the introduction of the parent’s policy for young adults: young adults can stay enrolled in a parent’s policy until they turn 26. The Health Insurance Marketplace also has a strong positive effect on the insurance coverage rate in America.¹ It increases the number of insured Americans (8 million in 2014), and it creates more choice for consumers and more competition between insurers. The Congressional Budget Office (CBO) projects that 25 million Americans will be insured via the marketplace in 2017 (Blumenthal and Collins, 2014). Furthermore, 12.3 million additional Americans are enrolled in Medicaid and CHIP.

Hence, there were, and still are, clear reasons for a reform of the American health care system. ObamaCare was introduced in 2010, and since 2014 positive changes in the health care system have been observed. However, at least as important is the question of how this health care crisis could arise in the first place. In this research we focus on one of the two major problems, the uninsured population. We want to gain better insight into the decision-making process behind being insured, utilizing medical care and incurring health care expenses. This leads to our main question: What is the effect of insurance status on total medical expenses?

Medical expenses depend on many variables, including several choice variables: the decision to be insured, the decision to visit a doctor, or the doctor’s choice of treatment for the patient. These are all subjective elements that make it challenging to estimate medical expenses and their significant explanatory variables. The Medical Expenditure Panel Survey (MEPS) is an extensive panel data set with many relevant variables about families and individuals. The survey began in 1996 and is still ongoing. With this wide choice of variables we want to give a well-founded answer to our research question by applying a semiparametric approach of an extended Cosslett model to this data set. Both decision-making processes and total health expenditures will be estimated.

The remaining chapters of this paper are organized as follows. Chapter 2 provides detailed background information on previous research with similar topics and the methods used. Chapter 3 explains the model and method used in this research, and clarifies important variables and causalities we have to consider in our model. Chapter 4 describes the data used for this research. Chapter 5 discusses the main results, followed by the conclusion, discussion and future research directions in Chapter 6.

¹ The Health Insurance Marketplace, also called “Exchange”, offers standardized health insurance plans. States can operate their own marketplace or join the marketplace managed by the federal government. The plans differ in coverage rates: Bronze (60%), Silver (70%), Gold (80%), Platinum (90%).


Chapter 2

Theoretical Background

Econometrics and its estimation methods are extending their area of application. The field is no longer only about economics and finance, because forecasting or explaining trends using historical data is seen as valuable in many research areas. Health economics has become a very popular one. In this chapter we discuss several studies of the health industry, mainly focusing on total health expenditures and health insurance. First, the different methods applied by Duan et al. (1982) are explained and compared. Duan et al. (1982) conclude that the four-part model is the best estimation method. The next section therefore covers two more studies that use alternative four-part models. Section 2.3 explains adverse selection and moral hazard, which can cause endogeneity. Using instrumental variables is one solution for endogeneity, illustrated by the research of Dunn (2015). Another way to deal with endogeneity is to include a correction term, as in the paper of Shen (2013). Shen (2013) estimates two selection equations and an outcome equation. His model is also the basis of the model in this paper.

2.1 Comparing Estimation Methods

At the time of Duan et al. (1982) there was a debate about cost sharing for medical services in America. Here, cost sharing is defined as dividing medical expenses between the individual and the insurer. Some people thought cost sharing would not affect the demand for medical care, some regarded it as a tax on the sick, and others believed it discourages the use of necessary health care. To provide better insight, the government sponsored an experiment in the 1970s, the RAND Health Insurance Experiment (RAND HIE). This study randomly assigned people to different kinds of insurance plans and followed their health care expenditures and utilization of medical care. Using these data, Duan et al. (1982) tried to find the best model to analyze total health expenditures and the impact of different insurance plans. They applied and compared the following methods: analysis of variance (ANOVA), analysis of covariance (ANOCOVA), a one-part model, a two-part model and a four-part model. The trade-off between the number of assumptions and the degrees of freedom recurs several times when comparing the estimation methods.

2.1.1 Variables

The dependent variable for each estimation method is the total medical expenses. If explanatory variables are included, they consist of insurance plan variables, sex, race, family income, family size, self-reported health, pain and worry. The coefficients of the insurance variables are most relevant to answer the research question of Duan et al. (1982). The insurance plan variables are specified as five dummy variables: free care plan (P00), 25% coinsurance rate (P25), 50% coinsurance rate (P50), 95% coinsurance rate (P95) or an individual deductible of $150 per person or $450 per family (IDP). The free care plan is the omitted group.

2.1.2 ANOVA

The analysis of variance (ANOVA) is the first estimation method they applied. The untransformed health expenditures ($Y_i$) are estimated by the grand mean ($\mu$), a plan effect ($\alpha_i$) and an error term ($\epsilon_i$), resulting in the following equation:

\[ Y_i = \mu + \alpha_i + \epsilon_i, \quad \text{with } i \text{ as indication for plans} \tag{2.1} \]

The plan mean is the quantity of interest and is defined as the mean of the health expenses for that plan. The ANOVA has the advantage that it returns unbiased and consistent estimates. The only restriction that has to be met is independence of the plan assignment from the error term, which the randomized set-up of the experiment ensures. A major disadvantage is that the model is homogeneous, since it ignores the effect of other explanatory variables.
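As a minimal sketch of equation (2.1), the snippet below splits made-up expenditures into a grand mean and plan effects. The plan labels follow the text, but the dollar values and the helper name `anova_plan_effects` are illustrative assumptions, and the grand mean is taken over all observations, which is only one possible normalisation.

```python
from statistics import mean

def anova_plan_effects(expenses_by_plan):
    """Return (grand_mean, plan_effects) for a one-way ANOVA decomposition.

    Assumption: the grand mean is the mean over all observations and each
    plan effect is the plan mean minus that grand mean.
    """
    all_obs = [y for ys in expenses_by_plan.values() for y in ys]
    grand_mean = mean(all_obs)
    effects = {plan: mean(ys) - grand_mean for plan, ys in expenses_by_plan.items()}
    return grand_mean, effects

# Made-up expenditures in dollars, NOT RAND HIE data.
sample = {
    "P00": [0.0, 120.0, 900.0],
    "P25": [0.0, 80.0, 520.0],
    "P95": [0.0, 0.0, 300.0],
}

grand_mean, effects = anova_plan_effects(sample)
# The plan mean is recovered as grand mean plus plan effect.
plan_means = {plan: grand_mean + alpha for plan, alpha in effects.items()}
```

Because no covariates enter, two individuals in the same plan always receive the same predicted expenditure, which is the homogeneity the text refers to.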

2.1.3 ANOCOVA

To solve this homogeneity problem, the analysis of covariance does include explanatory variables, which creates an important contrast with the ANOVA. This can be seen in the model too:

\[ Y_i = \alpha_i + X_i'\beta_1 + \epsilon_i, \quad \text{with } i \text{ as indication for plans} \tag{2.2} \]

The vector $\beta_1$ contains the coefficients indicating the impact of each explanatory variable. These estimates are obtained by Ordinary Least Squares (OLS). If the true model is linear and the error term is independent of the covariates, ANOCOVA yields consistent and unbiased estimates. But when linearity does not hold in the true model, the estimated coefficients will be inconsistent. In addition, the dependent variable includes both strictly positive health expenditures and health expenditures equal to zero. The estimated impact of explanatory variables on total health expenditures will be biased when spenders and non-spenders are kept together as one group.

2.1.4 One-Part Model

Besides a large number of observations with no expenses, the distribution of medical expenses typically also has a fat right tail. The one-part model takes account of this skewness by including a special case of a Box-Cox transformation, the logarithmic transformation.¹ A constant is added because the logarithm of zero is undefined and Duan et al. (1982) do not want to lose observations.²

\[ \log(Y_i + 5) = X_i'\beta_1 + \epsilon_i \tag{2.3} \]

¹ The objective is to make the dependent variable close to normal by finding a transformation $f(y) = (y^p - 1)/p$. This Box-Cox transformation results in a logarithmic transformation if $p = 0$ is optimal, since $f(y) \to \log y$ as $p \to 0$.

The difference between the one-part model and the ANOCOVA is the introduction of the logarithmic transformation. Although this transformation is often used for variables with a fat right tail, it does not necessarily correct for the skewness, so it is questionable whether the problem of the fat right tail is really solved. Furthermore, the large number of non-spenders, about 20% of the data, is still not separated from the spenders. Following the same reasoning as in the previous section, this will result in biased estimates.
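The effect of the transformation in equation (2.3) can be illustrated with a small sketch. The expenditure values below are made up for illustration and the helper `skewness` is a hypothetical name; the point is only that the logarithm reduces the skewness without removing the mass of non-spenders, who all end up at log(5).

```python
import math
from statistics import mean, pstdev

def skewness(values):
    """Population skewness: the third standardised moment."""
    m, s = mean(values), pstdev(values)
    return mean(((v - m) / s) ** 3 for v in values)

# Made-up annual expenditures in dollars: zeros plus a fat right tail.
expenses = [0.0, 0.0, 40.0, 75.0, 120.0, 300.0, 650.0, 1500.0, 9000.0, 42000.0]

# Transformation of equation (2.3): the constant 5 keeps the zero spenders.
log_expenses = [math.log(y + 5.0) for y in expenses]

raw_skew = skewness(expenses)
log_skew = skewness(log_expenses)

# Share of observations that sit at the floor log(5) after transforming.
share_at_floor = sum(y == 0.0 for y in expenses) / len(expenses)
```

In this toy sample the transformed variable is far less skewed than the raw one, yet 20% of the observations still share a single value, so normality of the error term remains doubtful.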

2.1.5 Two-Part Model

A model that handles this complexity better is the two-part model. The problem of the individuals without expenses in the one-part model is addressed by adding an extra binary equation. In other words, the first equation determines whether expenses are positive or zero ($I_i$), followed by the estimation of the expenses ($Y_i$) conditional on the expenses being positive. Formally, the model is as follows:

\[ I_i = X_i'\delta_1 + \nu_{1i}, \quad \nu_{1i} \sim N(0, 1), \quad P(I_i > 0) = \Phi(X_i'\delta_1) \tag{2.4} \]

\[ \log(Y_i \mid I_i > 0) = X_i'\delta_2 + \nu_{2i}, \quad \nu_{2i} \sim N(0, \sigma^2) \tag{2.5} \]

This model requires more assumptions than the ANOVA, but more parameters are estimated as well and thus there are more degrees of freedom. The researchers chose a normality assumption, which turns the binary equation into a probit model. The second equation is a linear model with the logarithm of the medical expenses as dependent variable. The parameters $\delta_1$, $\delta_2$ and $\sigma$ are estimated by maximum likelihood. This two-part model does separate the spenders from the non-spenders, so this no longer causes a bias. However, the model also has a disadvantage: the estimated coefficients become inconsistent when the normality assumption for the error term in the second equation does not hold. This problem would not exist if equation (2.5) could be estimated by OLS.
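To make the two steps concrete, the sketch below fits an intercept-only version of equations (2.4) and (2.5), that is, a model without covariates. This simplification, the made-up data and the function name `fit_two_part` are assumptions for illustration and not the specification of Duan et al. (1982). In this special case the maximum likelihood estimates have closed forms: the probit index matches the sample share of spenders, and the second part reduces to the mean and variance of log expenses among spenders.

```python
import math
from statistics import NormalDist, mean, pvariance

def fit_two_part(expenses):
    """Intercept-only two-part model under the normality assumptions above.

    Assumes the sample holds both spenders and non-spenders, so that the
    share of spenders lies strictly between 0 and 1.
    """
    spenders = [y for y in expenses if y > 0]
    p_spend = len(spenders) / len(expenses)
    logs = [math.log(y) for y in spenders]

    delta1 = NormalDist().inv_cdf(p_spend)  # probit index: Phi(delta1) = p_spend
    delta2 = mean(logs)                     # mean of log expenses among spenders
    sigma2 = pvariance(logs)                # ML variance estimate (divides by n)

    # Unconditional mean: P(spend) times the log-normal mean exp(delta2 + sigma2 / 2).
    expected = p_spend * math.exp(delta2 + sigma2 / 2.0)
    return {"delta1": delta1, "delta2": delta2, "sigma2": sigma2, "expected": expected}

# Made-up expenditures in dollars, NOT RAND HIE data.
expenses = [0.0, 0.0, 50.0, 100.0, 200.0, 400.0]
fit = fit_two_part(expenses)
```

The factor exp(sigma2 / 2) is only valid if the log-scale error is normal, which is exactly the assumption the text identifies as the weak point of the model.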

2.1.6 Four-Part Model

The four-part model goes one step further than the two-part model. Whereas the two-part model focuses on the large number of non-spenders only, the four-part model also takes the fat right tail, 10% of the observations, into consideration. The four-part model divides the population into three groups: non-spenders, ambulatory-only spenders and spenders with hospital utilization. The separation of non-spenders and spenders is necessary on account of the large number of observations with zero expenses. The separation of ambulatory-only spenders and hospitalized spenders deals with the observations with extremely high expenses in the right tail. These individuals spend too much for the log-normal distribution to fit. By classifying the observations into these three groups, they want to solve the problem of non-normality of the medical expenses. Formally, the model is comparable to an extended version of the two-part model, except that another binary equation has been added.

\[ P(I_i > 0) = \Phi(X_i'\gamma_1) \tag{2.6} \]

\[ P(U_i > 0 \mid I_i > 0) = \Phi(X_i'\gamma_2) \tag{2.7} \]

\[ \log(Y_i \mid I_i > 0, U_i = 0) = X_i'\gamma_3 + \nu_i \quad \text{and} \quad \log(Y_i \mid U_i > 0) = X_i'\gamma_4 + \omega_i \tag{2.8} \]

The first equation estimates the probability of having strictly positive rather than zero medical expenses. The second equation distinguishes between ambulatory-only and inpatient spenders. The last two equations estimate the logarithm of the medical expenses for ambulatory-only spenders and inpatient spenders, given that expenditures are positive. The four equations of the model are all estimated by maximum likelihood. This model requires more assumptions than the two-part model, but more parameters are estimated as well. According to Duan et al. (1982), the results of the four-part model are consistent and more precise and robust.³
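The partition into three groups can be sketched as follows. The records are made up, the function name `four_part_summary` is hypothetical, and the model is reduced to its intercept-only analogue: sample shares stand in for the two probit probabilities and group means of log expenses stand in for the two log regressions.

```python
import math
from statistics import mean

def four_part_summary(records):
    """Summarise (total_expenses, inpatient_expenses) pairs by spending group.

    Assumes at least one ambulatory-only spender and one inpatient spender.
    """
    ambulatory = [tot for tot, inp in records if tot > 0 and inp == 0]
    inpatient = [tot for tot, inp in records if tot > 0 and inp > 0]
    n_spenders = len(ambulatory) + len(inpatient)
    return {
        # Share with any expenses, the analogue of P(I > 0).
        "p_any": n_spenders / len(records),
        # Share of spenders with hospital use, the analogue of P(U > 0 | I > 0).
        "p_inpatient_given_any": len(inpatient) / n_spenders,
        # Mean log expenses within each group of spenders.
        "mean_log_ambulatory": mean(math.log(t) for t in ambulatory),
        "mean_log_inpatient": mean(math.log(t) for t in inpatient),
    }

# Made-up records (total dollars, inpatient dollars); NOT survey data.
records = [
    (0, 0), (0, 0),
    (60, 0), (150, 0), (400, 0), (900, 0),
    (12000, 9000), (30000, 26000),
]
summary = four_part_summary(records)
```

Splitting off the inpatient group lets the extreme right-tail observations have their own log-scale mean instead of distorting the fit for ambulatory-only spenders.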

2.1.7 Results

The ANOVA is a homogeneous estimation method, which makes it hard to compare with the other, heterogeneous models. When comparing the results, Duan et al. (1982) look at the consistency of the estimates and the minimum mean squared errors of the health expenditure equations. They conclude that there are large differences between the estimation results. The results of the ANOVA are unbiased but also inefficient because this method is extremely homogeneous. The ANOCOVA and the one-part model include explanatory variables, but the results are biased due to the large group of non-spenders. Duan et al. (1982) describe the ANOVA and the ANOCOVA as too straightforward, too simple and unable to capture the complexity of health care decisions and the health care market. The one-part model takes account of the skewed distribution of total health expenditures by including a logarithmic transformation. This transformation is often applied in the econometric literature. However, as mentioned earlier, it is questionable whether it is a correct way to approximate the distribution of medical expenditures.

So the two-part and four-part models are left. Their major improvement is the separation of spenders and non-spenders. Another positive aspect is the increase in degrees of freedom, although it is important to note that this increase comes with more assumptions. Duan et al. (1982) discuss the results of these models in more detail, especially regarding the impact of the insurance plan variables. The coefficients of the different insurance plan variables show that individuals with higher insurance coverage spend more on medical services and thus have higher health expenditures. The plan differences seem to be smaller for the two-part model than for the one-part model. The four-part model, in turn, has smaller plan differences than the two-part model. Their concluding remarks state that the four-part model performs the best, although it should not be seen as a final methodology, since the results can be inconsistent when an assumption does not hold. They suggest semiparametric or nonparametric approaches for the first two selection equations to make the model less restrictive and to increase the quality of the results.

³ As a consequence of the logarithmic transformation, Duan et al. (1982) introduce a re-transformation factor to go from the logarithmic scale to the raw dollar scale. The expected health expenditures are $E(Y_i \mid X_i) = e^{X_i'\beta} \cdot E(e^{\epsilon_i} \mid X_i)$. The re-transformation factor is $E(e^{\epsilon_i} \mid X_i) = e^{\sigma^2/2}$ if the error is normally distributed and independent of $X_i$. While the transformed models try to approximate normality, the error distribution still deviates from the normal distribution. That is why they estimate the smearing factor in a nonparametric way: the factor becomes the sample average of the exponentiated least squares residuals. However, the smearing factor is applied in the four-part model only, because for the one-part and two-part models the error is related to $X_i$. That’s why the results of the four-part
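The nonparametric smearing factor described in footnote 3, the sample average of the exponentiated least squares residuals, can be sketched as follows. The data, the single covariate (age) and the helper names `fit_simple_ols` and `predict_dollars` are assumptions for illustration; Duan et al. (1982) work with a full covariate vector.

```python
import math
from statistics import mean

def fit_simple_ols(x, y):
    """OLS intercept and slope for one regressor."""
    x_bar, y_bar = mean(x), mean(y)
    slope = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / sum(
        (a - x_bar) ** 2 for a in x
    )
    return y_bar - slope * x_bar, slope

def smearing_factor(residuals):
    """Sample average of the exponentiated log-scale residuals."""
    return mean(math.exp(e) for e in residuals)

# Made-up spenders: age and positive expenditures in dollars.
age = [25, 35, 45, 55, 65]
expenses = [80.0, 150.0, 90.0, 700.0, 400.0]

log_y = [math.log(v) for v in expenses]
b0, b1 = fit_simple_ols(age, log_y)
residuals = [ly - (b0 + b1 * a) for a, ly in zip(age, log_y)]
smear = smearing_factor(residuals)

def predict_dollars(a):
    """Re-transform a log-scale prediction to the raw dollar scale."""
    return math.exp(b0 + b1 * a) * smear
```

Since OLS residuals with an intercept average to zero, Jensen's inequality implies a smearing factor of at least one, so omitting it would understate predicted expenditures in dollars.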

2.2 Alternative Four-Part Models

In this section we discuss the methodology and results of Manning et al. (1988) and Miller et al. (2004), who both applied a four-part model because of the findings and conclusions of Duan et al. (1982).

The following quote has motivated Manning et al. (1988) to investigate the health insurance system:

“The share of GNP devoted to medical care has increased from 4 to 11 percent between 1950 and 1984.”

According to Manning et al. (1988), the most important explanation for this increase is the growing popularity of health insurance. The wide and increasing variety of insurance plans increases the demand for medical care in general and the demand for higher-quality medical services, but also the prices of medical services. Multiple studies try to quantify the effect of the diversity of insurance plans on health expenditures. Although they all look at the same market behavior problem, results on the price elasticity or coinsurance elasticity differ by a factor of 10 or more among these studies. However, such disagreement is not astounding, keeping in mind the complexity of the issue and the importance of using a high-quality data set like the RAND HIE.

Significant disagreement between the results of Manning et al. (1988) and Duan et al. (1982) should not exist. Both use the same data set and both apply, among other things, a four-part model. Furthermore, they both focus on medical expenditures and use insurance plan, health status, sociodemographic, demographic and economic variables as explanatory variables. In the case of Manning et al. (1988), health expenditures contain all inpatient services and all purchases of drugs and supplies. The same problem as before arises, because medical expenses are highly skewed. Therefore, Manning et al. (1988) partition the participants into three groups: nonusers, users of only outpatient services and users of any inpatient services. Unsurprisingly, this is exactly the same partition as in Duan et al. (1982). The model consists of two probit models to determine whether a participant uses medical care ($U_i$) and whether a participant uses inpatient services ($I_i$). Afterwards, two linear regressions with a logarithmic transformation of the medical expenses are estimated, one for outpatient-only participants and one for inpatient participants.⁴

\[ U_i = X_{1i}'\beta_1 + \epsilon_{1i}, \quad (\epsilon_{1i} \mid X_i) \sim N(0, 1) \tag{2.9} \]

\[ I_i = X_{2i}'\beta_2 + \epsilon_{2i}, \quad (\epsilon_{2i} \mid U_i > 0, X_i) \sim N(0, 1) \tag{2.10} \]

\[ \log(\text{Medical\$}_i \mid U_i > 0 \text{ and } I_i \le 0) = X_{3i}'\beta_3 + \epsilon_{3i}, \quad E(\epsilon_{3i} \mid U_i > 0, I_i \le 0, X_i) = 0 \tag{2.11} \]

\[ \log(\text{Medical\$}_i \mid U_i > 0 \text{ and } I_i > 0) = X_{4i}'\beta_4 + \epsilon_{4i}, \quad E(\epsilon_{4i} \mid U_i > 0, I_i > 0, X_i) = 0 \tag{2.12} \]

⁴ Manning et al. (1988) also introduce a smearing factor functioning as re-transformation factor, based on Duan et al. (1982). Whereas Duan et al. (1982) used one smearing factor for the whole sample, Manning et al. (1988) estimate separate smearing factors for various sub-groups.


As we would expect, Manning et al. (1988) reach the same conclusion as Duan et al. (1982): the four-part model generates the best estimates. Moreover, expenditures in the free care plan are 46% higher than in the 95% coinsurance plan. This corresponds to the difference found in the sample means. So again, the higher the coverage rate, the higher the total health expenditures. The largest decrease in expenditures occurs from the free care plan to the 25% coinsurance plan, followed by smaller decreases thereafter.

The methodology of Miller et al. (2004) can be placed in the same category as Manning et al. (1988) and Duan et al. (1982). They apply a four-part model to total health expenditures. The first two equations are logistic regressions, which differ slightly from the probit approach. To account for the skewness of expenditures, these two equations split the sample into the same three groups as discussed earlier. The last two equations are OLS regressions including smearing factors.⁵ This model is applied to the Medical Expenditure Panel Survey, the same data set as used in this paper. Miller et al. (2004) use pooled data from 1996 to 1998. In their opinion it is important to adjust these data, because between 1998 and 2002 the population composition changed and medical technologies improved but also became more expensive. They use the March 2002 Current Population Survey (CPS) to approximate the population composition of 2002. Although this adjustment also raises the costs of medical care by a small amount, according to Miller et al. (2004) extra calibration is necessary to match the growth rates of the National Health Accounts (NHA).

Next to health expenditures, insurance status is a variable of interest. Private group, private non-group and Medicaid coverage all belong to the insured status. Miller et al. (2004) divide the uninsured status into four groups: full-year uninsured with employer offers, full-year uninsured without employer offers, uninsured for less than six months and uninsured for more than six months. The control variables range from basic variables like age and gender to activities of daily living, functional limitations and perceived mental health status.

The results show that insured individuals spend more than full-year uninsured individuals, which is in line with the results found by Manning et al. (1988) and Duan et al. (1982). All insured types spend more than $2,000 per person, whereas the full-year uninsured with and without employer offers spend on average $778 and $934 respectively. Hence, there is a major difference between the groups. Furthermore, privately insured individuals seem to be healthier than uninsured individuals based on physical health, mental health and functional limitations. Chronic diseases are the only exception, since the uninsured are less likely to have a chronic disease. However, Miller et al. (2004) point to the possibility of incomplete information in the survey, as respondents may not report chronic diseases. This is a fair point that should be taken into account for all surveys. The strengths of a survey are the number of observations and the variety of questions, all at the individual level. But these come with measurement errors when individuals do not know or do not want to give the (true) answer.

Miller et al. (2004) tried to find useful information for improving the insurance status of Americans and

5 Predicted expenditures are calculated according to the following expression:

$$E(\text{Expenditure}_i) = \rho_i\left[(1 - \pi_i)\,e^{x_i\beta_3}S_3(x_i) + \pi_i\,e^{x_i\beta_4}S_4(x_i)\right]$$

where

$\rho_i = F(x_i\beta_1)$ (estimated probability of any health expenditures)

$\pi_i = F(x_i\beta_2)$ (estimated conditional probability of inpatient expenditures given any expenditure)

$e^{x_i\beta_3}S_3(x_i)$ (estimated expenditures conditional on having outpatient expenditures only)

$e^{x_i\beta_4}S_4(x_i)$ (estimated expenditures conditional on having outpatient and inpatient expenditures)

$S_3(x_i), S_4(x_i)$ (smearing factors for conditional expenditure models)

The smearing factors should not be confused with correction terms. In this case, the smearing factors are retransformation factors, because Miller et al. (2004) estimate the expenditure equations on the logarithmic scale and transform the predictions back to the original scale. The smearing factors of Miller et al. (2004) are based on methods of Duan et al. (1982) and Manning et al. (1988).


shift the uninsured population to the private or public insurance groups. Hence, the most important result is the estimate of the change in national health expenditures if all uninsured individuals were given health insurance coverage. Expanding coverage from uninsured to private insurance leads to a net cost between $53.8 and $67.4 billion, representing an eleven to fourteen percent increase in annual health expenditures, while expanding coverage to public insurance would cost around $40 billion, an eight percent increase. Shifting to public coverage is less costly than shifting to private coverage because of lower reimbursement rates. The research of Miller et al. (2004) is a good start for estimating the costs of increasing the insurance coverage rate in America. However, as they indicate themselves, more research has to be done regarding plan design and the minimization of health expenditure increases when expanding the insured population.
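The prediction formula of the four-part model of Miller et al. (2004), given in the footnote above, can be illustrated with a minimal Python sketch. It is not their implementation: the function names, the index values and the residuals are hypothetical, and the smearing factor is simplified to a single scalar (the mean of the exponentiated log-scale residuals) instead of a function of $x_i$.

```python
import math


def logistic(z):
    """Logistic CDF F(z), used here for the two binary equations."""
    return 1.0 / (1.0 + math.exp(-z))


def smearing_factor(log_residuals):
    """Mean of exp(residual) over the residuals of a log-scale OLS fit."""
    return sum(math.exp(r) for r in log_residuals) / len(log_residuals)


def predicted_expenditure(xb1, xb2, xb3, xb4, s3, s4):
    """rho * [(1 - pi) * exp(xb3) * s3 + pi * exp(xb4) * s4]."""
    rho = logistic(xb1)  # probability of any health expenditures
    pi = logistic(xb2)   # probability of inpatient use, given any expenditure
    return rho * ((1.0 - pi) * math.exp(xb3) * s3 + pi * math.exp(xb4) * s4)


# Hypothetical residuals and index values, for illustration only.
s3 = smearing_factor([-0.5, 0.0, 0.5])
s4 = smearing_factor([-1.0, 0.2, 0.8])
print(predicted_expenditure(xb1=1.0, xb2=-2.0, xb3=6.0, xb4=9.0, s3=s3, s4=s4))
```

Because the exponential function is convex, the smearing factor exceeds one whenever the residuals vary, which is why omitting it would understate predicted expenditures.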

2.3 Adverse Selection and Moral Hazard

Although the researchers mentioned above were adapting their models to get a better fit of the distribution of the medical expenses, they were actually already looking at decision-making processes. The complexity of health care is partly caused by these decision-making processes, e.g. the decision to be insured, the decision to visit a doctor or to buy medicines and thus have positive expenses, the doctor’s decision of treatment options and the final decision for the treatment the patient wants and the doctor agrees with. Hence, the binary equations implemented in the four-part models discussed in the preceding paragraphs describe the decision whether to have positive expenses or not and to use inpatient services or not. However, this does not cover all the decisions yet. Besides that, when multiple decision processes are estimated, they are likely to be correlated.

Besides the decision to have health expenditures or not, the decision to be insured is also important. An individual with insurance is more likely to visit a doctor than an individual without insurance. Or an individual with health problems has a greater incentive to get a health insurance than a healthy individual without any medical complaints. These specific behavioral phenomena are specified as adverse selection and moral hazard. Moral hazard and adverse selection are based on asymmetric information and changing risk behavior because of that.

Adverse selection is based on asymmetric information between two parties. In this case individuals have more information about their health status than the doctor and the insurer. This also leads to a selectivity problem. Insurers prefer a healthy customer base. However, relatively unhealthy people are the ones actively looking for health insurance.

Moral hazard is defined as the situation in which someone takes more risk because another party bears part of the cost. In the health insurance case it can be seen as an insured individual who lives less carefully and also goes to see a doctor more readily, just to be sure. In other words, insured individuals are more likely to use medical care because they know it is (partially) covered. This is the reason why it is important to look not only at extreme values of medical expenses, i.e. the utilization of medical care, but also at the insurance status.


2.4 Instrumental Variables

A very recent study of health insurance and the demand for medical care has been performed by Dunn (2015). The main motivation of his research is, as mentioned in the introduction, the large and growing share of medical expenditures in GDP. He wants to estimate the demand for medical care based on the out-of-pocket price paid by the consumer. This differs from the full price paid to the medical provider, because part of the amount is covered by the insurer. Therefore, he includes the demand for insurance too. This leads to the possibility of endogeneity because insurance status, health status and health expenditures are correlated. This correlation is mainly caused by adverse selection and moral hazard. Consequently, insurance status is likely to be correlated with the error term, resulting in endogeneity.

To control for endogeneity, researchers can use experimental data with a high level of randomness or include instrumental variables. The RAND Health Insurance Study is such a random experiment. However, using the RAND Health Insurance Study was not an option for Dunn (2015). In the period between the 1970s and 2015, medical expenses and medical technologies have changed substantially. Therefore it is questionable if it is still useful to obtain conclusions for the twenty-first century from such obsolete data. The data used by Dunn (2015) instead is the MarketScan commercial claims database for the years 2006 and 2007. This dataset has more than four million claims per year and contains detailed information like demographics, medical conditions, expenditures et cetera.

Dunn (2015) points out the importance of the negotiations between medical providers and insurers. As we would expect, insurers negotiate with medical providers before they compile the plans they will offer. So the negotiated prices do influence the plan offers of the insurers, but at the same time they should be uncorrelated with the consumers' selection of the insurance plan, as consumers do not have any input in this negotiation. This line of thought leads to the introduction of the main instrument used in his paper, the Metropolitan Statistical Area (MSA) Service Price Index (SPI). This instrument is likely to be strong since the difference in negotiated prices between MSAs is substantial. Extra variations of this instrument are introduced because a bias could occur when, for example, the service price in an MSA is correlated with the quality of the services in that MSA.

The logarithm is applied to correct for the skewness of medical utilization, but this is the only measure Dunn takes. He approximates the distribution of medical expenditures less well than Manning et al. (1988) and Miller et al. (2004) who both had additional equations to create subgroups.

The empirical model for demand is split up into three cases: overall utilization, weighted number of episodes and utilization per episode.6 Following the health literature, a two-stage residual inclusion model is applied with the logarithm of the MSA service price index ($SPI^r$) as the basic instrument for the out-of-pocket price ($OOPP_f$). A two-stage residual inclusion model is preferred over two-stage least squares since the model is nonlinear. Each model is estimated six times, once without an instrument and five times with different combinations of instruments. All the models include explanatory variables ($Z_i$) like age, family size, income, insurance plan type, obesity and smoking.

6 An episode is the period starting at the first visit for a specific disease until the disease is over. The weighted number of episodes is based on the intensity of the treatment of each disease. The intensity of treatment for hypertension is lower than that for heart disease (Dunn, 2015).


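For the overall service utilization ($SU_i$), the two stages are given by the following equations:

$$\ln(OOPP_f) = \gamma \ln(SPI^r) + \tau_1 Z_i + \xi_i \tag{2.13}$$

$$SU_i = \sum_{d \in i} SU_{d,i} = \exp\left(\alpha \ln(OOPP_f) + \beta_1 Z_i + \delta\hat{\xi}_i\right) + \varepsilon_i \tag{2.14}$$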

It is important to note that the instrument, $\ln(SPI^r)$, has to be uncorrelated with unobserved demand, $\xi_i$, and has to be correlated with the out-of-pocket price $\ln(OOPP_f)$. Otherwise it cannot be classified as a valid instrument. To control for unobserved health and measurement errors which cause endogeneity, the estimated residual of the first-stage regression is included in the second equation. The model for the extensive margin, in other words the weighted number of episodes, is very similar to the above. Again, a two-stage residual inclusion model is applied to address endogeneity. The first stage is unchanged and in the second stage overall utilization is replaced by the weighted number of episodes.

The third case is the intensive margin or service utilization per episode ($SU_{d,i}$). Since the episodes are observed separately, additional information about the specific disease is also analyzed. This leads to a similar model, but now the dependent variable and some explanatory variables differ by both individual and disease. The service price index in this model is disease specific as well, which is a big difference from the two preceding models. Using indices $i$ and $d$ for individual ($Z_i$) and disease ($X_{d,i}$) specific variables respectively, the notation of the model in mathematical form is as follows:

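$$\ln(OOPP_f) = \gamma_1 \ln(SPI^r_d) + \gamma_2 \ln(SPI^r) + \tau_1 Z_i + \tau_2 X_{d,i} + \nu_d + \xi_{i,d} \tag{2.15}$$

$$SU_{d,i} = \exp\left(\alpha \ln(OOPP_f) + \beta_1 Z_i + \beta_2 X_{d,i} + \nu_d + \delta\hat{\xi}_{i,d}\right) + \varepsilon_{d,i} \tag{2.16}$$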

In the end, Dunn (2015) makes some interesting concluding remarks. The results indicate that the price elasticity of the demand for medical care is around -0.22, which is in line with previous research. Besides that, the high significance of δ, the coefficient of the residual inclusion variable, indicates that controlling for endogeneity is crucial. A key finding from the RAND study was that consumers change the number of episodes treated instead of the utilization per episode in response to changes in out-of-pocket prices. The results of the extensive margin model confirm this finding too. Overall, Dunn (2015) concludes that after thirty years the key findings of the RAND study still hold and that negotiated service prices are strongly related to out-of-pocket prices and thus influence medical care utilization.

2.5 Shen

The complex health care system is a popular research topic, but none of the studies estimates the process as a whole (Shen, 2013). None of them estimates insurance, utilization and medical expenses simultaneously to simulate the decision-making processes. None of them adjusts the model well enough to approximate the complexity of reality. According to the following quote from Shen (2013), exactly the opposite is true for the methodology he applies to health care decisions and total health expenditures:

“This paper contributes to the literature by taking into account the interrelated nature of health care decisions and using a semiparametric approach to address the empirical challenges.”


Before the paper of Shen (2013) is discussed, we first take a closer look at two-step estimation methods. The models of the papers discussed so far are all two-step estimation methods. For Duan et al. (1982), Miller et al. (2004) and Manning et al. (1988), the first step is a selection equation. For Dunn (2015), it is the estimation of the instrumental variable. The second step is the outcome equation of their own interest. Shen (2013) distinguishes himself from the other researchers by including a correction term for the sample selection and endogeneity in his first step. This correction term is based on the approach of Heckman.

The Heckman selection model typically consists of two equations:

• A selection equation for a binary decision variable:

$$\text{Selection}_i = \mathbf{1}\{Z_i'\gamma + u_i > 0\} \tag{2.17}$$

• An outcome equation for the variable of interest:

$$(Y_i \mid \text{Selection}_i) = X_i'\beta + \varepsilon_i \tag{2.18}$$

For consistent estimates of the outcome equation, estimated by OLS, the explanatory variables have to be exogenous and the error terms should have mean zero. In the absence of sample selectivity and endogeneity

$$E(\varepsilon_i \mid X_i, \text{Selection}_i) = 0 \tag{2.19}$$

holds. This is the case when samples are taken randomly and all variables are exogenous. However, the selection made by Equation (2.17) is non-random. So it does result in sample selectivity and thus

$$E(\varepsilon_i \mid \text{Selection}_i) \neq 0 \tag{2.20}$$

This can cause endogeneity which is a common problem in selection models. Omitting the selection bias given in Equation (2.20) would result in inconsistent estimates. Therefore an estimate of the selection bias, the correction term, is included which restores a zero conditional mean. In 2000 Heckman received the Economics Nobel Prize for, among other things, his achievement to introduce a correction term which corrects for both selection bias and endogeneity. This results in the final outcome equation

$$(Y_i \mid \text{Selection}_i) = X_i'\beta + \lambda_i(Z_i'\gamma)\,b + \varepsilon_i \tag{2.21}$$

where $\lambda_i(\cdot)$ is a function for the correction term.

Returning to the paper of Shen (2013), the whole health care process is split up into insurance, utilization and expenditures. All three have their own equation. Insurance and utilization are decision variables. Each of these decision variables corresponds to an important question regarding health care:

• Do I want insurance?
• Do I want medical care?

So the dependent variables are individual choices, which means that they can be endogenous. This is also where adverse selection and moral hazard play their role. Total health expenditures are estimated for each insurance status conditional on utilization. Therefore, the methodology of Shen (2013) is a four-part model. He estimates his model both parametrically and semiparametrically. The parametric selection model is formulated as follows (Shen, 2013):

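Indicator function for the insurance status:

$$I = \mathbf{1}\{V_I > \varepsilon_I\}, \quad \text{where } V_I = Z_I\gamma_I \tag{2.22}$$

Indicator function for the utilization:

$$A = \mathbf{1}\{V_A + I\theta_A > \varepsilon_A\}, \quad \text{where } V_A = W_A\beta_A \tag{2.23}$$

Equation for the total health expenditures7:

$$\begin{aligned} Y_E &= X_E\beta_E + \theta_E d + \lambda_d G_d + u_d^* : \quad A = 1,\ I = d,\ d \in \{1, 0\}, \\ \lambda_d G_d(V_A, V_I) &= E(u \mid X_S, A = 1, I = d), \qquad u_d^* = u - \lambda_d G_d(V_A, V_I) \end{aligned} \tag{2.24}$$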

Maximum likelihood estimation of the insurance and utilization decisions is the first step. A joint normality assumption is made for the error terms, which makes it a bivariate probit estimation. An important difference between the models of Shen (2013) and Heckman is that Shen (2013) includes the insurance variable as one of the explanatory variables in the utilization and expenditure equations. The endogeneity correction thus addresses the inclusion of insurance status, while the sample selection correction addresses the selection based on utilization. The inclusion of insurance status has everything to do with moral hazard and adverse selection. Insured people are much more likely to use medical care, and people with a greater demand for health care are more likely to obtain insurance. In the second step the health expenditures are estimated. The inclusion of the insurance status variable is important because it may influence the choice of treatment type, for example buying brand-name medication instead of inexpensive variants. The inclusion of the utilization variable is not necessary because expenditures are estimated only for observations with positive expenditures. As mentioned before, the estimated coefficients will be biased if spenders and non-spenders are kept together as one group.

The correction terms used by Shen (2013) in the parametric case are based on the Heckman correction. In this case, two correction terms ($\lambda_d$) are added, for insurance status and utilization respectively. How these terms are determined depends on the assumptions made. Under normality, it is possible to calculate the correction term by using the standard normal density function and cumulative distribution function. This provides both an easy way to obtain the selection bias and a straightforward interpretation. However, the results and interpretations are only useful when the model is correctly specified. Therefore, if normality does not hold, the results are inconsistent.

Researchers can also choose a nonparametric approach, which does not need any distributional assumptions and thus is more flexible, but also more complex to estimate. Contrary to the parametric approach, the nonparametric approach does not exclude any possible outcomes through the assumptions made. This is one of the reasons why some researchers prefer a nonparametric approach over a parametric approach. Nevertheless, a nonparametric approach also has its disadvantages. The most important one is that the estimation precision decreases when the number of explanatory variables increases. This is mainly due to the curse of dimensionality. A second disadvantage is that the results can be difficult to interpret and the third is that it does not allow extrapolation (Horowitz and Savin, 2001).

7

It is rather predictable that there exists a method which is a middle way between the parametric and nonparametric approaches. This semiparametric approach avoids the curse of dimensionality and is not as restrictive as the parametric approach. Recall the conclusion of Duan et al. (1982) in which they refer to nonparametric and semiparametric approaches as an improvement of the four-part model.

The same reasoning is followed by Shen (2013). He applies a semiparametric selection model to avoid distributional and functional assumptions that are not well justified. The parametric variant is used as a benchmark. The model becomes semiparametric when the distribution of the error term does not have to be specified. However, it does assume a parametric index $X_I'\beta_I$. The index does not have to be linear, but when the dimension of $X$ is large, it becomes difficult to estimate the probability of being insured. To be able to apply maximum likelihood without distributional assumptions, Shen (2013) made index assumptions to develop an estimator of the likelihood. This estimator is based on the approach in Klein and Shen (2010). Following Klein et al. (2010), bias controls and regular kernels are employed to create consistent estimates of the probabilities.

In the semiparametric model, the health expenditures are estimated by Robinson’s differencing method. With this method he differences out the selection and thus a correction is not necessary. True expectations are replaced by consistent estimates and residuals obtained by OLS.

The results of the parametric and semiparametric methods show many similarities but also some differences (Shen, 2013). Although these results only hold for the obese sample selected by Shen (2013), they do show the relevance of multiple explanatory variables. Both methods indicate an increase of 15 percentage points in the probability of seeking health care if a person has private insurance. Turning to the expenditures, the methods show different results. The parametric model indicates that being privately insured causes an increase in expenditures of 125%, while the semiparametric method predicts an increase of 48%. The latter comes close to the results of the RAND study. Another interesting finding is the role of the correction terms. Whereas the econometric literature stresses the importance of correction terms when using selection equations, neither of the correction terms in the parametric model is significant. Neither is the correction term for insurance in the utilization equation. Unfortunately Shen (2013) is unable to provide any information about the correction terms in the semiparametric method, because they cancel out when he applies Robinson's differencing.

Other findings worth mentioning are the impacts of gender, marital status and physical and mental illnesses. Females are much more likely to visit a doctor. The marginal effects of the parametric and semiparametric models are 6 and 4 percentage points, respectively. Marital status, which is one of the variables causing a difference between the explanatory variable matrices, has a significant impact on utilization.8 Married people are approximately 3 percentage points more likely to use medical care. This can be caused by the concern of the partner or a higher income, for example.

To create an indication of physical health status Shen (2013) includes health characteristics like the number of comorbidities, often used in the health literature, and smoking status (Klabunde et al., 2000). For mental illness, a single survey variable is used in which individuals are asked to rate their mental health status. Both physical and mental illnesses have strongly positive impacts on health expenditures. Hence inclusion of these variables in the model of this paper will certainly be considered.

8 To reduce the impact of multicollinearity, the matrices of explanatory variables of the equations have to differ from each other. This will be discussed in more detail in paragraph 3.2.

In summary, the papers discussed above show that estimating health expenditures is complex. The methods should take into account the difference between spenders and non-spenders, adverse selection and moral hazard, but also sample selectivity and endogeneity. Besides that, if the aim is to estimate the process as a whole, the decision-making processes should be estimated too.


Chapter 3

Methodology and Techniques

In the following paragraph we will discuss Cosslett’s selection model. With the help of this information the model of this paper will be described in more detail.

3.1 Cosslett’s Selection Model

Cosslett’s selection model is the semiparametric analogue of Heckman’s two step estimation method (Hussinger, 2008). Heckman’s two step selection model in formulated form is given in Equation (3.1) and (3.2).

$$\text{Selection}_i = \mathbf{1}\{Z_i'\gamma + u_i > 0\} \tag{3.1}$$

$$(Y_i \mid \text{Selection}_i) = X_i'\beta + \lambda_i(Z_i'\gamma)\,b + \varepsilon_i \tag{3.2}$$

The most important difference between the Cosslett and Heckman models is the methodology for calculating the correction term. Besides that, Cosslett’s model is a semiparametric approach and thus is less restrictive than the Heckman model. How the correction term can be calculated depends on the assumptions imposed. Shen (2013) starts by assuming joint normality for his parametric estimation method. The advantage of making distributional assumptions is that it reduces the difficulties of calculating and interpreting the correction term. Under normality the correction term is proportional to the hazard rate,

$$\lambda_{i,\text{insured}}(Z_i'\gamma) = \frac{\phi(Z_i'\gamma)}{\Phi(Z_i'\gamma)} \tag{3.3}$$

$$\lambda_{i,\text{uninsured}}(Z_i'\gamma) = -\frac{\phi(Z_i'\gamma)}{1 - \Phi(Z_i'\gamma)} \tag{3.4}$$

where $\phi$ and $\Phi$ are the standard normal probability density function and cumulative distribution function respectively. However, if the distributional assumptions are not true, it will in general lead to inconsistent results.
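As an illustration of the parametric correction terms for the insured and uninsured groups under normality, the following minimal Python sketch evaluates them for a given value of the index $Z_i'\gamma$. The function names are hypothetical and only the standard library is used.

```python
import math


def norm_pdf(z):
    """Standard normal probability density function."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def lambda_insured(index):
    """Correction term for the insured group: phi / Phi."""
    return norm_pdf(index) / norm_cdf(index)


def lambda_uninsured(index):
    """Correction term for the uninsured group: -phi / (1 - Phi)."""
    return -norm_pdf(index) / (1.0 - norm_cdf(index))


# Hypothetical index values, for illustration only.
for index in (-1.0, 0.0, 1.0):
    print(index, lambda_insured(index), lambda_uninsured(index))
```

The correction for the insured group shrinks towards zero as the index grows: individuals who are very likely to be insured according to their observed characteristics carry little selection information.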

Avoiding this normality assumption leads to a semiparametric or nonparametric estimation method. In that case, Cosslett’s model is one of the possibilities. The correction term is no longer proportional to the hazard rate. Instead, Cosslett uses dummy variables to approximate $\lambda_i(Z_i'\gamma)$. These are created by dividing the range of $Z_i'\gamma$ into a certain number of smaller intervals. Each interval has its own dummy variable. The underlying principle is that, as the size of the intervals goes to zero, the dummies fit the correction function $\lambda_i(Z_i'\gamma)$ increasingly well and the results become consistent. The outcome equation is estimated by OLS.
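The construction of the Cosslett dummies can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name is hypothetical, the intervals are taken to be of equal width between the smallest and largest index value, and the index values themselves are made up.

```python
def cosslett_dummies(index_values, n_intervals):
    """One row of 0/1 dummies per observation, one column per interval."""
    lo, hi = min(index_values), max(index_values)
    if hi == lo:
        raise ValueError("the index must vary across observations")
    width = (hi - lo) / n_intervals
    rows = []
    for value in index_values:
        # The largest index value is assigned to the last interval.
        k = min(int((value - lo) / width), n_intervals - 1)
        row = [0] * n_intervals
        row[k] = 1
        rows.append(row)
    return rows


# Hypothetical fitted index values from the selection equation.
for row in cosslett_dummies([0.0, 0.5, 1.0, 2.0], n_intervals=2):
    print(row)
```

When the outcome equation contains an intercept, one of the dummies has to be left out to avoid perfect collinearity.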

3.2 An Extended Cosslett Selection Model

3.2.1 Parametric Approach

The model applied in this paper is an extended Cosslett selection model based on a combination of the model applied by Shen (2013) and Cosslett’s selection model. The model of this paper is called an extended version of the Cosslett-model because it has two binary variable equations and one outcome equation estimated semiparametrically. This results in a four-part model instead of a two-part model:

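$$\text{Insurance}_i = \mathbf{1}\{Z_i'\gamma + u_i > 0\} \tag{3.5}$$

$$\text{Utilization}_i = \mathbf{1}\Big\{W_i'\pi + \text{Insurance}_i \cdot \delta + \sum_{m=1}^{M} \lambda_m D_m(Z_i'\hat{\gamma}) + v_i > 0\Big\} \tag{3.6}$$

$$(\text{Total Health Expenditures}_i \mid \text{Insurance}_i = 1, \text{Utilization}_i = 1) = X_i'\beta_1 + \sum_{m=1}^{M}\sum_{j=1}^{J} \alpha_m D_m(Z_i'\hat{\gamma})\,D_j(W_i'\hat{\pi}, \text{Insurance}_i) + \varepsilon_i \tag{3.7}$$

$$(\text{Total Health Expenditures}_i \mid \text{Insurance}_i = 0, \text{Utilization}_i = 1) = X_i'\beta_2 + \sum_{m=1}^{M}\sum_{j=1}^{J} \alpha_m D_m(Z_i'\hat{\gamma})\,D_j(W_i'\hat{\pi}, \text{Insurance}_i) + \varepsilon_i \tag{3.8}$$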

Here $D_m$ ($D_j$) stands for the $m$th ($j$th) Cosslett dummy. Parameters α, β and γ will be estimated. Equations (3.5) and (3.6) describe decision-making processes. In this research we have two important decisions, namely obtaining health insurance and seeking health care. These are the same decision variables as used by Shen (2013): Insurance and Utilization. In more detail, the insurance status has the value zero when an individual is not insured and one when insured. Utilization is zero when there are no medical expenses and one if the expenses are strictly positive. Again, since we are interested in people with positive expenses and different insurance statuses, we are making a non-random selection. Combined with the possibility of endogeneity, this makes the inclusion of the selection correction term unavoidable. To obtain the dummies for the correction term, the range of $Z'\hat{\gamma}$ is divided into $M$ intervals of equal size. The minimum and maximum values of $Z'\hat{\gamma}$ are the lower and upper bounds. Each interval is given its own dummy variable, so there will be $M$ dummies. The higher the value of $Z'\hat{\gamma}$, the more likely it is that the individual is insured. Analogously, with the estimation of the second binary variable we create another $J$ dummies.1 The higher the value of $W'\hat{\pi}$, the more likely it is that the individual uses medical care. As we can see in Equations (3.7) and (3.8), the correction dummies of the two binary variables are included in the outcome equation as cross-dummies, so in total there are theoretically $M^2$ correction dummies in the last equation. However, in practice it is possible that some combinations of the Cosslett dummies of insurance and utilization do not exist within the sample. So the total number of cross-dummies is expected to be less than $M^2$.

1
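The remark that not every combination of insurance and utilization dummies occurs in the sample can be made concrete with a small Python sketch. The function names and the index values are hypothetical; the sketch only counts which cells of the cross-classification are non-empty and would therefore receive a cross-dummy.

```python
def interval_index(values, n_intervals):
    """Zero-based index of the equal-width interval each value falls into."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("the index must vary across observations")
    width = (hi - lo) / n_intervals
    return [min(int((v - lo) / width), n_intervals - 1) for v in values]


def nonempty_cells(insurance_index, utilization_index, m, j):
    """Sorted list of (insurance interval, utilization interval) pairs in use."""
    ins = interval_index(insurance_index, m)
    uti = interval_index(utilization_index, j)
    return sorted(set(zip(ins, uti)))


# Hypothetical fitted index values for four individuals.
cells = nonempty_cells([0.0, 0.5, 1.0, 2.0], [0.0, 3.0, 4.0, 4.0], m=2, j=2)
print(cells)       # cells that occur in this small sample
print(len(cells))  # number of cross-dummies that can be identified
```

In this made-up sample three of the four possible cells occur, so only three cross-dummies could enter the outcome equation.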

The last step of the model is estimating the total health expenditures by OLS. This equation is estimated separately for the sample with insurance and positive expenditures and without insurance with positive expenditures.

We will bootstrap the standard errors to create asymptotically correct standard errors and be able to interpret the significance of the explanatory variables. When the sample size is large, the bootstrapped estimates will converge to the standard errors as the number of replications increases. The formula to calculate the bootstrapped standard error for each explanatory variable is:

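$$SD_{bootstrap} = \sqrt{\frac{\beta_{bootstrap}'\beta_{bootstrap}}{R - 1} - \frac{R}{R - 1}\left(\frac{\sum_{r=1}^{R}\beta_{bootstrap,r}}{R}\right)^{2}} \tag{3.9}$$

where $R$ is the number of bootstrap replications and $\beta_{bootstrap}$ is the vector of the $R$ bootstrap estimates $\beta_{bootstrap,r}$ of the coefficient.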
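The bootstrap standard error formula can be illustrated with the following Python sketch. It is a minimal example under assumptions: the data are made up, the statistic is the slope of a simple regression rather than a coefficient of the model in this paper, and all function names are hypothetical.

```python
import math
import random


def bootstrap_sd(estimates):
    """Standard deviation of R bootstrap estimates, written as in (3.9)."""
    r = len(estimates)
    sum_sq = sum(b * b for b in estimates)
    mean = sum(estimates) / r
    variance = sum_sq / (r - 1) - r / (r - 1) * mean ** 2
    return math.sqrt(max(variance, 0.0))  # guard against rounding below zero


def ols_slope(x, y):
    """Slope of a simple OLS regression of y on x with an intercept."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def bootstrap_slope_sd(x, y, replications, seed=0):
    """Resample observations with replacement and re-estimate the slope."""
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    for _ in range(replications):
        idx = [rng.randrange(n) for _ in range(n)]
        estimates.append(ols_slope([x[i] for i in idx], [y[i] for i in idx]))
    return bootstrap_sd(estimates)


# Hypothetical data: y = 2x plus an alternating disturbance.
x = [float(i) for i in range(30)]
y = [2.0 * v + (1.0 if i % 2 else -1.0) for i, v in enumerate(x)]
print(bootstrap_slope_sd(x, y, replications=200))
```

The expression under the square root is the usual sample variance of the $R$ estimates, written in terms of the sum of squares and the mean.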

In the two selection equations binary variables are estimated. To interpret the results in a correct way, marginal effects will be calculated for the estimated coefficients of these explanatory variables. In the outcome equation total health expenditures, a continuous variable, is estimated by OLS. These results can be interpreted directly.

3.2.2 Semiparametric Approach

The semiparametric approach is quite similar to the parametric model. The estimation method of the outcome equations remains unchanged. The two selection equations are estimated differently. Instead of a probit estimation, the semiparametric method of Gallant and Nychka is applied, see Stewart (2004).

This estimation method is based on maximum likelihood whereby the density function is estimated by a product of a polynomial and a density function of a standard normal distribution. In general, an expansion around any density with a moment-generating function could be used (Stewart, 2004).

The standard formula for an ordered probit model is

$$\log(L) = \sum_{i=1}^{N}\sum_{j=1}^{J} y_{ij} \log\left[F(\alpha_j - x_i'\beta) - F(\alpha_{j-1} - x_i'\beta)\right] \tag{3.10}$$
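With only two outcomes, as in the selection equations of this paper, the general expression (3.10) can be written out explicitly. As a sketch, assume the conventional thresholds $\alpha_0 = -\infty$, $\alpha_1 = 0$ and $\alpha_2 = +\infty$ (an assumed normalization, not taken from the text), so that $F(\alpha_0 - x_i'\beta) = 0$ and $F(\alpha_2 - x_i'\beta) = 1$:

```latex
\log(L) = \sum_{i=1}^{N} \Big\{ y_{i1} \log F(-x_i'\beta)
        + y_{i2} \log\big[ 1 - F(-x_i'\beta) \big] \Big\}
```

If $F$ is the standard normal CDF, symmetry gives $1 - F(-x_i'\beta) = \Phi(x_i'\beta)$ and the expression is the usual binary probit log-likelihood. For a general $F$ this symmetry need not hold, so the second term is kept as $1 - F(-x_i'\beta)$.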

The model of this research has only two possible outcomes. The probit estimation assumes that F (·) is the standard normal CDF. In the semiparametric case, this distribution is unknown. Following the method proposed by Gallant and Nychka, the unknown density function is approximated by the density using a Hermite form. This approximation is a product of the squared (kth order) polynomial and any density with a moment-generating function.2 Gallant and Nychka show that ultimately a normal density is used resulting in a Gaussian leading term. The normal density is also used in this paper. The approximation is as follows:

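$$f_K(\varepsilon) = \frac{1}{\theta}\left(\sum_{k=0}^{K}\gamma_k\varepsilon^k\right)^{2}\phi(\varepsilon) \tag{3.11}$$

$$\theta = \int_{-\infty}^{\infty}\left(\sum_{k=0}^{K}\gamma_k\varepsilon^k\right)^{2}\phi(\varepsilon)\,d\varepsilon \tag{3.12}$$

2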

To fulfill the requirements for a proper density function, the polynomial is squared, which makes it nonnegative, and this term is multiplied by the normal density function and divided by θ, resulting in a function integrating to one.3 Here θ is the integral of the product over the whole real line, which can be seen as the total. Dividing by the total gives a probability and thus a density function.
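That the scaled Hermite-form approximation is a proper density can be checked numerically. The Python sketch below is an illustration under assumptions: the coefficients $\gamma_k$ are made up, the function names are hypothetical, and θ is computed from the known moments of the standard normal distribution instead of by estimation.

```python
import math


def normal_moment(n):
    """E[e^n] for a standard normal e: 0 for odd n, (n - 1)!! for even n."""
    if n % 2 == 1:
        return 0.0
    result = 1.0
    for m in range(n - 1, 0, -2):
        result *= m
    return result


def snp_theta(gammas):
    """Integral of (sum_k gamma_k e^k)^2 * phi(e) over the real line."""
    return sum(gj * gk * normal_moment(j + k)
               for j, gj in enumerate(gammas)
               for k, gk in enumerate(gammas))


def snp_density(e, gammas):
    """Squared polynomial times the normal density, scaled by 1 / theta."""
    poly = sum(g * e ** k for k, g in enumerate(gammas))
    phi = math.exp(-0.5 * e * e) / math.sqrt(2.0 * math.pi)
    return poly * poly * phi / snp_theta(gammas)


def integrate(f, lo, hi, steps):
    """Trapezoidal rule on an equally spaced grid."""
    h = (hi - lo) / steps
    total = 0.5 * (f(lo) + f(hi))
    for s in range(1, steps):
        total += f(lo + s * h)
    return total * h


gammas = [1.0, 0.5, -0.2]  # hypothetical coefficients, K = 2
print(snp_theta(gammas))
print(integrate(lambda e: snp_density(e, gammas), -10.0, 10.0, 20000))
```

With all higher-order coefficients set to zero, the approximation collapses to the standard normal density, which is why the probit model is nested in this approach.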

3


Chapter 4

Data

In the following chapter the data of the Medical Expenditure Panel Survey is discussed. After that, dependent and explanatory variables for each equation are explained. Descriptive statistics are given as well followed by a description of the selected explanatory variables and their expected impacts.

4.1 Medical Expenditure Panel Survey

For this research we will use the Medical Expenditure Panel Survey (MEPS), which is a set of large-scale surveys of families and individuals provided by the Agency for Healthcare Research and Quality (AHRQ). This dataset is also used by Shen (2013). However, Shen (2013) focuses on the obese population, so it will be hard to compare our results with his.

The survey began in 1996 and is still ongoing. It contains characteristics of individuals in America, their medical costs, their use of health care and their health insurance coverage. These are some of the most important variables in our research, but the data set contains more variables. From disease diagnoses to Body Mass Index (BMI), from employment sector to family size and from smoking behavior to pension plan, it is all included in the data set.

4.2 Dependent Variables

The model used in this paper has three different dependent variables. The first equation is a selection based on insurance status. Here the dependent variable is insurance status. This is a binary variable with the value one if the individual is privately insured. Public insurance is not included in this research because it is not seen as a choice, since it depends on special requirements like income level, occupation and age. The second selection is based on utilization of medical care. This is a binary variable as well. Utilization has the value one if the individual has strictly positive health expenditures. Utilization is zero when the total health expenditures are zero. For the outcome equations, the dependent variable is the total health expenditures. This is the total amount spent on medical care including out-of-pocket and covered expenditures.

4.3

Difficulties with Variable Selection

Before we make a selection of explanatory variables we have to acknowledge possible difficulties within our model:

1. When an estimated variable is included in subsequent equations of the overall estimation method, there is a chance of multicollinearity (Puhani, 2008). In our case, insurance status is included in both the utilization and the medical expenditures equations. When the explanatory variables of the insurance equation overlap too much with those of the utilization equation, multicollinearity is likely to occur. To avoid this, it is necessary to have a different selection of explanatory variables for each equation. Certain variables are primary control variables that cannot be left out of any of the three equations, so some overlap is unavoidable. In any case, every effort to differentiate the three selections of explanatory variables is desirable.

2. With every econometric study it is highly preferable to use a sufficiently large dataset. Here a sufficiently large sample is essential, partly because Cosslett’s approach generates a lot of dummies. Since two equations are estimated, the number of dummy variables increases quadratically. To guarantee reliable results, the ratio between sample size and number of explanatory variables has to be acceptable. In our case, the sample size is sufficiently large, so this will not cause any problems.

3. In the Medical Expenditure Panel Survey, individuals sometimes forgot, did not know or did not want to answer certain questions. In the MEPS database, these unknown characteristics get a negative value. These negative values could lead to biased estimates. As a consequence, we try to avoid variables completed by only a small part of the survey participants. Moreover, besides the negative values discussed above, there are also well-reported values of certain variables that are outliers, for example in total health expenditures or income. Such variables frequently follow a positively skewed distribution, meaning that the mass of the distribution is concentrated on the left with a long right tail. Taking the natural logarithm is a familiar adjustment. Because the logarithm of zero does not exist, a small constant is often added, so that the sample size is not reduced and no mathematically impossible calculations arise. Another solution is to create an extra dummy variable for individuals with no expenses or no income. With regard to the fat right tail, a possible solution is to truncate the value at a defined maximum value. Note that we create selectivity by avoiding individuals with unknown characteristics and by capping expenditures and income. Also, when going through the data for several years, some variables return each year while others do not.
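The adjustments described in the third point can be sketched as follows. This is a minimal illustration under stated assumptions: the cap of 100,000 and the added constant of 1 are arbitrary example values, not the values used in this thesis, and the function names are hypothetical. Adding a constant and using a zero dummy are alternatives, so the sketch returns both and leaves the choice to the user.

```python
import math
from typing import Optional, Tuple


def clean_value(raw: float) -> Optional[float]:
    """Treat negative codes (unknown or refused answers) as missing."""
    return None if raw < 0 else raw


def transform_amount(
    raw: float,
    cap: float = 100_000.0,  # assumed maximum value, for illustration only
    shift: float = 1.0,      # assumed small constant added before the log
) -> Optional[Tuple[float, int]]:
    """Return (log of the capped amount plus shift, zero dummy), or None."""
    value = clean_value(raw)
    if value is None:
        return None
    # Cap the fat right tail at the defined maximum value.
    capped = min(value, cap)
    # Dummy for individuals with no expenses or no income.
    zero_dummy = 1 if capped == 0 else 0
    # The shift keeps the logarithm defined for zero amounts.
    return math.log(capped + shift), zero_dummy
```

With a shift of 1, a zero amount maps to a log value of exactly zero, which keeps such individuals in the sample while the zero dummy can still absorb their separate effect.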

4.4

Selected Variables and their Expected Impact

The selection of explanatory variables is an important part of the model that also influences its quality. The Medical Expenditure Panel Survey gives us a wide range of variables which can be included in the model. Different interpretations of causal relations make it hard to set hypotheses about the impact of each variable. Nevertheless, in this section we will discuss some of the expected impacts of the explanatory variables. As discussed in the foregoing section, differences between the explanatory variables of each equation are important. Therefore, we discuss variables included in all three equations (corresponding explanatory variables) and variables included in only one or two of the equations (specific explanatory variables).

4.4.1

Corresponding Explanatory Variables

Health Status

Besides these standard control variables, we also include several variables indicating health status:

• Comorbidities ratio
• Smoke status
• Number of visits to a medical office
• Body Mass Index
• Pregnancy
• Deafness
• Blindness
• Mental health

We use a comorbidities ratio instead of a comorbidities count, because some of the comorbidities were not measured in the earlier years. For most years, the list of comorbidities consists of asthma, arthritis, cancer, coronary heart disease, other heart diseases, high cholesterol, high blood pressure and stroke. Cancer is not included for 2004, 2005, 2006 and 2007 and high cholesterol is not included for 2004. Because we do want to include cancer and high cholesterol when they are available in a certain year, we divide the number of comorbidities of an individual by the total number of comorbidities taken into account for that specific year. This results in the comorbidities ratio used as an explanatory variable in our model. We expect a positive impact of the comorbidities ratio in every equation. The more diseases someone has, the more likely that person is to be insured and to use medical care, and the higher the health expenditures will be. Regarding insurance status, this expectation is based on the selectivity problem, meaning that less healthy individuals look more actively for insurance.
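The calculation of the comorbidities ratio can be sketched as follows. The list of conditions and the years in which cancer and high cholesterol are missing are taken from the description above; the function and variable names are hypothetical and only serve as an illustration.

```python
from typing import Iterable

ALL_CONDITIONS = frozenset({
    "asthma", "arthritis", "cancer", "coronary heart disease",
    "other heart diseases", "high cholesterol", "high blood pressure",
    "stroke",
})

# Conditions that were not measured in a given survey year.
NOT_MEASURED = {
    2004: frozenset({"cancer", "high cholesterol"}),
    2005: frozenset({"cancer"}),
    2006: frozenset({"cancer"}),
    2007: frozenset({"cancer"}),
}


def comorbidities_ratio(year: int, diagnosed: Iterable[str]) -> float:
    """Share of the conditions measured in `year` that the person has."""
    available = ALL_CONDITIONS - NOT_MEASURED.get(year, frozenset())
    # Only count diagnoses that were actually measured in this year.
    count = len(set(diagnosed) & available)
    return count / len(available)
```

For example, an individual with asthma and stroke has a ratio of 2/6 in 2004, when six conditions are measured, and 2/8 in a year in which all eight conditions are measured.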

Smoke status is a dummy variable. Smoking has a negative effect on health status, but it is also something people can quit. This means that people have a choice to continue or quit smoking. Following the same reasoning as for the comorbidities ratio, we would say that smoking worsens physical health status and thus leads to more medical care and higher expenditures. However, people often smoke to reduce stress (Cohen and Lichtenstein, 1990). Hence, it may improve mental health status. Hibbard and Cunningham (2008) have done research into patient activation and its correlation with other characteristics. They define patient activation as a person’s ability to manage their health and health care. Patient activation is lower for those who are obese or smoke. In addition, they point out that the causality operates in both directions. A poor health status causes lower activation, but passivity and lower activation also lead to poorer health. So smokers can be more passive and care less about their health. In that case the impact of smoking on insurance, utilization and health expenditures will be negative.

Patient activation is lower for those who are obese or smoke (Hibbard and Cunningham, 2008), so the same reasoning applies to BMI. This index is categorized as low (< 18.5), normal (18.5 − 25), high (25 − 30) and very high (> 30). We expect that the group with a very high BMI will need more medical care. Also, BMI is expected to have a positive impact on health expenditures. Turning to insurance, the impact can be negative because insurance companies can reject less healthy people. The category low BMI cannot be
