### Faculty of Economics and Business, University of Amsterdam

## Health Spending Budget Range Prediction

### How will individual choice on health insurance and other factors affect health spending?

### Author: Shihan Yu 12012726 Supervisor: Edwin Hinloopen

### June 30, 2021

### Abstract

In recent years, the Netherlands has become one of the ten most expensive countries in terms of medical expenditure, and its aggregate healthcare expenditure has risen sharply compared to other high-income economies. This paper therefore investigates how health insurance affects health spending in the Netherlands and predicts a spending budget range using the ordered probit model and the ordered logit model. The paper also considers the endogeneity of the insurance premium and other potential factors that lead to changes in spending decisions.

### Statement of Originality

The document is written by Shihan Yu who declares to take full responsibility for the content of this document. I declare that the text and the work presented in this document are original and that no sources other than those mentioned in the text and its references have been used in creating it. The Faculty of Economics and Business is responsible solely for the supervision of the completion of the work not for the content.

### Acknowledgement

I would like to thank my supervisor Edwin Hinloopen for his support and guidance throughout the thesis process. I would also like to thank the University of Amsterdam for providing me with the opportunity to participate in the Econometrics program, and for all the opportunities that this will afford me in the future.

### Introduction

The Dutch health care system is now universally applied, and individuals have considerable freedom of choice when purchasing insurance. The obvious choices include the amount paid for the health insurance product, the insurance company, and other individual insurance decisions. The latest data from the OECD show that from 2016 to 2019, total spending on health insurance in the Netherlands was above the OECD average but on a downward trend. Interestingly, per capita healthcare spending in the Netherlands continued to grow rapidly over these years. Economic growth has been found to increase aggregate health expenditure (Wang and Lee, 2018). Rather than examining this overall relationship, however, this paper focuses on predicting the health spending budget while considering the influence of insurance choice and other factors. Using a simple linear regression model to predict the exact value of health spending might cause high bias. Therefore, this paper divides health spending into four levels and estimates which of the four ranges an individual’s budget falls into.

However, there are three empirical considerations regarding the analysis of health insurance decisions.

First of all, since health expenditure is a continuous variable, it is not appropriate for a predictive choice model, because continuous data are awkward to handle in choice analysis. If the dependent variable were simply converted to a binary variable, observations would be divided only into those with healthcare spending and those without. A binary outcome cannot distinguish between different ranks of positive spending. In particular, the comparison between people with low and high health care spending would be lost. Therefore, drawing on ideas from Daykin and Moffatt (2002), this paper converts the latent, continuously distributed random spending into an ordered response.

Secondly, the insurance premium is only one of the main factors affecting health spending, and many other individual factors also play a role. The individual’s decision will depend on the type of insurance company they choose. For those with private health insurance, coverage and payment levels for individual medical services depend in part on the ability of the primary insured to cover the associated premiums (Atake, 2020). Meanwhile, there is a link between health status and health care spending when health care helps to reduce or prevent health deterioration (Rivera, 2001). In addition, according to Xu et al. (2015), medical expenditures due to smoking are an essential component of the overall economic costs of smoking. Therefore, the severity of smoking also leads to changes in the extent of health expenditures. A variety of literature shows that the magnitude of health spending does not depend only on the coverage premium but is also closely related to other individual characteristics. Thus, many other variables need to be taken into consideration in the regression.

Furthermore, health insurance is potentially an endogenous variable. According to Schellhorn (2001), insurance status could be interdependent with utilization variables, health status and other individual factors. From his point of view, supplemental insurance only covers additional costs, so the choice of supplementary insurance is considered an exogenous variable. Also, private insurance cannot cover the same risks as social insurance (Sinn, 1996), and individuals may know their own risk type better than the insurer does (Cardon and Hendel, 2001). In addition, people take factors such as age and income into account when deciding how much to spend on insurance. Therefore, the type and amount of insurance are considered endogenous. Manning et al. (1987) instead use an alternative dataset to avoid the endogeneity problem, an approach that is not pursued in this research.

This paper examines the topic by comparing various individual health insurance decisions and the corresponding extent of health spending. There has been plenty of literature examining the overall relationship between health spending and health insurance, but very little has focused on the impact on individuals. In this research, a choice model is used to study the relationship. As claimed by Amemiya (1981), the three most frequently used models in quantitative analysis are the linear probability model, the probit model and the logit model. Therefore, this paper applies the logit and probit models with an ordinal dependent variable. To account for endogeneity, it also uses the two-stage predictor substitution (2SPS) model (Terza et al., 2008) to compare the results of treating health insurance choice as an endogenous variable with those of treating it as an exogenous variable.

The remainder of this paper is structured as follows. The second section describes the Dutch health care system and reviews previous studies on the impact of health insurance on different levels of health care budget. The “Data” and “Model” sections describe the dataset used in the research and explain the methods chosen in the study. Sections 5 and 6 present the findings of the study and discuss further considerations.

### Literature review

According to Huber and Orosze (2003), the Netherlands’ aggregate healthcare expenditure and per capita expenditure have both remained above average during recent decades. Their analysis also suggests a potential connection between health spending and health insurance. Ward and Franks (2007), on the other hand, focus on the influence of specific individual decisions on medical expenses. They conclude that variability in insurance status is related to predictable changes in total healthcare expenditures. However, this conclusion is subject to a limited sample and a lack of comprehensive research on the impact of insurance.

In a recent study on the Dutch healthcare system (Bakx et al., 2016), the Netherlands ranks first in terms of healthcare expenditure among high-income countries, which results from the peculiarities of the Dutch medical system. As stated in their study, healthcare expenses are negatively correlated with household income level, and medical expenses account for approximately 33% of long-term care expenses. Earlier research (Newhouse, 1992) reached a similar conclusion that the amount of medical expenses could depend on health insurance. However, in his findings, the implementation of insurance did not lead to the obvious growth in expenditure that was expected. Finkelstein (2007) suggests that the widespread implementation of health insurance may play a more substantial role in medical expenses, and argues in further discussion that this impact could also hold in most other health insurance systems.

Wang and Lee (2018) later conducted similar research on 24 national healthcare systems, including the Netherlands, and their results confirm Finkelstein’s assumptions. Moreover, life insurance consumption not only reduces medical expenses but also affects the relationship between medical expenditure and the economy.

All the literature above treats health spending as a continuous variable in a linear regression. Unlike models that treat health spending as a continuous random variable, however, ordered response models can reflect individual-specific fixed effects. An ordered model can thus explain how different insurance choices affect the allocation of the budget range for the available health care services (Adhikari, 2013). Therefore, some research has transformed health spending into an ordered variable. A two-part choice model was used by Xu et al. (2015) to determine the scale of health spending attributable to cigarette smoking. They first applied a simple logit model for a positive health spending response, then used the same model to further subdivide the smoking fraction.

In contrast to Xu et al. (2015), the method in Daykin and Moffatt (2002) and Deussing (2003) gives a more straightforward model for rescaling spending. In their view, the continuous latent variable corresponding to the dependent variable is converted into an ordinal variable by using a cut-off point between consecutive alternatives. By doing so, the level of health spending can be modelled directly using the ordered probit model (Daykin and Moffatt, 2002) or the ordered logit model (Adhikari, 2013).

However, treating health insurance as exogenous is not always possible (Shen, 2013). As mentioned in that report, individual insurance choice is possibly endogenous, and the marginal effect on expenditure is then quite different from that found in studies based on the exogeneity assumption (Newhouse, 1992). In the Netherlands, health shocks and insurance type could be the causes of insurance endogeneity. Although social insurance covers the majority of the Dutch population, private health insurance also takes a third of the primary health market (Tapay and Colombo, 2004). Similarly, in terms of health shocks, the results illustrate that economic growth increases insurance consumption, which implies a negative correlation between health expenditure and economic growth (Wang et al., 2018). It is highly likely that health insurance is closely associated with such endogenous factors, which cannot be neglected given their impact on medical expenses.

To avoid the influence of endogeneity, some authors (Manning et al., 1987) have instead chosen to use experimental data. They point out that endogeneity cannot be avoided with non-experimental data because such data are cross-sectional, and because some people have more incentive to obtain more complete coverage when multiple choices are available. Neglecting endogeneity would bias the estimates upward, while the confidence intervals for the coefficients remain narrow. Therefore, they created a control group in the experimental data to eliminate temporal bias caused by other factors, using data from the Rand Health Insurance Experiment (HIE) to narrow the uncertainty on this issue. However, this approach is rarely used because experimental data are difficult to obtain and can easily become obsolete (Shen, 2013).

In general, the use of instrumental variables is a very common way to address this empirical challenge. According to Deb et al. (2006), the availability of alternative insurance plans is one reason why the regression model becomes endogenous: less healthy individuals are more likely than healthy individuals to enrol in health plans with more insurance and broader coverage. Deb et al. also state that in their data, individual choice possibly depends on personal characteristics. Thus, health status, risk preferences and similar characteristics could be the underlying instrumental variables. To tackle this concern, their study embeds a multinomial selection model in a Bayesian estimation framework. Analogously, Schellhorn (2001) identifies health status as an overriding factor in insurance decisions, while income appears to be a more important determinant of health insurance choice than medical care or health status. Since all variables are interdependent, he uses the availability of supplemental insurance as an instrument and treats the choice of deductible as endogenous. In this way, the model allows for the influence of health care utilization and refrains from using health status variables as endogenous variables.

This literature review mainly analyzes recent studies on the impact of health insurance on health spending and gives some examples of literature that transforms continuous spending into an ordinal variable. It also notes that health insurance is likely to be an endogenous variable. However, only a minority of the literature mentions endogeneity, and few studies have translated health expenditures into an ordinal variable. Therefore, this paper uses an ordinal choice model to investigate their interaction and takes endogeneity into consideration.

### Methodology

In the presence of endogeneity, simple regression models can lead to biased and inconsistent coefficient estimates. The simplest and most commonly used remedy is two-stage least squares (2SLS). However, because the dependent variable is ordinal, the regression model is nonlinear. Therefore, this paper applies two-stage predictor substitution (2SPS), which is quite similar to 2SLS but is widely applied to nonlinear models. The theory is based on Terza et al. (2008). In the first stage, an auxiliary regression is constructed with the instrumental and exogenous variables, and its results are used to generate predicted values of the endogenous variable. In the second stage, the corresponding choice model is estimated after substituting the predicted values for the endogenous variable.

Subsequently, the paper compares two different choice models estimated with the exogenous variables and the predicted variable $\hat{X}$. Because the dependent variable is ordinal, choosing linear regression would implicitly presume that the differences between the levels of the ordinal variable are equal (Daykin and Moffatt, 2002), even though the rating has no numerical significance and reflects only ordinality. In contrast, the ordered probit model is consistent with this type of measurement. Using the ordered probit model implies that a particular assigned value of the ordinal variable is associated with a range of willingness to spend on health care. Although differences in willingness toward a given response are clearly unobserved, the model should accommodate the existence of such differences (Rivera, 2001). In the case of ordinal data, ignoring such differences can cause biased estimates, especially if the size of a category is relatively small.

The ordered logit model has a somewhat similar structure to the ordered probit model, but computational tractability (Van Beek et al., 1997) is an advantage of the ordered logit.

The ordered regression model requires that a continuous latent variable underlies the dependent variable (Amponsah, 2009). To predict the budget range, the paper classifies health spending into four levels (y), and the ordinal y distinguishes the degree of an individual’s predicted budget through thresholds. Although the latent variable itself is not observed, y is observed by determining which cut points its value crosses.

The ordinal variable is defined as follows:

$$
y =
\begin{cases}
0, & \text{if expenditure} = 0 \quad \text{(no spending budget)} \\
1, & \text{if } 0 < \text{expenditure} \le \kappa_1 \quad \text{(low spending budget, 0-50)} \\
2, & \text{if } \kappa_1 < \text{expenditure} \le \kappa_2 \quad \text{(medium spending budget, 50-200)} \\
3, & \text{if } \kappa_2 < \text{expenditure} \quad \text{(high spending budget, more than 200)}
\end{cases}
$$

where $\kappa_i$ is the trisection point of the health expenditure distribution, excluding all observations with zero health spending. The values of y from 0 to 3 define the four different levels of health spending budget.
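The mapping from continuous spending to the ordinal level can be sketched in a few lines of Python. This is only an illustration: the function name `spending_level` is hypothetical, and the default cut-off values of 50 and 200 are the ones stated in the Data section.

```python
def spending_level(expenditure, kappa1=50.0, kappa2=200.0):
    """Map monthly out-of-pocket health spending to the ordinal level y in {0, 1, 2, 3}."""
    if expenditure < 0:
        raise ValueError("expenditure must be non-negative")
    if expenditure == 0:
        return 0  # no spending budget
    if expenditure <= kappa1:
        return 1  # low spending budget
    if expenditure <= kappa2:
        return 2  # medium spending budget
    return 3      # high spending budget


# Illustrative spending values, not observations from the LISS data
levels = [spending_level(x) for x in [0, 25, 50, 73.8, 200, 3000]]
print(levels)  # [0, 1, 1, 2, 2, 3]
```

Note that a value exactly on a cut-off falls into the lower category, matching the inequalities in the definition.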

Endogeneity is addressed in the first step, so the auxiliary regression is given as follows:

$$
X = X^{exo}\beta + Z\alpha + \eta \tag{1}
$$

$$
\hat{X} = X^{exo}\hat{\beta} + Z\hat{\alpha} \tag{2}
$$

where $X$ represents the endogenous variable, which is the insurance premium. $X^{exo}$ is an $n \times k_1$ matrix of the other $k_1$ exogenous variables that also have an effect on y, and $Z$ is an $n \times k_2$ matrix of $k_2$ identifying instrumental variables. $\beta$ ($k_1 \times 1$) and $\alpha$ ($k_2 \times 1$) are column vectors of parameters. The instrumental variables are supposed to be sufficiently associated with the endogenous variable. Additionally, they should not directly influence the outcome (Castineira and Nunes, 1999).
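A first-stage auxiliary regression of this form can be sketched with ordinary least squares. The example below is a minimal illustration under stated assumptions: the helper `ols` is hypothetical, the toy data contain one exogenous variable (health status) and one instrument (age), and the premium is generated without noise from assumed coefficients so that the fit can be checked by eye. None of these numbers come from the thesis.

```python
def ols(design, target):
    """Solve the normal equations (X'X) b = X'y by Gauss-Jordan elimination."""
    k = len(design[0])
    # Augmented matrix [X'X | X'y]
    aug = [
        [sum(row[i] * row[j] for row in design) for j in range(k)]
        + [sum(row[i] * t for row, t in zip(design, target))]
        for i in range(k)
    ]
    for col in range(k):
        # Partial pivoting for numerical stability
        pivot_row = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot_row] = aug[pivot_row], aug[col]
        pivot = aug[col][col]
        aug[col] = [v / pivot for v in aug[col]]
        for r in range(k):
            if r != col:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [aug[i][k] for i in range(k)]


# Hypothetical toy data: columns are [intercept, health status, age]
health = [1, 2, 3, 4, 5, 3]
age = [30, 45, 25, 60, 50, 70]
design = [[1.0, h, a] for h, a in zip(health, age)]
# Premium generated from assumed coefficients (100, 10, 2), without noise
premium = [100 + 10 * h + 2 * a for h, a in zip(health, age)]

coef = ols(design, premium)
# Predicted premium, which would replace the observed premium in the second stage
premium_hat = [sum(c * v for c, v in zip(coef, row)) for row in design]
print([round(c, 4) for c in coef])
```

In practice a statistics library would be used for this step; the hand-written solver only keeps the sketch free of dependencies.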

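Based on Rivera’s model (2001), the paper first sets up the ordered probit model (ORP) that accounts for the endogeneity problem. The first stage estimates the auxiliary regression, and the second stage applies the ordered probit model for further prediction. Writing out Equation (1) and Equation (2) with the selected variables yields the following equations:

First stage:

$$
\begin{aligned}
X &= \beta_1 + X_{hthsts}\beta_2 + X_{ltill}\beta_3 + X_{mthexp}\beta_4 + X_{bmi}\beta_5 + X_{smoke}\beta_6 \\
&\quad + Z_{age}\alpha_1 + Z_{gender}\alpha_2 + Z_{instyp}\alpha_3 + Z_{riskadv}\alpha_4 + \eta \\
\hat{X} &= X^{exo}\hat{\beta} + Z_{instrument}\hat{\alpha}
\end{aligned}
\tag{3}
$$

Second stage:

$$
\begin{aligned}
\Pr(y = i) &= \Pr(\kappa_{i-1} < \hat{X}'\beta + \varepsilon \le \kappa_i) \\
&= \Phi(\kappa_i - \hat{X}'\beta) - \Phi(\kappa_{i-1} - \hat{X}'\beta)
\end{aligned}
\tag{4}
$$

Here $\varepsilon$ denotes the standard normal error term of the latent equation.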

In the first stage, simple linear regression is used to estimate the problematic predictor. The X on the left-hand side of the regression denotes the endogenous variable “insurance premium”. The X terms on the right-hand side represent the exogenous variables “health status”, “long standing illness”, “monthly expenditure”, “body mass index (BMI)” and “smoking severity”. The Z terms stand for the instrumental variables “age”, “gender”, “insurer type” and “risk tolerance” respectively. $\hat{X}$ is the first-stage prediction of the insurance premium, which substitutes for the endogenous variable in the next step. In the second stage, $\Phi$ is the cumulative distribution function of the standard normal distribution.
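The second-stage category probabilities of the ordered probit model can be computed directly from the normal CDF. The sketch below is illustrative only: the function names are hypothetical, and the cut points and index values are assumed numbers, not estimates from this study.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function, Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def ordered_probit_probs(index, cutpoints):
    """Pr(y = j) for each category, given the linear index and increasing cut points."""
    # Cumulative probabilities, padded with 0 and 1 for the outer categories
    cumulative = [0.0] + [normal_cdf(k - index) for k in cutpoints] + [1.0]
    return [hi - lo for lo, hi in zip(cumulative, cumulative[1:])]


# Hypothetical cut points and index values
cutpoints = [-1.0, 0.5, 1.5]
low_index = ordered_probit_probs(0.0, cutpoints)
high_index = ordered_probit_probs(1.0, cutpoints)
print([round(p, 4) for p in low_index])
print([round(p, 4) for p in high_index])
```

Three cut points produce four probabilities that sum to one. Raising the index shifts probability mass from the lowest category toward the highest one, which is why the sign of a coefficient is unambiguous only for the two outer categories.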

The ordered logit model (ORL) (Grilli and Rampichini, 2014) with the endogeneity problem is specified in nearly the same way. Only the distribution in the second stage differs:

First stage:

$$
\begin{aligned}
X &= \beta_1 + X_{hthsts}\beta_2 + X_{ltill}\beta_3 + X_{mthexp}\beta_4 + X_{bmi}\beta_5 + X_{smoke}\beta_6 \\
&\quad + Z_{age}\alpha_1 + Z_{gender}\alpha_2 + Z_{instyp}\alpha_3 + Z_{riskadv}\alpha_4 + \eta \\
\hat{X} &= X^{exo}\hat{\beta} + Z_{instrument}\hat{\alpha}
\end{aligned}
\tag{5}
$$

Second stage:

$$
\Pr(y = i \mid \hat{X}, \alpha, \beta) = L(\alpha_i - \hat{X}'\beta) - L(\alpha_{i-1} - \hat{X}'\beta) \tag{6}
$$

where the threshold parameters should satisfy $\alpha_{j-1} \le \alpha_j$, $j = 0, 1, 2, 3$, and the cumulative distribution function of the logistic distribution is:

$$
L(\alpha_i - \hat{X}'\beta) = \frac{e^{\alpha_i - \hat{X}'\beta}}{1 + e^{\alpha_i - \hat{X}'\beta}} = \frac{1}{1 + e^{-(\alpha_i - \hat{X}'\beta)}} \tag{7}
$$
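The logistic counterpart can be sketched in the same way. The code below checks numerically that the two algebraic forms of the logistic CDF agree and then computes ordered logit category probabilities. All names and numbers are hypothetical illustrations rather than estimates from this study.

```python
import math


def logistic_cdf(z):
    """Logistic CDF written as 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def logistic_cdf_alt(z):
    """Equivalent form exp(z) / (1 + exp(z))."""
    return math.exp(z) / (1.0 + math.exp(z))


def ordered_logit_probs(index, thresholds):
    """Pr(y = j) for each category, given the linear index and increasing thresholds."""
    cumulative = [0.0] + [logistic_cdf(a - index) for a in thresholds] + [1.0]
    return [hi - lo for lo, hi in zip(cumulative, cumulative[1:])]


# Hypothetical thresholds and index value
thresholds = [-1.0, 0.5, 1.5]
probs = ordered_logit_probs(0.3, thresholds)
print([round(p, 4) for p in probs])
```

Compared with the probit version, only the CDF changes; the logistic distribution has heavier tails, so the outer categories receive slightly more probability for the same thresholds.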

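Analytically, the coefficients of the explanatory variables in ordered choice models can be estimated by maximum likelihood. Because the likelihood of both models is the product of all probability functions, taking the log-likelihood makes the calculation more convenient. The log-likelihood used to estimate the parameters of the ORP is therefore:

$$
\ln L(\kappa, \beta \mid y, \hat{X}) = \sum_{i=1}^{4} \mathbb{1}(y = i) \ln\left(\Phi(\kappa_i - \hat{X}'\beta) - \Phi(\kappa_{i-1} - \hat{X}'\beta)\right) \tag{8}
$$

Similarly, for the ORL:

$$
\ln L(\alpha, \beta \mid y, \hat{X}) = \sum_{i=1}^{4} \mathbb{1}(y = i) \ln\left(L(\alpha_i - \hat{X}'\beta) - L(\alpha_{i-1} - \hat{X}'\beta)\right) \tag{9}
$$

where $\mathbb{1}(y = i)$ equals 1 if the observation falls in category $i$ and 0 otherwise, and the contributions are summed over all observations.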
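To make the log-likelihood concrete, the sketch below evaluates it for a handful of hypothetical observations under the probit link. The function names, the observed levels, the index values and the cut points are all assumptions made for illustration; a real estimation would maximize this function over the parameters with a numerical optimizer.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def category_probs(index, cutpoints, cdf):
    """Pr(y = j) for every category under the given CDF."""
    cumulative = [0.0] + [cdf(k - index) for k in cutpoints] + [1.0]
    return [hi - lo for lo, hi in zip(cumulative, cumulative[1:])]


def log_likelihood(levels, indices, cutpoints, cdf):
    """Sum of log probabilities of the observed category for each observation."""
    total = 0.0
    for y, index in zip(levels, indices):
        total += math.log(category_probs(index, cutpoints, cdf)[y])
    return total


# Hypothetical observed levels (0..3), linear indices and cut points
levels = [0, 1, 3, 2]
indices = [-0.5, 0.0, 2.0, 1.0]
cutpoints = [-1.0, 0.5, 1.5]

ll_informative = log_likelihood(levels, indices, cutpoints, normal_cdf)
# Same data with all indices set to zero, i.e. no explanatory power
ll_flat = log_likelihood(levels, [0.0] * 4, cutpoints, normal_cdf)
print(round(ll_informative, 3), round(ll_flat, 3))
```

The indicator in the formula corresponds to picking only the probability of the category that was actually observed, which is what the list indexing does.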

To make the interpretation of the coefficients more intuitive, odds ratios are also analyzed in the ORL for the ordinal dependent variable. This prevents arbitrary split assignments to different categories, since for odds ratios the cut points between categories are unobservable (McCullagh, 1980). The model for the odds ratios is shown below:

$$
\ln\left(\frac{\Pr(y \le \gamma \mid \hat{X})}{\Pr(y > \gamma \mid \hat{X})}\right) = \alpha_i - \hat{X}'\beta, \quad 0 \le \gamma < 3, \; i = \gamma + 1 \tag{10}
$$

$$
\Pr(y = \gamma \mid \hat{X}) =
\begin{cases}
L(\alpha_1 - \hat{X}'\beta), & \text{if } \gamma = 0 \\
L(\alpha_i - \hat{X}'\beta) - L(\alpha_{i-1} - \hat{X}'\beta), & \text{if } 0 < \gamma \le 2 \\
1 - L(\alpha_3 - \hat{X}'\beta), & \text{if } \gamma = 3
\end{cases}
\tag{11}
$$
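A short numerical check illustrates why a single odds ratio per variable is enough in the ordered logit model: the ratio of the odds of exceeding a category is the same at every threshold and equals the exponentiated coefficient. The sketch assumes a coefficient implied by the odds ratio of 1.007 reported for the insurance premium in the results. The thresholds and the premium level of 150 are hypothetical.

```python
import math


def logistic_cdf(z):
    """Logistic cumulative distribution function."""
    return 1.0 / (1.0 + math.exp(-z))


def odds_above(index, threshold):
    """Odds of y exceeding the category at this threshold versus not exceeding it."""
    p_at_or_below = logistic_cdf(threshold - index)
    return (1.0 - p_at_or_below) / p_at_or_below


beta = math.log(1.007)         # coefficient implied by an odds ratio of 1.007
thresholds = [-1.0, 0.5, 1.5]  # hypothetical threshold values
premium = 150.0                # hypothetical premium level

# Ratio of odds after a one-unit increase in the premium, at each threshold
ratios = [
    odds_above(beta * (premium + 1.0), a) / odds_above(beta * premium, a)
    for a in thresholds
]
print([round(r, 6) for r in ratios])
```

Each ratio equals `exp(beta)` regardless of the threshold, which is the proportional odds property. It also explains why odds ratios alone cannot describe how the effect differs across spending levels, so marginal effects are needed for that.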

### Data

This paper uses panel data from the Longitudinal Internet Studies for the Social Sciences (LISS). The trend reported by the OECD shows a sharp upward movement in insurance spending from 2015 to 2016. Therefore, this paper focuses on the household survey sample of 2015. However, some of the surveys contain many empty responses, which could create outliers or would require extensive work to fill in the missing values. After removing all incomplete observations, the resulting dataset is a sample of 3546 observations. Tables 1, 2 and 3 give descriptive statistics for the continuous variables, ordinal variables and dummy variables respectively.

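Table 1: Statistics of Continuous Variables

| Variable | N | Mean | Std | Min | Max | Mode | Median |
|---|---|---|---|---|---|---|---|
| hth exp\* | 3545 | 73.80 | 174.72 | 0 | 3000 | 0 | 25 |
| hth wil\*\* | 3545 | 1.15 | 0.79 | 0 | 3 | 1 | 1 |
| hth sts\*\* | 3545 | 3.05 | 0.76 | 1 | 5 | 3 | 3 |
| BMI | 3545 | 26.03 | 6.48 | 0.37 | 261.22 | | 25.31 |
| lt ill\*\* | 3545 | 0.34 | 0.47 | 0 | 1 | 0 | 0 |
| gender\*\*\* | 3546 | 0.51 | 0.50 | 0 | 1 | 1 | 1 |
| ins typ\*\* | 3545 | 2.45 | 0.59 | 1 | 3 | 3 | 3 |
| risk adv\*\* | 3545 | 82.45 | 160.72 | 0 | 500 | 0 | 0 |
| ins prm | 3545 | 175.11 | 352.34 | 0 | 12625 | | 136 |
| mth exp | 3545 | 1774.23 | 2433.93 | 0 | 67900 | | 1500 |
| age | 3545 | 55.18 | 16.16 | 18 | 93 | 68 | 57 |
| smoke | 3545 | 2.40 | 6.10 | 0 | 50 | 0 | 0 |

\* Defined as the dependent variable

\*\* Defined as ordinal

\*\*\* Defined as dummy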

The dependent variable is individual health spending, measured as the average monthly spending on health care costs not covered by insurance. Since the research highlights how the amount of individual health insurance payments affects the level of additional health spending, the dependent variable is transformed into an ordinal variable by dividing the health spending amount into four categories. Following the approach of Daykin and Moffatt (2002), the research defines three cut-off points for dividing the health spending level, at 0, 50 and 200. For householders with no health spending, the ordinal value for spending willingness is 0. Health spending budgets from 0 to 50, from 50 to 200, and over 200 are assigned the ordinal values 1, 2 and 3. They respectively denote that the individual prepares a “low health spending budget”, a “medium health spending budget”, and a “high spending budget”. This is the main dependent variable “hth wil” in the research.

Table 2: Descriptive Statistics of Ordinal Variables

| Variable | Category | Value | Count | Fraction |
|---|---|---|---|---|
| risk adv\* | zero risk tolerance | r = 0 | 2581 | 72.81% |
| | low risk tolerance | 0 < r ≤ 200 | 389 | 10.98% |
| | medium risk tolerance | 200 < r ≤ 400 | 186 | 6.94% |
| | high risk tolerance | r > 400 | 329 | 9.26% |
| ins typ | partnership | 1 | 187 | 5.28% |
| | LTD | 2 | 1566 | 44.17% |
| | PLC | 3 | 1792 | 50.55% |
| hth sts | poor | 1 | 54 | 1.52% |
| | moderate | 2 | 622 | 17.55% |
| | good | 3 | 2108 | 59.46% |
| | very good | 4 | 607 | 17.12% |
| | excellent | 5 | 154 | 4.34% |
| hth wil | no spending | 0 | 620 | 17.49% |
| | low spending | 1 | 2041 | 57.57% |
| | medium spending | 2 | 626 | 17.16% |
| | high spending | 3 | 258 | 7.28% |

\* Given a compulsory own risk of 375 euro, the amount of voluntary own risk chosen by each respondent in 2015.

When evaluating the explanatory variables, the identification problem of the estimation procedure needs to be taken into account at each step. In particular, if both stages consist of exactly the same variables, the parameters might not be identified (Cameron and Trivedi, 2005). Even though most of the considered variables are expected to affect both insurance coverage and health care expenditure, this paper attempts to propose at least a few variables that differ between the two stages.

In addition, the variables “hth sts”, “lt ill”, “BMI”, and “mth exp” are considered exogenous variables that may also have an impact on health spending. Logically, an increase in health expenditure will improve health status to some extent. Conversely, low health status will lead to additional health care expenses (Rivera, 2001). Thus, individual health status can be an exogenous variable that affects the dependent variable. To define the health condition, this paper uses the variable “hth sts”, which records the health status of each individual in the sample. This ordinal variable classifies health status into 5 levels, from poor to excellent, represented by 1 to 5. Meanwhile, neither height nor weight appears to be significantly related to health care expenditures on its own, but when the two are converted to the body mass index (BMI), there is a U-shaped relationship between BMI and medical care expenses (Wang et al., 2012). BMI is derived from the following formula, where w is weight in kilograms and h is height in meters:

$$
BMI = \frac{w}{h^{2}} \tag{12}
$$
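The BMI formula is simple to compute, and a small plausibility check helps to spot unit errors such as height entered in centimeters. In the sketch below the function names are hypothetical and the bounds of 10 and 80 are an assumed plausibility range, not a rule used in this study. The test values 0.37 and 261.22 are the minimum and maximum BMI reported in Table 1.

```python
def bmi(weight_kg, height_m):
    """Body mass index: weight in kilograms divided by squared height in meters."""
    if height_m <= 0:
        raise ValueError("height must be positive")
    return weight_kg / height_m ** 2


def is_plausible_bmi(value, lower=10.0, upper=80.0):
    """Flag values outside an assumed plausible range (bounds are illustrative)."""
    return lower <= value <= upper


example = bmi(80.0, 1.75)
print(round(example, 2))  # 26.12

# Minimum, mean and maximum BMI from Table 1
flags = [is_plausible_bmi(v) for v in [0.37, 26.03, 261.22]]
print(flags)  # [False, True, False]
```

Under these assumed bounds, the extreme values in Table 1 would be flagged for inspection while the sample mean would not.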

In particular, there is evidence that people with a long-term illness, affliction or disability spend more on rehabilitative services, long-term care and similar services, not all of which may be included in insurance coverage. Hence the dummy “lt ill” represents whether the person has any chronic disease. Finally, monthly expenditure “mth exp” is expected to be reflected in health spending. According to Papanicolas et al. (2018), it is a common scenario that higher health expenditures are caused by low social expenditures. This contention is driven by the fact that low social spending leads to a sicker population, which not only has worse health outcomes but also uses more medical services and incurs higher healthcare expenditures. In addition, smoking has a significant direct impact on expenditure but not on insurance coverage (Shen, 2013). People who smoke often do not perceive smoking as a potential health threat and hence may not be motivated to obtain insurance. On the other hand, smoking clearly has a deleterious effect on their health. This means that they will seek medical care more frequently, and health expenditures will increase. In this research, the number of cigarettes smoked per day represents the severity of smoking.

Table 3: Descriptive Statistics of Dummy Variables

| Variable | Category | Value | Count | Fraction |
|---|---|---|---|---|
| Gender | Female | 0 | 1722 | 48.58% |
| | Male | 1 | 1823 | 51.42% |
| lt ill | Yes | 1 | 1218 | 34.36% |
| | No | 0 | 2327 | 65.64% |

Having decided upon all explanatory variables in the second stage, the relationship between insurance coverage and the instrumental variables is investigated further. Insurance companies are essentially intermediaries for those who are willing to take risks in certain insurance contracts (Broch, 1985). Different types of companies have varying portfolios of market values, leading to corresponding levels of risk. Therefore, the type of company and the anticipated risk of the individual are potential factors in the choice of insurance premium. The amount of voluntary risk in the LISS survey can indicate an individual’s risk tolerance, named “risk adv”. An increase in this ordinal variable signifies a greater willingness to bear risk. Additionally, to obtain the insurance type “ins typ”, this paper categorizes the insurance companies chosen in the survey into three types: partnership, limited liability company and public limited company. They are designated as 1, 2 and 3 respectively.

Because all the ordinal variables are ordered progressively, transforming them into dummies is not necessary and would only make the regression more cumbersome. Dummy transformation is therefore not considered. In the following part, the ordered logit and ordered probit models are built to estimate the coefficients of the corresponding variables.

### Result

Some simple correlations between these variables can be seen in Figure 1. There is a strong negative correlation between health condition and health spending. Also, the coordinates (health expenditure, age) and (health expenditure, long-term illness) show brighter colour blocks, implying that age and long-term illness are more relevant to average monthly health spending than the other independent variables.

Before discussing the regression results, it should be noted that the interpretation of coefficients in ordinal models is not straightforward. Because the ordered model is nonlinear, a coefficient does not directly represent the predicted change in the dependent variable caused by a unit change in the corresponding independent variable, as it would in a linear model. Marginal effects, in contrast, are subject to more binding constraints and change with the values of the variables (Rivera, 2001). Therefore, to ensure that the coefficients are useful, Tables 6-9 are analyzed using marginal effects to explain the ordinal models, and odds ratios are used for the logistic model in particular. The most commonly used measure for evaluating logit and probit models is the McFadden R squared, also known as the pseudo-R squared. In general, the model fits well if the pseudo-R squared is larger than 0.20. By this criterion, the model used fits the data well statistically. Similarly, the significance of the model can be read from the test statistics of the ORP and ORL. The test statistic of 199.28 is far larger than the 5% critical value of the chi-square distribution with 6 degrees of freedom (12.592).
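The two fit measures can be illustrated with a short sketch. It assumes that the reported test statistic is a likelihood-ratio statistic, which the text does not state explicitly, and the log-likelihood values used in the example are hypothetical because the thesis does not report them. The function names are likewise illustrative.

```python
def mcfadden_r2(ll_model, ll_null):
    """McFadden pseudo-R squared: 1 - lnL(model) / lnL(intercept-only model)."""
    return 1.0 - ll_model / ll_null


def lr_statistic(ll_model, ll_null):
    """Likelihood-ratio statistic: 2 * (lnL(model) - lnL(null))."""
    return 2.0 * (ll_model - ll_null)


# 5% critical value of the chi-square distribution with 6 degrees of freedom
CHI2_CRIT_6DF_5PCT = 12.592

# Hypothetical log-likelihoods, chosen only to illustrate the formulas
ll_model, ll_null = -800.0, -1000.0
print(round(mcfadden_r2(ll_model, ll_null), 3))   # 0.2
print(lr_statistic(ll_model, ll_null))            # 400.0

# The statistic reported in the text compared with the critical value
reported_statistic = 199.28
print(reported_statistic > CHI2_CRIT_6DF_5PCT)    # True
```

A statistic above the critical value rejects the hypothesis that all six slope coefficients are jointly zero at the 5% level.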

The scatter plot in Figure 4 shows that insurance purchases are concentrated in the area below approximately 500 euros, but that people who spend a high amount on health care more often pay a high insurance premium. Table 4 presents the results from the logit model estimated using odds ratios. For every 1 unit increase in an individual’s insurance premium payment, the odds of being willing to spend a high amount on health care versus being willing to spend an acceptable amount (medium spending) are 1.007 times higher. This means that when the insurance premium rises, people are more willing to spend on additional health care. For the exogenous variable “long term illness”, the odds ratio is significantly higher than for the other variables. People who suffer from a long-term illness, affliction or disability have 1.45 times higher odds of spending more than people without disease at every spending level. When only gender is considered, females are more reluctant to spend on healthcare than males. Conversely, health status, BMI and smoking extent show the opposite outcome. High-frequency smokers had higher total monthly medical expenditures at all stages than normal smokers and nonsmokers. This is consistent with the findings of Leu and Schau (1983), as smoking reduces life expectancy. This, in turn, reduces the utilization of medical services and thus increases additional medical costs. Moreover, the effect of monthly expenditure on health spending is not notable, as its odds ratio is nearly 1.

However, the incremental or decremental evolution of the odds ratio is constant at each stage. The value of odds ratios is not sufficient to explain the variation of spending intention.

The marginal effect, by contrast, describes the change at each level specifically. Hence, Tables 6-9 provide the outcomes from both ordered models by applying the marginal effect. Across the tables, the interpretations are quite similar. For example, in Table 6, the marginal effects show that smokers are 2.2 percentage points more likely than nonsmokers to spend nothing on health care, but they become less willing to spend a high amount on healthcare as smoking severity increases, by about 1.2 percentage points in Table 9. Tables 6-9 with the logit model yield marginal coefficients for gender of 1.4% and -0.6%, respectively, which are almost the same as the probit findings. This might be because people who choose higher-premium insurance products have more need for medical services. However, since higher premiums cover more potential health expenses, they make people feel more insured and therefore less inclined to fret about extra health care. Therefore, there is an increased proportion of low-level spending but a decrease afterwards. In addition, the effect of long-term illness on spending plans appears to be the most pronounced. People with long-term disease are 4.5 percentage points less likely to spend nothing than people in good health. Conversely, because they suffer from illness for a long time, their probability of spending a high amount on extra medical care is 2.4 to 3.1 percentage points higher. More importantly, for every 1 percent increase in the monthly insurance payment, people are 0.01% less likely to spend nothing on health, but conversely 0.11% and 0.18% less likely to spend more than 200 or even 500 euro per month on health care. This gives a result similar to the odds ratios, but more specific.
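How a single coefficient translates into a different marginal effect at each spending level can be sketched for the ordered logit case. The cutpoints and the linear index `xb` below are hypothetical values chosen for illustration; only the coefficient 0.373 (long-term illness, ORL, Table 5) comes from the paper. The sketch treats the regressor as continuous, which is an approximation for a binary variable.

```python
import math


def logistic_cdf(z):
    return 1.0 / (1.0 + math.exp(-z))


def logistic_pdf(z):
    p = logistic_cdf(z)
    return p * (1.0 - p)


def category_probabilities(xb, cutpoints):
    """P(y = j) = F(k_j - xb) - F(k_{j-1} - xb), with F = 0 below and F = 1 above."""
    cdf = [0.0] + [logistic_cdf(k - xb) for k in cutpoints] + [1.0]
    return [cdf[j + 1] - cdf[j] for j in range(len(cutpoints) + 1)]


def marginal_effects(xb, cutpoints, beta):
    """dP(y = j)/dx = [f(k_{j-1} - xb) - f(k_j - xb)] * beta for each level j."""
    pdf = [0.0] + [logistic_pdf(k - xb) for k in cutpoints] + [0.0]
    return [(pdf[j] - pdf[j + 1]) * beta for j in range(len(cutpoints) + 1)]


cutpoints = [-1.0, 0.5, 2.0]  # hypothetical thresholds between the four levels
xb = 0.0                      # hypothetical linear index at the evaluation point
beta_lt_ill = 0.373           # ORL coefficient for long-term illness (Table 5)

probs = category_probabilities(xb, cutpoints)
effects = marginal_effects(xb, cutpoints, beta_lt_ill)
```

The effects sum to zero across the four levels, since probability mass can only move between categories. A positive coefficient lowers the probability of the lowest level and raises that of the highest, which matches the sign pattern for long-term illness in Tables 6 and 9.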

To sum up, the relation between the insurance premium and the health spending level is not a simple trend. With a higher insurance premium payment, a larger proportion of people decide to spend a high amount on medication, and people become less likely to spend a low amount on health care, or nothing at all. From a general perspective, higher insurance premium purchases lead to lower health expenditure. Among the other factors, long-term disease has a clear impact. Health status, on the other hand, has a distinct inverse effect on the dependent variable. Because the test statistic for BMI is not significant, BMI is not interpreted in this model. All other factors have exceedingly small effects on spending. What’s more, the large Sargan statistic in Figure 5, with p-values close to 0.05, suggests that the instrumental variables are close to valid.

### Conclusion and Discussion

Recent research on health insurance systems has brought more comprehensive insight into health expenditures and insurance premiums. The current findings suggest that the widespread implementation of health insurance may have had an attenuating effect on individual health care expenses. However, the relationship in this process remains uncertain. On the one hand, insurance is endogenous, as it is an individual choice that depends on various factors (Shen, 2013). On the other hand, few studies have compared multiple ordered models simultaneously. Therefore, it is difficult to make a precise judgment based on the published papers on the subject.

This paper contributes two different ordered models for predicting the budget for medical expenses not covered by insurance, based on people’s different insurance choices and other factors, while also taking the endogeneity of the insurance premium into account. As a result, it shows that there are indeed significant connections between the scale of the health expenses budget plan and other factors. Importantly, among observations with a high insurance premium, fewer people arrange costly budgets, while more tend to select the low-level or even zero health payment plan. Moreover, the endogenous effect is theoretically as expected, considering the test of instrument validity. Regarding the modelling strategy, the choice between the ordered logit model and the ordered probit model matters little, as they yield almost the same results. One difference, however, is that the ordered logit model additionally yields odds ratios, which make macro trends more observable and are widely used nowadays (Persoskie and Ferrer, 2017).

Nevertheless, defining health spending as an ordinal variable also has a limitation. The disadvantage is that the willingness to spend on health converts a continuous variable into four progressive levels, where the difference between each pair of adjacent levels is one. However, the four budget scales for health spending cannot be represented by equal steps, so they cannot be further distinguished. This may affect the persuasiveness of the estimation results (Liu et al., 2020). One way to solve this issue is to convert each ordinal variable into multiple corresponding dummy variables. This allows a clearer interpretation and better control of the reference group. However, this would make the model more complicated.
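The dummy-variable conversion mentioned above can be sketched in a few lines. The helper name `to_dummies` and the choice of level 0 as the reference group are assumptions made for illustration, not part of the estimation in this paper.

```python
def to_dummies(level, n_levels=4, reference=0):
    """Encode one ordinal level as n_levels - 1 indicator variables.

    The reference level is dropped, so it is represented by all zeros and
    every other level is compared against it.
    """
    if not 0 <= level < n_levels:
        raise ValueError("level must lie between 0 and n_levels - 1")
    return [int(level == k) for k in range(n_levels) if k != reference]


# Hypothetical sample of spending levels (0 = nothing, 3 = highest level)
spending_levels = [0, 2, 3, 1]
encoded = [to_dummies(level) for level in spending_levels]
```

Each non-reference level then receives its own coefficient, so the distances between adjacent levels are no longer forced to be equal, at the cost of three parameters per ordinal variable instead of one.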

Furthermore, these ordinal regression models presume that the error variance is the same in all circumstances. Yet Williams (2008) indicates that the standard errors are wrong and the proportional odds assumption is violated if the heterogeneity problem is ignored. Hence, further research should concentrate on generalized ordered logit/probit models together with the heterogeneous choice model or location-scale models, following Williams’ research.
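As a rough sketch of what such an extension looks like, the heterogeneous choice (location-scale) version of the ordered logit model is commonly written with a variance equation in the denominator. The symbols below are assumptions introduced for illustration: $x_i$ collects the regressors of the choice equation, $z_i$ the regressors of the variance equation, $\kappa_j$ the cutpoints, and $\Lambda$ the logistic distribution function.

```latex
\Pr(y_i > j) = \Lambda\!\left( \frac{x_i'\beta - \kappa_j}{\exp(z_i'\gamma)} \right),
\qquad j = 0, 1, 2
```

Setting $\gamma = 0$ recovers the standard ordered logit model, so a test of $\gamma = 0$ is one way to check whether the constant error variance assumption is tenable.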

### References

Adhikari, S. R. (2013). Health care switching behaviour of the patients in nepal: an ordered logit model analysis. Economic Journal of Development Issues, pages 128–147.

Amemiya, T. (1981). Qualitative response models: A survey. Journal of Economic Literature, 19(4):1483–1636.

Atake, E. H. (2020). Does the type of health insurance enrollment affect provider choice, utilization and health care expenditures? BMC Health Services Research, 20(1):1–14.

Bakx, I., Donnell, O. O., and Doorslare, E. (2016). Spending on Health Care in the Netherlands: Not Going So Dutch. Fiscal Studies, 37:593–625.

Cameron, A. C. and Trivedi, P. K. (2005). Microeconometrics: methods and applications. Cambridge University Press.

Cardon, J. H. and Hendel, I. (2001). Asymmetric information in health insurance: evidence from the national medical expenditure survey. RAND Journal of Economics, pages 408–427.

Castineira, B. R. and Nunes, L. C. (1999). Testing endogeneity in a regression model: An application of instrumental variable estimation. 8:197–206.

Daykin, A. R. and Moffatt, P. G. (2002). Analyzing ordered responses: A review of the ordered probit model. Understanding Statistics: Statistical Issues in Psychology, 1(3):157–166.

Deb, P., Munkin, M. K., and Trivedi, P. K. (2006). Bayesian analysis of the two-part model with endogeneity: application to health care expenditure. Journal of Applied Econometrics, 21(7):1081–1099.

Deussing, M. A. (2003). An empirical analysis of the relationship between public health spending and self-assessed health status: An ordered-probit model. Master’s thesis, University of Ottawa, Ontario.

Finkelstein, A. (2007). The aggregate effects of health insurance: Evidence from the introduction of medicare. Quarterly Journal of Economics, 122(1):1–37.

Grilli, L. and Rampichini, C. (2014). Ordered logit model. Encyclopedia of quality of life and well-being research.

Huber, M. and Orosze, E. (2003). Health expenditure trends in oecd countries, 1990–2001. Health Care Financing Review, 25:1–22.

Liu, J., M., F., Jin, F., Wu, C., and Chen, H. (2020). Multi-attribute decision making based on stochastic dea cross-efficiency with ordinal variable and its application to evaluation of banks’ sustainable development. Sustainability, 12(6):23–75.

Manning, W. G., Newhouse, J. P., Duan, N., K. E. B., and Leibowitz, A. (1987). Health insurance and the demand for medical care: evidence from a randomized experiment. The American economic review, pages 251–277.

McCullagh, P. (1980). Regression models for ordinal data. Journal of the Royal Statistical Society: Series B (Methodological), 42(2).

Newhouse, J. P. (1992). Medical care costs: How much welfare loss? Economic research-Ekonomska istraživanja, 6(3):3–21.

Papanicolas, I., Woskie, L. R., and Jha, A. K. (2018). Health care spending in the united states and other high-income countries. Jama, 319(10):1024–1039.

Persoskie, A. and Ferrer, R. A. (2017). A most odd ratio: interpreting and describing odds ratios. American journal of preventive medicine, 52(2):224–228.

Rivera, B. (2001). The effects of public health spending on self-assessed health status: an ordered probit model. Applied Economics, 33(10):1313–1319.

Schellhorn, M. (2001). The effect of variable health insurance deductibles on the demand for physician visits. Health Economics, 10(5):441–456.

Shen, C. (2013). Determinants of health care decisions: insurance, utilization, and expenditures. Review of Economics and Statistics, 95(1):142–153.

Sinn, H. W. (1996). Social insurance, incentives and risk taking. International Tax and Public Finance, 3(3):259–280.

Tapay, N. and Colombo, F. (2004). Private Health Insurance in the Netherlands. A Case Study.

Terza, J. V., Basu, A., and Rathouz, P. J. (2008). Two-stage residual inclusion estimation: addressing endogeneity in health econometric modeling. Journal of health economics, 27(3):531–543.

Van Beek, K. W., Koopmans, C. C., and Van Praag, B. M. (1997). Shopping at the labour market: A real tale of fiction. European Economic Review, 41(2):295–317.

Wang, K., Lee, Y., Lin, C., and Tsai, C. (2018). The effects of health shocks on life insurance consumption, economic growth, and health expenditure: A dynamic time and space analysis. Sustainable Cities and Society, 37:34–56.

Wang, K. M. and Lee, Y. M. (2018). The impacts of life insurance asymmetrically on health expenditure and economic growth: dynamic panel threshold approach. Economic research-Ekonomska istraživanja, 31(1):440–460.

Wang, K. S., Liu, X., Z., S., Z., M., Pan, Y., and Callahan, K. (2012). A novel locus for body mass index on 5p15.2: a meta-analysis of two genome-wide association studies. Gene, 500(1):80–84.

Ward, L. and Franks, P. (2007). Changes in health care expenditure associated with gaining or losing health insurance. Annals of Internal Medicine, 146(11):768–74.

Williams, R. (2008). Ordinal regression models: Problems, solutions, and problems with the solutions. http://www.stata.com/meeting/germany08/GSUG2008-Handout.pdf. Accessed: 2008-06-27.

Xu, X., E., B. E., M., K. S., A., S. S., and F., P. T. (2015). Annual healthcare spending attributable to cigarette smoking: an update. American journal of preventive medicine, 48(3):326–333.

### Appendix

### 0.1 Figure

Figure 1: Correlations of variables selected in the model

Figure 2: Graph of insurance premium and other instrumental variables

(a) Insurance premium against insur-company type (b) Insurance premium against Health status

(c) Insurance premium against risk voluntary (d) Insurance premium against gender

Figure 3: Graph of health spending and other variables (categories normalised separately)

(a) Health spending against body mass index (BMI) (b) Health spending against Daily smoking

(c) Health spending against Insurance premium (d) Health spending against monthly expenditure

(e) Health spending against age (f) Health spending against long-standing disease

Figure 4: Scatter plot for Health spending and other variables

(a) Health spending against monthly expenditure (b) Health spending against insurance premium

(c) Health spending against Daily smoking (d) Health spending against Age

### 0.2 Table

Table 4: Endogenous ordered logit model with odds ratio

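| Variable | Odds Ratio | Std. Err. | z-score | p-value |
|---|---|---|---|---|
| ins pre | 1.007 | 0.001 | 6.94 | 0.000 |
| mth exp | 1.000 | 0.001 | 4.51 | 0.000 |
| hth sta | 0.764 | 0.036 | -5.56 | 0.000 |
| bmi | 0.989 | 0.005 | -1.96 | 0.051 |
| lt ill | 1.452 | 0.112 | 4.83 | 0.000 |
| smoke | 0.989 | 0.005 | -1.81 | 0.071 |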

Table 5: Result from ordered logit and probit model with endogenous problem

| Variables | ORL Coef | ORL SE | ORL P-value | ORP Coef | ORP SE | ORP P-value |
|---|---|---|---|---|---|---|
| ins pre | 0.007 | 0.011 | 0.000 | 0.003 | 0.001 | 0.000 |
| mth exp | 0.001 | 0.000 | 0.000 | 0.001 | 0.000 | 0.000 |
| hth sta | -0.269 | 0.048 | 0.000 | -0.049 | 0.016 | 0.003 |
| bmi | -0.010 | 0.005 | 0.050 | -0.001 | 0.001 | 0.626 |
| lt ill | 0.373 | 0.077 | 0.000 | 0.082 | 0.025 | 0.002 |
| smoke | -0.010 | 0.006 | 0.071 | -0.004 | 0.002 | 0.010 |

Table 6: Marginal effect for hthsp wil=0

| Variables | ORL Coef | ORL SE | ORL P-value | ORP Coef | ORP SE | ORP P-value |
|---|---|---|---|---|---|---|
| ins pre | -0.0011 | 0.0001 | 0.000 | -0.0001 | 0.0001 | 0.766 |
| mth exp | -0.0001 | 0.0002 | 0.000 | -0.0001 | 0.0001 | 0.000 |
| hth sta | 0.0380 | 0.0069 | 0.000 | 0.0269 | 0.0071 | 0.000 |
| bmi | 0.0014 | 0.0007 | 0.051 | 0.0004 | 0.0007 | 0.625 |
| lt ill | -0.0526 | 0.0109 | 0.000 | -0.0452 | 0.0111 | 0.000 |
| smoke | 0.0014 | 0.0008 | 0.071 | 0.0023 | 0.0007 | 0.002 |

Table 7: Marginal effect for hthsp wil=1

| Variables | ORL Coef | ORL SE | ORL P-value | ORP Coef | ORP SE | ORP P-value |
|---|---|---|---|---|---|---|
| ins pre | -0.0002 | 0.0001 | 0.000 | -0.0001 | 0.0001 | 0.735 |
| mth exp | -0.0001 | 0.0001 | 0.000 | -0.0001 | 0.0001 | 0.735 |
| hth sta | 0.0380 | 0.0048 | 0.000 | 0.0061 | 0.0033 | 0.068 |
| bmi | 0.0003 | 0.0002 | 0.059 | 0.0001 | 0.0002 | 0.632 |
| lt ill | -0.0139 | 0.035 | 0.000 | -0.0103 | 0.0055 | 0.061 |
| smoke | 0.0003 | 0.0062 | 0.081 | 0.0005 | 0.0003 | 0.083 |

Table 8: Marginal effect for hthsp wil=2

| Variables | ORP Coef | ORP SE | ORP P-value | ORL Coef | ORL SE | ORL P-value |
|---|---|---|---|---|---|---|
| ins pre | 0.0001 | 0.0001 | 0.744 | 0.0008 | 0.0002 | 0.000 |
| mth exp | 0.0001 | 0.0001 | 0.009 | -0.0001 | 0.0001 | 0.000 |
| hth sta | -0.0186 | 0.0082 | 0.023 | -0.0304 | 0.0055 | 0.000 |
| bmi | -0.0002 | 0.0005 | 0.629 | -0.0012 | 0.0006 | 0.050 |
| lt ill | 0.0311 | 0.0132 | 0.019 | 0.0421 | 0.0087 | 0.000 |
| smoke | -0.0013 | 0.0004 | 0.036 | -0.0011 | 0.0006 | 0.071 |

Table 9: Marginal effect for hthsp wil=3

| Variables | ORL Coef | ORL SE | ORL P-value | ORP Coef | ORP SE | ORP P-value |
|---|---|---|---|---|---|---|
| ins pre | 0.0005 | 0.0007 | 0.000 | 0.0001 | 0.0004 | 0.780 |
| mth exp | 0.0001 | 0.0001 | 0.000 | 0.0001 | 0.0001 | 0.000 |
| hth sta | -0.0176 | 0.0033 | 0.000 | -0.0145 | 0.0046 | 0.002 |
| bmi | -0.0007 | 0.0003 | 0.052 | -0.0002 | 0.0004 | 0.628 |
| lt ill | 0.0244 | 0.0052 | 0.000 | 0.0244 | 0.0076 | 0.001 |
| smoke | -0.0006 | 0.0004 | 0.072 | -0.0013 | 0.0005 | 0.009 |

Figure 5: Result for Sargan test