Count data models with endogeneity: an application to hospital stay


Anouk Hiensch

10275304

MSc in Econometrics
Track: Free Track

Date of final version: August 14, 2017
Supervisor: J.C.M. Van Ophem
Second reader: K.J. van Garderen

Abstract. This paper analyzes the effect of a patient's choice to order a certain amount of protein rich food products on the length of stay in hospital Gelderse Vallei, located in Ede, the Netherlands. A simultaneous model is estimated which includes unobserved heterogeneity in both the treatment and the count outcome equation to account for possible endogeneity. Our estimates suggest that an increased amount of ordered protein products has a positive effect on the length of hospital stay. Our estimates also suggest the presence of significant unobserved heterogeneity, but there is no evidence of endogeneity.

Statement of originality

This document is written by Student Anouk Hiensch who declares to take full responsibility for the contents of this document. I declare that the text and the work presented in this document is original and that no sources other than those mentioned in the text and its references have been used in creating it. The Faculty of Economics and Business is responsible solely for the supervision of completion of the work, not for the contents.

Contents

1 Introduction
2 Relevant literature
  2.1 Protein intake
  2.2 Hospital Gelderse Vallei
    2.2.1 At Your Request®
3 Previous research
  3.1 Endogeneity problem
  3.2 Recent empirical research
4 Research method
  4.1 Data
    4.1.1 At Your Request® data
    4.1.2 Health variables
    4.1.3 Data transformation
    4.1.4 Descriptive statistics
  4.2 Empirical Approach
    4.2.1 Analysis of the number of protein products
    4.2.2 Determinants of a patient's hospital stay
    4.2.3 Two-equation model of hospital stay and protein products
5 Results
  5.1 Ordered probit model
  5.2 Zero-truncated Poisson
  5.3 Two equation model
6 Discussion & Conclusion
  6.1 Conclusion
  6.2 Limitations and recommendations
References
Appendices
  Appendix A Menu Hospital
    A1. Insert menu card
  Appendix B ICD10
  Appendix C Descriptive statistics random sample
  Appendix D Log Likelihood

1 Introduction

A big poster on the outside of the parking garage of hospital Gelderse Vallei, Ede promotes: "Nutrition and exercise, for quicker recovery!". Over the last few years, Gelderse Vallei has focused more and more on nutrition and exercise as it is believed that proper nutrition benefits the recovery of a patient (Kondrup, 2004). They currently focus on the promotion of protein intake by patients.

Protein is an important component of our diet as these macromolecules perform multiple essential processes in the human body (Alberts, 2013). Especially hospitalized patients require a sufficient amount of protein. Increased protein intake by this group of individuals has multiple benefits like a lower chance of being readmitted to the hospital and a faster rehabilitation from fractures (Van der Zanden, Van Essen and Van Kleef, 2015). Adequate protein intake for hospitalized patients may therefore result in faster recovery.

The key question of the present research is: how does protein intake influence the length of a patient's hospital stay? A few empirical challenges arise when studying this question. First, the length of hospital stay is influenced by a very large number of variables, and capturing all of them is nearly impossible. Second, patients in hospital Gelderse Vallei order their own food, so protein intake is a choice variable. Hence, the choice to order a certain amount of protein products may not be exogenous with respect to the length of hospital stay, and possible endogeneity of this choice variable must be taken into account. Lastly, the model has to allow for the count nature of both variables.

Inference regarding this treatment count outcome model is difficult because of unobserved common confounding factors that influence both the decision of a patient to eat a certain amount of protein and the length of stay in the hospital. We must therefore allow for possible endogeneity. Some papers deal with this endogeneity in a count setting by using a GMM strategy (Windmeijer and Santos Silva, 1997); others apply a two-step method and introduce unobserved heterogeneity (Kenkel and Terza, 2001) or use a latent factor structure to represent unobserved factors influencing both the outcome and the treatment (Bratti and Miranda, 2011; Deb and Trivedi, 2004).

The aim of this study is to estimate the effect of an endogenous treatment on a count dependent variable. To that end, a model is proposed that attempts to capture the effect of the average number of protein products a patient orders per day on the number of days he or she spends in the hospital. The model takes into account both the endogeneity of the choice variable (the average number of protein products per day) and the count structure of the dependent variable (the number of days of hospital stay).

The remainder of this paper is organized as follows. Section 2 provides relevant literature on protein intake and the hospital, and section 3 discusses previous research on the endogeneity problem. Section 4 describes the underlying data and gives a detailed description of the econometric model. Section 5 analyzes the results, and section 6 summarizes and discusses the findings.

2 Relevant literature

This section elaborates on the relevant literature behind the model. Section 2.1 discusses literature concerning protein intake and section 2.2 provides information on the At Your Request® program implemented in hospital Gelderse Vallei, Ede.

2.1 Protein intake

Proteins are large macromolecules made up of amino acids. They are an essential part of our body and perform a large number of functions, such as repairing tissue, replicating DNA and transporting molecules (Alberts, 2013). Proteins are distinguished from one another by their composition of amino acids. Twenty different standard amino acids exist, and they can generally be classified into indispensable and dispensable amino acids or, respectively, essential and nonessential. The amino acids categorized as essential cannot be synthesized by the human body and must therefore be obtained through nutrition (Meideros, 2007). Protein synthesis is an ongoing process that takes place in nearly all cells of the human body. In addition, the body does not store amino acids and therefore needs a continuing supply of them to produce new protein. It is therefore important that we obtain the essential amino acids by consuming food which contains protein. The body can then break down the ingested proteins into amino acids through digestion.

During sickness and recovery, protein is expended more quickly because certain bodily processes intensify. It is therefore important that individuals who are ill consume enough protein. It is found that sufficient protein intake for hospitalized patients has several benefits: they have a reduced risk of developing bedsores, recover more quickly from fractures, lose less weight during their hospital stay and have less chance of being readmitted to the hospital (Van der Zanden, Van Essen and Van Kleef, 2015).

Healthy individuals are recommended to consume 0.8 g of protein per kg of body weight per day. Since hospitalized patients are at risk of intensified bodily processes such as gluconeogenesis (the biosynthesis of new glucose) and muscle catabolism, their protein requirements are higher (Kudsk and Sacks, 2006; Braga et al., 2009). Depending on factors such as patient characteristics and the research methods applied in studies of protein intake, their requirements range between 1.1 and 1.7 g/kg body weight.
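As a rough illustration of these recommendations, the sketch below converts the per-kilogram figures into daily gram ranges. The function name and the 70 kg example weight are illustrative assumptions and are not taken from the study.

```python
# Per-kilogram recommendations quoted in the text (g protein per kg body weight per day).
HEALTHY_G_PER_KG = 0.8
HOSPITAL_LOW_G_PER_KG = 1.1
HOSPITAL_HIGH_G_PER_KG = 1.7


def daily_protein_range(weight_kg, low_g_per_kg, high_g_per_kg):
    """Return the (minimum, maximum) daily protein requirement in grams."""
    if weight_kg <= 0:
        raise ValueError("weight_kg must be positive")
    return (weight_kg * low_g_per_kg, weight_kg * high_g_per_kg)


# Hypothetical patient of 70 kg.
healthy = daily_protein_range(70, HEALTHY_G_PER_KG, HEALTHY_G_PER_KG)
hospitalized = daily_protein_range(70, HOSPITAL_LOW_G_PER_KG, HOSPITAL_HIGH_G_PER_KG)
print(f"healthy: about {healthy[0]:.0f} g/day")
print(f"hospitalized: about {hospitalized[0]:.0f} to {hospitalized[1]:.0f} g/day")
```

For this hypothetical weight, the hospitalized range lies well above the recommendation for healthy individuals, which is the gap the hospital's protein promotion aims to close.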

It is found that proper nutrition contributes to the quality of life of patients and has a positive effect on their recovery (Kondrup, 2004). Thus, sufficient protein intake may contribute to faster recovery. However, hospitalized patients are often found to consume too little protein (Barton, Beigg and MacDonald, 2000).

Many initiatives have tried to get patients to eat more protein. One way of doing so is changing the catering system in the hospital. In the USA the service provider Sodexo implemented the At Your Request® program, a room-service dining concept for hospital patients. Over 350 hospitals throughout North America use the program, which allows patients to order the food and drinks they prefer from a menu at the moment that suits them best. In November 2012 hospital Gelderse Vallei, Ede, also implemented the At Your Request® meal service to, among other reasons, improve patients' nutrition.

2.2 Hospital Gelderse Vallei

On the third of July 2007 hospital Gelderse Vallei, together with Wageningen University & Research, initiated the Alliantie Voeding in de Zorg (translated as Alliance Nutrition in Healthcare). For 10 years now, dietitians, researchers, medical specialists, nurses and students from this alliance have worked on various projects to improve nutritional care. In turn, the Alliance Nutrition in Healthcare cooperates with Food Valley, OostNV and InnoSportNL to initiate projects revolving around sports, nutrition and healthcare. One of the initiatives implemented in Gelderse Vallei is Eat2Move, which focuses on nutrition and movement to improve recovery in healthcare.

Alongside Eat2Move, in 2012 Gelderse Vallei was the first hospital in Europe to implement the innovative meal service concept At Your Request®. According to research commissioned by the Dutch Ministry of Public Health, Welfare and Sport, hospital Gelderse Vallei is the best hospital in Holland regarding patients' nutrition. The researchers state that the concept At Your Request® from Sodexo mostly corresponds to the ideal situation regarding nutritional care in hospitals ("Voedingsziekenhuis ’beste van Nederland’ in voeding", 2016).

2.2.1 At Your Request®

The At Your Request® meal service concept was developed in the US and is based on the room-service concept in hotels. Between 7:00 a.m. and 6:15 p.m., patients can call the nutrition call center in the hospital and order drinks and food. The call center has all the relevant information on each patient, such as whether he or she is on a diet, and can help accordingly. The hospital kitchen then receives the order and prepares the food within 45 minutes of the order being placed.

At the moment of admission to the hospital, the patient receives the At Your Request® menu card. Patients younger than 18 years get a special children’s menu. Patients who require a specific diet obtain an appropriate menu or insert menu card. See Appendix A for the general menu. To stimulate patients to eat sufficient protein, in April 2014 At Your Request® added a symbol in the form of a thumbs-up hand to products which contain a lot of protein. In addition, patients who need extra protein obtain an additional menu insert card which gives them the opportunity to order certain protein rich snacks in between the ordinary meals. Appendix A1 provides this insert menu card.

The At Your Request® database digitally stores all food orders per patient per day. As a result, all the food that has reached the patient is registered. However, there is no data on the actual intake of a patient, and patients might order more than they truly consume. In a validation study done in 2013 on 63 patients, data from food orders were compared to detailed food intake lists recorded by researchers. Doorduijn and Van Gameren (2016) conclude that food order data are a good reflection of food intake.

3 Previous research

This section will focus on previous research concerning endogeneity in the field of treatment outcome modelling. Section 3.1 will discuss the general problem of endogeneity, whereas section 3.2 considers recent empirical research attempting to address this problem.

3.1 Endogeneity problem

Much of the empirical microeconometric literature, for example in health or labor economics, focuses on outcome models where the dependent variable is a count. In some cases this count dependent variable is itself a function of a binary variable indicating whether or not an individual has received treatment, or of a count variable indicating several treatments. Windmeijer and Santos Silva (1997), for example, investigate the determinants of the demand for health care. Among other regressors, they explicitly focus on the effect of a self-reported health index on the number of visits to the doctor. In these models straightforward estimation is challenging, since the regressor of interest is likely to be endogenous. We say that a regressor is endogenous whenever there are unobservable individual characteristics affecting the treatment that also influence the dependent variable. In the case of Windmeijer and Santos Silva (1997), individuals who report themselves as healthy might be less concerned about their health overall and are therefore probably also less likely to visit a doctor, indicating that health status is not exogenous.

Another example of literature documenting the effect of an endogenous treatment on a count outcome is Kenkel and Terza (2001), who study the effect of advice given by a physician on individual alcohol consumption. The empirical problem they address is that certain unobserved factors influencing whether an individual receives physician advice could be correlated with unobservable determinants of alcohol demand. To elaborate, individuals more concerned with their health might be more likely to seek physician advice as well as to refrain from heavy drinking.

Before discussing several approaches which address the endogeneity, let us consider a standard Poisson model for count data and the consequences if one of the regressors is endogenous. Consider a dependent variable $y_i$, $i = 1, \ldots, N$, that takes non-negative integer values and is independently Poisson distributed, and a vector of explanatory variables $x_i$. The density of $y_i$ can be defined as

$$f(y_i \mid x_i, \beta) = \frac{\exp(-\lambda_i)\,\lambda_i^{y_i}}{y_i!},$$

where $\beta$ is a vector of parameters estimated by maximum likelihood and $\lambda_i$ is the conditional mean, which can be described as

$$\lambda_i = E(y_i \mid x_i) = \exp(x_i'\beta).$$

When one of the elements of $x_i$ is an endogenous regressor, $\lambda_i$ as specified above is no longer the correct conditional mean of $y_i$. This results in inconsistent estimates, and inference based on the Poisson model will be wrong. The model needs to be adjusted accordingly.

As mentioned before, the number of protein products chosen by a patient is allowed to be endogenous. To elaborate, when considering the effect of this variable on the length of hospital stay, there are most certainly factors influencing both the amount of ordered protein products and the length of hospital stay that are impossible to observe. As a consequence, when the number of protein products is incorporated in the vector of explanatory variables $x_i$ mentioned above, the conditional mean $\lambda_i = \exp(x_i'\beta)$ of $y_i$, the number of days a patient spends in the hospital, is no longer correct and estimates are inconsistent. Thus, a methodology has to be adopted that analyzes the effect of the endogenous variable on the dependent variable.
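The inconsistency described above can be illustrated with a small simulation. The sketch below is not the estimation used in this research: it generates artificial data in which an unobserved confounder `u` drives both the regressor and the count outcome, and then fits the standard Poisson model by Newton-Raphson while ignoring `u`. All parameter values, the sample size and the function names are assumptions chosen for illustration only.

```python
import math
import random


def poisson_logpmf(y, lam):
    """Log of the Poisson density f(y | lam) = exp(-lam) * lam**y / y!."""
    return -lam + y * math.log(lam) - math.lgamma(y + 1)


def draw_poisson(lam, rng):
    """Draw one Poisson variate with Knuth's method (adequate for small lam)."""
    threshold, k, p = math.exp(-lam), 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def fit_poisson(d, y, iterations=25):
    """Maximum likelihood for lam_i = exp(b0 + b1 * d_i) via Newton-Raphson."""
    b0, b1 = math.log(sum(y) / len(y)), 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for di, yi in zip(d, y):
            lam = math.exp(b0 + b1 * di)
            g0 += yi - lam            # score with respect to b0
            g1 += (yi - lam) * di     # score with respect to b1
            h00 += lam                # entries of the negative Hessian
            h01 += lam * di
            h11 += lam * di * di
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


rng = random.Random(0)
TRUE_B1 = 0.2  # assumed true effect of the regressor
d, y = [], []
for _ in range(4000):
    u = rng.gauss(0.0, 1.0)        # unobserved confounder
    di = u + rng.gauss(0.0, 1.0)   # regressor depends on u, so it is endogenous
    d.append(di)
    y.append(draw_poisson(math.exp(0.5 + TRUE_B1 * di + 0.5 * u), rng))

_, b1_naive = fit_poisson(d, y)
print(f"true effect {TRUE_B1:.2f}, Poisson estimate ignoring u: {b1_naive:.2f}")
```

In this artificial setup the estimate that ignores `u` settles well above the assumed true effect and does not approach it as the sample grows, which is the sense in which the standard Poisson model is inconsistent under endogeneity.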

3.2 Recent empirical research

The literature discussed above has, in various ways, attempted to capture the effect of a potentially endogenous regressor on an outcome variable. Windmeijer and Santos Silva (1997) use the generalized method of moments (GMM) estimation technique to estimate the effect of a self-reported health index on the demand for health care. They simultaneously model the outcome equation and the equation for the self-reported health status and find that their estimated coefficient is not significantly different from the estimated coefficient in the regular Poisson model.

Deb and Trivedi (2004) study the effect of endogenous insurance status on the utilization of health care services and jointly model the distribution of the endogenous treatment, insurance status, and the outcome variable using a latent factor structure. The latent factors are introduced in both the outcome and the treatment equation and represent unobserved factors influencing both equations; they can therefore be interpreted as unobserved heterogeneity. Deb and Trivedi (2004) find that there are unobserved factors which considerably change the estimated effect of insurance status on utilization.

Kenkel and Terza (2001) apply a two-step method to estimate the endogenous treatment model and use unobserved heterogeneity that is normally distributed. They find a significant correlation between alcohol demand and the unobserved factors determining advice given by a physician. Bratti and Miranda (2011), on the other hand, use the data from Kenkel and Terza (2001) on physician advice and alcohol consumption and apply the same latent factor strategy as Deb and Trivedi (2004). Just like Kenkel and Terza (2001), they find that neglecting endogeneity of treatment leads to an estimated effect of physician advice on alcohol consumption with the opposite sign.

When modeling the joint distribution, the unobserved factors that are introduced are unknown. Deb and Trivedi (2004) and Bratti and Miranda (2011) tackle this by using maximum simulated likelihood to obtain estimates. Both papers adopt a technique that uses Halton sequences to evaluate the integrals. Their maximization algorithms require only the analytical first derivatives; the second derivatives are approximated numerically. The paper of Bratti and Miranda (2011) is used as a building block for the present research. We simultaneously model the hospital stay equation and the protein product equation and introduce a latent factor structure to take endogeneity into account. Our method differs in that we use numerical integration to deal with the unobserved heterogeneity instead of simulating the likelihood. Section 4.2.3 will elaborate on the method applied in this research.
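The difference between simulating an integral with Halton draws and evaluating it by numerical integration can be sketched on a toy problem. The example below integrates $\exp(\sigma\eta)$ over a standard normal $\eta$, which is the kind of expectation that arises when a normally distributed heterogeneity term enters an exponential mean. The trapezoid rule, the grid bounds, the number of draws and the value of `SIGMA` are assumptions made for illustration; they are not the integration scheme of Section 4.2.3.

```python
import math
from statistics import NormalDist


def halton(index, base=2):
    """Radical-inverse (Halton) point in (0, 1) for a 1-based index."""
    result, fraction = 0.0, 1.0 / base
    while index > 0:
        index, digit = divmod(index, base)
        result += digit * fraction
        fraction /= base
    return result


def integrand(eta, sigma):
    """Toy integrand: contribution of heterogeneity eta to an exponential mean."""
    return math.exp(sigma * eta)


def simulate_with_halton(sigma, draws=4095):
    """Average the integrand over normal draws built from a Halton sequence."""
    normal = NormalDist()
    total = 0.0
    for i in range(1, draws + 1):
        eta = normal.inv_cdf(halton(i))  # map the uniform point to N(0, 1)
        total += integrand(eta, sigma)
    return total / draws


def integrate_on_grid(sigma, lower=-8.0, upper=8.0, steps=1600):
    """Trapezoid rule for the integral of integrand(eta) * phi(eta)."""
    normal = NormalDist()
    width = (upper - lower) / steps
    total = 0.0
    for j in range(steps + 1):
        eta = lower + j * width
        weight = 0.5 if j in (0, steps) else 1.0
        total += weight * integrand(eta, sigma) * normal.pdf(eta)
    return total * width


SIGMA = 0.5
exact = math.exp(SIGMA ** 2 / 2)  # closed form of E[exp(SIGMA * eta)]
by_halton = simulate_with_halton(SIGMA)
by_grid = integrate_on_grid(SIGMA)
print(f"exact {exact:.5f}, Halton {by_halton:.5f}, grid {by_grid:.5f}")
```

Both approaches approximate the same integral. Simulation replaces it by an average over draws, whereas numerical integration evaluates the integrand on a fixed set of points.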

4 Research method

This section introduces the data and elaborates on the empirical approach applied in the research. Section 4.1 will start off with highlighting the data used, introducing the dependent and explanatory variables and presenting the descriptive statistics. Section 4.2 will give a detailed insight in the empirical approach of the research.

4.1 Data

The data is obtained from hospital Gelderse Vallei (Ede, the Netherlands) and covers a period of two years, running from April 2013 until April 2015. This 600-bed hospital has an annual admission of about 21,000 patients and is one of the few hundred hospitals around the world which make use of Sodexo’s At Your Request® meal-service program. Patients included in the research are 18 years or older, do not make use of a feeding tube and therefore depend solely on solid food. Also, only patients who spend 1 day or more in the hospital are included. As a result, patients who only undergo a short procedure in the hospital and leave after a few hours are not included in the sample. Patients from the following wards were excluded from the dataset: Intensive Care, Acute Admission, Emergency Cardiac Care, Coronary Care Unit and the delivery rooms.

The dependent variable of interest is the number of days a patient spends in the hospital, hospital stay. Our other variable of interest is the number of protein rich products a patient orders, protein. More specifically, this is the average number of protein rich food products that a patient orders per day.

The data obtained from the At Your Request® program will be discussed in section 4.1.1, and section 4.1.2 elaborates on health related variables.

4.1.1 At Your Request® data

For every patient the At Your Request® database registers the food and drink products ordered from the menu. Every call a patient makes to the At Your Request® nutrition call centre is registered and counted as 1 order. The details of every order are saved in the database: the time and date of the order, the time of delivery, the specific products chosen by the patient and the quantity of those products. In addition, data on a patient's diet is available, for example whether a patient is on a protein rich diet or a sodium low diet. The largest group of patients, 40%, follow a general diet without any restrictions or additions. The other main diet groups are protein rich, diabetes, sodium low and saturated fat-limited. Patients who are on a diet which requires their food to be ground or liquid are excluded from the sample. In addition, patients who are on a limited protein diet are removed from the sample.

An extra menu sheet with additional in-between snacks high in protein is provided to patients who are on a protein rich diet as a motivation to increase their protein consumption. A dummy variable which indicates whether a patient is on a protein rich diet is added to the model.¹ Note that a person can be on multiple diets, so a patient on a protein rich diet could also be on a sodium low diet. It is expected that patients on a protein rich diet order more protein products, since they have a greater variety of products to choose from and are explicitly encouraged to increase their protein intake.

¹ An alternative choice could be to include all diets in the model, but to keep things simple this research only includes a dummy variable for the protein rich diet.

The database also includes information on whether, and how many, extra nutritional supplement drinks are ordered by the patients. The average number of supplement drinks ordered per day is added as a regressor. Patients who order supplement drinks are possibly less likely to order protein rich food products, since they already receive this highly nutritious drink.

As mentioned before, in April 2014 At Your Request® changed its food menu by adding explicit symbols to protein rich food products. For every patient a dummy variable indicates whether the patient was admitted after the menu change (dummy equals 1) or before the menu change (dummy equals 0). This way it can be tested whether the use of symbols results in patients ordering more of the marked products. The menu change most probably results in an increase in the amount of protein rich food products ordered by the patients, as these products are promoted more visibly.

4.1.2 Health variables

Besides the data from At Your Request®, health related characteristics of the patients should also be considered when modeling a patient's hospital stay. The database of the hospital contains characteristics on gender, age and BMI. For the variable BMI, 4 distinct groups are identified. Dummy variables indicate whether a patient is underweight, has a normal weight, is overweight or is obese. Obesity is defined as having a BMI value of 30 or above, whereas a person is underweight when his or her BMI value is below 18.5. The patients categorized as normal weight have a BMI between 18.5 and 24.9, and overweight patients have a BMI between 25 and 29.9. Patients with a higher BMI are expected to spend more days in the hospital.
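The BMI grouping above can be written as a small classification function. This is a minimal sketch: the function names are hypothetical, and treating the groups as half-open intervals (so that a value such as 24.95 counts as normal weight) is an assumption, since the text only gives the bounds to one decimal.

```python
BMI_CATEGORIES = ("underweight", "normal weight", "overweight", "obese")


def bmi_category(bmi):
    """Assign a BMI value to one of the four groups described in the text."""
    if bmi < 18.5:
        return "underweight"
    if bmi < 25.0:
        return "normal weight"
    if bmi < 30.0:
        return "overweight"
    return "obese"


def bmi_dummies(bmi):
    """Return one 0/1 dummy per group; exactly one of them equals 1."""
    category = bmi_category(bmi)
    return {name: int(name == category) for name in BMI_CATEGORIES}


print(bmi_dummies(27.3))
```

When such dummies enter a regression together with a constant, one group is normally left out as the reference category.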

In addition, a patient's co-morbidities are registered, as well as his or her ICD10 code. The co-morbidity variable is the total number of co-morbidities a patient suffers from, such as high blood pressure, diabetes or high cholesterol, and is used as an explanatory variable. Overall it can be expected that patients who have more co-morbidities are less healthy and spend more days in the hospital.

In 1990 the Forty-third World Health Assembly endorsed the ICD-10. ICD10 codes represent the International Classification of Diseases, 10th version, and are a standard diagnostic tool for health purposes used by the health industry. The code is made up of 2 main parts and one additional, optional, part. The first part consists of one or two letters and represents the main category a patient is in, for example diseases of the circulatory system, and the second part is a number indicating a subdivision of this main category. The optional part of the code can subdivide the patients further within their main category. A complete list with the definition of this code is provided in Appendix B. In total, 15 dummies have been created to represent the relevant diagnostic categories. The dummies represent the main categories which serve as a first classification in the ICD10 code, and are indicated by the first two letters of the code.

To elaborate, a patient with ICD10 code A or B is grouped in category AB, certain infectious and parasitic diseases, and the corresponding dummy, ICD10 AB, equals 1 for this patient. Patients with diseases of the blood or blood-forming organs or certain disorders involving the immune mechanism, as well as anything concerning neoplasms, are coded ICD10 CD. A patient diagnosed with an endocrine, nutritional or metabolic disease gets the ICD10 E code. Codes ICD10 F and ICD10 G respectively belong to mental, behavioral and neurodevelopmental disorders and to diseases of the nervous system. A patient diagnosed with a disease of the eye or adnexa or a disease of the ear or mastoid process has code ICD10 H, whereas patients diagnosed with a disease of the circulatory system have code ICD10 I. Diseases of the respiratory system, diseases of the digestive system and diseases of the skin or subcutaneous tissue respectively belong to codes ICD10 J, ICD10 K and ICD10 L. Patients in the hospital for diseases of the musculoskeletal system and connective tissue get code ICD10 M, and patients who have a disease of the genitourinary system get code ICD10 N. Patients in the hospital concerning pregnancy, childbirth or the puerperium are coded ICD10 O, and the ICD10 Q code belongs to patients with congenital malformations, deformations or chromosomal abnormalities. Patients in the hospital for injury, poisoning or certain other consequences of external causes have code ICD10 S. Lastly, patients diagnosed with symptoms, signs or abnormal clinical and laboratory findings which are not elsewhere classified are coded ICD10 R. In our sample, there were no patients belonging to codes ICD10 P, ICD10 V and ICD10 Y.
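The grouping just described amounts to a lookup on the first letter of the ICD10 code. The sketch below shows one way to build the dummies from a code string. The function names, the dummy labels and the example code strings are assumptions for illustration, and letters not named in the text simply map to no group.

```python
# Letter-to-group mapping as described in the text: A and B share a group,
# as do C and D; the remaining named letters form their own group.
ICD10_GROUPS = {"A": "AB", "B": "AB", "C": "CD", "D": "CD"}
ICD10_GROUPS.update({letter: letter for letter in "EFGHIJKLMNOQRS"})

DUMMY_NAMES = sorted({f"ICD10 {group}" for group in ICD10_GROUPS.values()})


def icd10_group(code):
    """Return the diagnostic group for an ICD10 code, or None if not covered."""
    letter = code.strip().upper()[:1]
    return ICD10_GROUPS.get(letter)


def icd10_dummies(code):
    """Return a 0/1 dummy for every group; all zero when the code is not covered."""
    group = icd10_group(code)
    return {name: int(name == f"ICD10 {group}") for name in DUMMY_NAMES}


# Arbitrary example code strings, not taken from the data.
print(icd10_group("I21.0"))
print(icd10_group("b20"))
print(icd10_dummies("K35")["ICD10 K"])
```

A patient therefore has at most one ICD10 dummy equal to 1, which matches the counts per group reported in the descriptive statistics.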

Furthermore, the number of medical procedures performed on the patient during his or her hospital stay is registered, as well as whether or not the patient had surgery. The medical procedures range from inserting a catheter to performing a lumbar puncture. It can be expected that patients who undergo surgery or a significant number of procedures order more protein rich products, as it is believed that these will enhance their recovery. On the other hand, more medical procedures or a surgery could increase a patient's length of hospital stay.

4.1.3 Data transformation

The original database is organized such that every observation represents one order placed by a patient. For statistical analysis, it is more convenient to transform the database into patient-level data. Because a patient can place multiple orders a day, the orders are first stacked so that every observation consists of all orders of one patient on one day. The number of protein rich products ordered by a patient is then added up over all days he or she is in the hospital and divided by the number of days spent in the hospital, creating a variable for the average number of protein rich products ordered per day per patient.

An important issue has to be taken into account when selecting the patients for our sample. Some patients appear in the two-year database multiple times because of readmissions. To avoid correlation between observations of the same patient, only the first admission of a patient is included in the final sample. As a result, every patient is present in the sample only once.
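The transformation from order-level to patient-level data can be sketched as follows. The record layout (patient identifier, admission date, order date, number of protein rich products) and the separate table of stay lengths are hypothetical; the actual database structure is not described at this level of detail.

```python
from collections import defaultdict


def patient_level(orders, stays):
    """Average number of protein rich products per day, first admission only.

    orders: iterable of (patient_id, admission_date, order_date, protein_items)
    stays:  dict mapping (patient_id, admission_date) to days spent in hospital
    Dates are ISO strings, so comparing them as text orders them in time.
    """
    # Find the earliest admission of every patient.
    first_admission = {}
    for patient, admitted, _, _ in orders:
        if patient not in first_admission or admitted < first_admission[patient]:
            first_admission[patient] = admitted

    # Add up protein rich products over all orders of that first admission.
    totals = defaultdict(int)
    for patient, admitted, _, items in orders:
        if admitted == first_admission[patient]:
            totals[patient] += items

    # Divide by the length of stay to get the daily average.
    return {
        patient: totals[patient] / stays[(patient, first_admission[patient])]
        for patient in totals
    }


# Hypothetical order records.
orders = [
    ("p1", "2013-05-01", "2013-05-01", 2),
    ("p1", "2013-05-01", "2013-05-01", 1),
    ("p1", "2013-05-01", "2013-05-02", 3),
    ("p1", "2014-01-10", "2014-01-10", 5),  # readmission, left out
    ("p2", "2013-06-03", "2013-06-03", 4),
]
stays = {("p1", "2013-05-01"): 2, ("p1", "2014-01-10"): 1, ("p2", "2013-06-03"): 1}
averages = patient_level(orders, stays)
print(averages)
```

In this example patient p1 orders 6 protein rich products over a 2-day first admission, giving an average of 3 per day, and the readmission does not enter the sample.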

With respect to the health related variables, some care must be taken with age. Alongside the explanatory variable age, age squared is added as a regressor to capture a possible non-linear effect of age on the dependent variables.

This study is subject to potential measurement error, since we treat every protein rich product equally, while certain products might be much higher in protein than others. It could be that three products which are not protein rich together contain as much protein as one protein rich product. A small note in our favor is that the database does not register complete meals but only the separate products that together make up a meal. This way we avoid a complete protein rich dinner being treated the same way as a protein rich snack. The way the protein rich products are promoted also works in our favor: every protein rich product has the same symbol. Since the menu does not display how much protein a product contains, a patient does not choose a product based on the amount of protein it includes.

4.1.4 Descriptive statistics

The final sample consists of 14297 patients between 18 and 101 years old of whom 6471 are male and 7826 female. Characteristics of the dependent as well as the explanatory variables of the model can be found in Table 1. Table 1 shows that the daily average number of protein products ordered by patients is 2.40, with a minimum of 0 and a maximum of 13, and the mean number of days a patient spends in the hospital is 3.59.

Regarding the explanatory variables, the average age of the patients is 63.56. In addition, 387 patients are underweight whereas 3128 patients are obese. The average number of co-morbidities is 0.8162 and 5405 patients have undergone a surgery during their hospital stay. Furthermore, the mean number of medical procedures a patient has undergone is 4.16.

Regarding the ICD10 variables, the largest group of patients, 2224 out of 14297, has ICD10 code I, indicating diseases of the circulatory system. Furthermore, 1994 of the 14297 patients have ICD10 code S, which covers injury, poisoning and certain other consequences of external causes; 1920 have ICD10 code K, representing diseases of the digestive system; 1751 have ICD10 code M, for diseases of the musculoskeletal system and connective tissue; 1498 have ICD10 code CD; and 1405 have ICD10 code J. The other patients belong to the remaining ICD10 codes. Furthermore, 2389 patients are on a protein rich diet and the average number of additional nutritional drinks ordered by patients is 0.01. 59% of the patients were admitted to the hospital after At Your Request® changed its food menu by adding symbols to protein rich products.

Table 1: Descriptive statistics of all variables

Variables                    N       Mean     St. dev.
------------------------------------------------------
Dependent variables
  Protein                14297     2.3959     1.5735
  Hospital stay          14297     3.5922     2.3329
Explanatory variables
  Female                  7826     0.5474
  Age                    14297    63.56      18.508
  Co-morbidities         14297     0.8162     1.2097
  Surgery                 5405     0.3781
  Menu change             8452     0.5912
  Medical Procedures     14297     4.1566     3.5093
  ICD10 AB                 247     0.0173
  ICD10 CD                1498     0.1048
  ICD10 E                  271     0.0190
  ICD10 F                   92     0.0064
  ICD10 G                  337     0.0236
  ICD10 H                  137     0.0096
  ICD10 I                 2224     0.1556
  ICD10 J                 1405     0.0983
  ICD10 K                 1920     0.1343
  ICD10 L                  134     0.0094
  ICD10 M                 1751     0.1225
  ICD10 N                 1091     0.0763
  ICD10 O                   64     0.0045
  ICD10 Q                   21     0.0015
  ICD10 R                 1107     0.0774
  ICD10 S                 1994     0.1395
  Nutritional drink      14297     0.0106     0.1278
  Protein diet            2389     0.1671
  Underweight              387     0.0271
  Normal weight           5506     0.3851
  Overweight              5276     0.3690
  Obese                   3128     0.2188

Note: Sample of 14297 patients.

4.2 Empirical Approach

The decision of a patient to order a certain amount of protein rich products is affected by numerous factors, such as personal characteristics, health status, and circumstantial and perhaps even environmental aspects. In addition, the length of stay in the hospital certainly depends on as many, if not more, personal characteristics and circumstances. Observing all these relevant factors when investigating the relationship between the amount of protein products and the number of days spent in the hospital is impossible. Moreover, there will most probably be factors influencing both the choice of a patient to order a certain amount of protein products and the length of hospital stay, implying the presence of endogeneity. By taking into account those unobserved factors and allowing for correlation between these unobservables across both the number of protein products and the days spent in the hospital, this relationship can be estimated.

The estimation strategy implemented here consists of three parts. First, the number of protein products chosen by the patient is modeled. In the second step, we model the number of days that a patient spends in the hospital. In doing so, the number of protein products ordered by a patient is incorporated as a regressor, initially ignoring possible endogeneity of this choice variable. Lastly, to take this endogeneity into account, the models for the protein products and the number of days spent in the hospital are estimated simultaneously, accounting for unobserved factors possibly present in both models by including unobserved heterogeneity.


4.2.1 Analysis of the number of protein products

This section investigates the relationship between the number of protein rich products chosen by a patient and a vector of explanatory variables. The dependent variable is the average number of protein rich products chosen by a patient per day.

As described above, the average number of protein products ranges from 0 to 13. These averages are rounded off, resulting in a count variable.² In a sense, this can be read as a scale running from zero protein products (nothing), through 1 (a little), up to 13 (a lot of protein). Moreover, we are now dealing with an ordering of the alternatives. Table 2 shows the frequencies of the rounded off average number of protein products chosen by the patients. The percentage of people eating a rounded off average of 7 protein products or more is 1.64%. For estimation purposes these "categories" are grouped together, leaving us with 8 ordinal alternatives: 0 to 7 protein products, where category 7 indicates 7 or more protein rich food products. It can be observed that the distribution of frequencies has a long right tail.

Table 2: Frequencies of rounded off average number of protein products ordered by patients per day

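| Number of protein products | 0 | 1 | 2 | 3 | 4 | 5 | 6 | ≥7 (max 13) |
|---|---|---|---|---|---|---|---|---|
| Frequency | 1043 | 3494 | 3880 | 2855 | 1669 | 778 | 344 | 234 |
| Relative frequency (%) | 7.30 | 24.43 | 27.14 | 19.97 | 11.67 | 5.44 | 2.41 | 1.64 |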
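The construction of the dependent variable can be illustrated with a short sketch. The function name `protein_category` is hypothetical, and rounding half-up is an assumption, since the text only states that the averages are rounded off. The frequencies are copied from Table 2.

```python
import math

TOP_CATEGORY = 7  # rounded averages of 7 up to 13 are pooled into "7 or more"


def protein_category(daily_average):
    """Round a daily average to a count and pool values of 7 or more.

    Half-up rounding is an assumption made for this sketch.
    """
    rounded = math.floor(daily_average + 0.5)
    return min(rounded, TOP_CATEGORY)


# Frequencies of categories 0, 1, ..., 6 and "7 or more", copied from Table 2
frequencies = [1043, 3494, 3880, 2855, 1669, 778, 344, 234]
total = sum(frequencies)
share_top = 100.0 * frequencies[-1] / total  # percentage in the pooled category

print(protein_category(2.4), protein_category(12.6), total)
```

Pooling the sparse upper categories keeps every ordinal alternative populated, which matters for estimating the cut-off points of the ordered model.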

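The data described above have to be modeled with an appropriate and parsimonious estimation method which explicitly takes the ordering into account. Hence, the 8-alternative ordered model is given by:

$$\text{Protein}^*_i = Z_i'\theta + q_i \tag{1}$$

$$\text{Protein}_i = j \quad \text{if} \quad \alpha_{j-1} < \text{Protein}^*_i \le \alpha_j$$

with $j = 0, \ldots, 7$, $\alpha_{-1} = -\infty$, $\alpha_7 = +\infty$, and

$$P[\text{Protein}_i = j] = P[\alpha_{j-1} < \text{Protein}^*_i \le \alpha_j] = F(\alpha_j - Z_i'\theta) - F(\alpha_{j-1} - Z_i'\theta).$$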

Here, $q_i$ is assumed to be standard normally distributed and $F(\cdot)$ is the standard normal CDF. The model can therefore be described as an ordered probit model. $Z_i$ is a vector of explanatory variables and does not contain a constant, which allows all cutoff points $\alpha_j$ to be estimated. The parameters $\theta$ and the threshold parameters $\alpha_j$ are obtained by maximum likelihood.
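The category probabilities of the ordered probit model can be sketched as follows. The cut-off values are the estimates reported in Table 3; the value of the linear index $Z_i'\theta$ is hypothetical and chosen only for illustration, and the function names are not taken from the thesis.

```python
import math


def norm_cdf(x):
    """Standard normal CDF, F(.) in the text."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def ordered_probit_probs(index, cutoffs):
    """Return P[Protein_i = j] = F(alpha_j - index) - F(alpha_{j-1} - index).

    The outer bounds alpha_{-1} = -inf and alpha_J = +inf are added here.
    """
    bounds = [-math.inf] + list(cutoffs) + [math.inf]
    cdf = [norm_cdf(b - index) for b in bounds]
    return [cdf[k + 1] - cdf[k] for k in range(len(bounds) - 1)]


# Cut-off estimates alpha_0, ..., alpha_6 as reported in Table 3
cutoffs = [-0.4624, 0.6249, 1.3865, 1.9987, 2.5231, 2.9931, 3.4779]
index = 1.0  # hypothetical value of Z_i' theta

probs = ordered_probit_probs(index, cutoffs)
for j, p in enumerate(probs):
    print(f"P[Protein = {j}] = {p:.4f}")
```

Seven cut-off points split the latent scale into the eight observed categories, so the probabilities sum to one by construction.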

4.2.2 Determinants of a patient's hospital stay

To determine which characteristics of a patient influence the number of days he or she stays in the hospital, a Poisson regression is a natural starting point. Note that, in our sample, a patient never spends zero days in the hospital, since only patients who stay in the hospital one whole day or more are included. The data on hospital stay therefore only include strictly positive values, and a zero-truncated Poisson regression model seems to be appropriate.

² Although the setting of the average amount of protein products ordered by a patient per day is essentially continuous, by rounding off the averages the variable is forced into a count setting. This is done to provide more intuitive results, which are in line with the key question of this research: does the decision of a patient to order a certain amount of protein products have an influence on the length of hospital stay?

As a result, the length of stay in the hospital ($y$), given a vector of covariates $X$ and the rounded off average amount of protein products ordered by a patient per day, is assumed to follow a zero-truncated Poisson distribution. The following probability function generates the number of days a patient spends in the hospital, a count:

$$\Pr(y_i) = \frac{\mu_i^{y_i}\exp(-\mu_i)}{(1 - \exp(-\mu_i))\,y_i!} \tag{2}$$

Where the log mean can be described as:

$$\ln(\mu_i) = X_i'\beta + D_{i0}\delta_0 + \ldots + D_{i7}\delta_7 \tag{3}$$

To elaborate, $D_{i0}, \ldots, D_{i7}$ represent dummies where

$$D_{i0} = 1(\text{Protein}_i = 0), \; \ldots, \; D_{i7} = 1(\text{Protein}_i = 7, 8, \ldots).$$

Incorporating dummies instead of the plain variable Protein itself frees the model from a restrictive linear relationship between hospital stay and protein products. In the model, $X$ does not contain a constant, allowing the coefficients of all dummy variables $D_{ij}$ to be estimated. The parameters $\beta$ and $\delta_j$, $j = 0, \ldots, 7$, are estimated by maximum likelihood.
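Equations (2) and (3) can be sketched in a few lines. The dummy coefficients are the estimates reported in Table 4, whereas the value used for $X_i'\beta$ is hypothetical; the function names are illustrative and not taken from the thesis.

```python
import math


def ztp_pmf(y, mu):
    """Zero-truncated Poisson probability of y = 1, 2, ... (equation (2))."""
    if y < 1:
        return 0.0
    # log(1 - exp(-mu)) is computed through expm1 for numerical stability
    log_p = y * math.log(mu) - mu - math.lgamma(y + 1) - math.log(-math.expm1(-mu))
    return math.exp(log_p)


def ztp_mean(mu):
    """Mean of the truncated distribution: mu / (1 - exp(-mu))."""
    return mu / (-math.expm1(-mu))


def log_mean(x_beta, protein_category, delta):
    """Equation (3): exactly one dummy equals one, so one delta_j is selected."""
    return x_beta + delta[protein_category]


# delta_0, ..., delta_7 as reported in Table 4
delta = [0.4303, 0.5364, 0.7562, 0.8467, 1.0028, 1.0443, 1.0299, 1.1978]
x_beta = 0.3  # hypothetical value of X_i' beta

mu = math.exp(log_mean(x_beta, 2, delta))
total_prob = sum(ztp_pmf(y, mu) for y in range(1, 200))
mean_check = sum(y * ztp_pmf(y, mu) for y in range(1, 200))
print(total_prob, mean_check, ztp_mean(mu))
```

Because zero is excluded, the expected stay $\mu_i / (1 - \exp(-\mu_i))$ is always larger than $\mu_i$ itself.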

4.2.3 Two-equation model of hospital stay and protein products

This section addresses the major issue of endogeneity present in the model. We estimate a simultaneous model for the number of days a patient spends in the hospital, where the rounded off average amount of protein rich food products ordered by a patient is included as a regressor and is allowed to be endogenous. To start off, the decision to eat a certain amount of protein rich products will not be independent of the number of days a patient stays at the hospital. Consequently, inference concerning the effect of protein products on hospital stay might be complicated by unobserved common factors present in both equations. A way of addressing this endogeneity is to estimate the equation for the amount of protein rich products and the hospital stay equation simultaneously and to add an unobserved heterogeneity term.

As mentioned before, the amount of protein rich food products ordered by the patients can be modeled by an 8-alternative ordered probit model. A zero-truncated Poisson distribution is applied to model the hospital stay of a patient.

Bratti & Miranda (2011) propose imposing a structure on the residuals in their simultaneous model to allow for correlation. They do so by introducing a specific latent factor in their model. We apply the same method and incorporate a factor $\eta_i$ in our model.³ The mean presented in the zero-truncated Poisson model in equation (3) transforms to the following conditional log mean of $y$ given $\eta_i$ and the endogenous explanatory variables $D_{ij}$ indicating the rounded off average number of protein rich food products ordered by patients:

$$\ln(\mu_i) = X_i'\beta + D_{i0}\delta_0 + \ldots + D_{i7}\delta_7 + \eta_i \tag{4}$$

³ Bratti and Miranda (2011) impose a structure on the residuals of two equations indicating endogenous treatment and endogenous participation. We only allow for one endogenous variable in the model.


Here, the specific latent factor $\eta_i$ is introduced, which is present in both equations. This random variable represents unobserved heterogeneity of an individual $i$. $X_i$ does not contain an intercept, allowing all $\delta_j$, $j = 0, \ldots, 7$, to be estimated.

Including the latent factor $\eta_i$ in both equations results in the following structure on the error term $q_i$ in the ordered probit equation (1):

$$q_i = \lambda\eta_i + \xi_i$$

Incorporating this structure on the error terms allows for correlation between the rounded off average number of protein products and hospital stay, $y$. Here $\xi_i$ is an idiosyncratic error term and $\lambda$ is a factor loading on $\eta_i$, which can be estimated alongside the other parameters $\theta$ and $\alpha_j$ in the model. In the simultaneous model, in contrast to the ordered probit model before, $Z$ does contain an intercept and the first cutoff point $\alpha_0$ is fixed.

In order to close the model, a few assumptions have to be placed on the distribution of ηi and ξi:

• $\text{Distr}(\eta_i \mid X_i, Z_i, \xi_i) = \text{Distr}(\eta_i)$, which assumes random effects, indicating that the unobserved heterogeneity term $\eta_i$ is independent of all explanatory variables and the error term $\xi_i$,

• $\text{Distr}(\xi_i \mid X_i, Z_i, \eta_i) = \text{Distr}(\xi_i)$, which assumes $\xi_i$ to be independent of the explanatory variables $X_i$ and $Z_i$ as well as the unobserved heterogeneity term $\eta_i$.

It is assumed that the latent factor is distributed as $\eta_i \sim N(0, \sigma_\eta^2)$ and that $\xi_i \sim N(0, 1)$.

Note that:

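$$\mathrm{Var}(\eta_i) = \sigma_\eta^2, \qquad \mathrm{Var}(q_i) = \lambda^2\sigma_\eta^2 + 1, \qquad \mathrm{Cov}(\eta_i, q_i) = \lambda\sigma_\eta^2.$$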
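These moments follow from $q_i = \lambda\eta_i + \xi_i$ with independent $\eta_i$ and $\xi_i$, and they can be checked by simulation. The sketch below uses hypothetical values for $\lambda$ and $\sigma_\eta$; the function name, seed and number of draws are arbitrary choices made for this illustration.

```python
import random


def simulated_moments(lam, sigma_eta, n_draws=100_000, seed=2017):
    """Draw (eta, xi), build q = lam * eta + xi and return sample moments."""
    rng = random.Random(seed)
    etas = [rng.gauss(0.0, sigma_eta) for _ in range(n_draws)]
    qs = [lam * eta + rng.gauss(0.0, 1.0) for eta in etas]
    mean_eta = sum(etas) / n_draws
    mean_q = sum(qs) / n_draws
    var_eta = sum((e - mean_eta) ** 2 for e in etas) / n_draws
    var_q = sum((q - mean_q) ** 2 for q in qs) / n_draws
    cov = sum((e - mean_eta) * (q - mean_q) for e, q in zip(etas, qs)) / n_draws
    return var_eta, var_q, cov


lam, sigma_eta = 0.5, 0.8  # hypothetical parameter values

var_eta, var_q, cov = simulated_moments(lam, sigma_eta)
# Theoretical values: sigma^2, lam^2 * sigma^2 + 1, lam * sigma^2
theory = (sigma_eta ** 2, lam ** 2 * sigma_eta ** 2 + 1.0, lam * sigma_eta ** 2)
print((var_eta, var_q, cov), theory)
```

The sign of the covariance is determined entirely by $\lambda$, which is why the factor loading carries the information on endogeneity.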

The correlation between the error terms $q$ and $\eta$ can be written as:

$$\rho_{\eta,q} = \frac{\lambda\sigma_\eta^2}{\sqrt{\sigma_\eta^2(\lambda^2\sigma_\eta^2 + 1)}} \tag{5}$$

Estimation

Under the aforementioned assumptions, the joint probability density function of patient $i$, conditional on the common latent factor, is

$$l_i \mid \eta_i = D_{ij}\Pr[Y_i = y_i, \text{Protein}_i = j \mid X_i, Z_i, \eta_i] = D_{ij}\Pr[\text{Protein}_i = j \mid \eta_i]\,\Pr[y_i \mid \eta_i].$$

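The complication here arises from the fact that $\eta_i$ is unobserved. Nevertheless, we did make an assumption on the distribution of $\eta_i$, which is normal with mean zero and variance $\sigma_\eta^2$, so that $\eta_i$ can be integrated out of the joint density:

$$D_{ij}\Pr[Y_i = y_i, \text{Protein}_i = j \mid X_i, Z_i] = D_{ij}\int_{-\infty}^{\infty} \Pr[\text{Protein}_i = j \mid \eta_i] \times \Pr[y_i \mid \eta_i] \times \phi(\eta_i)\,d\eta_i. \tag{6}$$

It follows that the log-likelihood function can be written as:

$$\log(L) = \sum_i \sum_j D_{ij} \ln\left(\int_{-\infty}^{\infty} \Pr[\text{Protein}_i = j \mid \eta_i] \times \Pr[y_i \mid \eta_i] \times \phi(\eta_i)\,d\eta_i\right) \tag{7}$$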


Where, as before, Dij = 1 for Proteini = j.

Difficulty in estimation arises because the integral in (7) has no closed form solution. Deb & Trivedi (2004) as well as Bratti & Miranda (2011) solve this by using simulation-based estimation. This research differs in the sense that the integral is dealt with directly by means of numerical integration.

Using the distributional assumptions specified in the sections before, the log-likelihood boils down to:

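$$\log(L) = \sum_i \sum_j D_{ij} \ln\left(\int_{-\infty}^{\infty} \left[\Phi(t_j - Z_i'\theta - \lambda\eta_i) - \Phi(t_{j-1} - Z_i'\theta - \lambda\eta_i)\right] \times \frac{\mu_i^{y_i}\exp(-\mu_i)}{(1 - \exp(-\mu_i))\,y_i!} \times \phi(\eta_i)\,d\eta_i\right) \tag{8}$$

where $\mu_i = \exp(X_i'\beta + D_{i0}\delta_0 + \ldots + D_{i7}\delta_7 + \eta_i)$ and $t_j$ denotes the cutoff points of the ordered probit equation in the simultaneous model.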
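A single term of the log-likelihood in equation (8) can be approximated numerically as sketched below. The thesis does not state which quadrature rule is used, so composite Simpson's rule on a truncated grid is an assumption of this sketch, as are all function names and parameter values. The density of $\eta_i$ is taken to be $N(0, \sigma_\eta^2)$, in line with the distributional assumption above.

```python
import math


def norm_cdf(x):
    """Standard normal CDF, Phi(.) in equation (8)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def eta_density(eta, sigma_eta):
    """Density of eta_i under the assumption eta_i ~ N(0, sigma_eta^2)."""
    return math.exp(-0.5 * (eta / sigma_eta) ** 2) / (sigma_eta * math.sqrt(2.0 * math.pi))


def ztp_pmf(y, mu):
    """Zero-truncated Poisson probability of y = 1, 2, ..."""
    log_p = y * math.log(mu) - mu - math.lgamma(y + 1) - math.log(-math.expm1(-mu))
    return math.exp(log_p)


def joint_prob(y, j, x_part, z_part, lam, sigma_eta, cutoffs, n_steps=200, width=8.0):
    """Simpson's rule approximation of the integral inside equation (8).

    x_part stands for X_i'beta plus the selected delta_j, z_part for Z_i'theta.
    n_steps must be even; the grid spans +/- width standard deviations of eta.
    """
    bounds = [-math.inf] + list(cutoffs) + [math.inf]
    lower = -width * sigma_eta
    step = 2.0 * width * sigma_eta / n_steps
    total = 0.0
    for k in range(n_steps + 1):
        eta = lower + k * step
        shift = z_part + lam * eta
        p_protein = norm_cdf(bounds[j + 1] - shift) - norm_cdf(bounds[j] - shift)
        p_stay = ztp_pmf(y, math.exp(x_part + eta))
        weight = 1 if k in (0, n_steps) else (4 if k % 2 else 2)
        total += weight * p_protein * p_stay * eta_density(eta, sigma_eta)
    return total * step / 3.0


# All parameter values below are hypothetical and serve only as an illustration.
cutoffs = [0.0, 1.1, 1.9, 2.5, 3.0, 3.5, 4.0]  # t_0 fixed at zero, as in the text
params = dict(x_part=1.0, z_part=1.2, lam=-0.3, sigma_eta=0.42, cutoffs=cutoffs)

contribution = math.log(joint_prob(y=3, j=2, **params))  # one term of log(L)
total_mass = sum(joint_prob(y, j, **params) for j in range(8) for y in range(1, 61))
print(contribution, total_mass)
```

Summing the integrated probabilities over all protein categories and a wide range of stay lengths gives a value close to one, which is a useful sanity check on any quadrature rule before it is used inside a maximum likelihood routine.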

In order to enhance identification, some exclusion restrictions have been placed on the covariates used in both equations. Next to the same explanatory variables as $X_i$, the covariate vector $Z_i$ contains five more covariates. These five additional regressors include whether a patient follows a protein rich diet and how many, if any, extra nutritional support drinks a patient has ordered. In addition, the categorical groups indicating the BMI of a patient are added solely to the covariates $Z_i$ and not to $X_i$.

The choice of suitable excluded variables remains a widely discussed topic and multiple approaches exist to test the validity of these variables. Kenkel and Terza (2001), for example, use an overidentification test to check the validity of their exclusion restrictions. However, that is not the focus of this research and, to keep things simple, these excluded variables are assumed to be valid.⁴ See Table 1 for the descriptive statistics on those variables.

In order to reduce computational time of the model, a random sample of 2000 out of the total 14297 patients is used. The descriptive statistics on this sample, which are quite similar to the original sample, are presented in Appendix C.

The unknown parameters $\alpha$, $\theta$, $\lambda$, $\beta$, $\delta$ and $\sigma_\eta^2$ will be estimated by maximum likelihood. For more details on the log likelihood function see Appendix D.

⁴ Bratti and Miranda (2011) even claim that formally the model is identified without exclusion restrictions, through restrictions on the covariance matrix and by functional form. See Bratti and Miranda (2011) for the description of their restrictions on the covariance matrix.

5 Results

This section elaborates on the results and gives an interpretation of the estimated parameters. Sections 5.1 and 5.2 present and discuss, respectively, the results of the equation for the rounded off average amount of protein products ordered by a patient per day and of the equation for the length of hospital stay. Section 5.3 gives the results when the aforementioned equations are estimated simultaneously and endogeneity is taken into account.

5.1 Ordered probit model

The parameters of the ordered probit model are estimated using maximum likelihood and presented in Table 3. The dependent variable is the rounded off average number of protein products ordered by a patient per day. This dependent variable has 8 levels: ordering on average 0 protein products per day, ordering on average 1 protein product per day, and so on up to and including ordering on average 7 or more protein products per day.

The first column of Table 3 reports the estimated coefficients of the ordered probit model. The sign of these estimated coefficients can be directly interpreted as whether the latent variable $\text{Protein}^*$ increases with the regressor or not (Cameron and Trivedi, 2004). In order to get a more intuitive interpretation of the results, Appendix E presents the marginal effects at the mean of the independent variables.

Female patients are less likely to order more protein products than male patients. The estimated coefficient is significant at a 0.1% level. Age appears to have a positive effect on the average amount of protein ordered per day by a patient, but this effect lessens as people get older. This is indicated by the estimated positive coefficient of age and the estimated negative coefficient of age squared. Both are significant at a 0.1% level.

Patients admitted to the hospital after April 2014, who received menu cards that promoted protein products explicitly, are more likely to order a higher amount of protein products on average per day than patients admitted before the menu change, significant at a 0.1% level. In addition, a patient who is on a protein rich diet is more likely to be in a higher category of average protein products per day. The estimated coefficient of protein diet is significant at a 0.1% level. A patient who is obese is more likely to order more protein products; this coefficient is significant at a 1% level.

Regarding the diagnostic variables ICD10, patients who have a certain endocrine, nutritional or metabolic disease, coded ICD10 E, are more likely to order a higher amount of protein products on average per day than patients who have symptoms, signs or abnormal clinical and laboratory findings which are not elsewhere classified, at a 1% level. A patient diagnosed with a disease of the respiratory system, ICD10 J, is more likely to order more protein products than the reference group, ICD10 R, significant at 5%. Patients with diseases of the digestive system, ICD10 K, are less likely to do so at a 5% level, and patients with diseases of the musculoskeletal system and connective tissue, ICD10 M, just like patients with diseases of the genitourinary system, ICD10 N, are more likely at a 1% significance level to order more protein products on average per day than the reference group, ICD10 R. Patients in the hospital for pregnancy, childbirth or the puerperium, ICD10 O, are less likely to order more protein products and patients diagnosed with injury, poisoning or certain other consequences of external causes, ICD10 S, are more likely to order more protein products. Both estimated coefficients are significant at 10%. All remaining estimated coefficients are not significant.


Table 3: Ordered probit ML estimation results

| Variables | Coefficient | St. error |
|---|---|---|
| Female | -0.2599*** | 0.0479 |
| Age | 3.3394*** | 0.7558 |
| Age squared | -2.9650*** | 0.6395 |
| Underweight | 0.0897 | 0.1308 |
| Overweight | 0.0524 | 0.0542 |
| Obese | 0.1829** | 0.0632 |
| Co-morbidities | 0.0726 | 0.7405 |
| Surgery | 0.0569 | 0.0623 |
| Menu change | 0.3950*** | 0.0474 |
| Medical Procedures | -0.1077 | 0.0724 |
| ICD10 AB | -0.1298 | 0.2112 |
| ICD10 CD | 0.0301 | 0.1109 |
| ICD10 E | 0.4435** | 0.1729 |
| ICD10 F | 0.1317 | 0.2857 |
| ICD10 G | 0.0149 | 0.1613 |
| ICD10 H | -0.3179 | 0.2911 |
| ICD10 I | 0.0308 | 0.0995 |
| ICD10 J | 0.1898* | 0.1079 |
| ICD10 K | -0.1549* | 0.1028 |
| ICD10 L | 0.1678 | 0.2511 |
| ICD10 M | 0.3749** | 0.1158 |
| ICD10 N | 0.2843** | 0.1203 |
| ICD10 O | -0.9820. | 0.5054 |
| ICD10 Q | -0.2798 | 0.6323 |
| ICD10 S | 0.1712. | 0.1034 |
| Nutritional drink | 0.0182 | 0.1935 |
| Protein diet | 0.3385*** | 0.0679 |
| Cut off points | | |
| α0 | -0.4624* | 0.2295 |
| α1 | 0.6249** | 0.2288 |
| α2 | 1.3865*** | 0.2298 |
| α3 | 1.9987*** | 0.2313 |
| α4 | 2.5231*** | 0.2330 |
| α5 | 2.9931*** | 0.2362 |
| α6 | 3.4779*** | 0.2440 |

Note: The dependent variable is the rounded off average number of protein products ordered by patients a day; sample of 2000 individuals; ., *, **, *** asymptotically significant at respectively 10%, 5%, 1% and 0.1%; reference categories are Males, Normal Weight and ICD10 R.

At the bottom of the table, the estimated cut off points of the ordered probit model are given. These cut off points indicate where the latent variable is cut to produce the 8 groups which are present in the data.

5.2 Zero-truncated Poisson

The estimates of the zero-truncated Poisson model are given in Table 4. The dependent variable is the number of days a patient spends in the hospital. The first column presents the estimated coefficients of the zero-truncated Poisson model.

Being female is associated with an increase of 9.77%, exp(0.0932) = 1.0977, in the expected number of days spent in the hospital. The estimated coefficient is significant at a 0.1% level. In addition, the parabolic effect of age has a positive impact of 148%, exp(0.9083) = 2.48, on the length of stay in the hospital by a patient.

Patients who were admitted in the period when the protein products were explicitly promoted on the menu card seem to spend fewer days in the hospital than patients who were in the hospital before the menu change. The patients admitted after the menu change have an expected value which is exp(−0.1419) = 0.8677 times, that is 13.23% less than, the expected value of the number of days spent in the hospital by patients before the menu change. Note that this effect might also be due to certain time effects, like overall improvement of health care, or other ongoing projects in the hospital to enhance recovery of the patients.

Furthermore, an additional medical procedure performed on a patient increases the expected value of the number of days in the hospital by a factor of exp(0.1588) = 1.1721. Patients who are in the hospital concerning neoplasms, diseases of the blood and blood-forming organs and certain other disorders involving the immune mechanism, ICD10 CD, have an expected value exp(0.3453) = 1.412 times the expected value of the number of days in the hospital of patients who have symptoms, signs and abnormal clinical and laboratory findings which are not elsewhere classified, the reference group. Patients in the hospital for endocrine, nutritional or metabolic diseases, ICD10 E, have an expected value exp(0.3528) = 1.4231 times the expected value of the reference group. The estimated coefficient is significant at 0.1%. Individuals diagnosed with a disease of the respiratory system, ICD10 J, have an expected value of length of hospital stay which is 46%, exp(0.3777) = 1.4589, more than the expected value of the reference group, ICD10 R. Patients admitted to the hospital for diseases of the digestive system, ICD10 K, have an expected hospital stay which is 35% bigger than that of the reference group. The estimated coefficient is significant at a 0.1% level. Patients with diseases of the musculoskeletal system and connective tissue, ICD10 M, have an expected value of hospital stay which is exp(0.2917) = 1.339 times the expected value of the reference group, significant at 0.1%.

Furthermore, patients with code ICD10 N, diseases of the genitourinary system, ICD10 Q, indicating congenital malformations, deformations and chromosomal abnormalities and ICD10 S, injury, poisoning and certain other consequences of external causes respectively have an expected hospital stay which is 22%, 175% and 62% bigger than the reference group ICD10 R. The estimated coefficients are significant at a level of, respectively, 1%, 0.1% and 0.1%.
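The factors and percentages quoted in this section are obtained by exponentiating the coefficients of Table 4. A minimal sketch of that conversion is given below; the function names are illustrative. Strictly speaking, exp(β) is the multiplicative change in $\mu_i$, while the mean of the zero-truncated distribution is $\mu_i / (1 - \exp(-\mu_i))$, so the percentages are best read as effects on $\mu_i$.

```python
import math

# Selected coefficients copied from Table 4
coefficients = {
    "Female": 0.0932,
    "Menu change": -0.1419,
    "Medical Procedures": 0.1588,
    "ICD10 CD": 0.3453,
    "ICD10 J": 0.3777,
    "ICD10 N": 0.2020,
    "ICD10 Q": 1.0108,
    "ICD10 S": 0.4804,
}


def rate_ratio(beta):
    """Multiplicative change in mu_i for a one-unit change in the regressor."""
    return math.exp(beta)


def percent_change(beta):
    """The same effect expressed as a percentage change in mu_i."""
    return 100.0 * (math.exp(beta) - 1.0)


for name, beta in coefficients.items():
    print(f"{name}: factor {rate_ratio(beta):.4f}, change {percent_change(beta):+.2f}%")
```

A negative coefficient gives a factor below one, as for the menu change dummy, whereas a coefficient close to one, as for ICD10 Q, more than doubles $\mu_i$.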

In the zero-truncated Poisson model, the effects of the dummies $D_j$ indicating the rounded off average number of protein products ordered by a patient per day, the main interest of this research, on the length of hospital stay are estimated without taking endogeneity of this variable into account. Inference might therefore not be accurate, as there could be factors influencing the choice of a patient to order a certain amount of food products high in protein that also influence his or her length of stay in the hospital. Nevertheless, the zero-truncated Poisson model results indicate that the rounded off average number of protein products ordered per day has a significant positive effect on the expected value of length of hospital stay. This effect seems to increase as the number of protein products ordered by a patient increases, with the exception of the effect of an average of 6 protein products a day, which is slightly smaller than the effect of an average of 5 protein products per day. All estimated coefficients are significant at the 1% level or lower.

The positive effect found for the dummy variables D0 up to and including D7 goes against the hypotheses proposed in the literature, which state that increased protein intake might enhance recovery and therefore reduce the length of hospital stay. Nevertheless, the results presented by the zero-truncated Poisson model tell us otherwise. One might argue that these results are not as counterintuitive as they first appear, since patients who are overall more concerned about their health are likely to follow the recommendations of sufficient protein intake but might also spend more days in the hospital. However, endogeneity of the choice of a patient to order a certain amount of protein products is not taken into account in this model and the results could be subject to endogeneity bias.


Table 4: Zero-truncated Poisson ML estimation results

| Variables | β | St. error |
|---|---|---|
| Female | 0.0932*** | 0.0262 |
| Age | -0.3830 | 0.4255 |
| Age squared | 0.9083** | 0.3508 |
| Co-morbidities | 0.1766 | 0.1131 |
| Surgery | -0.0084 | 0.0331 |
| Menu change | -0.1419*** | 0.0260 |
| Medical Procedures | 0.1588*** | 0.0356 |
| ICD10 AB | 0.1923 | 0.1240 |
| ICD10 CD | 0.3453*** | 0.0645 |
| ICD10 E | 0.3528*** | 0.0915 |
| ICD10 F | 0.1321 | 0.1700 |
| ICD10 G | 0.1714. | 0.0985 |
| ICD10 H | -0.1297 | 0.2157 |
| ICD10 I | 0.0405 | 0.0623 |
| ICD10 J | 0.3777*** | 0.0626 |
| ICD10 K | 0.3032*** | 0.0625 |
| ICD10 L | 0.0693 | 0.1537 |
| ICD10 M | 0.2917*** | 0.0682 |
| ICD10 N | 0.2020** | 0.0723 |
| ICD10 O | -0.1500 | 0.4154 |
| ICD10 Q | 1.0108*** | 0.2650 |
| ICD10 S | 0.4804*** | 0.0603 |
| D0 | 0.4303** | 0.1440 |
| D1 | 0.5364*** | 0.1358 |
| D2 | 0.7562*** | 0.1362 |
| D3 | 0.8467*** | 0.1379 |
| D4 | 1.0028*** | 0.1400 |
| D5 | 1.0443*** | 0.1436 |
| D6 | 1.0299*** | 0.1551 |
| D7 | 1.1978*** | 0.1652 |
| log likelihood | -3969.449 | |

Note: The dependent variable is the length of hospital stay in days; sample of 2000 individuals; ., *, **, *** asymptotically significant at respectively 10%, 5%, 1% and 0.1%; reference categories are Males and ICD10 R.

5.3 Two equation model

This section presents the results of the simultaneous model applied as a solution to the major issue of endogeneity of the choice of a patient to order a certain amount of protein products. Table 5 reports the estimates for the effect of the rounded off average number of protein products on a patient's hospital stay. Here, endogeneity is taken into account by including a random variable $\eta_i$ which represents unobserved heterogeneity in both equations. Column (2) displays the estimated coefficients of the variables. We find that patients who order a rounded off average amount of 2 protein products or more have a higher expected value of the length of hospital stay. The positive effect seems to increase as the amount of average protein products increases up to and including 6 average protein products. The effect decreases a little for patients ordering on average 7 or more protein products, category 7. The estimated coefficients on the dummy variables D3 up to and including D7 are significant at 0.1% and the estimated coefficient on the dummy variable D2 is significant at 10%.

The results are largely in line with the results found in the zero-truncated Poisson model, which ignores endogeneity. However, in contrast to the model that neglects endogeneity, ordering a rounded off average amount of 0 or 1 protein products does not seem to have an effect on hospital stay. Also, the zero-truncated Poisson model finds that the positive effect of the average amount of protein products on hospital stay increases as the amount of protein products increases, whereas in the simultaneous model this effect decreases a little at an amount of 7 or more rounded off average protein products a day. Furthermore, the results on the dummy variables D2, D3, D4 and D5 in the zero-truncated Poisson model are somewhat upward biased.

Table 5 shows that there is significant unobserved heterogeneity present, since $\sigma_\eta^2$ is statistically different from zero. This suggests that the positive association between eating an average of 0 and 1 protein product per day and hospital stay, as well as the other upward biased estimates of the zero-truncated Poisson model, could be driven by this unobserved heterogeneity. Note that the estimated correlation $\rho_{\eta,q}$ between the errors $\eta$ and $q$ of the two equations is negative but not significant. This result implies that in our model, with its specific structure imposed on the error terms to allow for correlation, there is no evidence of endogeneity of the rounded off average amount of protein products ordered per day. This result could be due to the quite restrictive relationship imposed between the error terms of both equations. In reality, however, this relationship might be different and more complex.

Table 5: Effect of the number of protein products on hospital stay in simultaneous model

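| Variables | Coefficient | St. error |
|---|---|---|
| D0 | -0.1819 | 0.1944 |
| D1 | 0.0158 | 0.1840 |
| D2 | 0.3357. | 0.1804 |
| D3 | 0.5478*** | 0.1521 |
| D4 | 0.8528*** | 0.1702 |
| D5 | 1.0189*** | 0.1499 |
| D6 | 1.0312*** | 0.1685 |
| D7 | 0.9209*** | 0.1873 |
| $\hat{\rho}_{\eta,q}$ | -0.0129 | 0.2143 |
| $\hat{\sigma}_\eta$ ᵇ | -0.8584*** | 0.0383 |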

Note: The dependent variable is the length of hospital stay in days in the simultaneous model; sample of 2000 individuals; ., *, **, *** asymptotically significant at respectively 10%, 5%, 1% and 0.1%.

ᵇ $\hat{\sigma}_\eta$ is scaled as 0.0001 + exp(−0.8584) = 0.4239.
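The two statements about Table 5 that drive the endogeneity conclusion can be retraced with simple arithmetic. The sketch below recomputes the scaling of $\hat{\sigma}_\eta$ from footnote b and forms a z-statistic for $\hat{\rho}_{\eta,q}$; treating the ratio of estimate to standard error as asymptotically standard normal is an assumption of this sketch, in line with the asymptotic significance levels reported in the table note.

```python
import math


def two_sided_p_value(z):
    """Two-sided p-value under a standard normal reference distribution."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# Estimates and standard errors copied from Table 5
rho_hat, rho_se = -0.0129, 0.2143
raw_sigma_parameter = -0.8584

z_rho = rho_hat / rho_se            # z-statistic for H0: rho = 0
p_rho = two_sided_p_value(z_rho)    # large p-value: no evidence of correlation

sigma_eta = 0.0001 + math.exp(raw_sigma_parameter)  # scaling from footnote b

print(f"z = {z_rho:.3f}, p = {p_rho:.3f}, sigma_eta = {sigma_eta:.4f}")
```

The exponential scaling keeps $\sigma_\eta$ strictly positive during optimisation, which is why the table reports a negative raw parameter.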

Table 6 below reports the estimated coefficients of the other covariates on hospital stay in the simultaneous model. Being female seems to have a positive effect on the length of hospital stay, significant at a 0.1% level. This is in line with the positive effect found in the zero-truncated Poisson model. A patient's age does not have a significant effect on the number of days a patient spends in the hospital. An increase in the number of co-morbidities of a patient results in an increase in the number of days a patient spends in the hospital. Note here that the co-morbidities variable is a number indicating the amount of co-morbidities a patient suffers from, and the estimated coefficient should be interpreted carefully, as not all co-morbidities will have the same effect on hospital stay. The estimated coefficient is significant at a 5% level. In the zero-truncated Poisson model, the effect of additional co-morbidities was not significant.

Furthermore, an increase in the number of medical procedures performed on a patient has a positive effect on the length of stay in the hospital. This estimated coefficient is significant at a 0.1% level. This result is in line with the estimate found in the zero-truncated Poisson model. As found before, a patient who was admitted to the hospital after the menu change spends significantly fewer days in the hospital than a patient who was admitted before the menu change. The estimated coefficient is statistically significant at 0.1%. Once more, it should be noted that since the hospital is in an ongoing process of improving nutritional care and motivating patients to move more, projects taking place in Gelderse Vallei during the same period could have influenced this result as well.

Patients who are in the hospital concerning neoplasms, diseases of the blood and blood-forming organs and certain other disorders involving the immune mechanism, ICD10 CD, have a higher expected value of the number of days in the hospital than patients who have symptoms, signs and abnormal clinical and laboratory findings which are not elsewhere classified, the reference group. Individuals admitted to the hospital for diseases of the respiratory system, ICD10 J, have an expected value of length of hospital stay which is higher than the expected value of the reference group, ICD10 R. Furthermore, patients who have a disease of the digestive system, ICD10 K, and patients with diseases of the musculoskeletal system and connective tissue, ICD10 M, also have a higher expected hospital stay. This also holds for patients diagnosed with diseases of the genitourinary system, ICD10 N, and patients with code ICD10 S, injury, poisoning and certain other consequences of external causes. These results are in line with the results found in the zero-truncated Poisson model. The difference between the two models regarding the ICD10 codes is that in the simultaneous model, patients who are admitted to the hospital for congenital malformations, deformations and chromosomal abnormalities, ICD10 Q, in contrast to the zero-truncated Poisson model, do not seem to have a significantly higher or lower expected value of hospital stay than the reference group.

Table 6: Effect of other covariates on hospital stay in simultaneous model

| Variables | Coefficient | St. error |
|---|---|---|
| Female | 0.1705*** | 0.0303 |
| Age | 0.2000 | 0.5330 |
| Age squared | 0.6911 | 0.4775 |
| Co-morbidities | 0.3101* | 0.1418 |
| Surgery | 0.0166 | 0.3710 |
| Menu change | -0.2567*** | 0.0349 |
| Medical Procedures | 0.2379*** | 0.0404 |
| ICD10 AB | 0.1572 | 0.1122 |
| ICD10 CD | 0.3059*** | 0.0699 |
| ICD10 E | 0.1335 | 0.1176 |
| ICD10 F | 0.1529 | 0.2257 |
| ICD10 G | 0.0604 | 0.1261 |
| ICD10 H | 0.1934 | 0.2251 |
| ICD10 I | 0.0030 | 0.0753 |
| ICD10 J | 0.4196*** | 0.0763 |
| ICD10 K | 0.3792*** | 0.0745 |
| ICD10 L | -0.0835 | 0.1973 |
| ICD10 M | 0.2684** | 0.0840 |
| ICD10 N | 0.1924* | 0.0877 |
| ICD10 O | -0.5161 | 0.5812 |
| ICD10 Q | 0.3890 | 0.4219 |
| ICD10 S | 0.4142*** | 0.0750 |

Note: The dependent variable is the length of hospital stay in days in the simultaneous model; sample of 2000 individuals; ., *, **, *** asymptotically significant at respectively 10%, 5%, 1% and 0.1%; reference categories are Males, Normal Weight and ICD10 R.

Table 7 describes the effects of covariates from the simultaneous model on the rounded off average amount of protein products ordered by patients. Column (2) gives the estimated coefficients of the model. Being female has a negative effect on the number of protein products ordered by patients. This estimated coefficient is significant at a 0.1% level. Age seems to have a positive effect on the rounded off average number of protein products ordered by a patient, but this effect lessens as age increases. This is significant at a 0.1% level. These effects are in line with the results found in the ordered probit model before.

Furthermore, being obese has a significant positive effect on the average number of protein products ordered by patients per day. Also, patients who were admitted to the hospital after the menu change seem to order more protein rich food products than patients who were admitted to the hospital before the menu change. Both estimated coefficients are significant at 5% or lower. Patients who are on a protein diet order on average more protein products per day than patients who are not on a protein diet. The estimated coefficient is significant at 0.1%. All effects are in line with the results found in the ordered probit model.

Patients with certain infectious and parasitic diseases, ICD10 AB, seem to order fewer protein products on average per day than patients in the reference group ICD10 R. In addition, a patient who is admitted to the hospital for a disease of the digestive system, ICD10 K, seems to order fewer protein products on average per day than a patient in group ICD10 R. The estimated coefficients are significant at respectively 5% and 1%. Patients who have a disease of the musculoskeletal system and connective tissue, ICD10 M, order more protein products on average per day than patients with the ICD10 R code, statistically significant at the 5% level. In contrast to the ordered probit model, only these three diagnoses (ICD10 AB, ICD10 K and ICD10 M) seem to have a significant effect on the rounded off average amount of protein products ordered by a patient. In the ordered probit model, more diagnoses had a significant effect on the amount of ordered protein products.

Regarding the estimated cut off points of the ordered probit model, all are significant at a 0.1% level. Note that since the model contains an intercept, t0 is set to zero here and is not estimated by the model.
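How cut off points translate a latent index into category probabilities can be sketched as follows. This is a minimal illustration, assuming a standard normal error and ignoring the unobserved heterogeneity term of the simultaneous model. The function names and the eight ordered categories implied by seven thresholds are assumptions made for illustration, not taken from the thesis.

```python
import math

# Thresholds: the first is fixed at zero, the rest are the estimates in Table 7.
CUTS = [0.0, 1.0304, 1.7741, 2.3808, 2.9180, 3.5300, 4.1512]


def norm_cdf(z: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def category_probs(index: float, cuts=CUTS) -> list:
    """P(y = j), j = 0..len(cuts), for latent index x'b plus an N(0, 1) error.

    Category j is observed when the latent variable falls between
    consecutive thresholds (with -inf and +inf as the outer bounds).
    """
    bounds = [-math.inf] + list(cuts) + [math.inf]
    cdf = [norm_cdf(b - index) for b in bounds]
    return [cdf[j + 1] - cdf[j] for j in range(len(bounds) - 1)]
```

A larger index shifts probability mass toward higher categories, so a positive coefficient in Table 7 corresponds to a higher probability of ordering more protein products.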


Table 7: Effect of other covariates on the rounded off average amount of protein products ordered by patients in the simultaneous model

Variables              Coefficient   St. error

(Intercept)            0.5240*       0.2660
Female                 -0.3367***    0.0502
Age                    4.4911***     0.9024
Age squared            -4.2242***    0.7593
Co-morbidities         -0.4283.      0.2319
Surgery                -0.0615       0.0631
Menu change            0.4517***     0.0478
Medical Procedures     -0.0618       0.0733
ICD10 AB               -0.4927*      0.2224
ICD10 CD               -0.1506       0.1141
ICD10 E                0.0140        0.1704
ICD10 F                -0.3310       0.2391
ICD10 G                -0.0095       0.1648
ICD10 H                -0.1849       0.2909
ICD10 I                -0.0985       0.1047
ICD10 J                0.0232        0.1126
ICD10 K                -0.3252**     0.1073
ICD10 L                0.0832        0.2577
ICD10 M                0.2704*       0.1204
ICD10 N                -0.0095       0.1245
ICD10 O                0.3573        0.5634
ICD10 Q                0.5042        0.6787
ICD10 S                -0.0439       0.1077

Excluded variables (a)
Nutritional drink      -0.2766       0.1926
Protein diet           0.5008***     0.0629
Underweight            0.0578        0.1353
Overweight             -0.0265       0.0521
Obese                  0.1211*       0.0611

Cut off points (b)
α1                     1.0304***     0.0405
α2                     1.7741***     0.0444
α3                     2.3808***     0.0451
α4                     2.9180***     0.0492
α5                     3.5300***     0.0611
α6                     4.1512***     0.0894

Note: The dependent variable is the rounded off average number of protein products ordered by a patient per day in the simultaneous model; sample of 2000 individuals; ., *, **, *** asymptotically significant at respectively 10%, 5%, 1% and 0.1%; reference categories are Males, Normal Weight and ICD10 R.

(a) Variables excluded from the hospital stay equation to enhance identification.

(b) The first cut off point α

6 Discussion & Conclusion

This section discusses the results presented above and draws a conclusion. It also provides limitations of the research and recommendations for future work.

6.1 Conclusion

Sufficient protein intake by hospitalized patients has multiple important health benefits. Since proper nutrition has a positive effect on the recovery of ill patients, it could be that an increase in protein consumption by patients contributes to a decrease in the length of hospital stay. This research investigates whether the rounded off average number of protein products ordered by a patient in a hospital per day has an effect on the number of days he or she spends in the hospital. In order to do so, we account for potential endogeneity of the average number of protein products. We propose a simultaneous model with unobserved heterogeneity in both equations and use numerical integration to obtain estimates.
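The role of numerical integration can be illustrated with a single count equation. The sketch below is a simplification under stated assumptions: one Poisson outcome with a log link, one standard normal latent factor, and a plain midpoint rule. This section does not state which quadrature rule or factor distribution the thesis uses, and the actual model has two equations, so all names and choices here are illustrative.

```python
import math


def poisson_pmf(y: int, lam: float) -> float:
    """Poisson probability, computed on the log scale for stability."""
    return math.exp(-lam + y * math.log(lam) - math.lgamma(y + 1))


def normal_pdf(u: float) -> float:
    """Standard normal density."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def mixed_poisson_pmf(y: int, xb: float, sigma: float,
                      lo: float = -8.0, hi: float = 8.0, n: int = 2000) -> float:
    """Likelihood contribution with the latent factor integrated out.

    Approximates the integral of Poisson(y | exp(xb + sigma*u)) * phi(u)
    over u by the midpoint rule on [lo, hi] with n equal-width cells.
    """
    h = (hi - lo) / n
    total = 0.0
    for i in range(n):
        u = lo + (i + 0.5) * h  # midpoint of cell i
        total += poisson_pmf(y, math.exp(xb + sigma * u)) * normal_pdf(u)
    return total * h
```

With `sigma = 0` the function collapses to the ordinary Poisson probability, which is a convenient check on the integration. With `sigma > 0` the implied mean is `exp(xb + sigma**2 / 2)`, so unobserved heterogeneity raises both the mean and the dispersion of the count.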

Our results suggest that ordering a rounded off average amount of 2 or more protein products per day has a significant positive effect on the number of days a patient spends in the hospital when accounting for possible endogeneity. This positive effect was also found in the zero-truncated Poisson model, which does not take possible endogeneity into account. However, neglecting possible endogeneity produces a spurious positive effect of eating 0 or 1 protein products a day on the length of hospital stay and leads to estimates that are biased upward overall. Our results suggest that there is significant unobserved heterogeneity present, but there is no evidence that the rounded off average number of protein products ordered by a patient per day is endogenous with respect to hospital stay. We conclude that increasing the rounded off average amount of protein products ordered by a patient does not contribute to a decrease in the length of hospital stay, contrary to what was hypothesized in the literature.
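For reference, the zero-truncated Poisson distribution conditions a Poisson variable on being strictly positive. A minimal sketch of its textbook probability function and mean follows. The rate `lam` is a generic parameter here; tying it to covariates through `exp` of a linear index would be an assumption about the thesis's specification.

```python
import math


def zt_poisson_pmf(y: int, lam: float) -> float:
    """P(Y = y | Y > 0) for Y ~ Poisson(lam); zero for y < 1."""
    if y < 1:
        return 0.0
    log_p = -lam + y * math.log(lam) - math.lgamma(y + 1)
    # -expm1(-lam) equals 1 - exp(-lam), i.e. P(Y > 0), computed stably.
    return math.exp(log_p) / (-math.expm1(-lam))


def zt_poisson_mean(lam: float) -> float:
    """Mean of the zero-truncated Poisson: lam / (1 - exp(-lam))."""
    return lam / (-math.expm1(-lam))
```

Because zeros are excluded, the truncated mean always exceeds `lam`, which matters when coefficients from truncated and untruncated count models are compared.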

6.2 Limitations and recommendations

When investigating whether protein intake enhances recovery, one could also consider measures of recovery other than the length of hospital stay. Taking the number of days a patient spends in the hospital as a measure has certain disadvantages. Most importantly, many procedures for which patients are admitted to the hospital come with a standard number of days that a patient should stay. Increasing or decreasing protein intake will probably not affect that standard number of days.

Furthermore, increased protein intake could affect patients who are diagnosed with certain diseases more than others. Future research could therefore sample only those patients who are expected to benefit from it, for example only the patients who follow a protein rich diet. It could be that when such a sample is analyzed, significant correlation does occur between the error terms of the two equations, implying endogeneity of the rounded off average amount of protein products ordered by a patient per day. In our model we introduce only one latent factor and impose a restrictive structure on our error terms. Future research could explore the possibility of relaxing those restrictions.

Other limitations of our data are that it does not provide information on how much protein a patient actually consumed and it does not give the exact amount of protein present in
