
Predicting length of stay in adult ICU patients using electronic health record data - a landmarking approach


Academic year: 2021



Title of Thesis

Predicting length of stay in adult ICU patients using electronic health record data - a landmarking approach

Student

Frank van de Ruit

Student number: 10562206

Mentors: Dr. M.H.P. Hof, Ir. M.V. van Dijk

Tutor

Prof. Dr. A.H. Zwinderman

Location of the Scientific Research Project

Amsterdam UMC, location AMC
Department of Clinical Epidemiology, Biostatistics and Bioinformatics
Meibergdreef 9
1105 AZ Amsterdam, The Netherlands

Practice teaching period: March 2020 - December 2020


Acknowledgements

First of all, I would like to thank Michel Hof, my daily supervisor, for his help in designing the model and programming the calculations. I always felt free to ask for help, and every time I did, he helped me move forward.

Secondly, I would like to thank Koos Zwinderman, the tutor of this thesis. His input on my research project was always insightful and refreshed my motivation.

Thirdly, I would like to thank the person who got me involved in the subject of capacity management: Menno van Dijk. Without him this thesis would not have happened.

Lastly, I would like to thank all the staff members at the Medical Informatics Department who helped to make my scientific research project possible, with a special mention for the homecoming day teachers and Gita Guggenheim.


Contents

1 Introduction
1.1 Reading guide
1.2 Literature search for LOS modelling approaches
1.3 Complexity of predicting length of stay
1.3.1 The expected LOS depends on the condition of the patient
1.3.2 The importance of non-health-related data
1.3.3 The reason for admission as a predictor
1.3.4 Discharged alive or discharged deceased
1.4 The potential of electronic health records for modelling LOS
1.5 Research questions

2 Preliminaries
2.1 Modelling LOS as a time to event analysis
2.2 Landmarking
2.3 Data processing required to create landmarks

3 Methods
3.1 Model specification
3.2 Data
3.2.1 Admission data
3.2.2 Diagnoses
3.2.3 Patient chart data
3.2.4 Lab results
3.2.5 Prescriptions
3.2.6 Mechanical ventilation and other life-support systems
3.2.7 Number of ICU stays
3.3 Omitted data
3.3.1 Care provider
3.3.2 Microbiology
3.4 Overview of the final dataset
3.5 Model development: parameter selection procedure
3.6 Model development: choice of model quality statistic
3.7 Model development: fitting the model

4 Results
4.1 Baseline data characteristics
4.2 Longitudinal data characteristics
4.3 Results of model development
4.4 Results of model testing

5 Discussion
5.1 Limitations
5.2 Strengths
5.3 Relation to other works
5.4 Implications of this study
5.5 Recommendations for future researchers

6 Appendix
6.1 PubMed Query for LOS modelling studies
6.2 Longitudinal data characteristics visualisations: stratified by outcome
6.3 Longitudinal data characteristics visualisations: stratified by LOS group


Summary

Intensive care units (ICUs) are faced with limited capacity and limited personnel. Length of stay (LOS) is one of the most important determinants of capacity utilisation, so predicting the LOS of patients can be of great utility for capacity planning. However, most currently available models perform insufficiently for clinical implementation, because they were developed to predict the LOS at admission. In this thesis we developed a model for LOS using a longitudinal modelling approach, to investigate whether reliable LOS predictions are possible.

We approached modelling LOS as a time to event analysis, i.e. survival analysis, using a landmarking approach. With landmarking, the length of stay is split up into small intervals and the outcome at start of the next interval is predicted. We used the public ICU database MIMIC-3 and included adult patients with a LOS of less than 31 days. We hypothesised that both health-related and non-health-related parameters were important predictors of LOS, so both were included in the analysis. We tested several model configurations using the cross-validated concordance statistic to find the best model.

The resulting best model had a cross-validated concordance statistic of around 0.8, which increased slightly the longer a patient stayed in the ICU. Furthermore, on average the model predicted a 52.2% probability of discharge at the moment the patient was actually discharged.

Overall, our study showed that a longitudinal approach is promising for modelling LOS, but our model was not able to generate reliable predictions for individual patients, because the included variables did not sufficiently explain the variance in LOS between individual patients. We recommend that future researchers focus on finding parameters that better explain this variance.

Summary (Dutch)

Intensive care (IC) afdelingen moeten rekening houden met beperkte capaciteit en personeel. Ligduur van patiënten is een van de belangrijkste determinanten voor het capaciteitsgebruik. Het voorspellen van de ligduur kan daarom van grote waarde zijn. De meeste beschikbare modellen voor ligduur zijn echter niet betrouwbaar genoeg voor toepassing in de kliniek, omdat ze ontwikkeld zijn om de ligduur op het moment van opname op de IC te voorspellen. In deze thesis hebben wij een model voor ligduur ontwikkeld, waarbij we een longitudinale aanpak hebben gehanteerd om te onderzoeken of betrouwbare voorspelling van ligduur mogelijk is.

We hebben het modelleren van de ligduur beschouwd als een survival analyse, waarbij we een landmark analyse hebben gehanteerd. Bij een landmark analyse wordt de ligduur opgesplitst in intervallen en de uitkomst van de patiënt wordt voor het begin van elk volgend interval voorspeld. We hebben de MIMIC-3 database gebruikt en daaruit volwassen patiënten geïncludeerd die korter dan 31 dagen hebben gelegen. Daarbij was de hypothese dat zowel medische als niet-medische gegevens relevante voorspellers zouden zijn voor ligduur. We hebben vervolgens meerdere modelconfiguraties getest met behulp van de gekruisvalideerde concordance statistic om tot het beste model te komen.

Het resulterende beste model had een gekruisvalideerde concordance statistic van rond de 0.8, die zelfs iets steeg naarmate de ligduur toenam. De gemiddelde voorspelde kans op ontslag op het moment waarop de patiënt in werkelijkheid was ontslagen was 52.2%.

In conclusie laat onze studie zien dat een longitudinale aanpak veelbelovend is om ligduur te modelleren. Ons model was echter niet in staat om betrouwbare voorspellingen te doen voor individuele patiënten, omdat de geïncludeerde variabelen het verschil in ligduur tussen patiënten onvoldoende konden beschrijven. We raden toekomstige onderzoekers daarom aan om zich te richten op het vinden van voorspellers die de variatie in ligduur van individuele patiënten beter kunnen beschrijven.


1 Introduction

Intensive care units (ICUs) are faced with limited capacity and limited personnel, and population ageing will continue to increase the strain on ICUs in the coming years. Resource and capacity management is therefore becoming increasingly important, and ICUs are looking for new ways to optimise resource planning. One way to improve resource planning is to predict patient length of stay (LOS). LOS is one of the most important determinants of ICU capacity demand, so having an estimate of the remaining LOS of the currently admitted patients can be of great use for planning new admissions and staffing schedules.[1,2]

Consider, for instance, an ICU that aims to reserve 10% of its beds for life-saving emergency admissions. If at some point 80% of its beds are occupied and the ICU is confronted with a new influx of patients that would increase bed occupation to 95%, the 10% buffer cannot be upheld unless some of the currently admitted patients are discharged soon. The ICU may have to decide that it cannot accept the new patients, for fear that critically ill patients will have to be rejected later. However, declining the current influx of patients is also risky, because these patients have to be transferred to a different ICU, most likely in a different hospital, with the associated risks. In this scenario, the uncertainty regarding the LOS of patients prevents the admission of new patients while beds remain unoccupied. Now, consider the same situation but with the possibility to predict the LOS of each patient currently in the ICU. If the model predicts that 20% of the currently admitted patients are highly likely to be discharged in the next two days, the ICU can make a better risk assessment and decide whether it can briefly forego the 10% buffer and admit the new patients.
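To make the risk assessment in the scenario above concrete, the sketch below works through the arithmetic with invented numbers: a hypothetical 20-bed ICU at 80% occupancy, and per-patient discharge probabilities that are purely illustrative, not output of any real model.

```python
# Hypothetical bed-buffer calculation for the scenario described above.
total_beds = 20
buffer_beds = round(0.10 * total_beds)   # 10% of beds reserved for emergencies
current_patients = 16                    # 80% occupancy
new_admissions = 3                       # admitting these would mean 95% occupancy

# Suppose a LOS model predicts, for each current patient, the probability of
# discharge within the next two days (invented values for this sketch).
p_discharge_2d = [0.9, 0.85, 0.8, 0.1] + [0.2] * 12

expected_discharges = sum(p_discharge_2d)
expected_occupancy = current_patients + new_admissions - expected_discharges
expected_free = total_beds - expected_occupancy

print(f"Expected discharges within 2 days: {expected_discharges:.2f}")
print(f"Expected occupancy after admitting: {expected_occupancy:.2f} / {total_beds}")
print(f"Buffer respected in expectation: {expected_free >= buffer_beds}")
```

With these numbers, about five discharges are expected within two days, so the ICU could admit the new patients and still expect to keep its emergency buffer free.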

It is therefore not surprising that, for many years now, it has been common practice for physicians to estimate the LOS of patients upon admission. Several studies have shown, however, that LOS predictions based on clinical experience alone are not reliable.[3,4] Consequently, the interest in modelling LOS has increased significantly in the last decade, as shown by the steep increase in the number of articles published about LOS in the last 10 years (see Fig. 1).

Figure 1: The number of articles published on PubMed about length of stay since 1990.[5] Note the significant increase around 2010.

The quality of the available models, however, is often insufficient for clinical implementation. In 2014, Verburg et al. assessed the possibility of predicting LOS in the ICU at admission using eight regression techniques and found an average error of three days between the predicted and the true LOS, while the average true LOS was only two days. Based on these results they concluded that none of the tested regression techniques was able to provide useful predictions at admission.[6] In 2017, Verburg et al. systematically reviewed the literature to assess the quality of available LOS models. They stated that a model was suitable for planning if it could explain at least 36% of the variance (R2) in LOS. They found 11 studies with 31 models; the best model had an R2 of 28%, which implied that all models performed poorly.[7]
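The variance-explained criterion used by Verburg et al. can be computed directly from observed and predicted LOS. The sketch below illustrates the R2 calculation only; all numbers are invented and are not results from any of the cited studies.

```python
# Toy illustration (invented numbers) of R^2: the fraction of LOS variance
# explained by a model's predictions.
true_los = [1.0, 2.0, 3.0, 5.0, 8.0, 2.0, 4.0]   # observed LOS in days (hypothetical)
pred_los = [1.5, 2.5, 2.5, 4.0, 6.5, 2.0, 4.5]   # predicted LOS (hypothetical)

mean_los = sum(true_los) / len(true_los)
ss_tot = sum((y - mean_los) ** 2 for y in true_los)              # total sum of squares
ss_res = sum((y - p) ** 2 for y, p in zip(true_los, pred_los))   # residual sum of squares
r2 = 1 - ss_res / ss_tot

# Verburg et al. consider a model suitable for planning only if R^2 >= 0.36.
print(f"R^2 = {r2:.3f}")
```

The best model found in the cited review reached an R2 of only 0.28 on real data, well below this 0.36 threshold.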

Predicting LOS can be of great use for ICUs to optimise capacity planning, but predictions based on clinical experience alone are unreliable, and most available prediction models are not reliable enough either. In this master thesis we aim to investigate what knowledge is available about predicting LOS and why available prediction models have insufficient predictive performance. We will then create our own model to find out whether we can improve upon the available models.

1.1 Reading guide

In sections 1.2, 1.3 and 1.4 we will respectively cover the literature on LOS prediction models, the complexities of predicting LOS and the potential of the electronic health record for modelling LOS. This is followed by the preliminaries, where we give background information about the modelling approach we used and the associated data processing steps that were required. In the methods section we will specify our model in more detail, describe what data was used as model input and explain how we developed and tested the model. In the results section we will begin with an overview of the data characteristics, followed by the results of model development and model testing. In the discussion section we will answer the research questions and discuss the limitations and strengths of our approach. Finally, we conclude with the implications of our results and make recommendations for future researchers.

1.2 Literature search for LOS modelling approaches

We searched PubMed for studies on predicting LOS in adult patients in the ICU to find modelling approaches that yielded good predictive performance and to find out what specific data has predictive value for LOS and may therefore be important to include in our model. The query, results and extraction process can be found in the appendix.

The search resulted in 103 studies that developed a model to predict LOS. Of these studies, 41 modelled the risk of prolonged stay. We will not cover these studies, because predicting prolonged stay is a different problem from the one we address in this thesis. The remaining 62 studies modelled continuous LOS. The majority (46 out of 62) modelled LOS for a specific condition, such as after liver transplantation. These models are poorly generalisable to the whole ICU, however, because they contain parameters that are specifically relevant for that condition. For example, in a model for LOS after liver transplantation, liver function tests will be an important predictor for recovery and will therefore likely be the main feature of the model. For the whole ICU, however, liver function tests will not be good predictors, because not all patients suffer from liver disease and will therefore not always have abnormal liver function tests. For this reason we did not look further into the studies that modelled LOS for only one condition.

For the remaining studies, Tables 1 and 2 show the extracted information. Seven studies modelled a specific subpopulation in the ICU, e.g. surgical patients. These studies had a more heterogeneous population than the studies with only one condition, so their modelling approaches may also be interesting for the whole ICU. However, all seven studies only used data known or collected at admission to predict LOS. This means that none of the predictions took into account longitudinal data such as repeated measurements (e.g. blood pressure) or the occurrence of complications (e.g. infections). Five of the seven studies used a form of multivariate regression (e.g. Poisson regression) and found a statistically significant relation between the included predictors and LOS. The sixth study, by Gilani et al., compared the correlation of LOS with the Simplified Acute Physiology Score II (SAPS) and the Acute Physiology and Chronic Health Evaluation versions II and III (APACHE).[8] These are scoring systems that predict the mortality of patients at admission and can therefore be viewed as a measure of the severity of illness of patients. The study found that all three scoring systems had a statistically significant, positive correlation with LOS, implying that a higher score (i.e. a higher risk of dying) is associated with a longer stay. However, the authors did not give a recommendation about using these scoring systems for modelling LOS. The final study, by Gholipour et al., used an artificial neural network to model LOS in trauma patients with the inclusion of longitudinal data.[9] The resulting predicted mean LOS did not statistically differ from the true mean, indicating a promising model. Unfortunately, none of these studies validated their results in an external dataset, so the generalisability of their results remains untested. In addition, the only longitudinal model was made using a neural network and did not report clearly what data was used as model input, so this study provided no insight into what specific data is useful for predicting LOS.

| Study | Population | Data | Input | Longitudinal |
|---|---|---|---|---|
| Siddiqui (2017) [14] | Sepsis | Observational | SOFA, SIRS, EWS | No |
| Gholipour (2015) [9] | Trauma | Not specified | ISS, vital signs | No |
| Zivkovic (2019) [15] | Trauma | Observational | Serum cholinesterase | No |
| Gilani (2014) [8] | Surgical | Observational | APACHE II & III, SAPS | No |
| Lee (2012) [16] | Surgical | Observational | Manual muscle testing, albumin | No |
| Kasotakis (2012) [17] | Surgical | Observational | SOMS, hypernatremia, hypotension | No |
| Alizadeh (2015) [18] | Surgical | Observational | Vitamin D | No |
| Rudolf (2016) [19] | Total ICU | Observational | Cardiac markers | No |
| Waite (2013) [20] | Total ICU | Registry | APACHE, admission type, several markers | No |
| Piva (2015) [21] | Total ICU | Observational | SOMS, SAPS-2, comorbidity | No |
| Ghorbani (2017) [22] | Total ICU | Observational | APACHE | No |
| Woods (2000) [2] | Total ICU | Observational | APACHE | No |
| Perez (2006) [11] | Total ICU | Observational | Admission data, surgical status | Yes |
| Houthooft (2015) [12] | Total ICU | Not specified | SOFA | Yes |
| Marik (2000) [10] | Total ICU | Observational | APACHE | No |

Table 1: Part 1 of the results of the PubMed literature search for studies about length of stay (continued in Table 2). SOFA: Sequential Organ Failure Score; SIRS: Systemic Inflammatory Response Syndrome; EWS: Early Warning System; ISS: Injury Severity Score; APACHE: Acute Physiology and Chronic Health Evaluation; SAPS: Simplified Acute Physiology Score; SOMS: Surgical Intensive Care Optimal Mobilisation Score.

Eight studies modelled LOS for the whole ICU. Three of these tested a version of the APACHE scoring system. Unlike Gilani et al., however, these studies found that APACHE had poor predictive value for individual patients. Marik et al. did find that APACHE predicts LOS well on a population level, but concluded that it should not be used to predict LOS for individual patients.[10] The remaining five studies used a modelling approach that was not based on scoring systems. Three of these five used a form of multivariate regression and reported statistically significant effects between the included predictors and LOS, but only Piva et al. reported a prediction accuracy statistic, with an R2 of 0.105. The last two studies modelled LOS for the whole ICU with the inclusion of longitudinal data, so these were potentially of interest for our study. Perez et al. (2006) modelled LOS as the daily probability that a patient will be discharged from the ICU, with the help of a Markov chain.[11] A Markov chain is an analysis technique in which a set of states is defined (e.g. patient is in the ICU, patient is in the ward, patient is discharged) and the next state is predicted based on the characteristics of the current state. They reported a mean predicted LOS that did not statistically differ from the true mean LOS, so their approach appeared promising. Unfortunately, the authors did not specify what parameters were included in the model. Houthooft et al. (2015) used a local ICU database from which they extracted the sequential organ failure score (SOFA).[12] SOFA is a score that predicts the outcome of critically ill patients.[13] They then modelled LOS with a regression-type machine learning algorithm; the resulting best model had a mean error of 1.8 days and a median error of 1.2 days. The authors did not report any confidence statistic for these predictions, but they did argue that the model was of sufficient quality for clinical implementation. The studies by Perez et al. and Houthooft et al. showed promising results, but neither externally validated their model.
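To illustrate the Markov-chain idea behind the approach of Perez et al., the sketch below simulates ICU stays with a constant daily probability of leaving the ICU. The states and transition probabilities are invented for illustration; as noted above, the original paper does not specify its parameters.

```python
import random

# Minimal Markov-chain sketch of an ICU stay. Each day the patient either
# stays in the ICU, moves to the ward, or dies. All probabilities are invented.
P = {
    "icu":      {"icu": 0.75, "ward": 0.20, "deceased": 0.05},
    "ward":     {"ward": 1.0},      # absorbing state in this sketch
    "deceased": {"deceased": 1.0},  # absorbing state
}

def simulate_los(rng, max_days=31):
    """Simulate the number of days spent in the 'icu' state (capped at max_days)."""
    state, days = "icu", 0
    while state == "icu" and days < max_days:
        days += 1
        r, cum = rng.random(), 0.0
        for nxt, p in P[state].items():
            cum += p
            if r < cum:
                state = nxt
                break
    return days

rng = random.Random(42)
mean_los = sum(simulate_los(rng) for _ in range(20000)) / 20000
# With a constant 0.25 daily probability of leaving the ICU, the expected LOS
# is about 1 / 0.25 = 4 days (marginally less here due to the 31-day cap).
print(f"Simulated mean LOS: {mean_los:.2f} days")
```

In a model of the kind Perez et al. describe, the transition probabilities would not be constants but would depend on the characteristics of the current state.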

In summary, the studies by Gholipour et al., Perez et al. and Houthooft et al. reported the most promising results. They used, respectively, an artificial neural network, a Markov chain approach and a collection of machine learning techniques. There was little consensus among the studies in our literature search on what data has predictive value for LOS, except that 9 out of 17 models used some scoring system for severity of illness as input, with APACHE, SAPS and SOFA being the most prominent. Unfortunately, the results of this literature search gave little guidance as to the best approach to model LOS.


| Study | Approach | Reported result | External validation? |
|---|---|---|---|
| Siddiqui (2017) [14] | Poisson regression | Inconclusive | No |
| Gholipour (2015) [9] | Artificial neural network | Mean predicted LOS = mean population LOS | No |
| Zivkovic (2019) [15] | Linear regression | Negative correlation with LOS | No |
| Gilani (2014) [8] | APACHE II & III, SAPS | Positive correlation with LOS | N.a. |
| Lee (2012) [16] | Poisson regression | Positive correlation with LOS | No |
| Kasotakis (2012) [17] | Poisson regression | Significant correlation with LOS | No |
| Alizadeh (2015) [18] | ANCOVA | Significant correlation with LOS | No |
| Rudolf (2016) [19] | Linear regression | Significant correlation with LOS | No |
| Waite (2013) [20] | Linear regression | Hypernatremia independent predictor for LOS | No |
| Piva (2015) [21] | Poisson regression | SOMS inversely correlated with LOS (RR = 0.88) | No |
| Ghorbani (2017) [22] | APACHE | APACHE is a poor predictor for LOS | N.a. |
| Woods (2000) [2] | APACHE | APACHE consistently underpredicts LOS | N.a. |
| Perez (2006) [11] | Markov chain | Mean predicted LOS = mean population LOS | No |
| Houthooft (2015) [12] | Machine learning | Mean prediction error for LOS is 1.8 days | No |
| Marik (2000) [10] | APACHE | APACHE poorly predicts individual LOS | N.a. |

Table 2: Part 2 of the results of the PubMed literature search for studies about length of stay. N.a.: not applicable; LOS: length of stay; APACHE: Acute Physiology and Chronic Health Evaluation; SAPS: Simplified Acute Physiology Score; ANCOVA: analysis of covariance; SOMS: Surgical Intensive Care Optimal Mobilisation Score.


1.3 Complexity of predicting length of stay

As the literature study did not give direction to our modelling approach, we will now present four reasons that complicate the modelling of LOS and explain why addressing these issues may lead to better predictions.

1.3.1 The expected LOS depends on the condition of the patient

LOS is difficult to predict at the time of ICU admission because the expected LOS can change when the condition of the patient changes. With the expected LOS we mean the time it takes for the patient to recover sufficiently to be discharged from the ICU, given the current condition of the patient. Take the following scenario for example: a patient is steadily recovering, his vital signs are improving and if everything stays steady, the patient will likely recover completely in two days. However, the next day the patient develops pneumonia, his health deteriorates and the patient has to stay in the ICU longer. But the expected LOS may change again when a therapy is administered (e.g. antibiotics). If the therapy is effective the condition of the patient may improve but if the therapy is not effective the patient will likely have a prolonged stay or die in the ICU. Because all these events cannot be captured at admission, we hypothesise that it is important to model LOS as a longitudinal process and to use longitudinal data to describe the changes in the condition of the patient, thus using the most up-to-date information to make updated predictions.

Longitudinal data consist of events such as newly developed infections, repeated measurements such as lab results and bedside measurements, and conditions such as mechanical ventilation. The value of a longitudinal approach is reinforced by the studies performed by Verburg et al. (2014) and Cai et al. (2016). Verburg used only the data collected at admission (e.g. initial lab results, vital signs, demographics and main health problem) of over 30,000 unplanned admissions and failed to produce a reliable model for LOS.[6] Cai, on the other hand, used a longitudinal approach to model hospital LOS with longitudinal data and obtained an average daily accuracy of 80%, showing that reasonable model performance is possible when longitudinal data are used.[23]

1.3.2 The importance of non-health-related data

A study by Woods et al. in 22 Scottish ICUs found that the variation in severity of illness alone could not explain the difference in LOS between individual patients.[2] LOS depends on health-related factors, but also on non-health-related factors.[24,25] Health-related factors give an indication of the health of a patient, e.g. age, the medical history of a patient, and measurements like blood pressure and body temperature. In an ICU setting, many health-related data are measured because they show whether a patient is recovering or whether additional interventions are necessary. Non-health-related factors, on the other hand, describe the organisational and practical context that may affect the LOS of a patient. Examples of such factors are the time of day (discharge is less likely at night), family member involvement (especially in critically ill patients), the purpose of the ICU admission (explained in section 1.3.3), the availability of a bed in the nursing department, and hospital or national policies regarding termination of life support. These factors are rarely explicitly recorded during the care process, but they do affect the LOS of individual patients. Nevertheless, models for LOS are often based on health-related data alone. This is understandable, because health-related data are more readily available and more intuitive to use, but such models will not be able to account for a relevant portion of the variance in LOS between individual patients.

How much of the overall variance of LOS can be explained by non-health-related factors has not been comprehensively investigated, but it is a given that they affect the LOS of individual patients. It is therefore important to investigate their role in the modelling of LOS.

1.3.3 The reason for admission as a predictor

The reason a patient is admitted to the ICU is an important predictor for LOS because it may explain some of the variation in LOS that is not explained by severity of disease. Take planned versus unplanned admissions. Planned ICU admissions often take place in the context of high-complexity surgery. The vital signs of these patients may be poor right after surgery and have to be monitored closely for a short time, but if no complications occur they are discharged to the ward within a reasonably predictable number of days. The distinction between a planned or unplanned admission is therefore an important predictor for LOS. However, unplanned admissions cover a much larger variety of conditions than planned admissions and they vary more in severity of illness than planned admissions, so unplanned admissions may still be difficult to predict accurately. The combination of variables that comprises a meaningful classification of patient LOS is poorly understood. Scoring systems like APACHE, SAPS and SOFA try to classify patients based on health-related data, but studies that tested these systems for predicting LOS showed equivocal results (see section 1.2).


To model LOS better, it is important to gain more insight into which non-health-related variables can help describe the reason a patient is admitted to the ICU and thus cluster patients more informatively.

1.3.4 Discharged alive or discharged deceased

A patient can be discharged in two ways: the patient recovers and is discharged alive, or the patient does not recover and dies, i.e. is discharged deceased. These outcomes are mutually exclusive: if a patient dies, he cannot be discharged alive, and vice versa. When modelling LOS this is an important distinction, because the effects of the model parameters may differ significantly, and may even be opposite, for the two outcomes. For example, consider two patients who are identical at admission and have the same outcome at discharge, except that one is an elderly patient and one is a young patient. If both survive, the younger patient will on average recover faster than the older patient and will therefore have a shorter LOS. In contrast, if both die, the younger patient will on average stay alive longer, because he is more vital and the physicians may try more interventions before ceasing care than with an older patient. So, for the same predictor (age), the effect on LOS is opposite depending on the outcome: if the patient survives, a higher age increases LOS, but if the patient dies, a higher age decreases LOS. We therefore hypothesise that modelling LOS as two separate, parallel trajectories, i.e. for discharged alive and discharged deceased, will improve model performance.
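The flipping direction of the age effect can be made concrete by stratifying mean LOS by outcome. The records below are entirely invented for illustration; they merely encode the pattern described in the text.

```python
# Hypothetical illustration of why 'discharged alive' and 'discharged deceased'
# may need separate models. Invented records: (age_group, outcome, LOS in days).
patients = [
    ("young", "alive", 3), ("young", "alive", 4),
    ("old", "alive", 6), ("old", "alive", 8),
    ("young", "deceased", 9), ("old", "deceased", 4),
]

def mean_los(records, age_group, outcome):
    """Mean LOS among patients in `age_group` with the given discharge outcome."""
    los = [l for g, o, l in records if g == age_group and o == outcome]
    return sum(los) / len(los)

# Among survivors the young group leaves sooner; among the deceased the young
# patient stays longer -- the age effect flips direction with the outcome.
print("Alive:    young", mean_los(patients, "young", "alive"),
      "vs old", mean_los(patients, "old", "alive"))
print("Deceased: young", mean_los(patients, "young", "deceased"),
      "vs old", mean_los(patients, "old", "deceased"))
```

A single model pooling both outcomes would average these opposing effects away, which is the motivation for the two parallel trajectories proposed above.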

1.4 The potential of electronic health records for modelling LOS

In this section we will explain why the electronic health record may be a suitable data source to address the complexities described in section 1.3 and why it may therefore be used to model LOS effectively.

The increase in the use of electronic health records (EHRs) has rapidly improved the availability and quality of routinely collected data, including non-health-related data. Many healthcare institutions are making these data available for reuse (e.g. for research and business intelligence). In fact, the Massachusetts Institute of Technology maintains a public EHR ICU database for research purposes.[26] Consequently, EHRs are used extensively for research, including the development of prediction models.[27] Despite these opportunities, not a single study in our literature search used EHR data, even though the EHR contains all of the longitudinal data of a patient's stay in the ICU. EHRs also contain important non-health-related data, such as the time of day, whether an admission was planned and patient preferences regarding end-of-life decisions. The EHR is therefore a source that will likely contain the data necessary to model LOS effectively.

EHR data also has limitations. Most importantly, the data collection process is less strict than in controlled data sources (e.g. prospective cohort data). Stored data may be lost, important data may not be recorded, data may be recorded erroneously (because of human error) and data collection policies and habits may change over the years. Therefore, to investigate the usability of EHR data we searched the literature for reviews about the use of EHR data for research. We extracted the following information:

• EHRs are a relatively cheap source of rich data that may be valuable for the development of prediction models, especially with modelling techniques that can utilise highly dimensional datasets.[27]

• Though the results of published models vary strongly based on the modelling technique used, models based on EHR data can achieve high predictive accuracy. In fact, some of these models have been deemed accurate enough to be applied to clinical practice. One example is the Rothman Index which predicts mortality and readmission in the surgical ICU.[28]

• Many authors do not account for biases typically encountered in EHRs (e.g. missing data and loss to follow-up).[27,29,30] One important bias that is often overlooked is outcome misclassification. Outcome misclassification occurs when the true patient outcome (e.g. 1-year mortality) is not recorded and instead the patient is excluded from the study or a possibly erroneous outcome is imputed from the data. The true long-term outcome is often missing in EHR data, because reliable follow-up ends at hospital discharge. Without addressing outcome misclassification, predictions can become severely biased.[31,32]

• Individual risk predictions based on EHR data are often unreliable because of a heterogeneous population and inconsistent data collection (e.g. errors resulting from manual input by personnel). However, models based on EHR data appear to perform well on a population level.[33]

• Models developed with EHR data have limited generalisability to EHRs of other institutions, because data collecting, storing and processing vary between healthcare institutions. External validation of models based on EHR data is therefore crucial before they can be applied to clinical practice.[29]


• Researchers often do not include in their prediction models the longitudinal data available in EHRs, so the potential of EHRs for prediction modelling can be explored further.[29]

Based on these findings, the fact that EHRs are likely to contain important information for modelling LOS, and the finding of our literature search in section 1.2 that EHR data has not been used extensively for modelling LOS, we decided that it was appropriate and justified to use EHR data to model LOS.

1.5 Research questions

For this thesis, we formulated the following research questions:

• Is it possible to predict adult LOS reliably based on EHR data?

• Does a longitudinal modelling approach that incorporates longitudinal data explain the variance in LOS between individual patients better than models that predict LOS at admission with admission data alone?

In this thesis we will therefore develop a model for the LOS of adult patients in the ICU using EHR data. We propose to use a longitudinal approach so the longitudinal data from the EHR can be incorporated in the model, allowing for updated predictions when new information becomes available as the ICU stay progresses.

2 Preliminaries

In the following two preliminary sections we will provide background information on the longitudinal modelling approach we used to develop the model and we will give additional information on the data structure required to do so.

2.1 Modelling LOS as a time to event analysis

Modelling LOS is essentially a time to event analysis, also called survival analysis. In a survival analysis the incidence of an outcome is related to the total time all subjects were at risk of experiencing that outcome. As an example, Figure 2 shows the LOS of five patients in the ICU: patient 1 stayed for 7 hours, patient 2 for 4 hours, and so on. Note that in our study all patients experience the terminal event (i.e. discharge). The incidence of patients being discharged can be calculated by dividing the number of patients by the total time all patients were at risk of being discharged: 5 / (7 + 4 + 3 + 9 + 5) = 0.179. This gives us an average of 0.179 discharges per hour spent in the ICU. The incidence may vary over time, in which case it is calculated in a similar way within specific time intervals. Next, we can add patient characteristics, e.g. gender, and calculate the incidence of discharge per gender: 2 / (7 + 3) = 0.200 for female patients and 3 / (4 + 9 + 5) = 0.167 for male patients. This way it is fairly easy to calculate differences in LOS for a given determinant. Similarly, the effects of multiple determinants of LOS can be estimated fairly easily using a technique like Cox regression to create a model that can predict the LOS.[34]
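The incidence calculations above can be reproduced in a few lines. This is an illustrative sketch in Python (the analyses in this thesis were performed in R); the data and names are taken from the example, not from MIMIC.

```python
# Survival data for the five example patients of Figure 2: (LOS in hours, gender).
stays = [(7, "F"), (4, "M"), (3, "F"), (9, "M"), (5, "M")]

def discharge_incidence(stays):
    """Discharges per person-hour: number of events divided by the total
    time all patients were at risk of being discharged."""
    return len(stays) / sum(los for los, _ in stays)

overall = discharge_incidence(stays)                             # 5 / 28
female = discharge_incidence([s for s in stays if s[1] == "F"])  # 2 / 10
male = discharge_incidence([s for s in stays if s[1] == "M"])    # 3 / 18
print(round(overall, 3), round(female, 3), round(male, 3))  # 0.179 0.2 0.167
```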

For our study, however, there are two mutually exclusive outcomes: discharged alive and discharged deceased (see Fig. 2; two patients were discharged deceased). As we explained in section 1.3.4, patient characteristics may have opposite effects on LOS for discharged alive and discharged deceased. To account for this, LOS has to be modelled as two parallel processes: the patient is discharged alive, given that he does not die in the ICU, and the patient is discharged deceased, given that he is not discharged alive. This ensures that predictors that increase the risk of being discharged alive also reduce the risk of being discharged deceased; one could say that the two outcomes compete with each other, which is why a model with two such outcomes is often referred to as a competing risks model. In section 3.1 we will expand on the mathematical specification of our model and show how it handles competing risks.

In addition to competing risks, we want our model to use longitudinal data collected during the ICU stay so we can generate updated predictions. We will discuss three methods that can be used to achieve this: Cox regression, joint modelling and landmarking. The most important argument not to use Cox regression for this study is that we want to make predictions for individual patients and at different timepoints. To do this we need to estimate the baseline hazard. Cox regression, however, estimates the baseline hazard non-parametrically as a function over time, which closely follows the data and is therefore easily affected by changes in it. For the purpose of prediction, a less flexible (e.g. parametric) baseline hazard is more robust. Joint modelling is a technique that combines survival analysis and the analysis of repeated measurements (e.g.



Figure 2: Overview of the survival data of five patients, 2 female and 3 male. All patients are discharged.

with mixed-effects models). The major limitations of joint modelling are that its computational requirements increase rapidly with every additional repeated measure added and that it is not straightforward with joint models to include both continuous and categorical variables.[35] Because we had little prior knowledge about what variables yielded high predictive value, we expected to have to test many different model configurations, which would have been impractical with very long computation times.

2.2 Landmarking

Landmarking is an alternative approach to a time to event analysis in which the follow-up time is split into a set of intervals; the start of each interval is a landmark. Instead of predicting the outcome at the end of the ICU stay, a landmarking model predicts the outcome at the next landmark (see Fig. 3). At each landmark, new information can be used for the prediction, thus allowing for updated predictions at every landmark. This approach also has limitations. Firstly, the basic landmark model does not remember patient characteristics from previous landmarks; at every landmark the model treats the patient as if newly admitted. This can be mitigated by explicitly modelling time-dependent effects, e.g. the effect of age per hour spent in the ICU. Secondly, the conversion of the data to a landmarking format requires a significant expansion of the data size, and imputation of missing values is necessary because not all longitudinal characteristics are available at every landmark. We explain the data conversion and imputation processes in more detail in the next section (2.3). The granularity of the predictions also depends on the frequency at which new data become available, but in an ICU most measurements are performed multiple times a day, so this is less relevant for our case.

Despite these limitations, we chose landmarking over joint modelling because it is a flexible technique that can easily handle both continuous and categorical variables, it can be used with competing risks and the computational requirements remain workable despite the large amount of high-dimensional data in EHRs.

2.3 Data processing required to create landmarks

Say we have the LOS of $n$ patients and we partition the LOS of each patient into $m_i$ landmarks, where $m_i$ is equal to the number of hours patient $i$ stayed in the ICU. The outcome of patient $i = 1, \dots, n$ at landmark $j$ is given by $Y_i(j)$, where the outcome can take the following three values:

$$Y_i(j) = \begin{cases} 0 & \text{Not discharged,} \\ 1 & \text{Discharged alive,} \\ 2 & \text{Discharged deceased.} \end{cases} \quad (1)$$



Figure 3: Overview of the landmarking predictions of five patients. At every landmark (vertical dashed line), the model predicts the probability the patient will be discharged at the next landmark. For example, seven predictions are made for patient 1.

In addition to the outcome, every landmark contains the patient characteristics known at that point in time; the values of these parameters (e.g. temperature and blood pressure) can change at every landmark. As an example, Figure 4 shows how the blood pressure measurements of one patient are converted to a landmarking format. With this conversion we discretised time to hourly intervals and imposed the events of interest (discharged alive or discharged deceased) to occur at the end of each interval (which is equal to the start of the next interval).

Original data:

    LOS (hrs)  SBP    Y
     1         130    0
     2         145    0
     5         110    0
    12         112    1

After creating hourly landmarks (". . ." = missing):

    LOS (hrs)  SBP    Y
     1         130    0
     2         145    0
     3         . . .  0
     4         . . .  0
     5         110    0
     6         . . .  0
     7         . . .  0
     8         . . .  0
     9         . . .  0
    10         . . .  0
    11         . . .  0
    12         112    1

After imputation with the last known value:

    LOS (hrs)  SBP    Y
     1         130    0
     2         145    0
     3         145    0
     4         145    0
     5         110    0
     6         110    0
     7         110    0
     8         110    0
     9         110    0
    10         110    0
    11         110    0
    12         112    1

Figure 4: The data transformation to landmarks using the systolic blood pressure (SBP) of one patient as an example. The first table shows the original data with the length of stay (LOS) in hours, SBP and the outcome as Y. In the second table a landmark is made after every hour spent in the ICU. The '. . .' represents missing values, which are generated because SBP was not measured every hour. The third table shows that the missing values are imputed with the last known value. Note that for this patient measurements were performed after 1, 2, 5 and 12 hours of stay and the total LOS was 12 hours.
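The transformation of Figure 4 can be sketched as follows. This is an illustrative Python sketch (the thesis performed the processing in R); the function and field names are our own.

```python
def to_landmarks(measurements, los, discharge_outcome=1):
    """Expand sparse measurements {hour: value} into hourly landmarks
    1..los, carrying the last observed value forward, as in Figure 4.
    (In the thesis, values missing at admission were imputed with the
    population average; in this sketch they would simply stay None.)"""
    rows, last = [], None
    for hour in range(1, los + 1):
        if hour in measurements:
            last = measurements[hour]
        # The event is imposed to occur at the end of the last interval.
        outcome = discharge_outcome if hour == los else 0
        rows.append({"los": hour, "sbp": last, "outcome": outcome})
    return rows

# The example patient: SBP measured after 1, 2, 5 and 12 hours; LOS 12 hours.
landmarks = to_landmarks({1: 130, 2: 145, 5: 110, 12: 112}, los=12)
print(landmarks[3])  # hour 4 carries the hour-2 value (145) forward
```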


3 Methods

We modelled adult LOS with a landmark analysis and a competing risk for mortality. We will now explain the design of our model in more detail, followed by the data we used for model fitting and how we optimised the model.

3.1 Model specification

We set the length of each interval at one hour, i.e. the time between each landmark is one hour and a prediction is made every hour for each patient. The probabilities to observe no discharge, discharged alive or discharged deceased at landmark $j$ are given by:

$$P(Y_i(j) = y \mid x_i(j)) = \prod_{l=0}^{2} \left( \frac{\exp\{x_i(j)'\beta_l\}}{\sum_{k=0}^{2} \exp\{x_i(j)'\beta_k\}} \right)^{\mathbb{1}(y=l)} \quad (2)$$

where $x_i(j)$ are the data values of patient $i$ at landmark $j$, $\beta_l$ are the model parameters for outcome $l$ and $\mathbb{1}()$ is the indicator function given by:

$$\mathbb{1}(x) = \begin{cases} 1 & \text{if } x \text{ is true,} \\ 0 & \text{otherwise.} \end{cases}$$

We fix the parameters of the not discharged outcome, i.e. $\beta_0 = (0, 0, \dots, 0)'$, such that the model is identifiable. To estimate the model parameters, it is necessary to define a likelihood function. The likelihood function represents the probability to observe the data with a given set of parameter values. To specify the likelihood function, we assume independence between all $i = 1, \dots, n$ ICU stays. This leads to the following log-likelihood function:

$$\log[L(\beta)] = \sum_{i=1}^{n} \sum_{j=1}^{m_i} \sum_{k=0}^{2} \log[P(Y_i(j) = k \mid x_i(j); \beta)] \, \mathbb{1}(Y_i(j) = k) \quad (3)$$

where $n$ is the total number of patients, $m_i$ the total number of landmarks for patient $i$ and $k$ the outcome under consideration. $Y_i(j)$ is the observed outcome for patient $i$ at landmark $j$. The maximum likelihood estimator then follows as:

$$\hat{\beta} = \operatorname*{argmax}_{\beta} \, \log[L(\beta)] \quad (4)$$
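Equations (2) and (3) can be sketched in a few lines of code. This is an illustrative Python sketch (the thesis fitted the actual model in R by minimising the negative log-likelihood with nlminb); the toy data and names are our own, and in practice the maximisation of (4) would be delegated to a general-purpose optimiser.

```python
import math

def probs(x, betas):
    """Multinomial-logit probabilities of equation (2) for one landmark.
    x: covariate vector; betas: one coefficient vector per outcome
    (0 = not discharged, 1 = alive, 2 = deceased), with betas[0] fixed
    at zero for identifiability."""
    scores = [math.exp(sum(xi * bi for xi, bi in zip(x, b))) for b in betas]
    total = sum(scores)
    return [s / total for s in scores]

def log_likelihood(data, betas):
    """Equation (3): the sum of the log-probabilities of the observed
    outcomes over all patient-landmark records (x, y)."""
    return sum(math.log(probs(x, betas)[y]) for x, y in data)

# Toy data: two covariates, "not discharged" parameters fixed at zero.
betas = [[0.0, 0.0], [0.5, -0.2], [-1.0, 0.1]]
data = [([1.0, 2.0], 0), ([1.0, 3.0], 1), ([1.0, 0.5], 2)]
print(log_likelihood(data, betas))  # a negative number; closer to 0 is better
```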

3.2 Data

We used the MIMIC-3 database developed by the Massachusetts Institute of Technology.[26] MIMIC contains EHR data of ICU stays in the Beth Israel Deaconess Medical Center from 2001 to 2012. We chose MIMIC-3 because it is easy to access, contains the ICU stays of over 40,000 ICU patients, has been used extensively for modelling medical record data and has comprehensive documentation on its contents and applications. All ICU stays with a known length of stay were included. Patients younger than 18 years were excluded, as were stays that lasted longer than 30 days. Patients who stay longer than 30 days are often already medically recovered, but they remain in the ICU because of non-health-related complications. All ICU stays were treated as independent, even if multiple ICU stays concerned the same patient (e.g. readmissions).

The MIMIC data contains the majority of the data collected during the ICU stays, including structured data and unstructured data (i.e. free text fields). We focused on the structured data because processing unstructured data (e.g. natural language processing) was beyond the scope of this study. The structured data consists of patient information (e.g. age, gender), care related measurements (e.g. lab results, body temperature, blood pressure), medication prescriptions, procedures (e.g. mechanical ventilation), non-health-related parameters (e.g. planned versus unplanned admission, transfers, healthcare provider details) and billing data (e.g. ICD-9 coded diagnoses). However, much of this data is not directly usable for model development, but has to be aggregated first into concise and meaningful parameters. Next, we will cover the variables present in MIMIC in more detail, explain which variables were feasible for inclusion in the model and explain any data processing that was required to derive the variables.

3.2.1 Admission data

ICU admission data includes age, gender and admission type (i.e. whether the admission was planned or urgent). Age may improve the model because the risk of mortality increases with age and because older patients may have a slower rate of recovery. The admission type may improve the model because planned ICU patients are generally less ill than emergency patients, or are admitted only for routine monitoring after complex interventions (e.g. high-risk surgery). Unless complications occur, these stays are generally shorter than emergency admissions. Age and admission type were included in the model because they may have a relevant effect on LOS. The effect of gender is less obvious, but it was nonetheless included for the sake of completeness.

3.2.2 Diagnoses

Including diagnoses may improve the model because they determine for a large part the severity of disease and the reason for admission (e.g. mechanical ventilation due to pneumonia or close monitoring after major blood loss). Diagnoses are recorded in MIMIC with the International Statistical Classification of Diseases (ICD).[36] For research, ICD codes can be used to stratify patients because ICD contains hierarchical disease clusters. For this study, however, we chose not to stratify by ICD clusters because MIMIC contains the outdated ICD-9 codes and translating them to ICD-10 is labour intensive. Furthermore, ICD-10 is more detailed and contains new categories, so a translation would be ambiguous and result in loss of information. Instead, we stratified by the Charlson Comorbidity Index, which was developed to predict 10-year mortality based on comorbidities and is compatible with ICD-9.[37] The Charlson Comorbidity Index was calculated using the R comorbidity package.[38]

Including diagnoses as ICD codes was not feasible because they contain too many categories and clustering was not feasible either. As an alternative, the Charlson Comorbidity Index was included because it gives an indication of the general health status of a patient and may stratify for severity of disease and the risk of prolonged stay due to complications.

3.2.3 Patient chart data

The patient chart contains an overview of the most important measurements per patient, including bedside measurements such as temperature, blood pressure and so on. Including bedside measurements may improve predictions because they provide longitudinal information on the health status of a patient. Furthermore, bedside measurements are often performed more frequently if the status of a patient is poor or deteriorating, thus potentially providing information about complications. The chart in MIMIC contains over 6000 measurement types. Most of these measurement types, however, contain only a few records and are therefore not very useful for inclusion in the model. To extract the useful measurements, we used the following heuristics:

• Is the measurement performed on a more or less daily basis and for most of the patients?

• Is the measurement significantly altered by interventions in the ICU? If so, do not include it, because the measurement would then depend on other factors which we cannot realistically account for in the model.

The first rule resulted in heart rate, oxygen saturation, respiratory rate, systolic and diastolic blood pressure, central venous pressure, Glasgow Coma Scale (GCS) score and body temperature. The second rule eliminated oxygen saturation because patients receive supplementary oxygen, blood pressure because patients receive fluids to maintain blood pressure, and respiratory rate because patients are mechanically ventilated. That leaves heart rate, central venous pressure, GCS and temperature as potential variables for inclusion in the model. Temperature may improve the model because it is a sign of ongoing inflammation and may indicate a complication, making it less likely that the patient will be discharged soon. GCS may improve the model because it represents the patient's consciousness and a patient is not likely to be discharged unless consciousness has at least improved. The relations of heart rate and central venous pressure to LOS are less obvious, except for extreme cases, and we therefore did not think these would add much to the model.


Body temperature and GCS were included in the model because they had a theoretically plausible effect on LOS. Although we excluded other bedside measurements from the model, they may still improve LOS predictions, but for this study we limited ourselves to the most promising variables.

3.2.4 Lab results

Including lab results may improve the model because, like bedside measurements, they can be indicative of the health status of a patient. Most studies use only a select number of lab results, chosen based on clinical knowledge (because they are specific to the disease under investigation) or based on a feature selection algorithm. The literature provides several feature selection algorithms, such as the one by Tran et al. (2014), that can be used to find the most informative features.[30] However, it is uncertain whether the estimated predictive value of the lab results selected by such an algorithm from one EHR is generalisable to other EHRs.

Lab results were included in the model because they provide important information about many health param-eters that may correlate with LOS. Including lab results as individual tests is however not feasible, because the number of model covariates would explode. Selecting the most important lab results is difficult because the importance varies by patient context. Automated feature selection algorithms may similarly find features that are not as informative in a new population. Instead, the number of abnormal lab results over time was included. Table 3 shows an example of the calculation. At ICU admission (i.e. landmark 1) the patient has 0 abnormal lab values. At landmark 2, lab result A is abnormal, so the count increases to 1. At landmark 5, two more results are abnormal (B and C) so the count increases to 3, but A was already abnormal so that measurement does not change the count. Only the maximum value per landmark is used in the model, so for landmark 5 the count is 3. Note that in this example, the count at admission is 0, but the patient can be admitted with abnormal lab values. This approach is less context specific and may therefore contain less information, but it may also be more generalisable.

    Landmark  Measurement  Abnormal  Count
     1        A            Normal    0
     2        A            Abnormal  1
     5        A            Abnormal  3
     5        B            Abnormal  3
     5        C            Abnormal  3
    12        A            Normal    1
    12        C            Normal    1
    15        B            Abnormal  1

Table 3: Deriving the number of abnormal lab results over time. Note that an abnormal test result only increases the count if the previous result was normal or unknown. For example, at landmark 5 an abnormal result for A does not add to the count, because the previous result of this test at landmark 2 was already abnormal.
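The derivation of Table 3 can be sketched as follows. This is an illustrative Python sketch (the thesis processed the data in R); a test contributes to the count as long as its most recent result is abnormal.

```python
def abnormal_counts(results):
    """Derive the number of abnormal lab results over time, as in Table 3.
    results: list of (landmark, test, is_abnormal) in chronological order.
    A test adds to the count only while its most recent result is abnormal."""
    status, counts = {}, {}
    for landmark, test, abnormal in results:
        status[test] = abnormal
        # Keep the count after all results of this landmark are processed.
        counts[landmark] = sum(status.values())
    return counts

# The example of Table 3: tests A, B and C over five landmarks.
results = [(1, "A", False), (2, "A", True),
           (5, "A", True), (5, "B", True), (5, "C", True),
           (12, "A", False), (12, "C", False), (15, "B", True)]
print(abnormal_counts(results))  # {1: 0, 2: 1, 5: 3, 12: 1, 15: 1}
```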

3.2.5 Prescriptions

A prescription is the request by a physician for the administration of medication to a patient. Including prescriptions in the model leads to the same problems as lab results. We therefore used a similar approach and included the number of active prescriptions over time (see the table below). Note that in this example, the count at admission is 0, but a patient can be admitted with active prescriptions.

    Landmark  Prescription  Status   Count
     1        A             Started  1
     5        A             Stopped  2
     5        B             Started  2
     5        C             Started  2
    12        D             Started  3
    15        E             Started  4

Table 4: Deriving the number of active prescriptions over time. Note that at landmark 5 prescription A is stopped while B and C are started, so the net count becomes 2.
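The active-prescription count can be derived analogously to the abnormal-lab count. This is an illustrative Python sketch (the thesis processed the data in R); the event representation is our own.

```python
def active_prescriptions(events):
    """Number of active prescriptions over time, analogous to the
    abnormal-lab count. events: chronological list of
    (landmark, drug, status) with status "Started" or "Stopped"."""
    active, counts = set(), {}
    for landmark, drug, status in events:
        (active.add if status == "Started" else active.discard)(drug)
        # Keep the count after all events of this landmark are processed.
        counts[landmark] = len(active)
    return counts

# The example from the prescriptions table: drugs A-E over four landmarks.
events = [(1, "A", "Started"), (5, "A", "Stopped"), (5, "B", "Started"),
          (5, "C", "Started"), (12, "D", "Started"), (15, "E", "Started")]
print(active_prescriptions(events))  # {1: 1, 5: 2, 12: 3, 15: 4}
```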


3.2.6 Mechanical ventilation and other life-support systems

Mechanical ventilation is a common intervention in the ICU. MIMIC contains the day-by-day billing of mechanical ventilation, so it was possible to derive the days on which a patient was mechanically ventilated. Including information about mechanical ventilation may improve the model because it is indicative of severe illness and, as long as a patient is mechanically ventilated, the patient cannot be discharged. We did not include data on any other life-support systems because they were more difficult to extract from the data and because a patient on life support is virtually always on mechanical ventilation.

Mechanical ventilation was included in the model because we expect it to be highly predictive for LOS. If a patient is mechanically ventilated, this patient is in poor condition and is unlikely to be discharged soon. Mechanical ventilation was included as a binary variable (1 = ventilation, 0 = no ventilation) and as the cumulative ventilation time since admission (see Table 5).
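The two ventilation variables can be derived per landmark as sketched below. This is an illustrative Python sketch (the thesis processed the data in R); mapping the day-level billing in MIMIC to hourly landmarks via calendar days of the stay is our own simplification.

```python
def ventilation_features(vent_days, n_landmarks):
    """Derive the two ventilation variables per hourly landmark: a binary
    ventilation status and the cumulative ventilation time in hours.
    vent_days: set of (1-based) days of the stay on which mechanical
    ventilation was billed."""
    rows, cumulative = [], 0
    for hour in range(1, n_landmarks + 1):
        day = (hour - 1) // 24 + 1          # hours 1-24 -> day 1, etc.
        status = 1 if day in vent_days else 0
        cumulative += status                # cumulative ventilated hours
        rows.append((hour, status, cumulative))
    return rows

# A patient ventilated on days 1 and 2 of a 72-hour (3-day) stay.
rows = ventilation_features(vent_days={1, 2}, n_landmarks=72)
print(rows[23], rows[-1])  # end of day 1 and end of stay
```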

3.2.7 Number of ICU stays

After a patient has been discharged from the ICU, a readmission to the ICU may occur. A single hospital admission can therefore have more than one ICU admission. Including the number of ICU admissions may improve the model because readmissions are often indicative of complicated stays, resulting in higher mortality or delayed recovery and may therefore increase LOS.

3.3 Omitted data

This subsection explains why certain variables in MIMIC were omitted from the model despite their potential use for modelling LOS.

3.3.1 Care provider

Care providers are the involved physicians, nurses and other assisting personnel. The number of involved care providers at every landmark may improve the model, because it is a surrogate for the amount of care a patient needs. The number of involved care providers will likely be at its peak when the health status is most critical, lower when a patient is recovering, and even lower when the patient will soon be discharged. However, the number of involved care providers was not included in our analysis because it had to be derived from the chart data, which we deemed too unreliable.

3.3.2 Microbiology

The microbiology data in MIMIC contains the results of bacterial, fungal and viral cultures of collected patient samples. We decided to use body temperature as a more general surrogate for infection instead of the culture results. One major advantage of temperature is that it also indicates whether there is an inflammatory response. After all, a positive culture does not always mean infection and a patient with an infection does not always have a positive culture.

3.4 Overview of the final dataset

Table 5 shows the data of one ICU stay with a LOS of 92 hours, where the patient was discharged alive. The patient did not have any temperature measurements, so these were imputed with the population average. Over time, the number of abnormal lab results fluctuates. The patient was mechanically ventilated once, for a total of three days.

3.5 Model development: parameter selection procedure

To select the parameters with the most predictive value, we fitted several models with different configurations of the variables. A baseline model with all variables included was fitted first. The baseline model was then refitted repeatedly, each time with a different variable omitted, until every variable had been omitted once. The omit-one model with the best performance became the new baseline model, and the process was repeated until no better model could be found.
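The selection procedure above amounts to greedy backward elimination and can be sketched as follows. This is an illustrative Python sketch (the thesis performed model fitting in R); the toy scoring function is our own and stands in for the cross-validated performance measure.

```python
def backward_select(variables, score):
    """Greedy backward elimination as described above. `score(vars)` is
    assumed to return the cross-validated performance (higher is better)
    of a model fitted on `vars`. Starting from the full model, refit with
    each variable omitted once; if the best omit-one model beats the
    baseline it becomes the new baseline, until no omission helps."""
    current, best = list(variables), score(list(variables))
    while len(current) > 1:
        omit_one = [[w for w in current if w != v] for v in current]
        top = max(omit_one, key=score)
        if score(top) > best:
            current, best = top, score(top)
        else:
            break
    return current, best

# Toy scorer: every informative variable adds 0.1, a noise variable costs 0.05.
def toy_score(vars):
    return 0.1 * sum(v != "noise" for v in vars) - 0.05 * ("noise" in vars)

selected, best = backward_select(["age", "gcs", "temp", "noise"], toy_score)
print(selected)  # the noise variable is dropped
```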

3.6 Model development: choice of model quality statistic

Variable selection was guided by the cross-validated concordance statistic (c-statistic). The c-statistic is the probability that, for a randomly selected pair of patients with different outcomes, the model assigns the higher predicted risk to the patient who experienced the outcome; it is equivalent to the area under the ROC curve across all acceptance thresholds. To calculate the cross-validated


    Landmark  Age  Adm Type  NIC  Lab  ...  Temp  VentStat  VentTime  Outcome
     1        55   Elective  1    17   ...  37.1  0          0        No discharge
     2        55   Elective  1    17   ...  37.1  0          0        No discharge
     3        55   Elective  1    17   ...  37.1  0          0        No discharge
     4        55   Elective  1    17   ...  37.1  0          0        No discharge
     5        55   Elective  1    17   ...  37.1  0          0        No discharge
     6        55   Elective  1    15   ...  37.1  1          1        No discharge
     7        55   Elective  1    15   ...  37.1  1          2        No discharge
     8        55   Elective  1    17   ...  37.1  1          3        No discharge
     9        55   Elective  1    18   ...  37.1  1          4        No discharge
     ...
    76        55   Elective  1    15   ...  37.1  1         71        No discharge
    77        55   Elective  1    15   ...  37.1  1         72        No discharge
    78        55   Elective  1    17   ...  37.1  0         72        No discharge
    79        55   Elective  1    18   ...  37.1  0         72        No discharge
     ...
    91        55   Elective  1    14   ...  37.1  0         72        No discharge
    92        55   Elective  1    14   ...  37.1  0         72        Discharged alive

Table 5: The data of one ICU admission, showing the structure of the final dataset used for analysis. A subset of representative variables is shown. Adm Type = admission type, NIC = number of ICU stays, Lab = number of abnormal lab results, Temp = body temperature, VentStat = ventilation status, VentTime = cumulative ventilation time.

c-statistic, we trained the models on a random 90% of the data and tested them on the remaining 10%. By testing the model on this held-out 10% of the data, we approximated external validation. The model with the highest cross-validated c-statistic can therefore be considered the model with the best predictive performance. Because the models make a prediction at each landmark, the cross-validated c-statistic is calculated for each landmark and plotted as a curve.
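The c-statistic for one landmark can be computed directly from its pairwise definition. This is an illustrative Python sketch (the thesis computed it in R); the toy predictions are our own.

```python
def c_statistic(predictions, outcomes):
    """Concordance statistic for one landmark: the proportion of
    (event, non-event) pairs in which the event case received the higher
    predicted probability. Ties count as half-concordant. Equivalent to
    the area under the ROC curve."""
    pairs = concordant = 0.0
    for p1, y1 in zip(predictions, outcomes):
        for p2, y2 in zip(predictions, outcomes):
            if y1 == 1 and y2 == 0:
                pairs += 1
                concordant += 1.0 if p1 > p2 else 0.5 if p1 == p2 else 0.0
    return concordant / pairs

# Toy held-out set: predicted discharge probabilities and observed outcomes.
preds = [0.9, 0.6, 0.3, 0.7, 0.2]
observed = [1, 1, 0, 0, 0]
print(c_statistic(preds, observed))  # 5 of 6 pairs concordant, about 0.833
```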

3.7 Model development: fitting the model

All analyses were performed using R (version 4.0.2).[39] The model was fitted by minimisation of the negative log-likelihood using the nlminb function of base R.

4 Results

We now report the characteristics of the data obtained after data processing, followed by the results obtained during model development and model testing.

4.1 Baseline data characteristics

MIMIC contained a total of 61532 ICU stays. Of these, 8202 stays were excluded because they involved children and 2721 because age was over 250 years (registration error), 579 stays were excluded because the LOS exceeded 30 days and a final 389 stays were excluded because the LOS could not be determined. In total, 49641 ICU stays were included in the analysis. For 2987 stays, the patient died in the ICU, amounting to a mortality rate of 6%. For patients who survived, the majority of discharges occurred during the day, with a peak in the late afternoon. If the patient died, the discharges were fairly steady over the day, with only a minor increase in the afternoon (see Fig. 5). The LOS was right skewed with a median of 50.0 and a mean of 89.3 hours.

The baseline patient characteristics consisted of age, gender, the Charlson Comorbidity Index, admission type and the number of ICU stays per hospital admission. The data contained slightly more stays involving male patients (57%) than female patients (43%). Only 14% of the ICU admissions were planned, while 86% were unplanned. The mean number of ICU stays per hospital stay was 1.08, indicating that only a small proportion of patients were readmitted to the ICU at least once. The average age was 63 years. The baseline characteristics are summarised in Table 6.

Table 6 also shows the baseline characteristics stratified by outcome. This shows that, on average, patients who died in the ICU stayed 48 hours longer (155% of the time of patients who survived), were 5 years older, had


Figure 5: The distribution of the number of discharges for the time of day. Note the peak in the late afternoon for discharged alive and the fairly steady rate for discharged deceased.

a slightly higher Charlson Comorbidity Index and the proportion of unplanned admissions was 11 percentage points higher, while the proportion of male patients was approximately equal for both strata.

Table 7 shows the baseline characteristics stratified by the quartiles of the population LOS rounded to the nearest day. Henceforth we will refer to these strata as ‘LOS groups’. As the LOS increased, the mean age and mean Charlson Comorbidity Index also increased, but the proportion of planned versus unplanned admissions did not show a significant difference among the strata, except that stays that lasted longer than 4 days were unplanned slightly more often. Gender is again approximately equal for all strata. Both Table 6 and Table 7 show that the vast majority of patients were admitted to the ICU only once and only a few patients were readmitted to the ICU at least once.

4.2 Longitudinal data characteristics

The number of patients in the analysis decreased over time because patients were discharged (see Fig. 6).

Figure 6: The number of patients in the analysis over time.

(21)

                                        Total             Discharged       Deceased
                                        N=49641           N=46662          N=2979
    LOS (hours)     Mean (SD)           89.3 (106.5)      86.4 (103.1)     134.0 (142.8)
                    Median (IQR)        50.0 (28.0-98.0)  49 (28.0-95.0)   79 (30.0-185.0)
    Gender          Male (%)            28447 (57.3)      26806 (57.5)     1641 (55.1)
    Age (years)     Mean (SD)           62.7 (16.4)       62.4 (16.4)      67.8 (15.5)
    Admission type  Planned (%)         7216 (14.5)       7094 (15.2)      122 (4.1)
                    Unplanned (%)       42425 (85.5)      39568 (84.8)     2857 (95.9)
    Charlson Comorbidity Index
                    Mean (SD)           1.72 (1.33)       1.70 (1.33)      1.93 (1.26)
    No. ICU stays during hospital stay
                    Mean (SD)           1.08 (0.32)       1.08 (0.32)      1.06 (0.27)

Table 6: Summary of the population baseline characteristics of the total population and the summary stratified by outcome. Percentages are based on the column totals. Note that patients who died stayed longer, were older, their admissions were unplanned more often and they had a higher comorbidity score than patients who survived. LOS: length of stay, SD: standard deviation, IQR: interquartile range.

                                        LOS <1d        LOS 1-2d       LOS 2-4d       LOS >4d
                                        N=9203         N=14585        N=13073        N=12780
    Gender          Male (%)            5186 (56.4)    8631 (59.2)    7361 (56.3)    7269 (56.9)
    Age (years)     Mean (SD)           61.0 (17.3)    61.8 (16.5)    63.5 (16.2)    64.0 (15.8)
    Admission type  Planned (%)         1262 (13.7)    2562 (17.6)    2002 (15.3)    1390 (10.9)
                    Unplanned (%)       7941 (86.3)    12023 (82.4)   11071 (84.7)   11390 (89.1)
    Charlson Comorbidity Index
                    Mean (SD)           1.52 (1.29)    1.57 (1.35)    1.78 (1.29)    1.96 (1.33)
    No. ICU stays during hospital stay
                    Mean (SD)           1.06 (0.27)    1.06 (0.28)    1.08 (0.33)    1.11 (0.38)

Table 7: Population baseline characteristics stratified by the LOS groups: less than 1 day, 1 to 2 days, 2 to 4 days and longer than 4 days. Note the increasing age and comorbidity as the LOS increases. LOS: length of stay, SD: standard deviation.


The number of patients that was discharged per hour peaked early and then slowly diminished until, after approximately 10 days, the majority of patients had been discharged and only a few more patients were discharged every hour.

Figure 7: The number of discharges in the ICU over time. In blue the patients who were discharged alive and in red the patients who died.

Figure 8: The number of deaths in the ICU over time.

The longitudinal patient characteristics consist of the number of abnormal lab results, the number of prescribed medications, body temperature, GCS and mechanical ventilation status. On average, patients who died had more abnormal lab results, a lower GCS (i.e. poorer measurable consciousness) and were more often mechanically ventilated than patients who survived. Figure 9 shows the number of abnormal lab results stratified by outcome. Furthermore, patients who died had a lower body temperature, although the difference was small. The number of prescribed medications did not differ significantly between the two strata. For the sake of legibility, the visuals of the other characteristics can be found in the appendix.

Figure 9: The mean number of abnormal lab results over time stratified by outcome.

Figure 10 shows the number of lab results stratified by LOS group. On average, the number of lab results, the number of prescribed medications and the GCS increased the longer a patient stayed in the ICU, while


the mean body temperature did not change with longer stays. See the appendix for the visuals of the other characteristics.

Figure 10: The mean number of abnormal lab results over time stratified by LOS groups.

Figure 11 shows the proportion of mechanically ventilated patients over time among patients who were discharged alive, and Figure 12 shows the same proportion among patients who died. These visuals show that patients who died were more often mechanically ventilated.

Figure 11: Of the patients who were discharged alive, the proportion over time that was on mechanical ventilation.

Figure 12: Of the patients who were discharged deceased, the proportion over time that was on mechanical ventilation.

4.3 Results of model development

Optimising the model consisted of one round of fitting, i.e. all variables were tested once, with cross-validation on the validation dataset. Figure 13 shows the resulting curves of the cross-validated c-statistic per landmark. Only the model without the number of prescribed medications had a visibly lower c-statistic curve (see Fig. 13, light-blue arrow); the curves of the other models lay close together.
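The per-landmark cross-validated c-statistic can be illustrated with a minimal sketch. The function below is a generic concordance (AUC) computation, not the thesis code; applied at a single landmark, it compares the predicted discharge probabilities of patients who were discharged against those who were not.

```python
def c_statistic(y_true, y_score):
    """Probability that a randomly chosen event receives a higher
    predicted score than a randomly chosen non-event (ties count 1/2)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        return float("nan")  # undefined when one class is absent
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy example: discharge indicators and predicted discharge probabilities
# at one landmark (illustrative numbers, not validation-set output).
print(c_statistic([1, 0, 1, 0], [0.8, 0.3, 0.6, 0.4]))  # → 1.0
```

Repeating this at every landmark, each time restricted to the patients still in the ICU, yields a curve like those in Fig. 13.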


Figure 13: The cross-validated c-statistic as LOS increases for all the models from the first round of fitting. The light-blue arrow indicates the model where the number of prescribed medications was omitted.

Because the overall performance was relatively poor (as shown by the cross-validated c-statistic of around 0.7), we added the time of day to the model to see if we could improve performance further (see Fig. 14, dark-blue arrow). The resulting model had a c-statistic of around 0.8, an improvement especially in the first 10 days, which also contain the majority of patients (see Fig. 6).

Table 8 shows the estimated coefficients of the model covariates. Keep in mind that these are log-odds values: exponentiating a coefficient (base e) gives an odds ratio, so negative coefficients indicate a decrease in the probability of the respective outcome and positive coefficients an increase. Note that the coefficients of the same covariate can differ substantially between the two outcomes. For example, unplanned admission status had a negative coefficient for surviving patients but a positive one for patients who died: an unplanned admission decreases the probability of being discharged alive but increases the probability of being discharged deceased.
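As a small illustration, exponentiating the unplanned-admission coefficients from Table 8 gives the corresponding odds ratios:

```python
import math

# Log-odds coefficients for "Admission type (Unplanned)" from Table 8.
coef_discharged = -0.177
coef_deceased = 0.492

# Exponentiating (base e) turns a log-odds coefficient into an odds ratio.
or_discharged = math.exp(coef_discharged)  # ≈ 0.84: lower odds of live discharge
or_deceased = math.exp(coef_deceased)      # ≈ 1.64: higher odds of dying
```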

Table 9 shows the output of the model for one patient. This patient stayed for 14 hours and was discharged alive. For each landmark, the model outputs three values: the probability of each outcome occurring at that landmark, independent of the other landmarks. For example, at the first landmark, the probability of no discharge is 99% and the probability of being discharged is 1%. To calculate the risk of discharge after 14 landmarks, we simply take 1 minus the product over all preceding landmarks of the probability that the patient was not discharged, i.e.:

\[
1 - \prod_{k=1}^{K} P(Y = 0)
\]

where K is the total number of landmarks for that patient. In the rightmost column of Table 9 we can see that the model gives a risk of 23% that the patient has been discharged after 14 landmarks, and in reality the patient was indeed discharged after 14 hours.
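This calculation can be sketched with the per-landmark "no discharge" probabilities from Table 9 (rounded values; a minimal sketch, not the thesis code):

```python
# P(no discharge) at landmarks 1-14 for the example patient in Table 9,
# rounded to four decimals: 0.9897 for landmarks 1-11, 0.9501 for 12-14.
p_no_discharge = [0.9897] * 11 + [0.9501] * 3

def cumulative_discharge_risk(p_stay):
    """1 minus the product of the per-landmark probabilities
    of remaining in the ICU."""
    prob_still_in_icu = 1.0
    for p in p_stay:
        prob_still_in_icu *= p
    return 1.0 - prob_still_in_icu

print(round(cumulative_discharge_risk(p_no_discharge), 2))  # → 0.23
```

This reproduces the rightmost column of Table 9: truncating the list at landmark 11, for instance, gives a cumulative risk of about 0.11.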

4.4 Results of model testing

The model with the highest c-statistic was the model with all variables included plus the time of day (see Fig. 14, dark-blue arrow). The cross-validated c-statistic starts at around 0.6 and quickly increases in the first 48 hours, after which it fluctuates around 0.8. Table 9 shows the model performance for one patient in the validation set who had a LOS of 14 hours, for whom the model predicted a 23% chance of discharge when


Figure 14: The cross-validated c-statistic as LOS increases for all the models from the first round of fitting, including the model where the time of day was added (dark-blue arrow). The light-blue arrow indicates the model where the number of prescribed medications was omitted.

Variable                                  Discharged   SE            Deceased   SE
Baseline                                  -9.26        0.155***      -9.39      0.533***
No. landmarks                             0.000322     0.0000780***  -0.00317   0.000312***
Gender (Male)                             0.0168       0.0116*       -0.0323    0.0460*
Admission type (Unplanned)                -0.177       0.01637***    0.492      0.113***
Age (years)                               -0.00206     0.000364***   0.0165     0.00159***
Ventilation status (Yes)                  -1.84        0.0262***     -0.0292    0.0574*
Cumulative ventilation time (10 hours)    0.00240      0.000104***   0.00416    0.000341***
Body temperature (°C)                     0.0183       0.00368***    0.0292     0.0133**
GCS                                       0.261        0.00441***    -0.0830    0.00515***
No. prescribed medications                -0.0476      0.000500***   -0.153     0.00286***
No. abnormal lab results                  -0.00305     0.000749***   0.0960     0.00208***
No. ICU stays during hospital admission   -0.0672      0.0181***     -0.699     0.0891***
Charlson Comorbidity Index                -0.000386    0.00480*      0.132      0.0185***
Time of day 10-13h & 19-22h               1.73         0.0198***     0.522      0.0538***
Time of day 14-18h                        2.48         0.0194***     0.817      0.0575***

Table 8: The estimated model coefficients for the model with the highest cross-validated c-statistic curve after model development. Exponentiating a coefficient (base e) gives the corresponding odds ratio for that outcome. SE: standard error. P-values: * not significant, ** 0.01-0.05, *** <0.01.

the patient was discharged. If we calculate this predicted cumulative risk at the time of discharge for the whole validation set, the mean of these values gives an indication of the accuracy of the model: the mean predicted cumulative risk. The best model had a mean predicted cumulative risk of 52.2%. Table 10 shows the model performance for subgroups. On average, patients with a longer LOS had a higher mean prediction, and patients who died had a lower mean prediction than patients who survived. However, these results are for predictions made one hour before the patient is truly discharged; for clinical practice you want to predict further ahead. To do so, we chose


Landmark   No discharge   Discharged alive   Discharged deceased   Cumulative risk of discharge
1          0.9897         0.009920           0.0003588             0.01
2          0.9897         0.009922           0.0003571             0.02
3          0.9897         0.009923           0.0003553             0.03
4          0.9897         0.009884           0.0003881             0.04
5          0.9897         0.009886           0.0003862             0.05
6          0.9897         0.009888           0.0003843             0.06
7          0.9897         0.009890           0.0003824             0.07
8          0.9897         0.009891           0.0003805             0.08
9          0.9897         0.009893           0.0003786             0.09
10         0.9897         0.009895           0.0003767             0.10
11         0.9897         0.009897           0.0003748             0.11
12         0.9501         0.049352           0.0005616             0.15
13         0.9501         0.049360           0.0005588             0.19
14         0.9501         0.049369           0.0005560             0.23

Table 9: The output of the model for one patient who stayed in the ICU for 14 hours. The middle three columns give the independent probabilities of the respective outcome occurring at that landmark. The rightmost column gives the cumulative probability of discharge, i.e. the risk of having been discharged at that landmark or any previous landmark. In reality the patient was discharged at the 14th landmark, at which point the model predicted a 23% chance that discharge had occurred.

LOS group   Discharged    Deceased
LOS <1d     0.30 (0.15)   0.22 (0.17)
LOS 1-2d    0.46 (0.18)   0.34 (0.21)
LOS 2-4d    0.58 (0.19)   0.49 (0.23)
LOS >4d     0.72 (0.21)   0.64 (0.24)

Table 10: The mean (SD) cumulative probability of discharge, stratified by LOS group and the outcome of the ICU stay. On average, patients who stayed longer had a higher mean prediction. Patients who died had a slightly lower mean prediction than patients who survived.
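The mean predicted cumulative risk can be sketched as follows. The trajectories below are illustrative, not real thesis data: each is one patient's per-landmark P(no discharge) values up to the landmark of actual discharge.

```python
from math import prod

def risk_at_discharge(p_no_discharge):
    """Cumulative probability of discharge at the patient's final landmark."""
    return 1.0 - prod(p_no_discharge)

# Three illustrative validation-set trajectories (invented numbers).
patients = [[0.98] * 10, [0.95] * 30, [0.99] * 5]
mean_risk = sum(risk_at_discharge(p) for p in patients) / len(patients)
```

Evaluated exactly at the moment of discharge, an ideal model would push this mean towards 1; the best thesis model reached 52.2%.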

to extrapolate the data for the next 48 hours, simply using the values of the last known landmark. For example, if a patient is at the 50th landmark (i.e. a LOS of 50 hours) with a body temperature of 38°C and we want to predict the next 48 hours, we assign that body temperature of 38°C to each of the next 48 landmarks. In other words, we predict the probability that the patient is discharged in the next 48 hours, assuming that the condition of the patient stays exactly the same, except for the number of landmarks passed and the time of day. Figure 15 shows two predictions for a patient who was discharged alive after 98 hours. The light-blue line shows the prediction using all landmarks, while the dark-blue, dashed line shows the prediction made 48 hours prior to discharge. For this patient, both predictions were very similar. Figure 16 shows the same two predictions for a patient who died after almost 5 days in the ICU; unlike for the patient in Fig. 15, these predictions differ substantially.
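The carry-forward step can be sketched as below; the covariate names and dict structure are illustrative assumptions, not the thesis code. Only the landmark counter and the time of day advance, while every other covariate keeps its last observed value.

```python
def extrapolate(last, horizon=48):
    """Last-observation-carried-forward rows for the next `horizon` landmarks."""
    future = []
    for h in range(1, horizon + 1):
        row = dict(last)                       # copy last observed covariates
        row["n_landmarks"] = last["n_landmarks"] + h
        row["hour_of_day"] = (last["hour_of_day"] + h) % 24
        future.append(row)
    return future

# Patient at the 50th landmark with a body temperature of 38 °C:
rows = extrapolate({"n_landmarks": 50, "hour_of_day": 9, "body_temp": 38.0})
print(len(rows), rows[-1]["n_landmarks"], rows[-1]["body_temp"])  # → 48 98 38.0
```

Each extrapolated row is then fed to the fitted model to obtain the per-landmark discharge probabilities from which the 48-hour cumulative risk follows.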

When we calculated this statistic for all ICU stays in the validation set and took the mean, we found that the mean predicted cumulative probability declined slightly as the model had to predict further into the future: 46.2% at 24 hours prior to discharge, 45.5% at 48 hours prior and 45.0% at 72 hours prior.

5 Discussion

For this study we aimed to model LOS of adult patients in the ICU to answer the following questions:

• Is it possible to predict adult LOS reliably based on EHR data?

• Does a longitudinal modelling approach with the inclusion of longitudinal data explain the variance of LOS between individual patients better compared to models that predict LOS at admission with admission data alone?


Figure 15: The cumulative probability to be discharged for a patient who was discharged alive after 4 days. The light-blue line shows the cumulative probability of discharge using all 97 landmarks, and the dark-blue, dashed line shows the prediction 48 hours prior to discharge.

Figure 16: The cumulative probability to be discharged for a patient who was discharged deceased after almost 5 days. The dark-blue line shows the cumulative probability of discharge using all 119 landmarks, and the light-blue line shows the cumulative probability using only the first 71 landmarks (i.e. 48 hours prior to discharge).
