Intensive care unit benchmarking: Prognostic models for length of stay and presentation of quality indicator values - Thesis (complete)


UvA-DARE is a service provided by the library of the University of Amsterdam (https://dare.uva.nl)

UvA-DARE (Digital Academic Repository)

Intensive care unit benchmarking

Prognostic models for length of stay and presentation of quality indicator values

Verburg, I.W.M.

Publication date

2018

Document Version

Final published version

License

Other

Link to publication

Citation for published version (APA):

Verburg, I. W. M. (2018). Intensive care unit benchmarking: Prognostic models for length of stay and presentation of quality indicator values.

General rights

It is not permitted to download or to forward/distribute the text or part of it without the consent of the author(s) and/or copyright holder(s), other than for strictly personal, individual use, unless the work is under an open content license (like Creative Commons).

Disclaimer/Complaints regulations

If you believe that digital publication of certain material infringes any of your rights or (privacy) interests, please let the Library know, stating your reasons. In case of a legitimate complaint, the Library will make the material inaccessible and/or remove it from the website. Please Ask the Library: https://uba.uva.nl/en/contact, or a letter to: Library of the University of Amsterdam, Secretariat, Singel 425, 1012 WP Amsterdam, The Netherlands. You will be contacted as soon as possible.


Intensive care unit benchmarking: prognostic models for length of stay and presentation of quality indicator values

PhD thesis, University of Amsterdam, Amsterdam, the Netherlands ©Ilona Verburg, Amsterdam, 2018

ISBN 978-94-6332-283-6

Design and lay out: Ilona Verburg

Design cover: Vincent Bakkenist

Printed by: GVO drukkers & vormgevers B.V.

Copyright by Ilona Verburg, Amsterdam, The Netherlands. All rights reserved. No part of this thesis may be reproduced, stored or transmitted in any form or by any means without prior written permission of the author.

A digital version of this thesis can be found at http://dare.uva.nl.

This thesis was printed with financial support of: Stichting Nationale Intensive Care Evaluatie (NICE) and the Department of Medical Informatics AMC.

Intensive care unit benchmarking: prognostic models for length of stay and presentation of quality indicator values

ACADEMIC DISSERTATION

to obtain the degree of doctor at the Universiteit van Amsterdam

by authority of the Rector Magnificus, prof. dr. ir. K.I.J. Maex,

before a committee appointed by the Doctorate Board (College voor Promoties), to be defended in public in the Agnietenkapel

on Friday 2 February 2018, at 10:00

by Ilona Willempje Maria Verburg, born in Schoonhoven

Doctoral committee

Supervisors:
Prof. dr. N.F. de Keizer, AMC-Universiteit van Amsterdam
Prof. dr. E. de Jonge, Universiteit Leiden

Co-supervisors:
Dr. R. Holman, AMC-Universiteit van Amsterdam
Prof. dr. N.B. Peek, University of Manchester

Other members:
Prof. dr. M.B. Vroom, AMC-Universiteit van Amsterdam
Prof. dr. A.H. Zwinderman, AMC-Universiteit van Amsterdam
Prof. dr. E.W. Steyerberg, Universiteit Leiden
Dr. M.G.W. Dijkgraaf, AMC-Universiteit van Amsterdam
Dr. L.M. Peelen, Universiteit Utrecht

Table of contents

1 General Introduction

I Prognostic models for intensive care unit length of stay

2 Which models can I use to predict adult ICU length of stay? A systematic review

3 Comparison of regression methods for modeling intensive care length of stay

4 Is patient length of stay associated with intensive care unit characteristics?

II Presentation of quality indicator values

5 The association between outcome-based quality indicators for intensive care units

6 Individual and clustered rankability of ICUs according to case-mix adjusted mortality

7 Guidelines on constructing funnel plots for quality indicators: a case study on mortality in intensive care unit patients

8 General Discussion

Appendix

Summary

Samenvatting (summary in Dutch)

Overview of cited literature

Dankwoord (acknowledgements)

Curriculum vitae and portfolio

Chapter 1

1.1 Introduction

Due to a drive for continuous quality improvement, pressure on accountability, and budgetary constraints, awareness of quality of care has grown among various stakeholders, including care providers, healthcare managers, insurance companies, governmental bodies and patients. Healthcare institutions search for quality indicators to identify room for quality of care improvement [1–4]. This has led to the formulation of numerous quality indicators across all fields of clinical medicine. In some cases, it is obvious what the target (preferred) value should be. However, for many indicators, no clear target values are available to define good care. For example, mortality rates in hospitals should be as low as possible, but it would be unrealistic to demand that no hospitalized patients die. In the absence of clear external target values, healthcare institutions often compare themselves with their own historical values or with their peers in a process called benchmarking.

Ideally, the quality indicator values represent the true quality of care provided by an institution, and differences in the quality indicator values between institutions would indicate that institutions with worse values could improve their quality of care. Figure 1.1 presents an adaptation of a previously published schematic representation of causes of differences in quality indicator values between healthcare institutions [5]. Observed differences in quality indicators will ideally be the result of genuine differences in quality, but can also arise from noise caused by registration biases, differences in patient characteristics, residual confounding, and random variation. This noise may influence observed changes over time or differences between institutions and could lead to incorrect judgments being made about institutions.

Errors may occur in the registration of patient data and may result in registration bias and inadequate data quality meaning that a quality indicator is less reliable [6]. Examples of causes of registration bias are differences in: how data are collected, such as the use of multiple electronic patient record systems; definitions; and interpretation. Quality registries should aim to minimize registration bias by standardization and monitoring data quality [7, 8].

In addition to random variation, differences in patient characteristics (case-mix) such as age, sex or severity of illness may influence care outcomes and thus quality indicator values. For example, higher in-hospital mortality rates or longer in-hospital length of stay may result from a more severely ill patient population. Fair and meaningful benchmarking requires correction for differences in patient case-mix. Prognostic models can partially correct for differences in patient case-mix. However, even after case-mix correction, quality indicator values may still be influenced by patient characteristics for which adjustment was not performed or was performed inadequately. This is known as residual confounding. Theoretically, differences in the values of correctly constructed quality indicators should reflect true differences in the quality of care.

[Figure 1.1 (diagram not reproduced). Labels: observed differences; random variation; patient characteristics; registration bias; unexplained differences; residual confounding; difference in quality of care.]

Figure 1.1: Schematic representation of causes of differences in quality indicator values between health care organizations, adapted and adjusted from Lingsma et al. [5].

Defining meaningful quality indicators to measure the quality of care is difficult since different quality indicators may reflect different aspects of performance. No single quality indicator reflects the whole spectrum of healthcare performance. Therefore, a set of quality indicators reflecting structure, process, and outcomes of health care are often presented to identify room for quality of care improvement and to support policy decisions [9].

There are many possible ways of presenting quality indicator values to stakeholders to support them in making decisions about the quality of care. Methods of identifying institutions with outlying performance include: simple descriptive statistics; league tables; Bayesian ranking; the probability of being in the worst-ranked group of institutions; preset limits for acceptable performance, such as 95% confidence intervals; statistical process control (SPC) charts; funnel plots; variable life adjusted display (VLAD) curves; and risk-adjusted exponentially weighted moving average (RA EWMA) plots [10]. In this thesis, we focus on league tables and funnel plots.

All studies included in this thesis have been performed in the context of the Dutch National Intensive Care Evaluation foundation (NICE) registry, a quality registry for Dutch intensive care units (ICUs). The remainder of this chapter introduces the domain of intensive care and the NICE registry. In addition, prognostic models to adjust for case-mix differences and other unexplained differences, such as organizational aspects, are introduced. Furthermore, league tables and funnel plots, and their ability to take random variation into account, are introduced as methods of presenting quality indicator values. The chapter concludes with the objective and an outline of the thesis.

1.2 Intensive care units and National Intensive Care Evaluation foundation

Intensive care is defined as 'a service for severely ill patients with potentially recoverable conditions, who can benefit from more detailed observation and more intensive treatment than can safely be provided in general wards or high dependency areas' [11]. This definition originates from the end of the 20th century. Since then intensive care has expanded and evolved. Nowadays, ICU care is very complex and delivered in a highly technical and labor-intensive environment. As these developments have occurred, the survival chances of critically ill patients have drastically improved. But the cost of intensive care has also increased substantially, resulting in a high proportion of the health care budget being spent on ICUs [12]. This all makes ICUs a particularly interesting part of the hospital in which to assess and improve performance.

For this reason, the NICE foundation [13, 14] was established in 1996 by a group of intensivists. Its purpose is to facilitate quality monitoring and quality improvement initiatives in Dutch ICUs. At the start of the registry, a small proportion of all Dutch ICUs voluntarily participated. Over the years the NICE registry has expanded and, currently, all Dutch ICUs participate.

Figure 1.2 gives an overview of data registration, analyses, and benchmarking by the NICE registry. All ICUs register a core dataset including demographic, diagnostic, and physiological data from the first 24 hours after ICU admission. Furthermore, they register outcome data, such as: ICU and in-hospital mortality; ICU readmission; and ICU and hospital length of stay. In addition, most ICUs participate in an additional quality indicator registry of NICE. This consists of structure, process, and outcome indicators chosen by the Dutch Society of Intensive Care (NVIC). Examples are: the number of ICU and hospital beds; staff resources; nurse-to-patient ratio; glucose regulation; and duration of mechanical ventilation. Furthermore, the NICE registry has several other optional registration modules. These are: complications; sepsis; sequential organ failure assessment (SOFA); and nursing workload. The analyses presented in this thesis use data from the core dataset. In addition, chapter 4 uses data from the NVIC quality indicators.

To improve the quality of the data collected, the NICE registry uses strict definitions and specifications of the data collected, described in a data dictionary. It provides participants with e-learning based training and with mandatory training before they start registration, and it performs data quality controls, such as automated checks on data entry and onsite data quality audits [6, 15].

[Figure 1.2 (flow diagram not reproduced). Labels: hospital ICU; ICU patient input (demographics, severity of illness, diagnosis); patient outcome (ICU and in-hospital mortality, readmission to the ICU, ICU and in-hospital length of stay); internal hospital database; data extraction of registry modules; automatic data encryption; upload data to NICE; data security and quality checks; NICE database; data dictionary and training; data validation report; data reminder; data quality audits; audit report; prepared datasets; analyses (e.g. case-mix correction); benchmark reports; web-based dashboard; public website; stakeholders.]

Figure 1.2: Overview of data registration, analyses and benchmarking by the NICE registry.

ICU patients are very heterogeneous. They may have a high severity of illness or have undergone major surgery. Post-surgical patients often have low mortality rates and short ICU length of stay. Hence, it is meaningless to compare the outcome of different ICUs without proper case-mix correction. The NICE registry corrects the quality indicator in-hospital mortality for case-mix using the Acute Physiology and Chronic Health Evaluation (APACHE) IV model [16]. However, the NICE registry does not correct the quality indicator length of stay for case-mix. This thesis focuses on prognostic models used for case-mix adjusted ICU length of stay.

The NICE registry performs analyses on the registered data and supplies feedback on the values of a set of quality indicators to the participating ICUs through reports issued every 6 months and a web-based dashboard application. This enables ICUs to benchmark their performance against national values and against groups of ICUs of comparable size. The results can be used to identify critical points in the care process as a starting point for improving quality of care. Like other ICU quality registries [17, 18], the NICE registry has offered ICUs the opportunity to make the outcomes of some quality indicators publicly available on a website [19] since 2013. This makes performance transparent to all stakeholders.

Several methods are used within the NICE registry to present performance on quality indicators such as SPC charts [20], VLAD curves [21], RA EWMA charts [22], funnel plots, and caterpillar diagrams. Results are presented for the entire cohort and for subgroups of ICU patients [19].

Currently, data of more than 1,000,000 ICU admissions have been included in the NICE database and about 85,000 new admissions are registered annually. In 2016 the overall in-hospital mortality of ICU patients in the Netherlands was around 13%. The mean ICU length of stay was 2.9 days (median 1.0 day) for ICU survivors and 5.1 days (median 2.1 days) for ICU non-survivors. The overall percentage of patients readmitted to the ICU was around 6% and the overall percentage of patients readmitted to the ICU within 48 hours after ICU discharge around 2%.

1.3 Prognostic models for intensive care unit length of stay

The first part of this thesis addresses prognostic models for ICU length of stay. Since costs are strongly related to ICU length of stay [23, 24], ICU length of stay can play an important role in examining the efficiency of care. As discussed in the previous section, ICU patients form a heterogeneous population with a wide range of complex health issues, each of which may have a different association with ICU length of stay [25]. For example, severe trauma patients may have a very long ICU length of stay while most post-surgical patients will be discharged from the ICU very quickly. Quality indicators for ICU length of stay are meaningless if they are not properly corrected for patient case-mix.

For in-hospital mortality, prognostic models have been proposed and are widely implemented to adjust in-hospital mortality for patient case-mix. Prognostic models for ICU length of stay are less frequently used as little consensus exists on the best method for predicting ICU length of stay and the predictive performance of existing models is modest [26–29]. To date, the NICE registry presents crude mean and median ICU length of stay and does not correct reported values of ICU length of stay for differences in case-mix between ICUs.

The accurate prediction of ICU length of stay is challenging for three main reasons. Firstly, the distribution of ICU length of stay is typically strongly skewed to the right with both a long tail and an inflated number of values close to zero. Secondly, the association between severity of illness and ICU length of stay differs for ICU survivors and ICU non-survivors [30]. Thirdly, the characteristics of individual ICUs can be associated with patient level ICU length of stay. Examples are discharge policies and the availability of spare beds on general wards [31–34].

The focus of this thesis and of quality registries, such as the NICE, is primarily on benchmarking. However, in chapter 2 and chapter 4 of this thesis, we extend our focus to address three reasons for predicting ICU length of stay. These are: 1) benchmarking; 2) planning the number of beds and members of staff required to fulfill demand for ICU care within a given hospital or geographical area; and 3) identifying individual patients or groups of patients with unexpectedly long ICU length of stay to drive direct quality improvement [35, 36]. The requirements for an ICU length of stay prediction model differ between these situations. A model for benchmarking purposes needs to predict ICU length of stay reliably at the ICU level. A model for planning or for identifying patients with unexpectedly long ICU length of stay needs to predict ICU length of stay reliably for individual patients. Including ICU organizational characteristics in a prognostic model for ICU length of stay will limit the usefulness of a prediction model for benchmarking purposes. This is because this type of model will adjust for a part of the variation in the quality indicator values that can be attributed to quality of care. However, for planning the number of beds and members of staff required, and for identifying individuals or groups with unexpectedly long ICU length of stay, including ICU organizational characteristics might be valuable.
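The difference between predicting reliably for individual patients and predicting reliably at the ICU level can be illustrated with the squared Pearson correlation, the accuracy measure that chapter 2 reports across patients and across ICUs. The following is a minimal sketch only: the function names, ICU labels, and length of stay values are hypothetical and chosen for illustration, not taken from the NICE registry.

```python
from collections import defaultdict


def pearson_r2(x, y):
    """Squared Pearson correlation between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)


def icu_means(icu_ids, values):
    """Mean value per ICU, returned in sorted ICU order."""
    groups = defaultdict(list)
    for icu, value in zip(icu_ids, values):
        groups[icu].append(value)
    return [sum(groups[icu]) / len(groups[icu]) for icu in sorted(groups)]


# Hypothetical data: observed and predicted ICU length of stay in days.
icu_ids = ["A", "A", "A", "B", "B", "B", "C", "C", "C"]
observed = [1.0, 2.0, 6.0, 2.0, 4.0, 9.0, 3.0, 8.0, 10.0]
predicted = [3.0, 3.5, 2.5, 5.0, 4.5, 5.5, 7.5, 6.5, 7.0]

# Patient level: one (observed, predicted) pair per admission.
r2_patient = pearson_r2(observed, predicted)

# ICU level: one (mean observed, mean predicted) pair per ICU.
r2_icu = pearson_r2(icu_means(icu_ids, observed), icu_means(icu_ids, predicted))
```

In this constructed example the predictions are poor for individual admissions but match each ICU's mean exactly, so the ICU-level value is much higher than the patient-level value. A model can therefore be adequate for one purpose and inadequate for another.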

1.4 Presentation of quality indicator values

The main focus of the second part of this thesis is on the presentation of quality indicator values. As we previously described, registries often report a set of quality indicator values to a range of stakeholders. Besides ICU length of stay, in-hospital mortality and readmissions to the ICU are often used as quality indicators for ICU care. Several studies found that patient level outcomes of ICU care are interrelated and influence each other [37–44]. The second part of this thesis addresses whether it is sufficient to report a single quality indicator. This would be the case if ICUs that perform well on one quality indicator also perform well on other quality indicators. This would not be the case if different ICU quality indicators reflect different aspects of performance. This thesis also addresses league tables and funnel plots, which are visual methods of presenting quality indicators. We describe league tables and funnel plots below.

1.4.1 League tables

League tables are frequently used to present comparative performance results [10]. Figure 1.3 presents a league table for benchmarking ICUs, for the general ICU population reported to the NICE registry. In the league table, ICUs are ranked according to their values of the in-hospital standardized mortality ratio (SMR) over the year 2016. We present 95% confidence intervals around the values of SMR. The SMR is defined as the number of deaths actually observed in an ICU divided by the number of deaths predicted by the APACHE IV [16] model. The purpose of league tables is to discriminate between the best and worst performing institutions. They enable staff in underperforming institutions to recognize the need for improvement and to identify the best performing institutions, those with a high rank in the league table, from which they can learn and develop improvement strategies.
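The SMR definition above can be sketched in a few lines. This is an illustration under stated assumptions: the predicted risks are hypothetical numbers standing in for the output of a prognostic model such as APACHE IV, and the function name is ours. The confidence interval is omitted because the text does not state how it is computed.

```python
def standardized_mortality_ratio(died, predicted_risk):
    """Observed deaths divided by the deaths expected under the prognostic model.

    died           -- 0/1 outcome per admission (1 = died in hospital)
    predicted_risk -- model-predicted probability of death per admission
    """
    if len(died) != len(predicted_risk):
        raise ValueError("need one outcome and one predicted risk per admission")
    observed = sum(died)
    expected = sum(predicted_risk)  # expected deaths = sum of predicted risks
    if expected <= 0:
        raise ValueError("expected number of deaths must be positive")
    return observed / expected


# Hypothetical ICU with five admissions (illustrative values only).
died = [0, 1, 0, 0, 1]
predicted_risk = [0.05, 0.60, 0.10, 0.25, 0.50]
smr = standardized_mortality_ratio(died, predicted_risk)
```

Here two deaths are observed against 1.5 expected, so the SMR exceeds 1, meaning more deaths than the model predicts for this case-mix.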

Although it is possible to add confidence intervals to the ranks of a league table [41, 45, 46], statisticians have raised concerns about the reliability of league tables to discriminate between institutions [45–48]. League tables focus on observed differences between hospitals. However, users of league tables may ignore uncertainties in performance due to limited sample size even though confidence intervals have been added. If the sampling error is substantial, and hence the signal-to-noise ratio is low [49], league tables may be unreliable. This is because the rank of a particular institution may be largely determined by chance rather than the underlying quality of care it provides. The reliability of a league table can be expressed in terms of its rankability [50]. Rankability expresses the percentage of variation between ICUs' observed quality indicator values that is due to unexplained differences rather than random variation. Unexplained differences hypothetically reflect differences in quality of care, see figure 1.1. Higher values of rankability correspond to more reliable league tables [49, 50]. The concept of reliability in terms of rankability has not previously been applied to ICU league tables, although risk adjusted mortality has been used in league tables to compare the performance of ICUs [41].
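One common way to write rankability is as the between-ICU variance divided by the sum of the between-ICU variance and the median within-ICU sampling variance. The sketch below assumes that formulation, which may differ in detail from the one used in chapter 6. It also assumes that the between-ICU variance (`tau2`) and the per-ICU standard errors have already been estimated, for example from a random effects model. All names and numbers are hypothetical.

```python
from statistics import median


def rankability(tau2, standard_errors):
    """Share of variation between ICUs not attributable to chance.

    tau2            -- estimated between-ICU variance (assumed given)
    standard_errors -- standard error of each ICU's estimated effect
    """
    if tau2 < 0:
        raise ValueError("between-ICU variance cannot be negative")
    if not standard_errors:
        raise ValueError("need at least one ICU")
    # Median within-ICU sampling variance represents the 'noise' component.
    noise = median(se ** 2 for se in standard_errors)
    return tau2 / (tau2 + noise)


# Hypothetical example: three ICUs with different amounts of data.
rho = rankability(tau2=0.04, standard_errors=[0.1, 0.2, 0.3])
```

With these illustrative inputs, signal and noise are equal in size, so half of the observed variation would be attributed to unexplained differences between ICUs.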

[Figure 1.3 (league table not reproduced). Axes: ICU rank (1 to 81) and SMR with 95% confidence interval (0.0 to 1.6).]

1.4.2 Funnel plots

Funnel plots form an alternative to league tables. Funnel plots are graphical decision-making tools to assess and compare the clinical performance of a group of institutions on quality indicators against a pre-defined benchmark [51]. A funnel plot is an example of a Shewhart control chart, a type of chart originally meant for quality control at the Western Electric company in the 1920s. Although Shewhart control charts were used in industry, where quality control of manufacturing and other processes and reporting business performance were essential for remaining competitive, they have only been applied in healthcare since around 1990 [10]. Figure 1.4 presents a funnel plot of ICU performance, based on the same data as used for figure 1.3. In a funnel plot, the value of a quality indicator for each institution is plotted against a measure of its precision, often the number of patients or cases used to calculate the quality indicator. Control limits indicate a range in which the values of the quality indicator would, statistically speaking, be expected. The control limits form a funnel shape around the benchmark, which is presented as a horizontal line. If an institution falls outside the control limits, it is seen as performing differently than expected, given the value of the benchmark [51–53].

[Figure 1.4 (funnel plot not reproduced). Axes: number of admissions (0 to 2,500) and SMR (0.0 to 1.6).]

Figure 1.4: Funnel plot of Dutch ICUs based on case-mix adjusted in-hospital mortality.
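The funnel shape arises because the control limits narrow as the number of admissions grows. The following sketch shows this for a crude mortality proportion, using a normal approximation to the binomial distribution. This is only one of several possible ways to construct control limits and is not necessarily the method behind figure 1.4. The benchmark of 0.13 echoes the approximate 2016 in-hospital mortality mentioned earlier; the function names and admission counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def control_limits(benchmark, n, coverage=0.95):
    """Approximate control limits for a proportion around a benchmark.

    Uses the normal approximation to the binomial (an assumption that
    is weakest for small n or benchmarks near 0 or 1).
    """
    z = NormalDist().inv_cdf(0.5 + coverage / 2)
    half_width = z * sqrt(benchmark * (1 - benchmark) / n)
    return max(0.0, benchmark - half_width), min(1.0, benchmark + half_width)


def outside_limits(deaths, n, benchmark, coverage=0.95):
    """True if the observed proportion falls outside the control limits."""
    lower, upper = control_limits(benchmark, n, coverage)
    rate = deaths / n
    return rate < lower or rate > upper


# Limits narrow with volume: compare a small and a large hypothetical ICU.
small_lower, small_upper = control_limits(0.13, 100)
large_lower, large_upper = control_limits(0.13, 2000)

flag_on_benchmark = outside_limits(deaths=13, n=100, benchmark=0.13)
flag_high_rate = outside_limits(deaths=25, n=100, benchmark=0.13)
```

An ICU with 100 admissions needs a much larger deviation from the benchmark to fall outside the limits than an ICU with 2,000 admissions, which is exactly the protection against chance findings that league tables lack.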

It is important that we can assume that institutions falling outside the control limits perform significantly differently than expected, given the benchmark. It is also important that there is no reason to suspect that institutions falling inside the control limits are performing differently from the benchmark. Incorrect judgements may have severe consequences, such as loss of trust among patients, insurance companies refusing to pay for care delivered or award new contracts, and demotivated health care staff. Hence, the methods used to construct funnel plots need to have a solid justification in statistical theory. Incorrectly constructed funnel plots could lead to highly consequential incorrect judgements about hospital performance.

Prior papers comparing hospital performance by funnel plots are mostly based on a single paper on funnel plot methodology [51]. In this paper, Spiegelhalter describes multiple methods of constructing control limits. We found no publications providing a guideline that describes all steps required to produce a funnel plot.

1.5 Objectives of this thesis

This thesis aims to contribute to knowledge on improving the reliability and accuracy of ICU benchmarking and is divided into two themes: 1) prognostic models for ICU length of stay; and 2) the presentation of quality indicator values. The two research questions on prognostic models for ICU length of stay in this thesis are:

1. Is it feasible to predict ICU length of stay accurately using regression methods and patient characteristics only?

2. What is the role of ICU organizational characteristics in predicting ICU length of stay?

The three research questions on the presentation of quality indicator values are:

3. Are case-mix adjusted ICU outcomes mutually independent measures for ICU quality of care?

4. What is the rankability of league tables for in-hospital mortality of ICU patients, and how can it be improved?

1.6 Outline of this thesis

This thesis contains eight chapters. To address the research questions on prognostic models for ICU length of stay, three studies were performed. In chapter 2, we systematically reviewed the reporting and methodological quality of models predicting ICU length of stay. In chapter 3, we compared ordinary least squares (OLS) regression, generalized linear models (GLMs), and Cox proportional hazards (CPH) regression to predict individual patient ICU length of stay. In chapter 4, we added ICU organizational characteristics to a regression model correcting for patient case-mix and assessed the influence of these characteristics on ICU length of stay and the change in model performance.

To address the research questions on presentation of quality indicator values, three studies were performed. In chapter 5, we examined the associations between three outcome-based quality indicators: in-hospital mortality, readmission to the ICU within 48 hours of ICU discharge, and ICU length of stay. In chapter 6, we evaluated the rankability of a league table of ICUs based on case-mix adjusted in-hospital mortality. In chapter 7, we conducted a literature search to identify the steps in the process of funnel plot development. We applied the steps identified to an example for crude proportion of mortality and SMR in the NICE registry. This thesis concludes with chapter 8, which provides an overall discussion of the principal findings of the work.

Part I

Prognostic models for intensive care unit length of stay

Chapter 2

Which models can I use to predict adult ICU length of stay? A systematic review

Ilona W.M. Verburg, Alireza Atashi, Saeid Eslami, Rebecca Holman, Ameen Abu-Hanna, Evert de Jonge, Niels Peek and Nicolette F. de Keizer


Abstract

Objective: We systematically reviewed models to predict adult intensive care unit (ICU) length of stay.

Data Sources: We searched the Ovid Excerpta Medica database (EMBASE) and Medical Literature Analysis and Retrieval System Online databases (MEDLINE) for studies on the development or validation of ICU length of stay prediction models.

Study selection: We identified 11 studies describing the development of 31 prediction models and three describing external validation of one of these models.

Data extraction: Clinicians use ICU length of stay predictions for planning ICU capacity; identifying unexpectedly long ICU length of stay; and benchmarking ICUs. We required the model parameters to have been published and for the models to be free of organizational characteristics and to produce accurate predictions, as assessed by the squared Pearson's correlation coefficient (R²) across patients for planning and identifying unexpectedly long ICU length of stay and across ICUs for benchmarking, with low calibration bias. We assessed the reporting quality using the Checklist for Critical Appraisal and Data Extraction for Systematic Reviews of Prediction Modelling Studies.

Data synthesis: The number of admissions ranged from 253 to 178,503. Median ICU length of stay was between 2 and 6.9 days. Two studies had not published model parameters and three included organizational characteristics. None of the models produced predictions with low bias. The value of R² was 0.05 to 0.28 across patients and 0.01 to 0.64 across ICUs. The reporting scores ranged from 49/78 to 60/78 and the methodological scores from 12/22 to 16/22.

Conclusion: No models completely satisfy our requirements for planning, identifying unexpectedly long ICU length of stay, or for benchmarking purposes. Physicians using these models to predict ICU length of stay should interpret them with reservation.

2.1 Introduction

Intensive care units (ICUs) provide complex and expensive care and hospitals face pressure to improve efficiency and reduce costs [23, 54]. Since costs are strongly related to ICU length of stay, shorter ICU length of stay generally equates to lower costs [23, 24]. Hence, models predicting ICU length of stay can play an important role in examining the efficiency of ICU care. We identified three main reasons for clinicians to predict ICU length of stay [35, 36]: 1) planning the number of beds and members of staff required to fulfill demand for ICU care within a given hospital or geographical area; 2) identifying individual patients or groups of patients with unexpectedly long ICU length of stay to drive direct quality improvement; and 3) enabling case-mix correction when comparing average length of stay between ICUs (benchmarking). The requirements of an ICU length of stay prediction model differ between these situations. A model for planning purposes or to identify individuals or groups with unexpectedly long ICU length of stay needs to predict ICU length of stay reliably for individual patients. When benchmarking the quality or efficiency of ICU care, where benchmark reports are based on summary measures of differences between expected and observed length of stay [55], a prediction model needs to predict total ICU length of stay accurately across ICUs.

A range of models to predict case-mix adjusted ICU length of stay have been published. However, their clinical utility is unclear [25, 55] and there is no consensus on which is the best [16, 26, 29]. Predicting ICU length of stay accurately is difficult for three reasons. Firstly, statistical methods often assume a Gaussian distribution, but ICU length of stay is generally right skewed [56]. Secondly, patients admitted to an ICU form a heterogeneous group with a wide range of complex health issues, each of which may have a different association with ICU length of stay [25]. Thirdly, the association between severity of illness and ICU length of stay differs for ICU survivors and ICU non-survivors [56]. If not correctly addressed, these points could lead to wildly inaccurate or biased predictions of ICU length of stay, thus negating their utility [56].

In this study, we systematically review reporting and methodological quality of models for predicting ICU length of stay and assess their suitability for planning ICU resources, identifying unexpectedly long ICU length of stay, and benchmarking. We examine characteristics most relevant to clinicians assessing the suitability of a published model to predict ICU length of stay in their own hospital or group of hospitals.


Chapter 2

2.2 Methods

2.2.1 Search strategy and inclusion and exclusion criteria

We searched the Ovid Excerpta Medica database (EMBASE) and Ovid Medical Literature Analysis and Retrieval System Online (MEDLINE) databases from database inception until October 31st, 2014 by searching all fields and including citations in progress, which are not yet indexed with Medical Subject Headings (MeSH). The search query consisted of three sub-queries, each containing synonyms and combined with 'AND', on intensive care, length of stay and prediction. We present the detailed search strategy in appendix 2.A, table 2.7. We included all original papers describing the development and/or validation of a prediction model for ICU length of stay in adult patients. We excluded duplicate studies, papers not written in English and studies that were later updated by the same research group.

When deciding whether to include a paper, the authors (IV, AA) classified the papers by reading the title and abstract, then compared and discussed their results until they reached consensus about the eligibility of the paper based on the title and abstract. IV manually reviewed the references in papers found. The authors (IV and AA) read the full text of the included papers and independently scored the items for each model. If necessary, disagreement in all steps was reconciled by discussion with other authors (NdK, SE).

2.2.2 Assessment of methodological and reporting quality

We used the consensus-based Checklist for Critical Appraisal and Data Extraction for Systematic Reviews of Prediction Modeling Studies (CHARMS) [57] to assess the quality of reporting in the studies. This checklist was based on previously published reporting guidelines, systematic reviews, methodological literature and pilot versions discussed within the Cochrane Prognosis Methods Groups. Although the checklist is relatively new, it has already been used in scientific reviews [58–60]. We extended the checklist with three items of importance in prediction models [61] and four items on how the prediction model handled four specific subgroups of ICU patients: 1) patients readmitted to the ICU; 2) patients transferred from or to another ICU; 3) patients who survived their ICU stay versus patients who died on the ICU; and 4) patients who underwent cardiac surgery. Although the checklist presents items to report on, it provides no score or weight per item. We therefore assessed the methodological quality of prediction models using eleven items that we considered important, based on the literature [57, 61]. We assigned item scores of zero (Not), one (Partly) or two (Yes) points and calculated a total score for reporting and methodological quality. There were 39 items in the reporting score for a total of 78 possible points, and 11 items in the methodological score for a total of 22 possible points. All items are presented in appendix 2.B, table 2.8.
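The scoring rule described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the helper names and the example ratings are hypothetical and do not correspond to any study in this review; only the point values (0, 1, 2) and the item counts (39 and 11) come from the text.

```python
# Points per rating, as described in the text: Not = 0, Partly = 1, Yes = 2.
POINTS = {"n": 0, "p": 1, "y": 2}


def total_score(ratings):
    """Sum the points for a sequence of 'y'/'p'/'n' item ratings."""
    return sum(POINTS[r] for r in ratings)


def max_score(n_items):
    """Maximum attainable score when every item is rated 'y'."""
    return n_items * POINTS["y"]


# Hypothetical ratings for one model on the 39 reporting items
# (made up for this example, not taken from table 2.8).
example_ratings = ["y"] * 25 + ["p"] * 6 + ["n"] * 8

reporting_total = total_score(example_ratings)
print(f"reporting score: {reporting_total}/{max_score(39)}")
print(f"methodological maximum: {max_score(11)}")
```

With 39 reporting items and 11 methodological items, the maxima are 78 and 22 points, which matches the denominators used when the scores are reported.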


2.2.3 Definition of utility in predicting ICU length of stay

In order to assess the models we defined four requirements for prediction models in advance. The first three were applicable to models for all purposes. We regarded a model as suitable if: a) the parameters required to predict ICU length of stay have been published; b) it does not include any organizational characteristics; and c) it has a low level of bias, demonstrated by at least 'moderate' calibration [62]. Moderate calibration is achieved if the mean observed values equal the mean predicted values for groups of patients with similar predictions [62]. Moderate calibration is important when benchmarking to avoid distortion of benchmarks if mean predicted ICU length of stay differs greatly between hospitals. Our fourth requirement was that a model produces accurate predictions. Our definition of 'accurate predictions' is different for models used to plan resources or identify patients with unexpectedly long ICU length of stay and for models used for case-mix correction in benchmarking. We regard a model as suitable for planning resources or identifying patients with unexpectedly long ICU length of stay if it has a squared Pearson's correlation coefficient (R2) of at least 0.36 (strong) across patients. We regard a model as suitable for case-mix correction in benchmarking if it has an R2 of at least 0.36 (strong) across ICUs [63].
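The two accuracy criteria and the calibration criterion can be made concrete with a small sketch. It assumes that R2 across ICUs is computed as the squared Pearson correlation between the mean observed and mean predicted length of stay per ICU, which is our reading of the definition rather than something stated explicitly above. All function names and the toy data are hypothetical; only the 0.36 threshold comes from the text.

```python
from collections import defaultdict

STRONG_R2 = 0.36  # threshold for a "strong" R2 used in this review [63]


def pearson_r2(x, y):
    """Squared Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)


def mean_per_icu(icu_ids, values):
    """Mean of `values` for each ICU identifier."""
    grouped = defaultdict(list)
    for icu, value in zip(icu_ids, values):
        grouped[icu].append(value)
    return {icu: sum(v) / len(v) for icu, v in grouped.items()}


def r2_across_icus(icu_ids, observed, predicted):
    """R2 between per-ICU mean observed and mean predicted values (assumed definition)."""
    obs = mean_per_icu(icu_ids, observed)
    pred = mean_per_icu(icu_ids, predicted)
    icus = sorted(obs)
    return pearson_r2([obs[i] for i in icus], [pred[i] for i in icus])


def calibration_groups(observed, predicted, n_groups):
    """(mean predicted, mean observed) per group of patients with similar predictions."""
    pairs = sorted(zip(predicted, observed))
    groups = []
    for g in range(n_groups):
        chunk = pairs[g * len(pairs) // n_groups:(g + 1) * len(pairs) // n_groups]
        groups.append((sum(p for p, _ in chunk) / len(chunk),
                       sum(o for _, o in chunk) / len(chunk)))
    return groups


# Toy data in days for nine admissions to three ICUs (hypothetical values).
icu_ids = ["A", "A", "A", "B", "B", "B", "C", "C", "C"]
observed = [1.0, 2.0, 9.0, 2.0, 3.0, 4.0, 1.0, 1.5, 3.5]
predicted = [2.0, 3.0, 4.0, 2.5, 3.0, 3.5, 1.5, 2.0, 2.5]

r2_patients = pearson_r2(observed, predicted)
r2_icus = r2_across_icus(icu_ids, observed, predicted)
print("strong across patients:", r2_patients >= STRONG_R2)
print("strong across ICUs:", r2_icus >= STRONG_R2)
for mean_pred, mean_obs in calibration_groups(observed, predicted, 3):
    print(f"mean predicted {mean_pred:.2f}, mean observed {mean_obs:.2f}")
```

Moderate calibration would correspond to the last loop printing pairs that are close to each other in every group; how close is close enough is a judgment this sketch does not make.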

2.3 Results

We identified 3,818 unique articles in Embase and Medline and selected 25 (0.7%) based on the title and abstract. Following inspection of the full text, we included 13 (52.0%) studies and identified one additional study from the references of previously included papers [32]. We excluded three studies [29, 37, 64], because a more recent version of the model was available [28]. Hence, we included 11 studies describing the development of 31 models [25, 26, 28, 32, 55, 56, 65–69]. We present the inclusion process in figure 2.1.

Most studies presented one model, but four [25, 26, 55, 56] each presented between two and 12 models. Three studies [26, 56, 67] described the external validation and one a second order recalibration [26] of the Acute Physiology And Chronic Health Evaluation (APACHE) IV [28] model for ICU length of stay. We present information on geographical location, time period of data collection, observed mean and median ICU length of stay, number of parameters estimated, and numbers of ICUs and included patients in table 2.1. Data were collected between 1995 [68] and 2011 [56] over periods of one month [68] to eleven years [66] in Europe [55, 56, 68], the USA [26, 28, 67], South America [65], Egypt [69] and worldwide [32]. The data came from 253 [69] to 178,503 [66] admissions to three [69] to 275 [32] ICUs. The number of parameters estimated ranged from one [32, 65] to 151 [56] and the average number of admissions per parameter estimated from 82 [68] to 16,560 [32].
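The range of admissions per estimated parameter quoted above follows directly from the admissions and parameter counts listed in table 2.1. A minimal sketch of that arithmetic, with the counts transcribed from table 2.1 and hypothetical variable names:

```python
# (number of included admissions, number of parameters estimated), from table 2.1.
table_2_1 = {
    "Clermont, 2004 [68]": (989, 12),
    "Perez, 2006 [65]": (1528, 1),
    "Zimmerman, 2006 [28]": (69652, 131),
    "Rothen, 2007 [32]": (16560, 1),
    "Moran, 2008 [66]": (178503, 79),
    "Moran, 2012, model 1-12 [25]": (89330, 26),
    "Niskanen, 2009, model 1-2 [55]": (37718, 9),
    "Vasilevsiks, 2009, model 2 [26]": (6684, 24),
    "Vasilevsiks, 2009, model 3 [26]": (6684, 2),
    "Kramer, 2010 [67]": (12640, 106),
    "Al Tehewy, 2010 [69]": (253, 1),
    "Verburg, 2014, model 1-8 [56]": (32667, 151),
}

# Average number of admissions per parameter estimated, per study.
per_parameter = {ref: n / k for ref, (n, k) in table_2_1.items()}

lowest = min(per_parameter, key=per_parameter.get)
highest = max(per_parameter, key=per_parameter.get)
print(lowest, round(per_parameter[lowest]))
print(highest, round(per_parameter[highest]))
```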


Studies reviewed based on title and abstract: 3,818
↓
Review full text for relevance: 25
↓
Articles included based on full text: 15
Articles included from references listed by articles included: 2
↓
Articles eligible for review: 17 (Model development: 15; Model validation: 12)

Figure 2.1: Flowchart representing the result of search and the number of articles excluded and eligible for review.

We present patient exclusion criteria and candidate predictors considered for inclusion in table 2.2 and table 2.3. Studies excluded patients based on the observed value of ICU length of stay, such as incomplete or unknown ICU length of stay, ICU length of stay less than four [25, 26, 28, 37, 56, 64, 66, 67] or six [55] hours, ICU length of stay greater than 48 hours [68] or 60 days [25, 66]. Three studies included ICU organizational characteristics, such as geographic location, ICU level, number of beds or teaching hospital status [25, 55, 66].

In table 2.4, we present how researchers handled patients who were readmitted to the ICU, transferred between ICUs, underwent cardiac surgery or died before ICU discharge. The researchers' strategies for handling readmissions were: including only a patient's first admission to the ICU [25, 26, 28, 32, 56, 66]; including readmission as predictor in the model [67]; defining ICU length of stay as the sum of the lengths of stay of all of a patient's ICU admissions [65]; and including readmissions as separate data records [55]. Three studies excluded patients based on transfer to [56, 67] or from another ICU [28, 56, 67] and six excluded one or more groups of cardiac surgery patients [26, 28, 55, 56, 67, 69]. Researchers handled patients whose ICU stay ended in death by: including death as a predictor [25, 55, 66], excluding patients dying within an hour of ICU admission [69] or developing different models for ICU survivors and non-survivors [56].


Table 2.1: Country, period of data collection, outcome values, number of predictors and sample size for model development studies.

Reference | Data collection country | Data collection period | Number of ICUs included | Percentage of patients excluded | Number of included admissions | Mean (median) ICU length of stay (days) | Number of parameters estimated
Clermont, 2004 [68] | 11 European countries¹ | 1995 | 49 | - | 989 | - | 12
Perez, 2006 [65] | Colombia | 1997 to 1998 | 20 | 6% | 1,528 | 20 (13.0)² | 1
Zimmerman, 2006 [28] | USA | 2002 to 2003 | 104 | 12% | 69,652 | 3.9 (2.0) | 131
Rothen, 2007 [32] | 35 countries³ | 2002 | 275 | - | 16,560 | 2.0 | 1
Moran, 2008 [66] | Australia and New Zealand | 1993 to 2003 | 99 | 12% | 178,503 | 3.6 (2.9) | 79
Moran, 2012, model 1-12 [25] | Australia and New Zealand | 2008 to 2009 | 131 | 4% | 89,330 | 3.6 (2.9) | 26
Niskanen, 2009, model 1-2 [55] | Finland | 2000 to 2005 | 23 | 22% | 37,718 | 3.9 (2.0) | 9
Vasilevsiks, 2009, model 2 [26]⁴ | USA | 2001 to 2004 | 35 | - | 6,684 | 4.0 (2.0) | 24
Vasilevsiks, 2009, model 3 [26] | USA | 2001 to 2004 | 35 | - | 6,684 | 4.0 (2.0) | 2
Kramer, 2010 [67] | USA | 2002 to 2007 | 83 | 10% | 12,640 | 4.2 (2.14)⁵ | 106
Al Tehewy, 2010 [69] | Egypt | 2004 to 2005 | 3 | 42% | 253 | 5.1 (4.0) | 1
Verburg, 2014, model 1-8 [56] | Netherlands | 2011 | 83 | 1%⁶ | 32,667 | 4.2 (1.7) | 151

¹ Unspecified countries participating in the European Society of Intensive Care Medicine (ESICM).
² Hospital days after first day of ICU admission.
³ List of 35 countries (47): Argentina; Australia; Austria; Belgium; Brazil; Bulgaria; Canada; Cuba; Czech Republic; Denmark; France; Germany; Greece; Hong Kong; Hungary; India; Ireland; Israel; Italy; Luxembourg; Mexico; Netherlands; Norway; Poland; Portugal; Russian Federation; Serbia; Slovakia; Slovenia; Spain; Sweden; Switzerland; Turkey; United Kingdom; United States.
⁴ Vasilevsiks, 2009 model 2: covariates included from Mortality Probability Models (MPM0) III; model 3: covariates included from Simplified Acute Physiology score (SAPS) II model.
⁵ Mean predicted remaining ICU stay after day 5 was 6.87 for the subgroup of admissions with ICU length of stay longer than five days.
⁶ Percentage of excluded admissions after applying the APACHE IV inclusion criteria.


Table 2.2: Summary of patient exclusion criteria, for development studies.

Columns: Reference; Based on outcome; Based on age; ICU readmissions; Transfers from another ICU; Transfers to another ICU; Survival status; Burns; Cardiac surgery; Other¹

Clermont, 2004 [68]: x
Perez, 2006 [65]: x x
Zimmerman, 2006 [28]: x x x x x x x x
Rothen, 2007 [32]: x
Moran, 2008 [66]: x x x x
Moran, 2012, model 1-12 [25]: x x x x
Niskanen, 2009, model 1 [55]²: x x x x x
Niskanen, 2009, model 2 [55]:
Vasilevsiks, 2009, model 2 [26]³:
Vasilevsiks, 2009, model 3 [26]:
Kramer, 2010 [67]: x x x x x x
Al Tehewy, 2010 [69]: x x x x x
Verburg, 2014, model 1-8 [56]: x x x x x x x x

¹ Other subgroups consist of severity of illness; admission type; admission diagnosis; unknown discharge location or date; trauma; dialysis; and unknown Glasgow Coma Score.
² Niskanen, 2009 model 1: outcome measure truncated at 30 days; model 2: log-transformed outcome measure used.
³ Vasilevsiks, 2009 model 2: covariates included from Mortality Probability Models (MPM0) III; model 3: covariates included from Simplified Acute Physiology score (SAPS) II model.


Table 2.3: Summary of predictor variables included in the models, for development studies.

Columns: Reference; Overall severity of illness¹; Admission source; Age; Mechanical ventilation; Glasgow Coma Score; Comorbidities; Hospital length of stay before ICU admission; Organizational; Country specific; Interaction terms²; Other predictors

Clermont, 2004 [68]: x x x x
Perez, 2006 [65]: x
Zimmerman, 2006 [28]: x x x x x
Rothen, 2007 [32]: x
Moran, 2008 [66]: x x x x x x x
Moran, 2012, model 1-12 [25]: x x
Niskanen, 2009, model 1 [55]³: x x x x x
Niskanen, 2009, model 2 [55]: x x x x x
Vasilevsiks, 2009, model 2 [26]⁴: x x x x x
Vasilevsiks, 2009, model 3 [26]: x
Kramer, 2010 [67]: x x x x x x x x
Al Tehewy, 2010 [69]: x
Verburg, 2014, model 1-8 [56]: x x x x x x x

¹ Overall severity of illness consists of Acute Physiology Score (APS); Acute Physiology and Chronic Health Evaluation (APACHE) II score; APACHE III score; Simplified Acute Physiology Score (SAPS) II score; SAPS III score; SAPS probability of mortality; and Mortality Probability Models (MPM).
² Interactions with age; APACHE III score; Therapeutic Intervention Scoring System (TISS) score; SAPS II score; mechanical ventilation; gender; calendar year; hospital type; ICU death; and yearly number of admissions.
³ Niskanen, 2009 model 1: outcome measure truncated at 30 days; model 2: log-transformed outcome measure used.
⁴ Vasilevsiks, 2009 model 2: covariates included from Mortality Probability Models (MPM0) III; model 3: covariates included from Simplified Acute Physiology score (SAPS) II model.


Table 2.4: The handling of subgroups of patients in each of the models for predicting ICU length of stay.

Reference | Treatment of patients readmitted to the ICU | Exclusion of patients transferred between ICUs | Exclusion of patients admitted to the ICU following cardiac surgery¹ | Patients who died before ICU discharge
Clermont, 2004 [68] | - | - | - | -
Perez, 2006 [65] | Summed over² | - | - | Death as one of the states
Zimmerman, 2006 [28] | Excluded | From another ICU | CABG | -
Rothen, 2007 [32] | Excluded | - | - | -
Moran, 2008 [66] | Excluded | - | - | Predictor: death
Moran, 2012, model 1-12 [25] | Excluded | - | - | Predictor: death
Niskanen, 2009, model 1-2 [55] | Included | - | Open heart | Predictor: death
Vasilevsiks, 2009 model 2 [26]³ | Excluded | - | CABG | -
Vasilevsiks, 2009 model 3 [26] | Excluded | - | CABG | -
Kramer, 2010 [67] | Predicted | To and from another ICU | CABG | -
Al Tehewy, 2010 [69] | - | - | CABG; cardiac valve or heart transplant | Excluded: deaths within first hour; deaths cardiopulmonary arrest within 4 hours
Verburg, 2014, model 1-8 [56] | Excluded | To and from another ICU | CABG; all elective surgery | Model for whole group and separate models for survivors and non-survivors

¹ CABG = coronary artery bypass grafting; elective = elective surgery.
² The sum of ICU length of stay of all admissions of a patient.
³ Vasilevsiks, 2009 model 2: covariates included from Mortality Probability Models (MPM0) III; model 3: covariates included from Simplified Acute Physiology score (SAPS) II model.


Two studies related ICU length of stay to in-hospital mortality [32, 55] and two presented expected and observed ICU length of stay separately for survivors and non-survivors [28, 67]. Five studies did not account for mortality in their model.

We present results on model evaluation and performance reported by the original authors in table 2.5. The authors used a random sample from the development data set [25, 26, 28, 55, 65–67], bootstrap methods [56] or data from different time periods [69] or ICUs [68]. The total number of admissions for validation ranged from 460 [68] to 46,517 [28] and the average number per parameter from 35 [26] to 2,843 [55]. The R2 ranged from 0.05 [26, 69] to 0.28 [55] across patients and from 0.01 [26] to 0.64 [55] across ICUs. Differences between the mean observed and mean predicted ICU length of stay ranged between 0.01 and 4.7 days and were not statistically significant for seven models. Two studies tested these differences for subgroups of the covariates included in their model [26, 28, 70]. Four studies presented a calibration curve [26, 28, 55, 67], but none presented regression coefficients.

Table 2.6 describes the suitability of models according to our predefined requirements. No models met all of our requirements for models for planning the number of beds and members of staff required or identifying individual patients or groups of patients with unexpectedly long ICU length of stay to drive direct quality improvement. The APACHE IV model [28] and a second order recalibration of this model with updated model parameters [26] fulfilled most of our requirements for models for benchmarking, presented in table 2.6. However, the requirement for moderate calibration was not fulfilled.

We present external validation studies on the APACHE IV model in appendix 2.C, table 2.9. The number of included admissions ranged between 4,611 and 32,667 admissions to between 35 and 83 ICUs in the USA and the Netherlands between 2001 and 2011. The average number of patients per parameter was between 35 and 249. The R2 was moderate (0.16 to 0.18) across patients and strong (0.43 to 0.44) across ICUs. The difference in days between the observed and predicted mean length of stay was larger than for the internal validation of the model.

We present the scores assigned to the methodological quality items in appendix 2.B, table 2.8. The overall reporting quality scores ranged from 49 [69] to 60 [55], with a median score of 55. The overall methodological quality scores ranged from 12 [25, 32, 66, 68] to 16 [65] points, with a median score of 14. Further items extracted from each study are described in appendix 2.D, table 2.10.


Table 2.5: Validation of prediction model performance by the original authors.

Reference | Validation method | Number of patients included in validation set | R2 across ICUs (validation set) | R2 across patients (validation set) | Difference mean observed and mean predicted (validation set) in days (bias) | Recalibration plot presented
Clermont, 2004 [68] | 12 other ICUs | 460 | - | - | 0.50¹ | No
Perez, 2006 [65] | Random sample (pps²) of 50% stratified by ICM code³ | 1,531 | - | - | 0.01 to 6.69¹˒⁴ | No
Zimmerman, 2006 [28] | Simple random sample of 40% | 46,517 | 0.62 | 0.22 | 0.08¹ | Yes⁵
Rothen, 2007 [32] | - | - | 0.40⁶ | - | - | No
Moran, 2008 [66] | Random sample (pps²) of 20% stratified by year of admission | 44,625 | - | 0.18 | -⁷ | No
Moran, 2012, model 1-12 [25] | Random sample (pps²) of 20% stratified by year of admission | 22,333 | - | 0.18 to 0.20 | 0.2 to 4.7 | Yes
Niskanen, 2009, model 1 [55]⁸ | Simple random sample of 40% | 25,586 | 0.57 | 0.27 | 0.01¹ | Yes⁵
Niskanen, 2009, model 2 [55] | Simple random sample of 40% | 25,586 | 0.64 | 0.28 | 0.76 | Yes
Vasilevsiks, 2009, model 2 [26]⁹ | Simple random sample of 40% | 4,611 | 0.28 | 0.10 | 0.01¹ | Yes⁵
Vasilevsiks, 2009, model 3 [26] | Simple random sample of 40% | 4,611 | 0.01 | 0.05 | 0.02¹ | Yes
Kramer, 2010 [67] | Simple random sample of 50% and different time period | 12,904 | 0.43⁶ | 0.18⁶ | 0.02¹ and 0.61¹⁰ | Yes
Al Tehewy, 2010 [69] | Two ICUs and different time period | - | - | 0.05 | - | No
Verburg, 2014, model 1-8 [56] | Bootstrap (100x) | 32,667 | - | 0.09 to 0.15 | - | Yes

¹ Difference tested as not significantly different from 0 (t-test, Mann-Whitney U test or Chi2 goodness-of-fit test).
² pps = probability proportional to size.
³ ICM = Intensive Care National Audit and Research Centre coding method.
⁴ Presented per day, per system: 0.01 (1 day) to 6.69 (30 days), not significant.
⁵ Recalibration reported as accurate, good or perfect, appendix 2.C, table 2.9.
⁶ Performance based on the development set, and presented as a median.
⁷ Figure 1 presents raw and predicted mortality.
⁸ Niskanen, 2009 model 1: outcome truncated at 30 days; model 2: log-transformed outcome measure used.
⁹ Vasilevsiks, 2009 model 2: covariates included from Mortality Probability Models (MPM0) III; model 3: covariates included from Simplified Acute Physiology score (SAPS) II model.
¹⁰ Performance reported for admissions with ICU LoS > 5 days. Difference was 0.02 for internal and 0.61 for external validation.


Table 2.6: Suitability of models predicting ICU LoS for planning resources; identifying individual patients or groups of patients with unexpectedly long ICU length of stay; and benchmarking, based on the defined requirements.

Reference | Model parameters published | No organizational or country specific predictors | Only predictors at admission or first 24 hours included | Moderate calibration (for all subgroups) | Strong R2 across ICUs | Strong R2 across patients | Suitable for planning and identifying long ICU length of stay | Suitable for benchmarking
Clermont, 2004 [68] | Yes | Yes | No | No | No | No | No | No
Perez, 2006 [65] | Yes | Yes | Yes | No | No | No | No | No
Zimmerman, 2006 [28] | Yes | Yes | Yes | No¹ | Yes | No | No | No
Rothen, 2007 [32] | Yes | Yes | Yes | No | No | No | No | No
Moran, 2008 [66] | Yes | No | Yes | No | No | No | No | No
Moran, 2012, model 1-12 [25] | No | No | Yes | No | No | No | No | No
Niskanen, 2009, model 1 [55]² | Yes | No | Yes | No | Yes | No | No | No
Niskanen, 2009, model 2 [55] | Yes | No | Yes | No | Yes | No | No | No
Vasilevsiks, 2009, model 2 [26]³ | Yes | Yes | Yes | No¹ | No | No | No | No
Vasilevsiks, 2009, model 3 [26] | Yes | Yes | Yes | No¹ | No | No | No | No
Kramer, 2010 [67] | Yes | Yes | No | No | Yes | No | No | No
Al Tehewy, 2010 [69] | Yes | Yes | Yes | No | No | No | No | No
Verburg, 2014, model 1-8 [56] | No | Yes | Yes | No | No | No | No | No

¹ Moderate recalibration was tested and differences were not significant for several subgroups.
² Niskanen, 2009 model 1: outcome measure truncated at 30 days; model 2: log-transformed outcome measure used.
³ Vasilevsiks, 2009 model 2: covariates included from Mortality Probability Models (MPM0) III; model 3: covariates included from Simplified Acute Physiology score (SAPS) II model.


2.4 Discussion

In this systematic review, we focussed on the utility of models for predicting ICU length of stay and assessed their suitability for planning ICU resources, identifying unexpectedly long ICU length of stay, and benchmarking ICUs. We included eleven studies on model development and three studies externally validating the APACHE IV model [28]. We concluded that no models fulfilled all of our requirements for planning ICU resources or identifying patients with unexpectedly long ICU length of stay. The original APACHE IV model [28] and a second order recalibration of it [26] fulfilled most of our requirements for benchmarking. However, these models did not fulfill our requirement for moderate calibration. As no models fulfilled our requirements, physicians choosing to use them to predict ICU length of stay should interpret the predictions with reservation. Benchmarking based on incorrect predictions can have large consequences, especially when benchmarking results are published and those without specialist statistical knowledge use them to judge hospitals. As patient characteristics in individual hospitals over time may remain more similar than between hospitals, benchmarking ICU length of stay to an historical benchmark may be acceptable with these models.

In addition, as healthcare and hospital policies differ between countries and over time, we recommend validating a model using recent local data before using it to predict ICU length of stay for individual patients or to benchmark hospitals on ICU length of stay.

Four main aspects of reporting and methodological quality in studies reporting on the development of prediction models for ICU length of stay could be improved. The first is the exclusion of patients based on observed ICU length of stay. Excluding a few extreme values might enable researchers to obtain a model with a reasonable fit for the majority of patients. However, sub-optimal patient care could lead to prolonged ICU length of stay and, when benchmarking, truncating ICU length of stay or excluding patients with prolonged ICU length of stay can lead to biased performance results [71]. The second is the handling of ICU non-survivors. The association between severity of illness and ICU length of stay differs for ICU survivors and non-survivors [56]. It seems sensible to develop different models for these two groups. However, higher mortality rates could lead to shorter average ICU length of stay, and it is undesirable to reduce ICU length of stay at the cost of increasing mortality. The third is the choice of outcome measure: for benchmarking, composite indicators incorporating information on ICU length of stay and mortality may be preferable to length of stay as a single outcome measure. The fourth is the use of ordinary linear regression to predict ICU length of stay by some researchers. This can lead to predictions of ICU length of stay which are negative and, as such, conceptually incorrect [56]. Logarithmic or other transformations of ICU length of stay or other regression models can overcome these problems.

Our study has two main strengths over previous reviews of prediction models for ICU length of stay. Firstly, this is the first systematic review of ICU length of stay prediction models for the general ICU population. Secondly, we systematically assessed the studies in light of three clinical applications and using established frameworks [57, 61, 72, 73]. Previously, researchers compared the Acute Physiology and Chronic Health Evaluation (APACHE) IV [28], mortality probability [74] and simplified acute physiology score [75] models and found that the APACHE IV scoring system is most frequently used to predict ICU length of stay [28]. A systematic review in cardiac surgery patients identified several models for unexpectedly long ICU length of stay [76], but we found no studies which defined ICU length of stay in this way for the general ICU population. This may be because cardiac surgery is performed in relatively small numbers of specialist hospitals and patients generally have a shorter and less variable ICU length of stay than other patients.
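The point made earlier in this discussion, that ordinary linear regression can produce negative predicted ICU length of stay whereas a logarithmic transformation cannot, can be illustrated with a small sketch. The data are invented for the illustration (a single hypothetical severity score and right-skewed lengths of stay), and the one-predictor least-squares fit is a simplification of the models reviewed here, not a reproduction of any of them.

```python
import math


def fit_line(x, y):
    """Ordinary least squares with one predictor; returns (intercept, slope)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
             / sum((a - mean_x) ** 2 for a in x))
    return mean_y - slope * mean_x, slope


# Hypothetical severity scores and right-skewed ICU lengths of stay in days.
severity = [10, 20, 30, 40, 50, 60]
los_days = [0.5, 0.6, 0.8, 1.5, 4.0, 12.0]

# Model 1: linear regression on the untransformed length of stay.
a_raw, b_raw = fit_line(severity, los_days)

# Model 2: linear regression on log(length of stay), back-transformed with exp.
# The naive back-transformation ignores retransformation bias; it is used here
# only to show that the prediction stays positive.
a_log, b_log = fit_line(severity, [math.log(v) for v in los_days])


def predict_raw(score):
    return a_raw + b_raw * score


def predict_log(score):
    return math.exp(a_log + b_log * score)


for score in (10, 60):
    print(score, round(predict_raw(score), 2), round(predict_log(score), 2))
```

For the lowest severity score the untransformed model predicts a negative number of days, while the back-transformed prediction from the log-scale model is positive by construction.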

Our study also has five main weaknesses. Firstly, we did not require studies to have a minimum sample size. Little consensus exists on the sample size required to prevent overfitting when constructing a prediction model for a continuous variable [77–79]. The minimum mean number of admissions per predictor is 82, and hence sufficient, according to the sparse literature. Secondly, we did not place restrictions on the period in which the data were collected, when in the ICU stay patient characteristics were measured or the point from which ICU length of stay was measured. Two studies report on data collected in or before 1999 [65, 68], two [67, 68] used characteristics that changed over time and one [67] predicted prolonged ICU length of stay beyond a threshold value. These models may not be suitable for current implementation, but throw light on patient characteristics that should be included in prediction models for ICU length of stay. Although these models are substantially different to the other models included in this review, we believe that their inclusion does not influence our final conclusions because we do not recommend them for either purpose. Thirdly, we did not examine differences in performance according to whether ICU length of stay is defined in fractional or billing days. Fourthly, we only included studies that developed or validated prediction models for ICU length of stay. We did not include all studies evaluating associations between patient characteristics, such as Sequential Organ Failure Assessment (SOFA) score, physiological or laboratory values, and ICU length of stay. Fifthly, we did not consider the utility of models for the early identification of individuals with a high risk of excessively long ICU length of stay.

Constructing a good model for ICU length of stay would require specialized statistical and clinical knowledge [28, 71–73, 80, 81]. For instance, as several studies [68] have reported an association between daily SOFA scores and ICU length of stay, researchers could explore using these data for day-on-day planning. They could also examine novel statistical methods, such as joint modelling [82] and competing risks [83], to predict ICU length of stay [84–86]. However, we do not expect that these methods will result in substantially better models for predicting ICU length of stay if they only consider patient characteristics. We expect that data on hospital characteristics and ICU policies and practice will also be required. However, including these types of characteristics will make it more difficult to use the resulting models for benchmarking purposes.


2.5 Conclusion

No previously published models satisfy our three general requirements for prediction models for ICU length of stay, our specific requirements for models to plan resource allocation and to identify patients with unexpectedly long ICU length of stay, or our specific requirements for models for benchmarking purposes. Physicians considering using these models to predict ICU length of stay should interpret them with reservation until a validation study using recent local data has shown that they obtain moderate calibration and produce accurate predictions.


Appendix 2.A: Search query

Table 2.7: Search query

Intensive Care Unit: ICU; intensive care; critical care; critically ill
Admission duration: length of stay
Prognostic model: prognostic; prognose; predictive; prediction; predict; predictor

Ovid EMBASE and Ovid MEDLINE were searched from database inception until 31-10-2014: ("length of stay"[All Fields] AND (prognostic[All Fields] OR prognose[All Fields] OR predictive[All Fields] OR prediction[All Fields] OR predict[All Fields] OR predictor[All Fields])) AND ("intensive care"[All Fields] OR "critical care"[All Fields] OR "critically ill"[All Fields] OR ICU[All Fields])
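The structure of the query, three groups of synonyms joined with OR and then combined with AND, can be sketched as follows. The helper name `or_group` is hypothetical; the terms and the `[All Fields]` tag are copied from the query above. The resulting string groups its parentheses slightly differently from the query above but is logically equivalent, since AND is associative.

```python
def or_group(terms, field="All Fields"):
    """Join search terms with OR, quoting multi-word phrases."""
    parts = [f'"{t}"[{field}]' if " " in t else f"{t}[{field}]" for t in terms]
    return "(" + " OR ".join(parts) + ")"


# The three sub-queries of table 2.7.
icu_terms = ["intensive care", "critical care", "critically ill", "ICU"]
los_terms = ["length of stay"]
prediction_terms = ["prognostic", "prognose", "predictive",
                    "prediction", "predict", "predictor"]

# Combine the sub-queries with AND.
query = " AND ".join(or_group(g) for g in (los_terms, prediction_terms, icu_terms))
print(query)
```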


Appendix 2.B: Adopted domains and (key) items of the used CHARMS [57] checklist

Table 2.8: Adopted domains and (key) items of the used checklist [57] accompanied with the reporting and methodological score (1 of 5). The first 14 study columns are model development studies; the last three are model validation studies.

Item | Clermont, 2004 [68] | Perez, 2006 [65] | Zimmerman, 2006 [28] | Rothen, 2007 [32] | Moran, 2008 [66] | Moran, 2012 m1-12 [25] | Niskanen, 2009 m1 [55] | Niskanen, 2009 m2 [55] | Vasilevsiks, 2009 m1 [26] | Vasilevsiks, 2009 m2 [26] | Vasilevsiks, 2009 m3 [26] | Kramer, 2010 m1 [67] | Al Tehewy, 2010 [69] | Verburg, 2014 m1-8 [56] | Vasilevsiks, 2009 [26] (validation) | Kramer, 2010 [67] (validation) | Verburg, 2014 [56] (validation) | Total score key item
Source of data¹ | y | y | p | p | y | y | y | y | y | y | y | y | y | n | y | y | n | 28
Participants:
Participant eligibility and recruitment method² | y | y | y | y | y | y | y | y | y | y | y | y | p | y | y | y | y | 33
Participant description | p | y | y | p | y | y | y | y | p | p | p | y | p | y | p | y | y | 27
Study dates | y | y | y | y | y | y | y | y | y | y | y | y | y | y | y | y | y | 34
Outcome(s) to be predicted:
Definition and method for measurement of outcome | p | p | y | p | y | y | n | n | y | y | y | y | p | y | y | y | y | 26
Was the same outcome definition used in all patients? | y | y | y | y | y | y | y | y | y | y | y | y | y | y | y | y | y | 34
Type of outcome | y | y | y | y | y | y | y | y | y | y | y | y | y | y | y | y | y | 34
Were candidate predictors part of the outcome? | y | y | y | y | y | y | y | y | y | y | y | y | y | y | y | y | y | 34
Spread is reported for primary outcome measure | n | n | n | n | y | y | y | y | n | n | n | n | n | y | n | n | y | 12
Candidate predictors:
Number and type of predictors | y | y | y | y | y | y | y | y | y | y | y | y | y | y | y | y | y | 34
Definition and method for measurement of predictors | y | y | y | y | y | p | y | y | y | y | y | y | y | y | | | | 27
Timing of predictor measurement | y | y | y | y | y | y | y | y | y | y | y | y | y | y | y | y | y | 34
Handling of predictors in the modelling | y | y | y | y | y | y | y | y | y | y | y | y | y | y | y | y | y | 34

y = fulfilled (yes); n = not fulfilled (no); p = partly fulfilled (partly); u = unknown, not mentioned

(44)

2

2

Prognostic models for ICU length of stay: a systematic review

T able 2.8: A dop ted domains and (k ey) items of the used chec klist [57] accompanied with the rep orting-and metho dological score -(2 of 5). Mo del dev elopmen t Mo del v alidation Clermont, 2004[68] Perez, 2006[65] Zimmerman,2006 [28] Rothen,2007 [32] Moran,2008 [66] Moran,2012 m1-12[25] Niskanen,2009 m1[55] Niskanen,2009 m2[55] Vasile vsiks,2009 m1[26] Vasilevs iks,2009 m2[26] Vasilevsi ks,2009 m3[26] Kramer,2010 m1[67] AlT ehewy, 2010[69] Verbur g,2014 m1-8[56] Vasile vsiks,2009 [26] Kramer,2010 [67] Verbur g,2014 [56] Total scorek eyitem Sample size Num b er of participan ts and outcomes/ev en ts y y y y y y y y y y y y p y y y y 33 Num b er of outcomes/ev en ts in relation to the n um b er of pred ictors (Ev en ts P er V ariable) 1 p y p y p p p p p p p p p p p p p 19 Missing data Num b er of participan ts with an y missing v alue y n n n n n n n y y y n y y n n y 14 Num b er of participan ts with missing data for eac h predictor p n n n n n n n p p p n p y p n y 10 Handling of missing data y n n n y y y y y y y y y y y y y 28 Mo del development Mo delling metho d y y y y y y y y y y y y y y y y y 34 Mo delling assumptions satisfied n y p n p p p p n n n p n p n p p 11 Metho d for selection of pre dictors for inclusion p n y y p p y y y y y y n y 21 Initial predictors/v ariables are rep orted 2 y y y y y y y y y y y y y y 28 Metho d for selection of pr edictors and criteria used y y y y n p y y y y y y y y 25 Shrinkage of predictor w eigh ts or regression co efficien ts n n n n n n n n n n n n n n 0 Rep orting of mo del deriv ati on and calibration pro cess is sufficien t for the results to b e repro duced 2 p p y y p p y y y y y y p n 21 y=fulfilled (y es); n=not fulfilled (no ); p=partly fulfilled (partly); u=unkno wn, not men tioned

(45)

Chapter 2 T able 2.8: A d opted domains and (k ey) items of the used chec klist [57] accompanied with the rep orting-and metho dological score -(3 of 5). Mo del dev elopmen t Mo del v alidation Key items Clermont, 2004[68] Perez, 2006[65] Zimmerman,2006 [28] Rothen,2007 [32] Moran,2008 [66] Moran,2012 m1-12[25] Niskanen,2009 m1[55] Niskanen,2009 m2[55] Vasilevsiks ,2009 m1[26] Vasilevs iks,2009 m2[26] Vasilevs iks,2009 m3[26] Kramer,2010 m1[67] AlT ehewy, 2010[69] Verbu rg,2014 m1-8[56] Vasilevsi ks,2009 [26] Kramer,2010 [67] Verbu rg,2014 [56] Total scorek eyitem Hand ling sp ecific p atient sub gr oups 3 Readmissions 2 n y y y y y y y y y y y n y y y y 30 T ransf ers 2 n n y y y y y y y y y y n y y y y 28 Non-surviv ors 2 n y n n y y y y n n n n y y n n y 16 Cardiac surgery 2 n n y n n n y y y y y y y y y y y 24 Mo del p erformanc e Calibration and Discrimination y y y n y y y y y y y y y y y y y 32 Measures with confidence in terv als n n n n n n y y n n n n n n n n n 4 Classification measures and use of a-priori cut p oin ts n n n n n n n n n n n n n n n n n 0 Mo del evaluation Metho d used for testing mo del p erformance: dev elopmen t dataset only or separate external v alidation 1 y y y n y y y y y y y y y y y y y 32 In case of p o or v alidation, whether mo del w as adjusted or up dated n n n n n n n n n n n n n n 0 Public ation of the develop ed mo dels Final and other m ultiv ariable mo dels presen ted 1 y y y y y n n y y y y y y n 22 An y al ternativ e presen tation of the final prediction mo dels n n n n n n n n n n n n n n 0 Comparison of the distri bution of predictors y y n n n n n n y y y n n n y n n 12 y=fulfilled (y es); n=not fulfilled (no ); p=partly fulfilled (partly); u=unkno wn, not men tioned 34

(46)

2

2

Prognostic models for ICU length of stay: a systematic review

T able 2.8: A dop ted domains and (k ey) items of the used chec klist [57] accompanied with the rep orting-and metho dological score -(4 of 5). Mo del dev elopmen t Mo del v alidation Clermont, 2004[68] Perez, 2006[65] Zimmerman,2006 [28] Rothen,2007 [32] Moran,2008 [66] Moran,2012 m1-12[25] Niskanen,2009 m1[55] Niskanen,2009 m2[55] Vasile vsiks,2009 m1[26] Vasilevs iks,2009 m2[26] Vasilevsi ks,2009 m3[26] Kramer,2010 m1[67] AlT ehewy, 2010[69] Verbur g,2014 m1-8[56] Vasile vsiks,2009 [26] Kramer,2010 [67] Verbur g,2014 [56] Total scorek eyitem Interpr etation and discussion of the eligible studi es In terpretation of p resen ted mo d els y y y y y y y y y y y y y y y y y 34 Comparison with other stud ies, discu ssion of generalizabilit y , strengths and limitations. y y y y y y y y n n n y y y n y y 26 Metho dolo gic al quality items Study consist of a cohort study or registry instead of a randomized design (s ource of data) y y y y y y y y y y y y y y y y y 34 Study consist of a prosp ectiv e study design (sour ce of data) y y u u n n u u n n n n p n n u 5 P atien ts are not excluded bas ed on outcome v ariable (participan ts) n y n y n n n n n n n n y n n n n 6 Selectiv e inclusion based on data a v ailabilit y did not to ok place (participan ts) y u u u n n n n n n n n n n n n n 2 sample size (n) in dev elopmen t set is sufficien t relativ e to the n um b er of v ariables in the final mo del (sample size) y y y y y y y y y y y y y y 28 y=fulfilled (y es); n=not fulfilled (no ); p=partly fulfilled (partly); u=unkno wn, not men tioned
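The total scores in Table 2.8 are consistent with a simple weighting: 2 points for y, 1 point for p, and 0 points for n or u. This weighting is inferred from the table rows and is an assumption; it is not stated in this appendix. A minimal sketch that recomputes the total for a few rows (the function name `total_score` is illustrative):

```python
# Assumed weights, inferred from the table totals rather than stated in the source.
WEIGHTS = {"y": 2, "p": 1, "n": 0, "u": 0}

def total_score(scores: str) -> int:
    """Sum the assumed weights over a space-separated row of y/p/n/u scores."""
    return sum(WEIGHTS[s] for s in scores.split())

# A few rows copied from Table 2.8: (scores in column order, total reported in the table).
rows = {
    "Source of data": ("y y p p y y y y y y y y y n y y n", 28),
    "Participant description": ("p y y p y y y y p p p y p y p y y", 27),
    "Modelling assumptions satisfied": ("n y p n p p p p n n n p n p n p p", 11),
    "Prospective study design": ("y y u u n n u u n n n n p n n u", 5),
}

for item, (scores, reported) in rows.items():
    print(f"{item}: computed {total_score(scores)}, reported {reported}")
```

Under this weighting, an item fulfilled by all 17 columns scores 34, and an item fulfilled by all 14 development models scores 28.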
