
The Predictive Effect of Influenza Activity on Hospitalization in the Netherlands


Academic year: 2021


Submitted in partial fulfillment for the degree of

Master of Science

Duncan Wesley Jansen

10015620 (UvA)

2545473 (VU)

Master Information Studies

Data Science

Faculty of Science

University of Amsterdam

June 2017

Internal Supervisor: Prof. dr. Ger Koole (affiliation: VU, FEW, CWI)

External Supervisor: Dennis Roubos, PhD (affiliation: Mathematical Developer, HOTflo Company)


This feasibility study, commissioned by HOTflo Company, provides an analysis and evaluation of the effect of influenza activity on emergency hospitalizations for five departments in the Netherlands, namely (1) Pulmonary Medicine, (2) Cardiology, (3) Internal Medicine, (4) Pediatrics, and (5) Geriatrics. Additionally, this study evaluates several predictive models to forecast influenza activity for the upcoming season and for the upcoming two weeks. The study is commissioned to examine whether influenza activity forecasts can be incorporated in a more pro-active predictive model for the number of patients present in the hospital for departments that are flu-dependent. In turn, improving the existing predictive model in the portfolio can aid hospitals with their capacity planning, potentially leading to improved workload perception, patient safety, and hospital efficiency.

The main findings of this study are:

■ A moderate correlation (0.355) exists between emergency hospitalizations unrelated to seasonality and trend-cycle for the Pulmonary Medicine department and influenza activity over the last three influenza seasons. A good correlation (0.498) exists over the last four influenza seasons. Emergency hospitalizations thus remain elevated, even when corrected for seasonality.

■ No significant correlation exists between emergency hospitalizations unrelated to seasonality and trend-cycle for the Cardiology department and influenza activity over the last three influenza seasons. A moderate and significant correlation (0.226) exists over the last four influenza seasons. Both series are, however, favorably aligned in space and time, indicating that there is an effect of influenza activity on emergency hospitalizations for cardiac patients.

■ A weak to moderate correlation (0.195) exists between emergency hospitalizations unrelated to seasonality and trend-cycle for the Geriatrics department and influenza activity over the last three influenza seasons. The correlation improves slightly (0.210) when an additional fourth season is added. The correlation (0.298 and 0.272 respectively) is significantly higher when only Influenza Type A is considered.

■ No significant correlation is found between emergency hospitalizations unrelated to seasonality and trend-cycle for the Pediatrics and Internal Medicine departments and influenza activity over both three and four influenza seasons. A significant weak to moderate correlation (0.159 and 0.167 respectively) is, however, found when only Influenza Type A is considered.

■ Influenza activity on Twitter over the last month is moderately correlated (0.354) with influenza activity, whereas aggregated flu-related Google searches from the previous week are highly correlated (0.722) with influenza activity. Google is therefore preferred over Twitter as Digital Influenza Surveillance indicator.

■ Utilizing social media data (Google and Twitter) increases forecasting accuracy significantly over predictive models that capture only intrinsic influenza timeseries information.

■ Ensemble machine learning based models outperform timeseries models in short-term forecasting, in both accuracy and direction, making these models useful for both tactical planning (2 weeks - 6 months forecast) and operational planning (1-2 weeks forecast).

■ With respect to interpretability of the predictive model, a Dynamic Regression Model is proposed that captures intrinsic influenza timeseries information, lagged Google data (one, three and four weeks lag), and Twitter data (four weeks lag). This model can serve as an early warning indicator for elevated influenza activity while providing accurate short-term forecasting values.
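The lagged correlations reported in these findings (for example, Google searches from the previous week against current influenza activity) can be illustrated with a minimal sketch. It assumes a plain Pearson correlation on weekly series; the function names and the sample values are hypothetical and are not taken from the study.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


def lagged_correlation(indicator, target, lag):
    """Correlate target[t] with indicator[t - lag]; lag is in weeks, >= 0."""
    if lag < 0:
        raise ValueError("lag must be non-negative")
    if lag == 0:
        return pearson(indicator, target)
    # Drop the last `lag` indicator values and the first `lag` target values
    # so that each target week is paired with the indicator `lag` weeks earlier.
    return pearson(indicator[:-lag], target[lag:])


# Made-up weekly values, for illustration only.
google = [1.0, 2.0, 4.0, 3.0, 5.0, 6.0]
flu = [0.0, 1.0, 2.0, 4.0, 3.0, 5.0]  # google shifted by one week
lag1 = lagged_correlation(google, flu, lag=1)
```

In this toy series the one-week lagged correlation is perfect because `flu` was constructed as a shifted copy of `google`; real indicator series would of course be noisier.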

HOTflo can use the results and evaluated models of this feasibility study to accurately estimate influenza activity for the upcoming 1-2 weeks by utilizing both publicly available influenza data and social media data. The predicted values can be used in HOTflo's existing autoregressive models for the prediction of the number of cardiac, respiratory and geriatric emergency patients present in the hospital. It is recommended to use the proposed Dynamic Regression Model for operational purposes since it offers a good trade-off between short-term forecasting accuracy and interpretability, increasing the chances of acceptance for clinical decision making and capacity management. Furthermore, the model can serve as an early warning system indicating when influenza activity as an exogenous variable adds explanatory power to the existing model.


The Predictive Effect of Influenza Activity on Hospitalization in the Netherlands

Duncan Wesley Jansen*

Abstract

Influenza epidemics place a strong burden on health services and resource occupation, especially for departments that receive a significant proportion of patients who are vulnerable to influenza activity. The effect of influenza activity on hospitalization can vary in size per department. In this study, a weak to moderate effect of elevated influenza activity is shown on Respiratory, Cardiac and Geriatric related hospitalizations. Since the onset, severity, amplitude and duration of elevated influenza activity vary widely from season to season, multiple predictive models have been evaluated and discussed that can accurately forecast influenza activity for the next two weeks in the Netherlands based on digital influenza indicators. Depending on the purpose of use, the forecasted influenza activity values can be used as an additional exogenous variable in more advanced multivariate regression models or timeseries models, improving the prediction accuracy over calendar variables.

Keywords

Digital Disease Detection — Tracking Influenza Activity — Influenza Forecasting — Influenza-related Hospitalization

*Contact: Duncan.jansen@student.uva.nl — +31(0)6 305 35 205

Contents

Context
1 Introduction
2 Related Work
2.1 Influenza-Associated Hospitalizations
2.2 Forecasting the Dynamics of Influenza Epidemics
3 Materials and Methods
3.1 Data Collection and Inspection
    Hospitalization • Influenza • Google • Twitter
3.2 Feature Engineering
    W/S Feature • Change Points
3.3 Forecasting models
    Benchmark • ARGO • Stacked Linear Regression • SVM Regression • Generalized Boosting Regression • Dynamic Regression
3.4 Evaluation
    Evaluation Metrics • Cross-validation • Evaluation Criteria
3.5 Research Tools
4 Results and Discussion
4.1 Effect of Influenza Activity on Hospitalizations
4.2 Forecasting Influenza Activity
5 Limitations and Implications
6 Conclusion and Recommendation
Acknowledgments
References
Appendices

Context

Sufficient resources for healthcare providers are a necessity for providing quality care. At the same time, resources can only be utilized at a certain cost. Over the last decades, developments and trends in the way healthcare is provided have required hospitals to ensure proper and efficient use of these resources. This is formalized as integrated capacity management.

HOTflo Company is a specialized service company that supports hospitals with capacity management through, among other things, predictive models for the number of patients present in the hospital. A disturbing factor that presumably contributes to noise in the existing predictive (timeseries based) models is influenza activity, more commonly known as the flu. Elevated influenza activity at the onset of a flu epidemic increases the demand for certain types of care. As a consequence, the utilization of available resources such as beds and staff rises as well. Since increased demand in healthcare cannot be ignored and care must be provided regardless of the level of demand, patients are often transferred to departments unrelated to the specialism they need. Switching to a more pro-active predictive model that incorporates external predictions can have a great effect on workload perception, patient safety and hospital efficiency.

Within this context, a feasibility study is conducted on the predictability of influenza activity for departments that are flu-dependent. The study is commissioned by HOTflo and is part of the graduation curriculum of the collaborative Data Science master offered at the University of Amsterdam and Vrije Universiteit van Amsterdam.


1. Introduction

In industrialized countries, the complexity of health conditions managed by healthcare professionals, inpatient bed shortages, increased patient volumes and long outpatient wait times for diagnostic investigations cause many challenges in healthcare systems. Consequently, these dysfunctionalities lead to unfavorable situations such as overcrowding and increased waiting times in hospitals [1]. Since overcrowding and increased waiting times lead to inefficiencies in healthcare systems, the ability to predict patient visits and the fluctuations in visits is crucial in designing strategies to address these issues.

Several predictive models have been proposed that aim to predict the number of patients present in the hospital [2]. The most widely used predictive models are (1) multivariate regression models with calendar features (day of the week, month, season, working days) and/or weather features (temperature, precipitation) and (2) time series models. These models can explain roughly 65% - 75% of patient-volume variability. In the majority of models, weather data fails to substantially improve reliability over calendar features because weather data is unreliable over longer time periods [2].
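The calendar features mentioned above (day of the week, month, season, working days) could be derived from an admission date as in the following minimal sketch. The function name, the meteorological season mapping and the rule that Monday to Friday are working days (public holidays ignored) are assumptions of this illustration, not details given in the cited work.

```python
from datetime import date

# Meteorological seasons for the northern hemisphere (an assumption of this sketch).
_SEASON_BY_MONTH = {
    12: "winter", 1: "winter", 2: "winter",
    3: "spring", 4: "spring", 5: "spring",
    6: "summer", 7: "summer", 8: "summer",
    9: "autumn", 10: "autumn", 11: "autumn",
}


def calendar_features(day):
    """Return simple calendar features for a single date."""
    weekday = day.isoweekday()  # 1 = Monday ... 7 = Sunday
    return {
        "day_of_week": weekday,
        "month": day.month,
        "season": _SEASON_BY_MONTH[day.month],
        "working_day": weekday <= 5,  # ignores public holidays
    }


features = calendar_features(date(2017, 1, 9))  # a Monday in January
```

Such features would typically enter a regression model as categorical (dummy) variables.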

A factor that could presumably improve reliability over weather data in predictive models is the intensity of influenza activity during influenza epidemics [3]. High influenza activity seems to have a significant effect on emergency department visits, mainly for respiratory complaints [4], and on outpatient visits by healthy children, who are hospitalized at rates similar to those of adults at high risk for influenza [5] [6]. Increased hospitalization due to elevated influenza activity implies a fast-changing demand for healthcare services for departments that receive a significant portion of patients with an increased vulnerability to influenza.

Influenza epidemics follow a seasonal pattern in which influenza activity is expected to be significantly higher in the winter season. The seasonal epidemic is therefore expected to repeat every winter. The onset, severity and duration of the outbreak are, however, unknown a priori [7]. This places a strong burden on health services and resource occupation, since the influenza virus can spread rapidly and increase the demand for health services unexpectedly, thereby posing logistic problems within hospitals. This is particularly so because demand for health services in the winter season is already significantly higher than in other seasons due to winter mortalities.

In anticipation of influenza epidemics, a European network of sentinel medical doctors (EISN) is in place to monitor influenza activity and its spread. The affiliated sentinel doctors located in various European countries report on the number of self-referred patients with Influenza-Like Illness (ILI) registered at the General Practitioner's (GP) office. These doctors send in samples from a selected group of patients to the National Influenza Surveillance Laboratory for testing. In the Netherlands, NIVEL issues an official alert when the reported ILI activity is over the epidemiological limit of 5.1 cases of ILI per 10,000 residents for two consecutive weeks¹. Hospitals act upon the official alert by initiating influenza response protocols to secure patient safety.
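The alert rule described above (ILI activity over 5.1 cases per 10,000 residents for two consecutive weeks) can be sketched as a simple scan over a weekly series. This is an illustrative reading of the rule, assuming "over the limit" means strictly greater than the threshold; the function name and the sample values are hypothetical.

```python
def first_alert_week(ili_per_10k, threshold=5.1, consecutive=2):
    """Return the index of the week in which the alert condition is first
    met (the last week of the qualifying run), or None if it never is."""
    run = 0
    for week, rate in enumerate(ili_per_10k):
        # Extend the run while the rate stays above the threshold, else reset.
        run = run + 1 if rate > threshold else 0
        if run >= consecutive:
            return week
    return None


# Made-up weekly ILI rates per 10,000 residents, for illustration only.
rates = [2.0, 5.5, 4.9, 6.0, 7.2, 8.1]
alert_week = first_alert_week(rates)
```

Here the single week above the limit (index 1) does not trigger an alert; the condition is first satisfied at index 4, the second of two consecutive weeks above the limit.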

Although the influenza surveillance system is one of the most advanced surveillance systems globally, it faces several challenges: (i) only a portion of patients with ILI seek medical care through self-referral, (ii) the network of sentinel doctors is small, with typically only 1%-5% of all physicians within a country affiliated, and (iii) there is a lag of typically 1-2 weeks between the consultation and the availability of the data [7]. These challenges can result in underreporting, and thus in underestimating the demand for health services caused by ILI, particularly early in the influenza season.

In response, several innovative ILI activity indicators have been proposed in literature based on conventional data collection methods. These methods include monitoring call volumes to telephone triage services, counting over-the-counter drug sales and collecting logs of patient visits to physicians for flu shots [8]. More innovative ILI activity indicators have been derived from web-based sources, encompassing the growing number of human interactions on the Internet regarding ILI activity, such as Google Flu Trends (GFT) [9], Google searches [7] [10], Tweets [8] [11] and Wikipedia statistics [12] [13]. In the United States, these influenza activity predictors can significantly improve surveillance over the traditional surveillance system of the Centers for Disease Control and Prevention (CDC). Combining information from multiple influenza activity predictors can be more advantageous than choosing one optimal indicator, since the information from multiple indicators is complementary and produces the most accurate and robust predictions when combined optimally [14] [15].

The majority of research on influenza activity indicators as improvements over traditional surveillance systems is strongly focused on the United States and on forecasting the dynamics of influenza outbreaks. The aim of this research is to contribute to the existing literature by examining the effect of influenza activity on hospitalization in the Netherlands for departments that receive a significant proportion of patients vulnerable to influenza activity. When a significant effect is present, a predictive model is constructed that can forecast the influenza activity in the Netherlands, capturing the amplitude and duration of the influenza season and providing accurate forecasts for up to two weeks. Ultimately, the outcome of this study should provide a framework for HOTflo on which they can further develop their predictive models. The research question is stated as follows:

Research Question

What predictive model can forecast influenza activity most accurately such that it represents the effect on emergency hospitalizations for departments that are flu dependent in the Netherlands?


Sub Questions

(i) Which departments receive a significant number of emergency patients that are vulnerable to influenza activity?

(ii) How big is the effect of influenza activity on emergency hospitalizations of these departments?

(iii) Can influenza-like activity be detected from online social media behavior?

(iv) What features need to be extracted from social media data and official influenza surveillance for prediction?

(v) Can the amplitude and duration of the influenza activity in the Netherlands be predicted?

(vi) What combination of features has the most predictive power on influenza activity?

(vii) What predictive model can capture the dynamics of influenza activity in the Netherlands most accurately?

(viii) What predictive model can provide the most reliable 1-2 week influenza activity forecast?

In the next chapter, relevant work on influenza-associated hospitalization is discussed as well as models and features that track influenza outbreaks and activities. In chapter 3, data collection, descriptive statistics, forecasting models and evaluation metrics are discussed extensively. Subsequently, the effect of influenza activity on hospitalizations and the results of the evaluated forecasting models are presented and discussed in detail in chapter 4. The outcome of this study is critically evaluated in chapter 5 where both the limitations of this study and recommendations for further research are addressed. The overall conclusion is stated in chapter 6.

2. Related Work

2.1 Influenza-Associated Hospitalizations

Influenza-associated hospitalizations contribute an important proportion of the total health burden of a nation's healthcare system. Multiple studies have therefore estimated numbers and rates of influenza-associated hospitalization by age group, risk status, influenza type and subtype in various countries, in order to cope with the economic costs of influenza epidemics and pandemics and to alleviate the burden. In the United States, annual influenza-associated hospitalizations have been estimated for primary and listed pneumonia, influenza, respiratory and circulatory hospitalizations by comparing different discharge categories, discharge types and age groups [16]. The main conclusions drawn in this paper are: (i) significant numbers of influenza-associated hospitalizations occur among the elderly and children younger than 5 years, (ii) the rate of hospitalizations and the length of stay increase significantly with age, especially after age 65 and in the many influenza seasons in which Influenza A(H3N2) viruses are predominant, and (iii) influenza activity is associated with an increase in hospitalizations for a broad range of cardiopulmonary diagnoses such as respiratory disease and acute cardiac failure.

A similar study conducted in Europe, aiming to assess the impact of seasonal influenza on hospitalizations and mortality in five European countries, finds similar results. Both hospitalizations due to respiratory diseases and hospitalizations due to pneumonia and influenza increase significantly with age and place the highest health burden on patients over 65 [17]. Some studies claim that seasonal influenza increases the number of emergency visits, strongly associated with excess respiratory complaints [4], whereas other studies claim that influenza epidemics account for excess cardiovascular-related hospitalizations for elderly patients ( 80 years) [18].

Whereas the majority of studies stress the complications of seasonal influenza epidemics for elderly patients, other studies stress the serious complications among patients of any age group who have certain chronic conditions, or the complications for young children and infants. One retrospective cohort study finds that during influenza epidemics, an excess number of hospitalizations occurs among healthy children younger than 15 with acute cardiopulmonary conditions, with the most excessive rates occurring among children younger than one year [6]. A similar study confirms that influenza epidemics substantially increase outpatient visits by healthy children with specific acute respiratory illness [5]. Hospitalization rates are even higher among children with chronic medical conditions, with approximately 12 times higher hospitalization rates for acute respiratory disease than for children without these conditions, especially for children younger than 2 years [19].

2.2 Forecasting the Dynamics of Influenza Epidemics

Mathematical and computational models capturing the dynamics of influenza epidemics and pandemics for public health purposes have been extensively studied in literature [20]. Reliable forecasts of influenza activity measures such as onset, peak time, peak height, duration and magnitude would help public health practitioners and healthcare workers anticipate demand shocks in healthcare resources, enabling them to plan resource capacity accordingly. Modelling and real-time forecasting of influenza outbreaks with traditional (viral and syndromic) surveillance and digital surveillance is most extensively studied in literature.

Traditional surveillance systems rely mostly on Influenza Like Illness (ILI) reporting from general practitioners, diagnostic laboratory testing and reports from public health institutions. Although these traditional surveillance systems produce accurate reporting on ILI activity based on clinical "sentinel" medical practices, there are some inherent problems with this system.

Firstly, it only captures information about people who seek medical care for their influenza symptoms, thereby missing those who do not interact with the health care system. Secondly, there is often a lag of 1 to 2 weeks between the occurrence of the illness event and dissemination of the surveillance information, due to the time needed to process and aggregate clinical information. It is the latter that is particularly sub-optimal for capacity management.


With the explosive growth of social media activity that can be collected from Internet-based services, multiple researchers have stressed the potential of big social media based datasets to serve as digital influenza surveillance systems to detect influenza outbreaks and elevated influenza activity [21]. Many people search and tweet about their illnesses before seeking medical care. Digital surveillance could therefore be used as real-time "sensors" of ILI activity to overcome the inherent problems of the traditional surveillance system. Reliance on digital surveillance systems alone, however, leads to significant discrepancies between predicted activity and the activity officially reported by health surveillance institutions [22]. Digital surveillance systems should rather be used as a complement to traditional surveillance data, which addresses foundational issues of measurement, construct validity, reliability and dependencies among data [23].

The Internet-based influenza indicator that has received most attention is Google Flu Trends (GFT), a flu tracking algorithm developed by Google, based on aggregated Google queries related to flu. In their original paper, Google claimed to accurately estimate the current level of influenza activity in the United States with a lag of roughly one day, due to the high correlation between the relative frequency of certain queries and the percentage of physician visits by patients with influenza-like symptoms. In subsequent years, however, GFT predictions missed the actual proportion of doctor visits due to influenza-like symptoms by more than 100%. The GFT algorithm overfitted mainly because it was vulnerable to seasonal terms unrelated to influenza, and because of the high chance of finding search terms that match the propensity of flu while being entirely unrelated to it. The latter yields search terms without any forecasting power for ILI activity, so that the algorithm overfits to a small number of cases and GFT becomes partly a winter detector and partly a flu detector. Furthermore, the GFT algorithm failed to capture both the temporal autocorrelation and the seasonality in the ILI activity data, which autoregressive forecasting models can capture [22] [23]. A recently published Harvard paper stresses that the GFT algorithm significantly failed to (1) capture the changes in people's Internet search behaviour over time, (2) incorporate newly available ILI activity reports during the evolution of a flu epidemic, and (3) capture intrinsic time series properties such as seasonality.

In response, they propose a flexible, self-correcting, robust and scalable autoregressive model with Google search data (ARGO) to track influenza-like activity in real-time, outperforming all other proposed methods. The ARGO model dynamically incorporates (1) newly available information on ILI activity as it becomes available, (2) automatic selection of the most useful Google search queries for prediction, and (3) long-term cyclic information. The combination of seasonal flu information (autoregressive part) and dynamically reweighted Google search information (exogenous input) is key in the enhanced performance over other methods [10].
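The structure described above, an autoregressive part plus exogenous Google search terms with automatic query selection, can be written in a general form as follows. This is a hedged sketch in illustrative notation: the symbols, the number of lags N, the number of search terms K and the use of an L1 penalty to obtain automatic selection are assumptions of this illustration rather than details stated in the text.

```latex
% y_t      : (transformed) ILI activity in week t
% x_{i,t}  : (transformed) search frequency of Google query i in week t
% N, K     : number of autoregressive lags and of search terms (assumed)
\[
  y_t \;=\; \mu \;+\; \sum_{j=1}^{N} \alpha_j \, y_{t-j}
        \;+\; \sum_{i=1}^{K} \beta_i \, x_{i,t} \;+\; \varepsilon_t
\]
% One way to obtain automatic query selection is a sparsity-inducing penalty,
% re-estimated on a moving training window as new ILI reports arrive:
\[
  (\hat{\mu}, \hat{\alpha}, \hat{\beta}) \;=\;
  \arg\min_{\mu,\alpha,\beta} \;
  \sum_{t} \Bigl( y_t - \mu - \sum_{j=1}^{N} \alpha_j y_{t-j}
                 - \sum_{i=1}^{K} \beta_i x_{i,t} \Bigr)^{2}
  \;+\; \lambda_{\alpha} \lVert \alpha \rVert_1
  \;+\; \lambda_{\beta} \lVert \beta \rVert_1
\]
```

Coefficients shrunk to zero by the penalty correspond to lags or queries that are dropped from the model for that training window.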

Incorporating Google search information in time series models enables detection of sudden ILI activity changes, overcoming the "delaying" effect of time series models. Google searches are, however, very sensitive to sentiment and overreaction by the public [10]. Although an increase in Google search activity approximates interest in a health topic, it fails to provide any contextual information. People may search for health information unrelated to their own health. Some searches can be academically related or driven by news reports [24].

For this reason, health related tweets have been proposed as an indicator for ILI activity [8] [25]. In comparison to Google data, Twitter data (1) provides more contextual information, (2) more often reflects the user's own level of disease or incubation, and (3) contains more useful information for forecasting with smart feature engineering [25]. Furthermore, tweets are very noisy individually, but can reveal underlying epidemic patterns when aggregated [8]. The weekly number of flu related tweets is found to be highly correlated with weekly ILI activity [8] [25]. In general, flu related Twitter activity in the previous week(s) can substantially improve the prediction accuracy of an autoregressive model in predicting ILI activity in real-time. Twitter-based models can even outperform Google-based models, particularly in periods of normal news coverage [15].

Another indicator that receives significant attention is the amount of traffic observed on influenza-related Wikipedia articles, more specifically the free and publicly available statistics and trends of selected articles. Compared to Google searches or Twitter activity, Wikipedia statistics are the least influenced by high media coverage [12]. On the other hand, the signal-to-noise ratio of Wikipedia can be problematic, since Wikipedia is the most preferred source for finding health related information, regardless of whether the reader is ill [15]. Combining view data from multiple influenza-related Wikipedia articles can, however, be effective for estimating ILI activity up to two weeks in advance [12].

A comparison of these three Web-based influenza indicators for seasonal change in ILI activity through Bayesian Change Point Analysis shows that Google data has the highest sensitivity and positive predictive value (PPV) with respect to influenza activity data (92% and 85%), followed by Twitter (50% and 43%) and Wikipedia (33% and 40%). Seasonal change in Google data therefore seems to be most aligned with seasonal change in ILI activity data [15].
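For reference, sensitivity and positive predictive value follow the standard definitions sketched below. The counts in the example are made up for illustration and are not the counts behind the percentages reported in [15].

```python
def sensitivity(true_pos, false_neg):
    """Share of actual change points that the indicator detected."""
    total = true_pos + false_neg
    if total == 0:
        raise ValueError("no actual positives: sensitivity is undefined")
    return true_pos / total


def positive_predictive_value(true_pos, false_pos):
    """Share of detected change points that were actual change points."""
    total = true_pos + false_pos
    if total == 0:
        raise ValueError("no detections: PPV is undefined")
    return true_pos / total


# Hypothetical counts: 3 change points detected correctly, 1 missed, 1 false alarm.
sens = sensitivity(true_pos=3, false_neg=1)
ppv = positive_predictive_value(true_pos=3, false_pos=1)
```

With these hypothetical counts both measures equal 0.75; an indicator that raises many false alarms would keep a high sensitivity but show a lower PPV.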

Lastly, instead of utilizing information from a single influenza indicator, it is more advantageous to combine multiple indicators to achieve higher prediction accuracy. With ensemble Machine Learning algorithms, real-time two to three week forecasts can achieve prediction accuracies comparable to those of autoregressive models [14].


3. Materials and Methods

3.1 Data Collection and Inspection

Hospitalization Data. Data on hospitalizations has been retrieved from databases of four hospitals in the Netherlands. These databases have been made available by HOTflo Company. The hospitals selected as a sample of the Dutch healthcare system are:

■ One small-sized hospital (<500 beds)
■ Two medium-sized hospitals (500-1000 beds)
■ One large-sized hospital (>1000 beds)

This sample is selected since (1) these hospitals are affiliated with HOTflo Company, (2) the databases of these hospitals are most reliable, (3) the hospitals are geographically dispersed, and (4) these hospitals represent a diverse patient population. Key figures of the selected hospitals can be found in Appendix I.

As is evident from the discussed literature, influenza-associated hospitalizations during influenza epidemics increase mainly among young children, seniors, patients with chronic conditions and patients with cardiopulmonary diagnoses. The departments suspected of receiving a significant number of patients that are vulnerable to influenza activity are therefore: (1) Geriatrics, (2) Pediatrics, (3) Pulmonary Medicine, and (4) Cardiology. Validation with a Microbiologist employed at the Medical Microbiology Laboratory of Medisch Centrum Alkmaar reveals that the most vulnerable patients are elderly patients with limited functioning organs and patients with chronic pulmonary diseases such as Asthma and Chronic inflammatory demyelinating polyneuropathy (CIDP). Based on the clinical expertise of the Microbiologist, the specialisms presumably most affected by influenza activity, sorted by relevance, are:

(1) Pulmonary Medicine
(2) Internal Medicine
(3) Geriatrics
(4) Cardiology
(5) Pediatrics

The number of emergency hospitalizations for all specialisms (with the exception of Geriatrics) has been retrieved from all hospitals for the period October 28, 2013 to April 21, 2017, with the exception of the large-sized hospital, for which data is unavailable before January 01, 2014. The period captures four influenza seasons. Since not every hospital has a dedicated Geriatrics department, emergency hospitalizations for Geriatrics have been retrieved solely from Medisch Centrum Alkmaar for the same period. Hospitalizations have been grouped by entry date and aggregated to weekly levels for every hospital. The weekly hospitalizations are then summed over the hospitals. Aggregation to weekly level is necessary to make the data comparable to the influenza activity reporting. The number of weekly hospitalizations per department is outlined in Table 1.
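The aggregation steps described above (group by entry date, aggregate to weekly level per hospital, then sum over hospitals) can be sketched as follows. This is a minimal illustration that assumes ISO week numbering; the function names and the sample entry dates are hypothetical, and the study's actual tooling is not implied.

```python
from collections import Counter
from datetime import date


def weekly_counts(entry_dates):
    """Count admissions per ISO (year, week) for one hospital."""
    counts = Counter()
    for entry in entry_dates:
        iso_year, iso_week, _weekday = entry.isocalendar()
        counts[(iso_year, iso_week)] += 1
    return counts


def sum_over_hospitals(per_hospital_counts):
    """Add up the weekly counts of several hospitals."""
    total = Counter()
    for counts in per_hospital_counts:
        total.update(counts)  # Counter.update adds counts, it does not replace
    return total


# Made-up entry dates for two hospitals, for illustration only.
hospital_a = [date(2013, 10, 28), date(2013, 10, 29), date(2013, 11, 4)]
hospital_b = [date(2013, 10, 30)]
total = sum_over_hospitals([weekly_counts(hospital_a), weekly_counts(hospital_b)])
```

In this example October 28-30, 2013 fall in ISO week 44 and November 4, 2013 in ISO week 45, so the summed series has three admissions in week 44 and one in week 45.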

The moving average with a 12-week rolling window is plotted per department in Figure 1 to inspect seasonal patterns inherent in the time series. An Augmented Dickey-Fuller (ADF) test for stationarity (alternative hypothesis = "stationary") is performed to investigate trends inherent in the time series. The Autocorrelation Function (ACF) and Partial Autocorrelation Function (PACF) with a 2-year lag are shown in Figure 2 to investigate seasonality in the data. On inspection, there is enough statistical evidence to conclude that Geriatrics and Pediatrics are stationary, with p-values of 0.00097 and 0.000144 respectively. There is not enough statistical evidence to conclude that Pulmonary Medicine (p-value = 0.10948), Cardiology (p-value = 0.15668) and Internal Medicine (p-value = 0.25966) are stationary. The downward trend in hospitalizations can presumably be attributed to increased prevention, increased quality of care and medical innovation. Seasonality seems, on inspection, to be strongly present for Pulmonary Medicine and moderately present for Pediatrics and Cardiology. An additive model seems appropriate since the magnitude of seasonal fluctuations around the trend cycle does not seem to vary with the level of the series [26].
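A rolling-window moving average of the kind used for this inspection can be sketched as below. The sketch assumes a trailing window (each value averages the current and preceding weeks); whether the study used a trailing or a centered window is not stated, and the function name is hypothetical.

```python
def rolling_mean(values, window):
    """Trailing moving average; positions without a full window are None."""
    if window < 1:
        raise ValueError("window must be at least 1")
    result = []
    running_sum = 0.0
    for i, value in enumerate(values):
        running_sum += value
        if i >= window:
            # Drop the value that just left the window.
            running_sum -= values[i - window]
        result.append(running_sum / window if i >= window - 1 else None)
    return result


# Small illustrative series with a window of 3 instead of 12.
smoothed = rolling_mean([1, 2, 3, 4, 5], window=3)
```

For weekly hospitalization counts the same function would be called with `window=12`.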

Table 1. Summary of average weekly hospitalizations, with the standard deviation in brackets

Hospital                | Pulmonary | Cardiology | Pediatrics | Internal Medicine | Geriatrics
Small-sized Hospital    | 11 (±4)   | 30 (±7)    | 24 (±10)   | 19 (±5)           | -
Medium-sized Hospital 1 | 32 (±8)   | 94 (±16)   | 28 (±7)    | 45 (±9)           | 5 (±2)
Medium-sized Hospital 2 | 7 (±4)    | 24 (±6)    | 16 (±5)    | 14 (±4)           | -
Large-sized Hospital    | 27 (±8)   | 90 (±16)   | 77 (±12)   | 59 (±14)          | -
Total                   | 138 (±21) | 235 (±28)  | 158 (±16)  | 166 (±21)         | 5 (±2)

Influenza Data. Data on influenza activity is collected from FluNet Europe, a sentinel influenza surveillance system based on an organized network of primary care physicians (general practitioners, GPs) in Europe who report on the weekly number of patients seen with Influenza Like Illness (ILI). The National Institute for Public Health and Environment (RIVM) reports numbers and figures of influenza activity in the Netherlands to both the WHO and the European Centre for Disease Prevention and Control (ECDC).

The RIVM collects this data from the Dutch Institute for Health Research (NIVEL), which reports the weekly registered ILI incidence rate per 100,000 inhabitants based on the number of self-referred patients with ILI at the General Practitioner (sentinel network). A subset of specimens from selected patients is sent to the National Influenza Surveillance Laboratory at the Erasmus Medical Centre for detailed analysis of the characteristics of circulating influenza viruses by influenza type (type A or B) and subtype (e.g. A(H3N2) and A(H1N1)). The RIVM issues an official alert when the reported ILI activity is over the epidemiological limit of 5.1 cases of ILI per 10,000 residents for two consecutive weeks. Data on influenza activity is reported on a weekly basis during the influenza surveillance season (week 40 to week 20 of the following year).

Figure 1. Moving Average plots of selected departments: (a) Pulmonary Medicine, (b) Cardiology, (c) Pediatrics, (d) Internal Medicine, (e) Geriatrics

Due to long processing times and limited financial resources available for this research, private data concerning the weekly ILI incidence rate from the NIVEL has not been used, despite its practical relevance to the epidemiological limit. Instead, publicly available data from the Influenza Surveillance Laboratory on the specimens tested and processed is used. More specifically, influenza activity in this study is defined as the percentage of specimens found to be positive out of all specimens processed in that week. Influenza activity over the four most recent influenza seasons, smoothed with a moving average window of 8 weeks, is outlined in Figure 3.

Figure 2. ACF and PACF of the selected departments: (a) Pulmonary Medicine, (b) Cardiology, (c) Pediatrics, (d) Internal Medicine, (e) Geriatrics

Google Data Similar to the paper of Yang et al. [10], terms highly correlated with the search "ik heb griep" (I have the flu) are obtained using Google Correlate. Terms that are semantically unrelated to the flu but correlated with winter, like "skibroek heren" (ski pants for men) or "rode kool stamppot" (red cabbage mash), have been removed. A total of 80 correlated terms remain. For these terms, weekly search volumes have been collected for the period November 3, 2013 - April 21, 2017 from Google Trends. The selected period captures four influenza seasons. More information on the correlated terms can be found in Appendix II. The search terms are aggregated to a total weekly search activity level related to the flu. Both the aggregated search activity and the individual search terms have been transformed using a log transformation. A small constant of 0.5 is added to avoid taking the log of 0. The log transformation is appropriate because Google search frequencies typically grow exponentially near the peaks. Aggregated search activity related to the flu, smoothed with a moving average window of 8 weeks, is outlined in Figure 4.
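The aggregation, log transformation and smoothing described above can be sketched as follows. This is a minimal illustration, not the code used in the study: the function names and the example volumes are hypothetical, the natural logarithm is assumed, and the 8-week moving average is assumed to be a trailing window.

```python
import math

def log_transform(counts, offset=0.5):
    """Natural log of each weekly count after adding a small offset (avoids log(0))."""
    return [math.log(c + offset) for c in counts]

def moving_average(values, window=8):
    """Trailing moving average; positions without a full window are None."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(values[i + 1 - window:i + 1]) / window)
    return out

# Hypothetical weekly search volumes for two terms (illustrative numbers only).
term_volumes = {
    "griep": [0, 3, 10, 40, 90, 35, 8, 2, 0, 1],
    "koorts": [1, 2, 6, 20, 50, 22, 5, 1, 1, 0],
}

# Aggregate the terms to one weekly activity level, then transform and smooth.
aggregated = [sum(week) for week in zip(*term_volumes.values())]
log_aggregated = log_transform(aggregated)
smoothed = moving_average(log_aggregated, window=8)
```

With a trailing window the first seven weeks have no smoothed value; a centred window would instead lose weeks at both ends of the series.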

Twitter Data Historical flu-related tweets are collected through a Python script written on top of the official Twitter Search API, publicly available on GitHub. This script bypasses some limitations of the Search API, such as rate limits and time constraints.

A search query is written containing a lower and upper bound for the date, geocoordinates (longitude, latitude, radius) and a query text to be matched in the tweets. The date is set to restrict the search to 20 May, 2013 - 22 April 2017. The geocoordinates are set equal to the geographic midpoint of the Netherlands (in Lunteren) with a radius of 150 km, capturing all towns and villages in the Netherlands. The query text is set to:

”griep OR verkouden OR hoest OR koorts OR keelpijn OR verkoudheid OR niezen OR keelontsteking OR snotteren OR uitzieken”5

Figure 3. Officially reported Influenza Activity

Figure 4. Influenza Activity on Google

A total of 17,392 tweets have been collected over the selected period. These tweets have been filtered in the following way to represent true weekly flu activity:

■ Tweets for which Google's Language Detection reports, with high probability, an implausible language (such as Slovenian or Croatian) are removed.

■ Tweets from usernames containing one of the query words (e.g. Laurens Niezen) are removed.

■ Tweets whose text contains "Hoest" in combination with "jou" or "nou" are removed, since this represents Dutch informal language for "how are you?"

■ Non-ASCII characters and informal language (hashtags, user mentions) are removed [25].

■ Retweets are removed since they do not represent a new ILI case [8].

■ Tweets from the same user within a certain syndrome elapse time are removed since they do not represent new ILI cases. Previous research has shown that the highest correlation persists when the syndrome elapse time is 1 week [8].

■ Tweets are aggregated to weekly level.

A total of 4,264 tweets are removed, leaving 13,128 tweets to represent flu activity on Twitter. Weekly aggregated Twitter activity related to the flu, smoothed with a moving average window of 8 weeks, is outlined in Figure 5. From Figure 5 it is clear that the popularity of Twitter in the Netherlands has dropped drastically. However, there is still some seasonal pattern inherent in the data, as shown in Figure 6.
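Part of the tweet filtering listed above (retweets, the "Hoest" greeting, repeat tweets within the syndrome elapse time, weekly aggregation) can be sketched as below. The dictionary keys, function names and example tweets are hypothetical, and two details are assumptions: the elapse time is measured from the last tweet that was kept for a user, and the greeting check is a plain substring match.

```python
from collections import Counter
from datetime import datetime, timedelta

def filter_tweets(tweets, elapse=timedelta(days=7)):
    """Drop retweets, the 'hoest jou/nou' greeting, and repeat tweets by one user within `elapse`."""
    kept, last_kept = [], {}
    for tw in sorted(tweets, key=lambda t: t["time"]):
        text = tw["text"].lower()
        if tw["is_retweet"]:
            continue  # a retweet does not represent a new ILI case
        if "hoest" in text and ("jou" in text or "nou" in text):
            continue  # informal greeting rather than a cough
        prev = last_kept.get(tw["user"])
        if prev is not None and tw["time"] - prev < elapse:
            continue  # same user within the syndrome elapse time
        last_kept[tw["user"]] = tw["time"]
        kept.append(tw)
    return kept

def weekly_counts(tweets):
    """Number of tweets per ISO (year, week)."""
    return Counter(tuple(tw["time"].isocalendar())[:2] for tw in tweets)

# Hypothetical example tweets.
tweets = [
    {"user": "a", "time": datetime(2016, 1, 4, 9), "text": "Griep en koorts", "is_retweet": False},
    {"user": "a", "time": datetime(2016, 1, 6, 9), "text": "Nog steeds griep", "is_retweet": False},
    {"user": "b", "time": datetime(2016, 1, 5, 9), "text": "RT griep heerst", "is_retweet": True},
    {"user": "c", "time": datetime(2016, 1, 7, 9), "text": "Hoest nou?", "is_retweet": False},
    {"user": "a", "time": datetime(2016, 1, 12, 9), "text": "Weer verkouden", "is_retweet": False},
]
kept = filter_tweets(tweets)
per_week = weekly_counts(kept)
```

Language detection and username filtering are left out because they depend on external services and on how user profiles are stored.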

Figure 5. Influenza Activity on Twitter

Figure 6. Boxplot of Flu Activity on Twitter

Statistics of Wikipedia visits are collected since it is the least able to capture seasonal change in ILI data, has a high signal-to-noise ratio and is often not implemented in regression or time-series methodologies. Furthermore, in a nearly bilingual country such as the Netherlands, it is hypothesized that a significant portion of people turn to the more extensive English Wikipedia page for flu-related information.

3.2 Feature Engineering

W/S Twitter Feature Since the popularity of Twitter in the Netherlands has decreased drastically in recent years, a new feature is constructed to capture the seasonal change in flu-related activity. This feature divides the activity in every week of the influenza season (week 40 - week 20 of the following year) by the average weekly activity of the preceding summer season (week 20 - 40). As an illustration, every single week in the influenza season 2015/2016 is divided by the average activity in the summer season of 2015. Every week in the summer season is divided by the average activity of the corresponding summer. With this new W/S feature, more seasonal change is captured, as presented in Figure 7.
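A minimal sketch of the W/S normalisation follows. The function names and example numbers are hypothetical. Because the text lets the two seasons share weeks 20 and 40, the sketch assumes that the summer covers weeks 20-39 and the influenza season runs from week 40 to week 19 of the following year.

```python
def summer_year(year, week):
    """Year of the summer whose average normalises this week (assumed boundaries)."""
    # Weeks 20-39: the summer itself. Weeks 40-53: influenza season after that summer.
    # Weeks 1-19: tail of the influenza season that started the previous year.
    return year if week >= 20 else year - 1

def ws_feature(rows):
    """rows: list of (year, iso_week, activity). Returns activity divided by the summer average."""
    totals = {}
    for year, week, value in rows:
        if 20 <= week <= 39:
            s, n = totals.get(year, (0.0, 0))
            totals[year] = (s + value, n + 1)
    means = {y: s / n for y, (s, n) in totals.items()}
    # A week whose reference summer is missing from the data raises KeyError.
    return [value / means[summer_year(year, week)] for year, week, value in rows]

# Hypothetical weekly tweet counts: two summer weeks and two influenza-season weeks.
rows = [(2015, 25, 10), (2015, 30, 30), (2015, 45, 40), (2016, 5, 100)]
ws = ws_feature(rows)
```

In this example the summer average of 2015 is 20, so week 5 of 2016 receives a W/S value of 5.0 regardless of how many tweets were posted overall that year.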

Figure 7. W/S feature

Change Detection Change detection aims to detect sudden changes in timeseries data when a certain threshold is met. Because influenza epidemics start at different times in the influenza season, change detection can be used to estimate the size, promptness and rise of influenza activity [27]. A popular change detection method for detecting aberrations in influenza surveillance is the modified Cumulative Sum (CUSUM) Control Chart technique, since it is a very powerful method for identifying changes in the process average. CUSUM makes use of all historic data since each value is a function of the previous datapoints [28].

The CUSUM algorithm, available on GitHub, is implemented and tuned for the influenza data. The drift is set to 0.02 and the threshold to 0.1. This means that a change point is detected when the process average increases or decreases by 10%. The change stops when it fails to increase or decrease by 10%. The drift reflects the smoothing and elimination of the short-term fluctuations, and the threshold reflects the performance of the method in detecting sudden changes [29]. The threshold is purposely set low in order to capture as many direction changes in the influenza data as possible. In total, 55 change points are detected. The designed feature is the amplitude of the change between the start point and end point of the detection. The start points, end points and detection points are plotted in Figure 8.
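To make the role of the drift and the threshold concrete, the sketch below implements a common two-sided CUSUM on week-to-week differences. It is not the GitHub implementation used in the study: the function name and the example series are hypothetical, and as a simplification the amplitude is measured between the estimated start of a change and the week in which the alarm is raised, without a separate end-point estimate.

```python
def detect_cusum(x, threshold=0.1, drift=0.02):
    """Two-sided CUSUM on successive differences.

    Returns a list of (start_index, alarm_index, amplitude) tuples.
    """
    changes = []
    g_pos = g_neg = 0.0          # cumulative sums for upward and downward changes
    start_pos = start_neg = 0    # candidate start of the current change
    for i in range(1, len(x)):
        step = x[i] - x[i - 1]
        g_pos += step - drift    # the drift damps small fluctuations
        g_neg += -step - drift
        if g_pos < 0:
            g_pos, start_pos = 0.0, i
        if g_neg < 0:
            g_neg, start_neg = 0.0, i
        if g_pos > threshold or g_neg > threshold:
            start = start_pos if g_pos > threshold else start_neg
            changes.append((start, i, x[i] - x[start]))
            g_pos = g_neg = 0.0  # restart after an alarm
            start_pos = start_neg = i
    return changes

# Hypothetical influenza activity (share of positive specimens) over nine weeks.
activity = [0.10, 0.10, 0.11, 0.20, 0.35, 0.36, 0.35, 0.20, 0.05]
changes = detect_cusum(activity)
```

On this series the sketch raises one alarm during the rise and two during the decline; a higher threshold would merge or suppress the smaller changes.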

Figure 8. Change Point Detection (CUSUM)

All features are scaled to [0,1] through min-max scaling. Scaling makes it easier to compare timeseries and improves the visibility of comparative plots. Additional features included in the predictive models are (1) an interaction term between the change point feature (1-week lag) and Google searches (1-week lag), (2) an interaction term between the change point feature (1-week lag) and the W/S feature (1-week lag), and (3) lagged variables of Google searches and the W/S feature (up to 4 weeks lag). For the Machine Learning based forecasting models (discussed in the next section), lagged influenza activity variables are included (up to 4 weeks).
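The scaling, lagging and interaction features described above can be sketched as follows. The helper names and the example values are hypothetical, and missing values at the start of a lagged series are represented by None.

```python
def min_max_scale(values):
    """Rescale a series to [0, 1]; a constant series maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def lag(values, k):
    """Value of the series k weeks earlier; the first k entries are undefined (None)."""
    if not 0 <= k <= len(values):
        raise ValueError("k must be between 0 and len(values)")
    return [None] * k + list(values[:len(values) - k])

def interaction(a, b):
    """Element-wise product of two features, None where either input is undefined."""
    return [None if u is None or v is None else u * v for u, v in zip(a, b)]

# Hypothetical weekly values.
google = [2.0, 4.0, 6.0, 10.0]
change_amplitude = [0.0, 0.1, 0.3, 0.2]

google_scaled = min_max_scale(google)
google_lag1 = lag(google_scaled, 1)
change_lag1 = lag(change_amplitude, 1)
google_x_change = interaction(google_lag1, change_lag1)
```

Rows that contain None would be dropped before model fitting, which costs as many initial weeks as the largest lag (4 in the study).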

3.3 Forecasting models

The predictive models used in this research can be divided into two categories: timeseries models (benchmark, ARIMA, ARGO) and Machine Learning models. Machine Learning models tend to emphasize approximating a non-linear conditional mean function in a non-parametric fashion. In timeseries there is often not much conditional-mean non-linearity, and when there is, it is of a highly specialized nature and approximated in a tightly parametric fashion. In Machine Learning techniques, feature engineering is very important when used for timeseries since it can have a significant impact on performance. With feature engineering, trends, lags and seasonality inherent in the timeseries data can be engineered. Three ensemble machine learning methods proposed by Santillana et al. are implemented [14]. The algorithms proposed are known to have a distinct strength in combining information from multiple data sources. Brief explanations of the algorithms are given below. A more extensive illustrative explanation is given in Appendix III.

Benchmark As a benchmark, an ARIMA(2,0,2) model is fitted to the univariate influenza data. Incorporating two autoregressive terms (p) and two moving average terms (q) achieves the best AIC and BIC over all parameter sets (p,d,q). Differencing (d) is set to zero since the influenza series is stationary. Setting d=1 while the constant in the model is different from zero and the series is stationary leads to a straight line for the long-term forecast due to convergence of the series to the mean of the series [26]. The formal Dickey-Fuller test confirms stationarity (p-value = 0.02515, alternative hypothesis = "stationarity"). The residuals resemble a white noise process (randomness), indicating that this model does not exhibit a lack of fit. This is confirmed formally with a Ljung-Box test (p-value = 0.9928).

ARGO One of the top performing autoregressive models presented in literature is the ARGO model. This robust, scalable and self-correcting model, which includes autoregressive terms and individual Google search queries, is driven by a hidden Markov model. A more extensive formulation of the model is presented in the paper of Yang et al. [10].

Stacked Linear Regression Stacked Linear Regression linearly combines the information inherent in all the influenza related features, which are weak predictors on their own, into a more accurate and robust single predictor. Because the weak predictors are highly correlated, Lasso regularization is needed to discard redundant information. Lasso regularization penalizes the size of the coefficients by trading off the Residual Sum of Squares (RSS) against the magnitude of the coefficients. This results in the best predictive model with the smallest number of independent variables in the regression.

SVM Regression Support Vector Machine (SVM) regression can, unlike multivariate linear regression, incorporate non-linear transformation functions (kernels) to represent non-linear relationships between variables. With the help of kernels, the independent variable can be mapped to a higher dimensional feature space. The radial basis function (RBF) kernel can even map the independent variable to an infinite-dimensional feature space. The high dimensional space (with sufficient dimension) ensures that a maximum-margin hyperplane exists that can separate data points. SVM models minimize an epsilon-insensitive cost function in which errors smaller than epsilon are ignored, leading to better generalization of the model on out-of-sample data. When used for regression, SVM produces a non-linear model where the features serve as regression variables. Because it is not known a priori how many dimensions are needed, an RBF kernel is used.
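The two ingredients of SVM regression named above, the epsilon-insensitive cost and the RBF kernel, can be written down directly. The sketch is illustrative only: the function names, the epsilon and gamma defaults and the example values are assumptions, and it does not fit a model.

```python
import math

def epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1):
    """Sum of absolute errors, counting only the part of each error that exceeds epsilon."""
    return sum(max(0.0, abs(t - p) - epsilon) for t, p in zip(y_true, y_pred))

def rbf_kernel(u, v, gamma=1.0):
    """RBF kernel exp(-gamma * ||u - v||^2): 1 for identical points, near 0 for distant ones."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * sq_dist)

# The first error (0.05) lies inside the epsilon tube and is ignored;
# the second (0.5) contributes 0.5 - 0.1 = 0.4.
loss = epsilon_insensitive_loss([1.0, 2.0], [1.05, 2.5], epsilon=0.1)
similarity = rbf_kernel([0.2, 0.4], [0.2, 0.4])
```

Ignoring errors smaller than epsilon is what keeps the fitted function from chasing small week-to-week noise in the training data.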

Generalized Boosting Regression Decision Tree Regression with AdaBoost (AdaBoost.R2) or Generalized Boosting Regression (GBM) fits a sequence of weak predictors (predictors that are slightly better than random guessing, such as small decision trees) on repeatedly modified versions of the training data. The predictions from the weak predictors are then combined through a weighted vote (a weighted median in the case of AdaBoost.R2) to produce a final ensemble prediction. At every iteration, training examples that were incorrectly predicted by the model receive a higher weight, whereas the weights are decreased for training examples that were predicted correctly. As the number of iterations increases, training points that are difficult to predict receive more priority (i.e. weight), forcing the model to concentrate on the training examples that were missed at the previous iteration.
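The reweighting step can be illustrated with one iteration of AdaBoost.R2 using the linear loss, following the standard formulation of the algorithm. The function name and example values are hypothetical, and the study itself relied on R packages rather than on code like this.

```python
import math

def adaboost_r2_update(weights, y_true, y_pred):
    """One AdaBoost.R2 reweighting step (linear loss).

    Returns the normalised new sample weights and the weight of this weak predictor.
    """
    abs_err = [abs(t - p) for t, p in zip(y_true, y_pred)]
    max_err = max(abs_err)
    if max_err == 0:
        raise ValueError("perfect fit: nothing to reweight")
    loss = [e / max_err for e in abs_err]              # each loss lies in [0, 1]
    total = sum(weights)
    avg_loss = sum(w / total * l for w, l in zip(weights, loss))
    if avg_loss >= 0.5:
        raise ValueError("weak predictor is too weak; boosting stops here")
    beta = avg_loss / (1.0 - avg_loss)                 # smaller beta = better predictor
    # Well-predicted points (loss near 0) are multiplied by about beta (< 1);
    # badly predicted points (loss near 1) keep their weight.
    new_w = [w * beta ** (1.0 - l) for w, l in zip(weights, loss)]
    norm = sum(new_w)
    return [w / norm for w in new_w], math.log(1.0 / beta)

# Hypothetical example: only the last of four training points is predicted badly.
weights, predictor_weight = adaboost_r2_update(
    [0.25, 0.25, 0.25, 0.25], [1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 6.0]
)
```

After this single step the badly predicted point carries half of the total weight, so the next weak predictor is pushed to fit it.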

Dynamic Regression It is widely agreed in literature that Internet-based influenza indicators, when used complementary to traditional indicators, improve the performance of the forecasting model. The ARIMA(2,0,2) model is therefore extended with exogenous predictors (Internet-based influenza indicators).

The model proposed is a dynamic regression model, capturing temporal autocorrelation, seasonality and exogenous information. An important model assumption is that all exogenous predictors are stationary. The formal Dickey-Fuller test for stationarity shows that the Google timeseries (p-value = 0.02477), the W/S feature timeseries (p-value = 0.01767) and the change point feature (p-value = 0.01) are stationary. The model is formulated as:

$$
y_t = \mu + \sum_{p=1}^{P} \alpha_p y_{t-p} + \sum_{q=1}^{Q} \theta_q \epsilon_{t-q} + \sum_{i=1}^{K} \beta_i X_{1,t-i} + \sum_{j=1}^{N} \gamma_j X_{2,t-j} + X_{1,t-1} X_{3,t-1} + X_{2,t-2} X_{3,t-1} + \epsilon_t, \qquad \epsilon_t \sim N(0, \sigma^2)
$$

$y_t$ = influenza activity at week t

$X_{1,t}$ = aggregated Google searches at week t

$X_{2,t}$ = Twitter activity at week t

$X_{3,t}$ = amplitude of change in influenza activity at week t

$\epsilon_t$ = white noise error term at week t

Where (P,Q,K,N) = (2,2,4,4) representing the ARIMA(2,0,2) model and the digital influenza activity of the previous month. The residuals in the model resemble a white noise process (p-value = 0.8277).

3.4 Evaluation

Evaluation Metrics The effect of influenza on hospitalizations in the five selected departments is evaluated with three metrics. The first metric is the Pearson Correlation, which measures the degree of linear relationship between hospitalizations and influenza activity over time. The Pearson Correlation assumes a normal distribution for both variables of interest, a linear relationship between the variables and normally distributed errors (homoscedasticity). The relationship, however, might not be linear and the distribution of hospitalizations might be skewed, as is the case with many health datasets [30].

The non-parametric Spearman Rank Correlation test is therefore used as a second metric. The test measures the degree of association between two variables and does not make any assumptions about the underlying distribution of the data. Lastly, the Dynamic Time Warping (DTW) algorithm is used to find the optimal alignment between timeseries and so determine their similarity. The FastDTW algorithm proposed by Salvador and Chan [31], which approximates this alignment in linear time and space, works on both small and large timeseries datasets. DTW can rank the departments on similarity between the fluctuations in hospitalization and the fluctuations in influenza activity. A more extensive explanation is provided in Appendix IV.
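The three association measures can be sketched in a few lines. Note that the DTW function below is the classic exact dynamic-programming version with quadratic cost, not the FastDTW approximation used in the study; the function names and the absolute-difference cost are assumptions.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equally long series."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def dtw_distance(a, b):
    """Exact DTW distance with absolute-difference cost, O(len(a) * len(b))."""
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]
```

For a monotone but non-linear relationship such as y = x squared, the Spearman correlation equals 1 while the Pearson correlation stays below 1, which is the reason for reporting both. A lower DTW distance indicates that two series have a similar shape even when their peaks are shifted in time.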

The performance of the forecasting models discussed in the previous subsection is evaluated against four evaluation metrics, namely the Root Mean Squared Error (RMSE), the Mean Absolute Error (MAE), the Pearson Correlation, and the hit rate. The RMSE measures the difference between the predicted and true values. Because of its squared nature the metric penalizes large errors, which makes it particularly useful when the undesirability of an error grows faster than linearly with its size [32]. For interpretability, the MAE is reported since it averages the magnitude of errors without considering their direction. It is particularly useful when the undesirability of the error increases linearly. The Pearson Correlation is reported to measure the linear dependency between the predicted and the true influenza activity. The hit rate is reported to measure how well the model can predict the direction of change in influenza activity, independent of the magnitude of change. Formal definitions of these metrics can be found in Appendix V.

Cross-validation Since serial correlation is inherent in time-series data, regular k-fold cross validation or leave-one-out cross validation leads to incorrect predictions. Leaving out an observation does not remove all the associated information due to the serial correlation and is therefore problematic [26]. These obstacles are overcome by (1) rolling cross validation, where an initial training window is used for training and grows by one observation each round until the training window and forecast horizon capture the whole timeseries, and (2) a forward shift of a fixed training length by the forecast horizon after each iteration. The forecast horizon is set to 32 weeks, capturing a full influenza season. The initial training window is set to 84 weeks, capturing two influenza seasons and one summer season (week 20 - week 40). The k-fold cross validation procedure is summarized in Figure 9.
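The error metrics and the two rolling schemes can be sketched together as below. The function names are hypothetical. The hit rate is assumed here to compare the sign of the week-to-week change in the forecast with the sign of the week-to-week change in the true series; the formal definition in Appendix V may differ.

```python
import math

def rmse(y_true, y_pred):
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mae(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def sign(v):
    return (v > 0) - (v < 0)

def hit_rate(y_true, y_pred):
    """Share of weeks in which forecast and truth move in the same direction (assumed definition)."""
    hits = sum(
        sign(y_true[i] - y_true[i - 1]) == sign(y_pred[i] - y_pred[i - 1])
        for i in range(1, len(y_true))
    )
    return hits / (len(y_true) - 1)

def expanding_window_splits(n, initial=84, horizon=32, step=1):
    """Scheme (1): the training window starts at `initial` weeks and grows by `step`."""
    end = initial
    while end + horizon <= n:
        yield range(0, end), range(end, end + horizon)
        end += step

def sliding_window_splits(n, train_len=84, horizon=32):
    """Scheme (2): a fixed-length training window is shifted forward by `horizon`."""
    start = 0
    while start + train_len + horizon <= n:
        yield (range(start, start + train_len),
               range(start + train_len, start + train_len + horizon))
        start += horizon

# With a hypothetical series of 180 weeks:
expanding = list(expanding_window_splits(180))
sliding = list(sliding_window_splits(180))
```

In both schemes every test week lies strictly after every training week, so no future information leaks into the model being evaluated.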

Figure 9. k-fold cross-validation for time-series models

Evaluation Criteria The forecasting models are evaluated based on the following criteria.

Criteria 1: How accurately can the model predict the influenza activity over the whole influenza season?

Criteria 2: How accurately can the model predict the influenza activity for the upcoming two weeks?

Criteria 3: How well can the model predict the direction of change in influenza activity?

Criteria 4: Is the model underfitting or overfitting over a short forecasting horizon?

Criteria 5: Is the model interpretable?

3.5 Research Tools

All data sources are pre-processed with Pandas, a high-performance and flexible data analysis tool for the Python programming language. Data manipulation is performed using NumPy, an array-processing package for Python that can efficiently manipulate large multi-dimensional arrays. Statsmodels is used for statistical testing and modeling in Python, and SciPy for statistical tests. The "Caret" [33] and "Forecasting" [26] packages in the statistical computing language R are used to construct the predictive models. The R package "MLmetrics" [34] is used for evaluation. The "ARGO" [35] package is used to implement the ARGO model proposed by Yang. The "forecastHybrid" package is used to perform cross validation [36].

4. Results and Discussion

4.1 Effect of Influenza Activity on Hospitalizations

In the majority of departments there is a trend and/or seasonality present in the hospitalizations over time (as discussed in subsection 3.1). Removing the seasonal component and the trend-cycle component through additive decomposition leaves a remainder: the information in the series that cannot be explained by seasonality or trend-cycle. The remainder component is used to determine the effect of (1) total influenza activity on hospitalization, and (2) influenza type A activity on hospitalization. The latter is included since the rate of hospitalization increases significantly during the circulation of Influenza A viruses, as discussed in subsection 2.1. The effect is evaluated by the Pearson Correlation, Spearman Correlation and Dynamic Time Warping, as described in subsection 3.4. Table 2 outlines the results for the five departments over the whole sample covering three influenza seasons. In addition, the results over four influenza seasons, excluding the biggest hospital due to data unavailability, are outlined in Appendix VI. The remainder component is plotted against influenza activity in Figure 10.
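How a remainder component is obtained can be sketched with a classical additive decomposition. This is an assumption about the method: the text only states that an additive decomposition was used, and the study may have relied on a different variant. The function name and the synthetic series are hypothetical.

```python
def additive_remainder(series, period):
    """Remainder of a classical additive decomposition: series - trend - seasonal.

    The trend is a centred moving average over one period, so the first and last
    period // 2 positions have no trend and their remainder is None.
    """
    n = len(series)
    if n < 2 * period:
        raise ValueError("need at least two full periods")
    half = period // 2
    trend = [None] * n
    for t in range(half, n - half):
        window = series[t - half:t + half + 1]
        if period % 2 == 0:
            # Even period: the two end points of the window get half weight.
            total = sum(window) - 0.5 * (window[0] + window[-1])
        else:
            total = sum(window)
        trend[t] = total / period
    # Seasonal component: mean detrended value per position in the cycle, centred on zero.
    buckets = [[] for _ in range(period)]
    for t in range(n):
        if trend[t] is not None:
            buckets[t % period].append(series[t] - trend[t])
    means = [sum(b) / len(b) for b in buckets]
    centre = sum(means) / period
    seasonal = [m - centre for m in means]
    return [
        None if trend[t] is None else series[t] - trend[t] - seasonal[t % period]
        for t in range(n)
    ]

# Synthetic series: linear trend plus a repeating pattern of length 4.
pattern = [2.0, -1.0, 0.0, -1.0]
clean = [0.5 * t + pattern[t % 4] for t in range(12)]
remainder_clean = additive_remainder(clean, period=4)

# The same series with an unexplained surge in week 6.
surge = list(clean)
surge[6] += 5.0
remainder_surge = additive_remainder(surge, period=4)
```

For the clean series the remainder is zero wherever it is defined, while the surge shows up as the largest remainder value in week 6. For weekly hospitalizations the period would be 52.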

Table 2. Effect of Influenza Activity on Hospitalizations over three influenza seasons.

                      Influenza Type A + Type B              Influenza Type A
Department            Pearson    Spearman   DTW Distance     Pearson    Spearman   DTW Distance
Pulmonary Medicine    0.355***   0.341***   32.5             0.379***   0.358***   35.8
Cardiology            0.072      0.096      39.8             0.104      0.127*     40.5
Pediatrics            0.030      0.044      46.2             0.136*     0.159**    51.4
Internal Medicine     0.089      0.101      44.8             0.149**    0.159**    50.7
Geriatrics            0.174**    0.195***   39.6             0.256***   0.298***   41.2

Figure 10. Remainder component plotted against influenza activity: (a) Pulmonary Medicine, (b) Cardiology, (c) Pediatrics, (d) Internal Medicine, (e) Geriatrics

Some interesting conclusions can be drawn from the results in Table 2. Hospitalizations that cannot be attributed to seasonality or trend-cycle for Pulmonary Medicine seem to be correlated with influenza activity for both type A and type B. The correlation with influenza type A is only slightly higher. It seems that influenza type A and influenza type B both contribute to higher hospitalization of lung patients. Type A viruses are constantly changing and are responsible for flu epidemics. Type B viruses, on the other hand, cause less severe reactions in humans than type A viruses and do not cause pandemics. Nevertheless, type B influenza can still be extremely harmful. Since high influenza activity has a significant effect on emergency visits for respiratory complaints, influenza activity regardless of type is a factor that should be taken into consideration. Additionally, the remainder component seems to be elevated every winter season, even after removing seasonality (Fig. 10a). The effect becomes even more significant when an additional influenza season is added (Table 6).

For Cardiology hospitalizations, there seems to be no correlation with influenza activity. The DTW distance between the two series is, however, relatively low and comparable to that of Pulmonary Medicine. This means that the patterns seem to match reasonably well despite being uncorrelated. The remainder component seems to show a peak early in the influenza season (Nov-Dec), when influenza activity is generally low (Fig. 10b). One possible explanation could be that cardiac patients are vulnerable to the slightest influenza activity. Clinical expertise and further validation are needed to refute or confirm this suspicion. Adding another influenza season, however, yields a better alignment in space and time and a significant non-linear correlation with overall influenza activity (Table 6). This raises the suspicion that overall influenza activity has a moderate effect on hospitalizations of cardiac patients.

Perhaps the most surprising result is the very low and insignificant correlation between Pediatrics hospitalizations and influenza activity, despite the impact of influenza activity on hospitalizations of young children (subsection 2.1). A possible explanation could be the high vaccination rate among children in the Netherlands. Another explanation could be the fact that young children have not yet developed an extensive immune system, making them vulnerable to other viruses such as the Respiratory syncytial virus (RSV). RSV is related to the Influenza viruses and can have especially severe consequences for children younger than 1 year in the form of Bronchitis or Pneumonia [37]. These results persist when an additional influenza season is added (Table 6).

For Internal Medicine hospitalizations, there seems to be no correlation with influenza activity. Although the hospitalizations are slightly correlated with influenza type A, the number of hospitalizations can already be sufficiently explained by the seasonality and trend-cycle components, as reflected by the marginal peaks of the remainder component (Fig. 10d). These findings contradict the suspicion of elevated Internal Medicine hospitalizations during influenza epidemics, suspected by the Microbiologist. As with Pediatrics, these results persist when an additional influenza season is added (Table 6).

Hospitalizations of Geriatric patients, on the other hand, seem to be significantly correlated. Although the effect is relatively weak, the remainder component seems to be elevated in the winter period, when influenza activity is high. These findings support the claim that elderly patients in particular are vulnerable to influenza activity and are likely to be hospitalized. In general, elderly patients are more likely to suffer from chronic conditions. It is therefore surprising that the correlation is relatively low. In the Netherlands, vaccination coverage for elderly patients (55 and older) is among the highest in Europe, with a coverage of roughly 70% according to a National seasonal influenza vaccination survey conducted by the ECDC. More details can be found in Appendix VII. The high vaccination rate might explain the relatively weak correlation. Adding another influenza season, however, slightly improves the optimal alignment in space and time between hospitalizations and overall influenza activity (Table 6).

4.2 Forecasting Influenza Activity

The two digital influenza indicators extensively discussed in subsection 3.1 are aggregated Google searches and influenza-related activity on Twitter. Influenza activity is plotted against these indicators in Figure 11. The cross-correlations of these indicators with influenza activity are outlined in Table 3. From Table 3 it seems that flu-related activity on Twitter in the previous month is most correlated with influenza activity, whereas for Google the real-time flu-related searches and the flu-related searches from last week are most correlated with influenza activity. Google searches show a significantly higher correlation than Twitter activity, which is in line with the paper of Sharpe et al. [15].
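The lagged cross-correlations reported in Table 3 can be computed along the following lines. The function names and the synthetic series are hypothetical; lag k is assumed to pair the indicator at week t - k with influenza activity at week t.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equally long series."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def lagged_correlation(indicator, target, lag):
    """Correlation between the indicator at week t - lag and the target at week t."""
    if lag == 0:
        return pearson(indicator, target)
    return pearson(indicator[:-lag], target[lag:])

# Synthetic example in which the target simply repeats the indicator two weeks later.
indicator = [0, 1, 4, 9, 4, 1, 0, 1, 4, 9, 4, 1]
target = [0, 0] + indicator[:-2]
by_lag = {k: lagged_correlation(indicator, target, k) for k in range(5)}
```

In this constructed example the correlation peaks at lag 2, the true delay; in Table 3 the peak is at lag 0 for Google and at lag 4 for Twitter.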

Feeding these lagged indicator features into the models discussed in subsection 3.3 yields the results summarized in Table 4. The forecasting errors (Fig. 12a) and absolute forecasting errors (Fig. 12b) of the models are plotted in Figure 12. The forecasted influenza activity for the influenza season 2016/2017 of the models is plotted in Figure 13. Guided by the model evaluation criteria defined in subsection 3.4, some interesting conclusions can be drawn.

Firstly, incorporating digital indicators alongside time series information significantly improves forecasting performance, since the baseline model is beaten by every single model. This shows the added value of digital disease detection information over simple historical autoregressive terms. Classical ARIMA models are typically well suited to short-term forecasts (one-step-ahead forecasts) and not to long-term forecasts, due to the convergence of the autoregressive part to the mean of the series [26]. This is supported by the relatively good accuracy of the 1-week forecast (RMSE = 0.075 and MAE = 0.061) compared with the poor long-term forecast (RMSE = 0.237 and MAE = 0.204).

Secondly, in terms of forecasting the influenza season for tactical planning (2 weeks - 6 months), Machine Learning models show a good fit (Fig. 13) and outperform all timeseries models on both accuracy (criteria 1) and direction (criteria 3). One explanation could be the presence of nonlinear relationships between the variables in the model. Machine Learning models are particularly good at capturing these nonlinear relations in data in a nonparametric fashion [38]. Besides long-term forecasting, the Machine Learning algorithms also outperform the timeseries models in 1-2 week short-term forecasting for operational planning (criteria 2). Among the Machine Learning models, Generalized Boosting Regression predicts the direction of change best and is able to capture peak intensities (Fig. 13). SVM regression is also able to determine these peaks, whereas the Stacked Linear Regression model can more accurately predict influenza activity 1 or 2 weeks ahead. This might indicate that there is an optimal, dynamically chosen linear combination of weak predictors that forms a single robust predictor for influenza activity.

The problem with the majority of Machine Learning algorithms is interpretability (criteria 5). Machine Learning models are often referred to as black-box or data-driven models. When considering models, interpretability should be considered alongside accuracy. Certainly in the Health domain, medical experts have to understand the intelligence behind the predictive model to accept the model for clinical decision making [39]. A good model should be (1) understandable in reasonable time, (2) accurate, and (3) in line with domain knowledge [40]. Based on these criteria, timeseries models are preferred since the dependencies among the data and variables can be well understood and relatively easily explained.

Lastly, considering the short-term fit of the model (criteria 4), the proposed dynamic model seems to over-fit over a shorter forecasting horizon (Fig. 12a and Fig. 13). This suggests that this model can detect elevated influenza activity earlier and could therefore serve as an early warning indicator for elevated influenza activity. A notable result is the significance of the variables in the model. Neither the autoregressive nor the moving average terms are significant, whereas the Google searches of the previous month (4-weeks lag), influenza activity on Twitter of the previous month (4-weeks lag), and the cross-term between the change point feature and Google searches of the previous week are significant at a 5% significance level. Excluding the cross-terms from the model, however, leads to significant autoregressive and moving average terms together with significant lagged Google searches (lag 1, lag 3 and lag 4) and influenza activity on Twitter of the previous month. It seems that the cross-terms are able to capture the temporal autocorrelation and seasonality of the influenza activity. In view of interpretability (criteria 5), a model with cross-terms is preferred since fewer terms are used and the model becomes a multivariate regression model (having less strict model assumptions than timeseries models). The model specifications are summarized in Appendix VIII.

The ARGO model, on the other hand, seems to under-fit sudden elevations in influenza activity over a shorter forecasting horizon (Fig. 12a). This confirms the usefulness of utilizing multiple influenza indicators for prediction. As a last remark, ARGO dynamically reweights search information through Lasso regularization. The search terms most often selected by the model as predictors for influenza activity are outlined in Appendix IX. The top 3 most selected Google searches are "longontsteking", "griep spierpijn", and "griep hoofdpijn", reflecting health concerns in the general population.

Figure 11. Comparison Influenza Activity - Digital Indicator: (a) Google searches, (b) Influenza Activity on Twitter

Table 3. Cross-correlations digital influenza indicators and influenza activity

Indicator   lag = 0    lag = 1   lag = 2   lag = 3   lag = 4
Google      0.722***   0.666     0.603     0.535     0.457
Twitter     0.299***   0.307     0.330     0.346     0.354

(17)

Table 4. Evaluation of Forecasting Models

Model / horizon                                   RMSE    MAE     Correlation   Hit rate

Baseline ARIMA(2,0,2)
  1-week                                          0.075   0.061   -             -
  2-week                                          0.188   0.143   -             -
  season                                          0.237   0.204   -0.43         44.82%

Dynamic Regression (without interaction terms)
  1-week                                          0.051   0.051   -             -
  2-week                                          0.070   0.068   -             -
  season                                          0.144   0.128   0.73          55.17%

Dynamic Regression (with interaction terms)
  1-week                                          0.138   0.138   -             -
  2-week                                          0.098   0.078   -             -
  season                                          0.149   0.132   0.63          79.31%

ARGO
  1-week                                          0.105   0.105   -             -
  2-week                                          0.107   0.107   -             -
  season                                          0.109   0.099   0.94          62.07%

Stacked Linear Regression
  1-week                                          0.022   0.022   -             -
  2-week                                          0.050   0.044   -             -
  season                                          0.084   0.076   0.96          62.07%

SVM Regression
  1-week                                          0.024   0.024   -             -
  2-week                                          0.066   0.058   -             -
  season                                          0.106   0.088   0.93          65.52%

Generalized Boosting Regression
  1-week                                          0.031   0.031   -             -
  2-week                                          0.052   0.048   -             -
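The error metrics in Table 4 can be illustrated with a short sketch. RMSE and MAE follow their standard definitions. The hit rate is not defined in this section, so the version below is an assumption: it counts the share of week-to-week moves for which the forecast and the observations change in the same direction. All data are hypothetical.

```python
from math import sqrt


def rmse(actual, forecast):
    """Root mean squared error."""
    return sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))


def mae(actual, forecast):
    """Mean absolute error."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def hit_rate(actual, forecast):
    """Share of week-to-week moves with matching direction (assumed definition)."""
    hits = 0
    for t in range(1, len(actual)):
        actual_move = actual[t] - actual[t - 1]
        forecast_move = forecast[t] - forecast[t - 1]
        if actual_move * forecast_move > 0:
            hits += 1
    return hits / (len(actual) - 1)


# Hypothetical observed and forecasted influenza activity.
actual = [0.10, 0.20, 0.40, 0.30]
forecast = [0.10, 0.30, 0.30, 0.20]

print(rmse(actual, forecast), mae(actual, forecast), hit_rate(actual, forecast))
```

Because RMSE squares the errors before averaging, it penalizes a few large misses more heavily than MAE does, which matters when peak weeks are forecasted poorly.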


Figure 12. Forecasting errors for influenza season 2016/2017: (a) forecasting errors; (b) absolute forecasting errors.

Figure 13. Forecast of influenza season 2016/2017.

5. Limitations and Implications

Limitations. Despite the promising results of the forecasting models and the observed effect of influenza activity on hospitalization for Pulmonary Medicine, Cardiology and Geriatrics, this study has several limitations.

First, due to long processing times and limited financial resources, no data were collected on the influenza incidence rate. The true influenza activity will be higher than the number of specimens processed at the Influenza Surveillance Laboratory suggests. In turn, the true influenza activity might have a more significant effect on hospitalizations.

Second, there are some limitations to the use of Internet-based flu indicators. Twitter users are, on average, younger than the general population of a country and in general less vulnerable to the flu. Furthermore, the popularity of Twitter has decreased drastically in the Netherlands (Fig. 5), raising doubts about its usability in the future. Full access to the Twitter search API could, however, have resulted in more tweets being retrieved, providing a more accurate picture of flu-related activity on Twitter. Google searches, on the other hand, are influenced by news reports, making them a “noisy” indicator of actual disease activity.

Third, Wikipedia statistics for flu-related pages are not included in the models. Utilizing this information might result in even better accuracy, but at the expense of collecting and processing more data for a relatively small improvement. Furthermore, utilizing Wikipedia carries the inherent problem of determining whether a single user consulted both Google and Wikipedia for flu-related information. If such users are counted twice, estimates of new influenza-related cases will be inflated.

Fourth, influenza activity is not consistent from season to season. It is therefore necessary to constantly re-evaluate the methodologies discussed on newly available training data. This poses limitations to ARIMA-based models, especially during pandemics, which usually occur off-season. As with any predictive model, the quality of past performance does not guarantee the quality of future performance.
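One way to monitor a forecasting method as new data arrive is a walk-forward (expanding window) evaluation, sketched below. The naive last-value forecast is only a placeholder for whichever model is being monitored; the function names and the data are hypothetical.

```python
def naive_forecast(history):
    """Placeholder model: repeat the last observed value."""
    return history[-1]


def walk_forward_errors(series, min_train, forecast_fn=naive_forecast):
    """For each week t, forecast from all data before t and record the absolute error."""
    errors = []
    for t in range(min_train, len(series)):
        prediction = forecast_fn(series[:t])
        errors.append(abs(series[t] - prediction))
    return errors


# Hypothetical weekly influenza activity.
weekly_activity = [0.1, 0.2, 0.4, 0.7, 0.5, 0.3]
errors = walk_forward_errors(weekly_activity, min_train=3)
print(errors)
```

Tracking these errors over time gives an indication of when a model fitted on earlier seasons no longer matches the current season.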

Lastly, the effect of influenza activity has been relatively weak for most of the departments examined in this study. This can most probably be attributed to a significantly higher vaccination rate in the Netherlands compared to other European countries. Repeating this study in another European country with a lower vaccination rate might reveal a more significant effect. The effect of influenza activity on Pediatric hospitalizations is negligible, due to the resemblance of respiratory syncytial virus (RSV) to influenza. Using the dynamic models (ARGO, Dynamic Regression) to forecast RSV activity could help explain the elevated hospitalizations for Pediatrics.

Implications. In this study, the effect of influenza activity on hospitalization for various departments is examined. In addition, several forecasting models have been proposed that can predict influenza activity. The appropriate forecasting model should be selected according to the purpose of use. Possible purposes are (1) use of the model as an early warning system, (2) short-term forecasting, (3) peak intensity determination, or (4) examining the dynamics of the influenza season. The forecasted values of influenza activity can then be used as an exogenous variable in more advanced multivariate regression models or time series models that aim to predict the number of patients present for the Pulmonary Medicine, Cardiology or Geriatrics department, complementary to calendar variables and as a substitute for weather variables.
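As a deliberately reduced illustration of using forecasted influenza activity as an exogenous variable, the sketch below fits a single-regressor least-squares line of patient counts on influenza activity. A realistic model would also include calendar variables and autoregressive terms; the function name `fit_line` and all numbers are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical history: influenza activity and patients present per week.
flu_activity = [0.0, 0.1, 0.2, 0.3, 0.4]
patients = [20.0, 22.0, 24.0, 26.0, 28.0]

intercept, slope = fit_line(flu_activity, patients)

# Plug a forecasted activity value into the fitted line.
flu_forecast_next_week = 0.5
expected_patients = intercept + slope * flu_forecast_next_week
print(round(expected_patients, 1))
```

The slope expresses how many additional patients are expected per unit of influenza activity, which keeps the contribution of the exogenous variable easy to interpret.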

6. Conclusion and Recommendation

Conclusion. Inefficiencies in the healthcare system, such as overcrowding and increased waiting times, have shifted the focus to developing models that can predict the number of patient visits. A factor that could presumably improve the reliability of the predictions, complementary to calendar variables in regression or time series models, is influenza activity. High levels of circulating influenza can have a significant effect on emergency hospitalizations for a variety of departments and patient groups. Determining the onset, severity and duration of influenza outbreaks and elevated activity is hard, since these dynamics change from season to season.

Multiple studies have therefore aimed to develop digital influenza surveillance models that improve on traditional surveillance methods by including online influenza activity indicators from Google and Twitter. There is, however, no clear consensus on which model captures the dynamics of influenza activity most accurately, and the overwhelming majority of studies focus solely on the United States. The aim of this study has been twofold: (1) contributing to the existing literature by examining the effect of influenza activity on hospitalizations in the Netherlands for departments that receive a significant proportion of patients vulnerable to influenza, and (2) evaluating which predictive models provide the most accurate 1-2 week forecasts of influenza activity in the Netherlands.

Through the collection, preprocessing and analysis of a variety of data sources, including hospitalizations, influenza reporting, Google searches and tweets, and through feature engineering, it is demonstrated that influenza activity has a moderate effect on Pulmonary Medicine hospitalizations and a weak effect on Cardiology and Geriatrics hospitalizations that is not explained by winter seasonality. Furthermore, it has been shown that (1) influenza-related Google searches of the previous week and previous month have a predictive effect on influenza activity in the upcoming week, and (2) influenza activity on Twitter in the previous month has a predictive effect on influenza activity in the upcoming week.

For strategic purposes, several predictive models that can predict the duration and amplitude of influenza activity have been evaluated and discussed. Among the models evaluated, (1) all models beat the baseline model, demonstrating the importance of including digital disease detection information in influenza surveillance models, (2) the regression-based models (Stacked Regression and Dynamic Regression) perform particularly well on short-term influenza activity forecasting, whereas (3) the non-linear models (SVR and GBM) are particularly good at capturing peak intensities.

Certainly in the health domain, models need to be understandable, accurate and in line with domain knowledge. The appropriate model should therefore be selected in line with the purpose of use. Possible purposes are (1) serving as an early warning system, (2) short-term forecasting, (3) peak intensity determination, or (4) examination of the dynamics of the influenza season. The influenza activity forecasted by the preferred model can then be used as an exogenous variable in more advanced multivariate regression models, improving the prediction accuracy of the number of patient visits for the Pulmonary Medicine, Cardiology or Geriatrics departments.

Recommendation. Relating back to the Context section at the beginning of this paper, HOTflo can use the results and evaluated models of this feasibility study to accurately estimate influenza activity for the upcoming 1-2 weeks by utilizing both publicly available influenza data and social media data. The predicted values can be used in HOTflo’s existing autoregressive models for predicting the number of cardiac, respiratory and geriatric emergency patients present in the hospital. It is recommended to use the proposed Dynamic Regression Model for operational purposes, since it offers a good trade-off between short-term forecasting accuracy and interpretability, increasing the chances of acceptance for clinical decision making and capacity management. Furthermore, the model can serve as an early warning system, indicating when influenza activity as an exogenous variable adds explanatory power to the existing model.

The switch to a more pro-active model can potentially explain more of the patient-volume variability for these departments and more accurately predict the influx of patients during periods when influenza activity is elevated. In the long term, this can create additional value for hospitals with respect to workload perception, patient safety and efficiency.

Acknowledgments

I would like to express my gratitude to my external supervisor and second internal supervisor Dennis Roubos for the useful comments, remarks, guidance and engagement throughout the learning process of this master thesis. Furthermore, I would like to thank Ger Koole for his comments and remarks, for introducing me to the right people, and for helping me on the way. Also, I would like to thank the people involved in HOTflo, who have shared their expertise with me and have created a great working atmosphere. Lastly, I would like to thank Maarten Marx for coordinating this project and for his flexibility.

