
University of Amsterdam

Masters Thesis Computational Science

Modelling Covid-19 Interventions with Machine Learning and SIR Models

Author:

Rebecca Davidsson

W. Weterings - Supervisor
V. Krzhizhanovskaya - Examiner
V. V. Vasconcelos - Assessor

A thesis submitted in partial fulfilment of the requirements for the degree of Master of Science in Computational Science

Avanade


Declaration of Authorship

I, Rebecca Davidsson, declare that this thesis, entitled ‘Modelling Covid-19 Interventions with Machine Learning and SIR Models’ and the work presented in it are my own. I confirm that:

• This work was done wholly or mainly while in candidature for a research degree at the University of Amsterdam.

• Where any part of this thesis has previously been submitted for a degree or any other qualification at this University or any other institution, this has been clearly stated.

• Where I have consulted the published work of others, this is always clearly attributed.

• Where I have quoted from the work of others, the source is always given. With the exception of such quotations, this thesis is entirely my own work.

• I have acknowledged all main sources of help.

• Where the thesis is based on work done by myself jointly with others, I have made clear exactly what was done by others and what I have contributed myself.

Signed:

Date: June 2021


UNIVERSITY OF AMSTERDAM

Abstract

Faculty of Science

Master of Science in Computational Science

Modelling Covid-19 Interventions with Machine Learning and SIR Models by Rebecca Davidsson

Many models have been implemented to predict future infections of Covid-19. Machine learning (ML) models and epidemiological Susceptible-Infectious-Recovered (SIR) models have provided promising results towards accurate predictions of the number of confirmed cases. Multiple studies have shown that ML models can outperform variants of the SIR model due to their ability to learn from past trends and capture variance in the data. However, most predictions of future Covid-19 trends are based on SIR models or their variants. SIR models are often used to model scenarios of different kinds of interventions. Because of the limitations of both kinds of models, it is not yet clear which model should be used to model interventions. It is important to create a clear overview of which models should be used for Covid-19 forecasting, to help epidemiologists choose a forecasting model for future interventions or vaccination programs. Here, a comparison is made between SIR models and the ML model Long Short-Term Memory (LSTM) in predicting confirmed cases after an intervention. A hybrid model that combines SIR and LSTM is implemented in order to overcome model limitations. For predicting the trend of confirmed cases after an intervention, the hybrid model was able to outperform both SIR and LSTM. However, the most suitable model differs for every country and depends on the goal of the predictions. A model decision has to be based on whether the goal is to predict the confirmed cases in the short or the long term. For modelling interventions such as a lockdown, hybrid models can outperform SIR and LSTM models. Hybrid models can help policymakers and epidemiologists make more informed decisions about Covid-19 interventions.


Acknowledgements

I would like to thank my supervisor from Avanade, Wilbert Weterings, for helping, giving advice, and for general support during this entire thesis, for critical feedback on my work, and for thinking along with my questions. Also, for the great weekly talks to catch up on social contact during this period of working remotely due to the pandemic.

In addition, I would like to thank my examiner, Valeria Krzhizhanovskaya from the Computational Science team, for advice and critical feedback about my work.

To both of you, I really appreciate the help and time you put into supporting me. A great thank you.

Lastly, I would like to thank Vítor Vasconcelos for assessing my work, Tom and Sam for reviewing my thesis and giving critical feedback, and Renee - my Avanade buddy - for weekly updates and fun meetings.


Contents

Abstract ii
Acknowledgements iii
List of Figures vi
List of Tables ix
Abbreviations xi
1 Introduction 1
2 Literature Review 4

2.1 Machine learning methods . . . 5

2.1.1 Conclusion . . . 8

2.2 SIR models . . . 8

2.2.1 Variants of the SIR model . . . 9

2.2.2 Conclusion . . . 11

2.3 Comparing and Combining ML- and SIR-models . . . 13

2.3.1 Combining ML and SIR models . . . 14

2.4 Modelling NPIs . . . 14

2.4.0.1 Including NPIs in SIR and ML models . . . 15

2.4.1 Hyperparameter optimization . . . 16

2.5 Research Question & Hypothesis . . . 17

3 Methods 18
3.1 Data . . . 18

3.2 Evaluating forecast accuracy . . . 21

3.3 SIR model variants . . . 22

3.3.1 The basic SIR model . . . 22

3.3.2 SIR-D model . . . 23

3.3.3 SEIR model . . . 24

3.4 Parameter estimation and optimization . . . 24

3.4.1 Phase dependent parameters . . . 27

3.5 LSTM model . . . 27


3.5.1 Clustering-based technique to train the LSTM model . . . 29

3.6 Modelling NPIs . . . 31

3.6.1 Change Point Analysis to identify delay effects of NPIs . . . 31

3.6.2 Effect of NPIs on ODE parameters . . . 32

3.6.3 Including interventions in the SIR model variants . . . 33

3.7 Uncertainty Quantification & Sensitivity Analysis . . . 35

3.8 Hybrid models . . . 37

3.8.1 LSTM enhanced by SIR - M1 . . . 38

3.8.2 SIR enhanced by LSTM - M2 . . . 40

4 Results 42
4.1 Modelling NPIs . . . 42

4.1.1 Delay effect of NPIs . . . 42

4.1.2 Modelling NPI scenarios with SIR models . . . 43

4.1.3 Summary of model comparison . . . 48

4.2 Results hybrid models . . . 49

4.2.1 LSTM enhanced by SIR - M1 . . . 50
4.2.2 SIR enhanced by LSTM - M2 . . . 52
4.3 Comparing models . . . 55
4.3.1 Uncertainty Quantification . . . 56
4.3.2 Sensitivity Analysis . . . 57
5 Discussion 61
5.1 Interpretation of results . . . 61
5.2 Implications . . . 63
5.3 Limitations . . . 64
5.3.1 Model evaluation . . . 65
5.4 Recommendations . . . 67
6 Conclusion 69
6.1 Future work . . . 70
6.2 Summary . . . 71
Bibliography 72
Appendix 83
.1 Code and contributions . . . 83

.2 Referred tables and figures . . . 83

.2.1 JHU dataset description . . . 83

.2.2 Excluded countries . . . 83

.2.3 OxCGRT dataset description . . . 84

.2.4 Parameters and their values . . . 85

.2.5 Event study approach . . . 86

.2.6 SIRD NPI scenario predictions . . . 87

.2.7 SEIR NPI scenario predictions . . . 88


List of Figures

3.1 Number of Confirmed Cases on a logarithmic scale against calculated StrI for all countries listed in the OxCGRT dataset. The grey line indicates a non-linear trend line, computed by local regression. Low values of StrI are associated with lower values of confirmed cases. StrI increases along with the number of confirmed- and death cases. . . 19

3.2 StrI over time for multiple countries. Green windows indicate maximum stringency for this country. For example, for the United Kingdom, maximum stringency of policy measures was applied in January. . . 20

3.3 Course of RMSLE in parameter estimation for the SIR model. The individual dots indicate all data values per iteration and the green line indicates an average of these points. Error stabilizes to a value between 1 and 2 after about 250 iterations. . . 26

3.4 RMSLE values associated with computed parameter estimations for the SIR model. For mortality rate and latent period, a bigger initial range was set for parameter estimation, which explains the higher errors (indicated with light colors), compared to low errors (indicated with dark colors). The x- and y-axis show the estimated parameter values. . . 26

3.5 Single LSTM cell structure, where ft represents the forget gate, it the input gate and ot the output gate, ht−1 the output of the previous LSTM cell at timestep t − 1, xt the input at the current timestep and C the cell at timestep t. The dot represents a multiplication and the plus-sign represents the sum over all inputs. . . 28

3.6 Change in the ODE parameter ρ over time for the Netherlands (left) and the average of all countries (right). For the Netherlands, there is no clear pattern or relationship between policy stringency (blue) and effective contact rate (red). On average (yellow line), ρ shows a decrease after May 2020, with an increase in November 2020. Grey lines indicate ρ values for individual countries. . . 33

3.7 An overview of methods used to implement sensitivity analysis and uncertainty quantification. . . 36

3.8 M1 architecture, including the rolling update mechanism and SIR predictions. Here, the input data X consists of time series data of the number of confirmed cases xCt and the SIR model predictions xSt. Details of the LSTM cell are described in Section 3.5. The starting arrow indicates the beginning of the rolling update mechanism. . . 38

4.1 Delay, τ, in days for each country, where darker colors indicate a longer delay period. The minimum value for τ was 8.3, whereas the maximum value was 30.2. . . 43

4.2 Correlation diagram of each individual NPI and model parameters, adjusting for a delay in NPI effects. The percentage at the top represents the percentage of the maximum calculated delay. Here, an average is computed over all countries, where the average of τ was 16.3 days. In this case, 0% of τ represents the correlation at the same day of the NPI. 100% of τ represents the maximum of calculated τ. . . 44

4.3 SIR predictions including separate interventions for a selection of countries. This figure shows the effects of an NPI on the number of confirmed cases if a specific NPI would be implemented at maximum strength. Parameters associated with the NPI are estimated by Equation 3.25. NPIs with dotted lines are associated with too little data and are, therefore, not reliable predictions. Note that the y-axis is not fixed and is different for each country. . . 45

4.4 Prediction error (MAPE) of the LSTM, SIR, SIRD, and SEIR over time, averaged out over multiple countries for fixed start dates: 1st of December, 2020 (left) and 1st of February, 2021 (right). Note that the parameters here are not adjusted for interventions. . . 46

4.5 Global and country-level predictions of confirmed cases for SIR models (A, D) and deep learning models (B, D). For B and D, confidence intervals are shown along with the mean of 25 runs. No confidence levels are shown for the SIR models since these models are deterministic. Figure A and B show global-level predictions, while C and D show country-level predictions of, in this case, the Netherlands. The same trends were found for other countries. . . 48

4.7 Predicted values (green) and observed values (red) after school closing in the United Kingdom and Italy. . . 51

4.6 Average (yellow) computed strengths for corresponding NPIs, used in model M1. Grey lines show computed strengths for individual countries, while the yellow line indicates the average. Strengths of the variables Stay home restrictions, cancel events, information campaigns, and testing policy are computed to be relatively small compared to the other variables. Stringency index at maximum strength (lockdown) is computed to have the largest strengths, indicating the largest effect of this NPI. The x-axis shows the number of days, where zero indicates the start of an NPI. . . 51

4.8 M1 predictions of confirmed cases along with their standard deviation for 30 runs for multiple countries. Here, predictions for the NPI gatherings restrictions and Stringency Index (StrI) are shown. The red line indicates a 7-day rolling average. The dashed line indicates the start date of validation, selected by the start of a period of highest StrI - a lockdown. For all countries, except for Sweden, predictions of the number of confirmed cases after a lockdown are lower compared to no restrictions. Detailed descriptions per country can be found in Section 4.2.1. . . 54

4.9 M2 predictions along with their standard deviation for 30 runs for multiple countries. Here, the NPIs lockdown (highest StrI) and gatherings restrictions are shown. The red line indicates a 7-day rolling average. The dashed line indicates the start date of validation. . . 55

4.10 RMSE and MAPE for all concerning models for 7- and 14-day predictions. Values are shown as an average per country. Blue bars indicate the normalized RMSE, red/orange bars show the MAPE. The left-hand side shows the RMSE and MAPE for model predictions without NPIs. In other words, this shows the performance of models when no NPIs are included in the prediction. The right-hand side shows model predictions where NPIs are included in the modelling process. . . 56

4.11 MAPE for predictions in multiple countries for 50 runs for the NPI lockdown. Since the SIR model (blue) is deterministic, only one data point is shown. . . 56

4.12 Uncertainty Quantification of M1 (left) and M2 (right), showing confidence bands, calibration and prediction intervals. See 4.3.1 for details. . . 57

4.13 A boxplot of computed first-order sensitivity indices Si for all NPIs. High values for Si indicate a large sensitivity on the model outcome. The respective highest and lowest medians of Si were found for Stringency index and Testing policy. . . 58

4.14 SA for M1 (top) and M2 (bottom). First-order indices Si (grey) and total indices ST are shown along with 95% confidence intervals for ST (the black bar). . . 60

5.1 Flow chart of model choice in the context of forecasting Covid-19 cases (with or without NPIs). . . 67

1 SIRD predictions including individual interventions for a selection of countries. This figure shows the scenarios where a specific NPI would be implemented at maximum strength. Parameters associated with the NPI are estimated by Equation 3.25. NPIs with dotted lines are associated with too little data and are therefore not reliable predictions. Note that the y-axis is not fixed and is different for each country. . . 87

2 SEIR predictions including individual interventions for a selection of countries. This figure shows the scenarios where a specific NPI would be implemented at maximum strength. Parameters associated with the NPI are estimated by Equation 3.25. NPIs with dotted lines are associated with too little data and are therefore not reliable predictions. Note that the y-axis is not fixed and is different for each country. . . 88

3 RMSE (normalized and non-normalized for population size) for predictions


List of Tables

2.1 An overview of discussed papers related to modelling Covid-19 infections with a variant of the SIR model, sorted by date. . . 12

3.1 Evaluation metrics used to evaluate performance. Here, ŷ is the estimated value of y at time t in the time series data and N the total number of data points. ȳ indicates the mean value for all observations. . . 21

3.2 Model parameters along with their non-dimensional parameter. Non-dimensional parameters are independent of the units of time and population, facilitating the understanding of parameter values. . . 25

3.3 Parameter values set for the LSTM model. . . 30

3.4 LSTM model loss over 200 epochs, where RMSE was used as minimization function. Convergence of both the training set (blue) and validation set (red) was seen after about 70 epochs. . . 30

4.1 Parameter estimation and performance metrics for global-level predictions over the period 01-10-2020 until 01-02-2021. . . 47

4.2 Parameter estimation and performance metrics for country-level predictions over the period 01-10-2020 until 01-02-2021. The average is computed over all concerning countries, mentioned in section 3.1. The dimensional parameters are also given for 1/β and 1/γ. This division (1/x) is made to interpret the values in days. . . 47

4.3 Overview of strengths and weaknesses of SIR-model variants and deep learning models in the context of predicting Covid-19 infection spread. Prediction performance is evaluated by computing the prediction errors RMSE and MAPE. . . 49

4.4 Prediction errors for M1 computed for the NPI Lockdown, represented by the highest value of StrI for that country. For the second column, RMSE norm., RMSE is normalized by population size for each country to facilitate comparison of errors between countries. . . 51

4.5 Prediction errors for M2 computed for the NPI Lockdown, represented by the highest value of StrI for that country. For the second column, RMSE norm., RMSE is normalized by population size for each country to facilitate comparison of errors between countries. . . 53

1 OxCGRT dataset variables including all Containment and closure policies along with their name, description and coding. . . 84

2 All parameters along with their corresponding SA range. . . 85

3 M1: Reported RMSE, MAE and MAPE. Empty columns indicate missing or insufficient data. . . 89

4 M2: Reported RMSE, MAE and MAPE. Empty columns indicate missing or insufficient data.


Abbreviations

BO Bayesian Optimization

CPA Change Point Analysis

ITS Interrupted time-series

JHU Johns Hopkins University

M1 Hybrid Model 1

M2 Hybrid Model 2

MAPE Mean Absolute Percentage Error

MAE Mean Absolute Error

MSU Model Structure Uncertainty

NPI Non-pharmaceutical intervention

ODE Ordinary Differential Equation

OLS Ordinary Least-Squares

OxCGRT Oxford-Covid-19 Government Response Tracker

PU Parametric Uncertainty

RMSE Root Mean Squared Error

SA Sensitivity Analysis

SU Scenario Uncertainty


StrI Stringency Index

TPE Tree Parzen Estimator

UQ Uncertainty Quantification

Models

CNN Convolutional Neural Networks

GRU Gated Recurrent Unit

LSTM Long Short-Term Memory

MLP Multilayer Perceptron

RNN Recurrent Neural Network


Chapter 1

Introduction

The Covid-19 pandemic is the ongoing pandemic of coronavirus disease 2019 (COVID-19), which is caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). It has been causing enormous damage to well-being and people's health, according to the Organisation for Economic Cooperation and Development. At the time of writing, the number of confirmed cases was more than 150 million worldwide, with more than 3.5 million deaths [1]. Unfortunately, the number of cases is estimated to keep rising [1]. In the meantime, policymakers are having a hard time operating and making tough decisions based on the sparse data that is available [2]. Predictive analytics and forecasting tools are used to plan interventions to prevent the further spread of the virus. A lot of new research in the field of machine and deep learning helps policymakers make better decisions about planning interventions. Here, accurate prediction of future infections is an important aspect of the decision. However, a large number of factors make it difficult to accurately study the effect of interventions on future infections, with three factors in particular. First, there is the delay effect between implementing an intervention, such as a lockdown, and seeing an actual difference in the number of new infections. This delay is partly due to the incubation period of the virus, which is estimated to have a median of 5 days [3]. In addition, the delay effect can be explained by other factors such as the time it takes for the population to adapt to a new intervention and the time it takes to get a test result. This delay (or lag effect) makes it very hard to study the actual effect of an intervention. Secondly, studying individual interventions, such as the difference in effect between a lockdown and social distancing, is very challenging due to the interference of multiple policy measures at the same time. For example, in most countries, when a lockdown is implemented, other interventions are also applied. Thirdly, there is relatively little data available, which makes it hard to train and implement deep learning models to forecast future infections. In addition to these limitations, when policymakers eventually choose a type of intervention to implement, choosing the optimal level of restriction or stringency is also a difficult task. Determining policy stringency has been a highly non-trivial task due to socioeconomic turbulence and the rapidly fluctuating situation.

As already stated, predicting confirmed cases of Covid-19 is a very challenging process. There is a lack of historical data, a lack of consistency in testing data between countries, a large variety in government-led infection prevention approaches, and a delay between implementing a non-pharmaceutical intervention (NPI) and the moment an actual effect can be seen in the number of newly infected individuals. A successful attempt was made to predict future Covid-19 cases based on historical data of the Influenza virus [4]. However, the different kinds of NPIs during the Covid-19 pandemic are not comparable with those applicable to Influenza.

It has been proposed that Susceptible-Exposed-Infectious-Recovered (SEIR) or SIR models provide a promising research area [5]. Considering the huge negative impact of the virus, with over 3.5 million deaths worldwide [1], it is important that policymakers can rely on prediction models to make the best decisions. One of the most used prediction models is implemented by scientists from Johns Hopkins University and is based on a SEIR model. However, its model parameters are deterministic and the model does not account for changes in policy measures.

To help understand why some model forecasts are more optimistic or pessimistic than others, it is important to understand the workings of these models. Considering that there is not one type of data that can be used to predict new infections (or new confirmed cases), machine learning (ML) and compartmental models such as the SIR model can be combined to predict the number of new infections more accurately [6]. Until now, not much research has been done to investigate the integration of changing NPIs into ML models and the SIR model for Covid-19 forecasting, which opens up a gap in current research. In particular, the differences between ML models and variants of the SIR model have not yet been studied extensively. Investigating the different models and their application to modelling NPIs could help decision-makers make the most informed decisions [7]. Therefore, the research question and sub-research question of this thesis are:

What is the difference between modelling Covid-19 interventions in future spread forecasts with ML and SIR model variants?

To what extent can ML and SIR models be used to create a hybrid model that accurately predicts new COVID-19 infections?

In the research on the difference between modelling Covid-19 with ML compared to SIR, recommendations will be made for model choice based on several features. This thesis is not focused on creating the model that minimizes the prediction error. It rather focuses on highlighting the techniques behind the different models and their ability to capture variance in the spread of the virus. Furthermore, a goal is to give a clear overview of which models should be used in given circumstances.

Thesis structure

First, this thesis will introduce the subject in a literature review. The literature review will cover previous research about SIR and ML models in the context of Covid-19. Furthermore, a broad comparison is made between SIR and ML models. At the end of this section, a more detailed research question and corresponding hypotheses are stated. The methods will cover details of the used SIR and ML models along with their equations and parameters. In Chapter 4, a thorough analysis of the differences between SIR and ML models is discussed in terms of short-term, long-term, global and regional prediction accuracy. This chapter will introduce arguments that partially answer the main research question. Section 4.2 will answer the second research question by introducing two new hybrid models, in which aspects of SIR and ML models are combined. Besides researching the models' potential to predict new confirmed cases, another goal of this last step was to identify potential weaknesses of both types of models. Lastly, the discussion and research conclusion are given.


Chapter 2

Literature Review

A large part of the literature regarding epidemiological modelling focuses on the following goals: capturing disease dynamics [8], accurate forecasting of newly infected cases [9], estimating model parameters associated with NPIs [10], and optimizing public-health policies [11]. This thesis will mainly focus on forecasting confirmed cases, with the sub-goal of estimating model parameters associated with NPIs in order to model the future spread of confirmed cases given an NPI scenario, using ML, SIR, and combined ML-SIR models. The purpose of this literature review is to gain insight into currently applied ML and SIR models for Covid-19 forecasting. Sub-questions used to structure the literature research are:

• Which ML and SIR models are most common in the prediction of Covid-19 infections? This question will be answered in sections 2.1 and 2.2.

• What is already known about the strengths and weaknesses per model? The answer will be given in section 2.3.

• How can these models integrate information about NPIs? Here, NPIs are defined by public health measures such as quarantine control or lockdowns. This question will be answered in section 2.4.

Note that throughout this thesis, SIR models are used as a general term for epidemiological model variants such as Susceptible-Exposed-Infected-Recovered (SEIR) and Susceptible-Infected-Recovered-Dead (SIRD).

A selection of articles was made using an abstract-screening method. First, several abstracts of related articles were screened to identify important features to search for. Using Google Scholar and the WHO Covid-19 Global literature on coronavirus disease, articles were searched for using the keywords Covid-19, machine learning, deep learning, SIR, SEIR, forecasting, predictions, intervention, and infection spread. To include high-quality papers, the selection was refined by filtering on papers with a high number of citations. Furthermore, some papers were excluded due to a lack of results for sensitivity analysis and model performance. However, this criterion was not applied to papers about the SIR model, since an analysis of performance was often not conducted for such papers. Furthermore, the publication date was an important aspect: a lot of new studies have been published, presenting new insights into Covid-19 forecasting and its analysis.

2.1 Machine learning methods

A lot of research has been done to implement machine learning models that can accurately predict the number of confirmed cases. Ahmad et al. [5] presented a taxonomy of machine learning models used for Covid-19 forecasting: traditional machine learning regression, deep learning regression, network analysis, and methods based on social media and search query data. This literature review is mainly focused on deep learning regression, extended by novel approaches. The next paragraph will summarize previous literature on deep learning in the context of predicting Covid-19 infections.

Various types of deep learning neural networks have been applied to predict the spread of Covid-19 infections. Deep learning models used for Covid-19 are mostly based on convolutional neural networks (CNN) [12], long short-term memory (LSTM) networks [13, 14], autoencoders [15], polynomial neural networks [16], multilayer perceptrons (MLP) [12], and gated recurrent units (GRUs) [17]. Other ML models such as Random Forest are not considered, because they require the time series data to be transformed into the correct format, leading to lower model performance in terms of accuracy and bias [18]. Covid-19 case data form a temporal dataset, requiring models that can handle temporal data. Before diving into the real-world application of these models, a brief overview will be given of the models currently used for Covid-19 predictions.

Neural Network (NN). NNs are collections of units or nodes passing on signals to model a specific function. For a model with input x and output y, the model equation would then be y = f(x) + ε, where ε is an error term. NNs can be used to model highly complex and non-linear functions. A NN consists of connections, where each connection sends the output of a neuron to the input of another neuron. Each connection has its own weight. A chosen propagation function determines the input of a neuron from the output of its predecessor neurons. Furthermore, the learning rule determines the weights of the connections. The weights then serve to compute a favored output for a given input. NNs can consist of multiple layers or even a very large number of layers, in which case they are called deep learning models.

Convolutional Neural Networks (CNN). CNNs are similar to neural networks, but with constraints to reduce the complexity of the model, and are mainly used for image processing [19]. They are also called space-invariant neural networks since they are based on the idea of sharing the weights of filters that slide along input features and then generate so-called feature maps as a response.

Recurrent Neural Networks (RNN). As the name suggests, RNNs are NNs with recurrent connections. The connections can go in the opposite direction of the normal signal flow, which can form cycles in the network. RNNs learn while training the model and can remember prior inputs. Compared to regular NNs, the output is not only influenced by the weights applied to the connections, but also by a hidden state. This hidden state represents the context based on prior input and output, which means that similar input values can generate different output values depending on previous input values in the model. RNNs are suitable for temporal and sequential data. In line with CNNs, RNNs also consist of multiple layers of neurons where each neuron is assigned a weight. In comparison to CNNs, RNNs are less suitable for spatial data such as images. CNNs take fixed-sized inputs and generate fixed-sized outputs, whereas RNNs can handle arbitrary input and output lengths.

Long Short-Term Memory (LSTM). LSTM is a variant of the Recurrent Neural Network (RNN) and can be used to predict time series data [20]. The model consists of gates to regulate the flow of information. The gates in the network can learn to select important information from time series data and can use this to predict values. An LSTM is generally composed of three gates: an input gate, an output gate and a forget gate. In comparison with RNNs, LSTM networks use special units in addition to the standard units used by RNNs. They include a memory cell, used to remember information for long periods of time. LSTM models are therefore more suitable for long-term memory of time series. The gates are used to regulate how this information is stored or forgotten. A more detailed explanation of LSTM models is given in section 3.5. Next to the unidirectional LSTM, the bi-LSTM consists of two LSTM networks, each taking the input in opposite directions.
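To make the mechanics concrete, the sketch below sets up a minimal LSTM for one-step-ahead forecasting of a univariate case series, using a sliding window of past days as input. The data, window length, layer size, and number of epochs are illustrative assumptions and not the configuration used in this thesis (see Section 3.5 for the actual model).

```python
import numpy as np
import tensorflow as tf

# Synthetic stand-in for a daily cumulative-cases series (hypothetical numbers).
rng = np.random.default_rng(0)
series = np.cumsum(rng.poisson(50, size=200)).astype("float32")

window = 14  # use the past 14 days to predict the next day
X = np.array([series[i:i + window] for i in range(len(series) - window)])
y = series[window:]
X = X[..., None]  # shape (samples, timesteps, features) expected by the LSTM layer

model = tf.keras.Sequential([
    tf.keras.layers.LSTM(32),   # gated memory cells summarise the input window
    tf.keras.layers.Dense(1),   # regression head: next day's value
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=10, verbose=0)

# One-step-ahead forecast from the most recent window.
next_day = model.predict(series[-window:].reshape(1, window, 1), verbose=0)
print(next_day)
```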

Gated Recurrent Unit (GRU). The GRU is a variant of the LSTM model that is used to reduce the number of parameters and make the model less complex [21]. Also, the GRU does not have an output gate. In some cases, GRUs have been shown to perform better on small datasets compared to LSTM models [22]. The key difference between GRU and LSTM is found in the gating structure: GRUs have two gates, while LSTM consists of three gates. GRUs are preferred over LSTM if the dataset is small, and LSTM otherwise [23]. A strength of the GRU is its computational time and relative simplicity; the GRU is more efficient because it has fewer parameters than the LSTM [24]. Both LSTM and GRU include memory to overcome the vanishing gradient problem that occurs in the RNN.

Variational Auto Encoder (VAE). The VAE is a directed probabilistic graphical model. Its posterior is approximated by a NN, resulting in an autoencoder-like architecture. A normal autoencoder outputs a reconstruction of its original input. VAEs, however, assume that the input data has an underlying probability distribution and then try to find the parameters of this distribution, such as the parameters of a Gaussian distribution.

Multilayer Perceptron (MLP). The MLP is a feed-forward neural network that can learn non-linear and real-time models. It has three types of layers: input, output, and hidden layers. Prediction and classification are implemented by the output layer. Neurons in the MLP are trained by a backpropagation algorithm. Signals travel from input to output, through one or more hidden layers.

Some results show that the exploitation of deep learning models has led to improvements in predicting Covid-19 infections compared to regression models [25,26]. The next paragraph will cover previous literature about the implementation of deep learning models on Covid-19 time series data.

Huang et al. compared multiple models for Covid-19 forecasting [12], including CNN, GRU, LSTM, and MLP. This study demonstrated that the CNN outperformed the other three deep learning methods. Here, it is important to note that this study only used training data from seven Chinese cities. Considering that this study was performed in March of 2020, the availability of data regarding Covid-19 was limited. This study proposed the use of a mixed structure of deep learning models to establish more accurate prediction results. Furthermore, Shahid et al. [25] compared Bi-LSTM, LSTM, and GRU, where the Bi-LSTM outperformed the other models for predictions. Models were trained on a combination of data from ten countries in the period between January and May 2020. For evaluation, the Mean Absolute Error (MAE), Root Mean Squared Error (RMSE) and R² were used (see Section 3.2 for details about evaluation metrics). Then, individual predictions per country were made for 48 days into the future. This study did not report the difference between short-term (1 - 14 days) and long-term (>14 days) predictions for the different models. However, the Bi-LSTM and LSTM have shown promising results regarding the prediction accuracy of Covid-19 cases.
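For reference, the evaluation metrics mentioned above have short closed-form definitions; the snippet below implements them in their common textbook form (not copied from the cited study), with made-up numbers as a usage example.

```python
import numpy as np

def mae(y_true, y_pred):
    # Mean Absolute Error: average absolute deviation between observation and prediction.
    return np.mean(np.abs(y_true - y_pred))

def rmse(y_true, y_pred):
    # Root Mean Squared Error: penalises large deviations more strongly than MAE.
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

def r2(y_true, y_pred):
    # Coefficient of determination: 1 minus residual variance over total variance.
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot

y_true = np.array([100.0, 120.0, 150.0, 200.0])  # hypothetical observed cases
y_pred = np.array([110.0, 118.0, 160.0, 190.0])  # hypothetical model output
print(mae(y_true, y_pred), rmse(y_true, y_pred), r2(y_true, y_pred))
```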

Several other studies have also confirmed that LSTM models can outperform other deep learning methods for infection predictions [25–28]. However, a more recent study looked deeper into the differences between RNN, GRU, VAE, LSTM and Bi-LSTM in performance and accuracy [29]. This study showed that, for Covid-19 data, training an RNN is relatively quick compared to GRU and LSTM, mainly because the RNN is a simpler model. Furthermore, this study demonstrated promising results for the VAE algorithm compared to the other methods. Zeroual et al. (2020) note that the VAE algorithm had not been applied before in Covid-19 forecasting [29]. The reason for the higher performance of VAE algorithms could be their ability to capture almost all variability in the data, even when the amount of data is small. Training data included 210 countries with 148 days of time series data. Predictions were then made for every country separately for 17 days into the future. Again, the difference between short- and long-term predictions was not reported.

The differences between RNN, GRU, and LSTM, concerning the number of predicted days, were highlighted by Wang et al. [30]. To overcome the problem of data sparsity, a clustering-based method was used to augment training data for each region. Spatial regions were then clustered based on trend similarity, and for each cluster a model was trained. This study showed that the RNN performs best on short-term forecasting of one week into the future, whereas the LSTM outperforms the GRU and RNN methods for forecasting three and four weeks ahead. Thus, it was observed that a classical RNN performs better than GRU and LSTM on short-term forecasting and worse on long-term forecasting. The finding that LSTM outperforms GRU and RNN for multivariate time series analysis was also reached by other studies [27, 28].

2.1.1 Conclusion

In this section, an attempt was made to answer the question of which machine learning models are most common in the prediction of Covid-19 infections. Deep learning methods emerged as a promising field of research when focusing on predicting infections. Considering recent developments and new findings in the field of deep learning, the current most advanced machine learning models are RNN, LSTM, bi-LSTM, GRU, and VAE [29]. These models have the advantage of handling variance in time series data and are flexible in modelling non-linear features. In particular, the exploitation of LSTM and bi-LSTM models has led to improvements in prediction accuracy compared to other deep learning and regression models in several studies [25–28, 30–32].

2.2 SIR models

The SIR model is a mathematical compartmental model applied to compute the number of subjects infected with a disease in a closed population over time. The mathematical model developed by Kermack-McKendrick is based on coupled equations giving information about the number of individuals in the following compartments: susceptible (S), infected (I), and recovered (R) [33]. This model can predict the spread of a disease by computing features such as the number of infected subjects. In the past, SIR models have been applied to model infections of Seasonal Influenza [34]. For Covid-19, the models help predict the progression of the pandemic and investigate techniques for NPIs. In contrast to deep learning models, SIR models cannot predict the number of newly confirmed cases directly (note that there is a difference between total/cumulative confirmed cases and new confirmed cases or active infections per day). However, SIR models can take the total population N into account. The number of active cases is then calculated by subtracting the number of total recovered and death cases from the total number of cases (active cases = total cases - total recovered cases - total death cases).
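As a concrete illustration of this paragraph, the sketch below integrates the basic SIR equations and derives cumulative and active cases from the compartments. The population size and the rates β and γ are placeholder values chosen for illustration, not fitted parameters; the equations and estimation procedure actually used are given in Sections 3.3 and 3.4.

```python
import numpy as np
from scipy.integrate import odeint

N = 1_000_000             # closed population size (illustrative)
beta, gamma = 0.25, 0.10  # effective contact rate and recovery rate (assumed values)
I0, R_init = 100, 0
S0 = N - I0 - R_init

def sir(y, t, beta, gamma, N):
    S, I, R = y
    dS = -beta * S * I / N            # susceptibles becoming infected
    dI = beta * S * I / N - gamma * I # infections minus recoveries
    dR = gamma * I                    # infected individuals recovering
    return dS, dI, dR

t = np.arange(0, 180)                 # days
S, I, R = odeint(sir, (S0, I0, R_init), t, args=(beta, gamma, N)).T

# The basic SIR model has no death compartment, so here
# cumulative cases = everyone who has left S, and active cases = I.
cumulative_cases = N - S
active_cases = I
```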

For Covid-19, different kinds of SIR models have been developed with varying compartments, such as the SEIR, eSIR, and SIRD models. For example, the SEIR model extends a simple SIR model by adding the Exposed (E) compartment. The next section will give a detailed overview of these variants. A detailed description of the math behind the SIR models is given in section 3.3. Also, a brief overview of important findings of the SIR model is given in Table 2.1.

2.2.1 Variants of the SIR model

Since different countries take different measures to prevent new infections, the SIR model must be modified to capture national or regional assumptions [35]. This can be done with variants of the SIR model. Chen et al. [36] looked into the SIR model when taking undetectable infected persons into account. Their time-dependent SIR model was able to adapt important parameters in the model according to changes in NPIs. Disregarding the finding that Covid-patients could be infected twice [37], another variant of the SIR model was proposed: the SIRD model [38]. This variant was based on the idea that recovered subjects are no longer susceptible to infection. Alanzi et al. also used a modified SIR model, the SIR-F model, where the number of confirmed cases is calculated by summing the numbers of infected, recovered, and dead individuals [39]. This modification was due to the observation that some people already died before getting to the hospital (a confirmed case). This SIR model variant took the lockdown measures and interventions into account. This was done by using an extra parameter in the model that was calculated from the number of office and school days. Also, the population pyramid, which indicates the number of people in each age category, was included in the predictions. Interventions such as centralized treatment, control measures, and quarantine were also considered in an SEIR model from Cao et al. [40] and in an SUQC (Susceptible, Un-quarantined infected, Quarantined Infected and Confirmed) model from Zhao et al. [41]. These studies showed the effect of the quarantine measures and the scenario of loosening quarantine measures that had already been taken.

Wang et al. [42] looked at a modified Markov SIR model incorporating multiple types of time-varying quarantine protocols. These included government-level macro isolation policies and community-level micro inspection measures. This was done to establish a method of indicating the number of under-reported infections.

The effects of quarantine measures were also studied in a mixed-SEIR model in a study conducted by Hou et al. [43]. This study showed the effect of quarantine measures on the number of infected cases and the delay in peak time. The parameters in the mixed-SEIR model were changed to match the incubation period of Covid-19. Also, the assumption that the disease is contagious during the incubation period was taken into account. The study showed that quarantine and isolation can reduce the number of Covid-19 infections and delay a potential peak in infections.

Ivorra et al. [44] studied a θ-SEIHRD model that was able to capture information about undetectable infections in China. Unreported data was used to compute model parameters, which in turn were used to estimate the future spread of the disease. The results highlight the impact of a low percentage of detected cases and show that the magnitude of the epidemic can be greatly reduced when the percentage of undetectable cases is low. Godio et al. [45] implemented a similar variant of the model, but with a stochastic method. A stochastic SEIR model was applied with information about quarantine measures, making use of three extra compartments: quarantined cases, death cases, and insusceptible cases. Calafiore et al. also investigated stochastic techniques with a SIR model. In addition, the differences between a SEWIR-F and a stochastic SIR were highlighted. Here, SEWIR-F stands for susceptible (S), exposed (E), waiting (W), infected (I), recovered (R) and fatal (F).

A related study looked into another variant of the SIR model, the eSIR model, augmented by adding information about time-varying lockdown measures [46]. This was done by changing the R0 value according to the strength of NPIs. The eSIR model was also extended by adding information about the incubation period. Wangping et al. concluded that, compared with the normal SIR model, the eSIR model was more suitable for predicting future infections.

In addition to these variants, another variant called SIDARTHE was implemented in a study by Giordano et al. [47]. This variant takes eight stages into account: susceptible (S), infected (I), diagnosed (D), ailing (A), recognized (R), threatened (T), healed (H) and extinct (E). In addition to fitting the model to data of the epidemic, this study also looked into the effect of lockdown measures and testing.

Su et al. [48] implemented an adjusted SEIR model, a SEIR-Sq-Eq-Iq model, where Sq is the number of quarantined susceptible subjects, Eq the number of isolated exposed subjects, and Iq the number of isolated infected subjects. By using different data sources for parameter estimation, the model was used to show the effect of NPIs.

Mahmood et al. [49] used an agent-based SEIR model approach to identify hotspot regions in London. In addition, this model was able to simulate multiple lockdown scenarios, including no measures, extended lockdown, dynamic lockdown and periodic lockdown. The agent-based approach was focused on regional predictions in London.

Batistela et al. [50] used an SIRSi model, where Si stands for the number of sick subjects, which made it possible to infer unreported and asymptomatic cases. A feedback loop was added to the model to consider different immunity responses. A similar approach was taken by Liu et al. [51], where unreported cases were treated as a separate compartment in the SIR model. Fernández-Villaverde and Jones used a SIRDC (Susceptible - Infectious - Resolving - Dead - reCovered) model to predict the spread in different countries, states, and cities [52].

Camino-Beck looked into a SEICRS model (Susceptible - Exposed - Infectious - Compartment - Recovered - Strategy) and implemented a way of analyzing second wave scenarios [53].

Chen et al. [54] implemented a SEIAR model to simulate different scenarios and look into the effect of interventions such as a reduced mobility rate. This study used a fixed set of parameters to simulate the scenarios.

Din et al. [55] focused more on determining the stability of their SIR model. Numerical solutions were calculated using a Non-Standard Finite Difference (NSFD) scheme. They concluded that protection and isolation play a crucial role in preventing increasing infection rates.

Now that the Covid-19 pandemic has raged on for over a year, multiple studies have reflected on the accuracy of SIR model forecasts from the previous year. Moein et al. [56] analyzed their SIR model forecasts in comparison with newly observed data from a year later. This study highlights that most SIR models showed the same modelling problem: most models were based on assumptions that did not hold. The same was concluded by Shin et al. [57], who used both the SIR and SEIRD model to re-evaluate previous predictions from the first until the third wave in South Korea. Here, the assumption of a homogeneous population and the assumption of invariant parameters were invalidated.

Maheshwari et al. [11] looked into a Lotka-Volterra (LV) predator-prey SIR model (LVSIR). This study used the LVSIR to determine the effect of NPIs analytically, in contrast to most studies which are mostly focused on scenario-based modelling. This method allowed the study of the optimal level of restrictions in addition to the economic outcomes of NPIs.

2.2.2 Conclusion

This section was focused on answering the research question of which variants of the SIR model have successfully been implemented to predict Covid-19 cases. To create accurate prediction models, information on interventions has to be included. There are many types of interventions, such as quarantine on the individual or group level, avoiding crowds, hand hygiene, isolation, personal protective equipment, school measures, physical distancing, and workplace measures. A strength of SIR model variants is the ability to include various interventions into a modelling scenario. Multiple studies were conducted that took such interventions into account [62]. Since most studies do not report evaluation methods to assess the performance of the applied SIR models, it is hard to rank the used models by their efficiency. Also, the variants are often studied on different kinds of data, which makes comparing variants of the SIR model a difficult task. For example, Rahimi et al. studied the performance of the SIR model, the SEIR model, and the Prophet algorithm - a method for forecasting time series data - in different countries [63]. This gave the insight that there is not one optimal model; a given SIR model could perform better on data from one country whereas the SEIR model can perform better on data from another country.

Ref | Date | Model | Area | Days | Prediction range | Source | Interventions
[42] | 03-03-2020 | Markov SIR | China | 200 | 15-01-20 until 12-02-20 | China CDC | Included time-varying quarantine protocols
[41] | 11-03-2020 | SUQC | China | 32 | 20-01-20 until 21-02-20 | China NHC | Included quarantined individuals to predict new infections
[6] | 18-03-2020 | SIR & NN | China | 40 | 20-01-2020 until 03-03-20 | NHC China | Used a neural network to learn the increase of quarantine control
[47] | 22-03-2020 | SIDARTHE | Italy | 350 | 20-02-20 until 05-04-20 | PCMS | Studied effect of lockdown and testing
[40] | 28-03-2020 | SEIR | China | 30 | 23-01-20 until 24-02-20 | HCHP | -
[38] | 31-03-20 | SIRD | Italy | 100 | 24-02-20 until 30-03-20 | Github | -
[43] | 02-04-2020 | mixed SEIR | Wuhan, China | 50 | 10-01-20 until 29-02-20 | WHO and others | Looked at the impact of quarantine and isolation interventions
[45] | 03-04-2020 | SEIR | Italy | 30 | - | JHU | Augmented the stochastic SEIR model with quarantine information
[58] | 24-04-2020 | SEWIRF and stochastic-SIR | Senegal | 30 | 06-03-2020 until 24-04-2020 | - | -
[46] | 06-05-2020 | eSIR | Italy and China | 141 | 22-01-20 until 03-04-20 | JHU | Information about time-varying lockdown measures by changing the R0 value
[48] | 07-05-2020 | SEIR-Sq-Eq-Iq | Areas in China | 50 | 23-01-20 until 24-02-20 | WHO and others | Added a parameter that indicates the changes in the population flow into the model
[49] | 20-08-20 | AGB SEIR | Areas in London | 180 | 01-03-20 until 28-08-2020 | London NHS | Used an agent-based SEIR model approach to identify hotspot regions in London and to simulate multiple lockdown scenarios
[36] | 18-09-20 | SIR | China | 29 | 15-01-20 until 02-03-20 | NHC China | Adjusted crucial parameters to adapt accordingly to the change of control policies
[44] | 20-10-20 | θ-SEIHRD | China | 120 | 01-31-19 until 29-03-20 | WHO | -
[50] | 29-10-20 | SIRSi | Brazil cities | 730 | - | SAEDE | Looked into temporary immunity as an intervention
[39] | 02-11-20 | SIR-F | KSA | 700 | 01-04-20 until 13-07-20 | WHO and others | Compared the effects of no interventions, lockdown, and new medicines
[54] | 01-12-20 | SEIAR | China | 150 | 06-12-2019 until 05-05-2020 | China NHC | Looked into effects of reducing mobility rates by calculating the curve of the number of infections, using the Euler numerical method
[59] | 08-12-20 | Stochastic SEIR | Germany | - | 16-03-2020 until 30-05-2020 | RKI | Looked at social distancing and modelled several scenarios in Germany
[60] | 01-01-21 | SIR | Japan | 71 | 14-01-2020 until 24-03-2020 | MHLW | School closings and event cancelling
[61] | 21-01-21 | SEIR | China | 28 | 01-23-2020 until 03-03-2020 | Ding Xiang Yuan | Compared SEIR to a Neural Network to model effects of quarantine on the future spread
[55] | 19-02-21 | SIR | Pakistan | 210 | 01-02-20 until 01-09-20 | worldometers | Used a convex incidence rate to determine model stability
[56] | 25-02-21 | SIR | Iran | 91 | 14-02-20 until 11-04-20 | MCMC | -
[11] | 26-02-21 | LVSIR | India, Mexico, Netherlands | 365 | 01-04-20 until 01-04-21 | JHU | Investigated efficiency of periodic lockdown using a Lotka-Volterra SIR approach, incorporating additional information about mobility

Table 2.1: An overview of discussed papers related to modelling Covid-19 infections with a variant of the SIR model, sorted by date.

Furthermore, some results show that simpler SIR models can predict the number of confirmed cases more accurately compared to SEIR models [64, 65]. An explanation for this observation could be that using the SEIR model produces a loss of information compared to the SIR model [64]. Furthermore, simpler models require fewer modelling assumptions. As Moein et al. [56] pointed out, complex and modified SIR model variants often come with the problem that these models are based on assumptions around Covid-19 that are not true.

The performance of mathematical models for epidemics also depends greatly on parameter tuning [9]. Detailed models with a lot of variable parameters also require detailed statistics for their validation. More complex models are only favored in cases of a large amount of data. On the other hand, simpler models are favored when data is limited [66]. Also, the number of possible errors for forecasting increases along with the number of model parameters [67]. A trade-off has to be considered: including information about NPIs can improve the model’s accuracy, but this also means that the models get more complex. Considering these problems, this research question cannot be answered completely in this literature research. Therefore, multiple SIR model variants will be studied to be able to observe the variance in the outcome of the different models.

2.3 Comparing and Combining ML- and SIR-models

Only a few studies have looked into both ML models and SIR models. Some results show that ML models outperform SIR models [68] in predicting the spread of the virus. An example of this finding is the implementation of time series analysis, where previously confirmed cases are used to predict future cases, which was done by Pandey et al. [68]. Here, the performance of a regression model was compared to the SEIR model, where the regression model outperformed the SEIR model in terms of prediction accuracy.

Liu et al. [69] compared the performance of an SEIRD model to an LSTM for data from China. The accuracy of the models was comparable; there was no significant difference. Both models were able to show the effect of NPIs on the predicted number of cases. Feng et al. [61] looked into the effects of quarantine modelled by both the SEIR model and an RNN separately. This study argues that the RNN can result in higher prediction accuracy. However, in this study, the SEIR model did not account for fluctuations in infection rates, in contrast to the RNN.

In contrast to cases where ML models outperform the SEIR model, the SEIR model can perform better when making global-level predictions [30]. Wang et al. compared SEIR to deep learning methods and showed that the SEIR model outperforms methods such as RNN, GRU, and LSTM, but only on global-level forecasting. The deep learning methods have the ability to capture more of the variability and fluctuations in the number of infections over different regions. Therefore, deep learning methods would perform better on short-term predictions. Also, the deep learning models may outperform the SEIR model since the SEIR model is based on a regularity over time.

2.3.1 Combining ML and SIR models

Besides being implemented separately, ML and SIR models can also be combined. It has been suggested that integrated ML and SIR/SEIR models could enhance existing models in terms of accuracy [70].

Dandekar et al. [6] proposed an improved SIR model to predict infections in Wuhan, China. They used a NN to learn the increase of quarantine strength and could then show its effect in preventing infections from rising. The term Q(t)I, where Q(t) is a quarantine strength function and I(t) is the number of infections, was added to the model. Using this as a basis, Q(t) was represented by a layer in a deep neural network. Thus, a NN was implemented to approximate the quarantine strength Q(t). The NN consisted of two layers with ten weights in each layer and used the Rectified Linear Unit (ReLU) activation function. The simple SIR model was chosen for analysis over other more complex epidemiological models, so that the physics of the number of infected and recovered cases would not be obscured by a large number of parameters. However, this study did not perform any evaluation methods to analyse the quality of the model. More details about this method can be found in section 4.2.
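The following sketch illustrates the general idea in a simplified form: a small neural network parameterises a time-dependent quarantine strength Q(t), the augmented SIR equations are stepped forward with a simple Euler scheme, and the network weights are fitted by gradient descent so that the simulated infected curve matches observed values. All numbers, the synthetic "observed" data, and the network layout are assumptions made for illustration; this is not the exact setup of Dandekar et al. [6].

```python
import numpy as np
import tensorflow as tf

N, beta, gamma, days = 1e6, 0.3, 0.1, 60   # assumed SIR parameters and horizon

def euler_sir(q_of_t):
    """Euler-integrate SIR with an extra Q(t)*I removal term; returns I over time."""
    S, I = N - 1.0, 1.0
    curve = []
    for t in range(days):
        q = float(q_of_t(t))
        new_inf = beta * S * I / N
        S, I = S - new_inf, I + new_inf - (gamma + q) * I
        curve.append(I)
    return np.array(curve, dtype="float32")

# Synthetic "observed" infections generated with a known quarantine strength.
I_obs = tf.constant(euler_sir(lambda t: 0.08 if t > 25 else 0.0))

# Small NN approximating Q(t): two hidden layers of ten units, ReLU activations.
Q_net = tf.keras.Sequential([
    tf.keras.layers.Dense(10, activation="relu"),
    tf.keras.layers.Dense(10, activation="relu"),
    tf.keras.layers.Dense(1, activation="softplus"),  # keeps Q(t) non-negative
])

def simulate():
    S, I = tf.constant(N - 1.0), tf.constant(1.0)
    preds = []
    for t in range(days):
        q = Q_net(tf.constant([[t / days]]))[0, 0]    # normalised time as input
        new_inf = beta * S * I / N
        S, I = S - new_inf, I + new_inf - (gamma + q) * I
        preds.append(I)
    return tf.stack(preds)

opt = tf.keras.optimizers.Adam(0.01)
for step in range(200):                               # fit Q(t) to the observed curve
    with tf.GradientTape() as tape:
        loss = tf.reduce_mean(((simulate() - I_obs) / N) ** 2)
    grads = tape.gradient(loss, Q_net.trainable_variables)
    opt.apply_gradients(zip(grads, Q_net.trainable_variables))
```

After fitting, evaluating Q_net over the time axis gives an estimate of how quarantine strength evolved, which is the quantity the cited study uses to interpret the effect of the intervention.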

In short, according to multiple scientists, ML models can be combined with epidemiological models to overcome modelling limitations. It was found that neural networks can be used to estimate unknown features of a system of equations [71], such as the SIR model. ML models can also be used as an ’extra parameter’ of the SIR model [6].

2.4 Modelling NPIs

It is important to take preventative measures to prevent the spread of the virus. Non-pharmaceutical intervention strategies can be defined as public health measures such as quarantine control or lockdown strategies. These measures include isolation, household quarantine, closure of schools, shops, and workplaces, travel restrictions, etc. Numerous studies have been performed to investigate the effects of different strategies. Perra [72] already summarized 28 articles covering studies focusing on the effect of NPIs in multiple countries. Interestingly, a few studies also used machine learning or SIR models to study the impact of NPIs; these will be highlighted in the coming paragraphs.

(28)

An outstanding study by Haug et al. [73] showed that cancelling small gatherings was on average the most effective type of intervention. Here, linear regression was used to rank each NPI. Also, lasso regression was used along with the assumption that $R_t$ would be constant without NPIs and that fluctuations are associated with NPIs. Furthermore, this study used a random forest and NN model that took the variation of $R_t$ into account. Then, an estimation of $R_t$ was made for the next day, taking previous values for the strength of NPIs and $R_t$ into account. This study found that social gathering restrictions, remote working, school closures and lockdowns are the most effective NPIs, followed by border restrictions. Koh et al. [74] looked at the impact of NPIs in 142 countries using a regression model with country-specific parameters. The results show the effectiveness of lockdowns and travel bans, where partial lockdowns are also shown to be effective when implemented at an early stage.
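As a rough illustration of this type of regression-based NPI ranking, the sketch below fits a cross-validated lasso to a hypothetical panel of NPI indicators and estimated $R_t$ values. The file layout, column names, and coefficient-based ranking are assumptions of this sketch, not a reproduction of Haug et al.'s pipeline.

```python
import pandas as pd
from sklearn.linear_model import LassoCV

# Hypothetical input: one row per country-day, ordinal columns per NPI,
# and the estimated reproduction number R_t as the target.
df = pd.read_csv("npi_rt_panel.csv")                    # assumed file layout
npi_cols = [c for c in df.columns if c.startswith("npi_")]

X = df[npi_cols].to_numpy()
y = df["r_t"].to_numpy()

# Lasso with cross-validated regularisation strength; the signs and magnitudes
# of the coefficients give a simple ranking of NPI effectiveness.
model = LassoCV(cv=5).fit(X, y)
ranking = pd.Series(model.coef_, index=npi_cols).sort_values()
print(ranking)  # most negative coefficients = largest estimated reduction of R_t
```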

The Oxford Covid-19 Government Response Tracker (OxCGRT) collected national government policy measures globally [75]. The measures include physical distancing and other healthcare-related measures. Four categories were used to investigate the differences in government policy measures: the overall government response index, the stringency index, the containment and health index, and the economic support index. For example, the Stringency Index was proposed to systematically organize the different types of NPIs. The Stringency Index consists of school closure, workplace closure, public event cancellation, restrictions on gathering size, public transport closure, stay-at-home requirements, restrictions on travel and movement, and public information campaigns. These categories could then be used to evaluate global government responses over time. Note that the OxCGRT dataset will be covered in more detail in section 3. Also very relevant for this study, Castex et al. [76] investigated the relationship between the OxCGRT dataset, Google mobility reports and socio-economic data. Here, an SIR model with time-varying parameters is used to model the outcome and study the effects of NPIs. The authors of this study struggled with the problem of high correlations between individual NPIs. Therefore, a combination of NPIs was created that focused on major policies that enforce social distancing and school and workplace closures. This allowed studying the effect of a group of NPIs. Results show that the impact of school closures and remote working was associated with lower employment rates, the share of elderly people, country size, population density, and higher GDP per capita and health care expenditure.

2.4.0.1 Including NPIs in SIR and ML models

Ideally, NPIs that happened in the training data (pre-intervention) show a clearly identifiable and stable trend, with similarities between countries. However, this is not always the case with Covid-19 trends. Including interventions in SIR models is possible by changing model parameters. This easy-to-implement feature enables studying several scenarios, which is one of the main strengths of SIR models (see the sketch below). For ML models, this task is difficult, or still impossible considering a certain error threshold, since there are many factors to consider. In addition, there is little previous research on adding NPIs as a factor in deep learning models. The research that has been done is mostly based on interrupted time series (ITS) modelling [77]. Even so, ITS requires the trend to remain constant in the absence of an NPI.
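As a minimal illustration of changing SIR parameters to represent an NPI, the sketch below lowers the transmission rate β by a fixed fraction from an assumed intervention day onward. The parameter values, intervention day, and population size are illustrative assumptions, not estimates used in this thesis.

```python
import numpy as np
from scipy.integrate import solve_ivp

def beta_with_npi(t, beta0=0.35, reduction=0.6, t_npi=60):
    """Transmission rate that drops by a fixed fraction once the NPI starts.
    beta0, reduction and t_npi are illustrative values, not fitted ones."""
    return beta0 * (1 - reduction) if t >= t_npi else beta0

def sir_rhs(t, y, gamma=0.1, n=17.4e6):
    s, i, r = y
    beta = beta_with_npi(t)
    return [-beta * s * i / n,
            beta * s * i / n - gamma * i,
            gamma * i]

# Simulate 180 days starting from a handful of infected individuals.
sol = solve_ivp(sir_rhs, t_span=(0, 180), y0=[17.4e6 - 100, 100, 0],
                t_eval=np.arange(0, 181))
```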

A small start towards a method of NPI scenario-based time series modelling was made by Ge et al. [78], where the effect of intervention strategies was shown by using an RNN with an extra model parameter A that included intervention information. However, this approach was not flawless. For example, A ranged from 0 to 1, indicating no NPI and an active NPI respectively. Initially, A_final was set to a fixed number, specific to a country. Furthermore, the distribution of A is expected to be known before training the model. In short, values for A and its distribution are expected to be known for all countries, which is not the case in reality.

There are also other methods to add NPI information to the model. The first is to merge the NPI data (the auxiliary features) with the output layer of the chosen deep learning model; this method can be seen as a post-model adjustment. The second is to initialize the model's states with a learned representation of the NPI; this method requires sufficient input data on the given NPI. A minimal sketch of the first approach is given below.
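The sketch assumes a window of 14 past observations of a single feature and nine NPI indicator columns; the layer sizes and input shapes are illustrative assumptions rather than the configuration used later in this thesis.

```python
import tensorflow as tf

# NPI indicators are concatenated with the LSTM output before the final
# dense layer, i.e. a post-model adjustment of the temporal representation.
n_steps, n_features, n_npi = 14, 1, 9

ts_in = tf.keras.Input(shape=(n_steps, n_features), name="case_history")
npi_in = tf.keras.Input(shape=(n_npi,), name="npi_features")

h = tf.keras.layers.LSTM(32)(ts_in)                  # temporal representation
merged = tf.keras.layers.Concatenate()([h, npi_in])  # merge auxiliary NPI features
out = tf.keras.layers.Dense(1)(merged)               # next-day confirmed cases

model = tf.keras.Model(inputs=[ts_in, npi_in], outputs=out)
model.compile(optimizer="adam", loss="mse")
```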

Given that modelling NPI scenarios with ML models remains difficult and under-researched, SIR model variants can more easily model NPIs by tuning the parameters associated with an NPI. In general, studying NPI scenarios with SIR models is less complex, which is why they are often preferred when simplicity and fast modelling are required.

2.4.1 Hyperparameter optimization

An LSTM model includes at least five hyperparameters: the learning rate, the number of time steps, the number of input features, the number of hidden units, and the number of training epochs. Previously, several ways have been studied to estimate the optimal model parameters. Kavadi et al. [79] looked into a Progressive Partial Derivative Linear Regression model to solve the parameter optimization problem. After that, a Nonlinear Global Pandemic Machine Learning model was applied to normalized features to make future predictions. Bayesian Optimization (BO) was also used to select optimal parameters for an LSTM model [31]. BO is a method based on Bayes' theorem to find the minimum or maximum of an objective function. Another option for parameter optimization is grid search, a method of searching through a given subset of the hyperparameter space of an algorithm, as illustrated in the sketch below.
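The sketch below exhaustively evaluates a small hyperparameter grid for an LSTM. The grid values, window length, and the synthetic placeholder data are illustrative assumptions; in practice the training and validation arrays would come from the country-level time series described in the next chapter.

```python
import itertools
import numpy as np
import tensorflow as tf

# Synthetic placeholder data: 100 training and 20 validation windows of
# 14 time steps with a single feature each.
rng = np.random.default_rng(0)
X_train, y_train = rng.random((100, 14, 1)), rng.random((100, 1))
X_val, y_val = rng.random((20, 14, 1)), rng.random((20, 1))

grid = {"learning_rate": [1e-3, 1e-4],
        "hidden_units": [16, 32, 64],
        "epochs": [50, 100]}

def build_model(hidden_units, learning_rate, n_steps=14, n_features=1):
    model = tf.keras.Sequential([
        tf.keras.layers.LSTM(hidden_units, input_shape=(n_steps, n_features)),
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate), loss="mse")
    return model

best = (np.inf, None)
for lr, units, epochs in itertools.product(*grid.values()):
    model = build_model(units, lr)
    model.fit(X_train, y_train, epochs=epochs, verbose=0)
    val_loss = model.evaluate(X_val, y_val, verbose=0)
    if val_loss < best[0]:
        best = (val_loss, {"learning_rate": lr, "hidden_units": units, "epochs": epochs})
print(best)
```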


2.5 Research Question & Hypothesis

Only a few studies have been conducted in which a comparison was made between SIR models and ML models in modelling Covid-19 interventions. Most of the existing research on epidemiological models focuses on modelling scenarios of interventions. After all, techniques to determine optimal control policies via analytical frameworks are largely missing [11]. As written in section 2.3, a study by Wang et al. compared SEIR to RNN, GRU and LSTM [30]. However, this study did not look at the performance of the models regarding interventions, opening up another gap in the literature. Therefore, the research question will focus on comparing the accuracy of SIR and ML models that include information about interventions. Considering that techniques to determine optimal control policies are largely missing [11], this research could help policymakers to make more suitable decisions regarding interventions. More specifically, research will be directed towards modelling NPIs with different models: what is the difference between modelling Covid-19 interventions in future spread forecasts with ML compared to SIR model variants? A sub-goal of the research is to model the predictive distribution of confirmed Covid-19 cases with an ML model, an SIR model, and a combination of ML and SIR, with a focus on accurately quantifying uncertainty. Research will be done to estimate model parameters associated with NPIs to model the future spread of confirmed cases, given an NPI scenario.

Hypotheses

Based on the research question, a few hypotheses were made:

• Compared to deep learning models, SIR model variants will perform better when making global level predictions [30]. However, deep learning models will be able to capture more variance in the data. Therefore, deep learning models will perform better on short-term predictions compared to long-term predictions.

• Including information about NPIs is simpler for the SIR model than for ML models. Therefore, an analysis of multiple scenarios with different interventions does not need to be complex, and such an analysis is relatively fast. Due to the complexity this introduces for ML models, it is expected that SIR model variants will perform better at representing information about NPIs.

• Performance of the models depends on parameter tuning. By combining ML and SIR model variants, ML models could be used to tune parameters for the SIR models. On top of that, combining ML and SIR models could be a solution to overcome the limitations of both models. For SIR models, the main limitation is the inability to learn from trends in the past. For ML models, the main limitation is the difficulty of including parameters such as NPIs or behavior of individuals.


Chapter 3

Methods

Optimal models for forecasting the spread of Covid-19 have yet to be established. Therefore, this research focuses on clarifying the differences between various models and how these models can include NPIs. This research is based on two types of data: time series data and data describing Covid-19 NPIs. The first part of this chapter covers these two datasets. Then, SIR models and their variants are explained in detail. After that, the deep learning model LSTM and the techniques used with it are described. In addition, two hybrid models are described, combining both SIR and LSTM. Uncertainty quantification and sensitivity analysis are then described in the succeeding section. Lastly, methods of including NPIs in the modelling process are described, such as change point analysis. All analysis was done in Python 3, with ML models based on TensorFlow [80].

3.1 Data

Data is collected from January 21 (2020) until March 1 (2021) for all 195 countries, excluding 9 countries due to a lack of data (see Appendix .2.2). The first dataset, an open-access epidemiological dataset, includes time series data gathered by Johns Hopkins University (JHU) [1]. It consists of the global number of confirmed cases, deaths and recovered individuals from January 21 (2020) onward. Several countries did not report recovered cases of Covid-19. An example is the Netherlands, where it is not mandatory to report that an individual has recovered from a disease [81]. The number of new cases per day was calculated as $I_{new} = I_{t+1} - I_t$, where $t$ is the day concerned and $t+1$ the succeeding day. For the Netherlands, the number of recovered cases was derived from the number of confirmed cases and the total number of deaths, under the assumption that after being confirmed, the next stage would be either recovery or death.
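The sketch below illustrates the new-case calculation and the recovered-case derivation for the Netherlands described above. The file and column names are assumptions about the JHU time series layout rather than the exact files used in this thesis.

```python
import pandas as pd

# Assumed layout: one row per day with cumulative "confirmed" and "deaths" columns.
df = pd.read_csv("jhu_timeseries_nl.csv", parse_dates=["date"]).set_index("date")

# I_new = I_{t+1} - I_t on the cumulative confirmed counts.
df["new_cases"] = df["confirmed"].diff()

# For countries without reported recoveries (e.g. the Netherlands), derive them
# from confirmed cases and deaths, assuming every confirmed case eventually
# ends in either recovery or death.
df["recovered_derived"] = df["confirmed"] - df["deaths"]
```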

The second dataset, the OxCGRT dataset, summarizes the strength of government response, as described in section 2.4, and can be downloaded from the covidtrackerapi [75, 82]. The OxCGRT dataset includes information about school closing, workplace closing, restrictions on gatherings, public transport shutdown, international travel controls, cancellation of events, stay-at-home restrictions, and internal and international movement restrictions (see Appendix 1 for a detailed description). All measures are indicated on a scale from 0 to 4, representing no restrictions and highest stringency respectively. This scale is used to indicate the policy strength of each restriction. Furthermore, the Stringency Index (StrI) indicates the overall stringency of a country and is based on all NPI variables in the OxCGRT dataset. Here, a higher StrI indicates higher stringency for that country. Further on in this research, StrI and other OxCGRT variables play an important role in predicting future infections. Therefore, a brief analysis of the most important variables is given in the coming section.

In general, government responses have become stronger throughout the Covid-19 outbreak [75]. Policy stringency varies across countries and over time. As countries change policy measures, an analysis can be made on the available data, with StrI as the main target. As Figure 3.1 shows, when StrI is very low, the number of confirmed cases is also very low. This can be explained by countries choosing not to implement NPIs when the number of newly infected cases is very low. However, StrI increases along with the numbers of confirmed cases and deaths. When StrI is very high, the number of confirmed cases drops. An interactive visualization showing the StrI over time for all countries is available at thesisproject.vercel.app.

Figure 3.1: Number of confirmed cases on a logarithmic scale against calculated StrI for all countries listed in the OxCGRT dataset. The grey line indicates a non-linear trend line, computed by local regression. Low values of StrI are associated with lower numbers of confirmed cases. StrI increases along with the number of confirmed cases and deaths.
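A rough sketch of how such a figure could be reproduced is given below, assuming the JHU and OxCGRT data have been flattened into per-country, per-day tables. File names, column names, and the lowess smoothing fraction are illustrative assumptions.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.nonparametric.smoothers_lowess import lowess

jhu = pd.read_csv("jhu_confirmed.csv")   # assumed columns: country, date, confirmed
ox = pd.read_csv("oxcgrt.csv")           # assumed columns: country, date, stringency_index
df = jhu.merge(ox, on=["country", "date"]).query("confirmed > 0")

# Local regression (lowess) on log-cases for the non-linear trend line.
trend = lowess(np.log10(df["confirmed"]), df["stringency_index"], frac=0.3)

plt.scatter(df["stringency_index"], df["confirmed"], s=2, alpha=0.3)
plt.plot(trend[:, 0], 10 ** trend[:, 1], color="grey")
plt.yscale("log")
plt.xlabel("Stringency Index")
plt.ylabel("Confirmed cases")
plt.show()
```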

It is also important to understand the limitations of the OxCGRT dataset. Most of these limitations can be explained by incomplete data. Incomplete and missing data are mostly dealt with by the OxCGRT dataset itself, by treating missing values as zero. Furthermore, the scientists who created the dataset argue that very small variations in StrI are most likely caused by missing data rather than a real change in underlying policies. As a result of this treatment of missing data, null values (or absent data points) carry a different meaning than 0-values: they represent data that is not up to date and should not be interpreted as 0-values. Note that the highest StrI does not always indicate a lockdown for every country in the dataset. For the countries chosen in this study, the highest StrI values were always related to a lockdown, with the exception of China and Australia.

Figure 3.2: StrI over time for multiple countries. Green windows indicate maximum stringency for each country. For example, for the United Kingdom, maximum stringency of policy measures was applied in January.

Due to computational limitations, later on in this research, the scope is limited to nine countries: Australia, China, Germany, India, Italy, the Netherlands, Sweden, United Kingdom, and the United States. These countries were chosen based on the ability to show model performance on different patterns of trends. The scope of countries includes both relatively large and small countries. In addition, it includes countries with a lot of variance in policy measures, such as the United Kingdom, as well as countries with relatively mild and stable policy measures, such as Sweden.

For initial experiments, data was split into a training set (01-22-2020 until 01-10-2020) and a validation set (01-10-2020 until 01-03-2021). The input data consists of time series data, i.e., series of data points ordered in time. The data is represented by $X = (X_1, \ldots, X_t)$ for each timestep $t$, where $X_t \in \mathbb{R}^{M \times N}$, with $M$ the number of features and $N$ the number of countries. Later on, the training and validation sets were split based on country-specific NPI date ranges, in order to validate against the correct NPIs. For example, for the Netherlands, this split was made at 16-12-2020, the start of a period of highest stringency (a lockdown). A threshold of 5 months of training data was set as sufficient. For the United States, the training set was too small, given that the split had to be made in March 2020 already, resulting in 3 months of training data (see Figure 3.2). Therefore, the United States was excluded from further investigation.
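A minimal sketch of this country-specific, NPI-based split is given below. Apart from the Dutch split date mentioned above, the dictionary entries, file layout, and column names are illustrative assumptions.

```python
import pandas as pd

# Start of each country's period of highest stringency; only the Dutch date is
# taken from the text above, other entries would be filled in per country.
split_dates = {"Netherlands": "2020-12-16"}

def split_by_npi(df, country):
    """Split one country's time series into training and validation sets."""
    country_df = df[df["country"] == country].sort_values("date")
    split = pd.Timestamp(split_dates[country])
    train = country_df[country_df["date"] < split]
    val = country_df[country_df["date"] >= split]
    return train, val
```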

3.2 Evaluating forecast accuracy

To evaluate the prediction performance, several evaluation metrics are used. Here, prediction performance is used as a term to represent model accuracy and is based on evaluating the prediction error. This error can be defined as the difference between the observed values and the predicted values. The two most commonly used accuracy measures are Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE). Other evaluation techniques are Mean Squared Error (MSE), Root Mean Squared Log Error (RMSLE) and Mean Absolute Percentage Error (MAPE). The computation of the evaluation metrics is shown in Table 3.1. MAE computes the average absolute error and gives less weight to outliers compared to other metrics. If large outlier errors are likely, MAE is preferred over MSE. MSE incorporates both the variance and bias of the predictor, making it a useful method when there are unexpected values.

Metric | Calculation
MAE    | $\frac{1}{N}\sum_{t=1}^{N} |y_t - \hat{y}_t|$
MSE    | $\frac{1}{N}\sum_{t=1}^{N} (y_t - \hat{y}_t)^2$
RMSE   | $\sqrt{\frac{1}{N}\sum_{t=1}^{N} (y_t - \hat{y}_t)^2}$
RMSLE  | $\sqrt{\frac{1}{N}\sum_{t=1}^{N} (\log(y_t) - \log(\hat{y}_t))^2}$
MAPE   | $\frac{1}{N}\sum_{t=1}^{N} \left|\frac{y_t - \hat{y}_t}{y_t}\right|$

Table 3.1: Evaluation metrics used to evaluate performance. Here, $\hat{y}_t$ is the estimated value of $y_t$ at time $t$ in the time series data and $N$ the total number of data points. $\bar{y}$ indicates the mean value of all observations.

RMSLE gives a larger penalty to estimates that under-predict than to estimates that over-predict. MAPE measures the error in percentages by calculating the absolute deviation and dividing it by the actual value. A problem with MAPE is that near-zero actual values in the denominator lead to invalid or inflated results. For MAE, MSE, RMSE, RMSLE and MAPE, lower values indicate a better fit with the data.

Later on in this thesis, when comparing model predictions per country, error measures are normalized by population size in order to compare errors between countries. The RMSE equation then becomes $\frac{1}{n}\sqrt{\frac{1}{N}\sum_{t=1}^{N}(y_t - \hat{y}_t)^2}$, where $n$ is the population size of a country and $N$ the number of observations.
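For completeness, the sketch below shows straightforward NumPy implementations of the metrics in Table 3.1 and of the population-normalized RMSE; variable names are illustrative.

```python
import numpy as np

# Illustrative implementations of the evaluation metrics in Table 3.1.
def mae(y, y_hat):   return np.mean(np.abs(y - y_hat))
def mse(y, y_hat):   return np.mean((y - y_hat) ** 2)
def rmse(y, y_hat):  return np.sqrt(mse(y, y_hat))
def rmsle(y, y_hat): return np.sqrt(np.mean((np.log(y) - np.log(y_hat)) ** 2))
def mape(y, y_hat):  return np.mean(np.abs((y - y_hat) / y))

# Population-normalized RMSE used for cross-country comparison.
def rmse_per_capita(y, y_hat, population):
    return rmse(y, y_hat) / population
```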
