Predictive modelling of influenza outbreaks in Russia


Faculteit der Natuurwetenschappen, Wiskunde en Informatica Graduate School of Informatics, MSc Computational Science

Universiteit van Amsterdam

Department of High-Performance Computing Faculty of Informational Technologies and Programming

Saint Petersburg National Research University of Information Technologies, Mechanics and Optics (ITMO University)

PREDICTIVE MODELLING OF INFLUENZA OUTBREAKS IN RUSSIA

Iuliia Novoselova

Supervisors: Valeria Krzhizhanovskaya, Vasily Leonenko

Amsterdam, Saint Petersburg 2017


STATEMENT OF ORIGINALITY

This document is written by Student Novoselova Iuliia who declares to take full responsibility for the contents of this document.

I declare that the text and the work presented in this document is original and that no sources other than those mentioned in the text and its references have been used in creating it.


ABSTRACT

Keywords: modelling, data analysis, mathematical epidemiology, infectious diseases, seasonal influenza

This thesis focuses on modelling and predicting seasonal outbreaks of influenza-like illness (ILI). Influenza-like illnesses are the most widespread infectious diseases in the world and place a substantial burden on global public health and on the economy and development of countries. Investigating the transmission of the infection is therefore an important task.

The first part of this Master’s thesis studies the connection between incidence data and weather factors in Russian cities. For this purpose, algorithms are developed for correcting incidence data and for distinguishing the phases of the seasonal dynamics of acute respiratory illness (ARI). The relation between ARI dynamics and weather conditions is then examined. The second part of the research examines whether the Baroyan-Rvachev model can be applied to contemporary data. The use of a modern air transportation matrix for predicting epidemic peaks is also considered.


TABLE OF CONTENTS

STATEMENT OF ORIGINALITY

INTRODUCTION

1 EXISTING METHODS OF MODELLING AND PREDICTION OF EPIDEMICS

1.1 Influence of external factors

1.2 Modelling of influenza epidemic dynamic

1.3 The Baroyan-Rvachev model

1.4 Transportation and epidemic transmission

2 IMPACT OF WEATHER CONDITIONS ON ARI INCIDENCE DYNAMIC

2.1 Data analysis of ARI incidence data and weather factors

2.1.1 Algorithms for incidence data analysis

2.1.2 Connection between ARI incidence and weather factors

2.2 Results

3 MODELLING AND PREDICTION OF ILI TRANSMISSION

3.1 The local Baroyan-Rvachev model

3.1.1 The model

3.1.2 Numerical experiments and results

3.2 Patterns of epidemic transmission between cities of Russian Federation

3.3 The global model

3.3.1 Transportation matrix

3.3.2 The model

3.3.3 Numerical experiments and results

3.4 Conclusions to the modelling and prediction of ILI transmission

CONCLUSIONS

FUTURE WORK

ABBREVIATIONS

REFERENCES


INTRODUCTION

Acute respiratory illnesses (ARI) are the most widespread and oldest infectious diseases. Every year influenza causes five million severe cases and five hundred thousand deaths worldwide. Almost all seasonal epidemics cause an increase in mortality, and influenza-like illnesses (ILI) place a huge burden on public health worldwide [1]. Influenza therefore remains a global concern, as it causes considerable damage to the development and economy of the affected countries. As a result, it is crucial to examine the transmission and dynamics of the infection.

Several tools are available to prevent influenza: vaccination, isolation of infected individuals and closure of public spaces. However, a vaccination strategy and surveillance efforts are ineffective without knowledge of the epidemic phases, such as onset, peak and end. Predicting these phases helps state organisations to prepare for the epidemic season. For example, if the onset of an upcoming winter epidemic is predicted, there is time to organise a vaccination campaign in a city.

Currently, the primary method of epidemic prediction is computational modelling of disease dynamics. It combines statistical methods and mathematical algorithms and relies on incidence data observed during epidemics [2]. However, these data are often incomplete because of under-reporting and a lack of monitoring systems. Moreover, experiments on the spread of infectious diseases in human populations are often impossible, unethical or expensive.

Effective prevention requires an optimal strategy that applies either one specific technique or a combination of several. Once the best model for prediction has been found, it can be used to test different methods of halting epidemic propagation and to select the most effective one.

Thus, the next stage is to find the best combination of methods and algorithms and to select appropriate parameters. Firstly, it is necessary to determine the timing of an epidemic, examine its dynamics and identify its phases. Secondly, the factors that affect the spread and dynamics of the disease are analysed. Finally, establishing a suitable model for a particular situation is a vital part of the investigation.

In the 1970s, the well-known Baroyan-Rvachev model was created in the Soviet Union. It was the first model used in practice to predict the dynamics and transmission of epidemics, and it has been widely discussed in other countries [31, 34 - 35]. The central idea of the approach is that the spread of an epidemic is driven by the transport network. It is currently unclear whether this model is applicable to modern data in the Russian Federation. The main difficulty is that the transport network has changed: overall passenger flow has increased and international routes have expanded. Furthermore, unlike in the past, there is no open source of migration flow data.

The main goal of this Master’s thesis is to examine and develop a model for predicting epidemic outbreaks in cities of the Russian Federation. For this purpose, it is necessary to analyse the ARI and ILI incidence data in the template regions (described in Chapter 2) and to apply the Baroyan-Rvachev model to contemporary incidence data (described in Chapter 3).

Chapter 1 reviews previous investigations of the spread of infectious diseases. It covers studies of the influence of various external factors on the transmission and dynamics of influenza epidemics, as well as the evolution of epidemic modelling and forecasting.

Chapter 2 presents the results of an algorithm that corrects under-reporting in the incidence data collected for Russian cities. Next, an algorithm is applied to distinguish the phases of seasonal ARI dynamics. Finally, the chapter seeks the relation between weather conditions and the ARI incidence data.

Chapter 3 examines the Baroyan-Rvachev model and explores whether it can be applied to present-day data. Further steps are to examine modern incidence data and to determine the patterns of influenza spread among Russian cities. In addition, a modern air transportation matrix and incidence data are used to predict ILI outbreaks.

Finally, the last chapter contains a discussion, a summary of the whole research work and plans for future work.


1 EXISTING METHODS OF MODELLING AND PREDICTION OF EPIDEMICS

1.1 Influence of external factors

Previous studies indicate that the incidence rate is influenced by various factors, including social, environmental, economic, human-resource and transport factors [2]. Not all of these factors have been examined with sufficient accuracy. An analysis of seasonal incidence data should therefore be carried out to reveal the annual variation and the factors behind it.

Influenza transmission varies significantly between geographic areas and depends on various weather conditions [19]. The key climatic factor is the zone in which the region lies, temperate or tropical. For example, ARI epidemics in tropical and subtropical regions are highly irregular [3, 4], which makes it difficult to determine the epidemic seasons and their annual causes. However, solar radiation is stronger in tropical areas. This has attracted the attention of research teams, who found that this factor plays a direct role in the spread of influenza [5].

The Russian Federation, like Europe and the USA, is located in the temperate zone. In temperate regions, unlike the tropics, influenza outbreaks mostly occur in winter. These seasonal epidemics cause a significant number of cases in wintertime [6]. Despite the enormous number of works dedicated to the seasonal spread of ILI, understanding of the mechanisms underlying seasonal influenza fluctuations remains very limited [7].

Experimental studies suggest that the wintertime influenza peak in temperate regions is related to consistently cold and dry weather [8, 9]. Furthermore, several studies demonstrate that the onset of an epidemic depends on the air temperature and on the absolute and relative humidity of the particular region. In addition, temperature and humidity are strongly correlated. Both of these weather factors reach a minimum during the winter period, when the influenza epidemic reaches its peak [10 - 13]. As a result, the epidemic outbreak coincides with the days of minimal temperature and humidity.

Recent papers by Shaman J. et al. test the hypothesis that temperature and humidity are the primary drivers of seasonal influenza spread in temperate regions of the USA [10, 11]. One of them demonstrates that the onset of the winter epidemic is associated with a critically low level of humidity in the weeks before the outbreak [10]. Although air temperature and humidity are strongly correlated [7, 11], temperature also depends on indoor climate control, so humidity remains the primary influenza driver [10].

The literature also reports that factors other than climate conditions may affect epidemic dynamics, for example age, individual immunity and prevention tools. In [40] the effects of age structure and quarantine were examined, and cross-immunity was found to influence the duration of an epidemic. As reported by Schanzer D. et al. [41], children play a major role in the transmission of the infection: they fall ill at the beginning of an influenza epidemic and then infect the remaining individuals.

In addition to the factors above, antigenic drift and loss of immunity affect influenza transmission. Axelsen J. B. et al. [14] argue that population immunity is responsible for the repeated antigenic drift of the influenza virus and thus has a significant influence on the timing of influenza outbreaks. However, studies on the influence of these factors are still lacking, and each of them requires separate in-depth study.

Romanyukha A. A. et al. investigated the dynamics of seasonal epidemics in Moscow, Russian Federation, using incidence data from 1959 to 1988. The results show that, despite the variety of pathogens, ARI can be considered as a set of related diseases whose structure depends on weather and social conditions. The data analysis suggests that winter influenza epidemics are not caused only by the emergence of new strains of the influenza virus. The incidence rate of a winter epidemic depends not only on the properties of the virus but also on the ARI incidence in the previous year. Consequently, anti-epidemic measures should not be restricted to immunisation against certain strains of the influenza virus and should focus on reducing the overall burden of respiratory infections [2].

Despite the great interest in this topic, no universal equation relating the external factors to the parameters of influenza dynamics and timing has been presented in general form. Moreover, it is not clear whether such a dependence can be found. The main reason is that the epidemic process is multifactorial and has a complex structure. Consequently, the analysis and processing of incidence data and the assessment of the impact of external factors remain an important problem.

1.2 Modelling of influenza epidemic dynamic

In the literature, the most common tools for investigating influenza transmission are modelling and simulation. These methods are powerful and can be extended to predict the outbreak dynamics of influenza epidemics.

The most widespread model of the transmission of infectious diseases is the SIR model. Its early prototype was developed at the beginning of the 20th century [30], and its extensions are still used successfully today [18 – 22]. This type of model therefore has a long history and is considered reliable.

The basic SIR model is a simple mathematical model that describes epidemic transmission in a homogeneously mixed population. The population is divided into three separate groups of individuals: Susceptible, Infected and Recovered [23].

The SIR model can be represented by a nonlinear system of three differential equations (1) – (3), which describes the flows between the three population groups [24]:

\[
\frac{dS}{dt} = -\frac{\beta S I}{N} \tag{1}
\]
\[
\frac{dI}{dt} = \frac{\beta S I}{N} - \gamma I \tag{2}
\]
\[
\frac{dR}{dt} = \gamma I \tag{3}
\]

where $S(t)$ denotes the number of susceptible people, $I(t)$ the number of infected people and $R(t)$ the number of recovered individuals; $N(t) = S(t) + I(t) + R(t)$ is the total population, and $\beta$ and $\gamma$ are the infection and recovery rates, respectively.
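To make the behaviour of equations (1) – (3) concrete, the following minimal sketch integrates them numerically. The forward Euler scheme, the step size and all parameter values are illustrative assumptions and are not taken from the thesis.

```python
def simulate_sir(beta, gamma, s0, i0, r0, days, dt=0.1):
    """Integrate equations (1)-(3) with the forward Euler method.

    Returns one (S, I, R) tuple per day, starting with the initial state.
    """
    s, i, r = float(s0), float(i0), float(r0)
    n = s + i + r  # total population, constant in the basic model
    steps_per_day = round(1 / dt)
    trajectory = [(s, i, r)]
    for _ in range(days):
        for _ in range(steps_per_day):
            new_infections = beta * s * i / n * dt   # flow S -> I
            new_recoveries = gamma * i * dt          # flow I -> R
            s -= new_infections
            i += new_infections - new_recoveries
            r += new_recoveries
        trajectory.append((s, i, r))
    return trajectory


# Illustrative values only: beta = 0.5 and gamma = 0.2 per day
trajectory = simulate_sir(beta=0.5, gamma=0.2, s0=9990, i0=10, r0=0, days=120)
# Day on which the number of infected individuals is largest
peak_day = max(range(len(trajectory)), key=lambda d: trajectory[d][1])
```

With these assumed values the infected group first grows, reaches a single peak and then declines, while the total population stays constant.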

The basic SIR model is seldom used in practice as is; it is adapted to the specific epidemic situation. Firstly, the division of the population into groups is made more detailed. Secondly, information about the various factors that influence the spread of the epidemic is included to improve the quality of modelling.

One extension of the SIR model used for modelling and forecasting is the SIRS model. In this model, infected individuals recover fully and move to the Recovered group, after which they become susceptible once again. One application of the SIRS model is the study carried out by Jacob Bock Axelsen et al. As stated in [18], the researchers developed “a simple epidemiological model to reveal multiannual predictability based on high-quality influenza surveillance data in Israel”. They obtained successful predictions driven by antigenic drift, immunity loss and weather factors such as humidity and temperature [18].

Another study was carried out by R. Yaari et al. [19]. It considered the standard SIRS epidemic model with antigenic drift and used Israeli incidence data. The results in [19] showed that using actual weather conditions to simulate the influenza transmission rate gives better outcomes than using only the inter-annual values of the weather factors. The researchers concluded that climate has a great influence on the spread and dynamics of influenza epidemics.


The investigation described in [20] shows that the simulation of an epidemic can be improved by including information about immunity and influenza infection, which leads to a SIRS compartmental model. Moreover, the results of James Truscott et al. [20] indicate that both an age-structured model with mixing and co-circulating strains are required to match the observed incidence data.

Another extension of the SIR model is the deterministic SEIR model, in which the population is divided into four separate groups: Susceptible, Exposed, Infectious and Removed. Antonella Lunelli et al. [21] used the SEIR model with age structure, with the age groups taken from surveillance data. The research team estimated age-specific transmission rates and levels of immunity in the age groups and used the SEIR model to analyse epidemic surveillance data from Italy. A further development is the SEIRS model, which was used by Nele Goeyvaerts et al. [22] with age structure and vaccination. In this model, applied to Belgian incidence data, the population is classified into Susceptible, Exposed, Infectious, Recovered and vaccinated (Susceptible) individuals.

1.3 The Baroyan-Rvachev model

The Baroyan-Rvachev model is a well-known model that was used for modelling and prediction in the Soviet Union. The work on modelling and forecasting influenza dynamics dates back to the 1970s. One of the first applications of the model is presented in [16]. At that time it was the only model used in practice for epidemic prediction.

According to the approach of Baroyan and Rvachev, the transmission of illness through the country is determined by the transport network. The model can be represented in discrete form by the system of equations (4) – (8):

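\[
\bar{y}_j(t) = \sum_{\tau=0}^{T} y_j(t, \tau)\, g(\tau) \tag{4}
\]
\[
y_j(t+1, 0) = \frac{\beta_j}{p_j}\, x_j(t)\, \bar{y}_j(t) \tag{5}
\]
\[
y_j(t+1, \tau) = y_j(t, \tau-1) + \sum_{i=0}^{n} \frac{\sigma_{ij}\, y_j(t, \tau-1)}{p_i} \tag{6}
\]
\[
x_j(t+1) = x_j(t) - y_j(t+1, 0) \tag{7}
\]
\[
x_j(0) = \alpha_j p_j, \quad j = 1, 2, \ldots, n \tag{8}
\]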

where $n$ is the number of cities considered; $i$ and $j$ denote city numbers; $x_j(t)$ is the number of susceptible people in city $j$ at time $t$; $y_j(t, \tau)$ is the number of people in city $j$ who, at time $t$, were infected $\tau$ days ago; $\bar{y}_j(t)$ is the number of infectious individuals in city $j$ at time $t$; $\|\sigma_{ij}\|$ is the transport matrix between cities; $p_i$ is the population of city $i$; $g(\tau)$ is the infectious rate; $\alpha_j$ is the initially susceptible fraction of the population; $T$ is the duration of the infectious period; and $\beta_j$ is a free parameter.

Equation (4) gives the number of infectious people, computed from the distribution of infected individuals over the time since infection. Equation (5) describes the number of newly infected people. Equation (6) describes how the distribution of sick people changes over time due to connections with other cities. Equation (7) represents the change in the number of susceptible individuals due to new cases. Equation (8) sets the initial conditions.

Two variants of the Baroyan-Rvachev model are discussed in this thesis. The first is the local model, which considers only events inside one city; in this case the transport matrix is not used, and $\sigma_{ij} = 0$ for all $i$ and $j$ in equations (4) – (8). The second is the global model, in which the transport network of the country is a required part of the modelling [17].
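The local variant can be illustrated with a short sketch that iterates equations (4) – (8) for a single city with the transport term set to zero. The function name, the infectiousness profile `g`, the seed of initially infected people and all numerical values are hypothetical and chosen only for illustration; the cap on new cases is an added safeguard that is not part of the equations.

```python
def simulate_local_br(beta, alpha, population, g, initial_infected, days):
    """Iterate the local model (sigma_ij = 0) for one city.

    g[tau] is the assumed infectiousness of a person infected tau days ago
    (tau = 0..T). Returns the list of daily new cases y(t, 0).
    """
    T = len(g) - 1
    x = alpha * population              # equation (8): initial susceptibles
    y = [0.0] * (T + 1)                 # y[tau]: people infected tau days ago
    y[0] = float(initial_infected)      # assumed seed; not specified in the text
    new_cases = [y[0]]
    for _ in range(days):
        y_bar = sum(y[tau] * g[tau] for tau in range(T + 1))  # equation (4)
        newly_infected = beta / population * x * y_bar        # equation (5)
        newly_infected = min(newly_infected, x)  # safeguard: never exceed susceptibles
        y = [newly_infected] + y[:-1]   # equation (6) without the transport term
        x -= newly_infected             # equation (7)
        new_cases.append(newly_infected)
    return new_cases


# Hypothetical infectiousness profile and parameters, for illustration only
g = [0.0, 0.9, 0.9, 0.55, 0.3, 0.15, 0.05]
new_cases = simulate_local_br(beta=0.8, alpha=0.8, population=1_000_000,
                              g=g, initial_infected=10, days=150)
peak_day = new_cases.index(max(new_cases))
```

In this sketch the free parameter `beta` and the susceptible fraction `alpha` are the quantities that would be fitted to observed incidence data.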


Later, in 1985, the Baroyan-Rvachev approach was extended by Rvachev and Longini [38], who showed how to apply the model between countries at a global scale. In recent years, this improved model of epidemic transmission has attracted much attention from research teams in different countries [31, 34 – 35].

It is currently unknown whether the Baroyan-Rvachev model can be applied to modern data in the Russian Federation. The model is built around the transport network, and the difficulty of using it today lies in the changes that the network has undergone. Compared with the past, overall passenger flow has increased and international routes have expanded. Furthermore, there is no open source of migration flow data. It is therefore crucial to check whether the approach of Baroyan and Rvachev can be used today.

1.4 Transportation and epidemic transmission

A previous study indicates that human mobility has one of the greatest impacts on the transmission of influenza epidemics [15]. Several types of transport models are considered below.

The first type is the patch model, in which the population of the country is divided into sub-groups (patches). The force of infection acting on an individual depends on the distance between the individual’s patch and the other patches and on the prevalence of infection in all patches. The same force is exerted on all individuals of the same patch. These models rely on the simplifying hypothesis that the probability of human interaction depends mainly on the community in which a person lives. The next type of transport model is based on distance transmission. Each individual is modelled separately and can infect others nearby with a probability that decreases with distance.

One of the most complex, but at the same time most common, types is the network model. Network models focus on the contacts between individuals or groups. A network is a mathematical object whose study originated in graph theory and the social sciences.


There are several ways of representing a transport model that can be applied in practice. One of them is the gravity model, which depends on a geographic distance metric. Another example is the air travel model, one of the most widely used types of transportation modelling. International air transportation has a vital impact on global influenza transmission; however, the role of air travel at the regional level remains debated. The last type of representation of human mobility is based on work commutes. The investigations documented in [15] show that models of work commutes outperform air traffic models on a regional scale, because work commutes are more localised and therefore better suited to modelling the spread of influenza within a particular region [15].
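As an illustration of the gravity model mentioned above, the sketch below uses its commonly cited textbook form, in which the flow between two cities grows with the product of their populations and decays with a power of the distance between them. The exact form used in [15] is not given here, so the scaling constant `k`, the exponent `gamma` and the input values are assumptions.

```python
def gravity_flows(populations, distances, k=1.0, gamma=2.0):
    """Estimate flows as k * p_i * p_j / d_ij ** gamma (textbook gravity form)."""
    n = len(populations)
    flows = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:  # no flow from a city to itself
                flows[i][j] = (k * populations[i] * populations[j]
                               / distances[i][j] ** gamma)
    return flows


# Made-up populations and pairwise distances for three hypothetical cities
populations = [100, 200, 50]
distances = [[0, 10, 20],
             [10, 0, 5],
             [20, 5, 0]]
flows = gravity_flows(populations, distances)
```

Such a matrix could play the role of a transport matrix when measured passenger flows are unavailable, although that use is not claimed by the sources cited here.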

The main issue is to understand which type of human transportation has the greatest influence on the spatial spread of illnesses. Determining mobility patterns is an urgent problem that requires accurate and substantial investigation.

There are several investigations on this topic. The authors of [31] concentrate on air travel patterns in the USA and their importance for forecasting epidemics. They present the results of simulating the seasonal influenza epidemics of 1998-2001 using a compartmental model and air transportation data, and show that air transportation plays a significant role in the transmission of infectious diseases [31]. The researchers applied the global model of Rvachev and Longini [38] to their incidence data, using modern transportation data and taking into account seasonal effects characteristic of the United States [31].

Similar experiments were carried out for Hong Kong, a region with a different climate [32]. In that paper, the authors suppose that a pandemic starts in Hong Kong and returns after a while via international air transportation. The investigation shows the significant importance of air transportation in epidemic modelling and prediction. To illustrate the result, a simulation based on the approach of Rvachev and Longini was performed [32].


One more investigation addresses the influence of air transportation on influenza transmission. In the research documented in [39], a stochastic model of the international spread of infection is developed, and modern air transportation and demographic data are used to assess their impact on epidemic spread. In this case, influenza is transmitted more slowly.

In summary, there is no single system for predicting the dynamics and transmission of influenza. Information on this topic is scattered and does not give a general picture. It is necessary to create a unified system that includes all algorithms and methods for working with data in a particular country and takes its features into account. The system should combine several techniques, and its parameters should be the external factors that have the greatest influence on the levels of seasonal incidence.


2 IMPACT OF WEATHER CONDITIONS ON ARI INCIDENCE DYNAMIC

The aim of this chapter is to investigate the influence of climate on the dynamics of acute respiratory illness in cities in temperate regions. Temperature and humidity are analysed as the weather factors.

2.1 Data analysis of ARI incidence data and weather factors

2.1.1 Algorithms for incidence data analysis

The first step towards this aim is to find the structure of ARI incidence in cities. The algorithm for determining this structure is fully described in the paper [25], written by our research group at ITMO University; only general information is presented here.

The dataset used in this work contains weekly incidence data for cities of the Russian Federation for the period from 1986 to 2015. The data are provided by the Research Institute of Influenza, Saint Petersburg, Russia [26].

Firstly, the raw weekly data are corrected and smoothed. This step adjusts the incidence data for under-reporting, for instance during holidays: a previous study shows that people do not visit doctors during state holidays and weekends [27]. Moreover, the dataset contains missing points, and cubic interpolation is used to fill in the missing values. The daily incidence data are then derived from the interpolated weekly data: the Thursday incidence is taken as the incidence of the corresponding week divided by 7. Figure 1 presents a plot of the original and corrected incidence data.
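The conversion from weekly to daily data can be sketched as follows. The text fixes only the Thursday value (the weekly total divided by 7); filling the days between consecutive Thursdays with a cubic curve is an assumption of this sketch, and a Catmull-Rom cubic is used as a simple stand-in because the exact interpolation scheme is not specified. The function name and the sample numbers are hypothetical.

```python
def weekly_to_daily(weekly_cases):
    """Turn weekly totals into a daily series from the first to the last Thursday.

    Each weekly total divided by 7 is placed on the Thursday of its week;
    the six days between consecutive Thursdays are filled with a
    Catmull-Rom cubic (an assumed stand-in for the cubic interpolation).
    """
    anchors = [w / 7 for w in weekly_cases]   # Thursday values
    last = len(anchors) - 1
    daily = []
    for k in range(last):
        p0 = anchors[max(k - 1, 0)]           # repeat end points at the borders
        p1, p2 = anchors[k], anchors[k + 1]
        p3 = anchors[min(k + 2, last)]
        for d in range(7):
            t = d / 7
            value = 0.5 * (2 * p1 + (p2 - p0) * t
                           + (2 * p0 - 5 * p1 + 4 * p2 - p3) * t ** 2
                           + (3 * p1 - p0 - 3 * p2 + p3) * t ** 3)
            daily.append(max(value, 0.0))     # incidence cannot be negative
    daily.append(anchors[-1])
    return daily


# Made-up weekly totals for four consecutive weeks
daily = weekly_to_daily([700, 1400, 2100, 1400])
```

The curve passes exactly through the Thursday values, so the weekly information is preserved, but the series between Thursdays is smoother than real daily reports would be.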

After that, the daily incidence data are separated into seasons. A season lasts from July of one year to June of the following year. Investigation of the seasonal incidence suggests that each season has repetitive phases [25]: a lower ARI level, a higher ARI level, level transitions and outbreaks. In several seasons, no outbreak occurred. The phases are similar and repetitive for all seasons in Russian cities. Figure 2 presents a typical season with its phases. The algorithm developed to determine the phase parameters is described in the paper [25].

Figure 1 – Example of correction and interpolation of weekly incidence data for the season from October 1985 to May 1986 in Saint Petersburg

For further experiments, the epidemic and interepidemic incidence data are identified by the algorithms above. The next step is to examine the relationship between these data and the weather factors, temperature and humidity.

Three cities of the Russian Federation were chosen for the experiments: Moscow, Saint Petersburg and Novosibirsk. These are the largest cities in Russia by population, and epidemics are pronounced in them. At the same time, they are located in different regions and have different climates.

Figure 2 – The ARI incidence curve with the phases of the seasonal epidemic outbreak in the season from July 2003 to June 2004 in Saint Petersburg, with one incidence peak. Under-reported data caused by state holidays are shown

The same experiments were carried out for the Ile-de-France region, which has similar weather conditions and population, for comparison. The weekly incidence data for Ile-de-France are provided by the French Sentinelles surveillance system [28]. The daily incidence data were obtained in the same way as for the Russian cities. However, the criterion for distinguishing epidemic and interepidemic incidence was taken from the weekly ILI reports available on the Sentinelles website [28].

2.1.2 Connection between ARI incidence and weather factors

The next stage of the investigation was to find a connection between weather conditions and incidence data. The temperature and humidity data are provided by the NOAA National Center for Environmental Prediction [29]. In the experiments below, temperature and specific humidity were used as weather factors. Specific humidity was obtained by averaging the 6-hourly values from the original database.


Temperature and humidity are strongly correlated, especially at low values. Figure 3 and Table 1 show the correlation between the weather factors.

Figure 3 – The relation between temperature and specific humidity in Saint Petersburg, Moscow, Novosibirsk, and Ile-de-France

Table 1 – Correlation coefficients of temperature and specific humidity in Saint Petersburg, Moscow, Novosibirsk, and Ile-de-France

| | Saint Petersburg | Moscow | Novosibirsk | Ile-de-France |
|---|---|---|---|---|
| Correlation coefficient | 0.929 | 0.931 | 0.915 | 0.916 |

Because temperature and humidity are correlated, similar trends are expected in the relation of each factor to the incidence data. To examine the relation between the incidence data and temperature and specific humidity, the data were plotted; Figure 4 presents several of these graphs.


In Figure 4 the daily ARI incidence data are plotted against the corresponding weather data. The epidemic data are highlighted in red; the interepidemic data are shown as blue dots. The correlations between incidence data and weather conditions were calculated using a Python function from the SciPy library, and Tables 2 – 5 show the resulting correlation coefficients. The plots also show the interepidemic trends, which illustrate the relation between interepidemic data and temperature or specific humidity.
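The calculation described above can be sketched as follows. The text does not say which SciPy function was used; `scipy.stats.pearsonr` computes the same Pearson coefficient, and a plain standard-library version is written out here so that the example is self-contained. The helper names and the toy data are made up for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)  # undefined if either series is constant


def correlation_by_phase(incidence, weather, is_epidemic):
    """Correlate incidence with a weather factor separately for each phase."""
    groups = {"epidemic": ([], []), "interepidemic": ([], [])}
    for cases, value, flag in zip(incidence, weather, is_epidemic):
        key = "epidemic" if flag else "interepidemic"
        groups[key][0].append(cases)
        groups[key][1].append(value)
    return {key: pearson(w, c) for key, (c, w) in groups.items()}


# Toy data, not taken from the thesis
incidence = [50, 40, 30, 20, 300, 310, 290, 305]
temperature = [-5, 0, 5, 10, -8, -2, -6, -3]
is_epidemic = [False, False, False, False, True, True, True, True]
result = correlation_by_phase(incidence, temperature, is_epidemic)
```

Splitting the series before correlating is what allows the interepidemic and epidemic coefficients to be reported separately.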

Table 2 – The correlations between the interepidemic incidence and the weather conditions in Saint Petersburg and Moscow

| City | Correlation with temperature | Correlation with humidity |
|---|---|---|
| Saint Petersburg | -0.64 | -0.62 |
| Moscow | -0.63 | -0.64 |

Table 3 – The correlations between the interepidemic incidence and the weather conditions in Novosibirsk and Ile-de-France

Novosibirsk Ile-de-France


Table 4 – The correlations between the epidemic incidence and the weather conditions in Saint Petersburg and Moscow

| City | Correlation with temperature | Correlation with humidity |
|---|---|---|
| Saint Petersburg | -0.03 | -0.01 |
| Moscow | -0.05 | -0.06 |

Table 5 – The correlations between the epidemic incidence and the weather conditions in Novosibirsk and Ile-de-France

| City | Correlation with temperature | Correlation with humidity |
|---|---|---|
| Novosibirsk | -0.15 | -0.16 |
| Ile-de-France | -0.01 | -0.03 |

2.2 Results

The experiments show a dependence between the interepidemic data and the weather conditions: the incidence is negatively correlated with temperature and specific humidity. In contrast, the epidemic data show no dependence on these external factors; the correlation coefficients between the epidemic data and the weather conditions are small and statistically non-significant. These results agree with the similar trends expected for the correlated temperature and humidity discussed above.

The plots also reveal thresholds of temperature and specific humidity that the epidemic ARI incidence does not exceed: there are practically no epidemic incidence data above certain values of temperature and humidity. These values are given in Table 6. Fewer than 0.1% of cases lie beyond these thresholds.
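One possible way to obtain such a threshold from the data is sketched below. It is an assumption about the procedure, not the method documented in the thesis: the threshold is taken as the weather value that at most a given fraction of epidemic observations exceed. The function name and the sample values are hypothetical.

```python
def weather_threshold(weather_values, is_epidemic, allowed_fraction=0.001):
    """Return the value that at most allowed_fraction of epidemic points exceed."""
    epidemic_values = sorted(
        v for v, flag in zip(weather_values, is_epidemic) if flag
    )
    # Number of epidemic observations allowed to lie above the threshold
    allowed = int(len(epidemic_values) * allowed_fraction)
    return epidemic_values[len(epidemic_values) - 1 - allowed]


# Made-up daily temperatures; the fourth day is interepidemic and is ignored
threshold = weather_threshold([1.0, 6.8, 3.2, 12.0, -4.0],
                              [True, True, True, False, True])
```

With so few points no observation is allowed above the threshold, so the result is simply the highest temperature observed on an epidemic day.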


Table 6 – The threshold of the weather conditions that the incidence data in the corresponding city cannot overcome

| | Saint Petersburg | Moscow | Novosibirsk | Ile-de-France |
|---|---|---|---|---|
| Temperature, °C | 6.8 | 8.4 | 7.5 | 16 |


3 MODELLING AND PREDICTION OF ILI TRANSMISSION

This part of the thesis investigates the spread of epidemics of influenza-like illness between the cities of the Russian Federation. Although a model of influenza transmission, the Baroyan-Rvachev model, already exists, it is not currently used for medical purposes. The transmission patterns found in the USSR are in many ways unsuitable for today's data: in recent years transportation routes have developed significantly, which has changed the spread of influenza. Modelling influenza transmission among the subjects of the Russian Federation is therefore a vital issue that requires modern technologies to be taken into account. One purpose of the project was thus to develop a statistical model of the changing dynamics of the spread of influenza among the subjects of the Russian Federation and to compare it with the Baroyan-Rvachev transport model.

The main idea of the approach of Baroyan and Rvachev is that an infection moves between cities via the transport network. Infected people travel to cities where there is no epidemic yet and bring the disease with them. Thus, information about passenger transportation between the subjects of the country is needed for this type of modelling. It allows us to apply the approach of Baroyan and Rvachev to the prediction of seasonal outbreaks.

As a result, we can describe an algorithm for predicting a seasonal epidemic outbreak, which is fully documented in [33]. The algorithm describes the steps for one season. First, the first city, where the seasonal influenza epidemic starts earlier than in the other cities, is determined from the dataset of ILI incidence data. Then a sufficient number of incidence data points of the first city is selected for model fitting. The model is fitted to these points, and the epidemic model parameters are found. These parameters are called optimal for the season. Next, the transportation matrix is used to determine the order of cities in which the seasonal epidemic will occur. After that, the corresponding days are determined from the order of the cities.

The next step is to build a model for these cities with the optimal parameters found from the incidence data of the first city. Using these models, we can predict the epidemic outbreaks in the cities.
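The first step of the algorithm, the choice of the first city, can be sketched as follows. This is only an illustration: the function name `find_first_city` and the onset dates are hypothetical and are not taken from the dataset or from [33].

```python
from datetime import date


def find_first_city(onsets):
    """Return the city whose seasonal epidemic started earliest.

    `onsets` maps a city name to the date of its epidemic onset.
    """
    return min(onsets, key=onsets.get)


# Hypothetical onset dates for one season (illustration only).
example_onsets = {
    "Moscow": date(2003, 1, 20),
    "Omsk": date(2003, 1, 6),
    "Samara": date(2003, 1, 27),
}

first_city = find_first_city(example_onsets)
```

The incidence points of the city returned here would then be used for the model fitting described above.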

The algorithm above can be applied to modern data to obtain prediction results. However, the present situation is not ideal. Firstly, there is no open source of transport information in the Russian Federation. In the past, passenger transportation by train, air and bus was considered. Now such detailed traffic data is not available.

The second problem is related to the incidence data. In the Soviet Union, daily epidemic incidence data for more than 50 cities were used in research. Currently, this type of data is not available. Instead, the weekly data are interpolated to daily incidence data. This causes a range of inaccuracies because the interpolated incidence data are smoother than the real data. It also affects the algorithms that determine the timing of seasonal epidemics.
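The smoothing effect can be illustrated with a minimal sketch of weekly-to-daily interpolation. The interpolation scheme is an assumption made for illustration (piecewise-linear between mean daily rates); the scheme actually used for the dataset is described in Chapter 2 and may differ. The function name `weekly_to_daily` and the case counts are hypothetical.

```python
def weekly_to_daily(weekly_cases):
    """Interpolate weekly case totals to daily values.

    Assumption: each weekly total is converted to a mean daily rate
    (total / 7), and consecutive rates are joined by straight lines,
    seven daily steps per week.
    """
    rates = [total / 7.0 for total in weekly_cases]
    daily = []
    for left, right in zip(rates, rates[1:]):
        for step in range(7):
            daily.append(left + (right - left) * step / 7.0)
    daily.append(rates[-1])
    return daily


# Hypothetical weekly totals (illustration only).
daily_cases = weekly_to_daily([70, 140, 70])
```

The resulting curve changes linearly within each week, so day-to-day variation such as weekend dips cannot be recovered from it.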

3.1 The local Baroyan-Rvachev model

This part of the paper describes the examination and construction of the local Baroyan-Rvachev model, which simulates the incidence inside one city. This part of the research is fully documented in the paper [33] written by our research group at ITMO University. The local model was then extended to the global model, which describes the changes that occur between cities and uses the transportation network for this purpose.

3.1.1 The model

The dataset used in this work contains the incidence information of Russian cities for the period from 1986 to 2014. The algorithms for correction and restoration of the daily incidence data, described in Chapter 2 of this thesis, were applied to the dataset. The resulting dataset was then used for the modelling of seasonal epidemics described below.

The population model is formulated following the Baroyan-Rvachev approach. The population is represented by four groups of people: susceptible, exposed, infectious and removed. In this type of model there is a latent (exposed) period during which a person is infected but not yet infectious, that is, cannot spread the disease. The model can be represented by the following system of difference equations (9) – (11) with the initial condition (12):

$$\bar{y}_t = \sum_{\tau=0}^{T} y_{t-\tau}\, g_\tau \tag{9}$$

$$y_{t+1} = \frac{\beta}{p}\, x_t\, \bar{y}_t \tag{10}$$

$$x_{t+1} = x_t - y_{t+1} \tag{11}$$

$$x_0 = \alpha p \tag{12}$$

where 𝑥_𝑡 denotes the number of susceptible people in the population at the moment 𝑡; 𝑦_𝑡 is the number of newly infected individuals; 𝑦̅_𝑡 is the cumulative number of infectious people by the time 𝑡; 𝑔_𝜏 denotes the fraction of individuals infected 𝜏 days before the moment 𝑡 who are infectious; 𝑝 is the population of the city at the moment 𝑡; 𝛽 is the infection rate; and 𝛼 is the fraction of the population that is vulnerable to the currently circulating flu virus strain. Here the modelling is carried out within one country, and the transport network is not considered. The described system contains the epidemiological parameters 𝛼 and 𝛽. Moreover, in order to calibrate the position of the epidemic curve, additional model parameters are introduced: a relative vertical bias and an absolute horizontal bias of the modelled incidence curve.
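A minimal sketch of how equations (9) – (12) can be iterated day by day is given below. It is an illustration under stated assumptions, not the implementation used in this work: the seed value `y0`, the infectiousness profile `g`, the parameter values, and the guard that keeps the number of susceptibles non-negative are all hypothetical.

```python
def simulate_local_model(alpha, beta, population, g, y0, n_days):
    """Iterate equations (9)-(12) for one city.

    alpha, beta -- epidemiological parameters of the model
    population  -- city population p
    g           -- g[tau]: fraction infectious tau days after infection
    y0          -- assumed number of newly infected on day 0 (seed)
    Returns the lists x (susceptible) and y (newly infected) by day.
    """
    x = [alpha * population]          # eq. (12)
    y = [y0]                          # assumption: epidemic seed
    for t in range(n_days):
        # eq. (9): infectious people, using only the available history
        last_tau = min(t, len(g) - 1)
        y_bar = sum(y[t - tau] * g[tau] for tau in range(last_tau + 1))
        new_cases = beta / population * x[t] * y_bar   # eq. (10)
        new_cases = min(new_cases, x[t])  # assumption: keep x non-negative
        y.append(new_cases)
        x.append(x[t] - new_cases)        # eq. (11)
    return x, y


# Hypothetical parameter values (illustration only).
x, y = simulate_local_model(
    alpha=0.3, beta=4.0, population=1_000_000,
    g=[0.0, 0.5, 0.3, 0.2], y0=10.0, n_days=120,
)
```

The vertical and horizontal biases mentioned above are not part of this sketch; they would be applied to the curve `y` when it is compared with the observed incidence.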

3.1.2 Numerical experiments and results

The aim of this part is to fit the local model to the modern incidence data in order to predict the epidemic peaks for three Russian cities: Moscow, Saint Petersburg, and Novosibirsk. We extracted the epidemic outbreaks from the incidence data and obtained 67 epidemic outbreaks in total. We performed the model fitting and estimated the parameters of the local model for every season.

The model fitting was implemented in the Python programming language using the numpy and matplotlib libraries. The BFGS (Broyden–Fletcher–Goldfarb–Shanno) optimisation method was used to determine the optimal parameters.

The parameters for modelling each city can be estimated as follows. Suppose we want to predict the day and the height of the outbreak peak in some city, for example, Moscow. Let us assume that Saint Petersburg and Novosibirsk have already experienced the outbreak in the season under consideration, and we have found two pairs of parameter values by calibrating the model incidence curve. At the same time, the epidemic outbreak has just started in Moscow, and we know only several incidence points, including the starting one (which gives us the moment of the epidemic start in Moscow). In this case, we cannot find the parameter values directly for Moscow. According to Baroyan and Rvachev, the parameters are approximately equal across cities. Hence we can use the pair of parameter values from Saint Petersburg or Novosibirsk (depending on where the outbreak started earlier and, as a result, the incidence data are more complete) to build the model curves and obtain approximate forecasts in the absence of incidence data for Moscow. This procedure was repeated for every epidemic curve, and two values were measured: the prediction bias of the peak day 𝑑𝑡 and the ratio between the modelled and the real outbreak peak heights 𝑑ℎ. These values are used to assess the correctness of the epidemic peak prediction.

Figure 5 illustrates an example of the model fitting for Saint Petersburg in the season from 2005 to 2006.

Figure 5 – An example of a fitting of the local model in Saint Petersburg in the season from 2005 to 2006

The following criteria were chosen to assess the accuracy of the prediction results: ‘square’, ‘vertical stripe’ and ‘horizontal stripe’. According to the ‘square’ criterion, the prediction is accurate if the prediction bias of the peak day 𝑑𝑡 is within (-8, 8) and the peak ratio 𝑑ℎ is within (0.5, 2.0). ‘Vertical stripe’ means that 𝑑𝑡 is within (-7, 7), and ‘horizontal stripe’ means that 𝑑ℎ is within (0.7, 1.5). For each criterion, the percentage of peaks that satisfy it among all epidemics was calculated. Tables 7 and 8 present the obtained prediction results.
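The three criteria can be written down directly as a short sketch. The interval bounds are taken from the text and treated as open intervals, as written; the function names and the example pairs (𝑑𝑡, 𝑑ℎ) are hypothetical and are not results of this work.

```python
def meets_square(dt, dh):
    """'Square' criterion: dt in (-8, 8) and dh in (0.5, 2.0)."""
    return -8 < dt < 8 and 0.5 < dh < 2.0


def meets_vertical_stripe(dt):
    """'Vertical stripe' criterion: dt in (-7, 7)."""
    return -7 < dt < 7


def meets_horizontal_stripe(dh):
    """'Horizontal stripe' criterion: dh in (0.7, 1.5)."""
    return 0.7 < dh < 1.5


def accuracy_percent(flags):
    """Percentage of predictions that satisfy a criterion."""
    flags = list(flags)
    return 100.0 * sum(flags) / len(flags)


# Hypothetical (dt, dh) pairs for four epidemics (illustration only).
predictions = [(3, 1.1), (-10, 0.9), (5, 2.5), (-7, 1.4)]

square = accuracy_percent(meets_square(dt, dh) for dt, dh in predictions)
vertical = accuracy_percent(meets_vertical_stripe(dt) for dt, _ in predictions)
horizontal = accuracy_percent(meets_horizontal_stripe(dh) for _, dh in predictions)
```

With open intervals, a bias of exactly 7 or -7 days does not satisfy the ‘vertical stripe’ criterion.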

Table 7 – Prediction accuracy according to the ‘square’ criterion

                        Saint Petersburg   Moscow   Novosibirsk
Saint Petersburg data   -                  33.3%    40.9%
Moscow data             33.3%              -        35%
Novosibirsk data        31.8%              40.0%    -
Soviet model            100%               100%     75%

Table 8 – Prediction accuracy according to the ‘stripe’ criterion

               Saint Petersburg    Moscow            Novosibirsk       All cities
               Msk      Nsk        SPb      Nsk      SPb      Msk      USSR
‘Vertical’     28.6%    31.8%      28.6%    35.0%    31.8%    30.0%    87.4%
‘Horizontal’   90.5%    77.2%      90.5%    75.0%    86.4%    75.0%    69.0%

The results show that, due to inaccuracies in the selection of the first outbreak day, the predictions according to the ‘square’ and ‘vertical stripe’ criteria are not good enough for all three Russian cities in this investigation, whereas the assessments made in the Soviet Union in the 1970s are accurate by both criteria. On the positive side, the range of the epidemic peak biases can be used to judge the accuracy of the curve extraction algorithm and to compare different versions of that algorithm.

In contrast, the accuracy of the peak height prediction is quite good and exceeds the results obtained in the USSR, as the ‘horizontal stripe’ criterion in Table 8 demonstrates.

We also found that using Novosibirsk incidence data to forecast the outbreak peaks in Moscow and Saint Petersburg gives worse results than predicting the Moscow peak from Saint Petersburg data and the Saint Petersburg peak from Moscow data. This trend might occur because the connection between two cities weakens with the distance between them, and Moscow and Saint Petersburg are close to each other. This matter should be investigated further to establish the exact cause.

3.2 Patterns of epidemic transmission between cities of Russian Federation

The aim of this part is to investigate the patterns of flu spread between the cities of the Russian Federation. This study helps to understand whether it is possible to identify the routes of human movement and to apply the Baroyan-Rvachev approach to the prediction of influenza outbreaks in Russia.

First, the ARI incidence data were examined. The dataset contains the incidence data of 12 Russian cities: Saint Petersburg, Moscow, Novosibirsk, Kazan, Omsk, Ufa, Chelyabinsk, Nizhny Novgorod, Samara, Perm, Yekaterinburg, and Rostov-on-Don, for the period from 1987 to 2015. The data were divided into seasons running from 1 July to 30 June of the following year. The next step was to consider the timing parameters of each epidemic in every season: start, peak (the largest number of cases in the season) and duration of the epidemic. Then the incidence data were arranged in tables by season, following the approach of Baroyan described in [16]. In the tables, the cities were ordered by the start of the epidemic in the season under consideration. An example is presented in Table 9 below.
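The division into seasons from 1 July to 30 June can be expressed as a small helper. This is a sketch; the function names are hypothetical.

```python
from datetime import date


def season_start_year(day):
    """Return the year in which the season containing `day` starts.

    A season runs from 1 July to 30 June of the following year.
    """
    return day.year if day.month >= 7 else day.year - 1


def season_label(day):
    """Return a label such as '1987-1988' for the season of `day`."""
    start = season_start_year(day)
    return f"{start}-{start + 1}"
```

For example, both 1 July 1987 and 30 June 1988 fall into the season labelled `1987-1988`.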

The cities are matched to the week numbers of the season. The results demonstrate that there are three main regions of Russia: Central (Moscow, Kazan, Nizhny Novgorod, and Samara), Ural (Chelyabinsk, Perm, and Yekaterinburg) and Siberia (Novosibirsk, Omsk, and Ufa). The epidemic might start in one of these regions or in Saint Petersburg and then moves to the other regions in order of proximity. For example, in Table 9 the epidemic first starts in all cities of the Ural region and then moves to Siberia (Omsk and Novosibirsk). After that, it spreads to the Central region, including Moscow, Nizhny Novgorod and Samara. In addition, the influenza epidemic in Saint Petersburg almost always begins together with or after Moscow. No patterns were found for the transmission of the epidemic to Rostov-on-Don.

Table 9 – The spread of the influenza epidemic between Russian cities in the season from 1987 to 1988; “e” means that there was an epidemic situation in this week according to the Research Institute of Influenza, and “p” denotes the peak of the epidemic.

City               Epidemic situation by week (table columns: weeks 2 to 13)
Moscow             e e e e p e e e e
Samara             e e e e e e e e p e e
Chelyabinsk        e e e p e e e e e e
Nizhny Novgorod    e e p e e e e
Saint Petersburg   e e p e e e e
Ufa                e e p e e e e e e
Kazan              e e p e e e e e
Novosibirsk        e e p e e e e e
Omsk               e e p e e e e e
Perm               e e p e e e e e e
Rostov-on-Don      e e p e e e
Yekaterinburg      e e p e e e e e

The next investigation extends the previous research. Here a larger dataset was examined, containing the dates and weekly ARI incidence of 49 Russian cities from 1990 to 2015. Using the extended dataset, routes by which the infection spreads through the country were proposed. The same epidemic parameters were considered: onset, peak and duration.

Frequency tables of the epidemic starts in the cities under consideration were created to determine the transmission patterns. Moreover, probability tables of the epidemic onset in the cities were produced. The probability was calculated as the number of outbreaks that occurred in the second city after the first city, divided by the number of epidemics that happened in the first city. Based on these tables, I built plots for the data analysis. The number of plots is very large, which considerably complicates the work.
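The probability defined above can be sketched as follows. The data layout (one dictionary of onset weeks per season), the strict interpretation of "after", and the example values are assumptions made for illustration.

```python
def onset_probability(seasons, first, second):
    """Estimate the probability that an outbreak in `second` follows `first`.

    `seasons` is a list of dicts, one per season, mapping a city to the
    week of its epidemic onset; cities without an epidemic are absent.
    Assumption: "after" means a strictly later onset week.
    """
    epidemics_in_first = 0
    followed_by_second = 0
    for onsets in seasons:
        if first not in onsets:
            continue
        epidemics_in_first += 1
        if second in onsets and onsets[second] > onsets[first]:
            followed_by_second += 1
    if epidemics_in_first == 0:
        return 0.0
    return followed_by_second / epidemics_in_first


# Hypothetical onset weeks for four seasons (illustration only).
example_seasons = [
    {"Moscow": 3, "Samara": 5},
    {"Moscow": 4, "Samara": 2},
    {"Moscow": 6},
    {"Samara": 4},
]

p_moscow_to_samara = onset_probability(example_seasons, "Moscow", "Samara")
```

Evaluating this function for every ordered pair of cities gives one probability table of the kind used for the plots.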

Figure 6 illustrates an example of these plots. The graph shows the probabilities that the epidemic goes from the first city to the second city. Each city in the dataset is assigned a number from 1 to 49; Table 10 lists the cities and their numbers. The more intense the colour, the higher the probability. The colour scale distinguishes the probability intervals 0.0, (0.0, 0.2], (0.2, 0.4], (0.4, 0.6], (0.6, 0.8] and (0.8, 1.0].

Figure 6 – The probabilities that epidemic goes directly from first city to the second city. The order of cities is based on Table 10

Table 10 – The order of cities used in the investigation and plotting of patterns

Number   City               Number   City
1        Arkhangelsk        26       Perm
2        Astrakhan          27       Petropavlovsk
3        Barnaul            28       Petrozavodsk
4        Belgorod           29       Pskov
5        Bryansk            30       Rostov-on-Don
6        Chelyabinsk        31       Ryazan
7        Chita              32       Saint Petersburg
8        Irkutsk            33       Samara
9        Izhevsk            34       Saratov
10       Kaliningrad        35       Smolensk
11       Kazan              36       Stavropol
12       Kemerovo           37       Syktyvkar
13       Khabarovsk         38       Tula
14       Kirov              39       Tver
15       Krasnodar          40       Ufa
16       Krasnoyarsk        41       Ulan-Ude
17       Kursk              42       Ulyanovsk
18       Magadan            43       Vladikavkaz
19       Moscow             44       Volgograd
20       Murmansk           45       Vologda
21       Nizhny Novgorod    46       Voronezh
22       Norilsk            47       Yakutsk
23       Novosibirsk        48       Yekaterinburg
24       Omsk               49       Yuzhno-Sakhalinsk
25       Oryol

The results show that there are four centres where the epidemic might start: the Centre, with Moscow and the cities close to it; the East region, with Khabarovsk; the West region, with Kaliningrad; and, most rarely, the South. In addition, the routes of influenza spread are more complex nowadays. For example, the epidemic might start in two regions at the same time and then move through the country. The main reason is supposed to be the development of the transport network, both international and domestic. It means that there might be more sources of infection entering the country and a more complex structure of spread routes than in the past.

3.3 The global model

In this chapter, the local Baroyan-Rvachev model is extended. The use of passenger transportation data, an algorithm for predicting epidemic peaks, the numerical experiments, and their results are described below.

3.3.1 Transportation matrix

According to the idea of the Baroyan-Rvachev model, the spread of seasonal flu epidemics across the country is caused by the movement of people between cities. To take this factor into account in the model, a square symmetric matrix of real data on the daily passenger traffic between the cities of the Russian Federation was used. It is called the transportation matrix. Each element of the matrix represents the traffic from the city of the row to the city of the column. The values on the main diagonal are set to zero. The dimension of the matrix is 41x41.

The dataset of air transportation between the cities of the Russian Federation was provided by Vladislav Shmatkov and Vladislav Karbovskii from the research group at ITMO University. The dataset contains the information that gives the most plausible representation of passenger traffic in 2016. The data were collected with the online service yandex.raspisanie [37]. The original matrix contains weekly statistical data.
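The properties of the transportation matrix listed above (square, symmetric, zero diagonal, daily traffic) can be illustrated with a short sketch. How the weekly data were actually converted is not specified here, so dividing by 7 and averaging the two directions are assumptions, and the function name and passenger numbers are hypothetical.

```python
def build_transport_matrix(cities, weekly_flows):
    """Build a symmetric daily-traffic matrix with a zero diagonal.

    `weekly_flows` maps (origin, destination) to weekly passengers.
    Assumptions: daily traffic is the weekly value divided by 7, and the
    matrix is symmetrised by averaging the two directions.
    """
    index = {city: i for i, city in enumerate(cities)}
    n = len(cities)
    matrix = [[0.0] * n for _ in range(n)]
    for (origin, destination), passengers in weekly_flows.items():
        if origin == destination:
            continue  # the main diagonal stays zero
        matrix[index[origin]][index[destination]] += passengers / 7.0
    for i in range(n):
        for j in range(i + 1, n):
            mean = (matrix[i][j] + matrix[j][i]) / 2.0
            matrix[i][j] = matrix[j][i] = mean
    return matrix


# Hypothetical weekly passenger numbers (illustration only).
cities = ["Moscow", "Saint Petersburg", "Novosibirsk"]
weekly_flows = {
    ("Moscow", "Saint Petersburg"): 7000,
    ("Saint Petersburg", "Moscow"): 8400,
    ("Moscow", "Novosibirsk"): 3500,
}
transport = build_transport_matrix(cities, weekly_flows)
```

In this sketch, the row index is the origin city and the column index is the destination city, matching the description above.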

In the USSR, the researchers dealt with air, train and bus transportation. However, there is no source of such data nowadays, so we used only air transportation between cities. In addition, the partition of cities into zones was used in the USSR, which complicated the work with the data because the passenger traffic of small towns within one zone had to be taken into account [16]. Using only air travel makes the work easier because the airports are located in the largest cities of each district.

3.3.2 The model

This part describes the method of forecasting the epidemic peaks with modern data. The investigation relies on the research of Baroyan and Rvachev documented in the paper [36]. The system of equations (1) – (5) in Chapter 1 presents this approach.

In technical terms, prediction differs from direct modelling, which uses only the statistical information. In prediction, the initial period of the epidemic may be used to determine the epidemic parameters. We therefore took the daily incidence data of the city whose peak occurred earlier than in all other cities (the first city). This provides the statistical data for the system. Usually the epidemic also starts first in this city. The moment of the epidemic peak in this city is called the prognostic point.

Here it is necessary to consider the difference between ILI and ARI incidence data. This issue has not been fully studied yet, and it is difficult to distinguish ARI and influenza incidence with sufficient accuracy. Initially, there is a background morbidity caused by ARI. Then a moment occurs when the morbidity increases sharply due to some factor, which might be a new strain or a shift in an existing strain. Conventionally, this outbreak is called influenza. Subsequently, only the influenza incidence is examined, and the incidence caused by ARI is excluded as background incidence.

The preparation stage for forecasting epidemic outbreaks is as follows. The incidence of the first city is taken, and the local model described earlier in this chapter is fitted to it. As a result, we obtain the epidemiological parameters of the model.

Next, we define the initial values of the model. To do this, we determine the beginning of the epidemics in the modelled incidence data. Let us take the prognostic point as the starting point of the modelled epidemics. There are other, more precise methods of defining the start; they are examined in the discussion of this Master's thesis.

After that, the epidemic curves are built using the transportation matrix and the parameters found for the first city. When more than ten individuals become infectious in the modelled data of a particular city, we consider the epidemic to have started there. The transport network adjusts the remaining timing.
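The start rule described above can be sketched in a few lines. The threshold of ten infectious individuals comes from the text; the function name and the example series are hypothetical.

```python
def epidemic_start_day(infectious_by_day, threshold=10):
    """Return the first day index with more than `threshold` infectious.

    Returns None if the threshold is never exceeded.
    """
    for day, infectious in enumerate(infectious_by_day):
        if infectious > threshold:
            return day
    return None


# Hypothetical modelled numbers of infectious individuals in one city.
start_day = epidemic_start_day([0, 2, 5, 10, 11, 30])
```

The comparison is strict, so a day with exactly ten infectious individuals does not yet count as the start.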

3.3.3 Numerical experiments and results

The numerical experiments were carried out according to the algorithm described above. They involved 41 cities of the Russian Federation and 16 epidemics from the period from 2000 to 2015.

The same criteria as for the local model were used to assess the accuracy of the prediction results. According to the ‘stripe’ criterion, the prediction is accurate if the prediction bias of the peak day is within (-7, 7) for the ‘vertical stripe’ and the peak ratio is within (0.7, 1.5) for the ‘horizontal stripe’. Table 11 presents the prediction results by this criterion, and Figure 7 illustrates an example of the prediction.

The obtained results show that if the epidemic starts in Magadan, the height of the incidence peak is predicted very poorly in these seasons. This is presumably due to the remoteness and small population of Magadan compared to the other cities. Besides, the traffic flows connected with Magadan are relatively insignificant. Moreover, as discussed above, epidemics most often enter the country from different sides and have several transmission paths. As a result, it is necessary to take into account other possible entry routes of the epidemic apart from Magadan. It is also vital to take into account not only the first city but also the next initial cities.

All the epidemics that started in Syktyvkar (in 2000, 2003, 2004 and 2007) spread inside the city but did not move further through the cities of the country. The epidemic then came back to Syktyvkar after a while. All the predictions for epidemics that started in Syktyvkar are far ahead of the real data.

Figure 7 – Example of the peak prediction in the season from 2002 to 2003. The first city is Omsk

Table 11 – Prediction accuracy according to the ‘stripe’ criterion

Season   First city        ‘Vertical stripe’   ‘Horizontal stripe’
2000     Syktyvkar         0.0%                34.62%
2001     Kirov             23.08%              11.54%
2002     Omsk              72.73%              54.55%
2003     Syktyvkar         0.0%                45.45%
2004     Syktyvkar         3.33%               46.67%
2005     Vladikavkaz       7.41%               44.44%
2006     Magadan           28.57%              3.57%
2007     Syktyvkar         3.23%               38.71%
2008     Kaliningrad       23.53%              38.24%
2009     Khabarovsk        5.41%               48.65%
2010     Magadan           8.11%               0.0%
2011     Vologda           18.18%              45.45%
2012     Kazan             14.71%              5.88%
2013     Magadan           35.29%              41.18%
2014     Nizhny Novgorod   3.23%               35.48%
2015     Vladikavkaz       18.42%              34.21%

To sum up, the obtained results are insufficient for prediction at this level. The precision is lower than it was in the past, in the Soviet Union, and further studies of the model are required. The determination of the peak time is particularly important. However, this is only the first attempt at such prediction after a long break. Several results were obtained, and directions for further development were found.

3.4 Conclusions to the modelling and prediction of ILI transmission

In this part of the research, Chapter 3, the approach of Baroyan and Rvachev was applied to modern incidence data for the first time in roughly 30 years.

Firstly, the local model was implemented. It simulates the incidence inside one city. The three largest cities, Moscow, Saint Petersburg, and Novosibirsk, were used for this investigation. The experiments show that, due to inaccuracies in the selection of the first outbreak day, the prediction of the peak day is not sufficiently accurate, and it is necessary to check the accuracy of the curve extraction algorithm and to compare different versions of that algorithm. In contrast, the precision of the prediction of the epidemic peak height is very good and comparable to the results achieved in the past. There is also a trend that might occur because the connection between two cities weakens with the distance between them. This should be investigated further to establish the exact cause.

Secondly, the patterns of influenza transmission were studied. The results show that there are currently several important centres that act as sources of infection. The epidemics might move between the cities of the Russian Federation from these sources. As a result, the structure of influenza transmission has become more complex compared to the Soviet Union.

Lastly, the global model was developed. The transportation matrix was constructed and used for this modelling. The global model is an extension of the local model and shows the changes of incidence that occur between cities. The precision of the epidemic peak prediction is lower than it was in the past, in the Soviet Union, and further studies of the model are required. However, this is only the first attempt to use the Baroyan-Rvachev approach on modern incidence and transportation data. Several facts were learned, and directions for further development were suggested.

CONCLUSIONS

The aim of this Master’s thesis was to examine and develop a model for the prediction of epidemic outbreaks in the cities of the Russian Federation. For this purpose, it was necessary to analyse the acute respiratory infection incidence data in temperate regions and to apply the Baroyan-Rvachev model to modern incidence data.

Chapter 2 describes the determination of the thresholds of temperature and specific humidity that the epidemic ARI incidence does not exceed. In addition, the right edge allocation algorithm can be used to examine whether a chosen daily incidence is possible in combination with chosen weather factors. This helps to understand whether an epidemic outbreak can be detected from particular levels of the observed climate conditions [5].

A limitation of the work described in Chapter 2 is that interpolated weekly incidence data were used instead of daily incidence data. The epidemic curves obtained via interpolation are smoother than real daily incidence data, which might change the relation between the incidence data and the weather conditions. Nowadays, daily incidence data are not fully available; moreover, they depend on weekends and holidays during the season. The next step of this investigation might be to obtain daily incidence data for Russian cities and use them to verify the conclusions of this work. This might show how the interpolated incidence data differ from the real daily incidence.

In this thesis, only temperature and humidity were studied as external factors. Further work might investigate the influence of other, non-climatic factors, such as age structure, immunology, vaccination and influenza strains.

In the part of the investigation described in Chapter 3, the Baroyan-Rvachev model was examined and developed. It shows that the prediction precision is lower than in the past. Further work might be related to the improvement of the local population model, for example, using daily incidence data instead of weekly data and comparing the obtained results. This experiment might help to understand the difference between using real daily data and using weekly data interpolated to daily values, with the inaccuracies this introduces.

One of the drawbacks of the research on the global Baroyan-Rvachev model is that we do not take into account the development of the country's transport network. In the USSR, a parameter for this factor was calculated from the literature, and the transport matrix was multiplied by this parameter. In that case, the uneven increase of the passenger traffic was not taken into account, but this is not significant since the calculated value was averaged [36].

One more improvement might be introduced into the Baroyan-Rvachev model in the future: only a half-wave of the epidemic curve of the first city might be used for the estimation of the epidemiological parameters. The Soviet research team found that the results are more accurate in this case [16].

Moreover, there is an idea to use several initial cities that are sources of the infection in addition to the first city. The initial cities are the cities where the epidemic starts before the epidemic reaches its peak in the first city [36]. This modification would improve the model and take into account the complex structure of the entry and spread of the infection in the country.

Other transportation models might be used for comparing results, for example, the gravity model based on the distance between cities. In addition, there are other ways to account for human mobility, both at the regional and the country levels.
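As an illustration of the gravity model mentioned above, the sketch below uses its common textbook form, in which the flow between two cities is proportional to the product of their populations and inversely proportional to a power of the distance between them. The constant `k`, the exponent `gamma`, the function name and all numbers are hypothetical; this form was not fitted or used in this work.

```python
def gravity_matrix(populations, distances, k=1.0, gamma=2.0):
    """Estimate flows as k * p_i * p_j / d_ij ** gamma.

    `populations` is a list of city populations, and `distances` is a
    symmetric matrix of distances between the cities. The diagonal of
    the result is zero.
    """
    n = len(populations)
    flows = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                flows[i][j] = (
                    k * populations[i] * populations[j]
                    / distances[i][j] ** gamma
                )
    return flows


# Hypothetical populations and distances (illustration only).
example_flows = gravity_matrix(
    populations=[100, 200, 50],
    distances=[[0, 10, 20], [10, 0, 40], [20, 40, 0]],
)
```

Such a matrix is symmetric by construction and could be compared with the transportation matrix built from the air traffic data.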

FUTURE WORK

The results of this thesis will be used in further investigations. First, the findings will be used in research aimed at creating a multi-component system for the prediction of the seasonal dynamics of influenza in Russian regions. To achieve this goal, we will develop various models and algorithms. The main points of the research are listed below.

The first point is the development of an algorithm for processing and correcting the incidence data of influenza-like illnesses. There are some deviations in the incidence data; for example, a substantial decrease in the number of cases was found during holidays. As this may affect the modelling, the data need to be corrected and smoothed. The second point is the development of an algorithm for a posteriori selection of the seasonal phases of epidemics. As a result of long-term data analysis, the algorithm will collect statistics on the levels, onset and duration of epidemics in a particular Russian region. The third point is the development of a mathematical model of an outbreak, which will describe the dynamics of the influenza epidemic. The fourth point is the development of a statistical regression model of seasonal incidence, whose parameters include the external factors that have the most substantial influence on the levels of seasonal incidence. The fifth point is the development of an algorithm for predicting the time of an outbreak. After that, a software implementation of the whole system will be developed, uniting all the models and algorithms mentioned above. Finally, the resulting system will be tested and debugged.

To sum up, the findings of this research can be readily used in practice. The resulting system will assist the simulation of the seasonal dynamics of influenza-like illnesses, taking into account the relationship between the levels of influenza and the seasonal flu epidemic parameters. Furthermore, it will examine the relationships that cause the occurrence of epidemics. It will also help to apply influenza prevention measures properly, for example, to strengthen them before the predicted epidemic peak day or to begin vaccination in advance of the epidemic onset.

ABBREVIATIONS

ARI – acute respiratory infection

ILI – influenza-like illness

REFERENCES

1. World Health Organizations (WHO) (2009) Influenza (Seasonal) [online]. URL: http://www.who.int/mediacentre/factsheets/fs211/en/ (date accessed: 01.01.2009).

2. Romanyukha A. A., Sannikova T. E., Drynov I. D. The origin of acute respiratory epidemics //Herald of the Russian Academy of Sciences. – 2011. – Т. 81. – №. 1. – С. 31-34.

3. Tamerius J. D. et al. Environmental predictors of seasonal influenza epidemics across temperate and tropical climates //PLoS Pathog. – 2013. – Т. 9. – №. 3. – С. e1003194.

4. Tang J. W. et al. Comparison of the incidence of influenza in relation to climate factors during 2000–2007 in five countries //Journal of medical virology. – 2010. – Т. 82. – №. 11. – С. 1958-1965.

5. Sagripanti J. L., Lytle C. D. Inactivation of influenza virus by solar radiation //Photochemistry and Photobiology. – 2007. – Т. 83. – №. 5. – С. 1278-1282.

6. Reichert T. A. et al. Influenza and the winter increase in mortality in the United States, 1959–1999 //American journal of epidemiology. – 2004. – Т. 160. – №. 5. – С. 492-502.

7. Lipsitch M., Viboud C. Influenza seasonality: lifting the fog //Proceedings of the National Academy of Sciences. – 2009. – Т. 106. – №. 10. – С. 3645-3646.

8. du Prel J. B. et al. Are meteorological parameters associated with acute respiratory tract infections? //Clinical infectious diseases. – 2009. – Т. 49. – №. 6. – С. 861-868. 9. Urashima M. et al. A seasonal model to simulate influenza oscillation in Tokyo //Japanese journal of infectious diseases. – 2003. – Т. 56. – №. 2. – С. 43-47.

10. Shaman J. et al. Absolute humidity and the seasonal onset of influenza in the continental United States //PLoS Biol. – 2010. – Т. 8. – №. 2. – С. e1000316.

(46)

11. Shaman J., Kohn M. Absolute humidity modulates influenza survival, transmission, and seasonality //Proceedings of the National Academy of Sciences. – 2009. – Т. 106. – №. 9. – С. 3243-3248.

12. Lowen A. C. et al. Influenza virus transmission is dependent on relative humidity and temperature //PLoS Pathog. – 2007. – Т. 3. – №. 10. – С. e151.

13. Lowen A. C. et al. High temperature (30 C) blocks aerosol but not contact transmission of influenza virus //Journal of virology. – 2008. – Т. 82. – №. 11. – С. 5650-5652.

14. Axelsen J. B. et al. Multiannual forecasting of seasonal influenza dynamics reveals climatic and evolutionary drivers //Proceedings of the National Academy of Sciences. – 2014. – Т. 111. – №. 26. – С. 9538-9542.

15. Charaudeau S. Movements networks in epidemiological models: integration, analysis and application to commuting movements in France : Université Paris-Diderot-Paris VII, 2013.

16. Baroyan O. V. et al. Computer modelling of influenza epidemics for the whole country (USSR) //Advances in Applied Probability. – 1971. – Т. 3. – №. 2. – С. 224-226.

17. Ivannikov Yu. G., Ismagulov A. T. The epidemiology of influenza //Publishing office “Kazakhstan”, Almaty. – 1983.

18. Axelsen J. B. et al. Multiannual forecasting of seasonal influenza dynamics reveals climatic and evolutionary drivers //Proceedings of the National Academy of Sciences. – 2014. – Т. 111. – №. 26. – С. 9538-9542.

19. Yaari R. et al. Modelling seasonal influenza: the role of weather and punctuated antigenic drift //Journal of The Royal Society Interface. – 2013. – Т. 10. – №. 84. – С. 20130298.

20. Truscott J. et al. Essential epidemiological mechanisms underpinning the transmission dynamics of seasonal influenza //Journal of The Royal Society Interface. – 2011. – С. rsif20110309.

21. Lunelli A. et al. Understanding the dynamics of seasonal influenza in Italy: incidence, transmissibility and population susceptibility in a 9‐year period //Influenza and other respiratory viruses. – 2013. – Т. 7. – №. 3. – С. 286-295.

22. Goeyvaerts N. et al. Estimating dynamic transmission model parameters for seasonal influenza by fitting to age and season-specific influenza-like illness incidence //Epidemics. – 2015. – Т. 13. – С. 1-9.

23. Bootsma M. C. J., Ferguson N. M. The effect of public health measures on the 1918 influenza pandemic in US cities //Proceedings of the National Academy of Sciences. – 2007. – Т. 104. – №. 18. – С. 7588-7593.

24. Logan J. D. An introduction to nonlinear partial differential equations. – John Wiley & Sons, 2008. – Т. 89.

25. Leonenko V. N., Ivanov S. V., Novoselova Y. K. A Computational approach to investigate patterns of acute respiratory illness dynamics in the regions with distinct seasonal climate transitions //Procedia Computer Science. – 2016. – Т. 80. – С. 2402-2412.

26. Flu Institute. Research Institute of Influenza website [online]. URL: http://influenza.spb.ru/en/ (date accessed: 28.04.2017).

27. Baroyan O. V. et al. Computer modelling of influenza epidemics for large-scale systems of cities and territories //Proc. WHO Symposium on Quantitative Epidemiology, Moscow. – 1970.

28. Sentinelles. Sentinelles surveillance system website [online]. URL: https://websenti.u707.jussieu.fr/sentiweb/ (date accessed: 28.04.2017).

29. Kalnay E. et al. The NCEP/NCAR 40-year reanalysis project //Bulletin of the American meteorological Society. – 1996. – Т. 77. – №. 3. – С. 437-471.

30. Kermack W. O., McKendrick A. G. A contribution to the mathematical theory of epidemics //Proceedings of the Royal Society of London A: mathematical, physical and engineering sciences. – The Royal Society, 1927. – Т. 115. – №. 772. – С. 700-721.