MScBA Digital Business

Final Thesis

Investigating the future cases of COVID-19 in Germany by univariate and multivariate models

by

Katerina Todorova

12964166

August 2022

Supervisor:

Dr. Chintan Amrit

This document is written by Student Katerina Todorova who declares to take full responsibility for the contents of this document. I declare that the text and the work presented in this document are original and that no sources other than those mentioned in the text and its references have been used in creating it. The Faculty of Economics and Business is responsible solely for the supervision of the completion of the work, not for the contents.

Table of Contents

Tables

Figures

ABSTRACT

Chapter 1: Introduction

1.1 History of Pandemics

1.2 Covid-19 Pandemic

Chapter 2. Literature Review

2.1 Covid-19 Impact on Neighboring Countries

Chapter 3. Data Description

Chapter 4. Methods

4.1 Time Series Forecasting

4.2 Components of Time Series Analysis

4.3 Concept of Stationarity

4.4 White Noise

4.5 Time Series Forecasting Methods

4.5.1 ARIMA Model

4.5.2 VAR Model

Chapter 5. Model Selection Criteria

5.1 Akaike Information Criterion (AIC)

5.2 Bayesian Information Criterion (BIC)

5.3 Final Prediction Error (FPE)

5.4 Schwarz Information Criterion (SIC)

5.5 Hannan-Rissanen Criterion (HRC)

5.6 Hannan-Quinn Information Criterion

5.7 Techniques to Use in Time Series Analysis

5.7.1 Dickey-Fuller Test (ADF)

5.7.2 Granger Causality Test

5.7.3 The Portmanteau Test

Chapter 6. Developing the Models to Determine the Correlation of Covid-19 Cases Between Selected Countries

6.1 Strategy to Develop ARIMA Model

6.2 Strategy to Develop VAR Model

Chapter 7. Results

7.1 ARIMA Model Results

7.1.1 ARIMA (6,1,3) Model

7.1.2 ARIMA (7,1,5) Model

7.1.3 ARIMA (7,1,5, s7) Model

7.1.4 Model Comparison

7.2 VAR Model

7.2.1 ADF Test

7.2.2 Granger Causality Test

7.3 VAR model for GER_COV19 and POL_COV19 series

7.4 Estimating VAR Model

7.4.1 VAR Estimation with 7 Lags

7.5 Forecasting with VAR

Chapter 8. Discussion

Chapter 9. Conclusions

9.1 Conclusion

9.2 Future Work

REFERENCES

Appendices

Appendix 1

Appendix 2

Appendix 3

Appendix 3

Appendix 4

Appendix 5

Tables

Table 1: Epidemiological situation in Argentina and neighboring countries (Ramírez, 2021).

Table 2: Total cases in ROI by region (Ahmed, 2021).

Table 3: Regression analysis Results

Table 4: Ljung-Box test for ARIMA (6,1,3) Model

Table 5: Ljung-Box test for ARIMA (7,1,5) Model

Table 6: Z-test of coefficients for ARIMA (7,1,5,s7) Model.

Table 7: Comparison of ARIMA (6,1,3), ARIMA (7,1,5) and ARIMA (7,1,5,s7) Models.

Table 8: The Merged Dataset

Table 9: Critical values for the test statistics for the FRA_COV19 series.

Table 10: Granger-Causality Test Results with 3 lags (GER_COV19 as the dependent)

Table 11: Granger-Causality Test Results with 4 lags (GER_COV19 as the dependent)

Table 12: Granger-Causality Test Results with 3 lags (POL_COV19 as a dependent)

Table 13: Granger-Causality Test Results with 4 lags (POL_COV19 as a dependent)

Table 14: Portmanteau Test (asymptotic) for GER.POL.var7

Table 15: Portmanteau Test (asymptotic) for GER.POL.var13

Table 16: Portmanteau Test (asymptotic) for GER.POL.var13_s7.

Table 17: Comparison of GER.POL.var7, GER.POL.var13 and GER.POL.var13_s7 Models.

Table 18: All possible combinations of orders

Table 19: Lag Order Selection

Table 20: VAR Estimation Results with 13 lags for POL_COV19 without seasonal dummies.

Table 21: VAR Estimation Results with 13 lags for POL_COV19 without seasonal dummies

Table 22: VAR Estimation Results with 13 lags for GER_COV19 with season 7

Table 23: VAR Estimation Results with 13 lags for GER_COV19 with season 7

Figures

Figure 1: Evolution of confirmed Covid-19 cases in Argentina and neighboring countries [20].

Figure 2: Covid-19 cases in France between Mar 2020 – May 2022

Figure 3: Covid-19 cases in Germany between Mar 2020 – May 2022

Figure 4: Covid-19 cases in Poland between Mar 2020 – May 2022

Figure 5: Components of Time Series

Figure 6: Different types of stationary time series.

Figure 7: White Noise Process

Figure 8: ACF and PACF graphs of White Noise

Figure 9: The original and differenced series

Figure 10: ACF Graph

Figure 11: PACF Graph

Figure 12: Confirmed Covid-19 Cases in Germany, France and Poland.

Figure 13: Plotting Confirmed Covid-19 Cases in Germany, France and Poland Separately.

Figure 14: Covid-19 Cases in Germany and Poland.

Figure 15: 13-day forecast of Covid-19 cases in Germany using final model (GER.POL.var13)

Figure 16: 13-day forecast of Covid-19 cases in Poland using final model (GER.POL.var13)

ABSTRACT

Covid-19, caused by the SARS-CoV-2 virus, is a new acute respiratory disease that emerged in 2019, has affected the whole world, and has caused the deaths of millions of people. With the disease so widespread, models that predict its future course have become a necessity. In this study, two time series models, ARIMA and VAR, were used. The dataset covers Germany, Poland, and France. This thesis aims to analyze the causality and correlation between Covid-19 changes in Germany and neighboring countries and to forecast new Covid-19 cases. The ARIMA model was used as a univariate model for the Covid-19 cases in Germany. Covid-19 cases in Poland and Germany were used to fit the VAR model because of the bi-directional feedback and the common border between the two countries. To the author's knowledge, no study in the literature investigates the impact of Covid-19 cases in neighboring countries on cases in Germany using the VAR model. This study highlights the use of the VAR model to analyze the impact of neighboring countries on Covid-19 cases in Germany and to predict new cases. The developed ARIMA and VAR models were compared using AIC and BIC to determine the best model. A 13-day forecast was then produced with the final VAR model. The results indicate that neighboring countries influence both the increase and decrease of cases and the spread of future cases. This study underlines the importance of predictive models in controlling the pandemic, saving human lives, and predicting new cases in the future.

Chapter 1: Introduction

1.1 History of Pandemics

Pandemics have had many effects throughout history and have caused societal and behavioral changes. Examining past pandemics can show us how to deal with new ones and help us recognize mistakes made in the past. The lack of adequate historical information makes it difficult to learn when and where past outbreaks occurred. There have been many pandemics in history, such as the Plague of Athens, the Plague of Justinian, leprosy, the Black Death, smallpox, and the cholera pandemics.

The Black Death is an illness caused by a bacillus and carried and transmitted by parasitic fleas, most notably those living on rodents such as brown sewer rats. The disease reached China in 1331 and killed more than 90% of Hebei Province's population, resulting in more than 5 million deaths in China. In 1346 it arrived at the Genoese city of Kefe in Crimea via trade routes and the Mongol army. The army, which wanted to besiege Kefe, threw the bodies of people who had died of the Black Death into the city with catapults. The Genoese who fled Kefe caused the disease to spread first to Europe and then to Greenland. As a result of the epidemic, approximately 200 million people died in the 14th century. The pandemic affected countries, the demographic structure of nations, and their economic and political systems. An armistice was declared between England and France, and the economic and demographic conditions in England changed, which led to the collapse of the English feudal system (Dean, 2018).

There are other changes that the Black Death Pandemic has brought to our lives. To protect coastal cities from the pandemic, ships from infected areas were held for 40 days before entering Venice. This practice, later called quarantine, came into our lives for the first time due to Black Death.

Studies on the Black Death continue today. In 2018, research conducted at the Universities of Oslo and Ferrara indicated that the epidemic was spread not by animals but by humans. The spread of the bacterium Yersinia pestis turned out to be best described by a "human parasite model" (Dean, 2018).

Cholera is a disease that shows how fast and how far an epidemic can spread. The cholera outbreak began in India's Ganges Delta in 1817 and swept throughout the country within a year. The disease spread from India to Thailand and the Philippines in 1820, to western and eastern Asia in 1821, and to Russia in 1823. It affected almost all countries in Asia, and the number of deaths is still unknown. This pandemic, which spread throughout Asia, is also known as the first cholera pandemic or Asian Cholera (Hays, 2006).

The second cholera epidemic started in India in 1827 and spread to Afghanistan and Persia in 1829. By the end of the same year, the disease had reached Russia, and via western Russia it spread to Europe. In 1831, it spread via Persia to the Arabian Peninsula and from there to Africa. Cholera reached America in 1832. As in the first cholera pandemic, the total number of deaths in the second cholera pandemic is still unknown (Hays, 2006).

Although scientific studies conducted during the second cholera pandemic showed that the disease might be associated with poverty, they did not state that cholera was more common among poor people. In some countries, poor people began to be discriminated against, and cholera was believed to be a punishment sent by God. For this reason, official prayer days were organized in some countries (Hays, 2006).

In 1839, the third cholera pandemic broke out in India and spread worldwide. In England, 61,000 people died from cholera between 1848 and 1849, and 26,000 died between 1853 and 1854. In Paris, 20,000 people died of cholera in 1849. Approximately two-thirds of the Brazilian population died. In 1854, an English doctor stated that the disease was transmitted through dirty water. As these examples show, the beliefs, lifestyles, cultures, and traditions of societies affect the progress of a pandemic. Pandemics have caused permanent changes in countries, economic problems, and the collapse of political systems (Hays, 2006).

1.2 Covid-19 Pandemic

An epidemic is defined as the occurrence of a disease more often than expected, or a significant increase in the number of cases. A pandemic, on the other hand, is the occurrence of a disease over a wide area, crossing international borders and affecting the whole world or a large part of it. There have been many pandemics, such as Justinian's Plague (541-750 AD), the Black Death (1347-1351), Cholera (1817-1823), Spanish Flu (1918-1919), SARS (2002-2003), H1N1 (2009-2010), and Ebola (2014-2016). While some of these pandemics have ended, others are still present in our lives; not every pandemic disappears over time.

The latest pandemic is Covid-19, caused by the SARS-CoV-2 virus, which emerged in Wuhan, China, in 2019 and spread worldwide (Hui DS, 2020). Coronaviruses are among the deadliest virus families. Covid-19 first appeared in Wuhan, then spread to Iran and Italy, and went on to affect the entire world. The World Health Organization declared a pandemic (World Health Organization, 2020). The most common symptoms of Covid-19 are high fever, dry cough, shortness of breath, fatigue, loss of taste and smell, diarrhea, and vascular congestion. In addition, severe symptoms such as shortness of breath, loss of speech and movement, and chest pain are observed in some patients. Covid-19 particularly affects people with cardiovascular disease, diabetes, chronic respiratory disease, or cancer, and people over 60 (Paules, 2020).

The death rate from Covid-19 is 2.92%, while the mortality rate from influenza ranges from 0.05% to 0.1%, which means that Covid-19 is roughly 30 to 60 times more lethal than influenza (McCarty, 2020). Studies show that the mortality rate is higher in patients over the age of 80 (Wu Z. a., 2020). For example, the overall mortality rate in South Korea was reported as 0.9%, while the mortality rate in patients over 80 years of age was 9.3% (YJ, 2020).
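The comparison with influenza quoted above is a simple ratio of the two rates. The short sketch below only reproduces that arithmetic from the percentages given in the text; the variable names are illustrative.

```python
# Rates quoted in the text (McCarty, 2020), in percent
covid_rate = 2.92
flu_rate_low, flu_rate_high = 0.05, 0.1

# Dividing by the highest influenza rate gives the smallest ratio, and vice versa
ratio_min = covid_rate / flu_rate_high
ratio_max = covid_rate / flu_rate_low

print(f"Covid-19 is about {ratio_min:.1f} to {ratio_max:.1f} times as lethal as influenza")
```

The exact ratios are slightly below the rounded "30 to 60 times" used in the text.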

Countries applied certain restrictions to reduce the number of Covid-19 cases and the rate of spread, such as quarantine, lockdown, the use of masks, border closures, and social distancing. In addition, arrangements such as working from home and distance education were introduced. Today, almost all countries have removed or eased these restrictions, which may cause a new wave of the pandemic. Unfortunately, even in countries where vaccination is common, Covid-19 waves can continue (Munnoli, 2022).

As Covid-19 cases increase, countries also want to predict the cases that may occur in the short, medium, and long term. Predictions developed in this context help countries to plan and take precautions. Since the first Covid-19 cases, researchers have been analyzing data to predict how the pandemic will unfold. These studies are carried out using mathematical models, statistics, and deep learning algorithms.

Chapter 2. Literature Review

Time series forecasting is focused on predicting the investigated event over time. The data used to create the time series can be collected at regular intervals such as daily, monthly, and yearly. There are many studies on time series models of Covid-19 cases. In this section, some studies with ARIMA and VAR models are given.

Barman compared the ARIMA model with LSTM to investigate Covid-19 cases in four countries: Germany, the United States of America, Spain, and Italy. In that study, the ARIMA model performed well and outperformed the LSTM model, which indicates that the ARIMA model can be used in time series analysis and forecasting (Barman, 2020).

Ding et al. analyzed data in Italy between February 24 and March 30, applying the ARIMA model. It was stated that at the time of this study, the number of Covid-19 cases in Italy decreased, but it would not be correct to ease the measures taken (Ding, 2020).

Sahai et al. analyzed data between February 15 and June 30, 2020, in the five most affected countries to estimate the spread of the pandemic. Data collected from the United States, Brazil, Russia, India, and Spain were used in this study. The study indicates that India successfully controlled the pandemic until May 31, 2020. With the lifting of the lockdown, Covid-19 cases started to rise again due to the movement of migrant workers in Delhi and Mumbai. As a result, India became one of the countries most affected by Covid-19, ranking third after the United States and Brazil. While the case curve in Spain flattened, cases in Russia decreased (Sahai, 2020).

Singh et al. investigated the impact of unlocking in India by comparing the predicted positive cases and the total number of tests done for Covid-19 during unlock and lockdown periods.

Seven forecasting models were examined and compared based on prediction accuracy: TBATS, Prophet, Autoregressive Integrated Moving Average (ARIMA), Moving Average, Neural Basis Expansion Analysis (N-BEATS), Single Exponential Method, and Double Exponential Method. The ARIMA model was the best forecasting model (Saswat Singh, 2021).

Moftakhar et al. used two forecasting models, Artificial Neural Networks (ANN) and Autoregressive Integrated Moving Average (ARIMA), to predict Covid-19 cases in Iran. The cases between February 19 and March 30, 2020, were used as the dataset. The ARIMA and ANN models were compared, and the number of cases in Iran was predicted to be 6678 and 3977 as of April 24, 2020. The comparison indicates that ARIMA predicted future Covid-19 cases in Iran more precisely than ANN (Moftakhar, 2020).

Rajab et al. proposed an approach to predict the spread of Covid-19 in UAE, Saudi Arabia, and Kuwait using the Vector Autoregressive Model (VAR). The results indicate that the proposed model is highly accurate. The proposed approach's effectiveness demonstrates that it may also be utilized for predicting other countries (Rajab, 2022).

Wang et al. proposed a validated Vector Autoregressive (VAR) model for predicting daily positive Covid-19 cases. The dataset used in this study was collected from Johns Hopkins University publications. The model was validated by comparing the predicted values with actual values. The model also predicts whether infections will decrease in the early stages of vaccination (Wang, 2021).

2.1 Covid-19 Impact on Neighboring Countries

Ramírez et al. compared Covid-19 cases in Argentina and its neighboring countries: Uruguay, Brazil, Paraguay, Bolivia, and Chile. On March 7, 2020, Argentina reported the first death in the region; before the end of the month, all neighboring nations had also confirmed deaths. It is important to note that the number of tests each country performs varies. Chile is the Latin American country with the most PCR tests per million inhabitants, followed by Uruguay and Paraguay (Ramírez ML, 2021).

Figure 1: Evolution of confirmed Covid-19 cases in Argentina and neighboring countries [20].

With a Covid-19 lethality of 1.09%, Paraguay and Chile have the lowest rate, followed by Uruguay (2.78%), Argentina (3.11%), Bolivia (3.42%), and finally Brazil with 5.61%. Public health policies influenced each country's epidemiological behavior. There is community transmission of the virus in Argentina, which is why strict preventive hygienic measures are in place. Uruguay and Paraguay have higher test rates, which helped them to understand the pandemic and maintain greater control. Uruguay and Paraguay are two Latin American countries with fewer positive cases.

| Country | Date of first confirmed Covid-19 case | Date of first Covid-19 death | Date of first preventive action (obligatory isolation, type of action) | Initial date of community transmission | Covid-19 lethality rate (as of May 25) |
|---|---|---|---|---|---|
| Argentina | Mar-03 | Mar-07 | Mar-20 (strict and obligatory in phases) | Mar-23 | 3.69 % |
| Uruguay | Mar-13 | Mar-28 | Mar-17 (not obligatory) | - | 2.79 % |
| Brazil | Feb-26 | Mar-17 | - | Mar-20 | 6.26 % |
| Bolivia | Mar-10 | Mar-28 | Mar-12 (obligatory in phases) | Mar-14 | 3.92 % |
| Chile | Mar-03 | Mar-20 | Mar-18 (obligatory in phases) | Mar-16 | 1.03 % |
| Paraguay | Mar-07 | Mar-20 | Mar-20 (total and obligatory) | Mar-20 | 1.27 % |

Table 1: Epidemiological situation in Argentina and neighboring countries (Ramírez, 2021).

Brozak et al. developed a model for the transmission dynamics of Covid-19 in India and Pakistan. The basic model was developed by categorizing the entire population into several compartments based on disease status. Cumulative Covid-19 mortality data from India and Pakistan were used in the basic model. The model revealed that relaxing the lockdown and mitigation measures in India and Pakistan could result in another pandemic wave in either country (Brozak, 2021). Similarly, the lifting of lockdown measures in the United States was found to determine the magnitude of the expected third wave of the pandemic in the fall of 2020 (Ngonghala, 2020). According to the results of Brozak et al., the likelihood of effectively controlling or eliminating the Covid-19 pandemic in each of the two countries with existing control resources is quite promising. The study shows that if control measures are maintained, both countries can eliminate Covid-19. The study also found that the cumulative death rate in Pakistan was associated with the increased time spent by Indian citizens in Pakistan. India and Pakistan have experienced a devastating Covid-19 pandemic, and lifting control measures can result in another pandemic wave in both countries (Brozak, 2021).

Ahmed et al. aimed to determine whether Covid-19 cases in Northern Ireland (NI) had an impact on Covid-19 cases in the Republic of Ireland (ROI). They investigated whether the infection rate differed in ROI counties bordering NI compared to the rest of ROI. The data used in the study consist of confirmed Covid-19 cases in ROI between March 2020 and March 2021. The study indicates that NI had more confirmed Covid-19 cases than ROI during the second pandemic wave. Descriptive statistics reveal that per capita cases were higher in the counties along the border than elsewhere in ROI (Ahmed, 2021).

| Region | Total number of cases | Cases/1000 people |
|---|---|---|
| Border | 27633 | 60.4 |
| Midland | 11877 | 40.6 |
| West | 20167 | 38.9 |
| Mid-East | 79402 | 41.6 |
| Mid-West | 20612 | 43.6 |
| South-East | 18175 | 43.1 |
| South-West | 25862 | 37.4 |
| National | 227790 | 47.8 |

Table 2: Total cases in ROI by region (Ahmed, 2021).

During the COVID-19 pandemic on the island of Ireland, high infection rates in Northern Ireland increased cases in nearby ROI. As can be understood from this study, neighboring countries should continue the fight against the pandemic in a coordinated manner.

Coordination between neighboring countries is essential to reduce the spread and impact of the pandemic.

Chapter 3. Data Description

This study forecasts the number of newly reported Covid-19 cases using univariate and multivariate models. The dataset covers the period from March 2020 to May 2022. The Seasonal ARIMA (SARIMA) model was used as the univariate model, and the Vector Autoregressive (VAR) model was used as the multivariate model. The dataset consists of Covid-19 cases from Germany, France, and Poland.

In the first part, the ARIMA model is applied to the data from Germany, and the best ARIMA model is selected for forecasting Covid-19 cases. In the second part, the Granger Causality Test and the Vector Autoregressive model are applied to investigate whether there is significant causality between the three countries.

To explore the Covid-19 cases in the selected countries, the plots are displayed:

Figure 2: Covid-19 cases in France between Mar 2020 – May 2022

Figure 3: Covid-19 cases in Germany between Mar 2020 – May 2022

Figure 4: Covid-19 cases in Poland between Mar 2020 – May 2022

Chapter 4. Methods

4.1 Time Series Forecasting

Predicting future events has been important for companies and countries ever since it became part of their decision-making. It matters in fields such as macroeconomics, engineering, biology, medicine, and the social sciences. A good prediction of the future is the basis for planning, determining policies, and making decisions. Forecasts thus make it possible to take precautions in advance and reduce uncertainty about the future. Consistent predictions are therefore required for sound decisions and planning.

A time series is a series of values observed over time. Time series data can be collected daily, weekly, monthly, quarterly, or yearly, and current values are typically influenced by past values. Time series studies have progressed rapidly with the advancement of statistical tools and computers. Yule and Slutsky founded time series analysis. Yule used the autoregressive model (AR) and said that the real-time series period length should be constant. Walker extended Yule's autoregressive model to more than two previous observation values. The moving average (MA) model was used by Slutsky (Yule, 1921; Slutsky, 1937).

Box and Jenkins conducted several pioneering investigations on the formulation, estimation, and management of time series models. Throughout their research, they stressed the necessity of using the autocorrelation coefficient. They developed the autoregressive moving average model (ARMA), which is a hybrid of the autoregressive (AR) and moving average (MA) models (Abraham, 1983).

4.2 Components of Time Series Analysis

Regular or irregular changes can occur in time series, and it is necessary to find their causes. In 1919, W. M. Persons proposed decomposing a time series into four components: secular trend, cyclical fluctuations, seasonal variation, and irregular variation (Dodge, 2008).

Secular trend refers to the long-run change of a time series in a specific direction. Since this is a long-term property, monthly or seasonal movements do not affect it. Time series are generally plotted in Cartesian coordinates. If the points in the scatter diagram of the time series are mainly gathered around a line, the general trend can be said to be linear (Baltagi, 2011).

A cyclical component is a non-seasonal component that fluctuates in a recognizable cycle. A series may oscillate without a fixed period and still follow a predictable pattern. The average cycle length is greater than the length of a seasonal pattern. In practice, the cyclical component is usually included in the trend component, and the combination of the two is referred to as the trend-cycle. The cyclical fluctuations repeat themselves in four stages:

1. Peak
2. Recession
3. Trough/Depression
4. Expansion

Seasonal fluctuations are regular changes in observation values that increase or decrease at the same time points in successive years, months, days, or seasons. The wavelength is the time interval between the maximum points of two subsequent seasonal changes, and the wave intensity is the difference in height between the maximum and minimum points of a seasonal variation (Dodge, 2008).

Irregular variation is caused by events that are unrelated to specific conditions, cannot be foreseen in advance, and do not have a lasting effect. This type of variation can occur suddenly and is unpredictable. Time series are thought to be driven in part by white noise, which plays a significant role in both practice and theory (Dodge, 2008). Components of the time series are shown in Figure 5.
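The components described above can be illustrated with a small additive decomposition. The following is a minimal sketch on synthetic data, not the procedure or data of this thesis: the weekly period, the series length, and all function and variable names are assumptions made for illustration, and the cyclical component is treated as part of the trend (the trend-cycle).

```python
import math
import random

random.seed(0)
PERIOD = 7    # assumed weekly pattern, as is common in daily case counts
N = 140       # 20 weeks of synthetic daily observations

# Synthetic series = linear trend + weekly seasonal pattern + irregular noise
true_seasonal = [10 * math.sin(2 * math.pi * d / PERIOD) for d in range(PERIOD)]
series = [100 + 0.5 * t + true_seasonal[t % PERIOD] + random.gauss(0, 1) for t in range(N)]

def moving_average(x, window):
    """Centered moving average for an odd window; the ends are left as None."""
    half = window // 2
    out = [None] * len(x)
    for t in range(half, len(x) - half):
        out[t] = sum(x[t - half:t + half + 1]) / window
    return out

# Trend-cycle: averaging over one full period cancels the seasonal pattern
trend = moving_average(series, PERIOD)

# Seasonal component: mean detrended value for each position in the week
detrended = [(t % PERIOD, series[t] - trend[t]) for t in range(N) if trend[t] is not None]
seasonal = []
for d in range(PERIOD):
    values = [v for pos, v in detrended if pos == d]
    seasonal.append(sum(values) / len(values))
offset = sum(seasonal) / PERIOD
seasonal = [s - offset for s in seasonal]    # centre the effects so they sum to zero

# Irregular component: what remains after removing trend and seasonality
irregular = [series[t] - trend[t] - seasonal[t % PERIOD] for t in range(N) if trend[t] is not None]
```

The estimated `seasonal` values can be compared with `true_seasonal` to see how well the weekly pattern is recovered from the noisy series.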

Figure 5: Components of Time Series

4.3 Concept of Stationarity

It is critical in time series analysis that the series be stationary. Stationarity describes a process in which the mean and variance of the series do not vary with time, show no increasing trend, and do not bear the impact of time. If the variables are stationary, the effect of random shocks over a certain period is temporary rather than permanent. The variable is not stationary if a lasting effect is observed. In a stationary series, the mean and variance do not depend on time. As a result, time series that include patterns affecting their values at various periods, such as trend and seasonality, are not considered stationary (Baltagi, 2011). There are two forms of stationarity, strong and weak. Strong stationarity occurs when the distribution of a time series remains the same through time (Chatfield, 2000).

A time series $y_t$ is said to be strongly (strictly) stationary if the joint distribution of $(y_1, y_2, \dots, y_t)$ and $(y_{1+k}, y_{2+k}, \dots, y_{t+k})$ is the same for all t, where k is an arbitrary positive integer and $(1, 2, \dots, t)$ is the period. Strong stationarity, however, is hard to verify. Therefore, weak stationarity is used in practice (Cochrane, 1997).

The mean and variance of a weakly stationary time series are constant, and the covariance values do not change with time. The weak form of stationary time series can be described as:

$$\operatorname{cov}(X_{t_1}, X_{t_2}) = K_{XX}(t_1, t_2) = K_{XX}(t_2 - t_1, 0) = K_{XX}(k), \qquad k = t_2 - t_1$$

$$\operatorname{Var}(X_t) = \operatorname{cov}(X_t, X_t) = K_{XX}(t, t) = K_{XX}(0) = d$$

As a result, a weakly stationary time series varies with a constant variance around a fixed level. This property makes it easier to infer future values, which is why weak stationarity is so important for time series forecasting. Figure 6 shows diverse types of stationary time series (R. S. Tsay, 2010).
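A rough way to see what a constant mean and variance mean in practice is to compare the two halves of a series. The sketch below is an informal illustration on synthetic data, not a formal stationarity test such as the ADF test; the series, the seed, and the helper name `half_summary` are assumptions made for this example.

```python
import random
import statistics

random.seed(1)
n = 2000
white_noise = [random.gauss(0.0, 1.0) for _ in range(n)]   # stationary by construction

# A random walk is the running sum of the same shocks; its variance grows with time
random_walk, level = [], 0.0
for shock in white_noise:
    level += shock
    random_walk.append(level)

def half_summary(x):
    """Return (mean, variance) for the first and the second half of a series."""
    mid = len(x) // 2
    return [(statistics.fmean(part), statistics.pvariance(part)) for part in (x[:mid], x[mid:])]

wn_halves = half_summary(white_noise)
rw_halves = half_summary(random_walk)
print("white noise :", wn_halves)
print("random walk :", rw_halves)
```

For the stationary series, both halves should give a mean near 0 and a variance near 1, whereas the two halves of a random walk typically differ markedly.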

Figure 6: Different types of stationary time series.

4.4 White Noise

A purely random process is a discrete process $Z_t$ that consists of a collection of independent, identically distributed random variables. Such a completely random process is called white noise. Even though it is rarely encountered in applied time series, it is the foundation for more complicated time series processes. The notation $X_t = Z_t$, $Z_t \sim \mathrm{IID}(0, \sigma^2)$ is used to denote this process (Wei, 1994). The graph of the white noise process is shown below.

Figure 7: White Noise Process

The white noise process is stationary because the autocovariance function and mean do not depend on time. This process's autocovariance function, autocorrelation (ACF), and partial autocorrelation (PACF) are shown below.
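The behavior of the autocorrelation function of white noise can be checked numerically. The following minimal sketch simulates Gaussian white noise and computes its sample ACF; the sample size, the seed, and the function name `acf` are illustrative assumptions, and the band of ±1.96/√n is the usual large-sample approximation for white noise.

```python
import math
import random

def acf(x, max_lag):
    """Sample autocorrelations for lags 0..max_lag (autocovariances divided by lag-0 value)."""
    n = len(x)
    mean = sum(x) / n
    c0 = sum((v - mean) ** 2 for v in x) / n
    return [
        sum((x[t] - mean) * (x[t + k] - mean) for t in range(n - k)) / n / c0
        for k in range(max_lag + 1)
    ]

random.seed(42)
n = 2000
white_noise = [random.gauss(0.0, 1.0) for _ in range(n)]   # Z_t ~ IID(0, 1)

rho = acf(white_noise, 10)
band = 1.96 / math.sqrt(n)                                  # approximate 95% band
outside = [k for k in range(1, 11) if abs(rho[k]) > band]
print("lags outside the band:", outside)
```

The autocorrelation at lag 0 is 1 by definition, and for white noise nearly all other lags are expected to fall inside the band.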

Figure 8: ACF and PACF graphs of White Noise

4.5 Time Series Forecasting Methods

Many approaches have been developed for predicting future events from past and present observations. These approaches can be divided into two groups: univariate and multivariate time series methods.

Univariate time series methods forecast the future from the data of a single time-dependent variable. The purpose is to determine whether there is a pattern in the data. The most significant advantage of univariate analysis is that it produces better findings when the amount of data is small. On the other hand, univariate analyses may not be appropriate for complicated tasks.

Multivariate time series methods are used to predict and control the cause-effect relation between two or more time series. In multivariate methods, each time series depends on its own past values and on the past values of the other variables. One of the primary purposes of multivariate time series analysis is to determine the relationship between the components of the series. In some cases, although a multivariate series is not stationary, it can be made stationary with a suitable linear transformation.

4.5.1 ARIMA Model

The ARIMA model is one of the most popular models for the statistical analysis of time series data. It is a non-stationary linear stochastic model applied to non-stationary series that have been transformed into stationary ones. Yule's introduction of AR models in 1921 laid the groundwork for ARIMA models. Later, in 1927, Slutsky developed MA models; in 1954, Wold developed ARMA models, a combination of AR and MA. Box and Jenkins created ARIMA models.

The Box-Jenkins prediction method differs from other prediction models because it does not require prior knowledge of the time series structure or the general development trend. Furthermore, unlike other methods, which need the series to have a specific trend, the Box-Jenkins method can be applied to complex time series because these models have no such restriction (Abraham, 1983).

The general representation of the ARIMA model is ARIMA (p, d, q), where p and q are the orders of the autoregressive (AR) and moving average (MA) parts, respectively, and d is the order of non-seasonal differencing. The ARIMA model can be defined as follows:

$W_t = \phi_1 W_{t-1} + \phi_2 W_{t-2} + \dots + \phi_p W_{t-p} + a_t - \theta_1 a_{t-1} - \theta_2 a_{t-2} - \dots - \theta_q a_{t-q}$

The term $W_t$ is written in place of the term $Y_t$ of the ARMA model. The process $W_t$ is obtained from the non-stationary process $Y_t$ by taking the difference of order d, as shown below:

$\nabla^d Y_t = W_t$

$\nabla Y_t = W_t = Y_t - Y_{t-1} = (1 - B) Y_t$ (for d = 1)

$\nabla^d Y_t = W_t = (1 - B)^d Y_t$

Where,

$\nabla$ = Difference operator

$B$ = Backshift operator, $B Y_t = Y_{t-1}$

d = Degree of difference

$W_t$ = Differenced series
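The differencing step can be illustrated with a minimal sketch. It is written in Python for illustration only (the thesis itself refers to R functions), the helper name `difference` is hypothetical, and the input values are the first six German daily case counts listed in Table 8.

```python
def difference(series, d=1):
    """Apply the difference operator (1 - B)^d to a list of numbers."""
    w = list(series)
    for _ in range(d):
        # One pass of (1 - B): W_t = Y_t - Y_{t-1}; each pass shortens the series by one
        w = [w[t] - w[t - 1] for t in range(1, len(w))]
    return w


# First six GER_COV19 values from the merged dataset (Table 8)
y = [157, 186, 187, 142, 105, 347]
w1 = difference(y, d=1)  # first-order difference
w2 = difference(y, d=2)  # second-order difference
print(w1)
print(w2)
```

Each application of the operator removes one observation from the start of the series, which is why a series differenced d times has d fewer points than the original.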

ARIMA is a linear approach, which means that the predicted value of a variable is a linear function of past observations and past errors. As a result, the series supplied to ARIMA should be well described by a linear model and should be stationary after differencing. Real-world problems, on the other hand, are frequently nonlinear, so ARIMA may be insufficient for complicated nonlinear real-world problems.

4.5.2 VAR Model

Sims (1980) proposed the Vector Autoregressive (VAR) model, which can be used to represent the dynamics and interdependence of multivariate time series. The VAR model is a popular and easy-to-use model for analyzing several time series jointly. It produces reliable findings, particularly when describing and forecasting the dynamic movements of time series (Sims, 1980).

When using the VAR model, no distinction is made between endogenous (internal) and exogenous (external) variables. VAR analysis is an econometric approach to discovering the dynamic interactions between variables. It is used to build prediction models for connected time series.

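In its simplest form, the VAR model consists of two variables, $[y_{1t}, y_{2t}]$, and one lag. This VAR model can be defined as:

$\begin{pmatrix} y_{1t} \\ y_{2t} \end{pmatrix} = \begin{pmatrix} \beta_{10} \\ \beta_{20} \end{pmatrix} + \begin{pmatrix} \beta_{11} & \alpha_{11} \\ \beta_{21} & \alpha_{21} \end{pmatrix} \begin{pmatrix} y_{1,t-1} \\ y_{2,t-1} \end{pmatrix} + \begin{pmatrix} u_{1t} \\ u_{2t} \end{pmatrix}$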
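As a rough illustration of how a two-variable, one-lag VAR produces forecasts, the sketch below iterates the recursion with the error term set to zero. It is written in Python for illustration only; the function names and all numeric coefficients are hypothetical and are not estimates from the thesis.

```python
def var1_step(y_prev, intercept, coef):
    """One-step VAR(1) prediction: intercept + coef * y_prev (error term omitted)."""
    return [
        intercept[i] + sum(coef[i][j] * y_prev[j] for j in range(len(y_prev)))
        for i in range(len(intercept))
    ]


def var1_forecast(y_last, intercept, coef, steps):
    """Iterate the one-step prediction to obtain a multi-step forecast path."""
    path = []
    y = list(y_last)
    for _ in range(steps):
        y = var1_step(y, intercept, coef)
        path.append(y)
    return path


# Hypothetical intercepts and coefficient matrix (not taken from the thesis)
intercept = [1.0, 2.0]
coef = [[0.5, 0.1],
        [0.2, 0.3]]
path = var1_forecast([10.0, 20.0], intercept, coef, steps=2)
print(path)
```

The off-diagonal coefficients are what let the past of one series enter the forecast of the other, which is the interdependence the VAR model is meant to capture.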

The VAR model has some weaknesses. One weakness is that it is difficult to determine whether individual variables significantly affect the dependent variable. Another weak point is that all variables in the VAR model must be stationary. Finally, determining the optimum lag length is difficult. However, several methods, such as the information criteria and tests described in Chapter 5, can be used to address these difficulties.


Chapter 5. Model Selection Criteria

Many studies on model selection algorithms and criteria can be found. These include traditional model selection approaches as well as information-based model selection methods. Hypothesis tests are commonly used in traditional selection approaches. Model selection methods based on information criteria are an alternative to traditional methods. Akaike was a pioneer in statistical modeling and in statistical model identification and evaluation, and he is regarded as one of the earliest researchers on this subject (Akaike, 1973; Akaike, 1974).

The orders p and q cannot always be determined from the ACF and PACF graphs alone, because these graphs do not always point to a single model. If a suitable order is not selected, the results obtained from the model will not be reliable. When the order of the model is chosen smaller than it should be, the parameter estimates are not consistent. When it is chosen larger than it should be, the variance of the parameter estimates is inflated. In such circumstances, several information criteria are applied to select the model that best represents the series. These criteria are discussed further below.

5.1 Akaike Information Criterion (AIC)

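The AIC method chooses among many models, each with a different number of parameters. Each model is fitted by maximizing its likelihood function, and the criterion balances the maximized likelihood against the number of estimated parameters, so that a better fit is rewarded and additional parameters are penalized.

$\mathrm{AIC} = -2 \log L(\hat{\theta}) + 2k$

Here $L(\hat{\theta})$ is the maximized likelihood and k is the number of estimated parameters.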

The AIC statistic is defined in several ways in the literature. Although there is no single definition, the different versions lead to the same findings because they all depend on the parameter values estimated by maximum likelihood. When selecting from many models with varying numbers of parameters, the AIC value for each model is calculated, and the appropriate model orders are established by selecting the model with the smallest value (Akaike, 1973; Akaike, 1974).
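The AIC formula can be checked against figures reported later in this thesis. The sketch below, in Python for illustration only, uses the log likelihood reported for the ARIMA (6,1,3) model (-8480.84) and assumes k = 10, the Df value listed for that model in Table 7; the function name `aic` is hypothetical.

```python
def aic(log_likelihood, k):
    """Akaike Information Criterion: -2 * log-likelihood + 2 * number of parameters."""
    return -2.0 * log_likelihood + 2.0 * k


# Reported log likelihood of ARIMA (6,1,3); k = 10 is assumed from the Df column of Table 7
value = aic(-8480.84, 10)
print(round(value, 2))
```

The result agrees with the reported AIC of 16981.69 up to the rounding of the published log likelihood.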

5.2 Bayesian Information Criterion (BIC)

The order p is thought to be overestimated by the Akaike Information Criterion. Another technique, the Bayesian Information Criterion (BIC), can be used to control the AIC's tendency toward overfitting (Brockwell, 1991).

$\mathrm{BIC} = n \ln(\hat{\sigma}^2) + k \ln(n)$

Here n is the number of observations, $\hat{\sigma}^2$ is the estimated error variance, and k is the number of parameters. When selecting among many models with varying numbers of parameters, the BIC value for each model is calculated, and the appropriate model orders are established by selecting the model with the smallest value. Although BIC statistics are defined differently in various sources, they lead to the same model orders being chosen since they are all based on maximum likelihood estimates.

5.3 Final Prediction Error (FPE)

Another criterion used when determining the model order is the Final Prediction Error (FPE) statistic. The FPE statistic is used to select the order p of the autoregression.

$\mathrm{FPE} = \hat{\sigma}^2 \, \frac{n + p}{n - p}$

The order of the model is the value of p that minimizes the FPE statistic. Furthermore, because $n\hat{\sigma}^2 / \sigma^2$ asymptotically follows a chi-square distribution with (n - p) degrees of freedom, the model's applicability to the autoregressive process may be evaluated by comparing the FPE value to the chi-square table value (Brockwell, 1991).


5.4 Schwarz Information Criterion (SIC)

The SIC values are computed for candidate models with different numbers of parameters that are considered to represent the time series appropriately, and the model with the smallest value is chosen. The relevant orders are determined in this way. The SIC formula is defined as follows, where k is the number of parameters in the model (Wei, 1994).

$\mathrm{SIC} = n \ln \hat{\sigma}_z^2 + k \ln n$

5.5 Hannan-Rissanen Criterion (HRC)

Another model selection method was proposed by Hannan and Rissanen (1982). The process, observed over n time points, is assumed to be stationary or to have been differenced until it becomes stationary. This criterion produces consistent estimators of the orders of autoregressive-moving average processes. Although this method has been observed to generally produce satisfactory results, it is recommended that it be used in conjunction with the sample autocorrelation and partial autocorrelation functions rather than relying solely on the model chosen by the Hannan and Rissanen criterion (Granger, 1986).

$\mathrm{HRC} = \log \hat{\sigma}_z^2 + (p + q) \frac{\log n}{n}$

The model is estimated by least squares for combinations of (p, q) values, and $\hat{\sigma}_z^2$ is taken as the maximum likelihood estimate of the error variance.

5.6 Hannan-Quinn Information Criterion

Hannan and Quinn proposed a model selection criterion that adjusts the penalty factor using an iterated logarithm. Although the penalty factor grows more slowly than in the BIC, the criterion can still be regarded as a consistent order selection criterion. In the formula, n is the sample size, $\hat{\sigma}_z^2$ is the maximum likelihood estimate of $\sigma_z^2$, and k is the number of parameters in the model (Hannan, 1979).

$\mathrm{HIC} = n \log(\hat{\sigma}_z^2) + 2k \log \log(n)$
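The variance-based criteria in this chapter differ only in their penalty terms, which a short sketch can make concrete. The code is in Python for illustration only; the function names, the sample size and the residual variances of the two candidate models are hypothetical and are not results from the thesis.

```python
import math


def bic(sigma2, n, k):
    """BIC (identical in form to SIC): n * ln(sigma^2) + k * ln(n)."""
    return n * math.log(sigma2) + k * math.log(n)


def fpe(sigma2, n, p):
    """Final Prediction Error: sigma^2 * (n + p) / (n - p)."""
    return sigma2 * (n + p) / (n - p)


def hrc(sigma2, n, p, q):
    """Hannan-Rissanen criterion: ln(sigma^2) + (p + q) * ln(n) / n."""
    return math.log(sigma2) + (p + q) * math.log(n) / n


def hic(sigma2, n, k):
    """Hannan-Quinn criterion: n * ln(sigma^2) + 2k * ln(ln(n))."""
    return n * math.log(sigma2) + 2 * k * math.log(math.log(n))


# Hypothetical candidates: name -> (residual variance, number of parameters)
n = 100
candidates = {"AR(1)": (1.05, 1), "AR(5)": (1.04, 5)}
scores = {name: bic(s2, n, k) for name, (s2, k) in candidates.items()}
best = min(scores, key=scores.get)
print(best, scores)
```

In this made-up case the larger model reduces the residual variance only slightly, so the penalty term dominates and the smaller model is selected.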

5.7 Techniques to Use in Time Series Analysis

5.7.1 Dickey-Fuller Test (ADF)

The Augmented Dickey-Fuller (ADF) test is a commonly used technique to check for a unit root. The ADF test can detect unit roots even if there is autocorrelation. The null hypothesis of the ADF test is that the series has a unit root, and the alternative hypothesis is that the time series is stationary.

The ADF-test allows us to incorporate elements such as a constant, a trend, or both a constant and a trend (Dickey, 1979).

$\Delta Y_t = \alpha + \gamma t + \delta Y_{t-1} + \sum_{i=1}^{p} \beta_i \Delta Y_{t-i} + \varepsilon_t$

In this specification, the ADF equation includes a constant α and a trend term γt. As previously stated, the null hypothesis is that there is a unit root, that is, δ = 0. To put it another way, under the null hypothesis the coefficient on $Y_{t-1}$ in the levels equation equals 1, and the time series is non-stationary.

Knowing the number of lagged periods to be used before performing the ADF test is essential. There are numerous methods for determining the lag length. One method is to run the ADF test with different lag lengths, beginning with a high number and decreasing it until the last included lag is statistically significant. Furthermore, it is critical to note that the fundamental issue with unit root tests is their limited power. In other words, their capacity to reject a null hypothesis when it is wrong is low.
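To make the test regression concrete, the sketch below computes the t-ratio of δ for the simplest Dickey-Fuller case: a constant, no trend and no lagged differences (p = 0). It is a simplified illustration in Python rather than the full ADF procedure used in the thesis; the function name and the simulated series are hypothetical, and -2.86 is the 5% critical value for tau2 reported in Table 9.

```python
import math
import random


def dickey_fuller_tau(series):
    """t-ratio of delta in the regression dY_t = alpha + delta * Y_{t-1} + e_t."""
    x = series[:-1]                                              # Y_{t-1}
    dy = [series[t] - series[t - 1] for t in range(1, len(series))]  # dY_t
    n = len(x)
    x_bar = sum(x) / n
    dy_bar = sum(dy) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (di - dy_bar) for xi, di in zip(x, dy))
    delta = sxy / sxx                      # OLS slope
    alpha = dy_bar - delta * x_bar         # OLS intercept
    rss = sum((di - alpha - delta * xi) ** 2 for xi, di in zip(x, dy))
    se_delta = math.sqrt(rss / (n - 2) / sxx)
    return delta / se_delta


random.seed(0)
noise = [random.gauss(0, 1) for _ in range(300)]   # stationary series
walk, level = [], 0.0
for e in noise:                                     # random walk built from the same shocks
    level += e
    walk.append(level)

tau_noise = dickey_fuller_tau(noise)
tau_walk = dickey_fuller_tau(walk)
print(tau_noise < -2.86, tau_noise, tau_walk)
```

The test is left-tailed: a statistic far below the critical value, as for the white noise series, rejects the unit root, while a random walk typically yields a statistic much closer to zero.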

5.7.2 Granger Causality Test

Earlier studies investigating the link between two variables used the Granger test to evaluate Granger non-causality by estimating a VAR model in levels. These studies did not consider series features such as stationarity and cointegration. If the series is not stationary, this test procedure is invalid. More specifically, under the null hypothesis, the Wald test statistic does not follow the standard asymptotic chi-square distribution, which prevents meaningful inference on Granger causality (Asafu-Adjaye, 2000).

Toda and Yamamoto introduced an approach that is extensively used to assess Granger causality between non-stationary series. The proposed solution adds the maximum order of integration of the series as extra lags to the VAR model. The success of the Toda-Yamamoto (TY) test depends on the correct determination of the lag length of the VAR model and the order of integration of the series. Furthermore, because the Wald statistic converges to the $\chi^2$ distribution only asymptotically, the power and significance level of the test are affected in small samples (Mavrotas, 2001).

5.7.3 The Portmanteau Test

The portmanteau test, also known as the Q test and established by Box and Pierce (1970), is one of the methods used to determine whether a group of autocorrelation coefficients of a time series is significantly different from zero.

$Q = n \sum_{k=1}^{M} r_k^2(\hat{z})$

In the formula, n represents the sample size, M represents the lag length, and $r_k(\hat{z})$ is the lag-k autocorrelation of the residuals. If the model is appropriate, Q approximately follows a $\chi^2$ distribution with (M-p-q) degrees of freedom, $\chi^2(M-p-q)$. The portmanteau statistic will be inflated if the model is unsuitable.

H0: The model is appropriate

H1: The model is not appropriate


If Q is less than the critical value of $\chi^2(M-p-q)$, the H0 hypothesis cannot be rejected, which means that the model is appropriate.

The portmanteau statistic defined by Box and Pierce (1970) converges poorly to the $\chi^2$ distribution when the sample size is small. As an alternative, Ljung and Box defined a modified portmanteau test (Godfrey, 1979).

$Q = n(n+2) \sum_{k=1}^{M} (n - k)^{-1} r_k^2(\hat{z})$

If Q is greater than the critical value of $\chi^2(M-p-q)$, H0 will be rejected, which means the model is not appropriate.
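Both portmanteau statistics can be computed directly from the sample autocorrelations, as the following sketch shows. It is written in Python for illustration only; the function names are hypothetical and the input is an artificial, strongly autocorrelated residual series rather than residuals from the thesis models. The value 3.841 is the standard 5% critical value of the chi-square distribution with one degree of freedom.

```python
def autocorr(x, k):
    """Sample autocorrelation of x at lag k."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    num = sum((x[t] - mean) * (x[t - k] - mean) for t in range(k, n))
    return num / denom


def box_pierce(x, m):
    """Box-Pierce statistic: n * sum of squared autocorrelations up to lag m."""
    n = len(x)
    return n * sum(autocorr(x, k) ** 2 for k in range(1, m + 1))


def ljung_box(x, m):
    """Ljung-Box statistic: n(n+2) * sum of r_k^2 / (n - k) up to lag m."""
    n = len(x)
    return n * (n + 2) * sum(autocorr(x, k) ** 2 / (n - k) for k in range(1, m + 1))


# Artificial residuals that alternate in sign, so the lag-1 autocorrelation is close to -1
residuals = [1.0, -1.0] * 50
q_bp = box_pierce(residuals, 1)
q_lb = ljung_box(residuals, 1)

# With M = 1 and p = q = 0 the reference distribution has 1 degree of freedom
reject_h0 = q_lb > 3.841
print(q_bp, q_lb, reject_h0)
```

Because the factor (n+2)/(n-k) is always greater than one, the Ljung-Box value is slightly larger than the Box-Pierce value for the same residuals, which is the small-sample correction described above.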


Chapter 6. Developing the Models to Determine the Correlation of Covid-19 Cases Between Selected Countries

6.1 Strategy to Develop ARIMA Model

The ARIMA model is applied only to the Covid-19 cases in Germany. Time series can be either stationary or non-stationary, and ARIMA models require stationary time series; the ACF and PACF likewise assume that the data are stationary. This study uses the ACF and PACF to obtain information about the p and q parameters. The Ljung-Box test was applied as the white noise test. The candidate models were compared using the AIC and BIC functions, and the best ARIMA model was selected for the time series analysis.

6.2 Strategy to Develop VAR Model

The VAR model is applied to investigate the impact between the selected countries. To develop the VAR model, all datasets are merged using the merge.xts() function. In this section, several tests are applied to investigate the impact and causality between the countries. Firstly, the ADF test was applied to check the stationarity of the time series data. After the ADF test, the Granger causality test was applied to the confirmed cases in Germany (GER_COV19) and Poland (POL_COV19) with 3 and 4 lags. The optimum lag length was determined using several criteria: Final Prediction Error (FPE), Akaike information criterion (AIC), Schwarz information criterion (SC), and Hannan-Quinn information criterion (HQ). The model was estimated with 7 lags and with 13 lags for both series. The Portmanteau test and the AIC and BIC functions were then applied, and the best VAR model was selected according to the AIC and BIC results.


Chapter 7. Results

7.1 ARIMA Model Results

Model selection is an essential element of the modeling process in time series analysis. AIC and BIC are two popular model selection criteria. To find the best model, the Arima, best.AIC and Arima, best.BIC functions will be used. This study will use the Akaike Information Criterion (AIC) first.

This function estimates the model with all possible combinations of model orders but does not remove intermediate lags. After choosing the proper ARIMA model, precise estimates were sought using the Box and Jenkins approach. The Box and Jenkins methodology is based on the concept that previous events impact future events. ARIMA models are a form of the Box-Jenkins methodology.

Figure 9: The original and differenced series

Two of the most important statistical measures used in estimation are the ACF and the PACF. The ACF measures the correlation between a univariate time series and its own lagged values, while the PACF measures the correlation at a given lag after the effect of the intermediate lags has been removed. ACF and PACF coefficients are not reliable for non-stationary data. As a result, if the series is not stationary, it must be converted to a stationary series using the proper differencing procedure. Information about the p and q parameters can be obtained by drawing the ACF and PACF graphs. The autocorrelation function (ACF) and partial autocorrelation function (PACF) were computed up to the 36th lag.

Figure 10: ACF Graph


Figure 11: PACF Graph

ARIMA (7,1,5) could be a sensible model without lags 3 and 4.

7.1.1 ARIMA (6,1,3) Model

The regression results for the ARIMA (6,1,3) model are shown in Table 3.

Variables   Coefficients   S.E. of the regression
Ar1         -0.7922        0.0312
Ar2         -0.9404        0.0231
Ar3         -0.8523        0.0267
Ar4         -0.8528        0.0268
Ar5         -0.8968        0.0204
Ar6         -0.7321        0.0286
Ma1          0.7454        0.0454
Ma2          0.3949        0.0474
Ma3          0.4115        0.0300

Table 3: Regression analysis Results

In this regression analysis, Sigma^2 is estimated at 70931559, with log likelihood = -8480.84, AIC = 16981.69, AICc = 16981.96, and BIC = 17028.67 (See Appendix 2).

7.1.1.1 White Noise Test for ARIMA (6,1,3) Model

Ljung-Box methodology was used for the White Noise test for 10, 20, and 30 lags.

            Lag=10                  Lag=20                  Lag=30
X-squared   200.77                  265.57                  290.67
p-values    <0.00000000000000022    <0.00000000000000022    <0.00000000000000022

Table 4: Ljung-Box test for ARIMA (6,1,3) Model

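According to the table, all p-values are far below 0.05, so the null hypothesis of no autocorrelation in the residuals is rejected at the 5% level. Therefore, the residuals of the ARIMA (6,1,3) model are not white noise, and alternative models are considered below.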

7.1.2 ARIMA (7,1,5) Model

In this section, the ARIMA (6,1,3) model will be compared with the ARIMA (7,1,5) model. The z-test of coefficients for the ARIMA (7,1,5) model is shown in Appendix 3.

7.1.2.1 White Noise Test for ARIMA (7,1,5) Model

Ljung-Box methodology was used for the White Noise test for 10 and 30 lags.

            Lag=10     Lag=30
X-squared   20.306     93.638
p-values    0.02649    0.00000001826

Table 5: Ljung-Box test for ARIMA (7,1,5) Model

7.1.3 ARIMA (7,1,5, s7) Model

       Estimate     Std. Error   z value    Pr(>|z|)
Ar1     0.3984288   0.1200796     3.3180    0.0009065 ***
Ar2    -0.3153338   0.0961161    -3.2808    0.0010353 **
Ar3    -0.0156597   0.0805020    -0.1945    0.8457641
Ar4    -0.1201007   0.0764995    -1.5700    0.1164256
Ar5     0.0018877   0.0656229     0.0288    0.9770510
Ar6     0.1532682   0.0552267     2.7753    0.0055158 **
Ar7     0.3686493   0.0799743     4.6096    0.000004034 ***
Ma1    -0.4973412   0.1235626    -4.0250    0.000056972 ***
Ma2    -0.1417251   0.1050871    -1.3486    0.1774515
Ma3     0.3501919   0.0968525     3.6157    0.0002995 ***
Ma4    -0.2653412   0.1003724    -2.6436    0.0082037 **
Ma5     0.0789868   0.0665927     1.1861    0.2355758
Sma1   -0.8139067   0.0402969   -20.1977    < 0.00000000000000022 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Table 6: Z-test of coefficients for ARIMA (7,1,5, s7) Model.


7.1.4 Model Comparison

This section will compare all models based on AIC and BIC functions to find the best model.

Models Df AIC BIC

ARIMA (6,1,3) 10 16981.69 17028.67

ARIMA (7,1,5) 13 16861.93 16923.00

ARIMA (7,1,5, s7) 14 16676.92 16742.58

Table 7: Comparison of ARIMA (6,1,3), ARIMA (7,1,5) and ARIMA (7,1,5, s7) Models.

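According to the table, the ARIMA (7,1,5, s7) model is the most suitable because it has the lowest information criterion (IC) values, both AIC and BIC. In addition, most of its terms, including the seasonal moving average term, are significant (Table 6), and its residuals are treated as white noise.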
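The comparison in Table 7 amounts to picking the model with the smallest criterion value. A minimal sketch, in Python for illustration only, with the figures copied from Table 7 and hypothetical variable names:

```python
# AIC and BIC values copied from Table 7
models = {
    "ARIMA (6,1,3)": {"df": 10, "aic": 16981.69, "bic": 17028.67},
    "ARIMA (7,1,5)": {"df": 13, "aic": 16861.93, "bic": 16923.00},
    "ARIMA (7,1,5, s7)": {"df": 14, "aic": 16676.92, "bic": 16742.58},
}

# The preferred model is the one with the smallest value of each criterion
best_by_aic = min(models, key=lambda name: models[name]["aic"])
best_by_bic = min(models, key=lambda name: models[name]["bic"])
print(best_by_aic, best_by_bic)
```

Both criteria point to the same model here, so the stronger parameter penalty of BIC does not change the ranking.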

7.2 VAR Model

To develop the VAR model, all datasets need to be merged using the merge.xts() function. The first six rows of the merged dataset are shown below.

Date FRA_COV19 GER_COV19 POL_COV19

2020-03-04 73 157 1

2020-03-05 138 186 0

2020-03-06 190 187 4

2020-03-07 336 142 1

2020-03-08 177 105 5

2020-03-09 286 347 6

Table 8: The Merged Dataset


Figure 12: Confirmed Covid-19 Cases in Germany, France, and Poland.

Figure 13: Plotting Confirmed Covid-19 Cases in Germany, France, and Poland Separately.


These graphs suggest that the three countries influence one another, since the rise and fall periods of the Covid-19 pandemic largely coincide. The effect between these countries will be investigated in this section using the Granger Causality Test.

7.2.1 ADF Test

The ADF test was applied to the dataset from France. The value of the first test statistic (tau2) is -2.0313, and the value of the second test statistic (phi1) is 26.6692. Critical values for the test statistics can be seen in the table.

1pct 5pct 10pct

tau2 -3.43 -2.86 -2.57

phi1 6.43 4.59 3.78

Table 9: Critical values for the test statistics for the FRA_COV19 series.

The tau2 term corresponds to the null hypothesis δ = 0. The phi1 term corresponds to a combined (joint) null hypothesis in which c and δ are both equal to 0. According to the table, the 5% critical value for tau2 is -2.86. The value of the test statistic, -2.0313, is higher than this critical value. Therefore, the null hypothesis cannot be rejected. It can be presumed that there is a unit root. In other words, the variable is non-stationary.

First-order differencing can be used to remove non-stationarity. Therefore, the ADF test was repeated on the first-differenced series. After first-order differencing, the value of the test statistic is -7.3033. Critical values for the test statistics can be seen in Table 9. The value of the test statistic is lower than the 5% critical value, which is -2.86. Therefore, the null hypothesis is rejected. In addition, the joint hypothesis is also rejected because the second test statistic is higher than its critical value, which is 4.59. After first-order differencing, the variable became stationary.
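The decision rule used in this section is left-tailed: the unit root null is rejected only when the test statistic falls below the critical value. A minimal sketch, in Python for illustration only, using the statistics and the 5% critical value reported above; the function name is hypothetical.

```python
def rejects_unit_root(test_statistic, critical_value):
    """Left-tailed ADF decision: reject the unit root null if the statistic is below the critical value."""
    return test_statistic < critical_value


CRITICAL_5PCT = -2.86  # tau2 critical value at the 5% level (Table 9)

level_result = rejects_unit_root(-2.0313, CRITICAL_5PCT)       # FRA_COV19 in levels
differenced_result = rejects_unit_root(-7.3033, CRITICAL_5PCT)  # after first-order differencing
print(level_result, differenced_result)
```

The series in levels does not reject the null, whereas the differenced series does, which matches the conclusion that one round of differencing is sufficient.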


7.2.2 Granger Causality Test

GER_COV19 is selected as the dependent variable to determine whether POL_COV19 is a Granger cause of GER_COV19. To perform the Granger Causality Test, the grangertest() function was used. Firstly, the Granger causality test is done for 3 lags.

Model 1: GER_COV19 ~ Lags(GER_COV19, 1:3) + Lags(POL_COV19, 1:3)

Model 2: GER_COV19 ~ Lags(GER_COV19, 1:3)

Model 1 is the unrestricted model, which includes the lags of POL_COV19, and Model 2 is the restricted model, which excludes them.

Res.Df Df F Pr(>F)

1 802

2 805 -3 0.6137 0.6063

Table 10: Granger-Causality Test Results with 3 lags (GER_COV19 as the dependent)

The p-value is 0.6063, which is greater than 0.05. Hence, the null hypothesis cannot be rejected. Thus, with 3 lags, Covid-19 cases in Poland are not found to Granger-cause the Covid-19 cases in Germany.
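The F statistic that compares a restricted and an unrestricted regression can be written in terms of their residual sums of squares. The sketch below, in Python for illustration only, assumes the usual F-test for linear restrictions; the function name and the residual sums of squares are hypothetical, and only the degrees of freedom (3 restrictions, 802 residual degrees of freedom) are taken from Table 10.

```python
def granger_f(rss_restricted, rss_unrestricted, num_restrictions, df_unrestricted):
    """F statistic comparing a restricted model with an unrestricted one."""
    numerator = (rss_restricted - rss_unrestricted) / num_restrictions
    denominator = rss_unrestricted / df_unrestricted
    return numerator / denominator


# Hypothetical residual sums of squares; degrees of freedom as in Table 10
f_stat = granger_f(rss_restricted=1010.0, rss_unrestricted=1000.0,
                   num_restrictions=3, df_unrestricted=802)
print(round(f_stat, 4))
```

A small F value means that adding the lags of the other series barely reduces the residual sum of squares, which is the situation in which the null hypothesis of no Granger causality is retained.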

The Granger-Causality test was repeated for 4 lags.

Model 1: GER_COV19 ~ Lags(GER_COV19, 1:4) + Lags(POL_COV19, 1:4)

Model 2: GER_COV19 ~ Lags(GER_COV19, 1:4)

Res.Df Df F Pr(>F)

1 799

2 803 -4 24.58 < 0.00000000000000022 ***


Table 11: Granger-Causality Test Results with 4 lags (GER_COV19 as the dependent)

When the test was repeated with 4 lags, the p-value was less than 0.05. Hence, the null hypothesis can be rejected, and POL_COV19 is found to be a Granger cause of the GER_COV19 series.

The Granger-Causality test is conducted for the case where POL_COV19 is the dependent variable with 3 lags.

Model 1: POL_COV19 ~ Lags(POL_COV19, 1:3) + Lags(GER_COV19, 1:3)

Model 2: POL_COV19 ~ Lags(POL_COV19, 1:3)

Res.Df Df F Pr(>F)

1 802

2 805 -3 67.606 < 0.00000000000000022 ***

Table 12: Granger-Causality Test Results with 3 lags (POL_COV19 as a dependent)

The Granger-Causality test is conducted for the case where POL_COV19 is the dependent variable with 4 lags.

Model 1: POL_COV19 ~ Lags(POL_COV19, 1:4) + Lags(GER_COV19, 1:4)

Model 2: POL_COV19 ~ Lags(POL_COV19, 1:4)

Res.Df Df F Pr(>F)

1 799

2 803 -4 53.642 < 0.00000000000000022 ***

Table 13: Granger-Causality Test Results with 4 lags (POL_COV19 as a dependent)


In this case, the null hypothesis was rejected. Therefore, it is understood that GER_COV19 is a Granger cause of POL_COV19, both with lags 3 and 4.

7.3 VAR model for GER_COV19 and POL_COV19 series

Covid-19 cases in Germany and Poland were selected for the VAR model because of the bi-directional feedback between the two series and the common border between the two countries.

Figure 14: Covid-19 Cases in Germany and Poland.

Table 17 (See Appendix 4) shows the results of lag order selection criteria for the model. The optimum lag length is determined using Final Prediction Error (FPE), Akaike information criterion (AIC), Schwarz information criterion (SC), and Hannan-Quinn information criterion (HQ). All the criteria suggest order 13.

7.4 Estimating VAR Model

7.4.1 VAR Estimation with 7 Lags

The model will be estimated with 7 lags without seasonal dummies, for GER_COV19 and POL_COV19 (See Appendix 5 and 6).

The portmanteau test is used to assess the quality of the fitted model by determining if the residuals are approximately white noise.


Data Residuals of VAR object GER.POL.var7

Chi-squared 914.27

df 36

p-value < 0.00000000000000022

Table 14: Portmanteau Test (asymptotic) for GER.POL.var7

The table reports the result of the Portmanteau Test (asymptotic) for GER.POL.var7. It can be understood that the p-value is less than 0.05, which results in rejecting the null hypothesis of no autocorrelation.

After estimating the model with 7 lags, the model also will be estimated with 13 lags (See Appendix 7 and 8).

The portmanteau test is used to assess the quality of the fitted model by determining if the residuals are approximately white noise.

Data Residuals of VAR object GER.POL.var13

Chi-squared 54.617

df 12

p-value 0.0000002119

Table 15: Portmanteau Test (asymptotic) for GER.POL.var13

Table 15 reports the result of the Portmanteau Test (asymptotic) for GER.POL.var13. It can be understood that the p-value is 0.0000002119, which is less than 0.05. The null hypothesis of no autocorrelation is rejected.


The VAR (13) model will be re-estimated with a seasonal period of 7 for GER_COV19 and POL_COV19 (See Appendix 9 and 10).

The portmanteau test is performed to verify the null hypothesis of no autocorrelation of residuals.

Data Residuals of VAR object GER.POL.var13_s7

Chi-squared 52.348

df 12

p-value 0.0000005378

Table 16: Portmanteau Test (asymptotic) for GER.POL.var13_s7.

Models df AIC BIC

GER.POL.var7 30 31718.26 31858.98

GER.POL.var13 54 30813.84 31066.74

GER.POL.var13_s7 66 30806.28 31115.39

Table 17: Comparison of GER.POL.var7, GER.POL.var13 and GER.POL.var13_s7 Models.

The VAR models were estimated in the following order: VAR (7), VAR (13), and VAR (13, season = 7). According to Table 17, BIC prefers GER.POL.var13, while AIC is marginally lower for GER.POL.var13_s7, which uses 12 more parameters. GER.POL.var13 will be used as the final model.
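The figures in Table 17 can be ranked mechanically to see how the two criteria treat the candidate VAR models. A minimal sketch, in Python for illustration only, with the values copied from Table 17 and hypothetical variable names:

```python
# (df, AIC, BIC) values copied from Table 17
var_models = {
    "GER.POL.var7": (30, 31718.26, 31858.98),
    "GER.POL.var13": (54, 30813.84, 31066.74),
    "GER.POL.var13_s7": (66, 30806.28, 31115.39),
}

best_by_aic = min(var_models, key=lambda name: var_models[name][1])
best_by_bic = min(var_models, key=lambda name: var_models[name][2])

# Gap between the two 13-lag models under each criterion
aic_gap = var_models["GER.POL.var13"][1] - var_models["GER.POL.var13_s7"][1]
bic_gap = var_models["GER.POL.var13_s7"][2] - var_models["GER.POL.var13"][2]
print(best_by_aic, best_by_bic, round(aic_gap, 2), round(bic_gap, 2))
```

The AIC difference between the two 13-lag models is small compared with the BIC difference in favor of the model without seasonal dummies, which reflects the heavier penalty BIC places on the 12 additional parameters.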


7.5 Forecasting with VAR

Figure 15: 13-day forecast of Covid-19 cases in Germany using final model (GER.POL.var13)

Figure 16: 13-day forecast of Covid-19 cases in Poland using final model (GER.POL.var13)


Chapter 8. Discussion

This thesis uses the Box-Jenkins modeling method to analyze the impact of Covid-19 cases on neighboring countries in order to help the authorities take precautions to avoid more future cases. The dataset used in this thesis consists of Covid-19 cases between March 2020 and May 2022. ARIMA and VAR models were applied. The ARIMA model is univariate and used only the cases from Germany, while the VAR model is multivariate. The dependent variable is the recorded cases from Germany, and the independent variables are the recorded cases from Poland and France.

For the ARIMA model, all possible orders were estimated. The result of this estimation shows that ARIMA (6,1,3) can be selected among the models to be compared. In addition, ARIMA (7,1,5) can be a sensible model. White noise tests were applied to both candidate models at lags of up to 30. ARIMA (6,1,3), ARIMA (7,1,5) and ARIMA (7,1,5, s7) were then compared. The results suggest that ARIMA (7,1,5, s7) is the best model, with the lowest information criteria values and mostly significant terms. It is also concluded that these countries influence one another when the cases are rising and falling.

Furthermore, a VAR-based forecasting model was developed. Firstly, the ADF test, which checks whether there is a unit root, was applied to the dataset. When the ADF test was performed for FRA_COV19, the tau2 value was calculated as -2.0313. Since this is higher than the 5% critical value of -2.86, the null hypothesis of non-stationarity cannot be rejected. Non-stationarity can be removed by first-order differencing. Since the FRA_COV19 series was found to be non-stationary, first-order differencing was applied. As a result, the series became stationary.

After performing the ADF test, the Granger Causality test was performed. It was checked whether POL_COV19 is a Granger cause of GER_COV19 and vice versa. Lag lengths of 3 and 4 were applied in both directions. First, the analysis was done with 3 lags to check whether POL_COV19 is a Granger cause of GER_COV19. The result shows that, at this lag length, Poland does not affect the number of cases in Germany. However, when the analysis was repeated with 4 lags, Poland was found to be a Granger cause for Germany. The same steps were applied to check whether GER_COV19 is a Granger cause of POL_COV19. The results suggest that Germany is a Granger cause for Poland for both 3 and 4 lags. It is concluded that these data are non-stationary and co-integrated. The Granger Causality test reveals that there is bi-directional feedback at the 5% level.

Finally, the VAR model was developed to investigate Covid-19 cases in Germany and Poland because of the bi-directional feedback between the series. The optimum lag length should be selected first to develop the VAR model. It was selected using lag order selection criteria: Final Prediction Error (FPE), Akaike information criterion (AIC), Schwarz information criterion (SC), and Hannan-Quinn information criterion (HQ). According to these criteria, the optimum lag length was determined as 13.

Firstly, the model was estimated with 7 lags without seasonal dummies for GER_COV19 and POL_COV19. After that, it was estimated with 13 lags without seasonal dummies for both series. The VAR (13) model was then re-estimated with a seasonal period of 7. Based on the AIC and BIC comparison, GER.POL.var13 was chosen as the final model.


Chapter 9. Conclusions

9.1 Conclusion

Covid-19, which was first reported on December 31, 2019, in Wuhan, Hubei Province of China, and was declared a pandemic by the World Health Organization on March 11, 2020, affected the entire world. Covid-19 turned into a pandemic that is not limited to a particular region and that also affects developed countries. In the face of this unexpected pandemic, some people sought refuge in conspiracy theories, while others doubted that such a condition could exist. In reality, this relaxed attitude was evident both among civilians and in the government decision-making system. Years ago, it was predicted that more pandemics would emerge in a globalizing world and that these diseases would afflict all countries equally. Even diseases that were supposed to have been eradicated long ago continue to endanger people's lives in various locations by resurfacing in modern settings. Forecasting helps government agencies and policymakers address the issue by planning for healthcare staff and equipment demands.

In this study, we investigated the performance of two different time series analysis methods and applied them to forecast Covid-19 cases in the selected countries. The collected data cover the period between March 2020 and May 2022. Among the candidate VAR models, the GER.POL.var13 model performed best in predicting the case numbers and trend of Covid-19 in the selected countries. We conclude that the VAR model can be used as a time series forecasting method to predict the future trend of the pandemic. Thus, it can help authorities take the necessary precautions to keep the pandemic under control and protect people from infection.

9.2 Future Work

In this thesis, only two time series forecasting models were used. There are more forecasting models that can be used to predict future events. In future work, other time series models can be compared with the models found in the literature.


The dataset used in this study includes only Covid-19 cases in the selected countries. Data on deaths and recoveries could help us to develop a better model.

In recent years, more hybrid models have combined statistical and machine learning models. Such hybrid models can be used to enhance the performance of forecasting models in future work.


REFERENCES

Abraham, B. L. (1983). Statistical Methods for Forecasting. John Wiley and Sons, New York.

Ahmed, R. a. (2021). Does high COVID-19 spread impact neighbouring countries? Quasi-experimental evidence from the first year of the pandemic in Ireland [version 2; peer review: 2 approved]. HRB Open Res, 4:56.

Akaike, H. (1973). Information theory and an extension of the maximum likelihood principle. 2nd International Symposium on Information Theory. Budapest: Akademiai Kiado, 267-281.

Akaike, H. (1974). A new look at the statistical model identification. IEEE Transactions on Automatic Control, 719-723.

Asafu-Adjaye, J. (2000). The relationship between energy consumption, energy prices and economic growth: time series evidence from Asian developing countries. Energy Economics, 615-625.

Baltagi, B. H. (2011). Econometrics. Berlin: Heidelberg: Springer Berlin Heidelberg.

Barman, A. (2020). Time Series Analysis and Forecasting of COVID-19 Cases Using LSTM. arXiv.Org. https://arxiv.org/abs/2006.13852.

Brockwell, P. J. (1991). Time Series: Theory and Methods. New York: Springer-Verlag.

Brozak, S. J. (2021). Dynamics of covid-19 pandemic in India and Pakistan: A metapopulation modelling approach. Infectious Disease Modelling.

Chatfield, C. (2000). Time-Series Forecasting. Department of Mathematical Sciences, University of Bath, UK.

Cochrane, J. H. (1997). Time Series for Macroeconomics and Finance. Graduate School of Business, University of Chicago.

Dean, K. R. (2018). Human ectoparasites and the spread of plague in Europe during the Second Pandemic. Proceedings of the National Academy of Sciences, 1304-1309.

Dickey, D. A. (1979). Distribution of the Estimators for Autoregressive Time Series With a Unit Root. Journal of the American Statistical Association, 427-431.

Ding, G. (2020). Brief Analysis of the ARIMA model on the COVID-19 in Italy. medRxiv. https://www.medrxiv.org/content/10.1101/2020.04.08.20058636v1.

Dodge, Y. (2008). The concise encyclopedia of statistics. Springer.

Godfrey, L. G. (1979). Testing The Adequacy of A Time Series Model. Biometrika, 67-72.

Granger, C. W. (1986). Forecasting economic time series. London: Academic Press.

Hannan, E. a. (1979). The Determination of The Order of An Autoregression. Journal of the Royal Statistical Society Series B(Methodological), 190-195.

Hays, J. (2006). Epidemics and pandemics. Santa Barbara, Calif.: ABC-CLIO.

Hui DS, I. A. (2020). The continuing 2019-nCoV epidemic threat of novel coronaviruses to global health—The latest 2019 novel coronavirus outbreak in Wuhan, China. International Journal of Infectious Diseases, 264–266.

Mavrotas, G. K. (2001). Old wine in new bottles: testing causality between savings and growth. The Manchester School, 97-105.

McCarty, M. F. (2020). Nutraceuticals have potential for boosting the type 1 interferon response to RNA viruses including influenza and coronavirus. Progress in Cardiovascular Diseases.