Predicting volatility of cryptocurrencies using machine learning techniques


Faculty of Economics and Business


University of Amsterdam

Predicting Volatility of Cryptocurrencies

Using Machine Learning Techniques

Salima el Khababi

10658041

MSc in Econometrics Thesis

Specialisation: Financial Econometrics
Date of final version: July 15, 2018
Supervisor: drs. A. C. Rapp


This document is written by Salima el Khababi who declares to take full responsibility for the contents of this document. I declare that the text and the work presented in this document is original and that no sources other than those mentioned in the text and its references have been used in creating it. The Faculty of Economics and Business is responsible solely for the supervision of completion of the work, not for the contents.


This paper investigates to what extent the machine learning techniques Support Vector Regression (SVR) and Feedforward Neural Network (FNN) are an improvement over the Markov-Switching GARCH (MSGARCH) in predicting the volatility of five cryptocurrencies. The results show that for most cases SVR has the lowest forecast errors measured by RMSE and MAE and MSGARCH has lower forecast errors than FNN. The Diebold-Mariano test could not find a significant difference between the forecast performance of SVR and MSGARCH for the forecast that starts before the bubble crash. In addition, the time varying correlations between the cryptocurrencies, estimated with the DCC model using the estimated volatilities of SVR, are mainly positive and vary the most during the bubble. Furthermore, the backtesting results of the 1% VaR forecasts give the conclusion that all three methods provide valid 1% VaR forecasts for the cryptocurrencies.


1 Introduction
2 Literature Review
2.1 Cryptocurrency
2.2 Forecasting Volatility
2.3 Speculative Bubbles
3 Method
3.1 Markov-Switching GARCH
3.2 Support Vector Regression
3.3 Feedforward Neural Network
3.4 Evaluation
3.5 Dynamic Conditional Correlation Model
4 Data
5 Results
5.1 Final Specification
5.2 Comparison of Forecasting Methods
5.3 Multivariate Analysis
5.4 Sensitivity Analysis
6 Application
6.1 Value at Risk
6.2 Backtesting Results
7 Conclusion
References
Appendix I Price Graphs
Appendix II Boxplots

1 Introduction

In the past few years, cryptocurrencies have gained increasing popularity in the financial markets (He et al., 2017). Cryptocurrencies are created to function as digital currencies by using cryptography to secure the transactions without any centralized control being involved (Böhme, Christin, Edelman, & Moore, 2015). Although the cryptocurrencies are made to function as currencies, there is a discussion among economists and researchers about whether the cryptocurrencies are assets or currencies and whether the cryptocurrencies should be used as an investment, since their volatility is higher than that of any other asset or currency (Cheah & Fry, 2015). Central banks such as the European Central Bank do not consider cryptocurrencies as currencies, mainly because they are not legal tender in most countries and they are too volatile (ECB, 2015). The high volatility may lead to financial instability in the monetary system since there is an overlap when the cryptocurrencies are converted into traditional currencies and vice versa (DNB, 2018). Therefore, it is crucial to do research on the volatility of cryptocurrencies, not only for financial interest but also for monetary policy. Predicting and estimating volatility of financial time series play an important role in risk management, derivatives pricing, and portfolio selection. A very common and popular method to estimate volatility is the Generalized Autoregressive Conditional Heteroskedasticity (GARCH) model, given its ability to capture conditional heteroskedasticity and given the low number of parameters in the estimation. Other approaches based on machine learning, such as Support Vector Regression (SVR) and Feedforward Neural Network (FNN), may improve the forecast of volatility (Dhamija & Bhalla, 2010; Peng, Albuquerque, Camboim de Sá, Padula, & Montenegro, 2018). The main difference between these methods is that the GARCH models make assumptions about the functional form of the conditional variance, while the machine learning methods do not make these assumptions (Bezerra & Albuquerque, 2017).

Although there is ongoing discussion about cryptocurrencies, not much research exists on the cryptocurrencies and on the performance of the machine learning techniques for forecasting volatility. To give more insight into this discussion and to fill the research gap, this research investigates to what extent the machine learning techniques SVR and FNN are an improvement over the Markov-Switching GARCH (MSGARCH) in predicting the volatility of cryptocurrencies. Unlike the standard GARCH models, the MSGARCH allows the parameters of the GARCH models to differ per regime.

The limited existing studies on cryptocurrencies are mainly based on Bitcoin and not on the other cryptocurrencies. Therefore, this paper also investigates four other cryptocurrencies besides Bitcoin, which are Litecoin, Ethereum, Ripple and Dash. Instead of evaluating the forecasting techniques against the GARCH models, as Peng et al. (2018) have done for SVR, this study allows for regime switching in the GARCH models (MSGARCH) to take potential speculative bubbles into account. The performance of the forecasting methods SVR, FNN and MSGARCH is evaluated by the Root Mean Squared Error (RMSE), the Mean Absolute Error (MAE) and the Diebold–Mariano test, which are often used in the literature (e.g. Chen, Härdle, and Jeong (2010)). In addition, five foreign exchange rates are used to investigate the differences and similarities between the volatility of these currencies and the cryptocurrencies. Furthermore, the volatility of the cryptocurrencies is investigated in a multivariate setting by estimating the time varying correlations of the cryptocurrencies with the Dynamic Conditional Correlation (DCC) model. The DCC model is estimated by using the univariate volatility estimates of the forecasting method with the highest accuracy. The time varying correlations may show whether one cryptocurrency drives the other cryptocurrencies. Finally, the 1% Value at Risk (VaR) of the cryptocurrencies is forecasted by the three forecasting methods and backtested by the Dynamic Quantile test. The VaR gives insight into the maximum loss of holding an asset over a certain period.

The remainder of this research is organized as follows. Section 2 provides a literature review on cryptocurrencies and forecasting techniques for volatility. Section 3 discusses the forecasting and evaluation methods. Section 4 describes the data of the cryptocurrencies and foreign exchange rates. Section 5 presents the results of the forecasting techniques, the time varying correlations between the cryptocurrencies and the sensitivity analysis. Section 6 provides an application for the forecasting methods by using their volatility estimates to forecast the 1% VaR. Finally, Section 7 gives the conclusion.


2 Literature Review

This section gives a literature review of the existing studies on cryptocurrencies and forecasting methods for volatility of financial time series. Firstly, the cryptocurrencies are discussed to provide a better understanding of these new financial assets/currencies. Furthermore, previous studies on the performance of several forecasting methods for volatility are reviewed. Additionally, speculative bubbles are explained to emphasize the difficulty of forecasting the volatility of highly speculative assets or currencies such as the cryptocurrencies.

2.1 Cryptocurrency

Bitcoin, created in 2009, is the first cryptocurrency and nowadays the largest. It is created to function as a virtual currency without the involvement of a financial institution. The purpose of the creators is to provide a fast, global and easy way to make transactions. These transactions use a peer-to-peer network to solve the double spending problem. It works as follows. Each transaction is written into a chain of blocks (a blockchain) by several miners. Each newly mined block receives the confirmation status when it is linked to an already existing mined block via the cryptographic hash it contains. When a mined transaction has a blockchain with six confirmations, the transaction can proceed. An additional security procedure is that the miners are rewarded with bitcoins and an optional transaction fee for each successful transaction (Böhme et al., 2015). On the other hand, it costs the miners more than half a million dollars in bitcoins if they attempt fraud. Another characteristic of Bitcoin is that it has a limited supply of 21 million units such that only deflation can occur and no inflation (Peng et al., 2018). This makes Bitcoin comparable to gold, which also has a scarcity of supply (Dyhrberg, 2016).

Other cryptocurrencies have arisen in the years following 2009. These cryptocurrencies, such as Ethereum, Ripple, Litecoin and Dash, are called altcoins because they are alternatives to Bitcoin. The main principles of these altcoins are the same as those of Bitcoin. However, there are a few differences since they are created to improve on Bitcoin. For example, Dash has a focus on privacy and therefore it has more anonymity than Bitcoin. On the other hand, Ethereum focuses on security and additionally makes use of smart contracts that are free of possible fraud, forming a promising technology that can be used by banks (Peng et al., 2018). Moreover, all four altcoins are much faster in their transaction procedure and more energy efficient than Bitcoin. Ripple is the fastest and it has already been implemented in a few banks and payment systems.

The cryptocurrencies and the traditional currencies have similarities and differences. Both satisfy the functions of money, which are a store of value, a medium of exchange and a unit of account (Lo & Wang, 2014). Payments with cryptocurrencies have already been accepted by a few companies such as Overstock.com, Dell and Microsoft. For example, Microsoft allows payments in bitcoins in their online stores (Polasik, Piotrowska, Wisniewski, Kotkowski, & Lightfoot, 2016). On the other hand, the cryptocurrencies do not have a long historical background, are not controlled by a third party and their value has no physical representation, while traditional currencies have all these characteristics (Peng et al., 2018). Although major companies such as Microsoft accept cryptocurrencies as payments, some financial institutions and economists are more skeptical and argue that cryptocurrencies are not currencies but assets (Yermack, 2013). One of the reasons for the skeptical attitude may be that there is a lack of knowledge among financial professionals and a lack of existing research on these cryptocurrencies (Peng et al., 2018). The existing research on cryptocurrencies is mainly based on Bitcoin and not on the altcoins. Another reason is that the cryptocurrencies suffer from speculative bubbles and are not stable enough to be used as a currency (Cheah & Fry, 2015; Baek & Elbeck, 2015). In addition, Yermack (2013) claims that the volatility levels of Bitcoin are larger than those of any other currency and that it is useless for risk management, as it would be too difficult to hedge this risk. However, Bouri, Jalkh, Molnár, and Roubaud (2017) conclude that Bitcoin can be used for hedging against commodity prices because they are negatively correlated. Bouri et al. (2017) even state that Bitcoin can act as a safe haven against movements in commodity prices since their correlations are negative during times of stress. Similarly, Dyhrberg (2016) concludes that Bitcoin can be used to hedge against stocks in the FTSE 100 index and the American dollar because their correlations are also negative on average.


2.2 Forecasting Volatility

The volatility of financial time series often has a nonlinear and chaotic nature, which makes forecasting the volatility a difficult task (Bezerra & Albuquerque, 2017). The Generalized Autoregressive Conditional Heteroskedasticity (GARCH) model introduced by Bollerslev (1986) is often used to forecast and estimate the volatility because it captures conditional heteroskedasticity and uses a low number of parameters. It has been extended several times to capture properties such as asymmetry and nonlinearities (e.g. EGARCH). Besides that, different error distributions are used instead of the standard normal distribution, such as the Student's t-distribution and the Generalized Error Distribution (GED). Although these extensions are an improvement over the standard GARCH model, for some financial series it still fails to capture the volatility (Ardia, Bluteau, Boudt, Catania, & Trottier, 2018).

Other methods based on machine learning such as Support Vector Regression (SVR) and Feedforward Neural Network (FNN) have been proposed as alternatives for the GARCH-models. These machine learning methods do not make assumptions on the functional form of the conditional variance and on the error distribution, while the GARCH models require these assumptions. It makes SVR and FNN more flexible than the GARCH models. Therefore, they may be able to capture the dynamic characteristics of financial time series (e.g. kurtosis and asymmetry) better than the GARCH models (Bezerra & Albuquerque, 2017; Dhamija & Bhalla, 2010). Further details of the estimation of these methods are given in the next section.

The machine learning techniques have already been successfully used to forecast the volatility of financial time series. For example, Bezerra and Albuquerque (2017) conclude that SVR outperforms the GARCH models (GARCH, EGARCH, TGARCH and GJR-GARCH) in forecasting the volatility of stock indices given the lower forecast errors, which are measured by Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE). They also find a significant difference in the forecast accuracy of SVR and the GARCH models by using the Diebold–Mariano test. Furthermore, Dhamija and Bhalla (2010) conclude that FNN has lower forecast errors, measured by MAE, than the GARCH models (GARCH, EGARCH, TGARCH and IGARCH) for foreign exchange rates. Chen et al. (2010) investigate the forecasting performance of the standard GARCH and EGARCH against the forecasting methods Recurrent Neural Network (RNN) and SVR for the NYSE index and GBP/USD. They conclude based on the MAE values that SVR performs better than the other methods in most cases. The Recurrent Neural Network as used by Chen et al. (2010) allows the information to move recurrently instead of only forward, which is the case in FNN. Different from Chen et al. (2010), this paper compares the Markov-Switching GARCH, SVR and FNN. The Markov-Switching GARCH (MSGARCH) allows for different regimes in the GARCH models and therefore the parameter estimates can differ per regime.

Several methods are used to evaluate the performance of forecasting methods in the literature. The most common methods are the Root Mean Square Error (RMSE) and the Mean Absolute Error (MAE). The RMSE takes the root of the average of the squared deviation between the estimated volatility and the actual volatility. On the other hand, the MAE estimates the average of the absolute deviation between the estimated volatility and the actual volatility. The forecasting method with the lowest RMSE or MAE value is the most accurate. However, a more reliable method is to test whether a forecasting method is significantly better compared to another forecasting method. For instance, the Diebold–Mariano test can be used for this purpose. It tests whether the difference of the forecast errors of two forecasting methods significantly differ from zero. If the null hypothesis is rejected, then one of the forecasting methods is significantly more accurate in forecasting the volatility than the other method (Chen et al., 2010; Peng et al., 2018).
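The evaluation measures described above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in this paper: the function names are hypothetical, the Diebold–Mariano statistic assumes a squared-error loss and one-step-ahead forecasts (so that the variance of the loss differential is estimated by its plain sample variance), and the resulting statistic would be compared with standard normal critical values.

```python
import math


def rmse(actual, forecast):
    """Root of the average squared deviation between forecast and actual."""
    n = len(actual)
    return math.sqrt(sum((f - a) ** 2 for a, f in zip(actual, forecast)) / n)


def mae(actual, forecast):
    """Average absolute deviation between forecast and actual."""
    n = len(actual)
    return sum(abs(f - a) for a, f in zip(actual, forecast)) / n


def diebold_mariano(actual, forecast_1, forecast_2):
    """DM statistic for the loss differential of two forecasts.

    Assumptions (not from the text): squared-error loss and no serial
    correlation in the loss differential (one-step-ahead forecasts).
    A negative value means forecast_1 has the lower average loss.
    """
    d = [(f1 - a) ** 2 - (f2 - a) ** 2
         for a, f1, f2 in zip(actual, forecast_1, forecast_2)]
    n = len(d)
    d_bar = sum(d) / n
    var_d = sum((x - d_bar) ** 2 for x in d) / (n - 1)
    return d_bar / math.sqrt(var_d / n)
```

Under the null hypothesis of equal forecast accuracy the statistic is approximately standard normal, so absolute values above roughly 1.96 would indicate a significant difference at the 5% level.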

Although the machine learning methods have shown improvements over the GARCH models, the results are mainly based on data of stock indices and foreign exchange rates. The existing literature on the volatility of cryptocurrencies is very limited. Two exceptions are Katsiampa (2017) and Peng et al. (2018). Katsiampa (2017) estimates several GARCH models for the volatility of Bitcoin and Peng et al. (2018) conclude based on the lower forecast errors that the SVR is an improvement over the GARCH models for the cryptocurrencies Bitcoin, Ethereum and Dash. Different from Peng et al. (2018) this study also uses the altcoins Ripple and Litecoin, and as mentioned before the SVR will be compared with MSGARCH and FNN.

Forecasting and estimating the volatility of financial time series form an important task for risk management. Risk management uses the forecasted volatilities for several applications, e.g. to estimate market risk. A popular measure for market risk is Value at Risk, which measures the maximum loss of holding an asset or portfolio over a given period (Tsay, 2010). Therefore, in this paper the methods MSGARCH, SVR and FNN are also applied to forecast the Value at Risk of the cryptocurrencies. The forecast performance of Value at Risk methods is usually backtested, for example by the Dynamic Quantile test, such that the validity of these forecasts is checked.
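As an illustration of how a volatility forecast translates into a Value at Risk forecast, the sketch below computes a 1% VaR as a return quantile. It assumes standard normal innovations and a zero conditional mean by default, which is a simplification chosen for this example and not necessarily the specification used later in this paper; the function names are hypothetical. The violation rate is only a first sanity check and is not the Dynamic Quantile test.

```python
from statistics import NormalDist


def value_at_risk(sigma, mu=0.0, level=0.01):
    """Return quantile at `level` for a forecast volatility `sigma`.

    Assumption: normally distributed innovations. The result is negative
    for small levels, i.e. it is the loss threshold expressed as a return.
    """
    z = NormalDist().inv_cdf(level)
    return mu + sigma * z


def violation_rate(returns, var_forecasts):
    """Share of days on which the realized return falls below the VaR."""
    hits = [r < v for r, v in zip(returns, var_forecasts)]
    return sum(hits) / len(hits)
```

For a valid 1% VaR forecast the violation rate should be close to 0.01 over a long out-of-sample period.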

2.3 Speculative Bubbles

Predicting the volatility of cryptocurrencies is considered more challenging than predicting that of other financial time series such as foreign exchange rates and stock indices because the cryptocurrencies have higher volatility levels (Peng et al., 2018). Baek and Elbeck (2015) and Cheah and Fry (2015) conclude that speculation drives the high volatility levels of the cryptocurrency Bitcoin. The same seems to occur for the altcoins, which makes them susceptible to speculative bubbles. The speculation in the altcoins may be driven by Bitcoin and hence in this paper the time varying correlations between the cryptocurrencies are estimated with the Dynamic Conditional Correlation (DCC) model of Engle (2002). Other models could also be used, but for convenience the DCC model is used because for DCC it is straightforward to apply the volatility estimates of the three forecasting methods in the estimation of the time varying correlations. More details of the estimation approach for the time varying correlations are given in the next section. As a result, the obtained time varying correlations show whether the speculative bubbles of the altcoins are driven by Bitcoin.

Speculative bubbles occur when the price of an asset exceeds its fundamental value due to speculation among traders and may end in a crash. They are categorized in rational and irrational bubbles. Rational bubbles occur due to self-fulfilling expectations and mispricing of fundamentals. On the other hand, irrational bubbles occur due to psychological factors such as irrationally optimistic expectations (Cheah & Fry, 2015). There is a disagreement among economists on whether speculative bubbles exist. Economists such as Eugene Fama claim that speculative bubbles do not exist because financial markets are efficient and if they existed, then they would be predictable (Engsted, 2016). Some tests and models exist to indicate whether there is a bubble and to determine when the time of the bubble burst occurs. However, several empirical studies show that these tests and models are not always accurate when applied to historical speculative bubbles and they are not useful for forecasting volatility (Homm & Breitung, 2012; Sornette, Cauwels, & Smilyanov, 2018).

Even though no accurate model exists in the literature, Panopoulou and Pantelidis (2015) and others proposed the method to allow for regime switching to capture speculative bubbles in financial time series. They allow for two and three regimes indicating the explosive state, the collapse state and the dormant state. This approach can also be used for the GARCH models since the prediction of volatility by the GARCH models is likely to fail in capturing possible speculative bubbles in time series. This method is known as the Markov-Switching GARCH (MSGARCH) in which the GARCH models have different regimes that are used to capture the different volatility levels. This allows for time variation in the parameters of the GARCH models (Haas, Mittnik, & Paolella, 2004).

Also, the mentioned machine learning methods SVR and FNN can capture potential speculative bubbles and the different regimes in the volatility because they do not make assumptions on the functional form of the volatility. For example, Rotundo (2004) uses neural networks to forecast crashes of speculative bubbles in stock indices (Dow Jones, Nasdaq and CAC40). Based on the accurate results for these crashes, she concludes that neural networks can be a suitable approach for forecasting highly speculative assets. Bezerra and Albuquerque (2017) state the same for the method SVR since the forecast errors of SVR for stock indices Nikkei 225 and Ibovespa are lower than those of the GARCH models (GARCH, EGARCH and GJRGARCH), and they suggest using a mixture of Gaussian kernels instead of a single kernel to improve the forecast of time series with different regimes.

To sum up, the cryptocurrencies are becoming more important in the financial markets and research on their volatility is needed to obtain a better understanding on their dynamics. The existing research is limited and mainly based on Bitcoin. Therefore, this paper also investigates four altcoins. SVR, FNN and MSGARCH are used to forecast the volatility of these cryptocurrencies. The forecasts are examined by the RMSE, the MAE and the Diebold–Mariano test. In addition, the most accurate forecasting method is applied to estimate the time varying correlations between the cryptocurrencies. Also, an application is provided by forecasting and backtesting the Value at Risk of the cryptocurrencies.


3 Method

The methods mentioned in the literature review are further explained in this section. Firstly, the Markov-Switching GARCH and the different GARCH models for the regimes are discussed. Thereafter, it is described in detail how the machine learning methods SVR and FNN are used to estimate volatility. Also, the evaluation methods are discussed, which examine the performance of the forecasting methods. Finally, it is explained how the DCC model is used to estimate the time varying correlations between the cryptocurrencies.

3.1 Markov-Switching GARCH

The Markov-Switching GARCH model (MSGARCH) allows for different parameter estimates per regime for the GARCH models (Ardia et al., 2018). This paper considers one, two and three regimes for the specification of the MSGARCH. If one regime is selected as most optimal, then it equals the standard GARCH models. Two regimes in the specification of the MSGARCH indicate high and low volatility levels and three regimes indicate high, regular and low volatility levels. The standard GARCH (Bollerslev, 1986), EGARCH (Nelson, 1991), TGARCH (Zakoian, 1994) and GJRGARCH (Glosten, Jagannathan, & Runkle, 1993) with the Student’s t-distribution, GED, Gaussian distribution and the skewed versions of all three are considered for the specification of the MSGARCH. The standardized Student’s t-distribution, GED, Gaussian distribution are respectively specified as

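$$f_1(\eta; \nu) = \frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\sqrt{(\nu-2)\pi}\,\Gamma\left(\frac{\nu}{2}\right)} \left(1 + \frac{\eta^2}{\nu-2}\right)^{-\frac{\nu+1}{2}}, \tag{1}$$

$$f_2(\eta; \nu) = \frac{\nu \exp\left(-\frac{1}{2}|\eta/\lambda|^{\nu}\right)}{\lambda\, 2^{(1+1/\nu)}\, \Gamma(1/\nu)}, \quad \lambda = \left(\frac{\Gamma(1/\nu)}{4^{1/\nu}\,\Gamma(3/\nu)}\right)^{1/2}, \tag{2}$$

$$f_3(\eta) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}\eta^2}, \tag{3}$$

and the skewed version of all three, as defined in Trottier and Ardia (2016), is

$$f_\xi(\eta) = \frac{2\sigma_\xi}{\xi + \xi^{-1}}\, f_i(\eta_\xi), \quad \eta_\xi = \begin{cases} \xi^{-1}(\sigma_\xi \eta + \mu_\xi), & \text{if } \eta \geq -\mu_\xi/\sigma_\xi \\ \xi(\sigma_\xi \eta + \mu_\xi), & \text{if } \eta < -\mu_\xi/\sigma_\xi, \end{cases}$$

with

$$\mu_\xi = M_1(\xi - \xi^{-1}), \quad \sigma_\xi^2 = (1 - M_1^2)(\xi^2 + \xi^{-2}) + 2M_1^2 - 1, \quad M_1 = 2\int_0^\infty u f_i(u)\,du,$$

where $0 < \xi < \infty$ and $f_i(\cdot)$ (for $i = 1, 2, 3$) is one of the densities in (1)-(3). The skewness is measured by $\xi$ and if $\xi = 1$ then $f_\xi(\cdot)$ equals the unskewed densities of (1)-(3). The model specification is chosen based on the model with the smallest value of the information criteria AIC (Akaike Information Criterion) and BIC (Bayesian Information Criterion). Maximum Likelihood is used to estimate these different specifications of MSGARCH. The chosen model specification of the MSGARCH is used for the forecast.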
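The skewing transformation can be made concrete for the Gaussian case (3). The sketch below is illustrative only: the function names are hypothetical, and the constant $M_1 = \sqrt{2/\pi}$ is the closed form of $2\int_0^\infty u f_3(u)\,du$ for the standard normal density, so it would have to be replaced when another base density is plugged in.

```python
import math

# M1 = 2 * integral_0^inf u * phi(u) du for the standard normal density phi
M1_GAUSS = math.sqrt(2.0 / math.pi)


def gauss_pdf(x):
    """Standard normal density, equation (3)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def skewed_pdf(eta, xi, base_pdf=gauss_pdf, m1=M1_GAUSS):
    """Standardized skewed density f_xi(eta) built from a symmetric base density."""
    mu = m1 * (xi - 1.0 / xi)
    sigma = math.sqrt((1.0 - m1 ** 2) * (xi ** 2 + xi ** -2) + 2.0 * m1 ** 2 - 1.0)
    z = sigma * eta + mu
    # Right of the threshold the argument is divided by xi, left of it multiplied
    eta_xi = z / xi if eta >= -mu / sigma else z * xi
    return 2.0 * sigma / (xi + 1.0 / xi) * base_pdf(eta_xi)
```

Setting `xi = 1.0` gives back the symmetric density, while values above one shift probability mass to the right tail; in both cases the density keeps mean zero and unit variance.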

The MSGARCH is estimated as follows. The returns $r_t$ of the cryptocurrencies and foreign exchange rates are defined as

$$r_t = \log(P_t/P_{t-1}), \tag{4}$$

$$r_t = \mu_t + a_t, \tag{5}$$

with $P_t$ the price at time $t$ and $a_t$ the disturbance at time $t$. The conditional mean equation in (5) is assumed to have mean zero and no serial correlation because the MSGARCH only allows for switching regimes in the conditional variance (Ardia et al., 2018). Therefore, the returns of the cryptocurrencies and foreign exchange rates are demeaned and corrected for serial correlation if necessary. The general MSGARCH is specified as

$$a_t \mid (s_t = k, I_{t-1}) \sim D(0, \sigma_{k,t}^2, \xi_k), \tag{6}$$

with $I_{t-1}$ the information set observed up to time $t-1$, $D(0, \sigma_{k,t}^2, \xi_k)$ a continuous distribution, $\xi_k$ the skewness parameters and the standardized innovations $\eta_{k,t} = a_t/\sigma_{k,t} \sim \text{i.i.d. } D(0, 1, \xi_k)$. Note that if the density (1) or (2) is chosen then the parameter $\nu_k$ also differs per regime. The regime state is indicated with $k$ ($k = 1, \dots, K$) and $s_t$ denotes the state at time $t$. So, for each state the conditional variance $\sigma_{k,t}$ follows one of the GARCH models as indicated by the equations (7)-(10), which are respectively the standard GARCH(1,1), EGARCH(1,1), TGARCH(1,1) and GJRGARCH(1,1).

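$$\sigma_{k,t}^2 = \alpha_{0,k} + \alpha_{1,k} a_{t-1}^2 + \beta_k \sigma_{k,t-1}^2, \tag{7}$$

with $\alpha_{0,k}, \alpha_{1,k} > 0$ and $\beta_k \geq 0$,

$$\log(\sigma_{k,t}^2) = \alpha_{0,k} + \alpha_{1,k}\left(|\eta_{k,t-1}| - E[|\eta_{k,t-1}|]\right) + \alpha_{2,k} a_{t-1} + \beta_k \log(\sigma_{k,t-1}^2), \tag{8}$$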


with $\alpha_{0,k}, \alpha_{1,k}, \alpha_{2,k} > 0$ and $\beta_k \geq 0$,

$$\sigma_{k,t}^2 = \alpha_{0,k} + \left(\alpha_{1,k} + \alpha_{2,k} I(a_{t-1} < 0)\right) a_{t-1}^2 + \beta_k \sigma_{k,t-1}^2, \tag{10}$$

with $\alpha_{0,k}, \alpha_{1,k} > 0$ and $\alpha_{2,k}, \beta_k \geq 0$. The EGARCH, TGARCH and GJRGARCH are able to capture asymmetry in the conditional variance, while the standard GARCH cannot capture the asymmetry. The asymmetric models EGARCH, TGARCH and GJRGARCH differ in their specifications. The conditional variance $\sigma_{k,t}$ is restricted to be positive by the given restrictions on the parameters. There are additional restrictions on the parameters to require covariance-stationarity per regime, which are respectively for (7)-(10): $\alpha_{1,k} + \beta_k < 1$, $\beta_k < 1$, $\alpha_{1,k}^2 + \beta_k^2 - 2\beta_k(\alpha_{1,k} + \alpha_{2,k})E[\eta_{k,t} I(\eta_{k,t} < 0)] - (\alpha_{1,k}^2 - \alpha_{2,k}^2)E[\eta_{k,t}^2 I(\eta_{k,t} < 0)] < 1$ and $\alpha_{1,k} + \alpha_{2,k}E[\eta_{k,t}^2 I(\eta_{k,t} < 0)] + \beta_k < 1$ (Ardia et al., 2018).
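To make the recursions concrete, the sketch below computes log returns as in (4), demeans them, and runs the conditional variance recursion of a single regime for the standard GARCH(1,1) in (7) and the GJRGARCH(1,1) in (10). It is a minimal illustration with hypothetical function names: the parameters are taken as given rather than estimated by Maximum Likelihood, regime switching is left out, and the recursion is started at the sample variance of the disturbances, which is an assumption of this example.

```python
import math


def log_returns(prices):
    """r_t = log(P_t / P_{t-1}), equation (4)."""
    return [math.log(p1 / p0) for p0, p1 in zip(prices, prices[1:])]


def demean(returns):
    """Subtract the sample mean so that the disturbances have mean zero."""
    m = sum(returns) / len(returns)
    return [r - m for r in returns]


def garch_variance(a, alpha0, alpha1, beta, alpha2=0.0):
    """Conditional variances of one regime.

    alpha2 = 0 gives the standard GARCH(1,1) in (7); alpha2 > 0 adds the
    extra response to negative disturbances of the GJRGARCH(1,1) in (10).
    Assumption: the first variance is set to the sample variance of `a`.
    """
    sigma2 = [sum(x * x for x in a) / len(a)]
    for prev in a[:-1]:
        leverage = alpha2 if prev < 0 else 0.0
        sigma2.append(alpha0 + (alpha1 + leverage) * prev ** 2 + beta * sigma2[-1])
    return sigma2
```

With `alpha2 > 0`, a negative disturbance raises the next conditional variance by more than a positive disturbance of the same size, which is the asymmetry the standard GARCH cannot capture.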

The Markov transition matrix for the different states is defined by

$$P = \begin{pmatrix} p_{1,1} & \cdots & p_{1,K} \\ \vdots & \ddots & \vdots \\ p_{K,1} & \cdots & p_{K,K} \end{pmatrix} \tag{11}$$

with $p_{i,j} = \mathbb{P}[s_t = j \mid s_{t-1} = i]$ the probability of transitioning from state $s_{t-1} = i$ to state $s_t = j$ given that $0 < p_{i,j} < 1$ $\forall i, j \in \{1, \dots, K\}$ and $\sum_{j=1}^{K} p_{i,j} = 1$, $\forall i \in \{1, \dots, K\}$. The likelihood function and the conditional density of $a_t$ are given respectively by

with $\Psi = (\theta_1, \xi_1, \dots, \theta_K, \xi_K, P)$, $\theta_i$ contains the parameters of the chosen GARCH model in the corresponding regime $i$ (e.g. $\theta_i = (\alpha_{0,i}, \alpha_{1,i}, \beta_i)'$ for GARCH(1,1)) and $\mathbb{P}[s_{t-1} = i \mid \Psi, I_{t-1}]$ is the probability that state $s_{t-1} = i$ given the information set $I_{t-1}$ and the parameter vector $\Psi$ (Ardia et al., 2018).
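The role of the transition matrix in (11) can be illustrated with a small sketch. It checks the two restrictions on the transition probabilities and propagates the state probabilities one period ahead, $\mathbb{P}[s_t = j] = \sum_i \mathbb{P}[s_{t-1} = i]\, p_{i,j}$. The function names and the two-regime matrix in the usage note are hypothetical, and the filtering step that updates the state probabilities with new observations is not shown.

```python
def is_transition_matrix(P, tol=1e-12):
    """Check 0 < p_ij < 1 for all entries and that every row sums to one."""
    return all(
        all(0.0 < p < 1.0 for p in row) and abs(sum(row) - 1.0) < tol
        for row in P
    )


def predict_state_probs(probs, P):
    """One-step-ahead state probabilities given last period's probabilities."""
    K = len(P)
    return [sum(probs[i] * P[i][j] for i in range(K)) for j in range(K)]
```

For example, with `P = [[0.9, 0.1], [0.2, 0.8]]` and equal probabilities for both regimes last period, the next-period probabilities are approximately 0.55 and 0.45.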

3.2 Support Vector Regression

For the SVR method, the approach of Peng et al. (2018) and Bezerra and Albuquerque (2017) is used. This approach is based on the GARCH model because it also has a conditional mean and conditional variance equation, but they are estimated with Support Vector Regression. The conditional mean equation and conditional variance equation are respectively defined by

$$r_t = f(r_{t-1}) + a_t, \tag{14}$$

$$\tilde{\sigma}_t^2 = g(\tilde{\sigma}_{t-1}^2, a_{t-1}^2), \tag{15}$$

where $a_t$ are the residuals and $\tilde{\sigma}_t^2$ is a proxy for the volatility. The proxy is needed because volatility is not directly measurable. It is defined as $\tilde{\sigma}_t^2 = (r_t - \bar{r})^2$ where $\bar{r}$ denotes the mean of the returns of the in-sample data. This proxy is also used by Peng et al. (2018), Bezerra and Albuquerque (2017) and Chen et al. (2010). Note that (14) is estimated before (15) since $a_t$ is needed in (15).
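The construction of the proxy and of the input-output pairs for (15) can be sketched as follows. The function names are hypothetical, and the residuals are assumed to come from a mean equation (14) that has already been estimated.

```python
def volatility_proxy(returns, in_sample_mean):
    """Squared deviation of each return from the in-sample mean return."""
    return [(r - in_sample_mean) ** 2 for r in returns]


def variance_training_set(proxy, residuals):
    """Pairs for (15): input (proxy_{t-1}, a_{t-1}^2), output proxy_t."""
    inputs = [(proxy[t - 1], residuals[t - 1] ** 2) for t in range(1, len(proxy))]
    targets = proxy[1:]
    return inputs, targets
```

Because each input uses the previous period's proxy and squared residual, a series of length n yields n - 1 training pairs.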

In (14) and (15) the SVR decision functions $f(\cdot)$ and $g(\cdot)$ are estimated by $\varepsilon$-SVR, which is one of the types of support vector machines (Peng et al., 2018). It works as follows. The learning algorithm of $\varepsilon$-SVR is first applied on a training set $(r_{t-1,1}, r_{t,1}), \dots, (r_{t-1,n}, r_{t,n})$ with $r_{t-1,i}$ as the input scalar and $r_{t,i}$ as the output scalar for (14), and on a training set $(\tilde{\sigma}_{t-1,1}^2, a_{t-1,1}^2, \tilde{\sigma}_{t,1}^2), \dots, (\tilde{\sigma}_{t-1,n}^2, a_{t-1,n}^2, \tilde{\sigma}_{t,n}^2)$ with $(\tilde{\sigma}_{t-1,i}^2, a_{t-1,i}^2)$ as the input vector and $\tilde{\sigma}_{t,i}^2$ as the output scalar for (15). To find the decision function $f(r_{t-1})$, $\varepsilon$-SVR maps the input $r_{t-1}$ with the nonlinear transition function $\phi$ into a feature space $\mathcal{F}$. This can be represented in a regression by

$$f(r_{t-1}) = w'\phi(r_{t-1}) + b, \quad \phi: \mathbb{R}^n \to \mathcal{F}, \quad w \in \mathcal{F}, \tag{16}$$

where $w$ is a regression parameter vector, $b$ is a constant and $r_{t-1}$ only contains the $n$ observations from the training set. These $w$ and $b$ are estimated by solving the minimization problem:

$$\min_{w,\, b} \left\{ \frac{1}{2}\|w\|^2 + \frac{C}{n} \sum_{i=1}^{n} L(r_{t,i}, f(r_{t-1,i})) \right\}, \tag{17}$$

$$L(r_{t,i}, f(r_{t-1,i})) = \begin{cases} |r_{t,i} - f(r_{t-1,i})| - \varepsilon, & \text{if } |r_{t,i} - f(r_{t-1,i})| > \varepsilon \\ 0, & \text{if } |r_{t,i} - f(r_{t-1,i})| \leq \varepsilon. \end{cases} \tag{18}$$

The loss function in (18) only penalizes an absolute deviation of $f(r_{t-1})$ from $r_t$ when it exceeds $\varepsilon$, hence the name $\varepsilon$-SVR. The number of support vectors is higher when $\varepsilon$ is chosen small. The parameter $C$ indicates the trade-off between $\frac{1}{2}\|w\|^2$ and the loss. For a large value of $C$, the algorithm overfits the data and therefore both $C$ and $\varepsilon$ are chosen based on a grid search (Bezerra & Albuquerque, 2017).
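The loss in (18) and the objective in (17) can be written out directly. The sketch below only evaluates the objective for given fitted values and a given squared norm of $w$; it does not solve the minimization problem, and the function names are hypothetical.

```python
def eps_insensitive_loss(y, y_hat, eps):
    """Zero inside the eps-tube, linear in the excess deviation outside it."""
    return max(abs(y - y_hat) - eps, 0.0)


def primal_objective(w_norm_sq, C, y, y_hat, eps):
    """0.5 * ||w||^2 + (C / n) * sum of eps-insensitive losses, as in (17)."""
    n = len(y)
    total_loss = sum(eps_insensitive_loss(a, b, eps) for a, b in zip(y, y_hat))
    return 0.5 * w_norm_sq + C / n * total_loss
```

A larger `C` puts more weight on the fitting errors relative to the norm of the parameter vector, which is why a large value tends to overfit.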

The minimization problem can be rewritten in the dual representation:

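$$\min_{\alpha,\,\alpha^*} \left\{ \frac{1}{2}(\alpha - \alpha^*)' Q (\alpha - \alpha^*) + \varepsilon \sum_{i=1}^{n} (\alpha_i + \alpha_i^*) + \sum_{i=1}^{n} r_{t,i}(\alpha_i - \alpha_i^*) \right\}, \tag{19}$$

$$\text{s.t.} \quad \begin{cases} 0 \le \alpha_i, \alpha_i^* \le C, & i = 1, \ldots, n, \\ \sum_{i=1}^{n} (\alpha_i - \alpha_i^*) = 0. \end{cases}$$

To obtain the dual representation, Lagrangian multipliers $(\alpha_i, \alpha_i^*)$ and

$$Q_{ij} \equiv r_{t,i}\, r_{t,j}\, K(r_{t-1,i}, r_{t-1,j}; \gamma), \qquad i, j = 1, \ldots, n,$$

with kernel $K(r_{t-1,i}, r_{t-1,j}; \gamma) \equiv \phi(r_{t-1,i})'\phi(r_{t-1,j})$ are used (for proofs see Vapnik (1995)).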

In this paper the RBF (Radial Basis Function) kernel, also known as the Gaussian kernel, is chosen because it is the most popular kernel used in the literature (Peng et al., 2018). It is defined by

$$K(r_{t-1,i}, r_{t-1,j}; \gamma) = \exp\left(-\gamma \|r_{t-1,i} - r_{t-1,j}\|^2\right) = \exp\left(-\frac{\|r_{t-1,i} - r_{t-1,j}\|^2}{2\sigma^2}\right), \qquad i, j = 1, \ldots, n.$$

Note that $\gamma = \frac{1}{2\sigma^2}$ and $\gamma$ is chosen based on a grid search. Other kernels that can be used for the estimation of $\varepsilon$-SVR are the linear kernel and the polynomial kernel. However, these kernels do not perform better than the RBF kernel, as Chen et al. (2010) conclude based on the results of a Monte Carlo simulation, which they use to analyze the robustness of kernel selection. Solving (19) gives the estimated parameter vector $\hat{w} = \sum_{i=1}^{n} (\hat{\alpha}_i^* - \hat{\alpha}_i)\phi(r_{t-1,i})$ and the estimated parameter $\hat{b} = r_t - \sum_{i=1}^{n} (\hat{\alpha}_i^* - \hat{\alpha}_i) K(r_{t-1,i}, r_{t-1}; \gamma) + \varepsilon$. Thus, the decision function is

$$f(r_{t-1}) = \hat{r}_t = \hat{w}'\phi(r_{t-1}) + \hat{b} = \sum_{i=1}^{n} (\hat{\alpha}_i^* - \hat{\alpha}_i) K(r_{t-1,i}, r_{t-1}; \gamma) + \hat{b}$$

(Bezerra & Albuquerque, 2017). The decision function $g(\cdot)$ is estimated in the same way as $f(\cdot)$ by replacing the input with $(\tilde{\sigma}^2_{t-1}, a^2_{t-1})$ and the output with $\tilde{\sigma}^2_t$. The minimization problem is first solved with the training set and then validated with the validation set for different values of $C$, $\varepsilon$ and $\gamma$ by a grid search. The values $C \in [0.5, 4.5]$, $\varepsilon \in [0.05, 1]$ and $\gamma \in [0.05, 2]$ are considered for the grid search. If one of the bound values is chosen as optimal, then values beyond that bound are also considered. The training and validation set are respectively 75% and 25% of the in-sample data. The final specification is the one with the lowest MSE on the validation set. This final specification is used for the forecast of the out-of-sample data.
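To make the kernel form of the decision function concrete, the following Python sketch evaluates the RBF kernel and the resulting kernel expansion. It is a minimal illustration, not the estimation routine used in this paper: the function names are hypothetical, and the coefficients (which play the role of $\hat{\alpha}_i^* - \hat{\alpha}_i$) and the constant are assumed to be given, since solving (19) is left to an SVR library.

```python
import math


def rbf_kernel(x, y, gamma):
    """RBF kernel exp(-gamma * ||x - y||^2) for two equally long vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def svr_predict(x_new, train_inputs, coefs, b, gamma):
    """Decision function: sum_i coefs[i] * K(x_i, x_new) + b.

    coefs[i] stands for (alpha*_i - alpha_i); it is supplied by the caller
    here, whereas in practice it comes from solving the dual problem.
    """
    return sum(
        c * rbf_kernel(x_i, x_new, gamma) for c, x_i in zip(coefs, train_inputs)
    ) + b
```

With $\sigma = 1$ the relation $\gamma = 1/(2\sigma^2)$ gives $\gamma = 0.5$, the value selected most often in the grid search. For (14) the inputs are one-element vectors holding $r_{t-1}$, and for (15) they are two-element vectors holding $(\tilde{\sigma}^2_{t-1}, a^2_{t-1})$.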

3.3 Feedforward Neural Network

The Feedforward Neural Network is one of the neural network models. This neural network only allows the information to move in a forward direction. It contains neurons and layers, which are respectively displayed as circles and columns in Figure 1. The figure contains an input layer, hidden layer and output layer. The input layer has $l$ neurons with input $x_i$, which are the lags of the volatility proxy $\tilde{\sigma}^2_{t-1}, \ldots, \tilde{\sigma}^2_{t-l}$. Each neuron of one layer is connected with each neuron of the next layer by weights that are displayed as arrows in the figure. There are also two bias neurons (gray colored in Figure 1) that contain weights and are not connected with the previous layer.

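Figure 1: Feedforward Neural Network

[Figure: input layer with neurons $x_1, x_2, x_3, \ldots, x_l$; hidden layer with neurons $o^{[2]}_1, \ldots, o^{[2]}_m$; output layer with neurons $o^{[3]}_1, \ldots, o^{[3]}_R$; bias neurons $b^{[1]}$ and $b^{[2]}$.]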

The process from one layer to the next layer is described by

$$h_j = \sum_{i=1}^{l} w^{[1]}_{j,i} x_i + b^{[1]}_j, \qquad j = 1, \ldots, m, \tag{20}$$

$$f(x) = \frac{1}{1 + \exp(-x)}, \tag{21}$$

$$o^{[2]}_j = f_j(h_j) = \frac{1}{1 + \exp(-h_j)}, \qquad j = 1, \ldots, m, \tag{22}$$

$$o^{[3]}_r = f_r\left( \sum_{j=1}^{m} w^{[2]}_{r,j}\, o^{[2]}_j + b^{[2]}_r \right), \qquad r = 1, \ldots, R. \tag{23}$$

Firstly, the network computes (20), which is the sum of the inputs $x_i$ weighted by the weights $w^{[1]}_{j,i}$ from the input neurons to the hidden neurons $o^{[2]}_j$, to which the weight of the bias neuron $b^{[1]}_j$ is added. After that, the activation function in (21) is used, which leads to (22). The output of (22) gives the values of the neurons in the hidden layer. The same steps are followed to obtain the neurons of the last layer $o^{[3]}_r$ as described in (23) (Wu, Zhou, Luo, & Basset, 2016).
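The steps in (20)-(23) amount to one forward pass through the network, which can be sketched in Python as follows. The function names and the list-of-lists layout of the weights are assumptions made for illustration; the weights themselves would come from the training step described next.

```python
import math


def sigmoid(x):
    """Activation function (21)."""
    return 1.0 / (1.0 + math.exp(-x))


def forward(x, W1, b1, W2, b2):
    """One forward pass through a single-hidden-layer network.

    x  : list of l inputs (lags of the volatility proxy)
    W1 : m rows of l weights, W1[j][i] = w[1]_{j,i}; b1 : m bias weights
    W2 : R rows of m weights, W2[r][j] = w[2]_{r,j}; b2 : R bias weights
    Returns the hidden values o[2] and the outputs o[3].
    """
    # (20) and (22): weighted sum plus bias, then the activation function
    hidden = [
        sigmoid(sum(w * xi for w, xi in zip(row, x)) + bj)
        for row, bj in zip(W1, b1)
    ]
    # (23): the same two steps applied to the hidden-layer values
    output = [
        sigmoid(sum(w * oj for w, oj in zip(row, hidden)) + br)
        for row, br in zip(W2, b2)
    ]
    return hidden, output
```

For the univariate forecasts `W2` has a single row, so `output` holds one value, the estimated volatility.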

Note that only one neuron is used in the output layer for the forecast of the volatility in the univariate case. To find the weights $w$ and $b$, the mean squared error between the volatility proxy $\tilde{\sigma}^2_t$ and the estimated volatility of the FNN, $o^{[3]}_1$, is minimized over $w$ and $b$ by the BFGS (Broyden–Fletcher–Goldfarb–Shanno) algorithm:

$$\min_{w,\,b} \frac{1}{2n} \sum_{t=1}^{n} \left( \tilde{\sigma}^2_t - o^{[3]}_{1,t} \right)^2 = \min_{w,\,b} \frac{1}{2n} \sum_{t=1}^{n} \left[ \tilde{\sigma}^2_t - f_{1,t}\left( \sum_{j=1}^{m} w^{[2]}_{1,j}\, f_j\left( \sum_{i=1}^{l} w^{[1]}_{j,i} x_i + b^{[1]}_j \right) + b^{[2]}_1 \right) \right]^2,$$

where n is the number of observations in the training set.

The network is trained on a training set in which different numbers of neurons are considered for the layers. These different specifications are validated on a validation set. The final number of neurons per layer is chosen based on the model specification with the lowest MSE on the validation set. For the input layer a maximum of 5 neurons is considered and for the hidden layer a maximum of 10 neurons. The lags of the volatility proxy $\tilde{\sigma}^2_{t-i}$ are used for the input neurons and the corresponding output neuron is $\hat{\tilde{\sigma}}^2_t$. The training and validation set are divided in the same way as for SVR. The final model specification is used for the forecast of the out-of-sample data.

3.4 Evaluation

The MAE and the RMSE are used to evaluate the forecast performance and the MSE is used for the evaluation of the different specifications. The MSE is defined as follows:

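$$MSE = \frac{1}{n} \sum_{t=1}^{n} \left( \tilde{\sigma}^2_t - \hat{\tilde{\sigma}}^2_t \right)^2, \tag{24}$$

where $\hat{\tilde{\sigma}}^2_t$ is the estimated volatility at time $t$, $\tilde{\sigma}^2_t$ is the volatility proxy at time $t$ and $n$ is the number of observations in the validation set. The model specification with the lowest MSE value is used to forecast the out-of-sample data. The forecast is evaluated by

$$MAE = \frac{1}{T} \sum_{t=1}^{T} \left| \tilde{\sigma}^2_t - \hat{\tilde{\sigma}}^2_t \right|, \tag{25}$$

$$RMSE = \sqrt{ \frac{1}{T} \sum_{t=1}^{T} \left( \tilde{\sigma}^2_t - \hat{\tilde{\sigma}}^2_t \right)^2 }, \tag{26}$$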

where T is the amount of out-of-sample data. The forecasting method with the lowest MAE and RMSE values is the method that outperforms the other methods.
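The three measures in (24)-(26) translate directly into code. The sketch below is a plain-Python illustration with hypothetical function names; `proxy` holds the values of the volatility proxy and `forecast` the estimated volatilities over the same dates.

```python
import math


def mse(proxy, forecast):
    """Mean squared error (24)."""
    return sum((p - f) ** 2 for p, f in zip(proxy, forecast)) / len(proxy)


def mae(proxy, forecast):
    """Mean absolute error (25)."""
    return sum(abs(p - f) for p, f in zip(proxy, forecast)) / len(proxy)


def rmse(proxy, forecast):
    """Root mean squared error (26)."""
    return math.sqrt(mse(proxy, forecast))
```

Because the RMSE squares the errors before averaging, a few large forecast errors raise it more than they raise the MAE, so the two measures can rank the methods differently.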

In addition, the Diebold–Mariano test is used to test whether one forecasting method has a significantly different forecast performance compared to another forecasting method. The null and alternative hypothesis of this test are respectively:

$$H_0: E\left( e^2_{MSGARCH,t} - e^2_{SVR,t} \right) = 0, \qquad t = 1, \ldots, T,$$

$$H_1: E\left( e^2_{MSGARCH,t} - e^2_{SVR,t} \right) \neq 0, \qquad t = 1, \ldots, T,$$

where $T$ denotes the amount of out-of-sample data. The same is done for FNN by replacing $e_{SVR,t}$ with $e_{FNN,t}$, where $e_{i,t} = \tilde{\sigma}^2_t - \hat{\tilde{\sigma}}^2_{i,t}$ denotes the forecast error of method $i$. Rejecting the null hypothesis implies that the forecasting methods have a significantly different forecast accuracy. When the null hypothesis is not rejected, there is no evidence that one method outperforms the other. The corresponding Diebold–Mariano test statistic is defined by

$$DM = \frac{1}{\sqrt{T}} \frac{1}{\sqrt{\hat{S}^2}} \sum_{t=0}^{T-1} \left( \left( \tilde{\sigma}^2_{t+1} - \hat{\tilde{\sigma}}^2_{1,t+1} \right)^2 - \left( \tilde{\sigma}^2_{t+1} - \hat{\tilde{\sigma}}^2_{2,t+1} \right)^2 \right) \sim N(0, 1), \tag{27}$$

where $\hat{\tilde{\sigma}}^2_{1,t+1}$ and $\hat{\tilde{\sigma}}^2_{2,t+1}$ are the estimated volatility of respectively method 1 (MSGARCH) and method 2 (either SVR or FNN). The estimated variance of the difference in squared forecast errors is denoted by $\hat{S}^2$ (Peng et al., 2018; Bezerra & Albuquerque, 2017).
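The statistic in (27) can be sketched as follows. This is an illustration under two assumptions that are not taken from the text: $\hat{S}^2$ is estimated by the plain sample variance of the loss differential, without any correction for autocorrelation, and the two-sided p-value is taken from the standard normal distribution in (27). The function name is hypothetical, and the p-values in the results section need not coincide with this simplified version.

```python
import math


def diebold_mariano(proxy, forecast_1, forecast_2):
    """DM statistic (27) and a two-sided p-value under N(0, 1).

    Assumption: S^2 is the sample variance of the loss differential d_t,
    with no autocorrelation (HAC) adjustment.
    """
    d = [
        (p - f1) ** 2 - (p - f2) ** 2
        for p, f1, f2 in zip(proxy, forecast_1, forecast_2)
    ]
    T = len(d)
    d_bar = sum(d) / T
    s2 = sum((x - d_bar) ** 2 for x in d) / (T - 1)
    if s2 == 0.0:
        raise ValueError("loss differential has zero variance")
    dm = sum(d) / math.sqrt(T * s2)
    # 2 * (1 - Phi(|dm|)) written with the complementary error function
    p_value = math.erfc(abs(dm) / math.sqrt(2.0))
    return dm, p_value
```

A positive statistic means that method 1 has the larger squared errors on average, so the sign shows which method the data favor and the p-value shows whether the difference is significant.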


3.5 Dynamic Conditional Correlation Model

The Dynamic Conditional Correlation (DCC) model proposed by Engle (2002) estimates the time varying correlations between time series. The model is defined by

$$\Sigma_t = D_t \rho_t D_t, \qquad D_t = \mathrm{diag}\left(\sqrt{\sigma_{ii,t}}\right), \tag{28}$$

$$\rho_{ij,t} = \frac{q_{ij,t}}{\sqrt{q_{ii,t}\, q_{jj,t}}}, \tag{29}$$

$$Q_t = (1 - \theta_1 - \theta_2)\bar{Q} + \theta_1 \epsilon_{t-1}\epsilon_{t-1}' + \theta_2 Q_{t-1}, \tag{30}$$

$$\epsilon_t = D_t^{-1} a_t, \tag{31}$$

$$Q_t = Var(\epsilon_t \mid F_{t-1}), \qquad \bar{Q} = Var(\epsilon_t). \tag{32}$$

The conditional covariance matrix of the five cryptocurrencies is denoted as $\Sigma_t$ in (28). The matrix $D_t$ contains the univariate conditional standard deviations of the five cryptocurrencies and $\rho_t$ is the matrix with the time varying correlations. The conditional standard deviations are usually estimated with a GARCH model. The correlations are further defined in (29) and (30), where $\epsilon_t$ are the standardized errors as denoted in (31). $Q_t$ and $\bar{Q}$ are respectively the conditional variance and unconditional variance of the standardized errors as described in (32), where $F_{t-1}$ is the information set. The parameters $\theta_1$ and $\theta_2$ are unknown and they are estimated by maximizing the log-likelihood function. The log-likelihood function has two parts:

$$L(\theta) = L_1(\phi) + L_2(\theta), \tag{33}$$

with

$$L_1(\phi) = -\frac{1}{2} \sum_{t} \sum_{i=1}^{n} \left( \log(2\pi) + \log(\sigma_{ii,t}) + r^2_{i,t} / \sigma_{ii,t} \right), \tag{34}$$

where $\phi$ contains the parameters of the GARCH models that estimate $\sigma_{ii,t}$, and

$$L_2(\theta) = -\frac{1}{2} \sum_{t} \left( \log|\rho_t| + \epsilon_t' \rho_t^{-1} \epsilon_t - \epsilon_t' \epsilon_t \right). \tag{35}$$

The first part is the log-likelihood of the sum of univariate GARCH models, as denoted in (34). In this paper, this part is replaced by the volatility estimates of the forecasting method with the highest accuracy. The obtained volatility estimates are used to estimate the standardized errors

$$\hat{\epsilon}_{i,t} = \frac{r_{i,t} - \hat{\mu}_{i,t}}{\sqrt{\hat{\sigma}_{ii,t}}},$$

which are used in the second log-likelihood function. This second log-likelihood function (35) is maximized over $\theta_1$ and $\theta_2$. The estimated $\hat{\theta}_1$ and $\hat{\theta}_2$ are used to obtain the time varying correlations as described in (29) and (30). Note that the standard errors of $\hat{\theta}_1$ and $\hat{\theta}_2$ have to be corrected, since this is required when estimates for $\mu_{i,t}$ and $\sigma_{ii,t}$ are used in the log-likelihood function.
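Given values of $\theta_1$ and $\theta_2$, the recursion in (30) and the rescaling in (29) can be sketched in plain Python. The sketch makes two assumptions that are not stated in the text: $\bar{Q}$ is estimated by the sample second-moment matrix of the standardized errors, and the recursion is started at $\bar{Q}$. The function name is hypothetical, and the maximization of (35) over $\theta_1$ and $\theta_2$ is not shown.

```python
import math


def dcc_correlations(eps, theta1, theta2):
    """Time varying correlation matrices from (29) and (30).

    eps : list of T standardized error vectors, each of length n.
    Assumptions: Q_bar is the sample second-moment matrix of eps and the
    first Q_t equals Q_bar.
    """
    T, n = len(eps), len(eps[0])
    q_bar = [
        [sum(e[i] * e[j] for e in eps) / T for j in range(n)] for i in range(n)
    ]
    q = [row[:] for row in q_bar]
    rhos = []
    for t in range(T):
        if t > 0:
            prev = eps[t - 1]
            # (30): update Q_t from Q_bar, the lagged errors and Q_{t-1}
            q = [
                [
                    (1.0 - theta1 - theta2) * q_bar[i][j]
                    + theta1 * prev[i] * prev[j]
                    + theta2 * q[i][j]
                    for j in range(n)
                ]
                for i in range(n)
            ]
        # (29): rescale Q_t so that the diagonal equals one
        rhos.append(
            [
                [q[i][j] / math.sqrt(q[i][i] * q[j][j]) for j in range(n)]
                for i in range(n)
            ]
        )
    return rhos
```

The rescaling in (29) is what keeps every diagonal element of $\rho_t$ equal to one, so each off-diagonal element can be read as a correlation between two cryptocurrencies at time $t$.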

In short, the forecast performance of the three methods SVR, FNN and MSGARCH is evaluated. Firstly, the methods are specified by using a training and validation dataset for which different specifications are considered. For SVR and FNN, the model with the lowest MSE value is chosen. The choice for the model specification of the MSGARCH is based on the information criteria AIC and BIC. After choosing the specification, the models are used for the forecast of the out-of-sample data. The obtained results are evaluated by the RMSE, the MAE and the Diebold–Mariano test. Also, the time varying correlations of the cryptocurrencies are estimated by DCC.


4 Data

This section describes the data used for the empirical analysis. The volatility of five cryptocurrencies and the volatility of five foreign exchange rates are estimated and forecasted by the aforementioned methods. The results for the cryptocurrencies and the foreign exchange rates are compared to gain a better understanding of their dynamics.

Daily and hourly data of the cryptocurrencies and the foreign exchange rates are obtained respectively from the web pages cryptodatadownload.com and dukascopy.com. The hourly data are used for the sensitivity analysis and the daily data are used for the main analysis, because there are not enough hourly data available to forecast the volatility of the cryptocurrencies before the bubble crash. The data are divided into an out-of-sample data set containing 100 days (starting at Jan. 5th, 2018) and an in-sample data set containing the remaining days. The in-sample data set is further divided into a training set and a validation set containing respectively 75 percent and 25 percent of the in-sample data. The training and validation set are used for the machine learning techniques. Note that a period of more than 100 days (e.g. 200 days) for the out-of-sample data is not considered, because then there are not enough in-sample data for the cryptocurrencies to train the machine learning techniques. For the cryptocurrencies and the foreign exchange rates, missing data are removed. Also, holidays and weekends are removed for the foreign exchange rates since the markets are then closed. This is not the case for the cryptocurrencies, for which the markets are always open.
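The division of the data described above can be sketched as a chronological split. The function name is hypothetical and rounding the training share down to a whole number of observations is an assumption; the 984 placeholder observations only mirror the length of the daily cryptocurrency sample.

```python
def split_samples(series, n_out=100, train_share=0.75):
    """Chronological split into training, validation and out-of-sample parts.

    The last n_out observations form the out-of-sample set; the remaining
    in-sample observations are divided 75/25 without shuffling.
    """
    in_sample = series[:-n_out]
    out_of_sample = series[-n_out:]
    n_train = int(train_share * len(in_sample))  # rounding down is an assumption
    return in_sample[:n_train], in_sample[n_train:], out_of_sample


# Placeholder data with the length of the daily cryptocurrency sample
observations = list(range(984))
train, validation, out_sample = split_samples(observations)
```

The order of the observations is kept, so the validation set always lies after the training set in time and the out-of-sample set after both.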

The exchange rates of the cryptocurrencies Bitcoin, Ethereum, Ripple, Litecoin and Dash are used, denoted respectively as BTC/USD, ETH/USD, XRP/USD, LTC/USD and DASH/USD. The exchange rate in US dollars is chosen because of the availability of the data. Their daily returns are shown in Figure 2 and the descriptive statistics are available in Table 1. These five cryptocurrencies are chosen based on their market cap and data availability. The sample of the cryptocurrencies covers the period August 8th, 2015 - April 16th, 2018 (984 observations), since Ethereum, the youngest cryptocurrency in this data set, was released in 2015. For the hourly data the sample starts on July 1st, 2017 and ends on April 16th, 2018, containing 6938 observations. The graphs of the prices of the currencies are given in Appendix I. These graphs show that all the cryptocurrencies have a bubble, where the explosive growth starts around April 6th, 2017 and the bubble crashes in January 2018. The exact dates of the bubble crashes are not the same for the cryptocurrencies, but the prices of the cryptocurrencies begin to decrease around the second half of January 2018.

In addition, the foreign exchange rates EUR/USD, JPY/USD, CAD/USD, GBP/USD and CHF/USD, corresponding to the euro, Japanese yen, Canadian dollar, pound sterling and Swiss franc, are used. These are among the largest exchange rates in the FX market. For the foreign exchange rates, the exchange rate in USD is chosen because the cryptocurrencies are also expressed in USD. The daily returns of these currencies are illustrated in Figure 2 and the descriptive statistics are available in Table 1. Based on the data availability, the sample covers the period August 3rd, 2003 - April 12th, 2018 (3862 daily observations and 91838 hourly observations) for all the foreign exchange rates. Note that this sample period differs from the sample period of the cryptocurrencies. The comparison of the results therefore allows for the fact that foreign exchange rates have a long history and cryptocurrencies do not.

Comparing the cryptocurrencies and the foreign exchange rates, it is noticeable that the cryptocurrencies have more and greater outliers than the foreign exchange rates. This is visible in Figure 2 and in the boxplots with outliers, which are available in Appendix II. Moreover, the volatility of the cryptocurrencies is much larger than the volatility of the foreign exchange rates. Figure 3 visualizes this great difference in volatility. Also, the standard deviations from Table 1 are much larger for the cryptocurrencies than for the foreign exchange rates.

Table 1: Descriptive statistics

            Mean          St. dev.     Kurtosis    Skewness    Min.          Max.
Bitcoin     0.0034796     0.0427328     3.95699    -0.26808    -0.2014098    0.2381416
Litecoin    0.0033707     0.0727605    19.55442     0.65870    -0.6878138    0.6008487
Dash        0.0050187     0.0771462     5.38808     0.15393    -0.4632239    0.3746934
Ripple      0.0044256     0.0799388    38.46868     3.09134    -0.6162727    1.0273560
Ethereum    0.0059946     0.0812468     3.93626     0.20448    -0.3600027    0.4433843
EUR         2.130e-05     0.0060675     1.83534     0.05918    -0.0256788    0.0346271
GBP         -3.197e-05    0.0059596    12.54709    -0.96148    -0.0846886    0.0304024
JPY         2.951e-05     0.0063876     4.43728     0.04897    -0.0552581    0.0395109
CHF         3.655e-05     0.0066199    12.02726    -0.46634    -0.0906082    0.1952443
CAD         2.767e-05     0.0060022     2.62809    -0.11638    -0.0330894    0.0400334

5 Results

The results of the empirical analysis are presented and evaluated in this section. Firstly, the final specifications of the forecasting methods MSGARCH, SVR and FNN for the forecast starting before the bubble crash are discussed. After that, the performance of the forecasting methods is compared by using the RMSE, the MAE and the Diebold-Mariano test. The forecast performance of the FNN in a multivariate setting and the time varying correlations between the cryptocurrencies are presented in the second part. Finally, a sensitivity analysis discusses the robustness of the results.

5.1 Final Specification

The final specifications of the MSGARCH for the currencies are obtained by selecting the specifications with the lowest AIC and BIC values. Up to three regimes are considered for the standard GARCH, EGARCH, TGARCH and GJRGARCH, each with the Student's t-distribution, the GED, the Gaussian distribution and the skewed versions of all three.

The final specifications of the cryptocurrencies differ from each other, as can be seen in Table 2. Three regimes are optimal only for Litecoin, while two regimes are optimal for Bitcoin, Ethereum, Dash and Ripple. Two regimes indicate a regime for low and a regime for high volatility, and three regimes indicate low, regular and high volatility. This is visible in the estimates of α and β since they differ among the regimes. The stable state probabilities π are also given in Table 2. These are the probabilities π for which it holds that $P\pi = \pi$. Furthermore, the standard GARCH is optimal for all the altcoins and the EGARCH model is optimal for Bitcoin. This means that no asymmetry is found for the altcoins in their conditional volatility, but asymmetry is found for Bitcoin. The EGARCH model takes the leverage effect into account, where past negative returns have more impact on the volatility than positive returns. As for the distributions, none of the cryptocurrencies selects the normal distribution. Bitcoin, Litecoin and Ethereum have respectively the Student's t-distribution, the skewed Student's t-distribution and the GED. It is noticeable that Ripple and Dash have the same final specification with a skewed GED. The parameter estimates of the distributions are ν and ξ, where ξ indicates the amount of skewness. The value ξ=1 corresponds to the symmetric version of the distribution. In Table 2, most of the estimates of ξ are close to 1, which indicates a low skewness level. Only Ripple is an exception, which has a high skewness level for the distribution in regime 2 (ξ2=4.9338).
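The condition $P\pi = \pi$ can be illustrated with a small Python sketch that finds the stable probabilities by repeatedly applying the transition matrix. The transition matrix below is purely hypothetical, since the estimated transition probabilities are not reported in Table 2, and the function name is illustrative. The columns of $P$ are assumed to sum to one, in line with the convention $P\pi = \pi$.

```python
def stable_probabilities(P, iterations=500):
    """Stable state probabilities pi with P pi = pi, found by iteration.

    P[i][j] is the probability of moving from regime j to regime i, so each
    column of P sums to one (assumed convention).
    """
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iterations):
        pi = [sum(P[i][j] * pi[j] for j in range(n)) for i in range(n)]
    return pi


# Hypothetical two-regime chain: stay in regime 1 with probability 0.9
# and in regime 2 with probability 0.8
P_example = [[0.9, 0.2],
             [0.1, 0.8]]
pi_example = stable_probabilities(P_example)
```

For this hypothetical chain the stable probabilities are 2/3 and 1/3, so the more persistent regime is also the one in which the chain spends more time in the long run.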

Table 2: Final specification of MSGARCH for the cryptocurrencies

                        Bitcoin       Litecoin      Ethereum      Dash          Ripple
                        EGARCH-std    GARCH-sstd    GARCH-ged     GARCH-sged    GARCH-sged
α0,1                    -0.0460       0.0000        0.0000        0.0036        0.0000
α1,1                    0.0737        0.0000        0.0165        0.0074        0.1271
α2,1                    0.0786
β1                      0.9975        0.9244        0.9247        0.3062        0.8714
ν1                      3.1585        2.1123        1.0431        0.7000        0.7868
α0,2                    -0.9530       0.0000        0.0006        0.0001        0.0035
α1,2                    0.1329        0.0265        0.1664        0.1907        0.1264
α2,2                    -0.1640
β2                      0.8284        0.9731        0.8032        0.8088        0.2345
ν2                      11.3347       2.2071        1.6293        1.3834        0.8027
α0,3                                  0.0001
α1,3                                  0.0337
β3                                    0.9661
ν3                                    3.8304
ξ1                                    1.3539                      0.9999        1.1237
ξ2                                    1.0933                      1.0528        4.9338
ξ3                                    1.2889

State 1 stable prob.    0.5936        0.0352        0.3978        0.2576        0.8828
State 2 stable prob.    0.4064        0.2439        0.6022        0.7424        0.1172
State 3 stable prob.                  0.7209

AIC                     -3469.4597    -3030.3813    -2330.7538    -2461.8050    -3045.8862
BIC                     -3412.128     -2930.0508    -2282.9773    -2404.4733    -2988.5545

Note: the in-sample data were used to select the final specification for the forecast starting before the bubble crash and all the parameter estimates are significant with p-value<0.01.

The foreign exchange rates have different final specifications for the MSGARCH compared to the cryptocurrencies. The final specifications of the foreign exchange rates are reported in Table 3. For all the foreign exchange rates one regime is optimal, while for the cryptocurrencies multiple regimes are optimal. The standard GARCH is optimal for JPY/USD and CAD/USD and EGARCH is optimal for GBP/USD and CHF/USD. The exchange rate EUR/USD is an exception with GJRGARCH as final specification. So, asymmetry in the conditional volatility is only found for GBP/USD, CHF/USD and EUR/USD. Similar to the cryptocurrencies, none of the foreign exchange rates selects the normal distribution in the final specification. The exchange rates JPY/USD, GBP/USD and CHF/USD have the Student's t-distribution as final specification. The other exchange rates, CAD/USD and EUR/USD, have the GED. Also, no foreign exchange rate has skewness in its distribution, while three cryptocurrencies have skewness in their selected distributions.

Table 3: Final specification of MSGARCH for the foreign exchange rates

        EUR             GBP             JPY           CHF             CAD
        GJRGARCH-ged    EGARCH-std      GARCH-std     EGARCH-std      GARCH-ged
α0,1    0.0000          -0.0785         0.0000        -0.0746         0.0000
α1,1    0.0199          0.0830          0.0532        0.0686          0.0461
α2,1    0.0228          -0.0197                       0.0048
β1      0.9666          0.9925          0.9370        0.9927          0.9479
ν1      1.4943          5.6105          1.0431        6.4484          1.6090
AIC     -28329.1615     -28040.8047     -2330.7538    -27741.8824     -28597.6150
BIC     -28298.0059     -28015.8802     -2282.9773    -27710.7269     -28572.6906

Note: the in-sample data were used to select the final specification for the forecast starting before the bubble crash and all the parameter estimates are significant with p-value<0.01.

As mentioned in Section 3, the method SVR is used twice to estimate equations (14) and (15); hence for each equation a grid search is done for the parameters γ, C and ε by using the training and validation set. The results of the final specification for each equation are presented in Table 4. The final specifications of SVR for the cryptocurrencies have a lot of similarities. All the cryptocurrencies select 0.5 for γ, the parameter of the RBF kernel, in the first and the second equation, except Dash, which selects 1.0 for γ in the second equation. For the cost parameter C, Bitcoin and Litecoin select 0.5 in the first equation, while Ethereum, Dash and Ripple select a higher cost of 1.5. In the second equation all the cryptocurrencies select C=4, which is higher than the cost in the first equation. The loss parameter ε differs among the cryptocurrencies in the first equation, from 0.1 up to 1.0. On the other hand, for the second equation the chosen value of ε is 0.1 for all the cryptocurrencies, except Ripple for which ε=0.2.

Table 4: Final specifications of FNN and SVR

            FNN: neurons per layer    SVR: equation 1             SVR: equation 2
Bitcoin     2-2-1                     γ=0.5, C=0.5, ε=0.7         γ=0.5, C=4, ε=0.1
Ethereum    2-9-1                     γ=0.5, C=1.5, ε=1           γ=0.5, C=4, ε=0.1
Litecoin    2-4-1                     γ=0.5, C=0.5, ε=0.1         γ=0.5, C=4, ε=0.1
Dash        2-5-1                     γ=0.5, C=1.5, ε=1           γ=1, C=4, ε=0.1
Ripple      3-2-1                     γ=0.5, C=1.5, ε=0.9         γ=0.5, C=4, ε=0.2
EUR         4-10-1                    γ=0.5, C=1, ε=1             γ=1.5, C=3.5, ε=0.1
GBP         2-6-1                     γ=0.5, C=1, ε=0.7           γ=0.5, C=4, ε=0.2
JPY         2-9-1                     γ=0.75, C=3.5, ε=1          γ=0.5, C=4, ε=0.1
CHF         2-6-1                     γ=0.75, C=0.5, ε=0.1        γ=2, C=3, ε=0.1
CAD         4-2-1                     γ=0.5, C=0.5, ε=0.9         γ=0.5, C=1, ε=0.1

Note: the training and validation set of the forecast starting before the bubble crash were used.

The final specifications of the SVR for the foreign exchange rates are similar to the specifications of the cryptocurrencies. EUR/USD, GBP/USD and CAD/USD choose γ=0.5 as optimal in the first equation, while JPY/USD and CHF/USD select γ=0.75. For the second equation, JPY/USD, GBP/USD and CAD/USD have the value 0.5 for γ, but it is much higher for CHF/USD and EUR/USD with respectively γ=2.0 and γ=1.5. Furthermore, the cost parameter C is 1.0 for EUR/USD and GBP/USD and 0.5 for CHF/USD and CAD/USD in the first equation. It is noticeable that JPY/USD has a much higher cost value of 3.5. On the other hand, for the second equation the values of C are between 3.0 and 4.0 except for CAD/USD, which has a much lower value of 1.0. The selected values for ε are similar to the selected values of the cryptocurrencies for both the first and second equation.


For the specification of FNN different numbers of neurons are considered in the input and hidden layer, ranging from 1 to 5 neurons for the input layer and 1 to 10 neurons for the hidden layer. As mentioned before, the input neurons are the lags of the volatility proxy. Table 4 presents the final specification of FNN for the cryptocurrencies and the foreign exchange rates after a grid search. All the cryptocurrencies have 2 input neurons, except Ripple, which has 3. Also, the foreign exchange rates GBP/USD, JPY/USD and CHF/USD select 2 input neurons, while EUR/USD and CAD/USD have 4 input neurons. The number of neurons in the hidden layer differs among the currencies. Bitcoin, Ripple and CAD/USD have 2 neurons in the hidden layer. This is much lower compared to the other currencies, which have at least 4 neurons in the hidden layer. Overall, each currency has a different final specification for the forecasting method FNN.

5.2 Comparison of Forecasting Methods

A period of 100 days is forecasted with the final specifications of the forecasting methods MSGARCH, SVR and FNN. The out-of-sample data set starts a few days before the bubble crash of the cryptocurrencies. The accuracy of the forecasts is evaluated by the RMSE and MAE. These values are presented in Table 5. The RMSE and MAE values are much lower for the foreign exchange rates than for the cryptocurrencies, indicating better performance of the forecasting methods for the foreign exchange rates. Note that the volatility of the cryptocurrencies is much higher than that of the foreign exchange rates and therefore it is more difficult to forecast the volatility of the cryptocurrencies. The SVR seems to be the most accurate method in forecasting the volatility because it has the lowest MAE and RMSE values for the majority of the currencies.

Although the MAE and RMSE values are lower for the SVR, for most cases the differences between these values of SVR and MSGARCH are not large. Judged by the RMSE rather than the MAE, the advantage of the machine learning technique SVR over MSGARCH is less pronounced. SVR still has the lowest RMSE values for most of the foreign exchange rates compared to FNN and MSGARCH. However, the forecasting method MSGARCH has the lowest RMSE values for the cryptocurrencies Dash, Ripple and Ethereum. It is noticeable that the FNN does not have the lowest RMSE value for any currency, but it has the lowest MAE value for Ripple. Overall, the machine learning method SVR seems to be the most accurate in forecasting the volatility, followed by MSGARCH and FNN.

Table 5: Evaluation of the volatility forecasts for 100 days (starting before bubble crash)

            MAE                                       RMSE
            MSGARCH       SVR           FNN           MSGARCH       SVR           FNN
Bitcoin     0.0033954     0.0004158     0.0022346     0.0060434     0.0012038     0.0050594
Litecoin    0.0068408     0.0025411     0.0058597     0.0116503     0.0083570     0.0119247
Dash        0.0058699     0.0034036     0.0074382     0.0084376     0.0181711     0.0231798
Ripple      0.0129769     0.0091583     0.0091390     0.0180890     0.0254502     0.0254487
Ethereum    0.0054602     0.0029462     0.0077749     0.0075195     0.0086737     0.0144260
EUR         2.7856e-05    7.1361e-06    3.8958e-05    3.2214e-05    3.0898e-05    7.9019e-05
GBP         2.9474e-05    1.2452e-05    3.3349e-05    4.2762e-05    1.4765e-05    5.6347e-05
JPY         3.0096e-05    7.0618e-06    4.3338e-05    3.5214e-05    1.6663e-05    0.0001111
CHF         2.3845e-05    3.6634e-05    4.3646e-05    2.9025e-05    4.1267e-05    5.0712e-05
CAD         2.7236e-05    1.6576e-05    4.4816e-05    3.9801e-05    0.0001320     0.0001657

Table 6: Diebold-Mariano test results of the forecasts for 100 days

            SVR                          FNN
            Test-statistic    P-value    Test-statistic    P-value
Bitcoin     1.1939            0.2354     -0.0439           0.9651
Litecoin    0.0769            0.9389     -0.3789           0.7056
Dash        -1.0127           0.3137     -1.0059           0.3169
Ripple      -0.9534           0.3427     -1.0048           0.3174
Ethereum    -1.0925           0.2772     -1.5666           0.1204
EUR         -0.9889           0.3251     -1.0119           0.3140
GBP         1.2175            0.2263     -0.6643           0.5081
JPY         0.6333            0.5280     -1.0103           0.3148
CHF         -0.1055           0.9162     -1.7548           0.0824*
CAD         -0.9985           0.3205     -0.9999           0.3198

Note: significance indicated with *p-value<0.1, **p-value<0.05 and ***p-value<0.01.

Table 6 presents the results of the Diebold-Mariano test for the forecast of 100 days. It tests whether the accuracy levels of the forecasting methods SVR and FNN differ significantly from the accuracy of the forecasting method MSGARCH. For both the cryptocurrencies and the foreign exchange rates, the test does not reject the null hypothesis that MSGARCH and SVR have the same accuracy levels. However, FNN has one significant p-value, which rejects the null hypothesis for the foreign exchange rate CHF/USD at a significance level of 10%. The MAE and RMSE values indicate that the MSGARCH is more accurate than FNN for CHF/USD. For the other currencies the test does not reject the null hypothesis that FNN and MSGARCH have the same forecast accuracy levels.

5.3 Multivariate Analysis

The MSGARCH and SVR cannot be used in a multivariate setting, but the FNN can be used by taking 5 input neurons, each corresponding to a lag of the volatility proxy of one of the cryptocurrencies, 10 hidden neurons and 5 output neurons, each corresponding to the forecasted volatility of one cryptocurrency. The MAE and RMSE are estimated for each output neuron. The estimated MAE and RMSE values of the forecast of 100 days are presented in Table 7. The MAE values are lower than those of the univariate FNN in Table 5 for all five cryptocurrencies. The RMSE values are lower for Bitcoin, Dash and Ethereum, but higher for Litecoin and Ripple. Although the multivariate FNN mostly improves on the univariate FNN, the SVR remains the most accurate forecasting method for the majority of the currencies. The mostly lower MAE and RMSE values may indicate that there is correlation between the cryptocurrencies.

Hence, the time varying correlations are calculated for the cryptocurrencies with the DCC model. The time varying correlations may show whether the returns of the cryptocurrencies are driven by one of the cryptocurrencies. Even though the Diebold-Mariano test has not found differences in the accuracy levels of SVR and MSGARCH, the RMSE and MAE values favor the forecast performance of the machine learning method SVR. Therefore, the univariate estimates of SVR are used to estimate the standardized errors of the five cryptocurrencies, which are used in the second-step log-likelihood function of the DCC model (i.e. equation (35)). After maximizing the second-step log-likelihood function, the DCC parameters and the time varying correlations are obtained. Figure 4 shows the estimated time varying correlations between the cryptocurrencies.

Table 7: FNN for the multivariate case of the cryptocurrencies (forecast 100 days)

            MAE          RMSE
Bitcoin     0.0021691    0.0027323
Litecoin    0.0053240    0.0318296
Dash        0.0064026    0.0151002
Ripple      0.0075431    0.1085830
Ethereum    0.0068583    0.0104361

The time varying correlations are estimated for the entire data period. It is noticeable that the time varying correlations are quite stable before the explosive growth of the bubble, which are approximately the first 600 values (from Aug. 15th, 2015 – April 6th, 2017). All the time varying correlations show a drop in the correlations around observation 600, which is at April 6th, 2017. Between 600 and the last observation (from April 6th, 2017 - April 16th, 2018) the correlations of the cryptocurrencies increase overall to a higher correlation level compared to the period before 600 (April 6th, 2017). In the period before the bubble crash (April 6th, 2017 - January 13th, 2018) the prices of the cryptocurrencies increase explosively (see Appendix I) while the correlations also increase. All the graphs show a drop in the time varying correlations around observation 880 (January 13th, 2018), which is the moment where the bubbles of the cryptocurrencies crash. After the bubble burst (from January 13th, 2018 - April 16th, 2018), the correlations between the cryptocurrencies increase to their highest correlation value. In this period the prices of cryptocurrencies decrease while their correlations increase.

The first four graphs in Figure 4 show the time varying correlations between Bitcoin and the altcoins. The correlation between Bitcoin and Ethereum goes up and down around an overall correlation of 0.2 from the first observation up to 600 (Aug. 15th, 2015 – April 6th, 2017). For the same period the time varying correlations between Bitcoin and the other altcoins Ripple, Dash and Litecoin are more stable compared to the time varying correlations between Bitcoin and Ethereum. These time varying correlations of Ripple, Dash and Litecoin are respectively around 0.2, 0.25 and 0.45 in this period. From April 6th, 2017 the time varying correlations between Bitcoin and Ethereum, Ripple and Dash increase approximately to 0.40 and the time varying correlations between Bitcoin and Litecoin increase to 0.55. It is noticeable that the time varying correlations of Bitcoin and the altcoins are positive over the entire period.

The remaining graphs are the time varying correlations between the altcoins. These time varying correlations show positive correlations, except for the correlations between Dash and Litecoin and the correlations between Ripple and Dash, which become negative around observation 600 (April 6th, 2017). Ripple and Dash seem to be the least correlated among the altcoins with the highest correlation being 0.25. On the other hand, Ethereum and Litecoin obtain the highest correlation of 0.5. The time varying correlations between Ethereum and Litecoin and between Ethereum and Dash vary the most in time compared to the other time varying correlations of the altcoins.

Overall, the time varying correlations indicate that the cryptocurrencies have positive correlations between each other and that they drive each other's returns, and therefore each other's volatility, to some extent. It follows that the investment risk of one cryptocurrency cannot be managed by regular hedging when using the other cryptocurrencies for the hedge because the correlations between the cryptocurrencies are not negative. However, cross hedging can be used to manage this risk since cross hedging requires positive time varying correlations between assets. When using cross hedging, opposing positions in the cryptocurrencies are taken to reduce the investment risk.

5.4 Sensitivity Analysis

A period of 50 days that starts after the bubble crash (February 25th, 2018) is also forecasted. After obtaining the final specifications and using them to forecast this period of 50 days, the MAE and RMSE are computed and presented in Table 8. SVR has the lowest MAE values for the cryptocurrencies Bitcoin, Dash and Ripple, whereas MSGARCH has the lowest MAE values for the other two cryptocurrencies, Litecoin and Ethereum. The RMSE values of the cryptocurrencies show the same ranking as the MAE values, except for Dash, where MSGARCH has a lower RMSE value than SVR. The foreign exchange rates EUR/USD, GBP/USD, JPY/USD and CAD/USD have the lowest MAE and RMSE values for SVR. However, MSGARCH has much lower MAE and RMSE values than SVR for the foreign exchange rate CHF/USD. MSGARCH also has lower forecast errors than FNN for most currencies. Altogether, given the MAE and RMSE values, SVR seems to forecast the volatility of the cryptocurrencies and foreign exchange rates more accurately than MSGARCH and FNN.

Table 8: Evaluation of the volatility forecasts for 50 days (after bubble crash)

          MAE                                 RMSE
          MSGARCH     SVR         FNN         MSGARCH     SVR         FNN
Bitcoin   0.0026138   0.0002692   0.0014361   0.0037972   0.0003265   0.0015517
Litecoin  0.0032995   0.0041164   0.0060726   0.0042854   0.0167620   0.0145085
Dash      0.0055677   0.0043801   0.0101079   0.0060792   0.0230939   0.0317892
Ripple    0.0041035   0.0034417   0.0043978   0.0051332   0.0035415   0.0047315
Ethereum  0.0041473   0.0042531   0.0095036   0.0051141   0.0114029   0.0182681
EUR       2.0985e-05  5.3969e-06  4.0082e-05  2.3700e-05  7.5991e-06  5.3278e-05
GBP       1.9274e-05  5.7178e-06  8.7893e-05  2.4856e-05  7.0270e-06  9.1100e-05
JPY       2.7311e-05  6.4405e-06  3.5018e-05  3.5868e-05  7.7792e-06  7.4734e-05
CHF       2.1549e-05  0.0002099   4.8327e-05  2.4854e-05  0.0002149   7.0069e-05
CAD       2.4426e-05  4.0943e-06  4.3976e-05  3.1281e-05  6.2707e-06  5.4524e-05
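The two error measures reported in Table 8 can be sketched as follows. This is a minimal illustration and not the code used for the thesis: the function names and the numbers are hypothetical, and `actual` stands for whatever proxy of realized volatility the forecasts are compared against. The example is constructed so that one forecast wins on MAE but loses on RMSE, a pattern that can arise when a forecast is usually close but has a few large misses, which may be what happens for Dash.

```python
import math


def mae(actual, forecast):
    """Mean absolute error: the average size of the forecast errors."""
    errors = [a - f for a, f in zip(actual, forecast)]
    return sum(abs(e) for e in errors) / len(errors)


def rmse(actual, forecast):
    """Root mean squared error: squaring gives large errors more weight."""
    errors = [a - f for a, f in zip(actual, forecast)]
    return math.sqrt(sum(e * e for e in errors) / len(errors))


# Hypothetical volatility values, for illustration only.
actual = [0.010, 0.020, 0.030, 0.040]
steady = [0.013, 0.017, 0.033, 0.037]  # every error has size 0.003
spiky = [0.010, 0.020, 0.030, 0.050]   # exact except for one miss of 0.010

for name, forecast in (("steady", steady), ("spiky", spiky)):
    print(name, mae(actual, forecast), rmse(actual, forecast))
```

In this example the spiky forecast has the lower MAE, while the steady forecast has the lower RMSE, so the two measures need not rank the methods identically.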

Table 9 presents the test statistics and p-values of the Diebold-Mariano test for the forecast of 50 days. For Bitcoin and Ripple, the null hypothesis that MSGARCH and SVR have the same forecast accuracy is rejected at the 10% significance level. Based on the MAE and RMSE values, SVR outperforms MSGARCH in forecasting the volatility of Bitcoin and Ripple. For Bitcoin, the null hypothesis that FNN and MSGARCH have the same forecast accuracy is also rejected at the 10% significance level. Given the MAE and RMSE values, FNN forecasts the volatility of Bitcoin more accurately than MSGARCH. For the other cryptocurrencies the null hypothesis is not rejected, which means that the forecast accuracies of SVR and FNN do not differ significantly from the forecast accuracy of MSGARCH.

Table 9: Diebold-Mariano test results of the forecasts for 50 days

          SVR                          FNN
          Test-statistic  P-value      Test-statistic  P-value
Bitcoin     1.6890        0.0978*        1.6779        0.1000*
Litecoin   -0.9995        0.3227        -0.9992        0.3228
Dash       -0.9997        0.3226        -1.0036        0.3207
Ripple      1.7343        0.0894*        1.0852        0.2834
Ethereum   -1.2551        0.2157        -1.5994        0.1164
EUR         2.2589        0.0286**      -1.3470        0.1844
GBP         1.5117        0.1373        -4.2183        0.0001***
JPY         1.0522        0.2981        -1.1411        0.2596
CHF       -12.562         0.0000***     -1.0087        0.3183
CAD         1.0816        0.2849        -1.2635        0.2127

Note: each method is tested against MSGARCH. *, ** and *** denote significance at the 10%, 5% and 1% level, respectively.

For the foreign exchange rate EUR/USD, the null hypothesis is also rejected for SVR, in this case at the stricter significance level of 5%. The MAE and RMSE values indicate that SVR is more accurate than MSGARCH. For the foreign exchange rates GBP/USD and CHF/USD, the null hypothesis is rejected at the 1% significance level for FNN and SVR, respectively, but in favor of MSGARCH, since the MAE and RMSE values are lower for MSGARCH. For the other foreign exchange rates the null hypothesis is not rejected, and hence no significant difference in forecast performance is found.
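The logic of the Diebold-Mariano comparison can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used for the thesis: it assumes one-step-ahead forecasts (so no autocovariance correction is applied), uses the asymptotic normal distribution for the p-value, and orders the loss differential so that a positive statistic means the benchmark has the larger loss. The thesis does not state here which loss function or reference distribution underlies Tables 9 and 11, so the output need not reproduce the reported p-values exactly. All names and numbers are hypothetical.

```python
import math


def diebold_mariano(actual, benchmark, alternative, loss="abs"):
    """Return (statistic, two-sided p-value) for H0: equal forecast accuracy.

    The loss differential is loss(benchmark) - loss(alternative), so a
    positive statistic points to a more accurate alternative forecast.
    """
    if not (len(actual) == len(benchmark) == len(alternative)):
        raise ValueError("all series must have the same length")
    g = abs if loss == "abs" else (lambda e: e * e)
    d = [g(y - fb) - g(y - fa)
         for y, fb, fa in zip(actual, benchmark, alternative)]
    n = len(d)
    mean_d = sum(d) / n
    var_d = sum((x - mean_d) ** 2 for x in d) / (n - 1)
    if var_d == 0.0:
        raise ValueError("loss differential has zero variance")
    stat = mean_d / math.sqrt(var_d / n)
    # Two-sided p-value from the standard normal: 2 * (1 - Phi(|stat|)).
    p_value = math.erfc(abs(stat) / math.sqrt(2.0))
    return stat, p_value


# Hypothetical volatility series, for illustration only.
actual = [0.010, 0.020, 0.015, 0.030, 0.025, 0.020]
benchmark = [0.014, 0.017, 0.020, 0.026, 0.028, 0.015]
alternative = [0.011, 0.019, 0.017, 0.029, 0.026, 0.018]

stat, p_value = diebold_mariano(actual, benchmark, alternative)
print(stat, p_value)
```

A rejection only says that the two accuracies differ. Which method is the more accurate one follows from the sign of the statistic or, as in the discussion above, from the MAE and RMSE values.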

The results of the forecast of 50 days are comparable to the results of the forecast of 100 days. However, in most cases the MAE and RMSE values of the forecast of 50 days are smaller than those of the forecast of 100 days. This could be due to the shorter forecast period; moreover, for the cryptocurrencies this period falls after the bubble crash, which makes the volatility less difficult to forecast. For both forecasts (100 and 50 days), the MAE and RMSE values are in most cases smaller for SVR than for MSGARCH and FNN. The Diebold-Mariano test results show more significant differences for the forecast of 50 days than for the forecast of 100 days, but for most currencies the null hypothesis is still not rejected.

Table 10: Evaluation of the volatility forecasts for hourly data (forecasts for 100 hours)

          MAE                                 RMSE
          MSGARCH     SVR         FNN         MSGARCH     SVR         FNN
Bitcoin   0.0001948   4.2121e-05  0.0001793   0.0002517   5.908e-05   0.0002450
Litecoin  0.0003264   7.3941e-05  0.0003073   0.0003619   8.9405e-05  0.0004599
Dash      0.0003048   0.0007426   0.0010161   0.0003894   0.0067295   0.0072376
Ripple    0.0003029   0.0001692   0.0004910   0.0010005   0.0003075   0.0015202
Ethereum  0.0002686   5.9337e-05  0.0002349   0.0003475   0.0001035   0.0003824
EUR       1.7712e-06  2.9466e-07  2.6354e-06  1.9323e-06  3.5398e-07  3.8614e-06
GBP       7.4138e-06  1.6542e-06  3.0437e-06  1.5894e-05  1.8623e-06  1.5035e-05
JPY       9.1749e-07  4.8897e-07  6.7600e-06  1.2392e-06  5.7883e-07  6.9118e-06
CHF       3.4720e-06  6.3624e-06  1.7206e-06  3.7252e-06  6.7057e-06  1.9057e-06
CAD       2.1008e-06  3.8080e-07  2.1805e-06  2.2720e-06  5.8531e-07  8.3790e-06

Furthermore, the forecasting methods MSGARCH, SVR and FNN are applied to hourly data. After obtaining the final specifications and using them to forecast a period of 100 hours, the MAE and RMSE values are computed. Note that the final specifications of the forecasts of 50 days and 100 hours are comparable to the final specifications of the forecast of 100 days. The main differences in the final specifications are that the GJRGARCH is chosen more often for MSGARCH, that some of the parameter values for SVR are smaller or larger, and that the number of input neurons for FNN is often 3 or 4 instead of 2.

Table 10 presents the MAE and RMSE values of the forecasting methods for a forecast of 100 hours. The MAE and RMSE values of the cryptocurrencies Bitcoin, Litecoin, Ripple and Ethereum are the lowest for SVR. MSGARCH has the lowest MAE and RMSE values only for Dash. The foreign exchange rates EUR/USD, GBP/USD, JPY/USD and CAD/USD also have the lowest MAE and RMSE values for SVR. On the other hand, FNN has the lowest MAE and RMSE values for the foreign exchange rate CHF/USD. Altogether, SVR seems to have the most accurate forecast performance for most cryptocurrencies and foreign exchange rates.

Table 11: Diebold-Mariano test results for hourly data (forecasts for 100 hours)

          SVR                          FNN
          Test-statistic  P-value      Test-statistic  P-value
Bitcoin     1.1765        0.2422         0.1541        0.8778
Litecoin    2.0210        0.0460**      -1.1635        0.2474
Dash       -1.0000        0.3197        -1.0000        0.3197
Ripple      1.0381        0.3017        -0.8707        0.3860
Ethereum    1.4858        0.1405        -0.8385        0.4038
EUR         1.9407        0.0551*       -1.0830        0.2814
GBP         1.4557        0.1486        -0.5853        0.5597
JPY         1.8222        0.0715*      -23.6070        0.0000***
CHF       -11.52          0.0000***      3.7985        0.0003***
CAD         7.5879        0.0000***     -1.1116        0.2690

Comparing the MAE and RMSE values of the hourly data with the daily data, the MAE and RMSE values are much lower for hourly data. So, using higher frequency data improves the forecast performance of all the forecasting methods. However, it should be noted that
