The effect of news sentiment on returns and trading strategies in the context of cryptocurrencies


Johan Starkenburg


Master’s thesis

Degree: MSc Econometrics, Operations Research and Actuarial Studies

Track: Actuarial Studies

Supervisor:

dr. D. Ronchetti


Abstract


1 Introduction

The goal of this paper is to investigate the effect of news sentiment on returns and trading strategies in the context of cryptocurrencies. Therefore, sentiment analysis, cryptocurrencies (in general and in terms of returns and trading strategies) and the interaction between these two phenomena will be elaborated upon in the introduction. In addition, the contributions/hypotheses will be stated and the remainder of the paper will be outlined.

1.1 News sentiment

A formal definition of sentiment analysis is given in Definition 1 (Oxford Dictionary, 2019).

Definition 1 Sentiment analysis is the process of computationally identifying and categorising opinions expressed in a piece of text, especially in order to determine whether the writer’s attitude towards a particular topic, product, etc. is positive, negative, or neutral.

Sentiment analysis is a subset of Natural Language Processing (NLP). NLP concerns processing and analysing natural (i.e. human) languages by means of computer programs. This area of research gained popularity alongside the rise of computers and the internet around the year 2000, although the first research in this area dates back to at least Baldwin (1942). An extensive insight into the foundations of NLP is given in Manning and Schütze (1999), whereas Jurafsky and Martin (2009) discuss more modern concepts in NLP. For conventional stock markets, one of the first attempts to utilise sentiment analysis in predicting stock returns was conducted by Tetlock, Saar-Tsechansky, and Macskassy (2008). A general sentiment database was used to extract the sentiment from a column in the Wall Street Journal, in which the stock market was discussed daily. Tetlock et al. (2008) find a significant link between sentiment and stock market returns. In particular, the following three conclusions are drawn. Firstly, the fraction of negative words in firm-specific news stories forecasts firm earnings. Secondly, firms’ stock prices briefly underreact to the information embedded in negative words. Lastly, the earnings/return predictability is largest for stories that focus on fundamentals.


1.2 Cryptocurrencies

A formal definition of a cryptocurrency is given in Definition 2 (Oxford Dictionary, 2019).

Definition 2 A cryptocurrency is a digital currency in which encryption techniques are used to regulate the generation of units of currency and verify the transfer of funds, operating independently of a central bank.

Nowadays, these so-called cryptocurrencies are traded on various exchanges, similar to traditional stock market exchanges. To understand where cryptocurrencies originate from, some historical context on conventional stock markets is provided. The concept of conventional stock markets has been around for a long time. Although there is some discussion about the beginning of corporate stock trading, some see the Initial Public Offering (IPO) of The Dutch East India Company in 1602 as the key event (Beattie, 2017). Thereafter, it still took several centuries for stock markets to mature. On the one hand, quantitative research on stock markets dates back to at least 1934 with the introduction of fundamental analysis by Graham and Dodd (Lehoczky and Schervish, 2018). On the other hand, large-scale quantitative analysis of metrics such as price and volume was difficult to perform before 1958 due to the lack of a central database of stock market information. The widespread interest in stock markets and the existence of a central database led to a considerable amount of literature pertaining to quantitative finance. Lehoczky and Schervish (2018) describe the most important developments of the last 60 years.

Large financial institutions such as banks and insurers currently have a great deal of power, since many transactions are only possible via these institutions. This leads to a trust-based model, which inherently suffers from weaknesses such as the possibility of fraud (Nakamoto, 2009). Therefore, researchers felt the need to decentralise this power, which amongst other things led to the development of Bitcoin at the beginning of 2009 (Nakamoto, 2009). Subsequently, numerous other cryptocurrencies have been developed. At first, cryptocurrencies did not receive much attention from either the media or academics. However, more recently, cryptocurrencies have received extensive media coverage, mostly due to erratic price movements. This also led to academic literature on cryptocurrencies. The literature pertaining to cryptocurrencies can roughly be split into two distinct areas. Firstly, regulation and governance issues are addressed by several papers. For example, in Campbell-Verduyn (2018) the issue of money laundering with respect to cryptocurrencies is investigated. They find that the digital assets themselves do not pose the greatest hazard in terms of money laundering. Instead, it is the underlying blockchain technology that presents the most important threat. In addition, they stress the importance of continual monitoring and investigation of ethical implications raised by cryptocurrencies. Secondly, existing quantitative financial research has been applied to cryptocurrencies. This includes, for example, research regarding returns (Fry, 2018; Phillips and Gorse, 2018) and price bubbles (Cheah and Fry, 2015).


1.3 Combining news sentiment and cryptocurrencies

In the cryptocurrency domain numerous articles can be found that try to link sentiment to various cryptocurrency metrics. For example, Georgoula et al. (2015) use Twitter posts as a proxy for sentiment and show that this sentiment can be used to explain Bitcoin price movements. Moreover, Baiga, Blaub, and Sabaha (2019) use Google trends data in order to link sentiment to Bitcoin price clustering; again they find a significant link. The two most popular proxies for sentiment are Twitter feeds (e.g. Steinert and Herff, 2018; Garcia and Schweitzer, 2015) and Google trends data (e.g. Baiga et al., 2019; Nasir et al., 2019). Data from Wikipedia or Reddit is sometimes also considered (Steinert and Herff, 2018). Although not yet considered in existing literature, news blogs could also potentially be a good proxy for sentiment. Twitter posts are very short by definition and can therefore not include elaborate analyses. Also, Google trends data only contains search volume information. Articles on news blogs generally provide more information, since they are often relatively long and tend to contain thorough analyses. Interestingly, none of the existing research seems to use a GARCH modelling approach when establishing a link between news sentiment and cryptocurrency returns. For example, Nasir et al. (2019) use a VAR approach and Phillips and Gorse (2018) use a Wavelet approach. Other articles, such as Garcia and Schweitzer (2015), focus directly on trading strategies instead of returns.

1.4 Contributions and hypotheses

Based on the literature review in sections 1.1, 1.2 and 1.3, the most important contributions of this paper are given below. Firstly, this paper considers news blogs as a news proxy. The existing literature has mostly used Twitter feeds or Google trends data in order to obtain a quantitative news sentiment score. This data is readily available, although some data has to be paid for. Obtaining news sentiment based on news blogs is more complicated, since all relevant articles need to be obtained and processed individually. Secondly, this article includes both basic and sophisticated news sentiment models. As can be inferred from section 1.1, existing literature has mostly focused on either basic or sophisticated models. In particular, older literature tends to focus on basic models and more recent literature tends to focus on sophisticated models. Note that cryptocurrencies are a relatively new phenomenon. As a consequence, sophisticated sentiment models are used in existing research that considers news sentiment in the context of cryptocurrencies. By considering both basic and sophisticated models, this article will be able to assess whether the additional computational burden introduced by sophisticated models can be justified. Thirdly, GARCH models are used to assess the effect of news sentiment on cryptocurrency returns. As mentioned in section 1.3, it seems that existing literature has not yet modelled the effect of news sentiment on cryptocurrencies using a GARCH-type model. Lastly, another distinguishing factor is the large number of cryptocurrencies and news blogs considered. A large number of cryptocurrencies (29 in total) are investigated, whereas most articles only consider one cryptocurrency (often Bitcoin). By investigating numerous cryptocurrencies, this article will be able to determine whether behaviour across cryptocurrencies is similar. Furthermore, 15 news blogs are considered, whereas other articles have only taken into account news from one media source (e.g. Twitter or Google). By combining multiple news sources, a more representative news proxy is obtained.

A number of hypotheses will be assessed during the course of this paper. These hypotheses are based on the literature review in sections 1.1, 1.2 and 1.3. The hypotheses are stated below and are split into two main hypotheses (numbered with a one) and two sub-hypotheses (numbered with a two).

Hypothesis 1a News sentiment has a significant effect on cryptocurrency returns.

Hypothesis 2a Sophisticated models for news sentiment are better able to assist in the modelling of cryptocurrency returns than basic news sentiment models.

Hypothesis 1b Including news sentiment increases the profitability of trading strategies.

Hypothesis 2b Sophisticated models for news sentiment are better able to improve the profitability of trading strategies than basic news sentiment models.


2 Data Description

For the purpose of this paper two categories of data have been collected, namely data on cryptocurrency metrics (e.g. price and volume) and data concerning news sentiment. Consequently, the data description will be split into three parts. Firstly, the data pertaining to cryptocurrency metrics is described in section 2.1. Secondly, the data related to news sentiment is elaborated upon in section 2.2. Lastly, in section 2.3 the interaction between the cryptocurrency metrics and news sentiment is investigated.

2.1 Cryptocurrency metrics

All data concerning cryptocurrencies was obtained from the website CoinMarketCap (CoinMarketCap, 2019b). This can be considered a representative source, since this website takes into account numerous exchanges on which the cryptocurrencies are traded. The final information is based on a weighted average across exchanges based on the traded volume on each exchange, see CoinMarketCap (2019a) for a more detailed explanation. In order to obtain a complete picture of the cryptocurrency domain, information was scraped regarding the top 100 cryptocurrencies based on market capitalisation (as listed on 02-03-2019). However, many of these cryptocurrencies only have a limited number of observations. Therefore, the cryptocurrencies were filtered based on the time frame for which data was available. The requirement was imposed that there should be at least two years of data available, which results in the time frame from 01-03-2017 until 01-03-2019. This time frame will be denoted by $T$, i.e. $T = \{\text{01-03-2017}, \ldots, \text{01-03-2019}\}$.

After filtering, a total of 29 cryptocurrencies remain, which are given by (name: ticker symbol)

Bitcoin: BTC, Ethereum: ETH, XRP: XRP,

Litecoin: LTC, Tether: USDT, Stellar: XLM,

Monero: XMR, Dash: DASH, NEO: NEO,

Ethereum Classic: ETC, NEM: XEM, Zcash: ZEC,

Waves: WAVES, Dogecoin: DOGE, Decred: DCR,

Augur: REP, Lisk: LSK, DigiByte: DGB,

BitShares: BTS, Steem: STEEM, Komodo: KMD,

Siacoin: SC, Verge: XVG, Stratis: STRAT,

Golem: GNT, MaidSafeCoin: MAID, Factom: FCT,

PIVX: PIVX, Zcoin: XZC.

The data description will not focus on all 29 cryptocurrencies. In particular, the best known cryptocurrency, namely Bitcoin, will be inspected. Additionally, a so-called altcoin index, named Altcoin28 with ticker symbol ALT, will be created, which will be based on the other 28 cryptocurrencies. This is in line with Steinert and Herff (2018), who define altcoins to be all cryptocurrencies except Bitcoin. This index is constructed analogously to the S&P500 index, see The-McGraw-Hill-Companies (2011) for details pertaining to the calculation. In this way, all data is adequately described, while preventing an overly large data description. Also, it could be useful for future research if real (i.e. tradable) altcoin indices are created. Therefore, the analysis in this paper concerning returns will also include the Altcoin28. Descriptive statistics for Bitcoin and the Altcoin28 can be found in Table 1. Similar descriptive statistics concerning all cryptocurrencies can be found in Tables 14, 15, 16 and 17 in the appendix.
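As an illustration of what "analogously to the S&P500 index" could look like in practice, the sketch below computes a capitalisation-weighted index level as the total constituent market capitalisation divided by a fixed divisor. This is a minimal sketch only: the base level of 100, the function names and the toy market capitalisations are assumptions made for illustration and are not taken from the thesis.

```python
def index_divisor(base_market_caps, base_level=100.0):
    """Divisor that sets the index equal to base_level on the base date.

    The base level of 100 is an assumed convention, not a value from the thesis.
    """
    return sum(base_market_caps) / base_level


def index_level(market_caps, divisor):
    """Capitalisation-weighted index level: total market cap divided by the divisor."""
    return sum(market_caps) / divisor


# Toy market capitalisations (USD) of three hypothetical constituents on two dates.
caps_day0 = [4.0e9, 1.0e9, 5.0e8]
caps_day1 = [4.4e9, 0.9e9, 5.5e8]

divisor = index_divisor(caps_day0, base_level=100.0)
level0 = index_level(caps_day0, divisor)
level1 = index_level(caps_day1, divisor)

# Simple daily index return in percent.
index_return_pct = 100.0 * (level1 / level0 - 1.0)
```

Because the divisor is held fixed, the index return equals the return on the total market capitalisation of the constituents, so larger coins carry more weight.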

Table 1: Descriptive statistics concerning the Bitcoin and the Altcoin28.

          Price (USD)          Return (%)       Volume (USD)         Market Cap (USD)
          BTC       ALT        BTC     ALT      BTC       ALT        BTC       ALT
Min.      937.5     148.5      -18.7   -23.4    1.3e+08   6.0e+07    1.5e+10   3.0e+09
0.25 Q.   3511.9    2016.6     -1.8    -2.3     1.8e+09   2.0e+09    6.1e+10   4.0e+10
Mean      6006.2    3923.0     0.3     0.5      4.7e+09   5.8e+09    1.0e+11   7.8e+10
0.75 Q.   7666.1    4827.7     2.5     3.3      6.1e+09   8.0e+09    1.3e+11   9.7e+10
Max.      19497.4   17233.5    25.2    26.9     2.4e+10   2.7e+10    3.3e+11   3.4e+11
S.d.      3580.9    3021.6     4.6     5.7      3.8e+09   4.5e+09    6.0e+10   6.0e+10


When inspecting Table 1, note that the Altcoin28 price is constructed with a deterministic divisor (analogous to the S&P500). However, the other variables provide some insight into altcoins and their differences with respect to Bitcoin. First of all, observe that the returns of the Altcoin28 seem to be even more extreme than the returns of Bitcoin. However, both return series are centred quite close to zero. Secondly, Bitcoin dominates the cryptocurrency space in terms of market capitalisation and volume. With regard to Tables 14 and 15 it can be concluded that there exist significant differences between the individual cryptocurrencies in terms of price and returns, e.g. returns as low as -50% and as high as 560% are observed. Tables 16 and 17 contain similar figures/conclusions as Table 1. Inspecting the variables in Table 1 across time may provide additional insights. This can be done in Figure 1. These graphs can again be found for the individual cryptocurrencies in Figures 12 and 13 in the appendix.

[Figure panels: Price Bitcoin (USD) and Price Altcoin28 (USD) against Date; Return Bitcoin (%) and Return Altcoin28 (%) against Date. Date axis: 2017-07 to 2019-01.]

Figure 1: Plots concerning prices and returns. The two plots on the left hand side represent the closing prices of Bitcoin and the Altcoin28 across time. The two plots on the right hand side illustrate the returns of Bitcoin and the Altcoin28 across time.

From Figure 1 it can be argued that the price paths of Bitcoin and the Altcoin28 are highly positively correlated. Both start very low, thereafter they experience a surge in prices just before the beginning of 2018. Afterwards, the prices drop sharply and eventually become more stable. Both return series experience extreme negative/positive returns and there seems to be volatility clustering, i.e. extreme values appear in groups. In Figures 12 and 13 similar patterns can be observed. Additionally, it can be argued that the individual cryptocurrencies behave significantly differently. With respect to the upcoming models section it is interesting to investigate the autocorrelations, which can be done in Figure 2.

[Figure 2 panels: ACF Bitcoin Returns and ACF Altcoin28 Returns against Lag (0 to 20), each with a 95% confidence interval.]


Inspecting Figure 2, it becomes apparent that autocorrelation is not likely to cause any issues. In particular, the return series for Bitcoin does not seem to exhibit any significant autocorrelation. Additionally, it is insightful to inspect the histogram of returns with respect to the upcoming models section. The cryptocurrency returns will be modelled by means of a GARCH model. Hence, a suitable innovation distribution needs to be chosen, which will likely be a distribution with heavier than normal tails (Stavroyiannis, 2018). As can be seen in Figure 3, the returns are more peaked than a normal distribution and they exhibit heavier tails.
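The autocorrelation check described above can be sketched as follows. The code computes the sample autocorrelation function and compares it with the usual approximate 95% band of plus or minus 1.96 divided by the square root of the sample size, which holds under a white-noise null. The thesis does not state how its confidence interval was computed, so this band, the function names and the toy return series are assumptions for illustration.

```python
import math


def sample_acf(x, max_lag):
    """Sample autocorrelations of x for lags 0, 1, ..., max_lag."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    denom = sum(d * d for d in dev)
    acf = []
    for k in range(max_lag + 1):
        # Sum of products of deviations that are k periods apart.
        num = sum(dev[t] * dev[t - k] for t in range(k, n))
        acf.append(num / denom)
    return acf


def white_noise_band(n, z=1.96):
    """Approximate 95% band for the ACF of white noise (assumed construction)."""
    return z / math.sqrt(n)


# Toy daily returns in percent; hypothetical values, not thesis data.
returns = [0.5, -1.2, 2.0, -0.3, 0.1, -2.5, 1.8, 0.4, -0.9, 0.7]

acf = sample_acf(returns, max_lag=3)
band = white_noise_band(len(returns))

# Lags whose autocorrelation falls outside the band.
significant_lags = [k for k in range(1, len(acf)) if abs(acf[k]) > band]
```

Lag 0 always has autocorrelation one and is therefore excluded from the comparison with the band.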

[Figure panels: histograms of Return Bitcoin (%) and Return Altcoin28 (%) against Density, each overlaid with a normal density.]

Figure 3: Histogram concerning the return of Bitcoin and the Altcoin28. The normal densities were constructed with the same mean and variance as the respective series.

Lastly, the contemporaneous and lagged correlations between returns and volumes are investigated, since previous literature has found that volume could be used for predicting returns for conventional assets (see e.g. Ülkü and Onishchenko, 2019). Figure 4 contains a correlation diagram pertaining to the contemporaneous and lagged correlation between returns and volumes. As expected, the contemporaneous volumes and returns are all highly correlated. Significant negative correlation between lagged volume and current return exists for Bitcoin, yet this correlation is not observed for the Altcoin28.

[Figure 4: correlation diagram, correlations in % on a colour scale from -100 to 100. Ret. = return, Vol. = volume; subscript t denotes the current day and t-1 the previous day.]

                Ret.BTC_t  Ret.ALT_t  Ret.BTC_t-1  Ret.ALT_t-1  Vol.BTC_t  Vol.ALT_t  Vol.BTC_t-1  Vol.ALT_t-1
Ret. BTC_t      100        62         1            -6           -5         -11        -8           -9
Ret. ALT_t      62         100        -1           0            -3         -5         -1           -6
Ret. BTC_t-1    1          -1         100          62           -4         -11        -5           -11
Ret. ALT_t-1    -6         0          62           100          -2         -4         -3           -5
Vol. BTC_t      -5         -3         -4           -2           100        89         94           83
Vol. ALT_t      -11        -5         -11          -4           89         100        84           92
Vol. BTC_t-1    -8         -1         -5           -3           94         84         100          89
Vol. ALT_t-1    -9         -6         -11          -5           83         92         89           100
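The contemporaneous and lagged correlations discussed above can be computed along the following lines. This is a minimal sketch using the Pearson correlation coefficient; the thesis does not state which correlation measure underlies the diagram, so the choice of Pearson, the function names and the toy series are assumptions.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def lagged_correlation(current, other, lag=1):
    """Correlation between current[t] and other[t - lag]; lag=0 is contemporaneous."""
    if lag == 0:
        return pearson(current, other)
    # Drop the first `lag` values of `current` and the last `lag` values of `other`
    # so that current[t] is paired with other[t - lag].
    return pearson(current[lag:], other[:-lag])


# Toy series (hypothetical): today's return is minus yesterday's volume.
volume = [1.0, 2.0, 4.0, 3.0, 5.0, 7.0]
ret = [0.0, -1.0, -2.0, -4.0, -3.0, -5.0]

corr_lagged = lagged_correlation(ret, volume, lag=1)
corr_contemporaneous = lagged_correlation(ret, volume, lag=0)
```

In the toy data the lagged correlation is exactly -1 by construction, which makes the alignment of the two series easy to verify.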


2.2 News sentiment

As mentioned in Steinert and Herff (2018), sentiment research concerning cryptocurrencies up till now has mostly been based on Google trends, Wikipedia, Reddit or Twitter. This paper sets out to employ a different news source as the basis of the analysis, namely news articles on dedicated cryptocurrency blogs. Since the whole cryptocurrency scene is relatively immature, it is difficult to establish authorities in this area. For conventional stock markets, established news sources such as the Wall Street Journal (e.g. Tetlock et al., 2008) could be investigated. However, for cryptocurrencies this is not an option. Therefore, this paper relies on an algorithmic ranking of cryptocurrency blogs provided by the website Detailed (Allsopp, 2019a). The blogs are ranked according to the number of so-called mentions, for details on how this quantity is computed see Allsopp (2019b). In this way an objective list of cryptocurrency blogs was obtained. The analysis of this paper is based on the top 15 cryptocurrency blogs according to Detailed (retrieved on 17-03-2019). The Coinbase Blog was excluded, since articles on this website were deemed not representative as they mostly promoted their own exchange, i.e. Coinbase. Again, data was collected in time period $T$ where possible. However, some websites were founded later than the begin date of $T$. Also, for scraping convenience some articles outside $T$ were collected, which are not included in the analysis. The collection of the data was done in a two-step procedure. Firstly, the URLs of the desired articles were extracted from the internet. Secondly, using the obtained URLs, the articles themselves were scraped from the internet using a custom-made web scraper written in R. This web scraper, accompanying functions and a separate code appendix elaborately explaining all scripts are available on request. In the continuation of this subsection the collected information per website and per news category will be elaborated upon. Lastly, the way in which the sentiment scores are constructed will be explained in detail.

2.2.1 Information per website

A first overview concerning the available information is given in Table 2.

Table 2: Overview of the included websites. The number of mentions is used to rank the websites (Allsopp, 2019b). The number of URLs and articles represent the unique number of URLs and articles (all and in time frame (itf) $T$) that were collected, respectively. The number of raw categories gives the number of categories that are available for each website. The number of disjoint categories gives the number of categories that were necessary to create disjoint sets from the raw categories.

                   Number of                                          Number of cats.
Website            mentions   URLs    articles (all)  articles (itf)  raw   disjoint
CoinDesk           4923       7809    7800            7585            1     1
Cointelegraph      2376       10285   10201           9902            1     1
CCN                1917       2989    2988            1749            2     3
AMBCrypto          1526       7182    7182            6821            3     4
Bitcoin News       1070       9175    9175            6533            1     1
NewsBTC            958        6893    6893            4443            2     2
Bitcoinist         652        7042    7042            4109            3     7
Bitcoin Magazine   355        4206    4068            1550            1     1
Coinspeaker        195        4260    4189            2729            4     6
CryptoPotato       152        880     880             830             1     1
BTCManager         86         5799    5798            5436            4     8
Use The Bitcoin    86         1674    1669            1632            1     1
Coin Central       78         991     986             986             1     1
The Merkle         48         9193    9190            5294            3     4
Coin idol          46         2414    2410            2259            1     1
Total              -          80792   80471           61858           -     -

Note that itf = in time frame, cats. = categories.


time between the collection of the URLs and articles. Therefore, some articles were taken offline in the meantime or other problems such as false URLs were encountered. Moreover, it can be observed that most websites only have a single raw news category (i.e. a news category that is readily available on a particular website). This implies that these websites do not categorise their articles. However, some websites have multiple raw news categories, in other words these websites do categorise their articles. In total four raw news categories were gathered, which are given by all, bitcoin, ethereum and altcoin. The websites that only have one news category, only have the all category. For websites with multiple raw news categories multicollinearity issues may arise. Suppose article X is in the all and bitcoin category simultaneously. Thus, when using these two categories jointly, both categories include article X. If this is the case for many articles, the two categories are relatively similar, which may lead to problems related to multicollinearity. Therefore, disjoint news categories are constructed to eliminate these issues. For each website, the overlap between the news articles was investigated and the minimum number of disjoint categories were constructed. This resulted in a total of eight new disjoint news categories (DNC), namely

DNC1: other,

DNC2: only bitcoin,

DNC3: only ethereum,

DNC4: only altcoin,

DNC5: bitcoin and ethereum,

DNC6: bitcoin and altcoin,

DNC7: ethereum and altcoin,

DNC8: bitcoin, ethereum and altcoin.

For websites with only one raw news category the category was simply renamed from all to other. Also, the categories are made as specific as possible. For example, if an article is in the raw categories all and bitcoin, the article is placed in disjoint category DNC2. Since there is no overlap between the different DNC categories, they can be used jointly, in contrast to the raw sets. The total number of articles is an important, but not the most relevant metric to consider. Since this paper considers daily time series data, it is more important how the observed articles are distributed across $T$. The number of observations available on each day is shown in Figure 5.
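The assignment of an article to its disjoint news category can be sketched as a lookup on the set of topic-specific raw categories the article belongs to. The mapping follows the list DNC1 to DNC8 given above; the function and variable names are hypothetical and the code is an illustration rather than the scraper used for this paper.

```python
# Disjoint news category for every combination of topic-specific raw categories.
DNC_BY_TOPICS = {
    frozenset(): "DNC1",                                   # other
    frozenset({"bitcoin"}): "DNC2",                        # only bitcoin
    frozenset({"ethereum"}): "DNC3",                       # only ethereum
    frozenset({"altcoin"}): "DNC4",                        # only altcoin
    frozenset({"bitcoin", "ethereum"}): "DNC5",
    frozenset({"bitcoin", "altcoin"}): "DNC6",
    frozenset({"ethereum", "altcoin"}): "DNC7",
    frozenset({"bitcoin", "ethereum", "altcoin"}): "DNC8",
}


def disjoint_category(raw_categories):
    """Map the raw categories of one article to exactly one disjoint category.

    The raw category "all" carries no topic information and is ignored, so an
    article that is only in "all" ends up in DNC1 (other).
    """
    topics = frozenset(raw_categories) - {"all"}
    return DNC_BY_TOPICS[topics]


# Example from the text: an article in the raw categories all and bitcoin.
example = disjoint_category({"all", "bitcoin"})
```

Since every article receives exactly one label, the resulting categories cannot overlap, which is the property needed to use them jointly.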

[Figure panels: Total number of articles and Average number of words against Date. Date axis: 2017-07 to 2019-01.]

Figure 5: Visualisation pertaining to the number of articles and the average number of words in each article for each day in $T$, where $T = \{\text{01-03-2017}, \ldots, \text{01-03-2019}\}$.

Interestingly, in Figure 5 it can be detected that the number of available articles for each day behaves quite erratically over time. Investigation reveals that this is mainly due to the fact that the average number of articles that are published on weekdays (97.77) is significantly greater ($p < 0.01$) than the average number of articles published in the weekend (51.56). In addition, it can be seen that the number of articles per day structurally increases over time at least until the year 2018. Apart from increased public interest in cryptocurrencies, this could be partially explained by the fact that some of the included websites were founded later than the beginning of $T$. Also, the average number of words in each article


posted. The average number of words in an article that is published on weekdays (584.29) is significantly less ($p < 0.01$) than in the weekend (626.51). Therefore, it seems to be the case that during weekdays more articles are posted that are generally shorter, whereas during the weekends fewer articles are posted that are generally longer. This could be explained by hobby authors, i.e. authors for whom writing is not the primary source of income, who have more time in the weekends to write more extensive in-depth analyses.

Ideally, at least one article is available for each blog in each category on each date in $T$. A missing observation in this context occurs if no articles are available for a given website, category and day. It is to be expected that some news categories (e.g. DNC8) have a lot of missing observations in $T$, which would render them useless. In Table 3 the number of missing observations across $T$ is investigated. From Table 3 it can be inferred that CoinDesk, Cointelegraph and Bitcoin News could be used on their own in the analysis; these websites only have a limited number of missing observations. It was decided not to use the other websites by themselves since this would result in numerous missing values. Instead these websites are utilised to obtain a more representative sentiment score by aggregating over all 15 websites, which will be elaborated upon in the following subsection.
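Counting missing observations as defined above amounts to counting the days in $T$ on which a given website and category has no article. The following sketch illustrates this for a single website and category; the function names and the example publication dates are hypothetical.

```python
from datetime import date, timedelta


def days_in_frame(start, end):
    """All calendar days from start up to and including end."""
    return [start + timedelta(days=i) for i in range((end - start).days + 1)]


def count_missing_days(article_dates, start, end):
    """Number of days in the frame on which no article was published."""
    observed = set(article_dates)
    return sum(1 for day in days_in_frame(start, end) if day not in observed)


# Time frame T as used in this paper.
T_START, T_END = date(2017, 3, 1), date(2019, 3, 1)
n_days = len(days_in_frame(T_START, T_END))

# Hypothetical publication dates for one website and category. Two articles fall
# on the same day and one falls outside T, so only two days in T are covered.
example_dates = [date(2017, 3, 1), date(2017, 3, 1), date(2018, 1, 15), date(2019, 3, 2)]
n_missing = count_missing_days(example_dates, T_START, T_END)
```

Applying the same count to every website and category pair yields a table of the kind reported in Table 3.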

Table 3: Overview of the number of missing observations across the various websites and categories. The considered time frame $T$ from 01-03-2017 up to and including 01-03-2019 constitutes a total of 731 days. In the table below the number of missing observations (in terms of days) are presented for each category for each website. If a certain category does not exist for a certain website, "-" is reported. The average number of articles for each date varies heavily between the different websites and categories (as high as 13.5 and as low as 1).

Website            DNC1   DNC2   DNC3   DNC4   DNC5   DNC6   DNC7   DNC8
CoinDesk           4      -      -      -      -      -      -      -
Cointelegraph      0      -      -      -      -      -      -      -
CCN                -      232    553    -      631    -      -      -
AMBCrypto          318    439    -      378    -      591    -      -
Bitcoin News       0      -      -      -      -      -      -      -
NewsBTC            243    500    -      -      -      -      -      -
Bitcoinist         -      52     642    379    677    428    644    671
Bitcoin Magazine   201    -      -      -      -      -      -      -
Coinspeaker        680    328    -      394    -      303    -      -
CryptoPotato       396    -      -      -      -      -      -      -
BTCManager         236    184    516    274    652    302    570    550
Use The Bitcoin    261    -      -      -      -      -      -      -
Coin Central       352    -      -      -      -      -      -      -
The Merkle         225    637    728    -      728    -      -      -
Coin idol          82     -      -      -      -      -      -      -

2.2.2 Information per category

In the previous subsection, the available information per website was investigated. As was mentioned, for some websites not enough observations are available to use them individually in the analysis. However, this data could still be employed in the analysis by aggregating all the data for all the websites. In this way, a more representative sentiment score is obtained in the end. After aggregating, the overview in Table 4 was created, similar to Table 3. From Table 4, it can be observed that it is advisable to only use the categories DNC1 and DNC2 in a joint analysis. The other categories contain too many missing observations in $T$. Consequently, the category DNC9 was constructed, which contains all unique articles except the articles pertaining to Bitcoin. In other words, DNC9 is the union of DNC$i$ for $i \in \{1, 3, 4, 5, 6, 7, 8\}$. This category was created to obtain a more representative score and to avoid losing the articles in the categories DNC$i$ for $i \in \{3, 4, 5, 6, 7, 8\}$.


have the category all (equivalent to DNC9 and DNC1 in this case). Secondly, another sentiment score based on aggregated measures could be composed pertaining to the categories DNC2 and DNC9, which can safely be used jointly. Hence, for a total of five (i.e. CoinDesk, Cointelegraph, Bitcoin News, DNC2 and DNC9) sets of articles a sentiment score time series was constructed, which are used to include sentiment in the analysis in two different ways (i.e. CoinDesk, Cointelegraph and Bitcoin News versus DNC2 and DNC9).

Table 4: Overview of the available information across the various categories after aggregation across websites. The considered time frame $T$ from 01-03-2017 up to and including 01-03-2019 constitutes a total of 731 days. In the table below the number of missing observations (in terms of days), the total number of articles and the average number of words are presented for each category.

             DNC1    DNC2   DNC3   DNC4   DNC5   DNC6   DNC7   DNC8   DNC9
Nr. N.A.     0       1      349    137    525    151    505    511    0
Nr. arts.    44046   8100   568    4993   258    3264   313    316    53758
Avg. words   593     529    618    522    588    563    618    657    588

Note that Nr. = Number, N.A. = Not Available, arts. = articles, Avg. = Average.

2.2.3 Sentiment scores

After the articles have been collected and categorised properly, a quantitative sentiment score can be extracted from each article. This paper considers numerous approaches in order to do so, which will be explained below. These approaches are based on a total of seven different sentiment analysis models/databases, which are summarised in Table 5.

Table 5: Summary pertaining to the used sentiment databases.

Abb.    Explanation    Source

AFINN   Dictionary with a list of words and a corresponding score. The score has an (integer) scale of -5 (very negative) up to and including 5 (very positive). It is specifically designed for shorter messages.    Nielsen (2011)

BING    Dictionary with a list of negative and positive words compiled by experienced researchers in the field of text analysis.    Hu and Lui (2004)

NRC     Dictionary with a list of words, which are considered positive/negative or are connected to emotions, e.g. anger or trust. For the purpose of this paper, only the list of positive and negative words was used.    Mohammed and Turney (2013)

GI      The psychological Harvard-IV dictionary as used in the General Inquirer software, which is a general purpose list of positive and negative words.    Stone et al. (1962)

HE      Dictionary with a list of positive and negative words according to Henry’s finance-specific dictionary.    Henry (2008)

LM      Dictionary with a list of positive, negative and uncertainty words according to Loughran and McDonald’s finance-specific dictionary. Again, only the positive and negative words will be employed.    Loughran and McDonald (2011)

VAD     In contrast with the above-mentioned dictionaries, this model is based on sentences instead of single words.    Steinert and Herff (2018)


Nowadays, most research concerning sentiment is based on sentiment models that take into account sentences instead of single words (Steinert and Herff, 2018). However, it was decided to also include a significant number of sentiment models that are based on unigrams due to the fact that sentiment models based on sentences are extremely computationally intensive. Models based on unigrams perform calculations much faster, which might be necessary; in high frequency settings speed matters to a great extent and a lot of computation power could be (too) expensive. In addition, it is interesting to see if the approach based on sentences actually provides a better quantitative sentiment score. As can be inferred from Table 5, six unigram sentiment databases and one sentiment model based on sentences are included. Before explaining the exact calculations, some necessary notation is introduced.

Let $W$ be the set of all well-defined English words. Moreover, define $W^n$ to be an $n$-dimensional set of well-defined English words.

Let $A \in W^{n_A}$ be an article, where $n_A$ is the total number of words in article $A$. Note that $A = [A_1, \dots, A_{n_A}]$, where $A_1, \dots, A_{n_A} \in W$.

Let $U \in W^{n_U} \times \mathbb{Z}^{n_U}$ be a sentiment matrix based on unigrams, where $n_U$ is the total number of words in sentiment matrix $U$. In our case $U \in \{\text{AFINN}, \text{BING}, \text{NRC}, \text{GI}, \text{HE}, \text{LM}\} =: US$. Define $U_{i1} = \{U_{11}, \dots, U_{n_U 1}\}$, where $U_{11}, \dots, U_{n_U 1} \in W$, and $U_{i2} = \{U_{12}, \dots, U_{n_U 2}\}$, where $U_{12}, \dots, U_{n_U 2} \in \mathbb{Z}$. Lastly, observe that $U_{i2}$ is the score associated with the word $U_{i1}$ ($\forall i$).

Let $N \subseteq W^{n_N}$ be the set of negators, where $n_N$ is the total number of negators in negator set $N$. These words reverse the meaning of the subsequent word. Examples of negators are no and not. The VAD model takes into account valence shifters, such as negators (e.g. not), amplifiers (e.g. really) and adversative conjunctions (e.g. but). Rinker (2019) contains the calculation details of the VAD model. The sentiment sets BING, NRC, GI, HE and LM only distinguish between positive (score = 1) and negative words (score = -1), whereas AFINN scores words on an integer scale between -5 and 5 (inclusive). Hence, sentiment database $U$ provides a function $f_U : W \to \mathbb{Z}$ that maps a word to a (sentiment) score. If for some $b$, $A_b \notin U_{i1}$, then $f_U(A_b) = 0$, i.e. if a certain word in an article is not included in unigram sentiment database $U$, a score of 0 is attached to that word.

Now that the basic notation has been developed, the exact calculations can be elaborated upon. As mentioned above, the function $f_U$ maps words to (sentiment) scores and an article is a collection of words. This implies that $f_U : W \to \mathbb{Z}$ needs to be transformed to a function $F_U : W^{n_A} \to \mathbb{Z}$, where without loss of generality it is assumed that an $n_A$-dimensional article needs to be transformed. This transformation was performed in two different ways, resulting in the functions $F_{U1}$ and $F_{U2}$. Firstly, the score was calculated by summing the individual scores for all words in an article, i.e.

$$F_{U1}(A) = \sum_{i=1}^{n_A} f_U(A_i).$$

Secondly, a bigram (i.e. a set of two words) approach was used. As mentioned before, if the sentiment score is solely based on unigrams, certain phrases might be misclassified. Consider, for example, the combination of words "not good". If the sentiment score is solely based on unigrams, this will be considered positive, since good is a positive word. However, in reality this piece of text should be considered negative, as the negating word not reverses the meaning of the second word, in this case good. A bigram approach is able to capture those dynamics. For each set of two words, the first word is inspected to determine whether it is an element of $N$. If this is the case, the sentiment score of the second word is reversed. In mathematical form this implies

$$F_{U2}(A) = \sum_{i=1}^{n_A} \left[ f_U(A_i) - 2 \cdot \mathbb{I}\{A_{i-1} \in N\} \cdot f_U(A_i) \right],$$

where $\mathbb{I}\{E\}$ is an indicator function that equals one if event $E$ is true and zero otherwise. By definition, the first word in an article is not preceded by another word, therefore $A_0 \notin N$ and $\mathbb{I}\{A_0 \in N\} = 0$.
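The two scoring functions can be illustrated with a minimal Python sketch. The toy lexicon, the negator set and the function names are hypothetical and only serve to show how $F_{U1}$ and $F_{U2}$ differ on the phrase "not good"; they are not the dictionaries used in this paper.

```python
def f_u(word, lexicon):
    """Unigram score f_U: words absent from the lexicon score 0."""
    return lexicon.get(word, 0)


def f_u1(article, lexicon):
    """F_U1: plain sum of the unigram scores of all words."""
    return sum(f_u(word, lexicon) for word in article)


def f_u2(article, lexicon, negators):
    """F_U2: as F_U1, but a word preceded by a negator has its score reversed."""
    total = 0
    previous = None  # A_0 does not exist, so it can never be a negator
    for word in article:
        score = f_u(word, lexicon)
        if previous in negators:
            score -= 2 * score  # f - 2 * I * f = -f
        total += score
        previous = word
    return total


# Hypothetical BING-style lexicon (scores of +1 / -1) and negator set.
toy_lexicon = {"good": 1, "bad": -1}
toy_negators = {"no", "not"}
article = "the outlook is not good".split()

print(f_u1(article, toy_lexicon))                # 1
print(f_u2(article, toy_lexicon, toy_negators))  # -1
```

The unigram sum treats the phrase as positive, whereas the bigram variant flips the sign of "good" because it follows a negator.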

Lastly, the VAD model also provides a function that maps an article to a quantitative sentiment score, $F_{VAD}$, where again it is assumed that an $n_A$-dimensional article needs to be transformed. All in all, 13 functions ($|US| \cdot 2 + 1 = 6 \cdot 2 + 1 = 13 =: n_F$, where $|\cdot|$ denotes the cardinality of a set) are available to transform an article, which is a set of words, into a sentiment score, which is an integer. An excerpt from one of the most negative articles is "Investors Lost Billions Daily ...", whereas an excerpt from one of the most positive articles is "Bitcoin Cash Skyrockets ...". These articles seem to be correctly classified.

It could be the case that there are multiple articles for one date. However, eventually a single score for each date is needed, since this paper uses daily return data. Therefore, the individual sentiment scores for each article need to be aggregated within each website across dates in $\mathcal{T}$. The aggregation on this level is done in two different ways. Firstly, a simple average of all the scores on a particular date is taken. Secondly, a weighted average based on the number of words in each article is computed. This second approach takes into account the length of the article, which might be important since the length of an article could influence the impact it has. For example, a short negative article is likely less harmful to the prices of cryptocurrencies than an extensive negative article. Let $SS^{F a_d}_{tcb}$ denote the sentiment score on date $t$ for news category $c$ for blog $b$ using sentiment scoring function $F$ and aggregation approach $a_d$ across days, where

$t \in \mathcal{T}$ (note $|\mathcal{T}| = 731$),

$c \in \{\text{DNC1}, \dots, \text{DNC9}\}$,

$b \in B$, where $B$ is the set of all blogs (note $|B| = 15$),

$F \in \{F_{U1}, F_{U2}, F_{VAD}\}$,

$a_d \in \{\text{simple}, \text{weighted}\}$.

Then the following expressions are obtained

$$SS^{F\,\text{simple}}_{tcb} = \left(|A_{tcb}|\right)^{-1} \sum_{i \in A_{tcb}} F(i), \qquad SS^{F\,\text{weighted}}_{tcb} = \left(\sum_{i \in A_{tcb}} n_i\right)^{-1} \sum_{i \in A_{tcb}} n_i \cdot F(i),$$

where $A_{tcb}$ is the set of all articles on date $t$ in category $c$ of blog $b$ and $n_i$ is the number of words in article $i$. In case $A_{tcb} = \emptyset$, the sentiment score is reported as not available (N.A.). As mentioned in section 2.2.1, the blogs CoinDesk, Cointelegraph and Bitcoin News can directly be used with category DNC9. For ease of notation, let these sentiment scores be defined by

$$SSI^{F a_d}_t = \left[ SS^{F a_d}_{t,\text{DNC9},\text{CoinDesk}},\; SS^{F a_d}_{t,\text{DNC9},\text{Cointelegraph}},\; SS^{F a_d}_{t,\text{DNC9},\text{Bitcoin News}} \right].$$

Hence, $SSI^{F a_d}_t$ is a vector of sentiment scores on date $t$ resulting from sentiment scoring function $F$ and aggregation approach $a_d$ across days, which includes the sentiment scores for the (individual) websites CoinDesk, Cointelegraph and Bitcoin News. This gives rise to a total of 26 ($|a_d| \cdot n_F = 2 \cdot 13 = 26$) different sentiment score vectors of length three for each date $t \in \mathcal{T}$.
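The two within-website aggregation rules can be sketched as follows. The function names and the example numbers are illustrative assumptions; `None` stands in for the N.A. case of an empty article set.

```python
def simple_average(scores):
    """Simple daily score: mean of the article scores, None if there are no articles."""
    if not scores:
        return None
    return sum(scores) / len(scores)


def weighted_average(scores, word_counts):
    """Word-count weighted daily score, None if there are no articles."""
    total_words = sum(word_counts)
    if not scores or total_words == 0:
        return None
    return sum(n * s for n, s in zip(word_counts, scores)) / total_words


# Hypothetical day with one long negative and one short positive article.
scores = [-4, 2]
word_counts = [900, 100]

print(simple_average(scores))                 # -1.0
print(weighted_average(scores, word_counts))  # -3.4
```

In this example the long negative article dominates the weighted score, which is the effect the second aggregation approach is meant to capture.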

Lastly, as discussed in section 2.2.2, aggregation across websites leads to a more representative score and it prevents the loss of a significant amount of data. This aggregation is performed after the aggregation within each website across days to ensure a single sentiment score is obtained for each day. Again, two ways of aggregation are considered. Firstly, a simple average across websites is performed. Secondly, a weighted average based on the number of mentions (see Table 2) is computed, since it might be the case that the most popular websites have the most influence. Let $SS^{F a_d a_w}_{tc}$ denote the (aggregated) sentiment score on date $t$ for news category $c$ using sentiment scoring function $F$, aggregation approach $a_d$ across days and aggregation approach $a_w$ across websites, where $a_w \in \{\text{simple}, \text{weighted}\}$ and everything else is


where $m_i$ is the number of mentions for blog $i$. If $\mathbb{I}\{SS^{F a_d}_{tci} \in \mathbb{Z}\} = 0$, there is no observation available on date $t$ in category $c$ for blog $i$, since in that case $SS^{F a_d}_{tci} = \text{N.A.} \notin \mathbb{Z}$. Note that $m_i$ is static with respect to time. This is not optimal since the ranking and thus the number of mentions may change from day to day. However, historical data pertaining to the number of mentions is not available. Referring to section 2.2.2, the only eligible categories are DNC2 and DNC9. For ease of notation, let the sentiment scores pertaining to these categories be defined by

$$SSA^{F a_d a_w}_t = \left[ SS^{F a_d a_w}_{t,\text{DNC2}},\; SS^{F a_d a_w}_{t,\text{DNC9}} \right].$$

So, $SSA^{F a_d a_w}_t$ is a vector of sentiment scores on date $t$ resulting from sentiment scoring function $F$ and aggregation approaches $a_d$ and $a_w$, which includes the (aggregated) sentiment scores for the categories DNC2 and DNC9. This gives rise to a total of 52 ($|a_d| \cdot |a_w| \cdot n_F = 2 \cdot 2 \cdot 13 = 52$) different sentiment score vectors of length two for each date $t \in \mathcal{T}$.
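A possible reading of the aggregation across websites is sketched below. It assumes that blogs without an observation (N.A., represented here by `None`) are excluded from both the numerator and the normalising constant, which is how the indicator in the text is interpreted. The blog names and mention counts are hypothetical.

```python
def aggregate_across_blogs(scores, mentions=None):
    """Average daily blog scores; weight by mentions m_i if given, skip N.A. blogs."""
    available = {blog: s for blog, s in scores.items() if s is not None}
    if not available:
        return None  # no blog has an observation on this date
    if mentions is None:
        return sum(available.values()) / len(available)
    total_mentions = sum(mentions[blog] for blog in available)
    return sum(mentions[blog] * s for blog, s in available.items()) / total_mentions


# Hypothetical scores on one date; blog "B" has no article (N.A.).
scores = {"A": 2.0, "B": None, "C": -1.0}
mentions = {"A": 30, "B": 50, "C": 10}

print(aggregate_across_blogs(scores))            # 0.5
print(aggregate_across_blogs(scores, mentions))  # 1.25
```

Because the weights are renormalised over the available blogs only, a popular blog without articles on a given day does not pull the score towards zero.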

The sentiment scores constructed above depend implicitly on the number and length of articles that are available on a given day. More explicitly, the number and length of articles that are available on a given day could be viewed as a proxy for news activity in the realm of cryptocurrencies. Consequently, these quantities might also contain valuable information with regard to cryptocurrency returns and trading strategies. Therefore, a second set of variables is constructed that contains meta information regarding the number and length of articles that are available on each date $t \in \mathcal{T}$. Let $NO_{tcb}$ denote the number of articles (observations) on date $t$ for category $c$ on blog $b$. Also, let $NW_{tcb}$ denote the total number of words that have been written on date $t$ for category $c$ on blog $b$. Let $NO_{tc}$ and $NW_{tc}$ be analogous to $NO_{tcb}$ and $NW_{tcb}$, except that the two former measures are aggregated across websites. In the notation developed above, the mathematical definitions are given by

$$NO_{tcb} = |A_{tcb}|, \qquad NO_{tc} = \sum_{j \in B} |A_{tcj}|, \qquad NW_{tcb} = \sum_{i \in A_{tcb}} n_i, \qquad NW_{tc} = \sum_{j \in B} \sum_{i \in A_{tcj}} n_i.$$

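The only metrics of interest correspond to $SSI^{F a_d}_t$ and $SSA^{F a_d a_w}_t$, since these are the only sentiment scores that will be used in the analysis. In order to ease notation, define the following metrics

$$NOI_t = NO_{t,\text{DNC9},\text{CoinDesk}} + NO_{t,\text{DNC9},\text{Cointelegraph}} + NO_{t,\text{DNC9},\text{Bitcoin News}},$$

$$NWI_t = NW_{t,\text{DNC9},\text{CoinDesk}} + NW_{t,\text{DNC9},\text{Cointelegraph}} + NW_{t,\text{DNC9},\text{Bitcoin News}},$$

$$NOA_t = NO_{t,\text{DNC2}} + NO_{t,\text{DNC9}},$$

$$NWA_t = NW_{t,\text{DNC2}} + NW_{t,\text{DNC9}}.$$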

The variables $NOI_t$ and $NWI_t$ could be used in conjunction with $SSI^{F a_d}_t$, whereas $NOA_t$ and $NWA_t$ could be used together with $SSA^{F a_d a_w}_t$. Lastly, it has to be noted that for the developed sentiment scores ($SSI^{F a_d}_t$ and $SSA^{F a_d a_w}_t$) and meta information ($NOI_t$, $NWI_t$, $NOA_t$ and $NWA_t$) there are still some missing observations in $\mathcal{T}$ (see Tables 3 and 4). However, since the number of missing observations is very limited, these observations are generated by linear interpolation, also used in e.g. Stavroyiannis (2018), to prevent the loss of data that is present on these dates.
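Linear interpolation of a few missing daily observations can be sketched as below. The function name is hypothetical, and leaving leading or trailing gaps untouched is an assumption of this sketch, since the text does not specify how such cases are handled.

```python
def interpolate_missing(series):
    """Fill None gaps linearly between the nearest observed neighbours."""
    filled = list(series)
    known = [i for i, value in enumerate(filled) if value is not None]
    for left, right in zip(known, known[1:]):
        step = (filled[right] - filled[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = filled[left] + step * (i - left)
    return filled  # gaps before the first or after the last observation stay None


# Hypothetical daily sentiment scores with two missing days.
print(interpolate_missing([1.0, None, None, 4.0]))  # [1.0, 2.0, 3.0, 4.0]
```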

So, all things considered, four main sets of news sentiment proxies can be used in the upcoming models section. On the one hand, 26 different vectors of size three and five have been constructed that represent three individual websites, $SSI^{F a_d}_t$ and $\left[ SSI^{F a_d}_t, NOI_t, NWI_t \right]$, respectively (a total of 52 options). On the other hand, 52 different vectors of size two and four have been composed that represent two (disjoint) categories, $SSA^{F a_d a_w}_t$ and $\left[ SSA^{F a_d a_w}_t, NOA_t, NWA_t \right]$, respectively (a total of 104 options). Thus, a total of 156 options are created to include news sentiment.

Elaborating upon all these different options is not necessary since they are all quite similar. Instead, only $SSI^{F_{U1}\,\text{simple}}_t$ and $SSA^{F_{U1}\,\text{simple}\,\text{simple}}_t$ are examined. Descriptive statistics of these variables can be found in Table 6. As can be observed, except for the LM sentiment model, the articles are considered positive on average, which indicates that cryptocurrency blogs have a slight positive bias. Moreover, large discrepancies exist between the minimum and maximum sentiment scores. Interestingly, in some cases the minimum value is positive (e.g. DNC9 with NRC), which implies that on no day in $\mathcal{T}$ the news should be considered negative. Considering the length of $\mathcal{T}$ and the erratic price paths of the cryptocurrencies in $\mathcal{T}$, this is not the expected result. Relative to the mean, the standard deviation is substantial. Additionally, it can be seen that the sentiment differs markedly between websites and categories with respect to all descriptive statistics. This is a first indication that including multiple news sentiment factors could be beneficial.

Table 6: Descriptive statistics concerning news sentiment. In particular, the time series $\left(SSI^{F_{U1}\,\text{simple}}_t\right)_{t \in \mathcal{T}}$ and $\left(SSA^{F_{U1}\,\text{simple}\,\text{simple}}_t\right)_{t \in \mathcal{T}}$ are examined. Abbreviations are used in the table, e.g. CoinDesk = $\left(SS^{F_{U1}\,\text{simple}}_{t,\text{DNC9},\text{CoinDesk}}\right)_{t \in \mathcal{T}}$ and DNC2 = $\left(SS^{F_{U1}\,\text{simple}\,\text{simple}}_{t,\text{DNC2}}\right)_{t \in \mathcal{T}}$. The other abbreviations follow analogously.

Sentiment model (U)    AFINN    BING     NRC      GI      HE      LM     VAD

Minimum
  CoinDesk            -39.50  -32.50   -7.00  -25.00   -6.50  -34.00   -0.09
  Cointelegraph       -34.40  -18.60   -1.00   -6.00   -5.11  -41.60   -0.04
  Bitcoin News        -27.50  -13.00    2.50  -12.00   -3.14  -37.00   -0.02
  DNC2                -48.00  -26.50   -3.50  -10.00   -4.75  -20.67   -0.17
  DNC9                -14.98   -6.34    6.21    2.53   -0.96  -10.33    0.04

Mean
  CoinDesk              7.44    0.77   18.83    9.17    1.07   -4.37    0.10
  Cointelegraph         5.96    0.62   15.56    7.52    1.48   -4.21    0.10
  Bitcoin News         18.40    6.51   29.74   16.86    3.78   -3.76    0.14
  DNC2                  5.15   -0.24   12.79    6.32    1.30   -4.48    0.08
  DNC9                 10.04    2.64   19.08   10.56    1.96   -2.99    0.12

Maximum
  CoinDesk             73.00   36.50  105.50   59.50   12.00   13.00    0.26
  Cointelegraph        27.40   14.80   62.60   33.00   12.83    5.60    0.26
  Bitcoin News         89.50   46.00  121.00   80.00   17.50   13.00    0.32
  DNC2                 40.50   14.67   40.17   27.33    7.88    4.67    0.21
  DNC9                 27.59   10.33   38.39   25.71    5.73    3.03    0.21

Standard deviation
  CoinDesk              9.89    5.24   11.44    7.99    1.87    4.72    0.04
  Cointelegraph         6.45    3.51    6.49    4.23    1.54    3.66    0.04
  Bitcoin News         13.46    7.02   12.05    8.81    2.39    5.47    0.05
  DNC2                  7.94    4.03    5.30    4.37    1.76    3.54    0.04
  DNC9                  4.44    2.40    4.23    3.28    1.00    1.87    0.03

2.3 Interaction between cryptocurrency metrics and news sentiment

Before continuing with the models section, the interaction, in terms of correlations, between the cryptocurrency returns and news sentiment will be investigated. More specifically, the correlation between current returns and lagged news sentiment is inspected. The following correlation analysis is useful since news sentiment will enter the considered models as a lagged variable. In order to model returns a GARCH modelling approach is used, in which additional (lagged) news sentiment proxies are introduced. With reference to the upcoming trading strategies, note that it is only possible to use presently known information to predict the returns of tomorrow. Although significant correlation does not imply causation or predictability, it is a first indication that a relationship may exist between the variables of interest. Later, in the models and results sections, this relationship will be further quantified.

A large number of sentiment scores have been constructed that can be used in the analysis (see section 2.2.3). Firstly, news sentiment was constructed from individual websites, which will be called news category one from now on. News category one consists of 28 different metrics, i.e. $SSI^{F a_d}_t$, $NOI_t$ and $NWI_t$. Secondly, news sentiment was composed for individual categories, henceforth called news category two. News category two can be represented by 54 different metrics, i.e. $SSA^{F a_d a_w}_t$, $NOA_t$ and $NWA_t$. With regard to cryptocurrency returns, a total of 30 return series are available, i.e. 29 individual cryptocurrencies and the Altcoin28. Therefore, news category one and two can be combined with cryptocurrency returns in 840 ($28 \cdot 30 = 840$) and 1620 ($54 \cdot 30 = 1620$) ways, respectively. The correlations for each of these 840 and 1620 cases will be inspected.

For news category one, significant correlation presented itself in 19% of the cases, where the variants of the AFINN sentiment model produce the highest correlations in combination with Dash returns (i.e. 0.09). In the case of news category two, significant correlation exists in 15% of the cases. Here, the returns of Stellar and the lagged HE sentiment scores (for a number of different variants) produce relatively high correlations (as high as 0.09). The set of variables that produce the highest correlation for news category one and two are visualised in Figure 6, in which it can be seen that positive news seems to have a different effect for different individual websites. Also, as expected, it can be observed in most cases that the different news sentiment proxies are positively correlated. This implies that the various news sources generally agree on the sentiment on each day. The above performed correlation analysis induces cautious optimism with regard to the validity of the main hypotheses stated in section 1.4; at first sight, lagged news sentiment seems to have a relationship with returns.
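The quantity inspected here is the correlation between the return on day $t$ and the news sentiment on day $t-1$. A minimal sketch with hypothetical function names and toy data follows; it computes a plain Pearson correlation on the shifted series and omits the significance test.

```python
from math import sqrt


def pearson(x, y):
    """Sample Pearson correlation between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def lagged_correlation(returns, sentiment, lag=1):
    """Correlation between r_t and NS_{t-lag}; lag must be at least 1."""
    return pearson(returns[lag:], sentiment[:-lag])


# Toy data in which today's return is exactly twice yesterday's sentiment.
sentiment = [1.0, 2.0, 3.0, 4.0, 5.0]
returns = [0.0, 2.0, 4.0, 6.0, 8.0]
print(lagged_correlation(returns, sentiment))
```

For this artificial example the lagged correlation equals one up to rounding; real return series yield far smaller values, as reported above.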

[Figure 6: correlation matrix plot of Ret. DASH$_t$, CoinDesk$_{t-1}$, Cointelegraph$_{t-1}$, Bitcoin News$_{t-1}$, Ret. XLM$_t$, DNC2$_{t-1}$ and DNC9$_{t-1}$; colour scale from -100 to 100.]

Figure 6: Correlation diagram concerning the contemporaneous return (Ret. = Return) of Dash/Stellar and lagged news sentiments. Empty squares represent insignificant correlation, whereas coloured squares indicate significant correlation (confidence level of 95%). The correlation scale is in terms of percentages (-100% up to 100%). Lastly, note that abbreviations are used in the figure, e.g. CoinDesk$_{t-1}$ = $SS^{F_{AFINN1}\,\text{weighted}}_{t-1,\text{DNC9},\text{CoinDesk}}$ and DNC2$_{t-1}$ = $SS^{F_{HE2}\,\text{weighted}\,\text{weighted}}_{t-1,\text{DNC2}}$.

3 Models

This section will elaborate upon the employed models in the context of returns as well as trading strategies. Firstly, two Generalized AutoRegressive Conditional Heteroskedasticity (GARCH) type models will be used in order to investigate the effect news sentiment may have on cryptocurrency returns. These models are discussed in section 3.1. Secondly, a number of trading strategies will be developed in section 3.2. Before continuing with these sections, the training dataset and the validation dataset will be defined (similar to Nakano et al., 2018). The time frame $\mathcal{T}$ for which data was collected covers the period between 01-03-2017 and 01-03-2019 (i.e. 731 data points). The training dataset, denoted by $\mathcal{T}_1$, is defined to be the period between 01-03-2017 and 01-01-2019 (i.e. 672 data points), whereas the validation dataset, denoted by $\mathcal{T}_2$, is defined to be the period between 02-01-2019 and 01-03-2019 (i.e. 59 data points). In this way, the trading strategies can begin on 01-01-2019, by buying or selling cryptocurrencies using the return forecasts of 02-01-2019. So, the trading strategies are calibrated in $\mathcal{T}_1$ and will be recorded during $\mathcal{T}_2$, the last 2 months of $\mathcal{T}$, to inspect their out-of-sample performance. Also, the GARCH type models will be fitted using the data available in $\mathcal{T}_1$. However, the GARCH type models will not be used for prediction or trading strategies, since not all of them are suitable to predict returns. For example, a pure GARCH model, i.e. without AutoRegressive Moving Average (ARMA) terms, predicts constant returns. Furthermore, existing literature has shown that machine learning models often have superior performance (see e.g. Karathanasopoulos et al., 2017; Sun et al., 2019). Thus, the GARCH models will solely be used to determine the effect news sentiment has on cryptocurrency returns.
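The sizes of the full, training and validation periods can be checked with a short calculation. The sketch assumes that the dates are written as DD-MM-YYYY and that both end points are included, which is consistent with the reported numbers of data points.

```python
from datetime import date, timedelta


def date_range(start, end):
    """All calendar days from start to end, both inclusive."""
    return [start + timedelta(days=k) for k in range((end - start).days + 1)]


full = date_range(date(2017, 3, 1), date(2019, 3, 1))    # time frame T
train = date_range(date(2017, 3, 1), date(2019, 1, 1))   # training set T1
valid = date_range(date(2019, 1, 2), date(2019, 3, 1))   # validation set T2

print(len(full), len(train), len(valid))  # 731 672 59
```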

3.1 Returns

This paper will consider an exponential GARCH (eGARCH) model and an ARMA model with eGARCH errors (ARMA-eGARCH). These types of models are commonly used to model financial returns and they have also been applied to cryptocurrencies (e.g. Dyhrberg, 2016). This paper focuses on eGARCH models, since the external regressors that will be used can take on negative values, i.e. news sentiment can exhibit negative values. In the standard GARCH model, this may have the consequence that the fitted values for the variance turn out to be negative, whereas this is not the case for eGARCH models. Moreover, the eGARCH model accounts for asymmetries. This is required since Maqsood and Lelit (2017) and Stavroyiannis (2018) found that the effects of previously observed returns on the future variances are asymmetric. In particular, a decrease in returns has a greater effect on the variance than an increase.

The dynamics of the considered models will be discussed one by one below. Firstly, for each model the general version is given, constructed using McNeil, Frey, and Embrechts (2015, pp. 118-123) and Ghalanos (2017). Afterwards, news sentiment is introduced in the models. The names of the models including news sentiment coefficients will be appended with X (i.e. MODELX). By estimating a model with and without news sentiment, information criteria can be compared. This comparison and the significance of news sentiment coefficients will be used to obtain a verdict with respect to hypothesis 1a. Numerous news sentiment proxies will be examined (see section 2.2.3). After fitting all models, the final models will be selected based on the Akaike Information Criterion (AIC). With respect to the notation below, note that, in principle, 30 return series are available, i.e. the return processes $(r^i_t)_{t \in \mathcal{T}_1}$ are collected, where $i \in CC$ and $CC$ is the set of all cryptocurrencies (see section 2.1). However, the same models are fitted for each cryptocurrency and hence index $i$ is dropped in the notation below for convenience. Thus, the models are specified using the return process $(r_t)_{t \in \mathcal{T}_1}$, which can be viewed as the return process for any of the 30 cryptocurrencies. The models were fitted using the R package rugarch (Ghalanos, 2019).

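Model 1 (eGARCH)

Let $(z_t)_{t \in \mathcal{T}_1}$ be Standard White Noise with mean 0 and unit variance (SWN(0,1) for short). The process $(r_t)_{t \in \mathcal{T}_1}$ is an eGARCH($q$,$p$) process if it satisfies, for all $t \in \mathcal{T}_1$, the difference equations

$$r_t = \mu + \sigma_t z_t,$$

$$\ln \sigma_t^2 = \omega + \sum_{i=1}^{q} \left[ \alpha_i z_{t-i} + \gamma_i \left( |z_{t-i}| - \mathbb{E}[|z_{t-i}|] \right) \right] + \sum_{j=1}^{p} \beta_j \ln \sigma_{t-j}^2,$$

where $\mu, \omega, \alpha_i, \gamma_i, \beta_j \in \mathbb{R}$, $i \in \{1, 2, \dots, q\}$, $j \in \{1, 2, \dots, p\}$, $p$ and $q$ are positive integers indicating the number of included lags, $\ln(\cdot)$ is the natural logarithm function, $|\cdot|$ represents the absolute value function and $\mathbb{E}[\cdot]$ denotes the expected value function.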

The eGARCHX model, in which news sentiment has been incorporated, is equivalent to model 1 except that the formula for $\ln \sigma_t^2$ is adjusted to

$$\ln \sigma_t^2 = \omega + \sum_{i=1}^{q} \left[ \alpha_i z_{t-i} + \gamma_i \left( |z_{t-i}| - \mathbb{E}[|z_{t-i}|] \right) \right] + \sum_{j=1}^{p} \beta_j \ln \sigma_{t-j}^2 + \sum_{k=1}^{n} \phi^k_\sigma NS^k_{t-1}.$$

Here $\phi^k_\sigma$ is the coefficient in the variance equation corresponding to the news sentiment proxy $(NS^k_t)_{t \in \mathcal{T}_1}$. It still has to be explained in more detail what $NS^k_t$, where $t \in \mathcal{T}_1$ and $k \in \{1, \dots, n\}$, represents exactly. Let $\mathbf{NS}_t = [NS^1_t, \dots, NS^n_t]$ represent a vector containing all news sentiment proxies. In this paper, the vector $\mathbf{NS}_t$ can appear in four different forms, i.e.

$$\mathbf{NS}_t = SSI^{F a_d}_t, \qquad \mathbf{NS}_t = \left[ SSI^{F a_d}_t, NOI_t, NWI_t \right], \qquad \mathbf{NS}_t = SSA^{F a_d a_w}_t, \qquad \mathbf{NS}_t = \left[ SSA^{F a_d a_w}_t, NOA_t, NWA_t \right].$$

As mentioned above, $SSI^{F a_d}_t$ can take on 26 different values, $SSA^{F a_d a_w}_t$ can be represented by 52 different values and no options can be specified for $NOI_t$, $NWI_t$, $NOA_t$ and $NWA_t$. So, news sentiment can be included in a total of 156 ($2 \cdot 26 + 2 \cdot 52 = 156$) ways for each cryptocurrency. The news sentiment vector $\mathbf{NS}_t$ is standardized in order to facilitate comparability.
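To make the variance recursion concrete, the sketch below evaluates one step of an eGARCHX(1,1) log-variance equation and standardizes a sentiment series. It is an illustration only and not the rugarch estimation routine: the function names and parameter values are hypothetical, $\mathbb{E}[|z_t|] = \sqrt{2/\pi}$ assumes standard normal innovations, and the use of the sample standard deviation in the standardization is likewise an assumption.

```python
from math import pi, sqrt

E_ABS_Z = sqrt(2 / pi)  # E|z_t| under standard normal innovations (assumption)


def standardise(values):
    """Rescale a series to mean 0 and sample standard deviation 1."""
    n = len(values)
    mean = sum(values) / n
    sd = sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]


def next_log_variance(log_var_prev, z_prev, ns_prev, omega, alpha, gamma, beta, phi):
    """One step of the eGARCHX(1,1) recursion for ln(sigma_t^2)."""
    news_term = sum(p * ns for p, ns in zip(phi, ns_prev))
    return (omega + alpha * z_prev + gamma * (abs(z_prev) - E_ABS_Z)
            + beta * log_var_prev + news_term)


print(standardise([1.0, 2.0, 3.0]))  # [-1.0, 0.0, 1.0]

# Hypothetical parameters: with alpha < 0 a negative shock raises the
# log-variance more than a positive shock of the same size.
params = dict(omega=0.1, alpha=-0.1, gamma=0.2, beta=0.9, phi=[0.05])
after_drop = next_log_variance(0.0, -1.0, [0.0], **params)
after_rise = next_log_variance(0.0, 1.0, [0.0], **params)
print(after_drop > after_rise)  # True
```

Since the recursion is written for the logarithm of the variance, a negative news sentiment term cannot produce a negative fitted variance.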

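Model 2 (ARMA-eGARCH)

Let $(z_t)_{t \in \mathcal{T}_1}$ be Standard White Noise with mean 0 and unit variance (SWN(0,1) for short). The process $(r_t)_{t \in \mathcal{T}_1}$ is an ARMA($q_1$, $p_1$) process with eGARCH($q_2$, $p_2$) errors if it satisfies, for all $t \in \mathcal{T}_1$, the difference equations

$$r_t = \mu_t + \sigma_t z_t,$$

$$\mu_t = \mu + \sum_{l=1}^{q_1} \psi_l (r_{t-l} - \mu) + \sum_{m=1}^{p_1} \theta_m \sigma_{t-m} z_{t-m},$$

$$\ln \sigma_t^2 = \omega + \sum_{i=1}^{q_2} \left[ \alpha_i z_{t-i} + \gamma_i \left( |z_{t-i}| - \mathbb{E}[|z_{t-i}|] \right) \right] + \sum_{j=1}^{p_2} \beta_j \ln \sigma_{t-j}^2,$$

where $\mu, \psi_l, \theta_m, \omega, \alpha_i, \gamma_i, \beta_j \in \mathbb{R}$, $l \in \{1, 2, \dots, q_1\}$, $m \in \{1, 2, \dots, p_1\}$, $i \in \{1, 2, \dots, q_2\}$, $j \in \{1, 2, \dots, p_2\}$, $q_1$, $p_1$, $q_2$ and $p_2$ are positive integers indicating the number of included lags, $\ln(\cdot)$ is the natural logarithm function, $|\cdot|$ represents the absolute value function and $\mathbb{E}[\cdot]$ denotes the expected value function.

The ARMA-eGARCHX model, in which news sentiment has been incorporated, is equivalent to model 2 except that the formulae for $\mu_t$ and $\ln \sigma_t^2$ are adjusted to

$$\mu_t = \mu + \sum_{l=1}^{q_1} \psi_l (r_{t-l} - \mu) + \sum_{m=1}^{p_1} \theta_m \sigma_{t-m} z_{t-m} + \sum_{k=1}^{n} \phi^k_\mu NS^k_{t-1},$$

$$\ln \sigma_t^2 = \omega + \sum_{i=1}^{q_2} \left[ \alpha_i z_{t-i} + \gamma_i \left( |z_{t-i}| - \mathbb{E}[|z_{t-i}|] \right) \right] + \sum_{j=1}^{p_2} \beta_j \ln \sigma_{t-j}^2 + \sum_{k=1}^{n} \phi^k_\sigma NS^k_{t-1}.$$

Here $\phi^k_\mu$ represents a coefficient analogous to $\phi^k_\sigma$. Thus, $\phi^k_\mu$ is the coefficient in the mean equation corresponding to the news sentiment proxy $(NS^k_t)_{t \in \mathcal{T}_1}$.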


in practice mostly low-order GARCH models are used. Accordingly, the orders from one up to and including three will be tested for all volatility models and only orders of one will be tested for all mean models. Apart from the reason mentioned above, the number of different orders tested is also restricted by computational power. The final model will be selected based on the AIC. At first, for each model one innovation distribution is considered, namely the normal distribution. Moreover, all orders up to and including three are fitted for the volatility models, which gives rise to 9 ($3 \cdot 3 = 9$) options. For the mean model only the orders of one are fitted. Therefore, for each cryptocurrency, a total of 18 models are fitted that do not take into account news sentiment. Furthermore, a total of 2808 ($156 \cdot 18 = 2808$) models that do take into account news sentiment are tested, again for each cryptocurrency. Consequently, the total number of models that will be fitted with normal innovations is 84780 ($30 \cdot (18 + 2808) = 84780$). Given the available computational power, it was not feasible to include higher orders for all variants of the models.

The Generalized Hyperbolic Distribution (GHD) will also be considered as distribution for the innovations. This modelling choice has been made since research has shown that stock market and especially cryptocurrency returns exhibit heavier tails than the traditionally employed normal distribution (see e.g. section 2.1; Jansen and de Vries, 1991; Stavroyiannis, 2018). The names of the models will be appended with their respective innovation distribution, i.e. MODEL-Normal and MODEL-GHD. Fitting GHD innovations is relatively expensive in terms of computation time, which further complicates fitting a large number of different orders for every variant of the considered models. By including these models, the total number of fitted models becomes 169560 ($84780 \cdot 2 = 169560$).
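The model counts above follow from a few multiplications, reproduced in the sketch below. The variable names are illustrative, and reading the 18 baseline models as two model types times nine volatility orders is an interpretation of the text.

```python
n_cryptos = 30                      # 29 cryptocurrencies plus the Altcoin28
volatility_orders = 3 * 3           # orders 1..3 for both lag parameters
model_types = 2                     # eGARCH and ARMA-eGARCH (interpretation)

models_without_news = model_types * volatility_orders          # 18
news_options = 2 * 26 + 2 * 52                                 # 156
models_with_news = news_options * models_without_news          # 2808
normal_total = n_cryptos * (models_without_news + models_with_news)
grand_total = 2 * normal_total      # normal and GHD innovations

print(models_without_news, models_with_news, normal_total, grand_total)
# 18 2808 84780 169560
```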

3.2 Trading strategies

Firstly, the general idea of the trading framework will be elaborated upon. Afterwards, the individual components that constitute a particular trading strategy will be explained in more detail. As was mentioned in section 2.1, information concerning a total of 29 cryptocurrencies was collected. Moreover, the Altcoin28 was constructed in order to represent the whole altcoin space by a single metric. However, in this section the Altcoin28 will not be considered, since it is not actually tradable and the aim of this paper is to focus on trading strategies that can actually be implemented in real life. The general idea of the trading framework will be suitable for investors who are interested in obtaining exposure to cryptocurrencies by investing in a portfolio of cryptocurrencies. The trading decisions will be made univariately for each cryptocurrency. This implies that buy and sell signals are generated for each cryptocurrency without using information on the other cryptocurrencies. Moreover, the invested money is treated in a univariate manner. More specifically, an equal portion of wealth is invested in each cryptocurrency at the start. Later, in the trading period, money reserved to be invested in a particular cryptocurrency is not allowed to be invested in another cryptocurrency at any point in time. So, the trading framework can refrain from making an investment in a certain cryptocurrency, which leaves some money on a hypothetical bank account. Yet, this money is not allowed to be invested in another cryptocurrency. In other words, 29 separate investment problems are considered, which are combined to obtain an investment portfolio of cryptocurrencies. The nature of this trading framework is fairly basic. This is justified by the fact that this is one of the first papers to consider a portfolio of cryptocurrencies. Most papers just consider the Bitcoin (see e.g. Nakano et al., 2018; Corbeta et al., 2019).

In mathematical form, the trading framework is defined as follows. Define $\mathcal{T}_G$ to be a general time frame starting at time $t = 0$ and ending at time $t = T$. The value (in USD) of the cryptocurrency portfolio at time $t \in \mathcal{T}_G$ is then given by

$$\Pi_t = \sum_{i=1}^{n_c} CCV^i_t,$$

where $CCV^i_t$ is the value (in USD) of the investment in cryptocurrency $i$ at time $t$ and $n_c$ is the total number of cryptocurrencies considered. In this case $n_c = 29$ since $i \in CC \setminus \{\text{ALT}\}$. In turn, the value of the investment in a single cryptocurrency at time $t$ is given by

$$CCV^i_t = I^i \prod_{\tau=0}^{t} \left( 1 + CCS^i_{\tau-1} \, r^i_\tau \right),$$

where $I^i$ is the value of the initial investment in cryptocurrency $i$ at time $t = 0$, $CCS^i_{\tau-1}$ is the buy/sell signal the underlying trading strategy gives at time $\tau - 1$ for cryptocurrency $i$ and $r^i_\tau$ is the return of cryptocurrency $i$ at time $\tau$. It is not possible to realise returns at time $\tau = 0$, therefore $r^i_0$ is set to zero. Additionally, by definition, no buy/sell signal exists before the start date and thus $CCS^i_{\tau-1}$ is also zero for $\tau = 0$. In this paper, the initial investment is normalised such that the value of the portfolio at time $t = 0$ is equal to one. Combined with the restriction that an equal investment is made at $t = 0$ in each cryptocurrency, this implies $I^i = 1/n_c$ in this setting. The outcome space of $CCS^i_t$ is given by $CCS^i_t \in \{1, -1, 0\}$, where $CCS^i_t = 1$ represents a buy signal, $CCS^i_t = -1$ entails a sell signal and $CCS^i_t = 0$ expresses neither a buy nor a sell signal (i.e. no trade is made). For each cryptocurrency $i$ and each time $t$, the signals $CCS^i_t$ are produced by underlying trading strategies, which will be explained in more detail in sections 3.2.1 and 3.2.2.
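The bookkeeping of the framework can be sketched as follows. The sketch assumes that the value of each investment grows by the factor $1 + CCS_{\tau-1} r_\tau$ in every period, so that a sell signal profits from a price decrease. Function names, signals and returns are hypothetical.

```python
def investment_value(initial, signals, returns):
    """Value path of one investment; signals[t] is CCS_t and returns[0] must be 0."""
    value = initial
    path = [value]
    for tau in range(1, len(returns)):
        value *= 1 + signals[tau - 1] * returns[tau]  # assumed growth factor
        path.append(value)
    return path


def portfolio_value(paths):
    """Portfolio value per period: sum of the individual investment values."""
    return [sum(values) for values in zip(*paths)]


# Two hypothetical cryptocurrencies, each starting with half of the wealth.
coin_a = investment_value(0.5, signals=[1, 1, 0], returns=[0.0, 0.10, -0.20])
coin_b = investment_value(0.5, signals=[-1, 0, 0], returns=[0.0, -0.10, 0.05])
print(portfolio_value([coin_a, coin_b]))  # approximately [1.0, 1.1, 0.99]
```

The money reserved for the second cryptocurrency stays idle in the last period because its signal is zero, in line with the rule that it may not be moved to another cryptocurrency.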

With regard to the trading framework introduced above, two main trading strategies will be considered. Firstly, short-selling constraints are imposed and the corresponding strategy will be called STRAT1. This implies that STRAT1 only allows selling cryptocurrencies if they are in the portfolio. However, the general framework above does allow for short-selling. Thus, a minor adjustment needs to be made to the trading framework to obtain STRAT1. In practice, this implies restricting the outcome space of $CCS^i_t$. This restriction is made in the following way. If a certain trading strategy gives the signal $CCS^i_t = -1$ for some cryptocurrency $i$ at some time $t$, the value of this signal is converted to 0. Hence, for STRAT1 it holds that $CCS^i_t \in \{1, 0\}$. Secondly, a strategy called STRAT2, without short-selling constraints, is considered. Therefore, STRAT2 allows for buying and selling cryptocurrencies at any time. This means that the outcome space of $CCS^i_t$ remains unaltered compared to the general framework introduced above.
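The restriction that turns the general framework into STRAT1 amounts to a simple mapping of the signals. A minimal sketch with a hypothetical function name:

```python
def apply_short_selling_constraint(signals):
    """STRAT1: convert every sell signal (-1) into no position (0)."""
    return [0 if signal == -1 else signal for signal in signals]


raw_signals = [1, -1, 0, -1, 1]  # hypothetical STRAT2 signals
print(apply_short_selling_constraint(raw_signals))  # [1, 0, 0, 0, 1]
```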

3.2.1 Basic trading strategies

Here, a total of five strategies are considered that will serve as a benchmark (and input) for the machine learning strategies. The general trading framework depends on buy and sell signals for each cryptocurrency at each time $t$. However, the implementation of the trading strategies is equivalent across cryptocurrencies. Therefore, index $i$ is dropped in all cryptocurrency-specific variables for notational convenience. For example, the trading strategies are discussed in terms of $CCS_t$, which may be viewed as the trading signal at time $t$ for any of the 29 cryptocurrencies.

The first strategy is the well-known buy-and-hold strategy. In this strategy the cryptocurrency is bought at the beginning of $\mathcal{T}_2$ and sold at the end of $\mathcal{T}_2$. In other words, $CCS_t = 1$, $\forall t \in \mathcal{T}_2$. The buy-and-hold strategy will only serve as a benchmark for the other strategies. The next four strategies are so-called technical trading strategies or rules and they will be used as benchmarks as well as inputs for the machine learning strategies. A technical trading strategy relies on statistical quantities related to metrics such as price, volume and return of a certain asset. This paper only considers technical trading strategies that take into account price or return information. The discussion below, concerning the employed technical trading rules, will be brief since these rules are not the main focus of this paper.

The first two technical strategies are based on moving averages (also used in Corbeta et al., 2019). Define $MA^S$ and $MA^L$ to be short and long moving averages, which take into account the previous $S$ and $L$ prices, respectively ($S, L \in \mathbb{N}$, with $S < L$). In mathematical form this implies

$$MA^S_t = \frac{1}{S} \sum_{\lambda=1}^{S} P_{t-(\lambda-1)}, \qquad MA^L_t = \frac{1}{L} \sum_{\lambda=1}^{L} P_{t-(\lambda-1)},$$

where $P_t$ is the price of a certain cryptocurrency at time $t$. The first strategy using moving averages is the VMA strategy, whose signals depend on a threshold band $B_t$ defined as $B_t = MA^L_t \cdot B$, where $B \in \mathbb{R}_+$ is a deterministic constant. This threshold band is sometimes used by traders in order to prevent false signals, which may arise when the price of an asset moves sideways (Corbeta et al., 2019).
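The moving averages and the threshold band can be illustrated with a minimal Python sketch. The averaging follows the formulas for the short and long moving averages; the signal rule in `vma_signal` is an assumption, taken to mirror the banded form of the TRB rule given later in this section, with the short moving average in place of the price and the long moving average in place of the rolling maximum. The function names and the zero signal returned when the price history is too short are likewise illustrative choices.

```python
from typing import List, Optional


def moving_average(prices: List[float], t: int, window: int) -> Optional[float]:
    """Mean of the `window` most recent prices up to and including index t."""
    start = t - (window - 1)
    if start < 0:
        return None  # not enough price history yet
    return sum(prices[start:t + 1]) / window


def vma_signal(prices: List[float], t: int, short: int, long: int, band: float) -> int:
    """Assumed VMA rule: compare MA^S_t with MA^L_t +/- B_t, where B_t = MA^L_t * B."""
    ma_s = moving_average(prices, t, short)
    ma_l = moving_average(prices, t, long)
    if ma_s is None or ma_l is None:
        return 0  # assumption: no signal until both averages exist
    b_t = ma_l * band  # threshold band B_t
    if ma_s > ma_l + b_t:
        return 1   # buy
    if ma_s < ma_l - b_t:
        return -1  # sell
    return 0       # inside the band: no signal


rising = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
print(vma_signal(rising, t=5, short=2, long=4, band=0.01))
print(vma_signal(rising, t=5, short=2, long=4, band=0.5))
```

With a steadily rising price series the short average lies above the long average, so a narrow band yields a buy signal while a sufficiently wide band suppresses it.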

Secondly, the Fixed Moving Average (FMA) strategy is also based on moving averages. This strategy is similar to the VMA strategy. However, this strategy only considers crossings of the short and long moving averages. If the short moving average intersects the long moving average from below (above) a buy (sell) signal is given. Subsequently, this strategy continues to give the same signal for $k \in \mathbb{N}$ periods, in which other signals are ignored. Hence, the signals of the FMA strategy can be summarised by the following equations

$$CCF_t = \begin{cases} 1, & \text{if } MA^S_{t-1} < MA^L_{t-1} \wedge MA^S_t > MA^L_t, \\ -1, & \text{if } MA^S_{t-1} > MA^L_{t-1} \wedge MA^S_t < MA^L_t, \\ 0, & \text{if } MA^S_{t-1} < MA^L_{t-1} \wedge MA^S_t < MA^L_t, \\ 0, & \text{if } MA^S_{t-1} > MA^L_{t-1} \wedge MA^S_t > MA^L_t, \end{cases}$$

$$CCS_t = F_{FMA}\left(t \mid (CCF_t)_{t \in \mathcal{T}_G}\right),$$

where $CCF_t$ is defined purely for notational purposes. The function $F_{FMA}: \mathcal{T}_G \to \{1, -1, 0\}$, where $\mathcal{T}_G$ is a general set of dates, ensures that signals prevail for $k$ periods. For example, if for some $t$, $CCF_t = 1$, then $F_{FMA}$ sets $CCS_t = \cdots = CCS_{t+(k-1)} = 1$.
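As a sketch of how the crossing indicator and the holding function could interact, the Python snippet below computes $CCF_t$ from two moving-average series and then applies a $k$-period hold. The names `crossing_signals` and `hold_signals` are hypothetical. The snippet assumes that the signal reverts to zero once the $k$ periods have elapsed without a new crossing, which the text does not state explicitly.

```python
from typing import List


def crossing_signals(ma_short: List[float], ma_long: List[float]) -> List[int]:
    """CCF_t: 1 on an upward crossing, -1 on a downward crossing, 0 otherwise."""
    ccf = [0] * len(ma_short)
    for t in range(1, len(ma_short)):
        if ma_short[t - 1] < ma_long[t - 1] and ma_short[t] > ma_long[t]:
            ccf[t] = 1
        elif ma_short[t - 1] > ma_long[t - 1] and ma_short[t] < ma_long[t]:
            ccf[t] = -1
    return ccf


def hold_signals(ccf: List[int], k: int) -> List[int]:
    """Sketch of F_FMA: keep each crossing signal for k periods,
    ignoring any other signal that occurs inside that window."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    ccs = [0] * len(ccf)
    t = 0
    while t < len(ccf):
        if ccf[t] != 0:
            end = min(t + k, len(ccf))
            for u in range(t, end):
                ccs[u] = ccf[t]
            t = end  # signals inside the holding window are skipped
        else:
            t += 1
    return ccs


ma_short = [1.0, 3.0, 3.0, 1.0, 1.0]
ma_long = [2.0, 2.0, 2.0, 2.0, 2.0]
ccf = crossing_signals(ma_short, ma_long)
print(ccf)
print(hold_signals(ccf, k=3))
```

In this toy series the downward crossing at the fourth observation falls inside the three-period holding window of the earlier buy signal and is therefore ignored.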

Thirdly, the Trading Range Break-out (TRB) strategy is considered, which is also investigated in Corbeta et al. (2019). This strategy is based on rolling price maxima, which take into account the previous $k \in \mathbb{N}$ periods. So, instead of the rolling averages used in the VMA and FMA strategies, rolling maxima are considered in this case. The TRB strategy can be characterised by the following mathematical expression

$$CCS_t = \begin{cases} 1, & \text{if } P_t > \max(P_{t-1}, \ldots, P_{t-k}) + B_t, \\ -1, & \text{if } P_t < \max(P_{t-1}, \ldots, P_{t-k}) - B_t, \\ 0, & \text{if } |P_t - \max(P_{t-1}, \ldots, P_{t-k})| < B_t, \end{cases}$$

where $B_t$ is a threshold band defined as $B_t = \max(P_{t-1}, \ldots, P_{t-k}) \cdot B$, where $B \in \mathbb{R}_+$ is a deterministic constant.
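The TRB rule can be sketched in a few lines of Python. The function name `trb_signal` is hypothetical, and two details are assumptions rather than part of the rule as stated: no signal is produced until $k$ earlier prices are available, and a price lying exactly on the edge of the band is treated as no signal.

```python
from typing import List


def trb_signal(prices: List[float], t: int, k: int, band: float) -> int:
    """TRB signal at index t based on the maximum of the k preceding prices."""
    if t - k < 0:
        return 0  # assumption: no signal without k earlier prices
    rolling_max = max(prices[t - k:t])  # max(P_{t-1}, ..., P_{t-k})
    b_t = rolling_max * band            # threshold band B_t
    if prices[t] > rolling_max + b_t:
        return 1   # break-out above the band: buy
    if prices[t] < rolling_max - b_t:
        return -1  # price below the band: sell
    return 0       # inside (or on the edge of) the band


print(trb_signal([10.0, 11.0, 12.0, 13.0], t=3, k=3, band=0.05))
print(trb_signal([10.0, 11.0, 12.0, 12.3], t=3, k=3, band=0.05))
print(trb_signal([10.0, 11.0, 12.0, 11.0], t=3, k=3, band=0.05))
```

The three calls illustrate a break-out above the band, a price inside the band, and a price below the band, respectively.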

Finally, the Relative Strength Index (RSI) was included, which is also employed in Nakano et al. (2018). The RSI is based on the average gain and average loss of an asset over a time period with length $k \in \mathbb{N}$. The average gain (AG) and the average loss (AL) over the last $k$ time periods are given by

$$AG^k_t = \frac{1}{k} \sum_{\lambda=1}^{k} \max\left(r_{t-(\lambda-1)}, 0\right), \qquad AL^k_t = \frac{1}{k} \sum_{\lambda=1}^{k} \max\left(-r_{t-(\lambda-1)}, 0\right).$$

The RSI based on parameter $k$ is then defined as

$$RSI^k_t = 1 - \frac{1}{1 + AG^k_t / AL^k_t}.$$
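A minimal Python sketch of the RSI computation is given here. When the average loss is zero and the average gain is positive, the sketch returns one, which is the convention adopted in this section. Returning NaN when both averages are zero is an assumption made only for this illustration, as is the function name `rsi`.

```python
import math
from typing import List


def rsi(returns: List[float], t: int, k: int) -> float:
    """RSI at index t from the k most recent returns (index t included)."""
    start = t - (k - 1)
    if start < 0:
        raise ValueError("not enough return history for the chosen k")
    window = returns[start:t + 1]
    avg_gain = sum(max(r, 0.0) for r in window) / k   # AG^k_t
    avg_loss = sum(max(-r, 0.0) for r in window) / k  # AL^k_t
    if avg_loss == 0.0:
        # Only non-negative returns: assign the maximum value of one if
        # there was any gain; NaN (an assumption) if all returns are zero.
        return 1.0 if avg_gain > 0.0 else math.nan
    return 1.0 - 1.0 / (1.0 + avg_gain / avg_loss)


print(rsi([0.02, -0.01, 0.03, -0.02], t=3, k=4))
print(rsi([0.01, 0.02], t=1, k=2))
```

In the first call the average gain is 0.0125 and the average loss is 0.0075, so the RSI equals 0.625 up to floating-point rounding; the second call has no losses and returns the maximum value of one.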

An often used rule of thumb states that values of RSI above 0.7 (below 0.3) should produce buy (sell) signals (Nakano et al., 2018). However, in this paper several thresholds are implemented by varying the parameter $B \in \mathbb{R}$ below. If $AL^k_t$ is zero, the formula for $RSI^k_t$ involves division by zero, which is undefined. However, if $AL^k_t = 0$ and $AG^k_t > 0$, then $RSI^k_t$ should produce a buy signal, since only positive returns and no negative returns were observed in the previous $k$ periods. Therefore, a value of one (the maximum) is assigned to $RSI^k_t$ in this case.

Table of Contents

1 Introduction

1.1 News sentiment

1.2 Cryptocurrencies

1.3 Combining news sentiment and cryptocurrencies

1.4 Contributions and hypotheses

2 Data Description

2.1 Cryptocurrency metrics

2.2 News sentiment

2.3 Interaction between cryptocurrency metrics and news sentiment

3 Models

3.1 Returns

3.2 Trading strategies