
Investigating cryptocurrencies: return, exchange volume and volatility with investor's attention and investor sentiment: an empirical analysis


Faculty of Management (Faculteit der Managementwetenschappen)

M Financial Economics, Academic year 2017-2018

Investigating cryptocurrencies’ return, exchange volume and volatility with investor’s attention and investor sentiment: An empirical analysis

Student: A. Ghiasvand (Alexandr) (s4073770)
Supervisor: dr. J. Qiu (Jianying)

Contents

Abstract

1. Introduction

1.1 Previous literature

1.2 Scientific relevance, goals & research questions

2. Theory

2.1 Investor’s attention

2.2 Investor sentiment

2.3 Interaction effects

3. Methodology

3.1 Data

3.2 Dependent Variables: Cryptocurrency return, exchange volume, and volatility

3.3 Independent Variables: Wikipedia, Google Trends, Sentiment

3.4 Control Variables: Dollar Euro Exchange rate, S&P 500

3.5 Missing values and Time interval

3.6 Econometric methodology

4. Results

4.1 Ordinary Least Squares (normal OLS)

4.2 Autocorrelation

4.3 Ordinary Least Squares with first differences

4.4 Vector Autoregressive models (VAR)

4.5 Vector Error Correction model (VECM)

5. Conclusion

6. Discussion

7. References

8. Appendix: Sentiment Analysis

Abstract

In this research, I study the relationship between the return, exchange volume and volatility of three cryptocurrencies (Bitcoin, Ethereum and Litecoin) on one side, and investor’s attention and investor sentiment on the other. Using search queries (Google Trends and Wikipedia) and a sentiment proxy (Reddit), I investigate the existence of long-run relationships, as well as the forecasting power of investor’s attention and investor sentiment for return, exchange volume and volatility. Furthermore, I construct an interaction between the search queries and the sentiment proxy to investigate any leverage effects. I use Ordinary Least Squares (OLS) as well as time-series analysis to evaluate the effects. The data run from the 1st of September 2017 until the 7th of May 2018. The results show a consistent long-run relationship only between search queries and the cryptocurrencies’ exchange volume and volatility. Leverage effects seem evident as well, but are mostly due to the variance explained by the search queries. The sentiment proxy gives inconsistent results, showing no unambiguous direction of effect on return, exchange volume or volatility.

1. Introduction

Since the introduction of Bitcoin in 2009, cryptocurrencies have been on the radar of governments, financial markets, and the general public. The main idea of this new technology is to create a peer-to-peer network free of intermediaries such as banks. The word “blockchain” describes the main technology behind cryptocurrencies, which secures the system through a decentralized ledger. Of course, the technology has other possible applications besides a new way of facilitating exchange, such as smart contracts. Nevertheless, it has grabbed the attention of the world and, more specifically, of financial markets. At the time this thesis is being written, the prices of Bitcoin and other cryptocurrencies have skyrocketed since their introduction to financial markets, peaking at the end of 2017.

The rise in the price of these cryptocurrencies did not come gradually over time, but occurred sharply in 2017. One of the main reasons, often mentioned in previous literature, might be the attention these cryptocurrencies received through social and mainstream media. People tend to step in without knowing, or wanting to know, the use of cryptocurrencies as alternative currencies or in other applications, and instead treat them as an asset. As Glaser et al. (2014a) state, uninformed investors in particular have bought Bitcoin not primarily as an alternative transaction system, but as an alternative investment vehicle. Bitcoin is often seen as an investment opportunity with great returns. More precisely, it is often seen as a speculative investment (Velde, 2013; Yermack, 2013). Because it is a speculative investment, (social) media might influence features of cryptocurrencies through investor’s attention and sentiment. That is why it is relevant to focus on other factors influencing features of cryptocurrencies, and not just on fundamentals. In this research, I look at both investor’s attention and investor sentiment, and their influence on the cryptocurrencies Bitcoin, Ethereum and Litecoin. I look at the long-term effects on the cryptocurrencies’ return, exchange volume and volatility, as well as at forecasting effects, using a time series analysis to test for any anticipatory effects.

1.1 Previous literature

Regarding previous literature, a great deal of research has been done on factors influencing the features (e.g. price, volatility and trading volume) of cryptocurrency markets, mostly on the main cryptocurrency of the moment: Bitcoin. Georgoula et al. (2015) conducted a Twitter sentiment analysis to study the relationship between collective mood derived from Twitter feeds and Bitcoin prices. (Positive) sentiment seems to have a positive short-run impact on the price of Bitcoin. Additionally, they looked at technological factors (like the hash rate) and fundamental economic variables (like the USD/Euro exchange rate). They found that in the short run, Wikipedia search queries (positive), the hash rate (positive), and the exchange rate (negative) seem to matter, while in the long run, the total stock of Bitcoin in circulation (positive) and the S&P 500 (negative) seem to have a significant effect on the price of Bitcoin.

Bukovina & Martiček (2016) looked at sentiment as a driver of Bitcoin volatility, and found only a marginal presence of sentiment over their whole period of study. However, sentiment’s explanatory power increases during periods of excessive volatility, more specifically in the “bubble” period of their timeframe, at the end of 2013. Finally, their results show that positive sentiment explains a greater share of Bitcoin’s excessive volatility. In line with Georgoula et al. (2015) and Bukovina & Martiček (2016), Glaser et al. (2014b) also find that price volatility is significantly influenced by positive sentiment and media coverage. Also within sentiment analysis, Kaminski (2016) studied Twitter signals and their influence on Bitcoin price, trading volume, and spread. More precisely, Kaminski (2016) looked at virtual emotions, and found that Twitter as a microblogging platform may be interpreted as a virtual trading floor which emotionally reflects Bitcoin’s market movement. Emotions tended to “fly high” when trading volume was high.

Kristoufek (2013) shows that search queries (Google Trends and Wikipedia) are connected with Bitcoin prices. Kristoufek finds a strong correlation between search activity and Bitcoin prices in both directions: search queries influence prices, and prices influence search queries. This is consistent with Kristoufek’s expectation that Bitcoin has no underlying fundamentals, which points towards speculation and trend-chasing dynamics. Kristoufek (2013) also finds an asymmetry: when prices are above the trend (high prices), increasing interest pushes prices further up, whereas when prices are below the trend (low prices), increasing interest pushes them further down.

In contrast with Kristoufek (2013), Bleher & Dimpfl (2018) looked at daily Google Trends data and its predictive power for the returns and volatility of multiple cryptocurrencies. They note that previous research found an evident association between Google’s search volume index (SVI) and stock market returns and volatility. However, this did not hold for the cryptocurrencies investigated by Bleher & Dimpfl (2018). In general, Google’s SVI does not seem to predict cryptocurrency returns, nor does it seem to predict volatility.

1.2 Scientific relevance, goals & research questions

Since there are many ways to analyze the determinants of cryptocurrencies’ features, researchers tend to use different variables and proxies. Search queries and sentiment are popular choices but, in my opinion, have not been investigated to their full extent. First of all, most researchers focus on Bitcoin, and neglect the fact that Bitcoin is the pioneer of cryptocurrency markets and could follow a drastically different pattern from other currencies. This means that results based on Bitcoin alone cannot necessarily be generalized to other cryptocurrencies, especially the less famous ones, i.e. the altcoins. I will therefore also use cryptocurrencies with a smaller market capitalization (Ethereum and Litecoin). Second, I will use search queries as well as a sentiment proxy in my research. The combination of both in one study has not been conducted before. Furthermore, the sentiment proxy in this research is a unique variable, constructed by applying a sentiment tool (NLTK) to data from Reddit. This sentiment proxy differs from other sentiment proxies in its source, the method of data retrieval and/or the sentiment tool used. I will explain the construction of this variable in the Methodology section. Moreover, not only will I look at the two individually, but I will also shed some light on Kristoufek’s (2013) careful suggestion of a possible leverage effect when Bitcoin prices are above or below their trend. A significant leverage effect would mean that the interaction of independent variable X1 with independent variable X2 has an effect on the dependent variable Y1. For instance, if I hypothesize that positive sentiment and an increase in search queries both have a positive effect on Bitcoin’s return, a leverage effect means that, when both are individually positive and significant and the interaction term X1*X2 is also positive and significant, the two independent variables strengthen each other. I will construct an interaction between search queries and sentiment to evaluate this effect. Finally, I will focus on the long-run relationship between search queries and sentiment on the one hand and the price, volatility and trading volume of the relevant cryptocurrency on the other, as well as on forecasting power. To contribute to the nuance in this field, it is important to investigate it extensively. In line with my stated goals, I aim to answer three research questions:

1. To what extent do search queries have a relationship with the return, volatility and exchange volume of the relevant cryptocurrency?

2. To what extent does sentiment have a relationship with the return, volatility and exchange volume of the relevant cryptocurrency?

3. To what extent does the interaction between search queries and sentiment have a relationship with the return, volatility and trading volume of the relevant cryptocurrency?

2. Theory

2.1 Investor’s attention

Some works view ticker searches (read: search queries) as a valid proxy for investor sentiment (Da, Engelberg & Gao, 2011; Joseph, Wintoki & Zhang, 2011). Above all, however, search queries reflect attention towards information, before any form of sentiment is expressed. Although Da, Engelberg & Gao (2011) find correlations between ticker searches and proxies for sentiment, I do not regard the two as the same. I acknowledge the overlap, but also want to investigate the difference between investor’s attention on one side and investor sentiment on the other. In the context of this paper, both concepts might indicate an interest in a security, but they do not mean exactly the same thing. I define investor’s attention as the selective concentration on specific information. For example, I look at the number of search queries on Google per day. This means that people were drawn to enter a certain keyword, like “Bitcoin”, in a search engine. However, this does not directly imply a particular sentiment behind the search. It could be theorized that people search for certain words when they have a positive sentiment towards specific information. This would mean that the level of attention follows from (positive) sentiment. However, this cannot be confirmed, and doing so is not the goal of this research. Moreover, even among different proxies for sentiment there are differences in the dimensions of mood they capture (Bollen, Mao & Zeng, 2011), let alone between sentiment and attention. Hence, this research includes both.

Regarding investor’s attention and the return and exchange volume of securities, a positive relationship could be expected. Joseph, Wintoki & Zhang (2011) view ticker searches as a measure of buying pressure among less sophisticated individual or retail investors. Their idea is that a consumer’s search for information precedes (or coincides with) his or her purchase decision. They assume, based on previous research (Barber, Odean & Zhu, 2009), that the behavior of less sophisticated investors is correlated, since these investors act on the same underlying reasons. Following this line of reasoning, they expected that increases in search intensity would forecast abnormal stock returns (i.e. higher prices) and abnormal stock trading volumes. This was confirmed by their results, based on stocks from the S&P 500. The abnormal returns still held after controlling for the risk factors modelled by Fama & French (1993) and Carhart (1997). Bordino et al. (2012) also looked at the effect of web search queries on stock market volumes, in their case for stocks from the NASDAQ-100. In line with Joseph, Wintoki & Zhang (2011), they find a correlation between daily volumes and queries related to the relevant stocks. They even find that query volumes anticipate trading by one day or more.

As suggested, the search for information could also precede buying behavior, so that search queries anticipate purchases. Preis, Moat & Stanley (2013) suggest that Google Trends data might reflect not only the current state of stock markets, but also anticipate certain future trends. Their results suggest that, in their period of study, profitable trading strategies could be constructed based on Google Trends search query volumes. However, in contrast to Joseph, Wintoki & Zhang (2011), they detect increases in search volumes for keywords before stock market falls. Their possible explanation is that people first tend to gather more information about the state of the market, and then sell. Interestingly, this differs from the idea of search preceding buying behavior. Moving from stock markets to cryptocurrency markets, Kristoufek (2013) found correlations between search queries and cryptocurrency prices. His results show a more linear relationship between search queries and Bitcoin prices: higher levels of search queries are related to higher Bitcoin prices.

Following this reasoning, investor’s attention seems to stem mostly from the pressure to buy, and not to sell. This makes the relationship between investor’s attention and the two variables discussed so far, return and exchange volume, a positive one. For volatility, we could draw the same conclusion. Dimpfl & Jank (2011) looked at retail investors and the effect of their search queries on the volatility of the searched stocks. They find Granger causality in both directions: the higher the number of search queries, the higher the volatility, and vice versa. Focusing on the first direction, search query data might have predictive power for the future volatility of stocks. This is in line with previous research which finds that noise traders’ behavior is a source of additional volatility in stock markets (Lux & Marchesi, 1999; Foucault, Sraer & Thesmar, 2011).

All in all, search queries seem to have a positive relationship with the volatility and trading volume of an asset. However, the evidence on the relationship between search queries and the return of an asset is not unambiguous. Nevertheless, I follow Joseph, Wintoki & Zhang (2011) and Kristoufek (2013), and expect a positive relationship between search queries and the return of a cryptocurrency. This expectation is also based on the fact that I use the same search queries as Kristoufek (2013). Furthermore, I expect an anticipation effect induced by search queries. Therefore, I propose the following hypotheses:

Hypothesis 1: More search queries have a positive effect on the (a) return, (b) volatility and (c) exchange volume of the cryptocurrency

Hypothesis 2: Search queries have a forecasting effect on (a) return, (b) volatility and (c) exchange volume of the cryptocurrency

2.2 Investor sentiment

When I talk about sentiment or investor sentiment, I mean the underlying opinions and subjectivity in texts, found through opinion mining, that can influence behavior. The goal of a sentiment analysis is to derive a certain “mood”, depending on the instrument that is used. The main idea is that if sentiment has a systematic effect on the behavior of individual investors, this might cause effects on financial markets, or at the very least a Granger-causal relationship. The effect should be revealed through a price movement or other event. So, why might sentiment as a proxy be more useful in the field of finance than fundamentals?

To understand the idea behind research based on investor sentiment, we first have to look at two views. Rational risk-based asset pricing models argue that asset prices should reflect the discounted value of expected future cash flows. Even if some investors trade irrationally, these irrationalities are quickly offset by arbitrage. Theoretically, this would mean that investor sentiment has no impact on asset prices. On the other side, behavioral models suggest that investor sentiment might cause prices to deviate from the fundamental value. When sentiment rises, it might induce sentiment-driven demand for the risky asset in question, which drives the price of that asset up; when sentiment falls, the reverse happens. After such a period of high or low sentiment, prices revert to fundamentals. Due to the limits to arbitrage (Barberis & Thaler, 2003), this deviation from the fundamental value might persist for quite a long time. Following this argument, an intertemporal relation between sentiment and asset returns can exist.

In line with the behavioral models, Baker & Wurgler (2007) looked into a broad definition of investor sentiment in stock markets. They define it as a “belief about future cash flows and investment risks that is not justified by the facts at hand”. Kaplanski & Levy (2010) add to this by defining sentiment as a “misperception that can cause mispricing”. All in all, it comes down to a shock that induces a deviation of the asset from its fundamental value. So why might investor sentiment be so important in cryptocurrency markets specifically? In contrast to stock markets, big institutional investors are not the main participants in cryptocurrency markets (Bukovina & Martiček, 2016). Sentiment might have a more significant impact in these markets, which consist mainly of retail or individual investors who might invest more on the basis of noise. Research in behavioural finance shows that noise traders are more susceptible to less rational factors like sentiment (Baker & Wurgler, 2007; Kumar & Lee, 2006; Barber & Odean, 2011). This might give investor sentiment a more prominent role in cryptocurrency markets than in other security markets.

Moving on to investor sentiment proxies in research, Kim & Kim (2014) argue that empirical results regarding the intertemporal relation between investor sentiment and stock returns can be mixed, depending on the choice of investor sentiment proxy. They classify proxies into four groups according to the source from which sentiment information is extracted: surveys, market variables, news and social media, and popular internet message boards. This means that different proxies can give different results, or should be interpreted differently. Furthermore, as hard as it is to define sentiment, it might be even harder to obtain a proxy for investor sentiment, especially for sentiment around cryptocurrencies. In my case, the latter two sources might be the only ones available, given how recent cryptocurrency markets are. Many social media have been used as a source for sentiment, like Twitter (Bollen et al., 2011; Georgoula et al., 2015; Kaminski, 2016), Yahoo! Finance (Kim & Kim, 2014) and Reddit (Bukovina & Martiček, 2016). In the Methodology section, I will discuss my chosen source and the tool I have chosen to derive the investor sentiment proxy.

As stated before, individual investors might be more prone to noise. Brown (1999) acknowledged this as well, and reasons that if noise traders affect prices, “the noisy signal is sentiment, and the risk they cause is volatility, then sentiment should be correlated with volatility”. This is based on noise-trader theory, which implies that irrational noise traders who act coherently on noise can be the cause of some of the systematic risk. De Long et al. (1989; 1990) showed that noise traders, in contrast to traders acting on fundamentals, could affect prices systematically. These noise traders affect asset prices when they are usually bullish (assuming higher future prices) or bearish (assuming lower future prices). Research on Bitcoin markets with sentiment proxies has shown that sentiment might indeed have an effect on the price of Bitcoin (Georgoula et al., 2015; Bukovina & Martiček, 2016). According to Glaser et al. (2014), the effect of positive sentiment on Bitcoin prices might be evident, even if small. Their results confirmed that positive sentiment in media coverage had a strengthening influence on demand, and therefore on prices. This would mean that positive sentiment is related to higher return.

Regarding Bitcoin’s volatility, Glaser et al. (2014) also show that price volatility is significantly influenced by positive sentiment. In addition to De Long et al. (1989; 1990), Brown (1999) finds that unusual levels of individual investor sentiment are correlated with greater volatility of closed-end investment funds. Additionally, higher levels of volatility seemed to occur only when the market is open, and were associated with more trading activity. In other words, if sentiment leads to more exchange, it will simultaneously influence price volatility as well. This result is also confirmed in other studies regarding Bitcoin’s price (Glaser et al., 2014; Bukovina & Martiček, 2016). Assuming that Brown’s (1999) finding can be generalized to cryptocurrency markets, we could expect that positive sentiment also influences cryptocurrencies’ exchange volume.

Previous literature does not give clear evidence that every proxy for (positive) sentiment influences the return, volatility and trading volume of an asset. Moreover, there is no real evidence that (positive) sentiment has any forecasting power. However, there are signals that support the careful premise that positive sentiment influences these features (return, volatility and exchange volume). Since the sentiment proxy might overlap with the search queries, I follow the line of the previous hypotheses regarding investor’s attention. This includes the same idea of pressure to buy, although in this case it is positive sentiment, rather than more search queries, that leads to higher return, volatility and exchange volume. Furthermore, I find it interesting to explore any forecasting effects of investor sentiment. Therefore, I propose the following hypotheses:

Hypothesis 3: Positive sentiment increases (a) return, (b) volatility and (c) exchange volume of the cryptocurrency

Hypothesis 4: Positive sentiment has an anticipating effect on the (a) return, (b) volatility and (c) exchange volume of the cryptocurrency

2.3 Interaction effects

As discussed before regarding Kristoufek’s (2013) research, there is a careful suggestion of a possible interaction effect between investor’s attention and investor sentiment. If both variables are significant in the previous hypotheses, they might even cause a leverage effect when interacted. As explained in the Introduction, I first want to see whether investor’s attention and (positive) sentiment each have a significant and positive effect on the dependent variable. Adding an interaction term to the relevant statistical test (for instance an Ordinary Least Squares regression) will then show whether they strengthen each other beyond their separate effects. In other words, if the (positive) sentiment dummy and the relevant search query variable are positive and significant for a certain dependent variable, for example return, they each explain part of the variance. If, furthermore, the interaction between the two is positive and significant, this would mean that the two strengthen each other in explaining the variance in return. For statistical clarity: if, for whatever reason, the interaction term were negative and significant, it would mean that the independent variables on their own increase return, but that their combined effect is smaller than the sum of their separate effects when sentiment is positive (in the case of the sentiment dummy) and search queries are increasing (in the case of the search query). This would be an odd result. However, my only intention here is to give statistical examples, not to imply any direction. Moreover, previous research offers no evidence on the relationship between the interaction of search queries with sentiment and return, volatility or exchange volume. Based on the idea that two variables might strengthen each other, I want to explore a possible leverage effect. For this exploratory idea, I therefore propose the following hypothesis:

Hypothesis 5: There is an interaction effect of positive sentiment and search queries on the (a) return, (b) volatility and (c) exchange volume of the cryptocurrency

In the Results section, I will return to these hypotheses. I will refer to them as Hypothesis 1a (return), Hypothesis 1b (volatility) and so on, so that the reader understands which dependent variable is discussed.
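
The interaction set-up described in this section can be written as a single regression equation. The notation is an assumption introduced here for illustration only: $Y_t$ stands for the dependent variable (return, volatility or exchange volume), $S_t$ for the search-query variable, $D_t$ for the positive-sentiment dummy, and the control variables are omitted for brevity.

```latex
\[
  Y_t = \beta_0 + \beta_1 S_t + \beta_2 D_t + \beta_3 \,(S_t \times D_t) + \varepsilon_t ,
  \qquad
  \frac{\partial Y_t}{\partial S_t} = \beta_1 + \beta_3 D_t .
\]
```

Under this notation, a leverage effect in the sense used here would correspond to $\beta_1 > 0$, $\beta_2 > 0$ and $\beta_3 > 0$: the effect of search queries is $\beta_1$ when sentiment is negative ($D_t = 0$) and $\beta_1 + \beta_3$ when sentiment is positive ($D_t = 1$).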

3. Methodology

3.1 Data

I have used several sources to conduct my empirical research. The main data for the dependent variables are retrieved from Coin Metrics (2018). Coin Metrics is an ad-free analytics site and blog that provides data on a variety of cryptocurrencies, including historical prices, trading volumes, transaction counts and more. The calculations of cryptocurrency return and (weekly) volatility are based on these data.

The data on the Dollar/Euro exchange rate (closing price) and the S&P 500 index are retrieved from http://nl.investing.com (Investing, 2018a; Investing, 2018b), which freely provides information on a variety of financial securities and other variables. Investing.com is mainly an off-shore broker, owned by GS Sharestocks Ltd. The company is registered in Cyprus, but is regulated and monitored by the local Securities and Exchange Commission (SEC), which makes it more credible than unregulated off-shore brokers.

For the data on search queries, I have looked at Google Trends (2018) and Wikipedia (2018). These were easy to retrieve. Google Trends (2018), however, provides daily data on the relative attention for a search query only if the requested period is no longer than 90 days, while Wikipedia (2018) provides all search queries in absolute values.

For the sentiment variable, I have investigated the possibilities of constructing it with a method that is often used in other papers as well. I have looked at the methods in other papers, and have experimented with different sources and different tools for conducting a sentiment analysis. Together with Tom Janssen Groesbeek, a student at the Department of Computer Science at Radboud University Nijmegen, I investigated the options and came up with a plan. The data for the sentiment variable have been retrieved from Reddit.com (2018), a social network site on which people can rate websites and hold discussions on a forum, and where content is socially curated and promoted by site members through voting. After the data were retrieved, the sentiment analysis was performed with the Natural Language Toolkit (NLTK, 2018). Finally, each Reddit post was assigned a value based on its text. The full process is described in the Appendix to provide more information on the sentiment analysis.

3.2 Dependent Variables: Cryptocurrency return, exchange volume, and volatility

The dependent variables, the return and exchange volume (read: 24-hour trading volume) of each cryptocurrency (Bitcoin, Ethereum and Litecoin), are all based on values in Dollars. The return of the cryptocurrency is calculated by subtracting the old price from the new price and dividing the difference by the old price. The exchange volume was provided in a form that required no modification and was ready for use. Furthermore, the weekly volatility of the cryptocurrency has been computed as the standard deviation divided by the week’s average price, which gives the relative standard deviation. In other words, the standard deviation is expressed as a percentage of the average price. Absolute standard deviations would not be suitable for time series analysis, and are therefore not used.
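
As a rough illustration of the two calculations described above, the following Python sketch computes a simple return and a relative weekly standard deviation. The function names are hypothetical, the prices are made up, and the use of the sample standard deviation of the prices within a week is an assumption; the text does not state which estimator was used.

```python
from statistics import mean, stdev


def simple_return(old_price, new_price):
    """Simple return: (new price - old price) / old price."""
    return (new_price - old_price) / old_price


def relative_weekly_volatility(daily_prices):
    """Standard deviation of one week's prices divided by that week's mean price.

    Assumption: the sample standard deviation (n - 1 denominator) is used.
    """
    return stdev(daily_prices) / mean(daily_prices)


# Hypothetical prices, for illustration only.
print(simple_return(100.0, 110.0))
print(relative_weekly_volatility([9.0, 10.0, 11.0]))
```

Dividing by the weekly mean makes weeks with very different price levels comparable, which is the motivation given in the text for preferring the relative over the absolute standard deviation.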

3.3 Independent Variables: Wikipedia, Google Trends, Sentiment

The Wikipedia search queries are given in absolute numbers, so they did not need any modification. The daily Google Trends data, however, could only be retrieved in sequences of 90 days. To extend this, I made sure that the last day of the first 90 days overlapped with the first day of the second 90 days, and so on. This allowed me to rescale the data into one continuous sequence of daily data.
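
One way the overlapping windows could be chained is sketched below in Python. The exact scaling rule is not given in the text, so the ratio-based rescaling on the shared day is an assumption, and `stitch_windows` is a hypothetical helper name.

```python
def stitch_windows(windows):
    """Chain overlapping windows of relative search interest into one series.

    Assumption: the first value of each window falls on the same day as the
    last value of the previous window. Each later window is rescaled so that
    the shared day takes the same value in both windows.
    """
    stitched = list(windows[0])
    for window in windows[1:]:
        if window[0] == 0:
            raise ValueError("cannot rescale: overlap day has zero interest")
        factor = stitched[-1] / window[0]
        # Skip the overlapping first day, which is already in the series.
        stitched.extend(value * factor for value in window[1:])
    return stitched


# Hypothetical toy windows; real windows would span 90 days each.
print(stitch_windows([[50, 100], [20, 40], [80, 20]]))
```

Because each window is only defined relative to its own maximum, the stitched series remains a relative measure; only the ratios between days are meaningful.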

The values of the sentiment data were scaled from 0 to 1, with 0 being the most negative value possible and 1 the most positive value possible. However, I wanted to make a distinction between negative and positive sentiment by creating dummies. A value around 0.5 should be close to neutral. After plotting some graphs to view the distribution of the scores, I split negative sentiment, the reference category, from the positive category at a value of exactly 0.56. This means that all values under 0.56 are labelled as negative sentiment, while all other values are labelled as positive sentiment.
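
A minimal Python sketch of this dummy coding follows. The 0.56 cut-off comes from the text; the function names are hypothetical, and building the interaction term as the dummy multiplied by the search-query value is an assumption about how the interaction was constructed.

```python
POSITIVE_THRESHOLD = 0.56  # cut-off reported in the text


def sentiment_dummy(score):
    """Return 1 for positive sentiment (score >= 0.56), else 0 (reference category)."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("sentiment scores are expected to lie in [0, 1]")
    return 1 if score >= POSITIVE_THRESHOLD else 0


def interaction_term(score, search_volume):
    """Assumed interaction: sentiment dummy multiplied by the search-query value."""
    return sentiment_dummy(score) * search_volume


# Hypothetical scores, for illustration only.
print([sentiment_dummy(s) for s in (0.30, 0.55, 0.56, 0.90)])
```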

3.4 Control Variables: Dollar Euro Exchange rate, S&P 500

The control variables are included in the model to correct for other influences. The Dollar Euro exchange rate captures any shifts due to exchange rate risk, since almost all other variables are in Dollars. The S&P 500 has often been used in Bitcoin research before (Bouri et al., 2017; Georgoula et al., 2014) as a representative proxy for the global state of the economy. Therefore, I included it among my control variables. The S&P 500 is also expressed in Dollars.

3.5 Missing values and Time interval

The data contain some missing values. In the sentiment data, the sentiment value of Ethereum is missing on the 8th of September. I resolved this by calculating the average of the sentiment values of the previous and the next day, which amounts to interpolating between the 7th and the 9th of September.

The Dollar Euro exchange rate and the S&P 500 have missing values on the weekends, which is to be expected since the NYSE is open on weekdays only, with a few weekdays as exceptions. I resolved these missing values by replacing them with the average rate or index price of that week.
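Both rules for missing values can be sketched as follows. The function names and all numbers are hypothetical, and missing observations are represented by `None` purely for illustration.

```python
from statistics import mean


def fill_with_neighbour_average(values):
    """Replace an isolated None by the mean of the previous and next value."""
    filled = list(values)
    for i in range(1, len(filled) - 1):
        if filled[i] is None and filled[i - 1] is not None and filled[i + 1] is not None:
            filled[i] = (filled[i - 1] + filled[i + 1]) / 2
    return filled


def fill_with_week_average(week_values):
    """Replace every None in one week by the mean of that week's observed values."""
    observed = [v for v in week_values if v is not None]
    week_mean = mean(observed)
    return [week_mean if v is None else v for v in week_values]


# Hypothetical sentiment scores around one missing day.
sentiment = [0.60, None, 0.70]
print(fill_with_neighbour_average(sentiment))

# Hypothetical index levels, Monday to Sunday, with the weekend missing.
index_week = [2500.0, 2510.0, 2490.0, 2505.0, 2495.0, None, None]
print(fill_with_week_average(index_week))
```

Note that the weekly-average rule uses information from the whole week, including days after the gap, so the filled weekend values are not strictly known at the time they refer to.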

Regarding the time interval, I chose to look at daily data for cryptocurrency return and exchange volume. Furthermore, I chose to investigate the weekly volatility of the cryptocurrencies because of the limitations of daily volatility data. For the volatility analysis, I therefore converted all values to their weekly averages. The daily data run from 01/09/2017 until 07/05/2018, while the weekly data cover week 36 of 2017 until week 18 of 2018. For clarity, I have provided a legend for all the variables in Appendix Table 1. These variable names are used throughout this research. In addition, the unstandardized summary statistics of all variables are provided in Appendix Table 2a and Appendix Table 2b. For the analyses, I standardized the values in Stata to obtain clear and more readable beta coefficients.
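Standardization itself is a simple transformation. The sketch below is a Python illustration, not the Stata code used for this research; the function name, the sample data and the use of the sample standard deviation are assumptions.

```python
from statistics import mean, stdev


def standardize(values):
    """Return z-scores: (value - mean) / sample standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


volume = [2.0, 4.0, 4.0, 6.0, 9.0]  # hypothetical raw values
z = standardize(volume)

print([round(v, 3) for v in z])
print(round(mean(z), 12), round(stdev(z), 12))
```

When every variable in a regression has mean zero, the OLS constant is zero up to rounding, which is consistent with the near-zero constants (for example 3.7e-09) reported in most of the tables in the Results section.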

3.6 Econometric methodology

Above all, I am interested in the nature of the relationships between the variables. To gain the necessary insights, I will therefore conduct an Ordinary Least Squares (OLS) regression for Hypotheses 1, 3 and 5 and, equivalently, a Vector Autoregression (VAR) for Hypotheses 2 and 4. These methods have limitations if some conditions are not met. However, when the aim is to study the nature of the relationship, this is not necessarily a big issue, even if some conditions are not met (Sims, 1980; Sims, Stock & Watson, 1990). I will also conduct additional analyses that extend the former methods. These other methods, the OLS with first differences and the Vector Error Correction model (VECM), resolve some of the problems of the aforementioned methods and give a more holistic view.

Because of the problems of multicollinearity² and omitted variables in several models, I have chosen to test every independent variable in a separate model together with the control variables. Although I know that this is not desirable, I find it necessary for statistical reasons as well as for readability, since it yields uniform tables. Additionally, I tested the severity of multicollinearity in the regressions by measuring the Variance Inflation Factor (VIF), and found values that exceeded the rule of thumb (Neter et al., 1989). Running several regressions with different combinations of variables showed that it was better to provide separate models for every independent variable. For these reasons, I have produced five models in every analysis. Every model consists of one independent variable and two control variables. The independent variables are BTCgoogle (Model 1), BTCwiki (Model 2), BTCsent (ref=neg.) (Model 3), BTCsentgoogle (ref.=neg) (Model 4) and BTCsentwiki (ref.=neg) (Model 5). The models are equivalent for the other cryptocurrencies, Ethereum and Litecoin.
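For readers unfamiliar with the VIF, the sketch below computes it for a few hypothetical predictors. It relies on the identity that the VIF of predictor j equals the j-th diagonal element of the inverse of the predictors' correlation matrix. The data and function names are made up, and this is not the Stata routine used for this research. A commonly used rule of thumb treats values above 10 as a sign of severe multicollinearity.

```python
from statistics import mean


def correlation(x, y):
    """Pearson correlation of two equally long lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def invert(matrix):
    """Invert a small square matrix with Gauss-Jordan elimination."""
    n = len(matrix)
    aug = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vif(columns):
    """VIF per predictor: diagonal of the inverse correlation matrix."""
    corr = [[correlation(a, b) for b in columns] for a in columns]
    inverse = invert(corr)
    return [inverse[j][j] for j in range(len(columns))]


# Hypothetical predictors: two closely related query series and one unrelated control.
google = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
wiki = [1.1, 1.9, 3.2, 3.9, 5.1, 5.8]
sp500 = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0]

for name, value in zip(["google", "wiki", "sp500"], vif([google, wiki, sp500])):
    print(f"VIF {name}: {value:.1f}")
```

In this made-up example the two query series are nearly collinear, so their VIFs are far larger than that of the control, which mirrors the reason given above for estimating each independent variable in a separate model.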

To check whether the conditions are met, I will conduct several tests. I will investigate the non-stationarity of my variables and also estimate them in a model in their stationary forms (read: first differences). Testing for non-stationary variables addresses the problem of spurious regressions. This is done with an augmented Dickey-Fuller (ADF) test, which tests the null hypothesis of a unit root (θ=1) against the alternative hypothesis of stationarity. The usual argument is that if the series are non-stationary in levels, estimated regressions involving the levels cannot be trusted. In that case, we speak of spurious regressions.
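The logic of the unit root test can be illustrated with a simplified Dickey-Fuller regression. The sketch below is the non-augmented version (no lagged differences), so it is a simplification of the ADF test used in this research. It regresses the change in a series on its lagged level, where the coefficient gamma equals θ - 1, so that a unit root corresponds to gamma = 0. The simulated series, the seed and the function name are assumptions; the critical value of -2.86 is the asymptotic 5% value for a regression with a constant and no trend.

```python
import random
from statistics import mean


def dickey_fuller_t(series):
    """t-statistic of gamma in: delta_y[t] = alpha + gamma * y[t-1] + e[t]."""
    lagged = series[:-1]
    delta = [b - a for a, b in zip(series, series[1:])]
    n = len(delta)
    mx, my = mean(lagged), mean(delta)
    sxx = sum((x - mx) ** 2 for x in lagged)
    sxy = sum((x - mx) * (y - my) for x, y in zip(lagged, delta))
    gamma = sxy / sxx
    alpha = my - gamma * mx
    rss = sum((y - alpha - gamma * x) ** 2 for x, y in zip(lagged, delta))
    se_gamma = (rss / (n - 2) / sxx) ** 0.5
    return gamma / se_gamma


CRITICAL_5PCT = -2.86  # asymptotic 5% value, constant and no trend

random.seed(1)
shocks = [random.gauss(0, 1) for _ in range(500)]

# Stationary AR(1) series: y[t] = 0.2 * y[t-1] + shock
stationary = [0.0]
for e in shocks:
    stationary.append(0.2 * stationary[-1] + e)

# Random walk (unit root): y[t] = y[t-1] + shock
walk = [0.0]
for e in shocks:
    walk.append(walk[-1] + e)

for name, series in [("stationary", stationary), ("random walk", walk)]:
    t = dickey_fuller_t(series)
    verdict = "reject unit root" if t < CRITICAL_5PCT else "cannot reject unit root"
    print(f"{name}: t = {t:.2f} -> {verdict}")
```

The t-statistic must be compared with Dickey-Fuller critical values rather than with the usual t-distribution, because the statistic has a non-standard distribution under the null hypothesis.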

Furthermore, I will look for cointegrating vectors. I will do this with a cointegration test based on Johansen’s method of calculating a trace statistic (Johansen, 1995). The test specifies the number of cointegrating vectors, if there are any.

I will start with an OLS regression to investigate long term effects. The functional form of this regression is as follows (Equation 1):

(1) $y_t = \alpha + \beta_1 X_t + \dots + \varepsilon_t$

A normal OLS regresses the variables in ordinary levels. One of the problems that might exist in a normal OLS is autocorrelation. I will test the OLS for this and, if it is evident, conduct a Prais-Winsten estimation to obtain more efficient results (Prais & Winsten, 1954). If noteworthy results emerge that diverge from the OLS estimation, I will briefly discuss them in the Results section. Another problem of an OLS in ordinary levels is that the variables may be non-stationary. This can cause problems and will therefore be checked with the aforementioned ADF test. First differencing resolves this and makes all variables stationary. The functional form changes into the following (Equation 2):

(2) $\Delta y_t = \beta_1 \Delta X_t + \dots + u_t$

When using first differences, it is important to emphasize the difference in functional interpretation. This is due to the error term u_t, which is the first difference of the error terms of the levels regression. Because of this, results of the OLS in levels can be reasoned along the same lines when taking first differences. However, the converse direction of reasoning, from first differences to the levels regression, does not necessarily hold. In other words, if there is a stable relationship between the level of “x” and the level of “y”, this also implies a stable relationship between a change in “x” and a change in “y”, but not necessarily the other way around. That is why I will start with a normal OLS, and then conduct an OLS with first differences to evaluate whether any long-term effects, if evident in the normal OLS, still hold when corrected for possible non-stationarity problems.
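The step from Equation 1 to Equation 2 can be checked with a small numerical example. In the sketch below, the hypothetical series y is an exact linear function of x, so the slope is the same in levels and in first differences, while the constant α drops out when differencing. The data and function names are assumptions made for illustration.

```python
from statistics import mean


def difference(series):
    """First differences: series[t] - series[t-1]."""
    return [b - a for a, b in zip(series, series[1:])]


def ols_slope(x, y):
    """Slope of a simple regression of y on x (with an intercept)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


# Hypothetical levels with an exact relationship y = 2 + 0.5 * x.
x = [1.0, 3.0, 4.0, 7.0, 8.0, 12.0]
y = [2.0 + 0.5 * value for value in x]

slope_levels = ols_slope(x, y)
slope_differences = ols_slope(difference(x), difference(y))

print(f"slope in levels: {slope_levels:.3f}")
print(f"slope in first differences: {slope_differences:.3f}")
```

With real data the two slopes will generally differ, because differencing also transforms the error term; the example only shows that a stable relationship in levels carries over to the changes.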

Next, the VAR model will be used to investigate the dynamic (time series) relationship among the variables. As with the normal OLS in ordinary levels, one of the restrictions is that the variables have to be stationary. But, as with the normal OLS, I will first run the VAR analysis in levels to investigate the nature of the relationship. I will also test whether the explanatory variables Granger-cause the dependent variable.

Next, I will conduct the VECM for the same reasons as I conduct an OLS with first differences. With both methods, the lag lengths have been chosen based on the significance of the t-ratio of the last lag. In my case, this resulted in a maximum of 2 lags. The equations of the VAR (Equation 3) and the VECM (Equation 4) are as follows:

(3) $y_t = \beta_1 + \beta_2 y_{t-1} + \beta_3 y_{t-2} + \beta_4 x_{t-1} + \dots + v_t^{y}$

(4) $\Delta y_t = \beta_2 \Delta y_{t-1} + \beta_3 \Delta y_{t-2} + \beta_4 \Delta x_{t-1} + \dots + v_t^{\Delta y}$
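The idea behind a Granger causality test on the y-equation of a VAR with two lags (Equation 3) can be sketched as an F-test: the regression of y on its own lags is compared with the regression that also includes the lags of x. The sketch below uses simulated data in which x leads y by one period. The data, the seed and the function names are assumptions, and the F-test is a textbook variant rather than the exact test statistic reported by Stata.

```python
import random


def solve(a, b):
    """Solve the linear system a @ x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def rss(rows, target):
    """Residual sum of squares of an OLS fit of target on the given rows."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, target)) for i in range(k)]
    beta = solve(xtx, xty)
    return sum((t - sum(b * v for b, v in zip(beta, r))) ** 2 for r, t in zip(rows, target))


def granger_f(y, x, lags=2):
    """F-statistic for H0: lags of x add nothing to a regression of y on its own lags."""
    periods = range(lags, len(y))
    target = [y[t] for t in periods]
    own = [[1.0] + [y[t - l] for l in range(1, lags + 1)] for t in periods]
    full = [row + [x[t - l] for l in range(1, lags + 1)] for row, t in zip(own, periods)]
    rss_restricted, rss_full = rss(own, target), rss(full, target)
    dof = len(target) - len(full[0])
    return ((rss_restricted - rss_full) / lags) / (rss_full / dof)


# Simulated example: x leads y by one period, y does not lead x.
random.seed(0)
n = 300
x = [random.gauss(0, 1) for _ in range(n)]
y = [0.0]
for t in range(1, n):
    y.append(0.8 * x[t - 1] + random.gauss(0, 1))

print(f"x -> y: F = {granger_f(y, x):.1f}")
print(f"y -> x: F = {granger_f(x, y):.1f}")
```

With two restrictions and a large sample, the 5% critical value of the F-statistic is roughly 3.0, so a value far above that indicates that the lags of x help to forecast y.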

For simplicity, I will only present the normal OLS tables in full in the Results section. Because of the large number of tables, the reader might get confused if all were presented. However, I will present one table in each of the other sections as an example.³ The other main results of the OLS with first differences will only be referred to and discussed to give context to the normal OLS. In the same way, the results of the VECM give context to the VAR results. All VAR and VECM tables, as well as the tables of the OLS with first differences, can be found in the Appendix. In all models, I will mainly present the coefficients with their significance level, together with the corresponding standard error.

3 In the sections Ordinary Least Squares with first differences, Vector Autoregressive models, and Vector Error Correction model, the name of the table will start with “Appendix Table …” instead of “Table …”. An exact copy will

4. Results

4.1 Ordinary Least Squares (normal OLS)

Tables 1a, 1b and 1c summarize the results of the independent variables on return. The first thing that stands out is that the coefficient of determination (R2) is quite low in almost all models. Only M1, M2, M4 and M5 of Table 1c (Litecoin return) have a relatively high R2. This means that the independent variables in most models explain only a small proportion of the variance in the dependent variable. Nevertheless, we will have a look at the coefficients. Google Trends and Wikipedia do not seem to have a clear effect on return. BTCwiki is significant, together with LTCgoogle and LTCwiki, and these coefficients point in the direction indicated by the theory. However, the search queries in general do not seem to have a convincing effect on return.

Sentiment seems to have no effect at all. In all tables, there are no significant results evident. On the other hand, the interaction between sentiment and the search queries is significant in 5 of the 6 models. Only in M5 of Table 1b does the interaction between sentiment and Wikipedia search queries not seem to have an effect. All effects point in the same direction, confirming the theory. All in all, however, the predictors do not seem to have much explanatory power.

Table 1a: OLS on BTCreturn

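| | M1 B | M1 S.E. | M2 B | M2 S.E. | M3 B | M3 S.E. | M4 B | M4 S.E. | M5 B | M5 S.E. |
|---|---|---|---|---|---|---|---|---|---|---|
| BTCgoogle | .012 | .077 | | | | | | | | |
| BTCwiki | | | .166** | .076 | | | | | | |
| BTCsent (ref=neg.) | | | | | .270* | .236 | | | | |
| BTCsentgoogle (ref.=neg) | | | | | | | .138* | .066 | | |
| BTCsentwiki (ref.=neg) | | | | | | | | | .165* | .067 |
| S&P500 | -.036 | .094 | -.113 | .088 | -.013 | .080 | -.068 | .082 | -.073 | .081 |
| DollarEuro | -.101 | .094 | .008 | .095 | .113 | .080 | -.057 | .083 | -.036 | .084 |
| _cons | 3.7e-09*** | .063 | 3.3e-09*** | .062 | -.147 | .093 | 5.4e-09 | .062 | 6.3e-09 | .062 |
| R2 | .02 | | .02 | | .03 | | .03 | | .04 | |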

Table 1b: OLS on ETHreturn

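| | M1 B | M1 S.E. | M2 B | M2 S.E. | M3 B | M3 S.E. | M4 B | M4 S.E. | M5 B | M5 S.E. |
|---|---|---|---|---|---|---|---|---|---|---|
| ETHgoogle | .115 | .081 | | | | | | | | |
| ETHwiki | | | .071 | .069 | | | | | | |
| ETHsent (ref=neg.) | | | | | .182 | .133 | | | | |
| ETHsentgoogle (ref.=neg) | | | | | | | .170* | .070 | | |
| ETHsentwiki (ref.=neg) | | | | | | | | | .117 | .067 |
| S&P500 | .024 | .101 | .078 | .086 | .104 | .080 | .023 | .087 | .070 | .083 |
| DollarEuro | -.106 | .096 | -.150 | .086 | -.166* | .080 | -.098 | .086 | -.136 | .083 |
| _cons | 1.9e-09 | .063 | 2.1e-09 | .063 | -.117 | .106 | 2.1e-09 | .062 | 2.5e-09 | .063 |
| R2 | .03 | | .02 | | .03 | | .04 | | .03 | |

*** p<0.001, ** p<0.01, * p<0.05; N = 249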

Table 1c: OLS on LTCreturn

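| | M1 B | M1 S.E. | M2 B | M2 S.E. | M3 B | M3 S.E. | M4 B | M4 S.E. | M5 B | M5 S.E. |
|---|---|---|---|---|---|---|---|---|---|---|
| LTCgoogle | .424*** | .064 | | | | | | | | |
| LTCwiki | | | .433*** | .064 | | | | | | |
| LTCsent (ref=neg.) | | | | | .198 | .128 | | | | |
| LTCsentgoogle (ref.=neg) | | | | | | | .380*** | .062 | | |
| LTCsentwiki (ref.=neg) | | | | | | | | | .362*** | .063 |
| S&P500 | -.151 | .080 | -.131 | .078 | .033 | .080 | -.069 | .077 | -.058 | .077 |
| DollarEuro | .069 | .080 | .099 | .081 | -.110 | .081 | -.002 | .774 | .009 | .079 |
| _cons | -2.5e-09 | .059 | -1.7e-11 | .058 | -.090 | .086 | 3.8e-11 | .059 | -1.1e-09 | .059 |
| R2 | .16 | | .16 | | .02 | | .14 | | .13 | |

*** p<0.001, ** p<0.01, * p<0.05; N = 249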

Moving on to Tables 2a, 2b and 2c, we see that the models have considerably higher coefficients of determination. In all tables, search queries (M1 and M2) have a significant and positive effect on exchange volume, which is in line with the hypothesis. In contrast, sentiment seems to have no effect on exchange volume.

Oddly enough, the interaction term between search queries and sentiment does seem to have an effect. However, the explained variance appears to come mostly from the search queries, since the R2 decreases in M4 (compared to M1) and in M5 (compared to M2).

Regarding the control variables, we see varying results. Firstly, S&P500 does not seem to be a substitute for higher trading volumes, but rather moves in the same direction in the long term. The only exception is found in Table 2c (Litecoin), where this does not seem to be the case in M1 and M2. Secondly, the effect of DollarEuro varies across models. In some models, it has a positive and significant effect. This would mean that in the long term, when the dollar becomes cheaper compared to the euro, exchange volume is higher. But we also find significant negative effects in these tables, which indicate the opposite.

Table 2a: Ordinary Least Squares on BTCexvol

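| | M1 B | M1 S.E. | M2 B | M2 S.E. | M3 B | M3 S.E. | M4 B | M4 S.E. | M5 B | M5 S.E. |
|---|---|---|---|---|---|---|---|---|---|---|
| BTCgoogle | .628*** | .039 | | | | | | | | |
| BTCwiki | | | .538*** | .044 | | | | | | |
| BTCsent (ref=neg.) | | | | | .137 | .093 | | | | |
| BTCsentgoogle (ref.=neg) | | | | | | | .290*** | .045 | | |
| BTCsentwiki (ref.=neg) | | | | | | | | | .270*** | .047 |
| S&P500 | .435*** | .048 | .561*** | .051 | .845*** | .059 | .753*** | .056 | .764*** | .056 |
| DollarEuro | .066 | .048 | .046 | .056 | -.334*** | .058 | -.223*** | .0568 | -.212*** | .059 |
| _cons | 2.7e-09 | .032 | 2.1e-09 | .037 | -.075 | .069 | 7.1e-09 | .043 | 6.1e-09 | .043 |
| R2 | .74 | | .67 | | .48 | | .56 | | .54 | |

*** p<0.001, ** p<0.01, * p<0.05; N = 249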

Table 2b: Ordinary Least Squares on ETHexvol

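| | M1 B | M1 S.E. | M2 B | M2 S.E. | M3 B | M3 S.E. | M4 B | M4 S.E. | M5 B | M5 S.E. |
|---|---|---|---|---|---|---|---|---|---|---|
| ETHgoogle | .706*** | .041 | | | | | | | | |
| ETHwiki | | | .265*** | .049 | | | | | | |
| ETHsent (ref=neg.) | | | | | .007 | .099 | | | | |
| ETHsentgoogle (ref.=neg) | | | | | | | .298*** | .049 | | |
| ETHsentwiki (ref.=neg) | | | | | | | | | .154** | .050 |
| S&P500 | .198*** | .051 | .607*** | .061 | .733*** | .060 | .577*** | .061 | .678*** | .061 |
| DollarEuro | .360*** | .048 | .015 | .060 | -.104 | .060 | .042* | .061 | -.044* | .061 |
| _cons | 2.2e-09 | .032 | 3.2e-09 | .044 | -.005 | .079 | 3.5e-09 | .044 | 4.1e-09 | .046 |
| R2 | .75 | | .51 | | .45 | | .53 | | .48 | |

*** p<0.001, ** p<0.01, * p<0.05; N = 249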

Table 2c: Ordinary Least Squares on LTCexvol

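| | M1 B | M1 S.E. | M2 B | M2 S.E. | M3 B | M3 S.E. | M4 B | M4 S.E. | M5 B | M5 S.E. |
|---|---|---|---|---|---|---|---|---|---|---|
| LTCgoogle | .907*** | .032 | | | | | | | | |
| LTCwiki | | | .810*** | .043 | | | | | | |
| LTCsent (ref=neg.) | | | | | .062 | .123 | | | | |
| LTCsentgoogle (ref.=neg) | | | | | | | .663*** | .047 | | |
| LTCsentwiki (ref.=neg) | | | | | | | | | .594*** | .051 |
| S&P500 | -.021 | .040 | .068 | .052 | .393*** | .077 | .200*** | .059 | .230*** | .063 |
| DollarEuro | .190*** | .040 | .193*** | .053 | -.227* | .077 | -.011 | .059 | -.006 | .065 |
| _cons | -1.5e-09 | .030 | 3.5e-09 | .038 | -.029 | .083 | 3.4e-09 | .045 | 1.4e-09 | .049 |
| R2 | .78 | | .64 | | .10 | | .50 | | .42 | |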

Finally, looking at the long-term effects on volatility (Tables 3a-3c), we find that search queries positively and significantly affect volatility in almost all cases. The exception is M2 of Table 3b: ETHwiki does not seem to have a significant effect. Furthermore, the R2 is relatively high in the first two models of all three tables.

Again, sentiment does not seem to add any explanatory value, and the significance of the interaction effects seems to come mostly from the search queries. As with exchange volume, however, search queries also seem to have a long-run relationship with volatility.

Table 3a: Ordinary Least Squares on BTCstd

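| | M1 B | M1 S.E. | M2 B | M2 S.E. | M3 B | M3 S.E. | M4 B | M4 S.E. | M5 B | M5 S.E. |
|---|---|---|---|---|---|---|---|---|---|---|
| BTCgoogle | .761*** | .179 | | | | | | | | |
| BTCwiki | | | .697*** | .181 | | | | | | |
| BTCsent (ref=neg.) | | | | | -.156 | .364 | | | | |
| BTCsentgoogle (ref.=neg) | | | | | | | .117 | .182 | | |
| BTCsentwiki (ref.=neg) | | | | | | | | | .086 | .192 |
| S&P500 | -.242 | .221 | -.083 | .211 | .289 | .233 | .288 | .228 | .296 | .228 |
| DollarEuro | .369 | .222 | .351 | .232 | -.189 | .223 | -.132 | .243 | -.146 | .246 |
| _cons | 6.8e-09 | .136 | 9.8e-10 | .141 | .085 | .261 | 2.0e-09 | .170 | 9.5e-10 | .171 |
| R2 | .41 | | .36 | | .07 | | .07 | | .07 | |

*** p<0.001, ** p<0.01, * p<0.05; N = 35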

Table 3b: Ordinary Least Squares on ETHstd

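| | M1 B | M1 S.E. | M2 B | M2 S.E. | M3 B | M3 S.E. | M4 B | M4 S.E. | M5 B | M5 S.E. |
|---|---|---|---|---|---|---|---|---|---|---|
| ETHgoogle | .698** | .210 | | | | | | | | |
| ETHwiki | | | .137 | .199 | | | | | | |
| ETHsent (ref=neg.) | | | | | -.168 | .418 | | | | |
| ETHsentgoogle (ref.=neg) | | | | | | | .525* | .204 | | |
| ETHsentwiki (ref.=neg) | | | | | | | | | .138 | .193 |
| S&P500 | -.430 | .263 | .095 | .251 | .194 | .224 | -.235 | .258 | .101 | .245 |
| DollarEuro | .666* | .248 | .215 | .248 | .111 | .229 | .500 | .246 | .207 | .241 |
| _cons | 4.5e-09 | .146 | 2.3e-09 | .168 | .130 | .364 | -7.4e-10 | .154 | 1.8e-09 | .168 |
| R2 | .32 | | .09 | | .09 | | .24 | | .10 | |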

Table 3c: Ordinary Least Squares on LTCstd

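| | M1 B | M1 S.E. | M2 B | M2 S.E. | M3 B | M3 S.E. | M4 B | M4 S.E. | M5 B | M5 S.E. |
|---|---|---|---|---|---|---|---|---|---|---|
| LTCgoogle | .816*** | .146 | | | | | | | | |
| LTCwiki | | | .807*** | .149 | | | | | | |
| LTCsent (ref=neg.) | | | | | .253 | .418 | | | | |
| LTCsentgoogle (ref.=neg) | | | | | | | .562** | .167 | | |
| LTCsentwiki (ref.=neg) | | | | | | | | | .540** | .175 |
| S&P500 | -.387* | .183 | -.324 | .181 | .066 | .231 | -.131 | .207 | -.119 | .212 |
| DollarEuro | .440* | .183 | .479* | .190 | .043 | .262 | .246 | .214 | .276 | .225 |
| _cons | -3.5e-09 | .125 | -7.1e-10 | .127 | -.109 | .251 | 6.2e-09 | .151 | 6.1e-09 | .154 |
| R2 | .50 | | .49 | | .02 | | .27 | | .24 | |

*** p<0.001, ** p<0.01, * p<0.05; N = 35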

4.2 Autocorrelation

While running the OLS, the regressions were also checked for autocorrelation by looking for high t-values in combination with a relatively high R2. For all regressions with exchange volume or volatility as the dependent variable, tests were conducted, and the presence of autocorrelation was confirmed only for exchange volume. This is not desirable when conducting an OLS regression.

To address this problem, I conducted a Prais-Winsten estimation. The estimation shows that the effects of the search queries do not change. However, the effects of the interaction terms BTCsentwiki (ref.=neg), ETHsentwiki (ref.=neg) and ETHsentgoogle (ref.=neg) are no longer significant after correcting for autocorrelation. The significance levels of S&P500 and DollarEuro have also changed: all of their significant effects become non-significant in the Prais-Winsten estimation. All in all, this means that the effects of the search queries are the only ones that still hold.
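To show what the correction does, the sketch below implements a one-step Prais-Winsten estimate for a single regressor with AR(1) errors. It is a simplified illustration on simulated data, not the Stata estimation used for this research, which may iterate until convergence and includes control variables. The function names, the seed and the simulated parameters are assumptions.

```python
import random


def ols_two_columns(c, x, y):
    """OLS of y on the two columns c and x (no extra intercept); returns (b_c, b_x)."""
    scc = sum(a * a for a in c)
    scx = sum(a * b for a, b in zip(c, x))
    sxx = sum(b * b for b in x)
    scy = sum(a * t for a, t in zip(c, y))
    sxy = sum(b * t for b, t in zip(x, y))
    det = scc * sxx - scx * scx
    return (scy * sxx - scx * sxy) / det, (scc * sxy - scx * scy) / det


def prais_winsten(x, y):
    """One-step Prais-Winsten estimate of (intercept, slope, rho)."""
    ones = [1.0] * len(y)
    a, b = ols_two_columns(ones, x, y)
    resid = [yt - a - b * xt for xt, yt in zip(x, y)]
    # AR(1) coefficient of the OLS residuals.
    rho = (sum(e1 * e0 for e0, e1 in zip(resid, resid[1:]))
           / sum(e0 * e0 for e0 in resid[:-1]))
    w = (1 - rho ** 2) ** 0.5
    # The first observation is rescaled rather than dropped (the Prais-Winsten step).
    c_star = [w] + [1 - rho] * (len(y) - 1)
    x_star = [w * x[0]] + [x1 - rho * x0 for x0, x1 in zip(x, x[1:])]
    y_star = [w * y[0]] + [y1 - rho * y0 for y0, y1 in zip(y, y[1:])]
    a_pw, b_pw = ols_two_columns(c_star, x_star, y_star)
    return a_pw, b_pw, rho


# Simulated data: y = 1 + 2x + u, where u follows an AR(1) process with rho = 0.7.
random.seed(42)
n = 400
x = [random.gauss(0, 1) for _ in range(n)]
u = [0.0]
for _ in range(n - 1):
    u.append(0.7 * u[-1] + random.gauss(0, 1))
y = [1.0 + 2.0 * xt + ut for xt, ut in zip(x, u)]

intercept, slope, rho = prais_winsten(x, y)
print(f"slope = {slope:.2f}, estimated rho = {rho:.2f}")
```

The transformation removes the estimated serial correlation from the errors, so the standard errors of the transformed regression are more trustworthy than those of the original OLS. This is why significance levels can change, as reported above, even when the coefficients themselves change little.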

4.3 Ordinary Least Squares with first differences

Because the OLS with first differences makes sure that all variables are stationary, it provides less spurious results. Regarding the first differences related to the return of the cryptocurrencies (Appendix Table 5a-c), we find no additional information of any importance.

Next, regressing the first differences of exchange volume on the first differences of the explanatory variables gives interesting results. All of the search queries remain positive and significant in these models. However, the R2 of M2 in Appendix Table 6b (Ethereum) decreases drastically compared to its equivalent in the normal OLS⁴. All of the other coefficients of determination remain high. When we look at the other models, M3, M4 and M5, we find no results that add any value in answering our hypotheses.

Appendix Table 5a: Ordinary Least Squares (first dif.) on BTCreturn

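| | M1 B | M1 S.E. | M2 B | M2 S.E. | M3 B | M3 S.E. | M4 B | M4 S.E. | M5 B | M5 S.E. |
|---|---|---|---|---|---|---|---|---|---|---|
| BTCgoogle | .077 | .195 | | | | | | | | |
| BTCwiki | | | .505* | .243 | | | | | | |
| BTCsent (ref=neg.) | | | | | .158 | .132 | | | | |
| BTCsentgoogle (ref.=neg) | | | | | | | .147* | .073 | | |
| BTCsentwiki (ref.=neg) | | | | | | | | | .163* | .082 |
| S&P500 | .522 | .360 | .499 | .357 | .505 | .359 | .525 | .357 | .522 | .357 |
| DollarEuro | -.716 | .432 | -.666 | .429 | -.705 | .430 | -.732 | .429 | -.714 | .429 |
| _cons | -.007 | .089 | -.006 | .088 | -.008 | .089 | -.008 | .088 | -.008 | .062 |
| R2 | .02 | | .04 | | .03 | | .04 | | .03 | |

*** p<0.001, ** p<0.01, * p<0.05; N = 248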

Finally, regarding volatility (Appendix Table 7a-c), we see similar results as in the normal OLS regressions. The search query coefficients that are significant in the normal OLS remain significant and positive in the OLS with first differences. The significance of the interaction terms, however, fades away after taking first differences.

All in all, we can conclude that search queries are the only variables with long-term effects, namely on exchange volume and volatility, and that these effects hold even after controlling for non-stationarity and autocorrelation. We do find one exception regarding search queries, namely the effect of ETHwiki on volatility. This is the only exception, however, while the other results confirm my expectation.

When we look at sentiment and the interaction term between sentiment and search queries, we do not find any striking results that confirm the hypotheses. In line with these results, and with the results of the Prais-Winsten estimation, we can reject Hypothesis 3a (return), 3b (volatility) and 3c (exchange volume) and Hypothesis 5a (return), 5b (volatility) and 5c (exchange volume). The same holds for Hypothesis 1a. On the other hand, we find noteworthy results that confirm Hypothesis 1b (volatility) and 1c (exchange volume). In other words, we find long-term effects of search queries on the volatility and exchange volume of the cryptocurrencies.

4.4 Vector Autoregressive models (VAR)

Appendix Tables 8a, 8b and 8c present the VAR results, which are used to investigate any evidence of Granger causality towards the return of the cryptocurrencies. Looking first at the lags of the search queries, we see significant coefficients for both lags of BTCwiki, LTCgoogle and LTCwiki on return. This is also confirmed by a Granger causality test⁵. The other three search query variables have no anticipating effect on return at all. Zooming in on the significant search queries, we see that the first lag is positive in all cases, while the second lag is negative. This might indicate some reversal in the very short term.

Furthermore, sentiment seems to have no forecasting effect on Bitcoin and Litecoin return. In contrast, the first lag of sentiment in Appendix Table 8b does seem to have an effect on Ethereum’s return. In line with my expectations, this effect is positive.

Moving on to the interaction terms, we see significant effects on Ethereum’s and Litecoin’s return. In Appendix Tables 8b and 8c, all first lags of the interaction terms are significant and positive, in line with my hypotheses. However, this does not hold for the interaction terms on Bitcoin’s return.

The control variables do not give many significant results. The main and only consistent result is that the first lag of S&P500 seems to have a positive effect on Ethereum’s return. In some cases, the second lag of S&P500 is significant as well, but negative. Again, we see some kind of reversal effect between the lags.

5 All Granger Causality tests can be viewed in the Appendix in Appendix Table 11a (return), 11b (exchange volume),

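Appendix Table 8a: Vector Autoregression on BTCreturn

| | M1 B | M1 S.E. | M2 B | M2 S.E. | M3 B | M3 S.E. | M4 B | M4 S.E. | M5 B | M5 S.E. |
|---|---|---|---|---|---|---|---|---|---|---|
| BTCreturn L1. | -.002 | .063 | -.039 | .063 | -.014 | .064 | -.013 | .064 | -.020 | .064 |
| BTCreturn L2. | .001 | .064 | .001 | .063 | -.002 | .064 | -.002 | .064 | .000 | .064 |
| BTCgoogle L1. | .017 | .143 | | | | | | | | |
| BTCgoogle L2. | .000 | .144 | | | | | | | | |
| BTCwiki L1. | | | .585** | .176 | | | | | | |
| BTCwiki L2. | | | -.524** | .177 | | | | | | |
| BTCsent (ref=neg.) L1. | | | | | .176 | .129 | | | | |
| BTCsent (ref=neg.) L2. | | | | | .011 | .129 | | | | |
| BTCsentgoogle (ref.=neg) L1. | | | | | | | .073 | .067 | | |
| BTCsentgoogle (ref.=neg) L2. | | | | | | | .032 | .068 | | |
| BTCsentwiki (ref.=neg) L1. | | | | | | | | | .122 | .071 |
| BTCsentwiki (ref.=neg) L2. | | | | | | | | | -.003 | .073 |
| S&P500 L1. | .347 | .261 | .325 | .255 | .335 | .259 | .328 | .260 | .323 | .259 |
| S&P500 L2. | -.430 | .263 | -.422 | .256 | -.394 | .262 | .055 | .261 | -.428 | .260 |
| DollarEuro L1. | -.119 | .318 | -.060 | .311 | -.114 | .314 | -.079 | .318 | -.068 | .317 |
| DollarEuro L2. | .068 | .318 | .035 | .305 | .045 | .310 | .055 | .310 | .057 | .309 |
| _cons | -.007 | .063 | -.006 | .062 | -.109 | .114 | -.007 | .063 | -.007 | .062 |

*** p<0.001, ** p<0.01, * p<0.05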

In Appendix Tables 9a, 9b and 9c (exchange volume), we see results similar to those in Appendix Tables 8a, 8b and 8c (return). BTCwiki, ETHgoogle, ETHwiki and LTCwiki seem to have a forecasting effect in line with my expectations, while the other search queries show no significant effects. As for sentiment and the interaction terms, almost none of the coefficients are significant. The three interaction terms that do show significant effects (Appendix Tables 9a and 9b) seem to be more of an extension of their related search query. Any leverage effects here thus seem to stem from the search queries, and not from sentiment. This makes a leverage effect not very convincing, even where it is significant.

Lastly, we can draw similar conclusions for the control variables. DollarEuro seems to have an effect in two of the five models in Appendix Table 9a. In the other three models, only the second lag is significant, while the first lag is not. Thus, it cannot be viewed as a valid Granger causal effect.

Finally, we investigate whether there are any forecasting effects towards the volatility of the cryptocurrencies in Appendix Tables 10a, 10b and 10c. All of the Wikipedia search queries are positive and significant in the first lag. For Google Trends, however, we can only say the same for BTCgoogle. The forecasting power of Google Trends is therefore far from overwhelming.

Again, sentiment and the interaction terms give few results in line with the hypotheses. Only BTCsentwiki (ref.=neg) seems to have a significant and positive effect on the volatility of Bitcoin. The control variables do not seem to affect the volatility of the cryptocurrencies either. Only in Appendix Table 10c do we see that the second lag of S&P500 (and the first lag in M2) has a significant effect. However, I do not regard the significance of the second lag as valid when the first lag is not significant as well.

4.5 Vector Error Correction model (VECM)

As mentioned before, most of the variables appeared to be non-stationary. In addition, I tested the variables for cointegration (Johansen, 1995) and found multiple cointegrating vectors. For this reason, the VECM seems suitable to provide less spurious results. All of the VECM results can be found in the Appendix. For simplicity, I will only mention the main results that are relevant and refer to the tables.

First, we look at return (Appendix Tables 12a, 12b and 12c). In all models, the error correction term, ce1 L1. (cointegrating equation 1), is negative and significant. This means that there is a long-run (Granger) causality running from the explanatory variables to return. Next, we see that search queries do not have any strong and convincing forecasting power towards return. There are significant effects, but they are not strong enough to confirm Hypothesis 2a. However, one striking finding in the VECM results (Appendix Tables 12a, 12b and 12c) is that (positive) sentiment has a negative forecasting effect on the return of Bitcoin, Ethereum and Litecoin. In other words, the forecasting relation between positive sentiment and return is negative: positive sentiment is followed by significantly negative returns. This is notable, since it contradicts the direction of my expectations. As for the interaction terms and control variables, we find no noteworthy results.
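The role of the error correction term can be made explicit with a single-equation sketch. The notation is an assumption made for illustration: $\alpha$ stands for the adjustment coefficient reported as ce1 L1., $\theta$ is a hypothetical cointegrating coefficient, and the lag structure follows Equation 4.

```latex
\[
\Delta y_t = \alpha\,(y_{t-1} - \theta x_{t-1}) + \beta_2 \Delta y_{t-1} + \beta_3 \Delta y_{t-2} + \beta_4 \Delta x_{t-1} + \dots + v_t^{\Delta y}
\]
\[
\alpha = -0.986,\quad y_{t-1} - \theta x_{t-1} = 1
\;\Rightarrow\; \alpha\,(y_{t-1} - \theta x_{t-1}) = -0.986
\]
```

With a negative $\alpha$, a value of y above its long-run level is followed by a downward correction. Taking the M1 coefficient of -.986 from Appendix Table 12a at face value, almost the entire deviation from the long-run relation would be corrected within one day, other things being equal.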

Appendix Table 12a: Vector Error Correction on BTCreturn

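| | M1 B | M1 S.E. | M2 B | M2 S.E. | M3 B | M3 S.E. | M4 B | M4 S.E. | M5 B | M5 S.E. |
|---|---|---|---|---|---|---|---|---|---|---|
| _ce1 L1. | -.986*** | .111 | -.995*** | .112 | -.726*** | .104 | -1.002*** | .112 | -1.019*** | .113 |
| BTCreturn LD. | .005 | .091 | -.007 | .090 | -.180 | .088 | .006 | .091 | .019 | .092 |
| BTCreturn L2D. | .013 | .065 | .017 | .064 | -.091 | .065 | .011 | .066 | .026 | .066 |
| BTCgoogle LD. | -.002 | .008 | | | | | | | | |
| BTCgoogle L2D. | .001 | .008 | | | | | | | | |
| BTCwiki LD. | | | .031** | .010 | | | | | | |
| BTCwiki L2D. | | | -.021* | .010 | | | | | | |
| BTCsent (ref=neg.) LD. | | | | | -.031*** | .009 | | | | |
| BTCsent (ref=neg.) L2D. | | | | | -.016* | .007 | | | | |
| BTCsentgoogle (ref.=neg) LD. | | | | | | | -.002 | .004 | | |
| BTCsentgoogle (ref.=neg) L2D. | | | | | | | .001 | .004 | | |
| BTCsentwiki (ref.=neg) LD. | | | | | | | | | -.004 | .004 |
| BTCsentwiki (ref.=neg) L2D. | | | | | | | | | -.004 | .004 |
| S&P500 LD. | .018 | .016 | .020 | .015 | .006 | .016 | .017 | .016 | .018 | .016 |
| S&P500 L2D. | -.031 | .016 | -.030 | .015 | -.040* | .016 | -.032* | .016 | -.032 | .016 |
| DollarEuro LD. | .000 | .019 | -.001 | .018 | .004 | .020 | .002 | .019 | -.001 | .019 |
| DollarEuro L2D. | .002 | .019 | -.001 | .018 | .009 | .019 | .003 | .018 | .003 | .018 |
| _cons | | | | | | | | | | |

*** p<0.001, ** p<0.01, * p<0.05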

When we look at exchange volume (Appendix Tables 13a, 13b and 13c), the first thing to mention is that only for Bitcoin are the error correction terms negative and significant in every model. In the other two tables (Litecoin and Ethereum), this holds for only two of the five models. Moreover, none of the explanatory variables give consistent results, especially with regard to the hypotheses. The same goes for the control variables.

Finally, looking at the VECM models on volatility (Appendix Tables 14a, 14b and 14c), we find even fewer notable results. Only the lags of BTCwiki have a positive effect on volatility. In some cases, S&P500 even has a negative effect on the volatility of Litecoin.

All in all, none of the models give additional results that could confirm the hypotheses. We do find significant results in some cases, and some surprising results regarding the forecasting power of sentiment on the return of the cryptocurrencies, namely a negative relationship between the first lag of positive sentiment and return. However, Hypothesis 2a (return), 2b (volatility) and 2c (exchange volume) can be rejected. Also, the results show no consistent support for my exploratory Hypothesis 4a (return), 4b (volatility) and 4c (exchange volume).

5. Conclusion

In this research, my main goal was to investigate the relationship of investor’s attention and investor sentiment with multiple features of the cryptocurrency market, namely the return, exchange volume and volatility of Bitcoin, Ethereum and Litecoin. I combined different data sets and introduced a unique sentiment proxy, constructed through a sentiment analysis, to obtain the necessary variables. In my introduction, I proposed three research questions. I will answer them briefly here and finish with a discussion.

To what extent do search queries have a relationship with the return, volatility and exchange volume of the relevant cryptocurrency?

The results show that there is no consistent long-run relation between search queries and return. The search queries do, however, show a clear long-run relation with the exchange volume and volatility of all of the investigated cryptocurrencies. These results hold even after controlling for non-stationarity and autocorrelation. Furthermore, the analyses of the forecasting power of search queries did not produce sufficiently strong results to confirm any anticipation of return, exchange volume or volatility.

To what extent does sentiment have a relationship with the return, volatility and exchange volume of the relevant cryptocurrency?

The sentiment proxy showed no long-run relation with return, volatility or exchange volume. In the rare cases in which it was significant, positive sentiment even had a negative (forecasting) effect. All in all, no consistent long-run or forecasting relationship was found in any of the dynamic time series models.

To what extent does the interaction between search queries and sentiment have a relationship with the return, volatility and trading volume of the relevant cryptocurrency?

The interaction terms have shown significant effects, but their significance appears to be owed mostly to the effect of the search queries. The explained variance is drastically lower in the models with the interaction terms than in the models with the search queries. In summary, no consistent relationships were found between the interaction terms and any of the dependent variables.

6. Discussion

There were some limitations in my dataset. To begin with, I had to use weekly data for volatility. This was not desirable given that the other data are daily, but it was necessary. I was also limited in the time period I could investigate, because of the sentiment proxy. The daily discussions on Reddit, from which the sentiment proxy is retrieved, only contained sufficient daily posts from around September 2017 onwards. The main idea is that without sufficient daily posts, the daily “pool” of posts would not be big enough to reflect reasonable discussions. Moreover, unimportant or irrelevant posts, such as spam, which may not have been checked intensively by the Reddit community, would be more likely to end up in the dataset, despite any artificial prerequisites.

Following Kristoufek (2013), we would have expected the price (read: return) of cryptocurrencies to have a long-run relationship with the search queries of Google Trends and Wikipedia. However, I did not find these results. One possible explanation could be the difference in the time periods studied. A major difference between my data and those of Kristoufek (2013) is the boom and eventual burst of the cryptocurrency market at the end of 2017. Such an overwhelming event was not present in Kristoufek’s (2013) data.

Also, previous literature stated that search queries could have considerable forecasting effects on the different variables I investigated. Even though these findings were evident in the stock markets (Dimpfl & Jank, 2011; Joseph et al., 2011; Preis et al., 2013), they did not seem to apply to the cryptocurrency market.

Furthermore, the sentiment proxy in this research did not seem to be a successful forecaster, nor did it show any consistent long term relationship with any of the dependent variables. Creating a sentiment proxy can be tricky, since there are numerous ways to do it. Moreover, it is extremely hard to find a tool, even a highly sophisticated one, that can be applied in every context. For example, Kim & Kim (2014) used the same tool on the Yahoo! Finance message board to retrieve a sentiment proxy, and found that the proxy could not provide any forecasting power for stock price performance. On the other hand, Kim & Kim (2014) found that (positive) sentiment seemed to be reflected later on by the stock price performance. Even though the difficulty of creating a proxy was obvious, I found it interesting and relevant to explore sentiment analysis, using Reddit as my main source.

Since most previous research on cryptocurrencies has focused on Bitcoin, my interest was to extend this by focusing on other cryptocurrencies as well, and to see whether the results converged. However, for future research, it might also be interesting to look at portfolios instead of single cryptocurrencies. This way, idiosyncratic risk is largely diversified away. For example, the same research could be done by constructing a portfolio of different cryptocurrencies and comparing it to a benchmark, like the S&P 500.
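As a rough illustration of the portfolio suggestion above, the sketch below builds an equal-weighted portfolio return series and its excess return over a benchmark. The equal weighting, the function names and all numbers are assumptions made for illustration; other weighting schemes would be equally possible.

```python
def equal_weighted_returns(asset_returns):
    """Return the per-period return of an equal-weighted portfolio.

    asset_returns: dict mapping an asset name to a list of periodic returns.
    All lists must have the same length.
    """
    series = list(asset_returns.values())
    if not series:
        raise ValueError("at least one asset is required")
    n_periods = len(series[0])
    if any(len(s) != n_periods for s in series):
        raise ValueError("all return series must have the same length")
    return [sum(s[t] for s in series) / len(series) for t in range(n_periods)]


def excess_returns(portfolio, benchmark):
    """Per-period portfolio return minus benchmark return."""
    if len(portfolio) != len(benchmark):
        raise ValueError("portfolio and benchmark must have the same length")
    return [p - b for p, b in zip(portfolio, benchmark)]


# Made-up returns for two periods, for illustration only.
crypto_returns = {
    "Bitcoin": [0.03, -0.01],
    "Ethereum": [0.06, 0.02],
    "Litecoin": [0.00, -0.04],
}
benchmark_returns = [0.01, 0.00]  # hypothetical benchmark, e.g. an equity index

portfolio = equal_weighted_returns(crypto_returns)
excess = excess_returns(portfolio, benchmark_returns)
```

The resulting excess return series could then take the place of a single cryptocurrency's return as the dependent variable.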

Furthermore, there are also numerous possibilities when it comes to proxies for investor sentiment and investor’s attention. In this research, the sentiment proxy was constructed by applying the NLTK tool to Reddit posts. Another possible sentiment evaluation tool is SentiStrength, which has been used in the field of finance as well (Zheludev et al., 2014). Moreover, multiple sentiment proxies could be included to evaluate the effects and to see whether they show converging results. As for investor’s attention, more types of search queries could be investigated. The search queries in my research were based on search tickers, but investor’s attention could also arise from sources not directly related to a search engine, for instance by clicking through a news feed that draws on different sources.
