
PREDICTING UK STOCK INDICES METRICS USING TWITTER METRICS


Academic year: 2021


PREDICTING UK STOCK INDICES METRICS

USING TWITTER METRICS

D.P.J. Bodewes S2545233

Master’s Thesis Finance Master Thesis Marketing Intelligence

University of Groningen Faculty of Economics and Business

e-mail: d.p.j.bodewes@student.rug.nl

Asset Pricing

Supervisor: dr. E. de Haan Supervisor: dr. I. Souropanis

Abstract:

Financial market prediction using Twitter sentiment has drawn increasing attention over the last decade. However, the majority of this research has used predominantly political and economic proxies to make predictions about some of the largest stock indices in the world. Using OLS multiple regression, this thesis finds that these political and economic proxies can also explain variation in the FTSE100 metrics (daily returns, daily volatility and daily volume) as well as in those of the smaller UK stock indices (FTMC and FTSC). Furthermore, the analysis provides evidence that new proxies inspired by the PESTLE (Political, Economical, Social, Technological, Legal and Ecological) model contribute to explaining the variance of the stock index metrics. A further OLS multiple regression finds that other Twitter metrics (Likes and Retweets) are also able to explain variation in these stock index metrics. Finally, combining the 18 Twitter proxies into a 3 Factor Model explains as much or more variance than the best performing individual proxy. A MAPE comparison of the 3 Factor Model's predictions with those of a random walk provides evidence that the 3 Factor Model reduces the prediction error for the FTSE100 returns (215%), FTMC returns (130%), FTSC returns (60%) and the FTSE100 volume (0.08%).


Table of Contents

1. INTRODUCTION

2. LITERATURE REVIEW
2.1 Twitter keywords & UK stock indices
2.2 Twitter Sentiment forecasting
2.3 Likes and Retweets
2.4 Factor Analysis

3. DATA COLLECTION
3.1 Twitter Sentiment Data
3.2 Polarity Score
3.3 Tweets sample size and weekend observations
3.4 Stock indices
3.5 Descriptive statistics

4. METHODOLOGY
4.1 Granger Causality
4.2 OLS multiple regression
4.3 Principal Component Analysis
4.4 Mean Absolute Percentage Error

5. RESULTS
5.1 Granger Causality
5.2 OLS multiple regression
5.2.1 DMPS lag variables
5.1.2 DMPS including SDR and SDL
5.3 Factor Analysis
5.4 MAPE Calculation

6. Conclusion
6.1 Granger Causality

7. Acknowledgements

8. REFERENCE LIST


1. INTRODUCTION

“Brexit will be a bloody mess, our economy will collapse”. This anonymous Tweet was posted right after the results of the Brexit referendum of the 23rd of June 2016. The next day the FTSE100 dropped 8.5% and experienced the highest trade volume of the period between 2015 and 2020, after which the index closed 3.2% lower than it opened¹. Could this Tweet have predicted these large changes in the next day’s FTSE100 metrics? It is hard to argue that this single Tweet had such a significant impact. However, a large number of Tweets could expose changes in public sentiment that contain information about the future actions of investors, which would make stock index metrics predictable.

According to the semi-strong Efficient Market Hypothesis, all past and public information is already incorporated in the price, and therefore movements in the stock markets follow a random walk. This implies that stock markets are only as predictable as the path of cumulated random numbers (Fama, 1965). However, the Behavioral Finance literature emphasizes the importance of public sentiment and its effect on financial decision making (Blasco, Corredor & Ferreruela, 2012; Prechter, 1999). In the literature, various media have been used to extract this public sentiment to predict stock indices, such as newspaper headlines (Yang, Song, Mo, Datta & Deane, 2015), search engines (Mao, Counts & Bollen, 2011), and social media data (Gross-Klussmann, König & Ebner, 2019). Over the last decade, sentiment analysis via social media has gained popularity due to the growing number of users and the resulting availability of data. In addition, Twitter has several practical advantages for data collection: 1) the data is freely available; 2) Tweets can be scraped for different time horizons; 3) specific content can be acquired by scraping² for Tweets that contain specific keyword(s).

So far, the United States (US) stock indices have been the main object of interest: various studies have used Twitter sentiment to predict stock indices such as the S&P500 (Zheludev, Smith & Aste, 2014; Renault, 2017), the DJI, and the NASDAQ100 (Mao, Counts & Bollen, 2011). However, research into the effects of Twitter sentiment on non-US stock indices has remained scarce. An exception is Garcia (2016), who provided empirical evidence that Twitter sentiment about various economic and political topics contained information to predict several major non-US indices, such as those of Australia (AS51), Japan (NKY), South Korea (KOSPI) and Israel (TA-25).

Therefore, this study aims to further explore non-US stock indices to establish cross-validity for some of the empirical evidence found in comparable US studies (Mao, Counts & Bollen, 2011; Renault, 2017; Rao & Srivastava, 2012). Furthermore, Garcia (2016) already demonstrates that results differ for national stock indices with different market caps. This suggests that Twitter sentiment may also be able to make predictions about smaller stock indices. These smaller stock indices are not yet as widely studied as their larger counterparts. However, there is evidence that the effect of investor sentiment on asset returns is even stronger for stocks with lower market capitalization, because their lower visibility slows down investors’ reactions to news (Chan, 2003; Tetlock et al., 2008; Ferguson et al., 2015). This implies that smaller stock indices, as an aggregation of those smaller stocks, may also be predictable using investor sentiment.

Apart from the size of the analyzed stock indices, there is also large variation in the manner in which Twitter sentiment is acquired. Nevertheless, the literature predominantly acquires Tweets using keywords that can be categorized as either financial or political. As an example, Mao, Counts & Bollen (2011) found empirical evidence for the prediction of US stock indices using 26 financial and economic keywords. While these two keyword categories have proven to be relevant in sentiment analysis for forecasting stock index metrics, other keyword categories may also be useful for stock index predictions. The reason is that stock markets are affected not only by financial and political matters, but also by Social (Teoh, Welch & Wazzan, 1996), Technological (Mohr-Jackson, 1994), and Ecological (Serafeim, 2018) factors. Therefore, this research focuses on the Twitter metrics of these factors and their effects on the UK stock indices. This contributes to the existing literature by expanding our knowledge of which topics contain relevant Twitter sentiment for stock market predictions.

As such, this research builds on the work of Garcia (2016) by expanding the


Ecological) proxies. These proxies are used to investigate whether daily Twitter sentiment, total likes, and total Retweets can predict the daily volume, volatility, and returns of the FTSE100, FTMC, and the FTSC over a time period of 01/01/2015 till 01/01/2020. For this empirical study, I collected 2,966,642 Tweets for 18 keywords over the same period.

A Granger causality test finds that the lags of various proxies contain information that explains variation in the UK stock index metrics. By estimating a series of regressions, I find that various proxies are able to explain variation in several of the UK stock index metrics, but that the conventional financial and political proxies outperform the new proxies. Furthermore, another series of regressions demonstrates that adding Retweets and Likes explains, for most proxies, more variation in these UK stock index metrics. Also, combining the proxies into factors explains as much or more variation than the best performing individual proxy per specific stock index metric. Lastly, Mean Absolute Percentage Error ratios demonstrate that the prediction of the stock index returns and the FTSE100 volume improves relative to a random walk, and that the returns of larger stock indices are easier to predict than those of smaller stock indices.

This thesis is structured as follows. Section 2 reviews the existing literature on Twitter sentiment metrics and their effects on stock index metrics. It also reviews the literature that informs the selection procedure of the proxies and their associated keywords. Section 3 discusses the data samples and Section 4 the methodology used to predict the variation in the stock index metrics. Section 5 discusses the key findings of the study and Section 6 wraps up with the conclusion, data limitations, and recommendations for future research.

2. LITERATURE REVIEW

The prediction of stock markets is a topic of high interest in both corporate and scientific


Behavioral Finance challenges the EMH by emphasizing that behavior and emotions steer movements in the financial markets through changes in public sentiment (Prechter, 1999). This indicates that measuring public sentiment provides information for forecasting movements in the financial markets. In the last two decades, research has measured this public sentiment via online data to make financial predictions. Several studies confirm that news media content shapes investor sentiment, which influences investors’ choice of financial assets (Devitt & Ahmad, 2007; Mao, Counts & Bollen, 2011; Vanstone & Gepp, 2018).

Empirical evidence also indicates that web search data contains predictive information for fluctuations in the financial markets (Bollen, Counts & Mao, 2011; Bordino et al., 2012): variation in search volume indicates interest in certain financial assets and therefore precedes actual changes in those assets. In the last decade, social media has received increasing interest as a medium to acquire public sentiment. One of these social media platforms is Twitter, for which several studies have provided empirical evidence of its usefulness for predicting stock market movements (Mao, Counts & Bollen, 2011; Rao & Srivastava, 2012; Gross-Klussmann, König & Ebner, 2019). Several financial keywords have already been used to capture relevant Twitter sentiment that could forecast the stock markets (Rao & Srivastava, 2011; Garcia, 2016; Usher, Dondio & Morales, 2019).

This research validates results found in previous Twitter sentiment studies by analyzing three UK stock indices. In addition, this study expands the types of keywords that contain sentiment with predictive information. This literature review further explores the concepts of Twitter sentiment, the number of Likes and Retweets, and their effects on stock index metrics. Additionally, the PESTLE factors and their effects on stock markets are reviewed to select the relevant keywords for this study.

2.1 Twitter Keywords & UK Stock Indices

Text mining programs can quantify Tweets into a sentiment score, also known as a


predictability. Multiple studies have already found that financial and political Twitter topics contain information about stock metrics such as returns, volatility and volume of several US stock indices, e.g. the S&P500 (Gross-Klussmann, König & Ebner, 2019; Renault, 2017), the NASDAQ-100 (Rao & Srivastava, 2012), and the DJI and the VIX (Bollen, Counts & Mao, 2011).

While these findings provide sufficient evidence for Twitter as a medium to make predictions about the US stock markets, it is interesting to find out whether these findings extend to other (smaller) stock indices. According to Garcia (2016), stock markets do not perfectly co-move with one another, and therefore may respond differently to public sentiment influenced by the news. However, since the FTSE100 and the S&P100 strongly correlate (Garcia, 2016), one can infer that these two Anglo-Saxon stock markets are exposed to similar influences. Public sentiment may be among these influences, which would lead to results similar to those of previous studies. Concerning the UK stock indices, the literature confirms that Twitter sentiment about Brexit events was able to predict changes in the FTSE100 (Usher, Dondio & Morales, 2019). Also, Johnman, Vanstone & Gepp (2018) found that financial news articles contain sentiment that can be used to make predictions about the volatility and returns of the FTSE100 index.

On the other hand, it is harder to form expectations regarding smaller stock indices, due to the scarcity of empirical research. Renault (2017) found that financial news sentiment contains predictive information about the IWM ETF, but the effects are weaker than for the major US stock indices. Conversely, Johnman, Vanstone & Gepp (2018) suspect that Twitter sentiment would be even better at making predictions about the UK FTMC, since the reduced visibility of smaller stocks would slow changes in positions. Furthermore, since small caps are less liquid, a sudden rise in demand or supply could cause higher returns, relatively stronger increases in volume, and also more volatility (Chan, 2003; Tetlock et al., 2008; Ferguson et al., 2015). While this reasoning may be theoretically sound, there is no extensive empirical evidence to back up this claim about the predictive power of Twitter sentiment for the smaller stock indices. Therefore, I expect to find at least similar performance of conventional Twitter sentiment for the smaller stock indices relative to the larger stock indices, which results in the following hypothesis:

Hypothesis 1: The conventional Twitter proxies used in previous literature contribute to predicting the a) FTSE100, b) FTMC, and/or c) FTSC trading data of a) daily return, b) daily volatility, and/or c) daily volume.

2.2 PESTLE Proxies Sentiment & Financial Markets

As indicated before, the literature is limited in the variety of Twitter sentiment keywords used to make stock index predictions, especially about the UK indices. Besides the extensive research on several financial and political keywords, other sources that may be relevant for stock market forecasting have largely remained uncharted. I consult the PESTLE model to structure my search for new keywords to extract Twitter sentiment that may be relevant for stock market prediction. The PESTLE model offers a macro-environmental framework of factors that influence the performance of a company, industry, or country (Perera, 2017). For the proxies associated with these six factors, the literature offers several examples in which they change investor sentiment and thereby affect financial metrics. These proxies can be translated into keywords to collect Tweets for stock market forecasting. As an example, one proxy of the Political factor is the UK prime minister, which can be translated into keywords (e.g. Boris Johnson or Theresa May).


The Economical factor includes the proxies that are most frequently used for Twitter sentiment and the predictability of stock index metrics, and which are predominantly related to financial terminology. Frequently, the literature uses methods that classify the message of a Tweet as Bullish or Bearish (Rao & Srivastava, 2012), or simply scrapes for Tweets containing the keywords Bullish and Bearish (Mao, Counts & Bollen, 2011). These methods made it possible to predict several US stock indices, such as the NASDAQ-100 (Rao & Srivastava, 2012), the VIX, and the DJI (Bollen, Mao & Counts, 2011).

Furthermore, several studies used the names of stock indices (e.g. keyword FTSE100) to make predictions about the corresponding stock index (e.g. stock index FTSE100) (Zheludev, Smith & Aste, 2014; Garcia, 2016; Gross-Klussmann, König & Ebner, 2019) or used financial terminology to make predictions about several US stock indices, among which the S&P500 (Bollen, Counts & Mao, 2011). Zheludev, Smith & Aste (2014) also found return predictability for several UK stocks. However, the literature is not conclusive about whether economic Twitter sentiment can also make predictions about the UK stock indices.

Social proxies are also known to affect financial assets. A well-known example is social unrest, which causes a flight from stocks to gold, with corresponding effects on stock index metrics (Teoh, Welch & Wazzan, 1996). Furthermore, search query volume for keywords such as ‘recession’ and ‘crisis’ was found to be indicative of changes in stock index metrics (Mao, Counts & Bollen, 2011). However, the literature does not provide insight into the effects of social Twitter sentiment on forecasting stock index metrics. On the other hand, Culotta (2012) indicated that Twitter data can help to identify epidemic outbreaks faster and more economically than traditional methods. The COVID-19 epidemic demonstrated that changes in investor sentiment caused increased volatility and affected the returns of many of the world's stock markets (Zhang, Hu & Ji, 2020). This suggests that Twitter may be a useful medium to pick up these social sentiment changes early and predict their corresponding effects on the stock markets.

The Technological proxy sentiment for stock indices can be featured by innovation


is not decisive for the position that investors take, but that sentiment about the outcome is more important. To relate sentiment measurement to social media, there is evidence that search query volume for search terms such as ‘buy stock’ and ‘sell stock’ can predict the VIX and DJI (Mao, Counts & Bollen, 2011). The first page of Google search output displays suggestions of stocks or stock indices, based on promising innovations of the respective companies or industries. If search interest enables stock index predictions, then, given its strong correlation with Twitter sentiment, this may also be the case for technological Twitter sentiment (Bollen, Counts & Mao, 2011).

The Ecological factor receives increasing attention from investors, as does its impact on stock markets, due to increased awareness of our impact on the world. The literature provides evidence that negative ESG (Environmental, Social & Governance) sentiment has a negative effect on stock returns (Serafeim, 2018). For example, in the real estate industry, public sentiment about flooding risk decreases the willingness-to-pay per square foot of property in certain areas (McAlpine & Porter, 2018). As such, sentiment about ecological risk may influence the value of financial assets. The available literature is not conclusive about the effects of ecological Twitter sentiment on stock markets.

The above section illustrates that the literature provides sufficient evidence of the effects of Twitter sentiment about the Political, Legal, and Economical (PLE) factors on the predictability of stocks and stock indices. On the other hand, for the Social, Technological, and Ecological (STE) factors the literature has not yet established such extensive evidence of Twitter sentiment effects. However, the literature provides evidence that, via other media, the proxies of these factors can be used to make predictions about stocks and/or indices. Moreover, investor sentiment is not a new phenomenon that originated with the founding of Twitter; Twitter is merely a new source from which to acquire it. Therefore, I expect that proxies of the STE factors can also be used to acquire Twitter sentiment relevant for UK stock index predictability.

However, since these topics are not always as directly related to stock indices, such as


immigration, innovation and climate change) are more abstract events that stretch over longer time periods, which makes the interpretation of their future effects on the stock market more ambiguous. Several proxies of the PLE factors will be used to cross-validate the results of previous research. In addition, new STE and PLE proxies are introduced to expand our knowledge about sources of Twitter sentiment that can be used for stock market predictability. This results in the following hypotheses:

Hypothesis 2a: The new Twitter proxies chosen via the PESTLE model contribute to predicting the a) FTSE100, b) FTMC, and/or c) FTSC trading data for the a) daily return, b) daily volatility and/or c) daily volume.

Hypothesis 2b: Twitter sentiment about the PLE factors will outperform that of the STE factors in predicting the a) FTSE100, b) FTMC, and/or c) FTSC trading data for the a) daily return, b) daily volatility, and/or c) daily volume.

2.3 Likes and Retweets

While Twitter sentiment reflects the consensus of a group of Twitter users about a certain topic, it does not reveal the reach of this consensus or the number of people that agree with it. This agreement can be measured through silent Twitter users who ‘Like’ the Tweet, which is assumed to indicate that they agree with its content. They can also ‘Retweet’ the Tweet, which allows the followers of the ‘Retweeter’ to see the message and as such enhances the Tweet’s reach. Previous literature finds that the total volume of Tweets containing the name of a stock index predicts daily stock index returns and volume (Bollen, Mao, & Zeng, 2011; Tirunillai & Tellis, 2012). While a Retweet contains the same message, the number of Retweets provides information about the number of silent users that the message reaches. All of these potentially reached investors can use this information to make decisions about their position in financial assets. Additionally, the number of Likes also signals the reach among potential investors, but at the same time provides additional information by indicating whether a person agrees with the Tweet’s content. Previous research finds that disagreement among investors increases the daily volume and volatility of stock markets (Cookson


stock markets. Therefore, I expect that the total number of Likes decreases volatility and volume. Additionally, Retweets may work as amplifiers of the Twitter sentiment effect on the stock index metrics, due to the increased reach of the sentiment. This results in the following hypothesis:

Hypothesis 3: The Tweets’ Likes and Retweets improve, in addition to Twitter sentiment, the prediction of the a) FTSE100, b) FTMC, and/or c) FTSC trading data of a) daily return, b) daily volatility, and/or c) daily volume.

2.4 Factor Analysis

The financial markets are a complex system that is affected by the behavior of many agents, who actively and passively receive information via a vast number of (online) financial news sources (Alanyali, Moat & Preis, 2013). This suggests that predicting the financial markets based on one individual source (Twitter) about one specific topic (proxy) is difficult. As such, the individual proxies may only explain small parts of the stock market metrics. However, a combination of these proxies may increase the explanatory power for the stock market metrics. Several other papers used multiple proxies to forecast stock market metrics (Mao, Counts & Bollen, 2011; Gross-Klussmann, König & Ebner, 2019). Therefore, it is interesting to see whether the combination of these proxies can improve the prediction of the selected stock market metrics. Additionally, this combination of proxies makes it possible to evaluate the performance of predicting the stock market metrics relative to the predictions of a random walk. This results in the following hypotheses:

Hypothesis 4a: The combination of proxies, relative to the individual proxies, will improve the prediction of the a) FTSE100, b) FTMC, and/or c) FTSC metrics for the a) daily return, b) daily volatility and/or c) daily volume.

Hypothesis 4b: The combination of proxies outperforms the random walk prediction of the a) FTSE100, b) FTMC, and/or c) FTSC metrics for the a) daily return, b) daily volatility and/or c) daily volume.

3. DATA COLLECTION

The advent of social media platforms can be held responsible for the rapid pace at which information is received by an increasing share of the population. Twitter is among these platforms and has seen its user base grow to over 321 million people in 2019³. Investors and traders can use not only this new pace and volume of information for making financial decisions, but also changes in the sentiment of its content. The change in sentiment can be captured by text mining programs that quantify the sentiment of a Tweet into a polarity score. Research has demonstrated that these polarity scores can be used to make predictions about trading data, such as asset returns, volatility, and volume. Blasco et al. (2012) found that sentiment plays a key role in investor herding behavior, where a sentiment change causes investors to follow this trend. Therefore, variation in the sentiment consensus of investors can cause movements in the financial markets. Hence, the next step is to select the right topics for acquiring Tweets that contain sentiment that influences stock index metrics. The subsections below elaborate on the data used for the analyses. Furthermore, the keyword selection procedure for acquiring the relevant Tweets for forecasting the stock market metrics is explained.

3.1 Twitter Sentiment Data

The data used for the sentiment analysis are the Tweets that people post via their Twitter account. These Tweets were scraped via an online Application Programming Interface (API) in Python using the package ‘Twitterscraper’ developed by Taspinar⁴. The Tweets used in this research are in English and were scraped over the period 01-01-2015 till 01-01-2020. As explained earlier, the keywords for acquiring the Tweets were selected using the factors of the PESTLE model. Table 1 provides an overview of the selected keywords. For testing both hypothesis 1 and hypotheses 2a-b, I made a distinction between keywords used in previous social media sentiment research and newly selected proxies via the following procedure.

First, as explained earlier, this research uses the PESTLE structure to select both keywords used in previous literature and new keywords. Keywords from previous studies, which predicted other stock indices, are used to validate their predictive ability for the UK stock indices. Second, the financial relevance of the conventional and new keywords was checked using various online resources: 1) Google Trends was used to discover which keywords had a high search volume, which indicates the popularity of the respective keyword⁵; 2) the daily financial interest in these keywords was checked via the search function of the Financial Times⁶. The threshold to accept a keyword was that the number of articles over the study period should exceed 1,825 (5 years × 365 days). This ensured that the keyword represents a topic that also enjoys daily interest in the financial markets; 3) the third condition was that the keyword was searched for at least 5 times a day on Wikipedia⁷, which is used as an indicator of daily new interest in the topic that the keyword represents. This signaled that the keyword remained relevant over the period of the study. A keyword was selected for this research if all 3 conditions were met. However, an exception was made for the FTSC proxy, since it represents one of the stock indices studied.
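The three screening conditions can be sketched as a simple filter. This is only an illustration under stated assumptions: the `KeywordStats` fields, the boolean Google Trends flag, the reading of the Wikipedia condition as a minimum daily count, and all example figures are hypothetical and are not the data used in this thesis.

```python
from dataclasses import dataclass

# Thresholds taken from the text: more than 1,825 Financial Times articles
# (5 years x 365 days) and at least 5 Wikipedia searches per day.
MIN_FT_ARTICLES = 5 * 365
MIN_WIKI_DAILY = 5


@dataclass
class KeywordStats:
    keyword: str
    high_trends_volume: bool  # hypothetical flag: judged popular on Google Trends
    ft_articles: int          # Financial Times articles over the study period
    wiki_daily_min: float     # assumed: lowest daily Wikipedia search count


def passes_screening(stats, exceptions=frozenset({"FTSC"})):
    """Return True if all three conditions hold or the keyword is an exception."""
    if stats.keyword in exceptions:
        return True
    return (
        stats.high_trends_volume
        and stats.ft_articles > MIN_FT_ARTICLES
        and stats.wiki_daily_min >= MIN_WIKI_DAILY
    )


# Made-up figures for illustration only.
candidates = [
    KeywordStats("Brexit", True, 20000, 150.0),
    KeywordStats("Obscure topic", True, 400, 2.0),
    KeywordStats("FTSC", False, 100, 1.0),
]
selected = [c.keyword for c in candidates if passes_screening(c)]
print(selected)
```

The exception set mirrors the treatment of the FTSC proxy, which is retained regardless of the thresholds because it represents one of the studied indices.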

3.2 Polarity Score

After the Tweets were collected, the subsequent step was to quantify the Tweets into a polarity score, such that they could be used for the stock index predictions. The polarity scores were calculated with the ‘qdap’ package, which assigns scores to the Tweets using the following formula:

\[ x_i^T = \sum \left( \left(1 + c\,(x_i^A - x_i^D)\right) \cdot w\,(-1)^{\sum x_i^N} \right), \qquad \delta = \frac{x_i^T}{\sqrt{n}} \]

Eq. 1

The algorithm analyses the Tweet for positive and negative words and labels them as polarized words using a sentiment dictionary (Hu and Liu, 2004). Subsequently, a context cluster of words (x_i^T) is pulled from around each polarized word (by default four words before and two words after) to be considered as valence shifters. The positive and negative words are then given a context by tagging the other words in the cluster as neutral, negator, amplifier, or de-amplifier.

⁵ https://trends.google.com/trends/?geo=US
⁶ Ft.com/searc


TABLE 1

Proxy Keywords

Factors: Political, Economical, Social, Technological, Legal, Ecological

Category A: Obama¹, Trump¹, David Cameron², Theresa May², Boris Johnson², Brexit, Bearish, Bullish, Economy UK, FTSE100, GBP/EUR, Recession, Buy stock, Sell stock

Category B: European Union, FTSE250, FTSC, Unemployment Rate UK, Immigration UK, House of Commons, Climate Change

Note: Labeled as proxy: 1. US president, 2. UK prime minister.

Note: Category A represents the proxies used in previous literature and Category B the new proxies introduced in this research.


In this algorithm, neutral words hold no value in the equation, but do affect the word count (n). Negating words (not, don’t) reverse the intent of a positive or negative word. Amplifying words (strongly, very) increase the intensity of a positive or negative word. De-amplifying words (barely, mildly) decrease the intensity of a positive or negative word. The cluster scores are summed and divided by the square root of the number of words in the Tweet, which returns a sentiment score per Tweet. For a more extensive explanation of the calculation of the polarity scores, Appendix B of De Haan (2020) or the rdocumentation webpage⁸ can be consulted. After acquiring the polarity scores, I calculated 1) the polarity mean of the Tweets, 2) the sum of daily Likes, and 3) the sum of daily Retweets. I selected the sum instead of the mean for both Likes and Retweets since I am interested in the total reach of the Tweets instead of the average reach.
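The scoring steps described in this subsection can be illustrated with a simplified sketch. It is not the qdap implementation: the tiny word lists, the amplifier weight of 0.8, and the lower bound of zero on the cluster weight are assumptions made for illustration, and the thesis itself relies on the Hu and Liu (2004) dictionary.

```python
import math

# Tiny illustrative lexicons (assumptions, not the Hu and Liu dictionary).
POSITIVE = {"good", "great", "strong"}
NEGATIVE = {"bad", "mess", "collapse"}
NEGATORS = {"not", "never"}
AMPLIFIERS = {"very", "strongly"}
DEAMPLIFIERS = {"barely", "mildly"}
AMP_WEIGHT = 0.8  # assumed weight of one amplifier or de-amplifier


def polarity(text, before=4, after=2):
    """Simplified polarity: summed cluster scores divided by sqrt(word count)."""
    words = text.lower().split()
    if not words:
        return 0.0
    total = 0.0
    for i, word in enumerate(words):
        if word in POSITIVE:
            sign = 1.0
        elif word in NEGATIVE:
            sign = -1.0
        else:
            continue  # neutral words only contribute to the word count
        # Context cluster: words before and after the polarized word.
        cluster = words[max(0, i - before):i] + words[i + 1:i + 1 + after]
        negations = sum(w in NEGATORS for w in cluster)
        amps = sum(w in AMPLIFIERS for w in cluster)
        deamps = sum(w in DEAMPLIFIERS for w in cluster)
        # Assumed simplification: the weight cannot drop below zero.
        weight = max(0.0, 1.0 + AMP_WEIGHT * (amps - deamps))
        total += sign * weight * (-1) ** negations
    return total / math.sqrt(len(words))


print(polarity("the economy is good"))
print(polarity("the economy is not good"))
```

In this sketch a negator inside the cluster flips the sign of the polarized word, while amplifiers and de-amplifiers scale its weight, which matches the qualitative description above without reproducing every rule of the original algorithm.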

3.3 Tweets sample size and weekend observations

Since the stock indices only trade on working days, I aggregated the weekend Tweets with the Friday Tweets. This prevents a loss of information, because the weekend Tweets may contain information that can predict the stock market metrics of Monday. The Tweets were cleaned of symbols, punctuation, and numbers, so that these would not influence the polarity score calculation. Furthermore, empty Tweets were removed from the dataset, since they did not provide any valuable information.
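The weekend aggregation and the daily Twitter metrics can be sketched as follows. The tuple layout, the function names, and the example values are hypothetical; the sketch only shows one way to fold Saturday and Sunday Tweets into the preceding Friday before computing the polarity mean and the Like and Retweet sums.

```python
from collections import defaultdict
from datetime import date, timedelta
from statistics import mean


def trading_day(d):
    """Map Saturday and Sunday to the preceding Friday; keep weekdays as is."""
    if d.weekday() == 5:  # Saturday
        return d - timedelta(days=1)
    if d.weekday() == 6:  # Sunday
        return d - timedelta(days=2)
    return d


def daily_metrics(tweets):
    """tweets: iterable of (date, polarity, likes, retweets) tuples."""
    buckets = defaultdict(list)
    for day, polarity, likes, retweets in tweets:
        buckets[trading_day(day)].append((polarity, likes, retweets))
    return {
        day: {
            "polarity_mean": mean(p for p, _, _ in rows),
            "likes_sum": sum(l for _, l, _ in rows),
            "retweets_sum": sum(r for _, _, r in rows),
        }
        for day, rows in sorted(buckets.items())
    }


# Made-up example values.
sample = [
    (date(2019, 3, 1), 0.5, 10, 2),   # Friday
    (date(2019, 3, 2), -0.5, 4, 1),   # Saturday, folded into Friday
    (date(2019, 3, 4), 0.25, 7, 0),   # Monday
]
result = daily_metrics(sample)
print(result)
```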

A second problem was determining the number of daily observations that would ensure a representative sample of the daily Tweets. This study maintains a median of 120 daily Tweets across proxies. The daily sample can deviate across proxies due to the number of Tweets available per day (FTSC proxy) or due to inconsistencies in the Python package. Figure 1a-b shows the confidence band around the daily variation of the Bearish polarity scores, by illustrating the polarity mean and the up- and downside deviations expressed as two times the daily standard error. This plot includes daily samples with a median of 120 and reflects the 95% confidence interval in which the observations occur. Figure 1a-b illustrates that the polarity mean shows enough daily variation to predict the stock index metrics. Furthermore, a sample size test was done to determine how far the sample mean is removed from the true mean⁹. Under the assumptions of a two-sided test and a 5% probability of both a Type I and a Type II error, a median sample of 120 Tweets falls within 0.8 standard deviations of the true mean.

FIGURE 1a-b: The figure illustrates the polarity mean (red) plus 2 times the standard error (blue) and minus 2 times the standard error (green).
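The band shown in Figure 1a-b can be reproduced for a single day with a short sketch, assuming the daily polarity scores are available as a plain list of numbers. The function name `mean_band` is hypothetical.

```python
import math
import statistics


def mean_band(scores):
    """Return (mean - 2*SE, mean, mean + 2*SE) for one day's polarity scores."""
    mean = statistics.mean(scores)
    # Standard error of the mean: sample standard deviation over sqrt(n).
    standard_error = statistics.stdev(scores) / math.sqrt(len(scores))
    return mean - 2 * standard_error, mean, mean + 2 * standard_error
```

With a larger daily sample the standard error shrinks, so the band around the polarity mean becomes narrower.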


3.4 Stock indices

I acquired the stock indices data for the FTSE100 (^FTSE), the FT Mid Cap (^FTMC), and the FT Small Cap (^FTSC)10 via Yahoo Finance. These stock indices are plotted in Figure 2a-c. Yahoo Finance allows easy access to financial data, which is freely available on the internet. All datasets are complete, except for the FTSC, for which Yahoo Finance contained missing values for the last two months of 2019. To prevent issues of non-normality, all the stock indices metrics were log-transformed. The intraday volatility (TIV) was calculated as $\ln(P^{max}_t / P^{min}_t)$, where $P^{max}_t$ is the highest price of day $t$, $P^{min}_t$ is the lowest price of day $t$, and $\ln$ stands for the natural logarithm. The daily returns (TDR) were calculated as $\ln(P^{adj.close}_t / P^{open}_t)$, where $P^{adj.close}_t$ is the adjusted close price of day $t$ and $P^{open}_t$ is the opening price of day $t$. The volume (TV) was calculated as $\ln(Volume_t)$, where $Volume_t$ is the volume of day $t$. All stock indices include these three metrics for the analyses, except the FTSC, whose volume has too many missing observations.
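The three metrics translate directly into code. The sketch below assumes the daily high, low, open, adjusted close and volume are available as positive numbers; the function names are hypothetical.

```python
import math


def intraday_volatility(high, low):
    """TIV: natural log of the day's highest price over its lowest price."""
    return math.log(high / low)


def daily_return(adj_close, open_price):
    """TDR: natural log of the adjusted close over the opening price."""
    return math.log(adj_close / open_price)


def log_volume(volume):
    """TV: natural log of the day's traded volume."""
    return math.log(volume)
```

A day that closes at its opening price has a return of zero, and a day whose high is 10% above its low has an intraday volatility of about 0.095.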

Figure 2a-c Stock indices metrics


3.5 Descriptive statistics

Since 18 proxies have to be covered, I only discuss the descriptive statistics and results in broad terms and highlight the most notable outcomes. This means that only the Bearish proxy is illustrated in the following subsections (based on alphabetical order), in order to keep the approach clear and prevent an overload of information.

The tables and figures of the other proxies can be found in the Appendix. Table 2 and Appendix A present the descriptive statistics of the Twitter variables: Polarity Score, Likes and Retweets. Table 2 also includes the descriptive statistics of the stock indices metrics. The total number of observations of the Twitter variables varies between 8,981 and 222,417.

The variation of observations between the different proxies can be explained due to


which seems to make sense since these proxies carry respectively a more positive and a more negative nature. In general, standard deviations vary between 0.2 (FTSE100) and 0.33 (recession), which indicates there is variation in the daily sample. The Likes and Retweets variables have a majority of observations that are zero and a small percentage with extreme values; for example, the maximum numbers of Bearish Retweets and Likes are 680 and 4,121, respectively.

Table 3 and Appendix B provide an overview of the correlations between the Twitter variables and the stock indices metrics. The lower part of the correlation matrices shows the correlations between the Polarity Mean, Likes and Retweets of the different proxies. The sign of the correlation between Likes and Retweets on the one side and Polarity Mean on the other varies per proxy, but the correlations are all approximately zero to low. The correlations between Likes and Retweets are all positive and high, which indicates that a high number of Retweets comes with a higher number of Likes. This raises concern about multicollinearity; however, the VIF scores stay below 10 for all proxies. This means that multicollinearity does not jeopardize the robustness of the later analyses (Franke, 2010).
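As a rough illustration of the VIF check, the variance inflation factor of predictor $j$ follows from the $R_j^2$ obtained when that predictor is regressed on the other predictors. The numeric part below is only a simplified two-variable sketch: it assumes Likes and Retweets were the only two regressors and uses the Bearish Likes-Retweets correlation of 0.780 from Table 3. The VIF scores computed in the thesis are based on all regressors and are not reproduced here.

```latex
\mathrm{VIF}_j = \frac{1}{1 - R_j^2},
\qquad
\mathrm{VIF}_{\text{two-variable}} = \frac{1}{1 - 0.780^2} = \frac{1}{0.3916} \approx 2.55
```

Even a pairwise correlation as high as 0.780 therefore stays well below the threshold of 10 in this simplified case.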

The upper part of the correlation matrices reviews the correlations between the Twitter variables and the stock indices metrics. In some cases, Retweets seem to have a weak to moderate correlation with FTMC Volume and Volatility, which is in line with using Retweets as a proxy for Tweet volumes, as in the findings of Bollen, Mao & Zeng (2011). Concerning the number of Likes and the FTSC volatility, the correlations are predominantly positive yet sometimes barely above zero. Strangely, the correlation of the FTSE100 volume and volatility with Likes and Retweets is generally negative. An explanation could be the lower volumes of the FTMC and the FTSC, which make them more sensitive to changes in the Tweets' reach in a negative manner.

The correlations between the proxies’ Polarity Mean and the stock indices metrics


TABLE 2: DESCRIPTIVE STATISTICS FOR STOCK INDICES METRICS AND BEARISH PROXY METRICS

| Statistic | N | Mean | St. Dev. | Min | Pctl(25) | Median | Pctl(75) | Max |
|---|---|---|---|---|---|---|---|---|
| FTSE100 Returns | 1,202 | 0.0002 | 0.01 | -0.05 | -0.004 | 0.0005 | 0.01 | 0.04 |
| FTSE100 Volume | 1,202 | 20.44 | 0.26 | 19.04 | 20.31 | 20.44 | 20.57 | 22.05 |
| FTSE100 Volatility | 1,202 | 0.01 | 0.01 | 0.002 | 0.01 | 0.01 | 0.01 | 0.09 |
| FTMC Returns | 1,204 | 0.0003 | 0.01 | -0.07 | -0.004 | 0.001 | 0.005 | 0.04 |
| FTMC Volume | 1,204 | 18.96 | 3.15 | 0.00 | 19.29 | 19.47 | 19.64 | 20.71 |
| FTMC Volatility | 1,204 | 0.01 | 0.01 | 0.002 | 0.01 | 0.01 | 0.01 | 0.13 |
| FTSC Returns | 1,106 | 0.0003 | 0.01 | -0.04 | -0.002 | 0.001 | 0.003 | 0.03 |
| FTSC Volatility | 1,106 | 0.01 | 0.01 | 0.001 | 0.003 | 0.004 | 0.01 | 0.11 |
| Polarity Mean | 1,241 | -0.25 | 0.07 | -0.67 | -0.29 | -0.25 | -0.20 | 0.02 |
| Retweets | 1,241 | 3.45 | 1.10 | 0.00 | 2.83 | 3.47 | 4.16 | 6.57 |
| Likes | 1,241 | 4.63 | 1.25 | 0.69 | 3.81 | 4.62 | 5.42 | 8.76 |


while FTSE100 and FTSE250 are substitutes based on the corresponding proxy polarity mean.

The polarity mean also correlates differently with the volatility and volume of the stock indices. In general, the polarity mean correlates both positively and negatively with the volume, while correlating negatively with the volatility of the three stock indices. This may be explained by the fact that lower polarity scores signal increased disagreement, while higher polarity scores signal more agreement, based on the findings of Cookson & Niessner (2020). Also, the variation in signs for the volume and volatility may be explained by whether a proxy embodies a negative or positive subject among investors. For example, one explanation for the positive correlation between EU polarity and Volume (corr = 0.124***) may be increased market interest because of more public confidence in the Brexit negotiations.

4. METHODOLOGY

The aim of this study is to predict the stock indices metrics, which are the returns,


Table 3: Correlation matrix with the Dependent Variables and the Bearish Independent Variables

Columns (1) to (8) are the dependent variables; columns (9) to (11) are the independent variables.

| Variable | (1) | (2) | (3) | (4) | (5) | (6) | (7) | (8) | (9) | (10) | (11) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| (1) FTSE Returns | | -0.135*** | -0.109*** | 0.808*** | -0.170*** | -0.052 | 0.732*** | -0.137*** | -0.045 | -0.025 | 0.019 |
| (2) FTSE volatility | -0.135*** | | 0.480*** | -0.231*** | 0.792*** | -0.017 | -0.276*** | 0.556*** | -0.087** | 0.006 | -0.134*** |
| (3) FTSE Volume | -0.109*** | 0.480*** | | -0.206*** | 0.581*** | 0.419*** | -0.177*** | 0.404*** | -0.090** | -0.008 | -0.197*** |
| (4) FTMC Returns | 0.808*** | -0.231*** | -0.206*** | | -0.326*** | -0.063* | 0.809*** | -0.239*** | -0.040 | -0.040 | 0.068* |
| (5) FTMC volatility | -0.170*** | 0.792*** | 0.581*** | -0.326*** | | 0.102** | -0.325*** | 0.628*** | -0.042 | 0.039 | -0.150*** |
| (6) FTMC Volume | -0.052 | -0.017 | 0.419*** | -0.063* | 0.102** | | -0.044 | 0.071* | 0.190*** | 0.039 | 0.125*** |
| (7) FTSC Returns | 0.732*** | -0.276*** | -0.177*** | 0.809*** | -0.325*** | -0.044 | | -0.249*** | -0.041 | -0.053 | 0.040 |
| (8) FTSC volatility | -0.137*** | 0.556*** | 0.404*** | -0.239*** | 0.628*** | 0.071* | -0.249*** | | 0.027 | 0.071* | -0.065* |
| (9) Bearish Likes t-1 | -0.045 | -0.087** | -0.090** | -0.040 | -0.042 | 0.190*** | -0.041 | 0.027 | | 0.780*** | 0.361*** |
| (10) Bearish Retweets t-1 | -0.025 | 0.006 | -0.008 | -0.040 | 0.039 | 0.039 | -0.053 | 0.071* | 0.780*** | | 0.102** |
| (11) Bearish Polarity t-1 | 0.019 | -0.134*** | -0.197*** | 0.068* | -0.150*** | 0.125*** | 0.040 | -0.065* | 0.361*** | 0.102** | |

Correlations computed with the Pearson method and listwise deletion.

* p < 0.05, ** p < 0.01, *** p < 0.001


4.1 Granger Causality

The first approach is a Granger causality test in order to research whether past values of Twitter sentiment can predict stock market metrics. This test is frequently used in previous literature to find relationships between the sentiment of Twitter proxies and the stock market metrics (Garcia, 2016; Mao, Counts & Bollen, 2011; Usher, Dondio & Morales, 2019). The Granger causality test is used to find supportive evidence for hypotheses 1 and 2. The simplest linear causal model is (Granger, 1969):

$$Y_t = \sum_{j=1}^{J} \gamma_j X_{t-j} + \sum_{j=1}^{J} \beta_j Y_{t-j} + \varepsilon_t \qquad \text{(Eq. 2)}$$

where the Granger causality test analyses whether the inclusion of an independent variable $X$, for time $t$ and lag $j$ (with $J$ the maximum number of lags), helps to make predictions about a dependent variable $Y_t$ by reducing $\varepsilon_t$. This means that $X_t$ Granger-causes $Y_t$ if $\gamma_j$ is significantly different from zero. Therefore, this analysis investigates whether the daily mean polarity score (DMPS) improves forecasting of the stock indices metrics, given that the past values of the stock indices metrics are included in the prediction. However, the Granger causality test does not prove actual causality, but establishes whether a statistical pattern of lagged correlation exists. This analysis helps to review which past values of DMPS are useful for the stock indices metrics. The hypotheses would be supported if the test finds a significant Granger causality relation between Twitter sentiment and the stock indices metrics.
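The test can be sketched as a comparison of two OLS fits: a restricted model with only the own lags of the stock metric, and a full model that adds the lags of the sentiment series. The code below is a minimal standard-library illustration, not the routine used in the thesis. It adds an intercept to both models (an assumption), and it returns only the F statistic; turning that into a p-value requires the F distribution from a statistics library.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def rss(rows, target):
    """Residual sum of squares of an OLS fit of target on the given rows."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, target)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    return sum((t - f) ** 2 for t, f in zip(target, fitted))


def granger_f(y, x, lags):
    """F statistic for H0: the lags of x add nothing to the own lags of y."""
    target = y[lags:]
    restricted, full = [], []
    for t in range(lags, len(y)):
        own = [y[t - j] for j in range(1, lags + 1)]
        other = [x[t - j] for j in range(1, lags + 1)]
        restricted.append([1.0] + own)        # intercept + own lags
        full.append([1.0] + own + other)      # ... plus lags of x
    rss_restricted, rss_full = rss(restricted, target), rss(full, target)
    dof = len(target) - (2 * lags + 1)        # observations minus parameters
    return ((rss_restricted - rss_full) / lags) / (rss_full / dof)
```

A large F statistic means that the lags of the sentiment series reduce the residual error beyond what the own lags of the stock metric already achieve.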

4.2 OLS multiple regression

I further conduct an OLS multiple regression to investigate whether the polarity of


market. This test provides supportive evidence to review hypothesis 1 and hypotheses 2a-b, and as such whether the proxy lags contain information to predict the stock indices metrics. Equation 3 is used to identify the effects of the DMPS lags on the stock indices metrics:

$$Y_{t,m,j} = \alpha + \beta_{1,m,n} \cdot DMPS_{m,t-n} + \varepsilon_{t,m,n} \qquad \text{(Eq. 3)}$$

where the variable $DMPS_{m,t-n}$ is the daily mean polarity score for proxy $m$ at time $t$ with a lag of $n$ days. The term $Y_{t,m,j}$ is stock index metric $j$ (daily return, daily volatility or daily volume) explained by proxy $m$ at time $t$. The formula allows selecting the number of lags that results in the models that explain the most variance of the stock indices metrics.
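Building the lagged regressors is the only step that needs care before the regression itself. The helper below is a hypothetical sketch that assumes the DMPS series and the stock metric are two equally long lists aligned by trading day.

```python
def lag_matrix(dmps, metric, lags):
    """Pair each metric value at day t with the DMPS of days t-1 ... t-lags."""
    rows, targets = [], []
    for t in range(lags, len(metric)):
        rows.append([dmps[t - n] for n in range(1, lags + 1)])
        targets.append(metric[t])
    return rows, targets
```

Each additional lag costs one observation at the start of the sample, which is consistent with Table 4 reporting 1,198 observations for the FTSE100 models with four lags against 1,202 daily observations in Table 2.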

Another OLS multiple regression investigates the effects of the sum of daily Likes (SDL) and the sum of daily Retweets (SDR) and their ability to improve the explanation of the variance of the stock indices metrics. The SDR is expected to increase the effect of the DMPS, and the SDL is expected to express the agreement about the sentiment topics. This test is used to find supportive evidence for Hypothesis 2. The analysis uses Equation 4:

$$Y_{t,m,j} = \alpha + \beta_{1,m} \cdot DMPS_{m,t-1} + \beta_{2,m} \cdot SDL_{m,t-1} + \beta_{3,m} \cdot SDR_{m,t-1} + \beta_{4,m} (SDR_{m,t-1} \cdot DMPS_{m,t-1}) + \varepsilon_{t,m} \qquad \text{(Eq. 4)}$$

where the variable $SDL_{m,t-1}$ is the sum of daily Likes for proxy $m$ at time $t-1$ and the variable $SDR_{m,t-1}$ represents the sum of daily Retweets for proxy $m$ at time $t-1$. The term $(SDR_{m,t-1} \cdot DMPS_{m,t-1})$ is the interaction effect between the sum of daily Retweets and the daily mean polarity score for proxy $m$. The other symbols are the same as for Eq. 3.

4.3 Principal Component Analysis

I use a Principal Component Analysis (PCA) in order to find out whether the proxies combined are better able to predict the stock market metrics11. A PCA reduces the dimensionality of the data by trading accuracy for simplicity (Leeflang, Wieringa, Bijmolt & Pauwels, 2015). PCA decreases the large number of predictors by transforming closely related variables into new uncorrelated factors that capture the maximum variability (Souropanis, 2019, p. 53). As such, the PCA leads to a more parsimonious model and prevents multicollinearity between the explanatory variables. A more detailed explanation of PCA can be found in the Appendix of Souropanis (2019). This test is used to find supportive evidence for hypothesis 4. For this analysis Equation 5 is used:

$$Y_{t,m,j} = \alpha + \sum_{k=1}^{K} \beta_k F_{k,t-n} + \varepsilon_{t,m} \qquad \text{(Eq. 5)}$$

where $Y_{t,m,j}$ is again stock index metric $j$ (daily return, daily volatility or daily volume) explained by proxy $m$ at time $t$. Furthermore, $F_{k,t-n}$ represents the $k$th principal component at time $t$ with a lag of $n$ days. This equation allows analysing which number of factors $K$ and which lags deliver the optimal model fit in terms of explained variance.
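To make the idea of a principal component concrete, the sketch below extracts only the first component of a set of series by power iteration on their correlation matrix. It is a standard-library illustration under simplifying assumptions (no constant columns, a unique largest eigenvalue, a fixed start vector), not the PCA routine used in the thesis, which retains three factors.

```python
import math


def standardise(columns):
    """Centre each column and scale it to unit sample standard deviation."""
    out = []
    for col in columns:
        mean = sum(col) / len(col)
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / (len(col) - 1))
        out.append([(v - mean) / sd for v in col])
    return out


def correlation_matrix(columns):
    z = standardise(columns)
    n = len(z[0])
    return [[sum(a * b for a, b in zip(zi, zj)) / (n - 1) for zj in z] for zi in z]


def first_principal_component(columns, iterations=500):
    """Return (loadings, share of total variance) of the first component."""
    corr = correlation_matrix(columns)
    k = len(corr)
    v = [1.0 / (i + 1) for i in range(k)]  # asymmetric start vector (assumption)
    for _ in range(iterations):
        w = [sum(corr[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient of the unit vector v gives the leading eigenvalue.
    eigenvalue = sum(v[i] * sum(corr[i][j] * v[j] for j in range(k)) for i in range(k))
    # The trace of a correlation matrix equals the number of variables k.
    return v, eigenvalue / k
```

Two perfectly correlated series and one unrelated series yield a first component that loads equally on the two correlated series and explains two thirds of the total variance.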

4.4 Mean Absolute Percentage Error

Following, among others, Mao, Counts & Bollen (2011) and Rao & Srivastava (2012), I use the Mean Absolute Percentage Error (MAPE) to evaluate the predictive validity of the PCA. MAPE is dimensionless and therefore independent of scale, which makes it easy to compare forecast accuracy across different settings, similar to other studies (Leeflang, Wieringa, Bijmolt & Pauwels, 2015). This ratio, expressed as a percentage, allows comparing the performance of different models in predicting different stock indices metrics. The MAPE represents the percentage deviation, upwards and downwards, of the prediction relative to the true value. The MAPE is calculated using Equation 6:

$$MAPE = \frac{1}{T - T^*} \sum_{t=T^*+1}^{T} \left| \frac{y_t - \hat{y}_t}{y_t} \right| \times 100\% \qquad \text{(Eq. 6)}$$

where $T^*$ is the number of observations used for the predictions and $T$ the total number of observations. Furthermore, $y_t$ is the true value and $\hat{y}_t$ the predicted value. Besides that, the MAPE also allows comparison with a naïve model to evaluate the predictive validity of a model. In this study the naïve model represents the prediction of a random walk.
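The MAPE and the random-walk benchmark can be sketched as follows. The function names are hypothetical, and the naive forecast assumes that the prediction for day t is simply the value observed on day t-1. Note that the MAPE is undefined when a true value equals zero, which can occur for daily returns.

```python
def mape(actual, predicted):
    """Mean absolute percentage error over paired observations, in percent."""
    errors = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(errors) / len(errors)


def random_walk_forecast(series, start):
    """Naive forecast: the prediction for index t is the value at t - 1."""
    return [series[t - 1] for t in range(start, len(series))]
```

A model adds predictive value when its MAPE over the hold-out observations is lower than the MAPE of the random-walk forecast over the same observations.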

5. RESULTS

This section provides the results of testing the hypotheses with the analyses described in the methodology section. First, a Granger causality test provides an overview of which proxy lags contain variance that can explain the variability in the stock market metrics. Second, the information from the Granger causality test provides input for an OLS multiple regression to review whether a linear relationship exists between the proxies' lags and the different stock indices metrics. Another OLS multiple regression examines whether the addition of Likes and Retweets contributes to explaining this linear relationship with the stock indices metrics. Third, a PCA evaluates whether the combination of proxies can strengthen the explanation of these stock market indices metrics. A final analysis consists of a MAPE calculation, which assesses the predictive validity of the PCA models.

5.1 Granger Causality

The first analysis tests which proxies' lags contain information to explain variance in the stock market indices. This is done by using Granger causality, which tests whether the DMPS lags are able to predict future changes in asset metrics. A Durbin-Watson test found no autocorrelation in either the stock market metrics (Yt) or the proxies (Xt); as such, they are treated as stationary time series that can be used for a Granger causality test. As explained in the methodology section, Granger causality does not establish causality between two variables.

Instead it tests if the variation of Xt can predict variation of Yt in a structured manner. In other


Granger causality tests for the different proxy lags.

Hypotheses 1 and 2a state, respectively, that the conventional and the new proxies contribute to making predictions about the stock indices metrics. These hypotheses find support for the proxies that surpass, in Figure 3a-c, the upper dashed line (p = 0.10), the middle dashed line (p = 0.05) and the lower dashed line (p = 0.01).

Since several proxies' lags are significant, they provide evidence that DMPS lags contribute to explaining the variance of several of the stock indices metrics. Figures 3a-c only show the significant Granger causality tests, which indicates that for the excluded proxies there was not enough statistical evidence to support hypotheses 1 and 2a. The Granger causality test demonstrates significance for the newly chosen proxies, such as Brexit, Climate Change, European Union, Immigration and the UK Prime Minister. This provides evidence for hypothesis 2a, that the proxies chosen via the PESTLE model contribute to predicting the different UK stock indices metrics.

Furthermore, Figure 3a-c illustrates that the proxies Bearish, Bullish, Economy UK, FTSE250 and FTSE100 have the most significant DMPS lags that are able to explain variance of Yt. These findings align with the findings of previous literature about the use of these proxies

Figures 3a-c: p-values as a function of lag for the null hypothesis that pI(t) does not Granger cause ri(t). The orange dashed line (p = 0.10), grey dashed line (p = 0.05) and yellow dashed line (p = 0.01) mark the significance thresholds.

[Figure 3a: Granger causality FTSE100. Horizontal axis: lag (1 to 7); vertical axis: p-value (logarithmic scale). Series shown: FTSE100 price ~ Brexit; FTSE100 price ~ Bullish; FTSE100 price ~ economy uk; FTSE100 volume ~ Brexit; FTSE100 volume ~ FTSE100; FTSE100 volume ~ US president; FTSE100 volume ~ UK prime minister; FTSE100 volatility ~ Bearish.]

[Figure 3b: Granger causality FTSE250. Horizontal axis: lag (1 to 7); vertical axis: p-value (logarithmic scale).]

[Figure 3c: Granger causality FTSC. Horizontal axis: lag (1 to 7); vertical axis: p-value (logarithmic scale). Series shown: FTSC price ~ Bearish; FTSC price ~ Bullish; FTSC price ~ buy stock; FTSC price ~ sell stock; FTSC volatility ~ Bearish; FTSC volatility ~ EU.]

5.2 OLS multiple regression

5.2.1 DMPS lag variables

The Granger causality tests indicated that several lags contain information about future values of the stock market indices metrics. An OLS multiple regression established whether a linear relationship exists between the DMPS lags and the stock market indices metrics. The models that include 4 DMPS lags achieve the highest adjusted R², which is the variance explained corrected for the number of variables used. Table 4 shows the results of Eq. 3 for the Bearish proxy, with the effects of the polarity lags for predicting the stock indices metrics12. The effects of the DMPS lags are, in general, in line with the expectation on their respective influence on the stock indices metrics.
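The correction for the number of variables can be made explicit with the standard definition of the adjusted R², where $N$ is the number of observations and $k$ the number of regressors. The worked line uses the rounded values reported for the FTSC Returns model in Table 4 ($R^2 = 0.004$, $N = 1{,}102$, $k = 4$) and is meant only as an arithmetic check.

```latex
\bar{R}^2 = 1 - (1 - R^2)\,\frac{N - 1}{N - k - 1},
\qquad
\bar{R}^2_{\text{FTSC returns}} = 1 - (1 - 0.004)\,\frac{1102 - 1}{1102 - 4 - 1}
= 1 - 0.996 \times \frac{1101}{1097} \approx 0.0004
```

The result agrees with the adjusted R² of 0.0004 reported in Table 4, and it shows how a small R² can turn into an adjusted R² close to zero or even negative once the four lags are accounted for.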

Concerning the stock market returns, only a couple of proxies have significant effects. The signs of the effects differ per proxy lag; for example, the Bearish proxy delivered solely negative effects and the Bullish proxy both negative and positive effects. Concerning the Bullish proxy, the effects on the stock returns change sign for different DMPS lags. Renault (2016) attributed this reversal effect on stock index returns to noise trading. This situation is consistent with the presence of uninformed sentiment-driven traders who cause a price run on day t that is followed by a price reversal on day t+1 when arbitrageurs correct the mispricing (Renault, 2016, p. 36).

Regarding the stock market indices volatility, the effects of the DMPS lags predominantly have a negative sign, which is in line with the finding of Mao, Counts & Bollen (2011) that positive sentiment decreases volatility. However, there are exceptions with only positive signs, such as for the UK Prime Minister. This suggests that an increase in positive political sentiment causes more market movements. Concerning the effects of DMPS lags on the metric volume, there is much variation in sign between the proxies and between their respective lags. For example, the Bearish DMPS lags have a negative effect on the FTSE volume, while the Climate Change DMPS lags have positive effects for the same metric.

These sign differences across the proxies indicate that the effects are likely to depend on the nature of the proxy. Therefore, a general inference about the effect of Twitter sentiment on volume is not possible. For example, positive sentiment about the Bearish proxies may cause relief in the market, which reduces trading volume, while negative sentiment about the Climate Change proxies may result in tension on the stock markets and increased trading activity. When comparing the larger stock index (FTSE100) with the smaller stock indices (FTMC & FTSC), the effects on returns and volatility are largely similar, while the effects on the volumes tend to be stronger for the FTMC and FTSC relative to the FTSE100.

To evaluate the hypotheses, the adjusted R² values are taken into consideration to assess which proxies explain the most variation of the future stock market indices values. Table 5 reports these adjusted R² values and the significance of the models for Eq. 3. This overview demonstrates that the best explained stock index metric is the daily volatility, followed by volume and returns. Daily volatility is best explained by the proxies FTSE100 (9%), European Union (5%) and Buy stock (4%). Daily volume is best explained by the proxies House of Commons (4%), Bearish (2%), and USA president (2%). The variance of daily returns is only explained by a couple of proxies, among which the proxies Bullish (1%), Bearish (1%), Climate Change (1%) and FTSE100 (1%). The proxies Economy UK and Immigration delivered no models that could explain variance of the different stock indices metrics.

The proxies' performance can be compared with the literature. Bollen, Counts & Mao (2011) found an adjusted R² of 9.2% for predicting stock index returns using multiple online news media for sentiment analysis, while in this study, using solely Twitter sentiment, several individual proxies were capable of explaining approximately 1% of the variance in returns. Also, Renault (2016), using StockTwits sentiment, found an adjusted R² of approximately 1% for explaining variance in stock returns of the next half hour. Therefore, the explained variance of these Twitter proxies may be considered reasonable relative to these comparable studies. Since the stock market is difficult to forecast and influenced by numerous variables, explaining an additional part of the stock indices metrics can be considered a modest improvement.

Concerning hypothesis 1a, Table 5 exhibits that all conventional proxies from previous


containing relevant information for explaining variance of the stock indices metrics.

Regarding the comparison between both conventional (Table 1, Category A) and new


Table 4: Multiple regression Polarity lags, Bearish Proxy

| Dependent variable: | FTSE100 Returns (1) | FTSE100 Volatility (2) | FTSE100 Volume (3) | FTSE250 Returns (4) | FTSE250 Volatility (5) | FTSE250 Volume (6) | FTSC Returns (7) | FTSC Volatility (8) |
|---|---|---|---|---|---|---|---|---|
| Intercept | 0.0003 | 0.01**** | 20.22**** | 0.002 | 0.005**** | 20.90**** | 0.001 | 0.003**** |
| DMPS t-1 | 0.002 | -0.01** | -0.38*** | 0.01* | -0.01** | 2.57* | 0.002 | 0.0005 |
| DMPS t-2 | 0.01 | -0.002 | -0.27** | 0.01** | -0.01* | 3.06** | 0.003 | -0.005* |
| DMPS t-3 | -0.003 | -0.01 | -0.07 | -0.01 | -0.001 | 0.07 | -0.0000 | -0.002 |
| DMPS t-4 | -0.004 | -0.01** | -0.19 | -0.01 | -0.004 | 2.11 | -0.005* | -0.003 |
| Observations | 1,198 | 1,198 | 1,198 | 1,198 | 1,198 | 1,200 | 1,102 | 1,102 |
| R² | 0.003 | 0.02 | 0.03 | 0.01 | 0.02 | 0.02 | 0.004 | 0.01 |
| Adjusted R² | -0.001 | 0.02 | 0.03 | 0.01 | 0.02 | 0.01 | 0.0004 | 0.01 |
| Residual Std. Error | 0.01 (df = 1193) | 0.01 (df = 1193) | 0.25 (df = 1193) | 0.01 (df = 1193) | 0.01 (df = 1193) | 3.13 (df = 1195) | 0.01 (df = 1097) | 0.01 (df = 1097) |
| F Statistic | 0.77 (df = 4; 1193) | 7.55**** (df = 4; 1193) | 10.68**** (df = 4; 1193) | 3.06** (df = 4; 1193) | 6.15**** (df = 4; 1193) | 5.24**** (df = 4; 1195) | 1.10 (df = 4; 1097) | 2.83** (df = 4; 1097) |

Note: DMPS = Daily Mean Polarity Score. All variables are log transformed with the exception of


TABLE 5: MATRIX OF THE ADJUSTED R² AND SIGNIFICANCE FOR THE MODELS OF EQ. 3

| Twitter Proxies | FTSE returns | FTSE volatility | FTSE volume | FTMC returns | FTMC volatility | FTMC volume | FTSC returns | FTSC volatility |
|---|---|---|---|---|---|---|---|---|
| Bearish | 0 | 0.02**** | 0.03**** | 0.01** | 0.02**** | 0.02**** | 0 | 0.01** |
| Boris Johnson | 0 | 0.04**** | 0 | 0 | 0.02**** | 0 | 0 | 0.01*** |
| Brexit | 0 | 0.01** | 0 | 0 | 0.01** | 0.02**** | 0 | 0 |
| Bullish | 0.01** | 0 | 0 | 0.01*** | 0.01*** | 0 | 0.01**** | 0.00* |
| Buy Stock | 0 | 0.04**** | 0 | 0 | 0.02**** | 0 | 0 | 0.02**** |
| Climate Change | 0 | 0 | 0.01**** | 0.01** | 0 | 0.01** | 0 | 0 |
| Economy UK | 0 | 0.01*** | 0 | 0 | 0 | 0 | 0 | 0 |
| European Union | 0.00* | 0.05**** | 0 | 0 | 0.03**** | 0 | 0 | 0.01** |
| FTSC | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.01** |
| FTSE100 | 0 | 0.09**** | 0.02**** | 0.00* | 0.05**** | 0 | 0.01** | 0.05**** |
| FTSE250 | 0 | 0 | 0 | 0.00* | 0.00* | 0 | 0 | 0 |
| GBP/EUR | 0 | 0 | 0.01**** | 0 | 0 | 0.00* | 0 | 0 |
| House of Commons | 0 | 0 | 0.01** | 0 | 0 | 0.04**** | 0 | 0 |
| Immigration UK | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Recession | 0 | 0.03**** | 0 | 0 | 0.01**** | 0 | 0 | 0 |
| Sell Stock | 0 | 0.02**** | 0.01*** | 0 | 0.01*** | 0.01** | 0 | 0.01**** |
| UK Prime Minister | 0 | 0.02**** | 0 | 0 | 0.01**** | 0 | 0 | 0.01*** |
| Unemployment Rate UK | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| USA President | 0 | 0.02**** | 0.02**** | 0 | 0 | 0.01** | 0 | 0 |

Note: Numbers indicate the adjusted R² and include the significance level: ****p<0.001, ***p<0.01, **p<0.05, *p<0.10


5.2.2 DMPS including SDR and SDL

Another OLS multiple regression assesses the contribution of $SDL_{t-1}$ and $SDR_{t-1}$ in terms of explaining variance of the future stock market indices values. Table 6 shows the results of Eq. 4 for the Bearish proxy13. Regarding the general effects of $SDL_{t-1}$, these are negative for the FTSE100 volume and volatility, while they are positive for the FTMC volume and negative for the FTMC volatility. This indicates that an increase in SDL decreases the volatility for both stock indices and the volume of the FTSE100, but increases the FTMC volume. This decreasing effect of SDL is in line with the findings of Cookson & Niessner (2020), since an increase in agreement would result in lower market volatility and lower trading volumes. However, the positive effect on FTMC volume contradicts these findings. Therefore, another mechanism may explain this positive linear relationship.

Concerning the FTSC, $SDL_{t-1}$ has a positive effect on volatility, except for the Climate Change $SDL_{t-1}$, which indicates that an increase in SDL causes more daily volatility of the FTSC. This again contradicts the findings of Cookson & Niessner (2020) and suggests that Likes are not a good proxy for disagreement concerning the smaller stock indices. Concerning the returns, $SDL_{t-1}$ only has significant (negative) effects for European Union and Sell Stock. This may be because both these proxies embody negative indicators for the British stock market, where increased collective agreement results in short positions on the stock market indices.

Generally, the $SDR_{t-1}$ effects tend to differ per proxy and between stock market sizes. This can be explained by the function that Retweets represent, namely a proxy for the reach of the DMPS. Therefore, it is more interesting to review the interaction between $SDR_{t-1}$ and $DMPS_{t-1}$. The interaction is significant for several proxies, which indicates that there is a joint dependence of these two variables in explaining several of the stock indices metrics. The results clarify that there is a negative effect on the stock indices volatility and a positive effect on the stock indices volume. This can be explained by the findings of Mao, Counts & Bollen (2011), who found a negative relation between Twitter sentiment and the VIX, but also a positive relation between the Tweet volume and the DJI volume. Concerning the returns, the interaction effect was only significant for the proxies Brexit and Sell Stock. The effect of the Brexit proxy was negative as expected, while the Sell Stock proxy had a positive effect. Since Twitter sentiment and Tweet volume have respectively a positive and a negative relation with log returns (Mao, Counts & Bollen, 2011), the sign may differ across proxies depending on which of the two variables has the strongest effect on the respective stock index returns.

Concerning hypothesis 3, Table 7 illustrates the adjusted R² and the significance of the

Table 6: Multiple regression Polarity Lag Including SDL and SDR, Bearish Proxy

| Dependent variable: | FTSE100 Returns (1) | FTSE100 Volatility (2) | FTSE100 Volume (3) | FTSE250 Returns (4) | FTSE250 Volatility (5) | FTSE250 Volume (6) | FTSC Returns (7) | FTSC Volatility (8) |
|---|---|---|---|---|---|---|---|---|
| Intercept | 0.001 | 0.02**** | 20.36**** | 0.002 | 0.01**** | 18.73**** | -0.0004 | 0.01**** |
| SDL t-1 | -0.001* | -0.001**** | -0.03*** | -0.0005 | -0.001** | 0.61**** | -0.0003 | -0.0002 |
| DMPS t-1 | -0.004 | 0.01 | -0.64* | 0.001 | 0.02** | 3.67 | -0.01 | 0.01 |
| SDR t-1 | 0.001 | -0.001 | 0.03 | 0.001 | -0.001* | -0.67** | 0.001 | -0.0002 |
| DMPS t-1 x SDR t-1 | 0.003 | -0.01** | 0.07 | 0.003 | -0.01**** | -0.75 | 0.004 | -0.003 |
| Observations | 1,201 | 1,201 | 1,201 | 1,201 | 1,201 | 1,203 | 1,105 | 1,105 |
| R² | 0.004 | 0.03 | 0.03 | 0.01 | 0.03 | 0.03 | 0.01 | 0.01 |
| Adjusted R² | 0.0003 | 0.03 | 0.03 | 0.003 | 0.03 | 0.02 | 0.002 | 0.01 |
| Residual Std. Error | 0.01 (df = 1196) | 0.01 (df = 1196) | 0.25 (df = 1196) | 0.01 (df = 1196) | 0.01 (df = 1196) | 3.11 (df = 1198) | 0.01 (df = 1100) | 0.01 (df = 1100) |
| F Statistic | 1.10 (df = 4; 1196) | 10.12**** (df = 4; 1196) | 9.62**** (df = 4; 1196) | 1.89 (df = 4; 1196) | 9.26**** (df = 4; 1196) | 8.37**** (df = 4; 1198) | 1.53 (df = 4; 1100) | 2.77** (df = 4; 1100) |

Notes: SDL = Sum of Daily Likes, SDR = Sum of Daily Retweets, DMPS = Daily Mean Polarity Score. All variables are log transformed with the


TABLE 7: MATRIX OF ADJUSTED R² AND SIGNIFICANCE FOR THE MODELS OF EQ. 4

| Twitter Proxies | FTSE returns | FTSE volatility | FTSE volume | FTMC returns | FTMC volatility | FTMC volume | FTSC returns | FTSC volatility |
|---|---|---|---|---|---|---|---|---|
| Bearish | 0 | 0.03**** | 0.03**** | 0 | 0.03**** | 0.02**** | 0 | 0.01** |
| Boris Johnson | 0 | 0.03**** | 0.02*** | 0 | 0.01*** | 0.01*** | 0 | 0.01** |
| Brexit | 0.01** | 0.03**** | 0.01** | 0.01*** | 0.01*** | 0.02* | 0 | 0 |
| Bullish | 0 | 0.03**** | 0.02**** | 0 | 0.03**** | 0.02**** | 0 | 0.01**** |
| Buy Stock | 0 | 0.05**** | 0.01**** | 0 | 0.01**** | 0.01*** | 0 | 0 |
| Climate Change | 0 | 0.03**** | 0.02*** | 0 | 0.01** | 0.01*** | 0 | 0 |
| Economy UK | 0 | 0.05**** | 0.01** | 0 | 0.02**** | 0.02**** | 0 | 0 |
| European Union | 0 | 0.03**** | 0.01** | 0 | 0.01*** | 0.02**** | 0 | 0 |
| FTSC | 0 | 0.01** | 0.02**** | 0 | 0 | 0 | 0 | 0.02**** |
| FTSE100 | 0 | 0.05**** | 0.02**** | 0 | 0.03**** | 0.02*** | 0.01* | 0.01*** |
| FTSE250 | 0 | 0.03**** | 0.01**** | 0 | 0.01** | 0.01** | 0 | 0 |
| GBP/EUR | 0 | 0.06**** | 0.03**** | 0 | 0.03**** | 0.01**** | 0 | 0.01*** |
| House of Commons | 0 | 0.03**** | 0.01*** | 0 | 0 | 0.05**** | 0 | 0 |
| Immigration UK | 0 | 0.03**** | 0.01*** | 0 | 0 | 0.03**** | 0 | 0 |
| Recession | 0 | 0.03**** | 0.01**** | 0 | 0 | 0.02**** | 0 | 0 |
| Sell Stock | 0 | 0.02**** | 0.02**** | 0 | 0 | 0.01*** | 0.01** | 0.01** |
| UK Prime Minister | 0 | 0.02**** | 0.02**** | 0 | 0 | 0.01**** | 0 | 0.01*** |
| Unemployment Rate UK | 0 | 0.03**** | 0.01**** | 0 | 0 | 0.02**** | 0 | 0 |
| USA President | 0 | 0.04*** | 0.05*** | 0 | 0.01*** | 0.03**** | 0 | 0 |

Note: Numbers indicate the adjusted R² and include the significance level: ****p<0.001, ***p<0.01, **p<0.05, *p<0.10


5.3 Factor Analysis

The PCA constructs factors out of the 18 proxies used in the previous analyses. This PCA excludes the FTSC proxy, since it has too many missing observations and would therefore cause a loss of information on the values of the other proxies. The first step is to review a Kaiser-Meyer-Olkin (KMO) test to see whether the proportion of variance in the proxies is suited for factor analysis. The proxies need to surpass the threshold (>.40) to be included in the factor analysis (Arnold & Reynolds, 2003). All proxies surpass this threshold and can therefore be used for the factor analysis. Subsequently, a scree plot indicates that 3 factors (K = 3) is the optimal number of factors, because after this number the curve flattens, indicating a drop in efficiency if more factors are included. These 3 factors were able to explain approximately 30% of the variance. Normally the criterion is to use the number of factors that can account for 60% of the variables' variance. However, when comparing different numbers of factors, a 3 factor model results in the models that explain the most variance of the stock indices metrics. Moreover, using 3 lags delivers the highest additional increase in total variance explained for the stock indices metrics.

Table 8 includes the 3 factor model including their 3 lags. A comparison of the adjusted R² of Table 5 and Table 7 allows drawing a conclusion on Hypothesis 4a, which states that the
