ESG and Stock Market Performance: The Impact of Twitter Sentiment

Academic year: 2021


Ruben Joris Pieter de Groot S2529645

Double Degree Master Thesis MSc Finance & MSc Marketing

Faculty of Economics and Business University of Groningen

Supervisor MSc Finance: L. Dam Supervisor MSc Marketing: E. De Haan

ABSTRACT

This study examines the impact that the environmental, social, and governance (ESG) related behavior of companies has on their stock market performance. To measure this impact, I create sentiment data from the social media platform Twitter and compare the predictive performance of this newly created ESG measurement tool with that of the more traditional ESG pillar scores of a professional rating agency. Using data for 40 companies from five different sectors, mostly headquartered in the United States (US), for the period 2010-2018, I find that ESG Twitter sentiment predicts stock market performance more strongly than the ESG pillar scores do. The ESG pillar scores, in turn, significantly predict the ESG Twitter sentiment. Both tools therefore generate information about the stock market performance of a firm, which suggests that a firm’s management should adopt both in a combined dashboarding model.

Keywords: environmental, social, governance, ESG, stock market performance, Twitter

ACKNOWLEDGEMENTS

First, I would like to thank my supervisors Lammertjan Dam and Evert de Haan for their support and advice throughout the whole process of writing this double degree master thesis. Combining a thesis for the master's programs in Finance and Marketing does not occur frequently and therefore requires some flexibility from both supervisors, especially given the difficult period in which I wrote this thesis. During the whole process, I never experienced this to be an issue, and I always found the meetings pleasant.

TABLE OF CONTENTS

1. INTRODUCTION
2. LITERATURE REVIEW
  CSR/ESG AND FIRM PERFORMANCE
  TWITTER SENTIMENT AND FIRM PERFORMANCE
3. METHODOLOGY
4. DATA
  ESG PILLAR SCORES
  ESG TWITTER SENTIMENT
  STOCK MARKET PERFORMANCE
  COMPANY SELECTION PROCEDURE
  DESCRIPTIVE STATISTICS
5. RESULTS
  ROBUSTNESS
6. DISCUSSION
  LIMITATIONS AND FURTHER RESEARCH
7. REFERENCES
APPENDIX 1
APPENDIX 2
  KEYWORD SELECTION PROCEDURE

1. INTRODUCTION

Since the beginning of this century, a growing debate has emerged about the impact that the environmental, social, and governance related behavior of companies has on their overall performance. Using different ESG measurement tools, researchers attempt to establish the relationship between the ESG performance of a firm and its financial or stock market performance. Studies have explored whether corporate social responsibility (CSR) could affect a firm’s performance since the 1970s. While the focus back then was not yet on ESG related matters specifically, researchers used relatively simple measurement tools to relate firms’ CSR behavior to their overall performance. Alexander and Buchholz (1978), for example, measure the correlation between CSR and stock market performance using a ranking based on the ratings of businessmen and students regarding leading firms’ CSR levels. Over the years, the measurement tools have become more sophisticated. Among others, Van de Velde, Vermeir, and Corten (2005) use sustainability scores of a rating agency (Vigeo) to build separate portfolios consisting of “best, good, bad, and worse sustainable performance.” The use of such scores in particular has become more popular over the past decades, as the number of professional rating agencies has also grown (SustainAbility, 2019). Another recent development is the emergence of Twitter sentiment as a predictor of firm performance. An increasing number of studies transform freely downloadable tweets into sentiment data and investigate the relationship between these newly created data and stock return fluctuations, mostly with significant results. Combining these two developments, I see possibilities to examine the relationship between ESG and firm performance (specifically, stock market performance in this research) in more depth by creating a new ESG measurement tool. This newly created ESG measurement tool consists of sentiment data based on ESG related messages extracted from Twitter. I compare the relationship between this new data source and stock return with the relationship between the more frequently used ESG pillar scores of a professional rating agency and stock return.

impact of corporate ESG ratings is still to be questioned” (SustainAbility, 2019). This is mostly the result of professional rating agencies not being transparent about the methodology behind their ESG scores, as transparency would mean giving away the ‘secret ingredient’ that provides them with an advantage over the growing number of ESG data providers. To overcome this problem, and to potentially gain new insights into the relationship between ESG and firm performance, the research field demands new ESG measurement tools. The ESG Twitter sentiment data used in this research, which to my knowledge no previous study has created, could potentially fill this gap in the field of ESG measurement tools.

The empirical literature about the relationship between a firm’s ESG (previously CSR) related activities and its stock market performance can mostly be divided into three main components: (i) “event studies that explore the immediate effects of social or environmental performance proxies on short-term stock price variability;” (ii) “regression analyses that attempt to establish a cross-section relationship between CSR and stock returns;” and (iii) “portfolio studies that investigate the benefits of embedding CSR into investment decisions” (Guenster, Derwall, Bauer, and Koedijk, 2011). In this research, I execute a strategy comparable to the second component, as I perform a panel data regression analysis of 360 firm-year observations. By regressing both the current and the one-year lead stock return on the two different predictors (ESG pillar scores and ESG Twitter sentiment), I examine the performance of both predictors. Moreover, I create regression models in line with Granger causality theory (Granger, 1969) that explain the relationship between the predictors themselves, as those models test whether one predictor has predictive information about the other. In the models explaining the stock return, I control for the impact of the risk factors of the Fama and French (FF) Three-Factor model (Fama and French, 1992), as Von Arx and Ziegler (2014), among others, show this to be important when linking ESG (or CSR) to a firm’s stock return.

the provider of the ESG pillar scores and the main data source of the stock return data is the Center for Research in Security Prices (CRSP). Using the latter source, I extract the monthly stock returns adjusted for dividends and stock splits, which I transform into annual stock returns. Finally, I obtain the factor data required for the measurement of the betas through the website of Kenneth French. For the calculation of these betas, I use a rolling period of 24 months.
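
The transformation of monthly into annual stock returns can be sketched as follows. This is a minimal Python illustration that assumes geometric compounding of simple monthly returns; the text does not state the exact transformation, and the function name `annual_returns` and the input layout are hypothetical.

```python
from collections import defaultdict


def annual_returns(monthly):
    """Compound (year, month, simple_return) records into {year: annual_return}.

    Assumption: returns are simple (not log) returns, already adjusted
    for dividends and stock splits.
    """
    growth = defaultdict(lambda: 1.0)
    for year, _month, ret in monthly:
        growth[year] *= 1.0 + ret  # chain the monthly gross returns
    return {year: g - 1.0 for year, g in growth.items()}


# Illustrative input, not thesis data: twelve months of 1% in 2010.
example = annual_returns([(2010, m, 0.01) for m in range(1, 13)])
```

Compounding rather than summing keeps the annual figure equal to the return an investor holding the stock over the whole year would have realized.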

I find that the newly created ESG Twitter sentiment data predict stock market performance more strongly than the more traditional ESG pillar scores do. The ESG pillar scores, in turn, significantly predict the ESG Twitter sentiment data. Both therefore have an influence on the stock market performance of a firm, as the ESG pillar scores give an indication of the future ESG performance in terms of the Twitter sentiment, whereas the ESG Twitter sentiment data provide new information regarding the effect this ESG performance has on the stock market performance.

These findings imply that a firm’s management should use both ESG measurement tools (pillar scores and Twitter sentiment) in a combined dashboarding model. Since the ESG pillar scores have an indirect relationship with stock market performance, whereas the ESG Twitter sentiment has a direct impact on stock return, both tools generate new information regarding (the direction of) the stock market performance of a firm. New ESG measurement tools thus provide new insights into the ESG performance of a firm and the corresponding consequences for its stock market performance, which confirms the literature stream of, among others, Aupperle, Carroll, and Hatfield (1985) and Balatbat, Siew, and Carmichael (2012). Finally, the direction of the impact of especially the newly created ESG Twitter sentiment on a firm’s stock return is ambiguous, which provides opportunities for further research.

2. LITERATURE REVIEW

In this section, I separately discuss the literature of the two research fields relevant to this research. As mentioned in the introduction, these research fields are the relationship between CSR/ESG and firm performance, and the impact of Twitter sentiment on the performance of a firm. By firm performance, I refer to both financial performance and stock market performance. The main focus of this literature review, however, is on the impact on a firm’s stock market performance, as that is also the measure of performance in this research. Moreover, besides ESG related studies, I also include studies focusing on CSR, as CSR is the precursor of ESG. Finally, concerning the relationship between CSR/ESG and stock market performance, and in relation to the three main components of empirical literature discussed in Guenster et al. (2011), I mainly focus on studies using regression analyses and portfolio strategies, as those methodologies are most similar to the methodology used in this research. The discussion of the two aforementioned research fields should give an indication of what to expect regarding the relationship between the newly created ESG Twitter sentiment and stock market performance, and also regarding the relationship between the ESG pillar scores extracted from Refinitiv and a firm’s stock return.

CSR/ESG and Firm Performance

As already mentioned, researchers began relating a firm’s CSR/ESG activities to its overall performance in the last decades of the previous century. During the emergence of CSR at that time, people mostly tended to believe that displaying positive CSR related behavior would not be optimal for the performance of a firm as a whole. This general expectation was mainly based on Friedman’s capitalist view, which states that “corporate managers’ only moral obligation is to its shareholders and that the only one social responsibility of business is to use its resources and engage in activities designed to increase its profits” (Friedman, 1962). Applied to CSR, this means a firm’s management should not participate in CSR related activities, as it would then spend time and resources on matters other than maximizing profits and shareholder revenue.

different findings. Moskowitz (1972), for example, uses a self-selected portfolio of 14 companies that, in his view, show strong CSR performance to demonstrate that socially responsible firms outperform major market indices such as the Dow-Jones Industrials in terms of the rate of return. He uses this finding to state that “socially aware and concerned management possess the requisite skills to run a superior company in the traditional sense of financial performance, thus making its firm an attractive investment.” Vance (1975), in contrast, uses the rankings of two earlier surveys in which businessmen and students rate leading firms on their level of CSR. He links these rankings to stock market performance and establishes a negative correlation. In contradiction to Moskowitz (1972), he thus finds that socially responsible firms are at a competitive disadvantage because of the higher costs related to socially responsible behavior, thereby confirming Friedman (1962). Alexander and Buchholz (1978) apply more or less the same strategy as Vance (1975) by also measuring the correlation between the two aforementioned rankings and stock market performance. Their research is, however, substantially different in that it controls for risk as suggested in capital asset pricing models, and in that the investigated time period spans multiple years instead of a single year. Using risk-adjusted returns and the extended time frame of data, the significant relationship between CSR and stock market performance vanishes. These three early studies therefore do not establish a clear relationship between CSR and stock market performance, which also provides a first indication of the relevance of the way CSR is measured, the length of the research period, and the inclusion of control factors as suggested in asset pricing models.

portfolios with a lower level of sustainability. Finally, Akpinar, Jiang, Gómez-Mejía, Berrone, and Walls (2008) apply a similar strategy and find similar results, as they state that CSR has a positive impact on firm performance after prioritizing stakeholders. They establish this finding by developing an equal-weighted CSR portfolio and a stakeholder-weighted CSR portfolio, and show that the latter significantly outperforms the former.

Another extensive stream of literature focuses on theoretical explanations rather than empirical relationships. Freeman (1984) develops the stakeholder theory in opposition to Friedman’s capitalist view, in which he mainly states that firms can develop a strong relationship with their stakeholders by means of CSR activities. Furthermore, multiple studies relate CSR to agency theory. Jensen and Meckling (1976), for example, state that a potential increase in reputation could incentivize managers to overinvest in CSR, which could, in turn, cause agency costs to arise from conflicts between managers and shareholders. On the other hand, agency costs could also decrease through lower monitoring costs, since CSR possesses informative value about the quality of management (Akpinar et al., 2008). In addition, Akpinar et al. (2008) state that their theory could potentially answer the theory introduced in Friedman (1962) since “used as a signal for good management, CSR might create value for stockholders in terms of reduced monitoring costs.” Finally, another frequently discussed theory is that participating in CSR activities could enhance a firm’s image and reputation, which could, in turn, positively influence firm performance (e.g., Solomon and Hansen, 1985).

regression models that operating performance, efficiency, and firm value positively relate to stronger ESG performance. Besides that, Ng and Rezaee (2015) use an updated version of this KLD database to regress sustainable performance on the cost of equity and argue that “non-financial sustainability performance negatively relates to this cost of equity.” Moreover, Fatemi, Glaum, and Kaiser (2018) incorporate, in addition to the KLD ratings, Bloomberg’s measure of ESG disclosure to show that ESG strengths increase firm value, whereas ESG weaknesses decrease a firm’s value. Furthermore, they find that ESG disclosure “mitigates the negative effect of weaknesses and attenuates the positive effect of strengths.” Finally, a study incorporating the ESG scores as used in this research is Velte (2017). He shows, by means of a correlation and regression analysis on German data, that ESG performance has a positive impact on financial performance. Furthermore, he states that governance performance, in comparison with environmental and social, has the strongest impact financially speaking.

Besides the development of the methodology, over the years it has also become evident that (i) the type of CSR/ESG measurement tool does have an impact on the results of a study (e.g., Aupperle et al., 1985; Balatbat et al., 2012); (ii) the length of the research period affects CSR or ESG related studies as CSR/ESG usually tends to have a long-term rather than a short-term impact (e.g., Henderson, 2001; Walley and Whitehead, 1994); and (iii) models explaining firm performance should include control variables (e.g., Telle, 2006), most preferably the risk factors of asset pricing models in case of stock market performance (e.g., Von Arx and Ziegler, 2014) or inputs such as the amount of research and development investments regarding financial performance (e.g., McWilliams and Siegel, 2000). As already illustrated in a relatively early stage by Alexander and Buchholz (1978), dissimilarities for one or more of the three categories above could (partially) explain potential differences in results between studies. This could cause significant relationships to be visible while in fact the relationship should be non-significant, and vice versa.

CSR/ESG measurement tool used, the length of the research period, and the potential inclusion of control variables such as the impact of the risk factors as suggested in the FF Three-Factor model. It has become evident that differences for any of these three characteristics could partially explain the differences in results between studies, which causes the results of several studies to be biased.

Twitter Sentiment and Firm Performance

The research field concerning the impact of Twitter sentiment data on a firm’s performance is not yet as extensive as that of ESG and CSR, as creating sentiment data using sources such as Twitter is still relatively new. Only in the past decade have researchers started to explore the possibilities large online platforms offer for data collection, and the impact those types of data can have on, mostly, stock market performance. Although studies in the literature also link a firm’s performance to sentiment data created using different social media platforms (e.g., Facebook or Booking.com), I solely focus on studies using Twitter as the source of sentiment data. I do so because the largest proportion of the online sentiment research field uses Twitter as the main source of data, and because this research also focuses on Twitter sentiment as the predictor of stock return fluctuations, for multiple reasons discussed later.

When investigating this still relatively limited research field regarding Twitter sentiment and its impact on firm performance, it soon becomes evident that Twitter sentiment has a significant relationship with stock market performance in particular. To start with, Bollen, Mao, and Zeng (2011) find that public mood in the form of Twitter sentiment can significantly predict the Dow Jones Industrial Average (DJIA). To measure the Twitter sentiment, they use two different tools. The first indicates whether a tweet is positive or negative, and the second measures the mood of the respective Twitter user in terms of six different dimensions (calm, alert, sure, vital, kind, and happy). By means of Granger causality analysis as proposed in Granger (1969), they establish the predictive relationship between Twitter sentiment and the DJIA index. Moreover, they state that including some of the Twitter mood dimensions could make predictions of fluctuations in the DJIA more accurate, which they show by means of a self-organizing fuzzy neural network.

using a different time period of data and find the same results, although slightly less significantly in terms of the prediction accuracy. Moreover, Rao and Srivastava (2012) perform a similar study using Granger causality theory and a machine learning model involving the DJIA, and establish that “negative and positive dimensions of public mood carry a strong cause-effect relationship with price movements of individual stocks/indices.” Furthermore, they state their research to be more robust and to provide more significant results than any previous work, as they find up to 88% correlation between Twitter sentiment and stock return, and also the prediction accuracy of their machine learning model records a high score of 91%.

Besides the methodology discussed above, studies also adopt different strategies to examine the relationship between Twitter sentiment and, mainly, stock market performance. These studies are, however, similar to the studies mentioned before in the sense that they also find significant results. For example, Si, Mukherjee, Liu, Li, Li, and Deng (2013) show, using a topic-based sentiment strategy on the S&P100 index, that methods measuring the sentiment for specific topics perform even “better than existing state-of-the-art non-topic based methods.” Furthermore, Ranco, Aleksovski, Caldarelli, Grčar, and Mozetič (2015) find, using an event study approach, that Twitter sentiment and stock return are significantly related during peaks of Twitter volume. In contradiction to previous research, when relating the Twitter sentiment to stock market performance over their entire research period, they find only a low correlation and Granger causality. However, when examining this relationship around events such as quarterly announcements or other automatically identified events, the positive and negative sentiment of these events turns out to be indicative of the direction of cumulative abnormal returns. Finally, Sul, Dennis, and Yuan (2017) also use whether a tweet is positively or negatively oriented to examine the relationship between this type of Twitter sentiment and the stock return of individual S&P500 firms. Their research differs substantially from the previously mentioned studies in that they also incorporate the number of followers the corresponding Twitter users have, and the number of times a tweet is retweeted. They show that the sentiment of tweets belonging to ‘twitterers’ with a below-median number of followers significantly predicts the stock return. Moreover, the sentiment of these tweets most strongly predicts stock return when the tweets are not retweeted.

sentiment has increased the number of possibilities regarding the Twitter sentiment. As an example, De Haan (2020) uses an R software package called ‘qdap’ to assign a sentiment score to each tweet directed at the official Twitter account of a company. He then relates a company-specific, yearly aggregated transformation of these tweets’ sentiment scores and a few other transformations, as discussed later, to annual firm performance indicators of those companies. He finds that negatively oriented Twitter sentiment in particular is a strong predictor of firm performance, and that overall Twitter sentiment (electronic word of mouth, in his research) also predicts firm performance better than a traditional customer satisfaction index does. Besides the more sophisticated measurement of Twitter sentiment by means of the ‘qdap’ package, which I discuss in greater detail later in this research, another difference from the previously discussed literature is that De Haan (2020) attempts to establish a relationship between Twitter sentiment and a firm’s performance on an annual basis, whereas the previous literature mainly uses data at a daily frequency. Since I attempt to establish a similar relationship, it is interesting to see the impact this difference in data frequency has on the results of this study.

In conclusion, a significant relationship between sentiment created using the social media platform Twitter and, mainly, stock market performance is clearly visible. Although this research field is still relatively new, researchers have already introduced numerous developments over the past decade. These developments have one thing in common: the significant relationship persists. Furthermore, the research field mostly uses daily data to establish the relationship between Twitter sentiment and firm performance. It is therefore interesting to compare the findings of the literature, which predominantly uses daily data (and thus a relatively short total time period), with those of this research, which focuses on yearly data (and a longer total time period).

3. METHODOLOGY

mostly means if one of the predictors potentially has superior information over the other. In this section, I discuss these different types of models separately.

To start with, I examine the relationship between the predictors and the stock market performance over the data set as a whole using two different types of panel regression models. The first model type tests the effect of the predictors on the current stock return, whereas the second model type investigates the relationship between the predictors and the one-year lead stock return. Both model types use each of the five sub predictors (score, volume, standard deviation, fraction of positive tweets, and fraction of negative tweets) as an independent variable at least once, for each of the pillars (E, S, and G), and for both the traditional ESG scores (score only) and the Twitter sentiment (all sub predictors). Both model types consist of seven different models, which I construct and extend systematically so that the models are nested versions of each other. This allows me to observe the relative performance (gain) of the different (sub) predictors in greater detail. Furthermore, both model types control for the impact of the risk factors of the FF Three-Factor model (market risk premium, size premium, and value premium), as the literature finds this to be important when regressing on stock return (e.g., Von Arx and Ziegler, 2014). Finally, I include year dummies to account for time fixed effects. This results in the models presented in Equation 1 (on the current stock return) and Equation 2 (on the one-year lead stock return), where the current stock return reflects the short-term impact and the one-year lead stock return the long-term impact, as both could lead to different results according to, among others, Henderson (2001) and Walley and Whitehead (1994):

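$$R_{it} = \alpha_0 + \sum_{p=1}^{P} \beta_p \cdot \mathit{ESG}_{pit} + \sum_{m=1}^{M} \beta_m \cdot \mathit{TS}_{mit} + \sum_{f=1}^{3} \beta_f \cdot \mathit{Beta}_{fit} + \sum_{t=2010}^{2019} \beta_t \cdot \mathit{year}_t + \epsilon_{pmfit}, \tag{1}$$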

where $R_{it}$ is the stock return for firm i in year t, $\mathit{ESG}_p$ is the ESG pillar score per pillar p (environmental, social, and governance), $\mathit{TS}_m$ is the Twitter sentiment per sub predictor m (the environmental, social, and governance sentiment scores, fractions of positive tweets, fractions of negative tweets, sentiment standard deviations, and volumes of tweets), $\mathit{Beta}_f$ is the beta measuring the impact of FF’s three risk factors per factor f (market risk premium, size premium, and value premium) on a rolling period of 24 months, and $\mathit{year}_t$ is a dummy variable for each of the years.
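
To make the five Twitter sentiment sub predictors concrete, the following Python sketch aggregates tweet-level sentiment scores for one firm, year, and pillar. It rests on assumptions the text does not confirm: the score is taken as the mean tweet sentiment, the standard deviation is the population version, and a tweet counts as positive or negative when its score is above or below zero. The function name `yearly_sub_predictors` is hypothetical.

```python
from statistics import mean, pstdev


def yearly_sub_predictors(scores):
    """Aggregate tweet-level sentiment scores of one firm-year-pillar.

    Assumptions (not taken from the thesis): mean as the score,
    population standard deviation, and zero as the positive/negative cutoff.
    """
    n = len(scores)
    if n == 0:
        raise ValueError("at least one tweet is required")
    return {
        "score": mean(scores),                       # average sentiment
        "volume": n,                                 # number of tweets
        "std": pstdev(scores),                       # dispersion of sentiment
        "frac_pos": sum(s > 0 for s in scores) / n,  # share of positive tweets
        "frac_neg": sum(s < 0 for s in scores) / n,  # share of negative tweets
    }


# Illustrative tweet scores, not thesis data.
sub_predictors = yearly_sub_predictors([0.5, -0.5, 0.0, 1.0])
```

Neutral tweets (score exactly zero) count toward the volume but toward neither fraction, so the two fractions need not sum to one.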

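$$R_{it+1} = \alpha_0 + \sum_{p=1}^{P} \beta_p \cdot \mathit{ESG}_{pit} + \sum_{m=1}^{M} \beta_m \cdot \mathit{TS}_{mit} + \sum_{f=1}^{3} \beta_f \cdot \mathit{Beta}_{fit} + \sum_{t=2010}^{2019} \beta_t \cdot \mathit{year}_t + \epsilon_{pmfit}, \tag{2}$$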

where $R_{it+1}$ is the stock return for firm i in year t+1. I check both model types for the most important model assumptions (multicollinearity, autocorrelation, heteroskedasticity, stationarity, and normality) and, where required, adjust the models or results. Furthermore, in line with a large part of the Twitter sentiment research field (e.g., Bollen et al., 2011; De Haan, 2020; Mittal and Goel, 2012; Rao and Srivastava, 2012), I construct the model in Equation 2 based on Granger causality theory (Granger, 1969), as it tests whether one time series (here, one of the predictors) has predictive power about another time series (stock market performance or annual return).
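
The difference between the two model types lies in how predictors and returns are aligned in time. The Python sketch below pairs each firm-year predictor with the return of the following year, as the lead models require. The data layout, the field names `predictor` and `ret`, and the function `pair_with_lead` are hypothetical and only serve the illustration.

```python
def pair_with_lead(panel, lead=1):
    """Pair the predictor of firm i in year t with the return in year t + lead.

    panel maps (firm, year) to {"predictor": float, "ret": float}.
    Firm-years without an observed future return are dropped.
    """
    rows = []
    for (firm, year), obs in sorted(panel.items()):
        future = panel.get((firm, year + lead))
        if future is not None:
            rows.append((firm, year, obs["predictor"], future["ret"]))
    return rows


# Illustrative numbers, not thesis data.
panel = {
    ("A", 2010): {"predictor": 0.2, "ret": 0.05},
    ("A", 2011): {"predictor": 0.4, "ret": 0.10},
    ("A", 2012): {"predictor": 0.1, "ret": -0.02},
    ("B", 2010): {"predictor": 0.3, "ret": 0.07},
}
pairs = pair_with_lead(panel)
```

Matching on the firm identifier as well as the year prevents the last observation of one firm from being paired with the first return of another firm.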

I evaluate the different models on the significance, sign, and size of the parameter coefficients, the adjusted R-squared, and the Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC). The final two measures of fit in particular can be used to find the relative performance (gain) of the two predictors (ESG pillar scores and ESG Twitter sentiment) in predicting the stock market return. The AIC and BIC are appropriate for the two model types discussed above because of the nested construction of the models, which makes it possible to find which variable (or category of variables) causes the increase or decrease in the fit of the models. Both criteria measure whether a certain variable (or category) increases or decreases the efficiency of a model (i.e., the amount of information provided relative to the number of independent variables included). A lower score means the extra variable (or category) is beneficial to the model compared to a nested version of the model without that particular variable (or category). The AIC and BIC differ in that the BIC is a more restrictive version of the AIC, as it also uses the number of observations as an input in the calculation. Equation 3 and Equation 4 present the formulas behind both measures of fit. The results show whether both predictors add efficiency to the models explaining the stock market performance of a firm, and thereby potentially whether one predictor has more explanatory power for stock return than the other.

$$\mathit{AIC} = -2 \ln(L) + 2 (K + 1), \tag{3}$$

where $L$ is the value of the likelihood and $K$ is the number of independent variables of the model.

$$\mathit{BIC} = -2 \ln(L) + \ln(T) (K + 1), \tag{4}$$

where $T$ is the number of observations of the model.
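
Equations 3 and 4 translate directly into code. The Python sketch below implements both formulas and compares two nested models; the log-likelihood values and variable counts are made-up numbers chosen for illustration, and the function names `aic` and `bic` are hypothetical.

```python
import math


def aic(log_likelihood, k):
    """Equation 3: AIC = -2 ln(L) + 2 (K + 1)."""
    return -2.0 * log_likelihood + 2.0 * (k + 1)


def bic(log_likelihood, k, t):
    """Equation 4: BIC = -2 ln(L) + ln(T) (K + 1)."""
    return -2.0 * log_likelihood + math.log(t) * (k + 1)


# Illustrative nested models on T = 360 observations (numbers are made up).
T = 360
small = {"ll": -100.0, "k": 5}   # nested model
large = {"ll": -95.0, "k": 8}    # adds three variables

aic_small, aic_large = aic(small["ll"], small["k"]), aic(large["ll"], large["k"])
bic_small, bic_large = bic(small["ll"], small["k"], T), bic(large["ll"], large["k"], T)
```

In this made-up case the AIC favors the larger model while the BIC favors the smaller one, because ln(360) exceeds 2 and each extra variable is therefore penalized more heavily under the BIC.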

the ESG performance of a firm (i.e., pillar scores, Twitter sentiment scores, and positive fractions of tweets) to be positively related to stock return (and, logically, the negative fractions of tweets to be negatively related). Although the literature finds mixed results for the relationship between ESG and stock market performance, a trend is visible over the years showing the relationship becoming more positive. This trend even exists for studies accounting for various control factors, which the models above also do by including the impact of the risk factors of the FF Three-Factor model. Moreover, in line with, among others, Henderson (2001) and Walley and Whitehead (1994), I expect the positive relationship to be especially present in the models on the one-year lead stock return, as ESG tends to have a long-term rather than a short-term impact. Regarding the volume and standard deviation of ESG Twitter sentiment, I do not have an a priori expectation. Secondly, regarding the performance of the ESG pillar scores and the ESG Twitter sentiment, I expect the impact of the two predictors on the stock return to differ, since the literature states that the type of ESG measurement tool has a strong impact on the results of a study (e.g., Aupperle et al., 1985; Balatbat et al., 2012). Beyond that, I do not have a clear a priori expectation about which predictor is better at predicting the stock market performance, since both predictors have established themselves in research and both have their limitations. Due to a relatively low amount of fluctuation between the years (e.g., Sahut, Pasquini-Descomps, Cohendet, and Mazouz, 2015), the ESG pillar scores could be the weaker predictor, whereas the creation of annual instead of daily sentiment data, as performed in this research, could cause the effect of the ESG Twitter sentiment to become less significant than in the literature. Therefore, it is difficult to state that one of the predictors will perform better than the other.


these models for the aforementioned assumptions, which explains, among other things, the potential transformation of the dependent variable into its first-difference form (to adjust for non-stationarity). The models on the one-year lead stock return are, again, in line with the method presented in Granger (1969), based on the same reasoning as before. The sector-specific models, for the majority of the sectors, are constructed as in Equation 5:

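$$R_{ijt,t+1} = \alpha_0 + \beta_{ESG} \cdot \mathit{ESG}_{ijt} + \sum_{m=1}^{4} \beta_m \cdot \mathit{TS}_{mijt} + \sum_{f=1}^{3} \beta_f \cdot \mathit{Beta}_{fijt} + \epsilon_{ESGmfijt}, \qquad (5)$$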

where $R_{ijt,t+1}$ is the stock return for firm i in sector j in year t or t+1 (depending on the model), $\mathit{ESG}$ is the ESG pillar score for the three pillars in total, $\mathit{TS}_m$ is the Twitter sentiment per sub predictor m for the three pillars in total (respectively the volume of tweets, the sentiment standard deviation, and the positive and negative fractions of tweets), and $\mathit{Beta}_f$ is the beta per factor f of FF’s three risk factors, measured over a rolling period of 24 months (respectively for the market risk premium, size premium, and value premium). Note that in the models for the Financial sector and in the model on the one-year lead stock return for the Industrial sector, I also include a year dummy to account for the time fixed effects present in those models. Furthermore, as already mentioned, I use the first-difference form of the dependent variable in the models for the Technology and Consumer Cyclical sectors. I evaluate these models based on mostly the same measures (of fit) as discussed before. Comparing the models on the AIC and BIC is difficult, as the models differ in their independent variables (IVs), dependent variables (DVs), and the data used.
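The difference between the current-return and one-year-lead models, and the first-difference transformation of the dependent variable, can be illustrated with a small sketch. It is not the estimation code behind the models; the function names `lead_pairs` and `first_difference` and all numbers are hypothetical and only show how predictor values in year t would be aligned with returns in year t or t+1.

```python
def lead_pairs(predictor_by_year, return_by_year, lead=0):
    """Pair the predictor in year t with the return in year t + lead."""
    pairs = []
    for year in sorted(predictor_by_year):
        target_year = year + lead
        if target_year in return_by_year:
            pairs.append((year, predictor_by_year[year], return_by_year[target_year]))
    return pairs


def first_difference(series_by_year):
    """Return year-over-year changes; the first year has no predecessor and is dropped."""
    years = sorted(series_by_year)
    return {
        year: series_by_year[year] - series_by_year[prev]
        for prev, year in zip(years, years[1:])
        if year - prev == 1  # only difference consecutive years
    }


# Hypothetical values for a single firm, for illustration only.
esg = {2010: 70.0, 2011: 72.0, 2012: 75.0}
ret = {2010: 0.10, 2011: 0.05, 2012: 0.20}

current = lead_pairs(esg, ret, lead=0)        # ESG in t with return in t
one_year_lead = lead_pairs(esg, ret, lead=1)  # ESG in t with return in t+1
diff_ret = first_difference(ret)              # first-difference dependent variable
```

Note that the one-year-lead alignment loses the last year of predictor data, and first differencing loses the first year of return data, so both variants use fewer observations than the current-return model in levels.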

Finally, for the relationship between the different (sub) predictors themselves, I use the models as presented in Equation 6:


combinations of (sub) predictors, I estimate a model including and a model excluding a lagged version of the dependent predictor. I do this to find out, besides the relationship between a (sub) predictor and the future values of the other (sub) predictors, whether a (sub) predictor also has predictive power for its own future values. The models thus answer two questions: whether a (sub) predictor provides new information over time, and whether it provides different information than the other (sub) predictors do. Together, the answers show whether a (sub) predictor adds value for a firm in a (combined) dashboarding model. All these models are in line with Granger (1969), for the same reasoning as mentioned before, but now also account for autocorrelation, if present, which means the models including a lagged version of the DV can measure the incremental predictive value more accurately. Furthermore, I mostly evaluate the models using the same measures (of fit) as discussed earlier, namely the significance, sign, and size of the parameter coefficients, and the adjusted R-squared.

4. DATA

In this research, I use three different data sources: (i) the traditional ESG data in the form of pillar scores downloaded through Refinitiv Eikon, (ii) the ESG-based Twitter sentiment data created using tweets scraped from Twitter, and (iii) the stock market performance data, mostly collected from the Center for Research in Security Prices (CRSP). In this section, I discuss these three data sources separately and in more detail. Furthermore, I explain the company selection procedure. Finally, I provide the descriptive statistics of the final data, including the correlations between the three data sources.

ESG Pillar Scores


the vast majority, I collect the ESG data over the period 2010-2018 (85% of the companies), but occasionally a company has data for 2009-2017 (12.5% of the companies) or 2011-2019 (2.5% of the companies). The reason is twofold. First, for companies whose period end dates do not coincide with the start of a new calendar year, I adjust the years of data so that each reporting period matches a particular calendar year (for comparison reasons). Second, this ensures that the data are the most recent available (i.e., the most recent nine years of data). The latter also ensures that the number of tweets eventually collected to measure the Twitter sentiment is as large as possible, since Twitter has become more popular over time, as I show later. The readjustment of the companies’ years of data is based on the following principle. If a company’s period end date falls in the months January-April or August-December, the corresponding year is the calendar year that contains the largest number of months between two consecutive period end dates (e.g., for a period end date in August, August 2009-August 2010 becomes the year 2010, and August 2010-August 2011 becomes the year 2011). If, however, a company’s period end date falls in the middle of the year (i.e., in May, June, or July), I adjust its research period so that it exactly matches the 2010-2018 time frame, to make comparison more convenient.
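The year-assignment principle can be sketched as a small function. This is an illustrative reading of the rule, not the code used in this research: the function name `assigned_calendar_year` is hypothetical, and the sketch assumes that each reporting period covers the twelve months up to and including the period end month.

```python
def assigned_calendar_year(end_year, end_month):
    """Map a 12-month reporting period to the calendar year holding most of its months.

    The period is assumed to run up to and including `end_month` of `end_year`.
    Mid-year period ends (May, June, July) are realigned separately in the
    text, so this sketch refuses them instead of guessing.
    """
    if not 1 <= end_month <= 12:
        raise ValueError("end_month must be between 1 and 12")
    if end_month in (5, 6, 7):
        raise ValueError("mid-year period ends are realigned separately")
    months_in_end_year = end_month          # January up to the end month
    months_in_prior_year = 12 - end_month   # the remainder falls in the prior year
    return end_year if months_in_end_year > months_in_prior_year else end_year - 1
```

For example, a period ending in August 2010 is assigned to 2010 (eight of its months fall in 2010), whereas a period ending in March 2011 is assigned to 2010 (nine of its months fall in 2010).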

Refinitiv divides each of the ESG pillar scores into three or four categories, depending on the type of pillar (E, S, or G). An overview of these categories is in Table 1. A company receives a score for each of the categories. Equation 7 shows the formula used to measure these ESG category scores.

$$\mathit{ESG}_{cit} = \frac{\mathit{Worse}_{cit} + \frac{\mathit{Equal}_{cit}}{2}}{\mathit{Value}_{ct}}, \qquad (7)$$
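Equation 7 has the structure of a percentile rank, which a short sketch can make concrete. The interpretation of the terms is an assumption here: `worse` counts the companies with a worse value on the category, `equal` counts the companies with the same value (including the company itself), and the denominator is the number of companies with a value. The sketch also assumes that a higher raw value is better; the function name and the peer values are hypothetical.

```python
def category_score(values, company_value):
    """Percentile-rank score with the structure of Equation 7."""
    worse = sum(1 for v in values if v < company_value)
    equal = sum(1 for v in values if v == company_value)  # includes the company itself
    return (worse + equal / 2) / len(values)


# Hypothetical raw category values for four peer companies.
peer_values = [10.0, 20.0, 20.0, 40.0]
scores = [category_score(peer_values, v) for v in peer_values]
```

Under these assumptions, tied companies receive the same score, and every score lies strictly between 0 and 1.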


TABLE 1

Overview of Refinitiv’s Categories per ESG Pillar

Categories used by the professional rating agency Refinitiv to measure the ESG pillar scores used in this research.

| Pillar | Categories |
| --- | --- |
| Environmental | Resource use; Emissions; Innovation |
| Social | Workforce; Human rights; Community; Product responsibility |
| Governance | Management; Shareholders; Corporate social responsibility strategy |
| ESG | Combination of the pillars above |

Note: ESG = combination of environmental, social, and governance.

A major advantage of Refinitiv’s ESG scores is that Refinitiv, in comparison with other ESG data providers, is relatively transparent about its measurement approach. It shows a list of the inputs considered for the measurement in the application used to extract the ESG pillar scores (Eikon), and it also makes its methodology available to the public (Refinitiv, 2020). This makes it possible to create (ESG-based) Twitter sentiment scores based on categories that match Refinitiv’s categories as closely as possible, as I discuss later, which allows for comparison.

ESG Twitter Sentiment


(https://www.statista.com/statistics/248074/most-popular-us-social-networking-apps-ranked-by-audience/). This means that Twitter, especially in the United States, has become more representative of the population, and that a large number of (American) companies participate actively on Twitter.

For the scraping of the tweets from Twitter, I use Twitterscraper, a Python package developed by Taspinar (https://github.com/taspinar/twitterscraper). For the calculation of the sentiment scores, I use the R software package ‘qdap’ created by Rinker, Goodrich, and Kurkiewicz (https://cran.r-project.org/web/packages/qdap/qdap.pdf). The ‘qdap’ package measures how positive or negative a certain piece of text (a tweet, in this case) is. Each scraped tweet receives a score (below zero means negative, zero means neutral, and above zero means positive), which I aggregate into company-specific, yearly sentiment scores per pillar. The more positive (negative) the text in a tweet is, the higher (lower) the corresponding score. The scores are mainly based on polarized (positive and negative) words identified by a special sentiment dictionary, each surrounded by a cluster of four words before and two words after the polarized word. To calculate the score for a specific tweet, the package sums the scores of the polarized words and their clusters and divides the sum by the square root of the number of words in the tweet. In order to correctly measure the impact of the polarized words and their clusters, I first perform a cleaning procedure on the tweets in R (e.g., deletion of punctuation marks, numbers, and company names). In this cleaning procedure, I do not remove stop words, as that could bias the results (Ranco et al., 2015). For more information about the package and the calculation behind the sentiment scores, I refer to Rinker (2018) and De Haan (2020).
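The scoring logic described above can be illustrated with a strongly simplified Python sketch. It is not the ‘qdap’ implementation: the tiny dictionaries are toy stand-ins for a real sentiment lexicon, only negators are considered within each cluster (no amplifiers or other valence shifters), and the function name `polarity_score` is hypothetical.

```python
import math

# Toy dictionaries; stand-ins for a real sentiment lexicon (assumption).
POLARITY = {"good": 1.0, "great": 1.0, "clean": 1.0, "bad": -1.0, "dirty": -1.0}
NEGATORS = {"not", "never", "no"}


def polarity_score(text, n_before=4, n_after=2):
    """Sum the cluster scores around polarized words, scaled by sqrt(word count)."""
    words = text.lower().split()
    if not words:
        return 0.0
    total = 0.0
    for i, word in enumerate(words):
        if word not in POLARITY:
            continue
        # Context cluster: up to four words before and two words after.
        cluster = words[max(0, i - n_before):i] + words[i + 1:i + 1 + n_after]
        negations = sum(1 for w in cluster if w in NEGATORS)
        total += POLARITY[word] * (-1) ** negations  # each negator flips the sign
    return total / math.sqrt(len(words))
```

In this sketch, a tweet without polarized words scores exactly zero (neutral), and dividing by the square root of the word count dampens the score of long tweets without letting length dominate.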


Furthermore, the total (average per company) number of environmentally related tweets is 76,717 (1,918); for social, I extract 46,877 (1,172) tweets, and for governance 40,297 (1,007) tweets. This means that 46.8% of the tweets are environmentally related, 28.6% socially related, and 24.6% governance related. These numbers are lower than the number of tweets Twitterscraper initially returns, as Twitterscraper does not perform flawlessly. I therefore remove a third (33.5%) of the scraped tweets in R, because (i) the tweets contain French text even after instructing the package to scrape only tweets in English, (ii) the same tweet occurs multiple times in the data set of scraped tweets, indicated by an identical timestamp (retweets and other copied tweets are thus not removed, as those could provide valuable information), (iii) ‘@company_name’ is missing from the text of the tweet, or (iv) the tweet does not contain any text, before or after cleaning. A list of the total number of tweets per pillar, per company, is in the company list in Appendix 1 (Table A1.1). I provide more information on the mechanism behind the ESG pillar division of the tweets in the next paragraph.
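The four removal rules can be sketched as a single filter. The sketch is illustrative and is not the R cleaning code used in this research: the tweet fields `text`, `timestamp`, and `lang` are assumed names, the language check relies on a hypothetical pre-computed `lang` field, and duplicates are identified by an identical timestamp and text.

```python
def clean_tweets(tweets, handle):
    """Apply removal rules (i)-(iv) to a list of tweet dictionaries."""
    seen = set()
    kept = []
    for tweet in tweets:
        text = tweet.get("text", "").strip()
        if not text:                             # (iv) no text
            continue
        if tweet.get("lang") != "en":            # (i) not in English
            continue
        if handle.lower() not in text.lower():   # (iii) company handle missing
            continue
        key = (tweet.get("timestamp"), text)
        if key in seen:                          # (ii) same tweet, same timestamp
            continue
        seen.add(key)
        kept.append(tweet)
    return kept


# Hypothetical scraped tweets for a hypothetical account "@acme".
tweets = [
    {"text": "@acme cut emissions now", "timestamp": "2015-01-01T10:00:00", "lang": "en"},
    {"text": "@acme cut emissions now", "timestamp": "2015-01-01T10:00:00", "lang": "en"},
    {"text": "@acme cut emissions now", "timestamp": "2015-01-02T09:00:00", "lang": "en"},
    {"text": "@acme réduisez les émissions", "timestamp": "2015-01-03T08:00:00", "lang": "fr"},
    {"text": "cut emissions now", "timestamp": "2015-01-04T08:00:00", "lang": "en"},
    {"text": "  ", "timestamp": "2015-01-05T08:00:00", "lang": "en"},
]
kept = clean_tweets(tweets, "@acme")
```

The third tweet has the same text as the first but a different timestamp, so the sketch treats it as a copied tweet and keeps it, in line with rule (ii).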


TABLE 2

Overview of the Different Categories per ESG Pillar

Categories used by the professional rating agencies Refinitiv and MSCI to measure their ESG pillar scores, and the combination of those categories used in this research to scrape the ESG-related tweets.

| Pillar | Categories of Refinitiv | Categories of MSCI | Final Categories*** | Average amount of tweets per company* |
| --- | --- | --- | --- | --- |
| Environmental | Resource use | Climate change | Climate | 1,347 |
| | Emissions | Natural resources | Emission | 359 |
| | Innovation | Pollution & waste | Environmental | 626 |
| | - | Environmental opportunities | Pollution | 527 |
| Social | Workforce | Human capital | Human capital | 46 |
| | Human rights | Product liability | Human rights | 726 |
| | Community | Stakeholder opposition | Social | 423 |
| | Product responsibility | Social opportunities | Workforce | 642 |
| Governance | Management | Corporate governance | Corporate behavior | 14 |
| | Shareholders | Corporate behavior | Corporate governance | 29 |
| | CSR strategy | - | Corporate social responsibility** | 1,232 |
| | - | - | Governance | 187 |

Notes: ESG = environmental, social, and governance, CSR = corporate social responsibility.

* Before cleaning procedure in R. After cleaning, on average, I remove 33.5% of these tweets. The companies are mostly from the United States and I measure the averages mostly over the period 2010-2018.

** I select corporate social responsibility as a category for governance, as CSR nowadays reflects the ESG strategy of a firm from the perspective of its management. This is also why professional rating agencies increasingly use the term as part of their governance measurement, as is visible in this table. Yet, Twitter users may still think of CSR as representing the ESG/CSR performance of a firm as a whole, which is what CSR usually meant at the end of the previous century. So, ESG is effectively an updated version of CSR, and CSR nowadays only reflects the ESG corporate strategy, but possibly not all ‘twitterers’ use the terms in those exact ways yet. This could bias the governance category, for which I account by means of a robustness test later in this research.

*** The selection of the final categories is mostly based on whether a category of the professional rating agencies has multiple different interpretations in an online environment such as Twitter. If a category has multiple different interpretations, I do not select it, as it does not return reliable tweets.

I also subdivide the categories, in turn, into different keywords, which I use to scrape the tweets belonging to each category. These keywords are also the result of an extensive selection procedure. For each category, I consider multiple (combinations of) keywords measuring that particular category, from which I make a final selection using trial and error on a subset of 100 tweets for five companies. More information on the keyword selection procedure, including an overview of the keywords, is in Appendix 2.


company-specific, yearly (i) fractions of positive (i.e., sentiment score larger than zero) tweets (value between 0 and 1 by design), (ii) fractions of negative (i.e., sentiment score smaller than zero) tweets (value between 0 and 1 by design), and (iii) standard deviations of the sentiment scores, which is in line with De Haan (2020). These transformed measurements are also pillar-specific. For the ESG (total) pillar in particular, I calculate the Twitter sentiment score and its transformations in two different ways: equally weighted over all tweets, and equally weighted over the three pillars (E, S, and G). Volume, which I discuss next, is not equally weighted. The final sub predictor of the ESG Twitter sentiment is the yearly volume of tweets per company and per pillar. This means that, in the final analysis, I examine the impact of both valence (sentiment scores) and volume (number of tweets) on the stock market performance of a firm.
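The sub predictors and the two ways of computing the ESG (total) score can be sketched as follows. This is an illustration under assumptions, not the code used in this research: the function name `summarize` and all scores are hypothetical, and the sketch assumes the sample standard deviation, which is undefined for a firm-year with a single tweet.

```python
from statistics import mean, stdev


def summarize(scores):
    """Volume, mean score, S.D., and positive/negative fractions for one firm-year."""
    n = len(scores)
    if n == 0:
        return {"volume": 0}
    return {
        "volume": n,
        "score": mean(scores),
        "sd": stdev(scores) if n > 1 else None,  # needs at least two tweets
        "positive": sum(s > 0 for s in scores) / n,
        "negative": sum(s < 0 for s in scores) / n,
    }


# Hypothetical tweet-level sentiment scores per pillar for one firm-year.
by_pillar = {"E": [0.5, -0.5, 0.0, 0.5], "S": [0.25, 0.25], "G": [-0.5]}

# (a) Equally weighted over all tweets: pool the tweets of the three pillars.
pooled = summarize([s for scores in by_pillar.values() for s in scores])

# (b) Equally weighted over the pillars: average the three pillar-level scores.
pillar_stats = {p: summarize(s) for p, s in by_pillar.items()}
equal_weight_score = mean(stats["score"] for stats in pillar_stats.values())
```

In this example the pooled score is positive while the pillar-weighted score is negative, because the single negative governance tweet counts as one seventh of the pooled tweets but as one third of the pillar average. The positive and negative fractions do not sum to one whenever neutral tweets are present.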

Stock Market Performance

The stock market performance data consist of two types of data: the annual stock returns per company, and the annual betas adjusting for risk per company, measured through the Fama and French Three-Factor model. To calculate both, I mainly acquire the monthly stock returns (adjusted for dividends and stock splits) per company through the Center for Research in Security Prices (95% of the monthly stock return data is from the CRSP). I extract the other 5% from Yahoo! Finance, as the firms concerned do not trade on the American stock market, which is a prerequisite for being recorded by the CRSP. Subsequently, I annualize the monthly stock returns so that they match the yearly ESG and Twitter sentiment data. The annual stock returns and betas are based on the same period end dates used to measure the ESG pillar scores, so the periods used to measure these stock market performance data differ slightly between companies. Once again, I adjust these periods so that they exactly match the calendar years, which means the stock market performance data are also mainly for the period 2010-2018. The formula behind the annualization of the monthly stock returns per firm is in Equation 8.

$$R_{it} = \left[(1 + R_{im_1}) \times (1 + R_{im_2}) \times \dots \times (1 + R_{im_{12}})\right] - 1, \qquad (8)$$

where $R_{it}$ is the stock return for firm i in year t, and $R_{im_n}$ is the stock return for firm i in month n. The FF Three-Factor model used to measure the yearly, company-specific betas on a rolling period of 24 months is in Equation 9.
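In its conventional form, and using the symbols defined in the text that follows, the FF Three-Factor regression can be written as below. The intercept $\alpha_i$ and the error term $\epsilon_{in}$ are notational assumptions.

```latex
R_{in} - R_f = \alpha_i + \beta_m (R_m - R_f) + \beta_s \, \mathit{SMB} + \beta_h \, \mathit{HML} + \epsilon_{in}
```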


where $R_{in}$ is the stock return for firm i in month n, $R_f$ is the risk-free rate of return, $R_m$ is the total market portfolio return, SMB is the size premium (small minus big stocks in terms of market capitalization), HML is the value premium (high minus low book-to-market ratios), and $\beta_{m,s,h}$ are the three factors’ coefficients, i.e., the betas adjusting for risk that I use in the final analysis of this research. The excess return on the market portfolio ($R_m - R_f$) serves to explain the excess return of a firm ($R_i - R_f$). In addition, the SMB factor represents the finding that small market cap firms generate higher returns than large market cap firms, and the HML factor represents the finding that firms with high book-to-market ratios (high value) generate higher returns than firms with low book-to-market ratios (low value). Based on the formula above, I measure the three betas on a yearly basis for each of the 40 companies (see next paragraph) included in the data set, using a rolling period of 24 months. That is, I estimate each annual beta (for each company) on the stock return and FF factor data of the 24 months before the start of that particular year, applying Refinitiv’s ESG period end dates. To perform this estimation, I obtain the FF factor data (including the risk-free rate) through the website of Kenneth French (https://mba.tuck.dartmouth.edu/pages/faculty/ken.french/data_library.html). For more information on the FF Three-Factor model, I refer to Fama and French (1992).
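The annualization of the monthly stock returns in Equation 8 amounts to compounding twelve monthly returns. A minimal sketch follows; the function name `annualize` and the input values are hypothetical.

```python
def annualize(monthly_returns):
    """Compound twelve monthly returns into one annual return (Equation 8)."""
    if len(monthly_returns) != 12:
        raise ValueError("expected exactly 12 monthly returns")
    growth = 1.0
    for r in monthly_returns:
        growth *= 1.0 + r
    return growth - 1.0


# Hypothetical firm with a 1% return in every month of the year.
annual = annualize([0.01] * 12)
```

Because returns compound, the annual return in this example is slightly higher than twelve times the monthly return.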

Company Selection Procedure


companies, most likely, do generate more tweets. Besides that, to be included in the final data set, each of the companies should fulfill all of the data requirements mentioned below:

• The firm has no missing data for the most recent nine years of ESG pillar scores.

• The firm’s headquarters is in the United States (35 companies), or the majority of its revenue is from North America, in comparison to Europe, Asia, and South America. I allow revenue from another English-speaking country (e.g., the United Kingdom) to be added to the North American revenue if the combined activity is then the largest in comparison to the other regions, whereas this would not be the case for North American activity alone. This rule is for comparison purposes, and because the Twitter sentiment scores are based on English text, as mentioned before: I want to select companies for which the largest share of Twitter activity is in English, so that the Twitter sentiment scores are representative of the entire customer base. First, I set the threshold to American-headquartered companies only, but it eventually turns out to be difficult to scrape enough tweets for mostly similar-sized companies within a sector, as I discuss at the next bullet point, so I extend this rule. Finally, this selection criterion means that I can control the models in this research for the impact of Fama and French’s three risk factors, as I explain in the methodology section, since those factors are also based on US stock market data.

• The firm has received at least 400 tweets per ESG pillar before cleaning over the entire period of research. After cleaning, at most one of the ESG pillars may have fewer than 300 tweets over the entire time period; otherwise, I remove the company. These limits serve to obtain reliable Twitter sentiment estimates. This is also why I could not use the sectors Healthcare and Energy in this research, as those sectors have too few companies with sufficient Twitter activity. There is one exception to this rule: Morgan Stanley remains in the final data set even though it only returns 288 social and 289 governance tweets, because both numbers are very close to the 300 threshold and because it turns out to be difficult to find a suitable replacement.


and vice versa). This selection criterion ensures that the Twitter sentiment data reflect the largest and most important part of the parent company’s activities, which means I can match them with the ESG and stock market performance data of the parent company. An example that fits the latter two criteria is PepsiCo. For PepsiCo, I use tweets addressed to both the parent company PepsiCo and the (main) subsidiary Pepsi. For each ESG category, I select the Twitter account that generates more tweets on that category, and I do not use the other account for that category.

• Twitter users should not address the firm frequently with tweets indirectly targeting the firm itself, such as ‘twitterers’ mentioning they experience (e.g., read, see, or hear) something through the platform of the company. I exclude the companies Netflix and Sirius XM for this reason.

• The firm should trade on public markets, since the measurement of the stock market performance data requires firms’ stock return data to be publicly available.

In total, there are 360 firm-year observations. The complete list of firms is in Appendix 1 (Table A1.1).
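The tweet-count requirement among the criteria above can be expressed as a simple check. The sketch is illustrative only: the function name `passes_tweet_thresholds` is hypothetical, and apart from the 288 social and 289 governance tweets reported for Morgan Stanley, all counts are made up.

```python
PILLARS = ("E", "S", "G")


def passes_tweet_thresholds(raw_counts, cleaned_counts, min_raw=400, min_cleaned=300):
    """Check the per-pillar tweet-count rule for one company."""
    enough_raw = all(raw_counts[p] >= min_raw for p in PILLARS)
    # After cleaning, at most one pillar may fall below the cleaned threshold.
    pillars_below = sum(cleaned_counts[p] < min_cleaned for p in PILLARS)
    return enough_raw and pillars_below <= 1


# Social and governance counts as reported for Morgan Stanley; the rest is hypothetical.
raw = {"E": 900, "S": 450, "G": 430}
cleaned = {"E": 600, "S": 288, "G": 289}
strict_result = passes_tweet_thresholds(raw, cleaned)
```

Applied strictly, the rule rejects a company with two pillars below 300 cleaned tweets, which is why Morgan Stanley is described as an explicit exception.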

Descriptive Statistics


TABLE 3

Descriptive Statistics of the Final Data Set of 40 Mostly U.S. Firms, 2010-2018

Overview of the descriptive statistics for each of the variables used in this research: the 30 different sub predictors (independent variables), the dependent variable stock return (adjusted for dividends and stock splits), and the control variables derived from the Fama and French Three-Factor model using a rolling period of 24 months (beta market risk premium, beta size premium, and beta value premium). Each of the variables has data on a yearly basis, mostly for the period 2010-2018. The pillar scores are from a professional rating agency (Refinitiv), and I measure the other 26 sub predictors myself using Twitter data. Moreover, the stock returns are mainly from the Center for Research in Security Prices, and the data required to measure the betas are from the website of Kenneth French.

N Mean Median S.D. Min. Max.

Environmental pillar score 360 80.07 83.59 14.07 27.74 98.74

Social pillar score 360 80.47 82.75 12.84 34.80 99.17

Governance pillar score 360 68.86 72.62 18.91 17.31 98.05

ESG pillar score 360 76.80 78.21 11.32 35.34 97.53

Environmental volume 360 213.10 90.5 476.57 0 6,306

Environmental sentiment score 345 0.07 0.07 0.12 -0.56 0.77

Environmental sentiment S.D. 333 0.28 0.28 0.08 0.00 0.55

Environmental sentiment positive 345 0.36 0.34 0.19 0.00 1.00

Environmental sentiment negative 345 0.19 0.16 0.16 0.00 1.00

Social volume 360 130.22 46 260.49 0 2,629

Social sentiment score 334 0.09 0.09 0.12 -0.48 0.87

Social sentiment S.D. 328 0.28 0.28 0.08 0.00 0.60

Social sentiment positive 334 0.40 0.39 0.18 0.00 1.00

Social sentiment negative 334 0.17 0.14 0.14 0.00 0.88

Governance volume 360 111.96 83 105.90 0 728

Governance sentiment score 341 0.15 0.16 0.10 -0.33 0.64

Governance sentiment S.D. 338 0.30 0.29 0.07 0.04 0.64

Governance sentiment positive 341 0.48 0.48 0.15 0.00 1.00

Governance sentiment negative 341 0.13 0.09 0.12 0.00 1.00

ESG volume 360 455.29 289 653.36 0 6,982

ESG sentiment score 350 0.10 0.10 0.10 -0.38 0.62

ESG sentiment S.D. 347 0.29 0.29 0.06 0.00 0.55

ESG sentiment positive 350 0.40 0.40 0.15 0.00 1.00

ESG sentiment negative 350 0.17 0.14 0.13 0.00 1.00

ESG sentiment score equally weighted 330 0.11 0.11 0.08 -0.16 0.53

ESG sentiment S.D. equally weighted 322 0.29 0.29 0.04 0.16 0.42

ESG sentiment positive equally weighted 330 0.41 0.40 0.11 0.13 0.78

ESG sentiment negative equally weighted 330 0.16 0.15 0.09 0.00 0.50

Stock return 357 0.15 0.11 0.28 -0.96 1.33

Beta market risk premium 358 0.99 0.99 0.64 -1.65 3.00

Beta size premium 358 -0.11 -0.20 0.88 -3.06 4.03

Beta value premium 358 0.16 0.03 0.92 -2.13 4.00


TABLE 4

Average Amounts of Tweets per Firm per Sector per ESG Pillar, U.S. 2010-2018*

This table shows the development of the average number of tweets per mostly American firm, per TRBC sector, and per ESG pillar over the years 2010-2018, for which the majority of the 40 firms in this research have data. The reported numbers of tweets are thus yearly averages. I scrape the tweets myself from Twitter using specific ESG-related keywords and by selecting only the tweets directed at the official Twitter accounts of the companies included in this research. The number of tweets reported in this table is after a cleaning procedure in R, which causes me to remove a third (33.5%) of the tweets.

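| Sector | Pillar | 2010 | 2011 | 2012 | 2013 | 2014 | 2015 | 2016 | 2017 | 2018 | Total |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Consumer Cyclical | E | 28 | 74 | 184 | 167 | 231 | 220 | 243 | 868 | 1,500 | 3,515 |
| | S | 16 | 36 | 103 | 168 | 154 | 195 | 171 | 273 | 554 | 1,670 |
| | G | 17 | 57 | 134 | 162 | 143 | 169 | 147 | 143 | 133 | 1,105 |
| | Total | 61 | 167 | 421 | 497 | 528 | 584 | 561 | 1,284 | 2,187 | 6,290 |
| Consumer Non-Cyclical | E | 10 | 38 | 56 | 177 | 387 | 461 | 165 | 556 | 505 | 2,355 |
| | S | 3 | 14 | 246 | 187 | 148 | 196 | 299 | 110 | 169 | 1,372 |
| | G | 27 | 77 | 160 | 203 | 221 | 164 | 132 | 120 | 119 | 1,223 |
| | Total | 40 | 129 | 462 | 567 | 756 | 821 | 596 | 786 | 793 | 4,950 |
| Financial | E | 1 | 3 | 15 | 30 | 63 | 209 | 135 | 248 | 691 | 1,395 |
| | S | 1 | 2 | 9 | 15 | 30 | 66 | 68 | 197 | 211 | 599 |
| | G | 1 | 6 | 28 | 67 | 76 | 93 | 83 | 90 | 154 | 598 |
| | Total | 3 | 11 | 52 | 112 | 169 | 368 | 286 | 535 | 1,056 | 2,592 |
| Industrial | E | 5 | 63 | 63 | 101 | 162 | 123 | 84 | 266 | 276 | 1,143 |
| | S | 2 | 20 | 21 | 32 | 42 | 46 | 55 | 110 | 151 | 479 |
| | G | 5 | 38 | 94 | 111 | 108 | 148 | 103 | 102 | 82 | 791 |
| | Total | 12 | 121 | 178 | 244 | 312 | 317 | 242 | 478 | 509 | 2,413 |
| Technology | E | 64 | 26 | 272 | 70 | 142 | 156 | 148 | 348 | 342 | 1,568 |
| | S | 25 | 36 | 76 | 91 | 251 | 229 | 239 | 294 | 723 | 1,964 |
| | G | 35 | 56 | 150 | 155 | 235 | 200 | 188 | 180 | 205 | 1,404 |
| | Total | 124 | 118 | 498 | 316 | 628 | 585 | 575 | 822 | 1,270 | 4,936 |
| Total | E | 22 | 41 | 118 | 109 | 197 | 234 | 155 | 457 | 670 | 2,003 |
| | S | 9 | 21 | 91 | 99 | 125 | 146 | 166 | 197 | 362 | 1,216 |
| | G | 17 | 47 | 113 | 140 | 157 | 155 | 131 | 127 | 140 | 1,027 |
| | Total | 48 | 109 | 322 | 348 | 479 | 535 | 452 | 781 | 1,172 | 4,246 |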

Notes: E = environmental, S = social, G = governance, U.S. = United States, TRBC = Thomson Reuters Business Classification.

* Since I only include the years 2010-2018 for the calculations in this table, I exclude a few firm-year observations of the total data set used in this research. This also explains why the total averages in the rightmost column differ slightly from the total averages of the data set as a whole.


performance (e.g., Fatemi et al., 2018; Gillan et al., 2010, who find a significantly positive correlation to exist between ESG and firm performance). On the other hand, this finding confirms the results of, among others, Evans and Peiris (2010), who also find that governance negatively correlates with stock return.


mixed results though, to be effective in these types of research. Finally, besides this debate about trustworthiness or bias, the correlation between the pillars, for both the pillar scores and the Twitter sentiment, is significantly positive for the score and the positive fraction of tweets (for the negative fraction, it is the other way around). This indicates that if a firm performs better on one of the ESG pillars, it usually also shows stronger performance on the others.

Finally, I mention some other implications of the results in Table 5. First, for environmental and social, the volume of tweets correlates negatively with the sentiment score, standard deviation, and fraction of positive tweets, and positively with the fraction of negative tweets. This suggests that a high number of tweets usually indicates an issue regarding the environmental or social behavior of a firm, which triggers Twitter users to tweet about it. Secondly, the volumes of tweets of the different pillars correlate positively with each other, which indicates that if Twitter users address a firm more frequently about a category of one of the ESG pillars, the firm often also receives more tweets about the other pillars’ subjects. Summarizing the findings above, it seems that firms usually show a relatively stable performance across pillars.

Besides that, the large significant correlations between the sentiment scores and the fractions of positive and negative tweets are by design, as I classify a tweet with a score above zero as positive and a tweet with a score below zero as negative. The relationship between the standard deviations of the sentiment scores and the corresponding fractions of positive and negative tweets within a pillar also more or less arises by design, as higher numbers of positively and negatively oriented tweets mean more dissimilarity among ‘twitterers,’ especially in combination with the relatively high numbers of neutral (i.e., sentiment score of 0) tweets in a data set like this. Still, the number of neutral tweets in this research specifically is relatively low, since the correlations between the fractions of positive and negative tweets are relatively high, as opposed to, for example, the 13% correlation De Haan (2020) finds. This shows that the ESG-related tweets used in this research express more pronounced opinions than the general tweets used by De Haan (2020). Yet, the number of neutral tweets also explains why the correlation between the fractions of positive and negative tweets within pillars is not close to 100% either.
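The role of neutral tweets in the correlation between the positive and negative fractions can be illustrated with a small Pearson correlation sketch. It is an illustration under assumptions, not the code behind Table 5: the function name is hypothetical, missing values are represented by `None`, incomplete pairs are dropped pairwise, and the fractions are made up.

```python
import math


def pearson_complete_cases(x, y):
    """Pearson correlation over the pairs in which both values are present."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two complete pairs")
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    var_x = sum((a - mean_x) ** 2 for a, _ in pairs)
    var_y = sum((b - mean_y) ** 2 for _, b in pairs)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical yearly fractions for one pillar; the remainder in each year is neutral.
positive = [0.5, 0.3, 0.6, None]   # the last firm-year has no tweets
negative = [0.2, 0.4, 0.3, 0.3]
r = pearson_complete_cases(positive, negative)
```

In this example the correlation is negative but clearly weaker than -1, because the share of neutral tweets varies between years. Without neutral tweets, the negative fraction would equal one minus the positive fraction and the correlation would be exactly -1.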


TABLE 5

Correlation of the Main Predicting ESG Variables and the Dependent Variable Stock Return Using U.S. Firms 2010-2018 (n = 328-360)

This table provides the Pearson correlations, using complete cases, between the most frequently used predictors and the stock return (adjusted for dividends and stock splits). I extract the environmental, social, and governance pillar scores from the professional rating agency Refinitiv and create the remaining predictors myself using data from Twitter, focusing on environmental, social, and governance related tweets directed at the official Twitter accounts of mostly U.S. firms for the period 2010-2018. The main source of the stock return is the Center for Research in Security Prices.

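| | EPS | SPS | GPS | E vol. | ESS | ES S.D. | ESP | ESN | S vol. | SSS | SS S.D. | SSP | SSN | G vol. | GSS | GS S.D. | GSP | GSN | Return |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| EPS | 1.00 |
| SPS | .43*** | 1.00 |
| GPS | .17** | .36*** | 1.00 |
| E vol. | .03 | .07 | .04 | 1.00 |
| ESS | .10 | .05 | .07 | -.33*** | 1.00 |
| ES S.D. | -.001 | .08 | .04 | -.13* | .05 | 1.00 |
| ESP | .08 | .07 | .14* | -.28*** | .80*** | .21*** | 1.00 |
| ESN | -.11 | .03 | -.02 | .25*** | -.74*** | .34*** | -.42*** | 1.00 |
| S vol. | .16** | .09 | .01 | .24*** | -.09 | .05 | .01 | .09 | 1.00 |
| SSS | .13* | .10 | -.01 | -.04 | .17** | .01 | .08 | -.15** | -.14* | 1.00 |
| SS S.D. | -.06 | -.04 | -.05 | .09 | -.06 | .17** | -.02 | .15** | -.03 | .07 | 1.00 |
| SSP | .13* | .15** | .06 | .03 | .15** | .02 | .13* | -.10 | -.14* | .82*** | .27*** | 1.00 |
| SSN | -.18** | -.05 | .06 | .09 | -.11* | .04 | .02 | .17** | .13* | -.77*** | .34*** | -.42*** | 1.00 |
| G vol. | .11* | .18** | .21*** | .31*** | -.05 | .06 | -.01 | .03 | .34*** | .02 | -.01 | .06 | -.02 | 1.00 |
| GSS | .24*** | .23*** | .04 | -.02 | .29*** | .06 | .19*** | -.16** | -.02 | .15** | .05 | .16** | -.11* | .01 | 1.00 |
| GS S.D. | -.07 | -.13* | -.08 | .04 | -.01 | .10 | -.04 | .05 | .03 | -.04 | .08 | -.03 | .03 | -.04 | -.02 | 1.00 |
| GSP | .16** | .18** | .04 | .001 | .24*** | .06 | .21*** | -.08 | .01 | .13* | .03 | .18*** | -.05 | -.01 | .85*** | .04 | 1.00 |
| GSN | -.33*** | -.32*** | -.09 | .03 | -.20*** | -.01 | -.13* | .15** | .03 | -.14* | .03 | -.09 | .16** | -.03 | -.75*** | .41*** | -.51*** | 1.00 |
| Return | -.002 | -.17** | -.11* | -.04 | .02 | -.09 | -.10 | -.13* | -.01 | .05 | .02 | .03 | -.09 | .01 | -.06 | -.11* | -.08 | .06 | 1.00 |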

Notes: U.S. = United States, n = number of observations, EPS = environmental pillar score, SPS = social pillar score, GPS = governance pillar score, E = environmental, vol. = volume, ESS = environmental sentiment score, ES = environmental sentiment, S.D. = standard deviation, ESP = environmental


5. RESULTS

Table 6 and Table 7 show the results of the first two model types discussed in the methodology section. Both model types explain the relationship between the different (sub) predictors and the annual stock return (current and one-year lead return). Each of the tables consists of seven different models, which are extended versions of each other. I discuss the results of these models in the first part of this section, based on the measures (of fit) mentioned earlier.


firm. This could be the result of, on the one hand, highly demanding Twitter users in terms of environmental performance, causing firms to incur high costs and effort to receive positive environmental tweets. On the other hand, it seems that saving time, effort, and costs by neglecting environmental behavior completely is also not favorable (in fact, even worse) for the current stock return of a firm. This means that a somewhat neutral attitude regarding environmental performance might be the desired position for a firm. Quantifying these findings again, a one-standard-deviation (0.19) increase in the fraction of positive tweets results in a drop of 4.2-4.9% in the current stock return, and a one-standard-deviation (0.16) increase in the fraction of negative tweets causes the current stock return of a firm to decline by 4.6-5.9%. This means that the magnitudes of these effects are larger than that of the social pillar score. Moreover, this finding is comparable with other studies establishing mixed results, such as Orlitzky, Schmidt, and Rynes (2003). They establish by means of a meta-analysis that, although CSR is likely to pay off, strong moderators of that effect also exist, causing the overall impact to be mixed.

When considering the differences between the (sub) predictors rather than between the pillars, it becomes clear that negative sentiment is the strongest predictor of the current stock return, which confirms De Haan (2020). Furthermore, this shows that splitting the sentiment scores into positive and negative fractions of tweets gives more explanatory power than using the sentiment scores themselves (the sentiment scores are not significant in any of the models). This is also the reason why I prefer to use the positive and negative sentiment for the remaining models in this research (note that using all three measurements of Twitter sentiment in the same model causes multicollinearity problems).

Concerning the different pillars again, it stands out that the negative sentiment for governance works in the opposite direction to that for environmental. This means that low performance on governance has positive implications for the current stock return, which again confirms, among others, Friedman (1962) and Evans and Peiris (2010). Besides that, the finding is in line with Table 5, which means it is also contrary to my expectations. The difference with the environmental pillar could be the result of governance being measured in a slightly different way than environmental and social are, as CSR could have multiple possible interpretations in the minds of Twitter users, which I explain in Table 2. Interpreting the effect, a one standard deviation (0.12) increase in the fraction of negative governance-related tweets results in a 5.8-6.0% higher current stock return. This means that the economic impact is considerable.
