The causal effect of a lagged volume-weighted change in news risk on log weekly returns


Lisanne Stolte

Msc Finance Thesis

University of Groningen

Student number : s2474514

Supervisor Owlin: W. Westra

Supervisor university: J.V. Tinang Nzesseu

05-08-2019

Abstract

This paper examines the effect of a volume-weighted change in news risk on log weekly returns. I find clear evidence of a reversal pattern in this relationship, which switches from a negative (though non-causal) effect to a positive causal effect. Furthermore, I find evidence that groups with a higher absolute change in news risk from week t − 1 to t have a higher margin of log returns at t than groups with a lower absolute change in news risk. This seems to indicate that the reversal effect sets in earlier for groups with a higher absolute change in news risk. There is a growing need at banks and other financial institutions to incorporate the effect of news in real time in portfolio management and risk management in order to decrease unnecessary dependencies. This research is written for Owlin, a company that provides a news analytics tool for the financial sector. The research question is: how does the volume-weighted change in news risk measure influence the weekly returns of companies? The research covers 1029 publicly traded companies based in 47 countries around the world, over the period June 2016 to May 2019.

JEL classification: D8, C23, G11, G14

Keywords: reputation risk, financial risk, firm-specific news, news sentiment, return reversal.

I wish to thank Willem Westra, for his always critical view which has helped me to go out of my

Contents

1 Introduction
2 Literature review
3 Methodology & hypotheses
    3.1 Regression equation
    3.2 Research methods
    3.3 Cleaning data
    3.4 Subsample
    3.5 Hypotheses
4 Data and descriptive statistics
    4.1 Internal database
    4.2 External database
    4.3 Alignment of the two databases
    4.4 Difficulties in the dataset
    4.5 Final sample
    4.6 Summary statistics
5 Results
    5.1 Comparison lagged and non-lagged absolute news risk changes
    5.2 Comparison lagged and non-lagged decreases and increases in news risk
    5.3 Regressions with dummies for a certain absolute change in news risk
    5.4 Regressions with dummies for a certain change in news risk
6 Conclusion
A Process of determining RICs
B Subsample - raw data
C Subsample - processed data
D Distribution variables
E Transformation of returns
F Correlation plots
K Quantile panel regression with lagged independent variable
L Margins of log weekly returns for the lagged absolute change in news risk dummies
M Margins of log weekly returns for the lagged change in news risk dummies
N Margins of log weekly returns for the change in news risk dummies
O Marginal analysis lagged interactions for the change in news risk
P Marginal analysis non-lagged interactions for the change in news risk dummies
Q Marginal analysis lagged interactions for the absolute change in news risk

1. Introduction

A few recent studies demonstrate a strong correlation between media stories and stock market reactions (Tetlock, 2007; Tetlock, Saar-Tsechansky, and Macskassy, 2008). However, literature demonstrating a causal effect between the two is limited. This paper addresses the causal effect of the lagged volume-weighted change in news risk measure on log weekly returns and demonstrates a clear reversal effect in log weekly returns. Furthermore, our research suggests that the reversal effect is likely to set in sooner for groups with a higher absolute change in news risk, though this is not yet fully established.

The research topic of this paper is how a volume-weighted change in news risk measure influences weekly returns of publicly traded companies. The volume-weighted change in news risk (from now on: change in news risk) takes into account both the change in the number of articles relative to the average for a given company and the change in sentiment. Sentiment analysis is the process of identifying the opinion expressed in text (Tabari, Biswas, Praneeth, Seyeditabari, Hadzikadic, and Zadrozny, 2018). Past research on news has mainly focused on its influence on the public’s perception and on the public’s perceived importance of certain topics. In our research we instead focus on the impact of a change in news risk on the returns of individual companies, and in particular on whether negative news can be causally related to a company’s returns.

This master’s thesis is performed as research for Owlin. Owlin is a company that develops risk measures and trend measures based on real-time financial news. Its systems track and analyze over 3 million sources, including (local) news sites, corporate sites, government and regulatory publications, academic sites and specialist blogs. The languages covered include English, French, German, Spanish, Swedish, Italian, Portuguese, Polish, Danish, Dutch and Chinese (traditional and simplified).

There has been an enormous growth in the volume of business news appearing in the mass media during recent decades. Diana Henriques, a business journalist at The New York Times, notes that coverage of the New York Stock Exchange and NASDAQ (for example) doubled in the last ten years of the 20th century. The increasing volume of business news is of crucial importance, since most of what external stakeholders and consumers learn about a company comes from the news (Chen and Meindl, 1991; Deephouse, 2000; Dutton and Dukerich, 1991; Fombrun and Shanley, 1990).

also act upon this news. However, articles regarding the effect of news have mainly been of a psychological nature. The effect of the news on stock prices, for example, has remained mostly unproven; see Shiller (1981), Cutler, Poterba, and Summers (1989), Campbell (1991), Berry and Howe (1994), Mitchell and Mulherin (1994), and Tetlock (2007). There is a growing literature in finance that uses textual analysis to convert qualitative information in the news into quantifiable measures that represent the sentiment of the news. However, literature using an algorithm to determine news risk measures is still limited.

The data provided by Owlin consists of risk measures assigned to articles about certain companies, based on real-time financial and economic news. Fitch Ratings, a Tier 1 financial institution, rated a sample of Owlin’s database as 98% complete, 96% accurate and 97% right on sentiment (see footnote 1). Therefore the news database provided by Owlin is considered representative of the news in the market. However, it needs to be noted that all news filters are adapted and tailored to the clients’ needs. Since Owlin looks at the relative volume of news, a high relative volume in a certain week can imply a higher salience of a certain topic or company than usual. This higher salience can influence the public, and thus also investors, in a positive or a negative way depending on the sentiment in the news. In this case, the algorithms are used to determine the amount of risk associated with certain firms based on the degree of negativity and the relative volume of news.

The time period used in this research is from June 2016 until May 2019. This period is chosen because we can retrieve data for it under the current API (Application Programming Interface) of Owlin. The final sample consists of 1029 companies, based mainly in Europe, Asia and North and South America (figure 2). The main stock exchanges of these companies are based in 47 countries around the world. The countries in which our sample is mainly based can be seen in figure 3.

There is a growing need at banks and other financial institutions to decrease response time to signals given by news in the market in order to be able to hedge a risk directly or to profit from the upside. This is also demonstrated by the testimonials below. Testing the effect of the news on returns is important in order to improve existing risk models and portfolio management.

Footnote 1: Test results of one of Owlin’s clients (Fitch Ratings) indicated these scores on a sample of 115 entities

According to one bank’s Head of Risk Management who is a user of the Owlin integration in a Fitch Ratings platform:

"The ability to filter news sentiment at the portfolio level is 'a godsend' because our current system of Google alerts was arduous and overwhelming."

According to Doron Reuter, Head of Business Development at ING:

"What’s more, producing useful, early insight into key company and industry news, the Owlin product also provides us with the overview and the lead time to respond to this information faster and in a more structured and effective manner."

The research question tested in this paper is: how does the volume-weighted change in news risk measure influence the log weekly returns of individual companies? We form groups based on varying amounts of (absolute) change in news risk. These groups are included in the form of dummies and interaction variables for the change in news risk. We use panel data covering 1029 companies over 152 weekly intervals. The research method we employ is a quantile panel regression, which is suitable for three reasons: it allows us to include a possibly endogenous independent variable (the change in news risk), it allows us to keep interesting outlying observations that we do not want to disregard, and its standard errors are more robust to non-normal distributions. In addition, we perform a post-estimation marginal analysis to verify whether margins of log weekly returns differ significantly per group. The marginal analysis is a valuable addition to the quantile panel regression as it facilitates the interpretation of dummy and interaction variables.

2. Literature review

Perhaps one of the most relevant papers for answering the research question is Tetlock (2007), who demonstrates that market prices initially go down following pessimism in the media and afterwards revert to fundamentals. This paper shows that Dow Jones index returns are roughly 25 basis points higher on days on which negative sentiment is very low (bottom 5%) than on days on which negative sentiment is very high (top 5%).

Other studies have also shown that stock returns exhibit reversal at weekly and three-to-five-year intervals, and drift over 12-month periods (e.g. De Bondt and Thaler, 1985; Jegadeesh, 1990; Lo and MacKinlay, 1990; Jegadeesh and Titman, 1993). Some articles show that stock prices seem to drift for several months after important corporate events. This suggests that the market initially underreacts to the news and that the stock keeps adjusting in the following months. The raw data in figures 15 and 11 in appendix B, for example, seem to indicate a drift as well.

Chan (2003) finds strong drift in monthly returns following bad news, which suggests that investors initially underreact to bad news. Investors can also overreact to price shocks, causing excess trading volume and volatility and leading to reversal. This reversal is significant even when controlling for size and book-to-market. While Chan (2003) forms rolling portfolios of stocks that change every month, we form rolling portfolios that change more frequently, on a weekly basis. We believe that these methods of forming portfolios are fairly old-fashioned, given continuously developing technologies that can ensure rebalancing every second or minute. However, because of a lack of more frequent returns data, we followed a rebalancing approach similar to that of Chan (2003).

While the literature is divided on this "overreaction", Zarowin (1990) finds evidence of overreaction in the market. Zarowin ranks common stocks according to their performance during a given month and reports that the portfolio of the past month’s losers significantly outperforms the basket of the past month’s winners during the subsequent month. On the other hand, there are advocates of the "efficient market hypothesis", one of whom is Park (1995). At the level of daily returns, Park supports market efficiency. He shows that results favouring overreaction are due to a biased sample selection: after eliminating the selection bias caused by the bid-ask bounce, the price reversal disappears.

return reverts more quickly, with a greater reverting magnitude, than positive returns revert to negative returns. These findings are relevant for the research question as they suggest that returns could go down in the short run (e.g. following an increase in news risk) but go back up in the long run. Furthermore, the results suggest that we might find a faster and bigger reversion for an increase in news risk (since we expect negative returns for an increase in news risk) compared to a decrease in news risk.

Loss aversion is one of the theories that could explain a stronger reaction to negative news than to positive or neutral news. Loss aversion means that a loss has a greater impact on the preferences of an investor than a gain of the same absolute value (Kahneman and Tversky, 1979). It implies that investors are more afraid of receiving negative returns and are therefore likely to respond more strongly to negative news than to positive news. Brown, Harlow, and Tinic (1988) show that stock price reactions to negative news events tend to be larger than reactions to positive events.

For this reason, dummy and interaction variables (the dummy multiplied by the change in news risk) for groups with a certain increase or decrease in news risk can help us answer the research question. These variables allow us to verify whether the effect on log weekly returns for a group with a certain increase in news risk is in fact stronger than for a group with a similar absolute decrease in news risk. Furthermore, in order to verify whether an increase and a decrease of the same absolute amount have a similar effect on the returns, we perform another regression with control variables for a certain absolute change in news risk.

No news also contains information about a company’s situation. The absence of news can give information to the market: for example, if a firm does not lay off workers or declare bankruptcy after a macroeconomic shock, investors could see this as a positive sign. On the other hand, if a firm fails to announce new investment projects for a period of time, this can be perceived as a negative signal by the market (Giglio and Shue, 2013). Giglio and Shue find evidence that supports an underreaction to no news.

In the paper by Boudoukh, Feldman, Kogan, and Richardson (2013), a methodology for identifying relevant events for companies is used (broken down into several categories). Some of the topics are much more likely to appear on extreme return days (e.g., analyst recommendations, financials) while others are not (e.g., partnership). Their results suggest that different types of events have a different impact on the stock price. Furthermore, their results show a positive association between daily returns and daily tone. They also find that on identified news days, the volatility of stock prices is over double that of other days. Furthermore, identified news days are 31-34% more likely to be associated with extreme returns. Finally, they show that their measure of tone (sentiment) increases the R-squared measure on identified news days.

In Ross (1989) and Andersen (1996), the volatility of stock price changes is related to the rate at which information flows to the market. According to Maheu and McCurdy (2004), "the most important process affecting price movements is the news arrival process". Information that signals uncertainty about the company’s future can result in a decrease of the stock price. These papers suggest that for an increase in news risk, we can expect a higher volatility or lower return in our model. However, this prior research uses textual analysis to determine the sentiment, while we use the existing machine learning techniques of Owlin.

The research by Groß-Klußmann and Hautsch (2011) also highlights that it is crucial to classify news according to indicated relevance in order to filter out noise. This article is particularly relevant for this research as it also employs machine learning for the text analysis. Moreover, they show that sentiment indicators have predictive power for future price trends, though the profitability of news-implied trading is reduced by increased bid-ask spreads.

In psychological literature, the news is considered to be an accurate proxy of the public’s salience, since there is a high correlation between the volume or some patterns of news and the public agenda (Canel, Llamas, and Rey, 1996; Winter and Eyal, 1981). By the public agenda, we mean that people organize their priorities according to the salience signs given in the media. Front page news can thus appear to the public as relatively more important than some small news stories.

3. Methodology & hypotheses

In section 3.1 we specify the regression equations for four quantile panel regressions. In section 3.2 we explain why our first research method, the quantile panel regression, is chosen, and why a post-estimation marginal analysis is added to it. In section 3.3 we explain the process used to make our panel data slightly more balanced, to ensure consistency across time and across companies. In section 3.5 we determine the hypotheses following the literature review.

As a measure of fit, the pseudo R² proposed by Koenker and Machado (1999) is more suitable than the R² or adjusted R², as the pseudo R² is more appropriate for quantile regressions. However, this measure of fit is not taken into account in this research, since Stata does not have a feature that allows for a quantile panel regression that includes the pseudo R².

3.1. Regression equation

Regression equations 7, 8, 9 and 10 are determined. The variables and rationale behind the regression equations will be explained first.

Please note that week t − 1 in the following equations refers to the last available week for company i. Often this is just one week earlier, but this can differ. We only select companies that have at least a certain number of observations for returns and for the change in news risk, on which we elaborate in section 3.3.

We follow the rationale of Machado and Santos Silva (2019) for determining the quantile panel regression equation. We are interested in estimating the conditional quantiles of a random variable Y whose distribution conditional on a k-vector of covariates X belongs to the location-scale family. Y denotes a log function of the weekly returns; this transformation is taken because the returns are not normally distributed, as seen in appendices D and E. Variable Y is calculated as follows:

$$Y_{it} = \log(1 + |R_{it}|) \times \operatorname{sgn}(R_{it}) \tag{1}$$

for week t = 1, ..., 152 for a panel of i = 1, ..., 1029 companies, in which R represents the weekly return in percentages and sgn represents the sign of the weekly return (- or +)

article. X is calculated as follows:

$$X^{*}_{it} = \bar{N}_{it} \times \frac{A_{it}}{\bar{A}_{i}} - \bar{N}_{it-1} \times \frac{A_{it-1}}{\bar{A}_{i}} \tag{2}$$

*Please note that this function is rescaled after calculation and put on a scale from -1 to 1 to facilitate interpretation, in which -1 represents the observation with the highest decrease in news risk, 0 represents no change in news risk and 1 the observation with the highest increase in news risk.

for week t = 1, ..., 152 for a panel of i = 1, ..., 1029 companies, in which $\bar{N}$ represents the average news risk score per article for company i for week t or t − 1. $A_{it}$ and $A_{it-1}$ represent the number of articles in week t and week t − 1 respectively for company i, and $\bar{A}_{i}$ is the average number of articles for company i per week, taken over all weeks in the total time frame researched (∼3 years).
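Equations 1 and 2 can be sketched in Python as follows. This is a minimal illustration, not the code used in this research: the function names, the use of the natural logarithm, and the rescaling rule (dividing increases by the largest increase and decreases by the largest decrease in absolute value, so that 0 stays at 0) are assumptions.

```python
import numpy as np

def signed_log_return(r):
    """Equation 1: Y = log(1 + |R|) * sgn(R), with R the weekly return in percent."""
    r = np.asarray(r, dtype=float)
    return np.log1p(np.abs(r)) * np.sign(r)

def news_risk_change(n_bar, articles):
    """Equation 2 for one company: change in volume-weighted news risk.

    n_bar    : average news risk score per article, one value per available week
    articles : number of articles, one value per available week
    Returns one value per week from the second available week onwards.
    """
    n_bar = np.asarray(n_bar, dtype=float)
    articles = np.asarray(articles, dtype=float)
    weighted = n_bar * articles / articles.mean()  # N_bar_it * A_it / A_bar_i
    return np.diff(weighted)                       # week t minus week t-1

def rescale(x):
    """Assumed rescaling to [-1, 1] that keeps a change of 0 at 0."""
    x = np.asarray(x, dtype=float)
    out = np.zeros_like(x)
    pos, neg = x > 0, x < 0
    if pos.any():
        out[pos] = x[pos] / x[pos].max()      # largest increase maps to 1
    if neg.any():
        out[neg] = x[neg] / -x[neg].min()     # largest decrease maps to -1
    return out
```

The signed log keeps the direction of the return while compressing large positive and negative values symmetrically, which is why it can be applied to negative returns where a plain logarithm cannot.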

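Given data $\{Y_{it}, Z_{it}\}$ from a panel of 1029 companies i = 1, ..., 1029 over 152 time periods (weeks), t = 1, ..., 152, we consider the estimation of the conditional quantiles $Q_Y(\tau \mid X)$ for a location-scale model of the form:

$$
\begin{aligned}
Y_{it} = {} & \sum_{l=1}^{11} \alpha_{lit}\, \mathit{EXT}'_{lit} + \alpha_{it}\, \mathit{NEUTR}'_{it} + \beta X'_{it} + \sum_{l=1}^{11} \hat{\beta}\, \mathit{EXT}'_{lit} X'_{lit} \\
& + \hat{\beta}\, \mathit{NEUTR}'_{it} X'_{it} + (\delta_i + \gamma Z'_{it})\, U_{it}, \qquad t = 1, \dots, 152,
\end{aligned}
\tag{3}
$$

or

$$
\begin{aligned}
Y_{it} = {} & \sum_{l=1}^{11} \alpha_{lit}\, \mathit{POS}'_{lit} + \sum_{l=1}^{11} \alpha_{lit}\, \mathit{NEG}'_{lit} + \alpha_{it}\, \mathit{NEUTR}'_{it} + \beta X'_{it} + \sum_{l=1}^{11} \hat{\beta}\, \mathit{POS}'_{lit} X'_{lit} \\
& + \sum_{l=1}^{11} \hat{\beta}\, \mathit{NEG}'_{lit} X'_{lit} + \hat{\beta}\, \mathit{NEUTR}'_{it} X'_{it} + (\delta_i + \gamma Z'_{it})\, U_{it}, \qquad t = 1, \dots, 152,
\end{aligned}
\tag{4}
$$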

with $P\{\delta_i + \gamma Z'_{it} > 0\} = 1$. The parameter $\delta_i$, i = 1, ..., 1029, captures the company fixed effects and Z is defined as in equation 5. The sequence $\{X_{it}\}$ is i.i.d. for any fixed i and independent across t. $U_{it}$ are i.i.d. (across i and t), statistically independent of $X_{it}$, and normalized to satisfy the moment conditions in equation 6.

$\mathit{EXT}_l$ is a dummy variable for the group with the l-th highest absolute change in news risk; it takes on 1 when the group of the absolute change in news risk is equal to l and 0 otherwise.

$\mathit{NEUTR}$ denotes the dummy variable for the group with a change in news risk equal to 0, taking on 1 when the group is equal to $\mathit{NEUTR}$ and 0 otherwise.

$\mathit{POS}_l$ is a dummy variable for the group with the l-th highest decrease in news risk, with l = [1, 11], where l = 1 denotes the highest decrease in news risk. $\mathit{POS}_l$ takes on 1 when the group of a decrease in news risk is equal to l and 0 otherwise.

$\mathit{NEG}_l$ is a dummy variable for the group with the l-th highest increase in news risk, with l = [1, 11], where l = 1 denotes the highest increase in news risk. $\mathit{NEG}_l$ takes on 1 when the group of an increase in news risk is equal to l and 0 otherwise.

Dummy variables are determined by using factor variable functionalities in Stata, which allow us to generate automatic dummies. $\mathit{NEG}_l X$, $\mathit{POS}_l X$, $\mathit{NEUTR}\,X$ and $\mathit{EXT}_l X$ represent the interaction variables of the prespecified dummies and the change in news risk, X. Z is a k-vector of known differentiable (with probability 1) transformations of the components of X with element b given by:

$$Z_b = Z_b(X), \qquad b = 1, \dots, k, \tag{5}$$

U is an unobserved random variable, independent of X, with density function $f_U(\cdot)$ bounded away from 0 and normalized to satisfy the moment conditions

$$E(U) = 0, \qquad E(|U|) = 1. \tag{6}$$

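Equation 3 implies that:

$$
\begin{aligned}
Q_Y(\tau \mid X_{it}) = {} & \sum_{l=1}^{11} \alpha_{lit}\, \mathit{EXT}'_{lit} + \alpha_{it}\, \mathit{NEUTR}'_{it} + \delta_i q(\tau) + \beta X'_{it} + \sum_{l=1}^{11} \hat{\beta}\, \mathit{EXT}'_{lit} X'_{lit} \\
& + \hat{\beta}\, \mathit{NEUTR}'_{it} X'_{it} + Z_{it} \gamma q(\tau)
\end{aligned}
\tag{7}
$$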

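Equation 4 implies that:

$$
\begin{aligned}
Q_Y(\tau \mid X_{it}) = {} & \sum_{l=1}^{11} \alpha_{lit}\, \mathit{POS}'_{lit} + \sum_{l=1}^{11} \alpha_{lit}\, \mathit{NEG}'_{lit} + \alpha_{it}\, \mathit{NEUTR}'_{it} + \delta_i q(\tau) + \beta X'_{it} \\
& + \sum_{l=1}^{11} \hat{\beta}\, \mathit{POS}'_{lit} X'_{lit} + \sum_{l=1}^{11} \hat{\beta}\, \mathit{NEG}'_{lit} X'_{lit} + \hat{\beta}\, \mathit{NEUTR}'_{it} X'_{it} + Z_{it} \gamma q(\tau)
\end{aligned}
\tag{8}
$$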
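The step from the location-scale models to their conditional quantiles can be made explicit. The sketch below assumes, following Machado and Santos Silva (2019), that $q(\tau)$ denotes the $\tau$-th quantile of $U$. The symbol $\mu_{it}$ is shorthand introduced here for all location terms (the dummies, $\beta X'_{it}$ and the interaction terms) and is not notation used elsewhere in this paper.

```latex
\begin{align*}
Y_{it} &= \mu_{it} + (\delta_i + \gamma Z'_{it})\,U_{it},
  \qquad P\{\delta_i + \gamma Z'_{it} > 0\} = 1, \\
Q_Y(\tau \mid X_{it}) &= \mu_{it} + (\delta_i + \gamma Z'_{it})\,Q_U(\tau)
  && \text{(positive scale, $U_{it}$ independent of $X_{it}$)} \\
&= \mu_{it} + \delta_i\,q(\tau) + Z_{it}\gamma\,q(\tau),
  \qquad q(\tau) \equiv F_U^{-1}(\tau).
\end{align*}
```

Because the scale term is strictly positive, multiplying U by it does not change the ordering of outcomes, so the quantile of the product equals the scale times the quantile of U.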

i, or the distributional effect at τ. As mentioned by Machado and Santos Silva (2019): "The distributional effect differs from the usual fixed effect in that it is not, in general, a location shift. That is, the distributional effect represents the effect of time-invariant company characteristics which, like other variables, are allowed to have different impacts on different regions of the conditional distribution of Y".

$Y_{it}$ represents a log function of the weekly return (from now on shortened to log weekly return) of a company. The weekly return is chosen since it is less volatile than the daily return: it includes fewer random stock movements, so it is clearer in which direction a stock is moving. While $X'_{it}$ compares two sequential weeks (t − 1 and t) in terms of the volume of news and the news risk, $Y_{it}$ only looks at one week (t). Because the two measurement windows overlap, we cannot speak of causality. For this reason we perform two more regressions for the lagged variable X and the lagged dummy and interaction variables:

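$$
\begin{aligned}
Q_Y(\tau \mid X_{it-1}) = {} & \sum_{l=1}^{11} \alpha_{lit-1}\, \mathit{EXT}'_{lit-1} + \alpha_{it-1}\, \mathit{NEUTR}'_{it-1} + \delta_i q(\tau) + \beta X'_{it-1} \\
& + \sum_{l=1}^{11} \hat{\beta}\, \mathit{EXT}'_{lit-1} X'_{lit-1} + \hat{\beta}\, \mathit{NEUTR}'_{it-1} X'_{it-1} + Z_{it-1} \gamma q(\tau)
\end{aligned}
\tag{9}
$$

$$
\begin{aligned}
Q_Y(\tau \mid X_{it-1}) = {} & \sum_{l=1}^{11} \alpha_{lit-1}\, \mathit{POS}'_{lit-1} + \sum_{l=1}^{11} \alpha_{lit-1}\, \mathit{NEG}'_{lit-1} + \alpha_{it-1}\, \mathit{NEUTR}'_{it-1} + \delta_i q(\tau) \\
& + \beta X'_{it-1} + \sum_{l=1}^{11} \hat{\beta}\, \mathit{POS}'_{lit-1} X'_{lit-1} + \sum_{l=1}^{11} \hat{\beta}\, \mathit{NEG}'_{lit-1} X'_{lit-1} \\
& + \hat{\beta}\, \mathit{NEUTR}'_{it-1} X'_{it-1} + Z_{it-1} \gamma q(\tau)
\end{aligned}
\tag{10}
$$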

3.2. Research methods

3.2.1. Quantile panel regression

The first research method applied in this research is a quantile panel regression on quantiles 0.05, 0.25, 0.5, 0.75 and 0.95. A quantile panel regression is chosen because, among other reasons, it is more robust to outliers, on which we elaborate later on. We mainly look at quantile 0.5, since coefficients for this quantile have an equal chance of being underestimated as of being overestimated. The other quantiles are mainly used as a robustness check. The quantile panel regression is mainly used for the interpretation of the effect of the change in news risk variable on the log weekly returns.

We perform four quantile panel regressions, divided into the non-lagged and lagged change in news risk, with dummies either for groups with varying amounts of absolute change in news risk or for groups with varying amounts of change in news risk (equations 7, 8, 9 and 10). The dummy and interaction variables for the absolute change in news risk, as in equations 7 and 9, allow us to test whether (lagged) changes in news risk have a similar effect on log weekly returns for both an increase and a decrease in news risk.

In this research, the observations in the sample are divided into 23 groups based on different changes in news risk. Firstly, all observations are sorted on the change in news risk and then divided into 11 groups with an increase in news risk (neg_i), 11 groups with a decrease in news risk (pos_i) and one group with a change in news risk of 0 (neutr). We divide the sample by taking the cumulative value of the sorted changes and choosing the cutoff values that lie closest to the calculated boundaries. All groups have approximately the same absolute total amount of change in news risk, as can be seen in appendix 1, except for the neutr group. We use different boundaries to select groups for an increase and for a decrease in news risk. However, the positive and negative changes in news risk are approximately equally distributed in our sample, and therefore the total absolute amount of change in news risk is still fairly similar across groups, though a little higher for the groups with an increase in news risk (neg).

The sorting and selecting of groups based on the change in news risk causes groups of different sizes in terms of observations as can be seen in table 1. Selecting groups this way rather than having groups with an equal amount of observations, allows us to determine whether groups with different changes in news risk have different weekly returns or not.
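The grouping procedure described above can be sketched as follows. This is an illustration under assumptions, not the procedure as implemented for this research: the function names are hypothetical, and the boundaries are assumed to be equal fractions (k/11) of the total change within the increases and within the decreases separately.

```python
import numpy as np

def equal_mass_groups(x, n_groups=11):
    """Assign group labels 1..n_groups to strictly positive values x.

    Group 1 holds the largest values. Cut-offs are placed where the cumulative
    sum of the sorted values lies closest to k/n_groups of the total, so each
    group carries roughly the same total change rather than the same number of
    observations.
    """
    x = np.asarray(x, dtype=float)
    order = np.argsort(-x)                      # largest change first
    cum = np.cumsum(x[order])
    targets = cum[-1] * np.arange(1, n_groups) / n_groups
    # number of sorted observations that fall before each boundary
    cuts = [int(np.argmin(np.abs(cum - t))) + 1 for t in targets]
    labels_sorted = np.searchsorted(cuts, np.arange(len(x)), side="right") + 1
    labels = np.empty(len(x), dtype=int)
    labels[order] = labels_sorted
    return labels

def assign_groups(change, n_groups=11):
    """Return labels such as 'neg3', 'pos1' or 'neutr' for each observation."""
    change = np.asarray(change, dtype=float)
    out = np.full(len(change), "neutr", dtype=object)
    inc, dec = change > 0, change < 0
    if inc.any():   # increase in news risk -> neg groups
        out[inc] = [f"neg{g}" for g in equal_mass_groups(change[inc], n_groups)]
    if dec.any():   # decrease in news risk -> pos groups
        out[dec] = [f"pos{g}" for g in equal_mass_groups(-change[dec], n_groups)]
    return out
```

With this rule a single very large change can form a group on its own, while many small changes share one group, which matches the unequal group sizes described above.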

For this research, a fixed effects model or a random effects model is used since we are dealing with panel data (time series data for different entities). The expectation is that a fixed effects model is the most appropriate, since we expect endogeneity between the error term and the independent variable (corr(u_i, X) ≠ 0). The random effects model assumes that

appropriate for this research (the null hypothesis being that random effects is appropriate). When performing a Hausman test on the regressions of the change in news risk and log weekly returns, we reject the null hypothesis and conclude that the fixed effects model is the most appropriate (Prob>chi2=0.0001).
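For reference, the Hausman statistic is commonly written in the form below. The notation is introduced here for illustration and is not taken from the regression output of this research: $\hat{\beta}_{FE}$ and $\hat{\beta}_{RE}$ are the fixed effects and random effects coefficient vectors, and k is the number of coefficients compared.

```latex
\[
H = (\hat{\beta}_{FE} - \hat{\beta}_{RE})'
    \left[\widehat{\mathrm{Var}}(\hat{\beta}_{FE}) - \widehat{\mathrm{Var}}(\hat{\beta}_{RE})\right]^{-1}
    (\hat{\beta}_{FE} - \hat{\beta}_{RE})
    \;\overset{a}{\sim}\; \chi^2_k
    \quad \text{under } H_0 .
\]
```

A large value of H, and hence a small p-value, indicates that the two sets of estimates differ systematically, which speaks against the random effects assumption.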

A way to correct for possible endogeneity is to include an instrumental variable. This possibility is explored, however, no proper instrument is found that is uncorrelated with the dependent variable (log weekly returns). As mentioned in section 2, the public agenda is highly correlated with the volume or patterns of news which could make it an appropriate instrument in this sense. However, the public agenda of stock investors is likely to have an effect on the returns as well. Therefore the public agenda is not deemed an appropriate instrument.

Non-linear fixed effects models are also considered as a possible research method. Wooldridge (2005) proposes finding the distribution conditional on the initial value (and the observed history of strictly exogenous explanatory variables). However, in our research some endogeneity is expected among the independent variables, and therefore this method is deemed inappropriate. Furthermore, Greene (2002) raises some issues with non-linear models. One of them is that non-linear fixed effects models give biased estimates of coefficients. On top of that, there is relatively little empirical evidence on the behaviour of the fixed effects estimator, and the literature is almost exclusively focused on the application of the fixed effects model to binary choice models.

A method that is investigated is the quantile regression as proposed by Koenker and Bassett (1978). They suggest estimators that are comparable to least squares but outperform least squares for non-Gaussian error distributions. As the relationship between the change in news risk and log weekly returns is non-Gaussian (not normally distributed), as seen in appendices D and E, this is an interesting regression method for this research.

According to Koenker and Bassett (1978): "Unfortunately the extreme sensitivity of the least squares estimator to modest amounts of outlier contamination makes it a very poor estimator in many non-Gaussian, especially long-tailed, situations." Outliers are not removed from our final sample, as these peaks in news are considered very relevant for the return. These peaks in news are expected to have the highest impact on returns, and therefore we do not want to eliminate them, as is often done for Ordinary Least Squares. Quantile regressions are more robust to outliers, and therefore we do not need to drop this data.

paper by Machado and Santos Silva (2019) is particularly useful for models with endogenous independent variables. Therefore this method is deemed appropriate as our model possibly consists of endogenous independent variables. The news may affect the returns, but the returns may also affect the news, causing a correlation between the error terms and the independent variable.

Thus, the quantile panel regression model is the most appropriate method for this data set, since it allows us to account for the fact that we have non-Gaussian distributions, to include interesting outlying observations without biasing the coefficients too much, and to include a possibly endogenous independent variable.
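The idea behind a location-scale quantile regression with fixed effects can be sketched in a few lines. The code below is a simplified illustration and not the estimator used for the results in this research: it assumes Z = X, it estimates q(τ) as the empirical quantile of the standardized residuals instead of solving the moment condition of Machado and Santos Silva (2019), and all function names are hypothetical.

```python
import numpy as np

def group_mean(v, ids):
    """Mean of v within each entity, broadcast back to every observation.

    ids must be integer codes 0..n-1, one per observation.
    """
    return (np.bincount(ids, weights=v) / np.bincount(ids))[ids]

def demean(M, ids):
    """Subtract entity means column by column (the within transformation)."""
    cols = [group_mean(M[:, j], ids) for j in range(M.shape[1])]
    return M - np.column_stack(cols)

def mmqr_sketch(y, X, ids, tau):
    """Return (quantile coefficients, q(tau)) for Y = a_i + X b + (d_i + X g) U."""
    Xd = demean(X, ids)
    # Step 1: location coefficients from a fixed effects (within) regression
    beta = np.linalg.lstsq(Xd, y - group_mean(y, ids), rcond=None)[0]
    alpha = group_mean(y - X @ beta, ids)
    resid = y - alpha - X @ beta
    # Step 2: scale coefficients from a within regression of |residuals|
    abs_resid = np.abs(resid)
    gamma = np.linalg.lstsq(Xd, abs_resid - group_mean(abs_resid, ids), rcond=None)[0]
    delta = group_mean(abs_resid - X @ gamma, ids)
    scale = delta + X @ gamma          # assumed positive for every observation
    # Step 3 (simplification): q(tau) as a quantile of the standardized residuals
    q_tau = np.quantile(resid / scale, tau)
    # Effect of X on the tau-th conditional quantile: beta + gamma * q(tau)
    return beta + gamma * q_tau, q_tau
```

The last line shows why coefficients can differ across quantiles: a covariate that increases the spread of the outcome (gamma above zero) has a larger effect in the upper quantiles than at the median.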

3.2.2. Marginal analysis

Interpretation of regression tables can be very challenging in the case of interaction effects, categorical variables or non-linear functional forms (Jann, 2013). Therefore our second research method consists of a marginal analysis for the dummy and interaction variables. The added value of the marginal analysis is that it allows us to compare groups with a different (absolute) change in news risk and to test whether the groups are significantly different from each other (and not just whether they are significantly different from zero). In order to use marginal analysis in Stata, we need to use factor variable notation for the dummy and interaction variables in the quantile panel regression. The marginal analysis is mainly performed on the 0.5 quantile regressions, unless specified otherwise. A margin is a statistic computed from predictions from a model while manipulating the values of the covariates (Jann, 2013). For the continuous covariate X (the change in news risk), margins are computed by taking the first derivative of the response of the quantile panel regression with respect to the covariate. With factor variable notation in Stata, the first factor (the first dummy or interaction variable) is automatically omitted, which needs to be taken into account when interpreting results.


since no correction is applied to the dummy variables themselves, the results for the dummy variables should be taken with caution, since there is a good chance that these are false positives, as highlighted by McDonald (2014). Please note that the Šidák method is applied to the interaction variables (which represent the interaction between the dummy variables and the change in news risk). Therefore, less caution is needed for these results than for the dummy variables.
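As a reminder of what the Šidák method does, the sketch below computes the per-comparison significance level and the adjusted p-value under the standard Šidák formula. The function names and the number of comparisons used in the example are assumptions for illustration; the actual number of comparisons depends on the marginal analysis that is run.

```python
def sidak_alpha(alpha, m):
    """Per-comparison level that keeps the family-wise error rate at alpha
    across m independent comparisons: 1 - (1 - alpha)^(1/m)."""
    if m < 1:
        raise ValueError("m must be at least 1")
    return 1.0 - (1.0 - alpha) ** (1.0 / m)


def sidak_adjust_p(p, m):
    """Sidak-adjusted p-value for one of m comparisons: 1 - (1 - p)^m."""
    if m < 1:
        raise ValueError("m must be at least 1")
    return 1.0 - (1.0 - p) ** m


# Example with a hypothetical family of 10 comparisons at a 5% family-wise level.
print(sidak_alpha(0.05, 10))
print(sidak_adjust_p(0.01, 5))
```

Without such a correction, each additional comparison raises the chance of at least one false positive, which is why the uncorrected dummy results call for more caution.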

3.3. Cleaning data

When performing a panel regression, one of the most important steps is to clean the data set properly and to make the panel more balanced by checking whether the sample is consistent across entities. Furthermore, the data needs to be checked for consistency over time as well. The following sections explain how the actual returns and news data are cleaned.

3.3.1. Consistency across entities

In order to check for consistency across entities, it is verified whether a company split or merged during the time frame of the research. Through the Thomson Reuters database it is established that 493 companies in the sample have closed an M&A deal within the time frame of this research (June 2016 till May 2019). We do not correct explicitly for these mergers, but a correction follows automatically if too many data points are missing in the returns. If a company merges and the RIC identifier changes, too many returns show up as missing and the stock is eliminated from this research.

The panel data is unbalanced, since the news differs per company and per week. In the data, it can be seen that exactly 1548 companies have a year or more of missing news risk scores within the time frame. These companies have the fewest articles and are dropped from the data set in order to make our panel a bit more balanced.

3.3.2. Consistency across time


When it comes to deleting missing values, listwise deletion is not an appropriate method, since each country has a different number of public holidays and therefore different missing return days. Furthermore, some companies even have returns on Saturdays, while most countries’ stock exchanges are closed during the weekend. Given that daily, weekly, monthly and yearly data can co-exist in a data set, we need to ensure that the data is consistent across time as well. In this case companies were eliminated when they exceeded 512 missing days of returns within the time frame (304 missed weekend days, plus a worldwide maximum of 28 public holidays per year times three years, plus an extra margin of four months).
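The composition of the 512-day benchmark can be checked with a few lines of arithmetic. In the sketch below the length of the four-month margin is not taken from the text but backed out from the other figures, and the helper `exceeds_benchmark` is a hypothetical name used only for illustration.

```python
WEEKEND_DAYS = 304            # missed weekend days in the time frame (from the text)
MAX_HOLIDAYS_PER_YEAR = 28    # worldwide maximum of public holidays per year
YEARS = 3
BENCHMARK = 512               # maximum number of missing return days allowed

holiday_days = MAX_HOLIDAYS_PER_YEAR * YEARS                  # 84 days
implied_margin = BENCHMARK - WEEKEND_DAYS - holiday_days      # days left for the four-month margin


def exceeds_benchmark(missing_days):
    """True if a company has more missing return days than the benchmark allows."""
    return missing_days > BENCHMARK


print(holiday_days, implied_margin)
```

The implied margin of 124 days is consistent with roughly four months of calendar days.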

Companies are dropped from all 98 countries in the sample, depending on whether they exceed the benchmark or not. For 51 countries, the returns of all companies exceed the benchmark, so these countries are completely dropped from the sample. The dropped data represents 1655 companies with a total of 3,015,888 news articles. It is possible that for some companies the RIC identifier from the matching system is outdated, that the company is not publicly traded anymore, or that the returns are simply not well reported. After eliminating these countries, we end up with a sample of 1029 companies with a total of 21,016,004 articles in 47 countries. These companies thus represent the companies with the most news articles.

3.4. Subsample

Please note that this subsample was used to determine the right measurements and we did not perform our analysis on them (appendices B and C).

We start with a subsample of 7 companies for which we have clear news events as can be seen in table 7 in appendix C. This subsample is used to determine the right method for our regression. When looking at the raw data results of the subset, a clear increase in the risk value (thus more negativity) based on the news can be observed for certain peaks in the stock price (see appendix B). Sometimes the news happens after the stock price goes down, but in other cases it is clear that the news happens before the drop. The price can be seen correcting itself for example for Deutsche Bank (figure 15) after two peaks in the riskiness of the news. The difficulty in the dataset is that each stock price has a different response time to the news in the market and therefore it is hard to determine which period or date of the news to regress on which period or date of the returns.


around that time, are the outliers that we want to correct for. However, other outliers such as for Nissan (figure 14) are fairly interesting for this research, as Nissan’s CEO got arrested around the 1542 timestamp. Furthermore, it is clear that it is a peak of news instead of just one outlying data point.

As can be seen in appendix C, the processed data (taking the average news risk scores per day) does not look very different from the initial raw data and still shows the interesting peaks in news. We can see a lot of volatility in the daily returns that does not clearly show where the stock is going in the long run, such as for Deutsche Bank (figure 15). For this reason we decided to choose the weekly returns as opposed to daily returns.

3.5. Hypotheses

The division into several regressions in section 3.1, with dummies for a certain (absolute) change in news risk, allows us to test several hypotheses under both regression models.

We believe the overreaction of returns to news as mentioned by e.g. Chan (2003) and Zarowin (1990) to be higher for groups with a higher absolute change in news risk as opposed to the change in news risk. Investors may not take into account that positive or negative news may already be reflected in the stock price, and may therefore overreact. Groups with a higher (absolute) change in news risk are thus expected to show a reversal effect sooner than groups with a lower absolute change in news risk. This reversal effect is demonstrated by e.g. Tetlock (2007), De Bondt and Thaler (1985), Jegadeesh (1990), Lo and Craig MacKinlay (1990) and Jegadeesh and Titman (1993). The first hypothesis is as follows:

Null Hypothesis 1. Groups with a higher (absolute) change in news risk do not show a reversal effect in log weekly returns sooner than groups with a lower absolute change in news risk.

Alternative Hypothesis 1. Groups with a higher (absolute) change in news risk show a reversal effect in log weekly returns sooner than groups with a lower absolute change in news risk.


Null Hypothesis 2. The change in news risk between week t − 1 and t has no effect on the log weekly return at week t.

Alternative Hypothesis 2. The change in news risk between week t − 1 and t has a negative effect on the log weekly return at week t.

Null Hypothesis 2a. The change in news risk between t − 1 and t does not have a stronger effect on the log weekly return at week t for a group with an increase in news risk than for a group with a decrease in news risk of a similar absolute value.

Alternative Hypothesis 2a. The change in news risk between t − 1 and t has a stronger effect on the log weekly return at week t for a group with an increase in news risk than for a group with a decrease in news risk of a similar absolute value.

In the literature, a reversal effect in the returns is found at weekly and three to five year intervals, e.g. De Bondt and Thaler (1985), Jegadeesh (1990), Lo and Craig MacKinlay (1990), Jegadeesh and Titman (1993). In this research we focus on a change in news risk for one lag (which is approximately a week, depending on the last available week for a certain company). Therefore we expect to find a negative effect for the change in news risk on the log weekly return, but a positive effect for the lagged change in news risk on the log weekly return. We construct conditional subhypotheses 3a and 3b. If both null hypotheses 3a and 3b can be significantly rejected, then we can reject null hypothesis 3.

Null Hypothesis 3. The change in news risk does not exhibit a reversal effect on the log weekly returns within one lag

Alternative Hypothesis 3. The change in news risk exhibits a reversal effect on the log weekly returns within one lag

Null Hypothesis 3a. The change in news risk does not have any effect on the log weekly return.

Alternative Hypothesis 3a. The change in news risk has a negative effect on the log weekly return.

Null Hypothesis 3b. The lagged change in news risk does not have any effect on the log weekly return.

Alternative Hypothesis 3b. The lagged change in news risk has a positive effect on the log weekly return.


4. Data and descriptive statistics

In this research two databases are used, the Owlin database and the Thomson Reuters database. In the following sections we will elaborate on the details and challenges of the data that is retrieved from these databases.

4.1. Internal database

The internal database of Owlin consists of news on 9737 companies and risk values associated with articles about these companies. While Owlin has millions of news sources regarding all types of news, these are specifically tailored queries about these entities. The data provided by Owlin consists of risk scores assigned to articles about certain companies using real-time financial/economic news. Furthermore, the volume of the news is based on filters developed by data analysts (filters are stored search queries written in Owlin’s proprietary OQL query language). These filters make sure that articles regarding the entity itself, its subsidiaries, executives and company aliases are included. They also ensure that most company-specific noise is excluded (other companies/products/people with a similar name). For example, if we want to see news regarding the company Apple, we do not want to see articles about apples. The news filters make sure that the news concerns the right entity.

As this may sound quite abstract for some readers, an example of the mechanism behind the algorithm is given. Suppose that a news flow for Google consists of different synonyms, such as Alphabet or GOOGL (the stock ticker). If an article mentions, for example, "Alphabet goes bankrupt", then the system will look for "company name/synonym goes bankrupt". Therefore the right risk measure of an article will be assigned to Google when that specific synonym is included in the system. As mentioned before, Fitch Ratings rated a sample of Owlin’s database as 98% complete, 96% accurate and 97% right on sentiment.

4.2. External database


4.3. Alignment of the two databases

The news data is timestamped to the second at which each article was published, within the last 3.5 years. The returns are provided on a weekly basis and the news data needs to be aligned to this. Therefore we choose to take the average risk score of the articles for a specific company in a certain week. Usually the median would be an appropriate measure, as it is less prone to outliers than the average. However, the risk scores contain a large number of zeros, which represent a neutral risk value of an article (no positive or negative sentiment is assigned to these articles, or the sentiments in the article balance out). This would make it very likely that zero would be taken as the median value, so that interesting peaks in the news would no longer be visible in the median data. These peaks in the news represent company-specific events that could possibly have an enormous impact on the company’s weekly returns. We consider these events, which are usually treated as outliers, valuable for our research, and therefore we consider the average of the risk scores assigned to articles in a certain week more appropriate.
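The argument for the average over the median can be illustrated with a small example. The scores below are hypothetical article-level risk scores for one company in one week, with a higher value meaning more negative news; they are not taken from the Owlin data.

```python
from statistics import mean, median

# Hypothetical risk scores for one company-week: mostly neutral articles (0.0)
# plus two articles covering a company-specific negative event.
weekly_scores = [0.0] * 7 + [0.6, 0.8]

weekly_median = median(weekly_scores)  # dominated by the neutral articles
weekly_mean = mean(weekly_scores)      # still reflects the news peak

print("median:", weekly_median)
print("mean:", round(weekly_mean, 4))
```

With most articles scored as neutral, the median collapses to zero and hides the event, while the average remains positive.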

4.4. Difficulties in the dataset

One of the main difficulties in the data is that each company has a very different news flow and stock price, which is why we need to use a quantile panel regression with the company as panel variable. Since we have a large sample size of 1029 companies (we will elaborate on this in section 4.5), this makes it harder. Furthermore, each company is likely to have a different stock volatility and perhaps a different response time of the stock price to the news. This makes it difficult to determine the period of the news/returns to correlate with the other. In order to regress the news based risk of each company on its returns, we are using a panel regression with the companies as panel variable and

Initially the intention of this research was to look at excess returns of a certain stock and not at actual returns. However, due to the large size of the data set and the variety of the stock exchanges (figure 3), it is complicated to come up with a uniform benchmark for the market return. This market return would be needed in order to calculate the expected return and subsequently the excess return. For this reason the actual return is chosen with a log transformation.
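As a brief illustration of the log transformation, the sketch below computes log returns from a price series. It assumes the standard definition, the natural logarithm of the ratio of two consecutive weekly prices; the exact specification used in this research is the one given in equation 1, and the function names and prices here are hypothetical.

```python
import math


def log_return(price_prev, price_now):
    """Log return between two consecutive prices (standard definition, assumed here)."""
    if price_prev <= 0 or price_now <= 0:
        raise ValueError("prices must be positive")
    return math.log(price_now / price_prev)


def weekly_log_returns(prices):
    """Log returns for a list of consecutive weekly closing prices."""
    return [log_return(a, b) for a, b in zip(prices, prices[1:])]


# Hypothetical weekly closing prices.
prices = [100.0, 110.0, 121.0]
returns = weekly_log_returns(prices)
print([round(r, 4) for r in returns])
```

A convenient property of log returns is that they add up over time: the sum of the weekly log returns equals the log return over the whole period.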

4.5. Final sample


order to be able to retrieve data from the Thomson Reuters databases. The RICs have been found using the matching system of the permid.com website, which is developed by Thomson Reuters. The matching system compares the company names in our system to the company names in the Reuters system. Based on the legal name or regular company name, 3170 companies have been determined as a 100% match. RICs are only given to publicly traded companies, which eliminates the smaller companies from our sample. However, there may be some false positive matches, given that the matching was done solely on the company’s name. Initially the matching is done based on the legal name in our system and, if that is not available, the regular company name is used (more details in appendix A). This way we tried to avoid false positives as much as possible, as full manual checking was too time-intensive given the size of the initial sample.

The second criterion is that each company’s news flow must contain either the company name or any alternative company names, abbreviations or stock tickers. These company names are being used for the algorithm, looking for negative/risky words close to a certain name. After selecting on this criterion, our sample is narrowed down to 2690 companies.

The last criterion is that each company must have a sufficient number of scores assigned to articles and a sufficient number of returns. In this case companies were eliminated after exceeding a number of 73 missing weekly returns within the time frame and 52 weeks of missing risk scores. After selecting on these criteria, the final sample is narrowed down to 1029 companies.


Fig. 1. Sector distribution of the companies in the sample

All 1029 companies are assigned to a sector following the GICS (Global Industry Classification Standard).

Fig. 2. Region of the headquarters of the companies in the sample


Fig. 3. Country of the stock exchange of the companies in the sample


4.6. Summary statistics

As mentioned in section 3, we divide the sample into different groups which consist of different (absolute) changes in news risk. While we perform the regression on the log returns, we highlight the summary statistics for the actual weekly return for interpretation purposes. Groups are formed based on observations rather than on companies because we believe a lot can happen in the news for a certain company during the time frame of this research. Our research therefore investigates the effect of a firm-specific change in news risk at a certain point in time on log weekly returns, and the groups can be seen as portfolios consisting of certain observations.

In table 1 the summary statistics are given for all groups; the change in news risk is taken from week t − 1 till week t, while the weekly return is taken for week t. The lowest average weekly return among all groups is -4.56%, which corresponds to the group with the highest increase in news risk (neg1). The highest average weekly return in the table is 1.94%, which corresponds to the group with the highest decrease in news risk (pos1). The average weekly return rises when moving from the highest increase in news risk (neg1) to the highest decrease (pos1), with some minor exceptions. These results indicate that there is possibly a correlation between the change in news risk from week t − 1 to week t and the weekly return in week t. Our expectation was to see a certain correlation between the change in news risk and the log of the weekly returns. However, we did not expect this relationship to be as pronounced as it seems in the table, since our model does not include other variables that are known to have an impact on a company’s returns, such as size, profitability, investment and the book-to-market equity ratio (Fama and French, 2015).

Furthermore, the standard deviation seems to go up when going from no change in news risk (neutr) to a high increase (neg1) or high decrease (pos1) in news risk. One exception is the standard deviation for the group with the 10th highest increase in news risk (neg10), which is quite similar to that of the group with the highest increase in news risk (neg1).

Given these statistics, the market seems to respond more heavily to an increase in news risk than to a decrease in news risk, which is reflected in the standard deviation as well as the average weekly return. For example, the standard deviation and the absolute value of the average weekly return of neg1 are higher than for pos1. We can also spot the stronger reaction to negative news (or less positive news) in figure 4 for the log weekly returns. The slope of the linear fit for the increase in news risk is steeper than the slope of the linear fit for a decrease in news risk. Overreaction to negative news can be explained by loss aversion of investors (e.g. Kahneman and Tversky, 1979).

The average weekly return is positive for two groups with an increase in news risk (neg10 and neg11), but given that the average decrease in news risk for these groups is still very


is good news.

In table 1 there is overlap in week t, since change in news risk is taken from week t − 1 up to week t and weekly return is taken for week t. Therefore the lagged variable of the change in news risk (from week t − 2 till t − 1) and the weekly return of week t can tell us more about a possible causality as opposed to correlation in our model. These summary statistics can be found in table 2.

While table 1 shows a clear difference between groups with an increase in news risk and groups with a decrease in news risk, table 2 shows a positive weekly return for all groups except for the groups with the highest increase or decrease in news risk. Interestingly, the lowest average weekly return is for the group with the highest decrease in news risk (pos1). Also neg2 and pos2 show a clearly lower return compared to the other groups. While this may seem logical for the neg groups, this does not seem logical for the pos groups.


Fig. 4. Scatter of the change in news risk and the log of the weekly returns (for the group with the highest absolute change in news risk)


The highest average return in table 2 is for the third highest decrease in news risk (pos3), which indicates that the positive effects of observations in this group are more lasting than the positive effects of observations in the group with the highest decrease in news risk. While this may seem surprising to some, it is in line with mean reversion theory for returns as highlighted by, for example, Poterba and Summers (1988). It seems that pos1 and pos2 revert back to their mean sooner than pos3.

From table 2, it is not easy to pinpoint what is specifically going on in the market. When plotting the highest group of absolute changes in news risk (groups pos1 and neg1 combined), we see that there is a reversal for the highest group of absolute changes. While figure 4 including the non-lagged change in news risk is downward sloping, figure 5 (for the lagged variable) is upward sloping. This means that, for example, if we have a certain change in news risk from t − 2 to t − 1, this change seems likely to have a negative correlation with the log of weekly returns at t − 1. However, it seems likely to have a positive causal impact on the log weekly returns at week t.


Fig. 6. Linear fit for the mean/median weekly return and the news risk group at time t

News risk groups are taken from 1 to 23 (1 representing pos1 and 23 representing neg1 in line with table 1). Group 23 is excluded for the fit of the mean weekly return, for visualisation purposes.

Fig. 7. Linear fit for the mean/median weekly return at t and the news risk group at time t − 1


Table 1: Weekly return for groups selected based on their change in news risk

                 Weekly return            Change in news risk [2]
Group [1]        N     Mean       SD       Mean      SD       Sum
pos1           195    1.935    7.986     -0.139   0.105   -27.080
pos2           493    1.755    6.734     -0.055   0.010   -26.947
pos3           804    1.568    5.910     -0.034   0.004   -27.003
pos4          1168    1.685    5.808     -0.023   0.002   -27.000
pos5          1607    1.056    5.992     -0.017   0.001   -27.010
pos6          2203    1.172    5.081     -0.012   0.001   -26.988
pos7          3075    0.924    4.330     -0.009   0.001   -27.003
pos8          4337    0.786    4.375     -0.006   0.001   -26.996
pos9          6425    0.569    4.153     -0.004   0.001   -27.004
pos10        10552    0.462    4.075     -0.003   0.000   -27.002
pos11        35927    0.336    3.775     -0.001   0.001   -26.999
neutr        11557    0.354    4.011      0.000   0.000     0.000
neg11        35787    0.151    3.953      0.001   0.001    27.255
neg10        10469    0.127   13.764      0.003   0.000    27.252
neg9          6360   -0.166    4.274      0.004   0.001    27.254
neg8          4326   -0.102    4.223      0.006   0.001    27.256
neg7          3062   -0.235    4.640      0.009   0.001    27.243
neg6          2215   -0.450    4.914      0.012   0.001    27.251
neg5          1616   -0.570    5.233      0.017   0.002    27.248
neg4          1164   -0.756    6.529      0.023   0.002    27.245
neg3           797   -1.216    6.032      0.034   0.004    27.281
neg2           491   -1.029    6.807      0.056   0.010    27.258
neg1           189   -4.564   13.537      0.144   0.107    27.245
Total       144819    0.270    5.513      0.000   0.012     2.756

[1] The groups that start with neg_l refer to observations in the group with the l-th highest increase in news risk (news that is becoming more negative), while pos_l refers to observations with the l-th highest decrease in news risk (with l=[0,11]). Neutr refers to no change in the news risk at time t. Neg1 thus represents the group with the highest increase in news risk and pos1 the group with the highest decrease in news risk at time t.


Table 2: Weekly return for groups at t − 1 selected based on their change in news risk at t − 1

                           Weekly return            Change in news risk [2] (t − 1)
Group [1] at (t-1)         N     Mean       SD       Mean      SD       Sum
pos1                     193   -0.193    5.823     -0.139   0.106   -26.837
pos2                     492    0.122    4.355     -0.055   0.010   -26.897
pos3                     797    0.819    7.679     -0.034   0.004   -26.782
pos4                    1161    0.534    4.396     -0.023   0.002   -26.844
pos5                    1596    0.326    4.006     -0.017   0.001   -26.821
pos6                    2186    0.306    4.871     -0.012   0.001   -26.779
pos7                    3046    0.374    4.148     -0.009   0.001   -26.750
pos8                    4301    0.306    4.034     -0.006   0.001   -26.775
pos9                    6383    0.271    4.100     -0.004   0.001   -26.828
pos10                  10464    0.280    3.981     -0.003   0.000   -26.779
pos11                  35676    0.266    4.014     -0.001   0.001   -26.789
neutr                  11483    0.431    4.773      0.000   0.000     0.000
neg11                  35556    0.285    4.105      0.001   0.001    27.069
neg10                  10413    0.346   13.826      0.003   0.000    27.112
neg9                    6312    0.308    4.149      0.004   0.001    27.051
neg8                    4294    0.226    4.103      0.006   0.001    27.051
neg7                    3035    0.273    4.166      0.009   0.001    27.007
neg6                    2203    0.365    4.894      0.012   0.001    27.096
neg5                    1608    0.251    4.503      0.017   0.002    27.119
neg4                    1157    0.240    4.853      0.023   0.002    27.085
neg3                     793    0.374    4.190      0.034   0.004    27.153
neg2                     489    0.097    4.208      0.056   0.010    27.158
neg1                     188   -0.061    7.204      0.144   0.108    27.129
Total                 143826    0.301    5.508      0.000   0.012     3.147

[1] The groups that start with neg_l refer to observations in the group with the l-th highest increase in news risk (news that is becoming more negative), while pos_l refers to observations with the l-th highest decrease in news risk (with l=[0,11]). Neutr refers to no change in the news risk at time t − 1. Neg1 thus represents the group with the highest increase in news risk and pos1 the group with the highest decrease in news risk at time t − 1.

[2] As calculated in equation 2.


5. Results

In section 5.1 we demonstrate regression results for the non-lagged and lagged change in news risk on the log weekly return, including dummy and interaction variables for groups with a certain absolute change in news risk. These regression results are summarized and presented together for comparison purposes and can be found in table 3, following the full regressions in appendices H and I.

In section 5.2 we demonstrate the regression results for the non-lagged and lagged changes in news risk on the log weekly return including dummy and interaction variables for groups with a certain change in news risk. These results are summarized and presented together for comparison reasons and can be found in table 4 following the full regressions in appendix J and K.

After performing the comparisons between the effect of the lagged and non-lagged (absolute) change in news risk on the log weekly returns, we will elaborate on the absolute change in news risk for the non-lagged variable and on the change in news risk for the lagged variable. The reason for this is that increases and decreases in news risk seem to behave in a similar manner for the non-lagged change in news risk, but differently for the lagged change in news risk. We will elaborate on this further in this section. Marginal analysis will also be performed accordingly.

5.1. Comparison lagged and non-lagged absolute news risk changes

Absolute changes in news risk from week t − 1 up to and including week t have a negative correlation with the log weekly returns at week t (significant at 0.1% for quantiles 0.25, 0.5, 0.75, 0.95 and at 1% for quantile 0.05), with a coefficient of -3.40 at quantile 0.5, as can be seen in table 3. However, because of an overlap at period t for the log weekly returns and the change in news risk, this relationship is not causal.
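The significance levels can be related to the reported coefficients and standard errors with a simple check. The sketch below assumes a two-sided test based on a normal approximation of the ratio of coefficient to standard error, which is an assumption for illustration and not necessarily the exact procedure used by Stata for these regressions. The input values are the coefficients and standard errors reported in table 3, and the function names are hypothetical.

```python
import math


def two_sided_p(coef, se):
    """Two-sided p-value for coef/se under a standard normal approximation."""
    z = abs(coef / se)
    return math.erfc(z / math.sqrt(2.0))


def stars(coef, se):
    """Significance stars at the 5%, 1% and 0.1% levels."""
    p = two_sided_p(coef, se)
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return ""


# (coefficient, standard error) pairs as reported in table 3.
reported = {
    "non-lagged, quantile 0.05": (-3.29, 1.16),
    "non-lagged, quantile 0.5": (-3.40, 0.53),
    "lagged, quantile 0.05": (0.55, 1.04),
    "lagged, quantile 0.5": (1.10, 0.48),
}
for label, (coef, se) in reported.items():
    print(label, stars(coef, se))
```

Under this approximation the resulting stars match those reported for these four entries.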


Table 3: Comparison of two separate quantile panel regressions for the non-lagged and lagged change in news risk on weekly log returns with dummies and interactions for a certain absolute change in news risk

Quantile                            0.05      0.25      0.5       0.75      0.95
Log weekly return [e] (t)
Change in news risk [f] (t)         -3.29**   -3.34***  -3.40***  -3.45***  -3.49***
Standard error                      (1.16)    (0.79)    (0.53)    (0.71)    (1.00)
Dummy variables [a,c]               YES       YES       YES       YES       YES
Interaction variables [a,d]         YES       YES       YES       YES       YES
Observations                        144,819   144,819   144,819   144,819   144,819
Change in news risk [f] (t − 1)     0.55      0.78      1.10*     1.33*     1.53
Standard error                      (1.04)    (0.71)    (0.48)    (0.64)    (0.90)
Dummy variables [b,c] (t − 1)       YES       YES       YES       YES       YES
Interaction variables [b,d] (t − 1) YES       YES       YES       YES       YES
Observations                        143,826   143,826   143,826   143,826   143,826

Standard errors in parentheses.

*, ** and *** represent significance at 5%, 1% and 0.1% respectively.

[a] Full regression results can be found in appendix H and follow regression equation 7.
[b] Full regression results can be found in appendix I and follow regression equation 9.
[c] Dummy variables are included for twelve groups with different amounts of absolute changes in news risk (dummy variables are ext_l with l=[1,11] and l=1 being the highest absolute change, and the dummy variable for no change in news risk is neutr).
[d] Interaction variables of all dummy variables with the change in news risk are included for l=[1,11].
[e] Log weekly return is calculated using equation 1.
[f] Change in news risk is calculated using equation 2.

5.2. Comparison lagged and non-lagged decreases and increases in news risk


t and the log weekly return is taken for week t, there is some overlap. This implies that we can establish correlation but not causality, since the news can also change following a certain return.


Table 4: Comparison of quantile panel regressions of the non-lagged and lagged change in news risk on weekly log returns with dummies for a certain change in news risk

Quantile                            0.05      0.25      0.5       0.75      0.95
Log weekly return [e] (t)
Change in news risk [f]             -2.74     -3.31+    -4.08***  -4.67**   -5.14*
Standard error                      (2.55)    (1.74)    (1.17)    (1.57)    (2.19)
Dummy variables [a,c]               YES       YES       YES       YES       YES
Interaction variables [a,d]         YES       YES       YES       YES       YES
Observations                        144,819   144,819   144,819   144,819   144,819
Change in news risk [f] (t − 1)     2.72      2.87      3.07*     3.22+     3.35
Standard error                      (2.97)    (2.03)    (1.37)    (1.83)    (2.56)
Dummy variables [b,c] (t − 1)       YES       YES       YES       YES       YES
Interaction variables [b,d] (t − 1) YES       YES       YES       YES       YES
Observations                        143,826   143,826   143,826   143,826   143,826

Standard errors in parentheses.

+, *, ** and *** represent significance at 10%, 5%, 1% and 0.1% respectively.

[a] Full regression results can be found in appendix J and follow regression equation 8.
[b] Full regression results can be found in appendix K and follow regression equation 10.
[c] Dummy variables are included for 23 groups with different amounts of changes in news risk. Dummy variables consist of pos_l and neg_l with l=[1,11] and l=1 being the highest decrease or increase in news risk respectively; the dummy variable for no change in news risk is neutr.
[d] Interaction variables measuring the interaction between dummy variables and the change in news risk are included for neutr, pos_l and neg_l with l=[1,11].
[e] Log weekly return is calculated using equation 1.


5.3. Regressions with dummies for a certain absolute change in news risk

We first perform a quantile panel regression including factor variables for twelve groups constructed based on a certain absolute change in news risk (appendix H). Including the groups as factor variables in Stata allows us to generate automatic dummies. In this case the group with the highest absolute changes (ext1) and its interaction with the change in news risk are both omitted following the literature that the first factor accounts for the most variance in a model.

As can be seen in the appendix, the dummies (ext_l for l=[1,11]) do not have a significant effect on the log weekly returns at quantile 0.5. However, the dummies for l=[6,11] and neutr are, for example, significant at varying significance levels for quantiles 0.05 and 0.25. The coefficients of the dummies indicate a more positive effect on the log returns in the short term if news has less absolute news risk at quantiles 0.05 and 0.25. We perform a marginal analysis to verify whether the dummies are indeed statistically different from each other at quantile 0.25. This quantile is chosen as it is closest to 0.5 and thus carries less chance of underestimating the model compared to 0.05.

From the marginal analysis (table 5) we can conclude that the dummy for no change in news risk (neutr) is significantly different from all other groups. Also, the group with the second lowest absolute change in news risk (ext11) is significantly different from all other groups except for the group with the 8th highest absolute changes (ext8) at a 5% level. Hence, in the short term, observations in a group with very low absolute changes in news risk have a significantly higher correlation with log weekly returns than groups with a higher absolute change for quantile 0.25. The change in news risk is taken from week t − 1 up to and including week t and the log weekly returns are taken for week t. Therefore, given the overlap at t, we cannot speak of causality but of correlation. On top of that, while the margins coefficient is going up for groups with a lower absolute change in news risk, the standard error is rising as well. As mentioned by Altman and Bland (2005), the standard error falls with sample size. Since the groups consist of a different number of observations as seen in 8, this needs to be taken into account. Following Altman and Bland’s logic, we would expect a lower standard error for groups with a lower absolute change in news risk, since they consist of more observations. However, the opposite is true and the standard error is generally higher for groups with a lower absolute change in news risk.
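The relation between the standard error and the sample size mentioned by Altman and Bland (2005) can be illustrated with the standard error of a mean, SD divided by the square root of N. The sketch below applies this to the group sizes and standard deviations of the weekly return in table 1. Note that this is the standard error of a group mean, not the standard error of a margin from the quantile panel regression, so it only illustrates the general expectation; the function name is hypothetical.

```python
import math


def standard_error_of_mean(sd, n):
    """Standard error of a sample mean: SD / sqrt(N)."""
    if n <= 0:
        raise ValueError("n must be positive")
    return sd / math.sqrt(n)


# (N, SD) of the weekly return for three groups, taken from table 1.
groups = {
    "pos1": (195, 7.986),
    "neutr": (11557, 4.011),
    "pos11": (35927, 3.775),
}

for name, (n, sd) in groups.items():
    print(name, n, round(standard_error_of_mean(sd, n), 4))
```

On this basis the largest groups would be expected to have the smallest standard errors, which is why the rising standard errors of the margins stand out.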


Table 5: Dummy variables and their margins in respect to log weekly returns and whether they differ significantly from each other

Dummy variable¹   Margins²    Std. Error   Unadj. group³
ext1              0           (.)          A
ext2              0.176       (1.27)       ABCD
ext3              0.124       (0.95)       A
ext4              0.222       (1.74)       ABC
ext5              0.217       (1.74)       AB
ext6              0.265*      (2.14)       BCDE
ext7              0.306*      (2.50)       CDE
ext8              0.323**     (2.66)       EF
ext9              0.283*      (2.34)       BCDE
ext10             0.315**     (2.62)       DE
ext11             0.362**     (3.01)       F
neutr             0.442***    (3.63)

Standard errors in parentheses

*, ** and *** represent significance at 5%, 1% and 0.1% respectively

1 ext_l refers to the dummy variable for the group with the lth highest absolute changes in news risk (taking on 1 when group is equal to l and 0 otherwise) with l=[1,11], and neutr refers to the dummy variable for no change in news risk.

2 Margins are determined following a panel quantile regression on quantile 0.25

3 Margins sharing a letter in the unadjusted group are not significantly different at the 5% level.
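The letter notation in table 5 can be read mechanically: two dummies are not significantly different at the 5% level when their letter codes overlap. The sketch below encodes this rule under the assumption that the eleven letter codes listed in the table belong to ext1 through ext11 in that order. The neutr dummy is left out, and the helper name is hypothetical.

```python
from itertools import combinations

# Letter codes for ext1..ext11 (assumed order); neutr is not included.
groups = {
    "ext1": "A", "ext2": "ABCD", "ext3": "A", "ext4": "ABC",
    "ext5": "AB", "ext6": "BCDE", "ext7": "CDE", "ext8": "EF",
    "ext9": "BCDE", "ext10": "DE", "ext11": "F",
}


def shares_letter(a, b):
    """True when two dummies share a letter (not significantly different at 5%)."""
    return bool(set(groups[a]) & set(groups[b]))


# All pairs of dummies whose margins cannot be told apart at the 5% level.
indistinct_pairs = [(a, b) for a, b in combinations(groups, 2) if shares_letter(a, b)]

# Dummies that are not significantly different from ext11.
ext11_partners = [a if b == "ext11" else b
                  for a, b in indistinct_pairs if "ext11" in (a, b)]
```

Under this reading, ext11 overlaps only with ext8, which matches the statement that ext11 differs significantly from all other groups except ext8.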


As can be seen in appendix H, the interaction variables of groups 3 to 11 (ext_l with l=[3,11]) and the change in news risk are all significant at the 0.1% level for quantile 0.5, and almost all remain significant (at varying levels) at other quantiles, providing a robust result. Furthermore, the coefficient seems to decrease when moving to groups with a lower absolute change in news risk. This indicates that interactions of groups with a lower absolute change in news risk from t − 1 to t have a more negative correlation with the log weekly return at t than groups with a higher absolute change. Initially, we might expect this correlation to be more negative for groups with a higher absolute change in news risk. However, our results are less surprising in light of the reversal that can be seen when comparing figures 4 and 5 and figures 6 and 7. Our results seem to indicate that the reversal effect sets in earlier for groups with a higher absolute change in news risk. In order to establish whether the interactions of the groups differ significantly from each other, we perform a marginal analysis on quantile 0.5.
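As an illustration of how such interaction variables are formed, the sketch below builds the dummies ext_l and their products with the change in news risk for a single observation. The function name, the column names and the input values are hypothetical and are not taken from the thesis data or code.

```python
def interaction_terms(group, change, n_groups=11):
    """Return the dummies ext_l and their products with `change` for one observation."""
    row = {}
    for l in range(1, n_groups + 1):
        dummy = 1 if group == l else 0          # 1 only for the observation's own group
        row[f"ext{l}"] = dummy
        row[f"ext{l}_x_change"] = dummy * change  # interaction with change in news risk
    return row


# Hypothetical observation: group 3, change in news risk of -0.2.
row = interaction_terms(group=3, change=-0.2)
```

Only the interaction of the observation's own group carries the change in news risk. All other interaction terms are zero, so each group receives its own slope in the regression.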

Margins of interaction variables for the groups with different amounts of absolute changes in news risk are significant at the 0.1% level for the 1st to the 11th most extreme news risk changes (ext_l with l=[1,11]) and for no change in news risk (neutr) for the 0.5 quantile (table 6). A clear pattern can be seen, indicating that there is some overlap at the 5% significance level in margins between groups that are relatively close to each other in terms of absolute change in news risk, but not for groups farther away from each other (with the exception of the group with no change in news risk). For example, the groups with the highest absolute change in news risk (ext1 and ext2) are not significantly different from each other but are significantly different from the other groups (except neutr for ext2). This suggests that while the groups are not entirely optimal in terms of overlap in some margins at a 5% level, there is a clear decrease in margins when going from groups with a high absolute change in news risk to groups with a medium and then a low absolute change in news risk.
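The Šidák groups reported in table 6 rest on a correction for multiple comparisons. As a minimal sketch, the per-comparison significance level under the standard Šidák formula, 1 − (1 − α)^(1/m), can be computed as follows. It is an assumption of this sketch that all pairwise comparisons between the twelve groups are tested, and the function name is hypothetical.

```python
from math import comb


def sidak_alpha(alpha, n_comparisons):
    """Per-comparison significance level under the Šidák correction."""
    return 1 - (1 - alpha) ** (1 / n_comparisons)


n_groups = 12                 # ext1..ext11 plus neutr
n_pairs = comb(n_groups, 2)   # assumed: every pair of groups is compared
per_test_alpha = sidak_alpha(0.05, n_pairs)
```

With this many comparisons, each individual test has to meet a much stricter threshold than 5% for the overall error rate to stay at 5%.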

This decrease is also depicted in figure 8, which shows the overlap as well as the decrease in the linear prediction of the log weekly returns. Please note that the dummies neutr and ext1 were omitted from the underlying regression in appendix H, and we must therefore treat these results with more caution. The neutr dummy in particular gives an unexpected result, likely because of the omission. This graph and table 6 show that the marginal effects of groups with a higher absolute change in news risk are closer to 0. We showed that a reversal effect occurs one week later (as seen in, among others, table 3). Therefore, the fact that the margins coefficients of groups with a higher absolute change in news risk are closer to 0 in table 6 seems to indicate that these groups reverse sooner than groups with a lower absolute change in news risk.


is more stable for groups with a higher absolute change in news risk. This contradicts the result in figure 4, which depicts the results for the group with the highest absolute change in news risk (ext1). Possibly the variance in this figure is explained by the change in news risk variable itself and not by the interaction with the dummy as depicted in figure 8. Groups with absolute changes in news risk are relatively equally distributed in terms of increases and decreases of news risk, as can be seen in table 1. Therefore, we can say that the results in table 6 and figure 8 show a clear pattern in the interaction behaviour of groups with varying extreme changes in news risk. Groups of comparable decreases and increases in news risk seem to behave in a similar manner in the short run. This result is surprising, as we would expect an increase (decrease) in news risk to have a negative (positive) effect on the log weekly returns. Apparently, groups with a certain absolute change in news risk behave in a similar way.

However, given the known reversal effect in returns (e.g. De Bondt and Thaler, 1985; Jegadeesh, 1990; Lo and MacKinlay, 1990; Jegadeesh and Titman, 1993), it seems logical that groups with a higher absolute change in news risk show the reversal effect sooner than other groups. Chan (2003) notes that investors can under- or overreact to news. In this case it seems that investors overreact more to a high absolute change in news risk, and therefore the return seems to revert sooner.


Table 6: Interaction variables regressed on the log weekly returns and whether they differ significantly from each other

Interaction variable¹            Margins²     Std. Error   Šidák groups³
ext1 × change in news risk       -3.40***     (0.534)      E
ext2 × change in news risk       -4.42***     (0.856)      DE
ext3 × change in news risk       -9.68***     (1.070)      C
ext4 × change in news risk       -12.31***    (1.284)      BC
ext5 × change in news risk       -11.52***    (1.439)      BC
ext6 × change in news risk       -16.23***    (1.674)      ABC
ext7 × change in news risk       -18.77***    (1.907)      AB
ext8 × change in news risk       -18.80***    (2.214)      AB
ext9 × change in news risk       -23.58***    (2.697)      A
ext10 × change in news risk      -27.09***    (3.317)      A
ext11 × change in news risk      -30.52***    (4.940)      A
neutr × change in news risk      -3.40***     (0.534)      D

Variables ext1 × change in news risk and neutr × change in news risk are omitted due to collinearity and therefore only take on the effect of the change in news risk on the log weekly returns as seen in appendix H.

Standard errors in parentheses

*, ** and *** represent significance at 5%, 1% and 0.1% respectively

1 Variables ext_l × change in news risk and neutr × change in news risk are interaction variables of dummy ext_l or neutr, respectively, and the independent variable "change in news risk". ext_l refers to the dummy variable for the group with the lth highest absolute changes in news risk (taking on 1 when group is equal to l and 0 otherwise) with l=[1,11], and neutr refers to the dummy variable for no change in news risk.

2 Margins are determined following a panel quantile regression on quantile 0.5 as seen in appendix H.

3 Margins sharing a letter in the Šidák groups are not significantly different at the 5% level.
