
Clouds on the horizon: privacy failures and its effect on electronic word-of-mouth


Academic year: 2021



Kristian Reksen Svedal

26-06-2016

Master thesis

MSc Marketing Intelligence
Department of Marketing
Faculty of Economics and Business
University of Groningen

1st Supervisor: dr. Lara Lobschat

2nd Supervisor: dr. Hans Risselada

Kristian Reksen Svedal S3203565


Abstract

Customers leave digital footprints wherever they interact with technology. According to Gartner (2015), nearly 21 billion devices will be connected to the internet by 2020. Hence, even more personal information about customers will be generated in the years to come. This information is of interest to hackers and other criminals. At the same time, privacy laws are not keeping up and technical systems are subject to failure; privacy failures, that is, events that compromise customers’ information privacy, are therefore becoming more prevalent (Wedel and Kannan 2016). Such events may put customers at risk of identity theft and fraud. When a privacy failure occurs, companies can expect negative electronic word-of-mouth (eWOM) from customers. The resulting consequences for companies can include reduced short-term and long-term profitability as well as a damaged brand image and reputation (Goldenberg et al. 2007; Villanueva et al. 2008; Luo 2009; Balji et al. 2016). However, less is known about how different types of privacy failures trigger negative eWOM and how this varies with contextual factors such as the industry in which the failure occurred. This study investigates how privacy failures affect negative eWOM. More specifically, the aim was to investigate the effect of the type of privacy failure (here called event type) and of the industry in which it occurred on two dependent eWOM variables: volume and valence. This was done by analyzing social media data from the official Facebook (FB) pages of five companies in the retail, financial services, telecom and technology industries. Text mining was used to gain insights from the textual data, and regression analysis was used to test the hypotheses. The text mining revealed that all companies had significantly more negative eWOM in their privacy-related FB comments. The highest volume was found for the hacked retail firm Target, followed by the telecom firm AT&T, which suffered an insider breach. It was expected that a firm-related privacy failure such as an insider breach would lead to a higher volume of privacy-related eWOM and to more negative eWOM as measured by valence. However, only H1 was supported: an insider event type had a significantly positive effect on volume compared with an event type originating outside the firm (hacking). The other hypotheses were not supported. There is no statistical evidence that an insider event type leads to more negative eWOM (H2), nor that the industry in which the failure occurs strengthens or weakens the effect of event type on volume and valence (H3a, H3b, H4a, H4b).


Preface

This master thesis was written as a thesis project in the MSc Marketing, profile Marketing Intelligence, at the Faculty of Economics and Business, University of Groningen. The process has brought both moments of joy and challenges. First and foremost, working on the thesis project has been a great learning experience. I have learned what it means to do research in a more systematic manner. I have also learned more about customer privacy and privacy failures, as well as about how customers respond to events that compromise their personal information. Lastly, although it was quite challenging, it was especially interesting to learn text mining approaches for gaining insight from textual data.


Table of contents

1.0 Introduction

2.0 Literature review

2.1 Information privacy and privacy failures

2.2 eWOM

2.2.1 Volume and valence

2.2.2 Event type

2.2.3 Contextual factors of information privacy: industry

2.4 Conceptual model

3.0 Methodology

3.1 Research approach

3.1.1 Social listening and text mining

3.2 Measures

3.3 Data collection

3.3.1 Choice of companies and industries

3.3.2 Extracting data from the FB API

3.3.3 Choice of software and libraries

3.4 Ethics of the research

4.0 Results

4.1 Data description and data preparation

4.2 Descriptive analysis of the company FB data for one year: unfiltered

4.3 Text cleaning and preprocessing

4.4 Text mining

4.4.1 Target

4.4.2 JPMorgan Chase & Co.

4.4.3 Home Depot

4.4.4 Evernote

4.4.5 AT&T

4.4.6 T-tests: comparing company means of valence and valence filtered

4.5 Regression analysis

4.5.1 Outlier detection

4.5.2 Regression analysis: specification, estimation, validation

5.0 General discussion

6.1 Managerial Implications

6.2 Limitations: reliability, validity and generalizability

6.3 Further research

References

Appendices


1.0 Introduction

According to Gartner (2015), nearly 21 billion devices will be connected to the internet by 2020. The increasing number of connected devices means that more data is being generated and stored. Firms and organizations collect various types of customer data, while customers themselves leave digital footprints wherever they interact with technology. On the one hand, customer data can be used to improve customers’ lives through better products and services. On the other hand, customer data is also of interest to hackers and others, who can exploit it for their own benefit if they gain access to it. Hence, new technological developments bring new challenges. Information privacy is defined as “the ability of the individual to personally control information about one’s self” (Stone et al. 1983). There is a growing number of incidents in which customers’ information privacy is compromised. Firms now face a 26% likelihood of experiencing a data breach within a two-year period (Ponemon Institute 2016). When a data breach occurs, customers lose control over their personal information, which makes misuse of that information likely. The exchange of electronic information can put customers at risk by providing easy and often unwanted access to personal information (Stewart and Segars 2002). Privacy laws and security technology are lagging behind developments in data collection, storage and processing technologies, which has made breaches that compromise personal information more common (Wedel and Kannan 2016). According to the Privacy Rights Clearinghouse (PRC 2016), there have been more than 5,370 privacy failures in the U.S. alone since 2005, with more than 908 million customer records compromised. The sources of privacy failures (the events) include hacking, data loss, equipment failure, loss of a physical device and insiders misusing data. However, such breaches are described with different terms and classifications, which are explained further in the remainder of the introduction.

A threat to customer privacy is what has been described as privacy breaches (Choi et al. 2016), data breaches (Sen and Borle 2015) or security breaches (Campbell et al. 2003). These are all incidents or events that can be grouped under the broader term privacy failure (Martin and Murphy 2016). Privacy failure can be defined as a “broad term for any organizational lapse that can compromise consumer information, including but not limited to a data breach, hacking intrusion, or company loss of information” (Martin and Murphy 2016). When a privacy failure occurs, the customer is susceptible to harm (Martin et al. 2016). For instance, the data breaches at ChoicePoint and TJX harmed


privacy failure is used for events compromising customers’ information. Examples of large privacy failures are Target (40 million records breached), The Home Depot (56 million records breached) and Evernote (50 million records breached) (PRC 2017). Why should firms care about privacy failures? First of all, privacy failures are costly to companies. Since 2013, the total global cost of data breaches has increased by 29% (Ponemon Institute 2016). The institute found that the global average total cost of a privacy failure is $4 million and that the average cost per breached record is $158. The average cost per breached record varies by industry: it is highest in healthcare ($355), compared with $172 in retail and $129 in transportation. Furthermore, the Ponemon Institute (2016) found that most data breaches are caused by outside hackers or criminal insiders. Important questions are therefore how customers respond to privacy failures and how their responses differ across the various events.


more blame on the company for the dissatisfaction, he or she is more likely to put effort into responding to this dissatisfaction. It also depends on how severe each customer perceives the privacy failure to be. However, the effect of event type on volume and valence may also depend on the industry in which the privacy failure happened. Industry is a contextual factor that has been researched in information privacy (Smith et al. 2011). According to Goldenberg et al. (2007), the individual-level effect of negative WOM is contingent upon the industry and the specific case; hence, negative eWOM may play out differently across industries. The industry sector in which a breach occurred has been found to significantly affect the market valuation of the firm (Malhotra and Malhotra 2011). Some industries (e.g. financial services, healthcare and education) collect and store more sensitive information than others, which can make them more lucrative targets for criminals. Moreover, customers may have higher expectations of certain industries in terms of how their personal information is handled and protected. Therefore, the expected negative effect of a privacy failure (event type) on eWOM should depend on the industry in which it occurred. Thus, industry is assumed to play a role in the generation of volume and valence as a moderator of the effect of event type on volume and valence.

The following questions are to be addressed in the current study:

RQ1: How do events related to privacy failures trigger eWOM in terms of volume and valence?

Sub-questions:

RQ2: How are volume and valence affected by the event type?

RQ3: How are volume and valence affected by the industry?


programming environment R. By bringing together marketing and technological advances from computer science, this study offers an interesting perspective on privacy failures and eWOM. The structure of this thesis is as follows. First, the literature review with the hypotheses and the conceptual model is presented. Second, the methodology part explains the research design of the study as well as the methods used and the plan for data collection and data analysis. Third, the results part covers the data analysis, estimations, interpretations and discussion of results. Lastly, the thesis ends with the conclusion, including managerial implications, limitations (reliability, validity and generalizability) and further research.

2.0 Literature review

In this part of the thesis, the literature that serves as the foundation for the study is reviewed. The various parts of the literature review build up to each hypothesis, and the theoretical framework is visualized in the conceptual model at the end of this chapter. The review first introduces information privacy and privacy failures, as these are the overarching topics of the thesis. Next, the literature on eWOM is covered in order to investigate how privacy failures relate to eWOM and how they can trigger eWOM on social media. This is followed by literature on how the independent variables event type and industry relate to the dependent eWOM variables volume and valence.

2.1 Information privacy and privacy failures

Privacy has been researched in various domains, ranging from law and philosophy to ethics, sociology, economics, information systems and marketing (see Martin and Murphy 2016 for an extensive review). Many definitions of privacy exist; one general definition is given by Rust et al. (2002), who view “privacy as the degree to which personal information is not known by others”. A subset of the overall concept of privacy is information privacy (Bélanger and Crossler 2011), defined as “the ability of the individual to personally control information about one’s self” (Stone et al. 1983). Information privacy stands in contrast to physical privacy, where physical privacy is regarded as “the access to an


A significant threat to customers’ information privacy is what the literature has described as privacy breaches (Choi et al. 2016), data breaches (Sen and Borle 2015) or security breaches (Campbell et al. 2003). These are all events that can be grouped under the broader term privacy failure (Martin and Murphy 2016). Privacy failure can be described as a “broad term for any organizational lapse that can compromise consumer information, including but not limited to a data breach, hacking intrusion, or company loss of information” (Martin and Murphy 2016). Here, the term privacy failure is used for any intentional or unintentional event that compromises customers’ information privacy. Among the previously mentioned privacy issues identified by Smith et al. (1996), privacy failures (as defined here) relate more to unauthorized secondary use, errors, improper access and invasion than to data collection. Sen and Borle (2015) define a data breach as an incident in which someone gains unauthorized access to personal information, thereby compromising it. Data breach is thus a term closely related to privacy failure. The compromised personal information may be sensitive, protected and confidential (Sen and Borle 2015). It may include, but is not limited to, personal health information, financial information, personally identifiable information and intellectual property (Sen and Borle 2015). Moreover, according to Choi et al. (2016), there is a trend of increasing privacy failures, which endangers customers’ information privacy while also hurting firms’ profitability and reputation. Privacy failures can hurt firms not only directly, through lawsuits and sanctions, but also through negative eWOM following the event. The following sections cover the literature on eWOM and the dependent variables volume and valence, as well as the independent variables event type and industry.

2.2 eWOM

Hennig-Thurau and Walsh (2004) have defined eWOM as an “information exchange about products and services between customers online”. In that respect, eWOM serves as an important source of information that aids customers’ decision-making. People share their opinions with others about products, services, news, events and everyday situations. WOM and eWOM are often studied within the framework of consumer complaint behavior (CCB), where two common


caused the dissatisfaction. This means that most consumers do not engage in negative WOM (or eWOM); only highly dissatisfied consumers are likely to do so (Richins 1983). So, if consumers do engage in negative eWOM following a privacy failure, how do they respond, and what can the consequences be for firms?

Unlike traditional WOM, which occurs face-to-face, much of today’s communication is publicly displayed on social media, internet forums and other media. According to researchers, customers perceive eWOM communication as a reliable information source (Gruen et al. 2006; Villanueva et al. 2008), and it can have an impact on consumer behavior (Berger 2014). For firms, eWOM and WOM can have both positive and negative effects on various outcomes. In terms of long-term profitability, Villanueva et al. (2008) investigated the impact on customer equity of customer acquisition both through marketing efforts and organically through WOM. Their findings revealed that the long-term growth in customer equity was nearly twice as large for


engage in spreading negative WOM, compared with customers who are dissatisfied but believe that the failure will not recur and was not under the firm’s control (Blodgett et al. 1993). In this study, (negative) eWOM is measured as an outcome by two variables: volume and valence.

2.2.1 Volume and valence


comments on specific topics made by others, and these are used as a basis for their own decision-making (Huang and Chen 2006). Given that people react more strongly to negative information, as mentioned previously, and that they tend to observe how other people behave and talk online, a potentially large volume of negatively valenced eWOM following a privacy failure can have negative consequences for firms. However, the volume of eWOM is not necessarily affected to the same extent by each privacy failure (or event type). Different privacy failures (e.g. hacking, insider breach) may trigger different reactions and responses, and the volume is therefore expected to vary accordingly. Not all consumers react equally negatively to a failure, and not all who react negatively invest the effort to engage in negative eWOM on social media. According to Hirschman (1970), dissatisfied customers may engage in voice actions, rather than exit the relationship, when they consider it worthwhile to do so. Furthermore, Richins (1983), studying negative WOM by dissatisfied consumers, found that consumers’ responses to minor dissatisfaction are minimal, resulting in almost no complaining or spreading of negative WOM. Research says little about which kinds of privacy failures consumers respond to more strongly than others. Valence, on the other hand, captures the content of eWOM messages: how positive versus negative the messages are (Liu 2006). Valence is often described within the valence-expectancy framework, where valence is the positive versus negative feeling about an event occurring (Tsiros et al. 2004). Negative eWOM is a form of private action within the IPPR framework of Son and Kim (2008), in which customers may voice their opinions about experiences that threatened their information privacy. The negative eWOM spread by customers is a statement made in response to the violation that occurred. Previous research has also found that people tend to pay more attention to negative information than to positive information (Cheung and Thadani 2012). This is because negative


2.2.2 Event type


a privacy failure will induce feelings in consumers such as anger (Folkes 1988). According to Kalamas et al. (2008), angry customers are less likely to spread positive WOM and more likely to complain, to have negative repurchase intentions and to engage in third-party action. Hence, the emotions that a privacy failure evokes can lead customers to vent their frustrations publicly on social media. However, it is less clear which privacy failures (event types) customers react to more negatively. Controllability, that is, the extent to which the failure was under the firm’s control, is an important factor in attribution (Curren and Folkes 1987; Curren et al. 1992). If a privacy failure is due to internal factors (e.g. a criminal insider, data loss), it should generate a higher volume of negative eWOM than when the cause is external (i.e. outside the firm’s control). Hence, an insider breach should result in more negative eWOM for the firm than a privacy failure caused by outside hackers. This follows from the attribution of the failure: customers blame the company for the incident because it was caused by internal factors (i.e. the firm had control and therefore bears more responsibility). Negative eWOM is still expected for firms whose event type is classified as external, but customers are believed to perceive the violation of their privacy less negatively because the event was not under the firm’s control.

This leads to the following hypotheses:

H1: An event type which is firm-related leads to more eWOM than an externally related event

H2: An event type which is firm-related leads to more negative eWOM than an externally related event

2.2.3 Contextual factors of information privacy: industry


as a contextual variable (Smith et al. 2011). Also, according to Goldenberg et al. (2007), the individual-level effect of negative WOM is contingent upon the industry and the specific case. Bélanger and Crossler (2011) suggest that future research should measure the privacy concerns of businesses and groups while trying to differentiate the factors that explain these concerns and their consequences within the context of industry or society. One of the studies that has used industry as a contextual variable in privacy research is Malhotra and Malhotra (2011). They found that the industry in which the privacy failure occurred had a significant impact on the market valuation of the firm. Looking only at the financial and retail industries, they found that the market reacts especially negatively to breaches in the financial sector, and they argue that this must be due to the special responsibility this industry has to protect its customers’ personal information (Malhotra and Malhotra 2011). Moreover, Sen and Borle (2015) estimated the contextual risk of privacy failures, including (among other variables) the risk of failure within an industry. The security controls that firms implement are likely to differ between industries, and thus the data breach risk will vary between industries (Sen and Borle 2015). They employed a classification of seven industry types: financial and insurance, retail and merchant, education, government, medical and health care, nongovernmental organizations (NGOs) and others. For instance, banking and financial businesses are regarded as information-intensive industries, and they have a particular responsibility to ensure that confidential data is not leaked (Yeh and Chang 2007). Moreover, the retail sector experiences increasing costs due to privacy failures (Malhotra and Malhotra 2011), and privacy failures in this sector are becoming more frequent. How large the volume and how negative the valence following the event are may depend on the industry, because some industries collect and store not only more information about their customers but also more sensitive personal information than others. Such information can be more valuable and lucrative for others to access and exploit. Also, the focus on and investments in security may differ between industries. Financial services, healthcare and education are industries in which a privacy failure could compromise personal information such as personally identifiable information, social security numbers, transaction data, grades and patient records. Furthermore, trust is especially important in sectors such as finance, telecom, insurance, healthcare and education. Banks, for instance, have to be more concerned about their customers’


where the compromise is perhaps less critical (e.g. entertainment, media). A privacy failure occurring in certain industries (e.g. finance, telecom, healthcare) is expected to lead to a higher volume of eWOM and a more negative valence than one occurring in other industries (e.g. entertainment, media, technology), owing to the nature of these industries, the types of personal information they store and customers’ expectations of them.

This leads to the following hypotheses for industry:

H3a: The industry in which the privacy failure occurs strengthens the positive effect of event type on the volume

H3b: The industry in which the privacy failure occurs weakens the positive effect of event type on the volume

H4a: The industry in which the privacy failure occurs strengthens the negative effect of event type on the valence

H4b: The industry in which the privacy failure occurs weakens the negative effect of event type on the valence

2.4 Conceptual model


3.0 Methodology

This chapter describes the research approach, the operationalization of variables (measures), the plan for data collection, the methods used and potential ethical issues. The text mining part is emphasized and explained in particular detail.

3.1 Research approach

This study takes an observational research approach to studying eWOM in the context of information privacy and privacy failures. The benefit of this approach is that eWOM is studied in a natural, and hence realistic, setting. This is done through social listening: analyzing social media data by means of text mining. eWOM is first studied with text mining and then with regression analysis. More specifically, FB data will be analyzed by first extracting it through FB’s application programming interface (Graph API). FB is one of the most popular social media platforms, with nearly 2 billion users worldwide (Statista 2017), which means that potentially large volumes of data are publicly available for extraction. Furthermore, publicly available data (e.g. comments, likes, shares) can be extracted for free using the API through FB’s developer site. Descriptive statistics will be reported, as well as word frequency counts, word associations, word clouds and sentiment analysis. The analyses will be carried out in the open-source programming environment R. Lastly, to test the hypotheses, regression analysis will be used to model how volume and valence differ depending on event type and industry.

3.1.1 Social listening and text mining

Text mining is a method for discovering knowledge in texts and finding patterns in unstructured or semi-structured texts (Netzer et al. 2012). Text mining encompasses a large field of theoretical and methodological approaches that use text as input (Feinerer et al. 2008). It draws on disciplines such as computer science, statistics and linguistics (natural language


computing environment, organizing, structuring and storing the text in a repository, tidying the text including preprocessing (e.g. text formatting), creating a term-document matrix (i.e. transforming the preprocessed text into a structured format), and statistical analysis and visualization (Feinerer et al. 2008). The idea of text mining is to transform the text into a structured format based on term frequencies and then apply standard data mining techniques (Feinerer et al. 2008). Once the data has been extracted, cleaned and preprocessed, it is ready for analysis.
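The core of this pipeline, preprocessing followed by a term-document matrix built from term frequencies, can be illustrated with a minimal Python sketch. The thesis itself uses R's tm package; this is not tm's code, and the stop-word list and example comments are hypothetical.

```python
from __future__ import annotations

import re
from collections import Counter

# Hypothetical, very small stop-word list for illustration only;
# text mining packages such as tm ship much longer lists.
STOPWORDS = {"the", "a", "an", "and", "my", "is", "was", "to", "of", "i", "your", "you"}


def preprocess(text: str) -> list[str]:
    """Lower-case, replace punctuation and digits with spaces, drop stop words."""
    text = re.sub(r"[^a-z\s]", " ", text.lower())
    return [tok for tok in text.split() if tok not in STOPWORDS]


def term_document_matrix(docs: list[str]) -> tuple[list[str], list[list[int]]]:
    """Return (terms, matrix) with matrix[i][j] = count of terms[i] in docs[j]."""
    counts = [Counter(preprocess(d)) for d in docs]
    terms = sorted(set().union(*counts))
    matrix = [[c[t] for c in counts] for t in terms]
    return terms, matrix


terms, matrix = term_document_matrix(["My card was hacked!", "Hacked again... change your card"])
print(terms)   # ['again', 'card', 'change', 'hacked']
print(matrix)  # [[0, 1], [1, 1], [0, 1], [1, 1]]
```

Each row is a term and each column a document (here, an FB comment), which is the structured format that the later frequency counts and association measures operate on.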

3.2 Measures

The two dependent variables, volume and valence, are measured in both the text mining part and the regression part. The text mining part, however, also concentrates on word associations and on common privacy failure-related words for each company. Volume is a discrete, non-negative count variable. In this study, it is measured as the number of negative privacy failure-related comments that a company receives on each of its FB posts. This corresponds to the original variable comments_count from the FB data; the original API variables are described in Section 4.1 (Data description and data preparation). Valence is classified from the content of the comments using a sentiment algorithm that calculates a continuous sentiment score per comment. The independent variables, event type and industry, are coded as dummy variables. Event type is coded 0 = hacked and 1 = insider, so hacking serves as the baseline privacy failure. Industry is coded with m − 1 dummy variables (here m = 4 industries), with retail as the baseline industry. This gives three industry dummies: financial (0/1), telecom (0/1) and technology (0/1). For event type, this study uses the classification of privacy failures of Ayyagari (2012), which is the same classification as that of the nonprofit privacy rights organization PRC. It distinguishes the following privacy failure events: unintended disclosure, hacking or malware, payment card fraud, insider, physical loss, portable device and stationary device.
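To make the two dependent variables concrete, the following Python sketch computes a continuous valence score per comment and counts, per post, the comments that are both privacy failure-related and negative. The lexicon, the keyword pattern and the simple averaging scheme are hypothetical simplifications; sentimentr's actual algorithm is more elaborate (for example, it handles negators and amplifiers), and the scores always depend on the lexicon used.

```python
from __future__ import annotations

import re

# Hypothetical toy polarity lexicon (word -> score); a real analysis would
# use a full polarity dictionary, on which the resulting scores depend.
LEXICON = {
    "hacked": -1.0, "stolen": -1.0, "angry": -1.0, "fraud": -1.0,
    "thanks": 1.0, "great": 1.0, "safe": 0.5,
}

# Hypothetical keyword pattern for privacy failure-related comments,
# playing the role of grepl() with a set of specified words in R.
PRIVACY_PATTERN = re.compile(r"hack|breach|privacy|stolen|identity|fraud", re.IGNORECASE)


def sentence_score(sentence: str) -> float:
    """Mean polarity of the words in one sentence (0.0 if it has no words)."""
    words = re.findall(r"[a-z']+", sentence.lower())
    if not words:
        return 0.0
    return sum(LEXICON.get(w, 0.0) for w in words) / len(words)


def comment_valence(comment: str) -> float:
    """Continuous valence per comment: the mean of its sentence scores."""
    sentences = [s for s in re.split(r"[.!?]+", comment) if s.strip()]
    if not sentences:
        return 0.0
    return sum(sentence_score(s) for s in sentences) / len(sentences)


def post_volume(comments: list[str]) -> int:
    """Count of negative, privacy failure-related comments on one FB post."""
    return sum(
        1 for c in comments
        if PRIVACY_PATTERN.search(c) and comment_valence(c) < 0
    )


comments = ["My card was stolen.", "Great store! Thanks.",
            "Was my data hacked? Angry!", "Privacy is safe here."]
print(post_volume(comments))  # 2
```

The last comment mentions privacy but scores positively, so it is not counted; only the first and third comments contribute to the volume of this hypothetical post.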

3.3 Data collection

3.3.1 Choice of companies and industries


following companies and industries have been selected: Target, JPMorgan Chase & Co., AT&T, Evernote and The Home Depot. For more information on the breaches, see PRC (2017).

Table 1: Summary of companies and breaches

Firm | Event type | Industry | Date made public | Customer records breached
Target | Hack | Retail | Dec. 13, 2013 | 40 000 000
JPMorgan | Hack | Financial services | Dec. 5, 2013 | 465 000
Evernote | Hack | Technology | Mar. 3, 2013 | 50 000 000
Home Depot | Hack | Retail | Sep. 2, 2014 | 56 000 000
AT&T | Insider | Telecom | Apr. 8, 2015 | 280 000
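Applying the dummy coding described in Section 3.2 (event type 0 = hacked, 1 = insider; three industry dummies with retail as baseline) to the five firms in Table 1 gives the following design, sketched in Python. The dictionary and column names are illustrative, not taken from the thesis code.

```python
from __future__ import annotations

# Firms with (event type, industry) as listed in Table 1.
FIRMS = {
    "Target": ("Hack", "Retail"),
    "JPMorgan": ("Hack", "Financial services"),
    "Evernote": ("Hack", "Technology"),
    "Home Depot": ("Hack", "Retail"),
    "AT&T": ("Insider", "Telecom"),
}

# m - 1 = 3 industry dummies; Retail is the omitted baseline category.
INDUSTRY_DUMMIES = {
    "financial": "Financial services",
    "telecom": "Telecom",
    "technology": "Technology",
}


def code_firm(event_type: str, industry: str) -> dict[str, int]:
    """Return the dummy-coded predictors for one firm."""
    row = {"insider": int(event_type == "Insider")}  # 0 = hacked (baseline), 1 = insider
    for name, label in INDUSTRY_DUMMIES.items():
        row[name] = int(industry == label)
    return row


design = {firm: code_firm(*attrs) for firm, attrs in FIRMS.items()}
for firm, row in design.items():
    print(firm, row)
```

In this sample the insider and telecom columns coincide, since AT&T is the only firm in either category.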

3.3.2 Extracting data from the FB API

The FB Graph API gives free access to publicly available FB data through access tokens, which are either long-lived or short-lived. After an access token has been obtained, a library in a programming environment is used to connect to the API. Temporary (short-lived) access tokens last for a maximum of two hours, and within that time window the data from the selected firms’ FB pages can be extracted. The R library Rfacebook is suited to extracting data from the Graph API and is used here (see The Comprehensive R Archive Network, CRAN 2017, for information about the libraries). After the access token (essentially a long character string) is retrieved, it is stored as an object (variable) in R. Functions from the Rfacebook library are then used to request the relevant information and store it in R. The two main functions used here are getPage, in combination with getPost. Finally, the data is ready to be preprocessed, cleaned and structured for further analysis.
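The request-and-paging loop that a wrapper library such as Rfacebook handles internally can be sketched in Python as follows. This is an assumption about the general mechanism, not Rfacebook's code: the API version string (`v2.8`), the requested `fields` and the helper names are hypothetical, and the HTTP call is injected as a `fetch` function so the logic can be checked without network access. Graph API list responses return items under `data` and a link to the next page under `paging.next`.

```python
from __future__ import annotations

import json
from typing import Callable, Iterator
from urllib.parse import urlencode

GRAPH_BASE = "https://graph.facebook.com/v2.8"  # assumed API version


def page_posts_url(page_id: str, token: str, limit: int = 100) -> str:
    """Build a request URL for a page's posts, including comment totals."""
    params = {
        "fields": "message,created_time,comments.summary(true)",
        "limit": limit,
        "access_token": token,  # short- or long-lived access token
    }
    return f"{GRAPH_BASE}/{page_id}/posts?{urlencode(params)}"


def iter_items(first_url: str, fetch: Callable[[str], str]) -> Iterator[dict]:
    """Yield every item across result pages by following paging.next links."""
    url = first_url
    while url:
        payload = json.loads(fetch(url))
        yield from payload.get("data", [])
        url = payload.get("paging", {}).get("next")  # None on the last page
```

In practice `fetch` could be `lambda u: urllib.request.urlopen(u).read().decode()`, although Facebook has since restricted access to public page data, so live requests of this kind may no longer succeed.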

3.3.3 Choice of software and libraries

The open-source programming environment R is chosen for the analyses in this study. For text mining, R offers the following main features: preprocessing, associations, clustering, summarizing, categorizing and connecting to APIs (Feinerer et al. 2008). The main text mining libraries used in this thesis are the packages tm, qdap, sentimentr and syuzhet. Most of the analysis is based on the text mining package tm, while the valence is calculated with the


score for polarity in the sentiment; this is done for each sentence, using sentiment dictionaries to tag polarized words (CRAN). It is a bag-of-words method, which treats each text document as a mix of topics or sentences that are then broken down into a bag of words. Importantly, the score is contingent upon the polarity lexicon (dictionary) being used. The other sentiment analysis library used in R, syuzhet, is especially useful for classifying emotions, while it also assigns negative and positive scores. From this library, the get_nrc_sentiment function is used, which calls the NRC dictionary and classifies the presence of the emotions anger, anticipation, joy, surprise, sadness, fear, trust and disgust (see the syuzhet documentation on CRAN). For calculating the correlations between words (word associations), the findAssocs function from the tm library is used. Based on how often two terms co-occur across comments, it returns correlation values between 0 and 1 (at or above a chosen lower limit), where a higher value means a stronger association. Lastly, for the data manipulation steps used to filter out privacy failure-related comments, mainly the dplyr library is used: its filter function is combined with base R's grepl, which, given a set of specified words, returns the FB comments that contain any of them.
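To make the word association step concrete, here is a minimal Python sketch of the idea behind tm's findAssocs: for a focal term, compute the Pearson correlation between its row in the term-document matrix and every other term's row, and keep the terms at or above a lower limit (corlimit). It is an illustration, not the tm implementation. The comments are hypothetical and already cleaned, tokenization is a plain whitespace split, and the function names are made up for this example.

```python
import math
from collections import Counter


def term_counts(docs):
    """Map each term to its list of counts across documents (one tdm row per term)."""
    bags = [Counter(doc.split()) for doc in docs]
    terms = sorted(set().union(*bags))
    return {t: [bag[t] for bag in bags] for t in terms}


def pearson(x, y):
    """Pearson correlation of two equal-length count vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a term with constant counts has no defined correlation
    return sxy / math.sqrt(sxx * syy)


def find_assocs(docs, focal, corlimit):
    """Terms correlating with `focal` at >= corlimit, strongest first (ties alphabetical)."""
    counts = term_counts(docs)
    assocs = {}
    for term, row in counts.items():
        if term == focal:
            continue
        r = round(pearson(counts[focal], row), 2)
        if r >= corlimit:
            assocs[term] = r
    return dict(sorted(assocs.items(), key=lambda kv: (-kv[1], kv[0])))


# Hypothetical, already cleaned comments (one document each)
docs = [
    "data breach card stolen",
    "breach data again",
    "love the new store",
    "store open late",
]
print(find_assocs(docs, "breach", 0.5))
# {'data': 1.0, 'again': 0.58, 'card': 0.58, 'stolen': 0.58}
```

In the example, card, stolen and again get exactly the same value because each occurs once, in a single breach comment. Identical values such as the many 0.92 correlations reported for JPMorgan Chase & Co. can arise the same way when only a few comments contain the focal word.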

3.4 Ethics of the research

FB users do not know that their online behavior is observed and used in this study, and therefore they cannot explicitly give their consent. According to Zimmer (2010), what constitutes consent in social media research is not clear and should be explored further. One argument is that by signing up for FB, users already accept certain terms and agreements, well aware that some of the data they generate, such as comments, is publicly available. This study acknowledges that this research approach may raise ethical issues, even though the extracted and analyzed data are already considered public. Important steps are taken not to reveal the true identity of any user behind the FB comments: only aggregates and separate words are extracted from the comments, and no user IDs, names or full comments are published in this thesis.

4.0 Results

The results of this thesis are split into two parts to analyze the negative eWOM: the first part uses a text mining approach to understand the negative eWOM and link it to the research questions, while the second part mainly consists of regression analysis for hypothesis testing.

4.1 Data description and data preparation

The data is extracted mainly through two functions of the Rfacebook package, getPage and getPost. These functions return only publicly available data, which is nowadays quite restricted. For each company, the following variables are retrieved with the two functions:

Table 2: getPage variables

Variable        Variable description
from_id         FB ID of the company
from_name       Name of the company
message         FB post from the company
created_time    Timestamp of the FB post
type            Type of content in the FB post
link            Link included in the FB post, if any
id              ID combining the ID of the sender and the ID of the FB post
story           Indicates that a previous post is updated
likes_count     Discrete count of the likes a company post receives
comments_count  Discrete count of the comments a company post receives
shares_count    Discrete count of the shares a company post receives

Table 3: getPost variables

Variable        Variable description
from_id         FB ID of the user posting a comment
from_name       Name of the FB user
created_time    Timestamp of the FB comment
id              ID of the FB user's comment on a company post
likes_count     Discrete count of the likes a comment from the FB user receives
comments_count  Discrete count of the replies to the comment from the FB user

To prepare for analysis, the data for each company is stored in two data frames: one for the getPage variables (company posts) and one for the getPost variables (user comments). The data is then checked for inconsistencies, in particular for observations that should not be there. Logical functions in R are used to check that the company data frames contain only data from the company; for instance, the from_id variable should be the same for all observations in the company data. In the user comments data frames, to which text mining is applied at a later stage, no from_id should correspond to the company. Only a few company replies to FB users were found in the comments data frames (<10 per company); these were removed so that the texts come only from FB users.
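The two consistency checks can be sketched in Python, with lists of dictionaries standing in for the R data frames. Field names follow Tables 2 and 3; the page ID, post IDs and comment IDs are hypothetical, and the helper functions are illustrative rather than the code used in the thesis.

```python
def only_company_posts(posts, company_id):
    """Check that every row of the company data frame was posted by the company."""
    return all(post["from_id"] == company_id for post in posts)


def drop_company_replies(comments, company_id):
    """Remove the company's own replies so that only FB users' comments remain."""
    kept = [c for c in comments if c["from_id"] != company_id]
    return kept, len(comments) - len(kept)


COMPANY_ID = "company_page"  # hypothetical page ID
posts = [{"from_id": "company_page", "id": "company_page_1"},
         {"from_id": "company_page", "id": "company_page_2"}]
comments = [{"from_id": "user_a", "id": "c1"},
            {"from_id": "company_page", "id": "c2"},  # a reply from the company
            {"from_id": "user_b", "id": "c3"}]

print(only_company_posts(posts, COMPANY_ID))  # True
kept, removed = drop_company_replies(comments, COMPANY_ID)
print([c["id"] for c in kept], removed)       # ['c1', 'c3'] 1
```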

4.2 Descriptive analysis of the company FB data for one year: unfiltered

Before the text mining, a descriptive analysis is performed on the full one-year data for each company to gain insight into the level of customer engagement each company generates on its FB page over a year.

Target

The retailer Target has 23 645 000 likes on their official FB page.

shares were 255. However, there is a large spread in the number of likes (SD=10447), the number of comments (SD=270) and the number of shares (SD=1021).

JPMorgan Chase & Co

The financial services firm JPMorgan Chase & Co. has 265 000 likes on its FB page.

From its page, 296 posts made by the firm were extracted for the time period 2015/01/01 to 2016/09/02. Here too, one year of data was used, from around 2015/01/01 to 2016/01/01, which resulted in 161 observations (posts) made by the firm and 503 comments by FB users.

For this period, 22% of the comments from users were duplicates, so here too many users wrote several comments. The total number of likes was 15998 and the total number of shares of the firm's posts was 1132. The mean likes per post were 99, the mean comments per post 5.80 and the mean shares 7.03. JPMorgan Chase & Co. shows a large spread in likes, comments and shares (SD=229, SD=6.73 and SD=23.78 respectively). Its numbers for likes, comments and shares are very small compared to Target, but JPMorgan Chase & Co. is a financial firm, and such firms generally have lower customer engagement.

The Home Depot

The Home Depot has 2 968 000 likes on its official FB page. For the time period 2014/09/03 to 2016/10/01, 393 observations (posts made by The Home Depot) and 23 400 comments from FB users to these posts were extracted in total. Only one year of data was used, from 2014/09/03 to 2015/09/03, resulting in 212 posts from The Home Depot and 14535 comments from FB users to analyze. There were 2649 duplicates in the comments, meaning that 18% of the comments do not come from a unique ID; in other words, many people made several comments on the firm's FB page. The total number of likes in the period was 612139 and the number of shares was 92817. The mean likes per post were 2887, the mean comments 82 and the mean shares 434, again with a large spread (SD=7694, SD=130 and SD=1827 respectively). As expected, The Home Depot has far more engagement on its official FB page than JPMorgan Chase & Co.

AT&T

times. The total likes they received in the period were 453895 and the total shares were 57459. The mean likes were 1746, the mean number of comments was 152 and the mean shares were 221, with a large spread in these values (SD=6836, SD=352 and SD=1052 respectively).

Evernote

The technology firm Evernote has close to 583 000 likes on their FB page.

For the time period 2013/03/12 to 2015/03/30, 1047 observations (posts) were extracted, and the total number of comments to the firm's posts in the same period was 6890. For the one-year period used (2013/03/12 to 2014/03/12), there were 500 posts made by Evernote and 4350 comments to analyze. Around 29% of the comments were duplicates, indicating that many comments were made by the same users. The total number of likes in the time period was 5402 and the total number of shares was 5611. The mean likes per post made by Evernote were 108, the mean comments 9 and the mean shares 11. Like the other companies, Evernote had a large spread in these values (SD=141, SD=28 and SD=15 respectively).

It is clear that the retailers Target and The Home Depot, as well as the telecom firm AT&T, have the highest popularity and customer engagement on FB compared to the financial services firm JPMorgan Chase & Co. and the technology firm Evernote. See Tables 9 and 10 in Appendix A for a descriptive summary of volume and valence for the full year, unfiltered.

4.3 Text cleaning and preprocessing

After the data is extracted and prepared for all the companies, the next step in the text mining process is to clean and preprocess the text. This removes much of the noise, which should make the analysis more efficient and insightful. At this stage, the extracted data are simply text documents (each FB comment is one document), and only the users' comments are subject to the text mining. A central step in text mining is to create a corpus (plural: corpora), a body or collection of all the relevant text documents. This is done for each of the

text documents (e.g. each FB comment). The tdm is the basis for the quantitative analysis of text, as well as for creating word clouds, word frequencies and so forth. For example, if the word breach represents one row, the tdm records how many times that word occurs in each FB comment. Lastly, the tdm is converted to a regular matrix in R so that quantitative analysis can be performed on the text, which is then ready for analysis.
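A simplified Python sketch of the cleaning and term-document matrix steps described above. It is not tm's pipeline: it only lower-cases, strips punctuation and digits, and removes a small hypothetical stopword list (no stemming), and the example comments are invented. Rows are terms and columns are comments, so the row for breach holds its count in each comment.

```python
import re
from collections import Counter

# Hypothetical mini stopword list; tm's stopwords("english") is far longer
STOPWORDS = {"the", "a", "an", "and", "to", "of", "my", "was", "is"}


def clean(text):
    """Lower-case, replace punctuation and digits by spaces, drop stopwords."""
    text = re.sub(r"[^a-z\s]", " ", text.lower())
    return [word for word in text.split() if word not in STOPWORDS]


def term_document_matrix(docs):
    """Return (terms, matrix): one row per term, one column per document."""
    bags = [Counter(clean(doc)) for doc in docs]
    terms = sorted(set().union(*bags))
    matrix = [[bag[term] for bag in bags] for term in terms]
    return terms, matrix


comments = [
    "The breach led to card fraud!",
    "Another data breach... breach again?",
]
terms, matrix = term_document_matrix(comments)
for term, row in zip(terms, matrix):
    print(f"{term:8s} {row}")
```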

4.4 Text mining

The following analysis is based on the comments data from the FB API. However, over a one-year period, much noise can be expected for each company, with comments that are not relevant to the research questions of how volume and valence vary by event type and industry. Hence, the data is filtered to keep only comments containing certain keywords, so that the most privacy failure-related comments (i.e. the negative eWOM) are subject to the text mining. First, however, a descriptive analysis is performed on the data for the full period to explore which privacy failure-related words are frequent and common; these words are then used to create the filtered data. Word clouds and word frequency counts are used first for exploratory purposes and serve as the basis for the further analysis with word associations and sentiment analysis (valence).

Filtering

The data for one year contain mostly comments that are not related to privacy failures, so before further analysis, word clouds and word frequency counts are created for the five companies. In the word frequencies for the whole period, the 300 most frequent words at Target include boycott, breach, issue, shame and account. For JPMorgan Chase & Co., the frequent words in the unfiltered data included fraud and account. For The Home Depot there were words such as account, breach, boycott, experience, hacked, happen and problem. For Evernote, the words data, issues, hate, password and response were frequent. Lastly, AT&T's most frequent words included problems, issues, account and data. Some of the terms common to the companies over the one-year period were data, breach, fraud, compromised, compromise, account, security, hacked, hacking, affected and

in the analyses. As an example, if the unfiltered data for the one-year period is used, the following word cloud including the 300 most frequent words for Target is retrieved:

Figure 1: Word cloud total one year Target

Already from inspecting the word cloud for the retailer Target for the full period (unfiltered), several words could be related to the privacy failure (e.g. bad, monitoring, happened, safety, security). However, there are also many other words that are not particularly relevant (i.e. from less relevant FB comments). In the following analyses, only the filtered privacy failure-related comments are analyzed. For the one-year period, the filtered valence data contain a total volume of 2378 observations (privacy failure-related FB comments): 1591 for Target, 46 for JPMorgan Chase & Co., 350 for AT&T, 311 for The Home Depot and 80 for Evernote.
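The keyword filter can be sketched in Python in the spirit of dplyr's filter() combined with grepl(): a comment is kept if it contains any keyword. The keyword list is the one given in Section 4.5. Case-insensitive matching is an assumption of this sketch (grepl() is case-sensitive unless ignore.case = TRUE is set or the text was lower-cased first), and the sample comments are hypothetical.

```python
import re

# Keywords listed in Section 4.5
KEYWORDS = ["data", "breach", "fraud", "compromised", "compromise", "account",
            "security", "hacked", "hack", "hacking", "affected", "fraudulent"]

# One alternation pattern, comparable to grepl("data|breach|...", x);
# case-insensitive matching is an assumption made for this sketch
PATTERN = re.compile("|".join(re.escape(k) for k in KEYWORDS), re.IGNORECASE)


def filter_privacy_comments(comments):
    """Keep the comments that contain at least one keyword (substring match)."""
    return [c for c in comments if PATTERN.search(c)]


sample = [
    "My account was hacked after the breach",
    "Great deals this weekend!",
    "Is my DATA safe?",
    "Love the new store layout",
]
print(filter_privacy_comments(sample))
# ['My account was hacked after the breach', 'Is my DATA safe?']
```

Because an alternation pattern matches substrings, hack already covers hacked and hacking, and data would also match a word such as database; adding word boundaries (\b) around the keywords would make the filter stricter.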

4.4.1 Target

Word cloud and most frequent words

Figure 2: Word cloud filtered Target

Word associations

When the brand name target is used as the focal word in the word associations, some of the most highly correlated terms are complaint (0.58), consumer (0.58), abusive (0.58), commission (0.57), comply (0.57), damages (0.57), defraud (0.57), filed (0.57) and investigation (0.54); for word associations, these are relatively high correlations. With the privacy-related word breach as the focal word, some of the correlated terms are data (0.23), debit (0.16), casualty (0.15), depression (0.15), encrypted (0.15), lied (0.15), poorly (0.14), reaction (0.13) and victimized (0.10). Lastly, with hacked as the focal word, other correlated terms are retrieved: ruined (0.21), password (0.21), help (0.13), terrible (0.13), vital (0.13), backlash (0.12), heated (0.12), issues (0.12), shitty (0.12), skanks (0.12), software (0.12) and stupid (0.12).

Sentiment analysis - valence

The valence is computed by using the sentiment_by function in R from the library called sentimentr, which computes a continuous, average sentiment score for each FB comment.
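sentimentr's documentation describes a sentence's polarity as the sum of its polarized (and valence-shifted) words divided by the square root of the sentence's word count, with sentiment_by then aggregating sentence scores per text. The Python sketch below keeps only that core arithmetic under strong simplifying assumptions: a tiny hypothetical lexicon, no valence shifters (negators, amplifiers), a naive sentence split on . ! ?, and a plain mean across sentences, whereas sentiment_by's default averaging down-weights zero-score sentences. Its scores will therefore not match sentimentr's.

```python
import math
import re

# Hypothetical toy polarity lexicon (+1 / -1); sentimentr's default dictionary is much larger
LEXICON = {"love": 1.0, "great": 1.0, "safe": 1.0,
           "hate": -1.0, "terrible": -1.0, "stolen": -1.0, "fraud": -1.0}


def sentence_polarity(sentence):
    """Sum of word polarities divided by sqrt(word count); no valence shifters."""
    words = re.findall(r"[a-z']+", sentence.lower())
    if not words:
        return 0.0
    return sum(LEXICON.get(w, 0.0) for w in words) / math.sqrt(len(words))


def comment_valence(comment):
    """Plain mean of sentence polarities (sentiment_by averages differently by default)."""
    sentences = [s for s in re.split(r"[.!?]+", comment) if s.strip()]
    if not sentences:
        return 0.0
    return sum(sentence_polarity(s) for s in sentences) / len(sentences)


print(round(comment_valence("My card was stolen. Terrible service!"), 3))  # -0.604
```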

Figure 4: Barplot of classified emotions Target

It is evident that the retailer Target received considerable negative eWOM on its official FB page following the hacking (1591 comments). The FB users, many of whom are likely customers, responded quite strongly, using many negative words in the same comments in which the brand name Target was mentioned. Their overall mean valence was slightly negative, and the FB users did not only use the term hacked for the privacy failure they experienced but also related words such as fraud and breach. The event type for Target was classified as hacked; terms correlated with hacked were, for instance, password and software, while terms correlated with breach included data and encrypted. Hence, apart from the more negative words expressing emotions such as anger, the words mentioned seem to relate to the event type hacking. Many of the frequent words for Target also seem related to security, so the FB users complain about security issues as well as vent their frustrations. Lastly, the number of unique IDs is calculated: 1255 user IDs are unique, while 336 of the negative comments that Target received (21%) are duplicates. Hence, some FB users responded to the privacy failure with several negative comments.
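The Target figures are consistent with counting a comment as a duplicate when its user ID has already appeared earlier, which is how R's duplicated() marks elements: 1591 - 1255 = 336 duplicates, about 21%. A small Python sketch of that count, with hypothetical user IDs:

```python
def duplicate_stats(user_ids):
    """Return (number of unique IDs, number of duplicated comments, duplicate share).

    A comment counts as a duplicate when its user ID already appeared earlier,
    which is how R's duplicated() marks elements.
    """
    seen = set()
    duplicates = 0
    for uid in user_ids:
        if uid in seen:
            duplicates += 1
        else:
            seen.add(uid)
    share = duplicates / len(user_ids) if user_ids else 0.0
    return len(seen), duplicates, share


ids = ["u1", "u2", "u1", "u3", "u2", "u1"]  # hypothetical commenter IDs
print(duplicate_stats(ids))  # (3, 3, 0.5)

# Target, as reported above: 1255 unique IDs among 1591 comments
print(1591 - 1255, round((1591 - 1255) / 1591 * 100))  # 336 21
```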

4.4.2 JPMorgan Chase & Co.

Word cloud and most frequent terms

and the core products of JPMorgan Chase & Co. There are also many neutral words such as transfer, deposit, taxes, clients, bank and department.

Figure 5: Word cloud filtered JPMorgan Chase & Co.

Word associations

When the whole brand name is used as one term, no terms are correlated with it. With the term morgan as the focal word, these are some of the correlated terms related to the hacking at JPMorgan Chase & Co.: detected (0.58), inform (0.58), attack (0.42), complaints (0.42), failure (0.42), happen (0.42), issues (0.42), stealing (0.42), victims (0.27), fraud (0.15) and boycott (0.14). With fraud as the focal word, the correlated words retrieved include believable (0.92), branding (0.92), criminals (0.92), critical (0.92), detecting (0.92), event (0.92), fraudsters (0.92), handled (0.92), privacy (0.92), responded (0.92), loss (0.48), protect (0.32) and stealing (0.17). The identical values for many terms suggest that these words co-occur in the same few comments, which is likely given the small number of comments (46). Although privacy failure-related words were less frequent for JPMorgan Chase & Co. than for Target, some words co-occur strongly with the word fraud, and negative words correlate quite strongly with the brand name.

Sentiment analysis - valence

Figure 6: Barplot of emotions JPMorgan Chase & Co.

Overall, the financial services firm JPMorgan Chase & Co. also received negative eWOM following its hacking. However, the number of observations was small compared to Target, with fewer terms to analyze: there were only 46 privacy failure-related comments for JPMorgan Chase & Co., against 1591 for Target. It must be noted that JPMorgan Chase & Co. has much less customer engagement in general on its official FB page than the retailer Target, with fewer page likes, comments, shares and post likes overall. Lower customer engagement through comments and other activity was expected given the nature of its industry. The overall mean valence of the filtered comments is slightly positive but has quite a large spread (SD=0.214), meaning that the eWOM received varies considerably. In the responses from customers and others, privacy-related words were used together with the brand name (e.g. attack, failure). These words referred to the handling of the incident, the protection of customer data, the data loss and the detection of the incident. An interesting observation is that, with fraud as the focal word in the word association analysis, the word branding was highly correlated (0.92). Given that the hacking happened at this renowned financial services firm, more negative eWOM and stronger responses, like those seen for Target, were expected. However, anger was classified relatively often compared to the other negative emotions, so even though the firm has little customer engagement and volume in general, the eWOM in the few privacy failure-related comments analyzed was quite negative. Lastly, 15% of the user IDs in the negative comments are duplicates, meaning that some users made several negative comments.

4.4.3 Home Depot

Word cloud and most frequent terms

customer and security. As with Target, The Home Depot receives many comments in which the privacy-related words breach and security are mentioned, indicating that these words are again top-of-mind. Other words among the 30 most frequent are account, store, called, information and cards. The word cloud of the 200 most frequent words below contains terms such as trusted, fraudulent, compromised, never, issues, stolen, target, accountability and respond.

Figure 7: Word cloud filtered The Home Depot

Word associations

As with JPMorgan Chase & Co., the brand name The Home Depot is split into separate words during the text mining process. With depot as the focal word, some of the correlated words are home (0.97), dumped (0.57), care (0.57), rude (0.56), fiasco (0.55), mess (0.55), personally (0.55), concerning (0.54), accountability (0.5), issue (0.49), seriously (0.48), embarrassing (0.33), sucks (0.31), lawsuit (0.28) and criminal (0.26). The word hacked was quite prominent in the word cloud and had several correlated terms: attacking (0.27), blizzard (0.27), displeased (0.27), hurts (0.27), ignorant (0.27), pissed (0.22) and hate (0.21). Customers and other FB users used quite strong negative words following the hacking at The Home Depot.

Sentiment analysis - valence

Figure 8: Barplot emotions The Home Depot

The retailer The Home Depot had a negative overall mean valence. Its event type was hacking, i.e. an outsider event. Besides the word hacked, words such as breach and fraudulent were used in the eWOM for The Home Depot, and even stronger negative words were associated with its brand name (e.g. fiasco, sucks). Once again people talk about security, information, service, cards, account, data and system, for instance. Much the same response was observed for The Home Depot as for the other retailer, Target. This firm also has considerably more customer engagement on its FB page than JPMorgan Chase & Co. In the negative eWOM that The Home Depot received, 22% of the user IDs were duplicates; here too, a portion of the comments were made by apparently more active users.

4.4.4 Evernote

Word cloud and most frequent words

Figure 8: Word cloud filtered Evernote

The brand name evernote has several correlated terms: glitch (0.42), intention (0.42), justifying (0.42), matter (0.42), stop (0.42), stupid (0.42), wrong (0.42), collection (0.29), account (0.29), filing (0.24), private (0.21), crap (0.17), encryption (0.17), hacked (0.17), joke (0.17), password (0.17), refund (0.17) and frustrating (0.15). Another candidate focal word is security, which is quite prominent in the word cloud. Some selected words correlated with security are branded (0.62), breach (0.62), crap (0.62), device (0.62), joke (0.62), proof (0.58), never (0.33), suck (0.19) and privacy (0.19). Some FB users apparently have strong negative opinions about security at Evernote, although the volume of privacy failure-related comments is only 80 in total.

Sentiment analysis - valence

Figure 9: Barplot of emotions Evernote

Evernote has the highest overall mean valence so far, more positive than that of Target, JPMorgan Chase & Co. and The Home Depot. However, its valence also has quite a high spread (SD=0.22), so it receives many positive comments but also some negative ones. Its event type was classified as hacking, as for the previous companies, and the privacy-related comments contained words such as intention, wrong, collection, joke and frustrating in connection with the brand name. Other words relate more to security and systems-related topics (glitch, encryption, password and device). Evernote is a relatively young, small and entrepreneurial tech company with more customer engagement on FB than, for instance, JPMorgan Chase & Co., although much less than Target, The Home Depot and AT&T. It may be that its brand, its entrepreneurial associations and/or its eWOM management buffered against more negative eWOM, given that it had the highest observed mean valence and relatively few negative emotions classified from the comments (and, for instance, higher joy). Lastly, only 10% of Evernote's user IDs were duplicates, so compared to the other companies most comments come from unique users.

4.4.5 AT&T

Word cloud and most frequent words

For the telecom firm AT&T the top 10 most frequent terms are: att, account, service, phone, bill, get, customer, will, call and told. In the insider breach, customer data was compromised such as

Figure 10: Word cloud filtered AT&T

Word associations

When the brand name att (AT&T after lower-casing and punctuation removal in the text cleaning) is used as the focal word, these are some highly correlated words: service (0.50), court (0.49), never (0.48), customer (0.42), happen (0.42), problem (0.42), audit (0.42), blaming (0.42), care (0.40), matter (0.40), collection (0.38), cheated (0.35), guilty (0.35), issue (0.35), knowingly (0.35), awful (0.33), mad (0.30), concern (0.29), negative (0.28), account (0.26), wrong (0.25), cancelling (0.24), shame (0.24), fraudulent (0.24) and horrible (0.23). Many other negative words were used by the FB users as well, although with lower correlations. Furthermore, the word account was very prominent in the word cloud and has, for instance, the following correlated words used in the same comments: reimbursing (0.51), secure (0.51), thief (0.51), apologize (0.47), personal (0.45), system (0.39), protect (0.38), claims (0.34), records (0.34), privacy (0.33), proof (0.32), access (0.30), investigation (0.29), law (0.27) and unauthorized (0.21).

Sentiment analysis - valence

Figure 11: Barplot of emotions AT&T

AT&T operates in the telecom industry, and its privacy failure (event type) was classified as an insider breach. Its overall mean valence was slightly positive, although anger was classified relatively often among the emotions. In the eWOM on its FB page, customers and others used words related to the privacy literature (e.g. collection, concern, fraudulent, secure, protect, privacy and unauthorized). There were also comments blaming the company, invoking the law and referring to claims. Lastly, 31% of the negative comments that AT&T received come from duplicated user IDs, which could be a proxy for a larger fraction of more active customers responding to its privacy failure compared to the other companies.

4.4.6 T-tests: comparing company means of valence and valence filtered

Lastly, t-tests are used to test, for each company, whether the mean valence over the full one-year period differs significantly from the mean valence of the subset of comments extracted with the selected privacy failure-related keywords.

Table 4: t-tests comparing valence total period and valence filtered

The t-tests show that, for all companies, the mean valence of the privacy failure-related comments differs significantly from the mean valence over the full one-year period. Thus, as expected, the eWOM is significantly more negative in the most privacy failure-related comments than in the total eWOM received throughout a year.
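R's t.test() performs Welch's unequal-variance t-test by default (var.equal = FALSE). A Python sketch of the statistic and its Welch-Satterthwaite degrees of freedom, using only the standard library; the valence scores are hypothetical, and the p-value, which needs the t distribution's CDF (available, for example, in SciPy), is left out.

```python
import math
from statistics import mean, variance


def welch_t(x, y):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    se2_x = variance(x) / len(x)  # squared standard error of mean(x)
    se2_y = variance(y) / len(y)
    t = (mean(x) - mean(y)) / math.sqrt(se2_x + se2_y)
    df = (se2_x + se2_y) ** 2 / (se2_x ** 2 / (len(x) - 1) + se2_y ** 2 / (len(y) - 1))
    return t, df


# Hypothetical valence scores: full-year comments vs. privacy failure-related comments
overall = [0.10, 0.25, 0.05, 0.30, 0.15, 0.20]
filtered = [-0.10, 0.00, -0.20, 0.05]
t, df = welch_t(overall, filtered)
print(round(t, 2), round(df, 1))
```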

Comparisons and conclusion of the text mining: event type and industry

Now that some well-known text mining methods have been applied to all of the companies, their volume and valence summary statistics are compared in the following tables:

Table 5: Valence comparison filtered comments

Firm      Min.    Median  Max.   Mean    SD     Event type  Industry
Target    -0.77   0.00    1.603  -0.005  0.178  Hacked      Retail
JPMorg.   -0.657  0.012   0.56   0.013   0.214  Hacked      Financial
HomeD.    -0.53   0.00    0.481  -0.008  0.155  Hacked      Retail
Evernote  -0.612  0.07    0.907  0.095   0.22   Hacked      Tech
AT&T      -0.385  0.00    0.559  0.012   0.134  Insider     Telecom

Table 6: Volume comparison filtered comments

Firm      Min.  Median  Max.    Mean  SD     Event type  Industry
Target    1.00  2.00    303.00  9.76  31.59  Hacked      Retail
JPMorg.   1.00  1.00    5.00    1.35  0.81   Hacked      Financial
HomeD.    1.00  1.00    78.00   3.61  10.35  Hacked      Retail
Evernote  1.00  1.00    6.00    1.56  1.31   Hacked      Tech
AT&T      1.00  3.00    24.00   4.32  4.01   Insider     Telecom
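The mean volume per post in Table 6 equals each firm's number of privacy failure-related comments divided by its number of company posts with such comments (counts reported in Sections 4.4 and 4.5). The Python arithmetic check below reproduces the reported values when they are cut, rather than rounded, to two decimals (311/86 = 3.616 and 80/51 = 1.569 appear as 3.61 and 1.56), which suggests, but does not prove, that the table truncates.

```python
import math

# (privacy failure-related comments, company posts with such comments), from Sections 4.4 and 4.5
COUNTS = {
    "Target": (1591, 163),
    "JPMorg.": (46, 34),
    "HomeD.": (311, 86),
    "Evernote": (80, 51),
    "AT&T": (350, 81),
}


def truncate2(x):
    """Cut (not round) a positive number to two decimals."""
    return math.floor(x * 100) / 100


for firm, (comments, posts) in COUNTS.items():
    mean_volume = comments / posts
    print(f"{firm:9s} {mean_volume:.4f} -> {truncate2(mean_volume):.2f}")

total_comments = sum(c for c, _ in COUNTS.values())
total_posts = sum(p for _, p in COUNTS.values())
print(total_comments, total_posts, round(total_comments / total_posts, 2))  # 2378 415 5.73
```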

The results from the 2378 filtered FB comments, selected with common privacy-related keywords from the preliminary analysis, showed some commonalities and differences between the five companies. The volume of privacy failure-related eWOM differed considerably across the industries retail, financial services, telecom and technology. There were 1591 comments to 163 Target FB posts, 350

volume (2nd highest) compared to those who were hacked, and with the 2nd highest mean volume per company post, although the hacked retailer Target had a much higher volume and mean volume than AT&T. The text analysis also showed that, as measured by the mean, the valence was not more negative for the firm that suffered an insider breach than for the hacked firms, nor was it more negative for the industries where this was expected (financial and telecom).

4.5 Regression analysis

As an introduction to the last part of the results, the data preparation for the following analysis is described in more detail. First, two data frames are created in R, one for the volume and one for the valence. The aim is to model only the volume and valence of the privacy failure-related comments. The valence data frame was already prepared in the text mining part; it includes only the FB comments per company that contain the following keywords related to the privacy failure: data, breach, fraud, compromised, compromise, account, security, hacked, hack, hacking, affected and fraudulent. For the one-year period, this filtered valence data contains 2378 observations (privacy failure-related FB comments) in total: 1591 for Target, 46 for JPMorgan Chase & Co., 350 for AT&T, 311 for The Home Depot and 80 for Evernote. These are the comments analyzed in the text mining part. For volume, however, it is necessary to extract only the company FB posts in which the filtered, privacy failure-related comments

data. This ensures that the volume for each company FB post counts only the relevant privacy failure-related comments. For instance, the original volume of one of Target's FB posts in the time period was 86, whereas its value in the new filtered volume variable is 8; these 8 comments are privacy failure-related. After this procedure has been applied to each company, all the filtered volume data are merged into a final data frame. As a check: of the 2378 privacy failure-related FB comments used for the text mining, 1591 are for Target, and in the new filtered volume data the filtered volume over Target's posts now sums to 1591. For JPMorgan Chase & Co. it now sums to 46, for AT&T to 350, for The Home Depot to 311 and for Evernote to 80. This procedure thus yields the correct privacy failure-related volume and links it to the correct company FB posts by matching the id variables. Altogether, the filtered volume data contain 415 observations (company FB posts): 163 for Target, 34 for JPMorgan Chase & Co., 81 for AT&T, 86 for The Home Depot and 51 for Evernote. These 415 company FB posts correspond to the 2378 privacy failure-related comments (volume).
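Computing the filtered volume amounts to grouping the filtered comments by the company post they belong to and counting them. A minimal Python sketch, assuming each filtered comment carries the ID of its parent post; the post IDs are hypothetical, and the exact id matching done in R is not shown in the thesis. Posts without any filtered comment never appear, which is why the filtered volume is always at least 1.

```python
from collections import Counter


def filtered_volume(parent_post_ids):
    """Volume per company post = number of filtered comments attached to it.

    Posts without any filtered comment never appear, so every volume is >= 1.
    """
    return Counter(parent_post_ids)


# Hypothetical parent-post IDs of six filtered comments
parent_ids = ["post_1", "post_1", "post_2", "post_1", "post_3", "post_3"]
volume = filtered_volume(parent_ids)
print(dict(volume))          # {'post_1': 3, 'post_2': 1, 'post_3': 2}
print(sum(volume.values()))  # 6, the number of filtered comments
```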

4.5.1 Outlier detection

Before further analysis, outliers in volume and valence are inspected with boxplots, and values below Q1 - 1.5*IQR or above Q3 + 1.5*IQR are removed. For valence, this removed 120 observations, 5% of all observations in the filtered comments data (n=2378). The same procedure was also applied to the full one-year data, so that the mean valence of the full period and of the filtered comments can be compared correctly; this removed 10 326 of 96 888 observations. For the filtered volume data, 38 outliers out of 415 observations (9%) are detected and removed (see Appendix B for plots). The mean volume is 5.73 before and 2.23 after their removal. Only a few Target posts were extreme (one post, for instance, had 303 comments), while the median across all firms before outlier removal is 2. Hence, most observations for all firms are small counts, and a few posts, especially Target's, with much larger counts strongly affect the mean.
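The 1.5*IQR rule can be sketched in Python. statistics.quantiles with method="inclusive" gives the same quartiles as R's default quantile() (type 7); R's boxplot() instead uses Tukey's hinges from fivenum(), which can differ slightly for even sample sizes. The counts are hypothetical but mimic the pattern described above: mostly small counts and one extreme post.

```python
from statistics import quantiles


def remove_iqr_outliers(values, k=1.5):
    """Drop values below Q1 - k*IQR or above Q3 + k*IQR; return (kept, n_removed)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")  # R's quantile() type 7
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    kept = [v for v in values if lower <= v <= upper]
    return kept, len(values) - len(kept)


volumes = [1, 1, 1, 2, 2, 2, 3, 3, 4, 303]  # hypothetical comment counts per post
kept, removed = remove_iqr_outliers(volumes)
mean_before = sum(volumes) / len(volumes)
mean_after = sum(kept) / len(kept)
print(removed, round(mean_before, 2), round(mean_after, 2))  # 1 32.2 2.11
```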

4.5.2 Regression analysis: specification, estimation, validation

Model specification

model is appropriate. The Poisson model or, when the variance exceeds the mean (overdispersion), the negative binomial model are potential candidates for modeling the volume variable. However, these models do not account for the fact that no zero values are observed in the data (and none can be observed, since only company posts with at least one privacy failure-related comment are included). Hence, zero-truncated Poisson and zero-truncated negative binomial models are more appropriate; both are used to model volume, and their performance is compared. For modeling valence, ordinary least squares regression (OLS) is employed. All models are fully pooled: the data of all firms are stacked in order to test for differences in volume and valence by event_type and industry.
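For a zero-truncated Poisson count, the probability of y ≥ 1 is the ordinary Poisson probability divided by P(Y > 0) = 1 - exp(-λ), and the mean becomes λ / (1 - exp(-λ)). Setting the derivative of the log-likelihood to zero shows that, in an intercept-only model, the maximum-likelihood λ solves λ / (1 - exp(-λ)) = ȳ. The Python sketch below illustrates this with hypothetical counts and a bisection solver; it covers only the Poisson case without covariates, not the zero-truncated negative binomial or the regressions on event type and industry.

```python
import math


def ztp_pmf(y, lam):
    """P(Y = y | Y > 0) for a Poisson(lam) count truncated at zero."""
    if y < 1:
        return 0.0
    log_poisson = y * math.log(lam) - lam - math.lgamma(y + 1)
    return math.exp(log_poisson) / -math.expm1(-lam)  # -expm1(-lam) = 1 - exp(-lam)


def ztp_mean(lam):
    """Mean of the zero-truncated Poisson: lam / (1 - exp(-lam))."""
    return lam / -math.expm1(-lam)


def fit_ztp_lambda(counts, tol=1e-10):
    """Intercept-only MLE: solve ztp_mean(lam) = mean(counts) by bisection."""
    ybar = sum(counts) / len(counts)
    if ybar <= 1.0:
        raise ValueError("all counts equal 1; the MLE of lam tends to 0")
    lo, hi = 1e-12, ybar  # ztp_mean(lam) > lam, so the root lies below ybar
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if ztp_mean(mid) < ybar:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


counts = [1, 1, 2, 3, 1, 2, 5]  # hypothetical filtered volumes (all >= 1)
lam = fit_ztp_lambda(counts)
print(round(lam, 3), round(ztp_mean(lam), 3))  # the fitted mean equals the sample mean
```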

Valence

To test the hypothesis for valence, four models are specified and estimated: a full model including the interaction between the IVs event type and industry, a full model with main effects only, a model with event type as the only IV and a model with industry as the only IV. For the event type variable, hacking serves as the reference category (=0), while for the industry variable, retail serves as the reference category (=0). The valence models are specified as follows:

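Full model (with interaction):

$$\text{Valence} = \beta_0 + \beta_1\,\text{insider} + \beta_2\,\text{financial} + \beta_3\,\text{telecom} + \beta_4\,\text{technology} + \beta_5\,(\text{insider} \times \text{financial}) + \beta_6\,(\text{insider} \times \text{telecom}) + \beta_7\,(\text{insider} \times \text{technology})$$

Full model (main effects only):

$$\text{Valence} = \beta_0 + \beta_1\,\text{insider} + \beta_2\,\text{financial} + \beta_3\,\text{telecom} + \beta_4\,\text{technology}$$

Event type only model:

$$\text{Valence} = \beta_0 + \beta_1\,\text{insider}$$

Industry only model:

$$\text{Valence} = \beta_0 + \beta_1\,\text{financial} + \beta_2\,\text{telecom} + \beta_3\,\text{technology}$$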
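The dummy (treatment) coding behind these specifications can be sketched as follows, with hacking and retail as reference levels. The firm-to-category mapping is taken from the text (AT&T is the only insider breach and the only telecom firm; the other firms are hacking cases), with The Home Depot assumed to be the second retail firm; the function and variable names are hypothetical. Because every comment inherits its firm's labels, firm-level rows are enough to reveal how the columns depend on each other.

```python
INDUSTRY_LEVELS = ["retail", "financial", "telecom", "technology"]  # retail = reference

def design_row(event, industry, interaction=True):
    """Treatment coding: an intercept plus one 0/1 column per non-reference level."""
    row = {"const": 1, "insider": int(event == "insider")}  # hacking = reference
    for level in INDUSTRY_LEVELS[1:]:
        row[level] = int(industry == level)
    if interaction:
        for level in INDUSTRY_LEVELS[1:]:
            row[f"insider:{level}"] = row["insider"] * row[level]
    return row

firms = [
    ("Target", "hacking", "retail"),
    ("The Home Depot", "hacking", "retail"),
    ("JPMorgan Chase & Co.", "hacking", "financial"),
    ("AT&T", "insider", "telecom"),
    ("Evernote", "hacking", "technology"),
]
rows = [design_row(event, industry) for _, event, industry in firms]
columns = {name: [row[name] for row in rows] for name in rows[0]}

print(columns["insider"] == columns["telecom"])          # True: identical dummies
print(columns["insider:telecom"] == columns["insider"])  # True: duplicates insider
print(set(columns["insider:financial"] + columns["insider:technology"]))  # {0}
```

Identical or all-zero columns make the design matrix rank-deficient, so OLS cannot estimate separate coefficients for them; R reports such coefficients as NA ("not defined because of singularities").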

Exploratory analysis


Figure 13: Histogram of valence

Estimation

Four models were estimated for the continuous DV valence with OLS, following a backward selection approach: first the full model with interaction, then the full model with main effects only, followed by the event type-only model and, lastly, the industry-only model.

Table 8: Valence OLS regression summary

| Variable | Full model (interaction) | Full model (main effects) | Event type only | Industry only |
|---|---|---|---|---|
| Constant | -0.005 (0.003) | -0.005 (0.003) | -0.002 (0.003) | -0.005 (0.003) |
| Event_type insider | 0.0167* (0.007) | 0.0167 (0.007) | 0.013 . (0.007) | |
| Industry financial | 0.035 . (0.020) | 0.035 (0.020) | | 0.035 . (0.020) |
| Industry telecom | NA | NA | | 0.016* (0.007) |
| Industry technology | 0.065*** (0.016) | 0.065 (0.016) | | 0.065*** (0.016) |
| Event_type × Industry financial | NA | | | |
| Event_type × Industry telecom | NA | | | |
| Event_type × Industry technology | NA | | | |
| F | 7.406*** | 7.406*** | 3.115 | 7.406*** |
| R² | 0.009 | 0.009 | 0.001 | 0.009 |

Cells show B (SE B). . p<.10, * p<.05, ** p<.01, *** p<.001
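The F row of Table 8 and the ANOVA model comparisons used for model selection below rest on the same statistic for nested OLS models. The sketch below states it in its residual-sum-of-squares form and in the R² form used for the overall F (a model with k predictors against the intercept-only model). The function names and numbers are hypothetical, not the thesis data; p-values would come from the F distribution with the corresponding degrees of freedom (for example R's `pf()` or `scipy.stats.f.sf`), which the standard library does not provide.

```python
def nested_f(rss_restricted, rss_full, df_restricted, df_full):
    """F statistic for comparing two nested OLS models.

    rss_*: residual sums of squares; df_*: residual degrees of freedom
    (n minus the number of estimated coefficients, intercept included).
    """
    numerator = (rss_restricted - rss_full) / (df_restricted - df_full)
    denominator = rss_full / df_full
    return numerator / denominator


def overall_f(r2, n, k):
    """Overall F of an OLS model with k predictors vs the intercept-only model.

    Equals nested_f with RSS_restricted = TSS and RSS_full = (1 - R^2) * TSS.
    """
    return (r2 / k) / ((1 - r2) / (n - k - 1))


# Hypothetical numbers (not the thesis data): n = 100, TSS = 100, R^2 = 0.10, k = 3.
n, k, tss, r2 = 100, 3, 100.0, 0.10
f_rss = nested_f(tss, (1 - r2) * tss, n - 1, n - k - 1)
f_r2 = overall_f(r2, n, k)
print(round(f_rss, 4), round(f_r2, 4))  # the two routes agree
```

With a large n, even a very small R² can produce a significant overall F, which is why the models in Table 8 are significant while explaining about 1% of the variance in valence.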

Model fit and interpretation

All models were overall significant (p<0.05) except the event type-only model (p=0.07). The model fit shows that the coefficient of determination (R²) does not improve when predictors are added: R² is very small (~0.01) both for the industry-only model and for the full model including event type and industry as predictors. The implication is that in any of the models the predictors event type and


(insider) and industry (telecom). There is not enough variation in the data: only one firm had a privacy failure classified as an insider breach, so the event type insider dummy is linearly dependent on the telecom industry dummy (both identify AT&T). Consequently, the coefficient of insider in the full model equals the coefficient of telecom in the industry-only model (about 0.016), because the two dummies carry the same information. When the full model is estimated in R, event type insider is estimated and significant, while telecom is not defined due to singularities. The only remedy for this perfect multicollinearity is to remove one of the two IVs. According to H2, a firm-related privacy failure (here, insider) should lead to more negative valence than an outsider-related privacy failure (hacking). H2 is not supported: event type insider has a significant effect on valence only in the full model, and that effect is positive. Before the parameters are interpreted further, the best model is chosen through model selection, using analysis of variance (ANOVA) F-tests of whether the models differ significantly in fit. The industry-only model differs significantly from the event type-only model (p<0.05), which is nested within it because insider coincides with telecom, and from the null model (intercept only) (p<0.05). Hence, the industry-only model is chosen for further interpretation and validation.

Relative to the reference category retail, all other industries have a positive effect on valence in the industry-only model; strictly speaking, however, only telecom and technology are significant at the 5% level, while financial is significant only at the 10% level. Compared to retail, the financial industry has a positive effect of 0.035 on valence. Note that quite few observations remained for the financial services firm JPMorgan Chase & Co. after the comments were filtered and the outliers removed. Telecom has a significant positive effect (p<0.05) of 0.016 on valence compared to retail. Lastly, technology has the largest significant positive effect (p<0.001), 0.065 compared to retail. The effect of telecom (AT&T, which had an insider breach) is thus positive rather than negative, while the largest positive effect on valence is for the technology firm Evernote, which also received less negative eWOM and had the highest overall mean valence, which was positive. As shown in the text mining and t-tests, the retailer Target received the highest volume, and both Target and The Home Depot had a negative mean valence.
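Because the industry-only model contains only dummy variables, its OLS fitted value for each industry equals that industry's sample mean valence. Adding each coefficient to the constant therefore recovers the implied mean valence per industry, for example technology: −0.005 + 0.065 = 0.060. The sketch below does this arithmetic with the rounded coefficients from Table 8; the dictionary and function names are illustrative only.

```python
# Industry-only model coefficients as reported in Table 8 (rounded);
# retail is the reference category, so its mean is the constant itself.
COEFS = {"const": -0.005, "financial": 0.035, "telecom": 0.016, "technology": 0.065}

def implied_mean(industry):
    """Fitted mean valence for one industry under treatment (dummy) coding."""
    if industry == "retail":
        return COEFS["const"]
    return COEFS["const"] + COEFS[industry]

for industry in ["retail", "financial", "telecom", "technology"]:
    print(f"{industry:<10} {implied_mean(industry):+.3f}")
```

Only the retail mean comes out negative, which fits the negative mean valence reported for Target and The Home Depot; the other implied means are small and positive.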

Model validation


Normality

Figure 14: Histogram of residuals

The residuals appear to be approximately normally distributed; this is inspected further with a normal quantile-quantile (QQ) plot.

Figure 15: Normal quantile-quantile plot

In the normal QQ plot, the residuals mostly follow the reference line, with only minor deviations in the tails. Before the outlier removal described earlier, these tails were considerably heavier. The residuals are therefore assumed to be normally distributed.
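The coordinates behind a normal QQ plot can be computed as in the following sketch. It assumes plotting positions (i − 0.5)/n, which is what R's `ppoints()` (used by `qqnorm()`) applies for n > 10, and it standardizes the residuals so that points close to the 45-degree line indicate approximate normality. The residual values are hypothetical and the function name is illustrative.

```python
from statistics import NormalDist, mean, stdev

def qq_points(residuals):
    """Return (theoretical, sample) quantile pairs for a normal QQ plot.

    Plotting positions (i - 0.5) / n are used, and residuals are
    standardized so that a normal sample lies near the 45-degree line.
    """
    n = len(residuals)
    m, s = mean(residuals), stdev(residuals)
    sample = sorted((r - m) / s for r in residuals)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    theory = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theory, sample))

# Hypothetical residuals with a slightly heavy right tail.
residuals = [-0.21, -0.08, -0.05, -0.01, 0.0, 0.02, 0.04, 0.07, 0.11, 0.30]
for theoretical, observed in qq_points(residuals):
    # The gap shows how far each ordered residual strays from the line.
    print(f"{theoretical:+.2f}  {observed:+.2f}  gap={observed - theoretical:+.2f}")
```

Large gaps at the first and last points correspond to the heavier tails that the text reports before outlier removal.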

Heteroscedasticity

The variance of the residuals is inspected by plotting the residuals of the industry-only model versus the fitted values:
