The value of pre-launch Twitter volumes in predicting initial sales of new cars

MSc Business Studies Marketing Thesis

by

Jorrit Stein 10618015

First supervisor: Ms. E. Korkmaz

Second supervisor: Dr. Umut Konus

Statement of originality

This document is written by Student Jorrit Stein who declares to take full responsibility for the contents of this document. I declare that the text and the work presented in this document is original and that no sources other than those mentioned in the text and its references have been used in creating it. The Faculty of Economics and Business is responsible solely for the supervision of completion of the work, not for the contents.

Index

Abstract

Introduction

Literature review

Predicting car sales

Advertisement

Web data based forecasting

Twitter

Forecasting with Twitter data

Conceptual framework and hypotheses

Car model tweets

Brand tweets about car

Mass media

Twitter data

Car sales data

Sample

Variables

Results

Graphical description

Hypothesis testing

Discussion and conclusions

Managerial implications

Further research

References

Appendix 1: Search queries Twitter

Appendix 2: Correlation plots

Abstract

Recent technological developments, the rise of big data and the increased use of the internet and social media have led to a new era in the prediction of consumer behavior. Organizations and scholars increasingly see the value of collecting consumer information and using this data to make predictions about future consumer behavior. This study examines whether predictions can be made with online word of mouth (WOM) on Twitter. Specifically, this research tries to find a relationship between pre-launch Twitter volumes and the initial sales of new car models. The two Twitter volumes used in this research are the consumer tweets and the brand tweets about a new car model. Furthermore, we examine the relationship between brand media expenses and both pre-launch WOM on Twitter and post-launch initial car sales. Data was collected for seventeen new car models introduced in the Netherlands in 2013 and 2014. Descriptive, correlation and regression analyses were applied to test the relationships. The results show no strong support for the ability to predict the initial sales of new cars with pre-launch Twitter volumes generated by either consumers or brands, although our outcomes do indicate that pre-launch WOM on Twitter seems to have a positive influence on the initial sales of new cars. Furthermore, we did not find a relationship between pre-launch brand expenses and initial sales. We did find a strong positive relationship between post-launch brand investment in mass media promotion of a new car and its initial sales. The results indicate that post-launch media expenses by a brand are a better predictor of initial car sales than the pre-launch consumer and brand tweet volumes and the pre-launch media expenses.

Introduction

Nowadays, companies are collecting more and more consumer data. This development is caused by the widespread digitalization of the world. The storage of big data, combined with technological development, new processing techniques and the increased usage of the internet and social network sites, has led to a new era in the prediction of consumer behavior (Goldman Sachs Group, 2014). This digital development provides new opportunities, which have attracted a lot of attention from scholars during the last decades, but there is still plenty of room for new research.

Business management teams increasingly see the value of collecting consumer information and using this data to make predictions about future consumer behavior. Consumer data is increasingly seen as a resource that can lead to a sustainable competitive advantage (The Economist Intelligence Unit, 2013). It is becoming such a resource in the business environment because companies are collecting an increasing amount of diverse, exclusive and unique consumer information, which is used to improve business practices. Consumer data, and in particular the results of its analysis, is a valuable resource for making strategic decisions at the business level, which can result in the prolonged existence of a company (Barney, 1991).

Scholars have recently become increasingly interested in research topics concerning predictive consumer analytics, due to the potential value of consumer data. As mentioned before, the success of social media is an important pillar for the rise of consumer data. The various social media platforms offer consumers the opportunity to generate and spread information to a large audience in an unprecedented manner. This online word of mouth (WOM) appears to be a promising source of information for the prediction of consumer behavior, as WOM has already proven to be very valuable in the offline world. Recent studies on the diffusion of innovations have found that the volume of WOM correlates significantly with consumer activity and market outcomes (Anderson, 2003; Neelamegham and Chintagunta, 1999). WOM constitutes the basis of interpersonal communication and has an important influence on product evaluations and purchase decisions by consumers. According to Grewal, Cline and Davies (2003), WOM is more powerful than business communication, because consumers consider WOM more credible and valuable. The power of WOM is also frightening for marketers, as informal discussions among consumers are difficult to control and can make a product either popular or unsuccessful in the market.

As mentioned earlier, recent studies on the diffusion of innovations have found significant correlations between the volume of WOM and related market results. This research on the diffusion of innovations is based on the new product diffusion theory developed by Bass (2004) and Rogers (2004). Bass (2004) suggests that innovators in the early stage of a product life cycle are mainly affected by mass media and that, after using the new product, they pass their opinions on to latecomers via WOM channels. Rogers (2004) recognizes WOM as a channel of communication in the product life cycle, particularly among the early majority and late majority, who tend to base their purchase decisions on the WOM from early adopters.

Digital developments have led to a new variety of WOM, namely online WOM. According to Phelps et al. (2004), online WOM is even more influential than offline WOM because of its speed, convenience, wide reach and the absence of interpersonal pressure. Furthermore, online marketers are nowadays able to archive WOM interactions from online forums in databases, which offers organizations the opportunity to estimate their marketing effects directly and perhaps more accurately. In the past decade, researchers have carried out an increasing number of studies to understand the power of online WOM. They found a significant effect of online consumer reviews on product sales (Chen and Xie, 2008; Dellarocas, 2003; Li and Hitt, 2008; Miller, Fabian and Lin, 2009; Cui, Lui and Guo, 2012).

Online WOM is supported by various digital platforms. Social networks on the internet, in particular, are a major source of online WOM. Cui, Lui and Guo (2012) and Asur and Huberman (2010) identify Twitter as a platform that is increasingly suitable for studying the effect of online WOM on the adoption of new products by consumers. Twitter is a very popular microblogging website, where users can follow people of interest, update their own status with tweets, retweet messages of others or communicate with them directly. Since Twitter launched in March 2006, the service has rapidly gained worldwide popularity and its user base has grown exponentially, with 500 million registered users in 2012, who posted 340 million tweets and performed 1.6 billion search queries per day. In 2013 Twitter was one of the most visited websites (Wiki, 2014). Following this success, the platform has drawn more and more attention from researchers in various disciplines.

Today, scholars are highly interested in exploiting online WOM data in their forecasting problems. Although evidence has been found that online WOM influences consumer purchases, there are still theoretical and empirical questions about the effect of online WOM on new product sales. This study contributes to the academic literature for the following reasons. Firstly, despite much practical research by marketing agencies into the opportunities of the Twitter database, there is a major shortage of academic research on its predictive capabilities. Secondly, as mentioned by Cui, Lui and Guo (2012), most studies on online WOM to date have dealt with forecasting the sales of experience products, such as books, movies and television shows. These products are often well promoted prior to their release and attract customer reviews within a short period after their public release. Although a few studies have also included more technological products (Clemons, Goa and Hitt, 2006; Mudambi and Schuff, 2010), there is a shortage of studies focusing solely on the effects of online WOM on technical products. Thirdly, most of the existing studies focus on forecasting the sales of existing products rather than new products.

Finally, most research has examined the relationship between online WOM and sales after the product launch. A positive relationship between post-launch online WOM and sales has been found in the literature (Chen and Xie, 2008; Dellarocas, 2003; Li and Hitt, 2008; Miller, Fabian and Lin, 2009; Cui, Lui and Guo, 2012), and the new product diffusion theory by Bass (2004) and Rogers (2004) suggests that word of mouth plays a greater role in the growth period than in the introduction stage. Nevertheless, recent studies indicate that online WOM can affect product sales early in the product life cycle (Amblee and Bui, 2008; Dellarocas, 2003; Dellarocas, Zhang and Awad, 2007; Asur and Huberman, 2010). The results from these studies indicate that online WOM might even have an effect before the start of the product life cycle, yet there is a lack of evidence on the relationship between pre-launch online WOM and sales. To sum up, our research contributes to the literature because we focus on predicting the sales of a technological product using pre-launch online WOM on Twitter.

This thesis investigates the opportunities of online WOM on Twitter to predict sales in a business environment. Specifically, this research tries to find a relationship between pre-release Twitter data and the initial sales of new car models. The research is thus intended to test the forecasting capabilities of Twitter in a real-world situation. This study focuses on the car industry for two reasons. Firstly, the topic of cars is of considerable interest among the Twitter community. Tweets about the car industry accounted for roughly 1.5% of total tweets in the Netherlands in 2013. Secondly, the real-world outcomes of car sales can be easily observed due to the new car registration system in the Netherlands. The next chapter presents a literature study, which starts with the definitions of the key concepts of the research topic. The literature review continues by highlighting some of the key articles on the topics of word of mouth, forecasting with social media and predicting car sales. After the literature review we present a conceptual framework and the hypotheses. In the subsequent results section we present the outcomes of a descriptive, correlation and regression analysis. This research report ends with the discussion, managerial implications and suggestions for further research.

This thesis contributes to both the scientific and the practical world. Our results support or reject the possibility of predicting real-world outcomes with tweets. Marketers and managers can use the results of this research as a valid argument for decision making on a strategic level. For instance, managers can use the findings of our research to motivate their instructions to the marketing department. A manager who is interested in knowing the future market share of a product can assign the department to analyze the Twitter messages about the company's own products and those of its competitors. If necessary, the department can initiate a volume-creating campaign on Twitter. Moreover, for online marketing agencies the results can serve as a confirmation of their activities. They can scientifically explain to their clients the importance of being active on social media and monitoring the competition. As mentioned before, this study contributes to the scientific world because we focus on predicting the sales of a technological product using pre-launch online WOM on Twitter. Most importantly, a positive result would mean another confirmation of the predictive power of Twitter.

Literature review

This literature section starts with the definitions of the key concepts of the research topic. It continues by highlighting some of the key articles related to the research questions. Our research touches on multiple research fields, namely car sales prediction, the effect of advertisement, word of mouth, web data based forecasting, Twitter and, in particular, predicting with Twitter. The key findings of this literature review are summarized in Appendix 3: Table 1.

Definitions

Two important concepts of this study are word of mouth and Twitter volume. According to Dichter (1966), word of mouth in marketing can be defined as the passing on of information between a non-commercial communicator and a receiver concerning a brand, a product or a service. A non-commercial communicator is someone who is not rewarded for passing on the information.

Twitter volume is the number of tweets about a certain topic. Practitioners often refer to Twitter volume as Twitter buzz. According to Thomas (2004), buzz is the interaction of consumers and users of a product or service which intensifies or changes the original marketing message of a brand. Buzz can be an emotion, energy, excitement or anticipation about a product or service and can be either positive or negative. Originally, buzz referred to oral communication, but the emergence of Twitter and Facebook changed this. Social networks are nowadays the dominant communication channels for marketing buzz. The source of the buzz can be intentional marketing activities or it can be the result of independent events that reach the larger public through traditional or social media.

Predicting car sales

[...] Urban, Weinberg and Hauser (1996). The really new product in this article is an electric vehicle by General Motors. The authors describe how the automobile manufacturer combined a new measurement methodology, called information acceleration, with existing marketing research methods to make forecasts about potential sales. The multimedia virtual-buying environment conditions respondents for future situations, simulates user experience and encourages consumers to actively search for information on the product. The basic idea behind the information acceleration method is to place consumers in a virtual buying environment that simulates the information available to the consumer at the time he or she makes a purchase decision. The method used to generate the forecast for the electric vehicle combines measurements of factors which are believed to influence consumer buying choice. In the information acceleration measurement, Urban, Weinberg and Hauser (1996) use showroom visits, advertising, magazine articles and word of mouth as sources of information that consumers access in their search for information about a new car. Depending on their accessibility and quality, these information types can influence consumer choice and, subsequently, potential sales. In addition, the authors identify other influencing factors such as governmental regulation, offerings of other brands, environmental situations, driving experience, reviews, and technological and infrastructure development.

Newman and Staelin (1972) studied the pre-purchase information seeking of consumers looking for a new car or major household appliance. They argue that knowledge of consumer information seeking is fundamental to understanding buyer behavior and to planning marketing communications and retail distribution. The article tries to identify the main influences on information seeking. The study examined 653 households which had bought a new car or a major household appliance. Newman and Staelin applied two multivariate techniques with an information seeking index as the dependent variable. This index is constructed simply by identifying the sources of information used by the different households. The identified sources are categorized as friends or neighbors; books, pamphlets, magazine or newspaper articles; newspaper or magazine advertisements; television commercials; and other sources, such as repairmen or mechanics. Newman and Staelin find support for their hypothesis that purchase and use of a product result in learning which later influences buying behavior. Another interesting finding is that half of the buyers thought mainly of only one brand at the outset of the decision process. Moreover, the results indicate that many buyers engage in little information seeking, even though enough information is accessible, suggesting a significant selectivity of search. However, the authors state that this does not mean the buyer is badly informed. The buyer may have started with what he regarded as sufficient knowledge. Also, counts of types of sources and types of information say little about the quality and quantity of information seeking. The result that many buyers engaged in little information seeking is consistent with a finding reported earlier by Newman and Staelin (1971). According to that report, half of the buyers of new cars and major appliances had purchase decision times of one or two weeks. In this previous research the amount of information seeking was positively related to decision time, but the data also showed that experienced buyers were able to collect a substantial amount of information in a short time.

Bennett and Mandell (1969) studied pre-purchase information seeking behavior in terms of repeat purchase data for new car purchasers. They found that experience alone, measured by the number of times the choice decision has been faced, appears not to affect information seeking behavior. Meanwhile, positively reinforced past choices, measured in aggregate or in sequence, decrease the amount of pre-purchase information seeking in which consumers engage. This study supports the contention that brand choice behavior is a form of human behavior subject to learning through reinforcement. Consumers who are loyal to a brand are either more susceptible to the brand’s marketing or harder to reach for competing brands. Bennett and Mandell identify multiple sources of information in the new car purchase decision process: consumer reports, dealer visits, expert opinion, friends' opinions, reading brochures, discussion with spouse, auto shows, advertisement, news articles and discussion with children.

Koppel, Charlton and Fildes (2006) focused on the reasons why people buy cars. Specifically, they examined the importance of vehicle safety in the new vehicle purchase process for fleet vehicles. They found that safety is generally not the primary consideration in the vehicle purchase process and is outranked by factors such as price and reliability. The full list of factors identified in the vehicle purchase process by Koppel, Charlton and Fildes (2006) is warranty, type, style, safety, running costs, re-sale, reputation, reliability, price, performance, model, fuel, country and comfort.

Web data based forecasting

In the following paragraphs we discuss literature which covers the broader topic of forecasting with web data. This literature shows the various possibilities for predicting real-world outcomes based on internet datasets. In the field of predictions with internet data there are many different streams. This is due to the various origins and types of data. The different streams in web data based prediction also originate from the variety of research disciplines. For instance, web based prediction is used in psychology, health science and social science. Digital platforms provide a diverse, rich and unlimited set of data which offers media companies the unprecedented opportunity and ability to track and model the behavior of individual users over a certain time period. Researchers use the data to help firms better understand consumers and improve management decision making. In the long term this will improve business practices (Feit et al., 2013).

Goel et al. (2013) explored how web search can predict collective future behavior days or even weeks in advance. Goel and Goldstein (2014) used connectivity data generated by social media. Across different sectors, Goel and Goldstein found that social data are informative in identifying individuals who are most likely to undertake certain activities. Scholars are also examining the opportunities of big data forecasting on a larger scale. Ettredge, Gerdes and Karuga (2005) showed that web search statistics can predict macroeconomic variables, such as the unemployment rate.

In addition to these business-related topics, scholars have examined the use of predictive analytics in health sciences. Cooper et al. (2005) found a correlation between Yahoo! search activity associated with specific cancers and their estimated incidence, estimated mortality and volume of related news coverage. Polgreen et al. (2014) studied the application of internet searches for influenza surveillance. The authors' models predicted an increase in communities positive for influenza one to three weeks before it occurred.

Besides the business environment and healthcare, web data based forecasting is also used by scholars researching the prediction of stock markets and politics. Antweiler and Frank (2004) used automated linguistic methods and found a correlation between activity on internet forums and both stock volatility and trading volume. Gilbert and Karahalios (2010) and Choudhury et al. (2010) used blog posts to predict stock market behavior. Williams and Gulati (2008) found, using a multivariate analysis, that the number of Facebook supporters is a valid predictor of electoral success. Veronis (2007) shows that a simple count of candidate mentions in the press can be a better predictor of electoral success than commonly used election polls.

Word of mouth

Dichter (1966) examined the relationship between successful everyday WOM recommendations and effective advertisement. He discovered that the two concepts are closely related to each other. According to Dichter this emphasizes the new role of the advertiser as that of a friend who recommends a tried and trusted product. In addition, Dichter identifies WOM as an influence that complements mass media advertising. There is a symbiotic relationship between the impersonal and the personal, or the formal and informal, avenues of communication. This relationship is moderated by the risk involved in buying a new product. For instance, when a consumer considers buying a new car, the economic risks are much higher than when buying toilet paper. Dichter demonstrates that if the consumer risks are high, WOM recommendations are one of the strongest influences on the product purchase decisions of consumers. Dichter further points out that there is a market of influencers who can be reached and influenced by advertising in existing specialized publications, such as professional magazines, or by another appropriate approach. The rest of the consumers are influenced by these influencers through WOM.

Cui, Lui and Guo (2012) examined the effect of online reviews on product sales for consumer electronics and video games. To research this effect the authors did an analysis of panel data of 332 new products from Amazon.com over nine months. Cui, Lui and Guo identified and measured three characteristics of consumer product reviews. These three characteristics are volume, valence, and dispersion of the consumer reviews. The reason behind the measurement of the volume of product reviews is that discussions about a product in online forums lead to increased awareness among consumers. The valence is the average ratings or the fraction of positive and negative opinions.
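To make the volume and valence measures concrete, the following minimal sketch computes them for a single product from a list of star ratings. It is an illustration only and not the procedure of Cui, Lui and Guo: the function name `review_metrics`, the 1 to 5 rating scale and the thresholds for a positive or negative review are assumptions made for this example. Dispersion is left out because the text above does not define how it is measured.

```python
from statistics import mean


def review_metrics(ratings, positive_from=4, negative_up_to=2):
    """Summarise the star ratings (assumed 1-5 scale) of one product.

    The thresholds that mark a review as positive or negative are
    illustrative assumptions, not values taken from the literature.
    """
    volume = len(ratings)  # volume: number of reviews
    if volume == 0:
        return {"volume": 0, "mean_rating": None,
                "share_positive": None, "share_negative": None}
    positive = sum(1 for r in ratings if r >= positive_from)
    negative = sum(1 for r in ratings if r <= negative_up_to)
    return {
        "volume": volume,
        "mean_rating": mean(ratings),          # valence as average rating
        "share_positive": positive / volume,   # valence as fraction positive
        "share_negative": negative / volume,   # valence as fraction negative
    }


# Hypothetical ratings for one product.
example = review_metrics([5, 4, 4, 2, 1, 5, 3, 5])
print(example)
```

Both variants of valence mentioned in the text are reported side by side, so either the average rating or the positive and negative shares can be used in a later analysis.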

Cui, Lui and Guo found that each of the metrics of online reviews significantly affects consumer purchases, and that together these metrics have a tremendous effect on new product sales. The effects tend to be stronger or weaker depending on the product category. In other words, they found important contextual variables that moderate the influence of online reviews. Specifically, the authors make a distinction between experience products and search products.

Search products are goods that consumers can evaluate by specific attributes before purchase, such as the technical or performance aspects of a product. Consumers assessing a search product are more likely to use a systematic decision making process. On the internet there is a tremendous amount of information available on product attributes, functions and performance. Even more importantly, the evaluation of products by other consumers is prominently displayed. Consumers can easily access such information. Cui, Lui and Guo (2012) found that the valence of reviews has a great effect on evaluations and purchase decisions for search products. This indicates a strong persuasive effect of product ratings for more complex products and for consumers experiencing a high level of involvement. Moreover, the researchers found that the effect of the volume of page views by readers is significant for both experience and search products, but that the volume of page views has a greater influence than the volume of reviews only for search products. According to Cui, Lui and Guo, the latter finding suggests the significant role played by followers or latecomers in this product category.

Experience products require feeling or experiencing. Experience products are difficult to describe using specific attributes and may induce different experiences across consumers. Evaluations of experience products by consumers tend to be very personal and less indicative of the quality of a product. Moreover, in the online environment consumers cannot directly feel the products or experience product attributes. Consequently, consumers considering an experience product rely more on extrinsic affective cues, such as the popularity of the product. Cui, Lui and Guo (2012) have found this to be true. They found that experience products are more subject to the influence of the volume of reviews. These volumes signal the popularity of a product and an awareness effect from the large volume of reviews.

Furthermore, the authors studied the effect of online product reviews over time. They found that the volume of reviews has a significant positive effect on new product sales in the early period of a product life cycle. This effect decreases over time, which according to Cui, Lui and Guo suggests the significant role played by early reviews. They also found that the percentage of negative reviews has a greater effect than that of positive reviews, confirming the negativity bias.

Recently, scholars have put a lot of effort into researching the importance of online WOM for the success of movies. These studies rely on metrics such as the number of messages, votes and discussion pages dedicated to a film. Liu (2006) used online WOM data from the website Yahoo! Movies to study the dynamics of the online discussion. Liu (2006) found that valence is important during pre-launch, but that once the movie has launched online WOM volume becomes by far the best predictor of sales. The author concludes that online WOM is a good predictor of success, but that the link is not causal: WOM mainly reflects the media exposure of the film. Asur and Huberman (2010) found a similar result using Twitter data. They show that the number of tweets created around a movie can predict box office revenues. We discuss this article more thoroughly further on in this literature review. Holbrook and Addis (2007) show that online WOM increases with a film's budget and that WOM positively impacts the revenues of the film. Online WOM is thus an approximation of media exposure.

Twitter

Recently, Twitter has attracted scholarly interest from various fields and for different reasons. Jansen et al. (2009) researched the usefulness of Twitter for marketers and brand owners. They found that the analysis of Twitter chatter is useful for monitoring digital word of mouth in the area of product marketing. Jansen et al. (2009) also found that one fifth of a random sample of tweets contained mentions of a product or brand and that an automated monitoring tool was able to distinguish significant differences in the attitudes of users towards a brand.

Another stream of research on Twitter focuses on understanding its usage and community structure (Honeycutt and Herring, 2009; Huberman, Romero and Wu, 2008; Java et al., 2007), which provides a general understanding of why and how people use the microblogging service. Briefly, these studies found that the intentions and intensity of Twitter usage differ considerably. Huberman et al. (2009) analyzed social interaction on Twitter and found that the driver of Twitter usage is a limited hidden network among friends and followers. The scholars conclude that most of the interaction links are meaningless.

Asur and Huberman (2010) describe Twitter as an extremely popular online microblogging service with a very large user base, consisting of several million users. According to the scholars, tweets normally consist of personal information about the users, news or links to content such as images, video and articles. A retweet is a post originally made by one user that is forwarded by another user. Retweets are a way of disseminating interesting posts and links. According to Asur and Huberman, Twitter has attracted a lot of attention from organizations because of the huge potential it provides for viral marketing. Organizations are using Twitter to advertise products and spread information to stakeholders. Due to its huge audience, Twitter is even increasingly used by news organizations to filter out the latest news updates.

Forecasting with Twitter data

Besides gaining a general understanding of Twitter, researchers have taken an interest in its predictive power and potential application to other areas. This paragraph discusses a range of applications of the predictive power of Twitter. Using auto-regression models, Achrekar et al. (2011) found that the volume of flu-related tweets is highly correlated with the number of reported fever cases. Lampos and Christiani (2010) also suggest the possibility of using Twitter to track the spread of epidemic diseases. Tumasjan et al. (2010) analyzed Twitter messages mentioning parties and politicians prior to the 2009 German federal election and found that Twitter is indeed used as a platform for political debate. The number of tweets concerning a political party or person reflects voter preferences and comes close to traditional election polls.

Beyond the varied uses of the predictive power of Twitter mentioned above, we are interested in the application of Twitter data in the business environment. According to Asur and Huberman (2010), social media can also be considered a form of collective wisdom. They decided to investigate the power of Twitter in predicting real-world outcomes. Asur and Huberman constructed a linear regression model for predicting box office revenues of movies in advance of their release. The model uses the rate of chatter extracted from a total of almost three million tweets. The model's predictions were more accurate than those of the Hollywood Stock Exchange. They found a strong correlation between the amount of pre-launch attention a movie receives and its ranking in the future. Moreover, they analyzed the sentiments present in tweets and demonstrated their efficacy at improving predictions after a movie has been released. Asur and Huberman conclude by arguing that the method used can be extended to a large variety of topics, ranging from the future rating of products to agenda setting and election outcomes. Moreover, they state that their work shows how social media expresses a collective wisdom which, when properly tapped, can yield an extremely powerful and accurate indicator of future outcomes.

Zhang, Fuehres and Gloor (2011) published a paper in which they describe early work on predicting financial market movements, such as the gold price, crude oil price, currency and stock market indicators, by analyzing Twitter posts. They collected Twitter feeds for five months, capturing a large set of emotional retweets originating from within the USA. They extracted six public opinion time series containing the keywords “dollar”, “$”, “gold”, “oil”, “job” and “economy”. They found a Granger-causal relationship between the keywords, except for “$”, and certain market movements. Their results show that these keywords are correlated with and predictive of financial market movement. The study concludes that an emotional outburst on Twitter about a topic on one day, measured by the volume of retweets on economic topics, is a fairly accurate predictor of how the corresponding stock market will perform the next day.

The previous studies provide reasons to believe that Twitter offers a database suitable for the analysis and subsequent prediction of sales, crime, flu trends, revenues, stock markets and more. In a business context, Twitter data analysis can offer many statistics about how a brand is performing on the internet.


Conceptual framework and hypotheses

Urban, Weinberg and Hauser (1996), Dichter (1966), Bennett and Mandell (1969) and Newman and Staelin (1972) have identified WOM as an important source of information for consumers in the decision-making process. Therefore, WOM can be used as an important source for new car sales prediction. Newman and Staelin (1971) discovered that the consumer's information-seeking process when deciding on a new car seems to be short and includes only a limited number of sources. Moreover, they found that half of the buyers think mainly of only one brand at the outset of the decision process.

The following literature identified WOM as a key source in the aforementioned consumer decision-making process. Dichter (1966) defined WOM, in marketing, as the passing on of information between a non-commercial communicator and a receiver concerning a brand, a product or a service. Recent studies on the diffusion of innovations have found that the volume of WOM correlates significantly with consumer activity and market outcome (Anderson, 2003; Neelamegham and Chintagunta, 1999). Grewel, Cline and Davies (2013) found information from WOM to be more powerful than printed information, because WOM is considered more credible and valuable. They argue that informal discussions among consumers can influence the popularity of a product, particularly for new products. So theoretically, there is support for the effect of WOM on new product purchase decisions and sales.

Cui, Lui and Guo (2012) consider online WOM even more influential than offline WOM because of its speed, convenience, wide reach, and the lack of interpersonal pressure. Furthermore, online marketing managers today can archive WOM interactions from online forums in databases. These databases lead to opportunities to estimate the effects of consumer WOM directly and perhaps more accurately.

To better understand the effect of online WOM on new product purchase decisions and sales, we take a closer look at the diffusion of innovation literature developed by Bass (2004) and Rogers (2004). The diffusion of innovation literature deals with the adoption of innovation in societies at the aggregate product category level (Anderson, 2003; Neelamegham and Chintagunta, 1999). However, the diffusion of innovations theory has also been found very useful in analyzing the role of online WOM in new product growth (Cui, Lui and Guo, 2012). A key concept in the literature on the diffusion of innovations is the Bass model. This model by Bass (2004) proposes that the early adopters and innovators in the early stage of the product life cycle are affected by external influences, such as mass media.

According to Bass (2004), the external influences are the reason why innovators become the main adopters of new products in the first period of the product life cycle. In the later growth and maturity periods of the product life cycle, the adoption of new products accelerates due to internal influences, such as WOM. These influences result in the adoption of the product by followers and latecomers. Thus, the diffusion of innovation theory considers early adopters as the initial main driving force of the dispersion process. In this diffusion process, WOM has been recognized as a key channel of communication, particularly among the early majority and late majority, who tend to follow the innovators and early adopters (Bass, 2004; Rogers, 2004).

The diffusion of innovations literature indicates that WOM plays an increasingly important role in new product adoption during the growth stage of the product life cycle. The diffusion of innovation literature is supported by Dichter (1966). This scholar argues that there exists a ready-made market of influencers who can be reached and, in turn, influenced by advertising. The rest of the consumers are influenced by these influencers through WOM. Extending the diffusion theory to the online world, one would expect online WOM to exert minimal effect in the early introduction stage of a new product, but a greater effect in the growth period of the new product.

However, recent studies suggest a change in the role of WOM in the product life cycle due to the rise of the internet. For instance, Phelps et al. (2004) argue that the speed, convenience and large reach of digital interactions are changing the dynamics of the various industries in which WOM has traditionally played an important role, especially for new product launches. In the current digital world, WOM about products and brands can reach consumers instantaneously. Consumers no longer have to wait for known influencers, friends or family to give them interpersonal advice about a product.

Other research has looked for more practical evidence of a change in the role of WOM, and its findings suggest that online WOM has an early impact on new product sales. The research performed by Cui, Lui and Guo (2012) is a case in point. This study supports previous findings by Amblee and Bui (2008), Dellarocas (2003) and Dellarocas, Zhang and Awad (2007) that online WOM can substantially increase the initial sales of new products, exaggerate product growth and cause the reversal of sales growth when maturity sets in. Cui, Lui and Guo (2012) exposed this influence by researching the early effect of online WOM on new product sales. Multiple other studies also confirm this early effect of online WOM on product sales. These studies examined the effect of early online WOM on the revenues of books, movies and video games (Holbrook and Addis, 2007; Chen and Xie, 2008; Dellarocas, 2003; Li and Hitt, 2008; Miller, Fabian and Lin, 2009; Lui, 2006). The aforementioned studies suggest that online WOM communication is an important source of information for consumers planning to purchase new products. User-generated content, such as tweets, helps customers make informed decisions about purchasing new products. And so, online WOM has become an important driver and predictor of new product sales.

[…] Rogers (2004) and Bass (2004). These new findings propose an early effect of online WOM on the product life cycle. This early effect causes new products to experience tremendous growth in the early stage of the product life cycle as a result of online WOM. On the other hand, these growth effects do not extend into the next phase of the product life cycle and tend to fade out over time. From a managerial perspective, these findings suggest that online WOM shortens product life cycles and forces organizations to rethink their pre- and post-launch marketing strategies.

The early effect of online WOM has also been confirmed by Asur and Huberman (2010). According to Asur and Huberman (2010), social media expresses a collective wisdom which, when properly used, can yield an extremely powerful and accurate indicator of future outcomes. Therefore, Asur and Huberman (2010) studied an even earlier effect of online WOM. They researched the effect of online WOM, specifically Twitter volumes, in the pre-launch period of a product. They found a positive correlation between the volume of pre-launch online WOM and the box-office revenues of a particular movie in the release weekend. These findings suggest an even earlier effect of online WOM than that found by Cui, Lui and Guo (2012). This thesis continues building on the theory that online WOM can affect, and thus predict, product sales even before the official launch of the product.

Car model tweets

In this thesis we examine the change in the role of online WOM in the product life cycle and the predictive abilities of online WOM in the pre-launch phase of a new product. To continue building on the findings of Asur and Huberman (2010), we use the pre-launch online WOM on the social network Twitter to predict new product sales. Twitter has been identified by Asur and Huberman (2010) as one of the fastest growing online WOM networks. The micro-blogging network has experienced a burst of popularity in recent years, leading to a huge user base consisting of several tens of millions of users who actively participate in the creation and propagation of content. According to Asur and Huberman (2010), Twitter has attracted a lot of attention from organizations because of the huge potential it provides for viral marketing. Organizations are using Twitter to advertise products and spread information to stakeholders. Jansen et al. (2009) found that one fifth of a random sample of tweets contained mentions of a product or brand.

The pre-launch Twitter data is used to predict new car sales. This thesis uses this product category because of the contribution of Cui, Lui and Guo (2012). They studied the effect of various metrics of online WOM on new product sales. They found that the volume, valence and views of online reviews all significantly affect new product sales, but the effects tend to be stronger or weaker depending on the product category. In general, product type influences the search behavior of consumers and their use of information sources, which in turn influences their choices. As mentioned, this research focuses on forecasting car sales of new models. In this thesis we position new cars in the search category, although cars are hard to categorize explicitly as either a search or an experience product, because certain car characteristics, such as the driving experience, are very personal and hard to describe.

Cui, Lui and Guo (2012) showed that online WOM can be measured by different metrics, such as volume or sentiment. In our research we use tweet volume as the metric. This is mainly because tweet volume has demonstrated its predictive capabilities in previous research. Asur and Huberman (2010) found a relationship between tweet volumes and box-office revenues. Zhang, Fuehres and Gloor (2011) argue that the more positive tweets there are about a financial market, the higher the chance of a financial upturn. Lui (2006) shows that online WOM about a film concentrates on the weeks before and after the release day, and declines steadily thereafter. Lui used data retrieved from the website Yahoo Movies. The author concludes that online WOM volume is a good predictor of success. Achrekar et al. (2011) conclude that the volume of flu-related tweets is highly correlated with the number of fever cases reported. The research of Tumasjan et al. (2010) indicates that the number of tweets concerning a political party or person reflects voter preferences and comes close to traditional election polls. […] relationship between the volume of online product reviews and product sales. The rationale behind this effect of the volume of product reviews is that discussions about a product in online forums lead to increased awareness among consumers. Although we acknowledge that not all of the above-mentioned studies are about predicting new product sales, the findings do show that online WOM volume is a good predictor of real-world outcomes.

To briefly summarize, prior research on car sales prediction indicates that WOM is an important influence in the consumer decision process (Urban, Weinberg and Hauser, 1996; Newman and Staelin, 1972). Furthermore, previous literature shows that online WOM affects early sales of new products. The information seeking of consumers deciding on a new car seems to be short and to include only a limited number of sources. In addition, Twitter has grown to be the online platform for WOM about particular brands. More importantly, Twitter volumes have proven to be a good predictor of future consumer behavior in various fields. This thesis is interested in the relationship between tweet volumes and initial new car sales.

H1: New cars that are discussed more on Twitter during the pre-launch phase sell better in the post-launch period.

Brand tweets about car

Besides the total tweet volume about a particular topic, we are also interested in how online WOM is actively stimulated by the brands. According to Thomas (2004), Twitter volumes arise from the interaction of consumers and users of a product or service, which intensifies or changes the original marketing message of a brand. These original marketing messages are distributed through the marketing channels of the brand in the pre-launch phase. One of these channels is the brand’s Twitter account. Asur and Huberman (2010) examined whether movies that have greater self-initiated publicity, in terms of tweets with linked URLs, perform better at the box office. They found the correlation of URLs and retweets with box-office performance to be moderately positive. However, they also found that these features are not very predictive of the relative performance of movies. Just like Asur and Huberman (2010), this study is interested in how attention and popularity are generated for products by the various brands, and in the effects of this attention on the real-world performance of the products. Preliminary exploratory research on the pre-launch period of new cars on Twitter showed similarities with the findings of Asur and Huberman (2010). Prior to the release of a new car, brands generate promotional information in the form of tweets with trailer videos, news, blogs and photos. Brand tweets prior to the release of products consist primarily of such promotional campaigns, tailored to promote information distribution via online WOM on a large scale. Due to the promotional character of those brand tweets, they are expected to have a large positive influence on online WOM and eventually on the initial sales of new car models. Following the research of Asur and Huberman (2010), this thesis examines the relationship between the volume of such promotional tweets published by the brand about a new car model and the initial sales.

H2: Brands that initiate higher publicity in the pre-launch phase through Twitter have higher initial new car sales.

Mass media

Another way in which marketing messages are distributed to stimulate online WOM and sales for new products is via advertisement. As already mentioned, the diffusion of innovation literature discussed by Bass (2004) suggests that in the early stage of a product life cycle, innovators are mainly affected by mass media, and after using the new products, they pass their opinions to latecomers via WOM channels. As discussed earlier, the recent study by Asur and Huberman (2010) suggests an earlier presence and influence of online WOM. Therefore, this thesis examines the relationship between pre-market advertisement and pre-launch online WOM.

H3: Higher brand investment in pre-market mass media promotion for a new car leads to more pre-market Twitter volume.

In addition, Urban, Weinberg and Hauser (1996), Bennett and Mandell (1969) and Newman and Staelin (1972) identified advertising as an important source of information for consumer decision making. The relationship between advertisement and sales has been examined by Clarke (1976) and Heyse and Wei (1985). Clarke (1976) found that the positive effect of advertisements on sales occurs within three to nine months. Heyse and Wei (1985) found similar results and pose that sales and advertising are more strongly related as time periods overlap. They specifically found a strong connection between advertising budgets and current sales. Advertising budgets are often set as a percentage of sales. These two articles suggest a short-term effect of advertisement on online WOM and sales. Therefore, this thesis is interested in the relationship between the pre-launch mass media investment and the post-launch initial car sales. And since Clarke (1976) and Heyse and Wei (1985) found a stronger relationship between sales and advertisement in the same time period, we also examine the relationship between the post-launch media investment and the post-launch new car sales. The results of the measurements of these relationships can serve as reference material for the relationship between pre-launch Twitter volumes and initial car sales. This is similar to how Asur and Huberman (2010) compared their model built from tweet volumes with market-based predictors to indicate its predictive strength.

H4a: Higher brand investment in pre-market mass media promotion leads to higher initial new car sales.

H4b: Higher brand investment in post-market mass media promotion leads to higher initial new car sales.

When analyzing the influence of tweet volumes on sales, two influencing factors have to be taken into account. An important factor is the popularity of a brand. Some car brands might be more popular in the Netherlands, which can strongly influence the sales of a particular brand. One major reason for popularity is loyalty. Newman and Staelin (1972) found that the purchase and use of a product result in learning, which later influences buying behavior. Furthermore, the study has to account for the fluctuations in car sales throughout the year. Dutch people are known for buying a car just before the summer holidays, when they receive their extra holiday allowance. Other fluctuations emerge due to changes in government legislation, for example in the area of emission taxes. These fluctuations can influence the measurements of initial sales of new cars.


Method

To answer the research question of whether Twitter volumes in the pre-launch phase forecast the initial sales of new cars, a database analysis is conducted. We used various databases to gather information about Twitter volumes, media expenditures and car sales.

Twitter data

An online monitoring tool called Buzzcapture is used to gather the Twitter volume data. Buzzcapture is a web-based tool which gives an organized display of all Dutch Twitter data provided by the Twitter API. The tool helps to gain insight into a specific topic discussed on Twitter. The datasets required for this research are the brand and car model tweet volumes. This information is gathered using search queries within the tool. For example, for the model search of the Volkswagen Golf, we use the full written name of the car model as the search query. This means a tweet has to contain the words Volkswagen and Golf; otherwise the message is not included in the data. For the brand search, we use the full written name of the brand (Appendix 1: Table A1: Search queries Twitter). These search queries work as a filter in the program. The data gathered from Buzzcapture give a detailed overview of the brand and car model tweet volume per month from January 2012 until November 2014.
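The query logic described above can be sketched as follows. This is a minimal illustration and not Buzzcapture's actual implementation: the function names `matches_query` and `monthly_volume`, the case-insensitive whole-word matching and the sample tweets are all assumptions made for the example.

```python
import re
from collections import Counter
from datetime import date


def matches_query(text, required_words):
    """Return True if every required word occurs in the tweet text.

    Assumption: matching is case-insensitive and on whole words; the
    actual matching rules of the monitoring tool are not documented here.
    """
    tokens = set(re.findall(r"\w+", text.lower()))
    return all(word.lower() in tokens for word in required_words)


def monthly_volume(tweets, required_words):
    """Count matching tweets per (year, month).

    `tweets` is an iterable of (date, text) pairs (hypothetical format).
    """
    counts = Counter()
    for posted_on, text in tweets:
        if matches_query(text, required_words):
            counts[(posted_on.year, posted_on.month)] += 1
    return counts


# Illustrative, made-up tweets.
sample = [
    (date(2013, 3, 2), "De nieuwe Volkswagen Golf ziet er goed uit"),
    (date(2013, 3, 9), "Vandaag een rondje golf gespeeld"),
    (date(2013, 4, 1), "Proefrit in de volkswagen golf gemaakt"),
]
volumes = monthly_volume(sample, ["Volkswagen", "Golf"])
```

Requiring both words keeps messages about the sport out of the model volume, at the cost of missing tweets that mention only "Golf" while referring to the car.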

Car sales data

The car sales information is gathered using secondary data from BOVAG. The source of the data is the RDW, the public service provider in the mobility chain (RDW, 2014). When a new car is sold, the new owner is obligated to register the car's license plate in the RDW database. A license plate number is an identifier for vehicles. With the license plate the RDW can identify who is liable for a vehicle. Furthermore, with the car registration system the RDW can keep track of the sales of new cars. The car sales data from the RDW is processed and edited by the RDC, which is the data centre of BOVAG. BOVAG is a trade organization of more than 10,000 entrepreneurs engaged with mobility (BOVAG, 2014). The data from BOVAG gives a detailed overview of the car sales in the Netherlands per model per month for the period of January 2013 until October 2014 (BOVAG, 2014).

Media expenditure

The information about the media expenditure of different brands used in this thesis is gathered from the Adfact database. Adfact is a marketing agency which collects, analyzes and sells information concerning the mass media investment of companies in the Netherlands. The database consists of the daily records of all widespread market advertisement. These records include commercials on television and radio, and advertisements in cinemas, newspapers and magazines. Furthermore, they include outdoor displays. The records are accompanied by the estimated cost of the advertisement, based on current market prices, and are connected to the related brands and models. The information does not give exact brand media expenditure data but gives a good approximation. Moreover, the expenditure data of the various brands are well comparable with each other. It is important to note that the information does not include the expenditure on online display advertising such as banners. The Adfact data gives a detailed insight into the media expenditure of automotive brands in the period of January 2013 to November 2014.¹

Sample

This research uses all Dutch Twitter data provided by the Twitter API. We analyzed a total of 290,308,950 tweets for this research. Buzzcapture gives the opportunity to filter out the necessary information using search queries on a particular topic in a specific time period. Furthermore, this study collects sales data and related brand media expenditure for seventeen new car models introduced in 2013 and 2014 in the Netherlands. In order to provide clean results, we only use information on family cars with a traditional engine. For example, this excludes two-seaters and electric cars.

¹ To check the reliability of the Adfact data, the marketing expenditure information from Adfact is compared with data obtained from one of the brands in our research. The results show an alignment of information.

Another requirement is that a new car needs to be sold more than a hundred times in 2013 or 2014. This study uses these new cars because they generate enough Twitter volume and demand for media investment from the related brand to examine the proposed relationship between Twitter volume, media expenditure and car sales. We use data from the Netherlands because Twitter is a commonly used social media platform in the Netherlands (Azevedo, 2011), and also because of the high rate of car ownership that comes from the high prosperity of the country (Worldbank, 2011; Legatum institute, 2014). This makes the Netherlands a good country for researching the predictive capabilities of Twitter. Finally, and most importantly, the necessary data is available. The Twitter volumes, media expenditures and car sales data are gathered for seventeen new cars (Table 1: New car models introduced in 2013 and 2014).

Table 1: New car models introduced in 2013 and 2014

Number  Model                         Available in the Netherlands
1       Renault Captur                April 2013
2       Peugeot 2008                  May 2013
3       Opel Adam                     January 2013
4       Kia Carens                    March 2013
5       Peugeot 108                   June 2014
6       Volkswagen Golf Sportsvan     May 2014
7       Citroen C4 Cactus             June 2014
8       BMW 3 Serie Gran Turismo      June 2013
9       Mercedes-Benz CLA-klasse      March 2013
10      Fiat 500L                     January 2013
11      Renault Zoe                   March 2013
12      Seat Toledo                   March 2013
13      BMW i3                        November 2013
14      Opel Cascada                  April 2013
15      Mini Paceman                  March 2013
16      Mercedes-Benz GLA-klasse      March 2014
17      Porsche Macan                 April 2014

The analysis starts with a descriptive graphical analysis of the Twitter volumes, media expenditures and car sales data. The remainder of the analysis in this thesis resembles the study of Asur and Huberman (2010), in which they investigated the effect of pre-launch Twitter volume on box office revenues for movies. Asur and Huberman (2010) performed a correlation analysis on the tweet-rate a week prior to the release and the box office revenues in the opening weekend. The tweet-rate is defined as the number of tweets referring to a particular movie per hour. Subsequently, they constructed a linear regression model using least squares on the average tweet-rate of the 24 movies considered over the week prior to their release. To investigate whether Twitter volumes can predict the early car sales of new car models and whether media expenditure has an influence on the Twitter volumes and initial sales, we make use of various variables in a correlation analysis. Furthermore, we construct multiple linear regression models (Saunders and Lewis, 2011).

Variables

The dependent variable is the total number of cars sold in the six months after each model becomes available in the Netherlands. The dependent variable is defined as follows.

Total-CarSales model = Total number of car sales within six months after the release of each new car model

The six-month period is determined after consideration of the various delivery times of new cars and the limitations of the provided car sales data. The monthly new car sales in the BOVAG data are determined by the moment of registration of a new car at the RDW.² This moment of registration aligns with the moment of delivery of a new car to the consumer. However, this date is not the same as the moment a new car is ordered. The delivery time for a new car in the Netherlands can fluctuate strongly, from direct delivery to multiple months, depending on the car model, the brand and the place and time of the order. Also, in some cases the consumer can pre-order the car. There is no reliable data available about the average delivery times of new cars. The six-month period enables this thesis to examine a large percentage of the initial car sales of a new car model.
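The accumulation behind Total-CarSales can be sketched as follows. The sketch rests on assumptions the text does not settle: the helper names `add_months` and `total_car_sales` are hypothetical, the window is assumed to start in the launch month itself, and the monthly figures are made up for illustration.

```python
def add_months(year, month, offset):
    """Shift a (year, month) pair by `offset` months (offset may be negative)."""
    index = year * 12 + (month - 1) + offset
    return index // 12, index % 12 + 1


def total_car_sales(monthly_sales, launch, months=6):
    """Sum monthly registrations over the post-launch window.

    Assumption: the window starts in the launch month itself and covers
    `months` consecutive calendar months; months without data count as 0.
    """
    window = [add_months(launch[0], launch[1], k) for k in range(months)]
    return sum(monthly_sales.get(m, 0) for m in window)


# Illustrative, made-up monthly registrations keyed by (year, month).
sales = {
    (2013, 4): 120, (2013, 5): 310, (2013, 6): 280,
    (2013, 7): 190, (2013, 8): 150, (2013, 9): 170, (2013, 10): 160,
}
total = total_car_sales(sales, launch=(2013, 4))
```

Because registrations follow delivery rather than ordering, a window defined this way captures orders placed before and shortly after the launch, which is the reason given above for not using a shorter period.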

² The registration of a new car license plate at the RDW is mandatory in the Netherlands. A registration plate is an identifier for vehicles. The register keeps track of who is responsible for a vehicle.

The analyzed sales period is limited by the time period of the car sales data as provided by BOVAG, the available Twitter data and the date of the product launch. This research only has access to Twitter data from 2013 and 2014 and therefore can only use car models from these years. To include a large number of new car models in this thesis, the post-launch period cannot be long. This post-launch period strongly limits the number of new cars from 2014 which can be included in this thesis, as we only have car sales data up to November 2014.

In this thesis we consider four independent variables. The first variable is the total Twitter volume from consumers about each new car model in the period of six months before the car is available in the Netherlands.

(1) Total-tweets-car model = Total number of tweets for a new car model sent by consumers, accumulated over six months prior to the release of each car model

We use a six-month pre-market period, because new car model introduction communications from brands vary over this period. These communications can range from the release of pictures of the new model on social media to billboards along the road. Note that this variable represents the total number of tweets from consumers only, as the thesis subtracts the number of related brand tweets written by the company’s marketing department from the total number of tweets. This subtraction is done because the tweets sent out by the brand are not part of the online WOM from consumers. Note that the thesis still uses the number of company tweets. These tweets are analyzed separately as the third independent variable. The second independent variable is the total media spending by the brand for the model in the period of six months before the car model is available in the Netherlands.
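The subtraction of brand tweets and the pre-launch accumulation can be sketched as follows. The function names are hypothetical, the window is assumed to cover the six calendar months directly before the launch month, and all counts are made up for illustration.

```python
def add_months(year, month, offset):
    """Shift a (year, month) pair by `offset` months (offset may be negative)."""
    index = year * 12 + (month - 1) + offset
    return index // 12, index % 12 + 1


def total_tweets_car_model(all_tweets, brand_tweets, launch, months=6):
    """Consumer tweet volume accumulated over the pre-launch window.

    Assumption: the window covers the `months` calendar months directly
    before the launch month (the launch month itself is excluded).
    """
    total = 0
    for k in range(1, months + 1):
        month = add_months(launch[0], launch[1], -k)
        # Consumer tweets = all tweets about the model minus the brand's own.
        total += all_tweets.get(month, 0) - brand_tweets.get(month, 0)
    return total


# Illustrative, made-up monthly tweet counts keyed by (year, month).
all_tweets = {
    (2012, 10): 40, (2012, 11): 55, (2012, 12): 60,
    (2013, 1): 90, (2013, 2): 130, (2013, 3): 210, (2013, 4): 400,
}
brand_tweets = {(2013, 1): 5, (2013, 2): 10, (2013, 3): 20}
consumer_total = total_tweets_car_model(all_tweets, brand_tweets, launch=(2013, 4))
```

Summing `brand_tweets` over the same window would give the brand tweet volume that is analyzed separately.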

(2) Media-expenditure-pre model = Total amount of media expenditure of each brand for the new model accumulated over six months prior to the release of each car model

Again, we use a period of six months. As with the first independent variable, we chose this period to capture all the pre-launch brand communications, which are spread over this period. This also applies to the third independent variable. As mentioned before, the third independent variable is the total number of tweets written by the brand about the related car model in the period of six months before the car model is available in the Netherlands.

(3) Total-tweets-brand model = Total number of tweets for a new car model sent by the own brand, accumulated over six months prior to the release of each car model

In addition to the independent variables we use two control variables. The first control variable is the popularity of the brand, to control for the brand effect. The popularity of the brand in this thesis is measured by accumulating the total number of car sales of a specific brand in 2013 and 2014 in the Netherlands. The higher the sales of the brand in this period, the higher its current popularity. The sales numbers give an indication of consumers' current willingness to buy the brand. This variable is divided into three categories, namely high, medium and low brand popularity, using the difference in total brand sales. In this thesis we use two dummy variables to control for the popularity of the brand.

The second control variable is the time period in which a car becomes available. This variable is used to control for the popular periods in which people buy cars in the Netherlands. For instance, the beginning of a new year is often a popular period to buy cars. This is due to changes in emission regulation at the governmental level. If the six-month post-launch sales measurement period overlaps with a popular buying period, there is a greater chance of higher sales during this period. We control for this time effect by using a dummy variable which controls for a popular time period.
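The two control variables can be coded as dummy variables along the following lines. This is only a sketch: the function names, the sales cutoffs, the choice of the low category as reference level and the set of popular months are hypothetical, since the text does not report the exact category boundaries or buying periods.

```python
def popularity_dummies(brand_sales, medium_cutoff, high_cutoff):
    """Encode three popularity levels as two dummy variables.

    Returns (high, medium); a brand with (0, 0) falls in the low category,
    which serves as the reference level. The cutoffs are hypothetical.
    """
    high = 1 if brand_sales >= high_cutoff else 0
    medium = 1 if medium_cutoff <= brand_sales < high_cutoff else 0
    return high, medium


def popular_period_dummy(launch_month, popular_months, window=6):
    """Return 1 if the post-launch window overlaps a popular buying month.

    Assumption: the window starts in the launch month; `popular_months`
    is a set of month numbers (1-12) chosen by the analyst.
    """
    months = {(launch_month - 1 + k) % 12 + 1 for k in range(window)}
    return 1 if months & set(popular_months) else 0


# Illustrative values only.
high_brand = popularity_dummies(60000, medium_cutoff=20000, high_cutoff=50000)
medium_brand = popularity_dummies(30000, medium_cutoff=20000, high_cutoff=50000)
low_brand = popularity_dummies(5000, medium_cutoff=20000, high_cutoff=50000)
april_launch = popular_period_dummy(4, popular_months={1})
november_launch = popular_period_dummy(11, popular_months={1})
```

Three categories need only two dummies, because the third category is implied when both dummies are zero; adding a third dummy would make the regressors perfectly collinear with the intercept.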

In order to further investigate the relationship between pre-launch tweets and post-launch sales, we also examine a shorter time period, following the research of Asur and Huberman (2010). Asur and Huberman (2010) found a positive relationship between the Twitter volume one week prior to the release and the box-office revenues. If we extend their study to our thesis, we should focus on the tweets one week before the release of a new car model and the car sales directly after the car becomes available. Our datasets do not provide a period shorter than a month. In order to keep the time period as short as possible, we decided to examine the relationship between the total number of tweets one month prior to release and the total number of car sales of a new car model one month after the release. The dependent variable is the total number of new cars sold one month after the car becomes available. The independent variable is the total volume of consumer tweets one month prior to the release date of the new car.

Sales-month-post model = Total number of car sales within one month after the release of each new car model

Tweets-month-pre model = Total number of tweets for a new car model sent by consumers, one month prior to the release of each car model

The last variable in this thesis is used to examine the relationship between the post-launch media expenditure of each brand and the initial sales, both accumulated over six months after the release of the new car. We chose the six-month period to capture all post-launch brand communications spread over this period. Moreover, our data only provides media expenditure and car sales data for a maximum of six months.

Media-expenditure-post model = Total amount of media expenditure of each brand for the new model, accumulated over six months after the release of each car model

Multiple linear regression

The seven independent and dependent variables mentioned above are values aggregated over months. The dependent variables are continuous variables. Therefore, to find the predictive value of WOM on Twitter and media expenditure, we construct multiple linear regression models using these variables. The regression models are constructed to find a causal relationship between the independent and dependent variables (Vocht, 2007). In addition, we check the coherence of the variables by calculating the correlation coefficients, which allow us to quantify the strength of the relationship between continuous variables (Vocht, 2007). The correlation coefficients are also used to check for multicollinearity.

The following regression model is constructed to measure the relationship between pre-launch consumer tweets, brand tweets and media expenditure on the one hand and the post-launch initial sales on the other, all accumulated over a period of six months.

(1) Total-Sales model = β0 + β1 Total-tweets-car model + β2 Total-tweets-brand model + β3 Media-expenditure-pre model + β4 Control-popular-period + β5 Control-high-brand-popularity + β6 Control-low-brand-popularity + Ɛrror
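To illustrate how the coefficients of regression model (1) can be estimated, the sketch below fits an ordinary least squares model with an intercept and six predictors. It is not the software or the data used in this thesis: the data is synthetic and noise-free, the coefficient values are arbitrary, and the function name ols_fit is hypothetical. The normal equations are solved with plain Gauss-Jordan elimination so that the example needs no external libraries.

```python
import random


def ols_fit(rows, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y.

    `rows` holds the predictor values per observation; an intercept
    column of ones is prepended here. Returns [b0, b1, ..., bk].
    """
    X = [[1.0] + [float(v) for v in row] for row in rows]
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(x[i] * x[j] for x in X) for j in range(k)]
         + [sum(x[i] * yi for x, yi in zip(X, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("predictors are perfectly collinear")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


# Synthetic data for 17 hypothetical car models (not the thesis data).
rng = random.Random(0)
true_beta = [50.0, 2.0, 5.0, 0.5, 30.0, 80.0, -40.0]  # b0..b6, arbitrary values
rows = []
for i in range(17):
    rows.append([
        rng.uniform(0, 10),      # Total-tweets-car
        rng.uniform(0, 10),      # Total-tweets-brand
        rng.uniform(0, 10),      # Media-expenditure-pre
        i % 2,                   # Control-popular-period
        1 if i % 3 == 0 else 0,  # Control-high-brand-popularity
        1 if i % 3 == 1 else 0,  # Control-low-brand-popularity
    ])
y = [true_beta[0] + sum(b * v for b, v in zip(true_beta[1:], row)) for row in rows]

estimates = ols_fit(rows, y)
print([round(b, 6) for b in estimates])
```

Because the synthetic sales contain no error term, the estimates reproduce the chosen coefficients. With real data the error term is non-zero and the estimates come with standard errors, which a statistical package reports alongside the coefficients.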

The next regression model is constructed to measure the relationship between pre-launch consumer tweets and the post-launch initial sales, both over a period of one month.

(2) Sales-month-post model = β0 + β1 Tweets-month-pre model + β2 Control-popular-period + β3

The last regression model is constructed to measure the relationship between the post-launch media expenditure of each brand and the initial car sales, both accumulated over six months.

(3) Total-Sales model = β0 + β1 Media-expenditure-post model + β2 Control-popular-period + β3 Control-high-brand-popularity + β4 Control-low-brand-popularity + Ɛrror

The first two regression models use only Twitter data from before the introduction of the new car model, as the research is intended to develop a model that forecasts car sales in the pre-launch phase of a new car model. Note that the independent variables can be highly correlated with each other, which would cause a multicollinearity problem in a regression model. We checked these correlations, and none of the independent variables showed a strong correlation with another (Table 4).
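A pairwise correlation check of this kind can be sketched as follows. The cut-off of 0.8 for a "strong" correlation is an assumption made for the example, since no threshold is stated here, and the column names and values are hypothetical rather than taken from Table 4.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def highly_correlated_pairs(columns, threshold=0.8):
    """Return (name_a, name_b, r) for every pair with |r| above the threshold."""
    flagged = []
    for a, b in combinations(columns, 2):
        r = pearson(columns[a], columns[b])
        if abs(r) > threshold:
            flagged.append((a, b, r))
    return flagged


# Hypothetical predictor columns.
columns = {
    "x": [1, 2, 3, 4, 5],
    "y": [2, 4, 6, 8, 10.5],
    "z": [5, 1, 4, 2, 3],
}
for name_a, name_b, r in highly_correlated_pairs(columns):
    print(name_a, name_b, round(r, 3))
```

In this toy data only the pair x and y is flagged. An empty result would correspond to the situation reported above, in which no pair of independent variables is strongly correlated.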

Results

This section shows the results of the preliminary analyses and a graphical description of the most important variables. Moreover, it presents the results of the correlation and regression analyses.

Preliminary steps

Before we can test the hypotheses with a regression analysis, the raw data is cleaned and a preliminary data analysis is conducted. The data needs to be cleaned because it comes from different sources. The preparation of the data consists of extracting the right periods and values from the complete datasets and calculating the variables of interest. As mentioned before, the time windows for the first regression model are six months before and after a new car model becomes available in the Netherlands. Therefore, the total period adds up to a year per new car model. The second regression model uses a time period of one month before and after product launch. The third regression model uses a time period of six months after the introduction of a new car. The calculations and preparation of the variables are done in Microsoft Excel 2013.
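The windowing logic described above can be illustrated with a short sketch. The thesis performed these calculations in Excel; the Python functions below are hypothetical and only mirror the same idea. They assume monthly series in which the launch month is known by its position, and they assume that the post-launch window starts in the launch month itself. The monthly counts are invented for the example.

```python
def pre_launch_total(series, launch_index, months):
    """Sum the `months` values immediately before the launch month."""
    start = launch_index - months
    if start < 0:
        raise ValueError("series does not cover the pre-launch window")
    return sum(series[start:launch_index])


def post_launch_total(series, launch_index, months):
    """Sum `months` values starting with the launch month (an assumption)."""
    end = launch_index + months
    if end > len(series):
        raise ValueError("series does not cover the post-launch window")
    return sum(series[launch_index:end])


# Hypothetical monthly counts; index 6 is the launch month.
monthly_tweets = [10, 12, 15, 20, 30, 45, 60, 40, 25, 20, 15, 10]
monthly_sales = [0, 0, 0, 0, 0, 0, 120, 150, 140, 130, 110, 100]

print(pre_launch_total(monthly_tweets, 6, 6))   # six-month pre-launch tweet volume
print(post_launch_total(monthly_sales, 6, 6))   # six-month initial sales
print(pre_launch_total(monthly_tweets, 6, 1))   # one-month pre-launch tweet volume
print(post_launch_total(monthly_sales, 6, 1))   # one-month initial sales
```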

The preliminary analysis consists of a frequency and normality check of the seven variables of interest. Moreover, a graphical analysis is conducted to give an overview of the relationship between the dependent and independent variables. The frequency and normality check shows the mean of each variable and indicates that there are no missing values. The data shows that Total-sales, Total-tweets-car, Total-tweets-brand, Media-expenditure-pre and Sales-month-post are substantially or extremely positively skewed (s > 1) (Vocht, 2007). That means that the most frequent scores are grouped towards the left of the distribution. Moreover, Total-sales, Total-tweets-car, Total-tweets-brand, Media-expenditure-pre and Sales-month-post show kurtosis (k > 1), a sharp peak of the distribution (Table 2: Frequency and normality check). The results of the normality check show that the skewed data has to be corrected to enable a regression analysis. This correction is done