
How to use online information search behavior to predict number of tourists in Amsterdam city





Faculty of Economics and Business

Master of Science in Business Administration Track: Marketing

Student: Aylar Soltani

Student number: 10544194

First supervisor: Bob Rietveld

Second supervisor: prof. dr. W.M. van Dolen

Submission date: August 19th, 2016


Statement of originality

This document is written by Student Aylar Soltani who declares to take full responsibility for the contents of this document.

I declare that the text and the work presented in this document are original and that no sources other than those mentioned in the text and its references have been used in creating it.

The Faculty of Economics and Business is responsible solely for the supervision of completion of the work, not for the contents.


Acknowledgements

This work would not have been possible without the help and support of many people to whom I wish to express my sincere gratitude.

First and foremost, I owe my deepest gratitude to my great supervisor, Bob Rietveld. I really enjoyed his support and invaluable insights and guidance.

I would like to thank Willemijn van Dolen, Olivier Ponti and Masoud Mazloum for their valuable help and insights.

My deepest gratitude goes to my family and friends for their endless support and love throughout my life.


Table of Contents

Abstract

1. Introduction

1.1 Introduction

1.2 Research gaps and research question

1.3 Contribution

2. Literature review

2.1 Antecedents and motivations of information acquisition behavior

2.1.1 Utilitarian information seeking behavior

2.1.2 Non-utilitarian information seeking behavior

2.2 The Theory of planned behavior and the extension of TPB

2.3 Antecedents and motivations of using the internet- uses and gratification theory

3. Research hypotheses

4. Methodology and Empirical Testing

4.1 Data Description

4.1.1 CBS website

4.1.2 Wikipedia Usage Trends

4.1.3 Google Trends

4.2 Methodology

4.3 Visitor Time Series (VTS)

4.3.1 Exploratory Analysis of Visitor Time Series (VTS)

4.3.2 Baseline Model

4.4 Wikipedia Usage Trends (WUT) Time Series

4.4.1 Selection of Wikipedia pages

4.4.2 Exploratory Analysis of WUT Series

4.4.3 WUT series and corresponding population

4.4.4 Aggregate WUT Time Series

4.4.5 Granger Causality Analysis

4.4.6 Wikipedia Usage Trends Model

4.5 Google Trends (GT) time series

4.5.1 Query Selection

4.5.2 Exploratory Analysis of Google Trends Series

4.5.3 Aggregate Google Trends series

4.5.4 Google Trends Models

4.5.5 Forecasting Power of Models

5. Conclusion and Discussion

5.1 Wikipedia Usage Trends

5.2 Google Trends

5.3 Implications

5.3.1 Implications for Practice

5.3.2 Implications for Amsterdam Marketing Organization

5.4 Limitations and Suggestions for Future Research

5.5 Conclusion

Appendices

Abstract

Online user activities such as querying search engines have frequently been used to predict users' intentions. This research studies how certain online information can be used as an indicator of customers' intentions. More specifically, we use Google search history and Wikipedia page visits as users' online activities in order to predict the number of visitors to Amsterdam. We choose a number of Google search queries and Wikipedia pages that indicate a user's intention to visit Amsterdam, and use them to build two predictive models with the ARIMA time series approach. As the baseline, we use an ARIMA model built from historical data on Amsterdam visitors over a three-year period. We show that the model built using Google searches outperforms the other two models by at least 18%, and that the other two models perform almost equally well.

Keywords: online data, time series, forecast, travel intention, tourism demand, consumer behavior

1. Introduction

1.1 Introduction

Consumers' self-reported intentions have been studied extensively in the consumer behavior literature (Brown, et al., 2003; Creyer, 1997; Hsin Chang & Wen Chen, 2008; Cronin, et al., 2000; Grewal, et al., 1998). Most of these studies examine the relationship between consumers' intentions and their behavior in order to predict future behavior. However, this is hard to achieve, since the many moderators and intervening variables are hard to detect and measure. Moreover, the rapid growth of information technology and e-commerce has changed the rules of the game, and with them consumer behavior. New methods of measuring purchase intention and purchase behavior therefore need to be adopted.

In this study, we show how individuals use the internet to shape purchase intentions and to seek information in order to make decisions. Online platforms that provide rich and valuable information for users can therefore serve as a proxy measure for purchase intention and for predicting future behavior. For this purpose, the information seeking behavior of consumers in the travel and tourism industry is explored, with Amsterdam selected as the destination city whose tourists' online and offline behavior is studied. According to the ITB world travel report of 2015, 75 percent of travelers used online information to plan their trip, and this source of information helps travelers choose their destination. Shim et al. (2001) propose an online pre-purchase intentions model showing that the intention to search for information on the internet is a key predictor of the intention to purchase. This suggests that the information search behavior of travelers is a powerful predictor of travel product purchases and travel behavior. In this work, the Google search engine and Wikipedia are examined and compared as media that provide travel information. Not everyone who seeks information actually intends to travel, since information can also be acquired to satisfy needs such as entertainment or curiosity. However, empirical work shows that the majority of individuals' online information acquisition is goal oriented and serves a functional purpose, namely planning a trip (Vogt & Fesenmaier, 1998).

A large body of research has used the online data that internet users leave behind in real time to predict a diverse variety of phenomena, such as influenza outbreaks (Ginsberg, et al., 2009), automobile sales, unemployment, and travel planning (Choi & Varian, 2012), elections (Tumasjan, et al., 2010), box office revenue and video game sales (Goel, et al., 2010; Arias, et al., 2013), the housing market (Wu & Brynjolfsson, 2013), the consumer sentiment index (Penna & Huang, 2009), stock market volatility (Vlastakis & Markellos, 2012; Nofer & Hinz, 2015; Bollen, et al., 2011; Zhang, et al., 2011), inflation expectations (Guzman, 2011), and tourism volume (Xiang & Pan, 2011; Yang et al., 2015). However, these studies do not explain the theoretical reasons behind such behavior and focus instead on the predictive structure of big data.

1.2 Research gaps and research question

Researchers have extensively used self-reported survey data to investigate the purchase intention of products within time frames ranging from one week to a year or more. The major drawback of using self-stated intentions is that they deviate from actual behavior (Jamieson & Bass, 1989). One reason is the self-generated validity effect (Chandon et al, 2005): measuring an intention increases the likelihood that consumers will follow the reported intention. This method therefore fails to measure the actual intention, and sophisticated models are needed to predict the actual behavior.

Furthermore, the literature has evaluated the relationship between intention and actual purchase behavior only at the individual level, and there has been no effort to examine intention and actual behavior at the aggregated, macro level. Using small samples can be misleading, because a sample may behave differently from the population at large (Jamieson & Bass, 1989).


Moreover, some studies have used online behavior to examine actual online purchases (Guo & Barnes, 2011; Chu, 2001; Murphy et al, 1996), but, to the best of our knowledge, no study has used online behavior to predict offline behavior. This study addresses these shortcomings of the existing literature by focusing on consumer behavior in online platforms at the macro level, and by investigating and predicting the resulting behavior in the offline environment.

In this work, we aim to use Google search queries and Wikipedia page visits as a proxy of travel intention to predict the number of visitors to Amsterdam. More specifically, we address the following research questions in this study:

1. How should a medium for measuring purchase intention be selected, and which characteristics of the medium increase measurement accuracy?

2. How can Wikipedia Usage Trends be used to predict the number of visitors to Amsterdam in the near future, i.e. one month ahead?

3. How can user queries posed to a search engine (e.g. Google) be used to predict the number of visitors to Amsterdam in the near future, i.e. one month ahead?

1.3 Contribution

Theoretical: A new framework is proposed here to measure purchase intention using the information search behavior of internet users, and to anticipate the actual behavior in an offline setting. Previous research on online consumer behavior has used surveys to examine the relationship between the intention to purchase and the search for information (Pavlou and Fygenson, 2006), or actual purchases on an online platform (Guo & Barnes, 2011). This study confirms these previous findings in a real setting. Moreover, to the best of our knowledge, this is the first study that uses online data to observe offline behavior. Although the survey method has merit, testing the theory on data other than self-reports gives high external validity.

Managerial: Since this study focuses on consumer behavior at the macro level, the results reveal behavioral aspects of the market. Hence, this methodology helps managers evaluate a market's potential prior to investment and measure the effectiveness of marketing strategies using interactive media in real time. Market movements can be detected by observing internet users' interest in a particular product and their information search behavior, so the proposed framework can be incorporated as a marketing strategy tool. Moreover, marketers can use this method to measure the effectiveness of marketing campaigns by monitoring brand awareness.

The rest of this thesis is structured as follows: Section 2 provides an overview of the relevant literature on how information search behavior is shaped and how it can be used as a proxy to measure intention. In Section 3, the conceptual framework used in this study is described. Section 4 presents the methodology and the results of the analysis. Section 5 presents the discussion, implications and limitations of this research, and concludes.

2. Literature review

In this section, we first review studies on the underlying motivations of individuals' information acquisition behavior. Then, the theory of planned behavior and its extended model, which integrates the information acquisition approach, are discussed. Finally, using uses and gratification theory, consumers' motivations for using different media are explained.

2.1 Antecedents and motivations of information acquisition behavior

Information acquisition behaviors of individuals in the professional, educational, and shopping contexts have been studied extensively in the literature (Herman, 2004; Julien & Duggan, 2000). Most of these studies focus on the functional motivations and needs behind information seeking, treating it as a goal-oriented task (i.e. utilitarian). In contrast, only a limited number of studies look at information seeking with entertainment motives (i.e. non-utilitarian). In the following subsections, these two views are discussed in detail.

2.1.1 Utilitarian information seeking behavior

By far, the most influential studies for understanding information seeking behavior have been carried out by researchers in the consumer behavior field. These studies have focused on the utilitarian purpose of information acquisition, using a cognitive approach to explain why people look for information.

These works show that, while individuals might have different behavioral motivations and needs to purchase a product, they follow a similar decision making process. The consumer decision making process has been of interest to many studies over the decades, and various approaches for modeling it have been proposed in the literature. One of the prevalent models from the cognitive approach is the Engel, Kollat, and Blackwell model (Engel et al., 1973). This model has undergone numerous revisions, and the final version consists of five stages: problem or need recognition, information search, pre-purchase evaluation of alternatives, purchase, and post-purchase behavior (Engel et al, 1973). In this model, the consumer is considered a problem solver whose decisions are made after acquiring and processing information. Information search occurs internally, using memory and past experience, and/or externally. The authors argue that the source, depth and amount of information search depend on the complexity and nature of the problem.

Several models have been proposed for the destination selection decision process of tourists (Schmoll, 1977; Mathieson and Wall, 1982; Mouthino, 1987). All of these models treat information search as one of the stages preceding destination choice and travel purchase. For instance, Mathieson and Wall (1982) proposed a linear five-stage model of travel buying behavior. These stages are: 1) need recognition and travel desire, 2) information collection and evaluation by image, 3) travel decision, 4) travel preparation, and 5) travel outcome and satisfaction. The model covers every aspect of the vacation decision making process, since a travel purchase is not just a single purchase of one tour package. Moreover, information search is not confined to the pre-purchase phase; travelers evaluate and process information during and after travel as well. However, the most intense information search normally happens before the purchase or trip, and during the trip.

Furthermore, information search behavior is influenced by individuals' prior experience with and knowledge of the destination (Lehto et al., 2006). Previous studies show that uncertainty avoidance, as a cultural factor, and the information source have a great impact on the type and amount of information tourists use before planning a trip (Money & Crotts, 2003; Kerstetter & Cho, 2004).


Not all information related to a vacation destination is acquired by visitors who are planning a vacation. As explained in the next section, people sometimes seek information for entertainment and pleasure. However, since most of this type of information seeking is accidental, its structure is complex, and no clear trend or pattern can be detected.

2.1.2 Non-utilitarian information seeking behavior

The underlying motivations for seeking information are much more diverse than scholars working in the goal-oriented, rationalist context have assumed (Kari, 2001). Recently, some studies have examined information seeking behavior from a new perspective: obtaining information for entertainment and pleasure (Ross, 1999). Kari and Hartel (2007) explore information seeking as a pleasurable and profound phenomenon driven by intrinsic motivation. That is, individuals obtain information for its internal rewards, such as satisfaction or pleasure.

Information science in the leisure and entertainment context is examined by Hartel (2010). In this study, leisure-related information is categorized into casual, serious, and project-based concepts. These studies show that information seeking with entertainment motives is less goal oriented, and therefore its patterns are hard to detect.

2.2 The Theory of planned behavior and the extension of TPB

One of the most powerful theories for predicting an individual's behavior is the theory of planned behavior (TPB) (Chatzoglou & Vraimaki, 2009). TPB has been applied extensively to predict human behavior in consumer behavior research. According to TPB, attitudes, subjective norms and perceived behavioral control predict behavioral intentions, which in turn determine behavior (Ajzen, 1985, 1991). This framework has gained strong empirical support in anticipating tourism behavioral intentions (Sparks, 2007; Sparks & Pan, 2009).

However, some studies show that TPB does not sufficiently explain the variance in intended behavior. Additional variables have therefore been added to the model, and extending TPB in this way has increased its predictive power. For instance, Lam and Hsu (2006) show that past behavior affects the behavioral intention to select a travel destination. Amaro and Duarte (2015) found attitude, compatibility and perceived risk to be predictors of intentions to purchase travel online. Although numerous studies have found different explanatory variables for behavioral intention, none of them questioned the predictive power of intention itself: as behavioral intentions get stronger, the likelihood of the future behavior increases, which is consistent with the theory of planned behavior and the theory of reasoned action (Ajzen, 1987; Fishbein & Ajzen, 1975).

An extended model of the theory of planned behavior shows that the intention to purchase provokes the intention to get information (Pavlou & Fygenson, 2006). This model integrates the theory of planned behavior and Engel's purchase decision making model to predict electronic commerce adoption. Shim et al. (2001) found that the intention to use the internet to search for product information is the strongest predictor of purchase intention.

2.3 Antecedents and motivations of using the internet- uses and gratification theory

The uses and gratification theory is used to explore individuals' psychological needs for using specific media, such as television (Conway and Rubin, 1991), radio (Mendelsohn, 1964), and newspapers (Elliott & Rosenberg, 1987). The main assumptions of this framework are that individuals are goal oriented in their behavior and conscious of their needs. Thus, they select media and content that fulfill their goals and gratify their needs (Katz et al, 1974). Media scholars have applied the uses and gratification theory to discover the motives of users of non-traditional media such as the internet (Raacke & Jennifer, 2008; Lin, 1999; Ko et al, 2005; Papacharissi & Rubin, 2000). In fact, this theory is a useful framework for understanding and exploring the adoption of any new medium (Roy, 2009).

Researchers have applied the uses and gratification approach to examine users' motivations in different internet contexts, such as the web (Ko et al, 2005; LaRose et al, 2001), weblogs (Kaye, 2005; Chung & Kim, 2008), and social networks such as Facebook (Joinson, 2008) and Twitter (Johnson & Yang, 2009). Although these platforms differ, information, social interaction, entertainment, and convenience are the most common motivations for internet usage.

Shao (2009) divides consumers' online behavior into three activities: consumption of information and entertainment, participation in social interaction, and production of self-expression and self-actualization. Each of these activities is driven by different motivations. Ko et al. (2005) find that people with high information motivation engage in human-message interaction (consumption of information), while those with social interaction motivation engage in human-human interaction (participation in social interaction). These studies show that different websites and platforms are suited to satisfying specific needs. For instance, social networks such as Facebook are a better tool for satisfying social interaction needs than information needs.


Shao (2009) categorizes Wikipedia and Google as media whose users are mainly motivated by the acquisition of information and entertainment. Moreover, according to Alexa.com as of 23 March 2016, Google is the first and Wikipedia the seventh most visited website.

3. Research hypotheses

In this section, we list the hypotheses which are tested in this work. For each hypothesis, we first explain the theories and reasons behind it, and then we present the hypothesis.

As explained above, based on uses and gratification theory, the motivations for using Wikipedia and the Google search engine are to gain information or to satisfy hedonic needs such as entertainment. Furthermore, according to the Alexa.com report, these two media are the most popular websites that provide information. Hence, these two platforms are selected to explore consumers' information acquisition behavior.

According to Engel's consumer decision model and travel buying behavior models, tourists search for information to facilitate the decision making process before and during their vacation. Research shows that most people who acquire travel information from the internet have a functional motivation, meaning that travel-related information is used for the purpose of travel planning (Vogt & Fesenmaier, 1998). To test whether online information acquisition behavior is goal oriented and serves the functional purpose of planning a vacation, we hypothesize the following:

Hypothesis 1: Usage trends of Wikipedia pages related to Amsterdam are correlated with the actual visits of tourists.


Hypothesis 2: Search query indices related to Amsterdam are correlated with the actual visits of tourists.

Empirical studies show that the intention to search for product information on the internet is an indicator of the intention to purchase (Shim et al., 2001; Pavlou et al., 2006). All of these studies found that the intention to search for information predicts the intention to purchase. In this work, instead of the intention to search for information, we examine actual search behavior, and we show that actual information search behavior, at the aggregated level, can be used as a reflection of users' purchase intention.

If consumers' online search behavior reflects their purchase intention, the queries users pose to search engines, or the number of visits to Wikipedia pages, reveal users' intentions and attitudes to a great extent. According to the theory of planned behavior, intention can be used to predict behavior. Thus, to serve as an indicator of travel intention, correlated search queries or page views must have lagged predictive power for the actual behavior. Lagged page views or queries are included because, as explained in Section 2, the information search behavior that happens in the pre-purchase phase reflects the intention. In this study, the pre-purchase phase is the time period before the trip happens, and the purchase is the actual behavior of going on the trip. To investigate whether travel information search behavior can be used as an indicator of travel intention, the following hypotheses are proposed:

Hypothesis 3: Usage trends of Wikipedia pages related to Amsterdam have predictive power for visits to the city.


Hypothesis 4: Search query indices related to Amsterdam have predictive power for visits to the city.

Gefen and Straub (2000) argue that the two major behaviors of online consumers are getting information and purchasing. The travel buying behavior model of Mouthino (1982) shows that people search for information continuously before, during and after a vacation. First, travelers search for information to select a destination; then they search for information to purchase travel products, such as transportation or accommodation, for the selected destination. Since searches for travel-related products happen after searches to choose a destination, they seem to be a stronger indicator of travel intention than destination-related searches.

Hernández (2012) studies the intent behind the queries users pose to search engines. The intents are classified into three categories: informational, navigational, and transactional. Since Wikipedia contains only informational content and lacks transactional content, the Google search engine is better suited for measuring travel intention. While it is hard to separate purchase-related queries from the rest, at the aggregated level the predictive model based on Google search indices is expected to outperform that of Wikipedia. Therefore, the following hypothesis is formulated for testing:

Hypothesis 5: The model with search query indices has higher predictive power than the model with usage trends of Wikipedia page views.

Moreover, based on the theory of planned behavior, intention predicts behavior best when it is measured just before the behavior is performed (Ajzen, I. , 1985). This suggests that correlated queries closer to the trip should have higher predictive power, which leads to the following hypothesis:

Hypothesis 6: The predictive power of search query indices increases as the lag order decreases.

4. Methodology and Empirical Testing

Amsterdam Marketing Organization (Iamsterdam), which coordinates tourist activities in Amsterdam, needs to know the number of visitors to the city in advance in order to manage the city's facilities better. However, it currently has no means to predict the number of visitors in advance, since the only source of information about Amsterdam's visitors is the CBS publication, which is published half-yearly and reports the volume of visitors over the previous six months. We hope that the predictive power of users' online activities will enable Iamsterdam to forecast tourism demand in the short term (i.e. one month ahead).

4.1 Data Description

The focus of this study is to analyze the online information acquisition behavior of tourists at the macro level. Hence, the data and methods used differ completely from those of survey research. As mentioned earlier, not everyone who plans to visit a city uses online information. Furthermore, the amount of information individuals use varies with their interests, past experience, and knowledge, so the number of pages viewed and search queries issued differs from person to person. In big-data analytics, therefore, the trends and interests of a larger population are examined, and this study examines the significance of behavioral changes across the whole population.

In this study, three different datasets are used, and time series analysis is applied to test the hypotheses.

4.1.1 CBS website

The number of Amsterdam visitors is obtained from the Centraal Bureau voor de Statistiek - CBS (http://www.cbs.nl/nl-NL/menu/home/default.htm). This data, which has been published every six months since 2012, gives the volume of Amsterdam visitors on a monthly basis. The nationality of travelers is also listed in the dataset, which makes it possible to analyze the data by nationality and to examine differences in traveler behavior and cultural effects.

4.1.2 Wikipedia Usage Trends

The daily number of page views of English Wikipedia articles is available from the Wikipedia Trends website. The data represent page-view traffic; fortunately, bot and spider traffic is excluded. Wikipedia Trends provides daily page view counts for articles from 2008 to the present.

4.1.3 Google Trends

Google is selected as the source of search query volumes because it is the dominant search engine worldwide. According to a Forrester report (2015), 65% of searches in the US and 70% in Europe are powered by Google. Google provides aggregate query volumes through Google Trends. These data are normalized rather than absolute numbers of search queries, and are presented on a scale from zero to one hundred. The query indices are available on a weekly basis from 2004 to the present. Since the visitor data collected from CBS are monthly, the weekly Google search query indices are converted to monthly indices to examine the correlation between search queries and actual visits.
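The thesis does not state how the weekly indices were turned into monthly values. The following is a minimal sketch under one plausible convention, which is an assumption: each week is assigned to the month in which it starts, and the monthly index is the plain mean of those weekly values (splitting weeks that straddle a month boundary would be another reasonable choice). The function name `weekly_to_monthly` and the input values are hypothetical.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def weekly_to_monthly(weekly):
    """Average weekly index values into monthly values.

    Assumption (not from the thesis): a week belongs to the month of its
    start date, and the monthly index is the mean of that month's weeks.
    `weekly` is a list of (week_start_date, index_0_to_100) pairs.
    """
    buckets = defaultdict(list)
    for week_start, value in weekly:
        buckets[(week_start.year, week_start.month)].append(value)
    # Months are returned in chronological order.
    return {month: mean(values) for month, values in sorted(buckets.items())}


# Hypothetical weekly Google Trends indices, for illustration only.
weekly = [
    (date(2015, 1, 4), 60),
    (date(2015, 1, 11), 64),
    (date(2015, 1, 18), 62),
    (date(2015, 1, 25), 66),
    (date(2015, 2, 1), 70),
    (date(2015, 2, 8), 74),
]
result = weekly_to_monthly(weekly)
print(result)
```

Because Google Trends values are already normalized to 0-100, averaging keeps the monthly series on the same scale; summing would instead make months with five starting weeks look artificially higher.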

4.2 Methodology

Analyzing the time series requires a sequence of steps, depicted in the flowchart in Figure 1. First, we analyze the trends and raw data of the visitor time series, the Wikipedia page view series, and the Google search index series. Then the ACF plot is used to observe trends and seasonality in the data. A time series model can only be built when the series are stationary, so the Augmented Dickey-Fuller (ADF) test is used to test the data. Next, non-stationary series are refined by removing trend or seasonal effects, and the ADF test is reapplied. For the stationary series, cross-correlations are examined, and correlated series are selected and aggregated. Then Granger causality is tested for the aggregated series, and if the data have predictive power, new models are built using ARMA or ARIMA models. In the last stage, the selected models for predicting future visitors are compared with the baseline model by computing their prediction errors. In this analysis, the data from 2012 to 2014 are used as the training set, and the first six months of 2015 as the test set. The following sections explain each step in more detail.
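To make the cross-correlation step concrete, the sketch below computes the Pearson correlation between a candidate predictor x (for example, monthly page views) shifted k months earlier and the visitor series y, for k = 0 up to a maximum lag. It is a minimal illustration, not the thesis's implementation: the helper names and the toy series are hypothetical, and in the actual pipeline the series would first be made stationary as described above.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def lagged_correlations(x, y, max_lag):
    """Correlate x[t - k] with y[t] for every lag k = 0..max_lag.

    A high value at some k > 0 means x tends to move k periods before y,
    which is the pattern expected from a leading indicator of visits.
    """
    return {k: pearson(x[: len(x) - k], y[k:]) for k in range(max_lag + 1)}


# Hypothetical monthly series: y repeats x with a two-month delay.
x = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8]
y = [2, 7] + x[:-2]
corrs = lagged_correlations(x, y, max_lag=4)
best_lag = max(corrs, key=corrs.get)
print(best_lag)  # 2
```

Note that each additional lag shortens the overlapping window by one month, so with only three years of training data, correlations at long lags rest on few observations.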

4.3 Visitor Time Series (VTS)

4.3.1 Exploratory Analysis of Visitor Time Series (VTS)

First, the raw data on the number of visitors are examined. Figure 2 shows that over the three consecutive years the mean number of visitors per year is not constant but increases, which means that the visitor time series is not stationary. Moreover, the series shows a seasonal pattern. The autocorrelation function (ACF) plot indicates the same results: in Figure 3, the ACF decays slowly towards zero as the lag increases, which indicates a trend, and the annual cycle and sinusoidal pattern in the plot reflect the seasonality of the data.

Figure 2. Trend of Amsterdam visitor volume

Figure 3. ACF plot of visitors
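The slow decay of the ACF under a trend can be reproduced with the standard sample autocorrelation estimator, $r_k = \sum_{t=k+1}^{n}(y_t-\bar{y})(y_{t-k}-\bar{y}) / \sum_{t=1}^{n}(y_t-\bar{y})^2$. The sketch below applies it to a synthetic, purely hypothetical 36-month series (a linear trend plus a 12-month cycle), not to the CBS data; the helper name `acf` is illustrative.

```python
from math import pi, sin


def acf(y, max_lag):
    """Sample autocorrelation r_k for k = 0..max_lag."""
    n = len(y)
    m = sum(y) / n
    denom = sum((v - m) ** 2 for v in y)
    return [
        sum((y[t] - m) * (y[t - k] - m) for t in range(k, n)) / denom
        for k in range(max_lag + 1)
    ]


# Hypothetical 36-month series: upward trend plus annual seasonality.
series = [100 + 2 * t + 10 * sin(2 * pi * t / 12) for t in range(36)]
r = acf(series, 12)
print([round(v, 2) for v in r])
```

For this trending series, r_1 is roughly 0.9 and the values only die out gradually over the following lags, whereas for a stationary series without strong autocorrelation the sample values would quickly fall inside the confidence bands around zero.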

In the next step, we conduct a unit root test, using one of the most popular tests, the Augmented Dickey-Fuller (ADF) test. The unit root test determines whether the time series is stationary or whether differencing is required.

In the ADF test, the following regression model is estimated:

$$y'_t = \phi\, y_{t-1} + \beta_1 y'_{t-1} + \beta_2 y'_{t-2} + \cdots + \beta_k y'_{t-k}$$

where $y'_t = y_t - y_{t-1}$ denotes the first-differenced series and $k$ is the number of lags.

The null hypothesis is $H_0: \phi = 0$ (the series has a unit root and must be differenced to become stationary).

The alternative hypothesis is $H_1: \phi < 0$ (the series is stationary and differencing is not needed).
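The ADF regression $y'_t = \phi y_{t-1} + \sum_{i=1}^{k} \beta_i y'_{t-i}$ can be estimated by ordinary least squares. The sketch below is illustrative only: it fits exactly that form (no constant or trend term), uses a hypothetical helper name `adf_regression`, and returns the estimate of $\phi$ with its t-statistic. That statistic must be compared with Dickey-Fuller critical values rather than the normal distribution, and packaged tests (for example R's `tseries::adf.test`) also include a constant and a linear trend in the regression.

```python
import numpy as np


def adf_regression(y, k=3):
    """OLS fit of y'_t = phi*y_{t-1} + sum_{i=1..k} beta_i*y'_{t-i}.

    Returns (phi_hat, t_stat). No constant or trend term is included;
    compare t_stat with Dickey-Fuller critical values.
    """
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)                      # dy[j] = y[j+1] - y[j] = y'_{j+1}
    rows = range(k + 1, len(y))          # t values with all k lagged differences available
    response = np.array([dy[t - 1] for t in rows])            # y'_t
    design = np.array(
        [[y[t - 1]] + [dy[t - 1 - i] for i in range(1, k + 1)] for t in rows]
    )                                                         # [y_{t-1}, y'_{t-1}, ..., y'_{t-k}]
    coef, *_ = np.linalg.lstsq(design, response, rcond=None)
    resid = response - design @ coef
    dof = len(response) - design.shape[1]
    sigma2 = resid @ resid / dof
    cov = sigma2 * np.linalg.inv(design.T @ design)
    return coef[0], coef[0] / np.sqrt(cov[0, 0])


rng = np.random.default_rng(0)
white_noise = rng.normal(size=500)               # stationary: phi should be near -1
random_walk = np.cumsum(rng.normal(size=500))    # unit root: phi should be near 0

print(adf_regression(white_noise))
print(adf_regression(random_walk))
```

A clearly negative $\hat\phi$ with a large negative t-statistic points toward stationarity; a $\hat\phi$ close to zero is what a unit root produces.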

The result of the ADF test confirms the ACF analysis. The p-value of the test is 0.3083, so the null hypothesis is not rejected. Thus, to continue the analysis, the series must be differenced, and differencing is repeated until the differenced series no longer has a unit root.

To obtain a stationary series, the difference between the observations at time t and time t-1 is computed. After differencing, the ADF test rejects the null hypothesis (p < 0.01), which means that the differenced series is stationary. Figures 4 and 5 depict the differenced series and its ACF plot. The first-differenced series in figure 4 is stationary in mean, with no upward or downward trend. However, spikes at certain periods indicate a weak seasonal effect. The ACF plot shows the same pattern: although most of the autocorrelations fall within the 95% confidence bounds (blue lines), there are spikes every 12 months, which reflect this periodic effect. Seasonal differencing could remove this effect, but every differencing step costs observations (twelve for a lag-12 seasonal difference), and since the first-differenced series is already stationary according to the ADF test, we do not apply seasonal differencing to the series.

Figure 4. Plot of the first-differenced visitor series

Figure 5. ACF of the first-differenced visitor series
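The cost of differencing in observations can be made concrete. The sketch below uses a hypothetical 36-month series built from a linear trend plus a repeating 12-month pattern (not the study's data) and a hypothetical helper `difference`; it shows that a first difference drops one observation, a lag-12 seasonal difference drops twelve, and applying both, as an ARIMA model with one regular and one seasonal difference does, leaves 36 - 1 - 12 = 23 observations.

```python
def difference(x, lag=1):
    """Lag-`lag` difference x_t - x_{t-lag}; the first `lag` observations are lost."""
    return [x[t] - x[t - lag] for t in range(lag, len(x))]


# Hypothetical 36-month series: +2 per month trend plus a repeating 12-month pattern.
pattern = [5, 3, 0, -2, -4, -3, 0, 2, 4, 6, 3, 1]
visitors = [100 + 2 * t + pattern[t % 12] for t in range(36)]

first_diff = difference(visitors)              # trend becomes a constant (35 values)
seasonal_diff = difference(visitors, lag=12)   # 12-month pattern removed (24 values)
both = difference(first_diff, lag=12)          # one regular + one seasonal difference (23 values)

print(len(first_diff), len(seasonal_diff), len(both))
print(seasonal_diff[:3])   # the +2/month trend leaves a constant 24 after a 12-month difference
```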

4.3.2 Baseline Model

In this study, we do not choose a simple baseline model. Most researchers select a simple AR model as the baseline (Choi & Varian, 2012; Wu & Brynjolfsson, 2013; Yang, et al., 2015), which can be misleading. The AR model belongs to the same class as ARMA and ARIMA models; in other words, they share the same degree of complexity, so an AR model has no inherent advantage over ARMA and ARIMA models. If we simply chose an AR model without analyzing the historical data, we could obtain promising results merely by comparing the proposed model with a misfitted baseline model.

Our exploratory analysis revealed a trend and a seasonal effect in the historical data. Because ARMA models apply only to stationary series, an ARMA model cannot be used as the baseline; instead, we select an ARIMA model, the extension of the ARMA model to non-stationary series. Table 1 compares several ARIMA models. Based on Akaike's information criterion (AIC), the model with the lowest AIC is selected as the best-fitting model; ARIMA (1, 1, 0) (1, 1, 0) has the lowest AIC and therefore the best overall fit. Thus, an AR(1) plus seasonal AR(1) model with one seasonal and one non-seasonal difference is the baseline model.

Forecast model 1 (baseline model): ARIMA (1, 1, 0) (1, 1, 0)

Table 1 Selection of baseline model

Model | AIC | Log likelihood |
ARIMA(1,1,1)(1,1,1) | 208.66 | -99.33 | 114
ARIMA(0,1,0)(0,1,0) | 222.76 | -110.38 | 863.1
ARIMA(1,1,0)(1,1,0) | 205.43 | -99.71 | 212.6
ARIMA(0,1,1)(0,1,1) | 207.53 | -100.76 | 206.8
ARIMA(1,1,0)(0,1,1) | 206.94 | -100.47 | 204.6
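The AIC values in Table 1 follow from AIC = -2 log L + 2k. The check below recomputes them from the reported log likelihoods, under the assumption (not stated in the text) that k counts the AR, MA, seasonal AR and seasonal MA coefficients plus the innovation variance, with no constant term after differencing. With that count, every recomputed value matches the table to within rounding, and the selection rule picks the same baseline.

```python
# Rows of Table 1: (p, q, P, Q, log-likelihood, reported AIC); d = D = 1 throughout.
table1 = {
    "ARIMA(1,1,1)(1,1,1)": (1, 1, 1, 1, -99.33, 208.66),
    "ARIMA(0,1,0)(0,1,0)": (0, 0, 0, 0, -110.38, 222.76),
    "ARIMA(1,1,0)(1,1,0)": (1, 0, 1, 0, -99.71, 205.43),
    "ARIMA(0,1,1)(0,1,1)": (0, 1, 0, 1, -100.76, 207.53),
    "ARIMA(1,1,0)(0,1,1)": (1, 0, 0, 1, -100.47, 206.94),
}


def aic(log_likelihood, n_params):
    """Akaike information criterion: -2 log L + 2k."""
    return -2.0 * log_likelihood + 2.0 * n_params


for name, (p, q, P, Q, loglik, reported) in table1.items():
    k = p + q + P + Q + 1    # ARMA coefficients + innovation variance (assumed: no constant)
    print(f"{name}: recomputed {aic(loglik, k):.2f}, reported {reported:.2f}")

best = min(table1, key=lambda name: table1[name][5])
print("lowest AIC:", best)
```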

Figure 6 shows the actual visitor trend and the fitted values, illustrating how closely the baseline model fits the data.

Figure 6 Fitted baseline model and actual visitor volume

4.4 Wikipedia Usage Trends (WUT) Time Series

4.4.1 Selection of Wikipedia pages

Initially, eighty-six Wikipedia pages that are informative to a tourist were selected. The selection was based on the different activities and attractions that tourists pursue in Amsterdam, grouped into nine categories based on planning a trip. Information provided by Amsterdam Marketing Organization was used to select the Wikipedia pages related to attractions and activities of interest to tourists; however, not all of the attractions mentioned by Amsterdam Marketing Organization have an English Wikipedia page. Appendix 1 summarizes the categories and the related pages in each category. Page-view counts for 84 pages were extracted from the Wikipedia Usage Trends website (data were unavailable for two pages), giving eighty-four time series in total. Based on the theories and hypotheses explained earlier, we evaluate the correlation between the WUT time series and visitors. If tourists use Wikipedia as a source of information before or during a trip, page views should be significantly correlated with the actual number of tourists. Moreover, if tourists plan their trip in advance, the series should be correlated at some lag.

We apply the same exploratory analysis as for the visitor time series to each of these series. The series must be stationary before their correlation can be analyzed, because non-stationary series can show spurious correlation arising from coincidence or from a third external factor. Thus, we first conduct the ADF test and difference any series that are not stationary. Next, the cross-correlation between each series and the visitor series is computed, and pages with significant cross-correlation are selected.
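The cross-correlation step can be sketched as below. This is a minimal illustration on hypothetical, already-stationary series in which visitors follow page views by two months; `cross_corr` is a hypothetical helper that correlates the leading series at t - lag with the target at t, using full-sample means and variances as in the usual CCF estimate, and only non-negative lags (page views leading visitors) are examined.

```python
import numpy as np


def cross_corr(lead, target, lag):
    """Correlation of lead[t - lag] with target[t], for lag >= 0.

    Uses full-sample means and variances, as in the usual CCF estimate.
    A positive lag means `lead` moves before `target`.
    """
    a = np.asarray(lead, dtype=float)
    b = np.asarray(target, dtype=float)
    a, b = a - a.mean(), b - b.mean()
    n = len(a)
    return (a[: n - lag] @ b[lag:]) / np.sqrt((a @ a) * (b @ b))


# Hypothetical stationary series: visitors follow page views by 2 months, plus noise.
rng = np.random.default_rng(1)
page_views = rng.normal(size=120)
visitors = np.concatenate([rng.normal(size=2), page_views[:-2]]) + 0.3 * rng.normal(size=120)

ccf = [cross_corr(page_views, visitors, k) for k in range(7)]
bound = 1.96 / np.sqrt(len(visitors))      # approximate 95% significance bound
significant = [k for k, r in enumerate(ccf) if abs(r) > bound]
print(np.round(ccf, 3), round(bound, 3), significant)
```

A page would be kept when at least one of its cross-correlations exceeds the bound; the lag at which that happens indicates how far ahead of the visit the page is consulted.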

4.4.2 Exploratory Analysis of WUT Series

In this section, the analysis is illustrated with the "Amsterdam Metro" page as an example; the process for the remaining pages is the same. Figures 7 and 8 depict the "Amsterdam Metro" page-view data and its cross-correlation with the number of visitors. The "Amsterdam Metro" page views have a negative slope, while the number of visitors has a positive one, and the cross-correlation of the two series in figure 8 is accordingly negative. Although the CCF should only be applied to stationary series, it is computed here to show that the correlation differs when the series is not stationary. The Augmented Dickey-Fuller test shows that the series has a unit root, so it is differenced to obtain a stationary series.

Figure 7. Plot of Amsterdam Metro page views

Figure 8. Cross-correlation of Amsterdam Metro page views and visitors

One explanation for the downward trend in Wikipedia page views could be the mobile Wikipedia website, since Wikipedia Usage Trends reports page views only for the main website. The traffic shares of the two platforms can be obtained from Similarweb.com, a web analytics service; its data show that 40% of the traffic belongs to the mobile site. Differencing the time series controls for this external factor.

Based on the ADF test, the differenced "Amsterdam Metro" series is stationary in mean (figure 9). Figure 10 shows that the correlation between page views and the number of visitors at lag 0 is positive and significant, whereas before differencing the series were negatively correlated. The correlation coefficient is 0.382 according to the cross-correlation function.

Figure 9. Plot of the first-differenced "Amsterdam Metro" page-view series

Figure 10. Cross-correlation of "Amsterdam Metro" differenced page views and visitors

Cross-correlations were computed for all Wikipedia pages, and forty-two WUT series have a significant correlation with the actual number of visitors to Amsterdam. Appendix 2 lists the maximum correlation coefficient of each series with the visitor series. Most of the correlations occur at lag 0. The outcome is promising because every category contains pages correlated with visitors, which means that pages are viewed at each step of trip planning. However, since the lag is zero in most cases, people may visit Wikipedia close to their trip and plan it within a month.


Thus, hypothesis 1 is supported: the pages significantly correlated with the number of visitors support the assumption that a significant share of the page views come from visitors planning a vacation, either before or during the trip.

4.4.3 WUT series and corresponding population

To check whether the observed behavior is meaningful, we investigate the correlation of page views with the number of visitors from the United States and from European countries. Figures 11 and 12 depict the proportion and trend of visitors in three groups: visitors from European countries excluding the Netherlands, visitors from the United States, and visitors from all countries. The purpose of this selection is to check whether distance affects the behavior of Wikipedia page visitors and tourists' trip planning; hence, we selected two groups that bring large numbers of tourists to Amsterdam, one at a short and one at a long distance from the city. A further reason for selecting the United States is to examine whether the data represent the behavior of the majority of tourists from all over the world: according to web traffic analysis from Similarweb.com, almost 35% of the traffic to English Wikipedia pages comes from the United States. We therefore need to check whether the observed page views are driven by Americans or by a broader population.

As figure 11 shows, almost 50% of tourists come from European countries, so the behavior of this group explains almost half of Amsterdam's tourism market. On the other hand, the plot of visitor trends (figure 12) shows that visitors from European countries fluctuate more than the other two groups, which makes predictions and market changes harder to identify. The trend of visitors from all countries fluctuates less, so if we can capture the behavior of this group, it will be possible to predict the behavior of all visitors to Amsterdam.

Figure 11 Trend of visitors based on the ratio of visits

Figure 12 Trend of visitors

The cross-correlations of the page-view series with visitors from the US and from EU countries were computed (Appendices 3 and 4). In total, 26 and 38 Amsterdam-related Wikipedia page-view series were correlated with the number of tourists from the US and from EU countries, respectively.

One interesting result is the difference in lag between US and European visitors. For US visitors, almost 70% of the pages are correlated at a non-zero lag, whereas for European visitors only 34% of the pages are correlated at a lag and 66% at lag zero. This finding supports the intuition that tourists plan differently depending on the distance of the trip.

Furthermore, of the 42 pages correlated with the number of tourists from all countries, 45% and 71% overlap with the pages correlated with the number of tourists from the US and from European countries, respectively. However, the lags and correlation strengths for those pages differ. This evidence suggests that the Wikipedia page views belong to a larger population and are not representative of tourists from the US or Europe alone. Thus, the pages selected based on a significant correlation with the visitor series (visitors from all countries) are a suitable indicator for evaluating all visitors.

4.4.4 Aggregate WUT Time Series

We constructed a single time series by aggregating all WUT series with a significant correlation with the actual number of visitors: for each month, the page views of the forty-two pages were summed. The sum is used so that the volume of page views enters the series, allowing us to examine whether the effect is driven by the volume of Amsterdam-related page views.
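The selection-then-sum step can be sketched as follows. Everything in the example is hypothetical (page names, page-view counts, correlation values); the only assumptions about the method are that a page is kept when its maximum absolute cross-correlation exceeds the approximate 95% bound 1.96/√n, and that the kept pages' raw monthly views are summed. With n = 35 differenced months the bound is about 0.331, which a coefficient such as the 0.382 reported for "Amsterdam Metro" would clear.

```python
import numpy as np

# Hypothetical monthly page views (three months shown) for four pages.
page_views = {
    "Page A": [1200, 1150, 1300],
    "Page B": [5400, 5600, 6100],
    "Page C": [2100, 2000, 2300],
    "Page D": [900, 950, 870],
}
# Hypothetical maximum |cross-correlation| of each differenced series with visitors.
max_abs_r = {"Page A": 0.382, "Page B": 0.41, "Page C": 0.35, "Page D": 0.12}

n_months = 35                          # e.g. 36 months minus one lost to differencing
bound = 1.96 / np.sqrt(n_months)       # approximate 95% bound, about 0.331
selected = [page for page, r in max_abs_r.items() if r > bound]

aggregate = np.sum([page_views[p] for p in selected], axis=0)   # monthly total views
print(selected, aggregate)
```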

For the rest of the analysis, we use the aggregated series to represent the WUT data. Figure 13 shows the cross-correlogram between the differenced WUT and visitor series. There is a clear positive correlation (r = 0.385) between the two series at a four-month lag, where the line crosses the 95% confidence bound. Although most of the individual correlations were at lag zero, the cumulative effect of volume changed the lag order: more people visited Amsterdam-related pages four months before their trip than in the month of the trip. At lag zero, the line barely touches the 95% confidence bound (blue line), indicating only a small correlation between the two series at that lag. Moreover, page views are not correlated with visitors after travel, which means that travelers do not use Wikipedia as a source of information after their vacation.

Figure 13 CCF plot: cross-correlogram of the differenced Wikipedia usage trend and visitor series

4.4.5 Granger Causality Analysis

To investigate whether the number of tourists can be predicted from WUT, a Granger causality test is performed. This test determines whether one variable has predictive power over another, and its robustness depends on the chosen lag length. The previous section showed that the two series have a significant correlation at a four-month lag, so we perform the test with a lag order of four on the differenced series; as mentioned before, all of the analysis is done on stationary series. The result in table 2 shows that the WUT series Granger-causes the number of visitors, with a highly significant p-value of 0.001524.

Thus, hypothesis 3 is supported: the correlated Wikipedia page views reflect travel intention, so a model that forecasts visitor behavior from WUT data can be built.
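A Granger causality test of this kind can be sketched with two least-squares regressions. The sketch follows the standard F-test formulation: a restricted autoregression of y on a constant and its own p lags is compared with an unrestricted one that also includes p lags of x. It is not claimed to be the exact routine used in this study, `granger_f` is a hypothetical name, and the example series are synthetic. If SciPy is available, a p-value follows from `scipy.stats.f.sf(F, df1, df2)`.

```python
import numpy as np


def granger_f(y, x, p):
    """F-test of H0: lags 1..p of x do not help predict y beyond y's own p lags.

    Returns (F, df1, df2). Both series should already be stationary.
    """
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    n = len(y)
    target = y[p:]
    const = np.ones(n - p)
    y_lags = np.column_stack([y[p - i : n - i] for i in range(1, p + 1)])   # y_{t-1..t-p}
    x_lags = np.column_stack([x[p - i : n - i] for i in range(1, p + 1)])   # x_{t-1..t-p}
    restricted = np.column_stack([const, y_lags])
    unrestricted = np.column_stack([const, y_lags, x_lags])

    def rss(design):
        beta, *_ = np.linalg.lstsq(design, target, rcond=None)
        resid = target - design @ beta
        return resid @ resid

    df1 = p
    df2 = len(target) - unrestricted.shape[1]
    F = ((rss(restricted) - rss(unrestricted)) / df1) / (rss(unrestricted) / df2)
    return F, df1, df2


# Hypothetical example: x drives y with a one-period delay.
rng = np.random.default_rng(2)
x = rng.normal(size=200)
y = np.zeros(200)
for t in range(1, 200):
    y[t] = 0.5 * y[t - 1] + 0.8 * x[t - 1] + 0.2 * rng.normal()

print(granger_f(y, x, 4))   # large F: past x helps predict y
print(granger_f(x, y, 4))   # reverse direction: no real effect built in
```

The direction matters: the first argument is the series being predicted, so swapping the arguments tests the opposite causal direction.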


Table 2 Granger Causality test for WUT data and number of visitors

Null hypothesis | Lag order | F-statistic | P-value
WUT does not Granger Cause Visitor | 4 | 5.523 | 0.003107
Visitor does not Granger Cause WUT | 4 | 6.32 | 0.001524

4.4.6 Wikipedia Usage Trends Model

We extend the baseline ARIMA model by adding the Wikipedia usage trend data as a regressor to increase its predictive power. Because the WUT series leads the visitor series by four months, it allows the number of visitors to be predicted four months ahead.

Forecast model 2: ARIMA (1, 1, 0) (1, 1, 0) + regressor (WUT)
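The practical core of a lagged regressor is the alignment of the two series. The sketch below, with hypothetical helper names and hypothetical data, pairs visitors at month t with WUT at month t - 4 and shows that the last four observed WUT values are exactly the inputs needed for the next four forecasts. It covers only the alignment; fitting the ARIMA part with an exogenous regressor is left to a time-series library (in Python, one option is statsmodels' `SARIMAX(endog, exog=..., order=(1, 1, 0), seasonal_order=(1, 1, 0, 12))`), and this section does not state which software the study used.

```python
def align_lagged_regressor(regressor, target, lag):
    """Pair target[t] with regressor[t - lag]; assumes equal-length monthly series."""
    n = len(target)
    return list(regressor[: n - lag]), list(target[lag:])


def future_regressor_values(regressor, lag):
    """Already-observed regressor values that drive the next `lag` target months."""
    return list(regressor[len(regressor) - lag :])


# Hypothetical monthly series of equal length.
wut = [10, 12, 11, 15, 14, 16, 18, 17, 20, 21]
visitors = [300, 310, 305, 320, 330, 335, 340, 345, 360, 365]

x, y = align_lagged_regressor(wut, visitors, lag=4)
print(list(zip(x, y)))                        # (WUT at t-4, visitors at t)
print(future_regressor_values(wut, lag=4))    # inputs for the next four forecasts
```

A four-month lead therefore buys a four-month forecast horizon without having to forecast the regressor itself.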

To evaluate our theory, we compare the model with the exogenous variable against the baseline model. Whether the new variable improves on the baseline is assessed by comparing the models' prediction accuracy in terms of the mean absolute percentage error (MAPE) and the root mean squared error (RMSE).
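The accuracy measures reported in the forecast comparison tables can be computed as in the sketch below. Errors are defined as actual minus forecast, so a negative MPE means the forecasts were on average too high. The MASE scaling is an assumption: the test-set MAE is divided by the in-sample MAE of a seasonal naive forecast with period m = 12, modelled on the convention R's forecast package documents for seasonal series. The function name and the example numbers are hypothetical.

```python
import numpy as np


def forecast_accuracy(actual, forecast, train, m=12):
    """Test-set accuracy measures; errors are actual - forecast.

    MASE scales the test MAE by the in-sample MAE of a seasonal naive
    forecast with period m (assumed convention).
    """
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    train = np.asarray(train, dtype=float)
    err = actual - forecast
    scale = np.mean(np.abs(train[m:] - train[:-m]))   # seasonal naive in-sample MAE
    return {
        "RMSE": float(np.sqrt(np.mean(err ** 2))),
        "MAE": float(np.mean(np.abs(err))),
        "MPE": float(np.mean(100.0 * err / actual)),
        "MAPE": float(np.mean(100.0 * np.abs(err) / actual)),
        "MASE": float(np.mean(np.abs(err)) / scale),
    }


# Hypothetical two-month test set and a 24-month training series.
train = np.arange(24, dtype=float)   # every 12-month difference equals 12
print(forecast_accuracy([100.0, 200.0], [110.0, 190.0], train))
```

RMSE penalizes large misses more heavily, while MAPE expresses the average miss relative to the size of the series, which makes it comparable across periods with different visitor volumes.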

4.4.7 Forecasting Power of Models

We used 36 data points for the training set and 6 for the test set. For the six-month forecast, adding WUT data increased the root mean squared error (RMSE) from 28.6 to 30.7 and the mean absolute percentage error (MAPE) from 4.3 to 4.5, a decline in prediction accuracy of around 5% (see table 3). Although the Granger causality test in the previous section confirmed that the Wikipedia data have predictive power, that power is not as strong as that of the baseline model. The results could have appeared promising if a simple baseline model had been chosen.


Table 3 Forecast comparison of models

Model | RMSE | MAE | MPE | MAPE | MASE
Forecast Model 1 | 28.61201 | 23.49119 | -2.93066 | 4.302963 | 0.57057
Forecast Model 2 | 30.67133 | 24.51263 | -3.4746 | 4.511053 | 0.59538

4.5 Google Trends (GT) time series

4.5.1 Query Selection

Identifying relevant queries is extremely hard because thousands of keywords could be used. Moreover, finding queries that reveal travel intention is challenging, since some queries serve multiple purposes. For instance, a user who searches for "Amsterdam weather" might be an Amsterdam resident checking when the rain will stop, or a tourist trying to find a suitable time to visit Amsterdam. Two aspects therefore need to be considered. First, the selected queries should cover a representative sample of the population. Second, they should reveal the purpose of the information-seeking behavior. To address these concerns, a systematic query selection process is employed.

Google Trends provides the top searches for a query, so a simple query can be used to extract the top searched queries related to that term. We used the term "Amsterdam" as the first seed query in order to obtain top searched queries with a large volume. Interestingly, the first and second top searched queries for Amsterdam are "Amsterdam hotel" and "Amsterdam airport". Queries related to planning a trip to Amsterdam were then retrieved recursively, using search terms extracted from the first seed. To cover all aspects of travel planning behavior, additional queries not suggested by Google Trends were added to the database. In total, 238 distinct search queries were selected. We divided the queries into 8 categories to examine which types of information are of particular interest to tourists.

4.5.2 Exploratory Analysis of Google Trends Series

Google Trends provides weekly data, so the weekly data are first converted to monthly indices. Next, the raw data are explored, the ADF test is applied, and non-stationary series are differenced to make them stationary. Then, the cross-correlograms between the search query series and the number of visitors are computed.
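The text does not say how weekly values were mapped to months, so the sketch below makes one simple assumption explicit: each week is assigned to the calendar month in which it starts, and the weekly index values in a month are averaged. Summing instead of averaging, or splitting weeks that straddle two months in proportion to their days, are equally plausible alternatives. The helper name and the data are hypothetical.

```python
from collections import defaultdict
from datetime import date


def weekly_to_monthly(weekly):
    """Average weekly index values by the calendar month in which each week starts.

    `weekly` is a list of (week_start_date, index_value) pairs.
    Returns {(year, month): mean index}, ordered by month.
    """
    buckets = defaultdict(list)
    for week_start, value in weekly:
        buckets[(week_start.year, week_start.month)].append(value)
    return {month: sum(vals) / len(vals) for month, vals in sorted(buckets.items())}


# Hypothetical weekly Google Trends values.
weekly = [
    (date(2015, 1, 4), 60), (date(2015, 1, 11), 62), (date(2015, 1, 18), 58),
    (date(2015, 1, 25), 64), (date(2015, 2, 1), 70), (date(2015, 2, 8), 72),
]
print(weekly_to_monthly(weekly))
```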

The cross-correlation results for the series significantly correlated with the number of visitors are summarized in appendix 5. In total, 134 search queries (56% of the search terms) were significantly correlated with the number of visitors, so hypothesis 2 is supported.

Table 4 presents the percentage of correlated queries in each category. The percentages for the different lag orders can sum to more than 100% because some queries were significantly correlated at more than one lag.

Table 4 Percentage of correlated queries in each category, and queries with lag order of 0, 1 and 2.

Category | % of queries with significant correlation | % of correlated queries with lag order 0 | % of correlated queries with lag order 1 or 2
Accommodation | 63% | 81% | 35%
General information | 56% | 80% | 7%
Market & Shop | 80% | 50% | 0%
Museum | 64% | 52% | 10%
Neighborhood & Towns | 53% | 60% | 20%
Tourist Info | 74% | 70% | 40%
Tours & Attractions | 69% | 45% | 64%

4.5.3 Aggregate Google Trends series

Since queries correlated only at lag zero have no predictive power, we eliminate them from the analysis and aggregate the queries that share the same lag order. Studies in this area aggregate search indices in different ways: some simply sum the indices, while others average them. To avoid overfitting and spurious correlation, we submit a group of queries together and extract the index for the entire group. However, it is not possible to use more than 30 characters at one time, so in some cases we had to use more than one group. For instance, the queries with a one-month lag were divided into four groups, the data were extracted for each group, and the indices of those four groups were summed. In the end, we had one time series for each lag order.
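The final aggregation step, summing the extracted group indices into one series per lag order, can be sketched as below. The group names and index values are hypothetical; the only assumption is that each extracted group is tagged with the lag order of its queries and that groups with the same lag order are added month by month.

```python
from collections import defaultdict

# Hypothetical extracted indices: group name -> (lag order, monthly index values).
groups = {
    "lag1_group1": (1, [40, 55, 60]),
    "lag1_group2": (1, [20, 25, 30]),
    "lag1_group3": (1, [10, 12, 15]),
    "lag1_group4": (1, [5, 8, 9]),
    "lag2_group1": (2, [70, 65, 80]),
}


def aggregate_by_lag(groups):
    """Sum group-level indices month by month, giving one series per lag order."""
    totals = defaultdict(list)
    for lag, series in groups.values():
        if not totals[lag]:
            totals[lag] = list(series)       # first group seen for this lag order
        else:
            totals[lag] = [a + b for a, b in zip(totals[lag], series)]
    return dict(totals)


print(aggregate_by_lag(groups))
```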

4.5.4 Google Trends Models

Before constructing the models, we examined the cross-correlation between each aggregated series and the visitor series; all of the aggregated series were significantly correlated with the number of visitors. We then selected the series with lag orders 1, 2, and 5 to investigate whether the lag order affects predictive power. Figure 14 shows the CCF plots and the significant correlation coefficients of these series.

(a) CCF for lag order 1, r = 0.503
(b) CCF for lag order 2, r = 0.600
(c) CCF for lag order 5, r = 0.587

Figure 14 CCF plots for three lag orders

As with the Wikipedia model, the aggregated series is added as a simple regressor to the ARIMA model. Three models are built, each with a different lag order:

Forecast model 1 (baseline): ARIMA (1, 1, 0) (1, 1, 0)

Forecast model 2: ARIMA (1, 1, 0) (1, 1, 0) + regressor (GT d=1)

Forecast model 3: ARIMA (1, 1, 0) (1, 1, 0) + regressor (GT d=2)

Forecast model 4: ARIMA (1, 1, 0) (1, 1, 0) + regressor (GT d=5)

Table 5 compares the overall fit of the regression models with that of the baseline model. Based on the Akaike Information Criterion (AIC), model 4 presented a better fit. Moreover, the baseline model outperforms model 2.

Table 5 Regression comparison fit of the models with different lags

Model | AIC | BIC | Log likelihood |
Model 1-Baseline | 205.43 | 208.84 | -99.71 | 233
Model 2- d=1 | 207.17 | 211.71 | -99.58 | 257.3
Model 3- d=2 | 204.98 | 209.53 | -98.49 | 229.2
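The information criteria in Table 5 can be reproduced from the log likelihoods, which also shows how many observations the fit effectively uses. Two assumptions are made here: the parameter count is the AR(1) and seasonal AR(1) coefficients plus the innovation variance, with one extra coefficient for the regressor; and the effective sample size is the 36 training months minus one regular and one seasonal difference, n = 23. Under these assumptions, both AIC = -2 log L + 2k and BIC = -2 log L + k ln n match the reported values to within rounding.

```python
import math

# Rows of Table 5: (parameter count, log-likelihood, reported AIC, reported BIC).
# Assumed counts: AR(1) + SAR(1) + innovation variance, plus one regressor coefficient.
table5 = {
    "Model 1-Baseline": (3, -99.71, 205.43, 208.84),
    "Model 2- d=1": (4, -99.58, 207.17, 211.71),
    "Model 3- d=2": (4, -98.49, 204.98, 209.53),
}
n_eff = 36 - 1 - 12   # training months minus one regular and one seasonal difference


def aic(loglik, k):
    """Akaike information criterion: -2 log L + 2k."""
    return -2.0 * loglik + 2.0 * k


def bic(loglik, k, n):
    """Bayesian information criterion: -2 log L + k ln(n)."""
    return -2.0 * loglik + k * math.log(n)


for name, (k, loglik, aic_rep, bic_rep) in table5.items():
    print(f"{name}: AIC {aic(loglik, k):.2f} (reported {aic_rep}), "
          f"BIC {bic(loglik, k, n_eff):.2f} (reported {bic_rep})")
```

Because BIC charges ln 23 ≈ 3.14 per parameter instead of 2, it penalizes the extra regressor more heavily than AIC does.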


The Granger causality relationship between the Google Trends series and the visitor series was tested, and table 6 shows the results. All three Google Trends series have predictive power, meaning that the number of visitors can be predicted from Google Trends data. This supports hypothesis 4.

Table 6 Granger Causality test for Google Trends time series and visitor time series

Null hypothesis | Lag order | F-statistic | P-value
GT-1 does not Granger Cause Visitor | 1 | 1.2378 | 0.2745
Visitor does not Granger Cause GT-1 | 1 | 11.944 | 0.001612**
GT-2 does not Granger Cause Visitor | 2 | 2.0818 | 0.1436
Visitor does not Granger Cause GT-2 | 2 | 10.053 | 0.0005122***
GT-5 does not Granger Cause Visitor | 5 | 1.491 | 0.2393
Visitor does not Granger Cause GT-5 | 5 | 3.9473 | 0.01268*

Significance codes: 0 '***' 0.001 '**' 0.01 '*' 0.05

4.5.5 Forecasting Power of Models

To test the predictive power of the models, their prediction errors were computed. Table 7 presents the results.

The models with queries lagged one or two months outperformed the baseline model, with substantial improvements in mean absolute percentage error (MAPE) of 18% and 26%, respectively. Although the baseline model fit better than model 2, its prediction accuracy was lower. Figure 15 explains this inconsistency. It presents the volume of visitors from 2012 to 2016; it is the same plot as figure 2, but with the test period included. The mean number of visitors jumped each year, producing an upward trend. In the last year, however, which is the test period, the average monthly volume was almost the same as in the year before. This is why the baseline model, despite its relatively good fit, could not predict as well as the models that included Google Trends data. The reason for this change in behavior is not clear, but including Google Trends data allows the model to capture such changes. Thus, the baseline model could work well in a stable market, but when the market is unstable, for example during a crisis or other market shocks, consumers' information acquisition behavior can have higher predictive power.

Table 7 Forecast comparison of models

Model | RMSE | MAE | MAPE | MASE | Improvement in MAPE
Model 1-Baseline | 28.61201 | 23.491188 | 4.302963 | 0.57057 | -
Model 2- d=1 | 22.7591 | 19.088624 | 3.514946 | 0.463638 | 18%
Model 3- d=2 | 21.12149 | 17.484172 | 3.193637 | 0.424668 | 26%
Model 4- d=5 | 29.19336 | 23.758263 | 4.461833 | 0.577057 | -4%

Figure 15 Trend of Amsterdam visitors volume

Model 4 underperformed the baseline model, which shows that Google Trends data predict well over a short horizon, whereas over a longer horizon their performance is almost the same as that of the baseline model. This finding is consistent with previous research showing that search data can predict and explain short-term developments.
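The percentage changes in MAPE quoted in this chapter are relative changes with respect to the baseline's test-set MAPE of 4.302963. The short check below recomputes them from the MAPE values in Tables 3 and 7, reproducing the 18%, 26% and -4% in Table 7 and the decline of around 5% reported for the Wikipedia model; the helper name is hypothetical.

```python
# Test-set MAPE values reported in Tables 3 and 7.
baseline_mape = 4.302963
model_mape = {
    "WUT regressor": 4.511053,
    "GT d=1": 3.514946,
    "GT d=2": 3.193637,
    "GT d=5": 4.461833,
}


def relative_improvement(baseline, model):
    """Percentage reduction in MAPE relative to the baseline (negative = worse)."""
    return 100.0 * (baseline - model) / baseline


for name, mape in model_mape.items():
    print(f"{name}: {relative_improvement(baseline_mape, mape):.1f}%")
```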


In sum, the results support hypothesis 5: the prediction accuracy with Google search engine data outperformed that with Wikipedia data. Hypothesis 6 is partially supported: predictive performance improved when the lag order decreased from 5 months to 1 or 2 months, but the model with lag 1 performed slightly worse than the model with lag 2.

5. Conclusion and Discussion

5.1 Wikipedia Usage Trends

This study demonstrates the value of Wikipedia Usage Trend data in predicting the intentions and even the actual behavior of tourists. The results show that 50% of the Wikipedia page-view series (42 of 84) had a significant correlation with the number of tourists in Amsterdam; therefore, hypothesis 1 is supported. We validated that the information acquisition behavior of Wikipedia visitors is goal-oriented and that the significantly correlated pages are mostly used by tourists for vacation planning.

Pre-travel: The aggregated page views had a significant correlation with the number of visitors at lag order four. The correlation at lag zero is weaker than at lag four, and the line barely touches the 95% confidence bound. This can be explained by the volume of the data: more page views occur at the longer lag. It also confirms that Wikipedia is used as a medium for obtaining information to evaluate and select destinations. According to travel buying models, information search is a process that is first used to select a destination, then to buy travel products, and finally during the trip.

During travel: Although the cross-correlation at lag zero confirms that visitors use this medium during their vacation or close to it, this correlation is not as strong as at lag 4. Moreover, correlated pages are found in all categories, reflecting information acquisition at each step of planning a vacation.

Post-travel: There is no significant correlation between page views and visitors after travel, so visitors do not use Wikipedia as a source of information after their trip. This is in line with previous findings that visitors use more interactive environments to acquire and share information (Gretzel et al., 2006).

The Granger causality test supported hypothesis 3: the aggregated page views have predictive power at lag order 4. In other words, correlated page views can be used as an indicator of travel intention. A predictive model was built using the page-view data; however, it underperformed the baseline model by 5%, because the selected baseline model is not simple and has strong predictive power.

The results for visitors from different geographical regions confirm previous findings that culture, prior knowledge, past experience, and destination affect individuals' information search behavior (Gretzel & Hwang, 2006; Kerstetter & Cho, 2004; Money & Crotts, 2003). We compared the information search behavior of visitors from three regions: (1) all over the world, (2) Europe, and (3) the United States. Each group showed different behavior in the information acquisition process.

One interesting finding is that the correlation between page views and US visitors has the longest lag of the three groups: almost 70% of the pages are correlated at a lag for US visitors, compared with 34% for European visitors. The available data do not allow us to determine the reasons for this large difference; as mentioned earlier, culture, past experience, knowledge, and destination city may all play a role. Since European visitors are more familiar with vacation planning in Europe and the surrounding area, they may need less information to select a destination. The fluctuating behavior of European visitors and their pages correlated without a lag strongly indicate that they plan vacations spontaneously and within a month.

5.2 Google Trends

In this research, we found clear evidence that the number of tourists in Amsterdam is correlated with the monthly search volume of Amsterdam-related keywords (hypothesis 2). In total, 56% of the examined search queries had a significant correlation with the number of visitors, meaning that a substantial number of visitors used these queries while looking for information to select Amsterdam as a destination or to plan a vacation there. The lag analysis shows results similar to those for Wikipedia: queries reflecting planning before and during travel were significantly correlated with the number of tourists, but no significant correlation was found in the post-travel phase.

According to our exploratory analysis of the Google Trends series (Table 4), the transportation category had the lowest share of correlated queries (38%), while categories such as market and tourist info had the highest. This suggests that categories like transportation contain queries used by both tourists and locals, although their use by tourists was still significant. It also explains why the shares of correlated queries in categories like general information and neighborhoods and towns were not as high as in the rest. Thus, using more specific queries related to vacation planning and tourism increases the likelihood that the queries are used by tourists rather than residents.


Furthermore, each category shows a different lag order, representing different aspects of trip planning behavior. For instance, in the accommodation category, 81% of the correlated queries had lag order zero and 35% had a lag order of one or two months. Interestingly, the queries with a lag are more general than those without. This behavior is not unexpected, since customers tend to search for general information at the beginning and narrow down their choices at the end. We see the same behavior in this study: people first search for general information such as "hotels Amsterdam center" with a one-month lag, and then for specific brands such as "Hilton hotel Amsterdam". This confirms previous studies of travel buying models, which state that tourism consumers first look for information to select a destination and then search for information to buy the travel product (Mathieson et al., 1982). Thus, the more specific a query, the more purchase-related it becomes.

The Granger causality test supported hypothesis 4 regarding the predictive power of Google search queries as an indication of travel intention. Moreover, when added as regressors to the ARIMA model, search queries with lag orders of one and two outperformed the baseline model by 18% and 26%, respectively. In contrast, the lag order of five months had the lowest accuracy. This is in line with the theory of planned behavior, which states that there is a temporal effect in measuring intention: the closer the measurement is to the behavior, the better the measured intention predicts the actual behavior (hypothesis 6). Hence, hypothesis 6 was partially supported.

The predictive power of the model with Google search engine data is higher than that of the Wikipedia model, so hypothesis 5 was supported. As explained before, the underlying reason is the availability of transactional information in the Google search engine.

5.3 Implications

5.3.1 Implications for Practice

First, user search queries can provide great insight when analyzing a market and identifying consumers' interest in a specific product. In this study, attractive and popular places were significantly correlated with visitors, indicating visitors' interest in those places. This information can be used for marketing mix decisions. For instance, Google Trends data can reveal a product's life-cycle phase, which is highly valuable for identifying potential sales opportunities. Regarding promotion and communication, content managers can use popular keywords in digital campaigns to increase engagement and boost sales. Moreover, distribution strategy can be more effective if companies know in which regions interest in a certain product is higher.

Second, user search query data is a powerful tool for competitive intelligence analysis. Keywords related to competitors' products and brands can reveal their performance and popularity among consumers. Furthermore, to some extent, such data can indicate the market share of products.

Finally, user search query data is a useful tool for brand managers to measure brand awareness. However, marketers should be aware that in some situations user search query data should be used cautiously. In this study, the lag order is mostly zero to one month, but the dominant lag order may differ by industry, since industries such as real estate and high-tech have different structures. For example, when search indices for real estate queries increase, the effect on the market will appear with a longer lag than for products such as mobile phones.


Moreover, products and brands from premium categories should not be compared with mass-market products. Since brand awareness for mass-market products is higher than for premium products, the search index volume for mass-market brands will be higher than that for premium brands.

5.3.2 Implications for Amsterdam Marketing Organization

Query selection is an important step in building an accurate predictive model, since travelers (especially foreign travelers) plan their trips in different stages. Hence, depending on the planning stage, each query may have a different lag period and a different effect on the model. The list below presents the types of queries that should be selected for the different strategic approaches that Amsterdam Marketing Organization could take:

1. Overall queries from the Tourist Info and Accommodation categories can be used to target tourists, find out what and when they plan, and predict their behavior.

2. To predict visitor volume within a month, especially before events, queries from the Accommodation, Tourist Info, and General Information (especially weather-related queries) categories are more powerful.

3. To predict visitor volume one or two months ahead, the focus should be on queries from the Attractions and Transportation categories.

4. To predict visitor volume three to five months ahead, the focus should be on queries from the Museum category.

5. Since European tourist demand is not stable and 50% of tourists come from Europe, search queries from this region need to be examined separately.
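Recommendations 2 to 4 above can be encoded as a simple lookup from forecast horizon to query categories, for example to drive automated query selection. This is a hypothetical sketch: the names are illustrative, treating "within a month" as a horizon of 0 months is an assumption, and recommendation 5 concerns regions rather than horizons, so it is not encoded here.

```python
# Hypothetical encoding of recommendations 2-4: forecast horizon (months) -> query categories.
HORIZON_CATEGORIES = {
    0: ("Accommodation", "Tourist Info", "General Information"),  # assumed: "within a month"
    1: ("Attractions", "Transportation"),
    2: ("Attractions", "Transportation"),
    3: ("Museum",),
    4: ("Museum",),
    5: ("Museum",),
}


def categories_for_horizon(months_ahead):
    """Return the query categories suggested for a forecast horizon of 0 to 5 months."""
    if months_ahead not in HORIZON_CATEGORIES:
        raise ValueError("recommendations cover horizons of 0 to 5 months only")
    return HORIZON_CATEGORIES[months_ahead]


print(categories_for_horizon(2))  # prints ('Attractions', 'Transportation')
```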
