
What kind of behavior is indicative for a successful customer journey?


Academic year: 2021


What kind of behavior is indicative for a successful customer journey?

An investigation into the factors that ensure successful progression to each of the phases in the customer journey.

Heleen van Sonsbeek, s3756416
University of Groningen
Faculty of Economics and Business
MSc Marketing
Master thesis
17th of June 2019
First supervisor: dr. F. Beke


Management summary

Various studies show that it is of paramount importance for companies to have a successful online shopping environment, as this gives them a big competitive advantage. In addition, it is important to master this area in order to survive as a company and to maintain a leading position in its respective market(s).

In order to master a successful online shopping environment, companies have to ensure a smooth and optimal customer journey. To embed this customer journey successfully, they need a thorough understanding of the journey and of the behavioral characteristics indicative of each of its stages. This begins with knowing how the different phases are distinguished in an online setting. This research outlines several guidelines that illustrate this.

The research focused on each of the phases individually, zooming in on which behavioral characteristics contribute to a customer successfully progressing to a next phase.

It commences with the first phase, the problem recognition phase. It was hypothesized that reading inspirational content positively affects the progression of customers from this phase to the next, the consideration phase. This proved not to be the case. On the contrary, customers who read inspirational content were less likely to progress to the second phase of the customer journey than those who did not. This can be due to a variety of reasons. It could be that people use the website merely for reading blogs and either make their purchases offline or through another channel, or do not make any purchase at all. Although the data provided had limitations (only a small percentage of people read blogs), the advice here is to consider carefully whether to include inspirational content in this phase, and possibly to include it in the next phase.

For the successful progression from the second to the third phase, the evaluation of alternatives phase, the impact of exposure to online ads during a customer-initiated contact with the website was examined. The results show that customers who were exposed to those ads were more likely to progress to the next phase. In addition, the results show that in this phase exposure to firm-initiated ads, such as email advertising, negatively impacts the progression of the customer to the next phase. It is therefore advised that customers who are identified as being in this phase are not exposed to such ads. Rather, it is recommended to direct budgets towards ads that customers are exposed to during their search, such as paid search ads. The results show that customers who use these ads to enter the website are just as likely to progress to the next phase as customers who enter the website directly, without the use of any ad. Paid search ads therefore show considerable potential.


product. Providing the right information is therefore essential. Additionally, it is advisable to provide the customer with special offers at this point.

Lastly, for the successful progression from phase four to phase five, the post-purchase phase, the impact of purchase history was examined. It was predicted that customers who have a purchase history with the company are more likely to make a purchase again than customers who do not. This proved to be true, once again confirming that loyal customers are the most profitable to the company. Paying specific attention to this category of existing customers, for instance through targeted marketing or special offers, is a worthwhile way to increase the company's share-of-wallet among these customers.

The research, however, found no significant difference between products in the behaviors that lead to a successful progression to the next phase. Most likely this is because very limited data was recorded on the price of the products viewed, as this data was only available for customers who actually made a purchase. It is therefore recommended to record, along with the page views, a separate column containing the price or price segment of the products being viewed, and to run these tests again.

The findings of this research do, on the other hand, enable companies to classify customers into the different phases and to identify which customers are more likely to progress to the next phase. These customers are the most profitable for the company, since they are most likely to make a purchase. The focus should therefore be on identifying these customers, collecting the relevant data on their behavior and applying the right tools to maximize the probability that they will successfully progress to the next phase.

Finally, this research sheds light on the data variables that are important for the company to record, as this will significantly enhance insight into the behavioral aspects of a customer. As mentioned earlier, the price of the product viewed by the customer should be recorded so that its effect can be tested throughout all phases. It would also help to record when specific products are added to the cart, as this improves the classification of the different phases. For this research, assumptions had to be made around phases three and four since this data was not available.

Overall, the research has provided highly valuable insights based on theoretical investigations and detailed analysis of the data. The question of what kind of behavior is indicative for a successful customer journey, and which factors ensure successful progression into each of the phases in the customer journey, has been successfully answered.


Table of contents

Management summary
1 Introduction
1.1 Problem statement and research questions
1.2 Academic and managerial relevance
1.3 Structure of the research
2 Theoretical framework
2.1 Customer journey
2.1.1 The different phases
2.1.2 Impact of digitalization on the customer journey
2.1.3 Identification of the different phases in an online setting and successful progressions
2.2 Furniture industry and its developments
2.3 Conceptual framework
3 Methods
3.1 Data description
3.2 Data preparation
3.2.1 Distinction of the different phases
3.2.2 Dependent variable
3.2.3 Independent variables
3.2.4 Control variable(s)
3.2.5 Moderating variable(s)
3.3 Data analysis
3.3.1 Model(s) specification
3.3.1.1 Model 1
3.3.1.2 Model 2
3.3.1.3 Model 3
3.3.1.4 Model 4
3.3.2 Plan for testing the various hypotheses
3.3.2.1 Phase one
3.3.2.2 Phase two
3.3.2.3 Phase three
3.3.2.4 Phase four
4 Results
4.1 The different phases
4.1.1 Descriptive statistics
4.2 Outlier handling
4.3 Results of the models
4.3.1 Multicollinearity
4.3.2 Phase one
4.3.2.2 Estimation results of model 1
4.3.2.3 Results in relation to Hypothesis one
4.3.3 Phase two
4.3.3.1 Selection of model
4.3.3.2 Estimation results of model 2
4.3.3.3 Results in relation to Hypothesis two
4.3.4 Phase three
4.3.4.1 Selection of model
4.3.4.2 Estimation results of model 3
4.3.4.3 Results in relation to Hypothesis three and five
4.3.5 Phase four
4.3.5.1 Selection of model
4.3.5.2 Estimation results of model 4
4.3.5.3 Results in relation to Hypotheses four and six
4.3.6 Summary Results in relation to Hypotheses
5 Conclusion & Discussion
5.1 Conclusions and discussion
5.2 Academic implications
5.3 Limitations and future research
6 References
7 Appendix
7.1 Datasets variables available
7.2 Dataset infographic
7.3 Steps taken for final dataset analysis
7.4 Phase distinguishment
7.5 Results
7.5.1 Results phase one
7.5.1.1 Confusion matrices
7.5.1.2 Log likelihood tests
7.5.1.3 Outputs of different models
7.5.1.4 Marginal effects
7.5.2 Results phase two
7.5.2.1 Hitrate tables
7.5.2.2 Log likelihood tests
7.5.2.3 Individual marginal effects
7.5.2.4 Comparable marginal effects
7.5.2.5 Individual coefficients
7.5.2.6 Comparable coefficients
7.5.3 Results phase three
7.5.3.1 Confusion matrices
7.5.3.2 Log likelihood tests
7.5.3.3 Marginal effects
7.5.3.4 Coefficients
7.5.4 Results phase four
7.5.4.2 Log likelihood tests
7.5.4.3 Marginal effects


1 Introduction

In the last decade, a tremendous shift has taken place from offline to online purchases. Almost a third of people report buying something online weekly, and the number of people who never shop online keeps dropping (Price Waterhouse Coopers, 2019). Instead of heading to the shopping center at the weekend to look for, say, a new couch, more and more people open their electronic devices and start their quest for a new product online. Research has shown that up to 70% of consumers start their search for information about a new product online, through retail websites (PriceWaterhouse Coopers, 2015). Retailers have naturally responded by using their retail websites as a complement to their offline channels to increase sales. It is even said that “multichannel retailers dominate today’s retail landscape” (Zhang et al., 2010, p. 168). The shift from offline to online purchases has grown dramatically over the past years, and this trend is predicted to continue (Statista, e-commerce worldwide).

This trend can also be observed in the profitability of companies. A company like Amazon, for instance, generates tremendous profits due to its large number of loyal customers (Hunersen, 2018). Studies have shown that this is caused by its optimization of the customer journey. It has been said that the smoother the customer journey, the more frequently people will shop and the more they will spend (Price Waterhouse Coopers, 2019).

This shows that companies that have succeeded in embedding a successful online shopping environment possess a clear competitive advantage over those that have not yet done so. It also shows that a company has to excel in this area in order to survive and stay a leader in its respective market(s) (Lemon and Verhoef, 2016; Melis et al., 2016).

For any company to succeed, the optimization of the customer experience, which is defined as the “strategic process of creating holistic customer value, achieving differentiation and sustainable competitive advantage”, is of paramount importance (Carbone and Haeckel, 1994; Pine and Gilmore, 1998; Shaw and Ivens, 2002; Gentile et al., 2007; Verhoef et al., 2009; Jain et al., 2015). As a result, over 70% of company executives have heavily prioritized the customer experience (Pattek, Fenwick, 2016; Truettner et al., 2016). This includes, as mentioned before, the optimization of the customer journey, which is defined as the process of transitioning from a potential customer to a loyal customer (Payne and Ballantyne, 1991). However, the above-mentioned shift from offline to online purchases has brought about many challenges (Campo, Breugelmans, 2015). This is because the way the customer journey is executed has changed, leading to significant behavioral changes. Online behavior differs significantly from offline behavior; for example, about 80% of online shopping carts are abandoned without consumers completing a purchase (Close, Kukar-Kinney, 2010; Listrak, 2017).

1.1 Problem statement and research questions


characteristics that are indicative for each of those phases is therefore of paramount importance.

It is more than apparent that the customer journey is an integral part of gaining competitive advantage and succeeding as a company (Court et al., 2009; Edelman, 2010; Homburg et al., 2015). Previous studies have aimed to quantify the different phases online (Moe, 2003; Pallant et al., 2017). However, to the best of our knowledge, no studies have focused specifically on which factors ensure a successful progression to each consecutive stage in an online setting, or on the moderating role of the type of product that customers focus on in these journeys. Given these research gaps, the following main research question has been established, around which this thesis is centered:

What kind of behavior is indicative for a successful customer journey? An investigation into the factors that ensure successful progression to each of the phases in the customer journey.

To help generate insights into the results for this research investigation, the following sub-questions have been defined:

How can the different phases of the customer journey be identified and classified (in an online setting)?

This question will allow the identification and distinction of the phases to be clear and will provide tangible guidelines to classify the different phases in the dataset that will be created and applied.

Which behavior is indicative for a progression to the next phase?

With this sub-question, we are interested in which specific aspects account for a successful progression to the next phase in each of the individual phases.

Which elements that can be linked to different phases account for success?

This sub-question will be used to identify specifically what behavior is indicative for a progression to the next phase. Various studies have indicated that the factors that lead to a successful progression to each of the consecutive phases might be moderated by whether customers are looking at a high involvement or a low involvement product (Hong, 2015). A high involvement product is defined as belonging to a product class of higher importance, based on a customer’s inherent needs, values and interests, while a low involvement product belongs to a product class of lower importance (De Wulf et al., 2001; Zaichkowsky, 1985).

Thus, the following question is elaborated on as a part of the above-mentioned sub-question:

To what extent are these success factors affected by the type of product examined?

Finally, in the conclusion and discussion section of this research, the following questions will be answered:


The management summary contains an overall summary of the results of the research and describes how these results can benefit companies.

1.2 Academic and managerial relevance

Over the past years, great importance has been attributed to the improvement of customer experience, as it is seen as a necessary step to gain competitive advantage. The creation of strong and enduring customer experiences results in large benefits for companies (Lemon and Verhoef, 2016; Morgan, 2017). It has been shown that improving the customer journey leads to higher conversion rates, improved customer loyalty and more sales through word of mouth (Court et al., 2009; Edelman, 2010; Homburg et al., 2015).

The Marketing Science Institute (2014, 2016) regards customer experience as one of the most important challenges for research, especially since recent developments, of which digitalization is one of the main drivers, have increased the complexity of this field and the amount of information available (Chu et al., 2010; Lemon and Verhoef, 2016). Furthermore, the customer experience, and with it the customer journey, is high on the agenda of managers these days, as it is seen as one of the most successful routes to generate profits and outperform the competition (The CEO guide to customer experience, 2016; Morgan, 2017). It is thus evident that this is a very useful and booming research field.

Most of the research on this topic centers on attempts to conceptualize and measure the customer journey (Lemon and Verhoef, 2016; Brakus et al. 2009; Pucinelli et al. 2009), rather than on the distinct behavioral characteristics that define success in each of the phases. Since implementing this knowledge is one of the areas in which companies can maximize their profits, the research described in this paper is highly relevant and can help fill the research gap that currently surrounds this topic.

1.3 Structure of the research

The structure of this thesis is as follows. First, the literature on this topic is reviewed. Based on the existing literature, the distinguishing measures for each of the phases are identified and the behavioral patterns leading to successful progression to the next phase are hypothesized. Then, the methodology by which the research question will be answered is described,


2 Theoretical framework

2.1 Customer journey

The customer journey is a concept that falls under customer behavior, where customer behavior is defined as “activities that are directly involved in obtaining, consuming, and disposing products and services, including the decision processes that precede and follow these actions” (Engel et al., 1995). The act of the purchase itself and the pre/post purchase activities are also a part of this (Dibb, 2006).

To quantify this behavior, numerous frameworks have been established (Hawkins, 2012; Engel et al. 1995; Kotler 2000) that define this customer behavior, more specifically, as the customer journey. The customer journey is defined as the process of transitioning from a potential customer to a loyal customer (Payne and Ballantyne, 1991). This journey comprises five different phases, sometimes referred to as see, wish, think, do, care (Google, 2013). More formally, these are defined as the problem recognition, information search, evaluation of alternatives, purchase decision and post-purchase evaluation stages (Ozarslan and Eren, 2015). The customer journey is seen as a funnel, as not everyone who takes an interest will end up buying something (Disilvestro, Salesforce, 2018). In each phase, there will be fewer customers present than in the preceding phase.

2.1.1 The different phases

In the first phase, known as awareness or problem recognition, the consumer perceives a need or problem and becomes motivated to solve it (Kotler, 1991). This can be internally or externally stimulated. When one is internally stimulated, something inside the person signals a need for a certain product. External stimulation, on the other hand, means that the customer is prompted by an external stimulus, such as an advertisement for a new couch (Ozarslan, Erhan Eren, 2015). This is the stage in which the customer becomes aware of the type of product or service needed to satisfy the need.

The second phase of the customer journey is known as consideration, or information search. In this stage, the customer performs research to acquire knowledge. This can take the form of internal or external information search. With internal search, the customer searches their own memory for information about the desired product or service, such as information obtained from past marketing advertisements or received from peers in the past. With external search, on the other hand, customers look outward for additional information, for example on company websites or through recommendations from their network (Ozarslan and Erhan Eren, 2015; Hawkins et al., 2012; Engel et al., 1995; Kotler 2000).


Finally, the customer makes use of decision procedures and strategies that are fit to the product category to arrive at his/her final decision regarding the product (Ozarslan and Erhan Eren, 2015; Engel et al., 1995).

The above leads to the fourth phase in the customer journey, which is the purchase decision stage, in which the purchase of the desired product, as evaluated in the previous phase, is made.

The fifth and final phase is the post-purchase evaluation stage, otherwise known as the loyalty stage. In this stage the customer evaluates whether or not he/she is satisfied with the performance of the product. This influences whether the customer would purchase the product again. It is also the stage in which customers communicate their opinions through various channels (Dibb, et al., 2005; Wang and Chang, 2014).

2.1.2 Impact of digitalization on the customer journey

As early as 1999, Achrol and Kotler examined the implications for marketing of an internet-enabled economy. They stated, contrary to earlier beliefs, that in this internet-enabled economy the marketer should take on an advisory role rather than that of a seller within the exchange process. This puts a focus on providing the right information at the right time. In the early 2000s, there was already wide awareness of the impact of the internet on the marketing mix, and of the need to implement the right strategies to maximize this potential (Quinton and Simkin, 2017; Zineldin 2000; Trim, 2002).

As digitalization caused a gigantic increase in data in many different domains, it did so in the marketing field too. The amount of data available has increased tremendously, and many companies recognize that a massive competitive advantage can be obtained by making use of this data (Phillips-Wren, Hoskisson, 2015). This has changed the landscape of the customer journey enormously: customers now execute some of the different phases, if not all, in an online setting such as a webshop. Nielsen’s Continuous Innovation report states that “retailers are increasingly finding they must innovate in ways that make it easier or more convenient for their customers to get what they need without missing a beat” and that “convenience itself may be the most creative and energetic example of retail innovation”. The shift towards multichannel offline-online retailing, such as webshops, is one of the most successful implementations that respond to this convenience trend (Campo, Breugelmans, 2015). This shift has enabled customers to combine the advantages of both the online and offline shopping spheres, where convenience is named as the number one advantage of the online channel and self-service as the number one advantage of offline stores (Alba et al. 1997; Chu et al., 2008; Chu et al. 2010; Konuş, Verhoef, and Neslin 2008; Venkatesan et al., 2007).


offline and online stores (Campo, Breugelmans, 2015, cf. Dholakia et al. 2010; McPartlin and Dugal 2012; Shankar and Yadav 2010).

2.1.3 Identification of the different phases in an online setting and successful progressions

Since, as mentioned above, digitalization has significantly changed the way the customer journey is executed, the identification of the different phases is also significantly different in an online setting compared to an offline setting. Different studies have established various metrics and characteristics for this identification. One of the main metrics used to identify the phase the customer is in is the visit duration, meaning the time the customer spends on the website. This metric allows the visit intent of the customer to be observed: a well-established theory holds that the longer the customer spends on the website, the more seriously he/she is considering the product shown on it (Bucklin and Sismeiro, 2003; Danaher et al., 2006; Pallant et al., 2017). However, a problem with this metric is that it does not account for people who leave their web browser open for a long time without actively looking at it.

Therefore, a better metric for observing how seriously a customer is going through the customer journey is the visit frequency, defined as the number of pages viewed, since this measures the online activity of the customer more accurately. The number of page views has been shown to have a positive effect on successfully completing tasks early in the customer journey, such as choosing the type of product needed. In contrast, the number of pages viewed has been shown to negatively impact the completion of tasks coupled to the later phases, such as the use of a shopping cart, as the customer moves closer to the purchase stage (Bucklin and Sismeiro, 2003; Li and Chatterjee, 2005; Pallant et al., 2017).

In addition to the visit frequency, the visit variety is also an important metric for the online identification of the different phases. This measure takes into account the types of pages that are viewed. These can be divided into product pages, product category pages, product overview pages, the shopping cart and blog content. Firstly, the number of products viewed has been shown to indicate which part of the journey the customer is in: a small number of product pages viewed indicates that the customer has a specific shopping goal, whereas a large number of product pages viewed indicates a broader shopping goal, such as merely browsing (Moe, 2003). This metric also helps in predicting whether a purchase will be made by the customer during a visit (Van den Poel and Buckinx, 2005)


locking into the advantages of certain price promotions or making a shopping list of products to be considered for a future purchase (Close and Kukar-Kinney, 2010).

Based on the above metrics, the different phases can be distinguished. Phase one is the phase in which problem recognition takes place, which, as mentioned above, can be initiated by external influences such as reading blogs and marketing (customer behavior theory, university of Pretoria). This phase is therefore characterized by one or multiple shallow visit(s). In this phase customers typically look at the homepage of the website once or twice and, on average, view no other pages. This process of merely checking the homepage or blogs, on the lookout for anything that piques their interest, is said to be repeated every few days until the customer sees something interesting and develops a purchase goal, at which point he/she progresses to the next phase. Furthermore, after this type of visit, customers typically make another visit to the website after only three days (Pallant et al., 2017).
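The three metrics introduced earlier in this section (visit duration, visit frequency and visit variety) can be derived from a simple page-view log. The following is a minimal sketch under stated assumptions: the log format, the page-type labels and the function name `visit_metrics` are hypothetical and are not taken from the dataset used in this research.

```python
from collections import Counter
from datetime import datetime

# Hypothetical page-view log for one visit: (timestamp, page type).
# The page types follow the categories named in the text; the log
# format itself is an assumption made for this illustration.
page_views = [
    (datetime(2019, 1, 7, 20, 0, 0), "homepage"),
    (datetime(2019, 1, 7, 20, 1, 30), "product_category"),
    (datetime(2019, 1, 7, 20, 3, 0), "product"),
    (datetime(2019, 1, 7, 20, 6, 0), "product"),
    (datetime(2019, 1, 7, 20, 7, 0), "shopping_cart"),
]


def visit_metrics(views):
    """Return visit duration (seconds), frequency and variety for one visit."""
    timestamps = [ts for ts, _ in views]
    # Duration: time between the first and the last recorded page view.
    duration = (max(timestamps) - min(timestamps)).total_seconds()
    # Frequency: number of pages viewed, as defined in the text.
    frequency = len(views)
    # Variety: number of views per page type.
    variety = Counter(page_type for _, page_type in views)
    return {"duration_s": duration, "frequency": frequency, "variety": dict(variety)}


metrics = visit_metrics(page_views)
print(metrics)
```

Note that a duration measured this way ignores the time spent on the last page and, as discussed above, cannot tell an active visit from a browser window that was simply left open.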

Blog content has become very popular for communicating innovative ideas and products. It is widely used to gain readers’ online trust in the products a retailer sells, and readers use it to be inspired by new ideas and products (Chuan Lu et al., 2014). This makes it a very important online tool for marketers, especially since the shift of the retail landscape to the online environment has changed the customer’s decision-making process, in which blog recommendations and posts have become a vital reference source (Uzonoglu and Misci Kip, 2014). A good blog post filled with recommended products is therefore said to positively affect a customer’s willingness to purchase products (Uzonoglu and Misci Kip, 2014). Thus, since blogs convey inspirational content that affects a person’s willingness to purchase a product, the following is hypothesized:

H1: Reading inspirational content in the first phase has a positive influence on the successful progression towards the next phase.
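As a first descriptive check of a hypothesis such as H1, the progression rates of customers who did and did not read inspirational content can be compared. The sketch below uses invented records and a hypothetical helper `progression_rate`; it is not the thesis data, and it is a descriptive comparison rather than a model-based significance test.

```python
# Hypothetical phase-one customers: (read_inspirational_content, progressed).
# The records are invented for illustration and are not the thesis data.
customers = [
    (True, False), (True, False), (True, True), (True, False),
    (False, True), (False, True), (False, False), (False, True),
    (False, False), (False, True),
]


def progression_rate(records, read_content):
    """Share of customers in one group who progressed to phase two."""
    group = [progressed for read, progressed in records if read == read_content]
    return sum(group) / len(group)


rate_readers = progression_rate(customers, True)
rate_non_readers = progression_rate(customers, False)

# H1 predicts a higher progression rate for readers than for non-readers.
h1_direction_holds = rate_readers > rate_non_readers
print(rate_readers, rate_non_readers, h1_direction_holds)
```

A difference in raw rates only indicates the direction of the effect; control variables and statistical uncertainty would have to be handled in a regression model.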

The second phase, also known as the consideration phase, likewise does not result in a purchase. The goal of this phase is gathering information (Moe, 2003). Thus, in this phase, multiple page and product views occur, and the use of the search function is more common. Customers browse for information within a relatively narrow range of categories, but across a relatively broad range of products.


customer will purchase something on a specific visit (Bucklin and Sismeiro, 2009, Manchanda et al. 2006). Therefore, the following is hypothesized regarding the successful progression from phase two to phase three:

H2: Exposure to online ads during a customer-initiated contact with the website increases the likelihood of the progression from phase two to phase three.

Phase three, the evaluation of alternatives phase, is, as mentioned before, the phase in which the purchase goal is clear to the customer and the different alternatives are evaluated and compared. Online, this phase is said to be mostly centered around product page views. It is also the first phase in which products are added to the online shopping cart. The shopping cart is used not only as a means to purchase, but also as an organizational tool to remember the alternatives to consider (Close and Kukar-Kinney, 2010).

Since customers shop with a specific goal in mind in this phase, behaviors and browsing patterns tied to goal-oriented shopping are likely to be observed. This includes a higher level of customer involvement, in which the customer allocates more attention to the task, in contrast to non-goal-oriented tasks carried out on the website, such as shallow browsing (Pallant et al., 2017). A higher consideration for the information is therefore likely to be observed, which translates into highly frequent engagement with the website in a short time frame, during which the customer collects the most relevant product information needed to make the purchase decision. Thus, customers in this phase spend a large amount of time on certain product pages and their focus is limited to the evaluation of several selected options (Pallant et al., 2017). This leads to the following hypothesis regarding the successful progression from phase three to phase four:

H3: The time spent on these select pages has a positive influence on the successful progression from phase three to phase four.

Phase four, also known as the purchase decision phase, which is the phase in which the customer makes the final purchase decision, does not include any new product views and is mostly characterized by the use of the shopping cart. In this phase, the content of the shopping cart is evaluated, and a purchase is made. Therefore, this phase also includes the visits to the check-out area of the webshop, including account pages visited (Moe and Fader, 2004).
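The phase characteristics described above can be turned into a simple rule-based classifier. The sketch below is illustrative only: the page-type labels, the function name `classify_visit` and the exact rules are assumptions made here, and they simplify the descriptions in the text (for example, phase four is approximated as a visit without any product or category views).

```python
def classify_visit(page_types):
    """Assign a visit to phase 1-4 from the list of page types viewed."""
    pages = set(page_types)
    if not pages:
        raise ValueError("a visit needs at least one page view")

    shallow_pages = {"homepage", "blog"}
    purchase_pages = {"shopping_cart", "checkout", "account"}
    browsing_pages = {"product", "product_category", "product_overview", "search"}

    # Phase 1: shallow visit, only the homepage and/or blog content.
    if pages <= shallow_pages:
        return 1
    # Phase 4: cart or check-out activity without any product browsing.
    if pages & purchase_pages and not pages & browsing_pages:
        return 4
    # Phase 3: product browsing combined with use of the shopping cart.
    if pages & purchase_pages:
        return 3
    # Phase 2: information gathering without cart activity.
    return 2


print(classify_visit(["homepage", "blog"]))
print(classify_visit(["homepage", "search", "product_category", "product"]))
print(classify_visit(["product", "product", "shopping_cart"]))
print(classify_visit(["shopping_cart", "checkout", "account"]))
```

The rules are checked in a fixed order so that each visit receives exactly one phase; with real data, thresholds on visit duration and visit frequency would likely be needed as well.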

Since many studies find that trustworthiness and loyalty are key to completing this phase successfully, meaning that a purchase is made, a customer’s history with the brand is expected to influence progress through this phase. It has been found that the likelihood of a customer purchasing something from a particular brand or store increases significantly with the number of previous purchases that customer has made with that brand or store. This has been found to be particularly true for online shopping environments (Hernandez et al 2010). Thus, the following is expected for the successful completion of phase four:


The final phase in the customer journey, the post-purchase phase, is characterized by usage of the ordered products, reviewing them in the form of online reviews and possibly returning to these products. Since this phase is the final stage in the customer journey, and no data is available on this stage, examining the success factors for this phase falls beyond the scope of this research.

The above-mentioned success factors for each of the stages in the customer journey could differ depending on whether the product the customer is looking at is a high involvement or a low involvement product. Several studies have shown that slight changes in the customer journey could occur for high involvement products, i.e. products with a higher importance and thus price, versus low involvement products, i.e. products with a lower importance and price.

The perceived risk of buying products online is intensified for products which are required to be tested, such as how comfortable a couch is. When customers shop online, they cannot get the benefits of testing these products and so the risk of purchasing online increases. This is especially true for products that are more expensive, which are perceived as high involvement products (Chu et al, 2010).

Therefore, the expectations for the phases three and four in the customer journey with regards to the moderator are as follows:

H5: The effect of the time spent on select pages on the successful progression to the next phase is higher for high involvement products than for low involvement products.

H6: The effect of the presence of a customer’s purchase history with a shop on the successful progression to the next phase is higher for high involvement products than for low involvement products.

Since there is no data available regarding the moderator for the first two phases, the hypotheses cannot be tested for these two phases and thus the hypotheses were not established.

2.2 Furniture industry and its developments

The furniture industry is one of the examples in which the above described developments have significantly changed the way the customer journey takes shape, with a shift from offline retailing to online retailing. Many retailers in this industry have therefore introduced a webshop in which (parts of) this customer journey can be completed by customers. About 74% of furniture shoppers look for information online (Google Analytics, 2016), which shows the tremendous shift from offline to online retail. Retailers have noticed this shift too: in 2016, 26% of furniture retailers sold furniture online (Blue Report, Google Analytics, 2016). A study conducted by Google (Google, 2015) found that digital channels influence offline furniture sales, with 66% of in-store purchasers accessing the internet while looking for information on furniture.


sphere, because of the convenience benefits that ordering such products brings with it, like not having to carry it home (Chintagunta, et al., 2012). However, the results from another study were that only one in four furniture buyers purchased online (Google, 2015).

Although a large change is happening in the way shopping is done in the furniture industry, very little research has been done on the way furniture is being purchased, especially on the customer journey involved in the process (Burnsed, Hodges, 2013; Euromonitor International, 2016). This intensifies the need for further research in this field, as it can greatly improve the understanding of the customer journey in the furniture industry. In addition, it can yield useful insights into how to cater to this customer journey in order to maximize profits.

Since most customers who buy furniture do so to convey a desired image to their social circle, many rely heavily on peer-to-peer interaction and word-of-mouth as a source of inspiration, and this therefore plays an important role in the decision-making process (Hassan et al., 2010; Burnsed, Strubel and Moody, 2016). This factor has been heightened in the current online shopping world, in which information is abundant and there are many channels to draw inspiration from, such as blogs and platforms like Pinterest, on which 130 million visual searches are made every month, most of them regarding furniture, home décor and style (Lyons, 2018).

The main objectives for customers to purchase home furnishing products are either a need for a specific product or a wish to make changes in their décor (Van Der Merwe, Campbell, 2008). Studies have shown that for customers who shop for furniture, which is typically a higher involvement product, the decision-making process is significantly longer than for more mundane products (Milosavljevic, Koch and Rangel, 2011). The average time spent on information search and evaluation alone, so the second and third phase of the customer journey, extends to three months if categories like kitchen and bathroom furnishing are concerned (Costa, 2013). Generally speaking, over half of the furniture shoppers take more than two weeks to research and purchase furniture and 36% of the shoppers take over a month (Google, 2015).

2.3 Conceptual framework


3 Methods

3.1 Data description

The data used for this research is a dataset provided by Goossens, an online furniture retailer. This data comes from the Google Analytics tool that is tied to their website. It provides insight into many variables, such as how much time customers have been spending on specific pages and which pages were visited in which sessions. The data is available both on customer level, with various specific ClientIDs present in the dataset, and on session level, which covers the browser activity a customer has undertaken during a particular session on the website.

The data ranges from April 2017 to April 2019, making it a rather large dataset. Because of its size, the analysis has been done on a subset of this data.

The data from Goossens has been made available for this research in seven separate subsets. Below the various subsets including their variables are briefly discussed. The full variables list per dataset can be found in Appendix 8.1. In figure 1 below, a visual representation of the structure and content of the Goossens dataset is given. This figure can be viewed in full size in Appendix 8.2.

Figure 1: visual representation of Goossens dataset

As can be seen in figure 1, the contents of the seven subsets are as follows:


The third and fourth datasets are the largest. Both contain the same variables, and together they have 45,319,554 observations. This dataset includes all the activity that has happened within the sessions, including the date and time on which an action occurred, the page that was viewed and the number of times that particular page was viewed, although the latter was rarely more than one time (2,192 times out of about 45 million observations).

The fifth dataset has 5,133,230 observations and logs the unique sessions paired to the ClientID belonging to that session and looks at which page the session started, and at which page the session ended. Furthermore, it logs the time that session took.

The sixth dataset has 10,607 observations and focuses on the date, hour and minute at which a purchase was made, in which session and by which customer. It also records the number of unique purchases made in that session. For 1,843 of these 10,607 observations no ClientID is available; data is available for 8,470 unique customers who made a purchase. Of the 5,133,230 sessions, or visits to the website, purchases were thus made by 8,470 people.

The seventh and final dataset has 5,133,194 observations and logs how the customer entered the website. There are different sources, such as typing in the website address directly, using Google or clicking on an ad. This dataset is useful for examining some of the marketing tactics Goossens uses to get customers to visit its website.

Each of the datasets contains the sessionID (ga.dimension2), so this is the primary way through which these seven datasets can be linked. Furthermore, Goossens datasets 2, 5, 6 and 7 also include the ClientID (ga.dimension1), so this variable could also be used when linking the data.

3.2 Data preparation

Data preparation has taken place as depicted in the below figure:


To generate the needed insights from the dataset most efficiently, some adjustments have been made to the original dataset as made available by Goossens. These adjustments are as follows: merging the different files into one main file and sorting by sessionID, so that each of the sessions can be classified into one of the five phases discussed in the Theoretical Framework (Chapter 2.1.1).

To prepare the provided datasets for the eventual analysis and testing of the previously defined hypotheses, some necessary actions had to be undertaken. Firstly, since the dataset provided is rather large, merely loading the datasets is a task that a standard computer has difficulty with and that takes several hours. Therefore, as no supercomputer was available for this research, the analyses described later on in this chapter have been done on a subset of the data.

For the analysis, the datasets three and four were merged. Subsequently, dataset two which includes the ClientIDs belonging to the different sessions has been merged with this data. After that a subset of this dataset has been created where the ClientIDs are present. This is done to make the customer journey trackable, which is something that is not possible when ClientIDs are false. Furthermore, outliers in the data have been identified and properly handled. An example of this is if a page is open for more than 600 seconds, it is assumed that a person is not looking at that page anymore and has simply kept their browser open.
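The linking step described above can be sketched in a few lines. This is only an illustration in Python (the thesis itself used R): the rows, page paths and the literal marker "false" for an unusable ClientID are assumptions made for the example, not values taken from the Goossens data.

```python
# Hypothetical rows; the field names mirror the thesis (sessionID, ClientID),
# but all values and page paths are invented for illustration.
page_views = [
    {"sessionID": "s1", "page": "/bank-a", "seconds": 42},
    {"sessionID": "s2", "page": "/", "seconds": 5},
    {"sessionID": "s3", "page": "/blog/x", "seconds": 30},
]
# Assumption: an unusable ClientID is stored as the string "false";
# session s3 has no ClientID record at all.
session_clients = {"s1": "c1", "s2": "false"}


def attach_client_ids(views, clients):
    """Join page views to ClientIDs by sessionID and drop untrackable rows."""
    merged = []
    for row in views:
        client = clients.get(row["sessionID"])
        if client is None or client == "false":
            continue  # the journey cannot be tracked without a ClientID
        merged.append({**row, "ClientID": client})
    return merged


trackable = attach_client_ids(page_views, session_clients)
```

Only rows with a usable ClientID survive the join, which is what makes the journey of an individual customer traceable across sessions.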

After these adjustments, a subset of 500,000 observations has been taken from the above constructed dataset. This is the final dataset on which the classification of the different phases has taken place. How this has been done will be described later on in this chapter.

3.2.1 Distinction of the different phases

Figure 3: distinction of different phases in the dataset


Phase one includes all the website visits in which a customer has visited the homepage, watched the folder or magazine, the new collection, or looked at both the website general categories “wonen” (living) and “slapen” (sleeping). It also includes the website visits in which the various blog types were watched, including the inspiration pages. Furthermore, the category page views of less than 5 seconds are also added to this phase.

Phase two includes all the page views in which the specific categories are browsed for more than 5 seconds, as well as the many shallow product page views, meaning product pages that are viewed for less than 10 seconds. Furthermore, Goossens offers an advice option, and since this is tied to the information search motive of phase two, pages related to this advice option are also classified under phase two.

Since phases three and four are distinguished from the first two phases by the usage of the cart, and phase three is the phase in which items are first added to the cart, cart usage is used to classify these phases. However, since there is no data available on when and which products have been added to the cart, the following choice has been made to classify phase three. If the online shopping cart is viewed and a product page is viewed directly after that, the customer is in phase three, and the web activity that follows for this particular customer is also classified as phase three. Phase three also includes all the product page views that are considered serious, which is where a product page has been viewed for more than 10 seconds and less than 600 seconds. Lastly, when a customer checks order information, such as order history, this is also put into phase three, since this indicates that the customer is shopping with a particular goal in mind.

Phase four is classified as the entire process of watching the online shopping basket up until and including checkout, as this is the phase where the final purchase decision is being made. No new products are being viewed in this phase and the content of the basket is being evaluated. Therefore, this phase rarely includes a new page or category view.

Phase five is left out of the classification since this is the post-purchase phase, which occurs after a successful checkout has completed. In order to classify this phase, data is required that falls beyond the scope of the available dataset such as product reviews and product return information.

The classification of the different phases happens by adding an extra column to the earlier defined dataset as described in the chapter data preparation, with the name “PC” for Phase Classification, in which the different observations will have the different phase classifications.
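The classification rules above could be sketched roughly as follows. This is a simplified illustration in Python, not the procedure used in the thesis: the page-type labels are hypothetical names that would have to be derived from the page path, the boundary cases (exactly 5 or 10 seconds) are assigned by assumption, and the rule that all activity after a cart-then-product sequence stays in phase three is reduced to a single flag.

```python
def classify_page_view(page_type, seconds, after_cart=False):
    """Return the phase (1-4) for one page view, or None if it is dropped.

    page_type is a hypothetical label derived from the page path, e.g.
    'home', 'folder', 'collection', 'general_category', 'blog',
    'inspiration', 'category', 'product', 'advice', 'order_info',
    'cart' or 'checkout'.
    after_cart marks a product view that directly follows a cart view.
    """
    if page_type in {"cart", "checkout"}:
        return 4  # evaluating the basket up to and including checkout
    if page_type == "order_info":
        return 3  # goal-directed shopping
    if page_type == "product":
        if seconds > 600:
            return None  # outlier: browser assumed to be left open
        if after_cart or seconds > 10:
            return 3  # serious product view
        return 2  # shallow product view
    if page_type == "advice":
        return 2
    if page_type == "category":
        return 2 if seconds > 5 else 1
    # Homepage, folder, collection, general categories, blogs, inspiration;
    # unknown labels also fall through to phase 1 in this sketch.
    return 1
```

The returned value corresponds to what would be written into the “PC” column for each observation.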

3.2.2 Dependent variable

Table 1

| Model(s) | Dependent variable | Description | Format |
|---|---|---|---|
| Models 1, 2, 3 and 4 | Next phase (created) | Boolean dummy variable which indicates whether a customer progressed to the next phase or not. | Boolean (0/1) |


The dependent variable for testing the successful progression to the next phase is next phase. This is a variable that has been created as an additional column to each of the four datasets of each of the phases. This variable takes the value ‘1’ if the customer did progress to the next phase, and ‘0’ if this did not happen. In phase four, the next phase variable is set to ‘1’ if the customer completed an order in this phase.

3.2.3 Independent variables

To test the hypotheses, for the different phases, the following independent variables have been established:

Table 2

| Model | Independent variable | Description | Format |
|---|---|---|---|
| Model 1 | Blog (created) | Boolean dummy variable that indicates whether a blog has been watched or not | Boolean (0/1) |
| Model 2 | Direct (created) | Dummy variable which indicates whether a customer entered the website directly (without any help of ads) | Boolean (0/1) |
| Model 2 | CIC (created) | Dummy variable which indicates whether a customer entered the website through a paid search ad after initiating the search | Boolean (0/1) |
| Model 2 | FIC (created) | Dummy variable which indicates whether a customer entered the website through any other ads initiated by the firm | Boolean (0/1) |
| Model 2 | Others (created) | Other mediums through which a customer entered the website | Boolean (0/1) |
| Model 2 | Adtype (created) | | Factorial (1, 2, 3 or 4) |
| Model 3 | Ga.avgTimeOnPage (already present) | The average time that was spent on a page, averaged over the sessions | In seconds |
| Model 3 | Timeonindividualpage (created) | Time spent on an individual page | In seconds |
| Model 4 | Purchasehistory (created) | Presence of a purchase history for a particular client at a particular time | Boolean (0/1) |


3.2.4 Control variable(s)

Table 3

| Model(s) | Control variable | Description | Format |
|---|---|---|---|
| Models 1, 2, 3 and 4 | Ga.avgTimeOnPage (already present) | The average time that was spent on a page, averaged over the sessions | In seconds |

In order to reduce the chance that the outcomes of the model could be attributed to alternative explanations, a control variable has been included. Including control variables has also been shown to increase statistical power (Becker, 2005). Typically, more than one control variable is included. However, since the “average time on page” variable is the only variable that is consistently present in each of the phases, this variable is chosen as the control variable. Furthermore, previous research findings show that the average time spent on a page will affect the outcomes (Pallant et al., 2017).

3.2.5 Moderating variable(s)

Table 4

| Model(s) | Moderating variable | Description | Format |
|---|---|---|---|
| Models 3 and 4 | Ga.ProductRevenuePerPurchase (already present) | The price of the product purchased | In euros |
| Models 3 and 4 | Highlow (created) | Dummy variable which indicates a low or high involvement product | Boolean (0/1) |

The moderating variables that will be included in models three and four are the variables ga.ProductRevenuePerPurchase and highlow. These variables will be used to test the effect of high versus low involvement products on the behaviors that lead to a successful progression to the next phase. This effect can unfortunately not be tested for phases one and two, as this data is only available for those sessions in which a purchase has been made. Another option to test this effect for these two phases would be to classify the involvement of a product based on the type of product viewed. However, this would require too many assumptions that are not backed by research, and the results would therefore not be reliable. For this reason, the choice has been made to focus on the data that is present for phases three and four.

The ga.ProductRevenuePerPurchase variable shows the price of the products that have been purchased by the customer. Based on this variable, the highlow variable has been created. This is a Boolean variable that takes the value of ‘0’ if the product is low involvement and ‘1’ if the product is high involvement.

3.3 Data analysis

The research conducted in this thesis is both causal and quantitative, as the relationships between watching inspirational content, exposure to online ads, time spent on a page, purchase history and the progression to the next phase are being investigated (Malhotra, 2009). Since the dependent variable is the same for each phase analysis, and this variable is binary, the same technique can be used for the analysis of the successful progression of each of the phases. Since marketing problems with a binary dependent variable require either a binomial logit or a probit model, which are said to provide similar results, the choice has been made to implement a logit model and use a logistic regression analysis (Leeflang et al., 2015, 2017). These models will be constructed with the use of the independent variables as described above. To account for the moderation effect of high versus low involvement products, an interaction effect will be added to the model. These models are used to test the hypotheses, and the significant individual parameters of the model will be assessed. These models have been programmed in R with the use of the program RStudio.

3.3.1 Model(s) specification

In order to predict whether a customer will progress to the next phase or not, a binary logistic regression model will be used. With this model, the outcome is the probability that an event occurs, in this case, that the customer moves to the next phase (Stevens, 2002). Since this event or response variable is binary, either a customer progresses to the next phase or not, it is required that the predicted values fall between zero and one (Rodriguez, 2007; Allison, 2012). The equation below shows the basic structure of a binary logit model.

$$\mathit{prob}_i = \frac{1}{1 + \exp(-\{x_i'\beta\})}$$

Where:

$\mathit{prob}_i$ = the chance that a customer progresses to the next phase

$x_i'$ = the values of the variables for observation i

$\beta$ = the coefficient for each variable
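To make the mapping from the linear predictor to a probability concrete, the logit formula can be evaluated directly. The sketch below is written in Python for illustration only (the thesis models were estimated in R), and the coefficient values are invented, not estimates from the thesis.

```python
import math


def progression_probability(x, beta):
    """Binary logit: 1 / (1 + exp(-x'beta)).

    x starts with a 1 for the intercept, followed by the predictor values.
    """
    linear_predictor = sum(xi * bi for xi, bi in zip(x, beta))
    return 1.0 / (1.0 + math.exp(-linear_predictor))


# Hypothetical coefficients: intercept, blog dummy, average time on page.
beta = [-2.0, -0.5, 0.01]
p_no_blog = progression_probability([1, 0, 60], beta)
p_blog = progression_probability([1, 1, 60], beta)
```

Whatever the coefficient values, the result always lies strictly between zero and one, which is the property required of the predicted values.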

The equations of the four models that will be tested in this research are as follows:

3.3.1.1 Model 1

$$\mathit{prob1to2}_i = \frac{1}{1 + \exp(-\{\beta_0 + \mathit{blog}_i\beta_1 + \mathit{avgt}_i\beta_2\})}$$

Where:

$\mathit{prob1to2}_i$ = the chance that a customer progresses from phase one to phase two

$\beta_0$ = intercept

$\mathit{blog}_i$ = whether a customer watched a blog in observation i

$\mathit{avgt}_i$ = average time on page in observation i

3.3.1.2 Model 2

$$\mathit{prob2to3}_i = \frac{1}{1 + \exp(-\{\beta_0 + \mathit{adtype}_i\beta_1 + \mathit{avgt}_i\beta_2\})}$$

Where:

$\mathit{prob2to3}_i$ = the chance that a customer progresses from phase two to phase three

$\beta_0$ = intercept

$\mathit{adtype}_i$ = the type of ad the customer is exposed to in observation i

$\mathit{avgt}_i$ = average time on page in observation i


3.3.1.3 Model 3

$$\mathit{prob3to4}_i = \frac{1}{1 + \exp\left(-\left\{\begin{aligned}&\beta_0 + \mathit{timeind}_i\beta_1 + \mathit{avgt}_i\beta_2 + \mathit{highlow}_i\beta_3 \\ &+ (\mathit{highlow}_i * \mathit{timeind}_i)\beta_4 + (\mathit{highlow}_i * \mathit{avgt}_i)\beta_5 + \mathit{prpp}_i\beta_6 \\ &+ (\mathit{prpp}_i * \mathit{timeind}_i)\beta_7 + (\mathit{prpp}_i * \mathit{avgt}_i)\beta_8\end{aligned}\right\}\right)}$$

Where:

$\mathit{prob3to4}_i$ = the chance that a customer progresses from phase three to phase four

$\beta_0$ = intercept

$\mathit{timeind}_i$ = the time in seconds on the individual page of observation i

$\mathit{avgt}_i$ = average time on page in observation i

$\mathit{highlow}_i$ = whether a high or low involvement product is observed in observation i

$(\mathit{highlow}_i * \mathit{timeind}_i)$ = interaction term of involvement type with the individual time spent on a page in observation i

$(\mathit{highlow}_i * \mathit{avgt}_i)$ = interaction term of involvement type with the average time spent on a page in observation i

$\mathit{prpp}_i$ = product revenue per purchase in observation i

$(\mathit{prpp}_i * \mathit{timeind}_i)$ = interaction term of product revenue per purchase with the individual time spent on a page in observation i

$(\mathit{prpp}_i * \mathit{avgt}_i)$ = interaction term of product revenue per purchase with the average time spent on a page in observation i

3.3.1.4 Model 4

$$\mathit{prob4to5}_i = \frac{1}{1 + \exp\left(-\left\{\begin{aligned}&\beta_0 + \mathit{ph}_i\beta_1 + \mathit{avgt}_i\beta_2 + \mathit{highlow}_i\beta_3 \\ &+ (\mathit{highlow}_i * \mathit{ph}_i)\beta_4 + (\mathit{highlow}_i * \mathit{avgt}_i)\beta_5\end{aligned}\right\}\right)}$$

Where:

$\mathit{prob4to5}_i$ = the chance that a customer progresses from phase four to phase five

$\beta_0$ = intercept

$\mathit{ph}_i$ = presence of a purchase history for the client of observation i

$\mathit{avgt}_i$ = average time on page in observation i

$\mathit{highlow}_i$ = whether a high or low involvement product is observed in observation i

$(\mathit{highlow}_i * \mathit{ph}_i)$ = interaction term of involvement type with the presence of a purchase history in observation i

$(\mathit{highlow}_i * \mathit{avgt}_i)$ = interaction term of involvement type with the average time spent on a page in observation i

$\mathit{prpp}_i$ = product revenue per purchase in observation i

$(\mathit{prpp}_i * \mathit{timeind}_i)$ = interaction term of product revenue per purchase with the individual time spent on a page in observation i
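The role of the interaction terms may be easier to see in log-odds form. The following short derivation follows from the Model 3 specification and is offered as a reading aid rather than as an additional result of the thesis. Taking the logit of both sides gives

$$\ln\frac{\mathit{prob3to4}_i}{1 - \mathit{prob3to4}_i} = \beta_0 + \mathit{timeind}_i\beta_1 + \mathit{avgt}_i\beta_2 + \mathit{highlow}_i\beta_3 + (\mathit{highlow}_i * \mathit{timeind}_i)\beta_4 + (\mathit{highlow}_i * \mathit{avgt}_i)\beta_5 + \mathit{prpp}_i\beta_6 + (\mathit{prpp}_i * \mathit{timeind}_i)\beta_7 + (\mathit{prpp}_i * \mathit{avgt}_i)\beta_8$$

so that the effect of one extra second on an individual page on the log-odds of progressing is

$$\frac{\partial}{\partial\,\mathit{timeind}_i}\ln\frac{\mathit{prob3to4}_i}{1 - \mathit{prob3to4}_i} = \beta_1 + \beta_4\,\mathit{highlow}_i + \beta_7\,\mathit{prpp}_i$$

Under this reading, and holding product revenue fixed, the effect is $\beta_1$ for low involvement products and $\beta_1 + \beta_4$ for high involvement products, so support for H5 would show up as a positive $\beta_4$. The same logic applies to $\beta_4$ in Model 4 with respect to H6.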


3.3.2 Plan for testing the various hypotheses

Below, the plan for testing the various hypotheses is described. To test the hypotheses, four different datasets have been made, one for each of the phases. A column has been created and added to each of these datasets, recording whether the customer in that session went to the next phase (1) or not (0). This metric makes it possible to test whether the various success factors influence a customer’s successful progression to the next phase in the customer journey. Then, the variables as described in the variables section above will be created. Finally, three models will be applied to this data: a null model, a linear regression model (lm in R) and a generalized linear model (glm in R). These three models will be compared on various criteria and the best performing model will be used to test the hypotheses. The subsections below discuss how the various independent variables are created.

3.3.2.1 Phase one

Since for phase one the relationship between reading inspirational content and the progression to the next phase is being studied, the content viewed should be available in the dataset. This is already present through the variable ga.pagepath, as well as through an additional column which is set to ‘1’ if a blog is watched, and to ‘0’ if this is not the case. Two variations of this variable will be studied. In the first variation only blog pages are included in this variable; in the second variation pages of the inspiration and trends categories are added to the blog pages. This is done to test whether the impact of reading this content is robust across all inspirational categories.
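The two variants of the blog dummy could be derived from the page path along the following lines. This is a sketch in Python, and the path prefixes are hypothetical placeholders, since the actual URL structure of the Goossens website is not given here.

```python
def blog_dummy(page_path, include_inspiration=False):
    """Return 1 if the page counts as inspirational content, else 0.

    The prefixes are assumed for illustration; the real page paths
    in ga.pagepath may look different.
    """
    prefixes = ["/blog"]
    if include_inspiration:
        # Second variant: inspiration and trends pages count as well.
        prefixes += ["/inspiratie", "/trends"]
    return int(any(page_path.startswith(prefix) for prefix in prefixes))
```

Calling the function once with `include_inspiration=False` and once with `include_inspiration=True` yields the narrow and the broad version of the variable used for the robustness check.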

3.3.2.2 Phase two

In phase two, the relationship between the exposure to different types of online ads and the successful progression to the next phase is being studied, and thus the data of the seventh dataset needs to be added, which includes the columns ga.channelGrouping and ga.sourceMedium. In addition to this, the different ads have been classified under the different types. Organic search and direct are identified as no ads, and thus get the adtype ‘1’. Generic paid search ads and branded paid search ads are classified under customer-initiated ads, or the group CIC, and thus get the adtype ‘2’. Email and social are categorized as FIC, or firm-initiated contact ads, and are identified as adtype ‘3’. Lastly, other adtypes are classified as other, and get adtype ‘4’. In addition to this, separate columns are created for each of these adtypes, with the occurrence of them being marked as a ‘1’ and otherwise ‘0’.

| Variable | Type of ad/route |
|---|---|
| Direct | Organic search; direct |
| CIC | Generic paid search; branded paid search |
| FIC | Email; social |
| Other | Other |

Table 5
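The mapping in Table 5 can be expressed as a small lookup. The sketch below is in Python and assumes that the channel labels are spelled as shown; the exact strings in ga.channelGrouping may differ.

```python
# Channel labels are assumptions; adjust them to the values in the data.
ADTYPE_BY_CHANNEL = {
    "Organic Search": 1, "Direct": 1,                      # no ads
    "Generic Paid Search": 2, "Branded Paid Search": 2,    # CIC
    "Email": 3, "Social": 3,                               # FIC
}
ADTYPE_NAMES = {1: "Direct", 2: "CIC", 3: "FIC", 4: "Others"}


def ad_columns(channel_grouping):
    """Return the adtype code plus one 0/1 dummy per adtype."""
    adtype = ADTYPE_BY_CHANNEL.get(channel_grouping, 4)  # anything else: other
    row = {"Adtype": adtype}
    for code, name in ADTYPE_NAMES.items():
        row[name] = int(code == adtype)
    return row
```

For example, `ad_columns("Email")` marks the FIC dummy and assigns adtype 3, while any channel that is not listed falls into the Others group.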

3.3.2.3 Phase three


exact time spent on each page is not present in this dataset, this has been created with the help of the ga.dimension3 variable. This is a text variable that shows the enter and exit times of the customer on a particular page. With the help of this variable, the variable timeonindividualpage has been created, which shows the time spent on a particular page in seconds.
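Deriving a duration from an enter and exit time could look like the sketch below. It is written in Python, and the text layout of ga.dimension3 is an assumption made purely for illustration (two timestamps separated by a vertical bar); the actual format of the variable is not specified here.

```python
from datetime import datetime


def time_on_individual_page(enter_exit, fmt="%Y-%m-%d %H:%M:%S"):
    """Seconds between enter and exit time in an 'enter | exit' string.

    The 'enter | exit' layout and the timestamp format are assumptions.
    """
    enter_text, exit_text = enter_exit.split("|")
    enter = datetime.strptime(enter_text.strip(), fmt)
    leave = datetime.strptime(exit_text.strip(), fmt)
    return (leave - enter).total_seconds()


seconds = time_on_individual_page("2018-03-01 14:02:10 | 2018-03-01 14:03:25")
```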

3.3.2.4 Phase four

The next phase variable is established a little differently for phase four, as no real phase five that follows this phase can be distinguished in the dataset. Therefore, the criterion for whether a person in phase four went to phase five is whether they actually completed their purchase, that is, whether they arrived at their order overview page. For this phase, the effect of the purchase history on the progression to the next phase is being observed, and thus the presence of a purchase history must be known for each customer. This has been established by checking in the data whether that particular customer made a purchase at an earlier point in time. This has been added as a separate column in the dataset with the variable purchasehistory, which equals ‘1’ if there has been a previous purchase, and ‘0’ if there has not.
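One way to build such a flag is to walk through the sessions in time order and remember which clients have already bought something. The Python sketch below illustrates the idea; the field names `time` and `purchased` are hypothetical and the sample rows are invented.

```python
def add_purchase_history(sessions):
    """Add 'Purchasehistory' (0/1) to each session.

    sessions: list of dicts with 'ClientID', 'time' (sortable) and
    'purchased' (0/1). The flag is 1 only if the client completed a
    purchase in an earlier session, never because of the current one.
    """
    has_purchased = set()
    result = []
    for session in sorted(sessions, key=lambda s: s["time"]):
        flag = int(session["ClientID"] in has_purchased)
        result.append({**session, "Purchasehistory": flag})
        if session["purchased"]:
            has_purchased.add(session["ClientID"])
    return result


example = add_purchase_history([
    {"ClientID": "c1", "time": 2, "purchased": 0},
    {"ClientID": "c1", "time": 1, "purchased": 1},
    {"ClientID": "c2", "time": 3, "purchased": 0},
])
```

Setting the flag before recording the current purchase keeps the variable strictly about the past, so a first-time purchase does not count as its own history.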

Then finally, the effect of the moderator high involvement versus low involvement products is being examined. To do so, the column from dataset one, Ga.productrevenueperpurchase, is added to the datasets of phases three and four. This column measures the price of the particular product ordered. This data is unfortunately not available for phase one and phase two since almost no product pages are being viewed in these phases, and if they are, they do not result in a purchase later on. This data might affect the outcome of the moderator testing however, since for the subset as defined previously of 500,000 entries, only a very tiny amount of the


4 Results

This chapter discusses the results of the research. First, an overview of the distribution of the data is given, along with some descriptive statistics that serve as “model free evidence”. Then the handling of outliers is briefly discussed, and finally the results of each of the models are presented.

4.1 The different phases

Based on the classification methods as described in the previous chapter, the distribution of the subset of 500,000 observations over the various phases is as follows:

| Phase 1 | Phase 2 | Phase 3 | Phase 4 |
|---|---|---|---|
| 359,645 | 135,634 | 4,137 | 653 |

Table 6

The proportion of the various phases follows the funnel theory of the customer journey (Disilvestro, Salesforce, 2018). Furthermore, visits in phase two are now far more common than they were in offline settings, and their share has increased significantly over the past decade (Konus et al., 2008; PriceWaterhouse Coopers, 2015; Sands et al., 2016; Verhoef et al., 2007). In addition to that, the visits of phase one, the shallower visits, were said to account for over three quarters of all visits on a company’s website (Moe, 2003). The proportions found here are close to that figure, with phase one accounting for about 72% of the observations.

4.1.1 Descriptive statistics

Of the above-mentioned website visits in each of the phases, the number of unique clients present in each of the phases is as follows:

| Phase 1 | Phase 2 | Phase 3 | Phase 4 |
|---|---|---|---|
| 45,342 | 28,083 | 1,907 | 120 |

Table 7

This means that of the total number of clients present in phase 1, 0.26% was present in phase 4. In addition, of the 120 people that are present in phase 4, 18 did not complete their purchase (yet). Therefore, the conversion rate, which is the number of people who made a purchase compared to the number of visitors, is 0.22%. Of these purchasers, 24 were recurring customers, meaning that they completed two or more orders.
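The two percentages can be reproduced from the client counts reported above. The short Python calculation below assumes that “the number of visitors” refers to the unique clients in phase 1, which is the reading that yields the reported 0.22%.

```python
clients_phase1 = 45_342   # unique clients in phase 1
clients_phase4 = 120      # unique clients in phase 4
not_completed = 18        # phase 4 clients without a completed purchase

share_reaching_phase4 = clients_phase4 / clients_phase1 * 100
buyers = clients_phase4 - not_completed
conversion_rate = buyers / clients_phase1 * 100

print(f"{share_reaching_phase4:.2f}%")  # share of phase 1 clients seen in phase 4
print(f"{conversion_rate:.2f}%")        # purchasers relative to phase 1 clients
```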

In this dataset a total of 139 purchases were made, with an average revenue of €531 per order. Figure 4 below displays a histogram of the distribution of the value in euros of the purchases made. The revenue variable, ga.productRevenuePerPurchase, is available only for the people and sessions which


Figure 4: histogram of distribution ga.ProductRevenuePerPurchase

For hypothesis one, the effect of watching blogs in phase one on the successful progression to the next phase will be determined. Of the 359,645 observations present in this phase, 4,091 are blog pages, 138 are inspiration pages and 426 are trends pages. On average, clients spend 75.4 seconds on blog pages, 123.6 seconds on inspiration pages and 53.7 seconds on trends pages. Of the 45,342 unique clients in phase 1, 1,313 visit blog pages, 115 visit inspiration pages and 292 visit trends pages.

In phase two, data is present on how a customer entered the website, including the adverts they clicked to get there. This can be distinguished, as mentioned in Chapter 3.3.2.2 of Methods, into four categories: Direct, CIC, FIC and Others. Below the distribution of the categories is shown as well as the percentage of sessions that were present in the next phase:

| Category | Route through which the website was entered (count) | Percentage that progressed to next phase |
|---|---|---|
| Direct | 63,794 | 8.45% |
| CIC | 57,955 | 8.42% |
| FIC | 16,055 | 6.27% |
| Others | 89 | 0% |

Table 8

Finally, the time the customers spent on a page will be examined as one of the


Figure 5: averages of time variables in each of the phases

4.2 Outlier handling

As mentioned in Chapter 3.2 of Methods, there were some outliers present in the data that had to be removed. First, some observations were dated 2009. Since this falls outside the time frame of the dataset, these observations were removed. Second, there were outliers in terms of time spent on pages. If a page is open for more than 600 seconds, it is assumed that the person is no longer looking at that page and has simply kept their browser open. Such a visit is therefore unlikely to be a serious visit and has been removed. Lastly, some small irregularities in the data, such as a strange value for a ClientID, were removed.
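The first two outlier rules amount to a simple filter on date and duration. The Python sketch below shows one possible form; the exact start and end days are assumptions, since only the months (April 2017 to April 2019) are stated.

```python
from datetime import date

# Assumed bounds: the data is said to run from April 2017 to April 2019.
START, END = date(2017, 4, 1), date(2019, 4, 30)
MAX_SECONDS = 600  # longer page views are treated as an idle browser


def keep_observation(obs_date, seconds):
    """True if the observation is inside the time frame and not idle."""
    return START <= obs_date <= END and seconds <= MAX_SECONDS
```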

4.3 Results of the models

Below the results of the various models used to test the hypotheses as established in 2.1.3 will be discussed. For each of the phases, a generalized linear regression model has been defined that shows the best predictive capabilities. To assess these predictive capabilities, the model will be compared with a null model and a model with slight alterations.

The criteria on which the various models will be compared are as follows:

First, the models will be compared with the use of the Akaike Information Criterion (AIC). This measure focuses on the accuracy of the estimates, as well as parsimony. The AIC includes a penalty for a lack of parsimony, which increases the score (Leeflang et al., 2015). This is why the model with the lowest AIC score will be chosen, as it offers the best balance between fit and parsimony among the alternatives.
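The standard definition of the criterion is AIC = 2k − 2 ln(L), where k is the number of estimated parameters and ln(L) the maximized log-likelihood. The Python sketch below applies it to two candidate models; the log-likelihood values are invented for illustration and are not results from the thesis.

```python
def aic(log_likelihood, n_parameters):
    """Akaike Information Criterion: 2k - 2 ln(L)."""
    return 2 * n_parameters - 2 * log_likelihood


# Hypothetical fits: a null model (intercept only) and a model
# with an intercept plus two predictors.
candidates = {
    "null": aic(log_likelihood=-1500.0, n_parameters=1),
    "model": aic(log_likelihood=-1450.0, n_parameters=3),
}
best = min(candidates, key=candidates.get)
```

In this invented case the richer model wins because its gain in log-likelihood outweighs the penalty for the two additional parameters.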

Second, the models will be compared on their predictive capabilities with the use of the hitrate measure. This measure is defined by the following equation: true positives / (true positives + false negatives). This means that the hitrate is the percentage of positive observations that are correctly classified, and thus a higher hitrate indicates a better predictive capability (Leeflang et al., 2015; Pituch & Stevens, 2016). In the model, this is the share of customers who actually progressed to the next phase for whom the model correctly predicted this progression.
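Following the definition given above, the hitrate can be computed from the actual and predicted outcomes as in the Python sketch below. The example vectors are invented for illustration.

```python
def hitrate(actual, predicted):
    """True positives / (true positives + false negatives)."""
    pairs = list(zip(actual, predicted))
    true_pos = sum(1 for a, p in pairs if a == 1 and p == 1)
    false_neg = sum(1 for a, p in pairs if a == 1 and p == 0)
    if true_pos + false_neg == 0:
        return float("nan")  # undefined without any actual positives
    return true_pos / (true_pos + false_neg)


actual = [1, 1, 1, 0, 0, 1]     # 1 = customer progressed to the next phase
predicted = [1, 0, 1, 0, 1, 1]  # model prediction for the same customers
rate = hitrate(actual, predicted)
```

Note that false positives (the fifth customer in the example) do not enter this measure, so it only describes how well actual progressions are recognized.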

Lastly, a log likelihood test is performed to compare the models to the null model. This is done to ensure that the model performs better than the null model; if this is the case, the outcome of the test should be significant (p < 0.001).
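The exact form of the test is not spelled out here; a common choice for comparing a fitted logit model with its null model is the likelihood ratio test, whose statistic is

$$LR = -2\,(\ln L_0 - \ln L_1) = 2\,(\ln L_1 - \ln L_0)$$

where $\ln L_0$ is the maximized log-likelihood of the null model and $\ln L_1$ that of the fitted model. Under the null hypothesis that the added predictors have no effect, $LR$ approximately follows a chi-squared distribution with degrees of freedom equal to the number of additional parameters, so a large value of $LR$ (and a correspondingly small p-value) indicates that the fitted model outperforms the null model.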

Then, when the best model is established for that particular phase, that model will be used to test the various hypotheses. These hypotheses will be tested by interpreting the outcomes of these generalized logistic regressions. To do this, the outcomes of the coefficients will be examined, as well as the marginal effects of the various variables. The coefficients will be examined by looking at the nature of the relationship between the predictors and the dependent variables, which can be either positive or negative. Then to see what the size of the effect is, the marginal effects will be interpreted.
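To illustrate how a marginal effect follows from a logistic regression coefficient, the sketch below computes an average marginal effect as the mean of beta * p * (1 - p) over all observations, where p is the predicted probability. This is the usual derivative-based definition for continuous predictors and is an assumption here; the text does not state which definition was used, and for a binary predictor a discrete change in probability may be more appropriate. The coefficients and data are hypothetical.

```python
import math


def sigmoid(z):
    """Inverse of the logit link."""
    return 1.0 / (1.0 + math.exp(-z))


def average_marginal_effect(intercept, coefs, rows, index):
    """Mean of d P(y=1) / d x_index over all observations in rows."""
    total = 0.0
    for x in rows:
        z = intercept + sum(b * v for b, v in zip(coefs, x))
        p = sigmoid(z)
        total += coefs[index] * p * (1.0 - p)
    return total / len(rows)


# Hypothetical model with one predictor and a single observation at x = 0.
ame = average_marginal_effect(intercept=0.0, coefs=[1.0], rows=[[0.0]], index=0)
print(ame)
```

The sign of the marginal effect always matches the sign of the coefficient, while its size depends on where the predicted probabilities lie.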

4.3.1 Multicollinearity

For the results of the logistic regression to be valid, no multicollinearity must be present in the predictor (independent) variables (Lani, 2014; Leeflang et al., 2015). The presence of multicollinearity means that there is a high correlation between these predictor variables, which makes the coefficient estimates less reliable (Lani, 2014; Leeflang et al., 2015). To test for the presence of multicollinearity, the Variance Inflation Factor (VIF) has been computed for all four best performing logistic regression models as described below in the results. A VIF score below five indicates that multicollinearity is not present. If the score exceeds this number, multicollinearity issues are present in the dataset. The results can be found in the table below.

Model     Variable                        VIF score

Model 1   Blog                            1.000088
          Ga.avgTimeOnPage                1.000088

Model 2   Adtype                          1.002035
          Ga.avgTimeOnPage                1.002035

Model 3   Ga.avgTimeOnPage                1.067107
          Timeonindividualpage            1.067107
          Highlow                         6.106804
          Ga.avgTimeOnPage*highlow        7.041850
          Timeonindividualpage*highlow    1.448340

Model 4   Purchasehistory                 1.321397
          Ga.avgTimeOnPage                1.000000
          Highlow                         1.357510
          Purchasehistory*highlow         1.658907

Table 9


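In Model 3, the VIF scores of highlow and of the Ga.avgTimeOnPage*highlow interaction exceed five (Table 9). However, since the timeonindividualpage variable and the interaction effect between highlow and that variable are the most important for the testing of the hypothesis, this multicollinearity issue will likely not affect the results.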
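For a model with exactly two predictors, the VIF reduces to 1 / (1 - r^2), where r is the Pearson correlation between the two predictors. This is also why both predictors in such a model share the same VIF score. The sketch below illustrates this special case with hypothetical data; models with more predictors require regressing each predictor on all the others.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def vif_two_predictors(x, y):
    """VIF shared by both predictors in a two-predictor model."""
    r = pearson_r(x, y)
    return 1.0 / (1.0 - r * r)


# Hypothetical predictor values with a correlation of 0.8.
x1 = [1, 2, 3, 4]
x2 = [1, 3, 2, 4]
vif = vif_two_predictors(x1, x2)
print(vif, vif < 5)
```

Even a correlation of 0.8 stays below the threshold of five in this example, which shows that the threshold is only crossed at very high correlations.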

4.3.2 Phase one

4.3.2.1 Selection of model

To test the effect of reading inspirational content on the successful progression of the customer to the next phase, an extra variable has been added to the dataset that indicates whether or not blog content has been read. Then two models have been developed, a logistic regression model and a generalized linear model. The performance criteria of these two models, as well as the null model, are described in the table below:

Model        AIC        Hitrate   Log likelihood

Null model   210721     1         -
LM           218945.3   0.9855    Not significant
GLM          210310     0.98578   Significant

Table 10

As can be seen in the above table, the model that performs best according to the AIC is the GLM model, which has the lowest AIC. Concerning the hitrate, the null model appears to perform best. However, this does not necessarily mean that the null model predicts best. As can be seen in Appendix 7.5.1, this null model predicts everyone to go to the next phase, which means there are no false negative predictions. Therefore, the hitrate of this null model is not reliable. This is not the case for the LM and GLM models, as can be seen in Appendix 7.5.1. Both have a similar hitrate, but the GLM model performs slightly better. Lastly, the log likelihood test proved to be significant for the GLM model and not significant for the LM model, meaning that the GLM model performs better than the null model and the LM model does not. Thus, based on the above reasoning, the GLM model has been chosen to test the effect of watching inspirational content on the successful progression to the next phase.
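The reason why a hitrate of 1 is uninformative for the null model can be illustrated with a small sketch. A model that predicts every customer to progress never produces a false negative, so its hitrate is 1 by construction, while it fails to identify a single customer who did not progress. The outcome data below are hypothetical.

```python
def confusion_counts(actual, predicted):
    """Count true/false positives and negatives for 0/1 labels."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if a == 1 and p == 1:
            tp += 1
        elif a == 0 and p == 1:
            fp += 1
        elif a == 0 and p == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


# Hypothetical outcomes: 98 customers progressed, 2 did not.
actual = [1] * 98 + [0] * 2
# A null-style model that predicts progression for everyone.
predicted = [1] * 100

tp, fp, tn, fn = confusion_counts(actual, predicted)
null_hitrate = tp / (tp + fn)
true_negative_rate = tn / (tn + fp)
print(null_hitrate, true_negative_rate)
```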

To test this effect, two models have been distinguished that test two variations of blog classification. One model only focuses on the actual blog content of the website and the other model includes the inspiration and trends categories, since these two categories can also be regarded as inspirational content. This is done to test the robustness of the findings for inspirational content and to see if there is a variation in these subcategories. The model chosen for testing the above is the Generalized Linear Model (GLM) with the “logit” link function. This model is robust, since other similar models tested on this dataset provide similar outcomes, as can be seen in Appendix 7.5.1.

The GLM model includes the chosen predictor variables “blog” and “ga.avgTime” to test the hypothesis, since this is the model with the lowest AIC score out of all the models that have been used. In addition, the log likelihood tests point out that adding these two variables significantly improves the fit of the model as compared to the null model, see Appendix 7.5.1.
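To show how the logit link relates a binary predictor such as “blog” to the probability of progression, the sketch below uses a simplified model with that single predictor only. In this special case the maximum likelihood estimates have a closed form: the intercept is the log-odds of progression among non-readers and the coefficient is the difference in log-odds between readers and non-readers. The counts are hypothetical and the time variable is deliberately left out, so this is not the model estimated in this research.

```python
import math


def logit(p):
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def fit_single_binary_predictor(n0, y0, n1, y1):
    """Closed-form logit estimates for one 0/1 predictor.

    n0, y0: group size and number of successes when the predictor is 0.
    n1, y1: group size and number of successes when the predictor is 1.
    """
    intercept = logit(y0 / n0)
    coef = logit(y1 / n1) - intercept
    return intercept, coef


# Hypothetical counts: 900 of 1000 non-readers and 80 of 100 blog readers progressed.
intercept, blog_coef = fit_single_binary_predictor(n0=1000, y0=900, n1=100, y1=80)
odds_ratio = math.exp(blog_coef)
print(intercept, blog_coef, odds_ratio)
```

A negative coefficient, or equivalently an odds ratio below one, means that in these hypothetical data blog readers have lower odds of progressing than non-readers.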
