Valuing Website Visitors

Adjusted Visitor Value: Conversion Probability * Visitor Value

Keywords: online data, web metrics, online metrics, valuing customers, valuing visitors, clickstream data, web analytics, online value, visitor value, web mining, conversion probability.

Master Thesis

MSc Business Administration, Marketing Management

Author: Ayco Jelmer van der Meer

Address: Prins Willem Alexanderstraat 80, 8501 MD, Joure

Phone number: +31 (0)6 122 710 23

Mail: aycovandermeer@gmail.com

Student number: s1975870

Organization: University of Groningen

Faculty: Economics and Business

1st Supervisor University of Groningen: Dr. M.C. Non

2nd Supervisor University of Groningen: D. Naydenova MSc

Company: FBTO (Achmea)

1st Supervisor: T. Bakker

2nd Supervisor: S. Hanekamp

Completion date: June 29, 2011.

MANAGEMENT SUMMARY

Traditional media such as radio, television and print advertising have well-defined and well-researched metrics to measure advertising effectiveness. The internet, however, still has a long way to go before advertising can be assessed effectively (Kumar and Shah, 2004). For instance, Lavrakas et al. (2010) argue that popular campaign assessment methods have poor reliability and validity. Furthermore, Fulgoni and Mörn (2009) concluded that the number of clicks per campaign is not an accurate indicator of effectiveness either.

As the online environment “breathes” data, a model that effectively assesses the value of website visitors is not only possible but also highly desirable. This master thesis is an attempt to create such a model. First, all online behavior prior to conversion is known, such as the total number of pages visited and the total time on site in minutes. These measures are also known as online metrics. Second, these data can be used to develop a new metric: “Visitor Value”. A multiple regression analysis estimates the weight of each original online metric within the Visitor Value metric. Because only converting visitors are included in the regression, the dependent variable is known: each conversion represents a certain value. Consequently, the results of the multiple regression analysis show which online behavior represents which value. Likewise, the regression equation can be used to allocate a value to each (new) website visitor. The Visitor Value metric has an acceptable level of predictive power.
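The regression step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the model used in this thesis. The two metrics (page views and minutes on site), the visitor data and the conversion values are all hypothetical. Ordinary least squares via NumPy stands in for whatever statistics package was actually used.

```python
import numpy as np


def fit_visitor_value(metrics: np.ndarray, conversion_value: np.ndarray) -> np.ndarray:
    """Estimate regression weights (intercept first) from converting visitors only.

    metrics: shape (n_converters, n_metrics); conversion_value: shape (n_converters,).
    """
    design = np.column_stack([np.ones(len(metrics)), metrics])
    coefficients, *_ = np.linalg.lstsq(design, conversion_value, rcond=None)
    return coefficients


def visitor_value(coefficients: np.ndarray, metrics: np.ndarray) -> np.ndarray:
    """Apply the fitted equation to the metrics of any (new) visitor."""
    return coefficients[0] + metrics @ coefficients[1:]


# Hypothetical converters: [page views, minutes on site]
converters = np.array([[3, 2.0], [8, 5.5], [12, 9.0], [5, 1.5], [20, 14.0]])
# Hypothetical conversion values, constructed as 10 + 2 * pages + 0.5 * minutes
values = 10 + 2 * converters[:, 0] + 0.5 * converters[:, 1]

coefficients = fit_visitor_value(converters, values)
new_visitor = np.array([[6, 4.0]])
print(visitor_value(coefficients, new_visitor))  # approximately 24
```

Because only converters enter the fit, every row has an observed conversion value. The fitted equation is then applied to visitors whose value is unknown.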

equation it is possible to distinguish non-converting visitors from converting visitors in 93.6% of cases, which indicates a high level of predictive power.
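The 93.6% figure reads as a hit ratio: the share of visitors whose predicted outcome (convert or not) matches the observed one. The following is a minimal sketch. It assumes a logistic model and the conventional cutoff of 0.5, although the excerpt does not state which cutoff was used. The weights and probabilities are hypothetical.

```python
import math


def conversion_probability(intercept: float, weights: list[float], metrics: list[float]) -> float:
    """Logistic transform of a linear score: 1 / (1 + exp(-z))."""
    z = intercept + sum(w * x for w, x in zip(weights, metrics))
    return 1.0 / (1.0 + math.exp(-z))


def hit_ratio(probabilities: list[float], converted: list[bool], cutoff: float = 0.5) -> float:
    """Share of visitors whose predicted class (p >= cutoff) matches the observed outcome."""
    hits = sum((p >= cutoff) == y for p, y in zip(probabilities, converted))
    return hits / len(converted)


# Hypothetical predictions for four visitors, only the first of whom converted
probabilities = [0.9, 0.2, 0.7, 0.4]
converted = [True, False, False, False]
print(hit_ratio(probabilities, converted))  # 0.75
```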

As a consequence, these two metrics can be combined. Multiplying Visitor Value by Conversion Probability yields the most important and central metric in this research: Adjusted Visitor Value, i.e. the Visitor Value adjusted for the Conversion Probability. This approach discounts the Visitor Value of visitors who are unlikely to convert. Therefore, it can be interpreted as potential value.
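The multiplication itself can be stated directly in code. This is a minimal sketch. The range check on the probability is a design choice of this sketch, not something the text prescribes. The two visitors are hypothetical and illustrate how similar Visitor Values diverge once Conversion Probability is taken into account.

```python
def adjusted_visitor_value(visitor_value: float, conversion_probability: float) -> float:
    """Adjusted Visitor Value = Visitor Value x Conversion Probability."""
    if not 0.0 <= conversion_probability <= 1.0:
        raise ValueError("conversion_probability must lie in [0, 1]")
    return visitor_value * conversion_probability


# Two hypothetical visitors with similar Visitor Values but different likelihoods to convert
likely_buyer = adjusted_visitor_value(200.0, 0.9)
unlikely_buyer = adjusted_visitor_value(190.0, 0.1)
print(likely_buyer, unlikely_buyer)
```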

The metrics Visitor Value, Conversion Probability and thus Adjusted Visitor Value can be applied in several ways. First, they can be used to optimize online advertising campaigns. Under normal circumstances campaigns are assessed by looking at sales figures. When two campaigns are active, they can also be assessed using all three metrics. Visitor Value provides insight into whether the visitors are valuable in terms of their online behavior. In addition, Conversion Probability provides insight into the likelihood that these visitors convert. The combined metric, Adjusted Visitor Value, shows the potential value of the visitors per campaign. It is possible that campaign X has one conversion and a low Adjusted Visitor Value, whereas campaign Y has zero conversions and a high Adjusted Visitor Value. As Adjusted Visitor Value can be interpreted as potential value, campaign Y is more interesting. Using these insights, the campaign budget can be reallocated in order to optimize results during the campaign. Furthermore, it is possible to compare not only two or more campaigns but also the performance of advertising expressions within one campaign. For instance, suppose one campaign on a website uses five different advertising expressions, i.e. banners. The campaign reports show the performance of these five banners in terms of Visitor Value, Conversion Probability and Adjusted Visitor Value. This reveals, for instance, which banners attract visitors with higher Conversion Probabilities.
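The campaign comparison above amounts to grouping visitors by campaign and averaging the three metrics. This is a minimal sketch. It assumes that per-visitor Visitor Value and Conversion Probability are already available. Campaigns X and Y mirror the example in the text, but every number is hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical visits: (campaign, visitor_value, conversion_probability, converted)
visits = [
    ("X", 20.0, 0.80, True),
    ("X", 30.0, 0.05, False),
    ("Y", 250.0, 0.60, False),
    ("Y", 220.0, 0.55, False),
]


def campaign_report(visits):
    """Per campaign: conversions plus mean Visitor Value, Conversion Probability and AVV."""
    groups = defaultdict(list)
    for campaign, value, probability, converted in visits:
        groups[campaign].append((value, probability, converted))
    report = {}
    for campaign, rows in groups.items():
        report[campaign] = {
            "conversions": sum(converted for _, _, converted in rows),
            "mean_visitor_value": mean(value for value, _, _ in rows),
            "mean_conversion_probability": mean(p for _, p, _ in rows),
            "mean_adjusted_visitor_value": mean(v * p for v, p, _ in rows),
        }
    return report


for campaign, metrics in campaign_report(visits).items():
    print(campaign, metrics)
```

If visits are grouped by a banner identifier instead of a campaign name, the same function compares the advertising expressions within one campaign.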

PREFACE

A particular point of interest of mine is online marketing, which is the main reason why I wanted to conduct research in this field of marketing. Another point of interest is improving my skills with multivariate tests. For a marketing management student these topics are heavy as well as challenging material. The opportunity to combine both interests in one master thesis is therefore a perfect end to my time as a student.

Besides my family, friends and girlfriend, I would like to thank my supervisors at both FBTO and the University of Groningen. I would also like to thank my colleagues at FBTO who helped me gather the data and/or provided feedback to improve my thesis. I hope you enjoy reading my thesis.

TABLE OF CONTENTS

1. INTRODUCTION ... 12

1.1 Background problem ... 12

1.2 Problem Statement ... 15

1.2.1 Research Objective ... 15

1.3 Theoretical and Social Relevance ... 15

1.4 Structure of Thesis ... 16

2. THEORETICAL FRAMEWORK ... 17

2.1 Literature review ... 17

2.1.1 Time On Site ... 17

2.1.2 Depth of Visit ... 18

2.1.3 Price Information Search Goals ... 19

2.1.4 Product Information Search Goals ... 20

2.1.5 Positive and Negative Service Goals ... 20

2.1.6 New versus Returning Visitors: Amount of Sessions ... 21

2.1.7 Amount of Bouncers ... 23

2.1.8 Amount of Website Errors ... 24

2.1.9 Type of Customer ... 24

3. RESEARCH DESIGN ... 28

3.1 Company information ... 28

3.2 Type of Research ... 28

3.3 Data Description ... 29

3.4 Dependent Variable ... 30

3.5 Independent Variables ... 32

3.6 Plan of Analysis ... 32

4. RESULTS ... 34

4.1 Outliers ... 34

4.2 Factor Analysis ... 34

4.2.1 Objectives of Factor Analysis ... 34

4.2.2 Research Design of Factor Analysis ... 35

4.2.3 Assumptions in Factor Analysis ... 35

4.2.4 Deriving Factors and Assessing overall Fit ... 35

4.2.5 Interpretation of Factors ... 36

4.3 Differences between the Types of Customers ... 38

4.4 Visitor Value - Multiple Regression Analyses ... 40

4.4.1 Objectives of Multiple Regression Analysis ... 40

4.4.2 Research Design of the Multiple Regression Analysis ... 40

4.4.3 Model Estimating & assessing overall Model Fit ... 42

4.4.4 Assumptions within the Multiple Regression Analyses ... 44

4.4.5 Identification of a Possible Moderating Variable ... 45

4.4.6 Interpreting the Regression Variate ... 46

4.4.7 Validation of the Results ... 49

4.6 Conversion Probability - Logistic Regression Analysis ... 50

4.6.1 Sample Size ... 50

4.6.2 Estimation of the Logistic Regression Model ... 50

4.6.3 Assessing Overall Model Fit ... 52

4.6.4 Interpretation of the Results ... 52

4.6.5 Validation of the Results ... 53

4.7 Hypotheses Testing ... 53

4.7.1 Visitor Value: Multiple Regression Analyses ... 53

4.7.2 Visitor Value: Simple Regression Analyses ... 55

4.7.3 Conversion Probability: Logistic Regression Analysis ... 58

4.7.4 Conversion Probability: Simple Logistic Regression Analyses ... 59

4.7.5 Visitor Value: Depth of Visit versus Time on Site ... 59

4.7.6 Conversion Probability: Depth of Visit versus Time on Site ... 59

4.7.7 Visitor Value: Possible Moderating Variable: Website Errors ... 59

5. CONCLUSIONS AND RECOMMENDATIONS ... 60

5.1 Results on Hypotheses ... 60

5.2 Managerial Implications ... 61

5.2.1 Visitor Value ... 61

5.2.2 Conversion Probability ... 61

5.2.3 Adjusted Visitor Value ... 62

5.2.4 Applications of Adjusted Visitor Value ... 63

5.3 Limitations ... 65

5.4 Future Research Directions ... 66

Appendix A - Glossary ... 78

Appendix B – List of All Variables ... 80

Appendix C – Factor Analysis ... 81

Appendix D – Assumptions in Multiple Regression ... 84

Appendix E – Logistic Regression Output ... 86

Appendix F – Infographic Adjusted Visitor Value ... 87

1. INTRODUCTION

“When you can measure what you are speaking about, and express it in numbers, you know something about it; but when you cannot express it in numbers, your knowledge is of a meager and unsatisfactory kind; it may be the beginning of knowledge, but you have scarcely in your thoughts advanced to the state of science.” – Lord Kelvin

1.1 BACKGROUND PROBLEM

“Does anyone really know if online ad campaigns are working?” is the title of a study by Lavrakas et al. (2010). The authors conclude that commonly used methods to assess the effectiveness of advertising on the internet have poor reliability and validity. The number of clicks is not an accurate indicator of the effectiveness of online ad campaigns either (Fulgoni and Mörn, 2009).

In the field of online marketing there are many possible approaches to evaluating online advertising campaigns. Kumar and Shah (2004) found that an increasingly large number of companies claim to optimize online advertising campaigns using Return on Investment measures. However, they also found that these companies were looking at conversions strictly as margin transactions (Kumar and Shah, 2004). Other companies may use some additional online search behavior as an evaluation metric, e.g. average time on site per advertising campaign. Moreover, Danaher (2007) developed a model to predict page views, that is, the reach and frequency of online advertising campaigns.

Finally, the second metric, “Conversion Probability”, is calculated. This metric is a prediction of the probability that the website visitor converts. Like Visitor Value, this metric is based on the same online behavior.

Additionally, the two new metrics are combined: Visitor Value is used to calculate “Adjusted Visitor Value”, which incorporates the “Conversion Probability”. The Adjusted Visitor Value metric is calculated by multiplying Visitor Value by the Conversion Probability. Conversion probabilities are also used by Rutz and Bucklin (2007), who apply this method to find out whether keywords within search engine advertising can explain conversion rates. The authors found that keywords were, in part, predictive of conversion rates. The Adjusted Visitor Value metric can be particularly useful for identifying potential customers. A non-converting visitor and a converting visitor may be quite similar and thus be allocated similar Visitor Values. A simple explanation could be that the non-converting visitor has not yet taken the last steps in the purchasing process. The Adjusted Visitor Value metric makes a clear distinction between these two types of visitors. This approach provides substantial opportunities for campaign optimization. Under normal circumstances campaigns can only be evaluated afterwards, whereas these predictions make it possible to optimize campaign settings during the campaign term. Besides providing extra metrics to evaluate campaigns, i.e. (Adjusted) Visitor Value and Conversion Probability, these metrics also create opportunities to personalize webpages. For instance, visitors with a high Visitor Value but a low Conversion Probability can be targeted in a more sophisticated manner to increase the likelihood of conversion. Visitors with a low Visitor Value can be targeted with different marketing communications.
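The targeting idea at the end of this paragraph can be written as a simple decision rule. The thresholds (a Visitor Value of 100 and a Conversion Probability of 0.5) and the segment labels are hypothetical placeholders, since the thesis does not specify any cut-off values.

```python
def segment_visitor(visitor_value: float, conversion_probability: float,
                    value_threshold: float = 100.0,
                    probability_threshold: float = 0.5) -> str:
    """Assign a visitor to a hypothetical targeting segment (thresholds are placeholders)."""
    if visitor_value < value_threshold:
        # Low value: address with different marketing communications
        return "low_value"
    if conversion_probability < probability_threshold:
        # Valuable but unlikely to convert: candidate for more sophisticated targeting
        return "high_value_low_probability"
    return "high_value_high_probability"


print(segment_visitor(250.0, 0.2))
```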

resources can be allocated to website visitors who are most likely to be influenced by the ads (Goldfarb and Tucker, 2011).

The importance of personalization is also stressed by Goldsmith (1999), who argues that personalization must become the basis of the marketing management trajectory. In addition, Constantinides (2006) finds that the traditional marketing mix as a management tool lacks personalization. By identifying visitors based on (Adjusted) Visitor Value, FBTO will be able to tailor, and hence personalize, its online marketing campaigns. Another study, by Montgomery et al. (2004), finds that dynamically changing websites could increase conversion rates from 7% to 9%. The authors argue that this finding indicates that personalization could be quite profitable, i.e. tailoring web designs and product offerings to the individual website visitor.

greatest potential for increased effectiveness lies in the development of new segmentation strategies that are dynamically based on the situation at the time of the visit, that is, after a user enters a website. (Adjusted) Visitor Value can serve as such a new segmentation strategy.

1.2 PROBLEM STATEMENT

The problem statement is as follows:

How can companies allocate value to individual website visitors based on their online behavior?

1.2.1 RESEARCH OBJECTIVE

The research objective is to find out which online behavior represents what value in terms of “Visitor Value”, and why, on academic grounds, it represents that Visitor Value. First, the relationships between each online metric and/or online metric dimension and Visitor Value will be investigated, i.e. whether the relationships between the dependent and independent variables are positive or negative. Second, a model will be developed for both new customers and current customers that predicts Visitor Value based on the individual's online behavior. This model indicates the importance and direction of the relationship between each independent variable and the dependent measure, Visitor Value. In addition, a model will be developed to predict the Conversion Probability, as this metric will be used to adjust Visitor Value, resulting in the “Adjusted Visitor Value” metric. See Appendix F for an infographic.

1.3 THEORETICAL AND SOCIAL RELEVANCE

The Research Priorities 2010-2012 presented by the Marketing Science Institute (2010) show that there is a great need for research into the allocation of marketing resources. Using a new online metric,

sophisticated manner, that is, by identifying valuable customers based on their online behavior, i.e. after a certain number of clicks. Knowing where and how valuable customers arise can be very useful for allocating marketing resources.

In addition, Alba et al. (1997) suggest that in-store retailers should improve their personalized information, which they regard as a strategic advantage over online retailers. This strategic advantage of in-store retailers could disappear through the development of new metrics in combination with technical software such as SpeedTrap, because product offerings can then be changed based on metrics such as Visitor Value. Moreover, according to Bélanger et al. (2006), the predominant way for organizations to determine success is to gather clickstream data and make inferences from site traffic. Finally, as the role of the internet continues to increase, studies of commercial internet sites are crucial to academic research (Allen et al., 2006).

1.4 STRUCTURE OF THESIS

2. THEORETICAL FRAMEWORK

This chapter consists of a literature review of online behavior, measured by online metrics such as time on site and total pages viewed. Each section reviews one metric and results in one or more hypotheses about Visitor Value and/or Conversion Probability. Note that multiplying the two, i.e. Visitor Value and Conversion Probability, yields the metric “Adjusted Visitor Value”. Hence, the two closely related metrics are also the components of the Adjusted Visitor Value metric. Additionally, online data are site dependent (Lynch et al., 2001), which is listed as a limitation of this research. The theoretical framework ends with a conceptual model and a list of all hypotheses that will be tested. Detailed definitions can be found in the glossary, Appendix A.

2.1 LITERATURE REVIEW

2.1.1 TIME ON SITE

learn about the goods prior to purchase, suggesting it is a search good. However, the quality of the goods is difficult to assess, suggesting it is an experience good (Nelson, 1970).

In addition, there are two important counterarguments regarding the importance of this metric. First, according to Bucklin and Sismeiro (2003), time constraints are likely to play an important role in online behavior. The authors find that the longer visitors stay on the website, the higher the probability of exiting. Second, Johnson et al. (2003) find that visitors spend less time per session the more often they visit the same website. Despite these counterarguments, the following is expected:

H1a: There is a positive relationship between Time on Site and Visitor Value.

H1b: There is a positive relationship between Time on Site and Conversion Probability.

2.1.2 DEPTH OF VISIT

Depth of visit is defined as “the number of pages viewed or requested during a visitor’s session” (Kaushik, 2007; Drèze and Zufryden, 1997). Bhat et al. (2002) consider depth of visit the key measure of website activity. In practice, this metric is also referred to as page views or pageload activity. Kaushik (2007) argues that this metric is considered a proxy for customer engagement: if website visitors view more pages, the website must have engaging content. Other studies, such as Lin et al. (2010), found a significant positive relationship between the number of pages accessed in a visiting session and conversion. Furthermore, depth of visit has a significant effect on customer purchase probabilities (Manchanda et al., 2006). Following this line of reasoning, it is reasonable to assume that depth of visit has a positive relationship with both Conversion Probability and Visitor Value.

number of page views when they return to a website. Hence, the depth of visit metric may be more important during a first visit than during subsequent visits. Nevertheless, it is still reasonable to assume that depth of visit is positively related to Visitor Value and Conversion Probability.

The following is expected:

H2a: There is a positive relationship between Depth of Visit (Page Views) and Visitor Value.

H2b: There is a positive relationship between Depth of Visit (Page Views) and Conversion Probability.

H3a: Depth of Visit (Page Views) is more important as a predictor of Visitor Value than Time on Site.

H3b: Depth of Visit (Page Views) is more important as a predictor of Conversion Probability than Time on Site.

2.1.3 PRICE INFORMATION SEARCH GOALS

Price information search goals can be defined as goals achieved by visitors who obtain price information for a specific product, i.e. by browsing to pages related to price information. Price has been considered a key motivator of consumer decision making (Gupta and Kim, 2010). Smith and Brynjolfsson (2001) claim that website visitors do not simply choose the website that offers the lowest price, but balance quality and price factors in their decision making. Besides that, several authors argue that experience effects, i.e. effects that reduce the incentive to learn new websites, increase cognitive lock-in (Johnson, Bellman and Lohse, 2003; Wood and Lynch, 2002).

In addition, website visitors differ in their underlying needs for different types of information. Website visitors who obtain price information through online infomediaries pay lower prices (for the same product), whereas website visitors who obtain product information pay higher prices (Viswanathan et al., 2007).

than 50% of all online transactions are accounted for by these visitors. This suggests that these website visitors have a higher Conversion Probability and Visitor Value than website visitors who do not look for product information. Therefore, the following is expected:

H4a: There is a positive relationship between each Price Information Search Goal and Visitor Value.

H4b: There is a positive relationship between each Price Information Search Goal and Conversion Probability.

2.1.4 PRODUCT INFORMATION SEARCH GOALS

As previously mentioned, website visitors who obtain product information through online infomediaries pay higher prices (Viswanathan et al., 2007). This suggests that these website visitors are more valuable in terms of Visitor Value than website visitors who do not obtain such product information. Likewise, Bellman et al. (1999) claimed that product information is the most important predictor of online buying behavior. Therefore:

H5a: There is a positive relationship between each Product Information Search Goal and Visitor Value.

H5b: There is a positive relationship between each Product Information Search Goal and Conversion Probability.

2.1.5 POSITIVE AND NEGATIVE SERVICE GOALS

Service goals can be defined as achieved goals that enable visitors to produce a service independently of direct service employee involvement (Meuter et al., 2000). In the literature, these are also referred to as Self-Service Technologies (SSTs) (Meuter et al., 2000). Examples of SSTs include changing account information and declaring invoices.

studies show that customer satisfaction can affect profitability (e.g. Reichheld and Sasser, 1990). Understanding the underlying factors that influence customer satisfaction with SSTs is therefore important. Meuter et al. (2000) identify three groups of SSTs that lead to customer satisfaction:

- SSTs that are able to bail customers out of immediate or troubling situations;
- SSTs that are satisfying because of their relative advantage;
- SSTs that simply “did their job”.

Customer dissatisfaction, in contrast, occurs in situations of technology failure, process failure, poor design and customer-driven failure. This indicates that successfully using SSTs increases customer satisfaction, which in turn positively influences Visitor Value. However, as Meuter et al. (2005) state, those savings cannot be realized unless visitors embrace and actually use the technology.

In contrast to positive service goals, there are also service goals with a negative impact, e.g. customers who churn online, declare invoices or report damages. We therefore distinguish between positive and negative service goals:

H6a: There is a positive relationship between the usage of positive service goals (SSTs) and Visitor Value.

H6b: There is a positive relationship between the usage of positive service goals (SSTs) and Conversion Probability.

H7a: There is a negative relationship between the usage of negative service goals (SSTs) and Visitor Value.

H7b: There is a negative relationship between the usage of negative service goals (SSTs) and Conversion Probability.

2.1.6 NEW VERSUS RETURNING VISITORS: AMOUNT OF SESSIONS

Sismeiro, 2003). The importance of these differences in browsing behavior can be illustrated by the following example. When a new online advertising campaign attracts new website visitors, the depth of visit metric will increase, which suggests a positive result of the campaign. However, Bucklin and Sismeiro (2003) argue that this could be a false signal of change, and that managers should therefore take the mix of new and returning visitors into account. The authors argue that an increase in the depth of visit metric is not by definition an indicator of effectiveness, but a normal consequence of having more new visitors.
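The false-signal argument is essentially a composition effect. The following is a minimal sketch with hypothetical numbers. Depth of visit within each group stays fixed, yet the site-wide average rises purely because the share of new visitors rises.

```python
def average_depth_of_visit(share_new: float, depth_new: float, depth_returning: float) -> float:
    """Site-wide mean page views as a weighted mix of new and returning visitors."""
    return share_new * depth_new + (1.0 - share_new) * depth_returning


# Hypothetical: new visitors view 8 pages and returning visitors 4, in both periods
before_campaign = average_depth_of_visit(0.2, 8.0, 4.0)  # about 4.8
after_campaign = average_depth_of_visit(0.5, 8.0, 4.0)   # about 6.0
print(before_campaign, after_campaign)
```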

The authors use returning visitors to test for learning effects as experience with the website increases. It is argued that visitors learn how to navigate the site efficiently and are more knowledgeable from previous visits. This could mean that visitors spend less time on the website because of more efficient browsing behavior. However, Bucklin and Sismeiro (2003) found no indication of whether reduced time on site is due to a decrease in depth of visit or to shorter page view durations. They were unable to find a significant change in the time on site metric. Finally, although there is no significant evidence that returning visits decrease time on site, learning effects could still occur within the time on site metric.

This learning effect, i.e. the increase in knowledge and familiarity, has been studied for several years. Park et al. (1989) claim that an increase in store knowledge and familiarity leads to more efficient search behavior. The authors mention two effects on future store visiting behavior. First, the amount of explicit search required to make purchase decisions decreases, as consumers have more knowledge to draw from. Second, knowledgeable consumers search more, as they are able to search more efficiently (Johnson and Russo, 1984). This suggests that as consumers repeatedly visit a store, they become more familiar with and knowledgeable about the search process. This in turn suggests that consumers who shop more frequently are more likely to make a purchase (Janiszewski, 1998; Roy, 1994).

There is, however, one potential problem with this metric. Returning website visitors might have a different IP address, e.g. because their service provider allocates them a different IP address each time they connect. Website visitors could also use different computers, which makes it difficult to distinguish returning visitors from new ones (Drèze and Zufryden, 1997). Software such as SpeedTrap is able to identify sessions that belong together and to allocate them to specific, already known website visitors. This algorithm is not foolproof, so it is listed as a limitation of this research. The only way to be 100% certain is when visitors enter the company's online environment by logging in with their credentials. The hypotheses are as follows:

H8a: There is a positive relationship between the Amount of Sessions and Visitor Value.

H8b: There is a positive relationship between the Amount of Sessions and Conversion Probability.

2.1.7 AMOUNT OF BOUNCERS

A bouncing website visitor is defined as a visitor who stays on a website for only a few seconds. In this study, a bouncer is a website visitor who exits the website within 15 seconds. However, there is no hard rule about this time bucket (Kaushik, 2007). Kaushik (2007) also claims that bounce rate is an important metric to monitor, because visitors who bounce from a website are unlikely to convert. In addition, Kaushik (2007) suggests that high bounce rates may indicate either that users are dissatisfied with the page content or layout, or that the website has contextual misalignments. The importance of this metric is also supported by Sculley et al. (2009), who found that bounce rate provides a useful assessment of user satisfaction for, e.g., sponsored search advertising. Sculley et al. (2009) also claim that it complements other quality metrics such as click-through and conversion rates. The metric is cumulative, as every website visit can result in a bounce. Therefore:

H9a: There is a negative relationship between the Amount of Bouncers and Visitor Value.

H9b: There is a negative relationship between the Amount of Bouncers and Conversion Probability.
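The bouncer definition used in this study translates into a simple cumulative count per visitor. This is a minimal sketch. It assumes that session durations in seconds are available and that a session lasting exactly 15 seconds counts as a bounce, since "within 15 seconds" does not specify the boundary. The durations are hypothetical.

```python
BOUNCE_THRESHOLD_SECONDS = 15  # this study's cut-off; Kaushik (2007) notes there is no hard rule


def amount_of_bounces(session_durations_seconds: list[float],
                      threshold: float = BOUNCE_THRESHOLD_SECONDS) -> int:
    """Count a visitor's sessions that ended within the threshold (boundary assumed inclusive)."""
    return sum(1 for duration in session_durations_seconds if duration <= threshold)


# Hypothetical session durations (seconds) for one visitor across the year
print(amount_of_bounces([8, 240, 15, 600, 3]))  # 3
```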

2.1.8 AMOUNT OF WEBSITE ERRORS

As mentioned in the service goal section (2.1.5), customer dissatisfaction occurs in situations of technology failure, process failure, poor design and customer-driven failure (Meuter et al., 2005). Website visitors can experience website errors as well, for instance the 404 “Not Found” error. This common response indicates that the server has not found anything matching the requested URL, i.e. a web reference that does not lead to a valid or correct page (a dead link). Dead links are considered a minor inconvenience that can be resolved by using a web index or search engine (Spinellis, 2003). However, this minor inconvenience can cause visitors to exit. As Lohse and Spiller (1998) put it: “Nothing will drive away customers like a site full of dead links”. This suggests that website errors create an inconvenient browsing experience. In light of these findings:

H10a: There is a negative relationship between the Amount of Website Errors and Visitor Value.

H10b: There is a negative relationship between the Amount of Website Errors and Conversion Probability.

H11: Website errors negatively moderate the relationship between the other independent variables and Visitor Value.
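H11 is a moderation hypothesis. One common way to test it is to add interaction terms (each predictor multiplied by the amount of website errors) to the regression. A negative interaction coefficient means that errors weaken that predictor's effect. This sketch rests on assumptions: the data are hypothetical and constructed, and the thesis's own procedure (Section 4.4.5) may differ.

```python
import numpy as np


def moderation_design(predictors: np.ndarray, errors: np.ndarray) -> np.ndarray:
    """Columns: intercept, predictors, website errors, predictor x errors interactions."""
    errors = errors.reshape(-1, 1)
    return np.column_stack([np.ones(len(predictors)), predictors, errors, predictors * errors])


# Hypothetical data: one predictor (page views) and the number of website errors per visitor
pages = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float).reshape(-1, 1)
errors = np.array([0, 1, 0, 2, 1, 0, 3, 1], dtype=float)
# Constructed so that each error weakens the page-view effect by 0.5
value = 5 + 3 * pages[:, 0] - 1 * errors - 0.5 * pages[:, 0] * errors

coefficients, *_ = np.linalg.lstsq(moderation_design(pages, errors), value, rcond=None)
print(coefficients.round(3))  # intercept, pages, errors, pages x errors
```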

2.1.9 TYPE OF CUSTOMER

Accordingly, the following is expected:

H12a: Visitors who are current customers have a higher Visitor Value than visitors who just became a new customer.

H12b: Visitors who are current customers have a higher Conversion Probability than visitors who just became a new customer.

2.2 CONCEPTUAL MODEL

Figure 1 shows the conceptual model with the specific hypotheses that resulted from the literature review. In addition, a multiple regression analysis will be performed to identify the most important predictors of Visitor Value (related to H3a and H11), and a logistic regression analysis will be performed for H3b. These hypotheses are therefore not shown in the conceptual model. See Table 1 for a list of the hypotheses that will be tested.

Figure 1 - Conceptual Model (boxes: Time on Site, Depth of Visit, Price Information Search, Product Information Search, Positive Service Goals, Negative Service Goals, Amount of Sessions, Amount of Bouncers and Amount of Website Errors; outcome: Visitor Value)

The following hypotheses will be tested:

Hypotheses

H1a There is a positive relationship between Time on Site and Visitor Value.

H1b There is a positive relationship between Time on Site and Conversion Probability.

H2a There is a positive relationship between Depth of Visit (Page Views) and Visitor Value.

H2b There is a positive relationship between Depth of Visit (Page Views) and Conversion Probability.

H3a Depth of Visit (Page Views) is more important as a predictor of Visitor Value than Time on Site.

H3b Depth of Visit (Page Views) is more important as a predictor of Conversion Probability than Time on Site.

H4a There is a positive relationship between each Price Information Search Goal and Visitor Value.

H4b There is a positive relationship between each Price Information Search Goal and Conversion Probability.

H5a There is a positive relationship between each Product Information Search Goal and Visitor Value.

H5b There is a positive relationship between each Product Information Search Goal and Conversion Probability.

H6a There is a positive relationship between the usage of positive service goals (SSTs) and Visitor Value.

H6b There is a positive relationship between the usage of positive service goals (SSTs) and Conversion Probability.

H7a There is a negative relationship between the usage of negative service goals (SSTs) and Visitor Value.

H7b There is a negative relationship between the usage of negative service goals (SSTs) and Conversion Probability.

H8a There is a positive relationship between the Amount of Sessions and Visitor Value.

H8b There is a positive relationship between the Amount of Sessions and Conversion Probability.

H9a There is a negative relationship between the Amount of Bouncers and Visitor Value.

H9b There is a negative relationship between the Amount of Bouncers and Conversion Probability.

H10a There is a negative relationship between the Amount of Website Errors and Visitor Value.

H10b There is a negative relationship between the Amount of Website Errors and Conversion Probability.

H11 Website errors negatively moderate the relationship between the other independent variables and Visitor Value.

H12a Visitors who are current customers have a higher Visitor Value than visitors who just became a new customer.

H12b Visitors who are current customers have a higher Conversion Probability than visitors who just became a new customer.

3. RESEARCH DESIGN

In order to test the hypotheses above, a research method is developed and explained in this chapter. The research objective is to examine which online behavior represents what value, and hence how companies can allocate value to individual website visitors based on their online behavior. This chapter describes the type of research conducted and the metrics considered, and concludes with the plan of analysis.

3.1 COMPANY INFORMATION

The model is applied to data from FBTO, an insurance company in the Netherlands, which serves as the case study in this research. The company started as a regionally operating company and became a national direct writer. FBTO is part of Achmea, one of the largest financial services companies in the Netherlands. The Achmea brand includes, among others, Avéro Achmea, Centraal Beheer Achmea, Zilveren Kruis Achmea and Interpolis. In 2010, FBTO.nl welcomed 7,850,217 online visitors, of whom 1,299,476 calculated a price; in that year, 61,220 insurance policies were sold.
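The 2010 funnel figures above translate into simple ratios. The sketch below only restates the reported counts as percentages; note that it divides policies sold by visitors, which is not strictly a visitor conversion rate, since one visitor may buy several policies.

```python
# Funnel figures for FBTO.nl in 2010, as reported in Section 3.1.
VISITORS = 7_850_217
PRICE_CALCULATORS = 1_299_476
POLICIES_SOLD = 61_220


def percentage(part: int, whole: int) -> float:
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


# Share of visitors who calculated a price.
price_rate = percentage(PRICE_CALCULATORS, VISITORS)
# Policies sold per 100 visitors (one visitor may buy several policies).
policies_per_100_visitors = percentage(POLICIES_SOLD, VISITORS)
# Policies sold per 100 visitors who calculated a price.
policies_per_100_calculators = percentage(POLICIES_SOLD, PRICE_CALCULATORS)

print(f"{price_rate:.2f}% {policies_per_100_visitors:.2f}% {policies_per_100_calculators:.2f}%")
# prints: 16.55% 0.78% 4.71%
```

Roughly one in six visitors calculated a price, and fewer than five policies were sold per hundred price calculations.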

3.2 TYPE OF RESEARCH

3.3 DATA DESCRIPTION

The data set contains twelve months of online behavior of 54,210 individuals who purchased at least one insurance policy and 54,210 individuals who did not purchase an insurance policy, i.e. non-converting visitors. The data set contains three types of visitors. First, website visitors who became customers during the twelve months, i.e. new customers. Second, website visitors who purchased a second insurance policy, i.e. current customers with a new purchase. Third, the non-converting visitors. In total, the data set contains 54,210 non-converting visitors, 51,214 new customers and 2,996 current customers with a new purchase. The data cover the period from January 1 to December 31, 2010. This rests on an important assumption: customers who achieved service goals in 2010 before a conversion (goals that are solely accessible to customers) are automatically considered current customers. In addition, when customers purchase a second insurance policy in 2010, the second purchase is treated as a conversion by a current customer.


In this study, the behavior during the 30 days before a conversion is used as the prior-to-conversion behavior. Extending the window from the median of 13 days to 30 days means that more data are used, including outliers. This window length is still in line with the literature: Rimm-Kaufman (2005) found that approximately 10% of conversions occurred after four or more weeks.
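The 30-day prior-to-conversion window could be applied to clickstream data as in the minimal sketch below. It assumes a hypothetical event layout of `(visitor_id, timestamp, page_or_goal)` tuples, not FBTO's actual schema. It also treats the window as half-open, so an event logged exactly at the conversion moment is not counted as prior-to-conversion. That choice is an assumption as well.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(days=30)  # prior-to-conversion window used in this study


def events_before_conversion(events, conversion_time, window=WINDOW):
    """Return the events in [conversion_time - window, conversion_time).

    Each event is assumed to be a (visitor_id, timestamp, page_or_goal) tuple;
    this layout is hypothetical.
    """
    start = conversion_time - window
    return [event for event in events if start <= event[1] < conversion_time]


conversion = datetime(2010, 6, 30, 12, 0)
clicks = [
    ("v1", datetime(2010, 5, 1, 9, 0), "PREMIE_AUTO"),     # 60 days earlier: dropped
    ("v1", datetime(2010, 6, 15, 10, 0), "PRODUCT_AUTO"),  # inside the window: kept
    ("v1", datetime(2010, 6, 30, 11, 0), "PREMIE_AUTO"),   # just before conversion: kept
]
print([page for _, _, page in events_before_conversion(clicks, conversion)])
# prints: ['PRODUCT_AUTO', 'PREMIE_AUTO']
```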

The data include two important variables, namely Visitor Value and Conversion. Visitor Value, based on the maximum acquisition costs, is measured on an interval scale, so a multiple regression analysis can be performed. Conversion is measured on a nominal scale (converting or non-converting visitor) and can be used in a logistic regression analysis. Up to four models (a multiple regression and a logistic regression for each type of customer) may be needed if the two types of customers, i.e. new customers and current customers, differ from each other. The variables used in all analyses are the same, with the exception of the service goals, i.e. goals that are solely accessible to current customers. The number of models also depends on whether there are enough significant differences between the two types of customers, so bivariate tests will be used to find significant differences between these two groups. If there are no significant differences, only two of the four models will be developed, and these will contain extra variables because current customers have access to service goals (SSTs). For instance, current customers can use the online customer environment to change their bank account number.

3.4 DEPENDENT VARIABLE

For the multiple regression analysis, Visitor Value serves as the dependent variable, measured on an interval scale. This value is the sum of the maximum acquisition costs of each conversion (see Table 2). The maximum acquisition cost is the figure currently available within the company to value a conversion. Ideally, online and offline data would be combined and an overall customer lifetime value (CLV) would be used; limited data availability is therefore listed as a limitation of this research.
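To make the construction of the dependent variable concrete, the sketch below sums maximum acquisition costs over the insurances a visitor purchased. The codes follow Table 2, but the actual amounts are not disclosed, so every euro value in `MAX_ACQUISITION_COSTS` is a hypothetical placeholder.

```python
# Insurance codes from Table 2. The euro amounts are HYPOTHETICAL placeholders;
# FBTO's real maximum acquisition costs are not disclosed.
MAX_ACQUISITION_COSTS = {
    "CARAVAN": 15.0, "ORV": 60.0, "PLV": 20.0, "RB": 30.0,
    "TR": 10.0, "DR": 25.0, "UITV": 50.0,
    "INB": 40.0, "AVP": 20.0, "OPST": 40.0,
    "ZORG": 80.0, "AUTO": 100.0,
}


def visitor_value(purchased_codes):
    """Visitor Value: the sum of the maximum acquisition costs of each conversion."""
    return sum(MAX_ACQUISITION_COSTS[code] for code in purchased_codes)


# A visitor buying contents and liability cover plus a car insurance: 40 + 20 + 100.
print(visitor_value(["INB", "AVP", "AUTO"]))  # 160.0
```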


online behavior as online behavior is used in the form of independent variables. Among other things, this approach makes it possible to optimize online marketing campaigns directly after the first clicks, because the behavior of each new website visitor is used to predict Visitor Value. Campaigns can be optimized in a sophisticated way because the approach provides insight into the predicted Visitor Value: the higher the predicted Visitor Value, the better.

Table 2 lists the maximum acquisition costs per insurance, as provided by FBTO. The travel insurance consists of a temporary travel insurance (marked as “TR”) and a continuous travel insurance (marked as “DR”). Additionally, the Home insurance consists of three components: contents insurance (marked as “INB” in this research), liability insurance (marked as “AVP”) and homeowner insurance (marked as “OPST”).

Insurance                 Marked as         Maximum Acquisition Costs
Trailer insurance         Caravan           ●
Term life insurance       ORV               ●
Pleasure boat insurance   PLV               ●
Legal insurance           RB                ●
Travel insurance          TR / DR           ●
Funeral insurance         UITV              ●
Home insurance            INB / AVP / OPST  ●
Health insurance          ZORG              ●
Car insurance             AUTO              ●

Table 2 – Maximum Acquisition Costs per Insurance

The dependent variable for the logistic regression analysis is a nominal variable that indicates whether the visitor converted, i.e. whether the visitor is a non-converting or a converting visitor.

[Figure: conceptual model. Data (IV): online behavior prior-to-conversion → Visitor Value (DV) / Conversion, for new or current customers]

3.5 INDEPENDENT VARIABLES

The independent variables follow the sections of the literature review. Appendix B gives a complete overview of the independent variables, the hypotheses related to each variable and the measurement scale of each variable. The large number of variables will be reduced and summarized by a factor analysis in the next chapter. Factor analysis is an excellent starting point for multivariate techniques, as it shows which variables may act together and how many variables may be expected to have an impact in the analysis (Hair et al., 2010, p.100).

3.6 PLAN OF ANALYSIS

First, the respondents are filtered for the Visitor Value metric: only website visitors who purchased an insurance policy, either as a new customer or as a current customer, are considered. Using the data of website visitors who did not convert would bias the model by reducing the differences between the predicted Visitor Values of individual website visitors. The data set of non-converting visitors will be used later to better understand the multiple regression model; for its interpretation, it is useful to know the average Visitor Value of non-converting visitors. For now, non-converting cases are excluded from the data set.

Second, as the data set contains 39 variables, a factor analysis will be performed to find out whether certain variables can be combined to improve the overall interpretation.


models. This, however, depends on whether FBTO wants to allocate value to all online behavior in the multiple regression analysis.


4. RESULTS

First, the outliers within the data set will be identified. Next, a factor analysis will be performed to improve the interpretation of the whole by combining variables; its outcomes will be used in the subsequent analyses. Second, several analyses will be performed to check whether the two types of customers, i.e. new customers and current customers, differ significantly from each other. The hypotheses will then be tested using simple regression analyses and/or separate logistic regression analyses. Finally, a multiple regression analysis will be used to build a model predicting Visitor Value, and a logistic regression analysis will be performed to predict the Conversion Probability.

4.1 OUTLIERS

The multivariate detection method is best suited for examining a complete variate, such as the independent variables in a multiple regression or the variables in a factor analysis (Hair et al., 2010, p.67). Outliers can be identified with the Mahalanobis D² measure, which measures each observation’s distance in multidimensional space from the mean center of all observations. Dividing D² by the number of variables involved (D²/df) gives a statistic that can be used for significance testing: observations with a D²/df value exceeding four can be designated as outliers (Hair et al., 2010, p. 67). Further descriptive analysis shows that these 26 outliers were unrealistic and far from representative of the population, e.g. cases with 14,709 pages viewed before conversion and cases with 103 sessions before a conversion. Therefore, these 26 cases are removed from the data set.
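The D²/df screening rule can be sketched with the Python standard library alone. This is an illustrative implementation, not the software used in this research. It uses the sample covariance matrix, a small hand-written Gauss-Jordan inverse and the threshold of four cited above. The data are hypothetical; the extreme row only echoes the magnitudes mentioned in the text.

```python
from statistics import mean


def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def d2_per_df(rows):
    """Mahalanobis D² of each observation divided by the number of variables."""
    n, p = len(rows), len(rows[0])
    centre = [mean(row[j] for row in rows) for j in range(p)]
    dev = [[row[j] - centre[j] for j in range(p)] for row in rows]
    cov = [[sum(d[i] * d[j] for d in dev) / (n - 1) for j in range(p)] for i in range(p)]
    inv = invert(cov)
    return [sum(d[i] * inv[i][j] * d[j] for i in range(p) for j in range(p)) / p
            for d in dev]


def outlier_indices(rows, threshold=4.0):
    """Indices of observations whose D²/df exceeds the threshold."""
    return [i for i, score in enumerate(d2_per_df(rows)) if score > threshold]


# Hypothetical rows of [page views, sessions]: a regular grid plus one extreme visitor.
rows = [[pages, sessions] for pages in (10, 20, 30, 40, 50) for sessions in (1, 2, 3, 4)]
rows.append([14_709, 103])
print(outlier_indices(rows))  # [20]
```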

4.2 FACTOR ANALYSIS

This section describes the application of factor analysis following the guidelines of Hair et al. (2010). Factor analysis is performed on the variables (goals) that involve price information search and product information search. The output of the factor analysis can be found in Appendix C, Tables 1-4.

4.2.1 OBJECTIVES OF FACTOR ANALYSIS


hand, its primary objective is to summarize data. On the other hand, it is useful to work with fewer factors that explain the maximum variance of the variables; that is, factor analysis also reduces data (Hair et al., 2010, p. 98).

4.2.2 RESEARCH DESIGN OF FACTOR ANALYSIS

First, an R-type factor analysis is used in this research instead of a Q-type factor analysis, as the objective is to group variables rather than respondents (Hair et al., 2010, p.101). As previously mentioned, 10 variables concerning price information search and 11 variables concerning product information search are included in the factor analysis. All 54,210 cases are included. Two basic requirements determine whether factor analysis is appropriate: the sample size should be 100 or larger, and there should be at least five times as many observations as variables analyzed (Hair et al., 2010, p.102). With 54,210 cases and 21 variables, both requirements are met, so factor analysis is appropriate.

4.2.3 ASSUMPTIONS IN FACTOR ANALYSIS

There is a strong conceptual foundation supporting the assumption that a structure exists. Two measures are particularly important to test whether factor analysis is appropriate: Bartlett’s test of sphericity and the Kaiser-Meyer-Olkin measure. A significant Bartlett’s test of sphericity indicates that there are sufficient correlations among the variables. The Kaiser-Meyer-Olkin measure, also known as the measure of sampling adequacy, should exceed .50 for both the overall test and each individual variable (Hair et al., 2010, p.105). For this factor analysis, Bartlett’s test of sphericity is significant (p = .000) and the Kaiser-Meyer-Olkin statistic is .554, which exceeds .50.
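Both checks can be computed directly from the correlation matrix. The sketch below is a standard-library illustration under the usual textbook definitions, not the output of the statistical package used in this research. Bartlett's statistic is -(n - 1 - (2p + 5)/6) ln|R| with p(p - 1)/2 degrees of freedom. The overall KMO compares squared correlations with squared partial correlations taken from the inverse of R. A p-value would follow from the chi-square survival function (for example `scipy.stats.chi2.sf`), which is left out to keep the block dependency-free.

```python
import math


def pearson(a, b):
    """Pearson correlation between two equally long sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)


def invert_with_logdet(matrix):
    """Gauss-Jordan inverse with partial pivoting, plus log|det| from the pivots."""
    n = len(matrix)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    log_det = 0.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("correlation matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        log_det += math.log(abs(scale))
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug], log_det


def kmo_and_bartlett(columns):
    """Overall KMO measure, Bartlett's chi-square statistic and its degrees of freedom."""
    p, n = len(columns), len(columns[0])
    corr = [[1.0 if i == j else pearson(columns[i], columns[j]) for j in range(p)]
            for i in range(p)]
    inv, log_det = invert_with_logdet(corr)
    r_sq = partial_sq = 0.0
    for i in range(p):
        for j in range(p):
            if i != j:
                r_sq += corr[i][j] ** 2
                # partial correlation of i and j, controlling for all other variables
                partial = -inv[i][j] / math.sqrt(inv[i][i] * inv[j][j])
                partial_sq += partial ** 2
    kmo = r_sq / (r_sq + partial_sq)
    chi_square = -(n - 1 - (2 * p + 5) / 6) * log_det
    df = p * (p - 1) // 2
    return kmo, chi_square, df


# Hypothetical toy data with two variables.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2, 1, 4, 3, 6, 5, 8, 7]
kmo, chi_square, df = kmo_and_bartlett([x, y])
print(round(kmo, 3), round(chi_square, 2), df)
```

With only two variables the KMO is always .50, because the partial correlation then equals the ordinary correlation. This makes it a convenient sanity check.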

4.2.4 DERIVING FACTORS AND ASSESSING OVERALL FIT

To derive the factors, an orthogonal rotation in the form of varimax is used. This method provides a clearer separation of the factors and results in less multicollinearity in subsequent analyses (Hair et al., 2010, p.115).


check whether the factors explain approximately 60% of the total variance. Based on Table 3 in Appendix C, the 11 product information pages and 10 price information pages can be combined into 8 factors explaining approximately 60.6% of the variance. Before deciding on the number of factors, additional factor analyses were performed by adding and/or removing one factor. These solutions do not provide better factors: either the total variance explained is less than 60% or variables are combined into factors illogically.
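The 60% criterion amounts to accumulating sorted eigenvalues until their share of the total reaches the target. A minimal sketch; the eigenvalues below are hypothetical, since the Appendix C output is not reproduced here.

```python
def factors_for_variance(eigenvalues, target=0.60):
    """Smallest number of factors whose eigenvalues explain at least `target` of the variance.

    For a correlation matrix the eigenvalues sum to the number of variables,
    so each eigenvalue divided by that sum is the share of variance it explains.
    """
    total = sum(eigenvalues)
    cumulative = 0.0
    for k, value in enumerate(sorted(eigenvalues, reverse=True), start=1):
        cumulative += value
        if cumulative / total >= target:
            return k, cumulative / total
    return len(eigenvalues), 1.0


# Hypothetical eigenvalues for nine variables (they sum to 9).
example = [3.0, 2.0, 1.5, 0.9, 0.6, 0.4, 0.3, 0.2, 0.1]
print(factors_for_variance(example))
```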

4.2.5 INTERPRETATION OF FACTORS

The factor analysis for the 11 product- and 10 price information pages provides the following factors:

Factor                              Variables
1. Car insurance information        PREMIE_AUTO, PRODUCT_AUTO, KZH_AUTO
2. Home insurance information       PREMIE_WOON, PRODUCT_AVP, PRODUCT_INB, PRODUCT_OPST
3. Travel insurance information     PREMIE_TR, PRODUCT_TR
4. Funeral insurance information    PREMIE_UITV, PRODUCT_UITV
5. Legal insurance information      PREMIE_RB, PRODUCT_RB
6. Term Life insurance information  PREMIE_ORV, PRODUCT_ORV
7. Health insurance information     PREMIE_ZORG, PRODUCT_ZORG, KZH_ZORG
8. Leisure insurance information    PREMIE_PLV, PREMIE_CARAVAN

Table 3 – Factor Analysis for Product- and Price information pages


Factor                              Cronbach’s alpha   Variable in subsequent analyses
1. Car insurance information        .544               PremieProductKzhAuto
2. Home insurance information       .467               PremieProductWoon
3. Travel insurance information     .800               PremieProductTR
4. Funeral insurance information    .790               PremieProductUITV
5. Legal insurance information      .809               PremieProductRB
6. Term Life insurance information  .364               PremieProductORV
7. Health insurance information     .102               ᵃ
8. Leisure insurance information    .047               ᵃ

Note: ᵃ = the variables of this factor are used separately in subsequent analyses.

Table 4 – Reliability of each Factor
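The Cronbach's alpha values in Table 4 follow the usual formula alpha = k/(k - 1) × (1 - Σ item variances / variance of the summed score). A minimal sketch, assuming each item is a list of per-visitor scores (for example page-view counts) and using sample variances; the toy data are hypothetical.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha; items is a list of score lists, one list per variable."""
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    totals = [sum(scores) for scores in zip(*items)]  # summed score per respondent
    item_variance = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_variance / variance(totals))


print(round(cronbach_alpha([[1, 2, 3, 4], [2, 1, 4, 3]]), 2))  # 0.75
```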

Although the current factor solution is logical based on conceptual definitions, the Leisure insurance information factor is not. Additionally, the extra reliability check shows that reliability is quite low for the Health insurance information factor and the Leisure insurance information factor (Cronbach’s alphas of .102 and .047). Therefore, the 7th

4.3 DIFFERENCES BETWEEN THE TYPES OF CUSTOMERS

The decision whether to build two or four models depends, among other things, on whether new customers differ significantly from current customers. Table 5 shows the differences between the two types of customers for each variable, using independent-samples t-tests. As service goals are solely accessible to current customers, these goals are not included here. In addition, the new/returning nominal variable is not included, as all current customers are returning visitors.

Metric                   New (n = 51,214)        Current (n = 2,996)     Result
AMOUNT_SESSIONS          M = 1.73, SD = 1.502    M = 1, SD = 0ᵃ          t(54208) = 26.445***
AMOUNT_BOUNCERS          M = 0.03, SD = 0.283    M = 0.00, SD = 0.018    t(54208) = 6.385***
DEPTH_OF_VISIT           M = 65.84, SD = 85.65   M = 47.07, SD = 41.32   t(54208) = 11.916***
TIME_ON_SITE (minutes)   M = 74.38, SD = 360.71  M = 44.92, SD = 306.84  t(54208) = 4.376***
PREMIE_PLV               M = 0.03, SD = 0.409    M = 0.04, SD = 0.443    t(54208) = -1.431
PREMIE_CARAVAN           M = 0.03, SD = 0.352    M = 0.03, SD = 0.383    t(54208) = -1.409
PREMIE_ZORG              M = 0.71, SD = 1.590    M = 0.62, SD = 1.246    t(54208) = 2.839**
PREMIE_DR                M = 0.64, SD = 1.973    M = 0.31, SD = 1.011    t(54208) = 8.877***
PRODUCT_ZORG             M = 0.03, SD = 0.254    M = 0.01, SD = 0.121    t(54208) = 4.068***
KZH_ZORG                 M = 0.01, SD = 0.141    M = 0, SD = 0.060       t(54208) = 3.165**
WEBSITE_ERRORS           M = 0.02, SD = 0.202    M = 0.01, SD = 0.203    t(54208) = 0.266
PREMIE_PRODUCT_KZH_AUTO  M = 0.09, SD = 1.021    M = -0.15, SD = 0.483   t(54208) = 8.819***
PREMIE_PRODUCT_WOON      M = 0.008, SD = 1.012   M = -0.129, SD = 0.737  t(54208) = 7.244***
PREMIE_PRODUCT_TR        M = 0.003, SD = 1.012   M = -0.059, SD = 0.764  t(54208) = 3.346***
PREMIE_PRODUCT_UITV      M = -0.001, SD = 1.007  M = 0.018, SD = 0.869   t(54208) = -1.185
PREMIE_PRODUCT_RB        M = 0.0006, SD = 1.007  M = 0.01, SD = 0.876    t(54208) = -0.629
PREMIE_PRODUCT_ORV       M = -0.001, SD = 1.002  M = 0.018, SD = 0.958   t(54208) = -1.082
VISITOR_VALUE            M = ●, SD = ●           M = ●, SD = ●           t(54208) = -8.192*

Table 5 – Differences between the two types of customers

Note: ***Significant at p < .01, **Significant at p < .05, *Significant at p < .10.
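The degrees of freedom in Table 5, 54,208 = 51,214 + 2,996 - 2, suggest pooled-variance (Student's) t-tests rather than Welch tests, which would yield different degrees of freedom. Under that assumption, the t-values can be recomputed from the summary statistics. The sketch below does so for DEPTH_OF_VISIT using the rounded means and standard deviations, so a small gap to the reported value is expected.

```python
import math
from statistics import mean, stdev


def pooled_t(m1, sd1, n1, m2, sd2, n2):
    """Independent-samples t-test with pooled variance; returns (t, df)."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (m1 - m2) / standard_error, df


def pooled_t_from_samples(a, b):
    """The same test computed from raw observations instead of summary statistics."""
    return pooled_t(mean(a), stdev(a), len(a), mean(b), stdev(b), len(b))


# DEPTH_OF_VISIT, new versus current customers, rounded values from Table 5.
t, df = pooled_t(65.84, 85.65, 51_214, 47.07, 41.32, 2_996)
print(f"t({df}) = {t:.2f}")  # t(54208) = 11.91
```

The recomputed value of about 11.91 is close to the reported 11.916, which is consistent with the pooled-variance assumption.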

4.4 VISITOR VALUE - MULTIPLE REGRESSION ANALYSES

In line with the factor analysis, the multiple regression analysis is based on the guidelines of Hair et al. (2010). This section provides two models, one for new customers and one for current customers with a new purchase. The main objective of these analyses is to predict the value that website visitors represent: Visitor Value. A multiple regression model can generally be represented as follows (Hair et al., 2010):

$$Y = b_0 + b_1 X_1 + b_2 X_2 + \dots + b_n X_n + e$$

Where $Y$ is the dependent variable, $X_1, \dots, X_n$ are the independent variables, $b_0$ is the intercept, $b_1, \dots, b_n$ are the coefficients and $e$ is the error term.

4.4.1 OBJECTIVES OF MULTIPLE REGRESSION ANALYSIS

The selection of the independent variables should be based on theoretical relationships to the dependent variable (Hair et al., 2010, p.172). In short, the regression then provides a means of objectively assessing the magnitude and direction of the relationships of each independent variable. The direction of the relationships is either positive or negative. The selection of variables for this research is based solely on theoretical grounds, which is a requirement for a multiple regression (Hair et al., 2010, p.172).

4.4.2 RESEARCH DESIGN OF THE MULTIPLE REGRESSION ANALYSIS


opportunity to analyze how much variance of the dependent variable is explained by the independent variables (Pallant, 2005).

First, the data set is split. This makes it possible to build the model on an estimation data set and to test its forecasts on a validation data set. Splitting is relatively easy, but some research design issues need to be taken into account, such as sample sizes and their statistical power, the generalizability of the results and other requirements. The overall data set contains 54,210 cases, of which 51,214 are new customers and 2,996 are current customers. As the current customers data set is relatively small, the model is developed on 2,562 cases and forecasting is based on the remaining 434 cases, i.e. the validation data set, following the guidelines of Hair et al. (2010, p.354). The new customers data set is relatively large; therefore, the model is developed on 29,712 cases and the validation data set for forecasting purposes contains 21,502 cases. Data splitting is done via random selection procedures. Sample size directly influences the statistical power and generalizability of the results, which makes it an important element (Hair et al., 2010, p.174). With large samples, such as in this study, it is important to consider both practical and statistical significance (Hair et al., 2010, p.174). On the one hand, for the results to be generalizable, there should be a minimum ratio of 15 to 20 observations for each independent variable (Hair et al., 2010, p.175). All data sets meet this requirement for all variables, so the results should be generalizable, as the sample is representative (Hair et al., 2010, p. 175). On the other hand, maximizing the degrees of freedom improves the generalizability of the model and addresses both model parsimony and sample size concerns (Hair et al., 2010, p. 176).
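The random split can be sketched as follows. The fixed seed is a hypothetical choice that only makes the split reproducible; the actual selection procedure is documented only as being random.

```python
import random


def split_estimation_validation(cases, n_estimation, seed=2011):
    """Randomly split cases into an estimation set and a validation set.

    The seed value is hypothetical and serves only to make the split reproducible.
    """
    shuffled = list(cases)
    random.Random(seed).shuffle(shuffled)
    return shuffled[:n_estimation], shuffled[n_estimation:]


# Current customers: 2,996 cases -> 2,562 for estimation, 434 for validation.
estimation, validation = split_estimation_validation(range(2_996), 2_562)
print(len(estimation), len(validation))  # 2562 434
```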

Additionally, a few independent variables are nonmetric (dichotomous). These are coded as dummy variables, which can then act as replacement independent variables (Hair et al., 2010, p.177).


4.4.3 MODEL ESTIMATING & ASSESSING OVERALL MODEL FIT

The main goal of this research is to allocate value to website visitors based on their online behavior. Hence, all independent variables should be included, as they represent the online behavior. According to Hair et al. (2010, p.186), this is a confirmatory perspective, as the exact set of independent variables is specified in advance. However, it might be necessary to use forward addition and/or backward elimination, a largely trial-and-error process for finding the best regression estimates (Hair et al., 2010, p.188). This procedure may exclude independent variables, that is, types of online behavior.

In addition, the significance of the overall model is tested through the coefficient of determination (Hair et al., 2010, p.192) by means of the F-statistic. A significant F-statistic indicates that the model as a whole has statistically significant predictive capability. Furthermore, the adjusted R² is useful when comparing regression equations involving different independent variables or different sample sizes (Hair et al., 2010, p.193). Finally, the regression coefficients are tested for significance at the .05 level.
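The overall fit statistics can be illustrated with a plain ordinary-least-squares sketch that uses the normal equations and a small hand-written solver. It is not the estimation code of this research, and the toy data are hypothetical. The function returns the multiple correlation R next to R², because the two are easily confused in regression output. Adjusted R² and the F-ratio follow the standard formulas 1 - (1 - R²)(n - 1)/(n - k - 1) and (R²/k) / ((1 - R²)/(n - k - 1)).

```python
from statistics import mean


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [u - factor * w for u, w in zip(aug[r], aug[col])]
    return [row[n] for row in aug]


def ols_fit(rows, y):
    """OLS with intercept: returns coefficients, R, R², adjusted R² and F-ratio."""
    design = [[1.0] + list(row) for row in rows]
    n, k = len(design), len(design[0]) - 1  # k = number of independent variables
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k + 1)] for i in range(k + 1)]
    xty = [sum(r[i] * v for r, v in zip(design, y)) for i in range(k + 1)]
    coef = solve(xtx, xty)
    fitted = [sum(c * x for c, x in zip(coef, r)) for r in design]
    y_bar = mean(y)
    ss_tot = sum((v - y_bar) ** 2 for v in y)
    ss_res = sum((v - f) ** 2 for v, f in zip(y, fitted))
    r2 = 1 - ss_res / ss_tot
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
    f_ratio = (r2 / k) / ((1 - r2) / (n - k - 1))
    return coef, r2 ** 0.5, r2, adj_r2, f_ratio


# Hypothetical toy data: y = 2 + 3*x1 - x2 plus a small fixed disturbance.
x1 = [0, 1, 2, 3, 4, 5, 6, 7]
x2 = [1, 0, 1, 0, 1, 0, 1, 0]
noise = [0.1, 0.1, -0.1, -0.1, 0.1, 0.1, -0.1, -0.1]
y = [2 + 3 * a - b + e for a, b, e in zip(x1, x2, noise)]
coef, r, r2, adj_r2, f_ratio = ols_fit(list(zip(x1, x2)), y)
print([round(c, 2) for c in coef], round(r2, 4), round(adj_r2, 4), round(f_ratio, 1))
```

With the New Customers estimation sample (n = 29,712, k = 16), the adjustment is negligible: 1 - (1 - .358) × 29,711/29,695 ≈ .3577. For a sample of this size, R² and adjusted R² therefore nearly coincide.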

Visitor Value Model for New Customers

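$$\mathit{VV}_{\text{new}} = b_0 + \sum_{j=1}^{16} b_j X_j + e$$

Where $\mathit{VV}_{\text{new}}$ is the Visitor Value of New Customers, predicted by $b_0 + \sum_j b_j X_j$; $X_j$ are the independent variables retained in the model and $e$ is the error term.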

The above model has a multiple correlation coefficient (R) of .598 and an adjusted R² of .358. The adjusted R² takes the number of variables in the model into account; it is therefore a less biased measure than R², which improves simply by adding more variables to the model (Hair et al., 2010). The F-ratio of the multiple regression analysis is 1008.148, p = .000. All independent variables are significant at the 0.01 level, except for Website_Error_Amount, which is significant at the 0.05 level (p = .011).

Visitor Value Model for Current Customers

The Visitor Value Model for Current Customers is developed using the same trial-and-error procedure as the Visitor Value Model for New Customers. In total, 27 independent variables were considered in advance, and the trial-and-error selection process resulted in the inclusion of 17 independent variables. First, three variables were constants, namely Amount of Sessions, Amount of Bouncers and SRVC_Zorgpas, and were therefore automatically excluded from the analysis. Second, variables that were insignificant and/or did not contribute to the adjusted R² were not included: KZH_Zorg, Product_Zorg, MYF_Acti, Time on Site, SRVC_Decl, SRVC_SDE and SRVC_W_REK.

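$$\mathit{VV}_{\text{current}} = b_0 + \sum_{j} b_j X_j + e$$

Where $\mathit{VV}_{\text{current}}$ is the Visitor Value of Current Customers, predicted by $b_0 + \sum_j b_j X_j$ summed over the independent variables retained in the Current Customers model, and $e$ is the error term.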

The model for Current Customers has a multiple correlation coefficient (R) of .595 and an adjusted R² of .349. As stated earlier, the adjusted R² is the less biased measure and makes this statistic comparable with other models. The F-ratio is 81.867, p = .000. All independent variables are significant at the 0.05 level; the highest p-values are found for PremieProductWoon (p = .004), MYF_INL (p = .046), SRVC_OPZEG (p = .005) and Website_Error_Amount (p = .030).

4.4.4 ASSUMPTIONS WITHIN THE MULTIPLE REGRESSION ANALYSES

As the independent variables have been selected and the regression coefficients are known, the models can now be assessed against the assumptions of multiple regression analysis. Hair et al. (2010, p.182) recommend examining several assumptions.

Linearity of the phenomenon is the first area. If the relationship is not linear, the regression analysis will underestimate the true relationship. Linearity can be checked by examining residual plots of the studentized residuals against the standardized predicted values. See Appendix D, Figures 1 and 2 for the graphical analyses of the residuals for both models. The plots show no indication of nonlinearity; hence, this assumption is met.


assumption is not violated. There is some concentration above zero, indicating that heteroscedasticity could be present. Therefore, it is wise to forecast the model on a validation data set. Forecasting makes it possible to check whether heteroscedasticity actually influences the model by giving too much weight to a small subset of the data when estimating the coefficients. If the models lose predictive power in their forecasts, these assumptions should be investigated further. For now, the bulk of the residuals is centered around zero for both models, which implies that the assumption is not violated.

The final area is the normality of the error term distribution. Hair et al. (2010, p.185) argue that this is perhaps the most frequently encountered assumption violation, and they recommend the normal probability plot as the better diagnostic tool. See Appendix D, Figures 3 and 4 for the normal probability plots of both models. Each plot compares the plotted residuals with a straight diagonal line that represents the normal distribution. As the residual line closely follows the diagonal line, the distribution can be considered normal (Hair et al., 2010, p.185).
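The idea behind the normal probability plot can be sketched with `statistics.NormalDist`. The version below produces P-P plot coordinates: observed versus expected cumulative probability of the standardized residuals. It is an assumption that the plots in Appendix D are of this P-P type, and the plotting position (i - 0.5)/n is one common choice among several.

```python
from statistics import NormalDist, mean, stdev


def normal_pp_points(residuals):
    """(observed, expected) cumulative probabilities for a normal P-P plot."""
    n = len(residuals)
    mu, sd = mean(residuals), stdev(residuals)
    z_sorted = sorted((r - mu) / sd for r in residuals)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return [((i - 0.5) / n, std_normal.cdf(z)) for i, z in enumerate(z_sorted, start=1)]


def max_deviation_from_diagonal(points):
    """Largest vertical distance between the plotted points and the diagonal."""
    return max(abs(observed - expected) for observed, expected in points)


# Hypothetical residuals; points close to the diagonal suggest approximate normality.
points = normal_pp_points([-2, -1, 0, 1, 2])
print(round(max_deviation_from_diagonal(points), 3))
```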

4.4.5 IDENTIFICATION OF A POSSIBLE MODERATING VARIABLE

As previously mentioned, the conceptual model requires investigating whether the variable Amount of Website Errors negatively moderates the relationship between the other independent variables and Visitor Value. A moderator term is formed by multiplying an independent variable by the possible moderator variable, i.e. Amount of Website Errors, and is entered in the regression equation. The three-step process of Hair et al. (2010, p.181) is used to determine a possible moderating effect. First, the unmoderated equation is estimated. Second, the moderated relationship is estimated, i.e. the original equation plus the moderator term. Third, the change in R² is assessed.
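The three-step procedure can be sketched as a change in R² between two OLS fits. This is a standard-library illustration on hypothetical toy data, not the code used in this research. It adds one moderator term at a time, as described above.

```python
def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [u - factor * w for u, w in zip(aug[r], aug[col])]
    return [row[n] for row in aug]


def r_squared(columns, y):
    """R² of an OLS regression of y on the predictor columns plus an intercept."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * v for r, v in zip(rows, y)) for i in range(k)]
    coef = solve(xtx, xty)
    y_bar = sum(y) / len(y)
    ss_tot = sum((v - y_bar) ** 2 for v in y)
    ss_res = sum((v - sum(c * x for c, x in zip(coef, r))) ** 2 for r, v in zip(rows, y))
    return 1 - ss_res / ss_tot


def moderation_delta_r2(columns, y, var_idx, mod_idx):
    """Change in R² after adding the term columns[var_idx] * columns[mod_idx]."""
    base = r_squared(columns, y)                                         # step 1
    term = [a * b for a, b in zip(columns[var_idx], columns[mod_idx])]
    moderated = r_squared(columns + [term], y)                           # step 2
    return moderated - base                                              # step 3


# Hypothetical data in which x only matters when the moderator m equals 1.
x = [0, 1, 2, 3, 0, 1, 2, 3]
m = [0, 0, 0, 0, 1, 1, 1, 1]
y = [a * b for a, b in zip(x, m)]
print(round(moderation_delta_r2([x, m], y, var_idx=0, mod_idx=1), 4))  # 0.2632
```

A change in R² close to zero, as reported for all moderator terms in this research, would indicate that no moderating effect is present.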


The analyses are performed with one moderator term at a time in both equations, i.e. the current customers equation and the new customers equation. The change in R² is then assessed: no change in R² could be identified for any of the moderator terms. Hence, the Amount of Website Errors does not (negatively) moderate the relationship between the other independent variables and Visitor Value in either equation.

The multiple regression analyses show that Amount of Sessions and Amount of Website Errors negatively influence Visitor Value for New Customers, whereas Amount of Bouncers has a positive influence on Visitor Value for New Customers. These findings are unexpected, so a moderating effect might be present. As previously mentioned, the three-step process of Hair et al. (2010, p.181) shows that the Amount of Website Errors does not moderate the relationship between the other independent variables and Visitor Value in either equation. Hence, Amount of Bouncers and Amount of Sessions are also checked with the same three-step process. In line with the findings for Amount of Website Errors, no change in R² indicates a moderating effect.

4.4.6 INTERPRETING THE REGRESSION VARIATE

As the model estimation is complete, it is now possible to check the predictive power of the independent variables for both models.

First, the New Customers model with 16 independent variables explains 35.8% of Visitor Value. The constant of the equation is statistically significant (p = .000); hence it contributes substantially to the prediction. Additionally, four variables have a negative impact on the predicted Visitor Value, namely Amount of Sessions; PremieProduct_Woon, i.e. the factor scores of the Home insurance; PremieProduct_TR, i.e. the factor scores of the Temporary Travel insurance; and PremieDR, the price information search behavior for the Continuous Travel insurance. All other variables have positive coefficients.


explain 34.9% of Visitor Value. As in the previous model, the constant is statistically significant, p = .000. Also in line with the previous model, the variables PremieDR, PremieProductTR and PremieProductWoon have a negative influence. All service goals and the Amount of Website Errors in the model have a negative influence as well.

Assessing Variable Importance

To compare variables, the standardized regression coefficients can be used, as standardization converts the variables to a common scale and variability (Hair et al., 2010, p.199). Standardization also reduces problems associated with multicollinearity between the variables in the equation (Aiken and West, 1991). More importantly, standardized variables make the regression variate more meaningful to interpret (West et al., 1996; Cohen et al., 2003).
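Converting an unstandardized coefficient into a standardized beta only requires the standard deviations of the predictor and the dependent variable. A minimal sketch with hypothetical toy data:

```python
from statistics import stdev


def standardized_beta(b, x, y):
    """Convert an unstandardized coefficient b into a standardized beta.

    beta = b * SD(x) / SD(y): the expected change in y, in standard deviations,
    for a one-standard-deviation change in x.
    """
    return b * stdev(x) / stdev(y)


# Toy check: with a single predictor the OLS slope is r * SD(y) / SD(x),
# so the standardized beta equals the Pearson correlation r.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2, 1, 4, 3, 6, 5, 8, 7]
slope = 38 / 42  # OLS slope for this toy data: Sxy / Sxx
print(round(standardized_beta(slope, x, y), 4))  # 0.9048
```

Because every beta is expressed in standard deviations, the coefficients in Table 6 can be compared directly across variables measured on different scales.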

Independent Variable        New Customers Model   Current Customers Model
                            Standardized Beta     Standardized Beta
New_Returning               .039                  .082
Amount_Bouncer              .019                  /
Amount_Sessions             -.061                 /
Depth_of_Visit              .110                  .267
PREMIE_DR                   -.201                 -.119
PRODUCT_ZORG                .031                  /
PREMIE_ZORG                 .226                  .207
PREMIE_CARAVAN              .012                  .065
KZH_ZORG                    .025                  /
Premie_Product_KZH_Auto     .276                  .179
Premie_Product_Woon         -.071                 -.050
Premie_Product_TR           -.293                 -.232
Premie_Product_UITV         .102                  .093
Premie_Product_RB           .064                  .067
Premie_Product_ORV          .113                  .141
Website_Error_Amount        .012                  -.035
MYF_INL                     /                     -.033
MYF_AANM                    /                     -.075
SRVC_W_PERS                 /                     -.046
SRVC_OPZEG                  /                     -.060

Note: / = Variable not included in the specific model

Table 6 – Standardized Coefficients (Beta) of each Independent Variable


independent variable is. The most important variables in terms of positive impact for the New Customers model are PremieProductKZH_Auto and Premie_ZORG, whereas PremieProductTR and PremieDR have a negative impact in the New Customers model. The Current Customers model is also negatively influenced by PremieProductTR and PremieDR, in line with the New Customers model. However, besides PremieProductKZH_Auto and Premie_ZORG, the Current Customers model has one additional important positive predictor: Depth of Visit.

Comparing the simple regression analyses used for hypothesis testing with the overall multiple regression analyses reveals two differences: (1) for new customers, Amount of Sessions is positive and significant in the simple regression but negative and significant in the overall regression; (2) SRVC_OPZEG is positive but not significant in the simple regression, while it is negative and significant in the overall regression.

Multicollinearity

To assess multicollinearity, the measures recommended by Hair et al. (2010, p.225) for testing the impact of collinearity are used, namely the tolerance values and the variance inflation factor (VIF).

First, for the New Customers model, the tolerance values range from .990 (Premie_Caravan) to .492 (Depth_of_Visit), whereas the VIF values range from 1.011 (Premie_Caravan) to 2.515 (Amount_Sessions). Second, for the Current Customers model, the tolerance values range from .981 (WebsiteErrorAmount) to .594 (Depth_of_Visit) and the VIF values range from 1.020 (WebsiteErrorAmount) to 1.684 (Depth_of_Visit).
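Tolerance and VIF can be obtained without running one auxiliary regression per predictor. VIF_j equals the j-th diagonal element of the inverse correlation matrix of the predictors, and tolerance is its reciprocal. The sketch below is a standard-library illustration with hypothetical data, not the diagnostics output used in this research.

```python
import math


def pearson(a, b):
    """Pearson correlation between two equally long sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)


def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("predictors are perfectly collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def tolerance_and_vif(columns):
    """Return (tolerance, VIF) per predictor.

    VIF_j = 1 / (1 - R_j²), where R_j² comes from regressing predictor j on
    the other predictors; it equals the j-th diagonal of the inverse correlation matrix.
    """
    p = len(columns)
    corr = [[1.0 if i == j else pearson(columns[i], columns[j]) for j in range(p)]
            for i in range(p)]
    inv = invert(corr)
    return [(1.0 / inv[j][j], inv[j][j]) for j in range(p)]


# Hypothetical, strongly correlated predictors.
x1 = [1, 2, 3, 4, 5, 6, 7, 8]
x2 = [2, 1, 4, 3, 6, 5, 8, 7]
for tol, vif in tolerance_and_vif([x1, x2]):
    print(f"tolerance = {tol:.3f}, VIF = {vif:.3f}")
```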


4.4.7 VALIDATION OF THE RESULTS

According to Hair et al. (2010, p.206), the most appropriate approach to empirical validation is to forecast the regression model using a new sample. In this research, the original data sets are split into two samples, i.e. an estimation data set and a validation data set, via random selection procedures. Hence, the conditions and relationships in both data sets are comparable. The validation data set used to forecast the equation for New Customers contains 20,944 cases, whereas the validation data set used to forecast the equation for Current Customers contains 434 cases.

Testing whether the regression model continues to hold on comparable data that were not used in the estimation is known as cross-validation (Malhotra, 2007, p.563). As mentioned above, the model is estimated on the estimation sample and then applied to the validation sample. The predicted and observed values of the dependent variable in the validation sample are then correlated to obtain the simple r² for the validation data set. This measure can be compared with the R² of the estimation data set.
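The procedure can be sketched as follows: fit on the estimation sample, predict the validation sample, and square the correlation between predicted and observed values. The data are synthetic stand-ins for the visitor metrics and Visitor Value, and the helpers `fit_ols`, `predict` and `squared_correlation` are hypothetical. For an OLS model with an intercept, the in-sample squared correlation equals the usual R², so the same helper serves both samples.

```python
import numpy as np


def fit_ols(X, y):
    """OLS coefficients [b0, b1, ...] with an intercept."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def predict(coef, X):
    """Apply fitted coefficients (intercept first) to new rows."""
    return np.column_stack([np.ones(len(X)), X]) @ coef


def squared_correlation(predicted, observed):
    """Simple r^2: squared Pearson correlation of predicted vs. observed."""
    return np.corrcoef(predicted, observed)[0, 1] ** 2


# Synthetic data standing in for visitor metrics and Visitor Value (hypothetical).
rng = np.random.default_rng(1)
X = rng.normal(size=(400, 3))
y = X @ np.array([1.0, -0.5, 0.2]) + rng.normal(size=400)

# Rows are generated independently, so the first half can serve as estimation sample.
X_est, y_est, X_val, y_val = X[:200], y[:200], X[200:], y[200:]

coef = fit_ols(X_est, y_est)
r2_estimation = squared_correlation(predict(coef, X_est), y_est)  # equals OLS R^2
r2_validation = squared_correlation(predict(coef, X_val), y_val)
print(f"estimation R^2 = {r2_estimation:.3f}, validation r^2 = {r2_validation:.3f}")
```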

For the New Customers model, the squared correlation between the predicted and observed Visitor Value in the validation sample is .595² = .354, while the R² of the estimation sample is .358. The difference between the R² of the estimation sample and the simple r² of the validation sample (.358 - .354 = .004) is known as shrinkage (Malhotra, 2007, p.564). Shrinkage can be interpreted as follows: the smaller the shrinkage, the more confidence that the equation can be generalized. A shrinkage of only .004 therefore indicates that the predictive power of the New Customers model holds up well on data not used in its estimation.
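The arithmetic behind these figures can be reproduced directly. The function name `shrinkage` is hypothetical. The inputs are the estimation R² of .358 and the validation correlation of .595 reported above.

```python
def shrinkage(r2_estimation, r_validation):
    """Return (validation r^2, shrinkage), where shrinkage = R^2_est - r^2_val."""
    r2_validation = r_validation ** 2
    return r2_validation, r2_estimation - r2_validation


# Figures reported for the New Customers model.
r2_val, shrink = shrinkage(r2_estimation=0.358, r_validation=0.595)
print(f"validation r^2 = {r2_val:.3f}, shrinkage = {shrink:.3f}")
```

Rounded to three decimals, this reproduces the reported .354 and .004.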
