The Influence of Customer Behavior and Loyalty Metrics on Future Purchase Behavior

Master Thesis Marketing Intelligence & Management

June 26, 2017

Author

Tom te Winkel

Menistenstraat 91

8023 SH Zwolle

Tel: +31628150430

E-mail: tomtewinkel@gmail.com

Student number: S2725231

Research institute

University of Groningen

Faculty of Economics and Business

Department of Marketing

PO Box 800, 9700 AV Groningen (NL)

Supervisors

First Supervisor: Dr. K. (Keyvan) Dehmamy

Second Supervisor: Dr. J.T. (Jelle) Bouma

Management Summary

Nowadays, the internet has become very popular as a shopping channel. The main reason is that customers have more experience with the online environment and there are more online retailers to choose from (Bart, Shankar, Sultan and Urban, 2005). However, this fast-evolving landscape leads to fierce competition for the customer. In order to create a competitive advantage, retailers are more focused on building and maintaining a strong relationship with customers (Day, 2003). In this field, the online retailer has a disadvantage, because it has no personal contact with its customers and the interaction mainly takes place through the website. Therefore, online retailers are trying to compensate for the lack of interaction by making use of clickstream data and loyalty metrics in customer surveys. In this way the online retailer tries to better understand the needs of the customer. For this reason the aim of this study is to investigate the influence of customer behavior and loyalty metrics on future purchase behavior. This research builds mainly on the study by Van den Poel and Buckinx (2005), which investigated which types of predictors influence purchase behavior at an online store. The existing literature is extended in two ways: 1) by including customer loyalty metrics and operational excellence as possible predictors, and 2) by focusing not only on the influence on purchase probability, but also on spending behavior.

This study uses session-level data from a European online retailer. The data set contains all the sessions of ### customers from 13 May 2016 until 13 March 2017. A logistic regression is performed to predict the purchase probability in a session. In addition, nine machine learning methods are applied to test which method predicts the likelihood of a purchase in a session most effectively. Thereafter, the data set is split into four segments and the spending behavior is predicted for each segment using a linear regression.

The research resulted in some interesting findings. First of all, it seems hard to determine the factors that influence purchase behavior for segment 1 and segment 4, because only a few significant variables are found. Second, customers in segment 2 with a higher NPS spend more money at the retailer, but a higher CSAT leads to lower spending. Overall, it can be concluded that the influence of loyalty metrics on purchase behavior is limited and further research is needed. Finally, the findings show that males are more likely to make a purchase than females, and males in segment 1 or segment 3 spend more money than females.

Keywords: Spending behavior, purchase likelihood, linear regression, logistic regression, machine learning methods, historical purchase behavior, website behavior, loyalty metrics.

Preface

This master thesis was written to complete the master Marketing Intelligence and Management of the University of Groningen. I have taken the opportunity to write my thesis for a company in order to work with actual data and to gain work experience. During this period I experienced a steep learning curve, because in the beginning I was struggling to find the right balance between work and doing research. But after five months of hard work and encountering ups and downs, I am very satisfied with the end result.

I would like to take the opportunity to thank the people who helped me during this study and during the writing of this thesis. First of all, I would like to thank my first supervisor Dr. Keyvan Dehmamy for all his guidance, support, and useful feedback. I would also like to acknowledge my second supervisor Dr. Jelle Bouma. I am thankful for the useful feedback and the time and effort he spent in establishing and monitoring the relationship between the supervisor of the university and the supervisor of the online retailer. Thirdly, I would also like to thank the supervisor of the online retailer. Without her input, time and dedication my thesis would not have been so successful. She helped me not only with my thesis, but also with my personal development, leaving me with a lot of confidence to enter the labor market and to find the job that really suits me. Finally, a thank you to my girlfriend, friends and family who have always supported me.

Tom te Winkel

Table of contents

1. Introduction

2. Literature review

2.1 Purchase behavior

2.2 Determinants of future purchase behavior

2.2.1 Historical purchase behavior

2.2.2 Website behavior

2.2.3 Operational excellence

2.2.4 Customer loyalty

2.2.5 Customer characteristics

2.3 Conceptual model

3. Research methodology

3.1 Method I: Logistic regression

3.2 Method II: Multiple linear regression

3.3 Method III: Machine learning methods

3.4 Model specification and estimation

3.5 Plan of analysis

4. Data analysis

4.1 Data description

4.2 Variables description

4.3 Oddities, outliers and missing values

4.4 Pooling

4.5 Assumption testing

4.5.1 Heteroskedasticity

4.5.2 Non-normality

4.5.3 Multicollinearity

5. Results

5.1 Descriptive results

5.2 Logistic regression

5.3 Machine learning methods

5.4 Multiple linear regression

6. Conclusion

6.1 Hypotheses testing

6.2 Discussion

6.2.1 Historical purchase behavior

6.2.2 Website behavior

6.2.3 Operational excellence

6.2.4 Customer loyalty

6.2.5 Customer characteristics

6.2.6 Final conclusion

6.3 Academical and managerial implications

6.4 Limitations and avenues for future research

References

Appendices

Appendix A: Overview Hypotheses

Appendix B: Variables description

Appendix C: Histograms and Q-Q plots

Appendix D: VIF scores

Appendix E: Lift curve machine learning methods

1. Introduction

Since the birth of the World Wide Web the number of internet users in Europe has grown enormously. On average 85 percent of households in European countries have internet access and 71 percent of individuals use the internet on a daily basis. In Luxembourg, the Netherlands and Norway, 97 percent of households already have internet access, which is nearly full coverage (Statista, 2016). The internet is mainly used for receiving and sending e-mails, consulting wikis to gain knowledge, consuming and publishing content on social media and reading digital newspapers, but looking up product information is also an often-mentioned activity (Eurostat, 2016).

The increased popularity of online shopping could be explained by the increasing level of experience with the online environment and the greater variety of online retailers to choose from. Customers seem to have more trust in making an online purchase and have become aware of the benefits (Bart, Shankar, Sultan and Urban, 2005). Making an online purchase is less time-consuming and the market is more transparent, because customers can compare prices between retailers (Wolfinbarger and Gilly, 2001; Park and Kim, 2003). In addition, the number of retailers offering their products or services online is still growing, because of the low investment needed to start and maintain a website and the possibility of reaching more customers than a physical store can (Bakos, 2001; Ecommerce Europe, 2015).

In order to maintain a competitive advantage in this fast-evolving landscape, online retailers have to transform from a product-oriented organization into a more customer-centric organization (Lamberti, 2013). In a customer-centric organization the focus is on building and maintaining a strong relationship with the customer (Day, 2003). A satisfactory relationship will increase the lifetime value, commitment and trust of a customer, and this ultimately leads to long-term profitability (Ahmad & Buttle, 2001). Therefore a retailer needs to listen to its customers and improve the experience across all touchpoints (Lamberti, 2013). In this field, the online retailer has a disadvantage, because the interaction with the customer mainly takes place through the website and is based on pictures, videos and the quality of the product information (Park and Kim, 2003), whereas the physical store interacts mainly through face-to-face contact (Moe, 2003).

Online retailers are trying to compensate for the lack of interaction with customers by making use of clickstream data. Clickstream data contains information about online behavior. In particular, it provides insight into the path and actions the customer takes through a website (Montgomery, Srinivasan and Liechty, 2004). This data is helpful in optimizing the website in order to better fulfill the needs of the customer. In addition, online retailers are also trying to start a dialogue via customer surveys. In those surveys customers are often asked about their level of satisfaction with the company (CSAT), but the Net Promoter Score (NPS) is becoming increasingly popular among academics and practitioners. The NPS reflects the customer's intention to recommend the retailer to friends or colleagues on a 0 to 10 rating scale (Reichheld, 2003). This score is often used in customer surveys because of its simplicity and ease of measurement, and its founder claims that an increase in the NPS will lead to more loyal customers and business growth (Grisaffe, 2007; Reichheld, 2003). In an empirical study, Van den Poel and Buckinx (2005) already examined the influence of different types of determinants on purchase behavior at an online store. They included general clickstream behavior, detailed clickstream information, customer characteristics and historical purchase behavior as potential determinants. However, the customer loyalty metrics that retailers use to compensate for the lack of interaction are missing from their study. Therefore the main purpose of this paper is to investigate the influence of customer behavior in combination with loyalty metrics on future purchase behavior. The corresponding research question of this study is:

What is the influence of customer behavior and loyalty metrics on future purchase behavior?

This research aims to extend current knowledge in three ways. Firstly, the conversion rate (the proportion of visitors who complete a desired action) is one of the most important metrics to measure profitability and growth, but for online retailers the conversion rate rarely exceeds 5 percent (Ayanso and Yoogalingam, 2009). Websites seem to attract a lot of visitors, but are not able to entice visitors to make a purchase. The low rates could be explained by the ease and low cost of visiting different online retailers compared to traveling to a physical store (Moe and Fader, 2001). For online retailers it is highly relevant to gain more insight into the determinants of purchase behavior, because even small changes can result in considerable increases in sales (Sismeiro and Bucklin, 2004).

Secondly, a model is developed based on data from an existing European online retailer. It includes variables that have already been tested in the existing literature, and new variables are added to extend the literature. In addition, ten months of data are available, which can be used to train, validate and test the model.

Finally, this research provides more insight into the performance of customer loyalty metrics. Despite the popularity of the NPS among retailers and the positive results of the study by Reichheld (2003), there is still an ongoing debate about the impact of the NPS among practitioners and academics (McGregor, 2006). Research suggests that the predictive performance of the NPS is limited and is not better than that of other metrics such as customer satisfaction (Van Doorn, Leeflang and Tijs, 2013). Previous research by Morgan and Rego (2006) also showed that focusing on recommendation intentions and behaviors is misguided and does not lead to growth of the firm. In a longitudinal study, Keiningham, Cooil, Andreassen and Aksoy (2007) found no support for the claims of Reichheld. Besides the theoretical contribution, this research is also relevant for managerial purposes, because the findings can be used as a benchmark by other online retailers that use customer loyalty metrics (Grisaffe, 2007). In addition, the model will be tested for different segments, because in practice retailers focus on manageable and heterogeneous segments rather than on the entire customer base.

The overall structure of the study takes the form of six chapters. First, the literature review will elaborate on the conceptual model which gives an overview of the determinants that influence purchase behavior. These variables will be explained extensively and hypotheses are formed. Chapter three is concerned with the methodology used for this study. In chapter four the data transformation and cleaning steps are explained. In chapter five the results will be discussed and finally in chapter six the conclusion of this research will be explained including the discussion with the implications, limitations and avenues for future research.

2. Literature review

This chapter will give an overview of existing literature to provide a better understanding of the main concept, namely purchase behavior and also the determinants that are included in this research. First, section 2.1 describes purchase behavior. In section 2.2 the determinants historical purchase behavior, website behavior, operational excellence, customer loyalty and customer characteristics are explained. Finally in section 2.3 the existing literature and relations are summarized in a conceptual model.

2.1 Purchase behavior

Purchase behavior could be defined as the decision processes involved in actually buying and using products (Morrison, 1979). In this study purchase behavior will be explained using two variables. The first variable is a dummy variable (Purchase) that indicates whether or not a purchase was made. This makes it possible to quantify whether changes in the independent variables lead to a higher or lower purchase probability in a session. To gain more insight into whether changes in the independent variables lead to higher spending, a second variable is included. This variable is continuous and is defined as the total amount spent (Spendings) in a session. As described in chapter one, understanding and predicting purchase behavior is becoming more important, because it could in the end lead to a more successful business (Sismeiro and Bucklin, 2004), especially given the growing online competition and the ease of visiting multiple websites from home without incurring any costs (Moe and Fader, 2001). Therefore it is possible that customers visit the website multiple times before making a purchase.
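As a minimal sketch of how these two dependent variables could be derived per session, assume that each session record carries the list of order amounts placed in it. The `Session` class and its `order_amounts` field are hypothetical and are not taken from the retailer's data set.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Session:
    # Hypothetical session record; the field names are illustrative assumptions.
    session_id: str
    order_amounts: List[float] = field(default_factory=list)


def purchase_dummy(session: Session) -> int:
    """Return 1 if at least one order was placed in the session, else 0."""
    return 1 if session.order_amounts else 0


def spendings(session: Session) -> float:
    """Return the total amount spent in the session (0.0 without orders)."""
    return float(sum(session.order_amounts))
```

The dummy would serve as the outcome of the logistic regression and the total amount as the outcome of the linear regression.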

stimulus driven, which could lead to more impulse buying. Visitors with a search & deliberation pattern want to buy a product in a particular category, but are unsure which product to buy. The directed buying pattern shows that the visitor's main intention is to purchase a product and that the visitor has sufficient knowledge to make the decision (Moe, 2003). Both the knowledge building and the hedonic browsing patterns are less purchase driven compared to the directed buying and search & deliberation patterns.

Many online retailers monitor purchase behavior via web analytics tools like Google Analytics, but aggregate statistics such as the total number of purchases per week do not give much insight into patterns at an individual level. Therefore researchers started to make use of clickstream data to study how customer behavior evolves over time and to predict purchase behavior. Clickstream data contains information about the sequence of pages, or path, viewed by visitors as they click through the website (Montgomery et al., 2004). The first research in this field was conducted by Moe and Fader (2001). They found that taking dynamics into account leads to a better prediction of purchase conversion rates compared to a stationary model, and that the probability of making a purchase is higher for customers who visit the website more often. In a follow-up study historical data was included to take into account the different roles of store visits and the fact that people can become more familiar and experienced with the order process of the website (Moe and Fader, 2004). As a result, companies can use this data to predict the purchase probability for each customer.

2.2 Determinants of future purchase behavior

To study the determinants of future purchase behavior a set of 31 variables will be discussed. The variables are chosen based on existing literature, logic and intuition. In an empirical study, Van den Poel and Buckinx (2005) already investigated which predictors influence purchase behavior in an online retail setting. The 92 variables used in their study are categorized into four groups, namely general clickstream behavior, detailed clickstream information, customer characteristics and historical purchase behavior. Their results show that variables from all four categories are present in the final best model, but detailed clickstream variables are the most important in predicting purchase behavior. Based on these findings several relevant variables will be included in this study in combination with some new variables.

In this study, the general and detailed clickstream data will be combined into one category, namely website behavior. Data about historical purchase behavior and the characteristics of the customer are available and therefore included in this research. In addition, the two categories operational excellence and customer loyalty are created. In the next section the five categories and their underlying variables will be explained in further detail, together with their expected effect on future purchase behavior.

2.2.1 Historical purchase behavior

According to the literature, historical purchase behavior is effective in predicting future purchase behavior (Moe and Fader, 2004; Van den Poel and Buckinx, 2005). Therefore the variables recency, frequency and monetary value will be included to define the historical purchase behavior of each customer (Bitran and Mondschein, 1996; Van den Poel, 2003). Recency could be defined as the time since the last purchase (PurchaseReceny), which measures the interval between the current visit and the last purchase in days or months (Chen, Kuo, Wu and Tang, 2009). Customers with a high recency score, and thus a larger time interval between the current visit and the previous purchase, will have a lower purchase probability. The effect of recency on future purchase behavior only holds for retailers with a broad assortment across different categories, in contrast to retailers that, for example, sell only washing machines. For the latter there is less variation, because customers buy only once every five years (Chen et al., 2009). Furthermore, Neslin, Taylor, Grantham and McNeil (2013) introduced the recency trap, which means that customers who do not make a purchase in a given time period are also less likely to make a purchase in the next period. These customers are of less value to the retailer, because their purchase likelihood continues to decline. In addition, a new variable is created that is in line with the argumentation for purchase recency, namely whether a customer made a purchase during the last visit (PurchaseLastVisit). Based on these findings the following hypotheses are defined:

H1a: A higher number of days between visit and last purchase will negatively influence future purchase behavior.

H1b: A purchase during the last visit will positively influence future purchase behavior.

One variable (PurchaseVisit) is included in this study to define the purchase frequency. Frequency can be defined as the total number of purchases in a given time period. Research shows that the frequency of past purchases positively influences a customer's future purchases. According to Janiszewski (1998), frequent buyers are more involved and more motivated in the process and are thus more likely to buy on impulse. Other research suggests that frequent buyers are also more likely to make a repeat purchase (Bicheno and Wierenga, 2009). Based on this information the following hypothesis can be formulated:

H2: A higher number of purchases per visit will positively influence future purchase behavior.

Monetary value is an indicator that represents the amount of money a customer has spent since the first purchase. According to Keiningham et al. (2007), the monetary value of a customer's past purchases is likely to be an effective predictor of future purchase behavior, because these customers have more experience with and trust in the order process of the retailer (Ahmad & Buttle, 2001). This is in line with the study of Van den Poel and Buckinx (2005), who showed that monetary value has a positive effect on customers' future purchase behavior. Therefore it can be assumed that customers who spend more money now are likely to spend more money in the future. Based on these findings two variables (AvgMonetaryValuePurchase, AvgMonetaryValueVisit) are included. In addition, one variable (MonetaryValueLastVisit) is included to indicate how much the customer spent during the last visit. In this research it is assumed that customers who spent more during the last visit will probably spend less during the next visit, because they cannot spend an unlimited amount of money and have to save for the next large purchase. The corresponding hypotheses can be defined as follows:

H3a: A higher average spending per purchase will positively influence future purchase behavior.

H3b: A higher average spending per visit will positively influence future purchase behavior.

H3c: A higher spending during the last visit will negatively influence future purchase behavior.
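The recency, frequency and monetary variables of this section could be derived from a customer's visit history roughly as follows. This is a sketch under stated assumptions, not the thesis' actual data preparation: each past visit is reduced to a date and an amount spent, at most one purchase is counted per visit, and the snake_case feature names are illustrative stand-ins for the variables named in the text.

```python
from datetime import date
from typing import Dict, List, Optional, Tuple

# Assumed input: (visit date, amount spent); 0.0 means the visit had no purchase.
Visit = Tuple[date, float]


def rfm_features(history: List[Visit], current_visit: date) -> Dict[str, Optional[float]]:
    """Derive recency, frequency and monetary features from past visits.

    `history` is assumed to be sorted by date and to contain only visits
    that took place before `current_visit`.
    """
    purchases = [(d, amt) for d, amt in history if amt > 0]
    n_visits = len(history)
    n_purchases = len(purchases)
    total_spent = sum(amt for _, amt in purchases)

    last_purchase_date = purchases[-1][0] if purchases else None
    last_visit_amount = history[-1][1] if history else 0.0

    return {
        # Recency: days between the current visit and the last purchase.
        "purchase_recency_days": (
            (current_visit - last_purchase_date).days if last_purchase_date else None
        ),
        # 1 if the most recent past visit ended in a purchase.
        "purchase_last_visit": 1 if last_visit_amount > 0 else 0,
        # Frequency: share of past visits that included a purchase.
        "purchases_per_visit": n_purchases / n_visits if n_visits else 0.0,
        # Monetary value: averages per purchase and per visit.
        "avg_monetary_value_purchase": total_spent / n_purchases if n_purchases else 0.0,
        "avg_monetary_value_visit": total_spent / n_visits if n_visits else 0.0,
        "monetary_value_last_visit": last_visit_amount,
    }
```

Returning `None` for the recency of customers without any past purchase is one possible design choice; how such cases were actually coded is not stated in this section.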

2.2.2 Website behavior

The second category is website behavior and consists of eleven variables. The first two variables are related to website loyalty. Website loyalty can be defined as the customer's favorable attitude towards the website of the online retailer. The first variable that indicates website loyalty is visit frequency (VisitFrequency). Visit frequency can be described as the total number of visits in a given time period (Bucklin and Sismeiro, 2009). When customers gain more experience with the online retailer they may visit the website more frequently, less frequently or at an unchanged frequency. Literature shows that customers who frequently visit the website are more likely to make a purchase than infrequent visitors (Van den Poel and Buckinx, 2005). Moe and Fader (2004) also found supporting evidence, but they extend these findings by showing that changes in individual visit frequency over time provide even better predictions of the purchase likelihood of the customer. The second variable is visit recency (VisitRecency), which can be defined as the number of days between the customer's latest visit and the visit before that (Chen et al., 2009). Customers with a higher recency have a higher number of days in between, which indicates that they visit the website only once in a while. The studies just mentioned found a significant negative effect, meaning that a higher recency leads to a lower purchase likelihood. Based on these findings the following hypotheses are formed:

H4a: A higher number of total past visits will positively influence future purchase behavior.

H4b: A higher number of days since last visit will negatively influence future purchase behavior.
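The two website loyalty variables can be illustrated with a short sketch. It assumes that only the dates of a customer's visits are known and that both variables are measured relative to the current visit; the function name `visit_loyalty` is hypothetical.

```python
from datetime import date
from typing import List, Optional, Tuple


def visit_loyalty(past_visits: List[date], current_visit: date) -> Tuple[int, Optional[int]]:
    """Return (visit frequency, visit recency in days) for the current visit."""
    # Only visits strictly before the current one count as past visits.
    earlier = sorted(d for d in past_visits if d < current_visit)
    frequency = len(earlier)
    # Recency is undefined (None) for a first-time visitor.
    recency = (current_visit - earlier[-1]).days if earlier else None
    return frequency, recency
```

For a customer with past visits on 13 May, 20 May and 1 June 2016 and a current visit on 11 June 2016, this yields a frequency of 3 and a recency of 10 days.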

The next two variables are related to the stickiness of the session, which refers to the amount of time a visitor spends on the website during a session (Lin, Hu, Sheng and Lee, 2010). The stickiness of the session is important, because it influences the customer's commitment and trust towards an online retailer. A sticky website not only leads customers to spend more time on the website, but also gives them a higher probability of making a purchase (Kumar Roy, Lassar and Butaney, 2014). The variables pageviews (PageViews) and duration (Duration) are used to define the stickiness of the session. Pageviews can be described as the total number of pages viewed in a session and duration as the time spent on those pages. Literature shows a significant positive effect of session duration and the number of page views on future purchase behavior. Customers who access more pages or spend more time on the website are more inclined to make a purchase (Lin et al., 2010; Van den Poel and Buckinx, 2005). Therefore the hypotheses are defined as follows:

H5a: A higher number of page views will positively influence future purchase behavior.

H5b: A longer time spent will positively influence future purchase behavior.

Search behavior is an important predictor of purchase behavior and consists of three variables. The first variable is the number of search terms (SearchTerms) used in a session. As described in section 2.1, Janiszewski (1998) defined two types of search behavior, namely goal-directed search and exploratory search. In this study it is expected that customers who use search terms in a session have a more goal-directed search. They already have a specific product in mind when entering the website and it is unlikely that they leave without a purchase (Moe and Fader, 2004). The second variable is based on exploratory search, namely the ratio between product detail page (PDP) visits and product overview page (POP) visits (PDPorPOPVisits). A lower ratio means relatively more product overview page visits than product detail page visits in a session. Research suggests that a low ratio is also accompanied by a lower purchase intention and lower spending (Moe, 2003). The browsing behavior of these customers is more stimulus driven than planned, because they lack the motivation and experience to search efficiently (Janiszewski, 1998). Furthermore, it is interesting to investigate in which way customers enter the website (WebsiteEntry). The website can be entered via free channels, namely by typing the domain name into the browser or via organic search. It is also possible to enter the website via a paid channel, such as a referral website or non-organic search. The first entry method is more closely linked to exploratory search and the second to goal-directed search, because these customers are already searching for a specific product. Therefore research suggests that customers who enter the website via a paid channel are more likely to make a purchase and spend more money than customers who enter the website via a free channel (Park and Chung, 2009). Altogether, this leads to the following hypotheses:

H6a: A higher number of search terms will positively influence future purchase behavior.

H6b: A lower number of PDP visits compared to POP visits will negatively influence future purchase behavior.

H6c: Entering the website via paid channels will have a greater positive influence than free channels on future purchase behavior.
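The ratio between product detail page and product overview page visits and the free versus paid entry split can be sketched as follows. The page type labels (`"PDP"`, `"POP"`) and the channel labels are hypothetical; only the grouping into free and paid channels follows the description above.

```python
from typing import List, Optional

# Hypothetical channel labels; the grouping follows the free/paid split in the text.
FREE_CHANNELS = {"direct", "organic_search"}
PAID_CHANNELS = {"referral", "paid_search"}


def pdp_pop_ratio(page_types: List[str]) -> Optional[float]:
    """Ratio of product detail page (PDP) views to product overview page (POP) views."""
    pdp = page_types.count("PDP")
    pop = page_types.count("POP")
    # The ratio is undefined when the session contains no POP view.
    return pdp / pop if pop else None


def entry_is_paid(channel: str) -> int:
    """Return 1 for a paid entry channel and 0 for a free one."""
    if channel in PAID_CHANNELS:
        return 1
    if channel in FREE_CHANNELS:
        return 0
    raise ValueError(f"unknown channel: {channel}")
```

A session with three PDP views and four POP views gives a ratio of 0.75, which under H6b would point to more exploratory browsing than a session with a ratio above 1.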

Besides search behavior, customers can show interest in specific products. The first variable is the number of PDP visits relative to the total number of pageviews (VisitPDP). According to Moe (2003), customers who intend to buy a product are more likely to view product pages. They view product pages because these provide more detailed information about the product itself, instead of the general information shown on category pages. The second variable is adding a product to the shopping cart (AddtoCart). According to Sismeiro and Bucklin (2004) there is a sequence in ordering a product. First the item is placed in the shopping cart, then the customer enters the shipping information and finally he or she places the order with a credit card. Their results show that thirty percent of the visitors add products to the cart and only two percent of the visitors order a product in the end. This suggests that a higher number of products added to the cart will lead to a higher purchase likelihood, because the customer is already one step closer to making a purchase. The two corresponding hypotheses are defined as follows:

H7a: A higher number of PDP visits will positively influence future purchase behavior.

H7b: A higher number of products added to the shopping cart will positively influence future purchase behavior.

The last two variables are related to website behavior that does not show any purchase intention. It is often assumed that people who visit an online retailer intend to make a purchase, but as described in section 2.1 it is possible that customers do not have this intention. For example, they recently made a purchase and want to gather information about the delivery time or return policy. As a result they only visit personal pages or informational pages (InformationalPages) and have no intention of visiting product pages. Furthermore, the time spent on the website could indicate whether there is a purchase intention. Van den Poel and Buckinx (2005) introduced a variable that indicates whether a customer is in a hurry. A customer is considered to be in a hurry (Hurry) when the average time during the last visit is less than the average over the past. They found that customers who are in a hurry are less likely to make a purchase, as they seem to have no time to do so. For these reasons two variables are included in the model, both of which are expected to have a negative influence on future purchase behavior. The hypotheses are described as follows:

H8a: A higher number of information page visits will negatively influence future purchase behavior.

H8b: A lower average time spent per session than average will negatively influence future purchase behavior.
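One possible reading of the hurry indicator is sketched below. It assumes that "average time" means the average number of seconds per page view and that the customer's own history is the benchmark; both are assumptions made for illustration, since the exact operationalization is not given in this section.

```python
from typing import List


def hurry(last_visit_page_seconds: List[float], past_page_seconds: List[float]) -> int:
    """Return 1 if the average time per page in the last visit is below the
    customer's historical average, else 0 (also 0 when either list is empty)."""
    if not last_visit_page_seconds or not past_page_seconds:
        return 0
    last_avg = sum(last_visit_page_seconds) / len(last_visit_page_seconds)
    past_avg = sum(past_page_seconds) / len(past_page_seconds)
    return 1 if last_avg < past_avg else 0
```

A customer who averaged 15 seconds per page in the last visit against a historical average of 40 seconds would be flagged as being in a hurry.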

2.2.3 Operational excellence

Operational excellence can be defined as providing customers with reliable products or services at competitive prices, delivered with minimal difficulty or inconvenience (Treacy and Wiersema, 1993). For online retailers the work of the operations department mainly consists of monitoring and improving the delivery and return process (Boyer, Hallowell and Roth, 2002). Therefore two variables will be included in the model. This study expects a positive influence on future purchase behavior when the delivery of the past purchase was a success (DeliverySucces). A delivery is a success when the products are delivered within the specified time period. According to Rao, Griffis and Goldsby (2011), 60 percent of customers switch between retailers because of service failures such as problems with on-time delivery. In this study it is expected that customers with a higher success rate for the delivery of previous purchases will be more likely to make a purchase and spend more money during the next visit. The associated hypothesis is described as follows:

H9:​ A higher success rate for delivery will positively influence future purchase behavior.

Returning products is common in European online retail (Postnord, 2015). The return percentage of online retailers fluctuates between 15% and 45%, and products in the fashion category are returned most often. This study assumes that customers who return products generally do not come back, because they are dissatisfied with the product and lose confidence in the retailer. This assumption is at odds with the number one reason for returning products, which is that customers themselves ordered the incorrect product or size. Based on these considerations, the main effect is captured by the following hypothesis:

H10a: A higher number of total past returns will negatively influence future purchase behavior.

Besides the main effect of the total number of past returns on purchase behavior, the effect could also differ across product categories. In total six product categories are added to the data set, namely Fashion (ReturnedFA), Home & Garden (ReturnedHG), Electronics, Entertainment & Household Appliances (ReturnedEEH), Sport & Leisure (ReturnedSL), Beauty & Wellness (ReturnedBW) and Other (ReturnedOTHER). According to Cho, Im, Hiltz and Fjermestad (2002), the most important aspects for making a repeat purchase after returning a product are the price of the product and the effort needed to search for product information. When customers pay a high price for a product and have to invest much time in comparing different specifications (for example for a laptop or a bed), their future purchase likelihood and spending will decline after returning the product. These customers have high expectations, which are often not fulfilled by the product or retailer. They become disappointed and do not want to invest both money and time in a repeat purchase. For cheaper products this effect is reversed, because customers spend less money and purchase on a more regular basis (for example clothes or fragrances). This leads to the following hypotheses:

H10b: A higher number of total past returns of the Fashion category will positively influence future purchase behavior.

H10c: A higher number of total past returns of the Home & Garden category will negatively influence future purchase behavior.

H10d: A higher number of total past returns of the Electronics, Entertainment & Household Appliances category will negatively influence future purchase behavior.

H10e: A higher number of total past returns of the Sport & Leisure category will positively influence future purchase behavior.

H10f: A higher number of total past returns of the Beauty & Wellness category will positively influence future purchase behavior.

H10g: A higher number of total past returns of the Other category will positively influence future purchase behavior.

2.2.4 Customer loyalty

Customer loyalty is the feeling of attachment to or affection for a company’s people, products, or services (Jones and Sasser, 1995). This definition is similar to the concept of relationship commitment, which is described as the desire to be in a valued and enduring relationship (Wang and Wu, 2012). Loyal customers show different behavioral outcomes, most commonly promoting the retailer via word-of-mouth and being willing to pay more for a certain product or service (Srinivasan, Anderson and Ponnavolu, 2002). In this study the variables NPS, customer satisfaction (CSAT) and relationship length are included to define the level of loyalty.

The first variable is the NPS (NPSScale). Reichheld (2003) developed the NPS as a customer loyalty metric. On a 0-to-10 scale, customers answer the question: “How likely is it that you would recommend the company to a friend or colleague?”. Scores between 0 and 6 are labeled as detractors, 7 or 8 as passives and 9 or 10 as promoters. According to Reichheld, this metric gives more insight into the level of loyalty and the growth of the business. The logic is simple: it is by definition a plus to have more promoters than passives or detractors in the customer base, because promoters are more loyal and recommend the business to friends and family. Since the introduction of the NPS there has been an ongoing debate between practitioners and academics about its performance. The literature suggests that the predictive performance of the NPS is limited and that focusing on recommendation intentions and behaviors is misguided and will not lead to growth of the firm (Morgan and Rego, 2006; Van Doorn, Leeflang and Tijs, 2013). In a longitudinal study, Keiningham, Cooil, Andreassen and Aksoy (2007) also found no support for the claims of Reichheld. Despite these contradicting findings, the NPS is widely adopted by retailers. The main reason is that they believe it is built on extensive research, although evidence is actually still lacking (Keiningham et al., 2007). In this research the 0-to-10 scale is included, which leads to the following hypothesis:

H11:​ A higher NPS will positively influence future purchase behavior.
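The labeling of the 0-to-10 answers can be illustrated with a short Python sketch. The function names and the sample answers are hypothetical. The aggregate score (percentage of promoters minus percentage of detractors) is the conventional way of summarizing the NPS and is shown as an assumption; this study uses the raw 0-to-10 scale (NPSScale) as predictor.

```python
def nps_group(score):
    """Label a 0-to-10 recommendation score (hypothetical helper)."""
    if not 0 <= score <= 10:
        raise ValueError("score must lie between 0 and 10")
    if score <= 6:
        return "detractor"
    if score <= 8:
        return "passive"
    return "promoter"


def net_promoter_score(scores):
    """Conventional aggregate: % promoters minus % detractors (assumption)."""
    groups = [nps_group(s) for s in scores]
    promoters = groups.count("promoter")
    detractors = groups.count("detractor")
    return 100.0 * (promoters - detractors) / len(groups)


# Made-up survey answers, for illustration only.
sample_scores = [10, 9, 8, 7, 6, 3, 10, 9, 5, 8]
print(net_promoter_score(sample_scores))
```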

The second variable is the customer satisfaction score (CSAT). Customer satisfaction can be defined as the number of customers, or percentage of total customers, whose reported experience with a firm, its products or its service exceeds specified satisfaction goals (Farris, Bendle, Pfeifer and Reibstein, 2010). The score is often determined by asking a single question, namely: “How would you describe your overall satisfaction with the company?”. The response options range from very dissatisfied to very satisfied on a 5-point or 10-point scale. In practice the score is often reported as “top box” or “top two boxes”, for example the percentage of respondents who checked “9” or “10”. The most important factor behind this score is the expectation of the customer: when expectations are high and the retailer cannot meet them, customers will be disappointed and will likely rate their experience as less than satisfying (Farris et al., 2010). Research provides evidence for a positive relation between customer satisfaction and the financial performance of the retailer, and therefore an increase in customer satisfaction is expected to lead to higher profitability (Anderson, Fornell and Lehmann, 1994; Ittner and Larcker, 1998). The satisfaction score is also widely used by retailers because retaining satisfied customers is less costly than retaining less satisfied customers (Mittal and Kamakura, 2001). All in all, this leads to the following hypothesis:

H12:​ A higher CSAT will positively influence future purchase behavior.

The last variable in this category is relationship length (RelationshipLength). Relationship length refers to the amount of time (in years) that the customer has been a member of the retailer. In this research the relationship between the customer and the retailer starts when the customer creates an account on the website. Research shows that when the relationship is short, customers have less experience and are forced to rely more on word-of-mouth and the communication of the retailer, so that they form weaker and less stable expectations (Wang and Wu, 2012). Retailers have to invest more time and money in the interaction with these customers to gain their trust. As the relationship becomes longer, customers become more familiar and experienced with the retailer, and the barrier to ordering products from the website is lower (Coulter and Coulter, 2002). The literature therefore suggests that customers with a short relationship purchase less than customers with a long relationship, which leads to the last hypothesis in this category:

H13: A longer relationship length will positively influence future purchase behavior.

2.2.5 Customer characteristics

Finally, the characteristics of the customer are taken into account in this research. The literature supports the use of several customer characteristics in predicting purchase behavior (Padmanabhan, Zheng and Kimbrough, 2001; Van den Poel and Buckinx, 2005). In this study the variables age, gender and variable X are included. Age (Age) is expected to have an inverted U-shaped relationship with future purchase behavior. Young children do not have the right to manage their own bank account and therefore cannot make an online purchase. When they become older, they gain the right to make purchases independently. Younger people are more familiar with new technologies and are therefore more likely to make an online purchase. The likelihood of making a purchase declines when people become even older, because they become more thoughtful and deliberate in their evaluation process (Venkatesh and Agarwal, 2006). The hypothesis is defined as follows:

H14: There is an inverted U-shaped relationship between age and future purchase behavior.

The second customer characteristic is gender (Gender). There are several differences between male and female customers in their online behavior. Male customers are more satisfied and show a more positive attitude towards making online purchases than female customers. The main reason is that female customers are less emotionally satisfied, because they have less trust, are more skeptical and do not find online shopping as convenient as male customers do (Rodgers and Harris, 2003). Based on this research the fifteenth hypothesis can be formulated:

H15: Male customers will have a greater positive influence than female customers on future purchase behavior.

The last variable in this category is variable X (VariableX). ######## #### #### #### ### #### ######### #### ## ########## ## ### ###### ## ### ######## ######## ###### ### ######### #### ###### ###### ### ######### ##### #### ### #### ## ######## ## #### #### ### ######## #### #### ### ## ### ## ## ##### ### ######### #### # #### ## ### #### ## ######## ##### ######## ######### #### ####### ###### ##### ### ######## #### ## #### ####### ## ### ######## ####### #### #### ## ##### ### #### #### ######## ###### ##### ######## ### ##### ### ###### ## ############# ###### The corresponding hypothesis can be defined as:

H16:​ Variable X will positively influence future purchase behavior.

2.3 Conceptual model

Based on the literature review, a conceptual model is created that contains the determinants driving purchase behavior. The conceptual model is shown in figure 1.

Figure 1: Conceptual model

To conclude, an overview of all hypotheses is given in appendix A.

3. Research methodology

This chapter describes the methodology used for this research. First, section 3.1 explains the logistic regression. Section 3.2 describes the multiple linear regression. After that, section 3.3 describes the different machine learning methods that are used in this study. In section 3.4 the model specification and estimation are explained, and finally in section 3.5 the method of analysis is described.

3.1 Method I: Logistic regression

To predict whether or not a purchase is made in a session, a binary choice model is used. A binary choice model is preferred because the outcome variable is binary, namely whether a purchase is made (yes or no), so that the error term follows a binomial rather than a normal distribution (equation 1).

[1]

The probit model and the logit model are both frequently used binary choice models. The logit model is considered the best option for this analysis, because it is mathematically convenient and its parameter estimates are easier to interpret than those of the probit model (Leeflang and Bijmolt, 2013). The logit model links the unobserved utilities to the binary purchase choice by means of probabilities. Once the coefficients of the model have been estimated, the utilities can be calculated. Equation 2 shows the utility function.

[2]

Thereafter, equation 3 transforms the utilities into probabilities by applying the logistic CDF. The probability of observing Yi = 1 for customer i, given Xi, is equal to the cumulative distribution function evaluated at the utilities (Leeflang and Bijmolt, 2013).

[3]

This results in an estimate of the latent variable (Yi*). The latent variable indicates the […] relation between the latent variable (Yi*) and the observed choice (Yi) can be specified as:

[4]

To interpret the coefficients of the logit model, the odds are an important criterion. The odds are the probability of Yi = 1 divided by the probability of Yi = 0 (equation 5).

\[ \text{Odds} = \frac{P(Y_i = 1)}{P(Y_i = 0)} \]   [5]

The odds express the likelihood of making a purchase relative to not making a purchase. Odds of four mean that the probability of purchasing is four times as large as the probability of not purchasing. It is common to interpret the odds on a logarithmic scale, as log odds (equation 6).

[6]

A coefficient of the logit model indicates the change in the log odds when the independent variable changes by one unit, so that exp(β) gives the corresponding odds ratio. For example, if the variable is gender (0 = female, 1 = male) with a beta of 1.385, a one-unit increase in the independent variable multiplies the odds by exp(β) ≈ 4. This indicates that the odds of purchasing for males are four times the odds for females (Leeflang and Bijmolt, 2013).
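The chain from utility to probability to odds can be traced numerically with a minimal Python sketch. The intercept value is an arbitrary assumption chosen for illustration and is not an estimate from this study; only the gender coefficient of 1.385 comes from the example in the text.

```python
import math


def logistic_cdf(utility):
    """Logistic CDF: maps a utility to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-utility))


def odds(probability):
    """Probability of Y = 1 divided by the probability of Y = 0."""
    return probability / (1.0 - probability)


beta_0 = -2.0        # assumed intercept, purely illustrative
beta_gender = 1.385  # coefficient from the gender example in the text

p_female = logistic_cdf(beta_0)                # gender = 0
p_male = logistic_cdf(beta_0 + beta_gender)    # gender = 1

# The ratio of the two odds equals exp(beta_gender), which is close to 4.
ratio = odds(p_male) / odds(p_female)
print(round(p_female, 3), round(p_male, 3), round(ratio, 3))
```

Whatever intercept is assumed, the ratio of the two odds stays equal to exp(β), which is why the coefficient can be interpreted without reference to the other terms in the utility function.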

3.2 Method II: Multiple linear regression

In order to predict spending, a multiple linear regression is performed. Regression analysis is a technique that models the relationship between a dependent variable and one or more independent variables. In a regression, the dependent variable is modeled as a function of independent variables, coefficients and an error term. Equation 7 shows the corresponding formula.

\[ y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k + \varepsilon \]   [7]

In the formula, y is the dependent variable and x1, x2, …, xk are the independent variables. β0, β1, β2, …, βk are the coefficients and ε is the error term. The error term is the unobserved random component, which captures the difference between the observed values and the values predicted by the model (Yan and Su, 2009). The coefficients are estimated with the ordinary least squares (OLS) method, which minimizes the sum of squared residuals (Leeflang and Bijmolt, 2013).
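Written out, the least squares criterion takes the following form. The observation index i and the sample size n are notation introduced here for illustration; the other symbols follow the description of the regression formula.

```latex
\hat{\beta} = \arg\min_{\beta_0, \ldots, \beta_k}
  \sum_{i=1}^{n} \Bigl( y_i - \beta_0 - \sum_{j=1}^{k} \beta_j x_{ij} \Bigr)^{2}
```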

After the coefficients of the model have been estimated, four assumptions need to be checked before the quality of the model is evaluated. Violating these assumptions leads to wrong parameter estimates. The four assumptions concern multicollinearity, autocorrelation, heteroskedasticity and normality.

The first assumption concerns multicollinearity, which arises when some mechanism causes independent variables to be correlated. If an independent variable is highly correlated with another variable, there is multicollinearity in the model. Multicollinearity does not affect the predictive power or reliability of the model as a whole, but rather the significance levels and coefficient estimates of individual variables (Field, 2009). The Variance Inflation Factor (VIF) scores can be used to examine whether multicollinearity exists in the data. A VIF score above 5 indicates multicollinearity, and in the literature a VIF score of 10 or higher is often used as a rule of thumb for serious multicollinearity issues in the model.

The second assumption, no autocorrelation, is violated when the residuals show a systematic pattern over time (Leeflang and Bijmolt, 2013). The residuals show first-order autocorrelation when all or part of the covariances between residuals at different points in the data are nonzero. A violation of this assumption results in a wrong estimate of the variance of the effects (Leeflang and Bijmolt, 2013). The test only needs to be done for time series data and is thus not required in this study.

The third assumption, homoskedasticity, is violated when one set of residuals has a different variance than another set of residuals (heteroskedasticity). Violating this assumption could lead to inefficient estimates of the coefficients (Leeflang and Bijmolt, 2013). Levene’s test can be performed to detect heteroskedasticity. There is no concern when the test shows insignificant results.

The last assumption, normality, could be violated due to wrong model specification, which leads to residuals that are not normally distributed (Leeflang and Bijmolt, 2013). A violation can be detected by inspecting a histogram of the residuals and by performing the Shapiro-Wilk, Kolmogorov-Smirnov and Jarque-Bera tests. Bootstrapping can be performed to overcome this problem.
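To illustrate the VIF thresholds, the sketch below computes the VIF for the simplest case of two predictors. In general the VIF of a predictor equals 1 / (1 - R²), where R² comes from regressing that predictor on all other predictors; with only two predictors this R² reduces to the squared Pearson correlation, which is the simplification assumed here. The variable names and values are made up.

```python
def pearson_r(x, y):
    """Pearson correlation between two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def vif_two_predictors(x1, x2):
    """VIF = 1 / (1 - R^2); with two predictors R^2 is the squared correlation."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r ** 2)


# Made-up session values: page views and duration move closely together.
page_views = [3, 8, 5, 12, 7, 10]
duration = [40, 95, 70, 150, 80, 120]
print(round(vif_two_predictors(page_views, duration), 1))
```

With more than two predictors each VIF requires its own auxiliary regression, which statistical packages provide out of the box.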

3.3 Method III: Machine learning methods

The basic concept behind machine learning is a computer program that automatically improves its performance through experience (Kübler, Wieringa and Pauwels, 2016). Machine learning consists of supervised and unsupervised learning. In unsupervised learning the model is not provided with the correct output during training. Therefore, the algorithm cannot learn how to classify or predict unseen data, and the method is mostly used to identify structure and patterns within the data. This study focuses on supervised learning methods, in which both the input and the desired output are included in the training data. The correct outputs are known and used in the model during the learning process (Kübler, Wieringa and Pauwels, 2016). Different machine learning methods are used to gain insight into which method predicts a future purchase in a session most effectively. The data set is split into a training set and an evaluation set. The machine learning methods use these two sets to train, predict and evaluate their performance. The methods used in this study are Logit regression, Nearest Neighbour, Naive Bayes, Support Vector Machines (SVM), Tree, Bagging, Boosting, Random Forest and Neural Network. The performance of these methods is evaluated based on the GINI coefficient of diversity and the Top-Decile Lift (TDL). These performance metrics are discussed in section 3.5.

The different machine learning techniques are briefly discussed here, except for the logit regression, which is already described in section 3.1. The first method is the decision tree. Both practitioners and academics often use decision trees because of their many advantages: they perform well on Big Data, are simple to interpret and are very flexible (Kübler, Wieringa and Pauwels, 2016). The method starts with the complete training set and splits customers into homogeneous sub-groups. The goal is to obtain a classification that is as pure as possible with a minimum number of splits (Blattberg, Kim and Neslin, 2008). At each split a variable is selected that forms the basis for the decision rule that drives the split. The tree stops growing based on a splitting rule. Four often used splitting rules are AID, CHAID, CART and QUEST. The most important problem with this method is overfitting, which means that the model perfectly classifies the training data but is not able to classify new, unseen data correctly (Kübler, Wieringa and Pauwels, 2016). A good way to overcome overfitting is to combine different trees and aggregate across their decision rules, an approach referred to as ensemble methods. The most used ensemble methods are Bagging, Boosting and Random Forest. Bagging is suitable when the data is noisy, for example because of outliers or other issues in the data. Boosting performs well when there is missing data and there are different types of predictors in the data, and Random Forest is suitable when there is high correlation between trees and only a few very strong predictors in the data (Dietterich, 2000). The sixth method, SVM, is also a classification technique that splits observations into homogeneous groups, using a linear approach. Neural Network is a machine learning technique that is based on how the human brain works. This technique consists of a number of interconnected elements that transform inputs to outputs (Kübler, Wieringa and Pauwels, 2016). Nearest Neighbour is a non-parametric method used for classification and regression. To make a prediction for a new observation, the complete training set is searched for the most similar observations (neighbours), and the output variable of those observations is summarized (Weinberger, Blitzer and Saul, 2006). To determine which observations in the training data set are most similar to the new observation, a distance measure is used. The most often used distance measure is the Euclidean distance. The last method is Naive Bayes. This method processes the information in the training data and tries to determine which observations can be grouped together by defining rules that show up in the data (Kübler, Wieringa and Pauwels, 2016). It is a rather powerful model that is simple to apply.
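As an illustration of the Nearest Neighbour idea with the Euclidean distance, the following minimal Python sketch classifies a new session by majority vote among its closest training observations. The function names, the choice of k = 3 and the toy data are assumptions for illustration and do not reflect the settings used in this study.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equally long numeric tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_x, train_y, new_x, k=3):
    """Majority vote among the k training observations closest to new_x."""
    ranked = sorted(zip(train_x, train_y),
                    key=lambda pair: euclidean(pair[0], new_x))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Made-up sessions: (page views, items added to cart) -> purchase (1) or not (0).
train_x = [(2, 0), (3, 0), (4, 1), (10, 2), (12, 3), (9, 2)]
train_y = [0, 0, 0, 1, 1, 1]

print(knn_predict(train_x, train_y, (11, 2)))
print(knn_predict(train_x, train_y, (3, 1)))
```

In practice the predictors would usually be rescaled first, because the Euclidean distance is dominated by variables measured on a large scale.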

3.4 Model specification and estimation

The model for the logistic regression can be described as follows:

Where i = 1, …, n sessions and s = 1, 2, 3, 4 are the segments.

The model for the multiple linear regression can be described as follows:

Where i = 1, …, n sessions and s = 1, 2, 3, 4 are the segments.

3.5 Plan of analysis

The plan of analysis consists of three parts, based on the three methods. The first part relates to the analysis and validation of the logistic regression. First, a descriptive model is estimated which includes all the variables defined in this research. Before the coefficients, odds ratios and marginal effects are interpreted, the model is validated. The most popular metrics to validate the model are the Pseudo R-squares: McFadden R2 and Nagelkerke R2. These R-squares are not the same as in the linear model, because the Pseudo R-squares are based on comparing the log-likelihood of the intercept-only model (LL0) with the log-likelihood of the estimated model (LLk) (Leeflang and Bijmolt, 2013). The next step is to calculate the hit rate (the overall percentage of correctly classified predictions) and to perform the likelihood ratio test to check whether the estimated model outperforms the null model. To visualize this, a cumulative lift curve is drawn and the Top-Decile Lift (TDL) is calculated. The TDL is the fraction of purchasers in the top decile divided by the fraction of purchasers in the whole set (Neslin, Gupta, Kamakura, Lu and Mason, 2006). The higher the TDL, the better the model is able to identify the customers with a high purchase probability. Thereafter, the significance levels of the coefficients are determined with the Wald statistics. After estimating, validating and interpreting the descriptive model, a backward procedure is performed to create the best predictive model. The procedure starts with all variables in the model and excludes those variables that do not significantly contribute to the model. The performance of the different models is compared based on the information criteria AIC and BIC. These information criteria are used because they penalize the log-likelihood (LL) for the number of parameters. The difference between the BIC and the AIC is that the BIC gives a higher penalty for complexity and is preferred for large sample sizes (Leeflang and Bijmolt, 2013). The model with the lowest AIC and BIC is the best predictive model.
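The validation metrics for the logistic regression can be sketched as follows. The formulas are the standard textbook definitions and are stated here as assumptions, since this chapter does not spell them out; the log-likelihood values, number of parameters and sample size are invented for illustration.

```python
import math


def mcfadden_r2(ll_null, ll_model):
    """1 - LL_k / LL_0: estimated model versus intercept-only model."""
    return 1.0 - ll_model / ll_null


def likelihood_ratio_statistic(ll_null, ll_model):
    """-2 * (LL_0 - LL_k), to be compared with a chi-square distribution."""
    return -2.0 * (ll_null - ll_model)


def aic(ll_model, n_params):
    """Akaike information criterion: penalty of 2 per parameter."""
    return -2.0 * ll_model + 2.0 * n_params


def bic(ll_model, n_params, n_obs):
    """Bayesian information criterion: penalty of ln(n) per parameter."""
    return -2.0 * ll_model + n_params * math.log(n_obs)


def hit_rate(actual, predicted_prob, cutoff=0.5):
    """Share of sessions classified correctly at the given cutoff."""
    hits = sum((p >= cutoff) == bool(y) for y, p in zip(actual, predicted_prob))
    return hits / len(actual)


# Invented values, for illustration only.
ll_0, ll_k, k, n = -500.0, -400.0, 10, 1000
print(round(mcfadden_r2(ll_0, ll_k), 3), likelihood_ratio_statistic(ll_0, ll_k))
print(aic(ll_k, k), round(bic(ll_k, k, n), 2))
print(hit_rate([1, 0, 1, 0], [0.9, 0.2, 0.4, 0.6]))
```

Because ln(n) exceeds 2 for any sample larger than seven observations, the BIC penalty is the stricter of the two in a data set of this size.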

Secondly, the data is split into a training set and an evaluation set, after which the different machine learning methods are executed. The TDL, hit rate, calculation time and GINI coefficient are calculated for each method. The TDL and hit rate are described above. The GINI coefficient can be described as the area between the cumulative lift curve and the lift curve of a random prediction (Blattberg et al., 2008), and it indicates how well a technique separates purchasers from non-purchasers. The interpretation is the same as for the TDL: the higher the score, the better the model performs. The last metric is the calculation time that the machine learning method needs to estimate the model. The methods are compared on these four metrics to determine which technique predicts a future purchase in a session most effectively.
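A minimal sketch of the two ranking-based metrics is given below. The TDL follows the definition in the text. For the GINI coefficient the sketch computes the raw area between the cumulative curve of captured purchasers and the diagonal of a random prediction using the trapezoid rule; the exact normalization used in this study is not stated, so this is an assumption. All names and data are hypothetical.

```python
def rank_by_score(actual, scores):
    """Return the actual outcomes ordered from highest to lowest score."""
    pairs = sorted(zip(scores, actual), key=lambda pair: pair[0], reverse=True)
    return [y for _, y in pairs]


def top_decile_lift(actual, scores):
    """Purchase rate in the top 10% of scored sessions over the overall rate."""
    ranked = rank_by_score(actual, scores)
    top_n = max(1, len(ranked) // 10)
    top_rate = sum(ranked[:top_n]) / top_n
    overall_rate = sum(ranked) / len(ranked)
    return top_rate / overall_rate


def gains_area(actual, scores):
    """Area between the cumulative gains curve and the random diagonal."""
    ranked = rank_by_score(actual, scores)
    n, total = len(ranked), sum(ranked)
    area, captured = 0.0, 0
    for y in ranked:
        previous = captured / total
        captured += y
        # Trapezoid between two successive points on the cumulative curve.
        area += (previous + captured / total) / (2 * n)
    return area - 0.5


# Ten made-up sessions with one purchaser, scored by two hypothetical models.
outcomes = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
good_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.0]
bad_scores = list(reversed(good_scores))
print(top_decile_lift(outcomes, good_scores), gains_area(outcomes, good_scores))
print(gains_area(outcomes, bad_scores))
```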

Thirdly, the descriptive model for the multiple linear regression is estimated. To validate the model, three assumptions are tested. The first concerns multicollinearity, which is assessed by calculating the VIF scores. The second assumption, homoscedasticity, holds when the disturbance term has equal variance (Leeflang and Bijmolt, 2013). To test this assumption a Levene’s test is performed. Lastly, to test for normality a number of residual plots are created. In addition, Kolmogorov-Smirnov, Shapiro-Wilk and Jarque-Bera tests are conducted to further investigate this assumption. Next, to give more insight into the model fit, the R-squared and adjusted R-squared are interpreted. The adjusted R-squared is a modified version of the R-squared that takes into account the number of predictors in the model, which makes it suitable for comparing the explanatory power of models. Thereafter, the significance levels and the coefficients are interpreted.
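The adjustment for the number of predictors can be made concrete with the usual textbook formula, shown here as an assumption because the chapter does not state it. The input values are invented.

```python
def adjusted_r2(r2, n_obs, n_predictors):
    """Adjusted R-squared: penalizes R-squared for the number of predictors."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)


# Invented example: R-squared of 0.30 with 30 predictors and 1,000 sessions.
print(round(adjusted_r2(0.30, 1000, 30), 4))
```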

4. Data analysis

This chapter gives an overview of the data cleaning and transformation steps. First, sections 4.1 and 4.2 describe the data set and the variables. After that, in section 4.3 the data set is analyzed and cleaned for oddities, outliers and missing values. In section 4.4 a test is performed to check whether pooling is allowed, and finally in section 4.5 the three assumptions for the multiple linear regression are discussed.

4.1 Data description

As mentioned in chapter one, this study uses data of a European online retailer. In the period from 13 May 2016 up to and including 26 May 2016, the retailer sent a customer journey survey to ### customers, with a net response of ### (### percent). This survey provides insight into the willingness to recommend and the overall satisfaction with the retailer. For all these respondents, ten months (13 May 2016 up to and including 13 March 2017) of clickstream data is collected to train and validate the model. Each session in this study contains information on 30 explanatory variables and two dependent variables. The raw data set contains a total of ### sessions of ### customers. In ###% of the sessions a purchase was made.

4.2 Variables description

In this section the variables and the transformations are described. First the two dependent variables are discussed. For the variable purchase yes or no, a dummy variable is created that takes the value 1 when a purchase has taken place in a session and the value 0 otherwise. The second variable, the amount of money spent in a session, is available in the database and can be included directly.

As described in section 2.2, the explanatory variables that may help in predicting future purchase behavior are divided into five categories. In the category historical purchase behavior all the variables need to be calculated or derived from other variables. PurchaseRecency is the time since the last purchase and is calculated by subtracting the date of the last purchase from the current session date. The variable is expressed in number of days. The variable PurchaseLastVisit is a dummy variable, where the value 1 indicates that a purchase has taken place during the last visit. The variables PurchaseVisit, AvgMonetaryValuePurchase and AvgMonetaryValueVisit are created in a similar way, namely by dividing the total number of purchases or the total amount spent by the total number of visits or purchases. The last variable in this category is MonetaryValueLastVisit, which can be derived from the spending variable.
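The date arithmetic and the ratio variables can be sketched in Python as follows. The helper names and example values are hypothetical, and returning zero when there are no visits is an assumption made only to keep the sketch runnable.

```python
from datetime import date


def days_since(previous, current):
    """Recency in days between an earlier event and the current session."""
    return (current - previous).days


def purchases_per_visit(n_purchases, n_visits):
    """Total number of purchases divided by the total number of visits."""
    # Assumption: a customer without visits gets a ratio of zero.
    return n_purchases / n_visits if n_visits else 0.0


# Made-up customer history.
last_purchase = date(2016, 5, 12)
session_date = date(2016, 5, 14)
print(days_since(last_purchase, session_date))
print(purchases_per_visit(3, 12))
```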

In the category website behavior the variables PageViews, Duration, SearchTerms, AddtoCart and InformationalPages can be extracted from the database and do not need any transformation. The first variable that is calculated in this category is VisitPDP: the total number of PDP visits divided by the total number of pageviews. The second variable is VisitFrequency, which is calculated by adding up all the visits until the last visit. The third variable, VisitRecency, is the time since the last visit and is calculated by subtracting the date of the last visit from the current session date. The difference is expressed in days. Next, the PDPorPOPVisits variable is calculated by dividing the PDP visits by the POP visits. A value below 1 means more POP visits and a value above 1 means more PDP visits. The fifth variable, WebsiteEntry, is transformed into a dummy variable, where the paid channels are indicated by the value 1 and the free channels by the value 0. The last variable in this category is Hurry. Hurry is a dummy variable, and a value of 1 indicates that the duration of the current session is lower than the average duration of a session.
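The derived website variables can be illustrated with a few small Python functions. The function names are hypothetical, and the handling of empty denominators or an empty history is an assumption, since the text does not describe these edge cases.

```python
def visit_pdp(pdp_visits, page_views):
    """VisitPDP: number of PDP visits divided by the number of pageviews."""
    # Assumption: a session without pageviews gets a share of zero.
    return pdp_visits / page_views if page_views else 0.0


def pdp_or_pop(pdp_visits, pop_visits):
    """PDPorPOPVisits: above 1 means more PDP visits, below 1 more POP visits."""
    # Assumption: without POP visits the ratio is treated as infinite.
    return pdp_visits / pop_visits if pop_visits else float("inf")


def hurry(current_duration, past_durations):
    """Hurry dummy: 1 when the current session is shorter than the past average."""
    if not past_durations:
        return 0  # Assumption: no history means no hurry flag.
    average = sum(past_durations) / len(past_durations)
    return 1 if current_duration < average else 0


# Made-up session values (durations in seconds).
print(visit_pdp(5, 20), pdp_or_pop(6, 3), hurry(60, [120, 180]))
```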

In the operational excellence category the returns variables are calculated by adding up the total number of returns per product category until the last visit. The variable DeliverySucces is expressed as the percentage of deliveries that were delivered on time.

The variables NPS, CSAT and RelationshipLength, which indicate the loyalty of the customer, can be included in this research. The NPS and CSAT are available from the survey sent out on 13 May 2016. The variable RelationshipLength is the difference in years between the session date and the date on which the account was created on the website.

The customer characteristics variables are available in the database, but the variables Gender and variable X need to be recoded into dummy variables, where 1 is equal to female or yes, respectively. The full list of the variables and their explanation can also be found in appendix B.

4.3 Oddities, outliers and missing values

To create meaningful insights out of the data, the data set first needs to be cleaned for oddities, outliers and missing values. For this purpose, histograms and descriptive statistics are produced. Table 1 summarizes the descriptive statistics for each variable.

Table 1: ​Descriptive statistics

The first step is to look for oddities in the data. The variables PurchaseRecency and VisitRecency each contain six negative values. These are caused by registration issues with the date format in a session. To handle this oddity, the negative values were set to zero in order to maintain a valid data set.

After checking for oddities, the data set has to be analyzed for outliers. Outliers can be defined as observations with a unique combination of characteristics identifiable as distinctly different from other observations (Hair, Black, Babin and Anderson, 2010). The descriptive statistics in table 1 show possible outliers for the variables AvgMonetaryValuePurchase, AvgMonetaryValueVisit and VisitFrequency, because their maximum values are considerably high. Therefore histograms are made for every variable to examine the data for unusual observations that are far removed from the mass of the data. The histograms show a rather smooth transition from the mass towards the large numbers, so the large numbers are kept in the data set and are not considered outliers.

data when estimating a model. The listwise deletion method excludes an entire observation from the analysis if any single value is missing; it therefore reduces the sample size and could lead to biased estimates (Myers, 2011). The variables with a large number of missing values are VisitRecency, MonetaryValueLastVisit and PurchaseLastVisit, each with 1,008 missing values, and PurchaseRecency, which has as many as 12,836 missing values. The data set treats the first observation of each customer as a first visit and a first purchase. The first observation of a customer therefore always has missing values, because information from before 13 May 2016 is not taken into account. For example, when a customer visits the website on 12 May and on 14 May, only the second visit is included in the data set. The VisitRecency for this observation is therefore missing, while it should have a value of 2. To deal with the missing values, historical data are included. The first step is to collect data from 13 November 2014 until 13 March 2017 for the four variables. The second step is to keep only the sessions between 13 May 2016 and 13 March 2017. This reduces the 1,008 missing values to 6 and the 12,836 missing values to 611. The remaining missing values belong to customers who did not make a purchase via the website in the 18 months before the first date in the data set. To deal with these remaining missing values, a multiple imputation method is used. With this method the missing values are predicted from the data that are available, and the observed value that is closest to this prediction is used as the imputed value. This method is applied in order to make use of all the observations in the data set. The same method is used for the variable DeliverySucces, which has 13,976 missing values. The last variable with missing values is PageViews. In total, 167 observations are missing, probably because of registration errors in the corporate data warehouse, and these observations are therefore excluded from the data set.
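The two-step procedure for the recency variables can be illustrated with the 12 May and 14 May example. The following is a minimal sketch under stated assumptions, not the code used in this study: the function `visit_recency`, the customer identifiers and the session list are hypothetical, and recency is assumed to be the number of days since the previous session of the same customer.

```python
from datetime import date


def visit_recency(sessions, window_start, window_end):
    """Compute days since the previous visit per session.

    sessions: list of (customer_id, visit_date) tuples.
    Recency is computed over all sessions supplied, but only sessions
    inside the analysis window are returned. A value of None means that
    no earlier visit of that customer is known.
    """
    last_seen = {}
    rows = []
    for customer, day in sorted(sessions, key=lambda s: (s[1], s[0])):
        previous = last_seen.get(customer)
        recency = (day - previous).days if previous is not None else None
        last_seen[customer] = day
        if window_start <= day <= window_end:
            rows.append((customer, day, recency))
    return rows


# Hypothetical sessions: customer A visits on 12 May and 14 May 2016.
sessions = [
    ("A", date(2016, 5, 12)),
    ("A", date(2016, 5, 14)),
    ("B", date(2016, 5, 20)),
]
window_start, window_end = date(2016, 5, 13), date(2017, 3, 13)

# Step 1 and 2: compute recency on the full history, then keep the window.
with_history = visit_recency(sessions, window_start, window_end)

# For comparison: filtering first discards the visit on 12 May.
in_window_only = [s for s in sessions if window_start <= s[1] <= window_end]
without_history = visit_recency(in_window_only, window_start, window_end)
```

With the history included, the visit of customer A on 14 May receives a recency of 2 days. When the window filter is applied first, the same observation has no known previous visit and its recency is missing. Customer B has no earlier session in either case, which corresponds to the missing values that remain after the historical data are added.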

4.4 Pooling

The customer base of the online retailer was already segmented on the basis of prior behavioral research in which a factor analysis and a cluster analysis were conducted. The outcome of this research led to the definition of nine segments, ranging from new, engaged and disengaged to churned customers. In this model the nine segments are combined into four segments (segment 1, segment 2, segment 3 and segment 4) to keep the model simple and the distribution per segment sufficient. There are three options to account for the multiple entities when performing a multiple linear regression. The first option is the unit-by-unit method, which means that a model is estimated for each segment separately. The second option is called pooling. Pooling means that the data of all four segments are combined and one model is then estimated. If pooling is allowed, the assumption is that all the independent variables are homogeneous across all segments and have the same effect on purchase behavior. If the variables have the same effect on purchase behavior for the four segments, this would make
