Developing customer segments for an online advertising campaign in order to optimize customer acquisition with advanced unsupervised learning: A case study

(Final version)

L. de Boer

Faculty of Economics & Business Department of Marketing Master thesis Vredewoldlaan 46 9727 DG Groningen 0653167213 l.de.boer.13@student.rug.nl S2243490

First supervisor: Dr. H. Risselada Second supervisor: Dr. A. Bhattacharya

MANAGEMENT SUMMARY

The online shopping environment changes quickly. In the era of big data, it is important for online retailers to keep up with the competitive online environment and to make efficient use of Web-based information systems to capture online behavior. An efficient marketing strategy is the development of a targeted advertising campaign. It is crucial for firms to target the right audience with the right message. Customer segmentation is an essential marketing tool that can offer firms insights for a better focus on the right audience. Most research focuses on the prediction of future purchase behavior or the targeting of current customers. The most used and effective tool to segment customers is cluster analysis. In marketing, it is common practice to use a standard cluster algorithm such as K-means. There is a gap in investigating earlier stages of the purchase funnel to reach new customers instead of current customers. Furthermore, marketers rely too often on standard clustering algorithms.

Customer purchase behavior is important, but equally important is the reach of new visitors, which is called customer acquisition. The aim of this study is to find out whether developing customer segments on the foundation of so-called ‘shopping intent data’ as well as socio-demographic variables will not only provide more optimal customer segments, but also whether these customer segments can be used as input for a targeted ‘lookalike’ advertising campaign for customer acquisition. Furthermore, this study investigates whether it is useful to use unsupervised learning methods other than a standard cluster analysis. This research builds on several key papers such as Van Doorn et al. (2010), who suggest in their conceptual paper that customer engagement behavior can serve as a useful framework for segmenting customers. Van den Poel and Buckinx (2005) investigate clickstream behavior and its influence on future purchase behavior. They provide information on search/browsing behavior and online consumer decisions. Furthermore, this study builds on the study of Vidden, Vriens, and Chen (2016), who compare different ensemble methods, and on Dolnicar and Leisch (2003), who are among the few authors to apply cluster ensemble methods in a marketing research area.

This study uses data provided by a small e-commerce website and additional socio-demographic data of households in the Netherlands. Suitable variables for customer segmentation were analyzed, including their possible relation with advertising effectiveness. Using the bagged clustering approach, a segmentation model was built. The segmentation model is based on the foundation of so-called shopping intent data (browsing variables, customer engagement variables and transaction data) and socio-demographic variables. With the use of lookalike modeling, this segmentation model served as input for a targeted lookalike advertising campaign and was compared to the current generic advertising campaign of the e-commerce website. Advertising effectiveness for customer acquisition was measured with several online metrics.

The results show that developing customer segments on the foundation of online shopping behavior does not improve advertising effectiveness for customer acquisition in a lookalike advertising campaign compared to a generic advertising campaign. The conclusions are based on the results of the lookalike ad campaign of this study and the results of the generic ad campaign of the e-commerce website. However, bagged clustering analysis seems to provide stable customer segments.

For management, this research shows that one should consider whether it is useful to invest time and money in developing customer segments on the foundation of shopping intent data. Based on the results of this study, doing so will not provide more advertising effectiveness than a generic advertising campaign. However, customer segmentation is a powerful marketing tool which can provide stable segments with the use of more advanced unsupervised learning methods such as bagged clustering.

ACKNOWLEDGEMENTS

This thesis represents the last part in finalizing the master programme in Marketing Management & Marketing Intelligence. The last months were challenging, and my analytical skills were tested more than ever. However, I am convinced I have learned a lot in a relatively short amount of time. I would like to express my acknowledgements to everyone who supported me during the journey towards graduating.

First of all, I want to thank Human Data Associates for giving me the opportunity to follow a thesis internship and for letting me dive into the ‘real world’ of analyzing big data. I would also like to thank my first supervisor Dr. Hans Risselada for his guidance and useful feedback throughout the thesis period. I would also like to thank my friends and family for supporting me during the thesis project. Lastly, I want to mention my friend Jasmijn Staal separately for her support. We worked on our theses almost every day at Zernike or the university library and supported each other whenever one of us struggled with the thesis.

1. INTRODUCTION

Online shopping has increased over the last decade. In the preceding year, online expenditures in the Netherlands rose by 23% (GfK, 2017). Compared to traditional retailing, the e-commerce environment enables consumers to search for information and purchase products or services through direct interaction with the online store (Park & Kim, 2003). In response to the thriving development of e-commerce, many online retailers have developed Web-based information systems to handle enormous amounts of transactions on the Internet. These systems can automatically capture data on the browsing histories and purchase records of individual customers. In this era of big data, it is important for firms to deal with the overload of data that is available (Verhoef, Kooge, & Walk, 2016). To deal with information overload, firms need to make efficient use of the data that is available to them. Handling the available data in a smart way leads to an effective marketing strategy, which is key for firms (Bergemann & Bonatti, 2011). One of these marketing strategies is developing an advertising campaign for the right audience.

Marketing strategy, in general, is supposedly the result of a firm’s segmentation, targeting and positioning (STP) choices (Kotler, 1994; Toften & Hammervoll, 2013). Segmenting groups customers with similar needs and buying behaviors into segments using one or more variables. Targeting involves making resource allocation decisions that determine the prioritization of the segments. Lastly, positioning entails the development of marketing programs that are appropriate for the targeted segments (Venter, Wright, & Dibb, 2015). There are widespread benefits associated with adopting a customer segmentation approach. Segmentation mostly leads to a detailed customer analysis. Such analysis allows the firm to become more in tune with customer behavior (Dibb & Simkin, 2001). In an online environment, e-commerce websites try to reach the desired customer audience by the use of targeted advertising. Targeted advertising is a positioning strategy of the STP process. The targeting of advertising implies that firms can design media vehicles to target advertising messages to specific segments in the market (Iyer, Soberman, & Villas-Boas, 2005).

see advertising as clutter. Customer segmentation is an essential marketing tool that can offer firms insights for a better focus on the right audience (Vidden, Vriens, & Chen, 2016). In the era of online e-commerce, it becomes more important to use clickstream behavior and past purchase behavior for customer analysis (Wu & Chou, 2009). However, most research focuses on the prediction of future purchase behavior or the targeting of current customers (Van den Poel & Buckinx, 2005; Montgomery, Srinivasan, & Liechty, 2004). Less research has been done on customer acquisition. The focus of customer acquisition is to reach prospects.

The most used and effective tool to segment customers is cluster analysis. Cluster analysis is an unsupervised learning method in which a class of techniques is used to classify objects or cases into relatively homogeneous groups called clusters. Objects in each cluster tend to be similar to each other and dissimilar to objects in the other clusters (Malhotra, 2010). It is common practice to use a single segmentation base only and run a single calculation of a single algorithm (Dolnicar & Leisch, 2003). Although many other (advanced) cluster methods are available, it is less common to apply these methods in marketing.

‘To what extent can customer segments be developed and used in an advertising campaign in order to increase advertising effectiveness for optimizing customer acquisition?’

Sub question 1: Do lookalike audiences that are created on the foundation of shopping intent data improve the effectiveness of advertising for customer acquisition compared to a generic audience?

Sub question 2: In which way do cluster ensemble methods improve the quality and stability of the cluster solution rather than a standard cluster analysis?

The contribution of this research is twofold. First, it contributes to the literature on strategic STP marketing and Customer Relationship Marketing (CRM). Second, it expands our knowledge of ensemble cluster analysis, since ensemble cluster methods have gained popularity in the pattern recognition literature but are still quite rare in the marketing research area. For business purposes, implementing and developing this approach can allow the company to build a sustainable competitive advantage.

This study consists of the following parts. First of all, a segmentation model is built that classifies different visitors of a small e-commerce website (henceforth website X) into segments, based on shopping intent data and socio-demographic variables, which can be used for targeting at the customer acquisition level. For segmentation, K-means clustering, Bagged Clustering with K-means (BC-K) and Bagged Clustering with C-K-means (BC-C) are compared to each other using simulated data sets. The methods are evaluated on their ability to find the true cluster solution. A direct comparison of BC-K and BC-C is not yet available in the marketing literature. For targeting, we carried out a lookalike advertising campaign to assess the effectiveness of online advertising with the objective of attracting prospects for the e-commerce website (customer acquisition).
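To make the idea behind BC-K concrete, the following is a minimal sketch of bagged clustering with K-means as the base method: K-means is run on several bootstrap samples, the resulting centers are pooled and merged hierarchically, and every observation receives the label of its nearest center. It is an illustration only and not the implementation used in this thesis. The function names, the parameters `base_k`, `final_k` and `n_boot`, and the choice of average linkage in the second stage are assumptions.

```python
import random
from math import dist


def kmeans(points, k, rng, iters=50):
    """Plain Lloyd's K-means on a list of coordinate tuples; returns k centers."""
    centers = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: dist(p, centers[j]))
            groups[nearest].append(p)
        # An empty group keeps its previous center.
        new_centers = [
            tuple(sum(c) / len(g) for c in zip(*g)) if g else centers[j]
            for j, g in enumerate(groups)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers


def merge_centers(centers, k):
    """Average-linkage agglomerative merging of the pooled centers into k groups."""
    groups = [[c] for c in centers]

    def linkage(a, b):
        return sum(dist(p, q) for p in a for q in b) / (len(a) * len(b))

    while len(groups) > k:
        pairs = ((i, j) for i in range(len(groups)) for j in range(i + 1, len(groups)))
        i, j = min(pairs, key=lambda ij: linkage(groups[ij[0]], groups[ij[1]]))
        merged = groups.pop(j)  # j > i, so index i is unaffected
        groups[i].extend(merged)
    return groups


def bagged_kmeans(points, base_k, final_k, n_boot=10, seed=0):
    """Return one cluster label per point (assumed sketch of bagged clustering)."""
    rng = random.Random(seed)
    pooled = []
    for _ in range(n_boot):
        boot = [rng.choice(points) for _ in points]  # bootstrap sample
        pooled.extend(kmeans(boot, base_k, rng))
    groups = merge_centers(pooled, final_k)
    labelled = [(c, g) for g, members in enumerate(groups) for c in members]
    # Each observation inherits the group of its nearest pooled center.
    return [min(labelled, key=lambda cg: dist(p, cg[0]))[1] for p in points]
```

Because the final partition is built from the centers of many bootstrap runs, a single unlucky K-means start has less influence on the result, which is the stability argument usually made for this family of methods.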

lookalikes are created (Facebook, 2017). For each segment (N segments) of the segmentation model, a lookalike audience is created and targeted with a different advertising message that suits the audience. So, for each segment, an adapted advertising message is created. Currently, website X advertises with a generic ad campaign. The settings for the different ad campaigns are discussed in section 4.3. Advertising effectiveness is measured by comparing the two advertising campaigns on several online metrics.

For this study, clickstream behavior and customer data (online shopping intent data) of website X were provided. Furthermore, socio-demographic data of households in the Netherlands was available. Therefore, the segmentation is built on the current customers of website X. The segmentation model will be used to target non-visitors, i.e., lookalike audiences. For each lookalike audience, a suitable advertising message is created. In this way, one can assess whether lookalike audiences will respond to the advertisements.

Structure of the report

2. LITERATURE REVIEW

This chapter provides an overview of existing literature on advertising effectiveness in an online environment and on variables based on browsing behavior, customer engagement behavior and transaction data that could be useful for clustering. Furthermore, socio-demographic variables are discussed. First, a framework is presented to visualize and summarize the research. Second, advertising effectiveness for customer acquisition and lookalike audiences are discussed. Third, the variables that are chosen as cluster input and their possible influence on advertising effectiveness are discussed, as well as socio-demographic variables. Finally, the essence of cluster ensemble methods is introduced, and their potential effect on model performance is discussed.

2.1 Research Framework

This theoretical framework visualizes the research. On the left side, the variables that are used for the segmentation model are presented. These variables are extracted from the data that was available from website X. This gives information about the current customers of the website. The segmentation model is based on the foundation of these variables [Q1].

The next part of the research looks at the effect of cluster ensemble methods on model performance [Q2]. Two cluster ensemble methods and standard K-means cluster analysis will be tested on three different simulated data sets. The methods are compared on their hit rate, which gives an objective way to gauge how the various approaches perform relative to each other. The best performing cluster method is applied to the data of website X. The resulting segments will be targeted in an experimental lookalike ad campaign on the social media platform Facebook. With an email address, an individual customer can be identified on Facebook. With the use of lookalike modeling, lookalike audiences are targeted. Each lookalike audience is targeted with a different advertising message that is adapted to the segment.
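As an illustration of how a hit rate can be computed on simulated data, the sketch below assumes that the hit rate is the share of observations assigned to their true cluster after the arbitrary cluster labels have been matched as favorably as possible. This definition and the function name `hit_rate` are assumptions made for the example; the exact definition used in the simulation study may differ.

```python
from itertools import permutations


def hit_rate(true_labels, predicted_labels):
    """Share of observations in the correct cluster under the best label matching."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    true_ids = sorted(set(true_labels))
    pred_ids = sorted(set(predicted_labels))
    if len(pred_ids) > len(true_ids):
        raise ValueError("more predicted clusters than true clusters")
    best_hits = 0
    # Cluster numbers are arbitrary, so try every mapping of predicted to true ids.
    for perm in permutations(true_ids, len(pred_ids)):
        mapping = dict(zip(pred_ids, perm))
        hits = sum(mapping[p] == t for t, p in zip(true_labels, predicted_labels))
        best_hits = max(best_hits, hits)
    return best_hits / len(true_labels)
```

Trying all permutations is only practical for a small number of clusters, which is the case in a segmentation setting with a handful of segments.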

2.2 Advertising effectiveness for optimizing customer acquisition

The topic of targeted advertising and its effect on business performance has been extensively studied during the last few decades. In the literature, many types of advertising are investigated. The focus of this study is on what we call user-targeted advertising for customer acquisition. In user-targeted advertising, advertisers target specific users based on properties such as demographics or interests. These properties may be explicitly specified by the user, or they may be inferred from the user’s behavior (Liu et al., 2016). Acquired customers can be described as valuable prospects, those that can potentially turn into future customers (Shen, Geyik, & Dasdan, 2015). The objective of this study is to see whether the experimental lookalike ad campaign (see section 2.1) is effective for customer acquisition and, more specifically, whether the resulting lookalike ad campaign works better than the current generic ad campaign of website X. In this study, advertising effectiveness for customer acquisition is measured with several online metrics, which are explained in section 4.3.

New prospects can be reached by enlarging the target audience to include similar, like-minded users, referred to as lookalikes (Liu et al., 2016). Before we move towards lookalike audiences, it is important to look at previous research on advertising effectiveness in an online environment. Therefore, the customer acquisition process is described first. After that, lookalike audiences are discussed.

2.2.1 What is customer acquisition?

The marketing ‘sales funnel’ offers a way to describe the customer acquisition process, which can be divided into different stages. These divisions vary from study to study, as do the definitions used to characterize each part (Zhang, Kim, & Srivastava, 2016).

affects visitation to the company's website for users in most stages of the purchase funnel, but not for those who previously visited the site without creating an account.

To conclude, one can say that the concept of the sales funnel is important in explaining the link between online advertising and a firm’s outcome of interest. However, before we can discuss the variables for customer segmentation, it is important to know what is meant by lookalike audiences.

2.2.2 What are lookalike audiences?

Several studies have paid some attention to lookalike modeling and forms of lookalike modeling in an advertising context. Liu et al. (2016) describe lookalike modeling as identifying users similar to a given user set. Zhang, Chen, and Wang (2016) refer to lookalike modeling as a technology with which advertisers can detect users with interests similar to those of known customers and then deliver relevant ads to them. Kim et al. (2005) make use of this profiling method in a business-to-consumer (B2C) environment. They build a model on their current customers and use that model on potential prospects to rank them from most to least likely to respond. According to Shen, Geyik, and Dasdan (2015), the use of lookalike modeling provides a higher targeting accuracy and thus brings more potential customers to the advertisers. Zhang, Chen, and Wang (2016) describe lookalike modeling as the audience extension problem because advertisers find new customers by extending the desirable criteria for their starting point, which is their existing audience or customers. The technology of lookalike modeling seems to be gaining interest in the online environment, and some even say it is essential in online advertising (Zhang, Chen, & Wang, 2016). The input for a lookalike audience on Facebook is the email address or phone number of a current customer. Based on gender, age, interest and living area, lookalikes are created (Facebook, 2017). The settings for the lookalike audiences in this study are discussed in section 4.3.

2.3 Customer segmentation

segmentation as an issue where segments objectively exist and that it is mainly the task of the marketer to unveil them (Venter, Wright, & Dibb, 2014). In market segmentation, it is important to what extent the available independent segmentation variables are associated with the dependent criterion of interest, which is usually some aspect of behavior (Pesonen, 2013). In this study, the aspect of behavior is advertising effectiveness for customer acquisition. In the following section, we take a look at previous literature on online engagement and how it can affect advertising effectiveness.

The segmentation variables that are used are categorized in the following way: browsing variables, customer engagement behavior (online engagement), transaction data and socio-demographics. The overarching term for these sources is shopping intent data. Pesonen (2013) notes that segmentation conducted on data such as clickstream data from a website provides efficient data for customer segmentation because it measures actual customer behavior instead of opinions. Besides that, the assumption is that the lookalike audience behaves the same way online as the current customers (section 2.2.2). Therefore, these variables could be interesting for segmenting the current customers. Furthermore, socio-demographic variables, which are included in the segmentation model, are discussed as well.

2.3.1 Online engagement

2.3.1.1. Online browsing variables

of goal directed search toward a particular task will likely reduce the impact of peripheral materials like advertising. Exploratory behavior results in more browsing of media content, which can include advertising stimuli. These two types of online experiences are also often described in the literature as utilitarian shopping motivations and hedonic shopping motivations. Utilitarian motivations relate to the functionality of shopping, while hedonic motivations are defined as consumers' enjoyment of the shopping experience itself (Babin et al., 1994; Hirschman & Holbrook, 1982; Anderson et al., 2014). Calder, Malthouse, and Schaedel (2009) distinguish between online utilitarian experiences and online intrinsic enjoyment experiences to understand consumer experience with media. Utilitarian experience is described as a user's belief that the site provides information that helps them make decisions. Online intrinsic enjoyment experience is described as a type of entertainment.

According to Moe (2003), exploratory browsing is driven by exploring and encountering new stimuli during visits. Therefore, hedonic browsing sessions should lead to more page variety than goal-directed browsing behavior. With more page variety, the number of page views and time on site will probably rise as well. Rohm and Swaminathan (2004) mention the importance of targeting the variety seeker. Kahn (1995, p.139) defines variety seeking as “the tendency of individuals to seek diversity in their choices of services or goods” (Rohm & Swaminathan, 2004). Furthermore, there are different ways in which users can enter a website, referred to as the medium. One can divide the medium source into organic, CPC and referral. Visitors who are considered organic find a website after using ‘organic search results’ (Evans, 2008), and are therefore not referred by another website. According to Park and Chung (2009), a visitor who directly enters a website is linked with a goal-directed search motivation, while consumers who enter via a referring website are associated with an exploratory search motive.

variables are included: time on page, page variety, session counts, medium and device category (see figure 2.1).

The above discussion indicates that people who show exploratory browsing behavior enjoy being on the internet. Therefore, they are more likely to respond to advertisements. Moe et al. (2003) conclude that users who are more focused on browsing are more responsive to advertising.

2.3.1.2 Customer engagement behavior

Customer engagement (CE) is a well-known concept in literature. However, there are many different meanings of CE, and it seems that the definition of ‘engagement’ can be quite confusing (Brodie et al., 2011; Calder, Malthouse, & Schaedel, 2009; Van Doorn et al., 2010). Most previous studies about ‘engagement’ in an advertising context refer to the technology acceptance model (TAM), developed to understand the adoption of new technology. This theory explains the adoption of interactive shopping behavior as a form of technology assisted shopping (Childers et al., 2001; Davis, 1989).

Online engagement can refer to email behavior. Ting (2012) investigated email newsletters as a customer relationship marketing tool. Interaction was measured through open, click and response rates, and the study found that newsletters are useful for retention. MacPherson (2001) mentions that the decision to open an email from the firm, read the contents, and respond to the message depends on the receiver’s perception of the sender. Obtaining permission from consumers before sending email ensures that emailing improves the relationship with the client and therefore shows engagement with the firm. Email newsletters, eBook interest and whether a customer clicked in an email received from the website are therefore important measurements that show engagement.

A direct link between the types of customer engagement behavior (CEB) mentioned by Van Doorn et al. (2010) and advertising effectiveness has not been established in previous studies. However, Van Doorn et al. (2010) suggest that CEBs can serve as a framework for segmenting customers based on the types of engagement behavior they show. Furthermore, they suggest that CEB must be considered in the context of advertising. Calder, Malthouse, and Schaedel (2009) do investigate the relationship between engagement and advertising effectiveness. They elaborate on two types of online engagement, Personal and Social-Interactive engagement, which affect advertising. They state that the interactive component of a user’s experience with the website is shown to affect advertising. They measure user experience and engagement using a survey. According to their study, higher experience and engagement levels are associated with more ad effectiveness, and they conclude that users who are highly engaged with a media vehicle are more responsive to advertising.

The above discussion indicates that, apart from Calder, Malthouse, and Schaedel (2009), the direct relationship between online engagement behaviors and advertising effectiveness has not been researched extensively. However, some studies, such as Van Doorn et al. (2010), discuss the influence of engagement behavior on firm performance. Therefore, it is suggested that users who show more customer engagement behavior will probably be more responsive to advertising as well. The following customer engagement behavior variables are used in this study: newsletter subscribers, email amount, eBook downloaders, email click, blog readers, and brands (see figure 2.1).

2.3.2 Historical transaction data

frequency (number of past purchases), and monetary value (average purchase amount per transaction) (Fader, Hardie, & Lee, 2005). Sarvari, Ustundag, and Takci (2016) emphasize the importance of RFM and recommend a combination of RFM and demographic attributes for clustering to capture more accurate customer segments. Van den Poel and Buckinx (2005) investigated the contribution of different types of predictors to the purchasing behavior at an online store. They take variables from four different categories into account in predicting online purchasing behavior, including historical online purchasing behavior. According to their results, historical purchasing behavior is important when predicting online purchasing behavior. According to Bucklin and Sismeiro (2003), users performing a more exploratory task (see section 2.3.1.1) are expected to have a higher likelihood of making a transaction. Furthermore, they are expected not only to make a purchase but also to have a higher transaction amount. Cao and Zhang (2015) investigated factors of e-satisfaction in an e-commerce environment and mention that logistics, such as the speed of a payment method, can affect e-satisfaction.
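The RFM measures described above can be derived from a simple order table. The sketch below is an illustration under stated assumptions: recency is taken as the number of days since the last transaction, the input is a hypothetical list of `(customer_id, order_date, amount)` tuples, and the field names in the result are invented for the example rather than taken from the data of website X.

```python
from collections import defaultdict
from datetime import date


def rfm(orders, today):
    """Compute recency, frequency and monetary value per customer.

    orders: iterable of (customer_id, order_date, amount) tuples (hypothetical layout).
    today:  reference date used to measure recency.
    """
    by_customer = defaultdict(list)
    for customer_id, order_date, amount in orders:
        by_customer[customer_id].append((order_date, amount))

    result = {}
    for customer_id, rows in by_customer.items():
        last_order = max(order_date for order_date, _ in rows)
        result[customer_id] = {
            # Assumption: recency = days since the last transaction.
            "recency_days": (today - last_order).days,
            # Frequency = number of past purchases.
            "frequency": len(rows),
            # Monetary value = average purchase amount per transaction.
            "monetary": sum(amount for _, amount in rows) / len(rows),
        }
    return result
```

For example, a customer with two orders of 30.00 and 50.00, the latest placed seven days before the reference date, would get a recency of 7, a frequency of 2 and a monetary value of 40.00.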

In conclusion, historical purchase behaviors have already been found to be predictive of online purchase conversion (Van den Poel & Buckinx, 2005). Based on this notion, historical purchase variables are expected not only to provide accurate customer segments but also to be important in evaluating advertising effectiveness.

The variables that are included in this study are payment method, discount, amount of orders, transaction in the first session, average item revenue, days since last transaction and amount of sessions after last transaction (see figure 2.1).

2.3.3 Socio-demographic variables

The following variables are included: gender, life phase, education, income, social class, and region (see figure 2.1).

Overall, the literature seems to suggest that the previously discussed variables are useful for customer segmentation. The variables are expected not only to provide more optimal and accurate customer segments; developing a lookalike advertising campaign based on these variables should improve advertising effectiveness as well (see figure 2.1). In summary, exploratory browsing behavior seems to positively affect advertising effectiveness for customer acquisition. Furthermore, it is suggested that users who show customer engagement behavior are associated with advertising effectiveness as well. Historical purchase behavior has already been found to be predictive of online purchase conversion (Van den Poel & Buckinx, 2005). Based on this notion, historical purchase variables are taken into account in evaluating advertising effectiveness. This leads to the following sub question:

Sub question 1: ‘Do lookalike audiences that are created on the foundation of shopping intent data improve the effectiveness of advertising for customer acquisition compared to a generic audience?’

2.4 Cluster ensembles

Cluster ensembles have not yet been extensively discussed in the marketing literature. Some ensemble cluster methods have been studied and applied in the tourism market. D’Urso et al. (2016) use ‘Bagged fuzzy clustering’ for profiling Chinese travelers in Western Europe. In the study of Dolnicar and Leisch (2003), winter tourist segments in Austria are identified by using so-called bagged clustering. Both studies mention the successful application of ensemble clustering to enhance the performance of unstable or weak clustering algorithms. Other studies that elaborate on cluster ensemble methods can be found in the pattern recognition literature. Vidden, Vriens, and Chen (2016) compare K-means with latent class analysis and ensemble analysis. In their study, latent class analysis performs better than ensemble methods and K-means. However, they mention that ensemble methods can potentially have a big advantage when dealing with different segmentation bases, e.g., needs, behaviors, and lifestyle, and that further research is needed. Therefore, the following sub question is formulated.

3. DATA

In this section, the data that is used to answer the research question is discussed. First, the data from website X is discussed. A description of the data that was available is provided, and the process of data aggregation is explained. Second, correlation is discussed. Then, the process of data cleaning is discussed. After that, the descriptive statistics of the data set are described to get a broader view of the customer base. Lastly, some assumptions regarding the data are explained.

3.1 Data collection

The data that was available consists of 6 different datasets, on different aggregation levels. Overall, it contains data on customers, orders, order lines, email and clickstream data of website X. Furthermore, two datasets were provided which contain information about households in the Netherlands.

Dataset | Description
Clickstream dataset | Website behavior of website X, consisting of 953,115 website sessions from 152,193 unique cookies. One cookie ID contains the information of the multiple sessions of a user. The dataset covers the period from 2017-04-25 to 2017-11-27. The structure of this dataset can be found in Appendix A.
Email data | Email interaction (214,130 records) between website X and its customers over the period from 2017-10-17 to 2017-11-27. Aggregated on individual email addresses; one email address equals one individual customer.
Customer data | Information on the current customer base of website X. 6,521 unique customers are identified over the period from 2014-11-07 to 2017-11-27. Aggregated on individual email addresses; one email address equals one individual customer.
Order data | Information on 14,543 orders between 2014-11-20 and 2017-11-27. Aggregated on order level.
Order lines data | Order lines of 14,543 orders between 2014-11-20 and 2017-11-27. Aggregated on order level.
Bisnode geo household dataset | Socio-demographic information on 7,686,755 households in the Netherlands. Aggregated on zip codes.
Bisnode GEOMA dataset | Socio-economic and demographic information on 453,456 households in the Netherlands, provided by ‘Centraal Bureau voor de Statistiek’ (CBS) and MarktOnderzoeksAssociatie (MOA). Aggregated on zip codes.

Table 3.1: Datasets

3.2 Data aggregation

3.2.1 Correlation

According to Sarstedt and Mooi (2014), there should be low levels of collinearity among the variables when carrying out a cluster analysis. To see whether the segmentation variables influence each other, a Pearson correlation analysis was conducted between the constructs. The dataset contains variables of different types. The phi coefficient is the usual measure of association between two binary variables. However, for binary variables it is computed and interpreted in the same way as the Pearson correlation coefficient (Henrysson & Wedman, 1974). Therefore, only a Pearson correlation analysis is conducted. Evans (1996) suggests the following guide for the absolute values of the Pearson correlation statistic: .00 - .19 ‘very weak’, .20 - .39 ‘weak’, .40 - .59 ‘moderate’, .60 - .79 ‘strong’ and .80 - 1.0 ‘very strong’.
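As an illustration of this step, the sketch below computes the Pearson correlation and the phi coefficient for two binary variables and attaches the label from the guide of Evans (1996). It is a minimal Python sketch, not the code used in the study; the function names and the two toy indicator vectors are assumptions made for the example.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def phi(x, y):
    """Phi coefficient computed from the 2x2 table of two 0/1 variables."""
    n11 = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)
    n10 = sum(1 for a, b in zip(x, y) if a == 1 and b == 0)
    n01 = sum(1 for a, b in zip(x, y) if a == 0 and b == 1)
    n00 = sum(1 for a, b in zip(x, y) if a == 0 and b == 0)
    denominator = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    return (n11 * n00 - n10 * n01) / denominator


def evans_label(r):
    """Verbal label for |r| following the bands of Evans (1996)."""
    strength = abs(r)
    if strength < 0.20:
        return "very weak"
    if strength < 0.40:
        return "weak"
    if strength < 0.60:
        return "moderate"
    if strength < 0.80:
        return "strong"
    return "very strong"


# Hypothetical 0/1 indicators for eight customers (not data from website X).
newsletter = [1, 1, 0, 0, 1, 0, 1, 0]
email_click = [1, 0, 0, 0, 1, 0, 1, 1]

r = pearson(newsletter, email_click)
print(r, phi(newsletter, email_click), evans_label(r))
```

On binary input both functions return the same value, which is why a single Pearson analysis suffices for the mixed variable types.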

After running the Pearson correlation analysis, two variables showed very high correlations with other variables: ‘sum page views’ (the sum of all page views of a single customer) and ‘sum item revenue’ (the total amount spent by a single customer). ‘Sum page views’ had a very strong correlation with session count (r = 0,803, p < .001) and with unique page views (r = 0,982, p < .001). Furthermore, ‘sum item revenue’ had a very strong correlation with amount of orders (r = 0,828, p < .001). Therefore, ‘sum page views’ was removed from the dataset, and ‘sum item revenue’ was transformed into ‘average item revenue’ (the average spending of a single customer, see table 3.2).

the ‘desktop’ variable. The negative correlation between tablet and desktop indicates that when tablet use increases, desktop use decreases, and vice versa.

Furthermore, there were some correlations among the email variables. ‘Email amount’ and ‘newsletter subscriber’ have a strong correlation (r = 0,717, p < .001): a newsletter subscriber probably receives more emails from the website. ‘Email click’ has a weak correlation with ‘newsletter subscriber’ (r = 0,226, p < .001), which can mean that users who click in an email are more often newsletter subscribers. ‘Email click’ has a moderate correlation with ‘email amount’ (r = 0,401, p < .001): as the number of emails increases, ‘email click’ increases as well.

3.2.2 Missing values and outliers

After merging the different datasets into one final dataset, missing values were checked. The variable ‘gender’ contained 2,009 NA’s in total. To deal with these missing values, a list of first names was provided, and these names were matched using the ‘Nederlandse voornamenbank’ of the Meertens Instituut (Meertens Instituut, 2017). After matching, 223 NA’s remained for gender, since some names were not recognized in the database or only the initials were known. For these cases, predictive mean matching was applied using MICE in R. This is a form of nearest neighbor hot deck imputation: it predicts the missing data point from the data that are available, and the observed value whose prediction is closest to this prediction is used as the imputed value. Next, there were some missing values in the household variables: ‘life phase’ was missing for 1,2%, ‘education’ for 0,2% and ‘social class’ for 0,9% of the customers. Here, predictive mean matching was used as well. Also, the shipping extension was not available for 635 customers. For these customers, the zip code and house number were known, but not the house number extension. For this study, the average household information of the available zip code of these customers was used.
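The idea behind predictive mean matching can be sketched as follows. This is a simplified Python illustration with a single predictor and the single closest donor; the MICE implementation in R used in this study additionally draws the regression parameters at random and samples from several candidate donors. The function names and the toy data are assumptions made for the example.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one predictor: returns (intercept, slope)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def pmm_impute(xs, ys):
    """Fill None entries in ys with the observed value of the closest donor."""
    observed = [(x, y) for x, y in zip(xs, ys) if y is not None]
    intercept, slope = fit_line([x for x, _ in observed], [y for _, y in observed])
    # Each donor is stored as (predicted value, observed value).
    donors = [(intercept + slope * x, y) for x, y in observed]
    completed = []
    for x, y in zip(xs, ys):
        if y is None:
            prediction = intercept + slope * x
            # The donor whose prediction is closest lends its observed value.
            _, y = min(donors, key=lambda donor: abs(donor[0] - prediction))
        completed.append(y)
    return completed


# Hypothetical data: a fully observed predictor and an outcome with one gap.
predictor = [1.0, 2.0, 3.5, 4.0, 10.0]
outcome = [0, 0, None, 1, 1]
print(pmm_impute(predictor, outcome))
```

Because the imputed value is always copied from an observed record, the method never produces values outside the range of the observed data, which makes it suitable for coded variables such as gender or life phase.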

values were so high that they could bias the picture of the customer base. Therefore, they were removed. After removal, 2,169 customers were left in the dataset.

3.2.3 Variable description

The following variables are selected and described in the table below (see section 2.3 for variable selection for customer segmentation).

| Variable | Operationalization | Level | Frequency (N = 2169) | Relative frequency (%) |
| --- | --- | --- | --- | --- |
| Email hash | Unique email address which indicates a single customer | - | 2169 | 100% |
| Organic | Whether a single customer came to the website via organic search at least once, over all sessions | 0 = No | 1478 | 68,1% |
| | | 1 = Yes | 691 | 31,9% |
| Referral | Whether a single customer came to the website via referral at least once, over all sessions | 0 = No | 2073 | 95,6% |
| | | 1 = Yes | 96 | 4,4% |
| Desktop | Whether a single customer visited the website via desktop. ‘Yes’ means that, over all sessions, the customer came to the website via desktop at least once | 0 = No | 776 | 35,8% |
| | | 1 = Yes | 1393 | 64,2% |
| Mobile | Whether a single customer visited the website via smartphone. ‘Yes’ means that, over all sessions, the customer came to the website via smartphone at least once | 0 = No | 1638 | 75,5% |
| | | 1 = Yes | 531 | 24,5% |
| Tablet | Whether a single customer visited the website via tablet. ‘Yes’ means that, over all sessions, the customer came to the website via tablet at least once | 0 = No | 1921 | 88,6% |
| | | 1 = Yes | 248 | 11,4% |
| Newsletter subscriber | Whether a single customer is subscribed to the newsletter and receives this by email | 0 = No | 482 | 22,2% |
| | | 1 = Yes | 1682 | 77,8% |
| EBook | Whether the customer downloaded the eBook available on the website | | | |
| Email click | Whether a single customer clicked in an email received from the website | 0 = No | 1841 | 84,9% |
| | | 1 = Yes | 328 | 15,1% |
| Blog visitor | Whether a single customer has been on a blog page of the website. ‘Yes’ means that, over all sessions, the customer visited a blog page at least once | 0 = No | 1842 | 84,9% |
| | | 1 = Yes | 327 | 15,1% |
| Brand 1 | Whether in one of the products the customer bought, the product brand was brand 1 | 0 = No | 2146 | 98,9% |
| | | 1 = Yes | 23 | 1,1% |
| Brand 2 | Whether in one of the products the customer bought, the product brand was brand 2 | 0 = No | 890 | 41,0% |
| | | 1 = Yes | 1279 | 59,0% |
| Brand else | Whether in one of the products the customer bought, the product brand was another brand than one of the nine top brands as indicated on the website | 0 = No | 1473 | 67,9% |
| | | 1 = Yes | 696 | 32,1% |
| Bankoverschrijving (bank transfer) | Whether in all the transactions a single customer did, the payment was via bankoverschrijving | 0 = No | 2159 | 99,5% |
| | | 1 = Yes | 10 | 0,5% |
| Visa | Whether in all the transactions a single customer did, the payment was via Visa | 0 = No | 2120 | 97,7% |
| | | 1 = Yes | 49 | 2,3% |
| MasterCard | Whether in all the transactions a single customer did, the payment was via MasterCard | 0 = No | 2017 | 93,0% |
| | | 1 = Yes | 152 | 7,0% |
| IDEAL | Whether in all the transactions a single customer did, the payment was via IDEAL | | | |
| Payment else | Whether in all the transactions a single customer did, the payment was something else than above | 0 = No | 1666 | 76,8% |
| | | 1 = Yes | 503 | 23,2% |
| Discount | Over all the products a single customer bought, at least one of them was discounted | 0 = No | 2004 | 92,4% |
| | | 1 = Yes | 165 | 7,6% |
| Transaction first session | Whether the customer did a transaction in the first session | 0 = No | 1041 | 48,0% |
| | | 1 = Yes | 1128 | 52,0% |
| Gender | Gender of a single customer | 0 = No | 1208 | 55,7% |
| | | 1 = Yes | 961 | 44,3% |
| Life phase | Life phase of a household in the Netherlands | 0 = Unknown | 36 | 1,7% |
| | | 1 = Young singles | 283 | 13,0% |
| | | 2 = Middle aged singles | 375 | 17,3% |
| | | 3 = Older singles | 223 | 10,3% |
| | | 4 = Families with young children only | 466 | 21,5% |
| | | 5 = Families with younger and older children | 58 | 2,7% |
| | | 6 = Families with older children only | 277 | 12,8% |
| | | 7 = Young couples without children | 69 | 3,2% |
| | | 8 = Middle aged couples without children | 169 | 7,8% |
| | | 9 = Older couples without children | 213 | 9,8% |
| Social class | Social class of a household in the Netherlands | 1 = Social class A | 470 | 21,7% |
| | | 2 = Social class B | 504 | 23,2% |
| | | 3 = Social class C | 648 | 29,9% |
| | | 4 = Social class D | 321 | 14,8% |
| | | 5 = Unknown | 226 | 10,4% |
| Education | Education level of a household in the Netherlands | | | |
| Income | Income of a household | 1 = Minimum | 171 | 7,9% |
| | | 2 = Below modal | 200 | 9,2% |
| | | 3 = Modal | 524 | 24,2% |
| | | 4 = 1,5 times modal | 512 | 23,6% |
| | | 5 = 2 times modal | 412 | 19,0% |
| | | 6 = 2,5 times modal or higher | 350 | 16,1% |
| Region | Geographic classification according to ACNielsen | 1 = Amsterdam, Rotterdam, Den Haag | 691 | 31,9% |
| | | 2 = N/Z Holland and Utrecht | 669 | 30,8% |
| | | 3 = Groningen, Drenthe, Friesland | 161 | 7,4% |
| | | 4 = Flevoland, Overijssel, Gelderland | 323 | 14,9% |
| | | 5 = Zeeland, Brabant, Limburg | 325 | 15,0% |

Table 3.3: Binary and categorical variables

3.3 Preliminary analysis

In order to give some preliminary insights, the descriptive statistics of the socio-demographic variables of the dataset are discussed. The frequencies are presented in table 3.3. The socio-demographic variables give insights into the current customer base of website X.

online shoppers are younger, wealthier, better educated and have higher computer literacy (Swinyard & Smith, 2003). However, one can argue that the online shopping environment develops over time and that the typology of the online shopper is already different in this era. It seems that the customer group of website X consists of so-called ‘young urban professionals’, a well-known concept in the literature. Young urban professionals tend to earn more, live in bigger cities and have educational credentials and professional occupations. Furthermore, a popular view holds that the young urban professional is health conscious and very active, with a preference for luxury goods (Solomon & Buchanan, 1991). This fits the type of products and services that website X sells.

3.4 Assumptions

Various assumptions are made throughout the process to validate the process and obtain usable outcomes. The following is assumed:

• It is assumed that the household information (from the ‘geo household’ dataset) is representative of the individual customer.

• For gender, we assume that customers with first names that are normally given to females are female, and that customers with first names that are normally given to males are male.

• Companies are left out of the dataset, since we assume that this is not a relevant group for targeting.

• In the newsletter variable, ‘dropped’ emails are left out of the dataset; we assume that these belong to customers who either have another email address or unsubscribed from the newsletter.

• Only the orders that are shipped and completed were taken into account.


4. DATA METHODOLOGY

In this section, the methodology used in this study is discussed. First, three different cluster methods are discussed, together with the evaluation method that is used, the hit rate. Then, the lookalike ad campaign is explained, as well as how it is compared to the generic ad campaign.

4.1 Cluster methods

4.1.1 K-means clustering

K-means is one of the simplest unsupervised learning algorithms. The aim of the K-means algorithm is to split the data set into a predefined number of clusters (k). Each cluster is associated with a centroid. The algorithm goes through the data points and assigns each of them to the nearest cluster centroid, where closeness is measured with the Euclidean distance. The centroids are then recomputed as the mean of the points assigned to them. This process is repeated until there is no change in the clusters. The objective function (1) is as follows:

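$$J = \sum_{j=1}^{k} \sum_{x_i \in C_j} \lVert x_i - c_j \rVert^2 \qquad (1)$$

where $C_j$ is the set of data points assigned to cluster $j$, $c_j$ is the centroid of cluster $j$, and $\lVert x_i - c_j \rVert$ is the Euclidean distance between data point $x_i$ and that centroid.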
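The assignment and update steps described above can be sketched in a few lines of Python. This is an illustrative implementation of the standard K-means procedure, not the code used in this study; the function names, the random initialization and the toy points are assumptions made for the example.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, seed=0, max_iter=100):
    """Return (labels, centroids) after alternating assignment and update steps."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # k randomly chosen points as starting centroids
    labels = None
    for _ in range(max_iter):
        # Assignment step: every point goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]
        if new_labels == labels:  # no change in the clusters: converged
            break
        labels = new_labels
        # Update step: every centroid becomes the mean of its assigned points.
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centroids


def objective(points, labels, centroids):
    """Within-cluster sum of squared Euclidean distances."""
    return sum(sq_dist(p, centroids[label]) for p, label in zip(points, labels))


# Hypothetical two-dimensional points forming two well-separated groups.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centroids = kmeans(points, k=2)
print(labels, objective(points, labels, centroids))
```

Because the starting centroids are random, different seeds can lead to different solutions on less clearly separated data, which is the instability that the ensemble methods in the following sections aim to reduce.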

4.1.2 Bagged Clustering K-means algorithm

In bagged clustering, cluster results are combined by hierarchical clustering, i.e. the results of the base method are combined into a new data set which is then used as input for a hierarchical method (Leisch, 2001). It can be seen as a combination procedure for several partitioning results. Ensemble methods have been successfully applied to enhance the performance of unstable regression and classification algorithms in a variety of ways. Bagged clustering takes the main idea of bootstrap aggregation, that is, the creation of new training sets by bootstrap sampling, and incorporates it into the cluster analysis framework (Leisch, 2001). In this method, K-means is used as the base method. For the hierarchical clustering step, different linkage methods can be used; in this case, Ward’s method is applied. Ward’s method is an agglomerative, bottom-up approach which minimizes the within-group dispersion at every iteration by using the minimum sum-of-squares criterion (Murtagh & Legendre, 2014).

The procedure is as follows. Bootstrap samples are constructed from the original dataset, and the base method is run on each of these samples: starting from a number of randomly chosen points, K-means creates a set of centers for every sample. These centers are then grouped further with the hierarchical cluster method, which results in a dendrogram where one can see the distances at which the centers are merged. In a dendrogram, points are merged bottom-up, here on the basis of the Euclidean distance. The algorithm is summarized in figure 4.1.
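The steps of bagged clustering can be sketched as follows. This is a compact Python illustration under simplifying assumptions (a naive Ward merge, a fixed number of final clusters instead of an inspection of the dendrogram, toy data); it is not the implementation used in this study, and all function and parameter names are chosen for the example.

```python
import random


def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean(points):
    return tuple(sum(coord) / len(points) for coord in zip(*points))


def kmeans_centers(points, k, rng, max_iter=50):
    """Base method: plain K-means, returning only the k cluster centers."""
    centers = rng.sample(points, k)
    for _ in range(max_iter):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: sq_dist(p, centers[j]))
            groups[nearest].append(p)
        updated = [mean(g) if g else centers[j] for j, g in enumerate(groups)]
        if updated == centers:
            break
        centers = updated
    return centers


def ward_merge(centers, n_clusters):
    """Agglomerative clustering of the centers with Ward's criterion."""
    clusters = [[c] for c in centers]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                na, nb = len(clusters[a]), len(clusters[b])
                # Increase in within-cluster sum of squares caused by merging a and b.
                cost = na * nb / (na + nb) * sq_dist(mean(clusters[a]), mean(clusters[b]))
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        merged = clusters.pop(b)  # b > a, so index a stays valid
        clusters[a].extend(merged)
    return clusters


def bagged_clustering(points, base_k, n_boot, n_clusters, seed=0):
    rng = random.Random(seed)
    centers = []
    for _ in range(n_boot):
        sample = [rng.choice(points) for _ in points]  # bootstrap sample
        centers.extend(kmeans_centers(sample, base_k, rng))
    groups = ward_merge(centers, n_clusters)
    labelled = [(c, label) for label, group in enumerate(groups) for c in group]
    # Each original point receives the cluster of its closest center.
    return [min(labelled, key=lambda item: sq_dist(p, item[0]))[1] for p in points]


# Hypothetical points in two clearly separated groups.
points = [(0, 0), (0, 1), (1, 0), (100, 100), (100, 101), (101, 100)]
print(bagged_clustering(points, base_k=3, n_boot=10, n_clusters=2, seed=1))
```

The number of base centers (`base_k`) is deliberately larger than the number of final clusters: the hierarchical step decides afterwards how the centers are combined.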

4.1.3 Bagged Fuzzy C-means algorithm

The third method that is introduced is Bagged Clustering with a Fuzzy C-means algorithm (BC-C). The algorithm works the same as Bagged Clustering with K-means (section 4.1.2), but instead of K-means, Fuzzy C-means (FCM) is used as the base method. For the hierarchical clustering step, Ward’s method is applied again. FCM makes use of soft clustering. In non-fuzzy clustering (hard clustering), data is divided into clusters where each data point belongs to exactly one cluster. In fuzzy clustering, data points can potentially belong to multiple clusters (Dunn, 1973; Bezdek, 1981). FCM is based on minimization of the following objective function (2):

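$$J_m = \sum_{i=1}^{N} \sum_{j=1}^{C} u_{ij}^{m} \lVert x_i - c_j \rVert^2, \qquad 1 < m < \infty \qquad (2)$$

where $N$ is the number of data points, $C$ is the number of clusters, $u_{ij}$ is the degree of membership of data point $x_i$ in cluster $j$, $c_j$ is the center of cluster $j$, and $m$ is the fuzziness exponent.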
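To make the difference between hard and soft clustering concrete, the sketch below computes fuzzy membership degrees for given cluster centers, using the standard FCM membership update, and evaluates the weighted sum of squared distances that FCM minimizes. It is a Python illustration with assumed names and toy data, and it shows a single membership step rather than the full iterative algorithm.

```python
import math


def fcm_memberships(points, centers, m=2.0):
    """Membership degree of every point in every cluster (each row sums to 1)."""
    memberships = []
    for p in points:
        dists = [math.dist(p, c) for c in centers]
        if 0.0 in dists:
            # A point that coincides with a center belongs fully to that center.
            row = [1.0 if d == 0.0 else 0.0 for d in dists]
            total = sum(row)
            row = [u / total for u in row]
        else:
            exponent = 2.0 / (m - 1.0)
            row = [1.0 / sum((d_j / d_k) ** exponent for d_k in dists) for d_j in dists]
        memberships.append(row)
    return memberships


def fcm_objective(points, centers, memberships, m=2.0):
    """Sum of squared distances, weighted by membership degrees raised to the power m."""
    return sum(
        u ** m * math.dist(p, c) ** 2
        for p, row in zip(points, memberships)
        for u, c in zip(row, centers)
    )


# Hypothetical one-dimensional example with two fixed centers.
points = [(0.0, 0.0), (1.0, 0.0), (4.0, 0.0)]
centers = [(0.0, 0.0), (4.0, 0.0)]
u = fcm_memberships(points, centers)
print(u, fcm_objective(points, centers, u))
```

In this example the middle point lies closer to the first center and therefore receives a high, but not exclusive, membership in the first cluster, whereas a hard method such as K-means would assign it to that cluster completely.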

4.2 Simulation data sets

To compare the different cluster methods, three simulation datasets are chosen. Since clustering is an unsupervised learning method, the true cluster solution is normally unknown. In the simulation datasets the true cluster solution is known, so the performance of the methods can be measured and compared. We apply the cluster methods described above to three benchmark datasets from the well-known UCI repository, Iris, Wine, and Soybean, and empirically examine the behavior of the cluster ensembles on these data sets. As the true classes are known in these classification problems, we can evaluate the clustering methods by comparing their results with the true classes. The chosen benchmark sets are widely used in the pattern recognition and artificial intelligence literature, and they are chosen based on previous research. Leisch (2001) uses the Iris data set as a benchmark for bagged cluster analysis with K-means and with hard competitive learning as base methods. Karaboga and Ozturk (2011) use Iris and Wine to test the so-called Artificial Bee Colony (ABC) algorithm, which is a clustering approach as well. Azimi and Fern (2009) use Iris, Wine, and Soybean to test another cluster ensemble algorithm, named Maximal Similar Features (MSF). The benchmark sets are briefly described below.

1. Iris data set

The Iris data set consists of 150 observations on three species of iris (setosa, versicolor, and virginica). It has four attributes, all of which are continuous (CRAN repository, 1988). Because the data from website X also contain some continuous variables, this simulation data set is chosen.

2. Wine data set

These data are the results of a chemical analysis of wines grown in the same region in Italy but derived from three different cultivars, which are the three different classes. The analysis determined the quantities of 13 constituents found in each of the three types of wines. The dataset consists of 178 observations on continuous variables (UCI learning repository, 1991). Because the data from website X also contain some continuous variables, this simulation data set is chosen.

3. Soybean data set

This is a dataset with binary and categorical variables, which makes it representative of the data structure of website X. A limitation of this dataset is that it is a hard prediction problem, since there are 19 classes of soybeans. The dataset consists of 207 observations with 35 variables. There were also missing values in this dataset. To make the dataset usable for analysis, the NA’s (2337 in total) are imputed by copying values from similar records in the same dataset. Imputation is done by predictive mean matching (pmm) with the R package MICE (see section 3.2.2).

In the application of cluster methods on the website X data, we do not know the true segment sizes. The different cluster methods are tested on simulation data, where the true segment sizes are known and are therefore useful for testing the cluster methods. The data of website X consists of continuous and categorical variables. Two of the simulation sets consist of continuous variables and one of categorical variables. An example of how the hit rates are computed in R is included in Appendix C.

4.2.1 Hit rate

To compare these methods, the hit rate is used. The hit rate used in this study is a direct comparison of the result of each method with the true cluster solution. This gives an objective way to see how the various approaches perform relative to one another. Hit rates are calculated with the Adjusted Rand Index (Vidden, Vriens, & Chen, 2016): the higher the hit rate, the better the cluster solution agrees with the true classes. The formula (3) of the Adjusted Rand Index (ARI) is presented below.

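$$\mathrm{ARI} = \frac{\sum_{ij} \binom{n_{ij}}{2} - \left[ \sum_i \binom{a_i}{2} \sum_j \binom{b_j}{2} \right] / \binom{n}{2}}{\frac{1}{2} \left[ \sum_i \binom{a_i}{2} + \sum_j \binom{b_j}{2} \right] - \left[ \sum_i \binom{a_i}{2} \sum_j \binom{b_j}{2} \right] / \binom{n}{2}} \qquad (3)$$

where $n_{ij}$ is the number of observations that belong to true class $i$ and are assigned to cluster $j$, $a_i$ is the number of observations in true class $i$, $b_j$ is the number of observations in cluster $j$, and $n$ is the total number of observations.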
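The hit rate can be computed directly from two label vectors. The thesis computed the hit rates in R (see Appendix C); the following is an independent Python sketch of the standard Adjusted Rand Index, with function and variable names chosen for the example and at least two observations assumed.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(true_labels, cluster_labels):
    """Adjusted Rand Index between a true partition and a cluster solution."""
    n = len(true_labels)
    cells = Counter(zip(true_labels, cluster_labels))  # contingency table counts
    rows = Counter(true_labels)       # sizes of the true classes
    cols = Counter(cluster_labels)    # sizes of the clusters
    sum_cells = sum(comb(count, 2) for count in cells.values())
    sum_rows = sum(comb(count, 2) for count in rows.values())
    sum_cols = sum(comb(count, 2) for count in cols.values())
    expected = sum_rows * sum_cols / comb(n, 2)  # agreement expected by chance
    maximum = 0.5 * (sum_rows + sum_cols)
    if maximum == expected:
        return 1.0  # degenerate case: both partitions are trivial and identical
    return (sum_cells - expected) / (maximum - expected)


# Hypothetical label vectors for four observations.
truth = [0, 0, 1, 1]
print(adjusted_rand_index(truth, [1, 1, 0, 0]))  # same partition, labels renamed
print(adjusted_rand_index(truth, [0, 0, 1, 2]))  # second class split in two
```

The index does not depend on how the clusters are numbered, so a cluster solution that recovers the true classes under different labels still receives the maximum score of 1, while a solution that agrees with the truth only as much as expected by chance scores around 0.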

4.3 Lookalike advertising campaign

After the different cluster methods are tested on the simulation datasets, the best performing method (see section 5.1) is used for the data of website X. The cluster analysis results in a segmentation model with N segments, which are used as input for the targeted experimental lookalike ad campaign for website X on Facebook. On Facebook, lookalike audiences are created, and these lookalike audiences are then targeted. A lookalike is matched by the email address of an individual customer. Each audience is targeted with a different advertising message, suited to the behavior of the specific segment on which the lookalike audience is built. The lookalike ad campaign is live on Facebook from 22-12-2017 till 05-01-2018 (16 days). The generic ad campaign of website X was live from 1-10-2017 until 28-12-2017 (89 days). For comparison, we take the results from the first 16 days of the generic ad campaign to compare with our lookalike ad campaign.

Because we want to test whether our lookalike ad campaign ‘works better’ than the generic ad campaign of website X, the two ad campaigns are compared. However, the generic ad campaign lasted 89 days while our lookalike ad campaign ran for only 16 days. Therefore, we compare our results with the first 16 days of the generic ad campaign. The number of days between the moment a person viewed or clicked on an ad and the moment that person subsequently took an action is called the attribution window. By default, the attribution window of Facebook is set to 1-day view and 28-day click, which means that actions are counted when they happen within one day after someone viewed the ad and within 28 days after someone clicked the ad (Facebook, 2017). In this study, the lookalike ad campaign lasted only 16 days.

Additionally, in our lookalike ad campaign, the lookalike audiences exclude the current customers of the website. The generic ad campaign does not exclude current customers of website X. This

| Metric | Description |
| --- | --- |
| Ad ID | The unique ID of the ad shown in the reporting. In this study, multiple Ad IDs are involved (see section 5.6): the advertisement of the generic campaign, and the advertisements for lookalike audience 1 to lookalike audience N, which together form the lookalike ad campaign |
| Clicks | The number of clicks on the ad |
| Reach | The number of people who saw your ads at least once. Reach is different from impressions, which may include multiple views of your ads by the same people |
| Total action value | The total value of all conversions attributed to the ads |
| Spend (in Euros) | The estimated total amount of money spent on the ad |
| Impressions | The number of times your ads were on screen |
| Click-through rate (CTR) | (Clicks / Impressions) * 100 |
| Unique click-through rate (unique CTR) | (Unique clicks / Reach) * 100 |
| Cost per thousand impressions (CPM) | (Spend / Impressions) * 1000 |

Table 4.1: Facebook report metrics
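The three derived metrics in table 4.1 follow directly from the raw counts in the report. As a small worked example in Python, with invented numbers that are not results of this study and function names chosen for the illustration:

```python
def ctr(clicks, impressions):
    """Click-through rate in percent."""
    return clicks / impressions * 100


def unique_ctr(unique_clicks, reach):
    """Unique click-through rate in percent."""
    return unique_clicks / reach * 100


def cpm(spend, impressions):
    """Cost per thousand impressions, in the currency of the spend."""
    return spend / impressions * 1000


# Hypothetical report row for one ad (illustrative values only).
report = {"clicks": 150, "unique_clicks": 120, "reach": 4000,
          "impressions": 6000, "spend": 45.0}

print(ctr(report["clicks"], report["impressions"]))        # CTR in %
print(unique_ctr(report["unique_clicks"], report["reach"]))  # unique CTR in %
print(cpm(report["spend"], report["impressions"]))         # CPM in euros
```

Because CTR is based on impressions and unique CTR on reach, the two can differ noticeably when the same people see an ad several times.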

Eventually, an insight report of the lookalike ad campaign can be extracted. In the results of the ad campaign, one can see how the segments have responded to the advertisements. Furthermore, the insight report of the (current) generic ad campaign is available for comparison. The generic ad campaign was live from 1-10-2017 till 28-12-2017. The information that is measured for both campaigns is listed in table 4.1.


5. FINDINGS

In this chapter, the findings are reported. First, the results regarding the compared cluster methods are presented. Secondly, the findings of the bagged cluster analysis and the resulting segmentation model are presented. After that, the settings and results of the generic ad campaign and the lookalike ad campaign are reported.

5.1 Comparing cluster methods on simulation data sets

The three cluster methods discussed in the previous section (4.1) were tested on three different simulation sets. The results are shown in tables 5.1, 5.2 and 5.3. Each method was run 10 times, each time with a different seed; the seeds were set for reproducibility. For bagged clustering, the exact number of centers is not a very sensitive parameter within the framework. The only critical issue is that it should not be lower than the number of segments expected to exist in the data (Dolnicar & Leisch, 2003). Therefore, bagged clustering with K-means (BC-K) and with C-means (BC-C) as base method was run with K = 10 centers for the Iris and Wine data sets. Since the Soybean dataset has 19 classes, the number of base centers was set to K = 20 for this dataset.

1. Iris data set

For Iris, the base method was applied on B = 50 bootstrap samples (150 observations).

| Seed | K-means | BC-K | BC-C |
| --- | --- | --- | --- |
| 1 | 0,5875 | 0,8183 | 0,7195 |
| 2 | 0,7392 | 0,7455 | 0,6170 |
| 3 | 0,6740 | 0,7073 | 0,6096 |
| 4 | 0,6371 | 0,7733 | 0,7322 |
| 5 | 0,7114 | 0,7455 | 0,7073 |
| 6 | 0,6729 | 0,7455 | 0,7455 |
| 7 | 0,7313 | 0,7734 | 0,7073 |
| 8 | 0,7027 | 0,7195 | 0,7322 |
| 9 | 0,7982 | 0,6827 | 0,7322 |
| 10 | 0,6212 | 0,8348 | 0,6536 |
| Average | 0,6876 | 0,7546 | 0,6957 |
| MAX | 0,7982 | 0,8348 | 0,7455 |
| MIN | 0,5875 | 0,6827 | 0,6096 |
| Distance | 0,2107 | 0,1521 | 0,1385 |

Table 5.1: Hit rate Iris data set

On average, BC-K performs best on the Iris data set, with an average hit rate of 0,75 against 0,69 for K-means and 0,70 for BC-C. The highest score in ten runs also came from BC-K. Both variants of bagged clustering are also more stable than K-means: the distance between their highest and lowest score (0,15 for BC-K and 0,14 for BC-C) is smaller than for K-means (0,21).
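The summary rows of the hit rate tables can be recomputed from the ten runs. As a small check, the Python sketch below does this for the K-means column of table 5.1; the function name is chosen for the example.

```python
def summarize(scores):
    """Average, highest and lowest score, and the distance between the two."""
    return {
        "average": sum(scores) / len(scores),
        "max": max(scores),
        "min": min(scores),
        "distance": max(scores) - min(scores),
    }


# Hit rates of K-means on the Iris data set for seeds 1 to 10 (table 5.1).
kmeans_iris = [0.5875, 0.7392, 0.6740, 0.6371, 0.7114,
               0.6729, 0.7313, 0.7027, 0.7982, 0.6212]

summary = summarize(kmeans_iris)
print({name: round(value, 4) for name, value in summary.items()})
```

The distance between the highest and the lowest score is used here as a simple indicator of stability over seeds: the smaller it is, the less the result of a method depends on its random start.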

2. Wine data set

For the Wine data set, the base method was applied on B = 50 bootstrap samples (178 observations).

| Seed | K-means | BC-K | BC-C |
| --- | --- | --- | --- |
| 1 | 0,9142 | 0,8951 | 0,9149 |
| 2 | 0,8998 | 0,9458 | 0,9486 |
| 3 | 0,9249 | 0,9150 | 0,8791 |
| 4 | 0,6974 | 0,8707 | 0,8635 |
| 5 | 0,8919 | 0,9487 | 0,9326 |
| 6 | 0,7956 | 0,8799 | 0,9486 |
| 7 | 0,8756 | 0,9486 | 0,8635 |
| 8 | 0,8265 | 0,8625 | 0,8831 |
| 9 | 1 | 0,7746 | 0,9309 |
| 10 | 0,9195 | 0,9324 | 0,9326 |
| Average | 0,8745 | 0,8973 | 0,9098 |
| MAX | 1 | 0,9487 | 0,9487 |
| MIN | 0,6974 | 0,7746 | 0,8636 |
| Distance | 0,3025 | 0,1740 | 0,0850 |

Table 5.2: Hit rate Wine data set

3. Soybean data set

For the Soybean dataset, the base method was applied on B = 50 bootstrap samples (207 observations).

| Seed | K-means | BC-K | BC-C |
| --- | --- | --- | --- |
| 1 | 0,4145 | 0,4350 | 0,3651 |
| 2 | 0,4318 | 0,4818 | 0,3661 |
| 3 | 0,4362 | 0,4351 | 0,3446 |
| 4 | 0,4612 | 0,5257 | 0,3214 |
| 5 | 0,4359 | 0,4961 | 0,2964 |
| 6 | 0,5225 | 0,5197 | 0,3475 |
| 7 | 0,4829 | 0,4913 | 0,3100 |
| 8 | 0,4797 | 0,4693 | 0,3095 |
| 9 | 0,5682 | 0,5278 | 0,2884 |
| 10 | 0,4656 | 0,4339 | 0,3336 |
| Average | 0,46989 | 0,4816 | 0,3282 |
| MAX | 0,5682 | 0,5273 | 0,3661 |
| MIN | 0,4145 | 0,4338 | 0,2884 |
| Distance | 0,1536 | 0,093993 | 0,0777 |

Table 5.3: Hit rate Soybean data set

On average, BC-K performs best on the Soybean data set, with an average hit rate of 0,48. K-means scores 0,47 on average and BC-C has the lowest average score with only 0,33. However, BC-C is the most stable method, with the smallest distance between its highest and lowest score (0,08). BC-K also scores well on this distance (0,09) compared to K-means clustering, which has a much larger distance between its highest and lowest score (0,15).

5.1.2 Conclusion

5.2 Customer segmentation results

In this section, the results of the bagged clustering approach are discussed and the resulting segmentation model is presented.

5.2.1 Bagged cluster analysis results

First, the dataset was normalized so that the values of the variables that were on a different scale could be compared. For the website X dataset, bagged clustering with K-means as a base method was used. Because cluster analysis is an ‘unsupervised learning technique’, we do not know whether the outcome corresponds to the ‘real’ number of clusters in the data. Therefore we looked for 2 to 10 clusters. The base method was used with K = 10 centers. The base method was applied on B = 100 bootstrap samples, resulting in a total of 1000 centers, which were then hierarchically clustered using the Euclidean distance and Ward linkage method.

Figure 5.1 shows the dendrogram resulting from the hierarchical step of the bagged clustering procedure, illustrating the stepwise merging process. The plot shows the relative height of aggregation (black line) and the first differences (grey line). The distance from 1 to 2 clusters is 0.5 (figure 5.1). Thus, there is a recommendation in favor of the two-cluster solution.

5.2.2 Segmentation model

First, the results of the bagged cluster analysis are provided. After that, the resulting segmentation model is discussed. Then, the settings for the generic ad campaign and the lookalike campaign are discussed. Lastly, the results of the campaigns are reported.

5.2.2.1 Segment profiles

For each segment, the descriptive statistics are provided. The descriptive statistics for both segments are presented in table 5.4 and 5.5. Table 5.4 represents the continuous variables and table 5.5 represents the binary and categorical variables. After that, the segment profiles are discussed, and their main observations are highlighted. To see whether the segments differ significantly from each other, an independent samples t-test was performed to compare the means of segment 1 and segment 2 on the continuous variables.

| Variable | Level | Cluster 1 (N = 1571) frequency | Cluster 1 (%) | Cluster 2 (N = 598) frequency | Cluster 2 (%) |
| --- | --- | --- | --- | --- | --- |
| Organic | 0 = No | 1160 | 73,8% | 318 | 53,2% |
| | 1 = Yes | 411 | 26,2% | 280 | 46,8% |
| Referral | 0 = No | 1571 | 100% | 502 | 83,9% |
| | 1 = Yes | 0 | 0,0% | 96 | 16,1% |
| Desktop | 0 = No | 600 | 38,2% | 176 | 29,4% |
| | 1 = Yes | 971 | 61,8% | 422 | 70,6% |
| Mobile | 0 = No | 1150 | 73,2% | 488 | 81,6% |
| | 1 = Yes | 421 | 26,8% | 110 | 18,4% |
| Tablet | 0 = No | 1390 | 88,5% | 531 | 88,8% |
| | 1 = Yes | 181 | 11,5% | 67 | 11,2% |
| Newsletter subscriber | 0 = No | 404 | 25,7% | 78 | 13,0% |
| | 1 = Yes | 1167 | 74,3% | 520 | 87,0% |
| E-book | 0 = No | 1571 | 100% | 574 | 96,0% |
| | 1 = Yes | 0 | 0,0% | 24 | 4,0% |
| E-mail click | 0 = No | 1453 | 92,5% | 388 | 64,9% |
| | 1 = Yes | 118 | 7,5% | 210 | 35,1% |
| Blog visitor | | | | | |
| Brand else | 0 = No | | | 381 | 63,7% |
| | 1 = Yes | 479 | 30,5% | 217 | 36,3% |
| Bankoverschrijving (bank transfer) | 0 = No | 1571 | 100% | 588 | 98,3% |
| | 1 = Yes | 0 | 0,0% | 10 | 1,7% |
| Visa | 0 = No | 1571 | 100% | 549 | 91,8% |
| | 1 = Yes | 0 | 0,0% | 49 | 8,2% |
| MasterCard | 0 = No | 1466 | 93,3% | 551 | 92,1% |
| | 1 = Yes | 105 | 6,7% | 47 | 7,9% |
| IDEAL | 0 = No | 357 | 22,7% | 107 | 17,9% |
| | 1 = Yes | 1214 | 77,3% | 491 | 82,1% |
| Payment else | 0 = No | 1217 | 77,5% | 449 | 75,1% |
| | 1 = Yes | 354 | 22,5% | 149 | 24,9% |
| Discount | 0 = No | 1530 | 97,4% | 474 | 79,3% |
| | 1 = Yes | 41 | 2,6% | 124 | 20,7% |
| Transaction first session | | | | | |
| Life phase | 0 = Unknown | 29 | 1,8% | 7 | 1,2% |
| | 1 = Young singles | 221 | 14,1% | 62 | 10,4% |
| | 2 = Middle aged singles | 266 | 16,9% | 109 | 18,2% |
| | 3 = Older singles | 145 | 9,2% | 78 | 13,0% |
| | 4 = Families with young children only | 342 | 21,8% | 124 | 20,7% |
| | 5 = Families with younger and older children | 42 | 2,7% | 16 | 2,7% |
| | 6 = Families with older children only | 200 | 12,7% | 77 | 12,9% |
| | 7 = Young couples without children | 46 | 2,9% | 23 | 3,8% |
| | 8 = Middle aged couples without children | 127 | 8,1% | 42 | 7,0% |
| | 9 = Older couples without children | 153 | 9,7% | 60 | 10,0% |
| Social class | 1 = Social class A | 338 | 21,5% | 132 | 22,1% |
| | 2 = Social class B | 363 | 23,1% | 141 | 23,6% |
| | 3 = Social class C | 474 | 30,2% | 174 | 29,1% |
| | 4 = Social class D | 235 | 15,0% | 86 | 14,4% |
| | 5 = Unknown | 161 | 10,2% | 65 | 10,9% |
| Education | 0 = Unknown | | | 1 | 0,2% |
| | 1 = Low | | | 113 | 18,9% |
| | 2 = Middle | | | 173 | 28,9% |
| | 3 = High | | | 311 | 52,0% |
| Income | 1 = Minimum | 137 | 8,7% | 34 | 5,7% |
| | 2 = Below modal | 143 | 9,1% | 57 | 9,5% |
| | 3 = Modal | 396 | 25,2% | 128 | 21,4% |
| | 4 = 1,5 times modal | 358 | 22,8% | 154 | 25,8% |
| | 5 = 2 times modal | 294 | 18,7% | 118 | 19,7% |
| | 6 = 2,5 times modal or higher | 243 | 15,5% | 107 | 17,9% |
| Region | 1 = Amsterdam, Rotterdam, Den Haag | 512 | 32,6% | 179 | 29,9% |
| | 2 = N/Z Holland and Utrecht | 486 | 30,9% | 183 | 30,6% |
| | 3 = Groningen, Drenthe, Friesland | 119 | 7,6% | 42 | 7,0% |
| | 4 = Flevoland, Overijssel, Gelderland | 226 | 14,4% | 97 | 16,2% |
| | 5 = Zeeland, Brabant, Limburg | 228 | 14,5% | 97 | 16,2% |

Table 5.4: Descriptive statistics binary/categorical variables


Segment 1: Try-out users (72,4%)

Segment 1, called try-out users, is the largest group, containing 72,4% of the customers. In terms of browsing behavior, they spend a relatively long time on page (M = 144,76, SD = 151,59), t(2167) = 5,425, p < .001.

Looking at customer engagement behavior, this group uses mobile more often to browse the website (26,8%) than the other group (18,4%), Chi-Square(1) = 16,545, p < .001. This group also seems to use tablet slightly more often to browse the website (11,5% against 11,2%). However, there was no significant difference between the groups in their use of tablet, Chi-Square(1) = ,043, p = ,836. This group is the most interested in buying brand 2 (52,1%) and brand 4 (19,6%).
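The chi-square statistics for the device variables can be recomputed from the cluster frequencies of mobile and tablet use reported in the table of binary and categorical variables per cluster. The Python sketch below uses the Pearson chi-square statistic for a 2x2 table without continuity correction, which is an assumption about how the test was run; the function name is chosen for the example.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Rows: cluster 1, cluster 2. Columns: No, Yes.
mobile = chi_square_2x2(1150, 421, 488, 110)
tablet = chi_square_2x2(1390, 181, 531, 67)

print(round(mobile, 3), round(tablet, 3))
```

Under this assumption the statistics agree with the reported values of 16,545 for mobile and ,043 for tablet, each with one degree of freedom.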

The try-out users are typically users who buy in their first session (61,7%) and then seem to leave as a customer, since on average 228 days have passed since their last transaction (SD = 143,410), which is almost 8 months. The group does not use the payment methods ‘bankoverschrijving’ (bank transfer) and Visa. They also seem to spend less on average (€49,43, SD = 34,15). This is consistent with the fact that they are in a lower income class (25,2% earns modal), Chi-Square(5) = 11,166, p < .05. This group mostly lives in the cities Amsterdam, Rotterdam and The Hague (32,6%). However, this does not differ significantly from the other group, Chi-Square(5) = 2,976, p = ,562.

Segment 2: Fascinated users (27,6%)
