Comparison of binary dependent variable approaches used for churn prediction in non-contractual online retail setting Master Thesis

University of Groningen

Faculty of Economics and Business

Department of Marketing

Date: June 23rd, 2014

Name: Vlastimil Kosik

Student Number: s2348101

Address: Verlengde Oosterweg 24a-1, 9727, GR

E-mail: kosik.vlastimil@gmail.com

Phone Number: +421 908 749 589

Supervisor: dr. Hans Risselada

Abstract:

The research presented in this paper examines the consequences of choosing between the two binary dependent variable approaches to customer churn in a non-contractual setting. The results of the analyses confirmed the initial expectations about the differences between the absolute and partial churn strategies. Although partial churn has been found to lead to complete customer attrition, the method differs from absolute churn and fails to detect a few important churn predictors; it is therefore concluded that the two approaches measure slightly different phenomena. Additionally, both the partial and the absolute churn approach have their own advantages and drawbacks, so special attention needs to be paid when deciding which approach to employ.

1. Introduction

Approximately fifteen years ago, both entrepreneurs and researchers first witnessed the rise in importance of customer relationship management (CRM). The rise was followed by a growing number of scientific papers addressing different aspects of CRM, as well as by hundreds of companies making significant investments in order to improve relationships with their customers (Verhoef, 2003). One particular aspect is worth mentioning: “CRM concept enabled companies to leverage databases and modern communication technologies to think and act at individual customer level” (Reinartz, & Kumar, 2002). At first glance this might seem a minor advance in the way companies interact with customers. On closer consideration, however, without the ability to collect information about individual customers many current business strategies would not be feasible, simply because companies would lack the necessary data.

The same argument applies to the main topic of this paper, customer attrition. Even though customer attrition, also known as customer churn, has become a popular topic in the past decades, some areas remain under-researched. One of these is customer churn prediction in the non-contractual retail setting, which has been analysed by only a small number of researchers (Buckinx, & Van den Poel, 2005; Tamaddoni Jahromi, Sepehri, Teimourpour, & Choobdar, 2010; Fader, & Hardie, 2005). In a non-contractual setting, whether a customer has actually left the company cannot be observed and can only be assumed, which has limited the amount of attention this subtopic has attracted. To date, two approaches to constructing the binary dependent variable have been used in churn prediction modelling for non-contractual industries, namely absolute and partial churn. In absolute churn, researchers and managers rely on the assumption that customers who become inactive for a certain period have left the company for good, and are thus considered churners. Partial churn, on the other hand, focuses on early detection of switching behaviour by comparing an individual’s current spending level with that of previous periods; customers who show a significant decrease in share of wallet are considered to have partially defected.

The two described approaches have both been presented by their authors as fully viable strategies in non-contractual churn prediction; however, the current literature indicates that the partial churn method might provide biased results in terms of the explanatory power of some covariates (Knox, & van Oest, 2014; Kopalle, Mela, & Marsh, 1999). Additionally, previous research by Risselada et al. (2010) undermines the actual power of the absolute churn method, which takes significantly longer to develop and to gather data for, and therefore provides more outdated data and less valuable findings than could be obtained using the partial churn method. These drawbacks increase the level of ambiguity for anyone interested in churn modelling in the already ambiguous non-contractual setting. For this reason, research providing empirical evidence for these claims is necessary in order to understand the actual effectiveness of each approach in different scenarios.

This gap in research on churn methods has created room for this master thesis, whose main focus is a comparison of the two previously introduced non-contractual binary dependent variable churn approaches, with emphasis on deviations between the estimated effects of predictors. The study revealed several deviations between the partial churn and absolute churn methods. It was found that partial churn fails to measure the real effects of specific variables representing failure recovery, discounts and relationship breadth. On the other hand, despite this inability to detect several effects, the partial churn method is able to capture the type of the majority of important relationships and it does lead to complete attrition; thus, in some specific scenarios it offers a useful alternative to the absolute churn approach.

1.1 Structure

2. Literature review

This section begins with a short introduction of the main research objectives and contributions. Next, the differences between contractual and non-contractual settings are explained. Subsequently, to familiarise the reader with the main concept and the research interest, all known approaches previously employed in the field of non-contractual churn modelling are described. Lastly, the various types of predictors that may play an important role in differentiating between the partial and absolute churn methods are described and linked to each binary approach.

2.1 Objective

Throughout the brief introduction, several theoretical motivations for this study have been presented. Firstly, because both approaches, which rely solely on a binary dependent variable, have been developed by two independent groups of researchers, little is known about the relative added value they provide. Secondly, even though each method has been argued to be a viable solution for identifying the causes of attrition in a non-contractual setting, evidence suggests that the partial churn method is unable to detect specific relationships representing attrition behaviour (Knox, & van Oest, 2014; Kopalle et al., 1999). However, because these methods have only been presented as individual solutions to churn, the flaws of each strategy remain unknown and empirical evidence is lacking. Lastly, while both methods are expected to be capable of detecting churners, the choice of method should not be arbitrary but should follow strict guidelines, because the superiority of a partial churn strategy over an absolute churn strategy, and vice versa, can differ depending on the goal of the managers or researchers involved.

In more depth, the following sub-questions will be studied in order to address the principal objective presented above:

What are the effects of different model building strategies on the statistical significance and the type of the effects of various predictors?

What are the effects of different model building strategies on model fit?

What are the most important determinants of customer churn in each proposed method of model building?

How successfully is the partial defection approach able to predict actual churn behaviour?

Which approach (partial vs. absolute churn) should be the preferred choice in general and during specific events?

2.2 Customer churn

As companies reach the maturity stage, the majority are faced with the realisation that acquiring new customers is not a straightforward task. Whether this is caused by a saturated market or by intense competition, a shift in focus from customer acquisition to customer retention can be observed in the majority of companies (Ang, & Buttle, 2006). Customer retention has already gained an important place in research and business, yielding many valuable findings. Firstly, it has been found that an increase in customer retention can have a large positive impact on the performance of a company. Given the findings of Gupta, Lehmann and Stuart (2004), who state that “on average 1% increase in retention increases firm value by 5%”, or that increasing the retention rate by 5% can double the profits in some industries (Reichheld, 1996), one can immediately see why so many companies consider improvements in customer retention to be a core strategy.

The literature on customer churn modelling is also quite abundant, with many scientific articles investigating the topic. Nonetheless, it is surprising that in the majority of papers authors restrict themselves to data from only a few types of industries. Telecommunication providers have attracted the most attention (Gustafsson, Johnson, & Roos, 2005; Neslin et al., 2006; Lemmens, & Croux, 2006; Lima, & Mues, 2008; Braun, & Schweidel, 2011; Risselada, Verhoef, & Bijmolt, 2011). The second most studied industry consists of financial service providers (Larivière, & Van den Poel, 2005; Kumar, & Ravi, 2008). Lastly, there are also papers looking at data from the energy market (Wieringa, & Verhoef, 2007). On examination, these industries share several characteristics while differing in many other respects. While there may be several explanations for this concentration, the most likely cause is that in all of the previously mentioned industries customers generally sign contracts with their service providers. These contracts are time bound, and in order to continue the customer-company relationship a contract has to be renewed before the agreed period expires. In terms of data analysis, contracts give companies a great advantage over companies that operate in a non-contractual setting, because they make it very straightforward to identify whether a customer has decided to leave the company or to stay for another pre-agreed period of time.

Telecom Finance Retail Contractual Non-contractual

Gustafsson et al. (2005) ×
Neslin et al. (2006) ×
Lemmens & Croux (2006) ×
Lima & Mues (2008) ×
Braun & Schweidel (2011) ×
Risselada et al. (2011) ×
Eiben et al. (1998) ×
Larivière & Van den Poel (2005) ×
Kumar & Ravi (2008) ×
Fader, Hardie & Ka Lok (2005) ×
Van Den Poel & Buckinx (2005) ×
Tamaddoni Jahromi et al. (2010) ×
This study ×

Table 1: Summary of articles on customer attrition

When it comes to modelling customer attrition in a contractual setting, the approach is rather straightforward and the majority of researchers use the established strategy of a binary dependent variable indicating whether a customer churned (1) or not (0). Generally, the only difference lies in the statistical methods that are being developed to further improve model fit. However, researchers modelling customer churn in a non-contractual setting have not yet settled on an approach appealing enough for the majority of subsequent modellers to prefer it consistently over the others. A look at the most influential non-contractual papers in Table 1 shows that each set of authors has developed a different approach to classifying whether a customer is likely to leave the company or not.

developed different approaches, which seemingly measure slightly distinct phenomena and thus provide findings that vary across methods. The question that needs to be asked is therefore how this diversity in approaches can be explained, and which approach yields the most acceptable results in terms of model fit and coefficient values. Surprisingly, no publication to date reports an attempt to answer this question. The lack of answers increases ambiguity for researchers and managers who want to build on previous papers and must choose which approach to employ. It also opens the door to the research questions that are the main interest of this study and are summarised in section 2.1. However, before formulating more in-depth hypotheses that could lead to more valuable conclusions, it is necessary to look at the roots of the binary dependent variable approaches used in the non-contractual setting.

2.3 Different approaches in non-contractual settings

As mentioned above, compared to the contractual setting, where most research is carried out under nearly identical conditions, papers attempting churn prediction in a non-contractual setting are both scarce and rather different from each other. In the following subsections, each of the known approaches is explained in more detail.

2.3.1 Absolute churn method

instead of looking at whether a customer cancelled or refused to prolong the current contract, churners are customers who have not made any purchase during a certain pre-specified period of time, where the length of this period is industry and company specific and might vary significantly. The method itself is not very popular among researchers, and to date only one published work has utilised this approach (Tamaddoni Jahromi et al., 2010). However, the exact opposite might apply in business, where managers of many companies, especially those just starting to implement marketing intelligence, tend to favour this technique because it is the least computationally demanding. Nevertheless, from a statistical or mathematical point of view, few differences can be observed when the method is compared to typical contractual setting prediction. All in all, the main advantage of this method is convenience: it allows customer attrition to be treated as if it had taken place in a contractual environment. On the other hand, a disadvantage of absolute churn is that it takes a long time until all necessary data are collected and ready for use. The reason is that a company first has to collect data and, once data collection is finished, must wait for a pre-specified period of time before it can find out which customers have stopped purchasing from the given retailer or service provider.

2.3.2 Partial churn method

she has only spent 25 euros, which is approximately 17% of the value that he or she spent in the previous period. Figure 2 shows an individual who has not yet shown a significant decrease in purchase behaviour and who has therefore not been classified as partially churned.

Figure 1: Example of a partial churner

Figure 2: Example of a non-churner
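The spending comparison behind the partial churn classification can be illustrated with a short Python sketch. It is a simplified illustration and not the procedure used in this thesis: the function names, the previous-period spend of 150 euros (chosen so that 25 euros corresponds to roughly 17%) and the cut-off of 0.5 are assumptions introduced for the example.

```python
def spending_ratio(current_spend: float, previous_spend: float) -> float:
    """Current-period spend as a fraction of previous-period spend."""
    if previous_spend <= 0:
        raise ValueError("previous_spend must be positive")
    return current_spend / previous_spend


def is_partial_churner(current_spend: float, previous_spend: float,
                       threshold: float) -> bool:
    """Flag a customer whose spending ratio falls below `threshold`.

    `threshold` is a hypothetical cut-off supplied by the analyst.
    """
    return spending_ratio(current_spend, previous_spend) < threshold


# Hypothetical values: 150 euros spent before, 25 euros now, cut-off 0.5.
ratio = spending_ratio(25.0, 150.0)
print(f"{ratio:.0%}")                                  # 17%
print(is_partial_churner(25.0, 150.0, threshold=0.5))  # True
```

Passing the threshold as an explicit argument keeps the classification rule visible, since the appropriate cut-off is a modelling decision rather than a fixed constant.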

2.3.3 Stochastic probability modelling method

differences, the approach could even be classified as belonging to an entirely different field. The first models developed for similar purposes were the negative binomial distribution (Ehrenberg, 1959) and Pareto/NBD (Schmittlein, Morrison, & Colombo, 1987). However, as these models were considered substantially more complex and demanded a solid mathematical background, only a very limited number of researchers and managers decided to adopt them (Fader, & Hardie, 2005). For this reason, Fader and Hardie began developing approaches that could also be used by individuals without advanced mathematical and statistical backgrounds (Fader, & Hardie, 2001).

2.3.4 Summary of the non-contractual approaches and further research development

option is true, the conclusion will lead to important findings, allowing for improved decision-making when choosing an approach.

Figure 3: Comparison of time cost of partial and absolute churn

As previously indicated, in order to establish whether the partial churn method truly offers a viable improvement over absolute churn, the main focus of this master thesis is to examine the consequences of treating customers as partially churned rather than absolutely churned, in terms of the strength, type and statistical significance of the observed effects as well as the ability to produce correct predictions.

2.4 Determinants of customer churn

In order to evaluate which way of classifying customer churn yields more acceptable and reliable results, it is necessary to develop as complete a model as possible. In the next step, the same predictor variables are used with both methods. Absolute and partial churn are then evaluated by close examination of the coefficients and the predictive power. In this way, it is possible to determine to what extent the partial churn method is able to indicate actual churn behaviour. As previously noted, there is currently only one study that applies the absolute churn method to the non-contractual setting (Tamaddoni Jahromi et al., 2010). A small complication arises because the authors of that paper focused on final model fit and model performance and did not disclose the coefficient values of the predictor variables, owing to privacy concerns of the data provider. Nevertheless, the authors of the absolute churn article were fully confident about the usefulness of the method and stated that it is a viable strategy for identifying both the right predictors and the individuals most likely to churn, and that it can serve as a fairly acceptable substitute solution, yielding satisfying results in multiple industries (Tamaddoni Jahromi et al., 2010). Therefore, if the findings of the authors are valid, the absolute churn approach should yield results nearly identical to those of churn modelling papers in both contractual and non-contractual settings. The identification of independent variables is based solely on previously published research. After sufficient literature has been presented for each variable, the variable is also linked to partial churn. Lastly, expectations about possible differences or similarities are formulated and transformed into hypotheses.

2.4.1 Recency, frequency and monetary value

Recency, frequency and monetary value (RFM) variables are well known in the churn literature, and only a few studies have not tried to account for these three explanatory variables. According to Miguéis and Van den Poel (2005), recency indicates how recently a customer has made a purchase, and it is typically the most powerful of the RFM predictors (Miglautsch, 2000). Most prior studies conclude that recency has a positive effect on the switch probability: the lower the recency value, the lower the chance that a customer will churn (Buckinx, & Van den Poel, 2005).

or she is (Bolton, Lemon, & Verhoef, 2004). Opposing this statement, Zeithaml (2006) argues that purchasing behaviour may not always reflect a customer’s tendency to churn and that a customer may develop an intention to leave based on dissatisfaction and a negative attitude towards the company. This not only indicates the importance of the positive effect that satisfaction may have on the customer’s decision to remain active, but also shows that a dissatisfied customer may tend to make radical decisions, such as complete termination of the relationship with the company. Garbarino et al. (1999) introduced the term overall satisfaction, which corresponds to customers’ evaluation of all previously made transactions as well as their satisfaction with the goods, services and even physical facilities of the firm. A certain degree of similarity can be observed when Garbarino’s definition of overall satisfaction is compared to customer service experience, a term coined by Jamal and Bucklin (2006) and defined as “individual’s overall evaluation of the firm’s customer service based on prior experience over the duration of the customer-firm relationship”. Therefore, if the majority of individuals are unsatisfied with the service quality, frequency might also have a positive effect on attrition. Thus, depending on the sign of the coefficient representing frequency, it should be possible to conclude whether frequent buyers are satisfied with the service or not.

The intention to leave the company, however, does not have to have an immediate effect on purchasing behaviour, and the same purchasing patterns may be visible until the customer finds the best alternative to switch to (Zeithaml, 2006). Using monetary value as a predictor of churn is based on a similar rationale. While this variable usually offers the least predictive power of the RFM variables, in the majority of cases monetary value has been found to have a significant negative relationship with churn (Buckinx, & Van den Poel, 2005; Miguéis, Van den Poel, Camanho, & Falcão e Cunha, 2012). There are also a few examples in which monetary value had a positive relationship with attrition. For example, Li (1995) found sales volume to be positively related to customer attrition, and Kumar and Reinartz (2003) found the relationship to “have U-shaped relationship, very light and very heavy users having higher tendency to churn”. However, as these findings mostly concern special cases in specific industries and cannot be generalised, it has been decided to follow the argument of Ganesan (1994), who stated that the more a customer invests in a company, the more likely he or she is to stay. The monetary value of an individual is thus expected to have a negative linear effect on customer churn.

how loyal (in terms of purchasing pattern) customer remains, it is expected that the effects of RFM on partial churn will be significantly more substantial than the effects of RFM on absolute churn. This reasoning is also supported by the results of Van den Poel and Buckinx (2005), who found that RFM variables dominated all other variables in their ability to separate behaviourally loyal customers from those with a tendency to partially defect. This went against their initial expectations, since previous research had put strong emphasis on the predictive power of other demographic and industry specific control variables.

Hypothesis 1: All three RFM predictors are expected to have a significant relationship with both partial and absolute churn; however, the actual predictive power of RFM is expected to be significantly higher for partial churn than for absolute churn.
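To make the three RFM predictors concrete, the following Python sketch derives them from a simple transaction log. It is an illustration under stated assumptions rather than the operationalisation used in this thesis: the tuple layout, the function name `compute_rfm`, the reference date and the measurement of recency in days since the last purchase are all hypothetical choices.

```python
from collections import defaultdict
from datetime import date


def compute_rfm(transactions, reference_date):
    """Compute recency, frequency and monetary value per customer.

    transactions: iterable of (customer_id, purchase_date, amount) tuples.
    Recency is assumed to be the number of days between the last purchase
    and `reference_date`, so a lower value means a more recent purchase.
    """
    last_purchase = {}
    frequency = defaultdict(int)
    monetary = defaultdict(float)
    for customer_id, purchase_date, amount in transactions:
        if customer_id not in last_purchase or purchase_date > last_purchase[customer_id]:
            last_purchase[customer_id] = purchase_date
        frequency[customer_id] += 1       # number of purchases
        monetary[customer_id] += amount   # total amount spent
    return {
        customer_id: {
            "recency_days": (reference_date - last_purchase[customer_id]).days,
            "frequency": frequency[customer_id],
            "monetary": monetary[customer_id],
        }
        for customer_id in last_purchase
    }


# Illustrative data only; none of these values come from the thesis dataset.
log = [
    ("A", date(2014, 1, 10), 30.0),
    ("A", date(2014, 3, 1), 20.0),
    ("B", date(2013, 12, 24), 50.0),
]
rfm = compute_rfm(log, reference_date=date(2014, 3, 31))
print(rfm["A"])
print(rfm["B"])
```

With recency defined this way, a low value corresponds to a recent purchase, which matches the reading in section 2.4.1 that a lower recency value goes together with a lower chance of churn.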

2.4.2 Failure recovery

Nonetheless, when it comes to the relationship between complaints/failure recovery and partial churn, the actual effect on churn might not be detected by this strategy. The reasoning is offered by Knox and van Oest (2014), who claim that “unless the customer leaves the company immediately after a complaint, or a second failure occurs shortly after the first, the relationship quickly returns to business as usual”. Thus, because failure recovery has immediate effects on customer behaviour, resulting in the customer either leaving the company for good or returning to the old relationship, it can be expected that no changes in purchasing patterns will occur if the relationship is negative. In such a case, partial churn is likely to show no significant relationship with failure recovery.

Hypothesis 2: Unsuccessful failure recovery leads to a higher probability of absolute and partial churn, while successful failure recovery leads to a lower probability of absolute churn but has no effect on partial churn.

2.4.3 Remaining past consumer behaviour

Firstly, if cross-buying is found to have a significant negative relationship with absolute churn, the relationship should also be detected by the partial churn method because fewer loyal customers would buy the cheapest basic products. The second option is that the effect of the last order is found to be statistically insignificant in absolute churn prediction but still shows a significant relationship with partial attrition. The explanation is that the price of a basic product with the addition of a second product is obviously higher than the price of a single-good purchase. Therefore, individuals who have shown a higher degree of cross-buying will spend more money and display a higher current level of commitment than single-product buyers, and will thus be less likely to be classified as partial churners.

The second consumer behaviour that is expected to play an important role in both partial and absolute churn prediction is the length of the relationship. Bhattacharya & Sankar (2003) found that the length of the relationship is positively related to the degree to which an individual is able to identify with a company. This argument is supported by the finding that relationship length has a negative effect on partial churn (Buckinx, & Van den Poel, 2005). It is therefore expected that relationship length will have a negative effect on both partial and absolute churn.

Moreover, Li (1995) found that customers who received a discount were less likely to churn. This finding was later supported by Neslin et al. (2006). Thus, a negative relationship between absolute churn and discount is expected. However, discounting may also lead to a decrease in individual spending because “(1) as discounts become more endemic, sales decrease (2) temporary price reductions increase price sensitivity (3) the frequent use of deals make them less effective tool for stealing sales from competition” (Kopalle et al., 1999). When it comes to partial churn, customers who received a discount might therefore become more deal and price sensitive, postpone their next purchase and decrease the average amount of money they spend. Nevertheless, customers receiving discounts should still make at least some purchases and become less likely to abandon the company for good. Therefore, the partial churn method is expected to provide biased results, indicating that customers who received a discount are more likely to become churners in the future, while the reality is the exact opposite. Following this reasoning, a discount is expected to have a positive effect on partial defection.

reasoning and findings from descriptive analysis. For example, a higher tendency to churn is expected for individuals who have decided to unsubscribe from a newsletter. The opposite applies to individuals who have decided to link their online account with their Facebook profile. It is believed that individuals who were willing to link their online account with their private social network account are both more involved and more satisfied with the service of the company, and a lower tendency to churn is therefore expected. Regarding the number of reminders, that is, the number of calendar items added by a customer in order to receive reminders, a linear relationship is expected. The reasoning is based on the belief that individuals with a large number of reminders are both intrinsically motivated and constantly targeted with memory-refreshing messages. Lastly, the type of email address used during registration of the trading account is also believed to influence the probability of churn. This reasoning is based on preliminary descriptive analysis, which showed that the proportion of both partial and absolute churners is significantly smaller among individuals who used specific email domains. At the same time, it is believed that the use of well-known email domains might be a sign of a desire for a longer lasting consumer-company relationship. Thus, it is highly probable that there will be significant differences among users of various email domains.

The four industry specific variables mentioned above are expected to be significantly related to both types of customer attrition; however, it is also likely that their effect will be stronger in the case of absolute churn. The reasoning lies in the nature of these variables: most of them indicate to what degree an individual plans a long-term relationship, and they do not change over time. Put simply, users of unknown domains who have not linked their Facebook account and have never been interested in receiving reminders about their friends’ birthdays are more likely to be planning only a one-time purchase without really considering a future customer-company relationship. Moreover, since the partial churn method mostly examines which individuals have started showing reduced spending, and there is no reason to expect that partial churn was planned even before the first purchase, there is a high probability that the effects of the industry specific variables on partial churn will not sufficiently reflect reality.

Hypothesis 4: A discount is expected to have a significant negative effect on absolute churn, while a significant positive relationship between a discount and partial churn is expected.

Hypothesis 5: Industry specific predictors are expected to have a significant relationship with both partial and absolute churn while the actual predictive power of these variables is expected to be significantly higher for absolute churn than for partial churn.

2.4.5 Customer demographics

Demographic variables are used extensively in churn prediction modelling in the majority of studies (Zeithaml, Bolton, Deighton, Keiningham, Lemon, & Petersen, 2006; Buckinx, & Van den Poel, 2005; Mittal, & Kamakura, 2001). According to Mittal and Kamakura (2001), males are more likely to churn because they have been found to repurchase less often. However, the customers of the data provider are mostly female (86%), which suggests that the male customers are only those individuals who feel quite enthusiastic about the products offered; it is therefore expected that men will be less likely to churn than women. Age is also used in all previously mentioned studies, and while it has not always been found to have a significant effect on churn, there are cases where age has been shown to play an important role (Seo, Ranganathan, & Babad, 2008; Svendsen, & Prebensen, 2013); it has therefore been decided to include it in the model. However, because customers are allowed to fill in any value for their age, quite a significant number of them can be expected to be dishonest about their real age. For this reason, all individuals suspected of providing fake age information have been grouped together and examined for an increased proportion of churners. As shown in Figures 4a and 4b, the increase in absolute churners (from 19.8% to 29.4%) as well as partial churners (from 24.7% to 29.5%) is clearly visible. Thus, it is believed that individuals who have lied about their real age are more likely to attrite. To conclude, as there is no theoretical evidence that would indicate otherwise, no significant differences are expected between the partial and absolute churn methods.

Figure 4: Increase in proportion of churners among age faking individuals

3. Research design

This chapter describes the research design for the whole investigation. Firstly, a description of the dataset used for this study is provided. Secondly, operationalizations of the variables included in both the absolute and partial churn models are presented. Lastly, the methodology section explaining the various steps of the analyses is presented, followed by the results section.

3.1 Dataset


3.2 Sample and variables description

The exact specifications of the dependent variables are crucial for this study; therefore, the following two paragraphs are devoted to a clear explanation of how both partial and absolute churn have been measured.

Operationalization of the dependent variable representing an individual classified as absolutely churned has been based on a single rule, dividing customers into two groups: those who have made at least one purchase in a specific time period and those who have not. In this study, based on a managerial decision considering the specific attributes of the industry in which the data provider operates, customers who have not bought anything for thirteen consecutive months are considered churners. Therefore, a person who has not made a purchase for at least thirteen months is coded as 1; otherwise the individual is coded as 0. In total, 19.9% of the individuals have been classified as absolutely churned.
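The classification rule can be illustrated with a minimal Python sketch. The function names, the date arguments and the convention for counting whole months are assumptions made for illustration; the text only specifies the thirteen-month threshold and the 1/0 coding.

```python
from datetime import date

CHURN_THRESHOLD_MONTHS = 13  # threshold stated in the text


def months_between(earlier: date, later: date) -> int:
    """Number of whole months elapsed between two dates (assumed convention)."""
    months = (later.year - earlier.year) * 12 + (later.month - earlier.month)
    if later.day < earlier.day:
        months -= 1  # the last month is not yet complete
    return months


def absolute_churn_flag(last_purchase: date, reference: date) -> int:
    """Return 1 if there has been no purchase for at least thirteen months, otherwise 0."""
    elapsed = months_between(last_purchase, reference)
    return 1 if elapsed >= CHURN_THRESHOLD_MONTHS else 0
```

Here `reference` stands for the end of the observation window; how that date was chosen in the study is not stated, so it is left as a parameter.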

In total, the database consisted of transactional, behavioural and demographic information on 153 754 unique individuals. However, due to the way the partial churn measure is constructed, a significant number of individuals had to be removed. In order to classify someone as partially churned, it is necessary to have observations from at least two consecutive periods. In this research, to ensure the reliability and robustness of the partial churn approach, it has been decided to include only individuals who have been customers of the company for all twelve months of the observation period. Approximately 15 000 individuals became customers of the company after the data collection period had commenced; these individuals have therefore been removed from the database. Since no missing data have been detected, the final dataset consisted of 136 276 individuals.

Additionally, as already argued in chapter 2, in order to be able to compare the predictive power of the partial and absolute churn methods, it was necessary to include covariates that add value and improve the predictions compared to the null model. Based on the literature review, the set of variables likely to influence partial and absolute churn has been chosen and is presented in Table 2. A correlation check has been performed for each of the proposed variables. The variance inflation factor (VIF) has been found acceptable in all cases, with the smallest value being 1.012 and the highest 3.424. Therefore, it has been concluded that there are no issues caused by multicollinearity of the predictor variables.

| Variable | Measurement & Description | Coefficient |
|---|---|---|
| Recency | Number of days since last purchase | β1 |
| Frequency | Number of orders over 1 year period | β2 |
| Monetary | Individual net sales over 1 year period | β3 |
| Failure recovery | Number of complaints | β4 |
| Length of Relationship | Number of months since the day of account registration | β5 |
| Reminders | 0 – No, 1 – Yes | β6 |
| Last Order Info | 1 – Card, 2 – Card + Gift, 3 – Cardset, 4 – Card + Personalized Gift; form of dummies, Card = reference category | β7, β8, β9 |
| Registration Email | 1 – Gmail.com, 2 – Hotmail.com, 3 – UPCmail.nl, 4 – Ziggo.nl, 5 – Other; Other = reference category | β10, β11, β12, β13 |
| Unsubscribe | 0 – No, 1 – Yes | β14 |
| Facebook Linked | 0 – No, 1 – Yes | β15 |
| Discount | 0 – No, 1 – Yes | β16 |
| Fake Age Information | 0 – No, 1 – Yes | β17 |
| Gender | 0 – Female, 1 – Male | β18 |

After the dataset had been finalised and the available predictor variables had been prepared, a descriptive analysis was performed with the main goal of becoming more familiar with the data. The following descriptive statistics have been found to be the most informative: (1) 24.7% of individuals have been classified as partially churned and 19.9% as absolutely churned; (2) 86% of the sample were female; (3) on average, a customer spent 54.5 euro and made 9.2 orders during the one year observation period; (4) the most commonly used email domains during the registration of an account were gmail.com and Hotmail.com/nl with 12.0% and 38.1% respectively; (5) the average customer age was roughly 40 years; (6) 1411 customers (1.4%) have been classified as having provided fake age information; (7) on average, customers made 0.2 complaints and only 19% of customers made at least 1 complaint; (8) 87.1% of the customers have received at least one discount; (9) 5.5% of customers have linked their greetz.nl accounts with their Facebook profile (the proportion of churners has been found to be significantly smaller for this small group, with 8.5% absolute churners and 15.9% partial churners, indicating a negative relationship with churn probability).

For validation purposes, the database was randomly divided into two parts. The first part, which was used for specification and estimation of the models, consisted of 90% of the total sample of approximately 136 000 individuals. The second part, a holdout sample consisting of the remaining 10% of individuals, was set aside for further validation of the results obtained from the first sample. It should be stressed that the assignment to either group was completely random; therefore, both subsets were expected to provide rather similar output.
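A random 90/10 division of this kind can be sketched as follows. This is only an illustration: the function name, the fixed seed and the use of integer customer ids are assumptions, and the software actually used for the split is not stated in the text.

```python
import random


def split_estimation_holdout(customer_ids, holdout_share=0.10, seed=42):
    """Randomly split ids into an estimation sample and a holdout sample."""
    ids = list(customer_ids)
    rng = random.Random(seed)  # fixed seed (an assumption) so the split is reproducible
    rng.shuffle(ids)
    n_holdout = round(len(ids) * holdout_share)
    # The first n_holdout shuffled ids form the holdout, the rest the estimation sample.
    return ids[n_holdout:], ids[:n_holdout]


# Hypothetical ids 0..999 stand in for the real customer identifiers.
estimation, holdout = split_estimation_holdout(range(1000))
```

Because every individual is assigned by the shuffle alone, the churn share in both parts should be similar apart from sampling noise, which is the property the validation relies on.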

3.3 Methodology


3.3.1 Logistic regression

Logistic regression is a special type of linear model which assumes a linear relationship between the independent variables and the dependent variable on the logit scale, i.e. after the log odds transformation; the mathematical equation for the model had to be developed accordingly. Since both partial and absolute churn were hypothesized to be influenced by exactly the same predictor variables, the right-hand side of the equation is identical for both models. Both models are expected to take the following form:
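In general form, and assuming the coefficient numbering β1 to β18 of Table 2, a logit specification of this kind can be sketched as follows. The symbols x, i and k (value of predictor k for customer i) are notation introduced here for illustration; the same right-hand side applies whether the dependent variable is partial or absolute churn.

```latex
\ln\!\left(\frac{P(\text{churn}_i = 1)}{1 - P(\text{churn}_i = 1)}\right)
  = c + \sum_{k=1}^{18} \beta_k x_{ik},
\qquad
P(\text{churn}_i = 1)
  = \frac{1}{1 + \exp\!\left[-\left(c + \sum_{k=1}^{18} \beta_k x_{ik}\right)\right]}
```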

It is necessary to emphasize that “c” in both equations is a constant term and that the β coefficients obtained from logistic regression have to be interpreted differently if compared to e.g. linear regression. Firstly, the coefficients have to be translated into utilities that can later be translated into probabilities of an event taking place (in this case being either partially or absolutely churned, individuals marked as 1) versus probability of an event not taking place. Additionally, since the original β coefficients are log odds of event happening to not happening, a further calculation using the following formula can allow for automatic transformation into simple odds of event happening vs not happening:
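The transformation, which is quoted later in the text as [exp (βx)–1]*100, can be written out as below. The worked example is an illustration only and uses the Facebook linked coefficient of the absolute churn model reported in Table 7.

```latex
\Delta_{\%}\,\text{odds} = \left(e^{\beta} - 1\right) \times 100,
\qquad \text{e.g. } \beta = -0.656:\quad
\left(e^{-0.656} - 1\right) \times 100 \approx (0.519 - 1) \times 100 = -48.1
```

A negative result is read as a percentage decrease in the odds and a positive result as a percentage increase, in both cases for a one-unit increase in the predictor with all other variables held constant.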

Thus, if the above formula yielded a hypothetical result of 80 for gender (with the coding male = 0 and female = 1), it would mean that the odds of being classified as an absolute/partial churner are 80% higher for female individuals than for male individuals in the dataset. Again, it is necessary to keep in mind that this only holds if all other variables remain unchanged.

3.3.2. Decision Trees

The second method used to evaluate how well the constructed dependent variables can be predicted is a set of different decision trees. This method is fairly common among researchers. The main reasons for such frequent use are that decision trees are very easy to learn and are able to represent complex interactions among many variables (Lowd, & Davis, 2014). Additionally, in several papers decision trees have outperformed logistic regression (Risselada et al., 2010; Perlich, Provost, Simonoff, & Cohen, 2004); therefore, acknowledging the uniqueness of each dataset, it has been decided to apply four different decision tree modelling strategies:

CHAID

Exhaustive CHAID

CART

QUEST
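As background for the CHAID-type trees, which choose splits on the basis of chi-square tests of independence between a predictor and the target, the Pearson chi-square statistic for one candidate split can be sketched as follows. The counts are hypothetical and the function is an illustration, not the procedure of the statistical package used in the study.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of predictor category and churn.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat


# Hypothetical split: rows are two predictor categories,
# columns are (churned, not churned) counts.
candidate_split = [[30, 70], [10, 90]]
stat = chi_square_statistic(candidate_split)
```

The larger the statistic (and the smaller the associated p-value), the stronger the case for splitting on that predictor.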


3.4 Results

Prior to running the final model analysis, a pre-test version of the logistic regression has been performed for both dependent variables. This analysis did not serve for the interpretation of coefficients; instead, it has been run to ensure that the results are not influenced by outliers and to check the adequacy of the models. The check consisted of plotting the normalized (studentized) and deviance residuals against the predicted churn probability of each individual. As proposed by Sarkar, Midi and Rana (2011), studentized residuals have been used to check for highly influential observations, while deviance residuals served as a further check for potential outliers and adequacy of the model. As can be seen in Figure 5a, there are multiple observations that are considered extremely influential: their values were so high that it was impossible to visualise the two linear trends with slope –1 (Sarkar, Midi, & Rana, 2011). Thus, the observations with these extreme values, as high as 75 000 (values over 2 are considered to be in need of careful investigation), were removed from the dataset. Exactly the same procedure was followed for plotting the deviance residuals against the predicted individual probabilities.
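For reference, the deviance residual of a single observation in a binary logistic model can be computed as in the sketch below. It uses the standard textbook definition rather than the output of the package used in the study, and the example probabilities are hypothetical.

```python
import math


def deviance_residual(y: int, p: float) -> float:
    """Deviance residual for a binary outcome y (0 or 1) with predicted probability p."""
    # Log-likelihood contribution of the observation.
    log_lik = math.log(p) if y == 1 else math.log(1.0 - p)
    # The sign follows y - p: positive for churners, negative for non-churners.
    sign = 1.0 if y == 1 else -1.0
    return sign * math.sqrt(-2.0 * log_lik)


# A churner to whom the model assigned only a 1% churn probability
# produces a residual above the threshold of 2 mentioned in the text.
poorly_predicted = deviance_residual(1, 0.01)
```

For ungrouped binary data, the squared deviance residuals sum to the -2 LL statistic of the fitted model.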

Figure 5: Residual plots

| | -2 LL | Cox & Snell R² | Nagelkerke R² |
|---|---|---|---|
| Before removal of outliers | 100 970.180 | 0.314 | 0.466 |
| After removal of outliers | 98 487.996 | 0.325 | 0.483 |

Table 3: Improvement of fit after removal of outliers

Furthermore, the removal has also allowed for visualisation of the two previously mentioned linear trends. As can be seen in Figure 5b, the lowess smooth of the plot is in line with the zero intercept, and thus the model has been considered sufficiently adequate to proceed with the interpretation of coefficients.

After a thorough check of the outliers, a short analysis of the relationship between absolute and partial churn has been performed. Following the statements from chapter 2, customers who have been classified as partial churners might or might not completely churn, but exactly how many of the partial churners actually leave the company for good is unknown and has not been specified in any previous research. Thus, in order to examine the relation between partial and absolute churn, the binary variable representing partial churn has been entered as an independent variable into the model for absolute churn. The results indicate that partial churners are indeed likely to become absolute churners in later periods (p-value = 0.000, exp(β) = 2.066, β = 0.726).

Therefore, it can be said that individuals who have been classified as partial churners in period 1 had 106.6% higher odds of absolutely churning in period 2.

3.4.1. Findings: logistic regression

The bottom rows of Table 7 show the results of three general tests used for measuring the meaningfulness of both the partial and absolute churn models. The omnibus test has been found to be highly significant at the 1% significance level for both models (partial p-value = .000 < .01 and absolute p-value = .000 < .01). Therefore, it can be said that both models performed significantly better than the constant-only model. Additionally, pseudo R² measures have been used in order to confirm the results of the omnibus test. These measures are named after their inventors and indicate how well the model fits the data (Cox, & Snell, 1989; Nagelkerke, 1991). The actual values indicate a better fit of the partial churn model (Cox & Snell .325 compared to .232, Nagelkerke .483 compared to .365).

The evaluation method has been based on a rule of thumb: the lower the values, the better the model fit. Following this reasoning, the values in Table 4 are in line with the outcome of the omnibus test and the pseudo R² measures, thus indicating that the partial churn method performs slightly better.

| | Partial Churn Model | Absolute Churn Model |
|---|---|---|
| AIC | 98 518.270 | 100 375.111 |
| BIC | 98 704.880 | 100 571.542 |

Table 4: Fit of the models (Partial vs. Absolute)

However, since the log likelihood values for both models are rather similar, it has been decided to further compare the actual model performance by evaluation of correct predictions that each model is able to provide. These hit rates have been calculated based on the following formula:
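The hit rate is conventionally defined as the share of correctly classified individuals. A sketch of that standard definition is given below, where TP, TN, FP and FN denote true positives, true negatives, false positives and false negatives; this notation is introduced here for illustration and is not taken from the thesis.

```latex
\text{hit rate} = \frac{TP + TN}{TP + TN + FP + FN} \times 100\%
```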

Tables 5 and 6 summarise how well the absolute and partial churn models are able to make predictions compared to the constant-only model (0-model), as well as how successfully these models predict churners under various conditions. The first comparison has been made by examining the overall hit rate of the null model and the model with a 50/50 cut off. As shown in Table 5, the absolute churn model shows a small improvement in the overall hit rate (0-model = 80.1% → model with all predictors = 82.4%); with a 50/50 cut off, which assumes an equal chance that a randomly chosen individual is a churner, the model is able to predict only 28.9% of real churners. However, since absolute churners make up 19.9% of the dataset, the chance of randomly choosing an individual marked as 1 is closer to 20/80 than to 50/50. For this reason a model fit with a 19.9/80.1 cut off has been calculated. After this change, the model shows a significantly improved ability to predict churners (86.2% of churners correctly predicted).

| Percentage of correct predictions | 0-model | 50/50 cut off | 19.9/80.1 cut off |
|---|---|---|---|
| Not Churned | 100 | 95.7 | 67.7 |
| Churned | 0 | 29.1 | 86.2 |
| Hit-rate | 80.1 | 82.4 | 71.4 |

Table 5: Hit-rate absolute churn

| Percentage of correct predictions | 0-model | 50/50 cut off | 24.7/75.3 cut off |
|---|---|---|---|
| Not Churned | 100 | 92.4 | 76.4 |
| Partially Churned | 0 | 60.4 | 85.9 |
| Hit-rate | 75.3 | 84.5 | 78.8 |

Table 6: Hit-rate partial churn
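The overall hit rates in Tables 5 and 6 can be cross-checked from the class-wise percentages, since the overall hit rate is the class-size weighted average of the two. The sketch below is an illustrative consistency check with a hypothetical function name; it takes the churn shares of 19.9% and 24.7% from the text.

```python
def overall_hit_rate(correct_not_churned, correct_churned, churn_share):
    """Overall hit rate (%) as the class-size weighted average of the
    class-wise percentages of correct predictions."""
    return correct_not_churned * (1.0 - churn_share) + correct_churned * churn_share


# Class-wise percentages copied from Table 5 (absolute) and Table 6 (partial).
absolute_50 = overall_hit_rate(95.7, 29.1, 0.199)
absolute_prior = overall_hit_rate(67.7, 86.2, 0.199)
partial_50 = overall_hit_rate(92.4, 60.4, 0.247)
partial_prior = overall_hit_rate(76.4, 85.9, 0.247)
```

The four values agree with the reported hit rates (82.4, 71.4, 84.5 and 78.8) to within about 0.1 percentage points, the remaining gap being attributable to rounding of the inputs.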

In order to increase confidence in the conclusions, one additional method has been used to compare the ability to predict partial and absolute churners: the top decile lift. Firstly, the individuals were put into decile groups (each group consisting of 10% of the individuals in the sample) based on their predicted probabilities. This was followed by an examination of how well the model predicted churn within the 10% of individuals with the highest predicted probabilities, also called the top decile lift. While the main focus has been on the top decile, a lift has been calculated for each decile group in order to allow visualisation by means of a cumulative lift curve.
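The decile procedure described above can be sketched as follows. The lift of a decile is taken here as the churn rate within the decile divided by the overall churn rate, which is the usual definition; the function name and the toy data are hypothetical.

```python
def decile_lifts(probabilities, outcomes):
    """Lift per decile: churn rate within the decile / overall churn rate.

    Individuals are ranked by predicted probability (highest first) and cut
    into ten equally sized groups; index 0 is the top decile.
    """
    ranked = sorted(zip(probabilities, outcomes), key=lambda pair: pair[0], reverse=True)
    n = len(ranked)
    overall_rate = sum(outcomes) / n
    lifts = []
    for d in range(10):
        group = ranked[d * n // 10:(d + 1) * n // 10]
        group_rate = sum(y for _, y in group) / len(group)
        lifts.append(group_rate / overall_rate)
    return lifts


# Hypothetical toy data: 20 individuals, 4 churners (overall churn rate 20%).
probs = [0.95, 0.90, 0.80, 0.70, 0.60, 0.55, 0.50, 0.45, 0.40, 0.35,
         0.30, 0.28, 0.25, 0.22, 0.20, 0.15, 0.12, 0.10, 0.05, 0.02]
churned = [1, 1, 0, 1, 0, 0, 0, 0, 0, 1,
           0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
lifts = decile_lifts(probs, churned)
top_decile_lift = lifts[0]
```

In the toy data both individuals in the top decile are churners, so the top decile lift is 1.0 / 0.2, i.e. churners are five times as concentrated there as in the sample as a whole.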

Figure 6: Cumulative lift curves for a) absolute churn and b) partial churn

The comparison of the two models cannot, however, be based solely on how well each model is able to predict; it also requires a comparison of the actual β coefficients, which can be interpreted as meaningful effects of the independent variables on both dependent variables. As previously mentioned in chapter 3.3, exactly the same variables have been used for the prediction of both dependent variables. Thus, it was possible to compare how much the models differ in the signs of the coefficients as well as in their statistical significance. Additionally, Exp(β) has been transformed using the previously explained formula [exp (βx)–1]*100 in order to allow a further examination of the variables' overall importance in each model. A summary of the results can be found in Table 7.

| Main variables | Partial churn: β | Partial churn: P-value | Partial churn: Exp(β) | Absolute churn: β | Absolute churn: P-value | Absolute churn: Exp(β) |
|---|---|---|---|---|---|---|
| Constant | -3.138 | 0.000* | 0.043 | -0.568 | 0.000* | 0.602 |
| Monetary | 0.014 | 0.000* | 1.014 | 0.001 | 0.078*** | 1.001 |
| Frequency | -0.532 | 0.000* | 0.587 | -0.369 | 0.000* | 0.692 |
| Recency | 0.099 | 0.000* | 1.104 | 0.014 | 0.000* | 1.014 |
| Failure recovery | -0.004 | 0.679 | 0.996 | -0.171 | 0.000* | 0.843 |
| Length of relationship | 0.000 | 0.881 | 1.000 | 0.001 | 0.240 | 1.001 |
| Reminders | -0.003 | 0.003* | 0.997 | -0.008 | 0.000* | 0.992 |
| Registration UPC | -0.222 | 0.000* | 0.801 | -0.408 | 0.000* | 0.665 |
| Registration ziggo.nl | -0.242 | 0.000* | 0.785 | -0.543 | 0.000* | 0.581 |
| Unsubscribe | 0.200 | 0.000* | 1.222 | 0.263 | 0.000* | 1.301 |
| Facebook linked | -0.188 | 0.000* | 0.829 | -0.656 | 0.000* | 0.519 |
| Discount | 0.105 | 0.000* | 1.110 | -0.103 | 0.000* | 0.902 |
| Age | 0.009 | 0.000* | 1.009 | 0.013 | 0.000* | 1.013 |
| Fake age information | 0.241 | 0.000* | 1.272 | 0.497 | 0.000* | 1.643 |
| Gender | -0.020 | 0.350 | 0.980 | -0.012 | 0.561 | 0.988 |

Partial churn: Omnibus (Chi-square) = 53521.535*, Cox & Snell R² = .325, Nagelkerke R² = .483

Absolute churn: Omnibus (Chi-square) = 35635.594*, Cox & Snell R² = .230, Nagelkerke R² = .365

*significant at 1% significance level, **significant at 5% significance level, ***significant at 10% significance level

Table 7: Results logistic regression

Additionally, the overall explanatory power of the (groups of) variables of particular interest has been analysed by comparison of the AIC and BIC values of the complete model with the AIC and BIC values of the models from which the specific variable has been removed. The actual difference indicates the overall importance of the given variable, thus, if the AIC/BIC after removal is higher the variable is proven to add value to the model and vice versa.

| | AIC partial | BIC partial | AIC absolute | BIC absolute |
|---|---|---|---|---|
| Model with all variables included | 98 518.270 | 98 704.880 | 100 375.111 | 100 571.542 |
| Model without RFM | 143 117.328 | 143 274.473 | 121 124.447 | 121 291.414 |
| Model without failure recovery | 98 516.441 | 98 693.230 | 100 577.826 | 100 764.436 |
| Model without last order | 98 601.459 | 98 758.604 | 100 371.968 | 100 538.934 |
| Model without discount | 98 548.672 | 98 725.461 | 100 407.600 | 100 594.210 |
| Model without 4 industry specific variables | 98 704.380 | 98 882.239 | 101 311.327 | 101 439.007 |
| Model without demographics | 98 676.831 | 98 843.798 | 100 721.788 | 100 888.755 |

Table 8: Check of the explanatory power of specific groups of covariates
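The comparison described above amounts to subtracting the AIC of the complete model from the AIC of each reduced model. The following sketch does this for the AIC values of Table 8; the variable and function names are chosen for illustration.

```python
# AIC values copied from Table 8 as (partial, absolute).
full_model = (98518.270, 100375.111)
reduced_models = {
    "RFM": (143117.328, 121124.447),
    "failure recovery": (98516.441, 100577.826),
    "last order": (98601.459, 100371.968),
    "discount": (98548.672, 100407.600),
    "industry specific variables": (98704.380, 101311.327),
    "demographics": (98676.831, 100721.788),
}


def aic_increase(reduced, full):
    """AIC(reduced) - AIC(full); a positive value means the removed
    variables were adding explanatory value."""
    return tuple(round(r - f, 3) for r, f in zip(reduced, full))


increases = {name: aic_increase(aic, full_model) for name, aic in reduced_models.items()}
for name, (partial, absolute) in increases.items():
    print(f"{name:28s} partial {partial:>11.3f}   absolute {absolute:>11.3f}")
```

Removing failure recovery from the partial churn model and removing last order from the absolute churn model both yield small negative differences, i.e. those variables add no explanatory value in the respective model, whereas removing RFM raises the AIC by far the most in both models.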

Examination of the coefficients shows that their signs and significance levels are identical for the majority of the predictor variables in both methods; however, there are also a few variables that differ across the methods. Because different hypotheses have been developed for the various groups of covariates, the results are presented group by group.

absolute churn, respectively. Considering that the second highest difference after removal is approximately 1000, the predictive power of the RFM variables is undeniable. Additionally, the coefficients of the RFM variables have the same sign for both partial and absolute churn. The dominance of RFM in partial churn relative to the effect of RFM on absolute churn is also visible when the exp(β) values are studied. Holding everything else constant, for each increase of 1 euro in monetary value, the odds that an individual becomes partially churned increase by 1.4% [exp(β) = 1.014], while the odds of absolute churn increase by only 0.1% [exp(β) = 1.001]. In the case of recency, for every additional week that passes since the last purchase, the odds of partial churn increase by 10.4% [exp(β) = 1.104] vs. the odds of absolute churn by 1.4% [exp(β) = 1.014]. Lastly, for each additional order, the odds of partial churn decrease by 41.3% [exp(β) = .587] vs. the odds of absolute churn by 30.8% [exp(β) = .692]. Overall, this leads to the conclusion that all results were in line with expectations (Hypothesis 1: supported).

In the case of failure recovery, the results were again in line with expected outcomes. Odds of partial churn have been shown to be unaffected by failure recovery (p-value = 0.679) while the odds of absolute churn have been found to decrease by 15.7% if the individual made at least one complaint [exp(β) = .843]. AIC/BIC values also indicate that the failure recovery has no explanatory value in prediction of partial churn (Hypothesis 2: supported).

The last order variable has been found to offer different insights across the two approaches. Absolute churn has been found to be unaffected by the type of last order/relationship breadth (p-values = 0.251 to 0.399), while partial churn has been found to differ significantly for individuals whose last order consisted of card + gift or a cardset. Individuals who bought both a card and a gift during their last order have been found to be 21.2% less likely to partially churn [exp(β) = .788], and individuals who bought a cardset during their last order have been found to be 242% more likely to partially churn [exp(β) = 3.420], than individuals who only bought a single card as their last purchase (Hypothesis 3: supported).

In line with this reasoning, the industry specific variables have been found to have the same type of relationship with both partial and absolute churn. Additionally, all variables have been found to have exactly the same statistical significance (p-values in all cases = 0.000), except for the effect of the Hotmail email domain on partial churn, which has been found to be statistically insignificant (p-value = 0.327). The strength of the relationship between the variables and the two churn methods has been found to differ and is therefore in line with expectations. This is visible in the AIC/BIC differences: after removal of the variables, the AIC/BIC of the partial churn model increases by approximately 200, while for the absolute churn model the increase is approximately 1000. Moreover, the odds of absolute churn have been found to be 48.1% smaller for individuals who have their Facebook accounts linked with their trading account than for individuals who have not [exp(β) = .519]; the odds of partial churn in the same scenario have been found to decrease by 17.1% [exp(β) = .829]. The effect of the registration email has been shown to diminish when partial churn is used. For example, in the case of the gmail.com domain, individuals using this email address were 7.4% less likely to partially churn [exp(β) = .936] and 33.9% less likely to absolutely churn [exp(β) = .661] than individuals who used an email address belonging to the category “other”. The effect of unsubscribing has been approximately the same across both attrition models: individuals who unsubscribed from the news feed were 30.1% more likely to absolutely churn [exp(β) = 1.301] and 22.2% more likely to partially churn [exp(β) = 1.222] than individuals who continued to receive news from the company. Lastly, the odds of partial and absolute churn have been found to decrease by 0.3% [exp(β) = .997] and by 0.8% [exp(β) = .992] respectively, with each additional reminder (Hypothesis 5: supported).

Demographic variables have been the last group of predictors of interest. Age has been found to have a significant positive effect on both partial (p-value = 0.000, β = 0.009) and absolute churn (p-value = 0.000, β = 0.013). At the same time, gender has been found to be statistically insignificant in both cases, thus having no effect on churn (p-values = 0.350 and 0.561). However, as the main hypothesis concerning the demographic variables focused on the similarity of the results across the two methods, the hypothesis can be accepted despite the lack of a significant relationship between gender and churn (Hypothesis 6: supported).


3.4.2. Findings: CHAID and exhaustive CHAID decision trees

The variable frequency (number of orders) has been found to be the most influential predictor variable. In the tree built to predict absolute churn, the frequency variable has been used to split individuals into nine child nodes. A clear linear pattern could be observed: the leftmost node (individuals who have made one order or fewer) contained 10 974 absolute churners (57.2% of the node), which differs strongly from the rightmost child node (individuals who have made at least 21 orders), which contained only 24 absolute churners (0.2%). The same variable has shown the same pattern for partial churn. The node of individuals who have made one order or fewer contained 15 328 partial churners (79.9%), while the node of individuals who made at least 21 purchases contained only 41 partial churners (0.3%). Two other variables found to have the most explanatory power are recency (positive effect on both partial and absolute churn) and number of reminders (negative effect). In addition, five more variables have been found to have a significant influence on both partial and absolute churn; namely, failure recovery, unsubscribing, age and linking to Facebook.

Additionally, a few variables have been found to be significant for only one of the two churn measurements. The monetary and discount variables have been found to have a small positive effect on partial churn, while they have not been found to be influential for absolute churn. In the case of absolute churn, the type of registration email has been found to play an important role: individuals who used email domains belonging to the category “other” have been found to be more prone to churning.

| Percentage correct | Partial Churn | Absolute Churn |
|---|---|---|
| Not Churned | 95.5% | 93.1% |
| (Partially) Churned | 55.1% | 41.4% |
| Overall Percentage | 85.6% | 82.8% |

Table 9: Hit-rate CHAID and Exhaustive CHAID

3.4.3. Findings: CRT and Quest decision trees

In addition to the two types of CHAID trees, classification and regression trees (CRT) and QUEST have been employed. As shown in Table 10, both trees have been able to provide a slightly improved fit in comparison to the previous CHAID building methods. In addition, since these trees belong to the category of regression trees, over-fitting might become a problem; therefore, both trees were pruned, which helped to increase the overall fit of the model. As before, the frequency variable has been shown to be the most important variable for both methods.

| | Partial CRT | Partial Quest | Absolute CRT | Absolute Quest |
|---|---|---|---|---|
| Not Churned | 95.0% | 95.3% | 93.4% | 93.6% |
| (Partially) Churned | 63.0% | 58.3% | 39.9% | 39.0% |
| Overall Percentage | 87.1% | 86.2% | 82.8% | 82.7% |

Table 10: Hit-rate CRT and QUEST

3.4.4. Model performance summary

Figure 7: Summary of the top decile lift scores

4. Conclusion

The research presented in this thesis has aimed to provide a comparison of two specific model-building strategies for customer churn in a non-contractual setting. The comparison has been carried out by analysing the effects of various predictor variables, examining the relative power and type of each relationship. In total, six hypotheses regarding the differences and similarities between partial and absolute churn have been formulated. Overall, the results have shown that all of the hypotheses are supported, indicating that each method predicts a slightly different phenomenon.

Firstly, while RFM factors have proven to be the most important predictors in both methods, an undeniable dominance of RFM variables has been found in the case of partial churn. In fact, the power of recency and frequency in the partial method is so overwhelming that without the use of these variables, the method might even become inapplicable. Additionally, all three variables have shown to have the same kind of relationship with both dependent variables, thus further supporting the claim in favour of partial churn’s ability to predict attrition behaviour of individuals.

complained at least once have been found to be 15.7% less likely to churn than individuals who have never complained. This effect, however, is lost when the partial approach is employed. The explanation for the difference might be found in the direction of the relationship, which is beneficial, i.e. failure recovery lowers churn. If failure recovery had instead been found to increase customer churn, it would mean that a significant number of individuals would simply stop purchasing and would therefore be detected by both approaches to a certain degree. However, the exact opposite holds true: failure recovery has proven to successfully solve customers’ issues, and “the most dissatisfied customers stop buying and hence drop out of the customer base, the individuals who complain are those customers who are still interested to give a company another chance” (Knox, & van Oest, 2014). It is thus logical to observe lower churn among these individuals. In the case of partial churn, which accounts not only for customers who have not made any purchase for a given period but also for customers who have significantly reduced their share of wallet, which is not affected by the number of complaints, the effect of satisfaction gained from successful recovery becomes scattered across the two diverse groups of partial churners.

The third difference between the partial and absolute churn methods has been the effect of relationship breadth, as depicted by the type of last order. An analysis of partial churn has revealed the effect of last order to be highly significant and influential across two categories. Individuals who purchased a card with an additional gift during their last order were 21.2% less likely to partially defect, and individuals who purchased a set of cards during their last order were found to be 242% more likely to partially defect in the future. Interestingly, this relationship became statistically insignificant when the effects were measured relative to absolute churn. Thus, it can be concluded that the breadth of a relationship does not affect the probability that a customer completely ceases purchasing, and the relationship with partial churn is rather the result of a short-term effect on sales which returns to normal after a certain period. A logical explanation for this phenomenon could also be offered by the nature of the last order. If a customer buys a card and a gift, he or she has to pay an additional price for the gift and will therefore always spend more than customers who only buy single cards. At the same time, individuals who buy a set of cards are most likely buying cards in bulk for future events and thus will not have to make another purchase for a longer period, though they might make another purchase after giving away the last card from the set. Thus, the inter-purchase time changes after different types of last orders, but this does not lead to abandonment of the company.

had been found to be exactly opposite to the effect of discount on absolute defection. Thus, people who have received a discount were found to be 11.1% more likely to become partial churners, while the likelihood of becoming an absolute churner decreased by 9.8% for individuals who received a discount offer. Even though this difference was expected and is in line with the previous hypothesis, it also means that the partial churn method has again produced results that are rather biased and misleading in certain cases. It is logical to expect that a person who has received a discount will most likely wait for another discount, resulting in a decrease of short-term spending; however, it is very unlikely that a discount will make an individual feel ungrateful towards the company. Indeed, the opposite signs of the relationship between discount and the two churn approaches could be explained by the fact that, while receiving a discount increases the time before the next purchase, the overall relationship between the customer and the company is strengthened, which shows that the partial and absolute churn methods study rather different phenomena.

The predictive power of the industry specific variables has been shown to be stronger in the case of absolute churn. Additionally, all variables have shown the same direction of relationship with both methods, leading to the conclusion that partial churn is indeed able to detect the long-term effects, which remain relatively similar over time. Nevertheless, the absolute churn approach performs slightly better and is more likely to capture the real relationship between these variables and customer churn, while the results confirm that, to a certain extent, the same relationship can be detected with the partial defection approach.

Lastly, neither of the methods has shown serious deviations when the effects of the demographic variables, age and gender, have been compared. On the one hand, there has been no real theoretical background that would lead to different expectations, suggesting that both methods can detect the real effect fairly well. On the other hand, the slightly higher effect of age on absolute churn indicates that the churn behaviour is likely to be immediate.
