
Value to the Churn Prediction Models: A New Approach of Combining Churn Probability and Customer Value for Customer Retention in the Telecommunication Market


Academic year: 2021


Willem Bekkers University of Groningen Faculty of Economics and Business

MSc Marketing Intelligence & MSc Marketing Management Master Thesis January 11, 2016 H.W. Mesdagstraat 75 9718 HE Groningen Tel: +31(0) 625337094 E-mail: w.j.w.bekkers@student.rug.nl Student number: S2522861 Supervisors University of Groningen

MANAGEMENT SUMMARY

Nowadays, the telecommunication sector is characterized as an increasingly saturated market, particularly in developed countries. Therefore, telecom operators are moving from product-centric to customer relationship management (CRM) strategies. Within CRM, churn management is an essential part, because it aims to maximize the value of the customer base by establishing long-term customer relationships.

Research in churn management has essentially relied on the probability to churn for targeting customers in retention campaigns and optimizing the customer base. It can be argued that some customers, while not having the highest probability to churn, have a relatively higher customer value. Therefore, the aim of this paper is to rethink the current approach and develop an approach that accounts for the probability to churn and simultaneously adjusts for customer value. This research takes the next step in churn management by investigating to what extent high probability customers and high value customers differ in unique customers.

This study finds that nearly 80 percent of the customers in the top decile of the value approach are not in the top decile of the probability approach. Companies using the value approach would therefore target nearly 80 percent different customers compared with the probability approach. This will lead to different customer lifetime value (CLV) forecasts because companies will target financially valuable customers. The value at risk in the top decile of the value approach increases by 128 percent compared to the probability approach.

PREFACE

During my Pre-MSc program I discovered my interest in statistics and marketing intelligence. My passion for prediction models grew after participating in the Customer Models course. Making future predictions is something magical and having a future-oriented perspective is close to my personal attitude. During this thesis period I discovered that I would like to develop myself through a future career in the growing area of marketing intelligence.

The end-result of this thesis could not have been accomplished without the help of a few inspiring people. I would like to thank my supervisor dr. ir. M.J. Gijsenberg for his positive way of providing feedback and his flexibility, and supervisor dr. J.T. Bouma for his insightful feedback. Finally, I would like to thank my fellow students for support and constructive feedback.

TABLE OF CONTENTS

MANAGEMENT SUMMARY
1. INTRODUCTION
2. THEORETICAL FRAMEWORK
2.1 Customer Churn
2.1.1 Importance of customer churn
2.1.2 Customer churn in the telecommunication market
2.2 Determinants of Churn
2.2.1 Recency
2.2.2 Frequency
2.2.3 Monetary value
2.2.4 Length of relationship
2.2.5 Socio-demographics
2.3 Evaluation Criteria
2.3.1 Top Decile Lift
2.4 Customer Equity
3. RESEARCH DESIGN
3.1 Dataset
3.2 Variables Description
3.2 Creating New Variables
3.3 Missing Values, Oddities and Outliers
3.4 Customer Churn Models
3.4.1 Classification trees
3.4.2 Logistic regression
3.5 Cluster Analysis
4. RESULTS
4.1 Pearson Correlation Coefficients
4.2 General Descriptives
4.3 Decision Trees
4.4 Logistic Regression
4.4.1 Evaluation of the overall model
4.4.2 Parameter assessment
4.5 Adjust for Customer Value
4.6 Customer Profiles
4.6.1 Cluster analysis value approach
4.6.2 Cluster analysis probability approach
4.7 Hypothesis Testing
4.6.1 Recency
4.6.2 Frequency
4.6.3 Monetary value
4.6.4 Relationship length
4.6.5 Age
4.6.6 Device newness
4.6.7 Difference in unique customers
4.6.8 Difference in clusters
5. CONCLUSIONS AND DISCUSSION
5.1 Scientific Implications
5.2 Managerial Implications
6. LIMITATIONS AND FURTHER RESEARCH
6.1 Limitations
6.2 Further Research Suggestions

1. INTRODUCTION

The telecommunication sector is characterized as a fast-developing market, with a continuing rise of new players and rapidly changing technologies. Along with the high competition, the mobile telecom markets in the world are increasingly saturated (Ahn et al. 2011), particularly in developed countries (Verbeke et al. 2012). In terms of users, more people have a mobile phone than a water closet (ITU, 2015b; WHO, 2015) and users can easily switch because of number portability (Wong, 2011). Kumar & Reinartz (2012, p. 284) state that worldwide subscription rates to mobile providers were 90 percent in 2010, and there were 19.5 million mobile subscriptions in the Netherlands in 2014 (ITU, 2015a). Based on these figures the telecom market is considered mature. Therefore, telecom operators are moving from product-centric to CRM strategies (Verbeke et al. 2012). Within CRM, churn management is an essential part, because it aims to maximize the value of the customer base by establishing long-term customer relationships (Risselada, Verhoef & Bijmolt, 2010).

In the academic literature it is argued that not every customer is worth the effort of developing a sustainable relationship with, and that companies should focus on their most profitable clients instead of unprofitable customers (Buckinx & Van den Poel, 2005). Traditionally, most researchers and managers ignore the unequal importance of customers. In other words, the traditional approach disregards the heterogeneity in the value of individual customers when targeting them. According to Glady, Baesens & Croux (2009), marketing strategies should focus on the estimated financial value of customers, and customer equity (i.e. the total value of the customer base) should be central in decision making. Customer equity is driven by the probability that a customer stays with the company and the value the customer represents for the company (Vogel, Evanschitzky & Ramaseshan, 2008). While the future probability of the customer’s relationship with the company is extensively researched in churn management, simultaneously accounting for customer value is unaddressed in academic literature. Therefore, the following main research question is answered:

To what extent do high probability churn customers, in comparison to high probability churn customers adjusted with customer value, differ in their value-at-risk?

With this new approach, three relevant sub-questions emerge:

1. What are determinants of customer churn in the telecommunication industry?

2. To what extent do high probability churn customers, in comparison to high probability churn customers adjusted with customer value, differ in unique customers?

3. To what extent do the profiles of high probability churn customers and of high probability churn customers adjusted with customer value differ, so that these profiles can be targeted with tailored retention campaigns?

In this study, a customer database of a telecom operator is used to analyse the churn behaviour of individual customers. To obtain the churn predictions for each customer, common churn models are used, such as logistic regression and classification trees. To evaluate the predictive power, the Top Decile Lift (TDL) and hit-rates are calculated for the estimated time period. Then, the value-at-risk with and without accounting for customer value is evaluated. Lastly, a cluster analysis is performed for the top decile of both the probability approach and the value approach to investigate the differences in profiles.

[...] customer equity. This study takes the next step in churn management to account for the customer heterogeneity in customer value. Secondly, this research adds to current literature by emphasizing the drivers or antecedents of customer churn in the rapidly changing telecom market. What makes this research even more interesting is the strong managerial relevance of the new approach. Churn management and detecting high probability churners are of major importance for customer-centric companies and for maintaining a valuable customer portfolio. Simultaneously accounting for customer heterogeneity in customer value will give marketing managers additional valuable insights and will make marketing more accountable.

The remainder of this paper is organized as follows. Section two provides an overview of the literature on churn management, drafts hypotheses and summarizes these in a conceptual framework. Section three gives a general description of the data, the data preparation (e.g. oddities, missing values and outliers) and the operationalization of churn probability with customer value. The model estimation, validation and clustering results are discussed in section four. Section five provides a discussion of the findings and concludes. Finally, limitations and directions for future research are presented in the last chapter.

2. THEORETICAL FRAMEWORK

The aim of this section is to discuss the management of churn behaviour. Furthermore, it presents the determinants that influence churn. The method for selecting the high probability churn customers such as the TDL is described in paragraph 2.3. This section concludes with a discussion of customer value and all hypothesized relations will graphically be presented in the derived conceptual model.

2.1 Customer Churn

From a marketing perspective, churn describes the phenomenon of a ‘customer leaving’ the company for another. Once the contractual relationship has ended, the customer will switch to another company. Blattberg, Kim & Neslin (2008, p. 610) and Leeflang et al. (2015, p. 321) define churn as ‘the probability the customer leaves the firm in a given period’; this definition is used in this research. According to Leeflang et al. (2015, p. 321) customer churn in contractual settings (e.g. mobile phone subscriptions) is defined as ‘the termination or defection of the contract between the customer and the company’. Churn refers [...] customer base that terminate the contract in a given period of time (Blattberg et al. 2008, p. 607). To reduce customer churn, companies use probability models to predict the likelihood of a customer’s defection. In a contractual setting, the aim is to predict whether the customer is likely to defect during a given time period. This is traditionally stated as a binary issue (Leeflang et al. 2015, p. 321).

These customer churn models make predictions based on historical individual customer data. According to Blattberg et al. (2008, p. 7) companies currently have the possibility to store and use huge amounts of customer data to establish a competitive advantage. The development in data volume, variety and velocity can be used to make more accurate churn models (Leeflang et al. 2015, p. 81). The data most commonly used as input for customer churn models are socio-demographic data (e.g. age or gender), customer behaviour data (e.g. number of incoming and outgoing calls, number of calls to the customer helpdesk) and financial data (e.g. billing information).

2.1.1 Importance of customer churn

Next to missing future revenues from a leaving customer, there are three other reasons why firms suffer from customer churn. First of all, acquiring new customers is five to six times more expensive than retaining existing customers (Verbeke et al. 2012). Secondly, after the company has invested in the acquisition of the customer, dissatisfied customers may spread negative word-of-mouth (Buckinx & Van den Poel, 2005) instead of the positive word-of-mouth that attracts new customers (Verbeke et al. 2012). Lastly, when a customer ends the relationship, the company will miss the opportunity to make more revenue by up-selling or cross-selling products or services (Risselada et al. 2010). In light of the current market trends, the last argument is highly relevant because a key trend in the telco market is service bundling (Prince & Greenstein, 2014).

[...] (2004) who investigated the impact of acquisition cost, margin and retention on customer value and firm value. They found that for every percent increase in retention, firm value rises by five percent. However, for every percent of increase in acquisition, the firm value increases by .02 percent to .32 percent. A one percent improvement in discount rates or price promotions increases customer value by only .5 percent to 1.2 percent (Gupta et al. 2004). Therefore, improvements in retention rates increase firm value significantly when compared to acquisition or price promotions. These findings are in line with Braff, Passmore & Simpson (2003), who found that companies can generate a four to five percent increase in margin and that customer lifetime management can reduce churn by 9.9 percent.

2.1.2 Customer churn in the telecommunication market

Even though churn management can reduce churn, churning as such remains a serious challenge in the telecom market. According to Blattberg et al. (2008, p. 608) annual churn rates in various industries can vary between 20 percent and 50 percent. This means that this percentage of the customers who begin the year with a firm will have left by the end of the year. According to Kumar & Reinartz (2012, p. 85) churn rate can be defined as ‘the number of existing customers who left by the end of a given period divided by the number of existing customer at the beginning of the respective period’. Annual churn rates of 20 percent to 38 percent are estimated for mobile telecom operators in Europe (Lemmens & Gupta, 2013). This is in line with Jahromi et al. (2010), who report annual churn rates of 30 percent for the telecommunication market, and Kumar & Reinartz (2012, p. 60), who argue that annual churn rates in the telecom industry can be as high as 40 percent. Therefore, the telecom market can be described as one of the markets suffering most from high annual churn rates. In order to accurately predict an individual’s likelihood to churn based on past customer behaviour, it is essential to understand which determinants are driving customers to leave.
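The churn rate definition quoted above can be written compactly as a formula. The symbols below are introduced here for illustration only and do not come from the cited sources: C_t is the number of existing customers who left by the end of period t and N_t is the number of existing customers at the beginning of period t.

```latex
\[
\text{churn rate}_t = \frac{C_t}{N_t},
\qquad \text{e.g.} \quad \frac{300}{1000} = 0.30 = 30\ \text{percent}
\]
```

The figures in the example are hypothetical; they are chosen only so that the result matches the order of magnitude of the annual churn rates reported above.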

2.2 Determinants of Churn

[...] (2013) finds that usage variables (e.g. how often a consumer calls, billing information and how frequently a customer uses a service) are relevant variables and seem to be the best explanatory variables of defection.

2.2.1 Recency

According to Fader et al. (2005) recency is the time period since the last purchase of the customer. In other words, a low recency implies that a short period of time has passed since a consumer’s last transaction. Many prior studies have used recency in CLV forecasts and argue that recency usually is a stronger indicator than frequency and monetary value (Fader et al. 2005). Buckinx & Van den Poel (2005) state that customers who have been active recently are more likely to be active in the future than inactive customers. Furthermore, the lower the recency value, the higher the likelihood that a customer stays loyal to the firm. One can argue that recency is a more important predictor in non-contractual settings than in contractual relationships. A relatively long lack of activity sends a signal to the company that the customer may have ended the relationship (Reinartz & Kumar, 2000). In line with the arguments above, Borle, Singh & Jain (2008) carried out research into CLV and find that longer interpurchase times are associated with a greater risk of leaving the firm and that the risk is equal for male and female customers.

A few years ago, Neslin et al. (2013) presented the ‘recency trap’. When customers do not make a purchase in a given period of time, their recency increases, and this makes them even less likely to purchase a good or a service in the following period. Consequently, the customer drifts away from the company. The CLV decreases and the customer becomes worthless to the firm. Based on the findings of Buckinx & Van den Poel (2005); Borle et al. (2008); Fader et al. (2005); Neslin et al. (2013) the following hypothesis is formed:

H1a: The longer it has been since the last experience with the company, the more likely the customer will churn.

2.2.2 Frequency

[...] Poel, 2005). In a wireless telecommunications study, one of the most relevant churn triggers is the amount of consumption over the previous three months (Lemmens & Croux, 2006). If the consumption decreases, there is a higher probability of defection because the customer uses the service less frequently. When the consumption is stable or increases, the customer is more likely to be loyal (Lemmens & Croux, 2006). In the telecom market, it is possible to predict customer churn by how frequently a customer calls or the number of minutes or data consumed per month (Verbeke et al. 2013). Frequency can capture the trends or evolution in call behaviour by monitoring the amount of data used and the increase in the duration of calls.

Verbeke et al. (2013) advise model builders to pay attention to usage attributes in modeling processes. A strong drop in total minutes called might be a strong indicator for customer churn. However, such a drop may also mean that the churn event has already taken place but is not yet registered in the database. In addition, a heavy peak in frequency of usage before defection may also be a good predictor. The peak in data, text or minutes may indicate that a customer is consuming the remaining minutes or data that were already paid for and has churned to a competitor (Verbeke et al. 2013). Based on the findings of Buckinx & Van den Poel (2005); Lemmens & Croux (2006); Lemon et al. (2002); Verbeke et al. (2013) the following hypothesis is tested:

H1b: The less frequent the experiences with the company, the more likely the customer will churn.

2.2.3 Monetary value

[...] consumers with lower valued propositions are more likely to churn. This is in line with Risselada et al. (2013), who provide empirical evidence suggesting that customers with a lower revenue on their fixed phone line have a greater likelihood to defect than those with a more expensive subscription.

It can be assumed that customers who spend more now will likely spend more in the future. A consumer’s monetary value is likely an effective predictor of the customer’s future behaviour, such as retention. Based on the findings of Buckinx & Van den Poel (2005); Lemmens & Croux (2006); Risselada et al. (2013); Verbeke et al. (2013) the following hypothesis will be tested:

H1c: The lower the customers’ monetary value, the more likely the customer will churn.

2.2.4 Length of relationship

The length of a relationship between a customer and a firm has received considerable attention from researchers and practitioners. The length of the relationship is defined as the number of days or months since the customer made his or her first purchase at a company (Buckinx & Van den Poel, 2005; Lemmens & Croux, 2006). Anderson & Weitz (1989) find that the length of the relationship is positively related to the expected future stability of the relationship. Furthermore, customers with longer relationships and high prior customer shares are less likely to churn (Verhoef, 2003). It can be argued that customers who have already stayed with the company for a longer period are more familiar with the procedures and more acquainted with the products or services (Reinartz & Kumar, 2000). Furthermore, a significant negative effect of relationship length on churn is presented in a study with Internet service provider data by Risselada et al. (2013). Hence, customers who have a long relationship with the company are less likely to defect in the near future. In line with the findings of Buckinx & Van den Poel (2005); Verhoef (2003); Risselada et al. (2013) the following hypothesis will be tested:

H1d: The shorter the customer-company relationship, the higher the likelihood of the customer churning.

2.2.5 Socio-demographics

[...] such as income and age are often selected as predictors in CRM models (Verhoef, Fransen & Hoekstra, 2001; Verhoef, 2003). In a study with Internet service provider customers, age is found to be a significant churn predictor. Customers older than 65 years have a lower churn probability compared to younger people (Risselada et al. 2013). As described in paragraph 2.2.3, Lemmens & Croux (2006) find that a customer is more likely to defect when his or her payment plan is cheaper. Furthermore, this effect tends to be stronger with younger customers compared to older ones (Lemmens & Croux, 2006). Previous research shows that age tends to be a good predictor for churn (Lemmens & Croux, 2006; Risselada et al. 2010; Verbeke et al. 2012) and therefore the following hypothesis will be tested:

H1e: The younger the customer, the more likely the customer will churn.

According to Borle et al. (2008) gender has an insignificant effect on churn, implying that males and females tend to have an equal risk of defection. Geographic location or household size is often used as an explanatory variable for defection (Neslin et al. 2006; Risselada et al. 2013). Nevertheless, Verbeke et al. (2013) argue that socio-demographic variables such as zip code, location or region may have less predictive power for churn. With the exception of age, it can be concluded that behavioural variables are better predictors for customer churn than socio-demographic ones (Buckinx & Van den Poel, 2005). Therefore, the other socio-demographic factors are used as control variables in this model. The control variables can also be used for face validity. Face validity is defined as the believability of the outcomes of the model; it tests whether or not the outcomes are in line with the prevailing theories (Leeflang et al. 2015, p. 152).

[...] new subscription from a competitor (Lemmens & Croux, 2006). Attributes such as the newness of the current equipment generally seem to be a highly important predictor to model churn (Verbeke et al. 2013). In line with the findings of Lemmens & Croux (2006); Mozer et al. (2000); Verbeke et al. (2013) the following hypothesis is tested:

H1f: Customers with an older cellular phone have a higher probability of churning.

After specifying the best predictors for customer churn which represent real-life as closely as possible, models are estimated and validated. Validation refers to the ability of the model with regards to the prediction of customer churning and to the data fit. However, validation does not account for the financial performance of the company, which can be achieved with the model. The following paragraph presents several evaluation measures concerning the financial performance of churn models.

2.3 Evaluation Criteria

After the estimation and validation of a churn model, the corresponding churn probability for each customer is calculated. Then, these individual churn probabilities are ordered and divided into groups or deciles (mostly ten groups). The groups are ordered from the customers who are least likely to churn to those who are most likely to churn. The aim is to create variation between the customers that have the highest probability to churn and the customers that have the lowest probability to churn by maximizing the separation between the groups or deciles (Blattberg et al. 2008, p. 318). The TDL was chosen because it emphasizes the top ten percent of customers most likely to churn, which is in line with the aim of this study. Furthermore, the TDL is often used as an evaluation technique in other studies (Holtrop et al. 2014; Lemmens & Croux, 2006; Neslin et al. 2006; Risselada et al. 2010).
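The ranking into deciles and the resulting top decile lift (the fraction of churners in the top decile divided by the fraction of churners in the whole set) can be illustrated with a short sketch. The function name, the handling of group sizes that are not a multiple of ten and the example data are assumptions made for illustration; they are not taken from the thesis.

```python
def top_decile_lift(churn_probs, churned):
    """Return the fraction of churners in the top decile divided by
    the fraction of churners in the whole set."""
    if not churn_probs or len(churn_probs) != len(churned):
        raise ValueError("inputs must be non-empty and of equal length")
    overall_rate = sum(churned) / len(churned)
    if overall_rate == 0:
        raise ValueError("the data set contains no churners")
    # Order customers from most likely to least likely to churn.
    ranked = sorted(zip(churn_probs, churned), key=lambda pair: pair[0], reverse=True)
    # The top decile is the highest-scoring ten percent of customers
    # (rounded down, with a minimum of one customer: an assumption).
    top_n = max(1, len(ranked) // 10)
    top_rate = sum(flag for _, flag in ranked[:top_n]) / top_n
    return top_rate / overall_rate


# Hypothetical data: 20 customers, 4 of whom churned (1 = churned).
probs = [0.95, 0.90, 0.80, 0.70, 0.60, 0.55, 0.50, 0.45, 0.40, 0.35,
         0.30, 0.28, 0.25, 0.22, 0.20, 0.15, 0.12, 0.10, 0.05, 0.02]
churned = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0,
           1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
tdl = top_decile_lift(probs, churned)
print(tdl)
```

With these made-up numbers both customers in the top decile churned, while 20 percent of all customers churned, so the lift equals 5. A lift of 1 would mean that the model ranks customers no better than random selection.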

2.3.1 Top Decile Lift

According to Leeflang et al. (2015, p. 322) the TDL is defined as the ‘fraction of churners in the top-decile divided by the fraction of churners in the whole set’. The goal was to achieve a [...]

A critical shortcoming of the traditional approach is that companies do not account for the value customers represent. Not all customers are worth the effort of developing a sustainable relationship with, and companies should instead focus on their most valuable clients (Buckinx & Van den Poel, 2005). Therefore, firms can be discretionary in terms of which specific customers to retain. According to Kumar & Shah (2009) this is important for companies that deal with large customer bases and where value between customers is skewed. These researchers argue that marketing activities, such as retention campaigns for churn, are less efficient if they are uniformly applied to all customers of the firm. In other words, retention campaigns are less wasteful if firms account for customers’ future value (Kumar & Shah, 2009). In order to make deliberate management investments in a profitable retention campaign, companies should simultaneously account for customers’ probability to churn and their value. This paper investigates a new approach (i.e. churn probability adjusted with customer value), which could lead to new insights in churn management and CLV. Since customer value is the extension in the new approach, it is interesting to further investigate the existing literature about customer equity and CLV. In the following paragraphs the current literature about these two concepts is presented.

2.4 Customer Equity

[...] the customer base on basis of importance (Rust et al. 2004). According to Srinivasan & Hanssens (2009), segmenting the customer base on the most profitable customers may affect the firm’s future risk and is an area for future research. This paper investigates this research suggestion by simultaneously accounting for churn probability and customer value. Therefore, the following hypothesis is tested:

H2: The unique customer IDs in the top decile of the probability approach and the top decile of the value approach significantly differ from each other.
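How far the two top deciles differ in unique customer IDs can be quantified as the share of customers in one top decile that does not appear in the other. The sketch below only computes that share; it is not the significance test used in the thesis, and all function names, IDs and scores are hypothetical.

```python
def top_decile_ids(scores):
    """Return the IDs of the ten percent of customers with the highest score.

    `scores` maps a customer ID to a score (a churn probability or a
    value-adjusted score)."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    top_n = max(1, len(ranked) // 10)
    return set(ranked[:top_n])


def share_not_in(ids_a, ids_b):
    """Share of the customers in ids_a that do not appear in ids_b."""
    return len(ids_a - ids_b) / len(ids_a)


# Hypothetical scores for 20 customers under the probability approach.
ids = [f"c{i:02d}" for i in range(1, 21)]
prob_scores = {cid: 1.0 - 0.04 * i for i, cid in enumerate(ids)}

# Hypothetical value-adjusted scores: c02 has little value, c15 a lot.
value_scores = dict(prob_scores)
value_scores["c02"] = 0.0
value_scores["c15"] = 2.0

prob_top = top_decile_ids(prob_scores)    # {"c01", "c02"}
value_top = top_decile_ids(value_scores)  # {"c15", "c01"}
different_share = share_not_in(value_top, prob_top)
print(different_share)
```

In this toy example one of the two customers in the value top decile is missing from the probability top decile, so the share of different customers is 0.5.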

Traditional retention strategies target groups of customers based on their churn risk (Lemmens & Croux, 2006). One can argue that targeting customers should not only depend on their churn probability but also on the customer’s value or monthly spending (Lemmens & Gupta, 2013). Segmenting customers based on their value will give companies new useful information about how to segment the customer base (Rust et al. 2004). However, Srinivasan & Hanssens (2009) argue that maximizing customer equity by focusing on the most profitable customers may lead to a narrowing of the customer base. Therefore, it remains unclear whether paying more attention to more valuable customers will lead to an increase of risk for a firm in the long run and whether it is effective to create segments (Srinivasan & Hanssens, 2009). To investigate the effectiveness of forming segments, an explorative research question is proposed next to the main research question, because a statistical test for this question is less feasible.

RQ1: To what extent do the segments of customers in the top decile of the probability approach and the top decile of the value approach differ from each other?

2.5 Conceptual Model

This study empirically examines the relationship of recency, frequency, monetary value, relationship length, age and newness of handset with the probability to churn. The probability to churn is adjusted with a customer value metric to simultaneously incorporate the likelihood to defect and customer value. The new approach is benchmarked against the traditional approach, where targeting is based only on the probability to churn. The operationalization of the customer value metric is further explained in section three.
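To make the difference between the two approaches concrete, the sketch below ranks three customers once by churn probability and once by a value-adjusted score. It assumes, purely for illustration, that the adjustment multiplies the churn probability by a monetary value per customer; the operationalization actually used in this study is the one described in section three, and all names and numbers here are hypothetical.

```python
def value_at_risk(churn_prob, customer_value):
    """Value-adjusted score: churn probability times customer value.

    The multiplicative form is an assumption made for this sketch."""
    return churn_prob * customer_value


# Hypothetical customers with a churn probability and a monetary value.
customers = [
    {"id": "A", "churn_prob": 0.80, "value": 10.0},
    {"id": "B", "churn_prob": 0.50, "value": 40.0},
    {"id": "C", "churn_prob": 0.30, "value": 20.0},
]

# Probability approach: rank on churn probability only.
by_probability = sorted(customers, key=lambda c: c["churn_prob"], reverse=True)

# Value approach: rank on the value-adjusted score.
by_value = sorted(
    customers,
    key=lambda c: value_at_risk(c["churn_prob"], c["value"]),
    reverse=True,
)

print([c["id"] for c in by_probability])
print([c["id"] for c in by_value])
```

Customer A is the most likely churner, but customer B represents more value at risk, so the two rankings put a different customer first.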

3. RESEARCH DESIGN

Section three discusses the methodology of this paper. The first paragraph describes the dataset used in this research. The next paragraph presents the variables included in the model. Then the customer churn models are discussed and the model is presented as a mathematical equation. Finally, the last paragraph discusses the methodology of the cluster analysis.

[Figure: conceptual model. Recency (H1a), frequency (H1b), monetary value (H1c), relationship length (H1d), age (H1e) and newness of handset (H1f) are related to churn.]

3.1 Dataset

In this study, a customer database of a telecom operator is used to analyse the churn behaviour of individual customers. This research started with a dataset which consists of three kinds of postpaid subscriptions. The first subscription is a sim-only offer, i.e. a subscription without a mobile device. The second subscription contains less individual consumer information than the third type. The third subscription contains many individual-level variables and is interesting for clustering purposes. The third type of subscription was chosen because of this rich individual-level data. To estimate the model, the sample of customers was balanced to 45 percent churners and 55 percent non-churners. In order to enable more accurate churn predictions, a balanced sample should be used (Donkers, Franses & Verhoef, 2003; Lemmens & Croux 2006; Risselada et al. 2013). The churn rate in the observation period was 13.4 percent. Churners were considered to be customers who made a churn request four months before the end date of their contract.
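One common way to obtain such a balanced estimation sample is to keep all churners and randomly undersample the non-churners. The thesis does not state how the 45/55 balance was reached, so the procedure below, including the function name, the fixed random seed and the toy data, is only a sketch under that assumption.

```python
import random


def balance_sample(customers, churner_share=0.45, seed=0):
    """Keep all churners and randomly undersample the non-churners so
    that churners make up roughly `churner_share` of the sample."""
    churners = [c for c in customers if c["churn"] == 1]
    non_churners = [c for c in customers if c["churn"] == 0]
    # Number of non-churners needed to reach the requested share.
    n_non = round(len(churners) * (1 - churner_share) / churner_share)
    if n_non > len(non_churners):
        raise ValueError("not enough non-churners for the requested share")
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return churners + rng.sample(non_churners, n_non)


# Hypothetical population of 1,000 customers with a 13.4 percent churn rate.
population = [{"id": i, "churn": 1 if i < 134 else 0} for i in range(1000)]
sample = balance_sample(population)

n_churners = sum(c["churn"] for c in sample)
print(len(sample), n_churners / len(sample))
```

Because the model is estimated on a sample with more churners than the population, predicted probabilities from such a sample overstate the population churn rate; the ranking of customers, which is what the top decile relies on, is what matters in this setting.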

3.2 Variables Description

The predictors included in the model were divided into relational characteristics and customer characteristics (Prins & Verhoef 2007). The relational characteristics consisted of the variables relationship length, recency, frequency and monetary value. The variable age was a customer characteristic. Customers who requested to churn were coded one on the churn variable; all other customers were coded zero. The full list of the variables and their explanation is provided in table A1 in Appendix A.

3.2 Creating New Variables

[...] alternatively used slower, free Internet. The second type consists of customers who have used the standard type of internet with additional payment. Finally, the dummy lte_frequency_nl was created to identify whether or not customers have a 4G capable device.

3.3 Missing Values, Oddities and Outliers

After creating new variables, the data was investigated for missing values, oddities and outliers. First of all, missing values were identified. The variables LTE_upload_speed, Income, Educat, Work_Sit, HH_Car, Housetype, Consuction_year, Size_m2, Value_House_class, Last_move, Telephone, and Dealer_level_3 contain around 80 percent missing values and were therefore dropped from this study. Secondly, oddities were identified by investigating the range of the variables. A total of eight customers were found with values of 48, 52 and 68 days in the variable days_national_usage. This variable cannot exceed 31 because a month has at most 31 days. Another oddity is a device launch date in the year 1900, because mobile phones did not exist in that period. A total of 24 cases were reported with that launch year. Furthermore, negative ARPU was found in the dataset. A negative ARPU can mean that customers have been refunded by the company. Age values of 114 and 262 also appeared in the dataset. These values were replaced with the average age of 41. Third, outliers were detected by making use of boxplots. Field & Hole (2002, p. 122) argue that outliers can bias the mean of a variable and inflate the standard deviation. Outliers were found in the variables days_national_usage and device_newness. The variables day_national_usage_clean and device_newness_clean were created by replacing the outlier values with the average value of the variable. Before subsequent steps were undertaken, it was first investigated whether the outliers have a different effect on the estimates.
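The two cleaning steps described above, dropping variables with a large share of missing values and replacing impossible values with the average, can be sketched as follows. The threshold of 75 percent, the choice to compute the average over the valid values only, and the example data are assumptions for illustration; the thesis does not specify these details.

```python
def missing_share(values):
    """Share of missing (None) entries in a list of values."""
    return sum(v is None for v in values) / len(values)


def replace_out_of_range(values, low, high):
    """Replace values outside [low, high] with the mean of the valid values."""
    valid = [v for v in values if low <= v <= high]
    if not valid:
        raise ValueError("no valid values to compute a mean from")
    mean_valid = sum(valid) / len(valid)
    return [v if low <= v <= high else mean_valid for v in values]


# Hypothetical columns: Income is mostly missing, age is mostly complete.
columns = {
    "Income": [None, None, None, None, 52000],
    "age": [34, 41, None, 29, 57],
}
max_missing = 0.75  # hypothetical cut-off for dropping a variable
kept = [name for name, vals in columns.items() if missing_share(vals) <= max_missing]

# A month has at most 31 days, so 48 and 68 are treated as oddities.
days_national_usage = [12, 30, 48, 25, 68, 31]
days_clean = replace_out_of_range(days_national_usage, 0, 31)

print(kept)
print(days_clean)
```

Here Income is dropped because 80 percent of its values are missing, and the two impossible day counts are replaced by 24.5, the mean of the four valid values.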

3.4 Customer Churn Models

Baesens, 2014). Neslin et al. (2006) compared the accuracy of different predictive models. Extensions of the churn models, such as bagging and boosting, have been developed to increase prediction accuracy (Lemmens & Croux, 2006). Risselada et al. (2010) investigated the staying power (i.e. the predictive performance of a model in a given time period after the estimation period) of churn models.

An important finding of Neslin et al. (2006) is that methods do matter: the predictive accuracy of the model determines whether the right customers are targeted, and it can therefore affect the profitability of churn management. Furthermore, decision trees and logistic regression are found to perform well on predictive power (Neslin et al. 2006), comprehensibility and operational efficiency (Verbeke et al. 2013). This research uses these two methods for these reasons.

3.4.1 Classification trees

Decision trees, or regression trees, are a data mining tool commonly used by researchers and practitioners (Neslin et al. 2006). Using a decision tree is advantageous in a number of ways. First of all, trees are easy to implement and to interpret managerially. Moreover, trees perform well on large datasets and are flexible because non-linearities and interactions can be detected (Blattberg et al. 2008, p. 431). In several studies the decision tree outperformed the logistic regression (Perlich, Provost & Simonoff, 2003; Risselada et al. 2013), especially in telecommunication studies (Verbeke et al. 2013).

Several decision tree methods are available, such as Chi-Square Automatic Interaction Detection (CHAID), Exhaustive CHAID and the Classification And Regression Tree (CART). For an in-depth discussion of these methods one can refer to Blattberg et al. (2008, p. 431). This research applies these methods, which use different splitting rules (i.e. tree types), to find real patterns in the data and to investigate similarities among the methods. One can argue that decision trees were basically developed as an exploratory data mining tool and are therefore used to find interesting explanatory variables as input for other data mining tools such as the logistic regression (Blattberg et al. 2008, p. 448).
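To make the notion of a splitting rule concrete, the following Python sketch computes the reduction in Gini impurity for a split on a binary predictor, which is the kind of criterion commonly associated with CART (CHAID relies on chi-square tests instead). The function names and the toy data are assumptions for illustration and do not reproduce the software used in this research.

```python
def gini(labels):
    """Gini impurity of a list of binary labels (1 = churn, 0 = no churn)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2 * p * (1 - p)

def split_gain(feature, labels):
    """Reduction in Gini impurity obtained by splitting on a binary feature."""
    left = [y for x, y in zip(feature, labels) if x == 0]
    right = [y for x, y in zip(feature, labels) if x == 1]
    n = len(labels)
    weighted = len(left) / n * gini(left) + len(right) / n * gini(right)
    return gini(labels) - weighted

# Hypothetical toy data: the first dummy separates churners perfectly,
# the second carries no information about churn.
churn = [1, 1, 0, 0]
informative_dummy = [0, 0, 1, 1]
uninformative_dummy = [0, 1, 0, 1]

gain_informative = split_gain(informative_dummy, churn)
gain_uninformative = split_gain(uninformative_dummy, churn)
```

A tree algorithm of this type would place the predictor with the larger gain higher in the tree.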

3.4.2 Logistic regression

academics and practitioners in comparison with the 23 percent use of decision trees. One may prefer the logit as a method because of its mathematical convenience (Verbeke et al. 2013). The logit model is also less sensitive to minor changes than decision trees: a small change can alter the whole structure of a tree (Risselada et al. 2013).

In order to satisfy the binary event, the logistic regression uses a cumulative distribution function where the resulting churn probabilities are larger than zero and smaller than one. After investigating the outcomes of the decision trees, the following mathematical model is specified:

$$
\ln\left(\frac{P_i}{1 - P_i}\right) = \alpha + \sum_{j=1}^{4} \beta_{1j} PR_i + \sum_{j=1}^{7} \beta_{2j} DD_i + \sum_{j=1}^{8} \beta_{3j} DBSG_i + \beta_4 R_i + \beta_5 CD_i + \beta_6 RL_i + \beta_7 MV_i + \beta_8 S_i + \beta_9 DN_i + \beta_{10} F_i + \beta_{11} A_i
$$

Where:

P_i = predicted probability to churn

α = constant

i = subscriber id

PR_i = proposition, with category 5 as benchmark

DD_i = dealer detailed, with category 8 as benchmark

DBSG_i = data bundle size groups, with category 9 as benchmark

R_i = recency, dummy that indicates whether customers have not renewed their subscription before (0) or have renewed before (1)

CD_i = dummy for contract duration that indicates whether customers have a subscription of one year (0) or two years (1)

RL_i = relationship length

MV_i = monetary value

S_i = dummy for sleeper that indicates whether the customer is a non-sleeper (0) or a sleeper (1)

DN_i = device newness

F_i = frequency

A_i = age
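For reference, the log-odds specification can be solved for the churn probability itself. In the expression below, z_i is shorthand introduced here (an assumption, not notation from the thesis) for the full right-hand side of the model, i.e. the constant plus all weighted predictors of subscriber i:

$$
P_i = \frac{1}{1 + e^{-z_i}}, \qquad z_i = \ln\left(\frac{P_i}{1 - P_i}\right)
$$

This form shows why the predicted probability always lies strictly between zero and one: as z_i becomes very negative P_i approaches zero, and as z_i becomes very large P_i approaches one.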

After specification and estimation of a churn model, the probability to churn for each customer is multiplied by the ARPU of the individual. Previous research has often used a constant profit margin per user and ignored ARPU. McCarthy, Fader & Hardie (2015) warn researchers about this because customer profitability is often very noisy and rarely constant over time. ARPU is a reliable and stable metric and is therefore used in this research (McCarthy et al. 2015). The new statistic (i.e. adjusted probability) was used to identify the customers with a high probability to churn while simultaneously accounting for their value. Thereafter the customers were divided, based on their probability and ARPU, into ten equal groups to investigate the new top decile.
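The value adjustment can be sketched in a few lines of Python. The sketch assumes that the adjusted probability is simply the churn probability times ARPU, as described above; the customer ids, probabilities and ARPU values are hypothetical, and `top_decile` is an illustrative helper rather than the routine used in this research.

```python
def adjusted_scores(probabilities, arpu):
    """Churn probability multiplied by ARPU for each customer."""
    return [p * a for p, a in zip(probabilities, arpu)]

def top_decile(ids, scores):
    """Ids of the 10 percent highest scored customers (at least one customer)."""
    ranked = sorted(zip(scores, ids), reverse=True)
    k = max(1, len(ranked) // 10)
    return [customer_id for _, customer_id in ranked[:k]]

# Hypothetical toy data for ten customers.
ids = list(range(10))
probabilities = [0.90, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30, 0.20, 0.10, 0.05]
arpu = [10, 15, 20, 60, 25, 30, 20, 15, 40, 35]

probability_top = top_decile(ids, probabilities)
value_top = top_decile(ids, adjusted_scores(probabilities, arpu))
```

In this toy example the probability approach selects customer 0, whereas the value approach selects customer 3, who has a lower churn probability but a much higher ARPU.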

3.5 Cluster Analysis

The final analysis was a segmentation to identify the types of profiles in the top decile of the probability approach and the top decile of the value approach. The purpose of the cluster analysis was to divide the top deciles into clusters that are homogeneous within and heterogeneous between clusters. The cluster analysis started with a hierarchical clustering method to find the optimal number of segments and was fine-tuned with a non-hierarchical clustering method. A limitation of the hierarchical clustering method is that customers cannot be reassigned to other clusters (Mooi & Sarstedt, 2011, p. 244). In other words, once customers are linked together they can never be separated. To deal with this limitation, a non-hierarchical clustering method was performed, which made it possible to move linked customers to other clusters. A drawback of the non-hierarchical method is that the number of clusters has to be pre-specified (Malhotra, 2008, p. 603; Mooi & Sarstedt, 2011, p. 254). Therefore, the hierarchical method was used to determine the optimal number of clusters, and this number was used as input for the non-hierarchical method.

In the hierarchical clustering step a commonly used approach, Ward’s method, was applied because it has two advantages. Firstly, Ward’s method has been shown to perform better than other methods (Malhotra, 2008, p. 603). Secondly, this procedure is often used as a hierarchical precursor to non-hierarchical clustering methods for dividing the data into a given number of clusters (Johnson & Wichern, 1998, p. 693; Malhotra, 2008, p. 603). Scores on each variable were transformed to Z-scores. Through standardization the variables have a mean of zero and a variance of one, so that they contribute equally.
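The standardization step can be written out as a short Python sketch. It assumes the population standard deviation, so that the resulting variance is exactly one; statistical packages may use the sample standard deviation instead. The relationship lengths are hypothetical.

```python
from statistics import mean, pstdev

def z_scores(values):
    """Standardize values to mean zero and (population) variance one."""
    m = mean(values)
    s = pstdev(values)
    return [(v - m) / s for v in values]

# Hypothetical relationship lengths in months.
relationship_length = [20, 55, 116, 30, 79]
standardized = z_scores(relationship_length)
```

After this transformation a variable measured in months and a variable measured in years carry the same weight in the distance calculation.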

therefore favourable to apply to large datasets (Johnson & Wichern, 1998, p. 696). Secondly, this algorithm is less affected by the presence of irrelevant variables and by outliers (Mooi & Sarstedt, 2011, p. 259). Starting from the number of clusters found in the hierarchical method, the K-means clustering algorithm converges when no further reassignment of customers to segments is possible (Wagstaff et al. 2001). Furthermore, because of computation time the cluster analysis was performed on a random sample of 25 percent taken from the probability approach top decile and another random sample of 25 percent taken from the value approach top decile.
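The convergence criterion mentioned above (no customer changes segment) is visible in a minimal K-means sketch. This is a plain textbook version written for illustration, not the implementation used in this research: the starting centroids are simply the first k points, which is an assumption, and the two-dimensional points are hypothetical.

```python
def kmeans(points, k, max_iter=100):
    """Basic K-means; returns (assignment, centroids)."""
    # Assumption: the first k points serve as deterministic starting centroids.
    centroids = [list(p) for p in points[:k]]
    assignment = None
    for _ in range(max_iter):
        # Assign every point to its nearest centroid (squared Euclidean distance).
        new_assignment = [
            min(range(k), key=lambda c: sum((x - m) ** 2 for x, m in zip(p, centroids[c])))
            for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed segment
        assignment = new_assignment
        # Move every centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids

# Hypothetical standardized scores on two variables.
points = [(0, 0), (0, 1), (10, 10), (10, 11), (1, 0), (11, 10)]
assignment, centroids = kmeans(points, k=2)
```

Unlike in hierarchical clustering, the second point is first placed in one segment and then reassigned to the other once the centroids have moved.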

To summarize, the process started with cleaning the data by investigating the outliers, oddities and missing values. Then general descriptives were provided to get an initial picture of the dataset. Three decision tree methods were used to mine the data for relevant predictors as input for the logistic regression. After the estimation of the logit model, the churn probabilities were adjusted for customer value. Two top deciles were created, one with customers selected by the probability approach and one with customers selected by the value approach. The two top deciles were compared with respect to their unique customers. Finally, a cluster analysis was executed for both top deciles to investigate the different profiles in the two groups.

4. RESULTS

This chapter provides a general view of the data. Then, the results of the decision trees and logit model are discussed. For both the top decile of the probability and value approach a cluster analysis is performed and the results are presented. Finally, the results of the hypotheses are summarized.

4.1 Pearson Correlation Coefficients

= .002, p = .290) and voice_mail_minutes (r = .001, p = .578). This suggests that days_national_data and roaming_sms are less likely potential predictors of churn.

4.2 General Descriptives

The dataset consisted of 192,692 females (48.2%) and 206,149 males (51.6%) with an average age of 41. A total of 176,540 customers (44.2%) submitted a churn request in the analyzed period, whereas 222,824 customers (55.8%) did not request to churn. The average relationship length of the customers with the company was 50 months. The smartphone brands Samsung (43.2%), Apple (36.3%) and HTC (7.1%) were the most frequently chosen brands.

FIGURE 2: Data bundle sizes

Figure 2 shows the percentage of churners and non-churners on the Y-axis and the data bundle groups on the X-axis. When the variable data_bundle_size_groups was split into multiple categories, the number of churners relative to non-churners increased in the categories 4,001-6,000 MB and 10,000-16,000 MB.

4.3 Decision Trees

TABLE 2: Results decision trees

| Variable | CHAID | Exhaustive CHAID | CART |
|---|---|---|---|
| Proposition | ✔ | ✔ | ✔ |
| Dealer_detailed | ✔ | ✔ | ✔ |
| Data_bundle_size_groups | ✔ | ✔ | ✔ |
| Dummy_renew_before | ✔ | ✔ | ✔ |
| Contract_duration | ✔ | ✔ | ✔ |
| Relationship_length | ✔ | ✔ | ✔ |
| Dummy_sleeper | ✔ | ✔ | ✔ |
| Device_newness_index | ✔ | ✔ | ✔ |
| Days_national_usage | ✔ | ✔ | |
| National_data_mbs | ✔ | | |
| Age | | ✔ | |

Legend: potential relevant predictors found in decision trees.

After trying different splitting rules, all parent nodes in the three models were set to 1200 and the child nodes to 600 in order to obtain the most organized and insightful tree. One can see from table 2 that the predictors proposition, dealer_detailed, data_bundle_size_groups, dummy_renew_before, contract_duration, relationship_length, dummy_sleeper and device_newness were often used to split the nodes. According to the three models these determinants were the most important predictors and ended up high in the tree. The variable days_national_usage was found to be a strong predictor according to CHAID and Exhaustive CHAID. CHAID also reported the variable national_data_mbs as an important variable. The Exhaustive CHAID method placed the variable age high in the tree.

To show that the three decision tree techniques outperformed a null model, hit-rates were computed and are reported in table 3. In order to mine the data easily, the set was partitioned into a mutually exclusive training (70%) and validation (30%) sample. The training set was used to build the decision tree models, and their accuracy was checked on the validation data.

TABLE 3: Hit-rates Decision Trees

| Sample | Observed | CHAID hit-rate | Exhaustive CHAID hit-rate | CART hit-rate |
|---|---|---|---|---|
| Training | No_churn | 74.7% | 74.5% | 73.7% |
| | Churn_request | 85.9% | 86.1% | 87.2% |
| | Overall Percentage | 79.7% | 79.6% | 79.7% |
| Validation | No_churn | 74.5% | 74.3% | 73.8% |
| | Churn_request | 85.9% | 86.0% | 87.5% |
| | Overall Percentage | 79.6% | 79.5% | 79.8% |

The overall percentages in table 3 represent the share of cases in which the model predicted churn request and no churn correctly; in the training sample these are 79.7 percent (CHAID), 79.6 percent (Exhaustive CHAID) and 79.7 percent (CART). These percentages were higher than the 61.9 percent of churners predicted as churners in the research of Glady et al. (2009). The TDLs are 1.899 (CHAID), 1.905 (Exhaustive CHAID) and 2.205 (CART), respectively, and are in line with the 2.25 reported by Risselada et al. (2010). One can see that all three models predicted churn requests better than no churn. Furthermore, the models were accurate since the overall percentages of the three techniques in the training and validation sets are in line with each other. An issue that can arise when testing the model on a training and validation set is overfitting. Overfitting occurs when the model finds a unique idiosyncratic combination of predictors in the training set and generalizes poorly to new samples such as a validation sample (Blattberg et al. 2008, p. 294). According to table 3 overfitting was not an issue because the models performed well on both the training and the validation sample. The predictors that occurred often in the trees were used in the logit model.

4.4 Logistic Regression

After identifying the relevant predictors of churn with the decision trees and reviewing the relevant predictors in the literature, the independent variables were included in the logistic regression.

4.4.1 Evaluation of the overall model

First, the overall model is evaluated. The Omnibus test was highly significant at the one percent significance level for the logit model (p-value = .000 < .01). Therefore, it can be stated that the model performed significantly better than the constant-only model. In order to evaluate how well the model fitted the data, the McFadden’s, Nagelkerke and Cox & Snell R-squares were calculated.

higher the value of the R-square, the better the fit of the model. In other words, the model explains nearly 39-52 percent of the variance in defection more than a null model would explain.

Another diagnostic used was the hit rate. To assess the robustness of the model, the set was partitioned randomly into a mutually exclusive training (70%) and validation (30%) sample (Lemmens & Croux, 2006; Risselada et al. 2013; Neslin et al. 2013). The training set was used to build the logistic regression, and its accuracy was checked on the validation data. According to Leeflang et al. (2015, p. 269) the hit rate is defined as the percentage of correctly classified observations; it is presented in table 4.

TABLE 4: Hit-rates Logistic Regression

| Sample | Observed | Predicted No_churn | Predicted Churn_request | Percent correct |
|---|---|---|---|---|
| Null | No_churn | 222,824 | 0 | 100% |
| | Churn_request | 176,540 | 0 | 0% |
| | Overall Percentage | | | 55.8% |
| Training | No_churn | 172,579 | 50,245 | 77.5% |
| | Churn_request | 34,007 | 142,533 | 80.7% |
| | Overall Percentage | | | 78.9% |
| Validation | No_churn | 50,279 | 16,142 | 75.7% |
| | Churn_request | 8,482 | 44,180 | 83.9% |
| | Overall Percentage | | | 79.3% |

Legend: hit-rates of the logit model.
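The percentages in table 4 follow directly from the classification counts. The short Python sketch below recomputes them; the function name `hit_rates` and the argument names are illustrative assumptions, while the counts are those reported in table 4.

```python
def hit_rates(tn, fp, fn, tp):
    """Return (no-churn hit rate, churn hit rate, overall hit rate) in percent.

    tn: observed no churn, predicted no churn
    fp: observed no churn, predicted churn request
    fn: observed churn request, predicted no churn
    tp: observed churn request, predicted churn request
    """
    no_churn = 100 * tn / (tn + fp)
    churn = 100 * tp / (fn + tp)
    overall = 100 * (tn + tp) / (tn + fp + fn + tp)
    return round(no_churn, 1), round(churn, 1), round(overall, 1)

training = hit_rates(tn=172579, fp=50245, fn=34007, tp=142533)
validation = hit_rates(tn=50279, fp=16142, fn=8482, tp=44180)
print(training)    # (77.5, 80.7, 78.9)
print(validation)  # (75.7, 83.9, 79.3)
```

The null model classifies every customer as a non-churner, so its overall hit rate equals the share of non-churners (55.8 percent).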

FIGURE 5: Cumulative lift curve

Another model comparison statistic is the cumulative lift curve, which is presented in figure 5. This figure shows the cumulative percentage of churners on the Y-axis and the deciles on the X-axis. The TDL in this study was 1.9, which is in line with the TDL of 1.5 in the study of Risselada et al. (2010). This TDL means that the 10 percent highest scored customers capture 19 percent of the churners. The lift curve was benchmarked against the base case in which every customer has the same probability to churn and the predictions are based on a random selection (Leeflang et al. 2015, p. 324). This random model is shown as the dotted line. A higher lift indicates better prediction (Jamal & Bucklin, 2006), and the estimated model outperformed the random model.
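The top-decile lift (TDL) can be computed as the churn rate among the 10 percent highest scored customers divided by the churn rate in the whole sample. The Python sketch below illustrates this under that definition; the scores and churn indicators are hypothetical and `top_decile_lift` is an illustrative helper.

```python
def top_decile_lift(scores, churned):
    """Churn rate in the top 10 percent of scores divided by the overall churn rate."""
    n = len(scores)
    k = max(1, n // 10)
    ranked = sorted(zip(scores, churned), key=lambda pair: pair[0], reverse=True)
    top_rate = sum(label for _, label in ranked[:k]) / k
    base_rate = sum(churned) / n
    return top_rate / base_rate

# Hypothetical toy data: 20 customers, 5 churners, both top-scored customers churn.
scores = [i / 20 for i in range(20)]
churned = [1 if i in (0, 5, 10, 18, 19) else 0 for i in range(20)]

lift = top_decile_lift(scores, churned)
captured_share = lift * 0.10  # share of all churners found in the top decile
```

The last line reflects the interpretation used above: a lift of 1.9 corresponds to the top decile capturing 1.9 × 10 = 19 percent of all churners.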

4.4.2 Parameter assessment

Table 5 shows the parameters of the logit model. The sign, size and significance of the most remarkable estimates are discussed. The variables marked with two stars were statistically significant at the one percent significance level; this notation applies to all subsequent tests in this paper.

TABLE 5: Parameter estimates

| Predictor | Beta | Exp(B) | Wald | p-value | VIF |
|---|---|---|---|---|---|
| Constant | 2.126 | 8.382 | 1964.418 | .000** | |
| Proposition (ref. cat. ‘5’) | | | 849.884 | .000** | 1.268 |
| A(1) | -.835 | .434 | 69.987 | .000** | |
| B(2) | .275 | 1.317 | 48.056 | .000** | |
| C(3) | .158 | 1.171 | 3923.187 | .000** | |
| D(4) | -1.545 | .213 | 3902.022 | .000** | |
| Dealer_detailed (ref. cat. ‘8’) | | | 1068.245 | .000** | 1.081 |
| Company shop(1) | -.332 | .718 | 122.988 | .000** | |
| Retail shop(2) | -.360 | .698 | 135.962 | .000** | |
| Telesales(3) | -.223 | .800 | 50.544 | .000** | |
| Consumer_dealers(4) | -.019 | .982 | .363 | .547 | |
| Online_shop(5) | -.290 | .748 | 85.862 | .000** | |
| Traders(6) | .195 | 1.215 | 29.722 | .000** | |
| Retail(7) | -.143 | .867 | 19.924 | .000** | |
| Data_bundle_size_groups (ref. cat. ‘9’) | | | 1992.745 | .000** | 1.233 |
| 0_MB(1) | -.912 | .402 | 1106.245 | .000** | |
| 1-500_MB(2) | -.323 | .724 | 364.075 | .000** | |
| 501-1000_MB(3) | -.040 | .960 | 1.755 | .185 | |
| 1001-2000_MB(4) | -.586 | .557 | 754.432 | .000** | |
| 2001-3000_MB(5) | -.330 | .719 | 248.982 | .000** | |
| 3001-4000_MB(6) | -.804 | .448 | 177.275 | .000** | |
| 4001-6000_MB(7) | -.105 | .900 | 33.739 | .000** | |
| 6001-10000_MB(8) | -.672 | .511 | 251.420 | .000** | |
| Dummy_renew_before | .362 | 1.436 | 1140.228 | .000** | 1.503 |
| Dummy_contract_duration | .121 | 1.129 | 53.311 | .000** | 1.045 |
| Relationship_length | -.004 | .996 | 1146.034 | .000** | 1.474 |
| ARPU_index | -.001 | .999 | 14.694 | .000** | 1.114 |
| Dummy_sleeper | -3.522 | .030 | 26779.146 | .000** | 1.287 |
| Device_newness_index | .001 | 1.001 | 15.647 | .000** | 1.154 |
| Days_national_usage | -.023 | .977 | 1335.464 | .000** | 1.069 |
| Age | -.004 | .996 | 156.990 | .000** | 1.147 |

Note: R2 = .357 (McFadden’s R2), .387 (Cox & Snell), .519 (Nagelkerke). Model χ2 = 194464.266, p < .001. The groups marked with ** are statistically significant at the .01 level.

Legend: parameter estimates and validation diagnostics of the logit model.

consumer became a sleeper, then the odds for churning would be expected to decrease by a factor of .030 when the other variables in the model are held constant. This interpretation applies to each significant variable in the logit model.
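The factor by which the odds change is the exponentiated coefficient, Exp(B) = e^Beta. The following Python sketch reproduces three of the Exp(B) values from the Beta column of table 5; the dictionary and function names are illustrative assumptions.

```python
import math

def odds_ratio(beta):
    """Multiplicative change in the odds of churning for a one-unit increase."""
    return math.exp(beta)

# Beta values as reported in table 5.
betas = {
    "Dummy_sleeper": -3.522,
    "Dummy_renew_before": 0.362,
    "Days_national_usage": -0.023,
}
odds_ratios = {name: round(odds_ratio(beta), 3) for name, beta in betas.items()}
```

A value below one (e.g. about .030 for the sleeper dummy) means the odds of churning decrease, while a value above one (e.g. about 1.436 for the renew dummy) means they increase.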

An issue that could have arisen was that a predictor variable in the model was highly correlated with one or more other predictor variables in the model (Leeflang et al. 2015, p. 110). A consequence of such multicollinearity is that parameter estimates become unreliable. Multicollinearity was examined with the Variance Inflation Factor (VIF). Since the collinearity statistics in regression concern the relationships among the independent variables, the dependent variable could be ignored (IBM, 2014). Therefore the same predictors as used in the logit model were estimated in a regression. A VIF greater than five means that multicollinearity is a problem (Leeflang et al. 2015, p. 140). One can see that the VIF values in table 5 were all below five, and therefore no multicollinearity was found in the model.
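For reference, the standard definition of the VIF for predictor j is given below, where R_j^2 denotes the R-square obtained when predictor j is regressed on all other predictors (the symbol R_j^2 is introduced here for illustration):

$$
\mathrm{VIF}_j = \frac{1}{1 - R_j^2}
$$

Under this definition, the threshold of five corresponds to R_j^2 = .80. The largest VIF in table 5 (1.503 for Dummy_renew_before) corresponds to R_j^2 = 1 - 1/1.503 ≈ .33.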

4.4.3 Robustness check

An important requirement of a good model is robustness, which is a quality characteristic for a logical and meaningful range of values (Little, 1970). To assess the robustness of the model, the data was randomly divided into two subsamples (i.e. training and validation sample). This robustness check is often applied in database marketing research (Lemmens & Croux, 2006; Risselada et al. 2013; Neslin et al. 2013). The model found similar estimation results for both subsamples, which supports the stability of the results. Another check for robustness is whether or not the results were consistent with previous research (Nitzan & Libai, 2011). The model in this research found predictors comparable to those in two similar churn prediction studies executed by the same company that provided the dataset for this research. Predictors such as dealer_detailed, relationship_length, ARPU, proposition, days_national_usage and age are common to these studies. These robustness checks confirmed that the model consisted of relevant constructs and produced logical results.

had a different effect on the estimates. It was found that the original and cleaned variables had not changed the outcome of the analysis. A plausible reason for this result could be the large number of customers in this dataset. Therefore, it was decided not to use the cleaned variables but to retain the original variables because the outliers had not influenced the outcomes in a significant manner.

4.5 Adjust for Customer Value

The following step was to analyse whether or not the probability to churn and the probability adjusted for customer value differed concerning the unique customers in their top deciles. One can argue that churn is a function of ARPU, and that adjusting the probability with ARPU could therefore be problematic. The effect of ARPU as a driver of churn was significant; however, the relative impact of ARPU on churn was almost nil and ARPU was not a main driver of churn. Hence, the assumption was made that ARPU as a driver had no effect on the weighting of churn probability by ARPU.

A Chi-Squared Test of Independence was performed with unique customer ids and groups (1 = probability to churn, 2 = probability times value). The Chi-square test was significant, Chi-Square (1) = 139855.717, p < .001. The unique customer ids in the top decile based on probability to churn differed from those in the top decile based on probability times value. Interestingly, 79.7 percent of the customers in the value approach top decile did not appear in the top decile of the probability-only approach. The value at risk increased by 127.71 percent. Given that the unique customer ids differ between the two top deciles, it became interesting to investigate whether different customer profiles could be found in the two top deciles.
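The share of customers that appear in one top decile but not in the other is a simple set comparison. The Python sketch below illustrates the calculation with hypothetical customer ids; `share_not_in` is an illustrative helper and does not reproduce the 79.7 percent reported above.

```python
def share_not_in(reference, other):
    """Share of customers in `other` that do not appear in `reference`."""
    reference = set(reference)
    other = set(other)
    return len(other - reference) / len(other)

# Hypothetical customer ids in the two top deciles.
probability_top_decile = [1, 2, 3, 4, 5]
value_top_decile = [4, 5, 6, 7, 8]

unique_share = share_not_in(probability_top_decile, value_top_decile)
```

In this toy example three of the five customers in the value approach top decile are missing from the probability approach top decile, a share of 60 percent.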

4.6 Customer Profiles

national_minutes or international_minutes were excluded from the cluster analysis due to outliers. These outliers influenced the cluster analysis in the following way: they produced a cluster with a very small number of customers, which is not practical to target from a managerial perspective (Mooi & Sarstedt, 2011, p. 255). Therefore active variables without outliers were used to estimate the cluster analysis as genuinely as possible.

4.6.1 Cluster analysis value approach

The clustering procedure started with a hierarchical clustering method to find the optimal number of clusters. The optimal number of segments was based on the results of the agglomeration schedule, dendrogram and scree plot. Firstly, the agglomeration schedule in table B1 in Appendix B was investigated, which shows the clusters or customers combined at each stage (Mooi & Sarstedt, 2011, p. 267). For example, in stage 9931, customers 123 and 225 were merged at a distance of 1702.232. Within the agglomeration schedule it is meaningful to look at the changes in the coefficients (i.e. distances), because big changes in the values indicate big losses of information. A big change in value is shown from stage 9942 (7711.267) to stage 9943 (13062.406). Lastly, the dendrogram was analysed; it is in line with the results of the agglomeration schedule and scree plot and supports the optimal number of three clusters. The dendrogram was too large and was therefore not added to the appendix. One can argue that the optimal number of clusters was three because a lot of information would be lost with a two-cluster solution.
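The reasoning based on the agglomeration schedule can be sketched in Python: find the merge stage with the largest increase in the coefficient and keep the number of clusters that existed just before that merge. The sketch assumes 9945 clustered customers (the sum of the three cluster sizes reported for the value approach); the coefficients of stages 9942 and 9943 are taken from the text, whereas the coefficient of stage 9941 is a hypothetical value added only to make the example complete.

```python
def clusters_before_largest_jump(n_objects, coefficients):
    """Number of clusters remaining just before the merge with the largest increase.

    coefficients: dict mapping consecutive stage numbers to agglomeration coefficients.
    After stage s, n_objects - s clusters remain.
    """
    stages = sorted(coefficients)
    jumps = {
        stage: coefficients[stage] - coefficients[previous]
        for previous, stage in zip(stages, stages[1:])
    }
    worst_stage = max(jumps, key=jumps.get)
    return n_objects - (worst_stage - 1)

coefficients = {
    9941: 6400.0,      # hypothetical value, not reported in the text
    9942: 7711.267,    # reported in the text
    9943: 13062.406,   # reported in the text
}
n_clusters = clusters_before_largest_jump(9945, coefficients)
```

With these inputs the largest jump occurs at stage 9943, the merge that would reduce three clusters to two, so the sketch returns three clusters.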

Furthermore, by plotting the coefficients (i.e. distances) against the number of segments, a scree plot was generated, which is shown in figure B1 in Appendix B. One can interpret the scree plot by looking for the distinct break (i.e. the elbow). The number of segments prior to this strong increase is the most probable solution (Mooi & Sarstedt, 2011, p. 270). As described in chapter three, with hierarchical methods an object remains in its segment: according to Mooi & Sarstedt (2011, p. 244) there is no possibility to reassign an object to another cluster. To deal with this limitation and fine-tune the cluster analysis, the subsequent step was to optimize the cluster solution with a non-hierarchical clustering method.

their significant differences. The variables relationship_length, device_newness and age were metric variables and therefore a one-way ANOVA test was most suitable; the results are shown in table B2 in Appendix B. This test was significant at the one percent significance level for relationship length (F(2, 9942) = 24177.389, p = .000), device newness (F(2, 9942) = 10.866, p = .000) and age (F(2, 9942) = 92.773, p = .000). Hence, the clusters differed significantly from each other on relationship length, device newness and age.
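The F statistic of a one-way ANOVA compares the variation between cluster means with the variation within clusters. The Python sketch below shows the textbook calculation on hypothetical data for three small groups; it is meant as an illustration and is not the routine used to produce table B2.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of metric values."""
    k = len(groups)
    n = sum(len(group) for group in groups)
    grand_mean = sum(sum(group) for group in groups) / n
    ss_between = sum(len(group) * (mean(group) - grand_mean) ** 2 for group in groups)
    ss_within = sum((x - mean(group)) ** 2 for group in groups for x in group)
    df_between, df_within = k - 1, n - k
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within

# Hypothetical values of one variable in three clusters.
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_value, df_between, df_within = one_way_anova_f(groups)
```

With three clusters and 9945 customers the degrees of freedom are 3 - 1 = 2 and 9945 - 3 = 9942, which matches the F(2, 9942) reported above.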

To investigate exactly how these segments differed significantly on these variables, a least-significant difference (LSD) test was executed; table B3 in Appendix B presents the results. The mean differences between all the segments are given and the stars represent significance at the one percent significance level. The results of the LSD test can be interpreted using the variable relationship length as an example. The negative sign before the mean difference, when comparing cluster one with clusters two and three, means that cluster one consists of consumers with a shorter relationship with the company. This interpretation applies to each significant variable in the LSD test.

The variables dummy_renew_before and data_bundle_size_groups were measured on a nominal scale and therefore a Pearson Chi-Square test was performed to analyse the significant differences between the segments. The results of the dummy variable renew before are presented in table B4 and B5 in Appendix B. In order to analyse whether or not the clusters differed concerning their behaviour of renewing their subscription in the past, a Chi-square test was performed with the clusters and the variable renew_before (0 = no, 1 = yes). The Chi-square test was significant at a one percent significance level, Chi-Square (1) = 6300.054, p < .001. The results showed that the clusters do differ from each other in their behaviour of renewing their contract in the past.

The results of the variable data_bundle_size_groups are shown in table B6 and B7 in Appendix B. The same test was used to analyse whether or not the clusters differed significantly in their data bundles. The Chi-square test was significant at the one percent significance level, Chi-Square (1) = 459.954, p < .001. The results showed that the clusters differed from each other in their data bundle size.
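The Pearson Chi-square statistic compares the observed cell counts of a cross-table with the counts expected under independence. The Python sketch below shows the textbook calculation for a hypothetical table of three clusters by two categories; the counts are invented for illustration and do not come from the appendix tables.

```python
def chi_square(table):
    """Return (Pearson chi-square statistic, degrees of freedom) for a contingency table."""
    row_totals = [sum(row) for row in table]
    column_totals = [sum(column) for column in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * column_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    degrees_of_freedom = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, degrees_of_freedom

# Hypothetical counts: rows are clusters, columns are renew_before (no, yes).
table = [[10, 20], [20, 10], [15, 15]]
statistic, degrees_of_freedom = chi_square(table)
```

The larger the statistic relative to its degrees of freedom, the stronger the evidence that the distribution over the categories differs between the clusters.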

TABLE 6: overview clusters

| Variables | Cluster 1 | Cluster 2 | Cluster 3 | Average value across cluster |
|---|---|---|---|---|
| Relationship_length | 20.1 | 54.6 | 116.4 | 63.7 |
| Device_newness | 42.1 | 43.4 | 42.7 | 42.7 |
| Age | 36.5 | 39.6 | 42.8 | 39.6 |
| Data_bundle_size_groups | 6 | 5 | 5 | 5.3 |
| 0_MB(1) | 1% | 1% | 3% | 1.7% |
| 1-500_MB(2) | 23% | 34% | 36% | 31% |
| 501-1000_MB(3) | 5% | 4% | 5% | 4.7% |
| 1001-2000_MB(4) | 9% | 11% | 9% | 9.7% |
| 2001-3000_MB(5) | 9% | 3% | 3% | 5% |
| 3001-4000_MB(6) | 1% | 1% | 0% | 0.7% |
| 4001-6000_MB(7) | 28% | 20% | 15% | 21% |
| 6001-10000_MB(8) | 2% | 1% | 2% | 1.7% |
| 10000-16500_MB(9) | 22% | 25% | 27% | 24.7% |
| Total % | 100% | 100% | 100% | |
| Dummy_renew_before | 0 | 1 | 1 | 0.66 |
| No (0) | 87% | 8% | 6% | 33.3% |
| Yes (1) | 13% | 92% | 94% | 66.6% |

Legend: means per cluster value approach

The results of table 6 show that cluster one contained 5398 customers, cluster two 3484 customers and cluster three 1063 customers. It was argued that the clusters were large enough for targeting purposes, which is convenient for marketing managers (Mooi & Sarstedt, 2011, p. 255). Within the different clusters the means are presented per variable. Lastly, the three clusters were labelled with a profile name, and the characteristics of each cluster are provided in table 7.

TABLE 7: Profile description

| Cluster | Profile name | Characteristics |
|---|---|---|
| 1 | Potentials | Mid-thirties who have a relatively short relationship length with the company, a relatively large data bundle and mostly did not renew their contract before. |
| 2 | Persuadables | Early forties who have a relatively medium relationship length with the company, a relatively medium or small data bundle size and mainly did renew their contract before. |
| 3 | Loyals | Early forties who have a very long relationship length with the company, a relatively medium or small data bundle size and mainly did renew their contract before. |

Legend: profile names and characteristics value approach

4.6.2 Cluster analysis probability approach

In line with the cluster analysis of the value approach, the procedure started with a hierarchical clustering method to find the optimal number of clusters. In order to obtain a valid comparison with the cluster analysis for the value approach, the same variables were used in this cluster analysis. The optimal number of segments was based on the results of the agglomeration schedule, scree plot and dendrogram. Firstly, the agglomeration schedule in table C1 in Appendix C was examined, which shows the clusters or customers combined at each stage (Mooi & Sarstedt, 2011, p. 267). For example, in stage 9931 customers 8 and 9 were merged at a distance of 2181.876. A big change in value is shown from stage 9942 (7665.800) to stage 9943 (12777.542). Furthermore, by plotting the coefficients (i.e. distances) against the number of segments, a scree plot was generated, which is shown in figure C1 in Appendix C. The dendrogram was too large and was therefore not added to the appendix. According to the agglomeration schedule, scree plot and dendrogram, a three-cluster solution was the optimal choice because a lot of information would be lost with a two-cluster solution.

The variables dummy_renew_before and data_bundle_size_groups were measured on a nominal scale and therefore a Pearson Chi-Square test was performed to analyse the significant differences between the segments. First of all, table C4 and C5 in Appendix C show the results of the dummy variable renew before. In order to analyse whether or not the clusters differed concerning their behaviour of renewing their subscription in the past, a Chi-square test was performed with the clusters and the variable renew before (0 = no, 1 = yes). The Chi-square test was significant at the one percent significance level, Chi-Square (1) = 1618.716, p < .001. One can see in table C5 in Appendix C that the clusters differed from each other in their behaviour of renewing their contract in the past. Secondly, table C6 and C7 in Appendix C present the results of the variable data_bundle_size_groups. The same test was used to analyse whether or not the clusters differed significantly in their data bundles. The Chi-square test was significant at the one percent significance level, Chi-Square (1) = 1175.708, p < .001. The results showed that the clusters differed from each other in their data bundle size. The differences between the clusters are shown in table 8, where the variables are listed vertically, and the clusters and the number of customers in each cluster are listed horizontally.

TABLE 8: overview clusters

| Variables | Cluster 1 | Cluster 2 | Cluster 3 | Average value across cluster |
|---|---|---|---|---|
| Relationship_length | 20.2 | 22.1 | 49.7 | 30.7 |
| Device_newness | 39.1 | 64.3 | 51.2 | 51.5 |
| Age | 38.5 | 44.1 | 44.9 | 42.5 |
| Data_bundle_size_groups | 5 | 3 | 3 | 3.7 |
| 0_MB(1) | 0% | 2% | 3% | 1.7% |
| 1-500_MB(2) | 48% | 69% | 52% | 56.3% |
| 501-1000_MB(3) | 2% | 3% | 3% | 2.7% |
| 1001-2000_MB(4) | 5% | 5% | 27% | 12.3% |
| 2001-3000_MB(5) | 0% | 0% | 1% | 0.3% |
| 3001-4000_MB(6) | 0% | 0% | 0% | 0% |
| 4001-6000_MB(7) | 27% | 14% | 7% | 16% |
| 6001-10000_MB(8) | 0% | 0% | 0% | 0% |
| 10000-16500_MB(9) | 18% | 7% | 7% | 10.6% |
| Total % | 100% | 100% | 100% | |
| Dummy_renew_before | 0 | 0 | 0 | 0 |
| No (0) | 98% | 97% | 68% | 87.7% |
| Yes (1) | 2% | 3% | 32% | 12.3% |

Legend: means per cluster probability approach

managers (Mooi & Sarstedt, 2011, p. 255). Within the different clusters the means were presented per variable. Lastly, the clusters were labelled with a profile name and the characteristics were provided for each cluster in table 9.

TABLE 9: Profile description

| Cluster | Profile name | Characteristics |
|---|---|---|
| 1 | Switchers | End-thirties who have a relatively short relationship length with the company, a large data bundle, an averagely new device and mostly did not renew their contract before. |
| 2 | Laggards | Mid-forties who have a relatively short relationship length with the company and a small data bundle with an outdated device. |
| 3 | Loyals | Early forties who have a long relationship length with the company and a small data bundle size combined with a somewhat outdated device. |

Legend: profile names and characteristics probability approach

4.7 Hypothesis Testing

In order to find evidence about the importance of churn predictors and the differences between unique customers and customer profiles, the following hypotheses were reflected on.

4.6.1 Recency

Recency, the time since the customer's last purchase, was measured with the variable dummy_renew_before. If a customer had renewed his or her contract before, the odds of churning were expected to increase by a factor of 1.436, holding the other variables in the model constant. Therefore, hypothesis H1a is supported.
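Because a factor such as 1.436 applies to the odds and not to the probability itself, the following minimal sketch converts it into a change in churn probability. The baseline probability of 10% is a hypothetical value chosen for illustration and does not come from the thesis; the function name apply_odds_ratio is also an assumption.

```python
def apply_odds_ratio(base_probability, odds_ratio):
    """Return the probability after multiplying the baseline odds by odds_ratio."""
    base_odds = base_probability / (1.0 - base_probability)
    new_odds = base_odds * odds_ratio
    return new_odds / (1.0 + new_odds)


# HYPOTHETICAL baseline churn probability, not a thesis result.
baseline = 0.10

# Odds factor reported in the text for dummy_renew_before.
renewed_before_factor = 1.436

new_probability = apply_odds_ratio(baseline, renewed_before_factor)
print(f"Churn probability: {baseline:.3f} -> {new_probability:.3f}")
```

Under this assumed baseline, a 43.6% increase in the odds corresponds to a smaller relative increase in the probability, since odds and probability diverge as the probability grows.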

4.6.2 Frequency

For frequency, the variable days_national_usage was used to measure how frequently customers use the service. If a customer uses the service more frequently, the odds of churning are expected to decrease by a factor of .977, holding the other variables in the model constant. Hence, hypothesis H1b is supported.

4.6.3 Monetary value

4.6.4 Relationship length

Relationship length is the duration of the customer's relationship with the company in months. For each additional month of relationship length, the odds of churning are expected to decrease by a factor of .976, holding the other variables in the model constant. Hence, hypothesis H1d is supported.
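A per-unit odds factor compounds multiplicatively over several units. As a minimal sketch, assuming the logistic model is linear in relationship length, the factor of .976 per month can be extended to a longer period; the 12-month horizon is an arbitrary choice for this example and the function name is an assumption.

```python
import math


def odds_factor_over_units(per_unit_factor, units):
    """Return the cumulative odds factor after `units` one-unit increases."""
    return per_unit_factor ** units


# Odds factor per additional month of relationship length, as reported in the text.
per_month_factor = 0.976

# Illustrative horizon (an assumption): one additional year of relationship.
months = 12

yearly_factor = odds_factor_over_units(per_month_factor, months)

# The corresponding logit coefficient is the natural log of the odds factor.
coefficient = math.log(per_month_factor)

print(f"Odds factor over {months} months: {yearly_factor:.3f}")
print(f"Implied logit coefficient per month: {coefficient:.4f}")
```

This shows why a factor that looks close to 1 per month can still represent a substantial difference between customers whose relationship lengths differ by years.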

4.6.5 Age

The customer characteristic age was also included in the model for hypothesis testing. For each additional year of age, the odds of churning are expected to decrease by a factor of .996, holding the other variables in the model constant. Hence, hypothesis H1e is supported.

4.6.6 Device newness

Device newness is a relatively new predictor in the academic literature. This variable measures how many months before the end date of the dataset the customer's device was introduced to the market. For each additional month, the odds of churning are expected to increase by a factor of 1.001, holding all other predictors constant. So, hypothesis H1f is supported.

4.6.7 Difference in unique customers
