
Predicting churn based on the interplay between usage (dynamics) and newsletter email marketing University of Groningen


Academic year: 2021


Predicting churn based on the interplay between usage

(dynamics) and newsletter email marketing

University of Groningen

14 January 2019


Master thesis MSc Marketing Intelligence and Marketing Management 14 January 2019

Dirk Hendrik (Dick) van Klaarbergen
Eendrachtskade 10 C16
9726 CW Groningen
+31 6 11821318
d.h.van.klaarbergen@student.rug.nl
S2749351

University of Groningen
Faculty of Economics and Business
Department Marketing
PO BOX 800, 9700 AV Groningen (NL)

Supervisors University:


TABLE OF CONTENTS

EXECUTIVE SUMMARY
ABSTRACT
ACKNOWLEDGEMENTS
INTRODUCTION
THEORETICAL FRAMEWORK
RFI usage
Recency
Frequency
Intensity
Recency interactions
Dynamics in usage
Monthly peaks and dips
Monthly variance
Annual trend
Newsletter emails
Interplay between usage (dynamics) and newsletter emails
Conceptual research model
DATA
Research setting
Variable descriptions
Churn
RFI usage
Monthly dynamics in usage
Newsletter emails
Missings
Outliers and oddities
Descriptive statistics
METHODOLOGY
Modeling technique
Logistic regression
Model specification
Estimation
Machine learning
Balanced sample
RESULTS
Logistic regression
Control variables
Multicollinearity
Modeling
Hypotheses testing
Model estimates
Robustness check
Machine learning techniques
DISCUSSION
Conclusion
Theoretical implications
Managerial implications
Limitations
Suggestions for future research
REFERENCES
APPENDICES
Appendix A: Holiday usage patterns


EXECUTIVE SUMMARY

In both business-to-consumer (B2C) and business-to-business (B2B) markets, it is important to keep track of customer usage patterns in order to know which customers are going to leave the company (churn), as customer retention is much more effective than customer acquisition.

Moreover, customers are more valuable in B2B than in B2C, as they are mostly larger and there are fewer of them. Customer retention might therefore be even more important in B2B. In addition, people within a B2B company often have direct contact with their customers, which makes it easier to apply customer retention. This paper therefore investigates the effect of usage and newsletter emails on churn in a B2B context. The study is carried out in the market for secondary education learning methods.

This paper shows that usage of an online learning method can indicate the “path to death” of customers, that is, the stages a customer goes through before churning. First, this research found that the less recently a product is used, the higher the churn probability of a customer. The same applies to frequency: the less frequently a product is used, the higher the churn probability of a customer, although this effect is weaker when there are many dips in usage. Moreover, when the product is used both recently and frequently, the churn probability of a customer is even lower. Variation in usage is also found to be related to low churn probabilities of customers. Finally, customers that use the online learning method have a lower churn probability than customers that do not.

The effect of newsletter email marketing is also investigated. Whether or not a customer received a newsletter email is related to churn. More specifically, a customer that did receive a newsletter email has a 24% lower churn probability. In addition, customers that do not use the online learning method and do not receive newsletter emails have a substantially higher churn probability than the average churn probability.


ABSTRACT

In this paper, churn is predicted from the interplay between usage (dynamic) variables and newsletter email marketing. This is done with a binary logistic regression (logit) model that predicts whether or not a customer stays with the company, using usage (dynamic) and newsletter email variables. In addition, machine learning is used to investigate how much predictive performance it adds compared to logit modeling. The setting of this study is the market for secondary education learning methods (B2B). In this context, churn refers to customers that used a learning method of the company in the past school year but no longer use it in the following year. Contractual churn is modeled, as the change in learning method is observed by the company. In the past, no usage data was available in this market, because usage was only offline, through textbooks. Online usage is now increasing and is easy to monitor, so this data is used to predict churn. The results of this study show that online usage can be used as a churn predictor. More specifically, more recent usage decreases the churn probability of customers. In addition, frequency strengthens the effect that recency of usage has on churn. Frequency also has a direct effect on churn, but this effect strongly depends on the recency of usage and the number of monthly dips in frequency of usage. Furthermore, the amount of variation in monthly usage is found to have a positive effect on churn. However, no evidence is found for a positive effect of targeting customers with newsletters by means of usage (dynamic)


ACKNOWLEDGEMENTS

First, I would like to thank my thesis supervisor Dr. H. Risselada for the critical and supportive feedback throughout the thesis writing period. I would also like to thank all the teachers of the MSc Marketing at the University of Groningen for teaching me the knowledge I needed to carry out this research.


INTRODUCTION

Customer retention management has a large impact on firm performance (Rust & Zahorik, 1993). More specifically, investment in the retention of existing customers has higher returns than investment in the acquisition of new customers (Rust & Zahorik, 1993). The cost of acquiring a new customer is often said to be five times higher than the cost of retaining an existing customer (Pfeifer, 2005). This emphasizes the importance of customer retention management.

From an analytical perspective, customer retention management can be split into two modeling efforts, namely predicting churn and analyzing the effectiveness of customer retention efforts (Hung, Yen & Wang, 2006). Churn refers to customers that joined the company for at least a period and have since left it (Huang, Kechadi, Buckley, Kiernan, Keogh & Rashid, 2010). This paper is mainly aimed at predicting churn, but improving the effectiveness of retention management is also a goal. The paper analyzes the effect that customer usage (dynamics) and the customer retention method newsletter email marketing have on churn. The effect of the interplay between usage (dynamics) and newsletters on churn is investigated as well. To this end, usage (dynamic) variables are first used as predictors of churn. Once the most important usage (dynamics) churn predictors are found, these predictors are used to indicate which customers should be targeted with newsletter marketing emails. Thus a comparison is made between the effectiveness of sending newsletter emails to every customer (no targeting) and sending newsletter emails only to customers whose usage indicates a high churn probability (targeting).

Academic literature identifies two main drivers of churn, namely customer satisfaction and switching costs (De Ruyter, Wetzels & Bloemer, 1998; Lam, Shankar, Erramilli & Murthy, 2004; Rust & Zahorik, 1993). Customer satisfaction makes customers more loyal and thereby makes churning less likely (Lam et al., 2004). Switching costs make churning more difficult and/or costly and thereby less likely (Jones, Mothersbaugh & Beatty, 2000). The reasoning in this paper is mainly based on the satisfaction stream in churn literature, but switching costs theory is also used to motivate the relationships hypothesized in this study.


important customer relationship management is (Gummesson, 2004). However, in the B2B context churn research is far less extensive (Gordini & Veglio, 2017; Jahromi, Stakhovych & Ewing, 2014), while there are important differences from the B2C context. The main differences between B2B and B2C are that there are fewer customers in B2B and that customers buy in larger volumes (Rauyruen & Miller, 2007; Narayandas, 2005). This makes customer retention even more important for B2B companies, as losing a customer costs more and it is more difficult to acquire new customers. Additionally, companies in B2B settings often rely on personal selling, and salespersons often have a personal relationship with the buyer (Mudambi, 2002). So when B2B companies know which customers have the highest probability to churn, they can more easily approach these customers to prevent them from churning. This increases the effectiveness of customer retention efforts. Moreover, Palmatier, Dant, Grewal and Evans (2006) found that relationship marketing is more important in B2B than in B2C, and interpersonal relationships are part of the switching costs concept (Jones, Mothersbaugh & Beatty, 2000). Switching costs are therefore of more importance in B2B than in B2C. Since switching costs are one of the main reasons for churn and their importance differs between B2B and B2C, switching costs are relatively more important compared to satisfaction in B2B than in B2C. Churn predictors might thus also differ between B2B and B2C, which makes it interesting to investigate churn predictors in a B2B context.

The relationship between product usage and the churn probability of a customer has been studied to some extent in B2C, often in telecommunications. Ascarza and Hardie (2013) and Castro and Tsuzuki (2015) found that customer usage is negatively related to churn in contractual and online gaming settings, respectively. Furthermore, studies in the telecommunications context indicate that total call duration (in minutes per month) is a negative predictor of churn probability (Huang, Kechadi & Buckley, 2012; Lemmens & Croux, 2006). Moreover, Lemmens and Croux (2006) indicate that in telecommunications, dynamics in usage over time also play an important role in customer churn prediction: they found that the change in monthly minutes of telecommunication usage is even more important than static monthly use itself. Bolton and Lemon (1999) likewise found that dynamics in customer usage are related to customer satisfaction in online banking, and customer satisfaction is negatively related to churn (Gustafsson,


To prevent customers from churning, companies apply customer retention methods. This paper investigates the effectiveness of an important retention method in this setting, namely newsletter email marketing. Email marketing, and newsletter email marketing specifically, is a cost-effective marketing tool used by many companies (Nickerson, 2007; Ting, 2012). In addition, targeting customers with a high churn probability can increase the effectiveness of the emails (Jenkins, 2008; Ting, 2012). This paper tests the effectiveness of using usage (dynamic) variables to target newsletter emails in order to prevent churn.

Churn prediction and retention management are thus important topics in both B2C and B2B. However, while a lot is known about churn in the B2C context, B2B churn prediction literature is still limited. Furthermore, as explained above, it is important to understand customer usage in order to know which customers are most likely to churn. In addition, literature shows that dynamics in usage might be even more important than usage itself. Nevertheless, the effects of usage and dynamics in usage on churn have been studied only to a limited extent and mainly in the B2C context, while they are especially important in B2B. The effectiveness of using usage data to target customers with a high churn probability with email marketing has also not yet been studied in academic research. This paper therefore focuses on churn prediction based on usage (dynamics) in a B2B setting. Furthermore, the effectiveness of targeting customers with newsletter emails based on usage (dynamics) is studied. Finally, the usefulness of machine learning for predicting churn is examined.

So the following main research question is formulated for this paper:

How is the interplay between usage (dynamics) and newsletter email marketing related to the churn probability of a customer?

Sub questions are:

1. How are usage (dynamics) related to the churn probability of a customer?

2. To what extent can the effectiveness of newsletter email marketing be increased by targeting customers with usage that indicates high churn probability?

3. To what extent does machine learning increase the performance of churn prediction?


students. In the past, learning methods were purely offline. Currently, however, the availability and usage of online material is increasing, and learning methods generally consist of an offline textbook with an online module. In the past it was not possible to investigate the influence of product usage on churn, because offline usage behavior is difficult to observe: investigating textbook usage by students on a large scale is nearly impossible. Now, however, a part of customer usage, namely online usage, can be observed. This study therefore investigates whether usage (dynamics) of the online part of a learning method can explain whether or not a customer is going to churn to a competitor.

The main modeling method used in this paper is binary logistic regression (logit), with churn as the dependent variable and online learning method usage (dynamics) and newsletter email marketing as explanatory variables. Various machine learning techniques other than logit are also used to improve the performance of the model. Usage is included in the models through the variables recency, frequency and intensity of use (RFI model). Dynamics in usage are captured by the number of monthly dips and peaks in usage, the variation in monthly usage and the yearly trend in usage. Observational cross-sectional data is used for this.
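As a rough illustration of the model family described here, the generic binary logit form can be written as below. This is only a sketch: the symbols (R, F, I for recency, frequency and intensity, the vector d for the usage dynamics variables, N for newsletter emails) and the coefficient names are notation assumed for this illustration, and the thesis's own specification, including control variables and interactions, is given in its methodology chapter.

```latex
\Pr(\text{churn}_i = 1) = \frac{1}{1 + \exp(-z_i)}, \qquad
z_i = \beta_0 + \beta_R R_i + \beta_F F_i + \beta_I I_i
      + \boldsymbol{\gamma}^{\top} \mathbf{d}_i + \delta_1 N_i
```

A positive coefficient raises the predicted churn probability of customer i and a negative coefficient lowers it, which is how the directional hypotheses in the theoretical framework map onto the model.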


This paper extends churn literature by providing new insights into the effect of online usage and online usage dynamics. Most churn studies took place in a B2C context, whereas this study is set in a B2B context, so it fills a gap in B2B churn research: churn management might be even more important for B2B companies than for B2C companies, yet less churn research has been done in B2B. In addition, many papers on churn modeling with usage data focus mainly on testing churn modeling techniques instead of extending theory. This paper also aims to provide more theoretical understanding of the effect of usage level (dynamics) and newsletter email marketing on customer churn by using satisfaction and switching costs theory. The study provides methodological insights as well, because the performance of traditional logistic regression is compared with a number of newer machine learning techniques.

Moreover, the insights are useful for managers of partly online products, as the findings of this paper show that for a product with an online and an offline component, online usage data can be used as a predictor of churn. The insights can help managers identify potential churners by analyzing product usage; in other words, the “path to death” of customers can be identified. The “path to death” refers to the stages a customer goes through before churning (Ascarza & Hardie, 2013). The insights are especially useful in B2B, because account managers often have a personal relationship with their customers, so potential churners can be targeted directly with retention efforts. Furthermore, the comparison of machine learning techniques delivers insights into the performance of different churn modeling techniques. Data analysts within companies that want to predict churn can use this information when choosing a churn modeling technique.


THEORETICAL FRAMEWORK

Customer loyalty literature can be divided into three streams: behavioral loyalty, attitudinal loyalty and composite/integrated loyalty. The composite/integrated loyalty approach contains both behavioral and attitudinal loyalty (Rauyruen & Miller, 2007). In this case, behavioral loyalty is modeled through the behavioral dependent variable churn. Churn is a well-studied concept in academic literature, especially in B2C settings.

Most studies indicate that customer satisfaction and switching costs are the main underlying causes for customers to stay or churn (e.g. De Ruyter & Wetzels, 1998; Gustafsson, Johnson & De Roos, 2005; Lam et al., 2004; Rust & Zahorik, 1993; Shukla, 2004). Customer satisfaction is “the customer’s psychological response to his or her positive evaluation of the consumption outcome in relation to his or her expectation” (Shukla, 2004, p. 85). A more satisfied customer tends to be more loyal and thereby has a lower probability to churn (Lam et al., 2004). Switching costs are all factors that make it more difficult or costly for consumers to change supplier, so switching costs make churning less likely (Jones, Mothersbaugh & Beatty, 2000). Furthermore, interpersonal relationships are more important in B2B than in B2C (Palmatier, Dant, Grewal & Evans, 2006), and Jones, Mothersbaugh and Beatty (2000) found that interpersonal relationships are part of the switching cost concept. Switching costs are an important factor in the churn/stay decision of customers, and their importance differs between B2C and B2B. This means that the relationship between usage dynamics and churn might be different in B2B than in B2C: when switching costs are relatively more important compared to satisfaction in B2B, this changes the influence of churn predictors, like usage (dynamics), on churn. However, little research into churn predictors has been done in B2B. That is why this paper investigates the influence of usage (dynamics) on churn in a B2B setting.


usage will be a predictor of churn, as looking at usage of the online module also sheds light on full product usage. In addition, usage of complements, in this case both the textbook and the online module, is related to customer loyalty (Eppen, Hanson & Hanson, 1991). It is therefore expected that more usage of the online module, in other words more complete usage of the product, will lead to lower churn, and that in this setting usage of only the online module will also be a predictor of churn. The following section presents the expected relationships and hypotheses of this paper and the theoretical research framework.

RFI usage

The first relationship tested in this paper is the relationship between usage of the online learning method and churn probability. Shukla (2004) states that product usage has a direct effect on customer satisfaction, specifically for durable high involvement products. The learning methods sold in this setting are durable and high involvement products, as learning methods can be used for a long time and teachers tend to carefully consider the purchase of a new learning method. Thus, in the educational publishing context too, product usage is expected to be related to satisfaction. Shukla (2004) also found that usage does not have a significant effect on churn in various contexts. Nevertheless, many other papers confirm that there is a relationship between usage of a product and customer churn (Gustafsson, Johnson & De Roos, 2005; Lam et al., 2004; Rust & Zahorik, 1993). This contradiction in literature makes it interesting to investigate whether the relation between usage and churn exists in this specific context.


Ascarza and Hardie (2013) show that the RFM concept is an important predictor of churn. Also Coussement and Van der Poel (2008) found that in their case an extended form of RFM, namely eRFM was a good predictor of churn. Moreover the paper of Castro and Tsuzuki (2015) shows that the RFI concept is a good churn predictor. So RFM and RFI usage were found to be good predictors of churn in various contexts (online gaming, annual subscription and newspaper).

The RFM concept has also been used as a churn predictor in the B2B market (Chen, Hu & Hsieh, 2015; Gordini & Veglio, 2017; Jahromi, Stakhovych & Ewing, 2014). The various outcomes indicate that recency and frequency in particular were strong churn predictors. However, all these studies were done in totally different contexts (logistics, ecommerce and FMCG). Even more importantly, these RFM models were based not on usage history, as in this paper, but on purchasing history. So they used a similar model (RFM), but with another kind of data. The difference between usage and purchasing history is that customers have to pay extra for new purchases, whereas they do not have to pay extra for more usage. This is relevant for the relationship with churn, as having to pay for an additional product (one more unit of purchasing frequency) causes higher switching costs than not having to pay for one extra use (one more unit of usage frequency). Switching costs are an important factor in the churn/stay decision of customers, so the effect of RFM on churn differs between usage history and purchasing history. Nevertheless, all these studies indicate that the RFM/RFI variables are useful as predictors of churn, so it is expected that churn in this setting can also be explained by an RFI model based on usage history.

Recency

For the RFI concept, operationalizations similar to those of Castro and Tsuzuki (2015) are used. First, the indicator for recency is the number of days since a customer last used the online learning method in the school year 2017/2018. This factor already captures some dynamics in usage; more in-depth analyses of dynamics are introduced in the dynamics in usage section. Ascarza and Hardie (2013) found that recency of usage is significantly related to churn. Hughes (2005) also indicates that the more recent usage is, the less likely churning becomes. This can be explained by satisfaction literature, when a


H1a: Recency of usage of the product is positively related to the churn probability of a customer

Frequency

Second, the indicator for frequency is the average number of times the online learning method is used per student in the school year 2017/2018. Ascarza and Hardie (2013) indicate that frequency is negatively related to churn. Hughes (2005) also indicates that high frequency is related to high customer loyalty and thus low churn. When a customer uses something more frequently, this indicates that the customer is more satisfied with the product and thereby has a lower probability to churn. In addition, when teachers and students use the learning method frequently, switching costs rise, as they are familiar with the product and do not want to have to get used to a new learning method. So the following is hypothesized:

H1b: Frequency of usage of the product is negatively related to the churn probability of a customer

Intensity

Third, the indicator for intensity is the percentage of students that use the online learning method in the school year 2017/2018. Castro and Tsuzuki (2015) show that intensity of usage is negatively related to the churn probability of a customer. When a customer uses something more intensely, in this case when a larger percentage of students uses it, the teacher(s) and students are expected to be more satisfied with the learning method, and the probability to churn is thereby lower. Switching costs play a role here as well: when many students (and thereby teachers) use the online learning method, they get used to it and do not want to familiarize themselves with a new method. So the following is hypothesized:

H1c: Intensity of use of the product is negatively related to the churn probability of a customer
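The three RFI indicators described above could be derived from a session log along the following lines. This is a minimal sketch under stated assumptions, not the thesis's actual data preparation: the log layout (one `(student_id, date)` pair per session), the function name `rfi_features`, the choice to divide frequency by all students of the customer rather than only active ones, and the reference date are all hypothetical.

```python
from datetime import date

def rfi_features(usage_log, n_students, reference_date):
    """Compute recency, frequency and intensity for one customer.

    usage_log: list of (student_id, date) tuples, one per online session
               (hypothetical layout, assumed for this sketch).
    n_students: total number of students of the customer.
    reference_date: day from which recency is counted backwards.
    """
    if not usage_log:
        # No online usage at all: recency is undefined, the rest is zero.
        return {"recency_days": None, "frequency": 0.0, "intensity": 0.0}
    last_use = max(day for _, day in usage_log)
    active_students = {student for student, _ in usage_log}
    return {
        # Recency: days since the last recorded use.
        "recency_days": (reference_date - last_use).days,
        # Frequency: average number of sessions per student.
        "frequency": len(usage_log) / n_students,
        # Intensity: percentage of students with at least one session.
        "intensity": 100.0 * len(active_students) / n_students,
    }

# Made-up example log for a customer with four students.
log = [
    ("s1", date(2018, 3, 1)),
    ("s1", date(2018, 6, 20)),
    ("s2", date(2018, 5, 15)),
]
features = rfi_features(log, n_students=4, reference_date=date(2018, 7, 1))
print(features)
```

Note that with this operationalization a high recency value means a long time since the last use, which is why H1a expects a positive relationship with churn while H1b and H1c expect negative ones.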

Recency interactions

In the study of Ascarza and Hardie (2013), not only the direct relationship but also the interaction between recency and frequency was a significant predictor of churn. This suggests that it could be useful in this study to include interaction effects both between recency and frequency and between recency and intensity.


probability would be predicted. However, concluding that recency is low on the basis of a single recent observation does not make sense. This example shows that the effect of recency highly depends on usage in terms of frequency and intensity. So the following is hypothesized:

H1d: Frequency of usage strengthens the positive relationship between recency of usage and churn probability of a customer

H1e: Intensity of usage strengthens the positive relationship between recency of usage and churn probability of a customer
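To make the moderation hypotheses concrete, the RFI part of a logit model with interaction terms can be sketched as follows. The notation (R, F, I for recency, frequency and intensity, and the coefficient names) is assumed for this illustration and is not taken from the thesis's model specification.

```latex
z_i = \beta_0 + \beta_R R_i + \beta_F F_i + \beta_I I_i
      + \beta_{RF} R_i F_i + \beta_{RI} R_i I_i,
\qquad
\frac{\partial z_i}{\partial R_i} = \beta_R + \beta_{RF} F_i + \beta_{RI} I_i
```

Under this sketch, H1a corresponds to \beta_R > 0, while H1d and H1e correspond to \beta_{RF} > 0 and \beta_{RI} > 0: the effect of one extra day since the last use on the log-odds of churn grows with frequency and intensity.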

Dynamics in usage

Dynamics in usage have also been found to be an important churn predictor, in some contexts even more important than static usage itself (Lemmens & Croux, 2006; Ascarza & Hardie, 2013). This paper therefore also investigates dynamic usage variables. Lemmens and Croux (2006) found that dynamics over time in telephone usage play an important role in customer churn prediction: the change in monthly minutes of use is even more important than monthly use itself. Ascarza and Hardie (2013) also state that including dynamic effects in usage increases the predictive performance of a churn model. They found, for example, that decreasing usage patterns lead to churn.

A number of usage dynamics variables similar to those of Lemmens and Croux (2006) and Ascarza and Hardie (2013) are used in this paper. Lemmens and Croux (2006) studied usage dynamics by looking at “the percentage change in monthly minutes of use versus a three-month average” (Lemmens & Croux, 2006, p. 278). However, the data used for this paper is cross-sectional, whereas the data in the paper of Lemmens and Croux (2006) is longitudinal. The reason is that in the context of this study it is only possible to churn at one moment in a year, so the data used in this paper is based on one moment. This means that looking at change in percentages in monthly churn is not possible. Nevertheless, this study looks at variables similar to the change in monthly use variable of Lemmens and Croux (2006), converted into monthly and half-yearly dynamics, namely: the number of monthly peaks and dips in frequency of usage, the variance in monthly frequency of usage and the annual time trend in frequency of usage.


there is highly frequent usage, but with a lot of monthly dips this causes a weaker effect of frequency than when the frequency of usage is stably high.

Monthly peaks and dips

First, the direction of the variation in usage is covered by two variables, namely the number of monthly dips in a year and the number of monthly peaks in a year. Lemmens and Croux (2006) found that a constant usage pattern is related to low churn probabilities. In addition, satisfaction literature suggests that dips in usage indicate something different from peaks. Dips in usage indicate that a customer was not satisfied with the material in a certain period, which would mean a higher churn probability; for peaks it is the other way around. The dips and peaks variables do not indicate the extent of change in usage, because only the number of dips and peaks is investigated, not their size. So the following is hypothesized:

H2a: Number of monthly dips in frequency of usage of the product is positively related to churn probability of a customer

H2b: Number of monthly dips in frequency of usage weakens the negative relationship between frequency of use and churn probability of a customer

H2c: Number of monthly peaks in frequency of usage of the product is negatively related to churn probability of a customer

H2d: Number of monthly peaks in frequency of usage strengthens the negative relationship between frequency of use and churn probability of a customer
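Counting peaks and dips in a monthly usage series could look like the sketch below. The definition used here is an assumption made only for illustration, since this section does not state the exact rule: a month counts as a peak when its usage is strictly higher than in both neighboring months, and as a dip when it is strictly lower than in both. The function name and the example series are hypothetical.

```python
def count_peaks_and_dips(monthly_usage):
    """Count local maxima (peaks) and local minima (dips) in a monthly series.

    Assumed definition for this sketch: a peak is strictly above both
    neighboring months, a dip strictly below both. The first and last
    month have only one neighbor and are therefore never counted.
    """
    peaks = 0
    dips = 0
    triples = zip(monthly_usage, monthly_usage[1:], monthly_usage[2:])
    for previous, current, following in triples:
        if current > previous and current > following:
            peaks += 1
        elif current < previous and current < following:
            dips += 1
    return peaks, dips

# Made-up monthly usage frequencies for one customer over a school year.
monthly = [10, 12, 4, 11, 15, 9, 9, 13, 3, 10]
peaks, dips = count_peaks_and_dips(monthly)
print(peaks, dips)
```

As noted in the text, such counts capture only how often usage turns up or down, not by how much; the size of the swings is what the monthly variance variable is meant to cover.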

Monthly variance

Second, monthly variance in frequency of usage in the year 2017/2018 will be used as explanatory variable to take into account the extent of monthly variance in usage. The


possibility to change the situation, this is mostly caused by high switching costs (Stauss & Neuhaus, 1997). So the following is hypothesized:

H2e: Monthly variation in frequency of usage of the online module of a product is positively related to churn probability of a customer

H2f: Monthly variation in frequency of usage weakens the negative relationship between frequency of usage and churn probability of a customer

Annual trend

Third, the annual trend in usage is used as a predictor of churn. The annual time trend in the school year 2017/2018 is the change in frequency of usage between the first and second half of the school year. Ascarza and Hardie (2013) and Huang, Kechadi and Buckley (2012) found that customers who have a decreasing usage pattern have a high churn probability. The reason is that decreasing use indicates declining satisfaction over time, which means a higher churn probability. Thus the following hypotheses will be tested:

H2g: Usage time trend in frequency is negatively related to churn probability of a customer

H2h: Usage time trend in frequency of usage strengthens the negative relationship between frequency of usage and churn probability of a customer
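The monthly variance and the half-year trend described above could be computed from a monthly series as sketched below. Several details are assumptions for illustration only: the trend is taken as the absolute difference between the mean monthly frequency in the second and first half of the school year (a relative change would be an equally plausible reading), the population variance is used, and the ten-month example series is made up.

```python
from statistics import mean, pvariance

def usage_dynamics(monthly_usage):
    """Return the variance and half-year trend of a monthly usage series.

    Assumptions for this sketch: population variance over all months, and
    trend = mean of the second half minus mean of the first half, so a
    negative value indicates decreasing usage over the school year.
    """
    half = len(monthly_usage) // 2
    first_half = monthly_usage[:half]
    second_half = monthly_usage[half:]
    return {
        "variance": pvariance(monthly_usage),
        "trend": mean(second_half) - mean(first_half),
    }

# Made-up monthly usage frequencies with usage falling in the second half.
monthly = [10, 12, 14, 12, 8, 6, 4, 6, 8, 10]
dynamics = usage_dynamics(monthly)
print(dynamics)
```

With this sign convention, H2g (a negative relationship between the time trend and churn) means that a customer with a negative trend value, such as the one in the example, would be expected to have a higher churn probability.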

Newsletter emails

Predicting churners only becomes valuable when an effective communication tool for customer retention can be used (Jahromi, Stakhovych & Ewing, 2014). The effectiveness of email newsletters as a customer retention method is therefore also investigated. Email newsletters are aimed at increasing customer retention. This study tests targeting customers with email newsletters by using usage (dynamic) variables for churn prediction, to see whether such targeting makes the newsletters more effective in preventing customers from churning.


However, marketing mailing also has downsides. Morimoto and Chang (2006) mention three of them. First, they mention perceived advertising intrusiveness, defined as “the degree to which an unwanted marketing communication interferes with an individual’s cognitive process and tasks, as well as the interference with media contents including offensive materials” (Morimoto & Chang, 2006, p. 10). Second, they mention perceived loss of control, defined as “the degree to which a consumer feels a loss of control in conducting their own tasks due to the exposure to intrusive ads” (Morimoto & Chang, 2006, p. 11). Third, they mention irritation caused by direct marketing communication, defined as “the negative, impatient, and displeasing feeling of individual consumers caused by various forms of advertising stimuli” (Morimoto & Chang, 2006, p. 11). All these factors lead to lower customer satisfaction with the company and thereby to churn. So sending too many newsletter emails (that are not perceived as useful) could cause these issues and thereby lower satisfaction and increase churn.

This would mean that there is an ideal number of emails that can be sent in a certain situation. Email newsletters are highly effective and thus decrease the churn probability of customers. Nevertheless, when a company sends too many emails, advertising intrusiveness, perceived loss of control and irritation arise (Morimoto & Chang, 2006). Beyond a certain number of emails this could lead to higher churn probabilities for customers. So both a linear negative and a u-shaped relationship will be tested to check whether too many newsletter emails are sent. However, the company is not expected to exceed the critical point of sending too many newsletter emails, because it already takes into account that it must not send too many marketing emails. So a linear effect of the number of newsletters sent to customers will be investigated by the following hypothesis:

H3: Newsletter emails received is negatively related to the churn probability of a customer

Interplay between usage (dynamics) and newsletter emails


Nevertheless it is possible to investigate whether the newsletter emails currently sent are more effective for customers whose usage indicates a high churn probability. For this, a moderation of the usage (dynamic) variables on the relationship between newsletter emails sent and churn will be tested. It is expected that newsletter marketing already decreases churn and that this effect is even stronger for customers with usage (dynamic) variables that indicate a high churn probability. The reason is that people who are more likely to churn have lower switching barriers (De Ruyter, Wetzels & Bloemer, 1998). These switching barriers can be increased by sending relevant newsletter emails, as sending newsletter emails can lead to better company-customer relationships (Ting, 2012) and interpersonal relationships are part of the switching barriers concept (Jones, Mothersbaugh & Beatty, 2000). So relevant newsletter emails can lead to higher switching barriers and thereby to a lower churn probability. So the following hypothesis will be studied:

H4: Usage (dynamic) factors strengthen the negative effect of number of newsletter emails on churn

Conceptual research model

The conceptual research model with all the hypotheses is shown in figure 1. Frequency and intensity are negatively related to churn probability, whereas recency is positively related to churn. The number of monthly peaks and the trend over the schoolyear are negatively related to churn probability, whereas the number of monthly dips and the monthly variation in frequency of usage are positively related to churn. The effect of newsletter email marketing on churn probability is negative. Finally, the moderating effect of the dynamic usage variables on the relationship between frequency and churn is strengthening for peaks and trend and weakening for dips and variation.


Figure 1


DATA

Research setting

The dataset of this study is provided by an educational publishing company in a western European country that sells learning methods in a few submarkets. The company provides learning methods for different educational levels (submarkets), but only one educational level is modeled in this paper, namely secondary education. Only this submarket is studied because the educational levels differ considerably; for example, not all markets that the company serves are (purely) B2B markets. Modeling the whole market would therefore become very complicated and not useful. In addition, only the core courses are included, because these courses are taught at every secondary school level. The core courses are biology, chemistry, Dutch, economics, English, French, geography, German, history, mathematics and physics.

Variable descriptions

Each record in the dataset contains the variables shown in table 1. The preparation of the variables and some background information will be given in the following section.

Table 1

Descriptions of the variables

| Variable name* | Description |
|---|---|
| DV: Churn | |
| Churn | Binary dependent variable that indicates whether or not a customer uses another learning method in schoolyear 2018/2019 compared to schoolyear 2017/2018 |
| EV: Usage | |
| OnlineDummy | Binary variable that indicates whether or not the online learning method is used at least once, in the schoolyear 2017/2018 |
| Recency | Continuous variable that indicates days since the last score is set on the online learning method, in the schoolyear 2017/2018 |
| Frequency | Continuous variable that indicates average number of scores per student set on the online learning method, in the schoolyear 2017/2018 |
| Intensity | Continuous variable that indicates the percentage of students that have set a score in the online learning method, in the schoolyear 2017/2018 |
| Peaks | Continuous variable that indicates the number of monthly frequencies in usage that are at least 1 standard deviation above the annual mean, in the schoolyear 2017/2018 |
| Cv | Continuous variable that indicates the coefficient of variation, that is the monthly standard deviation in frequency of usage divided by the monthly mean in frequency of usage, in the schoolyear 2017/2018 |
| Trend | Continuous variable that indicates the trend during the schoolyear 2017/2018: the total frequency in the second half of the schoolyear (February till June) minus the total frequency in the first half of the schoolyear (September till January) |
| EV: Newsletter email marketing | |
| NewsletterDummy | Binary variable that indicates whether or not at least one newsletter email is received, in the schoolyear 2017/2018 |
| NewslettersReceived | Continuous variable that indicates the number of newsletter emails received, in the schoolyear 2017/2018 |
| Control variables | |
| Years_in_use | Continuous variable that indicates for how many years the current learning method has been used |
| Cross_buying | Continuous variable that indicates how many other products a school buys from the focal company |
| Number_of_students | Continuous variable that indicates an estimate of how many students a school-section has |
| Sales_contacts | Continuous variable that indicates how often salespeople visited a school-section-course |
| Section (4) | 4 Binary (dummy) variables that indicate the section |
| Course (11) | 11 Binary (dummy) variables that indicate the course |

*DV is dependent variable, EV is explanatory variable

Churn

The churn data are based on the booklists of schools. A booklist is a document in which a school states which books are used in a certain schoolyear. These booklists are imported into a dataset before every schoolyear, so it is possible to see which learning method is used for a school-section-course and whether it changed compared to the past schoolyear(s). For 91.6% of the observations a booklist is present. The other schools did not share their booklist with the company; these observations are not in the dataset, to ensure reliable data. Customers in the secondary education market can buy the products either directly from the focal company or via a reseller, and both ways are used on a regular basis. Both are included in the data, because a learning method on the booklist is observed regardless of whether it was bought directly from the focal company or via a reseller. The data is observational and cross-sectional, as the behavior of customers of the company is observed, not controlled by the researcher. In


Moreover the publisher sells methods for different courses and types of education (vmbo, havo and vwo) within the secondary school market. Typically a school purchases different methods from different publishers for different courses and sections. Sections are a combination of the level of secondary education (vmbo and havo/vwo) and junior (Dutch: onderbouw) or senior (Dutch: bovenbouw) classes. The data can be aggregated at different levels. In this study the data is aggregated at school-section-course level, because the course teachers of a section are the most important decision makers in choosing the learning method for a course. This means that there is no dependence between the different courses and sections within schools, so school-section-courses can be seen as independent choices and thereby as separate observations (see footnote 1). When a school-section-course uses a different method in the schoolyear 2018/2019 than in the schoolyear 2017/2018, this is indicated as churn.

Furthermore a comparison was made between customers that have used the product for 3 years or less and the rest. Customers that have used the product for less than 4 years have a lower churn probability, for contractual and financial reasons. More specifically, some customers have contracts which state that they have to use the product for at least 3 years. In addition, schools often have book funds which do not allow them to buy new learning methods within approximately 3 years. A Welch unequal variances two-sample t-test comparing the mean churn of the observations that are 3 years or less in use (mean = 0.032) with the mean churn of the rest (mean = 0.057) indicated that the difference in churn rate was significant (p < 0.001). However, excluding customers that have used their learning method for 3 years or less would be too extreme, because the mean churn for these customers is still 3.2% and a lot of churners would then be ignored. So a control dummy variable that indicates which observations have used the learning method for less than 4 years was added to the models, to account for the contractual and financial difference between this group and the rest.

A few school-section-courses existed two times in the data. This was caused by the fact that a part of the section churned while the other part stayed with the focal firm. For example, when only vwo gets a new learning method of another publisher but havo keeps the same method, there is a difference in churn within the section. This hardly ever happens: there were only 71 observations (0.6%) where this was the case. These observations were deleted from the dataset by listwise deletion, as they are just a small fraction of the total number of observations.

1 This is also checked by using the courses and sections as control variables. The significant dummies are

RFI usage

The usage variables are based on online module usage. More specifically, students make assignments in the online learning environment of the learning methods, and usage is based on the number of assignments made. The usage data is aggregated at school-section-course level for the schoolyear 2017/2018. The RFI concept is used to operationalize usage. For frequency and intensity the indicators use a number-of-students measure, to account for differences in school size. For this an estimate of the number of students is used. The number of students of a school is based on data of the Dutch governmental institution for education (DUO). The number of students in a school-section is estimated by using the average distribution of students over the sections.

Monthly dynamics in usage

Dynamics in usage will be investigated by the number of monthly peaks and dips in frequency, the monthly variation in frequency and the annual trend in frequency. The dynamic variables needed some extra preparation before they could be used in the models. They are based on monthly (peaks, dips and coefficient of variation) or half-year (annual trend) data. Table 1 shows how the variables are constructed.

For the monthly peaks and dips the following operationalization is used: a month is indicated as a dip when the frequency of usage is less than the mean minus one standard deviation, and as a peak when the frequency of usage is more than the mean plus one standard deviation. This is based on a book about quality control of manufactured products (Shewhart, 1931), which deals with control charts of machine behavior. Nevertheless the goal of the variable is the same, namely indicating when an observation is outside the normal range. Shewhart's theory uses three standard deviations from the mean as the normal range. In this paper a normal range of one standard deviation from the mean is used, because when more standard deviations are used almost no monthly peaks or dips are identified. This peak/dip indication is used to create the variables number of monthly peaks and number of monthly dips for a school-section-course in the schoolyear 2017/2018.
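The peak and dip rule can be illustrated with a short sketch. It is written in Python for illustration only (the data preparation in this study was done in R). The function name `count_peaks_and_dips`, the example series and the choice of the population standard deviation are assumptions, not details taken from the study.

```python
from statistics import mean, pstdev


def count_peaks_and_dips(monthly_freq, n_sd=1.0):
    """Count months above mean + n_sd*SD (peaks) and below mean - n_sd*SD (dips)."""
    centre = mean(monthly_freq)
    # Assumption: population SD; the text does not state which SD variant was used.
    spread = pstdev(monthly_freq)
    upper = centre + n_sd * spread
    lower = centre - n_sd * spread
    peaks = sum(1 for f in monthly_freq if f > upper)
    dips = sum(1 for f in monthly_freq if f < lower)
    return peaks, dips


# Hypothetical monthly frequencies for one school-section-course (September to June).
example = [4.0, 5.0, 5.0, 6.0, 5.0, 12.0, 5.0, 4.0, 0.5, 5.0]
one_sd = count_peaks_and_dips(example)              # one-SD band, as in this study
three_sd = count_peaks_and_dips(example, n_sd=3.0)  # three-SD band, as in Shewhart (1931)
```

With the one-standard-deviation band this made-up series has one peak and one dip. With three standard deviations no month is flagged, which mirrors the reason given above for choosing the narrower band.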


the standard deviation divided by the mean. The standard deviation is divided by the mean of monthly frequency because otherwise the variation would by definition be higher for observations with a high frequency of usage.

The annual time trend is measured by the change in frequency of usage between the first and second half of the schoolyear 2017/2018. The first half of the schoolyear are the months September till January and the second half of the schoolyear are the months February till June.
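As a minimal sketch of this definition (in Python, for illustration only), the trend is the total frequency in February till June minus the total frequency in September till January. The month labels, the dictionary layout and the example numbers are hypothetical.

```python
FIRST_HALF = ("Sep", "Oct", "Nov", "Dec", "Jan")
SECOND_HALF = ("Feb", "Mar", "Apr", "May", "Jun")


def annual_trend(freq_by_month):
    """Total frequency in the second half of the schoolyear minus the first half."""
    first = sum(freq_by_month[m] for m in FIRST_HALF)
    second = sum(freq_by_month[m] for m in SECOND_HALF)
    return second - first


# Hypothetical school-section-course whose usage declines during the schoolyear.
usage = {"Sep": 10, "Oct": 12, "Nov": 11, "Dec": 8, "Jan": 9,
         "Feb": 7, "Mar": 6, "Apr": 5, "May": 4, "Jun": 3}
trend = annual_trend(usage)  # negative value: usage decreased
```

A negative value indicates decreasing usage over the schoolyear, a positive value increasing usage.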

Another important issue in the dynamics of monthly usage is usage during holidays. Appendix A shows that usage during holidays is very low (almost 0). So for the calculation of the peaks, dips and coefficient of variation variables, the monthly frequency of usage is made comparable by converting each month into a “normal” month. This is done by dividing the frequency of usage in a month by the number of non-holiday days in that month and multiplying the result by 31, so that each month also becomes a month of 31 days. In addition, the months July and August are not taken into account, because of the summer holidays.
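The holiday correction and the coefficient of variation can be sketched as follows (Python, for illustration only). The function names are hypothetical. The sketch assumes that "non-holiday days" means the calendar days of the month minus the holiday days, that the population standard deviation is used, and that the coefficient of variation is set to 0 when there is no usage at all.

```python
from statistics import mean, pstdev


def normalize_month(raw_freq, days_in_month, holiday_days):
    """Rescale a monthly frequency to a holiday-free month of 31 days."""
    school_days = days_in_month - holiday_days
    if school_days <= 0:
        raise ValueError("month has no non-holiday days")
    return raw_freq / school_days * 31


def coefficient_of_variation(monthly_freq):
    """Standard deviation of the monthly frequencies divided by their mean."""
    centre = mean(monthly_freq)
    if centre == 0:
        return 0.0  # assumption: no usage at all is coded as zero variation
    return pstdev(monthly_freq) / centre


# Hypothetical month of 30 days with 10 holiday days and a raw frequency of 60.
normalized = normalize_month(60, days_in_month=30, holiday_days=10)
cv = coefficient_of_variation([2, 4, 4, 4, 5, 5, 7, 9])
```

In this made-up example 60 scores in 20 school days correspond to 93 scores in a normalized month of 31 days.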

Newsletter emails

The newsletter email marketing data is based on emails that are sent by marketing communication specialists within the company. This data is also aggregated at school-section-course level and covers the schoolyear 2017/2018. The emails that the marketing communication specialists send are, for example, newsletters, invitations for events or requests about exemplar products. All these types of emails are mainly aimed at keeping the customer loyal to the company and at upselling, so eventually they are aimed at decreasing churn. Newsletter emails are studied in this paper because they are the most frequently used type of email within the company.

Missings

After the data was prepared in R, the dataset was checked for missings, outliers and oddities. First, there were a lot of missings in the data, both for the newsletter email marketing data and the usage data. However these missings were missing because there was no data for this


missings were primarily unrelated to the focal variable and the other variables in the analysis; they are only related in the sense that if the number of students of a school-section is missing, the other number of students values of the same school-section are missing as well. So the missings are mainly at random, and imputation is thus a good way to deal with them (Sterne et al., 2009). The imputation method used is bagged trees, because this method has been shown to perform well on various datasets (Saar-Tsechansky & Provost, 2007). For the missings a bagged tree prediction is made with all the explanatory variables in the dataset, and the missings are replaced with the imputed values (see footnote 2). Imputing missings by using the other explanatory variables can lead to multicollinearity issues (Shah, Bartlett, Carpenter, Nicholas & Hemingway, 2014). However this is not the case in this study, because a lot of explanatory variables are used to impute just a small fraction of missings (808 imputed observations, 6.8% of the total number of observations).

Outliers and oddities

Moreover outliers are present for a number of variables. The outliers and the way they are dealt with are shown in appendix B. Only the outliers that are seen as oddities are adjusted in the dataset, because the other outliers can still be important in predicting churn probability and the number of outliers was mostly rather high.

Descriptive statistics

Before the analysis some descriptive statistics are inspected. First, tables with the distribution of the observations and of churn over the courses and sections are created and inspected (see table 2 and 3).

Table 2

Distribution of courses and churn over courses

| Course | n | Percentage of total observations | Churners | Churn rate |
|---|---|---|---|---|
| Biology | 518 | 4.38% | 20 | 3.86% |
| Chemistry | 732 | 6.19% | 44 | 6.01% |
| Dutch | 1456 | 12.32% | 51 | 3.50% |
| Economics | 1069 | 9.05% | 24 | 2.25% |
| English | 1105 | 9.35% | 53 | 4.80% |
| French | 1069 | 9.05% | 24 | 2.25% |
| Geography | 734 | 6.21% | 25 | 3.41% |
| German | 1620 | 13.71% | 81 | 5.00% |
| History | 705 | 5.97% | 54 | 7.66% |
| Mathematics | 2373 | 20.08% | 63 | 2.65% |
| Physics | 435 | 3.68% | 27 | 6.21% |
| Mean | 1074.18 | | 42.36 | 3.94% |

Table 2 shows that the courses differ considerably in number of observations in the dataset and in churn rates. This is the reason why the significant course dummies are included as control variables. For example, there are a lot of mathematics observations, which is caused by the high market share in mathematics. However the number of churners is low relative to the number of observations, so the churn percentage for mathematics is low. This also makes sense, because there is not much competition in this submarket: the only real competitor product is another product of the focal company. For chemistry, history and physics the churn percentages are rather high, and the number of observations for these courses is far below the mean. This is caused by a weaker position in these submarkets.

Table 3

Distribution of sections and churn over sections

| Section | n | Percentage of total observations | Churners | Churn rate |
|---|---|---|---|---|
| vmbo junior | 2976 | 25.19% | 112 | 3.76% |
| vmbo senior | 3307 | 27.99% | 113 | 3.42% |
| havo/vwo junior | 2935 | 24.84% | 141 | 4.80% |
| havo/vwo senior | 2598 | 21.99% | 100 | 3.85% |
| Mean | 2954 | | 116.5 | 3.94% |

Table 3 indicates that there are not many differences between the sections. Only the churn percentage of havo/vwo junior is rather high. For sections, too, the significant dummies are included as control variables.

A table with the means and standard deviations of the dependent variable and the explanatory variables is also made; the results are in table 4.

Table 4

Descriptive statistics

| Variable | Mean | Standard deviation | Minimum | Maximum |
|---|---|---|---|---|
| EV: Usage | | | | |
| Number of online module users (% of total) | 7001 (59.25%) | | | |
| OnlineDummy | 0.59 | 0.49 | 0 | 1 |
| Recency | 185.70 | 156.17 | 1 | 365 |
| Frequency | 1.87 | 11.33 | 0 | 828.85 |
| Intensity | 0.13 | 0.23 | 0 | 1 |
| Peaks | 0.94 | 0.91 | 0 | 4 |
| Dips | 0.38 | 0.78 | 0 | 4 |
| Cv | 0.77 | 0.86 | 0 | 3.16 |
| Trend | -73.26 | 586.33 | -15067.25 | 7431.46 |
| EV: Newsletter email marketing | | | | |
| Number of customers that received at least 1 newsletter email (% of total) | 7500 (63.5%) | | | |
| NewsletterDummy | 0.63 | 0.48 | 0 | 1 |
| NewslettersSent | 1.55 | 1.70 | 0 | 6 |
| Total | 11816 | | | |

Table 4 shows that 466 of 11816 customers churn; put differently, only 3.94% of the customers churned after the schoolyear 2017/2018. Furthermore 7001 (59.25%) customers used the online module at least once in 2017/2018. This was the reason to test models with a dummy variable for online usage (OnlineDummy), which indicates whether or not the online module is used in the schoolyear 2017/2018. This dummy was included because the difference between 0 and 1 might indicate something else than the difference between the rest of the values, as a 0 for the usage variables means that the online module is not used at all. Another remarkable fact that can be derived from the table is the large


METHODOLOGY

Modeling technique

In this study churn is modeled by using binary logistic regression (logit) modeling. In addition, various machine learning techniques will be used to improve the performance of the model. Strictly speaking logit is also a machine learning technique, but in this paper the term machine learning refers to the more advanced decision tree models, ensemble learning techniques and neural networks. The company-customer relationships in the market are contractual. Contractual means that “the loss of the customer is observed by the firm” (Ascarza, Netzer & Hardie, 2018, p. 55). This makes it easier to indicate when a customer ends the relationship than in a noncontractual setting. As a result the dependent variable churn can in this context be modeled by using a logit model; a more complicated model is not needed (Ascarza, Netzer & Hardie, 2018). Logit is the most used technique and has a high predictive performance in churn modeling (Neslin, Gupta, Kamakura, Lu & Mason, 2006). In addition, logit is rather simple to interpret (Burez & Van den Poel, 2009).

Logistic regression

Logit models are often used to predict whether a customer does or does not do something. Probit models can also be used for this, however logit is often preferred in marketing because of mathematical convenience (easier to calculate probabilities and to interpret parameters). In this case a logit model will be used to predict whether the customer churns or not. Logit models assume that a latent variable, which represents the utility of the product, drives the decision of a customer, in this case the churn decision. This utility Ui has a linear specification, see equation 1 (Fok, Leeflang, Wieringa, Bijmolt & Pauwels, 2017).

[1] $U_i = \alpha + x_i'\beta + \varepsilon_i$

With:

α: Intercept

xi: Explanatory variable

β: Coefficient of an explanatory variable

εi: Error term for Ui

The utility Ui and the decision Yi are linked to each other by the rules (Fok et al., 2017):

- Yi = 0 when the utility is 0 or less than 0


Only the decision is analyzed so the error term is not observed. So the decision is treated as a random variable. A cumulative distribution function (CDF) is used for the logit model, see equation 2 (Fok et al., 2017).

[2] $P[Y_i = 1] = \dfrac{\exp(\alpha + x_i'\beta)}{1 + \exp(\alpha + x_i'\beta)}$

With:

P[Yi = 1]: The probability that something happens

α: Intercept

xi: Explanatory variable

β: Coefficient of an explanatory variable

For more detailed information about logit modeling, statistical modeling books such as Fok et al. (2017) can be consulted.
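Equations 1 and 2 can be illustrated with a small sketch (Python, for illustration only). The function names and the intercept and coefficient values are hypothetical and are not estimates from this study.

```python
import math


def linear_predictor(alpha, beta, x):
    """Systematic part of the utility in equation 1: alpha + x'beta."""
    return alpha + sum(b * v for b, v in zip(beta, x))


def churn_probability(alpha, beta, x):
    """Logit probability of equation 2, written in a numerically stable form."""
    u = linear_predictor(alpha, beta, x)
    if u >= 0:
        return 1.0 / (1.0 + math.exp(-u))
    e = math.exp(u)
    return e / (1.0 + e)


# Hypothetical model with one explanatory variable (for example recency in days).
p = churn_probability(alpha=-3.0, beta=[0.01], x=[100.0])  # linear predictor = -2
```

A linear predictor of 0 gives a probability of exactly 0.5; negative values give probabilities below 0.5 and positive values probabilities above 0.5.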

Model specification

The formula in equation 3 will be used as the total model with all variables for the logistic regression in this study. The interaction effects will be used for testing moderation, and the control variables are also included. A number of variables are expected to be quadratically, instead of linearly, related to churn. A logit model is linear in its explanatory variables; however, a nonlinear (in this case quadratic) relationship can be modeled in a logit by adding the squared variable as an extra explanatory variable (see equation 3). Models with and without the quadratic variables will be compared to see whether there really is a quadratic relationship.

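[3]
$$
\begin{aligned}
U_i = {} & \alpha + \beta_1 \text{Recency} + \beta_2 \text{Frequency} + \beta_3 \text{Intensity} \\
& + \beta_4 (\text{Recency} \times \text{Frequency}) + \beta_5 (\text{Recency} \times \text{Intensity}) \\
& + \beta_6 \text{Peaks} + \beta_7 \text{Dips} + \beta_8 \text{cv} + \beta_9 \text{Trend} + \beta_{10} \text{NewslettersReceived} \\
& + \beta_{11} \text{Years\_in\_Use} + \beta_{12} \text{Cross\_buying} + \beta_{13} \text{Number\_of\_Students} + \beta_{14} \text{Sales\_visits} \\
& + \beta_{15} \text{Section} + \beta_{16} \text{Course} + \varepsilon_i
\end{aligned}
$$

With:

U: Utility that a customer churns

i: Customer (school-section-course) 1 … 11816

α: Intercept

β1 … β16: Coefficients of the explanatory variables

εi: Error term for Ui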


After the most important usage (dynamic) predictors are identified, a model is specified with interaction effects between these variables and newsletter emails. With this model it can be investigated whether using usage (dynamic) variables for targeting newsletter emails strengthens the negative effect of the newsletters on churn.

Model comparison and performance

A number of different logit models are compared in terms of model performance and significance of the variables. This is done to identify the most important churn predictors, to create a simple and thereby easily understandable model and to test the hypotheses of the conceptual framework.

The first criterion that is used to compare the performance of the models is the Akaike information criterion (AIC). This criterion will be used to choose a parsimonious (complete, but simple) model, as simplicity and completeness are both important model criteria (Little, 1970). In addition, a goal of this paper is to deliver a model that is easy to understand for managers, so a simple model is preferred. AIC is an information criterion based on the log-likelihood of the model, penalized for the number of parameters (see equation 4). AIC has a relatively low penalty for the number of parameters compared to other information criteria, such as BIC, CAIC and AIC3. This means that AIC favors models with more variables than BIC, CAIC and AIC3 do. AIC is chosen over the other information criteria because in this setting they tend to favor models that are too small.

[4] $\text{AIC} = -2\,LL + 2\,(K + 1)$

With:

LL: Natural logarithm of the likelihood of the model

K: Number of parameters of the model

T: Number of observations of the model
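Equation 4 can be sketched in a few lines (Python, for illustration only). The function names and the example numbers are hypothetical; the log-likelihood is the standard one for a binary outcome with predicted churn probabilities.

```python
import math


def log_likelihood(y, p):
    """Log-likelihood of binary outcomes y given predicted churn probabilities p."""
    return sum(math.log(pi) if yi == 1 else math.log(1.0 - pi)
               for yi, pi in zip(y, p))


def aic(ll, k):
    """AIC as written in equation 4: -2 LL + 2 (K + 1)."""
    return -2.0 * ll + 2.0 * (k + 1)


# Hypothetical comparison: a richer model must improve LL enough to offset its penalty.
small_model = aic(ll=-100.0, k=3)
large_model = aic(ll=-99.5, k=6)
```

In this made-up comparison the larger model has the higher (worse) AIC, because its small gain in log-likelihood does not compensate for the three extra parameters.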


are used. In equation 5 the formula of the TDL is shown. A TDL of 1 indicates that the model performs as well as random selection; a TDL of 2 indicates that the model performs twice as well as random selection.
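As a rough illustration of the TDL (Python, for illustration only), the sketch below ranks customers by predicted churn probability and divides the churn rate in the top 10% by the overall churn rate. The function name, the tie handling and the example data are assumptions.

```python
def top_decile_lift(y_true, scores):
    """Churn rate among the 10% highest-scored customers divided by the overall churn rate."""
    n = len(y_true)
    overall_rate = sum(y_true) / n
    if overall_rate == 0:
        raise ValueError("no churners in the data")
    top_n = max(1, n // 10)
    # Sort customers from highest to lowest predicted churn probability.
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    top_rate = sum(y for _, y in ranked[:top_n]) / top_n
    return top_rate / overall_rate


# Hypothetical data: 20 customers, 2 churners, one of them ranked in the top decile.
y = [1, 0] + [0] * 8 + [1] + [0] * 9
s = [0.9, 0.8] + [0.1] * 18
lift = top_decile_lift(y, s)
```

Here the top decile contains 2 customers of whom 1 churns (50%), while the overall churn rate is 10%, so the lift is 5.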

[5] $\text{TDL} = \dfrac{\text{Top 10\% churn rate}}{\text{Overall churn rate}} \times 100\%$

Estimation

For the estimation of the extent of the effect of the explanatory variables, the significance level and the estimate of the parameters are used. The significance level is used to decide whether the effect of an explanatory variable will be interpreted: when a variable is significant at 5%, it is interpreted. The sign of the coefficient shows whether the effect of the explanatory variable is positive or negative, however the coefficient cannot directly be used to see the extent of the effect. For this the odds ratio and the marginal effect are needed. The odds ratio, exp(β), indicates by which factor the odds of churn (the probability of churn happening divided by the probability of churn not happening) change when the explanatory variable increases by one unit. The marginal effect indicates the change in the probability of observing a churn if a binary explanatory variable is 1 instead of 0, or if an instantaneous change in a continuous explanatory variable happens, with all other explanatory variables kept at the average observation.
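The two quantities can be sketched as follows (Python, for illustration only). The function names are hypothetical. The coefficient of -0.385 is the one reported for the SubsetDummy in this study; the linear predictor of -3.0 for the average observation is a made-up value.

```python
import math


def logistic(u):
    return 1.0 / (1.0 + math.exp(-u))


def odds_ratio(beta_k):
    """Factor by which the odds of churn change for a one-unit increase in x_k."""
    return math.exp(beta_k)


def marginal_effect_continuous(beta_k, u_mean):
    """Derivative of the churn probability with respect to x_k at the average observation."""
    p = logistic(u_mean)
    return beta_k * p * (1.0 - p)


def marginal_effect_binary(beta_k, u_without):
    """Change in churn probability when a dummy switches from 0 to 1."""
    return logistic(u_without + beta_k) - logistic(u_without)


subset_odds_ratio = odds_ratio(-0.385)                  # roughly 0.68
subset_effect = marginal_effect_binary(-0.385, -3.0)    # hypothetical baseline of -3.0
```

An odds ratio of about 0.68 means that the odds of churn are roughly 32% lower when the dummy equals 1. The marginal effect depends on the chosen baseline, which is why it is evaluated at the average observation.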

Machine learning

In addition, Gordini and Veglio (2017) found that in B2B e-commerce a traditional technique like logit is outperformed by machine learning techniques, because of the noisy, imbalanced and nonlinear setting. The context of this study is also quite complex and thereby noisy, imbalanced and nonlinear; for example, the differences between courses may lead to imbalance in the predictive performance of variables between courses. So it is expected that machine learning may also lead to improved predictive performance compared to the logit models in this study. Furthermore, several papers in B2C churn modeling found that bagging, boosting and random forest are good ways to improve predictive performance (Risselada, Verhoef & Bijmolt, 2010; Lemmens & Croux, 2006; Coussement & Van den Poel, 2008), but no unambiguous best machine learning technique in churn modeling has been found yet, as is shown in Risselada, Verhoef and Bijmolt (2010).


namely bagging, boosting and random forest will be included in the comparison. These models combine several models into one model to get the best result.

The caret package in R is used to compare the performance of different machine learning techniques. The classification and regression training (caret) package streamlines the model training process (Kuhn, 2015). This makes it possible to compare the performance of a rather large number of different machine learning techniques quickly. Caret evaluates and chooses the optimal model within the modeling method that is used, with the variables and the train control that have been specified. The machine learning techniques are set to predict churn with all the variables in this study. The models are trained with repeated k-fold cross validation, as the performance of this resampling method is good and stable (Kim, 2009). In addition, repeated k-fold cross validation ensures a less biased estimate of model performance (Kim, 2009). Repeated k-fold cross validation splits the data into k folds, uses each fold once as validation set while the model is trained on the other folds, and repeats this process of splitting a number of times. The number of repeats and the k are chosen manually based on processing time and performance of the models.
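The resampling scheme itself can be sketched without any modeling library (Python, for illustration only). This is not the caret implementation: the function name, the seed and the simple unstratified shuffling are assumptions made for the example.

```python
import random


def repeated_kfold_indices(n_obs, k, repeats, seed=42):
    """Yield (repeat, fold, training indices, validation indices) for repeated k-fold CV."""
    rng = random.Random(seed)
    for repeat in range(repeats):
        order = list(range(n_obs))
        rng.shuffle(order)  # a new random split in every repeat
        folds = [order[i::k] for i in range(k)]
        for fold_id, validation in enumerate(folds):
            training = [idx for j, fold in enumerate(folds) if j != fold_id
                        for idx in fold]
            yield repeat, fold_id, training, validation


# Hypothetical setting: 10 observations, 5 folds, 2 repeats, giving 10 train/validation splits.
splits = list(repeated_kfold_indices(n_obs=10, k=5, repeats=2))
```

Within one repeat every observation is used exactly once for validation; performance measures are then averaged over all splits.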

Caret uses resampling to optimize the tuning of certain parameters (specific to the machine learning technique) that are not manually adjusted by the researcher. The parameters are optimized by caret to get a performance that is as high as possible in terms of accuracy and Kappa. Accuracy is “the overall agreement rate averaged over cross-validation iterations” (Kuhn, 2018). Kappa is “Cohen’s (unweighted) Kappa statistic averaged across the resampling results” (Kuhn, 2018). Cohen’s kappa statistic also uses the agreement rate for all the iterations, but it additionally takes into account the agreement happening by chance. For a more detailed explanation about the caret package, Kuhn (2018) can be consulted.
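The difference between the two measures matters when churners are rare. The sketch below (Python, for illustration only) computes accuracy and Cohen's unweighted kappa for a single set of predictions; the function name and the example data are hypothetical, and the averaging over resamples that caret performs is left out.

```python
def accuracy_and_kappa(y_true, y_pred):
    """Return the agreement rate and Cohen's unweighted kappa for two label lists."""
    n = len(y_true)
    observed = sum(1 for t, p in zip(y_true, y_pred) if t == p) / n
    labels = set(y_true) | set(y_pred)
    # Agreement expected by chance, based on the marginal label frequencies.
    expected = sum((y_true.count(c) / n) * (y_pred.count(c) / n) for c in labels)
    if expected == 1.0:
        return observed, 0.0  # assumption: kappa is set to 0 when it is undefined
    return observed, (observed - expected) / (1.0 - expected)


# Hypothetical imbalanced sample: 4 churners out of 100, model predicts "no churn" for all.
actual = [1] * 4 + [0] * 96
always_stay = [0] * 100
acc, kappa = accuracy_and_kappa(actual, always_stay)
```

A model that never predicts churn reaches 96% accuracy in this made-up sample but a kappa of 0, because all of its agreement is what would be expected by chance.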

Balanced sample


RESULTS

Logistic regression

Control variables

First the significance of the control variables was checked. The control variables are used to decrease the omitted variable bias. Only the significant control variables are included, to ensure a high number of observations per parameter in the models. The first variable that was significant is the dummy that indicates the first 3 years (SubsetDummy: β=-0.385, p=0.033): when the observation is 1, 2 or 3 years in use, the churn probability is significantly lower than in the rest of the years in use. The years in use control variable was also included, to take into account the negative effect of the rest of the years in use. Furthermore the dummies of the courses economics (β=-0.685, p=0.001), French (β=-0.693, p=0.001), history (β=0.535, p=0.001) and mathematics (β=-0.506, p<0.001) were significant. This makes sense, because the results in table 2 indicate that economics, French and mathematics have relatively low churn rates and the course history has a relatively high churn rate. Only the section variable of havo/vwo junior was significant (β=0.248, p=0.017). This also makes sense, because the churn rate of the section havo/vwo junior is relatively high (see table 3). The control variables number of students, cross-buying and sales contacts were not significant. The dummies that were used to test whether adjustments made in the data preparation stage were an issue (a dummy that indicates imputed number of students observations, a dummy that indicates outliers for the frequency variable and a dummy for observations with intensity above 100%) were also not significant. So these data adjustments could be made without significantly influencing the results.

Multicollinearity

Before the models were estimated, a test for multicollinearity was done. Multicollinearity is caused by correlation between the explanatory variables, whereas explanatory variables should be independent. The test was done before the analysis because multicollinearity can lead to inaccurate estimates for the explanatory variables in the models. Multicollinearity was tested by using the variance inflation factor (VIF) of the explanatory variables. For each explanatory variable a separate regression on all other explanatory variables in the model is estimated, after which equation 6 is used to calculate the VIF of that explanatory variable. The threshold for multicollinearity that is used is 5, because De Vaus (2013) indicates that this is a signal for problematic

[6] $VIF = \frac{1}{1 - R_i^2}$
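The auxiliary regressions behind equation 6 can be sketched as follows. This is a minimal illustration using NumPy on synthetic data, not the thesis data or the function used in the thesis; the name `vif_scores` and the variables `x1`, `x2`, `x3` are hypothetical.

```python
import numpy as np

def vif_scores(X):
    """Return the VIF of every column of X (n_samples x n_features)."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    scores = []
    for i in range(k):
        y = X[:, i]
        others = np.delete(X, i, axis=1)
        # Auxiliary regression: column i on all other columns plus an intercept.
        A = np.column_stack([np.ones(n), others])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
        # Equation 6: VIF = 1 / (1 - R_i^2).
        scores.append(1.0 / (1.0 - r2))
    return np.array(scores)

# Synthetic example (assumption): x2 is strongly negatively correlated with x1,
# x3 is unrelated to both.
rng = np.random.default_rng(0)
x1 = rng.normal(size=500)
x2 = -0.9 * x1 + 0.3 * rng.normal(size=500)
x3 = rng.normal(size=500)
vifs = vif_scores(np.column_stack([x1, x2, x3]))
```

In this sketch the two correlated columns end up above the threshold of 5, while the unrelated column stays close to 1.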

The results of the VIF test are in table 5. The highest VIF in the total model is for the recency variable (VIF=6.2), which is above the threshold of 5 and therefore has to be addressed. A correlation matrix showed that the high VIF for recency is mainly caused by the peaks variable (cor=-0.84). The correlation indicates that when recency is higher (the last usage is longer ago), there are fewer monthly peaks in frequency of usage. This makes sense: when the last usage is long ago, by definition many zeroes are observed for monthly frequency of usage, so the period in which monthly peaks can occur is shorter. Therefore a model without the peaks variable and a model without the recency variable were made to see whether a multicollinearity issue remained; the VIF test results of these models are also in table 5. The results indicate that there are no multicollinearity issues anymore, as the highest VIF is 3.8 for the recency variable, which is below the threshold of 5. In the modeling, it is therefore taken into account that models containing both recency and peaks have to be interpreted carefully.

Table 5 VIF test results

Variables | VIF score, total model with peaks and recency variable | VIF score, model without peaks variable


Modeling

After the significant control variables were identified and the multicollinearity test was done, different parts of the conceptual framework are compared to test the relationships hypothesized in the theoretical framework. First, a model with only the control variables is estimated (model 1). After that, three versions of a model with only the RFI usage variables are estimated (models 2, 3 and 4), then two models with only the dynamic usage variables (models 5 and 6), and finally four models with both RFI and dynamic usage variables (models 7, 8, 9 and 10). In this way the most important churn predictors are identified. The results of these analyses are shown in table 6. The colors in the performance measure columns (AIC and TDL) indicate how good the score on the performance criterion is compared to the other models in the table (green is a better score, red is a worse score). The table also shows the marginal effects (the effect of an increase of 1 in the explanatory variable on the dependent variable, for an average observation) and the significance of the effects. After the analysis of the effect of the usage variables on churn, a number of models are estimated to test the effect of newsletters and the effect of the interplay between newsletters and usage (dynamics) on churn.
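As an illustration of the marginal effects described above, the sketch below evaluates the slope of a logit model at the average observation. It assumes the common "marginal effect at the mean" definition, which may differ in detail from the computation behind table 6; the function name, coefficients and data are hypothetical.

```python
import numpy as np

def marginal_effects_at_mean(beta, X):
    """Marginal effects of a logit model at the average observation.

    beta: coefficients with the intercept first (assumed layout).
    X:    design matrix without an intercept column.
    """
    beta = np.asarray(beta, dtype=float)
    x_bar = np.asarray(X, dtype=float).mean(axis=0)
    z = beta[0] + x_bar @ beta[1:]
    p = 1.0 / (1.0 + np.exp(-z))      # predicted churn probability at the mean
    return beta[1:] * p * (1.0 - p)   # dP/dx_j = beta_j * p * (1 - p)

# Hypothetical coefficients and data, chosen so that the average observation is zero.
beta = [0.0, 2.0, -1.0]
X = [[1.0, -1.0], [-1.0, 1.0]]
effects = marginal_effects_at_mean(beta, X)
```

At the average observation the predicted probability here is 0.5, so each marginal effect is a quarter of its coefficient. For variables that also enter an interaction term, this simple formula ignores the interaction and should be read with care.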

Table 6 Marginal effects* and performance of usage models

RFI usage

The first analysis concerns the effect of the RFI usage variables on churn. First, a model with only the RFI variables is estimated (model 2), followed by a model that also includes the interactions hypothesized in the theory section (model 3). After that, a model with only the significant variables of models 2 and 3 is estimated (model 4). This model contains recency, frequency and an interaction effect between recency and frequency. Intensity and the interaction between recency and intensity did not have a significant effect on churn. The performance metrics AIC and TDL also indicated that a model without the intensity variables performed better.

The comparison between the models in table 6 shows that the TDL and the AIC improve when the RFI usage variables are added to a model with only the control variables. So it can be concluded that using the RFI variables for churn prediction is useful, as it works better than a model with only the control variables (with only the effects of the significant courses, sections and years in use). The number of significant variables also indicates the usefulness of usage variables for churn prediction.
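The two performance criteria can be sketched as follows. The sketch assumes that TDL stands for top-decile lift (the churn rate among the 10% of observations with the highest predicted churn probability, divided by the overall churn rate) and that AIC is computed from the binomial log-likelihood; the function names and the toy data are hypothetical.

```python
import numpy as np

def aic(y, p, n_params):
    """AIC = 2k - 2 ln(L) for a binary outcome y and predicted probabilities p."""
    y = np.asarray(y, dtype=float)
    p = np.clip(np.asarray(p, dtype=float), 1e-12, 1 - 1e-12)
    loglik = np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))
    return 2 * n_params - 2 * loglik

def top_decile_lift(y, p):
    """Churn rate in the top 10% of predicted probabilities over the overall churn rate."""
    y = np.asarray(y, dtype=float)
    p = np.asarray(p, dtype=float)
    n_top = max(1, len(y) // 10)
    top = np.argsort(-p, kind="stable")[:n_top]
    return y[top].mean() / y.mean()

# Toy data (assumption): 20 observations, 4 churners, two of them ranked highest.
y = [1, 1, 1, 1] + [0] * 16
p_model = [0.9, 0.8, 0.3, 0.2] + [0.1] * 16
p_base = [0.2] * 20  # constant prediction equal to the overall churn rate

tdl = top_decile_lift(y, p_model)
aic_model = aic(y, p_model, n_params=3)
aic_base = aic(y, p_base, n_params=3)
```

A lower AIC and a higher TDL both point to the better model, which is how the colored columns in table 6 are read.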

In addition, TDL and AIC both indicate that the model with only the significant variables (model 4) is the best model. This model contains three significant variables, namely recency, frequency and the interaction between recency and frequency. The marginal effects show that the effect of frequency is negative, while the effects of recency and of the interaction between recency and frequency are both positive. The directions of the effects indicate that more frequent usage means less churn, that a higher number of days since the last usage means more churn, and that the positive effect of recency is strengthened by frequency. Thus the directions of the effects are as expected in the hypotheses. It should be noted that frequency is only significant when the interaction between recency and frequency is in the model. This is caused by the strengthening effect of recency on the effect of frequency on churn: the interaction effect takes over a part of the variance from the main effect, which affects the significance of the main effect.
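The role of the interaction term can be made explicit with a short derivation. It is a sketch under the assumption that model 4 has the usual logit form with a product term; the symbols $R$ (recency), $F$ (frequency) and the coefficient names are introduced here for illustration only.

```latex
\begin{align}
\operatorname{logit}(p) &= \beta_0 + \beta_R R + \beta_F F + \beta_{RF}\, R F + \dots \\
\frac{\partial \operatorname{logit}(p)}{\partial R} &= \beta_R + \beta_{RF}\, F \\
\frac{\partial \operatorname{logit}(p)}{\partial F} &= \beta_F + \beta_{RF}\, R
\end{align}
```

With $\beta_R > 0$ and $\beta_{RF} > 0$, the effect of recency on the log-odds of churn grows with frequency. The effect of frequency is no longer a single number either: $\beta_F$ is the effect of frequency only at $R = 0$, which is one reason why the significance of the main effect can change once the interaction is included.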

Dynamics in usage
