Customer Lifetime Value prediction: a contractual B2B market-context



Jeroen Jacob Pieter van Gurp
Ceintuurbaan 342-2, 1072 GP Amsterdam
(+31) 6 81 17 68 51
j.j.p.van.gurp@student.rug.nl
S3041379

Master Thesis

MSc Marketing Intelligence & MSc Marketing Management University of Groningen

Faculty of Economics and Business

January 10, 2021

First supervisor: dr. P.S. van Eck
Second supervisor:


PREFACE

This research represents not only the conclusion of my master's degree, but also the end of my time as a student at the University of Groningen. I wrote this thesis in the period from September 2020 till January 2021 and combined it with an internship at the marketing department of Microsoft. It was a strange period in which to write a thesis and start an internship; however, I experienced it as a great learning process.

First, I want to thank Microsoft for the opportunity to combine my thesis with an internship. My managers in particular were extremely helpful during the process and provided me with all the necessary help and information. Combining a thesis and an internship requires flexibility from both parties, and I am grateful that I received such freedom. I would also like to thank dr. Peter van Eck, my first supervisor, for his constructive feedback and advice during the process of writing this thesis.

Moreover, I would like to thank my family and girlfriend for their support throughout this strange period of writing a thesis during the corona crisis. Especially when I saw few people during the lockdowns, they were of great value to me.


ABSTRACT

Over the last 30 years, customer targeting has assumed greater importance. The old-school telemarketers started ranking and targeting customers based on their historical purchasing behaviour, while over the years more and more advanced classification methods were developed. Predicting which customers are likely to purchase again, and thus are worth investing in, and which will defect, has become a major topic of interest for research and business. Metrics such as the customer lifetime value, which compares future profits to the related costs, have been introduced and are used for customer segmentation. To predict this value, several modelling techniques have been introduced, each suited to a different context. Where an extensive stream of literature focuses on the business-to-consumer market, the business-to-business market has gained less attention. This study focuses on determining the best method for predicting the customer lifetime value in a business-to-business context. To do so, customer retention and churn are modelled using various methods, after which the most suitable is chosen; in this case, a logistic regression performed best. This model is then multiplied with a multiple linear regression model to ultimately predict the customer lifetime value. Historical customer behaviour is used as input for both models. The customer-firm relationship length, breadth, and depth all increase the customer lifetime value, directly or indirectly, and are therefore important drivers.

Keywords: Customer lifetime value, churn, B2B, logistic regression, multiple linear regression, customer behaviour


TABLE OF CONTENTS

PREFACE
ABSTRACT
TABLE OF CONTENTS
1. INTRODUCTION
1.1 Research purpose
1.2 Academic contribution
1.3 Managerial contribution
2. LITERATURE REVIEW
2.1 Recency-Frequency-Monetary models
2.2 Probability Models
2.3 Econometric Models
2.4 Persistence Models
2.5 Computer Science Models
2.6 Diffusion/Growth Models
2.7 Model selection
3. METHODOLOGY
3.1 Conceptual framework
3.2 Data collection & cleaning
3.3 Variable description
3.3.1 Dependent Variables
3.3.2 Independent variables
3.4 Data exploration
3.4.1 Missing values
3.4.2 Data irregularities
3.4.3 Descriptive statistics
3.5 Churn models
3.5.1 Logistic regression model
3.5.2 Alternative classification techniques
3.5.3 Model specification
3.6 Revenue model
4. RESULTS
4.1 Churn model
4.1.1 Validity
4.2 Classification techniques
4.2.2 Random Forest
4.2.3 Neural Network
4.2.4 Support Vector Machine
4.3 Method selection
4.4 Model estimation logistic regression
4.4 Revenue model
4.4.1 Validity
4.4.2 Model assumptions
4.4.3 Model estimation
4.5 CLV calculation
5. CONCLUSION AND DISCUSSION
5.1 Findings churn model
5.2 Findings revenue model
5.3 Prediction method CLV
6. LIMITATIONS AND RECOMMENDATIONS
6.1 Limitations
6.2 Recommendations
7. BIBLIOGRAPHY


1. INTRODUCTION

Over the last decades, the marketplace has changed: product lifecycles are getting shorter, customer diversity is increasing, and markets are more fragmented than they were before (W. Reinartz & Kumar, 2012). Simultaneously, a shift occurred within marketing philosophies. Whereas in the past the focus lay on the needs of groups of customers, nowadays marketing focuses on individual customers' needs. As stated by Kumar and Shah (2009), the world has become more customer-centric.

The approaches of product-centricity and customer-centricity differ in almost every detail, from the organisational structure to the performance metrics used. Where in the past profitability or market share per product were the metrics to assess firm performance, in customer-centric organisations the metrics used are customer satisfaction and customer lifetime value (Shah et al., 2006).

When customers are seen as a critical asset, customer lifetime value is not only a significant financial metric but can also be useful to assess firm value or financial performance (Berger & Nasr, 1998; Blattberg et al., 2009; Gupta et al., 2004). The customer lifetime value (CLV) is defined as the total profits derived from a certain customer during its lifetime (Berger & Nasr, 1998; Shah et al., 2006). By calculating the CLV, firms can weight and segment their customers based on their predicted contribution. Moreover, the CLV helps firms determine how much they should invest in retaining their customers and how to allocate their (marketing) resources (Kumar, 2007; Malthouse & Blattberg, 2005). As Mulhern (1999) stated, a small percentage of customers often accounts for the most significant part of the revenues and profits. Therefore, it makes sense to allocate a disproportionate amount of marketing resources to the most contributing customers, to become as profitable as possible.

The customer has been placed more centrally not only in the business-to-consumer (B2C) market but also in the business-to-business (B2B) context. Where traditional production-oriented organisations compete mainly on price, the margins in the services industry can be higher. Customers are the most valuable asset of any firm, and to maintain these customers, firms should ensure high customer retention. Customer retention, or customer churn, directly impacts the CLV (Bolton et al., 2004), as the customer lifetime ends when the customer leaves.

In a B2C context, many customers each represent a relatively low value. In the B2B context, usually fewer customers individually represent a lot more value. As those B2B customers represent much more (financial) value, a correct CLV determination can be valuable. It is common for service markets to engage in contractual relationships with customers, which can be long-term or short-term. Customer churn for contractual service firms might be even more severe than for non-contractual firms, as they lose all the future income of customers who switch to a competitor at the end of their contract. For non-contractual firms, a purchase at a competitor does not necessarily mean a customer will not purchase the next time at the focal firm. Additionally, another difference between non-contractual and contractual firms is whether churning behaviour is observed: firms that engage in contractual relationships observe whether a customer defects or renews its contract, whereas a non-contractual firm does not observe customer defection.

Due to its growing importance, extensive literature is available on the concept of CLV. Some of that literature focuses on the role and impact of factors that drive the CLV (Bolton et al., 2004; Kumar & Shah, 2009), while another stream focuses on the different ways of predicting and determining the CLV (Berger and Nasr 1998; Bolton, Lemon, and Verhoef 2004; Cheng et al. 2012).

This study is performed in a business setting and focuses on Microsoft Netherlands. Microsoft is a big-tech company, and the Dutch subsidiary is based in Amsterdam. The research focuses on the product group of Microsoft Azure, Microsoft's cloud computing platform. Customers can subscribe to this service in various ways. The first is a free subscription with only basic functionality. The second is a pay-as-you-go subscription, where customers pay for the modules and functionalities they use. The last is an enterprise agreement, where organisations receive tailor-made terms. Even though there is a free subscription, it is unlikely that a company will choose this option, as its functionality is too limited.

Microsoft Azure's focus lies on B2B customers, as these customers can gain relatively more advantage than B2C customers when moving from on-premises to cloud computing. The local IT infrastructures of business customers are more advanced and extensive, resulting in higher costs. Moving to the cloud reduces those costs and enhances the scalability and availability of the IT infrastructure: when temporarily more computing power is needed, for example, it is possible to scale the processing power up and down again afterwards. The Azure cloud platform consists of different services or products which can be used independently.

1.1 Research purpose

This research focuses on different prediction methods for the customer lifetime value in a contractual B2B context. It aims to give a clear comparison between different prediction methods and rank them based on predictive power. Therefore, this study answers the following research question: “What is the best method to predict the customer lifetime value in a contractual B2B context?”

1.2 Academic contribution

In the existing literature, there is a stream that focuses on CLV prediction in a contractual setting (Ascarza & Hardie, 2013; Benoit & Van den Poel, 2009; Wirtz et al., 2014), but the majority covers a non-contractual context (Fader et al., 2010; Gupta et al., 2006; Malthouse & Blattberg, 2005). This study emphasises CLV prediction methods in a contractual setting where transactions happen on a discrete-time basis. Where previous research on discrete-time transactions considers a non-contractual context (Fader et al., 2010), this study's scope is a contractual B2B context. More specifically, the data is gathered from Microsoft's Dutch subsidiary and refers to the purchases of its cloud computing service, Microsoft Azure. As this industry is relatively new, there are few studies regarding this context. The contribution of this study lies in the application of known prediction methods in a new context, where traditionally well-performing methods might not be applicable.

1.3 Managerial contribution

This study intends to determine the best practice for identifying the most valuable customers. As identifying valuable customers is of growing importance, this study helps managers predict which customers are worth investing in and which customers represent less future value. Based on the outcomes of this prediction method, managers can allocate resources and focus on the group of most profitable or loyal customers. Focusing on the most profitable customers rather than on customers that are not likely to deliver much value can be a beneficial strategy.

The paper is structured as follows. First, relevant prediction methods are presented in the literature review. After that, a general model for the CLV is determined and the prediction methods are chosen. Then the results of this study are presented. Finally, the results are concluded and discussed, and based on this discussion, managerial and academic recommendations are given.


2. LITERATURE REVIEW

Customer lifetime value and its prediction are extensively researched topics, and a wide range of models is used for this. Before comparing these models, the context of this study should be considered. There are different kinds of customer-firm relationships: contractual and non-contractual. Contractual relationships, such as memberships or subscriptions, imply a legal relationship between both parties and mean that customer defection is observed. In non-contractual relationships, there is no legal relationship between the customer and the company, and defection is mostly unobserved. The customer-company relationship of Microsoft is a subscription and thus contractual.

The customer lifetime value is the sum of the discounted future cash flows of an individual customer (Berger & Nasr, 1998; Gupta et al., 2004; Shah et al., 2006). The general form of the CLV is as follows:

$$CLV_i = \sum_{t=0}^{T} \frac{(p_{it} - c_{it})\, r_{it}}{(1+d)^t} - AC_{it} \qquad (1)$$

Where i is the customer index, t is the time index, T is the number of periods considered for estimating the CLV, and d is the discount rate (Kumar, 2007). The profits are computed from p, the price paid by customer i at time t, and c, the costs. Finally, r stands for the retention probability of a customer, and AC stands for the acquisition costs of customer i at time t. Marginal profits are derived by subtracting the costs c from the price paid p by the customer.

For multi-service industries, profits are usually influenced by the margins of a range of services. In this case, profits are determined by the following calculation (Berger & Nasr, 1998):

$$Profit_{i,t} = \sum_{j=1}^{J} Serv_{ij,t} \cdot Usage_{ij,t} \cdot Margin_{ij,t} \qquad (2)$$

Where J is the number of different services used, $Serv_{ij,t}$ is a binary indication of whether service j is purchased or not, $Usage_{ij,t}$ is the number of services purchased, and $Margin_{ij,t}$ is the average profit margin of service j.
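A worked instance of equation (2), using invented usage figures and margins for three services of one customer in one period:

```python
# Hypothetical inputs for one customer i in one period t, J = 3 services.
serv   = [1, 0, 1]          # binary: is service j purchased?
usage  = [5, 0, 2]          # number of units of service j used
margin = [12.0, 8.0, 30.0]  # average profit margin of service j

profit = sum(s * u * m for s, u, m in zip(serv, usage, margin))
print(profit)  # 5*12.0 + 2*30.0 = 120.0
```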

Gupta et al. (2006) identify six model types for modelling customer lifetime value directly or indirectly. As these model types encompass the variety of prediction methods, they are used here as a preliminary categorisation: recency-frequency-monetary (RFM) models, probability models, econometric models, persistence models, computer science models, and diffusion/growth models.

2.1 Recency-Frequency-Monetary models

Which customers to target, and which not, is a problem database marketers regularly face. Over the last 30 years, RFM models have often been used in direct marketing. These models were developed in reaction to the low response rates of direct-marketing actions, to target specific customers and improve the response rate (Gupta et al., 2006). They try to predict future customer behaviour based on historical behaviour, using three criteria: the recency of the last purchase (R), the frequency of the purchases (F), and the monetary value of all purchases (M). Customers are scored on each criterion and prioritized accordingly. Some researchers assign the same weight to each variable (Hughes, 1994); others propose different relative weights (Stone, 1994). Based on this model, valuable customers are identified and a ranking is obtained. These models predict the order of future customer value rather than the absolute value of each customer. As described by Kumar (2007), RFM models perform well in high-volume business. Several assumptions underlie this type of model: customers who purchased recently at the firm are more likely to purchase again, customers who bought more frequently are more likely to purchase again than customers who bought only a few times, and customers who spent the highest amount of money are more likely to buy again.
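As an illustration of such a scoring scheme, the following pandas sketch computes equal-weight RFM scores on a hypothetical transactions table; rank-based scores stand in for the quintile scores often used in practice:

```python
import pandas as pd

# Hypothetical transaction log: one row per purchase.
tx = pd.DataFrame({
    "customer_id": [1, 1, 2, 3, 3, 3],
    "date": pd.to_datetime(["2020-01-05", "2020-06-01", "2020-03-15",
                            "2020-02-01", "2020-05-20", "2020-06-25"]),
    "amount": [120.0, 80.0, 40.0, 300.0, 150.0, 90.0],
})
now = tx["date"].max()
rfm = tx.groupby("customer_id").agg(
    recency=("date", lambda d: (now - d.max()).days),   # days since last buy
    frequency=("date", "count"),
    monetary=("amount", "sum"),
)
# Rank-based scores; recency is reversed so that more recent = higher score.
rfm["R"] = rfm["recency"].rank(ascending=False)
rfm["F"] = rfm["frequency"].rank()
rfm["M"] = rfm["monetary"].rank()
rfm["RFM"] = rfm[["R", "F", "M"]].sum(axis=1)  # equal weights (Hughes, 1994)
print(rfm.sort_values("RFM", ascending=False))
```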

RFM models can only be utilized when historical sales data is present; without it, there is no input for the models. Moreover, RFM models predict future customer behaviour rather than a real or absolute future value of the different customers. Therefore, when trying to predict the CLV with these models, several drawbacks arise (Fader et al., 2005). To start, RFM models only estimate the income of period t+1 and not, as the CLV requires, the total sum over all future periods. Second, the RFM aspects are incomplete predictors of the underlying behaviour and can be interpreted ambiguously. Lastly, performed marketing actions are not taken into account.

Reinartz and Kumar (2003) compared a traditional RFM model with a probability model (Pareto/NBD) for predicting the customer lifetime value. They selected the top 30 per cent of their customer base with both models and found that the selection based on the probability model resulted in 33 per cent higher revenue. Venkatesan and Kumar (2004) conducted a similar study and found that the top five per cent of the customer base as selected by their CLV model was up to 50 per cent more profitable than the top five per cent selected by the RFM model. So, where RFM models are easy to explain and understand, their predictive power is somewhat limited, and other types of models might be more accurate predictors.

2.2 Probability Models

Aside from the RFM models, probability models exist. A probability model tries to explain observed behaviour as the result of an underlying stochastic process. This process is driven by individual characteristics that are heterogeneous across the population, so the observed behaviour differs across the population and follows a probability distribution. These models focus on describing and predicting behaviour rather than explaining differences as a function of covariates. Regarding the CLV, the aim is to predict whether or not a customer is still active in the future and what their purchasing behaviour will look like (Gupta et al., 2006).


Schmittlein, Morrison, and Colombo (1987) were among the first to introduce a model that captures these needs, namely the Pareto/NBD model. This model assumes that customers actively make purchases for an unobserved time before becoming inactive forever. Purchases are assumed to follow a Poisson distribution, with heterogeneity captured by the negative binomial distribution; the time a customer stays alive follows an exponential distribution, combined with a gamma-distributed dropout rate, which together yield the Pareto distribution. The only required inputs are the recency and frequency variables described for the RFM models. The model produces the number of active customers, a ranking of these active customers, and the future transaction levels (Schmittlein et al., 1987). As the model provides the number of transactions rather than the value of each transaction, it should be combined with a model that determines each transaction's financial value to arrive at a correct CLV prediction. The model is not useful in every context: it is only suited for non-contractual situations in which transactions do not follow a fixed time path. Transactions tied to fixed intervals, such as annual events, do not follow the Poisson assumptions, so such data do not fit Pareto/NBD models (Gupta et al., 2006).

Nonetheless, several companies face customers who interact with them on a discrete-time basis: transactions occur only at fixed time intervals or for specific events, or the firm reports transactions discretely even though they were made throughout the period. As the Pareto/NBD model assumes that customer purchasing follows a Poisson distribution, under which transactions can occur at any moment in time, it is not applicable in this situation. Fader, Hardie, and Shang (2010) present a different model that builds upon transaction data with a discrete-time interval. Based on their underlying assumptions, the developed model is a beta-geometric/beta-Bernoulli (BG/BB) model, the appropriate customer base analysis model in non-contractual settings with transactions on a discrete-time basis. Fader, Hardie, and Shang (2010) show that their model performs extremely well for this type of data, delivering highly accurate predictions across several sorts of forecasts. To verify whether this model is the correct one for this type of data, the authors compared it with a regular Pareto/NBD model. The BG/BB model outperforms the Pareto/NBD model, since the Poisson/exponential assumptions of the latter do not hold when transactions take place at discrete time intervals. However, this model only explains whether a transaction takes place or not. To predict the CLV, it should be combined with a model that accounts for each purchase's monetary value. The authors propose the gamma-gamma mixture model of Colombo and Jiang (1999), which has been combined in earlier research with Pareto/NBD models to determine the CLV (Fader et al., 2005) and should be a fitting candidate for this new type of model. However, the combination with a BG/BB model has yet to be made.
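To illustrate the data-generating story behind the BG/BB model, this sketch simulates one customer under its assumptions: a beta-distributed Bernoulli purchase probability while alive, and a beta-distributed geometric dropout after each discrete period. The parameter values are invented; in practice one would fit such data with a dedicated implementation (for example the BG/BB fitter in the Python lifetimes package) rather than simulate it:

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_customer(n_periods, a=1.2, b=2.5, g=0.7, d=5.0):
    """One customer's discrete-time purchase history under BG/BB-style
    assumptions; a, b, g, d are invented beta hyperparameters."""
    p = rng.beta(a, b)      # individual per-period purchase probability
    theta = rng.beta(g, d)  # individual per-period dropout probability
    purchases = []
    for _ in range(n_periods):
        purchases.append(int(rng.random() < p))  # Bernoulli purchase if alive
        if rng.random() < theta:                 # geometric "death" afterwards
            break
    return purchases

print(simulate_customer(10))  # e.g. [0, 1, 0] for a customer who died early
```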


2.3 Econometric Models

Traditional marketing literature proposes simple prediction models, using mainly aggregated data to predict the customer lifetime value (Berger & Nasr, 1998; Jain & Singh, 2002). Next to these simple models, researchers have proposed more sophisticated models based on behaviour. These econometric models try to predict customer behaviour, including customer acquisition (Lewis, 2005), customer retention (Gupta et al., 2004), and customer expansion or cross-selling (Kamakura et al., 2003, 2004; W. Reinartz & Kumar, 2003).

Customer acquisition indicates the first purchase of a new customer. Research on this topic tries to link acquisition to customer retention and to extend that to the CLV. Customer acquisition is mostly modelled with a logit or a probit model, depending on the assumed distribution of the error term (Lewis, 2005). Generally, for customer j at time t, acquisition is modelled as follows:

$$Z_{jt}^{*} = \alpha_j X_{jt} + \varepsilon_{jt} \qquad (3)$$

$$Z_{jt} = \begin{cases} 1 & \text{if } Z_{jt}^{*} > 0 \\ 0 & \text{if } Z_{jt}^{*} \leq 0 \end{cases} \qquad (4)$$

Where $X_{jt}$ represents the covariates of customer j at time t and $\alpha_j$ are customer-specific parameters.
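A minimal sketch of this latent-utility acquisition model on synthetic covariates, assuming statsmodels; swapping sm.Probit for sm.Logit gives the logit variant:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)

# Synthetic covariates X_jt (with intercept) and invented true parameters.
n = 500
X = sm.add_constant(rng.normal(size=(n, 2)))
z_star = X @ np.array([-0.5, 1.0, 0.8]) + rng.normal(size=n)  # latent Z*_jt
z = (z_star > 0).astype(int)                                  # acquisition dummy

print(sm.Probit(z, X).fit(disp=0).params)  # estimates close to the true values
```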

Retention models are somewhat less straightforward; two classes can be distinguished. The first class considers a churned customer "lost for good", whereas the second considers churning behaviour "always a share". The first class mostly uses hazard models to model customer churn, while the latter typically uses migration or Markov models. The lost-for-good models try to predict the relationship duration using hazard models, which can be subdivided into proportional hazard models and accelerated failure time (AFT) hazard models. Proportional hazard models assume that the hazard rate is a combination of a baseline hazard and the effects of the covariates. AFT hazard models consider a constant hazard rate determined by specifications of the standard error and mean; the different specifications lead to different models, such as a Weibull model (Gupta et al., 2006).
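As a sketch of the lost-for-good approach, the following fits a proportional hazards model on a hypothetical retention table, assuming the lifelines package (its WeibullAFTFitter would give the AFT variant):

```python
import pandas as pd
from lifelines import CoxPHFitter

# Hypothetical table: observed relationship duration (months), whether the
# customer churned (1) or is still active/censored (0), and one covariate.
df = pd.DataFrame({
    "duration": [12, 30, 7, 24, 18, 40, 3, 15],
    "churned":  [1, 0, 1, 0, 1, 0, 1, 1],
    "usage":    [2.0, 8.5, 0.5, 6.0, 3.0, 9.0, 0.2, 2.5],
})
cph = CoxPHFitter()
cph.fit(df, duration_col="duration", event_col="churned")
cph.print_summary()  # a usage hazard ratio below 1 would mean usage retains
```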

Moreover, Donkers, Verhoef, and de Jong (2007) also propose two retention approaches. The first focuses on modelling relationship aspects and keeps the derived profits constant over time. These relational aspects are projected onto a retention rate or relationship duration. To model this individual retention rate, the authors use several models: they derive the individual retention probabilities with a probit model and through a bagging approach. Second, besides the retention models, they propose duration models and Tobit-II models to derive the relationship duration. Derived future profits remain unchanged in their CLV determination, and thus their CLV models follow the form of:

$$E[CLV_i] = \sum_{t=1}^{T} \frac{Profit_i \cdot r_i^{\,t}}{(1+d)^t}$$


Next to customer acquisition and retention, revenues are needed to determine the CLV. These revenues depend on the customer's purchasing behaviour and on the firm's ability to cross-sell its products or services. To model the change in a firm's margin over time, several studies use simple models such as linear regression; these simple models perform relatively well in predicting future income (Venkatesan & Kumar, 2004).

Cross-selling is the process of increasing the number of products or services purchased by the customer. As the firm already has a relationship with these customers, the costs of selling to this group are lower than the costs of acquiring new customers. Moreover, cross-selling increases switching costs: the more services a customer uses, the higher the costs of switching to a competitor (Kamakura et al., 2003). Cross-selling may take place in various ways. The first way for companies to cross-sell is by making use of recommendation systems: information search and filtering tools that help consumers discover products tailored to their individual needs (Ferreira et al., 2020). Next to recommended purchases, customers might follow more of a sequence with their purchases. An example could be purchases of cloud computing services, where customers first purchase a relatively basic module, for example data storage, after which they decide to upgrade their initial service level with more complex modules such as machine learning and artificial intelligence applications. Such a cross-selling journey has been modelled in the research of Li, Sun, and Wilcox (2005). Their approach orders the products and their demand along a continuum representing the development of customer demand maturity: the closer a customer's demand maturity is to a product's place on the continuum, the more likely it is that a purchase takes place. They propose a multivariate probit model for the situation in which customer i makes a binary buying decision for each of the j products. The unobserved utility for customer i for product j at time t is given as:

$$U_{ijt} = \beta_i \left| O_j - DM_{i,t-1} \right| + \gamma_{ij} X_{it} + \varepsilon_{ijt} \qquad (5)$$

Where $O_j$ represents the position of product j on the continuum, and $DM_{i,t-1}$ stands for the demand maturity of customer i at the end of period t-1. X denotes other covariates that may influence the utility of customer i. In their study, this ordered or sequenced modelling approach is compared with four other models, ranging from a simple independent model to the most complex multivariate probit model with sequential ordering effects. Their multivariate probit model has the highest predictive power, with a mean absolute error (MAE) of 0.5%.

Another modelling approach is presented by Benoit and Van den Poel (2009). In their study, they suggest the use of quantile regression rather than classical linear regression. Classical linear regression models focus on the conditional mean function, which describes how the mean of y changes with the covariates x. One of the linear regression assumptions is that, for every value of x, the corresponding error term follows the same distribution: the components of the vector x ought to affect only the location of the conditional distribution of y and not its distributional shape. However, the distribution of this error term may not be the same across the whole vector of x (Benoit & Van den Poel, 2009); in other words, linear regression assumes that the error-term distribution is constant over the whole regression, which may not hold. Quantile regression, as described by Koenker and Bassett (1978), extends mean regression by modelling conditional quantiles of the response variable, for example the first quartile or the median of the distribution. This sort of regression gives a more subtle view of the relation between the variables, as it accounts for the shape of the conditional distribution.

Benoit and Van den Poel (2009) show in their research the benefits of using quantile regression for modelling the CLV compared to an OLS regression. As mentioned before, the information from a quantile regression is richer, as the conditional distribution is considered at every chosen quantile; the analysis explains the effect of every covariate for the predetermined quantile. With a large database, it is realistic to assume that the distribution differs across observations, and a quantile regression then gives a more complete view of the data than a standard linear regression would. In addition to this enriched view of the data, quantile regression performs better than mean regression in absolute CLV predictions.

Moreover, when quantile regression is used as a predictive method to order the customer base, it performs significantly better than linear regression. Particularly when focusing on the high-end customer group, the top of the customer base with the highest predicted CLV, quantile regression outperforms OLS regression. This effect wears off when larger top groups are considered. However, as the focus of interest lies mainly on the top customers with the highest predicted CLV, quantile regression is a useful extension of OLS regression.
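A minimal sketch contrasting OLS with quantile regression on synthetic, heteroskedastic customer-value data, assuming statsmodels' quantreg; because the error spread grows with x, the 90th-percentile slope differs visibly from the OLS slope:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)

# Synthetic data where the error spread grows with the covariate x.
df = pd.DataFrame({"x": rng.uniform(0, 10, 400)})
df["clv"] = 50 + 20 * df["x"] + rng.normal(0.0, 5 + 3 * df["x"].to_numpy())

ols = smf.ols("clv ~ x", df).fit()
q50 = smf.quantreg("clv ~ x", df).fit(q=0.5)   # median regression
q90 = smf.quantreg("clv ~ x", df).fit(q=0.9)   # high-end customers
print(ols.params["x"], q50.params["x"], q90.params["x"])
```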

2.4 Persistence Models

To create a sustainable competitive advantage over competitors, organisations try to create long-term effects with their marketing strategies. However, short-term results are the most visible, whereas the quantification of long-term responses is somewhat tricky. Persistence models try to capture these long-term responses by combining several reactions to marketing actions (Dekimpe & Hanssens, 2005) and treat them as a dynamic system in which every change in one variable results in a change in the other variable(s) (Gupta et al., 2006). Another domain where persistence models are used is the determination of customer equity, as described by Vraná and Jašek (2015) and Villanueva, Yoo, and Hanssens (2008). The modelling type used there is a vector autoregressive (VAR) model, which is frequently used to model persistence effects.

Villanueva, Yoo, and Hanssens (2008) model the effect of marketing-induced customer acquisition compared to word-of-mouth customer acquisition on customer equity growth, resulting in increased firm performance. For the CLV prediction, the persistence models focus on the earlier described customer acquisition, retention, and cross-selling. The persistence models are built in three steps. The first step is to check whether the variables are stable or evolving; this is done with a unit-root test (Villanueva et al., 2008), for example testing whether customer retention is stable over time or evolutionary. If the variables are stable over time, the VAR model can be estimated in the so-called level form; if the variables are evolutionary, the VAR model is estimated in changes form. Secondly, the VAR model has to be estimated, as is done by Villanueva, Yoo, and Hanssens (2008):

$$\begin{pmatrix} MKT_t \\ WOM_t \\ VALUE_t \end{pmatrix} = \begin{pmatrix} a_{10} \\ a_{20} \\ a_{30} \end{pmatrix} + \sum_{l=1}^{p} \begin{pmatrix} a_{11}^{l} & a_{12}^{l} & a_{13}^{l} \\ a_{21}^{l} & a_{22}^{l} & a_{23}^{l} \\ a_{31}^{l} & a_{32}^{l} & a_{33}^{l} \end{pmatrix} \begin{pmatrix} MKT_{t-l} \\ WOM_{t-l} \\ VALUE_{t-l} \end{pmatrix} + \begin{pmatrix} e_{1t} \\ e_{2t} \\ e_{3t} \end{pmatrix} \qquad (6)$$

Where MKT represents the number of customers acquired by the marketing actions of an organisation, WOM stands for the customers acquired through word-of-mouth, and VALUE represents firm performance. Time is expressed in t, and p represents the lag order of the model. The error terms $(e_{1t}, e_{2t}, e_{3t})$ are distributed $N(0, \Sigma)$. The direct effects of acquisition on firm value are captured by $a_{31}$ and $a_{32}$, the cross-effects by $a_{12}$ and $a_{21}$, the feedback effects by $a_{13}$ and $a_{23}$, and the reinforcement effects by $a_{11}$, $a_{22}$, and $a_{33}$. After the model is estimated, the impulse response functions must be derived. The impulse response function describes the response of the system after it is confronted with an impulse, in this case a change in the variables.
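The three steps can be sketched with statsmodels on synthetic series; the series, lag order, and horizon are placeholders:

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.api import VAR
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(3)

# Synthetic, mildly autocorrelated stand-ins for MKT, WOM, and VALUE.
data = rng.normal(size=(120, 3))
for t in range(1, 120):
    data[t] += 0.5 * data[t - 1]
df = pd.DataFrame(data, columns=["MKT", "WOM", "VALUE"])

# Step 1: unit-root (stable vs. evolving) check per series.
print("ADF p-value, MKT:", adfuller(df["MKT"])[1])

# Step 2: estimate the VAR in level form; lag order p chosen by AIC.
res = VAR(df).fit(maxlags=4, ic="aic")

# Step 3: derive the impulse response functions over ten periods.
irf = res.irf(10)
print(irf.irfs.shape)  # one response matrix per period
```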

As the CLV is a long-term performance metric, persistence models might perform well in this case. By utilising persistence models, insights into the relative importance of acquisition, retention, and cross-selling can be derived. On the other hand, persistence models require long time-series data as input, which may not always be present.

2.5 Computer Science Models

Next to structured and parametric analysis methods, such as logistic regressions or duration models, another stream of methods occurs in the marketing literature: computer science models, which predominantly focus on optimising predictive performance. Computer science methods operate in a non-parametric way and are likely to be better predictors than parametric methods, which are bound by their underlying assumptions. Parametric models rely on theory and are relatively simple to understand; computer science models only consider the input data and recognise patterns in the data. Some computer science models are support vector machines, neural networks, decision trees, and ensemble methods (Gupta et al., 2006). These methods are discussed next.

A support vector machine (SVM) is a supervised learning method frequently used for classification and regression analysis. Where parametric methods assume linear functions to classify the different observations, an SVM can use curvilinear functions. An SVM uses a kernel-induced transformation from the attribute space to a higher-dimensional space to grasp crucial data insights (Cui & Curry, 2005): it maps the data points into a multidimensional space and separates the classes so that they are as far apart as possible. These so-called kernel transformations ensure that the linear SVM can solve a non-linear problem. In marketing, SVMs are frequently used classification methods (Chen & Fan, 2013; Cui & Curry, 2005; Moro et al., 2014).

Cui and Curry (2005) compare the prediction accuracy of an SVM with a multinomial logit model using Monte Carlo simulations. They find that for every tested condition, the SVM outperformed the multinomial logit: the overall mean prediction rate of the logit model is approximately 73%, while the prediction rate of the SVM is up to 85%. An SVM is a relatively simple machine learning model, and therefore the chance of overfitting the presented data is lower than with more advanced models, so the results of an SVM are likely to be more generalisable (Lessmann & Voß, 2008).

Neural networks (NN) are mathematical representations inspired by the functioning of the human brain. A neural network encompasses an input layer, one or more hidden layers, and an output layer; each layer consists of one or more nodes (Lessmann & Voß, 2008). The number of hidden nodes sets the model complexity: zero hidden nodes reduce the network to a logistic regression (LR), while a higher number of nodes captures complex non-linear relationships. In the research of Moro, Cortez, and Rita (2014), the success of performed marketing actions is predicted using a neural network, an SVM, a logistic regression, and a decision tree (DT). They found that the NN outperformed the other methods on the predictive performance measures area under the curve (AUC), at 0.8, and cumulative lift (LIFT), at 0.7. For both measures, a value of one means that the model captures all the variance, so the NN performs relatively well.

Decision trees have been used for numerous classification problems. A decision tree assumes that an observation can be classified based on its underlying characteristics. It is composed of leaves and decision nodes: leaves indicate a final class, whereas decision nodes specify a test to be carried out on a single attribute value, with a branch for every possible outcome of that test. A decision tree classifies observations by starting at the root of the tree and moving along the branches until an observation ends up in a leaf (Quinlan, 1993). Several researchers have studied the performance of decision trees in CLV-related fields. As mentioned before, Moro, Cortez, and Rita (2014) compared decision tree performance with other predictive methods and concluded that neural networks outperform DTs.
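As shown below, the four model families compared by Moro, Cortez, and Rita (2014) can be contrasted on synthetic churn-like data with scikit-learn, using AUC as the performance measure; the data and settings are illustrative only:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary-outcome data standing in for a churn table.
X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

models = {
    "LR": LogisticRegression(max_iter=1000),
    "SVM": SVC(probability=True),
    "NN": MLPClassifier(max_iter=1000, random_state=0),
    "DT": DecisionTreeClassifier(random_state=0),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
    print(f"{name}: AUC = {auc:.3f}")
```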

Moreover, Verbeke et al. (2012) study the predictive performance of different models for customer churn and indicate that their chosen decision tree method, C4.5, performs among the best. Furthermore, where NNs and SVMs result in complex non-linear models, decision trees are easy to interpret and might therefore be favoured over models with higher predictive performance.

Whereas the previous methods use a single model, other analysis methods combine models to enhance predictive power; these are called ensemble methods. Examples are bagging and boosting. The idea of bagging is that the training data of N observations is sampled randomly with replacement into M subsets. Each observation may appear several times in a subset or not at all, so every subset consists of a different sample of observations. For each subset a model is built; all those models are then ensembled, and the average over the M models is the output (Breiman, 1996). The models frequently used in this case are classification trees. In the CLV literature, bagging is a regularly used prediction method. Donkers, Verhoef, and de Jong (2007) use a bagging approach where the average predicted probability over all models is used to identify an average retention rate; this rate is multiplied by the expected profits to arrive at the CLV. Compared to the other methods, probit, Tobit-II, and a duration model, the bagging approach yielded no enhanced results.

Furthermore, Lemmens and Croux (2006) compare the results of a bagging model and a binary logistic regression for churn prediction. They found that the bagging approach had a predictive power up to 16 per cent better on the GINI coefficient and 26 per cent better on the top-decile lift. They calculated the value of the bagging approach relative to the logit model and concluded that its increased predictive performance was worth up to $3 million.
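A minimal scikit-learn sketch of a bagged ensemble of classification trees, with a top-decile lift computed in the spirit of Lemmens and Croux (2006); all data are synthetic:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Imbalanced synthetic churn data (about 20% positives).
X, y = make_classification(n_samples=2000, weights=[0.8], random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=1)

bag = BaggingClassifier(DecisionTreeClassifier(), n_estimators=100,
                        random_state=1).fit(X_tr, y_tr)
p = bag.predict_proba(X_te)[:, 1]

# Top-decile lift: churn rate among the 10% highest-scored customers,
# divided by the overall churn rate.
top = np.argsort(p)[::-1][: len(p) // 10]
print(f"top-decile lift: {y_te[top].mean() / y_te.mean():.2f}")
```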

2.6 Diffusion/Growth Models

The previous models determine the CLV of individual customers and are used for segmentation and targeting purposes; they use disaggregated data, which suits operational purposes best. However, CLV data should be aggregated to make it more fitting as a strategic metric. When considering the complete customer base and its value, different sorts of models are required. These models focus on customer equity, which encompasses the CLV of all current and future customers. The acquisition of future customers can be modelled with two different methods. Firstly, acquisition is modelled using the earlier discussed methods of identifying an individual customer and its probability of becoming an active customer. The second way considers aggregated CLV data and uses diffusion or growth models to arrive at the future customer equity, with which the firm performance can be assessed.

Bass (1969) was one of the first to model adoption and diffusion. He proposes that the probability that people adopt products or services can be expressed as a function of the number of previous buyers. Where the focus of Bass (1969) lies on the adoption and diffusion of products, his model can be used in a much broader context. Gupta, Lehmann, and Stuart (2004) adjusted this model, the so-called technological substitution model, to estimate the number of customers in future periods. Bass, Jain, and Krishnan (2000) state that the estimates derived from this model are comparable to those of the Bass model. The number of customers at period t is expressed as:

$$n_t = \frac{\alpha \gamma \exp(-\beta - \gamma t)}{\left[1 + \exp(-\beta - \gamma t)\right]^2} \qquad (7)$$
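Under invented parameter values, equation (7) can be fitted to a series of per-period customer counts with a standard least-squares routine, as in this sketch:

```python
import numpy as np
from scipy.optimize import curve_fit

def n_t(t, alpha, beta, gamma):
    """Technological substitution model of equation (7)."""
    e = np.exp(-beta - gamma * t)
    return alpha * gamma * e / (1 + e) ** 2

# Hypothetical observed customer counts: the true curve plus noise.
t = np.arange(1, 21)
observed = n_t(t, 5000, 4.0, 0.4) + np.random.default_rng(4).normal(0, 10, t.size)

params, _ = curve_fit(n_t, t, observed, p0=[4000, 3.0, 0.3])
print(params)  # estimated alpha, beta, gamma
```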

Gupta, Lehmann, and Stuart (2004) assessed their model's performance with data from five companies. They compare their customer equity estimates with firm value and show that in 60% of the cases, their CE model approximates firm value. In addition, they split up the effects of acquisition, retention, and margins on customer equity. They found customer retention the most influential: an increase of one per cent in customer retention resulted in an average increase of five per cent in customer equity. On the aggregate level, customer retention is thus most influential for customer equity, the sum of all CLVs.

In addition to the model proposed by Gupta, Lehmann, and Stuart (2004), Libai, Muller, and Peres (2009) propose a services growth model from which firm value can be derived. Their model describes a service firm's growth and accounts for two forms of attrition: churn and disadoption. Disadopting customers stop using the service altogether, while churning customers switch to competitors. To test their model's predictive performance, they compare it with firms' stock market values; their customer equity estimates are close to the stock market performance, differing by eighteen per cent on average. Additionally, they test a model without the attrition attributes and conclude that it performs far worse than the model including these attributes.

Diffusion and growth models consider customers' aggregated data to measure the growth of specific attributes, in this case customer equity. Both the studies of Gupta, Lehmann, and Stuart (2004) and Libai, Muller, and Peres (2009) reveal the importance of customer retention when assessing the customer lifetime value and customer equity.

2.7 Model selection

All the aforementioned methods model the CLV differently, and in almost every study a different method performs best. So, from the literature it cannot be concluded unambiguously that one method always outperforms another. What previous studies do make clear, however, is that not all methods can be used in every situation. For example, persistence models can only be utilised when sufficient time-series data is present. Additionally, diffusion/growth models consider aggregated data to predict a customer base's trends and equity; in that case, the analysis would focus on predicting future customer equity rather than the individual CLV. As this study focuses on the CLV and not on customer equity, diffusion/growth models are not considered. Moreover, the present data do not allow for persistence models, as they cover too short a timeframe. For this study, a similar approach is chosen as in the research of Donkers, Verhoef, and de Jong (2007), where retention and revenue are modelled separately. To ultimately arrive at the CLV, the two models are multiplied.


3. METHODOLOGY

This section describes the set-up of this research and the application of the different methods used. First, a general framework to compare the performance of the different methods is determined. After that, the data collection, preparation, and exploration are discussed.

To determine which of the previously mentioned models performs best, fixed inputs should be provided. As the inputs to all the models are consistent, the difference in performance is determined by the method of analysis. In this study, the framework of Bolton, Lemon, and Verhoef (2004) is adopted.

3.1 Conceptual framework

The research of Bolton, Lemon, and Verhoef (2004) introduces customer asset management of services (CUSAMS), something that is closely linked to the CLV. The framework describes how marketing instruments influence customer-firm relationship perceptions and, ultimately, financial outcomes. Those financial outcomes flow from customer behaviour that is affected by the relationship perceptions. Three customer behaviour aspects are identified: the length of the customer-firm relationship, the service's usage frequency, and the number of additional products or services purchased. The joint outcomes of those behavioural aspects result in the revenue, which is used in the CLV calculation. The CUSAMS framework focuses on a service-providing firm, which Microsoft is as well. A visual representation of the framework can be found in Appendix A.

Relationship length

It is expected that the length of time a customer interacts with the firm positively affects the CLV: the longer the relationship lasts, the lower the chance of the customer churning. Customers interact more with a firm and its services and get used to those services. Moreover, switching costs might increase as the relationship length increases. As proposed by the CUSAMS framework (Bolton et al., 2004), customer behaviour is influenced by customer-firm relationship perceptions. Bolton (1998) found earlier that customer satisfaction results in a longer relationship duration, and Verhoef (2003) demonstrates that customer satisfaction positively affects customer retention and thus relationship length. From these studies it can be deduced that a customer with a long customer-firm relationship is most likely a satisfied customer and will probably stay at the company longer, resulting in a higher CLV.

Relationship depth

As proposed by Bolton, Lemon, and Verhoef (2004), relationship depth can be seen as the frequency of service usage over time. Service usage assumably has a positive effect on retention, and thus on the CLV as well, as customers must stay at the firm to use a company's services. Moreover, a higher level of service usage over time signals a deeper relationship between the customer and the firm, reflecting the customer's intention to maintain the relationship. So, the deeper the customer-firm relationship, the higher the CLV.

Relationship breadth

Relationship breadth can be expressed as the cross-buying or add-on buying of customers (Bolton et al., 2004). Cross-buying is the process of buying an additional number of products or services from the company over time (Blattberg et al., 2001). In a contractual setting, customer retention is extended by the cross-selling of multiple services (Srivastava & Shocker, 1987). Moreover, O'Neal and Bertrand (1991) state that B2B customers in a long-term relationship with their supplier generally have a greater relationship scope. Furthermore, companies try to sell as many different products as possible to the same customer in order to create higher switching costs, resulting in less switching behaviour (Blattberg et al., 2001).

3.2 Data collection & -cleaning

The data is collected via the sales channels of Microsoft Netherlands. It considers the Dutch Microsoft Azure customers and ranges over the period from July 2017 up until October 2020. The data consists of historical usage behaviour and splits the monthly consumption into consumption per service. The data of the 295 biggest Dutch customers of Microsoft is taken into consideration for this study. As described earlier, Microsoft Azure is consumed in a pay-as-you-go subscription. Customers can have multiple subscriptions, for example a separate subscription for each project or functional division. The available data is monthly consumption data at subscription level; the dataset consists of the consumption data of 11,798 subscriptions.

Next to the consumption data, firm characteristics are added, such as industry and firm size. It is important to incorporate those variables: a larger firm likely has more resources than a smaller firm and can therefore dedicate more resources to its IT infrastructure.

The data set contains a variable with the name of the company that is a customer at Microsoft. This is transformed in such a way that privacy-sensitive information is invisible: company names are anonymised by factorising them and then transforming them to numeric values.
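A minimal sketch of this anonymisation step with pandas; the company names are Microsoft's fictional sample names, not real customers:

```python
import pandas as pd

# factorize maps each distinct company name to an arbitrary integer code,
# so names never appear in the modelling data.
names = pd.Series(["Contoso", "Fabrikam", "Contoso", "Tailwind"])
codes, uniques = pd.factorize(names)
print(codes)  # [0 1 0 2]
```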

3.3 Variable description

In this section, all the variables used during the modelling procedure are described, and the operationalisation of the theoretical constructs is discussed.


3.3.1 Dependent Variables

Churn. A binary variable churn is created, indicating whether a customer churns in a period or not. To determine whether a customer has churned, the variable Fiscal Month is used as indicator. When a customer is inactive, that is, has a consumption of zero dollars, in the latest available time period, October 2020, the churn variable is one; when the customer is active, it is zero.
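A pandas sketch of this churn flag on a toy table; the column names are illustrative, not Microsoft's actual field names:

```python
import pandas as pd

df = pd.DataFrame({
    "customer_id": [1, 1, 2, 2],
    "FiscalMonth": ["2020-09", "2020-10", "2020-09", "2020-10"],
    "Grand_Total": [150.0, 0.0, 200.0, 310.0],
})
last_month = df["FiscalMonth"].max()            # latest available period
latest = df[df["FiscalMonth"] == last_month]
# churn = 1 when consumption in the latest period is zero, else 0.
churn = (latest.groupby("customer_id")["Grand_Total"].sum() == 0).astype(int)
print(churn)  # customer 1 churned, customer 2 is active
```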

Revenue. The CLV is, as mentioned before, the summation of the total profits derived from a certain customer over its lifetime. Due to the competitively sensitive nature of the information on Microsoft's margins, the costs of a single Azure service are not available, so profits cannot be calculated in this case. Therefore, not profits but revenues are considered.

Customer Lifetime Value. To ultimately model the CLV, the former two dependent variables are combined. The determination of the customer lifetime value is two-fold in this case: first, the retention probability of a customer is modelled; second, the revenues derived from that customer in the next period are considered. The CLV is then a multiplication of the probability of staying and the derived future revenues. Predictions always carry a certain degree of uncertainty: when extrapolating a model that predicts with 95 per cent accuracy per period, the accuracy after only six periods is about 74 per cent. So, for every additional period in the prediction, accuracy decreases. Therefore, the timeframe is limited in this case to one year, twelve periods.
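A small numeric sketch of this two-step multiplication and of the compounding-accuracy argument, using hypothetical model outputs:

```python
# Hypothetical outputs of the two models for one customer.
p_retain = 0.92   # churn model: per-period probability of staying
revenue = 1200.0  # revenue model: predicted revenue per period

# CLV over a twelve-period horizon: retention-weighted future revenues.
clv_12 = sum(revenue * p_retain**t for t in range(1, 13))
print(round(clv_12, 2))

# Why the horizon is capped: 95% per-period accuracy compounds to ~74%
# after six periods.
print(round(0.95**6, 3))  # 0.735
```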

3.3.2 Independent variables

Relationship length. As the data starts in July 2017, customers may already have been active before that time; however, as this is not observable, it is not considered. To account for the construct of relationship length, a new variable rel_length is created, which counts the number of months a customer is active, starting at the time a customer becomes active for the first time.

Relationship depth. As Microsoft uses a pay-as-you-go subscription model, service usage is directly reflected in the price paid by the customer. This price is split per module that Microsoft Azure consists of, so for every module the revenue reflects the depth of the relationship. The variables used are AI, App_Dev, Cloud_Scale_Analytics, Compute, IoT, Networking, OSS_Data, Rest_of_ADS, Security, SQL_Data_Modernization, and Storage. The total consumption is displayed in the variable Grand_Total, the sum of the consumption of all the aforementioned products.

Relationship breadth. The relationship breadth consists of the add-on buying of services. In this case, relationship breadth is considered the total number of different products in use: the variable rel_breadth sums the number of products for which the consumption is above zero dollars per month.


Lagged variables. To be able to predict the total revenues of a period, historical consumption data is used. Therefore, lagged variables, showing the consumption of period t-1, are created and added to the data set for all the consumption variables mentioned earlier.
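A pandas sketch of the lag construction on a toy subscription panel; each consumption column receives a t-1 counterpart within its own subscription:

```python
import pandas as pd

panel = pd.DataFrame({
    "subscription": [1, 1, 1, 2, 2],
    "month": [1, 2, 3, 1, 2],
    "Compute": [10.0, 14.0, 9.0, 0.0, 5.0],
    "Storage": [3.0, 0.0, 4.0, 2.0, 2.0],
}).sort_values(["subscription", "month"])

for col in ["Compute", "Storage"]:
    panel[f"{col}_lag1"] = panel.groupby("subscription")[col].shift(1)

# The first period of each subscription is NA by construction and is
# dropped, as described in section 3.4.1.
print(panel)
```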

Engagement level. To assess the effectiveness of marketing activities on customer behaviour, the variable engagement level is added. This variable ranks the customer on a four-point scale, based on underlying data concerning customers' interactions with marketing actions of Microsoft, for instance attendance of a webinar or workshop, downloads of product-specific content, or the creation of free or trial accounts for Azure. All these interactions are scored and result in one of four categories: none, low, medium, or high.

Control variables. Next to the main variables, some control variables are included. These variables are not the main concern of this study, but they might have an impact that should be considered. First, the type of industry is incorporated in the categorical variable Industry, which has 23 levels. The second control variable is PCIB, which stands for "PC installed base" and indicates how many computers a company has; it is considered an indication of the number of employees.

3.4 Data exploration

This section elaborates on the data used in this study. It focuses on how irregularities in the data are treated, and presents descriptive statistics of the different variables.

3.4.1 Missing values

The data contains a total of 1,333,941 NA values, present in three sorts of variables. First, there are 1,289,883 NAs in the consumption variables. These appear when there is zero consumption of that product in the given time period. The missing values thus have a meaning, namely no consumption, and should therefore not be imputed with a prediction-based imputation method but with zeros.

Next to the missing values in the consumption data, there are NAs in the variables PCIB and Engagement level, because no installed-base data were available for those specific customers. For PCIB, the NAs do not mean that there is no PC installed base; rather, the data is missing at random, and the same holds for Engagement level. To avoid a loss of statistical power, these NAs are imputed using hot-deck imputation, which uses the available data points to predict the missing values. Imputation will eventually have an impact on the results of the models; however, this impact is assumed to be smaller than that of deleting the observations. For the variable PCIB, approximately fourteen per cent of the observations are imputed; for the engagement level, only eight per cent.
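A minimal sketch of random hot-deck imputation for PCIB, drawing each missing value from the pool of observed donor values; dedicated implementations (for example the R package VIM) additionally match donors on covariates:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(5)

pcib = pd.Series([79.0, 450.0, np.nan, 8452.0, np.nan, 120000.0])
donors = pcib.dropna().to_numpy()      # observed values act as donors
mask = pcib.isna()
pcib[mask] = rng.choice(donors, size=mask.sum())  # random draw per missing
print(pcib)
```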


To conclude, there are missing values in the lagged variables, all in the first period of a subscription. Since lagged variables hold the consumption of the previous period, it is impossible to create a lagged variable for the first period. These NAs are imputed neither with imputation methods nor with zeros; instead, only customers who are active for more than one period are considered in the analyses. In total, 0.6 per cent of the observations have a relationship length of only one period and, due to this small share, are not considered.

3.4.2 Data irregularities

Some things in the data attract attention. First, there are many zeros present, especially in the consumption variables; this is easy to explain, since these variables were imputed with zeros as described in the previous section. Secondly, the provisional descriptive statistics show 48 negative values in the consumption variables, which would imply that customers received money back from Microsoft. After consultation with Microsoft, it was decided to delete the instances containing these negative values, as these values were corrupted during the extraction process from the internal databases.

3.4.3 Descriptive statistics

After the inspection of and adjustments to the data, the descriptive statistics are created. As mentioned before, the dataset consists of 183,489 observations from the 295 biggest Microsoft customers. This classification is based on the total spend on all the Microsoft solution areas, including Data & AI, Modern Workplace, Business Applications, and Applications and Infrastructure. All those customers can have multiple subscriptions for Azure, mainly used by different functional divisions within a company or for specific projects. The database is aggregated at subscription level and consists of 11,798 subscriptions. The data covers 23 different industries, including but not limited to Automotive, Discrete Manufacturing, Higher Education, Retailers, and Telecommunications; for an overview of all the industries and their relative shares, see Appendix B. Due to privacy concerns, company names are not provided.

When looking at the descriptive statistics in Table 1, some interesting patterns emerge. Given the large number of imputed zeroes, it is unsurprising that the minimum of every consumption variable is zero. For AI consumption, the median equals the minimum of zero, meaning that the middle observation in the ordered data is a zero. This holds not only for AI but for almost all consumption variables, except Compute, Networking, and Storage. Another notable point is that the maxima are extremely high compared to the means. This can be explained by the number of zeroes in the data set and by the fact that, with 295 companies, large individual differences are present. As this study encompasses the consumption of cloud computing, a topic that may be relatively new for many companies, such differences are to be expected.

When assessing the relationship breadth, customers use on average 3.65 products, with a median of four, a minimum of one, and a maximum of eleven products in use. Concerning the relationship length, customers are on average active for 14.06 months, with a minimum of one period and a maximum of forty. For the control variable PCIB, the numbers are quite high: on average 34,739 installed PCs per company, with a minimum of 79 and a maximum of 446,547. The variable linked to company size thus shows a wide distribution within the dataset.

Variable                  Min    Max       Mean      Median
AI                        0      79,511    31.55     0
App development           0      128,784   331.25    0
Cloud Scale analytics     0      241,182   219.4     0
Compute                   0      279,173   991.02    13.95
IoT                       0      9,236     2.727     0
Networking                0      49,048    96.08     0.78
OSS Data Modernization    0      98,199    88.61     0
Rest of ADS               0      40,302    13.66     0
Rest of apps and infra    0      36,579    50.1      0
Security                  0      91,849    163.70    0
SQL Data Modernization    0      233,730   474.77    0
Storage                   0      260,109   656.34    15.51
Grand total               1      779,406   3119.1    210
Relationship breadth      1      11        3.65      4
Relationship length       1      40        14.06     12
PCIB                      79     446,547   34,739    8452

Table 1: Descriptive statistics
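The figures in Table 1 are standard summary statistics; a sketch of how such a table can be reproduced with pandas, using hypothetical column names:

```python
import pandas as pd

# Hypothetical column names; the result has one row per variable
# with its minimum, maximum, mean, and median.
stats_cols = ["AI", "Compute", "Storage", "rel_breadth", "rel_length", "PCIB"]
table1 = df[stats_cols].agg(["min", "max", "mean", "median"]).T.round(2)
print(table1)
```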

3.5 Churn models

In this section the different methods to model churn are explained. The first model is a logistic regression; the subsequent models are machine learning methods. All of these models aim to estimate the retention probabilities. First, the logistic regression is elaborated on, after which a short overview of the machine learning methods used is given.

3.5.1 Logistic regression model

The first model predicts whether a customer will churn or not. The dependent variable of this model is binary, meaning that it takes a value of zero or one. As the variable is binary in nature, a logistic regression is conducted.


This predictive analysis is used to elaborate on the relationship between churn and the (lagged) consumption variables as well as the earlier described CUSAMS concepts. The independent variables are the continuous lagged consumption variables, the continuous CUSAMS variables, and the control variables. This model is chosen because it is easy to implement and the effects of the independent variables are easy to extract and interpret.

The logistic regression model, or logit model, assumes that a latent utility drives the observed customer behaviour (Leeflang et al., 2015). The latent utility follows a linear form and is defined as:

( 8 )  $U_i = \alpha + x_i'\beta + \varepsilon_i$

The latent utility and the behavioural outcome are linked as follows:

( 9 )  $Y_i = \begin{cases} 0 & \text{if } U_i \leq 0 \\ 1 & \text{if } U_i > 0 \end{cases}$

So, if the latent utility is equal to or smaller than zero, the customer decides to stay; when the latent utility is greater than zero, the customer decides to churn or defect. The observed outcome behaviour $Y_i$ and the latent utility $U_i$ are linked through the logit probability:

( 10 )  $P[Y_i = 1] = \frac{\exp(U_i)}{1 + \exp(U_i)}$
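To illustrate how such a logit model can be estimated in practice, a minimal Python sketch using statsmodels is given below; the variable names and the data frames train_df and test_df are hypothetical stand-ins, not the exact specification of section 3.5.3:

```python
import statsmodels.formula.api as smf

# Churn (0/1) explained by lagged consumption, relationship variables, and
# controls; C(industry) adds industry dummies.
formula = (
    "churn ~ AI_lag + compute_lag + storage_lag + "
    "rel_length + rel_breadth_lag + PCIB + engagement_level + C(industry)"
)
logit_res = smf.logit(formula, data=train_df).fit()
print(logit_res.summary())

# Out-of-sample churn probabilities follow equation (10): exp(U) / (1 + exp(U)).
p_churn = logit_res.predict(test_df)
```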

3.5.2 Alternative classification techniques

As the decision to churn is binary in nature, it can also be treated as a classification problem. Therefore, next to the logit model, several other classification methods are applied and their performance is assessed. More specifically, the churn decision is also modelled with a Decision Tree, a Random Forest, a Neural Network, and a Support Vector Machine.

The first classification method is a decision tree, which is based on the idea of recursive data splitting. The tree begins with all the data in the root node and splits this into mutually exclusive child nodes, such that the observations within each node are as homogeneous as possible with respect to the dependent variable and maximally different from the observations in other nodes. This process is repeated for every child node until the gain from splitting falls below a certain threshold. Because the data is split at every node according to a simple rule, a decision tree is easy to follow and interpret, which is why it is often used in practice.

A random forest is an ensemble learning method. Ensemble methods combine multiple models of the same data to obtain a final model. For the random forest, multiple random training data sets are created, and an individual decision tree is grown on each of them. At each node, m variables are randomly selected from all available predictor variables. As in a decision tree, the data at each node is then split into two groups according to a certain threshold. The difference with a single decision tree is that the random forest output is the aggregation of all the individual trees.

Neural networks are machine learning models inspired by the human brain. They consist of input, hidden, and output layers. The complexity, or depth, of the model is determined by the number of hidden layers and the nodes within them. Deeper networks are often more accurate but take longer to train, so balancing training time and model complexity is an important aspect of this method.

The last classification method is the support vector machine. This is a supervised learning method that separates observations with a hyperplane and can capture non-linear class boundaries through kernel transformations. Because of these kernel transformations it works fundamentally differently from the other machine learning and regression methods, which is why it is included in this research.
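The four alternatives can be benchmarked in a single loop; a minimal scikit-learn sketch, assuming a prepared feature matrix X and churn labels y, with illustrative hyperparameters rather than the settings used in this study:

```python
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# 75/25 split, mirroring the in-sample/out-of-sample design of section 4.1.1.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=1)

models = {
    "Decision tree": DecisionTreeClassifier(max_depth=5),
    "Random forest": RandomForestClassifier(n_estimators=500),
    "Neural network": make_pipeline(StandardScaler(),
                                    MLPClassifier(hidden_layer_sizes=(16,), max_iter=1000)),
    "SVM (RBF kernel)": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    acc = accuracy_score(y_test, model.predict(X_test))
    print(f"{name}: out-of-sample accuracy = {acc:.3f}")
```

Scaling is applied inside a pipeline for the neural network and the SVM, since both are sensitive to the large differences in variable magnitudes visible in Table 1.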

3.5.3 Model specification

The final model, in which the theoretical constructs are translated into operational variables, is specified as follows:

$U_i = \alpha + \beta_1 AI\_Lag_i + \beta_2 App\_Dev\_Lag_i + \beta_3 Cloud\_Scale\_Lag_i + \beta_4 Compute\_Lag_i + \beta_5 IoT\_Lag_i + \beta_6 Networking\_Lag_i + \beta_7 OSS\_Data\_Modernization\_Lag_i + \beta_8 ADS\_Lag_i + \beta_9 Apps\_Infra\_Lag_i + \beta_{10} Security\_Lag_i + \beta_{11} SQL\_Data\_Modernization\_Lag_i + \beta_{12} Storage\_Lag_i + \beta_{13} PCIB_i + \beta_{14} Rel\_length_i + \beta_{15} Rel\_breadth\_lag_i + \beta_{16} Engagement\_level_i + \gamma' Industry_i + \varepsilon_i$

Where:

$U_i$: utility of customer i for churning
$\alpha$: constant (intercept)
$\beta_1, \ldots, \beta_{16}$: slope parameters (effects)
$AI\_Lag_i$: lagged AI consumption of customer i
$App\_Dev\_Lag_i$: lagged App development consumption of customer i
$Cloud\_Scale\_Lag_i$: lagged Cloud scale analytics consumption of customer i
$Compute\_Lag_i$: lagged Compute consumption of customer i
$IoT\_Lag_i$: lagged IoT consumption of customer i
$Networking\_Lag_i$: lagged Networking consumption of customer i
$OSS\_Data\_Modernization\_Lag_i$: lagged OSS Data Modernization consumption of customer i
$ADS\_Lag_i$: lagged Rest of ADS consumption of customer i
$Apps\_Infra\_Lag_i$: lagged Rest of Apps and Infra consumption of customer i
$Security\_Lag_i$: lagged Security consumption of customer i
$SQL\_Data\_Modernization\_Lag_i$: lagged SQL Data Modernization consumption of customer i
$Storage\_Lag_i$: lagged Storage consumption of customer i
$PCIB_i$: number of installed PCs of customer i
$Rel\_length_i$: length of the relationship of customer i
$Rel\_breadth\_lag_i$: lagged relationship breadth of customer i
$Industry_i$: industry dummies for customer i, with coefficient vector $\gamma$
$Engagement\_level_i$: engagement level of customer i
$\varepsilon_i$: error term

3.6 Revenue model

The second model needed for a correct CLV determination predicts the revenue of future periods. The dependent variable of this model is therefore continuous, namely the total revenue earned in period t+1. To model the effects of several independent variables on a continuous dependent variable, a multiple linear regression is most suitable. A multiple linear regression assumes that one dependent variable Y depends on multiple independent variables X, linked via the following linear function:

( 11 )  $Y_i = \beta_1 x_{i1} + \beta_2 x_{i2} + \ldots + \beta_j x_{ij} + \varepsilon_i$

Here the dependent variable is the total revenue of a period, and the independent variables include the lagged consumption variables. Because the revenue model is used to predict future income, only lagged consumption can serve as input; for the same reason, the lagged version of relationship breadth is used. With this approach, only the revenues of the first month of each subscription are excluded from the analyses. If, for instance, a model were built on the lead version of total revenue, the effect of the last observation in time would not enter the regression. As the most recent consumption is considered more informative than the first observation, the lagged specification is preferred.
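A sketch of the revenue model in the same style as the earlier logit sketch, where revenue_t1 is a hypothetical name for the total revenue of period t+1 and p_churn reuses the churn probabilities from that sketch; all names are illustrative:

```python
import statsmodels.formula.api as smf

# Next-period revenue explained by lagged consumption and relationship variables.
formula = (
    "revenue_t1 ~ AI_lag + compute_lag + storage_lag + "
    "rel_length + rel_breadth_lag + PCIB + C(industry)"
)
ols_res = smf.ols(formula, data=train_df).fit()

# Combining the two models: expected next-period revenue is the retention
# probability (1 - churn probability) times the predicted revenue.
expected_revenue = (1 - p_churn) * ols_res.predict(test_df)
```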


4. RESULTS

This chapter comprises the results of the study. Different classification techniques are compared to ultimately decide which method should be used to model the churn decision.

4.1 Churn model

To model the first part of the CLV estimation, the effect of the independent relationship variables on the dependent variable churn is measured. To arrive at a final model that is as accurate as possible, different models are considered and tested on their in-sample and out-of-sample accuracy. The first model includes all the variables described in section 3.5.3, with relationship depth represented by the lagged consumption variables. The second model is identical to the first but differs in how the lagged variables are calculated. As the descriptive statistics show, many extreme values are present in the data, and a logistic regression typically faces difficulties with extreme values. In the second model the lagged variables therefore represent the average of the two preceding periods. This transformation smooths the data, making the extreme values less extreme, while an average of only two periods avoids transforming the data too much. Model three is derived from model one and includes only the statistically significant parameters of model one; model four evolves from model two in the same way.
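The two-period averaging used in the second model can be implemented as a rolling mean over the shifted series; a minimal sketch, reusing the hypothetical subscription identifier and consumption columns from the earlier lag construction sketch:

```python
# Model 2: smooth extreme values by averaging the two most recent lagged periods.
for c in consumption_cols:
    df[f"{c}_lag2avg"] = df.groupby("subscription_id")[c].transform(
        lambda s: s.shift(1).rolling(2).mean()
    )
```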

4.1.1 Validity

To determine which model suits the data best, and is therefore the preferred model, all models are compared. Additionally, a null model, which includes only an intercept for the dependent variable, is included as a benchmark. To assess how the models perform, the data is split into two samples: the in-sample data consists of 75 per cent of the observations, and the out-of-sample data of the remaining 25 per cent. The in-sample data is used to assess the model fit of the different models, whereas the out-of-sample data is used to quantify their prediction performance.

To assess the model fit of the different models, several validity tests are conducted. The first two measures are the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC). Both criteria assess model fit while penalizing additional variables, are calculated over the in-sample data, and are used for model selection: the lower the value, the better the fit. Based on the information criteria, model three fits the data best. Next to the information criteria, the loglikelihoods of the models are compared using likelihood ratio tests. As model three contains only the significant variables of model one, and model four only those of model two, the test is performed between models one and three and between models two and four. Models three and four do not differ significantly from models one and two, respectively. Based on the value of the loglikelihood itself, model one fits the data best.
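These fit measures can be read directly off fitted statsmodels results; a sketch, assuming m1 and m3 are the fitted logit results for the full model one and the restricted model three from the earlier sketch:

```python
from scipy import stats

# In-sample fit: lower AIC/BIC indicates a better penalized fit.
for name, res in [("Model 1", m1), ("Model 3", m3)]:
    print(f"{name}: AIC={res.aic:.1f}, BIC={res.bic:.1f}, logLik={res.llf:.1f}")

# Likelihood ratio test of the restricted model 3 against the full model 1.
lr_stat = 2 * (m1.llf - m3.llf)
df_diff = m1.df_model - m3.df_model
p_value = stats.chi2.sf(lr_stat, df_diff)
print(f"LR = {lr_stat:.2f}, p = {p_value:.3f}")
```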
