To churn or not to churn? : an empirical analysis on customer churn behavior at a Dutch energy supplier during a loyalty media campaign using parametric and semiparametric selection models

Academic year: 2021

To churn or not to churn? An empirical analysis on customer churn behavior at a Dutch energy supplier during a loyalty media campaign using parametric and semiparametric selection models

by

Karlijn Schipper

10002106

A thesis submitted in partial fulfillment for the degree of Master of Science Econometrics, Free track

Under the supervision of dr. J.C.M. van Ophem and dr. M.J.G. Bun

in the

Faculty of Economics and Business University of Amsterdam

Statement of Originality

This document is written by Karlijn Schipper who declares to take full responsibility for the contents of this document. I declare that the text and the work presented in this document is original and that no sources other than those mentioned in the text and its references have been used in creating it. The Faculty of Economics and Business is responsible solely for the supervision of completion of the work, not for the contents.

Abstract

This paper analyzes customer churn behavior of customers of a Dutch energy supplier during a specific media campaign of the company in 2014 and 2015 using parametric and semiparametric selection models. Customer churn is known as the loss of customers during a given time period and in this paper customer churn behavior is defined as (1) the switch of a customer to another energy supplier during a specific loyalty campaign, and (2) the number of days up to a churn beginning at the start of the loyalty campaign. To control for a possible sample selection bias due to censoring, parametric and semiparametric two-step selection models are applied. Results show that the loyalty media campaign of the energy supplier has a significant, negative effect on customer churn. Other important churn determinants are customer loyalty measured by the number of years a customer is already a member of the company, an active customer status indicated by a switch of products prior to the observation period, being selected for a churn prevention program and dissatisfaction with the energy provider indicated by contacts with the service desk. However, correctly estimating the time in days up to a churn is difficult.

Acknowledgements

First of all, I would like to thank dr. J. C. M. van Ophem for his supervision, generous guidance, support, suggestions and valuable comments on my work during the process of writing this thesis.

Next, I would like to thank Friso Westenberg and Rixt Altenberg for the opportunity to do research at the concerning Dutch energy supplier, and Fred Veens and Eline van den Boogert for providing the confidential datasets.

I would also like to thank Simon Teggelaar and Jolien van den Berg for their valuable suggestions on the topic of this study and for their support.

1 Introduction

Since the liberalization of the Dutch energy market in 2004, customers have been able to switch to another energy supplier. The ACM (2014)¹ showed that since 2004 almost half of all Dutch consumers (44%) have switched at least once and that 94% of these consumers are satisfied with the switching process. Price appears to be the most important reason for switching, but green power as a sustainability argument and personal offers from competing energy providers can also be decisive factors. The main reason for not switching is satisfaction with the current energy supplier. Moreover, customers are nowadays able to compare the energy prices of different providers online themselves, so that the switching process has become easier and more accessible.

Previous studies show that, in general, it is important to retain customers as long as possible, for the following reasons. Siber (1997) shows that the cost of acquiring new customers can substantially exceed the cost of retaining existing ones, and that acquisition is far more time and effort consuming. Reichheld and Sasser (1990) also show that high customer turnover, either before or at the end of the contract, can lead to a great loss of price premium, a decrease in profit levels and a possible loss of referrals from continuing service customers. With regard to the energy market, customers who terminate their contract with an energy supplier prematurely have to pay a fee. Energy providers are willing to compensate this fee by granting a welcome gift to new customers, which further increases the expenses.

Since customer churn, i.e. the process of customers terminating their contract and switching to a competing energy supplier during a given time period, can have such a large influence on profits, it is important to estimate and predict the churn behavior of existing customers as correctly and accurately as possible. Subsequently, an appropriate implementation plan can be developed to retain existing customers and to improve company operations. To prevent many potential turnovers, energy suppliers can then use media campaigns that combine different medium types (such as TV, radio, online display or preroll) in order to target the right type of existing customers. The question then arises: “what exactly is the current and future impact of media campaigns on the churn behavior of customers of a Dutch utility company?” The focus in this paper is on modeling and explaining both customer churn behavior in general and the relationship between media campaigns, in which we focus on offline medium types, and customer churn behavior.

Evaluation studies in the context of estimating customer churn using media campaigns for energy suppliers are very rare. However, studies focusing on customer churn in other industries without considering media campaigns do exist. Initial studies mainly focus on finding a few specific factors. Bolton (1998) develops and estimates a dynamic, proportional hazard model of the duration of the provider-customer relationship that focuses in particular on the role of customer satisfaction. Data are used from a cellular telephone company. The duration of this relationship is postulated to depend on the customer’s subjective expected value of the relationship, which he updates according to an anchoring and adjustment process using e.g. the perceived quality of the service. The results indicate that customer satisfaction ratings are positively correlated with the duration of the relationship. Furthermore, changes in customer satisfaction can have important financial implications, because lifetime revenues from an individual customer depend on the duration of the relationship.

¹ The Authority for Consumers & Markets (ACM) is an independent Dutch supervisor committed to consumers and companies, which pursues fair competition on the market and transparency in information to consumers.

Bendapudi and Berry (1997) claim that customers maintain relationships with service providers because of constraints, i.e. switching costs, or because of loyalty. The two major components of switching costs in the mobile telecommunications service industry are membership card programs and loyalty points. Membership benefits and accumulated points may be lost when service contracts are terminated or customers switch their service provider. Even dissatisfied customers may then show a high level of so-called “false” loyalty.

In addition, Bolton, Kannan and Bramlett (2000) find that members of loyalty programs of a world-wide financial services company overlook or discount negative evaluations of the company in relation to its competitors in their decisions to stay. A possible reason could be that members of the loyalty reward program perceive that they are getting better quality and service for their price, i.e. good value.

Gerpott, Rams and Schindler (2001) analyze a two-stage model in which customer satisfaction has a significant impact on customer loyalty, which in turn influences a customer’s intention to extend the contractual relationship with his mobile telephone network operator. There are two elements that provide the most important early-warning signals for the degree of customer retention and act as a “lever” to motivate customers to continue their contractual relationship with a provider. These elements are (1) the assessment of the customer that the prices charged by their supplier are ‘good’ and ‘fair’ compared to competitors and (2) the perception of the customer of the functional benefits of the mobile communications services.

Kim and Yoon (2004) investigate different determinants of subscriber churn using a binomial logit model. The propensity to switch is found to depend on the level of satisfaction with service attributes including call quality, handset type and brand image, as well as on the level of income and subscription duration. Ahn, Han and Lee (2006) also apply a comprehensive logistic model. Customer grade based on usage level and tenure, different types of calling plans, gender, payment method and membership card holding, among other things, are included as regressors to explain churn. It is shown that in particular customer loyalty programs, service usage level and call quality have a large, significant effect on churn. Different churn determinants such as the number of complaints and the number of saved loyalty points are found to have a significant impact on a customer status change, where customer status is grouped into three categories: active use, non-use and suspended. A change in customer status is in turn found to act as a partial mediator between churn determinants and customer churn. This would indicate that a customer’s status change is an early signal of customer churn.

In more recent studies, new data mining techniques have been developed with the aim of predicting customer churn. An example is Hung, Yen and Wang (2006). Both decision tree and neural network techniques are used to assign a “propensity-to-churn” score to all customers of a Taiwanese telecom company individually. Their results indicate that both techniques can deliver accurate churn prediction models by using customer demographics, billing information, contract and service status, call detail records and service change log. Lemmens and Croux (2006) apply bagging and boosting classification techniques to a U.S. wireless telecommunications company. Their results show an improvement of an already existing binary logit churn model based on predictive power, diagnostic measures and partial dependence plots. This kind of modeling is not discussed in this study however.

This paper on customer churn behavior differs from previous ones for the following reasons. To our knowledge, no study on the churn behavior of customers of an energy supplier exists. In addition, using parametric and semiparametric selection models to control for a possible sample selection bias is new.

A second distinct objective of this research is the inclusion of a specific media campaign as an explanatory variable for customer churn behavior during this campaign. The energy supplier of interest in this study uses a specific loyalty media campaign with the aim of retaining existing customers and increasing customer loyalty. However, campaigns are in general expensive and time consuming. It is therefore of interest to quantify the effect of media campaigns on customer churn. The reach of the media campaign is captured by an indicator of whether or not someone has seen or heard the media campaign at least four times. This information is provided by an independent Dutch marketing and business consulting agency, which conducted a survey among online respondents who were also customers of the energy supplier during the loyalty media campaign.

The outline of the paper is as follows. In Section 2, the research models and methods are set up, including the factors that pertain to customer churn behavior and media campaign reach. Section 3 describes the data. In Section 4, the empirical results are presented. Section 5 gives a summary and conclusion. In Section 6, the limitations of this study and possibilities for future research with regard to implementation and other models are presented.

2 Research models and methods

In this section, two binary choice models on customer churn and on the reach of the media campaign are presented. A bivariate probit model is set up to investigate the relationship between the process of churn and of being reached by the media campaign. Then, parametric and semiparametric models are presented to investigate the presence of a possible selection bias.

2.1 Binary choice models

2.1.1 Customer churn model

The main focus of this paper is on explaining customer churn behavior during a specific media campaign. A first step is to set up a latent binary choice model in order to investigate the relationship between the binary churn variable and the reach of the media campaign. The index function model can be written as follows:

$$\text{churn}_i^* = \kappa\,\text{campaign}_i + Z_i'\gamma + \varepsilon_i, \quad i = 1,\dots,N, \tag{2.1}$$

where $\text{campaign}_i$ is a binary variable indicating whether a customer has seen or heard the media campaign, $Z_i$ is a set of control variables consisting of company-related and socio-demographic variables, chosen on the basis of previous studies to control for other possible causes of churn, and the error term $\varepsilon_i$ is independent and identically standard normally distributed across all observations: $\varepsilon_i \sim \text{iid } N(0,1)$.² The error term is assumed to be homoskedastic. It is questionable whether this is the case. For binary choice models, however, using robust standard errors does not help, since under heteroskedasticity the estimated parameter vector itself becomes inconsistent. An extensive description of and motivation for the included variables and the loyalty campaign are given in Section 3.

The dependent variable $\text{churn}_i^*$ in (2.1) is not directly observed. Instead, we only observe

$$\text{churn}_i = \begin{cases} 1 & \text{if } \text{churn}_i^* > 0, \text{ i.e. } \varepsilon_i > -(Z_i'\gamma + \kappa\,\text{campaign}_i), \\ 0 & \text{otherwise,} \end{cases} \tag{2.2}$$

where $\text{churn}_i = 1$ if a customer churned between October 1st 2014 and March 1st 2015, and $\text{churn}_i = 0$ if a customer did not leave the company during that specific period. An explanation for this specific period is also provided in Section 3.

² An extension would be to relax the distributional assumptions on the error term by setting the density equal to a Hermite series, following the approach of Gallant and Nychka (1987) and Stewart (2004). Under this approach the model parameters are estimated consistently. However, applying this method to the dataset used in this study results in a zero, singular Hessian matrix, and this semi-nonparametric model does not converge. Using a subset of the dataset and of the regressors to correct for multicollinearity, varying the starting values and employing different methods, such as “BHHH”, does not resolve this (see Cameron and Trivedi, 2005, p. 350).
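To illustrate how (2.1) and (2.2) translate into a churn probability, the following minimal sketch evaluates $\Pr[\text{churn}_i = 1] = \Phi(\kappa\,\text{campaign}_i + Z_i'\gamma)$ for a single customer. The function name, the two control variables and all coefficient values are hypothetical and chosen for illustration only; they are not estimates from this study.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution, as assumed for the error term


def churn_probability(campaign, controls, kappa, gamma):
    """Pr[churn_i = 1] = Phi(kappa * campaign_i + Z_i' gamma) under (2.1)-(2.2)."""
    index = kappa * campaign + sum(z * g for z, g in zip(controls, gamma))
    return STD_NORMAL.cdf(index)


# Hypothetical coefficients and one hypothetical customer (not thesis estimates).
kappa = -0.3                 # assumed campaign effect on the latent index
gamma = [-1.2, -0.05]        # assumed intercept and effect of years as a customer
customer = [1.0, 4.0]        # constant term, 4 years as a customer

p_reached = churn_probability(1, customer, kappa, gamma)
p_not_reached = churn_probability(0, customer, kappa, gamma)
print(f"reached: {p_reached:.4f}, not reached: {p_not_reached:.4f}")
```

With a negative assumed value of κ, the customer who was reached by the campaign has the lower churn probability, which is the direction of effect the model is designed to test.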

2.1.2 Media campaign reach model

The dummy variable $\text{campaign}_i$ on the right-hand side of (2.1) is not observed for all customers of the energy supplier: it is only known for the 15,549 customers who participated in an online survey conducted by a Dutch marketing and business consulting agency. Therefore, it is of interest to investigate whether this binary variable can be predicted as accurately as possible for all customers, in order to gain more insight into customer behavior and to include this variable in the general binary choice model on churn. A latent binary choice model is therefore set up using only customers who participated in the above-mentioned survey. The index function model is then

$$\text{campaign}_i^* = D_i'\mu + u_i, \quad i = 1,\dots,N, \tag{2.3}$$

where $D_i$ contains a set of explanatory variables consisting mainly of socio-demographics. The error term $u_i$ is independent and identically standard normally distributed across all observations, $u_i \sim \text{iid } N(0,1)$, and is assumed to be homoskedastic following the same reasoning as in Section 2.1.1.

The latent variable $\text{campaign}_i^*$ is likewise not directly observed. We only observe

$$\text{campaign}_i = \begin{cases} 1 & \text{if } \text{campaign}_i^* > 0, \text{ i.e. } u_i > -D_i'\mu, \\ 0 & \text{otherwise,} \end{cases} \tag{2.4}$$

where $\text{campaign}_i = 1$ if a customer has seen or heard a specific media commercial, and $\text{campaign}_i = 0$ if a customer has not been in contact with the media campaign.

In (2.1), the directly observed binary value of $\text{campaign}_i$ is included. Based on the outcomes of (2.3), alternatives to include in (2.1) are the expectation of the estimated probability of the variable, $E[\hat{p}_{\text{campaign}_i}]$; the estimated binary value of the variable, $\hat{I}[p_{\text{campaign}_i}]$, which is based on a specifically determined optimal cutoff point; or a newly created categorical variable. This categorical variable consists of three categories (low, mid and high) based on the probability of having been in contact with the media campaign. Nevertheless, including these variables in different models and comparing these models with the initial model, in which the observed values of $\text{campaign}_i$ are included, does not yield a significant improvement.
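The two derived variables described above can be sketched as simple mappings from a predicted reach probability. This is only an illustration: the cutoff of 0.5 and the category boundaries of 1/3 and 2/3 are hypothetical placeholders, since the text does not report the optimal cutoff or the boundaries that were actually used.

```python
def to_binary(prob, cutoff):
    """Estimated binary reach indicator: 1 if the predicted probability exceeds the cutoff."""
    return 1 if prob > cutoff else 0


def to_category(prob, low_upper, mid_upper):
    """Three-level reach category (low, mid, high) from a predicted probability."""
    if prob < low_upper:
        return "low"
    if prob < mid_upper:
        return "mid"
    return "high"


# Hypothetical predicted probabilities, cutoff and boundaries (assumptions, not thesis values).
predicted = [0.12, 0.48, 0.81]
binary = [to_binary(p, cutoff=0.5) for p in predicted]
category = [to_category(p, low_upper=1 / 3, mid_upper=2 / 3) for p in predicted]
print(binary, category)
```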

The process of having been exposed to the media campaign may be related to the process of churn. Endogeneity and simultaneity problems can occur and the error terms in the regression models may be correlated. Furthermore, it cannot be tracked whether a customer first switched energy supplier and was then exposed to the media campaign, or vice versa. In order to investigate the possible correlation, the following recursive, simultaneous-equations bivariate probit model is set up:

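$$\text{churn}_i^* = Z_i'\gamma + \kappa\,\text{campaign}_i + \varepsilon_i, \quad \text{churn}_i = 1 \text{ if } \text{churn}_i^* > 0,\ 0 \text{ otherwise}, \tag{2.5}$$

$$\text{campaign}_i^* = D_i'\mu + u_i, \quad \text{campaign}_i = 1 \text{ if } \text{campaign}_i^* > 0,\ 0 \text{ otherwise}, \tag{2.6}$$

$$\left.\begin{pmatrix} \varepsilon_i \\ u_i \end{pmatrix} \right|\, Z_i, D_i \sim N\left[\begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}\right],$$

where (2.5) and (2.6) correspond to the univariate probit models (2.1) and (2.3) respectively, and the error terms are assumed to follow a bivariate normal distribution with correlation coefficient $\rho$. Some variables in $D_i$ also appear in matrix $Z_i$.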

Surprisingly, in formulating the log-likelihood the endogenous nature of $\text{campaign}_i$ on the right-hand side of the first equation, (2.5), can be ignored and the model can then be fit as an ordinary bivariate probit model.³ This can be established using the following argument (see Greene, 2012, p. 786). The term that enters the log-likelihood is

$$\begin{aligned} \Pr[\text{churn}_i = 1, \text{campaign}_i = 1] &= \Pr[\text{churn}_i = 1 \mid \text{campaign}_i = 1]\,\Pr[\text{campaign}_i = 1] \\ &= \Pr[\varepsilon_i > -(Z_i'\gamma + \kappa\,\text{campaign}_i) \mid u_i > -D_i'\mu]\,\Pr[u_i > -D_i'\mu]. \end{aligned} \tag{2.7}$$

Given (2.6), the marginal probability for $\text{campaign}_i = 1$ is just $\Phi(D_i'\mu)$, whereas the conditional probability is

$$\Pr[\varepsilon_i > -(Z_i'\gamma + \kappa\,\text{campaign}_i) \mid u_i > -D_i'\mu] = \frac{\Phi_2(Z_i'\gamma + \kappa,\ D_i'\mu,\ \rho)}{\Phi(D_i'\mu)},$$

where we make use of the symmetry of the normal distribution. The product in (2.7) returns the bivariate normal probability, which is equivalent to that of the ordinary bivariate probit model. The other three terms in the log-likelihood are derived in a similar way, which produces:

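$$\begin{aligned} \Pr[\text{churn}_i = 1, \text{campaign}_i = 1] &= \Phi_2(Z_i'\gamma + \kappa,\ D_i'\mu,\ \rho), \\ \Pr[\text{churn}_i = 1, \text{campaign}_i = 0] &= \Phi_2(Z_i'\gamma,\ -D_i'\mu,\ -\rho), \\ \Pr[\text{churn}_i = 0, \text{campaign}_i = 1] &= \Phi_2(-(Z_i'\gamma + \kappa),\ D_i'\mu,\ -\rho), \\ \Pr[\text{churn}_i = 0, \text{campaign}_i = 0] &= \Phi_2(-Z_i'\gamma,\ -D_i'\mu,\ \rho). \end{aligned} \tag{2.8}$$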

These terms are equal to those of an ordinary bivariate probit model. Hence, the simultaneity in the model can be ignored.
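As a numerical check on the four terms in (2.8), the sketch below evaluates them for one hypothetical customer and confirms that they sum to one and that the two terms with the campaign indicator equal to 1 add up to $\Phi(D_i'\mu)$. The bivariate normal distribution function is approximated with Simpson's rule using only the Python standard library; this is an illustrative approximation, not the routine used for estimation in this study, and all index values and parameters are assumed.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def bvn_cdf(a, b, rho, n=2000, lower=-10.0):
    """Approximate Phi_2(a, b, rho) = Pr[X <= a, Y <= b] by Simpson's rule.

    Uses Pr[Y <= b | X = x] = Phi((b - rho * x) / sqrt(1 - rho^2)); needs |rho| < 1.
    """
    if a <= lower:
        return 0.0
    scale = math.sqrt(1.0 - rho * rho)

    def integrand(x):
        return STD_NORMAL.pdf(x) * STD_NORMAL.cdf((b - rho * x) / scale)

    h = (a - lower) / n  # n must be even for Simpson's rule
    total = integrand(lower) + integrand(a)
    for j in range(1, n):
        total += (4.0 if j % 2 else 2.0) * integrand(lower + j * h)
    return total * h / 3.0


def cell_probabilities(z_gamma, d_mu, kappa, rho):
    """The four (churn, campaign) cell probabilities of (2.8)."""
    return {
        (1, 1): bvn_cdf(z_gamma + kappa, d_mu, rho),
        (1, 0): bvn_cdf(z_gamma, -d_mu, -rho),
        (0, 1): bvn_cdf(-(z_gamma + kappa), d_mu, -rho),
        (0, 0): bvn_cdf(-z_gamma, -d_mu, rho),
    }


# Hypothetical index values and parameters, chosen only for illustration.
cells = cell_probabilities(z_gamma=-1.4, d_mu=0.2, kappa=-0.3, rho=0.25)
print(sum(cells.values()))             # should be close to 1
print(cells[(1, 1)] + cells[(0, 1)])   # should be close to Phi(d_mu)
```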

Interpretation of the components of this bivariate probit model is particularly complicated. Typically, interest will center on (2.5) and in particular on the impact of the variables in the model on the probability that $\text{churn}_i$ is equal to 1. Because $\text{campaign}_i$ appears on the right-hand side of (2.5), a variable potentially has a direct effect through $Z_i$ and an indirect effect transmitted to $\text{churn}_i$ through the impact of the variable in question on $\Pr[\text{campaign}_i = 1]$.

³ For more information on the derivation of the log-likelihood procedure of the ordinary bivariate probit

2.3 Selection models on the number of days up to churn

In order to gain deeper insight into the churn behavior of customers of the energy supplier, it is also of interest to analyze the duration of the membership in days up to a churn during the observation period, which in this study is denoted $\text{churn\_days}_i$. The number of days after which a customer churned is only known if he actually churned. If a customer did not churn, any churn occurs after at least 150 days, but the exact number is unknown. Therefore, the dependent variable is censored and right-truncated. The model is not corrected for right-truncation. A selection bias due to the censored data may arise in that customers who churned may differ in important, unmeasured ways from customers who did not churn. For example, a customer who has been a member of the energy supplier for a longer period may be more loyal and therefore less likely to churn. To take a possible selection bias into account, we set up a sample selection model, which comprises the selection equation

$$\text{churn}_i = I\{\text{churn}_i^* = Z_i'\gamma + \varepsilon_i > 0\}, \tag{2.9}$$

where for convenience the dummy variable for the media campaign, $\text{campaign}_i$, is included in $Z_i$. The error term $\varepsilon_i$ is assumed to be standard normally distributed, such that an ordinary probit model is obtained. See Section 2.1.1 for a more detailed explanation.

The selection model also comprises a participation equation. As mentioned above, the number of days until churn is only known for customers who churned during the loyalty campaign. This leads to the following link

$$\text{churn\_days}_i = \begin{cases} \text{churn\_days}_i & \text{if } \text{churn}_i = 1, \text{ i.e. } \text{churn}_i^* = Z_i'\gamma + \varepsilon_i > 0, \\ - & \text{if } \text{churn}_i = 0, \text{ i.e. } \text{churn}_i^* = Z_i'\gamma + \varepsilon_i \le 0, \end{cases} \tag{2.10}$$

where a dash denotes an unobserved value and the variable $\text{churn\_days}_i$ is a nonnegative duration variable that follows a Weibull duration distribution.

To obtain the participation equation, we consider a Weibull proportional hazard model. Of note, in this study only cross-section data are used. The conditional hazard rate is defined as

$$\lambda(t_i \mid X_i) = \lambda_0(t_i, \alpha)\exp(X_i'\beta + \omega_i), \tag{2.11}$$

where $\lambda_0(t_i, \alpha)$ denotes the baseline hazard, $X_i$ consists of the dummy variable $\text{campaign}_i$ and a set of control variables consisting of both socio-demographic and company-related variables, and $\omega_i$ depicts unobserved heterogeneity across the consumers due to possible omitted regressors and interindividual differences.

Then, an expression for the integrated hazard function is obtained using the following steps (see Cameron and Trivedi, 2005, p. 590)

$$\begin{aligned} \int_0^{t_i} \lambda(s_i \mid X_i)\,ds_i &= \int_0^{t_i} \lambda_0(s_i, \alpha)\exp(X_i'\beta + \omega_i)\,ds_i \\ \Lambda(t_i \mid X_i) &= \Lambda_0(t_i, \alpha)\exp(X_i'\beta + \omega_i), \end{aligned} \tag{2.12}$$

where for convenience $t_i = \text{churn\_days}_i$ is substituted.

Subsequently, we apply a natural logarithmic transformation to transform (2.12) into a linear regression model with an additive error

$$\begin{aligned} \ln\Lambda(t_i \mid X_i) &= \ln\Lambda_0(t_i, \alpha) + X_i'\beta + \omega_i \\ -\ln\Lambda_0(t_i, \alpha) &= X_i'\beta + \omega_i - \ln\Lambda(t_i \mid X_i) \\ &= X_i'\beta + \omega_i + \nu_i = X_i'\beta + \nu_i^*, \end{aligned} \tag{2.13}$$

where the error term $\nu_i = -\ln\Lambda(t_i \mid X_i)$ is type I extreme value distributed. However, due to the presence of unobserved heterogeneity $\omega_i$, the distribution of $\nu_i^* = \omega_i + \nu_i$ is unknown. Therefore, $\nu_i^*$ is assumed to be normally distributed: $\nu_i^* \sim N(0, \sigma^2)$.

A convenient transformation of the dependent variable $t_i$ for the Weibull regression model is $\ln\Lambda_0(t_i, \alpha) = \alpha\ln(t_i)$, such that

$$\begin{aligned} -\alpha\ln(t_i) &= X_i'\beta + \nu_i^* \\ \ln(t_i) &= -\frac{1}{\alpha}\left(X_i'\beta + \nu_i^*\right). \end{aligned} \tag{2.14}$$

Without loss of generality, we ignore the scale factor $-\frac{1}{\alpha}$ in the rest of this study. Substituting $\text{churn\_days}_i = t_i$, the participation equation of the selection model becomes

$$\ln(\text{churn\_days}_i) = X_i'\beta + \nu_i^*. \tag{2.15}$$
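The transformation $\ln\Lambda_0(t_i, \alpha) = \alpha\ln(t_i)$ used in (2.14) can be made explicit with a short worked step. The sketch assumes the common unit-scale Weibull baseline hazard $\lambda_0(t, \alpha) = \alpha t^{\alpha - 1}$; this parametrization is an assumption, since the text does not state the baseline hazard explicitly.

$$\Lambda_0(t_i, \alpha) = \int_0^{t_i} \alpha s^{\alpha - 1}\,ds = t_i^{\alpha} \quad\Longrightarrow\quad \ln\Lambda_0(t_i, \alpha) = \alpha\ln(t_i).$$

Under this assumed form, $\alpha > 1$ corresponds to a baseline hazard that rises with duration and $\alpha < 1$ to one that falls.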

In order to enhance identification of the model, the number of explanatory variables in $Z_i$ from equation (2.9) needs to be larger than the number of explanatory variables contained in matrix $X_i$ in (2.15). If $X_i = Z_i$, then severe collinearity problems among the regressors in (2.15) may arise. Cameron and Trivedi (2005) show that $\hat{\beta}$ is then obtained only because of non-linearity in the regressors, with less precision, and that the standard errors of $\hat{\beta}$ are large. The problem is less severe when there is a larger variation in $Z_i'\hat{\gamma}$ across observations. More information on the included regressors and a motivation for including them will be provided in the data section.

In the absence of sample selectivity, the identifying assumption of equation (2.15) is a mean-zero restriction on the errors conditional on the explanatory variables. However, as explained above, a selection bias problem may arise due to the partially missing dependent variable. Under the selection rule described by equation (2.9), the mean-zero restriction is no longer satisfied, and an OLS regression of the dependent variable on the regressors alone, using only customers for whom the dependent variable is observed, leads to inconsistent estimation of $\beta$. This can be derived as follows. The population regression function for equation (2.15) can be written as

$$E[\ln(\text{churn\_days}_i) \mid X_i, \text{sample selection rule: } \text{churn}_i = 1] = X_i'\beta + E[\nu_i^* \mid X_i, \varepsilon_i > -Z_i'\gamma], \quad i = 1,\dots,N_1, \tag{2.17}$$

where without loss of generality the first $N_1$ observations are assumed to contain data on $\text{churn\_days}_i$. If $E[\nu_i^* \mid X_i, \varepsilon_i > -Z_i'\gamma] \neq 0$, OLS would indeed provide inconsistent estimates.

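In order to obtain consistent estimates and to control for a sample selection bias, Heckman’s two-step procedure is used. Following the steps in Greene (2003) and Cameron and Trivedi (2005) for this procedure, the conditional expectation for equation (2.15) can be derived as follows

$$\begin{aligned} E[\ln(\text{churn\_days}_i) \mid \text{churn}_i = 1, X_i] &= X_i'\beta + E[\nu_i^* \mid \varepsilon_i > -Z_i'\gamma, X_i] \\ &= X_i'\beta + \rho\sigma_{\nu^*}\frac{\phi(Z_i'\gamma)}{\Phi(Z_i'\gamma)} \\ &= X_i'\beta + \rho\sigma_{\nu^*}\lambda(Z_i'\gamma) = X_i'\beta + b\lambda(Z_i'\gamma), \end{aligned} \tag{2.18}$$

where $b = \rho\sigma_{\nu^*}$ is the coefficient on the correction term that restores a zero expectation of the error term, $\lambda(Z_i'\gamma)$ is a simple estimator of the selection bias known as the inverse Mills ratio, and the error terms $\varepsilon_i$ and $\nu_i^*$ are jointly normally distributed and homoskedastic:

$$\begin{pmatrix} \varepsilon_i \\ \nu_i^* \end{pmatrix} \sim N\left[\begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & \rho\sigma \\ \rho\sigma & \sigma^2 \end{pmatrix}\right].$$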

This results in the following equation

$$\ln(\text{churn\_days}_i) = X_i'\beta + b\lambda(Z_i'\hat{\gamma}) + u_i, \tag{2.19}$$

where $\hat{\gamma}$ is obtained from the first-step probit regression of $\text{churn}_i$ on matrix $Z_i$, and $u_i$ follows a standard normal distribution: $u_i \sim N(0,1)$. $E[u_i \mid X_i, \varepsilon_i > -Z_i'\gamma] = 0$, because independence is assumed. The assumption that $u_i$ follows a standard normal distribution is restrictive, because it is an explicit distributional assumption. It is questionable whether this is reasonable. The assumption that $\text{Var}[u_i] = 1$ is without loss of generality, because $\text{churn}_i$ is a binary variable.

The parameters β and b of equation (2.19) are estimated consistently by OLS.
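The role of the inverse Mills ratio in (2.18) can be checked by simulation. The sketch below draws jointly normal errors with correlation ρ and standard deviation σ for the outcome error, keeps only the draws that satisfy the selection rule, and compares their mean with $\rho\sigma\lambda(Z_i'\gamma)$. The index value, ρ, σ, the number of draws and the seed are hypothetical; the selection index is treated as known here, whereas in the two-step procedure it is replaced by the first-step probit estimate.

```python
import math
import random
from statistics import NormalDist, fmean

STD_NORMAL = NormalDist()


def inverse_mills(z):
    """lambda(z) = phi(z) / Phi(z), the selection correction term in (2.18)."""
    return STD_NORMAL.pdf(z) / STD_NORMAL.cdf(z)


def simulated_selected_mean(index, rho, sigma, n_draws=200_000, seed=12345):
    """Monte Carlo mean of the outcome error over draws selected by eps > -index."""
    rng = random.Random(seed)
    selected = []
    for _ in range(n_draws):
        eps = rng.gauss(0.0, 1.0)
        # Outcome error with sd sigma and correlation rho with eps.
        noise = rng.gauss(0.0, 1.0)
        nu_star = sigma * (rho * eps + math.sqrt(1.0 - rho * rho) * noise)
        if eps > -index:
            selected.append(nu_star)
    return fmean(selected)


# Hypothetical values (assumptions for illustration, not thesis estimates).
index, rho, sigma = -1.0, 0.5, 1.5
theoretical = rho * sigma * inverse_mills(index)
simulated = simulated_selected_mean(index, rho, sigma)
print(f"theoretical: {theoretical:.3f}, simulated: {simulated:.3f}")
```

The simulated mean is clearly nonzero when ρ differs from zero, which is the reason OLS on the selected sample alone is inconsistent and why the term $b\lambda(Z_i'\hat{\gamma})$ is added as a regressor.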

The main advantages of this parametrically estimated sample selection model over more flexible models are that (1) it is easy to implement and (2) it is more efficient. However, in case the joint distribution of the error terms or the functional relation is not correctly specified, the estimator may become inconsistent. Because the distributional assumption on the error terms of the Heckman model may be too restrictive to eliminate the selection bias (Goldberger, 1983), more flexible semiparametric models are estimated. Semiparametric estimators allow consistent estimates to be obtained under weaker assumptions by leaving less room for specification errors, and the estimation results are easier to interpret than those of nonparametric methods. Like their parametric counterparts, semiparametric selection models generally imply a two-step approach with estimators in each step being separate constructions. The functional relationships of the selection and participation equations are specified, while the selection correction function and the density of the errors are left generic. In this study, Newey’s (1999) approach is applied.

Newey’s (1999) series estimator builds on Lee (1982) and Cosslett (1991). As explained in Hussinger (2008), the selection correction term is consistently approximated by $\sum_{k=1}^{K}\eta_k p_k(Z_i'\gamma)$, where $\eta_k$ are unknown parameters and $p_k(\cdot)$ are known, smooth functions. The chosen approximation function for $p_k(\cdot)$ is $[\tau(Z_i'\gamma)]^{k-1}$, where $\tau(\cdot)$ contains a lead term, which is defined as the normal distribution function. To avoid multicollinearity, the approximation function is written as $\tau(Z_i'\gamma) = 2\Phi(Z_i'\gamma) - 1$, which is monotone and uniformly bounded between $-1$ and $1$. This uniform boundedness improves robustness with respect to outliers.

The model equation is then

$$\ln(\text{churn\_days}_i) = X_i'\beta + \sum_{k=1}^{K}\eta_k p_k(Z_i'\gamma) + \xi_i^*, \tag{2.20}$$

where $\xi_i^* = \sum_{k=K+1}^{\infty}\eta_k p_k(Z_i'\gamma) + \xi_i$ and has an asymptotic variance. For $\sqrt{N}$-consistency, the condition that $K$ converges to infinity at a slower rate than $N$, the total number of observations, must hold. The unknown parameters can be estimated by OLS, which requires consistent estimates of $Z_i'\gamma$ from the first step.

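The conditional expectation takes on the following structure

$$\begin{aligned} E[\ln(\text{churn\_days}_i) \mid Z_i, \text{churn}_i = 1] &= E\Big[X_i'\beta + \sum_{k=1}^{K}\eta_k p_k(Z_i'\gamma) + \xi_i^* \,\Big|\, Z_i, \varepsilon_i > -Z_i'\gamma\Big] \\ &= X_i'\beta + \sum_{k=1}^{K}\eta_k p_k(Z_i'\gamma) + E\Big[\sum_{k=K+1}^{\infty}\eta_k p_k(Z_i'\gamma) + \xi_i \,\Big|\, Z_i, \varepsilon_i > -Z_i'\gamma\Big] \\ &= X_i'\beta + \sum_{k=1}^{K}\eta_k p_k(Z_i'\gamma), \end{aligned} \tag{2.21}$$

where the last equality holds in case $\xi_i$ follows a standard normal distribution and is independent of $\varepsilon_i$ and $Z_i$, and where $\sum_{k=K+1}^{\infty}\eta_k p_k(Z_i'\gamma) = \sum_{k=K+1}^{\infty}\eta_k\left(2\Phi(Z_i'\gamma) - 1\right)^{k-1}$ converges to zero.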
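The series terms that enter (2.20) as additional regressors can be constructed as in the following minimal sketch, which computes $p_k(Z_i'\gamma) = [2\Phi(Z_i'\gamma) - 1]^{k-1}$ for $k = 1,\dots,K$. The function names and the example index values are hypothetical; in practice the index would be the first-step estimate and $K$ would be chosen by the researcher.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def tau(index):
    """tau(z) = 2 * Phi(z) - 1, monotone and bounded between -1 and 1."""
    return 2.0 * STD_NORMAL.cdf(index) - 1.0


def series_terms(index, K):
    """Return [p_1, ..., p_K] with p_k = tau(index) ** (k - 1)."""
    t = tau(index)
    return [t ** (k - 1) for k in range(1, K + 1)]


# Hypothetical first-step index values for three customers.
for index in (-1.5, 0.0, 2.0):
    print(index, [round(p, 4) for p in series_terms(index, K=3)])
```

Note that the first term, $p_1$, is constant and equal to one, so in a regression that already contains an intercept it would presumably be absorbed by that intercept.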

3 Data and empirical considerations

The underlying database in this study is the database from a Dutch energy supplier, which further remains anonymous. This energy supplier produces, sells and delivers electricity, gas, heat and additional services and is active in the Netherlands, Belgium and Germany. The dataset consists of 1,878,212 Dutch Consumer Market (CM) customers in a contractual setting and covers e.g. socio-demographic variables, information on socioeconomic status (SES) and company-related variables.

Information on which customers have been reached by the media campaign is provided by an independent Dutch marketing and business consulting agency, which also remains anonymous in this study. This agency specializes in measuring, refining and improving advertising effectiveness. It focuses on, and attaches great importance to, improving the transparency, accountability and return on investment (ROI) of all cross-media investments (television, online and radio). It conducts online surveys among its own panel database. The dataset consists of more than 200,000 panelists, of whom 15,549 were also customers of the energy supplier concerned in 2014 and 2015. In this group there is an overrepresentation of customers with a low Customer Lifetime Value (CLV)⁴, customers who do not have district heating, online active customers and relatively new customers. These customers are therefore not representative of the whole dataset of the energy supplier. We do not correct for this.

The majority of the data is collected at one point in time, October 1st 2014, the start of the loyalty media campaign of main interest in this study. This loyalty campaign runs in October and November 2014. A few variables are also observed at another point in time to control for time effects. The focus of the campaign is on the retention of existing customers and on the acquisition of new customers by offering a yearly incremental discount on the energy price of up to 25% when customers remain. Existing customers start with a discount of 10%, whereas new customers start with a discount of 5%. A second loyalty campaign begins on March 1st 2015. The target group now consists of a wider range of customers, also including customers who take grey electricity and have a fixed-costs contract. The proposition of this campaign remains the same. In this study, particular interest lies in the effect of the first campaign and hence we choose an observation period of October 1st 2014 to March 1st 2015 to investigate the effect of this campaign only. Using a longer observation period would complicate the conclusions due to the additional effect of the second campaign. No observations are used after March 1st 2015 and therefore the data are right-truncated. As mentioned earlier, we do not correct for this.

To construct the sample, the dataset of the energy supplier was matched with the dataset of the consulting agency using both an email address and an individual customer number. Missing observations and customers less than 18 years old are removed from the dataset. Further, there are customers present in the dataset who have churned before the start of the observation period on October 1st 2014. This group consists of only 80 customers (0.946%).

Removing these customers results in left-truncation. However, keeping customers who had already churned seems contradictory, and because this group is such a small part of the dataset, we consider this problem negligible and assume that excluding these customers does not have a substantial effect on our results. Note that the conclusions of this study therefore do not apply to these customers.

4 The Customer Lifetime Value (CLV) of a customer indicates how much a customer is worth to the energy supplier. It is important to note that CLV is not included in any of the models in this study, since the energy supplier calculated the CLV per customer using its own estimated probability that a customer will churn. Including CLV would therefore lead to endogeneity problems.

The resulting pooled cross-section consists of 8,379 observations, of which 604 (7.208%) customers churned between the start of the loyalty campaign and the start of the subsequent loyalty campaign, and the remaining 7,775 (92.792%) customers stayed. Some descriptive statistics can be found in Table I.

Churn behavior is defined by two variables. The first variable is a dummy variable that takes the value ‘1’ if a customer churned during the media campaign of interest between October 1st 2014 and March 1st 2015, and ‘0’ if the customer did not churn during this period. The second variable describes the time in days until the customer churned and is calculated by (1) taking the difference in days between the date of churn and October 1st 2014 and (2) taking the natural logarithm of the resulting outcome plus one, which avoids undefined values for customers who churned on the first day. As mentioned above, no data are used after March 1st 2015. The variable then lies in the interval [min = 0.000; max = ln(149 + 1) = 5.011] and is right-truncated, for which we do not correct.
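The construction of the duration variable can be illustrated with a minimal sketch. The function name and the example churn date are hypothetical; only the start date, the "plus one" transformation and the maximum of 149 days are taken from the text and Table I.

```python
import math
from datetime import date

CAMPAIGN_START = date(2014, 10, 1)  # start of the observation period (from the text)


def log_days_to_churn(churn_date: date) -> float:
    """Return ln(days between campaign start and churn, plus one)."""
    days = (churn_date - CAMPAIGN_START).days
    return math.log(days + 1)


# A churn on the first day gives ln(0 + 1) = 0, so the variable is always defined.
lower_bound = log_days_to_churn(CAMPAIGN_START)
# The largest observed duration in Table I is 149 days.
upper_bound = math.log(149 + 1)
# Hypothetical churn date, for illustration only.
example = log_days_to_churn(date(2014, 12, 31))

print(round(lower_bound, 3), round(upper_bound, 3), round(example, 3))
```

Adding one before taking the logarithm keeps customers who churn on day zero in the sample, at the cost of a slightly less direct interpretation of the transformed variable.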

In Figure 1 below, the natural logarithm of the number of days up to a churn, counted from the start of the campaign, is plotted against the frequencies. The mass of the distribution is concentrated on the right, so the distribution is skewed to the left. An explanation for this could be that most health insurance contracts expire at the end of December, so that customers need to conclude new health insurance contracts or extend existing ones. During that period, customers might subsequently also consider switching to another energy provider. Furthermore, in the cold winter months customers may use more electricity and gas, such that switching to a cheaper energy supplier can lead to a substantial cost reduction. Customers may also have more time to spend on comparing electricity and gas prices online.

Figure 1: Histogram of the natural logarithm of the number of days up to a churn during the loyalty campaign

Table I: Descriptive statistics

Variable # Obs Mean Std.dev Min Max
Churned 8,379 0.072 0.259 0 1
Number of days up to churn 604 85.459 39.87 0 149
Reached by the campaign 8,379 0.883 0.322 0 1
Yearly consumption of gas 8,379 1,404.958 898.157 0 23,099
Yearly consumption of energy 8,379 3,472.855 2,843.340 0 163,03
Sitevisit in 3 months before Oct 1 8,379 0.486 0.500 0 1
Number of contacts with servicedesk 8,379 0.301 0.665 0 4
Contact in 3 months before Oct 1 8,379 0.084 0.277 0 1
Rare use of social media 8,379 0.204 0.403 0 1
Weekly use of social media 8,379 0.204 0.403 0 1
Daily use of social media 8,379 0.254 0.435 0 1
Continuous use of social media 8,379 0.155 0.362 0 1
Number of years member 8,379 76.16 5.051 0 15
Product switch in 3 months before Oct 1 8,379 0.235 0.424 0 1
Do not receive letters 8,379 0.069 0.253 0 1
Do not receive calls 8,379 0.103 0.304 0 1
Do not receive emails 8,379 0.078 0.268 0 1
Number of direct mails received 8,379 0.084 0.289 0 2
Number of emails received 8,379 0.204 0.403 0 1
Member of loyalty program 8,379 0.461 0.499 0 1
Selected for retention program 8,379 0.030 0.169 0 1
Number of spent loyalty points 8,379 13.694 70.194 0 575
Living in traditional area 8,379 0.715 0.451 0 1
Not living in traditional area 8,379 0.215 0.411 0 1
Living in a rented house 8,379 0.416 0.493 0 1
Employment: free 8,379 0.077 0.267 0 1
Employment: director 8,379 0.079 0.269 0 1
Employment: middle 8,379 0.220 0.414 0 1
Employment: higher 8,379 0.110 0.313 0 1
Employment: other 8,379 0.242 0.428 0 1
Employment: no, <35y 8,379 0.158 0.365 0 1
Employment: no, >65y 8,379 0.114 0.318 0 1
Education: primary school 8,379 0.049 0.216 0 1
Education: vmbo 8,379 0.084 0.277 0 1
Education: mavo 8,379 0.106 0.308 0 1
Education: mbo 8,379 0.289 0.453 0 1
Education: havo-vwo 8,379 0.191 0.393 0 1
Education: hbo 8,379 0.196 0.397 0 1
Education: university 8,379 0.086 0.280 0 1
High income 8,379 0.205 0.404 0 1
Number of technological devices 8,379 33.414 0.852 1 4
Age 8,379 523.474 14.276 20 114
Gender 8,379 0.534 0.499 0 1
Gender * age 8,379 273.721 27.528 0 96
1 child 8,379 0.199 0.399 0 1
>2 children 8,379 0.122 0.327 0 1
Province: Gelderland 8,379 0.266 0.442 0 1
Province: Friesland 8,379 0.051 0.219 0 1
Province: Groningen 8,379 0.015 0.121 0 1
Province: Drenthe 8,379 0.010 0.100 0 1
Province: Overijssel 8,379 0.024 0.153 0 1
Province: Utrecht 8,379 0.028 0.165 0 1
Province: Flevoland 8,379 0.042 0.201 0 1
Province: Noord-Holland 8,379 0.347 0.476 0 1
Province: Noord-Brabant 8,379 0.136 0.343 0 1
Province: Noord-Brabant 8,379 0.057 0.231 0 1
Province: Zeeland 8,379 0.002 0.041 0 1
Province: Limburg 8,379 0.022 0.148 0 1
Married 8,379 0.418 0.493 0 1
High level of urbanity 8,379 0.418 0.493 0 1

The covariate of interest in this study is campaign𝑖 and is determined by the results of the surveys of the above-mentioned Dutch consulting agency. Online respondents indicated the time in hours that they spent on watching TV and listening to the radio during the loyalty campaign.5

The agency matched this information with the broadcasting schedule of the campaign. According to an unknown formula they indicated whether the customers were effectively, at least four times, reached. This information resulted in two different dummy variables: one for having seen the TV commercial and one for having heard the radio commercial. The accuracy of these variables is questionable since it is unknown whether a customer indeed saw or heard the commercial.

A closer look at the data reveals that the majority of the customers who heard the radio commercial also saw the TV commercial. Therefore, we create a new dummy variable with value ‘1’ if a customer has effectively either seen or heard the commercial and ‘0’ otherwise. In total, 7,398 (88.292%) customers saw or heard the campaign, whereas 981 (11.708%) customers were not reached by the campaign. We expect that a customer who saw or heard the commercial may be more likely to apply for a higher discount and hence may be less likely to churn because of the cost reduction.

Considering only the customers who did not churn, 6,869 customers did hear or see the commercial, whereas 911 customers did not. This means that the group of customers who were reached by the commercial is 6,869/911 = 7.540 times as large as the group that was not reached. Considering only the customers who did churn, 477 customers were reached by the campaign, whereas 122 customers were not. Here, the group of customers who were reached by the campaign is 477/122 = 3.910 times as large as the group that was not reached. There is a substantial difference between these two groups. Furthermore, the sample estimate of the correlation coefficient between these two variables is -0.075 and is significantly different from zero with p = 0.000.
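The comparison of the two groups can be reproduced with a few lines of arithmetic. The sketch below uses only the four counts quoted in the text; the variable names are illustrative, and the odds ratio at the end is an additional summary that is not reported in the thesis.

```python
# Counts quoted in the text, split by whether the customer was reached.
stayed = {"reached": 6869, "not_reached": 911}
churned = {"reached": 477, "not_reached": 122}


def reach_ratio(group: dict) -> float:
    """Size of the reached group relative to the non-reached group."""
    return group["reached"] / group["not_reached"]


ratio_stayed = reach_ratio(stayed)
ratio_churned = reach_ratio(churned)

# Ratio of the two ratios: values below 1 mean that churners were
# relatively less often reached by the campaign than non-churners.
odds_ratio = ratio_churned / ratio_stayed

print(f"stayed:  {ratio_stayed:.3f}")
print(f"churned: {ratio_churned:.3f}")
print(f"odds ratio: {odds_ratio:.3f}")
```

An odds ratio well below one points in the same direction as the negative sample correlation between churn and campaign reach.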

Further, a number of control variables are used in this study to control for other effects on customer churn behavior. These variables are observed on October 1st 2014. As explained in the introduction, an important determinant of churn is the level of the energy price (ACM, 2014), since most customers are price conscious and price driven when it comes to the decision to churn. Unfortunately, no information on gas and electricity prices of the energy supplier and its competitors is available in the dataset. To take prices into account indirectly, the yearly consumption of both gas and electricity over the past year is included. The level of consumption partially determines how much a customer pays for the services. A customer can visit the website to look up his yearly consumption, and therefore a dummy indicating whether a customer has visited the website in the 3 months prior to the start of the campaign is included as well.

Another major factor is customer satisfaction, because less satisfied customers are more likely to churn (e.g. Ahn et al., 2006) and customer satisfaction ratings are positively correlated with the duration of the relationship (Bolton, 1998). Due to the limited amount of available data, it is difficult to quantify customer satisfaction with the energy supplier in this study. More contact with the service desk, for example because of experienced service failures, might imply higher customer dissatisfaction and hence a higher probability to churn. Therefore, we measure customer satisfaction by contact with the service desk. Further, the main goal of the use of social media for this energy supplier is to foster a positive attitude towards the company and to enhance customer satisfaction. Social media use is measured for every customer at five levels, ranging from no use to continuous use. Four dummies are therefore included, with no use as the reference level.

Customer loyalty in this study is measured by the number of years that a customer is a member of the energy supplier. We expect that the longer the customer is a member of the energy supplier, the more loyal he is and the less likely to churn. We expect a negative effect of customer loyalty on churn.

The status of a customer can directly and indirectly influence customer churn (Ahn et al., 2006). We determine an active customer status by the following variables: a dummy variable indicating whether a customer switched products in the three months prior to October 1st 2014 and three dummy variables indicating whether a customer indicated that he or she does not want to receive any letters, calls or emails. We expect that the first variable has a negative impact and the remaining three variables have a positive impact on the probability to churn.


The company conducts a special retention program, i.e. a churn prevention program, by sending emails and direct mails about new contracts and offers to a selected group of customers in order to reduce the number of churns. This selected group consists of customers whose contracts are about to expire. A dummy variable indicating whether a customer is selected is therefore included. We also include numerical variables for the number of emails and direct mails about new deals that have been sent to these customers. Contract durations differ and range from one year to indefinite. However, no information on this is available and it is unknown whether a customer churned exactly at the end of the contract or prematurely. The type of contract is not included either.

An indicator for participating in a membership program, i.e. a loyalty program, is included as well. As a member of this program, a customer can save loyalty points, which can be spent on special products. Ahn et al. (2006) find that participating in a special membership program makes customers less likely to churn. A negative effect of participating in the membership program is therefore expected. Also, we include the number of loyalty points spent between October 1st 2014 and March 1st 2015 and expect that spending loyalty points has a significant effect on the decision when to churn.

Customers living in the area traditionally owned by the energy supplier (consisting of Noord-Holland, Groningen, Gelderland and Flevoland) are expected to be less likely to leave the company.6 This variable also indirectly indicates the province where a customer is living. Therefore a dummy variable for this is included.

Information on socioeconomic status (SES) is included as well. SES is determined by type of employment (dummy variables for middle employed, higher employed, freelancer, higher management function, unemployed at age ≥ 65, and other, with no employment at age < 65 as a reference), which indirectly indicates the number of weekly working hours; industry in which employed (wage labor, unemployed at age < 65 and at age ≥ 65, government, freelance, with no industry as a reference); highest level of completed education (in increasing order of difficulty: primary education, lbo/vmbo, mavo, havo/vwo, mbo, hbo and university, with primary school as a reference); and gross family income (an indicator that equals one if total income is high (≥ 2 times the average income) and zero otherwise). Customers with a higher SES are expected to be more likely to churn, since they might be better able to find cheaper options elsewhere. A positive correlation is expected between SES and type of employment, gross income relative to average income, and level of completed education.

A dummy variable with value ‘1’ for renting a house and value ‘0’ for having bought a house is also included. This dummy variable can be an indicator for a certain stage of life. Individuals who bought a house earn or own a minimum amount of money. Furthermore, some customers who rent a specific type of building, for instance a student room, are not allowed to switch to another energy supplier.

6 A network operator is now managing the energy network for electricity and gas to which all the meters

Finally, we control for the effects of age, gender, having children, and a dummy variable indicating whether a customer lives in an area with a high level of urbanity.

This study also focuses on the determinants of the reach of the offline media campaign. To our knowledge, no studies exist on this subject. It is quite challenging to set up this kind of model and some common sense has therefore to be utilized as well. We expect that retirees and middle-aged customers have more time left to spend on watching TV (linear TV) and listening to the radio. Therefore, age is included.

Customers who make use of social media on a regular basis are expected to also watch more TV and listen more to the radio and hence to be more likely to have been reached by the commercial. The number of devices a customer owns may affect the probability of having been reached as well.

Furthermore, we expect that higher educated customers or customers with a higher income are less likely to have been reached by the campaign. SES is expected to have a negative impact on churn.

We control for other effects such as gender, province of residence, marital status, level of urbanity, having children, income and education.

4 Empirical results

This section discusses the estimation results for the effects of the media campaign and various control variables on customer churn and the effects of different socio-demographic variables on the reach of the media campaign. Further, the estimation results from the selection models for the effects on the number of days up to a churn are discussed.

4.1 Binary choice models

4.1.1 Customer churn model

A binary choice model is estimated to investigate the relationship between customer churn and the reach of a media campaign and control variables described in Section 3. The parameters are estimated using the maximum-likelihood estimation method. Table II displays the estimation results. Interpretation of the results of the nonlinear model is not as straightforward as for a linear model. Therefore, marginal effects are calculated as well.

Table II: Estimation of customer churn7

Variable Estimate Std. Err. Marginal Effects Std. Err.
Reached by the campaign -0.344*** 0.061 -0.041*** 0.007
Do not receive letters -0.151 0.178 -0.018 0.021
Do not receive calls 0.177. 0.105 0.021. 0.013
Do not receive emails 0.123 0.177 0.015 0.021
Member of loyalty program -0.079 0.054 -0.010 0.006
Selected for retention program -0.373* 0.169 -0.045* 0.020
Number of contacts with servicedesk 0.155*** 0.029 0.019*** 0.003
Number of emails received -0.837*** 0.085 -0.101*** 0.010
Number of direct mails received 0.446*** 0.066 0.054*** 0.008
Living in traditional area 0.023 0.094 0.003* 0.011
Not living in traditional area 0.265*** 0.098 0.032* 0.012
Yearly consumption of gas 0.000 0.000 0.000 0.000
Yearly consumption of energy 0.000 0.000 0.000 0.000
Age -0.004 0.002 -0.000 0.000
Gender -0.247 0.167 -0.030 0.020
Gender * age 0.003 0.003 0.000 0.000
Education: vmbo 0.156 0.134 0.019 0.016
Education: mavo 0.025 0.131 0.003 0.016
Education: mbo 0.019 0.121 0.002 0.015
Education: havo/vwo 0.034 0.123 0.004 0.015
Education: hbo 0.068 0.124 0.008 0.015
Education: university 0.006 0.141 0.001 0.017
Living in a rented house 0.016 0.053 0.002 0.006
High income -0.003 0.065 -0.000 0.008
High level of urbanity 0.027 0.048 0.003 0.006
Number of years member -0.046*** 0.006 -0.006*** 0.001
Employment: free 0.038 0.098 0.005 0.012
Employment: director 0.015 0.104 0.002 0.013
Employment: higher 0.134 0.091 0.016 0.011
Employment: middle -0.047 0.077 -0.006 0.009
Employment: other -0.098 0.074 -0.012 0.009
Employment: no, >65y -0.111 0.093 -0.013 0.011
Married -0.021 0.051 -0.003 0.006
Activated account 0.003 0.067 0.000 0.008
Contact in 3 months before Oct 1 0.022 0.079 0.003 0.001
Sitevisit in 3 months before Oct 1 0.061 0.054 0.007 0.006
Product switch in 3 months before Oct 1 -0.387*** 0.061 -0.047*** 0.007
1 child 0.001 0.061 0.000 0.007
2 children 0.072 0.070 0.009 0.008
> 2 children 0.144. 0.086 0.017. 0.010
Intercept -0.749*** 0.211
Test SES=0 χ²(13)=13.41 with p=0.417
LR-χ² 574.43
McFadden-R² 0.132
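The two goodness-of-fit figures reported at the bottom of Table II can be related to each other with a short calculation. The sketch below assumes that the LR statistic compares the estimated model with an intercept-only model, which is the usual convention but is not stated explicitly in this section. All variable names are illustrative.

```python
import math

n_obs = 8379      # sample size (Table I)
n_churn = 604     # customers who churned during the observation period
lr_chi2 = 574.43  # LR statistic reported in Table II

# Log-likelihood of an intercept-only model: every customer is assigned
# the sample churn rate as predicted probability.
p = n_churn / n_obs
loglik_null = n_churn * math.log(p) + (n_obs - n_churn) * math.log(1 - p)

# LR = 2 * (loglik_full - loglik_null), so the full-model value follows.
loglik_full = loglik_null + lr_chi2 / 2

# McFadden's pseudo R-squared compares the two log-likelihoods.
mcfadden_r2 = 1 - loglik_full / loglik_null

print(round(loglik_null, 1), round(mcfadden_r2, 3))
```

Under this assumption the implied pseudo R² agrees with the reported value to three decimals, which suggests that the two reported statistics are mutually consistent.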

What stands out first is that the majority of the estimated marginal effects are very small in value. Further, the results show that having seen or heard the media commercial has a significant and negative influence on the probability of churning. Having seen or heard the commercial is associated with a decrease of 0.041 in the probability to churn, i.e. 4.1 percentage points. This is in line with the expectation.
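To see how coefficients and marginal effects relate, recall that in a probit model the marginal effect of regressor k at a point x equals the standard normal density at x'β times the coefficient β_k. If the effects are evaluated at one common point, or averaged over the sample, every marginal effect is therefore the coefficient multiplied by the same scale factor. The sketch below checks this on a few rows of Table II. It assumes a probit link and this way of computing the effects, neither of which is confirmed in this section.

```python
# (coefficient, marginal effect) pairs copied from Table II.
table_ii = {
    "Reached by the campaign": (-0.344, -0.041),
    "Selected for retention program": (-0.373, -0.045),
    "Number of contacts with servicedesk": (0.155, 0.019),
    "Number of emails received": (-0.837, -0.101),
    "Number of direct mails received": (0.446, 0.054),
    "Product switch in 3 months before Oct 1": (-0.387, -0.047),
}

# Implied scale factor per variable: marginal effect divided by coefficient.
scale = {name: me / coef for name, (coef, me) in table_ii.items()}

for name, factor in scale.items():
    print(f"{name}: {factor:.3f}")
```

The implied factors are all close to 0.12, up to rounding of the reported values, which explains why the marginal effects in Table II look like uniformly shrunken versions of the coefficients.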

Also in line with the expectations, being selected for a retention program and the number of emails received have a negative influence on the probability to churn. This might indicate that the retention program reaches its goal of retaining existing customers. However, the estimated parameter for the number of direct mails received is significant and positive. This may indicate that sending letters is not as beneficial as sending emails.

The estimated marginal effect for loyalty, measured by the length of the membership in years, is negative (-0.006) and significant. This suggests that loyal customers are less likely to leave the company. However, this measure can also say something about a customer’s personality traits, such as inertia. An active customer status, defined by the dummy variable ‘product switch in 3 months before October 1st 2014’, has a significant, negative influence (marginal effect -0.047) on the probability to churn. This may suggest that active customers, who recently switched products, are less likely to churn. This effect is possibly endogenous as it is similar to the churn process during the media campaign.

The estimated parameters for the number of contacts with the service desk, i.e. customer dissatisfaction, and the dummy variable for customers who indicated that they do not want to be contacted by telephone are positive and significantly different from zero. This indicates that a customer who has more contact with the service desk or who does not want to be contacted might be considering churning.

It is striking that none of the levels of education, gross family income and type of employment is significant. SES appears not to have a direct effect on churn. We perform a χ²-test of the hypothesis that all parameters for SES are simultaneously equal to zero. We obtain χ²(13) = 13.41 with p = 0.417 and conclude that SES does not have a significant impact on the probability to churn. The estimated coefficients are generally negative indicating that a higher SES decreases the probability to churn, which is in line with the expectation.

A graphical presentation of the predicted churn probabilities with corresponding frequencies can be found in Figure 2 below. The distribution is skewed. The sample estimate of the correlation coefficient between the predicted churn dummy variable and the observed, true values of the churn dummy variable is significantly different from zero with p = 0.000.


Figure 2: Histogram of the predicted probabilities to churn

After the estimation procedure the predictive accuracy of the model is calculated using a confusion matrix. A confusion matrix presents correctly and incorrectly estimated positive and negative values. The matrix is presented in Table III below. TN and TP on the diagonals depict the true negative and the true positive correctly predicted values respectively, i.e. either 0 or 1 was predicted and indeed the observed value was 0 or 1. FN and FP are the false negative and false positive incorrectly predicted values.

Table III: Confusion matrix (Hay, 1988)

Observed values
Predicted values 0 1
0 TN FN
1 FP TP

The predictive accuracy of the model is the percentage that the classifier is correct and can be calculated with the following formula:

$$\text{Accuracy} = \frac{TN + TP}{TN + FN + FP + TP}, \quad (4.1)$$

The accuracy should be as close to 1 as possible, because this indicates that the majority of the predicted values are equal to the observed values. If the accuracy is sufficiently high, the classification model can be used to predict for the whole dataset.

The estimated confusion matrix for churn can be found in Table IV below, with the percentage relative to all 8,379 observations in parentheses. The predictive accuracy of the churn model is fairly high with a value of 0.666. This indicates that it may be possible to predict churn. However, it is questionable whether the accuracy is high enough. The optimal cutoff point has a rather low value of 0.072, which is almost equal to the share of customers in the sample who churned during the observation period. This optimal cutoff point is determined by finding a point such that both the sensitivity and the specificity of the estimated model are as large as possible. The sensitivity of the model is the True Positive Fraction (TPF), i.e. the fraction of observed positive values that is predicted correctly, whereas the specificity denotes the True Negative Fraction (TNF), i.e. the fraction of observed negative values that is predicted correctly. These values are explained more extensively later in this section.

Table IV: Estimated confusion matrix for customer churn

Observed values
Predicted values 0 1
0 5,127 (61.189%) 153 (1.826%)
1 2,648 (31.603%) 451 (5.383%)
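As a check, the accuracy, sensitivity and specificity can be recomputed from the four cells of Table IV. In this sketch the cell with predicted value 0 and observed value 1 is read as false negatives and the cell with predicted value 1 and observed value 0 as false positives, which is consistent with the 604 observed churners. The function name is illustrative.

```python
def classification_metrics(tn: int, fn: int, fp: int, tp: int) -> dict:
    """Accuracy, sensitivity (TPF) and specificity (TNF) from cell counts."""
    total = tn + fn + fp + tp
    return {
        "accuracy": (tn + tp) / total,
        "sensitivity": tp / (tp + fn),  # share of observed churners found
        "specificity": tn / (tn + fp),  # share of observed stayers found
    }


# Cell counts from Table IV (rows: predicted, columns: observed).
churn_model = classification_metrics(tn=5127, fn=153, fp=2648, tp=451)

for name, value in churn_model.items():
    print(f"{name}: {value:.3f}")
```

The accuracy of 0.666 is reproduced. The calculation also shows that the low cutoff buys a fairly high sensitivity at the price of a large number of false positives.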

Ansari and Mela (2003) explain that a Receiver Operating Characteristic (ROC) curve is a graphical plot that illustrates the performance of a binary classifier and the tradeoffs between the two types of errors that can occur for any threshold. These types of errors are the false negatives and false positives described above. The curve is constructed by plotting the True Positive Fraction (TPF), i.e. the sensitivity, against the False Positive Fraction (FPF), i.e. 1 - specificity, for a range of possible classification thresholds. The TPF and FPF are calculated as follows:

$$\text{TPF} = \frac{TP}{TP + FN}, \qquad \text{FPF} = \frac{FP}{FP + TN}. \quad (4.2)$$

The resulting plot is drawn in the unit square. The area under the ROC curve provides a summary measure of the quality of the model. A model with an ROC curve that tracks the 45-degree line, with an area of 0.500 under the curve, would be worthless as it would not separate the two classes of observations, whereas a perfect model would have an area of 1 under the curve and would correctly classify all churners and non-churners.
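The construction of the ROC curve, the area under it and a cutoff choice can be sketched on a small toy example. The labels and scores below are made up for illustration. The cutoff rule used here, maximizing sensitivity + specificity - 1 (Youden's index), is one common way to make both quantities as large as possible and is an assumption, since the exact rule used in this study is not specified.

```python
def roc_points(y_true, scores):
    """Return (threshold, FPF, TPF) for each distinct score used as cutoff."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    points = []
    for thr in sorted(set(scores), reverse=True):
        # An observation is classified as positive if its score >= threshold.
        tp = sum(1 for y, s in zip(y_true, scores) if s >= thr and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= thr and y == 0)
        points.append((thr, fp / n_neg, tp / n_pos))
    return points


def area_under_curve(points):
    """Trapezoidal area under the ROC curve, starting from the origin."""
    xs = [0.0] + [fpf for _, fpf, _ in points]
    ys = [0.0] + [tpf for _, _, tpf in points]
    return sum((xs[i] - xs[i - 1]) * (ys[i] + ys[i - 1]) / 2
               for i in range(1, len(xs)))


def best_cutoff(points):
    """Threshold maximizing TPF - FPF (Youden's index, an assumed rule)."""
    return max(points, key=lambda point: point[2] - point[1])[0]


# Toy data: 1 = churned, 0 = stayed; scores are hypothetical probabilities.
y_true = [0, 0, 1, 0, 1, 1, 0, 1]
scores = [0.02, 0.05, 0.06, 0.09, 0.08, 0.10, 0.04, 0.30]

points = roc_points(y_true, scores)
auc = area_under_curve(points)
cutoff = best_cutoff(points)
print(auc, cutoff)
```

When positives are rare, as with churn, the predicted probabilities are small for most observations, so a cutoff chosen in this way tends to lie far below 0.5.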

The ROC curve for churn is plotted in Figure 3 below. The area under the ROC curve is fairly high with a value of 0.773. This may indicate that the predictive ability is fair and that the quality of the model is not bad. It may be possible to classify the customers.

Figure 3: ROC curve for customer churn

Axes: Sensitivity (vertical) against 1 - Specificity (horizontal). Area under ROC curve = 0.7733

4.1.2 Media campaign reach model

In order to analyze the relationship between the binary variable offline media campaign reach and various socio-demographic variables, we estimate a binary choice model. The empirical results of the estimation together with estimated marginal effects can be found in Table V. The parameters are estimated using maximum-likelihood estimation.

Looking at the magnitudes of all estimated coefficients together, we see that all values are low and that the majority of the variables appears not to be significantly different from 0.

In line with the expectation, the amount of social media use has a significant and positive impact on the probability of having been in contact with the commercial. Moreover, the more a customer uses social media, the more likely he or she is to have been reached. A logical explanation could be that customers who spend more time on social media also spend more time on watching TV and listening to the radio, i.e. we expect a positive correlation.

Furthermore, the higher the level of completed education, the smaller the probability of having been in contact with the commercial. The probability of being reached by the offline campaign even decreases by 0.055 for customers who hold a university degree. Customers who are middle high employed are significantly less likely to have seen or heard the loyalty commercial. Altogether, this might indicate that customers with a higher SES have a lower probability of having been in contact with the offline media commercial. We perform a χ²-test of the hypothesis that all parameters for SES are simultaneously equal to zero. We obtain χ²(12) = 17.94 with p = 0.117 and conclude that there is no significant relation between SES and media campaign reach.

Relatively few provinces (only Friesland and Zuid-Holland) are significant at a level of 5%. In general the signs of the estimates for the different provinces are positive. Compared to the reference category Noord-Brabant, customers living in another province would be more likely to have been reached by the media commercial. It is, however, rather difficult to give a plausible explanation for this.

Older people are more likely to have seen or heard the commercial. This is again difficult to explain. On the one hand, older people might have more time left to spend on offline media, on the other hand, younger people might be more interested and dedicated to technological devices and offline medium types. A one year increase in age is associated with a 0.002 increase in probability.

In Figure 4 below a graphical presentation of the predicted probabilities against its corresponding frequencies is shown. The distribution is slightly skewed and concentrated around 0.9.

Table V: Estimation of the reach of the offline media campaign

Variable Estimate Std. Err. Marginal effects Std. Err.
Number of technological devices -0.022 0.025 -0.004 0.005
Living in a rented house 0.049 0.042 0.009 0.008
Gender 0.098 0.135 0.019 0.026
Gender * age -0.001 0.003 -0.000 0.000
Age 0.013*** 0.002 0.002 0.000***
Education: vmbo -0.042 0.118 -0.008 0.023
Education: mavo -0.183. 0.1010 -0.035 0.021.
Education: mbo -0.141 0.103 -0.027 0.020
Education: havo/vwo -0.153 0.105 -0.029 0.020
Education: hbo -0.098 0.107 -0.019 0.020
Education: university -0.288* 0.117 -0.055 0.022*
Employment: free -0.060 0.080 -0.012 0.015
Employment: director 0.017 0.085 0.003 0.016
Employment: higher -0.018 0.075 -0.004 0.014
Employment: middle 0.081 0.064 0.016 0.012
Employment: other 0.014 0.061 0.003 0.012
Employment: no, <65y 0.053 0.078 0.010 0.015
Low income 0.018 0.052 0.003 0.001
Province: Friesland 0.270* 0.112 0.052 0.021*
Province: Groningen 0.050 0.162 0.010 0.031
Province: Drenthe 0.101 0.193 0.019 0.037
Province: Overijssel -0.043 0.129 -0.008 0.025
Province: Flevoland 0.244* 0.117 0.047 0.022*
Province: Gelderland 0.172* 0.079 0.033 0.015*
Province: Utrecht 0.134 0.126 0.026 0.024
Province: Noord-Holland 0.161* 0.079 0.031 0.0151*
Province: Zuid-Holland 0.239*** 0.088 0.046 0.017***
Province: Zeeland -0.391 0.365 -0.075 0.070
Province: Limburg 0.222 0.144 0.042 0.028
Married 0.023 0.042 0.004 0.008
Social media: rare use 0.055 0.059 0.010 0.011
Social media: weekly use 0.162*** 0.062 0.031 0.012***
Social media: daily use 0.206*** 0.063 0.039 0.012***
Social media: continuous use 0.299*** 0.076 0.057 0.014***
High level of urbanity -0.086* 0.042 -0.016 0.008*
1 child 0.014 0.051 0.003 0.010
2 children -0.082 0.060 -0.016 0.011
>2 children -0.081 0.076 -0.015 0.014
Intercept 0.409* 0.184
Test SES=0 χ²(12)=17.94 p=0.117
LR-χ² 170.54
McFadden-R² 0.028

Figure 4: Histogram of predicted probabilities

The calculated confusion matrix for the reach of the campaign is presented in Table VI. The accuracy can again be calculated using formula (4.1) and takes on a relatively low value of 0.609. This indicates that it is rather complex and may be too difficult to predict whether a customer has been in contact with the commercial. If information were available on the hours spent watching TV and listening to the radio and on which channels were watched or listened to, a more accurate model for the probability of having been reached by the offline media campaign could be set up. Because most customers have been exposed to the campaign, the optimal cutoff point, calculated using the same procedure as in the previous subsection, is relatively high (0.884).

Table VI: Calculated confusion matrix for offline media reach

Observed values
Predicted values 0 1
0 556 (6.636%) 2,853 (34.049%)
1 425 (5.072%) 4,545 (54.243%)

In Figure 5 below, the ROC curve for the reach of the campaign is plotted. The upper line is almost identical to the reference line, i.e. the 45º line. The area value under the bowed line is 0.6266. This indicates that it may be too difficult to classify the customers using the included variables and that the quality of the model is bad. This confirms the previous findings of the confusion matrix.

Figure 5: ROC curve for offline media reach

Axes: Sensitivity (vertical) against 1 - Specificity (horizontal).


4.2 Bivariate probit model on churn and the offline media campaign

In Table VII the empirical estimation results of the bivariate probit model are presented. The parameters are estimated using maximum likelihood. The dependent variables are the dummy variable whether a customer churned and the dummy variable whether a customer saw or heard the offline media campaign. The independent variables consist of socio-demographic variables, company-related variables and other variables described in Section 3. Of interest is the effect of the reach of the media campaign on churn which can be found in table VII. However, the estimated parameter is not significant with p = 0.838.

Table VIII presents the results of the bivariate probit model conditional on success in (2.2), i.e. the partial effects of the conditional mean function

\[ \frac{\Pr[\text{churn}_i = 1, \text{campaign}_i = 1]}{\Pr[\text{campaign}_i = 1]}. \]

An interesting result is that the estimated partial effect of having seen or heard the commercial is positive (0.012), but the estimated parameter is not significant.
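The conditional probability of churn given campaign reach can be illustrated numerically. The sketch below assumes a standard bivariate probit in which the joint probability of two successes is the bivariate standard normal distribution function evaluated at the two linear indices with correlation ρ; the exact specification in (2.2) may differ. The index values and the function names are hypothetical, and the bivariate distribution function is approximated with Simpson's rule rather than taken from a statistics library.

```python
import math


def norm_pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def norm_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def bvn_cdf(a, b, rho, steps=2000, lower=-8.0):
    """Pr[X <= a, Y <= b] for a standard bivariate normal with |rho| < 1.

    Uses the identity Pr = integral over x <= a of
    phi(x) * Phi((b - rho * x) / sqrt(1 - rho^2)), with Simpson's rule
    on [lower, a]; steps must be even.
    """
    if a <= lower:
        return 0.0
    scale = math.sqrt(1.0 - rho * rho)
    h = (a - lower) / steps

    def integrand(x):
        return norm_pdf(x) * norm_cdf((b - rho * x) / scale)

    total = integrand(lower) + integrand(a)
    for k in range(1, steps):
        total += (4.0 if k % 2 else 2.0) * integrand(lower + k * h)
    return total * h / 3.0


def prob_churn_given_reached(index_churn, index_campaign, rho):
    """Pr[churn = 1, campaign = 1] / Pr[campaign = 1]."""
    return bvn_cdf(index_churn, index_campaign, rho) / norm_cdf(index_campaign)


# Hypothetical linear indices, chosen only for illustration.
index_churn, index_campaign = -1.0, 1.2
print(prob_churn_given_reached(index_churn, index_campaign, 0.0))
print(norm_cdf(index_churn))  # same value: with rho = 0 the conditioning drops out
```

With ρ = 0 the joint probability factorizes, so the conditional probability equals the univariate probit probability of churn.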

Furthermore, an LR test is performed to test the hypothesis that the correlation coefficient is equal to zero, $H_0: \rho = 0$. Under this hypothesis the bivariate probit model collapses into two separate probit models. The test statistic follows a $\chi^2(1)$-distribution. From Table VIII it can be deduced that $\chi^2(1) = 0.236$ with p = 0.627. At a significance level of 5%, the hypothesis of no correlation between the error terms cannot be rejected. This indicates that $\rho$ is not significantly different from 0 and that the two univariate probit models may be estimated separately. This suggests that the process of having seen the TV commercial or heard the radio commercial and the process of churn may be independent of one another.
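The reported p-value can be checked directly from the test statistic. The sketch below uses the fact that the square of a standard normal variable follows a chi-squared distribution with one degree of freedom, so no statistics library is needed; the function name `chi2_sf_1df` is introduced here for illustration.

```python
import math


def chi2_sf_1df(x):
    """Upper-tail probability of a chi-squared variable with 1 degree of freedom.

    If Z is standard normal, Z**2 is chi-squared(1), hence
    Pr[Z**2 > x] = Pr[|Z| > sqrt(x)] = erfc(sqrt(x / 2)).
    """
    if x <= 0.0:
        return 1.0
    return math.erfc(math.sqrt(x / 2.0))


lr_statistic = 0.236  # LR statistic reported in the text
p_value = chi2_sf_1df(lr_statistic)
reject_h0 = p_value < 0.05  # test at the 5% significance level

print(round(p_value, 3))  # 0.627
print(reject_h0)          # False
```

Equivalently, the statistic of 0.236 is far below the 5% critical value of the chi-squared distribution with one degree of freedom, which is approximately 3.841.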

Assumption 1

Results show that (1) the accuracy of the media campaign reach model in Section 4.1.2 is low (0.609) and (2) the bivariate probit model in Section 4.2 collapses into two separate univariate probit models due to the insignificance of the estimated correlation coefficient. It may be too difficult to estimate whether a customer has seen or heard the campaign. Moreover, the process of reaching a customer by the campaign may be independent of the process of churn. As a result, we do not consider all 1,878,212 customers of the energy supplier in the rest of this study and continue with the subsample of 8,379 observations in which the binary variable media reach is directly observed.

4.3 Selection models on the number of days up to churn

Parametric and semiparametric selection models are applied to correct for selectivity and to estimate the time in days up to a churn starting from the beginning of the media campaign given
