
Friends for Life? The relationship between customer friendliness and retention


Academic year: 2021


Friends for Life?

The relationship between customer friendliness and retention

by

Melvin Bredewold

University of Groningen
Faculty of Economics and Business
MSc Marketing
Master thesis
January 2018

De Kaai 42
9723LD Groningen
+31 (0) 6 48 36 34 26
m.s.bredewold@student.rug.nl
Student number: 3033848

First supervisor: prof. dr. T.H.A. Bijmolt
Second supervisor: dr. J. van Doorn


Management summary

This study provides an extensive analysis of a customer feedback metric, more precisely, customer friendliness. Marketing research company SAMR created its own customer feedback metric, which focusses on different aspects of customer friendliness. The company has tracked this metric for 220 Dutch companies across 12 industries since 2012. Recently, the following question arose: “Is our metric a good indicator of future performance?” This study aims to answer that specific question.

The data provided by SAMR included scores on different aspects of customer friendliness for the 220 companies, as well as anonymized personal information. However, to answer this question, an indicator of future performance was necessary. Fortunately, the data partly contained measurements from the same respondents over the past years. Therefore, it was possible to create a retention variable, which is used as the indicator of future performance. Taking past research into account, several aspects of this customer feedback metric can be criticized. First, the metric employs a full-scale from one to ten, while prior studies found that a non-linear scale (e.g., Top-2-Box) increases predictive power. Additionally, this study takes the Dutch grading culture into account by creating a Top-3-Box scale. Secondly, prior literature found the use of multiple aspects inefficient in predicting future performance, as a single aspect would be sufficient. This study investigates whether this finding also holds for this metric. Finally, non-linear effects are expected, meaning that the effect on retention differs between score levels, with the most prominent effect expected between the scores of five and six.

Several logistic regressions were employed to investigate whether the relationship is present in the first place. Additionally, the corresponding questions are investigated by comparing the different predictive regression models. However, because the customer feedback metric measures several aspects of customer friendliness, multicollinearity was found. A principal components analysis and factor analysis were used to solve this issue. A single component which captures all scores of customer friendliness was created.

Secondly, the results contradict the critiques on the use of a full-scale rating. Adjusting the full-scale into a Top-2-Box scale did not improve the prediction of retention; it even performed worse. However, the Top-3-Box scale, which accounts for the Dutch grading culture, performed better than the Top-2-Box scale and as well as the full-scale model.

Thirdly, the results of this study reveal that the biggest impact lies in the increase in score from a seven to an eight. A respondent who rates an eight is 18.14 percent more likely to be retained than a respondent who rates a seven. This effect is also found, to a lesser extent, for an increase from six to seven, where the chance of retention increases by 8.2 percent. No differences were found between the scores from one to six and, more importantly, no differences in retention were found between an eight, a nine, and a ten.

Finally, this study found results that support the critique on employing a multi-item measurement. No difference in predictive power was found between a model with multiple items and a model with only one item. Moreover, the use of multiple items caused multicollinearity. However, the use of multiple items distinguishes this metric from other metrics. Therefore, more research should be conducted on the effects and usefulness of the different items. Additionally, this study found that retention increases as age increases. Also, lower educated


Preface

In front of you lies the thesis “The Relationship between Customer Friendliness and Retention: friends for life?” It has been written to fulfill the final graduation requirements for the MSc Marketing Management & Intelligence program at the Rijksuniversiteit Groningen. I conducted the research and wrote my thesis from September 2017 until January 2018.

The project was requested by SAMR, a market research company located in Leusden. Together with my thesis supervisor, prof. dr. T. Bijmolt, the research question was carefully formulated. SAMR provided me with real-life data, which came with the corresponding issues of data cleaning. This, however, made the project both more realistic and challenging, as well as satisfying when I found any results.

I would like to thank my supervisor, prof. dr. T. Bijmolt, for his excellent guidance and support throughout the whole process. Without his supervision, the main question might still be left unanswered. Secondly, I would like to thank dr. K. Dehmamy for assisting with my coding issues in RStudio. Furthermore, I would like to thank Gerrit Piksen and Jurian Broers for the opportunity to write my master thesis at SAMR and for providing me with the ‘Klantvriendelijkste bedrijf van Nederland’ data. I would also like to thank my friends and family for all the support I received in the past half year, especially Dion Parameswaram and Marc Boels, with whom I spent most of my time and who helped me survive this master program. Finally, I want to give special thanks to Dion Parameswaram, who came up with the perfectly fitting title for my thesis.

I hope you enjoy reading this thesis as much as I had stress about it while writing it.

Melvin Bredewold


Table of Contents

1 Introduction
2 Theoretical framework
2.1 Customer Feedback Metrics
2.2 Customer friendliness
2.3 Retention
2.4 Relationship between customer friendliness and retention
2.5 Non-linear scaling
2.6 Multi-item measurement
2.7 Effect of industry characteristics
3 Methodology
3.1 Data
3.1.1 Creating the dependent variable
3.1.2 Deletion of observations
3.1.3 Data aggregation
3.1.4 Deletion of explanatory variables
3.1.5 Deletion of missing values
3.1.6 Imputation of missing values
3.1.7 Self-selection bias
3.2 Descriptive analysis
3.3 Logistic regression
3.4 Multicollinearity
3.4.1 Correlation matrix
3.4.2 VIF
3.4.3 Principal Components Analysis
3.4.4 Factor Analysis
3.5 The final models
3.6 Model validation
4 Results
4.1 Main results
4.2 Improvement by non-linear scaling
4.3 Non-linear effects
4.4 Multi vs. single item measurement
4.5 Additional findings
5 Conclusion
5.1 Recommendations
5.2 Limitations and future research
References

1 Introduction

Customer feedback metrics (CFMs) are instruments used to measure the attitudes of customers. A Dutch telecom provider, for example, implemented the Net Promoter Score (NPS) several years ago to keep track of the efforts of its contact center employees. This metric enabled the company to meet higher quality standards. Different CFMs are used worldwide by leading companies such as Apple in retail and ING in financial services (Bain & Company, 2017). CFMs are used to set goals and to measure performance on metrics that are an indicator of future firm performance (Hauser & Katz, 1998) or loyalty (Gupta, Lehmann, and Stuart, 2004). According to Gupta and Zeithaml (2006), it is critical to understand the effect of CFMs on firm performance to increase the accountability of the marketing department, which has been an increasingly important topic in the last decade.

Dutch companies also recognize the increasing importance of a customer focus. When customers were asked in 2007 to recall at least one positive experience with a company, the results were shocking: 40 percent of the respondents were unable to remember a single positive experience. Fortunately, this number was reduced to only 7 percent in 2014 (SAMR, 2017), which may indicate that the focus of Dutch companies is shifting more and more to the customer. Another explanation might be that customers are more familiar with the questions asked to measure CFMs, because companies gradually adopted the different metrics in the period between 2007 and 2014. The NPS, for example, was introduced in 2003 and improved in 2006 and 2011 by Reichheld (Bain & Company, 2017), and the CES was only introduced in 2010.

Different CFMs like the NPS (Reichheld, 2003) and the Customer Effort Score (CES) by Dixon, Freeman, and Toman (2010) were introduced in the past decade and claimed to be the best indicator of future performance. Nowadays, the link between CFMs and firm performance is widely proven in different studies (Gupta et al., 2004). There exists a positive relationship between CFMs and retention (de Haan, Verhoef, and Wiesel, 2015; Rust & Zahorik, 1993).

However, CFMs do not always deliver on their purpose. De Haan et al. (2015) found that one CFM, the CES, had no significant impact on retention, which contradicts the claim of Dixon et al. (2010). Reichheld’s NPS (2003) is also viewed critically by Morgan and Rego (2006), who saw little to no predictive value for future performance. Practice has also shown that high CFM scores are no guarantee of future success. A key example is fashion chain Miss Etam, which won the award of ‘Klantvriendelijkste bedrijf van Nederland’ in 2012 (BNR Nieuwsradio, 2012) but was declared bankrupt in 2015 (Nu.nl, 2015). Another example is Kmart; despite the significant increase in customer satisfaction measured with the American Customer Satisfaction Index (ACSI) the

Both research and practice show contradictory results when it comes to the relationship between different customer satisfaction metrics and a firm’s performance. These findings resulted in the following problem statement:

To what extent is it possible to predict retention using a customer feedback metric measuring customer friendliness?

In this study, the CFM developed by the Dutch company SAMR (SmartAgent & MarketResponse) is used to analyze the relationship between customer friendliness and retention. This metric differs from widely known CFMs like NPS and CES. Customer friendliness is defined as being available, honest, and properly helping a customer while being sincere. It may be most similar to average customer satisfaction, which is broadly supported by research (Hanssens, 2009). However, the first major difference is that the CFM in this study measures friendliness instead of satisfaction. Moreover, this metric captures different aspects of friendliness, whereas average customer satisfaction is often measured by only one question (Ittner and Larcker, 1998). This raised the first two research questions:

1) Are multiple questions better in predicting retention compared to one question? and 2) Which of the aspects has the most explaining power in predicting retention?

The CFM in this study employs a full-scale (1-10), which is criticized by different studies. Oliver, Rust, and Varki (1997) found that customers evaluate firms with extreme scores, which subsequently leads to rather non-linear relationships (van Doorn & Verhoef, 2008; van Doorn, Verhoef, and Bijmolt, 2007). Therefore de Haan et al. (2015) advise transforming the scales of the CFM to capture extremely satisfied customers (e.g., Top-2-Box) rather than using a full-scale. These findings raised the next question:

3) Does a non-linear scale outperform the full-scale in predicting retention?

Additionally, it would be useful for companies to know how much an increase in customer friendliness score matters. It is reasonable to expect that the difference between a seven and an eight is more significant than the difference between a four and a five, the latter two both being insufficient. Thus a non-linear relationship is expected. Therefore the fourth question is raised:

4) Which improvement in the CFM score has the most impact on retention?

This study contributes to current literature in three ways. First, the metric used differs from more traditional metrics like NPS, CES and average satisfaction. Secondly, it investigates whether measuring multiple aspects is more useful than measuring only one question. Thirdly, it studies whether transforming a full-scale into a non-linear scale increases the predictive value. Furthermore, this study contributes to practice in several ways: first, SAMR will gain insight into the (predictive) value of its metric, and secondly, this study will show which change in customer friendliness score has the most impact on retention.

2 Theoretical framework

2.1 Customer Feedback Metrics

There are several reasons for companies to measure, track and react to CFMs, of which the following three are the most important. First, customers are one of the most critical marketing assets of the firm. There exists an almost one-to-one relationship between the value of the customer base and firm value (de Haan et al., 2015). Secondly, CFMs increase the accountability of marketing, which was a hot topic in both research and practice in the last decade (Gupta et al., 2004; Moorman & Rust, 1999). This is supported by Verhoef and Leeflang (2009), who argue that accountability, next to innovativeness, is one of the drivers of the influence and thus the importance of the marketing department. Furthermore, Rego, Morgan, and Fornell (2013) argue that customer satisfaction and market share are the most frequently used ways to assess marketing performance. Finally, CFMs are used as a benchmarking tool to measure the competitiveness of a firm within an industry.

Many different CFMs exist, claiming to be the best. Yet, studies have varied conclusions; whereas some say there is no significant difference between the CFMs and their predictive value on future performance (van Doorn, Leeflang, and Tijs, 2013; Wiesel, Verhoef & de Haan, 2012), others have found that the predictive value of the CFMs strongly differs across industries (de Haan et al., 2015). A brief overview of different CFMs and their predictive value for future performance can be found in table 1.

Table 1: Overview of CFMs and their relationship with future performance (Customer Insights Center, 2014)

CFM                      Morgan and Rego (2006)   Keiningham et al. (2007)   van Doorn et al. (2013)   de Haan et al. (2015)
Customer satisfaction    +                        +/-                        +                         +
Top-2-Box satisfaction   +                        +/-                        +                         ++
NPS                      -                        +/-                        +                         +

2.2 Customer friendliness

A friendly person is, by definition, someone who possesses and exhibits the characteristics of a friend, such as being kind and helpful. In companies, the service department, and in particular the contact staff, has the most significant impact on friendliness. According to Johnston (1997), customer friendliness is influenced by the warmth and personal approachability of the contact staff; friendliness is increased by a cheerful attitude and the ability to make the customer feel welcome. According to his research, customer friendliness is one of the primary sources of customer satisfaction, next to attentiveness, responsiveness, and care. Customer friendliness is also related to satisfaction in the research of Tsai and Huang (2002), who found an increase in willingness to pass positive comments to friends. Again, in this study, customer friendliness is defined as being available, honest, and properly helping a customer while being sincere. This definition corresponds to the ‘six golden rules’.

The six golden rules, as SAMR calls them, are based on text analysis of respondents’ answers in interviews about customer friendliness (SAMR, 2017) and can be found in table 2. The seventh question is a more general, overarching question concerning customer friendliness, and the last item was added in 2016 and focusses on social responsibility.

Table 2: Questions of the CFM

Question                                                                     Dimension
1. Company X does not bother me with business that is not relevant to me    Obtrusiveness
2. Employees of company X are available when I need them                     Convenience
3. Company X admits mistakes and solves them in a proper manner              Honesty
4. Company X keeps its promises                                              Honesty
5. It is easy to swap a purchased item at company X                          Convenience
6. Company X is sincerely involved with me as a customer¹                    Honesty
7. Company X is a customer friendly company                                  Overall

¹not available for 2012 ²only available for 2016

De Haan et al. (2015) argue that it is preferred to monitor a dashboard of CFMs that measure different dimensions instead of just one CFM. The CFM used in this study does precisely that: its questions capture various aspects of customer friendliness. The questions differ significantly from more widely used multi-item satisfaction measures, which focus on aspects such as product portfolio, price-quality ratio, customized advice, relevance, and support/dealing with complaints (Keiningham, Cooil, Andreassen, and Aksoy, 2007). Instead, this metric focusses on the acts of employees and the culture of the company.

Customer friendliness is an unobservable metric, or a stated preference as named by economists; an unobservable measure “involves behaviors of customers that typically relate to purchase or consumption of a product or service” (Gupta and Zeithaml, 2006). As the definition indicates, these metrics cannot be observed but only measured via surveys or interviews.

2.3 Retention

Retention is defined as a customer repeatedly buying from a firm, or a customer being ‘alive’. According to Gupta and Zeithaml (2006), retention is one of the critical drivers of firm profitability, possibly because retention is cheaper than acquisition (Gupta et al., 2004). Retention is also proven to be one of the key drivers of CLV and firm profitability. Thus the link between retention and future performance is present. In contrast to friendliness, retention is an observable metric (Gupta and Zeithaml, 2006), because companies can in some settings easily observe whether a customer is repeatedly buying or not. This, however, differs across industries and is more manageable for contract-based industries (e.g., telecom). In this study, retention is defined as a respondent being a customer in year t and t+1. So when an individual is a customer of company X in both 2012 and 2013, this is noted as retention.

This study uses self-stated retention, meaning that the customer indicates whether they visited a particular company or not; retention is thus not based on actual purchase data. For some customers, this question may be hard to answer accurately, because the less often an individual is in contact with a company (e.g., in the leisure industry), the harder it becomes to answer correctly (Batislam, Denizel, and Filiztekin, 2007). This may lead to an accuracy bias in some industries. Fortunately, this bias will be less present in contract-based industries, because this study assumes that respondents know, for example, at which bank they are a customer.

2.4 Relationship between customer friendliness and retention

Traditional marketing metrics often measure customers’ attitudes (e.g., purchase intention) but do not link these attitudes to actual behavior, whereas intentions are correlated with behavior only to a limited extent (Srinivasan, Vanhuele, and Pauwels, 2010). This study aims to link the unobservable to the observable metric, more precisely customer friendliness to self-stated retention. The relationship between friendliness and satisfaction is proven by past research (Johnston, 1997; Tsai & Huang, 2002), and customer satisfaction is often linked to retention (Rego, Morgan, and Fornell, 2013; Guo, Xiao, and Tang, 2009; Bolton, Kannan, and Bramlett, 2000; de Haan et al., 2015). However, the direct relationship between friendliness and retention has not been studied thoroughly. Tsai and Huang (2002) found that customers’ willingness to return was positively influenced by employee affective delivery, which was mediated by perceived friendliness. Practitioners also found a positive relation. First, most customers will spend up to 10 percent more for the same product with better service. Secondly, when a customer receives good (poor) service, he or she tells 12 (20) people. Finally, and most importantly, if service is good (poor), 82 (91) percent of the customers will (not) come back (Dummies, 2017).

One of the underlying theories may be the three-component model (TCM) developed by Meyer and Allen (1991), which explains the concept of commitment as divided into three components. The model focusses on employees but can be extended to customers (Bansal, Irving, and Taylor, 2004). Especially the first component, affective commitment (i.e., the emotional attachment to an organization), is expected to be important in the relationship between customer friendliness and retention. Customers with a high level of affective commitment enjoy the relationship with the organization and therefore stay because they want to stay. It is expected that when a customer is treated in a friendly manner by a company, the affective commitment will increase. While research on the relationship between friendliness and retention is scarce, underlying theories like the TCM reinforce the first hypothesis:

Friendliness is by no means the only factor that influences retention; many other variables like switching costs (e.g., resulting in a high continuance commitment; fear of loss), price, promotions, and quality also contribute to retention (Chen & Hitt, 2002; Lewis, 2004). This study will only focus on the effect of friendliness, with several demographics as control variables.

2.5 Non-linear scaling

An aspect of the CFM used in this study that can be criticized is the employment of a full-scale (i.e., scores from 1 to 10). Oliver, Rust, and Varki (1997) found that customers evaluate firms with extreme scores, which results in somewhat non-linear relationships (van Doorn and Verhoef, 2008; van Doorn, Verhoef & Bijmolt, 2007). Thus a really satisfied customer is likely to rate either a nine or a ten, and a less satisfied customer does not make a distinction between a four and a six. This idea is also employed in Reichheld’s NPS (2003), where the highly satisfied customers who rate a nine or a ten are seen as promoters, passively satisfied customers rate a seven or an eight, and everyone who rates below a seven is seen as a detractor.
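The NPS grouping described above can be illustrated with a minimal sketch. The function names and the example ratings are hypothetical and serve only as illustration; the score itself is computed in the usual way as the share of promoters minus the share of detractors.

```python
from collections import Counter

def nps_group(score):
    """Classify a rating using the grouping described in the text."""
    if score >= 9:
        return "promoter"
    if score >= 7:
        return "passive"
    return "detractor"

def net_promoter_score(scores):
    """Share of promoters minus share of detractors, in percentage points."""
    counts = Counter(nps_group(s) for s in scores)
    return 100.0 * (counts["promoter"] - counts["detractor"]) / len(scores)

# Hypothetical ratings, not taken from the SAMR data.
example_scores = [10, 9, 8, 7, 6, 3]
example_nps = net_promoter_score(example_scores)
```

In this hypothetical sample two promoters and two detractors cancel out, so the score is zero even though the individual ratings differ widely, which shows how much information the grouping discards compared with a full-scale.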

Because of these findings de Haan et al. (2015) argue that transforming the scales of the CFMs to capture extremely satisfied customers (Top-2-Box) or dividing customers into groups of satisfied and dissatisfied customers (NPS) is preferred over using a full-scale. This is also demonstrated in the research of Morgan and Rego (2006), who proved that the transformed metric (Top-2-Box) is a good predictor of future business performance. These findings resulted in the following hypothesis:

H2: Transformation into a Top-2-Box scale results in better predictability of retention.

What stands out is that a six and a seven are the most commonly given grades, which is not in line with the Top-2-Box transformation. This study assumes that the grades respondents received in secondary school affect their own grading, as this system is embedded in the Dutch culture. Therefore it might be wise to adopt a Top-3-Box scale instead of the original Top-2-Box. These findings resulted in the third hypothesis:

H3: A Top-3-Box scale outperforms the Top-2-Box scale in predicting retention.
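The two transformations compared in H2 and H3 can be sketched as follows. This is an illustration under the assumption that Top-2-Box codes the scores nine and ten as 1, and Top-3-Box codes eight, nine and ten as 1, on the 1 to 10 scale; the function name top_box is hypothetical.

```python
def top_box(score, boxes, scale_max=10):
    """Return 1 if the score is among the `boxes` highest scale points, else 0."""
    return 1 if score > scale_max - boxes else 0

scores = list(range(1, 11))
top2 = [top_box(s, boxes=2) for s in scores]  # 9 and 10 are coded 1
top3 = [top_box(s, boxes=3) for s in scores]  # 8, 9 and 10 are coded 1
```

Both transformations replace the ten-point score with a single binary predictor, so they differ only in where the cut-off is placed.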

Besides adjusting to a Top-2/3-Box scale to account for non-linear effects, it is also interesting to see whether the differences between the lower and higher scores also differ. If we assume non-linear effects to be present (Van Doorn et al., 2007), one might argue that the difference between a three and a four, which are both insufficient, will be smaller than the difference between a seven and an eight. Therefore the following hypothesis is made:

H4: The effect of customer friendliness on retention is non-linear.

While companies often strive for the highest possible scores, it will be useful to know which increase in ratings has the most significant effect. With this knowledge, a company can optimize its input (time, money) and its output (score on customer friendliness, retention) to obtain the highest possible profit.

Finally, in the Dutch grading system, a grade of one to five is insufficient, whereas a six or higher is sufficient. One may expect that the most significant impact on retention will be found at the border between insufficient and sufficient. Therefore the next hypothesis is made:

H5: A score increasing from five to six on customer friendliness has the largest impact on retention.

2.6 Multi-item measurement

Figure 1: Grades given in The Netherlands (distribution of grades; horizontal axis: grades 1 to 10, vertical axis in percent)

Figure 2: Conceptual model (Customer Feedback Metric, main effect; hypotheses H1/H6, H2, H3, H4/H5)

The CFM used in this study is measured by multiple items (questions). However, past research argues that a one-item measurement (e.g., “In general, how satisfied are you with company X”) is sufficient when measuring a respondent’s attitudes (Ittner and Larcker, 1998). The three widely known metrics all use only one item. This study aims to test whether a multi-item model (i.e., a model with all questions concerning friendliness included) performs better than a single-item model (i.e., a model with only one question concerning friendliness included). Another downside of using different dimensions is the risk of multicollinearity. However, de Haan et al. (2015) argue that it is preferred to monitor a ‘dashboard’ of CFMs. Therefore, the final hypothesis is made:

H6: A multi-item measurement outperforms a single-item measurement in predicting retention.

2.7 Effect of industry characteristics

The kind of industry has an essential effect on retention, regardless of the score on customer friendliness. This effect may arise because customers have different criteria and expectations per industry. Other external factors such as competitiveness, switching costs, and product visibility also play a role but are unaccounted for in this study (Ou, Verhoef, and Wiesel, 2014). Despite the findings of de Haan et al. (2015), who found differences in the predictive performance of CFMs per industry, and the generalization made by Gupta and Zeithaml (2006), “The strength of the satisfaction-profitability link varies across industries”, this study assumes that industry has a main

3 Methodology

3.1 Data

As mentioned before, the data originates from SAMR, a Dutch market research agency that collects customer data on Dutch companies in twelve industries. The dataset includes a total of 21,756 observations from 14,092 respondents from 2012 until 2016, for 220 different companies. The number of observations is larger than the number of respondents because a respondent can participate in multiple years. First of all, companies with fewer than 100 ratings were deleted from the dataset, because retention can hardly be measured when few observations are available. This can happen, for example, when a company was only recently included in the survey. The deletion resulted in 190 remaining companies.

3.1.1 Creating the dependent variable

Secondly, the dependent variable of this study, retention, has to be created. Again, this variable is measured as follows: retention is noted (as 1) when a respondent is a customer in year t and t+1. Thus a respondent has to be a client in a consecutive year to receive a ‘1’ for retention. An example can be found in figure 3. In the example, the observation of 2013 for customer one is missing. Therefore the retention of 2012 cannot be measured and is set to NA. Customer three does not have a consecutive year at all, and thus it is impossible to measure retention.

Figure 3: Creating the Retention variable

Customer  Year  Company X  Company Y  Retention X  Retention Y
1         2012  1          1          NA           NA
1         2014  1          0          1            0
1         2015  1          1          1            0
1         2016  1          0          NA           NA
2         2013  1          0          0            0
2         2014  0          1          NA           NA
3         2015  1          1          NA           NA

3.1.2 Deletion of observations

After creating the dependent variable, all respondents who participated only once were deleted because retention cannot be measured for these respondents. For this reason, 9,190 respondents were removed, and this resulted in a dataset of 4,902 respondents who participated at least twice, for a total of 12,327 observations. Finally, all observations without a consecutive year (t+1), including all observations from 2016, were deleted. This was done because retention cannot be measured for an observation without a consecutive year (e.g., if there is no data for 2017, retention for 2016 cannot be measured).
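The construction of the retention variable and the subsequent deletion step can be sketched on the Company X column of the Figure 3 example. This is a minimal illustration, not the code used in the thesis; the function name add_retention is hypothetical, and None stands in for NA.

```python
# Rows from the Figure 3 example: (customer, year, is_customer_of_X)
observations = [
    (1, 2012, 1), (1, 2014, 1), (1, 2015, 1), (1, 2016, 1),
    (2, 2013, 1), (2, 2014, 0), (3, 2015, 1),
]

def add_retention(rows):
    """Attach retention for year t; None (NA) when year t+1 was not observed."""
    lookup = {(cust, year): flag for cust, year, flag in rows}
    result = []
    for cust, year, flag in rows:
        next_flag = lookup.get((cust, year + 1))
        if next_flag is None:
            retention = None  # no consecutive year, so retention is unknown
        else:
            retention = 1 if flag == 1 and next_flag == 1 else 0
        result.append((cust, year, flag, retention))
    return result

with_retention = add_retention(observations)
# Keep only observations for which retention could be measured.
measurable = [row for row in with_retention if row[3] is not None]
```

On this example, only the 2014 and 2015 observations of customer one and the 2013 observation of customer two remain, which corresponds to the rows kept in the worked example.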

consecutive year) is missing. The observation for 2016 is deleted for the same reason. The single observation of customer three is also deleted since this respondent only participated once.

Figure 4: Deleting observations without a consecutive year

Customer  Year  Company X  Company Y  Retention X  Retention Y
1         2012  1          1          NA           NA
1         2014  1          0          1            0
1         2015  1          1          1            0
1         2016  1          0          NA           NA
2         2013  1          0          0            0
2         2014  0          1          NA           NA
3         2015  1          1          NA           NA

Customer  Year  Company X  Company Y  Retention X  Retention Y
1         2014  1          0          1            0
1         2015  1          1          1            0
2         2013  1          0          0            0

3.1.3 Data aggregation

As can be seen in figure 4, Company X and Y are separated in different columns. However, to analyze the data correctly, the companies should be aggregated in the same columns. Finally, any observations with company ‘0’ should be deleted, because retention cannot be measured if the respondent has never visited the company in the first place. Also, no customer friendliness scores were given if the company has a ‘0’. Figure 5 shows that ID four and six are deleted.

Figure 5: Aggregation and deletion of company = 0

Customer  Year  Company X  Company Y  Retention X  Retention Y
1         2014  1          0          1            0
1         2015  1          1          1            0
2         2013  1          0          0            0

ID  Customer  Year  Company  Retention
1   1         2014  1        1          Company X
2   1         2015  1        1          Company X
3   2         2013  1        0          Company X
4   1         2014  0        0          Company Y
5   1         2015  1        0          Company Y
6   2         2013  0        0          Company Y

ID  Customer  Year  Company  Retention
1   1         2014  1        1
2   1         2015  1        1
3   2         2013  1        0
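The aggregation step can be sketched as a reshaping from one column pair per company to one row per customer, year and company. The sketch below uses the rows of the Figure 5 example; the dictionary keys and the function name to_long are hypothetical and chosen for illustration only.

```python
# Rows after the deletion step: one column pair per company.
wide_rows = [
    {"customer": 1, "year": 2014, "X": 1, "Y": 0, "ret_X": 1, "ret_Y": 0},
    {"customer": 1, "year": 2015, "X": 1, "Y": 1, "ret_X": 1, "ret_Y": 0},
    {"customer": 2, "year": 2013, "X": 1, "Y": 0, "ret_X": 0, "ret_Y": 0},
]

def to_long(rows, companies=("X", "Y")):
    """Stack the company columns and number the stacked rows with an ID."""
    long_rows = []
    for company in companies:
        for row in rows:
            long_rows.append({
                "id": len(long_rows) + 1,
                "customer": row["customer"],
                "year": row["year"],
                "company": row[company],
                "retention": row["ret_" + company],
            })
    return long_rows

stacked = to_long(wide_rows)
# Drop rows where the respondent was not a customer of the company.
customers_only = [r for r in stacked if r["company"] == 1]
remaining_ids = [r["id"] for r in customers_only]
```

Applied to the example, the stacked table has six rows, and filtering on company = 1 removes ID four and six, as described in the text.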


3.1.4 Deletion of explanatory variables

In the questionnaire, respondents are asked whether they visited a particular company, if answered with ‘yes’: the respondent rates the company on eight aspects of customer friendliness. However, not all eight questions were useful. First of all, the last question is deleted because it is only available for 2016, which is not relevant in predicting retention since there is no data for 2017. The sixth question is available for all years except for 2012, which means it is not missing at random (MNAR). In this case, valuable information is lost and there exists no universal method of imputing MNAR data accurately (Donders et al., 2006). Therefore question six is also excluded in the analysis.

Finally, there are scattered missing scores because the survey offered an ‘I don’t know’ option in the 2014 and 2015 measurements. These values are missing at random (MAR) rather than missing completely at random (MCAR). Therefore, simple techniques such as complete-case analysis, available-case analysis, or mean imputation may give biased results (Donders et al., 2006). Thus, a more sophisticated method is used to impute the missing data, which is explained in section 3.1.6.

3.1.5 Deletion of missing values

A total of 52,535 values are missing out of all 428,659 scores (61,237 observations * 7 variables) on friendliness, which is equal to 12.25 percent of the ratings. After deleting Q6, the number of missing values was reduced to 34,961 (9.51%) out of 367,422 (61,237 observations * 6 variables). Deletion of all observations with missing values would be a possible solution if five percent or less of the cases were missing (Schafer, 1997). In the case of this dataset, too much information would be lost, which may result in bias. However, the dataset still contains observations that lack more than half of the scores on customer friendliness. Logically, these observations lose their predictive value. Furthermore, the imputation of the missing values would be less accurate if more than half of the variables are missing (e.g., the imputation model would then base the imputation of four variables on the values of only two variables). Therefore, the following rule is applied: every observation with NA > 2 is deleted.
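As an illustration of this rule, the following minimal Python sketch applies it to the example rows shown in figure 6. It is not the code used in the study; the variable names are assumptions, and `None` stands in for NA.

```python
MAX_MISSING = 2  # rule from the text: delete observations with NA > 2

# Example rows from figure 6: scores for Q1, Q2, Q3, Q4, Q5, Q7 keyed by ID.
observations = {
    1: [8, 8, 9, 10, 7, 8],
    2: [8, None, 7, 6, None, 9],
    3: [4, 6, 6, 7, 8, 6],
    5: [None, 6, 7, None, None, 8],
}


def count_missing(scores):
    """Number of missing (None) scores in one observation."""
    return sum(score is None for score in scores)


kept = {
    obs_id: scores
    for obs_id, scores in observations.items()
    if count_missing(scores) <= MAX_MISSING
}
print(sorted(kept))
```

Observation 2 (two missing scores) is kept for imputation, whereas observation 5 (three missing scores) is removed.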


Figure 6:

Deletion for NA >2

ID Customer Year Company Retention Q1 Q2 Q3 Q4 Q5 Q7

1 1 2014 1 1 8 8 9 10 7 8

2 1 2015 1 1 8 NA 7 6 NA 9

3 2 2013 1 0 4 6 6 7 8 6

5 1 2015 1 0 NA 6 7 NA NA 8

ID Customer Year Company Retention Q1 Q2 Q3 Q4 Q5 Q7

1 1 2014 1 1 8 8 9 10 7 8

2 1 2015 1 1 8 NA 7 6 NA 9

3 2 2013 1 0 4 6 6 7 8 6

3.1.6 Imputation of missing values

The Multivariate Imputation by Chained Equations package (mice) in R is used to impute the remaining missing values (Buuren & Groothuis-Oudshoorn, 2011). At first, it is important to see which method the mice package picks as default on the particular dataset. In this case, “pmm”, short for predictive mean matching (Little, 1988), was chosen by the package and will be applied.

Predictive mean matching is one of the most popular algorithms for imputing continuous variables. The method can preserve non-linear relations (Buuren & Groothuis-Oudshoorn, 2011), which might be useful when testing the hypothesis on non-linear relations. The method uses a regression model to pick the five observed cases closest to the missing value by Euclidean distance and places them in a donor pool. Finally, one of the five values in the donor pool is selected at random and imputed for the missing value.
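The donor-pool idea can be sketched as follows. This is a deliberately simplified, single-predictor illustration in Python and not the algorithm as implemented in the mice package (which, among other things, also draws the regression parameters at random). The function names and the example scores are assumptions made for the illustration.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def pmm_impute(xs, ys, donors=5, rng=None):
    """Fill None entries of ys by predictive mean matching on predictor xs."""
    rng = rng or random.Random(0)
    observed = [(x, y) for x, y in zip(xs, ys) if y is not None]
    intercept, slope = fit_line([x for x, _ in observed], [y for _, y in observed])

    def predict(x):
        return intercept + slope * x

    completed = []
    for x, y in zip(xs, ys):
        if y is not None:
            completed.append(y)
            continue
        target = predict(x)
        # Donor pool: observed cases whose predicted value is closest to the target.
        pool = sorted(observed, key=lambda pair: abs(predict(pair[0]) - target))[:donors]
        completed.append(rng.choice(pool)[1])  # impute a value that was actually observed
    return completed


# Hypothetical scores: q7 is fully observed, q2 has one missing value.
q7 = [8, 9, 7, 6, 8, 4, 7, 9, 6, 8]
q2 = [8, 9, 7, 6, None, 5, 7, 10, 6, 8]
imputed = pmm_impute(q7, q2)
print(imputed)
```

Because the imputed value is always drawn from observed scores, the imputation cannot produce grades outside the original scale.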

3.1.7 Self-selection bias


3.2 Descriptive analysis

After cleaning the data, the most important variables are analyzed and presented in table 3. What stands out at first sight is that the average retention rate is 39.35 percent, which is relatively low compared to other studies; de Haan et al. (2015), for example, found an average retention rate of 73.1 percent over a more extended period of two years. The low retention rate might arise because this study accounts for the self-selection bias, which was not the case in the study of de Haan et al. (2015). Another explanation might be that in their study the retention question was asked in a second survey and was clearly focused on retention, whereas in this study the retention variable is derived from questions such as “have you visited supermarket X in the past two months?” or “are you a customer at bank X”, which are less focused on retention. Respondents may thus be less primed to think of retention actively.

What also stands out is that the average scores on the customer friendliness questions all range between 6.77 and 7.86, which is in line with the Dutch grading culture (Nuffic, 2009). The age of the participants ranges from 18 to 93, with an average of 53.09, which is rather old. The dataset includes slightly more females (54.83%) than males (45.17%). Furthermore, some provinces are over-represented: more participants are from the south-western part of the Netherlands, also called the ‘Randstad’. Finally, lower educated respondents are less represented (21.53%) than medium (38.26%) and highly educated (40.21%) respondents. Education is a self-stated variable, where the respondents themselves indicated their highest level of education.

Table 3:

Overview of descriptives per industry

Concerning the industries, there are considerable differences in the number of observations and, more importantly, in retention rates. As shown in table 3, the retention rates range from 57.84 percent in banking to 20.53 percent in retail, which points to a main effect of industry. The number of observations also differs, with the lowest being 1,717 and the highest being 9,834.

Industry Observations(n) Retention Retention (%) Q1 Q2 Q3 Q4 Q5 Q7


3.3 Logistic regression

To test the first hypothesis, a binomial logistic regression is used, which is part of the family of generalized linear models. This type of regression does not assume a linear relationship between the independent variables and the dependent variable itself; instead, it assumes a linear relationship between any continuous independent variable and the logit transformation of the dependent variable. Logistic regression is the obvious choice since the dependent variable is binary (1 = retention, 0 = churn), so the predicted probability cannot exceed 1 or drop below 0. The model is given by the following formula:

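$$\ln\left(\frac{P}{1-P}\right) = \beta_0 + \beta_1 \mathit{Q1} + \beta_2 \mathit{Q2} + \beta_3 \mathit{Q3} + \beta_4 \mathit{Q4} + \beta_5 \mathit{Q5} + \beta_7 \mathit{Q7} + \beta_8 \mathit{Age} + \beta_9 \mathit{Gender} + \beta_{10} \mathit{Education}_e + \beta_{11} \mathit{Urbanity}_u + \beta_{12} \mathit{Industry}_i$$

where: i = Industry (1-12), e = Education (1-3), u = Urbanity (1-5)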

The independent variables are allowed to be correlated in a logistic regression. However, too much multicollinearity can lead to a poorly performing model (Leeflang, Wieringa, Bijmolt, and Pauwels, 2015, p.138), which might be the case here since some dimensions of friendliness, mentioned in paragraph 2.3, are likely to overlap. In case of multicollinearity, Principal Components Analysis (PCA) and Factor Analysis (FA) can be used to combine any number of related predictors into fewer unrelated components.

The second and third hypotheses can both be tested with the same binomial logistic regression. The difference, however, is that the explanatory variables measuring customer friendliness are now binary (e.g., for Top-2-Box: 1 = a score of 9-10, 0 = a score of 1-8). Using the same model structure also ensures that the three models are comparable to each other. It is important to note that if some explanatory variables are changed in the full-scale model (e.g., due to multicollinearity), the same changes should also be made in the other two models; otherwise, any comparison would be impossible. The formula remains the same, and thus Q1 through Q7 are included, but the scales of these variables are converted into a Top-2-Box scale and a Top-3-Box scale.
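The recoding into top-box variables can be sketched as follows. This minimal Python illustration assumes that Top-3-Box covers the scores 8-10, by analogy with the Top-2-Box definition above; the function name `top_box` and the example scores are assumptions.

```python
def top_box(score, box_size):
    """Return 1 if a 1-10 score falls in the top `box_size` grades, else 0."""
    return 1 if score > 10 - box_size else 0


scores = [4, 7, 8, 9, 10]                    # hypothetical full-scale answers
top2 = [top_box(s, 2) for s in scores]       # 9-10 are coded as 1
top3 = [top_box(s, 3) for s in scores]       # 8-10 are coded as 1 (assumed definition)
print(top2, top3)
```

The only score that is coded differently by the two scales is an eight, which is a common grade in the Dutch grading culture.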


Table 4:

Correlation matrices

The final hypothesis can be tested with the same model used for hypotheses one to three, adjusted to include only one explanatory variable for customer friendliness. Again, only Q7 is included, based on the same reasoning as mentioned above: it is the most general question measuring customer friendliness.

The models contain several control variables. First, education is measured on a 3-point scale (low, medium, high) and is self-stated by respondents. Second, urbanity is also self-stated and is measured on a 5-point scale (from not urban at all to very urban).

3.4 Multicollinearity

Multicollinearity refers to the situation in which several explanatory variables are correlated with each other (Leeflang et al., 2015). This problem is expected since the six predictors all measure a part of customer friendliness. The first evidence of multicollinearity was that Q1 had a negative effect on retention in the logistic regression model, whereas a positive effect was expected. When the same model was run with only Q1 and several control variables, the effect was indeed positive. This indicates that some questions concerning friendliness are highly correlated with one another. In this paragraph, several tests and solutions for multicollinearity are employed. Only the explanatory variables concerning customer friendliness are included.

3.4.1 Correlation matrix

The first step in testing for multicollinearity is the correlation matrix. The Pearson correlation coefficient is used to quantify the degree to which two variables are related to each other (Pearson, 1895). It is calculated as the covariance of the two variables divided by the product of their standard deviations. This method is most suitable since the questions concerning customer friendliness are all measured on an interval level. The test results in correlations ranging from 0.65 up to 0.84 for the full-scale variables, indicating both strong (0.6-0.8) and very strong (0.8-1.0) positive correlations (e.g., a high score on Q2 tends to go together with a high score on Q7). Both the Top-2-Box and Top-3-Box variables show somewhat lower correlations; this may be caused by the transformation of the variables from interval to binary. Still, all correlations are rather high and significant (p < 0.001), which provides further evidence of multicollinearity. Table 4 provides an overview of the correlations.
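For reference, the calculation of the Pearson correlation coefficient described above can be written out as a short Python function. The sketch is only meant to make the definition explicit; the example scores are hypothetical and not taken from the dataset.

```python
from math import sqrt


def pearson(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    # The 1/n (or 1/(n-1)) factors cancel between numerator and denominator.
    return cov / sqrt(var_x * var_y)


# Hypothetical scores of six respondents on two friendliness questions.
q2_scores = [8, 6, 7, 9, 5, 8]
q7_scores = [8, 6, 8, 9, 6, 7]
print(round(pearson(q2_scores, q7_scores), 2))
```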

Full-scale Top-2-Box Top-3-Box


3.4.2 VIF

Another approach to test for multicollinearity is the variance inflation factor (VIF). This test regresses each explanatory variable on the other explanatory variables, which results in an R² per variable. The R² is applied in the following formula: VIF = 1/(1 − R²). The results can be found in table 5; the scores range between 2.16 and 5.22. Again, the VIF scores for the Top-2-Box and Top-3-Box variables are somewhat lower. Different rules of thumb are used by practitioners and researchers. The most commonly used cut-off points are 5 for correlation and 10 for high correlation (Hocking and Pendleton, 1983; Craney and Surles, 2002). This indicates that Q7, in this case, is correlated with the other variables, which can also be seen in the correlation matrix, as Q7 has correlations > 0.8 with all variables except Q1.

In conclusion, the VIF scores are lower than expected, especially when taking into account the correlation matrix. However, because the correlations in table 4 are all rather high and significant, Principal Components Analysis and Factor Analysis will be used to reduce the number of correlated explanatory variables.
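The relation between the VIF and the underlying R² can be made explicit with a small Python sketch. It only applies the formula given above; the function names are assumptions, and the value 5.222902 is the full-scale VIF of Q7 reported in table 5.

```python
def vif(r_squared):
    """Variance inflation factor for a predictor with the given auxiliary R²."""
    return 1.0 / (1.0 - r_squared)


def implied_r_squared(vif_value):
    """Invert the VIF formula to recover the auxiliary R²."""
    return 1.0 - 1.0 / vif_value


q7_r_squared = implied_r_squared(5.222902)  # full-scale VIF of Q7 from table 5
print(round(q7_r_squared, 3))
```

Under this formula, the VIF of Q7 implies that roughly 81 percent of the variance in Q7 is explained by the other five friendliness questions.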

Table 5:

VIF scores on Retention

3.4.3 Principal Components Analysis

Before reducing the number of correlated variables, the ideal number of components should be determined. This is done using a Principal Components Analysis (PCA). For the full-scale variables, the first component explains 79.58 percent of the variance, whereas the other five components each explain only between two and six percent. This indicates that one component is sufficient. For Top-2-Box, the first component explains 76.78 percent, and for Top-3-Box it explains 73.59 percent. Again, explanatory power decreases when moving to the Top-2-Box and Top-3-Box models, but the first component still explains enough. The tables of the PCA can be found in Appendix I.

The scree plot (Cattell, 1966) in Appendix II also indicates that only one component should be used, since the rule of thumb is: ‘stop before the scree’. This is in line with the eigenvalue rule of Kaiser (1960), which states that only components with an eigenvalue higher than one should be selected.
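The two selection rules can be illustrated with a short Python sketch. The eigenvalues below are hypothetical: they are chosen only so that they sum to six (the number of standardized variables) and so that the first component reproduces the reported share of 79.58 percent. The actual eigenvalues are reported in Appendix I.

```python
def explained_share(eigenvalues):
    """Proportion of total variance explained by each component."""
    total = sum(eigenvalues)
    return [value / total for value in eigenvalues]


def kaiser_count(eigenvalues):
    """Number of components with an eigenvalue above one (Kaiser, 1960)."""
    return sum(value > 1.0 for value in eigenvalues)


# Hypothetical eigenvalues for six standardized variables (illustration only).
eigenvalues = [4.7748, 0.36, 0.30, 0.24, 0.19, 0.1352]
shares = explained_share(eigenvalues)
print([round(share, 4) for share in shares], kaiser_count(eigenvalues))
```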

Q1 Q2 Q3 Q4 Q5 Q7

Full-scale 2.333827 3.572698 4.191087 4.173634 4.096231 5.222902

Top-2-Box 2.471521 3.052385 3.184469 3.442708 3.130701 3.628379


3.4.4 Factor Analysis

When conducting a Factor Analysis, both one and two components would be possible (p < 0.001). However, when using two components, the loadings of a single variable on both components are quite high (> .4), as can be seen in table 6. Because of these substantial loadings of one variable on two components and the evidence from the PCA, all six variables will be transformed into one component.

Table 6:

Factor analysis – comparing number of components

The factor analysis assigns different weights to the six variables before combining them into one new explanatory variable (see table 7). As a result, one variable can have a larger impact than another. For example, 84.64 percent of the variance of Q7 is accounted for in the new component (0.92² = 0.8464), whereas this percentage is lower for Q1, namely 59.29 percent. Due to the different loadings, the model accounts for the differences in importance (and effect) while creating the new variable.
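The variance percentages follow directly from squaring the loadings. The following Python sketch reproduces the full-scale row of table 7 from the reported loadings; the variable names are assumptions.

```python
# Full-scale loadings on the single component, as reported in table 7.
loadings = {"Q1": 0.77, "Q2": 0.85, "Q3": 0.89, "Q4": 0.89, "Q5": 0.89, "Q7": 0.92}

# Share of each question's variance captured by the component: loading squared.
variance_explained = {
    question: round(loading ** 2 * 100, 2) for question, loading in loadings.items()
}
print(variance_explained)
```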

Table 7:

Factor analysis - loadings and variance explained

With the help of the factor analysis, the six questions are combined into a new variable measuring customer friendliness, called ‘CF_Combined’. This variable is displayed in table 8 for the three different scaling methods. All means are set to zero, and an increase in any of the six variables has its own unique effect on the value of CF_Combined. Because the effects all differ in weight, the scale is no longer evenly spaced as it was before.

Table 8: Descriptives of CF_Combined

Values of table 6 (full-scale variables):

                   Q1     Q2     Q3     Q4     Q5     Q7
One component
  Component 1      0.77   0.85   0.89   0.89   0.89   0.92
Two components
  Component 1      0.67   0.45   0.78   0.79   0.80   0.70
  Component 2      0.39   0.89   0.44   0.43   0.41   0.58

Values of table 7 (Load = loading, Var. = variance explained):

             Q1             Q2             Q3             Q4             Q5             Q7
             Load  Var.     Load  Var.     Load  Var.     Load  Var.     Load  Var.     Load  Var.
Full-scale   0.77  59.29%   0.85  72.25%   0.89  79.21%   0.89  79.21%   0.89  79.21%   0.92  84.64%
Top-2-B.     0.79  63.36%   0.84  70.73%   0.86  73.27%   0.87  76.21%   0.85  72.59%   0.88  77.09%
Top-3-B.     0.76  57.91%   0.81  65.45%   0.83  69.56%   0.85  72.08%   0.84  71.06%   0.86  74.13%


3.5 The final models

After combining the explanatory variables to prevent bias due to multicollinearity, the following formula represents the multi-item model used in this study for the full-scale, Top-2-Box, and Top-3-Box versions:

$$\ln\left(\frac{P}{1-P}\right) = \beta_0 + \beta_1 \mathit{CF\_Combined}_j + \beta_2 \mathit{Age} + \beta_3 \mathit{Gender} + \beta_4 \mathit{Education}_e + \beta_5 \mathit{Urbanity}_u + \beta_6 \mathit{Industry}_i$$

where: j = (Full-scale, Top-2-Box, Top-3-Box), i = Industry (1-12), e = Education (1-3), u = Urbanity (1-5)

As mentioned before, a full-scale model with evenly spread steps (1-10) is necessary to test hypotheses four and five. This means that the new variable ‘CF_Combined’ cannot be used, since this variable ranges from -3.93 to 1.61. Instead, the original scores of 1-10 are transformed into dummy variables (e.g., X1-X10). Section 3.3 explains that only Q7 is used in this model because it is the most general question measuring customer friendliness. This reasoning is supported by the fact that Q7 explains the most variance according to the factor analysis (see table 7), which indicates the importance of this variable. Since only one customer friendliness question is included, the multicollinearity issue is also avoided. These minor changes result in the following formula:

$$\ln\left(\frac{P}{1-P}\right) = \beta_0 + \beta_1 \mathit{Q7}_j + \beta_2 \mathit{Age} + \beta_3 \mathit{Gender} + \beta_4 \mathit{Education}_e + \beta_5 \mathit{Urbanity}_u + \beta_6 \mathit{Industry}_i$$

where: j = Grade (1-10), i = Industry (1-12), e = Education (1-3), u = Urbanity (1-5)

Finally, to test the last hypothesis, a single-item model should be estimated and compared with the full-scale multi-item model. Again, Q7 will be included in this model but now in the standard scale version (1-10), instead of a dummy scale. The final model would look as follows:

$$\ln\left(\frac{P}{1-P}\right) = \beta_0 + \beta_1 \mathit{Q7} + \beta_2 \mathit{Age} + \beta_3 \mathit{Gender} + \beta_4 \mathit{Education}_e + \beta_5 \mathit{Urbanity}_u + \beta_6 \mathit{Industry}_i$$
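To make the logit formulation concrete, the following Python sketch converts the linear predictor of the single-item model into a retention probability. It uses the coefficients reported for model (8) in table 10 and assumes a respondent in all reference categories (male, higher educated, reference urbanity level, banking industry); the respondent profile and the function names are assumptions made for the illustration.

```python
from math import exp


def inverse_logit(z):
    """Map a log-odds value to a probability between 0 and 1."""
    return 1.0 / (1.0 + exp(-z))


# Coefficients of model (8) as reported in table 10.
INTERCEPT = -0.3724359
B_Q7 = 0.0559088
B_AGE = 0.0050733
B_INDUSTRY_DIY = -1.3232490


def retention_probability(q7, age, industry_effect=0.0):
    """Predicted retention probability for a respondent in the reference categories."""
    log_odds = INTERCEPT + B_Q7 * q7 + B_AGE * age + industry_effect
    return inverse_logit(log_odds)


# Hypothetical respondent: grade 8 on Q7, 50 years old.
p_banking = retention_probability(q7=8, age=50)
p_diy = retention_probability(q7=8, age=50, industry_effect=B_INDUSTRY_DIY)
print(round(p_banking, 3), round(p_diy, 3))
```

The sketch also shows why the predicted outcome always stays between 0 and 1, regardless of the value of the linear predictor.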


3.6 Model validation

Eight different models are estimated using all 56,137 observations and are presented in tables 9 and 10. Model (1) represents the baseline model with only the control variables included. In model (2) the CFM is included, but industry is still excluded, whereas this is reversed for model (3). Both model (2) and model (3) are created to assess the impact of CF_Combined and to test hypothesis 1. Model (4) is the full-scale multi-item model, which is also used to test hypothesis 1. Model (5) represents the Top-2-Box and model (6) the Top-3-Box version, used to test hypotheses 2 and 3. To test hypotheses 4 and 5, model (7) is created using the dummy variable scale. Finally, to test hypothesis 6, model (8) is built, which includes only one item (question).

First of all, the likelihood ratio test (LRtest) is performed to see whether the models perform better than the baseline model. This test can only be applied to nested models, which is the case here. The results show that all models perform better than the baseline, model (1), indicating that the CFM has predictive power. What stands out is that model (3), with CF_Combined excluded and industry included, performs better according to most metrics than model (2), with CF_Combined included and industry excluded. This shows the main effect of industry, while also demonstrating a weaker impact of the CFM.
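The likelihood ratio test can be sketched as follows in Python. The sketch covers only the case of one additional parameter (one degree of freedom), which applies to the comparison of model (2) with model (1); the function names are assumptions, and the statistic 52.301 is the value reported in table 9.

```python
from math import erfc, sqrt


def lr_statistic(loglik_restricted, loglik_full):
    """Likelihood ratio statistic for two nested models."""
    return 2.0 * (loglik_full - loglik_restricted)


def chi2_sf_df1(x):
    """Upper-tail probability of a chi-square variable with one degree of freedom."""
    return erfc(sqrt(x / 2.0))


# Model (2) adds one parameter (CF_Combined) to the baseline model (1).
p_value = chi2_sf_df1(52.301)  # LR statistic reported in table 9
print(p_value < 0.001)
```

For comparisons that add more than one parameter (e.g., the eleven industry dummies), the chi-square distribution with the corresponding number of degrees of freedom would be needed instead.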

For model comparison, one can look at the Akaike information criterion (AIC) and the Bayesian information criterion (BIC), which can be used to compare models that are not nested. This allows a comparison between models (4) to (8). Both are in-sample fit measures. Both criteria penalize models for having too many parameters; this penalty is more substantial for the BIC (Akaike, 1998; Schwarz, 1978). According to the AIC, the dummy model (model (7)) performs best. However, model (6) (the Top-3-Box model) performs best according to the BIC.
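The difference in penalty between the two criteria can be illustrated with a short Python sketch. The number of parameters (k = 21 for model (4)) and the estimation sample size (n = 39,296, i.e., roughly seventy percent of 56,137) are assumptions made for this illustration; the log-likelihood is the value implied by the reported AIC of model (4) under that assumption.

```python
from math import log


def aic(loglik, k):
    """Akaike information criterion: 2k - 2 ln(L)."""
    return 2 * k - 2 * loglik


def bic(loglik, k, n):
    """Bayesian information criterion: k ln(n) - 2 ln(L)."""
    return k * log(n) - 2 * loglik


K_MODEL_4 = 21       # assumed parameter count: intercept + 20 coefficients
N_ESTIMATION = 39296  # assumed estimation sample size

# Log-likelihood implied by the reported AIC of model (4), given the assumed k.
loglik_model_4 = (2 * K_MODEL_4 - 50505.69) / 2

aic_model_4 = aic(loglik_model_4, K_MODEL_4)
bic_model_4 = bic(loglik_model_4, K_MODEL_4, N_ESTIMATION)
print(round(aic_model_4, 2), round(bic_model_4, 2))
```

Under these assumptions, the gap between BIC and AIC equals k(ln n − 2), which is why the dummy model with its nine grade parameters is penalized more heavily by the BIC than by the AIC.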

To test the predictive validity, several out-of-sample criteria are used. To measure these criteria, the dataset is split: seventy percent of the data is used to estimate the models, whereas the remaining thirty percent is used for validation. The latter part of the data was not available when estimating the models and is thus out-of-sample. The top decile lift (TDL), receiver operating characteristic (ROC), hit rate, and misclassification error are reported as out-of-sample criteria in tables 9 and 10.

Table 9: Output of model 1-6

Models: (1) Baseline, (2) Baseline + CFM, (3) Baseline + Industry, (4) Full-scale Multi, (5) Top-2-Box, (6) Top-3-Box

** p < 0.001, * p < 0.01, . p < 0.05; a single hyphen marks a variable that is not included in the model

                         (1)            (2)            (3)            (4)            (5)            (6)
Intercept                -0.8042723 **  -0.7766837 **  -0.0123131     0.055073       0.022129       0.0495183
CF_Combined              -              0.0645374 **   -              0.102739 **    0.072511 **    0.1025698 **
Age                      0.0066744 **   0.0062797 **   0.0055890 **   0.004971 **    0.005214 **    0.0049306 **
Gender: Female           0.0008089      -0.0055081     0.0470810 .    0.039299       0.046284 .     0.0422147
Education: Low           -0.0598688 .   -0.0705102 .   -0.0679200 .   -0.084654 *    -0.080825 *    -0.0805459 *
Education: Medium        0.0231333      0.0184515      0.0216859      0.014917       0.017163       0.0183894
Urbanity: not at all     -0.0211669     -0.0227236     -0.0332544     -0.036278      -0.034730      -0.0408586
Urbanity: strong         0.0420520      0.0421568      0.0492016      0.048735       0.047073       0.0466875
Urbanity: little         0.0524184      0.0507697      0.0481653      0.045102       0.046580       0.0437425
Urbanity: really strong  -0.0275885     -0.0234190     -0.0215570     -0.015528      -0.020593      -0.0176187
Industry: DIY            -              -              -1.2897148 **  -1.330432 **   -1.304786 **   -1.3221274 **
Industry: Drugstores     -              -              -0.7273562 **  -0.757913 **   -0.731988 **   -0.7466302 **
Industry: Electronics    -              -              -1.5345251 **  -1.558985 **   -1.545478 **   -1.5527052 **
Industry: Energy         -              -              -0.2909254 **  -0.295290 **   -0.288272 **   -0.2893681 **
Industry: Insurance      -              -              -0.6010938 **  -0.621060 **   -0.609353 **   -0.6165579 **
Industry: Leisure        -              -              -1.5014662 **  -1.519930 **   -1.512703 **   -1.5226001 **
Industry: Online         -              -              -1.1226072 **  -1.159167 **   -1.139401 **   -1.1565868 **
Industry: Retail         -              -              -1.6537678 **  -1.690257 **   -1.663786 **   -1.6828784 **
Industry: Retail Other   -              -              -1.0360272 **  -1.072436 **   -1.048310 **   -1.0625664 **
Industry: Supermarkets   -              -              -0.3480094 **  -0.388123 **   -0.366004 **   -0.3804636 **
Industry: Telecom        -              -              -0.5416933 **  -0.541187 **   -0.541834 **   -0.5416436 **
AIC                      52604.78       52570.37       50589.55       50505.69       50547.72       50500.76
BIC                      52681.99       52656.16       50761.12       50685.85       50727.88       50680.91
LRtest (Baseline)        -              52.301 **      2857.9 **      2984.7 **      2926.7 **      2980.5 **
Top decile lift          1.083          1.116          1.459          1.445          1.442          1.438
ROC                      0.5141         0.5232         0.6288         0.6325         0.6312         0.6315
Hit rate                 60.49%         60.49%         62.2%          62.46%         62.35%         62.43%
Misclass. error          0.3951         0.3951         0.3781         0.3754         0.3765         0.3757

Table 10: Output of model 7-8

** p < 0.001, * p < 0.01, . p < 0.05; a single hyphen marks a variable that is not included in the model

                            Dummy model (7)   Full-scale Single (8)
Intercept                   -0.260069 .       -0.3724359 **
Customer friendliness (Q7)  -                 0.0559088 **
Grade: 2                    0.144151          -
Grade: 3                    0.234692          -
Grade: 4                    0.185879          -
Grade: 5                    0.142216          -
Grade: 6                    0.163368          -
Grade: 7                    0.242190 .        -
Grade: 8                    0.408916 **       -
Grade: 9                    0.408884 **       -
Grade: 10                   0.416367 **       -
Age                         0.004982 **       0.0050733 **
Gender: Female              0.039547          0.0395236
Education: Low              -0.079437 *       -0.0831900 *
Education: Medium           0.017215          0.0144271
Urbanity: not at all        -0.039383         -0.0358218
Urbanity: strong            0.050506          0.0502373
Urbanity: little            0.046192          0.0467741
Urbanity: really strong     -0.015174         -0.0150376
Industry: DIY               -1.323271 **      -1.3232490 **
Industry: Drugstores        -0.750818 **      -0.7518109 **
Industry: Electronics       -1.552153 **      -1.5520750 **
Industry: Energy            -0.287165 **      -0.2928498 **
Industry: Insurance         -0.614354 **      -0.6169100 **
Industry: Leisure           -1.523446 **      -1.5207054 **
Industry: Online            -1.160835 **      -1.1571529 **
Industry: Retail            -1.682292 **      -1.6809569 **
Industry: Retail Other      -1.061957 **      -1.0631659 **
Industry: Supermarkets      -0.375908 **      -0.3769318 **
Industry: Telecom           -0.537599 **      -0.5389785 **
AIC                         50500.52          50510.6
BIC                         50749.3           50690.76
LRtest (Baseline)           2144.3 **         2118.2 **
Top decile lift             1.441             1.453
ROC                         0.6315            0.6328
Hit rate                    62.54%            62.55%
Misclass. error             0.3746            0.3745
R² McFadden                 0.0420205         0.0415250
R² Cox and Snell            0.0547486         0.0541209
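Two of the out-of-sample criteria reported in tables 9 and 10, the hit rate and the top decile lift, can be sketched as follows in Python. The classification threshold of 0.5 and the twenty validation cases are assumptions made for the illustration; the study itself does not state the threshold that was used.

```python
def hit_rate(probabilities, outcomes, threshold=0.5):
    """Share of validation cases whose predicted class matches the observed outcome."""
    hits = sum((p >= threshold) == bool(y) for p, y in zip(probabilities, outcomes))
    return hits / len(outcomes)


def top_decile_lift(probabilities, outcomes):
    """Retention rate in the 10% highest-scored cases relative to the overall rate."""
    ranked = sorted(zip(probabilities, outcomes), key=lambda pair: pair[0], reverse=True)
    top_n = max(1, len(ranked) // 10)
    top_rate = sum(outcome for _, outcome in ranked[:top_n]) / top_n
    overall_rate = sum(outcomes) / len(outcomes)
    return top_rate / overall_rate


# Hypothetical validation sample: predicted probabilities and observed retention.
probabilities = [0.9, 0.8, 0.7, 0.7, 0.6, 0.6, 0.5, 0.5, 0.4, 0.4,
                 0.4, 0.3, 0.3, 0.3, 0.2, 0.2, 0.2, 0.1, 0.1, 0.1]
outcomes = [1, 1, 1, 0, 1, 0, 1, 0, 0, 1,
            0, 0, 0, 1, 0, 0, 0, 0, 0, 0]

validation_hit_rate = hit_rate(probabilities, outcomes)
misclassification_error = 1 - validation_hit_rate
lift = top_decile_lift(probabilities, outcomes)
print(validation_hit_rate, misclassification_error, round(lift, 3))
```

The misclassification error is simply one minus the hit rate, which is also visible in the reported figures (e.g., 62.55% and 0.3745 for model (8)).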


Figure 7:

Out-of-sample comparison

Figure 8:

Pseudo R-square comparison

4 Results

All hypotheses are tested in this section. An overview of the supported and rejected hypotheses can be found at the end of section 4, in table 15.

4.1 Main results

To test the first hypothesis, and see whether the CFM has a positive effect on retention, models (2) and (3) are compared with model (1). The first support for this hypothesis is the result of the LRtest performed in section 3.6, which showed that all models performed significantly better than the baseline model. Another option is to compare the performance of models (1) through (3) according to the out-of-sample criteria, which can be found in figure 7.

The charts in figure 7 illustrate that the improvement from including CF_Combined is considerably smaller than the improvement from adding industry instead. A clear increase in ROC and hit rate, and a decrease in misclassification error, is found between model (2) and model (3). This indicates that the CFM has a positive effect, since the criteria do change, but the effect is rather small in comparison with the industry effect. The three pseudo R-squared measures support this and can be found in figure 8.

[Figure 7: out-of-sample criteria (ROC, hit rate, misclassification error) and top decile lift for the baseline, CF_Combined, and industry models]

[Figure 8: pseudo R-squared (McFadden, Cox and Snell, Nagelkerke)]

The effect of customer friendliness on retention is present but seems rather small. To assess this relationship further, the odds ratios (the exponentiated coefficients) are calculated. The odds ratios for model (4), in which both the CFM and industry are included, are used because this allows the difference in impact to be compared. The figures can be found in table 11.

Unfortunately, the interpretation of the odds ratios comes with an issue. For example, the odds ratio of 1.0049 for age indicates that a one-unit increase in age increases the odds of retention by 0.49 percent. However, due to the adjustments made by the factor analysis to account for multicollinearity in section 3.4.4, the CF_Combined variable is no longer measured in whole units, which makes it hard to interpret. Therefore, the odds ratios of model (8), the full-scale model with only Q7 included, are used. The odds ratios for all variables are alike, except for the customer friendliness variable. This seems reasonable because the scale changed from a range of -3.935853 to 1.618753 (see table 8) to a regular one-to-ten scale. The latter scale has more units, and thus the odds ratio per unit is logically lower.

Table 11:

Estimations and Odds Ratios

The results of model (8) show that every one-unit increase in customer friendliness increases the odds of retention by 5.75 percent, which is a reasonable effect. The effect of industry, however, is much larger (e.g., the odds of retention in the DIY industry are 73.37 percent lower than in the banking industry). In conclusion, the findings demonstrate a significant positive effect of customer friendliness on retention. This effect, however, is small in comparison with the main effect of industry. Nevertheless, the findings all support H1 “Customer friendliness has a positive effect on retention.”
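The percentages mentioned above follow from exponentiating the coefficients of model (8) in table 10. A minimal Python sketch of this conversion is given below; the function name is an assumption.

```python
from math import exp


def percent_change_in_odds(coefficient):
    """Percentage change in the odds of retention for a one-unit increase."""
    return (exp(coefficient) - 1.0) * 100.0


# Coefficients of model (8) as reported in table 10.
q7_effect = percent_change_in_odds(0.0559088)     # customer friendliness (Q7)
diy_effect = percent_change_in_odds(-1.3232490)   # DIY versus the banking reference
print(round(q7_effect, 2), round(diy_effect, 2))
```

Note that these percentages describe changes in the odds P/(1 − P), not in the retention probability itself.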

Full-scale Multi (4) Full-scale Single (8)


4.2 Improvement by non-linear scaling

The performance of the full-scale model (4) is compared with the Top-2-Box model (5) to test hypothesis 2. The results show that the latter model does not outperform the full-scale model. The Top-2-Box model performs worse on both in-sample and out-of-sample criteria. Therefore, it can be concluded that adjusting the scale does not improve the model, and H2 “Transformation into a Top-2-Box scale results in better predictability of retention” is rejected.

When testing hypothesis 3, the performance of the Top-2-Box model (5) is compared with the performance of the Top-3-Box model (6), which is created to account for the Dutch grading culture (Nuffic, 2009). Overall, an increase in performance is found, and the Top-3-Box model outperforms the Top-2-Box model according to all metrics except the TDL. Because of these findings, it can be concluded that H3 “A Top-3-Box scale outperforms the Top-2-Box scale in predicting retention” is supported by the data. When comparing the Top-3-Box model with the full-scale model (4), one can see that the validation methods are inconclusive about which model performs best. Therefore, no definite winner can be picked between the two.

4.3 Non-linear effects

To measure the effect of customer friendliness more precisely and to test hypothesis 4, the odds ratios of model (7) are analyzed; they can be found in table 12. With the odds ratios, one can see which increase in grade has the most substantial impact on retention. The odds ratios are all relative to the base score of one and can be interpreted as follows: when a customer rates company X with a score of eight, the odds of retention are 50.51 percent higher ((1.5051 − 1) × 100) than with a rating of one.

Table 12:

Odds ratios per score in friendliness, model (7)⁴

The effects of the scores from two to six are rather flat, with the exception of a score of three. However, these effects do not differ significantly from the effect of a score of one, as can be seen from the estimates in table 10. The effect picks up when scores reach a seven: it increases and differs from the baseline (p < 0.05). The most prominent improvement is found between seven and eight; after this, the effect levels off for the scores nine and ten. The top three scores all differ significantly from the baseline (p < 0.001). Respondents who rate company X with a seven have 27.4 percent higher odds of retention than respondents who give a rating of one. This increase is 50.51 percent for both a score of eight and a score of nine, and 51.64 percent for a score of ten. This trend is illustrated in figure 7.

Score (base=1) 2 3 4 5 6 7 8 9 10

To support the pattern depicted in figure 7, every score should be compared with the score directly below it (e.g., the effect of a score of eight is compared with the effect of a score of seven, and the effect of a seven with the effect of a six). The results can be found in table 13. These findings indeed show that the difference between a six and a seven is significant (p < 0.05); the same holds for the difference between a seven and an eight (p < 0.001). Again, the odds ratios can be interpreted. When a score increases from a six to a seven, the odds of retention are 8.20 percent higher. The odds grow by 18.14 percent for an increase from a seven to an eight. The results also show that the increase levels off, since there is no significant difference between an eight and a nine, or between a nine and a ten. Likewise, no differences in effect are found among the scores one through six.
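The step-by-step odds ratios can be derived from the grade coefficients of model (7) in table 10, as the following Python sketch shows. It reproduces only the point estimates; the significance levels in table 13 require re-estimating the model with a different reference score and are not covered here. The function name is an assumption.

```python
from math import exp

# Grade coefficients of model (7) from table 10; a score of one is the reference.
grade_coefficients = {
    1: 0.0, 2: 0.144151, 3: 0.234692, 4: 0.185879, 5: 0.142216,
    6: 0.163368, 7: 0.242190, 8: 0.408916, 9: 0.408884, 10: 0.416367,
}


def step_percent_change(coefficients, score):
    """Percentage change in the odds of retention when moving from score - 1 to score."""
    return (exp(coefficients[score] - coefficients[score - 1]) - 1.0) * 100.0


steps = {score: step_percent_change(grade_coefficients, score) for score in range(2, 11)}
largest_step = max(steps, key=steps.get)
print({score: round(change, 2) for score, change in steps.items()}, largest_step)
```

With the reported coefficients, the step from a seven to an eight yields the largest increase, in line with the comparison described above.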

Table 13:

Comparison per increase in score

** p < 0.001, * p < 0.01, . p < 0.05

In conclusion, the findings support H4: “The effect of customer friendliness on retention is non-linear”, since the effect differs across score increases, although not all increases are significantly different from one another. Hypothesis 5, “A score increasing from five to six on customer friendliness has the largest effect on retention”, can now also be assessed with the information provided by table 13. Here one can see that the biggest effect is located between a seven and an eight. Therefore, the hypothesis is rejected.

[Figure: Impact on retention per score (x-axis: scores 2 to 10; y-axis: impact on retention in percent, relative to a score of one)]


4.4 Multi vs. single item measurement

When comparing the multi-item (4) and single-item (8) models, one can conclude that the latter performs slightly better than the former according to the hit rate, misclassification error, ROC, and TDL, but worse according to the AIC, BIC, and pseudo R-squareds. The single-item model scores better on the out-of-sample criteria and is thus slightly better at predicting retention. The multi-item model, on the other hand, scores better on the in-sample tests: according to the AIC and BIC, it fits the estimation sample slightly better while obtaining almost the same prediction scores as the single-item model. Due to these contradicting findings, no conclusion can be drawn as to which model performs best. Thus, H6 “A multi-item measurement outperforms a single-item measurement in predicting retention” is rejected.

Table 14:

Comparison of model 4 and 8

4.5 Additional findings

Several of the control variables had a significant effect on retention. No hypotheses were formulated beforehand, and thus all these findings are additional. All models found that age has a positive impact (p < 0.001) on retention. This relationship is supported by the study of Karani & Fraccastoro (2010), who found that elderly consumers are more likely to repurchase (retain) and actively resist brand switching once a favorite brand has been established. Also, a negative effect (p < 0.01) is found for lower educated respondents, compared to the base level of respondents who finished higher education. The results suggest that lower educated consumers are less likely to be retained. This finding is supported by studies which found that brand loyalty increases with social class (Chance & French, 1972; Carman, 1970). On the other hand, this result contradicts the findings of Dash, Schiffman, and Berenson (1976), who found that consumers in higher social classes are less loyal because the perceived risk decreases as social class increases.

Again analyzing the odds ratios (see table 11), it can be concluded that a one-year increase in age results in approximately 0.5 percent higher odds of retention (odds ratios = 1.0049, 1.0050). For lower educated respondents, the odds of retention are roughly 8 percent lower (odds ratios = 0.9188, 0.9201) than for higher educated respondents.

| | Full-scale Multi (4) | Full-scale Single (8) |
|---|---|---|
| AIC | 50505.69 | 50510.6 |
| BIC | 50685.85 | 50690.76 |
| Top decile lift | 1.445 | 1.453 |
| ROC | 0.6325 | 0.6328 |
| Hit rate | 62.46% | 62.55% |
| Misclass. error | 0.3754 | 0.3745 |

In conclusion, an overview of supported and rejected hypotheses can be found in table 15.

Table 15: Hypotheses and conclusions

| Hypotheses | Supported | Remarks |
|---|---|---|
| H1: Customer friendliness has a positive effect on retention | | |
| H2: Transformation into a Top-2-Box scale results in better predictability of retention | | The model performed worse |
| H3: A Top-3-Box scale outperforms the Top-2-Box scale in predicting retention | | |
| H4: The effect of customer friendliness on retention is non-linear | | No effect between 1 to 6; effects are found between 6 to 8; no effect between 8 to 10 |
| H5: A score increasing from five to six on customer friendliness has the largest impact on retention | | The biggest effect was found from 7 to 8 |
| H6: A multi-item measurement outperforms a single-item measurement in predicting retention | | |

5 Conclusion

Different CFMs have been studied in the last decades (de Haan et al., 2015; Gupta & Zeithaml, 2006) and various CFMs such as the NPS and CES have been introduced (Dixon et al., 2010; Reichheld, 2003). Whereas some studies question the effects of different CFMs on future performance (Morgan & Rego, 2006), most researchers agree that CFMs can predict future business performance to some extent (de Haan et al., 2015; Rego et al., 2013; Morgan & Rego, 2006; Rust & Zahorik, 1993). This study contributes to the existing literature in several ways. First, it adds results for a new and less researched CFM, namely customer friendliness. Secondly, the study focuses on the use of different scaling methods to account for non-linear effects (van Doorn and Verhoef, 2008; Oliver et al., 1997), such as Top-2-Box, which has been shown to improve the prediction of future performance (Morgan & Rego, 2006). A third scaling method, Top-3-Box, is also studied to account for the way Dutch participants give grades on questionnaires (Nuffic, 2009). Thirdly, the studied CFM employs a multi-item measurement, which is criticized by past research (Ittner and Larcker, 1998). Therefore, this study investigates whether a multi-item measurement increases predictive performance, and which of the different items has the most explanatory power. Finally, the difference in impact of a one-score increase in customer friendliness on retention is studied.
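To make the scaling methods concrete, the sketch below recodes a score on the one-to-ten scale into a Top-2-Box and a Top-3-Box indicator. It assumes the conventional definition in which the Top-2-Box covers the scores 9 and 10 and the Top-3-Box covers 8, 9, and 10; the function name `top_box` and the example scores are hypothetical and not taken from the study.

```python
def top_box(score, n_boxes, scale_max=10):
    """Return 1 if the score is among the n_boxes highest scale points, else 0.

    Assumes an integer scale that ends at scale_max (10 in this example).
    """
    return 1 if score > scale_max - n_boxes else 0


# Hypothetical customer friendliness scores on the full scale.
scores = [4, 6, 7, 8, 9, 10]

top2 = [top_box(s, 2) for s in scores]  # 1 only for scores 9 and 10
top3 = [top_box(s, 3) for s in scores]  # 1 for scores 8, 9 and 10
```

Under this recoding, a score of 8 counts as a top score on the Top-3-Box scale but not on the Top-2-Box scale, which is the only difference between the two transformations.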

First of all, the main question of this study, “To which extent is it possible to predict retention using a customer feedback metric measuring customer friendliness?”, is answered. The results show that customer friendliness has a positive effect on customer retention. A one-point increase in score results in a 5.75 percent increase in the chance of retention. However, several validation methods show that the impact of customer friendliness is limited, as the model performs only slightly better than the model without the variable. Nevertheless, all results indicate that, to some extent, it is possible to predict retention using customer friendliness.

The second question that arose was “Does a non-linear scale outperform the full-scale in predicting retention?”. To test this, the full-scale from one to ten was transformed into a Top-2-Box scale, and
