The key drivers of customer loyalty in a B2B manufacturing setting

Academic year: 2021

Master Thesis Marketing Intelligence and Marketing Management
University of Groningen
Faculty of Economics and Business
Department Marketing
PO Box 800, 9700 AV Groningen (NL)

Tessa Dekker
Saffierstraat 116, 9743LL Groningen
(+31)6 57984026
T.Dekker.3@student.rug.nl
s2488930

First supervisor: Dr. H. Risselada (H.Risselada@rug.nl)
Second supervisor: Dr. J. T. Bouma (J.T.Bouma@rug.nl)
External supervisors: Mark Julsing (M.Julsing@Company X.nl), Nicole Verhoeven (N.Verhoeven@Company X.nl)

Abstract

In this paper, I study the main drivers of customer loyalty in a Business-to-Business (B2B) manufacturing setting by relying on switching cost theory. Customer loyalty has a significant influence on business performance and is, therefore, recognized as a central point in marketing. This study focuses on the effects of customer-firm relationship characteristics (depth, length and breadth) and the effect of bundle completeness (the extent to which all necessary parts are included in a certain bundle) on customer loyalty. So far, these effects have mainly been investigated in the Business-to-Consumer (B2C) setting. As the effects of the antecedents of customer loyalty might differ considerably in the B2B manufacturing setting, it is necessary to further investigate the effects of relationship characteristics on customer loyalty.

Hence, an empirical study was performed using panel data of a large Dutch educational publishing company, Company X. A binary logistic regression was estimated and evaluated by comparing its performance to that of a classification tree and a random forest. The results of this study show that relationship length and breadth have a positive effect on customer loyalty. In addition, organization size was found to have a direct negative effect on customer loyalty and to moderate the effect of bundle completeness on customer loyalty. Further, the region where a customer is located was found to affect customer loyalty both directly and indirectly, by influencing the effect of bundle completeness on customer loyalty. Finally, the logistic regression in combination with the random forest approach was found to provide the most useful insights.

Managers operating in a B2B manufacturing industry are recommended to focus on relatively new customers and larger organizations, and to motivate customers to buy across product categories. Key account managers can play a crucial role by assuring a close relationship with the customer and by knowing the needs of the customer, which helps to deliver the optimal service. Future research might use a multilevel model to investigate the effects of the hierarchical structure in the data.

Acknowledgements

Table of contents

Abstract
Acknowledgements
Table of contents
1. Introduction
2. Theoretical framework
2.1 Industry characteristics
2.1.1 Type of customer
2.1.2 Number of alternatives
2.1.3 Customer budget types
2.1.4 Short market description
2.2 Switching cost theory
2.3 Relationship characteristics
2.3.1 Relationship depth
2.3.2 Relationship length
2.3.3 Relationship breadth
2.4 Bundle completeness
2.5 Moderator effects
2.5.1 Organization size
2.5.2 Geographical distance
2.6 Research model
3. Data
3.1 Unit of analysis
3.2 Dataset preparation
3.2.1 Transforming variables
3.2.2 Merging and subsetting datasets
4.1 Sampling
4.2 Correlation checks
4.3 Modelling methods
4.4 Estimation parameters
4.5 Performance measures
4.6 Model specification
5. Results: analysis and empirical discussion
5.1 Model selection logistic regression
5.2 Main drivers of customer churning
5.3 Model performance
5.4 Comparing methods
5.4.1 Classification tree
5.4.2 Random forest
6. Discussion
7. Limitations and future research
References
Appendices
Appendix 1: Key market characteristics
1.1 Market structure
Appendix 2: Missing values
2.1 Matrix of the pattern of the missing values
Appendix 3: Descriptive statistics
3.1 Histogram relationship length
3.2 Histogram organization size
Appendix 4: Steps in statistical program R

1. Introduction

Customer loyalty has a significant influence on business performance and is therefore recognized as a central point in marketing (Lam et al. 2004). The impact of customer loyalty on performance originates from the costs of losing existing customers and the extra revenues gained from retaining customers (Maicas, Redondo, and Olivan 2006). First, a customer switching to a competitor's brand directly leads to a loss of revenue as sales decrease. Additionally, losing customers will lead to a rise in acquisition costs, since the company needs to attract new customers to make up for the decrease in its customer base (Risselada, Verhoef, and Bijmolt 2010). Second, the increased resistance to competitors' products and the lower price sensitivity of loyal customers lead to additional firm revenues (Maicas, Redondo, and Olivan 2006). Therefore, it is not surprising that companies invest heavily in retaining customers.

The previously mentioned effects of the relationship characteristics and bundling on customer loyalty have mainly been investigated in the B2C setting. For Business-to-Business (B2B) firms, however, loyalty is at least as important as, if not more important than, for B2C firms due to the limited number of customers (Lam et al. 2004). In addition, the B2B setting differs considerably from the B2C setting regarding Customer Relationship Management (CRM) due to long-term relationships and more rational decision-making processes (Lam et al. 2004; Zhang, Netzer, and Ansari 2014). Strong relationships are found to be more effective in business markets than in consumer markets, as relationships are considered more critical there (Palmatier et al. 2006; Zhang, Netzer, and Ansari 2014). As the effects of the antecedents of customer loyalty might differ considerably, it is necessary to further investigate the effects of relationship characteristics on customer loyalty.

characteristics, and therefore the moderating effects, might differ considerably in the B2B setting due to previously mentioned reasons. Since it is probable that there are moderator effects caused by different customer characteristics, it is necessary to further investigate this in a B2B setting. This research will investigate the moderating effects of different types of customer characteristics, including organizational size and geographical distance.

This brings us to the following three key issues that will be further investigated in this research: (1) What is the influence of relationship depth, length and breadth on customer loyalty? (2) What is the influence of bundle type on customer loyalty? (3) What is the influence of customer characteristics as moderators on the mentioned relationships?

To answer these questions, an empirical study was performed using panel data of a large Dutch educational publishing company, Company X. To test the hypotheses, a binary logistic regression model was specified, estimated and validated to provide useful insights. To evaluate the performance of the logistic regression and to gain more insights, its performance was compared to that of the traditional classification tree method and the random forest technique.

The results of this study show that relationship length and breadth have a positive effect on customer loyalty. In addition, organization size was found to have a direct negative effect on customer loyalty and to moderate the effect of bundle completeness on customer loyalty. Further, the region where a customer is located was found to affect customer loyalty both directly and indirectly, by moderating the effect of bundle completeness on customer loyalty. Finally, the logistic regression in combination with the random forest approach was found to provide the most useful insights.

Besides the theoretical implications, this study contributes to managerial decision-making by highlighting the mechanisms underlying customer loyalty. Managers operating in a similar B2B industry with limited alternatives and inflexible budgets will be able to increase customer retention by focusing mainly on relatively new customers and larger organizations, and by motivating customers to buy across product categories. Key account managers can play a crucial role by assuring a close relationship with the customer and by knowing the needs of the customer, which helps to deliver the optimal service.

2. Theoretical framework

Customer relationship management takes a central role in today's marketing practice. Strong long-term customer relationships are found to be profitable for companies (Chiou and Droge 2006). Part of the explanation lies in the declining costs of attracting new customers and the increased likelihood of positive Word-of-Mouth (WOM) by existing customers, which together lead to an increase in the acquisition of new customers. Besides declining costs, CRM also leads to increasing revenues. This is partially due to the lower price sensitivity of loyal customers. Moreover, loyal customers tend to be more open to adopting new products and are more resistant to competitors' offerings (Maicas, Redondo, and Olivan 2006). In this research, customer loyalty is defined as “a deeply held commitment to rebuy or repatronize a preferred product/service consistently in the future” (Oliver 1999). A key distinction regarding the measurement of loyalty can be made between attitudinal and behavioural loyalty: attitudinal loyalty refers to repurchase intentions, whereas behavioural loyalty refers to the actual behaviour of repurchasing (Dick and Basu 1994; Lam et al. 2004; Leenheer et al. 2007; Maicas, Redondo, and Olivan 2006). As this research focuses on the repeated purchase of a specific brand, or in other terms the actual purchase behaviour, the dependent variable loyalty refers to behavioural loyalty.

The remainder of this section will outline the theoretical background of the research and the development of the hypotheses. First, different industry characteristics will be summarized to provide information on the relevance of the study. Thereafter, the effect of the relationship characteristics of depth, length and breadth on loyalty will be explained. Then the relationship between bundle type and loyalty will be illustrated, followed by a clarification of the effect of different moderators. Finally, a proposed research model will be provided.

2.1 Industry characteristics

budgets are limited. Although this research will not compare different industries, a more detailed discussion of the characteristics of industries will be outlined in order to show how this research is different from previous research and in what way new insights can be gathered for a certain type of industry.

2.1.1 Type of customer

To start with, a distinction should be made between the B2B industry and the B2C industry. Relationships differ considerably in a B2B setting: due to a limited number of customers, more personal professional interaction, a more rational customer decision-making process and multi-person decision-making, deeper and closer relationships are developed (Lam et al. 2004; Zhang, Netzer, and Ansari 2014). This leads to an increased importance of customer-firm relationships in a B2B setting compared to a B2C setting, as the customer's expectations of a relationship are higher. In addition, Palmatier et al. (2006) found that strong relationships are more effective in business markets than in consumer markets. Due to the limited amount of research on customer loyalty from a B2B perspective, this research will focus on a B2B setting instead of the B2C setting, which suggests a stronger link between the customer-firm relationship and loyalty, as will be discussed in more detail in section 2.2.

Another distinction that should be considered is the type of industry in terms of manufacturing or service. The customer-firm relationship differs considerably between these markets due to the difference in the number of customer touch-points and the difference in the nature of the relationship (Lam et al. 2004; Zhang, Netzer, and Ansari 2014). Previous research found that the effectiveness of relationship management is weaker for product markets than for service markets (Palmatier et al. 2006). Prior studies mainly focused on the effects of the customer-firm relationship on customer loyalty in a service industry (Bolton, Lemon, and Verhoef 2004; Liu and Wu 2007; Maicas, Redondo, and Olivan 2006). As existing literature lacks a focus on manufacturing industries, this research will analyse customer loyalty in a manufacturing industry specifically, which would imply a weaker link between the customer-firm relationship and loyalty.

2.1.2 Number of alternatives

market. Multiple studies consider a market as being perfectly competitive, while it is most likely that an industry is characterized by imperfect competition, being a pure monopoly, an oligopoly or a market with monopolistic competition (McDowell 2012). To zoom in, more detailed attention will be given to the case of an oligopoly. An oligopoly can be recognized by a large number of buyers, with only a few companies serving them with close substitutes (McDowell 2012; Stigler 1964). Numerous industries can be considered an oligopoly, such as the television sector, the oil and steel industry and parts of the education publishing industry (Gallet 2001; Klepper 2002).

Another factor affecting the degree of competition in the market is the presence, or absence, of a second-hand market. If there is no second-hand market, the number of alternatives will be limited, as there is no possibility for the buyer to acquire the same product at a lower price. Although most industries do face a threat of second-hand products, there are several specific markets where that threat is absent. One such market is the Dutch market of educational publishers for secondary schools. As the end-consumers (students) have no need for cheaper versions of their schoolbooks due to the law on free schoolbooks introduced in 2009 by the Dutch government, there is no second-hand book market (Mededingingsautoriteit 2011).

As mentioned, the number of alternatives can have an impact on the research model that will be discussed in this paper. This research will focus on an industry with a limited number of alternatives, as little existing literature has considered this perspective. According to Jones and Sasser (1995), the development of customer loyalty is different for industries with a broad set of alternatives than for industries with a limited set of alternatives. The reason behind this is most likely that it becomes less attractive for customers to switch when the number of attractive alternatives declines. Therefore, it is expected that customers tend to be more loyal in the studied industry and that the effects of the relationship characteristics might be flattened because of this.

2.1.3 Customer budget types

less likely to switch, as it is time-consuming and costly to consider all other options. Again, the example of the Dutch educational publishing industry will be used. In this industry, the government sets a fixed budget that the customer (the school) can spend on school supplies (per student) (Mededingingsautoriteit 2011). This inflexibility not only leads to higher switching barriers, it also leads to a strong reduction in the use of workbooks that have a lifecycle of only one year, which puts extra pressure on competition and therefore on offering attractive product bundles (Mededingingsautoriteit 2011). As most markets do not have this high degree of budget inflexibility, little research considers such types of markets. Therefore, this research involves a market with inflexible budgets to provide additional insights on the topic. In such a market, the relationship between bundle type and customer loyalty is expected to be stronger.

To conclude, this research considers the differences between industries on three key dimensions: type of customer, number of alternatives and budget types. Although the link between the relationship characteristics and customer loyalty has already been investigated to some extent, this research tries to shed new light on it in a different context. This research will investigate the relationships in a B2B manufacturing industry where buyers only have a limited set of alternatives, and in which the customer has a fixed budget. Due to the limited set of alternatives for customers, the effect of the relationship characteristics on customer loyalty is expected to be somewhat flat. In addition, the bundle type-loyalty relationship is expected to be strong due to the customers' fixed budgets. Finally, moderator effects extend the model. Next, a more in-depth explanation of the relationships will be outlined.

2.1.4 Short market description

2.2 Switching cost theory

As mentioned before, the overarching theory used for this research is the theory of customer switching costs. Nowadays, firms are making extensive use of switching barriers, which can be defined as “any factor which makes it difficult or costly for consumers to change providers” (M. A. Jones, Mothersbaugh, and Beatty 2000, p. 259). Here, a provider can be interpreted as the focal firm that provides a certain offering to the customer. Switching barriers are effective tools to decrease the switching likelihood of customers. One type of switching barrier is raising switching costs, which can be defined as the “one-time costs facing the buyer of switching from one supplier's product to another” (M. A. Jones, Mothersbaugh, and Beatty 2000; Porter 1980, p. 10). According to Klemperer (1987), switching costs lead to a decrease in rivalry since customers are less sensitive to competitors' offerings. Julander and Söderlund (2003) identified three types of switching costs: (1) transaction costs, (2) learning costs and (3) artificial costs. Artificial switching costs concern “what the firm does to retain customers” (Julander and Söderlund 2003, p. 5) and are therefore completely controlled by the firm. According to this definition, bundling can be considered an artificial cost, since it is conducted by the firm. Additionally, as mentioned in the introduction, a distinction can be made between financial switching costs and psychological switching costs (Bell, Auh, and Smalley 2005; Lam et al. 2004). As mentioned before, this paper focuses on the psychological type of switching costs. In addition, because firms have total control over artificial switching costs and are therefore better able to influence them, this type of cost is the most interesting to investigate. As a result, this study will use the artificial psychological switching costs as a starting point to outline the antecedents of customer loyalty.

2.3 Relationship characteristics

Here, a distinction can be made between the depth, the length and the breadth of the relationship. Previous research already focussed on these relationship characteristics. However, until now, researchers only included one or two of the three relationship characteristics in their link between customer relationship and loyalty, or focused on a B2C service setting (Kamakura, Ramaswami, and Srivastava 1991; Maicas, Redondo, and Olivan 2006). This research aims at a more integral approach by including all three characteristics of the customer-firm relationship as defined by Bolton et al. (2004). In addition, as explained in section 2.1.1, this research focuses on the B2B manufacturing environment in order to fill a gap in literature. As outlined, the effects of the customer-firm relationship characteristics on customer loyalty are expected to be stronger due to the B2B setting. Nevertheless, this increase in strength might be flattened, as this study involves the manufacturing sector instead of the previously studied service industry. The expected effects of relationship depth, length and breadth on loyalty are explained and outlined below.

2.3.1 Relationship depth

Relationship depth is defined as “the deepening of the customer's relationship with the firm through increased usage or upgrading” (Bolton, Lemon, and Verhoef 2004, p. 274). Bolton et al. (2004) found that relationship depth had a significant positive impact on revenues and therefore on customer lifetime value. Furthermore, Maicas et al. (2006) found that relationship depth decreased the likelihood of customer switching. So, relationship depth is expected to increase the likelihood of customer repurchase, or in other words customer loyalty. Upgrading has been defined as a characteristic of relationship depth (Bolton, Lemon, and Verhoef 2004; Venkatesan and Kumar 2004). Upgrading “involves the increase of order volume either by the sales of more units of the same purchased item, or the upgrading into a more expensive version of the purchased item” (Kamakura 2008, p. 42). According to Venkatesan and Kumar (2004), upgrading can be considered a switching cost. Since it is found that switching costs increase customer retention, it is expected that upgrading will positively influence customer loyalty (M. A. Jones, Mothersbaugh, and Beatty 2000; Lam et al. 2004; Venkatesan and Kumar 2004; Verhoef, Franses, and Hoekstra 2002). This leads to the following hypothesis:

2.3.2 Relationship length

Relationship length represents “the duration of the relationship and customer retention” (Bolton, Lemon, and Verhoef 2004, p. 274). The length of the relationship has been recognized as one of the most important measures of relational behaviour (U. M. Dholakia and Morwitz 2002). Furthermore, a long-term relationship is associated with a higher level of inertia, which indicates a rising loyalty (Bell, Auh, and Smalley 2005). In addition, a direct link between relationship length and loyalty has been found in previous research. This link indicates that a lengthy customer-firm relationship leads to an increase in customer loyalty (Bolton, Lemon, and Verhoef 2004; Maicas, Redondo, and Olivan 2006). Therefore, it is expected to find a positive effect of relationship length on customer loyalty, leading to the following hypothesis:

H2: The length of the customer relationship has a positive effect on customer loyalty.

2.3.3 Relationship breadth

Relationship breadth is associated with “the number of additional (different) products or services purchased from a company over time” (Bolton, Lemon, and Verhoef 2004, p. 273). By buying additional products from the same firm, a customer tends to participate in a long-term relationship with the firm, contributes more to margins and tends to increase the purchase frequency. Combined, this leads to a higher customer profitability (Shah et al. 2012). In this paper, cross-buying refers to the customer buying products across different sections, or in other words, across different product categories. Additionally, it is found that relationship breadth, or in other terms cross-buying, leads to customer retention, since relationship breadth can be considered a switching barrier for customers (Kamakura, Ramaswami, and Srivastava 1991). Therefore, it is expected to find a positive relationship between cross-buying and customer loyalty, as shown in the hypothesis below.

H3: Relationship breadth has a positive effect on customer loyalty.

2.4 Bundle completeness

a negative switching barrier, which is found to increase customer repurchase intentions (M. A. Jones, Mothersbaugh, and Beatty 2000; Julander and Söderlund 2003). According to previous research, bundling is an effective tool to increase customer loyalty (Hamilton and Koukova 2008; Harris and Blair 2006; Popkowski Leszczyc and Häubl 2010).

Nowadays, firms are using product bundles extensively. Although bundling might seem similar to the concept of cross-buying, there is a substantial difference in the purchase process. In case of cross-buying, a customer purchases additional products next to their initial purchase. This indicates that a customer acts multiple times to acquire the wanted products (Shah et al. 2012; Venkatesan and Kumar 2004). In case of bundling, a customer purchases multiple products of the company at once, meaning that it costs less effort and time to acquire all products. To see the different effects on customer loyalty, it is essential to include both relations in this research. Although the effectiveness of the general concept of bundling has already been investigated, it is not yet clear what effect different types of bundles have on loyalty. Therefore, this research tries to extend the literature on bundling by investigating the effects of bundle completeness on customer loyalty. According to the Cambridge Dictionary (2018), completeness can be defined as “the quality of being whole or perfect and having nothing missing.” Therefore, in this article, bundle completeness indicates to what extent a certain bundle includes all necessary parts for a customer. Companies often offer different types of bundles to their customers, from a bundle only including the basic needs to more extensive bundles with multiple extra elements included (Venkatesh and Mahajan 1997). For example, a company might acquire a bundle including a theoretical book and an exercise book for a certain training of its employees. However, it would be more convenient (and as such, the bundle would be more complete) if the bundle also included an answer sheet in order to make the training more effective.

Although current literature already focussed on the type of elements within a bundle (Sheng and Pan 2009; Stremersch and Tellis 2002; Wang, Sun, and Keh 2013), the effect of bundle completeness on customer loyalty has not yet been investigated. It can be assumed that a higher degree of completeness of a bundle leads to a higher perceived switching barrier for the customer. Therefore, it is expected that the completeness of a bundle positively affects customer loyalty, leading to the following hypothesis.

2.5 Moderator effects

Besides the main relationships, it is valuable to take a step further by considering the role of moderators in the model. There are several aspects that might influence the mentioned relationship between bundle type and customer loyalty. Previous research recognized the need to include customer characteristics as influencers of the development of customer loyalty (Cooil et al. 2007). Nevertheless, that research focused on a B2C setting. As business customers differ considerably from individual consumers (Lam et al. 2004; Zhang, Netzer, and Ansari 2014), it is of high relevance to show the effects of these differences by including them in a study focusing on the B2B context. Several characteristics will be included in this research, namely size of the buyer’s business and the geographical region in which it is located.

2.5.1 Organization size

First, the size of the buyer's organization will be considered as a moderator of the relationship between bundle type and customer loyalty. Rajamma, Zolfagharian and Pelton (2011) emphasized the importance of further investigating the effect of size on the development of loyalty in a B2B setting. In addition, it is interesting to include size as a moderator since the size of an organization strongly affects the decision-making process. According to Dholakia et al. (1993), the decision time of an acquisition will increase as the size of an organization increases. As the decision time rises, the organization is more likely to reconsider all available options in more detail. In addition, the number of people involved in the decision-making process often increases when size increases. This implies that multiple people need to be loyal to the organization to create real customer loyalty as defined in this paper. Due to the in-depth consideration of alternatives and the increased number of people involved in the process, the relationship between bundle type and customer loyalty is expected to be weakened by organization size. This leads to the hypotheses formulated below.

H5a: The customer organization size will weaken the relationship between bundle completeness and customer loyalty.

H5b: The customer organization size will negatively influence customer loyalty.

2.5.2 Geographical distance

It will be investigated whether the buyer's location will affect the positive effect of bundle completeness and the customer-firm relationship on customer loyalty. This moderator is exclusively relevant in a B2B context as the relationship between buyer and seller is more personal than in a B2C setting. A high level of trust and relational certainty between the account manager of the selling company and the buyer is of great importance as it strongly influences the customer-firm relationship (Palmatier et al. 2006). Furthermore, geographical distance is closely related to a high level of relational uncertainty (Knobloch and Solomon 1999). In other words, an increase in geographical distance between a buyer and a seller will most likely lead to an increase in relational uncertainty. In addition, regional loyalty has received little attention in B2B marketing literature and refers to a state of having an “affective attitude to the region” (Nicholson, Tsagdis, and Brennan 2013, p. 21). However, detailed quantitative research on the moderating effect of geographic distance on the development of customer loyalty is lacking. Therefore, this research will empirically investigate these effects, and it is expected that geographical distance will negatively moderate the main effects.

H6a: The effect of bundle completeness on customer loyalty will be stronger for the region in which the seller is located than for other regions.

2.6 Research model

The conceptual framework in figure 1 gives a visual representation of the hypotheses formulated previously. To summarize, customer loyalty is expected to be positively influenced by the three customer-firm relationship characteristics, which are relationship depth, length and breadth. In addition, bundle type is expected to have a direct positive effect on customer loyalty. Finally, different customer characteristics, including organization size and geographical distance, are included as moderators, influencing the effect of the relationship characteristics and bundle completeness on customer loyalty.

Figure 1 - A conceptual framework of the antecedents of customer loyalty
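As a reading aid, the hypothesized relationships can be written as a binary logit with interaction terms. This is only a sketch under assumptions: the symbols and coefficient names are illustrative and not the notation of this thesis, region is simplified to a single dummy that equals 1 for the seller's region, and only the moderation of bundle completeness is shown. The model that was actually estimated is the one specified in section 4.6.

```latex
\begin{align}
\Pr(\text{Loyal}_i = 1) &= \frac{1}{1 + e^{-\eta_i}}, \\
\eta_i &= \beta_0 + \beta_1\,\text{Depth}_i + \beta_2\,\text{Length}_i
          + \beta_3\,\text{Breadth}_i + \beta_4\,\text{Bundle}_i \nonumber \\
       &\quad + \beta_5\,\text{Size}_i + \beta_6\,\text{Region}_i
          + \beta_7\,(\text{Bundle}_i \times \text{Size}_i)
          + \beta_8\,(\text{Bundle}_i \times \text{Region}_i)
\end{align}
```

Under this illustrative notation, the expected positive main effects correspond to $\beta_1, \beta_2, \beta_3, \beta_4 > 0$, H5a to $\beta_7 < 0$, H5b to $\beta_5 < 0$, and H6a to $\beta_8 > 0$.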

3. Data

as this market has a switch cycle of four years for schools using a lease contract, and the cycle is usually even longer for schools buying the books (Mededingingsautoriteit 2011). In order to make a sufficient prediction of the antecedents of customer loyalty, it is decided to take a time period of twelve years. A more extensive explanation for this decision is given in the remainder of this section.

3.1 Unit of analysis

As different parties play a role in the decision-making process (as outlined in section 2.1.4), it is difficult but necessary to select the right unit of analysis. As the course teachers are the most powerful in deciding which method to use, and so whether to switch or not, the unit of analysis is set at the course level. Moreover, this decision usually differs per section. A section within a school includes the teaching level, which is either HV (HAVO and VWO) or VMBO (LWOO, VMBO BK and VMBO GT) and it includes the distinction between junior and senior high school classes. Therefore, this study is performed at the course-section level per school. The final dataset contains data of 1049 schools. Due to the course and section level data, there can be multiple observations per school. The number of observations in the final dataset is 8469. The process of coming to the final dataset will be described in the following paragraphs.

3.2 Dataset preparation

3.2.1 Transforming variables

To ensure that the analyses are interpreted correctly, the first step was to check whether the statistical program R recognized all variables as the correct type. The following variables needed to be transformed into categorical variables (called factors in R): methcode, schoolnr, regio and sectie. All other variables were already recognized as the correct type, so no additional transformations were necessary. This process of checking and, when needed, transforming variables has been repeated after the creation of new variables.

3.2.2 Merging and subsetting datasets

removing all unnecessary variables. The variables that have been kept in the final dataset are schoolnr, course, sectie, schoolnaam, region, org_size, rel_length, cross_buyt1, bundle_comt1, bundle_up and churnt. Furthermore, the school with school number 0 immediately stood out because no information was available for it. A check with colleagues confirmed that this school did not exist, and therefore all observations for school number 0 were removed from the dataset. After these small changes, the starting number of observations is 92,725. The most important steps of the merging process are explained next. In order to be able to analyse the data on a course-section level, complete information on the section levels (N=23) was merged to the ‘wissel_data’.

Moreover, multiple method descriptions were missing in the dataset that includes the bundle information, even though a method code was present. A detailed check showed that the dataset contained competitors’ data in addition to the required data on Company X schools. As this research focuses only on the churn rates of customers of Company X, all competitor data was removed from both the bundle and the wissel datasets. A few method descriptions also turned out to be unrelated to the courses included in this research. This research only includes data for the courses Mathematics, Geography and Dutch. These courses are chosen because all three are taught in all sections and in all study years, which avoids messy data. Moreover, as these courses have the largest market share for Company X, there is a sufficient number of observations. Therefore, the observations unrelated to these courses were removed from the ‘wissel_data’ dataset.

Finally, it was decided to keep data for a period of twelve years only. The main reason is that customers with a lease contract are only able to switch once every four years. In addition, the contracts do not have a set starting date for all customers, meaning that the possible switching year of customer A differs from the possible switching year of customer B. In order to analyse the switching behaviour in such a way that all customers had the ability to switch, the data has been divided into three four-year periods: period t ranges from 2014 to 2017, period t-1 from 2010 to 2013 and period t-2 from 2006 to 2009. As the original data ranged from 2005 to 2017, the data was subsetted to keep only the years 2006 to 2017. These steps lead to a wissel dataset with 54,960 observations.
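The division into three four-year periods and the subsetting to 2006-2017 can be illustrated with a minimal Python sketch. This is not the R code used for the thesis; the column name `jaar` and the helper names `assign_period` and `subset_to_window` are assumptions made for illustration only.

```python
# Period boundaries as described in the text (t-2, t-1 and t).
PERIODS = {
    "t-2": (2006, 2009),
    "t-1": (2010, 2013),
    "t": (2014, 2017),
}


def assign_period(year):
    """Return the period label for a year, or None if it lies outside 2006-2017."""
    for label, (start, end) in PERIODS.items():
        if start <= year <= end:
            return label
    return None


def subset_to_window(rows, year_key="jaar"):
    """Keep only rows that fall inside one of the three periods and tag each with its period.

    `year_key` is a hypothetical column name for the booklist year.
    """
    kept = []
    for row in rows:
        period = assign_period(row[year_key])
        if period is not None:
            kept.append({**row, "period": period})
    return kept


# Hypothetical example rows: the 2005 observation is dropped.
rows = [
    {"schoolnr": 1, "jaar": 2005},
    {"schoolnr": 1, "jaar": 2009},
    {"schoolnr": 2, "jaar": 2016},
]
print(subset_to_window(rows))
```

Tagging each row with its period before aggregating keeps the later per-period variables (such as cross-buying in t-1) easy to compute.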

due to the need for a new building, it usually moves to a place near the initial location and stays in the same region. Therefore, this assumption is likely to introduce only a minor bias. Furthermore, in order to attach a region to a location, the location first had to be linked to a province. Province data (N=1,976) from an external source (Eropuitineigenland 2018), which specifies which addresses belong to which province, has been merged to the ‘wissel_data’ dataset (N=54,960).

Further, information on the number of students per school (N=1,153) needed to be included in the data in order to create the variable org_size. This information was available within the company and originates from the organization “Dienst Uitvoering Onderwijs” (DUO), which publishes a list of students per school annually. If available, the 2017 data has been merged to the ‘wissel_data’, as organization size is assumed to be constant over time. Although school sizes do fluctuate, they vary only a little over the years, especially within a twelve-year period. Nevertheless, a small bias might occur due to this assumption. In addition, several organizations that churned before 2017 no longer existed in 2017 due to acquisitions or other reasons. For these schools, size data from the most recent available years was imputed manually.

Next, the data needed extra preparation to be aggregated on a school-course-section level. Therefore, the required variables were created per period and the dataset was aggregated on the school-course-section level. This led to a drastic drop in the number of observations (N=5,511).

Finally, information on the bundles customers use (N=8,222) is needed in order to analyse the effect of bundle completeness and bundle upgrade on churn rates. This bundle data first needed to be prepared before it could be merged to the ‘wissel_data’. The first step was to merge more complete information about the methods to the bundle data, independent of year, school, course and section. After doing so, and after creating the bundle-related variables bundle_comt1 and bundle_up on the right aggregation level, the bundle data were merged to the ‘wissel_data’ dataset, leading to a final number of observations of 5,511.

3.2.3 Variable description

period t (2014-2017). All variables included in the analysis are described in table 1 below.

Table 1 - Variable overview

| Variable | Type | Description | Operationalization *, ** | Notation |
|---|---|---|---|---|
| Churnt | DV | Whether or not a customer churns in period t | IF (Max JAAR_i = 2017, then 0, otherwise 1) | Binary: 0 = no churn, 1 = churn |
| Bundle_up | IV | Whether or not customer i purchases a higher level bundle type in period t compared to period t-1 | IF (bundle_comt_i > bundle_comt1_i, then 1, otherwise 0) | Binary: 0 = no, 1 = yes |
| Rel_length | IV | Sum of years between the first purchase and last purchase of customer i over period t-1 and t-2 | Max JAAR_i − Min JAAR_i | Numeric |
| Cross_buyt1 | IV | Whether or not customer i purchased products for two or more sections in period t-1 | IF (Frequency_i,t-1 > 1, then 1, otherwise 0) | Binary: 0 = no, 1 = yes |
| Bundle_com***: Bundle_comt | IV | The degree to which a product bundle includes all necessary elements of a method in period t | Bundle_k period t: IF (ARRANGEMENT_it = k items, then 1, otherwise 0); Bundle_com period t: IF (Bundle_k_it = 1, then k, otherwise) | Categorical: 1 = single item, 2 = 2 item bundles, 3 = 3 item bundles, 4 = 4 item bundles, 5 = 5 item bundles, 6 = complete bundle |
| Bundle_com***: Bundle_comt1 | IV | The degree to which a product bundle includes all necessary elements of a method in period t-1 | Bundle_k period t-1: IF (ARRANGEMENT_it1 = k items, then 1, otherwise 0); Bundle_com period t-1: IF (Bundle_k_it1 = 1, then k, otherwise) | Categorical: 1 = single item, 2 = 2 item bundles, 3 = 3 item bundles, 4 = 4 item bundles, 5 = 5 item bundles, 6 = complete bundle |
| Org_size**** | IV | Sum of the number of students registered at customer i in 2017 | LLN.lokatie = Organization_size | Numeric |

* Variables are calculated for different time periods: t-2: 2006-2009; t-1: 2010-2013; t: 2014-2017
** i = customer i
*** No data available for t-2
**** Assumed to be constant
***** Based on the NUTS regions as described by Centraal Bureau voor Statistiek (2018)
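To make the operationalizations in table 1 concrete, the following Python sketch derives the binary and numeric variables for a single school-course-section unit. It is an illustration under stated assumptions, not the thesis code: the function name `derive_variables` and its arguments are hypothetical, and the caller decides which purchase years and bundle levels are passed in.

```python
def derive_variables(years, sections_t1, bundle_t, bundle_t1, last_year=2017):
    """Derive churn, rel_length, cross_buy_t1 and bundle_up for one unit.

    years       -- calendar years in which the unit appears in the booklist data
    sections_t1 -- number of sections the customer bought for in period t-1
    bundle_t    -- bundle completeness level (1-6) in period t, or None if missing
    bundle_t1   -- bundle completeness level (1-6) in period t-1, or None if missing
    """
    # Churn: 0 if the unit is still observed in the last year, otherwise 1.
    churn = 0 if max(years) == last_year else 1

    # Relationship length: last observed year minus first observed year.
    rel_length = max(years) - min(years)

    # Cross-buying: purchases for two or more sections in period t-1.
    cross_buy_t1 = 1 if sections_t1 > 1 else 0

    # Bundle upgrade: higher completeness level in t than in t-1 (None if either is missing).
    if bundle_t is None or bundle_t1 is None:
        bundle_up = None
    else:
        bundle_up = 1 if bundle_t > bundle_t1 else 0

    return {
        "churn": churn,
        "rel_length": rel_length,
        "cross_buy_t1": cross_buy_t1,
        "bundle_up": bundle_up,
    }


# Hypothetical unit: still a customer in 2017, two sections in t-1, bundle level 3 -> 4.
print(derive_variables([2008, 2010, 2017], sections_t1=2, bundle_t=4, bundle_t1=3))
```

Returning `None` for `bundle_up` when a bundle level is missing mirrors the observation in section 3.3.1 that bundle_up and bundle_comt1 are missing for the same observations.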


3.3 Data exploration

3.3.1 Missings

In order to examine the proposed hypotheses correctly, a first check on missing data was performed. Besides analysing the number of missing values relative to the total dataset, a matrix of the missing-data patterns has been created to investigate the relation between the missing values (see appendix 2). The variables that include missing values are bundle_comt1, region, org_size and bundle_up. First, for both bundle_comt1 and bundle_up, 0.7% of all data is missing. As can be derived from the missing pattern matrix in appendix 2, these two variables are missing for exactly the same observations. This is to be expected, as bundle_up is derived from the bundle_com variables. Most likely, these missing values result from random reporting issues, and they are therefore assumed to be Missing At Random (MAR). Since the number of missing values is so small relative to the size of the complete dataset, they are unlikely to affect the outcomes of the analyses.

Second, org_size contains 6.6% missing values. These missing values originate from the data collection of DUO. As the data on organization size collected by DUO is collected independent of the variables used in the historical booklist data, the missing values of this variable are independent of the missing values of the other variables in the dataset. Additionally, from the matrix in appendix 2, it can be derived that only 65 missing values are overlapping with the missing values of other variables such as region. The other 303 missing values are independent of any other variable. Therefore, the missing values of org_size are most likely MAR. Although more than five per cent of the data for this variable is missing, it will most likely not have a substantial impact on the analyses as it is MAR.

Third, the variable region contains 2.9% missing values. Although this share is small, these missing values most likely originate from acquisitions or from organizations shutting down at some point in time. Because the “disappearance” of a school from the dataset at a certain point in time is recorded as churning, keeping these organizations could have a substantial impact on the analysis. Therefore, the organizations with missing values for this variable are removed from the dataset by listwise deletion.

data containing missing values is only a minor part of the total data, listwise deletion has been used to handle the missing data, leading to 4,923 observations.
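The two steps of this subsection, computing the share of missing values per variable and applying listwise deletion, can be sketched in Python as follows. The function names and the example rows are hypothetical; missing values are assumed to be stored as `None`.

```python
def missing_share(rows, column):
    """Share of rows in which `column` is missing (stored as None)."""
    missing = sum(1 for row in rows if row.get(column) is None)
    return missing / len(rows)


def listwise_delete(rows, columns):
    """Keep only rows that have a value for every column in `columns`."""
    return [row for row in rows if all(row.get(c) is not None for c in columns)]


# Hypothetical data: one row lacks org_size, one lacks region.
rows = [
    {"region": "W", "org_size": 850, "bundle_comt1": 3},
    {"region": "X", "org_size": None, "bundle_comt1": 6},
    {"region": None, "org_size": 1200, "bundle_comt1": 2},
    {"region": "Z", "org_size": 430, "bundle_comt1": 1},
]
for column in ("region", "org_size", "bundle_comt1"):
    print(column, missing_share(rows, column))

complete = listwise_delete(rows, ["region", "org_size", "bundle_comt1"])
print(len(complete))
```

Listwise deletion removes the whole observation as soon as one of the listed variables is missing, which is why the number of observations drops after this step.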

3.3.2 Outliers

In order to test whether the data contains any outliers, boxplots were created for each numeric variable. Outliers were detected for the variables rel_length and org_size. For rel_length, relationship lengths of seven years or less are flagged as outliers. Although these are plausible values (a length of less than eight years seems quite normal), a dummy variable was created to account for them. This variable, dummy_rel_length, takes the value 1 if the value falls outside the interquartile range and 0 otherwise. According to the dummy variable, 626 observations are considered outliers, which is 12.7% of the total dataset. The variable org_size contained outliers as well. Three schools are considered outliers, with 5,215, 5,250 and 9,135 students. While these values are not impossible, 9,135 students is questionable. A check showed that this school is actually a large overarching organization with multiple different schools. As these outliers may heavily influence the results, a dummy variable was created to flag them. This variable, dummy_org_size, shows that 132 observations were considered outliers. Both dummy variables are included in several models to assess whether the outliers affect the performance of the analyses.
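A boxplot-based outlier dummy of this kind can be sketched in Python as follows. The sketch assumes the conventional boxplot rule, in which values more than 1.5 times the interquartile range below the first or above the third quartile are flagged; the exact rule and quartile method used in the thesis are not stated, so both are assumptions here.

```python
import statistics


def outlier_dummy(values, k=1.5):
    """Return a 0/1 list flagging values beyond the boxplot whiskers.

    Assumes the common rule: outlier if value < Q1 - k*IQR or value > Q3 + k*IQR.
    Quartiles follow the default method of statistics.quantiles.
    """
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [1 if (v < lower or v > upper) else 0 for v in values]


# Hypothetical relationship lengths in years: only the value 2 is flagged.
rel_length = [8, 9, 10, 10, 11, 11, 12, 12, 2]
dummy_rel_length = outlier_dummy(rel_length)
print(dummy_rel_length)
print(sum(dummy_rel_length) / len(rel_length))  # share of flagged observations
```

Including the dummy as an extra regressor, rather than deleting the flagged rows, keeps plausible but unusual observations in the estimation sample.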

3.3.3 Normality checks

3.3.4 Descriptive statistics

After the thorough inspection of the data and the needed steps that had to be performed, two tables with descriptive statistics have been created. Table 2 below shows the descriptive statistics for the numeric variables, while table 3 shows the descriptive statistics for the categorical variables per period (if necessary).

| Variable | Min. | Max. | Mean | Median | SD |
|---|---|---|---|---|---|
| Relationship length | 1 | 1 | 1 | 1 | 1 |
| Organization size (constant)* | 1 | 1 | 1 | 1 | 1 |

Table 2 - Descriptive statistics for numeric variables

The most interesting conclusion that can be drawn from table 2 is that, on average, customers stay with the firm on a long-term basis, with 1 years being the average relationship length compared to a maximum of 1 years for two periods. In addition, the size of the organization has a considerably high standard deviation and a wide range between the minimum and maximum value. Although this might seem odd, speciality schools (e.g. a special school for disabled people) are also included and tend to have a small number of students, while schools in large cities might serve a large number of students. Moreover, section 3.3.2 already concluded that there are outliers for organization size, which will be accounted for in the further analyses.

Table 3 shows that almost two-thirds of the customers did not churn during the observation period, which is in line with the high average relationship length. In addition, over ninety per cent of the customers acquire products of Company X for more than one section. This might indicate that the customers tend to have a high loyalty towards Company X. Moreover, almost one-third of the customers upgraded their bundle in period t compared to period t-1. This might be explained by the increased importance of online products, or by the high product satisfaction of Company X customers (Kien 2017). Further, the distribution of bundle completeness seems to differ considerably between period t and period t-1. While in period t more than 1% of the customers were in possession of a complete bundle, in period t-1 this was around 1%. Although this seems like a major difference in the bundle type customers acquire, the difference might stem from differences in reporting over time: the further back in time, the less detailed the bundle data gets. As a result, a customer might have acquired a complete bundle that was not recognised as such because only a smaller number of products was reported. Therefore, bundle completeness might not completely reflect the actual customer behaviour, leading to a bias.

Finally, most customers are located in region Z of the Netherlands, while the fewest customers are located in region X. Although this might seem odd, it is in line with the population numbers of the regions. As the number of schools is directly correlated with the number of people living in an area, it is not surprising to find the highest numbers for region Z and the lowest numbers for regions X and Y.

4. Methodology

4.1 Sampling

For the dataset, a balanced sample is used in order to estimate and validate the results of the models. The decision to use a balanced sample approach is made based on the findings that balanced samples outperform random samples in terms of estimation (Risselada, Verhoef, and Bijmolt 2010). The sizes of the balanced samples are 1935 for non-churners and 1823 for churners.
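One common way to draw a balanced sample is random undersampling of the larger class. The Python sketch below illustrates that idea only; it equalizes the two groups exactly, whereas the sample sizes reported above differ slightly, so the procedure actually used in this study may have been different. The function name, the `churn` key and the seed are assumptions.

```python
import random


def balanced_sample(rows, label_key="churn", seed=42):
    """Undersample the larger class so that churners and non-churners are equally represented."""
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    churners = [r for r in rows if r[label_key] == 1]
    non_churners = [r for r in rows if r[label_key] == 0]
    n = min(len(churners), len(non_churners))
    sample = rng.sample(churners, n) + rng.sample(non_churners, n)
    rng.shuffle(sample)
    return sample


# Hypothetical data: 3 churners and 7 non-churners give a sample of 3 + 3.
rows = [{"id": i, "churn": 1 if i < 3 else 0} for i in range(10)]
sample = balanced_sample(rows)
print(len(sample), sum(r["churn"] for r in sample))
```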

| Variable | Category | Overall N | Overall % | t (2014-2017) N | t (2014-2017) % | t-1 (2010-2013) N | t-1 (2010-2013) % |
|---|---|---|---|---|---|---|---|
| Churn | Churn | 1 | 1 | | | | |
| | No churn | 1 | 1 | | | | |
| Bundle upgrade | Upgrade | 1 | 1 | | | | |
| | No upgrade | 1 | 1 | | | | |
| Cross-buying | Cross-buy | 1 | 1 | | | | |
| | No cross-buy | 1 | 1 | | | | |
| Bundle completeness* | No bundle | | | 1 | 1 | 1 | 1 |
| | Two item bundle | | | 1 | 1 | 1 | 1 |
| | Three item bundle | | | 1 | 1 | 1 | 1 |
| | Four item bundle | | | 1 | 1 | 1 | 1 |
| | Five item bundle | | | 1 | 1 | 1 | 1 |
| | Complete bundle | | | 1 | 1 | 1 | 1 |
| Region** | X | 1 | 1 | | | | |
| | W | 1 | 1 | | | | |
| | Y | 1 | 1 | | | | |
| | Z | 1 | 1 | | | | |

Table 3 - Descriptive statistics for categorical variables

4.2 Correlation checks

Before the model estimation, correlation checks are performed to determine whether there is multicollinearity among the explanatory variables and to obtain first insights into the correlations between the explanatory variables and the dependent variable. For the initial correlation check, Pearson correlations have been computed; they are reported in table 4 below.

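| | Rel_length | Cross_buyt1 | Bundle_comt1 | Org_size | Region | Churnt |
|---|---|---|---|---|---|---|
| Rel_length | 1 | | | | | |
| Cross_buyt1 | 1 | 1 | | | | |
| Bundle_comt1 | 1 | 1 | 1 | | | |
| Org_size | 1 | 1 | 1 | 1 | | |
| Region | 1 | 1 | 1 | 1 | 1 | |
| Churnt | 1 | 1 | 1 | 1 | 1 | 1 |

Table 4 Pearson correlations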

From table 4, it can be derived that problems with multicollinearity are unlikely to arise during the analyses, as all correlation coefficients between the explanatory variables are below 0.2. In addition, relationship length, cross_buy and region are negatively correlated with churning, while bundle_com and org_size are positively correlated with churning.

To confirm that multicollinearity among the explanatory variables is indeed not an issue, an extra check was performed using the Variance Inflation Factors (VIF) of the explanatory variables included in the final model. The VIF scores are displayed in table 5 below.

| Variable | VIF |
|---|---|
| Rel_length | 1 |
| Cross_buyt1 | 1 |
| Bundle_comt1 | 1 |
| Org_size | 1 |
| Region | 1 |
| Dummy_rel_length | 1 |
| Dummy_org_size | 1 |

Table 5 Variance Inflation Factor (VIF) scores

As can be derived from table 5, no multicollinearity issues were found, as all VIF scores are below the threshold of four. Therefore, no further action is needed to reduce such issues.
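For numeric predictors, the VIF of a variable equals the corresponding diagonal element of the inverse of the predictors' correlation matrix. The Python sketch below uses this identity with plain lists; it is an illustration with hypothetical data and function names, not the R routine used in this study, and it treats every predictor as numeric (categorical variables such as region would need dummy coding or a generalized VIF).

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting.

    Fails with a division by zero if the matrix is singular (perfect collinearity).
    """
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vif_scores(columns):
    """Return {name: VIF} for a dict mapping predictor names to numeric lists."""
    names = list(columns)
    corr = [[pearson(columns[a], columns[b]) for b in names] for a in names]
    inverse = invert(corr)
    return {name: inverse[i][i] for i, name in enumerate(names)}


# Hypothetical predictors with a correlation of 0.8.
data = {"rel_length": [1, 2, 3, 4, 5], "org_size": [2, 1, 4, 3, 5]}
print(vif_scores(data))
```

With only two predictors the result reduces to 1 / (1 - r²), which gives a quick sanity check for the sketch.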

4.3 Modelling methods

the logistic regression model generates these utilities by using the probabilities and defines the binary choice of a customer to churn or not (as defined in equation 1).

[1] $$Y_{i,t} = \begin{cases} 1 & \text{if customer } i \text{ churns (in period } t\text{)} \\ 0 & \text{if customer } i \text{ does not churn (in period } t\text{)} \end{cases}$$

The latent utilities are defined as shown in equation 2.

[2] $$U_i = \alpha + x_i'\beta + \varepsilon_i$$

The latent utility and the observed choice are linked as shown in equation 3.

[3] $$Y_i = \begin{cases} 0 & \text{if } U_i \le 0 \\ 1 & \text{if } U_i > 0 \end{cases}$$

The probability that $Y_i$ equals 1, i.e. that customer i churns, follows the logistic cumulative distribution function (CDF), which is provided in equation 4.

[4] $$P[Y_i = 1] = \frac{\exp(U_i)}{1 + \exp(U_i)}$$

For a more detailed explanation of the binary logistic regression, including more specific information on model specification, estimation and validation, I refer the reader to the book “Modeling Markets: Analyzing Marketing Phenomena and Improving Marketing Decision-making” (Leeflang et al. 2015).
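As a small numerical illustration of equations 2 and 4, the Python sketch below computes the churn probability from the systematic part of the utility (the error term is left out). The coefficient values and the function names are hypothetical and are not estimates from this study.

```python
import math


def logistic(u):
    """Logistic CDF exp(u) / (1 + exp(u)), written to avoid overflow for large |u|."""
    if u >= 0:
        return 1.0 / (1.0 + math.exp(-u))
    e = math.exp(u)
    return e / (1.0 + e)


def churn_probability(alpha, betas, x):
    """P[Y = 1] for one customer, using the systematic utility alpha + x'beta."""
    utility = alpha + sum(b * v for b, v in zip(betas, x))
    return logistic(utility)


# Hypothetical coefficients for rel_length and cross_buy (illustration only).
alpha, betas = 1.0, [-0.2, -0.4]
customer = [8, 1]  # 8 years of relationship, cross-buyer
print(churn_probability(alpha, betas, customer))
```

A utility of zero corresponds to a probability of exactly 0.5, which matches the threshold in equation 3.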

Although the binary logistic regression model is commonly used in marketing literature for binary analyses, other methods exist to examine the effects of the independent variables on the dependent variable (Risselada, Verhoef, and Bijmolt 2010, p. 198). Numerous studies included more than one model in order to assess and compare the performance of the different models and obtain the best results (e.g. Coussement and Van den Poel 2008; Larivière and Van den Poel 2005; Risselada, Verhoef, and Bijmolt 2010; Wang, Sun, and Keh 2013). The findings on which method performs best are mixed. In order to assure a high-quality analysis and to provide additional insights into which model obtains the best results, three additional approaches to analyse the hypotheses are included.

method for splitting as this method allows for multiway splitting and performs well for relatively large datasets with categorical variables (Antipov and Pokryshevskaya 2010). In order to avoid overfitting of the trees, the cost complexity pruning method has been applied (Breiman 2017).

Finally, Random Forest is applied, which is a Machine Learning technique related to the classification tree. There are three commonly used ensemble-learning techniques related to decision trees, namely Bagging, Boosting and Random Forest (Kübler, Wieringa, and Pauwels 2017). These techniques combine several tree models and aggregate across the different decision rules. By doing so, they tend to improve accuracy and generally avoid overfitting (Kübler, Wieringa, and Pauwels 2017). The Random Forest method is preferred over the other two methods as it outperforms Bagging and Boosting on robustness, it is insensitive to the number of predictors considered at the splits and it is more convenient to use as it is faster to run (Coussement and Van den Poel 2008). Furthermore, Random Forest is preferred over other Machine Learning methods such as Neural Networks and Support Vector Machines, as several researchers found better performance scores for Random Forest than for these methods (Coussement and Van den Poel 2008; Larivière and Van den Poel 2005; Wang, Sun, and Keh 2013). The Random Forest technique uses random subspace sampling as the basis for its splitting procedure and can be considered an averaging ensemble. For a more in-depth explanation of the Random Forest technique, I refer the reader to Kübler, Wieringa and Pauwels (2017).

Although the performance of Random Forest is generally relatively high compared to other methods, its results are more difficult to interpret. For this reason, the logistic regression will be used to interpret the results, while the Random Forest might be used for additional information on the relative importance of variables when it performs as well as, or better than, the logistic regression. The relative importance of the variables is shown by the Mean Decrease Gini score, which reflects the impact of each variable on the splitting of the trees. So, the higher the Mean Decrease Gini score, the more impact the variable has on the dependent variable.
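The building block of the Mean Decrease Gini score is the reduction in Gini impurity achieved by a single split; a random forest implementation accumulates these reductions per variable and averages them over the trees. The Python sketch below shows that building block for a binary outcome. It is a simplified illustration with hypothetical function names, not the random forest routine used in this study.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels: 1 - p^2 - (1 - p)^2 = 2p(1 - p)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def gini_decrease(parent, left, right):
    """Impurity of the parent node minus the size-weighted impurity of its two children."""
    n = len(parent)
    weighted_children = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(parent) - weighted_children


# Hypothetical node: a split that separates churners (1) from non-churners (0) perfectly.
parent = [1, 1, 0, 0]
print(gini_decrease(parent, left=[1, 1], right=[0, 0]))

# A split that leaves both children as mixed as the parent yields no decrease.
print(gini_decrease(parent, left=[1, 0], right=[1, 0]))
```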

4.4 Estimation parameters

relationship is positive when the coefficient shows a positive sign. For the more in-depth qualitative analysis, the marginal effects will be used. Marginal effects show what happens to the probability of churning if there is a small change in the independent variable (IV), evaluated at the average values of all variables. If a variable is binary, the marginal effect should be interpreted as the change in the probability of churning if the IV changes from zero to one. For a continuous variable, the marginal effect has a slightly different interpretation, as it concerns the effect of an instantaneous change in the IV. However, for the sake of interpretation, when a variable is an integer and not truly continuous, the marginal effect is assumed to show the change in the probability of churning if the IV changes by one unit. As relationship length and organization size are both integers that change by whole units (one year for relationship length and one student for organization size), this last interpretation will be used for these two variables. Furthermore, if the classification tree or the random forest outperforms the logistic regression, the results of these techniques will be used to provide additional information on the relative impact of the independent variables. Please note that the estimation is based on a balanced training dataset. This dataset contains 75% of the total data, so that the performance can be tested on a hold-out sample containing the remaining 25%.
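The two interpretations of marginal effects described above can be made concrete with a short Python sketch for a logit model evaluated at the variable means. For a continuous variable the marginal effect is the coefficient multiplied by p(1 - p); for a binary variable it is the difference in predicted probability between the values one and zero. The coefficients, means and function names are hypothetical and serve only as an illustration.

```python
import math


def logistic(u):
    """Logistic CDF."""
    return 1.0 / (1.0 + math.exp(-u))


def marginal_effect_continuous(alpha, betas, means, j):
    """dP/dx_j at the means: beta_j * p * (1 - p)."""
    utility = alpha + sum(b * m for b, m in zip(betas, means))
    p = logistic(utility)
    return betas[j] * p * (1.0 - p)


def marginal_effect_binary(alpha, betas, means, j):
    """P(x_j = 1) - P(x_j = 0), holding all other variables at their means."""
    base = alpha + sum(b * m for k, (b, m) in enumerate(zip(betas, means)) if k != j)
    return logistic(base + betas[j]) - logistic(base)


# Hypothetical model: rel_length (continuous, index 0) and cross_buy (binary, index 1).
alpha, betas = 1.5, [-0.25, -0.40]
means = [9.0, 0.9]
print(marginal_effect_continuous(alpha, betas, means, 0))
print(marginal_effect_binary(alpha, betas, means, 1))
```

Because p(1 - p) is at most 0.25, the continuous marginal effect is always smaller in absolute value than a quarter of the coefficient, which is why marginal effects and raw coefficients cannot be compared directly.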

4.5 Performance measures

In order to assess the performance of the different approaches, two widely used performance measures will be evaluated. Firstly, the Top Decile Lift (TDL) will be compared. This measure is the ratio of the percentage of churners in the top decile (the 10% of customers with the highest predicted churn probability) to the percentage of churners in the complete dataset. So, the TDL shows how well a model identifies the customers with a high probability of churning (Risselada, Verhoef, and Bijmolt 2010). As these are the high-risk customers for companies, the TDL is the most important and insightful measure in this research. Secondly, the hit rate will be considered for comparison. The hit rate is the percentage of customers that were correctly classified as either a churner or a non-churner (Venkatesan and Kumar 2004). All performance measures are computed on a within-period hold-out sample, which represents 25% of the total dataset.
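Both performance measures can be computed from the predicted churn probabilities and the observed outcomes of the hold-out sample. The Python sketch below is a minimal illustration; the function names, the cut-off of 0.5 for the hit rate and the example data are assumptions, not details taken from this study.

```python
def top_decile_lift(probabilities, actual):
    """Churn rate among the 10% highest-scored customers divided by the overall churn rate."""
    n = len(probabilities)
    ranked = sorted(zip(probabilities, actual), key=lambda pair: pair[0], reverse=True)
    top_n = max(1, n // 10)
    top_rate = sum(outcome for _, outcome in ranked[:top_n]) / top_n
    overall_rate = sum(actual) / n
    return top_rate / overall_rate


def hit_rate(probabilities, actual, cutoff=0.5):
    """Share of customers whose predicted class (probability >= cutoff) matches the outcome."""
    hits = sum(1 for p, y in zip(probabilities, actual) if (p >= cutoff) == (y == 1))
    return hits / len(actual)


# Hypothetical hold-out sample of 20 customers; the 5 highest-scored ones churn.
probabilities = [i / 20 for i in range(20)]
actual = [1 if i >= 15 else 0 for i in range(20)]
print(top_decile_lift(probabilities, actual))
print(hit_rate(probabilities, actual))
```

A TDL of 1 means the model does no better than random selection in the top decile; values above 1 indicate that churners are concentrated among the highest-scored customers.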


4.6 Model specification

$$U_{it} = \alpha + \beta_1 \mathit{Rel\_length} + \beta_2 \mathit{Bundle\_up} + \beta_3 \mathit{Cross\_buy}_{t-1} + \beta_4 \mathit{Bundle\_com}_{t-1} + \beta_5 (\mathit{Bundle\_com}_{t-1} \times \mathit{Organization\_size}) + \beta_6 (\mathit{Bundle\_com}_{t-1} \times \mathit{Region}) + \varepsilon$$

With:

U_it: The utility that customer i obtains from churning in period t

Bundle_upt_i: Binary variable with 1 = bundle upgrade between period t-1 and t for customer i, 0 = no bundle upgrade between period t-1 and t for customer i

Bundle_upt1_i: Binary variable with 1 = bundle upgrade between period t-2 and t-1 for customer i, 0 = no bundle upgrade between period t-2 and t-1 for customer i

Rel_length_i: Numeric variable indicating the length of the customer-firm relationship

Cross_buyt1_i: Binary variable with 1 = customer i did purchase products for two or more sections in period t-1, 0 = customer i did not purchase products for two or more sections in period t-1

Bundle_comt1_i: Categorical variable indicating the degree of completeness of a bundle for period t-1 for customer i (single item, 2 item bundles, 3 item bundles, 4 item bundles, 5 item bundles, complete bundle)

Org_size_i: Numeric variable indicating the number of students of customer i

Region_i: Categorical variable indicating the region in which customer i is located


5 Results: analysis and empirical discussion

Now that the model is specified, the methods for obtaining the results are outlined and the performance measures are set, the results of the research can be presented. First, the best logistic regression model is selected and validated; then the estimation results are explained and the model performance is assessed. Thereafter, the decision tree and random forest are compared to the logistic regression based on their performance and, if needed, interpreted.

5.1 Model selection logistic regression

In order to compare the different methods, first the best model of the logistic regression should be determined. As mentioned in section 4.4, several measures are analysed in order to find the best model. In table 6 below, these measures are displayed.

Table 6 shows the different criteria only for the models related to the conceptual model presented in figure 1 in the theoretical framework. Model 1 only includes the main effects as defined in the conceptual model, while model 2 also includes a dummy variable for the outliers detected for the variable rel_length. As can be derived from table 6, model 2 outperforms model 1 on all model selection criteria except for the hit rate, which shows a slight decrease in correctly predicted values. Therefore, the dummy variable is also included in model 3, which contains both the main and the interaction effects as developed in the conceptual model. Model 3 outperforms model 2 on all criteria except for the BIC score. As the BIC imposes a high penalty on the number of variables included in the model, it indicates that model 3 is more complex than model 2 without adding enough information to make up for this complexity. Nevertheless, due to the improvement in the pseudo R squares, AIC and hit rate, model 3 is chosen as the best model and is used for estimating the main drivers of customer churning.
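The selection criteria discussed above can all be derived from the log-likelihoods of the fitted model and of the intercept-only model. The Python sketch below shows the standard formulas; it assumes that the LR column refers to the likelihood-ratio statistic, and all input values are hypothetical rather than taken from this study.

```python
import math


def fit_measures(ll_model, ll_null, k, n):
    """Model selection criteria for a logit model.

    ll_model -- log-likelihood of the fitted model
    ll_null  -- log-likelihood of the intercept-only model
    k        -- number of estimated parameters
    n        -- number of observations
    """
    lr = 2.0 * (ll_model - ll_null)               # likelihood-ratio statistic
    cox_snell = 1.0 - math.exp(-lr / n)
    max_cox_snell = 1.0 - math.exp(2.0 * ll_null / n)
    return {
        "AIC": 2.0 * k - 2.0 * ll_model,
        "BIC": k * math.log(n) - 2.0 * ll_model,  # penalty grows with log(n) per parameter
        "LR": lr,
        "McFadden": 1.0 - ll_model / ll_null,
        "CoxSnell": cox_snell,
        "Nagelkerke": cox_snell / max_cox_snell,
    }


# Hypothetical comparison: the larger model fits better but uses more parameters.
small = fit_measures(ll_model=-2300.0, ll_null=-2600.0, k=12, n=3758)
large = fit_measures(ll_model=-2280.0, ll_null=-2600.0, k=32, n=3758)
print(small["AIC"], large["AIC"])
print(small["BIC"], large["BIC"])
```

Because the BIC penalty per parameter is log(n) instead of 2, a model with many interaction terms can improve the AIC and the pseudo R squares while still scoring worse on the BIC, which is the pattern described for model 3.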

| Model | AIC | BIC | LR | Pseudo R2: McFadden | Pseudo R2: Cox and Snell | Pseudo R2: Nagelkerke | Hit rate |
|---|---|---|---|---|---|---|---|
| 1* | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| 2** | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| 3*** | 1 | 1 | 1 | 1 | 1 | 1 | 1 |

Table 6 Model comparison main models

* Basic model: churnt ~ rel_length + cross_buyt1 + bundle_comt1 + org_size + region
** Basic model with dummy for detecting outliers
*** Basic model with dummy and interaction effects

Notes: Due to technicalities in the data, bundle_upgrade could not be included in the models and the variable org_size has been divided by factor 100 for the sake of interpretation (this did not affect the validity of the

Besides the previously named models, two additional models were created for purely explorative purposes. Both models can be found in table 7 below.

| Model | AIC | BIC | LR | Pseudo R2: McFadden | Pseudo R2: Cox and Snell | Pseudo R2: Nagelkerke | Hit rate |
|---|---|---|---|---|---|---|---|
| 4* | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| 5** | 1 | 1 | 1 | 1 | 1 | 1 | 1 |

Table 7 Model comparison explorative models

First of all, model 4 includes all main effects as developed in the conceptual model in section 2.6, as well as all possible interaction effects with region and organization size as the moderating variables. Compared to model 3, the AIC and the pseudo R squares slightly increase. However, the BIC is found to be best for model 3. As model 4 includes all possible interaction effects, the increase in the BIC is caused by the imbalance between the high number of variables included and the small increase in information that these variables provide. Therefore, model 5 was created by including only the significant effects in the model. Although the BIC improved substantially by doing so, the other selection criteria were found to be worse than or similar to those of model 4.

To conclude, model 3 is the most appropriate model to use for the further analyses of the main drivers of customer loyalty. In addition, models 4 and 5 provide some new insights into other variables that might influence customer loyalty.

5.2 Main drivers of customer churning

As the best model has been selected, it is possible to analyse and interpret the results of the logit model. Table 8 below shows the coefficients, the exponent of the coefficients, the marginal effects and the p-values per variable for the main effects and the moderating effects.

| Variable | Estimate | Exp (β) | dF/dx | P-value |
|---|---|---|---|---|
| Relationship characteristics | | | | |
| Relationship length t-1 and t-2 | 1 | 1 | 1 | 1 |
| Cross-buy t-1 | 1 | 1 | 1 | 1 |
| Bundle completeness | | | | |
| Bundle completeness t-1 | 1 | 1 | 1 | 1 |
| No bundle (benchmark) | 1 | 1 | 1 | 1 |
| Two item bundle | 1 | 1 | 1 | 1 |
| Three item bundle | 1 | 1 | 1 | 1 |
| Four item bundle | 1 | 1 | 1 | 1 |
| Five item bundle | 1 | 1 | 1 | 1 |
| Complete bundle | 1 | 1 | 1 | 1 |
| Customer characteristics | | | | |
| Organization size | 1 | 1 | 1 | 1 |
| Region | 1 | 1 | 1 | 1 |
| W (benchmark) | 1 | 1 | 1 | 1 |
| X | 1 | 1 | 1 | 1 |
| Y | 1 | 1 | 1 | 1 |
| Z | 1 | 1 | 1 | |
| Interaction effects | | | | |
| Bundle completeness t-1 * organization size | 1 | 1 | 1 | 1 |
| Two item bundle * org_size | 1 | 1 | 1 | 1 |
| Three item bundle * org_size | 1 | 1 | 1 | 1 |
| Four item bundle * org_size | 1 | 1 | 1 | |
| Five item bundle * org_size | 1 | 1 | 1 | 1 |
| Complete bundle * org_size | 1 | 1 | 1 | 1 |
| Bundle completeness t-1 * region | 1 | 1 | 1 | 1 |
| Two item bundle * X | 1 | 1 | 1 | 1 |
| Three item bundle * X | 1 | 1 | 1 | 1 |
| Four item bundle * X | 1 | 1 | 1 | 1 |
| Five item bundle * X | 1 | 1 | 1 | 1 |
| Complete bundle * X | 1 | 1 | 1 | 1 |
| Two item bundle * Y | 1 | 1 | 1 | 1 |
| Three item bundle * Y | 1 | 1 | 1 | 1 |
| Four item bundle * Y | 1 | 1 | 1 | 1 |
| Five item bundle * Y | 1 | 1 | 1 | 1 |
| Complete bundle * Y | 1 | 1 | 1 | 1 |
| Two item bundle * Z | 1 | 1 | 1 | 1 |
| Three item bundle * Z | 1 | 1 | 1 | 1 |
| Four item bundle * Z | 1 | 1 | 1 | 1 |
| Five item bundle * Z | 1 | 1 | 1 | 1 |
| Complete bundle * Z | 1 | 1 | 1 | 1 |
| Constant | 1 | 1 | 1 | 1 |

* Significant at 0.1 level | ** significant at 0.05 level | *** significant at 0.01 level

From table 8, it can be derived that several explanatory variables drive customer churning. All bolded variables show significant effects and will therefore be interpreted one by one. Firstly, relationship length is found to have a negative effect on churning, thereby supporting H2 (p=0.000). The probability of churning decreases by 0.0577 when the relationship length increases by one year. Secondly, customer cross-buy also shows a negative effect on customer churning, thus supporting H3 (p=0.0011). The probability of churning decreases by 0.0959 when a customer switches from no cross-buy to cross-buy at the company. Thirdly, as expected, the type of bundle has a significant influence on customer churning (p=0.0287). However, only the four-item bundle, compared to having no bundle, shows a significant effect, and this effect runs against expectations: the probability that a customer churns increases by 0.1841 when a customer has a four-item bundle instead of no bundle. In other words, a more complete bundle (including four items) increases the chance of churning. Therefore, H4 is not supported. This odd outcome might be caused by interaction effects that will be outlined in the remainder of this section. Fourthly, organization size positively affects customer churning directly, which supports H5b (p=0.0167). An increase of 100 students results in an increase of 0.0031 in the probability of churning. Finally, the variable region shows a significant effect on churning for region Y compared to region W (p=0.0138): when organizations are located in region Y of the Netherlands instead of region W, the probability of churning decreases by 0.1361. This finding does not support H6b, as it was expected that regional loyalty would explain lower churning rates in region X of the Netherlands. However, organizations in region Y were found to be less likely to churn, indicating that there are differences per region, but that these differences are most likely caused by reasons other than regional loyalty.

To conclude the main effects, the variables relationship length, cross-buy and organization size showed results in line with the developed research model as outlined in section 2.6. However, the variables bundle completeness and region did not support the previously developed hypotheses.

Now that the direct effects have been analysed, the next step is to investigate the indirect interaction effects of organization size and region. The first effect found to be positive is the interaction effect of organization size on the relationship between bundle completeness and churning (p=0.0389). When organization size increases, the relationship between a four-item bundle and churning turns negative (instead of the positive relationship previously found). To be more specific, the churning probability decreases by 0.0052 when a customer purchases a four-item bundle instead of a single item. As organization size was expected to have a positive effect on the relationship between bundle completeness and customer churning, H5a is thus not supported by these findings.
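The sign change described above follows from how an interaction term enters the logit model: the effect of the bundle dummy on the log-odds of churning is the main-effect coefficient plus the interaction coefficient multiplied by organization size. The sketch below uses hypothetical coefficients (not the estimates from table 8) to show how a positive main effect can turn negative for sufficiently large organizations.

```python
def bundle_effect_on_log_odds(b_bundle, b_interaction, org_size):
    """Effect of the bundle dummy on the log-odds at a given organization size."""
    return b_bundle + b_interaction * org_size


def crossover_size(b_bundle, b_interaction):
    """Organization size at which the bundle effect changes sign."""
    return -b_bundle / b_interaction


# Hypothetical coefficients, for illustration only.
b_four_item = 0.8             # positive main effect: more churn for small organizations
b_four_item_x_size = -0.002   # negative interaction with organization size

for size in (100, 400, 1000):
    effect = bundle_effect_on_log_odds(b_four_item, b_four_item_x_size, size)
    print(f"org_size={size}: effect on log-odds = {effect:+.3f}")

print("sign changes near org_size =", crossover_size(b_four_item, b_four_item_x_size))
```

Under these assumed values, the four-item bundle raises the log-odds of churning for small organizations and lowers it for large ones, which mirrors the pattern reported for the interaction between bundle completeness and organization size.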
