
Spatial proximity and Customer Lifetime Value: Is a good neighbor a priceless treasure?

Jip van Seeters, 5897521

Keywords: customer lifetime value, customer base valuation, social contagion, spatial proximity

Research Area: Customer Lifetime Value
29 June 2015

Faculty of Economics and Business
Academic year 2014-2015

University of Amsterdam
Master Thesis

Supervisor: Dr. Evsen Korkmaz
Second Reader: Dr. Umut Konus

Abstract

The purpose of this study is to examine whether the effect of social contagion can be observed in individual customer lifetime value (CLV). The BG/NBD model (Fader, Hardie and Lee, 2005) is used to forecast customer sales for an online grocery retailer operating in the Netherlands, and customers are then compared on the basis of co-location data using statistical analysis. No significant effect of spatial proximity on CLV is observed; there is no evidence of social contagion between customers based on co-location data. The first customers in a neighborhood to start purchasing at the online retailer do have a higher CLV than other customers in the neighborhood. A limitation of this study is that spatial proximity assumes all ties between customers are equal (Iyengar, Van den Bulte and Choi, 2011). This means that neither the nature of the tie nor the influence of consumers on each other can be studied.

The implication of this study is that spatial proximity, used as a proxy for social contagion, is not related to higher CLV in this setting. In addition, the higher expected sales of first customers support the notion of Gupta and Zeithaml (2006) that retention rate is the main driver of CLV. This can help firms to identify more profitable customers and allocate marketing resources accordingly. The originality of this study lies in the fact that it examines the network effect on individual customer worth.

Table of Contents

Abstract
Introduction
Literature Review
Customer Lifetime Value
Social contagion and CLV
Methodology
Data collection and processing
Variables
Analytical strategy
Strengths and limitations
Results
Descriptive statistics
BG/NBD calculations
T-tests
Correlations
Multiple regression
Discussion
Conclusion
Ethical issues
Limitations
Implications
Bibliography
Appendix

Statement of Originality

This document is written by Student Jip van Seeters who declares to take full responsibility for the contents of this document.

I declare that the text and the work presented in this document is original and that no sources other than those mentioned in the text and its references have been used in creating it.

The Faculty of Economics and Business is responsible solely for the supervision of completion of the work, not for the contents.

Introduction

Marketers are increasingly pressured by shareholders to account for their productivity (Rust, Lemon & Zeithaml, 2004). Meanwhile, investment in marketing is often seen as a ‘black box’. This is illustrated by John Wanamaker's statement, often cited by marketers: “half of the money I spend on marketing is wasted, the problem is I don't know which half”. One way to gain more insight into this black box is through customer metrics (Rust, Lemon & Zeithaml, 2004). Gupta et al. (2004) distinguish between observable metrics, such as customer acquisition, retention and customer lifetime value, and unobservable metrics, such as satisfaction and loyalty. Because firms can base their decisions on such metrics, Gupta et al. (2004) argue that the metrics are strongly linked to financial performance (see figure 1).

Figure 1 – Framework for customer metrics and their impact on firms' financial performance (Gupta & Zeithaml, 2006)

Alongside shareholder pressure to account for productivity, the availability and accessibility of consumer data have increased over the last few decades, opening new avenues for quantitative tools that manage customers more effectively (Jain & Singh, 2002; Gupta et al., 2006; Chen & Fan, 2013; Ekinci, Ulengin & Uray, 2014). Customer metrics such as Customer Lifetime Value (CLV) form an increasingly important and accurate part of the marketer's tool kit, helping to focus marketing strategy (Rust, Lemon & Zeithaml, 2004). Gupta & Zeithaml (2006) define CLV as the long-run profitability of an individual customer. It can be used to identify customers and to decide how much to invest in acquiring and keeping them. Eventually it can even be used to create custom marketing and product offerings (Gupta and Zeithaml, 2006). Ultimately, marketing is “the art of attracting and keeping profitable customers” (Berger & Nasr, 1998 after Kotler and Armstrong, p. 18). Digitalization has made relevant data more widely available, which has also led to an increased use of customer metrics (Fader & Hardie, 2005). More and more data is collected and stored, especially when selling through online channels (Fader & Hardie, 2009). The information produced in two average days in 2010 equaled the total amount of information produced from the beginning of time up to 2003 (Siegler, 2010), and with the rise of smartphones that amount has presumably only increased. With all this information available, businesses and academics are making more and more use of it.

According to Berger & Nasr (1998) CLV is an important construct for marketing decision making. This is linked to another, and perhaps the most important, reason for using CLV: correct use and interpretation of available customer data may lead to increasing revenues. IBM predicts that retailers can increase their revenues by 60% if they get everything out of the ‘big data’ available to them (IBM, n.d.). According to IBM, retailers can achieve this increase by changing how they deal with their customers: optimizing promotion effectiveness, using micro-market campaign management and using real-time demand forecasts. The most effective tool for using available customer data to allocate marketing resources and account for marketing expenditure is CLV (Berger & Nasr, 1998; Jain and Singh, 2002; Rust, Lemon and Zeithaml, 2004; Gupta and Zeithaml, 2006; Schulze, Skiera & Wiesel, 2012), and consequently the Customer Equity (CE) derived from the sum of the individual CLVs of a firm's customer base.

Most research on CLV focuses on the value of an individual customer without accounting for interaction (Gupta et al., 2006). However, the effect of interaction between (potential) customers on CLV needs more empirical validation. Just as a product does not exist in isolation but is part of a network of products (e.g. sales of bacon and eggs are related) (Oestreicher-Singer & Sundararajan, 2010; Oestreicher-Singer & Sundararajan, 2012; Oestreicher-Singer, Libai, Sivan, Carmi, & Yassin, 2013), the value of a customer might depend on the network he or she is part of. The relationship between social interaction (e.g. word of mouth (WOM), intermediaries) and both observable (e.g. price, sales) and unobservable (e.g. satisfaction, attitude) customer metrics has been shown in several studies (Salganik, Dodds & Watts, 2006; Shin, Hanssens & Gajula, 2008; Dewan and Ramprasad, 2009), but how can it be linked to CLV? U.S.-based consultancy firm Bain and Company (n.d.) proposes a CLV formula that incorporates the concept of ‘referral value’ to account for some of the interaction that takes place amongst (potential) customers. This idea is supported by the results of Schmitt, Skiera and Van den Bulte (2010), who show that the average value of a referred customer of a leading German bank is higher than that of a non-referred customer.

This research searches for more empirical evidence that being in a network affects CLV, using customer-base analysis (Fader, Hardie & Shang, 2010; Fader & Hardie, 2010) to compare the value of customers at the individual level. By grouping customers on the basis of co-location data (i.e. having the same postal code and thus living in close spatial proximity), the group of customers living in close proximity to other customers can be compared to the group not living in close proximity. It will also be examined how starting to purchase before or after other customers in the neighborhood is related to CLV. This relative position is captured by the order number, where 1 is the first customer from a specific neighborhood, 2 is the second customer in that neighborhood, et cetera. The aim is to establish not only the difference between customers who live in close proximity and those who do not, but also how the moment of joining the customer base relative to other customers affects CLV. Expectations about this relationship derived from theory will be discussed in the theoretical framework section of this paper.
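The grouping described above can be illustrated with a minimal sketch. It assumes each customer record holds a customer id, a postal code and a first purchase date; the function name, field layout and sample records are hypothetical and are not taken from the retailer's data.

```python
from collections import defaultdict
from datetime import date


def assign_order_numbers(first_purchases):
    """Map customer id -> (postal_code, order_number, has_neighbors).

    first_purchases: iterable of (customer_id, postal_code, first_purchase_date).
    Order number 1 is the earliest customer within a postal code.
    """
    by_code = defaultdict(list)
    for customer_id, postal_code, first_date in first_purchases:
        by_code[postal_code].append((first_date, customer_id))

    result = {}
    for postal_code, members in by_code.items():
        members.sort()  # earliest first purchase first; ties broken by customer id
        has_neighbors = len(members) > 1
        for order_number, (_, customer_id) in enumerate(members, start=1):
            result[customer_id] = (postal_code, order_number, has_neighbors)
    return result


# Hypothetical sample records, for illustration only.
customers = [
    ("c1", "1011AB", date(2014, 1, 5)),
    ("c2", "1011AB", date(2014, 3, 2)),
    ("c3", "3511CD", date(2014, 2, 1)),
]
groups = assign_order_numbers(customers)
```

Here `has_neighbors` separates customers with at least one other customer in the same postal code from customers who are alone in theirs, and `order_number` records the relative moment of joining within the neighborhood.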

This research contributes to previous research by looking for empirical evidence of undervaluation of a customer base when the interaction between (potential) customers is not taken into account. This will provide support for the academic view that assessment of customers should be based upon more than only their transactions with a firm (Gupta et al., 2006; Kumar et al., 2010). This research also has managerial implications, both on an aggregate level to determine customer equity and on a disaggregate level to allocate (marketing) resources and account for productivity and expenditures. If customer-base valuation is going to be used to allocate resources more efficiently, achieve more effective promotion, create micro-market campaign management and use real-time demand forecasts, as has been suggested by both practical (IBM, n.d.) and academic sources (Dwyer, 1989), interaction between customers needs to be accounted for in these tools. Derived from the literature, the research question for this thesis is: “To what extent do customers living in close proximity to earlier customers differ in the individual value of the customer in terms of total sales amount and expected number of sales over the next year?”.

My personal motivation to propose this research is that I believe that the ‘Don Draper’ days of flashy marketers making bold, subjective statements about a firm are over and that the future of marketing lies in a more scientific approach. This thesis is structured as follows: the next section reviews the relevant literature, identifies the research gap and ends with the research question. The third section elaborates on the research in terms of design, data collection, analysis and expected results. After this the results of a regression analysis are discussed. Finally, the conclusion of this research, its limitations and ethical issues, and its relevance for dissemination are discussed in the last section.

Literature Review

In this section the relevant literature is discussed to set out the theoretical framework for this thesis. First the relationship of a firm with its clients is discussed in terms of its impact on the firm and the firm's decision making, then the concept of Customer Lifetime Value (CLV) is elaborated upon, and finally the concept of social contagion and how it is related to CLV is discussed.

According to Leeflang (2011) the future of marketing is distinguished marketing, aimed at creating superior customer value through three underlying main concepts: organization, operationalization and orientation. In a distinguished marketing setting, these concepts are defined as follows. First, ‘organization’ means the marketing department's accountability, innovativeness, customer connection and cooperation with other departments. Second, ‘operationalization’ means the knowledge, data and decision making process. Third, ‘orientation’ means the interaction with the customer, where the customer's needs and values are leading. Leeflang (2011) argues that the role of the customer should be more central and that delivering superior customer value should be the starting point for stating business objectives. Gupta et al. (2006) argue that firms derive revenue from the creation and sustenance of long-term relationships, and according to Malthouse et al. (2013) managing relationships with customers helps firms to maximize lifetime value. This is because the lifetime profitability of a customer is driven up by loyalty (Zhang, Dixit and Friedmann, 2010). Loyalty of a customer can also lead to expansion of the customer base, acceleration of the acquisition of new customers and other value-adding activities for the firm (Gremler and Brown, 1998). Gremler and Brown (1998) call this the ripple effect: loyal customers create more value for the firm through repeat transactions, referring new customers, co-producing service, offering social support or benefits to other customers and employees, and mentoring inexperienced customers. Besides expansion and acceleration, higher loyalty is also associated with higher CLV (Kumar and Sha, 2004).

Bijmolt et al. (2010) make a distinction between the direct and indirect impact of a customer's actions on firm performance. The direct impact comes from current and future transactions with the firm, whilst the indirect impact also includes behavioral manifestations. The authors name customer co-creation, word of mouth (WOM) and complaining behavior as the three types of behavior that affect the brand or the firm in ways other than purchase. Therefore a firm should shift its orientation from a product-centric to a customer-focused organization (Leeflang, 2011). This way a customer can become exogenous to the firm. In this case, Leeflang (2011) explains, endogeneity means that the marketing mix is defined within the marketing department and the customer gets a ‘take-it-or-leave-it’ value offer, whereas exogeneity means that the marketing mix and value offer are determined by the wants and needs of the customer. This type of orientation can enable a firm to establish relationships with customers. The next frontier is customer engagement (CE), defined as “the behavioral manifestation from a customer toward a brand or a firm which goes beyond purchase behavior” (Bijmolt et al. after Van Doorn et al., 2010, p. 342), which means a customer can co-create value and competitive strategy and collaborate in a firm's innovation process (Bijmolt et al., 2010). Van Doorn et al. (2010) define a larger set of behaviors that form the customer engagement behaviors (CEB). They include WOM, recommendations, helping other customers, blogging, writing reviews and engaging in legal action. They emphasize that the manifestations can be either positive or negative. This means an accurate measurement of customer value beyond purchase behavior is needed for a company to make the right decisions, because consumers are influenced by each other, both online and offline, and it is in the interest of a firm to operationalize that influence. For this reason, Verhoef, Reinartz and Krafft (2010) propose to include measures of customer engagement in customer metrics and customer lifetime value calculations. They say a firm's customer base may be undervalued if engagement is not included in the analysis. Kumar et al. (2010) take it a step further and create a framework for capturing total customer engagement value (CEV) rather than CLV. In their work CLV is combined with customer referral value (CRV, total number of referrals by customers), customer influencer value (CIV, the customer's behavior to influence others) and customer knowledge value (CKV, value added through feedback from the customer) to create the more elaborate and comprehensive construct, CEV. Davenport (2006) says customer analytics have a positive impact on firm performance. But most analytical models focus only on transactions, whilst behavior and conditions also have an effect on profit (Algesheimer & Wangenheim, 2008). For this reason further research into the effects of these behavioral and conditional components, social contagion in the case of this research, is needed.

According to Korkmaz, Fok and Kuik (2013) processing of the data on customer transactions stored by firms to provide information can help managers in the decision making process. Nevertheless, only individual transactions of customers are not enough to accurately estimate the present value of a customer. This is because future income may be over- or undervalued when not taking into account the behavioral component (Kumar et al. 2010). In addition, Singh, Freeman, Lepri and Pentland (2013) say research on spending behavior has focused mainly on using past buying behavior to determine spending patterns. The problem with using this data is that the focus lies with your current customers and identifying high-value prospective customers is difficult. Therefore this research is aimed at comparing customers affected in some way through social contagion with customers assumed not to be affected by it. To compare different customers, Customer Lifetime Value (CLV) is an excellent tool (Jain and Singh, 2002; Rust, Lemon and Zeithaml, 2004; Gupta and Zeithaml, 2006; Fader and Hardie, 2010; Fader, Hardie and Shang, 2010; Schulze, Skiera and Wiesel, 2012; Ekinci, Ulengin and Uray, 2013). In the next section the use of the CLV-construct is elaborated upon.

Customer Lifetime Value

This section is aimed at clearly defining what CLV means and how it can be operationalized for research. Berger and Nasr (1998) define CLV as the economic worth of a customer. They elaborate on this definition as the future cash flows the firm expects to receive from the customer over time, which means a present value (PV) of a customer is calculated. Chang, Chang and Li (2012) say measuring CLV is measuring the profit streams generated by a customer across the entire customer life cycle. In order to do this a certain set of skills and data is required. The first requirement is a dataset covering a specific time span with specific content. The second requirement is command of a set of statistical forecasting techniques. The third requirement is that the limitations and shortcomings of these analyses can be analyzed and interpreted. They divide CLV models into scoring models, probability models and econometric models. Econometric models use a set of covariates, scoring models use a set of purchase characteristics, and probability models look at the underlying stochastic process determined by individual characteristics. Jain and Singh (2002) explain the difference between the lines of CLV research. First, there is the research stream that uses revenue streams, acquisition, retention and other marketing costs to optimize CLV and allocate marketing resources. Second, there is the stream that aims to use customer base analysis to predict the probabilistic future value of customers. Third, there is a stream that analyzes the use of CLV in managerial decisions. This research falls under the second stream of analyzing a customer base to predict the future value of customers. To perform this research a probability model looking at the underlying stochastic process determined by individual characteristics is used.
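The present-value reading of CLV can be illustrated with a short sketch. It is not the calculation used in this thesis: the constant retention rate, the margin and the discount rate are hypothetical inputs, and cash flows are assumed to arrive at the end of each period.

```python
def present_value(cash_flows, discount_rate):
    """Discount expected per-period cash flows (periods 1, 2, ...) back to today."""
    return sum(cf / (1 + discount_rate) ** t
               for t, cf in enumerate(cash_flows, start=1))


def expected_cash_flows(margin, retention_rate, periods):
    """Expected margin per period, assuming the customer is active in period 1
    and survives each later period with a fixed retention probability."""
    return [margin * retention_rate ** (t - 1) for t in range(1, periods + 1)]


# Hypothetical customer: margin of 100 per period, 80% retention, 10% discount rate.
flows = expected_cash_flows(margin=100.0, retention_rate=0.8, periods=3)
clv = present_value(flows, discount_rate=0.10)
```

A fixed retention rate treats every customer alike; the probability models discussed next replace it with a customer-specific probability of still being active.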

In order to use the probability model, the nature of the relationship between a firm and a customer needs to be clear. Calciu (2009) explains the types of relationship that a customer can have with a firm and what this implies for the CLV analysis. His classification (see Figure 2.1) shows how the customer relationship can be contractual or non-contractual and discrete or continuous. He also gives examples of industries in which each type of customer relationship might occur and a graphic representation of the dynamics of the relationship. Especially in the continuous non-contractual setting it is unknown when the customer defects, so the probability of a customer defecting needs to be included in the CLV calculation. This can be done either deterministically (assuming no heterogeneity in customer defection probability) or stochastically (assuming heterogeneity in customer defection probability). The stochastic approach is preferred for customer base analysis. It is simplified by Fader, Hardie and Lee's (2005) BG/NBD model, which allows for analysis based on just frequency, first purchase and last purchase. This model can be used to analyze a customer base in a non-contractual setting. It has the advantage that it allows for easy implementation using Microsoft Excel whilst obtaining results similar to those of more complex models (Fader, Hardie and Lee, 2005). The model makes assumptions about the distribution of the transactions and the probability of the customer becoming inactive, and it accounts for heterogeneity of individual customers. The BG/NBD model assumes the number of transactions to be Poisson-distributed with a heterogeneous transaction rate following a gamma distribution. The probability of a customer becoming inactive is distributed according to a geometric distribution, with heterogeneity following a beta distribution.
Batislam, Denizel, & Filiztekin (2007) compare several models for performing a customer base analysis and conclude that the BG/NBD model provides an advantage over other models. It is slightly less accurate than the Pareto/NBD model, but because it is less ambiguous in parameter estimation and easier to use, the BG/NBD model is preferred in this research.
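The distributional assumptions listed above can be made concrete by simulating a single customer. This is a sketch of the generative story only, not the estimation procedure used in this thesis. The symbols r and alpha (gamma shape and rate), a and b (beta parameters) and T (length of the observation period) are assumed notation, the parameter values are illustrative rather than estimates, and the customer is assumed to be active at time 0.

```python
import random


def simulate_customer(r, alpha, a, b, T, rng):
    """Simulate one customer's repeat purchases over (0, T] under BG/NBD-style assumptions.

    Returns (frequency, recency): the number of repeat purchases and the time
    of the last one (0.0 if there were none).
    """
    lam = rng.gammavariate(r, 1.0 / alpha)  # transaction rate: gamma, shape r, rate alpha
    p = rng.betavariate(a, b)               # dropout probability after any purchase
    t, frequency, recency = 0.0, 0, 0.0
    while True:
        t += rng.expovariate(lam)           # Poisson purchasing: exponential waiting times
        if t > T:
            break                           # next purchase falls outside the window
        frequency += 1
        recency = t
        if rng.random() < p:                # customer becomes inactive after this purchase
            break
    return frequency, recency


rng = random.Random(0)
# Illustrative parameter values, not estimates from any dataset.
sample = [simulate_customer(r=1.0, alpha=5.0, a=1.0, b=3.0, T=39.0, rng=rng)
          for _ in range(200)]
```

Each simulated pair (frequency, recency), together with T, is the kind of summary the model needs per customer, which is why it can be fitted from little more than purchase counts and dates.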

Reinartz and Kumar (2000) say lifetime analyses used to be performed mostly in contractual settings because expected revenues can be forecast fairly accurately there. In non-contractual settings, however, the “aliveness of the customer” is not so clear. Therefore CLV incorporates the probability of the customer still being alive. They show how other models can result in suboptimal allocation of resources because they fail to distinguish between short-life and long-life customers. Basing the CLV on the probability of being ‘alive’, derived from individual customer characteristics, is therefore more informative because it can account for heterogeneity (Fader and Hardie, 2009).
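For the BG/NBD model, the probability of being alive has a closed form that is commonly given in the model's literature. The sketch below assumes that form and the usual notation: x is the number of repeat purchases, t_x the time of the last purchase, T the length of the observation period, and r, alpha, a, b the model parameters. The parameter values in the example are hypothetical.

```python
def p_alive(x, t_x, T, r, alpha, a, b):
    """Probability that a customer is still active at time T under the BG/NBD model.

    With no repeat purchases (x == 0) the model treats the customer as alive,
    because dropout can only occur directly after a purchase.
    """
    if x == 0:
        return 1.0
    ratio = (a / (b + x - 1)) * ((alpha + T) / (alpha + t_x)) ** (r + x)
    return 1.0 / (1.0 + ratio)


# Hypothetical parameters; two customers with the same frequency, different recency.
recent = p_alive(x=4, t_x=35.0, T=39.0, r=1.0, alpha=5.0, a=1.0, b=3.0)
lapsed = p_alive(x=4, t_x=10.0, T=39.0, r=1.0, alpha=5.0, a=1.0, b=3.0)
```

With equal purchase counts, the customer who bought recently receives the higher probability, which is how such a model separates short-life from long-life customers.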

Figure 2.1 Type of customer relationship

Verhoef, Van Doorn and Dorotic (2007) encourage the application of CLV and customer value management in practice to show firms how to influence customer retention and customer expansion. However, Gupta and Zeithaml (2006) also note that there are some challenges for future research. One of these challenges is to account for network effects amongst customers. The implicit assumption made by CLV analysis is that each customer should be approached individually and only has an individual value. However, it is possible that not taking network effects into account leads to an undervaluation of CLV and consequently of Customer Equity.

Social contagion and CLV

Algesheimer and Wangenheim (2006) say that direct drivers of customer value, such as customer transactions, have been well understood and researched. Indirect effects, however, have been ignored, although it is known that social network effects such as WOM and other communication are powerful influencers of buying behavior. Some customers may be bigger influencers than others and therefore not all customer relationships are equally profitable for firms. Algesheimer and Wangenheim (2006) say marketing should be more network minded, where the network depends on the structure of relational ties between a set of actors. Hogan et al. (2003) found evidence for such an effect when looking at the value of lost customers: a lost customer is not worth only the opportunity cost of lost revenue, but creates extra losses of revenue through the loss of other current customers or through not engaging with new potential customers. Schmitt, Skiera and Van den Bulte (2010) showed that referred customers have a higher retention rate and contribution margin (at first) and are more valuable in both the long and the short run. This supports the argument that CLV and CE are undervalued if interaction amongst (potential) customers is not taken into account. The argument is further strengthened by studies of the effects of such interaction on consumers and firms. Nitzan and Libai (2011), using data from a postal code database, found that exposure of a customer to another defecting customer increases the defection hazard by 80%. They attribute this to social influence: “the transmission of information that reduces uncertainty and search effort through normative and social pressure, or as a result of network externalities” (Nitzan and Libai, 2011, p. 25). To control for externalities, Salganik, Dodds and Watts (2006) performed an experiment in which they created a web-based artificial music market, where a first group was asked to listen to and rate music. A second group was provided with the same music database, but was also shown the ratings of others. The social influence produced an entirely different list of hits and misses. Other studies have shown a rise in price and sales due to a buzz around a product (Shin, Hanssens and Gajula, 2008; Dewan and Ramprasad, 2009), which falls under the category of WOM. Cheng & Zhou's (2010) review of previous literature on eWOM showed that eWOM affects both price and the probability of a sale, which would mean that a customer is worth more than just his or her individual long-run profitability. Apart from WOM, social influence can also occur through other forms of interaction. Manski (2000) explains how interaction can shape a consumer's behavior through observational learning: a consumer observing the behavior of another consumer can be led to choose different actions.
Brodie, Hollebeek, Juric and Ilic (2011) say these customer-to-customer interactions need more attention in future research in the area of customer management.

Iyengar, Van den Bulte and Choi (2011) call this customer-to-customer interaction social contagion. They say social contagion can occur both through physical interaction and through non-physical interaction (e.g. online) between customers. To study the nature of the interaction between customers, specific data on the network is needed; a popular alternative to network data is proximity data. Proximity data (or, as they also call it, co-location data) captures people living or working in close physical proximity. Its advantage for firms is that it is cheap, efficient and complete, as the data is almost always collected by a firm. The upsides and downsides of co-location data are presented in table 2.1.

Table 2.1 Downsides and upsides to spatial proximity to study social contagion

Downsides:

It assumes all co-located members are connected at the same level, so the nature of the tie cannot be studied using co-location data.

Shared contextual conditions and homophilous location decisions: people who are alike choose the same place to live, and/or living in the same place makes behavior converge.

Similar behavior cannot be interpreted as social influence; causal effects cannot be identified because of the low level of granularity.

Upsides:

It also represents influence in general, in the form of:

1. Competitive intensity among adopters is incorporated.

2. Legitimation: copying behavior leads to feeling good about oneself.

3. Social learning: offline communication strengthens social learning through representativeness (trustworthiness of source can be determined) and face-to-face interaction (stronger interaction).

This study is not aimed at the nature of the tie. It is aimed at seeing whether, besides expansion and acceleration, there is also an effect on retention, frequency of purchase and sales amount. Even though the neighbor relationship might not be as strong as friendship or family bonds, weak ties can still be strong (Granovetter, 1973). Algesheimer and Wangenheim (2006) distinguish between a directed tie (e.g. giving advice to someone) and an undirected tie (e.g. physical proximity to your neighbor). They suggest a customer network lifetime value (CNLV) as an alternative to CLV, where characteristics determining a customer's value in a network are measured and used to direct marketing efforts more efficiently. However, CNLV requires a lot of extra information on a customer and a network. Given the scope of this research, co-location data on spatial proximity (i.e. customers living in the same neighborhood) will therefore be used.

Kumar, Ramani and Bohling (2004) have provided evidence that each customer varies in value to a firm. If a firm knows the value of its individual customers, decisions can be made accordingly. Knowing individual customer value allows for choosing the marketing mix (e.g. frequency, channel) best suited to each customer. Weinberg and Berger (2011) introduce the concept of Connected Customer Lifetime Value (CCLV) as the present value of the transactions made by a customer plus the present value of the transactions made by other customers that are influenced by the first customer. They do this because they say online behaviors, such as tweeting about a product or brand, “can bring about purchase, when otherwise it would not take place” (Weinberg and Berger, 2011, p. 329). Similarly, Klier, Klier, Probst and Thiel (2014) showed how neglecting network effects leads to a misevaluation of customers. They give an alternative to individual CLV in the form of customer lifetime network value (CLNV), in which the net network contribution of customers is considered. But just like the CNLV calculation, computing CLNV needs a very rich set of data: knowing at what point a customer was referred, by whom, and what part of the value of the latter customer can be attributed to the former is rarely recorded in a customer database today.
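The CCLV definition can be written down directly once it is assumed that influenced purchases are already attributed to the influencing customer, which, as noted above, is rarely the case in practice. The function names, cash flows and discount rate below are hypothetical and only illustrate the structure of the definition.

```python
def present_value(cash_flows, discount_rate):
    """Discount expected per-period cash flows (periods 1, 2, ...) back to today."""
    return sum(cf / (1 + discount_rate) ** t
               for t, cf in enumerate(cash_flows, start=1))


def connected_clv(own_cash_flows, influenced_cash_flows, discount_rate):
    """Present value of a customer's own transactions plus the present value of
    the transactions of other customers attributed to that customer's influence."""
    own = present_value(own_cash_flows, discount_rate)
    influenced = sum(present_value(flows, discount_rate)
                     for flows in influenced_cash_flows)
    return own + influenced


# Hypothetical example: one customer who influenced two others.
cclv = connected_clv(
    own_cash_flows=[110.0],
    influenced_cash_flows=[[110.0], [55.0]],
    discount_rate=0.10,
)
```

The difficult part is not this sum but the attribution behind `influenced_cash_flows`, which is the data requirement that motivates using co-location data instead.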

So how does social contagion affect customer lifetime value? Lee, Lee and Feick (2006) argue that valuing customers and making decisions based only on financial transactions can be misleading, because some relationship-based indicators are latent (e.g. word of mouth) but contribute to CLV. They use aggregate savings (on marketing) to incorporate word-of-mouth effects in CLV. In their effort to broaden the scope of word-of-mouth research, Libai et al. (2010) say that interactions between customers can influence all aspects of purchase behavior, including frequency and volume. They say that offline social contagion can influence a firm's profitability. In addition, Drye (2011) has shown that customers, or in his article supporters of a good cause, are not scattered dots but are connected, and this may explain a non-linearity in the CLV calculation. Walsh and Elsner (2011) compare the CLV of market mavens (those customers more active in WOM activity) and non-mavens and find that market mavens buy more and refer more, which would indicate that not the referred but the referring customers make the most and the highest transactions.

Referrals can be valuable to a company in several ways. Kumar, Petersen and Leone (2007) distinguish between type 1 and type 2 referrals. Type 1 referrals would not have become customers had they not been referred; type 2 referrals would have become customers anyway but were also referred. They propose to add the value of type 1 referred customers to the CLV of the referring customer to create CRV; however, this does not include the amount that type 2 referrals may or may not spend extra due to the referral. Libai, Muller and Peres (2012) call the type 1 referral ‘expansion’ and the type 2 referral ‘acceleration’. Both are attributed to word of mouth and thus both are seen as a way of adding value to customer equity. By using this to their advantage, companies can start seeding programs in areas that are expected to yield higher CLVs. Libai, Muller and Peres (2012) also say it is important to find out how retention, frequency and amount are influenced in referred customers.

Next to accelerating and expanding, referrals can also increase customer retention. Weinberg and Berger (2011) say retention can accelerate profits through referrals, because customers may act as marketers and positive word of mouth increases retention, leading to higher CLV. So how can these customers be identified? Malthouse and Blattberg (2005) say relationship marketing assumes that firms can be more profitable if they identify the most profitable customers and invest disproportionate marketing resources in them. To predict referral behavior, Wangenheim and Bayón (2007) use attitudinal concepts such as satisfaction to create customer types that can help estimate how many referrals will be made. Finding efficient ways of predicting referral behavior can help a firm to target the most profitable customers more easily. For this reason, finding generalizable circumstances under which some customers are more valuable than others is attractive. Therefore this study will also look at how customers in a neighborhood affect each other: are customers that start purchasing at a firm because of social contagion different in value from other customers? To answer this, the order number of customers will also be incorporated in the study, similar to Laamanen (2013), who found that the order number of an opera performance led to higher ticket sales because of the opportunity for theatre visitors to engage in WOM.

To conclude this paragraph, this research is aimed at establishing whether the neighborhood effect affects not only the number of customers in a geographic area, but also the CLV and ultimately profits, through the expansion of expected customer transactions and the expected sales amount. This is done by looking at customers of an online grocery retailer operating in the Netherlands and comparing the CLV of different customers on the basis of co-location data (e.g. first vs. later customers, only customer in the neighborhood vs. multiple customers in the neighborhood). The purpose of this research is to establish whether there is empirical evidence of higher CLV for customers living in close spatial proximity to other customers. To safeguard the reliability of this research, other factors that could lead to a higher value of a customer base living in close proximity (e.g. higher income, less proximity to a retailer, distinct demographic characteristics) will need to be controlled for.

Methodology

In this chapter the method used to answer the research question is elaborated upon. This section describes the choices and trade-offs necessary to execute the research properly and get to the right answers about how the individual CLV of customers is affected by their neighbors. It also discusses how the data was collected and how it is analyzed.

This research is exploratory because it is aimed at finding the effect of a form of social contagion between customers that is not actually measured: being the neighbor of an existing customer of an online grocery retailer and starting to purchase there as well does not mean this happened because of social contagion. Nevertheless, if the effect can already be observed on this scale, it would contribute to the theoretical base for including the effect of interaction between customers in CLV research, as is suggested by Gupta and Zeithaml (2006).

The study will be executed following a quantitative method of data analysis. Using panel data, the study is aimed at measuring actual consumer behavior. Actual behavior is preferred over behavioral intention because what consumers do sometimes differs from what they say they do, for three reasons: 1. changes in the consumer environment (Peter and Olson, 2001), 2. the disruptive effect of psychological distance (Trope, Liberman and Wakslak, 2007) and 3. the degree to which a consumer believes to have control over the situation (Armitage and Christian, 2003). Therefore this study will look at actual behavior and conditions.

From the sales transaction data of an online grocery retailer, CLV is determined per individual customer. By linking this individual data to neighborhood open access data provided by the CBS, it can be established which customers belong to which neighborhood and the relevant characteristics of the neighborhood can be controlled for. The customers of the retailer will be divided into two groups: 1. customers with no neighbors also purchasing at the company and 2. customers who live in a neighborhood with several customers purchasing at the online grocery retailer. Next to this the customers who share a postal code will also be examined on the basis of their number in the order of purchasing relative to other customers.

Data collection and processing

A dataset of sales to individual customers of an online grocery retailer in 2008 and 2009 is available for use in this research project. The dataset provided data on purchase behavior. The data provided on costs was insufficient to use in the analysis. The benefit of investigating social contagion based on co-location data is that postal codes are collected for every customer. This is helpful to show whether or not the customer had other customers living in close proximity. The data allows for a view down to street level, which is the complete postal code consisting of four digits and two letters (e.g. 1111 AB). For this research the data will be studied at the neighborhood level, meaning only the digits are used. This leads to neighborhood populations ranging from 50 to over 20,000 addresses.

To prepare the data for analysis, first the postal codes were reduced to just the four digits and a numerical measure was constructed to represent the time of purchase. The transactions took place over a time span of 103 weeks, and therefore a transaction on the first day of the first week led to the value 102.86 and a transaction on the last day of the last week to the value 0.14. The transaction data was aggregated to the customer level with one postal code per customer. Some customers had purchased from multiple postal codes. In that case the values for the control variables of the postal code at which the first transaction took place were used.

The data for the variables, which will be discussed more elaborately in the next section, had to be calculated or collected for the research. First, the dependent variables, based on the predicted number of sales for the next year per customer, were estimated using the BG/NBD model (Fader, Hardie and Lee, 2005). The time horizon of a year was used because Schulze, Skiera and Wiesel (2012) say that a long-term customer lifetime value is a better horizon for making a projection of the value of a customer base. Secondly, the independent variables customer order and unique customer were calculated using the matchsequence syntax of SPSS. For the customer order number the first customer in a neighborhood was marked one, the second two, et cetera. For the uniqueness of the customer a dichotomous variable was constructed where 1 meant no other customer shared that customer's postal code and 0 meant there were customers in the dataset that had the same postal code. Thirdly, per postal code neighborhood characteristics were collected from the Dutch Central Bureau for Statistics (CBS), constructing the control variables population, income, density, number of supermarkets within a range of 3 km and distance to supermarket.
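The construction of the two predictor variables can be illustrated with a minimal sketch. It is written in Python rather than the SPSS syntax used in this study, and the function name, the tuple layout and the example customers are hypothetical. It assumes that the order number follows the date of the first purchase within a four-digit postal code.

```python
from collections import defaultdict


def neighborhood_variables(first_purchases):
    """Derive order number and single-customer dummy per customer.

    first_purchases: list of (customer_id, postal_code4, first_purchase_day)
    tuples; this layout is an assumption made for illustration only.
    """
    by_area = defaultdict(list)
    for customer_id, area, first_day in first_purchases:
        by_area[area].append((first_day, customer_id))

    result = {}
    for area, members in by_area.items():
        members.sort()  # earliest first purchase comes first
        single = 1 if len(members) == 1 else 0  # 1 = no neighbors in the data
        for order_number, (_, customer_id) in enumerate(members, start=1):
            result[customer_id] = {
                "order_number": order_number,
                "single_customer": single,
            }
    return result


# Hypothetical example data: three customers share postal code 1011.
customers = [
    ("c1", "1011", 5),
    ("c2", "1011", 40),
    ("c3", "1012", 12),
    ("c4", "1011", 2),
]
variables = neighborhood_variables(customers)
print(variables["c4"], variables["c3"])
```

Ties in the first purchase day are broken here by customer id, which is an arbitrary choice of this sketch.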

Variables

The dependent variables, forecasted sales (E(x)t) and expected sales amount (E(x)t * average sales per purchase), are constructed using the BG/NBD model by Fader, Hardie and Lee (2005). By using this as a dependent variable, an objective measure of comparison of the present value of each individual customer is constructed. Next to the expected number of transactions over a year, the total sales amount was also used as a DV to see whether consumer behavior differs per neighborhood with respect to sales. This was calculated in a crude but efficient manner by multiplying the average sales per purchase by the number of expected transactions.

The two predictor variables, order number of the customer in a neighborhood and single customer in a neighborhood, were constructed in SPSS. These predictor variables represent the possible social contagion between customers living in close proximity. Firstly, a dummy variable was created to distinguish between neighbors (0) and unique customers from a neighborhood (1). Next to this, the number of neighbors on both levels can also be used as a variable to see if the effect of interaction on customer base valuation differs if there are more neighbors. Secondly, the customer order number is used to see if customers who start purchasing later than other customers with the same postal code differ in value. This could also be a consequence of social contagion.

To control for other factors that might cause different types of behavior, neighborhood characteristics are included in the analysis as determinants of buying behavior. Of course the number of inhabitants of the neighborhood may cause a higher number of customers in that neighborhood, but this can also depend on the geographical size of the neighborhood. Therefore, the measure of population density is used, which is the number of inhabitants per km². This also represents how close neighbors actually live to each other. Next to this, the number of supermarkets and the distance to the supermarket are also incorporated as controls, because the data is collected from an online grocery retailer. Another factor that could affect the buying behavior of customers in the same neighborhood is income. A higher income could lead to a higher opportunity cost of going to a supermarket and therefore a higher chance of using a (more expensive) service like online grocery shopping. Unfortunately, the CBS does not have the income per neighborhood available for the entire dataset. Because the missing values for income occur mostly in the smaller neighborhoods, this creates a risk of bias. Therefore the variable 'Income' will only be explored in the correlation matrix, but not in the regression.

For an overview of the variables see appendix 2.

Analytical strategy

To explore the expectations, first the descriptive statistics were analyzed. The outcome of the BG/NBD model and its parameters were analyzed in detail. To answer the question whether single customers in a neighborhood are worth less than multiple customers in a neighborhood, an independent samples t-test was used to compare the means of the two groups. This test was chosen because the outcome is continuous, there are two categories of groups in each test and the groups consist of different participants. The assumptions for parametric tests were met (appendix 3). After the t-test a correlation matrix was assessed to explore the relationships between the variables. Finally, a multiple regression was used to see the effect of the independent variables on the dependent variables. This form of regression was used because of the continuous nature of the outcome, predictor and control variables (Field, 2009). In order to execute a proper regression, some assumptions on normality and independence were assessed. The results of this assessment are presented in appendix 3a. For some variables the assumption of normality was violated and a transformation had to be used, which according to Field (2009) is a proven way of dealing with non-normality. The transformations are shown in the formulas for the regression. In appendix 3b the skewness and kurtosis of the variables after transformation are presented; they show no severe non-normality. The expectation, based on previous research, is that the forecasted transactions and expected sales amount will be higher for the higher ranked customers (i.e. those who could have been exposed more to social contagion).
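As an illustration of the independent samples t-test described above, the following minimal sketch computes the t-value under the assumption of equal variances (pooled variance). The analysis in this study was run in SPSS; the function name and the example values below are hypothetical.

```python
import math
from statistics import mean, variance


def pooled_t(group_a, group_b):
    """Independent samples t-statistic assuming equal variances."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2  # degrees of freedom
    # Pooled variance: weighted average of the two sample variances.
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / df
    std_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t_value = (mean(group_a) - mean(group_b)) / std_error
    return t_value, df


# Hypothetical forecasted transactions for two small groups.
single_customers = [10.0, 12.0, 14.0]
neighbor_customers = [11.0, 13.0, 15.0, 17.0]
t_value, df = pooled_t(single_customers, neighbor_customers)
print(round(t_value, 3), df)
```

With the 230 customers in this study the same formula yields 230 - 2 = 228 degrees of freedom.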

The formulas for the regression are as follows:

Formula 1: $E(X)_t = \alpha + \beta_1 (X_{n,pc}) + \beta_2 (\text{DENSITY}_{pc}) + \beta_5 \ln(\text{DISTANCE}_{pc}) + \beta_6 \sqrt{\text{SUPERMARKETS}_{pc}} + \varepsilon$

Formula 2: $E(X)_t = \alpha + \beta_1 (\text{SINGLE CUSTOMER}) + \beta_2 (\text{DENSITY}_{pc}) + \beta_5 \ln(\text{DISTANCE}_{pc}) + \beta_6 \sqrt{\text{SUPERMARKETS}_{pc}} + \varepsilon$

Formula 3: $\ln(\text{AVG\_SALES} \cdot E(X)_t) = \alpha + \beta_1 (X_{n,pc}) + \beta_2 (\text{DENSITY}_{pc}) + \beta_5 \ln(\text{DISTANCE}_{pc}) + \beta_6 \sqrt{\text{SUPERMARKETS}_{pc}} + \varepsilon$

Formula 4: $\ln(\text{AVG\_SALES} \cdot E(X)_t) = \alpha + \beta_1 (\text{SINGLE CUSTOMER}) + \beta_2 (\text{DENSITY}_{pc}) + \beta_5 \ln(\text{DISTANCE}_{pc}) + \beta_6 \sqrt{\text{SUPERMARKETS}_{pc}} + \varepsilon$
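The transformations in the regression formulas can be sketched as follows: the distance to the supermarket enters as a natural logarithm and the number of supermarkets as a square root. The function name and the example values are hypothetical; the sketch only builds one row of predictors and does not estimate the regression itself.

```python
import math


def design_row(predictor, density, distance_km, n_supermarkets):
    """Build one row of regressors: intercept, predictor and controls.

    predictor is either the customer order number or the single-customer
    dummy; the control variables are transformed as in the formulas.
    """
    return [
        1.0,                        # intercept (alpha)
        float(predictor),           # beta_1 term
        float(density),             # beta_2 term, untransformed
        math.log(distance_km),      # beta_5 term, natural logarithm
        math.sqrt(n_supermarkets),  # beta_6 term, square root
    ]


# Hypothetical customer: third in the neighborhood, 1 km from a supermarket.
row = design_row(3, 5000, 1.0, 49.0)
print(row)
```

The logarithm requires a strictly positive distance, whereas the square root is also defined for neighborhoods without any supermarket.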

Strengths and limitations

The strength of this research is that, because of the proven quantitative methods used, the internal validity is high. Also, data representing actual behavior usually ranks higher in the hierarchy of evidence, and the number of transactions studied is high. Most importantly, measuring an effect of spatial proximity on sales would provide strong support for the argument that interaction between customers needs to be taken into account when analyzing a customer base. And when no effect is measured, the result can still provide relevant information, both for future researchers and for managers looking for more efficient ways of allocating resources.

But the research also has some limitations. The first limitation of this research design is that proximity is no guarantee for social contagion; just because two customers are neighbors does not mean they interact or that one customer is stimulated by the other to start purchasing at a specific firm. Therefore the relationship was not expected to be very strong. Nevertheless, if the effect is observable in this research, it can be expected to be larger if actual referral data is incorporated in future research. The second limitation is that it cannot be excluded that there is an omitted variable (e.g. a neighborhood characteristic not included in the control variables) that affects consumer behavior. The third limitation is that it is not possible to study the nature of the tie between customers or the relative power of individual customers over other customers, as explained by Iyengar, Van den Bulte and Choi (2011). There is no insight into whether and how the customers are related and what the strength of that relationship is. Nor is it possible to distinguish market mavens or highly influential customers from other customers, or even to see if there is negative feedback from one customer to another. As the company from which the data for this research was collected is already collecting richer and more detailed data on which customer was referred by which customer and how customers are linked, this is a great avenue for future research.

Results

To explore the relationship between the independent variables and the dependent variables, the analysis consists of the descriptive statistics, the BG/NBD calculation, independent samples t-tests, a correlation matrix and two separate regression analyses for the two DVs.

In order to perform this analysis the data must meet the assumptions of normality and independence. However, the kurtosis and skewness values of the data as shown in Appendix 1a indicated a non-normal distribution. Therefore some of the variables have been transformed using the natural logarithm to prepare them for the analysis. Even though some skewness and kurtosis remained, the transformations improved the normality of the distribution (Appendix 1b).

To make sure there was no bias due to outliers, the relevant z-scores were calculated using SPSS (appendix 3d); they showed no threat. The missing values in the dataset were coded 999. The reason the CBS could not provide data for some neighborhoods was that these neighborhoods were too small. If the characteristics of the cases are different, replacing missing values with means would create a larger bias according to IBM (2015). Therefore deleting cases pairwise or listwise was preferred over mean replacement. Pairwise deletion was preferred over listwise deletion because it was deemed unnecessary to delete the entire case: the missing values were limited to two control variables and amounted to 4 for distance to supermarket and only 1 for number of supermarkets.
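The handling of missing values and outliers can be sketched as follows. The code 999 is taken from the text; the function names and the example values are hypothetical, and the actual checks in this study were run in SPSS. The sketch assumes that 999 never occurs as a genuine observation.

```python
from statistics import mean, stdev

MISSING = 999  # missing-value code used in the dataset


def z_scores(values):
    """z-scores based on the non-missing values; missing cases stay None."""
    valid = [v for v in values if v != MISSING]
    m, s = mean(valid), stdev(valid)
    return [None if v == MISSING else (v - m) / s for v in values]


def pairwise_complete(xs, ys):
    """Pairwise deletion: keep a case only if both variables are observed."""
    return [(x, y) for x, y in zip(xs, ys) if x != MISSING and y != MISSING]


# Hypothetical example: one neighborhood lacks a distance value.
distance = [0.2, 0.4, MISSING, 0.6]
transactions = [10, 20, 30, 40]

z = z_scores(distance)
pairs = pairwise_complete(distance, transactions)
print(z, pairs)
```

Under pairwise deletion the third customer is dropped only from analyses that involve the distance variable, and is kept in all others.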

Descriptive statistics

The sample used in this research consisted of 7,868 transactions by 230 unique customers. The sample was taken from the 2008 and 2009 sales data of an online grocery retailer operating in the Netherlands. It includes only transactions from the Amsterdam area, including the periphery. Postal codes in this region range from 1011 to 1199. Postal codes in the Netherlands consist of four digits, representing the area, and two letters, representing the street or house. For the aim of this research the four-digit area level was deemed more fit. The selected sample only includes the sales that were made on Monday morning between 9 and 11 am. Even though a wider spread over geographical location, week and time of day would improve the validity of this research, the extra time and effort to process the data does not fit the scope of this research.

This sample is appropriate for the research question because it includes information on the neighborhood the customer lives or works in and specifically on the dates and frequency of all transactions. This information is necessary to calculate the expected transactions using the BG/NBD model. From the postal codes the characteristics of each neighborhood from which orders were placed can be derived using open access data of the Dutch Central Bureau of Statistics (CBS). A more extensive table containing descriptives of the neighborhoods, customers and transactions is provided in appendix 1a and 1b.

Table 4.1 Descriptives

| Variable | N | Minimum | Maximum | Mean | Standard Deviation |
|---|---|---|---|---|---|
| Forward looking transactions over the next year (E(x)t) | 230 | .07 | 135.51 | 34.71 | 33.89 |
| Expected average sales amount next year (E(x)t*Average Sales) | 230 | 5.85 | 32046.75 | 4654.63 | 5199.58 |
| Customer order number in Neighborhood | 230 | 1.00 | 17.00 | 4.77 | 3.90 |
| Single Customer in Neighborhood | 230 | .00 | 1.00 | .09 | .28 |
| Distance to Supermarket in Neighborhood | 226 | .20 | 2.40 | .67 | .47 |
| Number of Supermarkets in Neighborhoods | 229 | .00 | 158.80 | 50.01 | 41.58 |
| Density of Neighborhood per km2 | 230 | 146.00 | 11654.00 | 5687.96 | 3426.53 |

The number of forward looking transactions and the expected sales amount have a relatively large standard deviation. This can most likely be attributed to the large share of B2B purchases (table 4.2), as business customers are likely buying groceries for a lot more people at the same time, for business lunches et cetera. The data was selected from the Monday morning timeslot and many companies use this time to order their groceries for the rest of the week. The high percentage of business purchases also explains the high expected average sales amount (see table 4.1).

Table 4.2 B2C vs. B2B

| | Count | Percentage |
|---|---|---|
| B2C | 41 | 0.52% |
| B2B | 7827 | 99.48% |
| Total | 7868 | 100% |

BG/NBD calculations

To calculate the customer lifetime value the BG/NBD model was used. The parameters of the BG/NBD model are estimated in Microsoft Excel by maximum likelihood estimation, using the Solver add-in. These parameter estimates can be used to evaluate the degree of heterogeneity and the average defection and purchase rates. The results of the parameter estimation are shown in table 4.3 and interpreted following the same structure for calculations as used by Korkmaz, Fok and Kuik (2014).

Table 4.3 Results of the BG/NBD Maximum Likelihood Estimates

| Parameter | Entire customer base |
|---|---|
| N | 230 |
| r | 0.146 |
| α | 0.005 |
| r/α | 28.406 |
| a | 5.680 |
| b | 256.567 |
| a/(a+b) | 0.022 |
| log-likelihood | 22267.44204 |

The model shows that the purchase rate (r/α) of an average customer is 28.41. The heterogeneity in purchase rates can be observed by looking at the shape parameter r. The shape parameter r can be seen as an inverse measure of the concentration in purchase rates (Schmittlein, Cooper and Morrison, 1993); since r is small here, it can be concluded that the heterogeneity for this customer base is high. Within the BG/NBD model customers can only drop out at a repeat transaction moment (Fader, Hardie and Lee, 2005). To get an idea of the defection probability of an average customer the formula a/(a+b) can be used. For this customer base the chance that a customer defects at the next purchase is estimated to be 0.022. The large values of a and b suggest low dispersion in defection rates.
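The two summary measures can be recomputed from the parameter estimates in table 4.3 with a few lines of Python; the variable names are chosen for illustration only. Because the table reports rounded parameters, the recomputed purchase rate (about 29.2) deviates slightly from the reported 28.406, presumably because α is shown with only three decimals.

```python
# Rounded BG/NBD parameter estimates as reported in table 4.3.
r, alpha = 0.146, 0.005   # gamma distribution of purchase rates
a, b = 5.680, 256.567     # beta distribution of dropout probabilities

# Mean of the gamma distribution: average purchase rate.
mean_purchase_rate = r / alpha

# Mean of the beta distribution: average probability of defecting
# after a transaction.
defection_probability = a / (a + b)

print(round(mean_purchase_rate, 2), round(defection_probability, 3))
```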

T-tests

So are there statistically significant differences between neighbors and non-neighbors? In order to investigate if there is a difference between the two groups (single vs. multiple customers in the neighborhood), independent samples t-tests were used. These tests show whether there is a significant difference in the mean scores for the dependent variables E(x)t and Expected Sales.

The analysis revealed no significant difference in mean E(x)t and mean expected sales amount between single customers and neighborhood customers, with scores of t(228)=-.942, ns and t(228)=-.385, ns respectively. For both DVs Levene's test was not significant, so equal variances were assumed.

To elaborate on this a bit further, it has also been examined whether there are differences between neighborhoods with a higher or lower number of customers. Using different cut-off points shows there is no difference between the means of neighborhoods with more or fewer customers (table 4.4). None of these tests had a significant Levene's test, so for all the t-tests for the equality of means the variances are assumed to be equal.

Table 4.4 Independent Samples test for different neighborhood sizes

| Cut-off point | Dependent Variable | df | t-value | Significance |
|---|---|---|---|---|
| 3 customers | E(x)t | 228 | -.022 | 0.983 |
| 3 customers | Expected Sales | 228 | .589 | 0.521 |
| 5 customers | E(x)t | 228 | .262 | 0.794 |
| 5 customers | Expected Sales | 228 | .397 | 0.691 |
| 10 customers | E(x)t | 228 | -.474 | 0.636 |
| 10 customers | Expected Sales | 228 | -.962 | 0.337 |

The insignificant, and in part negative, t-values do not support the expectation that customers living in close spatial proximity have a higher customer lifetime value. This will need further assessment by looking at the correlation matrix and regression analysis.

Correlations

To further deepen the understanding of the relationships between the dependent, independent and control variables, the next step in the analysis is studying the correlation matrix. In this matrix (table 4.5) the Pearson correlation coefficients (r) and their significance are shown. The richness of the data allows the correlation matrix to also include the data used to calculate CLV (time of first and last purchase, frequency of purchasing) and additional neighborhood characteristics (population, income).
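For reference, the Pearson correlation coefficient and the t-value used to test its significance can be sketched as follows. The function names and the example data are hypothetical; the correlations in this study were calculated in SPSS.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)


def t_for_r(r, n):
    """t-value for testing r against zero, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r * r))


# Hypothetical data: later customers have fewer forecasted transactions.
order_number = [1, 2, 3, 4, 5]
forecasted = [50, 42, 30, 28, 10]
r_example = pearson_r(order_number, forecasted)

# A correlation of -.250 in a sample of 230 customers.
t_example = t_for_r(-0.250, 230)
print(round(r_example, 3), round(t_example, 2))
```

The second calculation shows why a modest correlation of -.250 is still significant in a sample of 230: the corresponding t-value is about -3.9.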

Table 4.5 Pearson Correlations

| Variable | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1. Forward looking transactions over the next year | - | | | | | | | | | | | |
| 2. Expected average amount of sales next year | .876** | - | | | | | | | | | | |
| 3. Customer order number in neighborhood | -.250** | -.225** | - | | | | | | | | | |
| 4. Single Customer in Neighborhood | .062 | .072 | -.298** | - | | | | | | | | |
| 5. Density of Neighborhood per km2 | .149* | .129 | .221** | -.137* | - | | | | | | | |
| 6. Number of Supermarkets in Neighborhoods | .033 | .049 | .191** | -.065 | .689** | - | | | | | | |
| 7. Distance to Supermarket in Neighborhood | -.143* | -.114 | .090 | -.021 | -.694** | -.484** | - | | | | | |
| 8. First Sale | .472** | .402** | -.495** | .428 | -.013 | -.075 | .016 | - | | | | |
| 9. Last Sale | -.724** | -.629** | .074 | -.039 | -.110 | -.010 | .187** | -.033 | - | | | |
| 10. Frequency of Sales | .799** | .703** | -.171** | .066 | .175** | .084 | -.115 | .367** | -.421** | - | | |
| 11. Total number of customers in the neighborhood | -.009 | -.032 | .674** | -.442** | .329** | .284** | .135* | .011 | .053 | .038 | - | |
| 12. Average Income x 1000 in Neighborhood | -.018 | -.062 | .407** | -.220** | .492** | .358** | -.319** | -.044 | .029 | .008 | .603** | - |
| 13. Population of Neighborhood | .008 | -.023 | -.017 | -.049 | .321** | .412** | -.380** | -.091 | -.027 | .039 | -.025 | -.182** |

**. Correlation is significant at the 0.01 level (2-tailed). *. Correlation is significant at the 0.05 level (2-tailed).

First of all, the DVs are similar to each other in their relationships to the other variables; the direction of the relationship is always the same and the size is roughly the same. Next to that, almost all of the correlation coefficients of the DVs have the same level of significance. The only exceptions were the r's of distance to the supermarket and population density. These control variables were significantly (p<.05) related to forecasted transactions, but not significantly related to sales amount. The density of the neighborhood showed a significant positive correlation with forward looking transactions and the distance to the supermarket showed a significant negative correlation. This suggests that the number of transactions is higher in more densely populated neighborhoods with a short distance to supermarkets.

The order number of a customer in a neighborhood showed a significant negative relation to both DVs, implying that a customer with a lower order number has relatively higher sales than a customer with a higher order number. Frequency, first sale and last sale only correlated significantly with the order number (apart from with each other and with the forecasted purchases and sales amount that were calculated using these variables). This result is somewhat different from the expectation based on the research of Schmitt, Skiera and Van den Bulte (2010). Contrary to the expectation, later customers are associated with a lower CLV than the earlier customers. Of course the limitation of this research is that social contagion between customers is only assumed and it cannot be established whether the earlier customer actually referred some of the later customers.

Single customers in a neighborhood did not show a significant correlation with forecasted sales or expected sales amount. This would imply, in accordance with the t-test, that single customers do not have lower CLVs. Single customers were associated with lower population density and lower income. Correspondingly, the total number of customers in a neighborhood shows a significant positive correlation with these neighborhood characteristics. This could imply that the size of the customer base is related to population density and income in a neighborhood, but individual CLV is not, which supports the ideas of Libai, Muller and Peres (2012) around acceleration and expansion of a customer base through word of mouth.

The first of the control variables, population density, was significantly positively correlated with the number of supermarkets in a neighborhood and significantly negatively correlated with the distance to the supermarket. Density of a neighborhood is related to higher forecasted transactions. Density in turn is related to a higher total number of customers, higher income, a higher number of supermarkets and a lower distance to the supermarket, which might imply more attractive neighborhoods. The correlation between higher density and higher forecasted transactions could be support for the idea that more spatial proximity leads not only to more customers, as in Drye (2011), but also to customers who spend more. But the correlation is not very high (r=.149) and the effect needs to be analyzed more carefully in a regression analysis, especially because denser neighborhoods were also associated with a higher total number of customers and higher income. Income showed no significant relation to forecasted transactions or sales amount, but it was positively and significantly correlated with the total number of customers in a neighborhood. Finally, less distance to supermarkets is associated with a higher number of forecasted transactions, as was said before, and a lower total number of customers in the neighborhood.

Multiple regression

In both analyses three models were tested. The first model contained only the control variables (the three neighborhood characteristics density, number of supermarkets and average distance to supermarkets). This was done to quantify the explanatory value of the IVs by looking at R-squared (R2). The second model included the independent variable of the nth customer, next to the control variables. This approximates social contagion between neighbors. The third model included the independent variable of single customer in a neighborhood, next to the control variables.


The aim of this analysis is to see how much of the variance in forecasted customer transactions and sales amount is explained by the independent variables and what their respective predictive values are. First of all, the regressions on the two DVs showed Durbin-Watson coefficients between 1.437 and 1.726, indicating slight but non-threatening positive autocorrelation given the sample size (Durbin and Watson, 1951). In addition, to make sure the predictors are not too strongly related to each other, multicollinearity was tested for. The average VIF values were close to 1 (Appendix 1c) and are therefore no threat to the analysis (Field, 2009).
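The Durbin-Watson statistic referred to above can be sketched in a few lines; the function name and the residuals below are hypothetical. Values close to 2 indicate no first-order autocorrelation of the residuals, and values below 2 indicate positive autocorrelation.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for a sequence of regression residuals."""
    # Sum of squared differences between successive residuals.
    numerator = sum(
        (residuals[i] - residuals[i - 1]) ** 2
        for i in range(1, len(residuals))
    )
    # Sum of squared residuals.
    denominator = sum(e * e for e in residuals)
    return numerator / denominator


# Hypothetical residuals that drift slowly, i.e. positive autocorrelation.
example_residuals = [1.0, 0.5, -0.5, -1.0, 0.0, 1.0]
dw = durbin_watson(example_residuals)
print(dw)
```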

Table 4.6 Coefficients (a)

| Model | Variable | B | S.E. | t | Sig. | R2 |
|---|---|---|---|---|---|---|
| 1 | (Constant) | 30.321 | 5.322 | 5.697 | .000 | .030 |
| | Density of Neighborhood per km2 | .002 | .001 | 1.599 | .111 | |
| | Distance to supermarket (log transformed) | -3.090 | 6.119 | -.505 | .614 | |
| | Number of supermarkets (square root transformed) | -1.369 | 1.107 | -1.236 | .218 | |
| 2 | (Constant) | 37.382 | 5.296 | 7.059 | .000 | .103 |
| | Density of Neighborhood per km2 | .004 | .001 | 3.235 | .001 | |
| | Distance to supermarket (log transformed) | 8.664 | 6.346 | 1.365 | .174 | |
| | Number of supermarkets (square root transformed) | -1.114 | 1.058 | -1.052 | .294 | |
| | Nth Customer in Neighborhood | -2.895 | .610 | -4.742 | .000 | |
| 3 | (Constant) | 28.724 | 5.463 | 5.258 | .000 | .037 |
| | Density of Neighborhood per km2 | .002 | .001 | 1.814 | .071 | |
| | Distance to supermarket (log transformed) | -1.922 | 6.180 | -.311 | .756 | |
| | Number of supermarkets (square root transformed) | -1.425 | 1.106 | -1.288 | .199 | |
| | Single Customer in Neighborhood | 10.253 | 8.100 | 1.266 | .207 | |

a. Dependent Variable: Forward looking transactions over the next year

The regression analysis on the DV forecasted transactions (table 4.6) revealed that the first independent variable of the regression model, the order number of the customer, can explain some of the variance in the dependent variable forecasted transactions: the R2 change due to including the IV is .073. The regression model of the order number of the customer on the forecasted number of transactions was found to be statistically significant, F(4,221)=7.491; p<.001. So looking at R2, the independent variable, apart from the control variables, can explain less than 10% of the variance in the dependent variable. The customer order number in the neighborhood has a significant negative impact on the forecasted number of transactions over the next year (t=-4.742; p<.001). The unstandardized coefficient B equals -2.895, which means that for each increase of 1 in the order number the expected number of transactions goes down by roughly 3 transactions. This is different from what was expected and was also seen in the correlation matrix. Two explanations based on the literature are possible. 1. The order number of the customers is too much a function of the time in which transactions could have taken place, showing how retention rate is the main driver of CLV (Gupta and Zeithaml, 2006), and the effect of social contagion does not overcome the effect of time. 2. The referring customers are actually worth more than the referred customers, supporting Walsh and Elsner (2011). Whatever the explanation, social contagion is not visible in this setting.

Adding the independent variable single customer in a neighborhood to the model instead of the order number of customers explains .007 (R2 change) more of the variance than the model with only control variables. This model was only marginally significant, F(4,221)=2.109; p=0.081, and thus not significant at the .05 level. In agreement with the t-tests and the correlation matrix, there was no statistically significant effect of single customer in the neighborhood on the DV.

In the models with only the control variables and in the models with the single-customer variable, density, which was correlated with forecasted purchases in the correlation matrix, showed no significant effect on forecasted purchases or sales amount; it only reached significance in the models that include the customer order number (tables 4.6 and 4.7). This would suggest that population density is not a robust predictor of future purchases. The other control variables showed no significant effects on the DVs.

Table 4.7 Coefficients (a)

| Model | Variable | B | Std. Error | t | Sig. | R2 |
|---|---|---|---|---|---|---|
| 1 | (Constant) | 6.754 | .396 | 17.055 | .000 | .030 |
| | Density of Neighborhood per km2 | .000 | .000 | .918 | .360 | |
| | Distance to supermarket (log transformed) | -.607 | .455 | -1.333 | .184 | |
| | Number of supermarkets (square root transformed) | -.111 | .082 | -1.342 | .181 | |
| 2 | (Constant) | 7.221 | .398 | 18.136 | .000 | .101 |
| | Density of Neighborhood per km2 | .000 | .000 | 2.359 | .019 | |
| | Distance to supermarket (log transformed) | .171 | .477 | .359 | .720 | |
| | Number of supermarkets (square root transformed) | -.094 | .080 | -1.177 | .240 | |
| | Customer order number in neighborhood | -.192 | | -4.175 | .000 | |
| 3 | (Constant) | 6.705 | .408 | 16.447 | .000 | .031 |
| | Density of Neighborhood per km2 | .000 | .000 | .999 | .319 | |
| | Distance to supermarket (log transformed) | -.571 | .461 | -1.238 | .217 | |
| | Number of supermarkets (square root transformed) | -.112 | .083 | -1.360 | .175 | |
| | Single Customer in Neighborhood | .313 | .604 | .517 | .606 | |

a. Dependent Variable: Total amount of sales by customer

As for the analysis of the other dependent variable, expected sales amount (Table 4.7), the regression model of order number of customer on expected sales amount was also found to be statistically significant, F(4,221)=5.822; p=0.00. The other IV, single customer in a neighborhood, did not yield a statistically significant model, F(4,221)=1.770, ns.

The order number of the customer in the neighborhood has a significant effect (t=-4.175; p=0.00). The unstandardized coefficient B equals -.192, and because this variable was logarithmically transformed this means that for each unit increase in customer order number the expected sales amount decreases by roughly 19.2 percent. There was no statistically significant contribution to the model for the variable single customer in the neighborhood (t=.517; p ns). The variance of the DV explained by adding this variable to the model was 0.1% (R2 change = .001).
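
The percentage reading of the coefficient can be made explicit with a short calculation. The sketch below assumes that the log transformation refers to the dependent variable and that it is a natural logarithm; neither is stated explicitly in the text, so the numbers are illustrative only.

```python
import math

# Reported unstandardized coefficient for customer order number in neighborhood
b = -0.192

# Assumption: the dependent variable is ln(expected sales amount), so a
# one-unit increase in order number multiplies expected sales by exp(b).
approx_change = b               # common shortcut: read 100*b as a percentage
exact_change = math.exp(b) - 1  # exact proportional change per unit increase

print(f"shortcut: {approx_change:.1%}, exact: {exact_change:.1%}")
```

Under these assumptions the exact decrease per additional order number is roughly 17.5 percent; the 19.2 percent figure is the usual small-coefficient approximation of the same effect.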

The second independent variable, representing the uniqueness of the customer in the neighborhood, proved to have little explanatory value, as the R2 change due to including it in the model was only .001. This means the explanatory value of being the single customer from a neighborhood is close to zero. In addition, the coefficient for single customer in the neighborhood on expected sales amount is not significant.
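
The R2 changes can be related back to the reported t-values with the standard F-change formula for nested regression models. The sketch below uses the rounded R2 values from Table 4.7 and the residual degrees of freedom from the reported F(4,221); the function name f_change is illustrative, and this calculation was not reported in the original analysis.

```python
def f_change(r2_reduced, r2_full, n_added, df_residual):
    """F statistic for the R2 increase when n_added predictors are added."""
    return ((r2_full - r2_reduced) / n_added) / ((1 - r2_full) / df_residual)

df_residual = 221    # from the reported F(4,221)
r2_controls = 0.030  # Model 1: control variables only

# Model 2 adds customer order number, Model 3 adds single customer
f_order = f_change(r2_controls, 0.101, 1, df_residual)
f_single = f_change(r2_controls, 0.031, 1, df_residual)

print(f"F change, order number: {f_order:.2f}")
print(f"F change, single customer: {f_single:.2f}")
```

For a single added predictor the F change equals the square of that predictor's t-value, so the results should be close to (-4.175)^2 and (.517)^2 respectively. Small differences are to be expected because the R2 values are rounded to three decimals.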

Discussion

This chapter is meant to reflect on the performed research by first summarizing the main conclusions. After this the research is discussed in terms of both ethical issues and limitations to the research. Finally, the avenues for future research and the key takeaways for managers and academics are discussed in the implications section.

Conclusion

The results show there is no empirical ground for more (or less) frequent transactions and a higher (or lower) sales amount for customers living in close spatial proximity to other customers. However, there is evidence that among customers living in close spatial proximity, the earlier customers are more profitable than the later customers. Comparing this to the literature, the most likely explanation is that the first customers were in fact also the most loyal customers with the highest retention rate, which has been shown to lead to higher CLV (Kumar and Shah, 2004; Gupta and Zeithaml, 2006). Referrals can also lead to higher retention (Weinberg and Berger, 2011), but it is not certain that these customers have actually been referred. These customers may have been purchasing at this firm even before the data was collected. In other words: time is the most important driver of CLV, not spatial proximity to other customers. In this research the nature of the ties and the relative influence of one customer on the other could not be studied, which may open up an avenue for future research.

The analysis also showed that density was related to a higher number of customers in a neighborhood, but did not affect customer profitability. Income was also related to the number of customers, but not to forecasted purchases, and it could not be included in the regression due to a high number of missing values. The population of a neighborhood was not correlated to the total number of customers, which implies that the total number of customers depends on how close together they live. Social contagion is not necessarily a referral from one customer to another; it can also be caused by social learning, competition or legitimation (Iyengar, Van den Bulte and Choi, 2011), so the effect of social contagion may be more visible in the total customer base than in the individual customer lifetime value.

Ethical issues

The ethics around the collection and analysis of data by companies to improve business performance is a complicated matter. According to Richards & King (2014) and Tene & Polonetsky (2012), “big data and big data analytics are a threat to privacy, confidentiality and identity, but it does spell the death of law”. One of the main objections is that the use of the data can lead to manipulation of a consumer's behavior (Labrecque, vor dem Esche, Mathwick, Novak, & Hofacker, 2013). Another issue is that the data can be individualized by malevolent agents and used for criminal purposes such as identity theft or blackmail (Tene & Polonetsky, 2012). Recent revelations (The Guardian, 2015) show the scale on which security agencies tap into company data to watch civilians and that an individual’s privacy is hardly respected by most governments. Therefore the collection and storage, as well as the interpretation and use, of data collected by a company need to be carried out in a very careful and safe manner, because “everyone has the right to the protection of personal data” (European Commission, 2014). An individual cannot be retraced from the personal data used in this research; however, in terms of a company’s approach to its customers, there is a risk that this research is used for undesirable practices: relatively low-value customers or groups of customers could be neglected or, worse, refused service based on the findings of the customer base valuation. For this reason it is important for companies to develop business ethics and for policymakers to determine laws around such practices.

Limitations

Like any research, this research has some limitations. The first limitation is that the operationalization of social contagion is narrow in nature; using only the interaction between neighbors excludes various other forms of interaction. One of these is shown in research on how innovation spreads, where Beugelsdijk & Cornet (2002) showed that the dispersion of an innovation occurs more through friends than through neighbors. Moreover, by using co-location data only, the nature of the tie between two customers could not be researched. Unfortunately there is no dataset readily available or easily collectible that incorporates the relationship between different customers. Nevertheless this research shows the higher value of customers that could be affected by social contagion, and thus still provides solid empirical evidence for the incorporation of interaction into the customer-base valuation, because the effect can only be larger for stronger relationships.

Another way in which this research is incomplete is that it fails to distinguish between different channels of communication; the interaction between neighbors can occur both online and offline, though intuitively one assumes that most occurs offline. As more and more communication takes place online through email and social networks, it would be an interesting avenue for future research to see to what extent the interaction is retraceable (e.g. Singh, Freeman, Lepri and Pentland, 2013) and whether it could be incorporated into research about customer-base valuation.

Finally, a threat to the generalizability of this research is that it analyzes the customer base of one firm in one industry over a specific period of time. The purchases were mostly B2B and all made on Monday between 9 and 11. Therefore the risk of omitted variable bias is present in this study. The type of firm, the industry and the
