
Understanding Company Customer Data Usage (A study of the “McKinsey Quarterly Global Survey on what marketers say about working online, 2011”)


Academic year: 2021


MSc Business Administration, specialization - Marketing

Understanding Company Customer Data Usage

(A study of the “McKinsey Quarterly Global Survey on what marketers say about working online, 2011”)

Hengda Zheng (student number 2127687) Yana Antonenko (student number 2244187)

Fregatstraat 60 Blekerslaan 4 C08

9642LD, Veendam 9724EJ, Groningen

Tel: +31 (0)648450538 Tel: +31 (0)644797700

Email: paulzheng.h.d@gmail.com Email: yana.antonenko@gmail.com

Under the supervision of:


Abstract

This paper investigates the question of firms’ customer data usage. Companies have different types of data, i.e., individual data on each customer or aggregate data on sales that they use to support marketing decisions. In this research we consider four customer data types:

1) aggregate (non-individual) sales or usage data,

2) sales or usage data for individual customers, but no information about the customer,

3) basic information on each customer, which can be linked with sales or usage data, and

4) detailed customer information, which can be linked to sales or usage data to create a “single customer view”.

The objective of this research is to identify company characteristics that drive customer data usage. We provide general and detailed outlooks using logistic regression (LR) and multinomial logit (MNL). For the LR estimation we combined data types 1, 2 and 3 and labelled it aggregate data. MNL is the main method, which accounts for differences in the usage between 4 customer data types. Both methods are estimated with a base level – data type 4 – individual data.

Data for analysis was collected from managers in 777 companies across a vast scope of world regions, industries and company sizes. The results of the MNL estimation show that those companies that use data type 1 hope to improve purchase influence through data analysis. Normally these companies experience a lack of leadership in analytics and a lack of infrastructure and IT, which prevent them from working with individual data. They tend to have an average revenue from online sales between 2% and 10%. With annual revenue of less than $1 billion, they can be assumed relatively small-scale businesses. Companies that choose to have customer data type 2 also lack the infrastructure and IT to analyse the data, and are predominantly from the Manufacturing and Financial services industries. The difference with companies using data type 1 is that companies with customer data type 2 use some social media, and have much smaller revenue (less than $10 million). Finally, companies that use customer data type 3 - similar to data type 1 - experience a lack of leadership in analytics and of infrastructure and IT, hope to influence the purchasing process, are involved in Manufacturing industry and use some social media.


Table of Contents

PART 1. INTRODUCTION
1.1 Companies’ usage of customer data
1.2 Research question statement
1.3 Scientific relevance

PART 2. DATA DESCRIPTION
2.1 Data for explanatory variables
2.2 Data for dependent variable

PART 3. LITERATURE REVIEW

PART 4. THE MODEL
4.1 Conceptual model for customer data type usage
4.2 Determinants of customer data type usage
4.2.1 Perceived benefits of analyzing customer data
4.2.2 Digital media and online tools
4.2.3 Internal challenges
4.2.4 Competitive advantage of data
4.2.5 Company revenue
4.3 Institutional variables

PART 5. RESEARCH METHODOLOGY
5.1 Variables used in the analysis
5.2 Validity and reliability of measures
5.3 Research methods
5.3.1 Logistic regression
5.3.2 Multinomial logit

PART 6. EMPIRICAL RESULTS
6.1 Logistic regression results
6.1.1 Face validity and robustness check
6.2 Multinomial logit results
6.2.1 Face validity and robustness check

PART 7. CONCLUSION
7.1 Outcomes of analysis
7.2 Managerial implications
7.3 Limitations and further research

REFERENCES


PART 1. Introduction

The following paper is based on the data of McKinsey & Company, retrieved from the McKinsey Quarterly global survey on what marketers say about working online, conducted in 2011.

The paper is organized in the following manner. Firstly, we introduce the topic of companies’ customer data usage, the research question and the scientific relevance of the paper. Secondly, we describe the data used in this research. Thirdly, we present a literature review of the research question, followed by a conceptual model and hypotheses for the constructs that we find are related to the proposed model. Next, we elaborate on the research development process and validity checks, as well as the econometric model and sample description. Empirical results are based on testing the model with two statistical techniques, logistic regression and multinomial logit, to account for potential effects on the dependent variable. Finally, we provide managerial implications and general remarks about the study. Because this paper is the joint work of two authors, we specify below the area of responsibility of each author.

Yana Antonenko
• Part 1. Introduction
• Part 3. Literature review
• Sections 2.2; 5.1; 5.3.2; 6.2

Hengda Zheng
• Part 4. The model
• Sections 2.1; 5.2; 5.3.1; 6.1
• Part 7. Conclusion

1.1 Companies’ usage of customer data

Recent developments in data mining technologies allow companies nowadays to obtain massive amounts of customer data, a phenomenon which is sometimes referred to as “big data”. According to McKinsey & Company’s (2012) recent study, most chief executives identify big data and analytics, digital marketing and social-media tools as three key trends in digital business. Here are some observations:

• Insights from big data are increasingly used to provide individualised customer offerings.
• Large amounts of data become an advantage rather than a drawback, because the computing power is technologically available.


However, big data also creates an information deluge, which makes it difficult for marketing managers to transform available information into useful insights for their businesses (Russell and Kamakura, 1994). As Day (2011, p.183) states:

“their [marketers’] strategies are not keeping up with the disruptive effects of technology-empowered customers; the proliferation of media, channel, and customer contact points; or the possibilities for microsegmentation”.

The latter, microsegmentation, is becoming possible because customer data is now also available at very detailed, or “micro” levels. Data can be linked to each individual customer, and technology allows companies to interact intensively with the customer as never before (Rust, Moorman, and Bhalla 2010). Rust, Moorman and Bhalla (2010) in their article invite the readers to rethink the concept of marketing, and advocate the importance of individual data in maximising customer lifetime value.

Analyzing customer data is an essential process for any company. Through analysis a company gets insights into its customer base and uses this knowledge as a support for a wide scope of marketing decisions (Fayyad and Uthurusamy 2002; Linoff and Berry 2011). The reason why companies want to have knowledge about their customers, is that it naturally allows them to learn from customer needs and preferences, design targeted products and services, and benefit from increased customer loyalty and profitability. As Kotler and De Bes (2003) summarise, marketing focuses on needs and the development of appropriate offerings to satisfy them.

So why do companies find themselves in a difficult situation when it comes to analyzing customer data? The answer could be that many companies lack the marketing capabilities to deal with the data (Day, 1994). Success in analyzing increasing amounts of customer data is based on the premise that a company has analytically-skilled employees who are able to work with the data (McGovern et al, 2004; Verhoef and Leeflang, 2009). However, finding such employees is nowadays often one of the major challenges companies face. Besides, collecting, analysing and maintaining customer data requires substantial investment in IT infrastructure. One CIO described his request to the executive committee for a new data center as follows:

“I have to go and tell them that I need a billion dollars, and in return I’m going to give them exactly nothing in new functionality— I’m going to allow them to stay in business” (Kaplan, Smolinski and Weinberg 2011).

1.2 Research question statement


now already working with individual data, while others seem to still prefer the aggregate type of customer data. The question that we asked ourselves in our research is: What characterizes the company that uses individual and aggregate types of customer data? Hence, the objectives of this study are to identify and interpret which type of firms use what type of customer data.

1.3 Scientific relevance


PART 2. DATA DESCRIPTION

Our analysis is based on McKinsey and Company’s 2011 global Internet-based survey concerning “what marketers say about working online”. The survey was targeted at managers of companies across a wide range of industries, regions and company sizes. It contained key questions concerning the type of data used by the company, the benefits that the company hoped to gain from analysing customer data, and lastly, how the data-driven customer insights affected the company’s competitiveness in the short term. Questions on attitudes towards online presence, social media and performance of online sales were also asked. These constitute the explanatory variables used in this research, and are described later in this chapter. Furthermore, a demographic profile of each company was obtained, containing the workplace location, type of industry, the company’s revenue size and the participant’s job title.

The sample contained 792 respondents. Of these, 15 participants did not fully complete the questionnaire and their observations were excluded from the analysis, leaving 777. A further 67 observations had missing values for the dependent variable, data type, which leads to a final sample of 710 observations.

2.1 Data for explanatory variables

Here we provide a descriptive analysis for some of the survey questions that we later used in the analysis. These are:

- Importance of data-driven insights for competitiveness
- Social media usage
- Importance of effective online presence
- Company annual revenue (in US dollars)
- Percentage of revenue from online sales
- Industry
- Region (by office)
- Main company focus

Firstly, concerning the importance of data-driven customer insights, 332 companies out of 710 sample observations consider data-driven customer insights as “very important” for their company’s competitiveness, whereas fewer than 60 companies regard them as “somewhat or not at all important” (see Figure 1).


Secondly, the sample companies are active in various industries, including business services, finance, high-tech/telecommunications, manufacturing, healthcare service and retail. Business services represents the largest group, with 154 observations (see Figure 2).

Figure 2. Industry

Thirdly, as illustrated in Figure 3, companies with revenue of less than $1 billion represent a substantial part of the sample, and can be classified as small-scale businesses. Companies with revenue of more than $1 billion are considered large-scale companies. The majority of companies are large-scale companies with revenue of over $30 billion (n=195).

Around 60% (n=424) of companies focus on the B2B market, and 30% (n=209) focus on B2C. An “effective online presence” is “extremely important” for 412 companies, “important” for 253 companies and “somewhat important” for 108 companies (see Figure 4). For 343 companies, their percentage of revenue from online sales is less than 2%, whereas 175 companies report that their percentage of revenue from online sales is between 2% and 10% (see Figure 5). Finally, 119 companies report that their online revenue is more than 10% of annual revenue. The companies generally neglect the importance of social media: only 202 companies regard it as an important tool (see Figure 6). Lastly, most of the respondents’ company offices are located in Europe (n=291), followed by North America (n=227), and then Developing markets (n=168) and Asia (n=91) (see Figure 7).

Figure 3. Company annual revenue (in US Dollars)

Figure 4. Importance of effective online presence (somewhat important: 108; important: 253; extremely important: 412)

Figure 5. Percentage of revenue from online sales (less than 2%: 343; between 2% and 10%: 175; more than 10%: 119)

Figure 6. Social media usage (important tool: 202; some use: 362; don't use: 204)

Figure 7. Region (by office) (Europe: 291; North America: 227; Developing markets: 168; Asia: 91)

2.2 Data for dependent variable

As illustrated in Figure 8, within the total 710 observations, a large percentage of companies (38%) have “basic customer data, linked to usage/sales data”, followed by “we have detailed customer information, but do not know who the customer is” (19%), “we have detailed customer information, linked to usage/sales data to create a ‘single customer view’” (18%) and “we only have aggregated (non-individual) sales/usage data” (17%).


PART 3. Literature review

In the academic literature, companies’ usage of customer data has received moderate attention. The literature review shows that, in general, articles tend to discuss various marketing-mix aspects of either aggregate or individual level data usage, for instance the brand choice implications derived from scanner data (Bucklin, Russell and Srinivasan, 1998). Many companies prefer to use aggregated data for the purpose of assessing marketing campaign performance or calculating market share (Chen and Yang, 2007; Bucklin and Gupta, 1992; Guadagni and Little, 1983). At the same time, only a few studies address the question of comparing individual versus aggregated data type usage (Russell and Kamakura, 1994). Dong and colleagues (2009), for instance, compare the value of targeting at an individual as opposed to an aggregate level of customer data. Their conclusion for the pharmaceutical domain is that firm profitability increases by 14-23% when targeting an individual rather than a segment. However, given that these results are valid only for one industry, it is not possible to say explicitly that individual type data is inherently better than aggregate type data.

Mass marketing versus direct marketing

A further literature review shows that in today’s competitive world, mass marketing is losing out to direct marketing, which becomes more beneficial for companies. Leeflang and Wittink (2000) state that future success lies in providing individualised products. Rossi, McCulloch and Allenby (1996) also empirically support the notion that targeted marketing will gain much more prevalence in the future. Besides, service is becoming more individualized, complicated and important (Clemens et al., 2011).


Customer data collection

One of the most powerful supports for direct marketing is data mining, which provides an in-depth knowledge of customers (Ling and Li, 1998). Many scholars refer to data mining techniques when talking about analyzing data (Linoff and Berry 2011; Loveman 2004). Data mining is described as “a process for exploring large amounts of data to discover meaningful patterns and rules” (Linoff and Berry 2011, p. 2), and is also known under many other names such as, business intelligence, competitive intelligence and marketing intelligence. It is a process of what Linoff and Berry (2011, p.10) call “turning data into information, information into action, and action into value”. Besides, they claim that “organizations who want to excel at using their data to improve their business do not view data mining as a sideshow” (2011, p. 2).

Data mining refers to any data companies possess; however, the focus of this paper is on customer data. As was mentioned earlier, the increase in customer data available, in particular scanner-based data (Leeflang et al. 2000, p. 301) and data from social media, has created a task for companies to convert this customer data into knowledge about their customers (Linoff and Berry 2011). Leeflang et al. (2000, p.305) advocate the importance of having marketing management support systems that allow the automated analysis of scanner data. As Shaffer and John (1995) illustrate with the example of target coupons, statistical software is important, since it allows companies to target selected households with a high degree of accuracy. In addition, Leeflang and colleagues (2000, p. 301) emphasize the importance of having “good” data as a prerequisite for meaningful customer insights. To this end, they propose several constructs, such as availability, quality, variability and quantity, which the collected data should comply with.

Furthermore, customer data can be scattered around different departments within the company. In this case, cross-functional integration becomes essential (Rust, Zeithaml and Lemon 2000), and companies need to develop a “network” of insights. Forsyth, Galante and Guild (2006) from McKinsey and Company also suggest that this network should include not only data on attitudes, behavior or customer transactions, but also insights from expert third parties.

Benefits and challenges of analyzing customer data


requires knowledge of individual-level preferences, which, in turn, can be drawn out from customer data analysis. However, customization also has its drawbacks, for instance, implementation challenges or insufficient customer information. Besides, there are internal challenges that could prevent companies from using certain types of data, for example, a lack of analytically-skilled marketers (Verhoef and Leeflang, 2009). Another possible reason can be the lack of financial resources since processing and storage of individual data might come at a cost for a company (Bodapati and Gupta 2004). Overall, the question is, as Rossi, McCulloch and Allenby (1996) put it, whether companies want to collect complete and detailed profiles for their customers. They argue that decreasing costs of information processing and storage will make it easier to collect purchase history data of, for example, individual customers. So, the decision will depend on company characteristics and its marketing strategy.

Data types used for innovation

While data mining techniques and data warehousing technologies are based on quantitative data, qualitative data is used less often. Indeed, the McKinsey and Company survey data (see Figure 9) shows that marketing decisions now rely heavily on quantitative data, while qualitative data is left aside.

However, qualitative data is believed to be one of the sources for breakthrough innovation in the company. Nagji and Tuff (2012) state that successfully managing innovation could help a firm to better use its resources, and make innovation a driver of the firm’s growth. In the following discussion, we introduce the topic of data types used for innovation and discuss it in more detail.

Figure 9. Types of customer data (%) used in companies

One of the functions of marketing is to drive the creation of new products that meet customer needs (McGovern et al. 2004). This allows companies to improve the quality of products and services, and decrease their price (Hauser 2006). In this way, the internal processes of discovering customer needs and cost optimisation force the company to innovate. However, external factors play an equally important role in pushing companies to invest in innovation. In particular, global competition, increasing customer expectations and the necessity to stand out from competitors lead companies to heavily invest in innovation in hope of yielding a new stream of successful offerings (Nagji and Tuff 2012). The case of Procter & Gamble illustrates how innovation helps the company to generate billions of dollars, which it re-invests for future innovation. The money that Procter & Gamble invested for innovation is, on average, 50% more than its major competitors (Brown and Anthony 2011).

When talking about innovation, one can distinguish different types. For example, McKinsey & Company’s analysis distinguishes between line extension without innovation, marginal innovation, incremental innovation and breakthrough innovation. Their analysis of product launches reveals that for breakthrough innovation less data-driven insights are used (see Figure 10). Data-driven insights are used mostly for line extensions without innovation, marginal and incremental innovation. Meanwhile, for breakthrough innovation, which has the highest impact on category sales, it is qualitative data (i.e. market trends reports, focus groups with consumers etc.) that is used.

Figure 10. Data-driven insights used for innovation (Figure adopted from


PART 4. The model

In this part of the paper we present a conceptual model. In addition, we describe model constructs in more detail.

4.1 Conceptual model for customer data usage

In the introduction we mentioned that our study is, to the best of our knowledge, the first empirical test of customer data usage. Due to the limited number of published empirical articles on the topic, we first developed model constructs following intuitive reasoning, and later researched the proposed constructs to find support in the academic literature. As a result, the proposed conceptual model is a brand new formation.

The conceptual model is derived in Figure 11. In the model, the main variable of interest is the type of customer data used by companies, which includes:

1) aggregate (non-individual) sales or usage data,

2) sales or usage data for individual customers, but no information about the customer,

3) basic information on each customer, which can be linked with sales or usage data, and

4) detailed customer information, which can be linked to sales or usage data to create a “single customer view”.
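To make the outcome variable concrete, the four data types listed above can be treated as categories 1 to 4. The following is a minimal sketch, not the authors' code, of the binary recoding that the abstract describes for the logistic regression: types 1, 2 and 3 are combined as aggregate data and type 4 (individual data) serves as the base level. The names `DATA_TYPES`, `BASE_LEVEL` and `to_binary_outcome`, the choice of coding aggregate data as 1, and the example responses are all assumptions made for illustration.

```python
# Hypothetical coding of the four customer data types (labels abbreviated).
DATA_TYPES = {
    1: "aggregate (non-individual) sales or usage data",
    2: "individual sales or usage data, no customer information",
    3: "basic customer information linked to sales or usage data",
    4: "detailed customer information (single customer view)",
}

BASE_LEVEL = 4  # individual data, the base level named in the abstract


def to_binary_outcome(data_type: int) -> int:
    """Return 1 for aggregate data (types 1-3) and 0 for the base level (type 4).

    The 1/0 direction is an assumption; only the grouping comes from the paper.
    """
    if data_type not in DATA_TYPES:
        raise ValueError(f"unknown data type: {data_type}")
    return 0 if data_type == BASE_LEVEL else 1


# Illustrative responses only; these are not survey data.
responses = [1, 3, 4, 2, 4]
binary = [to_binary_outcome(r) for r in responses]
```

For the multinomial logit, the four categories would be kept separate, with type 4 again acting as the reference category.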

We believe that these four types of data can be influenced by six groups of predictors.

Firstly, based on the finding that benefits of using individual or aggregate data vary (Chen and Yang 2007), we consider perceived benefits of analysing the data as one of the predictors for customer data type used in a company. Since there are a number of benefits, this construct consists of 5 subgroups:

• Purchase influence
• Driving innovation
• Customer loyalty
• Analysis
• Driving increased sales volume

The next predictor is what we label “internal challenges”.

Secondly, data linked to an individual customer can bring deep customer insights; however, this type of data is difficult to obtain, and its processing is expensive (Guadagni and Little 1983; Bodapati and Gupta 2004). Based on this argument, companies might face internal constraints, e.g., a lack of quality data to analyse or a lack of resources, which will influence the type of customer data companies choose. Similar to the benefits construct, internal challenges also consist of subgroups:

• Lack of resources
• Lack of leadership in analytics
• Lack of analytical talent
• Lack of infrastructure and IT

The third predictor is a group of variables related to digital media and online tools. Since nowadays companies can obtain almost real-time individual data from social media (Berinato 2010), we include a social media usage variable. We hypothesise that importance of effective online presence and revenues from online sales have a relationship with the type of customer data that a company has. Fourthly, following Linoff and Berry’s (2011) statement that forward-looking companies perceive data mining as a very important process, we measure the importance of data-driven customer insights for a firm’s competitiveness, and label it the “Competitive advantage of data” construct. Linoff and Berry (2011) also mention that large-sized companies are using data mining techniques to be able to establish one-to-one relationships with customers – something that small sized companies have an inherent advantage in. Fifthly, following this and Bodapati and Gupta’s (2004) proclamation of substantial financial resources required for using individual customer data, we measure company revenue as a potential indicator for company size and an influencer of choice of a certain customer data type. Finally, to control for other variables, we include institutional variables which we label “firm characteristics”. These include the company’s region by office, its focus and the industry in which it operates.

In our conceptual model, we expect variables to have different relationships with the usage of customer data type (see Table 1). For instance, we expect that the more important a company perceives data-driven customer insights to be for its future competitiveness (H4), the more inclined it will be to choose data at an individual rather than aggregate level. For the challenges predictor, we expect the opposite, i.e. when a company lacks infrastructure and IT, it is expected to have a preference for the aggregate type of customer data.

Figure 11. Conceptual model (constructs and hypotheses recovered from the figure)

Dependent variable: Data type

Perceived benefits (H1a,b,c,d,e): Purchase influence; Driving innovation; Customer loyalty; Analysis; Driving increased sales volume

Digital media and online tools (H2a,b,c): Social media usage; Effective online presence; Revenue from online sales

Internal challenges (H3a,b,c,d): Lack of resources; Lack of leadership in analytics; Lack of analytical talent; Lack of infrastructure and IT

Competitive advantage of data (H4)

Company revenue (H5)

Institutional variables (firm characteristics): Region; Industry; B2B/B2C

4.2 Determinants of customer data usage

As outlined in the conceptual model, we believe there are 6 groups of determinants that can be reasons why companies have a certain customer data type. These are:

1) the perceived benefits that companies see from analyzing the data,

2) digital media and online tools (i.e. companies’ perceived importance of online presence, or social media usage),

3) internal challenges that companies face in building their analytical capabilities,

4) competitive advantage of data,

5) company revenue, and

6) institutional variables.

We discuss these in more detail in this part of the paper.

4.2.1 Perceived benefits of analysing customer data

We identify and discuss herein 5 benefits that stem from analyzing the data: purchase influence, driving innovation, customer loyalty, analysis and driving increased sales.

Purchase influence

Following the renowned model of the consumer buying process, there are 5 stages of the buying process: (1) problem recognition, (2) information search, (3) evaluation of alternatives, (4) purchasing decision, and (5) post-purchase behaviour (Kotler, Shalowitz and Stevens 2008). Companies are able to influence the consumer at different stages of the buying process. For instance, at the problem recognition and information search stages, companies can influence the consumer by increasing awareness of the companies’ products and services. Customers usually buy products according to their previous experience or interaction with the firms (Erdem et al., 1996; Feinberg et al., 1992), so it is important to understand how each customer feels about the firms’ products (Edell and Burke, 1987). This knowledge can be used for influencing the consumer at the purchasing decision stage. Analysis of scanner panel data shows that consumer behaviour changes over time, and suggests a relationship between current choice probabilities and past purchases (Erdem and Keane, 1996; Feinberg, Kahn, and McAlister, 1992; Guadagni and Little, 1983). An example of the purchase influence benefit is Harrah’s Entertainment, one of the most successful companies in the casino industry in the USA. Through data-based marketing and analytical tools the company has discovered customers’ individual potential demand and has transformed the way customers make a purchasing decision (Loveman, 2003). On the basis of this discussion, we hypothesise the following:

H1a: Companies that realise the purchase influence benefit, stemming from data analysis, choose more


Driving innovation

With the marketing department losing its influence within the firm, innovativeness is one of the keys to regain marketing’s reputation in the firm (Verhoef and Leeflang, 2009; Dodgson, 2006). The degree of company innovativeness has a link with the customer data type a company uses (see discussion in Section 2.3). For line-extension, marginal and incremental innovation, which require fast response measures, we hypothesise that aggregate customer data will be more appropriate.

The importance of innovation is illustrated with the example of Procter & Gamble. Being one of the top companies in the world, P&G focuses on open innovation, which generates more than 35% of the overall innovation and billions of dollars in revenue (Huston and Sakkab, 2006). However, open innovation is not easy to achieve; it requires a large amount of investigating, analyzing and testing (Dodgson, 2006). For this reason, we believe firms will use different data types in order to drive innovation. However, we generally do not expect results for this hypothesis, since there is no distinctive variable related to innovation in the current research. We still hypothesise the following:

H1b: A high degree of company innovation is related to its preference for aggregate data rather than individual customer data.

Customer loyalty

Loyalty programs have become widely-used, because companies realise loyal customers usually generate more profitability (Gupta, Lehmann and Stuart 2004). The result of a McKinsey two-year study of customer attitudes in various industries shows that loyal customers could spend more on the firms’ products and might also bring in new customers (Coyles and Gokey 2002). It might be that companies are looking for loyal customers because serving a loyal customer costs approximately five or six times less than attracting a new customer (Gupta and Zeithaml, 2006). Furthermore, customer data from a Spanish retailer shows that loyal customers’ purchase behaviour is usually consistent once they become loyal customers (Gomez, Arranz and Cillan 2006). A good marketing relationship with customers usually helps firms to obtain a greater market share and more revenue, whilst reducing costs (Ndubisi, Malhotra and Wah, 2009), which are the underlying reasons for the emergence of customer relationship management (CRM) (Reinartz et al. 2004). Thus, we believe that satisfying loyal customers and providing tailored services are related to individual data. Based on this discussion we hypothesise the following:

H1c: Companies that hope to get more loyal customers through data analysing, prefer to have


Analysis

The influence of the marketing department has decreased in recent decades (Verhoef and Leeflang 2009). It is suggested that the marketing department should focus more on providing appropriate metrics to support long-term marketing and management decision making, through which firms could achieve genuinely good performance (Webster et al. 2005). Growing pressure on marketing to contribute to shareholder value has brought metrics to centre stage (Grewal et al. 2009). Rust, Lemon and Zeithaml (2004) claim that “leading marketing companies consider this problem so important that the Marketing Science Institute has established its highest priority for 2002-2004 as ‘Assessing Marketing Productivity (Return on Marketing) and Marketing Metrics’”.

Without a doubt, customers are the most important assets for any organization and customer performance has a direct link with firm financial performance. However, it is reported that not many metrics are clearly defined to measure customer performance (Gupta and Zeithaml 2006). Thus, we believe a relationship between the level of data and metrics for decision making and firm performance exists. So, the following hypothesis has been formed:

H1d: Companies that are able to develop accurate metrics and models for the decision-making process prefer to have individual customer data rather than aggregate data.

Driving Sales

Data analysis can help companies increase sales volume, and thus generate more profits. For an example of how a paper company leveraged its profits through data mining, see Watson et al. (2002). Customer insights generated from data analysis can help the company to stimulate repeat purchases and cross-sell products (Ansari and Mela 2003), leading to increased sales.

Today’s customers are faced with massive amounts of information, which prevents them from finding the products or services that best satisfy their needs. However, with the fast development of technologies and click-web information gathering, firms can now easily collect more personalised and detailed information about each customer and provide individualised products or services (Davenport et al. 2011). A customer might become loyal after being satisfied with the services or products, thus generating more sales volume for firms or bringing in more customers (Ndubisi, Malhotra and Wah 2009; Bougie et al. 2003; Gelbrich 2010). Customer relationship management has emerged under the umbrella of technology. With CRM, firms can establish relationships with customers and find interesting stories and useful information in the customer data for setting up marketing campaigns, thus contributing to sales (Tanner et al. 2005). So, it would be interesting to investigate what kind of data firms use in order to drive sales volume.

4.2.2 Digital media and online tools

Digitalisation of the world, the availability of new digital tools and the prominence of social media all influence the customer data type available at the company. Therefore, we identify the following three dimensions of the digital media and online tools construct: online presence, social media usage, and revenues from online sales.

Online presence

The emergence of new technologies is now opening more opportunities for firms to increase profits. Internet technologies allow firms to easily customise, because there is a lot of customer data available, allowing them to provide content that is highly relevant to the consumer (Ansari and Mela 2003). Besides, Internet development has opened a new channel for online sales.

The research from Köhler and colleagues (2011) shows that both existing customers and newcomers could get more familiar with a firm’s services with the help of online agents, which will further help firms to understand customer attitudes and behaviour, and therefore satisfy, in particular, new generations of customers. However, companies might not easily get personal information online, because around 95% of customers do not trust online business and thus are not willing to provide useful personal information (Hoffman, Novak and Peralta 1999).

Srinivasan and Moorman (2005) studied the effect of the number of online shops on customer satisfaction across 106 online retailers. Their findings suggest that for online retailers with a moderate number of online shops, customer relationship management (CRM) could strongly improve customer satisfaction, whereas for companies with a low or high number of online shops, customer satisfaction would be less achievable. Customer data mining through, for example, a loyalty program could bring benefits for firms (Gomez, Arranz and Cillan 2006; Gupta and Zeithaml 2006). Therefore, we formulate the following hypothesis:

H2a: If companies consider effective online presence important, they are more likely to prefer individual data.

Social media usage

The rapidly growing phenomenon of social media is now helping firms to better fulfil their strategies. Generally speaking, marketers could obtain near-real time free customer data from social media, such as Twitter (Berinato 2010).

into a company’s activities, and thus provide a new way of marketing the firm’s products (Ginovsky 2011). It transforms the way the products are perceived by consumers. Besides, the outcome of Colliander and Dahlen’s (2011) study indicates that the use of social media generates higher brand attitudes and purchase intentions. Therefore, we hypothesise the following:

H2b: Companies that use social media prefer to have individual customer data rather than aggregate data.

Revenues from online sales

Online shopping is gaining popularity because of its convenience, additional variety and, most importantly, lower prices. It was reported that 900 million dollars were generated from online sales in the U.S., which was 9% of total retail sales compared with 5% in 2007 (Rigby 2011). Online shopping is becoming widely popular not only because of its convenience, but also because it offers tailored products based on customers’ previous purchase experience (Rigby 2011). Research by Choi and Bell (2011) shows that customers who are looking for “preference minorities” usually prefer shopping online because they know that offline shopping targets customers en masse, and they cannot usually get individually tailored products or services there. By having online shops, firms generate 50% more sales because of “individual preference”, where preference is isolated within a certain area (Choi and Bell 2011). Therefore, we form the following hypothesis:

H2c: The higher the revenues from online sales, the higher the possibility of a company having individual data.

4.2.3 Internal challenges

There are a number of challenges that companies face in building their analytical capabilities. These are: lack of resources, lack of leadership in analytics, lack of analytical talent, and lack of infrastructure and IT.

Lack of resources

following hypothesis is established:

H3a: A lack of resources (time, funding and quality data) leads to firms having a preference for aggregated data over individual data.

Lack of leadership in analytics

According to Watson and colleagues (2002), supervisors and managers are very important for firms’ data analysis. Without support at management level, companies are less likely to use data mining techniques to better serve customers. The importance of top managers for data analysis is also confirmed by Bughin et al. (2011), who state that senior support is vital for building data-driven competitiveness. Therefore, we propose the following hypothesis:

H3b: Companies that have a lack of leadership in analytics will probably have more aggregate type customer data than individual data.

Lack of analytical talent

The massive amounts of data available to companies have created the problem of who will actually deal with the data. For instance, companies nowadays lack analytically competent marketing talent able to analyze the data (McGovern et al. 2004), which can be a result of a past failure to track market movements and respond to them by developing a corresponding marketing capability (Hamel and Välikangas 2003). The marketing field, which has conventionally encouraged creativity and out-of-the-box thinking in marketers, is nowadays faced with a lack of employees with analytical skills and expertise in IT, finance and data analysis (McGovern et al. 2004; Verhoef and Leeflang 2009).

Three aspects are relevant in this discussion. Firstly, firms are lacking employees with analytical skills. One might argue here that there are employees in the financial department of any company who possess analytical capabilities and could potentially be engaged in customer data analysis. However, research (Wiesel et al. 2011) shows that financial analysts, on average, are not good marketers, because they are unable to interpret data in relation to marketing activities. Moreover, if no practical implications can be drawn from data analysis, it has no value for the company (Leeflang and Wittink 2000). Secondly, the lack of analytically talented employees might be a result of the inability of the HR department to identify the required skills in potential candidates. Thirdly, it might be difficult for a company to attract appropriate candidates due to the relatively small supply of this type of specialist in the market. Individual data requires more time and a greater commitment of resources to analyze than aggregate data, and we therefore hypothesise the following:

H3c: Companies that lack analytically-talented employees prefer aggregated data rather than

Lack of infrastructure and IT

McKinsey Quarterly (2011) reported that investing in IT and data exploration helps the financial industry to gain value creation opportunities. IT is so important because all the data mining used for identifying potential customers, customer purchasing behaviour and loyalty depends on IT tools and infrastructure. Without them, firms could barely analyse any type of data.

Research shows a strong correlation between IT capital accumulation and labour productivity, and this is particularly true for the U.S. economy. IT-related industries in the U.S. economy see a huge increase in gains, while non-IT-related industries do not display strong growth in productivity (Stiroh 2001). In this way, the following hypothesis is formed:

H3d: Companies that lack proper infrastructure and IT probably use more aggregate customer data rather than individual data.

4.2.4 Competitive advantage of data

Data mining provides opportunities for firms to discover unsatisfied needs and as a result to improve services (Fayyad and Uthurusamy 2002). Through data mining, firms can extract personal information and preferences of individual customers and use it to customise product offerings, which will lead to higher customer satisfaction (Wong and Chung 2008).

One of the components of brand value is consumers’ opinions of certain brands. A customer-based view is valuable for firms because customers are the drivers of firms’ financial gains (Lassar et al. 1995). Research by Sriram and colleagues (2007) showed that aggregated store-level data gave clear evidence of the drivers of brand equity, and that understanding these drivers could help firms to recognise the impact of different activities and, consequently, to allocate company resources better. Therefore, the following hypothesis is formed:

H4: Companies that perceive data-driven customer insights important for their future competitiveness prefer to have individual types of customer data rather than aggregate data.

4.2.5 Company revenue

and usage of data type is established:

H5: High revenue firms have individual customer data types rather than aggregate data.

4.3 Institutional variables

Complementary to the constructs mentioned above, we include firm characteristics that may affect the type of data used by companies. These are region by office, company focus and industry. Unlike the previous variables, we do not formulate any explicit hypotheses about these variables. In the discussion below we present each included variable and provide the rationale for including it in the analysis.

Region by office

Customer demand for products is heterogeneous across world regions. Depending on, for instance, local culture and consumer lifestyles (Hofstede 2001), or readiness to embrace technology (Parasuraman 2000), consumers differ in their preferences for products and services. The higher value of the technology readiness index in developed markets suggests that product or service users there are more advanced in their consumption. Therefore, in developed markets it might be easier to gather personal data, compared with developing markets. Glaeser, Kolko and Saiz (2001) report that, generally speaking, a wide range of product variance exists in larger markets, where an increased need for individualism exists. In this sense, large markets might have a preference for individual data. So, companies might need to use different customer data types in order to serve customers in different world regions.

Company focus

Industry

Differences might exist between the types of data companies use, depending on the industry they are involved in. For example, Rafalski et al. (2002) state that data mining insights are very important for the health industry. Since targeting at the individual level results in higher profitability in the health industry (Dong et al. 2009), individual data might be preferable in this industry. Besides, the example of Harrah’s Entertainment, which we mentioned earlier, suggests that companies in the entertainment industry might be well served by individual data.

Clemens et al. (2011) state that customers increasingly want personalised service. By offering a satisfying level of service, firms can have a healthy and long-term relationship with customers, who might finally become loyal. Therefore, for the service industries we expect to have more individual type data.

In the preceding discussions, we elaborated on our conceptual model and the underlying relationships. An overview of the hypotheses and the expected direction of the variables are presented in Table 1.  

Main variables | Expected direction
Perceived benefits
Purchase influence | -
Driving innovation | -
Customer loyalty | -
Analysis | -
Driving increased sales | -
Digital media and online tools
Importance of effective online presence | -
Revenues from online sales | -
Social media as a tool to achieve business objectives | -
Competitive advantage of data
Importance of data-driven customer insights for competitiveness | -
Internal challenges
Lack of proper infrastructure and IT tools | +
Lack of analytical talent | +
Lack of resources | +
Lack of leadership in analytics | +
Annual revenue
Less than $10 million | +
At least $10 million but less than $250 million | +
At least $250 million but less than $500 million | +
At least $500 million but less than $1 billion | +
At least $1 billion but less than $10 billion | -
At least $10 billion but less than $20 billion | -
More than $30 billion | -

“+”: aggregated data; “-”: individual data; NS: not significant

PART 5.

Research Methodology

In this part of the paper, we elaborate on the data collection methods, sample description, variables used in the model, validity and reliability of measures. In addition, we provide reasoning for the choice of research methods - logistic regression and multinomial logit - used in our analysis.

5.1 Variables used in the analysis

Herein we discuss the measurement scales of independent variables used in the analysis. A summary can be found in Appendix 1. We have transformed (see Appendix 2) the original independent variables for several reasons.

Firstly, there is a substantial number of missing values within the observations, particularly for the firmographic variables. In total, there are 279, 144 and 44 missing values for industry, company main focus and revenue respectively. Following Hair et al.’s (2010, p. 42) suggestion, we analysed the pattern of the missing data and concluded that the companies might actually be involved in several industries or focus on both B2B and B2C, or that the question was outside the respondent’s competence. Missing values are also found in the “Importance of effective online presence” (n=4), “Revenues from online sales” (n=140), and “Social media as a tool to achieve business objectives” (n=9) variables. One possible reason for these missing values could be that the respondent is not from the financial department. However, no clear pattern was discovered in the missing data. Because complete exclusion of the missing data would drastically reduce the sample size, we decided to include missing values in the analysis.

In order to increase the number of observations for each of the categories within variables, some of the levels were grouped. The “Do not know” or “Not applicable” categories in “Importance of having online presence”, “Social media as a tool to achieve business objective”, and “Companies’ annual revenue” are combined with the least category (see Appendix 2).

Secondly, Churchill (1979) states that combining items in the measurement scale can lead to a better distinction between respondents, and to improved reliability. Based on this premise, and also to ease the interpretation, we combined the levels within some variables. In this way, for example, for the variable “Importance of effective online presence”, the “do not know” category is grouped with “somewhat important”, and “very important” - with “extremely important”. The rest of the variables’ transformations are shown in Appendix 2.
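The regrouping described above can be illustrated with a small recoding sketch. The two groupings follow the example given for “Importance of effective online presence”; the name of the merged level, the mapping itself and the sample responses are hypothetical and are not taken from the survey file or from Appendix 2.

```python
# Hypothetical recoding map for "Importance of effective online presence".
# Only the two groupings named in the text are encoded; other labels pass through.
RECODE = {
    "do not know": "somewhat important",
    "somewhat important": "somewhat important",
    "very important": "very/extremely important",
    "extremely important": "very/extremely important",
}


def recode(responses):
    """Return the grouped level for each raw response (unknown labels are kept)."""
    return [RECODE.get(r.lower(), r) for r in responses]


# Illustrative responses, not survey data.
sample = ["Extremely important", "Do not know", "Very important", "Somewhat important"]
grouped = recode(sample)
print(grouped)
```

Grouping sparse levels in this way raises the number of observations per category, at the cost of no longer being able to separate the effects of the merged levels.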

categories: lack of resources, lack of leadership in analytics, lack of analytical talent and lack of infrastructure and IT.

Lastly, Hosmer and Lemeshow (2000) suggest the use of a mean value when variables have a large variability, since otherwise it is difficult to describe the relationship between dependent and independent variables. Following this suggestion, we calibrated the model so that the revenue variable was transformed from categorical to continuous. However, due to the fact that for the continuous variable the effect within each level would not be recognized, the decision to keep it as a categorical variable was made.

5.2 Validity and Reliability of Measures

The chi-square test (p-value ≤ 0.05), which leads us to reject the null hypothesis that two variables are independent, shows that the perceived benefits are related to each other, as are the internal challenges (see Appendix 3). Additionally, factor analysis, which aims to find any underlying pattern between variables (Hair et al. 2010, p. 94), was conducted to test whether common factors can be formed for the perceived benefits from data analysis and for the challenges in building analytical ability. The KMO value, which indicates whether the data are suitable for factor analysis and which Hair et al. (2010, p. 105) suggest should be at least 0.5, is too low for both perceived benefits (.178) and internal challenges (.494). Although factor analysis provides no strong statistical evidence for grouping the variables, we still combine them based on the results of the chi-square tests and managerial sense.
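As a minimal sketch of the chi-square test of independence mentioned above, the following computes the Pearson statistic for a cross-tabulation of two binary items. The counts are hypothetical (they are not taken from Appendix 3), and the p-value shortcut used here is valid only for one degree of freedom.

```python
import math


def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c count table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


# Hypothetical 2 x 2 counts: rows = benefit A chosen / not chosen,
# columns = benefit B chosen / not chosen.
counts = [[30, 10], [20, 40]]
stat, df = chi_square_independence(counts)

# For df = 1 the chi-square upper-tail probability equals erfc(sqrt(stat / 2)).
p_value = math.erfc(math.sqrt(stat / 2))
print(round(stat, 3), df, p_value < 0.05)
```

A p-value at or below 0.05 means that independence of the two items is rejected, which is the sense in which the items are described as related.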

Additionally, because multicollinearity might affect estimation results, the inter-correlation between variables was checked. Correlation was analysed with variance inflation factor (VIF) instead of Pearson correlation, because variables are of categorical nature, and the latter is usually used for continuous variables (Pallant, 2005). The collinearity diagnostics show that most of the VIF values are below 5 (see Table 2), which allows us to conclude that there are no severe forms of multicollinearity between the variables (Leeflang et al., 2000; Hair et al., 2010).
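To make the VIF values easier to read, the sketch below inverts the standard identity VIF = 1 / (1 - R²), where R² comes from regressing one predictor on all the others. The VIF figures are copied from Table 2; the function name is hypothetical and the snippet does not re-estimate anything from the survey data.

```python
def implied_r_squared(vif):
    """Invert VIF = 1 / (1 - R^2) to get the R^2 of a predictor regressed on the others."""
    return 1.0 - 1.0 / vif


# VIF values as reported in Table 2 (a subset).
reported_vif = {
    "Driving increased sales volume": 1.448,
    "Purchasing influence": 1.281,
    "Industry": 1.045,
}
for name, vif in reported_vif.items():
    print(f"{name}: VIF={vif}, implied R^2={implied_r_squared(vif):.3f}")

# A VIF of 5, the cut-off used in the text, corresponds to R^2 = 0.8.
threshold_r2 = implied_r_squared(5.0)
```

Even the largest of these values implies that well under half of a predictor's variance is explained by the remaining predictors.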

Variable | VIF | Variable | VIF
Perceived benefits | | Internal challenges |
Purchasing influence | 1.281 | Lack of resources | 1.171
Driving increased sales volume | 1.448 | Lack of leadership in analytics | 1.213
Driving innovation | 1.373 | Lack of analytical talent | 1.245
Customer loyalty | 1.251 | Lack of infrastructure and IT | 1.236
Analysis benefit | 1.263 | |
Institutional variables | | Digital media and online tools |
Industry | 1.045 | Social media usage | 1.191
Region (by office) | 1.067 | Importance of effective online presence | 1.095
Company main focus | 1.147 | Importance of data-driven customer insights for

5.3 Research Methods

The two research methods used in this paper are logistic regression and multinomial logit. In this section we provide rationales for using them. Logistic regression serves as a safeguard for multinomial logit, which is our main estimation method. In addition, we describe the econometric models used in the analysis.

5.3.1 Logistic regression

Logistic regression has long been used for estimating choices (Winterich and Barone 2011; Maimaran and Simonson 2011). One of its most significant advantages is that it does not rely on the strict assumptions of linear regression: it requires neither a normally distributed dependent variable nor homogeneity of variance, and it places no restrictions on the types of independent variables (Hair et al. 2010). Logistic regression also guarantees that the predicted probabilities fall between 0 and 1 (Hair et al. 2010). However, according to Hair et al. (2010, p. 415), its disadvantage is that it needs a large sample size to support estimation. In our analysis, the ratio of variables to observations meets the 1:15 criterion (Hair et al. 2010, p. 176), which is enough for estimating the model.

The aim of applying logistic regression to our research is to provide general information about the company’s customer data usage. The original dependent variable consists of 4 customer data types, which we label by levels. The four levels are:

Level 1: We have aggregated (non-individual) sales or usage data.

Level 2: We have sales or usage data for individual customers, but no further information who the customer is.

Level 3: We have basic information on each customer and can link it to sales or usage data.

Level 4: We have detailed customer information and can link it to sales or usage data to create a “single customer view”.

One of the most distinctive characteristics of logistic regression is that its outcome variable is binary or dichotomous (Hosmer and Lemeshow 2000). Hence, the original four data types were merged into two levels:

(1) individual data type was combined from Level 2, 3 and 4, (2) aggregated data represented Level 1.

Econometric model

Based on our conceptual model, we formulate the econometric model for the logistic regression as follows:

         1

Data type (DT) is the dependent variable, with 0 standing for individual data and 1 for aggregate data. The reference category for the dependent variable is individual data. BF are the five perceived benefits that companies desire to gain from analysing the customer data. CH are the four internal challenges companies face in the next two to four years in building analytical abilities. RV stands for the seven levels of company revenue. SM is the social media usage. FC is the importance of data-driven customer insights for competitiveness in the following two to four years. OR stands for the companies’ online revenues, and OP represents the importance of effective online presence. RG refers to the locations of the companies, while CF stands for the companies’ focus. Finally, ID represents the 5 industries. For the base cases of the independent variables, please see Appendix 1.
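As a sketch only, a binary logit that is consistent with the variable definitions above could be written as follows. The coefficient symbols, the firm index i and the vector notation are assumptions made for illustration; each bold term stands for the set of dummy variables of that construct, measured against its base case.

```latex
\begin{align}
P(DT_i = 1) &= \frac{\exp(U_i)}{1 + \exp(U_i)}, \\
U_i &= \alpha
      + \boldsymbol{\beta}_{1}'\,\mathbf{BF}_i
      + \boldsymbol{\beta}_{2}'\,\mathbf{CH}_i
      + \boldsymbol{\beta}_{3}'\,\mathbf{RV}_i
      + \boldsymbol{\beta}_{4}'\,\mathbf{SM}_i
      + \boldsymbol{\beta}_{5}'\,\mathbf{FC}_i \notag \\
    &\quad
      + \boldsymbol{\beta}_{6}'\,\mathbf{OR}_i
      + \boldsymbol{\beta}_{7}'\,\mathbf{OP}_i
      + \boldsymbol{\beta}_{8}'\,\mathbf{RG}_i
      + \boldsymbol{\beta}_{9}'\,\mathbf{CF}_i
      + \boldsymbol{\beta}_{10}'\,\mathbf{ID}_i .
\end{align}
```

Under this form, a positive coefficient raises the probability of aggregate data (DT = 1) and a negative coefficient raises the probability of individual data (DT = 0).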

5.3.2 Multinomial logit

According to Gensch and Recker (1979), multinomial logit has long been used for choices and performs better than the commonly used linear regression for individual choice. Also the multinomial logit (MNL) model is typically used when there is no clear-cut ordering of the outcomes (Soon 2009). Ribar (1993) stated that one of the most serious problems for multinomial logit is the independence of irrelevant alternatives (IIA). Also, the coefficients from different levels are difficult to interpret.

or usage data. In this light, we would like to emphasize that in our research we believe that data type is a function of each firm’s individual characteristics. The data type that the firm chooses brings the highest utility in comparison to other data types, and therefore, ranking cannot be applied here. Besides, existing literature advises that, when the ranking does not carry obvious objectivity, an outcome variable should be treated as unordered (Borooah, 2006). In the worst case scenario, when an ordered variable is treated as an unordered outcome variable, the result can be the loss of efficiency, but the estimates are unlikely to be biased. The opposite situation could, on the contrary, lead to a more serious error of biased estimates (Borooah, 2006).

Therefore, to sum up, there are two main reasons why we chose unordered rather than ordered multinomial logit:

1) the data type a company uses is every firm’s own choice

2) there is a less biased outcome in the case of a specification mistake.

Econometric model

Based on our conceptual model, we formulate the econometric model for the multinomial logit as follows:

, , , , , , ,
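As a sketch only, an unordered multinomial logit with Level 4 as the base case could take the following general form. The symbols U_{ik}, \alpha_k, \boldsymbol{\beta}_k and \mathbf{x}_i are assumptions made for illustration; \mathbf{x}_i is assumed to stack the same explanatory variables as in the logistic regression (BF, CH, RV, SM, FC, OR, OP, RG, CF and ID).

```latex
\begin{align}
P(DT_i = k) &= \frac{\exp(U_{ik})}{1 + \sum_{m=1}^{3} \exp(U_{im})}, \qquad k = 1, 2, 3, \\
P(DT_i = 4) &= \frac{1}{1 + \sum_{m=1}^{3} \exp(U_{im})}, \\
U_{ik} &= \alpha_k + \boldsymbol{\beta}_k'\,\mathbf{x}_i .
\end{align}
```

Each data level k has its own coefficient vector, so a coefficient is read as the change in the log-odds of level k relative to Level 4.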

PART 6.

Empirical Results

In this part we present the results of the estimation for both logistic regression and multinomial logit. After that, the model’s goodness-of-fit is presented. Additionally, to test the reliability of the results, face validity and the robustness check are carried out for both analyses.

6.1 Logistic regression results

Variables | Beta | Exp(Beta)
1 Internal challenges
Lack of leadership in analytics | 0.437* | 1.547
Lack of infrastructure and IT | 0.539* | 1.714
2 Industry
Financial | 0.224 | 1.2551
High Tech/Telecom | 0.018 | 1.018
Manufacturing | 0.581** | 1.788
Health/Social service/Pharmaceuticals | 0.349 | 1.417
Retail/Wholesale | -0.286 | 0.751
Other | 0.363 | 1.437
3 Region (by office)
Asia/Pacific | 0.725** | 2.065
Europe | 0.219 | 1.245
North America | 0.131 | 1.14
4 Annual revenue
Less than $10 million | 0.082 | 1.085
At least $10 million but less than $250 million | 0.571 | 1.771
At least $250 million but less than $500 million | 0.739** | 2.094
At least $500 million but less than $1 billion | 0.706 | 2.026
At least $1 billion but less than $10 billion | 0.241 | 1.273
At least $10 billion but less than $20 billion | -0.065 | 0.937
At least $20 billion but less than $30 billion | 0.581 | 1.789
5 Percentage of revenue from online sales***
Between 2% and 10% | -0.715 | 0.975
More than 10% | -0.025** | 0.489
6 Social media usage
Is an important tool | 0.005 | 1.005
Use some of social media | 0.532** | 1.702

Model fitting information
Pseudo R-Square (Cox & Snell) | 0.055
Pseudo R-Square (Nagelkerke) | 0.087
-2 Log Likelihood | 659.311
Classification table (overall percentage) | 80.70%
n = 710

The logistic regression (see Table 3) shows that both “Lack of leadership in analytics” (p<0.05) and “Lack of infrastructure and IT” (p<0.05) are associated with a preference for aggregate data. Facing these challenges increases the log-odds of choosing aggregated data by 0.437 and 0.539 respectively (odds ratios of 1.547 and 1.714).
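The link between the Beta and Exp(Beta) columns of Table 3 can be checked with a short sketch: exponentiating a logit coefficient gives the odds ratio. The two coefficients and the table values are copied from Table 3; the small differences are assumed to come from rounding of the reported coefficients.

```python
import math

# Coefficients (Beta) and odds ratios (Exp(Beta)) as reported in Table 3.
reported = {
    "Lack of leadership in analytics": (0.437, 1.547),
    "Lack of infrastructure and IT": (0.539, 1.714),
}

# Exponentiating a logit coefficient gives the multiplicative change in the odds.
odds_ratios = {name: math.exp(beta) for name, (beta, _) in reported.items()}
for name, (beta, table_value) in reported.items():
    print(f"{name}: exp({beta}) = {odds_ratios[name]:.3f} (table: {table_value})")
```

An odds ratio of about 1.7 means that the odds of holding aggregate rather than individual data are roughly 70% higher for firms reporting the challenge, other variables held constant.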

Compared with the Business services industry, the Manufacturing industry is more likely to favour aggregate data (p<0.1, Beta=0.581). The “Asia/Pacific” coefficient (p<0.1) indicates that aggregated data is more suitable in that region: by choosing aggregated data, companies in the Asia/Pacific region could increase their utility by 0.725 in comparison with companies in developing markets (e.g. South Africa, Egypt).

For companies whose annual revenue is between $250 million and $500 million (p<0.1), the utility of choosing aggregated data instead of individual data increases by 0.739.

Table 3. Estimation results for logistic regression

Base case:
1 Challenge not chosen
2 Business/Legal/Professional
3 Developing markets
4 More than $30 billion
5 Less than 2%
6 Do not use social media

*. Significant at .05 level **. Significant at .1 level ***.

The likelihood-ratio test shows that “revenues from online sales” is a significant variable. Negative relationships are found between “revenues from online sales” and the usage of aggregated customer data. If a company’s revenues from online sales are between 2% and 10% (p<0.1), it could generate more utility (Beta=-0.715) by using individual data. When “revenues from online sales” increase to more than 10%, a slightly higher possibility of choosing individual data is found (Beta=-0.025). Finally, a positive relationship between “some use of social media” and “data types” is identified (p<0.1), showing that aggregated data is more useful (Beta=0.532).

The reason why “revenues from online sales” is retained in the model, is because of LL test results. According to Hair et al. (2010), the result of the Log Likelihood test can identify any variable that might be significant as a whole to improve the model goodness-of-fit. Our result (see Table 4) shows that both “lack of infrastructure and IT” and “revenues from online sales” are significant under this condition. Therefore, both of them are included in the analysis.

Model | -2LL | Df | Df difference | -2LL difference | Accepted or not
Full model | 651.595 | 38 | | |
Reduced model
without Purchase influence | 652.928 | 37 | 1 | 1.333 | No
without Driving sales volume | 652.018 | 37 | 1 | 0.423 | No
without Driving innovation | 651.893 | 37 | 1 | 0.298 | No
without Customer loyalty | 652.581 | 37 | 1 | 0.986 | No
without Analysis | 652.208 | 37 | 1 | 0.613 | No
without Lack of resources | 651.653 | 37 | 1 | 0.058 | No
without Lack of leadership in analytics | 655.422 | 37 | 1 | 3.827 | No
without Lack of analytical talent | 651.687 | 37 | 1 | 0.092 | No
without Lack of infrastructure and IT | 657.861 | 37 | 1 | 6.266 | Yes
without Industry | 657.436 | 32 | 6 | 5.841 | No
without Region | 656.26 | 35 | 3 | 4.665 | No
without Main focus | 652.686 | 35 | 3 | 1.091 | No
without Annual revenue | 658.492 | 31 | 7 | 6.897 | No
without Percentage of revenue from online sales | 659.651 | 36 | 2 | 8.056 | Yes
without Social media usage | 656.445 | 36 | 2 | 4.85 | No
without Effectiveness of online presence | 651.952 | 36 | 2 | 0.357 | No
without Importance of data-driven customer insights for competitiveness | 654.037 | 34 | 4 | 2.442 | No

Table 4. Log likelihood test for logistic regression
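The decisions in Table 4 can be reproduced approximately with a short likelihood-ratio sketch: the difference in -2LL between a reduced model and the full model is compared with a chi-square distribution whose degrees of freedom equal the number of dropped parameters. The -2LL values and degrees of freedom are copied from Table 4; the helper `chi2_sf` is a hypothetical stdlib-only implementation for integer degrees of freedom, and the 0.05 threshold is an assumption about the decision rule behind the last column.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variate with integer df (closed forms)."""
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k < df/2} (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= (x / 2) / k
            total += term
        return math.exp(-x / 2) * total
    # Odd df: erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2) * sum_k x^(k-1) / (1*3*...*(2k-1))
    term, total = 1.0, 0.0
    for k in range(1, (df - 1) // 2 + 1):
        if k > 1:
            term *= x / (2 * k - 1)
        total += term
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2) * total


full_model_2ll = 651.595
# (-2LL of the reduced model, df difference) for selected rows of Table 4.
reduced = {
    "Lack of leadership in analytics": (655.422, 1),
    "Lack of infrastructure and IT": (657.861, 1),
    "Percentage of revenue from online sales": (659.651, 2),
    "Annual revenue": (658.492, 7),
}

p_values = {}
for name, (two_ll, df_diff) in reduced.items():
    lr_stat = two_ll - full_model_2ll  # likelihood-ratio statistic
    p_values[name] = chi2_sf(lr_stat, df_diff)
    print(f"{name}: LR = {lr_stat:.3f}, df = {df_diff}, p = {p_values[name]:.4f}")
```

With these inputs only the two variables marked “Yes” in Table 4 fall below 0.05, while “Lack of leadership in analytics” lies just above it.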

technique to our model. The cumulative lift in Figure 12 illustrates that our model is performing better than a random model.

As an additional measure of the model’s performance, we split the database into 10 equal groups and ranked them according to their average response values (see Figure 13). The groups show an exponentially declining trend, indicating that our model performs well. Moreover, the overall classification matrix value, which measures how well group membership is allocated (Hair et al. 2010, p. 421), shows a hit ratio of 80.7%. Compared with a chance probability of 50%, this indicates adequate model performance.

Figure 12. Cumulative curve for logistic regression (model selection versus random selection)

Figure 13. Estimated response probabilities for logistic regression (mean response value by group)
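The ten-group ranking and the cumulative curve described above can be sketched as follows. The scores and outcomes are hypothetical and deterministic (they are not the survey data), the function name is an assumption, and the sketch assumes that the number of observations is divisible by the number of groups.

```python
def decile_table(probabilities, outcomes, groups=10):
    """Sort by predicted probability (descending), split into equal groups, and
    report mean predicted probability and cumulative share of actual responders."""
    ranked = sorted(zip(probabilities, outcomes), key=lambda pair: pair[0], reverse=True)
    size = len(ranked) // groups
    total_responders = sum(y for _, y in ranked)
    rows, captured = [], 0
    for g in range(groups):
        chunk = ranked[g * size:(g + 1) * size]
        captured += sum(y for _, y in chunk)
        rows.append({
            "group": g + 1,
            "mean_probability": sum(p for p, _ in chunk) / len(chunk),
            "cumulative_share": captured / total_responders,  # model selection curve
            "random_share": (g + 1) / groups,                 # random selection line
        })
    return rows


# Hypothetical scores and outcomes: 100 observations.
probs = [(i + 1) / 101 for i in range(100)]
actual = [1 if (p > 0.6 or i % 5 == 0) else 0 for i, p in enumerate(probs)]
table = decile_table(probs, actual)
for row in table:
    print(row)
```

A model outperforms random selection when the cumulative share of responders captured in the first groups exceeds the corresponding random share.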

6.1.1 Face validity and robustness check

The results of logistic regression show that four out of five significant variables are consistent with the expected signs (see Table 1). Therefore, our model meets the criteria for face validity.

According to White and Ju (2002), one of the most popular ways of detecting model robustness is to remove or add the “core” variables to see whether a major change in parameter values occurs. The model is not robust if any large change in parameters is found, and vice versa. We formed seven reduced models to compare with the full model. The results show that no major changes in parameters occurred, and the signs were generally in line with the full model (see Appendix 4). Therefore, we consider that our model is robust.

6.2 Multinomial logit results

                                                         Data levels
Variables                                                Level 1   Level 2   Level 3

Benefits and Challenges
1 Purchase influence                                     0.51**    0.33      0.40**
2 Lack of leadership in analytics                        0.69*     0.25      0.40**
3 Lack of infrastructure and IT                          1.00*     0.61*     0.58*
4 Region (by office)
  Asia                                                   0.68     -0.12     -0.10
  Europe                                                 0.29     -0.02      0.20
  North America                                          0.84*     0.90*     0.90*
5 Industry
  Financial                                             -0.19     -0.81**   -0.47
  High Tech/Telecomm                                    -0.06     -0.48      0.01
  Manufacturing                                          0.08     -0.74**   -0.74**
  Healthcare/Social services/Pharmaceuticals             0.30     -0.76      0.17
  Retail/Wholesale                                      -0.02      0.31      0.25
  Other                                                  0.38     -0.25      0.10
6 Revenue from online sales
  More than 10%                                         -0.18     -0.39     -0.24
  Between 2% and 10%                                    -0.59**    0.06      0.15
7 Social media usage
  Use some social media                                 -0.36     -1.60*    -1.08*
  Social media is an important tool                     -0.05     -0.21      0.04
8 Competitive advantage of data***
  Data-driven customer insights are very important      -0.68     -0.51      0.57
  Data-driven customer insights are somewhat important  -0.12      0.34      0.86
9 Annual revenue
  Less than $10 million                                 -0.29     -0.91**   -0.30
  At least $10 million but less than $250 million        0.28     -0.66     -0.20
  At least $250 million but less than $500 million       0.30     -0.67     -0.57
  At least $500 million but less than $1 billion         1.47**    0.76      0.92
  At least $1 billion but less than $10 billion          0.19     -0.19     -0.02
  At least $10 billion but less than $20 billion        -0.32     -0.68     -0.27
  At least $20 billion but less than $30 billion         0.45     -0.50      0.00

Base case:

1 Benefit not chosen
2 Challenge not chosen
3 Developing markets
4 Business/Legal/Professional services
5 Less than 2%
6 Do not use social media
7 Data-driven customer insights are not important
8 Revenue more than $30 billion

Model fitting information

Pseudo R-Square (Cox and Snell) .177

Pseudo R-Square (Nagelkerke) .190

McFadden .073

n = 710

Notes:

* Significant at 0.05 level
** Significant at 0.1 level
*** Significant based on log likelihood ratio

Level 1: Aggregated (non-individual) sales or usage data
Level 2: Sales or usage data for individual customers, but no further information
Level 3: Basic information on each customer
Level 4 (base case): Detailed information on each customer


The aim of the multinomial logistic regression is to examine in more depth which factors influence the choice among the different customer data types. Table 5 shows that “purchase influence” has a significant effect on Level 1 (aggregated sales or usage data) and Level 3 (basic information on individual customers that can be linked to sales or usage data), compared with the base case, Level 4 (detailed information on each customer). If companies consider the “purchase influence” benefit important, the utility of using Level 1 data increases (Beta=0.51).

“Lack of leadership” is significant for two levels: Level 1 (p<0.05) and Level 3 (p<0.1).

Also, “Lack of infrastructure and IT” is significant across all levels at p<0.05. A lack of leadership or of infrastructure and IT makes aggregated data more suitable: for both challenges the largest utility increase is for Level 1 (Beta=0.69 for “lack of leadership” and Beta=1.00 for “lack of infrastructure and IT”).
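To make the size of these coefficients more concrete, the following sketch converts the reported betas for “Lack of infrastructure and IT” into odds ratios and into choice shares under the standard multinomial logit formula. The coefficients are taken from the table above; the equal baseline utilities are an assumption of this sketch, because the intercepts are not reported, so the resulting shares are illustrative only.

```python
import math
from typing import Dict

# Reported coefficients for "Lack of infrastructure and IT".
# Level 4 is the base case, so its coefficient is 0 by construction.
BETA_INFRA: Dict[str, float] = {
    "Level 1": 1.00, "Level 2": 0.61, "Level 3": 0.58, "Level 4": 0.0,
}


def odds_ratios(betas: Dict[str, float]) -> Dict[str, float]:
    """exp(beta): multiplicative change in the odds of each level versus the base case."""
    return {level: math.exp(beta) for level, beta in betas.items()}


def choice_probabilities(utilities: Dict[str, float]) -> Dict[str, float]:
    """Multinomial logit shares: exp(V_j) / sum over k of exp(V_k)."""
    top = max(utilities.values())  # subtract the maximum for numerical stability
    exp_v = {level: math.exp(v - top) for level, v in utilities.items()}
    total = sum(exp_v.values())
    return {level: e / total for level, e in exp_v.items()}


# ASSUMPTION: equal baseline utilities (intercepts are not reported in the table).
baseline = {level: 0.0 for level in BETA_INFRA}
with_challenge = {level: baseline[level] + BETA_INFRA[level] for level in BETA_INFRA}

print(odds_ratios(BETA_INFRA))
print(choice_probabilities(baseline))
print(choice_probabilities(with_challenge))
```

Under the equal-baseline assumption, choosing this challenge raises the Level 1 share from 25% to roughly 37% and lowers the Level 4 share from 25% to roughly 14%. With the actual intercepts the shares would differ, but the direction of the shift would be the same.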

The results also suggest that using social media usually leads companies to have more aggregated data. Relative to the Level 4 base case, companies that use some social media see their utility decrease by 1.60 for Level 2 and by 1.08 for Level 3 (both p<0.05).

If companies recognise the importance of data-driven customer insights for competitiveness, individual-level data brings more utility than aggregated data.

When company revenue is less than $10 million per year, choosing Level 2 (p<0.1) results in negative utility (Beta=-0.91) compared with companies with more than $30 billion in revenue. For companies with revenue between $500 million and $1 billion, choosing Level 1 (p<0.1) increases utility by 1.47. These results generally indicate that small-scale companies prefer to use aggregate data compared with large-scale companies.

The results also indicate a significant influence of the region North America (p<0.05), which shows that individual data is preferable (Beta=0.84 for Level 1 and 0.90 for Levels 2 and 3). Among the industries, Manufacturing is significant for Level 2 (p<0.1) and Level 3 (p<0.1), which shows that by choosing individual data, companies decrease utility by 0.74 at both levels (Beta=-0.74). A negative effect of using individual data (Level 2) also exists within the financial industry (p<0.1, Beta=-0.81).


It can be noted from the analysis results that the R2 of the logistic regression (R2 = 0.087) is much lower than that of the multinomial logit (R2 = 0.190). This serves as evidence for the choice of multinomial logit as the main estimation method. However, the still relatively low R2 value can be explained by the cross-sectional nature of the data, which makes it difficult to fit a good model (Reisinger 1997).
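For reference, the three pseudo R2 measures reported for the multinomial logit can all be computed from the log-likelihoods of the null and fitted models. The sketch below uses the standard definitions; the two log-likelihood values are hypothetical and are not taken from the study. They were picked only so that, with n = 710, the output lands close to the reported figures.

```python
import math
from typing import Dict


def pseudo_r2(ll_null: float, ll_model: float, n: int) -> Dict[str, float]:
    """Cox and Snell, Nagelkerke and McFadden pseudo R-squared from log-likelihoods."""
    cox_snell = 1.0 - math.exp(2.0 * (ll_null - ll_model) / n)
    max_cox_snell = 1.0 - math.exp(2.0 * ll_null / n)  # upper bound of Cox and Snell
    return {
        "cox_snell": cox_snell,
        "nagelkerke": cox_snell / max_cox_snell,  # rescaled so the maximum is 1
        "mcfadden": 1.0 - ll_model / ll_null,
    }


# HYPOTHETICAL log-likelihoods, for illustration only (not reported in the study).
result = pseudo_r2(ll_null=-947.0, ll_model=-878.0, n=710)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

Because Nagelkerke rescales Cox and Snell by its maximum attainable value, it is always the larger of the two. McFadden's measure is typically much smaller than either for the same model.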

6.2.1 Face validity and robustness check

Six out of eight significant variables have signs that are in line with our expectations (see Table 1). Therefore, the multinomial logit model has satisfactory face validity.
