
Decline in paid subscriptions


Academic year: 2021


A lead scoring study

Master thesis Marketing Intelligence 13-01-2020

by

Joep Heijink S2743450

Munnekeholm 18, 9711JA, Groningen +316 313 45 286

j.heijink@student.rug.nl

Internal supervisor: Prof. dr. J. Wieringa, j.e.wieringa@rug.nl

Second internal supervisor: Dr. P. S. van Eck, p.s.van.eck@rug.nl

External supervisor: Floor Schouwman, floor.schouwman@ndcmediagroep.nl

University of Groningen, Faculty of Economics & Business

Department of Marketing


Summary

NDC mediagroep faces declining revenue from fixed subscriptions. This study was conducted to gain insight into this problem and to provide suggestions for a more effective promotion of paid subscriptions. The central question in this study is: “How can lead scoring affirm that NDC channel users opt for a fixed subscription, instead of a trial subscription?” Prior to the lead scoring model, a multinomial logistic regression was conducted to establish the underlying relation between the independent variables and the dependent variable outcomes. The dependent variable consists of three categories: no active subscription, a promotional subscription and a fixed subscription. The data set has been divided into two separate data sets. The first consists of all users for whom website usage and possible newsletter subscriptions were known. The second is the full data set of all users, with only account history variables present. For the lead scoring model, machine learning has been applied to a training data set and evaluated on an evaluation data set. The optimal method appears to be the conditional inference tree method. The proposed lead scoring model predicts which type of subscription is most suited to each user it analyzes, based on the available data. With the optimal method, this can be done with an accuracy of approximately 93%. This in turn can help in deciding on the best proposition for each specific user. Instead of randomly approaching users and offering certain subscriptions, NDC can use the predictions of the model to target those who are predicted to have a fixed or promotional subscription. By implementing this method of deciding whom to approach, NDC can increase targeting efficiency and reduce wasted sales resources.

Following the conclusion of this research, it seems logical in the current situation to only target those who are likely to opt for a fixed or promotional subscription. Users who have had a trial subscription appear to not be in this target group.


Preface


Table of contents

Summary
Preface
1. Introduction
2. Theory
   2.1 Conversion
   2.2 Lead scoring
   2.3 Website behavior
   2.4 Newsletter engagement
   2.5 Account history
   2.6 Conceptual model
3. Methodology
   3.1 Population
   3.2 Sample
   3.3 Data collection
   3.4 Variables
   3.5 Data cleaning
   3.6 Analyses
      3.6.1 Initial analyses
      3.6.2 Lead scoring
4. Results
   4.1 Descriptive statistics
   4.2 Multinomial model assumptions
   4.3 Final multinomial model
   4.4 Mediation effect
   4.5 Lead scoring
   4.6 Hypothesis testing
5. Conclusions
   5.1 Conclusion
   5.2 Discussion
6. Recommendations
7. References
8. Appendices
   Appendix I: separation test outcome
   Appendix II: improved VIF test output
   Appendix III: lead scoring output


1. Introduction

Shifting focus from the physical to the digital channels has been both challenging and promising for the publishing sector. The Wall Street Journal (2016) has shown that spending on print advertising has decreased sharply since 2007, with the strongest decline around the financial crisis of 2008. In 2012, spending on digital advertisements surpassed that on print advertisements (Vranica & Marshall, 2016). Whereas the results of printed ads are difficult to follow up on, digital ads are easily trackable. The global data volume is expected to increase from 4.4 zettabytes in 2013 to 44 zettabytes in 2020 (Hajirahimova & Aliyeva, 2017), an increase which can benefit the advertising industry as well.

This expansion in data availability offers opportunities for companies like NDC mediagroep. NDC mediagroep is the largest independent media house in the northern part of the Netherlands (NDC mediagroep, 2019). It controls more than 70 digital channels, 40 weekly magazines, 7 news magazines and 3 newspapers (NDC mediagroep, 2019). This large number of digital channels, combined with the increasing availability of data, is a valuable tool for offering the products of NDC and of its business partners.

These developments are causing NDC to be increasingly dependent on its Customer Relationship Management (CRM) system. CRM systems lay the base for building long-term relationships with customers (Hendricks, Singhal, & Stratman, 2007). Due to the benefits of a CRM system, NDC has decided to implement its own comprehensive system. The implementation of this new CRM system started three years ago and is still an ongoing process.

One of the goals of NDC is to utilize this newly acquired toolset to map its relationship with NDC’s target group: the “Noorderling”. The Noorderling can be defined as someone living in the northern part of the Netherlands (Van Dale, 2019). The CRM system provides numerous tools to reach this goal. New identification codes shared among systems used by NDC provide the ability to connect these systems which makes integrated analyses of multi-channel data possible.

The main objective of this research is to make use of NDC’s abundance of data by developing a model which is able to give a lead conversion probability score. NDC’s CRM system contains hundreds of thousands of customer profiles with information on account history, newsletter subscriptions and demographics. This information is useful, but not yet used to its full potential. By using this data and quantifying NDC’s relationship with its users, users of NDC’s channels can be targeted one-to-one based on their lead conversion probability. One-to-one marketing can be seen as an extreme form of segmentation, where the segmented target group consists of only one individual (Arora, 2008). A segmented target group can provide a solid base for marketing campaigns and helps in deciding on the proper cross- or upsell, especially on an individual level. Cross-selling is offering an additional product after a customer has made an earlier purchase, whereas upselling is offering a customer a new, better or faster product than the one he or she has purchased earlier (Aydin & Ziya, 2008).

One of the ways to quantify the aforementioned relation in a lead conversion probability score is lead scoring (Monat, 2011). Important lead characteristics are defined, assigned a rating and combined into the final lead score (Monat, 2011). For this research, a lead is defined as the possibility that a user of NDC’s channels subscribes to one of NDC’s paid subscriptions.

Present-day publishers face the issue of losing users to the online environment. To counter this, trial subscriptions have emerged alongside the increased online availability of services. Trial subscriptions offer the opportunity to become familiar with the product or service at a strongly reduced rate. From the firm’s perspective, the aim of trial subscriptions is that users who enjoy the trial subscription opt for a regular fixed subscription. Unfortunately, this is often not the case, and users tend to make use of only the trial subscription. Being able to convert these trial subscriptions into regular subscriptions would be of great financial benefit for NDC.

The main research question that can be formulated from this problem statement is:

How can lead scoring affirm that NDC channel users opt for a fixed subscription, instead of a trial subscription?

Multiple channels will be used to answer the main question. Data is retrieved from NDC’s website, newsletter and CRM system. In this research, website engagement is defined by the average session time, the average number of pageviews and the total number of sessions an individual has engaged in, because these variables are the main indicators for conversion that are available (Park & Park, 2016). The first sub-question is given:

What is the contribution of website engagement to the lead score?

Newsletter response behavior can be used to gauge user involvement. Whether someone has opted in for the newsletter, whether the email is opened and whether its contents are clicked on can all be tracked and recorded (Fabian, 2015). These variables will be used. The following sub-question is given:

What is the contribution of newsletter engagement to the lead score?

Account history is defined by past subscriptions and past subscription types. These are considered as the main indicator variables for account history, because past subscription information can indicate a certain interest in NDC’s paid subscriptions. The following sub-question is established:

What is the contribution of account history to the lead score?

The final question that arises is which of the three lead-scores proves to be the overall best predictor for a lead. The weighted predictive capabilities of each lead-score must be determined in order to establish the final overall lead-score to answer the main research question. The final sub-question is given:

What are the individual predictive capabilities of these elements, and what is the combined contribution to the lead score?

For the generation of a solid lead-scoring model, it has been decided to focus on NDC’s largest product, the Dagblad van het Noorden (DVHN) newspaper. This decision has been made because DVHN has the most available data out of NDC’s products and it helps in keeping the research uncluttered. It also helps in defining conversion, which is the goal after following up on a lead. For this research, conversion is defined as a user subscribing to one of DVHN’s paid subscriptions. A paid subscription can give access to print, digital or both media channels. DVHN has a variety of different offerings. Besides the physical paper and its different types of subscriptions, DVHN offers its own website and newsletters as well. The developed lead-scoring model can possibly be applied to NDC’s other subscription-based offerings as well and increase the number of fixed subscriptions.

The data used for the generation of the lead scores covers the period from February 2019 to August 2019. This period has been chosen because of changes made to the subscription system in February, which caused the loss of data from before February on customers who did not have a paid subscription during the system change.

Not much research has been conducted specifically on lead scoring. By proposing this quantitative lead scoring model, the gap in lead scoring theory can be partially filled.

2. Theory

The purpose of this second chapter is to lay the theoretical base upon which the research is built. Quantitative research usually requires the pre-formulation of theory (deduction) on which the methodology can be based and the hypotheses tested (Heyink, 1993). Therefore, available literature has been retrieved and an overview is provided. The aforementioned research question and sub-questions have been divided into five subchapters. A sixth subchapter is dedicated to the conceptual model. The main research question focuses on the conversion of a lead. Therefore, the theoretical base starts at conversion.

2.1 Conversion

In online environments, the conversion rate is considered to be the percentage of visits to an online purchasing channel in which a transaction has taken place (Gudigantala, 2014). To be able to convert a visitor into making a purchase, relevant metrics must be established (Park & Park, 2016). Gudigantala’s research has shown that metrics such as purchase intention and website satisfaction are significant conversion predictors. Website satisfaction also has a significant positive effect on purchase intention (Gudigantala, 2014). This conclusion follows from the finding that firms with high website satisfaction and high purchase intention tend to be firms with high conversion rates (Gudigantala, 2014).

When switching over to the physical environment, other factors have proven to be of influence on conversion. In the research conducted by Lam et al. (2001) factors such as the weather, day of the week, price promotions and whether it is a holiday or not are shown to be significantly related to conversion.

advertisements, but also the indirect effect by encompassing the advertisement clicks as stochastic events dependent on the past occurrences.

The way in which a conversion can come about is by following up on a lead. However, the decision of when to follow up on a lead can be a difficult one and is often only based on intuition or guesswork (Monat, 2011). Lead scoring is a tool which can aid this decision process.

2.2 Lead scoring

The ability of the sales and marketing departments to cooperate is key to efficiently deciding on and offering the correct up- and cross-sells (Talatappeh, 2016). One of the tools to accomplish this is lead scoring (Raab, 2008). According to Raab (2008), lead scoring can be used to decide when a lead is ready to be followed up by the sales department.

In order to develop a lead score, independent lead characteristics that have an influence on the probability of a lead conversion have to be identified (Monat, 2011). These characteristics can then be given a score or mark and eventually be combined into one score which gives the total probability of a lead conversion (Kolowich, 2019). The acquired lead scores can then be used to aid in the efficient follow up of leads.
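The combination step described above can be sketched as follows. This is a minimal illustration only: the characteristic names and weights are hypothetical and are not taken from Monat (2011), Kolowich (2019) or NDC’s data.

```python
# Hypothetical weights per lead characteristic (illustrative, not estimated from data).
WEIGHTS = {
    "website_engagement": 0.3,
    "newsletter_engagement": 0.2,
    "account_history": 0.5,
}


def lead_score(ratings):
    """Combine per-characteristic ratings (each between 0 and 1) into one weighted score."""
    for name, value in ratings.items():
        if name not in WEIGHTS:
            raise KeyError(f"unknown lead characteristic: {name}")
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"rating for {name} must lie between 0 and 1")
    total_weight = sum(WEIGHTS.values())
    # Characteristics without a rating contribute 0 to the score.
    weighted_sum = sum(WEIGHTS[name] * ratings.get(name, 0.0) for name in WEIGHTS)
    return weighted_sum / total_weight
```

A lead that scores high on account history but shows no newsletter engagement would still receive a relatively high score under these assumed weights, which is the kind of trade-off the weighting is meant to express.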

A process linkable to lead scoring is targeting. Targeting can be a helpful tool for the next step in the implementation of lead scoring. After having created a lead score, targeting can be used to allocate the proportionate amount of marketing resources to the group that has the highest lead scores. In turn, targeting allows for the elimination of wasted marketing resources (Ganesh Iyer, 2005).

Not much academic literature on lead scoring is available. According to Monat (2011), R.D. Kestnbaum and L. Hsieh (1983) were among the first researchers to attempt a form of sales lead modeling. Kestnbaum and Hsieh conducted a linear regression analysis on multiple independent variables considered to be important indicators of a sales lead, in order to determine the coefficients used to create a probability score for a sales lead. However, they do not provide any supporting data, references or other proof of validity.

As stated by Monat (2011), there is no clear consensus as to what precisely defines a good or bad lead. Not only does this differ between industries, it also varies between companies within the same industry.

To address the lack of theoretically grounded lead characterization theories, Monat (2011) proposes his own theory. Monat’s lead characterization theory proposes a qualitative model that qualifies leads based on their essential characteristics. The model’s predictive capabilities are consistent with the models presented by earlier researchers, starting with Kestnbaum and Hsieh in 1983. However, one of the goals of Monat’s model has been to build it upon a sound theoretical base. Monat states that a quantitative model could prove to be highly efficient in predicting lead conversions. One of Monat’s aims is to build a quantitative model upon his qualitative model.

A key requirement for the ability to predict lead conversions is having an extensive system with an abundance of information available (Duncan, 2015). All estimations start with raw data, data which can be retrieved from systems such as a CRM system (Verhoef, 2003).

2.3 Website behavior

To establish a lead score, essential lead characteristics must be determined (Monat, 2011). One possible indicator for a lead conversion is web browsing behavior. Montgomery et al. (2004) have found that by analyzing page-by-page viewings and path information, conversion predictions for a single user can be up to 40% accurate, which is up to twice as accurate as prediction models that do not include this information. Being able to predict conversion to such an extent creates other opportunities. Montgomery’s analysis of customer website behavior indicated that adjusting a web page design and its offerings to the requirements of a visitor with a high chance of conversion can increase the conversion rate from 7% to 9%.

can be increased, Hauser states. This can eventually be extended towards retention and conversion (Gudigantala, 2014). Having assigned a lead score to an individual visitor enables website morphing for that individual. An individual who scores high on one aspect and low on another might have a different cognitive style than someone who scores high or low on other aspects (Larsson, 2011).

When looking at website behavior, not only the number of visits is an important estimator for conversion; the time between these visits can also be an important metric (Park & Park, 2016). This is because users rarely convert on the first site visit, so only looking at these first visits can result in the loss of valuable data (Montgomery, Li, Srinivasan, & Liechty, 2004).

When considering the previously mentioned theory, the following hypothesis can be formed:

H1: Website engagement positively affects lead conversion

2.4 Newsletter engagement

Besides analyzing website behavior, online marketeers have another tool which can provide insight into customer engagement, customer loyalty and customer satisfaction: e-mail marketing (Kumar, 2005; Fabian, 2015; Bender, Fabian, Haupt, & Lessmann, 2018). E-mail marketing has been around since Gary Thuerk sent out the first mass e-mail in 1978 and has since come a long way (Hosch, 2008).

One of the means of analysis in e-mail marketing is e-mail tracking. By applying regular web tracking mechanisms to e-mails, the reading behavior of individual e-mail recipients can be monitored and interpreted (Fabian, 2015; Bender, Fabian, Haupt, & Lessmann, 2018). The experiment conducted by Bender et al. (2018) indicates that 92% of the received e-mails originating from large companies from the US, UK and Germany contained tracking components. This indicates that, among larger companies, e-mail tracking is widely implemented.

consumers have received e-mails, possibly due to the fact that emails contain links towards the relevant website.

Having multiple touchpoints, both web and e-mail response behavior, can sometimes cause confusion as to which of the channels is the optimal conversion predictor and thus should receive the most resources (Li & Kannan, 2014). As stated by Li and Kannan (2014), different options are available for this allocation decision. Allocating resources to all channels can be costly or ineffective; a weighted average or a focus on one channel can prove to be the better choice.

One of the advantages of having a comprehensive CRM system is the ability to perform direct mail marketing. Based on the known profile in CRM, direct mailings can be provided with a personalized offer (Verhoef, 2003). Furthermore, Verhoef (2003) states that other advantages of direct mailings are that there is no direct competition for the attention of the respondent due to the email having only one company’s offers, which can lead to more involvement of the respondent.

The following hypotheses can be formed:

H2: A newsletter subscription has a positive effect on lead conversion

H3: Newsletter response has a positive effect on lead conversion

2.5 Account history

Depending on the industry, retaining a customer is between 5 and 25 times less expensive than acquiring a new one (Gallo, 2014). However, the possibility to retain a customer is not always present. Subscription-based service providers have to handle subscribers coming and going.

Verhoef (2003) states that the chance of customer retention is significantly higher for customers with a high prior customer share and a lengthy relationship with the company. Customer share can be defined here as: “The ratio of a customer’s purchases of a particular category of products or services from supplier X to the customer’s total purchases of that category of products or services from all suppliers” (Verhoef, 2003). For lead scoring, this can mean that, in the characterization of leads, a long account history weighs more heavily than a short account history.

Losing a paying subscriber is a costly occurrence, but there is still something to gain. A previously made purchase can lead to an increased future conversion rate due to the fact that the customer already has experience with the company’s products or services (Moe & Fader, 2004; Kodali, 2014). This mechanism can be used by marketeers to target the customers who could not be retained. The advantage of targeting those who have already had a paid subscription is that they are already familiar with the service.

Based on the theory, the following hypothesis can be formed:

H4: Previous paid subscriptions in account history have a positive effect on lead conversion chance

2.6 Conceptual model

With all the above-mentioned literature and four hypotheses considered, the following conceptual model (figure 1) can be constructed:

Figure 1: conceptual model

the user towards the direction of conversion. “Newsletter behavior” is assumed to have a direct effect on both “Website engagement” and lead conversion, based on the research by Lee (2010) and Verhoef (2003). Therefore, “Newsletter engagement” acts as a possible mediator between “Website engagement” and “Lead conversion”. If a recipient responds to a newsletter, that recipient’s website engagement will increase because the newsletter links through to an online webpage. By having opted in for a newsletter, recipients have also already voluntarily chosen to engage with the relevant company, and have thus made the first step towards conversion.

3. Methodology

The aim of the following chapter is to provide an overview of the methods used to achieve the research results. A cross-sectional descriptive study design has been chosen to determine the existing causal relations between independent variables and the dependent variable.

3.1 Population

The full population for which NDC has CRM profile data consists of over XXXXXX entries. The core of these XXXXXX entries is a group of approximately XXXXXX active subscribers. This group contains subscription information from all of NDC’s paid subscription products. Besides the XXXXXX active subscribers, there is a group of approximately XXXXXX users whose account history is known, but who do not have a specific paid subscription. The rest of the population contains users who have or had a newsletter subscription or make use of NDC’s other services, such as event ticket sales. It is difficult to state exact numbers due to many overlapping entries and distortions in the data that have occurred over time.

3.2 Sample

In order to use as much of the available data as possible, the population will be used in two different ways. The first is the full analysis, based on the users whose web behavior is known and linkable to their account information and possible newsletter behavior; this is a smaller portion of the full population. The size of this full analysis is 7273. However, this group will not be considered a sample of the full population, because it is not representative of or comparable to the full population, as different information is present. Next, the full population for which the account history is known will be analyzed. Unfortunately, the only common value by which web data and account information can be linked is the email address. The email address is only known during a website session if a user logs in or enters it somewhere on the website, after which it is stored using cookies. Thus, it cannot be said with certainty whether a user who is known in the CRM system does or does not make use of DVHN’s website.

3.3 Data collection

Collecting the data has been done by using multiple systems in which NDC manages its data. The base of the data, the user information, has been extracted from NDC’s CRM system. Next to past and current account information, newsletter subscriptions are also recorded in the CRM system.

Other newsletter information, such as newsletter opens and newsletter click-throughs, is recorded in a dedicated system. From this dedicated system, the data is extracted and linked to the other available data. Website behavior, such as average session duration, average pageviews and total number of sessions between 01-02-2019 and 30-09-2019, has been extracted from a third dedicated system and linked to the CRM data where possible.
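The linking step can be illustrated with a short sketch. It assumes that both extracts are available as lists of records and that the linking field is called `email`; that field name, the function names and the normalization rule are assumptions made for illustration, not a description of NDC’s systems. The web variable names follow Table 1.

```python
# Web behavior variables as named in Table 1.
WEB_FIELDS = ("avg_session_duration", "avg_pageviews_per_session", "sessions")


def normalize_email(value):
    """Trim whitespace and lower-case an email address; missing values become ''."""
    return (value or "").strip().lower()


def link_by_email(crm_rows, web_rows):
    """Left-join web behavior onto CRM profiles using the email address as key."""
    web_by_email = {}
    for row in web_rows:
        key = normalize_email(row.get("email"))
        if key:  # web sessions without a known email address cannot be linked
            web_by_email[key] = row
    linked = []
    for profile in crm_rows:
        web = web_by_email.get(normalize_email(profile.get("email")), {})
        merged = dict(profile)
        for field in WEB_FIELDS:
            merged[field] = web.get(field)  # None when no web data could be linked
        linked.append(merged)
    return linked
```

Profiles without a match keep their CRM fields and receive empty web variables, which mirrors the situation described above: the absence of linked web data does not prove that a user never visits the website.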

3.4 Variables

Many users with fixed subscriptions did not have an end date noted in the CRM system because the subscription is indefinite. In order to still have a relation length variable for these users, the end date has been imputed as three years from the starting date. The reason for this decision is that this is approximately the duration of the longest possible predetermined contract. A duration/survival model could provide imputation options for these missing values; however, no clear censoring variable can be created, and thus it has been decided not to use such a model. Then, a variable is created which indicates which type of subscription is currently active. It takes the value ‘5’ for an active fixed subscription, ‘3’ for an active promotional subscription, ‘1’ for an active trial subscription and ‘0’ when there currently is no active subscription. This is considered the dependent variable.
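The two constructions described above, the three-year end date imputation and the coding of the dependent variable, can be sketched as follows. The function names and the string labels for the subscription types are hypothetical; only the three-year rule and the codes 5, 3, 1 and 0 come from the text.

```python
from datetime import date
from typing import Optional

# Codes from the text; 0 is used when no subscription is active.
SUBSCRIPTION_CODES = {"fixed": 5, "promotional": 3, "trial": 1}


def impute_end_date(start: date, end: Optional[date]) -> date:
    """Return the recorded end date, or start + 3 years for open-ended subscriptions."""
    if end is not None:
        return end
    try:
        return start.replace(year=start.year + 3)
    except ValueError:
        # 29 February has no counterpart three years later; fall back to 28 February.
        return start.replace(year=start.year + 3, day=28)


def relation_length_days(start: date, end: Optional[date]) -> int:
    """Relation length in days, using the imputed end date where needed."""
    return (impute_end_date(start, end) - start).days


def active_subscription_code(active_type: Optional[str]) -> int:
    """Map the currently active subscription type to the dependent variable code."""
    return SUBSCRIPTION_CODES.get(active_type, 0)
```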

Next are the website behavior variables. First, the average session duration variable indicates the average time spent on the DVHN website during each session. Second, the average pageviews per session variable gives the average number of pages viewed per session. Finally, the sessions variable provides the total number of sessions a user has initiated during the previously mentioned time period.

For the newsletter data, three variables are used. The first variable states how many emails were received. The second variable indicates the number of times emails were opened. The third variable shows how many times items in the newsletter were clicked on. Table 1 provides an overview of all variables in the data set.

Table 1: List of included variables

Variable name in data        Meaning
Status_dummy                 Indicates whether a subscription is active
relationlength_fixed         Average fixed subscription length in days
relationlength_promo         Average promotional subscription length in days
relationlength_trial         Average trial subscription length in days
fixed_dummy                  Variable stating the number of fixed subscriptions
promotional_dummy            Variable stating the number of promotional subscriptions
trial_dummy                  Variable stating the number of trial subscriptions
active_subscr_dummy          Dummy stating which subscription is active
avg_session_duration         The average session duration
avg_pageviews_per_session    The average number of pageviews per session
newsletter_status_dummy      States whether a newsletter subscription is active
sessions                     Total number of sessions
received                     Number of emails received
viewed                       Number of received emails viewed

3.5 Data cleaning

For the full analysis to be conducted, the data has been prepared and cleaned. Exact duplicate entries have been resolved by removing one of the duplicates. After the data set had been cleaned of basic abnormalities, outliers were detected. As can be seen in the boxplots below (graph 1), outliers are present:

The sources of the (extreme) outliers are possibly bots skewing the recorded session data. Google, for example, makes use of “crawler bots” for a variety of reasons, one of which is to index webpages (Google, 2019). These crawler bots generate the same data as regular users. Although these bots are usually filtered out of the data, they sometimes slip through. The outliers have been detected via the interquartile range (IQR) method. The formula used for the lower bound is Q1 – (1.5 * IQR) and the formula used for the upper bound is Q3 + (1.5 * IQR) (Chakraborti, 2015). After detection, the outliers have been replaced by NA values and imputed using the random forest technique, after which the boxplots look as follows (graph 2):

Graph 1: Boxplot web data
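The IQR rule used for the web data can be sketched in a few lines. The function names are hypothetical, and the quartile convention (`method="inclusive"`) is an assumption, since the text does not state which quantile definition was used. The sketch only covers detection and replacement by a missing value; the subsequent random forest imputation is not shown.

```python
import statistics


def iqr_bounds(values):
    """Return (lower, upper) outlier bounds: Q1 - 1.5 * IQR and Q3 + 1.5 * IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def mask_outliers(values):
    """Replace values outside the IQR bounds by None (the NA value before imputation)."""
    lower, upper = iqr_bounds(values)
    return [v if lower <= v <= upper else None for v in values]
```

For example, in a list of session counts such as `[1, 2, 3, 4, 5, 6, 7, 8, 9, 100]`, only the value 100 falls outside the bounds and would be marked as missing.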

Next, the outliers in the newsletter data need to be identified. As can be seen in the boxplots below (graph 3), outliers are present in the newsletter behavior. However, none of the values lie beyond the reasonable limit. Therefore, it has been decided not to remove any of the higher values, as these are still possible.

Besides the outliers, other anomalies are present. Some recipients have received newsletters over the full period of 233 days. Of those who have received emails, 627 have a higher value for the “viewed” variable than for the “received” variable. This would mean that some received emails were opened multiple times, which is not unimaginable.

Furthermore, after the newsletter details have been linked to the CRM data, missing values are present. These values occur in the click-through and viewed variables. The click-through variable has 270 missing values and the viewed variable has 88. This can be explained by users not clicking on the email content and users not opening the emails at all; thus, these values have been imputed with the value 0.

3.6 Analyses

3.6.1 Initial analyses

In order to be able to perform lead scoring, the underlying relations between variables must be determined. The chosen method for analysis is multinomial logistic regression. Multinomial logistic regression is often used in marketing and economics to predict which product a customer would select by using relevant predictor variables (Price, 2019). Furthermore, because the dependent variable contains more than two categories, analysis options are limited. The non-ordinal nature of the dependent variable creates the need for a classification analysis. Multinomial logistic regression provides a classification framework that can include more than two categories for the dependent variable. Next to its capabilities, another reason for the popularity of multinomial logistic regression is that it does not assume normality, linearity or homoscedasticity (Starkweather, 2011). A more powerful alternative is discriminant function analysis; this, however, requires the aforementioned assumptions to be met (Starkweather, 2011). The multinomial logistic regression considers, for each entry and its corresponding dependent variable outcome, all other possible dependent variable outcomes. Variables which differ per dependent variable outcome are determined and defined in the data set. Then, as indicated by Starkweather and Moske (2011): “A multinomial logistic regression is used to predict categorical placement in or the probability of category membership on a dependent variable based on multiple independent variables.”

Other assumptions must be met, however. As stated by Starkweather (2011), one of these assumptions is the independence of the dependent variable choices. This states that membership in one category is not related to membership in another category (Starkweather, 2011). The Hausman-McFadden Independence of Irrelevant Alternatives (IIA) test has been selected to test whether this assumption is met.

The second assumption which must be met is non-perfect separation (UCLA: Statistical Consulting Group, 2019). Non-perfect separation means that the dependent variable outcomes cannot be perfectly separated by a predictor variable or a combination of predictor variables (Starkweather, 2011). Quasi-complete separation is also a possible outcome. It occurs when the predictors separate the dependent variable outcomes perfectly for all but a few observations. Complete or quasi-complete separation results in incorrectly estimated coefficients and inaccurate effect sizes (Starkweather, 2011). A separation test has been conducted to check whether the assumption has been met.
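As an illustration of what complete and quasi-complete separation look like, the following minimal Python sketch inspects a single numeric predictor and two outcome classes. It is a simplified, univariate check and not the separation test used in this study; the function name and the example values are hypothetical.

```python
def separation_status(values_a, values_b):
    """Classify how one numeric predictor separates two outcome classes.

    values_a and values_b hold the predictor values observed for the
    users in outcome class A and outcome class B respectively.
    """
    lo_a, hi_a = min(values_a), max(values_a)
    lo_b, hi_b = min(values_b), max(values_b)

    # A predictor that is constant and identical in both classes carries
    # no information and therefore separates nothing.
    if lo_a == hi_a == lo_b == hi_b:
        return "none"
    # The two value ranges do not touch at all: a threshold splits them.
    if hi_a < lo_b or hi_b < lo_a:
        return "complete"
    # The ranges meet only at a single boundary value.
    if hi_a == lo_b or hi_b == lo_a:
        return "quasi-complete"
    return "none"


# Hypothetical predictor values per outcome class.
status_complete = separation_status([1, 2, 3], [4, 5])
status_quasi = separation_status([1, 2, 3], [3, 4])
status_none = separation_status([1, 4], [2, 5])
```

In a full model the check has to consider linear combinations of all predictors, which is why a dedicated separation test is needed in practice.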

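The third assumption is the absence of multicollinearity. This is assessed with the variance inflation factor (VIF), which indicates how much the variance of the estimated coefficient of an independent variable is inflated due to its linear relation with another independent variable (Allison, 2012).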

Furthermore, a mediation analysis has been conducted in order to specify the underlying relation between website behavior and newsletter engagement and their relationship with the dependent variable. Newsletter engagement is a possible mediator.

3.6.2 Lead scoring

The main goal of this research is to provide a framework which can ensure that NDC users are more likely to opt for a fixed subscription, rather than for a trial subscription or have no subscription at all. This can be done by using lead scoring.

Machine learning is used to produce such lead scores. The data set is split into a training data set and an evaluation data set. Machine learning is then applied to the training data set with all the relevant variables, after which the evaluation data set is used to evaluate the trained model. Different types of machine learning are tested in order to find the best method in terms of accuracy and processing time.
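The split into a training and an evaluation data set can be sketched as follows. This is a minimal Python illustration under stated assumptions: the function name, the seed and the 80/20 ratio are hypothetical choices, and the user ids are simply consecutive integers.

```python
import random


def split_train_eval(user_ids, train_fraction, seed=42):
    """Shuffle the user ids reproducibly and split them into two subsets."""
    shuffled = list(user_ids)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical example: 6925 users, 80% used for training.
train_ids, eval_ids = split_train_eval(range(6925), train_fraction=0.8)
```

Fixing the seed keeps the split reproducible, so that different machine learning methods are compared on exactly the same evaluation users.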

The machine learning method used for determining the relevant independent variables is the conditional inference tree (ctree) method, which is based on the conditional inference framework by Hothorn et al. (2006). This method has been selected because it involves little trade-off between accuracy and output speed.


4. Results

The result section of this research has been divided into 6 subchapters. First, descriptive statistics are provided to have a proper understanding of the used data. Second, the main question provided in the introduction is answered, followed by answers on the sub questions. Third and final, each individual hypothesis test result is presented.

4.1 Descriptive statistics

From the overall data set, the relationship length and other characteristics become apparent when looking at the basic numbers. The subscription types are not equally distributed. As expected, the percentage of users who currently have an active trial subscription is much lower than the percentage with a promotional or fixed subscription. This can be explained by the fact that trial subscriptions usually span only approximately two weeks. Although the total number of trial (XXXXX) subscriptions over the given time period is only slightly lower than the number of fixed (XXXXX) or promotional (XXXXX) subscriptions, only a few users still have an active trial subscription, which follows from this short duration. Table 2 provides an overview of the descriptive figures. Graph 4 shows the current distribution of the subscription types. Graphs 5 and 6 provide histograms to elaborate on table 2.

Group size per subscription type

| Subscription type | Total size | Web usage known | Newsletter engagement known |
|---|---|---|---|
| Fixed (5) | 3611 | 1215 | 207 |
| Promo (3) | 8892 | 3373 | 623 |
| Trial (1) | 917 | 308 | 48 |
| Nonactive (0) | 6346 | 2336 | 245 |
| Total size | 19766 | 7232 | 1123 |

Relation length means

| Subscription type | Fixed relationlength | Promo relationlength | Trial relationlength |
|---|---|---|---|
| Fixed active (5) | 1434.50 | 140.25 | 13.33 |
| Promo active (3) | 1131.71 | 835.05 | 16.15 |
| Trial active (1) | 161.37 | 13.48 | 41.78 |
| Nonactive (0) | 137.30 | 28.75 | 30.91 |
| Overall mean | 822.75 | 411.14 | 21.56 |

Website usage means

| Subscription type | Sessions | Average pageviews per session | Average session duration |
|---|---|---|---|
| Fixed active (5) | 8.34 | 3.52 | 253.61 |
| Promo active (3) | 7.03 | 3.26 | 214.64 |
| Trial active (1) | 7.17 | 2.91 | 215.38 |
| Nonactive (0) | 7.41 | 2.77 | 193.25 |
| Overall mean | 7.38 | 3.13 | 214.31 |

Newsletter variable means

| Subscription type | Received | Viewed | Clicked |
|---|---|---|---|
| Fixed active (5) | 183.47 | 75.91 | 18.75 |
| Promo active (3) | 177.37 | 76.77 | 18.36 |
| Trial active (1) | 103.65 | 59.88 | 15.60 |
| Nonactive (0) | 138.93 | 59.12 | 19.21 |
| Overall mean | 166.95 | 72.04 | 18.50 |

Table 2: Descriptive figures overview

Graph 4: Subscription distribution (Trial 5%, Promotional 45%, Fixed 18%, Nonactive 32%)

4.2 Multinomial model assumptions

For the specification of the relation between the independent and dependent variables, a multinomial logistic regression has been performed. For a multinomial logistic regression, assumptions must be met. These assumptions have been tested for both the web data subset and full data set. First, the independence of dependent variable choices test has been conducted. The Hausman-McFadden test has been selected to check for the IIA assumption.
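For reference, the Hausman-McFadden statistic is commonly written as shown below. The notation is the standard textbook form and is an assumption of this illustration, not output of this study: $\hat{\beta}_r$ and $\hat{V}_r$ denote the coefficient estimates and their covariance matrix from the restricted model (one outcome category removed), while $\hat{\beta}_f$ and $\hat{V}_f$ denote the corresponding quantities from the full model, limited to the coefficients that both models share.

$$
H = (\hat{\beta}_r - \hat{\beta}_f)^{\top} \left(\hat{V}_r - \hat{V}_f\right)^{-1} (\hat{\beta}_r - \hat{\beta}_f)
$$

Under the null hypothesis that IIA holds, $H$ asymptotically follows a chi-square distribution with degrees of freedom equal to the number of compared coefficients. Because $\hat{V}_r - \hat{V}_f$ is not guaranteed to be positive definite in finite samples, $H$ can turn out negative, which is usually read as no evidence against IIA.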

From the results of the IIA test, it can be concluded that the IIA assumption is met for both data sets. For the test, the multinomial model has first been estimated with all dependent variable categories present. Afterwards, one of the categories has been removed and the model has been estimated again. With this, the coefficients of each of the multinomial alternative outcomes can be compared. From the high, non-significant p-values of 1 and close to 1, it can be concluded that these coefficients are the same with all or only a part of the categories present, thus the IIA assumption is met. Table 3 shows the outcomes of the IIA tests.

Next, a separation test has been conducted to check for possible complete or quasi-complete separation. It also incorporates a check for infinite estimates. The test shows that none of the estimates are infinite and that there is no separation of any kind present. Therefore, the non-separation assumption is met for both data sets. The outcome can be found in the appendix at part I.

The final assumption that must be met is the absence of multicollinearity. All available variables are compared. A VIF test with the relevant independent variables has been conducted for both data sets. Table 4 below shows that most VIF scores do not exceed 2.5, which is considered to be the maximum VIF value for a multinomial logistic regression (Allison, 2012).
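To make the VIF check concrete, the sketch below computes the VIF for the special case of exactly two predictors. In general, the VIF of predictor j equals 1 / (1 - R²), where R² comes from regressing predictor j on all other predictors; with only two predictors, this R² equals the squared Pearson correlation between them. The data values and function names are hypothetical, and this is not the code used in this study.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def vif_two_predictors(x, y):
    """VIF of either predictor when the model contains only x and y."""
    r = pearson_r(x, y)
    return 1.0 / (1.0 - r ** 2)


VIF_THRESHOLD = 2.5  # maximum acceptable value following Allison (2012)

# Hypothetical newsletter counts for six users.
received = [10, 20, 30, 40, 50, 60]
viewed = [4, 11, 12, 22, 21, 30]

vif = vif_two_predictors(received, viewed)
flagged = vif > VIF_THRESHOLD
```

Two strongly correlated variables such as these hypothetical counts yield a VIF far above the threshold, while uncorrelated variables yield a VIF of 1.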

| IIA test | Web data subset | Full data |
|---|---|---|
| Chi-square | -0.60965 | 53008 |
| Degrees of freedom | 9 | 3 |
| P-value | 1 | < 2.2e-16 |

Table 3: IIA test results

Graph 5: Newsletter engagement histogram (mean received, viewed and clicked per subscription type)

Graph 6: Mean of relationship length in days (fixed, promo and trial relation length per subscription type)

The variables received and viewed are above the acceptable VIF score of 2.5. This is logical: if someone receives an email, it is likely that the email is opened, so these two variables are correlated. The received and viewed variables have been recoded into a new variable by multiplying received with viewed. The VIF score of the new variable is 2.35. However, the final effect of the new variable in the multinomial model is close to zero; therefore it has been decided to drop the received and viewed variables from the multinomial model altogether. Furthermore, neither the newly created variable nor the separate received and viewed variables increase the predictive capability of the lead scoring model. The output of the renewed VIF test can be found in the appendix at part II.

| Variable | VIF score (web data subset) | VIF score (full data) |
|---|---|---|
| relationlength_trial | 1.29 | 1.36 |
| relationlength_promo | 2.11 | 1.84 |
| relationlength_fixed | 1.49 | 1.56 |
| fixed_dummy | 1.40 | 1.46 |
| trial_dummy | 1.28 | 1.28 |
| promotional_dummy | 2.07 | 1.85 |
| avg_pageviews_per_session | 1.53 | - |
| avg_session_duration | 1.52 | - |
| sessions | 1.09 | - |
| clicked | 2.37 | - |
| received | 2.62 | - |
| viewed | 3.80 | - |

Table 4: VIF scores for both data sets

Furthermore, a chi-square goodness-of-fit test has not been conducted, because the sample used is also considered to be the population. Therefore, the data sets upon which the multinomial models are based cannot be compared to “the full population”. The pseudo R² scores for both models can be found in table 5.

| Pseudo R² measure | Web data subset | Full data |
|---|---|---|
| McFadden | 0.7632 | 0.7237 |
| CoxSnell | 0.7900 | 0.7773 |
| Nagelkerke | 0.9074 | 0.8889 |

Table 5: Pseudo R² scores for both models

4.3 Final multinomial model

After having met all assumptions, the multinomial logistic regression can be used to define the relations between the independent and dependent variables. Table 6 below shows the log odds and standard errors for both the promotional and fixed subscription dependent variable outcomes versus the reference level of no active subscription per independent variable.

It can be concluded that the dummy and relationlength variables are the strongest predictors of the type of subscription a user would choose. When an independent variable increases by one unit, the log odds of choosing the shown dependent variable outcome instead of the reference level (no subscription) increase or decrease by the value shown in the table. For the full data set, increasing the fixed_dummy variable by one unit increases the log odds of having a fixed subscription versus no active subscription by 0.5189. These log odds can be transformed to a probability: exp(0.5189) / (1 + exp(0.5189)) = 0.6269.
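The transformation of the fixed_dummy coefficient can be retraced with the short Python sketch below. It shows two common readings of a log odds value: the odds ratio, and the probability obtained through the logistic function. The function names are hypothetical, and the logistic reading treats the coefficient as the entire linear predictor, which means the intercept and all other terms are ignored.

```python
import math


def odds_ratio(log_odds):
    """Multiplicative change in the odds for a one-unit increase."""
    return math.exp(log_odds)


def logistic(log_odds):
    """Probability that corresponds to a given log odds value."""
    return math.exp(log_odds) / (1.0 + math.exp(log_odds))


# Coefficient of fixed_dummy for the fixed outcome in the full data set.
coef_fixed_dummy = 0.5189

ratio = odds_ratio(coef_fixed_dummy)
probability = logistic(coef_fixed_dummy)
```

Reading the coefficient as an odds ratio has the advantage that it holds regardless of the values of the other variables, whereas the probability depends on the complete linear predictor.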

Web data subset

| Variable | Promo | Fixed |
|---|---|---|
| (intercept) | -6.2071*** (0.0281) | -2.3851*** (0.1388) |
| relationlength_promo | 0.0133*** (0.0006) | 0.0025*** (0.0006) |
| relationlength_fixed | 0.0016*** (0.0001) | 0.0019*** (0.0001) |
| relationlength_trial | -0.0177*** (0.0033) | -0.0210*** (0.0025) |
| fixed_dummy | 0.0132 (0.5330) | 0.5358*** (0.0400) |
| promotional_dummy | 0.9240 (0.0843) | 0.1451 (0.0906) |
| trial_dummy | -0.0740 (0.0470) | -0.2323 (0.0426) |
| avg_pageviews_per_session | 0.0885 (0.0450) | 0.0347 (0.0316) |
| avg_session_duration | -0.0001 (0.0005) | 0.0005 (0.0003) |
| sessions | 0.0136 (0.0093) | 0.0086 (0.0057) |
| clicked | -0.0068 (0.0050) | -0.0090 (0.0041) |
| newsletter_dummy | -0.3290*** (0.0870) | -0.2928* (0.1300) |

Residual deviance: 3391.58, AIC: 3431.583

Full data set

| Variable | Promo | Fixed |
|---|---|---|
| (intercept) | -6.0635*** (0.0116) | -2.1486*** (0.0588) |
| relationlength_promo | 0.0116*** (0.0003) | 0.0034*** (0.0003) |
| relationlength_fixed | 0.0017*** (0.0001) | 0.0017*** (0.0001) |
| relationlength_trial | -0.0139*** (0.0021) | -0.0136*** (0.0017) |
| fixed_dummy | 0.0968*** (0.0283) | 0.5189*** (0.0221) |
| promotional_dummy | 1.0102*** (0.0428) | 0.0925* (0.0465) |
| trial_dummy | -0.0606* (0.0301) | -0.2195*** (0.0279) |

Residual deviance: 10805.85, AIC: 10833.85

Table 6: Multinomial logistic regression results (log odds with standard errors in parentheses; reference level: no active subscription)

4.4 Mediation effect

Next to the main tests, a mediation analysis has been conducted. The quasi-Bayesian confidence interval method has been selected. From the mediation analysis, it can be concluded that newsletter engagement does not have a mediating effect. Table 7 shows the output from the causal mediation analysis. From the output, it can be concluded that the average causal mediation effect (ACME), the indirect effect through the mediator, is positively and significantly related to the active subscription type. However, the average direct effect (ADE) is negative and significant only at the 5% level (p = 0.03). The total effect (ADE + ACME) is insignificant. This can be the result of the weak link between the independent and the dependent variable.

| Mediation analysis | Estimate | 95% CI lower | 95% CI upper | p-value |
|---|---|---|---|---|
| ACME (control) | 1.45e-03 | 1.13e-03 | 0.0 | <2e-16*** |
| ACME (treated) | 1.45e-03 | 1.14e-03 | 0.0 | <2e-16*** |
| ADE (control) | -1.28e-03 | -2.46e-03 | 0.0 | 0.03* |
| ADE (treated) | -1.28e-03 | -2.46e-03 | 0.0 | 0.03* |
| Total effect | 1.69e-04 | -1.02e-03 | 0.0 | 0.78 |
| Prop. Mediated (control) | 1.55e+00 | -3.82e+01 | 37.0 | 0.78 |
| Prop. Mediated (treated) | 1.55e+00 | -3.83e+01 | 37.1 | 0.78 |
| ACME (average) | 1.45e-03 | 1.14e-03 | 0.0 | <2e-16*** |
| ADE (average) | -1.28e-03 | -2.46e-03 | 0.0 | 0.03* |
| Prop. Mediated (average) | 1.55e+00 | -3.83e+01 | 37.1 | 0.78 |

Sample size used: 6924. Simulations: 1000.

Table 7: Causal mediation analysis output

4.5 Lead scoring

The final goal of this research is to provide a method for lead scoring. Relevant variables have been selected by adding variables one by one. At each step, the combination with the highest prediction accuracy has been selected. The full output of this can be found in the appendix at part III. The following equation has been created accordingly for the web data subset:

$$
\Pr[S_i = j] = \frac{\exp(\alpha_j + Rp'_{ij}\beta + Rf'_{ij}\beta + Dp'_{ij}\beta + Se_i'\gamma_j)}{\sum_{k=1}^{J} \exp(\alpha_k + Rp'_{ik}\beta + Rf'_{ik}\beta + Dp'_{ik}\beta + Se_i'\gamma_k)}
$$

The equation for the full data set:

$$
\Pr[S_i = j] = \frac{\exp(\alpha_j + Rp'_{ij}\beta + Rf'_{ij}\beta + Dp'_{ij}\beta + Dt'_{ij}\beta)}{\sum_{k=1}^{J} \exp(\alpha_k + Rp'_{ik}\beta + Rf'_{ik}\beta + Dp'_{ik}\beta + Dt'_{ik}\beta)}
$$

Here, $J$ denotes the number of dependent variable outcomes.

The variables included in the equations are defined as follows:

$\Pr[S_i = j]$ = probability of subscription type $j$ for user $i$

$Rp_i$ = average promotional relation length for user $i$

$Rf_i$ = average fixed relation length for user $i$

$Dp_i$ = number of promotional subscriptions for user $i$

$Dt_i$ = number of trial subscriptions for user $i$

$Se_i$ = number of sessions for user $i$
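The way a multinomial logit model turns linear predictors into category probabilities can be sketched as follows. This is a minimal Python illustration; the linear predictor values are made up for the example and are not estimates from this study. The reference category (no active subscription) is given a linear predictor of 0, as is usual when one outcome serves as the reference level.

```python
import math


def category_probabilities(linear_predictors):
    """Softmax: exp of each linear predictor divided by the sum over all."""
    # Subtracting the maximum avoids overflow and leaves the result unchanged.
    highest = max(linear_predictors.values())
    exps = {k: math.exp(v - highest) for k, v in linear_predictors.items()}
    total = sum(exps.values())
    return {k: e / total for k, e in exps.items()}


# Hypothetical linear predictors for one user (reference category = 0).
scores = {"none": 0.0, "promo": -1.2, "fixed": 0.8}

probs = category_probabilities(scores)
# The predicted subscription type is the category with the highest probability.
lead_type = max(probs, key=probs.get)
```

The resulting probabilities sum to one per user, so they can be used directly as lead scores for each subscription type.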

After having determined the ideal predictive independent variables, other commonly used machine learning methods have been applied to the same variable selection. Tables 8 and 9 below show the accuracy percentages per training method for both the web data subset and the full data set. The web data subset has 5540 users in the training subset and 1385 users in the evaluation subset. For the full data set, the relative sizes of the training and evaluation subsets are reversed: the training subset consists of 3769 users and the evaluation subset of 15,080 users. Normally, the training data set would be the largest. However, in this case the training data set is large enough for very high accuracy rates. Furthermore, a larger evaluation subset provides more potential leads and is therefore more valuable for NDC.

| Method | Accuracy % | Promo predicted | Fixed predicted | Cohen’s kappa |
|---|---|---|---|---|
| Conditional inference tree | 94.87 | 5 | 32 | 0.9176 |
| Neural Network | 93.79 | 12 | 34 | 0.8735 |
| Random Forest | 94.65 | 7 | 32 | 0.9141 |
| Regularized SVM (dual) with linear kernel | 89.45 | 29 | 30 | 0.8258 |
| Naïve Bayes | 91.98 | 22 | 20 | 0.8685 |
| k-Nearest Neighbors | 93.64 | 10 | 33 | 0.8974 |
| Multinomial Logistic Regression | 90.68 | 5 | 32 | 0.8493 |

Table 8: Machine learning accuracies for web data subset

| Method | Accuracy % | Promo predicted | Fixed predicted | Cohen’s kappa |
|---|---|---|---|---|
| Conditional inference tree | 93.07 | 92 | 276 | 0.8903 |
| Neural Network | 91.57 | 218 | 386 | 0.8640 |
| Random Forest | 93.39 | 105 | 289 | 0.8944 |
| Regularized SVM (dual) with linear kernel | 83.85 | 761 | 521 | 0.7350 |
| Naïve Bayes | 90.85 | 238 | 278 | 0.8540 |
| k-Nearest Neighbors | 91.72 | 154 | 268 | 0.8682 |
| Multinomial Logistic Regression | 87.85 | 248 | 211 | 0.8021 |

Table 9: Machine learning accuracies for full data set
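The two evaluation measures reported per method, accuracy and Cohen's kappa, can both be derived from a confusion matrix. The Python sketch below shows the calculation; the confusion matrix is hypothetical and does not reproduce any of the models in this study.

```python
def accuracy_and_kappa(confusion):
    """Compute accuracy and Cohen's kappa from a square confusion matrix.

    confusion[i][j] is the number of users whose actual class is i and
    whose predicted class is j.
    """
    total = sum(sum(row) for row in confusion)
    # Observed agreement: share of users on the diagonal.
    observed = sum(confusion[i][i] for i in range(len(confusion))) / total
    # Expected agreement by chance, based on the row and column totals.
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(col) for col in zip(*confusion)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Hypothetical confusion matrix for the classes none, promo and fixed.
confusion = [
    [45, 3, 2],
    [4, 30, 1],
    [1, 2, 12],
]

accuracy, kappa = accuracy_and_kappa(confusion)
```

Kappa corrects the accuracy for the agreement that would be expected by chance, which matters here because the subscription types are not equally distributed.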

4.6 Hypothesis testing

The final step in analyzing the data is testing the hypotheses. After having verified that a multinomial logistic regression can be used, the following conclusions can be made from its output. Table 6 shows the results for both data sets, defining the relation between the independent and dependent variable outcomes.

An anomaly that cannot yet be fully explained is the significant negative relation between the newsletter dummy and the dependent variable outcomes. This indicates that a user who is signed up for a newsletter is less likely to have a fixed or promotional subscription. A possible reason is that a user of one of DVHN’s paid subscriptions has no need for a newsletter, since he or she already receives the news through the paper or the paid section of DVHN’s website.

| Hypothesis | Variables | Significance | Accepted/rejected |
|---|---|---|---|
| H1: Website engagement positively affects lead conversion | sessions, avg_session_duration, avg_pageviews_per_session | Not significant | Rejected |
| H2: A newsletter subscription has a positive effect on lead conversion | newsletter_status_dummy | Significant, negative effect | Rejected |
| H3: Newsletter response has a positive effect on lead conversion | clicked | Not significant | Rejected |
| H4: Previous paid subscriptions in account history have a positive effect on lead conversion chance | relationlength_promo, relationlength_fixed, relationlength_trial, fixed_dummy, promotional_dummy, trial_dummy | Partly significant | Accepted |

Table 10: Hypotheses overview

5. Conclusions

5.1 Conclusion

The reduction in demand for print media has prompted NDC to reconsider its direction. The main goal of this research has been to explore the opportunities for NDC that come with this shift in demand from print to digital media. One of the obstacles that NDC comes across is the increase in trial subscriptions, which in turn leads to fewer fixed and promotional subscriptions, where the real value for NDC lies. Thus, the following main research question has been formulated:

How can lead scoring affirm that NDC channel users opt for a fixed subscription, instead of a trial subscription?

[...] By implementing this method of who to approach, NDC can increase targeting efficiency and reduce wasted sales resources.

Not all considered aspects have contributed to the final lead scoring model. Website engagement and newsletter engagement have proven to be responsible for only a small increase in predictive accuracy. In the multinomial logistic regression, none of these engagement variables is significantly related to the dependent variable outcomes. Therefore, the sub questions considering the contribution of website and newsletter engagement can be answered by stating that these aspects contribute close to nothing to the final lead score. Because the relations are not significant, it cannot be stated whether newsletter engagement or website engagement has a positive or negative effect on lead conversion.

The main contributing aspect to the lead score is account history. Users who have had a history of a fixed and/or promotional subscription are highly likely to still have a fixed or promotional subscription and are thus likely to be predicted as such. Users who have a history of trial subscriptions are less likely to currently have an active subscription, emphasized by the negative significant relation with the dependent variable outcomes of promotional or fixed subscriptions. This indicates that trial subscriptions often are not followed by a promotional or fixed subscription, confirming NDC’s current predicament.

Unfortunately, the web data and newsletter data offer little predictive capability; therefore we mainly consider the predictive capabilities of the account history variables. These can be looked at individually or used together in the multinomial model. By considering each possible dependent variable outcome based on the account history variables per user, predicted probabilities per dependent variable outcome can be established. These probabilities can be analyzed and possibly passed on to the sales department.

Following the conclusion of this research, it seems logical in the current situation to only target those who are likely to opt for a fixed or promotional subscription. Users who have had a trial subscription appear to not be in this target group.

5.2 Discussion

[...] developing stage. While big data studies are increasingly common for other industries and institutions, NDC still has a lot to gain on this subject.

In order to keep the scope of the research manageable, clear variables have been created and defined. This has kept the research manageable, but also limited. Only general variables indicating usage have been created for the web and newsletter engagement aspects. This has been decided due to a combination of limited time and limited availability of data. Furthermore, demographic aspects have not been considered.

While it has been shown that newsletter behavior does not have significant predictive capabilities, there is a significant relation between the dependent variable outcomes and having a newsletter subscription. This relation, however, appears to be negative. The reasoning behind this is beyond the scope of this research but could be explored further.

Next to the limitation of the web and newsletter aspects, this research only focused on DVHN and DVHN’s products, making it brand specific. Therefore, the lead scoring model may not be applicable to NDC’s other brands and products.

Due to a system change in February 2019, a lot of valuable data was only available from then onwards. This limited time span restricts the lead scoring model to data from a period of eight months.

One of the main limitations of this research is that only current and previous users of one of DVHN’s products are included. Due to the account history variables being the main predictors, only users that have previously had subscriptions are incorporated. Users who make use of only the website or newsletters are not represented in detail enough to be included and are therefore not considered.

Furthermore, due to there being limited literature available on the subject of lead scoring, assumptions had to be made as to how to create a lead scoring model. Therefore, this research has been a contribution to the limited existing body of knowledge on the development of lead scoring theory.

6. Recommendations

[...] subscription. These users can be approached and given an offer for one of these subscriptions. The conversion rate of these approaches can be monitored and used to verify whether the model works. Based on the outcome of this, the model can be adjusted.

Currently, the model is only estimated for DVHN’s products. The model can also be expanded towards NDC’s other products. By generalizing the steps taken to develop the lead scoring model, other brands may benefit as well.

Further opportunities lie in the expansion of the lead scoring model. This can be done in multiple ways. A good starting point is to clearly define what makes a lead a lead. If the important characteristics of a lead are determined, these characteristics can be quantified and used to expand on the existing model.

Lead characterization is different for each product and organization, therefore in-house knowledge is required to achieve this goal. NDC has multiple divisions that can aid in this process. The marketing and digital divisions have an abundance of knowledge that can be used to dig deeper into the vast amount of available data and utilize it. Cooperation is key for this endeavor.

Another way of expanding the current lead scoring model is to combine NDC’s digital channels and create detailed variables for each channel’s important metrics. The available web, newsletter and account history data are extensive and can be explored further. By expanding and possibly further combining these aspects, prediction models can become more detailed and more accurate. Currently, the web, app and newsletter data are hardly combined. Interpreting a user’s behavior across all platforms on a more detailed level could prove to be a logical next step.

A more detailed approach to NDC’s multiple channels can be achieved by defining the customer journey towards a desired conversion. Currently, the customer journey is not exactly known and can thus not clearly be linked to a user’s behavior. By having the different steps in the customer journey defined, valuable metrics can be selected and in turn linked to the definition of a lead. The aforementioned lead characterization can benefit from this. Defining which stage of the customer journey a user is currently in can assist in efficiently targeting possible leads.

[...] susceptible to various sorts of offers. These insights can be incorporated into the lead scoring model.

One of the advantages of expanding on the current model is that data spanning a longer period can be incorporated. An increase in the number of users that can be analyzed can lead to an even more accurate model and provide more leads. A study spanning a longer period also reduces the problematic statistical survival issue, relating to the handling of indefinite subscriptions.

Furthermore, the negative relation between trial subscription history and successive promotional or fixed subscriptions has been quantified. This relation can be analyzed further as to why users of trial subscriptions do not follow this up with a promotional or fixed subscription. In addition to the quantitative side of this research, it could be worthwhile to expand on this with qualitative information. By approaching the users of trial subscriptions after their trial subscription has ended, their position on possibly prolonging the paid subscription could be assessed. This approach could lead to a more specific hypothesis as to how to retain trial subscription users.


7. References

A. Lund & M. Lund. (2018). Multinomial Logistic Regression using SPSS Statistics. Retrieved from Laerd Statistics: https://statistics.laerd.com/spss-tutorials/multinomial-logistic-regression-using-spss-statistics.php

Allison, P. (2012, September 10). When Can You Safely Ignore Multicollinearity? Retrieved from Statistical Horizons: https://statisticalhorizons.com/multicollinearity

Arora, N. X. (2008). Putting One-to-one Marketing to Work: Personalization, Customization, and Choice. Marketing Letters, 19(3/4), 305-321. doi:10.1007/s11002-008-9056-z

Aydin, G., & Ziya, S. (2008). Pricing Promotional Products Under Upselling. Manufacturing and Service Operations Management, 10(3), 360-376.

Bender, B., Fabian, B., Haupt, J., & Lessmann, S. (2018). Track and Treat - Usage of E-Mail Tracking for. (p. 59). 26th European Conference on Information Systems: Beyond Digitization - Facets of Socio-Technical Change, ECIS 2018.

Chakraborti, Y. H. (2015). Boxplot-Based Outlier Detection for the Location-Scale Family. Communications in Statistics: Simulation and Computation, 44(6), 1492-1513. doi:10.1080/03610918.2013.813037

Donath, B., Crocker, R., Dixon, C., & Obermayer, J. (1995). Managing Sales Leads. NTC Business Books, 258-263.

Donath, R. (1999). Quality information leads to quality leads. Marketing News, 33(17), 11.

Duncan, B. A. (2015). Probabilistic Modeling of a Sales Funnel to Prioritize Leads. KDD '15 Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 1751-1758. doi:10.1145/2783258.2788578

Fabian, B. &. (2015). E-Mail Tracking in Online Marketing: Methods, Detection and Usage. Proceedings of the 12. International Conference on Wirtschaftsinformatik.

Gallo, A. (2014, 10 29). The Value of Keeping the Right Customers. Retrieved from Harvard Business Review: https://hbr.org/2014/10/the-value-of-keeping-the-right-customers

Ganesh Iyer, D. S.-B. (2005). The Targeting of Advertising. Marketing Science, 305-523. doi:10.1287/mksc.1050.0117

Garafola, T. (1992). A study of the factors used by foodservice brokers in the qualification of sales leads. School of Food, Hotel, and Travel Management.

Google. (2019). Overview of Google-crawlers (user-agents). Retrieved from support.google.com: https://support.google.com/webmasters/answer/1061943?hl=nl

Graham, J. (1996). Why I won't buy from you: what prospects don't say. Air Conditioning, Heating, and Refrigeration News, 198(17), 1-2.

Grandy, T. (2005). What is a qualified sales lead. Reeves Journal: Plumbing, Heating, Cooling, 85(11), 22-23.

Hajirahimova, M. S., & Aliyeva, A. S. (2017, October). About Big Data Measurement Methodologies and Indicators. International Journal of Modern Education and Computer Science (IJMECS), 9(10), 1-9. doi:10.5815/ijmecs.2017.10.01

Hauser, J. R. (2009). Website Morphing. Marketing Science, 28(2), 202-223. doi:10.1287/mksc.1080.0459

Hauser, J., Liberali, G., & Urban, G. (2014). Website Morphing 2.0: Switching costs, partial exposure, random exit, and when to morph. Management Science, 60(6), 1594-1616.

Hendricks, K., Singhal, V., & Stratman, J. (2007). The impact of enterprise systems on corporate performance: A study of ERP, SCM, and CRM system implementations. Journal of Operations Management, 25(1), 65-82.

Heyink, J. W. (1993). The function of qualitative research. Social Indicators Research, 29(3), 291-305.

Hornstein, S. (2005). Sizing up prospects. Sales & Marketing Management, 157, 22.

Hosch, W. L. (2008, 12 17). Spam: Unsolicited Electronic Message. Retrieved from Encyclopædia Britannica: https://www.britannica.com/topic/spam#ref1072166

Hothorn T., H. K. (2006). Unbiased Recursive Partitioning: A Conditional Inference Framework. Journal of Computational and Graphical Statistics, 15(3), 651-674. doi:10.1198/106186006X133933

Jolson, M. (1988). Qualifying sales leads: the tight and loose approaches. Industrial Marketing Management, 17, 189-196.

Jolson, M., & Wotruba, T. (1992). Selling and sales management in action: prospecting: a new look at this old challenge. Journal of Personal Selling & Sales Management, 12(4), 59-66.

Kestnbaum, R., & Hsieh, L. (1983). A yardstick to measure inquiry quality. Business Marketing, 70-1.

Kodali, S. J. (2014, 02 12). The State Of Retailing Online: Key Metrics And Initiatives 2014. Retrieved from Forrester: https://www.forrester.com/report/The+State+Of+Retailing+Online+Key+Metrics+And+Initiatives+2014/-/E-RES111401#

Kolowich, L. (2019, Jan 30). Marketing. Retrieved from hubspot.com: https://blog.hubspot.com/marketing/lead-scoring-instructions

Kraemer, D. H.-S. (2014). Cognitive style, cortical stimulation, and the conversion hypothesis. Frontiers in Human Neuroscience, 8, 15. doi:10.3389/fnhum.2014.00015

Kumar, V. &. (2005). Who are the multichannel shoppers and how do they perform?: Correlates of multichannel shopping behavior. Journal of Interactive Marketing, 19(2), 44-62. doi:10.1002/dir.20034

(36)

35 Larsson, A. (2011). Interactive to me - interactive to you? A study of use and appreciation of interactivity on swedish newspaper websites. New Media and Society, 13(7), 1180-1197. doi:10.1177/1461444811401254

Lee, H.-H. &. (2010). Investigating Dimensionality of Multichannel Retailer's Cross-Channel Integration Practices and Effectiveness: Shopping Orientation and Loyalty Intention.

Journal of Marketing Channels, 17, 281-312. doi:10.1080/1046669X.20

Li, H., & Kannan, P. (2014). Attributing conversions in a multichannel online marketing environment: An empirical model and a field experiment. Journal of Marketing

Research, 51(1), 40-56.

McHugh, M. L. (2012). Interrater reliability: the kappa statistic. Biochemia Medica, 22(3), 276-282. doi:10.11613/BM.2012.031

Moe, W. W., & Fader, P. S. (2004). Dynamic Conversion Behavior at E-Commerce Sites.

Management Science, 50(3), 326-335. doi:10.1287/mnsc.1040.0153

Monat, J. P. (2011). Industrial sales lead conversion modeling. Marketing Intelligence &

Planning, 29(2), 178-194.

Montgomery, A., Li, S., Srinivasan, K., & Liechty, J. (2004, September). Modeling online browsing and path analysis using clickstream data. Marketing Science, 23(4), 579-595+630.

NDC mediagroep. (2019). Over NDC mediagroep. Retrieved from NDC mediagroep: https://www.ndcmediagroep.nl/over-ndc-mediagroep/

Park, C., & Park, Y.-H. (2016). Investigating purchase conversion by uncovering online visit patterns. Marketing Science, 35(6), 894-914.

Price, B. S. (2019). Automatic Response Category Combination in Multinomial Logistic Regression. Journal of Computational & Graphical Statistics, 28(3), 758–766. doi:10.1080/10618600.2019.1585258

Raab, D. M. (2008). Lead Scoring Takes Center Stage. DM Review, 18(8), 6.

Starkweather, J. &. (2011). Multinomial Logistic Regression. University of North Texas, 29, 2825-2830. Retrieved from http://www.unt.edu/rss/class/Jon/Benchmarks/MLR_JDS_Aug2011.pdf

Talatappeh, M. (2016). Examining the role of cooperation between international marketing and sales departments to create value for customers. International Business Management, 10(15), 3129-3135.

UCLA: Statistical Consulting Group. (2019). Multinomial Logistic Regression. Retrieved from UCLA: Institute for Digital Research & Education: https://stats.idre.ucla.edu/r/dae/multinomial-logistic-regression/

Verhoef, P. (2003). Understanding the Effect of Customer Relationship Management Efforts on Customer Retention and Customer Share Development. Journal of Marketing, 67(4), 30-45. doi:10.1509/jmkg.67.4.30.18685

Vranica, S., & Marshall, J. (2016, 10 20). Plummeting Newspaper Ad Revenue Sparks New Wave of Changes. Retrieved from Wall Street Journal: https://www.wsj.com/articles/plummeting-newspaper-ad-revenue-sparks-new-wave-of-changes-1476955801?cx_navSource=cx_picks&cx_tag=collabctx&cx_artPos=1#cxrecs_s

Witkin, H. A. (1977). Field-dependent and field-independent cognitive styles and their. Rev. Educ. Res., 47, 1-64. doi:10.3102/00346543047001001
