Estimating the impact of marketing campaign on the adoption of mobile banking : using marketing mix modelling

Academic year: 2021


Faculty of Economics and Business

Amsterdam School of Economics


Master’s Thesis

Estimating the impact of a marketing campaign on the adoption of mobile banking

Using marketing mix modelling

Valérie Vermaas

Student number: 10360301

Date of final version: January 15, 2018

Master's programme: Econometrics

Specialisation: Financial Econometrics

Supervisor: Prof. dr. C. G. H. Diks

Second reader: Prof. dr. A. Rapp


Statement of Originality

This document is written by Student Valérie Vermaas who declares to take full responsibility for the contents of this document. I declare that the text and the work presented in this document is original and that no sources other than those mentioned in the text and its references have been used in creating it. The Faculty of Economics and Business is responsible solely for the supervision of completion of the work, not for the contents.

Contents

1 Introduction

1.1 Mobile banking adoption
1.2 Marketing Mix Modelling
1.3 Return on Marketing Investment

2 Data and preliminary analysis

2.1 Data description
2.2 Preliminary Analysis

3 Methodology

3.1 Choice of model
3.2 Generalized Additive Models
3.3 Model evaluation

4 Results

4.1 The aggregated card services
4.2 The differences between the card services
4.3 Generalized Additive Models

5 Conclusion

Bibliography

Programs

Chapter 1

Introduction

Every year, billions are spent on marketing campaigns to convince customers to buy products or use services and, at the same time, to create brand awareness. Implicitly or explicitly, every firm follows a strategy for allocating this budget over TV, radio and various online and offline channels. Determining the best allocation, however, is a timely and complex problem. The Marketing Science Institute has acknowledged the complexity and importance of this topic by naming ‘improving multi-touch attribution, marketing mix, and ROI models – across all media, digital and non-digital’ one of its top research priorities (MSI, 2016).

ABN AMRO is a large Dutch bank that realizes it could make considerably more impact if its decisions on marketing campaigns were more data-driven. One way to better understand this problem, and thus to optimize the strategy, is Marketing Mix Modelling (Mhitarean, 2017): a variety of models suited to explaining the variation in a dependent variable such as sales or the number of users of a service. In addition, as Danaher and Rust (1996) describe, the results could be used to estimate and optimize the Return On Investment (ROI) of the marketing efforts.

While the bank is interested in the effect of media on all of its campaigns, this thesis focuses on mobile banking. One of the core values of ABN AMRO is innovation, and an important part of this is convincing customers to innovate with the bank; one way to do so is to adopt mobile banking. Shaikh and Karjaluoto (2015) report that in 2011, 96% of the world’s population had a mobile phone subscription while only 8.6% had a mobile banking account, in spite of the many advantages of mobile banking. For example, a customer who loses his debit card could contact the call center, go through a long choice menu, wait until an employee is available, pass some security checks and explain to the employee that he lost his card, wants to block it and wishes to receive a new one. Alternatively, he could open the mobile application, press a few buttons and have the whole process handled within a minute. The second option is preferable for both the bank and the customer, since contacting a call center costs a lot of money and time. There is therefore an incentive to use marketing campaigns to move customers towards solving such problems in the mobile application instead of through the call center.

Marketing Mix Modelling is a complex problem. While extensive research has been done on modelling the effectiveness of advertising on sales, little research has been conducted on the effect of marketing on the adoption of mobile banking, even though this adoption benefits both the bank and the customer. Moreover, most research uses monthly or weekly data instead of daily data. The objective of this thesis is to compare different econometric models to see which of them best describes the usage of cost-saving functions (e.g. applying for a new debit card) in the online environment, and to obtain insights into the marketing dynamics. This chapter discusses the most relevant literature on these matters and goes into more detail on the practical application.

1.1 Mobile banking adoption

Mobile banking can be advantageous for banks and customers alike. Wessels and Drennan (2010) give examples of these advantages for customers within the mobile value settings identified by Anckar and D’Incau (2002). These settings consist of (i) critical needs and arrangements, such as a forgotten bill payment, (ii) spontaneous needs and arrangements, such as an impulse purchase that requires a transfer of funds, (iii) efficiency needs and ambitions, for example increasing productivity during ‘dead time’ such as commuting, and (iv) mobility-related needs, e.g. having no access to a computer.

Furthermore, Wessels and Drennan (2010) studied the key motivators of customer acceptance of mobile banking and found that perceived usefulness was the most significant one. Marketing may therefore be needed to show customers how mobile banking can fit into their lifestyle and how useful it can be.

However, from a bank’s point of view, the cost saving that comes with mobile banking is just as important as the better customer experience. Chapter 2 describes the different self-services within the mobile application, which are cost-saving because they are processed online instead of through the call center or an office visit.

1.2 Marketing Mix Modelling

Various models will be considered in this thesis to capture the different dynamic effects of the marketing variables. Hanssens et al. (2002) describe different functional forms for decreasing, constant and increasing returns to scale. They also mention the possibility of using ‘S-shaped’ response curves, “nicely convex-concave” functions as first proposed by Ginsberg (1974). Generalized Additive Models (GAMs) could also be used to model the effect of marketing effort, as shown by Bhattacharya (2012).

Besides the question of the correct functional form of the marketing variables, the time effect also has to be considered. A TV advertisement seen today has an effect not only today but also, to a smaller extent, on the days that follow. To capture this evolving character of marketing exposure, a carry-over effect can be included with the AdStock model of Broadbent (1984). The AdStock model is based on the distributed lag models formulated by Koyck (1954) and Jorgenson (1966) and will be explained in Chapter 3.

In addition to the scale and time effects of the individual marketing channels, the cross-channel effects between different marketing channels have to be taken into account, as mentioned by Dinner et al. (2014). While advertising used to run on, for example, TV only, it has been found that advertising on multiple channels simultaneously can strengthen the message. Naik and Raman (2003) describe these effects as synergies. Nowadays there are even more ways to implement multichannel strategies, with the new possibilities of SEA strategies and advertising on social media (Leeflang et al., 2015).

Moreover, Naik et al. (2005) take the shares of competing brands into account. Chapter 2 describes how the competing marketing efforts (those of competing banks) enter our model, and discusses all other external factors that may influence the number of self-services.

1.3 Return on Marketing Investment

For a bank it is very useful to understand which factors influence the adoption of mobile banking, what the carry-over effect of the various marketing efforts is and what the synergies between the various marketing channels are. As discussed, the results could additionally be used to estimate and optimize the ROI. The ROI of marketing, which Farris et al. (2002) call the Marketing Return on Investment (MROI, also referred to as ROMI), is defined as

$$\text{MROI} = \frac{\text{Incremental financial value generated by marketing} - \text{Cost of marketing}}{\text{Cost of marketing}}. \tag{1.1}$$

This allows companies to calculate how beneficial their marketing effort is. The MROI might turn out negative, but companies should consider that short-term marketing also leads to a positive long-term effect. As Driver and Foxall (1986) describe, short-term advertising also has a long-term effect since, among other things, the people exposed to an advertisement are not necessarily existing customers, so the impact of advertising can be delayed. Furthermore, Clark et al. (2009) describe short-term advertising as one of the most important factors in creating brand awareness, and Macdonald and Sharp (2000) find in turn that brand awareness leads to better brand performance.
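As a small illustration of Eq. (1.1), the sketch below computes the MROI for two made-up campaigns. The function name `mroi` and the euro amounts are hypothetical and are not taken from the bank's data.

```python
def mroi(incremental_value: float, marketing_cost: float) -> float:
    """Marketing return on investment as defined in Eq. (1.1)."""
    if marketing_cost <= 0:
        raise ValueError("marketing_cost must be positive")
    return (incremental_value - marketing_cost) / marketing_cost


# Hypothetical campaigns (amounts in euros, for illustration only).
print(mroi(150_000, 100_000))  # 0.5: every euro spent returns 0.50 euro net
print(mroi(80_000, 100_000))   # negative: the short-term value does not cover the cost
```

A negative value in such a calculation only reflects the value that was counted; as argued above, long-term effects may not be included in the incremental value.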

More concretely, it has been found that when the long-term effect is also considered, the marketing effect doubles (Lodish and Lubetkin, 1992) or even triples or quadruples (Dyson, 2008). Moreover, Naik et al. (2000) conducted a meta-analysis of 113 case studies and found that advertising sometimes has a long-term effect that lasts longer than a year, which is not captured by AdStock.


Chapter 2

Data and preliminary analysis

The methods used in this thesis are applied to observational data from ABN AMRO, enriched with contextual data. The data provided by the bank come from different sources and can be divided into three categories. First, we have click data from the mobile application and internet banking. Second, we have access to data on the Above The Line (ATL) media campaigns from the bank's Marketing and Communication Dashboard. Third, there are data on the Below The Line (BTL) efforts. The data from these three sources are supplemented with contextual data.

For this research, data from September 2016 until November 2017 are used. Unlike, for example, mortgages, which the bank has sold for decades, card services have only been available online for this short period. We therefore cannot use data from a longer period, and we model at a daily rather than a weekly level, even though most studies use weekly data. Moreover, all data are available at a daily level. This gives us 414 days and an equal number of data points.

This chapter describes both the bank data and the contextual data, together with their relevant transformations, and presents a preliminary analysis. At the end of the data description, an overview of all variables is given together with the expected signs of their coefficients in our model.

2.1 Data description

Click data

We have access to click data from the mobile application and internet banking. From this dataset we can extract when a customer has reached the ‘Thank you’ page and has thus finished the process of requesting a service in the mobile application or internet banking. In some cases customers are sent from the mobile application to a section of the website, which makes it difficult to determine whether the service was requested within internet banking or the mobile application. Since the aim of the bank is to reduce store visits and calls to the call center, and both internet banking and the mobile application achieve this goal, we treat them as equal.

| Card Service | Total online requests | % online of total requests |
|---|---|---|
| New card | 458 k | 57.6% |
| Replace card | 770 k | 28.4% |
| Block card | 448 k | 36.3% |
| Deblock card | 154 k | 40.5% |
| Change daylimit | 1.32 M | 89.4% |
| Change geolocation | 665 k | 92.3% |

Table 2.1: Order size of the different card services

The main focus lies on six card services: requesting a new card, replacing a card, blocking a card, deblocking a card, changing the daily spending limit and changing the geographic location (enabling the card for use outside Europe). In most instances a request for one of these services is processed within one store visit or phone call, so the services can be considered equally cost-saving when processed online instead. These services are therefore aggregated in the first part of this thesis. Moreover, we aggregate the services requested online over all customers per day. Table 2.1 gives the order size of the different card services requested online, together with the average daily share of online requests in the total number of requests for that service over the given period.

ATL campaigns

ATL campaigns operate Above The Line: mass media are used to promote brands and reach the target customers. For these campaigns it is not possible to trace the effect back to an individual customer. The bank uses conventional media such as TV and radio commercials and print, as well as modern forms such as exposure on social media, Search Engine Advertisement (SEA) and online bannering on websites. We purposely do not use all kinds of media expenditure in our model, since some are performed on such a small scale that they will not contribute significantly to the model.

We know the marketing budget spent on all ATL channels and could therefore use it in our model. However, marketers base their decisions on the number of impressions made by their marketing efforts. Moreover, it is difficult to decide which costs should be taken into account. For example, should the creation costs of a campaign be divided equally over the days and the channels, and what if a campaign is reused? Also, if marketing euros spent are used in the model and the marketing costs drop due to lower prices in the TV or radio market, the model will interpret this as a lower level of advertising, while in fact the level remains constant. Finally, the quality of a commercial or advertisement can affect the effectiveness of the campaign, but this is left out of scope since it is difficult to measure.

For TV and Radio exposure, cumulative Gross Rating Points (GRPs) are used. Following Farrelly et al. (2005), GRPs measure the total volume of delivery of a media campaign to a target audience: the percentage of the target audience reached by the campaign times the frequency of exposure. In 2017, the Dutch population aged 18 years or older consisted of 13,339,900 persons, which makes one GRP equal to 133,399 contacts who have watched one commercial. Furthermore, as in Suárez and Estevez (2016), the GRPs are homogenised to a single spot length of 20 seconds, where three seen ads of 20 seconds are equivalent to two seen ads of 30 seconds.

| Category | Variable | Unit | Total Exposure | Number of days |
|---|---|---|---|---|
| ATL | TV | GRP | 1.12 K | 35 |
| ATL | Radio | GRP | 1.85 K | 35 |
| ATL | Online | Impressions | 655 B | 414 |
| ATL | Social | Impressions | 3.72 B | 414 |
| ATL | SEA | Impressions | 281 M | 405 |

Table 2.2: Order size of the ATL variables

| Category | Variable | Unit | Total Exposure | Number of days |
|---|---|---|---|---|
| BTL | BM | Amount sent | 3.24 M | 62 |
| BTL | EM | Amount sent | 194 K | 28 |
| BTL | DM | Amount sent | 160 | 13 |
| BTL | POLS | Amount shown | 10.1 M | 411 |

Table 2.3: Order size of the BTL variables
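The GRP arithmetic above can be sketched as follows. The population figure is the one quoted in the text; the function names are hypothetical, and the linear rescaling by spot length is an assumption inferred from the single equivalence stated above (three 20-second ads equal two 30-second ads), not a documented formula.

```python
ADULT_POPULATION_NL_2017 = 13_339_900  # figure quoted in the text


def contacts_per_grp(target_audience: int) -> float:
    """One GRP corresponds to 1% of the target audience seeing one spot."""
    return target_audience / 100


def to_20s_equivalent(grps: float, spot_length_s: float) -> float:
    """Rescale GRPs to 20-second equivalents, assuming linearity in spot length."""
    return grps * spot_length_s / 20


print(contacts_per_grp(ADULT_POPULATION_NL_2017))  # 133399.0
print(to_20s_equivalent(2, 30))                    # 3.0: two 30s ads count as three 20s ads
```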

For the channels Social, SEA and Online, exposure is measured in impressions. Table 2.2 gives an overview of the order size of each ATL channel and the number of days of exposure. More GRPs are spent on Radio than on TV, because GRPs on TV are more expensive. Moreover, there are far more impressions Online than on Social and SEA.

BTL campaigns

In contrast to ATL campaigns, BTL campaigns can be aimed at individual customers. For this purpose ABN AMRO uses BankMail (messages that appear in a mailbox in the mobile application), email, and direct mail (mail sent to the home address). From now on, these channels are referred to as BM, EM and DM respectively. Another channel used by the bank is POLS: banners that customers see when they are logged in to internet banking and that refer to a subject relevant to them.

A BM is available for reading for 60 days in the mobile app. Since not all users read their BM on a daily basis, the number of BMs available in the mobile application is used in our model. EM exposure occurs in large batches: for example, thousands of EMs may be sent on one day while none are sent in the following weeks. The methodology chapter describes how this property is modelled. Table 2.3 shows the order size of the BTL variables. There is a considerable difference in order size; in particular, the exposure of DM is very small. DM is therefore left out of the model.

| Category | Variable | Unit | Source | Expected Sign | Abbreviation |
|---|---|---|---|---|---|
| Click Data | Services used | Amount | Bank | Dep Var | SERV |
| ATL | TV | GRP | Bank | + | TV |
| ATL | Radio | GRP | Bank | + | RAD |
| ATL | Online | Impressions | Bank | + | ONL |
| ATL | Social | Impressions | Bank | + | SOC |
| ATL | SEA | Impressions | Bank | + | SEA |
| BTL | BM | Amount sent | Bank | + | BM |
| BTL | EM | Amount sent | Bank | + | EM |
| BTL | POLS | Amount shown | Bank | + | POLS |
| Contextual | School Holiday | Dummy | | +/- | SHOL |
| Contextual | Holiday | Dummy | | +/- | HOL |
| Contextual | ING MB | € | Nielsen | +/- | ING |
| Contextual | Rabobank MB | € | Nielsen | +/- | RABO |

Table 2.4: Overview of the variables used in the model

Contextual data

As one would expect, advertising expenditure is not the only driver of the number of service requests processed online. Seasonality, trust in the bank and competitors may also have an influence and have to be taken into account. To do so, we complement the data provided by the bank with contextual data.

First, we expect the number of services requested on a business day to differ from that on a weekend day. To control for this effect we include dummy variables for six days of the week; the seventh is left out. For the same reason we use dummies for school holidays and holidays, as we expect that on Christmas Day, for example, few people will be occupied with their debit card. Besides seasonality, we have to take competitors into account. We expect competitors' advertising for mobile banking to have a positive effect on the use of mobile banking at the bank considered in this thesis: we do not expect customers to change banks because of such advertising, but we do expect them to consider the possibilities of mobile banking in general. We allow for these effects using the estimated media budget spent on mobile banking by the competitors ING and Rabobank, as provided by The Nielsen Company.
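The day-of-week dummies described above could be built as in the following sketch. It assumes Sunday is the omitted baseline day (the text only says that one day is left out); the names `DAY_NAMES` and `weekday_dummies` and the start date are hypothetical.

```python
from datetime import date, timedelta

# Six indicator variables; Sunday is assumed to be the omitted baseline.
DAY_NAMES = ["MON", "TUE", "WED", "THU", "FRI", "SAT"]


def weekday_dummies(day: date) -> dict:
    """Return six 0/1 indicators for the given date; all zeros means Sunday."""
    # date.weekday() returns 0 for Monday up to 6 for Sunday.
    return {name: int(day.weekday() == i) for i, name in enumerate(DAY_NAMES)}


start = date(2017, 1, 1)  # hypothetical start date (a Sunday)
for offset in range(3):
    current = start + timedelta(days=offset)
    print(current, weekday_dummies(current))
```

Leaving one day out avoids perfect collinearity with the constant term of the regression; the coefficients of the six dummies are then read relative to the omitted day.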

2.2

Preliminary Analysis

Descriptive Statistics

Fig. 2.1 shows the graphs of the number of requests for the different card services described in the previous section. We observe that Deblock card and Block card have very similar time series and that Change geolocation has a peak around July, which can be explained by the holiday season, when many people change their geolocation to use their debit cards outside Europe.

Figure 2.1: Graphs of the different card services. Panels: (a) New card, (b) Replace card, (c) Block card, (d) Deblock card, (e) Change daylimit, (f) Change geolocation.

Figure 2.2. Panels: (a) New card, (b) Replace card.

done, while on the next day the level is very high in comparison with the other days. Upon further investigation it was found that, due to a database error, requests for a new card and for a card replacement are not correctly registered on weekends and holidays and are falsely allocated to the following day. To correct for this error, we assume that services (a) and (b) follow the same Saturday-Sunday-Monday allocation (or the allocation over all successive days around a holiday) as services (c)-(f). Under this assumption, the requests for services (a) and (b) are reallocated on the days where the database error occurred, as shown in Fig. 2.2. Fig. 2.3 shows the six time series of the card services in one graph.
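One possible reading of this correction is a proportional reallocation: the lump registered on the first correctly recorded day is spread over the affected days using the shares observed for services (c)-(f). The sketch below illustrates that reading only; the function name and all numbers are hypothetical, and the exact procedure used for the thesis may differ.

```python
def reallocate(lump: float, reference_counts: list) -> list:
    """Spread a lump over consecutive days in proportion to reference_counts."""
    total = sum(reference_counts)
    if total <= 0:
        raise ValueError("reference counts must have a positive sum")
    return [lump * count / total for count in reference_counts]


# Hypothetical numbers: requests for services (c)-(f) on Saturday, Sunday and
# Monday, and 1,000 'New card' requests that were all registered on the Monday.
reference = [200, 100, 700]
print(reallocate(1000, reference))  # [200.0, 100.0, 700.0]
```

The reallocated series keeps the same total as the registered series, so only the timing of the requests changes.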

Table 2.5 gives the descriptive statistics for all variables described in the previous section. We observe that, apart from Online, every media variable has at least one day without exposure in the given period. The graphs of the time series of all variables used in this thesis are given in Fig. 2.4. To make the output easier to interpret, the impressions on Social, Online and SEA are included in the model in millions, and BM, EM and POLS in thousands.

Figure 2.4. Panels: (a) Card Services, (b) TV, (c) Radio, (d) Online, (e) Social, (f) SEA, (g) Bank Mail, (h) Email, (i) POLS, (j) ING, (k) Rabobank.

| Statistic | N | Mean | St. Dev. | Min | Max |
|---|---|---|---|---|---|
| SERV | 414 | 6.19 K | 1.59 K | 2.48 K | 10.36 K |
| TV | 414 | 2.69 | 9.66 | 0 | 56 |
| RAD | 414 | 44.75 | 154.36 | 0 | 799 |
| ONL | 414 | 158.24 M | 91.63 M | 30.72 K | 527.35 K |
| SOC | 414 | 8.98 M | 18.92 M | 0 | 107.12 M |
| SEA | 414 | 67.76 M | 116.17 M | 0 | 546.92 M |
| BM | 414 | 428.38 K | 452.39 K | 0 | 1.33 M |
| EM | 414 | 464.31 | 5.18 K | 0 | 70.75 K |
| POLS | 414 | 24.40 K | 22.79 K | 0 | 137.10 K |
| ING | 414 | 150.75 K | 416.21 K | 0 | 1.82 M |
| RABO | 414 | 44.31 K | 347.51 K | 0 | 3.24 M |

Table 2.5: Descriptive Statistics

Correlation between variables

A heat map of the correlation between the variables is given in Fig. 2.5. The red tiles signify negative correlation whilst the blue tiles signify positive correlation. The darker the colour, the stronger the correlation. From the heatmap we can deduce that several media variables have positive correlations, especially TV and Social, ING and Radio and ING and Social. However, the largest correlation is still smaller than 0.7, thus we expect this correlation is not worrisome. The positive correlation also makes sense if we look at Fig. 2.4, since we see that the TV, Radio, ING and Social campaign are all approximately simultaneous.
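The correlation screening described above can be sketched with a plain Pearson correlation and the 0.7 threshold mentioned in the text. The toy series below are invented for illustration and are not the bank's data; `pearson` and `high_correlations` are hypothetical helper names.

```python
from math import sqrt


def pearson(x: list, y: list) -> float:
    """Sample Pearson correlation between two equally long series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def high_correlations(series: dict, threshold: float = 0.7) -> list:
    """List variable pairs whose absolute correlation reaches the threshold."""
    names = list(series)
    flagged = []
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            r = pearson(series[first], series[second])
            if abs(r) >= threshold:
                flagged.append((first, second, round(r, 3)))
    return flagged


# Invented toy series: TV and SOC run at the same time, POLS does not.
toy = {"TV": [0, 5, 9, 0, 0, 3], "SOC": [1, 4, 8, 0, 1, 2], "POLS": [3, 3, 2, 4, 3, 3]}
print(high_correlations(toy))
```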


Chapter 3

Methodology

In the introduction we discussed that marketing mix modelling is a timely and complex problem. This chapter describes how the difficulties encountered within this subject are addressed. An appropriate functional form of the model has to be chosen, we have to account for seasonality, and the dynamic structure of the marketing efforts has to be considered. Moreover, we describe the criteria on which the choice of the final model is based and the tests used to validate this model.

3.1 Choice of model

Stationarity

The first step of our modelling approach is to test whether the considered time series is stationary, a necessary condition for obtaining valid test results. We test for a unit root with the augmented Dickey-Fuller (ADF) test. Following the approach of Heij et al. (2004), we use the ADF test equation

$$\Delta y_t = \alpha + \beta t + \rho y_{t-1} + \rho_1 \Delta y_{t-1} + \cdots + \rho_{p-1} \Delta y_{t-p+1} + \varepsilon_t \tag{3.1}$$

to test the following hypotheses:

$$H_0: \rho = 0 \text{ and } \beta = 0 \quad \text{(stochastic trend)},$$

$$H_1: \rho < 0 \text{ and } \beta \neq 0 \quad \text{(deterministic trend)},$$

using a t-test on $\rho$ in EViews with automated lag selection based on the Bayesian Information Criterion (BIC) proposed by Schwarz (1978), given by

$$\text{BIC} = \log(N)\,k - 2\log(\hat{L}),$$

where $\hat{L}$ is the maximum likelihood value, $k$ the number of parameters estimated by the model and $N$ the number of observations.

We have to inspect the time series to decide whether it appears to have a trend and a constant. If it does, we include them in the test; otherwise we leave them out. The test is performed at the 5% significance level.
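To make the BIC comparison concrete, the sketch below evaluates the formula above for two candidate specifications and picks the one with the lowest value. The log-likelihoods and parameter counts are hypothetical; only the sample size of 414 comes from the text.

```python
from math import log


def bic(log_likelihood: float, n_params: int, n_obs: int) -> float:
    """BIC = log(N) * k - 2 * log(L), with log(L) the maximised log-likelihood."""
    return log(n_obs) * n_params - 2.0 * log_likelihood


# Hypothetical candidates: (maximised log-likelihood, number of parameters).
candidates = {"with trend": (-350.0, 12), "without trend": (-352.0, 11)}
scores = {name: bic(ll, k, 414) for name, (ll, k) in candidates.items()}
best = min(scores, key=scores.get)
print(scores)
print("lowest BIC:", best)
```

In this made-up case the extra parameter raises the log-likelihood by 2, which is not enough to offset the penalty of log(414), roughly 6, so the smaller model is preferred.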


Model selection

As described in the data section, we have a substantial number of variables that might explain the number of requests of card services. We do want to explain as much as possible, but we also don’t want to overparametrize the model. Therefore we choose the model with the lowest BIC to select the optimal number of parameters and the optimal functional form.

From the descriptive statistics in the data chapter, we can deduce that there might be autonomous growth in the number of services requested online. We therefore test whether a trend should be included in the model, using the BIC described earlier. Furthermore, we see a weekly pattern in the time series, so we expect that dummy variables are needed to describe this effect. We investigate whether it is better to include dummy variables for Monday to Saturday, to indicate only whether a day is a business day, to group the days in a different way, or not to include a day variable at all. Finally, we investigate whether it is better to incorporate the media spend of the competitors separately or together, and whether it is better to include the net spend or merely a dummy variable indicating whether a media campaign was running that day.

Linear or log-linear

As described in the literature section, both linear and non-linear functional forms are used in marketing mix modelling. While a linear model is simple and intuitive, it does not allow for a non-linear return to scale of marketing efforts.

A linear model,

$$y_t = \beta_0 + \beta_1 x_{1t} + \ldots + \beta_k x_{kt} + \varepsilon_t, \tag{3.2}$$

implies that each additional unit of $x_{it}$ (for example one GRP) results in an identical increase of $y_t$, which is usually not realistic. We expect that an exponential model,

$$y_t = e^{\beta_0 + \beta_1 x_{1t} + \beta_2 x_{2t} + \ldots + \varepsilon_t}, \tag{3.3}$$

might provide a better fit. It can again be estimated with OLS after taking logarithms on the left- and right-hand side,

$$\log(y_t) = \beta_0 + \beta_1 x_{1t} + \ldots + \beta_k x_{kt} + \varepsilon_t, \tag{3.4}$$

which is well known as the log-linear model. Its coefficients can be interpreted through

$$\%\Delta y = 100 \cdot \left(e^{\beta_i} - 1\right) \approx 100\,\beta_i, \quad \text{for } \beta_i \text{ small and } i = 1, 2, \ldots$$
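The quality of the approximation in the interpretation formula can be checked numerically. The sketch below compares the exact percentage effect with the linear approximation for a few hypothetical coefficient values; `pct_effect` is an illustrative name.

```python
from math import exp


def pct_effect(beta: float) -> float:
    """Exact percentage change in y for a one-unit increase in x (log-linear model)."""
    return 100.0 * (exp(beta) - 1.0)


for beta in (0.01, 0.05, 0.30):  # hypothetical coefficient values
    exact = round(pct_effect(beta), 2)
    approx = round(100.0 * beta, 2)
    print(f"beta={beta}: exact {exact}%, approximation {approx}%")
```

For a coefficient of 0.30 the exact effect is about 35% rather than 30%, so the approximation should only be used for small coefficients.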

As proposed by Mhitarean (2017), we can use the Box-Cox transformation

$$\frac{y_t^{\lambda} - 1}{\lambda} = \beta_0 + \beta_1 x_{1t} + \ldots + \beta_k x_{kt} + \varepsilon_t \tag{3.5}$$

on the linear model to find the appropriate functional form. When $\lambda$ tends to 1, (3.5) goes to

$$y_t - 1 = \beta_0 + \beta_1 x_{1t} + \ldots + \beta_k x_{kt} + \varepsilon_t \tag{3.6}$$

and the linear model is the best fit. When $\lambda \to 0$, (3.5) goes to (3.4), thus the log-linear model is the best fit. We test whether model (3.2) or (3.4) is optimal using the Likelihood Ratio (LR) test. When neither 0 nor 1 lies in the 95% confidence interval of $\lambda$, other functional forms should be investigated.
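The two limiting cases of the Box-Cox transformation can be verified numerically with a short sketch. The value of `y` is chosen to be close to the mean of SERV in Table 2.5; the function name `box_cox` is illustrative and the treatment of λ = 0 as the logarithm is the usual convention.

```python
from math import log


def box_cox(y: float, lam: float) -> float:
    """Box-Cox transform of a positive value; lam = 0 is defined as log(y)."""
    if y <= 0:
        raise ValueError("y must be positive")
    if abs(lam) < 1e-12:
        return log(y)
    return (y ** lam - 1.0) / lam


y = 6190.0  # roughly the mean daily number of services in Table 2.5
print(box_cox(y, 1.0))           # 6189.0: linear case, y shifted by one
print(box_cox(y, 1e-6), log(y))  # nearly identical: log-linear case
```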

The Ramsey RESET test

It should also be tested whether squared or cubic effects should be incorporated in the model. This can be done with the Ramsey RESET test. As in Tsay (2005), we use the least squares estimate $\hat{\beta} = (\hat{\beta}_0, \hat{\beta}_1, \ldots, \hat{\beta}_k)'$ to compute the fit $\hat{y}_t = X_t'\hat{\beta}$ with $X_t = (1, x_{1t}, \ldots, x_{kt})'$, the residuals $\hat{\varepsilon}_t = y_t - \hat{y}_t$ and the sum of squared residuals $SSR_0 = \sum_{t=p+1}^{T} \hat{\varepsilon}_t^2$ with $T = 414$, the sample size. Next, the linear regression

$$\hat{\varepsilon}_t = X_t'\alpha_1 + M_t'\alpha_2 + \upsilon_t \tag{3.7}$$

is run with $M_t = (\hat{y}_t^2, \hat{y}_t^3)'$, giving the least squares residuals

$$\hat{\upsilon}_t = \hat{\varepsilon}_t - X_t'\hat{\alpha}_1 - M_t'\hat{\alpha}_2$$

and the sum of squared residuals $SSR_1 = \sum_{t=k+1}^{T} \hat{\upsilon}_t^2$. Our null hypothesis is that $\alpha_1$ and $\alpha_2$ of Eq. 3.7 are equal to zero. This can be tested with the F-statistic of Eq. 3.7, given by

$$F = \frac{(SSR_0 - SSR_1)/g}{SSR_1/(T - k - g)} \quad \text{with } g = s + k + 1, \tag{3.8}$$

where $s = 2$ as we control for quadratic and cubic effects. When the null hypothesis is rejected, this indicates that a quadratic or cubic term should be included.
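Given the two sums of squared residuals, the F-statistic of Eq. (3.8) is a one-line computation. The sketch below uses hypothetical values for the sums of squares and the number of regressors; it does not run the two regressions themselves, and the comparison with an F critical value (which needs a statistics library) is left out.

```python
def reset_f_statistic(ssr0: float, ssr1: float, n_obs: int, k: int, s: int = 2) -> float:
    """F-statistic of Eq. (3.8) with g = s + k + 1."""
    g = s + k + 1
    return ((ssr0 - ssr1) / g) / (ssr1 / (n_obs - k - g))


# Hypothetical sums of squared residuals for a model with k = 10 regressors.
f_stat = reset_f_statistic(ssr0=12.0, ssr1=11.0, n_obs=414, k=10)
print(round(f_stat, 3))
```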

AdStock

As mentioned in Chapter 1, advertising at time t has a direct effect at time t, but also a (smaller) effect at time t + 1, t + 2, ... and this carry-over effect can be included with the AdStock model of Broadbent (1984). Lags of the advertising expenditure of a marketing channel will be included in the model using an AdStock variable

$$\text{AdStock}_{kt} = f(x_{kt}) = x_{kt} + \lambda_k x_{k,t-1} + \lambda_k^2 x_{k,t-2} + \ldots \tag{3.9}$$

with $x_{kt}$ the advertising expenditure of channel $k$ at time $t$ and $\lambda_k \in [0, 1)$ the decay factor or retention rate of channel $k$, which can be rewritten as

$$\text{AdStock}_{kt} = \lambda_k \text{AdStock}_{k,t-1} + x_{kt}. \tag{3.10}$$

Besides AdStock variables that capture the carry-over effects of the marketing exposure, we want to include synergy AdStock variables in our model. As described in the literature section, there are synergies between different marketing channels and we want to capture those in our model. We can do so by defining, similarly to equation 3.10, the synergy effect between channels $k$ and $k'$ as

$$\text{AdStock}_{k:k',t} = \lambda_{k:k'} \text{AdStock}_{k:k',t-1} + x_{kt}\, x_{k't}. \tag{3.11}$$
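The recursions in Eqs. (3.10) and (3.11) translate directly into a loop. The sketch below assumes that the AdStock before the first observation is zero, which the text does not specify; the function names and the example series are illustrative.

```python
def adstock(x: list, decay: float) -> list:
    """AdStock recursion of Eq. (3.10), assuming zero stock before the sample."""
    if not 0.0 <= decay < 1.0:
        raise ValueError("decay must lie in [0, 1)")
    stock, out = 0.0, []
    for exposure in x:
        stock = decay * stock + exposure
        out.append(stock)
    return out


def synergy_adstock(x_k: list, x_k2: list, decay: float) -> list:
    """Synergy AdStock of Eq. (3.11): the recursion applied to the product of two channels."""
    return adstock([a * b for a, b in zip(x_k, x_k2)], decay)


tv = [100.0, 0.0, 0.0, 0.0]  # a single burst of exposure
print(adstock(tv, 0.5))      # [100.0, 50.0, 25.0, 12.5]
```

With a retention rate of 0.5, half of the previous stock carries over to each following day, which reproduces the geometric weights of Eq. (3.9).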

The value of $\lambda$ depends, of course, on the channel of the marketing effort. Most research on AdStock models uses weekly instead of daily data. In a meta-analysis of 114 papers, Assmus et al. (1984) found an average $\lambda$ of 0.46 with a standard deviation of $\sigma = 0.30$. Since we use daily data, we expect the retention rate to be closer to one than for weekly data, as this is required to obtain the same half-life, $\log(0.5)/\log(\lambda)$. However, Abe (1991) found that for cranberry drinks the daily retention rate was equal to 0.909, which implies a half-life of 7.26 days, which is relatively short. For SEA and interactions with SEA we do not include an AdStock variable, since one is only exposed to this channel when actively searching for mobile banking, so the effect is expected to be direct. The same holds for POLS: customers only see a POLS message when they are already in the online banking environment, so we do not include AdStock for POLS.
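The half-life formula and the relation between weekly and daily retention rates can be illustrated as follows. The conversion uses the fact that a daily rate compounded over seven days must equal the weekly rate for the half-life in calendar time to stay the same; the helper names are hypothetical.

```python
from math import log


def half_life(decay: float) -> float:
    """Number of periods after which half of the AdStock has decayed."""
    return log(0.5) / log(decay)


def weekly_to_daily(decay_weekly: float) -> float:
    """Daily retention rate with the same calendar-time half-life as a weekly rate."""
    return decay_weekly ** (1.0 / 7.0)


print(round(half_life(0.909), 2))        # 7.26 days, as reported for Abe (1991)
print(round(weekly_to_daily(0.46), 3))   # 0.895: daily equivalent of the weekly average
```

The weekly average of 0.46 thus corresponds to a daily rate of roughly 0.9, which is in line with the expectation that daily retention rates lie closer to one.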

Non-linear estimation

Since the AdStock model for the media efforts makes our model nonlinear, we have to use a nonlinear optimization method to solve

$$\min S(\beta) = \min \sum_{t=1}^{T} (y_t - \hat{y}_t)^2 \quad \text{s.t. } \lambda_k \in [0, 1). \tag{3.12}$$

We choose not to use the Gauss-Newton method, since we want to set bounds on λ and this is not possible with the Gauss-Newton method. Therefore we estimate $\beta = (\beta_1 \dots \beta_k \; \lambda_1 \dots \lambda_k)^T$ with the Levenberg-Marquardt method, which is a combination of the Gauss-Newton and the gradient descent method. This is an iterative method for which starting values have to be given for β. We use λ = 0.9 and the estimates of the model without AdStock as an initial guess for β. Gavin (2017) notes that the parameter updates of the Levenberg-Marquardt algorithm vary between the gradient descent update and the Gauss-Newton update,

$$\left[ J^T W J + \gamma I \right] \theta_{lm} = J^T W (y - \hat{y}), \tag{3.13}$$

where small values of the algorithmic parameter γ result in a Gauss-Newton update and large values of γ in a gradient descent update. In each iterative step the value of γ is adjusted. If an iteration leads to a worse approximation, $S(\beta + \theta_{lm}) > S(\beta)$, γ is increased. Otherwise, when the approximation improves, γ is decreased. To improve convergence, Marquardt (1963) stated that the values of γ have to be normalized to the values of $J^T W J$ such that

$$\left[ J^T W J + \gamma \, \mathrm{diag}(J^T W J) \right] \theta_{lm} = J^T W (y - \hat{y}). \tag{3.14}$$
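To make the update rule concrete, the sketch below applies the scaled update of equation 3.14 to a deliberately small toy problem: one channel, the model ŷ_t = b · AdStock_t(λ), W = I and noise-free synthetic data. All names, the spend series, the forward-difference Jacobian and the clipping of λ to [0, 0.999] are assumptions made for this illustration; it is not the routine used for the estimates in this thesis.

```python
def adstock(spend, decay):
    """AdStock recursion A_t = decay * A_{t-1} + x_t, with A = 0 before the sample."""
    stock, previous = [], 0.0
    for x in spend:
        previous = decay * previous + x
        stock.append(previous)
    return stock


def predict(params, spend):
    """Toy model y_hat_t = b * AdStock_t(lam); params = [b, lam]."""
    b, lam = params
    return [b * a for a in adstock(spend, lam)]


def sse(params, spend, y):
    return sum((yt - ft) ** 2 for yt, ft in zip(y, predict(params, spend)))


def jacobian_columns(params, spend, h=1e-6):
    """Forward-difference derivatives of y_hat with respect to each parameter."""
    base = predict(params, spend)
    columns = []
    for j in range(len(params)):
        shifted = list(params)
        shifted[j] += h
        moved = predict(shifted, spend)
        columns.append([(m - b0) / h for m, b0 in zip(moved, base)])
    return columns


def levenberg_marquardt(spend, y, start, gamma=1e-2, iterations=200):
    params = list(start)
    for _ in range(iterations):
        jb, jl = jacobian_columns(params, spend)
        resid = [yt - ft for yt, ft in zip(y, predict(params, spend))]
        # Entries of J^T J and J^T (y - y_hat) for the two parameters (W = I).
        a11 = sum(v * v for v in jb)
        a22 = sum(v * v for v in jl)
        a12 = sum(u * v for u, v in zip(jb, jl))
        g1 = sum(u * r for u, r in zip(jb, resid))
        g2 = sum(v * r for v, r in zip(jl, resid))
        # Marquardt scaling (eq. 3.14): add gamma * diag(J^T J) to J^T J.
        m11, m22 = a11 * (1.0 + gamma), a22 * (1.0 + gamma)
        det = m11 * m22 - a12 * a12
        if det == 0.0:
            break
        step_b = (g1 * m22 - a12 * g2) / det
        step_l = (m11 * g2 - a12 * g1) / det
        # Keep lam inside [0, 1) by clipping the trial value (an assumption of this sketch).
        trial = [params[0] + step_b, min(max(params[1] + step_l, 0.0), 0.999)]
        if sse(trial, spend, y) < sse(params, spend, y):
            params, gamma = trial, gamma / 10.0  # better fit: move toward Gauss-Newton
        else:
            gamma *= 10.0                        # worse fit: move toward gradient descent
    return params


spend = [10.0, 0.0, 0.0, 5.0, 0.0, 0.0, 0.0, 8.0, 0.0, 0.0, 3.0, 0.0]
observed = predict([2.0, 0.6], spend)  # synthetic data with b = 2 and lam = 0.6
b_hat, lam_hat = levenberg_marquardt(spend, observed, start=[1.0, 0.9])
print(round(b_hat, 3), round(lam_hat, 3))
```

Starting from b = 1 and λ = 0.9, the iterations should recover the values used to generate the data. With real data the residuals do not vanish and the result depends more strongly on the starting values.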

After we have estimated the parameters using the Levenberg-Marquardt method, we save the AdStock rates which are significantly different from zero and use those rates to re-estimate the log-linear model. We will evaluate whether our model has improved by comparing the BIC values and thereafter use stepwise selection where we delete the least significant variable in each iteration until either the BIC does not improve anymore or every variable is significant at the 5% level. Subsequently we will determine whether there exist significant interaction effects between the media variables and whether interaction AdStock effects should be included.
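The BIC comparison that drives the stepwise deletion can be sketched as follows. The sketch assumes a least-squares fit with Gaussian errors, for which the BIC equals n·log(RSS/n) + p·log(n) up to a constant; the residual sums of squares in the example are hypothetical and only n = 414 corresponds to the sample size of this thesis.

```python
import math


def gaussian_bic(rss, n_obs, n_params):
    """BIC of a least-squares fit, up to an additive constant."""
    return n_obs * math.log(rss / n_obs) + n_params * math.log(n_obs)


def reduced_model_preferred(rss_full, p_full, rss_reduced, p_reduced, n_obs):
    """True when dropping variables lowers (improves) the BIC."""
    return gaussian_bic(rss_reduced, n_obs, p_reduced) < gaussian_bic(rss_full, n_obs, p_full)


n = 414
# Hypothetical: dropping one variable raises the RSS from 5.00 to 5.02 (small loss of fit).
small_loss = reduced_model_preferred(5.00, 14, 5.02, 13, n)
# Hypothetical: dropping one variable raises the RSS from 5.00 to 5.10 (larger loss of fit).
large_loss = reduced_model_preferred(5.00, 14, 5.10, 13, n)
print(small_loss, large_loss)  # True False
```

Each deleted parameter saves log(414), about 6.03, in penalty, so a variable is only kept when it reduces n·log(RSS/n) by more than that amount.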


CHAPTER 3. METHODOLOGY 16

3.2 Generalized Additive Models

We will also attempt to model the dynamic nature of the media variables with the use of Generalized Additive Models, GAM. For this part of the modelling we follow the approach and notation of Wood (2017). A generalized additive model in general has a structure which resembles

$$g(\mu_i) = X_i^{*} \theta + f_1(x_{1i}) + f_2(x_{2i}) + f_3(x_{3i}, x_{4i}) + \dots \tag{3.15}$$

where $\mu_i \equiv E(Y_i)$ and $Y_i \sim$ some exponential family distribution.

Furthermore, $Y_i$ indicates the response variable, in our case services or log(services), whichever turns out to be the best fit from the linear models. $X_i^{*}$ is a row of the model matrix with only strictly parametric model components, in our case the dummy variables that indicate weekdays, holidays and so on. $\theta$ is the corresponding parameter vector and each $f_j$ represents a smooth function of the covariates $x_k$ that are not estimated strictly parametrically, in our case the media variables. It is also possible to include interaction effects, as in the last term of 3.15. In this section a brief explanation will be given of the GAM theory.

Thin plate regression splines

A smooth function $f_j$ can be estimated by choosing basis functions $b_{ij}(x)$. Then $f_j$ has the representation

$$f_j(x) = \sum_{i=1}^{q} b_{ij}(x) \beta_{ij}. \tag{3.16}$$

Various bases are available. In this thesis we use a thin plate regression spline basis to find $f$, which is the default in the mgcv package of R, since it can represent smooths of more than one predictor variable and these splines are considered optimal. Thin plate regression splines are based on thin plate splines, which estimate a smooth function $g(x)$ by minimizing

$$\| y - f \|^2 + \phi J_{md}(f) \tag{3.17}$$

where $y$ is a vector of $n$ observations

$$y_i = g(x_i) + \varepsilon_i$$

and $f = (f(x_1), f(x_2), \dots, f(x_n))^T$. $J_{md}(f)$ is a function that puts a penalty on the so-called 'wiggliness' of $f$. We want to fit the data as well as possible, but at the same time prefer a smooth function $f$. We define φ as the smoothing parameter that controls the tradeoff between those two properties (note that the standard notation for the smoothing parameter is λ, but we choose φ to avoid confusion since λ is already used as decay factor). If φ → ∞, the estimate is a straight line, while if φ = 0 there is no penalty.

While the thin plate spline ˆf is a good smoother, thin plate splines have as many unknown parameters as there are unique predictor combinations and are therefore computationally costly.


Therefore thin plate regression splines are introduced. They divide the components of the thin plate splines into wiggly components δ, whose space is truncated, and zero-wiggliness components α, which remain unchanged. They are estimated with the Lanczos algorithm, which has a lower computational cost. For more background on thin plate regression splines and the Lanczos algorithm see Wood (2017).

GCV function

We have to choose the optimal smoothing parameter φ to make a tradeoff between a penalty on wiggliness and a penalty on bad fit. This is done using the Generalized Cross Validation (GCV) score, where φ is chosen such that

$$V_g = \frac{n D(\hat{\beta})}{(n - \mathrm{tr}(A))^2} \tag{3.18}$$

is minimized, where $D(\hat{\beta})$ is the deviance and $\mathrm{tr}(A)$ the effective degrees of freedom of the model. The GCV score is known to overfit on occasion; therefore Kim and Gu (2004) suggest multiplying the degrees of freedom by γ = 1.4. This can largely correct the tendency to overfit without compromising model fit and will thus be used in this thesis to estimate the optimal value of φ.
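The effect of the factor γ on the choice of φ can be illustrated with a small sketch of the GCV score, V_g = n·D / (n − γ·tr(A))². The three candidate fits below (smoothing parameter, deviance, effective degrees of freedom) are hypothetical numbers chosen for the illustration; only n = 414 is taken from this thesis.

```python
def gcv_score(deviance, n_obs, edf, gamma=1.0):
    """GCV score with the effective degrees of freedom inflated by gamma."""
    denominator = n_obs - gamma * edf
    if denominator <= 0:
        raise ValueError("gamma * edf must be smaller than n_obs")
    return n_obs * deviance / denominator ** 2


def best_phi(candidates, n_obs, gamma):
    """Return the smoothing parameter whose fit has the lowest GCV score."""
    return min(candidates, key=lambda c: gcv_score(c["deviance"], n_obs, c["edf"], gamma))["phi"]


# Hypothetical fits: a larger phi gives a smoother fit (lower edf) but a higher deviance.
candidates = [
    {"phi": 0.1, "deviance": 100.0, "edf": 60.0},
    {"phi": 1.0, "deviance": 112.0, "edf": 30.0},
    {"phi": 10.0, "deviance": 128.0, "edf": 10.0},
]
plain_choice = best_phi(candidates, 414, gamma=1.0)
inflated_choice = best_phi(candidates, 414, gamma=1.4)
print(plain_choice, inflated_choice)  # 1.0 10.0
```

With these numbers the unadjusted score selects the fit with 30 effective degrees of freedom, while γ = 1.4 selects the smoother fit with 10, which is the intended protection against overfitting.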

Model selection

The GAM method does not allow for parameter selection; therefore we start with the full model and subsequently use stepwise selection to find the optimal number of parameters. Moreover, we will include AdStock variables of all media variables except for SEA and POLS. Since with GAM we cannot determine an optimal decay factor as we can with the Levenberg-Marquardt method, we use the decay factor found for daily data by Abe (1991), λ = 0.909.

3.3 Model evaluation

In addition to the tests mentioned earlier, we have to perform the Jarque-Bera test for normality and calculate the Variance Inflation Factors (VIFs) to validate the model. While the former is well-known (for reference see Heij et al. (2004)), the latter is a less commonly used method and will thus be discussed in this section.

Although in Chapter 2 we could conclude from the heat map that the correlation between the variables was not worrisome, we plan on including AdStock transformations of the media variables and interaction effects. To check whether or not those models suffer from severe multicollinearity we calculate the VIF for each variable j as described in Fox (2008)

$$\mathrm{VIF}_j = \frac{1}{1 - R_j^2}, \tag{3.19}$$

where $R_j^2$ is the $R^2$ of the regression of $X_j$ on the other covariates, in the example that $j = 1$. Since the estimated variance of $\hat{\beta}_j$ can be expressed as

$$\widehat{\mathrm{Var}}(\hat{\beta}_j) = \frac{1}{1 - R_j^2} \cdot \frac{s^2}{(n - 1) \widehat{\mathrm{Var}}(X_j)} = \mathrm{VIF}_j \cdot \frac{s^2}{(n - 1) \widehat{\mathrm{Var}}(X_j)}, \tag{3.21}$$

the VIF indicates the impact of collinearity on the precision of $\hat{\beta}_j$. Thus with a VIF of four, the standard error of $\hat{\beta}_j$ is two times as big as it would be if $X_j$ were uncorrelated with the other covariates.
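The relation between R²_j, the VIF and the inflation of the standard error can be checked with a short sketch. The function names are hypothetical; the value 26.404 is used only as an example of a VIF above the conventional threshold of ten.

```python
import math


def vif_from_r2(r2):
    """VIF_j = 1 / (1 - R_j^2), eq. 3.19."""
    if not 0.0 <= r2 < 1.0:
        raise ValueError("R^2 must lie in [0, 1)")
    return 1.0 / (1.0 - r2)


def se_inflation(vif):
    """Factor by which the standard error grows relative to an uncorrelated regressor."""
    return math.sqrt(vif)


def r2_from_vif(vif):
    """R_j^2 implied by a given VIF."""
    return 1.0 - 1.0 / vif


print(vif_from_r2(0.75))   # 4.0
print(se_inflation(4.0))   # 2.0
implied_r2 = r2_from_vif(26.404)
print(implied_r2)
```

A VIF of about 26 therefore means that more than 96% of the variation in that regressor is explained by the other covariates.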


Chapter 4

Results

In this chapter the methods of the previous chapter are applied to the dataset provided by ABN AMRO on the self-services within their online banking environment, which is described in Chapter 2. First the behaviour of the aggregated card services will be investigated, next the differences between the card services and lastly the generalized additive models.

As described in the previous chapter, the first step in our modelling approach is to check whether there is a stochastic, a deterministic or no trend in the time series. This is tested with the augmented Dickey-Fuller (ADF) test. Table 4.1 reports the ADF statistic with the corresponding p-value, together with the lag length of the test and whether a constant and a trend were included in the test. We inspected the time series in Fig. 2.4 (a)-(f) to decide whether or not to include a trend. New card does not appear to have a trend, but for Change geolocation we are not sure. For this series we first tested for a unit root with a constant and a trend and then, when the null hypothesis could not be rejected, we tested with a constant only, for which the null hypothesis is not rejected either. This indicates that Change geolocation has a unit root. When we take the first differences of Change geolocation the null hypothesis is rejected, thus we should model the first differences for this series. All other series have a trend included in the test and turn out to have a deterministic trend, as the null hypothesis of a stochastic trend is rejected. Finally, Request new card is stationary.

| Name | ADF statistic | p-value | Lag length | Constant | Trend |
|---|---|---|---|---|---|
| Card Services | -4.09 | 0.0071 | 15 | | |
| Change geolocation | -2.72 | 0.0719 | 7 | | |
| ∆(Change geolocation) | -7.79 | < 0.0001 | 6 | | |
| Change daylimit | -6.30 | < 0.0001 | 13 | | |
| Block card | -10.88 | < 0.0001 | 6 | | |
| Deblock card | -8.57 | < 0.0001 | 12 | | |
| Request new card | -6.90 | < 0.0001 | 14 | | |
| Replace card | -16.06 | < 0.0001 | 5 | | |

Table 4.1: The ADF-statistic for all card services


CHAPTER 4. RESULTS 20

4.1 The aggregated card services

The linear and log-linear model

When modelling the aggregated card services, we first want to determine whether we should include the media expenditure on mobile banking of both competitors separately or combined, whether we want to use the net spend or a dummy variable indicating whether they had a campaign on mobile banking running, and in which way we should incorporate the day-of-the-week effect. Based on the BIC values, it is best to include the competitors separately, each with a dummy variable indicating whether there was a campaign on mobile banking that day. Moreover, we can conclude that it is better to incorporate dummies for the days Monday to Saturday than only a dummy indicating whether or not it was a business day. However, Wednesday and Thursday were not significantly different from Friday, so we group these days, which also gives a better BIC-value. Finally, using the same measure, it can be concluded that there is a significant trend and thus autonomous growth. The results of the linear model that incorporates these decisions,

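$$\begin{aligned}
\mathrm{SERV}_t = {} & \beta_0 + \beta_1 \mathrm{TREND}_t + \beta_2 \mathrm{TV}_t + \beta_3 \mathrm{RAD}_t + \beta_4 \mathrm{SOC}_t + \beta_5 \mathrm{ONL}_t + \beta_6 \mathrm{SEA}_t + \beta_7 \mathrm{RABO}_t \\
& + \beta_8 \mathrm{ING}_t + \beta_9 \mathrm{SHOL}_t + \beta_{10} \mathrm{HOL}_t + \beta_{11} \mathrm{TUE}_t + \beta_{12} \mathrm{WEDTHUFRI}_t + \beta_{13} \mathrm{SAT}_t \\
& + \beta_{14} \mathrm{SUN}_t + \beta_{15} \mathrm{BM}_t + \beta_{16} \mathrm{EM}_t + \beta_{17} \mathrm{POLS}_t + \varepsilon_t,
\end{aligned} \tag{4.1}$$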

are shown in the first column of Table 4.2.

We conclude that all variables are significant except for Social, Online and EM and that POLS has a significant effect, but with an unexpected sign. Next, we perform the LR test on the Box-Cox transformation as described in 3.5. As the results of this test, displayed in Fig. 4.1, show that 0 lies in the 95% confidence interval of λ, we can conclude that the log-linear model,

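$$\begin{aligned}
\log(\mathrm{SERV}_t) = {} & \beta_0 + \beta_1 \mathrm{TREND}_t + \beta_2 \mathrm{TV}_t + \beta_3 \mathrm{RAD}_t + \beta_4 \mathrm{SOC}_t + \beta_5 \mathrm{ONL}_t + \beta_6 \mathrm{SEA}_t \\
& + \beta_7 \mathrm{RABO}_t + \beta_8 \mathrm{ING}_t + \beta_9 \mathrm{SHOL}_t + \beta_{10} \mathrm{HOL}_t + \beta_{11} \mathrm{TUE}_t + \beta_{12} \mathrm{WEDTHUFRI}_t \\
& + \beta_{13} \mathrm{SAT}_t + \beta_{14} \mathrm{SUN}_t + \beta_{15} \mathrm{BM}_t + \beta_{16} \mathrm{EM}_t + \beta_{17} \mathrm{POLS}_t + \varepsilon_t,
\end{aligned} \tag{4.2}$$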

is a better fit. The results of this model are shown in the second column of Table 4.2. The signs and significances of the variables are equal to those of the linear model, except for POLS, which has lost its significance. Moreover, the BIC-value of the log-linear model is better than that of the linear model. Finally, the Ramsey RESET test is performed for both models; the results are shown in Table 4.2. The null hypothesis of no nonlinearity is rejected for the linear model, but not for the log-linear model. From the full log-linear model the optimal log-linear model is determined using the R function stepAIC with penalty k = log(n) = log(414) instead of the default k = 2 per degree of freedom, to get the model with the best BIC. The results are shown in Table 4.3. All media variables are omitted from the model, except for TV, Radio, SEA and BM. Moreover, media exposure of Rabobank has a positive effect while media exposure of ING has a negative effect, which might depend on the nature of the campaign. All remaining media variables have a positive effect. On Wednesday, Thursday and Friday most card services are requested digitally and on Sunday the least. Furthermore, on a holiday fewer card services are requested digitally, while on a school holiday more are requested.

| | Linear model | Log-linear model |
|---|---|---|
| (Intercept) | 5.23·10^3∗ (2.71·10^2) | 8.53·10^0∗ (4.21·10^-2) |
| Trend | 6.15·10^0∗ (4.29·10^-1) | 1.03·10^-3∗ (6.68·10^-5) |
| TV | 1.78·10^1∗ (5.52·10^0) | 2.89·10^-3∗ (8.59·10^-4) |
| RAD | 7.98·10^-1∗ (3.22·10^-1) | 1.19·10^-4∗ (5.01·10^-5) |
| SOC | −1.69·10^0 (3.59·10^0) | −3.77·10^-4 (5.59·10^-4) |
| ONL | 7.04·10^-3 (5.07·10^-1) | 3.79·10^-5 (7.89·10^-5) |
| SEA | 2.20·10^0∗ (4.01·10^-1) | 3.18·10^-4∗ (6.24·10^-5) |
| RABO | 3.34·10^2∗ (1.12·10^2) | 4.70·10^-2∗ (1.74·10^-2) |
| ING | −6.42·10^2∗ (1.42·10^2) | −8.67·10^-2∗ (2.21·10^-2) |
| SHOL | 2.72·10^2∗ (9.99·10^2) | 4.12·10^-2∗ (1.55·10^-2) |
| HOL | −1.40·10^3∗ (2.40·10^2) | −2.59·10^-1∗ (3.73·10^-2) |
| TUE | 3.87·10^2∗ (1.34·10^2) | 5.94·10^-2∗ (2.09·10^-2) |
| WEDTHUFRI | 5.37·10^2∗ (1.10·10^2) | 8.38·10^-2∗ (1.72·10^-2) |
| SAT | −1.79·10^3∗ (1.38·10^2) | −3.13·10^-1∗ (2.14·10^-2) |
| SUN | −2.39·10^3∗ (1.37·10^2) | −4.61·10^-1∗ (2.14·10^-2) |
| BM | 4.67·10^-1∗ (1.21·10^-1) | 6.95·10^-5∗ (1.88·10^-5) |
| EM | −8.09·10^-1 (7.06·10^0) | −1.13·10^-3∗ (1.10·10^-3) |
| POLS | −5.52·10^0∗ (2.14·10^0) | −6.51·10^-3∗ (3.33·10^-3) |
| N | 414 | 414 |
| R2 | 0.799 | 0.835 |
| adj. R2 | 0.791 | 0.828 |
| Resid. sd | 727.79 | 0.113 |
| BIC | 6727.502 | -532.705 |
| RESET (p-value) | 5.0803 (0.006632) | 2.8913 (0.05668) |

Standard errors in parentheses. ∗ indicates significance at p < 0.05.

Table 4.2: The estimates of the linear and log-linear model
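Because the dependent variable is in logs, a dummy coefficient β corresponds to a relative change of exp(β) − 1 in the number of card services. The sketch below converts the coefficients of the optimal log-linear model in Table 4.3 into percentages; the function name is hypothetical, and the day effects are interpreted relative to the omitted reference day.

```python
import math


def percent_effect(beta):
    """Relative change (in %) in the response implied by a dummy coefficient in a log-linear model."""
    return 100.0 * (math.exp(beta) - 1.0)


# Coefficients of the optimal log-linear model (first column of Table 4.3).
coefficients = {
    "SUN": -4.53e-1,
    "SAT": -3.06e-1,
    "HOL": -2.65e-1,
    "WEDTHUFRI": 8.43e-2,
    "SHOL": 4.06e-2,
}
effects = {name: percent_effect(beta) for name, beta in coefficients.items()}
for name, effect in effects.items():
    print(f"{name}: {effect:+.1f}%")
```

Under this reading, a Sunday goes together with roughly 36% fewer digitally requested card services than the reference day, and a Wednesday, Thursday or Friday with roughly 9% more.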

AdStock Estimation

As described in Chapter 3, for all media variables which are included in the optimal log-linear model, AdStock variables are included:

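$$\begin{aligned}
\log(\mathrm{SERV}_t) = {} & \beta_0 + \beta_1 \mathrm{TREND}_t + \beta_2 \mathrm{AdStock}(\mathrm{TV}_t, \lambda_{TV}) + \beta_3 \mathrm{AdStock}(\mathrm{RAD}_t, \lambda_{RAD}) + \beta_4 \mathrm{SEA}_t \\
& + \beta_5 \mathrm{RABO}_t + \beta_6 \mathrm{ING}_t + \beta_7 \mathrm{SHOL}_t + \beta_8 \mathrm{HOL}_t + \beta_9 \mathrm{TUE}_t + \beta_{10} \mathrm{WEDTHUFRI}_t \\
& + \beta_{11} \mathrm{SAT}_t + \beta_{12} \mathrm{SUN}_t + \beta_{13} \mathrm{AdStock}(\mathrm{BM}_t, \lambda_{BM}) + \varepsilon_t.
\end{aligned} \tag{4.3}$$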

This model is estimated with the Levenberg-Marquardt method. Saving the AdStocks, the model is re-estimated with OLS. Next, the least significant variable is removed from the model and the model is estimated using the Levenberg-Marquardt method again. This procedure is repeated until the BIC-value no longer improves by removing variables. The results are shown in Table 4.3. All coefficients have the same sign as in the optimal log-linear model; Rabobank and BM are removed in the optimal model with AdStock. TV has an optimal decay factor of 0.000, which indicates that there is approximately no carry-over effect. Radio has a decay factor of 0.9454, which indicates a half-life of 12.3 days.

Next, for the media variables that are included in the optimal log-linear model, both the AdStock and the interaction AdStock are included:

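$$\begin{aligned}
\log(\mathrm{SERV}_t) = {} & \beta_0 + \beta_1 \mathrm{TREND}_t + \beta_2 \mathrm{AdStock}(\mathrm{TV}_t, \lambda_{TV}) + \beta_3 \mathrm{AdStock}(\mathrm{RAD}_t, \lambda_{RAD}) + \beta_4 \mathrm{SEA}_t \\
& + \beta_5 \mathrm{RABO}_t + \beta_6 \mathrm{ING}_t + \beta_7 \mathrm{SHOL}_t + \beta_8 \mathrm{HOL}_t + \beta_9 \mathrm{TUE}_t + \beta_{10} \mathrm{WEDTHUFRI}_t \\
& + \beta_{11} \mathrm{SAT}_t + \beta_{12} \mathrm{SUN}_t + \beta_{13} \mathrm{AdStock}(\mathrm{BM}_t, \lambda_{BM}) + \beta_{14} \mathrm{AdStock}(\mathrm{TV}_t \cdot \mathrm{RAD}_t, \lambda_{TVRAD}) \\
& + \beta_{15} \mathrm{AdStock}(\mathrm{TV}_t \cdot \mathrm{BM}_t, \lambda_{TVBM}) + \beta_{16} \mathrm{AdStock}(\mathrm{RAD}_t \cdot \mathrm{BM}_t, \lambda_{RADBM}) \\
& + \beta_{17} \mathrm{SEA}_t \cdot \mathrm{TV}_t + \beta_{18} \mathrm{SEA}_t \cdot \mathrm{RAD}_t + \beta_{19} \mathrm{SEA}_t \cdot \mathrm{BM}_t + \varepsilon_t.
\end{aligned} \tag{4.4}$$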

For SEA, interaction effects with the other media variables are included instead of the interaction AdStock. The optimal model is determined with the same procedure as for the optimal log-linear model with only AdStock. The results are shown in Table 4.3. The signs are equal to those of the log-linear model and the log-linear model with AdStock, except for the AdStock variable of Radio, which is negative in the model with interactions. Radio and BM have a positive synergy, just like SEA and Radio, while the other synergies are negative. However, also because its BIC-value is worse than that of the model with only AdStock, we expect this model to be over-parametrized. This can lead to poor estimates, which can explain the negative value of the AdStock for Radio.

As described in the methodology, several tests will be performed to evaluate the models. First, the variance inflation factor is given. In the absence of severe multicollinearity, this should be smaller than ten for all variables. From the results given in Table 4.4 we can conclude that the VIFs exceed this boundary for the AdStock variables in the model with interaction. Thus the multicollinearity in this model is severe and the estimates are less reliable. Moreover, the Jarque-Bera test is performed on the residuals of the three optimal models and their qq-plots are included. The Adjusted Jarque-Bera statistics in Table 4.5 indicate that for all models the null hypothesis of normality is rejected. From their qq-plots in Fig. 4.2 we can also distinguish non-normality.

Columns, in order: Log-linear; Log-linear AdStock; Log-linear AdS and Int. Rows with fewer than three entries belong to variables that are not included in every model; their entries are listed in source order.

(Intercept): 8.52·10^0∗ (3.57·10^-2); 8.56·10^0∗ (2.73·10^-2); 8.47·10^0∗ (2.90·10^-2)
Trend: 1.06·10^-3∗ (6.04·10^-2); 9.51·10^-4∗ (5.53·10^-2); 1.02·10^-3∗ (8.68·10^-5)
TV: 2.61·10^-3∗ (7.19·10^-4)
RAD: 1.24·10^-4∗ (4.51·10^-5)
SEA: 3.14·10^-4∗ (6.05·10^-5); 3.77·10^-4∗ (5.47·10^-5); 3.49·10^-4∗ (9.33·10^-5)
RABO: 3.71·10^-2∗ (1.69·10^-2); 3.62·10^-2 (1.67·10^-2)
ING: −8.55·10^-2∗ (2.18·10^-2); −6.32·10^-2∗ (2.00·10^-2)
SHOL: 4.06·10^-2∗ (1.44·10^-2); 3.08·10^-2∗ (1.51·10^-2)
HOL: −2.65·10^-1∗ (3.69·10^-2); −2.51·10^-1∗ (3.53·10^-2); −2.67·10^-1∗ (3.56·10^-2)
TUE: 6.06·10^-2∗ (2.08·10^-2); 6.34·10^-2∗ (2.00·10^-2); 6.14·10^-2∗ (1.93·10^-2)
WEDTHUFRI: 8.43·10^-2∗ (1.71·10^-2); 8.84·10^-2∗ (1.64·10^-2); 8.95·10^-2∗ (1.58·10^-2)
SAT: −3.06·10^-1∗ (2.10·10^-2); −3.03·10^-1∗ (2.01·10^-2); −3.04·10^-1∗ (1.95·10^-2)
SUN: −4.53·10^-1∗ (2.10·10^-2); −4.52·10^-2∗ (2.01·10^-2); −4.54·10^-2∗ (1.94·10^-2)
BM: 5.17·10^-5∗ (1.56·10^-5); 2.85·10^-6∗ (2.25·10^-5)
AdS_TV: 1.58·10^-3∗ (5.91·10^-4); 1.35·10^-2∗ (2.75·10^-3)
AdS_RAD: 3.00·10^-5∗ (3.33·10^-6); −2.10·10^-4∗ (4.29·10^-5)
AdS_RADTV: −9.83·10^-8 (1.55·10^-7)
AdS_RADBM: 1.73·10^-7∗ (2.52·10^-8)
AdS_TVBM: −9.81·10^-6∗ (2.36·10^-6)
SEA*RAD: 4.53·10^-6 (6.78·10^-6)
SEA*TV: −2.20·10^-4∗ (1.10·10^-4)
SEA*BM: −4.23·10^-8 (1.10·10^-7)
λ_TV: 0.0000; 0.0000
λ_RAD: 0.9454; 0.7967
λ_RADTV: 0.9900
λ_RADBM: 0.8920
λ_TVBM: 0.0000
N: 414; 414; 414
R2: 0.8334155979; 0.8448191335; 0.8398901538
adj. R2: 0.8280016049; 0.8409684917; 0.8325940089
Resid. sd: 0.1132782322; 0.1089245701; 0.1117557208
BIC: -552.2811; -599.7158; -590.4975

Standard errors in parentheses. AdS_α is an abbreviation for AdStock(α, λ_α). ∗ indicates significance at p < 0.05.

Table 4.3: The optimal models with and without AdStock and interaction

Columns, in order: VIF Optimal; VIF Optimal AdStock; VIF Optimal AdStock interaction. Rows with fewer than three entries belong to variables that are not included in every model; their entries are listed in source order.

Trend: 1.680; 1.524; 4.048
TV: 1.553
AdStock(TV, λ_TV): 1.136; 26.404
RAD: 1.562
AdStock(RAD, λ_RAD): 1.130; 28.658
SEA: 1.589; 1.405; 4.401
RABO: 1.482; 1.685
ING: 1.185; 1.081
SHOL: 1.217; 1.640
HOL: 1.034; 1.024; 1.124
TUE: 1.734; 1.732; 1.741
WEDTHUFRI: 2.311; 2.301; 2.324
SAT: 1.737; 1.725; 1.749
SUN: 1.735; 1.720; 1.731
BM: 1.612; 3.885
AdStock(RAD·BM, λ_RADBM): 6.374
AdStock(TV·BM, λ_TVBM): 26.828
AdStock(RAD·TV, λ_RADTV): 14.536

Table 4.4: The VIF of the optimal models

| Model | AJB statistic | p-value |
|---|---|---|
| Log-linear | 16.099 | 0.003 |
| Log-linear with AdStock | 19.004 | 0.002 |
| Log-linear with AdStock and interaction | 18.487 | 0.005 |

Table 4.5: The AJB statistics

[Figure 4.2, qq-plot panels: (a) Log-linear, (b) with AdStock, (c) with AdStock and interaction]

4.2 The differences between the card services

In this section the different card services will be modelled and the differences between the card services will be reviewed. Are the same media variables significant, and do they exhibit the same weekly pattern? Table 4.1 showed that Change daylimit, Block card, Deblock card and Replace card have a deterministic trend, thus a trend is included; Request new card is stationary, and Change geolocation has a stochastic trend, thus its first differences are modelled.

Firstly, for every series the optimal day grouping is found. The results of the models with all variables are shown in Table 4.6. The day-of-the-week effects are left out of the table for readability. Subsequently, StepAIC is used again to determine the optimal models, the results can be found in Table 4.7.

For the different time series of the separate card services it could be investigated whether it is better to use a log-linear functional form instead of a linear, and interaction effects and AdStock could be included. This might give a better fit, but the main objective of this thesis is to model the total of card services since they are all equally cost-saving. The reason to model them separately is to see whether or not the media variables had influence on all card services. In chapter 5 conclusions will be drawn from the results.


| | Daylimit | Block | Deblock | ∆Geolocation | New card | Replace card |
|---|---|---|---|---|---|---|
| (Intercept) | 2200.2829∗ (183.2910) | 322.2906∗ (16.9314) | 76.2672∗ (8.4518) | 305.3080∗ (36.1962) | 503.5507∗ (30.8149) | 505.6731∗ (102.3529) |
| Trend | 2.8162∗ (0.2904) | 0.6669∗ (0.0268) | 0.4446∗ (0.0134) | | | 1.4629∗ (0.1714) |
| TV | 11.3894∗ (3.7338) | −0.1720 (0.3449) | −0.2401 (0.1722) | 0.0767 (0.9458) | 1.0191 (0.8054) | 4.0542 (2.2086) |
| RAD | 0.2720 (0.2178) | 0.0083 (0.0200) | −0.0077 (0.0100) | 0.0455 (0.0551) | 0.0678 (0.0469) | −0.1175 (0.1291) |
| SOC | 3.2294 (2.4300) | −0.0800 (0.2248) | −0.0230 (0.1121) | 0.0316 (0.6156) | −1.4338∗ (0.5242) | 0.0199 (1.4361) |
| ONL | 0.2884 (0.3431) | 0.0195 (0.0316) | −0.0054 (0.0158) | −0.0378 (0.0834) | 0.0186 (0.0710) | 0.0179 (0.2029) |
| SEA | 1.4931∗ (0.2712) | 0.0653∗ (0.0251) | 0.0521∗ (0.0125) | 0.0761 (0.0654) | 0.1009 (0.0556) | 0.0826 (0.1605) |
| RABO | 251.6827∗ (75.7393) | −23.0756∗ (7.0050) | −4.9439 (3.4925) | −20.0777 (18.9646) | −68.4580∗ (16.1364) | 3.0007 (44.8146) |
| ING | −333.1837∗ (96.2118) | −11.8842 (8.8722) | −5.4291 (4.4365) | 8.5594 (24.1685) | −8.4679 (20.5630) | 7.0272 (56.9166) |
| SHOL | −10.0219 (67.5796) | 7.5666 (6.2496) | 1.4369 (3.1162) | −25.4498 (16.9624) | 51.6175∗ (14.4376) | 18.9937 (40.0163) |
| HOL | −593.5460∗ (162.3170) | −23.4586 (15.0085) | −20.9491 (7.4847) | −108.5370 (41.2054) | −395.2507 (35.0156) | −215.1816∗ (95.8346) |
| BM | 0.2089∗ (0.0817) | 0.0293∗ (0.0075) | 0.0114∗ (0.0038) | −0.0126 (0.0207) | −0.0517∗ (0.0177) | 0.0634 (0.0484) |
| EM | 1.4271 (4.7789) | −0.0689 (0.4410) | 0.0818 (0.2204) | −1.3443 (1.2116) | −1.4420 (1.0312) | −0.8278 (2.8285) |
| POLS | 0.3218 (1.4463) | −0.7583∗ (0.1309) | −0.4449∗ (0.0667) | −0.0415 (0.3457) | 0.1192 (0.2943) | −0.9920 (0.8578) |
| N | 414 | 414 | 414 | 414 | 414 | 414 |
| R2 | 0.7197 | 0.8019 | 0.8667 | 0.5916 | 0.8775 | 0.3664 |
| adj. R2 | 0.7077 | 0.7939 | 0.8610 | 0.5741 | 0.8726 | 0.3441 |
| Resid. sd | 492.3983 | 45.5420 | 22.7052 | 124.7319 | 106.2199 | 291.8329 |
| BIC | 6403.98 | 4427.817 | 3856.478 | 5267.036 | 5129.031 | 5955.895 |

Standard errors in parentheses. ∗ indicates significance at p < 0.05.


Columns, in order: Daylimit; Block; Deblock; ∆Geolocation; New card; Replace card. Rows with fewer than six entries belong to variables that are not included in every model; their entries are listed in source order.

(Intercept): 2297.2118∗ (144.1702); 307.5812∗ (11.2458); 63.8916∗ (4.7336); 1922.5529∗ (60.7113); 487.9578∗ (19.3768); 520.3316∗ (29.8830)
Trend: 2.7726∗ (0.2462); 0.6743∗ (0.0244); 0.4510∗ (0.0121); 1.4470∗ (0.1216)
TV: 16.4252∗ (2.8926); 3.7261∗ (1.5068)
SEA: 1.5014∗ (0.2583); 0.0650∗ (0.0234); 0.0552∗ (0.0117)
RABO: 242.3245∗ (71.6596); −17.4710∗ (6.2053); 148.0617∗ (33.4496); −52.3376∗ (14.3336)
ING: −302.9949∗ (90.5927); −313.3175∗ (41.7493)
HOL: −604.1333∗ (160.1591); −21.5633 (7.3679); −394.6436∗ (34.5570)
BM: 0.2976∗ (0.0630); 0.0315∗ (0.0060); 0.0101∗ (0.0027); 0.2211∗ (0.0355); −0.0576∗ (0.0125)
POLS: −0.7506∗ (0.1235); −0.4114∗ (0.0624); −5.3328∗ (0.6031)
RAD: 0.5739∗ (0.0968)
SOC: −3.0559∗ (0.9214)
ONL: −0.5240∗ (0.1388)
SHOL: 220.9757∗ (29.7489); 51.5886∗ (12.6485)
N: 414; 414; 414; 414; 414; 414
R2: 0.7136; 0.7982; 0.8637; 0.5759; 0.8731; 0.3503
adj. R2: 0.7065; 0.7942; 0.8607; 0.5621; 0.8706; 0.3455
Resid. sd: 493.4272; 45.5151; 22.7332; 224.0023; 107.0359; 291.5225
BIC: 6370.781; 4387.382; 3817.571; 5731.878; 5095.421; 5899.988

Standard errors in parentheses. ∗ indicates significance at p < 0.05.


4.3 Generalized Additive Models

We first consider the GAM with all media and contextual variables. Also for all media variables except for SEA and POLS AdStock is included with λ = 0.909 as described in Section 3. Again we tested, based on the BIC value, whether it is better to group weekdays and the media spent on mobile banking of the competitors. This results in the following equation:

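$$\begin{aligned}
E(\log(\mathrm{SERV}_t)) = {} & \theta_0 + \theta_1 \mathrm{TREND}_t + \theta_2 \mathrm{COMP}_t + \theta_3 \mathrm{SHOL}_t + \theta_4 \mathrm{HOL}_t + \theta_5 \mathrm{SAT}_t + \theta_6 \mathrm{SUN}_t \\
& + \theta_7 \mathrm{THU}_t + \theta_8 \mathrm{WEDFRI}_t + f_1(\mathrm{AdStock}(\mathrm{TV}_t, \lambda)) + f_2(\mathrm{AdStock}(\mathrm{RAD}_t, \lambda)) \\
& + f_3(\mathrm{AdStock}(\mathrm{SOC}_t, \lambda)) + f_4(\mathrm{AdStock}(\mathrm{ONL}_t, \lambda)) + f_5(\mathrm{SEA}_t) \\
& + f_6(\mathrm{AdStock}(\mathrm{BM}_t, \lambda)) + f_7(\mathrm{AdStock}(\mathrm{EM}_t, \lambda)) + f_8(\mathrm{POLS}_t) + \varepsilon_t,
\end{aligned} \tag{4.5}$$

with λ = 0.909.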

| A. parametric coefficients | Estimate | Std. Error | t-value | p-value |
|---|---|---|---|---|
| (Intercept) | 8.4986 | 0.0279 | 304.5362 | < 0.0001 |
| Trend | 0.0012 | 0.0001 | 13.1227 | < 0.0001 |
| COMP | 0.0274 | 0.0153 | 1.7972 | 0.0731 |
| SHOL | 0.0259 | 0.0182 | 1.4241 | 0.1553 |
| HOL | -0.2486 | 0.0403 | -6.1771 | < 0.0001 |
| SAT | -0.2985 | 0.0195 | -15.3044 | < 0.0001 |
| SUN | -0.4335 | 0.0214 | -20.2949 | < 0.0001 |
| THU | 0.0576 | 0.0157 | 3.6653 | 0.0003 |
| TUE | 0.0582 | 0.0155 | 3.7583 | 0.0002 |
| WEDFRI | 0.0799 | 0.0137 | 5.8454 | < 0.0001 |

| B. smooth terms | edf | Ref.df | F-value | p-value |
|---|---|---|---|---|
| s(AdStock(TV, 0.909)) | 5.2232 | 6.3058 | 8.2511 | < 0.0001 |
| s(AdStock(RAD, 0.909)) | 1.0003 | 1.0005 | 0.0767 | 0.7820 |
| s(AdStock(SOC, 0.909)) | 2.2170 | 2.8363 | 0.9841 | 0.2862 |
| s(AdStock(ONL, 0.909)) | 7.9135 | 8.6660 | 3.6586 | 0.0003 |
| s(SEA) | 3.1847 | 3.9998 | 11.5871 | < 0.0001 |
| s(AdStock(BM, 0.909)) | 5.8773 | 6.9649 | 12.1111 | < 0.0001 |
| s(AdStock(EM, 0.909)) | 3.8406 | 4.7890 | 1.6549 | 0.1206 |
| s(POLS) | 2.8846 | 3.6301 | 8.0898 | < 0.0001 |

| C. model properties | Value |
|---|---|
| R-sq.(adj.) | 0.876 |
| Scale est. | 3.1528e+05 |
| Deviance explained | 88.8% |
| BIC | 6632.138 |
| GCV | 3.8513e+05 |
| n | 414 |

Table 4.8: Output of the full GAM

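In Table 4.8 the model output is shown. This tells us that Competitors, School Holiday and the smooth terms of Radio, Social and EM are not significant at the 5% level. Estimating the optimal model achieved by stepwise selection,

$$\begin{aligned}
E(\log(\mathrm{SERV}_t)) = {} & \theta_0 + \theta_1 \mathrm{TREND}_t + \theta_2 \mathrm{HOL}_t + \theta_3 \mathrm{SAT}_t + \theta_4 \mathrm{SUN}_t + \theta_5 \mathrm{THU}_t + \theta_6 \mathrm{WEDFRI}_t \\
& + f_1(\mathrm{AdStock}(\mathrm{TV}_t, \lambda)) + f_2(\mathrm{AdStock}(\mathrm{ONL}_t, \lambda)) + f_4(\mathrm{SEA}_t) \\
& + f_5(\mathrm{AdStock}(\mathrm{BM}_t, \lambda)) + f_6(\mathrm{POLS}_t) + \varepsilon_t
\end{aligned} \tag{4.6}$$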

with λ = 0.909, we get the output as shown in Table 4.9. We can see that the GCV and BIC have improved, but the adjusted R2 and the deviance explained are a little lower. In Fig. 4.3 (a)-(e) the estimated partial regression functions of Model 4.6 are shown. The x-axis gives the media spend and the y-axis the effect of the media variable. The solid line shows the estimated effect and the grey area is the 95% confidence interval. The stripes, or so-called rug plots, at the bottom of each plot show the values of the covariates of each smooth; if the stripes are close together there are many observations in that area, if they are far apart there are few and the 95% confidence interval gets wider. From the graphs of the smooth functions we can tell that SEA is monotonically increasing: the more SEA impressions the better. POLS is increasing as well, but decreases after about 70,000 views. However, the confidence interval gets very wide at that point since there are few observations in this area, so it may well be that POLS is saturated at this point. TV shows about the same pattern, except that the effect decreases slightly after a while. This might be due to the fact that the AdStock of TV is included instead of the GRPs. Since the campaign lasted 35 days and a decay factor of λ = 0.909 gives a half-life of 7.26 days, the AdStock of TV is higher after the campaign than on the first few days of the campaign. A lower effect at a higher AdStock therefore does not necessarily mean that spending fewer GRPs on TV works better. For BM we see different levels of effect for different numbers of visible BM. This might be attributable to the nature of BM: sometimes a message is sent to a large number of customers, while on another occasion a more specific target group is chosen. Some messages have a stronger 'call to action' than others. While there might be a similar explanation for Online, the smooth is so wiggly that it looks overfitted.

| A. parametric coefficients | Estimate | Std. Error | t-value | p-value |
|---|---|---|---|---|
| (Intercept) | 8.5339 | 0.0212 | 403.0150 | < 0.0001 |
| Trend | 0.0011 | 0.0001 | 14.0647 | < 0.0001 |
| HOL | -0.2598 | 0.0404 | -6.4347 | < 0.0001 |
| SAT | -0.2936 | 0.0194 | -15.0976 | < 0.0001 |
| SUN | -0.4289 | 0.0214 | -20.0024 | < 0.0001 |
| THU | 0.0602 | 0.0158 | 3.8101 | 0.0002 |
| TUE | 0.0579 | 0.0157 | 3.6901 | 0.0003 |
| WEDFRI | 0.0828 | 0.0138 | 6.0165 | < 0.0001 |

| B. smooth terms | edf | Ref.df | F-value | p-value |
|---|---|---|---|---|
| s(AdStock(TV, 0.909)) | 4.8320 | 5.8710 | 10.6305 | < 0.0001 |
| s(AdStock(ONL, 0.909)) | 8.1989 | 8.8126 | 5.2275 | < 0.0001 |
| s(SEA) | 2.8920 | 3.6321 | 15.6346 | < 0.0001 |
| s(AdStock(BM, 0.909)) | 6.2377 | 7.3498 | 14.1622 | < 0.0001 |
| s(POLS) | 2.9663 | 3.7279 | 7.9058 | < 0.0001 |

| C. model properties | Value |
|---|---|
| R-sq.(adj.) | 0.871 |
| Scale est. | 3.2644e+05 |
| Deviance explained | 88.1% |
| BIC | 6602.143 |
| GCV | 3.8087e+05 |
| n | 414 |

Table 4.9: Output of the reduced GAM
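The build-up of the TV AdStock during a campaign can be illustrated with a small sketch. It assumes a stylized campaign with one unit of GRPs on each of 35 days followed by two weeks without spend; only the campaign length and λ = 0.909 are taken from the text, the constant spend is a simplification.

```python
def adstock(spend, decay):
    """AdStock recursion A_t = decay * A_{t-1} + x_t, with A = 0 before the sample."""
    stock, previous = [], 0.0
    for x in spend:
        previous = decay * previous + x
        stock.append(previous)
    return stock


campaign_days, decay = 35, 0.909
spend = [1.0] * campaign_days + [0.0] * 14  # stylized: constant spend, then nothing
stock = adstock(spend, decay)

first_day = stock[0]                  # AdStock on the first campaign day
last_day = stock[campaign_days - 1]   # AdStock on the last campaign day
first_after = stock[campaign_days]    # AdStock on the first day after the campaign
print(first_day, round(last_day, 2), round(first_after, 2))
```

In this stylized case the AdStock on the first day after the campaign is still several times larger than on the first days of the campaign, although no GRPs are spent anymore. This is why a given AdStock level cannot be read directly as a given level of GRPs.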

[Figure 4.3: Graphs of the smooth functions. Panels: (a) s(AdStock(TV,0.909)), (b) s(SEA), (c) s(AdStock(Online,0.909)), (d) s(POLS), (e) s(AdStock(BM,0.909)), (f) te(RAD,TV)]

Including Synergy

We have seen in the linear models that there is synergy between Radio and TV. As discussed in Chapter 3, thin plate regression splines can represent smooths of more than one predictor variable. We incorporated this interaction effect using a te() term, which produces a smooth of Radio and TV from the tensor product of any basis available for use with s() (Wood, 2017).


Thus the following model is estimated:

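$$\begin{aligned}
E(\log(\mathrm{SERV}_t)) = {} & \theta_0 + \theta_1 \mathrm{TREND}_t + \theta_2 \mathrm{HOL}_t + \theta_3 \mathrm{SAT}_t + \theta_4 \mathrm{SUN}_t + \theta_5 \mathrm{THU}_t + \theta_6 \mathrm{WEDFRI}_t \\
& + f_1(\mathrm{AdStock}(\mathrm{TV}_t, \lambda), \mathrm{AdStock}(\mathrm{Radio}_t, \lambda)) + f_2(\mathrm{AdStock}(\mathrm{ONL}_t, \lambda)) \\
& + f_4(\mathrm{SEA}_t) + f_5(\mathrm{AdStock}(\mathrm{BM}_t, \lambda)) + f_6(\mathrm{POLS}_t) + \varepsilon_t
\end{aligned} \tag{4.7}$$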

with λ = 0.909, of which the results are shown in Table 4.10. The GCV, the deviance explained and the adjusted R2 are better than in model 4.6, but the BIC is worse. We performed an F-test with the null hypothesis that model 4.7 is better than 4.6 and this null hypothesis is rejected, see Table 4.11. Thus we cannot conclude that it is better to include the interaction effect of Radio and TV instead of excluding Radio. In Fig. 4.3 (f) the tensor product spline fit of TV and Radio is shown. The bold contours show the estimate of the smooth; it shows that for a low level of Radio and TV we see interaction. However, it is quite hard to interpret a smooth of two variables.

| A. parametric coefficients | Estimate | Std. Error | t-value | p-value |
|---|---|---|---|---|
| (Intercept) | 8.5299 | 0.0212 | 401.4509 | < 0.0001 |
| Trend | 0.0012 | 0.0001 | 14.0744 | < 0.0001 |
| HOL | -0.2336 | 0.0407 | -5.7391 | < 0.0001 |
| SAT | -0.2905 | 0.0192 | -15.1357 | < 0.0001 |
| SUN | -0.4284 | 0.0212 | -20.2384 | < 0.0001 |
| THU | 0.0592 | 0.0156 | 3.7953 | 0.0002 |
| TUE | 0.0562 | 0.0155 | 3.6361 | 0.0003 |
| WEDFRI | 0.0823 | 0.0136 | 6.0644 | < 0.0001 |

| B. smooth terms | edf | Ref.df | F-value | p-value |
|---|---|---|---|---|
| te(AdStock(TV, 0.909),AdStock(RADIO, 0.909)) | 10.7335 | 11.5703 | 8.0671 | < 0.0001 |
| s(AdStock(ONL, 0.909)) | 8.3190 | 8.8494 | 4.9825 | < 0.0001 |
| s(SEA) | 3.1181 | 3.9094 | 12.3402 | < 0.0001 |
| s(AdStock(BM, 0.909)) | 5.6469 | 6.7758 | 10.3926 | < 0.0001 |
| s(POLS) | 2.8983 | 3.6464 | 6.7594 | 0.0001 |

| C. model properties | Value |
|---|---|
| R-sq.(adj.) | 0.875 |
| Scale est. | 3.1678e+05 |
| Deviance explained | 88.7% |
| BIC | 6617.268 |
| GCV | 3.8017e+05 |
| n | 414 |

Table 4.10: Output of the GAM with interaction

| Model | Resid. Df | Resid. Dev | Df | Deviance | F | Pr(>F) |
|---|---|---|---|---|---|---|
| 1 | 371.28 | 118884043 | | | | |
| 2 | 376.64 | 124329629 | -5.3606 | -5445586 | 3.2068 | 0.006215 *** |

Table 4.11: F-test between model 4.7 and model 4.6
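The F statistic in Table 4.11 can be reproduced from the reported numbers. The sketch assumes that the statistic is the change in deviance per degree of freedom divided by the scale estimate of the larger model (3.1678e+05 in Table 4.10); this assumption is inferred from the fact that it reproduces the reported value, and the function name is hypothetical.

```python
def approximate_f(delta_deviance, delta_df, scale):
    """F = (change in deviance / change in residual df) / scale estimate."""
    return (delta_deviance / delta_df) / scale


delta_deviance = 124329629 - 118884043  # residual deviances from Table 4.11
f_stat = approximate_f(delta_deviance, 5.3606, 3.1678e5)
print(delta_deviance)      # 5445586
print(round(f_stat, 4))
```

The result agrees with the reported F of 3.2068 up to rounding of the inputs.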

Grid Search

Two remarks have to be made about the outcome of the GAM. Firstly, we have few data points and not every level of input is evenly covered, as the rug plots show. Secondly, since in the mgcv-package it is not possible to automatically select an optimal decay factor λ we have chosen λ = 0.909 for all media variables which is likely to be sub-optimal. To improve the model we have performed a grid search on the decay factors of Model 4.6, the script of which is to be found in Appendix 5. The optimal decay factors of TV, Online and BM are determined in that order, since it is computationally too heavy to perform a grid search on three decay factors at once.

The graphs of the BIC against the decay factor λ can be found in Fig. 4.4. They show that the optimal decay factors for TV and BM are 0.89 and 0.91 respectively. The BIC for the varying decay factor of Online keeps decreasing when the decay factor grows beyond 0.9. Since we expect the decay factor of TV to be higher than the decay factor of Online, and a decay factor larger than 0.95 seems very unlikely because it implies a half-life of more than 13 days, we choose the local optimum of 0.80 for Online.
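The idea of the grid search can be sketched for a single channel. To keep the example self-contained, the GAM is replaced by a simple OLS regression on the AdStock variable, which is a substitution made for this illustration only; the spend pattern, the data-generating values and all function names are hypothetical. This is not the script from the appendix.

```python
import math


def adstock(spend, decay):
    """AdStock recursion A_t = decay * A_{t-1} + x_t, with A = 0 before the sample."""
    stock, previous = [], 0.0
    for x in spend:
        previous = decay * previous + x
        stock.append(previous)
    return stock


def ols_rss(x, y):
    """Residual sum of squares of a regression of y on a constant and x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))


def bic(rss, n_obs, n_params):
    """BIC of a least-squares fit, up to an additive constant."""
    return n_obs * math.log(rss / n_obs) + n_params * math.log(n_obs)


def grid_search_decay(spend, y, grid):
    """Evaluate the BIC for every decay factor on the grid and return the best one."""
    scores = {lam: bic(ols_rss(adstock(spend, lam), y), len(y), 2) for lam in grid}
    return min(scores, key=scores.get), scores


# Synthetic data: a weekly pulse of spend, true decay 0.8, small deterministic disturbance.
spend = [10.0 if t % 7 == 0 else 0.0 for t in range(60)]
noise = [0.001 * (-1) ** t for t in range(60)]
y = [1.0 + 0.5 * a + e for a, e in zip(adstock(spend, 0.8), noise)]

grid = [i / 100 for i in range(100)]  # 0.00, 0.01, ..., 0.99
best_decay, scores = grid_search_decay(spend, y, grid)
print(best_decay)
```

With several channels the same loop is run for one decay factor at a time while the others are held fixed, which keeps the number of model fits linear rather than exponential in the number of channels.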

(a) TV (b) Online (c) BM

Figure 4.4: Graphs of the BIC with the different decay factors

Table 4.12 shows the results of the model estimated with the decay factors found in the grid search. The adjusted R2, the deviance explained, the GCV and the BIC have all improved. Fig. 4.5 (a)-(e) shows the estimated partial regression functions. The graphs have not changed dramatically, which is reassuring, since it would be worrisome if a small model adjustment gave a very different outcome. However, the confidence intervals are narrower, Online is a little less wiggly and the TV curve decreases less. We conclude that a grid search for the decay factor is worthwhile.

A. parametric coefficients
                           Estimate   Std. Error    t-value    p-value
(Intercept)                  8.5120       0.0204   416.9096   < 0.0001
Trend                        0.0013       0.0001    16.3747   < 0.0001
HOL                         -0.2753       0.0401    -6.8667   < 0.0001
SAT                         -0.2932       0.0192   -15.2695   < 0.0001
SUN                         -0.4291       0.0212   -20.2316   < 0.0001
THU                          0.0574       0.0157     3.6646     0.0003
TUE                          0.0539       0.0156     3.4611     0.0006
WEDFRI                       0.0796       0.0137     5.8048   < 0.0001

B. smooth terms
                                edf       Ref.df    F-value    p-value
s(AdStock(TV, 0.89))         6.0349       7.1655    20.9543   < 0.0001
s(AdStock(ONL, 0.8))         8.2411       8.8296     5.3324   < 0.0001
s(SEA)                       2.8822       3.6168    23.3090   < 0.0001
s(AdStock(BM, 0.91))         5.5614       6.6708    14.4444   < 0.0001
s(POLS)                      3.2188       4.0033     6.5157   < 0.0001

C. model properties
R-sq.(adj.)          0.874        Scale est.   3.1908e+05
Deviance explained   88.4%        BIC          6596.706
GCV                  3.738e+05    n            414

Table 4.12: Output of the GAM with Grid search AdStock

(a) s(AdStock(TV,0.89)) (b) s(SEA) (c) s(AdStock(Online,0.8))

(d) s(POLS ) (e) s(AdStock(BM,0.91))


Chapter 5

Conclusion

In this thesis Marketing Mix Modelling is applied to learn how media exposure and other contextual variables affect the use of self-services within the mobile banking application. Extensive research has been devoted to modelling the effectiveness of advertising on sales, but little research has addressed the effect of marketing on the adoption of mobile banking, and little has been done with daily data. The bank wants to know which media contribute to the increase in self-services in order to calculate its return on marketing investment. Furthermore, we were interested in the decay factors of the media variables, the differences between the separate self-services and which type of model gives the best insights. This chapter summarizes the most important results and gives recommendations for further research.

The log-linear models

The first step of our modelling approach was to perform an ADF test to determine whether the self-services are stationary or have a deterministic or stochastic trend. As a result we modelled the series with a trend and, based on the BIC, found that Wednesday, Thursday and Friday are not significantly different. We also found that the model fits best when it includes whether or not the competitors ran a campaign on mobile banking, rather than their actual media spend. Using a Likelihood Ratio test on the Box-Cox transformation and considering the results of the Ramsey RESET test, we concluded that the log-linear model is the best functional form.

From the full linear form we estimated the optimal linear model, the optimal log-linear model with AdStock and the optimal log-linear model with AdStock and interactions, using the approach discussed in Section 3. Based on the VIFs we conclude that the log-linear model with AdStock and interactions suffers from high multicollinearity. Based on the BIC and the adjusted R2 we conclude that the log-linear model with AdStock has the best fit. TV, Radio and SEA have a significant positive effect on the card services. We found a decay factor for Radio of λ = 0.9454, which corresponds to a half-life of 12.3 days. Moreover, Rabobank has a significant positive contribution, and on a Holiday the level of card services is significantly lower. Furthermore, the level of card services is highest on Wednesday, Thursday and Friday and lowest on Sunday.

We considered the separate card services and found that all except New card have a significant trend. New card appears to be insensitive to media exposure. Geolocation has a unit root, so its first differences were modelled. School holiday has a positive effect on Geolocation, which makes sense since more people go on vacation, and on New card. Fewer services are carried out digitally on a Holiday in the categories Change daylimit, Deblock card and New card. In the optimal models TV has a positive effect on Change daylimit and Replace card, while Radio only has an effect on Change geolocation. SEA has a positive effect on Change daylimit, Block card and Deblock card. BM has a positive effect on Change daylimit, Block card, Deblock card and Geolocation, but a negative effect on New card, which seems odd. POLS has negative effects on Block card, Deblock card and Change geolocation, which is also unexpected. Another interesting insight is that the positive effect of Rabobank in the log-linear model of the combined self-services is due to its large positive effect on Daylimit.

The Generalized Additive Models

To gain more insight into the dynamics of the media variables without having to specify a functional form for every media variable in advance, we used Generalized Additive Models. First we used a fixed decay factor of 0.909 for the media variables. In the GAM, Competitors, School Holiday, Radio, Social and EM did not contribute significantly at the 5% level, whereas in the log-linear models Radio and Competitors did have a significant effect. SEA appears to be monotonically increasing (the more impressions the better), while POLS and TV saturate at some point. We found different levels of effect for different volumes of BM; this might be due to the type of BM sent and would be interesting to investigate. Online, however, seems to be overfitted.

In an attempt to improve the fit of the model, we performed a grid search for the best decay factors of TV, BM and Online. Because of the computation time we were not able to search over them simultaneously, but based on the BIC we found decay factors of 0.89, 0.91 and 0.80 respectively, which corresponds with our expectations. When one reads a BM, the message will probably stay in mind longer than when one sees an advertisement on TV, and a message displayed in an Online banner will probably be forgotten even faster. With the optimal decay factors the fit of the model is indeed better, but Online is still wiggly. A method other than thin plate regression splines could be considered, but it is also possible that other factors should be taken into account, such as the websites on which the banners were shown, or clicks instead of views.

Since Radio was omitted from the optimal GAM while it was included in the optimal log-linear models, we tried to fit a joint smooth of Radio and TV. The fit of this model is slightly better, but an F-test showed that the improvement is not significant. Moreover the level
