Predicting effective mass marketing channels using Machine Learning

(1)

Predicting effective mass marketing channels using Machine Learning

submitted in partial fulfillment for the degree of master of science Roos Slingerland

10775935

master information studies data science

faculty of science university of amsterdam

2018-06-28

Internal Supervisor External Supervisor

Title, Name dhr. Dr Stevan Rudinac Michel van Koningsbrugge, MSc.

Affiliation UvA, FEB, OM MK2

(2)

Predicting Effective Mass Marketing Channels

using Machine Learning

R. Slingerland

University of Amsterdam Amsterdam, The Netherlands roos.slingerland@student.uva.nl

ABSTRACT

This study aims to improve the way media companies now advise their clients on their campaigns using machine learning. Compared to the way this is currently done, based on the experience of em-ployees, this optimizes the process in terms of time, magnitude and continuity. Furthermore, new relations between variables can be found that can help in understanding marketing phenomena. We used the data of a marketing company containing answers to ques-tionnaires about marketing campaigns. The explanatory variables involve income, gender, political preference, age, education and more. The channel used for the marketing campaign is the multi-class response variable, being either Cinema, Daily Paper, Internet, Magazine, Out of Home, Radio, or Television. The dataset of nearly 800,000 rows was split up based on the question: Do you recognize this campaign?to execute three classification experiments. Differ-ent algorithms were applied with differDiffer-ent sorts of feature selection and these were evaluated in terms of macro F1-score. All models outperformed the random baselines. The best scoring algorithm was logistic regression without feature selection, with a macro F1-score of 0.413. Analysis of the coefficients of the logistic regres-sion models was performed to evaluate the feature importance. Future research might focus on adding variables, failure analysis, expanding the dataset or implementing a classification model while keeping the expert knowledge of the media company.

1 INTRODUCTION

In the field of marketing, making accurate predictions is very im-portant, as it can lead to taking better decisions that result in saving costs or increasing profit and understanding the marketing phe-nomena better [6]. Advertising as part of marketing covers around 2% of the U.S. GDP [13]. This study was executed at business in-telligence company MK2_{, which has a Dutch media company as a} client. This company advises clients about marketing campaigns (via which channel, for how long etc.) and then researches how participants perceived the commercials by analyzing their data. They collect this data via respondent questionnaires to ask about the respondents’ demographics, media usage and opinion about a specific brand, competitors and/or campaigns.

The advice regarding the type of marketing channel is now made based on the experience of the planners at the media company, the budget of the client and the objective of the campaign [2]. However, the company is looking for other ways to improve this process, since now this experience does not always lead to the best decision. The planner can be biased towards some brands or channels or he/she can simply have a bad day, resulting in wrong decisions. Moreover, since this experience is dependent on the employee, this might be a problem in case the employee is absent. Thirdly, currently, no way

exists to evaluate how well the advice for the choice of marketing channel was.

A way to overcome these problems is machine learning, which can be executed in many ways which differ in terms of variable-selection methods, the time needed, the computation needed, the number of variables possible etc. [3]. One of these techniques is Supervised Machine Learning, which entails learning the labels of a dataset. This is practiced by finding a formula connecting the variables and then checking the performance to a testset that has not been seen by the model before.

With this technique the following research question will be an-swered:

To what extent is it possible to predict the most effective mar-keting channel, using machine learning on mass marmar-keting questionnaires?

The explanatory variables of this study will be obtained from the media company’s questionnaires. Furthermore, the advertisement channel will be the dependent variable with seven possible values: Cinema, Daily Paper, Internet, Magazine, Out of Home (for example billboards), Radio, and Television.

Subquestions in this study are:

(1) How do different machine learning algorithms differ in per-formance on this classification task?

(2) What is the difference between only using respondents that recognize advertisements and also using respondents that did not recognize the advertisements regarding performance? (3) Which features were found important and how do they differ

for each class?

By answering these questions, it will become clear whether machine learning can, in fact, improve the decision making of marketing channel. This can lead to more effective campaigns and, therefore, more satisfied clients for the media company, better expenditures of the campaign budgets of the clients, and more time for the employ-ees of the media company to spend on different tasks. For science, this study contributes in general to methods that are less dependent on humans with the benefits of speed, magnitude, and continuity. The rest of this paper is organized as follows: in section 2 more research on machine learning in marketing can be found. Section 3 describes the data description, evaluation metrics and experimental setup. The results are stated in section 4. Lastly, section 5 holds the conclusions of this study together with some critical thoughts.

(3)

2 RELATED WORK

As stated in the introduction, making accurate predictions within marketing is very important. Customer relation management (CRM) has gained interest over the past decade and is defined by Ngai, Xiu, and Chau [11] as: "the strategic use of information, processes, technology and people to manage the customer’s relationship with your company (Marketing, Sales, Services, and Support) across the whole customer life cycle”.

Current research has already looked into applying machine learn-ing to analyze CRM. Ngai and colleagues (2009) analyzed 87 papers that used machine learning techniques in the field of CRM and classified them into seven machine learning models and techniques. They found that the model most frequently applied, was the classi-fication modelwhich tries to map every datapoint into a predefined class [11]. They stated that over all models, neural networks are the most commonly used technique (30 articles), followed by decision trees(23) and (logistic)regression (10). Support Vector Machines (SVM) was only used once and although applied effectively in many do-mains, applications in marketing are scarce [6]. All these techniques can be considered as techniques for the supervised classification model and will now be explained in more detail.

Regressionis a statistical tool to map datapoints to a prediction value by fitting a curve [11]. Logistic regression is an application of regression that only allows for two different outputs. However, in case of a multiclass problem - as is the case in this study - logistic regression can still be used. In that case, a one-vs-rest-technique is practiced, meaning that for every class a binary problem is simu-lated. In order to fit the data well, the maximum likelihood function is produced and maximized. This technique is easy to understand, posterior probabilities are possible and the results are quick and robust to noisy data [11]. By using L1 regularization, some variables are excluded which makes the interpretation easier [9]. Although one might expect that this negatively affects the performance, ac-cording to literature this is not always the case. In the domain of marketing, logistic regression was found to perform approximately equal to for example decision trees, yet was preferred because of its easy use and speed [5]. For this reason, logistic regression will be applied in this research.

Decision treesas well are easy to implement, low in computa-tion time, robust to noisy data and return variable importance [3]. It learns a hierarchy consisting of if/else questions about the fea-tures to divide the datapoints into the predicted categories. This technique could outperform SVM and logistic regression, as it did before in related work. It is, however, subject to overfitting since it can exactly learn the dataset [3]. Random Forests can be seen as an improvement of decision trees, growing multiple trees on a sample of the trainset. All these trees then choose the best fitting label and the aggregate of these votes will decide what the label of the datapoint from the testset will be [9]. The variable importance is, however, not defined per class contrary to the coefficients for logistic regression. Despite this, decision trees and random forests will be applied to the data of this study.

Neural Networkshave been used in many domains and were inspired by the human brain. SVM is a technique based on neu-ral networks that uses kernel transformations to solve nonlinear problems with a linear model [6] and is said to be very robust to

noisy data. It finds a unique, optimal and global solution that maxi-mizes the margin of separation between the positive and negative examples [3]. It does, however, scale badly for large datasets in terms of time and computation. Because of all benefits though, this algorithm will also be applied in this research.

Besides predicting the marketing channel, another important part of this study is discriminant function analysis (DFA) by an-alyzing the coefficients of the algorithms, also known as feature importance analysis. DFA in marketing research has been used for three main reasons according to Crask and Perreault Jr [4]: classi-fying individuals into groups, profiling characteristics of the most discriminative groups, and identifying underlying dimensions. For this study, the second and third reason apply since we are inter-ested in how customers, that recognized advertisements via for example TV differ from those who recognized advertisements via Radio, or that did recognize the advertisement. Because of the in-crease in DFA techniques for computers and the inin-crease of data in general, interest in DFA has increased. However, DFA (and feature importance analysis) has been criticized, because of the lack of validation of this method. Scholars question whether the dimen-sions found are generalizable to the population, and these do not only arise in small-sample research [4]. Hence, careful conclusions about coefficients need to be drawn, especially when variables are correlated to each other [9]. Coefficients indicate how important a model evaluates a specific variable. This number can be both positive and negative and can be seen as a level of associativity to that class, compared to the other classes. In this research, the coefficients of the logistic regression algorithm will be analyzed as feature importance analysis.

3 METHODOLOGY

3.1 Description of data

The data is stored in a database owned by the media company and goes back to 2010, containing answers to questionnaires. At the start of this research, MK2already built an OLAP-cube for the media company, a technique to link multiple tables to each other using foreign keys with a front-end in Microsoft Excel enabling non-programmers to analyze the data. This cube calculates many (sub)totals overnight, enabling the user to quickly get numbers in pivot-tables in Excel for analysis. However, for this study, those (sub)totals do not contain the necessary information, since the individual answers of the respondents are of interest. Therefore, different tables had to be linked manually to obtain a dataset with rows per user, per questionnaire, per campaign. All tables in the database were investigated, and by setting a 50 percent threshold for missing values 19 explanatory variables were selected for their expected value. These involve the brand, family size, gender, age, education, employment status, income, hometown, political pref-erence, grocery shopping, mailbox stickers, and opinion about the economy. The explanatory variables are discussed further in figure 5, to be found in Appendix A. The response variable is multiclass with seven possible values: Cinema, Daily Paper, Internet, Maga-zine, Out of Home, Radio, and Television (see table 1 and figure 2). For all these 781,520 rows, the respondents filled in whether they recognized that campaign. This was used to split up the dataset in three different ways for the experiments, explained in section

(4)

Table 1: Translation of class abbreviations.

TV Television

RAD Radio

INT Internet

OOH Out of Home

DGB Daily Paper

MAG Magazine

BIOS Cinema

3.4, see also figure 1. The Yes-dataset consists of the rows in the dataset where respondents said they recognized the advertisement, which entails 22% of the entire dataset. Rows with respondents that answered maybe to this question (17%) were added to the Yes-dataset, resulting in the Yes-Maybe-dataset of 309,763 rows. The third dataset again contains the Yes-dataset, but this time a random sample was added from the respondents that said no and said maybe with the same size as the Yes-dataset1_{resulting in the All-dataset} with 346,004 rows. Respondents choosing for the fourth option not applicablewere excluded from all datasets according to the advice of [2] because they had problems watching the advertisement in the questionnaire.

3.2 Evaluation metrics

Data in marketing is expensive and the respondents that actually saw marketing campaigns are scarce [8, 13]. Because of this skewed data, accuracy is often not a suitable evaluation metric and AUC or lift are proposed alternatives [3, 5, 8, 13]. However, since the interpretation of the outcome is very important in marketing for it 1_{This undersampling step was taken because of computational constraints.}

No Maybe Yes Yes Yes Yes No Maybe Maybe Yes dataset Yes Maybe dataset All dataset

Figure 1: Creation of different datasets. From the entire dataset of nearly 800,000 rows three datasets were cre-ated: only the rows where respondents recognized the cam-paign (Yes-dataset), these rows combined with all the rows where respondents doubted (Maybe-dataset) and the Yes-dataset combined with a random sample of rows with an-swers maybe and no (All-dataset).

Maybe No Yes ALT

TV _RAD _DGB

MAG INT OOH BIOS 0 50000 100000 150000 200000 250000

Figure 2: Distribution of the All-dataset.

to be of any help for making strategies [7, 14], a different choice was made for this research. For this research, precision (ratio of correctly classified positive examples to total classified positive examples [15]) and recall (ratio of correctly classified positive ex-amples to total positive exex-amples in data [15]) are thought of as equally important. Precision is important here since every time the algorithm classifies something as being TV, it should be TV and not something else. On the other hand, recall is important since we want all datapoints that should be classified as TV to be returned. Therefore, the harmonic mean, also known as F1-score, is used as evaluation metric:

F1-score=2 ∗ precision ∗ recall

precision+ recall . (1)

As can be observed, every datapoint is handled equally in this metric. Yet, treating every class equally instead is a more appropriate way here because of the skewed class distribution. In this way, the weight is not only at the large classes, but the small classes become equally important. This is known as the Macro F1-score:

Macro average F1-score= 1 K

K Õ

k=1

F1-score for class k. (2) The trade-off between precision and recall can be visualized in a precision-recall curve, which uses both the true labels and the predicted uncertainties for each class [9]. Ideally, the curve would stay close to the upper-right corner, meaning high precision and high recall, see figure 5. The area underneath the curve is known as average precision and could be used to easily compare precision-recall curves to each other.

Another curve that is often used in classification problems is the receiver operating characteristics (ROC) curve. It calculates different thresholds like the precision-recall curve as well, but it shows the false positive rate against true positive rate (recall) [15]. Ideally, this curve is close to the upper-left corner: high recall and low false positive rate, see figure 5. In case of a random classifier, the curve fits the diagonal, mostly plotted as well. In this way, one could compare their model to a random classifier. The area underneath the ROC-curve is known as AUC and can be used to easily compare curves. In the case of a multiclass problem, one could also plot the average curve by treating every datapoint equally (micro), or by treating every class equally (macro). One of the benefits of the ROC-curve compared to F1-score, is that it also takes True Negatives

(5)

SQL Table SQL Table SQL Table SQL Table SQL Table Filter missing values CSV file str str str str str str Enlarge NaN list Group values together Rename variable names CSV file

bin bin bin int bin int

CSV file str str str str str str Onehotencoding except numerical variables Logistic

Regression Decision Trees Random Forests Linear SVM

Use all features Use 50% features* Use 50% features** Use all features Use 50% features* Use 50% features** Use all features Use 50% features* Use 50% features** Use all features Use 50% features* Use 50% features** Link tables Split data Transformation Selection Preprocessing Crossvalidation Classification

Figure 3: Flowchart of selection, preprocessing, transformation, cross-validation [9] and classification including feature selec-tion. * Using SelectPercentile(). ** Using SelectFromModel() with a Random Forests Classifier. For each last step Gridsearch was used to tune hyperparameters; that is for logistic regression C, for decision trees max_depth, for random forests max_depth and max_features, and for SVM again C.

into account and that it visualizes different thresholds instead of just one.

Lastly, confusion matrices are plotted to visualize the perfor-mance of the classifiers. It is the representation of one point in the ROC-curve, for a specific threshold. Each row represents a true label, each column represents a predicted label and therefore high numbers on the diagonal are ideal. In this research, the confusion matrices are normalized, meaning every row adds up to one to adjust for the unbalanced classes.

3.3 Tools

The data for this study was obtained by combining multiple tables from the data warehouse of the media company using the soft-ware Microsoft SQL Server Management Studio. Furthermore, Python was used, specifically the machine learning library Scikit Learn [12]. Methods like Gridsearch, Imputer and Pipeline were used, together with the algorithms LogisticRegression, DecisionTreeClas-sifier, RandomForestClasDecisionTreeClas-sifier, and LinearSVC. In a meeting with an employee of the media company, the preprocessing steps were suggested and adjusted based on her domain knowledge. For the ROC-curves and precision-recall curves we addressed the library Scikit-plot[10]. Lastly, for the infographics, the tool Draw.IO was handled [1].

3.4 Experimental setup

Three different experiment designs for the different datasets were used that will now be shortly described. However, the preprocess-ing and model trainpreprocess-ing was all the same and will be explained first. In preprocessing, some values were added to the NaN-list, some were grouped together, and some were renamed. The variables HuishoudGrootteand AantalKinderen were changed from categori-cal to numericategori-cal by using the median for the missing values. The other variables were treated as categorical and therefore one-hot-encoding was applied, resulting in a total of 141 variables. A grid search for multiple hyperparameters was executed with a three-fold cross-validation, splitting the train- and testset without overlap in respondents with an 80-20 ratio, optimizing the overall macro F1-score. The experiments were executed without any feature selec-tion, with the best 50% according to SelectPercentile()2and with the 50% best according to SelectFromModel()3_{with a Random Forests} Classifier. The algorithms used for this classification problem, were 2_{Univariate feature selection method that calculates the level of significance between}

all variables and the response variable. The variables with the highest confidence are then chosen. The user can specify how many features by giving a percentile of the total variables [9].

3_{Model-based feature selection method that uses a supervised machine learning model}

to determine the most important features. The user can determine a threshold that has to be reached for the variables to be kept. In comparison to univariate feature selection methods, this method also takes the interactions between variables into account [9].

(6)

logistic regression, decision trees, random forests and Linear SVM, all with balanced training to adjust for the skewed class distribution. For every experiment, a baseline was calculated. The confusion ma-trices were normalized and plotted, and precision, recall, and the macro F1-score was calculated. These steps are visualized in figure 3.

For every experiment, a baseline was also calculated, using the three different datasets. This was done by randomly predicting a label for the testset, while keeping the distribution of classes. Meaning that if class A appeared twice as much as class B in the testset, the chance that the random predictor chose label A was twice as high. This resulted in a prediction for all datapoints in the testset. Then for this prediction, the F1-score was calculated. To adjust for the randomness, this process was repeated 100 times and the average of these 100 F1-scores was set as the baseline.

3.4.1 Experiment 1.In this experiment, the Yes-dataset was used to solve the multiclass problem of seven classes. The baseline macro F1-score was set at 0.143. This experiment will give insight into how well the different classes are predictable when only using the respondents that were actually reached by the campaign. By looking at the differences between the coefficients per class, the differences can be made visible between respondents that were reached by different channels.

3.4.2 Experiment 2.In this experiment, the Yes-Maybe-dataset was used containing 173,002 rows that answered yes, and 136,761 rows that answered maybe. The latter all got the class label MAYBE, resulting in eight different classes. The baseline macro F1-score was set at 0.125. The coefficients of this experiment will give insight into how respondents that did recognize a campaign differ from those who doubt whether they recognized it.

3.4.3 Experiment 3. In this experiment, the All-dataset was used containing 173,002 rows that answered yes and 173,002 rows that answered differently (no or maybe). The latter all got the class label ALT, resulting in eight different channel types. The baseline macro F1-score was set at 0.125. The coefficients of this experiment will give insight into how respondents that did not recognize a campaign (or doubt) differ from those who did.

4 EVALUATION

4.1 Experiment 1

For experiment 1, table 2 shows that no feature selection results in the best F1-score and random forests as feature selection method performs worst for all algorithms. Different thresholds were tried out with the percentile feature selection method - results to be found in table 6 - but this proved that the more features are taken into the model, the better the performance.

Regarding algorithms, logistic regression performs best (0.413), closely followed by SVM (0.411) and random forests (0.408). The worst performer is decision trees with an F1-score of 0.385. All algorithms outperform the baseline of 0.143. The normalized con-fusion matrix (figure 4) shows that channels Cinema and Daily Paper are predicted best (also visible in the ROC-curve in figure 5), Magazine and Television also good, but Internet, Out of Home, and Radio are often predicted wrong. Furthermore, it can be found that

BIOS DGB INT MAG OOH RAD TV

Macro F1= 0.413 BIOS DGB INT MAG OOH RAD TV True label 0.90 0.00 0.01 0.00 0.00 0.00 0.09 0.00 0.91 0.01 0.01 0.02 0.00 0.05 0.01 0.32 0.35 0.08 0.02 0.02 0.19 0.00 0.00 0.00 0.74 0.02 0.01 0.22 0.00 0.01 0.02 0.07 0.30 0.02 0.59 0.02 0.08 0.05 0.07 0.07 0.27 0.45 0.03 0.01 0.03 0.04 0.05 0.07 0.76

Normalized confusion matrix

0.0 0.2 0.4 0.6 0.8

Figure 4: Normalized confusion matrix of best model for ex-periment 1 (logistic regression with no feature selection). Internet is mistaken for both Daily Paper and TV and that both Out of Home and Radio are often predicted as TV. Since TV is the largest category and therefore has the most rows with information, it was expected to be predicted best. It is therefore interesting to see that the smaller classes Cinema and Daily Paper were predicted best.

Since the confusion matrix only shows the situation for one threshold, it is useful to also look at the precision-recall curves to evaluate the classes. Here we see that while Internet stays at maximum precision for a while before dropping, Cinema, Out of Home and Magazine immediately drop. TV shows the curve closest to the upper-right corner and also has the highest average precision (0.902). We see that the micro-average curve is higher than most other classes, but this can be declared by the size of the TV class. Since this class performs good regarding precision-recall curve and has the most datapoint in its class, it influences the micro-average curve strongly. To also look at the false positives, the ROC-curve can give insight. There we see that not TV, but both Cinema and Daily Paper show the highest area underneath the curve (0.98), consistent with the confusion matrix. Since all the curves are above the diagonal, we see that all classes perform better than random.

Although this study only uses models with balanced training, it is interesting to see how the performance would be if this were not the case. The influence of balancing is visible when looking at 11 in Appendix B. Here the normalized confusion matrix is plotted for exactly the same model as for figure 4. Without balancing, the largest category, TV, is over-represented in the predictions (vertical column blue). With balancing, however, this vertical column has been shifted to the diagonal, meaning almost all classes are better predicted since now every class is seen as equally important, in-stead of each datapoint.

When looking at the coefficients (see figure 12 up to 18 in Appen-dix D with all classes in the same colors as in figure 5), we see the following. For class Cinema, the way respondents handle their mail (stickers) and political preference are positively associated with the class Cinema, while the variable about who is the primary income earner at home is negatively associated. Some brands also appear in this list. For example, lotteries (Bankgiro Loterij and Postcode

(7)

Table 2: Results of experiment 1 after hyperparame-ter tuning that is for logistic regression C, for deci-sion trees max_depth, for random forests max_depth and max_features, and for SVM again C. For the best algorithm the parameter was tuned to: C = 10. Precision and recall are averaged over all datapoints using a macro average.

Algorithm Feature Selection method Macro F1-score Precision Recall

Logistic regression None 0.413 0.72 0.66

Logistic regression Percentile 0.402 0.71 0.65

Logistic regression Random

Forests 0.398 0.70 0.66

Decision Trees None 0.385 0.69 0.62

Decision Trees Percentile 0.368 0.69 0.51

Decision Trees Random

Forests 0.351 0.67 0.59

Random Forests None 0.408 0.70 0.71

Random Forests Percentile 0.383 0.69 0.61

Random Forests Random

Forests 0.378 0.69 0.68

Linear SVM None 0.411 0.72 0.66

Linear SVM Percentile 0.399 0.70 0.65

Linear SVM Random

Forests 0.395 0.70 0.66

Loterij) are positively associated with the channel, where travel company TUI is negatively associated with the channel compared to the other channels. This means that the advertisements of these lotteries are more likely to be recognized when broadcasted via cinema, then if these brands advertised in another way.

For class Daily Paper, we mostly see brands in the list of extreme coefficients. Remarkable here is that lottery Nationale Postcodeloterij is positively associated, while lottery Bankgiro Loterij is negatively associated with the class Daily Paper. Meaning daily paper adver-tisements of Nationale Postcodeloterij are likely to be recognized, while advertisements of Bankgiro Loterij are not.

For class Internet, we see only brands in this list with mostly negative coefficients. Van Gogh Museum has a very large positive co-efficient, indicating a positive association with the channel Internet, so has the telecommunication brand Tele2.

For class Magazine, we see some positive coefficients for brands, but also positive coefficients for the family structure of the re-spondent. We see negative coefficients for the variable about the primary income earner and the variable indicating who is doing the groceries at home.

For class Out of Home, we see that the region of the hometown of the respondent is positively associated if the respondent is living in a large city (Amsterdam, Rotterdam, Den Haag), while the other regions are not in this top 30 list. This could mean that the Out of Home advertisements in large cities work better. We see negative coefficients for variables indicating the primary income earner and positive coefficients for the family structure. Interestingly, we see travel company TUI with a negative coefficient, but travel company

0.0 0.2 0.4 0.6 0.8 1.0 Recall 0.0 0.2 0.4 0.6 0.8 1.0 Precision Precision-Recall Curve

Precision-recall curve of class BIOS (area = 0.348) Precision-recall curve of class DGB (area = 0.603) Precision-recall curve of class INT (area = 0.472) Precision-recall curve of class MAG (area = 0.189) Precision-recall curve of class OOH (area = 0.156) Precision-recall curve of class RAD (area = 0.403) Precision-recall curve of class TV (area = 0.902) micro-average Precision-recall curve (area = 0.658)

0.0 0.2 0.4 _{False Positive Rate} 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0

True Positive Rate

ROC Curves

ROC curve of class BIOS (area = 0.98) ROC curve of class DGB (area = 0.98) ROC curve of class INT (area = 0.93) ROC curve of class MAG (area = 0.96) ROC curve of class OOH (area = 0.87) ROC curve of class RAD (area = 0.77) ROC curve of class TV (area = 0.79) micro-average ROC curve (area = 0.93) macro-average ROC curve (area = 0.90)

Figure 5: Precision-recall curve and ROC-curve of best model (logistic regression with no feature selection) of all classes for experiment 1.

Arkewith a positive coefficient, while this is just a name change of the same company. We also see some positive coefficients for liquors (Grand Marnier and Aperol Spritz). This means that advertisements of these liquors are more likely to be recognized if they appear via Out of Home advertising then via other channels.

For class Radio, we only see brands in this list, with notable tour operator Oad Reizen as only positive coefficient here. This means that the advertisements for all these brands, except for Oad

(8)

Reizen, would be more effective if they were broadcasted via another channel than Radio.

Lastly, for class TV we again only see brands in this list of ex-treme coefficients. We see some hardware stores like Praxis, Karwei, Gammapositively associated with TV, but for example, Van Gogh Museumis negatively associated with marketing channel TV. This means that advertisements of hardware stores on Television are more likely to be recognized than advertisements via other channels. For advertisements of Van Gogh Museum the opposite yields: ad-vertisements via all other channels are more likely to be recognized than via TV.

4.2 Experiment 2

In experiment 2, only the rows of respondents that said yes and maybewere used, the results can be found in table 3. Again, no feature selection outperforms the two feature selection methods for all algorithms. When looking at the algorithms, it becomes clear that random forests predict best (0.278), followed by decision trees (0.266), logistic regression (0.262) and lastly SVM (0.258). All algo-rithms outperform the baseline of 0.125, meaning that all models predict better than a random choice of channel.

For the best model, the normalized confusion matrix is plotted in figure 6. Here we see that almost every channel is predicted as Maybe or TV more often than as the real label. Channel TV and Maybe score best, respectively 0.53 and 0.51, and since they are also the largest categories this is not unexpected.

When comparing the precision-recall curve (figure 7) to experi-ment 1, we see a decrease in both precision and recall. We again see the curve closest to the upper-right corner is TV, this time closely followed by class Maybe (area = 0.546). Comparable to experiment 1, we see the lowest curves for Magazine and Out of Home. Com-paring the ROC-curve to experiment 1, we again see the curves of Cinema and Daily Paper closest to the upper-left corner. When comparing class Maybe to both the micro- and macro-average, we see it has a smaller area (0.62 compared to 0.88 and 0.74). In fact, it has the smallest area of all classes. This could be explained by the high false positive rate, as we could also see in the confusion matrix (column Maybe). The really steep classes (Cinema and Daily Paper) have a small False Positive Rate, which can also be seen in the confusion matrix. The columns of these classes are filled with low numbers for all cells except the diagonal. This means that datapoints are rarely wrongly predicted as, for example, Cinema, resulting in a low False Positive Rate, hence a steep curve.

When looking at the coefficients, we focus on the coefficients of the channel that was not in experiment 1: class Maybe. Recall that in this class all respondents doubted recognizing the campaign (figure 19). Here, we mostly see only negative associations for brands, with Gammaas the most negative coefficient. Besides that, we see the variable Arbeidsstatus_Tijdelijk verlof, indicating temporary leave as employment status for the respondent (for example maternity leave). The negative coefficient for this variable indicates that re-spondents that are on temporary leave are less likely to doubt about recognizing advertisements. Since the other classes consist of rows from the Yes-dataset, we can conclude the following. Respondents on temporary leave are more likely to recognize advertisement

BIOS DGB INT MAG

MAYBE OOH RAD

TV Macro F1= 0.278 BIOS DGB INT MAG MAYBE OOH RAD TV True label 0.370.00 0.00 0.000.290.01 0.010.31 0.000.320.06 0.000.530.00 0.04 0.05 0.00 0.10 0.17 0.010.520.01 0.03 0.16 0.00 0.00 0.02 0.070.520.03 0.060.31 0.01 0.02 0.02 0.010.510.02 0.100.32 0.00 0.01 0.01 0.010.340.11 0.070.44 0.00 0.02 0.02 0.010.360.030.29 0.29 0.01 0.00 0.01 0.000.330.02 0.100.53

0.0 0.1 0.2 0.3 0.4 0.5

Figure 6: Normalized confusion matrix of best model for ex-periment 2 (random forests with no feature selection). Table 3: Results of experiment 2 after hyperparame-ter tuning that is for logistic regression C, for deci-sion trees max_depth, for random forests max_depth and max_features, and for SVM again C. For the best algo-rithm the parameters were tuned to: max_depth = none and max_features = none. Precision and recall are averaged over all datapoints using a macro average.

Forests 0.206 0.48 0.33

Forests 0.227 0.47 0.40

Forests 0.241 0.48 0.46

Linear SVM Random

Forests 0.200 0.49 0.26

than to doubt about recognizing them. This could be declared be-cause of the extra time people on temporary leave have. Therefore, the chances they watch more TV, listen to more radio, read more daily papers or magazines, and spend more time outside or on the internet are higher. And therefore, the chance they recognize advertisements via one of these channels is higher.

The absence of positive coefficients means that there is no vari-able seen by the algorithm that is highly positively associated with doubting about recognizing the campaign. The negative coefficients

(9)

0.0 0.2 0.4 _Recall 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 Precision Precision-Recall Curve

Precision-recall curve of class BIOS (area = 0.244) Precision-recall curve of class DGB (area = 0.210) Precision-recall curve of class INT (area = 0.124) Precision-recall curve of class MAG (area = 0.030) Precision-recall curve of class MAYBE (area = 0.546) Precision-recall curve of class OOH (area = 0.040) Precision-recall curve of class RAD (area = 0.192) Precision-recall curve of class TV (area = 0.558) micro-average Precision-recall curve (area = 0.471)

True Positive Rate

ROC Curves

ROC curve of class BIOS (area = 0.87) ROC curve of class DGB (area = 0.87) ROC curve of class INT (area = 0.77) ROC curve of class MAG (area = 0.72) ROC curve of class MAYBE (area = 0.62) ROC curve of class OOH (area = 0.67) ROC curve of class RAD (area = 0.70) ROC curve of class TV (area = 0.68) micro-average ROC curve (area = 0.88) macro-average ROC curve (area = 0.74)

Figure 7: Precision-recall curve and ROC-curve of best model (random forests with no feature selection) of all classes for experiment 2.

mean that advertisements of these brands are more likely to be rec-ognized than to be doubtfully recrec-ognized. These advertisements could, for example, stay in the respondents’ memory for a long time because they are really funny or maybe annoying.

4.3 Experiment 3

In experiment 3, both respondents that did recognize the campaign, and an equally large random sample of people that answered maybe or no wer in the dataset. The results of the third experiment can be found in table 4, concluding that random forests with no feature

ALT _BIOS _DGB INT _MAG _OOH _RAD TV

Macro F1= 0.282 ALT BIOS DGB INT MAG OOH RAD TV True label 0.690.00 0.01 0.01 0.00 0.01 0.05 0.22 0.390.32 0.00 0.00 0.00 0.01 0.01 0.27 0.640.00 0.24 0.05 0.00 0.01 0.03 0.03 0.630.00 0.09 0.11 0.01 0.02 0.02 0.13 0.560.00 0.01 0.02 0.04 0.02 0.02 0.33 0.390.00 0.01 0.01 0.01 0.11 0.070.39 0.410.00 0.02 0.01 0.00 0.02 0.26 0.27 0.410.00 0.00 0.01 0.00 0.02 0.090.46

0.0 0.1 0.2 0.3 0.4 0.5 0.6

Figure 8: Normalized confusion matrix of best model for ex-periment 3 (random forests with no feature selection). selection perform best. Again, all models outperform the baseline of 0.125 and again looking at the confusion matrix in figure 8 -the largest categories are over-represented in -the predictions (this time TV and Alternative). Especially the classes Internet, Out of Home, and Magazine perform badly.

When comparing the plots in figure 9 to the plots of experiment 2, we have to be careful because of the change in colors of the classes. For the precision-recall curve, we again observe a high curve for TV, yet class Alternative performs even better (area = 0.662). The curves for Radio, Out of Home, Magazine, Internet, Daily Paper and Cinema look similar to the curves of experiment 2. The micro-average curve lays a bit higher than for experiment 2. This could be explained by the fact that the curve of class Alternative lays higher than class Maybe did in experiment 2. Likewise, we see a comparable ROC-curve when compared to experiment 2. The curves for all classes look very similar, and the curve for Alternative is similar to the curve for Maybe in experiment 2. Again this curve has a very low area (0.68), although it is a bit higher than Maybe in experiment 2 (0.62). The explanation for this gentle slope is equal to the explanation for the gentle slope in experiment 2.

When looking at the most extreme coefficients, we focus on the new class in comparison to experiment 1: Alternatively (see figure 20). Recall that this class contains the rows with respondents that answered they doubted about recognizing the advertisement, or an-swered they did not recognize the advertisement. In this list, except for brands, we also see a positive coefficient for the variable of po-litical choice for the Dutch party SGP, a very strict reformed party. This means that respondents that answered they were going to vote for SGP are likely to not have recognized advertisements. This is interesting to see, and also explainable. The party is known for its strict rules and its voters are said to be old-fashioned, often without a television. The party worships the Sunday and discourages its voters to, among many other things, go on the internet on this day. The website of the party, for example, does not even work on Sundays. For these reasons, it is understandable why respondents plan to vote for SGP are more likely not, or doubtfully, have seen advertisements.

Different lottery companies have negative coefficients and also hardware stores are found in this list with negative coefficients.

(10)

Table 4: Results of experiment 3 after hyperparame-ter tuning that is for logistic regression C, for deci-sion trees max_depth, for random forests max_depth and max_features, and for SVM again C. For the best algo-rithm the parameters were tuned to: max_depth = none and max_features = auto. Precision and recall are averaged over all datapoints using a macro average.

Forests 0.247 0.52 0.37

Forests 0.255 0.51 0.45

Forests 0.263 0.52 0.52

Linear SVM Random

Forests 0.241 0.52 0.35

Notable is the occurrence of different branches of bread substi-tute company Bolletje in this list, all negatively associated with this alternative class. These negative coefficients mean that the advertisements of these brands are more likely to be recognized by respondents than not to be recognized (or doubtfully recognized).

5 CONCLUSIONS

This study aimed to improve the way companies decide on chan-nels for their marketing campaigns. The seven possible chanchan-nels Cinema, Daily Paper, Internet, Magazine, Out of Home, Radio and Television were taken as multiclass response variable and answers to questionnaires of a media company were used as explanatory variables. Three experiments with different datasets were executed to find out to what extent it is possible to predict the most effec-tive marketing channel using machine learning on mass marketing questionnaires. By using different algorithms and feature selection methods, the best models for each experiment were found based on the highest macro F1-score. The highest performance was reached when applying no feature selection, and this is valid for all algo-rithms for all experiments. This can be seen as an indicator that the models were not overfitted, or at least feature selection did not solve this. Other reasons for feature selection might be a decrease in running time and reducing the size of the model in case of saving the model. Since these two reasons are not seen as a problem in this research, feature selection is not seen as useful for this research. Yet, in case a larger dataset is used in future work, computation and

0.0 0.2 0.4 _Recall 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 Precision Precision-Recall Curve

Precision-recall curve of class ALT (area = 0.662) Precision-recall curve of class BIOS (area = 0.221) Precision-recall curve of class DGB (area = 0.201) Precision-recall curve of class INT (area = 0.083) Precision-recall curve of class MAG (area = 0.018) Precision-recall curve of class OOH (area = 0.043) Precision-recall curve of class RAD (area = 0.188) Precision-recall curve of class TV (area = 0.527) micro-average Precision-recall curve (area = 0.533)

True Positive Rate

ROC Curves

ROC curve of class ALT (area = 0.68) ROC curve of class BIOS (area = 0.85) ROC curve of class DGB (area = 0.85) ROC curve of class INT (area = 0.71) ROC curve of class MAG (area = 0.67) ROC curve of class OOH (area = 0.67) ROC curve of class RAD (area = 0.70) ROC curve of class TV (area = 0.70) micro-average ROC curve (area = 0.89) macro-average ROC curve (area = 0.73)

Figure 9: Precision-recall curve and ROC-curve of best model (random forests with no feature selection) of all classes for experiment 3.

time might play a bigger role and then this should be investigated again.

The best performing algorithms differed for the experiments; for experiment 1 this was logistic regression (0.413), but for experi-ments 2 and 3, this was random forests (0.278 respectively 0.282). For experiment 1 it was clear that not all channels could be pre-dicted equally well, Cinema and Daily Paper were prepre-dicted very well while Out of Home was predicted badly according to the con-fusion matrix. It might be that Out of Home as class is a too broad to accurately predict, since there are billboards in this category, but

(11)

also advertisements on cars and store windows. It is surprising to see that Television - which is the largest category with therefore the most information - is not predicted best. For experiment 2 and 3, the precision, recall and therefore F1-score were lower than for experiment 1 and the largest classes were often over-represented in the predictions at the expense of the real labels. This resulted in very gentle and low curves in the plots with ROC-curves.

Feature importance analysis was applied to the coefficients of the logistic regression models, using L1 regularization. This means that some coefficients were put to exactly zero, which can be seen as a form of automatic feature selection with as consequence the ease of interpretation. Often brands were found in the list with most extreme coefficients, either with positive coefficients indicat-ing a positive association with the channel compared to the other channels or with negative coefficients indicating the opposite. For experiment 2, we saw that besides multiple brands, temporary leave as employment status had a negative coefficient. This meant that people on temporary leave are more likely to recognize advertise-ments. For experiment 3, we saw positive and negative coefficients for brands and a positive association for respondents that were planning to vote for the conservative party SGP. In other words: people planning to vote for SGP are not likely to have recognized advertisements. This could be declared by the general lack of media use for these voters.

When looking at the downsides of this research, the first thing that comes to mind is the reliability of questionnaire data. The data relies on the honesty of the respondents and could contain lies, especially for questions involving income or voting behavior. The media company tried to tackle this by always giving the respon-dent the opportunity to answer I don’t know. In addition, feature importance analysis can also be seen as non-reliable, as was already stated in section 2. Especially when coefficients are not very big (as is the case for experiment 2 and 3), conclusions about these should be taken with a grain of salt. Above that, it should be noted that although all models outperform the baselines, the macro F1-scores are not very high, especially for experiment 2 and 3. Further research should focus on this and try to optimize by for example changing the weights of the classes during training, using other algorithms or increasing the dataset for experiment 3. It could also be that very important explanatory variables are still missing in order to predict like cost of campaign, date of start of advertisement, duration of exposure (for example multiple weeks on television), length of advertisement itself (for example number of seconds in radio campaign), but there should be thought about a feasible way to measure this for all channels.

A different proposal for future work would be to focus on a step earlier in the marketing process. As indicated by Bruinsma [2], the step of recognition of the campaign that was used in this study to split up the different datasets, is actually late in the process of marketing. Before a respondent actually recognizes a campaign he/she mostly saw it multiple times and this means that the com-pany already had multiple moments of contact with the customer. New methods to measure this first moment of contact, instead of later in the process, could be investigated. Although for television this is measurable by a little box, like audience ratings are now derived, for other channels this is hard. For class Out of Home the exact location is needed, and for Magazines, it is hard to assume

people read the entire magazine and did not give it to their friend afterwards.

Lastly, future work could work on a way to implement this sys-tem for the media company while keeping the expert knowledge of the planner. There might be solutions for this in the field of interac-tive learning. The models proposed in this study could help giving more accurate advice to customers about the way they perform their marketing campaigns using machine learning. With machine learning more data can be analyzed quicker and it can show connec-tions between variables and marketing channels that the planner might have missed or does not have experience with. The planner could then help to interpret these patterns.

ACKNOWLEDGMENTS

The author of this paper would like to take this moment to thank supervisor Stevan Rudinac for his enthusiastic comments every week and Michel van Koningsbrugge for all his trust and the given opportunities during the time at his company. Also thanks to her family for their motivating support during this stressful period (and sister for lending her car out), to the employees of MK2_{and Karel for} the lovely lunch conversations and to her friends for their playful distraction in all those moments in between.

REFERENCES

[1] G Alder and D Benson. [n. d.]. Draw.IO. https://www.draw.io/. ([n. d.]). [2] E. Bruinsma. 2018. private communication. (May 2018).

[3] Kristof Coussement and Dirk Van den Poel. 2008. Churn Prediction in Subscrip-tion Services: an ApplicaSubscrip-tion of Support Vector Machines While Comparing Two Parameter-Selection Techniques. Expert systems with applications 34, 1 (2008), 313–327. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.136.728 [4] Melvin R Crask and William D Perreault Jr. 1977. Validation of discriminant

analysis in marketing research. Journal of Marketing Research (1977), 60–68. [5] Sven F Crone, Stefan Lessmann, and Robert Stahlbock. 2006. The impact of

preprocessing on data mining: An evaluation of classifier sensitivity in direct marketing. European Journal of Operational Research 173 (2006), 781–800. https: //doi.org/10.1016/j.ejor.2005.07.023

[6] Dapeng Cui and David Curry. 2005. Prediction in Marketing Using the Support Vector Machine. Marketing Science 24, 4 (2005), 595–615. http://www.jstor.org/ stable/pdf/40056988.pdf?casa

[7] Arne De Keyser, Jeroen Schepers, and Umut Konuş. 2015. Multichannel customer segmentation: Does the after-sales channel matter? A replication and extension. International Journal of Research in Marketing32, 4 (2015), 453–456. https://doi. org/10.1016/j.ijresmar.2015.09.005

[8] Charles X Ling and Chenghui Li. 1998. Data Mining for Direct Marketing: Problems and Solutions. KDD 98 (1998), 73–79. http://www-aig.jpl.nasa.gov/ public/kdd98/

[9] Andreas C Müller and Sarah Guido. 2016. Introduction to machine learning with Python: a guide for data scientists. " O’Reilly Media, Inc.".

[10] Reiichiro Nakano. 2017. reiinakano/scikit-plot: 0.3.5 [Data set]. Zenodo. (2017). https://doi.org/0.5281/zenodo.293191

[11] E W T Ngai, Li Xiu, and D C K Chau. 2009. Application of data mining techniques in customer relationship management: A literature review and classification. Expert Systems with ApplicationsExpe36 (2009), 2592–2602. https://doi.org/10. 1016/j.eswa.2008.02.021

[12] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cour-napeau, M. Brucher, M. Perrot, and E. Duchesnay. 2011. Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research 12 (2011), 2825–2830. [13] C Perlich, · B Dalessandro, O Stitelman, T Raeder, and F Provost. 2014. Machine learning for targeted display advertising: Transfer learning in action. Machine learning95, 1 (2014), 103–127. https://archive.nyu.edu/jspui/bitstream/2451/ 31708/2/Provost1

[14] Xuhui Shao and Lexin Li. 2011. Data-driven multi-touch attribution models. In Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining. 258–264. https://doi.org/10.1145/2020408.2020453 [15] Marina Sokolova and Guy Lapalme. 2009. A systematic analysis of performance

measures for classification tasks. Information Processing & Management 45, 4 (2009), 427–437.

(12)

A

DISTRIBUTION VARIABLES

Table 5: Description of explanatory variables in All-dataset (meaning all rows with yes and a random sample from maybe and no).

Variables Description Not Null Null Percentage Null Unique Values

0 Merk Name of Brand 346004 0 0.00 65

1 AantalKinderen Number of children of

respondent 295202 50802 14.68 26

2 HuishoudGrootte Number of people in

household of respon-dent

309218 36786 10.63 57

3 Geslacht Gender of respondent 344952 1052 0.30 2

4 Opleiding Education level of

re-spondent 343092 2912 0.84 3

5 Arbeidsstatus Employment status of

respondent 309172 36832 10.64 7

6 Boodschapper How often the

respon-dent is responsible for groceries in household

297384 48620 14.05 5

7 Gezinssamenstelling Family structure of

re-spondent 256351 89653 25.91 7

8 HuishoudInkomen Income of household of

respondent 249407 96597 27.92 3

9 Kostwinner Whether the

respon-dent is the primary in-come earner at home

309226 36778 10.63 4

10 Regio Region of hometown of

respondent 342356 3648 1.05 5

11 Politiek How would the

respon-dent vote 236262 109742 31.72 15

12 EconSit How would respondent

say the economy is right now

247863 98141 28.36 3

13 EconProg How would respondent

say the economy will change

247863 98141 28.36 3

14 AanschafGunstig How favourable is it

to purchase at this mo-ment according to re-spondent

247863 98141 28.36 3

15 Reclamedrukwerk How the mailbox of

re-spondent handles ad-vertisements

268759 77245 22.32 3

16 HuishoudProg How would respondent

say her/his household will change

247863 98141 28.36 3

17 HuishoudSit How would respondent

say her/his household is doing right now

247863 98141 28.36 3

(13)

Yes TV _OOH _RAD _DGB INT _MAG _BIOS 0 20000 40000 60000 80000 100000 120000 140000 160000 180000

Maybe Yes MAYBE

TV _RAD _DGB INT _MAG _OOH _BIOS

0 20000 40000 60000 80000 100000 120000 140000 160000 180000

(14)

B

CONFUSION MATRIX

BIOS

DGB

INT

MAG

OOH

RAD

TV

Macro F1= 0.309

BIOS

DGB

INT

MAG

OOH

RAD

TV

True label

0.04 0.00 0.00 0.00 0.00 0.00

0.96

0.00

0.80 0.00 0.00 0.02 0.02 0.16

0.00 0.26 0.21 0.00 0.00 0.04

0.50 0.00 0.00 0.00 0.03 0.00 0.12

0.85 0.00 0.01 0.00 0.00 0.02 0.01

0.96 0.00 0.07 0.00 0.00 0.00 0.08

0.85 0.00 0.01 0.00 0.00 0.00 0.01

0.99 Normalized confusion matrix

0.0

0.2

0.4

0.6

0.8

Figure 11: Normalized confusion matrix for best model for experiment 1 (logistic regression with no feature selection) without balanced learning.

(15)

C

SELECTPERCENTILE() RESULTS

Table 6: Macro F1-scores of different values for features selection method Percentile() for logistic regression for experiment 1. Percentile Macro F1-score

50 0.402 60 0.404 70 0.408 80 0.409 90 0.411 100 0.413

(16)

D

COEFFICIENTS ANALYSIS

5.0 2.5 0.0 2.5 5.0 7.5 10.0 12.5 15.0

Boodschapper_Geen Boodschapper_Geen

Kostwinner_Ja, ik ben de kostwinner Kostwinner_Ja, ik ben de kostwinner

Kostwinner_Mijn partner en ik dragen evenveel bij aan het huishoudinkomenKostwinner_Nee, ik ben de partner van de kostwinner Kostwinner_Mijn partner en ik dragen evenveel bij aan het huishoudinkomenKostwinner_Nee, ik ben de partner van de kostwinner

Kostwinner_Nee, ik ben geen kostwinner of partner van de kostwinnerMerk_Bankgiro Loterij Kostwinner_Nee, ik ben geen kostwinner of partner van de kostwinnerMerk_Bankgiro Loterij

Merk_Bolletje Goed Bezig!Merk_HBO Merk_Bolletje Goed Bezig!Merk_HBO

Merk_Nationale Postcode LoterijMerk_TUI Merk_Nationale Postcode LoterijMerk_TUI

Merk_Ti Sento Merk_Ti Sento

Opleiding_Laag Opleiding_Laag

Politiek_50Plus (50+) Politiek_50Plus (50+)

Politiek_Christen-Democratisch Appèl (CDA)Politiek_ChristenUnie (CU) Politiek_Christen-Democratisch Appèl (CDA)Politiek_ChristenUnie (CU)

Politiek_Democraten 66 (D66)Politiek_Groenlinks (GL) Politiek_Democraten 66 (D66)Politiek_Groenlinks (GL)

Politiek_Partij van de Arbeid (PvdA) Politiek_Partij van de Arbeid (PvdA)

Politiek_Partij voor de Dieren (PvdD)Politiek_Partij voor de Vrijheid (PVV) Politiek_Partij voor de Dieren (PvdD)Politiek_Partij voor de Vrijheid (PVV)

Politiek_Socialistische Partij (SP) Politiek_Socialistische Partij (SP)

Politiek_Staatkundig Gereformeerde Partij (SGP) Politiek_Staatkundig Gereformeerde Partij (SGP)

Politiek_Volkspartij voor Vrijheid en Democratie (VVD)Politiek_ik zou blanco stemmen Politiek_Volkspartij voor Vrijheid en Democratie (VVD)Politiek_ik zou blanco stemmen

Politiek_ik zou mijn stem ongeldig maken Politiek_ik zou mijn stem ongeldig maken

Politiek_ik zou niet de moeite nemen om te gaan stemmen Politiek_ik zou niet de moeite nemen om te gaan stemmen

Politiek_ik zou op een andere landelijke partij stemmen (niet bovenstaand)Reclamedrukwerk_Nee, ik heb geen sticker op de brievenbus Politiek_ik zou op een andere landelijke partij stemmen (niet bovenstaand)Reclamedrukwerk_Nee, ik heb geen sticker op de brievenbus

Reclamedrukwerk_Nee/ja sticker Reclamedrukwerk_Nee/ja sticker

Reclamedrukwerk_Nee/nee sticker Reclamedrukwerk_Nee/nee sticker

Top 30 most extreme coefficients for channel BIOS, experiment 1

Figure 12: Coefficients of experiment 1, class Cinema.

0 5 10 15

Boodschapper_Alle Boodschapper_Alle

Boodschapper_Ongeveer de helft Boodschapper_Ongeveer de helft

Boodschapper_Ongeveer driekwart Boodschapper_Ongeveer driekwart

Gezinssamenstelling_Studentenhuis/woongroep Gezinssamenstelling_Studentenhuis/woongroep

Kostwinner_Nee, ik ben geen kostwinner of partner van de kostwinnerMerk_Arke Kostwinner_Nee, ik ben geen kostwinner of partner van de kostwinnerMerk_Arke

Merk_Bankgiro Loterij Merk_Bankgiro Loterij

Merk_Bolletje Goed Bezig! Merk_Bolletje Goed Bezig!

Merk_De Friesland ZorgverzekeraarMerk_E.ON Merk_De Friesland ZorgverzekeraarMerk_E.ON

Merk_Friesland Bank Merk_Friesland Bank

Merk_Grand Marnier met ginger ale (Grand Ginger)Merk_Grand?Italia Merk_Grand Marnier met ginger ale (Grand Ginger)Merk_Grand?Italia

Merk_GroenrijkMerk_HBO Merk_GroenrijkMerk_HBO

Merk_Intratuin Merk_Intratuin

Merk_LeasePlan Merk_LeasePlan

Merk_Nationale Postcode LoterijMerk_Plastic Heroes Merk_Nationale Postcode LoterijMerk_Plastic Heroes

Merk_Radio 538 Merk_Radio 538

Merk_Rosada Factory OutletMerk_Slankie Merk_Rosada Factory OutletMerk_Slankie

Merk_TUI Merk_TUI

Merk_VSM Merk_VSM

Merk_ViaDierenwinkel.nlMerk_Vrienden Loterij Merk_ViaDierenwinkel.nlMerk_Vrienden Loterij

Merk_Zeeman Merk_Zeeman

Regio_AmRoDHRegio_G,Fr,D Regio_AmRoDHRegio_G,Fr,D

Top 30 most extreme coefficients for channel DGB, experiment 1

(17)

5 0 5 10

Merk_AmstelMerk_Arke Merk_AmstelMerk_Arke

Merk_BavariaMerk_Bertolli Merk_BavariaMerk_Bertolli

Merk_Bolletje Merk_Bolletje

Merk_Bolletje Krokante OntbijtgranenMerk_Bolletje Noten & Granenrepen Merk_Bolletje Krokante OntbijtgranenMerk_Bolletje Noten & Granenrepen

Merk_Bolletje OntbijtkoekMerk_Converse Merk_Bolletje OntbijtkoekMerk_Converse

Merk_Friesland BankMerk_G'woon Merk_Friesland BankMerk_G'woon

Merk_Gamma Merk_Gamma

Merk_GroenrijkMerk_Karwei Merk_GroenrijkMerk_Karwei

Merk_LG Merk_LG

Merk_Lamb WestonMerk_Modifast Merk_Lamb WestonMerk_Modifast

Merk_Parkmobile Merk_Parkmobile

Merk_Rosada Fashion OutletMerk_Slankie Merk_Rosada Fashion OutletMerk_Slankie

Merk_StaatsloterijMerk_TUI Merk_StaatsloterijMerk_TUI

Merk_Tele2 Merk_Tele2

Merk_The Phone HouseMerk_Ti Sento Merk_The Phone HouseMerk_Ti Sento

Merk_Van Gogh MuseumMerk_ViaDierenwinkel.nl Merk_Van Gogh MuseumMerk_ViaDierenwinkel.nl

Merk_Vrienden LoterijMerk_Zeeman Merk_Vrienden LoterijMerk_Zeeman

Top 30 most extreme coefficients for channel INT, experiment 1

Figure 14: Coefficients of experiment 1, class Internet.

5 0 5 10 15

Boodschapper_Alle Boodschapper_Alle

Boodschapper_Ongeveer de helft Boodschapper_Ongeveer de helft

Boodschapper_Ongeveer driekwart Boodschapper_Ongeveer driekwart

Boodschapper_Ongeveer een kwart Boodschapper_Ongeveer een kwart

Gezinssamenstelling_Alleenstaand met (thuiswonende) kinderenGezinssamenstelling_Alleenstaand zonder kinderen Gezinssamenstelling_Alleenstaand met (thuiswonende) kinderenGezinssamenstelling_Alleenstaand zonder kinderen

Gezinssamenstelling_Anders, namelijk: Gezinssamenstelling_Anders, namelijk:

Gezinssamenstelling_Gehuwd/samenwonend met (thuiswonende) kinderenGezinssamenstelling_Gehuwd/samenwonend zonder kinderen Gezinssamenstelling_Gehuwd/samenwonend met (thuiswonende) kinderenGezinssamenstelling_Gehuwd/samenwonend zonder kinderen

Gezinssamenstelling_Thuiswonend bij (groot)ouder(s)/familieKostwinner_Ja, ik ben de kostwinner Gezinssamenstelling_Thuiswonend bij (groot)ouder(s)/familieKostwinner_Ja, ik ben de kostwinner

Kostwinner_Nee, ik ben geen kostwinner of partner van de kostwinnerMerk_Batavia Stad Fashion Outlet Kostwinner_Nee, ik ben geen kostwinner of partner van de kostwinnerMerk_Batavia Stad Fashion Outlet

Merk_Bertolli Merk_Bertolli

Merk_Bolletje OntbijtcrackersMerk_Converse Merk_Bolletje OntbijtcrackersMerk_Converse

Merk_Grand?ItaliaMerk_HBO Merk_Grand?ItaliaMerk_HBO

Merk_Licor 43 Merk_Licor 43

Merk_Nationale Postcode LoterijMerk_Radio 538 Merk_Nationale Postcode LoterijMerk_Radio 538

Merk_Remy MartinMerk_TUI Merk_Remy MartinMerk_TUI

Merk_Vrienden Loterij Merk_Vrienden Loterij

Top 30 most extreme coefficients for channel MAG, experiment 1

(18)

5 0 5 10 15 20

Gezinssamenstelling_Alleenstaand met (thuiswonende) kinderenGezinssamenstelling_Alleenstaand zonder kinderen Gezinssamenstelling_Alleenstaand met (thuiswonende) kinderenGezinssamenstelling_Alleenstaand zonder kinderen

Gezinssamenstelling_Anders, namelijk: Gezinssamenstelling_Anders, namelijk:

Gezinssamenstelling_Gehuwd/samenwonend met (thuiswonende) kinderenGezinssamenstelling_Gehuwd/samenwonend zonder kinderen Gezinssamenstelling_Gehuwd/samenwonend met (thuiswonende) kinderenGezinssamenstelling_Gehuwd/samenwonend zonder kinderen

Gezinssamenstelling_Thuiswonend bij (groot)ouder(s)/familieKostwinner_Ja, ik ben de kostwinner Gezinssamenstelling_Thuiswonend bij (groot)ouder(s)/familieKostwinner_Ja, ik ben de kostwinner

Kostwinner_Nee, ik ben geen kostwinner of partner van de kostwinnerMerk_Aperol Spritz Kostwinner_Nee, ik ben geen kostwinner of partner van de kostwinnerMerk_Aperol Spritz

Merk_Arke Merk_Arke

Merk_Batavia Stad Fashion Outlet Merk_Batavia Stad Fashion Outlet

Merk_Bolletje Krokante OntbijtgranenMerk_Bolletje Lichte Crackers Merk_Bolletje Krokante OntbijtgranenMerk_Bolletje Lichte Crackers

Merk_Converse Merk_Converse

Merk_De Friesland ZorgverzekeraarMerk_E.ON Merk_De Friesland ZorgverzekeraarMerk_E.ON

Merk_Grand Marnier met ginger ale (Grand Ginger)Merk_HBO Merk_Grand Marnier met ginger ale (Grand Ginger)Merk_HBO

Merk_Lamb WestonMerk_LeasePlan Merk_Lamb WestonMerk_LeasePlan

Merk_Nationale Postcode LoterijMerk_Plastic Heroes Merk_Nationale Postcode LoterijMerk_Plastic Heroes

Merk_Radio 538Merk_TUI Merk_Radio 538Merk_TUI

Merk_ViaDierenwinkel.nlMerk_Vrienden Loterij Merk_ViaDierenwinkel.nlMerk_Vrienden Loterij

Regio_AmRoDH Regio_AmRoDH

Top 30 most extreme coefficients for channel OOH, experiment 1

Figure 16: Coefficients of experiment 1, class Out of Home.

8 6 4 2 0 2 4 6 8

Merk_Amstel Merk_Amstel

Merk_Aperol SpritzMerk_Aviko Merk_Aperol SpritzMerk_Aviko

Merk_Bolletje Boeren Beschuit Merk_Bolletje Boeren Beschuit

Merk_Bolletje Krokante OntbijtgranenMerk_Bolletje Lichte Crackers Merk_Bolletje Krokante OntbijtgranenMerk_Bolletje Lichte Crackers

Merk_Bolletje Noten & GranenrepenMerk_Bolletje Ontbijtcrackers Merk_Bolletje Noten & GranenrepenMerk_Bolletje Ontbijtcrackers

Merk_Bolletje OntbijtkoekMerk_Calendulan Merk_Bolletje OntbijtkoekMerk_Calendulan

Merk_ConverseMerk_G'woon Merk_ConverseMerk_G'woon

Merk_GammaMerk_Karwei Merk_GammaMerk_Karwei

Merk_LG Merk_LG

Merk_Licor 43 Merk_Licor 43

Merk_ModifastMerk_Netflix Merk_ModifastMerk_Netflix

Merk_Oad ReizenMerk_Praxis Merk_Oad ReizenMerk_Praxis

Merk_Slankie Merk_Slankie

Merk_Staatsloterij Merk_Staatsloterij

Merk_Sultana GoodmorningMerk_The Famous Grouse Merk_Sultana GoodmorningMerk_The Famous Grouse

Merk_Ti Sento Merk_Ti Sento

Merk_Van Gils Merk_Van Gils

Merk_Van Gogh Museum Merk_Van Gogh Museum

Top 30 most extreme coefficients for channel RAD, experiment 1

(19)

7.5 5.0 2.5 0.0 2.5 5.0 7.5

Merk_AmstelMerk_Aviko Merk_AmstelMerk_Aviko

Merk_Bavaria Merk_Bavaria

Merk_Bolletje OntbijtkoekMerk_Calendulan Merk_Bolletje OntbijtkoekMerk_Calendulan

Merk_De Friesland ZorgverzekeraarMerk_De Lotto Merk_De Friesland ZorgverzekeraarMerk_De Lotto

Merk_G'woon Merk_G'woon

Merk_Grand Marnier met ginger ale (Grand Ginger)Merk_Groenrijk Merk_Grand Marnier met ginger ale (Grand Ginger)Merk_Groenrijk

Merk_KarweiMerk_LG Merk_KarweiMerk_LG

Merk_Lamb WestonMerk_LeasePlan Merk_Lamb WestonMerk_LeasePlan

Merk_ModifastMerk_Netflix Merk_ModifastMerk_Netflix

Merk_Oad Reizen Merk_Oad Reizen

Merk_Peijnenburg OntbijtkoekMerk_Praxis Merk_Peijnenburg OntbijtkoekMerk_Praxis

Merk_Remy MartinMerk_Samsung Merk_Remy MartinMerk_Samsung

Merk_Spiroflor Merk_Spiroflor

Merk_Sultana GoodmorningMerk_Tele2 Merk_Sultana GoodmorningMerk_Tele2

Merk_Van Gogh Museum Merk_Van Gogh Museum

Merk_Wasa Delicate CrispMerk_Zeeman Merk_Wasa Delicate CrispMerk_Zeeman

Top 30 most extreme coefficients for channel TV, experiment 1

Figure 18: Coefficients of experiment 1, class TV.

2.5 2.0 1.5 1.0 0.5 0.0

Arbeidsstatus_Tijdelijk verlof (bijv. zwangerschapsverlof)Merk_Amstel Arbeidsstatus_Tijdelijk verlof (bijv. zwangerschapsverlof)Merk_Amstel

Merk_Aperol SpritzMerk_Arke Merk_Aperol SpritzMerk_Arke

Merk_Aviko Merk_Aviko

Merk_Bankgiro LoterijMerk_Bavaria Merk_Bankgiro LoterijMerk_Bavaria

Merk_Bertolli Merk_Bertolli

Merk_Bolletje Boeren BeschuitMerk_Bolletje Goed Bezig! Merk_Bolletje Boeren BeschuitMerk_Bolletje Goed Bezig! Merk_Bolletje Krokante OntbijtgranenMerk_Bolletje Lichte Crackers Merk_Bolletje Krokante OntbijtgranenMerk_Bolletje Lichte Crackers

Merk_Bolletje Noten & GranenrepenMerk_Bolletje Ontbijtcrackers Merk_Bolletje Noten & GranenrepenMerk_Bolletje Ontbijtcrackers Merk_Bolletje OntbijtkoekMerk_Friesland Bank Merk_Bolletje OntbijtkoekMerk_Friesland Bank

Merk_G'woon Merk_G'woon

Merk_GammaMerk_Karwei Merk_GammaMerk_Karwei

Merk_LG Merk_LG

Merk_Nationale Postcode LoterijMerk_Peijnenburg Ontbijtkoek Merk_Nationale Postcode LoterijMerk_Peijnenburg Ontbijtkoek

Merk_Plastic HeroesMerk_Praxis Merk_Plastic HeroesMerk_Praxis

Merk_Sultana GoodmorningMerk_The Phone House Merk_Sultana GoodmorningMerk_The Phone House

Top 30 most extreme coefficients for channel MAYBE, experiment 2

(20)

2 1 0 1

Merk_AmstelMerk_Arke Merk_AmstelMerk_Arke

Merk_Aviko Merk_Aviko

Merk_Bolletje Boeren BeschuitMerk_Bolletje Goed Bezig! Merk_Bolletje Boeren BeschuitMerk_Bolletje Goed Bezig!

Merk_Bolletje Noten & GranenrepenMerk_Bolletje Ontbijtcrackers Merk_Bolletje Noten & GranenrepenMerk_Bolletje Ontbijtcrackers

Merk_Bolletje Ontbijtkoek Merk_Bolletje Ontbijtkoek

Merk_De Friesland ZorgverzekeraarMerk_G'woon Merk_De Friesland ZorgverzekeraarMerk_G'woon

Merk_GroenrijkMerk_Karwei Merk_GroenrijkMerk_Karwei

Merk_Mijndomein.nl Merk_Mijndomein.nl

Merk_Nationale Postcode LoterijMerk_Peijnenburg Ontbijtkoek Merk_Nationale Postcode LoterijMerk_Peijnenburg Ontbijtkoek

Merk_Plastic HeroesMerk_Praxis Merk_Plastic HeroesMerk_Praxis

Merk_Remy MartinMerk_Slankie Merk_Remy MartinMerk_Slankie

Merk_Sultana GoodmorningMerk_TUI Merk_Sultana GoodmorningMerk_TUI

Merk_The Phone HouseMerk_Ti Sento Merk_The Phone HouseMerk_Ti Sento

Politiek_Staatkundig Gereformeerde Partij (SGP) Politiek_Staatkundig Gereformeerde Partij (SGP)

Top 30 most extreme coefficients for channel ALT, experiment 3