
7/24/2017 | 1

To Click or Not to Click: Machine Learning Techniques for Predicting and Uncovering Influencers of Click Behaviour

Erwin Oosterhuis

s2211173

University of Groningen

First Supervisor: prof. dr. J.E. Wieringa

Second Supervisor: dr. J.E.M. van Nierop


Click-through rates have decreased to as low as 0.1% (MediaMind 2012).

Predicting clicks makes it possible to target advertisements, which improves CTR (Briggs and Hollis 1997, Sherman and Deighton 2001, Chandon et al. 2003, Chatterjee et al. 2003).

Two-fold research purpose

Research Question 1

How can machine learning techniques be used to develop an accurate display advertising click-through prediction model, providing further insights into open research issues in model building?

Research Question 2

What is the importance and influence of user demographics and browsing behaviour variables on click-through?


Hypotheses


› Choice of algorithm
• Before calibration, bagged decision trees have the best performance
• Before calibration, logistic regression has the worst performance
• After calibration, boosted decision trees have the best performance

› Calibration
• Calibration positively influences model performance

› Handling class imbalance
• Undersampling outperforms SMOTE in terms of model performance

› Feature selection
• Wrapper feature selection positively influences model performance
• Filter feature selection positively influences model performance
• Wrapper feature selection outperforms filter feature selection in terms of performance

› Importance of the data types
• Combining demographic variables and browsing behaviour variables results in better performance than using either type on its own

› Influence of demographic variables
• Age positively influences click behaviour
• Women show higher click behaviour than men

› Influence of browsing behaviour variables
• Search behaviour is positively associated with click behaviour
• Store patronage is positively associated with click behaviour
• Browsing via tablets or mobile phones negatively influences click behaviour


› 12 weeks of Analytics data, enriched with internal data = 19,358 raw observations

› 9.6% missing values for session length
• Little’s MCAR test: p < 0.001, so the data are not missing completely at random (MCAR)
• Assumption: data are missing at random (MAR)
• Multiple imputation (MI) used to impute the missing values

› Final dataset consists of 10,489 observations

› Analysis takes place in R


Methodology

› Data set
• Boosted dataset: resampled 15 times

› Handling class imbalance: SMOTE or undersampling

› Feature selection: full, filter or wrapper

› Calibration: no calibration or calibration

› Algorithms: LR, DT, boosting, bagging, RF, SVM
• Grid search for optimal parameter settings
• 10-fold cross-validation

› Performance: AUC, SAR, LogLoss
• SAR = 1/3 × (AUC + ACC + (1 - RMSE))

› Comparing performance
• Shapiro-Wilk test for normality assumption
• p < 0.05: Wilcoxon test
• p > 0.05: paired samples t-test
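The SAR measure listed under Performance averages AUC, accuracy and one minus RMSE. The following is a minimal Python sketch of that combination, not the R code used in the analysis. The 0.5 classification threshold, the function names and the toy labels and scores are all assumptions made for illustration.

```python
import math

def auc(labels, scores):
    """Rank-based AUC: chance that a random positive outscores a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    # Ties between a positive and a negative count as half a win.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def accuracy(labels, scores, threshold=0.5):
    """Share of observations classified correctly at the given threshold (assumed 0.5)."""
    hits = sum((s >= threshold) == (y == 1) for y, s in zip(labels, scores))
    return hits / len(labels)

def rmse(labels, scores):
    """Root mean squared error between predicted probabilities and 0/1 labels."""
    return math.sqrt(sum((s - y) ** 2 for y, s in zip(labels, scores)) / len(labels))

def sar(labels, scores, threshold=0.5):
    """SAR = 1/3 * (AUC + ACC + (1 - RMSE))."""
    return (auc(labels, scores)
            + accuracy(labels, scores, threshold)
            + (1.0 - rmse(labels, scores))) / 3.0

# Hypothetical toy data: 1 = click, 0 = no click.
labels = [1, 0, 1, 0]
scores = [0.9, 0.1, 0.8, 0.2]
sar_value = sar(labels, scores)
```

Because each of the three components lies between 0 and 1, SAR does too, and a model only scores well if it ranks, classifies and estimates probabilities well at the same time.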


Results Research Question 1


› Random forests result in highest performance before calibration

› SVM results in worst performance before calibration

› Calibration significantly improves performance of all algorithms

› Boosted decision trees result in highest performance after calibration

› Feature selection improves model performance (except for SVM and boosting); the best-performing selection procedure varies
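To illustrate why calibration can improve a probability-based measure such as LogLoss, here is a minimal Python sketch. The slides do not state which calibration method was used, so the histogram binning below is an assumption chosen for simplicity; Platt scaling and isotonic regression are common alternatives. The function names and the toy data are hypothetical.

```python
import math

def log_loss(labels, probs, eps=1e-15):
    """Mean negative log-likelihood; probabilities are clipped away from 0 and 1."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(labels)

def fit_histogram_binning(labels, scores, n_bins=2):
    """Learn the observed fraction of positives in each equal-width score bin."""
    counts = [0] * n_bins
    positives = [0] * n_bins
    for y, s in zip(labels, scores):
        b = min(int(s * n_bins), n_bins - 1)
        counts[b] += 1
        positives[b] += y
    # Empty bins fall back to the bin midpoint.
    return [positives[b] / counts[b] if counts[b] else (b + 0.5) / n_bins
            for b in range(n_bins)]

def calibrate(scores, bin_rates):
    """Replace each raw score by the positive rate of the bin it falls into."""
    n_bins = len(bin_rates)
    return [bin_rates[min(int(s * n_bins), n_bins - 1)] for s in scores]

# Hypothetical overconfident model: it predicts 0.99 or 0.01,
# but is right only three times out of four in each group.
labels = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.99, 0.99, 0.99, 0.99, 0.01, 0.01, 0.01, 0.01]

bin_rates = fit_histogram_binning(labels, scores, n_bins=2)
calibrated = calibrate(scores, bin_rates)

loss_before = log_loss(labels, scores)
loss_after = log_loss(labels, calibrated)
```

In this toy case the calibrated probabilities (0.75 and 0.25) match the observed click rates, so the confident mistakes are penalised far less. The sketch fits and evaluates on the same observations for brevity; in practice the calibration map would be learned on held-out data.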


Results Research Question 2

› Only bagged decision trees benefitted from combination of data

› Age and gender do not influence click behaviour

› Search behaviour is a strong predictor of click behaviour
• Search depth has positive influence
• Number of unique searches is important predictor

› Store patronage is strong predictor of click behaviour
• New users are less likely to click on advertisements
• Days since last session negatively associated with click behaviour
• Session count is important predictor

› Browsing via mobile phones and tablets decreases click behaviour


Additional results

› Time per page is most important predictor of click behaviour

› The acquisition channel influences click behaviour
• Organic search and direct traffic have largest positive influence

› Landing page influences click behaviour
• Landing on product page has positive influence
• Internal search result has negative influence


Conclusions

Research Question 1

1. ‘There is no such thing as a free lunch in statistics’. Take an empirical approach.

2. Calibration and feature selection (in almost all cases) increase model performance. Wrapper feature selection is preferred.

3. Take SMOTE into consideration when handling class imbalance.

4. Use random forests for highest performance. Or, when using calibration, combine it with boosted decision trees.

Research Question 2

1. Focus on data quality instead of quantity. Browsing behaviour variables are better predictors than demographic variables.

2. Incorporate search behaviour data, store patronage data and data on time spent per page.

3. Critically assess advertising on mobile devices.

4. Focus on acquisition via organic search and direct traffic.


Limitations/future research

› Are the variables independent variables or mediators?

› Demographic data was of low quality. Focus on high-quality demographic data and reassess its influence.

› Only one dataset used. Focus on more datasets for a better benchmark study.

› Are parametric tests (paired samples t-test) a good way to compare performance? Use non-parametric tests such as the Friedman test.
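As a rough illustration of the Friedman test suggested above, the Python sketch below computes the Friedman statistic from a table of performance scores, where each row is one block (for example a resample or fold) and each column one algorithm. The function names and the scores are hypothetical, and the sketch leaves out the tie correction and the p-value; under the null hypothesis the statistic is approximately chi-square distributed with k - 1 degrees of freedom.

```python
def average_ranks(values):
    """Rank values ascending (1 = smallest); tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2.0 + 1.0
        for pos in range(i, j + 1):
            ranks[order[pos]] = shared
        i = j + 1
    return ranks

def friedman_statistic(score_rows):
    """score_rows[b][a] = score of algorithm a on block b; no tie correction applied."""
    n = len(score_rows)          # number of blocks
    k = len(score_rows[0])       # number of algorithms
    rank_sums = [0.0] * k
    for row in score_rows:
        for a, r in enumerate(average_ranks(row)):
            rank_sums[a] += r
    return 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)

# Hypothetical AUC values for three algorithms on four resamples.
score_rows = [
    [0.80, 0.75, 0.70],
    [0.82, 0.78, 0.71],
    [0.79, 0.74, 0.69],
    [0.81, 0.77, 0.72],
]
q = friedman_statistic(score_rows)
```

Here the three algorithms are ranked identically in every block, so the statistic reaches its maximum of n(k - 1) = 8 for four blocks and three algorithms.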
