
Recommendation systems for online stores

How implicit feedback can be used to enhance performance

Jacco Kuiper
Rozenstraat 1, Heerenveen
j.r.kuiper.3@student.rug.nl
06 23 95 50 55
S2070510

Master Thesis

June 28, 2017
University of Groningen
Faculty of Economics and Business / Department of Marketing
Faculty of Arts / Department of Information Science


Management Summary

This thesis aims to unleash the potential of implicit feedback for online personalized recommendations. To do so, it first describes how recommender systems can benefit both stores and consumers and studies the workings of multiple algorithms.

Because multiple sources of implicit feedback are available in the data, the value of each type of implicit feedback is assessed first using MAP@k. No indicators except 'view' and 'buy' are found to be important for recommender performance. In other words: including system and user-location information does not improve recommendations.

Three recommender models are deployed on the product-detail page of an online store: a content-based model based on product-description similarities, a collaborative filtering model based on product views and a model based on correlated cross-occurrence between purchases and product views. A comparison of their click-through rates shows that the content-based model outperforms the other models.

Summing up, this thesis finds that:

• Behavioral indicators such as 'buy' and 'view' contain predictive power when used for recommendations. System and location indicators do not.

• On product-detail pages, content-based recommenders outperform personalized models.

Online stores may use this information to reduce their bounce rates and to help consumers find interesting products faster, increasing conversion rates and thus revenue. Companies such as Conversify can use these findings to assist online stores in choosing and deploying the right type of recommender at the right place.

Other research could focus on how the performance of various recommender systems depends on their on-site placement. Another - more interesting - way to build upon this research may be to deploy and test similar models in


Preface

This thesis finalizes my "university career". I am glad that I have been given the opportunity to complete two master degrees with this project. I want to thank Mr. Wieringa, who made that possible. I also want to thank Misja and Bart for all their help and I truly think that their company is quite awesome. Last but not least, I want to thank my parents and Mileen for their support.

I have learned a lot in the past 5 months and have worked with some tools I plan to work with in the future. I have also learned many things that I can use to improve my own websites.

Altogether, it has been a good learning experience and I hope to apply all that I’ve learned in the coming years.


Contents

1 Introduction
2 Related work
  2.1 Recommender systems as marketing technology
    2.1.1 Understanding consumers
    2.1.2 Impact on online shopping
    2.1.3 Evaluating recommender systems in online settings
  2.2 Recommender algorithms
    2.2.1 Neighborhood-based collaborative filtering
    2.2.2 Model-based collaborative filtering
6 Conclusion and Future Work
  6.1 Future Work


- Jeff Bezos, CEO of Amazon

1 Introduction

The rise of E-commerce The rise of the Internet has brought many changes: new means of communication, an ongoing strong growth of data and radical changes to the way consumers shop. With a few clicks, one can use FaceTime or Skype to call anyone anywhere while seeing each other; search for information in an estimated 4.76 billion pages (based on Google and Bing [51]); and shop in online stores from various countries or even compare prices in real time while shopping in physical stores. The commercial possibilities these changes entail have not gone unnoticed: the upsurge of the Internet drives more and more retailers and marketers to find ways to unleash the power of the Internet for commercial purposes such as online advertising and shopping. In the US alone, online sales are expected to increase from $373 billion in 2016 to $500 billion in 2020, which is 11% of overall US retail sales [10]. Furthermore, over 60% of all US retail sales will involve the Internet this year, either as a direct sales channel or as part of a consumer's path to purchase [11]. In the Netherlands, online sales grew by 16% from 2014 to 2015 [13]. Figures for 2016 show that E-commerce is still growing and that online sales now account for over 10% of all retail sales [14].

Personalization of online advertising As the Internet matures, more people share their personal information online. The rise of social media platforms such as Facebook, LinkedIn and, more recently, Snapchat underpins this trend. Marketers can use such rich data sources to personalize marketing efforts, which leads to higher click-through rates (CTR) and conversion than online mass advertising [31]. Financial numbers show that marketers find their way to such advertising methods better than ever: Facebook's advertising revenue over 2016 was $26.88 billion, up 57% from 2015¹.

Other companies with extremely rich data sources are search engines. Although Google's founders stated back in 1998, when Google was founded, that "the goals of the advertising business model do not always correspond to providing quality search to users" [4], the company is currently at the forefront of personalized marketing technology with a reported revenue of $79 billion from (re-targeted) advertising in 2016².

Personalization of online stores Data richness is not limited to search engines and social media platforms: online stores are rich in data as well. They collect a lot of information on their customers' browsing and buying behavior and therefore have personal targeting abilities too [26]. To optimize marketing effort, the more innovative online stores leverage their behavioral and transactional data with analyses and tools that support and aid decision-making. This has brought forth various technological solutions, among which are recommendation engines. In essence, recommender systems are software tools and techniques providing suggestions for items likely to be of use to a user [41].

Problem statement Not all online stores possess the technological or financial resources required to build data-driven marketing tools. This shortage, combined with the growth of (open-source) E-commerce, has paved the way for software companies to build and maintain data-driven marketing tools for online stores. Among these companies is Conversify, a company that aims to increase conversion for online stores with persuasive and intelligent optimization tools such as notifications, smart search and recommenders. Conversify's notification system is intelligent: using machine-learning techniques, it learns which notifications have worked in the past for similar users. Their recommender system is less intelligent: it is based solely on a 'more-like-this' (MLT) query on product descriptions and thus makes no use of the behavioral and transactional data stored in session dumps to customize recommendations for individual users. The aim of this thesis is to create a more intelligent recommendation system for Conversify using behavioral

¹ https://investor.fb.com/investor-news/press-release-details/2017/Facebook-Reports-Fourth-Quarter-and-Full-Year-2016-Results/default.aspx


and transactional data obtained from online sessions. From now on, I refer to such data as implicit session feedback. Hence, the research question is:

How can implicit session feedback be used to create intelligent e-commerce recommender systems?

How recommender systems benefit consumers Where brick-and-mortar stores can only offer a fixed number of options due to their limited physical size, online stores can serve consumers with millions of options. This shift from scarcity of presentational space to abundance has given rise to the long-tail phenomenon: physical stores can only provide what is popular, while online stores can make everything available. Shelf space used to be scarce and therefore valuable for traditional retailers. Its equivalent in an E-commerce environment - on-site availability - is abundant, which leads to near-zero-cost dissemination of products and thus allows for much larger assortments [28]. As it is impossible for online stores to present, and for customers to compare, all items from such a rich pool of information, the need for systems that recommend which information should be shown to someone becomes clear.

We now know that recommender systems help consumers save time, but they also have the ability to increase happiness. As the amount of choice increases, so does consumers' expectation of finding the perfect fit among all those choices. However, with the many options that capitalism and free markets give us, finding the optimum can be challenging. This leads to a phenomenon called the paradox of choice, where the abundance of options raises expectations to a point where they cannot be met. Consumers then attribute the gap between expectations and reality to their own decision-making capabilities, which makes them feel as if they failed and thus makes them unhappy [43]. Recommenders combat this phenomenon by providing relevant choices only [41].


what they need faster; increased cross- or up-selling by suggesting additional or more complete products; and building consumer loyalty by creating a value-added consumer-company relationship. The third benefit stems from the fact that sites invest in learning, and consumers repay the sites that do this best by visiting them more often, which in turn reinforces learning [42].

One of the pioneers in the field of data-driven marketing is Amazon. The first version of their recommender system was launched in 2003 [31]. Nowadays Amazon employs recommender systems to personalize the online store for each customer [41], and 35% of Amazon's revenue comes from those systems [34]. In the Netherlands, Wehkamp and Bol.com, among others, have similar recommendation systems for up- and cross-selling and popular products, but the performance of their systems is not publicly available. However, other online stores have implemented recommendation systems with reported revenue increases of up to 70% and increases in conversion rate of over 20% [40]. Hence, an optimized recommendation system is a valuable tool for online stores.

Contributions This thesis aims to implement and A/B test data-driven variants of recommender systems. In doing so, the importance of implicit data in such setups is studied. The data used cannot be linked to a person, age or IP address. As privacy and information security are hot topics nowadays, such less intrusive recommendation engines might be a solution. Furthermore, other companies may use the literature study, process description and outcomes to improve their recommendation engines.

Structure The structure of the remainder of this document is the following:

Chapter 2 (Related work) describes two research fields. The first part describes the importance of recommender systems as a marketing tool, both for consumers and retailers. The second part describes recommender systems in technical terms.
Chapter 3 (Research design) describes the recommender system setups in terms of data input and software tools used and created.


used now.

Chapter 5 (Discussion) assesses what this thesis has contributed in terms of knowledge.
Chapter 6 (Conclusion and future work) summarizes this thesis and gives directions for


"Personalization wasn't supposed to be a cleverly veiled way to chase prospects around the web, showing them the same spammy ad for the same lame stuff as everyone else sees. No, it is a chance to differentiate at a human scale, to use behavior as the most important clue about what people want and more important, what they need."

- Seth Godin, Author, Entrepreneur, Marketer

2 Related work

Introduction Recommender systems (RS) come in many types and forms, which has led to multiple active related research fields and use cases. Currently, Netflix makes extensive use of RS to personalize movie and series recommendations; a Dutch company called Scisports offers transfer and scouting recommendations to football clubs; and recruitment firms such as Young Capital use them to find the right candidate for a job opening. These use cases show the widespread adoption of RS in fields other than E-commerce. This thesis, however, revolves around RS for online stores, which is why this chapter is narrowed down to literature related to such usage. Hence, in the context of this work, a recommender system is defined as a web-based tool that tailors a vendor's offerings to consumers according to their implicit preferences [29]. An implicit preference is defined as any indicator of a user's interest in an item that has not been explicitly designated or asked for.

RS research in E-commerce settings can be divided into three areas: understanding consumers, delivering recommendations and RS' impact on vendors and consumers. The first and last areas are related to marketing, whereas delivering recommendations is a matter of technology [29]. This section is therefore divided into two parts: a psychological, marketing-oriented part and a technological, algorithm-oriented part. The first part addresses the purpose of recommender systems in terms of marketing and their effects at the market, consumer and retailer level. The second part discusses the rationale of various implementations of RSs and their advantages and disadvantages.


2.1 Recommender systems as marketing technology

In terms of recommender engines for marketing purposes, two aspects are worth investigating. The first is how recommenders can understand consumers and the potential problems that arise when collecting implicit data to estimate user preferences. The second is how recommenders affect markets, shoppers and vendors.

2.1.1 Understanding consumers

To understand consumers, RSs need data (feedback). Feedback can be used to estimate preferences, which form the foundation for building consumer profiles. In this research, we are interested in RSs that use implicit feedback. Implicit RSs infer preferences from behavioral cues, such as purchase history or click-stream data. In this case, consumers do not proactively provide information regarding their preferences, as opposed to explicit feedback collection methods such as surveys [29].

Data sources Consumer data comes in various forms. Originally, past purchases and/or behavioral click-stream data formed the basis for recommender engines, but the widespread adoption of social networks and advancements in Internet technology allow the inclusion of other sources, such as social, contextual and system information, to enhance recommenders [41].

Although this research focuses on RSs with information solely collected by an online store itself, it is interesting to mention that researchers have investigated the added value of social networks and found that including information available on Facebook can increase accuracy when estimating user profiles [16]. Other research [36] corroborates this finding: the richness of social network information and the combination of such information with product attributes lead to more accurate user-profile estimations. Amazon gives consumers the possibility to link their Facebook profile to Amazon's system, which then provides recommendations based on prior purchases of friends. This approach is based on the presumption that friends have similar preferences - a claim supported by research [27] - and is essentially an online variant of word of mouth.


imagine that system information says something about the user: e.g. the more expensive the system in use, the more the user is willing to pay. Companies have reportedly used such information to distinguish between Mac and PC users and to suggest more expensive options to Mac users¹.

Contextual factors may also play a vital role in the performance of recommender systems [41, p. 191]. Theories on consumer decision-making in marketing have established this importance by showing that preferences are in fact determined by context and are therefore not stable [47]. In online shopping settings, the addition of contextual elements has been shown to benefit recommender performance [15]. Such elements include the time of the day, week and year [32].

Privacy concerns Implicit feedback has some drawbacks in the form of privacy concerns, as users are not always aware of which data is used and for what purpose. This is even worse in situations where implicit data collected by a retailer is augmented with social network information without user consent. Many argue that in this age of information, personal information is a commodity which can be traded for personal gain, but not all users know which parts of their personal information are used for targeting [41, p. 650].

The quality of recommendations is correlated with the amount, richness and diversity of user data. However, the same factors also drive the perceived privacy risks. This problem is referred to as the privacy-personalization trade-off [41, p. 651]. The problem is not limited to recommendation engines, however: the use of similar techniques for online advertising by companies such as Google and Facebook has contributed to the spread of tools such as ad blockers and cookie trackers, with users reporting intrusiveness as one of the main reasons for using such tools².

Ultimately, e-vendors need to decide which of the available types of data are relevant and how this data should be modeled. This affects the perceived intrusiveness of their system and the quality of the recommendations served in later stages [9, 29].

¹ http://business.time.com/2012/06/26/orbitz-shows-higher-prices-to-mac-users/



2.1.2 Impact on online shopping

Although RSs raise some privacy issues, depending on which data sources are included, there are various ways in which they serve both consumers and vendors [38]. Before diving into these benefits, let us first assess RSs' impact at the market level.

Market impact Some literature suggests that RSs lead to more homogeneous sales, because they limit a consumer's ability to explore novel products by showing mainly popular or similar items [35]. Some research seems to support this claim, stating that the most popular recommendation algorithms are based on sales and ratings and are therefore not useful for novel products [9]. Note that from an individual user's perspective, novelty may very well be experienced, because he or she may never have seen the more popular products before. However, if recommender systems push each person to new products, but these new products are the same for every user, aggregate diversity decreases. This tends to create a rich-gets-richer effect, which might prevent consumers from finding better matches.

Other research backs the opposing view that RSs diversify sales by heterogenizing the offers displayed [5], stating that online retailing exhibits a significantly more heterogeneous sales distribution than traditional retailing. According to the latter study, recommender engines significantly increase the share of niche products in sales.


Consumer impact Personalized recommendations have the ability to alter a user's perceptions and intentions and to alter decision-making processes. Many studies have focused on applying the technology acceptance model (TAM) [7] to show that recommendations indeed increase perceived usefulness. Others use the theory of reasoned action (TRA) [2] and switching- and transaction-cost theories [29].

Studies have shown that recommender engines increase the perceived usefulness [25] and perceived benefits [6] of systems. These effects may either be direct or indirect via variables such as transaction costs (in terms of time and effort spent on searching for products) or perceived care. Increased usefulness and benefits increase the likelihood of system usage [49].

Another interesting viewpoint from which to assess recommender importance is their ability to fight problems related to choice overload. Information-processing theories state that one's information-processing capabilities are limited. In this sense, human attention is treated as a scarce commodity [46]. The Internet allows an ever-growing body of content to be accessed immediately, which makes attention the limiting factor in the consumption of information. The same principle can be applied to products: the abundance of products available on the Internet increases the cognitive resources needed to make good decisions. The need for cognitive resources in decision-making processes induces consumers to make trade-offs between decision accuracy and cognitive effort [22]. Recommenders aim to minimize cognitive effort by raising awareness and eliciting salience. This decreases information asymmetry and improves decision accuracy [19].


the amount of products a consumer considers. Furthermore, RSs were also found to have significant positive effects (for the end user) on the time spent on product search, which means that RSs have the ability to reduce cognitive effort. Another finding is that RSs increase decision quality and confidence in the purchase decision.

A somewhat counter-intuitive property of recommendations is their ability to make up for below-average product ratings. Products with a low average rating elicit greater responses from consumers than those with high ratings. In sales terms: the purchase probability of a recommended product increases significantly more for low-rated than for high-rated products. In such cases recommenders can help customers find products which would not have been considered in the absence of recommendations [18].

The aforementioned findings highlight the many possibilities of RSs to help users and establish the value that RSs have from a consumer perspective [29], but there is another side of the coin in the form of retailers.

Retailer impact Although recommender algorithms are often evaluated in terms of accuracy in academic research, predictive accuracy is not the eventual goal from a retailer’s point of view. From a retailer’s perspective, recommender systems may be used to increase customer loyalty or increase profit. Although relevance may increase the likelihood of purchase, it may deteriorate profit margins at the same time. Therefore it is of utmost importance for retailers to balance relevance and profitability [19].


Research in terms of monetary gain has mostly been done in the form of studies by software companies that develop recommender systems, which is why such information may be exaggerated and biased. However, the only way to give some indication of the financial impact of RSs for retailers is to use those studies.

A study by Barilliance (a company that helps E-commerce stores increase sales and conversion rates) from 2014 analyzed over 300 online stores that utilize their recommendation engine. Results show that revenue generated from recommendations was 12% and that the conversion rate of consumers clicking on recommendations was 5.5 times that of non-clicking customers. Recommendations above the fold were found to be 1.7 times more effective than those below the fold. The most engaging type of recommendation was 'what customers ultimately buy', which shows that social proof works. Furthermore, personalized 'Top Sellers' were found to be twice as effective in terms of CTR as their non-personalized counterparts.

Synergy effects In systems where user history forms the basis for recommender engines, synergy effects exist. Users who perceive personal recommendations as beneficial are more likely to return to a vendor that provides them with recommendations. The vendor's system then learns more about the consumer as they interact, which improves recommendation quality, which in turn increases perceived usefulness and therefore revisit likelihood. This is also referred to as reinforcement learning [42].

Recommenders also elicit an interaction effect which stems from how consumers construct their preferences. Preference construction is an interaction between a consumer's information-processing capabilities, task properties and information space [37]. This implies that consumer preferences are partly stable and partly dynamic. The dynamic part changes as the consumer and system interact, because the consumer processes new information. This leads to a situation where consumer preferences are dynamically updated with each interaction iteration [29], which means that RSs have an ongoing influence on purchase behavior.


2.1.3 Evaluating recommender systems in online settings

Recommender systems can be tested in various ways, and determining their performance is not a trivial task. One can distinguish between three main methods: user studies, offline evaluation and online evaluation [41].

User studies User studies ask users to explicitly rate the recommendations given by an algorithm. Users are also asked to quantify their overall satisfaction with the system. The algorithm that gets the best ratings is chosen as the best recommender engine. User studies allow for qualitative feedback, which is their strongest advantage over the other two evaluation options. Note that user studies ask participants to rate their satisfaction, not the relevance. In most cases, participants do not even know which item was most relevant according to the system. User studies are not used that often because the other evaluation options are more straightforward, according to Beel, Genzmehr, Langer, Nürnberger, and Gipp [3]. We therefore continue with offline and online evaluation respectively.

Offline evaluation As recommender systems essentially solve prediction problems, evaluation measures used for predictive accuracy are often used to measure recommender performance. These methods find their origin in information retrieval and include the Root Mean Squared Error (RMSE), the F1-measure (based on precision and recall) and Mean Average Precision (MAP, also based on precision and recall). These measures are based on splitting the data into a training and a test set, training a model on the training data and evaluating it on the test data. Because no user feedback is needed, these methods form a relatively easy and cheap way to show and discuss performance improvements [44], which explains their widespread usage.
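As an illustration of how MAP@k can be computed, consider the following sketch; the function names and toy data are illustrative, not taken from this thesis' actual evaluation code:

```python
def average_precision_at_k(recommended, relevant, k):
    """Average precision for one user: rewards relevant items ranked early."""
    hits, score = 0, 0.0
    for rank, item in enumerate(recommended[:k], start=1):
        if item in relevant:
            hits += 1
            score += hits / rank  # precision at this cut-off
    return score / min(len(relevant), k) if relevant else 0.0

def map_at_k(all_recommended, all_relevant, k):
    """MAP@k: mean of the per-user average precisions."""
    aps = [average_precision_at_k(rec, rel, k)
           for rec, rel in zip(all_recommended, all_relevant)]
    return sum(aps) / len(aps)
```

For example, with the ranked list ["a", "b", "c"] and relevant set {"a", "c"}, AP@3 is (1/1 + 2/3)/2 ≈ 0.83; averaging such values over all users gives MAP@3.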


to both off- and online data should therefore use offline setups for validation purposes and online setups (or user studies if real-world evaluation is not possible) to test actual performance.

Online evaluation According to Beel, Genzmehr, Langer, Nürnberger, and Gipp [3], the most used online measure for recommendations is the click-through rate (CTR). The click-through rate is the fraction of displayed recommendations that are clicked. For instance, if a system displays 5,000 recommendations and 100 are clicked, the CTR is 2%. CTR is thus a measure of interaction. However, the rise of E-commerce has given birth to more monetarily focused performance indicators for RSs. E-commerce-specific measures include the rate of orders containing recommended (and selected) items, the relative lift in average order value due to recommendations and the increase in revenue from recommended items. These measures, however, are more important to the businesses deploying them and are rarely found in scientific studies.

An alternative to A/B testing that has gained popularity in recent years is the multi-armed bandit. The multi-armed bandit approach does not split the traffic into even partitions, but shows the best-performing experiment variant more often than the other variants. The intuition behind this approach is that one wants to minimize the number of "lost conversions" not only after but also during a test. To use this type of test, one needs a clear dependent variable that indicates the performance of a variant. In the case of recommender systems this can, for instance, be the click-through rate or a more E-commerce-specific measure.
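A minimal epsilon-greedy sketch of this idea (the variant statistics and parameter values are illustrative assumptions, not an implementation used in this thesis):

```python
import random

def epsilon_greedy_choice(stats, epsilon=0.1, rng=random):
    """Choose which variant to show next: with probability epsilon explore a
    random variant, otherwise exploit the one with the highest observed CTR.
    stats maps variant name -> (clicks, impressions)."""
    def ctr(variant):
        clicks, impressions = stats[variant]
        return clicks / impressions if impressions else 0.0
    if rng.random() < epsilon:
        return rng.choice(sorted(stats))  # explore
    return max(sorted(stats), key=ctr)    # exploit
```

With epsilon set to zero the function always exploits the current best variant; in practice one would keep a small exploration rate (or use a Thompson-sampling variant) so that under-exposed variants still get traffic.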

2.2 Recommender algorithms


This section continues with a description of the different types of recommender engines (formerly) used in online shopping settings, their underlying concepts and their advantages and disadvantages. The first part is devoted to neighborhood- and model-based collaborative filtering methods respectively, which are considered the classical approaches to recommendation.

Then, the focus shifts to content-based filtering methods. Both collaborative filtering and content-based methods have their shortcomings. Solutions and extensions to their shortcomings have been heavily researched topics, which has resulted in the convergence of the two approaches into systems called hybrid models. As there is a gradual build-up in complexity of the systems described, this chapter will start with basic engines and end with more complex solutions. The state-of-the-art solutions will be discussed more in-depth since - in contrast to the simpler engines - they are more relevant in online shopping environments where various types of data are available.

2.2.1 Neighborhood-based collaborative filtering

Neighborhood-based collaborative filtering (NCF) is the classical approach to recommendation for systems with interaction data between users and items. It is a wisdom-of-the-crowds approach: it uses feedback provided by a set of nearest neighbors of a user or item to make recommendations [42]. It exploits the fact that feedback across users and items is often highly correlated and relies on imputation to estimate unobserved feedback [1]. NCF methods can be divided into user- and item-based approaches.


Y feels towards unseen products.

Item-based Because user-based solutions become inefficient as the number of users increases, Amazon came up with a different approach to NCF [31]. In item-based NCF, item scores are imputed based on the interaction history of user X with the items in itemset S. For each item in S, the similarity of that item to all items in the product catalog is calculated. The similarity ranking can then be used to recommend products to user X [41, 1].
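The item-based procedure above can be sketched as follows; the cosine measure over binary view data and the toy data structures are illustrative assumptions, not Conversify's implementation:

```python
from math import sqrt

def cosine(users_a, users_b):
    """Cosine similarity between two items, each represented as the set of
    users who interacted with it (binary implicit feedback)."""
    if not users_a or not users_b:
        return 0.0
    return len(users_a & users_b) / (sqrt(len(users_a)) * sqrt(len(users_b)))

def item_based_recommend(interactions, user, n=3):
    """interactions: item id -> set of users who viewed/bought it.
    Score each unseen item by its summed similarity to the user's items."""
    seen = {item for item, users in interactions.items() if user in users}
    scores = {
        cand: sum(cosine(cand_users, interactions[s]) for s in seen)
        for cand, cand_users in interactions.items() if cand not in seen
    }
    return sorted(scores, key=scores.get, reverse=True)[:n]
```

Real deployments precompute the item-item similarities offline, which is exactly what makes this approach scale better than its user-based counterpart as the number of users grows.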


2.2.2 Model-based collaborative filtering

The NCF methods discussed in the previous section are essentially generalizations of the k-nearest-neighbor classifiers often used in machine learning. Such methods do not create a model up front: the prediction approach is specific to the instance being predicted. They do not perform any form of generalization, but simply compare a new instance (a product or user) to instances seen earlier. Preprocessing is often done to speed up the recommendation process, but it is not mandatory for these methods to work [1, ch. 3].

As opposed to NCF, model-based collaborative filtering (MCF) methods first create a (generalized) model for prediction. This approach is similar to (un)supervised learning processes in machine learning, such as support vector machines and decision trees, where the training and prediction phases are clearly separated [1, ch. 3]. Model-based CF techniques - of which latent factor models have become the most popular variant - try to find generalizations in the data that can be exploited for recommendations. The underlying assumption is that large parts of the rows and columns are correlated, which leads to redundancies in the data. The original interaction matrix can therefore be approximated fairly well by a low-rank matrix. Latent factor models are considered state of the art and have proved their value in the Netflix Prize competition. The name implies what distinguishes them from other models: they try to find latent factors for both users and items that can be inferred from ratings. These factors can be obvious dimensions such as color or material, but they may also be uninterpretable [23].

The most popular form of latent factor models is matrix factorization [23], which infers vectors of factors from rating patterns. The vectors for both items and users are obtained by minimizing the regularized squared error on the set of known ratings. The estimated interest for any user-item combination can then be obtained by simply retrieving the dot product of the vectors of these combinations [1, ch. 3].
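A minimal sketch of this idea (not the implementation used later in this thesis; the toy rating matrix, factor count, learning rate and regularization are arbitrary choices for illustration):

```python
import numpy as np

def factorize(R, n_factors=2, lr=0.01, reg=0.1, epochs=300, seed=0):
    """SGD on the regularized squared error over the observed (non-zero) ratings."""
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    P = 0.1 * rng.standard_normal((n_users, n_factors))  # user factor vectors
    Q = 0.1 * rng.standard_normal((n_items, n_factors))  # item factor vectors
    observed = np.argwhere(R > 0)
    for _ in range(epochs):
        for u, i in observed:
            err = R[u, i] - P[u] @ Q[i]            # prediction error on this rating
            P[u] += lr * (err * Q[i] - reg * P[u])
            Q[i] += lr * (err * P[u] - reg * Q[i])
    return P, Q

def loss(R, P, Q):
    """Squared error over the observed entries only."""
    mask = R > 0
    return float(((R - P @ Q.T)[mask] ** 2).sum())

# Toy interaction matrix: zeros are unobserved user-item pairs.
R = np.array([[5, 3, 0],
              [4, 0, 1],
              [0, 1, 5]], dtype=float)
P, Q = factorize(R)
# The estimated interest for any user u and item i is the dot product P[u] @ Q[i].
```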


of items. Model-based approaches are faster at constructing a model. The third advantage is that MCF methods are less prone to overfitting, which makes them more robust. This advantage is related to matrix factorization's flexibility with regard to biases. Variation in the observed ratings is often due to effects associated with users and items rather than the interaction between them. Matrix factorization allows for mathematical adaptation of the minimization function to account for such effects, so that observed ratings can be broken down into a global average, a user bias, an item bias and an interaction term [23].

Disadvantages Although matrix factorization methods have advantages over neighborhood-based CF, the biggest problem - the cold-start problem discussed earlier - remains unsolved. Furthermore, the inclusion of multiple different data sources can only be done using (subjective) weights.

2.2.3 Content-based engines

Content-based recommenders originate from algorithms used in the field of information retrieval, where similarity measures are used to find documents that match a user's profile or a target document. The underlying principle is to find similarities in items that have received a rating and then use their common characteristics to create a user profile. This profile can then be used to find the unseen items most similar to those rated [1]. In online shopping settings, content-based systems use item attributes to find items similar to a target item or to a target user. Note that these systems, in their most naive form, do not need any information about the user other than which item the user is currently displaying interest in. Examples of such naive implementations are recommendations based on description distance between a target item (= the item being viewed) and other items using TF-IDF [8], or distance in terms of visual cues between two products [33]. In these cases recommendations are non-personalized.
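Such a naive description-based recommender can be sketched as follows (the product catalog and whitespace tokenization are simplified toy assumptions, not the store's actual data or pipeline):

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Build a sparse TF-IDF vector (dict) for each tokenized document."""
    n = len(docs)
    df = Counter(t for doc in docs for t in set(doc))  # document frequency per term
    vecs = []
    for doc in docs:
        tf = Counter(doc)
        vecs.append({t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()})
    return vecs

def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

catalog = {  # hypothetical product descriptions
    "item1": "blue rubber toy car",
    "item2": "red rubber toy car",
    "item3": "wooden jewelry craft kit",
}
ids = list(catalog)
vecs = tfidf_vectors([catalog[i].split() for i in ids])

def similar_to(target, k=2):
    """Rank the other products by description similarity to the target item."""
    t = ids.index(target)
    scores = [(cosine(vecs[t], vecs[j]), ids[j]) for j in range(len(ids)) if j != t]
    return [pid for _, pid in sorted(scores, reverse=True)[:k]]
```

Note that the recommendation depends only on the item currently being viewed, which is exactly why such output is non-personalized.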


and looks at multiple sci-fi movies. If we do not have any information about other users, collaborative filtering methods cannot be used. However, the attributes of the sci-fi movies are likely to share (parts of) their descriptions and categories. This gives some insight into X's interests and allows us to recommend other sci-fi movies that are close, in terms of attributes, to the movies in X's current session.

Advantages Content-based systems require no data other than information about the current user's interests. For the items of interest, the system finds overlapping product attributes. They can therefore be of value when data from other users is not available. Content-based systems can also deal with the cold-start problem [20], because they are able to produce output as long as at least one item has been seen, bought or rated [1]. Furthermore, they require much less (or even zero) training time compared to CF methods.

Disadvantages Content-based systems return less diverse results than CF methods, because the constructed model is specific to the current user. When a consumer is interested in multiple (very different) products, a content-based recommender will recommend some products from each product category, which allows for some variation in the recommended products. However, in cases where a user only views one product and no historical records are available, content-based systems return only products that are closely related, in terms of attributes, to the product viewed. This problem can be alleviated by explicitly asking a user to specify relevant attribute values [1]. However, such systems are often referred to as knowledge-based systems and require an extra step from the user, which limits their applicability for implicit feedback in online shopping environments.

2.2.4 Hybrid systems


types of data. Because online stores collect interaction data and user metadata and create item metadata themselves, such systems have benefits over content-based or CF methods. Hence, hybrid systems deserve more attention than the previously discussed algorithms. Three types of hybrid recommender systems exist [1, p. 200]:

Monolithic An integrated algorithm using various data sources, where the distinction between its inputs is not always clear.

Ensemble Results from CF- and CB-systems are combined into a single output; the algorithms can affect each other's calculations in the sense that the output of one algorithm can be input for another.

Mixed Similar to ensemble methods, but without any interaction between the various algorithms: it is simply a combination of the outputs of various systems presented together.

The more popular hybrid models are monolithic extensions of matrix factorization that allow for the ingestion of user and item features into existing models. In section 2.2.2 (model-based CF) we already saw that matrix factorization allows biases to be accounted for, which increases accuracy. In a similar way, one can add additional data sources and even temporal dynamics to matrix factorization algorithms. As user preferences and item popularity change over time, such additional data sources bring great benefits in terms of accuracy.


feedback, and the way in which they calculate confidence levels for that interaction type is somewhat arbitrary. Furthermore, their evaluation measure is based on recall, which makes it hard to estimate the real-world performance of their system.

Kula [24] proposes a somewhat similar approach using metadata embeddings and has implemented his ideas in a Python module called LightFM. The model is based on matrix factorization and content-based techniques. In cases where user or item metadata is absent, the model reduces to standard matrix factorization. However, when such data is available, the model is able to produce feature embeddings which encode semantic information. The approach is therefore somewhat similar to how word2vec estimates similar words. The feature embeddings can be used not only to recommend products, but also to recommend certain attributes such as tags or colors. Results for e-commerce settings are not provided, but LightFM was made for and by Lyst, an online clothing catalogue. Tests on the frequently used MovieLens dataset show that LightFM with both user and item metadata outperforms both matrix factorization with neither or only one of these options and content-based models.


The inclusion of social data The rise of social networks has not gone unexploited either: Li, Wu, and Lai [30] state that shopping can be approached as a social experience and that people often rely on the opinions of friends before buying, a viewpoint also adopted by Amazon, which has implemented a Facebook integration to recommend products based on friends' purchases and interests. They argue that preference similarity, trust in experts and social closeness are drivers of recommender system success. Their results show that the inclusion of social data can enhance CF methods, but as they tested their system only with offline metrics rather than in a real-world application, its actual performance remains unknown. Other research also shows that social data can improve recommendation quality [16, 36].

(Dis)advantages All hybrid models discussed in this section enhance the more naive approaches from the previous sections. Cold-start issues become less relevant with these approaches, but one big issue remains: the inclusion of multiple implicit interaction sources. The often-proposed solution is to use interaction weights. However, these weights may very well differ per product and per user, which makes them hard to estimate and even harder to justify statistically.

2.2.5 Correlated Cross-Occurrence


Table 2.1: Event counts

Event    A      not A
B        k11    k10
not B    k01    k00

and all secondary indicators using log-likelihood ratio tests. Secondary indicators can be anything that may describe a user's taste: past purchases, views, (dis)likes and add-to-cart actions, but also user features such as location and operating system. To assess which events and features are useful for prediction, the algorithm performs a log-likelihood test for each primary-secondary indicator pair per product. Its beauty is that it provides a statistically sound way to measure whether multiple types of data, such as location, operating system and browser software, possess predictive power. Hence, anything known about the population can be tested for correlation with the primary indicator and used to predict a user's preferences.

At the heart of the algorithm is the log-likelihood ratio (LLR), which computes a score for the co-occurrence of two events. To do so, one needs the number of times the two events have occurred together (k11), the number of times each has occurred without the other (k10, k01) and the number of observations with neither event (k00), as illustrated in table 2.1. One can then calculate the LLR with:

LLR = 2 · Σ(k) · (H(k) − H(rowSums(k)) − H(colSums(k)))

where H is Shannon's entropy:

H = Σ_ij (k_ij / Σ(k)) · log(k_ij / Σ(k))
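The formula above can be transcribed directly (a minimal sketch; the four counts follow table 2.1):

```python
import math

def _h(counts):
    """Sum of (k/N)*log(k/N) over the counts: the H used in the formula above."""
    total = sum(counts)
    return sum((k / total) * math.log(k / total) for k in counts if k > 0)

def llr(k11, k10, k01, k00):
    """Log-likelihood ratio score for the 2x2 event-count table of table 2.1."""
    k = [k11, k10, k01, k00]
    n = sum(k)
    h_k = _h(k)
    h_rows = _h([k11 + k10, k01 + k00])  # H(rowSums(k))
    h_cols = _h([k11 + k01, k10 + k00])  # H(colSums(k))
    return 2 * n * (h_k - h_rows - h_cols)
```

For two perfectly co-occurring events the score is large, and for statistically independent events it is zero, which is what makes it usable as a significance filter for indicator pairs.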


B is a matrix of the same users in rows and observed pageviews in columns, then

AᵀB

returns a matrix of all cross-occurrences, with conversion-ids in rows and pageviews in columns. Then, a log-likelihood ratio test is done for each non-zero element in the obtained matrix. Low confidence of correlation would in this case mean that the secondary indicator 'pageview' is not related to the primary indicator 'purchase' for a particular product, which is somewhat unlikely in real-world settings because one normally views a product before buying it. Note that matrix A can also be used to find out which conversions correlate with other conversions:

AᵀA

In a similar way one can test whether a pageview of product C is correlated with a purchase of product A, and whether a product is more likely to be bought by users that share a particular characteristic such as location or device brand. Essentially, the multiplication with the transposed matrix Aᵀ can be done for any indicator matrix. Since non-significant indicators are not used, predictive power never decreases from adding data. However, non-significant indicators need to be filtered out because they decrease the speed with which recommendations are returned, and slow websites were found to be a conversion (and revenue) killer⁴ back in 2009. The report estimated that for an e-commerce site making $100,000 per day, a 1 second page delay could potentially cost $2.5 million in lost sales every year. This effect is likely to be stronger now, since Google started using loading time as a ranking signal in 2010⁵.
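The matrix algebra above can be illustrated with a small toy example (the indicator matrices are invented; in practice they would be large and sparse):

```python
import numpy as np

# Toy indicator matrices for 4 users x 3 products.
# A[u, p] = 1 if user u bought product p (primary indicator).
A = np.array([[1, 0, 0],
              [1, 0, 1],
              [0, 1, 0],
              [1, 0, 0]])
# B[u, p] = 1 if user u viewed product p (secondary indicator).
B = np.array([[1, 1, 0],
              [1, 0, 1],
              [0, 1, 1],
              [1, 1, 0]])

# Cross-occurrence counts: purchases in rows, pageviews in columns.
cross = A.T @ B   # AᵀB
# Self cross-occurrence: which purchases co-occur with other purchases.
co = A.T @ A      # AᵀA
```

Each non-zero cell of `cross` would then be submitted to the log-likelihood ratio test described above to decide whether that view-purchase pair is significant.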

(Dis)Advantages The CCO algorithm takes away the problem of arbitrarily weighting interaction types and allows for statistical validity when using multiple interaction and user-feature types. Furthermore, item features can be used to bypass the cold-start problem found in pure CF engines. Depending on the amounts of data included and

⁴ https://blog.kissmetrics.com/loading-time/


3 Research design

The previous chapter has given insight into the workings of various recommender algorithms as well as their strengths and weaknesses. This chapter uses that knowledge to propose the optimal solution to answer the research question:

How can implicit session feedback be used to create intelligent e-commerce recommender systems?

To answer this question, the first part of this chapter focuses on the structure and contents of the data available for implementation. The previous chapter has shown that algorithm choice depends heavily on the data available, so after we have established which types of data are available, we can choose an algorithm that fits the data. The chapter then continues with data collection, the (offline) testing setup & validation, and the online test design.

3.1 Data types

Conversify stores data from their customers' visitors in cookies and sessions. For each unknown device, a unique id is generated, which works as an identifier for that device. This unique id is stored in a cookie. I assume that every unique device is a unique visitor. Note that this is not always the case: a visitor may browse on a laptop and end up buying something on a tablet or phone. There is however no way to link cross-device activities to a single visitor.


After instantiating a cookie with a unique id, we have a method to link sessions to one visitor and thus to each other: the user-id allows Conversify to track the behavior of their customers' visitors (on a specific device) over time. The unique id lives only as long as the cookie, which is set to expire after a year of inactivity. Cookies may also be deleted when the visitor chooses to do so. Session duration depends on activity as well: sessions expire after 30 minutes of inactivity, which is also the default duration that Google uses¹.

All expired sessions from previous days are stored in JSON format on Amazon Web Services (AWS) Simple Storage Service (S3). Session data includes (past) purchases, pageview information, system information and location information. Note that, due to user-hash storage in cookies, past purchases (and other information) cannot be linked to a visitor that (1) deleted his cookie, (2) returns after the cookie expiration date or (3) is active on multiple devices. In such cases, a new cookie and therefore a new user-hash (device-hash) will be generated.

Next to session information, Conversify has access to productfeeds of their customers' products, which can be matched with the products purchased or viewed. This means that buying interaction data, viewing interaction data, user features in terms of location and system, and item features from the productfeed are all available. Currently, the only use for implicit data (that is: view and buy behavior) is to keep track of the number of conversions and to improve their notification-system models. This data can however also be used for recommender engines, albeit with the shortcomings outlined in the previous paragraph. The other data source, the productfeed, is used for their other software products: intelligent search and content-based recommendations. Productfeed data contains item-feature information and can - as described in chapter 2 - be used for both content-based and hybrid models. In conclusion, we have implicit feedback in the form of (past) purchases and pageviews, user features in the form of location and system information, and item features in the form of productfeeds.

Retailer selection Not many Conversify customers make use of Conversify's current recommender system, which narrows the implementation options. For those that use


Table 3.1: Model comparison

Model / Data                           CF                  CB   Hybrid              CCO
Interaction data (# different types)   Yes (1 or weights)  No   Yes (1 or weights)  Yes (unlimited with LLR)
User features                          No                  No   Yes                 Yes (unlimited with LLR)
Item features                          No                  Yes  Yes                 Yes

their current system, a toy shop is the most suitable one because its collected data goes back further than that of any other recommender-using customer. An important thing to note is that the average number of sessions for a user that buys or views something is only 1.5. Thus, on a daily basis, their visitors are mostly new, which means that they have no historical data.

3.2 Model selection

Based on chapter 2, we can summarize the capabilities of the various algorithms in terms of data usage as shown in table 3.1. It tells us that the correlated cross-occurrence algorithm best fits the available data, because it allows for easy ingestion of various interaction sources and item and user features. It is therefore the system to be implemented.

Implementation The CCO algorithm has been implemented in Scala based on Apache PredictionIO, which itself is an open-source machine learning stack written in Java that in its most basic form relies on:

Elasticsearch A high availability real-time JSON full-text search engine that can be queried over HTTP.

Spark A cluster-computing framework, which essentially provides a faster implementation of the MapReduce algorithm once introduced by Google to allow fast calculations on big data.


Figure 3.1: PredictionIO architecture overview

Data can be stored in HBase and analyzed with log-likelihood ratio tests by Spark to compute which indicators are significant and which are not. The significant ones are stored in Elasticsearch, which essentially means that Elasticsearch stores a model that is available for querying. The CCO implementation for PredictionIO is called the Universal Recommender² and is well documented by its maintainers. An overview of its architecture can be found in figure 3.1. In short, it shows that the input for all data is the eventserver (in our case HBase), which can be used to store (real-time) user actions from a web or mobile app. Spark can access the data and use it for training. This is, however, not done in real time because - depending on the amount of data - retraining requires quite some computation. A common approach is retraining every night using a time-based job scheduler such as cron, but this again depends on the amount of data and how the data changes over time. After training, Spark stores its training outputs in a predictionserver (Elasticsearch), which in turn can be queried over HTTP by the app. Note that the arrow from HBase to Elasticsearch is labeled "realtime data". This is because actions are stored and compared to the model in real time, so that a user gets personalized results regardless of whether he or she was in the training set.
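Such a nightly retrain could, for example, be scheduled with a crontab entry along these lines (the path and time are placeholders; `pio train` and `pio deploy` are PredictionIO CLI commands):

```shell
# Retrain every night at 03:00 and redeploy the updated model (illustrative schedule)
0 3 * * * cd /path/to/engine && pio train && pio deploy
```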


Figure 3.2: Example of extracted view events

3.3 Data collection

To feed the recommender engine, we need data. I therefore created a Python program that connects to Conversify's AWS S3 repository and streams the contents of the session dumps. If a session contains interactions with products that are in the most recent version of the productfeed, the program extracts the user-hash, interactions & user features, and writes them to productviews-, buys- and user-information files that can be imported into HBase using PredictionIO. An example of such a file - in this case for views - can be found in figure 3.2. The other files (buys and user features) have a similar structure. The first key, "event", holds the value of an indicator. It can take values such as buy, view, device brand, device type and the other available indicators. The second key, "entityType", represents a user and is followed by a unique identifier, "entityId", which is the unique id stored in a cookie by Conversify. "targetEntityType" and "targetEntityId" hold the product-id of the item the user interacted with or - in case of a user feature - respectively the feature type (e.g. "device family") and its value (e.g. "mobile"). The last key, "eventTime", registers when the event took place.
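Reconstructed from the field description above (the identifier and timestamp values are invented for illustration), a single view event would look roughly like:

```json
{
  "event": "view",
  "entityType": "user",
  "entityId": "a1b2c3d4",
  "targetEntityType": "item",
  "targetEntityId": "item123",
  "eventTime": "2017-03-15T14:02:11.000Z"
}
```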


Table 3.2: Matching titles of bought products

Conversify stored title (substring)   Productfeed matching title(s)           Match
rubbabu mascot auto blauw             rubbabu mascot auto blauw 13 cm         YES
djeco krimpie - sieraden              djeco krimpie - sieraden het hertje,    NO
                                      djeco krimpie - sieraden het poesje

the available data is used to its full potential. 7.958 (old) urls have been analyzed using this method, of which 4.955 returned '404' and 2.643 had a redirect url that was found in the productfeed. The productfeed itself contains approximately 7.000 products, so over 37% of the products have been changed at some point in time.

For buys, similar problems existed. Buys, however, are not stored in sessions using a URL, but using the title of a product. In some cases employees extended these titles for search engine optimization or because they added more variants of a product. This resulted in products that were bought and stored under their old title in Conversify's repositories. For the products that were not matched to the productfeed initially, I tested whether the title as stored by Conversify was a substring of a product title in the productfeed. Because a substring can match multiple other strings when there are multiple variants, I only take into account the products for which exactly one match exists. In the case of one match, the title has been altered and the product can be included in the data. The examples in table 3.2 clarify this approach.
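The matching rule can be sketched as follows, using the examples from table 3.2:

```python
def match_by_substring(stored_title, feed_titles):
    """Return the productfeed title if the stored title matches exactly one entry,
    otherwise None (ambiguous or missing matches are excluded from the data)."""
    matches = [t for t in feed_titles if stored_title in t]
    return matches[0] if len(matches) == 1 else None

feed = [
    "rubbabu mascot auto blauw 13 cm",
    "djeco krimpie - sieraden het hertje",
    "djeco krimpie - sieraden het poesje",
]
```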


Table 3.3: Data collected

Event or feature   Count      Users    Values  Examples of possible values
Buy                40.615     12.960   4.139   item1, item2, item3
View               1.494.045  409.674  6.816   item4, item5, item6
Device type        408.579    408.579  4       Tablet, Desktop, Smartphone
Device brand       245.650    245.650  53      Apple, Samsung, Nokia
OS family          409.776    409.776  13      macOS, Windows, Android
Browser family     409.776    409.776  45      Chrome, Safari, Firefox
Client country     409.173    409.173  146     NL, UK, BE
Client state       311.959    311.959  1.025   Friesland, Antwerpen
System language    408.616    408.616  166     NL, EN, ES

3.4 Testing setups

The method proposed in the literature [3, 12] and used by the creators of PredictionIO is to use offline tests for validation purposes and online real-world measures, such as CTR or increased conversions, to determine actual performance and to tune models. Following this advice, I use both offline and online testing. Both types of testing will be done for two recommender types often seen in real-world applications: (1) 'People like you bought X' and (2) 'People like you viewed X'. In the first case, all log-likelihood ratio tests are done with 'buy' as the primary indicator and the other variables listed in table 3.3 as secondary indicators. For the second model, 'buy' will not be used: 'view' will be the primary indicator, with the remaining variables as secondary indicators.

3.4.1 Offline testing


To assess the predictive power of the indicators, the dataset is split into a training and a test set that make up 80% and 20% of all data, respectively. Then, for both models, each indicator is evaluated in terms of separate MAP@k and combined MAP@k. Separate tests show which indicators carry the most predictive power with regard to the primary indicator. Combined tests show which combinations of indicators lead to the best predictive performance, and essentially tell us which indicators should be included in the deployed models.
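For reference, one common definition of MAP@k is sketched below (implementations differ in the normalizing denominator; this sketch normalizes by min(|relevant|, k)):

```python
def ap_at_k(recommended, relevant, k):
    """Average precision at k for one user: precision at each hit, averaged."""
    hits, score = 0, 0.0
    for rank, item in enumerate(recommended[:k], start=1):
        if item in relevant:
            hits += 1
            score += hits / rank          # precision at this cut-off
    return score / min(len(relevant), k) if relevant else 0.0

def map_at_k(all_recommended, all_relevant, k):
    """Mean of the per-user average precisions (MAP@k)."""
    aps = [ap_at_k(r, rel, k) for r, rel in zip(all_recommended, all_relevant)]
    return sum(aps) / len(aps)
```

The measure rewards placing relevant items near the top of the recommendation list, which is exactly what matters for the small recommendation widgets used here.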

3.4.2 Online testing

Online testing will be done in the form of A/B/n testing with click-through rate (CTR) as the dependent variable. For each unique user-id, one of the available recommenders is randomly selected, which ensures that all other factors are kept equal during the experiment. Google Analytics allows easy comparison of the performance of the various models. The only item feature used is 'in stock', which keeps the recommender in sync with the productfeed, so that no out-of-stock products are shown by the recommenders.
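A common way to implement such per-user assignment is to hash the user-id into a bucket, so the same visitor always sees the same variant across sessions (the variant names here are placeholders, not the deployed model names):

```python
import hashlib

MODELS = ["variant-a", "variant-b", "variant-c"]  # hypothetical recommender variants

def assign_variant(user_id, variants=MODELS):
    """Deterministically map a user-id to one variant (stable across sessions)."""
    digest = hashlib.md5(user_id.encode("utf-8")).hexdigest()
    return variants[int(digest, 16) % len(variants)]
```

Because the assignment depends only on the user-id, no extra state needs to be stored to keep a visitor in the same experimental group.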


4 Evaluation

The previous chapters have made clear that both offline and online evaluation are necessary to deploy a good model. This chapter is therefore divided into two parts: offline and online evaluation. The second part builds on the first, because the offline evaluation forms the basis for the models deployed online.

4.1 Offline evaluation

Table 4.1 shows the mean average precision for 5 and 10 products, respectively (MAP@5 & MAP@10), for each of the indicators separately. The odd rows represent figures for 'People like you bought X', the even rows figures for 'People like you viewed X'. The table shows that 'view' contains the most predictive power for both types of recommenders. This makes sense, because a view by nature indicates interest in a particular product to a much greater extent than any of the other indicators.

Table 4.2 assesses the added value of the indicators when combined with the primary indicator. This allows one to find out whether predictive performance increases when an indicator other than 'view' or 'buy' is included. The numbers in bold indicate the primary indicators; these are the same numbers as shown in table 4.1.

Table 4.2 shows that for the recommender with primary indicator 'buy', adding 'view' nearly doubles predictive performance for both MAP@5 and MAP@10. All other indicators except 'type' (device type) worsen performance. The effect of 'type' is assessed by running a separate MAP test with the indicators 'buy', 'view' and 'type' combined. Adding 'type' showed no increase in predictive performance (0.0254)


Table 4.1: MAP@K, separate events

event    Buy     View    Country  State   Lang    Browser  OS      Type    Brand
MAP@5    0.0107  0.1912  0.0012   0.0052  0.0016  0.0009   0.0003  0.0003  0.0013
MAP@5            0.0228  0.0004   0.0013  0.0003  0.0003   0.0003  0.0007  0.0003
MAP@10   0.0127  0.2011  0.0014   0.0055  0.0019  0.0011   0.0005  0.0004  0.0017
MAP@10           0.0266  0.0004   0.0014  0.0003  0.0003   0.0004  0.0009  0.0003

Table 4.2: MAP@K combined with primary indicator

event    Buy     View    Country  State   Lang    Browser  OS      Type    Brand
MAP@5    0.0107  0.0212  0.0097   0.0086  0.0106  0.0101   0.0098  0.0123  0.0106
MAP@5            0.0228  0.0187   0.0198  0.0198  0.0192   0.0204  0.0215  0.0219
MAP@10   0.0127  0.0259  0.0117   0.0102  0.0122  0.0119   0.0116  0.0139  0.0125
MAP@10           0.0266  0.0212   0.0222  0.0221  0.0218   0.0228  0.0239  0.0249

compared to 'buy' and 'view' combined (0.0259), which means that no indicators other than 'buy' and 'view' should be included for the 'People like you bought X' recommender.

For the recommender with primary indicator 'view', no other indicator improves performance. 'View' alone (0.0266) performs better than 'view' combined with any of the other indicators. Therefore, only 'view' is used to create the 'People like you viewed X' recommender. This essentially means that this recommender is a pure collaborative filtering algorithm based on patterns in product-views.

4.2 Online evaluation


Table 4.3: Comparison of click-through rates (CTR)

#  Type           Based on     Impressions  Clicks  CTR   Baseline diff
0  Content-based  Description  1557         113     7.3%  N/A


5 Discussion

This chapter describes the insights gained from this thesis. The research question was:

How can implicit session feedback be used to create intelligent e-commerce recommender systems?

Throughout this thesis, I have assessed the value of implicit feedback for recommendation systems in both offline and online situations. I will use this distinction in this chapter as well, starting with a discussion of the offline results.

5.1 Offline tests

For this particular research, none of the implicit feedback indicators except 'view' and 'buy' were found to increase predictive performance in terms of MAP@5 and MAP@10. This essentially means that no preferences were found based on non-behavioral characteristics such as location and system. As reported in chapter 2, other companies have found differences in customer preferences based on system and location information and actively use this information to tailor recommendations. The difference between my findings and theirs could be due to the fact that cross-device tracking was not possible and each device was assumed to be a unique user, which - given the recent increase in multi-device browsing behavior - might not be appropriate. Another explanation might lie in the limited amount of data used in my research.



5.2 Online tests

In terms of click-through rate, Conversify's content-based model performs better than both the correlated cross-occurrence model based on 'buy' & 'view' and the collaborative filtering model based on 'view'. This means that recommending similar products based on description evokes more responses than the models based on implicit feedback. The placement of the recommenders may explain why this is the case: a product-detail page visit indicates interest in the visited (and therefore similar) item(s), which by nature gives the content-based recommender an edge over the other two. A dedicated "recommendations" page or a placement on the homepage might elicit more responses for both personalized models. Unfortunately, the online store used for this research is too small to examine recommendation effects in terms of financial gain.

5.3 General remarks

Because this research is narrowed down to a single store with a low share of repeat customers, the results should not be generalized to online stores with many repeat customers. For such shops - of which the prime example is Amazon - implicit feedback has been shown to be extremely helpful for personalization.


6 Conclusion and Future Work

This thesis aimed to find a way to unleash the potential of implicit feedback for online personalized recommendations. To do so, I have assessed how recommender systems can benefit both stores & consumers and studied the workings of multiple algorithms.

Contributions of this thesis Three recommendation models have been put to the test in a real-world setting, which resulted in the content-based model being declared the winner because it best elicits responses from shoppers. An overview of contributions can be found below:

• Behavioral indicators such as ’buy’ and ’view’ contain predictive power when used for recommendations. System and location indicators do not.

• On product-detailpages, content-based recommenders outperform personalized models.

Online stores may use this information to optimize their bounce rates and to help consumers find interesting products faster, thereby increasing conversion rates and thus revenue. Conversify can use this information to assist online stores in choosing and deploying the right type of recommender at the right place.

6.1 Future Work

Other research could focus on how the performance of various recommender systems depends on their on-site placement. Another, more interesting way to build upon this


[1] Charu C. Aggarwal. Recommender Systems: The Textbook. Springer Publishing Company, Incorporated, 1st edition, 2016. ISBN 3319296574, 9783319296579.

[2] Icek Ajzen. From Intentions to Actions: A Theory of Planned Behavior, pages 11–39. Springer Berlin Heidelberg, Berlin, Heidelberg, 1985. ISBN 978-3-642-69746-3. doi: 10.1007/978-3-642-69746-3_2. URL http://dx.doi.org/10.1007/978-3-642-69746-3_2.

[3] Joeran Beel, Marcel Genzmehr, Stefan Langer, Andreas Nürnberger, and Bela Gipp. A comparative analysis of offline and online evaluations and discussion of research paper recommender system evaluation. In Proceedings of the international workshop on reproducibility and replication in recommender systems evaluation, pages 7–14. ACM, 2013.

[4] S. Brin and L. Page. The anatomy of a large-scale hypertextual web search engine. In Seventh International World-Wide Web Conference (WWW 1998), 1998. URL http://ilpubs.stanford.edu:8090/361/.

[5] Erik Brynjolfsson, Yu (Jeffrey) Hu, and Duncan Simester. Goodbye pareto principle, hello long tail: The effect of search costs on the concentration of product sales. Management Science, 57(8):1373–1386, 2011. doi: 10.1287/mnsc.1110.1371. URL http://dx.doi.org/10.1287/mnsc.1110.1371.

[6] Patrick Y. K. Chau and Candy K. Y. Ho. Developing consumer-based service brand equity via the internet: The role of personalization and trialability. Journal of Organizational Computing and Electronic Commerce, 18(3):197–223, 2008. doi: 10.1080/10919390802198956. URL http://dx.doi.org/10.1080/10919390802198956.

[7] Fred D. Davis. Perceived usefulness, perceived ease of use, and user acceptance of information technology. MIS Q., 13(3):319–340, September 1989. ISSN 0276-7783. doi: 10.2307/249008. URL http://dx.doi.org/10.2307/249008. [8] Elasticsearch. elasticsearch/elasticsearch, 2015. URL https://github.com/

(49)

BIBLIOGRAPHY 42 [9] Daniel Fleder and Kartik Hosanagar. Blockbuster culture’s next rise or fall: The impact of recommender systems on sales diversity. Management Science, 55(5): 697–712, 2009. doi: 10.1287/mnsc.1080.0974. URL http://pubsonline. informs.org/doi/abs/10.1287/mnsc.1080.0974.

[10] Forrester. Forrester forecasts us online retail to top $500b by 2020, . URL https://www.forrester.com/Forrester+Forecasts+US+ Online+Retail+To+Top+500B+By+2020/-/E-PRE9146.

[11] Forrester. Forrester: Us cross-channel retail sales to reach $1.8 trillion by 2017, . URL https://www.forrester.com/Forrester+US+ CrossChannel+Retail+Sales+To+Reach+18+Trillion+By+ 2017/-/E-PRE6324.

[12] Florent Garcin, Boi Faltings, Olivier Donatsch, Ayar Alazzawi, Christophe Bruttin, and Amr Huber. Offline and online evaluation of news recommender systems at swissinfo.ch. In Proceedings of the 8th ACM Conference on Recommender Systems, RecSys ’14, pages 169–176, New York, NY, USA, 2014. ACM. ISBN 978-1-4503-2668-1. doi: 10.1145/2645710.2645745. URL http://doi.acm. org/10.1145/2645710.2645745.

[13] GFK. Nederlanders shoppen in 2015 voor 16,07 miljard online, . URL http://www.gfk.com/nl/insights/press-release/ nederlanders-shoppen-in-2015-voor-eur-1607-miljard-online/. [14] GFK. Online bestedingen aan producten groeien met 22%,

. URL http://www.gfk.com/nl/insights/news/ online-bestedingen-aan-producten-groeien-met-22/.

[15] Michele Gorgoglione, Umberto Panniello, and Alexander Tuzhilin. The effect of context-aware recommendations on customer purchasing behavior and trust. In Proceedings of the Fifth ACM Conference on Recommender Systems, RecSys ’11, pages 85–92, New York, NY, USA, 2011. ACM. ISBN

(50)

[16] Jrg Gottschlich, Irina Heimbach, and Oliver Hinz. The value of users’ facebook profile data - generating product recommendations for online social shopping sites. In ECIS, page 117, 2013. URL http://dblp.uni-trier.de/db/conf/ ecis/ecis2013.html#GottschlichHH13.

[17] Shuk Ying Ho, David Bodoff, and Kar Yan Tam. Timing of adaptive web person-alization and its effects on online consumer behavior. Info. Sys. Research, 22(3): 660–679, September 2011. ISSN 1526-5536. doi: 10.1287/isre.1090.0262. URL http://dx.doi.org/10.1287/isre.1090.0262.

[18] Kartik Hosanagar. recommended for you: How well does personalized marketing work? URL http://knowledge.wharton.upenn.edu/article/

recommended-for-you-how-well-does-personalized-marketing-work/. [19] Kartik Hosanagar, Ramayya Krishnan, and Liye Ma. Recomended for you: The

im-pact of profit incentives on the relevance of online recommendations. In ICIS, page 31. Association for Information Systems, 2008. URL http://dblp. uni-trier.de/db/conf/icis/icis2008.html#HosanagarKM08. [20] Yifan Hu, Yehuda Koren, and Chris Volinsky. Collaborative filtering for implicit

feedback datasets. In Proceedings of the 2008 Eighth IEEE International Confer-ence on Data Mining, ICDM ’08, pages 263–272, Washington, DC, USA, 2008. IEEE Computer Society. ISBN 978-0-7695-3502-9. doi: 10.1109/ICDM.2008.22. URL http://dx.doi.org/10.1109/ICDM.2008.22.

[21] Gerald Hubl and Valerie Trifts. Consumer decision making in online shopping environments: The effects of interactive decision aids. Marketing Science, 19(1): 4–21, 2000. doi: 10.1287/mksc.19.1.4.15178. URL http://dx.doi.org/10. 1287/mksc.19.1.4.15178.

[22] Eric J Johnson and John W Payne. Effort and accuracy in choice. Management science, 31(4):395–414, 1985.

(51)

0018-BIBLIOGRAPHY 44 9162. doi: 10.1109/MC.2009.263. URL http://dx.doi.org/10.1109/ MC.2009.263.

[24] Maciej Kula. Metadata embeddings for user and item cold-start recommenda-tions. CoRR, abs/1507.08439, 2015. URL http://arxiv.org/abs/1507. 08439.

[25] Nanda Kumar and Izak Benbasat. Research note: The influence of recommendations and consumer reviews on evaluations of websites. Information Systems Research, 17 (4):425–439, 2006. URL http://dblp.uni-trier.de/db/journals/ isr/isr17.html#KumarB06.

[26] Juhnyoung Lee, Mark Podlaseck, Edith Schonberg, and Robert Hoch. Visualization and Analysis of Clickstream Data of Online Stores for Understanding Web Merchan-dising, pages 59–84. Springer US, Boston, MA, 2001. ISBN 978-1-4615-1627-9. doi: 10.1007/978-1-4615-1627-9 4. URL http://dx.doi.org/10.1007/ 978-1-4615-1627-9_4.

[27] Young-Jin Lee, Kartik Hosanagar, and Yong Tan. Do i follow my friends or the crowd? information cascades in online movie ratings. Management Science, 61(9): 2241–2258, 2015. doi: 10.1287/mnsc.2014.2082. URL http://dx.doi.org/ 10.1287/mnsc.2014.2082.

[28] Jure Leskovec, Anand Rajaraman, and Jeffrey David Ullman. Mining of Massive Datasets. Cambridge University Press, New York, NY, USA, 2nd edition, 2014. ISBN 1107077230, 9781107077232.

[29] Seth Siyuan Li and Elena Karahanna. Online recommendation systems in a b2c e-commerce context: A review and future directions. J. AIS, 16(2): 2, 2015. URL http://dblp.uni-trier.de/db/journals/jais/ jais16.html#LiK15.

(52)

[31] Greg Linden, Brent Smith, and Jeremy York. Amazon.com recommendations: Item-to-item collaborative filtering. IEEE Internet Computing, 7(1):76–80, January 2003. ISSN 1089-7801. doi: 10.1109/MIC.2003.1167344. URL http://dx. doi.org/10.1109/MIC.2003.1167344.

[32] Sabrina Lombardi, Sarabjot Singh Anand, and M Gorgoglione. Context and customer behaviour in recommendation. 2009.

[33] Julian McAuley, Christopher Targett, Qinfeng Shi, and Anton van den Hengel. Image-based recommendations on styles and substitutes. In Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR ’15, pages 43–52, New York, NY, USA, 2015. ACM. ISBN 978-1-4503-3621-5. doi: 10.1145/2766462.2767755. URL http://doi.acm.org/ 10.1145/2766462.2767755.

[34] McKinsey & Company. How retailers can keep up with consumers. URL http: //www.mckinsey.com/industries/retail/our-insights/

how-retailers-can-keep-up-with-consumers.

[35] Gal Oestreicher-Singer and Arun Sundararajan. The visible hand of peer networks in electronic markets. Management Science, 58(11):1963–1981, 2012.

[36] Sung-Hyuk Park, Soon-Young Huh, Wonseok Oh, and Sang Pi Han. A social network-based inference model for validating customer profile data. MIS Q., 36(4): 1217–1237, December 2012. ISSN 0276-7783. URL http://dl.acm.org/

citation.cfm?id=2481674.2481685.

[37] John W. Payne, James R. Bettman, and David A. Schkade. Measuring constructed preferences: Towards a building code. Journal of Risk and Uncertainty, 19(1): 243–270, 1999. ISSN 1573-0476. doi: 10.1023/A:1007843931054. URL http: //dx.doi.org/10.1023/A:1007843931054.
