Contextual Analysis of Reviews for Recommendation and Helpfulness

Quiryn Kasper Johannes Otten

Student number: 10089039
Date of final version: July 14, 2017
Master's programme: Econometrics
Specialization: Big Data in Business Analytics
Supervisor: Dr. M. Mazloom
Second reader: Dr. J.C.M. van Ophem


Statement of Originality

This document is written by student Quiryn Otten who declares to take full responsibility for the contents of this document. I declare that the text and the work presented in this document is original and that no sources other than those mentioned in the text and its references have been used in creating it. The Faculty of Economics and Business is responsible solely for the supervision of completion of the work, not for the contents.


Contents

1 Introduction
2 Related Work
  2.1 Recommender Systems
    2.1.1 Content Based Filtering
    2.1.2 Collaborative Filtering
  2.2 Helpfulness Prediction
3 Method
  3.1 Recommender System
    3.1.1 Problem Statement
    3.1.2 Factorization Machines
  3.2 Helpfulness Prediction
    3.2.1 Support Vector Machine regression
4 Experimental Setup
  4.1 Data Description
  4.2 Implementation details
    4.2.1 Feature Extraction
    4.2.2 Parameter adjustment
  4.3 Evaluation Metrics
  4.4 Experiments
5 Results
  5.1 Experiment 1: Recommender System
  5.2 Experiment 2: Helpfulness prediction, General items
  5.3 Experiment 3: Helpfulness prediction, Individual items
  5.4 Experiment 4: Helpful review Characteristics
6 Conclusion & Future Work
References

1 Introduction

Today we have access to a vast amount of social media data. Every day new data is created by people posting pictures, writing tweets and filming events. In the past this data was not available, since pictures and films were stored in analogue form, but because of the rise of social media all this data has become accessible. Social media websites and apps like Facebook, Instagram and Youtube become more and more part of our lives, where millions of people share their thoughts and opinions. Facebook is a social network service with over 1.94 billion^1 monthly active users. Instagram is a photo-sharing application and service with 700 million users^2, where users are able to share pictures and videos publicly or privately. Youtube is a video-sharing platform with over a billion users^3 who upload, view, rate, share and comment on videos. All these posts, created by the users themselves, contain lots of information. Nowadays we are able to extract and use this hidden information with social media analytics, whether the information is in a picture, video or text.

^1 https://investor.fb.com/
^2 https://instagram-press.com/
^3 http://www.youtube.com/yt/press

The rise of social media has made it possible for one person to talk to thousands of people about products and the companies that provide them. Mangold and Faulds [27] state that consumer-to-consumer communication has become easier, which gives consumers more and more information about products and companies. This stands in contrast to the traditional integrated marketing communications paradigm, in which a high degree of control is present. Therefore, firms must learn to shape consumer discussions in a manner that is consistent with the organization's mission and performance goals. With social media analytics, firms are able to retrieve the opinions of thousands of customers, which they can use for their marketing and sales or to improve their products.



Castillo et al. [6] show that social media capability enables firms to effectively balance exploration and exploitation of knowledge, which in turn facilitates product innovation. Mazloom et al. [28] created a model to predict the popularity of brand-related social media posts. With this model, brands are able to understand what drives post popularity in general as well as isolate the brand-specific drivers. This helps generate more likes for their posts and thus more publicity for the brand.

Another way to improve the customer's experience with a firm is to help them find the products they need. With recommender systems in online applications and e-commerce services, firms can recommend items to users that suit them. A recommender system is an algorithm that suggests relevant items to users based on the user's behaviour. These systems can recommend books, movies, restaurants or even partners on an online dating platform. Many big companies such as Youtube and Netflix have adopted recommendation techniques in their systems to estimate the potential preferences of customers and recommend relevant items to the user. These recommendations are a big part of their product: for example, the recommender system of Youtube is responsible for 60% of the clicks [9], and Netflix estimates that personalized recommendations save them over 1 billion dollars per year [14].

In this thesis we focus on the contextual analysis of reviews for recommendation and review helpfulness. There is already a lot of research on using textual features in recommender systems. In [1, 20, 21, 32], Word2Vec, Latent Dirichlet Allocation (LDA) or textual sentiment analysis is used as a textual feature for recommendation. In [7, 22], textual features like Word2Vec and linguistic category features extracted from textual content are used to predict the helpfulness of reviews. However, no past work has used a combination of Word2Vec, LDA and sentiment to find out whether these features complement each other in item recommendation or review helpfulness prediction. That is why the main question we try to answer in this thesis is: Can a combination of textual features improve recommendation systems and helpfulness prediction for Amazon product reviews?


Amazon.com is an online web shop selling many different products, like books, clothing, electronics and groceries. We build a collaborative filtering recommender system for clothing products using the reviews written by the customers. By using Word2Vec, LDA and the sentiment as features, we try to find the best textual features for recommendation. We also use these textual features to build a Support Vector Machine (SVM) regression model that predicts the helpfulness of a review, for clothing products in general and for specific individual products. This model can be used to give insight into the characteristics behind helpful and unhelpful reviews.

We found that the Word2Vec feature returns the best model improvement, while the sentiment features give the worst model improvement for both models. However, the Word2Vec, LDA and sentiment features all improve the performance of recommendation systems. We also found that the three textual features complement each other when used together, in both item recommendation and review helpfulness prediction. Finally, we showed that the general helpfulness prediction model for clothing products can be specialized for one specific item, which means that companies can build item-specific helpfulness prediction models.

The structure of this thesis is as follows. First we discuss previous work on recommendation systems in section 2. Next we describe the methods used in section 3. In section 4 we discuss the data and the performed experiments, followed by the results in section 5. In the last section we discuss the conclusion of this thesis.

2 Related Work

At the moment recommender systems are a hot topic. A recommender system is an algorithm that suggests relevant items to users based on the user's behaviour. These systems can recommend books, movies, restaurants or even partners on an online dating platform. A recommender system can be built using many different types of parameters. Gavalas and Kenteris [12], for example, use the location of the user, the current weather and the time to recommend tourist guides, while Liu et al. [23] use the click behaviour of users on a website to recommend news articles. Hofmann [16] shows that when you only have data on users and items you can create latent variables to optimize the recommendation system. Latent variables are variables that are not directly observed but are created using other observed variables. These latent factors can indirectly reflect a user's opinion by observing his or her behaviour, including purchase history, browsing history, search patterns, or even mouse movements [19]. In this section we first discuss different methods used in recommender systems today, followed by the text mining techniques used in these systems.

2.1 Recommender Systems

At the moment there are two leading techniques used in recommender systems: Content Based Filtering and Collaborative Filtering.

2.1.1 Content Based Filtering

Content Based Filtering recommender systems try to recommend items similar to those a user has liked in the past [26]. They are based on information about the content of an item, such as the genre of a book or an actor in a movie. Content Based Filtering uses the ratings of a user to profile the user's interests based on the item features. The system recommends items with characteristics similar to those of items the user liked in the past. Although Content Based Filtering recommenders are losing popularity, they are still used today.

Tkalčič et al. [48] created a content-based recommender system for images. It is based on an emotion detection technique that takes as input video sequences of the user's facial expressions. It extracts Gabor low-level features from the video frames and employs a k-nearest-neighbours machine learning technique to generate affective labels. Yin et al. [51] propose a Location Content Aware Recommender System (LCARS) that offers a particular user a set of venues or events, giving consideration to both personal interest and local preference. LCARS consists of two components: offline modelling and online recommendation. The offline modelling part, called LCA-LDA, is designed to learn the interest of each individual user and the local preference of each individual city by capturing item co-occurrence patterns and exploiting item contents. The online recommendation part automatically combines the learned interest of the querying user and the local preference of the querying city to produce the top-k recommendations.

Content Based Filtering recommender systems are based on individual information and thus ignore contributions from other users [2]. Systems that do use other users' contributions are Collaborative Filtering recommender systems, which are discussed in the next paragraph.

2.1.2 Collaborative Filtering

The second leading technique used in recommender systems is Collaborative Filtering. Collaborative Filtering is based on the core assumption that users who have expressed similar interests in the past will share common interests in the future [13, 43]. This can be accomplished by searching for relations between users or items, by calculating the similarity between these users or items. Collaborative Filtering can be split into two different techniques: Memory Based Collaborative Filtering and Model Based Collaborative Filtering.

The first method, Memory Based Collaborative Filtering, is divided between the user-based method and the item-based method. The user-based method makes recommendations based on the interests of users who have similar tastes. The similarities between users are calculated using a similarity metric; usually the Pearson correlation or the cosine similarity is used [44]. Next the k nearest neighbours, the k users with the highest similarity to a given user, are selected and used to predict a rating for a specific item. This is usually done by calculating an average or a weighted average over the k nearest neighbours, but can also be done with fine-grained neighbour-weighting factors to improve the model performance [15]. The item that gets the highest predicted rating is recommended to the specific user. The item-based method works the same as the user-based method, but instead of the similarities between users, similarities between items are used [2].
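As a concrete illustration of the user-based method, here is a minimal sketch in Python; the rating matrix R (with zeros for missing ratings) and the function name predict_rating are hypothetical, not taken from the referenced papers.

```python
import numpy as np

def predict_rating(R, user, item, k=5):
    """Predict R[user, item] from the k most similar users who rated the item."""
    norms = np.linalg.norm(R, axis=1) + 1e-9
    sims = (R @ R[user]) / (norms * norms[user])  # cosine similarity to `user`
    rated = np.where(R[:, item] > 0)[0]           # users who rated the item
    rated = rated[rated != user]                  # exclude the user themself
    if len(rated) == 0:
        return 0.0                                # cold start: no neighbours
    top = rated[np.argsort(sims[rated])[-k:]]     # k nearest neighbours
    return np.average(R[top, item], weights=sims[top] + 1e-9)
```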

The second method is Model Based Collaborative Filtering, which primarily uses the User-Item rating matrix to create a model for recommendation [10]. This model is used to create rating predictions for items given a specific user. There are many different Model Based Collaborative Filtering models: for example, Breese et al. [5] created a Bayesian network model which models the conditional probability between items, while Hofmann [16] created the latent semantic model which creates clusters for users and items with latent classes. One of the most successful Model Based Collaborative Filtering models today is the Matrix Factorization model [46]. In its basic form, Matrix Factorization uses latent variables to extract relations between a user and an item inferred from rating patterns. High correspondence between item and user factors leads to a recommendation.

These latent factor methods have become more and more popular in recent years by combining good scalability with predictive accuracy [19]. It is the modelling of the underlying interactions between users and items that makes Matrix Factorization more effective compared to Content Based Filtering [4]. In addition, these methods offer flexibility in modelling various real-life situations, like the challenges of processing large datasets and very sparse User-Item matrices [42].

One of the advantages of Collaborative Filtering is that there is no need for item or user characteristics; however, issues arise in the so-called cold start problem. The cold start problem takes place when there is not enough data to make a recommendation. It occurs when a new item or user is added and there is no data to model on. In the case of the cold start problem, Collaborative Filtering is at a disadvantage, since Content Based Filtering only needs the item characteristics in the case of a new item [31].

In this thesis we focus on textual features in recommender systems. Text mining can be defined as the application of algorithms and methods from the machine learning field to texts. Here the goal is to find useful patterns in text to use in models; basically, it is a method to transform text into a vector of information that a model can use. Many researchers use information extraction methods like natural language processing or some simple preprocessing steps in order to extract data from texts [17, 33].

In the case of a recommender system, text mining could be used to improve the performance. Loh et al. [25] created a recommender system that helps travel agents to find options where to go and what to do for customers. The recommender uses textual data extracted from a chat between the agent and the customer. Using a fuzzy reasoning about cues found in a text they calculated the likelihood of a theme or subject being present in that text. This vector is, in turn, used in the system to produce a recommendation. Ozsoy [36] ap-plied Word2Vec to their recommendation domain to recommend items to users. Word2Vec is an algorithm, created by Mikolov et al. [30], that transforms words into a vector space using a two-layer neural network. First the weights are ran-domly distributed and followed by training using the Skip-gram methodology. At each step the weights are updated using the Stochastic Gradient Decent. After training, the vector space can be obtained by extracting the weights of the neural network.
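The following sketch shows this training setup with the gensim library, where sg=1 selects the skip-gram method; the two tiny tokenized reviews are placeholders for a real corpus.

```python
from gensim.models import Word2Vec

# Two placeholder tokenized reviews; real training would use a large corpus.
sentences = [["great", "shoes", "fit", "well"],
             ["shirt", "looks", "nice", "but", "shrinks"]]

# Skip-gram (sg=1) Word2Vec with 300-dimensional vectors, trained by SGD.
model = Word2Vec(sentences, vector_size=300, sg=1, min_count=1)
vector = model.wv["shoes"]  # the learned 300-dimensional word vector
```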


The sentiment behind a text can also improve a recommender. Krishna et al. [21] used the sentiment of a text in their system. They proposed a Learning Automata-Based Sentiment Analysis system (LASA) that recommends places near the current location of the user by analysing feedback text about those places. They calculated a sentiment score based on the user's feedback text to improve their model. Sentiment analysis is basically an algorithm that transforms the sentiment in a text into variables. Usually this sentiment is expressed as two numbers, which represent the positive and negative opinion towards the subject [34].

Another way to retrieve information from text for recommendation is Latent Dirichlet Allocation (LDA). LDA is an unsupervised topic model that applies hierarchical Bayesian modelling to grouped data. Wang and Blei [50] developed a system that recommends scientific articles to users of an online community. Their approach combines the merits of collaborative filtering and probabilistic topic modelling with LDA. They showed that this approach works well and makes good predictions on unrated articles.

So the textual features Word2Vec, LDA and Sentiment are common features in recommender systems. However, these textual features have never been used together in one recommender system at the same time. We put all three features in a recommender system using Factorization Machines in order to find out whether these textual features complement each other. The Factorization Machines model, which is a collaborative filtering technique, is explained in section 3.1.2.

2.2 Helpfulness Prediction

Most websites use the helpfulness score of a review to show users the most helpful reviews first and the unhelpful reviews last. These helpfulness scores are manual votes made by other users of the website. With this method it is possible that a new helpful review gets sorted last, because no one has read it yet and thus no one has cast a helpfulness vote. The problem is that users will not vote on this helpful review in the future either, because it is still sorted last. This leads to a vicious circle in which the helpful review stays unnoticed. By making use of a helpfulness prediction model this issue gets resolved.

Several models have already been created to predict the helpfulness of a review. Most models make use of structural features, lexical features and metadata features. Structural features are observations of the document structure and formatting. Lexical features capture the words observed in the reviews, like unigrams and bigrams. The metadata feature is usually the star rating of the review. Kim et al. [18] built an SVM regression to predict the helpfulness of Amazon reviews and concluded that the most useful features include the length of the review, its unigrams, and its product rating. Liu et al. [24] conclude that the reviewer's expertise, the writing style of the review, and the timeliness of the review are the most important factors. Krishnamoorthy [22] states that the sentiment is the most important parameter in predicting the helpfulness. Ngo-Ye and Sinha [35] find that both review text and reviewer engagement characteristics help predict review helpfulness; their hybrid approach of combining the textual features of a bag-of-words model with RFM dimensions produces the best prediction results.

These models achieve a high prediction accuracy. However, none of the previous experiments make use of textual features like Word2Vec. We hypothesize that the Word2Vec feature can improve the model because of the additional contextual information it contains. That is why we create an SVM regression model using Word2Vec features in order to predict the helpfulness of a review.

3 Method

In this section we discuss the different machine learning techniques used in this research. We create two different types of models: a recommendation system and a helpfulness prediction model, visualized in figure 1. First we explain the Factorization Machines model used to recommend items to users. Secondly we explain the Support Vector Machine (SVM) regression model used to predict the helpfulness of reviews.

3.1 Recommender System

3.1.1 Problem Statement

The goal of a recommendation system is to find the item $i \in I$ for user $u \in U$ that gives the highest utility/rating. In order to create this system we need to construct a rating function $r$ that uses the user, item and context to predict the corresponding rating, $r : U \times I \times C \to \mathbb{R}$. Once the rating function is constructed, the recommender returns the item that gives the highest predicted rating. We can write this problem statement as follows:

$$i^* = \arg\max_{i \in I} \hat{r}(u, i, c) \tag{1}$$

In other words: the recommender system returns the item $i$ that gives the highest predicted rating given a user $u$ and context $c$. In this thesis we use Factorization Machines to construct the rating prediction function $\hat{r}(u, i, c)$ in equation 1.

3.1.2 Factorization Machines

Factorization Machines is a model class that combines factorization models with Support Vector Machines. It can estimate interactions between variables because it breaks the independence of the interaction parameters by factorizing them. Factorization Machines combine the generality of feature engineering with the superiority of factorization models in estimating interactions between categorical variables of a large domain.


Figure 1: We use two types of Machine Learning methods: Factorization Machines for item recommendation and SVM for review helpfulness prediction.

This means that the data of one interaction also helps to estimate the parameters of related interactions. This way the algorithm is able to estimate interactions even in problems with huge sparsity [39]. Sparsity occurs when the data contains a lot of zeros, for example in data situations like recommender systems (the User-Item matrix) and bag-of-words approaches.

In figure 2 the feature matrix of the first experiment is presented. The user and item features are represented as binary indicator variables (dummy variables) that mark the unique user and item of each observation. In this way each unique item and user is represented by a column. These two groups of binary variables create a very sparse matrix, since for each observation only 2 columns (one user and one item) have the value one while the others are zero. The context $c$ in equation 1 is represented by the Word2Vec ($w_i$), Sentiment ($s$) and LDA ($l_i$)

features in figure 2. The creation of these features is explained in section 4.2. The Factorization Machines regression model is close to a linear model and depends on a linear number of parameters. The model allows for parameter combinations up to degree $d$. In the case of a Factorization Machine model of degree $d = 2$ we can write the regression equation as follows:

$$\hat{r}(x) := w_0 + \sum_{i=1}^{n} w_i x_i + \sum_{i=1}^{n} \sum_{j=i+1}^{n} \langle v_i, v_j \rangle x_i x_j \tag{2}$$


Figure 2: The feature matrix used for Factorization Machines. Source: Rendle 2012 [40].

Here $x$ is the sparse feature vector of an observation, whose entries $x_i$ encode the user, item and context information. The dot product of two vectors of size $k$ is represented by $\langle \cdot, \cdot \rangle$ and can be written as $\langle v_i, v_j \rangle := \sum_{f=1}^{k} v_{i,f} \cdot v_{j,f}$. The model parameters that have to be optimized in equation 2 are:

$$w_0 \in \mathbb{R}, \quad w \in \mathbb{R}^n, \quad V \in \mathbb{R}^{n \times k}$$

where $w_0$ is the global bias, $w_i$ represents the strength of the $i$-th variable and $\hat{w}_{i,j} := \langle v_i, v_j \rangle$ models the interaction between the $i$-th and $j$-th variable. The vectors $v_i$ and $v_j$ are the latent variables in this model, since they are not observed directly.
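A minimal numpy sketch of the degree-2 prediction in equation 2; the function name is hypothetical, and the pairwise sum uses the O(nk) reformulation from Rendle [39] rather than the naive double loop.

```python
import numpy as np

def fm_predict(x, w0, w, V):
    """Degree-2 Factorization Machine prediction (equation 2).

    x: (n,) feature vector; w0: global bias; w: (n,) linear weights;
    V: (n, k) latent factors, so <v_i, v_j> models pairwise interactions.
    """
    # sum_{i<j} <v_i, v_j> x_i x_j
    #   = 0.5 * sum_f [(sum_i v_{i,f} x_i)^2 - sum_i v_{i,f}^2 x_i^2]
    s = V.T @ x                    # (k,)
    s2 = (V ** 2).T @ (x ** 2)     # (k,)
    return w0 + w @ x + 0.5 * np.sum(s ** 2 - s2)
```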

In sparse settings, like the User-Item matrix in recommendation, there is usually not enough data to estimate correlations between variables directly and independently. However, by factorizing the parameters, the Factorization Machines model breaks the independence of the interaction parameters. By doing this, Factorization Machines can estimate the correlation between parameters even in sparse settings. This means that the data of one user can help to estimate the parameters of another user. To make this more intuitive we give an example using the data shown in figure 2.


Suppose we want to estimate the interaction between user 1 ($u_1$) and item 4 ($i_4$) for predicting the rating $r$. There is no observation where both variables $u_1$ and $i_4$ are non-zero, so a direct estimate would lead to no interaction between $v_{u_1}$ and $v_{i_4}$ ($w_{u_1,i_4} = 0$). By using the factorized parameters $\langle v_{u_1}, v_{i_4} \rangle$ we can estimate this interaction. Because user 2 ($u_2$) and user 3 ($u_3$) have similar interactions with item 3, the latent vectors $v_{u_2}$ and $v_{u_3}$ will be similar too; in other words, $\langle v_{u_2}, v_{i_3} \rangle$ and $\langle v_{u_3}, v_{i_3} \rangle$ need to be similar as well. Moreover, user 1 ($v_{u_1}$) has a different factor vector from user 3 ($v_{u_3}$) because he or she has different interactions with the factors of item 1 and item 3. The factor vectors of item 4 and item 3 have to be similar, since user 2 has similar interactions for both items in the target $r$. This means that the interaction of the factor vectors of user 1 and item 4 will be similar to that of user 1 and item 3.

This illustrates the concept behind the 2-way Factorization Machines described above. We can transform this 2-way model into a higher-order d-way model, written in equation 3, which we use in our recommender system.

$$\hat{r}(x) := w_0 + \sum_{i=1}^{n} w_i x_i + \sum_{l=2}^{d} \sum_{i_1=1}^{n} \cdots \sum_{i_l = i_{l-1}+1}^{n} \left( \prod_{j=1}^{l} x_{i_j} \right) \left( \sum_{f=1}^{k_l} \prod_{j=1}^{l} v^{(l)}_{i_j,f} \right) \tag{3}$$

Here the model parameters are:

$$w_0 \in \mathbb{R}, \quad w \in \mathbb{R}^n, \quad \forall l \in \{2, \dots, d\} : V^{(l)} \in \mathbb{R}^{p \times k_l}$$

The function $\hat{r}(x)$ in equation 3 can be applied as a regression to construct a predictor. We use this function in the first experiment to construct an item rating prediction given a user. This predictor $\hat{r}(x)$ can then be used to optimize the problem statement in equation 1.

3.2 Helpfulness Prediction

In order to predict the helpfulness of a review using textual features we use Support Vector Machine (SVM) regression. SVMs became popular for solving many machine learning problems [3]. In this section we discuss the basics behind the SVM regression problem.

3.2.1 Support Vector Machine regression

In a simple linear regression we try to minimize the following regularized error function:

$$\frac{1}{2} \sum_{n=1}^{N} \{y_n - t_n\}^2 + \frac{\lambda}{2} \|w\|^2 \tag{4}$$

To obtain a sparse solution we replace the quadratic error by an $\epsilon$-insensitive error function [49]. This function returns zero when the absolute difference between the prediction $y(x)$ and the target $t$ is less than $\epsilon > 0$. We can write this $\epsilon$-insensitive error function as follows:

$$E_\epsilon(y(x) - t) = \begin{cases} 0 & \text{if } |y(x) - t| < \epsilon \\ |y(x) - t| - \epsilon & \text{otherwise} \end{cases}$$

When we substitute $E_\epsilon(y(x) - t)$ into equation 4 we get the following optimization function:

$$C \sum_{n=1}^{N} E_\epsilon(y(x_n) - t_n) + \frac{1}{2} \|w\|^2 \tag{5}$$

where $y(x) = w^T \phi(x) + b$. The (inverse) regularization parameter, denoted by $C$, appears in front of the error term. We can transform the optimization problem using two slack variables $\xi_n \geq 0$ and $\hat{\xi}_n \geq 0$. Here $\xi_n$ stands for a point where $t_n > y(x_n) + \epsilon$ and $\hat{\xi}_n$ stands for a point for which $t_n < y(x_n) - \epsilon$. These slack variables are illustrated in figure 3.

The condition $y_n - \epsilon \leq t_n \leq y_n + \epsilon$, where $y_n = y(x_n)$, means that the target points lie inside the $\epsilon$-tube. By introducing the slack variables we allow points to lie outside the tube. The corresponding conditions are $t_n \leq y(x_n) + \epsilon + \xi_n$ and $t_n \geq y(x_n) - \epsilon - \hat{\xi}_n$.


Figure 3: Slack variables ($\hat{\xi}$ and $\xi$) allow points to lie outside the $\epsilon$-tube (red area). Source: Bishop 2006 [3].

Using these conditions, the support vector regression error function can be written as follows:

$$C \sum_{n=1}^{N} (\xi_n + \hat{\xi}_n) + \frac{1}{2} \|w\|^2 \tag{6}$$

By introducing Lagrange multipliers $a_n \geq 0$, $\hat{a}_n \geq 0$, $\mu_n \geq 0$ and $\hat{\mu}_n \geq 0$ we can minimize this function. The Lagrangian becomes:

$$L = C \sum_{n=1}^{N} (\xi_n + \hat{\xi}_n) + \frac{1}{2} \|w\|^2 - \sum_{n=1}^{N} (\mu_n \xi_n + \hat{\mu}_n \hat{\xi}_n) - \sum_{n=1}^{N} a_n (\epsilon + \xi_n + y_n - t_n) - \sum_{n=1}^{N} \hat{a}_n (\epsilon + \hat{\xi}_n - y_n + t_n) \tag{7}$$

Setting the corresponding derivatives with respect to $w$, $b$, $\xi_n$ and $\hat{\xi}_n$ equal to zero gives:

$$\frac{\partial L}{\partial w} = 0 \;\Rightarrow\; w = \sum_{n=1}^{N} (a_n - \hat{a}_n) \phi(x_n)$$

$$\frac{\partial L}{\partial b} = 0 \;\Rightarrow\; \sum_{n=1}^{N} (a_n - \hat{a}_n) = 0$$

$$\frac{\partial L}{\partial \xi_n} = 0 \;\Rightarrow\; a_n + \mu_n = C$$

$$\frac{\partial L}{\partial \hat{\xi}_n} = 0 \;\Rightarrow\; \hat{a}_n + \hat{\mu}_n = C$$

If we substitute these conditions into the Lagrangian, it becomes:

$$\tilde{L}(a, \hat{a}) = -\frac{1}{2} \sum_{n=1}^{N} \sum_{m=1}^{N} (a_n - \hat{a}_n)(a_m - \hat{a}_m) k(x_n, x_m) - \epsilon \sum_{n=1}^{N} (a_n + \hat{a}_n) + \sum_{n=1}^{N} (a_n - \hat{a}_n) t_n \tag{8}$$

We want to maximize this Lagrangian with respect to $\{a_n\}$ and $\{\hat{a}_n\}$, where the kernel is $k(x, x') = \phi(x)^T \phi(x')$. Using the derivative conditions and $y(x) = w^T \phi(x) + b$ we see that predictions for new inputs can be made using

$$y(x) = \sum_{n=1}^{N} (a_n - \hat{a}_n) k(x, x_n) + b \tag{9}$$

Using the corresponding Karush-Kuhn-Tucker (KKT) conditions [3], which state that at the solution the product of the dual variables and the constraints must vanish, we can obtain:

$$b = t_n - \epsilon - w^T \phi(x_n) = t_n - \epsilon - \sum_{m=1}^{N} (a_m - \hat{a}_m) k(x_n, x_m) \tag{10}$$

where $w$ can be obtained from the derivatives of equation 7. Using the values of $w$ and $b$ we are able to construct a support vector machine regression.
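In practice such a regression need not be derived by hand. A minimal sketch with scikit-learn's SVR, whose C and epsilon parameters correspond to those of equations 5 and 6, using random placeholder data instead of the real review features:

```python
import numpy as np
from sklearn.svm import SVR

# Random stand-ins for the review feature vectors and helpfulness targets.
rng = np.random.default_rng(0)
X, y = rng.normal(size=(200, 10)), rng.normal(size=200)

# C is the (inverse) regularization parameter of equation 5; epsilon is the
# half-width of the insensitive tube illustrated in figure 3.
svr = SVR(kernel="rbf", C=1.0, epsilon=0.1).fit(X, y)
predictions = svr.predict(X[:5])
```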

4 Experimental Setup

In this section we discuss the experiments performed in this thesis. First we describe the dataset and the implementation details, followed by an explanation of the evaluation metrics and the performed experiments.

4.1 Data Description

The data consists of reviews and products crawled from the web shop Amazon.com. Amazon.com is an online web shop selling many different products, like books, clothing, electronics and groceries. The company has 15 different separate retail web shops and was, in 2015, the most valuable retailer in the United States by market capitalization.

The dataset contains product reviews and metadata from May 1996 till July 2014 of the United States retail web shop. In this thesis we focus on the Clothing, Shoes and Jewelry products. In total there are 1,503,384 products and 5,748,920 corresponding reviews. To deal with the cold start problem in the recommender system experiment, we only use users who have written at least 5 reviews. This restriction narrows the dataset down to 1,184,427 reviews and 534,847 products for the recommender system. For the helpfulness prediction model we only use reviews with at least 5 helpfulness votes, in order to predict a trustworthy helpfulness. This selection results in a set of 122,975 reviews. In all experiments we split the dataset into a training set (80%) to create the model and a test set (20%) to measure the performance of the models.
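A sketch of this data selection with pandas, assuming the reviews are available as a JSON-lines file with reviewerID, asin and helpful ([positive votes, total votes]) fields; the file name follows the public Amazon review dumps and may differ locally.

```python
import pandas as pd

reviews = pd.read_json("reviews_Clothing_Shoes_and_Jewelry.json", lines=True)

# Recommender system: keep users with at least 5 reviews (cold start).
counts = reviews.groupby("reviewerID")["asin"].transform("count")
rec_data = reviews[counts >= 5]

# Helpfulness prediction: keep reviews with at least 5 helpfulness votes.
help_data = reviews[reviews["helpful"].str[1] >= 5]

# 80%/20% train/test split.
train = rec_data.sample(frac=0.8, random_state=1)
test = rec_data.drop(train.index)
```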

The dataset contains the following important parameters:

• ReviewerID is a unique value for each unique customer.

• Asin is a variable which gives a value for each unique product.

• Helpful is a list of two values, where the first value represents the number of positive helpful votes and the second value represents the total number of votes. The helpful votes are made by other unique customers who read the review.

• ReviewText is the text written by the reviewer about the product.

• Overall is the star rating made by the reviewer. The star rating is a value between 1 and 5, where 1 is the lowest rating and 5 the highest rating.

• Categories is a list of categories the product belongs to.

Figure 4: A dataset example of the reviews with the corresponding products used in experiments 3 and 4 for the individual models.

In figure 4 we show an example of six reviews with their corresponding products. These products are used in experiments 3 and 4 for the review helpfulness prediction and for visualizing helpfulness characteristics.

4.2 Implementation details

4.2.1 Feature Extraction

To represent the review text we create Word2Vec, LDA and Sentiment features. These features are created by the following procedures.

The review text is preprocessed first in order to create the LDA textual feature. We start by removing stop words, like "for" or "the", since these words do not give us extra information about the topic. Next the reviews are stemmed. Stemming is a common NLP technique to reduce similar words to a common stem: for example, the words "walk", "walking" and "walked" are all transformed into "walk". This is important for topic modelling because the LDA model would otherwise treat these words differently. After preprocessing the review texts we run the LDA model. This results in a vector $\{l_1, l_2, \dots, l_T\}$, where $l_i = 1$ when the review is assigned to topic $i$ and zero otherwise, and $T$ is the total number of topics trained.
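A minimal sketch of this LDA pipeline with gensim and NLTK's Porter stemmer; the two example reviews, the small stop-word set and the variable names are placeholders, and T is the number of topics.

```python
from gensim import corpora, models
from nltk.stem import PorterStemmer

STOP = {"for", "the", "a", "is", "it", "and", "but", "are"}
stem = PorterStemmer().stem
review_texts = ["the shoes are great for walking",
                "it is a nice shirt but it shrinks"]

# Remove stop words and stem the remaining words.
docs = [[stem(w) for w in text.lower().split() if w not in STOP]
        for text in review_texts]

dictionary = corpora.Dictionary(docs)
bow = [dictionary.doc2bow(d) for d in docs]
T = 20
lda = models.LdaModel(bow, num_topics=T, id2word=dictionary)

def lda_feature(doc_bow):
    """One-hot vector {l_1, ..., l_T}: l_i = 1 for the most probable topic."""
    vec = [0] * T
    vec[max(lda[doc_bow], key=lambda p: p[1])[0]] = 1
    return vec
```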

To generate the Word2Vec feature we use a neural network model pre-trained on the Google News dataset (about 100 billion words). This model has 300-dimensional vectors for 3 million words and phrases; the phrases were obtained using a simple data-driven approach described in [29]. The Word2Vec feature vector in figure 2 is created by applying the model to all words in the review and using these word vectors to calculate an average vector of size $\{1 \times 300\}$.
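A sketch of this averaging step with gensim, assuming the pre-trained Google News vectors have been downloaded under their usual distribution file name:

```python
import numpy as np
from gensim.models import KeyedVectors

w2v = KeyedVectors.load_word2vec_format(
    "GoogleNews-vectors-negative300.bin", binary=True)

def review_vector(text):
    """Average the 300-dimensional vectors of a review's in-vocabulary words."""
    vecs = [w2v[w] for w in text.split() if w in w2v]
    return np.mean(vecs, axis=0) if vecs else np.zeros(300)
```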

The SentiStrength package [47] is used to extract the Sentiment features. SentiStrength is based on a lexicon of 2310 sentiment words and word stems obtained from the Linguistic Inquiry and Word Count (LIWC) program [37], the General Inquirer list of sentiment terms [45] and ad-hoc additions made during testing, particularly for new CMC words. All words in a document are used as input to construct an output of a positive score (1 to 5) and a negative score (-1 to -5). The model gives each word or stem in the dictionary a positive or negative score. The scores of the words are human-assigned and based on a corpus from the social network site Myspace [47].
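SentiStrength itself is a separate tool, but the idea can be illustrated with a toy version: a tiny hand-made lexicon of word strengths, returning a positive score in [1, 5] and a negative score in [-5, -1] as described above. This is only a stand-in, not the SentiStrength implementation.

```python
# Toy lexicon: word -> strength, positive (1..5) or negative (-5..-1).
LEXICON = {"love": 4, "great": 3, "good": 2, "bad": -2, "awful": -4}

def sentiment_scores(text):
    """Return (positive, negative) strengths, defaulting to the neutral (1, -1)."""
    scores = [LEXICON.get(w, 0) for w in text.lower().split()]
    pos = max([s for s in scores if s > 0], default=1)
    neg = min([s for s in scores if s < 0], default=-1)
    return pos, neg
```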

4.2.2 Parameter adjustment

For the recommender system we use 3 different optimization techniques for the Factorization Machines model, namely Stochastic Gradient Descent (SGD), Alternating Least Squares (ALS) and Markov Chain Monte Carlo (MCMC). SGD is very popular for optimizing factorization models as it is simple, works well with different loss functions and has low computational and storage complexity [39, 52]. SGD iterates over the cases of the training data and performs small steps in the direction of a smaller loss using the derivative. ALS minimizes the regression with L2 regularization: it optimizes one parameter $\theta$ given all remaining parameters $\Theta \setminus \{\theta\}$ [8, 41]. The MCMC optimization technique is a Bayesian inference technique that generates the distribution of $\hat{y}$ by Gibbs sampling [11].

4.3 Evaluation Metrics

We use different evaluation metrics for the recommendation experiment and the helpfulness prediction. In this section we explain which metrics are used and how they are calculated.

Recommender System: After creating the recommender systems, the Root Mean Squared Error (RMSE) is used as the evaluation metric. The RMSE, calculated on the test data set (20%), represents the sample standard deviation of the differences between predicted values $\hat{y}_k$ and observed values $y_k$. Suppose the size of the test sample is $K$; the RMSE can then be expressed as:

$$RMSE = \sqrt{\frac{\sum_{k=1}^{K} (\hat{y}_k - y_k)^2}{K}} \tag{11}$$


A lower RMSE means that the predicted parameter is closer to the ground truth.

Helpfulness Prediction: The Spearman's rank correlation coefficient is used to represent the performance of the helpfulness prediction models. The Spearman's rank correlation is calculated as follows:

$$r_s = \rho_{rg_X, rg_Y} = \frac{\operatorname{cov}(rg_X, rg_Y)}{\sigma_{rg_X} \sigma_{rg_Y}} \tag{12}$$

Here $\rho$ denotes the Pearson correlation coefficient applied to the rank variables, $\operatorname{cov}(rg_X, rg_Y)$ is the covariance of the rank variables and $\sigma_{rg_X}$ and $\sigma_{rg_Y}$ are the standard deviations of the rank variables. The Spearman's rank correlation is calculated on the test dataset (20%). The correlation is a value between -1 and +1, where a value of -1 or +1 means a perfect correlation and 0 means no correlation at all.
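Both metrics take one line each to compute on the test split; a sketch with numpy and scipy, using small placeholder arrays for the observed and predicted values:

```python
import numpy as np
from scipy.stats import spearmanr

y_true = np.array([3, 5, 4, 1])          # observed values (placeholders)
y_pred = np.array([2.8, 4.6, 3.9, 1.5])  # predicted values (placeholders)

rmse = np.sqrt(np.mean((y_pred - y_true) ** 2))  # equation 11
rho, _ = spearmanr(y_pred, y_true)               # equation 12
```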

4.4 Experiments

Experiment 1: Recommender System. The first experiment investigates whether a recommender system can be improved by using textual features. We create a Factorization Machines collaborative recommender system using the LIBFM package [40]. In order to deal with the cold start problem we only use the users with 5 ratings or more. First we construct a baseline using only the user and item features to predict the rating. This baseline is used to measure the improvement of the system when the textual features are added. An example of the final dataset is illustrated in figure 2.

We perform a k-fold cross validation on the dataset, where k = 5. The baseline is created by comparing the optimization techniques SGD, ALS and MCMC for different numbers of iterations, learning rates and numbers of factors. First, we find the optimal number of iterations by running the Factorization Machines with between 25 and 1000 iterations for the three optimization techniques SGD, ALS and MCMC. Secondly, we use the optimal number of iterations found in the previous step to find the optimal learning rate; the model is calculated for different learning rates (lr), where lr = [0.001, 0.005, 0.01, 0.05, 0.1]. We do not apply this step to MCMC, since this technique does not use a learning rate. The last step is to find the optimal number of factors (f) for all optimization techniques, where f = [5, 10, 15, 20, 25, 30, 40, 50, 75, 100]. The combination of the number of iterations, learning rate, number of factors and optimization technique that returns the lowest RMSE is used as the baseline. The optimal number of topics t for LDA is found by running the Factorization Machines with LDA features for t = [10, 20, 30, 45, 60, 75, 100]; the number of topics that is used is based on the model with the lowest RMSE. By adding the textual features Word2Vec, LDA and Sentiment separately to the baseline, we investigate which textual feature improves the recommendation the most, by comparing the RMSE. We also investigate the performance of the system when we use all textual features together. This way we try to see whether the features complement each other in item recommendation.

Experiment 2: Helpfulness prediction, General items. In the second experiment we predict the helpfulness using an SVM regression model. By creating multiple SVM models with different features we can compare the importance of the features to each other. We apply a 5-fold cross validation for three different kernels, namely the linear, polynomial and radial basis function (RBF) kernels, and for different cost parameters C, where C = [0.01, 0.1, 1, 10]. We use the model with the best performing kernel in combination with the best cost parameter C.
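A sketch of this kernel and cost search with scikit-learn's GridSearchCV, which performs the 5-fold cross validation internally; the random X and y only stand in for the real review features and helpfulness scores.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X, y = rng.normal(size=(100, 10)), rng.normal(size=100)

grid = GridSearchCV(SVR(),
                    param_grid={"kernel": ["linear", "poly", "rbf"],
                                "C": [0.01, 0.1, 1, 10]},
                    cv=5)
grid.fit(X, y)
best_svr = grid.best_estimator_  # best kernel/C combination
```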

The SVM regression is performed on the Word2Vec, Rating and Sentiment features to predict the helpfulness of a review. We include the review rating in the regression because Kim et al. [18] show that this parameter is one of the most important in predicting review helpfulness. In order to predict a trustworthy helpfulness we only use the reviews which have at least 5 helpfulness votes.

The regression is applied to the features separately and to a combination of all features. The Spearman's rank correlation coefficient, shown in equation 12, is used to represent the performance of the SVM model.

Experiment 3: Helpfulness prediction, Individual items. While experiment 2 focuses on the general helpfulness prediction for all clothes, this experiment focuses on particular items. We apply the same SVM regression approach as in experiment 2 to the three different items shown in figure 4. The average Spearman's rank correlation of these three models is returned as the performance measure. By doing this we try to find out whether a model can perform better when only the data of a specific product is used. This way we specialize the model for one single product, which may lead to better performance.

Experiment 4: Helpful review characteristics. In the last experiment we focus on the characteristics of helpful and unhelpful reviews. We investigate which words are frequently used in helpful and in unhelpful reviews. Companies can use these characteristics in their product descriptions to give information about their product. We construct wordclouds by first sorting the reviews based on the helpful score, which is the number of positive votes divided by the total number of votes. Next we use the top 20 reviews as a sample of the helpful reviews and the bottom-ranked reviews as the unhelpful ones. This experiment is applied to all products and to the individual products shown in figure 4. We also try to find differences in the length of the reviews.
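A sketch of this construction, reusing the hypothetical reviews DataFrame from the section 4.1 sketch (with reviewText and helpful columns) and the wordcloud package:

```python
from wordcloud import WordCloud

# Helpful score: positive votes divided by total votes.
reviews["score"] = reviews["helpful"].str[0] / reviews["helpful"].str[1]
ranked = reviews.sort_values("score", ascending=False)

# Top 20 reviews as the helpful sample, bottom-ranked 20 as the unhelpful one.
WordCloud().generate(" ".join(ranked.head(20)["reviewText"])).to_file("helpful.png")
WordCloud().generate(" ".join(ranked.tail(20)["reviewText"])).to_file("unhelpful.png")
```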

5 Results

In this section we discuss the results of the experiments. First we examine the improvements made by the textual features over the baseline of the collaborative recommender system. Second we discuss the prediction of review helpfulness for clothing in general and for individual items. The results of the helpful review characteristics experiment are discussed last.

5.1 Experiment 1: Recommender System

Figure 5 shows the construction of the baseline, where only the user and item features are used. In 5a the results for SGD, ALS and MCMC are shown for different numbers of iterations, while the learning rate and the number of factors are fixed. In 5b the effect of changing the learning rate is shown, using the optimal number of iterations found in 5a and a fixed number of factors. In 5c the optimal number of iterations and learning rate found in 5a and 5b are used, while the number of factors varies.

Figure 5a shows that 1000 iterations give the best results for SGD, ALS and MCMC. Using 1000 iterations in figure 5b, we find that a learning rate of 0.001 gives the lowest RMSE for both SGD and ALS. In figure 5c we observe that none of the optimization techniques shows large improvements when the number of factors increases. This is a consequence of the sparse matrix shown in figure 2: there is not enough data in this situation to estimate complex interactions between parameters, and restricting the number of factors, and thus the expressiveness of the Factorization Machine model, leads to a better result [39]. The baseline is the lowest RMSE returned in figure 5; it has an RMSE of 1.07 and is created by using the MCMC technique with 1000 iterations.

In table 1 the results of the collaborative recommender system are presented for the different textual features. Here the RMSE is used to represent the model accuracy.


Figure 5: Recommendation performance in terms of RMSE using the user and item features: (a) number of iterations, (b) learning rate, (c) number of factors.

Figure 6: The RMSE for item recommendation with the LDA feature for different numbers of topics, using MCMC with 1000 iterations.


Table 1: The RMSE of the recommender system with different textual features.

User  Item  LDA  W2V  Sentiment  RMSE  Baseline improvement
Yes   Yes   -    -    -          1.07  -
Yes   Yes   Yes  -    -          0.92  14%
Yes   Yes   -    Yes  -          0.87  19%
Yes   Yes   -    -    Yes        0.98  8%
Yes   Yes   Yes  Yes  Yes        0.82  23%

In figure 6 we observe that t = 20 topics in the LDA model returns the lowest RMSE of 0.92. Adding this optimal LDA feature, with t = 20, to the baseline results in a model improvement of 14%. The Word2Vec neural network returns a lower RMSE of 0.87, which is equal to an improvement of 19% compared to the baseline. The reason LDA performs worse than Word2Vec could be the use of small textual documents (reviews): LDA gives better results when large documents are used [38]. The sentiment features return an RMSE of 0.98, which is the smallest model improvement at 8%. So all textual features improve the model, with Word2Vec returning the highest improvement. The lowest RMSE is achieved by using all textual features: the combination of LDA, Word2Vec and Sentiment gives an RMSE of 0.82, equal to a model improvement of 23% compared to the baseline. From this we conclude that the textual features complement each other in recommending items to users.

5.2 Experiment 2: Helpfulness prediction, General items

In table 2 the results of the 5-fold cross-validated L2-regularized L2-loss support vector regression for predicting the review helpfulness of clothing items are presented. We report the Spearman's rank correlation for the used features.


Table 2: Spearman's rank correlation for helpfulness prediction for clothing reviews.

Rating  Word2Vec  Sentiment  Correlation
Yes     -         -          0.29
-       Yes       -          0.27
-       -         Yes        0.01
Yes     Yes       -          0.39
Yes     Yes       Yes        0.41

First we observe that the textual Word2Vec feature gives a Spearman's rank correlation of 0.27, while the Sentiment feature only results in a correlation of 0.01: the Word2Vec feature outperforms the sentiment feature. The effectiveness of Word2Vec could be explained by the fact that the review texts are short, so the Word2Vec feature gives the SVM regression more information than the sentiment does. The result suggests that the context of a review is more important than the sentiment of the review itself for predicting helpfulness. The rating feature returns the highest correlation of 0.29. This result suggests that the rating of the review has a big influence on the helpfulness of the review itself, a conclusion that is confirmed by [18].

The highest correlation (0.41) is found when we use all features together. This suggests that the Word2Vec, Sentiment and Rating features complement each other. However, if we only use the Word2Vec and Rating features we get a correlation of 0.39, which is only 5% lower than when all features are used. So, compared to the Sentiment feature, the Word2Vec and Rating features are responsible for 95% of the useful information for predicting review helpfulness.

5.3 Experiment 3: Helpfulness prediction, Individual items

In this experiment we perform the same procedure as in experiment 2, but instead of using all clothing data we create three individual models for the three products shown in figure 4. By doing this we try to see whether the prediction model can be specialized for one single product.

Table 3: Spearman's rank correlation for helpfulness prediction for one specific product.

Rating  Word2Vec  Sentiment  Correlation
Yes     -         -          0.23
-       Yes       -          0.34
-       -         Yes        0.03
Yes     Yes       -          0.38
Yes     Yes       Yes        0.41

In table 3 we find the Spearman's correlation results for the SVM regression using different features. The regression with the Word2Vec feature has a correlation of 0.34, while the Sentiment feature only gives a correlation of 0.03. This suggests that the contextual information in the Word2Vec feature outperforms the Sentiment in predicting the helpfulness. The rating of the review as a feature returns a correlation of 0.23, which indicates that the star rating is important for the helpfulness. The highest correlation is reached when all features are combined; in this situation the features complement each other in the regression. The Word2Vec and Rating features help the prediction the most, since the regression with these features returns a correlation of 0.38, which is only 7% less than the regression with all features.

The results of experiment 2 in table 2 and the results of experiment 3 in table 3 are quite similar: they return the same highest correlation when all features are used. The model with only the Word2Vec features performs significantly better on the individual dataset than on the full clothing dataset. This suggests that the model can give more meaning to a review's context for an individual product than for all products. This sounds intuitively correct, since when only one product is used, the model can focus on the characteristics of one single product instead of the characteristics of all items.

5.4 Experiment 4: Helpful review Characteristics

In figure 7 the wordclouds of helpful and unhelpful reviews are shown. When we compare the wordclouds in figure 7 we observe that helpful reviews contain more objective terms like "size", "shoe" and "time", while the unhelpful reviews contain more subjective terms, for example "like", "good" and "look". It is also notable that helpful reviews are significantly longer than unhelpful reviews: the helpful reviews have an average length of 52 words, while the unhelpful reviews have an average of 9 words. This suggests that the helpful reviews probably contain more helpful information, leading to a higher helpfulness. This corresponds to the conclusion in [18] that the length of the review is an important parameter in predicting the helpfulness.

Figure 7: Wordclouds of helpful (green) and unhelpful (red) reviews for all items. The helpful and unhelpful reviews have an average length of 52 and 9 words respectively.


Figure 8: Wordclouds of helpful (green) and unhelpful (red) reviews of individual products.

The wordclouds of individual products are shown in figure 8. We see that for almost all wordclouds the most used word is the product itself: "shoe" and "shirt". When we look at the other words, we see in the helpful reviews more objective words like "women" and "run" (women's running shoes), "wolf" and "howl" (the picture on the shirt) and "Nike" and "2000" (brand and price). If we look at the unhelpful reviews, we see more subjective words like "like", "will", "review" and "look". We conclude that helpful reviews use more objective words and describe the product more compared to unhelpful reviews.

6 Conclusion & Future Work

The goal of this thesis is to analyse the context of product reviews from amazon.com for recommendation and helpfulness. We investigated the impact of a combination of textual features on item recommendation and review helpfulness prediction. Previous research used a variety of textual features for item recommendation and helpfulness prediction; however, it had never been examined whether a combination of these features results in better performance. The main question of this thesis is: Can a combination of textual features improve recommendation systems and helpfulness prediction for Amazon product reviews?

Section 5 shows the results of Factorization Machines for item recommendation making use of the textual features Word2Vec, LDA and Sentiment. We conclude that all textual features give the recommender system extra information, resulting in better performance. The Word2Vec feature shows the largest improvement and the Sentiment feature the smallest. Because of the relatively short texts of reviews, the LDA feature scores worse than the Word2Vec feature; however, LDA still returns a better result than the sentiment features. The best recommendation is created when all textual features are used. So the Word2Vec, LDA and Sentiment features complement each other in item recommendation.

When using SVM regression for predicting the helpfulness of a review, the Word2Vec and Rating features are the most important parameters. The Word2Vec feature performs better when the model is created for an individual item than when a general prediction is made for all products, because the model can specialize in the case of individual items. The sentiment still has a significant effect on predicting the helpfulness; however, this effect is minimal compared to the Rating and Word2Vec features. The helpfulness prediction model returns the same highest correlation when applied to all items and to individual items. We also found that longer reviews with more objective terms receive a higher helpfulness score than short reviews with subjective terms.

The main contribution of this thesis is that the textual features Word2Vec, LDA and Sentiment complement each other in clothing recommendation using Factorization Machines. We were also able to predict the review helpfulness in general and for individual items. Using this model, firms are able to understand what drives the helpfulness of the reviews of their products. This way firms can find out which descriptions they can use to give more helpful information about their products. With the prediction model it is also possible to give a helpfulness score to reviews that have not been read yet.

The helpfulness prediction model returns a high correlation with the ground truth; however, the model could perhaps be improved by adding more features. One could examine whether the model returns a higher prediction accuracy when more features are added. Another suggestion for future research is to examine whether the SVM regression gives the same results when other products, like Electronics or Groceries, are used in the dataset. This way the differences in helpful review characteristics between Clothing, Electronics and/or Grocery products can be examined.


References

[1] Deepak Agarwal and Bee-Chung Chen. "fLDA: matrix factorization through latent dirichlet allocation". In: Proceedings of the third ACM international conference on Web search and data mining. ACM. 2010, pp. 91–100.

[2] Tejal Arekar, R.S. Sonar, and N.J. Uke. "A Survey on Recommendation System". In: International Journal of Innovative Research in Advanced Engineering 2.1 (2015).

[3] Christopher M Bishop. Pattern recognition and machine learning. Springer, 2006.

[4] Dheeraj K Bokde, Sheetal Girase, and Debajyoti Mukhopadhyay. "Role of Matrix Factorization Model in Collaborative Filtering Algorithm: A Survey". In: IJAFRC 1.6 (2014), pp. 993–1022.

[5] John S. Breese, David Heckerman, and Carl Myers Kadie. "Empirical analysis of predictive algorithms for collaborative filtering". In: Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence (1998), pp. 43–52.

[6] Ana Castillo et al. "Introducing Analytics Talent in the Equation on Business Value of Social Media Capability: An Empirical Investigation". In: (2017).

[7] Jie Chen, Chunxia Zhang, and Zhendong Niu. "Identifying Helpful Online Reviews with Word Embedding Features". In: International Conference on Knowledge Science, Engineering and Management. Springer. 2016, pp. 123–133.

[8] P Comon, X Luciani, and A. L. F. de Almeida. "Tensor decompositions, alternating least squares and other tales". In: Journal of Chemometrics 23.7-8 (2009).

[9] James Davidson et al. "The Youtube Video Recommendation System". In: Proceedings of the fourth ACM conference on Recommender systems (2010), pp. 293–296.

[10] Michael D. Ekstrand, John T. Riedl, and Joseph A. Konstan. "Collaborative Filtering Recommender Systems". In: Foundations and Trends in Human-Computer Interaction 4.2 (Feb. 2011), pp. 81–173. issn: 1551-3955. doi: 10.1561/1100000009. url: http://dx.doi.org/10.1561/1100000009.

[11] Christoph Freudenthaler, Lars Schmidt-Thieme, and Steffen Rendle. "Bayesian factorization machines". In: (2011).

[12] Damianos Gavalas and Michael Kenteris. "A web-based pervasive recommendation system for mobile tourist guides". In: Personal and Ubiquitous Computing 15.7 (2011), pp. 759–770.

[13] David Goldberg et al. "Using collaborative filtering to weave an Information Tapestry". In: Communications of the ACM 35.12 (1992), pp. 61–70.

[14] Carlos A. Gomez-Uribe and Neil Hunt. "The Netflix Recommender System: Algorithms, Business Value, and Innovation". In: ACM Transactions on Management Information Systems (TMIS) 6.4 (2016).

[15] Jonathan L. Herlocker, Joseph A. Konstan, and John Riedl. "Explaining collaborative filtering recommendations". In: Proceedings of the 2000 ACM Conference on Computer Supported Cooperative Work (CSCW'00) (2000), pp. 241–250.

[16] Thomas Hofmann. "Latent semantic models for collaborative filtering". In: TOIS 22.1 (2004), pp. 89–115.

[17] Andreas Hotho, Andreas Nürnberger, and Gerhard Paaß. "A brief survey of text mining." In: Ldv Forum. Vol. 20. 1. 2005, pp. 19–62.

[18] Soo-Min Kim et al. "Automatically assessing review helpfulness". In: Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics. 2006, pp. 423–430.

[19] Yehuda Koren, Robert Bell, and Chris Volinsky. "Matrix Factorization Techniques for Recommender Systems". In: Computer 42.8 (2009), pp. 30–37.

[20] Ralf Krestel, Peter Fankhauser, and Wolfgang Nejdl. "Latent dirichlet allocation for tag recommendation". In: Proceedings of the third ACM conference on Recommender systems. ACM. 2009, pp. 61–68.

[21] P Venkata Krishna et al. "Learning automata based sentiment analysis for recommender system on cloud". In: Computer, Information and Telecommunication Systems (CITS), 2013 International Conference on. IEEE. 2013, pp. 1–5.

[22] Srikumar Krishnamoorthy. "Linguistic features for review helpfulness prediction". In: Expert Systems with Applications 42.7 (2015), pp. 3751–3759.

[23] Jiahui Liu, Elin Pedersen, and Peter Dolan. "Personalized News Recommendation Based on Click Behavior". In: 2010 International Conference on Intelligent User Interfaces. 2010.

[24] Yang Liu et al. "Modeling and predicting the helpfulness of online reviews". In: Data Mining, 2008. ICDM'08. Eighth IEEE International Conference on. IEEE. 2008, pp. 443–452.

[25] Stanley Loh et al. "A tourism recommender system based on collaboration and text analysis". In: Information Technology & Tourism 6.3 (2003), pp. 157–165.

[26] Pasquale Lops, Marco de Gemmis, and Giovanni Semeraro. "Content-based Recommender Systems: State of the Art and Trends". In: Recommender Systems Handbook. 2011, pp. 73–105.

[27] W Glynn Mangold and David J Faulds. "Social media: The new hybrid element of the promotion mix". In: Business Horizons 52.4 (2009), pp. 357–365.

[28] Masoud Mazloom et al. "Multimodal Popularity Prediction of Brand-related Social Media Posts". In: Proceedings of the 2016 ACM on Multimedia Conference. MM '16. Amsterdam, The Netherlands: ACM, 2016, pp. 197–201. isbn: 978-1-4503-3603-1. doi: 10.1145/2964284.2967210. url: http://doi.acm.org/10.1145/2964284.2967210.

[29] Tomas Mikolov et al. "Distributed representations of words and phrases and their compositionality". In: Advances in Neural Information Processing Systems. 2013, pp. 3111–3119.

[30] Tomas Mikolov et al. "Efficient Estimation of Word Representations in Vector Space". In: Proceedings of Workshop at ICLR (2013).

[31] Raymond J. Mooney and Loriene Roy. "Content-Based Book Recommending Using Learning for Text Categorization". In: SIGIR (2000), pp. 195–204.

[32] Cataldo Musto et al. "Ask Me Any Rating: A Content-based Recommender System based on Recurrent Neural Networks." In: IIR. 2016.

[33] Un Yong Nahm and Raymond J Mooney. "Text mining with information extraction". In: Proceedings of the AAAI 2002 Spring Symposium on Mining Answers from Texts and Knowledge Bases. 2002, pp. 60–67.

[34] Tetsuya Nasukawa and Jeonghee Yi. "Sentiment Analysis: Capturing Favorability Using Natural Language Processing". In: Proceedings of the 2nd International Conference on Knowledge Capture. K-CAP '03. Sanibel Island, FL, USA: ACM, 2003, pp. 70–77. isbn: 1-58113-583-1. doi: 10.1145/945645.945658. url: http://doi.acm.org/10.1145/945645.945658.

[35] Thomas L Ngo-Ye and Atish P Sinha. "The influence of reviewer engagement characteristics on online review helpfulness: A text regression model". In: Decision Support Systems 61 (2014), pp. 47–58.

[36] Makbule Gulcin Ozsoy. "From word embeddings to item recommendation". In: arXiv preprint arXiv:1601.01356 (2016).

[37] James W Pennebaker, Matthias R Mehl, and Kate G Niederhoffer. "Psychological aspects of natural language use: Our words, our selves". In: Annual Review of Psychology 54.1 (2003), pp. 547–577.

[38] Xuan-Hieu Phan, Le-Minh Nguyen, and Susumu Horiguchi. "Learning to classify short and sparse text & web with hidden topics from large-scale data collections". In: Proceedings of the 17th International Conference on World Wide Web. ACM. 2008, pp. 91–100.

[39] Steffen Rendle. "Factorization machines". In: Data Mining (ICDM), 2010 IEEE 10th International Conference on. IEEE. 2010, pp. 995–1000.

[40] Steffen Rendle. "Factorization Machines with libFM". In: ACM Transactions on Intelligent Systems and Technology 3.3 (2012).

[41] Steffen Rendle et al. "Fast context-aware recommendations with factorization machines". In: Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM. 2011, pp. 635–644.

[42] F Ricci et al. "Recommender Systems Handbook". In: Recommender Systems Handbook. 2011.

[43] Yue Shi, Martha Larson, and Alan Hanjalic. "Collaborative Filtering beyond the User-Item Matrix: A Survey of the State of the Art and Future Challenges". In: ACM Computing Surveys 47.1 (2014).

[44] Amit Singhal. "Modern information retrieval: A brief overview". In: IEEE Data Engineering Bulletin 24.4 (2001), pp. 35–43.

[45] Philip J Stone, Dexter C Dunphy, and Marshall S Smith. "The general inquirer: A computer approach to content analysis." In: (1966).

[46] Gábor Takács et al. "Matrix Factorization and Neighbor Based Algorithms for the Netflix Prize Problem". In: Proceedings of the 2008 ACM Conference on Recommender Systems. RecSys '08. Lausanne, Switzerland: ACM, 2008, pp. 267–274. isbn: 978-1-60558-093-7. doi: 10.1145/1454008.1454049. url: http://doi.acm.org/10.1145/1454008.1454049.

[47] Mike Thelwall. Heart and soul: Sentiment strength detection in the social web with SentiStrength. 2013.

[48] Marko Tkalčič et al. "Affective Labeling in a Content-Based Recommender System for Images". In: IEEE Transactions on Multimedia 15.2 (2013), pp. 391–400.

[49] V. N. Vapnik. The Nature of Statistical Learning Theory. Jordan M, Lauritzen SL, Lawless JL, Nair V, editors. 1995.

[50] Chong Wang and David M. Blei. "Collaborative Topic Modeling for Recommending Scientific Articles". In: Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. KDD '11. San Diego, California, USA: ACM, 2011, pp. 448–456. isbn: 978-1-4503-0813-7. doi: 10.1145/2020408.2020480. url: http://doi.acm.org/10.1145/2020408.2020480.

[51] Hongzhi Yin et al. "LCARS: a location-content-aware recommender system". In: Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2013), pp. 221–229.

[52] Tong Zhang. "Solving Large Scale Linear Prediction Problems Using Stochastic Gradient Descent Algorithms". In: ICML (2004), p. 116.
