
MSc Information Studies

Data Science track

Master’s Thesis

A Personalized Alternative Flight Recommender Using Collaborative Filtering

by

Nienke Pot

10381015

July 28, 2017

Supervisors:

Prof. Dr. Marcel Worring

Dr. Masoud Mazloom


Abstract

Recommendations on e-commerce sites are becoming more personalized. This individual approach can create a better relationship with the customer. In this research the e-commerce site of a flight company is investigated. The flight company wants to recommend alternative flight destinations to the user when a flight destination is not available. To make this recommendation personalized, they want to base the recommendations on the user's search history on flight destinations. In the literature three different approaches for recommendations are discussed: content-based filtering, collaborative filtering and data mining.

Here the collaborative filtering approach is chosen as the most suitable. This model creates recommendations based on the previous behavior of the user and by comparing the user with other users. First, three classifiers are explored: kNN, SVM and MLP. Then three latent factor models were added to see whether they would improve the classifiers: SVD, NMF and RBM. The differences between the recommendation models were minor and the latent factor models did not improve the recommender system.

To make a more personalized recommendation to the user, more data about the user must be obtained. At the moment the user profile is too limited to make a personalized recommendation: the user can only be followed for a very short time, so the data about the user is very sparse. Creating an account for the user would be a good solution to this problem. Another solution would be to make recommendations based on the similarity of the destinations, for which a profile of each destination would have to be created.


Contents

1 Introduction
  1.1 Research question
2 Related Literature
  2.1 Data in recommender systems
  2.2 Recommendation techniques
  2.3 Content-based filtering
  2.4 Data mining
  2.5 Collaborative filtering
  2.6 Method selection
  2.7 Machine learning techniques for collaborative filtering
3 Method
  3.1 Data and ground truth
  3.2 Pipeline
4 Results
  4.1 Data
  4.2 Recommender systems
  4.3 Predictions
  4.4 Patterns
5 Conclusion
6 Acknowledgement
References


1 Introduction

Recommendations allow fast automated customization and personalization for e-commerce sites (Sivapalan, Sadeghian, Rahnama, & Madni, 2014) and are therefore becoming more important for companies. Through recommendations, companies can create a personalized marketing strategy based on the needs of the individual customer, and this personalization indeed results in higher customer satisfaction and loyalty (Pollard, Chuo, & Lee, 2016) (Sivapalan et al., 2014). Companies are therefore looking for more efficient and faster algorithms to improve their recommender systems and make their recommendations more personal.

Based on the available data of the user and the goal for the recommender system, a company can choose a specific recommendation technique. There are many techniques to make recommendations, but in principle three recommendation techniques can be distinguished: content-based filtering, collaborative filtering and data mining (Sivapalan et al., 2014). Every method has a different focus. Content-based filtering matches the user's previous interest in the content of an item with items holding similar content (Thorat, Goudar, & Barve, 2015). In contrast, collaborative filtering does not look at the content of an item; it looks at the user's previous interest and compares it to other users with a similar interest (Ricci, Rokach, & Shapira, 2015). The last method, data mining, searches for any interesting patterns in the data to make recommendations (Han, Kamber, & Pei, 2006).

The data that is used for the recommendation is retrieved from the user when he leaves data on the website either explicitly, like ratings of an item, or implicitly, like purchases of an item (Sivapalan et al., 2014). When the type of data that is available is known and the goal that needs to be achieved is clear, the company can choose a recommendation method.

An example of a recommender system is Pandora.com, an online radio station which recommends music to the user with the Pandora music recommender. Musical experts are asked to characterize the music using semantic descriptors as attributes. These attributes are then matched with the attributes of the user's taste, which are determined by the previous listening behavior of the user (Barrington, Oda, & Lanckriet, 2009). Another example of a recommender system is Netflix. The online movie and series platform recommends movies and series to the user by looking at patterns in the item ratings. It uses a matrix factorization model to find the hidden patterns in the data and determine what the user might also prefer (Koren & Bell, 2015).

In this research the e-commerce site of a flight company for booking flights is investigated, and in particular the possibility to make personalized alternative flight recommendations there. When a flight is not available for a specific date, an alternative flight destination needs to be presented to the user. Currently these recommendations are based on the location of the airport: the airport closest to the originally searched airport is recommended as an alternative destination. Location can be a good estimator for recommendations, but it is not very personalized. When a user needs to be in Barcelona but that airport is not available for that date, the closest airport nearby can be a good recommendation. But when a user wants to go on a city trip and the Barcelona airport is not available, another European city like Paris or Rome can also be a good alternative for the customer. So there is a need to examine whether the recommendations that are made to the user can be personalized.

The data that is available to make these recommendations is the user's search history on the website. It shows the flight destinations the user has searched for and the flight destination that was booked by the user. The challenge here is that the user is only tracked for a short moment: as soon as a user leaves the site, the tracking starts over again when the user re-enters the site. This leads to very sparse data. The aim of this research is to make personalized recommendations for alternative flight destinations to the customer based on their search behavior on the website.

1.1 Research question

The question this research aims to answer is:

Can personalized recommendations for the flight destination be made based on the search history of the user?

The research question will be answered by the following sub questions:

• Which recommendation method performs best based on the search history of the user?
• Can patterns be recognized in the search behavior of the user?


First, the related literature is studied to select the recommendation method that is most suitable for this case. Then the data is preprocessed so that the alternative flights in the data are found. When the methods are known and the data is prepared, the recommender systems can be tested and evaluated. Finally, the data is searched for any patterns.

2 Related Literature

2.1 Data in recommender systems

The data that is available is of big influence for choosing the right recommendation method. Recommendation systems use information about products and users to predict the next new item of interest to the user. The data used in recommendation systems can be rating data, behavior pattern data, transaction data or production data (Sivapalan et al., 2014). The data is retrieved from the user showing his preferences through implicit feedback, like click-through or purchases, or through explicit feedback, like ratings (Su & Khoshgoftaar, 2009). Explicit feedback was traditionally most used for recommender systems. However, this type of feedback costs the user an effort that not many users are willing to make. Implicit feedback, on the other hand, is a natural product of the user's interaction and does not cost the user any effort, so it is easier to collect an increased volume of implicit data (Hofmann, Schuth, Bellogin, & De Rijke, 2014). The feedback of the user is stored in a user-item matrix, in which each row represents a user and each column an item. For each user this matrix shows how high the user has rated an item, or whether he has visited the item or not. So the data that can be used for recommendation systems comes in different forms, and this influences the type of recommendation system that needs to be chosen.

2.2 Recommendation techniques

As indicated in the introduction there are different techniques to make recommendations and the three most commonly used techniques are: data mining, collaborative filtering and content-based filtering (Sivapalan et al., 2014). This section will elaborate on this.

• Content-based filtering makes recommendations based on the previous behavior of the user concerning the products he has chosen (Liu, Dolan, & Pedersen, 2010). The content (product) is described using labels and a weight indicating how well a label describes the content. Then, using clustering or nearest-neighbor algorithms, a new product can be recommended to the user (Sivapalan et al., 2014).

• Data mining is used for the extraction of useful information from the dataset. It can find patterns in the behavior of the user. Its most important algorithms are based on association rules (Sivapalan et al., 2014).

• Collaborative filtering bases its recommendations on users that have similar behavior. It looks at the past behavior of the users to see which users are similar and can therefore predict the user’s next preference (Liu et al., 2010)(Sivapalan et al., 2014) (Balabanovi´c & Shoham, 1997).

2.3 Content-based filtering

CBF uses item descriptions and the user profile to make recommendations. It looks at the user's previous item ratings to see which items the user is interested in. Then the algorithm looks for the items that are most similar to the items the user preferred (Thorat et al., 2015).

CBF can be seen as a three-step problem:

1. Discover the attributes of the items.
2. Compare the attributes of the user's preference with the attributes of the items.
3. Recommend the items whose attributes satisfy the user's interest (Thorat et al., 2015).

So first the attributes of an item need to be found. The most used algorithm to find the attributes of an item is tf-idf. The tf-idf weight uses the term frequency and the inverse document frequency to determine the weight of a term: it looks at the occurrence of a word in a document (term frequency) and at the number of documents the term occurs in (inverse document frequency). The tf-idf weight indicates the importance of an attribute in the item, so the attributes of an item can be determined (Pazzani & Billsus, 2007).
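To make the weighting concrete, the sketch below computes tf-idf weights for a few hypothetical item descriptions; scikit-learn and the example texts are assumptions for illustration only.

```python
# Minimal sketch: deriving item attributes with tf-idf, assuming scikit-learn
# and hypothetical item descriptions.
from sklearn.feature_extraction.text import TfidfVectorizer

item_descriptions = [
    "sunny beach city trip mediterranean",
    "historic city trip museums europe",
    "winter sports mountains snow",
]

vectorizer = TfidfVectorizer()
tfidf = vectorizer.fit_transform(item_descriptions)

# Each row is an item, each column a term; a cell holds the tf-idf weight of
# that term for that item, i.e. how important the attribute is for the item.
print(vectorizer.get_feature_names_out())
print(tfidf.toarray())
```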


The next step is to create the user profile, which can consist of two models:

1. User preference model: a description of the items of the user's interest.

2. User history model: this can include data about whether a user has viewed, purchased or rated an item (Thorat et al., 2015) (Pazzani & Billsus, 2007).

From the items the user has in his profile, the attributes that are of the user's interest can be retrieved. This way the user profile can show, by means of a weight, how important the user considers an attribute. When the user profile is made and the attributes of the items are known, the next step is to recommend an item to the user. This can be done by heuristic methods or classification methods (Thorat et al., 2015).

So by determining the attributes of an item, then looking for the attributes of the user’s interest and then matching those, CBF can make the recommendations.

2.4 Data mining

Data mining is a good method for exploring the data. The main algorithms are based on association rules. These algorithms try to find rules in large datasets (Mannila, Toivonen, & Verkamo, 1994).

Mining for association rules is a two-step process:

1. Find all frequent item sets from the dataset: these are combinations of items that occur often together in the dataset.

2. Find strong association rules from the frequent item sets: An association rule is considered to be strong if it satisfies a minimum support threshold, how often a frequent item set occurs, and a minimum confidence threshold, how often a rule holds to be true (Han et al., 2006) (Mannila et al., 1994).

To improve the performance of this process, the first step should be optimized since it is the most costly. A challenge here is that finding frequent item sets in a large dataset yields a very large number of frequent item sets, because all subsets of a frequent item set are also frequent. Looking only at strong association rules is not enough; it is also important to check whether there is a correlation. Association rules cannot be used directly for making predictions, but they are a good starting point for further exploration (Han et al., 2006).
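As an illustration, the sketch below runs the two steps on a toy binary user-item matrix; the mlxtend library and the thresholds are assumptions for illustration, not necessarily what is used in this research.

```python
# Minimal sketch of the two-step association-rule mining on a toy matrix,
# assuming the mlxtend library.
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules

# Toy data: each row is a user, each column a destination (True = searched).
data = pd.DataFrame(
    [[1, 1, 0], [1, 1, 1], [0, 1, 1], [1, 0, 1]],
    columns=["AMS", "ORY", "BCN"],
).astype(bool)

# Step 1: frequent item sets above a minimum support threshold.
frequent_sets = apriori(data, min_support=0.5, use_colnames=True)

# Step 2: strong association rules above a minimum confidence threshold.
rules = association_rules(frequent_sets, metric="confidence", min_threshold=0.7)
print(rules[["antecedents", "consequents", "support", "confidence"]])
```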

2.5 Collaborative filtering

CF makes predictions for a user based on the previous behavior of the user and his similarity to other users. The goal is to relate items and users. To make such a comparison there are two main techniques: memory-based filtering, or the neighborhood approach, and model-based filtering, or latent factor models (Ricci et al., 2015) (Koren & Bell, 2015). The neighborhood approach focuses on the similarity between items or users. Latent factor models can help to predict the missing values in the matrix and try to find latent factors to make the predictions (Deng, Yu, et al., 2014). One of the biggest problems in CF is data sparsity. Most of the time there is a lack of data for new users, new products or less popular products: they occur so little that no or only few recommendations can be given (Su & Khoshgoftaar, 2009) (Wang, Wang, & Yeung, 2015). Latent factor models turn a high-dimensional matrix with many missing values into a smaller, low-dimensional representation of the matrix (Koren & Bell, 2015). A combination of the neighborhood approach and latent factor models can also be used, where the latent factor models improve the task of relating users to items (Ricci et al., 2015) (Isinkaye, Folajimi, & Ojokoh, 2015).

• Memory-based filtering is easy to implement and highly effective. This method determines who the neighbors of a user are using similarity techniques and then calculates the next step of the user by looking at what those other users have done in the past (Sivapalan et al., 2014). A prediction for an unseen item is made by looking at who the most similar neighbors are and what those neighbors have rated, searched or bought (Balabanovi´c & Shoham, 1997) (Isinkaye et al., 2015). Here the entire database, or a sample of it, is used to make user-item predictions. The user is part of a group of users with similar interests. First the similarity between users is calculated; then a prediction is made for a user by looking at how the others in his group have behaved (Su & Khoshgoftaar, 2009). Memory-based filtering is a supervised machine learning technique, the most commonly used form of machine learning. The targeted class labels are available here, and the algorithm tries to directly provide discriminative power for pattern classification (LeCun, Bengio, & Hinton, 2015).


• Model-based filtering aims to improve CF by building a model based on the previous ratings. The goal here is to uncover latent factors in the dataset.

• Hybrid filtering combines memory-based and model-based filtering (Sivapalan et al., 2014).

2.6 Method selection

In this research the search history of the user is used, which is implicit feedback in the form of behavior pattern data. The data is thus focused on the user's behavior; no information about the content of the items is available. CBF is therefore not suited for this research. Both data mining and CF use the user's behavior to make recommendations and are quite similar in that way, but data mining is more generalized while CF is more personalized. Data mining looks for patterns over all users, whereas CF also looks at the previous behavior of a specific user when making recommendations. When two different users both buy cheese and milk, data mining will give them both the same recommendation, whereas CF looks at the past behavior of the two users and can therefore give them different recommendations. So because of the user-based data and the more personalized approach for making the recommendations, this research focuses on CF. Data mining is still used afterwards to see whether there are general patterns in the data. So first a good machine learning method for CF is sought. The initial focus is on memory-based filtering, since the goal of this research is to predict a certain value and memory-based filtering has a discriminative task, as opposed to model-based filtering. Then it is examined whether latent factor models can help to improve the discriminative task of the classifier.

2.7 Machine learning techniques for collaborative filtering

Now that the most appropriate recommendation method has been found, the machine learning techniques need to be chosen to make the recommendations. The first step is to find a classifier for memory-based filtering. Then the latent factor models to improve the memory-based filtering are discussed.

kNN is the most basic classifier that looks at user similarity and neighbor behavior to make predictions. It is the most common technique for the neighborhood approach. The number of most similar neighbors to which the user is compared can be fine-tuned. Comparing to only 10 neighbors might not be representative, but the more neighbors are taken into account, the slower the algorithm will be.
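A minimal sketch of such a kNN classifier on a toy binary user-item matrix, assuming scikit-learn (the actual data and implementation of this research are not implied):

```python
# Minimal sketch of kNN as a recommender, assuming scikit-learn and toy data.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(1000, 115))   # toy binary user-item search matrix
y = rng.integers(0, 115, size=1000)        # toy booked-destination label per user

# n_neighbors is the tuning knob: more neighbors, better representation, slower model.
knn = KNeighborsClassifier(n_neighbors=50)
knn.fit(X, y)
print(knn.predict(X[:5]))                  # predicted destinations for the first 5 users
```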

Memory-based filtering can thus be seen as a classification problem, and an SVM is one of the most used classifiers in machine learning (Xia, Dong, & Xing, 2006). The principle behind the SVM is that it is a binary classifier that splits two groups by a linear separator. This hyperplane separates the two groups optimally when the distance between the hyperplane and the nearest points of each class is as large as possible, i.e. the margin is as large as possible. The C parameter decides to what extent false classifications are to be avoided. A large C will give a small margin on the hyperplane and thus a smaller chance of misclassification, and vice versa (Xia et al., 2006) (Weston & Watkins, 1998). This C is important to fine-tune, since you want as large a margin as possible with as many correct classifications as possible. When the SVM has to separate more than two classes, we speak of a multi-class problem. The SVM most commonly uses a one-versus-rest approach to separate the multiple classes: one class is separated from the other classes by the hyperplane, instead of separating each class individually (Weston & Watkins, 1998). So first the SVM trains on a set, deciding where the hyperplanes are; then new points are assigned to a class based on which side of the hyperplane they fall.
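A minimal sketch of a multi-class SVM with the C parameter, again assuming scikit-learn and toy data:

```python
# Minimal sketch of a multi-class SVM, assuming scikit-learn and toy data.
# LinearSVC trains one-versus-rest classifiers by default.
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(1000, 115)).astype(float)
y = rng.integers(0, 10, size=1000)

# A larger C penalizes misclassifications harder, giving a smaller margin.
svm = LinearSVC(C=1.0)
svm.fit(X, y)
print(svm.predict(X[:5]))
```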

Another way to do the classification is by using a neural network for CF. The multilayer perceptron (MLP) (Figure 1) is a neural classifier (He et al., 2017). It is a feed-forward neural network with multiple hidden layers; when an MLP has many hidden layers it is seen as a deep neural network (Deng et al., 2014). It learns by itself which representation is needed for the classification or detection. Those levels are created by multiple non-linear modules, which transform the raw data into successively higher and more abstract levels of representation. With these higher levels of representation complex functions can be learned. The high-level representation of the data helps identify which aspects of the input are important or irrelevant, which can help in discrimination tasks for classification (LeCun et al., 2015).


Figure 1: MLP

The MLP has at least three layers: an input layer, an output layer and one or more hidden layers. Each node in a layer is connected to all nodes in the previous layer; an MLP is thus a fully connected network. The nodes are connected with a specific weight for the non-linear activation function, and the weights are updated by back-propagation (Deng et al., 2014) (LeCun et al., 2015). The number of nodes per layer and the number of layers that are used can be fine-tuned. Here too, fewer nodes and layers make the algorithm faster, but too few nodes and/or layers can hurt the accuracy of the model.
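A minimal sketch of such an MLP classifier, assuming scikit-learn and toy data:

```python
# Minimal sketch of an MLP classifier, assuming scikit-learn and toy data.
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(1000, 115)).astype(float)
y = rng.integers(0, 115, size=1000)

# hidden_layer_sizes sets both the number of hidden layers and the nodes per layer;
# the weights are learned with back-propagation.
mlp = MLPClassifier(hidden_layer_sizes=(200, 200), max_iter=200)
mlp.fit(X, y)
print(mlp.predict(X[:5]))
```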

The next step is to find latent factor models that can improve the results of the classifiers. Two types of latent factor models are discussed here: matrix factorization and a neural latent factor model.

One of the most discussed methods is matrix factorization (MF). Matrix factorization is used to improve CF by using dimensionality reduction. The goal is to uncover latent features that explain the observed ratings (Ricci et al., 2015). For many cases MF is scalable and gives highly accurate results. The idea behind it is to make a low-rank estimate of the original user-item matrix. MF maps the users and items into the same latent feature space. The missing values in the matrix are predicted by looking at the inner product of the user-item vector pairs (Luo, Zhou, Xia, & Zhu, 2014).

Singular value decomposition is the most basic MF model. It uncovers latent relations between customers and products and can then be used directly for predictions. The low-dimensional space resulting from the SVD can also improve neighborhood formation for later use in a neighborhood approach (Ricci et al., 2015). The low-dimensional matrix that is obtained contains a low-rank estimate of the original matrix, and the unknown values are predicted based on the corresponding entries in this low-rank estimate. Because of the low dimensionality obtained from the SVD, data can also be stored efficiently. In MF a loss function is used to determine the difference between the original rating and the rating output by the MF algorithm. The parameters are updated iteratively to minimize this loss so that the model fits the training data. In this training phase non-negativity proves to be important, since it allows the meaning of the features to be learned more precisely (Luo et al., 2014).

Non-negative matrix factorization satisfies this non-negativity constraint by adapting the learning rate. The learning rate is rescaled such that, when updating the parameters, only non-negative features remain (Luo et al., 2014).
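A minimal sketch of both matrix factorization models on a toy user-item matrix, assuming scikit-learn; the number of components is only an example:

```python
# Minimal sketch of SVD and NMF as latent factor models, assuming scikit-learn.
import numpy as np
from sklearn.decomposition import TruncatedSVD, NMF

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(1000, 115)).astype(float)   # toy user-item matrix

svd = TruncatedSVD(n_components=50)            # low-rank estimate of the matrix
X_svd = svd.fit_transform(X)

nmf = NMF(n_components=50, max_iter=500)       # factors constrained to be non-negative
X_nmf = nmf.fit_transform(X)

print(X_svd.shape, X_nmf.shape)                # (1000, 50) (1000, 50)
```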

The second latent factor model is a neural latent factor model: the RBM. This is a one-layer generative stochastic neural network that learns a probability distribution over a dataset. An RBM (Figure 2) consists of one layer of stochastic visible units and one layer of hidden units, where each visible unit is connected with each hidden unit. The model is called restricted since there are no connections between nodes in the same layer (Deng et al., 2014).

The RBM reconstructs the input in three steps: a feed-forward pass, a backward pass and a comparison with the original input. For each input x there is a feed-forward pass in which the input x is combined with a weight and a bias, and in the hidden layer the activation is computed. When a certain threshold is reached, the node in the hidden layer activates and returns an output x' in the backward pass: the activated nodes in the hidden layer return the newly constructed input x' back to the visible layer. There this output is compared with the original input. This repeats until the reconstruction is as close to the original input as possible (Wang et al., 2015).

By reconstructing the input the RBM tries to find patterns in the data by determining which features are most important. An RBM is mostly used at the beginning of a neural network, after which a layer with a discriminative task is added. The pre-training with RBMs works well in most cases, with both large and small datasets (Deng et al., 2014).
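A minimal sketch of an RBM as such a pre-training step, assuming scikit-learn's BernoulliRBM and toy binary input:

```python
# Minimal sketch of an RBM as a latent factor / pre-training step,
# assuming scikit-learn's BernoulliRBM.
import numpy as np
from sklearn.neural_network import BernoulliRBM

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(1000, 115)).astype(float)   # toy binary user-item matrix

# The hidden units learn a distribution over the binary input; transform()
# returns the hidden-unit activations as new features on which a
# discriminative classifier can be trained afterwards.
rbm = BernoulliRBM(n_components=200, learning_rate=0.05, n_iter=10, random_state=0)
hidden_features = rbm.fit_transform(X)
print(hidden_features.shape)                             # (1000, 200)
```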


Figure 2: RBM

So to make a recommender system using CF there are three classifiers: kNN, SVM and MLP. In addition, latent factor models that may help to improve the classifier are explored: SVD, NMF and RBM.

3 Method

3.1 Data and ground truth

In this research implicit feedback of the user is used to make predictions about which alternative flight destination the user will book. The first question that comes up is: when is the booked flight destination an alternative flight destination? Every time a user comes to the site, an id is assigned to the user. With this id it can be tracked which queries came from the same user and, most importantly, whether and what the user booked in the end.

In the data, the flight destinations in the search queries of the users are compared with the booked flight destination for each user. Three combinations of searched flight destinations and booked flight destinations per user can be distinguished:

1. A user has searched for a specific flight destination and has then also booked a flight for this specific destination.

Destination A → Destination A

2. A user does search for one or multiple flight destinations, but does not book a flight.

Destination A → No booking

Destination A, Destination B, Destination C, ... → No booking

3. A user searches for multiple flight destinations before he has booked.

Destination A, Destination B, Destination C, ... → Destination B

For the first case it can be assumed that the user did not book an alternative flight destination, since the user searched for only one destination and also booked a flight to this destination. The second case indicates that the user was looking for an alternative flight destination, but it is not known which alternative flight was chosen in the end. Therefore the second combination is not appropriate as an alternative flight. The last combination is appropriate as an alternative flight: it shows that the user was not satisfied with a flight, perhaps because it was not available, or that the user was doubting between flight destinations. These last cases are labeled as alternative flights and used in the recommendation model.

A user-item matrix is created as input for the recommendation system. Every possible flight destination is a column and every user is a row. When the user has searched for a flight destination, the value of the cell in that column and row is set to one. The user-item matrix is used as input of the recommendation model, and the target value is the flight destination that the user booked. The recommendation model tries to predict the booked flight destination, with an accuracy that is as high as possible.
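The sketch below illustrates this construction on a few toy log records, assuming pandas; the column names and values are hypothetical:

```python
# Minimal sketch of building the user-item matrix and labels from search and
# booking logs, assuming pandas; column names and values are hypothetical.
import pandas as pd

searches = pd.DataFrame({
    "user_id": [1, 1, 2, 3, 3],
    "arrival": ["BCN", "ORY", "AMS", "BCN", "FCO"],
})
bookings = pd.DataFrame({
    "user_id": [1, 3],            # users who booked after searching several destinations
    "arrival": ["ORY", "FCO"],
})

# One row per user, one column per searched arrival destination (1 = searched).
user_item = pd.crosstab(searches["user_id"], searches["arrival"]).clip(upper=1)

# The booked destination is the target label; users without a booking are dropped.
labels = bookings.set_index("user_id")["arrival"]
user_item = user_item.loc[labels.index]
print(user_item)
print(labels)
```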


3.2 Pipeline

A pipeline (Figure 3) is created as the structure for all the recommendation systems. The first step in the pipeline is filtering the data on the alternative flights and putting them in a user-item matrix. This input and the labeled data set are then split into a train and a test set, the model is trained, and finally the model is evaluated by looking at its accuracy.

Figure 3: Pipeline

Before the classifiers were tested on the dataset, a randomized model was built and used as the baseline. The randomizer randomly assigns a class to each of the samples; this randomized baseline gives an indication of the accuracy that needs to be improved upon at least. The next step was to compare the three classifiers as recommenders: kNN, SVM and MLP. For the three classifiers the parameters were tuned (a minimal sketch of this pipeline follows the list below).

• For kNN the number of neighbors was tuned; this is the number of most similar neighbors to which the user is compared. More neighbors give a better representation, but also result in a slower algorithm.

• For the SVM the C parameter was tuned. This represents to what extent misclassification is to be avoided: the larger the C, the smaller the margin and the smaller the chance of false classification. The goal is to find as large a margin as possible with as many correct classifications as possible.

• For the MLP classifier the number of nodes and the number of layers can be tuned. More nodes give more capacity and weights, which can help in processing the signals of the previous layer. More layers give a higher complexity of the model.
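As an illustration, the sketch below runs such a pipeline on toy data, assuming scikit-learn; the parameter values are examples only:

```python
# Minimal sketch of the pipeline of Figure 3 on toy data, assuming scikit-learn:
# split the labeled user-item matrix, train a model, evaluate against a
# randomized baseline.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.dummy import DummyClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(2000, 115)).astype(float)   # toy user-item matrix
y = rng.integers(0, 115, size=2000)                      # toy booked destinations

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=0)

# Randomized baseline: assigns a class uniformly at random.
baseline = DummyClassifier(strategy="uniform", random_state=0).fit(X_train, y_train)
print("baseline accuracy:", accuracy_score(y_test, baseline.predict(X_test)))

# One of the candidate classifiers, here the MLP.
model = MLPClassifier(hidden_layer_sizes=(200, 200), max_iter=200).fit(X_train, y_train)
print("model accuracy:", accuracy_score(y_test, model.predict(X_test)))
```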

The next step was to improve the best classifier by adding a latent factor model to the recommendation system. The latent factor models that are added are SVD, NMF and RBM (a sketch of such a combination follows the list below).

• For NMF the number of dimensions can be fine-tuned. Fewer dimensions make the algorithm faster, but also give it a lower complexity.

• For the RBM the number of nodes can also be fine-tuned. More nodes give a higher capacity and more weights, which can help in processing the signals of the previous layer.
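A minimal sketch of such a combination, assuming scikit-learn, in which NMF provides the latent factors and the MLP does the classification on top:

```python
# Minimal sketch of a latent factor model combined with the classifier,
# assuming scikit-learn and toy data.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.decomposition import NMF
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(1000, 115)).astype(float)
y = rng.integers(0, 115, size=1000)

nmf_mlp = make_pipeline(
    NMF(n_components=50, max_iter=500),                          # latent factors
    MLPClassifier(hidden_layer_sizes=(200, 200), max_iter=200),  # discriminative step
)
nmf_mlp.fit(X, y)
print(nmf_mlp.predict(X[:5]))
```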

4 Results

4.1 Data

The dataset covers a period from November 2016 until February 2017. There were two tables to retrieve the data from: one table contained the searches of the users, the other table contained information about the bookings per user. Each sample in the tables contained:

• a user ID

• destination of departure
• destination of arrival
• number of adults
• number of children
• number of babies
• date of departure

• date of return


By linking the user IDs of the two tables, it could be determined what each user was searching for and what the resulting booking was.

Then a user-item matrix was created, in which each row represents a single user, the columns represent the searched destinations of arrival, and the booked destination of arrival is used as the label for each sample. The matrix contained 115 different arrival destinations and 3,209,847 samples in total.

Filtering the data on cases where the user booked an alternative flight resulted in a matrix containing 1,091,943 samples. The dataset was then split into a training set of 731,601 samples and a test set of 360,342 samples.

4.2 Recommender systems

The first recommendation model, set as the baseline, was the randomized model. This randomized recommender system gave an accuracy of 0.0086, which means that only 0.86% of the samples were labeled correctly. This seems correct since it corresponds to a chance of one in 115, which is 0.0087. The next step was to compare the three classifiers: kNN, SVM and MLP.

For kNN, 10, 50, 100 and 250 neighbors were tried. The differences are not very big, but 250 neighbors performed best for kNN.

Table 1: Results kNN

Number of neighbors Accuracy Accuracy Top 5

10 0.43 0.64

50 0.45 0.69

100 0.45 0.70

250 0.45 0.71

The SVM was fine-tuned for the C parameter with C = 1, 10 and 100. The lowest value, C = 1, performed best here. The top 5 results were not computed for the SVM: obtaining the probability scores for all classes with the SVM is a very expensive run, and since it performed worse than kNN and the MLP classifier there was no need to examine it further and spend all that time.

Table 2: Results SVM

C     Accuracy
1     0.42
10    0.40
100   0.39

The MLP was tried with different numbers of nodes and layers. First the same number of nodes as features was tried; then it was examined what would happen if the number of nodes and the number of layers increased. Here the differences were also very small, but the MLP with 2 hidden layers of 200 nodes each performed slightly best. The performance on the training set was only slightly better, around 1%, than the performance on the test set. Probably the natural capacity of the network is reached, so increasing the number of nodes and layers has no added value.

When comparing the predicted values of the models with the different settings, they match for around 90% of the samples. When comparing the differing predictions to each other, no specific label is consistently predicted wrong; the number of wrongly classified labels is around 100. So the 10% that differs between the lists does not come from one specific label that is guessed wrong.


Table 3: Results MLP

Number of layers & nodes   Accuracy test set   Accuracy Top 5 test set   Accuracy training set   Accuracy Top 5 training set
2 layers, 115 nodes        0.46                0.73                      0.47                    0.74
3 layers, 115 nodes        0.46                0.73                      0.47                    0.74
2 layers, 200 nodes        0.46                0.74                      0.47                    0.74
3 layers, 200 nodes        0.46                0.73                      0.47                    0.74
4 layers, 200 nodes        0.46                0.73                      0.47                    0.74
2 layers, 500 nodes        0.46                0.74                      0.47                    0.74
3 layers, 500 nodes        0.46                0.74                      0.47                    0.74
1 layer, 250 nodes         0.46                0.73                      0.47                    0.74
1 layer, 220 nodes         0.46                0.73                      0.47                    0.74
1 layer, 300 nodes         0.46                0.74                      0.47                    0.74

SVM performed the worst, though it did not differ too much from the other classifiers. The accuracies of kNN and MLP were very similar, with MLP performing slightly better by a few tenths of a percent. What did differ significantly was the time the classification took: where the MLP classifier took an hour to do the classification, kNN took more than 13 hours. So based on accuracy, but especially on time, MLP performed best. The MLP performed best with 2 hidden layers of 200 nodes each, giving an accuracy of 0.4635. Normalization of the data did not change this result. So the next step was to try to improve the classifier by adding a latent factor model.

Table 4: Best Results

Method          Accuracy   Accuracy Top 5   Time
Random          0.0086     0.043            0.49 s
kNN             0.46       0.71             13.5 h
SVM             0.42       -                4.5 h
MLP             0.46       0.74             58.4 m
Normalize-MLP   0.47       0.73             1.19 h
NMF-MLP         0.46       0.73             1.9 h
SVD-MLP         0.21       0.42             1.02 h
RBM-MLP         0.31       0.49             1.15 h

None of the latent factor models improved the MLP classifier, so a higher level of abstraction of the data did not help the discriminative task.

Table 5: Results latent factor models & MLP

Model          Accuracy   Accuracy Top 5
SVD-MLP        0.21       0.42
RBM(200)-MLP   0.30       0.48
RBM(300)-MLP   0.31       0.49
NMF(100)-MLP   0.45       0.71
NMF(200)-MLP   0.46       0.72
NMF(300)-MLP   0.46       0.73

So the MLP with 2 hidden layers of 200 nodes each performed best as recommender system, with an accuracy of 0.46 and an accuracy of 0.74 for the top 5 results.

4.3 Predictions

Looking at the predicted values, it can be seen that not all labels were predicted; these were the labels that occur less often in the dataset. Two options were tried to improve the predictions. The first option was to merge the labels that occur very little into one category 'Others'. The other option was to remove the samples with labels that occur very little, since they were barely represented in the dataset. Both changed the performance of the classifier very little: the MLP had an accuracy of 0.46, and removing the tail or merging the tail into one category also gave an accuracy of 0.46, so there was no improvement. The data (Figure 4) also showed that the labels ORY (Paris) and AMS (Schiphol) occurred far more often than the other labels; these two labels were extreme outliers in the dataset. It was therefore examined whether removing these values improved the accuracy, but it did not. Since those values occurred most often, they were the easiest to predict, so removing them lowers the performance.
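The sketch below illustrates both options on a toy label series, assuming pandas and a hypothetical rarity threshold:

```python
# Minimal sketch of the two tail-handling options, assuming pandas;
# the rarity threshold is hypothetical.
import pandas as pd

labels = pd.Series(["AMS", "ORY", "AMS", "XYZ", "ABC", "ORY", "AMS"])
counts = labels.value_counts()
rare = counts[counts < 2].index                               # labels occurring fewer than 2 times

labels_grouped = labels.where(~labels.isin(rare), "Others")   # option 1: merge the tail
labels_dropped = labels[~labels.isin(rare)]                   # option 2: remove the tail
print(labels_grouped.tolist())
print(labels_dropped.tolist())
```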

Figure 4: All alternative flight bookings per airport

Table 6: Results MLP tails

Method                     Accuracy   Accuracy Top 5
MLP                        0.46       0.74
no-tail-mlp                0.46       0.74
tail-in-one-category-mlp   0.46       0.74
no-outliers-mlp            0.41       0.66

Since there are 115 labels in total, it is hard for the recommender to predict correctly with so many options. Therefore the top 5 recommendations were also considered for the accuracy. When looking at only the top 1 recommendation the recommender does not perform very well, though it did improve the accuracy a lot compared to the randomized baseline. Looking at the top 5 gives an accuracy of 0.74, which is quite reasonable. The occurrences of the predicted values did not differ too much from the occurrences of the target values, so the recommender did not only guess the values that occurred most.
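The sketch below shows how such a top-5 accuracy can be computed from the class probabilities, assuming scikit-learn and toy data:

```python
# Minimal sketch of computing top-5 accuracy from class probabilities,
# assuming scikit-learn and toy data.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(1000, 115)).astype(float)
y = rng.integers(0, 20, size=1000)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = MLPClassifier(hidden_layer_sizes=(50,), max_iter=100).fit(X_train, y_train)

proba = model.predict_proba(X_test)                  # shape: (n_samples, n_classes)
top5_idx = np.argsort(proba, axis=1)[:, -5:]         # indices of the 5 most likely classes
top5_labels = model.classes_[top5_idx]               # map indices back to labels
top5_acc = np.mean([true in row for true, row in zip(y_test, top5_labels)])
print("top-5 accuracy:", top5_acc)
```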

4.4 Patterns

The latent factor models did not perform very well on this dataset, which is an indication that there are not many patterns in the data. Using association rules, the user-item matrix was explored for frequent item sets. The result was negative: there were no frequent item sets in the data.

When comparing the booked flight destinations for the alternative flights (Figure 6), the flight destinations that were not alternative (Figure 7) and all flight destinations (Figure 5), they look similar. Only the non-alternative flight destinations have just ORY as an outlier, while the alternative flight destinations and all flight destinations have both AMS and ORY as outliers. Otherwise all three distributions of flight destinations look very similar.


So no differences are found between the alternative flight destinations and the normal flight destinations, and no frequent item sets are found in the user-item matrix.

Figure 5: All bookings per airport

Figure 6: All alternative flight bookings per airport


5 Conclusion

The goal of this research was to investigate whether a personalized recommendation for an alternative flight, based on the user's search history on flight destinations, is possible and which recommendation technique would perform best for this.

Based on the literature review a collaborative filtering technique was chosen, since the focus was on the previous behavior of the user and comparing it to other users, instead of looking at the content of the items. Three classifiers were compared: kNN, SVM and MLP. The MLP classifier with 2 hidden layers of 200 nodes each performed best on both accuracy and time, giving an accuracy of 0.46 and an accuracy of 0.74 for the top 5 scores.

An attempt was also made to improve the classifier by adding a latent factor model: SVD, NMF and RBM. None of these models was able to improve the classifier, which can indicate that there are no hidden patterns in the data. When looking for frequent item sets in the user-item matrix, none were found either.

The results showed that some labels that occurred very little in the data did not occur in the predicted values. Merging those labels into one category or removing them barely changed the performance of the classifier. There were also two extreme outliers in the dataset, the destinations ORY and AMS. When removing these outliers from the dataset the classifier performed worse; this follows from the fact that those labels occurred the most and were therefore the easiest to predict.

So making personalized predictions based on the search history of the user is hard. This has to do with two things:

1. The data is very sparse. Since the user can only be tracked for a short moment, the user-item matrix is incomplete, and therefore possibly no good patterns can be found in the matrix.

2. It is hard to determine what exactly to label as an alternative flight. Searching for multiple destinations can indicate that the user has booked an alternative flight destination, but that does not need to be the case.

There are two options to make better personalized predictions for the user from his search history. The first is to follow the user for a longer period, for example tracking the user for a week or a month. This is not always possible because of regulations, but with a user account you can get a better feeling for the taste of the user; the recommendations are then not based on search behavior but on booking history. The second option is a content-based filtering method, which links the user's preferences to the items. The next step is then to determine which attributes belong to these items (flight destinations); this can be the climate, whether it is an airport mainly for city trips or for long stays, the language, etc. Based on the attributes of the items and the attributes of the user's preference, recommendations can be made. Of course a hybrid system can also be chosen, which combines comparing users with other users and users with items. But all of these improvements presuppose that more data about the user is obtained to build a better user-item matrix.

6 Acknowledgement

Thank you to Avanade for providing resources and guidance for this thesis. A special thanks to my supervisors at Avanade: Mateusz Skawinski, Nicole Holla and Marsha Jurgens. They really helped in guiding me to the right resources and in getting the project done in a short period. I would also like to thank Transavia for making their resources available and giving me a lot of freedom in this project. Lastly, thank you to my supervisor at the University of Amsterdam, Marcel Worring, for his feedback and supervision.


References

Balabanovi´c, M., & Shoham, Y. (1997). Fab: content-based, collaborative recommendation. Communications of the ACM , 40 (3), 66–72.

Barrington, L., Oda, R., & Lanckriet, G. R. (2009). Smarter than genius? Human evaluation of music recommender systems. In ISMIR (Vol. 9, pp. 357–362).

Deng, L., Yu, D., et al. (2014). Deep learning: methods and applications. Foundations and Trends in Signal Processing, 7(3–4), 197–387.

Han, J., Kamber, M., & Pei, J. (2006). Mining frequent patterns, associations, and correlations. In Data Mining: Concepts and Techniques (2nd ed., pp. 227–283). San Francisco, USA: Morgan Kaufmann Publishers.

He, X., Liao, L., Zhang, H., Nie, L., Hu, X., & Chua, T.-S. (2017). Neural collaborative filtering. In Proceedings of the 26th international conference on world wide web (pp. 173–182).

Hofmann, K., Schuth, A., Bellogin, A., & De Rijke, M. (2014). Effects of position bias on click-based recommender evaluation. In European conference on information retrieval (pp. 624–630).

Isinkaye, F., Folajimi, Y., & Ojokoh, B. (2015). Recommendation systems: Principles, methods and evaluation. Egyptian Informatics Journal , 16 (3), 261–273.

Koren, Y., & Bell, R. (2015). Advances in collaborative filtering. In Recommender systems handbook (pp. 77–118). Springer.

LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521(7553), 436–444.

Liu, J., Dolan, P., & Pedersen, E. R. (2010). Personalized news recommendation based on click behavior. In Proceedings of the 15th international conference on intelligent user interfaces (pp. 31–40).

Luo, X., Zhou, M., Xia, Y., & Zhu, Q. (2014). An efficient non-negative matrix-factorization-based approach to collaborative filtering for recommender systems. IEEE Transactions on Industrial Informatics, 10 (2), 1273–1284.

Mannila, H., Toivonen, H., & Verkamo, A. I. (1994). Efficient algorithms for discovering association rules. In KDD-94: AAAI workshop on knowledge discovery in databases (pp. 181–192).

Pazzani, M., & Billsus, D. (2007). Content-based recommendation systems. The adaptive web, 325–341.

Pollard, D., Chuo, S., & Lee, B. (2016). Strategies for mass customization. Journal of Business & Economics Research (Online), 14 (3), 101.

Ricci, F., Rokach, L., & Shapira, B. (2015). Recommender systems handbook. Springer.

Sivapalan, S., Sadeghian, A., Rahnama, H., & Madni, A. M. (2014). Recommender systems in e-commerce. In World automation congress (WAC), 2014 (pp. 179–184).

Su, X., & Khoshgoftaar, T. M. (2009). A survey of collaborative filtering techniques. Advances in artificial intelligence, 2009 , 4.

Thorat, P. B., Goudar, R., & Barve, S. (2015). Survey on collaborative filtering, content-based filtering and hybrid recommendation system. International Journal of Computer Applications, 110 (4).

Wang, H., Wang, N., & Yeung, D.-Y. (2015). Collaborative deep learning for recommender systems. In Proceedings of the 21st ACM SIGKDD international conference on knowledge discovery and data mining (pp. 1235–1244).

Weston, J., & Watkins, C. (1998). Multi-class support vector machines (Tech. Rep. CSD-TR-98-04). Department of Computer Science, Royal Holloway, University of London.

Xia, Z., Dong, Y., & Xing, G. (2006). Support vector machines for collaborative filtering. In Proceedings of the 44th annual southeast regional conference (pp. 169–174).
