Social Factors for Venue Recommendation


Armand de Waard

University of Amsterdam

armanddewaard@me.com

Supervisor: Masoud Mazloom

ABSTRACT

Nowadays people share their experiences with their online followers on social media. This makes social media a good place to gather and analyze large-scale data for a specific user. In this thesis we focus on user-generated content from a recommendation perspective. Unfortunately, there is no explicit rating when users generate content on social media. We propose to use social factors such as likes, comments, hashtags and the length of a post instead of explicit ratings in venue recommendation. We collected over 3 million posts from Instagram spanning 472 venues in Amsterdam, generated by over 700,000 users. We evaluate our venue recommendation system in three different experiments. Our results show that the use of social factors is promising for a venue recommendation system.

Keywords

Recommender System, Venue Recommendation, Social Media Analysis

1. INTRODUCTION

Social media platforms such as Facebook, Instagram and Twitter are more popular than ever, with hundreds of millions of users using their services daily. Users now share their daily lives and experiences with an ever-growing online user base, providing a huge potential data set with information about where people are going, what they have done and, most importantly, how they experienced it. In return, other users react with likes, comments and even certain keywords presented as hashtags. We can use this social user-generated data to build large data sets of how people interact regarding different venues, mapped to each individual user. This gives us a basis to build a venue recommendation system which takes this input and converts it to a recommendation for a different user. Whereas users usually show their emotions by adding certain text or images, we can also utilize other data from these posts. Metadata such as the number of likes from other users, the number of comments, the number of hashtags used and even the length of the post can all be viewed as a rating of some kind. As opposed to other recommendation systems, we can use this metadata to create the data set we need.

In this project we focus on the user-generated data related to touristic venues in Amsterdam. Each year, Amsterdam receives over 32 million overnight stays by tourists, both foreign and national, with an additional 14 million day-visitors spending the day in Amsterdam[5].

Figure 1: Overview of the distribution of tourists in Amsterdam, highlighting the problem of clustering in certain areas. Venue recommendation can provide a solution by utilizing emotions being expressed on social media as ratings to create recommendations for new tourists and show them there is more than just the top venues.

In 2014, this amounted to over 6 million unique visitors in Amsterdam staying longer than one day[10]. This generates an estimated €5.7 billion for the city of Amsterdam. Current research shows that 85% of all visitors visit one or more museums, with the Van Gogh Museum, Rijksmuseum and Anne Frank House being the most popular by far. The geographical location of the museums and attractions also plays a large role in the decision-making process of what to visit. As can be seen in Figure 1, tourists are concentrated around a number of popular attractions such as the aforementioned Rijksmuseum. This particular visualization was made from the transactions made by tourists with the I-Amsterdam City Card.

This highlights a new problem. Tourists tend to stick to whatever attractions are close to them and which are accessible using the I-Amsterdam City Card. Amsterdam is currently reaching its limit for the number of tourists it can handle. Quantity is being put above quality. Apart from the large amount of litter this creates, most of the tourists visit the city center, overcrowding this area[11]. I-amsterdam proposes marketing the so-called Amsterdam Metropolitan Area (AMA)[3] in order to alleviate the massive number of tourists in the Amsterdam city center and to better distribute them throughout the AMA. The AMA offers a number of municipalities with all kinds of tourist attractions, increasing the number of things to do for tourists and paving the way for a better distribution. Venue recommendation based on social media can play a role in a more even spread of tourists across the AMA, using a tourist's own experience and the results thereof to provide new recommendations for other tourists.

Figure 2: Overview of our venue recommendation system. The first section describes the gathering of data from Instagram. The second section shows noise removal and cleaning of data. The third section shows the creation of three of our user-item matrices. Lastly, we show the use of our user-item matrix in recommendation.

The aim of this research is to propose a venue recommendation system which can recommend places to different tourists. To that end, we use a social factor-based recommendation system which, at its core, uses social factors such as likes, comments, hashtags and the length of a post instead of explicit ratings for venue recommendation.

In this thesis our contributions are as follows:

1. We create a large data set based on social media posts from Instagram specifically about touristic places in Amsterdam, including images and social factor metadata;

2. We propose a new way to use social factors as a rating in venue recommendation;

3. We carry out different experiments based on social factors and evaluate their performance with different algorithms and the optimal configuration of hyperparameters.

2. RELATED WORK

This section describes related work on Recommender Systems, the core of our proposal, and Social Media Analysis.

2.1 Recommender Systems

Recommender Systems give users recommendations on the subject they are looking for by filtering and comparing historical data and trying to predict a match based on user input [20]. There are several ways to do this, the first being content-based prediction: the actual content or metadata of the product, based on history, is used to make a new recommendation for the user [16]. An alternative approach, used in this thesis, is collaborative filtering [9]. Collaborative filtering is a popular technique for recommendation systems that gives recommendations by way of two methods: item-based recommendation or user-based recommendation [22]. User-based recommendation works by finding users that are similar to the given user based on ratings given for certain items. Items that these similar users like are then recommended to the user [9]. Item-based recommendation, on the other hand, works by using a user's previous ratings on items and finding similar unobserved items [21]. Both types of collaborative filtering are carried out by analyzing so-called user-item matrices: large tables in which each row is a unique user and each column is a unique item, for example a venue [7].
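To make the user-item matrix and user-based recommendation concrete, the sketch below builds a toy matrix and recommends venues from the most similar user. It is an illustration only: the user names, venue names, ratings and the choice of cosine similarity are assumptions made for this example and are not taken from the cited work.

```python
import math

# Toy user-item matrix: one row per user, one column per venue.
# All names and ratings are invented for illustration.
ratings = {
    "user_a": {"rijksmuseum": 9.0, "vangoghmuseum": 8.0},
    "user_b": {"rijksmuseum": 8.0, "vangoghmuseum": 9.0, "artis": 7.0},
    "user_c": {"rijksmuseum": 2.0, "vangoghmuseum": 9.0, "nemo": 5.0},
}


def cosine_similarity(a, b):
    """Cosine similarity computed over the venues both users have rated."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[v] * b[v] for v in shared)
    norm_a = math.sqrt(sum(a[v] ** 2 for v in shared))
    norm_b = math.sqrt(sum(b[v] ** 2 for v in shared))
    return dot / (norm_a * norm_b)


def most_similar_user(target, matrix):
    """Return the other user whose ratings are closest to the target's."""
    others = [u for u in matrix if u != target]
    return max(others, key=lambda u: cosine_similarity(matrix[target], matrix[u]))


def recommend(target, matrix):
    """Venues rated by the most similar user that the target has not rated yet."""
    neighbour = most_similar_user(target, matrix)
    return sorted(v for v in matrix[neighbour] if v not in matrix[target])
```

Here `recommend("user_a", ratings)` looks up the nearest neighbour of `user_a` and returns the venues that neighbour has rated but `user_a` has not.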

With social media being used more and more, recommendations based on user interaction on social media are on the rise. An example is Amato et al. [2], who propose recommendation based on past interactions, opinions and selections. Data was mostly gathered from a Yahoo Flickr data set. Wu et al. propose a Twitter-based recommender system called Kaleido, taking into account user behavior such as sharing and retweeting, as well as usage and location, in order to recommend new media [26].

Table 1: Instagram data set statistics

| Statistic | Number |
| --- | --- |
| Posts | 3,129,709 |
| Unique users | 726,238 |
| Hashtags | 472 |
| Hashtags with >50 posts | 152 |
| Posts for hashtags with >50 posts | 1,237,463 |
| Posts with both text and image | 599,756 |
| First post | October 28, 2010 |
| Latest post | April 7, 2016 |

Previous work has been done in venue recommendation, an example being the navigation of New York City with recommendations based on data gathered from Foursquare, Picasa and Flickr [27]. This has been elaborated upon further by combining historical and location/check-in based data to make even better recommendations, where multiple cities are combined for more accurate predictions [18]. Tiwari and Kaushik enrich tourist locations with crowd-based and supplementary data such as weather and traffic conditions in order to make recommendations and decision making for visiting new locations easier [25].

Different from most of the previous work, which uses explicit or implicit ratings, we propose a venue recommendation system which utilizes social factors such as likes, comments, hashtags and the length of a post as explicit ratings for the use in venue recommendation.

2.2 Social Media Analysis

Social media networks such as Instagram and Twitter are widely used for a variety of activities, mainly microblogging. This enables users to post daily updates about their lives with all sorts of content, from life events to emotions and opinions about things they experience in life [6]. Analysis can be done using both textual and visual data, combined with other modalities such as metadata. Studies have shown that using a single modality, such as only text or only visual information, yields lower results than combining it with other modalities. A study into Twitter event detection showed that combining text and image detection yielded a several percent higher accuracy [1]. The goal of this study is to also utilize multiple modalities and types of data, which can be improved even more by adding information such as user temporal behavior in addition to text, visual or metadata. Another study into multimodality on Twitter showed that adding location and time stamp data provided high results for event detection [28]. The sheer volume of data present on both the aforementioned Instagram and Twitter allows for great progress in machine learning.

However, to the best of our knowledge there is little to no research on multimodal analysis based on metadata from social media posts. Whereas most related work utilizes sentiment-based data, we aim to utilize the aforementioned social factors as the basis of our analysis.

Algorithm 1 Mining data from Instagram

n = 472, the number of hashtags found in the Amsterdam Marketing attractions list. venueList holds the preprocessed venues to be mined.

for i = 0 to n - 1 do
    venue = venueList[i]
    mediaListForVenue = getMediaForVenue(venue)
    for x = 0 to mediaListForVenue.Length - 1 do
        visData, textData, metaData = downloadMediaItem(mediaListForVenue[x])
        saveToDataBase(visData, textData, metaData)
    end for
end for
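Algorithm 1 can be sketched in Python as shown below. This is a minimal sketch, not the code used in the thesis: the three callables passed in are hypothetical stand-ins for the API client and the database layer, and they do not correspond to real Instagram API functions.

```python
def mine_venues(venue_list, get_media_for_venue, download_media_item, save_to_database):
    """Visit every venue hashtag, download each media item and store it.

    The three callables are hypothetical placeholders supplied by the caller:
    - get_media_for_venue(venue) returns an iterable of media references,
    - download_media_item(media) returns (visual, text, metadata),
    - save_to_database(visual, text, metadata) persists one item.
    """
    saved = 0
    for venue in venue_list:
        for media in get_media_for_venue(venue):
            vis_data, text_data, meta_data = download_media_item(media)
            save_to_database(vis_data, text_data, meta_data)
            saved += 1
    # Returning the count makes it easy to check how many items were stored.
    return saved
```

Passing the collaborators in as arguments keeps the loop itself independent of any particular API client, which also makes it straightforward to exercise with in-memory fakes.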

3. SOCIAL FACTOR RECOMMENDATION

In the section below, an overview is given of how we arrive at the prediction of results. Our method consists of four steps, starting with sentiment extraction on the data set extracted from Instagram. This data set is utilized in the creation of several user-item matrices. Firstly, matrices are created based on textual and visual sentiment; these are also combined into a new matrix. Furthermore, we created matrices for the number of likes, the number of comments, the number of hashtags and the length of a post. Lastly, the top two matrices were combined into a new matrix: LikesComments. Finally, we explain venue recommendation based on these user-item matrices.

3.1 Data set

The basis of our recommender system is a matrix based on the rating of users towards certain places of interest. To extract this rating from our posts, we first need to make the data usable for our sentiment extraction tool. To be able to analyze the effect of a multimodal recommender system, we first need a large data set to analyze. The platform of choice was Instagram. With a user base of over 700 million monthly active users[8], Instagram allows its users to post a combination of visual and textual expressions. We use the Instagram API to select media[14]. In line with the problem addressed in this thesis, we focused on hashtags related to the metropolitan area of Amsterdam. We mined a data set containing more than 3,000,000 posts based on 472 separate hashtags. These hashtags were selected from the Amsterdam attractions list provided by Amsterdam Marketing through Amsterdam Open Data[4].

Table 1 shows the main statistics for our data set. Most of the hashtags that were used had fewer than 50 posts per hashtag, leaving us with a total of 152 hashtags that contained a representative number of posts. For each hashtag in this list we stripped whitespace and special characters and replaced uppercase with lowercase letters, leaving one-word hashtags. This was done to adapt to the Instagram platform, where two-word hashtags are not permitted. A venue like "Van Gogh Museum" was transformed to the hashtag "vangoghmuseum". The blue column in Figure 2a shows a schematic representation of our data set. The purple column shows the cleaning of data described above. Accessing the Instagram API gives us more than textual and visual data: the number of likes, comments and hashtags per post is also included. Lastly, several metadata values such as the date and location of a post are also included.
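The hashtag cleaning step described above could look roughly like the following. The function name is hypothetical, and treating every character that is not a letter or digit as a "special character" is an assumption; the thesis does not list the exact characters that were removed.

```python
def venue_to_hashtag(venue_name):
    """Turn a venue name into a one-word, lowercase hashtag.

    Assumption: every character that is not a letter or a digit
    (whitespace, punctuation, symbols) is dropped.
    """
    return "".join(ch for ch in venue_name.lower() if ch.isalnum())
```

For example, `venue_to_hashtag("Van Gogh Museum")` gives `"vangoghmuseum"`, matching the transformation mentioned in the text.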


Figure 3: Performance of different algorithms on our likes user-item matrix, measured in RMSE and obtained by changing different parameters. This figure shows the results when changing the learn rate and keeping the other two parameters fixed (a), changing number of factors whilst keeping the iterations fixed and utilizing the best value for learn rate (b) and finally the best learn rate and number of factors when changing number of iterations (c). This gives us an RMSE of 0.434 for the MCMC algorithm with a learn rate of 0.005, 50 factors and 1000 iterations.

Further analysis showed that around 50 percent of the posts have both an image and a caption text. The maximum number of posts for a single user was 14,200. However, the user ranked at position 51 only had 913 posts. Upon further inspection, the top 50 users turned out to be mostly companies promoting Amsterdam. These users were excluded from the data set so that they would not skew our results. The algorithm used to mine the data from Instagram is shown in Algorithm 1.
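Excluding the most active accounts can be sketched as follows. This is an illustrative sketch under assumptions: posts are represented as dictionaries with a `user_id` key, which is a hypothetical layout chosen for the example and not the database schema used in the thesis.

```python
from collections import Counter


def drop_top_users(posts, top_n=50):
    """Remove all posts written by the top_n users with the most posts.

    Assumption: each post is a dict with a "user_id" key.
    """
    counts = Counter(post["user_id"] for post in posts)
    top_users = {user for user, _ in counts.most_common(top_n)}
    return [post for post in posts if post["user_id"] not in top_users]
```

With `top_n=50` this mirrors the cut-off described above; the threshold is kept as a parameter so that other cut-offs can be tried.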

3.2 User-item matrix creation

In this section we explain the creation of several user-item matrices. We utilize several modalities inferred from Instagram: visual, textual and social metadata. We hypothesize that we can use social metadata in a recommender system. Furthermore, we hypothesize that social metadata will perform similarly to sentiment-based modalities. Our algorithm for creating the sentiment matrices was initially based only on textual and visual sentiment, together with the combined matrix. This data was not experimented upon in this thesis. However, five more matrices have been created in exactly the same way. Since these are based on metadata, the sentiment extraction step could be skipped. This is roughly shown in the pink column in Figure 2a.

3.2.1 Number of likes user-item matrix

In order to create a user-item matrix based on the number of likes, $Matrix_L$, we use the exact number of likes given by users of Instagram for post $P_{u,i}$. With the number of likes varying greatly, we use $\log_{10}$ in order to decrease the range to fall between [0,6]. We then calculate the minimum and maximum scores in our new data set. These scores are used to normalize the entire transformed set to the scale [1,11]. This allows the creation of $Matrix_L$.
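The log transformation followed by min-max normalization can be sketched as below. This is a minimal illustration rather than the thesis code: the function name is hypothetical, and the handling of posts with zero likes (mapped to a log value of 0, since $\log_{10}$ is undefined at 0) is an assumption, as the text does not say how such posts were treated.

```python
import math


def counts_to_ratings(counts, low=1.0, high=11.0):
    """Map raw counts (e.g. likes per post) to ratings on the scale [low, high].

    Step 1: log10-transform each count to compress the range.
    Step 2: min-max normalize the transformed values to [low, high].
    Assumption: a count of 0 is given the log value 0.0.
    """
    logs = [math.log10(c) if c > 0 else 0.0 for c in counts]
    smallest, largest = min(logs), max(logs)
    if largest == smallest:
        # All values are identical, so there is no range to stretch.
        return [low for _ in logs]
    scale = (high - low) / (largest - smallest)
    return [low + (value - smallest) * scale for value in logs]
```

The same function applies unchanged to the number of comments, the number of hashtags and the post length, since only the input counts differ between the matrices.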

3.2.2 Number of comments user-item matrix

In similar fashion to user-item matrix $Matrix_L$, the user-item matrix $Matrix_C$ is created using the number of comments given on post $P_{u,i}$. After applying a $\log_{10}$ transformation to the number of comments, we normalize from a scale of [0,3] to the scale [1,11]. This gives us the comments user-item matrix $Matrix_C$.

3.2.3 Number of hashtags user-item matrix

To create a user-item matrix based on the number of hashtags used in post $P_{u,i}$, which we call $Matrix_H$, the same approach as for the previous two matrices is used. The number of hashtags is transformed with $\log_{10}$ to a scale of [0,2]. Afterwards, this rating $r$ is normalized to a scale of [1,11]. Now we create $Matrix_H$.

3.2.4 Post-length user-item matrix

The final user-item matrix utilizing social metadata from post $P_{u,i}$ is based on the length of $P_{u,i}$. We take the number of characters as length $l$. After transforming $l$ with $\log_{10}$ we normalize from the scale [0,3.3] to [1,11], completing the creation of $Matrix_{PL}$.

3.2.5 Combined likes-comments user-item matrix

Following the example of a combined visual and textual sentiment user-item matrix[13], we want to create a combined user-item matrix for $Matrix_L$ and $Matrix_C$: $Matrix_{L,C}$. This is done by average pooling the ratings from both matrices into the new matrix.
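Average pooling of two user-item matrices can be sketched as follows. The sparse dictionary representation keyed by (user, venue) is an assumption made for this example, as is the choice to keep a single rating unchanged when an entry appears in only one of the two matrices.

```python
def average_pool(matrix_a, matrix_b):
    """Combine two sparse user-item matrices by averaging their ratings.

    Each matrix is a dict mapping (user, venue) to a rating.
    Assumption: an entry present in only one matrix keeps that rating.
    """
    pooled = {}
    for key in set(matrix_a) | set(matrix_b):
        values = [m[key] for m in (matrix_a, matrix_b) if key in m]
        pooled[key] = sum(values) / len(values)
    return pooled
```

For a post with a likes rating of 9 and a comments rating of 5, the pooled rating is 7.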

3.3 Predicting new ratings and venues

Our user-item matrices serve as input for LibRec[12], an existing recommender system library that provides state-of-the-art recommendation algorithms. LibRec has over 30 recommendation algorithms that can be used to get the best results. We selected two of the best matrix factorization algorithms LibRec used on one of their sample data sets, since their explicit ratings were similar to our data set. The two algorithms we used are BiasedMF and SVD++. Furthermore, the LibFM Factorization Machine [19] provided us with three more algorithms: Stochastic Gradient Descent (SGD), Alternating Least Squares (ALS) and Bayesian inference using Markov Chain Monte Carlo (MCMC).

4. EXPERIMENTAL SETUP

4.1 Implementation Details

To get the data set ready for insertion into LibRec, we split it 80-20 into a training and a test set. Furthermore, we experiment with finding the optimal parameters for our matrices to get the best results and utilize different algorithms, as described in the sections below.

Figure 4: Performance of our different algorithms on the comments-based user-item matrix, measured in RMSE and obtained by iterating through the sets of different parameters. This figure shows the results when changing the three different parameters, similar to Figure 3. A learn rate of 0.0005, 75 factors and 1000 iterations give us an RMSE of 0.769 with the MCMC algorithm.

Table 2: Used values per hyperparameter

| Parameter | Values |
| --- | --- |
| Learn rate | 0.001, 0.005, 0.01, 0.05, 0.1 |
| Factors | 5, 10, 15, 20, 25, 30, 40, 50, 75, 100 |
| Iterations | 25, 50, 75, 100, 150, 200, 250, 500, 1000 |

For our purpose we considered using the five best-performing algorithms used by LibRec on their own data sets[17]. We consider Biased Matrix Factorization (BiasedMF)[15], Singular Value Decomposition (SVD++)[15] and Item-based K-Nearest-Neighbour (ItemKNN)[17].

We run each of the three algorithms with different combinations of hyperparameters. BiasedMF and SVD++ use the number of factors, the number of iterations and the learn rate. For ItemKNN we use the similarity method, the number of neighbours and shrinkage. The values used can be seen in Table 2. These values are also used for our other three algorithms from LibFM, as described in Section 3.3.

Following LibRec's example[17], we set our default values to those of their best results: Factors F = 20, Learn Rate LR = 0.01 and Iterations I = 25. We first test for the optimal number of iterations, varying the value over {25, 50, 75, 100, 150, 200, 250, 500, 1000}. We use this value when finding the optimal Learn Rate from {0.001, 0.005, 0.01, 0.05, 0.1}. Finally, we use the previously determined values for both parameters to find the optimal number of factors from {5, 10, 15, 20, 25, 30, 40, 50, 75, 100}. For ItemKNN the parameters are different: Neighbours = 25, Similarity Method = PCC and Shrinkage = 25. Since the similarity method is fixed, we first find the optimal number of neighbours. Afterwards we use this to find the best resulting combination with shrinkage. All runs use 10-fold validation across an 80-20 distribution for training and testing. Because ItemKNN is highly intensive with regard to computing power, it was dropped during the experimentation phase.
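The one-parameter-at-a-time search described above can be sketched as follows. This is a hedged illustration and not the thesis code: `evaluate` is a hypothetical callable that would train a model with the given settings and return its RMSE, and the dictionary keys are names chosen for this example rather than LibRec or LibFM configuration keys.

```python
# Candidate values from Table 2 and the default values from this section.
GRID = {
    "learn_rate": [0.001, 0.005, 0.01, 0.05, 0.1],
    "factors": [5, 10, 15, 20, 25, 30, 40, 50, 75, 100],
    "iterations": [25, 50, 75, 100, 150, 200, 250, 500, 1000],
}
DEFAULTS = {"factors": 20, "learn_rate": 0.01, "iterations": 25}


def sequential_search(evaluate, defaults, grid, order):
    """Tune one hyperparameter at a time, keeping earlier winners fixed.

    `evaluate` is a hypothetical callable: it receives a settings dict and
    returns the RMSE obtained with those settings (lower is better).
    `order` lists the parameter names in the order they should be tuned.
    """
    best = dict(defaults)
    for name in order:
        scores = {}
        for value in grid[name]:
            candidate = {**best, name: value}
            scores[value] = evaluate(candidate)
        # Keep the value with the lowest RMSE before tuning the next parameter.
        best[name] = min(scores, key=scores.get)
    return best, evaluate(best)
```

Tuning the parameters in sequence needs 9 + 5 + 10 = 24 evaluations for the values in Table 2, compared with 9 × 5 × 10 = 450 for a full grid, at the cost of possibly missing interactions between parameters.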

4.2 Sentiment extraction

The baseline to which we compare our recommender system is a matrix based on the sentiment of users towards certain places of interest. To extract a sentiment rating from our posts, we first need to make the data usable for our sentiment extraction tool.

4.2.1 Creating a usable set of posts

The data mined from Instagram has been separated into different tables within a database. The input we need for sentiment extraction is one line per post, with the post and user ID, denoted $p_u$, together with the specific hashtag $i$ for identification purposes later on. Below, an example is shown of the input we use per separate post with regard to sentiment.

PostId: 12 EndPostId UserId: 179418561 EndUserId poi: rijksmuseumgardens Endpoi beautiful city brilliant weekend amsterdam thebetterhalf fountain rijksmuseumgardens cute tb
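A line in this layout can be split back into its parts with a regular expression, as in the sketch below. The pattern is inferred from the single example shown above and is therefore an assumption; the function and field names are hypothetical.

```python
import re

# Pattern inferred from the example line; marker names are taken from it verbatim.
LINE_PATTERN = re.compile(
    r"PostId:\s*(?P<post_id>\d+)\s*EndPostId\s*"
    r"UserId:\s*(?P<user_id>\d+)\s*EndUserId\s*"
    r"poi:\s*(?P<poi>\S+)\s*Endpoi\s*(?P<text>.*)"
)


def parse_post_line(line):
    """Split one input line into post id, user id, venue hashtag and text."""
    match = LINE_PATTERN.match(line.strip())
    if match is None:
        raise ValueError("line does not follow the expected layout")
    return {
        "post_id": int(match.group("post_id")),
        "user_id": int(match.group("user_id")),
        "poi": match.group("poi"),
        "text": match.group("text").strip(),
    }
```

Keeping the identifiers on the same line as the text makes it possible to map each sentiment score back to its user and venue after extraction.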

4.2.2 Sentiment extraction through SentiStrength

We run our created set of posts through SentiStrength [24] to determine the sentiment per post. SentiStrength uses previous psychology research to determine both negative and positive sentiment in short, informal texts. This gives a near-human-accuracy estimate of sentiment $S$ within our posts[23]. We can then combine the output we get from SentiStrength into a user-item matrix to get it ready for the recommender system. This is further detailed in Section 3.2. Below, the output of a single post $p_{u,i}$ that has gone through SentiStrength can be seen. We can see that this post has a positive $S$ of 3 (out of 5) and a negative $S$ of -1 (out of -5). This results in an average sentiment of 2, which is positive.

3 -1 beautiful[2] city[0] brilliant[1] weekend[0] amsterdam[0] thebetterhalf[0] fountain[0] rijksmuseumgardens[0] cute[1] tb[0] [[Sentence=-1,3=word max, 1-5]] [[[3,-1 max of sentences]]]
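The two leading numbers and the per-word scores can be read from such an output line as sketched below. The layout is inferred from the single example above and is not taken from the SentiStrength documentation, so the parsing rules and function names here are assumptions.

```python
import re


def parse_sentiment_line(line):
    """Return (positive, negative) from the first two fields of an output line.

    Assumption: the line starts with the positive score followed by the
    negative score, separated by whitespace, as in the example above.
    """
    fields = line.split()
    return int(fields[0]), int(fields[1])


def word_scores(line):
    """Collect the per-word scores written as word[score]."""
    return {word: int(score) for word, score in re.findall(r"(\w+)\[(-?\d+)\]", line)}
```

For the example output above, `parse_sentiment_line` returns the pair `(3, -1)`.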

4.2.3 Sentiment user-item matrices

As described in Section 4.2, we used SentiStrength to extract textual sentiment $S_T$ from our post $P_{u,i}$. SentiStrength gives us both a positive rating between [0,5] and a negative sentiment between [-5,0]. Using average pooling we got the average sentiment of a post. In order to prevent negative ratings in our Recommender System and to accommodate the SentiStrength output, we normalized the ratings to a new scale of [1,11]. Afterwards, we were able to create the textual user-item matrix, denoted as $Matrix_T$.

Figure 5: The result of using different algorithms on our hashtag-based user-item matrix $Matrix_H$, measured in RMSE. This figure shows the results when iterating through our sets of parameter values. Our lowest RMSE is 1.062, using the MCMC algorithm with a 0.0005 learn rate, 25 factors and 1000 iterations.

Figure 6: This figure shows the results of our experiments on the matrix $Matrix_{PL}$, based on the length of a post. We achieve an RMSE of 0.703 at a learn rate of 0.05, 100 factors and 1000 iterations with the MCMC algorithm.

4.3 Evaluations

The different algorithms and combinations of hyperparameters from Section 4.1 are all run through LibRec. To determine whether using a multimodal user-item matrix provides better results, we run each sequence of iterations for all five matrices: number of likes, number of comments, number of hashtags, length of post and combined. LibRec provides us with a clear output file containing all predictions and the results of predetermined evaluation measures. We aimed to evaluate performance using two error-based measures: Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE). Since not all algorithms provide us with an MAE output, we only consider RMSE. RMSE is a negatively-oriented error measure: lower scores are better, and 0 is the lowest possible score, reached when all predictions are correct.
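For reference, the two error measures can be computed as in the sketch below. These are the standard definitions written out by hand for illustration; in the experiments the values are reported by the recommender libraries themselves.

```python
import math


def rmse(predicted, actual):
    """Root Mean Squared Error: square root of the mean squared difference."""
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))


def mae(predicted, actual):
    """Mean Absolute Error: mean of the absolute differences."""
    absolute = [abs(p - a) for p, a in zip(predicted, actual)]
    return sum(absolute) / len(absolute)
```

Because the differences are squared before averaging, RMSE penalizes large individual errors more heavily than MAE does.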

4.4 Experiments

To be able to determine whether Social Factors can be used in Venue Recommendation, we propose the following research questions:

1. RQ1: How does the use of social factors perform in venue recommendation?

2. RQ2: Does the use of social factors in recommendation perform equally when compared to the use of visual and textual sentiment?

3. RQ3: Which parameters are optimal for the data sets we use in our experiments?

The experiments in the sections below are carried out to answer our research questions. The yellow column in Figure 2a gives a visual representation of how we use our user-item matrices.

4.4.1 Number of likes for venue recommendation

In this experiment we consider using the number of likes on a post for recommendation. We find the optimal combination of parameters LR, F and I for different algorithms by changing the values as described in Section 4.1.

4.4.2 Number of comments for venue recommendation

Similar to utilizing number of likes, we now experiment with different algorithms and parameter values when using the number of comments on a post. We used the exact same setup as for the number of likes.

4.4.3 Number of hashtags for venue recommendation

The third variant of social factors, number of hashtags, is used in this experiment to see the effect in recommendation of venues. Again, we use the setup described in Section 4.1 to carry out our experiment, finding the optimal parameter values.

Figure 7: Combining two of the most important social factors gives the results as shown in these graphs. The same experimental setup as before has been used. We achieve our second best RMSE of 0.614 for a 0.01 learn rate, 20 factors and 1000 iterations on the MCMC algorithm.

4.4.4 Length of post for venue recommendation

In this experiment we use the length of a post, measured as its number of characters, to find the best recommendation result. In line with the experiments in the previous sections, we vary the values of LR, F and I to find the optimal result.

4.4.5 Combination of number of likes and comments for venue recommendation

In our final experiment, we combine the two best-performing matrices $Matrix_L$ and $Matrix_C$ into $Matrix_{L,C}$. In order to determine whether a multimodal matrix with social factors provides adequate results in recommendation, we again utilize the setup from Section 4.1 and change the values of our parameters in order to find the best result.

5. RESULTS

We show the results of our analysis using social factors for venue recommendation in Figures 3, 4, 5, 6 and 7. The comparison of our top results is shown in Table 3.

5.1 Number of likes for venue recommendation results

In Figure 3 we plot the results for our matrix based on the number of likes, $Matrix_L$. As can be seen in Figure 3 (a), the MCMC algorithm gives us the best result of 0.452 RMSE with a learn rate of 0.005. Factors are fixed at 20 whilst the number of iterations is set at 50. As shown in Figure 3 (b), the MCMC algorithm further drops our RMSE to 0.451 at 50 factors, utilizing the learn rate found before and going through the factors as described in Table 2. Finally, when going through our iterations from 25 to 1000, we find the optimal combination of parameters. At 1000 iterations the MCMC algorithm gives us the lowest RMSE of 0.434. This final result can be seen in Figure 3 (c).

5.2 Number of comments for venue recommendation results

Figure 4 plots the results for the number of comments on a post, created with $Matrix_C$. Looking at Figure 4, we see that while iterating through our learn rate values the best RMSE is 0.769 with a learn rate of 0.0005. This is also with factors fixed at 20 and iterations fixed at 50, in a similar fashion to the experiment for $Matrix_L$. Our second experiment on $Matrix_C$ yields an RMSE of 0.769 with 75 factors. This is only slightly lower than our first experiment and can be seen in Figure 4 (b). Changing the number of iterations, with our two optimal parameter values fixed, gives us the same RMSE of 0.769 on the MCMC algorithm. The final set of parameters is a learn rate of 0.0005, 75 factors and 1000 iterations. This is plotted in Figure 4 (c).

5.3 Number of hashtags for venue recommendation results

The output of our third set of experiments, based on our matrix with the number of hashtags $Matrix_H$, is plotted in Figure 5. As shown in Figure 5 (a), a learn rate of 0.0005 provides a first lowest RMSE of 1.062. Similar to $Matrix_L$ and $Matrix_C$, this is while using the MCMC algorithm. Figure 5 (b) highlights the second optimal parameter value, which is a slightly lower 1.062 at 25 factors. On our third run we get a range of RMSE values from 1.116 to 1.061, our lowest RMSE being achieved with a value of 1000 iterations. $Matrix_H$ is the first social factor to score an RMSE above 1.0.

5.4 Length of post for venue recommendation results

Our final set of experiments consists of optimizing parameters for data based on the length of a post, where length is measured as the number of characters in a post. Similar to the other three experiments, the MCMC algorithm provides us with the best results. Figure 6 (a) shows an RMSE of 0.703 with a learn rate of 0.05. Fixing the learn rate at this value whilst iterating through the number of factors gives us our best result at 100 factors (see Figure 6 (b)): again, an RMSE of 0.703. Finally, with the two values fixed, 1000 iterations give the final score of 0.703 RMSE. This is depicted in Figure 6 (c).

5.5 Combination of number of likes and comments for venue recommendation results

In order to see whether combining different social factors gives us better results, we combine two of the most important social factors in this experiment: likes and comments. The results are shown in Figure 7. Whilst iterating through the list of values, a learn rate of 0.01 gives us a lowest RMSE of 0.648. Unlike in the other experiments, this is achieved not by the MCMC but by the SVD++ algorithm. With the learn rate fixed, we find that the SVD++ algorithm again gives a lowest RMSE of 0.648 at 20 factors. Finally, the MCMC algorithm outperforms SVD++ with a final lowest RMSE of 0.614 at 1000 iterations. This is plotted in Figure 7 (c). Even though the final score is not lower than our best score for the likes matrix, the result is better than that of the comments matrix on its own.

Table 3: Comparison of RMSE performance for each of our user-item matrices and our comparison matrices based on sentiment. Results show that utilizing social metadata is on par with, and even outperforms, sentiment-based ratings.

| Matrix | RMSE |
| --- | --- |
| $Matrix_{Text}$ | 0.618 |
| $Matrix_{Vis}$ | 0.578 |
| $Matrix_{Text,Vis}$ | 0.464 |
| $Matrix_L$ | 0.434 |
| $Matrix_C$ | 0.769 |
| $Matrix_H$ | 1.062 |
| $Matrix_{PL}$ | 0.703 |
| $Matrix_{L,C}$ | 0.614 |

5.6 Complete set of results

In Table 3 we have a complete overview of our results for $Matrix_L$, $Matrix_C$, $Matrix_H$, $Matrix_{PL}$ and $Matrix_{L,C}$, including our comparison results from [13] using textual ($Matrix_{Text}$), visual ($Matrix_{Vis}$) and combined sentiment ($Matrix_{Text,Vis}$).

6. CONCLUSIONS & FUTURE WORK

This section describes possible research and optimizations for future work. Also, we reach our conclusion.

6.1 Future work

Based on our contributions, there are several optimizations that can be made to our work. Firstly, the gathering of data can be improved even further. We have filtered out the top 50 users given their extreme numbers of posts and high probability of being businesses or other influencers. Seeing as the goal is to base the venue recommendation on actual users and not on businesses, further filtering of such businesses could improve the results and usability of the research.

That being said, our end goal would be to provide venue recommendation. We now know that social factors can be used for recommendation. This enables future research to output these recommendations and create a recommendation for users. The focus here could be to filter out the most popular destinations and start recommending other popular destinations. This is in line with the overpopulation of the Amsterdam Metropolitan Area.

With this research being based on social factors that on some occasions outperform visual and textual sentiment, a new step could be creating multimodal experiments based on both social factors and visual and textual sentiment. Since these sets are likely to be complementary to each other, our rating predictions could become even more accurate, further improving venue recommendation to users.

Lastly, our recommendation could be improved further by using not only user-based data such as likes and comments, but historical data as well. The more recent a like, the more relevant it is and the more weight it should get. This could be improved upon even further by utilizing weather-based data. What were the temperature and precipitation like when the rating was given? It is likely that outdoor venues will be more popular on dry, sunny days and that indoor venues will be more popular on colder or rainier days.
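One possible way to give recent likes more weight is an exponential decay with a fixed half-life. The sketch below is only an illustration of that idea; the function recency_weight, the 30-day half-life and the example dates are assumptions and were not part of our experiments.

```python
from datetime import datetime, timedelta


def recency_weight(
    event_time: datetime, now: datetime, half_life_days: float = 30.0
) -> float:
    """Weight in (0, 1] that halves every half_life_days (assumed decay model)."""
    age_days = max((now - event_time).total_seconds() / 86400.0, 0.0)
    return 0.5 ** (age_days / half_life_days)


now = datetime(2017, 6, 24)  # arbitrary reference date for the example
fresh = recency_weight(now, now)
month_old = recency_weight(now - timedelta(days=30), now)
print(fresh, month_old)
```

Multiplying each like by such a weight before building the user-item matrix would let older interactions fade out gradually instead of being cut off at a fixed date.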

6.2 Conclusions

In this research we have proposed the use of social factors in venue recommendation, utilizing a large data set gathered from Instagram and splitting it into several user-item matrices, each representing a different social factor. We have shown that each of the social factors can be used in a user-item matrix for venue recommendation, providing usable results for every social factor as well as for the multimodal set of likes and comments. Furthermore, we have shown that the likes matrix even outperforms the user-item matrices based on visual and textual sentiment. The combination of likes and comments (RMSE 0.614) also outperforms the user-item matrix based on textual sentiment (RMSE 0.618). As described in Section 6.1, additional work is required to determine whether combining or extending these data sets yields further improvements in a multimodal setting.


References

[1] S. M. Alqhtani, S. Luo, and B. Regan. Fusing text and image for event detection in twitter. CoRR, abs/1503.03920, 2015.

[2] F. Amato, V. Moscato, A. Picariello, and G. Sperli. Recommendation in social media networks. In 2017 IEEE Third International Conference on Multimedia Big Data (BigMM), volume 00, pages 213–216, April 2017. URL doi.ieeecomputersociety.org/10.1109/BigMM.2017.55.

[3] Amsterdam Marketing. Amsterdam metropolitan area, 2016. URL http://www.iamsterdam.com/en/local/about-amsterdam/metropolitan-area. Online; accessed: 27-October-2016.

[4] Amsterdam Marketing. Amsterdam attractions, 2016. URL https://data.amsterdam.nl/dataset/attracties/. Online; accessed: 8-October-2016.

[5] Amsterdam Tourism & Convention Board. Amsterdam visitors profile. Amsterdam Visitors Profile, 2012.

[6] Y. Bae and H. Lee. A sentiment analysis of audiences on twitter: Who is the positive or negative audience of popular twitterers? In G. Lee, D. Howard, and D. Ślęzak, editors, Convergence and Hybrid Information Technology, pages 732–739, Berlin, Heidelberg, 2011. Springer Berlin Heidelberg. ISBN 978-3-642-24082-9.

[7] J. Bu, X. Shen, B. Xu, C. Chen, X. He, and D. Cai. Improving collaborative recommendation via user-item subgroups. IEEE Transactions on Knowledge and Data Engineering, 28(9):2363–2375, Sept. 2016. ISSN 1041-4347. URL doi.ieeecomputersociety.org/10.1109/TKDE.2016.2566622.

[8] S. Byford. Instagram is growing faster than ever, 2017. URL https://www.theverge.com/2017/4/26/15431872/instagram-monthly-active-users-700-million-growth. Online; accessed: 24-June-2017.

[9] Y. Cai, H. Leung, Q. Li, H. Min, J. Tang, and J. Li. Typicality-based collaborative filtering recommendation. IEEE Transactions on Knowledge and Data Engineering, 26(3):766–779, March 2014. ISSN 1041-4347. URL doi.ieeecomputersociety.org/10.1109/TKDE.2013.7.

[10] Centraal Bureau voor de Statistiek. Metropoolregio amsterdam in cijfers, 2016. URL http://www.ois.amsterdam.nl/feiten-en-cijfers/metropoolregio-amsterdam/toerisme/#. Online; accessed: 27-October-2016.

[11] B. Clark. Rijksmuseum director warns amsterdam "too dirty and crowded" from tourists, August 2014. URL http://www.iamexpat.nl/read-and-discuss/expat-page/news/rijksmuseum-director-warns-amsterdam-too-dirty-crowded-tourists. Online; accessed: 8-October-2016.

[12] G. Guo, J. Zhang, and N. Yorke-Smith. Trustsvd: Collaborative filtering with both the explicit and implicit influence of user trust and of item ratings. In Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, AAAI'15, pages 123–129. AAAI Press, 2015. ISBN 0-262-51129-0. URL http://dl.acm.org/citation.cfm?id=2887007.2887025.

[13] B. Hendriks. Multi-modal sentiment analysis for social venue recommendation. 2017.

[14] Instagram Inc. Instagram media endpoints, 2016. URL https://data.amsterdam.nl/dataset/attracties/. Online; accessed: 8-October-2016.

[15] Y. Koren. Factorization meets the neighborhood: a multifaceted collaborative filtering model. In Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 426–434. ACM, 2008.

[16] H. Li, F. Cai, and Z. Liao. Content-based filtering recommendation algorithm using hmm. In 2012 Fourth International Conference on Computational and Information Sciences, pages 275–277, Aug 2012.

[17] LibRec. Librec algorithms, 2016. URL http://librec.net/tutorial.html. Online; accessed: 8-October-2016.

[18] A. Noulas, S. Scellato, N. Lathia, and C. Mascolo. A random walk around the city: New venue recommendation in location-based social networks. In 2012 International Conference on Privacy, Security, Risk and Trust and 2012 International Conference on Social Computing, pages 144–153, Sept 2012.

[19] S. Rendle. Factorization machines with libFM. ACM Trans. Intell. Syst. Technol., 3(3):57:1–57:22, May 2012. ISSN 2157-6904.

[20] D. Sethi and A. Singhal. Comparative analysis of a recommender system based on ant colony optimization and artificial bee colony optimization algorithms. In 2017 8th International Conference on Computing, Communication and Networking Technologies (ICCCNT), volume 00, pages 1–4, July 2017. URL doi.ieeecomputersociety.org/10.1109/ICCCNT.2017.8204106.

[21] P. Su and H. Ye. An item based collaborative filtering recommendation algorithm using rough set prediction. In 2009 International Joint Conference on Artificial Intelligence, pages 308–311, April 2009.

[22] M. Tang, Y. Jiang, J. Liu, and X. Liu. Location-aware collaborative filtering for qos-based service recommendation. In 2012 IEEE 19th International Conference on Web Services, pages 202–209, June 2012.

[23] M. Thelwall, K. Buckley, G. Paltoglou, D. Cai, and A. Kappas. Sentiment strength detection in short informal text. Journal of the American Society for Information Science and Technology, 61(12):2544–2558, 2010. ISSN 1532-2890. URL http://dx.doi.org/10.1002/asi.21416.

[24] M. Thelwall, K. Buckley, and G. Paltoglou. Sentiment strength detection for the social web. Journal of the American Society for Information Science and Technology, 63(1):163–173, 2012. ISSN 1532-2890. URL http://dx.doi.org/10.1002/asi.21662.

[25] S. Tiwari and S. Kaushik. Information enrichment for tourist spot recommender system using location aware crowdsourcing. In 2014 IEEE 15th International Conference on Mobile Data Management, volume 2, pages 11–14, July 2014.

[26] C. Wu, Y. Zhang, J. Jia, and W. Zhu. Mobile contextual recommender system for online social media. IEEE Transactions on Mobile Computing, 16(12):3403–3416, Dec. 2017. ISSN 1536-1233. URL doi.ieeecomputersociety.org/10.1109/TMC.2017.2694830.

[27] J. Zahálka, S. Rudinac, and M. Worring. New yorker melange: Interactive brew of personalized venue recommendations. In Proceedings of the 22nd ACM International Conference on Multimedia, MM '14, pages 205–208, New York, NY, USA, 2014. ACM. ISBN 978-1-4503-3063-3. URL http://doi.acm.org/10.1145/2647868.2656403.

[28] X. Zhou and L. Chen. Event detection over twitter social media streams. The VLDB Journal, 23(3):381–400, June 2014. ISSN 1066-8888. URL