Streamwatch: evaluation of a twitter-based music recommendation system.

Ruben F. de Vries (10260218)
Bachelor thesis, 18 EC
Bachelor Opleiding Kunstmatige Intelligentie
University of Amsterdam, Faculty of Science
Science Park 904, 1098 XH Amsterdam

Supervisors: Isaac Sijaranamual and dhr. dr. Evangelos Kanoulas
Informatics Institute, Faculty of Science, University of Amsterdam
Science Park 904, 1098 XH Amsterdam

June 26th, 2015

Abstract

Streamwatchr is an online music service that provides real-time information about music listening behaviour around the world; the collected songs can be listened to on its website. After each song, Streamwatchr recommends a follow-up song, which turns the listening function into an Internet radio. How well does this music recommender work, and how does it compare to more popular music recommender systems like YouTube and LastFM? The first question can only be answered arbitrarily: not enough data is available, and any absolute answer to the question "how well does something work?" involves arbitrary choices. The second question has a relative answer, obtained by developing two different methods: implicit and explicit comparison. Implicit comparison means the users are not aware that the three music recommender systems are being compared; with explicit comparison they are. An A/B testing framework was set up locally for the implicit comparison, but it was not run in practice and therefore yielded no results. The explicit comparison was done by developing a separate website on which the three music recommender systems each return their follow-up song given an initial song; users then rate how good the follow-up song is and how original it is. Results show that Streamwatchr does not score as well as YouTube and LastFM on the quality of the follow-up song, but scores twice as well on originality.

Contents

1 Introduction
2 Literature Review
  2.1 Music Recommender Systems
  2.2 Evaluation
  2.3 Radio
3 Method
  3.1 Research Question 1
  3.2 Research Question 2
    3.2.1 Explicit
      3.2.1.1 YouTube
      3.2.1.2 LastFM
      3.2.1.3 Streamwatchr
      3.2.1.4 Playing songs
      3.2.1.5 Rating songs
      3.2.1.6 In Summary
      3.2.1.7 Design choices
    3.2.2 Implicit
4 Results
5 Conclusion
6 Discussion and future work

1 Introduction

Streamwatchr is an online music service that provides real-time information about music listening behaviour around the world. This is achieved by collecting and analysing tweets from Twitter users. All the real-time listening data is converted into statistics on Streamwatchr's website, which enables the user to find the most popular, currently listened-to, and unexpected songs.

Music has been part of everyday life since the introduction of portable music players (Discmans, mp3 players, etc.). Nowadays a separate portable music player is unnecessary: music can be streamed or played on every sort of smartphone. This means nearly infinite access to music, wherever the user may be. Because of this accessibility, the focus has shifted from merely listening to music to also sharing what is listened to. By analysing this publicly shared information, trends among music listeners can be derived.

Besides real-time statistics about listened songs, there is the possibility to play music on Streamwatchr's website. Every listened song is followed by a recommended song, which makes the listening function of Streamwatchr an online 'radio'. These follow-up songs consist only of songs that have been tweeted at least once and occur in MusicBrainz [6], and have thus been listened to by a sharing user. This means that the Streamwatchr radio is a radio provided by the users, for the users.

New song recommendation is the focus of this bachelor thesis. This main focus can be divided into two research questions. Firstly, it is currently not known how well the recommendation works: is the recommended song a good follow-up to the previous song? Secondly, the computed performance from the first research question needs to be compared to other recommendation systems, to learn how well it works in comparison to more popular systems like LastFM and YouTube. More music recommendation systems exist today (Spotify, Pandora, etc.), but LastFM was chosen because it was developed especially for recommending music, and YouTube for its popularity. Involving more than two systems would not have been possible within the available time, since other music recommender systems are not as easily accessible as LastFM and YouTube.

For the first research question it is necessary to define what a 'good' recommended song is. This paper will show that it is not possible to define how appropriate a recommended song is, given the limited dataset that Streamwatchr has provided. A different approach has therefore been taken: not the recommended songs themselves are rated, but the total user experience of the radio function.

While the first research question has an absolute answer, the second question has a more relative one. Streamwatchr is compared to the recommendation systems of YouTube and LastFM in an implicit and explicit way. The implicit method uses A/B testing in combination with the experience measure created for the first research question, while for explicit testing a separate webpage was set up


where users can explicitly rate all three recommender systems by comparing recommended songs given an initial song.

2 Literature Review

2.1 Music Recommender Systems

The current music recommender of Streamwatchr is based on the Google News Personalization ranking [1]. This news recommendation system is a domain-independent approach, which makes it possible for Streamwatchr to adopt it. Where Google uses the news articles that users read, Streamwatchr uses the songs that are listened to and tweeted. Three different algorithms are used for generating a recommendation: MinHash clustering, Probabilistic Latent Semantic Indexing (PLSI), and covisitation counts. Details about these algorithms are left out of this section since they are not covered in this project.

Music recommendation can be done in various ways. Next to domain-independent approaches, like the Google News Personalization recommender, there are domain-specific approaches. Not every (specific) music recommender uses the same input signals. Knowledge about these different signals, and how they are addressed, can lead to insights for the improvement and comparison of Streamwatchr's current recommender.

The Local Implicit Feedback Mining recommender from Yang is based on online and offline signals, in combination with a supervised learning algorithm [9]. For every user it is known which songs they like by means of a rating: an offline signal. The date, time and location of a user listening to a song were collected as online signals. Using the online signals as predictors and the offline data as labels, a supervised learning algorithm was trained; it was evaluated by computing precision and recall. While the gathered online signals correspond to the data gathered by Streamwatchr, the offline labels do not: Streamwatchr does not know which music every user likes.

Lee et al. (2011) also developed a music recommender using offline data, namely music playlists [5]. Their method consists of combining playlists from different users, where each playlist is divided into a head (the X most listened songs) and a tail (the remaining songs). Suppose there are two users, user1 and user2. If the head of user1's playlist occurs in the tail of user2's, the head of user2's playlist might be interesting for user1. The evaluation of this music recommender was done by presenting users a list of 20 recommended tracks, which could then be rated (offline evaluation). Streamwatchr does not currently provide a playlist function, though this might be a useful upgrade for its music recommender.

2.2 Evaluation

An online/implicit evaluation framework needs to be set up for the evaluation of the three music recommender systems. Controlled experiments with online website users are often conducted using the A/B testing framework, which examines which of several alternatives is more effective. The framework is best explained by means of an example:

Suppose you own a website where electronic devices are sold and you are not satisfied with the number of products sold. You want to know whether the placement of the purchase button on the product page affects sales numbers. The online A/B testing framework can be used to perform such an experiment. The users (experimental units) are divided, when entering the website, into group A or group B: users from group A are presented a product page where the purchase button is displayed on the left side, and users in group B a page where the button is displayed on the right side. Over time, comparing the sales of both designs can answer where to display the buy button. With this online framework it can thus be determined at the end of the study which option (A or B) is the most effective, and therefore preferred over the other [3].
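The mechanics of such an experiment can be sketched in a few lines of Python (all names here are hypothetical, purely to illustrate the framework): each visitor is assigned a variant at random on their first visit and keeps it on later visits, and at the end the per-variant conversion rates are compared.

```python
import random

def assign_variant(visitor_id, assignments):
    """On the first visit, place the visitor in group A or B at random;
    on later visits, return the stored group so the experience is stable."""
    if visitor_id not in assignments:
        assignments[visitor_id] = random.choice(["A", "B"])
    return assignments[visitor_id]

def conversion_rates(purchases, visitors):
    """Fraction of visitors that bought, per variant."""
    return {v: purchases[v] / visitors[v] for v in visitors}
```

The key property is that the split is random, so over enough visitors any other difference between the groups averages out and the remaining difference in conversion rate can be attributed to the button placement.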

Besides online/implicit evaluation there is offline/explicit evaluation, where the experimental units are more explicitly involved in the research. With offline/explicit evaluation the users are aware that multiple systems are being compared and are explicitly asked to rate them, whereas with implicit comparison they are not. The experimental users are therefore more focussed on the compared systems (in this paper, three music recommender systems) and the purpose of the experiment, which leads to a more reliable outcome that can then be related and compared to the online/implicit evaluation outcome.

2.3 Radio

The initiators of Streamwatchr have added music playback and song recommendations to create a radio function with songs that have been tweeted by people all over the world. This means that users of Streamwatchr are presented a radio which is implicitly made by music-sharing people.

Radio is originally a form of wireless telecommunication in which a radio channel spreads messages in the form of radio waves. The first radio was successfully developed in the late 19th century by Guglielmo Marconi, who built upon earlier work of Nikola Tesla and Heinrich Hertz.

The first radios were analogue, in which the transmitted radio waves were modulated with either amplitude modulation (AM) or frequency


modulation (FM). This technique was no longer necessary with the introduction of the digital signal, also referred to as DAB [8].

The newest medium for listening to the radio is the Internet, in other terms: Internet radio. Every popular radio station in the Netherlands can now be listened to on the website www.nederland.fm.

Streamwatchr differs from the other radio stations with respect to how the music is provided. Where with popular stations like Radio 538 and SkyRadio the music is chosen by radio DJs, at Streamwatchr the music is provided by an algorithm, and indirectly by the people around the world who have shared their music tastes on Twitter.

3 Method

3.1 Research Question 1

For answering the first research question (how well does the music recommender of Streamwatchr work?) it is necessary to define what an appropriate follow-up song is. The first intention was to train an unsupervised learning algorithm on the data collected from Streamwatchr's website, to cluster listened-to songs as appropriate follow-up songs or not.

The saved data from Streamwatchr's website was already visually accessible as statistics and graphs via their back-end website, but for training an algorithm the raw data was needed, which could be extracted by executing the following GET request:

curl --silent --user SecretAPIKey: \
    http://zookst22.science.uva.nl:8008/api/v0/query \
    -d @sw.json | jq -c '.hits.hits[]' > out.jsons

By executing this query a dataset of 33,744 raw data points was received, in the following format:

{"_id": "Qre3bYTqSXiWLcCO2r6stQ",
 "_index": "streamwatchr",
 "_score": 1,
 "_source": {
   "created_at": "2014-10-30T12:31:59.863+01:00",
   "event_properties": {"show": true},
   "event_type": "toggle_video",
   "received_at": "2014-10-30T12:32:01.565354+01:00",
   "state": {
     "browser": "chrome",
     "browser_dimensions": {"height": 1308, "width": 1388},
     "browser_version": "38.0.2125.111",
     "current_location": "IP-address",
     "currently_playing": {
       "artist": "Ed Sheeran",
       "mbId": [{"mbId": "b8a7c51f-362c-4dcb-a259-bc6e0095f0a6",
                 "name": "Ed Sheeran"}],
       "song": "thinking out loud",
       "state": "playing",
       "video_id": "rp1DJL_SIys"},
     "ip_address": "IP-address",
     "language": null,
     "page": "home",
     "platform": "macos",
     "player_video_shown": false,
     "screen_resolution": {"height": 1440, "width": 2560},
     "user_agent_string": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_0) AppleWebKit/537.36 (KHTML, like Gecko)",
     "visitor_key": "0091f408-08cc-422f-9f00-a7849e695ed9"}},
 "_type": "event"}

Every raw data point is created by an event that occurred while a user was using Streamwatchr's website. Events like listening to a song, skipping to the next song, watching the included YouTube video and closing the browser are stored in the event_type parameter. Hence, every data point corresponds to an event, which means that there have been a total of 33,744 events since the existence of Streamwatchr (2 years). Of this total, only 1306 events were created while users were actually playing a song on the website. This means that there are not enough data points for training and developing an unsupervised learning algorithm within the time available for this project. A different approach was therefore taken for qualifying Streamwatchr's radio function, one in which all raw data points can be included: creating a total user-experience score for a user's behaviour on Streamwatchr's website.

The visitor_key is, like event_type, another parameter, as can be seen in the data-point example. A visitor_key is assigned to every user that starts a new session on Streamwatchr's website. By chronologically sorting the data points (read: events) containing the same visitor_key, it is possible to create a timeline from the moment a user starts a session until the moment the user leaves it (closes the browser). This means that all data points are included, which can lead to a timeline like figure 1.

The entire user experience (the timeline of all of a user's events) indirectly indicates how well the music recommender performs. By rating every single event type, in combination with its parameters, from zero to one, a total user-experience score can be computed. See the following example.

Figure 1: Example of a user-experience timeline for one unique user.

A Python program was written to compute the average over all user-experience scores, given the entire raw data set (dataextraction.py with out.txt as input). The event-type scores can be adjusted to give more or less weight to certain events, and event scores can be added or deleted.
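As an illustration of this computation (the actual dataextraction.py is not reproduced here, and the event weights below are invented for this sketch, since the thesis picks them intuitively), the grouping by visitor_key and per-user averaging could look like:

```python
from collections import defaultdict

# Hypothetical event weights in [0, 1]; skipping suggests a poor
# recommendation, playing through suggests a good one.
EVENT_SCORES = {
    "play_song": 1.0,
    "toggle_video": 0.6,
    "skip_song": 0.2,
    "close_browser": 0.0,
}

def user_experience(events):
    """Average the per-event scores for one visitor's timeline;
    unknown event types get a neutral 0.5."""
    scores = [EVENT_SCORES.get(e["event_type"], 0.5) for e in events]
    return sum(scores) / len(scores) if scores else 0.0

def average_experience(datapoints):
    """Group raw data points by visitor_key, sort each timeline
    chronologically, and average the per-user experience scores."""
    timelines = defaultdict(list)
    for dp in datapoints:
        src = dp["_source"]
        timelines[src["state"]["visitor_key"]].append(src)
    per_user = []
    for events in timelines.values():
        events.sort(key=lambda e: e["created_at"])
        per_user.append(user_experience(events))
    return sum(per_user) / len(per_user)
```

Changing a weight in EVENT_SCORES is all that is needed to give an event more or less influence, which is the calibration knob mentioned below.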

The arbitrariness of this method is high: the given event scores are not trained values but are picked intuitively. Answering a question about how well a


system works with an absolute answer will, however, always be arbitrary. One method for reducing the arbitrariness is to calibrate the event values given the outcome of the implicit and explicit comparison, which is discussed in section 3.2 of this paper.

3.2 Research Question 2

The following section focusses on a research question with a relative answer, whereas the first research question had an absolute answer: how well does the music recommender of Streamwatchr work in comparison to other music recommender systems like LastFM and YouTube? In the remainder of this paper, Streamwatchr, LastFM and YouTube will be referred to as the three music recommender systems or the three systems.

The first intention was to compare the three systems by means of the unsupervised learning approach that would have been developed while answering the first research question (how this would have been done is described below). Given the lack of data, the choice was made to approach the comparison with two different methods: explicit and implicit comparison.

3.2.1 Explicit

For the explicit comparison a separate website was set up on which the three music recommender systems return their follow-up song, given an initial input song from the user. The best-working music recommender can be deduced by letting users explicitly rate the obtained follow-up songs.

In order to achieve this it is necessary that all three systems have an API from which a recommended song can be obtained, given an initial input song. The next step is to create a playback function so the user can listen to the recommended songs, which can then be rated. For each of the three systems a brief description of how the follow-up song can be retrieved is given below, followed by a section about playback and rating.

3.2.1.1 YouTube YouTube is a website where videos can be uploaded, shared and viewed among users. In recent years, however, it often serves as a website for listening to music. YouTube automatically provides a next video (read song) on the right side of the website when a user is listening to a song.

This is the recommendation of YouTube, given a listened-to song. For the explicit comparison this recommendation must be extracted, which can be done by sending a search request to the YouTube server in the following format [2]:


Figure 2: Picture of YouTube’s recommendation.

https://www.googleapis.com/youtube/v3/search?relatedToVideoId=" +VideoID+"&type=video&part=snippet&key=APIkey

The result of this search query is a list of recommended videos (songs) with additional video data. One of the query's parameters is the VideoID (every video on YouTube has a unique VideoID), which is needed to retrieve the associated recommended videos. It is therefore important to obtain the VideoID corresponding to the initial song on YouTube. A YouTube search query can be used to acquire such a VideoID:

https://www.googleapis.com/youtube/v3/search?part=snippet&q=" +Query+"&type=video&key=APIkey

The ‘Query’ parameter consists of the search keywords (song name and artist from the initial song) concatenated to each other with the + sign.
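The two queries above can be sketched as small URL-building helpers in Python; urlencode takes care of concatenating the keywords with the + sign. The function names are invented for this sketch, and APIkey must be a valid YouTube Data API v3 key.

```python
from urllib.parse import urlencode

YT_SEARCH = "https://www.googleapis.com/youtube/v3/search"

def video_search_url(artist, song, api_key):
    """Build the search URL that resolves an initial song to a VideoID;
    urlencode joins the keywords with '+' as described above."""
    params = {"part": "snippet", "q": f"{artist} {song}",
              "type": "video", "key": api_key}
    return f"{YT_SEARCH}?{urlencode(params)}"

def related_videos_url(video_id, api_key):
    """Build the URL that asks for videos related to a given VideoID,
    i.e. YouTube's recommendation for the listened-to song."""
    params = {"relatedToVideoId": video_id, "type": "video",
              "part": "snippet", "key": api_key}
    return f"{YT_SEARCH}?{urlencode(params)}"
```

Sending these URLs with any HTTP client returns JSON from which the VideoID, respectively the first recommended video, can be read.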

3.2.1.2 LastFM LastFM is a website which is especially developed for giving music recommendations.

The information from the webpage example above can also be acquired by sending a search query to the LastFM server. A top-X list of recommendations is returned given an input song, from which the first recommendation is extracted. The search query of LastFM has the following format and can be executed with a GET request [4]:

http://ws.audioscrobbler.com/2.0/?method=track.getsimilar&artist="
    +ArtistQuery+"&track="+SongQuery+"&limit=2&autocorrect=1
    &api_key=APIkey&format=json

Parameters for this search query are the artist and title of the initial song, an API key, and the number of returned recommendations. Here, too, the parameters have to be concatenated with the + sign.
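A minimal sketch of the LastFM step (helper names invented; the parsing assumes the documented JSON layout of track.getsimilar responses, with the similar tracks under similartracks.track):

```python
from urllib.parse import urlencode

LASTFM_API = "http://ws.audioscrobbler.com/2.0/"

def getsimilar_url(artist, song, api_key, limit=2):
    """Build the track.getsimilar request for an initial song."""
    params = {"method": "track.getsimilar", "artist": artist,
              "track": song, "limit": limit, "autocorrect": 1,
              "api_key": api_key, "format": "json"}
    return f"{LASTFM_API}?{urlencode(params)}"

def first_recommendation(response):
    """Extract (title, artist) of the first similar track from a
    decoded JSON response."""
    track = response["similartracks"]["track"][0]
    return track["name"], track["artist"]["name"]
```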


Figure 3: Picture of LastFM’s recommendation.

3.2.1.3 Streamwatchr Streamwatchr's recommendations are generated with the Google News ranking system, which is a domain-independent approach. Just as with YouTube and LastFM, it is possible to retrieve a list of song recommendations generated by Streamwatchr, given an input song:

http://streamwatchr.com/recommend-radio?song="+SongQuery+"&artists [0][mbId]="+artistid+"&artists[0][name]="+ArtistQuery

ArtistQuery and SongQuery are the artist and song name, once again concatenated with the + sign. The main difference in comparison to the LastFM and YouTube search queries is the artistid, which is the unique artist id from an online music database, MusicBrainz [6]. For obtaining music recommendations from Streamwatchr it is thus mandatory to acquire the corresponding artistid on MusicBrainz. Once again a search query has to be sent, this time to MusicBrainz, which returns the artistid for the artist of the input song.

http://musicbrainz.org/ws/2/artist/?query=artist:"+ArtistQuery

The only parameter in this query is the name of the artist, where the spaces are replaced by the + sign.

By means of combining these two search queries, it is possible to extract the first recommendation of Streamwatchr, given an initial song.
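The combination of these queries can be sketched as follows (helper names invented; quote_plus performs the space-to-+ replacement described above):

```python
from urllib.parse import quote_plus

def musicbrainz_artist_url(artist):
    """Query MusicBrainz for the artist id (mbId) that Streamwatchr
    requires."""
    return ("http://musicbrainz.org/ws/2/artist/?query=artist:"
            + quote_plus(artist))

def streamwatchr_url(artist, song, artist_id):
    """Ask Streamwatchr's recommend-radio endpoint for follow-up songs,
    given the song title, artist name and MusicBrainz artist id."""
    return ("http://streamwatchr.com/recommend-radio?song="
            + quote_plus(song)
            + "&artists[0][mbId]=" + artist_id
            + "&artists[0][name]=" + quote_plus(artist))
```

In practice the MusicBrainz request is sent first, its mbId is read from the response, and that id is then fed into the Streamwatchr request.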


3.2.1.4 Playing songs It is important that the recommended songs can be listened to on the website, so the user can hear clearly which song each of the three music recommender systems has returned, given the initial song. The songs are played with an embedded YouTube player, which requires the VideoID of the video (read: song) that has to be played.

3.2.1.5 Rating songs The songs provided by the three music recommender systems can be rated with two different scores: one for the quality of the recommended song (is it a good follow-up song, given the initial song?) and one for its originality. The originality score is introduced because a music recommender system can return a song from the same artist and album, which is not suitable for a radio system. Scores can be given in the range from 1 to 5, from worst to best.

3.2.1.6 In Summary All the different aspects of the explicit comparison method are described above. This section gives a summary of how these aspects work together and lead to one final webpage where the experiment can be conducted.

Figure 5: Schematic overview of the back-end of the final webpage.

When an experimental unit participates in the experiment, it is first directed to Form.html, where an input song can be filled in (artist and song title separately). A random song is automatically picked when the song is misspelled or one of the music recommender systems cannot find an appropriate follow-up song. The song the experimental unit provided (or the random song) is then sent to Python.php, which contains code to run Python programs initiated from PHP files, and initiates Recommendation.py with the artist and song title as input arguments.

Recommendation.py is the main file, in which the follow-up songs from the three music recommender systems are collected and stored in an HTML snippet that is then written to an external file named total.html. Since a Python program cannot be shown directly in the browser, the program must write HTML code to the external total.html, which can then be shown in the browser. This file also contains the rating form that initiates next.php


when the form is submitted. Next.php stores the ratings in a MySQL database and initiates Recommendation.py with, as input, the song that scored highest in the previous rating. Again, when not every music recommender system provides a follow-up song for the input song, a random song is picked.
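A simplified sketch of the role of Recommendation.py (the real file is not reproduced here; the function below only illustrates the collect-or-retry logic and the random display order described in this section):

```python
import random

def collect_followups(artist, song, systems):
    """systems maps a system name to a function (artist, song) ->
    follow-up (title, artist, video_id), or None when the system has
    no recommendation for this song."""
    followups = {name: fn(artist, song) for name, fn in systems.items()}
    if any(f is None for f in followups.values()):
        # The caller retries with a random initial song until every
        # system has a follow-up.
        return None
    # Shuffle so the rater cannot tell which song came from which system.
    items = list(followups.items())
    random.shuffle(items)
    return items
```

The returned list would then be rendered into the HTML snippet that is written to total.html.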

Figure 6: The final explicit rating webpage.

3.2.1.7 Design choices Some choices were made while designing the explicit comparison website. The beta version of the website used Spotify rather than YouTube for playback. However, due to the limitations of the Spotify dataset and the syntactic difficulties of its search query, YouTube was chosen as the playback service.

The first webpage presented to the user during the experiment contains a form asking for the artist and song title from which the recommended songs are to be retrieved. If this song is not known to one of the three music recommender systems, or the song title is misspelled, a random song is automatically selected via the website http://www.randomlists.com/random-songs.

It is also possible that the music recommender system of Streamwatchr does not return a follow-up song when it cannot find one. The decision was made to choose a new random initial song until all three music recommender systems have a follow-up song. This failure to return a follow-up song happens often with Streamwatchr and is something that needs to be improved in the future. For this experiment the focus must be on the songs that are recommended, so their originality and quality can be compared with the systems of LastFM and YouTube.

At least three types of videos can be found on YouTube: non-music, music (officially released work) and live/unreleased music footage. Completely excluding the unreleased and live footage is hard to achieve, but for the offline evaluation the recommendations from YouTube are filtered on syntactic phrases like 'live' and 'concert'. If the first recommendation of YouTube contains one of these words, the choice was made to continue with a random initial song instead of the second recommendation, since the second recommendations of LastFM and Streamwatchr are also not covered.


The three featured songs are displayed in random order, so it is not clear to the users which song came from which music recommender system. This avoids bias towards one particular system.

3.2.2 Implicit

Implicit comparison implies that the website users of Streamwatchr are not aware from which of the three music recommender systems a follow-up song is received. A commonly used framework for testing and comparing multiple systems in an online environment is the A/B testing framework.

With A/B testing the experimental units (website users of Streamwatchr) are divided into, in this case, three equally sized groups (A/B/C testing). Each group is presented a different music recommender when the site is entered, without knowing which one.

Three distinct data sets are created after running the A/B/C testing framework on Streamwatchr's website. An overview of how well the music recommenders work in comparison to each other can be derived by computing the user experience (see the method for research question 1) per data set and comparing the scores. Arbitrariness no longer plays a role, since the computed user experience is equally arbitrary for each of the music recommenders.
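The comparison step itself reduces to averaging the user-experience scores per group; a one-function sketch (name hypothetical):

```python
def compare_groups(scores_by_group):
    """Average the user-experience scores per recommender group.
    Because every group is scored with the same (arbitrary) event
    weights, the comparison between groups remains fair even though
    the absolute scores are not meaningful on their own."""
    return {group: sum(s) / len(s)
            for group, s in scores_by_group.items()}
```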

Three variants of A/B/C testing have been implemented in the website's source code: the standard A/B/C testing framework as described above, where a user always receives follow-up songs from the same music recommender; a variant where the user receives a follow-up song from one of the three recommenders at random (interleaved); and a variant where, unlike the first two, the recommender is associated with the user's sessionID, so the user receives follow-up songs from one of the three music recommenders per visit. A more detailed description is given below.

The advantage of implementing three distinct variants is that the website users make use of the different music recommenders in three different ways. By comparing the user experience between the three variants for one user, it can be deduced how a single user responds to different recommenders. This excludes biases like specific user behaviour. With large numbers of users such biases are less significant, but Streamwatchr does not have sufficient users, which is why three variants were chosen.

In order not to directly adjust the current site, the A/B/C framework was first developed on a local node.js server. Node.js is a software platform on which a JavaScript application can be run and developed [7]. The entire front end of the Streamwatchr website was set up on the node.js server, which communicates directly with the back-end of the original website. In this way the A/B/C framework can be implemented and debugged without affecting the current website.


Playing music and retrieving the next song is initiated by the player.js file, which is thus the place where the A/B/C framework should be implemented. However, the A/B/C framework was not implemented directly in the player.js file, but on a remote Flask server that communicates directly with player.js. A Flask server is a local server running a Python program. Retrieving follow-up songs from the three different music recommender systems is achieved by sending a GET request to the local Flask server:

http://127.0.0.1:5000/rec/'+recartist+'+-+'+recsong+'/'+clientip
    +'/'+sessionid+'/variant

The query consists of the location where the program is running, followed by the artist and title of the song for which the recommended song has to be obtained. The last three parameters are the ClientIP (the IP address of the user visiting the website), the sessionid of the session initiated when a user arrives on the website, and finally an integer from 1 to 3 indicating which of the three A/B/C variants should be used: 1) the IP variant (via ClientIP), 2) the three music recommender systems interleaved, and 3) the session variant (via SessionID).

The Flask server returns one recommended song, given the variant that was sent in the incoming request. Each of the three variants is addressed in a different manner.

If variant 1 is received, a link between the IP address and one of the three music recommender systems is created and stored in a Python dictionary, where the key corresponds to the IP address and the value to the music recommender system. When the IP address is already present in the dictionary, the corresponding recommender is retrieved from the dictionary instead of creating a new link. From the dictionary it can thus be determined which of the three music recommender systems should be used for retrieving a follow-up song. The first intention was to develop a hash function mapping an IP address to one of the three music recommender systems, but due to the limited number of users of the website it cannot be assumed that the IP addresses are uniformly distributed; the decision was therefore made to link IP addresses at random to one of the three music recommendation systems.

When variant 2 is received, the Flask server returns a follow-up song from any of the three music recommender systems, so the user is not linked to one fixed music recommender system. This is done by choosing one of the music recommender systems at random, with Python's random function, at every incoming request.

The third and last variant is the same as variant 1, but the dictionary is built up from and consists of SessionIDs instead of IP addresses.
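The routing logic behind the three variants, stripped of the Flask plumbing, can be sketched as follows (names hypothetical):

```python
import random

RECOMMENDERS = ["streamwatchr", "lastfm", "youtube"]
_by_ip, _by_session = {}, {}

def pick_recommender(variant, client_ip, session_id):
    """Return the recommender for one incoming request.
    Variant 1: fixed per IP address (random on first sight);
    variant 2: random per request (interleaved);
    variant 3: fixed per session id."""
    if variant == 1:
        return _by_ip.setdefault(client_ip, random.choice(RECOMMENDERS))
    if variant == 2:
        return random.choice(RECOMMENDERS)
    if variant == 3:
        return _by_session.setdefault(session_id,
                                      random.choice(RECOMMENDERS))
    raise ValueError("variant must be 1, 2 or 3")
```

Inside the actual Flask route, the chosen name would then select which of the three recommendation APIs is queried for the follow-up song.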

The frameworks and servers described are all implemented and can be run at any moment.


Figure 7: Schematic overview of the back-end of the implicit comparison, regarding song retrieval.

4 Results

This section focusses on the results gathered with the explicit comparison method, since the implicit method (the A/B/C testing framework) was, due to time constraints, only developed on a local server and was not conducted in a live situation.

The website used for the explicit comparison method received a total of one hundred submitted comparisons; in each comparison every music recommender system received two scores, which makes a total of 600 received scores. An average was computed per score and can be found in figure 8.


5 Conclusion

Two separate research questions were stated at the beginning of this thesis, both regarding the music recommender system of Streamwatchr: 'How well does the music recommender of Streamwatchr work?' and 'How well does the music recommender of Streamwatchr work in comparison to the systems of LastFM and YouTube?'.

The first research question does not (yet) have a satisfactory answer, given the lack of users Streamwatchr has at this moment. However, a method was provided for rating Streamwatchr's music recommender system, which can be improved in the future by combining the conclusions and results of the second research question.

The second question was addressed with two different approaches: implicit and explicit comparison. The implicit comparison was only developed on a local server and therefore has no results from real Streamwatchr users. However, the framework for implicit comparison has been created and can be run and used in the future.

The explicit comparison was achieved by letting users explicitly rate follow-up songs provided by the three music recommender systems. Given the average scores (see figure 8), it can be concluded that LastFM provides the best follow-up songs, followed by YouTube and Streamwatchr respectively, though the differences are not significant.

A more pronounced difference can be found in the originality score. Here, Streamwatchr scores twice as high as the systems of LastFM and YouTube. This makes Streamwatchr's music recommender more suitable for the purpose it serves: a radio function.

6 Discussion and future work

This research has shown that the music recommender system of Streamwatchr is more suitable than the systems of LastFM and YouTube for creating a radio function. This result mainly answers the second research question. The first research question was addressed during this project but has not yielded the desired conclusion, mainly because at present there are not enough Streamwatchr users, certainly within the short time in which this project took place. In the future, the method created for scoring the user experience, in particular the assignment of scores to separate event types, can be calibrated on the basis of the outcome of the explicit and implicit comparisons.

During the development of the offline comparison site, the syntactic part of retrieving follow-up songs took more time than expected. Typos, punctuation marks and even capital letters were not accepted by every system when executing a GET request towards their servers. Ultimately it was therefore decided to proceed with a random initial song whenever one of the three music recommendation systems did not generate a follow-up song, since this was often due to syntactical problems rather than the systems not having a follow-up song available.
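A normalisation step of the kind that would mitigate these syntactic failures could look like this. This is a sketch only; the exact rules each API tolerates differ, and `normalise_query` is a hypothetical helper, not part of the systems compared here.

```python
import re
import unicodedata

def normalise_query(artist, title):
    """Lower-case, strip accents and punctuation, and collapse
    whitespace so a GET request to an external music API is less
    likely to fail on superficial spelling differences."""
    text = f"{artist} {title}"
    # Decompose accented characters and drop the combining marks.
    text = unicodedata.normalize("NFKD", text)
    text = "".join(c for c in text if not unicodedata.combining(c))
    text = text.lower()
    # Replace punctuation; keep letters, digits and whitespace.
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    # Collapse runs of whitespace into single spaces.
    return re.sub(r"\s+", " ", text).strip()
```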

All three music recommender systems have a different database available for retrieving their follow-up song. This seems unfair given that the systems are compared to each other: YouTube has access to both popular and unpopular music, whereas the overlap with LastFM and Streamwatchr lies mainly in the field of popular music. Theoretically, however, this would not result in a bias, since it is especially popular music that users share on Twitter, so that in practice most use is made of music that is available in the databases of all three music recommender systems.

Originally, in addition to answering the two research questions discussed in this thesis, the intention was also to improve Streamwatchr's music recommender system during this project. The results show that there is still room for improvement, especially in recommending a song that fits well with the previous track (recommendation quality). The literature review suggests that most progress can be achieved through the playlists and listening histories of individual users: by examining the overlap between users, it can be derived which users have similar music tastes, after which songs that occur in only one of the two playlists or listening histories can be exchanged and recommended.
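The user-overlap idea can be illustrated with a simple set-based similarity. This is a minimal sketch under the assumption that listening histories are available as collections of song identifiers; a real system would also weight songs by play count.

```python
def jaccard(history_a, history_b):
    """Jaccard similarity between two users' listening histories,
    treated as sets of song identifiers."""
    a, b = set(history_a), set(history_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def recommend_from_peer(target_history, peer_history):
    """Songs a similar user listened to that the target user has
    not heard yet, i.e. candidates for recommendation."""
    return sorted(set(peer_history) - set(target_history))
```

Given a target user, one would pick the peer with the highest Jaccard score and recommend songs from the set difference of their histories.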

7 Repository

All described files in this paper can be found in the following repository:

https://drive.google.com/folderview?id=0B1SRB-TxVhf0aWhrSFNMSTdQdW8&usp=sharing

