
Link Prediction in Real Life Applications

SUBMITTED IN PARTIAL FULFILLMENT FOR THE DEGREE OF MASTER OF SCIENCE

Lirry Pinter

10565051

Master Information Studies: Human-Centered Multimedia
Faculty of Science
University of Amsterdam

17-07-2018

1st Supervisor: Dhr. N. Voriskades MSc, Informatics Institute, UvA
2nd Supervisor: Dhr. dr. Frank Nack, Informatics Institute, UvA


Link Prediction in Real Life Applications

Lirry Pinter — 10565051

University of Amsterdam lirry.pinter@student.uva.nl

ABSTRACT

In 2012, Google introduced the Knowledge Graph, a relational database that stores data in the form of semantic triples. Link prediction on knowledge graphs is becoming increasingly popular for solving tasks in the field of computer science due to its many applications. Link prediction models represent entities and the relations between them as low-dimensional embeddings in order to predict missing links in knowledge graphs. Research on applying these models to datasets other than the popular WN18 and FB15K benchmarks is less common. In this paper, the models RESCAL (Nickel et al., 2011), TransE (Bordes et al., 2013), DistMult (Yang et al., 2015) and ComplEx (Trouillon et al., 2016) are compared on the task of predicting links between professionals in the electronic music industry. To this end, a custom knowledge graph was built, based on data from a platform in this field. The results show that some of these models can indeed perform this task relatively well, provided that the dataset supplies substantial data for the training process.

KEYWORDS

Link Prediction, Knowledge Graphs, Deep Learning, Embedding Models

1 INTRODUCTION

Knowledge graphs (KGs) are structured semantic databases that represent knowledge on the web. A number of projects like Google Knowledge Vault [6], DBpedia [1] and Freebase [2] have arisen in the last decade. These KGs contain facts in the form of a directed graph of nodes (entities) that are connected through edges (relations). Given a set of entities E and a set of relations R, a KG can be denoted as a set of triples. Each triple contains a head and tail h, t ∈ E, which are connected through a relation r ∈ R. A triple (h, r, t) then expresses one fact within the KG. Triples can also be connected to each other. To exemplify, consider two triples (Leonardo Da Vinci, isCreatorOf, Mona Lisa) and (Leonardo Da Vinci, isCreatorOf, The Last Supper) in a KG. Since 'Mona Lisa' and 'The Last Supper' are both linked to 'Leonardo Da Vinci', a model can learn that these two entities are related to each other.
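The set-of-triples view above can be sketched in a few lines of Python; the entity and relation names are purely illustrative:

```python
# A toy KG as a set of (head, relation, tail) triples.
triples = {
    ("Leonardo Da Vinci", "isCreatorOf", "Mona Lisa"),
    ("Leonardo Da Vinci", "isCreatorOf", "The Last Supper"),
}

entities = {h for h, _, _ in triples} | {t for _, _, t in triples}
relations = {r for _, r, _ in triples}

def share_head(t1, t2, kg):
    """True if some head links to both tails via the same relation."""
    return any((h, r, t1) in kg and (h, r, t2) in kg
               for h, r, _ in kg)

# 'Mona Lisa' and 'The Last Supper' share the head 'Leonardo Da Vinci'.
print(share_head("Mona Lisa", "The Last Supper", triples))  # True
```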

Given the fact that KGs can be incomplete, a research field called Knowledge Base (or graph) Completion (KBC) has arisen that tries to predict missing links within KGs. The aim is that, given the query (Barack Obama, wasPresidentOf, ?), the model should return the tail USA. Thus, KBC (or link prediction) concerns itself with the prediction of heads in (?, r, t) and tails in (h, r, ?). Most papers on the task of link prediction use the WN18 and FB15K datasets to evaluate their models [11] [15] [14]. This paper instead constructs its own dataset, based on the database

of Bookya¹, a platform for music professionals. This platform was built to solve the problem of connectivity in the electronic music industry. The possibilities of the Internet and the explosion of new digital tools for producing electronic music have led to a significant increase in the number of electronic music producers. Key players in this industry are artists, agencies, promoters, venues and events. To illustrate, the music platform SoundCloud alone had over 10 million music creators in 2014 [4]. Because of this large number of new producers, it is becoming increasingly difficult for agencies and promoters to find and book new artists. Similarly, for upcoming artists the chances of being discovered are lowered by the same problem. The platform has created a database that includes as many of these key players (artists, agencies, promoters, venues and events) as possible. Within this database, profiles are built for all individuals. This way, artists are accessible to promoters and vice versa. The profiles contain numerous kinds of properties (e.g. nationality, genre). This paper explores the idea of generating recommendations for key players. The question then arises: how can link prediction models be used to generate these recommendations? The first task is to build a dataset in the form of a KG, based on the database of the platform. Then, the task is to apply link prediction models to the created dataset. The most relevant predictions would be links between artists and promoters or events. Figure 1 depicts a simplified example graph that illustrates the task within this dataset.

Figure 1: The left graph represents existing triples in the dataset. Artist A, Artist B and Event share the same Genre A. Artist A is booked by Event. Given the triples in the left graph, the task is to predict the probability of (Event, Booked, Artist B).

1.1 Structure

This paper will first describe the concepts of knowledge graphs and link prediction. Then, the different models that are used are outlined. Afterwards, the construction of the custom dataset is described. Finally, the results, analysis and conclusion are presented. To the best of my knowledge, link prediction on knowledge graphs has not been done in the specific field of music professionals. This thesis can therefore serve as an exploratory study of KGs and link prediction applied to specific fields.

Table 1: Summary of models used in this paper

Model                             Scoring function        Relation parameters   Complexity
RESCAL (Nickel et al., 2011)      e_i^T R_k e_j           R_k ∈ ℝ^(K²)          O(K²)
TransE (Bordes et al., 2013)      −‖e_i + r_k − e_j‖      r_k ∈ ℝ^K             O(K)
DistMult (Yang et al., 2015)      ⟨e_i, r_k, e_j⟩         r_k ∈ ℝ^K             O(K)
ComplEx (Trouillon et al., 2016)  Re(⟨e_i, r_k, ē_j⟩)     r_k ∈ ℂ^K             O(K)

2 RELATED WORK

An extensive number of papers has been published on the task of link prediction (for an overview of different methods, see [10]). In this paper, four models were tested on the dataset. These models can be distinguished by their scoring function, complexity, loss function and ability to handle asymmetric relations. Table 1 gives an overview of these models. For a more detailed description see subsection 2.3.

2.1 Symmetry

When applying link prediction models to KGs, the relations in the KG can be symmetric, asymmetric or antisymmetric. A relation is symmetric if, whenever (e1, r, e2) holds, (e2, r, e1) also holds (e.g. isRelatedTo). Contrastively, a relation is asymmetric when (e1, r, e2) holds but (e2, r, e1) cannot hold (e.g. isFatherOf) [9]. These relation characteristics can cause some models to perform less effectively, as matrix multiplication issues arise when dealing with asymmetric relations.

2.2 Notation

Let a KG be G = (E, R, T), where E represents all entities, R all relations, and T all triples. Let e be the embedding vector of an entity, and r the embedding vector of a relation. Let f(i, j, k) be the score of a triple, where i, j are the indexes of (h, t) ∈ E and k the index of r ∈ R.

2.3 Models

Some models for link prediction on knowledge graphs entail a form of matrix factorization. Matrix factorization is limited to a single relation; it is not suitable for multiple relations. Tensor decomposition is the equivalent of matrix factorization in a multi-dimensional space, which allows the calculation of multiple relations. Early forms of tensor decomposition like CANDECOMP/PARAFAC (CP) calculate a target tensor X as the sum of outer products of the head, tail and relation type vectors. The score function can then be defined as:

f(i, j, k) = h_i^T diag(r_k) t_j    (1)

RESCAL [11]. Similar to CP, the RESCAL model proposed by Nickel et al. [12] works through decomposing a tensor. Figure 2, taken from [7], depicts how the decomposition works.

Figure 2: Tensor decomposition in RESCAL

RESCAL takes a different approach by mapping all entities into a single embedding space, disregarding whether an entity is a head or a tail. Additionally, the relation types are used as bilinear operators in the shape of a matrix R_k ∈ ℝ^(D×D). A triple is scored as:

f(i, j, k) = e_i^T R_k e_j    (2)

This allows the model to handle asymmetric relations, but increases the space complexity due to the bilinear operator.
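The bilinear score of equation (2) can be sketched with randomly initialized embeddings; the dimensions and values are illustrative, not trained:

```python
import numpy as np

rng = np.random.default_rng(0)
K = 4                               # toy embedding dimension
E = rng.normal(size=(5, K))         # entity embeddings e_i (5 toy entities)
R = rng.normal(size=(3, K, K))      # one K x K bilinear matrix R_k per relation

def rescal_score(i, j, k):
    """Equation (2): f(i, j, k) = e_i^T R_k e_j."""
    return float(E[i] @ R[k] @ E[j])

# R_k is generally not symmetric, so swapping head and tail changes
# the score: asymmetric relations can be represented.
print(rescal_score(0, 1, 2), rescal_score(1, 0, 2))
```

Note that each relation carries K² parameters here, which is where the O(K²) complexity in Table 1 comes from.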

TransE [3]. TransE is a model that does not use tensor decomposition. Instead, it interprets relations between entities as low-dimensional embeddings called translations. Low-dimensional data is (on the condition that vital properties of the original graph are preserved) more scalable and better performing than higher-dimensional embeddings. Therefore, this model is generally faster than tensor decomposition based methods like RESCAL. The initialization of the model starts with a random low-dimensional continuous vector space. Then, the triples are scored with a dissimilarity function between the head and the tail:

f(i, j, k) = −‖e_i + r_k − e_j‖    (3)

The reasoning is that the embedding of the head, after translation, should lie close to the embedding of the tail if a fact in the knowledge graph is true, and far from it if the fact is wrong. The model uses a pairwise loss function. Because of the simplicity of the scoring function, TransE is able to handle symmetric, asymmetric and antisymmetric relations.
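A minimal sketch of the scoring function in equation (3), with hand-picked toy vectors rather than learned embeddings:

```python
import numpy as np

def transe_score(e_h, r, e_t):
    """Equation (3): f = -||e_h + r - e_t||; scores closer to 0 are more plausible."""
    return -np.linalg.norm(e_h + r - e_t)

# Toy check: for a "true" fact the translated head lands on the tail,
# so the score is (near) zero; a corrupted tail scores much lower.
e_h   = np.array([0.1, 0.2])
r     = np.array([0.3, 0.0])
e_t   = np.array([0.4, 0.2])
e_bad = np.array([2.0, -1.0])
assert transe_score(e_h, r, e_t) > transe_score(e_h, r, e_bad)
```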

DistMult [16]. Like TransE, the DistMult model computes scores in a single embedding space; the difference between the two models lies in the scoring function. Where TransE computes a translation distance, DistMult takes the trilinear dot product of the entity and relation embeddings (4). The model is often implemented with a logistic loss function.

f(i, j, k) = ⟨e_i, r_k, e_j⟩    (4)


ComplEx [14]. Like DistMult, ComplEx scores triples by multiplying the entry-wise product of the head, relation and tail embeddings. If relations are antisymmetric, these real-valued vectors cannot simply be multiplied. ComplEx solves this problem by doing the calculations in a complex space, consisting of a real (Re) and an imaginary (Im) part [14]. The triple is then scored as the real part of the trilinear product of the head embedding, the relation embedding and the complex conjugate of the tail embedding (5). Logistic loss is used for training.

f(i, j, k) = Re(⟨e_i, r_k, ē_j⟩)    (5)
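Equation (5) can be illustrated with random complex vectors (values illustrative); note how a purely real relation vector recovers DistMult's symmetric behaviour:

```python
import numpy as np

def complex_score(e_h, r, e_t):
    """Equation (5): f = Re(<e_h, r, conj(e_t)>) with complex embeddings."""
    return float(np.real(np.sum(e_h * r * np.conj(e_t))))

rng = np.random.default_rng(1)
e_h = rng.normal(size=4) + 1j * rng.normal(size=4)
e_t = rng.normal(size=4) + 1j * rng.normal(size=4)
r   = rng.normal(size=4) + 1j * rng.normal(size=4)

# With a genuinely complex r the score changes when head and tail swap,
# so antisymmetric relations can be represented ...
print(complex_score(e_h, r, e_t), complex_score(e_t, r, e_h))

# ... while a purely real r makes the score symmetric, as in DistMult.
assert abs(complex_score(e_h, np.real(r) + 0j, e_t)
           - complex_score(e_t, np.real(r) + 0j, e_h)) < 1e-9
```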

2.4 Evaluation tasks

In this paper the evaluation task is link prediction as described in section 1. With either the head or tail entity and the relation known, the task is to predict the missing entity, i.e. the triple (h, r, ?) or (?, r, t) where ? denotes the missing entity. This is done by taking all complete triples from the test set, corrupting them by removing either the head or the tail, and then iterating over the possible entities that could complete the triple. The possible entities are then ranked by probability of correctness, which is the output of the score function of a model. An evaluation metric concerned with ranking is the Mean Reciprocal Rank (MRR). The reciprocal rank of a single query is the multiplicative inverse of the rank of the first relevant entity. The MRR then is the mean over all queries in the set. More formally, MRR can be defined as:

MRR = (1 / |Q|) Σ_{i=1}^{|Q|} (1 / rank_i)    (6)

where Q is the set of queries and rank_i is the ranking position of the first relevant entity. MRR can be computed Raw or Filtered. In the Raw setting, all entities are included in the ranking. In the Filtered setting, only entities of the correct entity type are included. Therefore, Filtered MRR generally yields higher scores. Next to MRR, Hits@N is a widely used metric for evaluating link prediction models [14] [8]. After ranking all scores, the metric takes the top N entities in the (filtered) ranking and checks whether the true entity is present. To exemplify, if the metric is Hits@1 and the true entity is ranked highest, the Hits@1 score is 1.0. With the goal of recommendation on the platform in mind, MRR and Hits@10 will be used as evaluation metrics. Other papers [5] have proposed Mean Average Precision (MAP) as an additional evaluation metric. Here it is discarded, given that queries in the test set usually contain one relevant document, which makes the MAP score identical to the MRR score.
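The two metrics can be sketched as follows, assuming the (filtered) rank of the true entity is already known for each query:

```python
def mrr_and_hits(ranks, n=10):
    """Equation (6): MRR = (1/|Q|) * sum(1/rank_i); Hits@N = fraction of ranks <= N.

    `ranks` holds, for each test query, the rank of the true entity."""
    mrr = sum(1.0 / r for r in ranks) / len(ranks)
    hits = sum(1 for r in ranks if r <= n) / len(ranks)
    return mrr, hits

# Toy example: true entity ranked 1st, 4th and 20th across three queries.
mrr, hits10 = mrr_and_hits([1, 4, 20])
print(round(mrr, 3), round(hits10, 3))  # 0.433 0.667
```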

As stated by Bordes et al. [3], results can be divided by classifying the cardinalities of the head and tail entities. These can be 1-TO-1, 1-TO-MANY, MANY-TO-1 and MANY-TO-MANY. 1-TO-1 entails that one head can have a relation with only one tail. In 1-TO-MANY, a head can have many tails. In MANY-TO-1, many heads can have one tail. Finally, in MANY-TO-MANY relations, many heads can have many tails. The dataset used in this paper contains two of these classes, namely MANY-TO-1 and MANY-TO-MANY. To exemplify, isBasedInCity is a relation that can have multiple heads but only one tail (e.g. multiple artists can be based in the same city, but an artist cannot be based in multiple cities). Furthermore, many Artists can have multiple Genres.

3 DATASET CONSTRUCTION

3.1 Dataset construction task

The original database that was made available needed optimizing and cleaning in order to convert it to a dataset of relational triples. Let B be the original dataset, with B(Ar, Ag, Ev, Pr, Ve) the original files of the key players (Artists, Agencies, Events, Promoters, Venues). The key players were split up into separate CSV files. The columns of each file contained data about, respectively, location, genre and other information about a key player. None of the entities had a unique ID, which is necessary in semantic data. Furthermore, entities like city, country and genre did not have their own file, but were spread out over the other files. Thus, the number of unique entities in these categories was unknown. In addition, a percentage of the entries in the original database contained noise in the form of different languages (e.g. Allemagne for Germany), special characters (possibly problematic for the matching of entities) and multiple values in single cells. Finally, some entities were represented as URLs pointing to the endpoint where the data was stored on the platform. The names of these entities could differ from the original names in the database (e.g. 101 Agency would be written as 101-agengy-1). To construct a relational database, all connections within the files need to be represented in the form of triples as <id1> <relation> <id2>. The figure below poses an example of a sample from the original database and the desired dataset entries.

Algorithm 1 Relational extraction

1: procedure Transform(B)                     ▷ Original database
2:   for all entities in files do
3:     Assign unique 7-digit ID
4:     Remove errors and translate to English
5:   for each ID do
6:     Assign <isType> and <hasName> relations
7:     Assign relations with other IDs
8:   return relations                         ▷ text file

As for converting the tabular data into relational triples, a pipeline was formed that addresses the issues described in the previous section. Finally, the complete dataset needed to be converted into the format of the WordNet database (WN18), a popular benchmark for relational learning which is used by a considerable number of papers [10] [15] [12]. The structure of this dataset can be mimicked by creating two files. First, a file containing all the triples in the format ID, Relation, ID (e.g. 2000192, isBasedinCountry, 10004363). This file is then split into a train (90%), validation (5%) and test set (5%). Additionally, a file that contains all the definitions in the format ID, Name, Description (e.g. 10004563, Berlin, City). Since the IDs are used for training, the definitions file is used in the evaluation process to make the entities readable and understandable again.
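The 90/5/5 split described above can be sketched as follows; the triples here are synthetic placeholders, not entries from the actual dataset:

```python
import random

# Hypothetical triples in the WN18-style ID, Relation, ID format.
triples = [(2_000_000 + i, "isBasedinCountry", 10_000_000 + i % 7)
           for i in range(1000)]

random.seed(42)           # shuffle before splitting so the sets are unbiased
random.shuffle(triples)

n = len(triples)
train = triples[: int(0.90 * n)]               # 90% train
valid = triples[int(0.90 * n): int(0.95 * n)]  # 5% validation
test  = triples[int(0.95 * n):]                # 5% test

print(len(train), len(valid), len(test))  # 900 50 50
```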

4 EXPERIMENTS

4.1 Dataset

After extracting the triples, some conclusions could be drawn. Firstly, the Venues only contained information about capacity and location. Since the prediction task was to predict new relations between Artists and Events, the Venues did not add extra value to the experiment and were thus left out. As shown in the figure above, other entities contained useful relations that could help with link prediction. The threshold for considering a relation type useful was whether the relation was connected to either Artists or Events directly (e.g. bookedArtist) or indirectly (e.g. a Promoter hasGenre that is connected to an Event). Secondly, an inspection of the relations made clear that none of them were symmetric. Yet, some relations in the dataset could cause problems in the validation process. To elaborate, a relation (Artist, wasBookedBy, Promoter) would have a counterpart (Promoter, bookedArtist, Artist) with the same entities. If these two relations were divided over the train and test set, the model would overfit on this data by learning the reverse relation. For this reason, such duplicate relations were removed. The final dataset, now called BK7, can be summarized as follows:

BK7
No. Entities     22,741
Relation Types   7
Relations        80,627
Train            72,558
Val              4,012
Test             4,057
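The removal of inverse duplicates can be sketched as follows; the inverse-relation map and triples are illustrative stand-ins for the actual data:

```python
# Known inverse relation pairs; the map is illustrative, mirroring the
# wasBookedBy / bookedArtist example from the text.
inverse_of = {"wasBookedBy": "bookedArtist", "bookedArtist": "wasBookedBy"}

triples = [
    ("artist_1", "wasBookedBy", "promoter_1"),
    ("promoter_1", "bookedArtist", "artist_1"),   # reverse of the first triple
    ("artist_2", "hasGenre", "techno"),
]

kept, seen = [], set()
for h, r, t in triples:
    if (t, inverse_of.get(r), h) in seen:
        continue                      # its inverse direction was already kept
    seen.add((h, r, t))
    kept.append((h, r, t))

print(kept)  # the reverse bookedArtist triple has been dropped
```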

4.2 Experimental Setup

All experiments were run in a VirtualBox Ubuntu 18.04 environment. The models were implemented with the Python code from the paper 'Complex Embeddings for Simple Link Prediction' (Trouillon et al., 2016)³. Hyperparameters were tested and set for all models. Across all models the AdaGrad optimizer was used. Training stops when the Filtered MRR (validated every 50 epochs) is lower than the previous one (early stopping), with a maximum of 1000 epochs. As mentioned by Kadlec et al. [8], batch size can drastically improve the score of the models. In this case, the MRR scores of all models improved significantly when the batch size was doubled. The number of batches was set at 50 per epoch, i.e. the batch size equals the size of the training set divided by 50. For λ and the learning rate, the settings described by Trouillon et al. [14] were used. For the embedding dimension K, experiments were done with K = {20, 35, 50, 100, 150}. The best results per model came at K = 35 for RESCAL (Nickel et al., 2011), K = 50 for TransE (Bordes et al., 2013), K = 50 for DistMult (Yang et al., 2015), and K = 35 for ComplEx (Trouillon et al., 2016). These embedding sizes are lower than in the original papers, which is due to the lower complexity of this dataset compared to the benchmark dataset WN18. Changing the initial learning rate had little influence; as mentioned in the original paper [14], this could be caused by the AdaGrad [13] optimizer, which tunes the learning rate while optimizing. Finally, due to time and computational limitations, the number of negative samples remained at 1.
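The early-stopping rule can be sketched as follows; `validate` is a hypothetical stand-in for a real filtered-MRR evaluation:

```python
# Validate every 50 epochs; stop when the filtered MRR drops below the
# previous validation score, with a hard cap of 1000 epochs.
def train_with_early_stop(validate, max_epochs=1000, interval=50):
    best = float("-inf")
    for epoch in range(interval, max_epochs + 1, interval):
        mrr = validate(epoch)
        if mrr < best:
            return epoch - interval   # last checkpoint that still improved
        best = mrr
    return max_epochs

# Toy validation curve peaking at epoch 150, then degrading.
curve = {50: 0.10, 100: 0.14, 150: 0.16, 200: 0.15}
print(train_with_early_stop(lambda e: curve.get(e, 0.0)))  # 150
```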

5 RESULTS

Table 2 displays the MRR of all models across the whole test set. The highest scores for Filtered and Raw are displayed in bold. As can be seen, the gap between the Filtered and Raw MRR differs between the models. This is interesting because a large gap between these two metrics can indicate that the model is not performing optimally. On the contrary, if the gap is small, the model does a decent job of predicting the type of the entity correctly. In this experiment both the Filtered and Raw MRR scores of TransE are the highest, but the gap between its Filtered and Raw scores is also the largest. Furthermore, RESCAL has the lowest Filtered MRR, but its gap with the Raw MRR is smaller than for the other models. It can be argued that this model generalizes well in predicting the correct entity type, but overfits with respect to the unique entities within that type.

Table 2: Filtered and Raw MRR for each model on the test set

Model                             Filtered   Raw
RESCAL (Nickel et al., 2011)      0.127      0.104
TransE (Bordes et al., 2013)      0.204      0.148
DistMult (Yang et al., 2015)      0.107      0.088
ComplEx (Trouillon et al., 2016)  0.146      0.127

The highest scoring model, TransE, outperforms RESCAL with a difference of 0.077 in Filtered MRR. A reason for that could be that the simple embedding of vectors and the handling of asymmetric relations is a better method for this dataset. Still, considering that the shape of the dataset is similar to WN18, the scores seem fairly low compared with those reported in other papers [14] [8] [11].

³ https://github.com/ttrouill/complex

Since predicting relation types that involve either Artists, Promoters or Events is more important than predicting other types correctly, it is interesting to see how the models perform per relation type.

6 ANALYSIS

A more detailed analysis of the scores is displayed in Table 3 on the next page. It contains the different relation types with their MRR and Hits@10 scores. The highest Filtered MRR scores per relation type are divided over the models. What becomes clear is that the initial assumption that TransE is the best performing model can be disputed. Indeed, it has the highest average MRR when predicting all queries in the test set, but for the relation types relevant to our task the ComplEx model scores considerably higher. Another interesting observation is that the TransE model scores high on MANY-TO-1 relations like isBasedinCountry and isBasedinCity. Since over 30% of the queries in the test set are relations of that type, this can explain the lower average MRR score of the ComplEx model. Moreover, a possible explanation for the low MRR scores on the relation types isBasedinCity and isBasedinCountry could lie in the fact that the relation between the two (e.g. Berlin, isCityIn, Germany) is not defined in the dataset. Therefore, the models have fewer learning parameters involving these entity types.

With respect to the prediction task for this dataset, all models perform considerably better on the relevant relation types; to illustrate, ComplEx scored twice its average on these relations. The threshold for a relevant relation type can be defined as a direct relation between either Artists, Promoters or Events. The relation types that meet that threshold are bookedArtist and hostsArtist. These relations must not be confused with the relevant relation types described in section 4, which were relevant for the training process.

Further analysis can be done by generating the top 10 ranked entities for a query in the test set. The queries in Tables 4 and 5 were generated with the TransE and ComplEx models, respectively. These queries were chosen to depict different cases where predictions were a success or a failure.

Query #1. An example of a failure case for a relation type that is not relevant to the prediction task. This query represents a triple with an Artist entity related to a Country entity. As the table depicts, none of the top 10 ranked entities were of the correct type, a sign that the model performs poorly on this relation. This contradicts the rather high MRR score of this model for this relation type.

Query #2. An example of a success case for a query in the relevant relation types. This query represents a triple with a Promoter related to an Artist. The query yielded ambiguous results. The correct triple was not predicted, but the top 10 ranked entities are almost exclusively of the correct type.

Query #3. As already depicted in Table 3, the ComplEx model does not perform well on the hasGenre relation type. This is confirmed by the top ranked entities of this query, where none of the results yield either a correct entity type or the correct entity.

Query #4. Similar to query #2, but represents a triple linking an Event to an Artist. In this case, the query is considered more useful, since the top 5 ranked entities are all of the correct type, with the correct entity in second place. When predicting Artists, the ranked entities could have previously unknown similarities, which could help to generate better recommendations if this system were implemented.

These cases clarify that the models are still not performing optimally. The MRR scores are computed over both corrupted heads (?, r, t) and corrupted tails (h, r, ?). The queries in these cases were all taken from the latter category. It is possible that predictions of missing head entities yield more convincing results. Considering the large gap between the MRR score of the lowest (hasGenre) and the highest (wasPromoterAt) relation, the validation set was studied. It became clear that over 40% of the validation set triples contained the hasGenre relation. A possible scenario is that the algorithm overfitted on this validation data, causing the MRR on the test scores to be significantly lower. On a more positive note, triples with the wasPromoterAt relation made up only 0.7% of the validation set. With the model scarcely optimizing itself on this relation, it still yielded the highest scores.

7 CONCLUSION AND FUTURE WORK

This paper explored how methods of link prediction can be applied to aid a platform in the field of electronic music professionals. Four models were applied to a KG that was derived from tabular data. This was done by analyzing the tabular data and creating a pipeline that takes this original data as input and produces relational triples as output. The MRR scores in the relevant categories and the success cases show that link prediction on knowledge graphs can be a foundation for making recommendations on this dataset. Thus, transforming a database into relational triples and applying link prediction models to them is a feasible way to generate these recommendations. Considering the scores of all models, more can be said about the dataset. Here it is argued that, in the end, the dataset was not sufficient to utilize the full potential the models could offer. Additionally, in any machine learning problem there is room for improvement. The models performed reasonably well, considering the small number of relations and triples in the training set. But to really achieve the task of predicting links between Artists, Promoters and Events, more data is needed in the training process of the models. Additionally, further tuning of the hyperparameters could lead to better results. This paper also illustrated once again that different link prediction models yield varying results in terms of symmetry, relation type and number of entities. It is very clear that these models need to be chosen according to the dataset that is being used.


Table 3: Filtered MRR and Hits@10 for each model on each relation type

Models: RESCAL (Nickel et al., 2011), TransE (Bordes et al., 2013), DistMult (Yang et al., 2015), ComplEx (Trouillon et al., 2016).

Relation type      Category   RESCAL          TransE          DistMult        ComplEx
                              MRR    H@10     MRR    H@10     MRR    H@10     MRR    H@10
isBasedinCity      M-TO-1     0.114  0.196    0.252  0.393    0.079  0.160    0.147  0.256
isRepresentedBy    M-TO-M     0.035  0.077    0.102  0.208    0.021  0.034    0.044  0.106
hasGenre           M-TO-M     0.093  0.218    0.103  0.189    0.107  0.251    0.069  0.155
isBasedinCountry   M-TO-1     0.275  0.481    0.430  0.665    0.172  0.364    0.296  0.323
bookedArtist       M-TO-M     0.152  0.213    0.208  0.310    0.163  0.230    0.433  0.621
hostsArtist        M-TO-M     0.083  0.174    0.165  0.294    0.121  0.128    0.400  0.647
wasPromoterAt      M-TO-M     0.015  0.056    0.467  0.667    0.287  0.481    0.474  0.648

Relation types that are relevant to the task are in bold. Best model MRR and Hits@10 per relation type in bold. Best model MRR for relevant relation types is underlined.

Table 4: Examples of success and fail cases (TransE)

(#1) Cressida, isBasedinCountry, ?
Top 10: Mandal Forbes, Vivify, Clap! Clap!, Bicep, Nobodys Face, Sebastian Markiewicz, Abou Samra, Deby Cage, Barja, K-391
Desired result: Germany

(#2) UNITY Festival Israel, bookedArtist, ?
Top 10: Mindustries, Anthony Jimenez, Loefah, HetGoreLef, People Get Real, Rick Wilhite, Chris Ojeda, Mit Dir Festival, Nehuen, Millenium Mayhem
Desired result: Love Island

Table 5: Examples of success and fail cases (ComplEx)

(#3) Ben Techy, hasGenre, ?
Top 10: Lukas Stern, Uzuri, Semibreve Festival, Gold Slugs, Danny Trashock, Carlos Manaca, Curses, Komes, Alex Neri, Souloud
Desired result: Electronic

(#4) Sonic Solution Entertainment, bookedArtist, ?
Top 10: Dank, Mad Elephant, Wave Week, CODA Toronto, Raumakustik, Tomas Barfodo, Graz, Secret Cinema, Todd Terry, Peter Glasspool
Desired result: Fabio

For Tables 4 and 5, the top 10 results for each query are shown, with the desired result listed beneath them. Wrong results of the correct entity type are in bold; a correct result among the top 10 is bold and underlined.


Expanding the dataset with more parameters could lead to other interesting results. Because of time limitations, some entities in the original database were not included in this research. One of these is the 'bio', which contains text describing a particular entity. With natural language processing techniques, new possible entities and links could arise that would further expand the dataset and give room for potentially more accurate link prediction. Similarly, an entity type like Record Label could be the key to a stronger dataset, since it has connections to Events, Artists and most other entity types in the current dataset. Finally, other methods of link prediction could be applied to the dataset. It has been shown that translating embeddings on a hyperplane (TransH) [15] is better suited for many-to-many relational datasets. Due to time limitations this was not done in this paper.

REFERENCES

[1] Sören Auer, Christian Bizer, Georgi Kobilarov, Jens Lehmann, Richard Cyganiak, and Zachary Ives. 2007. DBpedia: A Nucleus for a Web of Open Data. In The Semantic Web. Springer Berlin Heidelberg, Berlin, Heidelberg, 722–735.

[2] Kurt Bollacker, Colin Evans, Praveen Paritosh, Tim Sturge, and Jamie Taylor. 2008. Freebase: A Collaboratively Created Graph Database for Structuring Human Knowledge. In Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data (SIGMOD '08). ACM, New York, NY, USA, 1247–1250. http://dx.doi.org/10.1145/1376616.1376746

[3] Antoine Bordes, Nicolas Usunier, Alberto Garcia-Duran, Jason Weston, and Oksana Yakhnenko. 2013. Translating Embeddings for Modeling Multi-relational Data. In Advances in Neural Information Processing Systems 26. Curran Associates, Inc., 2787–2795.

[4] Mike Butcher. 2014. SoundCloud Launches Ad Platform And Preps Ad-Free Subscription Service. TechCrunch. https://techcrunch.com/2014/08/21/soundcloud-launches-ad-platform-and-preps-ad-free-subscription-service/

[5] Rajarshi Das, Arvind Neelakantan, David Belanger, and Andrew McCallum. 2016. Chains of Reasoning over Entities, Relations, and Text using Recurrent Neural Networks. CoRR abs/1607.01426. http://arxiv.org/abs/1607.01426

[6] Xin Luna Dong, Evgeniy Gabrilovich, Geremy Heitz, Wilko Horn, Ni Lao, Kevin Murphy, Thomas Strohmann, Shaohua Sun, and Wei Zhang. 2014. Knowledge Vault: A Web-Scale Approach to Probabilistic Knowledge Fusion. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '14). 601–610.

[7] Brian Jones. 2016. Relational Learning with TensorFlow. http://nbviewer.jupyter.org/github/fireeye/tf_rl_tutorial/blob/master/tf_rl_tutorial.ipynb

[8] Rudolf Kadlec, Ondrej Bajgar, and Jan Kleindienst. 2017. Knowledge Base Completion: Baselines Strike Back. CoRR abs/1705.10744. http://arxiv.org/abs/1705.10744

[9] Hitoshi Manabe, Katsuhiko Hayashi, and Masashi Shimbo. 2018. Data-Dependent Learning of Symmetric/Antisymmetric Relations for Knowledge Base Completion. https://www.aaai.org/ocs/index.php/AAAI/AAAI18/paper/view/16211

[10] Maximilian Nickel, Kevin Murphy, Volker Tresp, and Evgeniy Gabrilovich. 2016. A Review of Relational Machine Learning for Knowledge Graphs. Proc. IEEE 104, 1 (2016), 11–33.

[11] Maximilian Nickel, Volker Tresp, and Hans-Peter Kriegel. 2011. A Three-Way Model for Collective Learning on Multi-Relational Data. (2011), 809–816.

[12] Maximilian Nickel, Volker Tresp, and Hans-Peter Kriegel. 2011. A Three-Way Model for Collective Learning on Multi-Relational Data. In Proceedings of the 28th International Conference on Machine Learning (ICML '11). Omnipress, USA, 809–816. http://dl.acm.org/citation.cfm?id=3104482.3104584

[13] Sebastian Ruder. 2016. An Overview of Gradient Descent Optimization Algorithms. CoRR abs/1609.04747. http://arxiv.org/abs/1609.04747

[14] Théo Trouillon, Johannes Welbl, Sebastian Riedel, Éric Gaussier, and Guillaume Bouchard. 2016. Complex Embeddings for Simple Link Prediction. In International Conference on Machine Learning (ICML), Vol. 48. 2071–2080.

[15] Zhen Wang, Jianwen Zhang, Jianlin Feng, and Zheng Chen. 2014. Knowledge Graph Embedding by Translating on Hyperplanes.

[16] Bishan Yang, Wen-tau Yih, Xiaodong He, Jianfeng Gao, and Li Deng. 2014. Embedding Entities and Relations for Learning and Inference in Knowledge Bases. CoRR abs/1412.6575. http://arxiv.org/abs/1412.6575
