
Layout: typeset by the author using LaTeX.

Skill tag recommendation in professional networks

An exploration for online co-working spaces

Fabian S. van Stijn (11906448)

Bachelor thesis, Credits: 18 EC
Bachelor Kunstmatige Intelligentie

University of Amsterdam
Faculty of Science
Science Park 904, 1098 XH Amsterdam

Supervisor: Dr. S. van Splunter
Informatics Institute, Faculty of Science
University of Amsterdam
Science Park 904, 1098 XH Amsterdam

January 2021


Abstract

A tag is an important form of metadata that is widely used on online platforms. Tags are words associated with online entities such as an account or a piece of content, and they are an important means for a platform to recommend the right content to an end user. Tags can be user-made or automatically generated. This paper looks specifically at recommending skill tags for user accounts on work-related platforms, with the added challenge of doing so on a small amount of data. Two methods of recommending skill tags are implemented and their trade-offs are compared. It turns out that skill tag recommendation is feasible on a small dataset and that recommending the skills of the most similar users gives the best performance.

Keywords

Skills, Tags, Recommendation system, Small dataset, Okapi BM25, Naive Bayes classification, Knowork


Contents

1 Introduction
   1.1 Contributions
   1.2 Literature Review
2 Background
   2.1 TF-IDF
   2.2 Okapi BM25 Ranking
   2.3 Naive Bayes Classification
3 Methodology
   3.1 Exploring the Basis for Recommendations
   3.2 Constraining the dataset
   3.3 Evaluation
4 Experiment setup
   4.1 Dataset
   4.2 Preprocessing
   4.3 Okapi BM25
       4.3.1 Term Frequency Index
       4.3.2 TF-IDF
       4.3.3 BM25 Rank Calculation
   4.4 Naive Bayes
       4.4.1 Term Document Matrix
   4.5 Evaluation
       4.5.1 Training Test Dataset
       4.5.2 Algorithm Evaluation
5 Results
   5.1 Okapi BM25
   5.2 Naive Bayes
   5.3 Comparison
6 Discussion
   6.1 Results
   6.2 Expert Validation
       6.2.1 Reflection
7 Conclusion
   7.1 Future Work

1 Introduction

Today tagging is used on many platforms to categorise posts, accounts and other data; examples include YouTube, Instagram and LinkedIn. Tags create order in the chaos of different posts: content can be recommended based on these tags and a user's preferences. Users can, for example, enter skills or interests and receive notifications and recommendations for matching content. To make this possible it is important for a platform to have high-quality, complete profiles, which is easier to achieve when completing a profile takes little effort. Tag recommendation therefore comes to mind as a possible solution. Recommending tags can be done in multiple ways, each with its own advantages. Platforms that are starting out tend to have a small user base and thus a small dataset, and the quality of profiles varies from user to user: some users want to finish their profile as quickly as possible in order to use the platform, while others take their time to really complete it. The goal of this thesis is to find an approach that still performs with a limited amount of rough data and that also works across multiple languages.

To achieve this goal, the following research question was formulated:

R1: “Is skill tag recommendation based on user profile data feasible, when predicted on a limited amount of data?”

With the following sub questions:

R1.1: “Is it more effective to recommend skills based on the skills of the best matching user profiles, or on the complete set of available skills?”

R1.2: “Which constraints on using the dataset can be introduced to the recommendation system, to balance the quality of the data available for suggestions, whilst preserving enough quantity of data to ensure generalisability?”

In this thesis skill tag recommendation is considered feasible when a five-skill recommendation gives a precision and recall of at least twenty percent. This means it has to recommend at least one of the five skills correctly (precision of at least 1/5) and find at least twenty percent of the recommendable skills. This is a relatively low bar, but it takes into account that the dataset is small and has its flaws, as discussed in Section 4.1.

1.1 Contributions

The contributions made in this paper can be summarised as follows:

• Two different methods for recommending an arbitrary number of skills were implemented, each producing recommendations from a user's about description and job title on the small dataset.
• The two implementations were compared, and their advantages and disadvantages were discussed.

1.2 Literature Review

While researching how to solve this problem, one of the first viable options that came to mind was auto-tagging. In earlier work, neural networks were used to predict relevant tags based on a text and the tags annotated on it [5]. The main difference with this thesis is that a profile's about description is in general much shorter than the whole papers on which that approach based its predictions. Therefore that implementation could not be used here.

Another way to find relevant tags is by using an ontology: an information model that provides a formal, explicit specification of a shared conceptualisation of a domain [8]. It can be constructed from terms in the texts and enhanced with concepts from the domain. The problem is that the domain of Knowork is broad yet specific: it contains many skills, but only for professionals who work at co-working spaces. An ontology could therefore not be found, nor easily created.

An often-used method for text classification in information retrieval is the Naive Bayes classifier [3], a form of statistical text classification. Although it assumes independent attributes, it has proven to perform very well even when that assumption does not hold [1], and it can learn from a relatively small dataset.

An important feature of both of these methods is that they take independent words as input. This keeps them semantically simple and computationally efficient, and it means a recommendation can be made even from a very short text. Because they use independent words and do not take semantics into account, they are also not restricted to one language: a single model can serve multiple languages as long as there is enough data for each.
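To make the bag-of-words idea concrete, the minimal sketch below (illustrative only; the texts and variable names are made up) counts independent words in a short English and a short Dutch description with exactly the same machinery, which is why one model can serve both languages:

from collections import Counter

# Bag-of-words: only word counts matter, not word order or language.
english = Counter("i am a designer and developer".split())
dutch = Counter("ik ben ontwerper en ontwikkelaar".split())
print(english)  # Counter({'i': 1, 'am': 1, 'a': 1, 'designer': 1, ...})
print(dutch)    # Counter({'ik': 1, 'ben': 1, 'ontwerper': 1, ...})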

2 Background

In this section the underlying theory for the algorithms that make the recommendation system work is explained. The following theories are discussed: TF-IDF, Okapi BM25 and Naive Bayes classification.

2.1 TF-IDF

TF-IDF [3] stands for Term Frequency - Inverse Document Frequency and reflects how important a word is to a document in a dataset. The value increases proportionally with the word count in the document and is offset by the number of other documents that also contain the word, which makes words that occur often throughout the dataset less relevant. The following formula, introduced by Salton and McGill [7], calculates the TF-IDF value:

f_{t,d} \cdot \log \frac{N}{n_t}   (2.1)

with f_{t,d} the term frequency, N the total number of documents in the corpus, and n_t the number of documents containing term t.
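As a quick illustration (a minimal sketch with made-up toy numbers, not the thesis implementation), the TF-IDF of a word that occurs twice in one profile, in a four-profile corpus where two profiles contain the word, is computed as:

import math

f_td = 2  # term frequency in this document
N = 4     # total documents in the corpus
n_t = 2   # documents containing the term
print(f_td * math.log(N / n_t))  # 2 * ln(2) ≈ 1.386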

2.2 Okapi BM25 Ranking

Okapi BM25 is often used in information retrieval systems. It uses a bag-of-words representation of the dataset to determine which documents are most relevant to a query. The BM25 score is calculated with the following formula:

score(D, Q) = \sum_{i=1}^{n} IDF(q_i) \cdot \frac{f(q_i, D) \cdot (k_1 + 1)}{f(q_i, D) + k_1 \cdot \left(1 - b + b \cdot \frac{|D|}{avgdl}\right)}   (2.2)

In this formula f(q_i, D) is the term frequency of q_i in document D, |D| is the length of the document in words, and avgdl is the average document length in the corpus. The variables k_1 and b are freely chosen parameters. The IDF is computed with the following formula, introduced by Robertson et al. [6]:

IDF(q_i) = \ln\left(\frac{N - n(q_i) + 0.5}{n(q_i) + 0.5} + 1\right)   (2.3)

with N the total number of documents and n(q_i) the number of documents containing q_i.
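For intuition, a quick worked example (the corpus size is taken from Section 4.1 purely as an illustrative value): a word appearing in n(q_i) = 7 of N = 424 profiles gets IDF(q_i) = \ln\left(\frac{424 - 7 + 0.5}{7 + 0.5} + 1\right) = \ln(56.67) ≈ 4.04, while a word appearing in half of all profiles scores only \ln(2) ≈ 0.69, so rarer query words dominate the ranking.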

2.3 Naive Bayes Classification

Naive Bayes classification is a relatively simple approach to building a classifier and a form of supervised learning. There are multiple ways to implement a Naive Bayes classifier, but they all assume that the features are independent of each other and contribute equally to the probability of belonging to a certain class. Research has been done into why Naive Bayes classifiers work so well even though this oversimplified assumption is made [10]; it turned out that the dependencies tend to cancel each other out. The main advantages of Naive Bayes are that it does not need a large dataset and that it can work in multiple languages without changes to the model.

Naive Bayes is based on Bayes' theorem, which calculates a conditional probability using the following formula:

P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}   (2.4)

where P(A | B) is the probability of A occurring given that B is true. In Naive Bayes classification an instance is represented by a vector of features, and the probability of this vector belonging to a certain class is p(C_k \mid x_1, \ldots, x_n). Using Bayes' theorem the following formula can be constructed, as has been done by Maron [4]:

p(C_k \mid \mathbf{x}) = \frac{p(C_k)\, p(\mathbf{x} \mid C_k)}{p(\mathbf{x})}   (2.5)

Because \mathbf{x} is known, p(\mathbf{x}) is a constant and only the numerator of the fraction matters. Vector \mathbf{x} contains multiple features, so the probability has to be calculated per feature:

p(C_k) \prod_{i=1}^{n} p(x_i \mid C_k)   (2.6)

This multiplication raises an issue when the probability of a certain feature is zero (for example, the probability of a word given a class when the word never occurs in that class's texts): the whole product becomes zero. To solve this, additive smoothing is used, based on the following formula:

\hat{\theta}_i = \frac{x_i + \alpha}{N + \alpha d} \quad (i = 1, \ldots, d)   (2.7)

with x_i the observed count of feature i, N the total count over all features, d the number of features and \alpha the pseudocount. If \alpha equals 1, additive smoothing is also called "add-one smoothing".
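A quick worked example with made-up counts: for a word that never occurs in a class's texts, x_i = 0; with N = 50 observed words and a vocabulary of d = 1000, add-one smoothing gives \hat{\theta}_i = (0 + 1)/(50 + 1000) ≈ 0.00095 instead of 0, so a single unseen query word no longer forces the whole product in (2.6) to zero.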

3 Methodology

The previous chapter provides the background on the methods used to construct the skill recommendation system. To answer the research questions, the data first has to be cleaned: all data needs to be complete and in the right format. The text is then preprocessed and converted into an index so that the algorithms can compute with it. On top of this, an implementation is made that takes a job title and about description as input and outputs relevant skills based on the ranking of the specific algorithm. The recommendations are made on the limited dataset made available by the Knowork platform. Because the dataset is very limited, basic techniques are most likely to give a good prediction: for a complicated model to work, there needs to be enough data to learn from without overfitting. The approach to answering sub question R1.1 ("Is it more effective to recommend skills based on the skills of the best matching user profiles, or on the complete set of available skills?") is described in Section 3.1. The approach to answering sub question R1.2 ("Which constraints on using the dataset can be introduced to the recommendation system, to balance the quality of the data available for suggestions, whilst preserving enough quantity of data to ensure generalisability?") is described in Section 3.2.

3.1 Exploring the Basis for Recommendations

To answer whether recommending the skills of the best matching users or the best matching individual skills is more effective, two separate implementations are made. Finding the best matching profiles is done with Okapi BM25, which ranks all profiles from most to least relevant; the algorithm needs a term frequency index of the whole dataset to do this. Background on Okapi BM25 is given in Section 2.2, and Section 4.3 explains the implementation in detail.

The individual skill recommendation is done with Naive Bayes classification. This implementation iterates over every single skill and calculates which one has the highest probability of belonging to a given description; it needs a term document matrix for its predictions. Background is available in Section 2.3 and an extensive implementation explanation in Section 4.4.

3.2 Constraining the dataset

To answer the second sub question the dataset has to be inspected. The variations in profile data that influence its quality are discussed, and potential ways of removing low-quality profiles are tested. The dataset itself is examined in Section 4.1.

3.3 Evaluation

To compare the performance of the two approaches, the algorithms have to be evaluated. Two important evaluation metrics are precision and recall. Precision is the fraction of correctly recommended items among all recommended items; recall is the fraction of relevant items that are actually recommended. Four types of classification can be made: true positives (items classified as relevant that are indeed relevant), true negatives (items classified as not relevant that are indeed not relevant), false positives (items classified as relevant that are not relevant) and false negatives (relevant items classified as not relevant). The performance of a recommendation system can vary depending on how many recommendations it makes: because each recommended skill has a confidence score, the best recommendations are given first, so with only one recommended item the precision is most likely the highest. Precision and recall are calculated with the following formulas:

Precision = \frac{tp}{tp + fp}   (3.1)

Recall = \frac{tp}{tp + fn}   (3.2)

These metrics can be calculated for each number of recommendations k, giving precision@k and recall@k.
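For example (made-up numbers): if five skills are recommended, two of them match, and the user listed four skills in total, then precision@5 = 2/5 = 0.4 and recall@5 = 2/4 = 0.5.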

Another way to evaluate a recommendation system is its coverage of the dataset. Coverage indicates the share of skills that the recommendation system actually recommends; sometimes a system only ever recommends a small part of the dataset. Coverage can be calculated with the following formula:

Coverage = \frac{n}{N} \cdot 100   (3.3)

where n is the number of distinct skills recommended and N the total number of skills that can be recommended.
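As a minimal sketch (the function and variable names are hypothetical, not from the thesis code), coverage can be computed by collecting every skill the system ever recommends across the test profiles:

def coverage(recommendations_per_profile, all_skills):
    # Percentage of the skill vocabulary that was ever recommended (Equation 3.3).
    recommended = set()
    for recommendations in recommendations_per_profile:
        recommended.update(recommendations)
    return 100 * len(recommended) / len(all_skills)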

4 Experiment setup

In this section the dataset is described first. After that, explanations are given of how the data is preprocessed, how the recommendation systems are implemented and how their output is evaluated.

4.1 Dataset

The user data of the Knowork platform was made available for this thesis, but because it is relatively small, using another dataset was also considered. The problem with most larger datasets containing job titles, descriptions and skills is that they do not have the same scope: most come from job vacancy platforms that are looking for employees. Taking into account that Knowork users are a specific type of professional (working at co-working spaces), using the Knowork dataset still seemed the most viable option.

Knowork's dataset consisted of 1331 rows, each with 5 entry fields. Table 1 describes the data, with random samples that are not related to each other. Note the spelling mistake ("Costumer") and the capitalisation inconsistencies in this random sample.

Table 1: Overview of the dataset.

Field              Datatype  Sample
UserID             Integer   38
Job Title          String    Incubator Manager
About Description  String    "I'm a screenwriter and director from Finland."
Skills             String    "sales, Costumer relationship management"
Interests          String    "Community Development, community management, bread"

Of the 1331 rows, 639 were empty and 268 were incomplete, which left 424 accounts that were useful to work with. The average length of the about description of these accounts was 27 words; the longest description was 418 words and the shortest 1 word. The mode of the description length was 3 words. Figure 1 plots the distribution of the lengths of the about descriptions.
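As a minimal sketch of how such statistics can be derived (assuming, hypothetically, that the export is a CSV file with the column names from Table 1; this is not the thesis code):

import pandas as pd

df = pd.read_csv("knowork_users.csv")  # hypothetical file name
# Keep only profiles with a job title, an about description and skills.
complete = df.dropna(subset=["Job Title", "About Description", "Skills"])
lengths = complete["About Description"].str.split().str.len()
print(len(complete), lengths.mean(), lengths.max(), lengths.min(), lengths.mode()[0])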

Figure 1: Distribution of the lengths of the about descriptions in the dataset.

An average user added 4.7 skills to their profile. The maximum number of skills was 20 and the minimum 1. The mode of the number of skills was again 3. Figure 2 shows the distribution of the number of skills in the dataset.


The dataset has its shortcomings, as the entries contain (random) test values, nonsense and a selection of inconsistencies:

• I1: Length of the about description
• I2: Different languages
• I3: Typos
• I4: Capitalisation
• I5: Different use of terms for the same skill (synonyms)
• I6: Interpretation

I6 requires some extra explanation: users might have different interpretations of what should be in their about description. One user might talk about their hobbies while another talks about their company. This inconsistency is also present in the skill sets: some users give their job-related skills while others give non-job-related skills (playing guitar, for example).

The variation in about description length seems very important. In this description a user can put anything they want, which makes the descriptions very different in structure, content and length. People also give different numbers of skills; about a quarter of the user profiles had only one or two skills, whereas for training more skills are better.

The mode of the about description length was three words, which is a very small amount to base a prediction on. The trade-off when cutting out the really short about descriptions is therefore that a large part of the data is lost, because most descriptions are short. When the average about description length is plotted against the number of skills, as in Figure 3, a clear upward tendency can be seen: a user who gives a longer description is likely to give more skills as well. Data quality thus tends to increase with the length of a user's about description and with the number of skills they give. It is therefore interesting to see how the recommendation system performs when only about descriptions of a certain minimum length are used; this comparison is done in Section 5. With this comparison, inconsistency I1 (length of the about description) can be minimised.


Figure 3: Average description length per given number of skills.

4.2 Preprocessing

Before the dataset could be used in the recommendation systems it had to be cleaned. Many profiles were incomplete because they were missing the job title, about description or skills; these had to be filtered out. For the algorithms to be able to use the dataset it had to be converted into indexes, which required transforming the text into lowercase and splitting it into words. Splitting the about descriptions into lists of words was done with NLTK (the Natural Language Toolkit). The following sections explain how to construct a term frequency index and a term document matrix.
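A minimal preprocessing sketch (the stemmer choice is an assumption for illustration; the thesis only states that NLTK was used for tokenisation and that words are stemmed):

from nltk.stem.snowball import SnowballStemmer
from nltk.tokenize import word_tokenize  # assumes nltk.download('punkt') has been run

stemmer = SnowballStemmer("english")  # assumed; the thesis does not name its stemmer

def preprocess(text):
    # Lowercase, tokenize and stem a job title or about description.
    return [stemmer.stem(token) for token in word_tokenize(text.lower())]

print(preprocess("I'm a screenwriter and director from Finland."))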

4.3 Okapi BM25

Okapi BM25 is a document-ranking algorithm that uses a term frequency index to calculate which documents (profiles, in our case) are most likely to match a query. A simple way to recommend potentially relevant skills is to look at the most similar accounts and recommend the skills they have entered.

4.3.1 Term Frequency Index

A term frequency index is a large dictionary which, for each word in the dataset, counts the number of occurrences in each document the word appears in. In this case a document refers to a user. One of the items in the dictionary could for example look like this:

'passion': {'FreqPerDoc': Counter({816: 2, 788: 1, 1002: 1, 293: 1, 827: 1, 876: 1, 40: 1}), 'CorpusFreq': 8, 'DocFreq': 7}

To build this index, every user's job title and about description is converted to lowercase and tokenized; this removes inconsistency I4 (capitalisation). For each word in the combined token list the occurrences per user are counted, and the total number of occurrences in the corpus and the number of documents containing the word are also calculated. Algorithm 1 constructs this term frequency index.

Algorithm 1: Index creation

from collections import Counter, defaultdict
from nltk.tokenize import word_tokenize

def build_term_frequency_index(users):
    # users: the training profiles (the train/test split is made by Algorithm 6).
    # For each word, count how often it occurs in each user's job title
    # and about description (both lowercased and tokenized).
    freq_per_doc = defaultdict(Counter)
    for user in users:
        words = word_tokenize(user["job_title"].lower())
        words += word_tokenize(user["about"].lower())
        for word in words:
            freq_per_doc[word][user["user_id"]] += 1
    # Combine the per-document counts with the corpus frequency and the
    # document frequency of every word.
    index = {}
    for word, counts in freq_per_doc.items():
        index[word] = {"FreqPerDoc": counts,
                       "CorpusFreq": sum(counts.values()),
                       "DocFreq": len(counts)}
    return index

4.3.2 TF-IDF

To calculate the BM25 ranking, the TF-IDF value also needs to be computed, using the theory from Section 2.1. Algorithm 2 calculates the TF-IDF.

Algorithm 2: TF-IDF

import math

def tfidf(term, document, index, num_documents):
    # Return 0.0 for terms that do not occur in the corpus at all.
    if term not in index:
        return 0.0
    term_frequency = index[term]["FreqPerDoc"].get(document, 0)
    document_frequency = index[term]["DocFreq"]
    if document_frequency == 0:
        return 0.0
    inverse_document_frequency = math.log(num_documents / document_frequency)
    return term_frequency * inverse_document_frequency

4.3.3 BM25 Rank Calculation

With the index created in the section above, the BM25 ranking for each user profile can be calculated, so that the skill lists of the most similar users can be recommended. This uses the theory from Section 2.2. Algorithm 3 implements the ranker.

Algorithm 3: BM25 ranking

def bm25_ranker(query_words, doc_lengths, index, k1=1.5, b=0.75):
    # doc_lengths maps each user id to the word count of their profile text;
    # k1 and b are the freely chosen BM25 parameters from Equation 2.2.
    num_documents = len(doc_lengths)
    avgdl = sum(doc_lengths.values()) / num_documents
    ranking = []
    for user_id, doc_length in doc_lengths.items():
        score = 0.0
        for word in query_words:
            weight = tfidf(word, user_id, index, num_documents)
            tf = index[word]["FreqPerDoc"].get(user_id, 0) if word in index else 0
            # Equation 2.2: saturate the term frequency and normalise by profile length.
            score += weight * (k1 + 1) / (tf + k1 * (1 - b + b * doc_length / avgdl))
        ranking.append((user_id, score))
    ranking.sort(key=lambda pair: pair[1], reverse=True)
    return ranking
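The skills themselves then come from the top-ranked profiles. A minimal sketch of that last step (the helper name and its parameters are illustrative assumptions; the thesis does not spell this step out as code):

def recommend_skills_from_profiles(query_words, doc_lengths, index, user_skills, n_skills=10):
    # Walk down the BM25 ranking and collect skills until enough are gathered.
    recommendations = []
    for user_id, _ in bm25_ranker(query_words, doc_lengths, index):
        for skill in user_skills[user_id]:
            if skill not in recommendations:
                recommendations.append(skill)
            if len(recommendations) == n_skills:
                return recommendations
    return recommendations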

4.4 Naive Bayes

In the Naive Bayes implementation individual skills are recommended instead of the skills given by another user. This requires a term document matrix. The advantage of recommending single skills is that every skill is ranked individually instead of per user.


4.4.1 Term Document Matrix

In the term document matrix, words are linked to the skills they were used with. Each item represents a skill and contains, per user with that skill, the word counts from that user's about text. An example of an item in the term document matrix is:

'Lifestyle Features': {'hello': {806: 1}, 'im': {806: 1}, 'italian': {806: 1}, 'live': {806: 1}, 'amsterdam': {806: 1}, 'sinc': {806: 1}, 'juli': {806: 1}, '2017': {806: 1}, 'love': {806: 2}, 'write': {806: 1}, 'expat': {806: 1}, 'life': {806: 1}, 'share': {806: 1}, 'soverydutchstori': {806: 1}, 'blogger': {806: 1}, 'drink': {577: 1}, 'lot': {577: 1}, 'coffe': {577: 1}, 'madsen': {577: 1}}

To build this index the about descriptions are again converted to lowercase, tokenized and stemmed. Then, for each user, every word is added to the dictionary entry of each of the user's skills. Algorithm 4 constructs this index.

Algorithm 4: Term document matrix creation

from collections import Counter, defaultdict
from nltk.stem.snowball import SnowballStemmer
from nltk.tokenize import word_tokenize

def build_term_document_matrix(users):
    stemmer = SnowballStemmer("english")  # stemmer choice assumed; the thesis does not name one
    matrix = defaultdict(dict)
    for user in users:
        words = word_tokenize(user["job_title"].lower())
        words += word_tokenize(user["about"].lower())
        stems = Counter(stemmer.stem(word) for word in words)
        for skill in user["skills"]:
            # Merge this user's word counts into the skill's entry, keyed
            # per word and per user id as in the example above.
            for stem, count in stems.items():
                matrix[skill].setdefault(stem, {})[user["user_id"]] = count
    return matrix

To calculate which skills fit best with a certain query, a ranking algorithm was written that uses the theory explained in Section 2.3. Algorithm 5 shows this ranker.

Algorithm 5: Naive Bayes recommendation

import math

from nltk.stem.snowball import SnowballStemmer
from nltk.tokenize import word_tokenize

def naive_bayes_ranker(query, matrix, skill_counts, vocabulary_size, n_recommendations):
    # skill_counts maps each skill to how often users gave it; vocabulary_size
    # is the number of distinct words in the dataset.
    stemmer = SnowballStemmer("english")
    total_skills = sum(skill_counts.values())
    # Stem the query so it matches the stemmed keys of the term document matrix.
    words = [stemmer.stem(word) for word in word_tokenize(query.lower())]
    scores = []
    for skill, word_counts in matrix.items():
        # Prior probability of the skill; log probabilities avoid numerical
        # underflow of the raw product from Equation 2.6.
        log_probability = math.log(skill_counts[skill] / total_skills)
        for word in words:
            occurrences = sum(word_counts[word].values()) if word in word_counts else 0
            # Add-one smoothing (Equation 2.7): unseen words keep the pseudocount 1.
            log_probability += math.log((occurrences + 1) / (skill_counts[skill] + vocabulary_size))
        scores.append((skill, log_probability))
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:n_recommendations]
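As a usage illustration (the variable contents are hypothetical; compare the real example in Section 5.2), a ten-skill recommendation for a short description would look like:

recommendations = naive_bayes_ranker("I'm a designer", matrix, skill_counts,
                                     vocabulary_size, n_recommendations=10)
for skill, log_probability in recommendations:
    print(skill, log_probability)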

4.5 Evaluation

The next two sections describe how the recommendation systems were evaluated: first the training/test split of the dataset, then the algorithm that performs the evaluation.

4.5.1 Training Test Dataset

To see how an algorithm performs on data it has never seen before, the dataset is split into a training dataset and a test dataset; in this thesis a 90/10 percent split was made. Depending on the minimum about description length used, the sizes of the training and test datasets varied. With this split the algorithm "learns" from the training dataset and is evaluated on the test dataset. It can also be useful to select only a certain type of data, for example all profiles with a certain minimum description length. Algorithm 6 splits the data and applies this selection.

Algorithm 6: Training/test split

from nltk.tokenize import word_tokenize
from sklearn.model_selection import train_test_split

def split_dataset(profiles, test_fraction, min_about_length):
    user_ids = []
    for profile in profiles:
        # Skip incomplete profiles and descriptions that are too short.
        if all(profile[field] for field in ("job_title", "about", "skills")):
            tokens = word_tokenize(profile["about"].lower())
            if len(tokens) > min_about_length:
                user_ids.append(profile["user_id"])
    return train_test_split(user_ids, test_size=test_fraction)

4.5.2 Algorithm Evaluation

Now that training and test datasets have been created, the next step is to evaluate on the test data. The recommendations can be checked against the skills that the users entered themselves; these are known to be correct because they come from the user. Since the wording can differ slightly, both the given skill and the predicted skill are stemmed so that correct recommendations are not missed; if the stems match, a correct recommendation has been found. This is done for each number of recommendations, so that precision and recall can be compared depending on how many recommendations are made. Algorithm 7 evaluates the recommendations.

Algorithm 7: Recommendation evaluation

from nltk.stem.snowball import SnowballStemmer

def evaluate_at_k(test_profiles, recommend, max_recommendations):
    # recommend(description, k) returns the top-k predicted skills for a profile.
    stemmer = SnowballStemmer("english")
    precisions, recalls = [], []
    for k in range(1, max_recommendations + 1):
        matched, total_given = 0, 0
        for profile in test_profiles:
            predicted = {stemmer.stem(skill.lower())
                         for skill in recommend(profile["about"], k)}
            for skill in profile["skills"]:
                total_given += 1
                # A match on the stemmed skill counts as a correct recommendation.
                if stemmer.stem(skill.lower()) in predicted:
                    matched += 1
        precisions.append(matched / (k * len(test_profiles)))
        recalls.append(matched / total_given)
    return precisions, recalls

5 Results

For each of the methods precision-recall curves were made. Because a recommendation system is being evaluated, an important factor is how many recommendations it makes. A system recommending only one item should be more precise, because it recommends the skill it is most confident about; when recommending multiple skills the recall should be higher, because a bigger part of all matching skills is found.

5.1 Okapi BM25

First, it is relevant to see what a prediction looks like for an unknown profile. A real user with user id 644 had the following about description: "Content Communicatie Specialist Strateeg Adviseur Trainer". The skills this user gave were: Communication - Social media - Social Media Marketing - content strategy - Branding strategy. The ten-skill recommendation the Okapi BM25 algorithm made was: Online Marketing - Marketing - Social media marketing - Social media - Storytelling - pr - Marketing - Copywriting - branding - Branding strategy.

To represent the performance of both implementations, a ten-run average was calculated for every number of recommendations up to ten. The average, best and worst precision and recall are plotted. This was done for a minimum about description length of zero, five and ten words.

Figure 4: Ten-run average precision-recall curve plotted for an increasing number of recommendations and all description lengths.

There is a big variance between the best and the worst performances. As expected, the more predictions are made, the higher the recall becomes and the lower the precision. The variance indicates that the results depend on the particular train/test split, which is to be expected for a small dataset. With an average of 4.7 skills per account, as calculated in Section 4.1, a recall of 0.3 at a precision of 0.15 for a ten-skill recommendation seems correct, taking into account that most accounts have fewer skills than the average.

Figure 5: Ten-run average precision-recall curve plotted for an increasing number of recommendations with a minimum description length of five words.

Figure 6: Ten-run average precision-recall curve plotted for an increasing number of recommendations with a minimum description length of ten words.

Figure 7: Comparison between the average precision and recall from 1 to 10 predictions for each minimum number of words in the description (>0, >5, >10).

Comparing the three plots, the unrestricted dataset performs better on average. This could be due to the big decrease in dataset size when only longer about descriptions are used, and to data quality being less important to the BM25 algorithm.

5.2 Naive Bayes

The Naive Bayes recommendation system gave the following recommendation for an unknown profile with user id 43. The about description of this user was "I'm a designer". The skills given by this user were Product Design and Advertisement Design. The skills recommended by the Naive Bayes classifier were: Product Design, UI design, 3D design, App designer, User Interface, Banner design, Book Design, Job Costing, English to Swedish, Finnish to English.

Figure 8: Ten-run average precision-recall curve plotted for an increasing number of recommendations and all description lengths.

Figure 9: Ten-run average precision-recall curve plotted for an increasing number of recommendations with a minimum description length of five words.

Figure 10: Ten-run average precision-recall curve plotted for an increasing number of recommendations with a minimum description length of ten words.

Figure 11: Comparison between the average precision and recall from 1 to 10 predictions for each minimum number of words in the description (>0, >5, >10).

The Naive Bayes implementation started performing better with more complete data, even though the dataset got smaller. When only profiles with a minimum description length of ten words were used, it clearly performed better.

5.3 Comparison

Comparing the results, there is a clear winner: the Okapi BM25 algorithm outperformed the Naive Bayes algorithm by about ten percentage points in both precision and recall. The Okapi BM25 implementation is also faster, with a time complexity of O(words × documents) instead of O(words × classes) as Naive Bayes has [2]; the time difference was clearly noticeable when generating the results.

Figure 12: Ten-run average precision-recall curves of both implementations plotted for an increasing number of recommendations.

The coverage was plotted for ten-skill recommendations over the whole test dataset, see Figure 13. Coverage indicates the share of skills that the recommendation system can recommend; it was calculated with Equation 3.3. Logically, the more predictions are made, the more of the skills from the dataset are recommended. The steady increase suggests that the recommendation system does not limit itself to a small subset of the data, at least for ten-skill recommendations; with five-skill recommendations less than ten percent of the total skills are recommended. An advantage of recommending more skills is therefore a bigger variation in the skills recommended from the dataset. Naive Bayes clearly has a higher coverage of the dataset than Okapi BM25.

Figure 13: The coverage of the skills in the dataset for both implementations.

6 Discussion

The upcoming sections discuss the results from Section 5 and the feedback given by a team of experts on the subject.

6.1 Results

Looking at the overall results, the precision of both recommendation systems is relatively low. Inspecting the recommendations per profile, it becomes clear that on some profiles an algorithm performs well, with multiple matching recommendations, and on others badly, with no matches at all. Although inconsistencies I1 and I4 (length of the about description and capitalisation) were removed, the recommendation systems still had to deal with inconsistencies I2, I3, I5 and I6 (different languages, typos, synonyms and differences in interpretation). These inconsistencies could explain the variance in performance and the low precision. Especially different languages (I2) and different interpretations (I6) could make the recommendations perform better on one test dataset and worse on another.

When evaluating the algorithms, the results could also vary a lot from run to run, which could be due to the small size of the dataset: whenever certain well-performing accounts ended up in the test dataset, the algorithm performed worse, and vice versa. Some users only write a description because it is mandatory for creating an account on the platform, and fill in something that is not related to them or their skills. This type of profile also makes the predictions less precise: because such descriptions are short, their words weigh heavily for the algorithm even though they carry no real meaning.

Something that became evident when looking at the results is that word matching may be too simplistic an approach for evaluation. For example, a user gave the skill "Javascript" and got a recommendation with the skill "Programming". These words do not match at all, yet the recommended skill is relevant. Here an ontology-based approach, as discussed in Section 1.2 [8], would have an advantage, because Javascript could fall under the category programming and thus be deemed relevant.

6.2 Expert Validation

Because this thesis was done for the Knowork platform, it was interesting to get opinions from the team behind Knowork, which consists of five domain experts. To collect their opinions, an explanation video was made giving the team context on what had been built, the thought process behind it and how the recommendations are made. Two concrete examples of skill suggestions were also given, one based on Okapi BM25 and one based on Naive Bayes. After this, the following questions were asked; the most relevant answers are discussed per question:

“The recommendations that are done by the algorithm are based on the skills and about descriptions of other users on the platform. Does this approach bring any risks? Please formulate two if you can.”

The biggest concern of the Knowork team was that users often neglect the about description and do not fill it in well enough. Although they saw this as a risk, they did think using the job title and description as inputs was the best approach. Another risk they pointed out is that users can choose skills that do not apply to them, because that makes creating the account faster, or skills that sound similar but have a different meaning.

“Some profiles performed a lot better than others. This seems to be because of the different content people put in their about description. An option to increase precision could be to suggest a certain format to write about in the description. Do you see this as a viable option and what are your thoughts on this idea?”

The Knowork team pointed out that the more restrictions are placed on an input, the less likely people are to fill it in. Users should therefore not be required to fill in their description in a certain format, but rather be given an example of what an about description could look like.

6.2.1 Reflection

Users not being willing to fill in their about description is a hard problem to solve. Skill recommendation makes completing the profile easier, but users still have to put in some effort; this is an inevitable problem on a platform. It could be alleviated by giving users an example of what an about description could look like: the users are then guided towards putting certain content in their description, while no restrictions are imposed, making it more likely that an entry is received.

7 Conclusion

R1.1: “Is it more effective to recommend skills based on the skills of the best matching user profiles, or on the complete set of available skills?”

Profile-based skill recommendation with the Okapi BM25 ranker performs better than recommending individual skills for this dataset size and quality, as shown in Section 5: both precision and recall are higher overall.

R1.2: “Which constraints on using the dataset can be introduced to the recommendation system, to balance the quality of the data available for suggestions, whilst preserving enough quantity of data to ensure generalisability?”

As can be seen in Section 5, cutting out the shortest about descriptions improved the performance of the individual skill recommendation. The profile-based skill recommendation, however, worked best when given all profiles and performed better overall. Therefore, setting a constraint on the minimum about length of a profile would not significantly improve predictions.

This thesis answers the following main research question. R1: “Is skill tag recommendation based on user profile data feasible, when predicted on a limited amount of data?”

Looking at the results and the sub questions answered above, the conclusion can be drawn that making relevant recommendations with this dataset is feasible. Despite the shortcomings of the dataset, the recommendations made from around 300 training profiles are surprisingly good, and there is a lot of room for improvement as the dataset continues to grow. The experts behind the platform were satisfied with the predictions.

7.1 Future Work

Due to the small set of profiles, one of the things this research did not take into account is the use of semantics. The dataset also has a big variety of skill sets, because the skills are not bounded by a specific professional subgroup; this could make the use of semantics harder, given the amount of different professional jargon. Some semantics-based approaches are too computationally intensive or do not work with short descriptions. Semantic analysis has been used to reduce computational time in neural networks by reducing the size of the vector space model [9]; this could also be done to combine parts of sentences in about descriptions. It would be interesting to find out whether this would increase accuracy, or whether it would further reduce the effective size of the dataset and thus decrease performance.

Because data quality is important for the performance of the recommendation system, it could be interesting to build an algorithm that determines the quality of a profile, using for example the about description length, semantic analysis, the number of skills, orthography and description/skill matching. Based on this, a user could be excluded from the recommendation dataset or notified that their account could be improved.

As Knowork's dataset grows, and its quality perhaps changes, it would also be interesting to see how the performance evolves. More data might benefit performance over time, but might also make other implementations viable that need more data.

It would also be interesting to find out whether suggesting that users write their about description in a certain format would increase performance, and which format would be most beneficial.

One of the benefits of the approaches used is that they can work with multiple languages: as more accounts in a new language are created, the performance of the recommendation system for that language will increase. Stemming was used in the recommendation systems, so it would be interesting to see how they perform on different languages and what changes to the stemming step would be needed per language.

Acknowledgements

Writing this thesis could not have been done without the help of a few individuals. I would like to thank Greg Marshall from the Knowork platform for assisting and for all his input during our meetings. I would like to thank my supervisor Sander van Splunter for all his help while writing my thesis. I would also like to thank the Knowork team for taking the time to fill in the questionnaire. In addition, I would like to thank my family and friends for supporting me during my thesis.

References

[1] P. Domingos and M. Pazzani. Beyond Independence: Conditions for the Optimality of the Simple Bayesian Classifier, page 7. https://www.ics.uci.edu/~pazzani/Publications/mlc96-pedro.pdf

[2] C. Elkan. Naive Bayesian Learning, page 4. December 1998. https://www.researchgate.net/publication/2343024_Naive_Bayesian_Learning

[3] C. D. Manning. Introduction to Information Retrieval, pages 253-263. July 2008. https://nlp.stanford.edu/IR-book/pdf/irbookprint.pdf, ISBN: 0521865719

[4] M. E. Maron. Automatic Indexing: An Experimental Inquiry, page 409. https://sci2s.ugr.es/keel/pdf/algorithm/articulo/Maron1961.pdf

[5] P. Mukalov, O. Zelinskyi, R. Levkovych, P. Tarnavskyi, A. Pylyp, and N. Shakhovska. Development of System for Auto-Tagging Articles, Based on Neural Network, pages 3-7. http://ceur-ws.org/Vol-2362/paper10.pdf

[6] S. E. Robertson, S. Walker, S. Jones, M. M. Hancock-Beaulieu, and M. Gatford. Okapi at TREC-3, page 2. January 1994.

[7] G. Salton and M. J. McGill. An Introduction to Modern Information Retrieval. 1983. https://dl.acm.org/doi/book/10.5555/576628, ISBN: 0070544840

[8] G. Sriharee. An Ontology-Based Approach To Auto-Tagging, page 1. https://link.springer.com/content/pdf/10.1007/s40595-014-0033-6.pdf

[9] B. Yu, Z. Xu, and C. Li. Latent semantic analysis for text categorization using neural network, page 903. https://doi.org/10.1016/j.knosys.2008.03.045

[10] H. Zhang. The Optimality of Naive Bayes, page 6. http://www.cs.unb.ca/~hzhang/publications/FLAIRS04ZhangH.pdf
