Investigation of the indicative features of job advertisement interest by matching job and job seeker

Submitted in partial fulfillment for the degree of Master of Science

Rianne Klaver
12395927

Master Information Studies: Data Science
Faculty of Science, University of Amsterdam
2019-06-26

First Supervisor: Prof. Dr. Evangelos Kanoulas, Informatics Institute, e.kanoulas@uva.nl
Second Supervisor: Dr. Yuval Engel, Amsterdam Business School (Entrepreneurship & Innovation Section), y.engel@uva.nl
Investigation of the indicative features of job advertisement
interest by matching job and job seeker
Rianne Klaver
University of Amsterdam
ABSTRACT
Research into job recommendations and job attractiveness has always been of interest for companies. Choosing the right applicant can be of great financial importance, since the job needs to be performed adequately. If a company knows what aspects draw in applicants, it could alter its advertisement to be more attractive to the desired applicants. This research investigates which features contribute most to a user expressing interest in an advertisement. It makes a distinction between three kinds of indicative features: general, gender-specific, and user-specific. Furthermore, several methods are employed to determine whether a user is actually capable of performing the job, i.e. whether the user is a match.
KEYWORDS
Job recommendation, Job interest prediction, Text matching, Feature importance
1 INTRODUCTION
Finding the right candidate to fill a job vacancy is crucial for companies. Since a company invests in its new employee, it could lose a significant amount of money if the person's performance is lacking or if they resign [21]. Researchers have focused on how to select the right candidate from the pool of candidates [16], and how the selection process actually works in practice [10]. However, what if the right person was not even in that pool to begin with?

An application was built to match companies and job seekers. Job seekers could see job advertisements, including information about the company, and 'like' or 'dislike' the advertisement. Likewise, a company could like or dislike a job seeker's profile. A match occurred when both parties liked each other; the application would then provide communication opportunities between them to further discuss the job advertisement.

The data collected from this application can provide great insight into what drives a job seeker and a company looking to fill a vacancy to be interested in one another. This research focuses on the interest of the job seeker, and investigates whether it can be predicted if a certain user will like a certain advertisement using the data of the application, and which features drive that decision. This information could be useful for a company seeking the perfect candidate for its vacancy: it would know which of the information it provides is most important to job seekers. The company can then focus on altering that information in the job advertisement to attract the right candidates and thus improve its candidate pool. This increases its chance of finding the right employee.
This research aims to answer the following question:

• What company, job advertisement, and job seeker features are most indicative of whether a job seeker will like the advertisement or not?

The sub-questions that will be answered in this research are:

• What is the best method to determine if a user matches with a job advertisement?
• Do the indicative features differ per user?
• Is there a difference between indicative features for each gender?
The research questions are answered by first creating a random forest classifier that predicts whether a user will like the advertisement. The importance of each feature is then measured to see which features contribute most to a prediction. The feature contributions to single predictions of various user ratings are calculated to see whether those features differ per user prediction. Furthermore, this research explores methods to match the skills a user has to the skills a job requires, and to match the job title to the user's previous work experience. This is done to see if a user profile actually matches the job advertisement, i.e. whether the user can actually exercise the profession. Lastly, separate models are trained on datasets containing either only male or only female users, to see whether the indicative features, measured by the feature importance, differ per gender.
2 RELATED LITERATURE
2.1 Applicant's attraction to a job
Various studies have researched which features of a job advertisement and the company behind it can indicate whether an applicant will be interested in pursuing the job.

Chapman et al. studied the attraction of an applicant to a certain job [12]. They combined 71 studies to find that characteristics of the company and the job itself are important features for this attraction. Moreover, the researchers found that an applicant's perception of fit in the company, a subjective feature, is an important predictor as well. This perception of fit can be between the applicant and the organisation, or between the applicant and the job. Another interesting finding is that the available data about job characteristics is used more by women than by men when assessing a job.

Furthermore, research has been done on employee preferences per gender when selecting a job. Flory et al. found that showing a job advertisement with male connotations would attract different responses from women and men than the same advertisement with these connotations removed [17].

Moreover, Cheryan et al. found that the physical environment influences whether people want to participate in a certain group, computer science in their research [15]. For example, if women feel they do not fit in, because certain objects represent male stereotypes, they have less interest in participating. Gender stereotyping of certain jobs starts early in our lives: Miller et al. showed that children from the age of 8 already find that some jobs should be performed by men or by women [27].

Lastly, research has been conducted on which features women or men find important when choosing a job. Centers et al. found that women find 'good coworkers' more important than men do when choosing a job [11].
2.2 Job recommending
Section 2.1 demonstrates that different applicants can respond differently to the same job advertisement, dependent on several factors. Thus, this information could be used to provide applicants with the right advertisements, i.e. give the right job recommendations. This would possibly make them more likely to like the advertisement in the application. Research has been done on how to recommend the right job, i.e. predict whether the user will like the advertisement. Lacic et al. showed that a user's 'frequency and recency of interactions with a job posting' can be used to recommend jobs [22]. Kille et al. investigated whether clicking on an advertisement, bookmarking it, or replying can indicate whether a user finds the advertisement interesting [20]. Their research concluded that replying is the most accurate feature for providing relevant recommendations. The 2016 and 2017 ACM Recommender Systems Challenge involved job recommendations, and the 2017 contest specifically focused on cold-start recommending [8, 24]. The team of Lian et al. used a single boosting tree as their predictive model, and investigated whether basic profile features could be matched to the advertisement features to improve the recommendations [24]. They matched the country, industry, and career level, and reported that the country match was in their top ten of most important features. Bianchi et al. participated in the contest as well, and also investigated user-advertisement profile matching [9]. Their method involved calculating the cosine coefficient of the common features and computing TF-IDF to investigate feature relevance.
3 METHODOLOGY
3.1 Data Description
The dataset consists of ratings given between 2015 and 2017; its 23 features can be divided into three categories: data about the company, about the job advertisement, and about the user rating the job. The dataset is labelled; the labels state whether the user has liked or disliked the advertisement. The dataset contains the following unstructured features: job required skills, job tasks, user work experience, and user skills. All other considered features are structured. Table 1 shows general statistics of the dataset. A distinction is made between unique and total male and female users: a user was able to rate multiple advertisements, so the total number is the sum of all ratings made by either men or women.

Table 1: Statistics of the dataset

    Ratings              675787
    Liked                114080
    Disliked             561707
    Job advertisements   2588
    Users                8920
    Male                 4967 unique, 380090 total
    Female               3910 unique, 293746 total

Table 1 reveals a heavy imbalance between the like and dislike classes. Figure 1 shows the contribution per gender to this imbalance. It appears that women have been more active on the application, even though there were fewer unique female users than male users.

Figure 1: The job ratings per gender

3.1.1 Company data. The user would see certain information about the company behind the job advertisement when rating. This information concerned the company itself, such as the company size, foundation year, average employee age, and the percentage of female employees. The industry of the company is also recorded, and follows the Global Industry Classification Standard (GICS). GICS is a global standard for classifying companies into sectors [35]. The percentage of women in the industry is recorded as well.
3.1.2 Job data. Besides information about the company, the user would also see information about the job itself. This includes the tasks of the job, the skills the applicant is required to have, the location of the job, and the language of the advertisement. The user would also see whether the job is full-time, part-time, or an internship. Moreover, the job advertisement indicated the level of the job, ranging from entry-level to executive. The dataset also contains a classification of the job into six different occupational groups, e.g. 'management occupations'.
3.1.3 User data. Users had to disclose personal data when subscribing to the application. The company would see this data when rating users. The users provided their gender, location, languages, education, graduation year, and years of experience. Figure 2 shows a histogram of the graduation decade of users to provide an idea of the user distribution. It shows the distribution of the job ratings per graduation decade, in total percentage and absolute values per decade. It can be seen that most users of the application are recent graduates or still studying. The education information was stored unstructured, and has been modified into a structured feature indicating whether the education was Information Technology (IT) related or not.
3.2 Classifier
Figure 2: The job ratings per user graduation decade

A random forest classifier is used to classify whether a user will like an advertisement. Liaw et al. explain the algorithm [25]: a random forest consists of n decision trees, which are built on n bootstrap samples. To decide on the best split at every node, only a subset of the features is used instead of all of them. All decision trees are combined to provide one classification: each tree predicts the same sample and the majority vote for a certain class is the result. The model is evaluated on accuracy and F-score [6]. Accuracy alone would be insufficient since the dataset is imbalanced, i.e. a larger number of users has disliked advertisements [19].
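The classifier and evaluation setup described above can be sketched as follows. This is an illustrative Python snippet on synthetic stand-in data, not the thesis code: the feature count mirrors the dataset's 23 features, but the data, the signal in the labels, and the number of trees are invented.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the rating dataset: 23 numeric features and an
# imbalanced like (1) / dislike (0) label driven by the first two features.
rng = np.random.default_rng(42)
X = rng.normal(size=(2000, 23))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=2000) > 1.2).astype(int)

# 70/30 train/test split, as in section 3.3.1.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)
pred = clf.predict(X_test)

acc = accuracy_score(y_test, pred)
f1 = f1_score(y_test, pred)  # F-score of the minority 'like' class
```

Reporting the F-score next to accuracy matters here because a classifier that always predicts 'dislike' would already reach a high accuracy on this imbalanced label.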
3.3 Dataset pre-processing
3.3.1 Filtering. The dataset has been filtered on the following conditions:

• Each job advertisement must concern a full-time or part-time job.
• Each job advertisement must have over 85 user ratings.
• Each job advertisement must be written in English.
• Each user must be able to speak English.
• Each company behind an advertisement must have fewer than 500 employees.

Advertisements with fewer than 85 ratings, internships, and large companies are filtered out to remove outliers which could negatively impact the model. Internships make up only a very small part of the dataset (4%), which is the case for large companies as well, and advertisements with fewer than 85 ratings make up only 5% of the dataset. The English language requirements are set to ensure that text matching on certain features can be performed. Furthermore, any entry containing missing values has been dropped, as the random forest classifier implemented by scikit-learn does not support handling missing values [36]. The dataset has been split into a train (70%) and test (30%) set.
3.3.2 Basic features. Several basic features have been created based on existing features in the dataset:

• Whether a user can speak Dutch or not.
• How many skills a user claims to have.
• The count of skills required in the job advertisement.
3.3.3 Text features. The features user skills, job required skills, job title, and user work experience all consist of text. These features have been sorted alphabetically per skill or job title per rating. Furthermore, the most frequently occurring Dutch words have been translated. The user work experience feature contained Dutch words even though the dataset has been filtered on English job advertisements and English-speaking users. This is most likely due to the ability to import previous work experience from LinkedIn, which can be in Dutch. For example, 'communicatie' has been translated to 'communication'. Frequently occurring abbreviations have also been replaced by the full word, e.g. 'jr.' by 'junior'. These alterations are of importance for the stemming and for calculating word similarity, as explained in the following sections.
3.3.4 Stemming. Each word in the text features mentioned in section 3.3.3 has been stemmed using the Lancaster stemmer. This stemmer consists of 115 rules that state when the ending of a word has to be removed or replaced [33]. Multiple rules can be applied to one word. Moral et al. state that the Lancaster stemmer is stronger than the Porter, Lovins, Dawson, and Krovetz stemmers [30]. A stronger stemmer could be beneficial since text matching will be performed on these features, as it is expected to stem the most words with the same meaning to the same stem.
3.3.5 Robust scaling. All features are scaled using the robust scaler, which is a scaler robust to outliers [3]. The features are scaled separately from each other by first subtracting the median [2]. Then, the data is scaled according to the range between the first and third quartile.
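A minimal sketch of this per-feature scaling, assuming the usual median/interquartile-range form of robust scaling described in [2, 3] (scikit-learn's RobustScaler implements the same idea):

```python
import numpy as np

def robust_scale(x):
    """Scale one feature: subtract the median, then divide by the
    interquartile range (Q3 - Q1), so outliers barely affect the scaling."""
    x = np.asarray(x, dtype=float)
    q1, med, q3 = np.percentile(x, [25, 50, 75])
    return (x - med) / (q3 - q1)

# The outlier 100 does not distort the scale of the other values.
scaled = robust_scale([1, 2, 3, 4, 100])
```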
3.4 SMOTE
The Synthetic Minority Over-sampling Technique (SMOTE) has been used to resolve the imbalance of the dataset. SMOTE oversamples the class 'like', which has significantly fewer entries than the class 'dislike' [13]. Each entry of the class 'like' is seen as a vector, and its nearest neighbour is found. The original entry is subtracted from the neighbour vector; the result is multiplied by a value randomly chosen in the range [0, 1] and added to the original entry. This results in a new, synthetic entry. This is done for every entry, and can be repeated multiple times to achieve the same number of entries as the 'dislike' class.
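A minimal single-pass sketch of this interpolation step, with an invented two-dimensional 'like' class; a production pipeline would more likely use a library implementation such as imbalanced-learn's SMOTE:

```python
import numpy as np

def smote_samples(minority, rng):
    """One SMOTE pass: for each minority entry, interpolate between the
    entry and its nearest minority neighbour at a random point in [0, 1]."""
    synthetic = []
    for i, x in enumerate(minority):
        dists = np.linalg.norm(minority - x, axis=1)
        dists[i] = np.inf                      # exclude the entry itself
        neighbour = minority[np.argmin(dists)]
        gap = rng.uniform(0, 1)                # random value in [0, 1]
        synthetic.append(x + gap * (neighbour - x))
    return np.array(synthetic)

rng = np.random.default_rng(0)
likes = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
new = smote_samples(likes, rng)  # one synthetic entry per original entry
```

Each synthetic entry lies on the line segment between an original entry and its nearest neighbour, so the oversampled class stays inside its own region of the feature space.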
3.5 Job tasks
A company had to provide job tasks for every job advertisement, indicating what the employee would be working on and how much time they would spend performing each task. The time indication is on a daily scale, with a maximum of five days to divide over the tasks. A company could add up to three tasks to an advertisement. The following four features have been created based on this information:

• Task total count: the total number of days of one week spent working on the tasks
• Task different count: the number of different tasks that have to be performed on the job
• Task maximum: the longest period of time spent on one task in one week
• Task minimum: the shortest period of time spent on one task in one week
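The four task features can be derived directly from an advertisement's task list. A small sketch with a hypothetical set of tasks (the task names and day counts are invented):

```python
# Hypothetical task list: (task description, days per week), up to three per ad.
tasks = [("develop backend services", 3), ("code review", 1), ("sprint planning", 1)]

days = [d for _, d in tasks]
task_total_count = sum(days)       # total days per week spent on the tasks
task_different_count = len(tasks)  # number of different tasks on the job
task_maximum = max(days)           # longest time spent on one task
task_minimum = min(days)           # shortest time spent on one task
```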
3.6 Rarity of skills
3.6.1 User skills. A user provided their skills as part of their profile. To investigate whether their skills are an indication of whether they will like a certain job advertisement, the rarity of their skill set is determined. This is done in two ways: based on all users' skill sets, and based on the skill sets of the users who have rated that particular advertisement.
3.6.1.1 Based on all the users' skill sets - skill-rarity1. The rarity of the skill set is measured by taking the following steps. First, a new dataset is created containing all unique user profiles. Then, an Inverse Document Frequency (IDF) dictionary based on all the skills of the unique users is calculated. This dictionary contains the IDF score for every word appearing in any skill set. The IDF score is calculated as follows:

    IDF_i = log(N / df_i) + 1    (1)

where IDF_i is the IDF score for word i, N equals the total number of documents, i.e. skill sets, and df_i equals the number of documents that contain word i, i.e. the document frequency [7].

The following equation is used to calculate the total IDF of a user's skill set j:

    IDF_j = (1 / n_j) * Σ_{i=1}^{n_j} IDF_{i,j}    (2)

where IDF_{i,j} is the IDF score for word i appearing in document j, and n_j is the total number of words in document j.
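Equations 1 and 2 can be sketched as follows, with hypothetical skill sets standing in for the unique user profiles:

```python
import math

def idf_dictionary(skill_sets):
    """IDF_i = log(N / df_i) + 1 for every word over the given skill sets
    (equation 1), where df_i is the number of skill sets containing the word."""
    n = len(skill_sets)
    vocab = set(w for s in skill_sets for w in s)
    return {w: math.log(n / sum(w in s for s in skill_sets)) + 1 for w in vocab}

def skill_set_idf(skill_set, idf):
    """Average IDF of the words in one skill set (equation 2)."""
    return sum(idf[w] for w in skill_set) / len(skill_set)

# Invented unique user profiles: 'excel' is rare, 'python' is common.
profiles = [{"python", "sql"}, {"python", "java"}, {"sql", "excel"}, {"python"}]
idf = idf_dictionary(profiles)
rarity = skill_set_idf({"python", "sql"}, idf)
```

A rare skill appears in few skill sets, giving it a high IDF score, so a user with an unusual skill set receives a high average.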
3.6.1.2 Based on the skill sets of the users who have rated that certain advertisement - skill-rarity2. This calculation largely follows the steps taken in section 3.6.1.1; however, multiple new datasets are created, one for every job advertisement. Each dataset contains all profiles of users who have rated that advertisement. A new IDF dictionary is created for each dataset, containing the IDF score for every word appearing in only the skill sets of the users who have rated that advertisement. Then, the total IDF of a user's skill set in that particular dataset is calculated based on equation 2. Thus, users who have rated multiple advertisements are likely not assigned the same IDF score for their skill set for every advertisement.
3.6.2 Job required skills - skill-rarity3. A job advertisement contains information on the skills the applicant should possess, provided by the company. If a company asks for a very specific, uncommon skill, users might be more inclined to dislike the advertisement, as there is a higher chance they do not have that skill. Therefore, the rarity of the job required skills might be indicative of whether a user will like the advertisement. The rarity of these skills is determined in almost the same manner as presented in section 3.6.1.1; however, new datasets of all unique job advertisements, instead of all unique users, are created.
3.7 Matching user and job required skills
Both the user and the company provided skills the applicant has or should have. This can be used to determine whether the user is a match with the advertisement, i.e. whether the user is actually capable of performing the job. Several methods have been tested to determine these matches based on the skill sets; they are discussed in the following sections.
3.7.1 Levenshtein distance. The Levenshtein distance is a metric that calculates the edit distance between sentences x and y. The distance is defined as the sum of character insertions, deletions, and replacements performed on sentence x to transform it into sentence y [23]. Three different methods are implemented to calculate the Levenshtein distance between the user skill set and the job required skill set; they are explained in the following three sections.
3.7.1.1 Complete string - Levenshtein1. All the user skills are combined into one string u, and likewise all the job required skills into one string j. The Levenshtein distance L(u, j) is then calculated between these two strings. This distance is normalised by dividing it by the maximum character length of strings u and j, resulting in the normalised Levenshtein distance over the complete skill sets.
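A minimal sketch of the normalised distance, using a standard dynamic-programming implementation of the Levenshtein distance (the thesis does not specify its implementation):

```python
def levenshtein(a, b):
    """Edit distance: minimum number of character insertions, deletions,
    and replacements turning string a into string b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # replacement
        prev = cur
    return prev[-1]

def normalised_levenshtein(u, j):
    """L(u, j) divided by the maximum character length of u and j."""
    return levenshtein(u, j) / max(len(u), len(j))
```

The normalisation maps the distance into [0, 1], so long and short skill strings become comparable.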
3.7.1.2 Per skill - Levenshtein2. This method focuses on individual skill matches between the two skill sets, and outputs the number of matched skills. A matched skill is defined as a skill in the user skill set, u_i, that is similar to a skill in the job required skill set, j_k. To determine the number of matched skills, all skill pairs (u_i, j_k) between the user and job required skill sets are created. For every skill pair, the normalised Levenshtein distance as introduced in section 3.7.1.1 is calculated; however, the complete strings are now each represented by one skill. Next, the best matching j_k is determined for every u_i, i.e. the (u_i, j_k) pair with the lowest Levenshtein distance. Each u_i can only match with one j_k. A threshold t is needed to determine how many pairs (u_i, j_k) actually match, i.e. a match occurs when L(u_i, j_k) < t. t is found by empirically trying different thresholds; t = 0.5 is chosen as it results in the highest model accuracy.
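The per-skill matching can be sketched as follows; the skill lists are invented, and the threshold defaults to the t = 0.5 chosen above:

```python
def lev(a, b):
    # Character-level Levenshtein distance via dynamic programming.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def matched_skills(user_skills, job_skills, t=0.5):
    """Count user skills whose best-matching job skill has a normalised
    Levenshtein distance below threshold t. Each user skill matches at most
    one job skill; job skills may be reused (as in Levenshtein2, not 3)."""
    matches = 0
    for u in user_skills:
        best = min(lev(u, j) / max(len(u), len(j)) for j in job_skills)
        matches += best < t
    return matches

# 'python' ~ 'python 3' and 'sql' ~ 'mysql' match; 'marketing' matches nothing.
n = matched_skills(["python", "sql", "marketing"], ["python 3", "mysql"])
```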
3.7.1.3 Per skill - Levenshtein3. This method mostly follows the method explained in section 3.7.1.2. However, besides u_i only being allowed to match once, j_k is only allowed to match once as well. The matching pairs (u_i, j_k) are selected in the following manner: find the lowest L(u_i, j_k), register the pair as a potential match, and remove all other pairs containing either u_i or j_k from the list of all skill pairs. Again, the lowest remaining L(u_l, j_m) is found and registered as a potential match; this pair can thus never contain u_i or j_k. Once all u_i or all j_k are matched, x potential matches have been found, where x is the size of the smaller of the two skill sets. The Levenshtein distances of these potential matches are then compared with a threshold t to arrive at the final number of actual matches; t = 0.2 is chosen as it results in the highest model accuracy.
3.7.2 ROUGE score. Recall-Oriented Understudy for Gisting Evaluation (ROUGE) is an algorithm to determine the similarity between two text fragments, based on overlapping words [26]. Several ROUGE scores have been used to calculate the distance between the user and job required skills: ROUGE-1 and ROUGE-2 (ROUGE-N scores), ROUGE-L, and ROUGE-WE-1 and ROUGE-WE-2 (ROUGE-WE scores).
3.7.2.1 ROUGE-N. ROUGE-N measures the overlap of either unigrams (ROUGE-1) or bigrams (ROUGE-2) between the hypothesis sentence and the reference sentence. A unigram is a single word; a bigram is a sequence of two words. The hypothesis sentence in this case is the job required skill set and the reference sentence is the skill set of the user. The ROUGE-N F-score is used to measure the similarity between the two sentences and is calculated by the following equations [26]:

    ROUGE-N_Recall = Σ_{gram_n ∈ R} Count_match(gram_n) / Σ_{gram_n ∈ R} Count(gram_n)    (3)

where gram_n is an n-gram of length n, R is the reference sentence, Count_match(gram_n) is the count of n-grams overlapping between the reference and hypothesis sentences, and Count(gram_n) is the count of an n-gram.

    ROUGE-N_Precision = Σ_{gram_n ∈ R} Count_match(gram_n) / Σ_{gram_n ∈ H} Count(gram_n)    (4)

where H is the hypothesis sentence.

The ROUGE-N F-score is then calculated using the general formula for the F-score of all ROUGE variants:

    ROUGE_F-score = (2 * ROUGE_Recall * ROUGE_Precision) / (ROUGE_Recall + ROUGE_Precision)    (5)

3.7.2.2 ROUGE-L. ROUGE-L is based on the longest common subsequence (LCS) between the hypothesis and reference sentences. The LCS is the longest sequence of words occurring in both sentences [39]. The sequence does not have to consist of consecutive words; however, the word order cannot be altered. For instance, say hypothesis = 'java programming, C#, SQL' and reference = 'C# programming, SQL'. The LCS would be either 'C#, SQL' or 'programming, SQL', and thus be of length two. ROUGE-L is then calculated as follows [26]:

    ROUGE-L_Recall = LCS(H, R) / Σ_{gram_1 ∈ R} Count(gram_1)    (6)

where LCS(H, R) is the length of the LCS of the hypothesis and reference sentences.

    ROUGE-L_Precision = LCS(H, R) / Σ_{gram_1 ∈ H} Count(gram_1)    (7)

The ROUGE-L F-score is calculated using formula 5.
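ROUGE-1 and ROUGE-L can be sketched as follows, reusing the example sentences above; tokenisation by whitespace splitting is a simplifying assumption:

```python
def rouge_f(recall, precision):
    # General ROUGE F-score (equation 5).
    if recall + precision == 0:
        return 0.0
    return 2 * recall * precision / (recall + precision)

def rouge_1(hyp, ref):
    """Unigram overlap: recall over the reference, precision over the hypothesis."""
    h, r = hyp.split(), ref.split()
    match = sum(min(h.count(w), r.count(w)) for w in set(r))
    return rouge_f(match / len(r), match / len(h))

def lcs_length(h, r):
    # Longest common subsequence of two word sequences: order is preserved,
    # but the words need not be consecutive.
    table = [[0] * (len(r) + 1) for _ in range(len(h) + 1)]
    for i, hw in enumerate(h, 1):
        for j, rw in enumerate(r, 1):
            table[i][j] = table[i - 1][j - 1] + 1 if hw == rw \
                else max(table[i - 1][j], table[i][j - 1])
    return table[len(h)][len(r)]

def rouge_l(hyp, ref):
    h, r = hyp.split(), ref.split()
    lcs = lcs_length(h, r)
    return rouge_f(lcs / len(r), lcs / len(h))

score = rouge_1("java programming c# sql", "c# programming sql")
```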
3.7.3 ROUGE-WE score. The ROUGE-WE score was introduced by Ng et al. and is based on calculating ROUGE over word embeddings [31]. Word embeddings aim to translate a word into a vector of numbers, 'such that their respective projections are closer to each other if the words are semantically similar, and further apart if they are not' [31]. The researchers use word2vec to calculate the word embeddings.
3.7.3.1 ROUGE-WE-1. The ROUGE-WE-1 scores are calculated in the following manner [32]: the first step is to calculate the word2vec vector w_i for every word i. A pre-trained model is used for this, trained on a dataset containing Google News articles [1]. Then, all pairs of words (H_i, R_j) between the hypothesis and reference sentences are determined, and their word2vec vectors are multiplied. If a word does not appear in the model, i.e. is out of vocabulary (OOV) and has no word2vec vector, w_x = 0. All resulting vectors are summed and then divided by either the total word count of the reference sentence for the recall score (VEC-ROUGE-WE-1_Recall), or of the hypothesis sentence for the precision score (VEC-ROUGE-WE-1_Precision). The entries of the final vector, VEC-ROUGE-WE-1_Recall or VEC-ROUGE-WE-1_Precision, are averaged to arrive at ROUGE-WE-1_Recall or ROUGE-WE-1_Precision, i.e.:

    VEC-ROUGE-WE-1_Recall = Σ_{i ∈ H} Σ_{j ∈ R} (w_{H_i} * w_{R_j}) / Σ_{gram_1 ∈ R} Count(gram_1)    (8)

where w_x = 0 if w_x is OOV.

    VEC-ROUGE-WE-1_Precision = Σ_{i ∈ H} Σ_{j ∈ R} (w_{H_i} * w_{R_j}) / Σ_{gram_1 ∈ H} Count(gram_1)    (9)

where w_x = 0 if w_x is OOV.

    ROUGE-WE-1_Recall = (1 / n) * Σ_{i=1}^{n} VEC-ROUGE-WE-1_Recall_i    (10)

where n equals the number of entries of the vector VEC-ROUGE-WE-1_Recall.

    ROUGE-WE-1_Precision = (1 / m) * Σ_{i=1}^{m} VEC-ROUGE-WE-1_Precision_i    (11)

where m equals the number of entries of the vector VEC-ROUGE-WE-1_Precision.

Then ROUGE-WE-1_F-score is calculated using formula 5.
3.7.3.2 ROUGE-WE-2. The calculation of the ROUGE-WE-2 scores broadly follows the steps presented in section 3.7.3.1. The word2vec vector of a bigram, w_{i,j}, is calculated by multiplying the vectors of the individual words constituting the bigram, i.e. w_{i,j} = w_i * w_j. The equations for calculating VEC-ROUGE-WE-2_Recall and VEC-ROUGE-WE-2_Precision thus differ slightly from equations 8 and 9 for unigrams:

    VEC-ROUGE-WE-2_Recall = Σ_{i ∈ H} Σ_{j ∈ R} (w_{H_i} * w_{R_j}) / Σ_{gram_2 ∈ R} Count(gram_2)    (12)

where w_x = 0 if w_x is OOV, i = gram_2 ∈ H, and j = gram_2 ∈ R.

    VEC-ROUGE-WE-2_Precision = Σ_{i ∈ H} Σ_{j ∈ R} (w_{H_i} * w_{R_j}) / Σ_{gram_2 ∈ H} Count(gram_2)    (13)

where w_x = 0 if w_x is OOV, i = gram_2 ∈ H, and j = gram_2 ∈ R.

Equations 10, 11, and 5 are then used to calculate ROUGE-WE-2_F-score.
3.7.4 BLEU score. The bilingual evaluation understudy (BLEU) score is based on the modified precision score for n-grams [34]:

    p_n = Σ_{gram_n ∈ R} Count_match(gram_n) / Σ_{gram_n ∈ H} Count(gram_n)    (14)

The modified n-gram precision scores (for n = 1, ...) are combined by taking their geometric mean. Papineni et al. stated that BLEU based on a maximum of 4-grams provided the best results [34]; therefore, this research also uses this maximum. Papineni et al. also introduce a brevity penalty BP to penalise hypothesis sentences that are shorter than the reference sentence:

    BP = 1           if h > r
    BP = e^(1 - r/h) if h ≤ r    (15)

where h is the length of the hypothesis sentence, and r the length of the reference sentence.

BLEU is equal to BP multiplied by the geometric mean of the p_n. BLEU can be equal to zero when e.g. no 3-grams or 4-grams are found, since the geometric mean of the p_n will then be zero. Smoothing function 1, as presented by Chen et al., is used to avoid this issue [14]. This function replaces p_n = 0 with ε = 0.1.
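A sketch of this BLEU computation with the smoothing described above; the sentences are invented, and the clipped-count form of the modified precision follows the standard formulation in [34]:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(hyp, ref, max_n=4, eps=0.1):
    """BLEU up to 4-grams with the brevity penalty of equation 15; a modified
    precision of zero is replaced by eps = 0.1 (smoothing function 1)."""
    h, r = hyp.split(), ref.split()
    precisions = []
    for n in range(1, max_n + 1):
        hc, rc = Counter(ngrams(h, n)), Counter(ngrams(r, n))
        match = sum(min(c, rc[g]) for g, c in hc.items())  # clipped matches
        total = sum(hc.values())
        p = match / total if total else 0.0
        precisions.append(p if p > 0 else eps)             # smoothing 1
    geo_mean = math.exp(sum(math.log(p) for p in precisions) / max_n)
    bp = 1.0 if len(h) > len(r) else math.exp(1 - len(r) / len(h))
    return bp * geo_mean
```

Without the smoothing, two short skill sets sharing no 4-gram would always score zero, regardless of how many unigrams they share.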
3.8 Matching job title and user work experience
Each job advertisement contained the actual job title, and each user had to provide the titles of jobs they have held in the past. Matching these job titles could be indicative of whether the user is a good candidate for the job. If a user has held the same job title as the advertisement, they might be a suitable match for the job, and more inclined to like it. The following methods described in section 3.7 are used to determine if there could be a match: ROUGE-1, ROUGE-2, ROUGE-L, ROUGE-WE-1, ROUGE-WE-2, and BLEU. The method using the Levenshtein distance has been altered slightly: first, the Levenshtein distance between the advertisement job title and each of the user's past job titles is calculated. Then the pair with the minimum of all distances is chosen as the best match. This Levenshtein distance method thus outputs one distance measurement.
3.9 Feature importance
Several methods have been implemented to investigate which features are most indicative of whether a job seeker will like the job advertisement. The feature importance of all features of the random forest classifier model has been calculated, and partial dependence plots (PDPs) are created. Furthermore, separate models are created for all male and all female users to see whether the feature importance differs per gender. Lastly, the contribution per feature to the classification decision for a single user is investigated. The following sections elaborate on the techniques used.
3.9.1 Of the model - feature importance. The feature importance is calculated using the Gini importance. The Gini importance uses the Gini impurity to calculate the impurity of a node [40]:

    G = Σ_{i=1}^{C} p(i) * (1 - p(i))    (16)

The Gini importance of one node is then calculated as follows [5, 37]: first, the node impurity is multiplied by the number of training samples reaching the node divided by the total number of samples, i.e. the weighted node samples. Then the node impurities of the left and right child nodes, each multiplied by their weighted node samples, are subtracted. To obtain the importance of feature f in a tree, the Gini importance of all nodes splitting on f is summed and then divided by the total number of nodes in the tree. This is then normalised and averaged over all trees to obtain the feature importance per feature of the random forest.
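Equation 16 and the weighted impurity decrease of a single split can be sketched as follows (a two-class example with an invented split; scikit-learn's `feature_importances_` aggregates such decreases over all nodes and trees):

```python
import numpy as np

def gini(labels):
    """Gini impurity G = sum_i p(i) * (1 - p(i)) over the classes i (eq. 16)."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(np.sum(p * (1 - p)))

def split_importance(parent, left, right):
    """Weighted impurity decrease of one split: the parent impurity minus the
    child impurities, each child weighted by its share of the parent samples."""
    n = len(parent)
    return gini(parent) - len(left) / n * gini(left) - len(right) / n * gini(right)

# A perfectly separating split removes all impurity from the parent node.
decrease = split_importance([1, 1, 0, 0], [1, 1], [0, 0])
```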
3.9.2 Of the model - Partial Dependence Plots (PDP). PDPs are constructed per feature and aim to explain the relationship between the job rating and the features of the model. If the Gini importance of a certain feature is high, a PDP provides additional insight. It plots the likelihood of predicting 'like' against the possible values of the feature. Thus, it shows what influence a certain value of that feature has on the prediction, whereas the Gini importance only states to what extent a feature is of importance to the model.

A PDP for feature $f$ is created in the following way [18]: first the training dataset is altered slightly, changing only the values of feature $f$. This value is changed $i$ times, once for each possible value $f_i$ present in the dataset within the [5, 95] percentile range. Values outside this range are disregarded to diminish the effect of outliers. Then, for each of these altered versions of a row, a prediction is made by the model; this is thus done $i$ times for every row. Finally, the average prediction per possible value of $f$ is calculated and plotted, resulting in the PDP for $f$.
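The procedure can be sketched as follows. To keep the example self-contained, a simple stand-in prediction function replaces the trained random forest (its numbers are purely illustrative, not the thesis model); with a fitted scikit-learn classifier one would call its `predict_proba` method instead:

```python
def partial_dependence(predict, rows, feature, grid):
    """For each candidate value in `grid`, overwrite `feature` in every
    row, predict, and average: this yields the PDP curve of `feature`."""
    curve = []
    for value in grid:
        altered = [dict(row, **{feature: value}) for row in rows]
        avg = sum(predict(r) for r in altered) / len(altered)
        curve.append((value, avg))
    return curve

# Stand-in model: likelihood of 'like' rises with graduation year
# (hypothetical relationship for illustration only).
def predict(row):
    return min(1.0, max(0.0, 0.1 * (row["graduation_year"] - 2008)))

rows = [{"graduation_year": y, "years_experience": e}
        for y, e in [(2010, 8), (2014, 3), (2016, 1)]]
grid = [2010, 2012, 2014, 2016]  # values within the [5, 95] percentile range
curve = partial_dependence(predict, rows, "graduation_year", grid)
```

Each point of `curve` is the average model prediction when the whole dataset is forced to one candidate value of the feature, which is exactly what a PDP plots.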
3.9.3 Model per gender. The dataset has been split into $d_1$, containing only male users, and $d_2$, containing only female users. Random forest models have been trained and tested on these datasets separately. The feature importances are calculated using the Gini importance as explained in section 3.9.1.
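The split-and-retrain step can be sketched as below, using synthetic stand-in data (the features, sample sizes, and model settings are illustrative, not those of the thesis):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Synthetic stand-in for the user table: two numeric features,
# a gender flag, and a like/dislike label (illustrative only).
X = rng.normal(size=(200, 2))
gender = rng.integers(0, 2, size=200)               # 0 = male, 1 = female
y = (X[:, 0] + rng.normal(scale=0.5, size=200) > 0).astype(int)

importances = {}
for g, name in [(0, "male"), (1, "female")]:
    mask = gender == g                              # the d1 / d2 split
    model = RandomForestClassifier(n_estimators=50, random_state=0)
    model.fit(X[mask], y[mask])
    # Gini importances (section 3.9.1), one vector per gender model
    importances[name] = model.feature_importances_
```

Comparing the two importance vectors then reveals whether the indicative features differ per gender.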
3.9.4 Per user. The TreeInterpreter algorithm is used to investigate the importance of features per user prediction. This algorithm measures importance as the contribution of a feature to the final prediction. When looking at a single tree, the root node already has a prediction value. This value then changes based on the splitting criterion, resulting in different prediction values for the child nodes [4, 38]. The contribution of the feature at the root node is either the difference in value compared to the left child or the difference compared to the right child, depending on the actual decision path. This process repeats until the leaf nodes are reached. When predicting a user's liking, a certain path is followed in each tree of the random forest. For each tree the contributions per feature present in the tree path are registered, together with the final prediction. All contributions per feature are averaged to calculate the contributions for the entire random forest. The final probability of a user belonging to a class, i.e. like/dislike, is then calculated by summing all feature contributions for that class and adding the result to the forest bias, i.e. the average value of all root nodes. The algorithm can be written as [4]:

$$\mathit{Pred}(u) = \frac{1}{T}\sum_{t=1}^{T} \mathit{rootval}_t + \sum_{f=1}^{F}\left(\frac{1}{T}\sum_{t=1}^{T} \mathit{cont}_t(u, f)\right) \qquad (17)$$

where $\mathit{Pred}(u)$ is the prediction for user $u$, $T$ is the total number of trees, $\mathit{rootval}_t$ is the value at the root node of tree $t$, $F$ is the total number of features, and $\mathit{cont}_t(u, f)$ is the contribution of $f$ to the prediction of $u$ in tree $t$.
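Equation (17) can be checked with a toy example. Given per-tree root values and per-tree feature contributions along the decision paths (the numbers and feature names below are hypothetical), the forest prediction is the forest bias plus the averaged contributions:

```python
def forest_prediction(root_values, contributions):
    """Equation (17): forest bias (average root value) plus the
    per-feature contributions averaged over all trees.
    `contributions[t][f]` plays the role of cont_t(u, f)."""
    T = len(root_values)
    bias = sum(root_values) / T
    features = {f for tree in contributions for f in tree}
    avg_cont = {f: sum(tree.get(f, 0.0) for tree in contributions) / T
                for f in features}
    return bias + sum(avg_cont.values()), bias, avg_cont

# Two toy trees: root value plus contributions of two features
root_values = [0.5, 0.7]
contributions = [{"rouge_l_skills": 0.2, "graduation_year": -0.1},
                 {"rouge_l_skills": 0.1}]
pred, bias, avg = forest_prediction(root_values, contributions)
```

A feature absent from a tree's decision path simply contributes 0 for that tree, which is why missing keys default to 0.0 before averaging.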
4 RESULTS AND ANALYSIS

4.1 On the original dataset
A random forest classifier model has been trained on only the existing, unaltered features in the dataset. The results are presented in table 2. The accuracy of 81.57% thus serves as the baseline. The F-score is fairly low, mainly caused by the very low recall score.

Accuracy   0.8157
F-score    0.2618
Precision  0.6705
Recall     0.1627

Table 2: Results of testing on the original model
4.2 Choosing best method to match user and job required skills

Table 4 shows the accuracy, F-score, precision, and recall on the test set of several random forest classifiers. Each model is trained on one of the skill matching features plus all other features, except the ones concerning the job title and user experience matches. All results lie quite close together. ROUGE-L obtains the highest accuracy and precision, and BLEU the highest F-score and recall. The ROUGE-L score has been chosen for the feature selection of the final model, since it has the highest accuracy.
4.3 Choosing best method to match job title and user experience

Several models are created and tested to obtain the best method to match the job title and user experience. Each model has been trained on one of these methods plus all other features, except the ones concerning skill matching. The results are shown in table 5. Again, the results of each feature are reasonably alike; however, the maximum of each performance score is higher than when training on the skill matching features. Matching on job title and user work experience thus seems to lead to a slightly better model than matching on skill sets. Levenshtein distance results in the highest accuracy and precision, and ROUGE-L in the highest F-score and recall. Since the Levenshtein distance has the highest accuracy, it has been added to the final model.
4.4 Model results

Adding the ROUGE-L feature on skill matching and the Levenshtein distance feature on job title and work experience matching to the random forest classifier model yields the results presented in table 3. The final model obtains the highest accuracy compared to the results in tables 4 and 5, although only slightly. The F-score and precision have not improved compared to the results of ROUGE-L on skills and Levenshtein on job title matching; the recall score has improved slightly compared to these two models. Compared to the results of the original model on unaltered data (see section 4.1), the accuracy has improved slightly and the F-score has improved significantly. This is due to an increased recall score, meaning more users who have liked an advertisement are correctly predicted by the model. Thus the final model performs significantly better than the original model on unaltered data. However, the recall score remains low: out of all users who have liked an advertisement, the model only predicts 36.78% correctly.

Accuracy   0.8292
F-score    0.4613
Precision  0.6184
Recall     0.3678

Table 3: Results of final model
4.5 Feature importance
4.5.1 Gini importance. The Gini importance was calculated for each feature of the final model. The top five most important features are shown in figure 3 (see appendix A for the complete list). Graduation year is the most important feature. The added features concerning users' skills (ROUGE-L skill matching, skill-rarity1, and the count of user skills) are all in the top five as well. This indicates that the user skills do contain important information for whether a user will like the advertisement or not.

The importances can be categorized, since the Gini importances of all features combined add up to 1. Table 6 shows the summed importances per feature category. Features concerning the user information are most important, followed by the category containing the matched skill and job title features. The company features are least indicative of whether a user will like the advertisement. This shows that while a user's own circumstances are leading when rating job advertisements, a company should pay attention to the details of the advertisement to attract the right applicants. A company should focus on establishing the right required skill set, since matching it with the user skill set has an importance of almost 9%. The job required skill set is also of importance to the model when calculating the rarity.
           Levenshtein  BLEU-1  BLEU-2  BLEU-3  ROUGE-1  ROUGE-2  ROUGE-L  ROUGE-WE-1  ROUGE-WE-2
Accuracy        0.8224  0.8192  0.8224  0.8204   0.8236   0.8183   0.8250      0.8217      0.8236
F-score         0.4418  0.4464  0.4594  0.4619   0.4613   0.4479   0.4472      0.4557      0.4564
Precision       0.5940  0.5864  0.5991  0.5721   0.5895   0.5848   0.6120      0.5906      0.5981
Recall          0.3517  0.3603  0.3725  0.3873   0.3789   0.3629   0.3523      0.3709      0.3690

Table 4: Results of user skill and job required skill set matching
           Levenshtein    BLEU  ROUGE-1  ROUGE-2  ROUGE-L  ROUGE-WE-1  ROUGE-WE-2
Accuracy        0.8275  0.8271   0.8229   0.8214   0.8226      0.8247      0.8223
F-score         0.4615  0.4645   0.4600   0.4584   0.4683      0.4655      0.4526
Precision       0.6205  0.6026   0.5961   0.5962   0.5866      0.6015      0.5921
Recall          0.3673  0.3779   0.3745   0.3724   0.3896      0.3796      0.3663

Table 5: Results of job title and user work experience matching
Figure 3: The Gini importance of the top 5 most important features
Features         GINI importance
User                      0.7203
Matching text             0.1342
Job                       0.0906
Company                   0.0549

Table 6: Total GINI importance per feature category
4.5.2 Partial dependence plots. The partial dependence plots are created for the top two most important features found by the Gini importance. Each plot contains a red dotted line indicating the 50% prediction; the higher the value, the higher the chance of predicting 'like'. Figure 4 shows that users with a graduation year up until 2011 are more likely to dislike the advertisement. Between 2011 and 2017 (the year the data was acquired) there is a higher chance of liking the advertisement. In 2017 there is an equal chance of liking or disliking the advertisement. The chance of liking decreases by over 10% between 2016 and 2017. This could be due to such users still studying and using the application informatively rather than actively looking for a job, since most of the dataset was obtained during 2016 and 2017.

Figure 5 shows that, generally, the fewer years of experience a user has, the more likely they are to like the job advertisement. A user is more likely to dislike an advertisement if they have over 10 years of experience. This gives the impression that either advertisements are mainly focused on graduate or junior positions, or users with less work experience are more open to different jobs that might not completely match their profile. Interestingly, users with almost no experience tend to dislike advertisements more as well. This can be linked to the results of the PDP of user graduation year: no or almost no experience might indicate that the user is still studying, and might therefore not be as interested in finding a job just yet.

Furthermore, the partial dependencies of the user gender are calculated to investigate the relation between gender and the classification. Male users seem to have a higher chance of liking an advertisement, about 50.6%, whereas female users have a higher chance of disliking an advertisement, about 54.0%. Thus, women are about 4.5% less likely to like an advertisement than men. This could mean that the advertisements were more focused on men, or that men are more likely to like an advertisement even though they might not match the profile. Research has indeed shown that job requirements are seen more as set rules by women than by men [28].
Figure 4: Partial Dependency Plot of Graduation year
Figure 5: Partial Dependency Plot of Years of experience
4.5.3 Model per gender. Two models have been trained, on only male or only female users, to inspect the difference between men and women when rating an advertisement. Table 7 shows the results of testing the models on separate test sets. The model trained on the female users predicts with higher accuracy, whereas the model trained on male users has a higher F-score.

Table 8 shows the top five most important features as calculated by the Gini importance (see appendix A for the complete list). These features differ per gender. For example, the most important feature for men, 'Years of experience', has an importance of 0.16, as opposed to a lower importance of 0.10 for women. Graduation year, the most important feature for women with a score of 0.15, is only the fourth most important feature for predicting the rating of men, with a score of 0.09. However, these features are correlated, since a user usually starts work after graduating, and thus contain potentially overlapping information.
4.5.4 Feature importance - per applicant. Two users have been randomly chosen to show the possible results when investigating which features contributed most to the prediction of the model. One user liked and one user disliked an advertisement, and for both the model has predicted this correctly. Table 9 shows some information about these users.
           Male    Female
Accuracy   0.8207  0.8322
F-score    0.5061  0.4274
Precision  0.6269  0.5716
Recall     0.4243  0.3413

Table 7: Model results on the test dataset per gender
Male                           Female
Years of experience  0.1613    Graduation year      0.1508
User country         0.1205    Count user skills    0.1369
Count user skills    0.1011    ROUGE-L skills       0.1026
Graduation year      0.0893    Years of experience  0.1007
Skill-rarity1        0.0861    Dutch                0.0889

Table 8: Top five most important features per gender model
User                    1       2
Liked advertisement     No      Yes
Gender                  Male    Female
User education field    IT      IT
Dutch                   Yes     No
User country            NL      Other
Graduation year         2011    2016
Language count          2       2
ROUGE-L skills          0       0.1004
skill-rarity-1          3.1083  2.9536
skill-rarity-2          2.8802  3.2263

Table 9: Information about user 1 and 2
The top five sets of features that contributed most to user 1's prediction are shown in table 10. ROUGE-L skills has the highest contribution. All other contribution sets are made up of multiple features. The set of user 1's gender (male), education field (IT), and Dutch (yes), combined with one or more other features, shows to be a big contributor to the prediction.

Table 11 shows the top five contribution sets for the prediction of user 2. The top set has a very high contribution of 0.5 to the prediction, which lies in the range [0, 1]. All top five sets contain the user's country (not the Netherlands) and ROUGE-L skills, sometimes combined with other features.
5 CONCLUSION

This research aimed to find the most indicative features of whether a user will like a job advertisement. It investigated how to determine whether a user and an advertisement match, i.e. whether the user could fulfill the position. The differences between men and women, and between individual users, have been explored as well.
Several methods have been applied to determine whether a user matches a job advertisement. The ROUGE-L measure proved best for matching the user skill set and the job required skills, although only by a small margin. To match the job title and user work experience, the Levenshtein distance demonstrated the best results. Adding these predictive features improved the original model significantly; however, the F-score of the final model is still fairly low, mainly caused by a low recall score. This means that the conclusions drawn from this research should be taken only tentatively.
This research has shown that there is a difference between the indicative features for each gender. A large difference can be found in the matching of skill sets: women seem to weigh more heavily than men whether their skill set actually matches the job required skills. If a company would like to attract more female applicants, it could decide to rephrase the required skills, for example by adding that possessing all skills is not a hard requirement.
The most indicative features have proven to be user related; however, matching a user and a job advertisement based on skill set is also of great importance. Although the indicative features differ from user to user, matching based on skills contributes in every case. A company should therefore put a strong focus on writing the required skill set as precisely as possible, to drive its ideal candidate to like the advertisement and thus enter its candidate pool.
This research has used partial dependence plots to investigate the influence of a single feature. However, these plots are less reliable when features are correlated [29]. Future research could therefore focus on constructing, for example, accumulated local effects plots. These plots are robust to correlated features, and will thus describe the individual influence of a feature in the model more accurately.
Contribution  Feature set
0.2496        ROUGE-L skills
0.0888        Gender, User education field, Dutch, ROUGE-L skills
0.0832        Gender, User education field, Dutch, Graduation year, skill-rarity1
0.0272        Gender, User education field, Dutch, Graduation year, ROUGE-L skills
0.0236        Gender, User education field, Dutch, Graduation year, ROUGE-L skills, skill-rarity1

Table 10: Top 5 feature contribution sets of user 1
Contribution  Feature set
0.5003        User country, ROUGE-L skills, User education field, Language count, skill-rarity-1
0.0535        User country, ROUGE-L skills, User education field
0.0209        User country, ROUGE-L skills, User education field, Language count, skill-rarity-1, Graduation year
0.0174        User country, ROUGE-L skills
0.0166        User country, ROUGE-L skills, User education field, Language count, skill-rarity-1, Graduation year, skill-rarity-2

Table 11: Top 5 feature contribution sets of user 2
Further, it would be interesting to involve a user's past activity on the application in the model. This can be used to determine whether a user is interested in the job advertisement, as discussed in section 2.1, and thus could improve the model. The original dataset contains the exact date and time of a user's rating, from which the user's usage frequency and interactions with other job advertisements can be derived.
6 ACKNOWLEDGMENTS

I would like to thank and express my great gratitude towards Prof. Evangelos Kanoulas and Dr. Yuval Engel for providing me with valuable support, constructive advice, and the opportunity to perform research on this intriguing problem. I also wish to thank Sam Waterson and Priyanka Nanayakkara for all our great discussions enhancing my research.
REFERENCES
[1] [n. d.]. GoogleNews-vectors-negative300.bin.gz. Retrieved June 11, 2019 from https://drive.google.com/file/d/0B7XkCwpI5KDYNlNUTTlSS21pQmM/edit?usp=sharing
[2] [n. d.]. scikit-learn/sklearn/preprocessing/data.py. Retrieved June 11, 2019 from https://github.com/scikit-learn/scikit-learn/blob/7813f7efb/sklearn/preprocessing/data.py#L1047
[3] [n. d.]. sklearn.preprocessing.RobustScaler. Retrieved June 11, 2019 from https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.RobustScaler.html
[4] 2014. Interpreting random forests. Retrieved June 13, 2019 from http://blog.datadive.net/interpreting-random-forests/
[5] 2017. scikit-learn/sklearn/tree/_tree.pyx. Retrieved June 13, 2019 from https://github.com/scikit-learn/scikit-learn/blob/18cdaa69c14a5c84ab03fce4fb5dc6cd77619e35/sklearn/tree/_tree.pyx#L1056
[6] 2019. Evaluation of binary classifiers. Retrieved June 19, 2019 from https://en.wikipedia.org/wiki/Evaluation_of_binary_classifiers
[7] 2019. scikit-learn/sklearn/feature_extraction/text.py. Retrieved June 18, 2019 from https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/feature_extraction/text.py
[8] Fabian Abel, András Benczúr, Daniel Kohlsdorf, Martha Larson, and Róbert Pálovics. 2016. RecSys Challenge 2016: Job Recommendations. In Proceedings of the 10th ACM Conference on Recommender Systems (RecSys '16). ACM, New York, NY, USA, 425–426. https://doi.org/10.1145/2959100.2959207
[9] Mattia Bianchi, Federico Cesaro, Filippo Ciceri, Mattia Dagrada, Alberto Gasparin, Daniele Grattarola, Ilyas Inajjar, Alberto Maria Metelli, and Leonardo Cella. 2017. Content-Based Approaches for Cold-Start Job Recommendations. 1–5. https://doi.org/10.1145/3124791.3124793
[10] Pernilla Bolander and Jörgen Sandberg. 2013. How Employee Selection Decisions Are Made in Practice. Organization Studies 34, 3 (March 2013), 285–311. https://doi.org/10.1177/0170840612464757
[11] Richard Centers and Daphne E. Bugental. 1966. Intrinsic and Extrinsic Job Motivation Among Different Segments of the Working Population. The Journal of Applied Psychology 50, 3 (June 1966), 193–197. https://doi.org/10.1037/h0023420
[12] Derek S. Chapman, Krista L. Uggerslev, Sarah A. Carroll, Kelly A. Piasentin, and David A. Jones. 2005. Applicant Attraction to Organizations and Job Choice: A Meta-Analytic Review of the Correlates of Recruiting Outcomes. Journal of Applied Psychology 90, 5 (Sept. 2005), 928–944. https://doi.org/10.1037/0021-9010.90.5.928
[13] Nitesh V. Chawla, Kevin W. Bowyer, Lawrence O. Hall, and W. Philip Kegelmeyer. 2002. SMOTE: Synthetic Minority Over-sampling Technique. Journal of Artificial Intelligence Research 16 (2002), 321–357.
[14] Boxing Chen and Colin Cherry. 2014. A Systematic Comparison of Smoothing Techniques for Sentence-Level BLEU. In Proceedings of the Ninth Workshop on Statistical Machine Translation. Association for Computational Linguistics, Baltimore, Maryland, USA, 362–367. https://doi.org/10.3115/v1/W14-3346
[15] Sapna Cheryan, Victoria C. Plaut, Paul G. Davies, and Claude Steele. 2009. Ambient Belonging: How Stereotypical Cues Impact Gender Participation in Computer Science. Journal of Personality and Social Psychology 97, 6 (Dec. 2009), 1045–1060. https://doi.org/10.1037/a0016239
[16] Jane Fish. 1992. How to choose the right applicant. British Journal of Nursing 1, 7 (July 1992), 352–355. https://doi.org/10.12968/bjon.1992.1.7.352
[17] Jeffrey A. Flory, Andreas Leibbrandt, and John A. List. 2014. Do competitive workplaces deter female workers? A large-scale natural field experiment on job entry decisions. Review of Economic Studies 82, 1 (Jan. 2014), 122–155. https://doi.org/10.1093/restud/rdu030
[18] Brandon M. Greenwell. 2017. pdp: An R Package for Constructing Partial Dependence Plots. The R Journal 9, 1 (2017), 421–436. https://doi.org/10.32614/RJ-2017-016
[19] Laszlo A. Jeni, Jeffrey F. Cohn, and Fernando De La Torre. 2013. Facing Imbalanced Data–Recommendations for the Use of Performance Metrics. In 2013 Humaine Association Conference on Affective Computing and Intelligent Interaction. IEEE, 245–251.
[20] Benjamin Kille, Fabian Abel, Balázs Hidasi, and Sahin Albayrak. 2015. Using Interaction Signals for Job Recommendations, Vol. 162. 301–308. https://doi.org/10.1007/978-3-319-29003-4_17
[21] Ute-Christine Klehe. 2004. Choosing How to Choose: Institutional Pressures Affecting the Adoption of Personnel Selection Procedures. International Journal of Selection and Assessment 12, 4 (Dec. 2004), 327–342. https://doi.org/10.1111/j.0965-075X.2004.00288.x
[22] Emanuel Lacic, Dominik Kowald, Markus Reiter-Haas, Valentin Slawicek, and Elisabeth Lex. 2017. Beyond Accuracy Optimization: On the Value of Item Embeddings for Student Job Recommendations. CoRR abs/1711.07762 (2017). arXiv:1711.07762 http://arxiv.org/abs/1711.07762
[23] Vladimir Iosifovich Levenshtein. 1966. Binary codes capable of correcting deletions, insertions and reversals. Soviet Physics Doklady 10, 8 (Feb. 1966), 707–710. Doklady Akademii Nauk SSSR, V163 No4 845-848 1965.
[24] Jianxun Lian, Fuzheng Zhang, Min Hou, Hongwei Wang, Xing Xie, and Guangzhong Sun. 2017. Practical Lessons for Job Recommendations in the Cold-Start Scenario. In Proceedings of the Recommender Systems Challenge 2017 (RecSys Challenge '17). ACM, New York, NY, USA, Article 4, 6 pages. https://doi.org/10.1145/3124791.3124794
[25] Andy Liaw and Matthew Wiener. 2002. Classification and Regression by randomForest. R News 2, 3 (Dec. 2002), 18–22. http://CRAN.R-project.org/doc/Rnews/
[26] Chin-Yew Lin. 2004. ROUGE: A Package for Automatic Evaluation of Summaries. In Text Summarization Branches Out: Proceedings of the ACL-04 Workshop. Association for Computational Linguistics, Barcelona, Spain, 74–81. https://www.aclweb.org/anthology/W04-1013
[27] Linda Miller and Jacqueline Budd. 1999. The Development of Occupational Sex-role Stereotypes, Occupational Preferences and Academic Subject Preferences in Children at Ages 8, 12 and 16. Educational Psychology 19, 1 (1999), 17–35. https://doi.org/10.1080/0144341990190102
[28] Tara Sophia Mohr. 2014. Why Women Don't Apply for Jobs Unless They're 100% Qualified. Retrieved June 18, 2019 from https://hbr.org/2014/08/why-women-dont-apply-for-jobs-unless-theyre-100-qualified
[29] Christoph Molnar. 2019. Accumulated Local Effects (ALE) Plot. In Interpretable Machine Learning, Chapter 5.3. https://christophm.github.io/interpretable-ml-book/
[30] Cristian Moral, Angélica de Antonio, Ricardo Imbert, and Jaime Ramírez. 2014. A Survey of Stemming Algorithms in Information Retrieval. Information Research: An International Electronic Journal 19, 1 (March 2014), 22.
[31] Jun-Ping Ng and Viktoria Abrecht. 2015. Better Summarization Evaluation with Word Embeddings for ROUGE. (Aug. 2015). https://doi.org/10.18653/v1/D15-1222
[32] ng-j-p. 2015. ROUGE summarization evaluation metric, enhanced with use of Word Embeddings. Retrieved June 11, 2019 from https://github.com/ng-j-p/rouge-we
[33] Chris D. Paice. 1990. Another Stemmer. SIGIR Forum 24, 3 (Nov. 1990), 56–61. https://doi.org/10.1145/101306.101310
[34] Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. BLEU: A Method for Automatic Evaluation of Machine Translation. In Proceedings of the 40th Annual Meeting on Association for Computational Linguistics (ACL '02). Association for Computational Linguistics, Stroudsburg, PA, USA, 311–318. https://doi.org/10.3115/1073083.1073135
[35] Ryan L. Philips and Rita Ormsby. 2016. Industry classification schemes: An analysis and review. Journal of Business & Finance Librarianship 21, 1 (Jan. 2016), 1–25. https://doi.org/10.1080/08963568.2015.1110229
[36] raghavrv. 2015. [RFC] Missing values in RandomForest #5870. Retrieved June 11, 2019 from https://github.com/scikit-learn/scikit-learn/issues/5870
[37] Stacey Ronaghan. 2018. The Mathematics of Decision Trees, Random Forest and Feature Importance in Scikit-learn and Spark. Retrieved June 13, 2019 from https://medium.com/@srnghn/the-mathematics-of-decision-trees-random-forest-and-feature-importance-in-scikit-learn-and-spark-f2861df67e3
[38] Ando Saabas. 2019. andosa/treeinterpreter. Retrieved June 13, 2019 from https://github.com/andosa/treeinterpreter/blob/master/treeinterpreter/treeinterpreter.py
[39] Kuo-Tsung Tseng, De-Sheng Chan, Chang-Biau Yang, and Shou-Fu Lo. 2018. Efficient merged longest common subsequence algorithms for similar sequences. Theoretical Computer Science 708 (Jan. 2018), 75–90. https://doi.org/10.1016/j.tcs.2017.10.027
[40] Victor Zhou. 2019. A Simple Explanation of Gini Impurity. Retrieved June 13, 2019 from https://victorzhou.com/blog/gini-impurity/
A FEATURE IMPORTANCE

Full model                            Male model                            Female model
Graduation year             0.1342    Years of experience         0.1613    Graduation year             0.1508
Years of experience         0.1214    User country                0.1205    Count user skills           0.1369
ROUGE-L skills              0.0918    Count user skills           0.1011    ROUGE-L skills              0.1026
skill-rarity1               0.0776    Graduation year             0.0893    Years of experience         0.1007
Count user skills           0.0754    skill-rarity1               0.0861    Dutch                       0.0889
User country                0.0743    ROUGE-L skills              0.0611    skill-rarity1               0.0636
Dutch                       0.0710    Job rating year             0.0497    Levenshtein job title       0.0472
Levenshtein job title       0.0424    Count user languages        0.0467    skill-rarity2               0.0373
User gender                 0.0413    Code user education field   0.0442    Job title code              0.0361
skill-rarity2               0.0379    Levenshtein job title       0.0420    User country                0.0349
Count user languages        0.0315    skill-rarity2               0.0376    Count user languages        0.0327
Code user education field   0.0305    skill-rarity3               0.0278    skill-rarity3               0.0251
Job rating year             0.0252    Dutch                       0.0178    Job classification code     0.0197
skill-rarity3               0.0250    Job title code              0.0161    Job rating year             0.0194
Job title code              0.0231    Job classification code     0.0122    Code user education field   0.0140
Job classification code     0.0159    Company average empl. age   0.0115    Company average empl. age   0.0130
Count required skills       0.0118    Count required skills       0.0111    Count required skills       0.0124
Company average empl. age   0.0110    Industry percent female     0.0102    Industry percent female     0.0104
Industry percent female     0.0096    Company percent females     0.0097    Company age                 0.0097
Company percent females     0.0087    Company age                 0.0090    Company percent females     0.0089
Company age                 0.0083    Job level code              0.0078    Company foundation year     0.0080
Company foundation year     0.0069    Industry GICS class         0.0075    Job level code              0.0066
Job level code              0.0065    Company foundation year     0.0072    Industry GICS class         0.0063
Industry GICS class         0.0063    Tasks maximum               0.0049    Tasks maximum               0.0057
Tasks maximum               0.0051    Company size                0.0044    Company size                0.0047
Company size                0.0041    Tasks different count       0.0017    Tasks different count       0.0015
Tasks different count       0.0014    Tasks minimum               0.0008    Tasks total count           0.0010
Tasks minimum               0.0008    Tasks total count           0.0005    Tasks minimum               0.0007
Tasks total count           0.0006    Job location                0.0002    Job commitment              0.0007
Job commitment              0.0004    Job commitment              0.0002    Job location                0.0004
Job location                0.0002