Predicting citation impact of
scientific papers using peer reviews
Reinder Gerard van Dalen
Master thesis Information Science
Reinder Gerard van Dalen, s2497867
A B S T R A C T
Sharing scientific knowledge is of great importance in developing our joint knowledge. Researchers contribute to sharing such knowledge by publishing research in various scientific fields. However, it is essential that these studies are read and used in further research in order for them to have an effect on our joint knowledge. The citation impact of a researcher's work reflects whether this is the case. Within the field of exploring factors that influence this impact, studies have examined the predictive power of, for example, the paper title and the number of authors of a paper. Using peer reviews to predict citation impact, however, is still unexplored. This thesis examines whether it is possible to predict the citation impact of scientific papers using their peer reviews. It is hypothesized that scientific papers with a high citation impact contain more positive feedback than papers with a low citation impact. To check this hypothesis, two corpora containing peer review texts were created. Subsequently, the predictive power of peer reviews for citation impact was
explored. Peer reviews of in total 3,315 scientific papers from the NIPS¹ and ICLR² conferences were collected.
conferences were collected. The 2,421 papers from NIPS were labeled with a low, medium or high citation impact. This was based on citation scores collected over two years. The 894 papers from ICLR were labeled with paper acceptance. Both corpora were employed to predict the citation impact of NIPS scientific papers. Sev-eral classifiers were trained on the data using Machine Learning. The best setup used NIPS data only and yielded an f1-score of 0.83, which outperformed the base-line of 0.71. The use of an auxiliary classifier based on ICLR data did not improve the classification of NIPS papers, although several implementation methods were investigated. A feature analysis of the optimal citation impact prediction model did show several differences in topics and sentiment between the low and high impact papers. Additionally, this research yielded two corpora of peer reviews and infor-mation of papers from various editions of NIPS and ICLR. All in all, this thesis is hopefully a step towards further research in this field.
¹ Conference on Neural Information Processing Systems, https://nips.cc/.
² International Conference on Learning Representations, https://iclr.cc/.
C O N T E N T S
Abstract i
Preface iv
1 introduction 1
2 related work 3
3 data and material 7
3.1 Collection . . . 7
3.1.1 NIPS Citation Counts . . . 7
3.1.2 NIPS Reviews . . . 12
3.1.3 ICLR Reviews on OpenReview.net . . . 15
3.2 Annotation . . . 20
3.2.1 NIPS Reviews . . . 20
3.2.2 ICLR Reviews on OpenReview.net . . . 21
3.3 Processing . . . 22
3.3.1 Citation impact . . . 22
Citation growth rates . . . 23
Citation impact . . . 24
3.3.2 Textual information . . . 28
3.3.3 Specific features . . . 30
4 method 32
4.1 Classification of NIPS papers . . . 32
4.1.1 Dataset . . . 32
4.1.2 Development . . . 33
4.1.3 Testing . . . 33
4.1.4 Evaluation . . . 33
4.1.5 Feature analysis . . . 34
4.2 Classification of ICLR papers . . . 34
4.2.1 Dataset . . . 34
4.2.2 Development . . . 35
4.2.3 Testing . . . 35
4.2.4 Evaluation . . . 36
4.3 Expanding the NIPS classifier . . . 36
4.3.1 Expanding the dataset . . . 36
4.3.2 Development . . . 37
4.3.3 Testing . . . 37
4.3.4 Evaluation . . . 37
4.4 Stacking . . . 37
4.4.1 Dataset . . . 37
4.4.2 Experiments . . . 38
4.4.3 Evaluation . . . 38
5 results and discussion 39
5.1 General findings . . . 39
5.2 Classification of NIPS papers . . . 40
5.2.1 Development phase . . . 40
5.2.2 Testing phase . . . 42
5.2.3 Feature analysis . . . 43
Low impact papers . . . 44
High impact papers . . . 45
5.2.4 Discussion . . . 47
5.3 Classification of ICLR papers . . . 48
5.3.1 Development phase . . . 48
5.3.2 Testing phase . . . 49
5.3.3 Discussion . . . 50
5.4 NIPS classification using the extended classifier . . . 51
5.4.1 Development phase . . . 51
5.4.2 Testing phase . . . 52
5.4.3 Discussion . . . 53
Rating prediction . . . 53
5.5 NIPS classification using stacking . . . 53
5.5.1 Discussion . . . 54
6 conclusion and future work 55
Appendix 58
a modifiability of the data processing 59
b classification of nips papers: feature combinations 61
c classification of nips papers: results of the development phase 63
d classification of nips papers: informative features 65
e classification of nips papers: informative features with 10 most
likely collocations 69
f classification of iclr papers: feature combinations 74
g classification of iclr papers: results of the development phase 76
h extended nips classifier: feature combinations 78
P R E F A C E
In 2013 I started studying the Bachelor Information Science at the University of Groningen (UoG). During this Bachelor I became acquainted with, amongst other things, the processing of Big Data, the field of Computational Linguistics and Machine Learning. In my Bachelor thesis I combined these research fields with my interest in politics: the automatic classification of Dutch Twitter users based on political preference was explored. The results of the study were striking. Many prejudices about supporters of certain political parties could be demonstrated using a feature analysis of an implemented machine learning model. Together with my thesis supervisor B. Plank and a fellow student, I submitted a combined Author Profiling paper to the Computational Linguistics in the Netherlands (CLIN) Journal (van Dalen et al., 2017). During the CLIN27 conference, I got acquainted with and became interested in the scientific world. When I finished my Bachelor and started the Master Information Science at the UoG, this interest was further stimulated. In June 2017 my supervisor shared with me her interest in doing meta-research on scientific papers. In this field of research, I could further express my interest in the scientific world. I decided to use the knowledge about Machine Learning I gained during the Master in this meta-research on scientific papers: predicting the impact of scientific papers.
During the creation of this thesis I received the full support of my friends, parents and family. Partly thanks to them, I was able to do this research and produce this document. A special thanks goes to Stijn Eikelboom for his valuable suggestions on improving this study. Besides their support, over the last three years the department of Information Science in Groningen has taught and supported me in a professional and accessible way. I am very thankful to my supervisor during the period of my research and the creation of this thesis. Without her, I would not have been able to make this thesis into what it is now.
I am very pleased with the final product of my thesis. In creating it, I used all the skills I learned during my studies at the UoG. By combining all these skills and the knowledge they entail, I was able to create this thesis, which I hope will contribute to the scientific world.
1
I N T R O D U C T I O N
All around the world, researchers try to solve divergent scientific problems, from difficult mathematical issues to ethical questions. In all of these fields, there are scientific conferences where people of the scientific world contribute to our joint knowledge. Without all this unique research, the world would not be where it is now. From improvements in the agricultural sector to the newest medical techniques, sharing knowledge in all scientific fields is of great importance. Because of the many research possibilities, the world is regularly introduced
to new research through scientific articles in scientific journals such as Science¹ or Nature². However, there are also smaller scientific conferences with their own journals in which interesting studies appear. All scientific articles are of great importance for new scientific research. In these articles researchers, for example, present new techniques and encourage others to conduct further research. This further research can be in the same field of research, but can also use the discovered techniques in a completely different field.
When a discovered technique is so groundbreaking that it can be used in a multidisciplinary way, the accompanying scientific article will receive many citations. The number of citations is important for a researcher in terms of contributing to the field and attracting funding. In addition, a high number of citations from a large audience can serve as a reward for the hard work of the researcher. But probably most important is that a lot of future research will be conducted in a field that is of great importance. It is therefore interesting to investigate which factors influence the citation impact. Are there meta-variables that cause one paper to be cited more often than another?
Typically, papers submitted to a conference are peer reviewed. These reviews are of great importance. Reviewers provide recommendations to conference chairs and journal editors, who decide if a scientific paper is accepted or rejected for a conference or journal. Does the content of these reviews perhaps tell us more about a paper than just whether it should be accepted or not? To investigate this, this study focuses on detecting signals in peer review texts that are predictive of citation impact. It intends to answer the following research question: Is it possible to predict the citation impact of scientific papers using their peer reviews? It is hypothesized that scientific papers with a high citation impact contain more positive feedback than papers with a low citation impact. Therefore, it should be possible to predict the citation impact on the basis of peer reviews.
To answer this research question, two corpora with review texts were created. For one of the corpora, the citation scores of the corresponding papers were collected. Subsequently, the data of both corpora were used to try to predict the citation impact using machine learning.
The first created corpus contains peer reviews scraped from the proceedings website of the Conference on Neural Information Processing Systems (NIPS). In total 2,421 papers with corresponding reviews were collected. The second corpus consists of review texts of 894 scientific papers from the International Conference on Learning Representations (ICLR) that were scraped from OpenReview.net. From April 2016 up to and including May 2018, the citation scores of papers from five NIPS editions were collected from Google Scholar, approximately every six months. The collected citation scores were used to determine the citation impact of NIPS scientific papers. Using supervised Machine Learning, a model was created
¹ http://www.sciencemag.org
² https://www.nature.com
to predict this citation impact. The reviews from ICLR were used to create an auxiliary classifier which predicts the paper acceptance of scientific papers. Two different implementations were explored for combining the auxiliary and the citation impact classifiers to improve the classification of NIPS scientific papers.
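The combination methods themselves are described in Chapter 4. As background, the general stacking pattern, in which the auxiliary model's prediction feeds the main model as an extra feature, can be sketched as follows. This is a minimal illustration with synthetic data and scikit-learn, not the thesis's exact implementation:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic stand-ins: X_iclr with acceptance labels (auxiliary task),
# X_nips with citation impact labels (main task).
X_iclr = rng.normal(size=(100, 5))
y_iclr = (X_iclr[:, 0] > 0).astype(int)                 # accept / reject
X_nips = rng.normal(size=(80, 5))
y_nips = (X_nips[:, 0] + X_nips[:, 1] > 0).astype(int)  # low / high impact

# Train the auxiliary classifier on the acceptance task.
aux = LogisticRegression().fit(X_iclr, y_iclr)

# Stacking: the auxiliary model's predicted acceptance probability
# becomes one extra feature for the citation impact classifier.
extra = aux.predict_proba(X_nips)[:, [1]]
main = LogisticRegression().fit(np.hstack([X_nips, extra]), y_nips)

print(main.score(np.hstack([X_nips, extra]), y_nips))
```

Whether the extra feature helps depends on how related the two tasks are; as reported in the abstract, for this thesis the auxiliary classifier did not improve NIPS classification.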
First, the related work in the field of predicting citation impact is discussed in Chapter 2. Next, the creation of the two corpora is discussed in detail in Chapter 3. The collection of the citation scores, the annotation of the papers in both corpora and the pre-processing of the data for Machine Learning are also discussed in this chapter. The creation of the different classification models is described in Chapter 4. In
2
R E L A T E D W O R K
There are different ways of determining the impact of research. For example, the impact of a specific paper or that of an individual researcher could be investigated. In the latter case, one could for example look at the publication and citation records of an author. For a specific paper, its citations could be explored to determine its impact. In both cases, citation counts are a central measure.
Impact of individual researchers
In a study conducted by Hirsch, citation counts are also of importance. In 2005, Hirsch introduced the h-index, an index defined as the number of papers with citation number ≥ h. Hirsch introduced this index as a useful tool to characterize the scientific output of a researcher. With the h-index, Hirsch wants to solve the question of how to quantify the cumulative impact and relevance of an individual's scientific research output. He defines h as: "A scientist has index h if h of his or her Np papers have at least h citations each and the other (Np − h) papers have ≤ h citations each" (Hirsch, 2005). Hirsch argues that the h-index gives a more realistic view of the scientific impact of an author's publications than other measures: the number of papers (no measure of importance or impact), the number of citations (hard to collect and possibly inflated by a small number of papers with many citations) and citations per paper (rewards low productivity and penalizes high productivity). Hirsch's method is a generally accepted way of defining the impact of an author's research. A range of research has been conducted on predicting the h-index of an author.
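Hirsch's definition translates directly into a computation over an author's per-paper citation counts; a minimal sketch (the function name is my own):

```python
def h_index(citations):
    """Return the largest h such that at least h papers
    have at least h citations each (Hirsch, 2005)."""
    # Rank papers by citation count, highest first, and find the
    # last rank at which the count still reaches the rank itself.
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h

print(h_index([10, 8, 5, 4, 3]))  # 4: four papers have at least 4 citations
```

An author with many barely-cited papers and an author with one highly cited paper both receive a low h, which is exactly the balance between productivity and impact Hirsch aimed for.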
Dong et al. (2015) conducted a prediction task to determine whether a given paper will, within a pre-defined timeframe, increase the h-index of its primary author. In their study, six categories of factors were employed, comprising author, topic, reference, social, venue and temporal attributes. They found two predictive factors: the researcher's authority on the publication topic (denoted by being highly cited by others in a specific domain of expertise) and the venue in which the paper was published. Surprisingly, the topic popularity and the co-authors' h-index were of little relevance in increasing the author's h-index.
Impact of papers
Besides predicting the impact of an author's research, various ways of predicting the impact of scientific papers have been explored. Mcnamara et al. (2013) researched the usefulness of citation network features in predicting high impact papers. They argue that being able to predict 'the next big thing' allows the allocation of resources to fields where rapid developments are occurring. In the study of Mcnamara et al., network features including the neighborhood in the citation network and measures of interdisciplinarity were explored. Using these features and the Scopus Database of Elsevier, which contains 24,097,496 papers published during the years 1995-2012, Mcnamara et al. tried to predict paper impact. The paper impact was defined by taking citations over several years into account. The algorithms used in predicting the impact were linear regression, decision trees and random forests. Mcnamara et al. found multiple predictors of high impact papers. Papers with early high citation counts, citation counts by a paper and citations of and by highly cited papers proved predictive of high impact. Interdisciplinary citations of a paper, and of the papers that cite it, were also found to be predictors of high impact papers. In
contrast to the research by Mcnamara et al., this thesis explores textual and peer review features instead of network features.
Another approach to exploring the citations of scientific articles was used in the study of Vieira and Gomes (2010). Instead of exploring network features, Vieira and Gomes analyzed the dependence of citation growth on article features. In the study, 226,166 articles published in 2004 in the fields of Biology & Biochemistry, Chemistry, Mathematics and Physics were analyzed. The number of citations and the article features co-authors, institutional addresses, number of pages, number of references, journals and number of citations (approximately a 5-year window) were collected using the Web of Science (WoS) corpus. To analyze the dependence of the citation growth, a mean citation rate was calculated for each paper. Vieira and Gomes found a dependence of the mean citation rate on the number of co-authors, the number of addresses and the number of references. For the relation between the mean impact and the number of pages, the dependence obtained by Vieira and Gomes was very low. In this thesis, a method similar to that of Vieira and Gomes was used to determine the citation growth of a paper: Vieira and Gomes used citation counts in approximately a 5-year window and calculated a mean citation rate for each paper. The citation growth rate as calculated in this thesis is inspired by the technique used by Vieira and Gomes.
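The exact growth rate formula used in this thesis is given in Chapter 3. As a rough illustration of the general idea behind a Vieira-and-Gomes-style mean citation rate, one could average the increase in citations across successive collection moments; the function and variable names below are my own assumptions:

```python
def mean_citation_rate(counts):
    """Average increase in citations per collection interval, given
    cumulative citation counts from successive collection moments
    (e.g. citations1 .. citations5)."""
    if len(counts) < 2:
        raise ValueError("need at least two collection moments")
    # Per-interval increases between consecutive cumulative counts.
    increases = [b - a for a, b in zip(counts, counts[1:])]
    return sum(increases) / len(increases)

# A paper cited 10, 25, 60 and 120 times over four collection moments:
print(mean_citation_rate([10, 25, 60, 120]))  # (15 + 35 + 60) / 3
```

Because the increases telescope, this equals the total growth divided by the number of intervals; a real implementation would additionally normalize for the unequal time spans between collection moments.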
In a study conducted by Haslam et al. (2008), more factors contributing to citation impact were explored. Haslam et al. explored multiple impact predictors of 308 standard articles published in 1996 in three of the primary journals on social-personality psychology. The citation counts were collected in mid-July 2006. A broad range of predictors was explored in this study: author characteristics (number of authors, gender, nationality and eminence), institutional factors (university prestige, journal prestige and grant support), article features (title characteristics, number of studies, figures and tables, number and recency of references) and research approach (topic area and methodology). Using statistical multivariate analysis, Haslam et al. found that the following factors were predictive of a high number of citations: first author eminence, the later author being more senior, journal prestige, article length and the number and recency of references.
Jacques and Sebire (2010) hypothesized that specific features of journal titles may be related to citation rates. They investigated this by reviewing the title characteristics of the 25 most cited and the 25 least cited articles published in general and specialist medical journals. Using a Mann-Whitney U test, Jacques and Sebire found a positive correlation between citation rate and the length of paper titles, the use of colons in paper titles and the presence of an acronym. Because of this positive correlation, Jacques and Sebire suggest that the effect of the construction of an article title on citation impact may be underestimated.
A somewhat similar study was conducted in 2009 by Webster et al. In their research, they wanted to help answer the question of which variables are predictive of high citation counts. Instead of only investigating meta-features of titles, they focused on word frequencies in titles during certain periods. The meta-feature number of references was also explored in this study. Webster et al. analyzed 808 articles containing 8,631 title words from the psychology journal Evolution and Human Behavior between 1979 and 2008. For different periods, they found different words that could be identified as hot topics. They also found that articles that cite more references are in turn cited more themselves.
Other title features were investigated by Jamali and Nikzad (2011). 2,172 papers published in 2007 in six journals of the Public Library of Science were explored. The articles came from the research fields of biology, medicine, computational biology, genetics and pathogens. Jamali and Nikzad categorized all titles as descriptive, indicative or question. Furthermore, the titles were featurized by the number of words they contain. Using statistical difference and correlation tests, Jamali and Nikzad found correlations between the explored features and the number of downloads and citations. Papers with question titles tended to be downloaded more, but cited less, than others. Papers with longer titles were downloaded slightly less often than papers with shorter titles. The study also showed that titles with colons tended to be longer and receive fewer downloads and citations. Jamali and Nikzad also found a positive correlation between the number of downloads and citations.
Another study that shows a positive relation between the number of downloads and citations was conducted by Subotic and Mukherjee (2014). In this study, a unified investigation of article title characteristics in relation to subsequent article citation and download rates was conducted. Subotic and Mukherjee combined article title features that were previously studied mostly individually. They carefully selected 258 articles from 41 different psychology journals. Similar to the findings of Jamali and Nikzad (2011), Subotic and Mukherjee found, using a statistical analysis, that shorter titles were associated with more citations. The study also concluded that the title amusement level was slightly correlated with downloads, but not with the number of citations. Furthermore, Subotic and Mukherjee found that amusing titles tended to be shorter.
Various studies show that there is a relation between the paper title and the number of citations or download rates. However, these studies mainly focused on specific scientific fields. In 2016, Hudson did a broader study of the relation between article features and citation impact. In this study, Hudson analyzed 52,000 articles that were submitted in 2014 to the UK's four-year Research Excellence Framework (REF). The analyzed articles were from 36 different disciplines. Using statistical regression, Hudson found that the title characteristics varied considerably between the different disciplines. However, the research shows four main findings. First, Hudson found that the lengths of the titles increased with the number of authors in almost all disciplines. The second finding was that the use of colons and question marks tended to decline with increasing numbers of authors. Third, papers published later in the 4-year period tended to have more authors than those published earlier. The fourth finding of the study is interesting for the research on citation impact. In some disciplines, the number of subsequent citations to papers was higher when the titles were shorter and when they employed colons. The citations were lower when paper titles employed question marks (Hudson, 2016). Again this shows that, in some fields, there is a relation between the paper title and the number of citations. Some parts of the approach of Hudson are also used to predict the citation impact in this thesis: the use of question marks and colons in paper titles is considered in this prediction task.
The previously described studies show that in various areas of research, different article features are predictors of, or related to, scientific impact. Bergsma et al. (2012) showed that certain article features are also predictors of hidden attributes such as whether the author of a paper is a native English speaker, whether the author is male or female and whether the paper was published in conference or workshop proceedings. Training three linear SVM classifiers with Bag of Words features on three datasets containing approximately 400 papers each, Bergsma et al. were able to outperform a minority class baseline in predicting all three hidden attributes. By adding stylistic and syntactic features to the models, the performance of the classifiers improved even more. In predicting the native language of an author, one of the classifiers reached an f1-score of 91.6, almost 42 percentage points better than the baseline of 49.8. Interesting in this study is the use of features extracted from the article itself. It could be that, for example, the abstract is not only predictive of the hidden attributes, but also of the citation impact. It is worth investigating whether features representing the content of an article are predictive of citation impact.
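A Bergsma-et-al.-style setup, bag-of-words features feeding a linear SVM, can be sketched with scikit-learn; the texts and labels below are toy stand-ins, not data from the study:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy stand-ins for document texts and their class labels.
texts = [
    "novel method with strong results and thorough experiments",
    "unclear writing, weak baselines, missing related work",
    "well written paper, convincing evaluation, solid contribution",
    "poorly motivated, limited novelty, insufficient experiments",
]
labels = ["high", "low", "high", "low"]

# Bag-of-words features feeding a linear SVM.
model = make_pipeline(CountVectorizer(), LinearSVC())
model.fit(texts, labels)

print(model.predict(["convincing experiments and a solid novel method"]))
```

The same pipeline shape carries over to this thesis's task: swap in peer review texts and citation impact labels, and extend the vectorizer with stylistic or hand-crafted features where available.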
A novelty in the field of investigating predictors of citation impact is the use of peer reviews. Kang et al. (2018) present the first public dataset of scientific peer reviews available for research purposes (PeerRead v1). However, before the publication of that study, two corpora with peer reviews from the NIPS and ICLR conferences had already been created for this thesis. The dataset Kang et al. created consists of 14.7K papers with their peer reviews and corresponding accept/reject decisions. For a subset of 3K papers, the dataset contains 10.7K textual reviews. Kang et al. created PeerRead v1 using three strategies. The first was collaborating with conference chairs and conference management systems to allow authors and reviewers to opt in their paper drafts and peer reviews, respectively. Secondly, Kang et al. crawled publicly available peer reviews and labeled these with numerical scores such as clarity and impact. Last, they crawled arXiv submissions which coincide with important conference submission dates and checked whether a similar paper appeared in the proceedings of these conferences at a later date. The dataset
created by Kang et al. differs in size from the corpora created for this thesis. PeerRead v1 contains papers from the 2017 ICLR edition, whereas the ICLR corpus created for this thesis contains peer reviews from the 2013, 2014, 2016 and 2017 ICLR editions. The PeerRead v1 dataset consists of the same NIPS papers and reviews as the NIPS corpus created for this thesis. Furthermore, PeerRead v1 contains reviews from ACL 2017 and CoNLL 2016. The dataset created by Kang et al. also includes another 11,778 papers without reviews, but labeled with their accept/reject decisions. Kang et al. also conducted a classification task with a part of their dataset, in which they tried to predict the paper acceptance of ICLR and arXiv papers. Both models performed better than the majority baseline. Most relevant for this thesis is the classification model built for predicting ICLR acceptance. With the most optimal settings, this model reached an accuracy of 65.3%, outperforming the majority baseline of 57.6%. In creating this optimal model, Kang et al. implemented a feature set containing 22 coarse features and sparse and dense lexical features. A feature analysis showed that the following features were most predictive of paper acceptance: use of an appendix (True/False), number of theorems (Integer), number of equations (Integer), average number of references (Float), use of 'state-of-the-art' in the abstract (True/False) and number of cited papers published in the last five years (Integer). In this thesis, a classification model that predicts ICLR paper acceptance was also created. However, this model differs from the model created by Kang et al. in the implemented feature combinations. In this thesis, limited article features were available and therefore implemented: only lexical features of the abstract were used. This thesis mainly focused on trying to predict the paper acceptance based on the peer reviews of the papers.
As rightly concluded by Kang et al., access to peer reviews is limited. Therefore, both the author of this thesis and Kang et al. chose to collect the peer reviews of NIPS, because NIPS is one of the few conferences that made its peer reviews publicly accessible. As described by Hirsch, citation counts are hard to collect as well. While Mcnamara et al. had access to the Scopus Database of Elsevier, which also contains information about citations, the author of this thesis only had access to public sites like Google Scholar for collecting citations.
In predicting the citation impact of NIPS scientific papers, this thesis explores the relatively novel use of peer reviews as predictors of citation impact. However, predictors that were investigated by the researchers described above were also implemented in the development of the classification model that predicts the citation impact of NIPS papers. In this thesis these predictors are referred to as the specific features derived from previous work. This specific feature set includes the use of question marks in paper titles (Hudson (2016), Jamali and Nikzad (2011)), the use of colons in paper titles (Jacques and Sebire (2010), Jamali and Nikzad (2011), Hudson (2016)), the length of the paper title (Jamali and Nikzad (2011), Subotic and Mukherjee (2014), Hudson (2016)) and the number of authors of a paper (Vieira and Gomes (2010), Hudson (2016)).
3
D A T A A N D M A T E R I A L
3.1 collection
For this research two different types of data needed to be collected: peer reviews and citation counts. Both data types were hard to collect: there are very few conferences that make reviews publicly available, and there is no central place where citation counts can be downloaded easily. For both types of data, the Conference on Neural Information Processing Systems (NIPS)¹ and the International Conference on Learning Representations (ICLR)² play a key role.
NIPS is a machine learning and computational neuroscience conference held every December³. Because NIPS was one of the first conferences that made peer review public and keeps its records in a way that is easy to access⁴, peer reviews (referred to as reviews) of NIPS scientific papers were collected, and their corresponding citation counts were collected from Google Scholar⁵.
Just like NIPS, ICLR is a conference on machine learning. In contrast to NIPS, ICLR is held every April. Reviews of ICLR scientific papers were collected from the website OpenReview.net⁶. This website contains peer reviews of 13 different conferences⁷. The choice to collect reviews of ICLR was made because of the similarities between the subjects of NIPS and ICLR. The collection of the reviews and citation counts is described in the following subsections.
3.1.1 NIPS Citation Counts
In order to get the citation scores of the NIPS scientific papers, initially a list of papers was created for the editions of 2013, 2014, 2015, 2016 and 2017. This was done automatically using a Python script.
In a first attempt to collect the citation counts, the Python module scholar.py⁸ was used to scrape the citation counts from Google Scholar. Unfortunately this attempt failed because Google did not allow the required number of requests to its server. Therefore, a human annotator was needed to record the citation scores from Google Scholar. The annotator was helped by a Python script that automatically opened the Google Scholar page with the search results for a specific paper and allowed the annotator to enter the number of citations. The only thing the annotator had to do was type over the right number, as marked in red in Figure 1. The recording of the citation scores was started in April 2016 (citations1) by the supervisor of this thesis. The citation scores were also recorded in November 2016 (citations2), June 2017 (citations3), March 2018 (citations4) and May 2018 (citations5). Table 1 shows an overview of the recorded citation counts. During these periods, the available citation counts of the NIPS 2013 up to and including 2017 editions were collected.
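The annotation script itself is not reproduced here; a hypothetical reimplementation of its core loop (the URL format, function names and CSV layout are my own assumptions) could look like this:

```python
import csv
import urllib.parse
import webbrowser

def scholar_search_url(title):
    """Build a Google Scholar search URL for a paper title."""
    return ("https://scholar.google.com/scholar?q="
            + urllib.parse.quote_plus(title))

def annotate(titles, out_path):
    """Open each paper's Scholar results page in the browser and
    write the citation count the annotator types in to a CSV file."""
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["title", "citations"])
        for title in titles:
            webbrowser.open(scholar_search_url(title))
            count = input(f"Citations for '{title}': ")
            writer.writerow([title, count])

print(scholar_search_url("Generative Adversarial Nets"))
```

Keeping the browser in the loop sidesteps Google's rate limiting, since a human issues the requests at human speed; the script only removes the typing and bookkeeping overhead.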
For almost all of the NIPS papers, the citation counts were accurately recorded by the author of this thesis. Unfortunately, some citation counts were not collected due to technical problems. Due to this, no citation counts were recorded for four
¹ https://nips.cc
² https://iclr.cc
³ https://en.wikipedia.org/wiki/Conference_on_Neural_Information_Processing_Systems
⁴ https://papers.nips.cc
⁵ http://scholar.google.com
⁶ https://openreview.net
⁷ https://openreview.net/, on 2018-04-17
⁸ https://github.com/ckreibich/scholar.py
papers of the 2017 edition. All collected citation counts are stored in CSV files; for each year, there is a CSV file containing the citation counts for the different periods.
Figure 1: The annotator had to type over the number of citations.
NIPS Edition   citations1     citations2     citations3     citations4     citations5
               (April 2016)   (Nov. 2016)    (June 2017)    (March 2018)   (May 2018)
2013           recorded       recorded       recorded       recorded       not recorded
2014           recorded       recorded       recorded       recorded       not recorded
2015           recorded       recorded       recorded       recorded       not recorded
2016           not recorded   not recorded   recorded       recorded       not recorded
2017           not recorded   not recorded   not recorded   recorded       recorded

Table 1: Overview of the recorded NIPS citation counts.9
In Figure 2, the absolute increase in citations for each edition of NIPS is plotted against the first collected citations. The figure shows the growth of citations from the first up to the most recent moment the citations were collected. In the plot for NIPS 2013, one paper stands out with a remarkably high growth in citations: its absolute growth between citations1 and citations4 is much bigger than that of the other NIPS 2013 papers. The plots for NIPS 2014, 2015, 2016 and 2017 show a similar picture. In all editions, there is one paper that really stands out in terms of the absolute increase in citations. A list of these papers can be seen in Table 2.
When the paper with the highest increase is removed from the NIPS 2013 plot, the plots of the different NIPS editions look more similar (see Figure 3). This paper is therefore an exception in citation growth. It is clear that for each edition of NIPS, there are papers with a normal growth and papers with a higher growth. The papers with the more normal growth are clustered in the bottom left corner of the plot. Papers with a faster growth float from the bottom left up to the top right corner. In Figure 4, the clustered papers are indicated with a red dotted line and the floating papers with a green dotted line.
The plot of the growth in citations for the NIPS 2017 edition differs somewhat from the plots of the other editions: there is no outstanding paper and there are fewer floating papers. This is probably because of the difference in time between the periods in which the citations were collected. For the NIPS 2017 edition, there are only two months between citations4 and citations5 (March 2018 to May 2018), whereas for the other editions the time between collection periods is seven up to nine months. The plot of NIPS 2017 shows that the development of citation growth is already visible, but not as clearly as in the other plots.

9 In May 2018, only the citation counts for the NIPS 2017 edition were collected, to ensure that at least two consecutive collected citation counts were available for each edition.
The plot in the bottom right corner of Figure 2 shows the citation growth of all the NIPS editions together. Here the most recent absolute number of citations is plotted against the absolute increase in citations. The outstanding paper of NIPS 2013 is clearly visible in this plot. When this paper is removed to zoom in on the plot (Figure 5), the clustered and floating papers are well observable. However, the floating papers are closer to the clustered papers, which is mainly due to another outstanding paper in the top right corner.

The different plots of the collected citation counts show that there is distinguishable variation in the amount of growth between the NIPS scientific papers. The processing and combining of the citation counts with the NIPS reviews is described in Section 3.3.
NIPS Edition   Farthest outlier
2013           Distributed Representations of Words and Phrases and their Compositionality (Mikolov et al., 2013)
2014           Sequence to Sequence Learning with Neural Networks (Sutskever et al., 2014)
2015           Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks (Ren et al., 2015)
2016           Improved Techniques for Training GANs (Salimans et al., 2016)
2017           Dual Path Networks (Chen et al., 2017)

Table 2: Papers with the highest growth in citations and the highest number of citations in each NIPS edition.
Figure 3: Citations of NIPS 2013 Scientific Papers.
Figure 4: NIPS 2013: Clustered papers with normal growth and floating papers with faster growth.
Figure 5: Citations of NIPS 2013 - 2017.
3.1.2 NIPS Reviews
As mentioned before, NIPS has an easily accessible proceedings website. However, the available information was not homogeneous across the different NIPS editions. The site contains not only the information of the NIPS scientific papers, but also the peer reviews of the conference reviewers. Using Beautiful Soup10, the available information was scraped from the NIPS proceedings website. An overview of the information that was scraped can be seen in Table 3. A description of the information that was scraped for each individual review can be found in Table 4. As that table shows, the available information is not the same for every NIPS edition. An overview of the differences can be seen in Table 5.
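The scraping step can be sketched with Beautiful Soup roughly as follows. The tag and class selectors below are assumptions about the proceedings page layout at collection time, not the exact markup, and would need adjusting against the real pages:

```python
from urllib.request import urlopen
from bs4 import BeautifulSoup

def parse_paper_page(html):
    """Extract paper information from the HTML of one NIPS Proceedings
    paper page. The tag/class names are illustrative assumptions."""
    soup = BeautifulSoup(html, "html.parser")
    links = [a.get("href", "") for a in soup.find_all("a")]
    return {
        "title": soup.find("h2").get_text(strip=True),
        "abstract": soup.find("p", class_="abstract").get_text(strip=True),
        # Author pages are linked with relative /author/... URLs.
        "author_ids": [h for h in links if h.startswith("/author/")],
        # The first link ending in .pdf is taken as the paper PDF.
        "pdf_url": next((h for h in links if h.endswith(".pdf")), None),
    }

def scrape_paper_page(url):
    """Fetch one paper page and parse it."""
    with urlopen(url) as resp:
        return parse_paper_page(resp.read())
```

Separating fetching (`scrape_paper_page`) from parsing (`parse_paper_page`) keeps the HTML-extraction logic testable without network access.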
Information Description
ID The unique paper ID of the NIPS Proceedings website.
Title The organizational title of the scientific paper.
Abstract The abstract of the paper.
URL The URL to the NIPS Proceedings paper page.
BIB URL The URL to the BIB file of the scientific paper.
PDF URL The URL to the PDF file of the scientific paper.
Presentation type The medium used to present the scientific paper. For
example Poster or Oral.
Authors A list of the authors of the paper.
Author IDs The relative URLs of the NIPS Proceedings author pages.
Path to reviews The local path and file name to a copy of the NIPS review HTML file.
Submission paper ID The submission ID of the NIPS paper.
Reviews Peer reviews of the paper. See Table 4 for a detailed description of the scraped information for a single review.
Table 3: An overview of the information that was scraped from the individual web pages of
the NIPS papers.
The reviews and the information about the papers were saved into JSON files: for each edition of the NIPS conference, a separate JSON file was created. The collected citation counts as described in Section 3.1.1 were also included in these JSON files.
Information Description

ID The name or ID of the reviewer. This is always a string that contains a number of the reviewer. For example: Assigned_Reviewer_13 or Reviewer 3.

Review The review text written by the reviewer. For the NIPS editions 2013, 2014 and 2015, the reviewers were asked to write a review in the following way: "First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance.". In 2016 the reviewers were asked to write a "Qualitative Assessment". In 2017 there was no heading for the actual review.

Summary The summary of the review. For the NIPS editions 2013, 2014 and 2015 the reviewers were asked to write a summary in the following way: "Please summarize your review in 1-2 sentences". In 2016 the reviewers were simply asked to write a "Summary". In 2017 the reviewers were not asked to write a summary of the review.

Reviewer confidence In 2016 the reviewers were asked to specify their confidence in the review. The reviewers could choose one out of the following three options:
1. Less confident (might not have understood significant parts);
2. Confident (read it all; understood it all reasonably well);
3. Expert (read the paper in detail, know the area, quite certain of my opinion).

Author feedback In 2013, 2014 and 2015 the author was asked to give feedback based on the reviews of the reviewers.

Table 4: An overview of the scraped data for an individual NIPS review.
The original HTML source files of the reviews were subsequently saved into a separate folder for each edition. An example of the JSON structure of a specific paper can be seen in Listing 1.
NIPS Edition   Reviewer ID   Review      Summary         Reviewer confidence   Author feedback
2013           available     available   available       not available         available
2014           available     available   available       not available         available
2015           available     available   available       not available         available
2016           available     available   available       available             not available
2017           available     available   not available   not available         not available

Table 5: Overview of the available review information for papers of the different NIPS editions.
The NIPS reviews dataset contains 9,163 reviews in total. Some statistics about the dataset can be seen in Table 6. On average, four peer reviews were written per paper.
{
  "5138": {
    "paper_id": "5138",
    "paper_title": "The Randomized Dependence Coefficient",
    "abstract": "We introduce the Randomized Dependence Coefficient (RDC), ...",
    "url": "https://papers.nips.cc/paper/5138-the-randomized-dependence-coefficient",
    "bib": "http://papers.nips.cc/paper/5138-the-randomized-dependence-coefficient/...",
    "pdf": "http://papers.nips.cc/paper/5138-the-randomized-dependence-coefficient.pdf",
    "conf_event_type": "Poster",
    "authors": ["David Lopez-Paz", "Philipp Hennig", "Bernhard Schölkopf"],
    "authors_ids": [
      "/author/david-lopez-paz-6694",
      "/author/philipp-hennig-5163",
      "/author/bernhard-scholkopf-1472"
    ],
    "path_to_rev": "reviews_2013/5138.rev",
    "reviews_url": "https://media.nips.cc/nipsbooks/nipspapers/paper_files/nips26/...",
    "submission_paper_id": "14",
    "reviews": {
      "Assigned_Reviewer_4": {
        "review": "The paper introduces a new method called RDC to measure the ...",
        "summary": "An interesting work combining several known ideas, but the ...",
        "review_confidence": null
      },
      "Assigned_Reviewer_5": {
        "review": "The RDC is a non-linear dependency estimator that satisfies ...",
        "summary": "RDC is a straightforward and computationally efficient ...",
        "review_confidence": null
      },
      "Assigned_Reviewer_6": {
        "review": "The authors propose a non-linear measure of dependence ...",
        "summary": "I think overall the work is extremely interesting and ...",
        "review_confidence": null
      },
      "Assigned_Reviewer_7": {
        "review": "This paper gives a new approach to nonlinear correlation. The ...",
        "summary": "New and interesting approach to the metric of nonlinear ...",
        "review_confidence": null
      }
    },
    "author_feedback": "Dear reviewers, Thank you for the supportive feedback and ...",
    "citations": {
      "citations1": 23,
      "citations2": 33,
      "citations3": 53,
      "citations4": 75
    }
  }
}

Listing 1: Example of the JSON structure of a specific NIPS paper.
Besides that, it includes ≈ 1.2 million words of review summaries and feedback on the reviews by the paper authors. On average, each review contains 312 words. The average number of words in the review summaries is 40. The author feedback contains on average the most words: authors write their feedback about the reviews in on average 644 words.
The created NIPS corpus was published on GitHub11 and is available for research purposes.
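Given the JSON structure of Listing 1, per-edition statistics like those in Table 6 can be recomputed along these lines. The file path and the exact tokenization (whitespace splitting) are illustrative assumptions:

```python
import json

def corpus_stats(path):
    """Load one edition's JSON file (structured as in Listing 1) and
    report the number of papers, the total number of reviews, and the
    average number of words per review."""
    with open(path, encoding="utf-8") as f:
        papers = json.load(f)
    reviews = [rev["review"]
               for paper in papers.values()
               for rev in paper["reviews"].values()]
    words = [len(r.split()) for r in reviews]
    return {
        "papers": len(papers),
        "reviews": len(reviews),
        "avg_words_per_review": round(sum(words) / len(words)) if words else 0,
    }
```

The same pattern extends to summaries and author feedback by swapping in the corresponding JSON fields.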
NIPS Editions                          2013      2014      2015      2016      2017
Number of papers                        360       411       403       568       679
Average number of reviews per paper       3         3         4         6         3
Average confidence score                  –         –         –         2         –

Totals
  Reviews*                            1,132     1,278     1,536     3,240     1,977
  Summaries*                          1,132     1,278     1,536     3,240         –
  Author Feedback                       359       408       403         –         –

Total number of words
  Reviews*                          422,538   423,940   445,629   775,677   645,547
  Summaries*                         40,732    44,625    62,494   291,952         –
  Author Feedback                   225,784   269,379   259,455         –         –

Average number of words
  Reviews*                              373       332       290       239       327
  Summaries*                             36        35        41        90         –
  Author Feedback                       629       660       644         –         –

Table 6: Statistics of the NIPS Reviews dataset. Information marked with a * was used in the experiments of this thesis.
3.1.3 ICLR Reviews on OpenReview.net
OpenReview.net contains peer reviews for 25 different venues of 13 different conferences12. "OpenReview aims to promote openness in scientific communication, particularly the peer review process, by providing a flexible cloud-based web interface and underlying database API."13 Using this API, reviews of scientific papers of the International Conference on Learning Representations (ICLR) were collected. The reviews of the following editions were processed: 2013, 2014, 2016 and 2017. Unfortunately, the reviews of the 2015 ICLR edition were not available on OpenReview.
A separate JSON file was created for each ICLR edition. For each paper, two types of information were processed and saved into a JSON file: information about the paper itself and information about the reviews belonging to the paper. The
information collected about the paper itself is described in Table 7. Information
collected about individual reviews is described in Table 8.
An example of the JSON structure of a specific OpenReview paper can be seen in Listing 2. An overview of review and paper information with differing availability between ICLR editions can be seen in Table 9.
The corpus of ICLR reviews from OpenReview has a total of 2,697 reviews, 4,994 replies and 1,021 questions. In total, the corpus contains 783,848 words of review texts. It also contains ≈ 1 million words of replies to these reviews. The texts labeled as questions contain 98,417 words. On average, a review in the ICLR corpus consists of ≈ 291 words. A reply has an average length of ≈ 268 words. The texts in the ICLR 2017 edition that were marked as questions contain on average 96 words. More details and an overview of the quantity of the corpus can be seen in Table 10.

11 https://github.com/reinardvandalen/citations
12 https://openreview.net (on 2018-04-17)
13 https://openreview.net/about
In comparison to the corpus of NIPS reviews, the ICLR corpus is much smaller. While the NIPS corpus contains peer reviews of 2,421 papers, the ICLR corpus only contains reviews of 894 papers.
The collected ICLR corpus was published on GitHub14 and is available for research purposes.
Information Description
ID The unique OpenReview ID of the scientific paper.
Serial number The serial number of the paper for the particular ICLR edition.
Title The organizational title of the paper.
Abstract The abstract of the paper.
PDF The URL to the PDF file of the scientific paper.
Authors The author(s) of the paper.
Author IDs The unique OpenReview IDs of the author(s). On OpenReview
these IDs are the email addresses of the authors.
Keywords Some papers are labeled with one or more keywords. These keywords are saved into a list.
Track The type of track followed within the conference. For example:
workshop or conference.
Acceptance States if the paper is accepted or not.
Reviews The peer reviews of the paper. See Table 8 for a detailed description of the collected information for a single review.
Table 7: An overview of the information that was collected about the ICLR papers.
Information Description
ID A unique OpenReview review ID
Serial number The serial number of the review.
Parent The review ID of the parent.
When the review is a direct review to the paper, the parent is the OpenReview paper ID. If the review is a reply to an already existing review, the parent is the review ID of the review that was replied to.
Authors The author of the review. Sometimes the author of the paper
itself replies to one of the reviews. This can be checked by comparing the author of the paper with the author of the reply.
Title The title of the review.
Text The contents of the review, reply or question.
Type The type of the review. For the editions 2013, 2014 and 2016
this can be a review or a reply. For the 2017 edition this can also be a question.
Rating The rating a reviewer gives the paper. This information is only
available in the ICLR 2016 and 2017 edition. The reviewers could choose one out of the following scores:
1. Trivial or wrong;
2. Strong rejection;
3. Clear rejection;
4. Ok but not good enough;
5. Marginally below acceptance threshold;
6. Marginally above acceptance threshold;
7. Good paper, accept;
8. Top 50% of accepted papers, clear accept;
9. Top 15% of accepted papers, strong accept;
10. Top 5% of accepted papers, seminal paper.
Confidence The confidence of the reviewer in his or her review. This information is only available in the ICLR 2016 and 2017 editions. The reviewers could choose one out of the following scores:
1. The reviewer’s evaluation is an educated guess;
2. The reviewer is willing to defend the evaluation, but it is quite likely that the reviewer did not understand central parts of the paper;
3. The reviewer is fairly confident that the evaluation is correct;
4. The reviewer is confident but not absolutely certain that
the evaluation is correct;
5. The reviewer is absolutely certain that the evaluation is
correct and very familiar with the relevant literature.
Decision When the review is the final decision of the jury, the decision is:
"Invite to Workshop Track", "Reject", "Accept (Poster)" or "Accept (Oral)". This information is only available for the 2017 edition.
Replies If there are any replies to a review, these replies or reviews are nested here along with all review information as described in this table.
Table 8: An overview of the information that was collected for an individual ICLR review.
{
  "5Qbn4E0Njz4Si": {
    "paper_info": {
      "paper_id": "5Qbn4E0Njz4Si",
      "paper_nbr": 38,
      "title": "Hierarchical Data Representation Model - Multi-layer NMF",
      "abstract": "Understanding and representing the underlying structure of ...",
      "pdf": "https://openreviews.nethttps://arxiv.org/abs/1301.6316",
      "authors": ["Hyun-Ah Song", "Soo-Young Lee"],
      "author_ids": ["hi.hyunah@gmail.com", "longlivelee@gmail.com"],
      "keywords": [],
      "track": "workshop",
      "acceptance": true
    },
    "reviews": {
      "Oel6vaaN-neNQ": {
        "id": "Oel6vaaN-neNQ",
        "number": 2,
        "parent": "5Qbn4E0Njz4Si",
        "authors": ["anonymous reviewer 7984"],
        "title": "review of Hierarchical Data Representation Model - Multi-layer NMF",
        "text": "The paper proposes to stack NMF models on top of each other. ...",
        "type": "review",
        "rating": null,
        "confidence": null,
        "decision": null,
        "replies": {
          "-B7o-Yy0XjB0_": {
            "id": "-B7o-Yy0XjB0_",
            "number": 1,
            "parent": "Oel6vaaN-neNQ",
            "authors": ["Hyun-Ah Song"],
            "title": "",
            "text": "- Points on the con that experimental results are not great: ...",
            "type": "reply",
            "rating": null,
            "confidence": null,
            "decision": null,
            "replies": {}
          }
        }
      }
    }
  }
}

Listing 2: Example of the JSON structure of a specific ICLR OpenReview paper.
ICLR Edition   Rating          Confidence      Decision        Paper acceptance
2013           not available   not available   not available   available
2014           not available   not available   not available   not available
2016           available       available       not available   not available
2017           available       available       available       available

Table 9: An overview of review and paper information with differing availability between ICLR editions.
ICLR Editions                            2013      2014      2016      2017
Number of papers                           67        88       125       614
Average number of reviews per paper         5         5         1         3
Average number of replies per paper         1         2         1         8
Average number of questions per paper       –         –         –         2
Average rating score                        –         –         6         6
Average confidence score                    –         –         4         4
Average acceptance rate*                 0.82      0.42         –      0.52

Totals
  Reviews*                                309       446       183     1,759
  Replies                                  64       205        81     4,644
  Questions                                 –         –         –     1,021

Total number of words
  Reviews*                             96,334   154,086    41,735   491,693
  Replies                              19,427    69,410    17,792   962,716
  Questions                                 –         –         –    98,417

Average number of words
  Reviews*                                312       346       228       280
  Replies                                 304       339       220       207
  Questions                                 –         –         –        96

Table 10: Statistics of the ICLR OpenReview reviews. Information marked with a * was used in the experiments of this thesis.
3.2 annotation
The NIPS Reviews and ICLR Reviews corpora were annotated in different ways. The annotation process for both corpora is described in the following subsections.
3.2.1 NIPS Reviews
For the NIPS Reviews corpus, the collected citation counts as described in Section 3.1.1 were used to annotate the corpus. Every review is annotated with the available citation counts for the associated paper. As described in Section 3.1.1, the citation counts were collected by a human annotator. The supervisor of this thesis, Assoc. Prof. B. Plank15, started collecting the citations in April 2016. The author of this thesis continued to do so in June 2017. The citation counts were recorded in the following periods, with the aim of collecting the citations approximately every six months:

• citations1: April 2016
• citations2: November 2016
• citations3: June 2017
• citations4: March 2018
• citations5: May 2018
For the 2013, 2014 and 2015 editions, the citation counts of citations1, citations2, citations3 and citations4 are available. citations3 and citations4 are available for the 2016 edition. For the 2017 edition, citations4 and citations5 were collected. In May 2018, only the citations of the NIPS 2017 edition were collected, to ensure that at least two consecutive citation counts were available for each edition.
The annotation process started with automatically creating a list of the NIPS scientific papers using the Beautiful Soup Python scraper. At the moments mentioned above, a human annotator recorded the actual Google Scholar citation counts. The annotator was helped by a Python program, which automatically opened the Google Scholar search results page of the paper for which the citation counts needed to be collected. The program the annotator used can be seen in Figure 6. The human annotator only had to type over the Google Scholar citation scores. However, the use of Google Scholar comes with some limitations. The data collected by the annotator could be noisy because of the citation-scraping algorithm of Google Scholar. Furthermore, human errors could be made during the collecting. However, for this research, there was no other method of collecting citation counts than using Google Scholar. This is in contrast to publishers like Elsevier16, which do have exact information about citations, but keep it behind a paywall.
15 http://www.let.rug.nl/bplank/
Figure 6: The Python program the annotator used to record the citation counts from Google Scholar.
Due to technical problems, no citation counts were recorded for four papers of the 2017 edition. Because of this, these four papers were excluded from the research. Besides these technical issues, it could also happen that the annotator typed over the wrong number of citations. Because the citation counts were not checked by a second annotator, this has to be taken into account when using them. In the resulting dataset, however, there are no lower citation counts in consecutive periods. Sometimes Google adjusts the number of citations of a paper on Google Scholar: a paper could have 25 citations in June 2017 and 24 citations in May 2018. If this was the case, the 24 citations in May 2018 were adjusted to 25. This was done because it would have been difficult to decide how many periods had to be adjusted back in time; it was therefore decided to maintain the highest recorded citation count. The adjusting of the citation counts was only required for very few papers.
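Maintaining the highest recorded count amounts to taking a running maximum over the consecutive counts; a minimal sketch:

```python
def enforce_nondecreasing(counts):
    """Apply the correction described above: if Google Scholar later
    reports fewer citations than an earlier period, keep the highest
    count recorded so far (a running maximum)."""
    fixed, highest = [], 0
    for c in counts:
        highest = max(highest, c)
        fixed.append(highest)
    return fixed
```

For example, a recorded series `[23, 33, 25, 75]` becomes `[23, 33, 33, 75]`.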
As described earlier in Section 3.1.1, the citation counts were stored in CSV files. For each NIPS edition, a CSV file was created, with each column containing the counts of one collection period. The citation counts were also stored in the NIPS reviews dataset as described in Section 3.1.2.
Besides the citation counts, a part of the NIPS reviews dataset was annotated by the reviewers with the so-called Reviewer confidence (Table 4). This information is only available for the NIPS 2016 edition. As described in Table 4, each review was annotated by the reviewer with a confidence rating; the reviewers could choose between three options: less confident, confident and expert. In this research, the reviewer confidence scores are not used, but they are available in the NIPS reviews dataset (Section 3.1.2).
3.2.2 ICLR Reviews on OpenReview.net
The research conducted in this thesis tries to predict the citation impact of NIPS scientific papers. In order to improve the classification of these papers, a model was built that tries to predict the paper acceptance of ICLR papers using the reviews from OpenReview.net. It is expected that there are corresponding features in the NIPS and ICLR datasets, because both corpora contain peer reviews. It is therefore hypothesized that the addition of the ICLR classification model will improve the NIPS classification. To be able to build a model that predicts ICLR paper acceptance, every ICLR review was annotated with its paper acceptance, expressed as either true or false. For the 2013 and 2017 ICLR editions, the acceptance information was available on OpenReview.net. For annotating the papers of the 2014 edition, a technique called Distant Supervision was used. This technique was also used by, among others, Mintz et al. (2009), Read (2005) and Go et al. (2009). The acceptance information for the 2014 edition was scraped from the ICLR 2014 website17. Using Distant Supervision, the paper acceptance information was linked to the corresponding ICLR 2014 papers. Because the acceptance information was not available for the 2016 edition, these reviews were not annotated with paper acceptance.
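The distant-supervision step can be sketched as a lookup on normalized titles. The normalization and the field names (`"title"`, `"acceptance"`) are illustrative assumptions, not the original implementation:

```python
import re

def normalize(title):
    """Reduce a title to lowercase alphanumeric tokens so the same paper
    matches across differently formatted sources."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()

def label_acceptance(papers, accepted_titles):
    """Distant supervision: mark each paper as accepted when its
    normalized title occurs in the list scraped from the ICLR 2014
    website. `papers` maps paper IDs to dicts with a "title" key."""
    accepted = {normalize(t) for t in accepted_titles}
    for info in papers.values():
        info["acceptance"] = normalize(info["title"]) in accepted
    return papers
```

Normalizing both sides makes the match robust against punctuation and capitalization differences between OpenReview and the conference website.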
On OpenReview.net, Rating and Confidence scores were also available for the peer reviews of some ICLR editions. Just like the paper acceptance, these could hold predictive features for citation impact. Therefore, the ICLR reviews of the 2016 and 2017 editions were also annotated with a Rating and Confidence score, as described earlier in Section 3.1.3. Both scores were given by the author of a review. The author could give a scientific paper a rating of 1 to 10, where 1 is the lowest and 10 the highest score. The author's confidence in his or her own review was captured in the Confidence score, on a scale of 1 to 5. An extensive overview of the various scores can be seen in Table 8.
The paper acceptance, paper rating and reviewer confidence were all stored in the ICLR Reviews dataset (Section 3.1.3).
3.3 processing
In order to predict the impact of the NIPS scientific papers, the citation impact has to be defined. The NIPS reviews dataset only contains absolute citation scores, collected at specific points in time. In Section 3.3.1, the process of defining the citation impact is described. Besides annotated labels, the NIPS and ICLR corpora also contain textual information. This information needed to be processed so that it could be used to predict the impact of papers. The processing of this textual information is described in Section 3.3.2. In previous work by, for example, Hudson (2016), other features were indicated as predictive for the impact of papers. The creation of these features for the datasets is described in Section 3.3.3.
Figure 7: Timeline describing the periods in which citation counts were collected.
3.3.1 Citation impact
To predict the citation impact of NIPS scientific papers, citation impact needed to be defined. There were multiple options to do so. In any case, the citation growth needed to be calculated first. This was done by calculating a mean citation rate for each paper, as inspired by Vieira and Gomes (2010). The citation rates could have been interpreted directly as the citation impact, where high numbers represent a high citation impact and low numbers a low citation impact. The impact of NIPS scientific papers could in that case have been predicted using a regression task. However, this approach did not appear to be effective in early experiments: the performance of a regression model in trying to predict the citation rate was poor18.
Because of the poor performance of the regression task, it was decided to use a classification task to predict the citation impact of NIPS scientific papers. Therefore, the different citation rates needed to be classified into impact categories. In order to define these categories, a boxplot of the citation rates was created (Figure 9). The boxplot shows a clear distinction between two types of citation growth. The inliers in the boxplot were labelled as papers with low-med impact and the outliers as high impact. To see if a further classification of the citation rates was possible, a second boxplot was created (Figure 10). This boxplot represents the citation rates classified with a low-med impact, and again shows a clear distinction between two types of growth: a group of inliers and a group of outliers. This means that there were still papers that had about the same growth and papers that had a more extreme growth. The papers with about the same growth were labelled as low impact papers and those with the more extreme growth as medium impact papers.
In the sections below, the calculation of the citation growth and impact are de-scribed in more detail.
Citation growth rates
As a first step, the citation growth of a paper was calculated. Because the periods in between the different collection points differed, one can not directly use the absolute citation growth to indicate the growth over time. The periods in which the citation
counts were collected are plotted on a timeline in Figure 7. The time in between
these periods was indicated using v1, v2, v3 and v4. The first step in processing the citation counts, was calculating the absolute growth in citations for each of the described periods, the so called v_abs. The calculation of v_abs was performed by taking the number of citations of the successive period and subtract this from the
citations of the previous period (Equation1). Second, the growth rate of citations
was calculated, the so called v_rate. This calculation was performed for all periods. For this calculation the v_rate of the period in question was divided by the length
of that period (Equation 2). The last step in calculating the citation growth was to
calculate the average growth rate per month, the v_avg. This was calculated by
summing up all the v_rates and dividing this by the number of v_rates (Equation3).
An example of calculating the citation growth can be seen in Table11.
v_abs_x = citations_z − citations_y    (1)

Equation 1: Calculates the absolute growth in citations between two collection points. Where x is a period between two collection points, citations_z the number of absolute citations at the successive collection point and citations_y the number of absolute citations at the previous collection point.
v_rate_x = v_abs_x / v_length_x    (2)

Equation 2: Calculates the growth rate of citations within a period. Where x is a period between two collection points and v_length_x the length of that period in months.
v_avg = (v_rate_1 + v_rate_2 + ... + v_rate_n) / n    (3)

Equation 3: Calculates the average growth rate per month. Where n is the number of periods for which a growth rate was calculated.
Paper: The Randomized Dependence Coefficient
Conference: NIPS, edition 2013
Citation counts: citations1 = 23, citations2 = 33, citations3 = 53, citations4 = 75

v_abs:
  v_abs_v1 = citations2 − citations1 = 10
  v_abs_v2 = citations3 − citations2 = 20
  v_abs_v3 = citations4 − citations3 = 22

v_rate:
  v_rate_v1 = v_abs_v1 / v_length_v1 = 10 / 7 = 1.43
  v_rate_v2 = v_abs_v2 / v_length_v2 = 20 / 7 = 2.86
  v_rate_v3 = v_abs_v3 / v_length_v3 = 22 / 9 = 2.44

Citation growth:
  v_avg = (v_rate_v1 + v_rate_v2 + v_rate_v3) / 3 = 2.24

Table 11: Calculation example of the citation growth of The Randomized Dependence Coefficient (Lopez-Paz et al., 2013).
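Equations 1-3 can be implemented directly; the sketch below reproduces the figures of Table 11. The period lengths in months are taken as given (here 7, 7 and 9, per the timeline in Figure 7):

```python
def citation_growth(citations, period_lengths):
    """Implements Equations 1-3. `citations` are the absolute counts at
    consecutive collection points; `period_lengths` gives each period's
    length in months. Returns v_abs, v_rate and v_avg."""
    # Equation 1: absolute growth per period.
    v_abs = [later - earlier for earlier, later in zip(citations, citations[1:])]
    # Equation 2: growth rate per month for each period.
    v_rate = [growth / months for growth, months in zip(v_abs, period_lengths)]
    # Equation 3: average monthly growth rate.
    return v_abs, v_rate, sum(v_rate) / len(v_rate)
```

For the paper of Table 11, `citation_growth([23, 33, 53, 75], [7, 7, 9])` gives v_abs = [10, 20, 22], v_rate ≈ [1.43, 2.86, 2.44] and v_avg ≈ 2.24.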
{
  "citations": {
    "citations": {
      "citations1": 23,
      "citations2": 33,
      "citations3": 53,
      "citations4": 75
    },
    "v_rates": {
      "v1": 10,
      "v2": 20,
      "v3": 22
    },
    "v_avg": {
      "v1_avg": 1.43,
      "v2_avg": 2.86,
      "v3_avg": 2.44,
      "v_avg": 2.24
    }
  }
}

Listing 3: Example of the JSON code for the growth figures calculated in Table 11.

The citation growth was calculated for all papers in the NIPS reviews dataset. All v_abs figures, v_rates and the citation growth (v_avg) were added to a copy of the JSON dataset. An example of the JSON code representing the growth figures calculated in Table 11 is displayed in Listing 3. Important to notice is that the names of the growth figures in the JSON code differ from the names described above: the v_rates in the code are the v_abs figures, the vx_avg entries in the code are the v_rates, and the v_avg is documented in the same way as described above.
Citation impact
In Figure 8, the average growth rate per month is plotted against the first collected citations, whereas in the plots of Figure 2 the y-axis contained the absolute growth in citations. Just like in Figure 2, one can see that not all papers have the same citation growth. Some papers in the plots are more clustered in the bottom left corner and some float from the bottom left up to the top right corner. To make the clustered and floating papers more concrete, a boxplot of the average growth rate per month of the complete dataset was created.
This boxplot can be seen in Figure 9. The boxplot itself is hard to read: black circles, the so-called fliers or outliers, dominate the figure. These are data points above the Upper Outlier Threshold (UOT) of the boxplot. The UOT is defined by adding the Inter Quartile Range Rule (IQRR) to the third quartile (Q3) of the boxplot (Equation 4). The IQRR is calculated as shown in Equation 5: the Inter Quartile Range (IQR) is multiplied by 1.5. The IQR is the absolute difference between the first quartile (Q1) and Q3 of the boxplot (Equation 6). Each growth rate above the UOT is an outlier or flier in the data. Papers with such a growth rate are therefore defined as high impact papers; papers with a growth rate below the UOT are defined as low/medium impact papers. The growth rates belonging to the latter class lie within the boxplot of Figure 9, between 0 and the Upper Outlier Threshold. Taking only these data points, the boxplot in Figure 10 was created. Just like in Figure 9, fliers and outliers are visible, but in contrast to Figure 9, the boxplot itself is much more visible. For this boxplot of the low/medium citation growth rates, the UOT was also calculated. All growth rates above this UOT were defined as medium impact papers; papers with a growth rate between 0 and this UOT were defined as low impact papers. An overview of how a paper is defined as a low, medium or high impact paper can be seen in Table 12. In Figure 8, the calculated citation impact is visually indicated.
UOT_z = Q3_z + IQRR_z (4)
Equation 4: Calculates the Upper Outlier Threshold (UOT) of z, above which data points are marked as flier or outlier. Where z is a boxplot of growth rates and IQRR the Inter Quartile Range Rule (see Equation 5).
IQRR_z = 1.5 ∗ IQR_z (5)
Equation 5: Calculates the Inter Quartile Range Rule (IQRR) of z. Where z is a boxplot of growth rates and IQR the Inter Quartile Range (see Equation 6).
IQR_z = Q3_z − Q1_z (6)
Equation 6: Calculates the Inter Quartile Range (IQR): the absolute difference between Q1_z and Q3_z. Where z is a boxplot of growth rates.
high impact paper      v_avg_x > UOT_z
medium impact paper    UOT_i < v_avg_x ≤ UOT_z
low impact paper       v_avg_x ≤ UOT_i
Table 12: Overview of how paper x is defined as a low, medium or high impact paper. Where z is the boxplot of all growth rates and i the boxplot of all growth rates without the high impact rates.
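The two-stage labelling described above can be sketched in Python. This is an illustrative sketch, not the thesis code; `numpy.percentile` with its default linear interpolation is assumed here as an approximation of the quartiles the boxplots are based on:

```python
import numpy as np

def upper_outlier_threshold(rates):
    """UOT = Q3 + 1.5 * IQR (Equations 4-6)."""
    q1, q3 = np.percentile(rates, [25, 75])
    iqr = q3 - q1       # Equation 6
    iqrr = 1.5 * iqr    # Equation 5
    return q3 + iqrr    # Equation 4

def label_papers(v_avgs):
    """Label each growth rate as 'low', 'medium' or 'high' impact."""
    # Boxplot z: all growth rates.
    uot_z = upper_outlier_threshold(v_avgs)
    # Boxplot i: growth rates without the high impact rates.
    rest = [v for v in v_avgs if v <= uot_z]
    uot_i = upper_outlier_threshold(rest)
    labels = []
    for v in v_avgs:
        if v > uot_z:
            labels.append('high')
        elif v > uot_i:
            labels.append('medium')
        else:
            labels.append('low')
    return labels

# Small fabricated example (not thesis data): one clear outlier,
# one rate between the two thresholds, the rest low.
labels = label_papers([0.1, 0.2, 0.3, 0.4, 0.8, 10.0])
# labels = ['low', 'low', 'low', 'low', 'medium', 'high']
```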
Figure 8: Citation Growth (average growth rate per month) of NIPS Scientific Papers. For each
NIPS year, the paper with the highest impact is excluded from the graph (the same papers as in Table2). Legend: low impact papers (•); medium impact papers (×);
Figure 9: Boxplot z of the NIPS citation growth (NIPS 2013 - 2017).
Figure 10: Boxplot i of the low/medium NIPS citation growth (NIPS 2013 - 2017).
Measures                    Boxplot z   Boxplot i
Q1                          0.14        0.11
Q2                          0.50        0.44
Q3                          1.16        0.85
IQR                         1.02        0.74
IQRR                        1.53        1.11
UOT                         2.69        1.96
Total growth rates/papers   2417        2133

Low/medium impact papers    2133
High impact papers          284
Medium impact papers        112
Low impact papers           2021
Table 13: Different measures of boxplot z and boxplot i, and an overview of the low, medium and high impact papers. Where boxplot z is created from all growth rates and boxplot i from all growth rates without the high impact rates.
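The derived measures reported for the two boxplots follow directly from Equations 4-6; a quick arithmetic check, using only the reported quartiles as input:

```python
# Recompute IQR, IQRR and UOT from the reported Q1 and Q3 per boxplot:
# boxplot z (all growth rates) and boxplot i (without high impact rates).
for name, q1, q3 in [("boxplot z", 0.14, 1.16), ("boxplot i", 0.11, 0.85)]:
    iqr = q3 - q1       # Equation 6
    iqrr = 1.5 * iqr    # Equation 5
    uot = q3 + iqrr     # Equation 4
    print(name, round(iqr, 2), round(iqrr, 2), round(uot, 2))
# boxplot z: IQR 1.02, IQRR 1.53, UOT 2.69
# boxplot i: IQR 0.74, IQRR 1.11, UOT 1.96
```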