Who Cites What in Computer Science? Analysing Citation Patterns across Conference Rank and Gender



Tobias Milz¹ [0000-0003-3159-7666] and Christin Seifert² [0000-0002-6776-3868]

¹ University of Passau, 94030 Passau, Germany
tobias.milz@uni-passau.de

² University of Twente, PO BOX 217, 7500 AE Enschede, The Netherlands
c.seifert@utwente.nl

Abstract. Citations are a means to refer to previous, relevant scientific bodies of work. However, little is known about how citations behave with respect to venue reputation. Are A* papers cited more often by C papers, or vice versa? What is the source and sink of a citation in terms of venue reputation? In this work, we investigate this issue by analysing the DBLP database of computer science publications, utilizing rank information from the CORE database. Our analysis shows that authors tend to cite publications from the same or higher ranked venues more often than from lower-tier venues. Self-citations, in contrast, are especially focused on same-tier venues. The gender of the first author does not seem to have any impact on the citations from and to differently ranked venues.

Keywords: Citations · Self-Citations · Analysis · DBLP · CORE.

1 Introduction

Citations are a means to refer to previous scientific bodies of work. They are also used to calculate impact factors for journals [1, 4] and performance measures for scientists [3], and have thus become a valuable commodity in science. Research has sought factors that influence citations (e.g. [10]) and, most prominently, has examined the influence of self-citations on citation counts and subsequently on indicators of scientific performance (e.g. [2, 5]). Multi-authored papers, as well as papers with a male first author, have been found to have a higher self-citation rate [2, 5, 11], while self-citation rates generally vary across fields and countries [14]. To the best of our knowledge, the only study that investigated the relation between self-citations and the scientific reputation of the publication venue is in the ecology domain [8]. Its author found that the proportion of self-citations increased with the impact factor of ecology journals.

This paper contributes to the knowledge of citation and self-citation by analysing the domain of computer science. Specifically, we investigate the DBLP computer science bibliography [9] w.r.t. ranking of the conferences/journals and gender of the first author.


2 Problem Statement

Citations can be either synchronous (outgoing) or diachronous (incoming) [7]. The former refers to the number of publications a paper cites, the latter to how often a publication gets cited. Analogously, outgoing and incoming self-citations are citations from and to publications of the same author. The self-citation rate is defined as the number of self-citations divided by the total number of citations, and can be calculated for both incoming and outgoing self-citations. In this paper, we analyse incoming and outgoing citations and self-citation rates with respect to the conference/journal rank. For instance, if paper P cites paper Q, and P was published at an A* conference while Q was published at a C conference, the citation counts as an outgoing citation for A* and as an incoming citation for C.
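The counting scheme above can be illustrated with a minimal Python sketch. The toy papers, the author identifiers, and the helper names are hypothetical and only serve the illustration; in particular, treating a citation as a self-citation whenever the citing and cited paper share at least one author is an assumption of this sketch.

```python
from collections import Counter

# Hypothetical toy data: paper id -> (venue rank, set of author ids).
papers = {
    "P": ("A*", {"alice", "bob"}),
    "Q": ("C", {"bob"}),
    "R": ("A", {"carol"}),
}
# Directed citation edges: (citing paper, cited paper).
citations = [("P", "Q"), ("P", "R"), ("R", "Q")]


def is_self_citation(citing, cited):
    """Assumption: a self-citation is a citation between papers sharing an author."""
    return bool(papers[citing][1] & papers[cited][1])


outgoing = Counter()       # citations made, keyed by rank of the citing paper
incoming = Counter()       # citations received, keyed by rank of the cited paper
self_outgoing = Counter()  # self-citations made, keyed by rank of the citing paper
self_incoming = Counter()  # self-citations received, keyed by rank of the cited paper

for citing, cited in citations:
    source_rank = papers[citing][0]
    target_rank = papers[cited][0]
    outgoing[source_rank] += 1
    incoming[target_rank] += 1
    if is_self_citation(citing, cited):
        self_outgoing[source_rank] += 1
        self_incoming[target_rank] += 1


def self_citation_rate(self_counts, total_counts, rank):
    """Self-citations divided by all citations for one rank (0.0 if there are none)."""
    total = total_counts[rank]
    return self_counts[rank] / total if total else 0.0
```

In this toy graph, the citation from P (A*) to Q (C) is the only self-citation, so `self_citation_rate(self_outgoing, outgoing, "A*")` and `self_citation_rate(self_incoming, incoming, "C")` both evaluate to 0.5.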

3 Method

For our analysis, we use the DBLP citation graph [13], supplemented with each paper's ranking information and a gender attribute for the authors. The rankings are extracted from the Computing Research and Education Association of Australasia (CORE) database using rule-based string matching on the venue name. The method aims to find the most likely match while favouring precision over recall, i.e., it avoids introducing false positives even at the cost of missing some matches. The publication year of the papers is also considered in order to take rank changes of venues into account. We follow previously suggested methods to determine an author's gender by matching their first name (given name) to country-specific name lists [6]. For author identity, we rely on the quality of the DBLP citation graph, which already employs author name disambiguation approaches [12]. Out of all 3,079,007 papers in DBLP covering the publication and citation period from 1946 to 2018, 56.66% (1,744,449) were assigned a binary (female/male) gender based on the first author's inferred gender. A CORE rank was assigned to 14.15% (435,823), while both attributes could be assigned to 7.86% (242,096) of all papers.
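The actual matching rules are not spelled out here, so the following is only a hedged sketch of what a precision-oriented, year-aware rank lookup could look like. The table `CORE_RANKS`, its made-up entries, and the functions `normalize` and `lookup_rank` are hypothetical; the sketch accepts a venue only on an exact match after normalization and returns no rank otherwise, and it assumes that a venue has no rank before its first listed year.

```python
import re

# Hypothetical excerpt of a rank table (all values are made up):
# normalized venue name -> list of (first year the rank applies, rank), sorted by year.
CORE_RANKS = {
    "international conference on example systems": [(2008, "B"), (2014, "A")],
    "journal of example computing": [(2008, "C")],
}


def normalize(venue):
    """Lowercase, replace everything except letters by spaces, collapse whitespace."""
    venue = venue.lower()
    venue = re.sub(r"[^a-z\s]", " ", venue)
    return re.sub(r"\s+", " ", venue).strip()


def lookup_rank(venue, year):
    """Return the rank valid in `year`, or None if the venue is unknown.

    Only exact matches of the normalized name are accepted, which trades
    recall for precision: unmatched venues simply stay unranked.
    """
    history = CORE_RANKS.get(normalize(venue))
    if history is None:
        return None
    rank = None
    for start_year, candidate in history:
        if start_year <= year:
            rank = candidate  # the latest rank that started on or before `year`
    return rank
```

For example, `lookup_rank("International Conference on Example Systems (2015)", 2015)` yields "A" in this toy table, while the same venue in 2010 yields "B" and an unlisted venue yields `None`.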

4 Results

The heatmaps in Fig. 1 show the fraction of outgoing and incoming citations and self-citations for publications of each conference/journal rank. Our initial hypothesis was that publications cite highly ranked papers more often, as these have more visibility. The results support this hypothesis. For example, 93.6% of all outgoing citations from publications with a B rating cite publications with the same or a higher rating (top left). Furthermore, A, B and C ranked papers receive more than half of their incoming citations from publications of the same rank (top right). For self-citations, this same-rank effect is even more prominent, especially for the categories C and Australasian, which have much lower citation rates (35.1% and 9.3%, respectively) than self-citation rates (61.2% and 33.2%, respectively) towards same-tier publications (bottom left). In other words, authors prefer to cite higher-ranked publications, but self-citations more commonly point to publications of the same conference/journal rank. Note that although a difference is observable in the values for the categories Australasian and Other, we abstain from an interpretation, since both categories contain only 4318 (0.9%) of the papers with an assigned rank.

Top left: citations, normalized by total outgoing citations per rank

Source    A*     A*/A   A      B      C      Austr.  Other
A*        0.42   0.006  0.4    0.15   0.023  0.001   0
A*/A      0.041  0.67   0.22   0.058  0.013  0       0
A         0.23   0.005  0.56   0.16   0.042  0.002   0.001
B         0.18   0.003  0.33   0.43   0.06   0.004   0.001
C         0.15   0.002  0.31   0.18   0.35   0.003   0.001
Austr.    0.27   0.002  0.33   0.27   0.037  0.093   0.003
Other     0.2    0.002  0.37   0.25   0.098  0.008   0.068

Top right: citations, normalized by total incoming citations per rank

Source    A*     A*/A   A      B      C      Austr.  Other
A*        0.29   0.17   0.15   0.1    0.045  0.064   0.078
A*/A      0      0.28   0.001  0.001  0      0       0
A         0.39   0.34   0.51   0.26   0.2    0.29    0.28
B         0.23   0.16   0.24   0.55   0.23   0.37    0.31
C         0.078  0.034  0.089  0.091  0.52   0.12    0.16
Austr.    0.006  0.002  0.004  0.005  0.002  0.15    0.013
Other     0.002  0.001  0.002  0.003  0.003  0.007   0.17

Bottom left: self-citations, normalized by total outgoing citations per rank

Source    A*     A*/A   A      B      C      Austr.  Other
A*        0.45   0.004  0.39   0.14   0.025  0.001   0.001
A*/A      0.041  0.59   0.27   0.085  0.016  0       0
A         0.15   0.003  0.64   0.14   0.056  0.002   0.001
B         0.083  0.002  0.24   0.6    0.072  0.005   0.002
C         0.042  0.001  0.18   0.16   0.61   0.002   0.002
Austr.    0.1    0      0.23   0.27   0.062  0.33    0.003
Other     0.1    0      0.3    0.24   0.15   0.009   0.2

Bottom right: self-citations, normalized by total incoming citations per rank

Source    A*     A*/A   A      B      C      Austr.  Other
A*        0.43   0.19   0.14   0.071  0.026  0.048   0.046
A*/A      0.001  0.34   0.001  0.001  0      0       0
A         0.36   0.3    0.6    0.19   0.15   0.2     0.24
B         0.17   0.14   0.19   0.65   0.17   0.37    0.28
C         0.04   0.031  0.066  0.081  0.65   0.071   0.17
Austr.    0.003  0      0.002  0.004  0.002  0.31    0.006
Other     0.002  0      0.002  0.002  0.003  0.005   0.27

Fig. 1. Ratio of citations (top) and self-citations (bottom) from venues with specific rank. Rows indicate the source and columns the target of citations. Left: normalized by the total number of outgoing citations per rank; right: normalized by total incoming citations per rank.
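The two normalizations used in Fig. 1 can be sketched as follows. The raw counts, the reduced set of four ranks, and the function names are hypothetical; the sketch only illustrates that dividing by row sums gives the share of a rank's outgoing citations per target rank, while dividing by column sums gives the share of a rank's incoming citations per source rank.

```python
# Ranks ordered from highest to lowest (simplified, hypothetical subset).
RANKS = ["A*", "A", "B", "C"]

# Hypothetical raw counts: counts[i][j] = citations from rank i to rank j.
counts = [
    [40, 30, 20, 10],
    [20, 50, 20, 10],
    [10, 30, 40, 20],
    [10, 20, 20, 50],
]


def normalize_rows(matrix):
    """Divide each row by its sum: outgoing view, rows sum to 1."""
    result = []
    for row in matrix:
        total = sum(row)
        result.append([value / total if total else 0.0 for value in row])
    return result


def normalize_columns(matrix):
    """Divide each column by its sum: incoming view, columns sum to 1."""
    column_sums = [sum(column) for column in zip(*matrix)]
    return [
        [value / total if total else 0.0 for value, total in zip(row, column_sums)]
        for row in matrix
    ]


def same_or_higher_share(outgoing_matrix, rank_index):
    """Share of a rank's outgoing citations that target the same or a higher rank."""
    return sum(outgoing_matrix[rank_index][: rank_index + 1])


outgoing_view = normalize_rows(counts)
incoming_view = normalize_columns(counts)
share_b = same_or_higher_share(outgoing_view, RANKS.index("B"))
```

With these toy counts, `share_b` is about 0.8, i.e. 80% of the outgoing citations of rank B point to rank B or higher. The 93.6% reported for B-ranked publications corresponds to this kind of partial row sum, computed on the real data.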

Table 1 shows the statistics w.r.t. venue rank and gender of the first author. For example, out of all 1,957,108 incoming citations to papers with a male lead author, 13.8% come from publications at conferences/journals with an A* rating. This citation rate indicates how citations from/to differently rated venues are affected by the first author's gender of the cited/citing paper. The results show that despite the lower number of papers with female lead authors (410,262 papers with a female and 1,334,187 with a male lead author), the distribution of the incoming and outgoing citation rates stays nearly the same. In other words, the gender of the lead author has no apparent effect on the citations of papers when considering their identified rating.
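A minimal sketch of how such per-gender citation rates can be derived and compared is given below. The records and the function names are hypothetical toy values, not taken from the data set; each record pairs the inferred gender of the cited paper's first author with the rank of the citing venue.

```python
from collections import Counter, defaultdict

# Hypothetical incoming-citation records:
# (gender of the cited paper's first author, rank of the citing venue).
incoming_records = [
    ("M", "A*"), ("M", "A"), ("M", "A"), ("M", "B"),
    ("F", "A*"), ("F", "A"), ("F", "A"), ("F", "B"),
]


def rank_distribution(records):
    """For each gender, the fraction of citations coming from each rank."""
    per_gender = defaultdict(Counter)
    for gender, rank in records:
        per_gender[gender][rank] += 1
    return {
        gender: {rank: n / sum(counter.values()) for rank, n in counter.items()}
        for gender, counter in per_gender.items()
    }


def max_abs_difference(dist_a, dist_b):
    """Largest absolute gap between two rank distributions."""
    ranks = set(dist_a) | set(dist_b)
    return max(abs(dist_a.get(r, 0.0) - dist_b.get(r, 0.0)) for r in ranks)


distribution = rank_distribution(incoming_records)
gap = max_abs_difference(distribution["M"], distribution["F"])
```

Each row of Table 1 corresponds to one such distribution. A small maximum gap between the rows for M and F is what the statement about similar citation rates refers to; the sketch performs no statistical test.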

Table 1. Comparison of citations by gender (M male, F female, X unisex, ? unknown) and conference/journal rank

                         Conference/Journal Rank
      Gender  Papers     A*     A*/A   A      B      C      Austr.  Other  Citations
in    M       1,334,187  0.138  0.003  0.387  0.319  0.143  0.006   0.003  1,957,108
in    F         410,262  0.126  0.001  0.398  0.325  0.143  0.005   0.003    417,655
in    X         609,101  0.134  0.002  0.371  0.343  0.144  0.005   0.003    748,836
in    ?         725,453  0.117  0.001  0.355  0.346  0.174  0.005   0.003    676,809
out   M       1,334,187  0.234  0.008  0.427  0.239  0.087  0.004   0.002  1,355,908
out   F         410,262  0.226  0.003  0.433  0.248  0.084  0.004   0.001    430,910
out   X         609,101  0.231  0.004  0.432  0.244  0.084  0.004   0.001    733,721
out   ?         725,453  0.237  0.002  0.427  0.235  0.094  0.004   0.001    761,150

Further studies are required to shed light on the reason for the difference in citation and self-citation behaviour w.r.t. rank. An interesting question for future work is whether a homophily property can be observed in citation behaviour, i.e., whether authors of a specific gender tend to cite authors of the same gender.

References

1. Time to remodel the journal impact factor. Nature 535(7613), 466 (2016)
2. Aksnes, D.W.: A macro study of self-citation. Scientometrics 56(2), 235–246 (2003)
3. Alonso, S., Cabrerizo, F., Herrera-Viedma, E., Herrera, F.: h-index: A review focused in its variants, computation and standardization for different scientific fields. Journal of Informetrics 3(4), 273–289 (2009)
4. Callaway, E.: Beat it, impact factor! Publishing elite turns against controversial metric. Nature 535(7611), 210–211 (2016)
5. King, M.M., Bergstrom, C.T., Correll, S.J., Jacquet, J., West, J.D.: Men set their own cites high: Gender and self-citation across fields and over time. Socius 3 (2017)
6. Larivière, V., Ni, C., Gingras, Y., Cronin, B., Sugimoto, C.: Bibliometrics: Global gender disparities in science. Nature 504, 211–213 (2013)
7. Lawani, S.M.: On the heterogeneity and classification of author self-citations. Journal of the American Society for Information Science 33(5), 281–284 (1982)
8. Leblond, M.: Author self-citations in the field of ecology. Scientometrics (2012)
9. Ley, M.: The DBLP computer science bibliography: Evolution, research issues, perspectives. In: Laender, A.H.F., Oliveira, A.L. (eds.) String Processing and Information Retrieval. pp. 1–10. Springer Berlin Heidelberg, Berlin, Heidelberg (2002)
10. Medoff, H.M.: The efficiency of self-citations in economics. Scientometrics (2006)
11. Milz, T., Seifert, C.: Analysing author self-citations in computer science publications. In: Proceedings of the 29th DEXA Conferences and Workshops (2018)
12. Tang, J., Fong, A.C.M., Wang, B., Zhang, J.: A unified probabilistic framework for name disambiguation in digital library. IEEE Transactions on Knowledge and Data Engineering 24(6), 975–987 (2012). https://doi.org/10.1109/TKDE.2011.13
13. Tang, J., Zhang, J., Yao, L., Li, J., Zhang, L., Su, Z.: Arnetminer: Extraction and mining of academic social networks. In: Proc. SIGKDD Intl. Conference on Knowledge Discovery and Data Mining. pp. 990–998. ACM, New York, NY (2008)
14. Thijs, B., Glänzel, W.: The influence of author self-citations on bibliometric
