Opportunistic citing: problems of citation analysis revisited

Academic year: 2021

Share "Opportunistic citing : problems of citation analysis revisited"

Copied!
42
0
0

Bezig met laden.... (Bekijk nu de volledige tekst)

Hele tekst


Opportunistic citing: problems of citation analysis revisited

Master’s thesis

Student: Jakub Żyła, 11089474

University of Amsterdam, Faculty of Economics and Business

Supervisor: Dr. Nathan Betancourt


ABSTRACT

Citations are a generally accepted indicator of the importance and performance of authors and their work in the academic community; however, not everyone accepts that standard. Up to this point, the discussion has focused on the argument that scientists do not cite all the historical influences contained in their papers, an argument that proponents of citation analysis dismiss as truistic. I believe that the issue with citation indexes might be more complex than selection bias. I hypothesize that the number of cited articles may translate into the visibility of a paper and hence its popularity. I present strong grounds for the existence of such a process and a clear incentive to "game" the current reward system in science. My results do not prove that people collude or include "opportunistic" references to increase their chance of being noticed; they do, however, provide strong preliminary grounds for examining this concept more closely.

Statement of originality

This document is written by Student Jakub Żyła who declares to take full responsibility for the contents of this document.

I declare that the text and the work presented in this document is original and that no sources other than those mentioned in the text and its references have been used in creating it.

The Faculty of Economics and Business is responsible solely for the supervision of completion of the work, not for the contents.


TABLE OF CONTENTS

ABSTRACT

I. Introduction

II. Literature review
1. Normative view
2. Social constructivist view
3. Literature gap and research questions

III. Theoretical framework

IV. Methodology
1. Data
2. Citation Network
3. Data transformation
4. Variables
Dependent variable
Independent variables

V. Results
1. Descriptive statistics
2. Correlation analysis
3. Regression analysis of the outdegree
4. Regression analysis of the indegree
5. Effect of outdegree on indegree across time

VI. Discussion and conclusions
Findings
Implications for the citation analysis as an evaluative tool
Limitations
Future research

References


I. Introduction

Since its rapid emergence in the 1950s and 1960s, social network analysis has not lost momentum and is now a cornerstone of many fields of science, such as sociology and management. In recent years we have seen a growing trend in the study of a variety of scholarly networks, wherein a node is typically an academic entity such as a paper, a book, a journal or an author, and the links indicate a relationship such as citation or co-authorship (Yan and Ding, 2012). One such network is the citation network, which is at the heart of contemporary bibliometrics. These studies are also essential for patent citation analysis, which is an important area of management. Moreover, their popularity has been boosted by being at the center of a young and extremely potent area of research on complex networks. Although the field of citation analysis is well established, the growing knowledge of the properties of such networks, like their small-world structure (Watts and Strogatz, 1998) and preferential attachment mechanism (Newman, 2001; Jeong et al., 2003), enables us to revisit some of the old issues and encourages the discovery of new ones. Commonly, citation data about papers and authors have been used as a representation of scientific "worth" or quality and of importance in the scientific community, respectively (Gilbert, 1977). However, since the beginning there were voices which dismissed citation data as any kind of determinant of quality or status, usually citing the lack of a universal standard for citing and of a "clear demonstration of the way in which citations reflect the processes of scientific influence" (Mulkay, 1974; p. 111).

Now, almost five decades later, this discussion is still very much alive. On one side we have proponents of citation analysis, who point to its universal acceptance and validation by peer judgment (Garfield, 1997b). On the other, its adversaries, who quote various empirical studies showing numerous issues with citation analysis as an evaluative tool (MacRoberts and MacRoberts, 1996). It seems that an agreement will not be reached as long as the question falls under the larger nature versus nurture debate. As long as both sides cannot agree on the most basic common perspective on the world, it won't be possible for them to settle their argument. So, for now, what is left is to further push our knowledge of the mechanisms actually observed in citing. The only generally accepted mechanism occurring in citation networks is preferential attachment, known in sociology as the Matthew effect (Merton, 1968). This effect can be interpreted both ways. Some will say that it is logical, as great scientists are involved in many important discoveries and their papers are going to have more visibility (Merton, 1973). At the same time, this process can be proof of something being evidently wrong with citation analysis, as "eminent scientists get disproportionately great credit for their contribution to science while relatively unknown scientists tend to get disproportionately little credit for comparable contributions" (Merton, 1968). Preferential attachment is only a valid mechanism as long as we assume that citations are a legitimate reflection of the quality of the work. Therefore, it should not be used in a dispute on citation analysis itself, as the opposing groups do not agree on the underlying assumptions. In my opinion the discussion has reached an impasse, and repeating the same truisms will not make any progress. The only way to break the deadlock is to search for other existing mechanisms.

One promising argument of this kind has been made by Webster et al. (2009), who propose the idea of a reciprocal altruism mechanism in science. This study aims at validating and further investigating such phenomena, thereby substantiating the concerns around citation analysis as an evaluative tool. In particular, I aim at reinforcing the theoretical background of said occurrence and present a broader mechanism of "Opportunistic Citing".


This paper is structured as follows. First, a critical review of the existing literature on the topic is outlined. Second, the methodology used and the empirical results are presented. Finally, the findings of this paper are presented along with its limitations and avenues for future research.

II. Literature review

The following paragraphs outline the arguments of both sides in the ongoing discussion about the legitimacy of citation analysis as an evaluative tool. In this approach, the number of links is used to measure the level of quality, influence, importance or performance of some academic entity (Borgman and Furner, 2002). Although bibliometric indexes are nowadays widely accepted in the sociology of science as an indicator of status, there are many voices which reject them as an objective evaluation method. We can distinguish between two contradictory views and the theoretical frameworks behind them. The proponents of citation analysis are usually associated with the so-called "normative" theory of citing, which is based on the assumption that authors cite their influences (MacRoberts and MacRoberts, 1997). On the other hand, opponents of evaluative citation analysis support the "constructivist" theory of citing, according to which authors are biased and act in self-interest (MacRoberts and MacRoberts, 1997).

1. Normative view

The first account assumes that authors use citing in order to evaluate the contribution of others and by doing so "acknowledge the intellectual debts" (Borgman and Furner, 2002; Garfield, 1965; 1996). According to this perspective, citations are a means of giving due credit to those authors whose work has been of relevance to that of the citers. One of the strongest advocates of this approach is Eugene Garfield, founder of the citation index, who in 1965 (p. 189) published a popular list of fifteen "major reasons" for citing another author. These are: paying homage to pioneers; giving credit for related work (homage to peers); identifying methodology, equipment, etc.; providing background reading; correcting one's own work; correcting the work of others; criticizing previous work; substantiating claims; alerting to forthcoming work; providing leads to poorly disseminated, poorly indexed, or un-cited work; authenticating data and classes of fact (physical constants); identifying original publications in which an idea or concept was discussed; identifying original publications or other work describing an eponymic concept or term (e.g. Pareto's Law); disclaiming work or ideas of others (negative claims); and disputing priority claims of others (negative homage). Scientists following these criteria would definitely fall into the normative perspective of science and selflessly serve the greater good of scholarship. However, this is more a list of "good practices" for citing than a description of the actual motives behind citing decisions. The most important argument for the validity of citation analysis is the evidence of correlation between the number of citations and other measures of quality of scientific work, as well as peer judgment (Cole and Cole, 1967; Bayer and Folger, 1996). In his relatively recent study, Baldi (1998) tried to find the most important determinants of citation patterns among scientists, while ascribing them to one of the two opposing citing processes: normative or constructivist. His findings were consistent with the normative view, as none of the determinants characteristic of the constructivist theory turned out to be significant, with the exception of the sex of the cited author. Borgman and Furner (2002; p. 14) provide an overview of the presumptive arguments made by the defenders of citation analysis, which are as follows:


1. that for any citing work there exists a set of citable works — works that "ought" to be cited in the citing work;

2. that the general result of citers’ activities is such that (a) all works that ought to be cited in the citing work indeed are cited, and (b) all works that are cited indeed ought to be cited in the citing work; and

3. that the quality of a given citable work consists in its citation worthiness, and thus may be measured by citation counts.

To sum up, under the normative perspective the quality of a paper and the importance of its author are the predictors of the citations it receives. The references it contains, in turn, are a means of giving credit to the most important work done on the topic and a reflection of other influences. The assumed underlying mechanism is preferential attachment, which means that influential papers and scientists are going to be rewarded with a high number of citations.
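The preferential attachment mechanism summarized above can be sketched with a toy simulation (my own illustration, not taken from the thesis): each new paper cites an existing paper with probability proportional to that paper's current citation count, which quickly concentrates citations on a few early papers.

```python
import random

def simulate(n_papers, seed=0):
    """Toy preferential attachment: every new paper makes one citation,
    choosing its target with probability proportional to citations held."""
    rng = random.Random(seed)
    cites = [1]  # paper 0 starts with weight 1 to avoid zero probabilities
    for _ in range(1, n_papers):
        # pick an existing paper with probability proportional to its count
        target = rng.choices(range(len(cites)), weights=cites)[0]
        cites[target] += 1
        cites.append(1)  # the new paper enters with base weight 1
    return cites

c = simulate(2000)
# the distribution is highly skewed: a few papers accumulate most citations
```

This is only a caricature of the Matthew effect, but it shows why, under the normative reading, early visibility compounds over time.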

2. Social constructivist view

On the opposite side stand the social constructivists, who challenge the view that citers always follow the generally accepted standards of citing in science and in their particular field. One strong argument has been presented by Gilbert (1977), who argued that references are used by scholars as a tool of persuasion. Latour (1987) went even further and described citing purely as a game played by scientists in order to collect the most benefit from it. Among the strongest adversaries of evaluative citation analysis are the MacRoberts, who collected all the main arguments against the normative view: in practice scientists do not always cite their influences (on average, scholars acknowledge only 30% of the references needed to cover all the influence evident in their work); citing is highly biased, as some work is never cited although it is objectively more cite-worthy; secondary sources are preferred, as a disproportionate number of citations goes to review papers; informal influences, which are paramount to any research, are not cited; citer motivation may be unethical, as suggested by Latour (1987); citation rates vary substantially with field, nationality, time period, and size; self-citations should not be considered a measurement of performance; audiences differ in size, meaning not all work has the same probability of being cited; scientists lack perfect knowledge of the available literature, as they mostly use secondary sources and informal communication; there is confusion about the methodology and probable issues with the gathered data; and technical problems cause as much as 20% of the data to be erroneous (MacRoberts and MacRoberts, 1996). A substantial amount of empirical research warrants these claims. One of the most popular papers is by White and Wang (1997), who studied citation patterns and motivations by conducting interviews with twelve faculty members and doctoral students of agricultural economics. They found that it is common for materials that were not used in the process of writing to be cited, and the other way around. But probably their most interesting discovery is what they call "metalevel concerns", i.e. confusion among citers about the general meaning and value of citing. In their study identifying the most important reasons for citing, Case and Higgins (2000) support the theory that citations are used largely to persuade, as the second most significant factor predicting a decision to cite was the "citing author's judgment that citing a prestigious work will promote the cognitive authority of his or her own work". In his critique of citation rates and journal impact factors as evaluative tools of research, Seglen (1998; p. 226) went as far as saying: "citation rates are determined by so many technical factors that it is doubtful whether pure scientific quality has any detectable effect at all".


Under the social constructivist view, preferential attachment no longer holds and cannot be used to describe the distribution of citations. In this view the distribution of citations is more random, as individuals act according to private preferences based on differing ethical norms. Therefore, a different mechanism describing the distribution of citations is needed.

3. Literature gap and research questions

This study aims to further develop the discussion on citation analysis, which seems to have stalled. This stagnation might be due to the fact that its adversaries repeat the same arguments and do not present any operationalization of their theories. I believe that at this point the social constructivists have provided enough evidence to prove the legitimacy of their claims. There is a substantial level of confusion around the awarding of citations. And as citation is one of the major reward systems in science, frequently affecting awarded grants, there is a need for more codified guidelines for citing. In order to shed more light on the problem, research on mechanisms other than preferential attachment is necessary. The main focus should shift to looking at the actual citing patterns of researchers and the underlying motives, especially as there is a substantial body of work showing that the good practices of citing are not fully reflected in practice (MacRoberts and MacRoberts, 1996).

This research will investigate claims that there might be a correlation between the number of cited papers and the number of received citations which cannot be quantitatively explained by the normative arguments. It will thereby lay the ground for a new mechanism in citation analysis. Hence, the research question of this study is as follows:


Are there mechanisms other than preferential attachment in citation analysis, which are in line with the social constructivist view of science?

III. Theoretical framework

Normative perspective of citing does not fully explain observed effects

Although it has been decades since the first claims of the inadequacy of citation analysis were made, no codified "theory of citing" (Mulkay, 1974) able to satisfy the disbelievers has been presented. In order to further support the social constructivist view, I propose an analysis of the number of cited papers while controlling for all the factors usually quoted by the normative view. I believe that a large part of the variance in the contained references is not explainable by the normative view, showing that there is a lot of personal preference involved in the citation process:

Hypothesis 1: A substantial part of the variance of the number of references cannot be explained by the factors typically assumed to describe it according to the normative theory.

There is a need for different mechanisms influencing citation decisions

Recently an interesting new argument was added to the discussion about citation analysis by Webster et al. (2009), who show that there might be a correlation between the number of papers cited and the number of citations received. Although it is not the first paper showing a correlation between the two (Peters and van Raan, 1994), Webster et al. (2009) suggest a different way of approaching it. Building on their preliminary findings, I present a new way of understanding this phenomenon. I believe that this occurrence is especially interesting to examine under the assumption made in Hypothesis 1. The existence of such an effect would mean that personal biases in the citation process, factors unrelated to the "quality" of the paper or the "importance" of the author, might affect the number of received citations. The assumption is that papers (authors) which cite more reservedly and genuinely will be rewarded less than their counterparts for the same work.

Up to this point, studies have shown a relationship between the number of received citations and various properties of papers other than "quality", such as the number of authors, field of science, journal, type of publication and number of pages (Peters and van Raan, 1994; Baldi, 1998), which in turn moderate the number of references included in a given paper. Articles written by a large number of co-authors with a strong position in the field yield more citations (Rousseau, 1992). Fields of study are characterized by their typical number of references, which impacts the number of received citations independently of field size (Lovaglia, 1989). For example, the average article in psychology will contain a larger number of references than one in physics. Some argue that articles published in high-impact-factor journals receive more citations than similar ones published elsewhere; however, Seglen (1994) showed that there might not be any "bonus" from being published in a high-impact journal. Lastly, certain types of articles, like review articles, are cited more than "normal" ones (Seglen, 1997). Webster et al. (2009) add to this and argue that authors might participate in reciprocal altruism (Trivers, 1971), suggesting an "I cite you, you cite me" mentality.

In my opinion, while the existence of such a mechanism is extremely likely, Webster et al. (2009) lay very weak ground under their theory. First, I believe that the issue goes beyond one single bias, and focusing only on reciprocal altruism greatly limits the research, especially as it is impossible to differentiate between the biases at hand without conducting thorough interviews with scientists about the references contained in their work. Many different biases could be taken into consideration when analyzing citation patterns, such as anchoring, the bandwagon effect, the focusing effect or kinship. Second, Webster et al. (2009) check only a correlation between the number of references included in a paper and the citations received by it, with no control for any of the numerous factors already established in the literature as important. This is why I consider their findings extremely preliminary at best. Moreover, I will argue that an agent (author) does not bear any "cost" while citing another paper, which is a necessary condition of reciprocal altruism (Stephens, 1996). I think it is reasonable to ignore published papers of such low quality that citing them would carry a "negative" impact on the reader: being published is by itself a proxy for decent quality, such problems would come up during the review process, and the number of readers who will review in depth all the included references is very low.

Opportunistic citing as the primary mechanism of citing under the social constructivist view

Consequently, there is no cost to including many citations, while it possibly benefits the number of citations received. This is why I propose considering the mentioned process not in the category of altruism but rather of opportunism. Agents are assumed to selfishly implement a "tit for tat" strategy, counting on a positive outcome while not incurring any detrimental effects. One cites more generously and broadly than justified in order to reap benefits in the form of more people noticing one's work and consequently citing it. I believe that this mechanism might exist not only on an individual level but also as collusion between multiple agents who agree to implement this strategy among themselves.

I argue that this claim is well founded and may be illustrated by translating it into current work on the bibliometric search algorithm PageRank. Said tool is based on "Link Analysis Ranking" (Roberts et al., 2005), which counts the number and quality of links of a page in order to estimate its importance. The underlying assumption is relevance propagation, which states that relevant pages are going to link to other relevant pages and receive more links from other websites (Blazek, 2007). PageRank serves in the same fashion as the impact factor for journals or authors. However, Baeza-Yates et al. (2005) observe that collusion positively impacts the PageRank score. They prove "that for a single page there is always something to win by colluding with other pages", although the effect is lower for highly ranked pages. I believe that the same process might be present in the academic community, where a group of authors collude in order to artificially boost the number of citations received by their work through groundless citation of each other's papers.
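The collusion effect can be made concrete with a minimal PageRank power iteration (my own sketch, not from the thesis or from Baeza-Yates et al.): two pages that add reciprocal links to each other raise their own scores relative to the honest baseline.

```python
# Minimal PageRank via power iteration on an adjacency dict.
def pagerank(links, d=0.85, iters=100):
    """links: dict mapping each node to the list of nodes it links to."""
    nodes = list(links)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        new = {v: (1 - d) / n for v in nodes}
        for v, outs in links.items():
            if outs:
                share = d * rank[v] / len(outs)
                for w in outs:
                    new[w] += share
            else:  # dangling node: spread its rank uniformly
                for w in nodes:
                    new[w] += d * rank[v] / n
        rank = new
    return rank

# A small web: pages a-d all point at a popular hub h.
base = {"a": ["h"], "b": ["h"], "c": ["h"], "d": ["h"], "h": []}
# "Collusion": a and b additionally link to each other.
colluding = {**base, "a": ["h", "b"], "b": ["h", "a"]}

r0, r1 = pagerank(base), pagerank(colluding)
# a's and b's scores rise once they cite each other
```

The analogy to opportunistic citing is direct: the reciprocal links cost the colluders nothing, yet move rank toward them.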

To add to the arguments presented by the adversaries of the normative theory of citing, I will argue that there might be another reason for what I call "Opportunistic Citing": referencing other papers not in accordance with their influence on the study at hand but because it might be beneficial for gathering recognition. As the number of papers published per year is growing very fast, it is fairly unrealistic for anyone to have complete information about the whole network (the whole body of work published in the field), which is implicitly assumed by the proponents of the normative point of view (Vazquez, 2001). Consider the logic behind the recursive search model presented by Vazquez (2001), whose main idea is "to be connected to one node of the network and any time we get in contact with a new node we follow its links, exploring in this way part of the network". It would mean that in order to find potential papers to cite, one chooses from the references present in, and the citations to, the papers one is already familiar with. Of course, the occurrence of such a phenomenon would need to be tested, but I believe it is safe to assume that it is present to some extent. This search process would reinforce preferential attachment, which is well proven to exist in citation networks. If we agree that such a phenomenon exists, a multitude of connections to well-renowned authors and papers, as well as to new popular ones, would put a potential paper in a favorable position in the network. Such increased "centrality" (Blazek, 2007), achieved by riding on the back of important papers, would mean a strategic advantage over similar papers located in more peripheral positions.
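A toy simulation loosely following the recursive-search idea (my own construction, not Vazquez's model or the thesis's code) illustrates the point: if readers discover new papers by following the reference lists of papers they already know, then papers that appear in many reference lists are discovered far more often.

```python
import random

def discoveries(refs, trials, seed=0):
    """refs: dict mapping each paper to its reference list.
    A reader starts from a random known paper and 'discovers' every
    paper in its reference list; counts of discoveries are returned."""
    rng = random.Random(seed)
    papers = list(refs)
    found = {p: 0 for p in papers}
    for _ in range(trials):
        start = rng.choice(papers)
        for cited in refs[start]:  # follow the start paper's references
            found[cited] += 1
    return found

# 'hub' is referenced by four papers; 'niche' by only one.
refs = {
    "hub":   [],
    "niche": [],
    "p1": ["hub", "niche"],
    "p2": ["hub"],
    "p3": ["hub"],
    "p4": ["hub"],
}
hits = discoveries(refs, trials=10_000)
# 'hub' is found far more often than 'niche'
```

Under this reading, padding a reference list is a way of placing one's paper on more of the paths readers actually walk.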

This concept may be better understood in the context of the Bianconi-Barabasi fitness model of the evolution of a network (Bianconi and Barabasi, 2001; Albert and Barabasi, 2002). The model is based on the idea of the fitness of a node, which defines how strongly it is going to attract other nodes. I believe that papers with more references will be "fitter" than those of similar quality but with fewer initial links. Thus, they will attract more connections at the expense of other similar papers. We will never have perfect knowledge of a citer's ethics; however, we can assume that they desire to be cited. I believe that by "artificially" increasing (referencing work which has not been used or is of lesser worth than work which is not referenced) or otherwise modifying the number of outgoing links that an academic entity has in the network, one can positively influence its number of ingoing links.
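For reference, the attachment rule of the Bianconi-Barabasi model can be stated as follows (the standard formulation of the model, not quoted from this thesis): the probability Πᵢ that a new node links to node i is

Πᵢ = ηᵢ𝑘ᵢ / ∑ⱼ ηⱼ𝑘ⱼ

where ηᵢ is the fitness of node i and 𝑘ᵢ its current degree. Under the argument above, a more generous reference list would act like a higher ηᵢ, tilting new links toward the paper at the expense of otherwise comparable ones.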

Because of the limitations and scale of this study, I will only attempt to show preliminary grounds for the possibility of such a phenomenon, pushing further the claims presented in Webster et al. (2009). I will argue that since a large part of the variance in included references is unexplained, and since a high number of references in turn positively influences the number of received citations, there is a lot of room for the mechanisms described here to exist. Thus, the second hypothesis is as follows:


Hypothesis 2: The number of references in a paper has a positive effect on the number of citations it gets.

Moreover, it is assumed that a large number of citations will increase the paper's visibility, which in turn will speed up the process of getting citations. This means that the number of references used will have a moderating effect on the rate of received citations. Taking into consideration the preferential attachment mechanism, we can then conclude that a large number of references will have a positive effect on the paper's popularity. Hence, the third hypothesis is as follows:

Hypothesis 3: The number of references in a paper has a positive moderating effect on the pace of getting citations.

IV. Methodology

This section presents a description of the data used in this research along with all the necessary transformations made. Next, all variables used in the models are introduced.

1. Data

The data used here consist of all publications in 12 major physics journals, together with their full citation lists, from 1893 to 2013, gathered by the American Physical Society. The database, in json format, contained 541,448 unique papers along with their information: title, names of the author(s), date (DD/MM/YY), journal, issue, volume, first page of publication, last page of publication, number of pages, affiliation of the paper, affiliations of each author and other database-specific information. An additional dataset with the full reference lists of the papers, comprising 6,040,030 records, was provided in csv format.

Table 1. Basic statistics of the dataset

Journal                                                      #Papers  #Citations  Period     Legend  Label
Physical Review (Series I)                                       360         579  1893-1912  #1      g2
Physical Review                                               44,916     634,981  1913-1969  #2      g1
Reviews Of Modern Physics                                      2,768     158,621  1929-2013  #3      g12
Physical Review Letters                                      106,909   1,965,514  1958-2013  #4      g8
Physical Review A                                             64,032     548,225  1970-2013  #5      g3
Physical Review B                                            158,888   1,510,387  1970-2013  #6      g4
Physical Review C                                             33,865     268,675  1970-2013  #7      g5
Physical Review D                                             68,574     724,340  1970-2013  #8      g6
Physical Review E                                             44,831     220,851  1993-2013  #9      g7
Physical Review Special Topics – Accelerators and Beams        1,789       4,948  1998-2013  #10     g9
Physical Review Special Topics – Physics Education Research      215         605  2005-2013  #11     g10
Physical Review X                                                200         886  2011-2013  #12     g11
Total                                                        527,347   6,038,612

The individual characteristics of each journal are described in Table 1. Additional information can be seen in the visualization of the network presented in Figure 1. First of all, we can clearly see that each journal forms a cluster, which means that papers tend to cite papers from the same journal. This is fairly obvious, as most journals cover specific topics. However, the visualization reveals which topics are most interconnected.

The three journals with the highest impact factor (for 2013) are Reviews Of Modern Physics, Physical Review Letters and Physical Review X, all of which cover all aspects of physics. The first two (white and red, respectively) can be observed all over the network. Physical Review X (dark purple) is not visible on the graph because of its small number of citations.


Moreover, we can see historical influences. Physical Review Series I (dark green) is detached from the main network and is connected with it mainly through its successor, Physical Review (dark blue).

Also, as expected, both special topics journals are separated from the main body of the network. Interestingly, the cluster formed by the Physics Education Research papers (aquamarine) is much bigger than that formed by papers published in Accelerators and Beams (cream), even though the second journal has many more publications. This shows that education research is much more disjointed from the whole body of physics research than that on accelerators and beams.


Figure 1. Visualization of the analyzed citation network

2. Citation Network

In a citation network (in this text a synonym of "graph"), nodes correspond to publications and links (in this text "link" and "edge" are synonyms) to citations between them. A citation network ℕ = (ℙ, 𝑹) is defined on a given set of units ℙ (publications in physics journals) on which a citing relation 𝑹 ⊆ ℙ × ℙ occurs:


𝑖𝑹𝑗 ≡ 𝑖 𝑐𝑖𝑡𝑒𝑠 𝑗

The link e between a pair (i, j) represents a directed connection from node i to node j. In this case i is the "citing" node and j is the "cited" node. The degree centrality of a node is defined by the number of links this node has. In mathematical terms, the degree centrality d(i) of node i is defined as:

𝑑(𝑖) = ∑ⱼ 𝑚ᵢⱼ

where 𝑚ᵢⱼ = 1 when the link between nodes i and j is present, and 𝑚ᵢⱼ = 0 when it is not. The indegree of a node i is the number of directed links pointing to it, i.e. the number of times publication i has been cited. The outdegree of a node i is the number of directed links initiated by it, i.e. the number of references contained in publication i.
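The in- and outdegree definitions above amount to simple counting over the edge list. A minimal sketch (the thesis used R with iGraph; this is an illustrative Python equivalent with made-up paper IDs):

```python
from collections import Counter

# Edge list of (citing, cited) pairs: p1 cites p2 and p3; p2 cites p3.
edges = [("p1", "p2"), ("p1", "p3"), ("p2", "p3")]

outdeg = Counter(citing for citing, _ in edges)  # references each paper makes
indeg = Counter(cited for _, cited in edges)     # citations each paper receives

# p3 has indegree 2 (cited by p1 and p2); p1 has outdegree 2
```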

3. Data transformation

In order to proceed with the analysis, it was necessary to conduct extensive data transformation, which proved very challenging because of the substantial size of the data. First, all the necessary data were extracted from the json file using an Excel macro. An Excel file containing each paper's ID, title and year of publication was created. It was not possible to include the authors' names in that file due to Excel's limitations, so another file was created containing the paper ID and the number of its authors. As the maximum number of rows in Excel is 1,048,576 and its computational power is limited, the following steps could not be conducted in that environment. I decided to carry on using the statistical computing software R and its various social network analysis packages. Three files were imported into the software. The file containing the future node information (paper ID, title and year of publication) was merged with the one containing the information on the number of authors. Next, the network object was created with the iGraph package for R from the csv file containing all the reference information, which became the edges of the network. As the edge file and the node file covered different sets of papers, and not all journals occurring in the reference list were present in the node file, it was not possible to create a dynamic network at this point. Trimming the data was necessary to add the full information about each paper (journal, year of publication, number of authors and pages). In order not to distort the "full" character of the data, which was one of the strong suits of the analysis, I decided to count the in- and outdegrees first and only then delete some of the nodes in order to add the missing information. This way I obtained a data set with the maximum possible number of vertices with full information (the degree counts also included the papers which later had to be excluded from the data set). Creating panel data composed of all papers and each of their references, along with full information, would be theoretically possible. However, it would not only be very time-consuming, but the created file would also be extremely big, with around 12,080,060 rows. Because I had access only to a laptop computer, and working on such a huge dataset requires a lot of computational power, I decided to add time as a regular continuous variable along with the total numbers of indegrees and outdegrees of each paper. The degrees of the papers were counted using the iGraph package in R. All observations with missing values in the number of indegrees were deleted, as those were papers appearing in the json file but not in the reference list.
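The count-first-then-trim logic described above can be sketched as follows (the thesis does this in R with iGraph; this Python sketch with hypothetical IDs only illustrates the order of operations):

```python
from collections import Counter

# (citing, cited) pairs; paper "x" appears in the edge list only.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("x", "a")]
# Metadata exists for a, b, c but not for x.
metadata = {"a": {"year": 1990}, "b": {"year": 1995}, "c": {"year": 2000}}

# Step 1: count degrees on the FULL edge list.
outdeg = Counter(u for u, _ in edges)
indeg = Counter(v for _, v in edges)

# Step 2: only then keep the nodes with complete metadata. Nodes without
# metadata are dropped, but the citations they contributed are preserved
# in the degree counts of the remaining nodes.
nodes = {
    p: {**meta, "out": outdeg[p], "in": indeg[p]}
    for p, meta in metadata.items()
}
# "a" keeps the indegree citation it received from the dropped paper "x"
```

Counting before trimming is what preserves the "full" character of the data that the text emphasizes.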


4. Variables

For the examination of the hypotheses, the variables describing indegree, outdegree, year, number of authors, number of pages and type of journal are used. A further explanation of these variables can be found below.

Dependent variable

lnin: Logarithmically transformed count of how many times a given paper has been cited as of 2013. The number of indegree citations is transformed into logarithms for two reasons. First, a non-linear relationship is expected between the dependent and independent variables because of the Matthew effect resulting from the power-law degree distribution of the network (Wang and Chen, 2003; Hein et al., 2006). Second, the indegree variable is highly positively skewed, and after the transformation it more closely resembles a normal distribution. In order to perform the logarithmic transformation, one was added to the number of indegree citations. Histograms of the variable before and after transformation can be found in the appendix (Fig. 2 and 3, respectively).
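The transformation is simply log(x + 1), which keeps papers with zero citations defined. A minimal sketch (the value 6,291 is the maximum indegree reported in Table 2; everything else is illustrative):

```python
import math

# Add one before taking the log so that papers with zero citations
# (common under a power-law degree distribution) remain defined.
def log_transform(citations):
    return math.log(citations + 1)

print(log_transform(0))               # uncited papers map to 0.0
print(round(log_transform(6291), 2))  # the most-cited paper in the dataset
```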

Independent variables

lnout: Logarithmically transformed number of references included in a paper. The same logarithmic transformation as for the dependent variable has been applied; out denotes the outdegree before the transformation.

year: The year of publication of a given paper. This variable is also logarithmically transformed, as the correlation between it and the dependent variable is observed to be logarithmic. The assumption is that the coefficient of the year variable and its derivatives in the model will show how the number of citations grew across the years. Because of the "rich get richer" effect, it is assumed that papers whose citation counts grew more steeply will be cited more across the years. Before the logarithmic transformation the year was multiplied by -1 and 2014 was added, i.e. the variable was recoded as 2014 minus the year of publication. This created a continuous variable ranging from 121 for papers published in 1893 to 1 for those published in 2013, whose values are significantly smaller and more convenient for analysis.
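The recode described above amounts to computing 2014 minus the year of publication. A minimal sketch:

```python
# Recode publication year as "years before 2014": multiplying by -1
# and adding 2014, as described above, is equivalent to 2014 - year.
def recode_year(year):
    return 2014 - year

print(recode_year(1893), recode_year(2013))  # 121 1
```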

g1, g2, …, g12: Dummy variables which indicate affiliation with one of the twelve journals. Each of these journals is assumed to be of high quality; the differences lie in the types of articles published, the referencing patterns and the sizes of their audiences.

aut: Number of authors of the paper. It is important to control for this variable, as it may explain a large portion of the variance.

pages: Number of pages of the given publication. It is essential to control for this variable, as it may explain a large part of the variance.

V. Results

The following section contains the results of the analysis carried out in this research. First, the descriptive statistics are reported in order to give an overview of the data. Subsequently, the outcomes of the correlation analysis are reported along with a short description of the most important results. Finally, numerous regression analyses are reported that were performed in order to test the various hypotheses.

The initial transformation of the data was carried out in R using the igraph package. Next, the full dataset was imported into Stata 13, where the statistical analysis was performed.


1. Descriptive statistics

Table 2 describes all variables used in the model before their logarithmic transformations, in order to make them more easily interpretable. The dummy variables created for journal affiliation were not included, as their descriptive statistics are reported in Table 1.

Table 2. Descriptive statistics before logarithmic transformations

Variables No. observations Mean SD Min Max

in 527,347 11.45 34.86 0 6,291

out 527,347 11.37 10.45 0 607

year 527,347 1993 17.06 1893 2013

aut 521,586 6.633 55.24 1 3,173

pages 527,347 7.635 6.111 1 1,529

* Dummy variables for journal affiliation have been included in the regression model; for their descriptive statistics see Table 1.

Due to the size of the dataset it was not possible to verify its accuracy completely; however, the suspicious outliers were examined. As expected, the article with the highest number of authors (3,173) was an experiment using the Large Hadron Collider, and the longest article, at 1,529 pages, is the current Review of Particle Physics (published in 2012).

2. Correlation analysis

The correlations of all variables are provided in Table 3. Almost all correlations are significant at the p level of 0.05. We can see that lnout is positively related to lnin (corr. coefficient = 0.15, p < .05), which is preliminary evidence for the hypothesized relation. Although the correlation between these variables is relatively low, its statistical significance is reinforced by the size of the dataset.


Moreover, the effects of all key variables are consistent with the author's assumptions and with relationships established by previous literature. As there is a positive correlation between aut and lnin (corr. coefficient = 0.02, p < .05), we can conclude that papers with more authors tend to receive more citations. The year variable is also positively correlated with the number of obtained citations (corr. coefficient = 0.27, p < .05), which confirms the expected pattern that the number of received citations increases with time. The number of pages of a paper is likewise positively related to its indegree (corr. coefficient = 0.08, p < .05), which tells us that more extensive papers can expect more citations.

Furthermore, we can examine the general effects of affiliation with one of the journals. The results are consistent with what could be expected from the descriptive statistics of the journals. The journal-affiliation dummies positively related to the indegree are those representing Physical Review, Physical Review Letters and Reviews of Modern Physics. The first is the second series of the physical journals analyzed here, so we can assume that it contains many important discoveries, which were highly cited by later work once the journal was divided into multiple titles. The latter two, Physical Review Letters and Reviews of Modern Physics, are the journals with the highest current impact factors (7.728 and 42.860 for 2013, respectively). One other journal included here, Physical Review X (g11), has a relatively high impact factor but is not positively correlated with the indegree. I assume that this outcome results from the low number of publications in it (only 200) and its young age (established in 2011), which may translate into many newly published papers with few citations.


Table 3. Correlations

Variables lnin lnout aut year pages g1 g2 g3 g4 g5 g6 g7 g8 g9 g10 g11 g12

lnin 1.0000
lnout 0.1536* 1.0000
aut 0.0157* 0.0164* 1.0000
year 0.2679* -0.3698* -0.0714* 1.0000
pages 0.0818* 0.3247* 0.0599* -0.1493* 1.0000
g1 0.0405* -0.2032* -0.0260* 0.4343* -0.0666* 1.0000
g2 -0.0229* -0.0612* -0.0023 0.0535* 0.0326* -0.0080* 1.0000
g3 -0.0474* 0.0501* -0.0242* -0.0533* 0.0291* -0.1134* -0.0097* 1.0000
g4 -0.0604* 0.1825* -0.0348* -0.0712* 0.0248* -0.2004* -0.0172* -0.2441* 1.0000
g5 -0.0290* 0.0217* 0.0063* 0.0142* 0.0464* -0.0799* -0.0068* -0.0974* -0.1720* 1.0000
g6 -0.0219* 0.0701* 0.0477* -0.0723* 0.1794* -0.1180* -0.0101* -0.1437* -0.2539* -0.1013* 1.0000
g7 -0.1391* -0.0411* -0.0206* -0.1668* 0.0617* -0.0930* -0.0080* -0.1133* -0.2002* -0.0798* -0.1178* 1.0000
g8 0.2096* -0.1449* 0.0497* -0.0107* -0.2798* -0.1539* -0.0132* -0.1875* -0.3311* -0.1321* -0.1950* -0.1537* 1.0000
g9 -0.0425* -0.0530* -0.0009 -0.0560* 0.0352* -0.0178* -0.0015 -0.0217* -0.0383* -0.0153* -0.0226* -0.0178* -0.0294* 1.0000
g10 -0.0142* -0.0284* -0.0014 -0.0297* 0.0186* -0.0062* -0.0005 -0.0075* -0.0133* -0.0053* -0.0078* -0.0062* -0.0102* -0.0012 1.0000
g11 -0.0118* 0.0131* -0.0005 -0.0420* 0.0139* -0.0059* -0.0005 -0.0072* -0.0128* -0.0051* -0.0075* -0.0059* -0.0098* -0.0011 -0.0004 1.0000
g12 0.0625* 0.0337* -0.0060* 0.0523* 0.2456* -0.0222* -0.0019 -0.0270* -0.0477* -0.0190* -0.0281* -0.0221* -0.0366* -0.0042* -0.0015 -0.0014 1.0000


3. Regression analysis of the outdegree

Having shown that there is a correlation between the two main variables, I want to check whether part of the variance of outdegree remains undescribed by the typically mentioned characteristics: the type of journal (here almost equivalent to the type of paper, as review papers and letters have their own journals), the number of authors, the number of pages and the number of years since publication. The model shown in Table 4 describes only 29% of the variance of outdegree, which indicates that these factors have moderate power in predicting the number of used references. This provides strong grounds to look into other, non-quality-related factors influencing the number of citations included in a paper. I believe that part of this variance could be explained by the occurrence of collusion activities and opportunistic citing.

As predicted, some dummy variables for journal affiliation have negative coefficients, which means that they have a negative impact on the dependent variable in relation to the omitted journal (g8), which acts as a reference level.

Moreover, the number of years since publication has a negative effect on the outdegree, which is logical, as older papers had fewer works available to cite.

One effect is not in line with the hypothesis. The number of authors was assumed to have a positive impact on the number of contained citations, as each author may want to cover a different area of the topic or may be influenced by different scientists. Here, the negative coefficient of the aut variable is hypothesized to result from outliers with a huge number of authors, which does not translate into the actual number of contained references.


Table 4. Regression analysis of predictors of the outdegree

out Coefficient Robust Std. Error

aut -0.00595*** 0.000559
pages 0.600*** 0.101
year -0.235*** 0.00617
g1 5.901*** 0.451
g2 5.989*** 1.713
g3 0.970** 0.383
g4 2.235*** 0.353
g5 0.676 0.461
g6 0.367 0.603
g7 -3.790*** 0.416
g9 -10.67*** 0.657
g10 -14.90*** 0.852
g11 1.665 1.060
g12 21.43*** 2.723
Constant 10.58*** 0.542
Observations 521,586
R-squared 0.291
*** p<0.01, ** p<0.05, * p<0.1

4. Regression analysis of the indegree

Up to this point we have established that there is substantial ambiguity in the citing patterns of the discussed network, as the standard factors are able to explain only 29% of the variance in the number of references. Given these outcomes, I believe that the constructivists' arguments are warranted.

Next, I want to test the hypothesis that the number of references included indeed has a positive impact on the number of citations received. In order to check the influence of outdegree on the dependent variable, a regression analysis was performed. The results, shown in Table 5, indicate that a 1% increase in the number of references multiplies the number of received citations by 1.01^0.436 ≈ 1.0043, i.e. increases it by roughly 0.4% (Benoit, 2011). Considering that the average outdegree is only 11, this is a substantial effect. All variables are significant and the model describes 23% of the variance.

Table 5. Regression analysis of predictors of indegree

lnin Coefficient Robust Std. Error

lnout 0.436*** 0.00733
aut 0.000360*** 3.25e-05
pages 0.0234*** 0.00331
year 0.473*** 0.00168
g1 -0.929*** 0.0121
g2 -1.996*** 0.0662
g3 -0.808*** 0.0117
g4 -0.828*** 0.00987
g5 -0.883*** 0.0143
g6 -0.793*** 0.0183
g7 -0.914*** 0.0153
g9 -0.816*** 0.0323
g10 -0.409*** 0.0649
g11 -0.674*** 0.0666
g12 -0.669*** 0.0813
Constant 0.0538*** 0.00744
Observations 521,586
R-squared 0.231
*** p<0.01, ** p<0.05, * p<0.1
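The multiplicative interpretation of a log-log coefficient (Benoit, 2011) can be checked with quick arithmetic. Only the coefficient 0.436 is taken from Table 5; everything else is a worked illustration.

```python
# For a log-log model, a k% increase in the regressor multiplies the
# expected outcome by (1 + k/100) ** beta. Coefficient of lnout (Table 5):
beta = 0.436

one_percent = 1.01 ** beta  # effect of 1% more references
doubling = 2.0 ** beta      # effect of doubling the reference list

print(round(one_percent, 4))  # ≈ 1.0043, i.e. about 0.4% more citations
print(round(doubling, 2))     # ≈ 1.35, i.e. about 35% more citations
```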

Next, an interaction between lnout and year was added to the model and a Wald test was performed. The test (F(1, 521569) = 1560.66; Prob > F = 0.0000) rejects the hypothesis that both coefficients are simultaneously equal to zero. The interaction is therefore statistically significant, which allows me to continue with the next section of the analysis.


5. Effect of outdegree on indegree across time

In the last part of the analysis the dataset was divided in two at the median outdegree, which was nine, in order to analyze the moderating effect of outdegree. The idea is to show the difference in the coefficient of the year variable, which indicates different speeds at which papers get cited. It is expected that the subset with the higher outdegree will have a higher coefficient on the year variable.

Table 6. Regression analysis of predictors of indegree for papers with an outdegree number <= 9

lnin Coefficient Robust Std. Error

lnout 0.336*** 0.00566
aut 0.000417*** 4.90e-05
pages 0.0196*** 0.00510
year 0.393*** 0.00386
g1 -0.768*** 0.0145
g2 -1.842*** 0.0810
g3 -0.704*** 0.0165
g4 -0.712*** 0.0135
g5 -0.794*** 0.0167
g6 -0.728*** 0.0215
g7 -0.807*** 0.0206
g9 -0.762*** 0.0398
g10 -0.451*** 0.0696
g11 -0.903*** 0.102
g12 -0.668*** 0.0713
Constant 0.371*** 0.0241
Observations 267,601
R-squared 0.168
*** p<0.01, ** p<0.05, * p<0.1


Table 7. Regression analysis of predictors of indegree for papers with an outdegree number > 9

lnin Coefficient Robust Std. Error

lnout 0.460*** 0.00872
aut 0.000292*** 3.69e-05
pages 0.0252*** 0.00164
year 0.543*** 0.00204
g1 -1.182*** 0.0144
g2 -3.218*** 0.204
g3 -0.962*** 0.00954
g4 -0.999*** 0.00818
g5 -1.022*** 0.0125
g6 -0.923*** 0.0132
g7 -1.134*** 0.0114
g9 -1.256*** 0.0458
g10 -1.576*** 0.174
g11 -0.676*** 0.0775
g12 -0.849*** 0.0591
Constant -0.0243 0.0207
Observations 253,985
R-squared 0.284
*** p<0.01, ** p<0.05, * p<0.1

As expected, the coefficient of the year variable is higher in the subset with the high outdegree number, which indicates that those papers, on average, received their citations faster. When we consider the power-law distribution of the network, we can conclude that those papers have a greater chance of becoming popular: on average, papers with a broader bibliography will be cited more than the others. Considering that a typical paper receives most of its citations two or three years after publication (Stewart, 1983), the speed of attracting the initial citations is crucial.


VI. Discussion and conclusions

Findings

This study set out to discover and attempt to theorize a new mechanism in citation analysis, taking into account the issues pointed out by the social constructivists. The hypotheses were constructed under the assumption that the arguments made by the adversaries of citation analysis as an evaluative tool are valid. Overall, the results of the analysis supported the assumptions made here. The following section describes and discusses the findings of this research. First, implications for citation analysis are presented. Next, it moves to the implications for the proposed mechanism of opportunistic citing. Last, it lists the main limitations of this study and gives directions for future research.

There are three main findings of this study. First, the outcomes of the regression analysis of predictors of outdegree show that the normative view of science is not able to explain citation patterns: the model explains only around 30% of the variance. This demonstrates that citing decisions are highly preferential, which means that individual citations might not carry the same impact. Hence, ideally, citations should not be treated equally but weighted individually.

Second, the regression analysis of predictors of indegree shows that the number of references is a valid predictor of received citations. Theoretically, by increasing the number of references one might gain significantly more citations without changing the quality of a paper. According to the model, a 100% increase in the number of references should increase the number of received citations by roughly 35% (2^0.436 ≈ 1.35), which could be enough to reap the positive effects of preferential attachment. As the average number of references in the dataset was only 11 and the median only 9, I believe it is fairly plausible for someone to double the number of references opportunistically. It can be understood as increasing one's chance of being noticed in the multitude of published work, or as increasing the "fitness" of the paper.

Lastly, it is shown that the number of used references has a positive effect on the pace of being cited, which further supports the arguments above. Theoretically, a paper of the same quality but with more references will be noticed earlier, as it reaches a larger audience; consequently, according to preferential attachment, it will become popular earlier. This is well described by the fitness model discussed earlier: papers with more references are more fit, ergo they gain new connections (citations) faster as the network grows.

Implications for the citation analysis as an evaluative tool

The current study shows that patterns seen in the citation network are inexplicable under the normative perspective, thus providing strong arguments against using citation analysis as an evaluation tool. The results presented here indicate significant differences in the use of citations between scientists. This would mean that citations obtained from different sources should carry different meanings, which is not the case at the moment.

All citation indexes treat each citation the same. The most basic example of that approach being erroneous is citations used to critique a work. Although it has been argued that such cases are fairly rare and thus irrelevant, this well illustrates the issues of citation analysis as an evaluative tool at the individual level. The outcomes of this research further support the need to reconceptualize citation indexing and impact factors so that they incorporate weights for each connection.

I believe that these measures should follow the methods introduced in web ranking algorithms like PageRank, which consider not only the count but also the quality of a connection. The idea is that citations from well-renowned scientists, or from papers published in important journals that cite more narrowly, would be counted with a higher weight. Such initial importance could be assigned by peer judgment. This way, citation indexes and impact factors would be fairer and less likely to be gamed.
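As an illustration of the kind of weighting proposed here, the sketch below implements a minimal power-iteration PageRank on a toy citation graph. The graph, damping factor and function name are hypothetical choices for illustration, not part of the thesis analysis.

```python
# A minimal power-iteration PageRank on a toy citation graph.
def pagerank(links, damping=0.85, iters=100):
    nodes = sorted({n for src, dsts in links.items() for n in [src, *dsts]})
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iters):
        new = {n: (1 - damping) / len(nodes) for n in nodes}
        for src, dsts in links.items():
            if dsts:
                # A paper passes its rank to the papers it cites.
                share = damping * rank[src] / len(dsts)
                for d in dsts:
                    new[d] += share
            else:
                # Dangling paper (no references): redistribute uniformly.
                for n in nodes:
                    new[n] += damping * rank[src] / len(nodes)
        rank = new
    return rank

# "a" and "b" both cite "c"; "c" cites nothing.
ranks = pagerank({"a": ["c"], "b": ["c"], "c": []})
# "c" collects the citations, so it ends up with the highest rank.
```

The point of the weighting is visible even in this toy case: a citation from the highly ranked "c" would be worth more than one from "a" or "b".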

Implications for the opportunistic citing

I believe that this study gives strong preliminary grounds for the existence of mechanisms other than the Matthew effect. As the number of references is statistically significant in predicting the number of received citations, and that number is not fully explained by the theorized factors, this is preliminary evidence of another mechanism at work. The results therefore present a theoretical basis for the concept of opportunistic citing. I believe that looking for such mechanisms is crucial for the entirety of science. The theories and findings presented here are not addressed at the most acclaimed scientists and clearly important discoveries, and do not aim to take away any due credit; the emphasis is on how the entirety of science is created, and on the concern that current reward systems are flawed.

As the social pressure to be cited is very high, people might try to boost their citation count "artificially" by manipulating their reference lists. Moreover, scientists might avoid unpopular topics or niches in fear of not getting any recognition, which in effect might be harmful to science. If we consider all scholars as agents acting for the benefit of science, ideally they should work on the topics where they would add the most value. However, as that value is inadequately reflected in a reward system based on citation counts, they might do otherwise. This is purely speculative, but I hypothesize that the opportunity costs of academics choosing their research topics based on popularity (in terms of citation counts) are extremely high. Most basically, I think that there might be scientists working on popular topics while adding relatively little value to science, even though there are fields where their contribution would be much higher. In my opinion, a flawed recognition system based on an imperfect citation count is to blame.

The citation index might be the best way to recognize the people in the "fat tail" of the citation distribution, but it is unfair to those in the average interval. My findings indirectly show that getting credit sometimes depends on pure luck, as referencing is highly preferential. Therefore, building reward systems on citation indexes is not fair to most. Moreover, it becomes even less fair when people intentionally manipulate their citations in order to benefit from it. This can be understood as the cost of there being no requirement to cite all historical influences, which allows for opportunistic citing.

Limitations

The study faced numerous limitations. Most importantly, the lack of sufficient computational power prevented me from performing even the most basic calculations on the network object; computing any structural properties of the analyzed network proved impossible, which was an effective barrier to further large-scale investigation of the presented theories. Secondly, the limited scale and time constraints prevented any micro-level analysis of a smaller part of the network, which would include qualitative factors like the "worth" of the papers or the importance of their authors. Such an approach would be extremely time-consuming, as all considered papers would have to be evaluated and the problem of multiple authors would have to be addressed. Most papers in the network (ergo in physics) have multiple authors, even as many as 3,173. One possible solution might be to consider only the first author, but as Merton (1968; 1973) has pointed out, high-status scientists are well aware of the Matthew effect and counteract it by, for example, giving first authorship to another coauthor.

Moreover, this study covers only one citation network, so all results are field-specific. Generally, the field of physics is characterized by a rather low level of citations: the standard among physicists is to cite narrowly and to cite more current research. This is why these findings might not generalize to all of science.

Future research

The current study opens multiple avenues for future research on the mechanisms at work in scientific citing. Discovery of such phenomena would not only add to the ongoing debate but also be crucial for various scientific indexes. I believe that further investigation of the topic is crucial for improving the current citation-count-based reward systems in science.

First of all, I believe that the current research should be validated in other fields characterized by a different "referencing culture". Once the most basic assumptions are generalized to science as a whole, one can look further for the mechanisms described here.

I believe that there are various ways in which the arguments presented here could be tested. The best possible avenue for future research would be qualitative studies taking a closer look at the citation decisions of academics during their writing process. Such studies would better illustrate how recognition is granted in practice and test whether the theory of opportunistic citing presented here is justified.

Moreover, allegations of collusion could be tested by bibliographic coupling research combined with a qualitative assessment of the legitimacy of included connections based on the contents of the papers. The idea is to find groups of authors who cite each other heavily and repeatedly over time, and to check whether all the references are justifiable according to the general "good practices" of citing. I think that some of those connections could be fairly unrelated to the topic, and that others would be chosen instead of other, more important work of higher quality. Such studies would provide further insight into the importance of affiliations between scientists for citation decisions. In business, networking is often listed as the most important predictor of success; I wonder to what degree the same mechanism is visible in science.

Lastly, I think that the Bianconi-Barabási fitness model could be successfully used to assess the importance of individual citation preferences for the fitness of a paper. The idea would be to examine papers with different numbers of references which are otherwise fairly identical and, as the network grows, check how effectively they attract future citations.
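Such an experiment could be sketched as a small simulation. The code below implements a toy Bianconi-Barabási-style growth process in Python; the fitness values, network size and seed are illustrative assumptions, not estimates from the data.

```python
import random

# Toy Bianconi-Barabási growth: each new paper attaches to an existing
# one with probability proportional to (degree x fitness).
def grow_network(n_papers, fitness, seed=42):
    rng = random.Random(seed)
    degree = {0: 1, 1: 1}  # two seed papers, linked once
    for new in range(2, n_papers):
        targets = sorted(degree)
        weights = [degree[i] * fitness.get(i, 1.0) for i in targets]
        target = rng.choices(targets, weights=weights)[0]
        degree[target] += 1   # the cited paper gains a citation
        degree[new] = 1       # the new paper enters with its own link
    return degree

# Paper 0 is given a higher fitness (e.g. a broader reference list).
deg = grow_network(500, fitness={0: 3.0, 1: 1.0})
```

With a fitness advantage of this size, the fitter seed paper typically ends the run with a much larger degree than its otherwise identical rival, which is exactly the comparison proposed above.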


References

Albert, R., & Barabási, A. (2002). Statistical mechanics of complex networks. Reviews of Modern Physics, 74(1), 47.

Baeza-Yates, R. A., Castillo, C., & López, V. (2005). PageRank increase under different collusion topologies. AIRWeb, 5, 25-32.

Baldi, S. (1998). Normative versus social constructivist processes in the allocation of citations: A network-analytic model. American Sociological Review, 829-846.

Bayer, A. E., & Folger, J. (1966). Some correlates of a citation measure of productivity in science. Sociology of Education, 381-390.

Benoit, K. (2011). Linear regression models with logarithmic transformations. London School of Economics, London.

Bianconi, G., & Barabási, A. (2001). Competition and multiscaling in evolving networks. EPL (Europhysics Letters), 54(4), 436.

Blazek, R. (2007). Author-statement citation analysis applied as a recommender system to support non-domain-expert academic research. ProQuest.

Borgman, C. L., & Furner, J. (2002). Scholarly communication and bibliometrics.

Borodin, A., Roberts, G. O., Rosenthal, J. S., & Tsaparas, P. (2005). Link analysis ranking: Algorithms, theory, and experiments. ACM Transactions on Internet Technology (TOIT), 5(1), 231-297.

Case, D. O., & Higgins, G. M. (2000). How can we investigate citation behavior? A study of reasons for citing literature in communication. Journal of the American Society for Information Science, 51(7), 635-645.

Cole, S., & Cole, J. R. (1967). Scientific output and recognition: A study in the operation of the reward system in science. American Sociological Review, 377-390.

Garfield, E. (1965). Can citation indexing be automated. Statistical Association Methods for Mechanized Documentation, Symposium Proceedings, 1, 189-192.

Garfield, E. (1996). How can impact factors be improved? BMJ (Clinical Research Ed.).

Garfield, E. (1997). Validation of citation analysis. Journal of the American Society for Information Science, 48(10), 962-962.

Gilbert, G. N. (1977). Referencing as persuasion. Social Studies of Science, 113-122.

Hein, D. O., Schwind, D. M., & König, W. (2006). Scale-free networks. Wirtschaftsinformatik, 48(4), 267-275.

Jeong, H., Néda, Z., & Barabási, A. (2003). Measuring preferential attachment in evolving networks. EPL (Europhysics Letters), 61(4), 567.

Latour, B. (1987). Science in action: How to follow scientists and engineers through society. Harvard University Press.

Lovaglia, M. (1989). Status characteristics of journal articles for editor's decisions and citations. The Society for Social Studies of Science Annual Meeting, 15-18.

MacRoberts, M., & MacRoberts, B. (1996). Problems of citation analysis. Scientometrics, 36(3), 435-444.

MacRoberts, M. H., & MacRoberts, B. R. (1997). Citation content analysis of a botany journal. Journal of the American Society for Information Science, 48(3), 274-275.

Merton, R. K. (1968). The Matthew effect in science. Science, 159(3810), 56-63.

Merton, R. K. (1973). The sociology of science: Theoretical and empirical investigations. University of Chicago Press.

Mulkay, M. J. (1974). Methodology in the sociology of science: Some reflections on the study of radio astronomy. Social Science Information, 13(2), 107-119.

Newman, M. E. (2001). Clustering and preferential attachment in growing networks. Physical Review E, 64(2), 025102.

Peters, H., & van Raan, A. F. (1994). On determinants of citation scores: A case study in chemical engineering. Journal of the American Society for Information Science, 45(1), 39.

Rousseau, R. (1992). Why am I not cited or, why are multi-authored papers more cited than others?

Seglen, P. O. (1994). Causal relationship between article citedness and journal impact. Journal of the American Society for Information Science, 45(1), 1.

Seglen, P. O. (1997). Why the impact factor of journals should not be used for evaluating research. BMJ (Clinical Research Ed.), 314(7079), 498-502.

Seglen, P. O. (1998). Citation rates and journal impact factors are not suitable for evaluation of research. Acta Orthopaedica Scandinavica, 69(3), 224-229.

Stephens, C. (1996). Modelling reciprocal altruism. The British Journal for the Philosophy of Science, 47(4), 533-551.

Stewart, J. A. (1983). Achievement and ascriptive processes in the recognition of scientific articles. Social Forces, 62(1), 166-189.

Trivers, R. L. (1971). The evolution of reciprocal altruism. Quarterly Review of Biology, 35-57.

Vazquez, A. (2001). Disordered networks generated by recursive searches. EPL (Europhysics Letters), 54(4), 430.

Wang, X. F., & Chen, G. (2003). Complex networks: Small-world, scale-free and beyond. IEEE Circuits and Systems Magazine, 3(1), 6-20.

Watts, D. J., & Strogatz, S. H. (1998). Collective dynamics of 'small-world' networks. Nature, 393(6684), 440-442.

Webster, G. D., Jonason, P. K., & Schember, T. O. (2009). Hot topics and popular papers in evolutionary psychology: Analyses of title words and citation counts in Evolution and Human Behavior, 1979–2008. Evolutionary Psychology, 7(3), 147470490900700301.

White, M. D., & Wang, P. (1997). A qualitative study of citing behavior: Contributions, criteria, and metalevel documentation concerns. The Library Quarterly, 122-154.

Yan, E., & Ding, Y. (2012). Scholarly network similarities: How bibliographic coupling networks, citation networks, cocitation networks, topical networks, coauthorship networks, and coword networks relate to each other. Journal of the American Society for Information Science.


Appendix

Figure 2. Histogram of the in variable (indegree before logarithmic transformation)
