Understanding the dynamics of citation networks through preferential attachment models

(1)

MSc Mathematics

Master Thesis

Understanding the dynamics of citation

networks through preferential

attachment models

Author: Supervisor:

Roelof Kuipers

prof. dr. N.V. Litvak

dr. A.V. den Boer

Examination date:

April 5, 2018

(2)

Abstract

Citation networks are used to analyse the evolution of science. But is there a mechanism that can capture the dynamics behind this evolution? In this thesis we introduce the PAMREF model, a directed preferential attachment model with a random number of added edges and bounded fitness. This model explains the growth of complex networks through a combination of the rich-get-richer and fit-get-richer paradigm. Vertices in the network with many connections will have a higher probability of obtaining more connections. At the same time, this probability is also influenced by the fitness value of each vertex. We will give a rigorous prove of some key properties of the PAMREF model and estimate its parameters. First we will derive a maximum likelihood estimator for parameter δ, which determines how strongly the rich-get-richer. After that we will estimate the fitness distribution using two methods: the modified growth method and the modified PAFit method. We will test the quality of these estimators and we will see that an extended version of this model can be used to explain the growth of citation networks.

Title: Understanding the dynamics of citation networks through preferential attachment models

Author: Roelof Kuipers, rnkuipers@hotmail.com, 10220321 Supervisor: prof. dr. N.V. Litvak, dr. A.V. den Boer Second Examiner: prof. dr. M.R.H. Mandjes

Examination date: April 5, 2018

Korteweg-de Vries Institute for Mathematics University of Amsterdam

Science Park 105-107, 1098 XG Amsterdam http://kdvi.uva.nl

Faculty of Electrical Engineering, Mathematics and Computer Science University of Twente

Drienerlolaan 5, 7522 NB Enschede https://www.utwente.nl/en/eemcs

(3)

1. Introduction

The use of citations is of incredible importance to the legitimacy of science. The cita-tions in a scientific article show the body of literature on which the author builds his arguments. Because of this they also serve as a fact-checking tool with which the reader can independently verify if the cited material supports these arguments. The amount of citations an article receives are often seen as an indication of its popularity. While most scientific articles receive little citations, for some this number grows rapidly over the years.

Since the 1960s, the analysis of citations in scientific articles has become a major scientific field of itself. Citation analysis, sometimes also called scientometrics, is used to find out what the impact and quality of scientific articles is, as well as to examine the origin and growth of new research fields [11]. This citation analysis is done on citation networks in which scientific articles are represented as vertices that are linked with directed edges from one vertex to another by citation. In this way citation networks can show the evolution of science as a whole. Nowadays, because of globalization and digitalization, more articles are being published and articles are more easily accessible to anyone. This has made citation networks bigger and citation analysis more useful than ever. However, there are still many unanswered questions. What determines the growth of a citation network? Is it possible to predict which articles will be cited in the future? Is there a mechanism that can capture the dynamics behind the evolution of a citation network?

What makes these questions interesting for a master student in the field of stochastics is that citation networks are an example of real-world complex networks. They are so big and can grow so complex that you can only express their behaviour in probabilistic terms. In Figure 1.1 an example of a citation network of 21 million articles published between 1996 and 2011 is shown colored per discipline [5].

A popular model to explain the growth of complex networks is the preferential attach-ment model (PAM). The PAM explains this growth through the so called rich-get-richer paradigm. This is the idea that at every time step when a new vertex is added to a network it is more likely to connect to already well-connected vertices.

It has been shown that the PAM can be extended to model the growth of citation networks [8]. With this model we can hence capture the dynamics behind the evolution of a citation network. However, it is not known how we can find the right values of parameters for this model, using the data of a specific citation network, such that the growth of the PAM corresponds to the growth of this citation network. In this thesis a start for the estimation of these parameters will be given by looking at a slightly simpler version of the PAM from [8]. For this model we will construct estimators of the parameters based on observing the articles and citations (vertices and edges) in a

(6)

Figure 1.1.: Visual representation of the citation network of published articles between 1996 and 2011 [5], colored per discipline.

citation network.

In the first chapter we will see why we can model citation networks as PAMs. After that we will present our model that we will refer to as the PAMREF model, the directed PAM with a random number of added edges and bounded fitness, and discuss and proof some key properties of this model. In this rest of this thesis we will be busy estimating the parameters which determine the dynamics behind the growth of the model. These parameters are the parameter δ, which expresses how strongly the rich-get-richer, and the fitness values (ηt)t≥1, which represent the quality of each article. Our PAMREF model

has not been discussed much in literature and therefore our proofs and derivations and results are new extensions based on literature for slightly different models.

We will first construct a maximum likelihood estimator for parameter δ and after that give and compare two methods with which we can estimate the fitness values. Finally, we will conclude how well we can estimate the parameters in our PAM and what this implies for the understanding of the dynamics behind the growth of citation networks.

(7)

2. Modelling citation networks as a

PAM

Before we will see how we can model citation networks as preferential attachment models, let us first discuss how the preferential attachment model works.

The preferential attachment model (PAM), proposed by Barab´asi and Albert in [1], is commonly used to model complex networks. It is a dynamic model as it describes the growth of a complex network and it does so through the sequential addition of vertices. With these dynamics PAMs can explain the occurrence of a power-law degree distribution in real-world networks. This is the observation that the proportion pk of the vertices in

the network having k connections to other vertices scales as an inverse power of k pk∝ k−τ,

for power-law exponent τ . This means that although many vertices will have a low degree (the fraction of vertices with degree 1 is proportional to 1−τ), there exist some vertices with an extremely high degree as pk decays slowly for high k. A power-law

distribution is often plotted in log-log scale. If for some constant C we have pk = Ck−τ,

then log pk = log C − τ log k. This implies that the plot of log k 7→ log pk is close to a

straight line. An example of this is shown in Figure 3.1.

Networks that show a asymptotic power-law degree distribution are also called scale-free. Many real-world networks, such as citation networks, the Internet and the network of Facebook friendships, are scale-free and PAMs offer an explanation for this with the rich-get-richer paradigm. In PAMs at every time step a new vertex is added with one or more edges. These edges are attached to vertices that are already present in the network with a probability proportional to the degree (i.e. the number of connections to other vertices) of these vertices at that time. This favors vertices with a large degree and hence ensures that the rich-get-richer. Note that this implies that vertices who are present in the network the longest will often have the highest degree. Therefore this sometimes also called the old-get-richer.

In general, the probabilities are determined proportional to a non-decreasing affine preferential attachment function (PA function) f : N → R+ of the degree. The most

basic example of a PAM is the Barab´si-Albert model in which this function is f (k) = k. The network starts at t = 1 where we only have one vertex with no edges. Then recursively at each time t = 2, 3, . . . a new vertex is added with one edge that the vertex attaches to one of the t − 1 vertices that are already present in the network. This vertex, say vertex i, has degree Di(t − 1) before vertex t entered the network and will therefore

(8)

entails normalizing by the sum over all the degrees Pt−1

i=1f (Di(t − 1)) which makes the

probabilities depend on the degrees in an affine way. If vertex i has been chosen (by a sample from a multinomial distribution) then Di(t) = Di(t − 1) + 1 and for vertex t we

get Dt(t) = 1, as the edges in the Barab´si-Albert model are undirected. Note that at

each time step, i.e. at each new arriving vertex, we need to update these probabilities as the degrees of the vertices have changed.

It has been shown in [1] that the empirical degree distribution (pk(t))∞_k=1, with pk(t) the

proportion of vertices with degree k at time t, converges to a limiting degree distribution (pk)∞k=1 when t → ∞, which follows a power-law

pk∝ k−3.

Many different variations of this basic PAM have been studied. Most commonly known is the PAM where the PA function is parametrised by a parameter δ. This parameter δ determines how much the probabilities of attachment are determined by the degrees of the vertices. As this addition to the model is so common when we talk about a PAM we will implicitly mention a PAM with PA function parametrised by δ and we will explicitly mention it when we do not have a parameter δ. There are a couple of more interesting extensions to the basic PAM known. Such us the PAMs with directed edges and PAMs where a new vertex t does not connects to the existing t − 1 vertices with one edge, but with a fixed number of m edges, or with a random number of mt edges (according

to some fixed distribution). Also, in case of multiple edges, these edges can be added independently of each other or the probabilities of attachment can intermediately be updated after each edge is connected to a vertex (which increases the degree of that vertex). Furthermore, another interesting variation is the PAM with fitness where at each time step a new vertex i joins the network that has a fitness value ηi, drawn from

some distribution with bounded support. The probability with which later edges will be connected to vertex i is then proportional to both its degree plus δ and its fitness value. This gives rise to a fit-get-richer paradigm in which vertices with a high fitness will end up with a higher degree.

It has been shown that these variations of the basic PAM all have empirical degree distributions that converge to a asymptotic degree distribution which follows a power-law with some power-law exponent, see for example [2], [3], [6], [7] and [15].

Note that in the case of directed edges we have one vertex from which the edge orig-inates and another vertex to which the edge is connected. This gives each vertex an in-degree, which is the number of edges from other vertices to that vertex, and an out-degree, which is the number of edges from the vertex to the other vertices. Citation networks have directed edges, because citations are directed (a new article cites older articles, but does older articles do not also cite the new article). From this point onward, when we talk about the degree of a vertex in a directed PAM we will actually talk about its in-degree. The out-degrees in citation networks are not that interesting as they only tell the number of edges that each new vertex brings to the network, while the behaviour of the in-degrees can be rather complex. A consequence of this is that when a new vertex enters the network and is connected to another vertex by a directed edge, the (in-)degree

(9)

of the receiving vertex will increase by 1, while the (in-)degree of the vertex from which the edges originates will remain 0.

2.1. PAM for citation networks

In [8], Garavaglia, van der Hofstad and Woeginger have investigated the possibilities of modelling citation networks using a modified PAM in order to explain the dynamics behind their growth. They performed empirical research using data from the Web of Science and found that real-world citation networks contain four main characteristics:

1. The number of published scientific articles grows exponentially in time. This im-plies that when modelling citation networks, the inter-arrival time between vertices (articles) in the network decays exponentially.

2. Citation networks have empirical power-law degree distributions, where the degree is the number of times an articles is cited by other articles

3. The attractiveness of an article to receive new citations depends on its past number of citations (its degree) through an affine function. This corresponds to the affine PA function f (k) ∝ k + δ, for some parameter δ, as used in the PAM.

4. The majority of scientific article stops receiving citations after some time, while a few keep being cited for longer times. This implies that there is an aging effect which favors newer articles over older to receive citations in citation networks. Garavaglia et al. showed that in fact the age of articles in citation networks is log-normal distributed. When modelling citation networks this implies that the probability that a new edge will be connected to a vertex is not only proportional to (the affine function of) its degree but also to the (log-normal) probability of the time that the vertex is present in the network.

These properties of citation networks suggest that they can be modelled as a PAM where the attachment probability is proportional to the degree, parameter δ and the age of each vertex. Furthermore, the edges are directed, the number of edges frpm each vertex (the number of articles it cites) is random and where the number of vertices that arrives at the network increases exponentially.1 However, when modelling such a network as a PAM they proved that the addition of aging to the model destroys the power-law degree distribution which is a crucial property of PAMs and a characteristic of citation

1

Note that in ordinary PAMs we look at discrete time. Each time step represent the addition of one vertex. However with exponential growth we need to look at continuous time as we have exponen-tially decaying inter-arrival times. This also means that the age of the vertices, which needs to be determined at each new arriving vertex, is measured in continuous time. Garavaglia et al. have shown that PAMs in continuous time can be represented as continuous time branching processes (CTBP). However we will be working with a simplified version of their model in which we do not use aging and look at discrete time, i.e. through the addition of vertices to the network. Therefore we will just speak of a PAM instead of a CTBP.

(10)

networks. Because of this inconsistency, they suggested the existence of an additional characteristic in citation networks:

5. Every article has some fitness value which expresses the quality of the paper and which also influence its attachment probability (together with the degree, param-eter δ and its age). This fitness value can be based on many properties such as, for example, the name of the author of the article, the journal in which it is pub-lished and the intrinsic quality of the article. Although it is hard (maybe even impossible) to quantify the quality of an article and to determine the exact fitness value, we can analyse the way in which the fitness value of an article influences its probability of receiving citations. If we know how the fitness values are distributed, then (analogously to the aging effect) this implies how articles with a higher fitness will be favored over articles that are less fit.

This suggestion of the existence of fitness in the model is not strange and PAMs with fitness have been well studied in order to model real-world networks, see [2], [4], [9], [13] and [14]. The main result of Garavaglia et al. in [8] is that when the fitness values have an exponential distribution, then the directed PAM with a random number of added edges, fitness and aging has a power-law degree distribution.

Because citation networks also have a power-law degree distribution it seems like this fitness that is exponentially distributed is also present in citation networks and that this PAM is suitable to model the growth of citation networks. However, in order to correctly model a citation network, we need to find the right value of the parameters in our model such that the growth of the model corresponds to the specific growth of citation network. What is the value of parameter δ? What is the value of parameter λ which determines the shape of the fitness distribution and hence how the fitness is distributed? What are the parameters of the log-normal distribution that determine the influence of aging on the growth of the network?

If we have the data set of a real-life citation network we want to be able to extract these parameters from the data. This data set consists simply of the number of vertices (articles), the moment when each vertex arrived at the network and to which other articles their edges (citations) are connected. In other words, we observe the growth of a citation network, but we do not observe the parameters that define this growth. We will give a start on the estimation of the parameters in the PAM from [8] by looking at slightly simpler model and construct estimators for the values of the parameters of that model. This model can later be extended to the model from [8]. In this thesis we will work with the directed PAM with parameter δ, a random number of added edges and bounded fitness. In this model we will not include the aging and therefore it is also not needed to look at the (exponential) inter-arrival times of the vertices. We can simply work in discrete time. We will also assume that the random number of added edges mtat

each time step t is drawn independent and identically distributed (IID) from some known distribution with finite mean and that mtdoes not need to be estimated. This, because

when we model the growth of citation networks we can simply observe mtby looking at

(11)

see that in the model without aging we must have a fitness distribution with bounded support in order to have a power-law degree distribution. Therefore we cannot use a exponential fitness distribution, but we will work with a truncated Gamma distribution instead.

In the next section we will first introduce the directed PAM with a random number of edges. After that we will extend this to the model where we add (bounded) fitness. We will refer to this model, the directed PAM with a random number of added edges and bounded fitness, as the PAMREF model will and it will be the main model that we will work with in this thesis. We shall first discuss and prove some key properties of the PAMREF model and then construct estimators for the parameters δ and the fitness values (ηt)t≥1 of the vertices that define the growth of this model.

(12)

3. The PAMREF model

Before we look at the directed PAM with a random number of added edges and bounded fitness, that we will often refer to as the PAMREF model, we will introduce the directed PAM with a random number of edges but without fitness (so this could also be called the PAMRE model). We will extend this model to the PAMREF model by giving each vertex a fitness value, drawn from some bounded fitness distribution. After we have done this we will show in this chapter that the network generated by a PAMREF model is scale-free, i.e. has an empirical degree distribution that converges to an asymptotic degree distribution which follows a power-law. This proof will be a rigorous version of the proof in [10] that is derived using mean value analysis for a model where every vertex adds one edge (instead of a random number of mt). After we have shown this we will

end the chapter by giving a heuristic derivation of the expected degree of vertices in the PAMREF model. This will be an extension of the derivation for the model without fitness from [15]. We will use this later to estimate the fitness values in our model.

3.1. PAM with a random number of added edges

We first present the directed preferential attachment model with a random number of added edges. Suppose (mt)t≥2is a sequence of IID samples from a positive integer-valued

random variable with finite mean. The preferential attachment model gives a network sequence that we denote by (PAt(δ))∞t=1, depending on parameter δ > 0, where for every

t the network PAt(δ) at time t is represented by a set Vt = {1, . . . , t} of t vertices and

Pt

j=2mj edges. We start with the network PA1(δ) that consists of vertex 1 with no

edges. Now, when vertex 2 enters the network it has m2 (directed) edges and can only

attach all his m2 (directed) edges to vertex 1. Therefore the network PA2(δ) consists of

two vertices V2 = {1, 2} where D1(2) = m2 and D2(2) = 0. Here Di(t) is the degree of

vertex i at time t. Now for t ≥ 3, given PAt−1(δ), a new vertex t is added with mt edges

connecting t to Vt−1. To which vertices in Vt−1 these edges are connected is decided

independently for each of the mtedges. The conditional probability that the j-th of the

mt edges is connected to vertex i ∈ Vt−1 is

P t−→ i|PAj t−1(δ) =

Di(t − 1) + δ

Pt−1

r=0(Dr(t − 1) + δ)

, 0 ≤ i ≤ t − 1. (3.1) The probabilities are proportional to the degrees of the vertices in Vt−1 plus the value of

parameter δ. The parameter δ in our model determines how much the degree influences the probability of attracting new edges. A low value of δ makes the probabilities depend strongly on the degrees, while a high value makes the probabilities more uniform. Note

(13)

that for each edge j, with 1 ≤ j ≤ mt, the probability the probability that a vertex i

will be chosen is the same as the vertices are added independently of each other. After all of the mt edges from vertex t have been added, the network given by PAt(δ) has

been constructed from PAt−1(δ). For the mt vertices in Vt−1 to which the mt edges of

vertex t have been connected we have Di(t) = Di(t − 1) + 1, while for vertex t we have

Dt(t) = 0 as our model is directed. As for each new vertex t we have that Dt(t) = 0 we

must have that δ > 0, otherwise new vertices will never attract any edges.

We will abbreviate the total sum in the denominator in (3.1) by St−1(δ) :=Pt−1_r=1(Dr(t−

1) + δ). If we then denote the total number of added edges up until time t by Mt =

Pt

j=2mj, we can write the denominator in (3.1) as St−1(δ) = Pt−1r=1(Dr(t − 1) + δ) =

Mt−1+ (t − 1)δ, because the total sum of the degrees is equal to total the number of

added edges.

It has been shown in [10] and [12] that the empirical degree distribution of a directed PAM with a random number of added edges converges to an asymptotic degree distribu-tion which follows a power-law. We denote the empirical degree distribudistribu-tion (pk(t))∞k=0

in the model at time t, PAt(δ), for k ≥ 0 by

pk(t) = 1 t t X i=1 1{Di(t)=k}= Nk(t) t . (3.2)

Here Nk(t) is the number of vertices with degree k at time t. Note that we have also

have a probability p0(t) of having degree 0 as we have a directed network. If the mean

number of added edges E[m2] = µ < ∞ is finite (which we assume), then pk(t) → pk, as

t → ∞, where the limiting degree distribution (pk)∞k=0 is, for k ≥ 0, proportional to

pk∝ k−(2+δ/µ).

We verified this power-law behaviour of the model by generating a network with n = 10000, δ = 3 and mt= µ = 2 fixed for each vertex. After that we compared the empirical

degree distribution to the power-law with exponent τ = 2 + 3/2. The results are shown in a log-log plot in Figure 3.1. We see that the empirical degree distribution resembles the power-law quite well as it follows the straight line as discussed in Chapter 2.

Although we will show all are derivations in this thesis for a random number of added edges mt, in the examples we will fix this number mt= µ for some integer µ. We will do

this because every time a vertex t enters the network we can simply observe its random number of connections with that it will attach to other vertices, mt. Therefore these

random numbers are known and for convenience we can also simply work with a fixed number in our examples. However all our derivations will be done for a random number of added edges mt. Note that we will often work with the mean number of added edges

E[m2] = µ < ∞. As we assume that we can observe mt, we also assume this mean to be

known (which is common in PAMs with random edges).

(14)

10−4 10−3 10−2 10−1 100 100 100.5 101 101.5 102 Degree Probability

Figure 3.1.: Log-log plot of the degree distribution of a directed PAM with n = 10000, δ = 3 and mt= 2 and with power-law exponent τ = 2 + 1/µ = 3.5 in black.

3.2. The PAMREF model

The directed PAM with a random number of added edges and bounded fitness, the PAMREF model, is almost the same as the model without fitness. The difference is that the probability that a new edge is added to a vertex j does not only depend on its degree k + δ plus parameter δ, but also on the fitness ηj of vertex j. We will first discuss the

model, and then show that the model has an asymptotic degree distribution that follows a power-law.

In the PAMREF model, we have a sequence (ηt)t≥1 of IID samples from a fitness

distribution with bounded support [0, ηmax]. We will abbreviate this sequence by η :=

(ηt)t≥1. Each vertex t has a fitness value ηt which represents the fitness, i.e. quality, of

the vertex and influences its probability to attract new edges together with its degree plus δ. The preferential attachment model gives a network sequence up to time n that we therefore denote by (PAt(δ, η))∞t=1, where for every t the network PAt(δ, η) at time t

is represented by a set Vt= {1, . . . , t} of t vertices andPtj=2mj edges.

For t ≥ 2, given PAt−1(δ, η), a new vertex is added with mtedges connecting vertex t

to Vt−1. The conditional probability that the j-th of the mtedges is connected to vertex

i ∈ Vt−1 is P t−→ i|PAj t−1(δ, η) = ηi(Di(t − 1) + δ) Pt−1 r=1ηr(Dr(t − 1) + δ) , 0 ≤ i ≤ t − 1. (3.3) We will abbreviate the denominator by St−1(δ, η) := Pt−1_r=1 ηr(Dr(t − 1) + δ). A

(15)

St(δ, η) (recall that for the model without fitness we had St(δ) = Mt+ tδ), because each

fitness value ηr influences the value of the sum St(δ, η) as it scales Dr(t − 1) + δ. We

will see that this random sum will not make it harder to estimate parameter δ (as we then assume the fitness values to be known), but does make it much harder to estimate the fitness values η = (ηt)t≥1.

Before we will show the model is scale-free we will look at how adding fitness to a PAM can change the whole structure of the model.

3.3. Influence of fitness on the PAM

When simulating the model with fitness, we see that the fitness of the vertices has a huge influence on the evolution of the network. In this section we will first look at how this becomes visible when we look at a small graph for a network with and without fitness. After that we will look at how different types of fitness distributions have a different influence on the structure of the network.

As we have seen in the previous chapter, the PAM is known for its rich-get-richer mechanism. In the model without fitness, we also call this the old-get-richer mechanism. Vertices who are longer present in the network have more time to attract edges and hence get richer. In the PAM with fitness, such as our PAMREF model, we see that this old-get-richer mechanism no longer fully holds. In addition to the mechanism, we get the fit-get-richer mechanism, where vertices with a high fitness value also get a higher degree. To see this difference, we simulated a small PAM n = 25 vertices and with a random number of added edges (mt)25t=2 twice. In one simulation we gave each vertex a specific

fitness value, visualized in appendix A.1, while in the other simulation the vertices had no fitness, visualized in appendix A.2. As one can see the vertices with the highest degrees in the model without fitness are vertex 1 with degree 48, vertex 2 with degree 17 and vertex 5 with degree 8. While in the model with fitness the vertices with the highest degree are the first vertex 1 with degree 30 and vertex 8, that has the highest fitness value, with degree 33.

We see that the structure of the PAM with fitness is rather different and more complex then the PAM without fitness. The specific effect that the fitness distribution has on the structure of a PAM has been extensively studied, see [4], [9], [10], [13] and [14]. In [4], Borgs et al. have given an overview of the effect of fitness to structure of the PAM. They have shown that depending on the distribution from which the fitness values are sampled we can have three different types of PAMs with fitness:

1. the old-get-richer network, where all vertices have equal fitness; this is the same as the PAM without fitness;

2. the fit-get-richer network, where vertices with a high fitness will grow faster then those with a low fitness and thus get a higher degree;

3. the innovation-pays-off network, where almost all edges will be connected to a small group of vertices that have the highest fitness values; vertices who are inno-vative (who have a very high fitness) are being rewarded.

(16)

It is important to keep in mind that in all the types of PAMs with fitness the probability of attachment to a vertex still also depends on the degree Di(t) + δ of the vertex plus

δ. Which type of network we observe depends on structure of the fitness distribution. In [4] a method is given to mathematically determine the type of network based on the probability density function of the fitness distribution. However, a heuristic explanation suffices to show the occurrence of the fit-get-richer and the innovation-pays-off network based on the shape of the fitness distribution function (note that the old-get-richer network simply is the rich-get-richer as described in Chapter 2).

Our goal will finally be to estimate the fitness distribution of a (directed) PAM with fitness (and a random number of added edges). We shall later try to estimate the fitness distribution for each of the three types of networks although we focus less on the old-get-richer networks. As we shall see in the next section, the only condition we need to make is that the fitness distribution has bounded support.

In the rest of this thesis we shall work with truncated, i.e. bounded, Gamma distri-butions. These distributions can have many different shapes. In addition, recall that it is was shown in [8] that a true citation network has an exponential fitness distribution which is simply an instance of a Gamma distribution. Note that we are looking at PAMs with just fitness while a true citation network can be modelled as a PAM with fitness and aging. Because of this aging the condition of having bounded fitness does not longer needs to hold. As we do not have this condition we shall work with truncated Gamma distributions. 0 1 2 3 4 5 0.0 1.0 2.0 3.0 Value Density Γ5(3, 12) Γ5(5, 5) Γ5(12, 3)

Figure 3.2.: Density plots of a Γ5(3, 12) innovation-pays-off distribution, a Γ5(12, 3) fit-get-richer

distribution and a Γ5(5, 5) distribution.

We denote a truncated Gamma distribution with shape and rate parameter (α, β) and upper bound 5 as Γ5(α, β). In Figure 3.2 a density plot for a Γ5(3, 12), a Γ5(5, 5)

(17)

with a innovation-pays-off network. This is because most vertices have a low fitness value, as you can see in Figure 3.2, while just a few vertices have a fitness value that is much higher than the rest. When these vertices have entered the network they will dominate the other vertices and obtain almost all the edges of the new vertices. On the contrary, with a Γ5(12, 3) fitness distribution we end up with a fit-get-richer network.

Most vertices have a higher fitness value, while a couple have low fitness value, see Figure 3.2. These vertices will obtain less edges and will have a low degree. As most of the vertices have a high fitness value, they will get richer, but there is not a group of vertices dominating the others as in the Γ5(3, 12) distribution. What is interesting

is that for a Γ5(5, 5) distribution it is less clear what type of network we have. There

are many vertices with a much higher fitness value then the rest and at the same time a couple with a lower fitness value then the average. We expected that the influence of this fitness distribution on the network results in a network that is somewhat in between the innovation-pays-off network and the fit-get-richer network. To show what influence the

10−4 10−3 10−2 10−1 100 100 ₁₀1 ₁₀2 ₁₀3 ₁₀4 Degree Probability 10−4 10−3 10−2 10−1 100 100 ₁₀1 ₁₀2 ₁₀3 ₁₀4 Degree Probability

Figure 3.3.: Loglog plot of the degree distribution of a network with Γ5(3, 12)

innovation-pays-off fitness distribution left and of a network with a Γ5(12, 3) fit-get-richer fitness

distribution right.

different fitness distributions have on the degree distribution in our network we generated a directed PAM with a Γ5(3, 12) innovation-pays-off fitness and a PAM with a Γ5(12, 3)

fit-get-richer distribution with n = 10000, δ = 1 and mt = m = 5. We clearly see

the different networks when comparing their degree distribution. In Figure 3.3 a loglog plots of both degree distributions is shown, where we see that in the innovation-pays-off network there is a small fraction of the vertices with a much higher degree than the rest, while in the fit-get-richer network there are quite a few vertices with a degree that is a bit higher than the rest.

Although Figure 3.3 suggests that the asymptotic degree distribution will show a power-law (as an inverse function of the degree and therefore as a decreasing function), we

(18)

have not proven this and we also have not shown what influence the fitness distribution has on the power-law exponent.

3.4. The PAMREF model is scale-free

Showing that the asymptotic degree distribution of the vertices in our PAMREF model, the directed PAM with a random number of independently added edges and fitness, follows a power-law, is a bit more work than for the model without fitness. As the fitness values are random samples from some distribution, they scale the numerator and denominator in equation (3.3) in a random way (recall the difference in Section 3.2 between St(δ) and St(δ, η)).

In order to prove the occurrence of a power-law we shall first show that the limiting degree distribution (pk,η)k≥0 of vertices with fitness value η follows a power-law,

de-pending on the fitness value η and some constant Cη > 0 which is determined using the

fitness distribution ρ(η). We will see what this constant Cη exactly is and this will show

us why our fitness distribution must have bounded support. After this we will derive the power-law exponent for the full limiting degree distribution (pk)k≥0 for our model.

After we have shown that the PAMREF model is scale-free, we will use this to give an heuristic derivation of the expected degree of a vertex in our model. This heursistic derivation will be an extension of the method for the PAM without fitness in [15] and we will later use this expression to estimate the fitness values.

We will first introduce some notation. In equation (3.3) we wrote Di for the degree

of vertex i, where vertex i had fitness value ηi. In this section we shall write Di,η

to emphasize that we look at the degree of vertex i with fitness value η, where we write η instead of ηi as we will later need to integrate over all possible fitness values η.

Furthermore, we will denote the total sum as S_tη := St(δ, η), omitting the dependency

on δ and still adding η to emphasize that we have fitness, for convenience. Note that by combining these two notations we write S_tη =Pt

i=1ηi(Di,ηi+ δ). Also, we will denote

the density function of the (bounded) fitness distribution by ρ(η).

We will start with showing that (pk,η)k≥0 follows a power-law. So far we have not

yet found a proof of the asymptotic degree distribution for vertices with fitness value η in a PAM that is both directed, has a random number of added edges and has fitness. Therefore we believe that our proof that we base on the general proofs from [10] (which make use of a mean value approach and one added edge) and [6] (which shows the proof for the model with a random number of added edges but no fitness) is new. For some parts of our proof we shall omit the full details and refer to [10] or [6]. We write Nk,η(t)

for the number of vertices with fitness value η and degree k at time t and denote the empirical degree distribution of vertices with fitness η in PAt(δ, η), for k ≥ 0 by

pk,η(t) =

Nk,η(t)

t .

(19)

dis-tribution of vertices with fitness value η equals, for k ≥ 0, pk,η = ρ(η)Cη µη Γ(δ + Cη/(µη) + 1)Γ(k + δ) Γ(δ)Γ(k + δ + Cη/(µη) + 1) .

Although we can prove what the limiting degree distribution is, we do not know the rate of convergence with which pk,η(t) → pk,η, which remains to be proven in further

research.

Theorem 1. The asymptotic degree distribution (pk,η)k≥0 for the vertices with fitness

value η in a directed PAM with a random number of independently added edges and bounded fitness follows a power-law. The limiting degree sequence is distributed as

pk,η∼ k−(1+1/β(η)), (3.4)

where β(η) := (µη)/Cη. Here E[m2] = µ < ∞ and Cη is a constant which is the solution

to the equation 1 = Z ηmax 0 ρ(η) δ/µ Cη/(µη) − 1 dη, (3.5)

with ρ(η) the fitness density function with bounded support [0, ηmax].

Proof. The total sum S_tη =Pt

i=1ηi(Di,ηi+ δ) at time t can be written as

Sη_t = Z ηmax 0 t X k=0 η(k + δ)Nk,η(t) dη = Z ηmax 0 t X k=0 η(k + δ)pk,η(t)t dη = Cη(t)t, (3.6) for Cη(t) = Z ηmax 0 t X k=0 η(k + δ)pk,η(t) dη. (3.7)

In this way we express the total sum by looking at Nk,η(t), the number of vertices with

degree k and fitness value η, for all possible fitness values η and all possible degrees k. Now, the expected value of Nk,η(t + 1) given PAt−1(δ, η) equals

E[Nk,η(t + 1)|PAt(δ, η), mt+1] = Nk,η(t) + E[Nk,η(t + 1) − Nk,η(t)|PAt(δ, η), mt+1].

(3.8) Using this and the different notation of S_tη, we see that the expected difference between Nk,η(t + 1) and Nk,η(t) equals E[Nk,η(t + 1) − Nk,η(t)|PAt(δ, η), mt+1] = Nk−1,η(t) mt+1η(k − 1 + δ) Cη(t)t − Nk,η(t) mt+1η(k + δ) Cη(t)t + ρ(η)1{k=0} , (3.9)

(20)

where we used equation (3.1) to express the probability that a new edge was connected to vertex of degree k − 1 or degree k. The first term on the right hand side of the equation is the expected number of vertices with fitness value η that had degree k − 1 and time t and have degree k at time t + 1 when the new vertex t + 1 adds mt+1 edges

to the network. The second term is the expected number of vertices with fitness value η that had degree k at time t and have degree k + 1 at time t + 1. And if we look at the expected difference in number vertices with degree k = 0 and fitness η, then we gain one of such vertices if the newly added vertex t + 1 that has degree 0 (because of the directed edges) also has fitness value η. This happens with probability ρ(η). Note that as mentioned in [10] and [6] there is an additional probability that two or more edges will be connected to the same vertex, but this probability will quickly converge to 0.

Now, in Section 3.5.1 we will proof in Lemma 1 that S_tη = E[Stη](1 + oP(1)) and in

Lemma 2 that S_tη is concentrated around its mean. As we simply rewrote S_tη as S_tη = Cη(t)t, this will also hold for Cη(t)t. So first we use that Cη(t)t = E[Cη(t)](1 + oP(1))t.

If we also combine equation (3.9) and (3.8), take the expectation on both sides and use that E[mt] = µ, then we can write the expected value of Nk,η(t + 1) as

E[Nk,η(t + 1)] = E[Nk,η(t)] + µη E[Cη(t)](1 + oP(1))) 1 t E[Nk−1,η(t)](k − 1 + δ) − E[Nk,η(t)](k + δ) aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa+ ρ(η)1{k=0}. (3.10) Analogous to the statements in [3] and [6], which is also confirmed for the model with fitness in [10], we see that for t → ∞ we have that E[Nk(t)] − E[Nk,η(t − 1)] → pk,η and

that E[pk,η(t)] = E[Nk,η(t)]/t → pk,η. We then also have

E[Cη(t)] = Z ηmax 0 t X k=0 η(k + δ)E[pk,η(t)] dη → Z ηmax 0 X k≥0 η(k + δ)pk,η dη, (3.11) which we define as Cη := Z ηmax 0 X k≥0 η(k + δ)pk,η dη. (3.12)

Note that we will use that Cη(t) is concentrated around its mean, as discussed, and that E[Cη(t)] → Cη < ∞ as otherwise we would have one vertex which attracts a positive

fraction of all the edges. This could only happen in a PAM with unbounded fitness in which one vertex could (in theory) have infinite fitness.

Now, for equation (3.10), these statements imply that pk,η= µη Cη pk−1,η(k − 1 + δ) − pk,η(k + δ) + ρ(η)1{k=0} (3.13)

(21)

We now need to find p0,η, the limiting probability of that a vertex with fitness value η

has degree 0. Rewriting equation (3.13) gives pk,η k + δ + Cη µη = pk−1,η(k − 1 + δ) + Cη µηρ(η)1{k=0}. (3.14) If we set p−1,η = 0 and look at the expression for k = 0, we see that p0,η equals

p0=

ρ(η)Cη/(µη)

δ + Cη/(µη)

. (3.15)

With this expression of p0,η we can write p1,η as

p1,η = p0,η δ 1 + δ + Cη/(µη) , (3.16) and, for k ≥ 1, pk,η= ρ(η)Cη/(µη) δ + Cη/(µη) k Y j=1 j − 1 + δ j + δ + Cη/(µη) . (3.17)

One could check that P

k≥0pk,η = 1. Hence the probabilities (pk,η)k≥0 of having a

specific degree k ≥ 0 and fitness value η define a proper probability distribution. If we use the Gamma-function with the recurrence relation Γ(t + 1) = tΓ(t), then we can write pk,η, for k ≥ 1, as pk,η = ρ(η)Cη µη Γ(δ + Cη/(µη))Γ(k + δ) Γ(δ)Γ(k + δ + Cη/(µη) + 1) . (3.18)

Now, applying Stirling’s formula shows an interesting result. It turns out that we have a specific power-law degree sequence for the vertices with a specific fitness value η. This power-law degree sequence equals, for k ≥ 1

pk,η = cδ,µ(η) · k−(1+1/β(η))(1 + O(1/k)), (3.19)

where cδ,µ(η) = ρ(η)C_µη ηΓ(δ+C_Γ(δ)η/(µη)+1) > 0 and with power-law exponent 1 + 1/β(η) where

we define β(η) = (µη)/Cη.

We will now show that we can find an expression of Cη > 0. This is a specific constant

for each fitness distribution ρ(η) which tells us how much the distribution of the fitness values influences to total sum of the degrees in the network. Recall that when constructed the limiting degree sequence (pk)k≥0 from (3.18) we showed that E[Cη] converged to

Cη = Z ηmax 0 X k≥0 η(k + δ)pk,ηdη. (3.20)

(22)

To find the value of Cη we substitute equation (3.18) in expression (3.20) of Cη and use the equality X k≥1 Γ(k + u) Γ(k + v) = Γ(u + 1) (v − u − 1)Γ(v). This yields Cη = Z ηmax 0 X k≥0 η(k + δ)ρ(η)Cη µη Γ(δ + Cη/(µη))Γ(k + δ) Γ(δ)Γ(k + δ + 1 + Cη/(µη)) dη = Z ηmax 0 ρ(η)Cη µ Γ(δ + Cη/(µη)) Γ(δ) X k≥0 Γ(k + 1 + δ) Γ(k + 1 + δ + Cη/(µη)) dη = Z ηmax 0 ρ(η)Cη µ Γ(δ + Cη/(µη)) Γ(δ) X k≥1 Γ(k + δ) Γ(k + δ + Cη/(µη)) dη = Z ηmax 0 ρ(η)Cη µ Γ(δ + Cη/(µη)) Γ(δ) Γ(δ + 1) (Cη/(µη) − 1)Γ(δ + Cη/(µη)) dη = Z ηmax 0 ρ(η) δCη/µ Cη/(µη) − 1 dη. (3.21)

Hence Cη is the solution to

1 = Z ηmax 0 ρ(η) δ/µ Cη/(µη) − 1 dη, (3.22)

which shows what we wanted to prove.

The function β(η) = (µη)/Cη in power-law exponent 1 + 1/β(η) is often called the

dynamic growth exponent which represents how fast the degree grows as a function of η. We will see in Section 3.5 that the expected degree of a vertex i with fitness η will in fact scale as a power-law, with positive exponent β(η) (note the difference with power-law decay in the degree distribution). It is shown in numerous articles such as [2] and [9], that after the constant Cη is determined, which is based on the fitness distribution ρ(η),

we have that 0 < β(η) < 1 for each η. This is because the degree Di,η(t) of a vertex

cannot grow faster than O(t).

Note that one can mimic the proof of Theorem 1 for the case with fixed η = 1 and will then end up with the power-law distribution for the model without fitness from Section 3.1.

One important consequence of Theorem 1 is that the fitness distribution must have bounded support. If the support of the fitness density function ρ(η) is unbounded then equation (3.5) clearly has no solution for Cη.

Now, using equation (3.5) and Theorem 1 we can also find the power-law exponent for the full limiting degree distribution. Theorem 1 states that the degree distribution (pk,η)k≥0 of the vertices with fitness value η follows a power-law with exponent

γ(η) := 1 + 1

β(η) = 1 + Cη

(23)

where µ = E[m2] and where η ∈ [0, ηmax]. There is some discussion about whether the

asymptotic degree distribution then also follows a power-law for every possible bounded fitness distribution, see [2] and [10]. However, for many bounded fitness distributions it has been shown that the when (pk,η)k≥0 follows a power-law, the full limiting degree

distribution (pk)k≥0 of all the vertices also follows a power-law, see [10],[9] and [2]. This

power-law is dominated by the smallest exponent γ(η) and hence by 1 + βmax where

βmax := maxη∈[0,ηmax]β(η) is the maximal value of β(η). We will see that this also

holds for our PAMREF model. We present the following result without proof, but shall see that the result holds analogous to [9], using our derived power-law exponent from Theorem 1.

The asymptotic degree distribution (pk)k≥0 for the vertices in a directed PAM with a

random number of independently added edges and bounded fitness follows a power-law. The power-law exponent is dominated by minηγ(η), i.e.

pk∼ k− minηγ(η). (3.23)

If the support of fitness distribution ρ(η) is [0, ηmax], then the limiting degree distribution

is proportional to pk∼ Z ηmax 0 ρ(η)k−γ(η) dη = Z ηmax 0 ρ(η)k−(1+Cη/(µη))_dη. _(3.24)

We will demonstrate the result of equation (3.23) for the uniform distribution ρ(η) = 1 for η ∈ [0, 1] as in [9]. We substitute this in equation (3.24) and get

pk ∼ Z 1 0 1 · k−(1+Cη/(µη))_dη = Cη Z ∞ Cη/µ+1

k−y(y − 1)−2 dy (change of variable y = y + 1 + Cη/(µη))

= Cηk−(1+Cη/µ) Z ∞ 1+Cη/µ k−(y−1+Cη/µ)_{(y − 1)}−2 _dy = Cηk−(1+Cη/µ) Z ∞ 0

k−y(y + Cη/µ)−2 dy (change of variable y = y + 1 + Cη/µ)

∼ k

−(Cη/µ+1)

ln(k) .

(3.25) The asymptotic probability of having degree k thus is dominated by the smallest value of γ(η) and hence by 1 + βmax. We shall show in a example that the result is correct.

Example

We take n = 10000, δ = 3 and we take a fixed number of added edges µ = mt = 2.

(24)

ρ(η) = 1 with η ∈ [0, 1] from which we sampled the n fitness values of the vertices when generating our network. Suppose we want to determine the limiting degree distribution for all the vertices in our PAMREF model. We first need to determine the constant Cη

in equation (3.5). The value of Cη is then the solution to

1 = Z 1 0 3/2 Cη/(2η) − 1 dη,

which gives Cη = 2.960. We then have 1/β(η) = 2.960_2η . As the power-law exponent is

dominated by the 1 + βmax and as η ∈ [0, 1], we should have a power-law with exponent

γ = 1+2.960₂ = 2.48. Note that with for power-law exponent we mean that the asymptotic degree distribution shows decay with this exponent (so pk ∼ k−2.48). This power-law

was compared to the empirical degree distribution of the model that we generated in R (with uniform fitness on [0, 1]). As one can see in Figure 3.4 the empirical degree distribution follows a power-law with exponent 2.48. In [9] other examples are shown

10−4 10−3 10−2 10−1 100 100 100.5 101 101.5 102 102.5 Degree Probability

Figure 3.4.: Log-log plot of the degree distribution of the PAM with uniform fitness, n = 10000, δ = 3 and mt= 2 and with exponent 2.48 in black.

confirming equation (3.23) for truncated exponential fitness distributions and two-Dirac delta fitness distributions (note that those examples are in an undirected PAM, so the values of the power-law exponent are different then one would have in our case).

(25)

3.5. Expected degree in the PAMREF model

In this section we will give a heuristic derivation of the expected degree in a PAMREF model. We will later use this to estimate the fitness values of the vertices in our network. First we will show that the total sum S_tη is approximately equal to its expectation E[Stη].

After that we will heuristically derive the expected degree plus δ, E[Di,η(t)+δ], of a vertex

i with fitness value η. Finally, we will give an example to show that this approximation of the expected degree actually is quite accurate.

3.5.1. Replace total sum by the expected total sum

Recall that the total sum of the degrees plus δ with fitness is equal to

S_tη = t X r=1 ηr(Dr,ηr(t) + δ) = Z ηmax 0 t X k=0 η(k + δ)pk,η(t)t dη = Cη(t)t (3.26)

As the fitness values are unknown we do not know the value of S_tη. In fact, we also even do not exactly know E[Stη] = E[Cη(t)]t. However, we have seen in Theorem 1 that

E[Stη]/t = E[Cη(t)] → Cη, for Cη the solution to equation (3.5). So E[Stη] = Cηt(1 + o(1))

and E[Stη] = O(t). In the following two lemmas we shall prove that the total sum S η t

does not differ much from E[Stη] and that we can in fact replace S η

t by E[S η

t] when t is

large enough as we will show that S_tη is concentrated around its mean. Lemma 1. For each > 0

P(|Stη− E[S η

t]| > E[S η

t]) → 0, t → ∞.

Proof. Fix > 0. We have seen that random variable S_tη has finite mean E[Stη] for each

t. Applying Chebyshev’s inequality yields

P(|Stη− E[S η t]| > E[S η t]) ≤ Var(Sη_t) 2_E[Sη t]2 .

Now, we observe that E[Stη]2 = O(t2), that the fitness values are IID and that for i, j

(Di,ηi(t) + δ)ηi and (Dj,ηj(t) + δ)ηj are negatively correlated as the edges that go to

(26)

S_tη =Pt

i=1Yi and we can write

Var(S_tη) = E[(Stη)2] − E[S η t]2 = t X i=1 E[Yi2] + 2 X 1≤i<j≤t E[YiYj] − E[Stη]2 ≤ t X i=1 E[Yi2] + 2 X 1≤i<j≤t

E[Yi]E[Yj] − E[Stη]2

= t X i=1 E[Yi2] + E[S η t] 2₋ t X i=1 E[Yi]2− E[Stη] 2 ≤ t X i=1 E[Yi2] = t X i=1 E[E[Yi2| max_r Yr]] ≤ t X i=1 E[E[Yi· max r Yr| maxr Yr] ≤ E " max r Yr· E " _t X i=1 Yi| max r Yr ## ≤ E[max r Yr] · E[S η t].

The maximum here is taken over r ∈ {1, . . . , t} and if we now again replace Yj by

(Dj,ηj(t) + δ)ηj and insert this result into Chebyshev’s inequality we see when t → ∞

that P(|Stη− E[S η t]| > E[S η t]) ≤ max r∈{1,...,t} E[(Dr,η(t) + δ)ηr] 2 E[Stη] → 0,

where we have convergence as the maximal degree increases with a slower rate than O(t). Otherwise we would have one vertex which attracts a positive fraction of all new edges and this could only happen in the case of unbounded fitness.

Using Lemma 1 we can now show that we are allowed to replace S_tη by E[Stη] when t

is large enough.

Lemma 2. For > 0 and x ∈ R>0

P(Stη > x) → 1{E[Stη]>x/(1+)}, when t → ∞.

Proof. Fix > 0 and consider the event A = {|S_tη − E[Stη]| > E[S η

(27)

S_tη = S_tη1A+ Stη1Ac. By lemma 1 it follows that for each x ∈ R_>0 P(Stη > x) = P(S η t > x, A) + P(S η t > x, Ac) ≤ P(A) + P(S η t > x, Ac) ≤ P(A) + P(E[Stη](1 + ) > x, A c₎ → 0 + 1{E[Sηt]>x/(1+)}

when t → ∞ and at the same time P(Stη > x) ≥ P(S

η

t > x, Ac) → 1{E[Stη]>x/(1+)}, when t → ∞.

From these lemmas we can conclude that S_tη = E[Stη](1 + oP(1)) and that S η

t is

con-centrated around its mean. So for sufficiently large values of t we can replace S_tη by E[Stη].

3.5.2. Expected degree of vertices with fitness

In this section we will heuristically derive the expectation of the degree of vertex i with fitness η at time t plus δ, E[Di,η(t)+δ]. This derivation is based on the proof of Theorem

8.2 in [15] of the basic PAM. What makes the derivation in our case different from the derivation in [15] is that each vertex has a fitness value that influences the value of the degree and that it is drawn IID from a bounded fitness distribution. Our expectation E[Di,η(t) + δ] will be an approximation. We have seen that Stη ≈ E[S

η

t] and that as

E[Stη]/t = E[Cη(t)] → Cη we can for our derivation say that E[Stη] ≈ Cηt with Cη the

solution to equation (3.5). Therefore we will make step of replacing Stη by Cηt in our

heuristic derivation as we will be looking at large values of t. We will see that the expected degree that we derive is still rather accurate. In this section we shall derive that we can approximate the expectation E[Di,ηi(t) + δ] of the degree of a vertex i with

fitness value η at time t plus δ as

E[Di,η(t) + δ] ≈ δ

t i

_Cηµη

, (3.27)

with E[m2] = µ < ∞ the mean number of added edges per vertex.

Fix δ > 0 and assume that we can write S_tη ≈ Cηt, We first compute the expectation

of Di,η(t + 1) + δ given the degree Di,η(t) of vertex i with fitness value η, given Stη and

given mt at time t ≥ i. This equals

E[Di,η(t + 1) + δ|Di,η(t), Stη, mt] = Di,η(t) + δ + E[Di,η(t + 1) − Di,η(t)|Di,η(t), Sηt, mt]

= Di,η(t) + δ + mt

η(Di,η(t) + δ)

S_tη .

(28)

Now, using that S_tη ≈ Cηt, taking the expectation on both sides of equation (3.28) and

using the mt is independent of Di,η(t) gives us

E[Di,η(t) + δ] = E[E[Di,η(t) + δ|Di,η(t − 1), Stη, mt]]

≈ E (Di,η(t − 1) + δ) 1 +mtη Cηt = E [Di,η(t − 1) + δ] 1 + µη Cηt = E [Di,η(t − 1) + δ] Cηt + (µη) Cη . (3.29)

If we repeat this procedure, using that Di,η(i) = 0 for each i (as we have directed edges),

we find

E[Di,η(t) + δ] = E [E[Di,η(t) + δ|Di,η(t − 1), Stη, mt]]

≈ E[Di,η(i) + δ] t−1 Y r=i Cηr + (µη) (Cηr = δ t−1 Y r=i Cηr + (µη) (Cηr . (3.30)

In order to compute the product we will make use of the Gamma-function as it follows the recurrence relation

Γ(t + 1) = tΓ(t). We can now compute that

t−1 Y r=i Cηr + (µη) Cηr = t−1 Y r=i r + µη_C η r = Γ(t +_Cµη η)Γ(i) Γ(i +µη_C η)Γ(t) . (3.31)

Therefore the expectation E[Di,η(t) + δ] of the degree of vertex i with fitness η at time

t plus δ equals E[Di,η(t) + δ] ≈ δ · Γ(t + µη_C η)Γ(i) Γ(i +_Cµη η)Γ(t) . (3.32)

We now have written the expected degree in terms of the Gamma function. Continuing with the method from [15] we can derive a more usable expression by using a consequence of Stirling’s formula: for each t > 0 and a > 0 we have

Γ(t + a) Γ(t) = t

a_{(1 + O(1/t)).}

Applying Stirling’s formula shows us that the expected value E[Di,η(t) + δ] for high

enough values of t is given by

E[Di,η(t) + δ] ≈ δ

t i

_Cηµη

(29)

So equation (3.33) shows that the degree of a vertex i with fitness η approximately scales as power-law with growth exponent β(η) = (µη)/Cη (note the difference between

this scaling as a power-law and the power-law decay that we observe in the asymptotic degree distribution as a function of the degree) with Cη the solution to

1 = Z ηmax 0 ρ(η) δ/µ Cη/(µη) − 1 dη. (3.34)

This corresponds to the results of [2], [4] and [9], showing this result for a undirected PAM with no parameter δ and by making use of first order differential equations with a mean value approach. We will now give an example to show that this approximation of the expected degree in fact is quite accurate.

100.5 101 101.5 102 102.5 103 100 101 102 103 104 Time t Degree + δ

Figure 3.5.: Loglog plot of the growth of the degree plus δ = 3 of vertex 1 in black and the expected growth shown in a red line.

Example

Using the same setting as in the example from Section 3.4, we take n = 10000, δ = 3 and µ = mt = 2 and for each vertex sample its fitness value from the uniform fitness

distribution ρ(η) = 1 with η ∈ [0, 1]. Suppose we want to verify if the expected degree of vertex 1 really has scaled as the power-law derived in equation (3.33) and if the approx-imation of Cη is approximately correct. The randomly sampled uniform fitness value of

(30)

this vertex in our network was η1 = 0.90 (which is very high). We have determined the

value of constant Cη using equation (3.5) (which equals equation (3.34)) in the example

from Section 3.4. This value was Cη = 2.960. In Figure 3.5 a loglog plot is shown of the

growth of the degree of vertex 1 plus δ = 3 from t = 1 to t = n in black. The red line shows the power-law

E[D1,η1 + δ] ≈ δ t i µη1 Cη = 3 t 1 1.80 2.960 .

Recall that a log-log representation of a power-law should be a straight line and that in this case the straight line should be increasing as we have a positive power-law exponent. As one can see, the growth of the expected degree plus δ of vertex 1 from equation (3.33) follows the true growth of the degree as t → n rather well. This indicates that the approximate derivation of E[Di,ηi+ δ] is quite accurate.

(31)

4. Estimating the parameters of the

PAMREF model

We have proven in Section 3.4 that the PAMREF model has an empirical degree dis-tribution which converges to a power-law. Therefore we can explain the occurrence of real-world complex networks that are directed, have a random number of added edges and (bounded) fitness values with this model. This also suggests that parameter esti-mation on this model is a good starting point for the estiesti-mation of the model from [8]. As we have seen in Section 3.2, we can describe the stepwise evolution of the PAMREF model in a probabilistic way with equation (3.3). In that equation we assume that we know the value of parameter δ and the fitness values of the vertices η = (ηt)t≥1.

How-ever, in real citation networks the value of δ and the fitness values are unknown. If we wish to model a citation network using a PAM we need to estimate both this value of δ and the fitness values η. Recall that we assume the distribution (and hence the mean) of the random number of added edges to be known and that it is not needed to esti-mate (mt)t≥1. We will first construct a MLE for δ and then see that we cannot simply

estimate the fitness values η. In this chapter we will again use the original notation from Section 3.2. We write Di for the degree of vertex i and denote the total sum by

S_tη(δ, η) :=Pt

i=1ηi(Di+ δ).

4.1. MLE for δ

In this section we shall construct a MLE for δ which is based on the log-likelihood function from [14] for the general model without random edges and where new edges can also be drawn between already present vertices in the network (so not only from a new vertex). We will also give a more rigorously construction of the log-likelihood from [14]. The log-likelihood of observing the full evolution of the network that we will derive for inference on δ can also be used for inference on the fitness values η later on. When constructing the MLE for δ we assume that we know the fitness values η of the vertices. Suppose we observe the evolution of our model up to time n as a sequence (PAt(δ, η))nt=1,

where PAt(δ, η) is the PAM at time t and PA1(δ, η) is the initial network consisting of

one vertex V1 = {1} with no edges. Note that as we look at the model up to time n

we will only look at the fitness values up to time n. Therefore we write η := (ηt)nt=1.

We can represent the likelihood P(PAt(δ, η)|PAt−1(δ, η), mt) of going from the network

PAt−1(δ, η) to PAt(δ, η), for each t ≥ 2, by conditioning on PAt−1(δ, η), the fitness

(32)

time n then equals

`(δ | (PAt(δ, η))nt=1, (mt)t≥2) = n

X

t=2

log P(PAt(δ, η)|PAt−1(δ, η), mt). (4.1)

Note that in the likelihood P(PAt(δ, η)|PAt−1(δ, η), mt) we only make use of the fitness

values (ηr)t−1r=1, but for convenience we will still write η = (ηt)nt=1. In order to construct

the maximum likelihood estimator for δ we need to find a way to represent the full evolution of the network (PAt(δ, η))nt=1up to PAn(δ, η). We will base this on the stepwise

evolution from equation (3.3) and, as in [14], we will make use of the random variable Zi(t) that denotes the number of new edges connected to vertex i at time t. For vertex

1 at time t = 2 we always have Z1(2) = m2. Note that for each t ≥ 2 we have

Pn

i=1Zi(t) =

Pt−1

i=1Zi(t) = mt, since there are mt new edges at time t, none of the

vertices {t + 1, . . . , n} are yet in the network at time t and a vertex cannot connect an edge to itself, i.e. we do not allow self-loops, so Zt(t) = 0. Furthermore, for each i

we have Pn

t=1Zi(t) =

Pn

t=i+1Zi(t) = Di(n), the degree of vertex i at time n, where

we can count from t = i + 1 since no edges can be added to vertex i before the vertex enters the network. Also, for each vertex i and time t > i we have the recursive equation Di(t) = Zi(t) + Di(t − 1) where Di(i) = 0.

As the edges of each vertex are added independently, the conditional probability that all the mt edges of incoming vertex t at time t will be connected to vertex i, for t ≥ 2,

is P(Zi(t) = mt|PAt−1(δ, η), mt) = ηi(Di(t − 1) + δ) St−1(δ, η) mt , (4.2) where St−1(δ, η) =Pt−1r=1ηr(Dr(t − 1) + δ) is the total sum of edges and vertices with

their fitness values at time t − 1. Using equation (4.2) we can also express the con-ditional likelihood of growing from PAt−1(δ, η) to PAt(δ, η) in terms of the sequence

Zt := (Z1(t), . . . , Zt−1(t)), which represent the number of newly added edges to all the

vertices {1, . . . , t − 1} at time t ≥ 2. For some realization zt:= (z1(t), . . . , zt−1(t)) of Zt,

where we always must havePt−1

i=1zi(t) = mt, this likelihood is given by

P(PAt(δ, η)|PAt−1(δ, η), mt) = P(Zt= zt|PAt−1(δ, η), mt)

= Czt· _η 1(D1(t−1)+δ) St−1(δ,η) z1(t) · · ·ηt−1(Dt−1(t−1)+δ) St−1(δ,η) zt−1(t) , (4.3) where Czt := mt!

z1(t)!···zt−1(t)! is the number of possible ways in which we can observe this

realization zt. If we take the logarithm of equation (4.3), we get the log-likelihood

log P(PAt(δ, η)|PAt−1(δ, η), mt) = t−1

X

i=1

zi(t) (log(ηi(Di(t − 1) + δ)) − log(St−1(δ, η))) + log Czt.

(4.4) Note that for inference on δ and the fitness values η the constant log Czt is not

(33)

full network with respect to δ, we sum equation (4.4) from t = 2 to n. If we define Z(n) := (Z2, . . . , Zn), then this represents the full evolution of the network in terms of

the number of added edges to each vertex at every time step. We can then express the log-likelihood of observing δ, but also of observing the fitness values η (which we shall use later), given the full evolution (PAt(δ, η))nt=1 in terms of Z(n) as

`(δ, η |Z(n)) = `(δ, η |(PAt(δ, η))nt=1) = n X t=2 t−1 X i=1 Zi(t) log(ηi(Di(t − 1) + δ)) − log(St−1(δ, η)) = n X t=2 t−1 X i=1 Zi(t) log(ηi) + log(Di(t − 1) + δ) − n X t=2 mtlog (St−1(δ, η)) , (4.5) where in the last equality we use that Pt−1

i=1Zi(t) = mt. Note that we refrain from

conditioning on (mt)nt=2 as this is implicitly included in Z(n). For inference on δ we can

drop the first term in the last equality. Note that we have the fitness values η = (ηt)nt=1,

that we assume to be known when estimating δ, present in the total sum St−1(δ, η) =

Pt−1

r=1ηr(Dr(t − 1) + δ). With this we can find the maximum likelihood estimator of δ.

If we assume that δ lies in some interval [a, b] with 0 < a < b < ∞, then the MLE for δ is ˆδn= argmaxδ∈[a,b] ln(δ), for

ln(δ) = n X t=2 t−1 X i=1 Zi(t) log(Di(t − 1) + δ) − n X t=2 mtlog (St−1(δ, η)) . (4.6)

If we take the derivative with respect to δ, then the MLE for δ ∈ (a, b) is a solution of the equation l_n0(δ) = 0, for derivative

l0_n(δ) = ∂ln(δ) ∂δ = n X t=2 mt X i=1 Zi(t) Di(t − 1) + δ − n X t=2 mtPt−1r=1ηr St−1(δ, η) . (4.7)

We get the second term in the derivative since ∂ log(St−1(δ, η)) ∂δ = 1 St−1(δ, η) ∂St−1(δ, η) ∂δ = Pt−1 r=1ηr St−1(δ, η) .

4.2. Simulation study for δ estimation

In this section we will test the estimation of the MLE for δ in our PAMREF model, the directed PAM with random number of added edges and bounded fitness. We simulated a small directed PAM with n = 1000 vertices, a fixed number of mt = m = 5 edges,

with true parameter δ = 2 and where each vertex i ∈ {1, . . . , 1000} had a fitness value ηi which was sampled from a Γ5(5, 5) fitness distribution. The network was generated

(34)

in R and the full evolution (PAt(δ, η))nt=1 of the network until time n was observed. We

are interested in estimating the value δ = 2 by maximizing the log-likelihood function from equation (4.6). As the derivative l_n0(δ) consists of many fractions with δ in the denominator, finding an expression for δ as a solution of l0_n(δ) = 0 is rather hard. Therefore we used the built-in function optimize in R to find the maximum likelihood estimate ˆδn = argmaxδ∈[0,5] ln(δ) in the interval [0, 5]. This process was repeated by

generating 1000 replicates of this PAM under the same conditions (and using the same fitness values of the vertices). The results of the 1000 MLE estimates ˆδn of the value

δ = 2 are shown in Figure 4.1.

We see that the MLE is very accurate. In the 1000 simulations of the model with only n = 1000 vertices the worst estimate deviated 0.31 from the true value δ = 2, see Figure 4.1 for the empirical distribution of the 1000 values of ˆδn. Although we did not

prove that the estimator is asymptotically normal, we see in Figure 4.1 the shape of the estimations follows a normal distribution. In addition, often the empirical slope of degree distribution is used to estimate δ. In our PAMREF model the influence of δ on the slope of the degree distribution is very little as this slope is highly influenced by the fitness distribution, see Theorem 1. Therefore this MLE is an accurate way to estimate δ.

Value of the estimate of δ

Frequency 1.7 1.8 1.9 2.0 2.1 2.2 2.3 0 50 100 150 200

Figure 4.1.: Histogram of the MLE estimates ˆδn of true parameter δ = 2 for 1000 replicates of

(35)

4.3. Fitness estimation

So far, we have seen that we can quite easily estimate the parameter δ in our model. We will now assume that we know the value of δ. But how do we then estimate the fitness values η = (ηt)nt=1 of the vertices in our network and hence the fitness distribution?

We assume that we know which type of probability distribution represents the fitness distribution, but not the values of the parameters of the distribution. In this thesis we shall work with a truncated Gamma distribution with upper bound 5 and try to estimate this distribution based on the evolution of a network (PAt(δ, η))nt=1 for different shape

and rate parameters (which results in a rich-get-richer, fit-get-richer or innovation-pays-off network depending on the values of the parameters, see Section 3.3). We will try to estimate the fitness distribution by estimating the fitness values of each vertex in the network. As we have many vertices and as their fitness values are IID samples from the (unknown) truncated Gamma distribution, if we can estimate the fitness values, we can also estimate the full fitness distribution.

Unfortunately, the estimation of the fitness values is not as easy as the estimation of δ. When we observe (the evolution of) a citation network we only see the degrees of all the vertices at each time point. This also holds for our PAMREF model. In this network we therefore estimate the unobservable parameter δ and the unobservable fitness values η using only the degrees of the vertices that we view at every time point in the network (PA(δ, η))n_t=1. For the estimation of δ we only need to estimate one hidden parameter δ, but for the estimation of the fitness values we have as much hidden ‘parameters’ as we have vertices in the network. We shall discuss two possible ways of estimating the fitness values, see that they have some flaws, and then introduce two methods with which we can overcome these flaws and perform our fitness estimation.

4.3.1. Growth method and PAFit method

Before introducing the two methods, we first note that for the fitness estimation we can-not determine and hence use the constant Cη that we introduced in Section 3.4. As the

value of Cη depends on the probability density function ρ(η) of the fitness distribution,

see equation (3.5), and we do not know the fitness distribution, we also cannot deter-mine Cη. Now, there are two possible ways of estimating these fitness values. The first

and most obvious way would be to estimate the fitness values using the log-likelihood function from equation (3.5). Suppose we would like to start with estimating only one fitness value, for example the fitness value η1 of vertex 1. Omitting the terms that do

not depend on the fitness value η1, we can express the log-likelihood of observing the

Understanding the dynamics of citation networks through preferential attachment models

MSc Mathematics

Master Thesis