
Cover Page

The handle http://hdl.handle.net/1887/49012 holds various files of this Leiden University dissertation.

Author: Gao, F.

Title: Bayes and networks

Issue Date: 2017-05-23


3 MODELS

3.1 introduction

We formulate the model in a more concise and less abstract fashion than in Section 2.2.4. We start at n = 2 with two connected nodes.

We then define the recursive scheme by which the network evolves. Suppose at time n we have a network of n nodes V_n = {v_1, v_2, …, v_n} with degrees (d_i)_{i=1}^n. The new node v_{n+1} comes in at time n + 1 and puts f(d_i) as the preference on node i, where f maps the degree k ∈ ℕ+ to its associated preference f(k) ∈ ℝ+. The new node chooses one existing node v_i in V_n to connect to with probability proportional to the associated preference f(d_i); that is, it puts probability f(d_i)/∑_{j=1}^n f(d_j) on node v_i. The function f is typically assumed a priori to be non-decreasing; in other words, nodes of higher degrees are more favorable for the incoming node to connect to, so higher degrees attract more incoming connections. This explains the name preferential attachment model, which is closely related to the Matthew effect, as elaborated in Section 2.1.3.

After the incoming node makes its choice, the network evolves to the stage of n + 1 nodes, and the recursive scheme starts again with the next incoming node v_{n+2}. We may repeat the procedure as many times as we want.
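The recursive scheme above can be sketched in a few lines of code. This is our own minimal illustration (not code from the dissertation); the function name `simulate_pa_tree` and the linear-search sampling are our choices:

```python
import random

def simulate_pa_tree(n, f, seed=None):
    """Grow a pa tree: start at n = 2 with two connected nodes; each new
    node attaches to an existing node i with probability f(d_i)/sum_j f(d_j).
    Returns the final degree sequence (d_1, ..., d_n)."""
    rng = random.Random(seed)
    deg = [1, 1]                       # two connected nodes at time n = 2
    for _ in range(n - 2):
        prefs = [f(d) for d in deg]
        u = rng.random() * sum(prefs)  # inverse-cdf sampling over preferences
        acc = 0.0
        for i, w in enumerate(prefs):
            acc += w
            if u <= acc:
                break
        deg[i] += 1                    # the chosen node gains an edge
        deg.append(1)                  # the new node arrives with degree 1
    return deg

degrees = simulate_pa_tree(1000, lambda k: k, seed=1)
```

Since the resulting network is a tree with n − 1 edges, the degrees always sum to 2(n − 1).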

We define the empirical degree distribution P_k(n) to be the proportion of nodes of degree k at time n:

P_k(n) = (1/n) ∑_{i ∈ [n]} 𝟙{d_i(n) = k},

where [n] = {1, …, n} and d_i(n) is the degree of node v_i at time n. In case f is affine with f(k) = k + δ as in [55], it is well known that for any fixed k, as n → ∞, P_k(n) → p_k (at least) in probability, with p_k depending only on δ. Moreover, it is also well known that the limiting degree distribution follows a power law, so this is a scenario in which the scale-free phenomenon occurs. That is, as n → ∞,

P_k(n) → p_k with p_k = c(k) k^{−τ},

where τ = 3 + δ is the power-law exponent and c(k) is slowly varying in k. Furthermore, if f is linear, f(k) = k, we can work out the exact asymptotic degree distribution to be p_k = 4/(k(k+1)(k+2)) ([16, 55]), i.e., the power-law exponent is τ = 3 + 0 = 3.
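For f(k) = k this closed form is easy to sanity-check numerically: the sum ∑_k 4/(k(k+1)(k+2)) telescopes to 1, and p_k·k³ → 4 confirms that the tail exponent is τ = 3. A quick check of our own, not part of the original analysis:

```python
def p_linear(k):
    # limiting degree probability for f(k) = k: p_k = 4 / (k (k+1) (k+2))
    return 4.0 / (k * (k + 1) * (k + 2))

# the partial sum telescopes to 1 - 2/((K+1)(K+2)), so (p_k) is a distribution
partial = sum(p_linear(k) for k in range(1, 10001))

# p_k * k^3 -> 4 as k grows, i.e. p_k ~ 4 k^{-3}: a power law with tau = 3
tail_check = p_linear(10**6) * (10**6) ** 3
```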


Móri [55] and Resnick and Samorodnitsky [69] show that central limit theorems hold for the empirical degree distribution P_k(n). They apply martingale central limit theorems to carefully devised martingales to prove asymptotic normality of P_k(n) for any fixed k, with explicit limiting variance σ_k² depending only on δ and k.

If we relax the assumption that f is affine to f being only sublinear (which includes the affine case), we know much less, except for the limiting degree distribution that we discuss below, partly because the analysis becomes much more difficult. Suppose we have v_1, v_2, …, v_n with degrees (d_i)_{i=1}^n and preferences (f(d_i))_{i=1}^n. We need the total preference, defined as the sum of all f(d_i)'s, to normalize the multinomial distribution on all the nodes. In the affine case f(k) = k + δ with parameter δ, the total preference is deterministic: since the sum of all degrees is 2n, we have ∑_{i=1}^n f(d_i) = ∑_{i=1}^n (d_i + δ) = nδ + 2n. However, in the general case this simple relation ceases to hold, and the total preference becomes a rather messy random quantity depending on the entire history of the network evolution. For the affine case it is possible to study the limiting degree distribution with simple recursions on the degree evolution, but this tool fails in the more general case. Stronger and less intuitive ideas are necessary to study even the simplest quantity: the limiting degree distribution.

Rudas, Tóth, and Valkó [70] prove that the empirical degree proportion P_k(n) converges to p_k as in (3.13) almost surely, for any fixed k. Their proofs rely on a branching-process framework in which the degree distribution can be studied with well-established strong law-of-large-numbers-type results from the classical branching-process literature dating back to the 1970s and 1980s; see [40, 56].

In the non-linear regime, pa networks do not yield a power law in general. In the sublinear case we find a much lighter tail in the limiting degree distribution: the degree distribution decays much faster as the degree increases, so far fewer nodes of high degree can be found. On the one hand, the estimation problem for a general sublinear pa function is interesting in itself; on the other hand, it helps in applications where power laws are believed to hold. In some applications it might be a good idea to drop the assumption that the pa function is affine and instead estimate the pa function f over a more general sublinear domain. With such an estimate one can judge whether assuming f affine is reasonable, depending on whether the estimator of f looks like an affine function at all. In other words, estimating f over a more general domain helps to validate the modelling of power laws with pa networks.

The main contribution of our work is to propose the empirical estimator and prove its consistency. The intuition behind the empirical estimator comes from reasoning in the limiting regime, where everything becomes stable and easy to analyze. The Markovian nature of pa networks makes the problem hard, as it ceases to be a conventional statistical problem where everything is independent and identically distributed (iid). Perhaps the most surprising aspect of our study is that our empirical estimators do not depend on the history: we only need the final snapshot of the network to construct the estimators. A closer look reveals that, despite the Markovian nature, the network inevitably approaches its limiting shape, where the connecting behavior of the incoming nodes looks more and more like iid random variables. Hence the history eventually matters less and less.

This chapter is organized as follows. In Section 3.2 we give the intuition behind the estimator and present the main results on consistency. Section 3.3 introduces the terminology of branching processes and gives a random tree model that is equivalent to the evolution of pa networks. After that, we give the proof of the main consistency result in Section 3.4. In the last section we present a simulation study of the performance of the proposed empirical estimators in different settings and explain our observations. Some further studies are carried out to uncover more properties of the empirical estimator, the most interesting being that the estimator seems to be asymptotically normal with the parametric √n rate.

3.2 construction of the empirical estimator and main result

Suppose we have a pa graph that has n nodes and is already close to the limiting distribution (p_k)_{k=1}^∞. Suppose a new node comes in and needs to pick an existing node to attach to according to the pa rule associated with the pa function f. For N_k the number of nodes of degree k (≈ n p_k in the limiting regime), the probability of choosing an existing node of degree k is

f(k) N_k / ∑_{j=1}^∞ f(j) N_j ≈ f(k) p_k / ∑_{j=1}^∞ f(j) p_j.

We are interested in the quantity f(k) for each k ≥ 1. Note, however, that the denominator ∑_{j=1}^∞ f(j) p_j on the right-hand side of the above display does not depend on k. If we put an extra factor n/N_k ≈ 1/p_k on the above display to cancel the term p_k, then we get a rescaled version of f(k). Keeping in mind that f is only identifiable up to scale, we define r_k, the rescaled version of f(k), for each k; the above heuristic says

r_k := f(k) / ∑_{j=1}^∞ f(j) p_j ≈ (probability of choosing a node of degree k) / (N_k/n).

We want to devise an estimator mimicking the above equation that works in the non-limiting regime. The probability of the new node choosing an existing node of degree k can be estimated by counting the number of times that an incoming node chooses an existing node of degree k during the evolution of the pa network. We denote this number for the pa network at time n by N_{→k}(n), and the number of nodes of degree k in the pa network at time n by N_k(n).

The empirical estimator (ee) r̂_k(n) is defined by

r̂_k(n) = N_{→k}(n) / N_k(n).    (3.1)

Suppose N_{>k}(n) is the number of nodes of degree strictly bigger than k at time n. For the pa networks considered here, we have the following crucial observation.

Lemma 3.1. N_{→k}(n) = N_{>k}(n).

Proof. Observe that N_{→k}(n) counts the number of times that an incoming node chooses an existing node of degree k to connect to, up to time n. If a node was chosen, as a node of degree k, to be connected to at some point before time n, then its degree at time n is at least k + 1. On the other hand, a node's degree can only jump from 1 to 2, 2 to 3, …, k to k + 1, and so on. Therefore, if a node has degree strictly bigger than k, it must have been chosen, as a node of degree k, to be connected to at some time. This gives the equality in the statement of the lemma.

In the light of this observation, we note that (3.1) is equivalent to

r̂_k(n) = N_{>k}(n) / N_k(n).    (3.2)
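Both (3.1) and (3.2) are easy to compute in a simulation, and tracking the attachment history lets one check Lemma 3.1 empirically. The sketch below is our own; it uses the linear pa function f(k) = k, for which drawing a uniform entry from a list of edge endpoints is equivalent to degree-proportional sampling (for this f, r_1 = p_{>1}/p_1 = 1/2 by the limiting distribution p_k = 4/(k(k+1)(k+2))):

```python
import random

def simulate_linear_pa(n, seed=None):
    """Grow a pa tree with f(k) = k, recording the final degrees and, for
    each k, how often an incoming node attached to a node of degree k."""
    rng = random.Random(seed)
    deg = [1, 1]
    endpoints = [0, 1]        # node i listed deg[i] times, so a uniform draw
    chosen_at = {}            # from this list is proportional to f(d) = d
    for _ in range(n - 2):
        i = endpoints[rng.randrange(len(endpoints))]
        chosen_at[deg[i]] = chosen_at.get(deg[i], 0) + 1   # N_{->k} counts
        deg.append(1)
        endpoints.extend([i, len(deg) - 1])
        deg[i] += 1
    return deg, chosen_at

deg, hist = simulate_linear_pa(20000, seed=7)
n1 = sum(1 for d in deg if d == 1)
r1_snapshot = sum(1 for d in deg if d > 1) / n1   # (3.2): final snapshot only
r1_history = hist[1] / n1                          # (3.1): full history
```

In line with Lemma 3.1, `r1_snapshot` and `r1_history` agree exactly, and both approach the target value 1/2 as n grows.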

We give the main results of this chapter in the following theorem and defer the proof until Section 3.4.

Theorem 3.2. If the true pa function is such that there exists a positive constant serving as the Malthusian parameter for the associated continuous-time random tree model, then the empirical estimator r̂_k(n) constructed above is consistent almost surely, i.e.,

r̂_k(n) →^{a.s.} r_k, as n → ∞.    (3.3)


Remark. Suppose you have a real-world dynamic network and are thinking of modeling it as a pa network. The above estimators then only need the network observed at the end of the evolution and do not require knowledge of the evolution history. This seems counterintuitive given the Markov setting of the model. A simple explanation lies in the limiting behavior of the network model: the degree sequence stabilizes as the network size grows, and since the limiting degree distribution is deterministic, the influence of the history fades. The asymptotic independence of the history is also practically important, because in real-world applications it is often difficult, expensive, or even impossible to recover the complete evolution history of a dynamic network.

Remark. The above equation has a philosophical interpretation. Suppose one lives as a node in the world of a pa network, where nodes are constantly coming in. The nodes in this world do not understand how some superior force makes the world as it is, and thus do not know the exact preferential attachment rule. In this world they measure wealth by counting degrees: the more neighbors one has, the wealthier one is. Suppose a node has degree k and asks how likely it is to receive an extra edge and thereby get richer. This question is equivalent to asking for an estimate of f(k). Our estimator says it is reasonable to count the number of nodes with higher degrees (the people above you), N_{>k}, and the number of nodes with the same degree (the people sharing your rank), N_k; the quotient is then an estimator of f(k). If you live in the world of these nodes and ask what your chance of moving up is, then naturally the best you can do is compute this ratio. The higher the relative number of people above you compared to the people sharing your rank, the better your chance of moving up.

3.3 borrowing strength from branching processes

In this section we introduce the terminology needed to state the pa model in the language of branching processes, similar to [70]. As we will see later, the evolution of one particular kind of continuous-time branching process is in a certain sense equivalent to the evolution of the preferential attachment model; this enables us to study the degree distribution of preferential attachment networks with classical and well-established results on branching processes.


3.3.1 Rooted ordered tree

The pa network is a rooted ordered tree, which can be described as an evolving genealogical tree, in which the nodes are individuals and the edges are parent-child relations. The usual notation for the nodes is ∅ for the root of the tree and l-tuples (i_1, …, i_l) of positive natural numbers i_j ∈ ℕ+ for the other nodes (l ∈ ℕ+ = {1, 2, …}). The children of the root are labeled (1), (2), …, and in general x = (i_1, …, i_l) denotes the i_l-th child of the i_{l−1}-th child of ⋯ of the i_1-th child of the root. Thus the set of all possible individuals is

ℐ = {∅} ∪ (⋃_{l=1}^∞ ℐ_l),    ℐ_l = {(i_1, …, i_l) : i_j ∈ ℕ+}.

For x = (x_1, …, x_k) and y = (y_1, …, y_l) the notation xy is shorthand for (x_1, …, x_k, y_1, …, y_l), and xl is the concatenation (x_1, …, x_k, l).

Since the edges of the tree can be inferred from the labels of the nodes (i_1, …, i_l), a rooted ordered tree can be identified with a subset G ⊂ ℐ. (Not every subset corresponds to a rooted ordered tree, as the labels need to satisfy the compatibility conditions that for every (x_1, …, x_k) ∈ G we have (x_1, …, x_{k−1}) ∈ G (the parent must be in the tree) as well as (x_1, …, x_k − 1) ∈ G if x_k ≥ 2 (an older sibling must be in the tree).) The set of all finite rooted ordered trees is denoted by 𝒢. In this terminology and notation the degree of a node x ∈ G is the number of its children in G plus 1 (for its parent), given by

deg(x, G) = |{l ∈ ℕ+ : xl ∈ G}| + 1.    (3.4)

3.3.2 Branching process

The evolution in time of the genealogical tree is described through stochastic processes (ξ_x(t))_{t≥0}, one for each individual x ∈ ℐ. The birth process ξ_x is a point process on [0, ∞) giving the ages of the parent x at the births of its children. The birth time σ_x of individual x in calendar time is then defined recursively, by setting σ_∅ = 0 (the root is born at time zero) and

σ_y = σ_x + inf{u ≥ 0 : ξ_x(u) ≥ l},    if y = xl.

Thus the l-th child of x is born at the birth time of x plus the time of the l-th event in ξ_x.

It is assumed that the birth processes ξ_x for different x ∈ ℐ are independent and identically distributed. Thus the splits in the tree evolve independently and identically from every node onwards, and this independence makes the process a branching process. As a formal definition we may define all processes ξ_x on the product probability space

(Ω, ℬ, P) = ∏_{x∈ℐ} (Ω_x, ℬ_x, P_x),

where every (Ω_x, ℬ_x, P_x) is an independent copy of a single probability space (Ω_0, ℬ_0, P_0) and every ξ_x is defined as ξ_x(ω) = ξ(ω_x) if ω = (ω_x)_{x∈ℐ} ∈ Ω, for ξ a given point process defined on (Ω_0, ℬ_0, P_0).

We identify the point process ξ with the process ξ(t) giving the number of points in [0, t], for t ≥ 0, and write μ(t) = 𝔼[ξ(t)] for its intensity measure, which is called the reproduction function in this context.

Besides the reproduction process ξ_x we also attach a random characteristic φ_x to every individual x ∈ ℐ. This is also a stochastic process (φ_x(t))_{t≥0}, which we take nonnegative, measurable and separable; for simplicity define φ_x(t) = 0 for t < 0. We then define

Z_t^φ = ∑_{x∈ℐ : σ_x ≤ t} φ_x(t − σ_x).

If φ_x(t) is viewed as a characteristic of individual x at age t, then the variable φ_x(t − σ_x) is the characteristic of individual x at calendar time t, and Z_t^φ is the sum of all such characteristics over the individuals that are alive at time t (i.e., σ_x ≤ t).

The characteristics φ_x are assumed independent and identically distributed across individuals x, like the reproduction processes. Formally this may be achieved by defining φ_x(ω) = φ(ω_x) if ω = (ω_x)_{x∈ℐ} ∈ Ω, for a given stochastic process φ on (Ω_0, ℬ_0, P_0). This allows the two processes ξ_x and φ_x attached to a given individual to be dependent. In fact, we shall be interested in the choices, for a given natural number k,

φ(t) ≡ 1,    φ(t) = 𝟙{ξ(t) = k},    φ(t) = 𝟙{ξ(t) > k}.    (3.5)

For these characteristics the variable Z_t^φ is equal to, respectively, the total number of individuals born up to time t, the total number of those individuals with exactly k children at time t, and the total number with more than k children at time t.

We consider supercritical, Malthusian processes satisfying the following three conditions.

1. μ does not concentrate on any lattice {0, h, 2h, …} for h > 0.


2. There exists a number λ* > 0 such that

∫_0^∞ e^{−λ*t} μ(dt) = 1.    (3.6)

3. The first moment of e^{−λ*t} μ(dt) is finite, i.e.,

∫_0^∞ u e^{−λ*u} μ(du) < ∞.    (3.7)

The second condition is the Malthusian assumption, and λ* is called the Malthusian parameter; the third is the supercritical condition.

The following is a weaker version of Theorem 6.3 of [56] (see Theorem A in [70]). It is worth noting that (3.8) implies the Malthusian condition; moreover, (3.8) is a sufficient condition for the almost sure convergence and might not be necessary for weaker notions of convergence, e.g., convergence in probability.

Proposition 3.3. Consider a supercritical, Malthusian branching process with Malthusian parameter λ*, counted by two bounded random characteristics φ and ψ. Suppose that there exists a λ < λ* such that

∫_0^∞ e^{−λt} μ(dt) < ∞.    (3.8)

Then almost surely, as t → ∞,

Z_t^φ / Z_t^ψ → ∫_0^∞ e^{−λ*t} 𝔼[φ(t)] dt / ∫_0^∞ e^{−λ*t} 𝔼[ψ(t)] dt.    (3.9)

3.3.3 The continuous random tree model

To connect back to the pa model, given a pa function f we now define the process ξ as a pure birth process with birth rate f(ξ(t) + 1), i.e., the continuous-time Markov process with state space the nonnegative integers and the only possible transitions given by

P(ξ(t + dt) = k + 1 | ξ(t) = k) = f(k + 1) dt + o(dt).    (3.10)

The genealogical tree is then also a Markov process on the state space 𝒢. The initial state of the process is the root {∅} of the tree, and the jumps of this process correspond to an individual x ∈ G giving birth to a child, which is then incorporated in the tree as an additional node.


In the preceding notation this means that the process can jump from a state G to a state of the form G ∪ {xk}, where necessarily x ∈ G and k = deg(x, G); equivalently, x already has k − 1 children in the tree. This jump is made with rate f(deg(x, G)), since according to (3.10) with ξ = ξ_x the individual x gives birth to a new child with rate f(k) if x already has k − 1 children. The description in terms of rates means, more concretely, that given the current state G, the Markov process can jump to the finitely many possible states G ∪ {xk}, with x ∈ G and k = deg(x, G), and it chooses between these states with probabilities

f(deg(x, G)) / ∑_{y∈G} f(deg(y, G)),    x ∈ G.

Furthermore, the waiting time in state G until the next jump is an exponential variable with intensity equal to the total preference

F(G) = ∑_{x∈G} f(deg(x, G)).

The continuous-time scale of the process is not essential to us, but it is convenient for the calculations. We shall use that as t → ∞ the continuous-time tree visits the same states (trees) as the pa model, and taking limits as t → ∞ is equivalent to taking limits in the pa model as the number of nodes increases to infinity.
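The jump-chain claim can be illustrated directly: if every node x holds an independent Exp(f(deg(x, G))) clock, then the node whose clock rings first is distributed proportionally to f(deg(x, G)). A small sketch of our own, with an arbitrary degree sequence standing in for a tree G:

```python
import random

def next_birth(deg, f, rng):
    """One jump of the continuous-time tree: each node holds an independent
    Exp(f(deg_x)) clock; the first clock to ring gives birth."""
    clocks = [rng.expovariate(f(d)) for d in deg]
    i = clocks.index(min(clocks))
    return i, clocks[i]

rng = random.Random(0)
deg = [3, 1, 1, 2, 1]          # some degree sequence; F(G) = 8 for f(k) = k
wins = [0] * len(deg)
for _ in range(100000):
    i, _ = next_birth(deg, lambda k: k, rng)
    wins[i] += 1
freqs = [w / 100000 for w in wins]   # should approach [3/8, 1/8, 1/8, 2/8, 1/8]
```

The empirical winning frequencies match the pa choice probabilities f(d_i)/F(G), which is exactly the equivalence used in the text.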

In order to apply Proposition 3.3 in this setting we need to verify its conditions on the reproduction function μ(t) = 𝔼[ξ(t)] and determine the Malthusian parameter. The events of the pure birth process (3.10) occur at times T_1 < T_1 + T_2 < T_1 + T_2 + T_3 < ⋯, for T_1, T_2, … independent random variables, with T_k exponentially distributed with rate f(k). The total number of events ξ(t) = ∫ 𝟙_(0,t](u) ξ(du) at time t is equal to ∑_{l=1}^∞ 𝟙_(0,t](T_1 + ⋯ + T_l), whence

𝔼[∫ e^{−λu} ξ(du)] = 𝔼[∑_{l=1}^∞ e^{−λ(T_1+⋯+T_l)}] = ∑_{l=1}^∞ ∏_{k=1}^l f(k)/(λ + f(k)).

As a function of λ this expression is decreasing; it is infinite at λ = 0 and decreases to 0 as λ → ∞ under mild conditions on f. The Malthusian parameter λ* is defined as the argument where the function takes the value one and will typically exist. In a neighbourhood of λ* the function will also be finite, and so will its derivative in absolute value, which is the integral in (3.7). Thus the conditions of Proposition 3.3 will typically be satisfied.
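Since the left-hand side is decreasing in λ, the Malthusian parameter can be found numerically by bisection. For the linear pa function f(k) = k the products telescope to 2/((l + 1)(l + 2)), whose sum equals one exactly at λ = 2, which serves as a test case. A sketch of our own, with ad hoc truncation levels:

```python
def malthusian_sum(lam, f, terms=20000):
    """Left-hand side of the Malthusian equation:
    sum over l of prod_{k <= l} f(k) / (lam + f(k))."""
    total, prod = 0.0, 1.0
    for k in range(1, terms + 1):
        prod *= f(k) / (lam + f(k))
        total += prod
        if prod < 1e-15:           # remaining terms are negligible
            break
    return total

def malthusian_parameter(f, lo=1e-6, hi=100.0, tol=1e-6):
    # the sum is decreasing in lam, so bisect for the root of sum = 1
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if malthusian_sum(mid, f) > 1.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

lam_star = malthusian_parameter(lambda k: k)   # close to 2 for f(k) = k
```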


We may also calculate the expressions on the right-hand side of (3.9) for the characteristics (3.5), as follows. From ℙ(ξ(t) > k − 1) = ℙ(T_1 + ⋯ + T_k < t) = ∫_0^t h_k(u) du, for h_k the density of T_1 + ⋯ + T_k, we have by Fubini's theorem (or partial integration),

λ* ∫_0^∞ e^{−λ*t} ℙ(ξ(t) > k − 1) dt = ∫_0^∞ ∫_u^∞ λ* e^{−λ*t} dt h_k(u) du = 𝔼[e^{−λ*(T_1+⋯+T_k)}] = ∏_{j=1}^k f(j)/(λ* + f(j)).    (3.11)

Furthermore, writing ℙ(ξ(t) = k − 1) = ℙ(ξ(t) > k − 2) − ℙ(ξ(t) > k − 1) and applying the preceding display, we obtain

λ* ∫_0^∞ e^{−λ*t} ℙ(ξ(t) = k − 1) dt = λ*/(λ* + f(k)) ∏_{j=1}^{k−1} f(j)/(λ* + f(j)).    (3.12)

As we will see, (3.11) and (3.12) correspond to the computations of the limiting proportions of nodes with degree bigger than k and degree equal to k, respectively.

3.4 consistency of the empirical estimators

For completeness, we present a result (without proof) from [70] giving the limiting degree distribution for a class of pa functions f.

Proposition 3.4. Consider a pa function f such that there exists a positive constant λ* serving as the Malthusian parameter for the associated continuous-time random tree model. Then as n → ∞, the empirical degree distribution P_k(n) converges almost surely, for every k, to the limit p_k specified by

P_k(n) →^{a.s.} p_k = λ*/(λ* + f(k)) ∏_{j=1}^{k−1} f(j)/(λ* + f(j)),    ∀k ∈ ℕ+.    (3.13)

Note p_1 = λ*/(λ* + f(1)) and p_{k+1} = p_k f(k)/(λ* + f(k + 1)).
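The recursion noted after (3.13) makes (p_k) easy to compute. As a check of our own: for f(k) = k with λ* = 2 it reproduces the exact formula p_k = 4/(k(k+1)(k+2)) quoted in Section 3.1:

```python
def limiting_degrees(f, lam_star, kmax):
    """(p_1, ..., p_kmax) via p_1 = lam*/(lam* + f(1)) and
    p_{k+1} = p_k f(k) / (lam* + f(k+1)), as in (3.13)."""
    p = [lam_star / (lam_star + f(1))]
    for k in range(1, kmax):
        p.append(p[-1] * f(k) / (lam_star + f(k + 1)))
    return p

p = limiting_degrees(lambda k: k, 2.0, 2000)
```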

Suppose the birth process is defined as in (3.10); then the intervals between successive jumps of ξ(t) are independent exponentially distributed random variables with parameters (f(k))_{k=1}^∞ respectively. We recall the Malthusian equation and that λ* is the solution to the equation (in λ)

∫_0^∞ e^{−λt} dμ(t) = ∑_{k=1}^∞ ∏_{i=1}^k f(i)/(λ + f(i)) = 1.

It will be useful later that (by the formula for p_k in (3.13))

∑_{k=1}^∞ f(k) p_k = ∑_{k=1}^∞ λ* f(k)/(λ* + f(k)) ∏_{i=1}^{k−1} f(i)/(λ* + f(i)) = ∑_{k=1}^∞ λ* ∏_{i=1}^k f(i)/(λ* + f(i)) = λ*,    (3.14)

where the last equality follows from the Malthusian equation.
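Identity (3.14) doubles as a numerical sanity check: computing ∑_k f(k) p_k from the recursion in (3.13) should return λ*. A quick check of our own for f(k) = k, λ* = 2:

```python
def total_limiting_preference(f, lam_star, kmax=5000):
    # sum of f(k) p_k with p_k generated by the recursion in (3.13)
    p = lam_star / (lam_star + f(1))
    total = f(1) * p
    for k in range(1, kmax):
        p *= f(k) / (lam_star + f(k + 1))
        total += f(k + 1) * p
    return total

s = total_limiting_preference(lambda k: k, 2.0)   # close to lam_star = 2
```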

Proof of Theorem 3.2. We need to apply Proposition 3.3 properly: we find appropriate characteristics φ and ψ so that the left-hand side of (3.9) is the ee and the right-hand side is r_k.

We set φ(t) = 𝟙{ξ(t) + 1 > k} and ψ(t) = 𝟙{ξ(t) + 1 = k}; then

Z_t^φ = ∑_{x∈ℐ} 𝟙{t ≥ σ_x} 𝟙{ξ_x(t − σ_x) + 1 > k},    Z_t^ψ = ∑_{x∈ℐ} 𝟙{t ≥ σ_x} 𝟙{ξ_x(t − σ_x) + 1 = k}.

Z_t^ψ counts all the nodes that have been born up to time t and have degree k at time t, and Z_t^φ counts all the nodes that have been born and have degree strictly bigger than k at time t.

We apply Proposition 3.3 with the above defined φ and ψ. Noting that the reproduction process ξ(t) associated with a node relates to its degree by ξ(t) + 1 = Deg(t), this gives that

lim_{t→∞} |{x ∈ Υ(t) : ξ_x(t − σ_x) + 1 > k}| / |{x ∈ Υ(t) : ξ_x(t − σ_x) + 1 = k}| = λ* ∫_0^∞ e^{−λ*t} ℙ(ξ(t) + 1 > k) dt / λ* ∫_0^∞ e^{−λ*t} ℙ(ξ(t) + 1 = k) dt    (3.15)

holds almost surely. We identify the numerator and denominator with the calculations in (3.11) and (3.12) and arrive at

λ* ∫_0^∞ e^{−λ*t} ℙ(ξ(t) + 1 > k) dt = ∏_{j=1}^k f(j)/(f(j) + λ*),    (3.16)

λ* ∫_0^∞ e^{−λ*t} ℙ(ξ(t) + 1 = k) dt = λ*/(λ* + f(k)) ∏_{j=1}^{k−1} f(j)/(f(j) + λ*).    (3.17)


We see immediately that the right-hand side of (3.17) is the same as the limiting proportion p_k of degree k from (3.13), and that the right-hand side of (3.16) equals p_{>k} := ∑_{j=k+1}^∞ p_j by Fubini's theorem. Therefore we arrive at

lim_{t→∞} |{x ∈ Υ(t) : deg(x, Υ(t)) > k}| / |{x ∈ Υ(t) : deg(x, Υ(t)) = k}| = p_{>k}/p_k

almost surely. It remains to show that the right-hand side of the preceding display equals r_k = f(k)/∑_j f(j)p_j, which is done in Lemma 3.5 below.

Lemma 3.5. Suppose (p_k)_{k=1}^∞ is the limiting degree distribution specified in (3.13) for the pa function f. Then

f(k) / ∑_{j=1}^∞ p_j f(j) = p_{>k}/p_k    (3.18)

holds for every k ∈ ℕ+.

Proof. Define the auxiliary quantity q_k, the limiting preference towards degree k, by

q_k = f(k)p_k / ∑_{i=1}^∞ f(i)p_i = f(k)p_k/λ*,

using (3.14). Note q_k = r_k p_k, so (3.18) is the same as q_k = p_{>k}.

For k = 1 we note p_1 = λ*/(f(1) + λ*), hence p_{>1} = 1 − p_1 = f(1)/(λ* + f(1)) = f(1)p_1/λ* = q_1. Assuming p_{>k} = q_k holds, consider the case k + 1. By (3.13) we have p_{k+1} = f(k)p_k/(λ* + f(k + 1)), and thus

p_{>k+1} = p_{>k} − p_{k+1} = q_k − p_{k+1}
= f(k)p_k (1/λ* − 1/(λ* + f(k + 1)))
= (f(k + 1)/λ*) · f(k)p_k/(λ* + f(k + 1))
= f(k + 1)p_{k+1}/λ* = q_{k+1}.

By induction, p_{>k} = q_k holds for all k ∈ ℕ+.
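The identity (3.18) holds for any pa function with a Malthusian parameter, not only affine ones. Below is a numerical check of our own for the sublinear choice f(k) = √k: we solve the Malthusian equation by bisection and compare both sides of (3.18) at k = 3 (truncation levels are ad hoc):

```python
import math

f = lambda k: math.sqrt(k)

def lhs_malthus(lam, terms=5000):
    # sum over l of prod_{k <= l} f(k)/(lam + f(k)); decreasing in lam
    total, prod = 0.0, 1.0
    for k in range(1, terms + 1):
        prod *= f(k) / (lam + f(k))
        total += prod
        if prod < 1e-17:
            break
    return total

lo, hi = 1e-6, 50.0                      # bisect for the Malthusian parameter
for _ in range(100):
    mid = 0.5 * (lo + hi)
    lo, hi = (mid, hi) if lhs_malthus(mid) > 1.0 else (lo, mid)
lam = 0.5 * (lo + hi)

kmax = 5000                              # p_k via the recursion in (3.13)
p = [lam / (lam + f(1))]
for k in range(1, kmax):
    p.append(p[-1] * f(k) / (lam + f(k + 1)))

left = f(3) / sum(f(j + 1) * p[j] for j in range(kmax))   # f(k)/sum_j f(j) p_j
right = sum(p[3:]) / p[2]                                 # p_{>3} / p_3
```

Both sides agree to numerical precision, as the lemma asserts.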


3.5 simulation studies

In this section we present a numerical illustration of the behavior of the empirical estimator.

We run the experiment for the following pa functions (normalized such that f(1) = 1):

f^(1)(k) = (k + 1/2)/(3/2),    f^(2)(k) = k^{2/3},    f^(3)(k) = √(k + 2.4)/√3.4.

We simulate 1,000 pa networks of 10,000, 100,000 and 1,000,000 nodes for each of the three functions, so 9,000 networks in total. In each experiment, since the model is only identifiable up to scale, we apply the empirical estimation to a range of degrees and then normalize the obtained estimates such that the preference on degree 1 equals 1, to enable easy comparisons. We summarize the simulation study in Figure 3.1. The three rows of the 9 panels refer to the pa functions f^(1), f^(2), f^(3) (top to bottom), while the three columns refer to networks with 10,000, 100,000 and 1,000,000 nodes. In each of the nine panels the degree k is on the horizontal axis, plotted from 1 to the maximal degree that occurred over all of the 1,000 simulations. The vertical axis gives the value of the pa function and a boxplot of the 1,000 estimates computed from the simulated networks. The ground truth for each k is marked by the red line in each panel.

These plots suggest the following observations:

• The estimator is consistent, as our theorem shows. The quality improves when we have more nodes, hence more observations.

• For a fixed number of nodes, the quality of the estimator deteriorates quickly as k increases, as shown by the substantial variability around the ground truth.

• Even though the ee has a large variance for large k, the sample mean of r̂_k at each degree k is still remarkably close to the truth, suggesting that the ee is nearly unbiased.

• For a fixed number of nodes, it appears that when the ee makes larger errors, it tends to overestimate.

• The ee is not automatically monotone. However, we can slightly modify the estimator so that it remains consistent but always produces monotone results.



Figure 3.1. Boxplots of ee's in different settings. The three rows of the 9 panels refer to the pa functions f^(1), f^(2), f^(3) and the three columns refer to networks of three sizes (10,000, 100,000 and 1,000,000 nodes). In each panel the horizontal axis is the degree; the vertical axis gives the value of the preference function and a boxplot of the 1,000 estimates. The ground truth is marked in red.


To sum up, the estimator works as proven in the main Theorem 3.2, but the exact performance depends on the true pa function and the degree of interest: if the true pa function increases slowly with the degree, then it is easier to estimate the preference at low degrees and harder at high degrees, and vice versa.

3.5.1 Sample variance study

We again run 1,000 simulations of trees with the pa functions f^(1), f^(2), f^(3), but now only simulate networks of size 1,000,000. We apply the ee to each simulated network and calculate the sample variance of the 1,000 estimates for each degree k = 1, …, 70. The sample variances are plotted against the degrees in Figure 3.2. Different colors stand for different pa functions: red corresponds to f^(1), green to f^(2) and blue to f^(3). Both axes are on a log scale. Denoting the sample variance of the ee for degree k by s_k, inspection of these plots reveals the following observations:

• It appears that log s_k grows polynomially in log k. For f^(1), the affine pa function, log s_k looks affine in log k.

• The sample variance s_k characterizes, to a certain extent, the difficulty of estimating r_k.

– Looking at small k, we see that in the beginning s_k^(3) < s_k^(2) < s_k^(1).

– Then at about k = 17, the blue line (s^(3)) first crosses the green line (s^(2)), i.e., s_k^(2) < s_k^(3) < s_k^(1).

– The blue line continues on and crosses the red line (s^(1)) at around k = 18. This means s_k^(2) < s_k^(1) < s_k^(3).

– The green line crosses the red line at approximately k = 35, so from that point on s_k^(1) < s_k^(2) < s_k^(3).

• On the one hand, for small k, the slower f grows with k, the easier it is to estimate r_k, reflected in the observation that slower-growing f yields lower sample variance at small k. On the other hand, for large k, the faster f grows in k, the easier it is to estimate r_k.

• The shapes of the curves for different f's seem to indicate that the faster f grows with k, the slower log s_k grows with log k. The seemingly affine relations might be a consequence of the limiting power-law distribution.


The above observations seem intuitive, because faster growth of f means that more nodes of high degree appear in realizations, and more observations of high-degree nodes yield better estimates of the preferences at high degrees. However, as the total number of nodes is fixed, more nodes of high degree means fewer nodes of low degree, which results in larger variance when estimating r_k for small k.

Figure 3.2. Sample variance study of the ee for f^(1), f^(2) and f^(3) (log-log scale, n = 10^6).

3.5.2 Asymptotic normality of the ee with parametric rate?

We may wonder, for fixed k, what the asymptotic distribution of r̂_k(n) is as n → ∞. Perhaps it is asymptotically normal? If not, what else? To answer this question, we study simulation results here.

We fix the number of nodes at one million in all simulated networks for each pa function. Then we look at the ee's in each simulation. For each f, we draw the qq-plot (quantile-quantile plot) of the estimator for k = 2, 3, 4 against the normal distribution. The results are summarized in Figure 3.3. The pa function is the same within each row, and the degree k on which we conduct the ee study is the same within each column. Since the number of nodes is one million, we expect that the limiting distribution should have kicked in, assuming there is indeed a limiting distribution. We see that the qq-plots indicate the ee's look very much like normal distributions.


Figure 3.3. QQ-plots of r̂_2(n), r̂_3(n) and r̂_4(n) with n = 10^6 for f^(1), f^(2) and f^(3).

We suspect that, for fixed k, the following asymptotic normality result holds:

√n (r̂_k(n) − r_k) ⇝ N(0, σ_k²),    (3.19)

where σ_k² depends only on f and k. To see whether this might be correct, we conduct the following study. We fix the pa function to be f^(2), run simulations for the three network sizes 10,000, 100,000 and 1,000,000, and study the estimator of the preference on degree 2 only. If (3.19) is true, then the distribution of r̂_2(n) (where n is the network size) should remain roughly stable after centering and rescaling by √n. We summarize the results in Figure 3.4, where the label "r2 corrected" on the x-axis means that we plot √n (r̂_2(n) − r_2) (centered and rescaled at the parametric rate) instead of r̂_2. The sample variance of each simulation is given on top of each subplot.


Figure 3.4. Histograms of √n (r̂_2(n) − r_2) for f^(2) with network sizes 10^4, 10^5 and 10^6 (sample variances 33.235, 35.886 and 35.056, respectively).

If we perform density estimation on the data √n (r̂_2(n) − r_2) for the three different n's, we obtain Figure 3.5. As the sample variances, histograms and density plots all look rather stable after the √n-rescaling, we conjecture that (3.19) is true.


Figure 3.5. Estimated densities of √n (r̂_2(n) − r_2) for network sizes n = 10^4, 10^5 and 10^6.

