Cover Page The handle http://hdl.handle.net/1887/49012 holds various files of this Leiden University dissertation. Author: Gao, F. Title: Bayes and networks Issue Date: 2017-05-23


Part II

STATISTICAL INFERENCE IN PREFERENTIAL ATTACHMENT NETWORKS


2 INTRODUCTION TO NETWORK SCIENCE AND PREFERENTIAL ATTACHMENT NETWORKS

2.1 network science

2.1.1 The emergence of network science

We live in a connected world embedded in all kinds of networks. We are connected by networks of pipes and transmission wires: the electrical grid and sewage system bring vast modern utilities and convenience to every household. We are connected by networks of cables: the Internet cables enable us to interact with anything on the other side of the world in a single mouse click and have infiltrated every corner of our lives. The highways, railways and airports form a gigantic transportation network in which humans and goods flow seamlessly. The supply chain, through which we purchase and sell everything, is connected by networks of different layers of suppliers and buyers, covering entities from the manufacturers in China to the Albert Heijn next door. More and more, we connect to each other on virtual networks: Facebook connects us to online contacts whom we might never have seen in person. Our money is tied to each other's as well: the banking networks, the networks of mutual and hedge funds, the insurance companies and credit card providers offer easy access to wealth management across the world and (almost) free flow of capital, and spread financial turmoil over the entire globe like dominoes.

Margin note: No man is an island entire of itself … therefore never send to know for whom the bell tolls; it tolls for thee. (John Donne)

Margin note: The author has Facebook friends, whom he simply forgets to remove as friends.

Networks are ubiquitous in nature as well. Our brains, with all the interactions between their different parts, are among the most complex systems that we know of. The structure and function of the brain, which grant us cognitive abilities, depend on the topology in which the highly interconnected neurons form immense neural networks.

Margin note: The complex structure of neural networks has been one of the biggest inspirations for artificial neural networks, in applications such as deep learning.

When a living cell is under the influence of chemical compounds, a network of biochemical reactions converts one compound into another. The growth or decline of the population of one species triggers dynamics in the populations of several other species within the ecological network.

Table 2.1, reproduced from Table 2-1 of [20], illustrates the different categories of networks that we consider.

Margin note: [20] was commissioned by the US Army and stresses military purposes.

Table 2.1. Representative networks. Reproduced from Table 2-1 of [20].

Biological networks
Type of network | Global impact
Disease transmitting networks (HIV, influenza, TB, malaria, cholera) | Spread of disease, epidemics
Ecological networks (food webs, river basin, rain forest) | Survival of selected species; global weather and topography
Metabolic networks | Sustenance of life for a given generation of living entities
Community networks (insect societies, animal herds, bird flocks, schools of fish) | Survival of selected species
Gene expression networks | Transmission and evolution of life between generations

Physical networks
Type of network | Global impact
Distribution grids (electric power, water supply, business supply chains) | Efficient distribution of goods or commodities
Telecommunications infrastructure (cellular, PSTN, cable TV, Internet) | Instantaneous worldwide information distribution
DOD global information grid (sensors, communications, and weapons) | Network-centric warfare and network-enabled operations
Transportation networks (airports, highways, railways, shipping) | Rapid movement of goods from supplier to market; modern travel
Electronic financial transaction networks (banking, credit cards, ATMs) | Electronic cashless transactions

Social networks
Type of network | Global impact
Affiliation/acquaintance networks (terrorist, community, business, religious, clubs) | Efficient collaboration and activity coordination
Broadcast networks (radio, TV networks) | Dissemination of identical information to large groups
Information exchange networks (U.S. mail, local and long-distance telephone service) | Cheap, convenient long distance pair-wise communications
Group forming networks (eBay, corporate intranets) | Easy, convenient formation of groups of like-minded people who have never met
Supply chains and business networks | Coordination of multiple players to achieve common goals, global cost reduction
Social service networks (Social Security, family services, Medicare, Medicaid) | Efficient delivery of government services to large, distributed constituencies

The inception and flourishing of network science are the consequence of the desire to understand, predict and manipulate the complex systems that modern society and science have exposed to us. Complex networks are representations of the complex systems that we wish to study, and network science is, by all means, the science of the complex systems beneath such networks. Science has come to realize that the study of pairwise relations between two or a handful of objects is no longer sufficient. For instance, the simple fact that somebody has lung cancer might result from countless factors, and the conventional way of studying it, such as asking whether smoking or the individual's genetics causes cancer, cannot adequately explain the complex reasons that actually cause the cancer. Rather, the study of many factors that interact and interfere with each other and function as a whole in the form of complex systems thrives. The social hierarchy has become flatter and more decentralized, which renders a few dominating individuals (e.g. kings, presidents or CEOs) irrelevant. Instead, the collective thinking and behavior of the populace, for which we build social networks, is more and more the factor that determines the outcomes of events.

Margin note: Graphical models, Bayes network.

Margin note: In the author's opinion, decentralization is the social trend that will reshape society even further.

Given the omnipresence of networks, the demand for network science, from military strategists to logistic planners and from the directors of funding agencies to urban designers, is strong and getting stronger.

Let us look at the words in "complex network" one by one. Network is a fancy name for the graph that represents the underlying complex system. Graphs are mathematical structures that model pairwise relations between any number of objects. A graph 𝐺 can be defined as ⟨𝑉, 𝐸⟩, consisting of a set of vertices/nodes 𝑉 and a set of edges 𝐸. Vertices can be any concrete or abstract objects, ranging from cities to IP addresses, from mathematicians to different scientific disciplines. The edges are lines/segments between vertices, representing some relationship between them. If the vertices represent cities, then the presence or absence of an edge may represent whether or not there is a railway connection between the two cities. Or if the vertices represent mathematicians, an edge can represent whether two of them have written a paper together. But why complex? Complex means big and interconnected.

Margin note: complex \käm-ˈpleks, kəm-ˈ, ˈkäm-ˌ\ adjective: having parts that connect or go together in complicated ways; not easy to understand or explain; not simple (Merriam-Webster).

Unlike classical graphs, modern networks exhibit patterns between nodes and have inherent topological structures. By the definition of the United States National Research Council in [20], network science is the study of network representations of physical, biological and social phenomena leading to predictive models of these phenomena. It is a scientific discipline devoted to the study of the structure, dynamics and behavior of complex networks, so as to predict the networks and even manipulate them.

Modern network science gets its basic framework from mathematical graph theory. Leonhard Euler is believed to have written the first paper in the history of graph theory, in order to solve the problem of the Seven Bridges of Königsberg. The great invention of graph theory manifests the extraordinary beauty of mathematics: it extracts the most essential elements of our concern and alleviates the pain of getting lost in seemingly endless circumstantial and peripheral matters. The diagrams in Figure 2.1 (a)-(c) sketch the process of abstraction. The geography is no longer relevant; only the topology, in the form of a graph, matters. The celebrated solution to the seven-bridge problem is simple, yet powerful and universal.

Margin note: Königsberger Brückenproblem.

Margin note: If in other sciences we should arrive at certainty without doubt and truth without error, it behooves us to place the foundations of knowledge in mathematics. (Roger Bacon)

Figure 2.1. Abstraction of the problem of the Seven Bridges of Königsberg: (a) original map of Königsberg; (b) first-step abstraction; (c) graph representation.

Margin note: The figures are taken from the Wikipedia page of the Seven Bridges of Königsberg and are distributed under the GNU free license.

Probabilistic methods, however, were introduced to graph theory much later, by Paul Erdős and Alfréd Rényi (and independently by Edgar Gilbert) in their famous studies of the so-called Erdős–Rényi (ER) models. Suppose there are 𝑛 nodes and, for every possible pair of nodes, run an independent Bernoulli trial with success probability 𝑝 to decide whether the edge between the pair exists. Then we obtain a random graph whose support is the set of all possible graphs on the 𝑛 nodes. The ER models are simple, hence mathematically tractable, yet mathematically interesting objects. The studies of these objects laid the foundation of the discipline of random graph theory and inspired an explosion of literature on random graphs. These investigations later came to play an essential role as the backbone of the modeling approach of network science.

Margin note: The author's Erdős number is 4 because Aad van der Vaart has 3.
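The construction of an Erdős–Rényi graph described above can be sketched in a few lines of Python. This is only a minimal illustration: the function name `sample_erdos_renyi`, the node labels 0, ..., n-1 and the chosen values of n and p are assumptions of the sketch, not part of the original text.

```python
import random
from itertools import combinations


def sample_erdos_renyi(n, p, rng):
    """Return the edge list of one draw of an ER graph on nodes 0..n-1."""
    # One independent trial with success probability p per unordered pair.
    return [(u, v) for u, v in combinations(range(n), 2) if rng.random() < p]


rng = random.Random(0)  # fixed seed so that the sketch is reproducible
n, p = 200, 0.05        # illustrative values only
edges = sample_erdos_renyi(n, p, rng)

# The number of edges is Binomial(n(n-1)/2, p), so its mean is p*n*(n-1)/2.
expected_edges = p * n * (n - 1) / 2
print(len(edges), expected_edges)
```

Looping over all pairs costs of order n² operations, which is fine for illustration but would be replaced by a smarter sampler for large sparse graphs.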

Despite the fact that graph theory has been around for more than 200 years and random graph theory for half a century, network science as a distinct discipline emerged only after two foundational papers [11, 78] revolutionized people's perspectives. In a 1998 Nature letter, Watts and Strogatz [78] coined the term small-world and reinvigorated the six-degree-separation theory that sociologists had developed back in the 1920s and 1960s. They introduced a random graph model combining global interaction and local clustering, which offers a possible explanation of the so-called small-world phenomena. Shortly afterwards, in 1999, another famous paper, by Barabási and Albert [11], discovered the so-called scale-free property in the topology of the Internet and proposed a beautiful dynamic mathematical model for such scale-freeness. In their model, the nodes arrive one by one and each new node attaches preferentially to existing nodes of higher degree. Their discovery led to a series of findings of similar patterns in networks from various scientific disciplines, from protein networks to coauthorship networks. These two papers have since incubated the prominence of network science.

Margin note: In particular, the World Wide Web (WWW).

The success of network science depends largely upon the marriage of random graph theory and complex systems. Random graph theory provides a sound framework for scientists across a wide range of disciplines to work in. Within this framework, the quantification and validation of models become (mathematically) concrete, and the fundamental mechanisms responsible for the systems in question can be investigated. As one of the leaders in network science, Albert-László Barabási, proclaimed in [10], "Data-based mathematical models of complex systems are offering a fresh perspective, rapidly developing into a new discipline: network science."

2.1.2 Fundamentals of graph theory

A graph 𝐺 is an ordered pair ⟨𝑉, 𝐸⟩ consisting of a node/vertex set 𝑉 and an edge set 𝐸, where each edge 𝑒 ∈ 𝐸 associates two nodes from 𝑉, called its endpoints. An edge 𝑒 = (𝑣₁, 𝑣₂) indicates that there is an edge going from 𝑣₁ to 𝑣₂, and we write 𝑣₁ → 𝑣₂. If both (𝑣₁, 𝑣₂) and (𝑣₂, 𝑣₁) are in 𝐸, then the edge (𝑣₁, 𝑣₂) = (𝑣₂, 𝑣₁) is called undirected. If (𝑣₁, 𝑣₂) ∈ 𝐸 and (𝑣₂, 𝑣₁) ∉ 𝐸, then the edge (𝑣₁, 𝑣₂) is called directed. A loop is an edge with identical endpoints. Multiple edges are edges sharing the same pair of endpoints. A simple graph has neither loops nor multiple edges.

Margin note: Node is the terminology in computer science and vertex in mathematics. Both terms are used interchangeably in this dissertation.

A walk 𝑤 is a list 𝑤 = (𝑣₀, 𝑒₁, 𝑣₁, … , 𝑒_𝑘, 𝑣_𝑘) of nodes and edges such that, for 1 ≤ 𝑚 ≤ 𝑘, the edge 𝑒_𝑚 has endpoints 𝑣_{𝑚−1} and 𝑣_𝑚. A path is a walk with no repeated nodes or edges. If the graph is simple, then a walk is uniquely defined by its edges, and its length is defined as the number of edges of the walk. If there exists a walk from 𝑣₁ to 𝑣₂, then we write 𝑣₁ ⇝ 𝑣₂. If both 𝑣₁ ⇝ 𝑣₂ and 𝑣₂ ⇝ 𝑣₁, we write 𝑣₁ ↭ 𝑣₂. The distance from 𝑣₁ to 𝑣₂ (𝑣₁ ≠ 𝑣₂) is the smallest length of all the walks from 𝑣₁ to 𝑣₂, and the distance is defined as 0 if 𝑣₁ = 𝑣₂. If no such walk exists, we define the distance to be ∞. The diameter of 𝐺 = ⟨𝑉, 𝐸⟩ is the largest distance between any two nodes in 𝑉.
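The definitions of distance and diameter can be made concrete with a breadth-first search. The following is a minimal sketch, assuming the graph is stored as a dictionary mapping each node to the list of nodes it has an edge to; the names `distances_from` and `diameter` and the two example graphs are hypothetical and chosen only for illustration.

```python
import math
from collections import deque


def distances_from(adj, source):
    """Distance (smallest number of edges on a walk) from source to every node."""
    dist = {v: math.inf for v in adj}  # infinity when no walk exists
    dist[source] = 0
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if dist[w] == math.inf:    # first visit gives the shortest walk
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def diameter(adj):
    """Largest distance between any two nodes of the graph."""
    return max(max(distances_from(adj, s).values()) for s in adj)


# Undirected path 0 - 1 - 2 - 3: every edge is listed in both directions.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
# The same path with an isolated extra node 4.
path_plus_isolated = {**path, 4: []}

print(distances_from(path, 0))       # distances 0, 1, 2, 3
print(diameter(path))                # 3
print(diameter(path_plus_isolated))  # inf, because node 4 is unreachable
```

For a directed graph the same code applies if `adj[u]` lists only the nodes 𝑣 with 𝑢 → 𝑣.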


2.1.3 Properties of typical networks

In spite of the variety and the dramatically different origins of networks, there seem to be several universal patterns in complex networks that scientists find across different disciplines.

Sparsity

In a simple network of 𝑛 nodes, the maximal number of edges is 𝑛(𝑛 − 1)/2, and the maximal sum of the degrees of all nodes is twice that. However, if we sum the degrees of all nodes in a typical real-world network, the result is most likely much smaller. In big-𝑂 and small-𝑜 notation, the total number of edges is at most of order 𝑂(𝑛²), whereas in a typical real-world network it is of order 𝑜(𝑛²).

Small world

This is perhaps the most popularized network property, thanks to the 1990 play Six Degrees of Separation by John Guare. The six-degree-separation theory says that if you build a social network of the entire population of the US (or of the entire world), in which edges represent acquaintances (two nodes are connected if they know each other), then typically anyone in the network can reach anyone else through only 6 different nodes, even with a world population of 7 billion. Hence the world is small. Mathematically, the term small-world entails that the typical distance and the diameter of the network are small. Consider a network of size 𝑛, in which the distance between two nodes is defined as the smallest number of nodes that one node needs to pass through to reach the other. The typical distance is the average distance over all possible pairs of vertices. The typical distance is small, as it is usually of order 𝑂(log 𝑛).

Margin note: For many applied mathematicians, log 𝑛 is bounded (and log log 𝑛 is constant).

Scale-freeness

As the name suggests, scale-freeness means that such networks have similar topological structures regardless of the scale at which the observer looks at them; hence they are free from scale. Quantitatively, the term says that the degree distributions of such networks follow power laws.

Suppose 𝑃_𝑘 is the proportion of nodes of degree 𝑘 in the network. We say that (𝑃_𝑘)_{𝑘=1}^∞ satisfies a power-law distribution (also known as a Pareto distribution) if

𝑃_𝑘 ∝ 𝑘^{−𝜏}

for some power-law exponent 𝜏. The term power law ([68]) is closely connected to the Matthew effect ([53, 54]), Zipf's law ([81]), Benford's law and the Pareto principle. The Matthew effect is a term in sociology that describes the phenomenon where the rich get richer and the poor get poorer. The Pareto principle, or 80/20 principle, states that in many fields 80% of the outcomes come from 20% of the participants; for example, 80% of the national income is taken by 20% of the population, or 80% of the (important) papers are published by 20% of the scientists. The principle is by nature connected to the accumulation of something (capital in the case of the national income, knowledge in the case of papers). Scale-free networks (or related power laws) arise in various distinct disciplines. It is natural to wonder how to explain such universality, preferably (for mathematicians) by means of concrete mathematical models and proofs.

Margin note: For whosoever hath, to him shall be given, and he shall have more abundance: but whosoever hath not, from him shall be taken away even that he hath. (Matthew 13:12)
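A power law 𝑃_𝑘 ∝ 𝑘^{−𝜏} is a straight line of slope −𝜏 on a log-log plot, which suggests a crude diagnostic for a degree sequence. The sketch below computes the proportions 𝑃_𝑘 and fits that slope by least squares. It is an illustration only: the function names, the truncation at degree 50 and the value 𝜏 = 2.5 are assumptions, and a log-log regression is not the estimator studied in this dissertation.

```python
import math
from collections import Counter


def degree_proportions(degrees):
    """Map each observed degree k to P_k, the proportion of nodes of degree k."""
    counts = Counter(degrees)
    n = len(degrees)
    return {k: c / n for k, c in sorted(counts.items())}


def loglog_slope(pk):
    """Least-squares slope of log P_k against log k."""
    xs = [math.log(k) for k in pk]
    ys = [math.log(p) for p in pk.values()]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Exact (truncated) power law with exponent tau on degrees 1..50.
tau = 2.5
weights = {k: k ** -tau for k in range(1, 51)}
total = sum(weights.values())
exact_pk = {k: w / total for k, w in weights.items()}

print(degree_proportions([1, 1, 2, 3]))  # proportions 0.5, 0.25, 0.25
print(-loglog_slope(exact_pk))           # recovers tau up to rounding
```

On real degree data the tail proportions are noisy, so the fitted slope should be read as a rough indication rather than an estimate with guarantees.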

2.2 preferential attachment networks

The focal point of this part of the dissertation is the family of preferential attachment (PA) network models. In this section we give an overall introduction to these models. As the dissertation is essentially a collection of papers that the author worked on during his PhD research, the respective chapters also have their own introductions.

2.2.1 History and motivation of the PA networks

There are many interesting random graph models (see [38] for more information). Probabilists and physicists have been studying the Erdős–Rényi model, the configuration model and exponential random graphs; among statisticians, stochastic block models have attracted particular attention for their widespread applications in community detection. So why do we study PA models?

First, most random graph models are static. Some classical results on Erdős–Rényi models are of a somewhat dynamic nature. Suppose 𝐺(𝑛, 𝑝) is a random graph on 𝑛 nodes in which each edge appears independently with probability 𝑝. Take 𝑝 = 𝑐/𝑛 for some constant 𝑐. Then, as 𝑛 → ∞, we observe the following famous classical phase transition results for the connected components (see [38, Chapter 4] for more details):

𝑐 < 1 There is no giant connected component and the size of the biggest component is at most 𝑂(log 𝑛).

𝑐 > 1 There is one single giant component of size 𝛽𝑛 for some constant 0 < 𝛽 < 1. All other components have size 𝑂(log 𝑛).

𝑐 = 1 There is no giant connected component and the largest component has size 𝑂(𝑛^{2/3}). There is more than one connected component of size of order 𝑛^{2/3}.
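The three regimes can be observed in a small simulation. The following sketch samples 𝐺(𝑛, 𝑐/𝑛) for 𝑐 = 0.5, 1 and 2 and reports the size of the largest connected component, tracked with a union-find structure. The function name, the seed and the value 𝑛 = 1000 are assumptions made for illustration; a single finite-𝑛 draw can only hint at the asymptotic statements above.

```python
import random
from itertools import combinations


def largest_component_size(n, p, rng):
    """Sample G(n, p) and return the size of its largest connected component."""
    parent = list(range(n))  # union-find forest over the nodes
    size = [1] * n           # component size, valid at the roots

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in combinations(range(n), 2):
        if rng.random() < p:               # the edge {u, v} is present
            ru, rv = find(u), find(v)
            if ru != rv:
                if size[ru] < size[rv]:
                    ru, rv = rv, ru
                parent[rv] = ru            # merge the smaller into the larger
                size[ru] += size[rv]
    return max(size[find(v)] for v in range(n))


rng = random.Random(1)  # fixed seed for reproducibility
n = 1000
sizes = {c: largest_component_size(n, c / n, rng) for c in (0.5, 1.0, 2.0)}
print(sizes)
```

For 𝑐 = 2 the largest component should contain a positive fraction of the nodes, while for 𝑐 = 0.5 it should stay far smaller.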

The exact meaning of the above conclusions is beyond the scope here, but the objects of the above study do not form a truly dynamic random graph, even though the results come in the disguise of an asymptotic formulation. Given 𝐺(𝑛, 𝑐/𝑛), we cannot build 𝐺(𝑛 + 1, 𝑐/(𝑛 + 1)) by simply adding a node, because the probability of an edge existing between any two nodes has changed. Either we construct 𝐺(𝑛 + 1, 𝑐/(𝑛 + 1)) from scratch, in which case it has nothing to do with 𝐺(𝑛, 𝑐/𝑛), or we first remove each edge independently with probability 1/(𝑛 + 1), which reduces the probability of an edge existing between any two nodes from 𝑐/𝑛 to 𝑐/(𝑛 + 1), and then add a node that has an edge to each of the 𝑛 existing nodes independently with probability 𝑐/(𝑛 + 1). However, this is not good enough in plenty of real-world applications. It is often natural to assume that the same network evolves over time. In social networks, the dynamics come from newcomers joining as nodes (with new edges between the new nodes and the existing nodes). New edges might appear among the existing nodes (two existing nodes establishing contact) and existing edges might also disappear. Through the dynamics brought about by events happening in the network, the same network changes. It would be useful to model the dynamic nature of such real-world networks.

The second reason has to do with the need to explain scale-freeness. Despite the ubiquity of power laws, there have been only two generative mathematical approaches capable of producing power-law degree distributions in networks. One is to impose a power law somewhere else in order to produce a power-law degree distribution (see, for example, [22, 45]). Deijfen, Hofstad, and Hooghiemstra [22] impose a condition on the weight-generating mechanism: each vertex 𝑥 in ℤ^𝑑 gets a weight 𝑊_𝑥, drawn independently from a power-law distribution satisfying ℙ(𝑊 > 𝑤) = 𝑤^{−(𝜏−1)} for 𝑤 ≥ 1 and some exponent 𝜏 > 1. For any two points 𝑥, 𝑦 ∈ ℤ^𝑑 with weights 𝑊_𝑥 and 𝑊_𝑦, an independent Bernoulli trial decides whether or not there is an edge between them. The probability that there is an edge between them is 𝑝_{𝑥𝑦} = 1 − exp(−𝜆𝑊_𝑥𝑊_𝑦/|𝑥 − 𝑦|^𝛼), for some parameters 𝜆, 𝛼 > 0. Then the degree distribution of the vertices of ℤ^𝑑 follows a power law, in the sense that ℙ(𝐷₀ > 𝑠) = 𝑠^{−𝛾}𝑙(𝑠), where 𝑠 ↦ 𝑙(𝑠) is slowly varying at infinity and 𝛾 = 𝛼(𝜏 − 1)/𝑑 > 1. Essentially, in this sort of model, a power law in one place results in another power law somewhere else. The only other approach to obtaining power-law degree distributions is through the preferential attachment paradigm.

Even though, from a mathematical point of view, Barabási and Albert [11] did not even define their terminology well, their paper was the first success in providing a possible answer for the prevalence of power laws and has been cited more than 26,000 times so far, which is a feat in itself. It is worth pointing out that the idea of preferential attachment did not originate in [11]. Yule [79] proposed a preferential-attachment-type model in his study of the evolution of species. Simon [73] gave a modern version, building on Yule [79], for the modeling of word frequencies. Simon [73] assumes a stochastic model for the incoming words: each incoming word has a constant probability of being new, i.e., a word that has never appeared before, and the probability that the incoming word is a given word that has already appeared is proportional to the number of times that word has appeared. The formulation of this model has largely to do with the study of word frequencies, as in Zipf's law. Suppose we have a text of 𝑘 words (𝑤_𝑖)_{𝑖=1}^𝑘, that there are 𝑚 distinct words (𝜏_𝑖)_{𝑖=1}^𝑚 in the text, and that the 𝑖-th distinct word appears 𝑛_𝑖 times in the text. Conditioned on the 𝑘 words that have already appeared, we assume that the (𝑘 + 1)-st word is generated with the following probabilistic structure:

\[
\mathbb{P}(w_{k+1} = \tau_j \mid w_1, \ldots, w_k) =
\begin{cases}
\alpha & \text{if } j = m + 1, \\
\dfrac{1 - \alpha}{k}\, n_j & \text{if } j \le m,
\end{cases}
\tag{2.1}
\]

where 𝜏_{𝑚+1} is the (𝑚 + 1)-st word (different from any word before).

Margin note: The citation number for [11] on Google Scholar was 26382 on 14 November 2016.
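The word-generation rule (2.1) is straightforward to simulate. The sketch below is a minimal illustration under two assumptions that are not in the text: the very first word is treated as new (the formula is undefined for an empty text), and distinct words are labelled by the integers 0, 1, 2, ... in order of first appearance. The function name `simulate_simon` is hypothetical.

```python
import random


def simulate_simon(k, alpha, rng):
    """Generate a text of k >= 1 words and return the counts n_1, ..., n_m."""
    counts = [1]  # assumption of this sketch: the first word is new
    tokens = [0]  # label of the distinct word at each position of the text
    while len(tokens) < k:
        if rng.random() < alpha:
            # New word, chosen with probability alpha.
            counts.append(1)
            tokens.append(len(counts) - 1)
        else:
            # Copy a uniformly chosen earlier position: distinct word j is
            # picked with probability n_j / (current length), as in (2.1).
            j = rng.choice(tokens)
            counts[j] += 1
            tokens.append(j)
    return counts


rng = random.Random(0)
counts = simulate_simon(10_000, 0.1, rng)
print(len(counts), sorted(counts, reverse=True)[:5])
```

Copying a uniformly chosen earlier word is one convenient way to realize the proportional-to-count rule without computing the weights explicitly.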

Surprisingly, this is also where the author sees a connection between two completely different lines of work: non-parametric Bayes and network science. The Chinese restaurant process, despite its dubious name, is an important tool in non-parametric Bayes for applications in machine learning, image analysis and document collections (see [15]). Imagine a massive Chinese restaurant with infinitely many tables, each of which can host infinitely many guests. Each table in the model represents a cluster/feature in the original problem. Suppose there are 𝑘 guests sitting at a total of 𝑚 tables, the 𝑖-th table has 𝑛_𝑖 guests (∑_{𝑖=1}^𝑚 𝑛_𝑖 = 𝑘) and the 𝑙-th guest is sitting at table 𝑇_𝑙 (the first guest always sits at the first table). Suppose the (𝑘 + 1)-st guest comes in and needs to make a random decision on the table 𝑇_{𝑘+1} to take. Then

\[
\mathbb{P}(T_{k+1} = j \mid T_1, \ldots, T_k) =
\begin{cases}
\dfrac{\beta}{k + \beta} & \text{if } j = m + 1, \\
\dfrac{n_j}{k + \beta} & \text{if } j \le m,
\end{cases}
\tag{2.2}
\]

where 𝛽 is some parameter and 𝑗 = 𝑚 + 1 means that the (𝑘 + 1)-st guest picks the first unoccupied table. If we think of the guests as incoming words and of the tables as existing words, then with a bit of imagination we may see that (2.1) gives a primitive version of (2.2).

Margin note: There is also the Indian buffet process.
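The seating rule (2.2) can be simulated along the same lines. The following is a minimal sketch assuming at least one guest and 𝛽 ≥ 0; the function name `simulate_crp` and the parameter values are illustrative choices, not part of the original text.

```python
import random


def simulate_crp(k, beta, rng):
    """Seat k >= 1 guests and return the table sizes n_1, ..., n_m."""
    counts = [1]  # the first guest always sits at the first table
    for seated in range(1, k):
        # Total weight: beta for a new table plus n_j for each occupied table.
        u = rng.random() * (seated + beta)
        if u < beta:
            counts.append(1)  # first unoccupied table, probability beta/(k+beta)
            continue
        u -= beta
        for j, n_j in enumerate(counts):
            if u < n_j:       # table j, probability n_j/(k+beta)
                counts[j] += 1
                break
            u -= n_j
        else:
            counts[-1] += 1   # guard against floating-point rounding
    return counts


rng = random.Random(0)
tables = simulate_crp(10_000, 2.0, rng)
print(len(tables), sorted(tables, reverse=True)[:5])
```

Compared with (2.1), the probability of opening a new table decreases as more guests arrive, while the rule for joining an occupied table is again proportional to its current size.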

2.2.2 A rather general PA model

Here we first introduce a more general preferential attachment model. We borrow the notation mostly from [38], which the author also recommends for further details on the special case of linear PA models with fixed initial degrees. As these are the main objects studied in the dissertation, we give the version that allows multiple edges but disallows self-loops; the meaning of this will become clear in a moment.

Suppose that in the end we produce a graph sequence {PA_𝑡(𝑊, 𝑓)}_{𝑡=1}^∞, with 𝑊 a probability measure on ℕ and 𝑓 a monotone map from ℕ to ℝ⁺. 𝑊 is the distribution of the random initial degrees and can be degenerate, with all its mass on a certain positive integer. Let {𝑚_𝑡}_{𝑡=1}^∞ be an iid sequence generated from 𝑊. For every 𝑡 ∈ ℕ, PA_𝑡(𝑊, 𝑓) is a graph with nodes 𝑉_𝑡 = {𝑣₀, 𝑣₁, … , 𝑣_𝑡}, so |𝑉_𝑡| = 𝑡 + 1. PA₁(𝑊, 𝑓) consists of the two nodes 𝑣₀ and 𝑣₁, connected to each other by 𝑚₁ edges. We then define the preferential attachment scheme recursively for 𝑡 ≥ 2. Given PA_{𝑡−1}(𝑊, 𝑓), a new node 𝑣_𝑡 comes in with 𝑚_𝑡 edges to be connected to 𝑉_{𝑡−1}. The nodes in 𝑉_{𝑡−1} to connect to are chosen by a preferential attachment rule with intermediate updating. Define the intermediate steps PA_{𝑡,𝑖}(𝑊, 𝑓) for 0 ≤ 𝑖 ≤ 𝑚_𝑡 by initializing PA_{𝑡,0}(𝑊, 𝑓) to be PA_{𝑡−1}(𝑊, 𝑓). For 1 ≤ 𝑖 ≤ 𝑚_𝑡, PA_{𝑡,𝑖}(𝑊, 𝑓) is constructed from PA_{𝑡,𝑖−1}(𝑊, 𝑓) by adding an extra edge between the incoming node 𝑣_𝑡 and a randomly chosen node in 𝑉_{𝑡−1}. The probability of choosing 𝑣_𝑗, with intermediate degree Deg_{𝑡,𝑖−1}(𝑣_𝑗) (the degree of node 𝑣_𝑗 in PA_{𝑡,𝑖−1}(𝑊, 𝑓)), is proportional to 𝑓(Deg_{𝑡,𝑖−1}(𝑣_𝑗)), that is,

\[
\mathbb{P}\bigl(v_{t,i} \to v_j \mid \mathrm{PA}_{t,i-1}(W, f)\bigr)
= \frac{f\bigl(\mathrm{Deg}_{t,i-1}(v_j)\bigr)}{\sum_{v \in V_{t-1}} f\bigl(\mathrm{Deg}_{t,i-1}(v)\bigr)},
\tag{2.3}
\]

where 𝑣_{𝑡,𝑖} → 𝑣_𝑗 indicates that 𝑣_𝑡 picks 𝑣_𝑗 to connect to in the intermediate (𝑡, 𝑖)-step. After all the 𝑚_𝑡 steps are done, we arrive at PA_{𝑡,𝑚_𝑡}(𝑊, 𝑓), with 𝑣_𝑡 integrated into the network, and we define PA_𝑡(𝑊, 𝑓) to be PA_{𝑡,𝑚_𝑡}(𝑊, 𝑓). Then the recursion can start again for the (𝑡 + 1)-st node 𝑣_{𝑡+1}.
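The recursive construction with intermediate updating can be sketched as follows. This is a minimal illustration rather than the author's implementation: the names `simulate_pa` and `draw_m`, the choice to label the two initial nodes 0 and 1, and the decision to store only the degree sequence (not the edges) are assumptions of the sketch. The function `draw_m` stands in for sampling an initial degree from 𝑊, and `f` must be positive on the degrees that occur.

```python
import random


def simulate_pa(t_max, f, draw_m, rng):
    """Return the degrees of nodes 0..t_max after t_max steps of the PA scheme."""
    m_1 = draw_m(rng)
    deg = [m_1, m_1]           # two initial nodes joined by m_1 edges
    for t in range(2, t_max + 1):
        m_t = draw_m(rng)      # initial degree of the incoming node
        deg.append(0)          # node t enters with no edges attached yet
        for _ in range(m_t):
            # Weights use the intermediate degrees of the existing nodes
            # 0..t-1 only, so self-loops are impossible; multiple edges are not.
            weights = [f(d) for d in deg[:t]]
            j = rng.choices(range(t), weights=weights)[0]
            deg[j] += 1
            deg[t] += 1
    return deg


rng = random.Random(0)
t_max = 500
delta = 0.5                    # illustrative value for an affine f(k) = k + delta
degrees = simulate_pa(t_max, lambda k: k + delta, lambda r: 1, rng)
print(max(degrees), sum(degrees))
```

Recomputing the weights before every edge is what "intermediate updating" amounts to; it makes the sketch quadratic in the number of nodes, which is acceptable for small experiments.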


2.2.3 The linear PA models with random initial degrees

This model is obtained by taking 𝑓 in (2.3) to be an affine function,

𝑓(𝑘) = 𝑘 + 𝛿,

where 𝛿 is some parameter. Chapter 4 deals with the problem of estimating 𝛿 by proposing a maximum likelihood estimator and proving its asymptotic efficiency.

Barabási and Albert [11] gave the first modern formulation of the PA models. We get (more or less) the same model by specifying 𝑊 to be a degenerate measure with all its mass on 1, which means that every node comes with only one edge to connect, and 𝑓(𝑘) = 𝑘.

Margin note: There is a slight difference in the initial configuration.

2.2.4 The general sublinear PA models

Take 𝑊 in {PA_𝑡(𝑊, 𝑓)}_{𝑡=1}^∞ to be a degenerate measure with all its mass on 1, i.e., ℙ(𝑊 = 1) = 1, and 𝑓 to be a rather general function mapping ℕ to ℝ⁺ that grows no faster than an affine function. To be precise, there exists a positive constant 𝑀, not depending on 𝑘, such that 𝑓(𝑘) ≤ 𝑀𝑘 for all 𝑘 ∈ ℕ. Chapter 3 presents the empirical estimator for the estimation of 𝑓 in this setting. The insight into why the empirical estimator works has its roots in the classical theory of branching processes, which we exploit in the proof of Theorem 3.2.

2.2.5 The general sublinear parametric PA models

Let the initial degree distribution be degenerate at 1 and let (𝑓_𝜃, 𝜃 ∈ 𝛩 ⊂ ℝ^𝑑) be a parametric family. We impose some conditions on 𝑓_𝜃 uniformly in 𝜃 ∈ 𝛩, which will be specified in Chapter 5. Similarly to Chapter 4, we propose a maximum likelihood estimator (MLE) of the parameter 𝜃 that determines the preferential attachment function. Proving that the MLE works in this case requires a combination of techniques from Chapters 3 and 4. We build upon the classical results on supercritical Crump-Mode-Jagers processes and then apply the martingale central limit theorem to prove the desired asymptotic normality of the MLE.
