The Small World Phenomenon
Sjoerd Janse
February 5, 2016
Bachelor thesis Supervisor: Dr. Sonja Cox
Abstract
In this thesis we present the result known as the small world phenomenon, also referred to as six degrees of separation. We first simplify the problem to the random Erdős-Rényi graphs G(n, p_n) with n points, where between any pair of points an edge exists with probability p_n. We define p_n by

p_n = n^α/(n − 1),

where α ∈ (0, 1/2) is a fixed parameter. For this simplified problem we prove probabilistic asymptotic upper and lower bounds for the diameter D(G(n, p_n)). More precisely, setting M = ⌈1/(2α)⌉ we prove:

lim_{n→∞} P(D(G(n, p_n)) ∈ {2M − 3, 2M − 2, 2M − 1, 2M, 2M + 1}) = 1.
In order to obtain this result we introduce for every u ∈ G(n, p_n) the subgraphs Γ_1(u), Γ_2(u), . . . , Γ_M(u) of G(n, p_n), the so-called neighbourhoods, and we denote their sizes by d_1(u), . . . , d_M(u), called the neighbourhood sizes. We work towards Lemma 4.4, which provides, for i ∈ {1, . . . , M}, bounds on the neighbourhood size d_i(u) conditioned on the sizes d_1(u), . . . , d_{i−1}(u). With this as a tool we prove the asymptotic probabilistic upper bound 2M + 1.
For the lower bound of the diameter we start by stating a theorem on the probability that two randomly chosen sets of the same size intersect, which we apply to the probability that two neighbourhoods Γ_i(u) and Γ_i(v) have a nonempty intersection. After some calculations this yields the probabilistic asymptotic lower bound 2M − 3.
Finally, the Strogatz-Watts graphs are introduced as the link between the proven limit and the small world phenomenon. For the Strogatz-Watts graphs on n points we formulate a similar theorem bounding the diameter, which reduces it to a constant times log n.
Title: The Small World Phenomenon
Author: Sjoerd Janse, sjoerd.janse@gmail.com, UvA student number: 10246134
Supervisor: Dr. Sonja Cox
Second examiner: Dr. Peter Spreij
Date: February 5, 2016
Korteweg-de Vries Institute for Mathematics, University of Amsterdam
Science Park 904, 1098 XH Amsterdam
http://www.science.uva.nl/math
Contents
1 Preface
2 Graphs
  2.1 Graphs
  2.2 Erdős-Rényi graphs
3 The main theorem
  3.1 Approximation limits
  3.2 Limiting the diameter
4 Upper bound
  4.1 The neighbourhood size
    4.1.1 Applying the Chernoff bound
    4.1.2 Finishing the proof
  4.2 Upper bound theorem
5 Lower bound
  5.1 Empty sets
  5.2 Nonempty sets
  5.3 Lower bound theorem
  5.4 Concluding
6 Small diameter
  6.1 Strogatz-Watts graph
7 Popular summary
1 Preface
"Almost all of us have had the experience of encountering someone far from home who, to our surprise, turns out to share a mutual acquaintance with us. This kind of experience occurs with sufficient frequency so that our language even provides a cliché to be uttered at the appropriate moment of recognizing mutual acquaintances. We say, 'My, it's a small world.'" - Stanley Milgram [1]

The idea of the small world phenomenon started with Stanley Milgram, who lived from 15 August 1933 to 20 December 1984. At the age of 34 he decided to run an experiment: he wanted packages, starting in Omaha, Nebraska, to reach a stockbroker in Boston, Massachusetts. The only rule of the game was that a participant was not allowed to send the package directly to the stockbroker, but only to a person he or she knew on a first-name basis. So Milgram sent 160 packages to starting persons with the instruction to forward the package, under the same instructions, to someone they knew by first name, with the goal of eventually reaching the stockbroker.
Within four days the first package had already reached the stockbroker via only two acquaintances. This was highly exceptional, but the real surprise was that the packages that did reach the stockbroker travelled through a mean of five acquaintances [2].

Of course this was not a very large group, but the idea was set. So in 2008 Microsoft decided to run a test on the data collected from their Messenger service, covering 180 million people. It turned out that the average chain between users had length 6.6 [4]. This did not exclude user chains of up to 29 people, but it certainly confirmed Milgram's experiment. Nowadays this behaviour of networks is known as the small world phenomenon, with the associated result called six degrees of separation: the chain of acquaintances between two people has an average length of about six.
Mathematically this phenomenon is proven via probability theory on the diameter of graphs, where conditionally on the neighbourhoods Γ_1(u), Γ_2(u), . . . , Γ_{i−1}(u) one can say something about Γ_i(u). Moez Draief and Laurent Massoulié provide a compact proof in Chapter 6 of their Epidemics and Rumours in Complex Networks [3]. Although I only mention that result at the end of this thesis, I did work fully through Chapter 4 of [3], which deals with an easier problem regarding the more random, though simpler, Erdős-Rényi graphs. I have tried to rewrite this at a level that students in the last phase of their bachelor's will understand.
2 Graphs
We begin this chapter with definitions concerning graphs and then introduce the Erdős-Rényi graphs.
2.1 Graphs
Definition 2.1. A graph G = (V, E) is a collection of vertices V = {v_1, v_2, . . . , v_n} and edges E ⊆ V × V, the interpretation being that (v_i, v_j) ∈ E if and only if there is an edge between two distinct points v_i, v_j ∈ V.
For graphs we can define the diameter, which is defined by means of the distance between points of V.

Definition 2.2. Let G = (V, E) be a graph and u, v ∈ V. The distance d(u, v) is the length of the shortest path in G between u and v. If two points u and v are not connected by a path in G we write d(u, v) = ∞. We write d_G(u, v) to emphasize that the distance is taken in the graph G. We define the diameter D(G) as the maximum of the distances between all the points of G, so D(G) := max_{u,v∈V} d(u, v).
Definition 2.3. For a graph G = (V, E) and u ∈ V we define the set

Γ_i(u) = {v ∈ V : d(u, v) = i}

to be the set of points in G at distance i from u. We denote the size #Γ_i(u) by d_i(u), called the neighbourhood size of u at distance i.
Now we will introduce the Erdős-Rényi graphs and a proposition regarding the neighbourhood sizes of these graphs, which will be useful in the rest of this thesis.

2.2 Erdős-Rényi graphs

Definition 2.4. For an integer n and a probability value 0 ≤ p ≤ 1 we define the Erdős-Rényi graph G(n, p) = (V(n, p), E(n, p)) to be the graph with n vertices, where each possible edge is an element of E(n, p) with probability p, independently of all other edges.

Remark. We use the phrase "for any point x ∈ G(n, p)" to mean "for any x ∈ V(n, p)", with the understanding that G(n, p) = (V(n, p), E(n, p)).

Remark. For n points and p = 0 you always have the empty graph, and for p = 1 you always have the complete graph on n points. In general there are binom(n, 2) = ½n(n − 1) possible edges.
Example 2.5. The picture below shows realisations of random graphs on n = 20 points for different probability values p.
[Figure: realisations of G(20, 0), G(20, 0.05) and G(20, 0.1)]

Definition 2.6. We denote by L(X | F) the distribution of a random variable X given some σ-algebra F.
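To make Definition 2.4 and Example 2.5 concrete, the following Python sketch (an illustration added for concreteness; the function names are my own, not from [3]) samples an Erdős-Rényi graph and computes its diameter by breadth-first search.

```python
import random
from collections import deque

def sample_er_graph(n, p, seed=None):
    """Sample G(n, p): each of the binom(n, 2) possible edges is present with probability p."""
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                adj[u].add(v)
                adj[v].add(u)
    return adj

def diameter(adj):
    """Return max over u, v of d(u, v), or None if some distance is infinite."""
    best = 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        if len(dist) < len(adj):
            return None  # d(u, v) = infinity for some pair
        best = max(best, max(dist.values()))
    return best
```

As in the remark above, p = 1 gives the complete graph (diameter 1), while p = 0 gives the empty graph, where all distances between distinct points are infinite.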
Proposition 2.7. For an Erdős-Rényi graph G(n, p) = (V, E) with neighbourhood sizes d_i(u) for i = 1, 2, . . . , D(G), it holds that:

L(d_i(u) | d_1(u), . . . , d_{i−1}(u)) = Bin(n − 1 − d_1(u) − . . . − d_{i−1}(u), 1 − (1 − p)^{d_{i−1}(u)}).

Proof. For convenience we denote, for j ∈ {1, 2, . . . , i}, the probability that a point v ∈ V \ Γ_j(u) is connected with one of the d_j(u) points of Γ_j(u) by p_j. It holds that

p_j = P(v is connected with one of the d_j(u) points)
    = 1 − P(v is not connected with any of the d_j(u) points)
    = 1 − (1 − p)^{d_j(u)}.
Let x_1, . . . , x_i be positive integers such that Σ_{k=1}^{i} x_k ≤ n. We now express the probability of d_i(u) = x_i given d_1(u) = x_1, . . . , d_{i−1}(u) = x_{i−1}:

P(d_i(u) = x_i | d_1(u) = x_1, . . . , d_{i−1}(u) = x_{i−1})
= P(x_i points of the n − x_1 − . . . − x_{i−1} points in V \ ∪_{k=1}^{i−1} Γ_k(u) are connected with Γ_{i−1}(u), and the other n − x_1 − . . . − x_i points are not)
= binom(n − x_1 − . . . − x_{i−1}, x_i) (p_{i−1})^{x_i} (1 − p_{i−1})^{n − x_1 − . . . − x_i}.

Substituting p_{i−1} = 1 − (1 − p)^{d_{i−1}(u)} and x_j = d_j(u) for j = 1, 2, . . . , i gives us the claimed binomial distribution.
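Proposition 2.7 can be checked empirically. The sketch below (my own illustration; the parameter choices n = 30, p = 0.1 are arbitrary) samples many graphs, records the sizes of the first two neighbourhoods of a fixed vertex, and compares the empirical conditional mean of d_2 given d_1 with the mean (n − 1 − d_1)(1 − (1 − p)^{d_1}) of the binomial distribution in the proposition.

```python
import random

def two_layers(n, p, rng):
    """Sample G(n, p) and return (d1, d2), the sizes of Γ1(0) and Γ2(0)."""
    adj = [[False] * n for _ in range(n)]
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                adj[u][v] = adj[v][u] = True
    layer1 = [v for v in range(1, n) if adj[0][v]]
    rest = [v for v in range(1, n) if not adj[0][v]]
    layer2 = [v for v in rest if any(adj[v][w] for w in layer1)]
    return len(layer1), len(layer2)

def conditional_mean_d2(n, p, d1_value, trials, seed=0):
    """Empirical mean of d2 given d1 == d1_value, versus Proposition 2.7:
    L(d2 | d1) = Bin(n - 1 - d1, 1 - (1 - p)**d1)."""
    rng = random.Random(seed)
    samples = [d2 for d1, d2 in (two_layers(n, p, rng) for _ in range(trials))
               if d1 == d1_value]
    empirical = sum(samples) / len(samples)
    predicted = (n - 1 - d1_value) * (1 - (1 - p) ** d1_value)
    return empirical, predicted
```

With n = 30, p = 0.1 and d_1 = 3 the predicted conditional mean is 26·(1 − 0.9³) ≈ 7.05, and the empirical mean agrees to within sampling error.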
3 The main theorem
In this chapter we formulate the main theorem that we treat in this thesis. Before stating the theorem we first formulate and prove some useful bounds on the logarithm and the exponential function.
3.1 Approximation limits
In the next two chapters we will work towards an upper and lower bound by bounding the expectation of random variables. We will need some estimates which are stated below.
Lemma 3.1. For every x > 0 it holds that log(1 + x) ∈ [x − ½x², x].

Proof. For every x > 0 the following (in)equalities hold:

log(1 + x) = ∫_1^{1+x} (1/s) ds
           = x + ∫_1^{1+x} ((1/s) − 1) ds
           = x + ∫_1^{1+x} ∫_1^s (−1/u²) du ds
           = x − ∫_1^{1+x} ∫_1^s (1/u²) du ds
           ≥ x − ∫_1^{1+x} ∫_1^s 1 du ds
           = x − ∫_1^{1+x} (s − 1) ds = x − ∫_0^x s ds = x − ½x²,

where at the inequality we use 1/u² ≤ 1 for u ∈ [1, 1 + x]. Because 1/u² ≥ 0, the second line also proves the upper bound x.
Another useful upper bound for the logarithm is stated in the next lemma.

Lemma 3.2. For every x ∈ (0, ∞) it holds that log x ≤ x − 1.

Proof. For x ≥ 1 we have 1/u ≤ 1 for u ∈ [1, x], so

log x = ∫_1^x (1/u) du ≤ ∫_1^x 1 du = x − 1.

For x ∈ (0, 1) we have 1/u ≥ 1 for u ∈ [x, 1], so log x = −∫_x^1 (1/u) du ≤ −∫_x^1 1 du = x − 1.
We also have a similar lemma for the exponential function.

Lemma 3.3. For every x < 0 it holds that 1 − e^x ∈ [−x − ½x², −x].

Proof. For every x < 0 the following inequalities hold:

1 − e^x = ∫_x^0 e^s ds = −x + ∫_x^0 (e^s − 1) ds = −x − ∫_x^0 ∫_s^0 e^u du ds
        ≥ −x − ∫_x^0 ∫_s^0 1 du ds = −x − ∫_x^0 (−s) ds = −x − ½x²,

where the inequality uses e^u ≤ 1 for u ≤ 0, and the last equality follows by the same calculation as in the last part of the proof of Lemma 3.1. Furthermore, because e^u > 0 for all u, the double integral is nonnegative, which gives the upper bound −x.
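The bounds of Lemmas 3.1, 3.2 and 3.3 are easy to verify numerically; the following sketch (my own check, on an arbitrary grid of points) asserts all three on (0, 2) resp. (−2, 0).

```python
import math

# Numeric sanity check of Lemmas 3.1, 3.2 and 3.3 on a grid of points.
for k in range(1, 200):
    x = k / 100.0  # x in (0, 2)
    # Lemma 3.1: for x > 0, log(1 + x) lies in [x - x**2/2, x].
    assert x - x * x / 2 <= math.log(1 + x) <= x
    # Lemma 3.2: for x > 0, log x <= x - 1.
    assert math.log(x) <= x - 1
    # Lemma 3.3: for y < 0, 1 - exp(y) lies in [-y - y**2/2, -y].
    y = -x
    assert -y - y * y / 2 <= 1 - math.exp(y) <= -y
```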
The next lemma contains two upper bounds, which will be useful for proving bounds on the expectation of the neighbourhood sizes. We first state the lemma as it is stated in [3] as Lemma 2.4, and then we rewrite it to be applicable in Chapter 4.
Lemma 3.4 (Chernoff bound). Let X be a sum of independent {0, 1}-valued random variables with expectation E(X), and let h be given by h(x) := (1 + x) log(1 + x) − x. Then for all ζ > 0 it holds that:

P(X − E(X) ≥ ζE(X)) ≤ e^{−E(X)h(ζ)}   and   P(X − E(X) ≤ −ζE(X)) ≤ e^{−E(X)h(−ζ)}.

For the proof see page 25 of [3].

Remark. The Chernoff bound can be rewritten as follows. With the same definitions and conditions as in Lemma 3.4 it holds that:

P(X ≥ (1 + ζ)E(X)) ≤ e^{−E(X)h(ζ)}   and   P(X ≤ (1 − ζ)E(X)) ≤ e^{−E(X)h(−ζ)}.   (3.1)
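For a binomial random variable the upper-tail bound of (3.1) can be compared with the exact tail probability. The sketch below (my own illustration; parameters are arbitrary) computes P(X ≥ ⌈(1 + ζ)E(X)⌉) exactly and checks that it is dominated by e^{−E(X)h(ζ)}.

```python
import math

def h(x):
    """The function from the Chernoff bound: h(x) = (1 + x) log(1 + x) - x."""
    return (1 + x) * math.log(1 + x) - x

def upper_tail(n, p, k):
    """Exact P(X >= k) for X ~ Bin(n, p)."""
    return sum(math.comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k, n + 1))

# Check the left part of (3.1): P(X >= (1 + z) E(X)) <= exp(-E(X) h(z)).
n, p = 200, 0.3
mean = n * p
for z in (0.1, 0.25, 0.5, 1.0):
    k = math.ceil((1 + z) * mean)
    assert upper_tail(n, p, k) <= math.exp(-mean * h(z))
```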
3.2 Limiting the diameter
Let α ∈ (0, 1/2) and for n ∈ {2, 3, . . .} define the probability p_n by

p_n = n^α/(n − 1).   (3.2)

Clearly p_n > 0, and for n ≥ 3 we have n^α < √n ≤ n − 1, so p_n < 1; hence p_n is indeed a probability for all n ≥ 3, which suffices for the asymptotic statements below.

Definition 3.5. For x ∈ R we denote by ⌈x⌉ the smallest integer greater than or equal to x, so x ≤ ⌈x⌉ < x + 1.
Now we can state the main theorem about the diameter of Erdős-Rényi graphs.

Theorem 3.6. Let α ∈ R with α ∈ (0, 1/2), and let

p_n = n^α/(n − 1)   and   M = ⌈1/(2α)⌉.   (3.3)

Then for the Erdős-Rényi graph G(n, p_n) it holds that:

lim_{n→∞} P(D(G(n, p_n)) ∈ {2M − 3, 2M − 2, 2M − 1, 2M, 2M + 1}) = 1.   (3.4)

We will prove Theorem 3.6 by proving Theorem 4.6, which states that the diameter of G(n, p_n) is bounded from above by 2M + 1 with high probability, and Theorem 5.1, which states that the diameter D(G(n, p_n)) is bounded from below by 2M − 3 with high probability.

Remark. Theorem 3.6 is a minor simplification of Theorem 4.2 on page 48 of [3]. Indeed, in that book one does not assume that the probability p_n is of the form (3.2), but instead assumes that both

lim_{n→∞} log n / ((n − 1)p_n) = 0   and   lim_{n→∞} (n − 1)p_n/√n = 0.
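The choice (3.2) satisfies both conditions of the remark, since log n/((n − 1)p_n) = log n/n^α and (n − 1)p_n/√n = n^{α−1/2}. A quick numeric check (my own illustration, for the arbitrary value α = 0.3):

```python
import math

def pn(n, alpha):
    """The probability (3.2): p_n = n**alpha / (n - 1)."""
    return n**alpha / (n - 1)

alpha = 0.3
ns = (10**3, 10**5, 10**7)
# First condition: log n / ((n - 1) p_n) = log n / n**alpha -> 0.
a_vals = [math.log(n) / ((n - 1) * pn(n, alpha)) for n in ns]
# Second condition: (n - 1) p_n / sqrt(n) = n**(alpha - 1/2) -> 0.
b_vals = [(n - 1) * pn(n, alpha) / math.sqrt(n) for n in ns]
assert a_vals[0] > a_vals[1] > a_vals[2] > 0
assert b_vals[0] > b_vals[1] > b_vals[2] > 0
```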
4 Upper bound
Before stating the theorem of the upper bound we first introduce a definition and state Lemma 4.4 about the neighbourhood sizes. It turns out that this lemma is also necessary for the lower bound in Chapter 5. In this chapter we assume α ∈ (0, 1/2) and M = ⌈1/(2α)⌉.
4.1 The neighbourhood size
Throughout this section we fix an arbitrarily small ε > 0 and suppose n is big, in particular n > 3. Recall that for every u ∈ V(n, p_n) and j ∈ {1, . . . , D(G(n, p_n))}, the neighbourhood size d_j(u) denotes the number of points v ∈ V(n, p_n) with d(u, v) = j.
Definition 4.1. Define for all i ∈ {3, . . . , M}:

d_i^− = (1 − ε)²(1 − εn^{−α})^{i−2} n^{iα},   d_i^+ = (1 + ε)²(1 + εn^{−α})^{i−2} n^{iα},

and define for all points u ∈ G and i = 3, . . . , M the event

E_i(u) = {d_i^− ≤ d_i(u) ≤ d_i^+},

with its complement

Ē_i(u) = {d_i(u) < d_i^−} ∪ {d_i^+ < d_i(u)}.
Remark. For the rest of this chapter it will be useful to have the following two propositions regarding d_j^− and d_j^+.

Proposition 4.2. For every i, j ∈ {3, . . . , M} with i ≤ j it holds that: d_i^− ≤ d_i^+ ≤ d_j^+.

Proof. The first inequality follows directly from the fact that ε and n are greater than 0. The second inequality follows from the observation that for every j ≥ i it holds that:

d_j^+ = (1 + ε)²(1 + εn^{−α})^{j−2} n^{jα} = d_i^+ (1 + εn^{−α})^{j−i} n^{(j−i)α},

noting that both extra factors are at least one for every n ≥ 3.

Proposition 4.3. For a fixed α ∈ (0, 1/2) and M = ⌈1/(2α)⌉ there is a constant C < ∞ such that:

limsup_{n→∞} d_{M−1}^+/√n ≤ C.

Proof. Filling in the definition we have

d_{M−1}^+ = (1 + ε)²(1 + εn^{−α})^{M−3} n^{(M−1)α} ≤ (1 + ε)²(1 + εn^{−α})^{1/(2α)} n^{1/2} ≤ (1 + ε)^{2 + 1/(2α)} √n,

where in the first inequality we used M − 1 ≤ 1/(2α), so n^{(M−1)α} ≤ n^{1/2} and (1 + εn^{−α})^{M−3} ≤ (1 + εn^{−α})^{1/(2α)}, and in the second inequality we used that εn^{−α} < ε, so 1 < 1 + εn^{−α} < 1 + ε. As (1 + ε)^{2 + 1/(2α)} < ∞ this finishes the proof.
Now we can state the lemma about P(E_i(u)), the probability that the number of points at distance i from u is bounded between d_i^− and d_i^+.

Lemma 4.4. Let E_i(u) be as in Definition 4.1, take α ∈ (0, 1/2) and M = ⌈1/(2α)⌉. Then for any fixed K > 0 and large enough n it holds that:

P(E_i(u)) ≥ 1 − Mn^{−K},   for u ∈ {1, . . . , n}, i = 1, . . . , M.
Proof. For readability we first introduce the (i − 1)-dimensional subset of N^{i−1}

D_{i−1} = {d ∈ N^{i−1} : ∀k ∈ {1, . . . , i − 1}, d_k ∈ (d_k^−, d_k^+)}.

We now rewrite the following, with |d| := d_1 + · · · + d_{i−1}:

P(Ē_i | E_1, . . . , E_{i−1})
= P({d_i < d_i^−} | E_1, . . . , E_{i−1}) + P({d_i > d_i^+} | E_1, . . . , E_{i−1})
≤ max_{d∈D_{i−1}} P(d_i < d_i^− | (d_1, . . . , d_{i−1}) = d) + max_{d∈D_{i−1}} P(d_i > d_i^+ | (d_1, . . . , d_{i−1}) = d)
= max_{d∈D_{i−1}} P(X < d_i^− | X ∼ Bin(n − 1 − |d|, 1 − (1 − p_n)^{d_{i−1}})) + max_{d∈D_{i−1}} P(X > d_i^+ | X ∼ Bin(n − 1 − |d|, 1 − (1 − p_n)^{d_{i−1}}))
≤ P(Bin(n − 1 − d_1^+ − · · · − d_{i−1}^+, 1 − (1 − p_n)^{d_{i−1}^−}) ≤ d_i^−) + P(Bin(n, 1 − (1 − p_n)^{d_{i−1}^+}) ≥ d_i^+).   (4.1)

For the second equality we used Proposition 2.7 for L(d_i(u) | d_1(u), . . . , d_{i−1}(u)). For the last inequality we used three things. Firstly, for d ∈ D_{i−1} we have

n ≥ n − 1 − d_1 − · · · − d_{i−1} ≥ n − 1 − d_1^+ − · · · − d_{i−1}^+.   (4.2)

Secondly, it holds that

d_{i−1}^− ≤ d_{i−1} ≤ d_{i−1}^+ ⇒ 1 − (1 − p_n)^{d_{i−1}^−} ≤ 1 − (1 − p_n)^{d_{i−1}} ≤ 1 − (1 − p_n)^{d_{i−1}^+}.   (4.3)

Thirdly, we used the following monotonicity properties of the binomial distribution: for all k ∈ N it holds that

P(X < k | X ∼ Bin(n, p)) ≤ P(Y < k | Y ∼ Bin(m, p))   for m ≤ n,
P(X < k | X ∼ Bin(n, p)) ≤ P(Y < k | Y ∼ Bin(n, p′))   for p′ ≤ p,

together with the analogous properties, with the roles of the parameters reversed, for the upper tail P(X > k). Filling in (4.2) and (4.3) indeed gives the last inequality. The following section is the link between the two probabilities of equation (4.1) and the Chernoff bound of Lemma 3.4.
4.1.1 Applying the Chernoff bound
We want to estimate the expectations

E(X | X ∼ Bin(n − 1 − d_1^+ − · · · − d_{i−1}^+, 1 − (1 − p_n)^{d_{i−1}^−})) = (n − 1 − d_1^+ − · · · − d_{i−1}^+)(1 − (1 − p_n)^{d_{i−1}^−})   (4.4)

and

E(X | X ∼ Bin(n, 1 − (1 − p_n)^{d_{i−1}^+})) = n(1 − (1 − p_n)^{d_{i−1}^+}).   (4.5)

Below we first provide a bound on 1 − (1 − p_n)^{d_{i−1}^−} and 1 − (1 − p_n)^{d_{i−1}^+}. The bound is derived in the same way for both d_{i−1}^− and d_{i−1}^+, so we use the notation d_{i−1}^±. Because 1 − p_n ∈ (0, 1) we can apply Lemma 3.2, which gives for every x ∈ (0, 1) the upper bound x − 1 for log x. So the inequality

d_{i−1}^± log(1 − p_n) ≤ −d_{i−1}^± p_n

holds, and therefore it holds that

1 − (1 − p_n)^{d_{i−1}^±} = 1 − e^{d_{i−1}^± log(1 − p_n)} ≤ 1 − e^{−d_{i−1}^± p_n}.

Because −d_{i−1}^± p_n < 0 we can next apply Lemma 3.3, which gives for every x < 0 the upper bound −x for 1 − e^x. So it holds that

1 − (1 − p_n)^{d_{i−1}^±} ≤ d_{i−1}^± p_n.

As α ∈ (0, 1/2) it holds that lim_{n→∞} √n p_n = 0. By Propositions 4.2 and 4.3 we also know that limsup_{n→∞} d_{i−1}^±/√n < ∞, so lim_{n→∞} d_{i−1}^± p_n = 0.
So, up to factors tending to one as n → ∞, the expectations stated in (4.4) and (4.5) satisfy

E(X_1) = n^α d_{i−1}^−(1 + η)   and   E(X_2) = n^α d_{i−1}^+(1 + η),

with η := 1/(n − 1), so lim_{n→∞} η = 0, where X_1 and X_2 are random variables with laws

L(X_1) = Bin(n − 1 − d_1^+ − · · · − d_{i−1}^+, 1 − (1 − p_n)^{d_{i−1}^−})   and   L(X_2) = Bin(n, 1 − (1 − p_n)^{d_{i−1}^+}).

Using d_i^− = (1 − εn^{−α}) n^α d_{i−1}^−, it holds that

P(X_1 ≤ d_i^−) = P(X_1 ≤ ((1 − εn^{−α})/(1 + η)) E(X_1)) ≤ P(X_1 ≤ (1 − ζ) E(X_1)),

and using d_i^+ = (1 + εn^{−α}) n^α d_{i−1}^+, it holds that

P(X_2 ≥ d_i^+) = P(X_2 ≥ ((1 + εn^{−α})/(1 + η)) E(X_2)) = P(X_2 ≥ (1 + ζ) E(X_2)),

where

ζ := (εn^{−α} − η)/(1 + η),

which is positive for n large. Now we can apply the Chernoff bound of Lemma 3.4 twice. The right part of equation (3.1) provides

P(Bin(n − 1 − d_1^+ − · · · − d_{i−1}^+, 1 − (1 − p_n)^{d_{i−1}^−}) ≤ d_i^−) ≤ e^{−n^α d_{i−1}^−(1+η) h(−ζ)},

and the left part of (3.1) provides:

P(Bin(n, 1 − (1 − p_n)^{d_{i−1}^+}) ≥ d_i^+) ≤ e^{−n^α d_{i−1}^+(1+η) h(ζ)}.

So together this gives:

P(Ē_i | E_1, . . . , E_{i−1}) ≤ e^{−n^α d_{i−1}^−(1+η) h(−ζ)} + e^{−n^α d_{i−1}^+(1+η) h(ζ)}.   (4.6)
Now, for the exponents, we first state the next proposition regarding the function h appearing in the Chernoff bound.

Proposition 4.5. For every x ∈ (0, 1) the function h(x) := (1 + x) log(1 + x) − x satisfies

h(x) ∈ [½x²(1 − ⅓x), ½x²]   and   h(−x) ≥ ½x².

Proof. Since h(0) = 0 and h′(x) = log(1 + x), we have h(x) = ∫_0^x log(1 + s) ds. By Lemma 3.1, log(1 + s) ∈ [s − ½s², s] for s > 0, so for x ∈ (0, 1):

h(x) ∈ [∫_0^x (s − ½s²) ds, ∫_0^x s ds] = [½x² − ⅙x³, ½x²] = [½x²(1 − ⅓x), ½x²].

For the second claim, note that for s ∈ (−1, 0) we have log(1 + s) ≤ s by Lemma 3.2, hence

h(−x) = ∫_{−x}^0 (−log(1 + s)) ds ≥ ∫_{−x}^0 (−s) ds = ½x².
For the exponents in equation (4.6), Proposition 4.5 gives h(−ζ) ≥ ½ζ² and h(ζ) ≥ ½ζ²(1 − ⅓ζ). Since ζ = (εn^{−α} − η)/(1 + η) behaves like εn^{−α}, there are constants c_1, c_2 > 0 such that for n large:

n^α d_{i−1}^−(1 + η) h(−ζ) ≥ c_1 n^α d_{i−1}^− (εn^{−α})²   and   n^α d_{i−1}^+(1 + η) h(ζ) ≥ c_2 n^α d_{i−1}^+ (εn^{−α})².

Because d_{i−1}^± grows at least like a positive power of n, both right-hand sides grow faster than any constant multiple of log n. Hence, for fixed K, there is an N ∈ N such that for all n > N it holds that:

P(Ē_i | E_1, . . . , E_{i−1}) ≤ 2e^{−(K+1) log n} ≤ n^{−K}.   (4.7)
4.1.2 Finishing the proof
Continuation of the proof of Lemma 4.4. Now concluding:

P(E_i(u)) ≥ P(E_1(u), . . . , E_i(u))
≥ P(E_1(u), . . . , E_{i−1}(u)) − P(Ē_i | E_1(u), . . . , E_{i−1}(u))
≥ P(E_1(u)) − Σ_{j=2}^{i} P(Ē_j | E_1(u), . . . , E_{j−1}(u))
= 1 − P(Ē_1(u)) − Σ_{j=2}^{i} P(Ē_j | E_1(u), . . . , E_{j−1}(u))
≥ 1 − n^{−K} − (M − 1)n^{−K} = 1 − Mn^{−K},

where in the second inequality we use the property that for two events A and B, with B̄ the complement of B, it holds that:

P(A, B) = P(B | A)P(A) = (1 − P(B̄ | A))P(A) = P(A) − P(A)P(B̄ | A) ≥ P(A) − P(B̄ | A),

and for the last inequality we use equation (4.7) at most M times.
4.2 Upper bound theorem
As announced at the end of Section 3.2, we can now prove the bounds on the diameter. We first prove the upper bound.

Theorem 4.6. Suppose α ∈ (0, 1/2), M = ⌈1/(2α)⌉ and p_n = n^α/(n − 1). Then for the Erdős-Rényi graph G(n, p_n) it holds that:

lim_{n→∞} P(D(G(n, p_n)) ≤ 2M + 1) = 1   (cf. equation (3.4)).
Proof. For two nodes u, v ∈ G(n, p_n), d(u, v) > 2M + 1 can only occur if the M-neighbourhoods of u and v are disjoint,

(∪_{i=1}^{M} Γ_i(u)) ∩ (∪_{i=1}^{M} Γ_i(v)) = ∅,

and additionally all the d_M(u) × d_M(v) possible edges between Γ_M(u) and Γ_M(v) are absent from the graph G(n, p_n). The conditional probability that all these d_M(u) × d_M(v) edges are absent gives

P(d(u, v) > 2M + 1 | Γ_1(u), . . . , Γ_M(u), Γ_1(v), . . . , Γ_M(v)) ≤ (1 − p_n)^{d_M(u)d_M(v)}.

Note that by Lemma 4.4, for some constant c > 0:

P(d(u, v) > 2M + 1) ≤ E(1_{Ē_M(u)} + 1_{Ē_M(v)} + (1 − p_n)^{(d_M^−)²})
≤ P(Ē_M(u)) + P(Ē_M(v)) + (1 − p_n)^{(d_M^−)²}
≤ Mn^{−K} + Mn^{−K} + e^{−cn^α} = 2Mn^{−K} + e^{−cn^α}.

For fixed K > 2 this goes to zero faster than n^{−2} as n → ∞, so a union bound over all pairs u, v shows that the diameter stays below 2M + 2 with high probability; therefore the upper bound is proven.
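As an illustration of Theorem 4.6 (my own sketch; the parameter choices α = 0.4 and n = 400 are arbitrary, and at such moderate n the asymptotics are only roughly visible), one can sample G(n, p_n) and compare its diameter with the value 2M + 1.

```python
import math
import random
from collections import deque

def er_diameter(n, p, seed):
    """Sample G(n, p) and return its diameter, or None if disconnected."""
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                adj[u].add(v)
                adj[v].add(u)
    best = 0
    for src in range(n):
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        if len(dist) < n:
            return None
        best = max(best, max(dist.values()))
    return best

alpha = 0.4
M = math.ceil(1 / (2 * alpha))      # here M = 2, so the upper bound is 2M + 1 = 5
n = 400
p = n**alpha / (n - 1)              # the probability p_n of (3.2)
diam = er_diameter(n, p, seed=3)    # one sampled diameter (may be None if disconnected)
```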
5 Lower bound
In this chapter we again assume α ∈ (0, 1/2) and M = ⌈1/(2α)⌉. Having proven the upper bound, it remains to show:

Theorem 5.1. Take α ∈ (0, 1/2), M = ⌈1/(2α)⌉ and p_n = n^α/(n − 1). Then for the Erdős-Rényi graph G(n, p_n) it holds that:

lim_{n→∞} P(D(G(n, p_n)) > 2M − 4) = 1   (cf. equation (3.4)).

Before starting the proof, we first state a theorem about the probability that two random sets have an empty intersection.
5.1 Empty sets
Theorem 5.2. Let C be a set of n items, let (r_n)_{n∈N} be a sequence such that lim_{n→∞} r_n/√n = 0, and let C_1 and C_2 be two subsets of size r_n selected independently and uniformly at random from C. Then it holds that:

P(C_1 ∩ C_2 = ∅) = (1 + c_1) e^{−r_n²/(n − 2r_n) + R_n},

where lim_{n→∞} c_1 = 0 and the remainder R_n satisfies |R_n| ≤ R·r_n³/(n − 2r_n)² for some finite constant R > 0 and n large enough.

In this chapter we will, for readability, write r instead of r_n, keeping in mind that it depends on n and satisfies lim_{n→∞} r/√n = 0.
Before proving the above theorem we first state Stirling's formula and a corollary of this formula for limits.

Lemma 5.3 (Stirling's formula). For all n ∈ N define f(n) = n! and g(n) = √(2π) n^{n+1/2} e^{−n}. Then

lim_{n→∞} (f(n)/g(n) − 1) = lim_{n→∞} (n!/(√(2π) n^{n+1/2} e^{−n}) − 1) = 0.

We want to approximate the probability P(C_1 ∩ C_2 = ∅) for C_1 and C_2 as in Theorem 5.2. Because there are binom(n, r) possible subsets of C of size r, this probability equals

P(C_1 ∩ C_2 = ∅) = binom(n − r, r)/binom(n, r) = ((n − r)!/((n − 2r)! r!)) · ((n − r)! r!/n!) = (n − r)!(n − r)!/((n − 2r)! n!).   (5.1)

Therefore we first deduce the next corollary regarding the difference between

(n − r)!(n − r)!/((n − 2r)! n!)   and   (n − r)^{2(n−r)+1} e^{−2(n−r)}/((n − 2r)^{n−2r+1/2} e^{−(n−2r)} n^{n+1/2} e^{−n}).
Corollary 5.4. Let f(n) and g(n) be defined as in Lemma 5.3 for all n ∈ N. Then it holds that:

lim_{n→∞} ( f²(n − r)/(f(n − 2r)f(n)) − g²(n − r)/(g(n − 2r)g(n)) ) = 0.

Proof. We write

f²(n − r)/(f(n − 2r)f(n)) − g²(n − r)/(g(n − 2r)g(n)) = g²(n − r)/(g(n − 2r)g(n)) · ( (g(n − 2r)/f(n − 2r)) (g(n)/f(n)) (f(n − r)/g(n − r))² − 1 ).   (5.2)

The factor in brackets on the right-hand side of (5.2) tends to zero, because by Stirling's formula

lim_{n→∞} (f(m)/g(m) − 1) = 0   for m ∈ {n − 2r, n − r, n}.

For the corollary to be true, it remains to show that the prefactor g²(n − r)/(g(n − 2r)g(n)) is bounded in n. It holds that

g²(n − r)/(g(n − 2r)g(n)) = ((√(2π))²/(√(2π)√(2π))) · (n − r)^{2(n−r)+1} e^{−2(n−r)}/((n − 2r)^{n−2r+1/2} e^{−(n−2r)} n^{n+1/2} e^{−n}) = (n − r)^{2n−2r+1} n^{−(n+1/2)} (n − 2r)^{2r−n−1/2}.   (5.3)

Taking logarithms and splitting the exponent of (5.3) gives

(2n − 2r + 1) log(n − r) − (n + ½) log n − (n − 2r + ½) log(n − 2r)
= (n − 2r) log(1 + r/(n − 2r)) + n log(1 − r/n) + ½ log((n − r)²/((n − 2r)n)),

so that

g²(n − r)/(g(n − 2r)g(n)) = √((n − r)²/((n − 2r)n)) · e^{(n−2r) log(1 + r/(n−2r)) + n log(1 − r/n)}.

For the square root we have, since (n − r)² = (n − 2r)n + r² and r²/n → 0,

lim_{n→∞} √((n − r)²/((n − 2r)n)) = 1.

For the exponential, we rewrite the exponent using (1 + r/(n − 2r))(1 − r/n) = 1 + r²/(n(n − 2r)):

(n − 2r) log(1 + r/(n − 2r)) + n log(1 − r/n) = n log(1 + r²/(n(n − 2r))) − 2r log(1 + r/(n − 2r)).

Now Lemma 3.1 tells us that for every x > 0, log(1 + x) ∈ [x − ½x², x]. Applying this with x = r²/(n(n − 2r)) and with x = r/(n − 2r) we obtain

(n − 2r) log(1 + r/(n − 2r)) + n log(1 − r/n) ∈ [−r²/(n − 2r) − r⁴/(2n(n − 2r)²), −r²/(n − 2r) + r³/(n − 2r)²].

Since r/√n → 0, both endpoints of this interval tend to 0, so the exponential tends to 1 and the prefactor is bounded. Hence

lim_{n→∞} ( f²(n − r)/(f(n − 2r)f(n)) − g²(n − r)/(g(n − 2r)g(n)) ) = lim_{n→∞} C_n · 0 = 0

with limsup_{n→∞} C_n < ∞, so the limit equals 0 as desired.
Remark. Because lim_{n→∞} r/(2n) = 0, the lower-order term above can be written as

−r⁴/(2n(n − 2r)²) = −(r/(2n)) · r³/(n − 2r)²,

so both endpoints of the interval differ from −r²/(n − 2r) by at most a constant times r³/(n − 2r)²; this is where the remainder R_n in Theorem 5.2 comes from.
Proof of Theorem 5.2. Let n ∈ N, let C be a set of n items and let r be such that lim_{n→∞} r/√n = 0. As mentioned in (5.1),

P(C_1 ∩ C_2 = ∅) = (n − r)!(n − r)!/((n − 2r)! n!).

Denoting g(n) = √(2π) n^{n+1/2} e^{−n}, Corollary 5.4 yields

lim_{n→∞} ( P(C_1 ∩ C_2 = ∅) − g²(n − r)/(g(n − 2r)g(n)) ) = 0.

So, by the computation in the proof of Corollary 5.4,

P(C_1 ∩ C_2 = ∅) = (1 + c_1) √((n − r)²/((n − 2r)n)) e^{(n−2r) log(1 + r/(n−2r)) + n log(1 − r/n)} = (1 + c_1) e^{−r²/(n−2r) + R_n}

with lim_{n→∞} c_1 = 0, and with R_n bounded as in the interval of Corollary 5.4 and the subsequent remark. This proves the theorem.
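The asymptotics of Theorem 5.2 can be checked numerically: the exact probability (5.1) can be evaluated as a telescoping product, and compared with e^{−r²/(n − 2r)}. A short sketch (my own check; the values n = 10⁶, r = 100 are arbitrary but satisfy r ≪ √n):

```python
import math

def p_disjoint(n, r):
    """Exact P(C1 ∩ C2 = ∅) = binom(n - r, r) / binom(n, r), computed as a product."""
    prob = 1.0
    for k in range(r):
        prob *= (n - r - k) / (n - k)
    return prob

n, r = 10**6, 100
exact = p_disjoint(n, r)
approx = math.exp(-r * r / (n - 2 * r))
# The remainder R_n is of order r**3 / (n - 2r)**2, here about 1e-6.
assert abs(exact - approx) < 1e-4
```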
5.2 Nonempty sets
Theorem 5.5. Let C be a set of n items, let (r_n)_{n∈N} satisfy lim_{n→∞} r_n/√n = 0, and let C_1 and C_2 be two subsets of size r_n selected independently and uniformly at random from C. Then there is a finite constant R > 0 such that for n large enough:

P(C_1 ∩ C_2 ≠ ∅) ≤ R·r_n²/(n − 2r_n).

Proof. We first notice that

P(C_1 ∩ C_2 ≠ ∅) = 1 − P(C_1 ∩ C_2 = ∅).

Applying Theorem 5.2, together with the interval of Corollary 5.4, gives for n large

P(C_1 ∩ C_2 ≠ ∅) ≤ 1 − (1 + c_1) e^{−r²/(n−2r) − r⁴/(2n(n−2r)²)}.

By Lemma 3.3 it holds that 1 − e^x ≤ −x for every x < 0; applying this with x = −r²/(n − 2r) − r⁴/(2n(n − 2r)²) yields

P(C_1 ∩ C_2 ≠ ∅) ≤ r²/(n − 2r) + r⁴/(2n(n − 2r)²) + |c_1|.

Since r⁴/(2n(n − 2r)²) = (r²/(2n(n − 2r))) · r²/(n − 2r) is of smaller order than r²/(n − 2r), and since one can check from Stirling's formula that the error c_1 is of order 1/n ≤ r²/(n − 2r), all three terms are bounded by a constant times r²/(n − 2r), which proves the claim.
Now that the probability that two random sets C_1 and C_2 of size r, chosen from a set of n items, have a nonempty intersection is bounded by a constant times r²/n, we translate our problem to this result.
5.3 Lower bound theorem
Proof of Theorem 5.1. Fix two nodes u, v ∈ G(n, p_n). We choose the set C_1 at random among sets of size 1 + d_1(u) + · · · + d_{M−2}(u), and construct the set C_2 as follows.

1. First determine Γ_1(u), . . . , Γ_{M−2}(u) among the n nodes and define Γ̄_0(u) as the set containing the remaining n − 1 − d_1(u) − · · · − d_{M−2}(u) nodes. By construction there are, conditionally on Γ_1(u), . . . , Γ_{M−2}(u), no edges between Γ̄_0(u) and Γ_i(u) for i = 1, . . . , M − 3. Because adding edges is independent of the ordering, the edges inside Γ̄_0(u) and between Γ̄_0(u) and Γ_{M−2}(u) are not affected by the conditioning.

2. Now define for any v ∈ G(n, p_n) the sets

Γ'_0(u) = Γ̄_0(u) ∪ Γ_{M−2}(u)   and   β_i(v) = ∪_{j=1}^{i} ∆_j(v),

where ∆_j(v) is a random set of size d_j(v) = #Γ_j(v).

3. Now we construct C_2:

a) Pick a node v ∈ G(n, p_n) at random.
b) If v ∈ Γ̄_0(u), pick at random a set ∆_1(v) from Γ'_0(u) \ {v}.
c) If ∆_1(v) has empty intersection with Γ_{M−2}(u), pick at random a set ∆_2(v) from Γ'_0(u) \ β_1(v).
d) In general, if ∆_{i−1}(v) has empty intersection with Γ_{M−2}(u), pick at random a set ∆_i(v) from Γ'_0(u) \ β_{i−1}(v).
The probability that this procedure stops (i.e., that some intersection is nonempty) equals the probability that two random sets C_1 and C_2 have a nonempty intersection, where C_1 and C_2 satisfy

|C_1| = 1 + d_1(u) + · · · + d_{M−2}(u)   and   |C_2| = 1 + d_1(v) + · · · + d_{M−2}(v),

and where, on the event that all neighbourhood sizes are bounded as in Definition 4.1,

lim_{n→∞} (d_1(u) + · · · + d_{M−2}(u))/√n = 0.   (5.4)

Equation (5.4) holds by Proposition 4.3, which states that limsup_{n→∞} d_{M−1}^+/√n < ∞, together with the observation

d_j^+ = d_{M−1}^+ ((1 + εn^{−α}) n^α)^{j−(M−1)}   for j ≤ M − 1.

For k ∈ {1, 2} it also holds that:

|C_k| ≤ 1 + d_1^+ + · · · + d_{M−2}^+ ≤ 1 + d_{M−1}^+ Σ_{j=1}^{M−2} ((1 + εn^{−α}) n^α)^{j−(M−1)} =: C_n

for a sequence (C_n)_{n∈N} satisfying lim_{n→∞} C_n/√n = 0. So:
P(d(u, v) ≤ 2M − 4)
= P(d(u, v) ≤ 2M − 4, A) + P(d(u, v) ≤ 2M − 4, Ā)
= P(d(u, v) ≤ 2M − 4 | A)P(A) + P(d(u, v) ≤ 2M − 4 | Ā)P(Ā)
≤ P(d(u, v) ≤ 2M − 4 | A) + P(Ā),

where A := ∩_{i=1}^{M−2}(E_i(u) ∩ E_i(v)) and Ā = ∪_{i=1}^{M−2}(Ē_i(u) ∪ Ē_i(v)). Now d(u, v) ≤ 2M − 4 forces the (M − 2)-neighbourhoods of u and v to intersect, so by Theorem 5.5 and the bound |C_k| ≤ C_n on A, and by Lemma 4.4:

P(d(u, v) ≤ 2M − 4) ≤ O(C_n²/n) + Σ_{i=1}^{M−2} (P(Ē_i(u)) + P(Ē_i(v))) ≤ O(C_n²/n) + 2(M − 2)Mn^{−K} → 0   as n → ∞.

Since D(G(n, p_n)) ≥ d(u, v), it follows that lim_{n→∞} P(D(G(n, p_n)) > 2M − 4) = 1.
5.4 Concluding
This ends the proof of the lower bound; together with Theorem 4.6 we have now proven both bounds of Theorem 3.6.
6 Small diameter
The result of Chapter 3 can be read as saying that all n persons living on earth are connected via a small number of common acquaintances. However, Theorem 3.6 is based on the assumption that the probability p_n of a connection between two nodes (e.g. persons) is the same for every pair of nodes, which is unrealistic for social networks. Therefore we introduce the so-called Strogatz-Watts graphs.
6.1 Strogatz-Watts graph
Definition 6.1. Let m ∈ N and let n = m². A Strogatz-Watts graph SW(n, p_n) is a semi-random graph on a set V of n = m² nodes, arranged in an m × m grid, where the nodes are connected via the following edges:

• via a grid line: each point v is connected with its grid neighbours, giving between two and four such edges per node;

• via a shortcut: each point v is connected with probability p_n to a uniformly chosen point from V \ {v}.

Remark. Without shortcuts the diameter D(SW(n, p)) is the distance between the bottom-left and the top-right corner of the grid, which equals 2(√n − 1). With the shortcuts added, the diameter decreases radically. The small world phenomenon is now represented as follows:
Theorem 6.2. Let m ∈ N and let n = m². Let p_n be a probability depending on n and let SW(n, p_n) be a Strogatz-Watts graph. Then there is a constant M_p, depending on the probability p_n, such that for the diameter D(SW(n, p_n)) it holds that:

lim_{n→∞} P(D(SW(n, p_n)) ≤ M_p log n) = 1.

The proof is omitted; for further reading consult Chapter 6 of [3], where the above theorem is stated and proven as Theorem 6.1.
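Definition 6.1 and the remark above can be illustrated with a short simulation (my own sketch; the grid size m = 10 and shortcut probability are arbitrary): build the m × m grid, optionally add shortcuts, and compute the diameter by breadth-first search.

```python
import random
from collections import deque

def sw_graph(m, p, seed=None):
    """m x m grid; additionally each node gets, with probability p,
    a shortcut to a uniformly chosen other node (as in Definition 6.1)."""
    rng = random.Random(seed)
    n = m * m
    adj = {v: set() for v in range(n)}
    for i in range(m):
        for j in range(m):
            v = i * m + j
            if i + 1 < m:
                adj[v].add(v + m); adj[v + m].add(v)   # vertical grid edge
            if j + 1 < m:
                adj[v].add(v + 1); adj[v + 1].add(v)   # horizontal grid edge
    for v in range(n):
        if p > 0 and rng.random() < p:
            w = rng.randrange(n - 1)
            w = w if w < v else w + 1                  # uniform on V \ {v}
            adj[v].add(w); adj[w].add(v)
    return adj

def diameter(adj):
    """Max over all pairs of the BFS distance."""
    best = 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        best = max(best, max(dist.values()))
    return best
```

Without shortcuts (p = 0) the diameter of the 10 × 10 grid is 2(√n − 1) = 18; adding shortcut edges can only shorten distances, in line with Theorem 6.2.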
7 Popular summary

You have probably experienced being on holiday somewhere and, by chance, getting into a conversation with a stranger; you go through some people you both know, and a mutual friend turns out to be among them. Funnily enough, this happens quite often, which is why we sometimes say: "What a small world it is!" And that is right: it is even known as the small world phenomenon.

Sometimes you see really cool people and wonder whether you could ever shake hands with them. What if I told you that you are only six "handshake distances" away from them? What is a handshake distance? Suppose you are standing in a group with Alice, Bob and Eve and you shake hands with Alice; then you are one handshake distance away from Alice. If Alice then shakes hands with Bob, Alice is one handshake distance away from both Bob and you, and you are two handshake distances away from Bob. If Bob then shakes hands with Eve, you are three handshake distances away from Eve.

Now suppose shaking hands is not enough for you, and you also want to be facebook friends with that cool person. First-degree friends are your own friends, second-degree friends are the friends of your friends, and so on. Then it holds that everyone, including that cool person you want to shake hands with, is at most a sixth-degree friend of yours.

This theory was once introduced by Milgram and subsequently tested on a larger scale by Microsoft in its Messenger database, after which I have now proven it mathematically following the book Epidemics and Rumours in Complex Networks by Moez Draief and Laurent Massoulié [3].
Bibliography
[1] Stanley Milgram, The Small-World Problem, Psychology Today, Vol. 1, No. 1, 61-67, 1967.
[2] Jeffrey Travers and Stanley Milgram, An Experimental Study of the Small World Problem, Sociometry, Vol. 32, No. 4, 425-443, 1969.
[3] Moez Draief and Laurent Massoulié, Epidemics and Rumours in Complex Networks, Cambridge University Press, 46-67, 123, 2010.
[4] Jure Leskovec and Eric Horvitz, Planetary-Scale Views on an Instant-Messaging Network, Microsoft Research Technical Report MSR-TR-2006-18, 1, 28, June 2007.