The Small World Phenomenon
Sjoerd Janse
February 5, 2016
Bachelor thesis Supervisor: Dr. Sonja Cox
Abstract
In this thesis we present the result known as the small world phenomenon, also referred to as six degrees of separation. We first simplify the problem to the random Erdős-Rényi graphs G(n, p_n) with n points, where between any pair of points an edge exists with probability p_n. We define p_n by

p_n = n^α/(n − 1),

where α ∈ (0, 1/2) is a fixed parameter. For this simplified problem we prove probabilistic asymptotic upper and lower bounds for the diameter D(G(n, p_n)). More precisely, setting M = ⌈1/(2α)⌉ we prove:

lim_{n→∞} P(D(G(n, p_n)) ∈ {2M − 3, 2M − 2, 2M − 1, 2M, 2M + 1}) = 1.
In order to obtain this result we introduce for every u ∈ G(n, p_n) the subgraphs Γ_1(u), Γ_2(u), . . . , Γ_M(u) of G(n, p_n), the so-called neighbourhoods, and we denote their sizes by d_1(u), . . . , d_M(u), called the neighbourhood sizes. We work towards Lemma 4.4, which provides, for i ∈ {1, . . . , M}, bounds on the neighbourhood size d_i(u) conditioned on the sizes d_1(u), . . . , d_{i−1}(u). With this as a tool we prove the asymptotic probabilistic upper bound 2M + 1.
For the lower bound of the diameter we start by stating a theorem on the probability that two randomly chosen sets of the same size intersect, which we apply to the probability that two neighbourhoods Γ_i(u) and Γ_i(v) have a nonempty intersection. After some calculations this yields the probabilistic asymptotic lower bound 2M − 3.
Finally, the Strogatz-Watts graphs are introduced as the link between the proven limit and the small world phenomenon. For the Strogatz-Watts graphs on n points we formulate a similar theorem bounding the diameter, which reduces it to a constant times log n.
Title: The Small World Phenomenon
Author: Sjoerd Janse, sjoerd.janse@gmail.com, UvA student number: 10246134
Supervisor: Dr. Sonja Cox
Second examiner: Dr. Peter Spreij
Date: February 5, 2016
Korteweg-de Vries Institute for Mathematics, University of Amsterdam
Science Park 904, 1098 XH Amsterdam
http://www.science.uva.nl/math
Contents
1 Preface
2 Graphs
  2.1 Graphs
  2.2 Erdős-Rényi graphs
3 The main theorem
  3.1 Approximation limits
  3.2 Limiting the diameter
4 Upper bound
  4.1 The neighbourhood size
    4.1.1 Applying the Chernoff bound
    4.1.2 Finishing the proof
  4.2 Upper bound theorem
5 Lower bound
  5.1 Empty sets
  5.2 Nonempty sets
  5.3 Lower bound theorem
  5.4 Concluding
6 Small diameter
  6.1 Strogatz-Watts graph
7 Popular summary
1 Preface
"Almost all of us have had the experience of encountering someone far from home who, to our surprise, turns out to share a mutual acquaintance with us. This kind of experience occurs with sufficient frequency so that our language even provides a cliché to be uttered at the appropriate moment of recognizing mutual acquaintances. We say, 'My, it's a small world.'" - Stanley Milgram [1]

The idea of the small world phenomenon started with Stanley Milgram, who lived from 15 August 1933 to 20 December 1984. At the age of 34 he decided to run an experiment: he wanted packages, starting in Omaha, Nebraska, to reach a stockbroker in Boston, Massachusetts. The only rule of the game was that a participant was not allowed to send the package directly to the stockbroker, but only to a person he or she knew on a first-name basis. So Milgram sent 160 packages to starting persons with the instruction to forward the package, under the same instructions, to someone they knew by first name, with the goal of eventually reaching the stockbroker.
Within four days the first package had already reached the stockbroker via only two acquaintances. This was highly exceptional, but the real surprise was that the packages that did reach the stockbroker travelled through a mean of five acquaintances [2].

Of course this was not a very large group, but the idea was set. So in 2008 Microsoft decided to run a test on the data collected from their Messenger service, covering 180 million people. It turned out that the average chain between users had length 6.6 [4]. This did not exclude user chains of up to 29 people, but it certainly confirmed Milgram's experiment. Nowadays this behaviour of networks is known as the small world phenomenon, with the associated result called six degrees of separation: the chain of acquaintances between two people has an average length of about six.
Mathematically this phenomenon is proven via probability theory on the diameter of graphs, where conditionally on the neighbourhoods Γ_1(u), Γ_2(u), . . . , Γ_{i−1}(u) one can say something about Γ_i(u). Moez Draief and Laurent Massoulié provide a compact proof in Chapter 6 of their Epidemics and Rumours in Complex Networks [3]. Although I only mention that result at the end of this thesis, I did work fully through Chapter 4 of [3], which deals with an easier problem regarding the more random, though simpler, Erdős-Rényi graphs. I have tried to rewrite this at a level that students in the last phase of their bachelor's will understand.
2 Graphs
We begin this chapter with definitions concerning graphs and then introduce the Erdős-Rényi graphs.
2.1 Graphs
Definition 2.1. A graph G = (V, E) is a collection of vertices V = {v_1, v_2, . . . , v_n} and edges E ⊆ V × V, the interpretation being that (v_i, v_j) ∈ E if and only if there is an edge between two distinct points v_i, v_j ∈ V.
For graphs we can define the diameter, which is defined by means of the distance between points of V.

Definition 2.2. Let G = (V, E) be a graph and u, v ∈ V. The distance d(u, v) is the length of the shortest path in G between u and v. If two points u and v are not connected by a path in G we write d(u, v) = ∞. We write d_G(u, v) to emphasize that the distance is taken in the graph G. We define the diameter D(G) as the maximum of the distances between all the points of G, so D(G) := max_{u,v∈V} d(u, v).
Definition 2.3. For a graph G = (V, E) and u ∈ V we define the set

Γ_i(u) = {v ∈ V : d(u, v) = i}

to be the set of points in G at distance i from u. We denote the size #Γ_i(u) by d_i(u), called the neighbourhood size of u at distance i.
Now we will introduce the Erdős-Rényi graphs and a proposition regarding the neighbourhood sizes of these graphs, which will be useful in the rest of this thesis.

2.2 Erdős-Rényi graphs

Definition 2.4. For an integer n and a probability value 0 ≤ p ≤ 1 we define the Erdős-Rényi graph G(n, p) = (V(n, p), E(n, p)) to be the graph with n vertices, where each possible edge is an element of E(n, p) with probability p, independently of all other edges.

Remark. We use the phrase "for any point x ∈ G(n, p)" to mean "for any x ∈ V(n, p)", with the understanding that G(n, p) = (V(n, p), E(n, p)).

Remark. For n points and p = 0 you always have the empty graph, and for p = 1 you always have the complete graph on n points. In general there are binom(n, 2) = ½n(n − 1) possible edges.
Example 2.5. The picture below shows realisations of random graphs on n = 20 points for different probability values p.
[Figure: realisations of G(20, 0), G(20, 0.05) and G(20, 0.1)]

Definition 2.6. We denote by L(X | F) the distribution of a random variable X given some σ-algebra F.
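To make Definition 2.4 and Example 2.5 concrete, the following Python sketch (an illustration added for concreteness; the function names are my own, not from [3]) samples an Erdős-Rényi graph and computes its diameter by breadth-first search.

```python
import random
from collections import deque

def sample_er_graph(n, p, seed=None):
    """Sample G(n, p): each of the binom(n, 2) possible edges is present with probability p."""
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                adj[u].add(v)
                adj[v].add(u)
    return adj

def diameter(adj):
    """Return max over u, v of d(u, v), or None if some distance is infinite."""
    best = 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        if len(dist) < len(adj):
            return None  # d(u, v) = infinity for some pair
        best = max(best, max(dist.values()))
    return best
```

As in the remark above, p = 1 gives the complete graph (diameter 1), while p = 0 gives the empty graph, where all distances between distinct points are infinite.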
Proposition 2.7. For an Erdős-Rényi graph G(n, p) = (V, E) with neighbourhood sizes d_i(u) for i = 1, 2, . . . , D(G), it holds that:

L(d_i(u) | d_1(u), . . . , d_{i−1}(u)) = Bin(n − 1 − d_1(u) − . . . − d_{i−1}(u), 1 − (1 − p)^{d_{i−1}(u)}).

Proof. For convenience we denote, for j ∈ {1, 2, . . . , i}, the probability that a point v ∈ V \ Γ_j(u) is connected with one of the d_j(u) points of Γ_j(u) by p_j. It holds that

p_j = P(v is connected with one of the d_j(u) points)
    = 1 − P(v is not connected with any of the d_j(u) points)
    = 1 − (1 − p)^{d_j(u)}.
Let x_1, . . . , x_i be positive integers such that Σ_{k=1}^{i} x_k ≤ n. We now express the probability of d_i(u) = x_i given d_1(u) = x_1, . . . , d_{i−1}(u) = x_{i−1}:

P(d_i(u) = x_i | d_1(u) = x_1, . . . , d_{i−1}(u) = x_{i−1})
= P(x_i points of the n − x_1 − . . . − x_{i−1} points in V \ ∪_{k=1}^{i−1} Γ_k(u) are connected with Γ_{i−1}(u), and the other n − x_1 − . . . − x_i points are not)
= binom(n − x_1 − . . . − x_{i−1}, x_i) (p_{i−1})^{x_i} (1 − p_{i−1})^{n − x_1 − . . . − x_i}.

Substituting p_{i−1} = 1 − (1 − p)^{d_{i−1}(u)} and x_j = d_j(u) for j = 1, 2, . . . , i gives us the claimed binomial distribution.
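Proposition 2.7 can be checked empirically. The sketch below (my own illustration; the parameter choices n = 30, p = 0.1 are arbitrary) samples many graphs, records the sizes of the first two neighbourhoods of a fixed vertex, and compares the empirical conditional mean of d_2 given d_1 with the mean (n − 1 − d_1)(1 − (1 − p)^{d_1}) of the binomial distribution in the proposition.

```python
import random

def two_layers(n, p, rng):
    """Sample G(n, p) and return (d1, d2), the sizes of Γ1(0) and Γ2(0)."""
    adj = [[False] * n for _ in range(n)]
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                adj[u][v] = adj[v][u] = True
    layer1 = [v for v in range(1, n) if adj[0][v]]
    rest = [v for v in range(1, n) if not adj[0][v]]
    layer2 = [v for v in rest if any(adj[v][w] for w in layer1)]
    return len(layer1), len(layer2)

def conditional_mean_d2(n, p, d1_value, trials, seed=0):
    """Empirical mean of d2 given d1 == d1_value, versus Proposition 2.7:
    L(d2 | d1) = Bin(n - 1 - d1, 1 - (1 - p)**d1)."""
    rng = random.Random(seed)
    samples = [d2 for d1, d2 in (two_layers(n, p, rng) for _ in range(trials))
               if d1 == d1_value]
    empirical = sum(samples) / len(samples)
    predicted = (n - 1 - d1_value) * (1 - (1 - p) ** d1_value)
    return empirical, predicted
```

With n = 30, p = 0.1 and d_1 = 3 the predicted conditional mean is 26·(1 − 0.9³) ≈ 7.05, and the empirical mean agrees to within sampling error.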
3 The main theorem
In this chapter we formulate the main theorem that we treat in this thesis. Before stating the theorem we first formulate and prove some useful bounds on the logarithm and the exponential function.
3.1 Approximation limits
In the next two chapters we will work towards an upper and lower bound by bounding the expectation of random variables. We will need some estimates which are stated below.
Lemma 3.1. For every x > 0 it holds that log(1 + x) ∈ [x − ½x², x].

Proof. For every x > 0 the following (in)equalities hold:

log(1 + x) = ∫_1^{1+x} (1/s) ds
           = x + ∫_1^{1+x} ((1/s) − 1) ds
           = x + ∫_1^{1+x} ∫_1^s (−1/u²) du ds
           = x − ∫_1^{1+x} ∫_1^s (1/u²) du ds
           ≥ x − ∫_1^{1+x} ∫_1^s 1 du ds
           = x − ∫_1^{1+x} (s − 1) ds = x − ∫_0^x s ds = x − ½x²,

where at the inequality we use 1/u² ≤ 1 for u ∈ [1, 1 + x]. Because 1/u² ≥ 0, the second line also proves the upper bound x.
Another useful upper bound for the logarithm is stated in the next lemma.

Lemma 3.2. For every x ∈ (0, ∞) it holds that log x ≤ x − 1.

Proof. For x ≥ 1 we have 1/u ≤ 1 for u ∈ [1, x], so

log x = ∫_1^x (1/u) du ≤ ∫_1^x 1 du = x − 1.

For x ∈ (0, 1) we have 1/u ≥ 1 for u ∈ [x, 1], so log x = −∫_x^1 (1/u) du ≤ −∫_x^1 1 du = x − 1.
We also have a similar lemma for the exponential function.

Lemma 3.3. For every x < 0 it holds that 1 − e^x ∈ [−x − ½x², −x].

Proof. For every x < 0 the following inequalities hold:

1 − e^x = ∫_x^0 e^s ds = −x + ∫_x^0 (e^s − 1) ds = −x − ∫_x^0 ∫_s^0 e^u du ds
        ≥ −x − ∫_x^0 ∫_s^0 1 du ds = −x − ∫_x^0 (−s) ds = −x − ½x²,

where the inequality uses e^u ≤ 1 for u ≤ 0, and the last equality follows by the same calculation as in the last part of the proof of Lemma 3.1. Furthermore, because e^u > 0 for all u, the double integral is nonnegative, which gives the upper bound −x.
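The bounds of Lemmas 3.1, 3.2 and 3.3 are easy to verify numerically; the following sketch (my own check, on an arbitrary grid of points) asserts all three on (0, 2) resp. (−2, 0).

```python
import math

# Numeric sanity check of Lemmas 3.1, 3.2 and 3.3 on a grid of points.
for k in range(1, 200):
    x = k / 100.0  # x in (0, 2)
    # Lemma 3.1: for x > 0, log(1 + x) lies in [x - x**2/2, x].
    assert x - x * x / 2 <= math.log(1 + x) <= x
    # Lemma 3.2: for x > 0, log x <= x - 1.
    assert math.log(x) <= x - 1
    # Lemma 3.3: for y < 0, 1 - exp(y) lies in [-y - y**2/2, -y].
    y = -x
    assert -y - y * y / 2 <= 1 - math.exp(y) <= -y
```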
The next lemma contains two upper bounds, which will be useful for proving bounds on the expectation of the neighbourhood sizes. We first state the lemma as it is stated in [3] as Lemma 2.4, and then we rewrite it to be applicable in Chapter 4.
Lemma 3.4 (Chernoff bound). Let X be a sum of independent {0, 1}-valued random variables with expectation E(X), and let h be given by h(x) := (1 + x) log(1 + x) − x. Then for all ζ > 0 it holds that:

P(X − E(X) ≥ ζE(X)) ≤ e^{−E(X)h(ζ)}   and   P(X − E(X) ≤ −ζE(X)) ≤ e^{−E(X)h(−ζ)}.

For the proof see page 25 of [3].

Remark. The Chernoff bound can be rewritten as follows. With the same definitions and conditions as in Lemma 3.4 it holds that:

P(X ≥ (1 + ζ)E(X)) ≤ e^{−E(X)h(ζ)}   and   P(X ≤ (1 − ζ)E(X)) ≤ e^{−E(X)h(−ζ)}.   (3.1)
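For a binomial random variable the upper-tail bound of (3.1) can be compared with the exact tail probability. The sketch below (my own illustration; parameters are arbitrary) computes P(X ≥ ⌈(1 + ζ)E(X)⌉) exactly and checks that it is dominated by e^{−E(X)h(ζ)}.

```python
import math

def h(x):
    """The function from the Chernoff bound: h(x) = (1 + x) log(1 + x) - x."""
    return (1 + x) * math.log(1 + x) - x

def upper_tail(n, p, k):
    """Exact P(X >= k) for X ~ Bin(n, p)."""
    return sum(math.comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k, n + 1))

# Check the left part of (3.1): P(X >= (1 + z) E(X)) <= exp(-E(X) h(z)).
n, p = 200, 0.3
mean = n * p
for z in (0.1, 0.25, 0.5, 1.0):
    k = math.ceil((1 + z) * mean)
    assert upper_tail(n, p, k) <= math.exp(-mean * h(z))
```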
3.2 Limiting the diameter
Let α ∈ (0, 1/2) and for n ∈ {2, 3, . . .} define the probability p_n by

p_n = n^α/(n − 1).   (3.2)

Clearly p_n > 0, and for n ≥ 3 we have n^α < √n ≤ n − 1, so p_n < 1; hence p_n is indeed a probability for all n ≥ 3, which suffices for the asymptotic statements below.

Definition 3.5. For x ∈ R we denote by ⌈x⌉ the smallest integer greater than or equal to x, so x ≤ ⌈x⌉ < x + 1.
Now we can state the main theorem about the diameter of Erdős-Rényi graphs.

Theorem 3.6. Let α ∈ R with α ∈ (0, 1/2), and let

p_n = n^α/(n − 1)   and   M = ⌈1/(2α)⌉.   (3.3)

Then for the Erdős-Rényi graph G(n, p_n) it holds that:

lim_{n→∞} P(D(G(n, p_n)) ∈ {2M − 3, 2M − 2, 2M − 1, 2M, 2M + 1}) = 1.   (3.4)

We will prove Theorem 3.6 by proving Theorem 4.6, which states that the diameter of G(n, p_n) is bounded from above by 2M + 1 with high probability, and Theorem 5.1, which states that the diameter D(G(n, p_n)) is bounded from below by 2M − 3 with high probability.

Remark. Theorem 3.6 is a minor simplification of Theorem 4.2 on page 48 of [3]. Indeed, in that book one does not assume that the probability p_n is of the form (3.2), but instead assumes that both

lim_{n→∞} log n / ((n − 1)p_n) = 0   and   lim_{n→∞} (n − 1)p_n/√n = 0.
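The choice (3.2) satisfies both conditions of the remark, since log n/((n − 1)p_n) = log n/n^α and (n − 1)p_n/√n = n^{α−1/2}. A quick numeric check (my own illustration, for the arbitrary value α = 0.3):

```python
import math

def pn(n, alpha):
    """The probability (3.2): p_n = n**alpha / (n - 1)."""
    return n**alpha / (n - 1)

alpha = 0.3
ns = (10**3, 10**5, 10**7)
# First condition: log n / ((n - 1) p_n) = log n / n**alpha -> 0.
a_vals = [math.log(n) / ((n - 1) * pn(n, alpha)) for n in ns]
# Second condition: (n - 1) p_n / sqrt(n) = n**(alpha - 1/2) -> 0.
b_vals = [(n - 1) * pn(n, alpha) / math.sqrt(n) for n in ns]
assert a_vals[0] > a_vals[1] > a_vals[2] > 0
assert b_vals[0] > b_vals[1] > b_vals[2] > 0
```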
4 Upper bound
Before stating the theorem of the upper bound we first introduce a definition and state Lemma 4.4 about the neighbourhood sizes. It turns out that this lemma is also necessary for the lower bound in Chapter 5. In this chapter we assume α ∈ (0, 1/2) and M = ⌈1/(2α)⌉.
4.1 The neighbourhood size
Throughout this section we fix an arbitrarily small ε > 0 and suppose n is big, in particular n > 3. Recall that for every u ∈ V(n, p_n) and j ∈ {1, . . . , D(G(n, p_n))}, the neighbourhood size d_j(u) denotes the number of points v ∈ V(n, p_n) with d(u, v) = j.
Definition 4.1. Define for all i ∈ {3, . . . , M}:

d_i^− = (1 − ε)²(1 − εn^{−α})^{i−2} n^{iα},   d_i^+ = (1 + ε)²(1 + εn^{−α})^{i−2} n^{iα},

and define for all points u ∈ G and i = 3, . . . , M the event

E_i(u) = {d_i^− ≤ d_i(u) ≤ d_i^+},

with its complement

Ē_i(u) = {d_i(u) < d_i^−} ∪ {d_i^+ < d_i(u)}.
Remark. For the rest of this chapter it will be useful to have the following two propositions regarding d_j^− and d_j^+.

Proposition 4.2. For every i, j ∈ {3, . . . , M} with i ≤ j it holds that: d_i^− ≤ d_i^+ ≤ d_j^+.

Proof. The first inequality follows directly from the fact that ε and n are greater than 0. The second inequality follows from the observation that for every j ≥ i it holds that:

d_j^+ = (1 + ε)²(1 + εn^{−α})^{j−2} n^{jα} = d_i^+ (1 + εn^{−α})^{j−i} n^{(j−i)α},

noting that both extra factors are at least one for every n ≥ 3.

Proposition 4.3. For a fixed α ∈ (0, 1/2) and M = ⌈1/(2α)⌉ there is a constant C < ∞ such that:

limsup_{n→∞} d_{M−1}^+/√n ≤ C.

Proof. Filling in the definition we have

d_{M−1}^+ = (1 + ε)²(1 + εn^{−α})^{M−3} n^{(M−1)α} ≤ (1 + ε)²(1 + εn^{−α})^{1/(2α)} n^{1/2} ≤ (1 + ε)^{2 + 1/(2α)} √n,

where in the first inequality we used M − 1 ≤ 1/(2α), so n^{(M−1)α} ≤ n^{1/2} and (1 + εn^{−α})^{M−3} ≤ (1 + εn^{−α})^{1/(2α)}, and in the second inequality we used that εn^{−α} < ε, so 1 < 1 + εn^{−α} < 1 + ε. As (1 + ε)^{2 + 1/(2α)} < ∞ this finishes the proof.
Now we can state the lemma about P(E_i(u)), the probability that the number of points at distance i from u is bounded between d_i^− and d_i^+.

Lemma 4.4. Let E_i(u) be as in Definition 4.1, take α ∈ (0, 1/2) and M = ⌈1/(2α)⌉. Then for any fixed K > 0 and large enough n it holds that:

P(E_i(u)) ≥ 1 − Mn^{−K},   for u ∈ {1, . . . , n}, i = 1, . . . , M.
Proof. For readability we first introduce the (i − 1)-dimensional subset of N^{i−1}

D_{i−1} = {d ∈ N^{i−1} : ∀k ∈ {1, . . . , i − 1}, d_k ∈ (d_k^−, d_k^+)}.

We now rewrite the following, with |d| := d_1 + · · · + d_{i−1}:

P(Ē_i | E_1, . . . , E_{i−1})
= P({d_i < d_i^−} | E_1, . . . , E_{i−1}) + P({d_i > d_i^+} | E_1, . . . , E_{i−1})
≤ max_{d∈D_{i−1}} P(d_i < d_i^− | (d_1, . . . , d_{i−1}) = d) + max_{d∈D_{i−1}} P(d_i > d_i^+ | (d_1, . . . , d_{i−1}) = d)
= max_{d∈D_{i−1}} P(X < d_i^− | X ∼ Bin(n − 1 − |d|, 1 − (1 − p_n)^{d_{i−1}})) + max_{d∈D_{i−1}} P(X > d_i^+ | X ∼ Bin(n − 1 − |d|, 1 − (1 − p_n)^{d_{i−1}}))
≤ P(Bin(n − 1 − d_1^+ − · · · − d_{i−1}^+, 1 − (1 − p_n)^{d_{i−1}^−}) ≤ d_i^−) + P(Bin(n, 1 − (1 − p_n)^{d_{i−1}^+}) ≥ d_i^+).   (4.1)

For the second equality we used Proposition 2.7 for L(d_i(u) | d_1(u), . . . , d_{i−1}(u)). For the last inequality we used three things. Firstly, for d ∈ D_{i−1} we have

n ≥ n − 1 − d_1 − · · · − d_{i−1} ≥ n − 1 − d_1^+ − · · · − d_{i−1}^+.   (4.2)

Secondly, it holds that

d_{i−1}^− ≤ d_{i−1} ≤ d_{i−1}^+ ⇒ 1 − (1 − p_n)^{d_{i−1}^−} ≤ 1 − (1 − p_n)^{d_{i−1}} ≤ 1 − (1 − p_n)^{d_{i−1}^+}.   (4.3)

Thirdly, we used the following monotonicity properties of the binomial distribution: for all k ∈ N it holds that

P(X < k | X ∼ Bin(n, p)) ≤ P(Y < k | Y ∼ Bin(m, p))   for m ≤ n,
P(X < k | X ∼ Bin(n, p)) ≤ P(Y < k | Y ∼ Bin(n, p′))   for p′ ≤ p,

together with the analogous properties, with the roles of the parameters reversed, for the upper tail P(X > k). Filling in (4.2) and (4.3) indeed gives the last inequality. The following section is the link between the two probabilities of equation (4.1) and the Chernoff bound of Lemma 3.4.
4.1.1 Applying the Chernoff bound
We want to estimate the expectations

E(X | X ∼ Bin(n − 1 − d_1^+ − · · · − d_{i−1}^+, 1 − (1 − p_n)^{d_{i−1}^−})) = (n − 1 − d_1^+ − · · · − d_{i−1}^+)(1 − (1 − p_n)^{d_{i−1}^−})   (4.4)

and

E(X | X ∼ Bin(n, 1 − (1 − p_n)^{d_{i−1}^+})) = n(1 − (1 − p_n)^{d_{i−1}^+}).   (4.5)

Below we first provide a bound on 1 − (1 − p_n)^{d_{i−1}^−} and 1 − (1 − p_n)^{d_{i−1}^+}. The bound is derived in the same way for both d_{i−1}^− and d_{i−1}^+, so we use the notation d_{i−1}^±. Because 1 − p_n ∈ (0, 1) we can apply Lemma 3.2, which gives for every x ∈ (0, 1) the upper bound x − 1 for log x. So the inequality

d_{i−1}^± log(1 − p_n) ≤ −d_{i−1}^± p_n

holds, and therefore it holds that

1 − (1 − p_n)^{d_{i−1}^±} = 1 − e^{d_{i−1}^± log(1 − p_n)} ≤ 1 − e^{−d_{i−1}^± p_n}.

Because −d_{i−1}^± p_n < 0 we can next apply Lemma 3.3, which gives for every x < 0 the upper bound −x for 1 − e^x. So it holds that

1 − (1 − p_n)^{d_{i−1}^±} ≤ d_{i−1}^± p_n.

As α ∈ (0, 1/2) it holds that lim_{n→∞} √n p_n = 0. By Propositions 4.2 and 4.3 we also know that limsup_{n→∞} d_{i−1}^±/√n < ∞, so lim_{n→∞} d_{i−1}^± p_n = 0.
So, up to factors tending to one as n → ∞, the expectations stated in (4.4) and (4.5) satisfy

E(X_1) = n^α d_{i−1}^−(1 + η)   and   E(X_2) = n^α d_{i−1}^+(1 + η),

with η := 1/(n − 1), so lim_{n→∞} η = 0, where X_1 and X_2 are random variables with laws

L(X_1) = Bin(n − 1 − d_1^+ − · · · − d_{i−1}^+, 1 − (1 − p_n)^{d_{i−1}^−})   and   L(X_2) = Bin(n, 1 − (1 − p_n)^{d_{i−1}^+}).

Using d_i^− = (1 − εn^{−α}) n^α d_{i−1}^−, it holds that

P(X_1 ≤ d_i^−) = P(X_1 ≤ ((1 − εn^{−α})/(1 + η)) E(X_1)) ≤ P(X_1 ≤ (1 − ζ) E(X_1)),

and using d_i^+ = (1 + εn^{−α}) n^α d_{i−1}^+, it holds that

P(X_2 ≥ d_i^+) = P(X_2 ≥ ((1 + εn^{−α})/(1 + η)) E(X_2)) = P(X_2 ≥ (1 + ζ) E(X_2)),

where

ζ := (εn^{−α} − η)/(1 + η),

which is positive for n large. Now we can apply the Chernoff bound of Lemma 3.4 twice. The right part of equation (3.1) provides

P(Bin(n − 1 − d_1^+ − · · · − d_{i−1}^+, 1 − (1 − p_n)^{d_{i−1}^−}) ≤ d_i^−) ≤ e^{−n^α d_{i−1}^−(1+η) h(−ζ)},

and the left part of (3.1) provides:

P(Bin(n, 1 − (1 − p_n)^{d_{i−1}^+}) ≥ d_i^+) ≤ e^{−n^α d_{i−1}^+(1+η) h(ζ)}.

So together this gives:

P(Ē_i | E_1, . . . , E_{i−1}) ≤ e^{−n^α d_{i−1}^−(1+η) h(−ζ)} + e^{−n^α d_{i−1}^+(1+η) h(ζ)}.   (4.6)
Now, for the exponents, we first state the next proposition regarding the function h appearing in the Chernoff bound.

Proposition 4.5. For every x ∈ (0, 1) the function h(x) := (1 + x) log(1 + x) − x satisfies

h(x) ∈ [½x²(1 − ⅓x), ½x²]   and   h(−x) ≥ ½x².

Proof. Since h(0) = 0 and h′(x) = log(1 + x), we have h(x) = ∫_0^x log(1 + s) ds. By Lemma 3.1, log(1 + s) ∈ [s − ½s², s] for s > 0, so for x ∈ (0, 1):

h(x) ∈ [∫_0^x (s − ½s²) ds, ∫_0^x s ds] = [½x² − ⅙x³, ½x²] = [½x²(1 − ⅓x), ½x²].

For the second claim, note that for s ∈ (−1, 0) we have log(1 + s) ≤ s by Lemma 3.2, hence

h(−x) = ∫_{−x}^0 (−log(1 + s)) ds ≥ ∫_{−x}^0 (−s) ds = ½x².
For the exponents in equation (4.6), Proposition 4.5 gives h(−ζ) ≥ ½ζ² and h(ζ) ≥ ½ζ²(1 − ⅓ζ). Since ζ = (εn^{−α} − η)/(1 + η) behaves like εn^{−α}, there are constants c_1, c_2 > 0 such that for n large:

n^α d_{i−1}^−(1 + η) h(−ζ) ≥ c_1 n^α d_{i−1}^− (εn^{−α})²   and   n^α d_{i−1}^+(1 + η) h(ζ) ≥ c_2 n^α d_{i−1}^+ (εn^{−α})².

Because d_{i−1}^± grows at least like a positive power of n, both right-hand sides grow faster than any constant multiple of log n. Hence, for fixed K, there is an N ∈ N such that for all n > N it holds that:

P(Ē_i | E_1, . . . , E_{i−1}) ≤ 2e^{−(K+1) log n} ≤ n^{−K}.   (4.7)
4.1.2 Finishing the proof
Continuation of the proof of Lemma 4.4. Now concluding:

P(E_i(u)) ≥ P(E_1(u), . . . , E_i(u))
≥ P(E_1(u), . . . , E_{i−1}(u)) − P(Ē_i | E_1(u), . . . , E_{i−1}(u))
≥ P(E_1(u)) − Σ_{j=2}^{i} P(Ē_j | E_1(u), . . . , E_{j−1}(u))
= 1 − P(Ē_1(u)) − Σ_{j=2}^{i} P(Ē_j | E_1(u), . . . , E_{j−1}(u))
≥ 1 − n^{−K} − (M − 1)n^{−K} = 1 − Mn^{−K},

where in the second inequality we use the property that for two events A and B, with B̄ the complement of B, it holds that:

P(A, B) = P(B | A)P(A) = (1 − P(B̄ | A))P(A) = P(A) − P(A)P(B̄ | A) ≥ P(A) − P(B̄ | A),

and for the last inequality we use equation (4.7) at most M times.
4.2 Upper bound theorem
As announced at the end of Section 3.2, we can now prove the bounds on the diameter. We first prove the upper bound.

Theorem 4.6. Suppose α ∈ (0, 1/2), M = ⌈1/(2α)⌉ and p_n = n^α/(n − 1). Then for the Erdős-Rényi graph G(n, p_n) it holds that:

lim_{n→∞} P(D(G(n, p_n)) ≤ 2M + 1) = 1   (cf. equation (3.4)).
Proof. For two nodes u, v ∈ G(n, p_n), d(u, v) > 2M + 1 can only occur if the M-neighbourhoods of u and v are disjoint,

(∪_{i=1}^{M} Γ_i(u)) ∩ (∪_{i=1}^{M} Γ_i(v)) = ∅,

and additionally all the d_M(u) × d_M(v) possible edges between Γ_M(u) and Γ_M(v) are absent from the graph G(n, p_n). The conditional probability that all these d_M(u) × d_M(v) edges are absent gives

P(d(u, v) > 2M + 1 | Γ_1(u), . . . , Γ_M(u), Γ_1(v), . . . , Γ_M(v)) ≤ (1 − p_n)^{d_M(u)d_M(v)}.

Note that by Lemma 4.4, for some constant c > 0:

P(d(u, v) > 2M + 1) ≤ E(1_{Ē_M(u)} + 1_{Ē_M(v)} + (1 − p_n)^{(d_M^−)²})
≤ P(Ē_M(u)) + P(Ē_M(v)) + (1 − p_n)^{(d_M^−)²}
≤ Mn^{−K} + Mn^{−K} + e^{−cn^α} = 2Mn^{−K} + e^{−cn^α}.

For fixed K > 2 this goes to zero faster than n^{−2} as n → ∞, so a union bound over all pairs u, v shows that the diameter stays below 2M + 2 with high probability; therefore the upper bound is proven.
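As an illustration of Theorem 4.6 (my own sketch; the parameter choices α = 0.4 and n = 400 are arbitrary, and at such moderate n the asymptotics are only roughly visible), one can sample G(n, p_n) and compare its diameter with the value 2M + 1.

```python
import math
import random
from collections import deque

def er_diameter(n, p, seed):
    """Sample G(n, p) and return its diameter, or None if disconnected."""
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                adj[u].add(v)
                adj[v].add(u)
    best = 0
    for src in range(n):
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        if len(dist) < n:
            return None
        best = max(best, max(dist.values()))
    return best

alpha = 0.4
M = math.ceil(1 / (2 * alpha))      # here M = 2, so the upper bound is 2M + 1 = 5
n = 400
p = n**alpha / (n - 1)              # the probability p_n of (3.2)
diam = er_diameter(n, p, seed=3)    # one sampled diameter (may be None if disconnected)
```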
5 Lower bound
In this chapter we again assume α ∈ (0, 1/2) and M = ⌈1/(2α)⌉. Having proven the upper bound, it remains to show:

Theorem 5.1. Take α ∈ (0, 1/2), M = ⌈1/(2α)⌉ and p_n = n^α/(n − 1). Then for the Erdős-Rényi graph G(n, p_n) it holds that:

lim_{n→∞} P(D(G(n, p_n)) > 2M − 4) = 1   (cf. equation (3.4)).

Before starting the proof, we first state a theorem about the probability that two random sets have an empty intersection.
5.1 Empty sets
Theorem 5.2. Let C be a set of n items, let (r_n)_{n∈N} be a sequence such that lim_{n→∞} r_n/√n = 0, and let C_1 and C_2 be two subsets of size r_n selected independently and uniformly at random from C. Then it holds that:

P(C_1 ∩ C_2 = ∅) = (1 + c_1) e^{−r_n²/(n − 2r_n) + R_n},

where lim_{n→∞} c_1 = 0 and the remainder R_n satisfies |R_n| ≤ R·r_n³/(n − 2r_n)² for some finite constant R > 0 and n large enough.

In this chapter we will, for readability, write r instead of r_n, keeping in mind that it depends on n and satisfies lim_{n→∞} r/√n = 0.
Before proving the above theorem we first state Stirling's formula and a corollary of this formula for limits.

Lemma 5.3 (Stirling's formula). For all n ∈ N define f(n) = n! and g(n) = √(2π) n^{n+1/2} e^{−n}. Then

lim_{n→∞} (f(n)/g(n) − 1) = lim_{n→∞} (n!/(√(2π) n^{n+1/2} e^{−n}) − 1) = 0.

We want to approximate the probability P(C_1 ∩ C_2 = ∅) for C_1 and C_2 as in Theorem 5.2. Because there are binom(n, r) possible subsets of C of size r, this probability equals

P(C_1 ∩ C_2 = ∅) = binom(n − r, r)/binom(n, r) = ((n − r)!/((n − 2r)! r!)) · ((n − r)! r!/n!) = (n − r)!(n − r)!/((n − 2r)! n!).   (5.1)

Therefore we first deduce the next corollary regarding the difference between

(n − r)!(n − r)!/((n − 2r)! n!)   and   (n − r)^{2(n−r)+1} e^{−2(n−r)}/((n − 2r)^{n−2r+1/2} e^{−(n−2r)} n^{n+1/2} e^{−n}).
Corollary 5.4. Let f(n) and g(n) be defined as in Lemma 5.3 for all n ∈ N. Then it holds that:

lim_{n→∞} ( f²(n − r)/(f(n − 2r)f(n)) − g²(n − r)/(g(n − 2r)g(n)) ) = 0.

Proof. We write

f²(n − r)/(f(n − 2r)f(n)) − g²(n − r)/(g(n − 2r)g(n)) = g²(n − r)/(g(n − 2r)g(n)) · ( (g(n − 2r)/f(n − 2r)) (g(n)/f(n)) (f(n − r)/g(n − r))² − 1 ).   (5.2)

The factor in brackets on the right-hand side of (5.2) tends to zero, because by Stirling's formula

lim_{n→∞} (f(m)/g(m) − 1) = 0   for m ∈ {n − 2r, n − r, n}.

For the corollary to be true, it remains to show that the prefactor g²(n − r)/(g(n − 2r)g(n)) is bounded in n. It holds that

g²(n − r)/(g(n − 2r)g(n)) = ((√(2π))²/(√(2π)√(2π))) · (n − r)^{2(n−r)+1} e^{−2(n−r)}/((n − 2r)^{n−2r+1/2} e^{−(n−2r)} n^{n+1/2} e^{−n}) = (n − r)^{2n−2r+1} n^{−(n+1/2)} (n − 2r)^{2r−n−1/2}.   (5.3)

Taking logarithms and splitting the exponent of (5.3) gives

(2n − 2r + 1) log(n − r) − (n + ½) log n − (n − 2r + ½) log(n − 2r)
= (n − 2r) log(1 + r/(n − 2r)) + n log(1 − r/n) + ½ log((n − r)²/((n − 2r)n)),

so that

g²(n − r)/(g(n − 2r)g(n)) = √((n − r)²/((n − 2r)n)) · e^{(n−2r) log(1 + r/(n−2r)) + n log(1 − r/n)}.

For the square root we have, since (n − r)² = (n − 2r)n + r² and r²/n → 0,

lim_{n→∞} √((n − r)²/((n − 2r)n)) = 1.

For the exponential, we rewrite the exponent using (1 + r/(n − 2r))(1 − r/n) = 1 + r²/(n(n − 2r)):

(n − 2r) log(1 + r/(n − 2r)) + n log(1 − r/n) = n log(1 + r²/(n(n − 2r))) − 2r log(1 + r/(n − 2r)).

Now Lemma 3.1 tells us that for every x > 0, log(1 + x) ∈ [x − ½x², x]. Applying this with x = r²/(n(n − 2r)) and with x = r/(n − 2r) we obtain

(n − 2r) log(1 + r/(n − 2r)) + n log(1 − r/n) ∈ [−r²/(n − 2r) − r⁴/(2n(n − 2r)²), −r²/(n − 2r) + r³/(n − 2r)²].

Since r/√n → 0, both endpoints of this interval tend to 0, so the exponential tends to 1 and the prefactor is bounded. Hence

lim_{n→∞} ( f²(n − r)/(f(n − 2r)f(n)) − g²(n − r)/(g(n − 2r)g(n)) ) = lim_{n→∞} C_n · 0 = 0

with limsup_{n→∞} C_n < ∞, so the limit equals 0 as desired.
Remark. Because lim_{n→∞} r/(2n) = 0, the lower-order term above can be written as

−r⁴/(2n(n − 2r)²) = −(r/(2n)) · r³/(n − 2r)²,

so both endpoints of the interval differ from −r²/(n − 2r) by at most a constant times r³/(n − 2r)²; this is where the remainder R_n in Theorem 5.2 comes from.
Proof of Theorem 5.2. Let n ∈ N, let C be a set of n items and let r be such that lim_{n→∞} r/√n = 0. As mentioned in (5.1),

P(C_1 ∩ C_2 = ∅) = (n − r)!(n − r)!/((n − 2r)! n!).

Denoting g(n) = √(2π) n^{n+1/2} e^{−n}, Corollary 5.4 yields

lim_{n→∞} ( P(C_1 ∩ C_2 = ∅) − g²(n − r)/(g(n − 2r)g(n)) ) = 0.

So, by the computation in the proof of Corollary 5.4,

P(C_1 ∩ C_2 = ∅) = (1 + c_1) √((n − r)²/((n − 2r)n)) e^{(n−2r) log(1 + r/(n−2r)) + n log(1 − r/n)} = (1 + c_1) e^{−r²/(n−2r) + R_n}

with lim_{n→∞} c_1 = 0, and with R_n bounded as in the interval of Corollary 5.4 and the subsequent remark. This proves the theorem.
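The asymptotics of Theorem 5.2 can be checked numerically: the exact probability (5.1) can be evaluated as a telescoping product, and compared with e^{−r²/(n − 2r)}. A short sketch (my own check; the values n = 10⁶, r = 100 are arbitrary but satisfy r ≪ √n):

```python
import math

def p_disjoint(n, r):
    """Exact P(C1 ∩ C2 = ∅) = binom(n - r, r) / binom(n, r), computed as a product."""
    prob = 1.0
    for k in range(r):
        prob *= (n - r - k) / (n - k)
    return prob

n, r = 10**6, 100
exact = p_disjoint(n, r)
approx = math.exp(-r * r / (n - 2 * r))
# The remainder R_n is of order r**3 / (n - 2r)**2, here about 1e-6.
assert abs(exact - approx) < 1e-4
```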
5.2 Nonempty sets
Theorem 5.5. Let C be a set of n items, let (r_n)_{n∈N} satisfy lim_{n→∞} r_n/√n = 0, and let C_1 and C_2 be two subsets of size r_n selected independently and uniformly at random from C. Then there is a finite constant R > 0 such that for n large enough:

P(C_1 ∩ C_2 ≠ ∅) ≤ R·r_n²/(n − 2r_n).

Proof. We first notice that

P(C_1 ∩ C_2 ≠ ∅) = 1 − P(C_1 ∩ C_2 = ∅).

Applying Theorem 5.2, together with the interval of Corollary 5.4, gives for n large

P(C_1 ∩ C_2 ≠ ∅) ≤ 1 − (1 + c_1) e^{−r²/(n−2r) − r⁴/(2n(n−2r)²)}.

By Lemma 3.3 it holds that 1 − e^x ≤ −x for every x < 0; applying this with x = −r²/(n − 2r) − r⁴/(2n(n − 2r)²) yields

P(C_1 ∩ C_2 ≠ ∅) ≤ r²/(n − 2r) + r⁴/(2n(n − 2r)²) + |c_1|.

Since r⁴/(2n(n − 2r)²) = (r²/(2n(n − 2r))) · r²/(n − 2r) is of smaller order than r²/(n − 2r), and since one can check from Stirling's formula that the error c_1 is of order 1/n ≤ r²/(n − 2r), all three terms are bounded by a constant times r²/(n − 2r), which proves the claim.
Now that the probability that two random sets C_1 and C_2 of size r, chosen from a set of n items, have a nonempty intersection is bounded by a constant times r²/n, we translate our problem to this result.
5.3 Lower bound theorem
Proof of Theorem 5.1. Fix two nodes u, v ∈ G(n, p_n). We choose the set C_1 at random among sets of size 1 + d_1(u) + · · · + d_{M−2}(u), and construct the set C_2 as follows.

1. First determine Γ_1(u), . . . , Γ_{M−2}(u) among the n nodes and define Γ̄_0(u) as the set containing the remaining n − 1 − d_1(u) − · · · − d_{M−2}(u) nodes. By construction there are, conditionally on Γ_1(u), . . . , Γ_{M−2}(u), no edges between Γ̄_0(u) and Γ_i(u) for i = 1, . . . , M − 3. Because adding edges is independent of the ordering, the edges inside Γ̄_0(u) and between Γ̄_0(u) and Γ_{M−2}(u) are not affected by the conditioning.

2. Now define for any v ∈ G(n, p_n) the sets

Γ'_0(u) = Γ̄_0(u) ∪ Γ_{M−2}(u)   and   β_i(v) = ∪_{j=1}^{i} ∆_j(v),

where ∆_j(v) is a random set of size d_j(v) = #Γ_j(v).

3. Now we construct C_2:

a) Pick a node v ∈ G(n, p_n) at random.
b) If v ∈ Γ̄_0(u), pick at random a set ∆_1(v) from Γ'_0(u) \ {v}.
c) If ∆_1(v) has empty intersection with Γ_{M−2}(u), pick at random a set ∆_2(v) from Γ'_0(u) \ β_1(v).
d) In general, if ∆_{i−1}(v) has empty intersection with Γ_{M−2}(u), pick at random a set ∆_i(v) from Γ'_0(u) \ β_{i−1}(v).
The probability that this procedure stops (i.e., that some intersection is nonempty) equals the probability that two random sets C_1 and C_2 have a nonempty intersection, where C_1 and C_2 satisfy

|C_1| = 1 + d_1(u) + · · · + d_{M−2}(u)   and   |C_2| = 1 + d_1(v) + · · · + d_{M−2}(v),

and where, on the event that all neighbourhood sizes are bounded as in Definition 4.1,

lim_{n→∞} (d_1(u) + · · · + d_{M−2}(u))/√n = 0.   (5.4)

Equation (5.4) holds by Proposition 4.3, which states that limsup_{n→∞} d_{M−1}^+/√n < ∞, together with the observation

d_j^+ = d_{M−1}^+ ((1 + εn^{−α}) n^α)^{j−(M−1)}   for j ≤ M − 1.

For k ∈ {1, 2} it also holds that:

|C_k| ≤ 1 + d_1^+ + · · · + d_{M−2}^+ ≤ 1 + d_{M−1}^+ Σ_{j=1}^{M−2} ((1 + εn^{−α}) n^α)^{j−(M−1)} =: C_n

for a sequence (C_n)_{n∈N} satisfying lim_{n→∞} C_n/√n = 0. So:
P(d(u, v) ≤ 2M − 4)
= P(d(u, v) ≤ 2M − 4, A) + P(d(u, v) ≤ 2M − 4, Ā)
= P(d(u, v) ≤ 2M − 4 | A)P(A) + P(d(u, v) ≤ 2M − 4 | Ā)P(Ā)
≤ P(d(u, v) ≤ 2M − 4 | A) + P(Ā),

where A := ∩_{i=1}^{M−2}(E_i(u) ∩ E_i(v)) and Ā = ∪_{i=1}^{M−2}(Ē_i(u) ∪ Ē_i(v)). Now d(u, v) ≤ 2M − 4 forces the (M − 2)-neighbourhoods of u and v to intersect, so by Theorem 5.5 and the bound |C_k| ≤ C_n on A, and by Lemma 4.4:

P(d(u, v) ≤ 2M − 4) ≤ O(C_n²/n) + Σ_{i=1}^{M−2} (P(Ē_i(u)) + P(Ē_i(v))) ≤ O(C_n²/n) + 2(M − 2)Mn^{−K} → 0   as n → ∞.

Since D(G(n, p_n)) ≥ d(u, v), it follows that lim_{n→∞} P(D(G(n, p_n)) > 2M − 4) = 1.
5.4 Concluding
This ends the proof of the lower bound; together with Theorem 4.6 we have now proven both bounds of Theorem 3.6.
6 Small diameter
The result of Chapter 3 can be read as saying that all n persons living on earth are connected via a small number of common acquaintances. However, Theorem 3.6 is based on the assumption that the probability p_n of a connection between two nodes (e.g. persons) is the same for every pair of nodes, which is unrealistic for social networks. Therefore we introduce the so-called Strogatz-Watts graphs.
6.1 Strogatz-Watts graph
Definition 6.1. Let m ∈ N and let n = m². A Strogatz-Watts graph SW(n, p_n) is a semi-random graph on a set V of n = m² nodes, arranged in an m × m grid, where the nodes are connected via the following edges:

• via a grid line: each point v is connected with its grid neighbours, giving between two and four such edges per node;

• via a shortcut: each point v is connected with probability p_n to a uniformly chosen point from V \ {v}.

Remark. Without shortcuts the diameter D(SW(n, p)) is the distance between the bottom-left and the top-right corner of the grid, which equals 2(√n − 1). With the shortcuts added, the diameter decreases radically. The small world phenomenon is now represented as follows:
Theorem 6.2. Let m ∈ N and let n = m². Let p_n be a probability depending on n and let SW(n, p_n) be a Strogatz-Watts graph. Then there is a constant M_p, depending on the probability p_n, such that for the diameter D(SW(n, p_n)) it holds that:

lim_{n→∞} P(D(SW(n, p_n)) ≤ M_p log n) = 1.

The proof is omitted; for further reading consult Chapter 6 of [3], where the above theorem is stated and proven as Theorem 6.1.
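Definition 6.1 and the remark above can be illustrated with a short simulation (my own sketch; the grid size m = 10 and shortcut probability are arbitrary): build the m × m grid, optionally add shortcuts, and compute the diameter by breadth-first search.

```python
import random
from collections import deque

def sw_graph(m, p, seed=None):
    """m x m grid; additionally each node gets, with probability p,
    a shortcut to a uniformly chosen other node (as in Definition 6.1)."""
    rng = random.Random(seed)
    n = m * m
    adj = {v: set() for v in range(n)}
    for i in range(m):
        for j in range(m):
            v = i * m + j
            if i + 1 < m:
                adj[v].add(v + m); adj[v + m].add(v)   # vertical grid edge
            if j + 1 < m:
                adj[v].add(v + 1); adj[v + 1].add(v)   # horizontal grid edge
    for v in range(n):
        if p > 0 and rng.random() < p:
            w = rng.randrange(n - 1)
            w = w if w < v else w + 1                  # uniform on V \ {v}
            adj[v].add(w); adj[w].add(v)
    return adj

def diameter(adj):
    """Max over all pairs of the BFS distance."""
    best = 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        best = max(best, max(dist.values()))
    return best
```

Without shortcuts (p = 0) the diameter of the 10 × 10 grid is 2(√n − 1) = 18; adding shortcut edges can only shorten distances, in line with Theorem 6.2.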
7 Popular summary

You have probably experienced being on holiday somewhere and, by chance, getting into a conversation with a stranger; you go through some people you both know, and a mutual friend turns out to be among them. Funnily enough, this happens quite often, which is why we sometimes say: "What a small world it is!" And that is right: it is even known as the small world phenomenon.

Sometimes you see really cool people and wonder whether you could ever shake hands with them. What if I told you that you are only six "handshake distances" away from them? What is a handshake distance? Suppose you are standing in a group with Alice, Bob and Eve and you shake hands with Alice; then you are one handshake distance away from Alice. If Alice then shakes hands with Bob, Alice is one handshake distance away from both Bob and you, and you are two handshake distances away from Bob. If Bob then shakes hands with Eve, you are three handshake distances away from Eve.

Now suppose shaking hands is not enough for you, and you also want to be facebook friends with that cool person. First-degree friends are your own friends, second-degree friends are the friends of your friends, and so on. Then it holds that everyone, including that cool person you want to shake hands with, is at most a sixth-degree friend of yours.

This theory was once introduced by Milgram and subsequently tested on a larger scale by Microsoft in its Messenger database, after which I have now proven it mathematically following the book Epidemics and Rumours in Complex Networks by Moez Draief and Laurent Massoulié [3].
Bibliography
[1] Stanley Milgram, The Small-World Problem, Psychology Today, Vol. 1, No. 1, 61-67, 1967.
[2] Jeffrey Travers and Stanley Milgram, An Experimental Study of the Small World Problem, Sociometry, Vol. 32, No. 4, 425-443, 1969.
[3] Moez Draief and Laurent Massoulié, Epidemics and Rumours in Complex Networks, Cambridge University Press, 46-67, 123, 2010.
[4] Jure Leskovec and Eric Horvitz, Planetary-Scale Views on an Instant-Messaging Network, Microsoft Research Technical Report MSR-TR-2006-18, 1, 28, June 2007.