
The handle http://hdl.handle.net/1887/49012 holds various files of this Leiden University dissertation.

Author: Gao, F.

Title: Bayes and networks

Issue Date: 2017-05-23


Part IV

APPENDIX


A

DIRICHLET PROCESSES AND CONTRACTION RATES RELATIVE TO NON-METRICS

In this appendix we present the definition of the Dirichlet process and give the proof of Theorem 1.13, a generalization of the basic contraction rate theorem of Chapter 1 that is valid for more general discrepancies.

A.1 dirichlet processes

We give one version of the definition of the Dirichlet process, as in [32, Chapter 4]. $\mathrm{Dir}(k; \alpha_1, \ldots, \alpha_k)$ denotes the Dirichlet distribution parameterized by a $k$-dimensional real vector $(\alpha_1, \ldots, \alpha_k)$.

Definition A.1 (Dirichlet Process). A random measure $P$ on a Polish space $(\mathfrak{X}, \mathcal{X})$ is said to possess a Dirichlet process distribution $\mathrm{DP}(\alpha)$ with base measure $\alpha$, a given positive Borel measure on $(\mathfrak{X}, \mathcal{X})$, if for every finite measurable partition $A_1, A_2, \ldots, A_k$ of $\mathfrak{X}$,

(𝑃(𝐴1), … , 𝑃(𝐴𝑘)) ∼ Dir(𝑘; 𝛼(𝐴1), … , 𝛼(𝐴𝑘)). (A.1)
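Definition A.1 can be illustrated numerically through the standard stick-breaking construction of the Dirichlet process. The following is a minimal sketch (not part of the dissertation); the base-measure mass `alpha_mass = 4`, the uniform base measure on $[0,1]$, and the two-set partition are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def dp_sample(alpha_mass, n_atoms=500):
    """Truncated stick-breaking draw from DP(alpha), with base measure
    alpha = alpha_mass * Uniform[0, 1]; returns atom locations and weights."""
    betas = rng.beta(1.0, alpha_mass, size=n_atoms)
    weights = betas * np.concatenate(([1.0], np.cumprod(1.0 - betas)[:-1]))
    atoms = rng.uniform(0.0, 1.0, size=n_atoms)
    return atoms, weights

# For the partition A1 = [0, 1/2), A2 = [1/2, 1], property (A.1) says
# (P(A1), P(A2)) ~ Dir(2; 2, 2), so P(A1) ~ Beta(2, 2) with
# mean 1/2 and variance 2*2 / (4^2 * 5) = 0.05.
alpha_mass = 4.0
p_a1 = np.array([w[a < 0.5].sum() for a, w in
                 (dp_sample(alpha_mass) for _ in range(2000))])
print(p_a1.mean(), p_a1.var())
```

The printed Monte Carlo estimates should be close to the Beta(2, 2) mean $1/2$ and variance $0.05$.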

A.2 contraction rates relative to non-metrics

The following lemma comes from the general results of [14] and [47].

Lemma A.2. For any probability measure $P$ and dominated, convex set of probability measures $\mathcal{Q}$ with $h(p, q) > \varepsilon$ for every $q \in \mathcal{Q}$, and any $n \in \mathbb{N}$, there exists a test $\phi_n$ such that
\[
P^n\phi_n \le e^{-n\varepsilon^2/8}, \qquad \sup_{Q\in\mathcal{Q}} Q^n(1-\phi_n) \le e^{-n\varepsilon^2/8}.
\]

Lemma A.3. Let $d$ be a discrepancy measure in the sense of (a)–(d) whose balls are convex and which is bounded from above by the Hellinger distance $h$. If $N(C^{-1}\varepsilon/4, \mathcal{Q}, d) \le N(\varepsilon)$ for every $\varepsilon > C\varepsilon_n > 0$ and some non-increasing function $N : (0,\infty) \to (0,\infty)$, then for every $\varepsilon > C\varepsilon_n$ and every $n$ there exists a test $\varphi_n$ such that for all $j \in \mathbb{N}$,
\[
P^n\varphi_n \le \frac{N(\varepsilon)\, e^{-n\varepsilon^2/32}}{1 - e^{-n\varepsilon^2/32}}, \qquad \sup_{Q\in\mathcal{Q},\, d(P,Q)>Cj\varepsilon} Q^n(1-\varphi_n) \le e^{-n\varepsilon^2 j^2/32}.
\]

Proof. For a given $j \in \mathbb{N}$, choose a maximal set $Q_{j,1}, Q_{j,2}, \ldots, Q_{j,N_j}$ in the set $\mathcal{Q}_j = \{Q \in \mathcal{Q} : Cj\varepsilon < d(P,Q) < 2Cj\varepsilon\}$ such that $d(Q_{j,k}, Q_{j,l}) \ge j\varepsilon/2$ for every $k \ne l$. By property (d) of the discrepancy, every ball in a cover of $\mathcal{Q}_j$ by balls of radius $C^{-1}j\varepsilon/4$ contains at most one $Q_{j,k}$. Thus $N_j \le N(C^{-1}j\varepsilon/4, \mathcal{Q}_j, d) \le N(\varepsilon)$. Furthermore, the $N_j$ balls $B_{j,l}$ of radius $j\varepsilon/2$ around $Q_{j,l}$ cover $\mathcal{Q}_j$, as otherwise the set of $Q_{j,l}$ would not be maximal. For any point $Q$ in each $B_{j,l}$ we have
\[
d(P, Q) \ge C^{-1}d(P, Q_{j,l}) - d(Q, Q_{j,l}) \ge j\varepsilon/2.
\]
Since the Hellinger distance bounds $d$ from above, also $h(P, B_{j,l}) \ge j\varepsilon/2$. By Lemma A.2 there exists a test $\varphi_{j,l}$ of $P$ versus $B_{j,l}$ with error probabilities bounded from above by $e^{-nj^2\varepsilon^2/32}$. Let $\varphi_n$ be the supremum of all tests $\varphi_{j,l}$ obtained in this way, for $j = 1, 2, \ldots$ and $l = 1, \ldots, N_j$. Then,

\[
P^n\varphi_n \le \sum_{j=1}^{\infty}\sum_{l=1}^{N_j} e^{-nj^2\varepsilon^2/32} \le \sum_{j=1}^{\infty} N(C^{-1}j\varepsilon/4, \mathcal{Q}_j, d)\, e^{-nj^2\varepsilon^2/32} \le \frac{N(\varepsilon)\, e^{-n\varepsilon^2/32}}{1 - e^{-n\varepsilon^2/32}},
\]
and for every $j \in \mathbb{N}$,
\[
\sup_{Q \in \cup_{l>j}\mathcal{Q}_l} Q^n(1-\varphi_n) \le \sup_{l>j} e^{-nl^2\varepsilon^2/32} \le e^{-nj^2\varepsilon^2/32},
\]
by the construction of $\varphi_n$.

Proof of Theorem 1.13. For every $\varepsilon > 4C\varepsilon_n$,
\[
\log N(C^{-1}\varepsilon/4, \mathcal{P}_n, d) \le \log N(\varepsilon_n, \mathcal{P}_n, d) \le c_1 n\varepsilon_n^2.
\]
Taking $N(\varepsilon) = \exp(c_1 n\varepsilon_n^2)$, $\varepsilon = MC^{-1}\varepsilon_n$ and $j = 1$ in Lemma A.3, where $M > 4C$ is a large constant to be chosen later, yields tests $\varphi_n$ with errors
\[
P_0^n\varphi_n \le e^{c_1 n\varepsilon_n^2}\,\frac{e^{-nM^2C^{-2}\varepsilon_n^2/32}}{1 - e^{-nM^2C^{-2}\varepsilon_n^2/32}}, \qquad \sup_{p\in\mathcal{P}_n :\, d(p,p_0)>M\varepsilon_n} P^n(1-\varphi_n) \le e^{-nM^2C^{-2}\varepsilon_n^2/32}.
\]
The proof then proceeds as in Theorem 2.4 of [30]. All terms tend to zero when $M^2/(32C^2) > c_1$ and $M^2/(32C^2) > 2 + c_2$.


B

CONVERGENCE TO A POWER LAW OF THE MOVIE DEGREES IN THE PAM-IMDB MODEL

We establish the asymptotic behavior of the movie degrees of actors as the number of movies in the network goes to infinity, based on the formal definition of our model in Section 6.2. We prove that the distribution of the movie degree converges to a power law; this motivates our choice of a linear preference function of the movie degrees of actors and yields the estimate of the parameter $\delta$ in the model proposed before (the parameter is also used in the simulation in Section 6.4).

B.1 introduction and heuristics

In [21], a preferential attachment model with random initial degrees, combining the rich-get-richer and the rich-by-birth effects, is investigated; in particular, the effect of the random initial degree is studied. Here, however, we have a random movie size and a random number of new actors. We therefore adapt the methods of [21] to prove the convergence to a power law of the actors' movie degrees.

To formulate the result, let $N_k^M(t)$ be the number of actors with movie degree $k$ in $\mathcal{G}_t$ and define $p_k(t) = N_k^M(t)/\phi_t$ (recall from Section 6.2 that $t$ is the number of movies and $\phi_t$ is the number of actors in the network) as the fraction of actors with movie degree $k$. We are interested in the limiting distribution of $p_k(t)$ as $t \to \infty$, and we start with a short heuristic derivation of that distribution. Note that

\[
\mathbb{E}[N_k^M(t+1) \mid \mathcal{G}_t] = N_k^M(t) + \mathbb{E}[N_k^M(t+1) - N_k^M(t) \mid \mathcal{G}_t]. \tag{B.1}
\]
Asymptotically, for $t$ large, the movie size of $m_{t+1}$ is expected to be much smaller than the size of the actor network $|\mathcal{A}_t|$. It is therefore unlikely that an actor vertex is chosen twice when a movie comes in, and we ignore this possibility for now. The difference $N_k^M(t+1) - N_k^M(t)$ between the numbers of actor vertices with degree $k$ at times $t+1$ and $t$ comes from three possibilities (recall that $\zeta_{t+1}$, $\psi_{t+1}$ and $\xi_{t+1}$ are the movie size of $m_{t+1}$, the number of old actors and the number of new actors, respectively):

1. Actor vertices with movie degree $k$ in $\mathcal{G}_t$ that are chosen by the movie vertex $m_{t+1}$ are subtracted from $N_k^M(t)$. The conditional probability that an actor vertex with movie degree $k$ is chosen is $(k+\delta)N_k^M(t)/\Omega_t$ (recall that we assume an affine preferential attachment model with $f(k) = k + \delta$ for a parameter $\delta$), where $\Omega_t = \sum_{v\in\mathcal{A}_t}(D_t^M(v) + \delta)$. Thus, conditioned on the number of old actors $\psi_{t+1}$, the expected number of such actor vertices is $\psi_{t+1}(k+\delta)N_k^M(t)/\Omega_t$.

2. Actor vertices with movie degree $k-1$ in $\mathcal{G}_t$ that are chosen by the movie vertex $m_{t+1}$ are added to $N_k^M(t)$. By the same reasoning as above, the mean number of such vertices, conditioned on $\psi_{t+1}$, is $\psi_{t+1}(k-1+\delta)N_{k-1}^M(t)/\Omega_t$.

3. The new vertices introduced by the movie vertex $m_{t+1}$ are added to $N_k^M(t)$ if $k = 1$ (new actor vertices have movie degree $1$).

Combining the three contributions gives:

\[
\begin{aligned}
\mathbb{E}[N_k^M(t+1) - N_k^M(t) \mid \mathcal{G}_t]
&\approx \frac{(k-1+\delta)N_{k-1}^M(t)}{\Omega_t}\,\mathbb{E}[\psi_{t+1}] - \frac{(k+\delta)N_k^M(t)}{\Omega_t}\,\mathbb{E}[\psi_{t+1}] + 1_{\{k=1\}}\bigl(\mathbb{E}[\zeta_{t+1}] - \mathbb{E}[\psi_{t+1}]\bigr)\\
&= \frac{(k-1+\delta)N_{k-1}^M(t)}{\Omega_t}\,\mu_\psi - \frac{(k+\delta)N_k^M(t)}{\Omega_t}\,\mu_\psi + 1_{\{k=1\}}\mu_\xi, \qquad \text{(B.2)}
\end{aligned}
\]

where we introduce the notations $\mu_\zeta = \mathbb{E}[\zeta] = \mathbb{E}[\zeta_{t+1}]$, $\mu_\psi = \mathbb{E}[\psi] = \mathbb{E}[\psi_{t+1}]$ and $\mu_\xi = \mu_\zeta - \mu_\psi$. The approximation sign in (B.2) refers to the fact that we have ignored the possibility that an actor vertex is chosen twice in the same movie. Now assume that $p_k(t)$ converges to some limit $p_k$ as $t \to \infty$ and that the strong law of large numbers holds for the number of new actor vertices introduced by each movie, so that $\mathbb{E}[\phi_t] \sim t\mu_\xi$ and $\mathbb{E}[N_k^M(t+1)] \sim (t+1)\mu_\xi p_k$. Also assume that
\[
\Omega_t = \sum_{a\in\mathcal{A}_t}\bigl(D_t^M(a) + \delta\bigr) \approx t(\mu_\zeta + \mu_\xi\delta), \tag{B.3}
\]
which holds by the law of large numbers, since each movie adds on average $\mu_\zeta$ to the total movie degree and $\mu_\xi$ new actors.

Substituting (B.3) into (B.1) and taking expectations again, we arrive at
\[
\mathbb{E}[N_k^M(t+1)] = \mathbb{E}[N_k^M(t)] + \frac{(k-1+\delta)\mu_\psi\,\mathbb{E}[N_{k-1}^M(t)] - (k+\delta)\mu_\psi\,\mathbb{E}[N_k^M(t)]}{t(\mu_\zeta + \mu_\xi\delta)} + 1_{\{k=1\}}\mu_\xi. \tag{B.4}
\]


Letting $t \to \infty$ and observing that $\mathbb{E}[N_k^M(t+1)] - \mathbb{E}[N_k^M(t)] \to \mu_\xi p_k$ implies $\frac{1}{t}\mathbb{E}[N_k^M(t)] \to \mu_\xi p_k$ for all $k$, then yields the recursion
\[
p_k = \frac{k-1+\delta}{\theta}\,p_{k-1} - \frac{k+\delta}{\theta}\,p_k + 1_{\{k=1\}}, \tag{B.5}
\]
where $\theta = (\mu_\zeta + \mu_\xi\delta)/\mu_\psi$. By iteration, noticing that $p_0 = 0$ (there is no actor vertex with movie degree $0$), the recursion is solved by
\[
p_k = \frac{\Gamma(1+\delta+\theta)}{\Gamma(1+\delta)}\,\frac{\Gamma(k+\delta)}{\Gamma(k+\delta+1+\theta)}\,\theta \approx c(\theta,\delta)\,k^{-(1+\theta)}, \tag{B.6}
\]
where the approximation holds when $k$ is sufficiently large.
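The closed form (B.6) can be checked numerically against the recursion (B.5). The following minimal sketch uses illustrative values $\theta = 2.5$, $\delta = 0.1$ (not estimates from the data):

```python
from math import lgamma, exp

theta, delta = 2.5, 0.1   # assumed illustrative values

def p(k):
    """Closed form (B.6): p_k = Γ(1+δ+θ)/Γ(1+δ) · Γ(k+δ)/Γ(k+δ+1+θ) · θ."""
    return exp(lgamma(1 + delta + theta) - lgamma(1 + delta)
               + lgamma(k + delta) - lgamma(k + delta + 1 + theta)) * theta

# Verify the recursion (B.5), rewritten as
# p_k (1 + (k+δ)/θ) = ((k-1+δ)/θ) p_{k-1} + 1{k=1}, with p_0 = 0.
for k in range(1, 50):
    lhs = p(k) * (1 + (k + delta) / theta)
    rhs = (k - 1 + delta) / theta * (p(k - 1) if k > 1 else 0.0) \
          + (1.0 if k == 1 else 0.0)
    assert abs(lhs - rhs) < 1e-12

# The p_k form a probability distribution (their sum telescopes to 1)
# and decay like k^{-(1+θ)} for large k.
total = sum(p(k) for k in range(1, 100_000))
print(total)
```

In particular $p_1 = \theta/(1+\delta+\theta)$, matching the first step of the iteration.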

Theorem B.1 (Theorem 6.1). Suppose, for the model discussed above, that the following conditions hold:

1. There exist constants $a_0 > 0$ and $c_N$ such that
\[
\mathbb{P}(\zeta > N) \le c_N N^{-(3+a_0)}. \tag{B.7}
\]
In particular, this implies that the distribution of the movie size $\zeta$ has a finite second moment. (In fact, Condition 1 implies that $\zeta$ has a finite third moment, but a finite second moment is sufficient for the proof.)

2. $\delta > -1$.

3. $\theta > 1$.

Then there exists a constant $\gamma$ such that
\[
\lim_{t\to\infty}\mathbb{P}\Bigl(\max_{k\ge1}|p_k(t) - p_k| \ge t^{-\gamma}\Bigr) = 0,
\]
where $(p_k)_{k\ge1}$ is defined in (B.6).

Estimation on the real dataset gives an estimate of $a_0$ around $0.05$, which shows that the first assumption of the theorem is reasonable for the real dataset. The second assumption is natural, since otherwise $k + \delta$ could be negative for $k = 1$; moreover, we obtained an estimate of $\delta = 0.0851$. The last condition, $\theta = (\mu_\zeta + \delta\mu_\xi)/\mu_\psi > 1$, is also reasonable for the real dataset: since $\mu_\zeta = \mu_\psi + \mu_\xi$, it essentially requires that movies keep introducing new actors, and it is consistent with the estimate $\mu_\psi \approx 8.7400$ in the real dataset.
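As a sanity check of the heuristic, one can simulate a simplified stand-in for the model and compare the empirical degree fractions with (B.6). This is a sketch only: movie sizes uniform on $\{2,3,4\}$ and each slot being an old actor independently with probability `p_psi` are assumptions for illustration, not the Section 6.2 definitions.

```python
import random
random.seed(1)

delta, p_psi = 0.1, 0.6   # illustrative parameter choices
n_movies = 100_000

deg = [1]           # movie degree of each actor; one seed actor in one movie
endpoints = [0]     # actor v appears here once per movie containing v

for _ in range(n_movies):
    zeta = random.choice((2, 3, 4))
    chosen = set()
    for _ in range(zeta):
        if random.random() < p_psi:
            # preferential choice: P(pick v) ∝ deg[v] + delta,
            # via the endpoint-list / uniform-actor mixture trick
            total_w = len(endpoints) + delta * len(deg)
            if random.random() < len(endpoints) / total_w:
                v = random.choice(endpoints)
            else:
                v = random.randrange(len(deg))
            if v in chosen:          # ignore double choices within one movie
                continue
        else:
            v = len(deg)             # a new actor
            deg.append(0)
        chosen.add(v)
        deg[v] += 1
        endpoints.append(v)

# Compare the empirical fraction of degree-1 actors with (B.6)
mu_zeta = 3.0
mu_psi = p_psi * mu_zeta
mu_xi = mu_zeta - mu_psi
theta = (mu_zeta + mu_xi * delta) / mu_psi
p1_theory = theta / (1 + delta + theta)
p1_emp = sum(1 for d in deg if d == 1) / len(deg)
print(p1_emp, p1_theory)
```

The empirical fraction should land near the theoretical value, up to finite-size effects.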

We now prove the main result of this chapter; it is worth noting that the proof is largely adapted from [21].


B.2 proof of theorem B.1

The proof of Theorem B.1 consists of two parts: in the first part, we prove that the degree sequence is concentrated around its mean; and in the second part, we identify the mean degree sequence.

We formulate these two steps as two propositions. The concentration of the movie degree sequence is as follows:

Proposition B.2. Under the conditions of Theorem B.1, there exists a constant $\alpha \in (1/2, 1)$ such that
\[
\lim_{t\to\infty}\mathbb{P}\Bigl(\max_{k\ge1}\bigl|N_k^M(t) - \mathbb{E}[N_k^M(t)]\bigr| \ge t^\alpha\Bigr) = 0.
\]

The result of the identification of the mean sequence is as follows:

Proposition B.3. Under the conditions of Theorem B.1, there exist constants $c > 0$ and $\beta \in (0, 1)$ such that
\[
\max_{k\ge1}\bigl|\mathbb{E}[N_k^M(t)] - t\mu_\xi p_k\bigr| \le c\,t^\beta, \tag{B.8}
\]
where $(p_k)_{k\ge1}$ are as in (B.6) and $\mu_\xi = \mu_\zeta - \mu_\psi$ is the expected number of new actors.

With Propositions B.2 and B.3, we can prove the main result, Theorem B.1.

Proof of Theorem B.1. Combining (B.8) with the triangle inequality, we have
\[
\mathbb{P}\Bigl(\max_{k\ge1}\bigl|N_k^M(t) - t\mu_\xi p_k\bigr| \ge ct^\beta + t^\alpha\Bigr) \le \mathbb{P}\Bigl(\max_{k\ge1}\bigl|N_k^M(t) - \mathbb{E}[N_k^M(t)]\bigr| \ge t^\alpha\Bigr).
\]
The right-hand side goes to $0$ as $t \to \infty$ by Proposition B.2. Since $\phi_t/t - \mu_\xi$ converges to $0$ almost surely and $p_k(t) = N_k^M(t)/\phi_t$, we have
\[
\lim_{t\to\infty}\mathbb{P}\Bigl(\max_{k\ge1}|p_k(t) - p_k| \ge \frac{ct^\beta + t^\alpha}{t\mu_\xi}\Bigr) = 0.
\]
Theorem B.1 follows by choosing $0 < \gamma < 1 - \max(\alpha, \beta)$; note that necessarily $\gamma \in (0, 1/2)$, since $\alpha > 1/2$.


B.2 proof of theorem B.1

B.2.1 Concentration around the mean

In this section we prove Proposition B.2. The strategy is to apply Azuma's inequality (see e.g. [37]) after expressing the difference $N_k^M(t) - \mathbb{E}[N_k^M(t)]$ in terms of a Doob martingale; see [16, 21] for more applications of this method.

First note that
\[
N_k^M(t) \le \frac{1}{k}\sum_{l=1}^{t} l\,N_l^M(t) \le \frac{1}{k}\,L_t^M, \tag{B.9}
\]
where $L_t^M = \sum_{l=1}^{t} l\,N_l^M(t) = \sum_{i=1}^{t}\zeta_i$ (not to be confused with $\Omega_t = \Omega_t^M(\delta) = \sum_{l=1}^{t}(l+\delta)N_l^M(t)$). Thus (B.9) implies $\mathbb{E}[N_k^M(t)] \le \mu_\zeta t/k$.

Define $[t] = \{1, \ldots, t\}$. On the event $\{\zeta_i \le t^a,\, i \in [t]\}$ we introduce the Doob martingale
\[
M_n = \mathbb{E}[N_k^M(t) \mid \mathcal{G}_n], \qquad n = 0, \ldots, t, \tag{B.10}
\]
where $\mathcal{G}_0$ is the empty graph. This is a Doob martingale with respect to $(\mathcal{G}_n)_{n=0}^t$, since clearly $\mathbb{E}|M_n| < \infty$. We have $M_t = N_k^M(t)$ and $M_0 = \mathbb{E}[N_k^M(t)]$, hence
\[
N_k^M(t) - \mathbb{E}[N_k^M(t)] = M_t - M_0.
\]
Furthermore, conditioned on the movie sizes $(\zeta_n)_{n=1}^t$, the increments satisfy $|M_n - M_{n-1}| \le 2\zeta_n$: the information contained in $\mathcal{G}_n$ but not in $\mathcal{G}_{n-1}$ is how the $n$-th movie selects old actors and adds new actors, and how that affects the number of actors with movie degree $k$; the absolute difference caused by the $n$-th movie therefore cannot exceed twice its size, since each chosen actor changes at most two of the counts $N_k^M$. On the event $\{\zeta_i \le t^a,\, i \in [t]\}$ we obtain $|M_n - M_{n-1}| \le 2t^a$. Applying the Azuma–Hoeffding inequality, we arrive at

\[
\mathbb{P}\bigl(|N_k^M(t) - \mathbb{E}[N_k^M(t)]| \ge t^\alpha \,\big|\, \zeta_i \le t^a,\, i\in[t]\bigr) \le 2\exp\Bigl(-\frac{t^{2\alpha}}{8\sum_{i=1}^{t} t^{2a}}\Bigr) = 2\exp\Bigl(-\tfrac{1}{8}\,t^{2\alpha-2a-1}\Bigr). \tag{B.11}
\]
To bound $\mathbb{P}(\max_{k\ge1}|N_k^M(t) - \mathbb{E}[N_k^M(t)]| \ge t^\alpha)$ we use a union bound, noticing that the movie degree of any actor cannot exceed the number of movies $t$:

\[
\mathbb{P}\Bigl(\max_{k\ge1}|N_k^M(t) - \mathbb{E}[N_k^M(t)]| \ge t^\alpha \,\Big|\, \zeta_i \le t^a,\, i\in[t]\Bigr) \le \sum_{k=1}^{t}\mathbb{P}\bigl(|N_k^M(t) - \mathbb{E}[N_k^M(t)]| \ge t^\alpha \,\big|\, \zeta_i \le t^a,\, i\in[t]\bigr). \tag{B.12}
\]

Substituting (B.11) into (B.12), we have
\[
\mathbb{P}\Bigl(\max_{k\ge1}|N_k^M(t) - \mathbb{E}[N_k^M(t)]| \ge t^\alpha \,\Big|\, \zeta_i \le t^a,\, i\in[t]\Bigr) \le 2t\exp\Bigl(-\tfrac{1}{8}\,t^{2\alpha-2a-1}\Bigr). \tag{B.13}
\]
If $\alpha > a + 1/2$, the right-hand side tends to $0$, and such an $\alpha$ can still be chosen with $\alpha < 1$.
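The Azuma–Hoeffding step can be illustrated on a toy martingale with bounded increments (a $\pm1$ random walk, so $c = 1$; this illustrates the inequality itself, not the model):

```python
import random
import math
random.seed(0)

# Azuma–Hoeffding: P(|M_n - M_0| >= x) <= 2 exp(-x^2 / (2 n c^2))
# for a martingale with increments bounded by c.
n, trials, c = 100, 5000, 1.0
x = 2.0 * math.sqrt(n)

exceed = sum(
    1 for _ in range(trials)
    if abs(sum(random.choice((-1.0, 1.0)) for _ in range(n))) >= x
)
empirical = exceed / trials
azuma = 2.0 * math.exp(-x ** 2 / (2 * n * c ** 2))
print(empirical, azuma)
```

The empirical exceedance frequency stays below the Azuma bound, as it must; in the proof the same bound is used with increments bounded by $2t^a$.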

We have proved Proposition B.2 on the event $\{\zeta_i \le t^a,\, i \in [t]\}$. We use the assumption on the distribution of the movie sizes, $\mathbb{P}(\zeta_1 > N) \le c_N N^{-(3+a_0)}$, to obtain
\[
\mathbb{P}\Bigl(\max_{1\le i\le t}\zeta_i \le t^a\Bigr) = \prod_{i=1}^{t}\mathbb{P}(\zeta_i \le t^a) = \bigl(1 - \mathbb{P}(\zeta_1 > t^a)\bigr)^t \ge 1 - t\,\mathbb{P}(\zeta_1 > t^a) \ge 1 - c_N\,t^{-(3a+aa_0-1)}, \tag{B.14}
\]
where we take $\frac{1}{2(1+a_0/3)} < a < 1/2$, so that $3a + aa_0 - 1 > 1/2$. This yields that the event $\{\zeta_i \le t^a,\, i \in [t]\}$ occurs with high probability.

Denote the event $\{\zeta_i \le t^a,\, i \in [t]\}$ by $K_t$. Then
\[
\begin{aligned}
\mathbb{P}\Bigl(\max_{k\ge1}|N_k^M(t) - \mathbb{E}[N_k^M(t)]| \ge t^\alpha\Bigr)
&= \mathbb{P}\Bigl(\max_{k\ge1}|N_k^M(t) - \mathbb{E}[N_k^M(t)]| \ge t^\alpha \Bigm| K_t\Bigr)\mathbb{P}(K_t) &&\text{(B.15)}\\
&\quad + \mathbb{P}\Bigl(\max_{k\ge1}|N_k^M(t) - \mathbb{E}[N_k^M(t)]| \ge t^\alpha \Bigm| K_t^c\Bigr)\mathbb{P}(K_t^c), &&\text{(B.16)}
\end{aligned}
\]
where $K_t^c$ is the complement of $K_t$. We now bound the terms on the right-hand side one by one. For (B.15), (B.13) gives

\[
\mathbb{P}\Bigl(\max_{k\ge1}|N_k^M(t) - \mathbb{E}[N_k^M(t)]| \ge t^\alpha \Bigm| K_t\Bigr)\mathbb{P}(K_t) \le 2t\exp\Bigl(-\tfrac{1}{8}\,t^{2\alpha-2a-1}\Bigr). \tag{B.17}
\]


As for (B.16), we use (B.14) to obtain
\[
\mathbb{P}\Bigl(\max_{k\ge1}|N_k^M(t) - \mathbb{E}[N_k^M(t)]| \ge t^\alpha \Bigm| K_t^c\Bigr)\mathbb{P}(K_t^c) \le c_N\,t^{-(3a+aa_0-1)}. \tag{B.18}
\]
Combining (B.17) and (B.18) gives the desired result.

B.2.2 Identification of the mean sequence

In this section we prove Proposition B.3. We introduce
\[
\bar{N}_k^M(t) = \mathbb{E}\bigl[N_k^M(t) \,\big|\, (\zeta_i)_{i=1}^t\bigr], \qquad t \ge 1, \tag{B.19}
\]
the expectation of $N_k^M(t)$ conditioned on the knowledge of all the movie sizes. Define the error term
\[
\varepsilon_k(t) = \bar{N}_k^M(t) - t\mu_\xi p_k, \qquad t \ge 1, \tag{B.20}
\]
where $\mu_\xi = \mu_\zeta - \mu_\psi$ is as in Proposition B.3.

Also, for any real sequence $Q = (Q_k)_{k\ge0}$ with $Q_0 = 0$, we introduce the operator $T_t$ (compare with (B.4)):
\[
(T_tQ)_k = \Bigl(1 - \frac{k+\delta}{\Omega_{t-1}}\,p_\psi\Bigr)Q_k + \frac{k-1+\delta}{\Omega_{t-1}}\,p_\psi\,Q_{k-1}, \qquad k \ge 1. \tag{B.21}
\]
The operator $T_t$ describes the effect of choosing one actor (whether new or old) for a movie, neglecting the introduction of a new actor with movie degree $1$: when adding an actor to the $t$-th movie, with probability $p_\psi$ it is old and with probability $1 - p_\psi$ it is new. If the actor is old, we choose an actor of movie degree $k$ with probability $(k+\delta)/\Omega_{t-1}$, in which case this actor obtains movie degree $k+1$, and an actor of movie degree $k-1$ with probability $(k-1+\delta)/\Omega_{t-1}$, in which case this actor obtains movie degree $k$. The expected number of actors with movie degree $k$ after the choice of an actor is made is given by applying the operator $T_t$ to $\bar{N}^M(t-1)$ and adding the contribution of new actors with movie degree $1$.

Therefore, conditioned on $\zeta_t$ and disregarding the introduction of new actors with movie degree $1$, we can write the effect of a movie on the network as the operator $\mathcal{T}_t = T_t^{\zeta_t}$, where the power $\zeta_t$ means applying the operator $T_t$ repeatedly $\zeta_t$ times, since the $t$-th movie has size $\zeta_t$. Hence $\bar{N}_k^M(t)$ satisfies
\[
\bar{N}_k^M(t) = \bigl(\mathcal{T}_t\bar{N}^M(t-1)\bigr)_k + \xi_t 1_{\{k=1\}}, \qquad k \ge 1. \tag{B.22}
\]


Now we introduce another operator on real sequences $Q = (Q_k)_{k\ge0}$ with $Q_0 = 0$ (compare with (B.5)):
\[
(SQ)_k = \frac{k-1+\delta}{\theta}\,Q_{k-1} - \frac{k+\delta}{\theta}\,Q_k, \qquad k \ge 1, \tag{B.23}
\]
where $\theta = (\mu_\zeta + \mu_\xi\delta)/\mu_\psi$ as in (B.5). With this notation, (B.5) reads $p_k = (Sp)_k + 1_{\{k=1\}}$ with the initial condition $p_0 = 0$. Notice that for $k \ge 1$,

\[
tp_k = (t-1)p_k + (Sp)_k + 1_{\{k=1\}} = (t-1)(\mathcal{T}_tp)_k - \kappa_k(t) + 1_{\{k=1\}}, \tag{B.24}
\]
where
\[
\kappa_k(t) = (t-1)(\mathcal{T}_tp)_k - (t-1)p_k - (Sp)_k. \tag{B.25}
\]
Putting (B.25), (B.20) and (B.22) together, we obtain the recursion for $\varepsilon(t) = (\varepsilon_k(t))_{k\ge1}$:

\[
\begin{aligned}
\varepsilon_k(t) &= \bar{N}_k^M(t) - t\mu_\xi p_k\\
&= \bigl(\mathcal{T}_t\bar{N}^M(t-1)\bigr)_k + \xi_t 1_{\{k=1\}} - (t-1)\mu_\xi(\mathcal{T}_tp)_k + \mu_\xi\kappa_k(t) - \mu_\xi 1_{\{k=1\}}\\
&= \bigl(\mathcal{T}_t\varepsilon(t-1)\bigr)_k + \mu_\xi\kappa_k(t) + (\xi_t - \mu_\xi)1_{\{k=1\}}. \qquad \text{(B.26)}
\end{aligned}
\]

We define the norm of a real sequence $Q = (Q_k)_{k\ge1}$ as $\|Q\| = \sup_{k\ge1}|Q_k|$. Taking into account that $\mathbb{E}[\bar{N}_k^M(t)] = \mathbb{E}[N_k^M(t)]$, before we can prove Proposition B.3 we need to show that there are constants $c > 0$ and $\beta \in [0, 1)$ such that for $t = 0, 1, \ldots$
\[
\|\mathbb{E}[\varepsilon(t)]\| = \sup_{k\ge1}\bigl|\mathbb{E}[N_k^M(t)] - t\mu_\xi p_k\bigr| \le c\,t^\beta. \tag{B.27}
\]

Since the movie degree of any actor is always bounded by the number of movies, we introduce $k_t = t$, the upper bound on the movie degrees when the network has only $t$ movies. Now define the sequence
\[
\tilde\varepsilon_k(t) = \varepsilon_k(t)\,1_{\{k\le k_t\}},
\]
and note that for $k \le k_t$ the sequence $(\tilde\varepsilon_k(t))_{k\ge1}$ satisfies
\[
\tilde\varepsilon_k(t) = 1_{\{k\le k_t\}}\bigl(\mathcal{T}_t\varepsilon(t-1)\bigr)_k + \mu_\xi\tilde\kappa_k(t) + (\xi_t - \mu_\xi)1_{\{k=1\}}, \tag{B.28}
\]
where $\tilde\kappa_k(t) = \kappa_k(t)1_{\{k\le k_t\}}$. Using the triangle inequality we obtain


that
\[
\|\mathbb{E}[\varepsilon(t)]\| \le \|\mathbb{E}[\varepsilon(t) - \tilde\varepsilon(t)]\| + \|\mathbb{E}[\tilde\varepsilon(t)]\| \le \|\mathbb{E}[\varepsilon(t) - \tilde\varepsilon(t)]\| + \|\mathbb{E}[1_{(-\infty,k_t]}(\cdot)\,\mathcal{T}_t\varepsilon(t-1)]\| + \|\mathbb{E}[\tilde\kappa(t)]\|, \tag{B.29}
\]
where $1_{(-\infty,k_t]}(k) = 1_{\{k\le k_t\}}$.

We formulate a lemma to derive bounds for terms in (B.29).

Lemma B.4. There are constants $C_{\tilde\varepsilon}$, $C_\varepsilon^{(1)}$, $C_\varepsilon^{(2)}$ and $C_{\tilde\kappa}$, independent of $t$, such that for $t$ sufficiently large and some $\beta \in [0, 1)$,

(a) $\|\mathbb{E}[\varepsilon(t) - \tilde\varepsilon(t)]\| \le \dfrac{C_{\tilde\varepsilon}}{t^{1-\beta}}$,

(b) $\|\mathbb{E}[1_{(-\infty,k_t]}(\cdot)\,\mathcal{T}_t\varepsilon(t-1)]\| \le \Bigl(1 - \dfrac{C_\varepsilon^{(1)}}{t}\Bigr)\|\mathbb{E}[\varepsilon(t-1)]\| + \dfrac{C_\varepsilon^{(2)}}{t^{1-\beta}}$,

(c) $\|\mathbb{E}[\tilde\kappa(t)]\| \le \dfrac{C_{\tilde\kappa}}{t^{1-\beta}}$.

With Lemma B.4, we can prove Proposition B.3.

Proof of Proposition B.3. We establish (B.8) by induction on $t$. Fix $t_0 \in \mathbb{N}$; we initialize the induction hypothesis at $t = t_0$. Indeed, we have
\[
\|\mathbb{E}[\varepsilon(t_0)]\| \le \sup_{k\ge1}\mathbb{E}[N_k^M(t_0)] + t_0\mu_\xi\sup_{k\ge1}p_k \le t_0(1 + \mu_\xi) \le c\,t_0^\beta,
\]

when we choose $c$ so large that $t_0(1+\mu_\xi) \le ct_0^\beta$. Now we start the induction argument: assuming (B.8) holds for $t-1$, we establish it for $t$. Noticing that $1 - C_\varepsilon^{(1)}/t \ge 0$ holds for $t$ sufficiently large, and using Lemma B.4, we have
\[
\begin{aligned}
\|\mathbb{E}[\varepsilon(t)]\| &\le \|\mathbb{E}[\varepsilon(t) - \tilde\varepsilon(t)]\| + \|\mathbb{E}[1_{(-\infty,k_t]}(\cdot)\,\mathcal{T}_t\varepsilon(t-1)]\| + \|\mathbb{E}[\tilde\kappa(t)]\|\\
&\le \frac{C_{\tilde\varepsilon}}{t^{1-\beta}} + \Bigl(1 - \frac{C_\varepsilon^{(1)}}{t}\Bigr)\|\mathbb{E}[\varepsilon(t-1)]\| + \frac{C_\varepsilon^{(2)}}{t^{1-\beta}} + \frac{C_{\tilde\kappa}}{t^{1-\beta}}\\
&\le \frac{C_{\tilde\varepsilon}}{t^{1-\beta}} + \Bigl(1 - \frac{C_\varepsilon^{(1)}}{t}\Bigr)c(t-1)^\beta + \frac{C_\varepsilon^{(2)}}{t^{1-\beta}} + \frac{C_{\tilde\kappa}}{t^{1-\beta}}\\
&\le c\,t^\beta\Bigl(1 - \frac{cC_\varepsilon^{(1)} - C_{\tilde\varepsilon} - C_\varepsilon^{(2)} - C_{\tilde\kappa}}{ct}\Bigr).
\end{aligned}
\]
If we choose $t_0$ so large that $1 - C_\varepsilon^{(1)}/t \ge 0$ holds for all $t \ge t_0$, and $c$ so large that $cC_\varepsilon^{(1)} \ge C_{\tilde\varepsilon} + C_\varepsilon^{(2)} + C_{\tilde\kappa}$ and $ct_0^\beta \ge (1+\mu_\xi)t_0$, then $\|\mathbb{E}[\varepsilon(t)]\| \le ct^\beta$, which advances the induction hypothesis to $t$. This establishes (B.8).


We now prove the three parts of Lemma B.4 one by one.

Proof of Lemma B.4(a). Obviously $\|\mathbb{E}[\varepsilon(t) - \tilde\varepsilon(t)]\| \le \mathbb{E}[\|\varepsilon(t) - \tilde\varepsilon(t)\|]$, and
\[
\|\varepsilon(t) - \tilde\varepsilon(t)\| = \sup_{k>k_t}\bigl|\bar{N}_k^M(t) - t\mu_\xi p_k\bigr| \le \sup_{k>k_t}\bar{N}_k^M(t) + t\mu_\xi\sup_{k>k_t}p_k.
\]

Since $k_t = t$, the first term vanishes: no actor can have movie degree strictly greater than $t$ when there are only $t$ movies in the network. For the second term, from (B.6) we know that asymptotically ($k \to \infty$) $p_k$ follows a power law,
\[
p_k \approx c(\theta,\delta)\,k^{-(1+\theta)}.
\]
Since $p_k$ is monotonically decreasing in $k$ for $k$ sufficiently large and $k_t = t$, this implies that there exists a constant $C_{\tilde\varepsilon}$ such that for $t$ sufficiently large
\[
t\mu_\xi\sup_{k>k_t}p_k \le \frac{C_{\tilde\varepsilon}}{t^{\theta}}.
\]
Since $\theta > 1$, for any $\beta \in [0, 1)$ we have, for $t$ sufficiently large,
\[
\|\mathbb{E}[\varepsilon(t) - \tilde\varepsilon(t)]\| \le \frac{C_{\tilde\varepsilon}}{t^{1-\beta}}.
\]

Proof of Lemma B.4(b). First we prove that, for $t$ sufficiently large, on the event $\{\Omega_{t-1} \ge t\}$,
\[
\|\mathbb{E}[1_{(-\infty,k_t]}(\cdot)\,T_t\varepsilon(t-1)]\| \le \Bigl(1 - \frac{C_\varepsilon^{(1)}}{t}\Bigr)\|\mathbb{E}[\varepsilon(t-1)]\| + \frac{C_\varepsilon^{(3)}}{t^{1-\beta}}, \tag{B.30}
\]
which is Lemma B.4(b) in the case $\zeta_t = 1$. Later we extend this to more general values of $\zeta_t$.

To prove (B.30), we first prove a bound that we will use later. Given a real-valued sequence $Q = (Q_k)_{k\ge0}$ satisfying (i) $Q_0 = 0$ and (ii)
\[
\sup_{k\ge1}(k+\delta)|Q_k| \le C_Q\,\Omega_{t-1}, \tag{B.31}
\]
there exist a $\beta \in (0, 1)$ and a constant $c > 0$ such that, for $t$ sufficiently large, on the event $\{\Omega_{t-1} \ge t\}$,
\[
\|\mathbb{E}[1_{(-\infty,k_t]}(\cdot)\,T_tQ]\| \le \Bigl(1 - \frac{C_\varepsilon^{(1)}}{t}\Bigr)\|\mathbb{E}[1_{(-\infty,k_t]}(\cdot)Q]\| + \frac{cC_Q}{t^{1-\beta}}. \tag{B.32}
\]
To prove (B.32), recall that for $k \ge 1$

\[
\mathbb{E}[(T_tQ)_k] = \mathbb{E}\Bigl[\Bigl(1 - \frac{k+\delta}{\Omega_{t-1}}\,p_\psi\Bigr)Q_k + \frac{k-1+\delta}{\Omega_{t-1}}\,p_\psi\,Q_{k-1}\Bigr]. \tag{B.33}
\]
Since the random variable $\Omega_{t-1}$ appears in the denominators, we replace it by its expectation $\mathbb{E}[\Omega_{t-1}] = (t-1)\theta$, where $\theta = \mu_\zeta + \mu_\xi\delta$, and account for the differences. That is,
\[
\begin{aligned}
\mathbb{E}[(T_tQ)_k] &= \mathbb{E}\Bigl[\Bigl(1 - \frac{k+\delta}{(t-1)\theta}\,p_\psi\Bigr)Q_k + \frac{k-1+\delta}{(t-1)\theta}\,p_\psi\,Q_{k-1}\Bigr] &&\text{(B.34)}\\
&\quad + (k+\delta)\,p_\psi\,\mathbb{E}\Bigl[\frac{\Omega_{t-1} - (t-1)\theta}{(t-1)\theta\,\Omega_{t-1}}\,Q_k\Bigr] &&\text{(B.35)}\\
&\quad - (k-1+\delta)\,p_\psi\,\mathbb{E}\Bigl[\frac{\Omega_{t-1} - (t-1)\theta}{(t-1)\theta\,\Omega_{t-1}}\,Q_{k-1}\Bigr]. &&\text{(B.36)}
\end{aligned}
\]
First we take care of (B.34). Note that $k \le k_t = t$ always holds, $\theta > 1$, and $k + \delta \le k_t + \delta = t + \delta \le (t-1)\theta$ holds for $t$ sufficiently large, and hence
\[
1 - \frac{k+\delta}{(t-1)\theta}\,p_\psi \ge 0.
\]

Thus, for $t$ sufficiently large,
\[
\begin{aligned}
\sup_{k\le k_t}\Bigl|\Bigl(1 - \frac{k+\delta}{(t-1)\theta}\,p_\psi\Bigr)\mathbb{E}[Q_k] + \frac{k-1+\delta}{(t-1)\theta}\,p_\psi\,\mathbb{E}[Q_{k-1}]\Bigr|
&\le \Bigl(1 - \frac{p_\psi}{(t-1)\theta}\Bigr)\|\mathbb{E}[1_{(-\infty,k_t]}(\cdot)Q]\|\\
&\le \Bigl(1 - \frac{C_\varepsilon^{(1)}}{t}\Bigr)\|\mathbb{E}[1_{(-\infty,k_t]}(\cdot)Q]\|, \qquad \text{(B.37)}
\end{aligned}
\]
where $C_\varepsilon^{(1)}$ is some constant. Now we bound (B.35) and (B.36). Using assumption (ii) in (B.31), for $t$ sufficiently large we have

\[
\sup_{k\ge1}\Bigl|(k+\delta)\,p_\psi\,\mathbb{E}\Bigl[\frac{\Omega_{t-1} - (t-1)\theta}{(t-1)\theta\,\Omega_{t-1}}\,Q_k\Bigr]\Bigr| \le \frac{cC_Q}{t}\,\mathbb{E}\bigl[|\Omega_{t-1} - (t-1)\theta|\bigr].
\]

For the expectation on the right-hand side, we apply Hölder's inequality and obtain
\[
\mathbb{E}\bigl[|\Omega_{t-1} - (t-1)\theta|\bigr] \le \bigl(\mathbb{E}|\Omega_{t-1} - (t-1)\theta|^2\bigr)^{1/2} \le \Bigl(\mathbb{E}\Bigl|\sum_{i=1}^{t-1}X_i\Bigr|^2\Bigr)^{1/2}, \tag{B.38}
\]

where $X_i := \delta(\xi_i - \mu_\xi) + (\zeta_i - \mu_\zeta)$. To bound the expectation $\mathbb{E}|\sum_{i=1}^{t-1}X_i|^2$, we use the Marcinkiewicz–Zygmund inequality (see [18, Section 10.3]): if $(X_i)_{i\ge1}$ are i.i.d. random variables such that $\mathbb{E}[X_i] = 0$ and $\mathbb{E}[|X_i|^q] < \infty$ hold for all $i$, $1 \le q < \infty$, then there exists a constant $c_q$ depending only on $q$ such that
\[
\mathbb{E}\Bigl(\Bigl|\sum_{i=1}^{n}X_i\Bigr|^q\Bigr) \le c_q\,n\,\mathbb{E}[|X_1|^q]. \tag{B.39}
\]
Note that $\zeta$ has a finite second moment and $\xi \le \zeta$, so $X_i$ has a finite second moment. Applying the Marcinkiewicz–Zygmund inequality to the right-hand side of (B.38), we obtain
\[
\mathbb{E}\bigl[|\Omega_{t-1} - (t-1)\theta|\bigr] \le \bigl(c_2(t-1)\,\mathbb{E}[|X_1|^2]\bigr)^{1/2} \le c\,t^{1/2}, \tag{B.40}
\]
where $c$ is some constant. We have thus proved that the supremum over $k$ of the absolute value of (B.35) is bounded from above by a constant divided by $t^{1-\beta}$, where $\beta \ge 1/2$. The same can be done analogously for (B.36). Hence we have proved (B.32).
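For $q = 2$, the case used in (B.40), the Marcinkiewicz–Zygmund bound reduces to the identity $\mathbb{E}|\sum_i X_i|^2 = n\,\mathbb{E}[X_1^2]$ for centered i.i.d. variables, so $c_2 = 1$ suffices. A quick numerical illustration (the choice of uniform $X_i$ on $[-1,1]$ is arbitrary):

```python
import random
random.seed(7)

# Centered i.i.d. X_i uniform on [-1, 1]: E X_1 = 0, E X_1^2 = 1/3,
# so E |sum_{i<=n} X_i|^2 = n / 3, matching the sqrt(t) scaling in (B.40).
n, trials = 200, 4000
acc = 0.0
for _ in range(trials):
    s = sum(random.uniform(-1.0, 1.0) for _ in range(n))
    acc += s * s
emp = acc / trials
print(emp, n / 3)
```

The Monte Carlo estimate should be close to $n/3 \approx 66.7$.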

In order to substitute $Q$ with $\varepsilon(t-1)$ in (B.32), we have to verify that (i) and (ii) hold for $\varepsilon(t-1)$. Since $\varepsilon_0(t-1) = 0$ is always true by convention, we only have to prove (B.31). Since $\theta > 1$ when $\delta > -1$, we see from (B.6) that $p_k \le ck^{-\gamma}$ with $\gamma > 2$. We have $k + \delta > 0$ for every $k \ge 1$, and on the event $\{\Omega_{t-1} \ge t\}$,
\[
\sup_{k\ge1}(k+\delta)|\varepsilon_k(t-1)| \le \sum_{k\ge1}(k+\delta)\bigl[N_k^M(t-1) + (t-1)\mu_\xi p_k\bigr] \le \Omega_{t-1} + (t-1)\sum_{k\ge1}(k+\delta)\mu_\xi p_k \le c\,\Omega_{t-1}, \tag{B.41}
\]
where $c$ is some constant. This completes the proof of (B.31).

To complete the proof of Lemma B.4(b), we first show that (B.32) implies, for $1 \le n \le t$ and all $k \ge 1$, on the event $\{\Omega_{t-1} \ge t\}$,
\[
\mathbb{E}\bigl[1_{\{k\le k_t\}}\bigl(T_t^{\,n}\varepsilon(t-1)\bigr)_k\bigr] \le \Bigl(1 - \frac{C_\varepsilon^{(1)}}{t}\Bigr)\|\mathbb{E}[1_{(-\infty,k_t]}(\cdot)\varepsilon(t-1)]\| + \frac{nC_\varepsilon^{(3)}}{t^{1-\beta}}. \tag{B.42}
\]
We prove (B.42) by induction on $n$. For $n = 1$, (B.42) coincides with (B.32) and is hence true. Now assume (B.42) holds for $n - 1$; we seek to advance the induction hypothesis. Notice that

\[
1_{\{k\le k_t\}}\bigl(T_t^{\,n}\varepsilon(t-1)\bigr)_k = 1_{\{k\le k_t\}}\bigl(T_tQ(n-1)\bigr)_k, \tag{B.43}
\]
where $Q_k(n-1) = 1_{\{k\le k_t\}}(T_t^{\,n-1}\varepsilon(t-1))_k$. To use (B.32), we need to verify (i) and (ii) for $Q(n-1)$. Note that $Q_0(n-1) = 0$ by convention. To prove (B.31), on the event $\{\Omega_{t-1} \ge t\}$ we have, for $t$ sufficiently large,

\[
\sum_{k=1}^{\infty}(k+\delta)(T_tQ)_k \le \Bigl(1 + \frac{p_\psi}{\Omega_{t-1}}\Bigr)\sum_{k=1}^{\infty}(k+\delta)Q_k \le \Bigl(1 + \frac{p_\psi}{t}\Bigr)\sum_{k=1}^{\infty}(k+\delta)Q_k,
\]
and hence, by iterating,
\[
\sum_{k=1}^{\infty}(k+\delta)\bigl(T_t^{\,n-1}Q\bigr)_k \le \Bigl(1 + \frac{p_\psi}{t}\Bigr)^{n-1}\sum_{k=1}^{\infty}(k+\delta)Q_k. \tag{B.44}
\]

Substituting $Q_k = \varepsilon_k(t-1)$ and noticing that $\varepsilon_k(t-1) = \bar{N}_k^M(t-1) - (t-1)\mu_\xi p_k$, the following bound holds:
\[
\begin{aligned}
\sup_{k\ge1}(k+\delta)|Q_k(n-1)| &\le \sum_{k=1}^{\infty}(k+\delta)\bigl|\bigl(T_t^{\,n-1}\varepsilon(t-1)\bigr)_k\bigr|\\
&\le \sum_{k=1}^{\infty}(k+\delta)\bigl|\bigl(T_t^{\,n-1}N^M(t-1)\bigr)_k\bigr| + (t-1)\mu_\xi\sum_{k=1}^{\infty}(k+\delta)\bigl|\bigl(T_t^{\,n-1}p\bigr)_k\bigr|\\
&\le \Bigl(1 + \frac{p_\psi}{t}\Bigr)^{n-1}\sum_{k\ge1}(k+\delta)\bigl[N_k^M(t-1) + (t-1)\mu_\xi p_k\bigr]\\
&\le \Bigl(1 + \frac{p_\psi}{t}\Bigr)^{n-1}\cdot c\,\Omega_{t-1}, \qquad \text{(B.45)}
\end{aligned}
\]
where the second inequality follows by substituting $Q_k$ in (B.44) by $N_k^M(t-1)$ and $p_k$ respectively, and the last inequality follows from (B.41). Since $1 + x \le e^x$ holds for $x \ge 0$, when $n \le t$, on the event $\{\Omega_{t-1} \ge t\}$,
\[
\sup_{k\ge1}(k+\delta)|Q_k(n-1)| \le e^{1}\,c\,\Omega_{t-1}, \tag{B.46}
\]
and (B.46) implies (ii).

We now use the induction hypothesis for the case $n - 1$:
\[
\mathbb{E}\bigl[1_{\{k\le k_t\}}\bigl(T_t^{\,n-1}\varepsilon(t-1)\bigr)_k\bigr] \le \Bigl(1 - \frac{C_\varepsilon^{(1)}}{t}\Bigr)\|\mathbb{E}[1_{(-\infty,k_t]}(\cdot)\varepsilon(t-1)]\| + \frac{(n-1)C_\varepsilon^{(3)}}{t^{1-\beta}}. \tag{B.47}
\]
Substituting $Q_k(n-1) = 1_{\{k\le k_t\}}(T_t^{\,n-1}\varepsilon(t-1))_k$ for $Q_k$ in (B.32), on the event $\{\Omega_{t-1} \ge t\}$ we have
\[
\mathbb{E}\bigl[1_{\{k\le k_t\}}\bigl(T_t^{\,n}\varepsilon(t-1)\bigr)_k\bigr] \le \Bigl(1 - \frac{C_\varepsilon^{(1)}}{t}\Bigr)\|\mathbb{E}[1_{(-\infty,k_t]}(\cdot)\varepsilon(t-1)]\| + \frac{(n-1)C_\varepsilon^{(3)} + cC_Q}{t^{1-\beta}}, \tag{B.48}
\]
in which taking $C_\varepsilon^{(3)} \ge cC_Q$ completes the advancement of the induction hypothesis.

From (B.48) we obtain that, on the event $\{\Omega_{t-1} \ge t\}$, for $\zeta_t \le t$,
\[
\mathbb{E}\bigl[1_{\{k\le k_t\}}\bigl(\mathcal{T}_t\varepsilon(t-1)\bigr)_k \,\big|\, \zeta_t\bigr] \le \Bigl(1 - \frac{C_\varepsilon^{(1)}}{t}\Bigr)\|\mathbb{E}[\varepsilon(t-1) \mid \zeta_t]\| + \frac{\zeta_tC_\varepsilon^{(3)}}{t^{1-\beta}} = \Bigl(1 - \frac{C_\varepsilon^{(1)}}{t}\Bigr)\|\mathbb{E}[\varepsilon(t-1)]\| + \frac{\zeta_tC_\varepsilon^{(3)}}{t^{1-\beta}},
\]
where we used that $\zeta_t$ is independent of $\varepsilon(t-1)$. If $\zeta_t > t$, similarly to (B.41) we have the trivial bound

\[
\sup_{k\le k_t}\bigl|\bigl(\mathcal{T}_t\varepsilon(t-1)\bigr)_k\bigr| \le c\,\Omega_t, \tag{B.49}
\]

and hence, on the event $\{\Omega_{t-1} \ge t\}$,
\[
\mathbb{E}\bigl[1_{\{k\le k_t\}}\bigl(\mathcal{T}_t\varepsilon(t-1)\bigr)_k \,\big|\, \zeta_t\bigr] \le \Bigl(1 - \frac{C_\varepsilon^{(1)}}{t}\Bigr)\|\mathbb{E}[\varepsilon(t-1)]\| + \frac{\zeta_tC_\varepsilon^{(3)}}{t^{1-\beta}} + c\,\mathbb{E}[\Omega_t 1_{\{\zeta_t>t\}} \mid \zeta_t]. \tag{B.50}
\]


We bound
\[
\mathbb{E}[\Omega_t 1_{\{\zeta_t>t\}}] = (t-1)\theta\,\mathbb{P}(\zeta_t > t) + (1+\delta)\,\mathbb{E}[\zeta_t 1_{\{\zeta_t>t\}}] \le (t-1)\frac{c_N}{t^{3+a_0}} + \frac{1+\delta}{t}\,\mathbb{E}[\zeta_t^2] \le \frac{c_N}{t^{2+a_0}} + \frac{(1+\delta)\,\mathbb{E}[\zeta_t^2]}{t} \le \frac{c}{t^{1-\beta}}, \tag{B.51}
\]
where $c$ is some constant chosen appropriately.

On the event $\{\Omega_{t-1} < t\}$, we use the bound in (B.49) and hence have
\[
\mathbb{E}\bigl[1_{\{k\le k_t\}}\bigl(\mathcal{T}_t\varepsilon(t-1)\bigr)_k \,\big|\, \zeta_t\bigr] \le \Bigl(1 - \frac{C_\varepsilon^{(1)}}{t}\Bigr)\|\mathbb{E}[\varepsilon(t-1)]\| + \frac{\zeta_tC_\varepsilon^{(3)}}{t^{1-\beta}} + c\,\mathbb{E}[\Omega_t 1_{\{\zeta_t>t\}} \mid \zeta_t] + c\,\mathbb{E}[\Omega_t 1_{\{\Omega_{t-1}<t\}} \mid \zeta_t]. \tag{B.52}
\]

For the last term we use standard large deviation theory. The event $\{\Omega_{t-1} < t\}$ is the same as the event $\{\Omega_{t-1} - \mathbb{E}[\Omega_{t-1}] < t - (t-1)\theta\}$, and hence the same as $\{\sum_{i=1}^{t-1}X_i < t - (t-1)\theta\}$, which we now study. For $t$ sufficiently large, we have
\[
\mathbb{P}\Bigl(\frac{\sum_{i=1}^{t-1}X_i}{t-1} < \frac{(1-\theta)(t-1) + 1}{t-1}\Bigr) \approx \mathbb{P}\bigl(\bar X_{t-1} < 1 - \theta\bigr),
\]
where $\bar X_{t-1} = \frac{1}{t-1}\sum_{i=1}^{t-1}X_i$ denotes the average of the $X_i$'s.

Standard large deviation theory tells us this probability is exponentially small (see [3, page 21] for more about the large deviation principle); noting $1 - \theta < 0$, this in particular implies that for $t$ sufficiently large
\[
\mathbb{P}\bigl(\bar X_{t-1} < 1 - \theta\bigr) = O(t^{-2}). \tag{B.53}
\]
Using this bound on the last term of (B.52), we have
\[
\mathbb{E}[\Omega_t 1_{\{\Omega_{t-1}<t\}}] \le (t + \theta)\,\mathbb{P}(\Omega_{t-1} < t) \le \frac{c}{t} \le \frac{c}{t^{1-\beta}}, \tag{B.54}
\]
where we use that $\zeta_t$ and $\xi_t$ are independent of $\Omega_{t-1}$ and $c$ is some constant. The bound in Lemma B.4(b) then follows for $\beta \ge 1/2$ by taking expectations on both sides of (B.52).

Proof of Lemma B.4(c). Recall the definitions
\[
\tilde\kappa_k(t) = 1_{\{k\le k_t\}}\kappa_k(t) \quad\text{with}\quad \kappa_k(t) = (t-1)\bigl((\mathcal{T}_t - I)p\bigr)_k - (Sp)_k, \tag{B.55}
\]
where $\mathcal{T}_t = T_t^{\zeta_t}$, $T_t$ is defined in (B.21), $S$ is defined in (B.23), and $I$ denotes the identity operator. We assume $k \le k_t$ throughout this proof, in which case $\kappa_k(t) = \tilde\kappa_k(t)$. By (B.26),
\[
\kappa_k(t) = \frac{1}{\mu_\xi}\Bigl(\varepsilon_k(t) - \bigl(\mathcal{T}_t\varepsilon(t-1)\bigr)_k - (\xi_t - \mu_\xi)1_{\{k=1\}}\Bigr). \tag{B.56}
\]
Now (B.41) implies $\sup_{k\ge1}|\varepsilon_k(t)| \le c\,\Omega_t$, and $\sup_{1\le k\le k_t}|(\mathcal{T}_t\varepsilon(t-1))_k| \le c\,\Omega_t$ by (B.49); thus
\[
\sup_{k\le k_t}|\kappa_k(t)| \le \tilde C\,\Omega_t + \frac{|\xi_t - \mu_\xi|}{\mu_\xi}, \tag{B.57}
\]
where $\tilde C$ is an appropriately chosen constant, and $\xi_t - \mu_\xi$ vanishes upon taking expectations.

For $x \in [0, 1]$ and $w \in \mathbb{N}$, we define
\[
f_k(x; w) = \bigl((I + x(T_t - I))^{w}p\bigr)_k. \tag{B.58}
\]
Also define
\[
\kappa_k(t; w) = (t-1)\bigl[f_k(1; w) - f_k(0; w)\bigr] - (Sp)_k, \tag{B.59}
\]
so that $\kappa_k(t) = \kappa_k(t; \zeta_t)$. Note that $x \mapsto f_k(x; w)$ is a polynomial in $x$ of degree $w$. A Taylor expansion around $x = 0$ gives
\[
f_k(1; w) = f_k(0; w) + w\bigl((T_t - I)p\bigr)_k + \tfrac{1}{2}f_k''(x_k; w), \tag{B.60}
\]
for some $x_k \in (0, 1)$. Since $I + x(T_t - I)$ and $T_t - I$ commute,
\[
f_k''(x; w) = w(w-1)\bigl((I + x(T_t - I))^{w-2}(T_t - I)^2 p\bigr)_k.
\]
