
The handle http://hdl.handle.net/1887/49012 holds various files of this Leiden University dissertation.

Author: Gao, F.

Title: Bayes and networks

Issue Date: 2017-05-23


Part IV

APPENDIX


A

DIRICHLET PROCESSES AND CONTRACTION RATES RELATIVE TO NON-METRICS

In this appendix we present the definition of the Dirichlet process and give the proof of Theorem 1.13, a generalization of the basic contraction rate theorem of Chapter 1 that is valid for more general discrepancies.

A.1 dirichlet processes

We give one version of the definition of the Dirichlet process, as in [32, Chapter 4]. $\mathrm{Dir}(k; \alpha_1, \ldots, \alpha_k)$ denotes the Dirichlet distribution parameterized by a $k$-dimensional real vector $(\alpha_1, \ldots, \alpha_k)$.

Definition A.1 (Dirichlet Process). A random measure $P$ on a Polish space $(\mathfrak{X}, \mathcal{X})$ is said to possess a Dirichlet process distribution $\mathrm{DP}(\alpha)$ with base measure $\alpha$, a given positive Borel measure on $(\mathfrak{X}, \mathcal{X})$, if for every finite measurable partition $A_1, A_2, \ldots, A_k$ of $\mathfrak{X}$,

(𝑃(𝐴1), … , 𝑃(𝐴𝑘)) ∼ Dir(𝑘; 𝛼(𝐴1), … , 𝛼(𝐴𝑘)). (A.1)
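Definition A.1 can be illustrated numerically through the standard stick-breaking construction of the Dirichlet process. The following is a minimal sketch (not part of the dissertation); the base-measure mass `alpha_mass = 4`, the uniform base measure on $[0,1]$, and the two-set partition are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def dp_sample(alpha_mass, n_atoms=500):
    """Truncated stick-breaking draw from DP(alpha), with base measure
    alpha = alpha_mass * Uniform[0, 1]; returns atom locations and weights."""
    betas = rng.beta(1.0, alpha_mass, size=n_atoms)
    weights = betas * np.concatenate(([1.0], np.cumprod(1.0 - betas)[:-1]))
    atoms = rng.uniform(0.0, 1.0, size=n_atoms)
    return atoms, weights

# For the partition A1 = [0, 1/2), A2 = [1/2, 1], property (A.1) says
# (P(A1), P(A2)) ~ Dir(2; 2, 2), so P(A1) ~ Beta(2, 2) with
# mean 1/2 and variance 2*2 / (4^2 * 5) = 0.05.
alpha_mass = 4.0
p_a1 = np.array([w[a < 0.5].sum() for a, w in
                 (dp_sample(alpha_mass) for _ in range(2000))])
print(p_a1.mean(), p_a1.var())
```

The printed Monte Carlo estimates should be close to the Beta(2, 2) mean $1/2$ and variance $0.05$.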

A.2 contraction rates relative to non-metrics

The following lemma comes from the general results of [14] and [47].

Lemma A.2. For any probability measure $P$ and dominated, convex set of probability measures $\mathcal{Q}$ with $h(p, q) > \varepsilon$ for every $q \in \mathcal{Q}$, and any $n \in \mathbb{N}$, there exists a test $\phi_n$ such that
\[
P^n\phi_n \le e^{-n\varepsilon^2/8}, \qquad \sup_{Q\in\mathcal{Q}} Q^n(1-\phi_n) \le e^{-n\varepsilon^2/8}.
\]

Lemma A.3. Let $d$ be a discrepancy measure in the sense of (a)–(d) whose balls are convex and which is bounded from above by the Hellinger distance $h$. If $N(C^{-1}\varepsilon/4, \mathcal{Q}, d) \le N(\varepsilon)$ for every $\varepsilon > C\varepsilon_n > 0$ and some non-increasing function $N : (0,\infty) \to (0,\infty)$, then for every $\varepsilon > C\varepsilon_n$ and every $n$ there exists a test $\varphi_n$ such that for all $j \in \mathbb{N}$,
\[
P^n\varphi_n \le \frac{N(\varepsilon)\, e^{-n\varepsilon^2/32}}{1 - e^{-n\varepsilon^2/32}}, \qquad \sup_{Q\in\mathcal{Q},\, d(P,Q)>Cj\varepsilon} Q^n(1-\varphi_n) \le e^{-n\varepsilon^2 j^2/32}.
\]

Proof. For a given $j \in \mathbb{N}$, choose a maximal set $Q_{j,1}, Q_{j,2}, \ldots, Q_{j,N_j}$ in the set $\mathcal{Q}_j = \{Q \in \mathcal{Q} : Cj\varepsilon < d(P,Q) < 2Cj\varepsilon\}$ such that $d(Q_{j,k}, Q_{j,l}) \ge j\varepsilon/2$ for every $k \ne l$. By property (d) of the discrepancy, every ball in a cover of $\mathcal{Q}_j$ by balls of radius $C^{-1}j\varepsilon/4$ contains at most one $Q_{j,k}$. Thus $N_j \le N(C^{-1}j\varepsilon/4, \mathcal{Q}_j, d) \le N(\varepsilon)$. Furthermore, the $N_j$ balls $B_{j,l}$ of radius $j\varepsilon/2$ around $Q_{j,l}$ cover $\mathcal{Q}_j$, as otherwise the set of $Q_{j,l}$ would not be maximal. For any point $Q$ in each $B_{j,l}$ we have
\[
d(P, Q) \ge C^{-1}d(P, Q_{j,l}) - d(Q, Q_{j,l}) \ge j\varepsilon/2.
\]
Since the Hellinger distance bounds $d$ from above, also $h(P, B_{j,l}) \ge j\varepsilon/2$. By Lemma A.2 there exists a test $\varphi_{j,l}$ of $P$ versus $B_{j,l}$ with error probabilities bounded from above by $e^{-nj^2\varepsilon^2/32}$. Let $\varphi_n$ be the supremum of all tests $\varphi_{j,l}$ obtained in this way, for $j = 1, 2, \ldots$ and $l = 1, \ldots, N_j$. Then,

\[
P^n\varphi_n \le \sum_{j=1}^{\infty}\sum_{l=1}^{N_j} e^{-nj^2\varepsilon^2/32} \le \sum_{j=1}^{\infty} N(C^{-1}j\varepsilon/4, \mathcal{Q}_j, d)\, e^{-nj^2\varepsilon^2/32} \le \frac{N(\varepsilon)\, e^{-n\varepsilon^2/32}}{1 - e^{-n\varepsilon^2/32}},
\]
and for every $j \in \mathbb{N}$,
\[
\sup_{Q \in \cup_{l>j}\mathcal{Q}_l} Q^n(1-\varphi_n) \le \sup_{l>j} e^{-nl^2\varepsilon^2/32} \le e^{-nj^2\varepsilon^2/32},
\]
by the construction of $\varphi_n$.

Proof of Theorem 1.13. For every $\varepsilon > 4C\varepsilon_n$,
\[
\log N(C^{-1}\varepsilon/4, \mathcal{P}_n, d) \le \log N(\varepsilon_n, \mathcal{P}_n, d) \le c_1 n\varepsilon_n^2.
\]
Taking $N(\varepsilon) = \exp(c_1 n\varepsilon_n^2)$, $\varepsilon = MC^{-1}\varepsilon_n$ and $j = 1$ in Lemma A.3, where $M > 4C$ is a large constant to be chosen later, yields tests $\varphi_n$ with errors
\[
P_0^n\varphi_n \le e^{c_1 n\varepsilon_n^2}\,\frac{e^{-nM^2C^{-2}\varepsilon_n^2/32}}{1 - e^{-nM^2C^{-2}\varepsilon_n^2/32}}, \qquad \sup_{p\in\mathcal{P}_n :\, d(p,p_0)>M\varepsilon_n} P^n(1-\varphi_n) \le e^{-nM^2C^{-2}\varepsilon_n^2/32}.
\]
The proof then proceeds as in Theorem 2.4 of [30]. All terms tend to zero when $M^2/(32C^2) > c_1$ and $M^2/(32C^2) > 2 + c_2$.


B

CONVERGENCE TO A POWER LAW OF THE MOVIE DEGREES IN THE PAM-IMDB MODEL

We establish the asymptotic behavior of the movie degrees of actors as the number of movies in the network goes to infinity, based on the formal definition of our model in Section 6.2. We prove that the distribution of the movie degree converges to a power law; this motivates our choice of a linear preference function of the movie degrees of actors and yields the estimate of the parameter $\delta$ in the model proposed before (the parameter is also used in the simulation in Section 6.4).

B.1 introduction and heuristics

In [21], a preferential attachment model with random initial degrees, combining the rich-get-richer and the rich-by-birth effects, is investigated; in particular, the effect of the random initial degree is studied. Here, however, we have a random movie size and a random number of new actors. We therefore adapt the methods of [21] to prove the convergence to a power law of the actors' movie degrees.

To formulate the result, let $N_k^M(t)$ be the number of actors with movie degree $k$ in $\mathcal{G}_t$ and define $p_k(t) = N_k^M(t)/\phi_t$ (recall from Section 6.2 that $t$ is the number of movies and $\phi_t$ is the number of actors in the network) as the fraction of actors with movie degree $k$. We are interested in the limiting distribution of $p_k(t)$ as $t \to \infty$, and we start with a short heuristic derivation of that distribution. Note that

\[
\mathbb{E}[N_k^M(t+1) \mid \mathcal{G}_t] = N_k^M(t) + \mathbb{E}[N_k^M(t+1) - N_k^M(t) \mid \mathcal{G}_t]. \tag{B.1}
\]
Asymptotically, for $t$ large, the movie size of $m_{t+1}$ is expected to be much smaller than the size of the actor network $|\mathcal{A}_t|$. It is therefore unlikely that an actor vertex is chosen twice when a movie comes in, and we ignore this possibility for now. The difference $N_k^M(t+1) - N_k^M(t)$ between the numbers of actor vertices with degree $k$ at times $t+1$ and $t$ comes from three possibilities (recall that $\zeta_{t+1}$, $\psi_{t+1}$ and $\xi_{t+1}$ are the movie size of $m_{t+1}$, the number of old actors and the number of new actors, respectively):

1. Actor vertices with movie degree $k$ in $\mathcal{G}_t$ that are chosen by the movie vertex $m_{t+1}$ are subtracted from $N_k^M(t)$. The conditional probability that an actor vertex with movie degree $k$ is chosen is $(k+\delta)N_k^M(t)/\Omega_t$ (recall that we assume an affine preferential attachment model with $f(k) = k + \delta$ for a parameter $\delta$), where $\Omega_t = \sum_{v\in\mathcal{A}_t}(D_t^M(v) + \delta)$. Thus, conditioned on the number of old actors $\psi_{t+1}$, the expected number of such actor vertices is $\psi_{t+1}(k+\delta)N_k^M(t)/\Omega_t$.

2. Actor vertices with movie degree $k-1$ in $\mathcal{G}_t$ that are chosen by the movie vertex $m_{t+1}$ are added to $N_k^M(t)$. By the same reasoning as above, the mean number of such vertices, conditioned on $\psi_{t+1}$, is $\psi_{t+1}(k-1+\delta)N_{k-1}^M(t)/\Omega_t$.

3. The new vertices introduced by the movie vertex $m_{t+1}$ are added to $N_k^M(t)$ if $k = 1$ (new actor vertices have movie degree $1$).

Combining the three contributions gives:

\[
\begin{aligned}
\mathbb{E}[N_k^M(t+1) - N_k^M(t) \mid \mathcal{G}_t]
&\approx \frac{(k-1+\delta)N_{k-1}^M(t)}{\Omega_t}\,\mathbb{E}[\psi_{t+1}] - \frac{(k+\delta)N_k^M(t)}{\Omega_t}\,\mathbb{E}[\psi_{t+1}] + 1_{\{k=1\}}\bigl(\mathbb{E}[\zeta_{t+1}] - \mathbb{E}[\psi_{t+1}]\bigr)\\
&= \frac{(k-1+\delta)N_{k-1}^M(t)}{\Omega_t}\,\mu_\psi - \frac{(k+\delta)N_k^M(t)}{\Omega_t}\,\mu_\psi + 1_{\{k=1\}}\mu_\xi, \qquad \text{(B.2)}
\end{aligned}
\]

where we introduce the notations $\mu_\zeta = \mathbb{E}[\zeta] = \mathbb{E}[\zeta_{t+1}]$, $\mu_\psi = \mathbb{E}[\psi] = \mathbb{E}[\psi_{t+1}]$ and $\mu_\xi = \mu_\zeta - \mu_\psi$. The approximation sign in (B.2) refers to the fact that we have ignored the possibility that an actor vertex is chosen twice in the same movie. Now assume that $p_k(t)$ converges to some limit $p_k$ as $t \to \infty$ and that the strong law of large numbers holds for the number of new actor vertices introduced by each movie, so that $\mathbb{E}[\phi_t] \sim t\mu_\xi$ and $\mathbb{E}[N_k^M(t+1)] \sim (t+1)\mu_\xi p_k$. Also assume that
\[
\Omega_t = \sum_{a\in\mathcal{A}_t}\bigl(D_t^M(a) + \delta\bigr) \approx t(\mu_\zeta + \mu_\xi\delta), \tag{B.3}
\]
which holds by the law of large numbers, since each movie adds on average $\mu_\zeta$ to the total movie degree and $\mu_\xi$ new actors.

Substituting (B.3) into (B.1) and taking expectations again, we arrive at
\[
\mathbb{E}[N_k^M(t+1)] = \mathbb{E}[N_k^M(t)] + \frac{(k-1+\delta)\mu_\psi\,\mathbb{E}[N_{k-1}^M(t)] - (k+\delta)\mu_\psi\,\mathbb{E}[N_k^M(t)]}{t(\mu_\zeta + \mu_\xi\delta)} + 1_{\{k=1\}}\mu_\xi. \tag{B.4}
\]


Letting $t \to \infty$ and observing that $\mathbb{E}[N_k^M(t+1)] - \mathbb{E}[N_k^M(t)] \to \mu_\xi p_k$ implies $\frac{1}{t}\mathbb{E}[N_k^M(t)] \to \mu_\xi p_k$ for all $k$, then yields the recursion
\[
p_k = \frac{k-1+\delta}{\theta}\,p_{k-1} - \frac{k+\delta}{\theta}\,p_k + 1_{\{k=1\}}, \tag{B.5}
\]
where $\theta = (\mu_\zeta + \mu_\xi\delta)/\mu_\psi$. By iteration, noticing that $p_0 = 0$ (there is no actor vertex with movie degree $0$), the recursion is solved by
\[
p_k = \frac{\Gamma(1+\delta+\theta)}{\Gamma(1+\delta)}\,\frac{\Gamma(k+\delta)}{\Gamma(k+\delta+1+\theta)}\,\theta \approx c(\theta,\delta)\,k^{-(1+\theta)}, \tag{B.6}
\]
where the approximation holds when $k$ is sufficiently large.
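The closed form (B.6) can be checked numerically against the recursion (B.5). The following minimal sketch uses illustrative values $\theta = 2.5$, $\delta = 0.1$ (not estimates from the data):

```python
from math import lgamma, exp

theta, delta = 2.5, 0.1   # assumed illustrative values

def p(k):
    """Closed form (B.6): p_k = Γ(1+δ+θ)/Γ(1+δ) · Γ(k+δ)/Γ(k+δ+1+θ) · θ."""
    return exp(lgamma(1 + delta + theta) - lgamma(1 + delta)
               + lgamma(k + delta) - lgamma(k + delta + 1 + theta)) * theta

# Verify the recursion (B.5), rewritten as
# p_k (1 + (k+δ)/θ) = ((k-1+δ)/θ) p_{k-1} + 1{k=1}, with p_0 = 0.
for k in range(1, 50):
    lhs = p(k) * (1 + (k + delta) / theta)
    rhs = (k - 1 + delta) / theta * (p(k - 1) if k > 1 else 0.0) \
          + (1.0 if k == 1 else 0.0)
    assert abs(lhs - rhs) < 1e-12

# The p_k form a probability distribution (their sum telescopes to 1)
# and decay like k^{-(1+θ)} for large k.
total = sum(p(k) for k in range(1, 100_000))
print(total)
```

In particular $p_1 = \theta/(1+\delta+\theta)$, matching the first step of the iteration.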

Theorem B.1 (Theorem 6.1). Suppose, for the model discussed above, that the following conditions hold:

1. There exist constants $a_0 > 0$ and $c_N$ such that
\[
\mathbb{P}(\zeta > N) \le c_N N^{-(3+a_0)}. \tag{B.7}
\]
In particular, this implies that the distribution of the movie size $\zeta$ has a finite second moment. (In fact, Condition 1 implies that $\zeta$ has a finite third moment, but a finite second moment is sufficient for the proof.)

2. $\delta > -1$.

3. $\theta > 1$.

Then there exists a constant $\gamma$ such that
\[
\lim_{t\to\infty}\mathbb{P}\Bigl(\max_{k\ge1}|p_k(t) - p_k| \ge t^{-\gamma}\Bigr) = 0,
\]
where $(p_k)_{k\ge1}$ is defined in (B.6).

Estimation on the real dataset gives an estimate of $a_0$ around $0.05$, which shows that the first assumption of the theorem is reasonable for the real dataset. The second assumption is natural, since otherwise $k + \delta$ could be negative for $k = 1$; moreover, we obtained an estimate of $\delta = 0.0851$. The last condition, $\theta = (\mu_\zeta + \delta\mu_\xi)/\mu_\psi > 1$, is also reasonable for the real dataset: since $\mu_\zeta = \mu_\psi + \mu_\xi$, it essentially requires that movies keep introducing new actors, and it is consistent with the estimate $\mu_\psi \approx 8.7400$ in the real dataset.
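As a sanity check of the heuristic, one can simulate a simplified stand-in for the model and compare the empirical degree fractions with (B.6). This is a sketch only: movie sizes uniform on $\{2,3,4\}$ and each slot being an old actor independently with probability `p_psi` are assumptions for illustration, not the Section 6.2 definitions.

```python
import random
random.seed(1)

delta, p_psi = 0.1, 0.6   # illustrative parameter choices
n_movies = 100_000

deg = [1]           # movie degree of each actor; one seed actor in one movie
endpoints = [0]     # actor v appears here once per movie containing v

for _ in range(n_movies):
    zeta = random.choice((2, 3, 4))
    chosen = set()
    for _ in range(zeta):
        if random.random() < p_psi:
            # preferential choice: P(pick v) ∝ deg[v] + delta,
            # via the endpoint-list / uniform-actor mixture trick
            total_w = len(endpoints) + delta * len(deg)
            if random.random() < len(endpoints) / total_w:
                v = random.choice(endpoints)
            else:
                v = random.randrange(len(deg))
            if v in chosen:          # ignore double choices within one movie
                continue
        else:
            v = len(deg)             # a new actor
            deg.append(0)
        chosen.add(v)
        deg[v] += 1
        endpoints.append(v)

# Compare the empirical fraction of degree-1 actors with (B.6)
mu_zeta = 3.0
mu_psi = p_psi * mu_zeta
mu_xi = mu_zeta - mu_psi
theta = (mu_zeta + mu_xi * delta) / mu_psi
p1_theory = theta / (1 + delta + theta)
p1_emp = sum(1 for d in deg if d == 1) / len(deg)
print(p1_emp, p1_theory)
```

The empirical fraction should land near the theoretical value, up to finite-size effects.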

We now prove the main result of this chapter; it is worth noting that the proof is largely adapted from [21].


B.2 proof of theorem B.1

The proof of Theorem B.1 consists of two parts: in the first part, we prove that the degree sequence is concentrated around its mean; and in the second part, we identify the mean degree sequence.

We formulate these two steps as two propositions. The concentration of the movie degree sequence is as follows:

Proposition B.2. Under the conditions of Theorem B.1, there exists a constant $\alpha \in (1/2, 1)$ such that
\[
\lim_{t\to\infty}\mathbb{P}\Bigl(\max_{k\ge1}\bigl|N_k^M(t) - \mathbb{E}[N_k^M(t)]\bigr| \ge t^\alpha\Bigr) = 0.
\]

The result of the identification of the mean sequence is as follows:

Proposition B.3. Under the conditions of Theorem B.1, there exist constants $c > 0$ and $\beta \in (0, 1)$ such that
\[
\max_{k\ge1}\bigl|\mathbb{E}[N_k^M(t)] - t\mu_\xi p_k\bigr| \le c\,t^\beta, \tag{B.8}
\]
where $(p_k)_{k\ge1}$ are as in (B.6) and $\mu_\xi = \mu_\zeta - \mu_\psi$ is the expected number of new actors.

With Propositions B.2 and B.3, we can prove the main result, Theorem B.1.

Proof of Theorem B.1. Combining (B.8) with the triangle inequality, we have
\[
\mathbb{P}\Bigl(\max_{k\ge1}\bigl|N_k^M(t) - t\mu_\xi p_k\bigr| \ge ct^\beta + t^\alpha\Bigr) \le \mathbb{P}\Bigl(\max_{k\ge1}\bigl|N_k^M(t) - \mathbb{E}[N_k^M(t)]\bigr| \ge t^\alpha\Bigr).
\]
The right-hand side goes to $0$ as $t \to \infty$ by Proposition B.2. Since $\phi_t/t - \mu_\xi$ converges to $0$ almost surely and $p_k(t) = N_k^M(t)/\phi_t$, we have
\[
\lim_{t\to\infty}\mathbb{P}\Bigl(\max_{k\ge1}|p_k(t) - p_k| \ge \frac{ct^\beta + t^\alpha}{t\mu_\xi}\Bigr) = 0.
\]
Theorem B.1 follows by choosing $0 < \gamma < 1 - \max(\alpha, \beta)$; note that necessarily $\gamma \in (0, 1/2)$, since $\alpha > 1/2$.


B.2 proof of theorem B.1

B.2.1 Concentration around the mean

In this section we prove Proposition B.2. The strategy is to apply Azuma's inequality (see e.g. [37]) after expressing the difference $N_k^M(t) - \mathbb{E}[N_k^M(t)]$ in terms of a Doob martingale; see [16, 21] for more applications of this method.

First note that
\[
N_k^M(t) \le \frac{1}{k}\sum_{l=1}^{t} l\,N_l^M(t) \le \frac{1}{k}\,L_t^M, \tag{B.9}
\]
where $L_t^M = \sum_{l=1}^{t} l\,N_l^M(t) = \sum_{i=1}^{t}\zeta_i$ (not to be confused with $\Omega_t = \Omega_t^M(\delta) = \sum_{l=1}^{t}(l+\delta)N_l^M(t)$). Thus (B.9) implies $\mathbb{E}[N_k^M(t)] \le \mu_\zeta t/k$.

Define $[t] = \{1, \ldots, t\}$. On the event $\{\zeta_i \le t^a,\, i \in [t]\}$ we introduce the Doob martingale
\[
M_n = \mathbb{E}[N_k^M(t) \mid \mathcal{G}_n], \qquad n = 0, \ldots, t, \tag{B.10}
\]
where $\mathcal{G}_0$ is the empty graph. This is a Doob martingale with respect to $(\mathcal{G}_n)_{n=0}^t$, since clearly $\mathbb{E}|M_n| < \infty$. We have $M_t = N_k^M(t)$ and $M_0 = \mathbb{E}[N_k^M(t)]$, hence
\[
N_k^M(t) - \mathbb{E}[N_k^M(t)] = M_t - M_0.
\]
Furthermore, conditioned on the movie sizes $(\zeta_n)_{n=1}^t$, the increments satisfy $|M_n - M_{n-1}| \le 2\zeta_n$: the information contained in $\mathcal{G}_n$ but not in $\mathcal{G}_{n-1}$ is how the $n$-th movie selects old actors and adds new actors, and how that affects the number of actors with movie degree $k$; the absolute difference caused by the $n$-th movie therefore cannot exceed twice its size, since each chosen actor changes at most two of the counts $N_k^M$. On the event $\{\zeta_i \le t^a,\, i \in [t]\}$ we obtain $|M_n - M_{n-1}| \le 2t^a$. Applying the Azuma–Hoeffding inequality, we arrive at

\[
\mathbb{P}\bigl(|N_k^M(t) - \mathbb{E}[N_k^M(t)]| \ge t^\alpha \,\big|\, \zeta_i \le t^a,\, i\in[t]\bigr) \le 2\exp\Bigl(-\frac{t^{2\alpha}}{8\sum_{i=1}^{t} t^{2a}}\Bigr) = 2\exp\Bigl(-\tfrac{1}{8}\,t^{2\alpha-2a-1}\Bigr). \tag{B.11}
\]
To bound $\mathbb{P}(\max_{k\ge1}|N_k^M(t) - \mathbb{E}[N_k^M(t)]| \ge t^\alpha)$ we use a union bound, noticing that the movie degree of any actor cannot exceed the number of movies $t$:

\[
\mathbb{P}\Bigl(\max_{k\ge1}|N_k^M(t) - \mathbb{E}[N_k^M(t)]| \ge t^\alpha \,\Big|\, \zeta_i \le t^a,\, i\in[t]\Bigr) \le \sum_{k=1}^{t}\mathbb{P}\bigl(|N_k^M(t) - \mathbb{E}[N_k^M(t)]| \ge t^\alpha \,\big|\, \zeta_i \le t^a,\, i\in[t]\bigr). \tag{B.12}
\]

Substituting (B.11) into (B.12), we have
\[
\mathbb{P}\Bigl(\max_{k\ge1}|N_k^M(t) - \mathbb{E}[N_k^M(t)]| \ge t^\alpha \,\Big|\, \zeta_i \le t^a,\, i\in[t]\Bigr) \le 2t\exp\Bigl(-\tfrac{1}{8}\,t^{2\alpha-2a-1}\Bigr). \tag{B.13}
\]
If $\alpha > a + 1/2$, the right-hand side tends to $0$, and such an $\alpha$ can still be chosen with $\alpha < 1$.
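The Azuma–Hoeffding step can be illustrated on a toy martingale with bounded increments (a $\pm1$ random walk, so $c = 1$; this illustrates the inequality itself, not the model):

```python
import random
import math
random.seed(0)

# Azuma–Hoeffding: P(|M_n - M_0| >= x) <= 2 exp(-x^2 / (2 n c^2))
# for a martingale with increments bounded by c.
n, trials, c = 100, 5000, 1.0
x = 2.0 * math.sqrt(n)

exceed = sum(
    1 for _ in range(trials)
    if abs(sum(random.choice((-1.0, 1.0)) for _ in range(n))) >= x
)
empirical = exceed / trials
azuma = 2.0 * math.exp(-x ** 2 / (2 * n * c ** 2))
print(empirical, azuma)
```

The empirical exceedance frequency stays below the Azuma bound, as it must; in the proof the same bound is used with increments bounded by $2t^a$.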

We have proved Proposition B.2 on the event $\{\zeta_i \le t^a,\, i \in [t]\}$. We use the assumption on the distribution of the movie sizes, $\mathbb{P}(\zeta_1 > N) \le c_N N^{-(3+a_0)}$, to obtain
\[
\mathbb{P}\Bigl(\max_{1\le i\le t}\zeta_i \le t^a\Bigr) = \prod_{i=1}^{t}\mathbb{P}(\zeta_i \le t^a) = \bigl(1 - \mathbb{P}(\zeta_1 > t^a)\bigr)^t \ge 1 - t\,\mathbb{P}(\zeta_1 > t^a) \ge 1 - c_N\,t^{-(3a+aa_0-1)}, \tag{B.14}
\]
where we take $\frac{1}{2(1+a_0/3)} < a < 1/2$, so that $3a + aa_0 - 1 > 1/2$. This yields that the event $\{\zeta_i \le t^a,\, i \in [t]\}$ occurs with high probability.

Denote the event $\{\zeta_i \le t^a,\, i \in [t]\}$ by $K_t$. Then
\[
\begin{aligned}
\mathbb{P}\Bigl(\max_{k\ge1}|N_k^M(t) - \mathbb{E}[N_k^M(t)]| \ge t^\alpha\Bigr)
&= \mathbb{P}\Bigl(\max_{k\ge1}|N_k^M(t) - \mathbb{E}[N_k^M(t)]| \ge t^\alpha \Bigm| K_t\Bigr)\mathbb{P}(K_t) &&\text{(B.15)}\\
&\quad + \mathbb{P}\Bigl(\max_{k\ge1}|N_k^M(t) - \mathbb{E}[N_k^M(t)]| \ge t^\alpha \Bigm| K_t^c\Bigr)\mathbb{P}(K_t^c), &&\text{(B.16)}
\end{aligned}
\]
where $K_t^c$ is the complement of $K_t$. We now bound the terms on the right-hand side one by one. For (B.15), (B.13) gives

\[
\mathbb{P}\Bigl(\max_{k\ge1}|N_k^M(t) - \mathbb{E}[N_k^M(t)]| \ge t^\alpha \Bigm| K_t\Bigr)\mathbb{P}(K_t) \le 2t\exp\Bigl(-\tfrac{1}{8}\,t^{2\alpha-2a-1}\Bigr). \tag{B.17}
\]


As for (B.16), we use (B.14) to obtain
\[
\mathbb{P}\Bigl(\max_{k\ge1}|N_k^M(t) - \mathbb{E}[N_k^M(t)]| \ge t^\alpha \Bigm| K_t^c\Bigr)\mathbb{P}(K_t^c) \le c_N\,t^{-(3a+aa_0-1)}. \tag{B.18}
\]
Combining (B.17) and (B.18) gives the desired result.

B.2.2 Identification of the mean sequence

In this section we prove Proposition B.3. We introduce
\[
\bar{N}_k^M(t) = \mathbb{E}\bigl[N_k^M(t) \,\big|\, (\zeta_i)_{i=1}^t\bigr], \qquad t \ge 1, \tag{B.19}
\]
the expectation of $N_k^M(t)$ conditioned on the knowledge of all the movie sizes. Define the error term
\[
\varepsilon_k(t) = \bar{N}_k^M(t) - t\mu_\xi p_k, \qquad t \ge 1, \tag{B.20}
\]
where $\mu_\xi = \mu_\zeta - \mu_\psi$ is as in Proposition B.3.

Also, for any real sequence $Q = (Q_k)_{k\ge0}$ with $Q_0 = 0$, we introduce the operator $T_t$ (compare with (B.4)):
\[
(T_tQ)_k = \Bigl(1 - \frac{k+\delta}{\Omega_{t-1}}\,p_\psi\Bigr)Q_k + \frac{k-1+\delta}{\Omega_{t-1}}\,p_\psi\,Q_{k-1}, \qquad k \ge 1. \tag{B.21}
\]
The operator $T_t$ describes the effect of choosing one actor (whether new or old) for a movie, neglecting the introduction of a new actor with movie degree $1$: when adding an actor to the $t$-th movie, with probability $p_\psi$ it is old and with probability $1 - p_\psi$ it is new. If the actor is old, we choose an actor of movie degree $k$ with probability $(k+\delta)/\Omega_{t-1}$, in which case this actor obtains movie degree $k+1$, and an actor of movie degree $k-1$ with probability $(k-1+\delta)/\Omega_{t-1}$, in which case this actor obtains movie degree $k$. The expected number of actors with movie degree $k$ after the choice of an actor is made is given by applying the operator $T_t$ to $\bar{N}^M(t-1)$ and adding the contribution of new actors with movie degree $1$.

Therefore, conditioned on $\zeta_t$ and disregarding the introduction of new actors with movie degree $1$, we can write the effect of a movie on the network as the operator $\mathcal{T}_t = T_t^{\zeta_t}$, where the power $\zeta_t$ means applying the operator $T_t$ repeatedly $\zeta_t$ times, since the $t$-th movie has size $\zeta_t$. Hence $\bar{N}_k^M(t)$ satisfies
\[
\bar{N}_k^M(t) = \bigl(\mathcal{T}_t\bar{N}^M(t-1)\bigr)_k + \xi_t 1_{\{k=1\}}, \qquad k \ge 1. \tag{B.22}
\]


Now we introduce another operator on real sequences $Q = (Q_k)_{k\ge0}$ with $Q_0 = 0$ (compare with (B.5)):
\[
(SQ)_k = \frac{k-1+\delta}{\theta}\,Q_{k-1} - \frac{k+\delta}{\theta}\,Q_k, \qquad k \ge 1, \tag{B.23}
\]
where $\theta = (\mu_\zeta + \mu_\xi\delta)/\mu_\psi$ as in (B.5). With this notation, (B.5) reads $p_k = (Sp)_k + 1_{\{k=1\}}$ with the initial condition $p_0 = 0$. Notice that for $k \ge 1$,

\[
tp_k = (t-1)p_k + (Sp)_k + 1_{\{k=1\}} = (t-1)(\mathcal{T}_tp)_k - \kappa_k(t) + 1_{\{k=1\}}, \tag{B.24}
\]
where
\[
\kappa_k(t) = (t-1)(\mathcal{T}_tp)_k - (t-1)p_k - (Sp)_k. \tag{B.25}
\]
Putting (B.25), (B.20) and (B.22) together, we obtain the recursion for $\varepsilon(t) = (\varepsilon_k(t))_{k\ge1}$:

\[
\begin{aligned}
\varepsilon_k(t) &= \bar{N}_k^M(t) - t\mu_\xi p_k\\
&= \bigl(\mathcal{T}_t\bar{N}^M(t-1)\bigr)_k + \xi_t 1_{\{k=1\}} - (t-1)\mu_\xi(\mathcal{T}_tp)_k + \mu_\xi\kappa_k(t) - \mu_\xi 1_{\{k=1\}}\\
&= \bigl(\mathcal{T}_t\varepsilon(t-1)\bigr)_k + \mu_\xi\kappa_k(t) + (\xi_t - \mu_\xi)1_{\{k=1\}}. \qquad \text{(B.26)}
\end{aligned}
\]

We define the norm of a real sequence $Q = (Q_k)_{k\ge1}$ as $\|Q\| = \sup_{k\ge1}|Q_k|$. Taking into account that $\mathbb{E}[\bar{N}_k^M(t)] = \mathbb{E}[N_k^M(t)]$, before we can prove Proposition B.3 we need to show that there are constants $c > 0$ and $\beta \in [0, 1)$ such that for $t = 0, 1, \ldots$
\[
\|\mathbb{E}[\varepsilon(t)]\| = \sup_{k\ge1}\bigl|\mathbb{E}[N_k^M(t)] - t\mu_\xi p_k\bigr| \le c\,t^\beta. \tag{B.27}
\]

Since the movie degree of any actor is always bounded by the number of movies, we introduce $k_t = t$, the upper bound on the movie degrees when the network has only $t$ movies. Now define the sequence
\[
\tilde\varepsilon_k(t) = \varepsilon_k(t)\,1_{\{k\le k_t\}},
\]
and note that for $k \le k_t$ the sequence $(\tilde\varepsilon_k(t))_{k\ge1}$ satisfies
\[
\tilde\varepsilon_k(t) = 1_{\{k\le k_t\}}\bigl(\mathcal{T}_t\varepsilon(t-1)\bigr)_k + \mu_\xi\tilde\kappa_k(t) + (\xi_t - \mu_\xi)1_{\{k=1\}}, \tag{B.28}
\]
where $\tilde\kappa_k(t) = \kappa_k(t)1_{\{k\le k_t\}}$. Using the triangle inequality we obtain


that
\[
\|\mathbb{E}[\varepsilon(t)]\| \le \|\mathbb{E}[\varepsilon(t) - \tilde\varepsilon(t)]\| + \|\mathbb{E}[\tilde\varepsilon(t)]\| \le \|\mathbb{E}[\varepsilon(t) - \tilde\varepsilon(t)]\| + \|\mathbb{E}[1_{(-\infty,k_t]}(\cdot)\,\mathcal{T}_t\varepsilon(t-1)]\| + \|\mathbb{E}[\tilde\kappa(t)]\|, \tag{B.29}
\]
where $1_{(-\infty,k_t]}(k) = 1_{\{k\le k_t\}}$.

We formulate a lemma to derive bounds for terms in (B.29).

Lemma B.4. There are constants $C_{\tilde\varepsilon}$, $C_\varepsilon^{(1)}$, $C_\varepsilon^{(2)}$ and $C_{\tilde\kappa}$, independent of $t$, such that for $t$ sufficiently large and some $\beta \in [0, 1)$,

(a) $\|\mathbb{E}[\varepsilon(t) - \tilde\varepsilon(t)]\| \le \dfrac{C_{\tilde\varepsilon}}{t^{1-\beta}}$,

(b) $\|\mathbb{E}[1_{(-\infty,k_t]}(\cdot)\,\mathcal{T}_t\varepsilon(t-1)]\| \le \Bigl(1 - \dfrac{C_\varepsilon^{(1)}}{t}\Bigr)\|\mathbb{E}[\varepsilon(t-1)]\| + \dfrac{C_\varepsilon^{(2)}}{t^{1-\beta}}$,

(c) $\|\mathbb{E}[\tilde\kappa(t)]\| \le \dfrac{C_{\tilde\kappa}}{t^{1-\beta}}$.

With Lemma B.4, we can prove Proposition B.3.

Proof of Proposition B.3. We establish (B.8) by induction on $t$. Fix $t_0 \in \mathbb{N}$; we initialize the induction hypothesis at $t = t_0$. Indeed, we have
\[
\|\mathbb{E}[\varepsilon(t_0)]\| \le \sup_{k\ge1}\mathbb{E}[N_k^M(t_0)] + t_0\mu_\xi\sup_{k\ge1}p_k \le t_0(1 + \mu_\xi) \le c\,t_0^\beta,
\]

when we choose $c$ so large that $t_0(1+\mu_\xi) \le ct_0^\beta$. Now we start the induction argument: assuming (B.8) holds for $t-1$, we establish it for $t$. Noticing that $1 - C_\varepsilon^{(1)}/t \ge 0$ holds for $t$ sufficiently large, and using Lemma B.4, we have
\[
\begin{aligned}
\|\mathbb{E}[\varepsilon(t)]\| &\le \|\mathbb{E}[\varepsilon(t) - \tilde\varepsilon(t)]\| + \|\mathbb{E}[1_{(-\infty,k_t]}(\cdot)\,\mathcal{T}_t\varepsilon(t-1)]\| + \|\mathbb{E}[\tilde\kappa(t)]\|\\
&\le \frac{C_{\tilde\varepsilon}}{t^{1-\beta}} + \Bigl(1 - \frac{C_\varepsilon^{(1)}}{t}\Bigr)\|\mathbb{E}[\varepsilon(t-1)]\| + \frac{C_\varepsilon^{(2)}}{t^{1-\beta}} + \frac{C_{\tilde\kappa}}{t^{1-\beta}}\\
&\le \frac{C_{\tilde\varepsilon}}{t^{1-\beta}} + \Bigl(1 - \frac{C_\varepsilon^{(1)}}{t}\Bigr)c(t-1)^\beta + \frac{C_\varepsilon^{(2)}}{t^{1-\beta}} + \frac{C_{\tilde\kappa}}{t^{1-\beta}}\\
&\le c\,t^\beta\Bigl(1 - \frac{cC_\varepsilon^{(1)} - C_{\tilde\varepsilon} - C_\varepsilon^{(2)} - C_{\tilde\kappa}}{ct}\Bigr).
\end{aligned}
\]
If we choose $t_0$ so large that $1 - C_\varepsilon^{(1)}/t \ge 0$ holds for all $t \ge t_0$, and $c$ so large that $cC_\varepsilon^{(1)} \ge C_{\tilde\varepsilon} + C_\varepsilon^{(2)} + C_{\tilde\kappa}$ and $ct_0^\beta \ge (1+\mu_\xi)t_0$, then $\|\mathbb{E}[\varepsilon(t)]\| \le ct^\beta$, which advances the induction hypothesis to $t$. This establishes (B.8).


We now prove the three parts of Lemma B.4 one by one.

Proof of Lemma B.4(a). Obviously $\|\mathbb{E}[\varepsilon(t) - \tilde\varepsilon(t)]\| \le \mathbb{E}[\|\varepsilon(t) - \tilde\varepsilon(t)\|]$, and
\[
\|\varepsilon(t) - \tilde\varepsilon(t)\| = \sup_{k>k_t}\bigl|\bar{N}_k^M(t) - t\mu_\xi p_k\bigr| \le \sup_{k>k_t}\bar{N}_k^M(t) + t\mu_\xi\sup_{k>k_t}p_k.
\]

Since $k_t = t$, the first term vanishes: no actor can have movie degree strictly greater than $t$ when there are only $t$ movies in the network. For the second term, from (B.6) we know that asymptotically ($k \to \infty$) $p_k$ follows a power law,
\[
p_k \approx c(\theta,\delta)\,k^{-(1+\theta)}.
\]
Since $p_k$ is monotonically decreasing in $k$ for $k$ sufficiently large and $k_t = t$, this implies that there exists a constant $C_{\tilde\varepsilon}$ such that for $t$ sufficiently large
\[
t\mu_\xi\sup_{k>k_t}p_k \le \frac{C_{\tilde\varepsilon}}{t^{\theta}}.
\]
Since $\theta > 1$, for any $\beta \in [0, 1)$ we have, for $t$ sufficiently large,
\[
\|\mathbb{E}[\varepsilon(t) - \tilde\varepsilon(t)]\| \le \frac{C_{\tilde\varepsilon}}{t^{1-\beta}}.
\]

Proof of Lemma B.4(b). First we prove that, for $t$ sufficiently large, on the event $\{\Omega_{t-1} \ge t\}$,
\[
\|\mathbb{E}[1_{(-\infty,k_t]}(\cdot)\,T_t\varepsilon(t-1)]\| \le \Bigl(1 - \frac{C_\varepsilon^{(1)}}{t}\Bigr)\|\mathbb{E}[\varepsilon(t-1)]\| + \frac{C_\varepsilon^{(3)}}{t^{1-\beta}}, \tag{B.30}
\]
which is Lemma B.4(b) in the case $\zeta_t = 1$. Later we extend this to more general values of $\zeta_t$.

To prove (B.30), we first prove a bound that we will use later. Given a real-valued sequence $Q = (Q_k)_{k\ge0}$ satisfying (i) $Q_0 = 0$ and (ii)
\[
\sup_{k\ge1}(k+\delta)|Q_k| \le C_Q\,\Omega_{t-1}, \tag{B.31}
\]
there exist a $\beta \in (0, 1)$ and a constant $c > 0$ such that, for $t$ sufficiently large, on the event $\{\Omega_{t-1} \ge t\}$,
\[
\|\mathbb{E}[1_{(-\infty,k_t]}(\cdot)\,T_tQ]\| \le \Bigl(1 - \frac{C_\varepsilon^{(1)}}{t}\Bigr)\|\mathbb{E}[1_{(-\infty,k_t]}(\cdot)Q]\| + \frac{cC_Q}{t^{1-\beta}}. \tag{B.32}
\]
To prove (B.32), recall that for $k \ge 1$

\[
\mathbb{E}[(T_tQ)_k] = \mathbb{E}\Bigl[\Bigl(1 - \frac{k+\delta}{\Omega_{t-1}}\,p_\psi\Bigr)Q_k + \frac{k-1+\delta}{\Omega_{t-1}}\,p_\psi\,Q_{k-1}\Bigr]. \tag{B.33}
\]
Since the random variable $\Omega_{t-1}$ appears in the denominators, we replace it by its expectation $\mathbb{E}[\Omega_{t-1}] = (t-1)\theta$, where $\theta = \mu_\zeta + \mu_\xi\delta$, and account for the differences. That is,
\[
\begin{aligned}
\mathbb{E}[(T_tQ)_k] &= \mathbb{E}\Bigl[\Bigl(1 - \frac{k+\delta}{(t-1)\theta}\,p_\psi\Bigr)Q_k + \frac{k-1+\delta}{(t-1)\theta}\,p_\psi\,Q_{k-1}\Bigr] &&\text{(B.34)}\\
&\quad + (k+\delta)\,p_\psi\,\mathbb{E}\Bigl[\frac{\Omega_{t-1} - (t-1)\theta}{(t-1)\theta\,\Omega_{t-1}}\,Q_k\Bigr] &&\text{(B.35)}\\
&\quad - (k-1+\delta)\,p_\psi\,\mathbb{E}\Bigl[\frac{\Omega_{t-1} - (t-1)\theta}{(t-1)\theta\,\Omega_{t-1}}\,Q_{k-1}\Bigr]. &&\text{(B.36)}
\end{aligned}
\]
First we take care of (B.34). Note that $k \le k_t = t$ always holds, $\theta > 1$, and $k + \delta \le k_t + \delta = t + \delta \le (t-1)\theta$ holds for $t$ sufficiently large, and hence
\[
1 - \frac{k+\delta}{(t-1)\theta}\,p_\psi \ge 0.
\]

Thus, for $t$ sufficiently large,
\[
\begin{aligned}
\sup_{k\le k_t}\Bigl|\Bigl(1 - \frac{k+\delta}{(t-1)\theta}\,p_\psi\Bigr)\mathbb{E}[Q_k] + \frac{k-1+\delta}{(t-1)\theta}\,p_\psi\,\mathbb{E}[Q_{k-1}]\Bigr|
&\le \Bigl(1 - \frac{p_\psi}{(t-1)\theta}\Bigr)\|\mathbb{E}[1_{(-\infty,k_t]}(\cdot)Q]\|\\
&\le \Bigl(1 - \frac{C_\varepsilon^{(1)}}{t}\Bigr)\|\mathbb{E}[1_{(-\infty,k_t]}(\cdot)Q]\|, \qquad \text{(B.37)}
\end{aligned}
\]
where $C_\varepsilon^{(1)}$ is some constant. Now we bound (B.35) and (B.36). Using assumption (ii) in (B.31), for $t$ sufficiently large we have

\[
\sup_{k\ge1}\Bigl|(k+\delta)\,p_\psi\,\mathbb{E}\Bigl[\frac{\Omega_{t-1} - (t-1)\theta}{(t-1)\theta\,\Omega_{t-1}}\,Q_k\Bigr]\Bigr| \le \frac{cC_Q}{t}\,\mathbb{E}\bigl[|\Omega_{t-1} - (t-1)\theta|\bigr].
\]

For the expectation on the right-hand side, we apply Hölder's inequality and obtain
\[
\mathbb{E}\bigl[|\Omega_{t-1} - (t-1)\theta|\bigr] \le \bigl(\mathbb{E}|\Omega_{t-1} - (t-1)\theta|^2\bigr)^{1/2} \le \Bigl(\mathbb{E}\Bigl|\sum_{i=1}^{t-1}X_i\Bigr|^2\Bigr)^{1/2}, \tag{B.38}
\]

where $X_i := \delta(\xi_i - \mu_\xi) + (\zeta_i - \mu_\zeta)$. To bound the expectation $\mathbb{E}|\sum_{i=1}^{t-1}X_i|^2$, we use the Marcinkiewicz–Zygmund inequality (see [18, Section 10.3]): if $(X_i)_{i\ge1}$ are i.i.d. random variables such that $\mathbb{E}[X_i] = 0$ and $\mathbb{E}[|X_i|^q] < \infty$ hold for all $i$, $1 \le q < \infty$, then there exists a constant $c_q$ depending only on $q$ such that
\[
\mathbb{E}\Bigl(\Bigl|\sum_{i=1}^{n}X_i\Bigr|^q\Bigr) \le c_q\,n\,\mathbb{E}[|X_1|^q]. \tag{B.39}
\]
Note that $\zeta$ has a finite second moment and $\xi \le \zeta$, so $X_i$ has a finite second moment. Applying the Marcinkiewicz–Zygmund inequality to the right-hand side of (B.38), we obtain
\[
\mathbb{E}\bigl[|\Omega_{t-1} - (t-1)\theta|\bigr] \le \bigl(c_2(t-1)\,\mathbb{E}[|X_1|^2]\bigr)^{1/2} \le c\,t^{1/2}, \tag{B.40}
\]
where $c$ is some constant. We have thus proved that the supremum over $k$ of the absolute value of (B.35) is bounded from above by a constant divided by $t^{1-\beta}$, where $\beta \ge 1/2$. The same can be done analogously for (B.36). Hence we have proved (B.32).
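For $q = 2$, the case used in (B.40), the Marcinkiewicz–Zygmund bound reduces to the identity $\mathbb{E}|\sum_i X_i|^2 = n\,\mathbb{E}[X_1^2]$ for centered i.i.d. variables, so $c_2 = 1$ suffices. A quick numerical illustration (the choice of uniform $X_i$ on $[-1,1]$ is arbitrary):

```python
import random
random.seed(7)

# Centered i.i.d. X_i uniform on [-1, 1]: E X_1 = 0, E X_1^2 = 1/3,
# so E |sum_{i<=n} X_i|^2 = n / 3, matching the sqrt(t) scaling in (B.40).
n, trials = 200, 4000
acc = 0.0
for _ in range(trials):
    s = sum(random.uniform(-1.0, 1.0) for _ in range(n))
    acc += s * s
emp = acc / trials
print(emp, n / 3)
```

The Monte Carlo estimate should be close to $n/3 \approx 66.7$.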

In order to substitute $Q$ with $\varepsilon(t-1)$ in (B.32), we have to verify that (i) and (ii) hold for $\varepsilon(t-1)$. Since $\varepsilon_0(t-1) = 0$ is always true by convention, we only have to prove (B.31). Since $\theta > 1$ when $\delta > -1$, we see from (B.6) that $p_k \le ck^{-\gamma}$ with $\gamma > 2$. We have $k + \delta > 0$ for every $k \ge 1$, and on the event $\{\Omega_{t-1} \ge t\}$,
\[
\sup_{k\ge1}(k+\delta)|\varepsilon_k(t-1)| \le \sum_{k\ge1}(k+\delta)\bigl[N_k^M(t-1) + (t-1)\mu_\xi p_k\bigr] \le \Omega_{t-1} + (t-1)\sum_{k\ge1}(k+\delta)\mu_\xi p_k \le c\,\Omega_{t-1}, \tag{B.41}
\]
where $c$ is some constant. This completes the proof of (B.31).

To complete the proof of Lemma B.4(b), we first show that (B.32) implies, for $1 \le n \le t$ and all $k \ge 1$, on the event $\{\Omega_{t-1} \ge t\}$,
\[
\mathbb{E}\bigl[1_{\{k\le k_t\}}\bigl(T_t^{\,n}\varepsilon(t-1)\bigr)_k\bigr] \le \Bigl(1 - \frac{C_\varepsilon^{(1)}}{t}\Bigr)\|\mathbb{E}[1_{(-\infty,k_t]}(\cdot)\varepsilon(t-1)]\| + \frac{nC_\varepsilon^{(3)}}{t^{1-\beta}}. \tag{B.42}
\]
We prove (B.42) by induction on $n$. For $n = 1$, (B.42) coincides with (B.32) and is hence true. Now assume (B.42) holds for $n - 1$; we seek to advance the induction hypothesis. Notice that

\[
1_{\{k\le k_t\}}\bigl(T_t^{\,n}\varepsilon(t-1)\bigr)_k = 1_{\{k\le k_t\}}\bigl(T_tQ(n-1)\bigr)_k, \tag{B.43}
\]
where $Q_k(n-1) = 1_{\{k\le k_t\}}(T_t^{\,n-1}\varepsilon(t-1))_k$. To use (B.32), we need to verify (i) and (ii) for $Q(n-1)$. Note that $Q_0(n-1) = 0$ by convention. To prove (B.31), on the event $\{\Omega_{t-1} \ge t\}$ we have, for $t$ sufficiently large,

\[
\sum_{k=1}^{\infty}(k+\delta)(T_tQ)_k \le \Bigl(1 + \frac{p_\psi}{\Omega_{t-1}}\Bigr)\sum_{k=1}^{\infty}(k+\delta)Q_k \le \Bigl(1 + \frac{p_\psi}{t}\Bigr)\sum_{k=1}^{\infty}(k+\delta)Q_k,
\]
and hence, by iterating,
\[
\sum_{k=1}^{\infty}(k+\delta)\bigl(T_t^{\,n-1}Q\bigr)_k \le \Bigl(1 + \frac{p_\psi}{t}\Bigr)^{n-1}\sum_{k=1}^{\infty}(k+\delta)Q_k. \tag{B.44}
\]

Substituting $Q_k = \varepsilon_k(t-1)$ and noticing that $\varepsilon_k(t-1) = \bar{N}_k^M(t-1) - (t-1)\mu_\xi p_k$, the following bound holds:
\[
\begin{aligned}
\sup_{k\ge1}(k+\delta)|Q_k(n-1)| &\le \sum_{k=1}^{\infty}(k+\delta)\bigl|\bigl(T_t^{\,n-1}\varepsilon(t-1)\bigr)_k\bigr|\\
&\le \sum_{k=1}^{\infty}(k+\delta)\bigl|\bigl(T_t^{\,n-1}N^M(t-1)\bigr)_k\bigr| + (t-1)\mu_\xi\sum_{k=1}^{\infty}(k+\delta)\bigl|\bigl(T_t^{\,n-1}p\bigr)_k\bigr|\\
&\le \Bigl(1 + \frac{p_\psi}{t}\Bigr)^{n-1}\sum_{k\ge1}(k+\delta)\bigl[N_k^M(t-1) + (t-1)\mu_\xi p_k\bigr]\\
&\le \Bigl(1 + \frac{p_\psi}{t}\Bigr)^{n-1}\cdot c\,\Omega_{t-1}, \qquad \text{(B.45)}
\end{aligned}
\]
where the second inequality follows by substituting $Q_k$ in (B.44) by $N_k^M(t-1)$ and $p_k$ respectively, and the last inequality follows from (B.41). Since $1 + x \le e^x$ holds for $x \ge 0$, when $n \le t$, on the event $\{\Omega_{t-1} \ge t\}$,
\[
\sup_{k\ge1}(k+\delta)|Q_k(n-1)| \le e^{1}\,c\,\Omega_{t-1}, \tag{B.46}
\]
and (B.46) implies (ii).

We now use the induction hypothesis for the case $n - 1$:
\[
\mathbb{E}\bigl[1_{\{k\le k_t\}}\bigl(T_t^{\,n-1}\varepsilon(t-1)\bigr)_k\bigr] \le \Bigl(1 - \frac{C_\varepsilon^{(1)}}{t}\Bigr)\|\mathbb{E}[1_{(-\infty,k_t]}(\cdot)\varepsilon(t-1)]\| + \frac{(n-1)C_\varepsilon^{(3)}}{t^{1-\beta}}. \tag{B.47}
\]
Substituting $Q_k(n-1) = 1_{\{k\le k_t\}}(T_t^{\,n-1}\varepsilon(t-1))_k$ for $Q_k$ in (B.32), on the event $\{\Omega_{t-1} \ge t\}$ we have
\[
\mathbb{E}\bigl[1_{\{k\le k_t\}}\bigl(T_t^{\,n}\varepsilon(t-1)\bigr)_k\bigr] \le \Bigl(1 - \frac{C_\varepsilon^{(1)}}{t}\Bigr)\|\mathbb{E}[1_{(-\infty,k_t]}(\cdot)\varepsilon(t-1)]\| + \frac{(n-1)C_\varepsilon^{(3)} + cC_Q}{t^{1-\beta}}, \tag{B.48}
\]
in which taking $C_\varepsilon^{(3)} \ge cC_Q$ completes the advancement of the induction hypothesis.

From (B.48) we obtain that, on the event $\{\Omega_{t-1} \ge t\}$, for $\zeta_t \le t$,
\[
\mathbb{E}\bigl[1_{\{k\le k_t\}}\bigl(\mathcal{T}_t\varepsilon(t-1)\bigr)_k \,\big|\, \zeta_t\bigr] \le \Bigl(1 - \frac{C_\varepsilon^{(1)}}{t}\Bigr)\|\mathbb{E}[\varepsilon(t-1) \mid \zeta_t]\| + \frac{\zeta_tC_\varepsilon^{(3)}}{t^{1-\beta}} = \Bigl(1 - \frac{C_\varepsilon^{(1)}}{t}\Bigr)\|\mathbb{E}[\varepsilon(t-1)]\| + \frac{\zeta_tC_\varepsilon^{(3)}}{t^{1-\beta}},
\]
where we used that $\zeta_t$ is independent of $\varepsilon(t-1)$. If $\zeta_t > t$, similarly to (B.41) we have the trivial bound

\[
\sup_{k\le k_t}\bigl|\bigl(\mathcal{T}_t\varepsilon(t-1)\bigr)_k\bigr| \le c\,\Omega_t, \tag{B.49}
\]

and hence, on the event $\{\Omega_{t-1} \ge t\}$,
\[
\mathbb{E}\bigl[1_{\{k\le k_t\}}\bigl(\mathcal{T}_t\varepsilon(t-1)\bigr)_k \,\big|\, \zeta_t\bigr] \le \Bigl(1 - \frac{C_\varepsilon^{(1)}}{t}\Bigr)\|\mathbb{E}[\varepsilon(t-1)]\| + \frac{\zeta_tC_\varepsilon^{(3)}}{t^{1-\beta}} + c\,\mathbb{E}[\Omega_t 1_{\{\zeta_t>t\}} \mid \zeta_t]. \tag{B.50}
\]


We bound
\[
\mathbb{E}[\Omega_t 1_{\{\zeta_t>t\}}] = (t-1)\theta\,\mathbb{P}(\zeta_t > t) + (1+\delta)\,\mathbb{E}[\zeta_t 1_{\{\zeta_t>t\}}] \le (t-1)\frac{c_N}{t^{3+a_0}} + \frac{1+\delta}{t}\,\mathbb{E}[\zeta_t^2] \le \frac{c_N}{t^{2+a_0}} + \frac{(1+\delta)\,\mathbb{E}[\zeta_t^2]}{t} \le \frac{c}{t^{1-\beta}}, \tag{B.51}
\]
where $c$ is some constant chosen appropriately.

On the event $\{\Omega_{t-1} < t\}$, we use the bound in (B.49) and hence have
\[
\mathbb{E}\bigl[1_{\{k\le k_t\}}\bigl(\mathcal{T}_t\varepsilon(t-1)\bigr)_k \,\big|\, \zeta_t\bigr] \le \Bigl(1 - \frac{C_\varepsilon^{(1)}}{t}\Bigr)\|\mathbb{E}[\varepsilon(t-1)]\| + \frac{\zeta_tC_\varepsilon^{(3)}}{t^{1-\beta}} + c\,\mathbb{E}[\Omega_t 1_{\{\zeta_t>t\}} \mid \zeta_t] + c\,\mathbb{E}[\Omega_t 1_{\{\Omega_{t-1}<t\}} \mid \zeta_t]. \tag{B.52}
\]

For the last term we use standard large deviation theory. The event $\{\Omega_{t-1} < t\}$ is the same as the event $\{\Omega_{t-1} - \mathbb{E}[\Omega_{t-1}] < t - (t-1)\theta\}$, and hence the same as $\{\sum_{i=1}^{t-1}X_i < t - (t-1)\theta\}$, which we now study. For $t$ sufficiently large, we have
\[
\mathbb{P}\Bigl(\frac{\sum_{i=1}^{t-1}X_i}{t-1} < \frac{(1-\theta)(t-1) + 1}{t-1}\Bigr) \approx \mathbb{P}\bigl(\bar X_{t-1} < 1 - \theta\bigr),
\]
where $\bar X_{t-1} = \frac{1}{t-1}\sum_{i=1}^{t-1}X_i$ denotes the average of the $X_i$'s.

Standard large deviation theory tells us this probability is exponentially small (see [3, page 21] for more about the large deviation principle); noting $1 - \theta < 0$, this in particular implies that for $t$ sufficiently large
\[
\mathbb{P}\bigl(\bar X_{t-1} < 1 - \theta\bigr) = O(t^{-2}). \tag{B.53}
\]
Using this bound on the last term of (B.52), we have
\[
\mathbb{E}[\Omega_t 1_{\{\Omega_{t-1}<t\}}] \le (t + \theta)\,\mathbb{P}(\Omega_{t-1} < t) \le \frac{c}{t} \le \frac{c}{t^{1-\beta}}, \tag{B.54}
\]
where we use that $\zeta_t$ and $\xi_t$ are independent of $\Omega_{t-1}$ and $c$ is some constant. The bound in Lemma B.4(b) then follows for $\beta \ge 1/2$ by taking expectations on both sides of (B.52).

Proof of Lemma B.4(c). Recall the definitions
\[
\tilde\kappa_k(t) = 1_{\{k\le k_t\}}\kappa_k(t) \quad\text{with}\quad \kappa_k(t) = (t-1)\bigl((\mathcal{T}_t - I)p\bigr)_k - (Sp)_k, \tag{B.55}
\]
where $\mathcal{T}_t = T_t^{\zeta_t}$, $T_t$ is defined in (B.21), $S$ is defined in (B.23), and $I$ denotes the identity operator. We assume $k \le k_t$ throughout this proof, in which case $\kappa_k(t) = \tilde\kappa_k(t)$. By (B.26),
\[
\kappa_k(t) = \frac{1}{\mu_\xi}\Bigl(\varepsilon_k(t) - \bigl(\mathcal{T}_t\varepsilon(t-1)\bigr)_k - (\xi_t - \mu_\xi)1_{\{k=1\}}\Bigr). \tag{B.56}
\]
Now (B.41) implies $\sup_{k\ge1}|\varepsilon_k(t)| \le c\,\Omega_t$, and $\sup_{1\le k\le k_t}|(\mathcal{T}_t\varepsilon(t-1))_k| \le c\,\Omega_t$ by (B.49); thus
\[
\sup_{k\le k_t}|\kappa_k(t)| \le \tilde C\,\Omega_t + \frac{|\xi_t - \mu_\xi|}{\mu_\xi}, \tag{B.57}
\]
where $\tilde C$ is an appropriately chosen constant, and $\xi_t - \mu_\xi$ vanishes upon taking expectations.

For $x \in [0, 1]$ and $w \in \mathbb{N}$, we define
\[
f_k(x; w) = \bigl((I + x(T_t - I))^{w}p\bigr)_k. \tag{B.58}
\]
Also define
\[
\kappa_k(t; w) = (t-1)\bigl[f_k(1; w) - f_k(0; w)\bigr] - (Sp)_k, \tag{B.59}
\]
so that $\kappa_k(t) = \kappa_k(t; \zeta_t)$. Note that $x \mapsto f_k(x; w)$ is a polynomial in $x$ of degree $w$. A Taylor expansion around $x = 0$ gives
\[
f_k(1; w) = f_k(0; w) + w\bigl((T_t - I)p\bigr)_k + \tfrac{1}{2}f_k''(x_k; w), \tag{B.60}
\]
for some $x_k \in (0, 1)$. Since $I + x(T_t - I)$ and $T_t - I$ commute,
\[
f_k''(x; w) = w(w-1)\bigl((I + x(T_t - I))^{w-2}(T_t - I)^2 p\bigr)_k.
\]
