
Cover Page

The handle http://hdl.handle.net/1887/49012 holds various files of this Leiden University dissertation.

Author: Gao, F.

Title: Bayes and networks

Issue Date: 2017-05-23

5 NETWORK MODELS

5.1 introduction and notation

In Chapter 3, we consider the estimation of the general sublinear preferential attachment function, where we only assume the monotonicity of the preferential attachment function. The success of the maximum likelihood estimator in Chapter 4 motivates us to consider the following problem. Suppose that the preferential attachment function is from a sublinear parametric family $\{f_\theta \mid \theta \in \Theta \subset \mathbb{R}^d\}$ with the true parameter $\theta_0$. The problem is no longer non-parametric as in Chapter 3 and we expect the maximum likelihood estimator to work. The problem, however, becomes significantly harder than that in Chapter 4, as the analysis of all quantities in the likelihood function gets much more involved as soon as we step away from the affine domain.

Nonetheless, in this chapter, with a branching process framework similar to that in Chapter 3, we show that the maximum likelihood estimator (mle) $\hat\theta_n$ works despite the complexity of analyzing the likelihood function. The main contribution of this chapter is indeed to use the branching process framework, which first appeared in [70] and was the foundation stone of Chapter 3, to study the likelihood function of general sublinear preferential attachment networks.

We employ the exact same branching process framework as in Section 3.3, except that the preferential attachment function assumes the parametric form $\{f_\theta, \theta \in \Theta\}$. We also inherit all the notation from Chapter 3. Some notation also comes from Chapter 4, such as the superscript $(0)$, which stresses that the quantity is under the true parameter $\theta_0$. For any sequence $a_k$, define $a_{>k} = \sum_{i>k} a_i$. $N_k(t)$ is the number of nodes of degree $k$ in the network at time $t$ (the network at time $t$ possesses $t+1$ nodes), and $P_k(t)$ is the empirical proportion of nodes of degree $k$ at time $t$, i.e., $P_k(t) = N_k(t)/(t+1)$. $p_k$ is the limiting proportion of nodes of degree $k$ and $q_k$ is the limiting probability that the incoming node chooses a node of degree $k$ to attach to, i.e. $q_k = f(k)p_k/\sum_{i=1}^\infty f(i)p_i$. $S(t)$ summarizes a certain quantity over the whole network and will be defined thoroughly in the next section.

Throughout the chapter, we assume that, uniformly in $\theta \in \Theta$, $f_\theta(k)$ is non-decreasing in $k$ and that $f_\theta$ is sublinear in the sense that there exists a positive constant $C$ such that $f_\theta(k) \le Ck$ uniformly in $\theta \in \Theta$.
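To make the notation concrete, the following sketch (not part of the thesis) simulates the tree-growth dynamics of Chapter 3, in which each incoming node attaches to a single existing node chosen with probability proportional to $f$ of its current degree; the helper name simulate_pa_tree, the parameter values and the illustrative family $f_{\alpha,\beta}(k) = (k+\alpha)^\beta$ (discussed formally later in this chapter) are ours.

    import numpy as np

    def simulate_pa_tree(n, f, seed=0):
        # Grow a preferential attachment tree until it has n + 1 nodes.
        # At time t the new node t attaches to an existing node chosen
        # with probability proportional to f(current degree).
        rng = np.random.default_rng(seed)
        deg = np.zeros(n + 1, dtype=int)
        deg[0] = deg[1] = 1                      # time 1: two nodes, one edge
        D = {}                                   # D[t] = degree of the chosen node
        for t in range(2, n + 1):
            weights = f(deg[:t])
            target = rng.choice(t, p=weights / weights.sum())
            D[t] = deg[target]
            deg[target] += 1
            deg[t] = 1                           # the newcomer starts with degree 1
        return deg, D

    f = lambda k: (k + 0.5) ** 0.7               # an example sublinear f_{alpha,beta}
    deg, D = simulate_pa_tree(5000, f)
    N_k = np.bincount(deg)                       # N_k(n): number of degree-k nodes
    P_k = N_k / deg.size                         # P_k(n) = N_k(n) / (n + 1)
    print(P_k[1:6])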

The rest of the chapter is organized as follows. Section 5.2 gives the definition of the maximum likelihood estimator. In Section 5.3 we prove that the mle is consistent by employing classical results from continuous-time branching processes. In Section 5.4 we prove that the mle is asymptotically normal. To overcome the problem that computing the mle relies on the entire evolution history of the network, we propose in Section 5.5 the quasi-maximum-likelihood estimator, which only depends on the final snapshot of the network.

5.2 construction of the maximum likelihood estimator

Let $(\mathcal{F}_t)_{t=1}^n$ be the $\sigma$-algebras generated by the stochastic process of the graph's evolution. Given the data of the degrees of the nodes that got attached to, $D^{(n)} = (D_i)_{i=2}^n$, the likelihood function is as follows:

$$L_n(f; D^{(n)}) = \prod_{t=2}^n \frac{f(D_t)\,N_{D_t}(t-1)}{S_f(t-1)},$$

where, for any map $h: \mathbb{N} \to \mathbb{R}^+$, we define $S_h(t) = S_h(t-1) + h(D_t+1) - h(D_t) + h(1)$ for $t \ge 2$ with the initialization $S_h(1) = 2h(1)$, so that $S_h(t) = \sum_{k=1}^\infty h(k)N_k(t)$. Note that $S_h(t)$ is $\mathcal{F}_t$-measurable. Then the derivatives of the log-likelihood can be written as (the equality is understood component-wise)

$$\dot l_n(f_\theta; D^{(n)}) = \frac{\partial}{\partial\theta}\log L_n(f_\theta; D^{(n)}) = \sum_{t=2}^n\left[\frac{\dot f_\theta}{f_\theta}(D_t) - \frac{S_{\dot f_\theta}(t-1)}{S_{f_\theta}(t-1)}\right].$$

From the form of the last display we see that $\dot l_n(f_{\theta_0})$ is a martingale relative to the filtration $\mathcal{F}_t$, because

$$\dot l_n(f_{\theta_0}; D^{(n)}) = \sum_{t=2}^n\left[\frac{\dot f_{\theta_0}}{f_{\theta_0}}(D_t) - \sum_{k=1}^\infty \frac{\dot f_{\theta_0}}{f_{\theta_0}}(k)\,\frac{f_{\theta_0}(k)N_k(t-1)}{S_{f_{\theta_0}}(t-1)}\right]
= \sum_{t=2}^n\left[\frac{\dot f_{\theta_0}}{f_{\theta_0}}(D_t) - \sum_{k=1}^\infty \frac{\dot f_{\theta_0}}{f_{\theta_0}}(k)\,\mathbb{P}(D_t = k \mid \mathcal{F}_{t-1})\right]
= \sum_{t=2}^n\left[\frac{\dot f_{\theta_0}}{f_{\theta_0}}(D_t) - \mathbb{E}_{\theta_0}\!\left[\frac{\dot f_{\theta_0}}{f_{\theta_0}}(D_t)\,\Big|\,\mathcal{F}_{t-1}\right]\right].$$

If we normalize the likelihood by the number of nodes in the network, we obtain (note that the left-hand side of the display is $\iota$ instead of $l$; the additive term $\frac{1}{n+1}\sum_{t=2}^n\log N_{D_t}(t-1)$, which does not depend on $\theta$, is dropped)

$$\iota_n(f_\theta) := \frac{1}{n+1}\log L_n(f_\theta; D^{(n)})
= \frac{1}{n+1}\sum_{k=1}^\infty \log f_\theta(k)\sum_{t=2}^n \mathbb{1}_{\{D_t=k\}} - \frac{1}{n+1}\sum_{t=2}^n \log S_{f_\theta}(t-1)
= \sum_{k=1}^\infty \log f_\theta(k)\,\frac{N_{>k}(n)}{n+1} - \frac{1}{n+1}\sum_{t=2}^n \log S_{f_\theta}(t-1)
= \sum_{k=1}^\infty \log f_\theta(k)\,P_{>k}(n) - \frac{1}{n+1}\sum_{t=2}^n \log S_{f_\theta}(t-1).$$

We take partial derivatives on both sides of the preceding display and obtain

$$\dot\iota_n(f_\theta) = \sum_{k=1}^\infty \frac{\dot f_\theta}{f_\theta}(k)\,P_{>k}(n) - \frac{1}{n+1}\sum_{t=2}^n \frac{S_{\dot f_\theta}(t-1)}{S_{f_\theta}(t-1)}
= \sum_{k=1}^\infty \frac{\dot f_\theta}{f_\theta}(k)\,P_{>k}(n) - \frac{1}{n+1}\sum_{t=2}^n \frac{\sum_{i=1}^\infty P_i(t-1)\dot f_\theta(i)}{\sum_{i=1}^\infty P_i(t-1) f_\theta(i)}. \qquad (5.1)$$

In this chapter, $\dot f_\theta$ and $\dot\iota$ (sometimes $\dot l_\theta$) are often understood component-wise to simplify notation, such as in (5.1), (5.2), (5.15) and (5.22).
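To illustrate how $\hat\theta_n$ can be computed in practice from the normalized log-likelihood, the sketch below (ours, under the same assumptions as the earlier sketches) maximizes $\iota_n$ over a grid on a compact $\Theta$; a serious implementation would of course use a smooth numerical optimizer together with the analytic score.

    import numpy as np

    alphas = np.linspace(0.1, 2.0, 20)
    betas = np.linspace(0.1, 1.0, 19)
    # crude grid search for the mle over a compact Theta,
    # reusing loglik_and_score and the simulated history D from above
    best = max((loglik_and_score((a, b), D, 5000)[0], a, b)
               for a in alphas for b in betas)
    print("grid mle (alpha, beta):", best[1:], "iota_n:", best[0] / 5001)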

$\dot\iota_n(f_\theta)$ will be shown in Proposition 5.5 to converge in probability to the following quantity:

$$\dot\iota(f_\theta) = \sum_{k=1}^\infty \frac{\dot f_\theta}{f_\theta}(k)\,p^{(0)}_{>k} - \frac{\sum_{i=1}^\infty p^{(0)}_i \dot f_\theta(i)}{\sum_{i=1}^\infty p^{(0)}_i f_\theta(i)}. \qquad (5.2)$$

Indeed $\theta_0$ solves the above equation. In fact, assuming that the Malthusian parameter associated with the true preferential attachment function $f_{\theta_0}$ is $\lambda_0$ (recall the definition of the Malthusian parameter in (3.6) and the further elaboration in (3.14)), we have

$$\dot\iota(f_{\theta_0}) = \sum_{k=1}^\infty \frac{\dot f_{\theta_0}}{f_{\theta_0}}(k)\,p^{(0)}_{>k} - \frac{\sum_{i=1}^\infty p^{(0)}_i \dot f_{\theta_0}(i)}{\sum_{i=1}^\infty p^{(0)}_i f_{\theta_0}(i)} = \sum_{k=1}^\infty \dot f_{\theta_0}(k)\left(\frac{p^{(0)}_{>k}}{f_{\theta_0}(k)} - \frac{p^{(0)}_k}{\lambda_0}\right) = 0.$$

The second equality holds because of (3.14), and in the last equality we apply Lemma 3.5, which first appeared in [17].

Suppose $\lambda$ is the Malthusian parameter of the continuous branching tree process associated with the preferential attachment function $f$, and recall the equality

$$q_k = \frac{f(k)p_k}{\sum_{i=1}^\infty f(i)p_i} = \frac{f(k)p_k}{\lambda} = p_{>k}.$$
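The identity $q_k = f(k)p_k/\lambda = p_{>k}$ has a simple empirical counterpart that can be checked on the simulation from Section 5.1 (our sketch, not from the thesis): the fraction of attachment steps in which a degree-$k$ node was chosen should be close to the tail proportion $P_{>k}(n)$.

    import numpy as np

    # empirical attachment-choice frequencies versus tail proportions,
    # reusing D, N_k and P_k from the simulation sketch in Section 5.1
    choices = np.array([D[t] for t in range(2, 5001)])
    q_hat = np.bincount(choices, minlength=N_k.size) / choices.size
    P_tail = 1.0 - np.cumsum(P_k)            # P_{>k}(n) for k = 0, 1, 2, ...
    for k in range(1, 6):
        print(k, round(q_hat[k], 4), round(P_tail[k], 4))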

5.3 consistency

We present the following proposition, obtained by combining Theorem 3.1 and Corollary 3.4 of [56]. All the notation is the same as in Section 3.3, as we follow the same branching process framework therein. The reader is advised to read Section 3.3 and Section 3.4 thoroughly before proceeding with the proofs. Recall the conditions on the supercritical, Malthusian processes that we consider in Section 3.3.2.

Proposition 5.1. Assume that the reproduction function $\mu = \mathbb{E}[\xi(t)]$ satisfies conditions (3.6), where the Malthusian parameter is denoted by $\alpha$, and (3.7), and does not concentrate on any lattice. Given a product-measurable, separable, non-negative random process $\varphi(t)$, assigning a characteristic to the root node $\varnothing$, such that $\mathbb{E}[\varphi(t)]$, as a function of $t$, is continuous almost everywhere with respect to the Lebesgue measure, and such that the following conditions hold:

$$\sum_{k=0}^\infty \sup_{k\le t\le k+1}\left(e^{-\alpha t}\mathbb{E}[\varphi(t)]\right) < \infty, \qquad (5.3)$$

$$\mathbb{E}\Big[\sup_{s\le t}\varphi(s)\Big] < \infty \quad\text{for all } t < \infty. \qquad (5.4)$$

Then there exists a random variable $Y$ depending only on the reproduction process $\xi(t)$ such that, as $t\to\infty$,

$$e^{-\alpha t}Z^\varphi_t \xrightarrow{P} Y m_\varphi, \qquad (5.5)$$

where $m_\varphi$ is defined as

$$m_\varphi = \frac{\int_0^\infty e^{-\alpha t}\mathbb{E}[\varphi(t)]\,dt}{\int_0^\infty u\,e^{-\alpha u}\,d\mu(u)}.$$

Note that in the preceding display, the denominator depends only on the reproduction function $\mu(t) = \mathbb{E}[\xi(t)]$.

Define $\xi_\alpha(t) = \int_0^t e^{-\alpha u}\,\xi(du)$. If the $x\log x$ condition

$$\mathbb{E}\big[\xi_\alpha(\infty)\log^+ \xi_\alpha(\infty)\big] < \infty \qquad (5.6)$$

holds, then the convergence in (5.5) also holds in the $L^1$ sense.

Suppose that the reproduction process $\xi$ satisfies (5.6), and that both $\varphi_1$ and $\varphi_2$ satisfy conditions (5.3) and (5.4). Define $T_t$ as the total number of births up to and including time $t$. Then, on $\{T_t\to\infty\}$, as $t\to\infty$,

$$\frac{Z^{\varphi_1}_t}{Z^{\varphi_2}_t} \xrightarrow{P} \frac{m_{\varphi_1}}{m_{\varphi_2}} = \frac{\int_0^\infty e^{-\alpha t}\mathbb{E}[\varphi_1(t)]\,dt}{\int_0^\infty e^{-\alpha t}\mathbb{E}[\varphi_2(t)]\,dt}. \qquad (5.7)$$

First we show the following lemma. Recall the definition of degree in the continuous random tree model in (3.4) and note that for the root node $\varnothing$, the degree evolves together with the reproduction process on the root, namely $\deg(\varnothing,\Upsilon_t) = \xi(t) + 1$.

Lemma 5.2. For our continuous branching tree model and characteristics $\varphi_1$ and $\varphi_2$, both monotone increasing (but not necessarily strictly monotone) in $t$, suppose that for some constants $C>0$ and $\gamma\ge 0$

$$\max(\varphi_1(t),\varphi_2(t)) \le C\,\mathbb{1}_{\{t\ge 0\}}\deg(\varnothing,\Upsilon_t)\log^\gamma\!\big(\deg(\varnothing,\Upsilon_t)\big).$$

Then as $t\to\infty$

$$\frac{Z^{\varphi_1}_t}{Z^{\varphi_2}_t} \xrightarrow{P} \frac{m_{\varphi_1}}{m_{\varphi_2}} = \frac{\int_0^\infty e^{-\alpha t}\mathbb{E}[\varphi_1(t)]\,dt}{\int_0^\infty e^{-\alpha t}\mathbb{E}[\varphi_2(t)]\,dt}. \qquad (5.8)$$

Proof. For our model, (5.6) only depends on the reproduction process $\xi(t)$ associated with the sublinear preferential attachment function and has been verified in [70]. We only need to prove that conditions (5.3) and (5.4) hold for both $\varphi_1$ and $\varphi_2$. Take any $\varphi(t)$ bounded above by $C\deg(\varnothing,\Upsilon_t)\log^\gamma(\deg(\varnothing,\Upsilon_t))$.

Suppose the limiting degree distribution associated with the continuous branching tree model is $(p_k)_{k=1}^\infty$ and the Malthusian parameter is $\alpha$; then the left-hand side of condition (5.3) satisfies

$$\sum_{k=0}^\infty \sup_{k\le t\le k+1}\left(e^{-\alpha t}\mathbb{E}[\varphi(t)]\right) \lesssim \int_0^\infty e^{-\alpha t}\mathbb{E}[\varphi(t)]\,dt \le C\sum_{k=1}^\infty k\log^\gamma(k)\int_0^\infty e^{-\alpha t}\,\mathbb{P}\big(\deg(\varnothing,\Upsilon_t)=k\big)\,dt = \frac{C}{\alpha}\sum_{k=1}^\infty k\log^\gamma(k)\,p_k < \infty. \qquad (5.9)$$

The last equality comes from the calculation in (3.17). The finiteness of the series is due to the fact that the limiting degree distribution $(p_k)_{k=1}^\infty$ decays at slowest as a power law with exponent $2+\varepsilon$ for some positive $\varepsilon$ (this corresponds to the case where the preferential attachment function is affine with $f(k) = k - 1 + \varepsilon$); then $k\log^\gamma(k)\,p_k$ is at most of order $\log^\gamma(k)/k^{1+\varepsilon}$, and hence the series converges.

For condition (5.4), we use the monotonicity of $\varphi$ and obtain, for any $0 < t < \infty$,

$$\mathbb{E}\Big[\sup_{s\le t}\varphi(s)\Big] = \mathbb{E}[\varphi(t)] < \infty.$$

The boundedness comes from (5.9) and the monotonicity of $\mu(t)$: if the preceding display failed for some $t$, then (5.9) could not be finite.

Therefore both $\varphi_1$ and $\varphi_2$ satisfy the conditions in Proposition 5.1, and applying the fact that $\{T_t\to\infty\}$ is an almost sure event, we obtain the desired result.

Lemma 5.3. Consider the preferential attachment network model stated above with underlying preferential attachment function $f$. Suppose the empirical degree distribution is $P_k(n) := N_k(n)/(n+1)$ and the limiting degree distribution is $(p_k)_{k=1}^\infty$. Suppose that for some constants $C>0$ and $\gamma>0$, a map $h:\mathbb{N}^+\to\mathbb{R}^+$ satisfies

$$h(k) \le Ck\log^\gamma(k). \qquad (5.10)$$

Then as $n\to\infty$

$$\sum_{k=1}^\infty h(k)P_k(n) \xrightarrow{P} \sum_{k=1}^\infty h(k)p_k. \qquad (5.11)$$

If both $h_1$ and $h_2$ are maps from $\mathbb{N}^+$ to $\mathbb{R}^+$ satisfying (5.10), then

$$\frac{\sum_{k=1}^\infty h_1(k)P_k(n)}{\sum_{k=1}^\infty h_2(k)P_k(n)} \xrightarrow{P} \frac{\sum_{k=1}^\infty h_1(k)p_k}{\sum_{k=1}^\infty h_2(k)p_k}. \qquad (5.12)$$

Proof. Establish the continuous branching tree process as before and call the expanding tree $\Upsilon_t$, with root $\varnothing$. Suppose that its corresponding Malthusian parameter is $\lambda$.

First we note that, as in the calculation of (5.9), the right hand side of (5.11) is finite. Define the random characteristics $\varphi_2(t) = \mathbb{1}_{\{t\ge 0\}}$ and $\varphi_1(t) = \mathbb{1}_{\{t\ge 0\}}h(\deg(\varnothing,\Upsilon_t))$. The quotient of the branching processes counted with $\varphi_1$ and $\varphi_2$ equals

$$\frac{Z^{\varphi_1}_t}{Z^{\varphi_2}_t} = \frac{\sum_{x\in\Upsilon_t} h(\deg(x,\Upsilon_t))}{|\{x\in\Upsilon_t\}|} = \frac{\sum_{k=1}^\infty h(k)N_k(n)}{n+1} = \sum_{k=1}^\infty h(k)P_k(n),$$

where $n+1$ is the number of nodes in $\Upsilon_t$ and $N_k(n)$ is the number of nodes of degree $k$ in $\Upsilon_t$. Since both $\varphi_1$ and $\varphi_2$ satisfy the conditions in Lemma 5.2, we have

$$\sum_{k=1}^\infty h(k)P_k(n) = \frac{Z^{\varphi_1}_t}{Z^{\varphi_2}_t} \xrightarrow{P} \frac{\sum_{k=1}^\infty h(k)\,\lambda\int_0^\infty e^{-\lambda t}\,\mathbb{P}(\deg(\varnothing,\Upsilon_t)=k)\,dt}{\lambda\int_0^\infty e^{-\lambda t}\,dt}.$$

The denominator is simply $1$ and the numerator is $\sum_{k=1}^\infty h(k)p_k$ by identifying $p_k$ from (3.17).

(5.12) can be shown by the same argument with $\varphi_1(t) = \mathbb{1}_{\{t\ge 0\}}h_1(\deg(\varnothing,\Upsilon_t))$ and $\varphi_2(t) = \mathbb{1}_{\{t\ge 0\}}h_2(\deg(\varnothing,\Upsilon_t))$.
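Lemma 5.3 can be observed numerically: for a fixed slowly growing $h$, the plug-in quantity $\sum_k h(k)P_k(n)$ stabilizes as the simulated network grows. The sketch below is ours and reuses simulate_pa_tree and f from the sketch in Section 5.1; the particular $h$ is only illustrative.

    import numpy as np

    for n in (1000, 4000, 16000):
        deg_n, _ = simulate_pa_tree(n, f, seed=1)
        P = np.bincount(deg_n) / (n + 1)
        k = np.arange(P.size)
        h = np.log(1.0 + k)                  # an illustrative slowly growing h
        print(n, np.sum(h * P))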

Now we prove the uniform convergence of $\dot\iota_n$ to $\dot\iota$. Abbreviate $\partial_i f_\theta = (\partial/\partial\theta_i)f_\theta$ and $\partial_{ij} f_\theta = (\partial^2/\partial\theta_i\partial\theta_j)f_\theta$. From now on in this chapter, the Malthusian parameter of the continuous branching tree process associated with the true preferential attachment function $f_{\theta_0}$ is $\lambda_0$. Note that $C^2$ is the class of functions whose first and second derivatives both exist and are continuous. For the entire chapter, we assume that $f_\theta(k) \in C^2$ with respect to $\theta\in\Theta$ for any $k\in\mathbb{N}^+$.

We will need the following lemma in the study of the uniform convergence of the likelihood equation.

Lemma 5.4. Suppose $Z_\theta(n)\to Z_\theta$ in $L^1$ uniformly in $\theta\in\Theta$. Then the Cesàro mean of $Z_\theta(n)$ converges to $Z_\theta$ in $L^1$ uniformly as well.

Proof. By Lemma 4.7(ii), $Z_\theta(n)\to Z_\theta$ pointwise for $\theta\in\Theta$.

Fix any sufficiently small $\varepsilon > 0$. Because of the uniform convergence of $Z_\theta(n)\to Z_\theta$ in $L^1$, we can find $m$ such that for any $n>m$,

$$\mathbb{E}\Big[\sup_{\theta\in\Theta}|Z_\theta(n) - Z_\theta|\Big] < \varepsilon.$$

Write $\bar Z_\theta(n) = \frac{1}{n}\sum_{i=1}^n Z_\theta(i)$ for the Cesàro mean. By the trivial split $\bar Z_\theta(n) - Z_\theta = \sum_{i=1}^m (Z_\theta(i)-Z_\theta)/n + \sum_{i=m+1}^n (Z_\theta(i)-Z_\theta)/n$, we bound

$$\mathbb{E}\Big[\sup_{\theta\in\Theta}|\bar Z_\theta(n) - Z_\theta|\Big] \le \underbrace{\frac{1}{n}\,\mathbb{E}\Big[\sup_{\theta\in\Theta}\Big|\sum_{i=1}^m (Z_\theta(i)-Z_\theta)\Big|\Big]}_{\to\,0 \text{ as } n\to\infty \text{ for fixed } m} + \underbrace{\frac{1}{n}\sum_{i=m+1}^n \mathbb{E}\Big[\sup_{\theta\in\Theta}|Z_\theta(i)-Z_\theta|\Big]}_{\le\,\varepsilon \text{ by the choice of } m}.$$
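A one-line numerical illustration of Lemma 5.4 (ours, with an arbitrary toy sequence): when $Z(n)\to Z$, the Cesàro means inherit the convergence, if a bit more slowly.

    import numpy as np

    Z = 2.0
    n = np.arange(1, 201)
    Zn = Z + (-1.0) ** n / n                 # a sequence converging to Z
    cesaro = np.cumsum(Zn) / n               # Cesaro means of Z(1), ..., Z(n)
    print(abs(Zn[-1] - Z), abs(cesaro[-1] - Z))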

Proposition 5.5. Consider the model stated above with the preferential attachment function from $\{f_\theta \mid \theta\in\Theta\}$, where $\Theta\subset\mathbb{R}^d$ is a compact set, $f_\theta(k)\in C^2$ with respect to $\theta$ for any $k$, $f_\theta(k)$ is non-decreasing in $k$ for any $\theta\in\Theta$, and $\inf_{\theta\in\Theta}\min_{k\in\mathbb{N}^+} f_\theta(k) = \inf_{\theta\in\Theta} f_\theta(1) \ge \delta > 0$. For some $C>0$ and $\gamma>0$ uniform over all $\theta\in\Theta$, assume that

$$\max\Big(\max_{1\le i\le d}|\partial_i f_\theta(k)|,\; f_\theta(k)\Big) \le Ck\log^\gamma(k), \qquad (5.13)$$

$$\max_{1\le i\le d}\big|(\partial_i f_\theta/f_\theta)(k)\big| \le C\log^\gamma(k). \qquad (5.14)$$

Then

$$\sup_{\theta\in\Theta}\big|\dot\iota_n(f_\theta) - \dot\iota(f_\theta)\big| \xrightarrow{P} 0. \qquad (5.15)$$

Proof. First we establish the convergence results by applying Lemma 5.3 with proper $h$. For any $1\le i\le d$, as $n\to\infty$,

$$\sum_{k=1}^\infty |\partial_i f_\theta(k)|\,P_k(n) \xrightarrow{P} \sum_{k=1}^\infty |\partial_i f_\theta(k)|\,p^{(0)}_k, \qquad (5.16)$$

$$\sum_{k=1}^\infty f_\theta(k)\,P_k(n) \xrightarrow{P} \sum_{k=1}^\infty f_\theta(k)\,p^{(0)}_k, \qquad (5.17)$$

$$\sum_{k=1}^\infty \Big|\frac{\partial_i f_\theta}{f_\theta}(k)\Big|\,P_{>k}(n) \xrightarrow{P} \sum_{k=1}^\infty \Big|\frac{\partial_i f_\theta}{f_\theta}(k)\Big|\,p^{(0)}_{>k}, \qquad (5.18)$$

$$\sum_{k=1}^\infty k\log^\gamma(k)\,P_k(n) \xrightarrow{P} \sum_{k=1}^\infty k\log^\gamma(k)\,p^{(0)}_k. \qquad (5.19)$$

By (5.13), the convergences (5.16) and (5.17) hold by applying Lemma 5.3 with $h(k) = |\partial_i f_\theta(k)|$ and $h(k) = f_\theta(k)$, respectively. (5.19) holds with $h(k) = k\log^\gamma(k)$, and (5.18) holds with

$$h(k) = \sum_{j=1}^{k-1}\Big|\frac{\partial_i f_\theta}{f_\theta}(j)\Big| \le C\sum_{j=1}^{k-1}\log^\gamma(j) \le Ck\log^\gamma(k).$$

We sort out some continuity issues that are necessary to proceed with the proof. Because $f_\theta(k)$ is bounded away from zero uniformly in $\theta\in\Theta$ and both $f_\theta(k)$ and $\dot f_\theta(k)$ are continuous with respect to $\theta$ for any $k\in\mathbb{N}^+$, $(\dot f_\theta/f_\theta)(k)$ is continuous with respect to $\theta$ for any $k\in\mathbb{N}^+$. Since $P_k(n) = 0$ for any $k>n$ and $f_\theta\in C^2$ with respect to $\theta$, the sums $\sum_{k=1}^\infty \dot f_\theta(k)P_k(n)$, $\sum_{k=1}^\infty f_\theta(k)P_k(n)$ and $\sum_{k=1}^\infty (\partial_i f_\theta/f_\theta)(k)P_{>k}(n)$ are just finite sums and hence continuous with respect to $\theta$ for any $n$. By the dominated convergence theorem, using the assumptions (5.14) and (5.13) and the finiteness of the right hand side of (5.19), $\sum_{k=1}^\infty \dot f_\theta(k)p^{(0)}_k$, $\sum_{k=1}^\infty f_\theta(k)p^{(0)}_k$ and $\sum_{k=1}^\infty (\partial_i f_\theta/f_\theta)(k)p^{(0)}_{>k}$ are all continuous with respect to $\theta$. Therefore both $\dot\iota_n(f_\theta)$ and $\dot\iota(f_\theta)$ are continuous with respect to $\theta$.

We will need the following fact (this is in fact an application of Scheffé's lemma, but we prove it here with the dominated convergence theorem):

$$\sum_{k=1}^\infty k\log^\gamma(k)\,\big|P_k(n) - p^{(0)}_k\big| = \underbrace{\sum_{k=1}^\infty k\log^\gamma(k)\,\big(P_k(n) - p^{(0)}_k\big)}_{\to\,0 \text{ in probability by } (5.19)} + 2\underbrace{\sum_{k=1}^\infty k\log^\gamma(k)\,\max\big(p^{(0)}_k - P_k(n),\,0\big)}_{\to\,0 \text{ in probability by the dct}}, \qquad (5.20)$$

where we apply the dominated convergence theorem (dct) because $\max(p^{(0)}_k - P_k(n),0) \le p^{(0)}_k$ and $\sum_{k=1}^\infty k\log^\gamma(k)\,p^{(0)}_k < \infty$, as in the calculation of (5.9).

The difference between $\dot\iota_n(f_\theta)$ and the limiting criterion $\dot\iota(f_\theta)$ is bounded by

$$\big|\dot\iota_n(f_\theta) - \dot\iota(f_\theta)\big| \le \Big|\sum_{k=1}^\infty \frac{\dot f_\theta}{f_\theta}(k)\,\big(P_{>k}(n) - p^{(0)}_{>k}\big)\Big| + \Big|\frac{1}{n}\sum_{j=1}^{n-1}\frac{S_{\dot f_\theta}(j)/(j+1)}{S_{f_\theta}(j)/(j+1)} - \frac{\sum_{i=1}^\infty p^{(0)}_i\dot f_\theta(i)}{\sum_{i=1}^\infty p^{(0)}_i f_\theta(i)}\Big|.$$

We call the first term of the previous display $A$ and the second $B$, and we deal with them separately. By (5.18), $A\to 0$ as $n\to\infty$. Keeping (5.20) in mind, we bound

$$\Big|\sum_{k=1}^\infty \frac{\partial_i f_\theta}{f_\theta}(k)\,\big(P_{>k}(n) - p^{(0)}_{>k}\big)\Big| \le C\sum_{k=1}^\infty k\log^\gamma(k)\,\big|P_k(n) - p^{(0)}_k\big| \to 0 \qquad (5.21)$$

as $n\to\infty$. Since the above bound does not depend on $\theta$, the convergence of $A$ to $0$ is uniform in $\theta\in\Theta$.

By (5.16) and (5.17), the convergences $S_{\dot f_\theta}(j)/(j+1) \to \sum_{k=1}^\infty p^{(0)}_k\dot f_\theta(k)$ and $S_{f_\theta}(j)/(j+1) \to \sum_{k=1}^\infty p^{(0)}_k f_\theta(k)$ both hold in probability. By similar reasoning as that in (5.21), both convergences are uniform in $\theta\in\Theta$. Since $\sum_{k=1}^\infty f_\theta(k)p^{(0)}_k > 0$ for any $\theta\in\Theta$, the map $\theta\mapsto\sum_{k=1}^\infty f_\theta(k)p^{(0)}_k$ is continuous and $\Theta$ is compact, the quantity $\sum_{k=1}^\infty f_\theta(k)p^{(0)}_k$ is uniformly bounded away from $0$. Therefore, uniformly in $\theta\in\Theta$, as $j\to\infty$,

$$\frac{S_{\dot f_\theta}(j)}{S_{f_\theta}(j)} \xrightarrow{P} \frac{\sum_{i=1}^\infty p^{(0)}_i\dot f_\theta(i)}{\sum_{i=1}^\infty p^{(0)}_i f_\theta(i)}.$$

Now we show that the convergence also holds in the $L^1$ sense. Since $P_k(n)\to p^{(0)}_k$ holds almost surely for every $k$ as $n\to\infty$ by Proposition 3.4 and $P_k(n)\le 1$, $P_k(n)\to p^{(0)}_k$ in $L^1$ as well. Consider the real sequence $\sum_{k=1}^K k\log^\gamma(k)\,\mathbb{E}[P_k(n)]$, which is monotone in $K$ and converges to $\sum_{k=1}^K k\log^\gamma(k)\,p^{(0)}_k$ for any fixed $K$. Note that, by Fubini's theorem, $1 = \mathbb{E}[\sum_{k=1}^\infty P_k(n)] = \sum_{k=1}^\infty\mathbb{E}[P_k(n)]$, so $(\mathbb{E}[P_k(n)])_{k=1}^\infty$ is a probability distribution. Therefore $\sum_{k=1}^\infty k\log^\gamma(k)\,\mathbb{E}[P_k(n)] \to \sum_{k=1}^\infty k\log^\gamma(k)\,p^{(0)}_k$. By an argument similar to that in (5.20), as $n\to\infty$,

$$\sum_{k=1}^\infty k\log^\gamma(k)\,\big|\mathbb{E}[P_k(n)] - p^{(0)}_k\big| \to 0.$$

Then by Fubini's theorem and (5.13), as $n\to\infty$,

$$\mathbb{E}\Big[\Big|\sum_{k=1}^\infty \partial_i f_\theta(k)\big(P_k(n)-p^{(0)}_k\big)\Big|\Big] \le \sum_{k=1}^\infty |\partial_i f_\theta(k)|\,\mathbb{E}\big|P_k(n)-p^{(0)}_k\big| \le C\sum_{k=1}^\infty k\log^\gamma(k)\,\mathbb{E}\big|P_k(n)-p^{(0)}_k\big| \to 0.$$

Since $S_{f_\theta}(j)/(j+1) > \delta$, the random variables $S_{\dot f_\theta}(j)/S_{f_\theta}(j)$ are (uniformly in $j$) bounded above by $\delta^{-1}S_{\dot f_\theta}(j)/(j+1)$. Because $S_{\dot f_\theta}(j)/(j+1)$ is uniformly integrable thanks to its $L^1$-convergence, the variables $S_{\dot f_\theta}(j)/S_{f_\theta}(j)$ are thus uniformly integrable. Therefore, uniformly in $\theta\in\Theta$, as $j\to\infty$,

$$\frac{S_{\dot f_\theta}(j)}{S_{f_\theta}(j)} \xrightarrow{L^1} \frac{\sum_{i=1}^\infty p^{(0)}_i\dot f_\theta(i)}{\sum_{i=1}^\infty p^{(0)}_i f_\theta(i)}.$$

We apply Lemma 5.4 and obtain that, uniformly in $\theta\in\Theta$,

$$\frac{1}{n+1}\sum_{j=1}^n\frac{S_{\dot f_\theta}(j)}{S_{f_\theta}(j)} \xrightarrow{L^1} \frac{\sum_{i=1}^\infty p^{(0)}_i\dot f_\theta(i)}{\sum_{i=1}^\infty p^{(0)}_i f_\theta(i)}.$$

This disposes of $B$, and thus the uniform convergence of $\dot\iota_n$ to $\dot\iota$ holds.

To establish the consistency of the mle, we also need the (local) uniqueness of the root of the equation system (5.2). This is in general difficult: the author spent a lot of time on proving the global uniqueness of the mle in this case, but to no avail. However, the formulation in (5.22) and the three technical Lemmas 5.6, 5.7 and 5.8 were developed as an attempt to establish a general theory on the global uniqueness of the mle, and they turn out to be useful in proving the local uniqueness of the mle for $f_{\alpha,\beta}(k) = (k+\alpha)^\beta$. In the study of the mle for the affine preferential attachment model in Section 4.3, a monotonicity argument is deployed to prove the uniqueness of the solution to the likelihood equation (4.13). A similar argument still works, in a much less visible fashion, for certain types of parametric families. We need the following two simple lemmas.

Lemma 5.6. Suppose $v_k$ is strictly monotone decreasing with respect to $k$. For two probability distributions $(p_k)_{k=1}^\infty$ and $(q_k)_{k=1}^\infty$ with mass on $\mathbb{N}^+$ satisfying $p_k\le q_k$ for $k\le K$ and $p_k>q_k$ for $k>K$,

$$\sum_{k=1}^\infty p_kv_k > \sum_{k=1}^\infty q_kv_k.$$

In case $v_k$ is strictly monotone increasing, the previous inequality changes direction ($>$ becomes $<$).

Proof. We use that $\sum_{k=1}^\infty p_k = 1 = \sum_{k=1}^\infty q_k$ and obtain $\sum_{k=1}^K(q_k-p_k) = \sum_{k=K+1}^\infty(p_k-q_k)$. By the strict monotonicity of $v_k$ we have

$$\sum_{k=1}^K (q_k-p_k)v_k < \sum_{k=K+1}^\infty (p_k-q_k)v_k.$$

Rearranging the terms gives the desired result.

Lemma 5.7. Suppose $w_k$ is a strictly monotone increasing sequence and $(p_k)_{k=1}^\infty$ a probability distribution with $\sum_{k=1}^\infty p_kw_k < \infty$. Then the reweighed probability distribution $q_k = p_kw_k/(\sum_{k=1}^\infty p_kw_k)$ satisfies the following: there exists a $K$ such that $p_k\ge q_k$ for $k\le K$ and $p_k<q_k$ for $k>K$. If $w_k$ is strictly monotone decreasing, then there exists a $K$ such that $p_k\le q_k$ for $k\le K$ and $p_k>q_k$ for $k>K$.

Proof. It is impossible that $p_k>q_k$ for every $k\in\mathbb{N}^+$, since otherwise they could not both sum up to $1$. Suppose $p_j\le q_j$; then

$$\frac{q_{j+1}}{p_{j+1}} = \frac{w_{j+1}}{\sum_{k=1}^\infty p_kw_k} > \frac{w_j}{\sum_{k=1}^\infty p_kw_k} = \frac{q_j}{p_j} \ge 1.$$

By mathematical induction, $p_k<q_k$ for any $k>j$. Find the smallest $J$ such that $p_J<q_J$ and set $K = J-1$; then $p_k<q_k$ for any $k>K$. It is also impossible that $J = 1$.

In case $w_k$ is monotone decreasing (so that $1/w_k$ is monotone increasing), compare $q$ and $p$, where $p$ is the reweighed version of $q$ with weight $1/w_k$, and apply the above result. Therefore the lemma holds.
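The single-crossing behaviour in Lemma 5.7 is easy to verify numerically; the toy distribution and weights below are ours.

    import numpy as np

    p = np.array([0.4, 0.25, 0.15, 0.1, 0.06, 0.04])   # toy distribution on k = 1..6
    w = np.arange(1, 7) ** 0.5                          # strictly increasing weights
    q = p * w / np.sum(p * w)                           # reweighed distribution
    print(np.sign(p - q))   # a run of +1s followed by a run of -1s: one crossing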

With the above two lemmas, we introduce another lemma to study the behavior of $\dot\iota(f_\theta)$. If the parametric family $f_\theta$ with $\theta\in\Theta_0\subset\Theta$ satisfies that $f_\theta(k)/f_{\theta_0}(k)$ is either monotone increasing (or decreasing) in $k$ and is not a constant, for any $\theta\in\Theta_0$, we say that the change of $\theta\in\Theta_0$ away from $\theta_0$ induces monotone increase (decrease) on $f_\theta(k)/f_{\theta_0}(k)$.

Lemma 5.8. Suppose the change of $\theta\in\Theta_0\subset\Theta$ away from $\theta_0$ induces monotonicity on $f_\theta(k)/f_{\theta_0}(k)$. Then for any $\theta\in\Theta_0$, $\dot\iota(f_\theta)\ne 0$.

Proof. Recall that the mle $\hat\theta_n$ is a root of (5.2) and that $q_k = f(k)p_k/\lambda$. A more illustrative view of (5.2) is as follows:

$$\dot\iota(f_\theta) = \sum_{k=1}^\infty \frac{\dot f_\theta}{f_\theta}(k)\,q^{(0)}_k - \sum_{k=1}^\infty \frac{p^{(0)}_k f_{\theta_0}(k)/\lambda_0\cdot f_\theta(k)/f_{\theta_0}(k)}{\sum_{i=1}^\infty p^{(0)}_i f_\theta(i)/\lambda_0}\,\frac{\dot f_\theta}{f_\theta}(k)
= \sum_{k=1}^\infty \frac{\dot f_\theta}{f_\theta}(k)\,q^{(0)}_k - \sum_{k=1}^\infty \frac{q^{(0)}_k\cdot f_\theta(k)/f_{\theta_0}(k)}{\sum_{i=1}^\infty q^{(0)}_i\cdot f_\theta(i)/f_{\theta_0}(i)}\,\frac{\dot f_\theta}{f_\theta}(k)
= \sum_{k=1}^\infty q^{(0)}_k\,\frac{\dot f_\theta}{f_\theta}(k) - \sum_{k=1}^\infty q^{(0,\theta)}_k\,\frac{\dot f_\theta}{f_\theta}(k)
= \mathbb{E}_{q^{(0)}}\Big[\frac{\dot f_\theta}{f_\theta}(K)\Big] - \mathbb{E}_{q^{(0,\theta)}}\Big[\frac{\dot f_\theta}{f_\theta}(K)\Big], \qquad (5.22)$$

where $q^{(0,\theta)}_k \propto q^{(0)}_k f_\theta(k)/f_{\theta_0}(k)$ is the probability distribution generated by reweighing $(q^{(0)}_k)_{k=1}^\infty$ with $(f_\theta(k)/f_{\theta_0}(k))_{k=1}^\infty$, and $K$ follows the distribution $(q^{(0)}_k)_{k=1}^\infty$ and $(q^{(0,\theta)}_k)_{k=1}^\infty$, respectively.

Fix any $\theta\in\Theta_0$ and assume without loss of generality that $\theta$ induces monotone decrease on $f_\theta/f_{\theta_0}$ in $k$. Apply Lemma 5.7 with weight $(f_\theta/f_{\theta_0})(k)$: there exists a $K$ such that $q^{(0)}_k\le q^{(0,\theta)}_k$ for $k\le K$ and $q^{(0)}_k>q^{(0,\theta)}_k$ for $k>K$. Then apply Lemma 5.6 with $v_k = (\dot f_\theta/f_\theta)(k)$ (understood component-wise) and the two probability distributions $q^{(0)}_k$ and $q^{(0,\theta)}_k$, and obtain that $\partial_i\iota(f_\theta) < 0$ in case $(\partial_if_\theta/f_\theta)(k)$ is decreasing in $k$, and the other way round for increasing $(\partial_if_\theta/f_\theta)(k)$.

The local uniqueness of the mle for $f_{\alpha,\beta}(k) = (k+\alpha)^\beta$. In the case of $f_{\alpha,\beta}(k) = (k+\alpha)^\beta$, which is our primary candidate for specifying models, assuming the true parameter is $(\alpha_0,\beta_0)$, moving the parameter away does not automatically induce monotonicity on $f_{\alpha,\beta}/f_{\alpha_0,\beta_0}$. Analysing the function $g(x) = (f_{\alpha,\beta}/f_{\alpha_0,\beta_0})(x) = (x+\alpha)^\beta/(x+\alpha_0)^{\beta_0}$ with its derivative $g'(x)$ yields that $(f_{\alpha,\beta}/f_{\alpha_0,\beta_0})(k)$ is monotone increasing in $k$ on $\{(\alpha,\beta)\mid \beta-\beta_0+\beta\alpha_0-\beta_0\alpha\ge 0;\ \beta\ge\beta_0\}$ and monotone decreasing on the set $\{(\alpha,\beta)\mid \beta-\beta_0+\beta\alpha_0-\beta_0\alpha\le 0;\ \beta\le\beta_0\}$. It is not possible to prove that $(\alpha_0,\beta_0)$ is the globally unique root of $\dot\iota(f_{\alpha,\beta})$ with the above technique (by comparing the two terms in (5.22)). It would be cleaner if we could show the global uniqueness of the mle, but this is in general difficult. For our purpose, we only need the local uniqueness of the mle as a root of (5.2).

Now we prove that the mle, as a root of (5.2), is locally unique. This can be achieved by studying the Hessian matrix of $\iota(f_{\alpha,\beta})$ evaluated at $(\alpha_0,\beta_0)$. Abbreviate $f_0 := f_{\alpha_0,\beta_0}$ and calculate the Hessian matrix as follows:

$$\ddot\iota(f)_{\alpha_0,\beta_0} = -\frac{1}{a^2}\begin{pmatrix} ab-c^2 & ad-ce \\ ad-ce & af-e^2\end{pmatrix}, \qquad (5.23)$$

where the quantities are defined as

$$a = \sum_{k=1}^\infty p^{(0)}_k f_0(k), \qquad
b = \sum_{k=1}^\infty p^{(0)}_k f_0(k)\,\frac{\beta_0^2}{(k+\alpha_0)^2}, \qquad
c = \sum_{k=1}^\infty p^{(0)}_k f_0(k)\,\frac{\beta_0}{k+\alpha_0},$$

$$d = \sum_{k=1}^\infty p^{(0)}_k f_0(k)\,\frac{\beta_0}{k+\alpha_0}\log(k+\alpha_0), \qquad
e = \sum_{k=1}^\infty p^{(0)}_k f_0(k)\log(k+\alpha_0), \qquad
f = \sum_{k=1}^\infty p^{(0)}_k f_0(k)\log^2(k+\alpha_0).$$

It is immediate that $ab-c^2$ is strictly positive after applying the Cauchy-Schwarz or Jensen inequality:

$$\mathbb{E}_{p_0}\Big[\sqrt{f_0(K)}\,\frac{\beta_0}{K+\alpha_0}\cdot\sqrt{f_0(K)}\Big] < \sqrt{\mathbb{E}_{p_0}[f_0(K)]}\cdot\sqrt{\mathbb{E}_{p_0}\Big[f_0(K)\,\frac{\beta_0^2}{(K+\alpha_0)^2}\Big]},$$

where $K$ follows the law of $(p^{(0)}_k)_{k=1}^\infty$. The same argument works for proving $af-e^2>0$ and $bf-d^2>0$. It is more tricky to handle the determinant of the Hessian matrix, which equals

$$\big|\ddot\iota(f)_{\alpha_0,\beta_0}\big| = \frac{1}{a^4}\big(a^2bf + c^2e^2 - abe^2 - afc^2 - a^2d^2 - c^2e^2 + 2acde\big) = \frac{1}{a^4}\big(a^2(bf-d^2) - ae(be-cd) - ac(cf-ed)\big) > 0.$$

The last inequality holds because we can show that both $be-cd<0$ and $cf-ed<0$. We only prove $be<cd$ (the proof of $cf<ed$ is similar). Define $x_k = p^{(0)}_k f_0(k)\beta_0/(k+\alpha_0)$, $u_k = (k+\alpha_0)\log(k+\alpha_0)/\beta_0$ and $y_k = x_ku_k = p^{(0)}_k f_0(k)\log(k+\alpha_0)$. Define $p_x(k) = x_k/\sum_j x_j$ and $p_y(k) = y_k/\sum_j y_j$. Then, noting that $u_k$ is strictly monotone increasing, an application of Lemma 5.7 tells us that there exists a $K$ such that $p_x(k)\ge p_y(k)$ for $k\le K$ and $p_x(k)<p_y(k)$ for $k>K$. Applying Lemma 5.6 with $w_k = \beta_0/(k+\alpha_0)$ yields $be<cd$. Therefore the Hessian matrix is negative definite and $(\alpha_0,\beta_0)$ is a unique root of $\dot\iota(f_{\alpha,\beta})$ in a neighborhood around $(\alpha_0,\beta_0)$ with non-trivial diameter.
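The negative definiteness in (5.23) can also be checked numerically by plugging an empirical degree distribution into the definitions of $a,\dots,f$. The sketch below is ours; it reuses P_k from the simulation in Section 5.1 as a stand-in for $(p^{(0)}_k)$, and the parameter values are illustrative.

    import numpy as np

    alpha0, beta0 = 0.5, 0.7
    k = np.arange(1, P_k.size)                # degrees 1, 2, ...
    p0 = P_k[1:]                              # empirical stand-in for p_k^(0)
    f0 = (k + alpha0) ** beta0
    a = np.sum(p0 * f0)
    b = np.sum(p0 * f0 * beta0 ** 2 / (k + alpha0) ** 2)
    c = np.sum(p0 * f0 * beta0 / (k + alpha0))
    d = np.sum(p0 * f0 * beta0 / (k + alpha0) * np.log(k + alpha0))
    e = np.sum(p0 * f0 * np.log(k + alpha0))
    f_ = np.sum(p0 * f0 * np.log(k + alpha0) ** 2)
    H = -np.array([[a * b - c ** 2, a * d - c * e],
                   [a * d - c * e, a * f_ - e ** 2]]) / a ** 2
    print(H[0, 0] < 0, np.linalg.det(H) > 0)  # negative-definite 2x2 check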

Global concavity of $\iota(f_\theta)$ in case $f_\beta(k) = (k+\alpha_0)^\beta$ with known $\alpha_0$. In the special case of $f_\beta(k) = (k+\alpha_0)^\beta$ with $\beta$ the only parameter, (5.2) reduces to

$$\dot\iota(\beta) = \sum_{k=1}^\infty \log(k+\alpha_0)\,p^{(0)}_{>k} - \frac{\sum_{k=1}^\infty p^{(0)}_k(k+\alpha_0)^\beta\log(k+\alpha_0)}{\sum_{k=1}^\infty p^{(0)}_k(k+\alpha_0)^\beta}.$$

Calculating the second order derivative gives

$$\ddot\iota(\beta) = -\Bigg(\sum_{k=1}^\infty p^{(0)}_k(k+\alpha_0)^\beta\log^2(k+\alpha_0)\,\sum_{k=1}^\infty p^{(0)}_k(k+\alpha_0)^\beta - \Big(\sum_{k=1}^\infty p^{(0)}_k(k+\alpha_0)^\beta\log(k+\alpha_0)\Big)^2\Bigg)\Big/\Big(\sum_{k=1}^\infty p^{(0)}_k(k+\alpha_0)^\beta\Big)^2 < 0.$$

The last inequality holds for any $\beta\in[0,1]$ because of the Cauchy-Schwarz inequality (or Jensen inequality)

$$\Big(\mathbb{E}_{p_0}\big[(K+\alpha_0)^{\beta/2}\cdot(K+\alpha_0)^{\beta/2}\log(K+\alpha_0)\big]\Big)^2 < \mathbb{E}_{p_0}\big[(K+\alpha_0)^\beta\log^2(K+\alpha_0)\big]\;\mathbb{E}_{p_0}\big[(K+\alpha_0)^\beta\big],$$

where $K$ follows the law of $(p^{(0)}_k)_{k=1}^\infty$. In this parametric case, the maximizer of the limiting likelihood function is unique, and so is the root of its derivative (5.2). The other perspective is that moving the parameter $\beta$ away from $\beta_0$ induces monotonicity on $(f_\beta/f_{\beta_0})(k) = (k+\alpha_0)^{\beta-\beta_0}$, so that $\dot\iota(\beta)\ne 0$ for any $\beta\in[0,1]$ with $\beta\ne\beta_0$.
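With $\alpha_0$ known, the empirical criterion $\iota_n(\beta)$ is itself concave in $\beta$ (the first sum in $\iota_n$ is linear in $\beta$ and each term $-\log S_{f_\beta}(t-1)$ is concave in $\beta$), so a simple one-dimensional search recovers the unique maximizer. The sketch below is ours and reuses loglik_and_score and D from the earlier sketches.

    import numpy as np

    alpha0 = 0.5
    betas = np.linspace(0.01, 1.0, 100)
    vals = [loglik_and_score((alpha0, b), D, 5000)[0] for b in betas]
    print("profile mle for beta:", betas[int(np.argmax(vals))])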

Almost-global uniqueness in case $f_\alpha(k) = (k+\alpha)^{\beta_0}$ with known $\beta_0$.
