
Cover Page

The handle http://hdl.handle.net/1887/49012 holds various files of this Leiden University dissertation.

Author: Gao, F.

Title: Bayes and networks

Issue Date: 2017-05-23

5 NETWORK MODELS

5.1 introduction and notation

In Chapter 3, we consider the estimation of the general sublinear preferential attachment function, where we only assume the monotonicity of the preferential attachment function. The success of the maximum likelihood estimator in Chapter 4 motivates us to consider the following problem. Suppose that the preferential attachment function is from a sublinear parametric family $\{f_\theta \mid \theta \in \Theta \subset \mathbb{R}^d\}$ with the true parameter $\theta_0$. The problem is no longer non-parametric as in Chapter 3 and we expect the maximum likelihood estimator to work. The problem, however, becomes significantly harder than that in Chapter 4, as the analysis of all quantities in the likelihood function gets much more involved as soon as we step away from the affine domain.

Nonetheless, in this chapter, with a branching process framework similar to that in Chapter 3, we show that the maximum likelihood estimator (mle) $\hat\theta_n$ works despite the complexity of analyzing the likelihood function. The main contribution of this chapter is indeed to use the branching process framework, which first appeared in [70] and was the foundation stone of Chapter 3, to study the likelihood function of general sublinear preferential attachment networks.

We employ the exact same branching process framework as in Section 3.3, except that the preferential attachment function assumes the parametric form $\{f_\theta, \theta \in \Theta\}$. We also inherit all the notation from Chapter 3. Some notation also comes from Chapter 4, such as the superscript $(0)$, which stresses that the quantity is under the true parameter $\theta_0$. For any sequence $a_k$, define $a_{>k} = \sum_{i>k} a_i$. $N_k(t)$ is the number of nodes of degree $k$ in the network at time $t$ (the network at time $t$ possesses $t+1$ nodes), and $P_k(t)$ is the empirical proportion of nodes of degree $k$ at time $t$, i.e., $P_k(t) = N_k(t)/(t+1)$. $p_k$ is the limiting proportion of nodes of degree $k$ and $q_k$ is the limiting probability that the incoming node chooses a node of degree $k$ to attach to, i.e. $q_k = f(k)p_k/\sum_{i=1}^\infty f(i)p_i$. $S(t)$ summarizes a certain quantity over the whole network and will be defined thoroughly in the next section.

Throughout the chapter, we assume that, uniformly in $\theta \in \Theta$, $f_\theta(k)$ is non-decreasing in $k$ and that $f_\theta$ is sublinear in the sense that there exists a positive constant $C$ such that $f_\theta(k) \le Ck$ uniformly in $\theta \in \Theta$.
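To make the notation concrete, the following sketch (not part of the thesis) simulates the tree-growth dynamics of Chapter 3, in which each incoming node attaches to a single existing node chosen with probability proportional to $f$ of its current degree; the helper name simulate_pa_tree, the parameter values and the illustrative family $f_{\alpha,\beta}(k) = (k+\alpha)^\beta$ (discussed formally later in this chapter) are ours.

    import numpy as np

    def simulate_pa_tree(n, f, seed=0):
        # Grow a preferential attachment tree until it has n + 1 nodes.
        # At time t the new node t attaches to an existing node chosen
        # with probability proportional to f(current degree).
        rng = np.random.default_rng(seed)
        deg = np.zeros(n + 1, dtype=int)
        deg[0] = deg[1] = 1                      # time 1: two nodes, one edge
        D = {}                                   # D[t] = degree of the chosen node
        for t in range(2, n + 1):
            weights = f(deg[:t])
            target = rng.choice(t, p=weights / weights.sum())
            D[t] = deg[target]
            deg[target] += 1
            deg[t] = 1                           # the newcomer starts with degree 1
        return deg, D

    f = lambda k: (k + 0.5) ** 0.7               # an example sublinear f_{alpha,beta}
    deg, D = simulate_pa_tree(5000, f)
    N_k = np.bincount(deg)                       # N_k(n): number of degree-k nodes
    P_k = N_k / deg.size                         # P_k(n) = N_k(n) / (n + 1)
    print(P_k[1:6])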

The rest of the chapter is organized as follows. Section 5.2 gives the definition of the maximum likelihood estimator. In Section 5.3 we prove that the mle is consistent by employing classical results from continuous-time branching processes. In Section 5.4 we prove that the mle is asymptotically normal. To overcome the problem that computing the mle relies on the entire evolution history of the network, we propose in Section 5.5 the quasi-maximum-likelihood estimator, which only depends on the final snapshot of the network.

5.2 construction of the maximum likelihood estimator

Let $(\mathcal{F}_t)_{t=1}^n$ be the $\sigma$-algebras generated by the stochastic process of the graph's evolution. Given the data of the degrees of the nodes that got attached to, $D^{(n)} = (D_i)_{i=2}^n$, the likelihood function is as follows:

$$L_n(f; D^{(n)}) = \prod_{t=2}^n \frac{f(D_t)\,N_{D_t}(t-1)}{S_f(t-1)},$$

where, for any map $h: \mathbb{N} \to \mathbb{R}^+$, we define $S_h(t) = S_h(t-1) + h(D_t+1) - h(D_t) + h(1)$ for $t \ge 2$ with the initialization $S_h(1) = 2h(1)$, so that $S_h(t) = \sum_{k=1}^\infty h(k)N_k(t)$. Note that $S_h(t)$ is $\mathcal{F}_t$-measurable. Then the derivatives of the log-likelihood can be written as (the equality is understood component-wise)

$$\dot l_n(f_\theta; D^{(n)}) = \frac{\partial}{\partial\theta}\log L_n(f_\theta; D^{(n)}) = \sum_{t=2}^n\left[\frac{\dot f_\theta}{f_\theta}(D_t) - \frac{S_{\dot f_\theta}(t-1)}{S_{f_\theta}(t-1)}\right].$$

From the form of the last display we see that $\dot l_n(f_{\theta_0})$ is a martingale relative to the filtration $\mathcal{F}_t$, because

$$\dot l_n(f_{\theta_0}; D^{(n)}) = \sum_{t=2}^n\left[\frac{\dot f_{\theta_0}}{f_{\theta_0}}(D_t) - \sum_{k=1}^\infty \frac{\dot f_{\theta_0}}{f_{\theta_0}}(k)\,\frac{f_{\theta_0}(k)N_k(t-1)}{S_{f_{\theta_0}}(t-1)}\right]
= \sum_{t=2}^n\left[\frac{\dot f_{\theta_0}}{f_{\theta_0}}(D_t) - \sum_{k=1}^\infty \frac{\dot f_{\theta_0}}{f_{\theta_0}}(k)\,\mathbb{P}(D_t = k \mid \mathcal{F}_{t-1})\right]
= \sum_{t=2}^n\left[\frac{\dot f_{\theta_0}}{f_{\theta_0}}(D_t) - \mathbb{E}_{\theta_0}\!\left[\frac{\dot f_{\theta_0}}{f_{\theta_0}}(D_t)\,\Big|\,\mathcal{F}_{t-1}\right]\right].$$

If we normalize the likelihood by the number of nodes in the network, we obtain (note that the left-hand side of the display is $\iota$ instead of $l$; the additive term $\frac{1}{n+1}\sum_{t=2}^n\log N_{D_t}(t-1)$, which does not depend on $\theta$, is dropped)

$$\iota_n(f_\theta) := \frac{1}{n+1}\log L_n(f_\theta; D^{(n)})
= \frac{1}{n+1}\sum_{k=1}^\infty \log f_\theta(k)\sum_{t=2}^n \mathbb{1}_{\{D_t=k\}} - \frac{1}{n+1}\sum_{t=2}^n \log S_{f_\theta}(t-1)
= \sum_{k=1}^\infty \log f_\theta(k)\,\frac{N_{>k}(n)}{n+1} - \frac{1}{n+1}\sum_{t=2}^n \log S_{f_\theta}(t-1)
= \sum_{k=1}^\infty \log f_\theta(k)\,P_{>k}(n) - \frac{1}{n+1}\sum_{t=2}^n \log S_{f_\theta}(t-1).$$

We take partial derivatives on both sides of the preceding display and obtain

$$\dot\iota_n(f_\theta) = \sum_{k=1}^\infty \frac{\dot f_\theta}{f_\theta}(k)\,P_{>k}(n) - \frac{1}{n+1}\sum_{t=2}^n \frac{S_{\dot f_\theta}(t-1)}{S_{f_\theta}(t-1)}
= \sum_{k=1}^\infty \frac{\dot f_\theta}{f_\theta}(k)\,P_{>k}(n) - \frac{1}{n+1}\sum_{t=2}^n \frac{\sum_{i=1}^\infty P_i(t-1)\dot f_\theta(i)}{\sum_{i=1}^\infty P_i(t-1) f_\theta(i)}. \qquad (5.1)$$

In this chapter, $\dot f_\theta$ and $\dot\iota$ (sometimes $\dot l_\theta$) are often understood component-wise to simplify notation, such as in (5.1), (5.2), (5.15) and (5.22).
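To illustrate how $\hat\theta_n$ can be computed in practice from the normalized log-likelihood, the sketch below (ours, under the same assumptions as the earlier sketches) maximizes $\iota_n$ over a grid on a compact $\Theta$; a serious implementation would of course use a smooth numerical optimizer together with the analytic score.

    import numpy as np

    alphas = np.linspace(0.1, 2.0, 20)
    betas = np.linspace(0.1, 1.0, 19)
    # crude grid search for the mle over a compact Theta,
    # reusing loglik_and_score and the simulated history D from above
    best = max((loglik_and_score((a, b), D, 5000)[0], a, b)
               for a in alphas for b in betas)
    print("grid mle (alpha, beta):", best[1:], "iota_n:", best[0] / 5001)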

$\dot\iota_n(f_\theta)$ will be shown in Proposition 5.5 to converge in probability to the following quantity:

$$\dot\iota(f_\theta) = \sum_{k=1}^\infty \frac{\dot f_\theta}{f_\theta}(k)\,p^{(0)}_{>k} - \frac{\sum_{i=1}^\infty p^{(0)}_i \dot f_\theta(i)}{\sum_{i=1}^\infty p^{(0)}_i f_\theta(i)}. \qquad (5.2)$$

Indeed $\theta_0$ solves the above equation. In fact, assuming that the Malthusian parameter associated with the true preferential attachment function $f_{\theta_0}$ is $\lambda_0$ (recall the definition of the Malthusian parameter in (3.6) and the further elaboration in (3.14)), we have

$$\dot\iota(f_{\theta_0}) = \sum_{k=1}^\infty \frac{\dot f_{\theta_0}}{f_{\theta_0}}(k)\,p^{(0)}_{>k} - \frac{\sum_{i=1}^\infty p^{(0)}_i \dot f_{\theta_0}(i)}{\sum_{i=1}^\infty p^{(0)}_i f_{\theta_0}(i)} = \sum_{k=1}^\infty \dot f_{\theta_0}(k)\left(\frac{p^{(0)}_{>k}}{f_{\theta_0}(k)} - \frac{p^{(0)}_k}{\lambda_0}\right) = 0.$$

The second equality holds because of (3.14), and in the last equality we apply Lemma 3.5, which first appeared in [17].

Suppose $\lambda$ is the Malthusian parameter of the continuous branching tree process associated with the preferential attachment function $f$, and recall the equality

$$q_k = \frac{f(k)p_k}{\sum_{i=1}^\infty f(i)p_i} = \frac{f(k)p_k}{\lambda} = p_{>k}.$$
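The identity $q_k = f(k)p_k/\lambda = p_{>k}$ has a simple empirical counterpart that can be checked on the simulation from Section 5.1 (our sketch, not from the thesis): the fraction of attachment steps in which a degree-$k$ node was chosen should be close to the tail proportion $P_{>k}(n)$.

    import numpy as np

    # empirical attachment-choice frequencies versus tail proportions,
    # reusing D, N_k and P_k from the simulation sketch in Section 5.1
    choices = np.array([D[t] for t in range(2, 5001)])
    q_hat = np.bincount(choices, minlength=N_k.size) / choices.size
    P_tail = 1.0 - np.cumsum(P_k)            # P_{>k}(n) for k = 0, 1, 2, ...
    for k in range(1, 6):
        print(k, round(q_hat[k], 4), round(P_tail[k], 4))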

5.3 consistency

We present the following proposition, obtained by combining Theorem 3.1 and Corollary 3.4 of [56]. All the notation is the same as in Section 3.3, as we follow the same branching process framework therein. The reader is advised to read Section 3.3 and Section 3.4 thoroughly before proceeding with the proofs. Recall the conditions on the supercritical, Malthusian processes that we consider in Section 3.3.2.

Proposition 5.1. Assume that the reproduction function $\mu = \mathbb{E}[\xi(t)]$ satisfies conditions (3.6), where the Malthusian parameter is denoted by $\alpha$, and (3.7), and does not concentrate on any lattice. Given a product-measurable, separable, non-negative random process $\varphi(t)$, assigning a characteristic to the root node $\varnothing$, such that $\mathbb{E}[\varphi(t)]$, as a function of $t$, is continuous almost everywhere with respect to the Lebesgue measure, and such that the following conditions hold:

$$\sum_{k=0}^\infty \sup_{k\le t\le k+1}\left(e^{-\alpha t}\mathbb{E}[\varphi(t)]\right) < \infty, \qquad (5.3)$$

$$\mathbb{E}\Big[\sup_{s\le t}\varphi(s)\Big] < \infty \quad\text{for all } t < \infty. \qquad (5.4)$$

Then there exists a random variable $Y$ depending only on the reproduction process $\xi(t)$ such that, as $t\to\infty$,

$$e^{-\alpha t}Z^\varphi_t \xrightarrow{P} Y m_\varphi, \qquad (5.5)$$

where $m_\varphi$ is defined as

$$m_\varphi = \frac{\int_0^\infty e^{-\alpha t}\mathbb{E}[\varphi(t)]\,dt}{\int_0^\infty u\,e^{-\alpha u}\,d\mu(u)}.$$

Note that in the preceding display, the denominator depends only on the reproduction function $\mu(t) = \mathbb{E}[\xi(t)]$.

Define $\xi_\alpha(t) = \int_0^t e^{-\alpha u}\,\xi(du)$. If the $x\log x$ condition

$$\mathbb{E}\big[\xi_\alpha(\infty)\log^+ \xi_\alpha(\infty)\big] < \infty \qquad (5.6)$$

holds, then the convergence in (5.5) also holds in the $L^1$ sense.

Suppose that the reproduction process $\xi$ satisfies (5.6), and that both $\varphi_1$ and $\varphi_2$ satisfy conditions (5.3) and (5.4). Define $T_t$ as the total number of births up to and including time $t$. Then, on $\{T_t\to\infty\}$, as $t\to\infty$,

$$\frac{Z^{\varphi_1}_t}{Z^{\varphi_2}_t} \xrightarrow{P} \frac{m_{\varphi_1}}{m_{\varphi_2}} = \frac{\int_0^\infty e^{-\alpha t}\mathbb{E}[\varphi_1(t)]\,dt}{\int_0^\infty e^{-\alpha t}\mathbb{E}[\varphi_2(t)]\,dt}. \qquad (5.7)$$

First we show the following lemma. Recall the definition of degree in the continuous random tree model in (3.4) and note that for the root node $\varnothing$, the degree evolves together with the reproduction process on the root, namely $\deg(\varnothing,\Upsilon_t) = \xi(t) + 1$.

Lemma 5.2. For our continuous branching tree model and characteristics $\varphi_1$ and $\varphi_2$, both monotone increasing (but not necessarily strictly monotone) in $t$, suppose that for some constants $C>0$ and $\gamma\ge 0$

$$\max(\varphi_1(t),\varphi_2(t)) \le C\,\mathbb{1}_{\{t\ge 0\}}\deg(\varnothing,\Upsilon_t)\log^\gamma\!\big(\deg(\varnothing,\Upsilon_t)\big).$$

Then as $t\to\infty$

$$\frac{Z^{\varphi_1}_t}{Z^{\varphi_2}_t} \xrightarrow{P} \frac{m_{\varphi_1}}{m_{\varphi_2}} = \frac{\int_0^\infty e^{-\alpha t}\mathbb{E}[\varphi_1(t)]\,dt}{\int_0^\infty e^{-\alpha t}\mathbb{E}[\varphi_2(t)]\,dt}. \qquad (5.8)$$

Proof. For our model, (5.6) only depends on the reproduction process $\xi(t)$ associated with the sublinear preferential attachment function and has been verified in [70]. We only need to prove that conditions (5.3) and (5.4) hold for both $\varphi_1$ and $\varphi_2$. Take any $\varphi(t)$ bounded above by $C\deg(\varnothing,\Upsilon_t)\log^\gamma(\deg(\varnothing,\Upsilon_t))$.

Suppose the limiting degree distribution associated with the continuous branching tree model is $(p_k)_{k=1}^\infty$ and the Malthusian parameter is $\alpha$; then the left-hand side of condition (5.3) satisfies

$$\sum_{k=0}^\infty \sup_{k\le t\le k+1}\left(e^{-\alpha t}\mathbb{E}[\varphi(t)]\right) \lesssim \int_0^\infty e^{-\alpha t}\mathbb{E}[\varphi(t)]\,dt \le C\sum_{k=1}^\infty k\log^\gamma(k)\int_0^\infty e^{-\alpha t}\,\mathbb{P}\big(\deg(\varnothing,\Upsilon_t)=k\big)\,dt = \frac{C}{\alpha}\sum_{k=1}^\infty k\log^\gamma(k)\,p_k < \infty. \qquad (5.9)$$

The last equality comes from the calculation in (3.17). The finiteness of the series is due to the fact that the limiting degree distribution $(p_k)_{k=1}^\infty$ decays at slowest as a power law with exponent $2+\varepsilon$ for some positive $\varepsilon$ (this corresponds to the case where the preferential attachment function is affine with $f(k) = k - 1 + \varepsilon$); then $k\log^\gamma(k)\,p_k$ is at most of order $\log^\gamma(k)/k^{1+\varepsilon}$, and hence the series converges.

For condition (5.4), we use the monotonicity of $\varphi$ and obtain, for any $0 < t < \infty$,

$$\mathbb{E}\Big[\sup_{s\le t}\varphi(s)\Big] = \mathbb{E}[\varphi(t)] < \infty.$$

The boundedness comes from (5.9) and the monotonicity of $\mu(t)$: if the preceding display failed for some $t$, then (5.9) could not be finite.

Therefore both $\varphi_1$ and $\varphi_2$ satisfy the conditions in Proposition 5.1, and applying the fact that $\{T_t\to\infty\}$ is an almost sure event, we obtain the desired result.

Lemma 5.3. Consider the preferential attachment network model stated above with underlying preferential attachment function $f$. Suppose the empirical degree distribution is $P_k(n) := N_k(n)/(n+1)$ and the limiting degree distribution is $(p_k)_{k=1}^\infty$. Suppose that for some constants $C>0$ and $\gamma>0$, a map $h:\mathbb{N}^+\to\mathbb{R}^+$ satisfies

$$h(k) \le Ck\log^\gamma(k). \qquad (5.10)$$

Then as $n\to\infty$

$$\sum_{k=1}^\infty h(k)P_k(n) \xrightarrow{P} \sum_{k=1}^\infty h(k)p_k. \qquad (5.11)$$

If both $h_1$ and $h_2$ are maps from $\mathbb{N}^+$ to $\mathbb{R}^+$ satisfying (5.10), then

$$\frac{\sum_{k=1}^\infty h_1(k)P_k(n)}{\sum_{k=1}^\infty h_2(k)P_k(n)} \xrightarrow{P} \frac{\sum_{k=1}^\infty h_1(k)p_k}{\sum_{k=1}^\infty h_2(k)p_k}. \qquad (5.12)$$

Proof. Establish the continuous branching tree process as before and call the expanding tree $\Upsilon_t$, with root $\varnothing$. Suppose that its corresponding Malthusian parameter is $\lambda$.

First we note that, as in the calculation of (5.9), the right hand side of (5.11) is finite. Define the random characteristics $\varphi_2(t) = \mathbb{1}_{\{t\ge 0\}}$ and $\varphi_1(t) = \mathbb{1}_{\{t\ge 0\}}h(\deg(\varnothing,\Upsilon_t))$. The quotient of the branching processes counted with $\varphi_1$ and $\varphi_2$ equals

$$\frac{Z^{\varphi_1}_t}{Z^{\varphi_2}_t} = \frac{\sum_{x\in\Upsilon_t} h(\deg(x,\Upsilon_t))}{|\{x\in\Upsilon_t\}|} = \frac{\sum_{k=1}^\infty h(k)N_k(n)}{n+1} = \sum_{k=1}^\infty h(k)P_k(n),$$

where $n+1$ is the number of nodes in $\Upsilon_t$ and $N_k(n)$ is the number of nodes of degree $k$ in $\Upsilon_t$. Since both $\varphi_1$ and $\varphi_2$ satisfy the conditions in Lemma 5.2, we have

$$\sum_{k=1}^\infty h(k)P_k(n) = \frac{Z^{\varphi_1}_t}{Z^{\varphi_2}_t} \xrightarrow{P} \frac{\sum_{k=1}^\infty h(k)\,\lambda\int_0^\infty e^{-\lambda t}\,\mathbb{P}(\deg(\varnothing,\Upsilon_t)=k)\,dt}{\lambda\int_0^\infty e^{-\lambda t}\,dt}.$$

The denominator is simply $1$ and the numerator is $\sum_{k=1}^\infty h(k)p_k$ by identifying $p_k$ from (3.17).

(5.12) can be shown by the same argument with $\varphi_1(t) = \mathbb{1}_{\{t\ge 0\}}h_1(\deg(\varnothing,\Upsilon_t))$ and $\varphi_2(t) = \mathbb{1}_{\{t\ge 0\}}h_2(\deg(\varnothing,\Upsilon_t))$.
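Lemma 5.3 can be observed numerically: for a fixed slowly growing $h$, the plug-in quantity $\sum_k h(k)P_k(n)$ stabilizes as the simulated network grows. The sketch below is ours and reuses simulate_pa_tree and f from the sketch in Section 5.1; the particular $h$ is only illustrative.

    import numpy as np

    for n in (1000, 4000, 16000):
        deg_n, _ = simulate_pa_tree(n, f, seed=1)
        P = np.bincount(deg_n) / (n + 1)
        k = np.arange(P.size)
        h = np.log(1.0 + k)                  # an illustrative slowly growing h
        print(n, np.sum(h * P))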

Now we prove the uniform convergence of $\dot\iota_n$ to $\dot\iota$. Abbreviate $\partial_i f_\theta = (\partial/\partial\theta_i)f_\theta$ and $\partial_{ij} f_\theta = (\partial^2/\partial\theta_i\partial\theta_j)f_\theta$. From now on in this chapter, the Malthusian parameter of the continuous branching tree process associated with the true preferential attachment function $f_{\theta_0}$ is $\lambda_0$. Note that $C^2$ is the class of functions whose first and second derivatives both exist and are continuous. For the entire chapter, we assume that $f_\theta(k) \in C^2$ with respect to $\theta\in\Theta$ for any $k\in\mathbb{N}^+$.

We will need the following lemma in the study of the uniform convergence of the likelihood equation.

Lemma 5.4. Suppose $Z_\theta(n)\to Z_\theta$ in $L^1$ uniformly in $\theta\in\Theta$. Then the Cesàro mean of $Z_\theta(n)$ converges to $Z_\theta$ in $L^1$ uniformly as well.

Proof. By Lemma 4.7(ii), $Z_\theta(n)\to Z_\theta$ pointwise for $\theta\in\Theta$.

Fix any sufficiently small $\varepsilon > 0$. Because of the uniform convergence of $Z_\theta(n)\to Z_\theta$ in $L^1$, we can find $m$ such that for any $n>m$,

$$\mathbb{E}\Big[\sup_{\theta\in\Theta}|Z_\theta(n) - Z_\theta|\Big] < \varepsilon.$$

Write $\bar Z_\theta(n) = \frac{1}{n}\sum_{i=1}^n Z_\theta(i)$ for the Cesàro mean. By the trivial split $\bar Z_\theta(n) - Z_\theta = \sum_{i=1}^m (Z_\theta(i)-Z_\theta)/n + \sum_{i=m+1}^n (Z_\theta(i)-Z_\theta)/n$, we bound

$$\mathbb{E}\Big[\sup_{\theta\in\Theta}|\bar Z_\theta(n) - Z_\theta|\Big] \le \underbrace{\frac{1}{n}\,\mathbb{E}\Big[\sup_{\theta\in\Theta}\Big|\sum_{i=1}^m (Z_\theta(i)-Z_\theta)\Big|\Big]}_{\to\,0 \text{ as } n\to\infty \text{ for fixed } m} + \underbrace{\frac{1}{n}\sum_{i=m+1}^n \mathbb{E}\Big[\sup_{\theta\in\Theta}|Z_\theta(i)-Z_\theta|\Big]}_{\le\,\varepsilon \text{ by the choice of } m}.$$
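A one-line numerical illustration of Lemma 5.4 (ours, with an arbitrary toy sequence): when $Z(n)\to Z$, the Cesàro means inherit the convergence, if a bit more slowly.

    import numpy as np

    Z = 2.0
    n = np.arange(1, 201)
    Zn = Z + (-1.0) ** n / n                 # a sequence converging to Z
    cesaro = np.cumsum(Zn) / n               # Cesaro means of Z(1), ..., Z(n)
    print(abs(Zn[-1] - Z), abs(cesaro[-1] - Z))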

Proposition 5.5. Consider the model stated above with the preferential attachment function from $\{f_\theta \mid \theta\in\Theta\}$, where $\Theta\subset\mathbb{R}^d$ is a compact set, $f_\theta(k)\in C^2$ with respect to $\theta$ for any $k$, $f_\theta(k)$ is non-decreasing in $k$ for any $\theta\in\Theta$, and $\inf_{\theta\in\Theta}\min_{k\in\mathbb{N}^+} f_\theta(k) = \inf_{\theta\in\Theta} f_\theta(1) \ge \delta > 0$. For some $C>0$ and $\gamma>0$ uniform over all $\theta\in\Theta$, assume that

$$\max\Big(\max_{1\le i\le d}|\partial_i f_\theta(k)|,\; f_\theta(k)\Big) \le Ck\log^\gamma(k), \qquad (5.13)$$

$$\max_{1\le i\le d}\big|(\partial_i f_\theta/f_\theta)(k)\big| \le C\log^\gamma(k). \qquad (5.14)$$

Then

$$\sup_{\theta\in\Theta}\big|\dot\iota_n(f_\theta) - \dot\iota(f_\theta)\big| \xrightarrow{P} 0. \qquad (5.15)$$

Proof. First we establish the convergence results by applying Lemma 5.3 with proper $h$. For any $1\le i\le d$, as $n\to\infty$,

$$\sum_{k=1}^\infty |\partial_i f_\theta(k)|\,P_k(n) \xrightarrow{P} \sum_{k=1}^\infty |\partial_i f_\theta(k)|\,p^{(0)}_k, \qquad (5.16)$$

$$\sum_{k=1}^\infty f_\theta(k)\,P_k(n) \xrightarrow{P} \sum_{k=1}^\infty f_\theta(k)\,p^{(0)}_k, \qquad (5.17)$$

$$\sum_{k=1}^\infty \Big|\frac{\partial_i f_\theta}{f_\theta}(k)\Big|\,P_{>k}(n) \xrightarrow{P} \sum_{k=1}^\infty \Big|\frac{\partial_i f_\theta}{f_\theta}(k)\Big|\,p^{(0)}_{>k}, \qquad (5.18)$$

$$\sum_{k=1}^\infty k\log^\gamma(k)\,P_k(n) \xrightarrow{P} \sum_{k=1}^\infty k\log^\gamma(k)\,p^{(0)}_k. \qquad (5.19)$$

By (5.13), the convergences (5.16) and (5.17) hold by applying Lemma 5.3 with $h(k) = |\partial_i f_\theta(k)|$ and $h(k) = f_\theta(k)$, respectively. (5.19) holds with $h(k) = k\log^\gamma(k)$, and (5.18) holds with

$$h(k) = \sum_{j=1}^{k-1}\Big|\frac{\partial_i f_\theta}{f_\theta}(j)\Big| \le C\sum_{j=1}^{k-1}\log^\gamma(j) \le Ck\log^\gamma(k).$$

We sort out some continuity issues that are necessary to proceed with the proof. Because $f_\theta(k)$ is bounded away from zero uniformly in $\theta\in\Theta$ and both $f_\theta(k)$ and $\dot f_\theta(k)$ are continuous with respect to $\theta$ for any $k\in\mathbb{N}^+$, $(\dot f_\theta/f_\theta)(k)$ is continuous with respect to $\theta$ for any $k\in\mathbb{N}^+$. Since $P_k(n) = 0$ for any $k>n$ and $f_\theta\in C^2$ with respect to $\theta$, the sums $\sum_{k=1}^\infty \dot f_\theta(k)P_k(n)$, $\sum_{k=1}^\infty f_\theta(k)P_k(n)$ and $\sum_{k=1}^\infty (\partial_i f_\theta/f_\theta)(k)P_{>k}(n)$ are just finite sums and hence continuous with respect to $\theta$ for any $n$. By the dominated convergence theorem, using the assumptions (5.14) and (5.13) and the finiteness of the right hand side of (5.19), $\sum_{k=1}^\infty \dot f_\theta(k)p^{(0)}_k$, $\sum_{k=1}^\infty f_\theta(k)p^{(0)}_k$ and $\sum_{k=1}^\infty (\partial_i f_\theta/f_\theta)(k)p^{(0)}_{>k}$ are all continuous with respect to $\theta$. Therefore both $\dot\iota_n(f_\theta)$ and $\dot\iota(f_\theta)$ are continuous with respect to $\theta$.

We will need the following fact (this is in fact an application of Scheffé's lemma, but we prove it here with the dominated convergence theorem):

$$\sum_{k=1}^\infty k\log^\gamma(k)\,\big|P_k(n) - p^{(0)}_k\big| = \underbrace{\sum_{k=1}^\infty k\log^\gamma(k)\,\big(P_k(n) - p^{(0)}_k\big)}_{\to\,0 \text{ in probability by } (5.19)} + 2\underbrace{\sum_{k=1}^\infty k\log^\gamma(k)\,\max\big(p^{(0)}_k - P_k(n),\,0\big)}_{\to\,0 \text{ in probability by the dct}}, \qquad (5.20)$$

where we apply the dominated convergence theorem (dct) because $\max(p^{(0)}_k - P_k(n),0) \le p^{(0)}_k$ and $\sum_{k=1}^\infty k\log^\gamma(k)\,p^{(0)}_k < \infty$, as in the calculation of (5.9).

The difference between $\dot\iota_n(f_\theta)$ and the limiting criterion $\dot\iota(f_\theta)$ is bounded by

$$\big|\dot\iota_n(f_\theta) - \dot\iota(f_\theta)\big| \le \Big|\sum_{k=1}^\infty \frac{\dot f_\theta}{f_\theta}(k)\,\big(P_{>k}(n) - p^{(0)}_{>k}\big)\Big| + \Big|\frac{1}{n}\sum_{j=1}^{n-1}\frac{S_{\dot f_\theta}(j)/(j+1)}{S_{f_\theta}(j)/(j+1)} - \frac{\sum_{i=1}^\infty p^{(0)}_i\dot f_\theta(i)}{\sum_{i=1}^\infty p^{(0)}_i f_\theta(i)}\Big|.$$

We call the first term of the previous display $A$ and the second $B$, and we deal with them separately. By (5.18), $A\to 0$ as $n\to\infty$. Keeping (5.20) in mind, we bound

$$\Big|\sum_{k=1}^\infty \frac{\partial_i f_\theta}{f_\theta}(k)\,\big(P_{>k}(n) - p^{(0)}_{>k}\big)\Big| \le C\sum_{k=1}^\infty k\log^\gamma(k)\,\big|P_k(n) - p^{(0)}_k\big| \to 0 \qquad (5.21)$$

as $n\to\infty$. Since the above bound does not depend on $\theta$, the convergence of $A$ to $0$ is uniform in $\theta\in\Theta$.

By (5.16) and (5.17), the convergences $S_{\dot f_\theta}(j)/(j+1) \to \sum_{k=1}^\infty p^{(0)}_k\dot f_\theta(k)$ and $S_{f_\theta}(j)/(j+1) \to \sum_{k=1}^\infty p^{(0)}_k f_\theta(k)$ both hold in probability. By similar reasoning as that in (5.21), both convergences are uniform in $\theta\in\Theta$. Since $\sum_{k=1}^\infty f_\theta(k)p^{(0)}_k > 0$ for any $\theta\in\Theta$, the map $\theta\mapsto\sum_{k=1}^\infty f_\theta(k)p^{(0)}_k$ is continuous and $\Theta$ is compact, the quantity $\sum_{k=1}^\infty f_\theta(k)p^{(0)}_k$ is uniformly bounded away from $0$. Therefore, uniformly in $\theta\in\Theta$, as $j\to\infty$,

$$\frac{S_{\dot f_\theta}(j)}{S_{f_\theta}(j)} \xrightarrow{P} \frac{\sum_{i=1}^\infty p^{(0)}_i\dot f_\theta(i)}{\sum_{i=1}^\infty p^{(0)}_i f_\theta(i)}.$$

Now we show that the convergence also holds in the $L^1$ sense. Since $P_k(n)\to p^{(0)}_k$ holds almost surely for every $k$ as $n\to\infty$ by Proposition 3.4 and $P_k(n)\le 1$, $P_k(n)\to p^{(0)}_k$ in $L^1$ as well. Consider the real sequence $\sum_{k=1}^K k\log^\gamma(k)\,\mathbb{E}[P_k(n)]$, which is monotone in $K$ and converges to $\sum_{k=1}^K k\log^\gamma(k)\,p^{(0)}_k$ for any fixed $K$. Note that, by Fubini's theorem, $1 = \mathbb{E}[\sum_{k=1}^\infty P_k(n)] = \sum_{k=1}^\infty\mathbb{E}[P_k(n)]$, so $(\mathbb{E}[P_k(n)])_{k=1}^\infty$ is a probability distribution. Therefore $\sum_{k=1}^\infty k\log^\gamma(k)\,\mathbb{E}[P_k(n)] \to \sum_{k=1}^\infty k\log^\gamma(k)\,p^{(0)}_k$. By an argument similar to that in (5.20), as $n\to\infty$,

$$\sum_{k=1}^\infty k\log^\gamma(k)\,\big|\mathbb{E}[P_k(n)] - p^{(0)}_k\big| \to 0.$$

Then by Fubini's theorem and (5.13), as $n\to\infty$,

$$\mathbb{E}\Big[\Big|\sum_{k=1}^\infty \partial_i f_\theta(k)\big(P_k(n)-p^{(0)}_k\big)\Big|\Big] \le \sum_{k=1}^\infty |\partial_i f_\theta(k)|\,\mathbb{E}\big|P_k(n)-p^{(0)}_k\big| \le C\sum_{k=1}^\infty k\log^\gamma(k)\,\mathbb{E}\big|P_k(n)-p^{(0)}_k\big| \to 0.$$

Since $S_{f_\theta}(j)/(j+1) > \delta$, the random variables $S_{\dot f_\theta}(j)/S_{f_\theta}(j)$ are (uniformly in $j$) bounded above by $\delta^{-1}S_{\dot f_\theta}(j)/(j+1)$. Because $S_{\dot f_\theta}(j)/(j+1)$ is uniformly integrable thanks to its $L^1$-convergence, the variables $S_{\dot f_\theta}(j)/S_{f_\theta}(j)$ are thus uniformly integrable. Therefore, uniformly in $\theta\in\Theta$, as $j\to\infty$,

$$\frac{S_{\dot f_\theta}(j)}{S_{f_\theta}(j)} \xrightarrow{L^1} \frac{\sum_{i=1}^\infty p^{(0)}_i\dot f_\theta(i)}{\sum_{i=1}^\infty p^{(0)}_i f_\theta(i)}.$$

We apply Lemma 5.4 and obtain that, uniformly in $\theta\in\Theta$,

$$\frac{1}{n+1}\sum_{j=1}^n\frac{S_{\dot f_\theta}(j)}{S_{f_\theta}(j)} \xrightarrow{L^1} \frac{\sum_{i=1}^\infty p^{(0)}_i\dot f_\theta(i)}{\sum_{i=1}^\infty p^{(0)}_i f_\theta(i)}.$$

This disposes of $B$, and thus the uniform convergence of $\dot\iota_n$ to $\dot\iota$ holds.

To establish the consistency of the mle, we also need the (local) uniqueness of the root of the equation system (5.2). This is in general difficult: the author spent a lot of time on proving the global uniqueness of the mle in this case, but to no avail. However, the formulation in (5.22) and the three technical Lemmas 5.6, 5.7 and 5.8 were developed as an attempt to establish a general theory on the global uniqueness of the mle, and they turn out to be useful in proving the local uniqueness of the mle for $f_{\alpha,\beta}(k) = (k+\alpha)^\beta$. In the study of the mle for the affine preferential attachment model in Section 4.3, a monotonicity argument is deployed to prove the uniqueness of the solution to the likelihood equation (4.13). A similar argument still works, in a much less visible fashion, for certain types of parametric families. We need the following two simple lemmas.

Lemma 5.6. Suppose $v_k$ is strictly monotone decreasing with respect to $k$. For two probability distributions $(p_k)_{k=1}^\infty$ and $(q_k)_{k=1}^\infty$ with mass on $\mathbb{N}^+$ satisfying $p_k\le q_k$ for $k\le K$ and $p_k>q_k$ for $k>K$,

$$\sum_{k=1}^\infty p_kv_k > \sum_{k=1}^\infty q_kv_k.$$

In case $v_k$ is strictly monotone increasing, the previous inequality changes direction ($>$ becomes $<$).

Proof. We use that $\sum_{k=1}^\infty p_k = 1 = \sum_{k=1}^\infty q_k$ and obtain $\sum_{k=1}^K(q_k-p_k) = \sum_{k=K+1}^\infty(p_k-q_k)$. By the strict monotonicity of $v_k$ we have

$$\sum_{k=1}^K (q_k-p_k)v_k < \sum_{k=K+1}^\infty (p_k-q_k)v_k.$$

Rearranging the terms gives the desired result.

Lemma 5.7. Suppose $w_k$ is a strictly monotone increasing sequence and $(p_k)_{k=1}^\infty$ a probability distribution with $\sum_{k=1}^\infty p_kw_k < \infty$. Then the reweighed probability distribution $q_k = p_kw_k/(\sum_{k=1}^\infty p_kw_k)$ satisfies the following: there exists a $K$ such that $p_k\ge q_k$ for $k\le K$ and $p_k<q_k$ for $k>K$. If $w_k$ is strictly monotone decreasing, then there exists a $K$ such that $p_k\le q_k$ for $k\le K$ and $p_k>q_k$ for $k>K$.

Proof. It is impossible that $p_k>q_k$ for every $k\in\mathbb{N}^+$, since otherwise they could not both sum up to $1$. Suppose $p_j\le q_j$; then

$$\frac{q_{j+1}}{p_{j+1}} = \frac{w_{j+1}}{\sum_{k=1}^\infty p_kw_k} > \frac{w_j}{\sum_{k=1}^\infty p_kw_k} = \frac{q_j}{p_j} \ge 1.$$

By mathematical induction, $p_k<q_k$ for any $k>j$. Find the smallest $J$ such that $p_J<q_J$ and set $K = J-1$; then $p_k<q_k$ for any $k>K$. It is also impossible that $J = 1$.

In case $w_k$ is monotone decreasing (so that $1/w_k$ is monotone increasing), compare $q$ and $p$, where $p$ is the reweighed version of $q$ with weight $1/w_k$, and apply the above result. Therefore the lemma holds.
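The single-crossing behaviour in Lemma 5.7 is easy to verify numerically; the toy distribution and weights below are ours.

    import numpy as np

    p = np.array([0.4, 0.25, 0.15, 0.1, 0.06, 0.04])   # toy distribution on k = 1..6
    w = np.arange(1, 7) ** 0.5                          # strictly increasing weights
    q = p * w / np.sum(p * w)                           # reweighed distribution
    print(np.sign(p - q))   # a run of +1s followed by a run of -1s: one crossing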

With the above two lemmas, we introduce another lemma to study the behavior of $\dot\iota(f_\theta)$. If the parametric family $f_\theta$ with $\theta\in\Theta_0\subset\Theta$ satisfies that $f_\theta(k)/f_{\theta_0}(k)$ is either monotone increasing (or decreasing) in $k$ and is not a constant, for any $\theta\in\Theta_0$, we say that the change of $\theta\in\Theta_0$ away from $\theta_0$ induces monotone increase (decrease) on $f_\theta(k)/f_{\theta_0}(k)$.

Lemma 5.8. Suppose the change of $\theta\in\Theta_0\subset\Theta$ away from $\theta_0$ induces monotonicity on $f_\theta(k)/f_{\theta_0}(k)$. Then for any $\theta\in\Theta_0$, $\dot\iota(f_\theta)\ne 0$.

Proof. Recall that the mle $\hat\theta_n$ is a root of (5.2) and that $q_k = f(k)p_k/\lambda$. A more illustrative view of (5.2) is as follows:

$$\dot\iota(f_\theta) = \sum_{k=1}^\infty \frac{\dot f_\theta}{f_\theta}(k)\,q^{(0)}_k - \sum_{k=1}^\infty \frac{p^{(0)}_k f_{\theta_0}(k)/\lambda_0\cdot f_\theta(k)/f_{\theta_0}(k)}{\sum_{i=1}^\infty p^{(0)}_i f_\theta(i)/\lambda_0}\,\frac{\dot f_\theta}{f_\theta}(k)
= \sum_{k=1}^\infty \frac{\dot f_\theta}{f_\theta}(k)\,q^{(0)}_k - \sum_{k=1}^\infty \frac{q^{(0)}_k\cdot f_\theta(k)/f_{\theta_0}(k)}{\sum_{i=1}^\infty q^{(0)}_i\cdot f_\theta(i)/f_{\theta_0}(i)}\,\frac{\dot f_\theta}{f_\theta}(k)
= \sum_{k=1}^\infty q^{(0)}_k\,\frac{\dot f_\theta}{f_\theta}(k) - \sum_{k=1}^\infty q^{(0,\theta)}_k\,\frac{\dot f_\theta}{f_\theta}(k)
= \mathbb{E}_{q^{(0)}}\Big[\frac{\dot f_\theta}{f_\theta}(K)\Big] - \mathbb{E}_{q^{(0,\theta)}}\Big[\frac{\dot f_\theta}{f_\theta}(K)\Big], \qquad (5.22)$$

where $q^{(0,\theta)}_k \propto q^{(0)}_k f_\theta(k)/f_{\theta_0}(k)$ is the probability distribution generated by reweighing $(q^{(0)}_k)_{k=1}^\infty$ with $(f_\theta(k)/f_{\theta_0}(k))_{k=1}^\infty$, and $K$ follows the distribution $(q^{(0)}_k)_{k=1}^\infty$ and $(q^{(0,\theta)}_k)_{k=1}^\infty$, respectively.

Fix any $\theta\in\Theta_0$ and assume without loss of generality that $\theta$ induces monotone decrease on $f_\theta/f_{\theta_0}$ in $k$. Apply Lemma 5.7 with weight $(f_\theta/f_{\theta_0})(k)$: there exists a $K$ such that $q^{(0)}_k\le q^{(0,\theta)}_k$ for $k\le K$ and $q^{(0)}_k>q^{(0,\theta)}_k$ for $k>K$. Then apply Lemma 5.6 with $v_k = (\dot f_\theta/f_\theta)(k)$ (understood component-wise) and the two probability distributions $q^{(0)}_k$ and $q^{(0,\theta)}_k$, and obtain that $\partial_i\iota(f_\theta) < 0$ in case $(\partial_if_\theta/f_\theta)(k)$ is decreasing in $k$, and the other way round for increasing $(\partial_if_\theta/f_\theta)(k)$.

The local uniqueness of the mle for $f_{\alpha,\beta}(k) = (k+\alpha)^\beta$. In the case of $f_{\alpha,\beta}(k) = (k+\alpha)^\beta$, which is our primary candidate for specifying models, assuming the true parameter is $(\alpha_0,\beta_0)$, moving the parameter away does not automatically induce monotonicity on $f_{\alpha,\beta}/f_{\alpha_0,\beta_0}$. Analysing the function $g(x) = (f_{\alpha,\beta}/f_{\alpha_0,\beta_0})(x) = (x+\alpha)^\beta/(x+\alpha_0)^{\beta_0}$ with its derivative $g'(x)$ yields that $(f_{\alpha,\beta}/f_{\alpha_0,\beta_0})(k)$ is monotone increasing in $k$ on $\{(\alpha,\beta)\mid \beta-\beta_0+\beta\alpha_0-\beta_0\alpha\ge 0;\ \beta\ge\beta_0\}$ and monotone decreasing on the set $\{(\alpha,\beta)\mid \beta-\beta_0+\beta\alpha_0-\beta_0\alpha\le 0;\ \beta\le\beta_0\}$. It is not possible to prove that $(\alpha_0,\beta_0)$ is the globally unique root of $\dot\iota(f_{\alpha,\beta})$ with the above technique (by comparing the two terms in (5.22)). It would be cleaner if we could show the global uniqueness of the mle, but this is in general difficult. For our purpose, we only need the local uniqueness of the mle as a root of (5.2).

Now we prove that the mle, as a root of (5.2), is locally unique. This can be achieved by studying the Hessian matrix of $\iota(f_{\alpha,\beta})$ evaluated at $(\alpha_0,\beta_0)$. Abbreviate $f_0 := f_{\alpha_0,\beta_0}$ and calculate the Hessian matrix as follows:

$$\ddot\iota(f)_{\alpha_0,\beta_0} = -\frac{1}{a^2}\begin{pmatrix} ab-c^2 & ad-ce \\ ad-ce & af-e^2\end{pmatrix}, \qquad (5.23)$$

where the quantities are defined as

$$a = \sum_{k=1}^\infty p^{(0)}_k f_0(k), \qquad
b = \sum_{k=1}^\infty p^{(0)}_k f_0(k)\,\frac{\beta_0^2}{(k+\alpha_0)^2}, \qquad
c = \sum_{k=1}^\infty p^{(0)}_k f_0(k)\,\frac{\beta_0}{k+\alpha_0},$$

$$d = \sum_{k=1}^\infty p^{(0)}_k f_0(k)\,\frac{\beta_0}{k+\alpha_0}\log(k+\alpha_0), \qquad
e = \sum_{k=1}^\infty p^{(0)}_k f_0(k)\log(k+\alpha_0), \qquad
f = \sum_{k=1}^\infty p^{(0)}_k f_0(k)\log^2(k+\alpha_0).$$

It is immediate that $ab-c^2$ is strictly positive after applying the Cauchy-Schwarz or Jensen inequality:

$$\mathbb{E}_{p_0}\Big[\sqrt{f_0(K)}\,\frac{\beta_0}{K+\alpha_0}\cdot\sqrt{f_0(K)}\Big] < \sqrt{\mathbb{E}_{p_0}[f_0(K)]}\cdot\sqrt{\mathbb{E}_{p_0}\Big[f_0(K)\,\frac{\beta_0^2}{(K+\alpha_0)^2}\Big]},$$

where $K$ follows the law of $(p^{(0)}_k)_{k=1}^\infty$. The same argument works for proving $af-e^2>0$ and $bf-d^2>0$. It is more tricky to handle the determinant of the Hessian matrix, which equals

$$\big|\ddot\iota(f)_{\alpha_0,\beta_0}\big| = \frac{1}{a^4}\big(a^2bf + c^2e^2 - abe^2 - afc^2 - a^2d^2 - c^2e^2 + 2acde\big) = \frac{1}{a^4}\big(a^2(bf-d^2) - ae(be-cd) - ac(cf-ed)\big) > 0.$$

The last inequality holds because we can show that both $be-cd<0$ and $cf-ed<0$. We only prove $be<cd$ (the proof of $cf<ed$ is similar). Define $x_k = p^{(0)}_k f_0(k)\beta_0/(k+\alpha_0)$, $u_k = (k+\alpha_0)\log(k+\alpha_0)/\beta_0$ and $y_k = x_ku_k = p^{(0)}_k f_0(k)\log(k+\alpha_0)$. Define $p_x(k) = x_k/\sum_j x_j$ and $p_y(k) = y_k/\sum_j y_j$. Then, noting that $u_k$ is strictly monotone increasing, an application of Lemma 5.7 tells us that there exists a $K$ such that $p_x(k)\ge p_y(k)$ for $k\le K$ and $p_x(k)<p_y(k)$ for $k>K$. Applying Lemma 5.6 with $w_k = \beta_0/(k+\alpha_0)$ yields $be<cd$. Therefore the Hessian matrix is negative definite and $(\alpha_0,\beta_0)$ is a unique root of $\dot\iota(f_{\alpha,\beta})$ in a neighborhood around $(\alpha_0,\beta_0)$ with non-trivial diameter.
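The negative definiteness in (5.23) can also be checked numerically by plugging an empirical degree distribution into the definitions of $a,\dots,f$. The sketch below is ours; it reuses P_k from the simulation in Section 5.1 as a stand-in for $(p^{(0)}_k)$, and the parameter values are illustrative.

    import numpy as np

    alpha0, beta0 = 0.5, 0.7
    k = np.arange(1, P_k.size)                # degrees 1, 2, ...
    p0 = P_k[1:]                              # empirical stand-in for p_k^(0)
    f0 = (k + alpha0) ** beta0
    a = np.sum(p0 * f0)
    b = np.sum(p0 * f0 * beta0 ** 2 / (k + alpha0) ** 2)
    c = np.sum(p0 * f0 * beta0 / (k + alpha0))
    d = np.sum(p0 * f0 * beta0 / (k + alpha0) * np.log(k + alpha0))
    e = np.sum(p0 * f0 * np.log(k + alpha0))
    f_ = np.sum(p0 * f0 * np.log(k + alpha0) ** 2)
    H = -np.array([[a * b - c ** 2, a * d - c * e],
                   [a * d - c * e, a * f_ - e ** 2]]) / a ** 2
    print(H[0, 0] < 0, np.linalg.det(H) > 0)  # negative-definite 2x2 check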

Global concavity of $\iota(f_\theta)$ in case $f_\beta(k) = (k+\alpha_0)^\beta$ with known $\alpha_0$. In the special case of $f_\beta(k) = (k+\alpha_0)^\beta$ with $\beta$ the only parameter, (5.2) reduces to

$$\dot\iota(\beta) = \sum_{k=1}^\infty \log(k+\alpha_0)\,p^{(0)}_{>k} - \frac{\sum_{k=1}^\infty p^{(0)}_k(k+\alpha_0)^\beta\log(k+\alpha_0)}{\sum_{k=1}^\infty p^{(0)}_k(k+\alpha_0)^\beta}.$$

Calculating the second order derivative gives

$$\ddot\iota(\beta) = -\Bigg(\sum_{k=1}^\infty p^{(0)}_k(k+\alpha_0)^\beta\log^2(k+\alpha_0)\,\sum_{k=1}^\infty p^{(0)}_k(k+\alpha_0)^\beta - \Big(\sum_{k=1}^\infty p^{(0)}_k(k+\alpha_0)^\beta\log(k+\alpha_0)\Big)^2\Bigg)\Big/\Big(\sum_{k=1}^\infty p^{(0)}_k(k+\alpha_0)^\beta\Big)^2 < 0.$$

The last inequality holds for any $\beta\in[0,1]$ because of the Cauchy-Schwarz inequality (or Jensen inequality)

$$\Big(\mathbb{E}_{p_0}\big[(K+\alpha_0)^{\beta/2}\cdot(K+\alpha_0)^{\beta/2}\log(K+\alpha_0)\big]\Big)^2 < \mathbb{E}_{p_0}\big[(K+\alpha_0)^\beta\log^2(K+\alpha_0)\big]\;\mathbb{E}_{p_0}\big[(K+\alpha_0)^\beta\big],$$

where $K$ follows the law of $(p^{(0)}_k)_{k=1}^\infty$. In this parametric case, the maximizer of the limiting likelihood function is unique, and so is the root of its derivative (5.2). The other perspective is that moving the parameter $\beta$ away from $\beta_0$ induces monotonicity on $(f_\beta/f_{\beta_0})(k) = (k+\alpha_0)^{\beta-\beta_0}$, so that $\dot\iota(\beta)\ne 0$ for any $\beta\in[0,1]$ with $\beta\ne\beta_0$.
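With $\alpha_0$ known, the empirical criterion $\iota_n(\beta)$ is itself concave in $\beta$ (the first sum in $\iota_n$ is linear in $\beta$ and each term $-\log S_{f_\beta}(t-1)$ is concave in $\beta$), so a simple one-dimensional search recovers the unique maximizer. The sketch below is ours and reuses loglik_and_score and D from the earlier sketches.

    import numpy as np

    alpha0 = 0.5
    betas = np.linspace(0.01, 1.0, 100)
    vals = [loglik_and_score((alpha0, b), D, 5000)[0] for b in betas]
    print("profile mle for beta:", betas[int(np.argmax(vals))])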

Almost-global uniqueness in case $f_\alpha(k) = (k+\alpha)^{\beta_0}$ with known $\beta_0$.
