
Node-Specific Diffusion LMS-Based Distributed Detection Over Adaptive Networks

Sara Al-Sayed, Member, IEEE, Jorge Plata-Chaves, Member, IEEE, Michael Muma, Member, IEEE, Marc Moonen, Fellow, IEEE, and Abdelhak M. Zoubir, Fellow, IEEE

Abstract—Diffusion adaptation techniques have shown great promise in addressing the problem of node-specific distributed estimation where the nodes in the network are interested in different, possibly overlapping, sets of parameters. In this work, node-specific distributed detection, which has remained largely unexamined, is considered. In particular, the problem is formulated as one of binary hypothesis testing at each node for each of its parameters of interest. A distributed, online solution for this problem is sought based on diffusion adaptation techniques. In this setting, a signal to be detected by one node constitutes interference that may compromise the ability of the other nodes to detect their signals of interest reliably. Under mild assumptions on the data and network, it is shown that, for sufficiently small adaptation step-sizes, interference can be kept in check. Local detectors are developed where the test-statistics and thresholds adapt to changing conditions in real time. The distributed algorithm is analyzed; and its detection performance characterized and illustrated through numerical simulations.

Index Terms—Distributed detection, diffusion adaptation, adaptive networks, collaborative processing, multi-task learning, mean-square performance analysis.

I. INTRODUCTION

HETEROGENEITY is a staple of modern-day networks, where geographically dispersed nodes have different tasks; computational and communication resources; modes of operation; degrees of readiness for cooperation; privacy concerns; and environmental conditions under which data transmission takes place. For example, in localization and tracking applications, different nodes are interested in localizing and tracking different moving targets; in spectrum sensing, different cognitive radios are interested in detecting and tracking different spectrum holes—more examples can be found in [1], [2].

Typically, these networks operate in harsh wireless environments with scarce computational and communication resources and fluctuating statistical conditions. In order to address these challenges, the nodes engage in collaborative adaptation and learning, where each node exchanges information with its neighbors and updates its estimates online in response to streaming data [3], [4]. Two popular modes of cooperation for parameter estimation are investigated in the literature: consensus strategies [5]–[8] and diffusion strategies [4], [9]–[12]. It was shown in [13] that diffusion strategies enjoy enhanced stability and performance relative to consensus strategies, achieving centralized performance in the slow adaptation regime [4], [10], [14], [15]. Building upon parameter estimation is detection, investigated in both the consensus [16]–[20] and diffusion [21]–[25] contexts.

[This work was done in the frame of the project HANDiCAMS, which acknowledges the financial support of the Future and Emerging Technologies (FET) programme within the Seventh Framework Programme for Research of the European Commission, under FET-Open grant number 323944. The work of S. Al-Sayed and M. Muma was also supported by the LOEWE initiative (Hessen, Germany) within the NICER project and by the German Research Foundation (DFG).]

[S. Al-Sayed is with the Bioinspired Communication Systems Group; M. Muma and A. M. Zoubir are with the Signal Processing Group. Both groups are at Institut für Nachrichtentechnik, Technische Universität Darmstadt, 64283 Darmstadt, Germany (e-mails: sara.al-sayed@bcs.tu-darmstadt.de, {mmuma,zoubir}@spg.tu-darmstadt.de).]

[J. Plata-Chaves and M. Moonen are with the Department of Electrical Engineering (ESAT-STADIUS), KU Leuven, B-3001 Leuven, Belgium (e-mails: {jorge.plata-chaves,marc.moonen}@esat.kuleuven.be).]

In distributed node-specific adaptation and learning, the nodes aim to learn different, possibly overlapping, sets of parameters characterizing different phenomena of interest. This problem, sometimes referred to as multi-tasking, is the focus of several recent works [26]–[36]. While the works [31]–[36] address the problem of node-specific distributed estimation in a diffusion setting, the problem of node-specific distributed detection has remained largely unexamined. Hence, we would like to fill this gap by extending the framework of [31] to solve a detection task that is cast as a binary hypothesis testing problem at each node for each of its parameters of interest. That is, we seek a distributed, online strategy such that each node can detect the presence or absence of its parameters of interest. We choose the framework of [31] because, unlike its counterparts in [33], [35], [36], it can be flexibly generalized to accommodate individual, local, and global parameters of interest with potential overlap; and, under fairly general assumptions, asymptotic unbiasedness can be ensured. The developed distributed detection algorithm is further analyzed based on the energy-conservation framework [3], [4], [12] and its performance illustrated through numerical simulations.

The article is organized as follows. In Sec. II, the data model is introduced and the problem is formulated. In Sec. III, we review the node-specific diffusion least-mean-squares (LMS)–based distributed estimation algorithm and scrutinize the asymptotic behavior of the weight estimates. In Sec. IV, the basic form of the node-specific diffusion LMS-based distributed detection algorithm as motivated by asymptotic analysis is described. In Sec. V, we conduct estimation and detection performance analyses, motivating fully distributed online detectors. Simulation results are presented in Sec. VI.

Notation: Lowercase letters are reserved for scalars and vectors, uppercase for matrices; boldface font is reserved for random variables, and normal font for deterministic variables.

The time index appears in parentheses for scalars, and in the subscript for vectors and matrices. All vectors are column vectors unless mentioned otherwise. Transposition, inversion, and the trace and gradient operators are denoted by $(\cdot)^\top$, $(\cdot)^{-1}$, $\mathrm{Tr}(\cdot)$, and $\nabla_x$, respectively; and the Euclidean norm is denoted by $\|\cdot\|$. Expectation is denoted by $\mathbb{E}$. Set cardinality is denoted by $|\cdot|$. The notation $\mathbb{1}$ and $I$ represents the all-one vector and identity matrix of appropriate sizes, respectively (if not clear from the context, the size is indicated in the subscript). The operator $\mathrm{col}\{\cdot\}$ stacks its arguments vertically; and the operator $\mathrm{diag}\{\cdot\}$ forms a diagonal matrix from its arguments.

II. DATA MODEL AND PROBLEM FORMULATION

A. Data Model

We reappropriate the model in [31] for the problem formulation in this work. We consider a network of $N$ nodes whose topology is described by a graph. The nodes are indexed by $k \in \mathcal{V} \triangleq \{1, \dots, N\}$. The neighborhood of node $k$, denoted as $\mathcal{V}_k$, is the set of all nodes, including $k$, that node $k$ can exchange information with. Information exchange among neighbors is governed by a combination policy, which will be elaborated upon later. There are $N_s$ systems, models, or signals to be detected, indexed by $c \in \mathcal{S} \triangleq \{1, \dots, N_s\}$, each parametrized by a known, real-valued weight vector $w_{s_c}$ of size $M_c \times 1$. Different nodes in the network are interested in detecting different sets of weight vectors, so the network can be visualized as involving clusters of connected nodes, $\mathcal{C}_c \subseteq \mathcal{V}$, $c \in \mathcal{S}$, that are generally not disjoint, such that nodes $k \in \mathcal{C}_c$ are interested in the $c$th weight vector $w_{s_c}$. The set of interests of each node $k$ is nonempty and denoted by $\mathcal{I}_k \triangleq \{c \mid k \in \mathcal{C}_c,\ c \in \mathcal{S}\}$. It is assumed that each node $k$ knows all nodes $\ell \in \mathcal{V}_k \cap \mathcal{C}_c$ that share its interest in the $c$th weight vector, which is known, for all $c \in \mathcal{I}_k$. Unlike the formulation in [31], the specification of global, common, and local interests is implicit in the formulation here. The sets $\mathcal{V}$, $\mathcal{V}_k$, $\mathcal{C}_c$, $\mathcal{S}$, and $\mathcal{I}_k$, as well as their subsets, are assumed to be naturally ordered. At each time index $i \ge 0$, every node $k$ has access to real-valued scalar noisy measurements of the form:

$$d_k(i) = \sum_{c \in \mathcal{I}_k} u_{k,c,i} w^o_{s_c} + v_k(i) = \sum_{c=1}^{N_s} u_{k,c,i} w^o_{s_c} + v_k(i) \qquad (1)$$

in terms of the acting or true values of the weight vectors, $\{w^o_{s_c}\}$, and where

$$u_{k,c,i} \neq 0 \ \text{if}\ k \in \mathcal{C}_c, \qquad u_{k,c,i} = 0 \ \text{if}\ k \notin \mathcal{C}_c. \qquad (2)$$

The row vectors $\{u_{k,c,i} \mid c \in \mathcal{I}_k\}$ are real-valued regressors of sizes $M_c$ that are known to node $k$. It is assumed that the energy of the regressors is bounded from above, i.e., there exists a real-valued positive scalar $\alpha$ such that $\sum_{c \in \mathcal{I}_k} \|u_{k,c,i}\|^2 \le \alpha$ for all $k$ and $i$. The noise $v_k(i)$ is a scalar wide-sense stationary zero-mean real Gaussian noise process, spatially independent and temporally white with finite variance $\sigma^2_{v,k} = \mathbb{E}\, v^2_k(i)$.
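For concreteness, the measurement model (1) is easy to simulate; the following is a minimal Python sketch under illustrative choices of clusters, dimensions, and noise levels (all variable names are ours, not from the paper):

import numpy as np

# Minimal sketch of model (1): d_k(i) = sum_{c in I_k} u_{k,c,i} w_sc^o + v_k(i).
# Cluster memberships, dimensions, and noise levels below are illustrative.
rng = np.random.default_rng(0)
N, Ns, Mc = 10, 2, 2                               # nodes, signals, weight length
clusters = [set(range(0, 7)), set(range(3, 10))]   # C_1, C_2 (0-indexed nodes)
w_o = [rng.standard_normal(Mc) for _ in range(Ns)] # acting weight vectors w_sc^o
sigma_v = 0.1 + 0.3 * rng.random(N)                # per-node noise std sigma_{v,k}

def measurement(k, regressors):
    """Scalar d_k(i); regressors[c] is the 1 x Mc row u_{k,c,i} (zero if k not in C_c)."""
    d = sum(regressors[c] @ w_o[c] for c in range(Ns) if k in clusters[c])
    return d + sigma_v[k] * rng.standard_normal()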

B. Binary Hypothesis Tests

Each node $k$ in the network aims to establish the absence or presence of each of its weight vectors of interest. This problem can be cast as that of binary hypothesis testing at each node for each of those weight vectors. That is, for every $c \in \mathcal{I}_k$:

$$w^o_{s_c} = 0 \ \text{under}\ \mathcal{H}^c_0, \qquad w^o_{s_c} = w_{s_c} \neq 0 \ \text{under}\ \mathcal{H}^c_1 \qquad (3)$$

where $\mathcal{H}^c_0$ and $\mathcal{H}^c_1$ denote the null and alternative hypotheses, respectively.

One approach to solving this detection problem is for each node $k$, given its data $\{d_k(i), \{u_{k,c,i}\}\}$ up to time index $i$, to calculate batch estimates for $w^o_{s_c}$, $c \in \mathcal{I}_k$, say the least-squares estimates. Local batch processing is riddled with challenges, however, e.g., memory constraints, rank deficiency, computational constraints, missing statistical knowledge, and time-varying scenarios. The approach in this work is to employ diffusion adaptation techniques.

A compact representation of model (1) will come to use later on. Let

$$w^o_s \triangleq \mathrm{col}\big\{w^o_{s_c}\big\}_{c \in \mathcal{S}} \qquad (M \times 1) \qquad (4)$$

where $M = \sum_{c=1}^{N_s} M_c$, and for each node $k$,

$$u^\top_{k,i} \triangleq \mathrm{col}\big\{u^\top_{k,c,i}\big\}_{c \in \mathcal{S}} \qquad (M \times 1). \qquad (5)$$

So the data model can be written more compactly as

$$d_k(i) = u_{k,i} w^o_s + v_k(i), \qquad i \ge 0. \qquad (6)$$

Arranging the data from all nodes $k = 1, \dots, N$ into vectors and matrices:

$$d_i \triangleq \mathrm{col}\{d_1(i), d_2(i), \dots, d_N(i)\} \qquad (N \times 1)$$
$$U_i \triangleq \mathrm{col}\{u_{1,i}, u_{2,i}, \dots, u_{N,i}\} \qquad (N \times M)$$
$$v_i \triangleq \mathrm{col}\{v_1(i), v_2(i), \dots, v_N(i)\} \qquad (N \times 1)$$
$$R_v \triangleq \mathrm{diag}\big\{\sigma^2_{v,1}, \sigma^2_{v,2}, \dots, \sigma^2_{v,N}\big\} \qquad (N \times N)$$

and then over time indices $j = i, i-1, \dots, 1, 0$:

$$d_{0:i} \triangleq \mathrm{col}\{d_i, d_{i-1}, \dots, d_0\} \qquad ((i+1)N \times 1)$$
$$U_{0:i} \triangleq \mathrm{col}\{U_i, U_{i-1}, \dots, U_0\} \qquad ((i+1)N \times M)$$
$$v_{0:i} \triangleq \mathrm{col}\{v_i, v_{i-1}, \dots, v_0\} \qquad ((i+1)N \times 1)$$
$$R_{v,0:i} \triangleq \mathrm{diag}\{R_v, \dots, R_v\} \qquad ((i+1)N \times (i+1)N)$$

leads to the following network data model:

$$d_{0:i} = U_{0:i} w^o_s + v_{0:i}. \qquad (7)$$

Additionally, we make the assumption that $U_{0:i}$, as well as $U_i$, are full-rank for every $i$, with $M \le N$. This can be guaranteed by sufficient spatial and temporal diversity among the regressors $\{u_{k,c,i}\}$, without stringent individual linear independence requirements.

III. NODE-SPECIFIC DIFFUSION LMS-BASED DISTRIBUTED ESTIMATION

In [31], a distributed stochastic-gradient type algorithm was developed to minimize the following global mean-square-error (MSE) cost:

$$J\Big(\big\{\{w_{k,c}\}_{c \in \mathcal{I}_k}\big\}_{k \in \mathcal{V}}\Big) \triangleq \sum_{k \in \mathcal{V}} \mathbb{E}\,\Big| d_k(i) - \sum_{c \in \mathcal{I}_k} u_{k,c,i} w_{k,c} \Big|^2. \qquad (8)$$

The data appearing in (8), $\{d_k(i), u_{k,i}\}$, follow the data model (1). The regressors $u_{k,c,i}$ for each node $k$ and $c \in \mathcal{I}_k$ are generally random in nature, having been generated from an underlying probability distribution; nevertheless, each node $k$ observes the realizations of its own regressors. For each node $k$ and $c \in \mathcal{I}_k$, the regressors $\{u_{k,c,i}\}$ are zero-mean, spatially independent, temporally uncorrelated, and satisfy a global observability condition: $\sum_{k=1}^{N} R_{u,k} > 0$, where $R_{u,k} = \mathbb{E}\, u^\top_{k,i} u_{k,i} \ge 0$ is the regressor covariance matrix. The noise and regressors are assumed to be statistically independent. The adapt-then-combine (ATC) version of the node-specific diffusion LMS-based distributed estimation algorithm that attempts to solve (8) is reviewed briefly here. At each time index $i$, each node $k$ has an estimate $w_{k,c,i-1}$ of the weight vector $w^o_{s_c}$ for all $c \in \mathcal{I}_k$. Each node $k$ updates its current estimates $\{w_{k,c,i-1}\}$ in an LMS fashion with step-size $\mu_k$ using its data $\{d_k(i), \{u_{k,c,i}\}\}$ to form intermediate estimates $\{\psi_{k,c,i}\}$. We introduce the $N \times N$ combination matrix $A_c \triangleq [a_{c,\ell k}]$, $\ell, k \in \mathcal{V}$, for fusing the intermediate estimates pertaining to the weight vector $w_{s_c}$, $c \in \mathcal{S}$, within neighborhoods, forming final estimates $\{w_{k,c,i}\}$. That is, node $k$ scales the intermediate estimate of node $\ell$ for the weight vector $w_{s_c}$ at each time index $i$ with $a_{c,\ell k}$ for fusion. The entries of each $A_c$ satisfy:

$$a_{c,\ell k} = 0 \ \text{if}\ k \notin \mathcal{C}_c \ \text{or}\ \ell \notin \mathcal{V}_k \cap \mathcal{C}_c; \qquad \sum_{\ell \in \mathcal{V}_k \cap \mathcal{C}_c} a_{c,\ell k} = 1 \ \text{if}\ k \in \mathcal{C}_c. \qquad (9)$$

The algorithm is listed in Table I, using the following definitions:

$$u^\top_{\mathcal{I}_k,i} \triangleq \mathrm{col}\big\{u^\top_{k,c,i}\big\}_{c \in \mathcal{I}_k} \qquad (10)$$
$$w_{\mathcal{I}_k,i} \triangleq \mathrm{col}\big\{w_{k,c,i}\big\}_{c \in \mathcal{I}_k} \qquad (11)$$
$$\psi_{\mathcal{I}_k,i} \triangleq \mathrm{col}\big\{\psi_{k,c,i}\big\}_{c \in \mathcal{I}_k} \qquad (12)$$

each of size $\sum_{c \in \mathcal{I}_k} M_c$.

In this work, the regressors $\{u_{k,c,i}\}$ are taken to be deterministic and known. In practice, the regressors may have been estimated, yet it is assumed that the estimation error is negligible—see Sec. VI-B for a practical example. The network nodes run the algorithm in Table I so that at each time index $i$, each node $k$ has an estimate $w_{k,c,i}$ of the acting or true weight vector $w^o_{s_c}$ for all $c \in \mathcal{I}_k$. In order to design hypothesis tests based on the estimates $\{w_{k,c,i}\}$, it is necessary to derive a relationship to the network data model (7). To this end, it is instructive to rewrite the algorithm in augmented form.

TABLE I
DIFFUSION ESTIMATION ALGORITHM

Initializations: $\mathcal{V}_k$, $\mathcal{I}_k$, $\mu_k$ for every node $k$; $a_{c,\ell k}$ ($\ell, k \in \mathcal{C}_c$, cluster $c$, $c \in \mathcal{S}$). Start with $w_{k,c,-1} = 0$ for every node $k$ and each cluster $c$, $c \in \mathcal{I}_k$. For every time index $i \ge 0$, repeat:

Adaptation step: for every node $k$, repeat

$$\psi_{\mathcal{I}_k,i} = w_{\mathcal{I}_k,i-1} + \mu_k u^\top_{\mathcal{I}_k,i}\big[d_k(i) - u_{\mathcal{I}_k,i} w_{\mathcal{I}_k,i-1}\big] \qquad (13)$$

($u_{\mathcal{I}_k,i}$, $w_{\mathcal{I}_k,i}$, and $\psi_{\mathcal{I}_k,i}$ are defined in (10)–(12).)

Combination step: for every node $k$ and each cluster $c$, $c \in \mathcal{I}_k$, repeat

$$w_{k,c,i} = \sum_{\ell \in \mathcal{V}_k \cap \mathcal{C}_c} a_{c,\ell k}\, \psi_{\ell,c,i} \qquad (14)$$
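To make the listing concrete, one ATC iteration of Table I can be sketched in Python as follows; this is a minimal illustration under the notation above (data structures and names are ours), not the authors' reference implementation:

import numpy as np

def atc_diffusion_step(w_prev, d, u, mu, A, clusters, interests, neighbors):
    """One iteration of Table I over all nodes.
    w_prev[k][c]: node k's previous estimate of w_sc, for c in interests[k];
    d[k]: scalar measurement; u[k][c]: 1 x Mc regressor; mu[k]: step-size;
    A[c][(l, k)]: combination weight a_{c,lk}."""
    N = len(w_prev)
    # Adaptation step (13): LMS update driven by the node's own data.
    psi = []
    for k in range(N):
        err = d[k] - sum(u[k][c] @ w_prev[k][c] for c in interests[k])
        psi.append({c: w_prev[k][c] + mu[k] * err * u[k][c] for c in interests[k]})
    # Combination step (14): convex fusion within the cluster neighborhood.
    return [{c: sum(A[c][(l, k)] * psi[l][c]
                    for l in neighbors[k] if l in clusters[c])
             for c in interests[k]} for k in range(N)]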

In analogy to the augmented regressors defined according to (5) and (2), let

$$w_{k,i} \triangleq \mathrm{col}\big\{w_{k,c,i}\big\}_{c \in \mathcal{S}} \qquad (M \times 1) \qquad (15)$$
$$\psi_{k,i} \triangleq \mathrm{col}\big\{\psi_{k,c,i}\big\}_{c \in \mathcal{S}} \qquad (M \times 1) \qquad (16)$$

Moreover, let

$$A_{\ell k} \triangleq \mathrm{diag}\big\{a_{c,\ell k} I_{M_c}\big\}_{c \in \mathcal{S}} \qquad (M \times M) \qquad (17)$$

Then, the adaptation and combination steps of the ATC algorithm, (13) and (14), can be rewritten in augmented form as

$$\psi_{k,i} = w_{k,i-1} + \mu_k u^\top_{k,i}\big[d_k(i) - u_{k,i} w_{k,i-1}\big], \qquad w_{k,i} = \sum_{\ell \in \mathcal{V}_k} A_{\ell k}\, \psi_{\ell,i}. \qquad (18)$$

Let $Y_{k,i} \triangleq I - \mu_k u^\top_{k,i} u_{k,i}$ ($M \times M$) and $E_k \triangleq \mathrm{diag}\{e_k\}$, where $e_k$ is the all-zero vector of length $N$ with $k$th entry equal to $1$. By induction, it can be verified that the following relationship holds between the estimator $w_{k,i}$ for all nodes $k$ and the global measurement vector $d_{0:i}$:

$$w_{k,i} = K_{k,i}\, d_{0:i} \qquad (19)$$

where

$$K_{k,i} = \Big[\sum_{\ell \in \mathcal{V}_k} \mu_\ell A_{\ell k} U^\top_i E_\ell \;\;\Big|\;\; \sum_{\ell \in \mathcal{V}_k} A_{\ell k} Y_{\ell,i} K_{\ell,i-1}\Big] \qquad (20)$$

of size $M \times (i+1)N$. The proof follows closely that of [21, Proposition 2]. The $c$th subvector of node $k$'s estimator $w_{k,i}$, namely $w_{k,c,i}$, for every $c \in \mathcal{S}$ and all $k \in \mathcal{V}$, can be recovered from (19) using matrices $P_c$ of the form:

$$P_c \triangleq \big[0_{M_c \times M_1}, \dots, 0_{M_c \times M_{c-1}}, I_{M_c \times M_c}, 0_{M_c \times M_{c+1}}, \dots, 0_{M_c \times M_{N_s}}\big] \qquad (M_c \times M) \qquad (21)$$

where $0_{N_1 \times N_2}$ is the all-zero matrix of size $N_1 \times N_2$, such that

$$w_{k,c,i} = P_c w_{k,i} = P_c K_{k,i} U_{0:i} w^o_s + P_c K_{k,i} v_{0:i} = P_c K_{k,i} U_{c,0:i} w^o_{s_c} + P_c K_{k,i} \sum_{c' \in \mathcal{S} \setminus \{c\}} U_{c',0:i} w^o_{s_{c'}} + P_c K_{k,i} v_{0:i} \qquad (22)$$

where

$$U_{c,0:i} = U_{0:i} P^\top_c \qquad ((i+1)N \times M_c). \qquad (23)$$

The second term in the last equation in (22) constitutes interference to node $k$'s estimate of the $c$th weight vector $w_{s_c}$, arising from all systems, models, or signals $c' \in \mathcal{S} \setminus \{c\}$. A detector for each weight vector $w_{s_c}$ would need to suppress the respective interference and pre-whiten the noise. But to suppress the interference, the active hypotheses need to be known. Here, we propose a different approach, motivated by the fact that the interference term vanishes at steady-state under certain conditions. This is established in the following lemma.

Lemma 1 (Vanishing Interference). Under the data and network model in Sec. II, if each node $k$ in the network runs the algorithm in Table I using combination policies $A_c$, $c \in \mathcal{S}$, with entries satisfying (9), then, for sufficiently small step-sizes $\{\mu_k\}$, as $i \to \infty$,

$$K_{k,i} U_{0:i} \to \mathcal{W}_k \qquad (24)$$

where

$$\mathcal{W}_k \triangleq \mathrm{diag}\big\{\mathbb{I}_{\mathcal{I}_k}(c)\, I_{M_c}\big\}_{c \in \mathcal{S}} \qquad (M \times M) \qquad (25)$$

with $\mathbb{I}$ denoting the indicator function: the matrix $\mathcal{W}_k$ is an $N_s \times N_s$ block-diagonal matrix whose $c$th block, of size $M_c \times M_c$, is the identity matrix, $I_{M_c}$, if $c \in \mathcal{I}_k$.

Proof: Let us define the following matrices:

$$\mathcal{M} \triangleq \mathrm{diag}\{\mu_k I_M\}_{k \in \mathcal{V}} \qquad (NM \times NM) \qquad (26)$$
$$\mathcal{H}_i \triangleq \mathrm{diag}\big\{u^\top_{k,i} u_{k,i}\big\}_{k \in \mathcal{V}} \qquad (NM \times NM) \qquad (27)$$
$$\mathcal{D}_i \triangleq \mathrm{col}\big\{u^\top_{k,i} u_{k,i}\big\}_{k \in \mathcal{V}} = \mathcal{H}_i(\mathbb{1}_N \otimes I_M) \qquad (NM \times M) \qquad (28)$$
$$\mathcal{K}_i \triangleq \mathrm{col}\{K_{k,i}\}_{k \in \mathcal{V}} \qquad (NM \times (i+1)N) \qquad (29)$$
$$\mathcal{A} \triangleq [A_{\ell k}],\ \ell, k \in \mathcal{V} \qquad (NM \times NM) \qquad (30)$$

where $X \otimes Z$ is the Kronecker product of the matrices $X$ and $Z$, and

$$\mathcal{Y}_i \triangleq \mathrm{diag}\{Y_{k,i}\}_{k \in \mathcal{V}} = I - \mathcal{M}\mathcal{H}_i \qquad (NM \times NM) \qquad (31)$$

Given the previous definitions, it is straightforward to establish from (20) that the following recursion holds:

$$(\mathcal{K}_i U_{0:i}) = \mathcal{A}^\top \mathcal{M} \mathcal{D}_i + \mathcal{A}^\top \mathcal{Y}_i (\mathcal{K}_{i-1} U_{0:i-1}) \qquad (32)$$

with initial condition $(\mathcal{K}_{-1} U_{0:-1}) = 0$. As $i \to \infty$, assuming convergence, expression (32) becomes

$$\big(I - \mathcal{A}^\top \mathcal{Y}_i\big)(\mathcal{K}_i U_{0:i}) = \mathcal{A}^\top \mathcal{M} \mathcal{D}_i. \qquad (33)$$

The condition for convergence of (32) coincides with that for mean stability of the augmented algorithm (18)—the latter is derived in Sec. V-B1. To see that $K_{k,i} U_{0:i} \to \mathcal{W}_k$ as $i \to \infty$, we first need to show that

$$\mathcal{K}_i U_{0:i} = \mathrm{col}\{\mathcal{W}_k\}_{k \in \mathcal{V}} \triangleq \mathcal{W} \qquad (34)$$

solves (33). The left-hand side of (33), with $\mathcal{K}_i U_{0:i}$ set to $\mathcal{W}$ and using (31), evaluates to

$$\big(I - \mathcal{A}^\top \mathcal{Y}_i\big)\mathcal{W} = \mathcal{W} - \mathcal{A}^\top \mathcal{W} + \mathcal{A}^\top \mathcal{M}\mathcal{H}_i \mathcal{W} = \mathcal{A}^\top \mathcal{M}\mathcal{H}_i \mathcal{W} \qquad (35)$$

since $\mathcal{A}^\top \mathcal{W} = \mathcal{W}$ by the properties of the combination policy in (9). The right-hand side of (33) can be re-expressed using (28) as

$$\mathcal{A}^\top \mathcal{M} \mathcal{D}_i = \mathcal{A}^\top \mathcal{M}\mathcal{H}_i (\mathbb{1}_N \otimes I_M) = \mathcal{A}^\top \mathcal{M}\mathcal{H}_i \mathcal{W} \qquad (36)$$

given the construction of the regressors according to (5) and (2). Hence, $\mathcal{W}$ solves (33).

Next, we need to show that $\mathcal{W}$ is the unique solution of (33). Equivalently, we show in Appendix A that, under the conditions outlined in Lemma 1, the node-specific diffusion LMS-based distributed estimation algorithm in its augmented form (18) generates iterates $\{w_{k,i}\}$ that converge to $\mathcal{W}_k w^o_s$ in the mean-square sense for each node $k$. The assumption of deterministic regressors $\{u_{k,c,i}\}$ made in this work poses an analysis challenge such that a different route than the one pursued in [31] needs to be taken.

Motivated by the above lemma, we consider the following model derived from (22) by ignoring the interference term:

$$w_{k,c,i} \approx P_c K_{k,i} U_{c,0:i} w^o_{s_c} + P_c K_{k,i} v_{0:i}, \qquad i \ge 0 \qquad (37)$$

for each node $k$ and every $c \in \mathcal{I}_k$.

IV. NODE-SPECIFIC DIFFUSION LMS-BASED DISTRIBUTED DETECTION

Based on model (37), according to the Neyman–Pearson framework [37], the test that maximizes the probability of detection of the weight vector $w_{s_c}$ subject to a given false-alarm rate upper bound is the likelihood-ratio test. To this end, the following local detectors can be set up for each $c \in \mathcal{I}_k$ and every node $k$ at time index $i \ge 0$:

$$T_{k,c,i}(w_{k,c,i}) \underset{\mathcal{H}^c_1}{\overset{\mathcal{H}^c_0}{\lessgtr}} \gamma_{k,c,i}. \qquad (38)$$

The test-statistics are given by

$$T_{k,c,i}(w_{k,c,i}) = w^\top_{s_c} Q_{k,c,i}\, w_{k,c,i} \qquad (39)$$

where, for full-rank $P_c K_{k,i}$ (guaranteed by full-rank $U_{0:i}$),

$$Q_{k,c,i} = \big(P_c K_{k,i} U_{0:i} P^\top_c\big)^\top \big(P_c K_{k,i} R_{v,0:i} K^\top_{k,i} P^\top_c\big)^{-1}. \qquad (40)$$

The computation of $Q_{k,c,i}$ requires the exchange of large amounts of data across the network. Given communication and computational constraints in networks, an approximation can prove rewarding. Recalling the expression (20) for $K_{k,i}$, and assuming that the matrices $\{K_{k,i}\}$ of temporally growing dimensions are not exchanged among neighboring nodes (by setting $A_{\ell k}$ to $\mathcal{W}_k$ if $\ell = k$, and $0$ otherwise) and that the step-sizes $\{\mu_k\}$ are sufficiently small, a good approximation for $Q_{k,c,i}$ is $Q_{k,c,i} = I_{M_c}$. A detailed derivation is given in Appendix B. The test-statistics are ultimately given by

$$T_{k,c,i}(w_{k,c,i}) = w^\top_{s_c} w_{k,c,i}, \qquad (41)$$

relying only on the locally available adaptive estimate $w_{k,c,i}$. It remains to determine the thresholds $\{\gamma_{k,c,i}\}$; this is done in Sec. V based on the false-alarm rate constraints.
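As a concrete illustration, with the approximation (41) the per-node decision reduces to correlating the estimate against the known signature and comparing with a threshold; a minimal sketch (names ours; the threshold is assumed to be supplied, e.g., by the distributed computation of Sec. V):

import numpy as np

def local_detect(w_sc, w_kci, gamma_kci):
    """Local test (38) with statistic (41) at node k for signal c at time i.
    Returns True when H1^c (signal present) is declared."""
    T = float(np.dot(w_sc, w_kci))  # test-statistic (41): w_sc^T w_{k,c,i}
    return T > gamma_kci            # declare H1^c when the statistic exceeds gamma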

V. PERFORMANCE ANALYSIS

Throughout this section, substantial use is made of the following error quantity:

$$\widetilde{w}_{k,i} \triangleq w^o_k - w_{k,i}, \qquad w^o_k \triangleq \mathcal{W}_k w^o_s, \qquad (42)$$

which is referred to as the weight-error vector.

A. Detection Performance

For every node $k \in \mathcal{V}$ and $c \in \mathcal{I}_k$, the test-statistics at time index $i \ge 0$, given by (41), are distributed as

$$T_{k,c,i}(w_{k,c,i}) \sim \mathcal{N}\Big(w^\top_{s_c} P_c \big(w^o_k - \mathbb{E}\,\widetilde{w}_{k,i}\big),\ \sigma^2_{k,c,i}\Big) \qquad (43)$$

where

$$\sigma^2_{k,c,i} = w^\top_{s_c} P_c R_{\widetilde{w}_k,i} P^\top_c w_{s_c} \qquad (44)$$

with $R_{\widetilde{w}_k,i} = \mathbb{E}\big[\widetilde{w}_{k,i} - \mathbb{E}\,\widetilde{w}_{k,i}\big]\big[\widetilde{w}_{k,i} - \mathbb{E}\,\widetilde{w}_{k,i}\big]^\top$ denoting the covariance matrix of node $k$'s weight-error vector. It is then clear that the detection performance relates to the estimation performance, which is analyzed in Sec. V-B.

The detection and false-alarm probabilities are given by

$$P^d_{k,c,i} \approx Q\left(\frac{\gamma_{k,c,i} - w^\top_{s_c} w_{s_c} + w^\top_{s_c} P_c\, \mathbb{E}\,\widetilde{w}_{k,i}}{\sigma_{k,c,i}}\right) \qquad (45)$$

$$P^f_{k,c,i} \approx Q\left(\frac{\gamma_{k,c,i} + w^\top_{s_c} P_c\, \mathbb{E}\,\widetilde{w}_{k,i}}{\sigma_{k,c,i}}\right) \qquad (46)$$

where $Q(\cdot)$ is the right-tail Gaussian probability function:

$$Q(x) = \frac{1}{\sqrt{2\pi}} \int_x^\infty e^{-t^2/2}\, dt.$$

The misdetection probability is given by $P^m_{k,c,i} = 1 - P^d_{k,c,i}$. A distributed method for the computation of the thresholds $\{\gamma_{k,c,i}\}$ will be presented in Sec. V-C.
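Since $Q(\cdot)$ and $Q^{-1}(\cdot)$ are the Gaussian right-tail function and its inverse, expressions (45)–(46) can be evaluated directly with standard library routines (scipy.stats.norm.sf and norm.isf); a minimal sketch under the Gaussian model (43), with names ours:

import numpy as np
from scipy.stats import norm

def detection_probs(w_sc, Pc_Ew_tilde, sigma, gamma):
    """Evaluate (45)-(46) for one (node, signal, time) triple.
    Pc_Ew_tilde: the vector P_c E w~_{k,i}; sigma: test-statistic std from (44)."""
    bias = float(w_sc @ Pc_Ew_tilde)
    Pd = norm.sf((gamma - w_sc @ w_sc + bias) / sigma)  # (45), Q(.) = right tail
    Pf = norm.sf((gamma + bias) / sigma)                # (46)
    return Pd, Pf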

B. Estimation Performance

We first introduce the network weight-error vector

$$\widetilde{w}_i \triangleq \mathrm{col}\{\widetilde{w}_{1,i}, \dots, \widetilde{w}_{N,i}\} \qquad (NM \times 1) \qquad (47)$$

as well as the following matrix:

$$\mathcal{U}_i = \mathrm{diag}\{u_{1,i}, \dots, u_{N,i}\} \qquad (N \times NM) \qquad (48)$$

Recalling the definitions for the matrices $\mathcal{M}$, $\mathcal{H}_i$, and $\mathcal{A}$ in (26), (27), and (30), respectively, the network-wide weight-error recursion is given by [footnote 1]

$$\widetilde{w}_i = \mathcal{A}^\top [I - \mathcal{M}\mathcal{H}_i]\, \widetilde{w}_{i-1} - \mathcal{A}^\top \mathcal{M} \mathcal{U}^\top_i v_i. \qquad (49)$$

[Footnote 1: Apart from the inherent structure of $\mathcal{H}_i$ and $\mathcal{U}_i$ that is brought about by the augmented regressor construction according to (5) and (2), this recursion is of the same form as its counterpart in single-task networks in [21].]

1) Mean Performance: Taking the expectation of both sides of the weight-error recursion (49) leads to the following network-wide mean weight-error recursion:

$$\mathbb{E}\,\widetilde{w}_i = \mathcal{A}^\top [I - \mathcal{M}\mathcal{H}_i]\, \mathbb{E}\,\widetilde{w}_{i-1}. \qquad (50)$$

A sufficient condition for asymptotic unbiasedness irrespective of the initial condition $w_{-1}$, according to [21, Lemma 2], is that there exist a time index $i_o$, a submultiplicative norm $\|\cdot\|$, and a constant $0 < \theta < 1$ such that

$$\big\|\mathcal{A}^\top (I - \mathcal{M}\mathcal{H}_i)\big\| \le \theta \qquad (51)$$

for all $i \ge i_o$. As a matter of fact, sufficiently small step-sizes $\{\mu_k\}$ ensure asymptotic unbiasedness, as the following theorem shows.

Theorem 1 (Mean Stability). Under the data and network model in Sec. II, if each node $k$ in the network runs the algorithm in Table I using combination policies $A_c$, $c \in \mathcal{S}$, with entries satisfying (9), then, for sufficiently small step-sizes $\{\mu_k\}$, $\lim_{i\to\infty} \mathbb{E}\,\widetilde{w}_{k,i} = 0$ for each $k$. Equivalently, $\lim_{i\to\infty} \mathbb{E}\, w_{k,c,i} = w^o_{s_c}$ for each $c \in \mathcal{I}_k$ and every $k$.

Proof: See Appendix C.

2) Mean-Square Performance: The network-wide weight-error covariance recursion is given by

$$R_{\widetilde{w},i} \triangleq \mathbb{E}\big[\widetilde{w}_i - \mathbb{E}\,\widetilde{w}_i\big]\big[\widetilde{w}_i - \mathbb{E}\,\widetilde{w}_i\big]^\top = \mathcal{A}^\top [I - \mathcal{M}\mathcal{H}_i]\, R_{\widetilde{w},i-1}\, [I - \mathcal{H}_i\mathcal{M}]\, \mathcal{A} + \mathcal{A}^\top \mathcal{M} \mathcal{U}^\top_i R_v \mathcal{U}_i \mathcal{M} \mathcal{A} \qquad (52)$$

with $R_{\widetilde{w},-1} = 0$ if $w_{k,-1} = 0$ for every node $k$. The matrix $R_{\widetilde{w}_k,i}$ is given by the $k$th $M \times M$ diagonal block of $R_{\widetilde{w},i}$. Convergence analysis of (52) is generally challenging and has not been undertaken in [21]. However, it is clear from (44), (45), and (46) that $R_{\widetilde{w},i}$ must converge to $0$ for the detection probability $P^d_{k,c,i}$ to converge to $1$ for a given false-alarm probability, as $i \to \infty$, for every node $k$ and each $c \in \mathcal{I}_k$.

C. Distributed Threshold Computation

In the Neyman–Pearson framework, the thresholds $\{\gamma_{k,c,i}\}$ are calculated from the false-alarm rate constraints as

$$\gamma_{k,c,i} \approx \sigma_{k,c,i}\, Q^{-1}\big(P^f_{k,c,i}\big) - w^\top_{s_c} P_c\, \mathbb{E}\,\widetilde{w}_{k,i}. \qquad (53)$$

Clearly, the calculation of the thresholds according to (53) at each time index $i$ cannot be performed locally at node $k$ using only the locally available regressors $\{u_{k,c,i} \mid c \in \mathcal{I}_k\}$, measurements $\{d_k(i)\}$, and intermediate estimates from its neighbors $\{\{\psi_{\ell,c,i}\} \mid \ell \in \mathcal{V}_k,\ c \in \mathcal{I}_k\}$ at the respective time index. More information needs to be exchanged for the calculation of the local mean weight-error vector $\mathbb{E}\,\widetilde{w}_{k,i}$ according to (50) and the test-statistic variance $\sigma^2_{k,c,i}$ according to (44) and (52). Similar to [21], an alternative distributed method is proposed to approximate the thresholds. First, assuming asymptotic unbiasedness, the thresholds can be approximated as

$$\gamma_{k,c,i} = g_c^{-\frac{1}{2}} \cdot \sigma^{\mathrm{loc}}_{k,c,i}\, Q^{-1}\big(P^f_{k,c,i}\big) \qquad (54)$$

where

$$\big(\sigma^{\mathrm{loc}}_{k,c,i}\big)^2 = w^\top_{s_c} P_c R^{\mathrm{loc}}_{\widetilde{w}_k,i} P^\top_c w_{s_c} \qquad (55)$$

TABLE II
DIFFUSION DETECTION ALGORITHM

Initializations: $w_{s_c}$, $\mathcal{V}_k$, $\mathcal{I}_k$, $\mu_k$, $a_{c,\ell k}$, $R^{\mathrm{loc}}_{\widetilde{w}_{\mathcal{I}_k},-1} = 0$, $P^f_{k,c,i}$, and $g_c$ (nodes $\ell, k \in \mathcal{C}_c$, cluster $c$, $c \in \mathcal{S}$). Start with $w_{k,c,-1} = 0$ for every node $k$ and each cluster $c$, $c \in \mathcal{I}_k$. For every time index $i \ge 0$, repeat:

1) Run one iteration of the Diffusion Estimation Algorithm in Table I.
2) Decision: for every node $k$ and each cluster $c$, $c \in \mathcal{I}_k$, repeat
   a) Test-statistic:
   $$T_{k,c,i} = w^\top_{s_c} w_{k,c,i} \qquad (59)$$
   b) Threshold:
   $$R^{\mathrm{loc}}_{\widetilde{w}_{\mathcal{I}_k},i} = \big[I - \mu_k u^\top_{\mathcal{I}_k,i} u_{\mathcal{I}_k,i}\big]\, R^{\mathrm{loc}}_{\widetilde{w}_{\mathcal{I}_k},i-1}\, \big[I - \mu_k u^\top_{\mathcal{I}_k,i} u_{\mathcal{I}_k,i}\big] + \mu^2_k \sigma^2_{v,k}\, u^\top_{\mathcal{I}_k,i} u_{\mathcal{I}_k,i} \qquad (60a)$$
   $$\big(\sigma^{\mathrm{loc}}_{k,c,i}\big)^2 = w^\top_{s_c} R^{\mathrm{loc}}_{\widetilde{w}_{k,c},i} w_{s_c} \qquad (60b)$$
   $$\gamma_{k,c,i} = g_c^{-\frac{1}{2}} \cdot \sigma^{\mathrm{loc}}_{k,c,i}\, Q^{-1}\big(P^f_{k,c,i}\big) \qquad (60c)$$
   c) Test:
   $$T_{k,c,i} \underset{\mathcal{H}^c_1}{\overset{\mathcal{H}^c_0}{\lessgtr}} \gamma_{k,c,i} \qquad (61)$$

and the matrix $R^{\mathrm{loc}}_{\widetilde{w}_k,i}$ is given by

$$R^{\mathrm{loc}}_{\widetilde{w}_k,i} = \big[I - \mu_k u^\top_{k,i} u_{k,i}\big]\, R^{\mathrm{loc}}_{\widetilde{w}_k,i-1}\, \big[I - \mu_k u^\top_{k,i} u_{k,i}\big] + \mu^2_k \sigma^2_{v,k}\, u^\top_{k,i} u_{k,i}, \qquad R^{\mathrm{loc}}_{\widetilde{w}_k,-1} = 0 \qquad (56)$$

which is the $k$th $M \times M$ diagonal block equation recovered from (52) by setting $\mathcal{A} = \mathrm{diag}\{\mathcal{W}_k\}_{k \in \mathcal{V}}$. As a matter of fact, each node $k$ need only run the recursion (56) employing not the augmented regressors $\{u_{k,i}\}$, but $\{u_{\mathcal{I}_k,i}\}$ in (10) instead, absolving the node of the need for global information such as the number of clusters $N_s$ and the lengths of the weight vectors $w_{s_c}$, $c \in \mathcal{S}$:

$$R^{\mathrm{loc}}_{\widetilde{w}_{\mathcal{I}_k},i} = \big[I - \mu_k u^\top_{\mathcal{I}_k,i} u_{\mathcal{I}_k,i}\big]\, R^{\mathrm{loc}}_{\widetilde{w}_{\mathcal{I}_k},i-1}\, \big[I - \mu_k u^\top_{\mathcal{I}_k,i} u_{\mathcal{I}_k,i}\big] + \mu^2_k \sigma^2_{v,k}\, u^\top_{\mathcal{I}_k,i} u_{\mathcal{I}_k,i}, \qquad R^{\mathrm{loc}}_{\widetilde{w}_{\mathcal{I}_k},-1} = 0. \qquad (57)$$

The factor $g_c$ in (54) accounts for the gain incurred by the diffusion process, and can be estimated offline as

$$g_c \approx \left\langle \frac{1}{|\mathcal{C}_c|} \sum_{k \in \mathcal{C}_c} \frac{\big(\sigma^{\mathrm{loc}}_{k,c,i}\big)^2}{\big(\sigma_{k,c,i}\big)^2} \right\rangle \qquad (58)$$

where $\langle\cdot\rangle$ is the time-averaging operator and the variances can be computed analytically or using Monte Carlo simulations. The complete detection algorithm is listed in Table II, where $R^{\mathrm{loc}}_{\widetilde{w}_{k,c},i}$ is the appropriate submatrix in $R^{\mathrm{loc}}_{\widetilde{w}_{\mathcal{I}_k},i}$.

Remark 1: The central limit theorem may be invoked to argue for the asymptotic Gaussianity of the test-statistics according to the linearity of the relations (41), (19), and (7). Though this may motivate the same algorithm derivation and analysis under non-Gaussian observation models, it is generally known that both the estimation and detection performance of adaptive algorithms are impacted when the noise model departs from Gaussianity. Non-Gaussian observation models were investigated in [38] in the diffusion-based estimation context, and in [22], [23] in the diffusion-based detection context. In [22], [23], [38], adaptive update rules with error nonlinearities were developed to counteract impulsive noise and were subsequently analyzed.

[Fig. 1. Network topology, node noise variances $\sigma^2_{v,k}$, and regressor covariance traces $\mathrm{Tr}(R_{u,k})$, for $N = 10$ nodes.]

VI. SIMULATION RESULTS

A. Algorithm Assessment

We consider $N = 10$ nodes, and two node clusters $\mathcal{C}_1 = \{1,2,3,4,5,6,7\}$ and $\mathcal{C}_2 = \{4,5,6,7,8,9,10\}$. The nodes in cluster $\mathcal{C}_c$ are interested in detecting the presence or absence of the randomly initiated unit-norm weight vectors $w_{s_c}$, $c = 1, 2$, of dimensions $M_1 = M_2 = 2$. The regressors $\{u_{k,c,i}\}$ and noise samples $\{v_k(i)\}$ are drawn independently of one another, independently across time and space, and identically distributed across time: For each node $k$, the regressors $\{u_{k,c,i}\}$ are additionally uncorrelated for $c \in \mathcal{I}_k$, and the samples are drawn from a multivariate zero-mean Gaussian distribution with covariances $\{R_{u,k}\}$, where $R_{u,k} = \mathbb{E}\, u^\top_{k,i} u_{k,i}$; and the noise samples are drawn according to a zero-mean Gaussian distribution with variances $\{\sigma^2_{v,k}\}$. The same set of regressors is maintained throughout the experiments. The network topology, regressor covariance traces, and noise variances are shown in Fig. 1. The nodes employ the uniform combination rule for both weight vectors, i.e., the nonzero combination weights are of the form $a_{c,\ell k} = 1/|\mathcal{V}_k \cap \mathcal{C}_c|$, $c = 1, 2$. The step-size parameter $\mu_k$ is set to $0.01$ for all $k$. The target false-alarm rate $P^f_{k,c,i}$ is set to $10^{-2}$ for all $k$, $c \in \mathcal{I}_k$, and $i$. The factor $g_c$ was estimated offline and found to be $6$, $c = 1, 2$. All results are obtained over $10{,}000$ Monte Carlo runs.

First, we numerically assess the approximations involved in deriving the diffusion detection algorithm in Table II, particularly the test-statistics (41) and thresholds (54). Specifically, we compare the detection and false-alarm performance, when the weight vectors $w_{s_1}$ and $w_{s_2}$ are both present or both absent, respectively, of three schemes with increasing level of approximation for the test-statistics and thresholds. The last scheme corresponds to the choices (41) and (54), respectively. The schemes are listed in Table III. The resulting average detection and false-alarm performance across the two clusters

$\mathcal{C}_1$ and $\mathcal{C}_2$ is shown in Figs. 2a and 2b, respectively. The results show that $Q_{k,c,i} = I$ is a reasonable approximation. Moreover, the fully distributed threshold computation is shown to lead to nearly the same detection performance at the expense of worse false-alarm performance. Similar results were reported for single-task networks in [21].

[Fig. 2. For a target false-alarm rate $10^{-2}$, a) average detection performance across cluster 1 and cluster 2 (top and bottom plots, respectively) when both weight vectors are present, and b) average false-alarm performance when both weight vectors are absent, for three schemes with increasing level of approximation for the test-statistics and thresholds. The schemes are listed in Table III.]

We consider now two scenarios to assess the detection and false-alarm performance of the diffusion detection algorithm, respectively, as well as the tracking performance. For comparison, we additionally plot throughout the results assuming no cooperation among the respective cluster nodes, as well as full cooperation—in the latter case, the uniform combination rule is employed. The factor $g_c$, $c = 1, 2$, is obviously $1$ in the case of no cooperation, and was found to be $7$ in the case of full cooperation.

1) Scenario 1—Detection Performance: The scenario is described as follows. For $i_{\mathrm{switch}} = 250$,

$$w^o_s = \begin{cases} \big[w^\top_{s_1}\ w^\top_{s_2}\big]^\top, & \text{if } i < i_{\mathrm{switch}} \\ \big[w^\top_{s_1}\ 0_{1\times M_2}\big]^\top, & \text{if } i \ge i_{\mathrm{switch}} \end{cases} \qquad (62)$$

See Figs. 3–4. In Fig. 3a, the simulated transient network estimation and tracking performance for the aforementioned three modes of cooperation is depicted in terms of the network mean weight-error norm, $\frac{1}{N}\|\mathbb{E}\,\widetilde{w}_i\|$ (top), and network mean-square deviation (MSD), $\frac{1}{N}\mathbb{E}\|\widetilde{w}_i\|^2$ (bottom). The average detection performance across $\mathcal{C}_1$ and $\mathcal{C}_2$ is plotted in Fig. 3b (top and bottom plots, respectively). It can be seen that cooperation among the cluster nodes boosts performance. The simulated test-statistic means for weight vectors $w_{s_c}$, $c = 1, 2$, of three exemplary nodes are plotted in Figs. 4a and 4b, respectively: nodes 1 (in cluster 1 only), 4 (in clusters 1 and 2), and 8 (in cluster 2 only). Even though, according to the diffusion estimation algorithm in Table I, neither do nodes 1, 2, or 3 compute test-statistics for $w_{s_2}$, nor nodes 8, 9, or 10 for $w_{s_1}$, the results are plotted as if the nodes were running the augmented algorithm (18) instead, to emphasize that node $k$'s estimates for weight vectors $c$ with $c \notin \mathcal{I}_k$ are zero. Finally, it is worth noting that a blip occurs in the simulated test-statistic means for cluster 1 nodes (1 through 7) at time index $i = i_{\mathrm{switch}}$ due to transient interference from $w_{s_2}$ as it changes state.

2) Scenario 2—False-Alarm Performance: The scenario is described as follows. For $i_{\mathrm{switch}} = 250$,

$$w^o_s = \begin{cases} \big[0_{1\times M_1}\ 0_{1\times M_2}\big]^\top, & \text{if } i < i_{\mathrm{switch}} \\ \big[w^\top_{s_1}\ 0_{1\times M_2}\big]^\top, & \text{if } i \ge i_{\mathrm{switch}} \end{cases} \qquad (63)$$

See Figs. 5–6. Analogous observations to those in Scenario 1 can be made. However, it is noteworthy that the target false-alarm rate is met only in the case of no intra-cluster cooperation, while the false-alarm performance is slightly compromised in the case of cooperation. This is due to the approximation $Q_{k,c,i} = I$, $c = 1, 2$, which is exact in the case of no cooperation.

B. Collaborative Spectrum Sensing Application

In collaborative spectrum sensing, secondary users (SUs) cooperate to detect transmissions of primary users (PUs) of licensed radio spectrum bands. SUs can access these bands in the absence of transmissions from PUs and cease their own transmissions when PUs resume theirs. In [21], the problem was formulated assuming one PU.

TABLE III
APPROXIMATION ASSESSMENT - ALTERNATIVE SCHEMES

Scheme: $Q_{k,c,i} = Q^{\mathrm{opt}}_{k,c,i}$.
  Test-statistic: $T_{k,c,i} = w^\top_{s_c} Q^{\mathrm{opt}}_{k,c,i} w_{k,c,i}$, where $Q^{\mathrm{opt}}_{k,c,i}$ is computed as in (40).
  Threshold: $\gamma_{k,c,i} = \sigma^{\mathrm{opt}}_{k,c,i}\, Q^{-1}(P^f_{k,c,i})$, where $\big(\sigma^{\mathrm{opt}}_{k,c,i}\big)^2 = w^\top_{s_c} Q^{\mathrm{opt}}_{k,c,i} P_c R_{\widetilde{w}_k,i} P^\top_c Q^{\mathrm{opt}\top}_{k,c,i} w_{s_c}$ and $R_{\widetilde{w}_k,i}$ is given by (52).

Scheme: $Q_{k,c,i} = I$.
  Test-statistic: $T_{k,c,i} = w^\top_{s_c} w_{k,c,i}$.
  Threshold: $\gamma_{k,c,i} = \sigma_{k,c,i}\, Q^{-1}(P^f_{k,c,i})$, where $\sigma^2_{k,c,i} = w^\top_{s_c} P_c R_{\widetilde{w}_k,i} P^\top_c w_{s_c}$.

Scheme: fully distributed (Table II).
  Test-statistic: $T_{k,c,i} = w^\top_{s_c} w_{k,c,i}$, as in (41).
  Threshold: $\gamma_{k,c,i}$ computed as in (54).

[Fig. 3. Scenario 1: a) Transient network estimation and tracking performance in terms of network mean weight-error norm and network mean-square deviation (MSD) (top and bottom plots, respectively), approximated via Monte Carlo simulations, for three modes of cooperation. One of the weight vectors to be detected changes state at time index $i = 250$—see Eq. (62). b) For a target false-alarm rate $10^{-2}$, average detection performance across cluster 1 and cluster 2 (top and bottom plots, respectively).]

[Fig. 4. Scenario 1: Mean of test-statistics of nodes 1 (in cluster 1 only), 4 (in clusters 1 and 2), and 8 (in cluster 2 only), approximated via Monte Carlo simulations, for weight vector $w_{s_c}$: a) $c = 1$, b) $c = 2$.]

[Fig. 5. Scenario 2: a) Transient network estimation and tracking performance in terms of network mean weight-error norm and network mean-square deviation (MSD) (top and bottom plots, respectively), approximated via Monte Carlo simulations, for three modes of cooperation. One of the weight vectors to be detected changes state at time index $i = 250$—see Eq. (63). b) For a target false-alarm rate $10^{-2}$, average false-alarm performance across cluster 1 and cluster 2 (top and bottom plots, respectively).]

[Fig. 6. Scenario 2: Mean of test-statistics of nodes 1 (in cluster 1 only), 4 (in clusters 1 and 2), and 8 (in cluster 2 only), approximated via Monte Carlo simulations, for weight vector $w_{s_c}$: a) $c = 1$, b) $c = 2$.]

[Fig. 7. Collaborative sensing application network setup, with primary users PU1 and PU2 and secondary users SU1–SU10.]

We extend the model here to several PUs. A network can be used to model cooperation within and among possibly overlapping clusters of SUs, each interested in detecting the presence or absence of one PU. The SUs are indexed by $k \in \mathcal{V} = \{1, \dots, N_{\mathrm{SU}}\}$, and the PUs by $c \in \mathcal{S} = \{1, \dots, N_{\mathrm{PU}}\}$. The SUs in cluster $\mathcal{C}_c \subseteq \mathcal{V}$ are interested in tracking the $c$th PU. For simplicity, consider the setup in Fig. 7, with $N_{\mathrm{SU}} = 10$ and $N_{\mathrm{PU}} = 2$. The clusters $\mathcal{C}_1$ and $\mathcal{C}_2$ in this setup coincide with the communication ranges of the two PUs: $\mathcal{C}_1 = \{1,2,3,4,5,6,7\}$, $\mathcal{C}_2 = \{4,5,6,7,8,9,10\}$, with the nodes 4, 5, 6, and 7 lying within both communication ranges. The $c$th PU's continuous-time real-valued signal is denoted as $s^o_c(t)$, $t \in \mathbb{R}$, where, over some time interval, $s^o_c(t) = 0$ under $\mathcal{H}^c_0$, and $s^o_c(t) = s_c(t) \neq 0$ under $\mathcal{H}^c_1$. The channels from the PUs to the SUs are assumed to be band-limited. Sampling at a frequency $\frac{1}{T_s}$ satisfying the Nyquist sampling theorem, the received signal at the $k$th SU at time $t = nT_s$, $n \in \mathbb{Z}$, is given by

$$y_k(nT_s) = r^o_{1k}(nT_s) + r^o_{2k}(nT_s) + \nu_k(nT_s)$$

where, for $c = 1, 2$,

$$r^o_{ck}(nT_s) = \sum_{j=0}^{L_{ck}-1} h_{ck}(j)\, s^o_c\big((n-j)T_s\big).$$

The channel $h_{ck}(j)$ from the $c$th PU to the $k$th SU is modeled as a time-invariant finite-impulse response filter with $L_{ck}$ taps, and it is nonzero only if $k \in \mathcal{C}_c$, i.e., when the $k$th SU is within the $c$th PU's communication range. The signal $s_c(nT_s)$, $c = 1, 2$, is assumed to be known to all nodes $k \in \mathcal{C}_c$; e.g., $s_c(t)$ is a pilot or frame preamble signal whose samples may be estimated by the respective nodes during instances when they know the PU to be active. The channel $h_{ck}(j)$ is also assumed to be known to the $k$th SU or to have been estimated. The noise $\nu_k(t)$ is a scalar wide-sense stationary zero-mean real Gaussian noise process, spatially independent and temporally white with variance $\sigma^2_{\nu,k} = \mathbb{E}\,\nu^2_k(t)$. The noise variance $\sigma^2_{\nu,k}$ is assumed as well to be known to the $k$th SU or to have been estimated during instances of PU inactivity.

As in [21], it is assumed that the $k$th SU observes a block of $L = \frac{T}{T_s}$ samples of $y_k(t)$, then processes them for a diffusion iteration at time $t = iT$, $i \ge 0$, where $L$ is the block length and $T$ is the block duration. It is also assumed that the signal $s_c(t)$, $c = 1, 2$, repeats itself every $L$ samples, i.e., $s_c(nT_s) = s_c\big((n+L)T_s\big)$ for all $n$. At time $t = iT$, the $k$th SU collects the last $L$ samples of $y_k(t)$ from time instants $t = (i-1)T + T_s$ to $t = iT$ into a vector, leading to the data model $y_{k,i} = H_k s^o + \nu_{k,i}$, following the definitions

$$y_{k,i} = \big[y_k(iT)\ \ y_k(iT - T_s)\ \ \dots\ \ y_k\big(iT - (L-1)T_s\big)\big]^\top$$
$$H_k = [H_{1k}\ H_{2k}], \qquad H_{ck} = \begin{bmatrix} h_{ck}(0) & \dots & h_{ck}(L_{ck}-1) & & 0 \\ \vdots & \ddots & & \ddots & \vdots \\ 0 & & h_{ck}(0) & \dots & h_{ck}(L_{ck}-1) \end{bmatrix}$$
$$s = \mathrm{col}\{s_1, s_2\}, \qquad s_c = \big[s_c(iT)\ \ s_c(iT - T_s)\ \ \dots\ \ s_c\big(iT - (M_c-1)T_s\big)\big]^\top$$
$$s^o = \mathrm{col}\{s^o_1, s^o_2\}, \qquad s^o_c = \big[s^o_c(iT)\ \ s^o_c(iT - T_s)\ \ \dots\ \ s^o_c\big(iT - (M_c-1)T_s\big)\big]^\top$$
$$\nu_{k,i} = \big[\nu_k(iT)\ \ \nu_k(iT - T_s)\ \ \dots\ \ \nu_k\big(iT - (L-1)T_s\big)\big]^\top$$

of dimensions $L \times 1$, $L \times M$, $L \times M_c$, $M \times 1$, $M_c \times 1$, $M \times 1$, $M_c \times 1$, and $L \times 1$, respectively, where $M_c = L + L_{ck} - 1$, $c = 1, 2$. The noise vectors $\nu_{k,i}$ and $\nu_{\ell,j}$ are independent for $k \neq \ell$ or $i \neq j$. The binary hypothesis tests are given by $s^o_c = 0$ under $\mathcal{H}^c_0$, and $s^o_c = s_c \neq 0$ under $\mathcal{H}^c_1$. Using the following definitions:

$$d_k(i) = \frac{1}{\sigma_{\nu,k}\|H_k s\|}\, s^\top H^\top_k y_{k,i}$$
$$u_{k,i} = [u_{k,1,i}\ u_{k,2,i}] = \frac{1}{\sigma_{\nu,k}\|H_k s\|}\, s^\top H^\top_k H_k \qquad (1 \times M)$$
$$w_s = \mathrm{col}\{w_{s_1}, w_{s_2}\} = s \qquad (M \times 1)$$
$$w^o_s = \mathrm{col}\{w^o_{s_1}, w^o_{s_2}\} = s^o \qquad (M \times 1)$$
$$v_k(i) = \frac{1}{\sigma_{\nu,k}\|H_k s\|}\, s^\top H^\top_k \nu_{k,i}$$

and noting that with the proposed construction for $H_k$ it holds that $u_{k,c,i} \neq 0$ if $k \in \mathcal{C}_c$, and $u_{k,c,i} = 0$ if $k \notin \mathcal{C}_c$, $c = 1, 2$, we thus obtain the data model $d_k(i) = u_{k,i} w^o_s + v_k(i)$, which is identical to the data model in Sec. II with time-invariant regressors and noise variance $\sigma^2_{v,k} = 1$ for all $k$.
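The mapping from one observation block to the scalar data model of Sec. II can be sketched as follows (a minimal construction; function and variable names are ours):

import numpy as np

def convolution_matrix(h, L):
    """Banded L x (L + len(h) - 1) channel matrix H_ck, rows holding the taps."""
    Lc = len(h)
    H = np.zeros((L, L + Lc - 1))
    for r in range(L):
        H[r, r:r + Lc] = h
    return H

def scalar_measurement(y_k, H_k, s, sigma_nu):
    """Map one block y_{k,i} to (d_k(i), u_{k,i}) of the Sec. II data model."""
    scale = 1.0 / (sigma_nu * np.linalg.norm(H_k @ s))
    d = scale * (s @ H_k.T @ y_k)   # d_k(i) = s^T H_k^T y_{k,i} / (sigma ||H_k s||)
    u = scale * (s @ H_k.T @ H_k)   # u_{k,i} = s^T H_k^T H_k / (sigma ||H_k s||)
    return d, u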

For the simulation, $L = 4$ samples per observation block and $L_{1k} = L_{2k} = 2$ channel taps for all $k$, leading to lengths $M_1 = M_2 = 5$ for the weight vectors $w_{s_1}$ and $w_{s_2}$. The channel taps $h_{ck}(0), \dots, h_{ck}(L_{ck}-1)$ for the $k$th SU are drawn independently according to zero-mean Gaussian distributions with variance $\sigma^2_{h_c,k}$, $c = 1, 2$, as plotted in the bottom plots of Fig. 8, while the noise variances $\{\sigma^2_{\nu,k}\}$ are plotted in the top plot. The factors $g_1$ and $g_2$ were estimated offline and found to be $3$ and $1$, respectively. All other parameters are the same as in Sec. VI-A. For the case when both PUs are present, the results showed that the average misdetection rate across the two clusters is zero for a target false-alarm rate of $10^{-2}$, while the resulting average false-alarm performance for the case when both PUs are absent is $0.02$ for cluster 1 and $0.09$ for cluster 2.

VII. CONCLUSION

We proposed a distributed online detection algorithm based on node-specific diffusion LMS estimation to solve the problem where each node in a network would like to establish the presence or absence of each of its parameters of interest, with possible overlap among the node interests. The developed detectors are motivated by the dynamics of the diffusion estimation algorithm, are fully distributed, and are able to track changing conditions in real time. The detection performance of the algorithm is intertwined with its estimation performance, which is analyzed using the energy-conservation framework, showing that the diffusion LMS iterates converge in the mean and mean-square sense to the true weight vectors given sufficiently small step-sizes. The approximations undertaken in deriving the detectors as well as the resulting performance were assessed in simulations. Finally, we demonstrated the performance of the distributed detection algorithm in a practical collaborative sensing application.

[Fig. 8. SU noise variances $\sigma^2_{\nu,k}$ (top) and channel tap variances $\sigma^2_{h_c,k}$, $c = 1, 2$, for the SUs of clusters 1 and 2, respectively (bottom).]

APPENDIX A
MEAN-SQUARE STABILITY OF DIFFUSION ESTIMATION ALGORITHM

Let us define the update vector

$$\hat{s}_{k,i}(w_{k,i-1}) = -u^\top_{k,i}\big[d_k(i) - u_{k,i} w_{k,i-1}\big]. \qquad (64)$$

The ATC diffusion algorithm in augmented form, written in terms of the update vector (64), is given by the following update equations:

$$\psi_{k,i} = w_{k,i-1} - \mu_k \hat{s}_{k,i}(w_{k,i-1}), \qquad w_{k,i} = \sum_{\ell \in \mathcal{V}_k} A_{\ell k}\, \psi_{\ell,i} \qquad (65)$$

where $A_{\ell k}$ is of the form (9) and (17). We show in the following lemma that $\hat{s}_{k,i}(w_{k,i-1})$ satisfies certain properties that will be exploited in the analysis.

Lemma 2 (Update Vector Properties). Under the data model assumptions in Sec. II, the update vector $\hat{s}_{k,i}$ in (64) for each $k$ and $i$ satisfies the following properties:

Randomness: There exists an $M \times 1$ deterministic vector function $s_{k,i}(w)$ such that, for all $M \times 1$ vectors $w$ in the filtration $\mathcal{F}_{i-1}$ generated by the past history of iterates $\{w_{k,j}\}$ for $j \le i-1$, it holds that

$$\mathbb{E}\big\{\hat{s}_{k,i}(w) \mid \mathcal{F}_{i-1}\big\} = s_{k,i}(w) \qquad (66)$$

where

$$s_{k,i}(w) \triangleq -u^\top_{k,i} u_{k,i}\big(w^o_s - w\big). \qquad (67)$$

Moreover, for each $k$, there exists $\sigma^2_{s,k} \ge 0$ such that for each $i$ and all $w \in \mathcal{F}_{i-1}$, it holds that

$$\mathbb{E}\big\{\|\hat{s}_{k,i}(w) - s_{k,i}(w)\|^2 \,\big|\, \mathcal{F}_{i-1}\big\} \le \sigma^2_{s,k} \qquad (68)$$

where $\sigma^2_{s,k} \triangleq \sigma^2_{v,k} \max_i \|u_{k,i}\|^2$.

Lipschitz: There exists $\lambda_U > 0$ such that for all $x, y \in \mathbb{R}^M$, it holds that

$$\|s_{k,i}(x) - s_{k,i}(y)\| \le \lambda_U \|x - y\| \qquad (69)$$

where the subscript "$U$" in $\lambda_U$ refers to the latter determining the upper bound, with $\lambda_U \triangleq \max_{k,i} \|u_{k,i}\|^2$. This property is guaranteed by the bounded energy assumption on the regressors in Sec. II, where in this appendix $\lambda_U$ is used instead of $\alpha$ for notational convenience.

Strong monotonicity: There exists $\lambda_L > 0$, $\lambda_L \le \lambda_U$, such that for all $x, y \in \mathbb{R}^M$, it holds that

$$(x - y)^\top \sum_{k \in \mathcal{V}} \big[s_{k,i}(x) - s_{k,i}(y)\big] \ge \lambda_L \|x - y\|^2 \qquad (70)$$

where the subscript "$L$" in $\lambda_L$ refers to the latter determining the lower bound, with $\lambda_L \triangleq \min_i \lambda_{\min}\big(\sum_{k \in \mathcal{V}} u^\top_{k,i} u_{k,i}\big)$. Here, $\lambda_{\min}(\cdot)$ denotes the minimum eigenvalue of its square matrix argument. This property is guaranteed by the full-rank assumption on $U_i$ in Sec. II. ∎

Properties equivalent to the abovementioned Lipschitz and strong monotonicity properties can be established as follows. First, let

$$H_{k,i} \triangleq \nabla_w s_{k,i}(w) = u^\top_{k,i} u_{k,i} \qquad (71)$$

for any $w \in \mathcal{F}_{i-1}$. Then, the equivalent properties are

$$H_{k,i} \ge 0, \qquad \|H_{k,i}\| \le \lambda_U, \qquad \sum_{k \in \mathcal{V}} H_{k,i} \ge \lambda_L I_M. \qquad (72)$$

Extended error dynamics: The objective is to examine the evolution of the offset between the estimate $w_{k,i}$ for each $k$ and the limit point $w^o_k \triangleq \mathcal{W}_k w^o_s$, where

$$\mathcal{W}_k \triangleq \mathrm{diag}\big\{\mathbb{I}_{\mathcal{I}_k}(c)\, I_{M_c}\big\}_{c \in \mathcal{S}}. \qquad (73)$$

We first define the error quantities

$$\widetilde{\psi}_{k,i} \triangleq w^o_k - \psi_{k,i}, \qquad \widetilde{w}_{k,i} \triangleq w^o_k - w_{k,i}. \qquad (74)$$

Due to the structure of the regressors according to (5) and (2), the expected value of the update vector can be written for each $k$, $i$, and $w_{k,i-1} \in \mathcal{F}_{i-1}$ as

$$s_{k,i}(w_{k,i-1}) = -u^\top_{k,i} u_{k,i}\big(w^o_s - w_{k,i-1}\big) = -u^\top_{k,i} u_{k,i} \mathcal{W}_k\big(w^o_s - w_{k,i-1}\big) = -u^\top_{k,i} u_{k,i}\big(w^o_k - w_{k,i-1}\big) = -H_{k,i}\, \widetilde{w}_{k,i-1} \qquad (75)$$

where we used (74) and (71). For node $k$, time index $i$, and any $w \in \mathcal{F}_{i-1}$, let

$$v^s_{k,i}(w) \triangleq \hat{s}_{k,i}(w) - s_{k,i}(w) \qquad (76)$$

represent the noise incurred by stochastic approximation. We refer to it as the update noise vector. Then, the update equations (65), using (76) and (75), become



ψk,i = wk,i−1+ µkHk,iwek,i−1− µkvk,is

wk,i = Pℓ∈VkAℓkψℓ,i

(77) Subtracting each of the update equations in (77) fromwo

k and

using the fact that Pℓ∈VkAℓkwok = wok,

( e

ψk,i = [I− µkHk,i]wek,i−1+ µkvk,is

e

wk,i =

P

ℓ∈VkAℓkψeℓ,i

(78) We introduce the following network variables:

e ψi ,    e ψ1,i .. . e ψN,i   , ewi,    e w1,i .. . e wN,i   , vsi ,    vs 1,i .. . vN,is    (79) Recalling that A = [Aℓk], ℓ, k ∈ V, the network relations

become ( e ψi = [I− MHi]wei−1+Mvis e wi = A⊤ψe i (80) whereM and Hi were defined in (26) and (27), respectively.

This leads to the following network weight-error recursion: e

wi=Biwei−1+A⊤Mvis, i≥ 0 (81)

with feedback matrix

Bi, A⊤(I− MHi). (82)

Properties of the matrix $A_c$: Consider the $|\mathcal{C}_c| \times |\mathcal{C}_c|$ combination matrix $[A_c]_{\mathcal{C}_c}$, formed from $A_c$ by keeping only the rows and columns with indices in $\mathcal{C}_c$, where $c \in \mathcal{S}$. Under the assumptions from Sec. II that each of the clusters is connected and that each node is its own neighbor, it follows that the matrices $[A_c]_{\mathcal{C}_c}$ are primitive. The following properties then hold for each matrix $A_c$:

a) The matrix $A_c$ has a single eigenvalue at $1$, corresponding to the left eigenvector $\bar{\mathbb{1}}_c$ with entries

$$\bar{\mathbb{1}}_{c,k} = \begin{cases} 1, & \text{if } k \in \mathcal{C}_c \\ 0, & \text{otherwise.} \end{cases} \qquad (83)$$

b) All other eigenvalues of $A_c$ lie strictly inside the unit circle.

c) The right eigenvector corresponding to the eigenvalue at $1$ of $A_c$ is given by $p_c$, such that $A_c p_c = p_c$, with entries $p_{c,k}$ that, with proper scaling and normalization, sum up to one, $\mathbb{1}^\top p_c = 1$, and are positive if $k \in \mathcal{C}_c$, and $0$ otherwise.

Jordan canonical decomposition of $\mathcal{A}$: The matrix $\mathcal{A}$ is now to be decomposed in terms of the Jordan canonical decomposition of the matrices $A_c$, $c \in \mathcal{S}$, of the form [39, p. 128], for $\epsilon > 0$:

$$A_c = V_{\epsilon,c}\, J_c\, V^{-1}_{\epsilon,c} \qquad (84)$$

where

$$J_c = \begin{bmatrix} 1 & \\ & J_{\epsilon,c} \end{bmatrix}, \qquad V_{\epsilon,c} = [p_c \mid V_{R,c}], \qquad V^{-1}_{\epsilon,c} = \begin{bmatrix} \bar{\mathbb{1}}^\top_c \\ V^\top_{L,c} \end{bmatrix} \qquad (85)$$

Let

$$E_c \triangleq \mathrm{diag}\big\{\mathbb{I}_{c'=c}\, I_{M_{c'}}\big\}_{c' \in \mathcal{S}} = E^\top_c, \qquad (86)$$

which is a block-diagonal matrix whose $c$th diagonal block is the identity matrix of size $M_c$, while the other diagonal blocks $c'$, of sizes $M_{c'}$, are all-zero matrices. Then, the Jordan canonical decomposition of $\mathcal{A}$ can be expressed as [footnote 2]

$$\mathcal{A} = \mathcal{V}_\epsilon\, \mathcal{J}\, \mathcal{V}^{-1}_\epsilon \qquad (87)$$

where

$$\mathcal{J} = \mathrm{diag}\Big\{I_M,\ \mathcal{J}_\epsilon\Big\}, \qquad \mathcal{J}_\epsilon \triangleq \sum_{c \in \mathcal{S}} J_{\epsilon,c} \otimes E_c \qquad \big((N-1)M \times (N-1)M\big) \qquad (88)$$

$$\mathcal{V}_\epsilon = \Big[\underbrace{\sum_{c \in \mathcal{S}} p_c \otimes E_c}_{NM \times M} \ \Big|\ \underbrace{\sum_{c \in \mathcal{S}} V_{R,c} \otimes E_c}_{NM \times (N-1)M}\Big] \qquad (89)$$

$$\mathcal{V}^{-1}_\epsilon = \begin{bmatrix} \sum_{c \in \mathcal{S}} \bar{\mathbb{1}}^\top_c \otimes E^\top_c \\ \sum_{c \in \mathcal{S}} V^\top_{L,c} \otimes E^\top_c \end{bmatrix} \begin{matrix} M \times NM \\ (N-1)M \times NM \end{matrix} \qquad (90)$$

Now, the feedback matrix $\mathcal{B}_i$ defined in (82) can be expressed in terms of the decomposition as

$$\mathcal{B}_i = \big(\mathcal{V}^{-1}_\epsilon\big)^\top \big(\mathcal{J}^\top - \mathcal{D}^\top_i\big)\, \mathcal{V}^\top_\epsilon \qquad (91)$$

with $\mathcal{D}^\top_i \triangleq \mathcal{V}^\top_\epsilon \mathcal{A}^\top \mathcal{M}\mathcal{H}_i \big(\mathcal{V}^{-1}_\epsilon\big)^\top$. Let us consider the following partitioning of the matrix $\mathcal{D}^\top_i$:

$$\mathcal{D}^\top_i = \begin{bmatrix} \mathcal{D}^\top_{i,11} & \mathcal{D}^\top_{i,21} \\ \mathcal{D}^\top_{i,12} & \mathcal{D}^\top_{i,22} \end{bmatrix} \qquad (92)$$

where $\mathcal{D}^\top_{i,11}$ is an $M \times M$ matrix. We will proceed to evaluate each of the partitions. As for $\mathcal{D}^\top_{i,11}$,

$$\mathcal{D}^\top_{i,11} = \Big(\sum_{c \in \mathcal{S}} p^\top_c \otimes E_c\Big) \mathcal{A}^\top \mathcal{M}\mathcal{H}_i \Big(\sum_{c \in \mathcal{S}} \bar{\mathbb{1}}_c \otimes E_c\Big). \qquad (93)$$

The term pre-multiplying $\mathcal{H}_i$ can be simplified:

$$\mathcal{M}\mathcal{A}\Big(\sum_{c \in \mathcal{S}} p_c \otimes E_c\Big) = \mathcal{M}\Big(\sum_{c \in \mathcal{S}} A_c \otimes E_c\Big)\Big(\sum_{c \in \mathcal{S}} p_c \otimes E_c\Big) = \mathcal{M}\Big(\sum_{c \in \mathcal{S}} A_c p_c \otimes E_c\Big) = \mathcal{M}\Big(\sum_{c \in \mathcal{S}} p_c \otimes E_c\Big) = \sum_{c \in \mathcal{S}} \big(\mathrm{diag}\{\mu_1, \dots, \mu_N\}\, p_c\big) \otimes E_c = \sum_{c \in \mathcal{S}} q_c \otimes E_c \qquad (94)$$

where

$$q_c \triangleq \mathrm{diag}\{\mu_1, \dots, \mu_N\}\, p_c \qquad (95)$$

[Footnote 2: Equivalently, $\mathcal{A} = \mathcal{V}\mathcal{J}'\mathcal{V}^{-1}$, where $\mathcal{J}'$ has ones instead of $\epsilon$ on the lower diagonal. Let $x \triangleq [1, \epsilon, \dots, \epsilon^{N-1}]^\top$ and $x_\epsilon \triangleq [\epsilon, \epsilon^2, \dots, \epsilon^{N-1}]^\top$, so $x = [1\ x^\top_\epsilon]^\top$, and $X_\epsilon \triangleq \mathrm{diag}\{x_\epsilon\}$. Let $\mathcal{X} \triangleq \mathrm{diag}\big\{I_M, \sum_{c \in \mathcal{S}} X_\epsilon \otimes E_c\big\}$. Then, $\mathcal{A} = \mathcal{V}\mathcal{X}^{-1}\,\mathcal{X}\mathcal{J}'\mathcal{X}^{-1}\,\mathcal{X}\mathcal{V}^{-1} = \mathcal{V}_\epsilon \mathcal{J} \mathcal{V}^{-1}_\epsilon$, where $\mathcal{J} = \mathcal{X}\mathcal{J}'\mathcal{X}^{-1}$, $\mathcal{V}_\epsilon \triangleq \mathcal{V}\mathcal{X}^{-1}$, and $\mathcal{V}^{-1}_\epsilon \triangleq \mathcal{X}\mathcal{V}^{-1}$.]

with entries $q_{c,k} = \mu_k p_{c,k}$. Substituting (94) into (93),

$$\mathcal{D}^\top_{i,11} = \Big(\sum_{c \in \mathcal{S}} q^\top_c \otimes E_c\Big) \mathcal{H}_i \Big(\sum_{c \in \mathcal{S}} \bar{\mathbb{1}}_c \otimes E_c\Big) = \sum_{k=1}^{N} \Big(\sum_{c \in \mathcal{S}} q_{c,k} E_c\Big) H_{k,i} \Big(\sum_{c \in \mathcal{S}} \bar{\mathbb{1}}_{c,k} E_c\Big) = \sum_{k=1}^{N} Q_k H_{k,i} \mathcal{W}_k \ \ \text{(Eqs. (83), (73))} = \sum_{k=1}^{N} Q_k H_{k,i} \ \ \text{(Eqs. (71), (5), (2))} \qquad (96)$$

where

$$Q_k \triangleq \mathrm{diag}\big\{q_{c,k} I_{M_c}\big\}_{c \in \mathcal{S}}. \qquad (97)$$

It follows that

$$\mathcal{D}_{i,11} = \sum_{k=1}^{N} H_{k,i} Q_k. \qquad (98)$$

The other three partitions of the matrix $\mathcal{D}_i$ can be simplified similarly.

Bounding the matrix $\mathcal{D}_i$: Let $\mu_{\max} \triangleq \max_k \mu_k$. Then, the step-sizes can be expressed as

$$\mu_k = \tau_k \mu_{\max} \qquad (99)$$

where $0 < \tau_k \le 1$, $k \in \mathcal{V}$. Note that, following (95) and (99), the entries of the vector $q_c$, namely $q_{c,k}$, $k \in \mathcal{V}$, $c \in \mathcal{S}$, can be expressed as

$$q_{c,k} = \mu_k p_{c,k} = \mu_{\max} \tau_k p_{c,k}. \qquad (100)$$

It is clear from this relation that, like $\{p_{c,k}\}$, $q_{c,k} > 0$ if $k \in \mathcal{C}_c$ and $0$ otherwise.

We will proceed to show that the partitions of the matrix $\mathcal{D}_i$ are bounded. For example, $\mathcal{D}_{i,11} = \sum_{k=1}^{N} H_{k,i} Q_k$ is bounded from above and below for all $i$ as follows. Consider the transposed matrix, $\mathcal{D}^\top_{i,11}$:

$$\sum_{k \in \mathcal{V}} Q_k H_{k,i} = \sum_{k \in \mathcal{V}} \sum_{c \in \mathcal{S}} q_{c,k} E_c H_{k,i} = \sum_{c \in \mathcal{S}} \sum_{k \in \mathcal{V}} q_{c,k} E_c H_{k,i} \ge \sum_{c \in \mathcal{S}} \Big[\Big(\min_{k',\ q_{c,k'} > 0} q_{c,k'}\Big) \sum_{k \in \mathcal{V}} E_c H_{k,i}\Big] \ge q_{\min} \sum_{c \in \mathcal{S}} \sum_{k \in \mathcal{V}} E_c H_{k,i} = q_{\min} \sum_{k \in \mathcal{V}} \sum_{c \in \mathcal{S}} E_c H_{k,i} = q_{\min} \sum_{k \in \mathcal{V}} H_{k,i} \ge q_{\min} \lambda_L I_M \ \ \text{(Eq. (72))} = \mathcal{O}(\mu_{\max}) \qquad (101)$$

where $q_{\min} \triangleq \min_{c,k,\ q_{c,k} > 0} q_{c,k}$. Following the same reasoning, exploiting the Lipschitz property (72), it can be shown that

$$\sum_{k \in \mathcal{V}} Q_k H_{k,i} \le q_{\max} N \lambda_U I_M = \mathcal{O}(\mu_{\max}) \qquad (102)$$

where $q_{\max} \triangleq \max_{c,k} q_{c,k}$. Hence, there exist positive constants $\theta_1$ and $\theta_2$, where $\theta_1 \le \theta_2$, independent of $\mu_{\max}$, such that the following relation holds for all $i$:

$$\theta_1 \mu_{\max} I_M \le \mathcal{D}_{i,11} \le \theta_2 \mu_{\max} I_M. \qquad (103)$$

Since the eigenvalues of $\mathcal{D}_{i,11}$ are real-valued and positive, they are bounded as

$$\theta_1 \mu_{\max} \le \lambda(\mathcal{D}_{i,11}) \le \theta_2 \mu_{\max}. \qquad (104)$$

This implies that the eigenvalues of $I_M - \mathcal{D}^\top_{i,11}$ are $1 - \mathcal{O}(\mu_{\max})$ for sufficiently small $\mu_{\max}$, such that

$$\|I_M - \mathcal{D}_{i,11}\| \le 1 - \sigma_{11}\mu_{\max} = 1 - \mathcal{O}(\mu_{\max}) \qquad (105)$$

for some positive constant $\sigma_{11}$ independent of $\mu_{\max}$ and $i$.

Similarly, it can be shown that $\mathcal{D}_{i,12} = \mathcal{O}(\mu_{\max})$, $\mathcal{D}_{i,21} = \mathcal{O}(\mu_{\max})$, and $\mathcal{D}_{i,22} = \mathcal{O}(\mu_{\max})$ for all $i$. For example,

$$\|\mathcal{D}_{i,21}\| \le \Big\|\sum_{c \in \mathcal{S}} V^\top_{L,c} \otimes E_c\Big\|\, \Big\|\sum_{c \in \mathcal{S}} q_c \otimes E_c\Big\|\, \|\mathcal{H}_i\| \le \Big\|\sum_{c \in \mathcal{S}} V^\top_{L,c} \otimes E_c\Big\|\, \Big\|\sum_{c \in \mathcal{S}} q_c \otimes E_c\Big\| \Big(\max_{k \in \mathcal{V}} \|H_{k,i}\|\Big) \le \Big\|\sum_{c \in \mathcal{S}} V^\top_{L,c} \otimes E_c\Big\|\, \Big\|\sum_{c \in \mathcal{S}} q_c \otimes E_c\Big\|\, \lambda_U = \Big\|\sum_{c \in \mathcal{S}} V^\top_{L,c} \otimes E_c\Big\| \max_{c \in \mathcal{S}} \|q_c\|\, \lambda_U \qquad (106a)$$

$$\le \Big\|\sum_{c \in \mathcal{S}} V^\top_{L,c} \otimes E_c\Big\| \sqrt{N q^2_{c_o,k_o}}\, \lambda_U = \Big\|\sum_{c \in \mathcal{S}} V^\top_{L,c} \otimes E_c\Big\| \sqrt{N}\, \mu_{\max} \tau_{k_o} p_{c_o,k_o}\, \lambda_U \qquad (106b)$$

where $c_o \triangleq \arg\max_{c \in \mathcal{S}} \|q_c\|$, $k_o \triangleq \arg\max_{k \in \mathcal{V}} q_{c_o,k}$, and (100) was used in (106b). The upper bound in (106a) can be motivated as follows:

$$\Big\|\sum_{c \in \mathcal{S}} q_c \otimes E_c\Big\| = \max_{c \in \mathcal{S}} \|q_c \otimes E_c\| = \max_{c \in \mathcal{S}} \|q_c\|. \qquad (107)$$

It follows that $\|\mathcal{D}_{i,21}\| \le \sigma_{21}\mu_{\max} = \mathcal{O}(\mu_{\max})$, where $\sigma_{21} > 0$ is a constant that is independent of $\mu_{\max}$ and $i$. Similarly, $\|\mathcal{D}_{i,12}\| \le \sigma_{12}\mu_{\max} = \mathcal{O}(\mu_{\max})$ and $\|\mathcal{D}_{i,22}\| \le \sigma_{22}\mu_{\max} = \mathcal{O}(\mu_{\max})$, for some constants $\sigma_{12}, \sigma_{22} > 0$ that are independent of $\mu_{\max}$ and $i$.

Transformed network weight-error recursion: The feedback matrix $\mathcal{B}_i$ in (91) can be expressed in terms of the partitions of the matrix $\mathcal{D}_i$ as

$$\mathcal{B}_i = \big(\mathcal{V}^{-1}_\epsilon\big)^\top \begin{bmatrix} I_M - \mathcal{D}^\top_{i,11} & -\mathcal{D}^\top_{i,21} \\ -\mathcal{D}^\top_{i,12} & \mathcal{J}^\top_\epsilon - \mathcal{D}^\top_{i,22} \end{bmatrix} \mathcal{V}^\top_\epsilon. \qquad (108)$$

Transforming the network weight-error recursion (81) by pre-multiplying by $\mathcal{V}^\top_\epsilon$, it holds for $i \ge 0$ that

$$\mathcal{V}^\top_\epsilon \widetilde{w}_i = \mathcal{V}^\top_\epsilon \mathcal{B}_i \big(\mathcal{V}^{-1}_\epsilon\big)^\top \mathcal{V}^\top_\epsilon \widetilde{w}_{i-1} + \mathcal{V}^\top_\epsilon \mathcal{A}^\top \mathcal{M} v^s_i. \qquad (109)$$

Let

$$\mathcal{V}^\top_\epsilon \widetilde{w}_i = \begin{bmatrix} \big(\sum_{c \in \mathcal{S}} p^\top_c \otimes E_c\big)\, \widetilde{w}_i \\ \big(\sum_{c \in \mathcal{S}} V^\top_{R,c} \otimes E_c\big)\, \widetilde{w}_i \end{bmatrix} \triangleq \begin{bmatrix} \bar{w}_i \\ \check{w}_i \end{bmatrix} \qquad (110)$$

and

$$\mathcal{V}^\top_\epsilon \mathcal{A}^\top \mathcal{M} v^s_i \triangleq \begin{bmatrix} \bar{v}^s_i \\ \check{v}^s_i \end{bmatrix}. \qquad (111)$$

Therefore, in expanded form, the transformed network weight-error recursion (109) becomes

$$\bar{w}_i = \big(I_M - \mathcal{D}^\top_{i,11}\big)\, \bar{w}_{i-1} - \mathcal{D}^\top_{i,21}\, \check{w}_{i-1} + \bar{v}^s_i \qquad (112a)$$
$$\check{w}_i = \big(\mathcal{J}^\top_\epsilon - \mathcal{D}^\top_{i,22}\big)\, \check{w}_{i-1} - \mathcal{D}^\top_{i,12}\, \bar{w}_{i-1} + \check{v}^s_i \qquad (112b)$$

Second-order moment stability: Evaluating the second-order moments of $\bar{w}_i$ and $\check{w}_i$ in (112a)–(112b), conditioned on $\mathcal{F}_{i-1}$:

$$\mathbb{E}\big[\|\bar{w}_i\|^2 \,\big|\, \mathcal{F}_{i-1}\big] = \big\|\big(I_M - \mathcal{D}^\top_{i,11}\big)\bar{w}_{i-1} - \mathcal{D}^\top_{i,21}\check{w}_{i-1}\big\|^2 + \mathbb{E}\big[\|\bar{v}^s_i\|^2 \,\big|\, \mathcal{F}_{i-1}\big] \qquad (113a)$$
$$\mathbb{E}\big[\|\check{w}_i\|^2 \,\big|\, \mathcal{F}_{i-1}\big] = \big\|\big(\mathcal{J}^\top_\epsilon - \mathcal{D}^\top_{i,22}\big)\check{w}_{i-1} - \mathcal{D}^\top_{i,12}\bar{w}_{i-1}\big\|^2 + \mathbb{E}\big[\|\check{v}^s_i\|^2 \,\big|\, \mathcal{F}_{i-1}\big] \qquad (113b)$$

where we used (111), (79), and (76), along with the fact that the noise samples $\{v_k(i)\}$ are zero-mean. Taking the expectation again,

$$\mathbb{E}\|\bar{w}_i\|^2 = \mathbb{E}\big\|\big(I_M - \mathcal{D}^\top_{i,11}\big)\bar{w}_{i-1} - \mathcal{D}^\top_{i,21}\check{w}_{i-1}\big\|^2 + \mathbb{E}\|\bar{v}^s_i\|^2 \qquad (114a)$$
$$\mathbb{E}\|\check{w}_i\|^2 = \mathbb{E}\big\|\big(\mathcal{J}^\top_\epsilon - \mathcal{D}^\top_{i,22}\big)\check{w}_{i-1} - \mathcal{D}^\top_{i,12}\bar{w}_{i-1}\big\|^2 + \mathbb{E}\|\check{v}^s_i\|^2 \qquad (114b)$$

Bounding (114a) by considering an arbitrary constant $t \in (0,1)$, invoking Jensen's inequality [40], and recalling the bounds on $\big(I_M - \mathcal{D}^\top_{i,11}\big)$ and $\mathcal{D}_{i,21}$,

$$\mathbb{E}\|\bar{w}_i\|^2 = \mathbb{E}\bigg\|(1-t)\,\frac{\big(I_M - \mathcal{D}^\top_{i,11}\big)\bar{w}_{i-1}}{1-t} - t\,\frac{\mathcal{D}^\top_{i,21}\check{w}_{i-1}}{t}\bigg\|^2 + \mathbb{E}\|\bar{v}^s_i\|^2 \le \frac{(1 - \sigma_{11}\mu_{\max})^2}{1-t}\, \mathbb{E}\|\bar{w}_{i-1}\|^2 + \frac{\sigma^2_{21}\mu^2_{\max}}{t}\, \mathbb{E}\|\check{w}_{i-1}\|^2 + \mathbb{E}\|\bar{v}^s_i\|^2. \qquad (115)$$

Choosing $t = \sigma_{11}\mu_{\max} < 1$,

$$\mathbb{E}\|\bar{w}_i\|^2 \le (1 - \sigma_{11}\mu_{\max})\, \mathbb{E}\|\bar{w}_{i-1}\|^2 + \frac{\sigma^2_{21}\mu_{\max}}{\sigma_{11}}\, \mathbb{E}\|\check{w}_{i-1}\|^2 + \mathbb{E}\|\bar{v}^s_i\|^2. \qquad (116)$$

Bounding (114b) using similar reasoning,

$$\mathbb{E}\|\check{w}_i\|^2 \le \frac{1}{t}\, \mathbb{E}\big\|\mathcal{J}^\top_\epsilon \check{w}_{i-1}\big\|^2 + \mathbb{E}\|\check{v}^s_i\|^2 + \frac{1}{1-t}\, \mathbb{E}\big\|\mathcal{D}^\top_{i,22}\check{w}_{i-1} + \mathcal{D}^\top_{i,12}\bar{w}_{i-1}\big\|^2. \qquad (117)$$

As for the first term in the upper bound in (117), invoking the Rayleigh–Ritz eigenvalue characterization [39]: $\mathbb{E}\big\|\mathcal{J}^\top_\epsilon \check{w}_{i-1}\big\|^2 \le \rho(\mathcal{J}_\epsilon \mathcal{J}^*_\epsilon)\, \mathbb{E}\|\check{w}_{i-1}\|^2$, where $\rho(\cdot)$ denotes the spectral radius of its Hermitian matrix argument. Note that the product $\mathcal{J}_\epsilon \mathcal{J}^*_\epsilon$ can be decomposed as

$$\mathcal{J}_\epsilon \mathcal{J}^*_\epsilon = \Big(\sum_{c \in \mathcal{S}} J_{\epsilon,c} \otimes E_c\Big)\Big(\sum_{c \in \mathcal{S}} J^*_{\epsilon,c} \otimes E_c\Big) = \sum_{c \in \mathcal{S}} J_{\epsilon,c} J^*_{\epsilon,c} \otimes E_c \qquad (118)$$

and that

$$\rho(\mathcal{J}_\epsilon \mathcal{J}^*_\epsilon) = \max_{c \in \mathcal{S}} \rho\big(J_{\epsilon,c} J^*_{\epsilon,c}\big) \le \max_{c \in \mathcal{S}} \big\|J_{\epsilon,c} J^*_{\epsilon,c}\big\|_1 \qquad (119)$$

where $\|\cdot\|_1$ is the maximum absolute column sum. Let $c_1 \triangleq \arg\max_{c \in \mathcal{S}} \|J_{\epsilon,c} J^*_{\epsilon,c}\|_1$. Say $J_{\epsilon,c_1}$ has $L_{c_1}$ Jordan blocks, each with eigenvalue $\lambda_{c_1,\ell}$, $1 \le \ell \le L_{c_1}$. Then,

$$\rho\big(J_{\epsilon,c_1} J^*_{\epsilon,c_1}\big) \le \max_{1 \le \ell \le L_{c_1}} \big(|\lambda_{c_1,\ell}| + \epsilon\big)^2 = \big(\rho(J_{\epsilon,c_1}) + \epsilon\big)^2 \qquad (120)$$

So, $\rho(\mathcal{J}_\epsilon \mathcal{J}^*_\epsilon) \le \big(\rho(J_{\epsilon,c_1}) + \epsilon\big)^2$. Note that $\rho(J_{\epsilon,c}) < 1$ for all $c \in \mathcal{S}$. Hence, $\epsilon$ can be selected such that $\rho(J_{\epsilon,c_1}) + \epsilon \in (0,1)$. Substituting the upper bound (120) into (117),

$$\mathbb{E}\|\check{w}_i\|^2 \le \frac{1}{t}\big(\rho(J_{\epsilon,c_1}) + \epsilon\big)^2\, \mathbb{E}\|\check{w}_{i-1}\|^2 + \mathbb{E}\|\check{v}^s_i\|^2 + \frac{1}{1-t}\, \mathbb{E}\big\|\mathcal{D}^\top_{i,22}\check{w}_{i-1} + \mathcal{D}^\top_{i,12}\bar{w}_{i-1}\big\|^2. \qquad (121)$$

Choosing $t = \rho(J_{\epsilon,c_1}) + \epsilon \in (0,1)$ leads to

$$\mathbb{E}\|\check{w}_i\|^2 \le \big(\rho(J_{\epsilon,c_1}) + \epsilon\big)\, \mathbb{E}\|\check{w}_{i-1}\|^2 + \mathbb{E}\|\check{v}^s_i\|^2 + \frac{1}{1 - \rho(J_{\epsilon,c_1}) - \epsilon}\, \mathbb{E}\big\|\mathcal{D}^\top_{i,22}\check{w}_{i-1} + \mathcal{D}^\top_{i,12}\bar{w}_{i-1}\big\|^2. \qquad (122)$$

As for the third term in the upper bound in (122),

$$\mathbb{E}\big\|\mathcal{D}^\top_{i,22}\check{w}_{i-1} + \mathcal{D}^\top_{i,12}\bar{w}_{i-1}\big\|^2 \le 2\,\mathbb{E}\big\|\mathcal{D}^\top_{i,22}\check{w}_{i-1}\big\|^2 + 2\,\mathbb{E}\big\|\mathcal{D}^\top_{i,12}\bar{w}_{i-1}\big\|^2 \le 2\sigma^2_{22}\mu^2_{\max}\, \mathbb{E}\|\check{w}_{i-1}\|^2 + 2\sigma^2_{12}\mu^2_{\max}\, \mathbb{E}\|\bar{w}_{i-1}\|^2. \qquad (123)$$

Substituting the upper bound in (123) into (122),

$$\mathbb{E}\|\check{w}_i\|^2 \le \Big(\rho(J_{\epsilon,c_1}) + \epsilon + \frac{2\sigma^2_{22}\mu^2_{\max}}{1 - \rho(J_{\epsilon,c_1}) - \epsilon}\Big)\, \mathbb{E}\|\check{w}_{i-1}\|^2 + \frac{2\sigma^2_{12}\mu^2_{\max}}{1 - \rho(J_{\epsilon,c_1}) - \epsilon}\, \mathbb{E}\|\bar{w}_{i-1}\|^2 + \mathbb{E}\|\check{v}^s_i\|^2. \qquad (124)$$

It remains to bound the update noise terms in (116) and (124),

Ek ˇwik2≤  ρ(Jǫ,c1) + ǫ + 2σ2 22µ2max 1− ρ(Jǫ,c1)− ǫ  Ek ˇwi−1k2 + 2σ 2 12µ2max 1− ρ(Jǫ,c1)− ǫ Ek ¯wi−1k2+ Ekˇvsik 2 . (124) It remains to bound the update noise terms in (116) and (124),

Ek¯vs ik 2 and Ekˇvs ik 2 , respectively: Ek¯vs ik 2 + Ekˇvs ik 2 = E V⊤ ǫ A⊤Mvis 2 ≤ Vǫ⊤A⊤ 2 kMk2Ekvisk 2 ≤ v12µ2maxEkvisk 2 (125) where v1 , V⊤ ǫ A⊤

> 0, independent of µmax. Note that,

for alli, Ekvs ik 2 = N X k=1 E vsk,i 2 ≤ N X k=1 σs,k2 = σs2 (126)

where we used (76) and (68), and σ2 s , PN k=1σs,k2 . Hence, referring back to (125), Ek¯vs ik 2 + Ekˇvs ik 2 ≤ v2 1µ2maxσs2. (127)

Substituting into the upper bounds for $\mathbb{E}\|\bar{w}_i\|^2$ and $\mathbb{E}\|\check{w}_i\|^2$ in (116) and (124), respectively,

$$\mathbb{E}\|\bar{w}_i\|^2 \le (1 - \sigma_{11}\mu_{\max})\, \mathbb{E}\|\bar{w}_{i-1}\|^2 + \frac{\sigma^2_{21}\mu_{\max}}{\sigma_{11}}\, \mathbb{E}\|\check{w}_{i-1}\|^2 + v^2_1\mu^2_{\max}\sigma^2_s \qquad (128a)$$
$$\mathbb{E}\|\check{w}_i\|^2 \le \Big(\rho(J_{\epsilon,c_1}) + \epsilon + \frac{2\sigma^2_{22}\mu^2_{\max}}{1 - \rho(J_{\epsilon,c_1}) - \epsilon}\Big)\, \mathbb{E}\|\check{w}_{i-1}\|^2 + \frac{2\sigma^2_{12}\mu^2_{\max}}{1 - \rho(J_{\epsilon,c_1}) - \epsilon}\, \mathbb{E}\|\bar{w}_{i-1}\|^2 + v^2_1\mu^2_{\max}\sigma^2_s \qquad (128b)$$

Introducing the scalar coefficients $a = 1 - \sigma_{11}\mu_{\max} = 1 - \mathcal{O}(\mu_{\max})$, $b = \frac{\sigma^2_{21}\mu_{\max}}{\sigma_{11}} = \mathcal{O}(\mu_{\max})$, $c = \frac{2\sigma^2_{12}\mu^2_{\max}}{1 - \rho(J_{\epsilon,c_1}) - \epsilon} = \mathcal{O}(\mu^2_{\max})$, $d = \rho(J_{\epsilon,c_1}) + \epsilon + \frac{2\sigma^2_{22}\mu^2_{\max}}{1 - \rho(J_{\epsilon,c_1}) - \epsilon} = \rho(J_{\epsilon,c_1}) + \epsilon + \mathcal{O}(\mu^2_{\max})$, and $e = v^2_1\mu^2_{\max}\sigma^2_s$, we can write the inequalities (128a)–(128b) more compactly as

$$\begin{bmatrix} \mathbb{E}\|\bar{w}_i\|^2 \\ \mathbb{E}\|\check{w}_i\|^2 \end{bmatrix} \preceq \underbrace{\begin{bmatrix} a & b \\ c & d \end{bmatrix}}_{\Gamma} \begin{bmatrix} \mathbb{E}\|\bar{w}_{i-1}\|^2 \\ \mathbb{E}\|\check{w}_{i-1}\|^2 \end{bmatrix} + e\,\mathbb{1}_2 \qquad (129)$$

where $\preceq$ denotes entry-wise less-than-or-equal-to comparison; and the entries of the matrix $\Gamma$ indicated above are of the form

$$\Gamma = \begin{bmatrix} 1 - \mathcal{O}(\mu_{\max}) & \mathcal{O}(\mu_{\max}) \\ \mathcal{O}(\mu^2_{\max}) & \rho(J_{\epsilon,c_1}) + \epsilon + \mathcal{O}(\mu^2_{\max}) \end{bmatrix}. \qquad (130)$$

Since the spectral radius of a matrix is upper bounded by any of its norms, then, invoking this property in terms of the maximum absolute column sum, it holds that

$$\rho(\Gamma) \le \max\Big\{1 - \mathcal{O}(\mu_{\max}) + \mathcal{O}(\mu^2_{\max}),\ \rho(J_{\epsilon,c_1}) + \epsilon + \mathcal{O}(\mu_{\max}) + \mathcal{O}(\mu^2_{\max})\Big\}. \qquad (131)$$

Since $\rho(J_{\epsilon,c_1}) < 1$ and is independent of $\mu_{\max}$, and since the latter and $\epsilon$ are small positive numbers that can be chosen arbitrarily small and independently of each other, it is clear that the upper bound above can be made strictly smaller than one for sufficiently small $\mu_{\max}$ and $\epsilon$. If this is the case, $\rho(\Gamma) < 1$, so that $\Gamma$ is a stable matrix. Given stability, if (129) is iterated, it can be concluded that

$$\limsup_{i\to\infty} \begin{bmatrix} \mathbb{E}\|\bar{w}_i\|^2 \\ \mathbb{E}\|\check{w}_i\|^2 \end{bmatrix} \preceq (I - \Gamma)^{-1}\, e\,\mathbb{1}_2 = \begin{bmatrix} \mathcal{O}(\mu_{\max}) \\ \mathcal{O}(\mu^2_{\max}) \end{bmatrix}, \qquad (132)$$

i.e.,

$$\limsup_{i\to\infty} \mathbb{E}\|\bar{w}_i\|^2 = \mathcal{O}(\mu_{\max}) \qquad (133)$$
$$\limsup_{i\to\infty} \mathbb{E}\|\check{w}_i\|^2 = \mathcal{O}(\mu^2_{\max}) \qquad (134)$$

This leads to

$$\limsup_{i\to\infty} \mathbb{E}\|\widetilde{w}_i\|^2 = \limsup_{i\to\infty} \mathbb{E}\bigg\|\big(\mathcal{V}^{-1}_\epsilon\big)^\top \begin{bmatrix} \bar{w}_i \\ \check{w}_i \end{bmatrix}\bigg\|^2 \le \limsup_{i\to\infty} v^2_2 \big[\mathbb{E}\|\bar{w}_i\|^2 + \mathbb{E}\|\check{w}_i\|^2\big] = \mathcal{O}(\mu_{\max}) \qquad (135)$$

with $v_2 \triangleq \|\big(\mathcal{V}^{-1}_\epsilon\big)^\top\|$, implying that, for each $k$, $\limsup_{i\to\infty} \mathbb{E}\|\widetilde{w}_{k,i}\|^2 = \mathcal{O}(\mu_{\max})$, thus concluding the proof. ∎

APPENDIX B
APPROXIMATION FOR $Q_{k,c,i}$

Setting $A_{\ell k}$ to $\mathcal{W}_k$ if $\ell = k$, and $0$ otherwise, $K_{k,i}$ in (20) becomes

$$K_{k,i} = \big[\mu_k U^\top_i E_k \;\;\big|\;\; Y_{k,i} K_{k,i-1}\big] = \big[\mu_k U^\top_i E_k \;\; \mu_k Y_{k,i} U^\top_{i-1} E_k \;\; \dots \;\; \mu_k (Y_{k,i} \cdots Y_{k,1}) U^\top_0 E_k\big] \qquad (136)$$

Note that, for sufficiently small step-sizes $\{\mu_k\}$,

$$Y_{k,i} \cdots Y_{k,j+1} \approx I - \mu_k \sum_{m=j+1}^{i} u^\top_{k,m} u_{k,m}. \qquad (137)$$

Evaluating the first parenthesized expression in (40) and ignoring $\mathcal{O}(\mu^2_k)$ terms,

$$P_c K_{k,i} U_{0:i} P^\top_c = \mu_k P_c \Big[\sum_{j=0}^{i} (Y_{k,i} \cdots Y_{k,j+1})\, u^\top_{k,j} u_{k,j}\Big] P^\top_c \approx \mu_k \sum_{j=0}^{i} u^\top_{k,c,j} u_{k,c,j}. \qquad (138)$$

Ignoring $\mathcal{O}(\mu^2_k)$ terms can also be motivated by asymptotically vanishing cross-interest interference, established in Lemma 1. Evaluating the second parenthesized expression in (40) and keeping terms only up to $\mathcal{O}(\mu^2_k)$,

$$P_c K_{k,i} R_{v,0:i} K^\top_{k,i} P^\top_c = \mu^2_k \sigma^2_{v,k} P_c \Big[\sum_{j=0}^{i} (Y_{k,i} \cdots Y_{k,j+1})\, u^\top_{k,j} u_{k,j}\, (Y_{k,i} \cdots Y_{k,j+1})^\top\Big] P^\top_c \approx \mu^2_k \sigma^2_{v,k} \sum_{j=0}^{i} u^\top_{k,c,j} u_{k,c,j} \approx \mu_k \sigma^2_{v,k}\, P_c K_{k,i} U_{0:i} P^\top_c \qquad (139)$$

Since constants multiplying test-statistics have no bearing on the detection performance, it can be concluded that a good approximation for $Q_{k,c,i}$ under small step-sizes $\{\mu_k\}$ is $Q_{k,c,i} = I_{M_c}$.

APPENDIX C
MEAN STABILITY OF DIFFUSION ESTIMATION ALGORITHM

Starting from (50), and with $\mathcal{B}_i = \mathcal{A}^\top[I - \mathcal{M}\mathcal{H}_i]$, the network mean weight-error recursion is given by $\mathbb{E}\,\widetilde{w}_i = \mathcal{B}_i\, \mathbb{E}\,\widetilde{w}_{i-1}$. Transforming the recursion by pre-multiplying it with $\mathcal{V}^\top_\epsilon$ in (89), and using (110) and (108), yields

$$\underbrace{\begin{bmatrix} \mathbb{E}\,\bar{w}_i \\ \mathbb{E}\,\check{w}_i \end{bmatrix}}_{\triangleq\, z_i} = \underbrace{\begin{bmatrix} I_M - \mathcal{D}^\top_{i,11} & -\mathcal{D}^\top_{i,21} \\ -\mathcal{D}^\top_{i,12} & \mathcal{J}^\top_\epsilon - \mathcal{D}^\top_{i,22} \end{bmatrix}}_{\triangleq\, \bar{\mathcal{B}}_i} \underbrace{\begin{bmatrix} \mathbb{E}\,\bar{w}_{i-1} \\ \mathbb{E}\,\check{w}_{i-1} \end{bmatrix}}_{\triangleq\, z_{i-1}}. \qquad (140)$$

Let $\bar{z}_i \triangleq \mathbb{E}\,\bar{w}_i$ and $\check{z}_i \triangleq \mathbb{E}\,\check{w}_i$, such that $z_i = [\bar{z}^\top_i\ \check{z}^\top_i]^\top$ in (140). Following a similar procedure as in Appendix A, $\|\bar{z}_i\|^2$ and $\|\check{z}_i\|^2$ can be bounded as

$$\|\bar{z}_i\|^2 \le (1 - \sigma_{11}\mu_{\max})\, \|\bar{z}_{i-1}\|^2 + \frac{\sigma^2_{21}\mu_{\max}}{\sigma_{11}}\, \|\check{z}_{i-1}\|^2 \qquad (141a)$$
$$\|\check{z}_i\|^2 \le \Big(\rho(J_{\epsilon,c_1}) + \epsilon + \frac{2\sigma^2_{22}\mu^2_{\max}}{1 - \rho(J_{\epsilon,c_1}) - \epsilon}\Big)\, \|\check{z}_{i-1}\|^2 + \frac{2\sigma^2_{12}\mu^2_{\max}}{1 - \rho(J_{\epsilon,c_1}) - \epsilon}\, \|\bar{z}_{i-1}\|^2 \qquad (141b)$$

These bounds are of the same form as (128a)–(128b) without the update noise terms; iterating them as in Appendix A shows that, for sufficiently small $\mu_{\max}$ and $\epsilon$, both $\|\bar{z}_i\|^2$ and $\|\check{z}_i\|^2$ converge to zero, so that $\lim_{i\to\infty} \mathbb{E}\,\widetilde{w}_i = 0$, which concludes the proof. ∎
