BLIND SEPARATION OF EXPONENTIAL POLYNOMIALS AND THE DECOMPOSITION OF A TENSOR IN RANK-(L_r, L_r, 1) TERMS∗
LIEVEN DE LATHAUWER†
Abstract. We present a new necessary and sufficient condition for essential uniqueness of the decomposition of a third-order tensor in rank-(Lr, Lr, 1) terms. We derive a new deterministic technique for blind signal separation that relies on this decomposition. The method assumes that the signals can be modeled as linear combinations of exponentials or, more generally, as exponential polynomials. The results are illustrated by means of numerical experiments.
Key words. multilinear algebra, higher-order tensor, singular value decomposition, canonical polyadic decomposition, block term decomposition, blind signal separation, exponential polynomial
AMS subject classifications. 15A18, 15A69
DOI. 10.1137/100805510
1. Introduction. In section 1.1 we explain our notation and give some basic definitions. In section 1.2 we briefly recall the multilinear singular value decomposition (MLSVD) and the canonical polyadic decomposition (CPD). In section 1.3 we set out the goals of this paper and explain how it is organized.
1.1. Preliminaries.
1.1.1. Notation. Scalars are denoted by lower-case letters (a, b, . . . ), vectors are written in boldface lower-case (a, b, . . . ), matrices correspond to boldface capitals (A, B, . . . ), and tensors are written as calligraphic letters (A, B, . . . ). This notation is consistently used for lower-order parts of a given quantity. For instance, the entry with row index i and column index j in a matrix A, i.e., (A)_{ij}, is denoted by a_{ij} (also (a)_i = a_i and (A)_{ijk} = a_{ijk}). The ith column vector of a matrix A is denoted by a_i, i.e., A = [a_1 a_2 . . .]. Italic capitals indicate index upper bounds (e.g., i = 1, 2, . . . , I).
K stands for R or C. The symbol ⊗ denotes the Kronecker product,

    A ⊗ B = ⎡ a_{11}B  a_{12}B  ··· ⎤
            ⎢ a_{21}B  a_{22}B  ··· ⎥
            ⎣    ⋮        ⋮     ⋱  ⎦ .
The Khatri–Rao or columnwise Kronecker product is represented by ⊙:

    A ⊙ B = [a_1 ⊗ b_1  a_2 ⊗ b_2  ···] .
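These two products are easy to check numerically. A minimal NumPy sketch (the helper name `khatri_rao` is ours; NumPy itself only provides the Kronecker product `np.kron`):

```python
import numpy as np

def khatri_rao(A, B):
    """Columnwise Kronecker product: [a_1 (x) b_1  a_2 (x) b_2  ...]."""
    assert A.shape[1] == B.shape[1], "A and B need the same number of columns"
    return np.column_stack([np.kron(A[:, r], B[:, r]) for r in range(A.shape[1])])

A = np.arange(1, 7).reshape(2, 3)
B = np.arange(1, 13).reshape(4, 3)

K1 = np.kron(A, B)        # (2*4) x (3*3) Kronecker product
K2 = khatri_rao(A, B)     # (2*4) x 3: one Kronecker column per column pair
print(K1.shape, K2.shape) # (8, 9) (8, 3)
```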
∗Received by the editors August 16, 2010; accepted for publication (in revised form) September 13, 2011; published electronically December 8, 2011. This research was supported by the Research Council K.U. Leuven: GOA-MaNet, CoE EF/05/006 Optimization in Engineering (OPTEC), CIF1 and STRT1/08/023; the F.W.O.: (a) project G.0427.10N, (b) Research Communities ICCoS, ANMMM, and MLDM; the Belgian Federal Science Policy Office: IUAP P6/04 (DYSCO, “Dynamical systems, control and optimization,” 2007–2011); EU: ERNSI.
http://www.siam.org/journals/simax/32-4/80551.html
†Group Science, Engineering and Technology, Katholieke Universiteit Leuven Campus Kortrijk, E. Sabbelaan 53, 8500 Kortrijk, Belgium (Lieven.DeLathauwer@kuleuven-kortrijk.be), and Department of Electrical Engineering (ESAT), Research Division SCD, Katholieke Universiteit Leuven, Kasteelpark Arenberg 10, B-3001 Leuven, Belgium (Lieven.DeLathauwer@esat.kuleuven.be, http://homes.esat.kuleuven.be/~delathau/home.html).
The column space of a matrix is denoted by span(A). The rank of a matrix A is denoted by rank(A). The superscripts ·^T, ·^*, ·^H, and ·^† denote the transpose, complex conjugate, complex conjugated transpose, and Moore–Penrose pseudoinverse, respectively. The operator diag(·) stacks its scalar arguments in a square diagonal matrix. For vectorization of a matrix A = [a_1 a_2 . . .] we adopt the following convention: vec(A) = [a_1^T a_2^T . . .]^T. The (N × N) identity matrix is represented by I_{N×N}.

1.1.2. Basic definitions.
Definition 1.1. Consider T ∈ K^{I1×I2×I3} and U^(1) ∈ K^{J1×I1}, U^(2) ∈ K^{J2×I2}, U^(3) ∈ K^{J3×I3}. Then the products T ·_1 U^(1), T ·_2 U^(2), and T ·_3 U^(3) are defined by

    (T ·_1 U^(1))_{j1 i2 i3} = Σ_{i1=1}^{I1} t_{i1 i2 i3} u^(1)_{j1 i1}   ∀ j1, i2, i3,
    (T ·_2 U^(2))_{i1 j2 i3} = Σ_{i2=1}^{I2} t_{i1 i2 i3} u^(2)_{j2 i2}   ∀ i1, j2, i3,
    (T ·_3 U^(3))_{i1 i2 j3} = Σ_{i3=1}^{I3} t_{i1 i2 i3} u^(3)_{j3 i3}   ∀ i1, i2, j3,

respectively.
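The three mode products can be written compactly with `np.einsum`; a small sketch with our own helper names:

```python
import numpy as np

# Mode products of Definition 1.1, expressed with einsum (helper names are ours).
def mode1(T, U): return np.einsum('abc,ja->jbc', T, U)   # T ._1 U
def mode2(T, U): return np.einsum('abc,jb->ajc', T, U)   # T ._2 U
def mode3(T, U): return np.einsum('abc,jc->abj', T, U)   # T ._3 U

T = np.random.randn(3, 4, 5)
U1 = np.random.randn(2, 3)
print(mode1(T, U1).shape)   # (2, 4, 5)
```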
Definition 1.2. The outer product A ◦ B of a tensor A ∈ K^{I1×I2×···×IP} and a tensor B ∈ K^{J1×J2×···×JQ} is the tensor defined by

    (A ◦ B)_{i1 i2 ... iP j1 j2 ... jQ} = a_{i1 i2 ... iP} b_{j1 j2 ... jQ}

for all values of the indices. For instance, the outer product T of three vectors a, b, and c is defined by t_{ijk} = a_i b_j c_k for all values of the indices. The outer product T of a matrix E and a vector c is defined by t_{ijk} = e_{ij} c_k for all values of the indices.
Definition 1.3. In the matrix representations T_{I3I2×I1} ∈ K^{I3I2×I1}, T_{I3I1×I2} ∈ K^{I3I1×I2}, T_{I2I1×I3} ∈ K^{I2I1×I3} of a tensor T ∈ K^{I1×I2×I3}, the entries are stacked as follows:

(1.1)    (T_{I3I2×I1})_{(i3−1)I2+i2, i1} = (T_{I3I1×I2})_{(i3−1)I1+i1, i2} = (T_{I2I1×I3})_{(i2−1)I1+i1, i3} = t_{i1 i2 i3}   ∀ i1, i2, i3.
Definition 1.4. A third-order tensor T ∈ K^{I1×I2×I3} has multilinear rank (R1, R2, R3) iff rank(T_{I3I2×I1}) = R1, rank(T_{I3I1×I2}) = R2, and rank(T_{I2I1×I3}) = R3.

The values R1, R2, and R3 are sometimes called the mode-1, mode-2, and mode-3 rank, respectively. If the multilinear rank of a third-order tensor T is equal to (R1, R2, R3), then T is called rank-(R1, R2, R3). A rank-(1, 1, 1) tensor is briefly called rank-1. This implies that a third-order tensor is rank-1 iff it equals the outer product of three nonzero vectors.
Besides multilinear rank, a second generalization of matrix rank to higher-order tensors is the following.
Definition 1.5. The rank of a tensor T is the minimal number of rank-1 tensors that yield T in a linear combination.
An early discussion of the generalization of matrix rank to higher-order tensors
can be found in [12, 13].
1.2. Basic tensor decompositions.
Definition 1.6. An MLSVD of a rank-(R1, R2, R3) tensor T ∈ K^{I1×I2×I3} is a decomposition of T of the form

(1.2)    T = S ·_1 U^(1) ·_2 U^(2) ·_3 U^(3),

in which
• the matrices U^(1) ∈ K^{I1×I1}, U^(2) ∈ K^{I2×I2}, and U^(3) ∈ K^{I3×I3} are orthogonal (unitary);
• the tensor S ∈ K^{I1×I2×I3} is
  – all-orthogonal (all-unitary):

    Σ_{i2=1}^{I2} Σ_{i3=1}^{I3} s_{j1 i2 i3} s*_{j2 i2 i3} = (σ^(1)_{i1})^2 if j1 = j2 = i1,   = 0 if j1 ≠ j2,
    Σ_{i1=1}^{I1} Σ_{i3=1}^{I3} s_{i1 j1 i3} s*_{i1 j2 i3} = (σ^(2)_{i2})^2 if j1 = j2 = i2,   = 0 if j1 ≠ j2,
    Σ_{i1=1}^{I1} Σ_{i2=1}^{I2} s_{i1 i2 j1} s*_{i1 i2 j2} = (σ^(3)_{i3})^2 if j1 = j2 = i3,   = 0 if j1 ≠ j2;

  – ordered:

    σ^(1)_1 ≥ σ^(1)_2 ≥ ··· ≥ σ^(1)_{R1} > σ^(1)_{R1+1} = ··· = σ^(1)_{I1} = 0,
    σ^(2)_1 ≥ σ^(2)_2 ≥ ··· ≥ σ^(2)_{R2} > σ^(2)_{R2+1} = ··· = σ^(2)_{I2} = 0,
    σ^(3)_1 ≥ σ^(3)_2 ≥ ··· ≥ σ^(3)_{R3} > σ^(3)_{R3+1} = ··· = σ^(3)_{I3} = 0.

The values σ^(1)_{i1}, σ^(2)_{i2}, and σ^(3)_{i3} are called mode-1, mode-2, and mode-3 singular values, respectively. The columns of U^(1), U^(2), and U^(3) are called mode-1, mode-2, and mode-3 singular vectors, respectively.
The MLSVD has its roots in [30, 31] and was further studied in [5], where it was called higher-order singular value decomposition (HOSVD).
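Concretely, the mode-n singular vectors can be obtained from SVDs of the matrix representations (1.1). The following NumPy sketch (our own code, not the paper's) computes an MLSVD of a random tensor and verifies the reconstruction:

```python
import numpy as np

def unfold(T, mode):
    """Matrix representations of (1.1)."""
    I1, I2, I3 = T.shape
    perm = {1: (2, 1, 0), 2: (2, 0, 1), 3: (1, 0, 2)}[mode]
    rows = {1: I3 * I2, 2: I3 * I1, 3: I2 * I1}[mode]
    return T.transpose(perm).reshape(rows, T.shape[mode - 1])

T = np.random.randn(4, 5, 6)
# Mode-n singular vectors = right singular vectors of the mode-n representation.
U = [np.linalg.svd(unfold(T, n))[2].conj().T for n in (1, 2, 3)]
# Core tensor S = T ._1 U1^H ._2 U2^H ._3 U3^H
S = np.einsum('abc,aj,bk,cl->jkl', T, U[0].conj(), U[1].conj(), U[2].conj())
# Reconstruction T = S ._1 U1 ._2 U2 ._3 U3
T_rec = np.einsum('jkl,aj,bk,cl->abc', S, U[0], U[1], U[2])
print(np.allclose(T, T_rec))   # True
```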
Definition 1.7. A canonical polyadic decomposition (CPD) of a rank-R tensor T ∈ K^{I1×I2×I3} is a decomposition of T in a linear combination of R rank-1 terms:

(1.3)    T = Σ_{r=1}^{R} a_r ◦ b_r ◦ c_r.
The fully symmetric variant, in which a_r = b_r = c_r, r = 1, . . . , R, was studied in the nineteenth century in the context of invariant theory [3]. The unsymmetric decomposition was introduced in 1927 [12, 13]. Around 1970, the unsymmetric decomposition was independently reintroduced in psychometrics [1] and phonetics [11], where it was called canonical decomposition (CANDECOMP) and parallel factor decomposition (PARAFAC), respectively. The terms CANDECOMP and PARAFAC are sometimes used when the number of rank-1 terms is not minimal.
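A CPD-structured tensor as in (1.3) can be synthesized and checked entrywise; a small NumPy sketch with hypothetical factor matrices:

```python
import numpy as np

# Build a rank-3 tensor from its CPD factors and verify (1.3) entrywise.
# The factor matrices A, B, C are random; the names are ours, not the paper's.
I1, I2, I3, R = 4, 5, 6, 3
rng = np.random.default_rng(0)
A = rng.standard_normal((I1, R))
B = rng.standard_normal((I2, R))
C = rng.standard_normal((I3, R))

# T = sum_r a_r o b_r o c_r
T = np.einsum('ir,jr,kr->ijk', A, B, C)

# entry check: t_{ijk} = sum_r a_{ir} b_{jr} c_{kr}
print(np.isclose(T[1, 2, 3], sum(A[1, r] * B[2, r] * C[3, r] for r in range(R))))   # True
```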
1.3. Goals and organization. In this paper we further study a tensor decomposition that we have recently introduced, namely the decomposition of a third-order tensor in rank-(L_r, L_r, 1) terms [6, 7, 8], and we develop a new technique for blind signal separation, based on this decomposition.
Section 2 contains the discussion of the tensor decomposition as such. In section 2.1 the definition is recalled. Section 2.2 concerns the uniqueness of the decomposition.
Theorem 2.4 is a new result. Section 2.3 is a note on computational methods.
Blind signal separation has been an active research area in the last twenty years;
see, for instance, the books [2, 4, 14]. A prominent technique is independent compo- nent analysis (ICA). In ICA, signals are separated on the basis of their hypothesized statistical independence. In this paper we develop a new deterministic technique for blind signal separation. Signals are separated under the assumption that they can be modeled as exponential polynomials. Such signals are ubiquitous; as a matter of fact, exponentials play a very fundamental role in signal processing and system theory [17, 24].
Section 3 derives the blind signal separation technique at a conceptual level.
The problem is formulated in terms of the decomposition in rank-(L_r, L_r, 1) terms of a third-order tensor with partial Hankel structure. Section 4 considers in detail signals that can be modeled as a linear combination of exponentials. The results are generalized to exponential polynomials in section 5. In section 6 the computational cost is reduced by means of tensor compression. Section 7 addresses the particular case of exponential polynomials that are mutually orthogonal. Section 8 illustrates the results by means of some numerical experiments. We conclude in section 9.
2. The decomposition of a third-order tensor in rank-(L_r, L_r, 1) terms.

2.1. Definition and representation. In [6, 7, 8] we introduced block term decompositions (BTD) of a higher-order tensor. If there is only one term, then the BTD of a given tensor reduces to its MLSVD. If all the terms have multilinear rank (1, 1, 1), then the BTD of a third-order tensor reduces to its CPD. A particular BTD is the decomposition in rank-(L_r, L_r, 1) terms.
Definition 2.1. A decomposition of a tensor T ∈ K^{I1×I2×I3} in a sum of rank-(L_r, L_r, 1) terms, 1 ≤ r ≤ R, is a decomposition of T of the form

(2.1)    T = Σ_{r=1}^{R} E_r ◦ c_r,

in which the matrix E_r ∈ K^{I1×I2} is rank-L_r and the vector c_r ∈ K^{I3} is nonzero, 1 ≤ r ≤ R. We assume that R is minimal.

Note that the multilinear rank of the rth term in (2.1) is indeed equal to (L_r, L_r, 1).
It is clear that in (2.1) one can permute the rth and r′th terms when L_r = L_{r′}. Also, one can scale E_r, provided c_r is counterscaled. We call the decomposition essentially unique when it is only subject to these trivial indeterminacies. If E_r = U_r · Σ_r · V_r^H denotes the SVD of E_r, 1 ≤ r ≤ R, then

    T = Σ_{r=1}^{R} [U_r · (‖c_r‖ Σ_r) · V_r^H] ◦ (c_r/‖c_r‖) = Σ_{r=1}^{R} (‖c_r‖ Σ_r) ·_1 U_r ·_2 V_r^* ·_3 (c_r/‖c_r‖)

is a representation of (2.1) in which each term is in MLSVD form.
If we factorize E_r as A_r · B_r^T, in which the matrix A_r ∈ K^{I1×L_r} and the matrix B_r ∈ K^{I2×L_r} are rank-L_r, r = 1, . . . , R, then we can write (2.1) as

(2.2)    T = Σ_{r=1}^{R} (A_r · B_r^T) ◦ c_r.
The decomposition is visualized in Figure 2.1. Define A = [A_1 ··· A_R], B = [B_1 ··· B_R], C = [c_1 ··· c_R]. In terms of the matrix representations of T defined in (1.1), (2.2) can be written as

(2.3)    T_{I3I2×I1} = (C ⊙ B) · A^T,
(2.4)    T_{I3I1×I2} = (C ⊙ A) · B^T,
(2.5)    T_{I2I1×I3} = [vec(E_1) ··· vec(E_R)] · C^T,

where the Khatri–Rao products are taken partitionwise, i.e., C ⊙ B = [c_1 ⊗ B_1 ··· c_R ⊗ B_R].
Fig. 2.1. Visual representation of the decomposition in rank-(L_r, L_r, 1) terms.
2.2. Uniqueness. In [7] several conditions were mentioned under which essential uniqueness of the decomposition is guaranteed. We recall [7, Theorem 4.1], which also appeared in a slightly different form in [26].
Theorem 2.2. Consider a decomposition of T ∈ K^{I1×I2×I3} in rank-(L_r, L_r, 1) terms as in (2.1)–(2.2), with I1, I2 ≥ Σ_{r=1}^{R} L_r. If A = [A_1 A_2 ··· A_R] and B = [B_1 B_2 ··· B_R] have full column rank and C does not have proportional columns, then the decomposition is essentially unique.
Below we propose a new uniqueness theorem (Theorem 2.4). This theorem gener- alizes the CPD uniqueness result [15, Condition A]. We first reformulate [15, Condition A] as Theorem 2.3 and give a new proof. The reasoning will be generalized to prove Theorem 2.4.
Theorem 2.3. Consider the CPD (1.3) of T ∈ K^{I1×I2×I3}. Define E(w) = Σ_{r=1}^{R} w_r a_r · b_r^T. Assume that the following conditions are satisfied:
(C1) For every w that has at least two nonzero entries, we have that rank(E(w)) > 1.
(C2) The columns of C = [c_1 c_2 ··· c_R] are linearly independent.
Then the CPD of T is essentially unique. On the other hand, if condition (C1) is not satisfied, then the CPD of T is not essentially unique.
Proof. Assume that there exists an alternative decomposition

(2.6)    T = Σ_{r=1}^{R} ã_r ◦ b̃_r ◦ c̃_r.

Denote Z = C^{†,T}, Z̃ = C̃^{†,T}, and Y = C̃^† C. We have

    T ·_3 z̃_1^T = ã_1 · b̃_1^T = Σ_{r=1}^{R} w^(1)_r a_r · b_r^T,

where w^(1)_r = y_{1r}. Because of condition (C1), we have that ã_1 · b̃_1^T = d_1 a_r · b_r^T for some r ∈ {1, 2, . . . , R}, with d_1 ≠ 0. Because of condition (C2), we have that z̃_1 = d_1 z_r. If r ≠ 1, the first and rth terms in (1.3) can without loss of generality be switched. We conclude that ã_1 · b̃_1^T = d_1 a_1 · b_1^T and z̃_1 = d_1 z_1, with d_1 ≠ 0.
Induction step. Let us assume that ã_r · b̃_r^T = d_r a_r · b_r^T and z̃_r = d_r z_r with d_r ≠ 0, 1 ≤ r ≤ R̄ − 1. Then we prove that also ã_R̄ · b̃_R̄^T = d_R̄ a_R̄ · b_R̄^T and z̃_R̄ = d_R̄ z_R̄ with d_R̄ ≠ 0. We have

    T ·_3 z̃_R̄^T = ã_R̄ · b̃_R̄^T = Σ_{r=1}^{R} w^(R̄)_r a_r · b_r^T,

where w^(R̄)_r = y_{R̄r}. Because of condition (C1), we have that ã_R̄ · b̃_R̄^T = d_r a_r · b_r^T for some r ∈ {1, 2, . . . , R}, with d_r ≠ 0. Because of condition (C2), we have that z̃_R̄ = d_r z_r. Because Z̃ has full column rank, we can rule out the possibility that r < R̄. If r > R̄, the R̄th and rth terms in (1.3) can without loss of generality be switched. The induction follows.

We now have that Z̃ = ZD, in which D = diag(d_1, d_2, . . . , d_R) ∈ K^{R×R} is diagonal and nonsingular. This implies that C̃ = CD^{−1}. On the other hand, we also have that ã_r · b̃_r^T = d_r a_r · b_r^T, 1 ≤ r ≤ R. We conclude that decompositions (1.3) and (2.6) are essentially equal.
Finally, we prove that condition (C1) is necessary. Let us assume that it is not satisfied. Then there exist nonzero vectors u and v such that u · v^T = Σ_{r=1}^{R} w_r a_r · b_r^T, in which, say, w_1 ≠ 0 ≠ w_2. Without loss of generality, we assume that w_1 = 1. Then we have that a_1 · b_1^T = u · v^T − Σ_{r=2}^{R} w_r a_r · b_r^T. Hence, an alternative decomposition of T is

    T = u ◦ v ◦ c_1 + Σ_{r=2}^{R} a_r ◦ b_r ◦ (c_r − w_r c_1),

in which c_2 − w_2 c_1 is not proportional to c_2.
We now generalize Theorem 2.3 to the decomposition in rank-(L_r, L_r, 1) terms.

Theorem 2.4. Consider a decomposition of T ∈ K^{I1×I2×I3} in rank-(L_r, L_r, 1) terms as in (2.1)–(2.2). Define E(w) = Σ_{r=1}^{R} w_r E_r. Assume that the following conditions are satisfied:
(C1) For every w that has at least two nonzero entries, we have that rank(E(w)) > max_{r | w_r ≠ 0} (L_r).
(C2) The columns of C are linearly independent.
Then decomposition (2.1)–(2.2) is essentially unique. On the other hand, if condition (C1) is not satisfied, then decomposition (2.1)–(2.2) is not essentially unique.
Proof. Without loss of generality, we assume that the terms in (2.1)–(2.2) are ordered such that L_1 ≤ L_2 ≤ ··· ≤ L_R. Condition (C1) is then equivalent to the following set of conditions:
(C1.1) If w_1 ≠ 0 and ∃s ∈ {2, 3, . . . , R} such that w_s ≠ 0, then rank(E(w)) > L_1.
(C1.2) If w_2 ≠ 0 and ∃s ∈ {1, 3, . . . , R} such that w_s ≠ 0, then rank(E(w)) > L_2.
    ⋮
(C1.R) If w_R ≠ 0 and ∃s ∈ {1, 2, . . . , R−1} such that w_s ≠ 0, then rank(E(w)) > L_R.
Assume that there exists an alternative decomposition
(2.7)    T = Σ_{r=1}^{R} Ẽ_r ◦ c̃_r = Σ_{r=1}^{R} (Ã_r · B̃_r^T) ◦ c̃_r,

which is also ordered such that L_1 ≤ L_2 ≤ ··· ≤ L_R. Denote Z = C^{†,T}, Z̃ = C̃^{†,T}, and Y = C̃^† C. We have

    T ·_3 z̃_1^T = Ẽ_1 = Σ_{r=1}^{R} w^(1)_r E_r,

where w^(1)_r = y_{1r}. Since rank(Ẽ_1) = L_1, we have according to condition (C1.1) that (i) w^(1)_2 = w^(1)_3 = ··· = w^(1)_R = 0 or that (ii) w^(1)_1 = 0 and ∃s ∈ {2, 3, . . . , R} such that w^(1)_s ≠ 0. In the special case that w^(1)_1 = w^(1)_2 = ··· = w^(1)_{s̄−1} = w^(1)_{s̄+1} = ··· = w^(1)_R = 0 while w^(1)_{s̄} ≠ 0 and L_{s̄} = L_1, the first and s̄th terms in (2.1) can without loss of generality be switched. In all other cases, possibility (ii) can be ruled out because of (C1.2)–(C1.R) and the ordering constraint. We conclude that w^(1)_2 = w^(1)_3 = ··· = w^(1)_R = 0, i.e., z̃_1 = d_1 z_1 with d_1 ≠ 0.
Next, we have

    T ·_3 z̃_2^T = Ẽ_2 = Σ_{r=1}^{R} w^(2)_r E_r,

where w^(2)_r = y_{2r}. Since rank(Ẽ_2) = L_2, we have according to condition (C1.2) that (i) w^(2)_1 = w^(2)_3 = ··· = w^(2)_R = 0 or that (ii) w^(2)_2 = 0 and ∃s ∈ {1, 3, . . . , R} such that w^(2)_s ≠ 0. Let us examine the latter possibility. First, it is impossible that w^(2)_1 ≠ 0 while w^(2)_2 = w^(2)_3 = ··· = w^(2)_R = 0. Indeed, this would mean that z̃_2 is in the orthogonal complement of span([c_2 c_3 ··· c_R]). On the other hand, z̃_2 ∈ span(Z̃) = span(Z) = span(C̃) = span(C) = span(T_{I2I1×I3}^T). This would imply that z̃_2, just like z̃_1, is proportional to z_1, which is impossible since Z̃ has full column rank. Second, in the special case that w^(2)_1 = w^(2)_2 = ··· = w^(2)_{s̄−1} = w^(2)_{s̄+1} = ··· = w^(2)_R = 0 while w^(2)_{s̄} ≠ 0 and L_{s̄} = L_2, the second and s̄th terms in (2.1) can without loss of generality be switched. Third, rank(Ẽ_2) = L_2 rules out the special case w^(2)_1 = w^(2)_2 = ··· = w^(2)_{s̄−1} = w^(2)_{s̄+1} = ··· = w^(2)_R = 0, w^(2)_{s̄} ≠ 0 and L_{s̄} > L_2. Fourth, because of (C1.3)–(C1.R) it is impossible that w^(2)_1 ≠ 0, w^(2)_2 = 0, and ∃s ∈ {3, . . . , R} such that w^(2)_s ≠ 0. Fifth, because of (C1.3)–(C1.R) it is impossible that w^(2)_1 = w^(2)_2 = 0 and ∃s_1, s_2 ∈ {3, . . . , R}, s_1 ≠ s_2, such that w^(2)_{s_1} ≠ 0 ≠ w^(2)_{s_2}. We conclude that w^(2)_1 = w^(2)_3 = ··· = w^(2)_R = 0, i.e., z̃_2 = d_2 z_2 with d_2 ≠ 0.
Induction step. Let us assume that z̃_r = d_r z_r with d_r ≠ 0, 1 ≤ r ≤ R̄ − 1. Then we prove that also z̃_R̄ = d_R̄ z_R̄ with d_R̄ ≠ 0. We have

    T ·_3 z̃_R̄^T = Ẽ_R̄ = Σ_{r=1}^{R} w^(R̄)_r E_r,

where w^(R̄)_r = y_{R̄r}. Since rank(Ẽ_R̄) = L_R̄, we have according to condition (C1.R̄) that (i) w^(R̄)_1 = ··· = w^(R̄)_{R̄−1} = w^(R̄)_{R̄+1} = ··· = w^(R̄)_R = 0 or that (ii) w^(R̄)_{R̄} = 0 and ∃s ∈ {1, . . . , R̄ − 1, R̄ + 1, . . . , R} such that w^(R̄)_s ≠ 0. Let us examine the latter possibility.

First, it is impossible that w^(R̄)_{R̄} = w^(R̄)_{R̄+1} = ··· = w^(R̄)_R = 0. Indeed, this would mean that z̃_R̄ is in the orthogonal complement of span([c_R̄ c_{R̄+1} ··· c_R]). On the other hand, z̃_R̄ ∈ span(Z̃) = span(Z) = span(C̃) = span(C) = span(T_{I2I1×I3}^T). This would imply that z̃_R̄ ∈ span([z_1 ··· z_{R̄−1}]) = span([z̃_1 ··· z̃_{R̄−1}]), which is impossible since Z̃ has full column rank. Second, in the special case that w^(R̄)_1 = ··· = w^(R̄)_{R̄} = ··· = w^(R̄)_{s̄−1} = w^(R̄)_{s̄+1} = ··· = w^(R̄)_R = 0 while w^(R̄)_{s̄} ≠ 0 and L_{s̄} = L_R̄, the R̄th and s̄th terms in (2.1) can without loss of generality be switched. Third, rank(Ẽ_R̄) = L_R̄ rules out the special case w^(R̄)_1 = w^(R̄)_2 = ··· = w^(R̄)_{s̄−1} = w^(R̄)_{s̄+1} = ··· = w^(R̄)_R = 0, w^(R̄)_{s̄} ≠ 0 and L_{s̄} > L_R̄. Fourth, because of (C1.R̄+1)–(C1.R) it is impossible that ∃s_1 ∈ {1, . . . , R̄ − 1} such that w^(R̄)_{s_1} ≠ 0, w^(R̄)_{R̄} = 0 and ∃s_2 ∈ {R̄ + 1, . . . , R} such that w^(R̄)_{s_2} ≠ 0. Fifth, because of (C1.R̄+1)–(C1.R) it is impossible that w^(R̄)_1 = w^(R̄)_2 = ··· = w^(R̄)_{R̄} = 0 and ∃s_1, s_2 ∈ {R̄ + 1, . . . , R}, s_1 ≠ s_2, such that w^(R̄)_{s_1} ≠ 0 ≠ w^(R̄)_{s_2}. We conclude that w^(R̄)_1 = ··· = w^(R̄)_{R̄−1} = w^(R̄)_{R̄+1} = ··· = w^(R̄)_R = 0, i.e., z̃_R̄ = d_R̄ z_R̄ with d_R̄ ≠ 0.
After induction we have that Z̃ = ZD, in which D ∈ K^{R×R} is diagonal and nonsingular. This implies that C̃ = CD^{−1}. From (2.5) we now obtain

    T_{I2I1×I3} · Z̃ = [vec(Ẽ_1) ··· vec(Ẽ_R)] = [vec(E_1) ··· vec(E_R)] · D.

We conclude that decompositions (2.1) and (2.7) are essentially equal.
Finally, we prove that conditions (C1.1)–(C1.R) are necessary. Let us assume that (C1.R̄) is not satisfied, i.e., for some w with w_R̄ ≠ 0 and w_s ≠ 0 (s ≠ R̄) we have that rank(E(w)) ≤ L_R̄. Without loss of generality we assume that w_R̄ = 1. Then we have that E_R̄ = E(w) − Σ_{r≠R̄} w_r E_r. Hence, an alternative decomposition of T is

    T = E(w) ◦ c_R̄ + Σ_{r≠R̄} E_r ◦ (c_r − w_r c_R̄),

in which c_s − w_s c_R̄ is not proportional to c_s.
We illustrate Theorem 2.4 by means of two examples. Note that the essential uniqueness in Example 2 does not follow from Theorem 2.2, nor from any other theorem in [7].
Example 1. Consider full-rank matrices X, Y ∈ K^{3×3} and C ∈ K^{2×2}. Define the rank-2 matrices E_1 = x_1 y_1^T + x_2 y_2^T and E_2 = x_2 y_2^T + x_3 y_3^T and define a tensor T ∈ K^{3×3×2} by the decomposition in rank-(2, 2, 1) terms T = E_1 ◦ c_1 + E_2 ◦ c_2. Obviously, for this decomposition condition (C1) in Theorem 2.4 is not satisfied, since E([1 −1]) = E_1 − E_2 = x_1 y_1^T − x_3 y_3^T is also rank-2. The proof of the theorem explains how alternative decompositions can be found. We have for instance T = E([1 −1]) ◦ c_1 + E_2 ◦ (c_1 + c_2).
Example 2. Consider full-rank matrices X, Y ∈ K^{5×5}, D_1, D_2 ∈ K^{3×3}, and C ∈ K^{2×2}. Define the rank-3 matrices E_1 = [x_1 x_2 x_3] · D_1 · [y_1 y_2 y_3]^T and E_2 = [x_3 x_4 x_5] · D_2 · [y_3 y_4 y_5]^T, and define a tensor T ∈ K^{5×5×2} by the decomposition in rank-(3, 3, 1) terms T = E_1 ◦ c_1 + E_2 ◦ c_2. It is clear that w_1 E_1 + w_2 E_2 has at least rank 4 if w_1 ≠ 0 ≠ w_2. Since the conditions in Theorem 2.4 are satisfied, the decomposition of T is essentially unique.
2.3. Computation. An algorithm of the alternating least squares (ALS) type was proposed in [8]. This generalization of the most popular CPD algorithm [11, 19, 20, 27] is sometimes rather slow. Line search algorithms form an interesting alternative. In such schemes the multilinearity of the problem may be exploited to find a good or even the optimal step size [22, 25]. Enhanced line search was used to compute a BTD in [22]. In [23] a Levenberg–Marquardt algorithm was discussed.
A Levenberg–Marquardt iteration step is quite expensive, but the convergence is quadratic. We also mention that the proof of Theorem 2.2 is constructive [7, 26]. It shows that under the conditions of the theorem the decomposition can be computed via a matrix generalized eigenvalue decomposition (GEVD).
3. Blind separation and tensor decomposition. Consider the following data model:

(3.1)    Y = M · S + N,

in which Y ∈ K^{K×N} is the matrix of observed data, M ∈ K^{K×R} is an unknown mixing matrix, S ∈ K^{R×N} is a matrix that contains R unknown source signals, and N ∈ K^{K×N} represents additive noise. The goal of blind signal separation is the estimation of M and/or S, given only Y. N is considered as a perturbation of the equation and will for convenience be ignored in the presentation.
Since the factorization M ·S is not unique, we need to make some assumptions on M and/or S. In principal component analysis (PCA) it is assumed that the columns of M are mutually orthogonal and that also the rows of S are mutually orthogonal [16].
The problem then reduces to the computation of the singular value decomposition (SVD) of Y. In independent component analysis (ICA) it is assumed that the rows of S are mutually statistically independent [4]. Algebraic methods for ICA rely in some way on a CPD of a higher-order tensor derived from Y [9]. The latter can be an observed higher-order cumulant tensor or a third-order tensor in which a set of observed covariance matrices is stacked, to mention just two popular alternatives.
In this paper we propose a new technique for blind signal separation, which is based on a different assumption on S. Namely, in section 4 we assume that the sources can be modeled as linear combinations of exponentials. In section 5 we more generally consider exponential polynomials. This structure allows us to formulate the problem in terms of the decomposition of a third-order tensor, derived from Y, in a sum of rank-(L_r, L_r, 1) terms. If this decomposition is essentially unique, then its computation allows us to blindly separate the signals.
We work as follows. Each row of Y is mapped to an (I × J) Hankel matrix, with I + J − 1 = N. These matrices are stacked in a tensor 𝒴 ∈ K^{I×J×K}. Formally, we have

(3.2)    (𝒴)_{ijk} = (Y)_{k, i+j−1},   1 ≤ i ≤ I, 1 ≤ j ≤ J, 1 ≤ k ≤ K.

Since the mapping is linear, the (I × J) slices of 𝒴 are linear combinations of the Hankel representations of the sources. The linear coefficients correspond to the entries of M. We have

(3.3)    𝒴 = Σ_{r=1}^{R} H_r ◦ m_r,

in which H_r ∈ K^{I×J} is the Hankel matrix derived from the rth row of S, 1 ≤ r ≤ R. Under certain conditions on the sources, discussed in sections 4 and 5, the associated Hankel matrices have low rank. That is, under certain conditions on the sources, decomposition (3.3) is of the type (2.1), and the uniqueness conditions mentioned in section 2.2 apply.
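The Hankel mapping (3.2) is straightforward to implement; a sketch (the helper `hankelize` is ours, not from the paper):

```python
import numpy as np

def hankelize(Y, I):
    """Stack the (I x J) Hankel matrices of the rows of Y into a tensor, as in (3.2)."""
    K, N = Y.shape
    J = N - I + 1
    T = np.empty((I, J, K), dtype=Y.dtype)
    for i in range(I):
        for j in range(J):
            T[i, j, :] = Y[:, i + j]   # zero-based version of (Y)_{k, i+j-1}
    return T

Y = np.arange(12.0).reshape(2, 6)      # K = 2 observed signals, N = 6 samples
T = hankelize(Y, I=3)                  # J = 4
print(T.shape)                         # (3, 4, 2)
print(T[1, 2, 0] == Y[0, 3])           # True
```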
If decomposition (3.3) is essentially unique, then it directly yields the mixing matrix up to column permutation and scaling. Let M̂ be the estimate of M. We ideally have M̂ = M · D · P, in which D ∈ K^{R×R} is a nonsingular diagonal matrix and P ∈ K^{R×R} a permutation matrix.

If M has full column rank with K ≥ R, then an estimate of the source matrix S can be obtained from (3.1) as Ŝ = M̂^† · Y. If M is rank-deficient and/or K < R, then this approach is not possible. However, the source values can still be estimated by averaging the entries of the estimates Ĥ_r of H_r along the antidiagonals. Consistent with the indeterminacies of M, we ideally have Ŝ = P^T · D^{−1} · S. We call the factorization M · S essentially unique when it is only subject to these trivial indeterminacies.
4. Linear combinations of exponentials. In section 4.1 we explain our source model. In section 4.2 we explain how the source structure leads to rank-(L_r, L_r, 1) terms in (3.3). In section 4.3 we discuss the uniqueness of the decomposition Y = M · S.
4.1. Source model. In this section we assume that the sources can be expressed as linear combinations of exponentials:

(4.1)    s_r(n) := s_{r,n+1} = Σ_{l_r=1}^{L_r} c_{l_r,r} z_{l_r,r}^n,   0 ≤ n ≤ N − 1, 1 ≤ r ≤ R.

Because of Euler's formula, this model subsumes linear combinations of possibly exponentially damped sinusoids:

    e^{−αn} cos(ωn + φ) = c z^n + c^* (z^*)^n,

where c = (1/2) e^{jφ} and z = e^{−α+jω}. Model (4.1) can also include hyperbolic sines and cosines:

    sinh(n) = (e^n − e^{−n})/2,   cosh(n) = (e^n + e^{−n})/2.
4.2. Tensor decomposition. If s_r(n) can be written as in (4.1), then its associated Hankel matrix H_r ∈ K^{I×J} admits the Vandermonde decomposition

(4.2)    H_r = V_r · diag(c_{1,r}, c_{2,r}, . . . , c_{L_r,r}) · Ṽ_r^T,

in which the Vandermonde matrices V_r ∈ K^{I×L_r} and Ṽ_r ∈ K^{J×L_r} are defined by

(4.3)    V_r = ⎡ 1              1              ···  1              ⎤
               ⎢ z_{1,r}        z_{2,r}        ···  z_{L_r,r}      ⎥
               ⎢ ⋮              ⋮                   ⋮              ⎥
               ⎣ z_{1,r}^{I−1}  z_{2,r}^{I−1}  ···  z_{L_r,r}^{I−1} ⎦ ,

         Ṽ_r = ⎡ 1              1              ···  1              ⎤
               ⎢ z_{1,r}        z_{2,r}        ···  z_{L_r,r}      ⎥
               ⎢ ⋮              ⋮                   ⋮              ⎥
               ⎣ z_{1,r}^{J−1}  z_{2,r}^{J−1}  ···  z_{L_r,r}^{J−1} ⎦ ;

see [21]. Let us assume that I, J ≥ max(L_1, L_2, . . . , L_R). Since a Vandermonde matrix generated by distinct poles has full rank, H_r is rank-L_r and the rth term in (3.3) is rank-(L_r, L_r, 1), 1 ≤ r ≤ R. Hence, (3.3) is a decomposition of 𝒴 in rank-(L_r, L_r, 1) terms.
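The Vandermonde factorization (4.2) of a single source's Hankel matrix can be checked numerically; a sketch with hypothetical poles and coefficients:

```python
import numpy as np

# Check (4.2): the Hankel matrix of s(n) = sum_l c_l z_l^n factors as V diag(c) Vt^T
# and is rank-L when the poles are distinct (our own small example).
L, I, J = 3, 5, 6
N = I + J - 1
zs = np.exp(2j * np.pi * np.array([0.10, 0.35, 0.70]))   # distinct unit-modulus poles
cs = np.array([1.0, 2.0, 0.5])

s = np.array([np.sum(cs * zs**n) for n in range(N)])
H = np.array([[s[i + j] for j in range(J)] for i in range(I)])

V = np.vander(zs, I, increasing=True).T    # I x L, entry z_l^(i-1)
Vt = np.vander(zs, J, increasing=True).T   # J x L, entry z_l^(j-1)
print(np.allclose(H, V @ np.diag(cs) @ Vt.T))   # True
print(np.linalg.matrix_rank(H))                 # 3
```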
4.3. Uniqueness.

Theorem 4.1. Consider a matrix M ∈ K^{K×R} that does not have proportional columns and a matrix S ∈ K^{R×N} with structure (4.1). Assume that (N + 1)/2 ≥ Σ_{r=1}^{R} L_r. If all the poles z_{l_r,r} are distinct, 1 ≤ l_r ≤ L_r, 1 ≤ r ≤ R, then the decomposition Y = M · S is essentially unique.

Proof. The constraint (N + 1)/2 ≥ Σ_{r=1}^{R} L_r allows us to map the rows of Y to (I × J) Hankel matrices with I, J ≥ Σ_{r=1}^{R} L_r