BLIND SEPARATION OF EXPONENTIAL POLYNOMIALS AND THE DECOMPOSITION OF A TENSOR IN RANK-(L_r, L_r, 1) TERMS∗
LIEVEN DE LATHAUWER†
Abstract. We present a new necessary and sufficient condition for essential uniqueness of the decomposition of a third-order tensor in rank-(Lr, Lr, 1) terms. We derive a new deterministic technique for blind signal separation that relies on this decomposition. The method assumes that the signals can be modeled as linear combinations of exponentials or, more generally, as exponential polynomials. The results are illustrated by means of numerical experiments.
Key words. multilinear algebra, higher-order tensor, singular value decomposition, canonical polyadic decomposition, block term decomposition, blind signal separation, exponential polynomial
AMS subject classifications. 15A18, 15A69
DOI. 10.1137/100805510
1. Introduction. In section 1.1 we explain our notation and give some basic definitions. In section 1.2 we briefly recall the multilinear singular value decomposition (MLSVD) and the canonical polyadic decomposition (CPD). In section 1.3 we set out the goals of this paper and explain how it is organized.
1.1. Preliminaries.
1.1.1. Notation. Scalars are denoted by lower-case letters (a, b, . . . ), vectors are written in boldface lower-case (a, b, . . . ), matrices correspond to boldface capitals (A, B, . . . ), and tensors are written as calligraphic letters (A, B, . . . ). This notation is consistently used for lower-order parts of a given quantity. For instance, the entry with row index i and column index j in a matrix A, i.e., (A)_{ij}, is denoted by a_{ij} (also (a)_i = a_i and (A)_{ijk} = a_{ijk}). The ith column vector of a matrix A is denoted by a_i, i.e., A = [a_1 a_2 . . .]. Italic capitals indicate index upper bounds (e.g., i = 1, 2, . . . , I).
K stands for R or C. The symbol ⊗ denotes the Kronecker product,

    A ⊗ B = ⎡ a_{11}B  a_{12}B  ··· ⎤
            ⎢ a_{21}B  a_{22}B  ··· ⎥
            ⎣    ⋮        ⋮     ⋱  ⎦ .
The Khatri–Rao or columnwise Kronecker product is represented by ⊙:

    A ⊙ B = [a_1 ⊗ b_1  a_2 ⊗ b_2  ···] .
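These two products are easy to check numerically. A minimal NumPy sketch (the helper name `khatri_rao` is ours; NumPy itself only provides the Kronecker product `np.kron`):

```python
import numpy as np

def khatri_rao(A, B):
    """Columnwise Kronecker product: [a_1 (x) b_1  a_2 (x) b_2  ...]."""
    assert A.shape[1] == B.shape[1], "A and B need the same number of columns"
    return np.column_stack([np.kron(A[:, r], B[:, r]) for r in range(A.shape[1])])

A = np.arange(1, 7).reshape(2, 3)
B = np.arange(1, 13).reshape(4, 3)

K1 = np.kron(A, B)        # (2*4) x (3*3) Kronecker product
K2 = khatri_rao(A, B)     # (2*4) x 3: one Kronecker column per column pair
print(K1.shape, K2.shape) # (8, 9) (8, 3)
```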
∗Received by the editors August 16, 2010; accepted for publication (in revised form) September 13, 2011; published electronically December 8, 2011. This research was supported by the Research Council K.U. Leuven: GOA-MaNet, CoE EF/05/006 Optimization in Engineering (OPTEC), CIF1 and STRT1/08/023; the F.W.O.: (a) project G.0427.10N, (b) Research Communities ICCoS, ANMMM, and MLDM; the Belgian Federal Science Policy Office: IUAP P6/04 (DYSCO, “Dynamical systems, control and optimization,” 2007–2011); EU: ERNSI.
http://www.siam.org/journals/simax/32-4/80551.html
†Group Science, Engineering and Technology, Katholieke Universiteit Leuven Campus Kortrijk, E. Sabbelaan 53, 8500 Kortrijk, Belgium (Lieven.DeLathauwer@kuleuven-kortrijk.be), and Department of Electrical Engineering (ESAT), Research Division SCD, Katholieke Universiteit Leuven, Kasteelpark Arenberg 10, B-3001 Leuven, Belgium (Lieven.DeLathauwer@esat.kuleuven.be, http://homes.esat.kuleuven.be/~delathau/home.html).
The column space of a matrix is denoted by span(A). The rank of a matrix A is denoted by rank(A). The superscripts ·^T, ·^*, ·^H, and ·^† denote the transpose, complex conjugate, complex conjugated transpose, and Moore–Penrose pseudoinverse, respectively. The operator diag(·) stacks its scalar arguments in a square diagonal matrix. For vectorization of a matrix A = [a_1 a_2 . . .] we adopt the following convention: vec(A) = [a_1^T a_2^T . . .]^T. The (N × N) identity matrix is represented by I_{N×N}.

1.1.2. Basic definitions.
Definition 1.1. Consider T ∈ K^{I1×I2×I3} and U^(1) ∈ K^{J1×I1}, U^(2) ∈ K^{J2×I2}, U^(3) ∈ K^{J3×I3}. Then the products T ·_1 U^(1), T ·_2 U^(2), and T ·_3 U^(3) are defined by

    (T ·_1 U^(1))_{j1 i2 i3} = Σ_{i1=1}^{I1} t_{i1 i2 i3} u^(1)_{j1 i1}   ∀ j1, i2, i3,
    (T ·_2 U^(2))_{i1 j2 i3} = Σ_{i2=1}^{I2} t_{i1 i2 i3} u^(2)_{j2 i2}   ∀ i1, j2, i3,
    (T ·_3 U^(3))_{i1 i2 j3} = Σ_{i3=1}^{I3} t_{i1 i2 i3} u^(3)_{j3 i3}   ∀ i1, i2, j3,

respectively.
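The three mode products can be written compactly with `np.einsum`; a small sketch with our own helper names:

```python
import numpy as np

# Mode products of Definition 1.1, expressed with einsum (helper names are ours).
def mode1(T, U): return np.einsum('abc,ja->jbc', T, U)   # T ._1 U
def mode2(T, U): return np.einsum('abc,jb->ajc', T, U)   # T ._2 U
def mode3(T, U): return np.einsum('abc,jc->abj', T, U)   # T ._3 U

T = np.random.randn(3, 4, 5)
U1 = np.random.randn(2, 3)
print(mode1(T, U1).shape)   # (2, 4, 5)
```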
Definition 1.2. The outer product A ◦ B of a tensor A ∈ K^{I1×I2×···×IP} and a tensor B ∈ K^{J1×J2×···×JQ} is the tensor defined by

    (A ◦ B)_{i1 i2 ... iP j1 j2 ... jQ} = a_{i1 i2 ... iP} b_{j1 j2 ... jQ}

for all values of the indices. For instance, the outer product T of three vectors a, b, and c is defined by t_{ijk} = a_i b_j c_k for all values of the indices. The outer product T of a matrix E and a vector c is defined by t_{ijk} = e_{ij} c_k for all values of the indices.
Definition 1.3. In the matrix representations T_{I3I2×I1} ∈ K^{I3I2×I1}, T_{I3I1×I2} ∈ K^{I3I1×I2}, T_{I2I1×I3} ∈ K^{I2I1×I3} of a tensor T ∈ K^{I1×I2×I3}, the entries are stacked as follows:

(1.1)    (T_{I3I2×I1})_{(i3−1)I2+i2, i1} = (T_{I3I1×I2})_{(i3−1)I1+i1, i2} = (T_{I2I1×I3})_{(i2−1)I1+i1, i3} = t_{i1 i2 i3}   ∀ i1, i2, i3.
Definition 1.4. A third-order tensor T ∈ K^{I1×I2×I3} has multilinear rank (R1, R2, R3) iff rank(T_{I3I2×I1}) = R1, rank(T_{I3I1×I2}) = R2, and rank(T_{I2I1×I3}) = R3.

The values R1, R2, and R3 are sometimes called the mode-1, mode-2, and mode-3 rank, respectively. If the multilinear rank of a third-order tensor T is equal to (R1, R2, R3), then T is called rank-(R1, R2, R3). A rank-(1, 1, 1) tensor is briefly called rank-1. This implies that a third-order tensor is rank-1 iff it equals the outer product of three nonzero vectors.
Besides multilinear rank, a second generalization of matrix rank to higher-order tensors is the following.
Definition 1.5. The rank of a tensor T is the minimal number of rank-1 tensors that yield T in a linear combination.
An early discussion of the generalization of matrix rank to higher-order tensors
can be found in [12, 13].
1.2. Basic tensor decompositions.
Definition 1.6. An MLSVD of a rank-(R1, R2, R3) tensor T ∈ K^{I1×I2×I3} is a decomposition of T of the form

(1.2)    T = S ·_1 U^(1) ·_2 U^(2) ·_3 U^(3),

in which
• the matrices U^(1) ∈ K^{I1×I1}, U^(2) ∈ K^{I2×I2}, and U^(3) ∈ K^{I3×I3} are orthogonal (unitary);
• the tensor S ∈ K^{I1×I2×I3} is
  – all-orthogonal (all-unitary):

    Σ_{i2=1}^{I2} Σ_{i3=1}^{I3} s_{j1 i2 i3} s*_{j2 i2 i3} = (σ^(1)_{i1})^2 if j1 = j2 = i1,   = 0 if j1 ≠ j2,
    Σ_{i1=1}^{I1} Σ_{i3=1}^{I3} s_{i1 j1 i3} s*_{i1 j2 i3} = (σ^(2)_{i2})^2 if j1 = j2 = i2,   = 0 if j1 ≠ j2,
    Σ_{i1=1}^{I1} Σ_{i2=1}^{I2} s_{i1 i2 j1} s*_{i1 i2 j2} = (σ^(3)_{i3})^2 if j1 = j2 = i3,   = 0 if j1 ≠ j2;

  – ordered:

    σ^(1)_1 ≥ σ^(1)_2 ≥ ··· ≥ σ^(1)_{R1} > σ^(1)_{R1+1} = ··· = σ^(1)_{I1} = 0,
    σ^(2)_1 ≥ σ^(2)_2 ≥ ··· ≥ σ^(2)_{R2} > σ^(2)_{R2+1} = ··· = σ^(2)_{I2} = 0,
    σ^(3)_1 ≥ σ^(3)_2 ≥ ··· ≥ σ^(3)_{R3} > σ^(3)_{R3+1} = ··· = σ^(3)_{I3} = 0.

The values σ^(1)_{i1}, σ^(2)_{i2}, and σ^(3)_{i3} are called mode-1, mode-2, and mode-3 singular values, respectively. The columns of U^(1), U^(2), and U^(3) are called mode-1, mode-2, and mode-3 singular vectors, respectively.
The MLSVD has its roots in [30, 31] and was further studied in [5], where it was called higher-order singular value decomposition (HOSVD).
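Concretely, the mode-n singular vectors can be obtained from SVDs of the matrix representations (1.1). The following NumPy sketch (our own code, not the paper's) computes an MLSVD of a random tensor and verifies the reconstruction:

```python
import numpy as np

def unfold(T, mode):
    """Matrix representations of (1.1)."""
    I1, I2, I3 = T.shape
    perm = {1: (2, 1, 0), 2: (2, 0, 1), 3: (1, 0, 2)}[mode]
    rows = {1: I3 * I2, 2: I3 * I1, 3: I2 * I1}[mode]
    return T.transpose(perm).reshape(rows, T.shape[mode - 1])

T = np.random.randn(4, 5, 6)
# Mode-n singular vectors = right singular vectors of the mode-n representation.
U = [np.linalg.svd(unfold(T, n))[2].conj().T for n in (1, 2, 3)]
# Core tensor S = T ._1 U1^H ._2 U2^H ._3 U3^H
S = np.einsum('abc,aj,bk,cl->jkl', T, U[0].conj(), U[1].conj(), U[2].conj())
# Reconstruction T = S ._1 U1 ._2 U2 ._3 U3
T_rec = np.einsum('jkl,aj,bk,cl->abc', S, U[0], U[1], U[2])
print(np.allclose(T, T_rec))   # True
```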
Definition 1.7. A canonical polyadic decomposition (CPD) of a rank-R tensor T ∈ K^{I1×I2×I3} is a decomposition of T in a linear combination of R rank-1 terms:

(1.3)    T = Σ_{r=1}^{R} a_r ◦ b_r ◦ c_r.
The fully symmetric variant, in which a_r = b_r = c_r, r = 1, . . . , R, was studied in the nineteenth century in the context of invariant theory [3]. The unsymmetric decomposition was introduced in 1927 [12, 13]. Around 1970, the unsymmetric decomposition was independently reintroduced in psychometrics [1] and phonetics [11], where it was called canonical decomposition (CANDECOMP) and parallel factor decomposition (PARAFAC), respectively. The terms CANDECOMP and PARAFAC are sometimes used when the number of rank-1 terms is not minimal.
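A CPD-structured tensor as in (1.3) can be synthesized and checked entrywise; a small NumPy sketch with hypothetical factor matrices:

```python
import numpy as np

# Build a rank-3 tensor from its CPD factors and verify (1.3) entrywise.
# The factor matrices A, B, C are random; the names are ours, not the paper's.
I1, I2, I3, R = 4, 5, 6, 3
rng = np.random.default_rng(0)
A = rng.standard_normal((I1, R))
B = rng.standard_normal((I2, R))
C = rng.standard_normal((I3, R))

# T = sum_r a_r o b_r o c_r
T = np.einsum('ir,jr,kr->ijk', A, B, C)

# entry check: t_{ijk} = sum_r a_{ir} b_{jr} c_{kr}
print(np.isclose(T[1, 2, 3], sum(A[1, r] * B[2, r] * C[3, r] for r in range(R))))   # True
```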
1.3. Goals and organization. In this paper we further study a tensor decomposition that we have recently introduced, namely the decomposition of a third-order tensor in rank-(L_r, L_r, 1) terms [6, 7, 8], and we develop a new technique for blind signal separation, based on this decomposition.
Section 2 contains the discussion of the tensor decomposition as such. In section 2.1 the definition is recalled. Section 2.2 concerns the uniqueness of the decomposition.
Theorem 2.4 is a new result. Section 2.3 is a note on computational methods.
Blind signal separation has been an active research area in the last twenty years;
see, for instance, the books [2, 4, 14]. A prominent technique is independent compo- nent analysis (ICA). In ICA, signals are separated on the basis of their hypothesized statistical independence. In this paper we develop a new deterministic technique for blind signal separation. Signals are separated under the assumption that they can be modeled as exponential polynomials. Such signals are ubiquitous; as a matter of fact, exponentials play a very fundamental role in signal processing and system theory [17, 24].
Section 3 derives the blind signal separation technique at a conceptual level.
The problem is formulated in terms of the decomposition in rank-(L_r, L_r, 1) terms of a third-order tensor with partial Hankel structure. Section 4 considers in detail signals that can be modeled as a linear combination of exponentials. The results are generalized to exponential polynomials in section 5. In section 6 the computational cost is reduced by means of tensor compression. Section 7 addresses the particular case of exponential polynomials that are mutually orthogonal. Section 8 illustrates the results by means of some numerical experiments. We conclude in section 9.
2. The decomposition of a third-order tensor in rank-(L_r, L_r, 1) terms.

2.1. Definition and representation. In [6, 7, 8] we introduced block term decompositions (BTD) of a higher-order tensor. If there is only one term, then the BTD of a given tensor reduces to its MLSVD. If all the terms have multilinear rank (1, 1, 1), then the BTD of a third-order tensor reduces to its CPD. A particular BTD is the decomposition in rank-(L_r, L_r, 1) terms.
Definition 2.1. A decomposition of a tensor T ∈ K^{I1×I2×I3} in a sum of rank-(L_r, L_r, 1) terms, 1 ≤ r ≤ R, is a decomposition of T of the form

(2.1)    T = Σ_{r=1}^{R} E_r ◦ c_r,

in which the matrix E_r ∈ K^{I1×I2} is rank-L_r and the vector c_r ∈ K^{I3} is nonzero, 1 ≤ r ≤ R. We assume that R is minimal.

Note that the multilinear rank of the rth term in (2.1) is indeed equal to (L_r, L_r, 1).
It is clear that in (2.1) one can permute the rth and r′th terms when L_r = L_{r′}. Also, one can scale E_r, provided c_r is counterscaled. We call the decomposition essentially unique when it is only subject to these trivial indeterminacies. If E_r = U_r · Σ_r · V_r^H denotes the SVD of E_r, 1 ≤ r ≤ R, then

    T = Σ_{r=1}^{R} [U_r · (‖c_r‖ Σ_r) · V_r^H] ◦ (c_r/‖c_r‖) = Σ_{r=1}^{R} (‖c_r‖ Σ_r) ·_1 U_r ·_2 V_r^* ·_3 (c_r/‖c_r‖)

is a representation of (2.1) in which each term is in MLSVD form.
If we factorize E_r as A_r · B_r^T, in which the matrix A_r ∈ K^{I1×L_r} and the matrix B_r ∈ K^{I2×L_r} are rank-L_r, r = 1, . . . , R, then we can write (2.1) as

(2.2)    T = Σ_{r=1}^{R} (A_r · B_r^T) ◦ c_r.
The decomposition is visualized in Figure 2.1. Define A = [A_1 ··· A_R], B = [B_1 ··· B_R], C = [c_1 ··· c_R]. In terms of the matrix representations of T defined in (1.1), (2.2) can be written as

(2.3)    T_{I3I2×I1} = (C ⊙ B) · A^T,
(2.4)    T_{I3I1×I2} = (C ⊙ A) · B^T,
(2.5)    T_{I2I1×I3} = [vec(E_1) ··· vec(E_R)] · C^T,

where the Khatri–Rao products are taken partitionwise, i.e., C ⊙ B = [c_1 ⊗ B_1 ··· c_R ⊗ B_R].
Fig. 2.1. Visual representation of the decomposition in rank-(L_r, L_r, 1) terms.
2.2. Uniqueness. In [7] several conditions were mentioned under which essential uniqueness of the decomposition is guaranteed. We recall [7, Theorem 4.1], which also appeared in a slightly different form in [26].
Theorem 2.2. Consider a decomposition of T ∈ K^{I1×I2×I3} in rank-(L_r, L_r, 1) terms as in (2.1)–(2.2), with I1, I2 ≥ Σ_{r=1}^{R} L_r. If A = [A_1 A_2 ··· A_R] and B = [B_1 B_2 ··· B_R] have full column rank and C does not have proportional columns, then the decomposition is essentially unique.
Below we propose a new uniqueness theorem (Theorem 2.4). This theorem gener- alizes the CPD uniqueness result [15, Condition A]. We first reformulate [15, Condition A] as Theorem 2.3 and give a new proof. The reasoning will be generalized to prove Theorem 2.4.
Theorem 2.3. Consider the CPD (1.3) of T ∈ K^{I1×I2×I3}. Define E(w) = Σ_{r=1}^{R} w_r a_r · b_r^T. Assume that the following conditions are satisfied:
(C1) For every w that has at least two nonzero entries, we have that rank(E(w)) > 1.
(C2) The columns of C = [c_1 c_2 ··· c_R] are linearly independent.
Then the CPD of T is essentially unique. On the other hand, if condition (C1) is not satisfied, then the CPD of T is not essentially unique.
Proof. Assume that there exists an alternative decomposition

(2.6)    T = Σ_{r=1}^{R} ã_r ◦ b̃_r ◦ c̃_r.

Denote Z = C^{†,T}, Z̃ = C̃^{†,T}, and Y = C̃^† C. We have

    T ·_3 z̃_1^T = ã_1 · b̃_1^T = Σ_{r=1}^{R} w^(1)_r a_r · b_r^T,

where w^(1)_r = y_{1r}. Because of condition (C1), we have that ã_1 · b̃_1^T = d_1 a_r · b_r^T for some r ∈ {1, 2, . . . , R}, with d_1 ≠ 0. Because of condition (C2), we have that z̃_1 = d_1 z_r. If r ≠ 1, the first and rth terms in (1.3) can without loss of generality be switched. We conclude that ã_1 · b̃_1^T = d_1 a_1 · b_1^T and z̃_1 = d_1 z_1, with d_1 ≠ 0.
Induction step. Let us assume that ã_r · b̃_r^T = d_r a_r · b_r^T and z̃_r = d_r z_r with d_r ≠ 0, 1 ≤ r ≤ R̄ − 1. Then we prove that also ã_R̄ · b̃_R̄^T = d_R̄ a_R̄ · b_R̄^T and z̃_R̄ = d_R̄ z_R̄ with d_R̄ ≠ 0. We have

    T ·_3 z̃_R̄^T = ã_R̄ · b̃_R̄^T = Σ_{r=1}^{R} w^(R̄)_r a_r · b_r^T,

where w^(R̄)_r = y_{R̄r}. Because of condition (C1), we have that ã_R̄ · b̃_R̄^T = d_r a_r · b_r^T for some r ∈ {1, 2, . . . , R}, with d_r ≠ 0. Because of condition (C2), we have that z̃_R̄ = d_r z_r. Because Z̃ has full column rank, we can rule out the possibility that r < R̄. If r > R̄, the R̄th and rth terms in (1.3) can without loss of generality be switched. The induction follows.

We now have that Z̃ = ZD, in which D = diag(d_1, d_2, . . . , d_R) ∈ K^{R×R} is diagonal and nonsingular. This implies that C̃ = CD^{−1}. On the other hand, we also have that ã_r · b̃_r^T = d_r a_r · b_r^T, 1 ≤ r ≤ R. We conclude that decompositions (1.3) and (2.6) are essentially equal.
Finally, we prove that condition (C1) is necessary. Let us assume that it is not satisfied. Then there exist nonzero vectors u and v such that u · v^T = Σ_{r=1}^{R} w_r a_r · b_r^T, in which, say, w_1 ≠ 0 ≠ w_2. Without loss of generality, we assume that w_1 = 1. Then we have that a_1 · b_1^T = u · v^T − Σ_{r=2}^{R} w_r a_r · b_r^T. Hence, an alternative decomposition of T is

    T = u ◦ v ◦ c_1 + Σ_{r=2}^{R} a_r ◦ b_r ◦ (c_r − w_r c_1),

in which c_2 − w_2 c_1 is not proportional to c_2.
We now generalize Theorem 2.3 to the decomposition in rank-(L_r, L_r, 1) terms.

Theorem 2.4. Consider a decomposition of T ∈ K^{I1×I2×I3} in rank-(L_r, L_r, 1) terms as in (2.1)–(2.2). Define E(w) = Σ_{r=1}^{R} w_r E_r. Assume that the following conditions are satisfied:
(C1) For every w that has at least two nonzero entries, we have that rank(E(w)) > max_{r | w_r ≠ 0} (L_r).
(C2) The columns of C are linearly independent.
Then decomposition (2.1)–(2.2) is essentially unique. On the other hand, if condition (C1) is not satisfied, then decomposition (2.1)–(2.2) is not essentially unique.
Proof. Without loss of generality, we assume that the terms in (2.1)–(2.2) are ordered such that L_1 ≤ L_2 ≤ ··· ≤ L_R. Condition (C1) is then equivalent to the following set of conditions:
(C1.1) If w_1 ≠ 0 and ∃s ∈ {2, 3, . . . , R} such that w_s ≠ 0, then rank(E(w)) > L_1.
(C1.2) If w_2 ≠ 0 and ∃s ∈ {1, 3, . . . , R} such that w_s ≠ 0, then rank(E(w)) > L_2.
    ⋮
(C1.R) If w_R ≠ 0 and ∃s ∈ {1, 2, . . . , R−1} such that w_s ≠ 0, then rank(E(w)) > L_R.
Assume that there exists an alternative decomposition
(2.7)    T = Σ_{r=1}^{R} Ẽ_r ◦ c̃_r = Σ_{r=1}^{R} (Ã_r · B̃_r^T) ◦ c̃_r,

which is also ordered such that L_1 ≤ L_2 ≤ ··· ≤ L_R. Denote Z = C^{†,T}, Z̃ = C̃^{†,T}, and Y = C̃^† C. We have

    T ·_3 z̃_1^T = Ẽ_1 = Σ_{r=1}^{R} w^(1)_r E_r,

where w^(1)_r = y_{1r}. Since rank(Ẽ_1) = L_1, we have according to condition (C1.1) that (i) w^(1)_2 = w^(1)_3 = ··· = w^(1)_R = 0 or that (ii) w^(1)_1 = 0 and ∃s ∈ {2, 3, . . . , R} such that w^(1)_s ≠ 0. In the special case that w^(1)_1 = w^(1)_2 = ··· = w^(1)_{s̄−1} = w^(1)_{s̄+1} = ··· = w^(1)_R = 0 while w^(1)_{s̄} ≠ 0 and L_{s̄} = L_1, the first and s̄th terms in (2.1) can without loss of generality be switched. In all other cases, possibility (ii) can be ruled out because of (C1.2)–(C1.R) and the ordering constraint. We conclude that w^(1)_2 = w^(1)_3 = ··· = w^(1)_R = 0, i.e., z̃_1 = d_1 z_1 with d_1 ≠ 0.
Next, we have

    T ·_3 z̃_2^T = Ẽ_2 = Σ_{r=1}^{R} w^(2)_r E_r,

where w^(2)_r = y_{2r}. Since rank(Ẽ_2) = L_2, we have according to condition (C1.2) that (i) w^(2)_1 = w^(2)_3 = ··· = w^(2)_R = 0 or that (ii) w^(2)_2 = 0 and ∃s ∈ {1, 3, . . . , R} such that w^(2)_s ≠ 0. Let us examine the latter possibility. First, it is impossible that w^(2)_1 ≠ 0 while w^(2)_2 = w^(2)_3 = ··· = w^(2)_R = 0. Indeed, this would mean that z̃_2 is in the orthogonal complement of span([c_2 c_3 ··· c_R]). On the other hand, z̃_2 ∈ span(Z̃) = span(Z) = span(C̃) = span(C) = span(T_{I2I1×I3}^T). This would imply that z̃_2, just like z̃_1, is proportional to z_1, which is impossible since Z̃ has full column rank. Second, in the special case that w^(2)_1 = w^(2)_2 = ··· = w^(2)_{s̄−1} = w^(2)_{s̄+1} = ··· = w^(2)_R = 0 while w^(2)_{s̄} ≠ 0 and L_{s̄} = L_2, the second and s̄th terms in (2.1) can without loss of generality be switched. Third, rank(Ẽ_2) = L_2 rules out the special case w^(2)_1 = w^(2)_2 = ··· = w^(2)_{s̄−1} = w^(2)_{s̄+1} = ··· = w^(2)_R = 0, w^(2)_{s̄} ≠ 0 and L_{s̄} > L_2. Fourth, because of (C1.3)–(C1.R) it is impossible that w^(2)_1 ≠ 0, w^(2)_2 = 0, and ∃s ∈ {3, . . . , R} such that w^(2)_s ≠ 0. Fifth, because of (C1.3)–(C1.R) it is impossible that w^(2)_1 = w^(2)_2 = 0 and ∃s_1, s_2 ∈ {3, . . . , R}, s_1 ≠ s_2, such that w^(2)_{s_1} ≠ 0 ≠ w^(2)_{s_2}. We conclude that w^(2)_1 = w^(2)_3 = ··· = w^(2)_R = 0, i.e., z̃_2 = d_2 z_2 with d_2 ≠ 0.
Induction step. Let us assume that z̃_r = d_r z_r with d_r ≠ 0, 1 ≤ r ≤ R̄ − 1. Then we prove that also z̃_R̄ = d_R̄ z_R̄ with d_R̄ ≠ 0. We have

    T ·_3 z̃_R̄^T = Ẽ_R̄ = Σ_{r=1}^{R} w^(R̄)_r E_r,

where w^(R̄)_r = y_{R̄r}. Since rank(Ẽ_R̄) = L_R̄, we have according to condition (C1.R̄) that (i) w^(R̄)_1 = ··· = w^(R̄)_{R̄−1} = w^(R̄)_{R̄+1} = ··· = w^(R̄)_R = 0 or that (ii) w^(R̄)_{R̄} = 0 and ∃s ∈ {1, . . . , R̄ − 1, R̄ + 1, . . . , R} such that w^(R̄)_s ≠ 0. Let us examine the latter possibility.

First, it is impossible that w^(R̄)_{R̄} = w^(R̄)_{R̄+1} = ··· = w^(R̄)_R = 0. Indeed, this would mean that z̃_R̄ is in the orthogonal complement of span([c_R̄ c_{R̄+1} ··· c_R]). On the other hand, z̃_R̄ ∈ span(Z̃) = span(Z) = span(C̃) = span(C) = span(T_{I2I1×I3}^T). This would imply that z̃_R̄ ∈ span([z_1 ··· z_{R̄−1}]) = span([z̃_1 ··· z̃_{R̄−1}]), which is impossible since Z̃ has full column rank. Second, in the special case that w^(R̄)_1 = ··· = w^(R̄)_{R̄} = ··· = w^(R̄)_{s̄−1} = w^(R̄)_{s̄+1} = ··· = w^(R̄)_R = 0 while w^(R̄)_{s̄} ≠ 0 and L_{s̄} = L_R̄, the R̄th and s̄th terms in (2.1) can without loss of generality be switched. Third, rank(Ẽ_R̄) = L_R̄ rules out the special case w^(R̄)_1 = w^(R̄)_2 = ··· = w^(R̄)_{s̄−1} = w^(R̄)_{s̄+1} = ··· = w^(R̄)_R = 0, w^(R̄)_{s̄} ≠ 0 and L_{s̄} > L_R̄. Fourth, because of (C1.R̄+1)–(C1.R) it is impossible that ∃s_1 ∈ {1, . . . , R̄ − 1} such that w^(R̄)_{s_1} ≠ 0, w^(R̄)_{R̄} = 0 and ∃s_2 ∈ {R̄ + 1, . . . , R} such that w^(R̄)_{s_2} ≠ 0. Fifth, because of (C1.R̄+1)–(C1.R) it is impossible that w^(R̄)_1 = w^(R̄)_2 = ··· = w^(R̄)_{R̄} = 0 and ∃s_1, s_2 ∈ {R̄ + 1, . . . , R}, s_1 ≠ s_2, such that w^(R̄)_{s_1} ≠ 0 ≠ w^(R̄)_{s_2}. We conclude that w^(R̄)_1 = ··· = w^(R̄)_{R̄−1} = w^(R̄)_{R̄+1} = ··· = w^(R̄)_R = 0, i.e., z̃_R̄ = d_R̄ z_R̄ with d_R̄ ≠ 0.
After induction we have that Z̃ = ZD, in which D ∈ K^{R×R} is diagonal and nonsingular. This implies that C̃ = CD^{−1}. From (2.5) we now obtain

    T_{I2I1×I3} · Z̃ = [vec(Ẽ_1) ··· vec(Ẽ_R)] = [vec(E_1) ··· vec(E_R)] · D.

We conclude that decompositions (2.1) and (2.7) are essentially equal.
Finally, we prove that conditions (C1.1)–(C1.R) are necessary. Let us assume that (C1.R̄) is not satisfied, i.e., for some w with w_R̄ ≠ 0 and w_s ≠ 0 (s ≠ R̄) we have that rank(E(w)) ≤ L_R̄. Without loss of generality we assume that w_R̄ = 1. Then we have that E_R̄ = E(w) − Σ_{r≠R̄} w_r E_r. Hence, an alternative decomposition of T is

    T = E(w) ◦ c_R̄ + Σ_{r≠R̄} E_r ◦ (c_r − w_r c_R̄),

in which c_s − w_s c_R̄ is not proportional to c_s.
We illustrate Theorem 2.4 by means of two examples. Note that the essential uniqueness in Example 2 does not follow from Theorem 2.2, nor from any other theorem in [7].
Example 1. Consider full-rank matrices X, Y ∈ K^{3×3} and C ∈ K^{2×2}. Define the rank-2 matrices E_1 = x_1 y_1^T + x_2 y_2^T and E_2 = x_2 y_2^T + x_3 y_3^T and define a tensor T ∈ K^{3×3×2} by the decomposition in rank-(2, 2, 1) terms T = E_1 ◦ c_1 + E_2 ◦ c_2. Obviously, for this decomposition condition (C1) in Theorem 2.4 is not satisfied, since E([1 −1]) = E_1 − E_2 = x_1 y_1^T − x_3 y_3^T is also rank-2. The proof of the theorem explains how alternative decompositions can be found. We have for instance T = E([1 −1]) ◦ c_1 + E_2 ◦ (c_1 + c_2).
Example 2. Consider full-rank matrices X, Y ∈ K^{5×5}, D_1, D_2 ∈ K^{3×3}, and C ∈ K^{2×2}. Define the rank-3 matrices E_1 = [x_1 x_2 x_3] · D_1 · [y_1 y_2 y_3]^T and E_2 = [x_3 x_4 x_5] · D_2 · [y_3 y_4 y_5]^T, and define a tensor T ∈ K^{5×5×2} by the decomposition in rank-(3, 3, 1) terms T = E_1 ◦ c_1 + E_2 ◦ c_2. It is clear that w_1 E_1 + w_2 E_2 has at least rank 4 if w_1 ≠ 0 ≠ w_2. Since the conditions in Theorem 2.4 are satisfied, the decomposition of T is essentially unique.
2.3. Computation. An algorithm of the alternating least squares (ALS) type was proposed in [8]. This generalization of the most popular CPD algorithm [11, 19, 20, 27] is sometimes rather slow. Line search algorithms form an interesting alternative. In such schemes the multilinearity of the problem may be exploited to find a good or even the optimal step size [22, 25]. Enhanced line search was used to compute a BTD in [22]. In [23] a Levenberg–Marquardt algorithm was discussed.
A Levenberg–Marquardt iteration step is quite expensive, but the convergence is quadratic. We also mention that the proof of Theorem 2.2 is constructive [7, 26]. It shows that under the conditions of the theorem the decomposition can be computed via a matrix generalized eigenvalue decomposition (GEVD).
3. Blind separation and tensor decomposition. Consider the following data model:

(3.1)    Y = M · S + N,

in which Y ∈ K^{K×N} is the matrix of observed data, M ∈ K^{K×R} is an unknown mixing matrix, S ∈ K^{R×N} is a matrix that contains R unknown source signals, and N ∈ K^{K×N} represents additive noise. The goal of blind signal separation is the estimation of M and/or S, given only Y. N is considered as a perturbation of the equation and will for convenience be ignored in the presentation.
Since the factorization M ·S is not unique, we need to make some assumptions on M and/or S. In principal component analysis (PCA) it is assumed that the columns of M are mutually orthogonal and that also the rows of S are mutually orthogonal [16].
The problem then reduces to the computation of the singular value decomposition (SVD) of Y. In independent component analysis (ICA) it is assumed that the rows of S are mutually statistically independent [4]. Algebraic methods for ICA rely in some way on a CPD of a higher-order tensor derived from Y [9]. The latter can be an observed higher-order cumulant tensor or a third-order tensor in which a set of observed covariance matrices is stacked, to mention just two popular alternatives.
In this paper we propose a new technique for blind signal separation, which is based on a different assumption on S. Namely, in section 4 we assume that the sources can be modeled as linear combinations of exponentials. In section 5 we more generally consider exponential polynomials. This structure allows us to formulate the problem in terms of the decomposition of a third-order tensor, derived from Y, in a sum of rank-(L_r, L_r, 1) terms. If this decomposition is essentially unique, then its computation allows us to blindly separate the signals.
We work as follows. Each row of Y is mapped to an (I × J) Hankel matrix, with I + J − 1 = N. These matrices are stacked in a tensor 𝒴 ∈ K^{I×J×K}. Formally, we have

(3.2)    (𝒴)_{ijk} = (Y)_{k, i+j−1},   1 ≤ i ≤ I, 1 ≤ j ≤ J, 1 ≤ k ≤ K.

Since the mapping is linear, the (I × J) slices of 𝒴 are linear combinations of the Hankel representations of the sources. The linear coefficients correspond to the entries of M. We have

(3.3)    𝒴 = Σ_{r=1}^{R} H_r ◦ m_r,

in which H_r ∈ K^{I×J} is the Hankel matrix derived from the rth row of S, 1 ≤ r ≤ R. Under certain conditions on the sources, discussed in sections 4 and 5, the associated Hankel matrices have low rank. That is, under certain conditions on the sources, decomposition (3.3) is of the type (2.1), and the uniqueness conditions mentioned in section 2.2 apply.
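The Hankel mapping (3.2) is straightforward to implement; a sketch (the helper `hankelize` is ours, not from the paper):

```python
import numpy as np

def hankelize(Y, I):
    """Stack the (I x J) Hankel matrices of the rows of Y into a tensor, as in (3.2)."""
    K, N = Y.shape
    J = N - I + 1
    T = np.empty((I, J, K), dtype=Y.dtype)
    for i in range(I):
        for j in range(J):
            T[i, j, :] = Y[:, i + j]   # zero-based version of (Y)_{k, i+j-1}
    return T

Y = np.arange(12.0).reshape(2, 6)      # K = 2 observed signals, N = 6 samples
T = hankelize(Y, I=3)                  # J = 4
print(T.shape)                         # (3, 4, 2)
print(T[1, 2, 0] == Y[0, 3])           # True
```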
If decomposition (3.3) is essentially unique, then it directly yields the mixing matrix up to column permutation and scaling. Let M̂ be the estimate of M. We ideally have M̂ = M · D · P, in which D ∈ K^{R×R} is a nonsingular diagonal matrix and P ∈ K^{R×R} a permutation matrix.

If M has full column rank with K ≥ R, then an estimate of the source matrix S can be obtained from (3.1) as Ŝ = M̂^† · Y. If M is rank-deficient and/or K < R, then this approach is not possible. However, the source values can still be estimated by averaging the entries of the estimates Ĥ_r of H_r along the antidiagonals. Consistent with the indeterminacies of M, we ideally have Ŝ = P^T · D^{−1} · S. We call the factorization M · S essentially unique when it is only subject to these trivial indeterminacies.
4. Linear combinations of exponentials. In section 4.1 we explain our source model. In section 4.2 we explain how the source structure leads to rank-(L_r, L_r, 1) terms in (3.3). In section 4.3 we discuss the uniqueness of the decomposition Y = M · S.
4.1. Source model. In this section we assume that the sources can be expressed as linear combinations of exponentials:

(4.1)    s_r(n) := s_{r,n+1} = Σ_{l_r=1}^{L_r} c_{l_r,r} z_{l_r,r}^n,   0 ≤ n ≤ N − 1, 1 ≤ r ≤ R.

Because of Euler's formula, this model subsumes linear combinations of possibly exponentially damped sinusoids:

    e^{−αn} cos(ωn + φ) = c z^n + c^* (z^*)^n,

where c = (1/2) e^{jφ} and z = e^{−α+jω}. Model (4.1) can also include hyperbolic sines and cosines:

    sinh(n) = (e^n − e^{−n})/2,   cosh(n) = (e^n + e^{−n})/2.
4.2. Tensor decomposition. If s_r(n) can be written as in (4.1), then its associated Hankel matrix H_r ∈ K^{I×J} admits the Vandermonde decomposition

(4.2)    H_r = V_r · diag(c_{1,r}, c_{2,r}, . . . , c_{L_r,r}) · Ṽ_r^T,

in which the Vandermonde matrices V_r ∈ K^{I×L_r} and Ṽ_r ∈ K^{J×L_r} are defined by

(4.3)    V_r = ⎡ 1              1              ···  1              ⎤
               ⎢ z_{1,r}        z_{2,r}        ···  z_{L_r,r}      ⎥
               ⎢ ⋮              ⋮                   ⋮              ⎥
               ⎣ z_{1,r}^{I−1}  z_{2,r}^{I−1}  ···  z_{L_r,r}^{I−1} ⎦ ,

         Ṽ_r = ⎡ 1              1              ···  1              ⎤
               ⎢ z_{1,r}        z_{2,r}        ···  z_{L_r,r}      ⎥
               ⎢ ⋮              ⋮                   ⋮              ⎥
               ⎣ z_{1,r}^{J−1}  z_{2,r}^{J−1}  ···  z_{L_r,r}^{J−1} ⎦ ;

see [21]. Let us assume that I, J ≥ max(L_1, L_2, . . . , L_R). Since a Vandermonde matrix generated by distinct poles has full rank, H_r is rank-L_r and the rth term in (3.3) is rank-(L_r, L_r, 1), 1 ≤ r ≤ R. Hence, (3.3) is a decomposition of 𝒴 in rank-(L_r, L_r, 1) terms.
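The Vandermonde factorization (4.2) of a single source's Hankel matrix can be checked numerically; a sketch with hypothetical poles and coefficients:

```python
import numpy as np

# Check (4.2): the Hankel matrix of s(n) = sum_l c_l z_l^n factors as V diag(c) Vt^T
# and is rank-L when the poles are distinct (our own small example).
L, I, J = 3, 5, 6
N = I + J - 1
zs = np.exp(2j * np.pi * np.array([0.10, 0.35, 0.70]))   # distinct unit-modulus poles
cs = np.array([1.0, 2.0, 0.5])

s = np.array([np.sum(cs * zs**n) for n in range(N)])
H = np.array([[s[i + j] for j in range(J)] for i in range(I)])

V = np.vander(zs, I, increasing=True).T    # I x L, entry z_l^(i-1)
Vt = np.vander(zs, J, increasing=True).T   # J x L, entry z_l^(j-1)
print(np.allclose(H, V @ np.diag(cs) @ Vt.T))   # True
print(np.linalg.matrix_rank(H))                 # 3
```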
4.3. Uniqueness.

Theorem 4.1. Consider a matrix M ∈ K^{K×R} that does not have proportional columns and a matrix S ∈ K^{R×N} with structure (4.1). Assume that (N + 1)/2 ≥ Σ_{r=1}^{R} L_r. If all the poles z_{l_r,r} are distinct, 1 ≤ l_r ≤ L_r, 1 ≤ r ≤ R, then the decomposition Y = M · S is essentially unique.

Proof. The constraint (N + 1)/2 ≥ Σ_{r=1}^{R} L_r allows us to map the rows of Y to (I × J) Hankel matrices with I, J ≥ Σ_{r=1}^{R} L_r