Monomial factorizations via tensor decompositions
Mikael Sørensen^a, Lieven De Lathauwer^b, Nicholaos D. Sidiropoulos^a

^a University of Virginia, Dept. of Electrical and Computer Engineering, Thornton Hall 351 McCormick Road, Charlottesville, VA 22904, USA, {ms8tz, nikos}@virginia.edu.
^b Group Science, Engineering and Technology, KU Leuven - Kulak, E. Sabbelaan 53, 8500 Kortrijk, Belgium, and KU Leuven - STADIUS Center for Dynamical Systems, Signal Processing and Data Analytics, E.E. Dept. (ESAT), Kasteelpark Arenberg 10, B-3001 Leuven-Heverlee, Belgium. Lieven.DeLathauwer@kuleuven.be.
Abstract
The Canonical Polyadic Decomposition (CPD), which decomposes a tensor into a sum of rank-one terms, plays an important role in signal processing and machine learning. In this paper we extend the CPD framework to the more general case of monomial factorizations. This includes extensions of multilinear algebraic uniqueness conditions originally developed for the CPD. We obtain a deterministic condition for monomial factorizations that is both necessary and sufficient but may be difficult to check in practice. We derive a deterministic relaxation that admits a constructive interpretation and is also easier to verify. Computationally, we reduce the monomial factorization problem to a CPD problem, which can be solved via a matrix EigenValue Decomposition (EVD). Under the given conditions, the discussed EVD-based algorithms are guaranteed to return the exact monomial factorization. Finally, we make a connection between monomial factorizations and the coupled block term decomposition, which allows us to translate monomial structures into low-rank structures.
Keywords: tensor, canonical polyadic decomposition, block term decomposition, coupled decomposition, monomial, uniqueness, eigenvalue decomposition.
2010 MSC: 15A15, 15A23
Preprint submitted to Linear Algebra and its Applications, July 8, 2019.

1. Introduction

Tensors have found many applications in signal processing and machine learning; see [1, 2] and references therein. The most well-known tensor decomposition is the Canonical Polyadic Decomposition (CPD), i.e., the decomposition into a minimal number of rank-one terms [3, 4]:

  X = (A ⊙ B) S^T ∈ C^{IJ×K},   (1)

where '⊙' denotes the Khatri-Rao (columnwise Kronecker) product, A ∈ C^{I×R}, B ∈ C^{J×R} and S ∈ C^{K×R}. Note that the columns of A ⊙ B correspond to vectorized rank-one matrices, explaining why (1) is referred to as a decomposition into rank-one terms. (A formal definition of the CPD will be provided in Section 1.2.) In signal processing, the CPD is related to the ESPRIT [5, 6] and ACMA [7] methods, while in machine learning it is related to the naive Bayes model [8, 9, 10, 11]. In [12, 13] we extended the CPD framework to the coupled CPD and we have shown the usefulness of the latter decomposition in sensor array processing [14], wireless communication [15] and multidimensional harmonic retrieval [16, 17]. In this paper we will further extend the CPD framework to more general monomial structures. (A monomial is a product of variables, possibly with repetitions.) More precisely, we consider bilinear factorizations of the form
  X = A S^T ∈ C^{I×K},   (2)

in which the columns of A ∈ C^{I×R} (or, similarly, the columns of S ∈ C^{K×R}) satisfy monomial relations of the form

  a_{p1,r} · · · a_{pL,r} − a_{s1,r} · · · a_{sL,r} = 0,   (3)

where a_{m,r} denotes the m-th entry of the r-th column of A. A bilinear factorization (2) exhibiting monomial structure of the form (3) will be referred to as a monomial factorization. (In Sections 3 and 4 it will become clear that (1) is a special case of (2).) To make things more tangible, let us consider a concrete example. In signal processing, the separation of digital communication signals is probably one of the earliest examples involving monomial structures. For instance, blind separation of M-PSK signals, in which the entries of S in (2) take the form

  s_{kr} = e^{√−1 · u_{kr}}  with  u_{kr} ∈ {0, 2π/M, . . . , 2π(M−1)/M},   (4)

has been considered (e.g., [18, 19]). From (4) it is clear that s_{k1,r}^M = s_{k2,r}^M for all k_1, k_2 ∈ {1, . . . , K}. In other words, for every pair (k_1, k_2) with k_1 < k_2, we can exploit C_K^2 = (K−1)K/2 monomial relations of the form s_{k1,r}^M − s_{k2,r}^M = 0. In this paper we will explain how to translate this type of problem into a tensor decomposition problem. Another example, which will be discussed in Section 4.4, is the Binary Matrix Factorization (BMF):
  X = A S^T ∈ C^{I×K},   (5)

where A ∈ {0, 1}^{I×R} is a binary matrix. BMFs of the form (5) play a role in binary latent variable modeling (e.g., [20, 21, 22]).
Monomial factorizations have the interesting property that they provide a framework that allows us to generalize the CPD model. As an example, the presented monomial factorization framework enables us to extend the CPD model (1) to the case of binary weighted rank-one terms (this will be made clear in Section 3.3):
  X = (D ∗ (A ⊙ B)) S^T ∈ C^{IJ×K},   (6)

where '∗' denotes the Hadamard (element-wise) product and D ∈ {0, 1}^{IJ×R}. Binary weighted rank-one terms are of interest in clustering applications involving tensor structures (e.g., [23, 24]).
The paper is organised as follows. In the rest of the introduction we will first present the notation used throughout the paper, followed by a brief review of the CPD and the Block Term Decomposition (BTD) [25]. As our first contribution, we will in Section 2 present a new link between monomial factorizations of the form (2) and the coupled BTD [12, 13]. This connection enables us to translate the monomial constraint (3) into a low-rank constraint, which in turn allows us to treat the monomial factorization of a matrix as a tensor decomposition problem. Next, in Section 3 we will present
identifiability conditions. It will be explained that the presented identifiability conditions are extensions of well-known CPD uniqueness conditions developed in [26, 27, 28, 29] to the monomial case. We also explain that the monomial factorization framework can be used to generalize the CPD model (1) to the binary weighted CPD model (6). As our third contribution, we will in Section 4 extend the algebraic algorithm for CPD in [27, 30]
to monomial factorizations. We also demonstrate how the presented algebraic algorithm can be adapted and used for the computation of a BMF of the form (5).
1.1. Notation
Vectors, matrices and tensors are denoted by lower case boldface, upper case boldface and upper case calligraphic letters, respectively. The r-th column, conjugate, transpose, conjugate-transpose, determinant, permanent, inverse, right-inverse, range and kernel of a matrix A are denoted by a_r, A^∗, A^T, A^H, |A|, |A|^+, A^{−1}, A^†, range(A), ker(A), respectively. The dimension of a subspace S is denoted by dim(S). The symbols ⊗ and ⊙ denote the Kronecker and Khatri-Rao products, defined as

  A ⊗ B := [a_{11}B  a_{12}B  · · · ; a_{21}B  a_{22}B  · · · ; ⋮  ⋮  ⋱],   A ⊙ B := [a_1 ⊗ b_1, a_2 ⊗ b_2, . . .],

in which (A)_{mn} = a_{mn}. The outer product of, say, three vectors a, b and c is denoted by a ◦ b ◦ c, such that (a ◦ b ◦ c)_{ijk} = a_i b_j c_k. The number of non-zero entries of a vector x is denoted by ω(x) in the tensor decomposition literature, dating back to the work of Kruskal [31]. Let diag(a) ∈ C^{J×J} denote a diagonal matrix that holds a column vector a ∈ C^{J×1} or a row vector a ∈ C^{1×J} on its diagonal. In some cases a diagonal matrix holds row k of A ∈ C^{I×J} on its diagonal. This will be denoted by D_k(A) ∈ C^{J×J}. Furthermore, let vec(A) denote the vector obtained by stacking the columns of A ∈ C^{I×J} into a column vector vec(A) = [a_1^T, . . . , a_J^T]^T ∈ C^{IJ}. Let e_n^{(N)} ∈ C^N denote the unit vector with unit entry at position n and zeros elsewhere. The all-ones vector is denoted by 1_R = [1, . . . , 1]^T ∈ C^R. Matlab index notation will be used for submatrices of a given matrix. For example, A(1:k,:) represents the submatrix of A consisting of the rows 1 to k of A. The binomial coefficient is denoted by C_m^k = m!/(k!(m−k)!). The k-th compound matrix of A ∈ C^{I×R} is denoted by C_k(A) ∈ C^{C_I^k × C_R^k}. It is the matrix containing the determinants of all k × k submatrices of A, arranged with the submatrix index sets in lexicographic order. See [28, 30, 32, 33] and references therein for a discussion.
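The compound matrix definition above can be sketched in a few lines; this is our own illustration (not the authors' code) of C_k(A): determinants of all k × k submatrices, with row and column index sets in lexicographic order.

```python
# Our own sketch of the k-th compound matrix C_k(A): the (i, j) entry
# is the determinant of the submatrix of A selected by the i-th row
# index set and j-th column index set, both in lexicographic order.
from itertools import combinations
import numpy as np

def compound(A, k):
    I, R = A.shape
    rows = list(combinations(range(I), k))   # lexicographic
    cols = list(combinations(range(R), k))
    C = np.empty((len(rows), len(cols)))
    for i, ri in enumerate(rows):
        for j, cj in enumerate(cols):
            C[i, j] = np.linalg.det(A[np.ix_(ri, cj)])
    return C

A = np.arange(1.0, 10.0).reshape(3, 3)
C2 = compound(A, 2)            # shape (C_3^2, C_3^2) = (3, 3)
```

For example, the (1,1) entry of C_2(A) above is the determinant of A(1:2,1:2).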
1.2. Canonical Polyadic Decomposition (CPD)
Consider the tensor X ∈ C^{I×J×K}. We say that X is a rank-1 tensor if it is equal to the outer product of non-zero vectors a ∈ C^I, b ∈ C^J and s ∈ C^K such that x_{ijk} = a_i b_j s_k. A Polyadic Decomposition (PD) is a decomposition of X into a sum of rank-1 terms [3, 4]:

  X = Σ_{r=1}^{R} E^{(r)} ◦ s_r = Σ_{r=1}^{R} a_r ◦ b_r ◦ s_r,   (7)

where E^{(r)} = a_r b_r^T = a_r ◦ b_r ∈ C^{I×J} is a rank-one matrix. The rank of a tensor X is equal to the minimal number of rank-1 tensors that yield X in a linear combination. If the rank of X is R, then (7) is called the Canonical PD (CPD) of X.
1.2.1. Matrix representation
Consider the horizontal matrix slice X^{(i··)} ∈ C^{J×K} of X, defined by (X^{(i··)})_{jk} = x_{ijk} = Σ_{r=1}^{R} a_{ir} b_{jr} s_{kr}. The tensor X can be interpreted as a collection of matrix slices X^{(1··)}, . . . , X^{(I··)}, each admitting the factorization X^{(i··)} = Σ_{r=1}^{R} a_{ir} b_r s_r^T = B D_i(A) S^T. Stacking yields (1):

  X = [X^{(1··)T}, . . . , X^{(I··)T}]^T = (A ⊙ B) S^T.   (8)
1.2.2. Uniqueness conditions for CPD
The rank-1 tensors in (7) can be arbitrarily permuted and the vectors within the same rank-1 tensor can be arbitrarily scaled provided the overall rank-1 term remains the same.
We say that the CPD is unique when it is only subject to these trivial indeterminacies.
For cases where S in (8) has full column rank, the following necessary and sufficient uniqueness condition, stated in Theorem 1.1, was obtained in [26] and later reformulated in terms of compound matrices in [28]. It makes use of the matrix

  G_CPD^{(2)} = C_2(A) ⊙ C_2(B) ∈ C^{C_I^2 C_J^2 × C_R^2}   (9)

and the vector

  f^{(2)}(d) = [d_1 d_2, d_1 d_3, . . . , d_{R−1} d_R]^T ∈ C^{C_R^2},   (10)

which consists of all distinct products of entries d_r · d_s with r < s from the vector d = [d_1, . . . , d_R]^T ∈ C^R.
Theorem 1.1. Consider the PD of X ∈ C^{I×J×K} in (7). Assume that S has full column rank. The rank of X is R and the CPD of X is unique if and only if the following implication holds:

  G_CPD^{(2)} · f^{(2)}(d) = 0 ⇒ ω(d) ≤ 1,   (11)

for all structured vectors f^{(2)}(d) of the form (10).
In practice, condition (11) can be hard to check. However, as observed in [26, 27, 28], if G_CPD^{(2)} in (11) has full column rank, then f^{(2)}(d) = 0 and the condition is automatically satisfied. This fact leads to the following easier-to-check uniqueness condition, which is only sufficient.
Theorem 1.2. Consider the PD of X ∈ C^{I×J×K} in (7). If

  S has full column rank and G_CPD^{(2)} has full column rank,   (12)

then the rank of X is R and the CPD of X is unique.
Furthermore, if condition (12) is satisfied, then the CPD of X can be computed via a matrix EVD [27, 30]. In short, the “CPD” of X can be converted into a “basic CPD”
of an (R × R × R) tensor Q of rank R, even in cases where max(I, J ) < R [27, 30]. The latter CPD can be computed by means of a standard EVD (e.g., [3, 34]). In Section 4 we briefly discuss how to construct the tensor Q from X and how to retrieve the CPD factor matrices A, B and S of X from the CPD of Q.
More details about the CPD can be found in [3, 31, 27, 28, 29, 26, 30, 35, 2] and references therein.
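Condition (12) is easy to test numerically for a given factorization. The following sketch (our own, with generic random factor matrices) combines the compound-matrix and Khatri-Rao constructions to evaluate both rank conditions.

```python
# Our own numerical check of the sufficient uniqueness condition (12):
# S has full column rank and G = C_2(A) ⊙ C_2(B) has full column rank.
from itertools import combinations
import numpy as np

def compound(A, k):
    rows = list(combinations(range(A.shape[0]), k))
    cols = list(combinations(range(A.shape[1]), k))
    return np.array([[np.linalg.det(A[np.ix_(r, c)]) for c in cols]
                     for r in rows])

def khatri_rao(A, B):
    return np.einsum('ir,jr->ijr', A, B).reshape(A.shape[0] * B.shape[0], -1)

rng = np.random.default_rng(1)
I, J, K, R = 4, 4, 5, 3
A = rng.standard_normal((I, R))
B = rng.standard_normal((J, R))
S = rng.standard_normal((K, R))
G = khatri_rao(compound(A, 2), compound(B, 2))   # C_2(A) ⊙ C_2(B)
cond12 = (np.linalg.matrix_rank(S) == R and
          np.linalg.matrix_rank(G) == R * (R - 1) // 2)
```

For generic (random) factor matrices both ranks are full, so condition (12) holds with probability one.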
1.3. Block Term Decomposition (BTD) and coupled BTD
The multilinear rank-(P,P,1) term decomposition of a tensor is an extension of the CPD (7) in which each term consists of the outer product of a vector and a low-rank matrix [25]. More formally, a_r ◦ b_r ◦ s_r in (7) is replaced by E_r ◦ s_r:

  X = Σ_{r=1}^{R} E_r ◦ s_r,   (13)

where E_r ∈ C^{I×J} is a rank-P matrix with min(I, J) > P. Note that if P = 1, then (13) indeed reduces to (7) with E_r = a_r b_r^T = a_r ◦ b_r. We will consider the extension of (13) in which a set of tensors X^{(n)} ∈ C^{I_n×J_n×K}, n ∈ {1, . . . , N}, is decomposed into a sum of coupled multilinear rank-(P,P,1) terms [12], or a coupled BTD for short:

  X^{(n)} = Σ_{r=1}^{R} E_r^{(n)} ◦ s_r,   n ∈ {1, . . . , N},   (14)

where E_r^{(n)} ∈ C^{I_n×J_n} is a rank-P matrix with min(I_n, J_n) > P and s_r ∈ C^K. Note that the vectors {s_r} are shared between all X^{(n)}, i.e., the third mode ensures the coupling. As in the CPD case, the rank of the coupled BTD is defined as the minimal number of coupled multilinear rank-(P,P,1) terms {E_r^{(n)} ◦ s_r} that yield X^{(1)}, . . . , X^{(N)}. Since E_r^{(n)} is low-rank, (14) can also be expressed in terms of a coupled PD:

  X^{(n)} = Σ_{r=1}^{R} E_r^{(n)} ◦ s_r = Σ_{r=1}^{R} (N^{(n,r)} M^{(n,r)T}) ◦ s_r = Σ_{r=1}^{R} Σ_{p=1}^{P} n_p^{(n,r)} ◦ m_p^{(n,r)} ◦ s_r,   (15)

where E_r^{(n)} = N^{(n,r)} M^{(n,r)T}, in which N^{(n,r)} = [n_1^{(n,r)}, . . . , n_P^{(n,r)}] ∈ C^{I_n×P} and M^{(n,r)} = [m_1^{(n,r)}, . . . , m_P^{(n,r)}] ∈ C^{J_n×P} are rank-P matrices.
1.3.1. Matrix representation

Define

  M^{(n)} = [M^{(n,1)}, . . . , M^{(n,R)}] ∈ C^{I_n×PR},   (16)
  N^{(n)} = [N^{(n,1)}, . . . , N^{(n,R)}] ∈ C^{J_n×PR},   (17)
  S^{(ext)} = [1_P^T ⊗ s_1, . . . , 1_P^T ⊗ s_R] ∈ C^{K×PR}.   (18)

Then the factorization (15) can also be expressed in terms of {M^{(n)}, N^{(n)}, S^{(ext)}} as follows:

  X = [X^{(1)T}, . . . , X^{(N)T}]^T = [(M^{(1)} ⊙ N^{(1)})^T, . . . , (M^{(N)} ⊙ N^{(N)})^T]^T S^{(ext)T}.   (19)
1.3.2. Uniqueness condition for coupled BTD
The coupled BTD version of G_CPD^{(2)} in (9) is given by

  G_BTD^{(N,P+1)} = [(C_{P+1}(M^{(1)}) ⊙ C_{P+1}(N^{(1)}))^T, . . . , (C_{P+1}(M^{(N)}) ⊙ C_{P+1}(N^{(N)}))^T]^T P_BTD ∈ C^{(Σ_{n=1}^{N} C_{I_n}^{P+1} C_{J_n}^{P+1}) × (C_{R+P}^{P+1} − R)},   (20)

where M^{(n)} ∈ C^{I_n×PR} and N^{(n)} ∈ C^{J_n×PR} are given by (16) and (17), respectively, and P_BTD ∈ C^{C_{PR}^{P+1} × (C_{R+P}^{P+1} − R)} is the "compression" matrix that takes into account that each column vector s_r in (18) is repeated P times and that |M^{(n,r)}| = |N^{(n,r)}| = 0. The latter property implies that R columns of C_{P+1}(M^{(n)}) ⊙ C_{P+1}(N^{(n)}) are zero columns, which are eliminated by P_BTD. Here we only state how P_BTD is constructed; the reasoning behind the construction can be found in [12]. (More details can also be found in the proof of Theorem 2.1 in Appendix B.) The C_{R+P}^{P+1} − R columns of P_BTD are indexed by the lexicographically ordered tuples in the set

  Γ_c = {(r_1, . . . , r_{P+1}) | 1 ≤ r_1 ≤ · · · ≤ r_{P+1} ≤ R} \ {(r, . . . , r)}_{r=1}^{R}.

Consider also the mapping f_c : {(r_1, . . . , r_{P+1})} → {1, 2, . . . , C_{R+P}^{P+1} − R} that returns the position of its argument in the set Γ_c. Similarly, the C_{PR}^{P+1} rows of P_BTD are indexed by the lexicographically ordered tuples in the set

  Γ_r = {(q_1, . . . , q_{P+1}) | 1 ≤ q_1 < · · · < q_{P+1} ≤ PR}.

Likewise, we define the mapping f_r : {(q_1, . . . , q_{P+1})} → {1, 2, . . . , C_{PR}^{P+1}} that returns the position of its argument in the set Γ_r. The entries of P_BTD are now given by

  (P_BTD)_{f_r(q_1,...,q_{P+1}), f_c(r_1,...,r_{P+1})} = 1 if ⌈q_1/P⌉ = r_1, . . . , ⌈q_{P+1}/P⌉ = r_{P+1}, and 0 otherwise.   (21)
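The construction (21) of the compression matrix can be sketched directly from the index sets Γ_r and Γ_c. The snippet below is our own illustration (the function name `build_P_btd` is ours); it uses 1-based tuples as in the definitions above.

```python
# Our own sketch of P_BTD from (21): rows are indexed by strictly
# increasing (P+1)-tuples over {1,...,PR} (Gamma_r), columns by
# non-decreasing (P+1)-tuples over {1,...,R} minus the R constant
# tuples (Gamma_c); an entry is 1 when the row tuple maps onto the
# column tuple under q -> ceil(q/P).
from itertools import combinations, combinations_with_replacement
import numpy as np

def build_P_btd(P, R):
    cols = [t for t in combinations_with_replacement(range(1, R + 1), P + 1)
            if len(set(t)) > 1]                        # Gamma_c, lexicographic
    rows = list(combinations(range(1, P * R + 1), P + 1))  # Gamma_r
    Pmat = np.zeros((len(rows), len(cols)))
    for i, q in enumerate(rows):
        t = tuple(-(-qi // P) for qi in q)              # ceil(q/P), elementwise
        if t in cols:                                   # skip constant tuples
            Pmat[i, cols.index(t)] = 1.0
    return Pmat
```

For P = 2, R = 3 this yields a C_6^3 × (C_5^3 − 3) = 20 × 7 matrix of zeros and ones, with at most one nonzero entry per row.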
Theorem 1.3. Consider the coupled BTD of X^{(n)} ∈ C^{I_n×J_n×K}, n ∈ {1, . . . , N}, in (14). If

  S has full column rank and G_BTD^{(N,P+1)} has full column rank,   (22)

then the coupled BTD rank of {X^{(n)}} is R and the coupled BTD of {X^{(n)}} is unique.
As in Theorem 1.2, if condition (22) in Theorem 1.3 is satisfied, then the coupled BTD of {X
(n)} can be computed via a matrix EVD [13]. More details will be provided in Section 4.1.1.
The objective of this paper is to extend the CPD results discussed in this section to the case of monomial factorizations. More precisely, in Section 2 we explain that the monomial factorization can be interpreted as a coupled BTD. Next, in Section 3 we extend the uniqueness conditions stated in Theorems 1.1 and 1.2 to the case of bilinear models with factor matrices satisfying monomial relations. Finally, in Section 4 we extend the algebraic algorithm associated with Theorems 1.2 and 1.3 to the case of monomial factorizations.
2. Link between monomial factorization and coupled BTD
In Section 2.1 we explain how to represent the monomial structure (3) as a low-rank constraint on a particular matrix. Using this low-rank matrix, we will in Section 2.2 translate the monomial factorization (2) into the coupled BTD of the form (14) reviewed in Section 1.3.
2.1. Representation of monomial structure via low-rank matrix
We say that X admits an R-term monomial factorization enjoying N monomial relations of the form (3) if every column of A in (2) satisfies the monomial relations

  Π_{l=1}^{L} a_{lr}^{(+,n)} − Π_{l=1}^{L} a_{lr}^{(−,n)} = 0,   r ∈ {1, . . . , R},  n ∈ {1, . . . , N},   (23)

where a_r^{(+,n)} ∈ C^L and a_r^{(−,n)} ∈ C^L are given by

  a_r^{(+,n)} = [a_{1r}^{(+,n)}, . . . , a_{Lr}^{(+,n)}]^T = [a_{p_{1,n},r}, . . . , a_{p_{L,n},r}]^T,
  a_r^{(−,n)} = [a_{1r}^{(−,n)}, . . . , a_{Lr}^{(−,n)}]^T = [a_{s_{1,n},r}, . . . , a_{s_{L,n},r}]^T,   (24)

in which a_{lr}^{(+,n)} = a_{p_{l,n},r} corresponds to the p_{l,n}-th entry of the r-th column of A (similarly for a_{lr}^{(−,n)}). Define the linearly structured matrix A_L(a_r^{(+,n)}, a_r^{(−,n)}) ∈ C^{L×L}: (footnote 1)

  A_L(a_r^{(+,n)}, a_r^{(−,n)}) :=
  [ a_{1r}^{(+,n)}        0              · · ·        0        (−1)^L · a_{1r}^{(−,n)} ]
  [ a_{2r}^{(−,n)}   a_{2r}^{(+,n)}      ⋱                          0                 ]
  [      0           a_{3r}^{(−,n)}      ⋱            ⋱             ⋮                 ]
  [      ⋮                ⋱              ⋱            ⋱             0                 ]
  [      0               · · ·           0       a_{Lr}^{(−,n)}  a_{Lr}^{(+,n)}       ]   (25)

  = diag(a_r^{(+,n)}) + (diag(a_r^{(−,n)}) P) ∗ (1_L 1_L^T − e_1^{(L)} e_L^{(L)T} + (−1)^L e_1^{(L)} e_L^{(L)T}),

where P is the "cyclic" permutation matrix given by

  P = Σ_{l=1}^{L−1} e_{l+1}^{(L)} e_l^{(L)T} + e_1^{(L)} e_L^{(L)T} ∈ C^{L×L}.   (26)

Footnote 1: The matrix 1_L 1_L^T − e_1^{(L)} e_L^{(L)T} + (−1)^L e_1^{(L)} e_L^{(L)T} takes the sign (−1)^L in (25) into account.
From the cofactor expansion of |A_L(a_r^{(+,n)}, a_r^{(−,n)})| along the first row, the connection between (23) and (25) becomes clear:

  |A_L(a_r^{(+,n)}, a_r^{(−,n)})| = a_{1r}^{(+,n)} Π_{l=2}^{L} a_{lr}^{(+,n)} + (−1)^L a_{1r}^{(−,n)} · (−1)^{L+1} Π_{l=2}^{L} a_{lr}^{(−,n)} = Π_{l=1}^{L} a_{lr}^{(+,n)} − Π_{l=1}^{L} a_{lr}^{(−,n)} = 0,   (27)

where we exploited that the two involved (L−1) × (L−1) minors in (27) are triangular: the minor multiplying a_{1r}^{(+,n)} is lower triangular with diagonal (a_{2r}^{(+,n)}, . . . , a_{Lr}^{(+,n)}), and the minor multiplying the corner entry is upper triangular with diagonal (a_{2r}^{(−,n)}, . . . , a_{Lr}^{(−,n)}).
The determinant property (27) also explains that A_L(a_r^{(+,n)}, a_r^{(−,n)}) is low-rank under the condition (23). In fact, since the minors in (27) do not vanish under condition (23), A_L(a_r^{(+,n)}, a_r^{(−,n)}) will be a rank-(L−1) matrix. The only possible exception is the trivial case where Π_{m=1}^{L} a_{p_m} = 0 and Π_{n=1}^{L} a_{s_n} = 0. In this section it will become clear that if Π_{m=1}^{L} a_{p_m} ≠ 0 or Π_{n=1}^{L} a_{s_n} ≠ 0, then the linear structure on A_L(a_r^{(+,n)}, a_r^{(−,n)}) can be relaxed without affecting the identifiability. To the best of our knowledge, the representation of a monomial relation of the form (23) via the rank deficiency of the matrix in (25) is a novel contribution of this paper.
2.2. Monomial factorization via coupled BTD
Consider the bilinear factorization (2) in which the columns of A satisfy N monomial relations of the form (23). The mapping (25), together with the bilinear property of the monomial factorization, enables us to transform (2) into a coupled BTD. In detail, for every monomial relation (n ∈ {1, . . . , N}) we build a tensor X^{(n)} ∈ C^{L×L×K} with matrix slices

  X^{(n)}(:, :, k) = A_L(x_k^{(+,n)}, x_k^{(−,n)}) = Σ_{r=1}^{R} A_L(a_r^{(+,n)}, a_r^{(−,n)}) s_{kr},   k ∈ {1, . . . , K},   (28)

in which x_k^{(+,n)} ∈ C^L and x_k^{(−,n)} ∈ C^L are constructed from the entries of the k-th column of X in accordance with the n-th monomial relation, so that (cf. Eq. (24)):

  x_k^{(+,n)} = [x_{1k}^{(+,n)}, . . . , x_{Lk}^{(+,n)}]^T = [x_{p_{1,n},k}, . . . , x_{p_{L,n},k}]^T,
  x_k^{(−,n)} = [x_{1k}^{(−,n)}, . . . , x_{Lk}^{(−,n)}]^T = [x_{s_{1,n},k}, . . . , x_{s_{L,n},k}]^T.   (29)

Overall, this yields the coupled BTD:

  C^{L×L×K} ∋ X^{(n)} = Σ_{r=1}^{R} A_L(a_r^{(+,n)}, a_r^{(−,n)}) ◦ s_r,   n ∈ {1, . . . , N}.   (30)

The key observation is that each equation in (28) corresponds to a BTD, and the collection of all equations in (28) is a coupled BTD.
Matrix representation. Let X^{(··k,n)} ∈ C^{L×L} denote the k-th matrix slice of X^{(n)}, defined according to (X^{(··k,n)})_{ij} = (X^{(n)})_{ijk}. We now obtain the following matrix representation of (30):

  X^{(n)} = [vec(X^{(··1,n)}), . . . , vec(X^{(··K,n)})] = A^{(n)} S^T ∈ C^{L²×K},   n ∈ {1, . . . , N},

where A^{(n)} = [vec(A_L(a_1^{(+,n)}, a_1^{(−,n)})), . . . , vec(A_L(a_R^{(+,n)}, a_R^{(−,n)}))] ∈ C^{L²×R} and S = [s_1, . . . , s_R] ∈ C^{K×R}. Stacking yields

  X = [X^{(1)T}, . . . , X^{(N)T}]^T = A_tot S^T ∈ C^{N·L²×K},   (31)

where A_tot ∈ C^{N·L²×R} is given by

  A_tot = [A^{(1)T}, . . . , A^{(N)T}]^T =
  [ vec(A_L(a_1^{(+,1)}, a_1^{(−,1)}))   · · ·   vec(A_L(a_R^{(+,1)}, a_R^{(−,1)})) ]
  [ ⋮                                     ⋱      ⋮                                 ]
  [ vec(A_L(a_1^{(+,N)}, a_1^{(−,N)}))   · · ·   vec(A_L(a_R^{(+,N)}, a_R^{(−,N)})) ].   (32)

Uniqueness condition. Since A_L(a_r^{(+,n)}, a_r^{(−,n)}) defined by (25) is low-rank, (30) corresponds to a coupled BTD. In more detail, let the rank of A_L(a_r^{(+,n)}, a_r^{(−,n)}) be equal to L_{r,n} < L; then it admits the low-rank factorization

  A_L(a_r^{(+,n)}, a_r^{(−,n)}) = N^{(n,r)} M^{(n,r)T} = [Ñ^{(n,r)}, 0_{L,L−1−L_{r,n}}] [M̃^{(n,r)}, 0_{L,L−1−L_{r,n}}]^T,   (33)

where Ñ^{(n,r)} ∈ C^{L×L_{r,n}} and M̃^{(n,r)} ∈ C^{L×L_{r,n}} are rank-L_{r,n} matrices and 0_{m,n} denotes an (m×n) zero matrix. Note that if ω(a_r^{(+,n)}) = L or ω(a_r^{(−,n)}) = L, then L_{r,n} = L−1, as explained in Section 2.1. However, if ω(a_r^{(+,n)}) < L and ω(a_r^{(−,n)}) < L, then L_{r,n} < L−1.

Consider also G_BTD^{(N,L)} given by (20), in which M^{(n)} and N^{(n)} are built from the M^{(n,r)} and N^{(n,r)} matrices in (33). We can now conclude that if for all r ∈ {1, . . . , R} there exists an n ∈ {1, . . . , N} such that L_{r,n} = L − 1 and G_BTD^{(N,L)} has full column rank, then the monomial factorization problem can be solved via the coupled BTD (30). Theorem 2.1 below summarizes the result.
Theorem 2.1. Consider the coupled BTD of X^{(n)} ∈ C^{L×L×K}, n ∈ {1, . . . , N}, in (30). If

  S has full column rank and G_BTD^{(N,L)} has full column rank,   (34)

then the coupled BTD rank of {X^{(n)}} is R, the coupled BTD of {X^{(n)}} is unique, A_tot has full column rank, the monomial factorization of X in (2) is unique, and A in (2) has full column rank.

Proof. The result is an immediate consequence of Theorem 1.3. The interested reader can find a proof in Appendix B, in which the connection between Theorem 2.1 and the subsequent Theorem 3.2 is made apparent.
Note that in Theorem 2.1 we state that if condition (34) is satisfied, then A in (2) has full column rank. This is an obvious consequence of the uniqueness property of the full column rank factor matrix S. Note also that we have dropped the linear structure on A_L(a_r^{(+,n)}, a_r^{(−,n)}) and instead used the low-rank factorization A_L(a_r^{(+,n)}, a_r^{(−,n)}) = N^{(n,r)} M^{(n,r)T} in the coupled BTD of {X^{(n)}}.

As a final remark, we mention that the transformation step from a monomial factorization (2) to an "unconstrained" coupled BTD (15) can be understood as a generalization of the Hankelisation step used in the ESPRIT method [5] for the special case where the columns of A have exponential structure.
3. Identifiability conditions for monomial factorizations
We will now take the linear structure on A_L(a_r^{(+,n)}, a_r^{(−,n)}) into account. In Section 3.1 we explain that this leads to generalizations of the uniqueness conditions stated in Theorems 1.1 and 1.2 for the CPD to monomial factorizations. Next, in Section 3.2 we formulate the presented uniqueness conditions in terms of null spaces, which leads to a condition that is easier to comprehend.
3.1. Necessary and sufficient uniqueness conditions

3.1.1. Mixed discriminants

In Theorem 3.1 we present a uniqueness condition for the monomial factorization of X. It will be based on the properties of the mixed discriminant discussed in this section. However, let us first briefly explain how mixed discriminants appear in our problem. The overall idea is to find a condition that ensures that S^T has a unique right-inverse (up to intrinsic column scaling and permutation ambiguities), denoted by W. If W is unique, then X w_r = a_r is also unique and ω(S^T w_r) = 1 for all r ∈ {1, . . . , R}. This means that if d_r = S^T w_r, then X^{(n)} w_r = Σ_{s=1}^{R} vec(A^{(n,s)}) d_{sr} is a vectorized rank-(L−1) matrix. The latter property can be used to derive a condition that ensures the uniqueness of W.

Let H^{(r)} ∈ C^{L×L} be of the form (25), corresponding to A^{(n,r)} in which the superscript 'n' is omitted. Then in the proof of Theorem 3.1 it will become clear that the derivation of a uniqueness condition for W, and therefore also for the monomial factorization, boils down to finding a compact expression for the determinant of H^{(1)} d_1 + · · · + H^{(R)} d_R, where d_r ∈ C:

  |Σ_{r=1}^{R} H^{(r)} d_r| = Σ_{r_1,...,r_L=1}^{R} D(H^{(r_1)}, . . . , H^{(r_L)}) · d_{r_1} · · · d_{r_L}
  = Σ_{(l_1,...,l_R)∈[L]_R} D(H^{(1)}, . . . , H^{(1)} [l_1 times], . . . , H^{(R)}, . . . , H^{(R)} [l_R times]) d_1^{l_1} · · · d_R^{l_R},   (35)

where [L]_R denotes the set of all weak compositions of L into R terms, i.e.,

  [L]_R = {(l_1, . . . , l_R) | l_1 + · · · + l_R = L and l_1, . . . , l_R ≥ 0}.   (36)

Note that the cardinality of [L]_R is C_{R+L−1}^{L}. The coefficients {D(H^{(r_1)}, . . . , H^{(r_L)})} in (35) are known as mixed discriminants and are given by

  D(H^{(r_1)}, . . . , H^{(r_L)}) = (1/L!) · ∂^L |H^{(r_1)} d_{r_1} + · · · + H^{(r_L)} d_{r_L}| / (∂d_{r_1} · · · ∂d_{r_L}).   (37)
The mixed discriminant of an L-tuple of square matrices H^{(r_1)} ∈ C^{L×L}, . . . , H^{(r_L)} ∈ C^{L×L} can also be defined as

  D(H^{(r_1)}, . . . , H^{(r_L)}) = (1/L!) Σ_{σ∈S_L} sgn(σ) [h_{σ(1)}^{(r_1)}, h_{σ(2)}^{(r_2)}, . . . , h_{σ(L)}^{(r_L)}],   (38)

where h_{σ(l)}^{(r_l)} denotes the σ(l)-th column of H^{(r_l)}, S_L denotes the set of all permutations of 1, 2, . . . , L and sgn(σ) denotes the sign of the permutation σ. It can be shown that the two definitions (37) and (38) of the mixed discriminant are equivalent [36].

From (38) it is clear that the mixed discriminant can be understood as an extension of the determinant. Indeed, if H := H^{(r_1)} = · · · = H^{(r_L)}, then (38) reduces to the determinant

  D(H, . . . , H [L times]) = Σ_{σ∈S_L} sgn(σ) Π_{l=1}^{L} h_{l,σ(l)} = |H|.   (39)

The mixed discriminant can also be understood as an extension of the permanent. More precisely, let D^{(1)} ∈ C^{L×L}, . . . , D^{(L)} ∈ C^{L×L} be diagonal matrices; then from (38) we obtain (a scaled version of) the permanent

  D(D^{(1)}, . . . , D^{(L)}) = (1/L!) Σ_{σ∈S_L} Π_{l=1}^{L} d_{l,l}^{(σ(l))} = (1/L!) |B|^+,   (40)

where B ∈ C^{L×L} is given by (B)_{il} = d_{ii}^{(l)}. Furthermore, for diagonal matrices D^{(1)}, . . . , D^{(L)} ∈ C^{L×L} we have

  D(D^{(1)}, . . . , D^{(L)}) = D(D^{(σ(1))}, . . . , D^{(σ(L))}) = (1/L!) |B|^+,   ∀σ ∈ S_L,   (41)

which follows from the column permutation invariance property of the permanent, i.e., |B|^+ = |BΠ|^+ for any permutation matrix Π ∈ C^{L×L}. Note that the permanent can be seen as a signless version of the determinant (i.e., |H|^+ is equal to (39) when sgn(σ) is dropped). This directly explains the permutation invariance property of the permanent.

The three properties (39)–(41) of the mixed discriminant will be used in the derivation of Theorem 3.1. A further discussion of the mixed discriminant and its properties can be found in [36, 37]. A discussion of the properties of the permanent can be found in [32, 33].
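Definition (38) translates directly into a (combinatorial, exponential-cost) computation. The sketch below is our own illustration; it evaluates (38) and checks the determinant property (39) and the permanent property (40) on small examples.

```python
# Our own sketch of the mixed discriminant via (38):
# D(H_1,...,H_L) = (1/L!) * sum over sigma of sgn(sigma) times the
# determinant of the matrix whose l-th column is column sigma(l) of H_l.
from itertools import permutations
from math import factorial
import numpy as np

def perm_sign(p):
    # sign of a permutation tuple p (p[i] is the image of i), via cycles
    sign, seen = 1, set()
    for i in range(len(p)):
        if i in seen:
            continue
        j, length = i, 0
        while j not in seen:
            seen.add(j)
            j = p[j]
            length += 1
        sign *= (-1) ** (length - 1)
    return sign

def mixed_discriminant(Hs):
    L = len(Hs)
    total = 0.0
    for sigma in permutations(range(L)):
        cols = np.column_stack([Hs[l][:, sigma[l]] for l in range(L)])
        total += perm_sign(sigma) * np.linalg.det(cols)
    return total / factorial(L)
```

With all arguments equal, the result is the determinant (property (39)); with diagonal arguments it is the scaled permanent (1/L!)|B|^+ of property (40).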
3.1.2. Uniqueness condition in terms of dimension of column space

As mentioned earlier, in the proof of Theorem 3.1 we will make use of the expansion of |Σ_{r=1}^{R} A_L(a_r^{(+,n)}, a_r^{(−,n)}) d_r| in terms of the scalars d_1, . . . , d_R. The key ingredient in the derivation of condition (53) in Theorem 3.1 is the following identity:

  |Σ_{r=1}^{R} A_L(a_r^{(+,n)}, a_r^{(−,n)}) d_r| = Σ_{σ∈S_L} sgn(σ) Π_{l=1}^{L} (Σ_{r=1}^{R} d_r · (A_L(a_r^{(+,n)}, a_r^{(−,n)}))_{lσ(l)})
  = Π_{l=1}^{L} (Σ_{r=1}^{R} d_r a_{lr}^{(+,n)}) − Π_{l=1}^{L} (Σ_{r=1}^{R} d_r a_{lr}^{(−,n)}),   (42)

where S_L denotes the set of all permutations of 1, 2, . . . , L, and sgn(σ) denotes the sign of the permutation σ. Note also that (42) directly follows from the patterned structure of A_L(a_r^{(+,n)}, a_r^{(−,n)}). (See also equations (25) and (27).) Define

  A^{(+,r)} = [a_r^{(+,1)T}, . . . , a_r^{(+,N)T}]^T ∈ C^{N×L}   and   A^{(−,r)} = [a_r^{(−,1)T}, . . . , a_r^{(−,N)T}]^T ∈ C^{N×L}.   (43)

Due to properties (39)–(41), the expansion of the product-of-sums in (42) with respect to d_1, . . . , d_R yields the homogeneous polynomial (see the proof of Theorem 3.1 for more details):

  |Σ_{r=1}^{R} A_L(a_r^{(+,n)}, a_r^{(−,n)}) d_r| = Π_{l=1}^{L} (Σ_{r=1}^{R} d_r a_{lr}^{(+,n)}) − Π_{l=1}^{L} (Σ_{r=1}^{R} d_r a_{lr}^{(−,n)})   (44)
  = Σ_{(l_1,...,l_R)∈[L]_R} [ D(D_n(A^{(+,1)}), . . . [l_1 times], . . . , D_n(A^{(+,R)}), . . . [l_R times])
    − D(D_n(A^{(−,1)}), . . . [l_1 times], . . . , D_n(A^{(−,R)}), . . . [l_R times]) ] d_1^{l_1} · · · d_R^{l_R}.
(Compare with (35).) In terms of the matrices and vectors defined below, a compact expression of (44) will be introduced. For every weak composition of L into R terms (i.e., l_1 + · · · + l_R = L subject to l_r ≥ 0) we define the square (L×L) matrices

  A^{(+,n)}_{(l_1,...,l_R)} = [1_{l_1}^T ⊗ a_1^{(+,n)}, . . . , 1_{l_R}^T ⊗ a_R^{(+,n)}] ∈ C^{L×L},   (45)
  A^{(−,n)}_{(l_1,...,l_R)} = [1_{l_1}^T ⊗ a_1^{(−,n)}, . . . , 1_{l_R}^T ⊗ a_R^{(−,n)}] ∈ C^{L×L}.   (46)

From the matrices in (45) and (46), we also build the row vectors g_+^{(n,L)} ∈ C^{1×(C_{R+L−1}^{L}−R)} and g_−^{(n,L)} ∈ C^{1×(C_{R+L−1}^{L}−R)} whose entries are indexed by an R-tuple (l_1, l_2, . . . , l_R) with 0 ≤ l_r ≤ L−1, ordered lexicographically:

  g_+^{(n,L)} = [ |A^{(+,n)}_{(L−1,1,0,0,...,0)}|^+, |A^{(+,n)}_{(L−1,0,1,0,...,0)}|^+, . . . , |A^{(+,n)}_{(0,...,0,1,L−1)}|^+ ],   (47)
  g_−^{(n,L)} = [ |A^{(−,n)}_{(L−1,1,0,0,...,0)}|^+, |A^{(−,n)}_{(L−1,0,1,0,...,0)}|^+, . . . , |A^{(−,n)}_{(0,...,0,1,L−1)}|^+ ].   (48)

Based on (47) and (48) we in turn build the row vector

  g_MF^{(n,L)} = (g_+^{(n,L)} − g_−^{(n,L)}) D_W^{(L)} ∈ C^{1×(C_{R+L−1}^{L}−R)},   (49)

in which the subscript 'MF' stands for Monomial Factorization and the diagonal weight matrix D_W^{(L)} ∈ C^{(C_{R+L−1}^{L}−R)×(C_{R+L−1}^{L}−R)} is given by

  D_W^{(L)} = diag(w^{(L)}_{(L−1,1,0,0,...,0)}, w^{(L)}_{(L−1,0,1,0,...,0)}, . . . , w^{(L)}_{(0,...,0,1,L−1)}),   (50)

where the scalar w^{(L)}_{(l_1,l_2,...,l_R)} = 1/(l_1! l_2! · · · l_R!) takes into account that, due to the column permutation invariance property of the permanent, |A^{(+,n)}_{(l_1,l_2,...,l_R)}|^+ and |A^{(−,n)}_{(l_1,l_2,...,l_R)}|^+ appear L!/(l_1! l_2! · · · l_R!) times in the expansion of |Σ_{r=1}^{R} A_L(a_r^{(+,n)}, a_r^{(−,n)}) d_r| and that each permanent is scaled by the factor 1/L! (see (40)), as will be made clear in the proof of Theorem 3.1. Overall, we build
  G_MF^{(N,L)} = [g_MF^{(1,L)T}, g_MF^{(2,L)T}, . . . , g_MF^{(N,L)T}]^T ∈ C^{N×(C_{R+L−1}^{L}−R)}.   (51)

It can be verified that (51) is an extension of (9) to the monomial case, i.e., if X satisfies the CPD factorization (8) with full column rank S, then G_MF^{(N,L)} reduces to G_CPD^{(2)}. Note that in the former case there are two superscripts, 'N' and 'L', which indicate the number of monomial constraints/equations and the degree of the involved monomials, respectively. In the CPD case we have N = C_I^2 C_J^2 and L = 2. Finally, we will also make use of the vector

  f^{(L)}(d) = [d_1^{L−1} d_2, d_1^{L−1} d_3, . . . , d_{R−1} d_R^{L−1}]^T ∈ C^{C_{R+L−1}^{L}−R}.   (52)

Comparing (10) with (52), it is clear that the latter is also an extension of the former. More precisely, f^{(L)}(d) consists of all C_{R+L−1}^{L} distinct entries of d ⊗ · · · ⊗ d minus the R entries d_1^L, . . . , d_R^L.
Theorem 3.1. Consider the monomial factorization of X in (2) with N monomial relations of the form (3). Assume that S has full column rank. The monomial factorization of X is unique, A in (2) has full column rank and A_tot in (31) has full column rank if and only if the following implication holds:

  G_MF^{(N,L)} · f^{(L)}(d) = 0 ⇒ ω(d) ≤ 1,   (53)

for all structured vectors f^{(L)}(d) of the form (52).

Proof. See Appendix A.
As with (11) in the CPD case, condition (53) can be hard to check. Analogously, we observe that if G_MF^{(N,L)} in (53) has full column rank, then f^{(L)}(d) = 0 and the condition is automatically satisfied. This leads to the following easier-to-check sufficient uniqueness condition.
Theorem 3.2. Consider the monomial factorization of X in (2) with N monomial relations of the form (3). If

  S has full column rank and G_MF^{(N,L)} has full column rank,   (54)

then the monomial factorization of X is unique, A in (2) has full column rank and A_tot in (31) has full column rank.
3.2. Uniqueness condition in terms of dimension of null space

Theorem 3.3 below provides an alternative formulation of Theorem 3.1, which may be easier to comprehend. Theorem 3.3 makes use of a matrix Ψ^{(N,L)} ∈ C^{N×R^L}, defined as

  Ψ^{(N,L)} = [ψ^{(1,L)T}, . . . , ψ^{(N,L)T}]^T
  = [(ã_1^{(+,1)} ⊗ · · · ⊗ ã_L^{(+,1)}), . . . , (ã_1^{(+,N)} ⊗ · · · ⊗ ã_L^{(+,N)})]^T − [(ã_1^{(−,1)} ⊗ · · · ⊗ ã_L^{(−,1)}), . . . , (ã_1^{(−,N)} ⊗ · · · ⊗ ã_L^{(−,N)})]^T,   (55)

where ã_l^{(+,n)} = [a_{l1}^{(+,n)}, . . . , a_{lR}^{(+,n)}]^T ∈ C^R and ã_l^{(−,n)} = [a_{l1}^{(−,n)}, . . . , a_{lR}^{(−,n)}]^T ∈ C^R. Note that

  |Σ_{r=1}^{R} A_L(a_r^{(+,n)}, a_r^{(−,n)}) d_r| = (ã_1^{(+,n)} ⊗ · · · ⊗ ã_L^{(+,n)} − ã_1^{(−,n)} ⊗ · · · ⊗ ã_L^{(−,n)})^T (d ⊗ · · · ⊗ d) = ψ^{(n,L)} (d ⊗ · · · ⊗ d),   (56)

where d = [d_1, . . . , d_R]^T ∈ C^R. Theorem 3.3 will also make use of the subspace

  ker(Ψ^{(N,L)}) ∩ π_S^{(L)},   (57)

where π_S^{(L)} denotes the subspace of vectorized R^L symmetric tensors. The link between Theorem 3.1 and Theorem 3.3 follows from the observation that condition (53) can also be expressed as

  f^{(L)}(d) ∈ ker(G_MF^{(N,L)}) ⇒ ω(d) ≤ 1.   (58)

Since

  G_MF^{(N,L)} f^{(L)}(d) = 0 ⇔ Ψ^{(N,L)} (d ⊗ · · · ⊗ d) = 0,   (59)

as explained in Appendix C, condition (58) can in turn be expressed as

  d ⊗ · · · ⊗ d ∈ ker(Ψ^{(N,L)}) ∩ π_S^{(L)} ⇒ ω(d) ≤ 1.   (60)

Note that this also means that the dimension of the subspace ker(G_MF^{(N,L)}) is equal to the dimension of the subspace ker(Ψ^{(N,L)}) ∩ π_S^{(L)}. Theorems 3.3 and 3.4 below are reformulations of Theorems 3.1 and 3.2 in terms of ker(Ψ^{(N,L)}) ∩ π_S^{(L)}.

Theorem 3.3. Consider the monomial factorization of X in (2) with N monomial relations of the form (3). Assume that S has full column rank. The monomial factorization of X is unique, A in (2) has full column rank and A_tot in (31) has full column rank if and only if the following implication holds:

  d ⊗ · · · ⊗ d ∈ ker(Ψ^{(N,L)}) ∩ π_S^{(L)} ⇒ ω(d) ≤ 1,   (61)

where Ψ^{(N,L)} is given by (55).
Theorem 3.4. Consider the monomial factorization of X in (2) with N monomial relations of the form (3). If

  S has full column rank and ker(Ψ^{(N,L)}) ∩ π_S^{(L)} is an R-dimensional subspace,   (62)

then the monomial factorization of X is unique, A in (2) has full column rank and A_tot in (31) has full column rank.
3.3. Application: Extension of CPD to (0,1)-binary weighted CPD

3.3.1. CPD

Consider the CPD of X given by (7) in which E^{(r)} = a_r b_r^T is associated with the r-th column of A ⊙ B. Recall that any 2-by-2 submatrix of E^{(r)} is a rank-1 matrix, i.e.,

  | e_{i1j1}^{(r)}  e_{i1j2}^{(r)} ; e_{i2j1}^{(r)}  e_{i2j2}^{(r)} | = e_{i1j1}^{(r)} e_{i2j2}^{(r)} − e_{i1j2}^{(r)} e_{i2j1}^{(r)} = 0.

Since there are C_I^2 ways of selecting two rows of E^{(r)} and C_J^2 ways of selecting two columns of E^{(r)}, it is clear that the CPD can be interpreted as a monomial factorization involving N = C_I^2 C_J^2 monomial relations of the form (23) with L = 2, a_r^{(+,n)} = [e_{i1j1}^{(r)}, e_{i2j2}^{(r)}]^T and a_r^{(−,n)} = [e_{i1j2}^{(r)}, e_{i2j1}^{(r)}]^T, where the superscript 'n' is associated with the tuple (i_1, i_2, j_1, j_2).
3.3.2. Binary weighted CPD

A nice property of monomial factorizations is that they allow us to extend the CPD model (1) to the binary weighted CPD (6), in which E^{(r)} in (7) now takes the form E^{(r)} = D^{(r)} ∗ (a_r b_r^T), where D^{(r)} ∈ {0, 1}^{I×J} is a binary "connectivity" matrix. This means that the tensor representation (7) extends to

  X = Σ_{r=1}^{R} E^{(r)} ◦ s_r = Σ_{r=1}^{R} (D^{(r)} ∗ (a_r b_r^T)) ◦ s_r   (63)

for the binary weighted CPD. From (8) and (63) it is clear that (6) is a matrix representation of the binary weighted CPD of X in which D = [vec(D^{(1)T}), . . . , vec(D^{(R)T})] ∈ C^{IJ×R}. Since E^{(r)} is not necessarily a low-rank matrix, the CPD modeling approach cannot be used for the binary weighted CPD. However, it can be verified that any 2-by-2 submatrix of E^{(r)} = D^{(r)} ∗ (a_r b_r^T) must satisfy the monomial relation

  e_{i1j1}^{(r)} e_{i2j2}^{(r)} e_{i1j2}^{(r)} e_{i2j1}^{(r)} · (e_{i1j1}^{(r)} e_{i2j2}^{(r)} − e_{i1j2}^{(r)} e_{i2j1}^{(r)}) = 0.

We can now conclude that the binary weighted CPD of a tensor can be interpreted as a monomial factorization involving N = C_I^2 C_J^2 monomial relations of the form (23) with L = 6 and

  a_r^{(+,n)} = [e_{i1j1}^{(r)}, e_{i2j2}^{(r)}, e_{i1j2}^{(r)}, e_{i2j1}^{(r)}, e_{i1j1}^{(r)}, e_{i2j2}^{(r)}]^T,
  a_r^{(−,n)} = [e_{i1j1}^{(r)}, e_{i2j2}^{(r)}, e_{i1j2}^{(r)}, e_{i2j1}^{(r)}, e_{i1j2}^{(r)}, e_{i2j1}^{(r)}]^T.