Monomial factorizations via tensor decompositions
Mikael Sørensen^a, Lieven De Lathauwer^b, Nicholaos D. Sidiropoulos^a

^a University of Virginia, Dept. of Electrical and Computer Engineering, Thornton Hall 351 McCormick Road, Charlottesville, VA 22904, USA, {ms8tz, nikos}@virginia.edu.
^b Group Science, Engineering and Technology, KU Leuven - Kulak, E. Sabbelaan 53, 8500 Kortrijk, Belgium, and KU Leuven - STADIUS Center for Dynamical Systems, Signal Processing and Data Analytics, E.E. Dept. (ESAT), Kasteelpark Arenberg 10, B-3001 Leuven-Heverlee, Belgium. Lieven.DeLathauwer@kuleuven.be.
Abstract
The Canonical Polyadic Decomposition (CPD), which decomposes a tensor into a sum of rank-one terms, plays an important role in signal processing and machine learning. In this paper we extend the CPD framework to the more general case of monomial factorizations. This includes extensions of multilinear algebraic uniqueness conditions originally developed for the CPD. We obtain a deterministic condition for monomial factorizations that is both necessary and sufficient but may be difficult to check in practice. We derive a deterministic relaxation that admits a constructive interpretation and is also easier to verify. Computationally, we reduce the monomial factorization problem to a CPD problem, which can be solved via a matrix EigenValue Decomposition (EVD). Under the given conditions, the discussed EVD-based algorithms are guaranteed to return the exact monomial factorization. Finally, we make a connection between monomial factorizations and the coupled block term decomposition, which allows us to translate monomial structures into low-rank structures.
Keywords: tensor, canonical polyadic decomposition, block term decomposition, coupled decomposition, monomial, uniqueness, eigenvalue decomposition.
2010 MSC: 15A15, 15A23
Preprint submitted to Linear Algebra and its Applications, July 8, 2019.

1. Introduction

Tensors have found many applications in signal processing and machine learning; see [1, 2] and references therein. The most well-known tensor decomposition is the Canonical Polyadic Decomposition (CPD), i.e., the decomposition into a minimal number of rank-one terms [3, 4]:

  X = (A ⊙ B) S^T ∈ C^{IJ×K},   (1)

where '⊙' denotes the Khatri-Rao (columnwise Kronecker) product, A ∈ C^{I×R}, B ∈ C^{J×R} and S ∈ C^{K×R}. Note that the columns of A ⊙ B correspond to vectorized rank-one matrices, explaining why (1) is referred to as a decomposition into rank-one terms. (A formal definition of the CPD will be provided in Section 1.2.) In signal processing, the CPD is related to the ESPRIT [5, 6] and ACMA [7] methods, while in machine learning it is related to the naive Bayes model [8, 9, 10, 11]. In [12, 13] we extended the CPD framework to the coupled CPD and we have shown the usefulness of the latter decomposition in sensor array processing [14], wireless communication [15] and multidimensional harmonic retrieval [16, 17]. In this paper we will further extend the CPD framework to more general monomial structures. (A monomial is a product of variables, possibly with repetitions.) More precisely, we consider bilinear factorizations of the form
  X = A S^T ∈ C^{I×K},   (2)

in which the columns of A ∈ C^{I×R} (or, similarly, the columns of S ∈ C^{K×R}) satisfy monomial relations of the form

  a_{p1,r} · · · a_{pL,r} − a_{s1,r} · · · a_{sL,r} = 0,   (3)

where a_{m,r} denotes the m-th entry of the r-th column of A. A bilinear factorization (2) exhibiting monomial structure of the form (3) will be referred to as a monomial factorization. (In Sections 3 and 4 it will become clear that (1) is a special case of (2).) To make things more tangible, let us consider a concrete example. In signal processing, the separation of digital communication signals is probably one of the earliest examples involving monomial structures. For instance, blind separation of M-PSK signals, in which the entries of S in (2) take the form

  s_{kr} = e^{√−1 · u_{kr}}  with  u_{kr} ∈ {0, 2π/M, . . . , 2π(M−1)/M},   (4)

has been considered (e.g., [18, 19]). From (4) it is clear that s_{k1,r}^M = s_{k2,r}^M for all k_1, k_2 ∈ {1, . . . , K}. In other words, for every pair (k_1, k_2) with k_1 < k_2, we can exploit C_K^2 = (K−1)K/2 monomial relations of the form s_{k1,r}^M − s_{k2,r}^M = 0. In this paper we will explain how to translate this type of problem into a tensor decomposition problem. Another example, which will be discussed in Section 4.4, is the Binary Matrix Factorization (BMF):
  X = A S^T ∈ C^{I×K},   (5)

where A ∈ {0, 1}^{I×R} is a binary matrix. BMFs of the form (5) play a role in binary latent variable modeling (e.g., [20, 21, 22]).
Monomial factorizations have the interesting property that they provide a framework that allows us to generalize the CPD model. As an example, the presented monomial factorization framework enables us to extend the CPD model (1) to the case of binary weighted rank-one terms (this will be made clear in Section 3.3):
  X = (D ∗ (A ⊙ B)) S^T ∈ C^{IJ×K},   (6)

where '∗' denotes the Hadamard (element-wise) product and D ∈ {0, 1}^{IJ×R}. Binary weighted rank-one terms are of interest in clustering applications involving tensor structures (e.g., [23, 24]).
The paper is organised as follows. In the rest of the introduction we will first present the notation used throughout the paper, followed by a brief review of the CPD and the Block Term Decomposition (BTD) [25]. As our first contribution, we will in Section 2 present a new link between monomial factorizations of the form (2) and the coupled BTD [12, 13]. This connection enables us to translate the monomial constraint (3) into a low-rank constraint, which in turn allows us to treat the monomial factorization of a matrix as a tensor decomposition problem. Next, in Section 3 we will present
identifiability conditions. It will be explained that the presented identifiability conditions are extensions of well-known CPD uniqueness conditions developed in [26, 27, 28, 29] to the monomial case. We also explain that the monomial factorization framework can be used to generalize the CPD model (1) to the binary weighted CPD model (6). As our third contribution, we will in Section 4 extend the algebraic algorithm for CPD in [27, 30]
to monomial factorizations. We also demonstrate how the presented algebraic algorithm can be adapted and used for the computation of a BMF of the form (5).
1.1. Notation
Vectors, matrices and tensors are denoted by lower case boldface, upper case boldface and upper case calligraphic letters, respectively. The r-th column, conjugate, transpose, conjugate-transpose, determinant, permanent, inverse, right-inverse, range and kernel of a matrix A are denoted by a_r, A^∗, A^T, A^H, |A|, |A|^+, A^{−1}, A^†, range(A), ker(A), respectively. The dimension of a subspace S is denoted by dim(S). The symbols ⊗ and ⊙ denote the Kronecker and Khatri-Rao products, defined as

  A ⊗ B := [a_{11}B  a_{12}B  · · · ; a_{21}B  a_{22}B  · · · ; ⋮  ⋮  ⋱],   A ⊙ B := [a_1 ⊗ b_1, a_2 ⊗ b_2, . . .],

in which (A)_{mn} = a_{mn}. The outer product of, say, three vectors a, b and c is denoted by a ◦ b ◦ c, such that (a ◦ b ◦ c)_{ijk} = a_i b_j c_k. The number of non-zero entries of a vector x is denoted by ω(x) in the tensor decomposition literature, dating back to the work of Kruskal [31]. Let diag(a) ∈ C^{J×J} denote a diagonal matrix that holds a column vector a ∈ C^{J×1} or a row vector a ∈ C^{1×J} on its diagonal. In some cases a diagonal matrix holds row k of A ∈ C^{I×J} on its diagonal. This will be denoted by D_k(A) ∈ C^{J×J}. Furthermore, let vec(A) denote the vector obtained by stacking the columns of A ∈ C^{I×J} into a column vector vec(A) = [a_1^T, . . . , a_J^T]^T ∈ C^{IJ}. Let e_n^{(N)} ∈ C^N denote the unit vector with unit entry at position n and zeros elsewhere. The all-ones vector is denoted by 1_R = [1, . . . , 1]^T ∈ C^R. Matlab index notation will be used for submatrices of a given matrix. For example, A(1:k,:) represents the submatrix of A consisting of the rows 1 to k of A. The binomial coefficient is denoted by C_m^k = m!/(k!(m−k)!). The k-th compound matrix of A ∈ C^{I×R} is denoted by C_k(A) ∈ C^{C_I^k × C_R^k}. It is the matrix containing the determinants of all k × k submatrices of A, arranged with the submatrix index sets in lexicographic order. See [28, 30, 32, 33] and references therein for a discussion.
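The compound matrix definition above can be sketched in a few lines; this is our own illustration (not the authors' code) of C_k(A): determinants of all k × k submatrices, with row and column index sets in lexicographic order.

```python
# Our own sketch of the k-th compound matrix C_k(A): the (i, j) entry
# is the determinant of the submatrix of A selected by the i-th row
# index set and j-th column index set, both in lexicographic order.
from itertools import combinations
import numpy as np

def compound(A, k):
    I, R = A.shape
    rows = list(combinations(range(I), k))   # lexicographic
    cols = list(combinations(range(R), k))
    C = np.empty((len(rows), len(cols)))
    for i, ri in enumerate(rows):
        for j, cj in enumerate(cols):
            C[i, j] = np.linalg.det(A[np.ix_(ri, cj)])
    return C

A = np.arange(1.0, 10.0).reshape(3, 3)
C2 = compound(A, 2)            # shape (C_3^2, C_3^2) = (3, 3)
```

For example, the (1,1) entry of C_2(A) above is the determinant of A(1:2,1:2).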
1.2. Canonical Polyadic Decomposition (CPD)
Consider the tensor X ∈ C^{I×J×K}. We say that X is a rank-1 tensor if it is equal to the outer product of non-zero vectors a ∈ C^I, b ∈ C^J and s ∈ C^K such that x_{ijk} = a_i b_j s_k. A Polyadic Decomposition (PD) is a decomposition of X into a sum of rank-1 terms [3, 4]:

  X = Σ_{r=1}^{R} E^{(r)} ◦ s_r = Σ_{r=1}^{R} a_r ◦ b_r ◦ s_r,   (7)

where E^{(r)} = a_r b_r^T = a_r ◦ b_r ∈ C^{I×J} is a rank-one matrix. The rank of a tensor X is equal to the minimal number of rank-1 tensors that yield X in a linear combination. If the rank of X is R, then (7) is called the Canonical PD (CPD) of X.
1.2.1. Matrix representation
Consider the horizontal matrix slice X^{(i··)} ∈ C^{J×K} of X, defined by (X^{(i··)})_{jk} = x_{ijk} = Σ_{r=1}^{R} a_{ir} b_{jr} s_{kr}. The tensor X can be interpreted as a collection of matrix slices X^{(1··)}, . . . , X^{(I··)}, each admitting the factorization X^{(i··)} = Σ_{r=1}^{R} a_{ir} b_r s_r^T = B D_i(A) S^T. Stacking yields (1):

  X = [X^{(1··)T}, . . . , X^{(I··)T}]^T = (A ⊙ B) S^T.   (8)
1.2.2. Uniqueness conditions for CPD
The rank-1 tensors in (7) can be arbitrarily permuted and the vectors within the same rank-1 tensor can be arbitrarily scaled provided the overall rank-1 term remains the same.
We say that the CPD is unique when it is only subject to these trivial indeterminacies.
For cases where S in (8) has full column rank, the following necessary and sufficient uniqueness condition, stated in Theorem 1.1, was obtained in [26] and later reformulated in terms of compound matrices in [28]. It makes use of the matrix

  G_CPD^{(2)} = C_2(A) ⊙ C_2(B) ∈ C^{C_I^2 C_J^2 × C_R^2}   (9)

and the vector

  f^{(2)}(d) = [d_1 d_2, d_1 d_3, . . . , d_{R−1} d_R]^T ∈ C^{C_R^2},   (10)

which consists of all distinct products of entries d_r · d_s with r < s from the vector d = [d_1, . . . , d_R]^T ∈ C^R.
Theorem 1.1. Consider the PD of X ∈ C^{I×J×K} in (7). Assume that S has full column rank. The rank of X is R and the CPD of X is unique if and only if the following implication holds:

  G_CPD^{(2)} · f^{(2)}(d) = 0 ⇒ ω(d) ≤ 1,   (11)

for all structured vectors f^{(2)}(d) of the form (10).
In practice, condition (11) can be hard to check. However, as observed in [26, 27, 28], if G_CPD^{(2)} in (11) has full column rank, then f^{(2)}(d) = 0 and the condition is automatically satisfied. This fact leads to the following easier-to-check uniqueness condition, which is only sufficient.
Theorem 1.2. Consider the PD of X ∈ C^{I×J×K} in (7). If

  S has full column rank and G_CPD^{(2)} has full column rank,   (12)

then the rank of X is R and the CPD of X is unique.
Furthermore, if condition (12) is satisfied, then the CPD of X can be computed via a matrix EVD [27, 30]. In short, the “CPD” of X can be converted into a “basic CPD”
of an (R × R × R) tensor Q of rank R, even in cases where max(I, J ) < R [27, 30]. The latter CPD can be computed by means of a standard EVD (e.g., [3, 34]). In Section 4 we briefly discuss how to construct the tensor Q from X and how to retrieve the CPD factor matrices A, B and S of X from the CPD of Q.
More details about the CPD can be found in [3, 31, 27, 28, 29, 26, 30, 35, 2] and references therein.
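Condition (12) is easy to test numerically for a given factorization. The following sketch (our own, with generic random factor matrices) combines the compound-matrix and Khatri-Rao constructions to evaluate both rank conditions.

```python
# Our own numerical check of the sufficient uniqueness condition (12):
# S has full column rank and G = C_2(A) ⊙ C_2(B) has full column rank.
from itertools import combinations
import numpy as np

def compound(A, k):
    rows = list(combinations(range(A.shape[0]), k))
    cols = list(combinations(range(A.shape[1]), k))
    return np.array([[np.linalg.det(A[np.ix_(r, c)]) for c in cols]
                     for r in rows])

def khatri_rao(A, B):
    return np.einsum('ir,jr->ijr', A, B).reshape(A.shape[0] * B.shape[0], -1)

rng = np.random.default_rng(1)
I, J, K, R = 4, 4, 5, 3
A = rng.standard_normal((I, R))
B = rng.standard_normal((J, R))
S = rng.standard_normal((K, R))
G = khatri_rao(compound(A, 2), compound(B, 2))   # C_2(A) ⊙ C_2(B)
cond12 = (np.linalg.matrix_rank(S) == R and
          np.linalg.matrix_rank(G) == R * (R - 1) // 2)
```

For generic (random) factor matrices both ranks are full, so condition (12) holds with probability one.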
1.3. Block Term Decomposition (BTD) and coupled BTD
The multilinear rank-(P,P,1) term decomposition of a tensor is an extension of the CPD (7) in which each term consists of the outer product of a vector and a low-rank matrix [25]. More formally, a_r ◦ b_r ◦ s_r in (7) is replaced by E_r ◦ s_r:

  X = Σ_{r=1}^{R} E_r ◦ s_r,   (13)

where E_r ∈ C^{I×J} is a rank-P matrix with min(I, J) > P. Note that if P = 1, then (13) indeed reduces to (7) with E_r = a_r b_r^T = a_r ◦ b_r. We will consider the extension of (13) in which a set of tensors X^{(n)} ∈ C^{I_n×J_n×K}, n ∈ {1, . . . , N}, is decomposed into a sum of coupled multilinear rank-(P,P,1) terms [12], or a coupled BTD for short:

  X^{(n)} = Σ_{r=1}^{R} E_r^{(n)} ◦ s_r,   n ∈ {1, . . . , N},   (14)

where E_r^{(n)} ∈ C^{I_n×J_n} is a rank-P matrix with min(I_n, J_n) > P and s_r ∈ C^K. Note that the vectors {s_r} are shared between all X^{(n)}, i.e., the third mode ensures the coupling. As in the CPD case, the rank of the coupled BTD is defined as the minimal number of coupled multilinear rank-(P,P,1) terms {E_r^{(n)} ◦ s_r} that yield X^{(1)}, . . . , X^{(N)}. Since E_r^{(n)} is low-rank, (14) can also be expressed in terms of a coupled PD:

  X^{(n)} = Σ_{r=1}^{R} E_r^{(n)} ◦ s_r = Σ_{r=1}^{R} (N^{(n,r)} M^{(n,r)T}) ◦ s_r = Σ_{r=1}^{R} Σ_{p=1}^{P} n_p^{(n,r)} ◦ m_p^{(n,r)} ◦ s_r,   (15)

where E_r^{(n)} = N^{(n,r)} M^{(n,r)T}, in which N^{(n,r)} = [n_1^{(n,r)}, . . . , n_P^{(n,r)}] ∈ C^{I_n×P} and M^{(n,r)} = [m_1^{(n,r)}, . . . , m_P^{(n,r)}] ∈ C^{J_n×P} are rank-P matrices.
1.3.1. Matrix representation

Define

  M^{(n)} = [M^{(n,1)}, . . . , M^{(n,R)}] ∈ C^{I_n×PR},   (16)
  N^{(n)} = [N^{(n,1)}, . . . , N^{(n,R)}] ∈ C^{J_n×PR},   (17)
  S^{(ext)} = [1_P^T ⊗ s_1, . . . , 1_P^T ⊗ s_R] ∈ C^{K×PR}.   (18)

Then the factorization (15) can also be expressed in terms of {M^{(n)}, N^{(n)}, S^{(ext)}} as follows:

  X = [X^{(1)T}, . . . , X^{(N)T}]^T = [(M^{(1)} ⊙ N^{(1)})^T, . . . , (M^{(N)} ⊙ N^{(N)})^T]^T S^{(ext)T}.   (19)
1.3.2. Uniqueness condition for coupled BTD
The coupled BTD version of G_CPD^{(2)} in (9) is given by

  G_BTD^{(N,P+1)} = [(C_{P+1}(M^{(1)}) ⊙ C_{P+1}(N^{(1)}))^T, . . . , (C_{P+1}(M^{(N)}) ⊙ C_{P+1}(N^{(N)}))^T]^T P_BTD ∈ C^{(Σ_{n=1}^{N} C_{I_n}^{P+1} C_{J_n}^{P+1}) × (C_{R+P}^{P+1} − R)},   (20)

where M^{(n)} ∈ C^{I_n×PR} and N^{(n)} ∈ C^{J_n×PR} are given by (16) and (17), respectively, and P_BTD ∈ C^{C_{PR}^{P+1} × (C_{R+P}^{P+1} − R)} is the "compression" matrix that takes into account that each column vector s_r in (18) is repeated P times and that |M^{(n,r)}| = |N^{(n,r)}| = 0. The latter property implies that R columns of C_{P+1}(M^{(n)}) ⊙ C_{P+1}(N^{(n)}) are zero columns, which are eliminated by P_BTD. Here we only state how P_BTD is constructed; the reasoning behind the construction can be found in [12]. (More details can also be found in the proof of Theorem 2.1 in Appendix B.) The C_{R+P}^{P+1} − R columns of P_BTD are indexed by the lexicographically ordered tuples in the set

  Γ_c = {(r_1, . . . , r_{P+1}) | 1 ≤ r_1 ≤ · · · ≤ r_{P+1} ≤ R} \ {(r, . . . , r)}_{r=1}^{R}.

Consider also the mapping f_c : {(r_1, . . . , r_{P+1})} → {1, 2, . . . , C_{R+P}^{P+1} − R} that returns the position of its argument in the set Γ_c. Similarly, the C_{PR}^{P+1} rows of P_BTD are indexed by the lexicographically ordered tuples in the set

  Γ_r = {(q_1, . . . , q_{P+1}) | 1 ≤ q_1 < · · · < q_{P+1} ≤ PR}.

Likewise, we define the mapping f_r : {(q_1, . . . , q_{P+1})} → {1, 2, . . . , C_{PR}^{P+1}} that returns the position of its argument in the set Γ_r. The entries of P_BTD are now given by

  (P_BTD)_{f_r(q_1,...,q_{P+1}), f_c(r_1,...,r_{P+1})} = 1 if ⌈q_1/P⌉ = r_1, . . . , ⌈q_{P+1}/P⌉ = r_{P+1}, and 0 otherwise.   (21)
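The construction (21) of the compression matrix can be sketched directly from the index sets Γ_r and Γ_c. The snippet below is our own illustration (the function name `build_P_btd` is ours); it uses 1-based tuples as in the definitions above.

```python
# Our own sketch of P_BTD from (21): rows are indexed by strictly
# increasing (P+1)-tuples over {1,...,PR} (Gamma_r), columns by
# non-decreasing (P+1)-tuples over {1,...,R} minus the R constant
# tuples (Gamma_c); an entry is 1 when the row tuple maps onto the
# column tuple under q -> ceil(q/P).
from itertools import combinations, combinations_with_replacement
import numpy as np

def build_P_btd(P, R):
    cols = [t for t in combinations_with_replacement(range(1, R + 1), P + 1)
            if len(set(t)) > 1]                        # Gamma_c, lexicographic
    rows = list(combinations(range(1, P * R + 1), P + 1))  # Gamma_r
    Pmat = np.zeros((len(rows), len(cols)))
    for i, q in enumerate(rows):
        t = tuple(-(-qi // P) for qi in q)              # ceil(q/P), elementwise
        if t in cols:                                   # skip constant tuples
            Pmat[i, cols.index(t)] = 1.0
    return Pmat
```

For P = 2, R = 3 this yields a C_6^3 × (C_5^3 − 3) = 20 × 7 matrix of zeros and ones, with at most one nonzero entry per row.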
Theorem 1.3. Consider the coupled BTD of X^{(n)} ∈ C^{I_n×J_n×K}, n ∈ {1, . . . , N}, in (14). If

  S has full column rank and G_BTD^{(N,P+1)} has full column rank,   (22)

then the coupled BTD rank of {X^{(n)}} is R and the coupled BTD of {X^{(n)}} is unique.
As in Theorem 1.2, if condition (22) in Theorem 1.3 is satisfied, then the coupled BTD of {X
(n)} can be computed via a matrix EVD [13]. More details will be provided in Section 4.1.1.
The objective of this paper is to extend the CPD results discussed in this section to the case of monomial factorizations. More precisely, in Section 2 we explain that the monomial factorization can be interpreted as a coupled BTD. Next, in Section 3 we extend the uniqueness conditions stated in Theorems 1.1 and 1.2 to the case of bilinear models with factor matrices satisfying monomial relations. Finally, in Section 4 we extend the algebraic algorithm associated with Theorems 1.2 and 1.3 to the case of monomial factorizations.
2. Link between monomial factorization and coupled BTD
In Section 2.1 we explain how to represent the monomial structure (3) as a low-rank constraint on a particular matrix. Using this low-rank matrix, we will in Section 2.2 translate the monomial factorization (2) into the coupled BTD of the form (14) reviewed in Section 1.3.
2.1. Representation of monomial structure via low-rank matrix
We say that X admits an R-term monomial factorization enjoying N monomial relations of the form (3) if every column of A in (2) satisfies the monomial relations

  Π_{l=1}^{L} a_{lr}^{(+,n)} − Π_{l=1}^{L} a_{lr}^{(−,n)} = 0,   r ∈ {1, . . . , R},  n ∈ {1, . . . , N},   (23)

where a_r^{(+,n)} ∈ C^L and a_r^{(−,n)} ∈ C^L are given by

  a_r^{(+,n)} = [a_{1r}^{(+,n)}, . . . , a_{Lr}^{(+,n)}]^T = [a_{p_{1,n},r}, . . . , a_{p_{L,n},r}]^T,
  a_r^{(−,n)} = [a_{1r}^{(−,n)}, . . . , a_{Lr}^{(−,n)}]^T = [a_{s_{1,n},r}, . . . , a_{s_{L,n},r}]^T,   (24)

in which a_{lr}^{(+,n)} = a_{p_{l,n},r} corresponds to the p_{l,n}-th entry of the r-th column of A (similarly for a_{lr}^{(−,n)}). Define the linearly structured matrix A_L(a_r^{(+,n)}, a_r^{(−,n)}) ∈ C^{L×L}: (footnote 1)

  A_L(a_r^{(+,n)}, a_r^{(−,n)}) :=
  [ a_{1r}^{(+,n)}        0              · · ·        0        (−1)^L · a_{1r}^{(−,n)} ]
  [ a_{2r}^{(−,n)}   a_{2r}^{(+,n)}      ⋱                          0                 ]
  [      0           a_{3r}^{(−,n)}      ⋱            ⋱             ⋮                 ]
  [      ⋮                ⋱              ⋱            ⋱             0                 ]
  [      0               · · ·           0       a_{Lr}^{(−,n)}  a_{Lr}^{(+,n)}       ]   (25)

  = diag(a_r^{(+,n)}) + (diag(a_r^{(−,n)}) P) ∗ (1_L 1_L^T − e_1^{(L)} e_L^{(L)T} + (−1)^L e_1^{(L)} e_L^{(L)T}),

where P is the "cyclic" permutation matrix given by

  P = Σ_{l=1}^{L−1} e_{l+1}^{(L)} e_l^{(L)T} + e_1^{(L)} e_L^{(L)T} ∈ C^{L×L}.   (26)

Footnote 1: The matrix 1_L 1_L^T − e_1^{(L)} e_L^{(L)T} + (−1)^L e_1^{(L)} e_L^{(L)T} takes the sign (−1)^L in (25) into account.
From the cofactor expansion of |A_L(a_r^{(+,n)}, a_r^{(−,n)})| along the first row, the connection between (23) and (25) becomes clear:

  |A_L(a_r^{(+,n)}, a_r^{(−,n)})| = a_{1r}^{(+,n)} Π_{l=2}^{L} a_{lr}^{(+,n)} + (−1)^L a_{1r}^{(−,n)} · (−1)^{L+1} Π_{l=2}^{L} a_{lr}^{(−,n)} = Π_{l=1}^{L} a_{lr}^{(+,n)} − Π_{l=1}^{L} a_{lr}^{(−,n)} = 0,   (27)

where we exploited that the two involved (L−1) × (L−1) minors in (27) are triangular: the minor multiplying a_{1r}^{(+,n)} is lower triangular with diagonal (a_{2r}^{(+,n)}, . . . , a_{Lr}^{(+,n)}), and the minor multiplying the corner entry is upper triangular with diagonal (a_{2r}^{(−,n)}, . . . , a_{Lr}^{(−,n)}).
The determinant property (27) also explains that A_L(a_r^{(+,n)}, a_r^{(−,n)}) is low-rank under the condition (23). In fact, since the minors in (27) do not vanish under condition (23), A_L(a_r^{(+,n)}, a_r^{(−,n)}) will be a rank-(L−1) matrix. The only possible exception is the trivial case where Π_{m=1}^{L} a_{p_m} = 0 and Π_{n=1}^{L} a_{s_n} = 0. In this section it will become clear that if Π_{m=1}^{L} a_{p_m} ≠ 0 or Π_{n=1}^{L} a_{s_n} ≠ 0, then the linear structure on A_L(a_r^{(+,n)}, a_r^{(−,n)}) can be relaxed without affecting the identifiability. To the best of our knowledge, the representation of a monomial relation of the form (23) via the rank deficiency of the matrix in (25) is a novel contribution of this paper.
2.2. Monomial factorization via coupled BTD
Consider the bilinear factorization (2) in which the columns of A satisfy N monomial relations of the form (23). The mapping (25), together with the bilinear property of the monomial factorization, enables us to transform (2) into a coupled BTD. In detail, for every monomial relation (n ∈ {1, . . . , N}) we build a tensor X^{(n)} ∈ C^{L×L×K} with matrix slices

  X^{(n)}(:, :, k) = A_L(x_k^{(+,n)}, x_k^{(−,n)}) = Σ_{r=1}^{R} A_L(a_r^{(+,n)}, a_r^{(−,n)}) s_{kr},   k ∈ {1, . . . , K},   (28)

in which x_k^{(+,n)} ∈ C^L and x_k^{(−,n)} ∈ C^L are constructed from the entries of the k-th column of X in accordance with the n-th monomial relation, so that (cf. Eq. (24)):

  x_k^{(+,n)} = [x_{1k}^{(+,n)}, . . . , x_{Lk}^{(+,n)}]^T = [x_{p_{1,n},k}, . . . , x_{p_{L,n},k}]^T,
  x_k^{(−,n)} = [x_{1k}^{(−,n)}, . . . , x_{Lk}^{(−,n)}]^T = [x_{s_{1,n},k}, . . . , x_{s_{L,n},k}]^T.   (29)

Overall, this yields the coupled BTD:

  C^{L×L×K} ∋ X^{(n)} = Σ_{r=1}^{R} A_L(a_r^{(+,n)}, a_r^{(−,n)}) ◦ s_r,   n ∈ {1, . . . , N}.   (30)

The key observation is that each equation in (28) corresponds to a BTD, and the collection of all equations in (28) is a coupled BTD.
Matrix representation. Let X^{(··k,n)} ∈ C^{L×L} denote the k-th matrix slice of X^{(n)}, defined according to (X^{(··k,n)})_{ij} = (X^{(n)})_{ijk}. We now obtain the following matrix representation of (30):

  X^{(n)} = [vec(X^{(··1,n)}), . . . , vec(X^{(··K,n)})] = A^{(n)} S^T ∈ C^{L²×K},   n ∈ {1, . . . , N},

where A^{(n)} = [vec(A_L(a_1^{(+,n)}, a_1^{(−,n)})), . . . , vec(A_L(a_R^{(+,n)}, a_R^{(−,n)}))] ∈ C^{L²×R} and S = [s_1, . . . , s_R] ∈ C^{K×R}. Stacking yields

  X = [X^{(1)T}, . . . , X^{(N)T}]^T = A_tot S^T ∈ C^{N·L²×K},   (31)

where A_tot ∈ C^{N·L²×R} is given by

  A_tot = [A^{(1)T}, . . . , A^{(N)T}]^T =
  [ vec(A_L(a_1^{(+,1)}, a_1^{(−,1)}))   · · ·   vec(A_L(a_R^{(+,1)}, a_R^{(−,1)})) ]
  [ ⋮                                     ⋱      ⋮                                 ]
  [ vec(A_L(a_1^{(+,N)}, a_1^{(−,N)}))   · · ·   vec(A_L(a_R^{(+,N)}, a_R^{(−,N)})) ].   (32)

Uniqueness condition. Since A_L(a_r^{(+,n)}, a_r^{(−,n)}) defined by (25) is low-rank, (30) corresponds to a coupled BTD. In more detail, let the rank of A_L(a_r^{(+,n)}, a_r^{(−,n)}) be equal to L_{r,n} < L; then it admits the low-rank factorization

  A_L(a_r^{(+,n)}, a_r^{(−,n)}) = N^{(n,r)} M^{(n,r)T} = [Ñ^{(n,r)}, 0_{L,L−1−L_{r,n}}] [M̃^{(n,r)}, 0_{L,L−1−L_{r,n}}]^T,   (33)

where Ñ^{(n,r)} ∈ C^{L×L_{r,n}} and M̃^{(n,r)} ∈ C^{L×L_{r,n}} are rank-L_{r,n} matrices and 0_{m,n} denotes an (m×n) zero matrix. Note that if ω(a_r^{(+,n)}) = L or ω(a_r^{(−,n)}) = L, then L_{r,n} = L−1, as explained in Section 2.1. However, if ω(a_r^{(+,n)}) < L and ω(a_r^{(−,n)}) < L, then L_{r,n} < L−1.

Consider also G_BTD^{(N,L)} given by (20), in which M^{(n)} and N^{(n)} are built from the M^{(n,r)} and N^{(n,r)} matrices in (33). We can now conclude that if for all r ∈ {1, . . . , R} there exists an n ∈ {1, . . . , N} such that L_{r,n} = L − 1 and G_BTD^{(N,L)} has full column rank, then the monomial factorization problem can be solved via the coupled BTD (30). Theorem 2.1 below summarizes the result.
Theorem 2.1. Consider the coupled BTD of X^{(n)} ∈ C^{L×L×K}, n ∈ {1, . . . , N}, in (30). If

  S has full column rank and G_BTD^{(N,L)} has full column rank,   (34)

then the coupled BTD rank of {X^{(n)}} is R, the coupled BTD of {X^{(n)}} is unique, A_tot has full column rank, the monomial factorization of X in (2) is unique, and A in (2) has full column rank.

Proof. The result is an immediate consequence of Theorem 1.3. The interested reader can find a proof in Appendix B, in which the connection between Theorem 2.1 and the subsequent Theorem 3.2 is made apparent.
Note that in Theorem 2.1 we state that if condition (34) is satisfied, then A in (2) has full column rank. This is an obvious consequence of the uniqueness property of the full column rank factor matrix S. Note also that we have dropped the linear structure on A_L(a_r^{(+,n)}, a_r^{(−,n)}) and instead used the low-rank factorization A_L(a_r^{(+,n)}, a_r^{(−,n)}) = N^{(n,r)} M^{(n,r)T} in the coupled BTD of {X^{(n)}}.

As a final remark, we mention that the transformation step from a monomial factorization (2) to an "unconstrained" coupled BTD (15) can be understood as a generalization of the Hankelisation step used in the ESPRIT method [5] for the special case where the columns of A have exponential structure.
3. Identifiability conditions for monomial factorizations
We will now take the linear structure on A_L(a_r^{(+,n)}, a_r^{(−,n)}) into account. In Section 3.1 we explain that this leads to generalizations of the uniqueness conditions stated in Theorems 1.1 and 1.2 for the CPD to monomial factorizations. Next, in Section 3.2 we formulate the presented uniqueness conditions in terms of null spaces, which leads to a condition that is easier to comprehend.
3.1. Necessary and sufficient uniqueness conditions

3.1.1. Mixed discriminants

In Theorem 3.1 we present a uniqueness condition for the monomial factorization of X. It will be based on the properties of the mixed discriminant discussed in this section. However, let us first briefly explain how mixed discriminants appear in our problem. The overall idea is to find a condition that ensures that S^T has a unique right-inverse (up to intrinsic column scaling and permutation ambiguities), denoted by W. If W is unique, then X w_r = a_r is also unique and ω(S^T w_r) = 1 for all r ∈ {1, . . . , R}. This means that if d_r = S^T w_r, then X^{(n)} w_r = Σ_{s=1}^{R} vec(A^{(n,s)}) d_{sr} is a vectorized rank-(L−1) matrix. The latter property can be used to derive a condition that ensures the uniqueness of W.

Let H^{(r)} ∈ C^{L×L} be of the form (25), corresponding to A^{(n,r)} in which the superscript 'n' is omitted. Then in the proof of Theorem 3.1 it will become clear that the derivation of a uniqueness condition for W, and therefore also for the monomial factorization, boils down to finding a compact expression for the determinant of H^{(1)} d_1 + · · · + H^{(R)} d_R, where d_r ∈ C:

  |Σ_{r=1}^{R} H^{(r)} d_r| = Σ_{r_1,...,r_L=1}^{R} D(H^{(r_1)}, . . . , H^{(r_L)}) · d_{r_1} · · · d_{r_L}
  = Σ_{(l_1,...,l_R)∈[L]_R} D(H^{(1)}, . . . , H^{(1)} [l_1 times], . . . , H^{(R)}, . . . , H^{(R)} [l_R times]) d_1^{l_1} · · · d_R^{l_R},   (35)

where [L]_R denotes the set of all weak compositions of L into R terms, i.e.,

  [L]_R = {(l_1, . . . , l_R) | l_1 + · · · + l_R = L and l_1, . . . , l_R ≥ 0}.   (36)

Note that the cardinality of [L]_R is C_{R+L−1}^{L}. The coefficients {D(H^{(r_1)}, . . . , H^{(r_L)})} in (35) are known as mixed discriminants and are given by

  D(H^{(r_1)}, . . . , H^{(r_L)}) = (1/L!) · ∂^L |H^{(r_1)} d_{r_1} + · · · + H^{(r_L)} d_{r_L}| / (∂d_{r_1} · · · ∂d_{r_L}).   (37)
The mixed discriminant of an L-tuple of square matrices H^{(r_1)} ∈ C^{L×L}, . . . , H^{(r_L)} ∈ C^{L×L} can also be defined as

  D(H^{(r_1)}, . . . , H^{(r_L)}) = (1/L!) Σ_{σ∈S_L} sgn(σ) [h_{σ(1)}^{(r_1)}, h_{σ(2)}^{(r_2)}, . . . , h_{σ(L)}^{(r_L)}],   (38)

where h_{σ(l)}^{(r_l)} denotes the σ(l)-th column of H^{(r_l)}, S_L denotes the set of all permutations of 1, 2, . . . , L and sgn(σ) denotes the sign of the permutation σ. It can be shown that the two definitions (37) and (38) of the mixed discriminant are equivalent [36].

From (38) it is clear that the mixed discriminant can be understood as an extension of the determinant. Indeed, if H := H^{(r_1)} = · · · = H^{(r_L)}, then (38) reduces to the determinant

  D(H, . . . , H [L times]) = Σ_{σ∈S_L} sgn(σ) Π_{l=1}^{L} h_{l,σ(l)} = |H|.   (39)

The mixed discriminant can also be understood as an extension of the permanent. More precisely, let D^{(1)} ∈ C^{L×L}, . . . , D^{(L)} ∈ C^{L×L} be diagonal matrices; then from (38) we obtain (a scaled version of) the permanent

  D(D^{(1)}, . . . , D^{(L)}) = (1/L!) Σ_{σ∈S_L} Π_{l=1}^{L} d_{l,l}^{(σ(l))} = (1/L!) |B|^+,   (40)

where B ∈ C^{L×L} is given by (B)_{il} = d_{ii}^{(l)}. Furthermore, for diagonal matrices D^{(1)}, . . . , D^{(L)} ∈ C^{L×L} we have

  D(D^{(1)}, . . . , D^{(L)}) = D(D^{(σ(1))}, . . . , D^{(σ(L))}) = (1/L!) |B|^+,   ∀σ ∈ S_L,   (41)

which follows from the column permutation invariance property of the permanent, i.e., |B|^+ = |BΠ|^+ for any permutation matrix Π ∈ C^{L×L}. Note that the permanent can be seen as a signless version of the determinant (i.e., |H|^+ is equal to (39) when sgn(σ) is dropped). This directly explains the permutation invariance property of the permanent.

The three properties (39)–(41) of the mixed discriminant will be used in the derivation of Theorem 3.1. A further discussion of the mixed discriminant and its properties can be found in [36, 37]. A discussion of the properties of the permanent can be found in [32, 33].
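Definition (38) translates directly into a (combinatorial, exponential-cost) computation. The sketch below is our own illustration; it evaluates (38) and checks the determinant property (39) and the permanent property (40) on small examples.

```python
# Our own sketch of the mixed discriminant via (38):
# D(H_1,...,H_L) = (1/L!) * sum over sigma of sgn(sigma) times the
# determinant of the matrix whose l-th column is column sigma(l) of H_l.
from itertools import permutations
from math import factorial
import numpy as np

def perm_sign(p):
    # sign of a permutation tuple p (p[i] is the image of i), via cycles
    sign, seen = 1, set()
    for i in range(len(p)):
        if i in seen:
            continue
        j, length = i, 0
        while j not in seen:
            seen.add(j)
            j = p[j]
            length += 1
        sign *= (-1) ** (length - 1)
    return sign

def mixed_discriminant(Hs):
    L = len(Hs)
    total = 0.0
    for sigma in permutations(range(L)):
        cols = np.column_stack([Hs[l][:, sigma[l]] for l in range(L)])
        total += perm_sign(sigma) * np.linalg.det(cols)
    return total / factorial(L)
```

With all arguments equal, the result is the determinant (property (39)); with diagonal arguments it is the scaled permanent (1/L!)|B|^+ of property (40).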
3.1.2. Uniqueness condition in terms of dimension of column space

As mentioned earlier, in the proof of Theorem 3.1 we will make use of the expansion of |Σ_{r=1}^{R} A_L(a_r^{(+,n)}, a_r^{(−,n)}) d_r| in terms of the scalars d_1, . . . , d_R. The key ingredient in the derivation of condition (53) in Theorem 3.1 is the following identity:

  |Σ_{r=1}^{R} A_L(a_r^{(+,n)}, a_r^{(−,n)}) d_r| = Σ_{σ∈S_L} sgn(σ) Π_{l=1}^{L} (Σ_{r=1}^{R} d_r · (A_L(a_r^{(+,n)}, a_r^{(−,n)}))_{lσ(l)})
  = Π_{l=1}^{L} (Σ_{r=1}^{R} d_r a_{lr}^{(+,n)}) − Π_{l=1}^{L} (Σ_{r=1}^{R} d_r a_{lr}^{(−,n)}),   (42)

where S_L denotes the set of all permutations of 1, 2, . . . , L, and sgn(σ) denotes the sign of the permutation σ. Note also that (42) directly follows from the patterned structure of A_L(a_r^{(+,n)}, a_r^{(−,n)}). (See also equations (25) and (27).) Define

  A^{(+,r)} = [a_r^{(+,1)T}, . . . , a_r^{(+,N)T}]^T ∈ C^{N×L}   and   A^{(−,r)} = [a_r^{(−,1)T}, . . . , a_r^{(−,N)T}]^T ∈ C^{N×L}.   (43)

Due to properties (39)–(41), the expansion of the product-of-sums in (42) with respect to d_1, . . . , d_R yields the homogeneous polynomial (see the proof of Theorem 3.1 for more details):

  |Σ_{r=1}^{R} A_L(a_r^{(+,n)}, a_r^{(−,n)}) d_r| = Π_{l=1}^{L} (Σ_{r=1}^{R} d_r a_{lr}^{(+,n)}) − Π_{l=1}^{L} (Σ_{r=1}^{R} d_r a_{lr}^{(−,n)})   (44)
  = Σ_{(l_1,...,l_R)∈[L]_R} [ D(D_n(A^{(+,1)}), . . . [l_1 times], . . . , D_n(A^{(+,R)}), . . . [l_R times])
    − D(D_n(A^{(−,1)}), . . . [l_1 times], . . . , D_n(A^{(−,R)}), . . . [l_R times]) ] d_1^{l_1} · · · d_R^{l_R}.
(Compare with (35).) In terms of the matrices and vectors defined below, a compact expression of (44) will be introduced. For every weak composition of L into R terms (i.e., l_1 + · · · + l_R = L subject to l_r ≥ 0) we define the square (L×L) matrices

  A^{(+,n)}_{(l_1,...,l_R)} = [1_{l_1}^T ⊗ a_1^{(+,n)}, . . . , 1_{l_R}^T ⊗ a_R^{(+,n)}] ∈ C^{L×L},   (45)
  A^{(−,n)}_{(l_1,...,l_R)} = [1_{l_1}^T ⊗ a_1^{(−,n)}, . . . , 1_{l_R}^T ⊗ a_R^{(−,n)}] ∈ C^{L×L}.   (46)

From the matrices in (45) and (46), we also build the row vectors g_+^{(n,L)} ∈ C^{1×(C_{R+L−1}^{L}−R)} and g_−^{(n,L)} ∈ C^{1×(C_{R+L−1}^{L}−R)} whose entries are indexed by an R-tuple (l_1, l_2, . . . , l_R) with 0 ≤ l_r ≤ L−1, ordered lexicographically:

  g_+^{(n,L)} = [ |A^{(+,n)}_{(L−1,1,0,0,...,0)}|^+, |A^{(+,n)}_{(L−1,0,1,0,...,0)}|^+, . . . , |A^{(+,n)}_{(0,...,0,1,L−1)}|^+ ],   (47)
  g_−^{(n,L)} = [ |A^{(−,n)}_{(L−1,1,0,0,...,0)}|^+, |A^{(−,n)}_{(L−1,0,1,0,...,0)}|^+, . . . , |A^{(−,n)}_{(0,...,0,1,L−1)}|^+ ].   (48)

Based on (47) and (48) we in turn build the row vector

  g_MF^{(n,L)} = (g_+^{(n,L)} − g_−^{(n,L)}) D_W^{(L)} ∈ C^{1×(C_{R+L−1}^{L}−R)},   (49)

in which the subscript 'MF' stands for Monomial Factorization and the diagonal weight matrix D_W^{(L)} ∈ C^{(C_{R+L−1}^{L}−R)×(C_{R+L−1}^{L}−R)} is given by

  D_W^{(L)} = diag(w^{(L)}_{(L−1,1,0,0,...,0)}, w^{(L)}_{(L−1,0,1,0,...,0)}, . . . , w^{(L)}_{(0,...,0,1,L−1)}),   (50)

where the scalar w^{(L)}_{(l_1,l_2,...,l_R)} = 1/(l_1! l_2! · · · l_R!) takes into account that, due to the column permutation invariance property of the permanent, |A^{(+,n)}_{(l_1,l_2,...,l_R)}|^+ and |A^{(−,n)}_{(l_1,l_2,...,l_R)}|^+ appear L!/(l_1! l_2! · · · l_R!) times in the expansion of |Σ_{r=1}^{R} A_L(a_r^{(+,n)}, a_r^{(−,n)}) d_r| and that each permanent is scaled by the factor 1/L! (see (40)), as will be made clear in the proof of Theorem 3.1. Overall, we build
  G_MF^{(N,L)} = [g_MF^{(1,L)T}, g_MF^{(2,L)T}, . . . , g_MF^{(N,L)T}]^T ∈ C^{N×(C_{R+L−1}^{L}−R)}.   (51)

It can be verified that (51) is an extension of (9) to the monomial case, i.e., if X satisfies the CPD factorization (8) with full column rank S, then G_MF^{(N,L)} reduces to G_CPD^{(2)}. Note that in the former case there are two superscripts, 'N' and 'L', which indicate the number of monomial constraints/equations and the degree of the involved monomials, respectively. In the CPD case we have N = C_I^2 C_J^2 and L = 2. Finally, we will also make use of the vector

  f^{(L)}(d) = [d_1^{L−1} d_2, d_1^{L−1} d_3, . . . , d_{R−1} d_R^{L−1}]^T ∈ C^{C_{R+L−1}^{L}−R}.   (52)

Comparing (10) with (52), it is clear that the latter is also an extension of the former. More precisely, f^{(L)}(d) consists of all C_{R+L−1}^{L} distinct entries of d ⊗ · · · ⊗ d minus the R entries d_1^L, . . . , d_R^L.
Theorem 3.1. Consider the monomial factorization of X in (2) with N monomial relations of the form (3). Assume that S has full column rank. The monomial factorization of X is unique, A in (2) has full column rank and A_tot in (31) has full column rank if and only if the following implication holds:

  G_MF^{(N,L)} · f^{(L)}(d) = 0 ⇒ ω(d) ≤ 1,   (53)

for all structured vectors f^{(L)}(d) of the form (52).

Proof. See Appendix A.
As with (11) in the CPD case, condition (53) can be hard to check. Analogously, we observe that if G_MF^{(N,L)} in (53) has full column rank, then f^{(L)}(d) = 0 and the condition is automatically satisfied. This leads to the following easier-to-check sufficient uniqueness condition.
Theorem 3.2. Consider the monomial factorization of X in (2) with N monomial relations of the form (3). If

  S has full column rank and G_MF^{(N,L)} has full column rank,   (54)

then the monomial factorization of X is unique, A in (2) has full column rank and A_tot in (31) has full column rank.
3.2. Uniqueness condition in terms of dimension of null space

Theorem 3.3 below provides an alternative formulation of Theorem 3.1, which may be easier to comprehend. Theorem 3.3 makes use of a matrix Ψ^{(N,L)} ∈ C^{N×R^L}, defined as

  Ψ^{(N,L)} = [ψ^{(1,L)T}, . . . , ψ^{(N,L)T}]^T
  = [(ã_1^{(+,1)} ⊗ · · · ⊗ ã_L^{(+,1)}), . . . , (ã_1^{(+,N)} ⊗ · · · ⊗ ã_L^{(+,N)})]^T − [(ã_1^{(−,1)} ⊗ · · · ⊗ ã_L^{(−,1)}), . . . , (ã_1^{(−,N)} ⊗ · · · ⊗ ã_L^{(−,N)})]^T,   (55)

where ã_l^{(+,n)} = [a_{l1}^{(+,n)}, . . . , a_{lR}^{(+,n)}]^T ∈ C^R and ã_l^{(−,n)} = [a_{l1}^{(−,n)}, . . . , a_{lR}^{(−,n)}]^T ∈ C^R. Note that

  |Σ_{r=1}^{R} A_L(a_r^{(+,n)}, a_r^{(−,n)}) d_r| = (ã_1^{(+,n)} ⊗ · · · ⊗ ã_L^{(+,n)} − ã_1^{(−,n)} ⊗ · · · ⊗ ã_L^{(−,n)})^T (d ⊗ · · · ⊗ d) = ψ^{(n,L)} (d ⊗ · · · ⊗ d),   (56)

where d = [d_1, . . . , d_R]^T ∈ C^R. Theorem 3.3 will also make use of the subspace

  ker(Ψ^{(N,L)}) ∩ π_S^{(L)},   (57)

where π_S^{(L)} denotes the subspace of vectorized R^L symmetric tensors. The link between Theorem 3.1 and Theorem 3.3 follows from the observation that condition (53) can also be expressed as

  f^{(L)}(d) ∈ ker(G_MF^{(N,L)}) ⇒ ω(d) ≤ 1.   (58)

Since

  G_MF^{(N,L)} f^{(L)}(d) = 0 ⇔ Ψ^{(N,L)} (d ⊗ · · · ⊗ d) = 0,   (59)

as explained in Appendix C, condition (58) can in turn be expressed as

  d ⊗ · · · ⊗ d ∈ ker(Ψ^{(N,L)}) ∩ π_S^{(L)} ⇒ ω(d) ≤ 1.   (60)

Note that this also means that the dimension of the subspace ker(G_MF^{(N,L)}) is equal to the dimension of the subspace ker(Ψ^{(N,L)}) ∩ π_S^{(L)}. Theorems 3.3 and 3.4 below are reformulations of Theorems 3.1 and 3.2 in terms of ker(Ψ^{(N,L)}) ∩ π_S^{(L)}.

Theorem 3.3. Consider the monomial factorization of X in (2) with N monomial relations of the form (3). Assume that S has full column rank. The monomial factorization of X is unique, A in (2) has full column rank and A_tot in (31) has full column rank if and only if the following implication holds:

  d ⊗ · · · ⊗ d ∈ ker(Ψ^{(N,L)}) ∩ π_S^{(L)} ⇒ ω(d) ≤ 1,   (61)

where Ψ^{(N,L)} is given by (55).
Theorem 3.4. Consider the monomial factorization of X in (2) with N monomial relations of the form (3). If

  S has full column rank and ker(Ψ^{(N,L)}) ∩ π_S^{(L)} is an R-dimensional subspace,   (62)

then the monomial factorization of X is unique, A in (2) has full column rank and A_tot in (31) has full column rank.
3.3. Application: Extension of CPD to (0,1)-binary weighted CPD

3.3.1. CPD

Consider the CPD of X given by (7) in which E^{(r)} = a_r b_r^T is associated with the r-th column of A ⊙ B. Recall that any 2-by-2 submatrix of E^{(r)} is a rank-1 matrix, i.e.,

  | e_{i1j1}^{(r)}  e_{i1j2}^{(r)} ; e_{i2j1}^{(r)}  e_{i2j2}^{(r)} | = e_{i1j1}^{(r)} e_{i2j2}^{(r)} − e_{i1j2}^{(r)} e_{i2j1}^{(r)} = 0.

Since there are C_I^2 ways of selecting two rows of E^{(r)} and C_J^2 ways of selecting two columns of E^{(r)}, it is clear that the CPD can be interpreted as a monomial factorization involving N = C_I^2 C_J^2 monomial relations of the form (23) with L = 2, a_r^{(+,n)} = [e_{i1j1}^{(r)}, e_{i2j2}^{(r)}]^T and a_r^{(−,n)} = [e_{i1j2}^{(r)}, e_{i2j1}^{(r)}]^T, where the superscript 'n' is associated with the tuple (i_1, i_2, j_1, j_2).
3.3.2. Binary weighted CPD

A nice property of monomial factorizations is that they allow us to extend the CPD model (1) to the binary weighted CPD (6), in which E^{(r)} in (7) now takes the form E^{(r)} = D^{(r)} ∗ (a_r b_r^T), where D^{(r)} ∈ {0, 1}^{I×J} is a binary "connectivity" matrix. This means that the tensor representation (7) extends to

  X = Σ_{r=1}^{R} E^{(r)} ◦ s_r = Σ_{r=1}^{R} (D^{(r)} ∗ (a_r b_r^T)) ◦ s_r   (63)

for the binary weighted CPD. From (8) and (63) it is clear that (6) is a matrix representation of the binary weighted CPD of X in which D = [vec(D^{(1)T}), . . . , vec(D^{(R)T})] ∈ C^{IJ×R}. Since E^{(r)} is not necessarily a low-rank matrix, the CPD modeling approach cannot be used for the binary weighted CPD. However, it can be verified that any 2-by-2 submatrix of E^{(r)} = D^{(r)} ∗ (a_r b_r^T) must satisfy the monomial relation

  e_{i1j1}^{(r)} e_{i2j2}^{(r)} e_{i1j2}^{(r)} e_{i2j1}^{(r)} · (e_{i1j1}^{(r)} e_{i2j2}^{(r)} − e_{i1j2}^{(r)} e_{i2j1}^{(r)}) = 0.

We can now conclude that the binary weighted CPD of a tensor can be interpreted as a monomial factorization involving N = C_I^2 C_J^2 monomial relations of the form (23) with L = 6 and

  a_r^{(+,n)} = [e_{i1j1}^{(r)}, e_{i2j2}^{(r)}, e_{i1j2}^{(r)}, e_{i2j1}^{(r)}, e_{i1j1}^{(r)}, e_{i2j2}^{(r)}]^T,
  a_r^{(−,n)} = [e_{i1j1}^{(r)}, e_{i2j2}^{(r)}, e_{i1j2}^{(r)}, e_{i2j1}^{(r)}, e_{i1j2}^{(r)}, e_{i2j1}^{(r)}]^T.