Citation/Reference Domanov I., Stegeman A., De Lathauwer L.,
"On the largest multilinear singular values of higher-order tensors,"
SIAM Journal on Matrix Analysis and Applications, vol. 38, No.4, 2017, 1434-1453.
Archived version Author manuscript: the content is identical to the content of the published paper, but without the final typesetting by the publisher
Published version https://doi.org/10.1137/16M110770X
Journal homepage http://epubs.siam.org/doi/abs/10.1137/16M110770X
Author contact ignat.domanov@kuleuven.be +32 56 24 64 92
IR url in Lirias https://lirias.kuleuven.be/handle/123456789/588422
ON THE LARGEST MULTILINEAR SINGULAR VALUES OF HIGHER-ORDER TENSORS
IGNAT DOMANOV†, ALWIN STEGEMAN†, AND LIEVEN DE LATHAUWER†
Abstract. Let $\sigma_n$ denote the largest mode-$n$ multilinear singular value of an $I_1\times\dots\times I_N$ tensor $\mathcal{T}$. We prove that
$$\sigma_1^2+\dots+\sigma_{n-1}^2+\sigma_{n+1}^2+\dots+\sigma_N^2\le (N-2)\|\mathcal{T}\|^2+\sigma_n^2,\qquad n=1,\dots,N.$$
We also show that at least for third-order cubic tensors the inverse problem always has a solution. Namely, for each $\sigma_1$, $\sigma_2$, and $\sigma_3$ that satisfy
$$\sigma_1^2+\sigma_2^2\le\|\mathcal{T}\|^2+\sigma_3^2,\qquad \sigma_1^2+\sigma_3^2\le\|\mathcal{T}\|^2+\sigma_2^2,\qquad \sigma_2^2+\sigma_3^2\le\|\mathcal{T}\|^2+\sigma_1^2,$$
and the trivial inequalities $\sigma_1\ge\frac{1}{\sqrt{n}}\|\mathcal{T}\|$, $\sigma_2\ge\frac{1}{\sqrt{n}}\|\mathcal{T}\|$, $\sigma_3\ge\frac{1}{\sqrt{n}}\|\mathcal{T}\|$, there always exists an $n\times n\times n$ tensor whose largest multilinear singular values are equal to $\sigma_1$, $\sigma_2$, and $\sigma_3$. We also show that if the equality $\sigma_1^2+\sigma_2^2=\|\mathcal{T}\|^2+\sigma_3^2$ holds, then $\mathcal{T}$ is necessarily equal to a sum of multilinear rank-$(L_1,1,L_1)$ and multilinear rank-$(1,L_2,L_2)$ tensors, and we give a complete description of all its multilinear singular values. We establish a connection with honeycombs and eigenvalues of the sum of two Hermitian matrices. This seems to give at least a partial explanation of why results on the joint distribution of multilinear singular values are scarce.
Key words. multilinear singular value decomposition, multilinear rank, singular value decomposition, tensor
AMS subject classifications. 15A69, 15A23
1. Introduction. Throughout the paper the superscripts $^T$, $^H$, and $^*$ denote transpose, Hermitian transpose, and conjugation, respectively. We also use the "empty sum/product" convention, i.e., if $m>n$, then $\sum_{m}^{n}(\cdot)=0$ and $\prod_{m}^{n}(\cdot)=1$.
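Let $\mathcal{T}\in\mathbb{C}^{I_1\times\dots\times I_N}$. A mode-$n$ fiber of $\mathcal{T}$ is a column vector obtained by fixing the indices $i_1,\dots,i_{n-1},i_{n+1},\dots,i_N$. A matrix $\mathbf{T}_{(n)}\in\mathbb{C}^{I_n\times I_1\cdots I_{n-1}I_{n+1}\cdots I_N}$ formed by all mode-$n$ fibers is called a mode-$n$ matrix unfolding (aka flattening or matricization) of $\mathcal{T}$. For notational convenience we assume that the columns of $\mathbf{T}_{(n)}$ are ordered such that
$$\text{the }\Big(i_n,\ 1+\sum_{\substack{k=1\\ k\ne n}}^{N}(i_k-1)\prod_{\substack{l=1\\ l\ne n}}^{k-1}I_l\Big)\text{th entry of }\mathbf{T}_{(n)}=\text{the }(i_1,\dots,i_N)\text{th entry of }\mathcal{T}.\tag{1}$$
For instance, if $N=3$, i.e., $\mathcal{T}\in\mathbb{C}^{I_1\times I_2\times I_3}$, then (1) implies that
$$\mathbf{T}_{(1)}=[\mathbf{T}_1\ \dots\ \mathbf{T}_{I_3}]\in\mathbb{C}^{I_1\times I_2I_3},$$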
∗Submitted to the editors DATE.
Funding: This work was funded by (1) Research Council KU Leuven: C1 project c16/15/059-nD; (2) F.W.O.: project G.0830.14N, G.0881.14N; (3) the Belgian Federal Science Policy Office: IUAP P7 (DYSCO II, Dynamical systems, control and optimization, 2012-2017); (4) EU: The research leading to these results has received funding from the European Research Council under the European Union's Seventh Framework Programme (FP7/2007-2013) / ERC Advanced Grant: BIOTENSORS (no. 339804). This paper reflects only the authors' views and the Union is not liable for any use that may be made of the contained information.
† Group Science, Engineering and Technology, KU Leuven - Kulak, E. Sabbelaan 53, 8500 Kortrijk, Belgium and Dept. of Electrical Engineering ESAT/STADIUS KU Leuven, Kasteelpark Arenberg 10, bus 2446, B-3001 Leuven-Heverlee, Belgium (ignat.domanov@kuleuven.be, alwin.stegeman@kuleuven.be, lieven.delathauwer@kuleuven.be).
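$$\mathbf{T}_{(2)}=[\mathbf{T}_1^T\ \dots\ \mathbf{T}_{I_3}^T]\in\mathbb{C}^{I_2\times I_1I_3},\qquad \mathbf{T}_{(3)}=[\operatorname{vec}(\mathbf{T}_1)\ \dots\ \operatorname{vec}(\mathbf{T}_{I_3})]^T\in\mathbb{C}^{I_3\times I_1I_2},$$
where $\mathbf{T}_1,\dots,\mathbf{T}_{I_3}\in\mathbb{C}^{I_1\times I_2}$ denote the frontal slices of $\mathcal{T}$.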
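The column ordering in (1) can be made concrete with a short sketch. The function below is only an illustration in pure Python: the name `unfold` and the representation of a tensor as a dictionary keyed by 1-based index tuples are assumptions made here, not notation from the paper.

```python
from itertools import product

def unfold(entries, dims, n):
    """Mode-n unfolding with the column ordering of (1).

    entries: dict mapping 1-based index tuples (i_1, ..., i_N) to values
    dims:    (I_1, ..., I_N)
    n:       mode, 1-based
    Returns the I_n x prod_{k != n} I_k matrix as a list of rows.
    """
    N = len(dims)
    n_cols = 1
    for k in range(1, N + 1):
        if k != n:
            n_cols *= dims[k - 1]
    M = [[0] * n_cols for _ in range(dims[n - 1])]
    for idx in product(*(range(1, I + 1) for I in dims)):
        col, stride = 1, 1
        for k in range(1, N + 1):
            if k == n:
                continue
            col += (idx[k - 1] - 1) * stride  # (i_k - 1) * prod_{l < k, l != n} I_l
            stride *= dims[k - 1]
        M[idx[n - 1] - 1][col - 1] = entries.get(idx, 0)
    return M

# 2 x 2 x 2 tensor whose (i, j, k)th entry is the number "ijk"
dims = (2, 2, 2)
T = {(i, j, k): 100 * i + 10 * j + k
     for i, j, k in product((1, 2), repeat=3)}
T1, T2, T3 = (unfold(T, dims, n) for n in (1, 2, 3))
print(T1)  # [[111, 121, 112, 122], [211, 221, 212, 222]]
```

Here `T1` is the concatenation of the two frontal slices, `T2` of their transposes, and the rows of `T3` are the vectorized slices, in agreement with the third-order formulas above.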
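A tensor $\mathcal{T}\in\mathbb{C}^{I_1\times\dots\times I_N}$ is all-orthogonal if the matrices $\mathbf{T}_{(1)}\mathbf{T}_{(1)}^H,\dots,\mathbf{T}_{(N)}\mathbf{T}_{(N)}^H$ are diagonal. The MultiLinear (ML) Singular Value Decomposition (SVD) (aka Higher-Order SVD) is a factorization of $\mathcal{T}$ into the product of an all-orthogonal tensor $\mathcal{S}\in\mathbb{C}^{I_1\times\dots\times I_N}$ and $N$ unitary matrices $\mathbf{U}_1\in\mathbb{C}^{I_1\times I_1},\dots,\mathbf{U}_N\in\mathbb{C}^{I_N\times I_N}$,
$$\mathcal{T}=\mathcal{S}\cdot_1\mathbf{U}_1\cdot_2\mathbf{U}_2\dots\cdot_N\mathbf{U}_N,\tag{2}$$
where "$\cdot_n$" denotes the $n$-mode product of $\mathcal{S}$ and $\mathbf{U}_n$. Rather than giving the formal definition of "$\cdot_n$", for which we refer the reader to [3, 4, 13], we present $N$ equivalent matricized versions of (2):
$$\mathbf{T}_{(n)}=\mathbf{U}_n\mathbf{S}_{(n)}(\mathbf{U}_N\otimes\dots\otimes\mathbf{U}_{n+1}\otimes\mathbf{U}_{n-1}\otimes\dots\otimes\mathbf{U}_1)^T,\qquad n=1,\dots,N,\tag{3}$$
where "$\otimes$" denotes the Kronecker product. For $N=2$, i.e., for $\mathcal{T}=\mathbf{T}_1\in\mathbb{C}^{I_1\times I_2}$, the MLSVD reduces, up to trivial indeterminacies, to the classical SVD of a matrix, $\mathbf{T}_{(1)}=\mathbf{T}_1=\mathbf{U}\mathbf{S}\mathbf{V}^H$, where $\mathbf{U}=\mathbf{U}_1$, $\mathbf{S}=\mathbf{S}_{(1)}$, and $\mathbf{V}=\mathbf{U}_2^*\otimes 1$. It is known [4] that the MLSVD always exists and that its uniqueness properties are similar to those of the matrix SVD.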
The MLSVD has many applications in signal processing, data analysis, and machine learning (see, for instance, the overview papers [13, Subsection 4.4], [15]). Here we just mention that as Principal Component Analysis (PCA) can be done by SVD of a data matrix, MLPCA can be done by MLSVD of a data tensor [5, 14, 16].
The singular values of $\mathbf{T}_{(n)}$ are called the mode-$n$ singular values of $\mathcal{T}$. Since $\mathbf{S}_{(1)}\mathbf{S}_{(1)}^H,\dots,\mathbf{S}_{(N)}\mathbf{S}_{(N)}^H$ are diagonal, it follows from (3) that the ML singular values of $\mathcal{T}$ coincide with the ML singular values of $\mathcal{S}$, which are just the Frobenius norms of the rows of $\mathbf{S}_{(1)},\dots,\mathbf{S}_{(N)}$. Throughout the paper, $\sigma_n$ denotes the largest singular value of $\mathbf{T}_{(n)}$.
In the matrix case, i.e., for $N=2$, the description of the MLSVD is trivial. Indeed, the singular values of $\mathbf{T}_{(1)}=\mathbf{T}_1$ and $\mathbf{T}_{(2)}=\mathbf{T}_1^T$ coincide and $\mathbf{T}_{(3)}=\operatorname{vec}(\mathbf{T}_1)^T$ has a single singular value $\|\mathcal{T}\|$. Thus, the singular values of $\mathbf{T}_{(1)}$ completely define the singular values of $\mathbf{T}_{(2)}$ and $\mathbf{T}_{(3)}$. In particular, the set of triplets $(\sigma_1,\sigma_2,\sigma_3)$ coincides with the set $\{(x,x,y):\ y\ge x\ge 0\}\subset\mathbb{R}^3$, whose Lebesgue measure is zero. The situation for tensors is much more complicated. It is clear that in the general case $N\ge 2$, the sets of the mode-1, ..., mode-$N$ singular values are not independent either.
The study of topological properties of the set of ML singular values of real tensors has been initiated only recently in [8] and [7]. In particular, it has been shown in [8]
and [7] that, as in the matrix case, some configurations of ML singular values are not possible but, nevertheless, at least for n × · · · × n tensors the set of ML singular values has a positive Lebesgue measure.
In this paper we study possible configurations for the largest ML singular values, i.e., for $\sigma_1,\dots,\sigma_N$. Our results are valid for real and complex tensors. The following theorem presents simple necessary conditions for $\sigma_1$, $\sigma_2$, and $\sigma_3$ to be the largest ML singular values of a third-order tensor. For instance, it implies that a norm-1 tensor whose largest ML singular values are equal to 0.9, 0.9, and 0.7 does not exist.
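Theorem 1. Let $\sigma_1$, $\sigma_2$, and $\sigma_3$ denote the largest ML singular values of an $I_1\times I_2\times I_3$ tensor $\mathcal{T}$. Then
$$\sigma_1^2+\sigma_2^2\le\|\mathcal{T}\|^2+\sigma_3^2,\qquad \sigma_1^2+\sigma_3^2\le\|\mathcal{T}\|^2+\sigma_2^2,\qquad \sigma_2^2+\sigma_3^2\le\|\mathcal{T}\|^2+\sigma_1^2,\tag{4}$$
$$\sigma_1\ge\frac{1}{\sqrt{I_1}}\|\mathcal{T}\|,\qquad \sigma_2\ge\frac{1}{\sqrt{I_2}}\|\mathcal{T}\|,\qquad \sigma_3\ge\frac{1}{\sqrt{I_3}}\|\mathcal{T}\|.\tag{5}$$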
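The necessary conditions (4)–(5) are easy to test numerically. The following sketch is only an illustration (the function name `satisfies_theorem1` and the tolerance parameter are assumptions introduced here); it confirms that the triplet 0.9, 0.9, 0.7 violates (4) for a norm-1 tensor.

```python
def satisfies_theorem1(sigma, dims, norm=1.0, tol=1e-12):
    """Check the necessary conditions (4)-(5) for a candidate triplet.

    sigma: (sigma_1, sigma_2, sigma_3), candidate largest ML singular values
    dims:  (I_1, I_2, I_3)
    norm:  Frobenius norm of the tensor
    """
    s1, s2, s3 = (s * s for s in sigma)
    t = norm * norm
    ineq4 = (s1 + s2 <= t + s3 + tol and
             s1 + s3 <= t + s2 + tol and
             s2 + s3 <= t + s1 + tol)
    # (5): sigma_n >= norm / sqrt(I_n), written as I_n * sigma_n^2 >= norm^2
    ineq5 = all(I * s * s + tol >= t for s, I in zip(sigma, dims))
    return ineq4 and ineq5

print(satisfies_theorem1((0.9, 0.9, 0.7), (3, 3, 3)))  # False
print(satisfies_theorem1((0.8, 0.8, 0.8), (3, 3, 3)))  # True
```

For the first triplet, $0.81+0.81=1.62>1+0.49$, so the first inequality in (4) fails. Note that the check only tests necessary conditions; whether a passing triplet is actually attained depends on the dimensions, as discussed below.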
Figure 1 shows four typical shapes of the set $\{(\sigma_1^2,\sigma_2^2,\sigma_3^2):\ \sigma_1,\sigma_2,\sigma_3\ \text{satisfy (4)–(5)}\}$ (WLOG, we assumed that $I_1\le I_2\le I_3$).
[Figure 1: four three-dimensional plots with axes $\sigma_1^2$, $\sigma_2^2$, $\sigma_3^2$ and origin $O$, showing the labeled points $S$, $X_1$, $X_2$, $Y_1$, $Y_2$, $Z_1$, $Z_2$, $N$. Panels: (a) $I_1<I_2<I_3$; (b) $I_1=I_2<I_3$; (c) $I_1<I_2=I_3$; (d) $I_1=I_2=I_3=n$.]
Fig. 1. The typical shapes of the set $\{(\sigma_1^2,\sigma_2^2,\sigma_3^2):\ \sigma_1,\sigma_2,\sigma_3\ \text{satisfy (4)–(5)}\}$ for $I_1\le I_2\le I_3$ (drawn for $I_1=2$, $I_2=3$, $I_3=5$ and $\|\mathcal{T}\|=1$). Plot (a) is the case where all dimensions of a tensor are distinct. The points $S$, $X_1$, $X_2$, $Y_1$, $Y_2$, $Z_1$, $Z_2$, and $N$ have coordinates $(\frac{1}{I_1},\frac{1}{I_2},\frac{1}{I_3})$, $(1-\frac{1}{I_2}+\frac{1}{I_3},\frac{1}{I_2},\frac{1}{I_3})$, $(1,\frac{1}{I_2},\frac{1}{I_2})$, $(\frac{1}{I_1},1-\frac{1}{I_1}+\frac{1}{I_3},\frac{1}{I_3})$, $(\frac{1}{I_1},1,\frac{1}{I_1})$, $(\frac{1}{I_1},\frac{1}{I_2},1-\frac{1}{I_1}+\frac{1}{I_2})$, $(\frac{1}{I_1},\frac{1}{I_1},1)$, and $(1,1,1)$, respectively. Plots (b)–(c) are the cases where a tensor has exactly two equal dimensions: the points $Z_1$ and $Z_2$ were merged into one point $Z$ and the points $X_1$ and $X_2$ were merged into one point $X$, respectively. Plot (d) is the case where all three dimensions of a tensor are equal to each other, $I_1=I_2=I_3=n$. In this case, the points $Y_1$ and $Y_2$ were also merged into one point $Y$, so $S$, $X$, $Y$, and $Z$ have the coordinates $(\frac{1}{n},\frac{1}{n},\frac{1}{n})$, $(1,\frac{1}{n},\frac{1}{n})$, $(\frac{1}{n},1,\frac{1}{n})$, and $(\frac{1}{n},\frac{1}{n},1)$, respectively. By Corollary 3, any point $(\sigma_1^2,\sigma_2^2,\sigma_3^2)$ of the polyhedron $SXYZN$ in plot (d) is feasible, i.e., there exists a norm-1 tensor $\mathcal{T}\in\mathbb{C}^{n\times n\times n}$ whose squared largest multilinear singular values are $\sigma_1^2$, $\sigma_2^2$, and $\sigma_3^2$. The volume of $SXYZN$ equals half of the volume of the cube, i.e., $\frac{1}{2}(1-\frac{1}{n})^3$.
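One can easily verify that if $\sigma_1$, $\sigma_2$, and $\sigma_3$ satisfy (4)–(5) for $I_1=I_2=I_3=2$ and $\|\mathcal{T}\|=1$, then $\sigma_1$, $\sigma_2$, and $\sigma_3$ are the largest ML singular values of the $2\times 2\times 2$ tensor $\mathcal{T}$ with mode-1 matrix unfolding
$$\mathbf{T}_{(1)}=[\mathbf{T}_1\ \mathbf{T}_2]=\frac{1}{\sqrt{2}}\left[\begin{array}{cc|cc}\sqrt{\sigma_1^2+\sigma_2^2+\sigma_3^2-1} & 0 & 0 & \sqrt{1+\sigma_1^2-\sigma_2^2-\sigma_3^2}\\ 0 & \sqrt{1+\sigma_3^2-\sigma_1^2-\sigma_2^2} & \sqrt{1+\sigma_2^2-\sigma_1^2-\sigma_3^2} & 0\end{array}\right].$$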
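The $2\times 2\times 2$ construction above can be checked directly. The sketch below assumes the entry placement $t_{111}$, $t_{122}$, $t_{221}$, $t_{212}$ read off from $\mathbf{T}_{(1)}=[\mathbf{T}_1\ \mathbf{T}_2]$; the function names are hypothetical. Because the rows of every unfolding turn out to be orthogonal, the largest singular value of each unfolding is simply its largest row norm, so no SVD routine is needed.

```python
from math import sqrt, isclose

def tensor_2x2x2(sigma1, sigma2, sigma3):
    """Nonzero entries t_{ijk} (1-based indices) of the norm-1 2 x 2 x 2 tensor."""
    s1, s2, s3 = sigma1 ** 2, sigma2 ** 2, sigma3 ** 2
    squared = {
        (1, 1, 1): (s1 + s2 + s3 - 1) / 2,
        (1, 2, 2): (1 + s1 - s2 - s3) / 2,
        (2, 2, 1): (1 + s3 - s1 - s2) / 2,
        (2, 1, 2): (1 + s2 - s1 - s3) / 2,
    }
    # max(v, 0.0) guards against tiny negative round-off on the boundary of (4)-(5)
    return {idx: sqrt(max(v, 0.0)) for idx, v in squared.items()}

def largest_ml_singular_values(T):
    """Largest singular value of each unfolding of a real 2 x 2 x 2 tensor whose
    unfoldings have mutually orthogonal rows (verified by an assertion)."""
    result = []
    for mode in range(3):
        G = [[0.0, 0.0], [0.0, 0.0]]  # Gram matrix T_(n) T_(n)^T
        for a, va in T.items():
            for b, vb in T.items():
                if a[:mode] + a[mode + 1:] == b[:mode] + b[mode + 1:]:
                    G[a[mode] - 1][b[mode] - 1] += va * vb
        assert isclose(G[0][1], 0.0, abs_tol=1e-12)  # rows are orthogonal
        result.append(sqrt(max(G[0][0], G[1][1])))
    return result

sigmas = largest_ml_singular_values(tensor_2x2x2(0.8, 0.75, 0.9))
print(sigmas)  # approximately [0.8, 0.75, 0.9]
```

The input triplet $(0.8, 0.75, 0.9)$ satisfies (4)–(5) with $I_1=I_2=I_3=2$, and the computed largest ML singular values reproduce it up to rounding.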
The proof of the following result relies on a similar explicit construction of an $I_1\times I_2\times I_3$ tensor $\mathcal{T}$.
Theorem 2. Let $I_1\le I_2\le I_3$ and let $\sigma_1$, $\sigma_2$, $\sigma_3$ satisfy (4) and the following three inequalities:
$$\sigma_1\ge\frac{1}{\sqrt{I_1}}\|\mathcal{T}\|,\tag{6}$$
$$(I_2-I_1)\sigma_1^2+(I_1I_2-I_2)\sigma_3^2+(1-I_2)\|\mathcal{T}\|^2\ge 0,\tag{7}$$
$$(I_2-I_1)\sigma_1^2+(I_1I_2-I_2)\sigma_2^2+(1-I_2)\|\mathcal{T}\|^2\ge 0.\tag{8}$$
Then there exists an $I_1\times I_2\times I_3$ tensor $\mathcal{T}$ such that
1. all entries of $\mathcal{T}$ are non-negative;
2. $\mathcal{T}$ is all-orthogonal;
3. the largest ML singular values of $\mathcal{T}$ are equal to $\sigma_1$, $\sigma_2$, and $\sigma_3$.
Conditions (5) and (6)–(8) mean that the point $(\sigma_1^2,\sigma_2^2,\sigma_3^2)$ belongs to the trihedral angle $SX_1Y_1Z_1$ and $S_2X_2Y_2Z_2$, respectively, where $S_2$ has coordinates $(\frac{1}{I_1},\frac{1}{I_1},\frac{1}{I_1})$.
The gap between the necessary conditions in Theorem 1 and the sufficient conditions in Theorem 2, i.e., the set
$$\{(\sigma_1^2,\sigma_2^2,\sigma_3^2):\ \text{(4)–(5) hold and at least one of (6)–(8) does not hold}\},\tag{9}$$
is shown in Figure 2c. One can easily verify that the gap is empty only for $I_1=I_2=I_3$.
Corollary 3. Let $\sigma_1$, $\sigma_2$, and $\sigma_3$ satisfy (4)–(5) for $I_1=I_2=I_3=n\ge 2$. Then there exists an $n\times n\times n$ tensor $\mathcal{T}$ such that
1. all entries of $\mathcal{T}$ are non-negative;
2. $\mathcal{T}$ is all-orthogonal;
3. the largest ML singular values of $\mathcal{T}$ are equal to $\sigma_1$, $\sigma_2$, and $\sigma_3$.
Thus, the conditions in Theorem 1 are not only necessary but also sufficient for $\sigma_1$, $\sigma_2$, and $\sigma_3$ to be feasible largest ML singular values of a cubic third-order tensor. Figure 1d shows the set of feasible triplets $(\sigma_1^2,\sigma_2^2,\sigma_3^2)$ of an $n\times n\times n$ tensor.
We do not have a complete view on the feasibility of points in (9). In Section 3 we obtain particular results on the (non)feasibility of the points $S(\frac{1}{I_1},\frac{1}{I_2},\frac{1}{I_3})$, $X_1(1-\frac{1}{I_2}+\frac{1}{I_3},\frac{1}{I_2},\frac{1}{I_3})$, and $Y_1(\frac{1}{I_1},1-\frac{1}{I_1}+\frac{1}{I_3},\frac{1}{I_3})$. Namely, we show that if $I_1<I_2$ and $I_3=I_1I_2-1$, then the point $S$ is not feasible, and if $I_3=I_1I_2$, then the point $S$ is feasible but the points $X_1$ and $Y_1$ are not.
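It is worth mentioning a link with scaled all-orthonormal tensors introduced recently in [6]. A tensor $\mathcal{T}\in\mathbb{C}^{I_1\times\dots\times I_N}$ is scaled all-orthonormal [6, Definition 2] if at least $N-1$ of the $N$ matrices $\mathbf{T}_{(1)}\mathbf{T}_{(1)}^H,\dots,\mathbf{T}_{(N)}\mathbf{T}_{(N)}^H$ are multiples of the identity matrix. It is clear that if the largest mode-$n$ singular value of a norm-1 tensor is $\frac{1}{\sqrt{I_n}}$, then all mode-$n$ singular values are also $\frac{1}{\sqrt{I_n}}$. Thus, feasibility of a point belonging to the segment $SX_1$ (resp. $SY_1$ or $SZ_1$) is equivalent to the existence of a norm-1 $I_1\times I_2\times I_3$ tensor $\mathcal{T}$ such that
$$\mathbf{T}_{(2)}\mathbf{T}_{(2)}^H=\frac{1}{I_2}\mathbf{I}_{I_2},\qquad \mathbf{T}_{(3)}\mathbf{T}_{(3)}^H=\frac{1}{I_3}\mathbf{I}_{I_3}$$
(resp. $\mathbf{T}_{(1)}\mathbf{T}_{(1)}^H=\frac{1}{I_1}\mathbf{I}_{I_1}$, $\mathbf{T}_{(3)}\mathbf{T}_{(3)}^H=\frac{1}{I_3}\mathbf{I}_{I_3}$ or $\mathbf{T}_{(1)}\mathbf{T}_{(1)}^H=\frac{1}{I_1}\mathbf{I}_{I_1}$, $\mathbf{T}_{(2)}\mathbf{T}_{(2)}^H=\frac{1}{I_2}\mathbf{I}_{I_2}$), i.e., to the existence of a scaled all-orthonormal tensor $\mathcal{T}$.
[Figure 2: three three-dimensional plots with axes $\sigma_1^2$, $\sigma_2^2$, $\sigma_3^2$ and origin $O$, showing the labeled points $S$, $S_2$, $X_1$, $X_2$, $Y_1$, $Y_2$, $Z_1$, $Z_2$, $N$. Panels: (a) $\sigma_1,\sigma_2,\sigma_3$ satisfy (4)–(5); (b) $\sigma_1,\sigma_2,\sigma_3$ satisfy (4) and (6)–(8); (c) the set in eq. (9).]
Fig. 2. Gap between the necessary conditions in Theorem 1 and the sufficient conditions in Theorem 2 for $I_1<I_2<I_3$ (drawn for $I_1=2$, $I_2=5$, $I_3=7$ and $\|\mathcal{T}\|=1$). The point $S_2$ has coordinates $(\frac{1}{I_1},\frac{1}{I_1},\frac{1}{I_1})$. The set in plot (c) is the difference of the set in plot (a) and the set in plot (b).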
The following result generalizes Theorem 1 for N th-order tensors.
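Theorem 4. Let $\sigma_1,\dots,\sigma_N$ denote the largest ML singular values of an $I_1\times\dots\times I_N$ tensor $\mathcal{T}$. Then
$$\sigma_1^2+\dots+\sigma_{n-1}^2+\sigma_{n+1}^2+\dots+\sigma_N^2\le(N-2)\|\mathcal{T}\|^2+\sigma_n^2,\qquad n=1,\dots,N,\tag{10}$$
$$\sigma_1\ge\frac{1}{\sqrt{I_1}}\|\mathcal{T}\|,\ \dots,\ \sigma_N\ge\frac{1}{\sqrt{I_N}}\|\mathcal{T}\|.\tag{11}$$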
Theorems 1, 2, and 4 are proved in Section 2.
It is natural to ask what happens if some inequalities in (4) are replaced by equalities. Obviously, the three equalities in (4) hold if and only if $\sigma_1=\sigma_2=\sigma_3=\|\mathcal{T}\|$, implying that $\mathbf{T}_{(1)}$, $\mathbf{T}_{(2)}$, and $\mathbf{T}_{(3)}$ are rank-1 matrices. Hence all the remaining ML singular values of $\mathcal{T}$ are zero. Similarly, the two equalities $\sigma_1^2+\sigma_2^2=\|\mathcal{T}\|^2+\sigma_3^2$ and $\sigma_1^2+\sigma_3^2=\|\mathcal{T}\|^2+\sigma_2^2$ are equivalent to $\sigma_1=\|\mathcal{T}\|$ and $\sigma_2=\sigma_3$, implying that $\operatorname{rank}(\mathbf{T}_{(1)})=1$ and $\operatorname{rank}(\mathbf{T}_{(2)})=\operatorname{rank}(\mathbf{T}_{(3)})=:L$, i.e., $\mathcal{T}$ is an ML rank-$(1,L,L)$ tensor, where $L\le\min(I_2,I_3)$. It is clear that in this case the remaining nonzero mode-2 and mode-3 singular values of $\mathcal{T}$ also coincide and may take any positive values whose squares sum up to $\|\mathcal{T}\|^2-\sigma_2^2$. In Section 4 we characterize the tensors $\mathcal{T}$ for which the single equality $\sigma_1^2+\sigma_2^2=\|\mathcal{T}\|^2+\sigma_3^2$ holds. We show that $\mathcal{T}$ is necessarily equal to a sum of ML rank-$(L_1,1,L_1)$ and ML rank-$(1,L_2,L_2)$ tensors and give a complete description of all its ML singular values. The description relies on a problem posed by H. Weyl in 1912: given the eigenvalues of two $n\times n$ Hermitian matrices $\mathbf{A}$ and $\mathbf{B}$, what are all the possible eigenvalues of $\mathbf{A}+\mathbf{B}$? The following answer was conjectured by A. Horn in 1962 [9] and has been proved through the development of the theory of honeycombs in [10, 11] (see also [2, 12]). Let $\lambda_i(\cdot)$ denote the $i$th largest eigenvalue of a Hermitian matrix.
If
$$\alpha_i=\lambda_i(\mathbf{A}),\qquad \beta_i=\lambda_i(\mathbf{B}),\qquad \gamma_i=\lambda_i(\mathbf{A}+\mathbf{B}),\tag{12}$$
then $\alpha_i$, $\beta_i$, and $\gamma_i$ satisfy the trivial equality
$$\gamma_1+\dots+\gamma_n=\alpha_1+\dots+\alpha_n+\beta_1+\dots+\beta_n\tag{13}$$
and the list of linear inequalities
$$\sum_{k\in K}\gamma_k\le\sum_{i\in I}\alpha_i+\sum_{j\in J}\beta_j,\qquad (I,J,K)\in T_r^n,\quad 1\le r\le n-1,\tag{14}$$
where $I=\{i_1,\dots,i_r\}$, $J=\{j_1,\dots,j_r\}$, $K=\{k_1,\dots,k_r\}$ are subsets of $\{1,\dots,n\}$ and $T_r^n$ denotes a particular finite set of triplets $(I,J,K)$. (The construction of $T_r^n$ is given in Appendix A.) The inverse statement also holds: if $\alpha_i$, $\beta_i$, and $\gamma_i$ satisfy (13) and (14), then there exist $n\times n$ Hermitian matrices $\mathbf{A}$ and $\mathbf{B}$ such that (12) holds.
We have the following results.
Theorem 5. Let $\sigma_1^2+\sigma_2^2=\|\mathcal{T}\|^2+\sigma_3^2$. Then $\mathcal{T}$ is a sum of ML rank-$(L_1,1,L_1)$ and ML rank-$(1,L_2,L_2)$ tensors, where $L_1\le\min(I_1,I_3)$ and $L_2\le\min(I_2-1,I_3)$.
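Theorem 6. Let $\sigma_1^2+\sigma_2^2=\|\mathcal{T}\|^2+\sigma_3^2$. Then the values
$$\sigma_1=\sigma_{11}\ge\sigma_{12}\ge\dots\ge\sigma_{1I_1}\ge 0,\qquad \sigma_2=\sigma_{21}\ge\sigma_{22}\ge\dots\ge\sigma_{2I_2}\ge 0,\qquad \sigma_3=\sigma_{31}\ge\sigma_{32}\ge\dots\ge\sigma_{3I_3}\ge 0,$$
are the mode-1, mode-2, and mode-3 singular values of an $I_1\times I_2\times I_3$ tensor $\mathcal{T}$, respectively, if and only if
$$\sigma_{11}^2+\dots+\sigma_{1I_1}^2=\sigma_{21}^2+\dots+\sigma_{2I_2}^2=\sigma_{31}^2+\dots+\sigma_{3I_3}^2=\|\mathcal{T}\|^2,$$
$\sigma_{1i}=0$ for $i>\min(I_1,I_3)$, $\sigma_{2i}=0$ for $i>\min(I_2,I_3)$, and (13) and (14) hold for
$$\alpha_i=\begin{cases}\sigma_{1,i+1}^2, & i\le\min(I_1,I_3),\\ 0, & \text{otherwise},\end{cases}\qquad \beta_i=\begin{cases}\sigma_{2,i+1}^2, & i\le\min(I_2,I_3),\\ 0, & \text{otherwise},\end{cases}\qquad \gamma_i=\sigma_{3,i+1}^2,\tag{15}$$
and $n=I_3-1$.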
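Example 7. If $n=2$, then $T_1^2=\{(i,j,k):\ k=i+j-1,\ 1\le i,j,k\le 2\}=\{(1,1,1),(1,2,2),(2,1,2)\}$ (see Appendix A). By Horn's conjecture, the equality $\gamma_1+\gamma_2=\alpha_1+\alpha_2+\beta_1+\beta_2$ together with the inequalities (also known as the Weyl inequalities)
$$\gamma_1\le\alpha_1+\beta_1,\qquad \gamma_2\le\alpha_1+\beta_2,\qquad \gamma_2\le\alpha_2+\beta_1,\tag{16}$$
characterize the values $\alpha_1,\alpha_2,\beta_1,\beta_2,\gamma_1,\gamma_2$ that can be eigenvalues of $2\times 2$ Hermitian matrices $\mathbf{A}$, $\mathbf{B}$, and $\mathbf{A}+\mathbf{B}$. Let $\sigma_{11}^2+\sigma_{21}^2=\|\mathcal{T}\|^2+\sigma_{31}^2$. From Theorem 6 and (16) it follows that the values $\sigma_{11}\ge\sigma_{12}\ge\sigma_{13}\ge 0$, $\sigma_{21}\ge\sigma_{22}\ge\sigma_{23}\ge 0$, and $\sigma_{31}\ge\sigma_{32}\ge\sigma_{33}\ge 0$ are the mode-1, mode-2, and mode-3 singular values, respectively, of a $3\times 3\times 3$ tensor $\mathcal{T}$ if and only if
$$\sigma_{11}^2+\sigma_{12}^2+\sigma_{13}^2=\sigma_{21}^2+\sigma_{22}^2+\sigma_{23}^2=\sigma_{31}^2+\sigma_{32}^2+\sigma_{33}^2=\|\mathcal{T}\|^2,$$
$$\sigma_{32}^2\le\sigma_{12}^2+\sigma_{22}^2,\qquad \sigma_{33}^2\le\sigma_{12}^2+\sigma_{23}^2,\qquad \sigma_{33}^2\le\sigma_{13}^2+\sigma_{22}^2.$$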
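The Weyl inequalities (16) can be observed numerically on small examples. The sketch below restricts attention to real symmetric $2\times 2$ matrices (a special case of Hermitian matrices) so that the eigenvalues have a closed form; the helper names `eig_sym2` and `weyl_holds` are hypothetical.

```python
from math import sqrt

def eig_sym2(a, b, d):
    """Eigenvalues (largest first) of the real symmetric matrix [[a, b], [b, d]]."""
    mean = (a + d) / 2
    radius = sqrt(((a - d) / 2) ** 2 + b * b)
    return mean + radius, mean - radius

def weyl_holds(A, B, tol=1e-12):
    """A and B are triples (a, b, d) describing the matrices [[a, b], [b, d]].

    Returns True if the trace equality (13) and the inequalities (16) hold
    for the eigenvalues of A, B, and A + B.
    """
    a1, a2 = eig_sym2(*A)
    b1, b2 = eig_sym2(*B)
    g1, g2 = eig_sym2(*(x + y for x, y in zip(A, B)))
    trace_ok = abs((g1 + g2) - (a1 + a2 + b1 + b2)) <= tol
    return (trace_ok and g1 <= a1 + b1 + tol
            and g2 <= a1 + b2 + tol and g2 <= a2 + b1 + tol)

# A has eigenvalues 1 +/- sqrt(2), B has 2 +/- sqrt(2), and A + B = 3 * identity,
# so the last two inequalities in (16) hold with equality here.
print(weyl_holds((2.0, 1.0, 0.0), (1.0, -1.0, 3.0)))  # True
```

The chosen pair is a boundary case: $\gamma_2=3=\alpha_1+\beta_2=\alpha_2+\beta_1$, which is why a small tolerance is used in the comparisons.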
2. Proofs of Theorems 1, 2, and 4. The following lemma will be used in the proof of Theorem 1.
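Lemma 8. Let $\mathbf{H}=(\mathbf{H}_{ij})_{i,j=1}^{I_3}\in\mathbb{C}^{I_3I_1\times I_3I_1}$ be a positive semidefinite matrix consisting of the blocks $\mathbf{H}_{ij}\in\mathbb{C}^{I_1\times I_1}$. Then
$$\lambda_{\max}(\mathbf{H}_{11}+\dots+\mathbf{H}_{I_3I_3})+\lambda_{\max}(\mathbf{H})\le\operatorname{tr}(\mathbf{H})+\lambda_{\max}(\Phi(\mathbf{H})),\tag{17}$$
where $\Phi(\mathbf{H})$ denotes the $I_3\times I_3$ matrix with the entries $(\Phi(\mathbf{H}))_{ij}=\operatorname{tr}(\mathbf{H}_{ij})$ and $\lambda_{\max}(\cdot)$ denotes the largest eigenvalue of a matrix.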
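As a quick sanity check of (17), consider the rank-one case. The following worked instance is an illustration added here, under the assumption that $\mathbf{H}=\mathbf{w}\mathbf{w}^H$ with $\mathbf{w}=\operatorname{vec}(\mathbf{W})$ for some $\mathbf{W}=[\mathbf{w}_1\ \dots\ \mathbf{w}_{I_3}]\in\mathbb{C}^{I_1\times I_3}$; in this case both sides of (17) coincide.

```latex
% Rank-one instance of (17): H = w w^H, w = vec(W), W = [w_1 ... w_{I_3}].
\begin{align*}
\mathbf{H}_{ij} &= \mathbf{w}_i\mathbf{w}_j^H, &
\sum_{k=1}^{I_3}\mathbf{H}_{kk} &= \mathbf{W}\mathbf{W}^H, \\
(\Phi(\mathbf{H}))_{ij} &= \operatorname{tr}(\mathbf{w}_i\mathbf{w}_j^H)
  = \mathbf{w}_i^T\mathbf{w}_j^*, &
\Phi(\mathbf{H}) &= \mathbf{W}^T\mathbf{W}^*, \\
\lambda_{\max}(\mathbf{H}) &= \operatorname{tr}(\mathbf{H}) = \|\mathbf{W}\|_F^2, &
\lambda_{\max}(\mathbf{W}\mathbf{W}^H) &= \lambda_{\max}(\mathbf{W}^T\mathbf{W}^*)
  = \sigma_{\max}^2(\mathbf{W}).
\end{align*}
% Hence (17) reads
%   sigma_max^2(W) + ||W||_F^2 <= ||W||_F^2 + sigma_max^2(W),
% i.e., it holds with equality.
```

In tensor terms this corresponds to $\mathbf{T}_{(2)}$ having rank one, which is consistent with the equality cases discussed in Section 1.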
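Proof. Let $\mathbf{H}=\sum_{r=1}^{R}\mathbf{w}_r\mathbf{w}_r^H$, where the vectors $\mathbf{w}_r$ are orthogonal and $\mathbf{w}_r=[\mathbf{w}_{1r}^T\ \dots\ \mathbf{w}_{I_3r}^T]^T$ with $\mathbf{w}_{kr}\in\mathbb{C}^{I_1}$. First, we rewrite (17) in terms of $\mathbf{w}_{kr}$, $1\le k\le I_3$, $1\le r\le R$.
WLOG, we can assume that $\|\mathbf{w}_1\|=\max_r\|\mathbf{w}_r\|$. Hence,
$$\lambda_{\max}(\mathbf{H})=\|\mathbf{w}_1\|^2=\sum_{k=1}^{I_3}\|\mathbf{w}_{k1}\|^2.\tag{18}$$
It is clear that
$$\mathbf{H}_{ij}=\sum_{r=1}^{R}\mathbf{w}_{ir}\mathbf{w}_{jr}^H,\qquad 1\le i,j\le I_3.$$
Hence
$$\lambda_{\max}(\mathbf{H}_{11}+\dots+\mathbf{H}_{I_3I_3})=\max_{\|\mathbf{x}\|=1}\sum_{k=1}^{I_3}(\mathbf{H}_{kk}\mathbf{x},\mathbf{x})=\max_{\|\mathbf{x}\|=1}\sum_{k=1}^{I_3}\sum_{r=1}^{R}|(\mathbf{w}_{kr},\mathbf{x})|^2.\tag{19}$$
Since $\mathbf{H}=\sum_{r=1}^{R}\mathbf{w}_r\mathbf{w}_r^H$, it follows that
$$\operatorname{tr}(\mathbf{H})=\sum_{r=1}^{R}\|\mathbf{w}_r\|^2.\tag{20}$$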
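Since
$$(\Phi(\mathbf{H}))_{ij}=\operatorname{tr}(\mathbf{H}_{ij})=\operatorname{tr}\Big(\sum_{r=1}^{R}\mathbf{w}_{ir}\mathbf{w}_{jr}^H\Big)=\sum_{r=1}^{R}\mathbf{w}_{jr}^H\mathbf{w}_{ir}=\sum_{r=1}^{R}\mathbf{w}_{ir}^T\mathbf{w}_{jr}^*,$$
it follows that
$$\Phi(\mathbf{H})=\sum_{r=1}^{R}\begin{bmatrix}\mathbf{w}_{1r}^T\mathbf{w}_{1r}^* & \dots & \mathbf{w}_{1r}^T\mathbf{w}_{I_3r}^*\\ \vdots & \ddots & \vdots\\ \mathbf{w}_{I_3r}^T\mathbf{w}_{1r}^* & \dots & \mathbf{w}_{I_3r}^T\mathbf{w}_{I_3r}^*\end{bmatrix}=\sum_{r=1}^{R}\begin{bmatrix}\mathbf{w}_{1r}^T\\ \vdots\\ \mathbf{w}_{I_3r}^T\end{bmatrix}\begin{bmatrix}\mathbf{w}_{1r}^* & \dots & \mathbf{w}_{I_3r}^*\end{bmatrix}=\sum_{r=1}^{R}\mathbf{W}_r^T\mathbf{W}_r^*,\tag{21}$$
where
$$\mathbf{W}_r:=[\mathbf{w}_{1r}\ \dots\ \mathbf{w}_{I_3r}]\in\mathbb{C}^{I_1\times I_3}.$$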
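Now we prove (17). By (18), (19), the Cauchy inequality, and (20),
$$\begin{aligned}\lambda_{\max}(\mathbf{H})+\lambda_{\max}(\mathbf{H}_{11}+\dots+\mathbf{H}_{I_3I_3})&=\|\mathbf{w}_1\|^2+\max_{\|\mathbf{x}\|=1}\Big[\sum_{k=1}^{I_3}|(\mathbf{w}_{k1},\mathbf{x})|^2+\sum_{k=1}^{I_3}\sum_{r=2}^{R}|(\mathbf{w}_{kr},\mathbf{x})|^2\Big]\\ &\le\|\mathbf{w}_1\|^2+\max_{\|\mathbf{x}\|=1}\Big[\sum_{k=1}^{I_3}|(\mathbf{w}_{k1},\mathbf{x})|^2\Big]+\sum_{r=2}^{R}\|\mathbf{w}_r\|^2\\ &=\operatorname{tr}(\mathbf{H})+\max_{\|\mathbf{x}\|=1}\Big[\sum_{k=1}^{I_3}|(\mathbf{w}_{k1},\mathbf{x})|^2\Big].\end{aligned}\tag{22}$$
To complete the proof of (17) we should show that
$$\max_{\|\mathbf{x}\|=1}\Big[\sum_{k=1}^{I_3}|(\mathbf{w}_{k1},\mathbf{x})|^2\Big]\le\lambda_{\max}(\Phi(\mathbf{H})).$$
This can be done as follows:
$$\begin{aligned}\max_{\|\mathbf{x}\|=1}\Big[\sum_{k=1}^{I_3}|(\mathbf{w}_{k1},\mathbf{x})|^2\Big]&=\max_{\|\mathbf{x}\|=1}\Big[\sum_{k=1}^{I_3}\mathbf{x}^H\mathbf{w}_{k1}\mathbf{w}_{k1}^H\mathbf{x}\Big]=\lambda_{\max}\big(\mathbf{W}_1\mathbf{W}_1^H\big)=\lambda_{\max}\big(\mathbf{W}_1^H\mathbf{W}_1\big)\\ &\le\lambda_{\max}\Big(\sum_{r=1}^{R}\mathbf{W}_r^H\mathbf{W}_r\Big)=\lambda_{\max}(\Phi(\mathbf{H})^*)=\lambda_{\max}(\Phi(\mathbf{H})).\end{aligned}\tag{23}$$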
Now we are ready to prove Theorem 1.
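Proof of Theorem 1. The three inequalities in (5) are obvious. We prove that $\sigma_1^2+\sigma_2^2\le\|\mathcal{T}\|^2+\sigma_3^2$. The proofs of the inequalities $\sigma_1^2+\sigma_3^2\le\|\mathcal{T}\|^2+\sigma_2^2$ and $\sigma_2^2+\sigma_3^2\le\|\mathcal{T}\|^2+\sigma_1^2$ can be obtained in a similar way.
By definition of ML singular values,
$$\sigma_1^2=\lambda_{\max}(\mathbf{T}_{(1)}\mathbf{T}_{(1)}^H)=\lambda_{\max}(\mathbf{T}_1\mathbf{T}_1^H+\dots+\mathbf{T}_{I_3}\mathbf{T}_{I_3}^H),\qquad \sigma_2^2=\lambda_{\max}(\mathbf{T}_{(2)}^H\mathbf{T}_{(2)})=\lambda_{\max}(\mathbf{T}_{(2)}^T\mathbf{T}_{(2)}^*)=\lambda_{\max}(\mathbf{H}),$$
where
$$\mathbf{H}=\mathbf{T}_{(2)}^T\mathbf{T}_{(2)}^*=\begin{bmatrix}\mathbf{T}_1\mathbf{T}_1^H & \dots & \mathbf{T}_1\mathbf{T}_{I_3}^H\\ \vdots & \ddots & \vdots\\ \mathbf{T}_{I_3}\mathbf{T}_1^H & \dots & \mathbf{T}_{I_3}\mathbf{T}_{I_3}^H\end{bmatrix}.$$
Since $\operatorname{vec}(\mathbf{T}_i)^T(\operatorname{vec}(\mathbf{T}_j)^T)^H=\operatorname{tr}(\mathbf{T}_i\mathbf{T}_j^H)$, it follows that $\sigma_3^2=\lambda_{\max}(\mathbf{T}_{(3)}\mathbf{T}_{(3)}^H)=\lambda_{\max}(\Phi(\mathbf{H}))$, where
$$\Phi(\mathbf{H})=\begin{bmatrix}\operatorname{tr}(\mathbf{T}_1\mathbf{T}_1^H) & \dots & \operatorname{tr}(\mathbf{T}_1\mathbf{T}_{I_3}^H)\\ \vdots & \ddots & \vdots\\ \operatorname{tr}(\mathbf{T}_{I_3}\mathbf{T}_1^H) & \dots & \operatorname{tr}(\mathbf{T}_{I_3}\mathbf{T}_{I_3}^H)\end{bmatrix}.$$
Since $\|\mathcal{T}\|^2=\operatorname{tr}(\mathbf{H})$, the inequality $\sigma_1^2+\sigma_2^2\le\|\mathcal{T}\|^2+\sigma_3^2$ is equivalent to
$$\lambda_{\max}(\mathbf{T}_1\mathbf{T}_1^H+\dots+\mathbf{T}_{I_3}\mathbf{T}_{I_3}^H)+\lambda_{\max}(\mathbf{H})\le\operatorname{tr}(\mathbf{H})+\lambda_{\max}(\Phi(\mathbf{H})),$$
which holds by Lemma 8.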
Proof of Theorem 2. The proof consists of three steps. In the first step we construct all-orthogonal and non-negative $I_1\times I_2\times I_3$ tensors $\mathcal{S}_2$, $\mathcal{X}_2$, $\mathcal{Y}_2$, $\mathcal{Z}_2$, and $\mathcal{N}$ whose squared largest ML singular values are the coordinates of $S_2(\frac{1}{I_1},\frac{1}{I_1},\frac{1}{I_1})$, $X_2(1,\frac{1}{I_2},\frac{1}{I_2})$, $Y_2(\frac{1}{I_1},1,\frac{1}{I_1})$, $Z_2(\frac{1}{I_1},\frac{1}{I_1},1)$, and $N(1,1,1)$, respectively (see Figure 2b). Then we show that because of the zero patterns of $\mathcal{S}_2$, $\mathcal{X}_2$, $\mathcal{Y}_2$, $\mathcal{Z}_2$, and $\mathcal{N}$, the tensor
$$\mathcal{T}=\big(t_{S_2}\mathcal{S}_2^2+t_{X_2}\mathcal{X}_2^2+t_{Y_2}\mathcal{Y}_2^2+t_{Z_2}\mathcal{Z}_2^2+t_N\mathcal{N}^2\big)^{\frac{1}{2}}\tag{24}$$
is all-orthogonal for any non-negative values $t_{S_2}$, $t_{X_2}$, $t_{Y_2}$, $t_{Z_2}$, $t_N$. The superscripts "$2$" and "$\frac{1}{2}$" in (24) denote the entrywise operations. Finally, in the third step, we find non-negative values $t_{S_2}$, $t_{X_2}$, $t_{Y_2}$, $t_{Z_2}$, $t_N$ such that $\mathcal{T}$ is a norm-1 tensor whose squared largest ML singular values are equal to $\sigma_1^2$, $\sigma_2^2$, and $\sigma_3^2$.
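Step 1. Let $\pi$ denote the cyclic permutation $\pi:\ 1\to I_1\to I_1-1\to\dots\to 2\to 1$. The tensors $\mathcal{S}_2$, $\mathcal{X}_2$, $\mathcal{Y}_2$, and $\mathcal{Z}_2$ are defined by
$$(\mathcal{S}_2)_{ijk}=\begin{cases}\frac{1}{I_1}, & \text{if } j=\pi^{k-1}(i)\text{ and }1\le i,k\le I_1,\\ 0, & \text{otherwise},\end{cases}$$
$$(\mathcal{X}_2)_{ijk}=\begin{cases}\frac{1}{\sqrt{I_2}}, & \text{if } j=\pi^{k-1}(i),\ i=1,\text{ and }1\le k\le I_1,\\ \frac{1}{\sqrt{I_2}}, & \text{if } i=1\text{ and }I_1<j=k\le I_2,\\ 0, & \text{otherwise},\end{cases}$$
$$(\mathcal{Y}_2)_{ijk}=\begin{cases}\frac{1}{\sqrt{I_1}}, & \text{if } j=\pi^{k-1}(i),\ j=1,\text{ and }1\le k\le I_1,\\ 0, & \text{otherwise},\end{cases}$$
$$(\mathcal{Z}_2)_{ijk}=\begin{cases}\frac{1}{\sqrt{I_1}}, & \text{if } j=\pi^{k-1}(i),\ k=1,\text{ and }1\le i\le I_1,\\ 0, & \text{otherwise},\end{cases}$$
and the tensor $\mathcal{N}$, by definition, has only one nonzero entry, $\mathcal{N}_{111}=1$.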
Step 2. It is clear that the $(i,j,k)$th entry of a linear combination of $\mathcal{S}_2^2$, $\mathcal{X}_2^2$, $\mathcal{Y}_2^2$, $\mathcal{Z}_2^2$, and $\mathcal{N}^2$ may be nonzero only if
$$j=\pi^{k-1}(i)\text{ and }1\le i,k\le I_1\qquad\text{or}\qquad i=1\text{ and }I_1<j=k\le I_2.$$
The same is also true for $\mathcal{T}$ defined in (24). One can easily check that each column of $\mathbf{T}_{(1)}$, $\mathbf{T}_{(2)}$, and $\mathbf{T}_{(3)}$ contains at most one nonzero entry, implying that $\mathcal{T}$ is an all-orthogonal tensor.
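Steps 1 and 2 can be illustrated by building the tensor in (24) explicitly and checking its zero pattern. The sketch below is only an illustration under the following reading of Step 1: the squared nonzero entries of the five vertex tensors are $1/I_1^2$, $1/I_2$, $1/I_1$, $1/I_1$, and $1$, respectively. The function names and the dictionary representation of a tensor are assumptions made here.

```python
from math import sqrt, isclose

def pi_power(i, p, I1):
    """Apply the cyclic permutation pi: 1 -> I1 -> I1-1 -> ... -> 2 -> 1, p times."""
    return (i - 1 - p) % I1 + 1

def build_T(dims, tS, tX, tY, tZ, tN):
    """Nonzero entries of the tensor in (24): entrywise square root of the
    weighted sum of the entrywise squared vertex tensors."""
    I1, I2, I3 = dims
    assert I1 <= I2 <= I3
    squared = {}
    def add(idx, value):
        squared[idx] = squared.get(idx, 0.0) + value
    for i in range(1, I1 + 1):
        for k in range(1, I1 + 1):
            j = pi_power(i, k - 1, I1)
            add((i, j, k), tS / I1 ** 2)   # squared entry of the first vertex tensor
            if i == 1:
                add((i, j, k), tX / I2)    # first branch of the second vertex tensor
            if j == 1:
                add((i, j, k), tY / I1)
            if k == 1:
                add((i, j, k), tZ / I1)
    for j in range(I1 + 1, I2 + 1):
        add((1, j, j), tX / I2)            # second branch of the second vertex tensor
    add((1, 1, 1), tN)
    return {idx: sqrt(v) for idx, v in squared.items() if v > 0.0}

def squared_row_norms(T, mode, size):
    """Verify that each column of the unfolding along `mode` (0-based) has at most
    one nonzero entry and return the squared norms of its rows."""
    columns_seen = set()
    rows = [0.0] * size
    for idx, value in T.items():
        column = idx[:mode] + idx[mode + 1:]
        assert column not in columns_seen  # at most one nonzero per column
        columns_seen.add(column)
        rows[idx[mode] - 1] += value ** 2
    return rows

dims = (2, 3, 5)
T = build_T(dims, tS=0.1, tX=0.2, tY=0.3, tZ=0.15, tN=0.25)
norms = [squared_row_norms(T, mode, dims[mode]) for mode in range(3)]
largest = [max(rows) for rows in norms]
print(largest)  # approximately [0.725, 0.7417, 0.6667]
```

Since the weights sum to one, the tensor has unit norm, and in each mode the first row carries the largest squared norm, which equals the corresponding coordinate of the convex combination of the five vertices.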
Step 3. From the construction of the all-orthogonal tensors $\mathcal{S}_2$, $\mathcal{X}_2$, $\mathcal{Y}_2$, $\mathcal{Z}_2$, and $\mathcal{N}$ it follows that their largest ML singular values are equal to the Frobenius norms of the first rows of their matrix unfoldings. Thus, the same property should also hold for $\mathcal{T}$ whenever the values $t_{S_2}$, $t_{X_2}$, $t_{Y_2}$, $t_{Z_2}$, and $t_N$ are non-negative. Now the result follows from the fact that the polyhedron in Figure 2b is the convex hull of the points $S_2$, $X_2$, $Y_2$, $Z_2$, and $N$. We can also write the values of $t_{S_2}$, $t_{X_2}$, $t_{Y_2}$, $t_{Z_2}$, and $t_N$ explicitly. We set
$$f(\sigma_1^2,\sigma_2^2,\sigma_3^2):=(I_1I_2+I_2-2I_1)\sigma_1^2+(I_1-1)I_2\sigma_2^2+(I_1-1)I_2\sigma_3^2+(2-I_1I_2-I_2).$$
If $(\sigma_1^2,\sigma_2^2,\sigma_3^2)$ belongs to the tetrahedron $X_2Y_2Z_2N$, i.e., $f(\sigma_1^2,\sigma_2^2,\sigma_3^2)\ge 0$, then
$$t_{X_2}=\frac{I_2}{2(I_2-1)}(1+\sigma_1^2-\sigma_2^2-\sigma_3^2),\qquad t_{Y_2}=\frac{I_1}{2(I_1-1)}(1+\sigma_2^2-\sigma_1^2-\sigma_3^2),\qquad t_{Z_2}=\frac{I_1}{2(I_1-1)}(1+\sigma_3^2-\sigma_1^2-\sigma_2^2),$$
$$t_N=1-t_{X_2}-t_{Y_2}-t_{Z_2}=\frac{f(\sigma_1^2,\sigma_2^2,\sigma_3^2)}{2(I_1-1)(I_2-1)},\qquad t_{S_2}=0.$$
If $(\sigma_1^2,\sigma_2^2,\sigma_3^2)$ belongs to the tetrahedron $X_2Y_2Z_2S_2$, i.e., $f(\sigma_1^2,\sigma_2^2,\sigma_3^2)\le 0$, then
$$t_{X_2}=\frac{I_1}{I_1-1}\Big(\sigma_1^2-\frac{1}{I_1}\Big),\qquad t_{Y_2}=\frac{I_1}{I_1-1}\Big(\sigma_2^2-\frac{1}{I_1}\Big)+\frac{(I_2-I_1)I_1}{(I_1-1)^2I_2}\Big(\sigma_1^2-\frac{1}{I_1}\Big),$$
$$t_{Z_2}=\frac{I_1}{I_1-1}\Big(\sigma_3^2-\frac{1}{I_1}\Big)+\frac{(I_2-I_1)I_1}{(I_1-1)^2I_2}\Big(\sigma_1^2-\frac{1}{I_1}\Big),\qquad t_{S_2}=1-t_{X_2}-t_{Y_2}-t_{Z_2}=-f(\sigma_1^2,\sigma_2^2,\sigma_3^2)\frac{I_1}{I_2(I_1-1)^2},\qquad t_N=0.$$
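The explicit weights of Step 3 can be evaluated and checked against the vertex coordinates $S_2(\frac{1}{I_1},\frac{1}{I_1},\frac{1}{I_1})$, $X_2(1,\frac{1}{I_2},\frac{1}{I_2})$, $Y_2(\frac{1}{I_1},1,\frac{1}{I_1})$, $Z_2(\frac{1}{I_1},\frac{1}{I_1},1)$, and $N(1,1,1)$. The sketch below assumes a norm-1 tensor and $I_1\ge 2$; the function names `weights` and `combine` are hypothetical.

```python
from math import isclose

def weights(s1, s2, s3, I1, I2):
    """Weights (tS, tX, tY, tZ, tN) for squared values s_n = sigma_n^2 (norm-1 case)."""
    f = ((I1 * I2 + I2 - 2 * I1) * s1 + (I1 - 1) * I2 * (s2 + s3)
         + (2 - I1 * I2 - I2))
    if f >= 0:  # the point lies in the tetrahedron X2 Y2 Z2 N
        tX = I2 / (2 * (I2 - 1)) * (1 + s1 - s2 - s3)
        tY = I1 / (2 * (I1 - 1)) * (1 + s2 - s1 - s3)
        tZ = I1 / (2 * (I1 - 1)) * (1 + s3 - s1 - s2)
        return 0.0, tX, tY, tZ, 1 - tX - tY - tZ
    # otherwise the point lies in the tetrahedron X2 Y2 Z2 S2
    c = I1 / (I1 - 1)
    extra = (I2 - I1) * I1 / ((I1 - 1) ** 2 * I2) * (s1 - 1 / I1)
    tX = c * (s1 - 1 / I1)
    tY = c * (s2 - 1 / I1) + extra
    tZ = c * (s3 - 1 / I1) + extra
    return 1 - tX - tY - tZ, tX, tY, tZ, 0.0

def combine(t, I1, I2):
    """Convex combination of the vertices S2, X2, Y2, Z2, N with weights t."""
    tS, tX, tY, tZ, tN = t
    s1 = tS / I1 + tX + tY / I1 + tZ / I1 + tN
    s2 = tS / I1 + tX / I2 + tY + tZ / I1 + tN
    s3 = tS / I1 + tX / I2 + tY / I1 + tZ + tN
    return s1, s2, s3

for point in [(0.725, 0.74, 0.67), (0.55, 0.5, 0.5)]:
    t = weights(*point, I1=2, I2=3)
    print(t, combine(t, 2, 3))  # the combination reproduces the input point
```

For $I_1=2$, $I_2=3$ the first point falls in the tetrahedron $X_2Y_2Z_2N$ and the second in $X_2Y_2Z_2S_2$; in both cases the weights are non-negative, sum to one, and reproduce the input point.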
Proof of Theorem 4. The inequalities in (11) are obvious. We prove that
$$\sigma_1^2+\dots+\sigma_{N-1}^2\le(N-2)\|\mathcal{T}\|^2+\sigma_N^2.\tag{25}$$
The proofs of the remaining $N-1$ inequalities in (10) can be obtained in a similar way.
The proof of (25) consists of two steps. In the first step we reshape $\mathcal{T}$ into third-order tensors $\mathcal{T}^{[1]},\dots,\mathcal{T}^{[N-2]}$ and compute their matrix unfoldings. In this step we will make use of (1) for $N=3$. For the reader's convenience and for future reference we write a third-order version of (1) explicitly: if $\mathcal{X}\in\mathbb{C}^{I\times J\times K}$, then for all values of indices $i$, $j$, and $k$
$$\begin{aligned}\text{the }(i,\ j+(k-1)J)\text{th entry of }\mathbf{X}_{(1)}&=\text{the }(j,\ i+(k-1)I)\text{th entry of }\mathbf{X}_{(2)}\\ &=\text{the }(k,\ i+(j-1)I)\text{th entry of }\mathbf{X}_{(3)}=\text{the }(i,j,k)\text{th entry of }\mathcal{X}.\end{aligned}\tag{26}$$
In the second step, we apply the first inequality in (4) to each tensor $\mathcal{T}^{[n]}$, then we sum up the obtained inequalities and show that the result coincides with inequality (25).
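Step 1. Let $n\in\{1,\dots,N-2\}$. A third-order tensor $\mathcal{T}^{[n]}\in\mathbb{C}^{I_1\cdots I_n\times I_{n+1}\times I_{n+2}\cdots I_N}$ is constructed as follows:
$$\text{the }\Big(i_1+\sum_{k=2}^{n}(i_k-1)\prod_{l=1}^{k-1}I_l,\ \ i_{n+1},\ \ i_{n+2}+\sum_{k=n+3}^{N}(i_k-1)\prod_{l=n+2}^{k-1}I_l\Big)\text{th entry of }\mathcal{T}^{[n]}$$
is equal to the $(i_1,\dots,i_N)$th entry of $\mathcal{T}$. Now we apply (26) for $\mathcal{X}=\mathcal{T}^{[n]}$ and
$$i=i_1+\sum_{k=2}^{n}(i_k-1)\prod_{l=1}^{k-1}I_l,\qquad j=i_{n+1},\qquad k=i_{n+2}+\sum_{k=n+3}^{N}(i_k-1)\prod_{l=n+2}^{k-1}I_l.$$
After simple algebraic manipulations, we obtain that
$$\begin{aligned}&\text{the }\Big(i_1+\sum_{k=2}^{n}(i_k-1)\prod_{l=1}^{k-1}I_l,\ \ i_{n+1}+\sum_{k=n+2}^{N}(i_k-1)\prod_{l=n+1}^{k-1}I_l\Big)\text{th entry of }\mathbf{T}^{[n]}_{(1)}\\ =\ &\text{the }\Big(i_{n+1},\ \ 1+\sum_{\substack{k=1\\ k\ne n+1}}^{N}(i_k-1)\prod_{\substack{l=1\\ l\ne n+1}}^{k-1}I_l\Big)\text{th entry of }\mathbf{T}^{[n]}_{(2)}\\ =\ &\text{the }\Big(i_{n+2}+\sum_{k=n+3}^{N}(i_k-1)\prod_{l=n+2}^{k-1}I_l,\ \ i_1+\sum_{k=2}^{n+1}(i_k-1)\prod_{l=1}^{k-1}I_l\Big)\text{th entry of }\mathbf{T}^{[n]}_{(3)}\\ =\ &\text{the }(i_1,\dots,i_N)\text{th entry of }\mathcal{T}.\end{aligned}\tag{27}$$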
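Step 2. From (27) and (1) it follows that
$$\mathbf{T}^{[1]}_{(1)}=\mathbf{T}_{(1)},\tag{28}$$
$$\mathbf{T}^{[n]}_{(2)}=\mathbf{T}_{(n+1)},\qquad 1\le n\le N-2,\tag{29}$$
$$\mathbf{T}^{[N-2]}_{(3)}=\mathbf{T}_{(N)}.\tag{30}$$
Comparing the expressions of $\mathbf{T}^{[n]}_{(1)}$ and $\mathbf{T}^{[n]}_{(3)}$ in (27), we obtain that
$$\mathbf{T}^{[n]}_{(3)}=\big(\mathbf{T}^{[n+1]}_{(1)}\big)^T,\qquad 1\le n\le N-3.\tag{31}$$
By Theorem 1, for every $n\in\{1,\dots,N-2\}$
$$\sigma_{\max}^2(\mathbf{T}^{[n]}_{(1)})+\sigma_{\max}^2(\mathbf{T}^{[n]}_{(2)})\le\|\mathcal{T}^{[n]}\|^2+\sigma_{\max}^2(\mathbf{T}^{[n]}_{(3)})=\|\mathcal{T}\|^2+\sigma_{\max}^2(\mathbf{T}^{[n]}_{(3)}),\tag{32}$$
where $\sigma_{\max}(\cdot)$ denotes the largest singular value of a matrix. Substituting (28)–(31) into (32) we obtain
$$\begin{aligned}\sigma_1^2+\sigma_2^2&\le\|\mathcal{T}\|^2+\sigma_{\max}^2(\mathbf{T}^{[1]}_{(3)})=\|\mathcal{T}\|^2+\sigma_{\max}^2(\mathbf{T}^{[2]}_{(1)}), & n&=1,\\ \sigma_{\max}^2(\mathbf{T}^{[2]}_{(1)})+\sigma_3^2&\le\|\mathcal{T}\|^2+\sigma_{\max}^2(\mathbf{T}^{[2]}_{(3)})=\|\mathcal{T}\|^2+\sigma_{\max}^2(\mathbf{T}^{[3]}_{(1)}), & n&=2,\\ &\ \,\vdots\\ \sigma_{\max}^2(\mathbf{T}^{[N-3]}_{(1)})+\sigma_{N-2}^2&\le\|\mathcal{T}\|^2+\sigma_{\max}^2(\mathbf{T}^{[N-3]}_{(3)})=\|\mathcal{T}\|^2+\sigma_{\max}^2(\mathbf{T}^{[N-2]}_{(1)}), & n&=N-3,\\ \sigma_{\max}^2(\mathbf{T}^{[N-2]}_{(1)})+\sigma_{N-1}^2&\le\|\mathcal{T}\|^2+\sigma_{\max}^2(\mathbf{T}^{[N-2]}_{(3)})=\|\mathcal{T}\|^2+\sigma_N^2, & n&=N-2.\end{aligned}$$
Summing up the above inequalities and canceling identical terms on the left- and right-hand side we obtain (25).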
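To make the telescoping argument concrete, the following worked instance spells out the case $N=4$. It is an illustration added here and uses only the reshapings $\mathcal{T}^{[1]}\in\mathbb{C}^{I_1\times I_2\times I_3I_4}$ and $\mathcal{T}^{[2]}\in\mathbb{C}^{I_1I_2\times I_3\times I_4}$ defined in Step 1.

```latex
% N = 4: two applications of Theorem 1, linked by T^{[1]}_{(3)} = (T^{[2]}_{(1)})^T.
\begin{align*}
\sigma_1^2+\sigma_2^2
  &\le \|\mathcal{T}\|^2+\sigma_{\max}^2\big(\mathbf{T}^{[1]}_{(3)}\big)
   = \|\mathcal{T}\|^2+\sigma_{\max}^2\big(\mathbf{T}^{[2]}_{(1)}\big),\\
\sigma_{\max}^2\big(\mathbf{T}^{[2]}_{(1)}\big)+\sigma_3^2
  &\le \|\mathcal{T}\|^2+\sigma_{\max}^2\big(\mathbf{T}^{[2]}_{(3)}\big)
   = \|\mathcal{T}\|^2+\sigma_4^2.
\end{align*}
% Adding the two lines and canceling sigma_max^2(T^{[2]}_{(1)}) on both sides gives
\begin{equation*}
\sigma_1^2+\sigma_2^2+\sigma_3^2\le 2\|\mathcal{T}\|^2+\sigma_4^2 ,
\end{equation*}
% which is the inequality to be proved for N = 4.
```

The intermediate quantity $\sigma_{\max}(\mathbf{T}^{[2]}_{(1)})$ is the largest singular value of the unfolding that groups modes 1 and 2 against modes 3 and 4; it appears once on each side and therefore drops out.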
3. Results on feasibility and non-feasibility of the points $S$, $X_1$, and $Y_1$. Throughout this section we assume that $\mathcal{T}$ is a norm-1 tensor.
In the following example we show that it may happen that $S$ is the only feasible point in the plane through the points $S$, $X_1$, and $Y_1$, i.e., the plane $\sigma_3^2=\frac{1}{I_3}$.
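Example 9. Let $I_3=I_1I_2$ and $\mathcal{T}\in\mathbb{C}^{I_1\times I_2\times I_3}$. Assume that $\sigma_3^2=\frac{1}{I_3}$. Then $\mathbf{T}_{(3)}\mathbf{T}_{(3)}^H=\frac{1}{I_3}\mathbf{I}_{I_3}$. Since $\mathbf{T}_{(3)}$ is a square matrix, it follows that $\mathbf{T}_{(3)}$ is a scalar multiple of a unitary matrix, $\mathbf{T}_{(3)}=\frac{1}{\sqrt{I_3}}\mathbf{U}$. One can easily verify (see [6, p. 65]) that $\mathbf{T}_{(1)}\mathbf{T}_{(1)}^H=\frac{1}{I_1}\mathbf{I}_{I_1}$ and $\mathbf{T}_{(2)}\mathbf{T}_{(2)}^H=\frac{1}{I_2}\mathbf{I}_{I_2}$. Hence, $\sigma_1^2=\frac{1}{I_1}$ and $\sigma_2^2=\frac{1}{I_2}$. Thus, the points $X_1$ and $Y_1$ are not feasible.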
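From Example 9 it follows that the point $S$ is feasible if $I_1=2$, $I_2=3$, and $I_3=6$. The point $S$ is also feasible if $I_1=2$, $I_2=3$, and $I_3=4$. Indeed, let $\mathcal{T}$ be a $2\times 3\times 4$ tensor with mode-3 matrix unfolding
$$\mathbf{T}_{(3)}=\frac{1}{4\sqrt{3}}\begin{bmatrix}1+\sqrt{3} & 0 & 0 & 1-\sqrt{3} & -2 & 0\\ 0 & 1+\sqrt{3} & 1-\sqrt{3} & 0 & 0 & 2\\ 0 & 1-\sqrt{3} & 1+\sqrt{3} & 0 & 0 & 2\\ 1-\sqrt{3} & 0 & 0 & 1+\sqrt{3} & -2 & 0\end{bmatrix}.$$
Then one can easily verify that $\mathbf{T}_{(1)}\mathbf{T}_{(1)}^H=\frac{1}{2}\mathbf{I}_2$, $\mathbf{T}_{(2)}\mathbf{T}_{(2)}^H=\frac{1}{3}\mathbf{I}_3$, and $\mathbf{T}_{(3)}\mathbf{T}_{(3)}^H=\frac{1}{4}\mathbf{I}_4$. The following result implies that in the "intermediate" case $I_1=2$, $I_2=3$, and $I_3=5$ the point $S$ is not feasible.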
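The claimed Gram matrices of the $2\times 3\times 4$ example can be verified mechanically. The sketch below is an illustration only: it takes the scalar prefactor to be $\frac{1}{4\sqrt{3}}$, the value for which the tensor has unit norm, and recovers the entries $t_{ijk}$ from $\mathbf{T}_{(3)}$ through the index map (26). The helper names are hypothetical.

```python
from math import sqrt, isclose

r3 = sqrt(3.0)
p, q = 1 + r3, 1 - r3
c = 1 / (4 * r3)  # scaling that makes the tensor norm-1
rows = [[p, 0, 0, q, -2, 0],
        [0, p, q, 0, 0, 2],
        [0, q, p, 0, 0, 2],
        [q, 0, 0, p, -2, 0]]

# By (26), the (k, i + (j - 1) * I1)th entry of T_(3) is t_{ijk}.
I1, I2, I3 = 2, 3, 4
T = {(i, j, k): c * rows[k - 1][(i - 1) + (j - 1) * I1]
     for i in range(1, I1 + 1)
     for j in range(1, I2 + 1)
     for k in range(1, I3 + 1)}

def gram(T, mode, size):
    """Gram matrix T_(n) T_(n)^T of a real tensor along the 0-based `mode`."""
    G = [[0.0] * size for _ in range(size)]
    for a, va in T.items():
        for b, vb in T.items():
            if a[:mode] + a[mode + 1:] == b[:mode] + b[mode + 1:]:
                G[a[mode] - 1][b[mode] - 1] += va * vb
    return G

def is_scaled_identity(G, value, tol=1e-12):
    n = len(G)
    return all(isclose(G[r][s], value if r == s else 0.0, abs_tol=tol)
               for r in range(n) for s in range(n))

checks = [is_scaled_identity(gram(T, mode, I), 1 / I)
          for mode, I in enumerate((I1, I2, I3))]
print(checks)  # [True, True, True]
```

All three Gram matrices are multiples of the identity, so every mode-$n$ singular value equals $\frac{1}{\sqrt{I_n}}$ and the tensor realizes the point $S$ for $(I_1,I_2,I_3)=(2,3,4)$.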
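Theorem 10. Let $I_3=I_1I_2-1$, $\mathcal{T}\in\mathbb{C}^{I_1\times I_2\times I_3}$, and $\mathbf{T}_{(3)}\mathbf{T}_{(3)}^H=\frac{1}{I_3}\mathbf{I}_{I_3}$. Then the following statements hold:
(i) if $\mathbf{T}_{(1)}\mathbf{T}_{(1)}^H=\frac{1}{I_1}\mathbf{I}_{I_1}$, then $I_1\le I_2$;
(ii) if $\mathbf{T}_{(2)}\mathbf{T}_{(2)}^H=\frac{1}{I_2}\mathbf{I}_{I_2}$, then $I_2\le I_1$;
(iii) if the point $S$ is feasible, then $I_1=I_2$.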
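Proof. (i) Let $\mathbf{T}_{(3)}=[\mathbf{t}_1\ \dots\ \mathbf{t}_{I_1I_2}]$. Then the identity $\mathbf{T}_{(1)}\mathbf{T}_{(1)}^H=\frac{1}{I_1}\mathbf{I}_{I_1}$ is equivalent to the system
$$\begin{aligned}\mathbf{t}_{i_1}^H\mathbf{t}_{i_2}+\mathbf{t}_{I_1+i_1}^H\mathbf{t}_{I_1+i_2}+\dots+\mathbf{t}_{I_1(I_2-1)+i_1}^H\mathbf{t}_{I_1(I_2-1)+i_2}&=0,\\ \|\mathbf{t}_{i_1}\|^2+\|\mathbf{t}_{I_1+i_1}\|^2+\dots+\|\mathbf{t}_{I_1(I_2-1)+i_1}\|^2&=\frac{1}{I_1},\end{aligned}\qquad 1\le i_1<i_2\le I_1.\tag{33}$$
Since $\mathbf{T}_{(3)}\mathbf{T}_{(3)}^H=\frac{1}{I_3}\mathbf{I}_{I_3}$, the matrix $\sqrt{I_3}\,\mathbf{T}_{(3)}\in\mathbb{C}^{I_3\times I_1I_2}$ can be extended to a unitary matrix $\sqrt{I_3}\begin{bmatrix}\mathbf{T}_{(3)}\\ \mathbf{a}^T\end{bmatrix}\in\mathbb{C}^{I_1I_2\times I_1I_2}$, where $\mathbf{a}\in\mathbb{C}^{I_1I_2}$ is a vector such that $\mathbf{T}_{(3)}\mathbf{a}^*=\mathbf{0}$ and $\|\mathbf{a}\|^2=\frac{1}{I_3}$. Hence,
$$\begin{bmatrix}\mathbf{T}_{(3)}^H & \mathbf{a}^*\end{bmatrix}\begin{bmatrix}\mathbf{T}_{(3)}\\ \mathbf{a}^T\end{bmatrix}=\frac{1}{I_3}\mathbf{I}_{I_1I_2}$$
or
$$\mathbf{t}_i^H\mathbf{t}_j+\bar{a}_ia_j=0\ \text{ for } i\ne j\quad\text{and}\quad\|\mathbf{t}_i\|^2+|a_i|^2=\frac{1}{I_3},\qquad 1\le i<j\le I_1I_2.\tag{34}$$
From (33)–(34) and $I_3=I_1I_2-1$ it follows that
$$\begin{aligned}\bar{a}_{i_1}a_{i_2}+\bar{a}_{I_1+i_1}a_{I_1+i_2}+\dots+\bar{a}_{I_1(I_2-1)+i_1}a_{I_1(I_2-1)+i_2}&=0,\\ |a_{i_1}|^2+|a_{I_1+i_1}|^2+\dots+|a_{I_1(I_2-1)+i_1}|^2&=\frac{I_2}{I_3}-\frac{1}{I_1}=\frac{1}{I_1I_3},\end{aligned}\qquad 1\le i_1<i_2\le I_1.$$
Thus, the vectors
$$[a_i\ \ a_{I_1+i}\ \dots\ a_{I_1(I_2-1)+i}]^T\in\mathbb{C}^{I_2},\qquad 1\le i\le I_1,$$
are nonzero and mutually orthogonal. Hence, $I_1\le I_2$.
(ii) The proof is similar to the proof of (i).
(iii) Since $S$ is feasible, it follows that $\mathbf{T}_{(1)}\mathbf{T}_{(1)}^H=\frac{1}{I_1}\mathbf{I}_{I_1}$ and $\mathbf{T}_{(2)}\mathbf{T}_{(2)}^H=\frac{1}{I_2}\mathbf{I}_{I_2}$. Hence, by (i) and (ii), $I_1=I_2$.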
4. The case of at least one equality in (4). The following two lemmas will be used in the proof of Theorem 6.
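Lemma 11. Let $\mathbf{H}$ and $\Phi(\mathbf{H})$ be as in Lemma 8. Then the equality in (17) holds if and only if $\mathbf{H}$ can be factorized as
$$\mathbf{H}=[\operatorname{vec}(\mathbf{W}_1)\ \ \mathbf{G}\otimes\mathbf{x}][\operatorname{vec}(\mathbf{W}_1)\ \ \mathbf{G}\otimes\mathbf{x}]^H,\tag{35}$$
where
(i) $\mathbf{W}_1\in\mathbb{C}^{I_1\times I_3}$ and $\mathbf{x}$ is a principal eigenvector of $\mathbf{W}_1\mathbf{W}_1^H$, i.e., $\mathbf{W}_1\mathbf{W}_1^H\mathbf{x}=\lambda_{\max}(\mathbf{W}_1\mathbf{W}_1^H)\mathbf{x}$, $\|\mathbf{x}\|=1$;
(ii) the matrix $\mathbf{G}=[\mathbf{g}_2\ \dots\ \mathbf{g}_R]\in\mathbb{C}^{I_3\times(R-1)}$ has orthogonal columns;
(iii) $\mathbf{G}^T\mathbf{W}_1^H\mathbf{x}=\mathbf{0}$;
(iv) $\lambda_{\max}(\mathbf{W}_1^H\mathbf{W}_1)=\lambda_{\max}(\mathbf{W}_1^H\mathbf{W}_1+\mathbf{G}^*\mathbf{G}^T)$.
Moreover, if (35) and (i)–(iv) hold, then
$$\sigma\Big(\sum_{k=1}^{I_3}\mathbf{H}_{kk}\Big)=\sigma(\mathbf{W}_1\mathbf{W}_1^H+\|\mathbf{G}\|_F^2\mathbf{x}\mathbf{x}^H),\tag{36}$$
$$\sigma(\mathbf{H})=\{\|\mathbf{W}_1\|_F^2,\ \|\mathbf{g}_2\|^2,\ \dots,\ \|\mathbf{g}_R\|^2,\ 0,\dots,0\},\tag{37}$$
$$\sigma(\Phi(\mathbf{H}))=\sigma(\mathbf{W}_1^H\mathbf{W}_1+\mathbf{G}^*\mathbf{G}^T),\tag{38}$$
where $\sigma(\cdot)$ denotes the spectrum of a matrix and $\|\cdot\|_F$ denotes the Frobenius norm.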
Proof. The proof essentially relies on the proof of Lemma 8 so we use the same notations and conventions as in the proof of Lemma 8.
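Derivation of (36)–(38). Assume that (35) and (i)–(iv) hold. Then
$$\mathbf{H}=\sum_{r=1}^{R}\operatorname{vec}(\mathbf{W}_r)\operatorname{vec}(\mathbf{W}_r)^H,\qquad\text{where }\mathbf{W}_r=\mathbf{x}\mathbf{g}_r^T\text{ for }r=2,\dots,R.$$
Hence
$$\sum_{k=1}^{I_3}\mathbf{H}_{kk}=\sum_{r=1}^{R}\mathbf{W}_r\mathbf{W}_r^H=\mathbf{W}_1\mathbf{W}_1^H+\sum_{r=2}^{R}\mathbf{x}\mathbf{g}_r^T\mathbf{g}_r^*\mathbf{x}^H=\mathbf{W}_1\mathbf{W}_1^H+\|\mathbf{G}\|_F^2\mathbf{x}\mathbf{x}^H,$$
which implies (36). By (ii), (iii), and the convention $\|\mathbf{x}\|=1$ in (i), the vectors $\operatorname{vec}(\mathbf{W}_r)$ are mutually orthogonal, which implies (37). Finally, by (21),
$$\Phi(\mathbf{H})=\sum_{r=1}^{R}\mathbf{W}_r^T\mathbf{W}_r^*=\mathbf{W}_1^T\mathbf{W}_1^*+\sum_{r=2}^{R}\mathbf{g}_r\mathbf{x}^T\mathbf{x}^*\mathbf{g}_r^H=\mathbf{W}_1^T\mathbf{W}_1^*+\mathbf{G}\mathbf{G}^H,$$
which implies (38).
Sufficiency. By (i) and (36),
$$\lambda_{\max}\Big(\sum_{k=1}^{I_3}\mathbf{H}_{kk}\Big)=\lambda_{\max}(\mathbf{W}_1\mathbf{W}_1^H)+\|\mathbf{G}\|_F^2.$$
By (iv) and (ii),
$$\|\mathbf{W}_1\|_F^2\ge\lambda_{\max}(\mathbf{W}_1^H\mathbf{W}_1)\ge\lambda_{\max}(\mathbf{G}^*\mathbf{G}^T)=\max_{2\le r\le R}\|\mathbf{g}_r\|^2.$$
Thus, by (37), $\lambda_{\max}(\mathbf{H})=\|\mathbf{W}_1\|_F^2$ and $\operatorname{tr}(\mathbf{H})=\|\mathbf{W}_1\|_F^2+\|\mathbf{G}\|_F^2$. By (iv) and (38), $\lambda_{\max}(\Phi(\mathbf{H}))=\lambda_{\max}(\mathbf{W}_1^H\mathbf{W}_1)$. Thus, the left- and right-hand sides of (17) are equal to $\lambda_{\max}(\mathbf{W}_1\mathbf{W}_1^H)+\|\mathbf{W}_1\|_F^2+\|\mathbf{G}\|_F^2$.
Necessity. It is clear that the equality in (17) holds if and only if it holds in (22) and (23). So we replace the inequality signs in (22) and (23) with an equality sign.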
From the first line of (23) it follows that x satisfies (i). By the Cauchy inequality, the equality
I3
X
k=1