``On the largest multilinear singular values of higher-order tensors’’


Citation/Reference Domanov I., Stegeman A., De Lathauwer L.,

``On the largest multilinear singular values of higher-order tensors’’

SIAM Journal on Matrix Analysis and Applications, vol. 38, No.4, 2017, 1434-1453.

Archived version Author manuscript: the content is identical to the content of the published paper, but without the final typesetting by the publisher

Published version https://doi.org/10.1137/16M110770X

Journal homepage http://epubs.siam.org/doi/abs/10.1137/16M110770X

Author contact ignat.domanov@kuleuven.be +32 56 24 64 92


IR url in Lirias https://lirias.kuleuven.be/handle/123456789/588422


ON THE LARGEST MULTILINEAR SINGULAR VALUES OF HIGHER-ORDER TENSORS∗

IGNAT DOMANOV†, ALWIN STEGEMAN†, AND LIEVEN DE LATHAUWER†

Abstract. Let $\sigma_n$ denote the largest mode-$n$ multilinear singular value of an $I_1\times\dots\times I_N$ tensor $\mathcal{T}$. We prove that

$$\sigma_1^2+\dots+\sigma_{n-1}^2+\sigma_{n+1}^2+\dots+\sigma_N^2\le (N-2)\|\mathcal{T}\|^2+\sigma_n^2,\qquad n=1,\dots,N.$$

We also show that, at least for third-order cubic tensors, the inverse problem always has a solution. Namely, for each $\sigma_1$, $\sigma_2$, and $\sigma_3$ that satisfy

$$\sigma_1^2+\sigma_2^2\le\|\mathcal{T}\|^2+\sigma_3^2,\qquad \sigma_1^2+\sigma_3^2\le\|\mathcal{T}\|^2+\sigma_2^2,\qquad \sigma_2^2+\sigma_3^2\le\|\mathcal{T}\|^2+\sigma_1^2,$$

and the trivial inequalities $\sigma_1\ge\frac{1}{\sqrt{n}}\|\mathcal{T}\|$, $\sigma_2\ge\frac{1}{\sqrt{n}}\|\mathcal{T}\|$, $\sigma_3\ge\frac{1}{\sqrt{n}}\|\mathcal{T}\|$, there always exists an $n\times n\times n$ tensor whose largest multilinear singular values are equal to $\sigma_1$, $\sigma_2$, and $\sigma_3$. We also show that if the equality $\sigma_1^2+\sigma_2^2=\|\mathcal{T}\|^2+\sigma_3^2$ holds, then $\mathcal{T}$ is necessarily equal to a sum of multilinear rank-$(L_1,1,L_1)$ and multilinear rank-$(1,L_2,L_2)$ tensors, and we give a complete description of all its multilinear singular values. We establish a connection with honeycombs and eigenvalues of the sum of two Hermitian matrices. This seems to give at least a partial explanation of why results on the joint distribution of multilinear singular values are scarce.

Key words. multilinear singular value decomposition, multilinear rank, singular value decomposition, tensor

AMS subject classifications. 15A69, 15A23

1. Introduction. Throughout the paper the superscripts $^T$, $^H$, and $^*$ denote transpose, Hermitian transpose, and conjugation, respectively. We also use the “empty sum/product” convention, i.e., if $m>n$, then $\sum_{m}^{n}(\cdot)=0$ and $\prod_{m}^{n}(\cdot)=1$.

Let $\mathcal{T}\in\mathbb{C}^{I_1\times\dots\times I_N}$. A mode-$n$ fiber of $\mathcal{T}$ is a column vector obtained by fixing the indices $i_1,\dots,i_{n-1},i_{n+1},\dots,i_N$. A matrix $T_{(n)}\in\mathbb{C}^{I_n\times I_1\cdots I_{n-1}I_{n+1}\cdots I_N}$ formed by all mode-$n$ fibers is called a mode-$n$ matrix unfolding (aka flattening or matricization) of $\mathcal{T}$. For notational convenience we assume that the columns of $T_{(n)}$ are ordered such that

$$\text{the }\Bigl(i_n,\ 1+\sum_{\substack{k=1\\ k\ne n}}^{N}(i_k-1)\prod_{\substack{l=1\\ l\ne n}}^{k-1}I_l\Bigr)\text{th entry of }T_{(n)}\;=\;\text{the }(i_1,\dots,i_N)\text{th entry of }\mathcal{T}.\tag{1}$$

For instance, if $N=3$, i.e., $\mathcal{T}\in\mathbb{C}^{I_1\times I_2\times I_3}$, then (1) implies that $T_{(1)}=[T_1\ \dots\ T_{I_3}]\in\mathbb{C}^{I_1\times I_2I_3}$,

∗Submitted to the editors DATE.

Funding: This work was funded by (1) Research Council KU Leuven: C1 project c16/15/059-nD; (2) F.W.O.: projects G.0830.14N, G.0881.14N; (3) the Belgian Federal Science Policy Office: IUAP P7 (DYSCO II, Dynamical systems, control and optimization, 2012-2017); (4) EU: The research leading to these results has received funding from the European Research Council under the European Union’s Seventh Framework Programme (FP7/2007-2013) / ERC Advanced Grant: BIOTENSORS (no. 339804). This paper reflects only the authors’ views and the Union is not liable for any use that may be made of the contained information.

† Group Science, Engineering and Technology, KU Leuven - Kulak, E. Sabbelaan 53, 8500 Kortrijk, Belgium and Dept. of Electrical Engineering ESAT/STADIUS KU Leuven, Kasteelpark Arenberg 10, bus 2446, B-3001 Leuven-Heverlee, Belgium (ignat.domanov@kuleuven.be, alwin.stegeman@kuleuven.be, lieven.delathauwer@kuleuven.be).


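$T_{(2)}=[T_1^T\ \dots\ T_{I_3}^T]\in\mathbb{C}^{I_2\times I_1I_3}$, $T_{(3)}=[\operatorname{vec}(T_1)\ \dots\ \operatorname{vec}(T_{I_3})]^T\in\mathbb{C}^{I_3\times I_1I_2}$, where $T_1,\dots,T_{I_3}\in\mathbb{C}^{I_1\times I_2}$ denote the frontal slices of $\mathcal{T}$.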

A tensor $\mathcal{T}\in\mathbb{C}^{I_1\times\dots\times I_N}$ is all-orthogonal if the matrices $T_{(1)}T_{(1)}^H,\dots,T_{(N)}T_{(N)}^H$ are diagonal. The MultiLinear (ML) Singular Value Decomposition (SVD) (aka Higher-Order SVD) is a factorization of $\mathcal{T}$ into the product of an all-orthogonal tensor $\mathcal{S}\in\mathbb{C}^{I_1\times\dots\times I_N}$ and $N$ unitary matrices $U_1\in\mathbb{C}^{I_1\times I_1},\dots,U_N\in\mathbb{C}^{I_N\times I_N}$,

$$\mathcal{T}=\mathcal{S}\cdot_1U_1\cdot_2U_2\cdots\cdot_NU_N,\tag{2}$$

where “$\cdot_n$” denotes the $n$-mode product of $\mathcal{S}$ and $U_n$. Rather than giving the formal definition of “$\cdot_n$”, for which we refer the reader to [3, 4, 13], we present $N$ equivalent matricized versions of (2):

$$T_{(n)}=U_nS_{(n)}\bigl(U_N\otimes\dots\otimes U_{n+1}\otimes U_{n-1}\otimes\dots\otimes U_1\bigr)^T,\qquad n=1,\dots,N,\tag{3}$$

where “$\otimes$” denotes the Kronecker product. For $N=2$, i.e., for $\mathcal{T}=T_1\in\mathbb{C}^{I_1\times I_2}$, the MLSVD reduces, up to trivial indeterminacies, to the classical SVD of a matrix, $T_{(1)}=T_1=USV^H$, where $U=U_1$, $S=S_{(1)}$, and $V=U_2^*\otimes 1$. It is known [4] that the MLSVD always exists and that its uniqueness properties are similar to those of the matrix SVD.
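The definitions in (1)–(3) can be sketched numerically. The snippet below is a minimal illustration, not the authors' code: it assumes NumPy, and the helper names `unfold`, `mode_product`, and `mlsvd` are hypothetical. It takes the factors $U_n$ from the left singular vectors of the unfoldings, which is one standard way to obtain an MLSVD, and then checks (2), (3) for $n=1$, and all-orthogonality of the core.

```python
import numpy as np

def unfold(T, n):
    """Mode-n unfolding (0-based n) with the column ordering of (1)."""
    return np.reshape(np.moveaxis(T, n, 0), (T.shape[n], -1), order="F")

def mode_product(S, U, n):
    """n-mode product: multiply every mode-n fiber of S by the matrix U."""
    return np.moveaxis(np.tensordot(U, S, axes=(1, n)), 0, n)

def mlsvd(T):
    """Return (S, [U_1, ..., U_N]) with T = S ._1 U_1 ... ._N U_N."""
    Us = [np.linalg.svd(unfold(T, n))[0] for n in range(T.ndim)]
    S = T
    for n, U in enumerate(Us):
        S = mode_product(S, U.conj().T, n)   # remove U_n from mode n
    return S, Us

rng = np.random.default_rng(0)
T = rng.standard_normal((2, 3, 4)) + 1j * rng.standard_normal((2, 3, 4))
S, Us = mlsvd(T)

# Rebuild T from the core and the unitary factors, as in (2).
T_rebuilt = S
for n, U in enumerate(Us):
    T_rebuilt = mode_product(T_rebuilt, U, n)

# Matricized version (3) for n = 1 and N = 3.
T1_from_3 = Us[0] @ unfold(S, 0) @ np.kron(Us[2], Us[1]).T

# Gram matrices S_(n) S_(n)^H; all-orthogonality means they are diagonal.
grams = [unfold(S, n) @ unfold(S, n).conj().T for n in range(T.ndim)]
```

The Fortran-order reshape is what makes the first remaining index vary fastest, matching the column index in (1); the singular values themselves do not depend on that ordering.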

The MLSVD has many applications in signal processing, data analysis, and machine learning (see, for instance, the overview papers [13, Subsection 4.4], [15]). Here we just mention that as Principal Component Analysis (PCA) can be done by SVD of a data matrix, MLPCA can be done by MLSVD of a data tensor [5, 14, 16].

The singular values of $T_{(n)}$ are called the mode-$n$ singular values of $\mathcal{T}$. Since $S_{(1)}S_{(1)}^H,\dots,S_{(N)}S_{(N)}^H$ are diagonal, it follows from (3) that the ML singular values of $\mathcal{T}$ coincide with the ML singular values of $\mathcal{S}$, which are just the Frobenius norms of the rows of $S_{(1)},\dots,S_{(N)}$. Throughout the paper, $\sigma_n$ denotes the largest singular value of $T_{(n)}$.

In the matrix case, i.e., for $N=2$, the description of the MLSVD is trivial. Indeed, the singular values of $T_{(1)}=T_1$ and $T_{(2)}=T_1^T$ coincide and $T_{(3)}=\operatorname{vec}(T_1)^T$ has a single singular value $\|\mathcal{T}\|$. Thus, the singular values of $T_{(1)}$ completely define the singular values of $T_{(2)}$ and $T_{(3)}$. In particular, the set of triplets $(\sigma_1,\sigma_2,\sigma_3)$ coincides with the set $\{(x,x,y):\ y\ge x\ge 0\}\subset\mathbb{R}^3$, whose Lebesgue measure is zero. The situation for tensors is much more complicated. It is clear that in the general case $N\ge 2$, the sets of the mode-1, …, mode-$N$ singular values are not independent either.

The study of topological properties of the set of ML singular values of real tensors has been initiated only recently in [8] and [7]. In particular, it has been shown in [8] and [7] that, as in the matrix case, some configurations of ML singular values are not possible but, nevertheless, at least for $n\times\dots\times n$ tensors the set of ML singular values has a positive Lebesgue measure.

In this paper we study possible configurations for the largest ML singular values, i.e., for $\sigma_1,\dots,\sigma_N$. Our results are valid for real and complex tensors. The following theorem presents simple necessary conditions for $\sigma_1$, $\sigma_2$, and $\sigma_3$ to be the largest ML singular values of a third-order tensor. For instance, it implies that a norm-1 tensor whose largest ML singular values are equal to 0.9, 0.9, and 0.7 does not exist (indeed, $0.9^2+0.9^2=1.62>1.49=1+0.7^2$).

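Theorem 1. Let $\sigma_1$, $\sigma_2$, and $\sigma_3$ denote the largest ML singular values of an $I_1\times I_2\times I_3$ tensor $\mathcal{T}$. Then

$$\sigma_1^2+\sigma_2^2\le\|\mathcal{T}\|^2+\sigma_3^2,\qquad \sigma_1^2+\sigma_3^2\le\|\mathcal{T}\|^2+\sigma_2^2,\qquad \sigma_2^2+\sigma_3^2\le\|\mathcal{T}\|^2+\sigma_1^2,\tag{4}$$

$$\sigma_1\ge\frac{1}{\sqrt{I_1}}\|\mathcal{T}\|,\qquad \sigma_2\ge\frac{1}{\sqrt{I_2}}\|\mathcal{T}\|,\qquad \sigma_3\ge\frac{1}{\sqrt{I_3}}\|\mathcal{T}\|.\tag{5}$$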
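As a quick sanity check of Theorem 1, the sketch below evaluates (4)–(5) on random complex tensors and on the triplet (0.9, 0.9, 0.7) mentioned above. It assumes NumPy; the function names and the tolerance are illustrative choices, not part of the paper.

```python
import numpy as np

def largest_ml_singular_values(T):
    """Largest singular value of every mode-n unfolding of T."""
    return np.array([
        np.linalg.norm(np.reshape(np.moveaxis(T, n, 0), (T.shape[n], -1)), 2)
        for n in range(T.ndim)
    ])

def satisfies_4_and_5(sigma, dims, norm=1.0, tol=1e-9):
    """Check the necessary conditions (4)-(5) for a candidate triplet."""
    s2 = np.asarray(sigma, dtype=float) ** 2
    # (4): the sum of the two other squares is at most norm^2 + sigma_n^2.
    ineq4 = all(s2.sum() - 2 * s2[n] <= norm**2 + tol for n in range(3))
    # (5): sigma_n^2 >= norm^2 / I_n.
    ineq5 = all(s2[n] >= norm**2 / dims[n] - tol for n in range(3))
    return ineq4 and ineq5

rng = np.random.default_rng(0)
random_ok = True
for _ in range(100):
    T = rng.standard_normal((2, 3, 4)) + 1j * rng.standard_normal((2, 3, 4))
    sigma = largest_ml_singular_values(T)
    random_ok = random_ok and satisfies_4_and_5(sigma, T.shape, np.linalg.norm(T))

# The triplet from the introduction, for a norm-1 tensor of size 3 x 3 x 3.
infeasible_example = satisfies_4_and_5((0.9, 0.9, 0.7), (3, 3, 3))
```

The column ordering of the unfolding is irrelevant here, because permuting columns does not change singular values.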

Figure 1 shows four typical shapes of the set $\{(\sigma_1^2,\sigma_2^2,\sigma_3^2):\ \sigma_1,\sigma_2,\sigma_3\text{ satisfy (4)–(5)}\}$ (WLOG, we assumed that $I_1\le I_2\le I_3$).

[Figure 1: four three-dimensional plots with axes $\sigma_1^2$, $\sigma_2^2$, $\sigma_3^2$ and origin $O$. Panels: (a) $I_1<I_2<I_3$; (b) $I_1=I_2<I_3$; (c) $I_1<I_2=I_3$; (d) $I_1=I_2=I_3=n$.]

Fig. 1. The typical shapes of the set $\{(\sigma_1^2,\sigma_2^2,\sigma_3^2):\ \sigma_1,\sigma_2,\sigma_3\text{ satisfy (4)–(5)}\}$ for $I_1\le I_2\le I_3$ (drawn for $I_1=2$, $I_2=3$, $I_3=5$ and $\|\mathcal{T}\|=1$). Plot (a) is the case where all dimensions of a tensor are distinct. The points $S$, $X_1$, $X_2$, $Y_1$, $Y_2$, $Z_1$, $Z_2$, and $N$ have coordinates $(\frac{1}{I_1},\frac{1}{I_2},\frac{1}{I_3})$, $(1-\frac{1}{I_2}+\frac{1}{I_3},\frac{1}{I_2},\frac{1}{I_3})$, $(1,\frac{1}{I_2},\frac{1}{I_2})$, $(\frac{1}{I_1},1-\frac{1}{I_1}+\frac{1}{I_3},\frac{1}{I_3})$, $(\frac{1}{I_1},1,\frac{1}{I_1})$, $(\frac{1}{I_1},\frac{1}{I_2},1-\frac{1}{I_1}+\frac{1}{I_2})$, $(\frac{1}{I_1},\frac{1}{I_1},1)$, and $(1,1,1)$, respectively. Plots (b)–(c) are the cases where a tensor has exactly two equal dimensions: the points $Z_1$ and $Z_2$ were merged into one point $Z$ and the points $X_1$ and $X_2$ were merged into one point $X$. Plot (d) is the case where all three dimensions of a tensor are equal to each other, $I_1=I_2=I_3=n$. In this case, the points $Y_1$ and $Y_2$ were merged into one point $Y$, so $S$, $X$, $Y$, and $Z$ have the coordinates $(\frac1n,\frac1n,\frac1n)$, $(1,\frac1n,\frac1n)$, $(\frac1n,1,\frac1n)$, and $(\frac1n,\frac1n,1)$, respectively. By Corollary 3, any point $(\sigma_1^2,\sigma_2^2,\sigma_3^2)$ of the polyhedron $SXYZN$ in plot (d) is feasible, i.e., there exists a norm-1 tensor $\mathcal{T}\in\mathbb{C}^{n\times n\times n}$ whose squared largest multilinear singular values are $\sigma_1^2$, $\sigma_2^2$, and $\sigma_3^2$. The volume of $SXYZN$ equals half of the volume of the cube, i.e., $\frac12\bigl(1-\frac1n\bigr)^3$.

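One can easily verify that if $\sigma_1$, $\sigma_2$, and $\sigma_3$ satisfy (4)–(5) for $I_1=I_2=I_3=2$ and $\|\mathcal{T}\|=1$, then $\sigma_1$, $\sigma_2$, and $\sigma_3$ are the largest ML singular values of the $2\times2\times2$ tensor $\mathcal{T}$ with mode-1 matrix unfolding

$$T_{(1)}=[T_1\ T_2]=\begin{bmatrix}\sqrt{\dfrac{\sigma_1^2+\sigma_2^2+\sigma_3^2-1}{2}} & 0 & 0 & \sqrt{\dfrac{1+\sigma_1^2-\sigma_2^2-\sigma_3^2}{2}}\\[2mm] 0 & \sqrt{\dfrac{1+\sigma_3^2-\sigma_1^2-\sigma_2^2}{2}} & \sqrt{\dfrac{1+\sigma_2^2-\sigma_1^2-\sigma_3^2}{2}} & 0\end{bmatrix}.$$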

The proof of the following result relies on a similar explicit construction of an $I_1\times I_2\times I_3$ tensor $\mathcal{T}$.
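The explicit $2\times2\times2$ tensor with unfolding $T_{(1)}=[T_1\ T_2]$ given above can be checked directly. The sketch below assumes NumPy; the triplet (0.9, 0.8, 0.75) is a hypothetical example that satisfies (4)–(5) for $I_1=I_2=I_3=2$ and $\|\mathcal{T}\|=1$, and the function names are illustrative.

```python
import numpy as np

def tensor_2x2x2(s1, s2, s3):
    """2 x 2 x 2 tensor whose frontal slices T_1, T_2 form T_(1) = [T_1 T_2]."""
    q1, q2, q3 = s1**2, s2**2, s3**2
    T = np.zeros((2, 2, 2))
    T[0, 0, 0] = np.sqrt((q1 + q2 + q3 - 1) / 2)   # entry (1,1) of T_1
    T[1, 1, 0] = np.sqrt((1 + q3 - q1 - q2) / 2)   # entry (2,2) of T_1
    T[0, 1, 1] = np.sqrt((1 + q1 - q2 - q3) / 2)   # entry (1,2) of T_2
    T[1, 0, 1] = np.sqrt((1 + q2 - q1 - q3) / 2)   # entry (2,1) of T_2
    return T

def largest_ml_singular_values(T):
    """Largest singular value of every mode-n unfolding of T."""
    return np.array([
        np.linalg.norm(np.reshape(np.moveaxis(T, n, 0), (T.shape[n], -1)), 2)
        for n in range(T.ndim)
    ])

sigma = (0.9, 0.8, 0.75)            # hypothetical feasible triplet
T = tensor_2x2x2(*sigma)
norm_T = np.linalg.norm(T)          # Frobenius norm of the tensor
sigma_T = largest_ml_singular_values(T)
```

Each unfolding of this tensor has rows with disjoint supports, so the mode-$n$ singular values are simply the row norms, and the larger one equals $\sigma_n$ because $\sigma_n^2\ge\frac12$.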

Theorem 2. Let $I_1\le I_2\le I_3$ and let $\sigma_1$, $\sigma_2$, $\sigma_3$ satisfy (4) and the following three inequalities:

$$\sigma_1\ge\frac{1}{\sqrt{I_1}}\|\mathcal{T}\|,\tag{6}$$

$$(I_2-I_1)\sigma_1^2+(I_1I_2-I_2)\sigma_3^2+(1-I_2)\|\mathcal{T}\|^2\ge0,\tag{7}$$

$$(I_2-I_1)\sigma_1^2+(I_1I_2-I_2)\sigma_2^2+(1-I_2)\|\mathcal{T}\|^2\ge0.\tag{8}$$

Then there exists an $I_1\times I_2\times I_3$ tensor $\mathcal{T}$ such that

1. all entries of $\mathcal{T}$ are non-negative;

2. $\mathcal{T}$ is all-orthogonal;

3. the largest ML singular values of $\mathcal{T}$ are equal to $\sigma_1$, $\sigma_2$, and $\sigma_3$.

Conditions (5) and (6)–(8) mean that the point $(\sigma_1^2,\sigma_2^2,\sigma_3^2)$ belongs to the trihedral angle $SX_1Y_1Z_1$ and $S_2X_2Y_2Z_2$, respectively, where $S_2$ has coordinates $(\frac{1}{I_1},\frac{1}{I_1},\frac{1}{I_1})$.

The gap between the necessary conditions in Theorem 1 and the sufficient conditions in Theorem 2, i.e., the set

$$\{(\sigma_1^2,\sigma_2^2,\sigma_3^2):\ \text{(4)–(5) hold and at least one of (6)–(8) does not hold}\},\tag{9}$$

is shown in Figure 2c. One can easily verify that the gap is empty only for $I_1=I_2=I_3$.

Corollary 3. Let $\sigma_1$, $\sigma_2$, and $\sigma_3$ satisfy (4)–(5) for $I_1=I_2=I_3=n\ge2$. Then there exists an $n\times n\times n$ tensor $\mathcal{T}$ such that

1. all entries of $\mathcal{T}$ are non-negative;

2. $\mathcal{T}$ is all-orthogonal;

3. the largest ML singular values of $\mathcal{T}$ are equal to $\sigma_1$, $\sigma_2$, and $\sigma_3$.

Thus, the conditions in Theorem 1 are not only necessary but also sufficient for $\sigma_1$, $\sigma_2$, and $\sigma_3$ to be feasible largest ML singular values of a cubic third-order tensor.

Figure 1d shows the set of feasible triplets $(\sigma_1^2,\sigma_2^2,\sigma_3^2)$ of an $n\times n\times n$ tensor.

We do not have a complete view on the feasibility of points in (9). In Section 3 we obtain particular results on the (non)feasibility of the points $S(\frac{1}{I_1},\frac{1}{I_2},\frac{1}{I_3})$, $X_1(1-\frac{1}{I_2}+\frac{1}{I_3},\frac{1}{I_2},\frac{1}{I_3})$, and $Y_1(\frac{1}{I_1},1-\frac{1}{I_1}+\frac{1}{I_3},\frac{1}{I_3})$. Namely, we show that if $I_1<I_2$ and $I_3=I_1I_2-1$, then the point $S$ is not feasible, and if $I_3=I_1I_2$, then the point $S$ is feasible but the points $X_1$ and $Y_1$ are not.

It is worth mentioning a link with scaled all-orthonormal tensors introduced recently in [6]. A tensor $\mathcal{T}\in\mathbb{C}^{I_1\times\dots\times I_N}$ is scaled all-orthonormal [6, Definition 2] if at least $N-1$ of the $N$ matrices $T_{(1)}T_{(1)}^H,\dots,T_{(N)}T_{(N)}^H$ are multiples of the identity matrix. It is clear that if the largest mode-$n$ singular value of a norm-1 tensor is $\frac{1}{\sqrt{I_n}}$, then all mode-$n$ singular values are also $\frac{1}{\sqrt{I_n}}$. Thus, feasibility of a point belonging to the segment $SX_1$ (resp. $SY_1$ or $SZ_1$) is equivalent to the existence of a norm-1 $I_1\times I_2\times I_3$ tensor $\mathcal{T}$ such that $T_{(2)}T_{(2)}^H=\frac{1}{I_2}I_{I_2}$, $T_{(3)}T_{(3)}^H=\frac{1}{I_3}I_{I_3}$ (resp. $T_{(1)}T_{(1)}^H=\frac{1}{I_1}I_{I_1}$, $T_{(3)}T_{(3)}^H=\frac{1}{I_3}I_{I_3}$ or $T_{(1)}T_{(1)}^H=\frac{1}{I_1}I_{I_1}$, $T_{(2)}T_{(2)}^H=\frac{1}{I_2}I_{I_2}$), i.e., to the existence of a scaled all-orthonormal tensor $\mathcal{T}$.

[Figure 2: three three-dimensional plots with axes $\sigma_1^2$, $\sigma_2^2$, $\sigma_3^2$ and origin $O$. Panels: (a) $\sigma_1,\sigma_2,\sigma_3$ satisfy (4)–(5); (b) $\sigma_1,\sigma_2,\sigma_3$ satisfy (4) and (6)–(8); (c) the set in eq. (9).]

Fig. 2. Gap between the necessary conditions in Theorem 1 and the sufficient conditions in Theorem 2 for $I_1<I_2<I_3$ (drawn for $I_1=2$, $I_2=5$, $I_3=7$ and $\|\mathcal{T}\|=1$). The point $S_2$ has coordinates $(\frac{1}{I_1},\frac{1}{I_1},\frac{1}{I_1})$. The set in plot (c) is the difference of the set in plot (a) and the set in plot (b).

The following result generalizes Theorem 1 to $N$th-order tensors.

Theorem 4. Let $\sigma_1,\dots,\sigma_N$ denote the largest ML singular values of an $I_1\times\dots\times I_N$ tensor $\mathcal{T}$. Then

$$\sigma_1^2+\dots+\sigma_{n-1}^2+\sigma_{n+1}^2+\dots+\sigma_N^2\le(N-2)\|\mathcal{T}\|^2+\sigma_n^2,\qquad n=1,\dots,N,\tag{10}$$

$$\sigma_1\ge\frac{1}{\sqrt{I_1}}\|\mathcal{T}\|,\ \dots,\ \sigma_N\ge\frac{1}{\sqrt{I_N}}\|\mathcal{T}\|.\tag{11}$$

Theorems 1, 2, and 4 are proved in Section 2.
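The inequalities (10)–(11) of Theorem 4 can be spot-checked on a random fourth-order tensor. This is only an illustrative sketch assuming NumPy; the tensor size and the tolerance are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(1)
T = rng.standard_normal((2, 3, 2, 4))
N = T.ndim
norm2 = np.sum(T**2)                      # squared Frobenius norm of T

# Squared largest mode-n singular values, n = 1, ..., N.
sigma2 = np.array([
    np.linalg.norm(np.reshape(np.moveaxis(T, n, 0), (T.shape[n], -1)), 2) ** 2
    for n in range(N)
])

# (10): entry n of lhs is the sum of sigma_m^2 over all m != n.
lhs = sigma2.sum() - sigma2
rhs = (N - 2) * norm2 + sigma2
holds_10 = bool(np.all(lhs <= rhs + 1e-9))

# (11): sigma_n^2 >= ||T||^2 / I_n.
holds_11 = bool(np.all(sigma2 >= norm2 / np.array(T.shape) - 1e-9))
```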

It is natural to ask what happens if some inequalities in (4) are replaced by equalities. Obviously, the three equalities in (4) hold if and only if $\sigma_1=\sigma_2=\sigma_3=\|\mathcal{T}\|$, implying that $T_{(1)}$, $T_{(2)}$, and $T_{(3)}$ are rank-1 matrices. Hence all the remaining ML singular values of $\mathcal{T}$ are zero. Similarly, the two equalities $\sigma_1^2+\sigma_2^2=\|\mathcal{T}\|^2+\sigma_3^2$ and $\sigma_1^2+\sigma_3^2=\|\mathcal{T}\|^2+\sigma_2^2$ are equivalent to $\sigma_1=\|\mathcal{T}\|$ and $\sigma_2=\sigma_3$, implying that $\operatorname{rank}(T_{(1)})=1$ and $\operatorname{rank}(T_{(2)})=\operatorname{rank}(T_{(3)})=:L$, i.e., $\mathcal{T}$ is an ML rank-$(1,L,L)$ tensor, where $L\le\min(I_2,I_3)$. It is clear that in this case the remaining nonzero mode-2 and mode-3 singular values of $\mathcal{T}$ also coincide and may take any positive values whose squares sum up to $\|\mathcal{T}\|^2-\sigma_2^2$. In Section 4 we characterize the tensors $\mathcal{T}$ for which the single equality $\sigma_1^2+\sigma_2^2=\|\mathcal{T}\|^2+\sigma_3^2$ holds. We show that $\mathcal{T}$ is necessarily equal to a sum of ML rank-$(L_1,1,L_1)$ and ML rank-$(1,L_2,L_2)$ tensors and give a complete description of all its ML singular values. The description relies on a problem posed by H. Weyl in 1912: given the eigenvalues of two $n\times n$ Hermitian matrices $A$ and $B$, what are all the possible eigenvalues of $A+B$? The following answer was conjectured by A. Horn in 1962 [9] and has been proved through the development of the theory of honeycombs in [10, 11] (see also [2, 12]). Let $\lambda_i(\cdot)$ denote the $i$th largest eigenvalue of a Hermitian matrix.

If

$$\alpha_i=\lambda_i(A),\qquad \beta_i=\lambda_i(B),\qquad \gamma_i=\lambda_i(A+B),\tag{12}$$

then $\alpha_i$, $\beta_i$, and $\gamma_i$ satisfy the trivial equality

$$\gamma_1+\dots+\gamma_n=\alpha_1+\dots+\alpha_n+\beta_1+\dots+\beta_n\tag{13}$$

and the list of linear inequalities

$$\sum_{k\in K}\gamma_k\le\sum_{i\in I}\alpha_i+\sum_{j\in J}\beta_j,\qquad (I,J,K)\in T_r^n,\quad 1\le r\le n-1,\tag{14}$$

where $I=\{i_1,\dots,i_r\}$, $J=\{j_1,\dots,j_r\}$, $K=\{k_1,\dots,k_r\}$ are subsets of $\{1,\dots,n\}$ and $T_r^n$ denotes a particular finite set of triplets $(I,J,K)$. (The construction of $T_r^n$ is given in Appendix A.) The inverse statement also holds: if $\alpha_i$, $\beta_i$, and $\gamma_i$ satisfy (13) and (14), then there exist $n\times n$ Hermitian matrices $A$, $B$, and $C=A+B$ such that (12) holds.

We have the following results.

Theorem 5. Let $\sigma_1^2+\sigma_2^2=\|\mathcal{T}\|^2+\sigma_3^2$. Then $\mathcal{T}$ is a sum of ML rank-$(L_1,1,L_1)$ and ML rank-$(1,L_2,L_2)$ tensors, where $L_1\le\min(I_1,I_3)$ and $L_2\le\min(I_2-1,I_3)$.

Theorem 6. Let $\sigma_1^2+\sigma_2^2=\|\mathcal{T}\|^2+\sigma_3^2$. Then the values

$$\sigma_1=\sigma_{11}\ge\sigma_{12}\ge\dots\ge\sigma_{1I_1}\ge0,\qquad \sigma_2=\sigma_{21}\ge\sigma_{22}\ge\dots\ge\sigma_{2I_2}\ge0,\qquad \sigma_3=\sigma_{31}\ge\sigma_{32}\ge\dots\ge\sigma_{3I_3}\ge0$$

are the mode-1, mode-2, and mode-3 singular values of an $I_1\times I_2\times I_3$ tensor $\mathcal{T}$, respectively, if and only if

$$\sigma_{11}^2+\dots+\sigma_{1I_1}^2=\sigma_{21}^2+\dots+\sigma_{2I_2}^2=\sigma_{31}^2+\dots+\sigma_{3I_3}^2=\|\mathcal{T}\|^2,$$

$\sigma_{1i}=0$ for $i>\min(I_1,I_3)$, $\sigma_{2i}=0$ for $i>\min(I_2,I_3)$, and (13) and (14) hold for

$$\alpha_i=\begin{cases}\sigma_{1,i+1}^2,& i\le\min(I_1,I_3),\\ 0,&\text{otherwise},\end{cases}\qquad \beta_i=\begin{cases}\sigma_{2,i+1}^2,& i\le\min(I_2,I_3),\\ 0,&\text{otherwise},\end{cases}\qquad \gamma_i=\sigma_{3,i+1}^2,\tag{15}$$

and $n=I_3-1$.

Example 7. If $n=2$, then $T_1^2=\{(i,j,k):\ k=i+j-1,\ 1\le i,j,k\le2\}=\{(1,1,1),(1,2,2),(2,1,2)\}$ (see Appendix A). By Horn’s conjecture, the equality $\gamma_1+\gamma_2=\alpha_1+\alpha_2+\beta_1+\beta_2$ together with the inequalities (also known as the Weyl inequalities)

$$\gamma_1\le\alpha_1+\beta_1,\qquad \gamma_2\le\alpha_1+\beta_2,\qquad \gamma_2\le\alpha_2+\beta_1,\tag{16}$$

characterize the values $\alpha_1,\alpha_2,\beta_1,\beta_2,\gamma_1,\gamma_2$ that can be eigenvalues of $2\times2$ Hermitian matrices $A$, $B$, and $A+B$. Let $\sigma_{11}^2+\sigma_{21}^2=\|\mathcal{T}\|^2+\sigma_{31}^2$. From Theorem 6 and (16) it follows that the values $\sigma_{11}\ge\sigma_{12}\ge\sigma_{13}\ge0$, $\sigma_{21}\ge\sigma_{22}\ge\sigma_{23}\ge0$, and $\sigma_{31}\ge\sigma_{32}\ge\sigma_{33}\ge0$ are the mode-1, mode-2, and mode-3 singular values, respectively, of a $3\times3\times3$ tensor $\mathcal{T}$ if and only if

$$\sigma_{11}^2+\sigma_{12}^2+\sigma_{13}^2=\sigma_{21}^2+\sigma_{22}^2+\sigma_{23}^2=\sigma_{31}^2+\sigma_{32}^2+\sigma_{33}^2=\|\mathcal{T}\|^2,$$

$$\sigma_{32}^2\le\sigma_{12}^2+\sigma_{22}^2,\qquad \sigma_{33}^2\le\sigma_{12}^2+\sigma_{23}^2,\qquad \sigma_{33}^2\le\sigma_{13}^2+\sigma_{22}^2.$$
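The necessity direction of (13) and (16) for $n=2$ is easy to observe numerically. The following sketch, which assumes NumPy and uses hypothetical helper names, draws random $2\times2$ Hermitian matrices and checks the trace equality and the three Weyl inequalities; it does not address the converse statement.

```python
import numpy as np

def eigs_desc(M):
    """Eigenvalues of a Hermitian matrix, largest first (lambda_1 >= lambda_2 ...)."""
    return np.linalg.eigvalsh(M)[::-1]

def random_hermitian(n, rng):
    X = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return (X + X.conj().T) / 2

def check_13_and_16(A, B, tol=1e-10):
    al, be, ga = eigs_desc(A), eigs_desc(B), eigs_desc(A + B)
    trace_eq = abs(ga.sum() - al.sum() - be.sum()) <= tol       # (13)
    weyl = (ga[0] <= al[0] + be[0] + tol and                    # (16)
            ga[1] <= al[0] + be[1] + tol and
            ga[1] <= al[1] + be[0] + tol)
    return bool(trace_eq and weyl)

rng = np.random.default_rng(2)
all_ok = all(
    check_13_and_16(random_hermitian(2, rng), random_hermitian(2, rng))
    for _ in range(200)
)
```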

2. Proofs of Theorems 1, 2, and 4. The following lemma will be used in the proof of Theorem 1.

Lemma 8. Let $H=(H_{ij})_{i,j=1}^{I_3}\in\mathbb{C}^{I_3I_1\times I_3I_1}$ be a positive semidefinite matrix consisting of the blocks $H_{ij}\in\mathbb{C}^{I_1\times I_1}$. Then

$$\lambda_{\max}(H_{11}+\dots+H_{I_3I_3})+\lambda_{\max}(H)\le\operatorname{tr}(H)+\lambda_{\max}(\Phi(H)),\tag{17}$$

where $\Phi(H)$ denotes the $I_3\times I_3$ matrix with the entries $(\Phi(H))_{ij}=\operatorname{tr}(H_{ij})$ and $\lambda_{\max}(\cdot)$ denotes the largest eigenvalue of a matrix.
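Inequality (17) can be tested on random positive semidefinite block matrices. The sketch below assumes NumPy; `lemma8_sides` is a hypothetical helper that returns the left- and right-hand sides of (17), using a reshape to address the blocks $H_{ij}$.

```python
import numpy as np

def lemma8_sides(H, I1, I3):
    """Left- and right-hand sides of (17) for an (I3*I1) x (I3*I1) PSD matrix H."""
    blocks = H.reshape(I3, I1, I3, I1)              # blocks[i, :, j, :] is H_ij
    diag_sum = sum(blocks[k, :, k, :] for k in range(I3))
    Phi = np.trace(blocks, axis1=1, axis2=3)        # Phi[i, j] = tr(H_ij)
    lam_max = lambda M: np.linalg.eigvalsh(M)[-1]   # largest eigenvalue
    lhs = lam_max(diag_sum) + lam_max(H)
    rhs = np.real(np.trace(H)) + lam_max(Phi)
    return lhs, rhs

rng = np.random.default_rng(3)
I1, I3, R = 3, 4, 5
results = []
for _ in range(100):
    X = rng.standard_normal((I3 * I1, R)) + 1j * rng.standard_normal((I3 * I1, R))
    H = X @ X.conj().T                              # random PSD matrix of rank <= R
    results.append(lemma8_sides(H, I1, I3))

all_hold = all(lhs <= rhs + 1e-9 for lhs, rhs in results)
```

With NumPy's default row-major layout, row index $iI_1+a$ of `H` maps to block $i$ and position $a$ inside the block, which is why a plain `reshape` exposes the blocks.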

Proof. Let $H=\sum_{r=1}^{R}w_rw_r^H$, where the $w_r$ are orthogonal and $w_r=[w_{1r}^T\ \dots\ w_{I_3r}^T]^T$ with $w_{kr}\in\mathbb{C}^{I_1}$. First, we rewrite (17) in terms of $w_{kr}$, $1\le k\le I_3$, $1\le r\le R$.

WLOG, we can assume that $\|w_1\|=\max_r\|w_r\|$. Hence,

$$\lambda_{\max}(H)=\|w_1\|^2=\sum_{k=1}^{I_3}\|w_{k1}\|^2.\tag{18}$$

It is clear that

$$H_{ij}=\sum_{r=1}^{R}w_{ir}w_{jr}^H,\qquad 1\le i,j\le I_3.$$

Hence

$$\lambda_{\max}(H_{11}+\dots+H_{I_3I_3})=\max_{\|x\|=1}\sum_{k=1}^{I_3}(H_{kk}x,x)=\max_{\|x\|=1}\sum_{k=1}^{I_3}\sum_{r=1}^{R}|(w_{kr},x)|^2.\tag{19}$$

Since $H=\sum_{r=1}^{R}w_rw_r^H$, it follows that

$$\operatorname{tr}(H)=\sum_{r=1}^{R}\|w_r\|^2.\tag{20}$$

Since

$$\Phi(H)_{ij}=\operatorname{tr}(H_{ij})=\operatorname{tr}\Bigl(\sum_{r=1}^{R}w_{ir}w_{jr}^H\Bigr)=\sum_{r=1}^{R}w_{jr}^Hw_{ir}=\sum_{r=1}^{R}w_{ir}^Tw_{jr}^*,$$

it follows that

$$\Phi(H)=\sum_{r=1}^{R}\begin{bmatrix}w_{1r}^Tw_{1r}^*&\dots&w_{1r}^Tw_{I_3r}^*\\ \vdots&\ddots&\vdots\\ w_{I_3r}^Tw_{1r}^*&\dots&w_{I_3r}^Tw_{I_3r}^*\end{bmatrix}=\sum_{r=1}^{R}\begin{bmatrix}w_{1r}^T\\ \vdots\\ w_{I_3r}^T\end{bmatrix}\begin{bmatrix}w_{1r}^*&\dots&w_{I_3r}^*\end{bmatrix}=\sum_{r=1}^{R}W_r^TW_r^*,\tag{21}$$

where

$$W_r:=[w_{1r}\ \dots\ w_{I_3r}]\in\mathbb{C}^{I_1\times I_3}.$$

Now we prove (17). By (18), (19), the Cauchy inequality, and (20),

$$\begin{aligned}\lambda_{\max}(H)+\lambda_{\max}(H_{11}+\dots+H_{I_3I_3})&=\|w_1\|^2+\max_{\|x\|=1}\Bigl[\sum_{k=1}^{I_3}|(w_{k1},x)|^2+\sum_{k=1}^{I_3}\sum_{r=2}^{R}|(w_{kr},x)|^2\Bigr]\\&\le\|w_1\|^2+\max_{\|x\|=1}\Bigl[\sum_{k=1}^{I_3}|(w_{k1},x)|^2\Bigr]+\sum_{r=2}^{R}\|w_r\|^2\\&=\operatorname{tr}(H)+\max_{\|x\|=1}\Bigl[\sum_{k=1}^{I_3}|(w_{k1},x)|^2\Bigr].\end{aligned}\tag{22}$$

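To complete the proof of (17) we should show that

$$\max_{\|x\|=1}\Bigl[\sum_{k=1}^{I_3}|(w_{k1},x)|^2\Bigr]\le\lambda_{\max}(\Phi(H)).$$

This can be done as follows:

$$\begin{aligned}\max_{\|x\|=1}\Bigl[\sum_{k=1}^{I_3}|(w_{k1},x)|^2\Bigr]&=\max_{\|x\|=1}\Bigl[\sum_{k=1}^{I_3}x^Hw_{k1}w_{k1}^Hx\Bigr]=\lambda_{\max}\bigl(W_1W_1^H\bigr)=\lambda_{\max}\bigl(W_1^HW_1\bigr)\\&\le\lambda_{\max}\Bigl(\sum_{r=1}^{R}W_r^HW_r\Bigr)=\lambda_{\max}(\Phi(H)^*)=\lambda_{\max}(\Phi(H)).\end{aligned}\tag{23}$$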

Now we are ready to prove Theorem 1.


Proof of Theorem 1. The three inequalities in (5) are obvious. We prove that $\sigma_1^2+\sigma_2^2\le\|\mathcal{T}\|^2+\sigma_3^2$. The proofs of the inequalities $\sigma_1^2+\sigma_3^2\le\|\mathcal{T}\|^2+\sigma_2^2$ and $\sigma_2^2+\sigma_3^2\le\|\mathcal{T}\|^2+\sigma_1^2$ can be obtained in a similar way.

By definition of ML singular values,

$$\sigma_1^2=\lambda_{\max}(T_{(1)}T_{(1)}^H)=\lambda_{\max}(T_1T_1^H+\dots+T_{I_3}T_{I_3}^H),\qquad \sigma_2^2=\lambda_{\max}(T_{(2)}^HT_{(2)})=\lambda_{\max}(T_{(2)}^TT_{(2)}^*)=\lambda_{\max}(H),$$

where

$$H=T_{(2)}^TT_{(2)}^*=\begin{bmatrix}T_1T_1^H&\dots&T_1T_{I_3}^H\\ \vdots&\ddots&\vdots\\ T_{I_3}T_1^H&\dots&T_{I_3}T_{I_3}^H\end{bmatrix}.$$

Since $\operatorname{vec}(T_i)^T(\operatorname{vec}(T_j)^T)^H=\operatorname{tr}(T_iT_j^H)$, it follows that $\sigma_3^2=\lambda_{\max}(T_{(3)}T_{(3)}^H)=\lambda_{\max}(\Phi(H))$, where

$$\Phi(H)=\begin{bmatrix}\operatorname{tr}(T_1T_1^H)&\dots&\operatorname{tr}(T_1T_{I_3}^H)\\ \vdots&\ddots&\vdots\\ \operatorname{tr}(T_{I_3}T_1^H)&\dots&\operatorname{tr}(T_{I_3}T_{I_3}^H)\end{bmatrix}.$$

Since $\|\mathcal{T}\|^2=\operatorname{tr}(H)$, the inequality $\sigma_1^2+\sigma_2^2\le\|\mathcal{T}\|^2+\sigma_3^2$ is equivalent to

$$\lambda_{\max}(T_1T_1^H+\dots+T_{I_3}T_{I_3}^H)+\lambda_{\max}(H)\le\operatorname{tr}(H)+\lambda_{\max}(\Phi(H)),$$

which holds by Lemma 8.

Proof of Theorem 2. The proof consists of three steps. In the first step we construct all-orthogonal and non-negative $I_1\times I_2\times I_3$ tensors $\mathcal{S}_2$, $\mathcal{X}_2$, $\mathcal{Y}_2$, $\mathcal{Z}_2$, and $\mathcal{N}$ whose squared largest ML singular values are the coordinates of $S_2(\frac{1}{I_1},\frac{1}{I_1},\frac{1}{I_1})$, $X_2(1,\frac{1}{I_2},\frac{1}{I_2})$, $Y_2(\frac{1}{I_1},1,\frac{1}{I_1})$, $Z_2(\frac{1}{I_1},\frac{1}{I_1},1)$, and $N(1,1,1)$, respectively (see Figure 2b). Then we show that because of the zero patterns of $\mathcal{S}_2$, $\mathcal{X}_2$, $\mathcal{Y}_2$, $\mathcal{Z}_2$, and $\mathcal{N}$, the tensor

$$\mathcal{T}=\bigl(t_{S_2}\mathcal{S}_2^{2}+t_{X_2}\mathcal{X}_2^{2}+t_{Y_2}\mathcal{Y}_2^{2}+t_{Z_2}\mathcal{Z}_2^{2}+t_N\mathcal{N}^{2}\bigr)^{\frac12}\tag{24}$$

is all-orthogonal for any non-negative values $t_{S_2}$, $t_{X_2}$, $t_{Y_2}$, $t_{Z_2}$, $t_N$. The superscripts “2” and “$\frac12$” in (24) denote the entrywise operations. Finally, in the third step, we find non-negative values $t_{S_2}$, $t_{X_2}$, $t_{Y_2}$, $t_{Z_2}$, $t_N$ such that $\mathcal{T}$ is a norm-1 tensor whose squared largest ML singular values are equal to $\sigma_1^2$, $\sigma_2^2$, and $\sigma_3^2$.

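Step 1. Let $\pi$ denote the cyclic permutation $\pi:\ 1\to I_1\to I_1-1\to\dots\to2\to1$.

The tensors $\mathcal{S}_2$, $\mathcal{X}_2$, $\mathcal{Y}_2$, and $\mathcal{Z}_2$ are defined by

$$(\mathcal{S}_2)_{ijk}=\begin{cases}\frac{1}{I_1},&\text{if } j=\pi^{k-1}(i)\text{ and }1\le i,k\le I_1,\\ 0,&\text{otherwise},\end{cases}$$

$$(\mathcal{X}_2)_{ijk}=\begin{cases}\frac{1}{\sqrt{I_2}},&\text{if } j=\pi^{k-1}(i),\ i=1,\text{ and }1\le k\le I_1,\\ \frac{1}{\sqrt{I_2}},&\text{if } i=1\text{ and }I_1<j=k\le I_2,\\ 0,&\text{otherwise},\end{cases}$$

$$(\mathcal{Y}_2)_{ijk}=\begin{cases}\frac{1}{\sqrt{I_1}},&\text{if } j=\pi^{k-1}(i),\ j=1,\text{ and }1\le k\le I_1,\\ 0,&\text{otherwise},\end{cases}$$

$$(\mathcal{Z}_2)_{ijk}=\begin{cases}\frac{1}{\sqrt{I_1}},&\text{if } j=\pi^{k-1}(i),\ k=1,\text{ and }1\le i\le I_1,\\ 0,&\text{otherwise},\end{cases}$$

and the tensor $\mathcal{N}$, by definition, has only one nonzero entry, $\mathcal{N}_{111}=1$.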

Step 2. It is clear that the $(i,j,k)$th entry of a linear combination of $\mathcal{S}_2^2$, $\mathcal{X}_2^2$, $\mathcal{Y}_2^2$, $\mathcal{Z}_2^2$, and $\mathcal{N}^2$ may be nonzero only if

$$j=\pi^{k-1}(i)\text{ and }1\le i,k\le I_1\qquad\text{or}\qquad i=1\text{ and }I_1<j=k\le I_2.$$

The same is also true for $\mathcal{T}$ defined in (24). One can easily check that each column of $T_{(1)}$, $T_{(2)}$, and $T_{(3)}$ contains at most one nonzero entry, implying that $\mathcal{T}$ is an all-orthogonal tensor.
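Steps 1 and 2 can be mirrored in a short numerical sketch. It assumes NumPy, uses 0-based indices (so $j=\pi^{k-1}(i)$ becomes `j = (i - k) % I1`), and stores the entrywise squares of the five tensors directly. The weights `t` are a hypothetical convex combination; the code then compares the squared largest ML singular values of the tensor in (24) with the same convex combination of the points $S_2$, $X_2$, $Y_2$, $Z_2$, $N$.

```python
import numpy as np

def unfold(T, n):
    """Mode-n unfolding (0-based n) with the column ordering of (1)."""
    return np.reshape(np.moveaxis(T, n, 0), (T.shape[n], -1), order="F")

def squared_vertex_tensors(I1, I2, I3):
    """Entrywise squares of S_2, X_2, Y_2, Z_2, N from Step 1 (0-based indices)."""
    S, X, Y, Z, N = (np.zeros((I1, I2, I3)) for _ in range(5))
    for k in range(I1):
        for i in range(I1):
            S[i, (i - k) % I1, k] = 1 / I1**2   # entry 1/I1 where j = pi^(k-1)(i)
        X[0, (-k) % I1, k] = 1 / I2             # i = 1, entry 1/sqrt(I2)
        Y[k, 0, k] = 1 / I1                     # j = 1 forces i = k
        Z[k, k, 0] = 1 / I1                     # k = 1 forces j = i (loop index reused)
    for j in range(I1, I2):
        X[0, j, j] = 1 / I2                     # i = 1 and I1 < j = k <= I2
    N[0, 0, 0] = 1.0
    return S, X, Y, Z, N

I1, I2, I3 = 2, 3, 5
t = np.array([0.10, 0.20, 0.30, 0.15, 0.25])    # t_S2, t_X2, t_Y2, t_Z2, t_N (sum to 1)
T = np.sqrt(sum(w * P for w, P in zip(t, squared_vertex_tensors(I1, I2, I3))))

vertices = np.array([
    [1 / I1, 1 / I1, 1 / I1],   # S_2
    [1.0, 1 / I2, 1 / I2],      # X_2
    [1 / I1, 1.0, 1 / I1],      # Y_2
    [1 / I1, 1 / I1, 1.0],      # Z_2
    [1.0, 1.0, 1.0],            # N
])
expected = t @ vertices
sigma2 = np.array([np.linalg.norm(unfold(T, n), 2) ** 2 for n in range(3)])
grams = [unfold(T, n) @ unfold(T, n).T for n in range(3)]
```

Storing the squares makes (24) a plain weighted sum followed by an entrywise square root, which is the only place where the weights enter.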

Step 3. From the construction of the all-orthogonal tensors $\mathcal{S}_2$, $\mathcal{X}_2$, $\mathcal{Y}_2$, $\mathcal{Z}_2$, and $\mathcal{N}$ it follows that their largest ML singular values are equal to the Frobenius norms of the first rows of their matrix unfoldings. Thus, the same property should also hold for $\mathcal{T}$ whenever the values $t_{S_2}$, $t_{X_2}$, $t_{Y_2}$, $t_{Z_2}$, and $t_N$ are non-negative. Now the result follows from the fact that the polyhedron in Figure 2b is the convex hull of the points $S_2$, $X_2$, $Y_2$, $Z_2$, and $N$. We can also write the values of $t_{S_2}$, $t_{X_2}$, $t_{Y_2}$, $t_{Z_2}$, and $t_N$ explicitly. We set

$$f(\sigma_1^2,\sigma_2^2,\sigma_3^2):=(I_1I_2+I_2-2I_1)\sigma_1^2+(I_1-1)I_2\sigma_2^2+(I_1-1)I_2\sigma_3^2+(2-I_1I_2-I_2).$$

If $(\sigma_1^2,\sigma_2^2,\sigma_3^2)$ belongs to the tetrahedron $X_2Y_2Z_2N$, i.e., $f(\sigma_1^2,\sigma_2^2,\sigma_3^2)\ge0$, then

$$\begin{aligned}t_{X_2}&=\frac{I_2}{2(I_2-1)}(1+\sigma_1^2-\sigma_2^2-\sigma_3^2),\\ t_{Y_2}&=\frac{I_1}{2(I_1-1)}(1+\sigma_2^2-\sigma_1^2-\sigma_3^2),\\ t_{Z_2}&=\frac{I_1}{2(I_1-1)}(1+\sigma_3^2-\sigma_1^2-\sigma_2^2),\\ t_N&=1-t_{X_2}-t_{Y_2}-t_{Z_2}=\frac{f(\sigma_1^2,\sigma_2^2,\sigma_3^2)}{2(I_1-1)(I_2-1)},\qquad t_{S_2}=0.\end{aligned}$$

If $(\sigma_1^2,\sigma_2^2,\sigma_3^2)$ belongs to the tetrahedron $X_2Y_2Z_2S_2$, i.e., $f(\sigma_1^2,\sigma_2^2,\sigma_3^2)\le0$, then

$$\begin{aligned}t_{X_2}&=\frac{I_1}{I_1-1}\Bigl(\sigma_1^2-\frac{1}{I_1}\Bigr),\\ t_{Y_2}&=\frac{I_1}{I_1-1}\Bigl(\sigma_2^2-\frac{1}{I_1}\Bigr)+\frac{(I_2-I_1)I_1}{(I_1-1)^2I_2}\Bigl(\sigma_1^2-\frac{1}{I_1}\Bigr),\\ t_{Z_2}&=\frac{I_1}{I_1-1}\Bigl(\sigma_3^2-\frac{1}{I_1}\Bigr)+\frac{(I_2-I_1)I_1}{(I_1-1)^2I_2}\Bigl(\sigma_1^2-\frac{1}{I_1}\Bigr),\\ t_{S_2}&=1-t_{X_2}-t_{Y_2}-t_{Z_2}=\frac{-f(\sigma_1^2,\sigma_2^2,\sigma_3^2)\,I_1}{I_2(I_1-1)^2},\qquad t_N=0.\end{aligned}$$

Proof of Theorem 4. The inequalities in (11) are obvious. We prove that

$$\sigma_1^2+\dots+\sigma_{N-1}^2\le(N-2)\|\mathcal{T}\|^2+\sigma_N^2.\tag{25}$$

The proofs of the remaining $N-1$ inequalities in (10) can be obtained in a similar way.

The proof of (25) consists of two steps. In the first step we reshape $\mathcal{T}$ into third-order tensors $\mathcal{T}^{[1]},\dots,\mathcal{T}^{[N-2]}$ and compute their matrix unfoldings. In this step we will make use of (1) for $N=3$. For the reader’s convenience and for future reference we write a third-order version of (1) explicitly: if $\mathcal{X}\in\mathbb{C}^{I\times J\times K}$, then for all values of the indices $i$, $j$, and $k$,

$$\begin{aligned}&\text{the }(i,\ j+(k-1)J)\text{th entry of }X_{(1)}\\ =\ &\text{the }(j,\ i+(k-1)I)\text{th entry of }X_{(2)}\\ =\ &\text{the }(k,\ i+(j-1)I)\text{th entry of }X_{(3)}\\ =\ &\text{the }(i,j,k)\text{th entry of }\mathcal{X}.\end{aligned}\tag{26}$$

In the second step, we apply the first inequality in (4) to each tensor $\mathcal{T}^{[n]}$, then we sum up the obtained inequalities and show that the result coincides with inequality (25).

Step 1. Let $n\in\{1,\dots,N-2\}$. A third-order tensor $\mathcal{T}^{[n]}\in\mathbb{C}^{I_1\cdots I_n\times I_{n+1}\times I_{n+2}\cdots I_N}$ is constructed as follows:

$$\text{the }\Bigl(i_1+\sum_{k=2}^{n}(i_k-1)\prod_{l=1}^{k-1}I_l,\ \ i_{n+1},\ \ i_{n+2}+\sum_{k=n+3}^{N}(i_k-1)\prod_{l=n+2}^{k-1}I_l\Bigr)\text{th entry of }\mathcal{T}^{[n]}$$

is equal to the $(i_1,\dots,i_N)$th entry of $\mathcal{T}$. Now we apply (26) for $\mathcal{X}=\mathcal{T}^{[n]}$ and

$$i=i_1+\sum_{k=2}^{n}(i_k-1)\prod_{l=1}^{k-1}I_l,\qquad j=i_{n+1},\qquad k=i_{n+2}+\sum_{k=n+3}^{N}(i_k-1)\prod_{l=n+2}^{k-1}I_l.$$

After simple algebraic manipulations, we obtain that

$$\begin{aligned}&\text{the }\Bigl(i_1+\sum_{k=2}^{n}(i_k-1)\prod_{l=1}^{k-1}I_l,\ \ i_{n+1}+\sum_{k=n+2}^{N}(i_k-1)\prod_{l=n+1}^{k-1}I_l\Bigr)\text{th entry of }T^{[n]}_{(1)}\\ =\ &\text{the }\Bigl(i_{n+1},\ \ i_1+\sum_{\substack{k=2\\ k\ne n+1}}^{N}(i_k-1)\prod_{\substack{l=1\\ l\ne n+1}}^{k-1}I_l\Bigr)\text{th entry of }T^{[n]}_{(2)}\\ =\ &\text{the }\Bigl(i_{n+2}+\sum_{k=n+3}^{N}(i_k-1)\prod_{l=n+2}^{k-1}I_l,\ \ i_1+\sum_{k=2}^{n+1}(i_k-1)\prod_{l=1}^{k-1}I_l\Bigr)\text{th entry of }T^{[n]}_{(3)}\\ =\ &\text{the }(i_1,\dots,i_N)\text{th entry of }\mathcal{T}.\end{aligned}\tag{27}$$

Step 2. From (27) and (1) it follows that

$$T^{[1]}_{(1)}=T_{(1)},\tag{28}$$

$$T^{[n]}_{(2)}=T_{(n+1)},\qquad 1\le n\le N-2,\tag{29}$$

$$T^{[N-2]}_{(3)}=T_{(N)}.\tag{30}$$

Comparing the expressions of $T^{[n]}_{(1)}$ and $T^{[n]}_{(3)}$ in (27), we obtain that

$$T^{[n]}_{(3)}=\bigl(T^{[n+1]}_{(1)}\bigr)^T,\qquad 1\le n\le N-3.\tag{31}$$

By Theorem 1, for every $n\in\{1,\dots,N-2\}$,

$$\sigma_{\max}^2(T^{[n]}_{(1)})+\sigma_{\max}^2(T^{[n]}_{(2)})\le\|\mathcal{T}^{[n]}\|^2+\sigma_{\max}^2(T^{[n]}_{(3)})=\|\mathcal{T}\|^2+\sigma_{\max}^2(T^{[n]}_{(3)}),\tag{32}$$

where $\sigma_{\max}(\cdot)$ denotes the largest singular value of a matrix. Substituting (28)–(31) into (32) we obtain

$$\begin{aligned}\sigma_1^2+\sigma_2^2&\le\|\mathcal{T}\|^2+\sigma_{\max}^2(T^{[1]}_{(3)})=\|\mathcal{T}\|^2+\sigma_{\max}^2(T^{[2]}_{(1)}),&& n=1,\\ \sigma_{\max}^2(T^{[2]}_{(1)})+\sigma_3^2&\le\|\mathcal{T}\|^2+\sigma_{\max}^2(T^{[2]}_{(3)})=\|\mathcal{T}\|^2+\sigma_{\max}^2(T^{[3]}_{(1)}),&& n=2,\\ &\ \ \vdots\\ \sigma_{\max}^2(T^{[N-3]}_{(1)})+\sigma_{N-2}^2&\le\|\mathcal{T}\|^2+\sigma_{\max}^2(T^{[N-3]}_{(3)})=\|\mathcal{T}\|^2+\sigma_{\max}^2(T^{[N-2]}_{(1)}),&& n=N-3,\\ \sigma_{\max}^2(T^{[N-2]}_{(1)})+\sigma_{N-1}^2&\le\|\mathcal{T}\|^2+\sigma_{\max}^2(T^{[N-2]}_{(3)})=\|\mathcal{T}\|^2+\sigma_N^2,&& n=N-2.\end{aligned}$$

Summing up the above inequalities and canceling identical terms on the left- and right-hand sides we obtain (25).
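The reshaping identities (28)–(31) used in this proof can be confirmed with a few lines of code. The sketch assumes NumPy and column-major (Fortran-order) reshapes, which reproduce the index ordering of (1); the helper names are hypothetical.

```python
import numpy as np

def unfold(T, n):
    """Mode-n unfolding (0-based n) with the column ordering of (1)."""
    return np.reshape(np.moveaxis(T, n, 0), (T.shape[n], -1), order="F")

def third_order_reshape(T, n):
    """T^[n] for 1 <= n <= N-2: modes 1..n are merged, and so are modes n+2..N."""
    dims = T.shape
    shape = (int(np.prod(dims[:n])), dims[n], int(np.prod(dims[n + 1:])))
    return np.reshape(T, shape, order="F")

rng = np.random.default_rng(0)
T = rng.standard_normal((2, 3, 2, 4, 3))
N = T.ndim
R = {n: third_order_reshape(T, n) for n in range(1, N - 1)}   # T^[1], ..., T^[N-2]

ok_28 = np.array_equal(unfold(R[1], 0), unfold(T, 0))
ok_29 = all(np.array_equal(unfold(R[n], 1), unfold(T, n)) for n in range(1, N - 1))
ok_30 = np.array_equal(unfold(R[N - 2], 2), unfold(T, N - 1))
ok_31 = all(np.array_equal(unfold(R[n], 2), unfold(R[n + 1], 0).T)
            for n in range(1, N - 2))
```

Because the reshapes only relabel entries, the comparisons can use exact equality rather than a tolerance.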

3. Results on feasibility and non-feasibility of the points $S$, $X_1$, and $Y_1$. Throughout this section we assume that $\mathcal{T}$ is a norm-1 tensor.

In the following example we show that it may happen that $S$ is the only feasible point in the plane through the points $S$, $X_1$, and $Y_1$, i.e., the plane $\sigma_3^2=\frac{1}{I_3}$.

Example 9. Let $I_3=I_1I_2$ and $\mathcal{T}\in\mathbb{C}^{I_1\times I_2\times I_3}$. Assume that $\sigma_3^2=\frac{1}{I_3}$. Then $T_{(3)}^HT_{(3)}=\frac{1}{I_3}I_{I_3}$. Since $T_{(3)}$ is a square matrix, it follows that $T_{(3)}$ is a scalar multiple of a unitary matrix, $T_{(3)}=\frac{1}{\sqrt{I_3}}U$. One can easily verify (see [6, p. 65]) that $T_{(1)}T_{(1)}^H=\frac{1}{I_1}I_{I_1}$ and $T_{(2)}T_{(2)}^H=\frac{1}{I_2}I_{I_2}$. Hence, $\sigma_1^2=\frac{1}{I_1}$ and $\sigma_2^2=\frac{1}{I_2}$. Thus, the points $X_1$ and $Y_1$ are not feasible.
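The claim of Example 9 can be illustrated for $I_1=2$, $I_2=3$, $I_3=6$. The sketch below assumes NumPy; the unitary matrix is obtained from a QR factorization of a random complex matrix (an arbitrary choice), and the tensor is folded back from $T_{(3)}$ using the ordering (1).

```python
import numpy as np

def unfold(T, n):
    """Mode-n unfolding (0-based n) with the column ordering of (1)."""
    return np.reshape(np.moveaxis(T, n, 0), (T.shape[n], -1), order="F")

I1, I2 = 2, 3
I3 = I1 * I2
rng = np.random.default_rng(0)
Z = rng.standard_normal((I3, I3)) + 1j * rng.standard_normal((I3, I3))
U, _ = np.linalg.qr(Z)                      # U is unitary
T3 = U / np.sqrt(I3)                        # T_(3) = U / sqrt(I3)

# Fold T_(3) back into an I1 x I2 x I3 tensor: column i + (j-1) I1 holds entries (i, j, :).
T = np.reshape(T3.T, (I1, I2, I3), order="F")

grams = [unfold(T, n) @ unfold(T, n).conj().T for n in range(3)]
sigma2 = [np.linalg.norm(unfold(T, n), 2) ** 2 for n in range(3)]
```

Here `grams[n]` is $T_{(n+1)}T_{(n+1)}^H$, which should be the identity scaled by the reciprocal of the corresponding dimension.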

From Example 9 it follows that the point $S$ is feasible if $I_1=2$, $I_2=3$, and $I_3=6$.

The point $S$ is also feasible if $I_1=2$, $I_2=3$, and $I_3=4$. Indeed, let $\mathcal{T}$ be a $2\times3\times4$ tensor with mode-3 matrix unfolding

$$T_{(3)}=\frac{1}{4\sqrt{3}}\begin{bmatrix}1+\sqrt3&0&0&1-\sqrt3&-2&0\\ 0&1+\sqrt3&1-\sqrt3&0&0&2\\ 0&1-\sqrt3&1+\sqrt3&0&0&2\\ 1-\sqrt3&0&0&1+\sqrt3&-2&0\end{bmatrix}.$$

Then one can also easily verify that $T_{(1)}T_{(1)}^H=\frac12I_2$, $T_{(2)}T_{(2)}^H=\frac13I_3$, and $T_{(3)}T_{(3)}^H=\frac14I_4$. The following result implies that in the “intermediate” case $I_1=2$, $I_2=3$, and $I_3=5$ the point $S$ is not feasible.
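The three Gram identities for the $2\times3\times4$ tensor can be verified numerically. The sketch assumes NumPy and scales the displayed integer-and-$\sqrt3$ pattern by $\frac{1}{4\sqrt3}$, the factor for which the tensor has norm 1 (every row of the unscaled matrix has squared norm 12).

```python
import numpy as np

def unfold(T, n):
    """Mode-n unfolding (0-based n) with the column ordering of (1)."""
    return np.reshape(np.moveaxis(T, n, 0), (T.shape[n], -1), order="F")

r = np.sqrt(3.0)
M = np.array([
    [1 + r, 0,     0,     1 - r, -2, 0],
    [0,     1 + r, 1 - r, 0,      0, 2],
    [0,     1 - r, 1 + r, 0,      0, 2],
    [1 - r, 0,     0,     1 + r, -2, 0],
])
T3 = M / (4 * r)                                   # scaled so that ||T|| = 1
T = np.reshape(T3.T, (2, 3, 4), order="F")         # fold T_(3) back using ordering (1)

grams = [unfold(T, n) @ unfold(T, n).T for n in range(3)]
norm_T = np.linalg.norm(T)
```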

Theorem 10. Let $I_3=I_1I_2-1$, $\mathcal{T}\in\mathbb{C}^{I_1\times I_2\times I_3}$, and $T_{(3)}T_{(3)}^H=\frac{1}{I_3}I_{I_3}$. Then the following statements hold:

(i) if $T_{(1)}T_{(1)}^H=\frac{1}{I_1}I_{I_1}$, then $I_1\le I_2$;

(ii) if $T_{(2)}T_{(2)}^H=\frac{1}{I_2}I_{I_2}$, then $I_2\le I_1$;

(iii) if the point $S$ is feasible, then $I_1=I_2$.

Proof. (i) Let $T_{(3)}=[t_1\ \dots\ t_{I_1I_2}]$. Then the identity $T_{(1)}T_{(1)}^H=\frac{1}{I_1}I_{I_1}$ is equivalent to the system

$$\begin{aligned}t_{i_1}^Ht_{i_2}+t_{I_1+i_1}^Ht_{I_1+i_2}+\dots+t_{I_1(I_2-1)+i_1}^Ht_{I_1(I_2-1)+i_2}&=0,\\ \|t_{i_1}\|^2+\|t_{I_1+i_1}\|^2+\dots+\|t_{I_1(I_2-1)+i_1}\|^2&=\frac{1}{I_1},\qquad 1\le i_1<i_2\le I_1.\end{aligned}\tag{33}$$

Since $T_{(3)}T_{(3)}^H=\frac{1}{I_3}I_{I_3}$, the matrix $\sqrt{I_3}\,T_{(3)}\in\mathbb{C}^{I_3\times I_1I_2}$ can be extended to a unitary matrix $\sqrt{I_3}\begin{bmatrix}T_{(3)}\\ a^T\end{bmatrix}\in\mathbb{C}^{I_1I_2\times I_1I_2}$, where $a\in\mathbb{C}^{I_1I_2}$ is a vector such that $T_{(3)}a^*=0$ and $\|a\|^2=\frac{1}{I_3}$. Hence,

$$\begin{bmatrix}T_{(3)}^H&a^*\end{bmatrix}\begin{bmatrix}T_{(3)}\\ a^T\end{bmatrix}=\frac{1}{I_3}I_{I_1I_2}$$

or

$$t_i^Ht_j+\bar a_ia_j=0\ \text{ for } i\ne j\quad\text{and}\quad\|t_i\|^2+|a_i|^2=\frac{1}{I_3},\qquad 1\le i<j\le I_1I_2.\tag{34}$$

From (33)–(34) it follows that

$$\begin{aligned}\bar a_{i_1}a_{i_2}+\bar a_{I_1+i_1}a_{I_1+i_2}+\dots+\bar a_{I_1(I_2-1)+i_1}a_{I_1(I_2-1)+i_2}&=0,\\ |a_{i_1}|^2+|a_{I_1+i_1}|^2+\dots+|a_{I_1(I_2-1)+i_1}|^2&=\frac{I_2}{I_3}-\frac{1}{I_1}=\frac{1}{I_1I_3},\qquad 1\le i_1<i_2\le I_1.\end{aligned}$$

Thus, the vectors

$$[a_i\ \ a_{I_1+i}\ \ \dots\ \ a_{I_1(I_2-1)+i}]^T\in\mathbb{C}^{I_2},\qquad 1\le i\le I_1,$$

are nonzero and mutually orthogonal. Hence, $I_1\le I_2$.

(ii) The proof is similar to the proof of (i).

(iii) Since $S$ is feasible, it follows that $T_{(1)}T_{(1)}^H=\frac{1}{I_1}I_{I_1}$ and $T_{(2)}T_{(2)}^H=\frac{1}{I_2}I_{I_2}$. Hence, by (i) and (ii), $I_1=I_2$.

4. The case of at least one equality in (4). The following two lemmas will be used in the proof of Theorem 6.

Lemma 11. Let $H$ and $\Phi(H)$ be as in Lemma 8. Then the equality in (17) holds if and only if $H$ can be factorized as

$$H=[\operatorname{vec}(W_1)\ \ G\otimes x][\operatorname{vec}(W_1)\ \ G\otimes x]^H,\tag{35}$$

where

(i) $W_1\in\mathbb{C}^{I_1\times I_3}$ and $x$ is a principal eigenvector of $W_1W_1^H$, i.e., $W_1W_1^Hx=\lambda_{\max}(W_1W_1^H)x$, $\|x\|=1$;

(ii) the matrix $G=[g_2\ \dots\ g_R]\in\mathbb{C}^{I_3\times(R-1)}$ has orthogonal columns;

(iii) $G^TW_1^Hx=0$;

(iv) $\lambda_{\max}(W_1^HW_1)=\lambda_{\max}(W_1^HW_1+G^*G^T)$.

Moreover, if (35) and (i)–(iv) hold, then

$$\sigma\Bigl(\sum_{k=1}^{I_3}H_{kk}\Bigr)=\sigma\bigl(W_1W_1^H+\|G\|_F^2\,xx^H\bigr),\tag{36}$$

$$\sigma(H)=\{\|W_1\|_F^2,\ \|g_2\|^2,\dots,\|g_R\|^2,\ 0,\dots,0\},\tag{37}$$

$$\sigma(\Phi(H))=\sigma\bigl(W_1^HW_1+G^*G^T\bigr),\tag{38}$$

where $\sigma(\cdot)$ denotes the spectrum of a matrix and $\|\cdot\|_F$ denotes the Frobenius norm.

Proof. The proof essentially relies on the proof of Lemma 8, so we use the same notations and conventions as in the proof of Lemma 8.

Derivation of (36)–(38). Assume that (35) and (i)–(iv) hold. Then

$$H=\sum_{r=1}^{R}\operatorname{vec}(W_r)\operatorname{vec}(W_r)^H,\qquad\text{where } W_r=xg_r^T\ \text{ for } r=2,\dots,R.$$

Hence

$$\sum_{k=1}^{I_3}H_{kk}=\sum_{r=1}^{R}W_rW_r^H=W_1W_1^H+\sum_{r=2}^{R}xg_r^Tg_r^*x^H=W_1W_1^H+\|G\|_F^2\,xx^H,$$

which implies (36). By (ii), (iii), and the convention $\|x\|=1$ in (i), the vectors $\operatorname{vec}(W_r)$ are mutually orthogonal, which implies (37). Finally, by (21),

$$\Phi(H)=\sum_{r=1}^{R}W_r^TW_r^*=W_1^TW_1^*+\sum_{r=2}^{R}g_rx^Tx^*g_r^H=W_1^TW_1^*+GG^H,$$

which implies (38).

Sufficiency. By (i) and (36),

$$\lambda_{\max}\Bigl(\sum_{k=1}^{I_3}H_{kk}\Bigr)=\lambda_{\max}(W_1W_1^H)+\|G\|_F^2.$$

By (iv) and (ii),

$$\|W_1\|_F^2\ge\lambda_{\max}(W_1^HW_1)\ge\lambda_{\max}(G^*G^T)=\max_{2\le r\le R}\|g_r\|^2.$$

Thus, by (37), $\lambda_{\max}(H)=\|W_1\|_F^2$ and $\operatorname{tr}(H)=\|W_1\|_F^2+\|G\|_F^2$. By (iv) and (38), $\lambda_{\max}(\Phi(H))=\lambda_{\max}(W_1^HW_1)$. Thus, the left- and right-hand sides of (17) are equal to $\lambda_{\max}(W_1W_1^H)+\|W_1\|_F^2+\|G\|_F^2$.

Necessity. It is clear that the equality in (17) holds if and only if it holds in (22) and (23). So we replace the inequality signs in (22) and (23) with an equality sign.

From the first line of (23) it follows that $x$ satisfies (i). By the Cauchy inequality, the equality

$$\sum_{k=1}^{I_3}\sum_{r=2}^{R}|(w_{kr},x)|^2=\sum_{r=2}^{R}\|w_r\|^2$$

in (22) would imply that

$$w_{kr}=c_{kr}x,\qquad k=1,\dots,I_3,\quad r=2,\dots,R,$$

for some $c_{kr}\in\mathbb{C}$. Hence,

$$w_r=[w_{1r}^T\ \dots\ w_{I_3r}^T]^T=[c_{1r}\ \dots\ c_{I_3r}]^T\otimes x=g_r\otimes x,\qquad r=2,\dots,R.\tag{39}$$

Since $H=\sum_{r=1}^{R}w_rw_r^H$, it follows that

$$H=[w_1\ \dots\ w_R][w_1\ \dots\ w_R]^H=[w_1\ \ g_2\otimes x\ \dots\ g_R\otimes x][w_1\ \ g_2\otimes x\ \dots\ g_R\otimes x]^H,$$

which coincides with (35). The mutual orthogonality of $w_2,\dots,w_R$ and the orthogonality of $w_1$ to $w_2,\dots,w_R$ imply (ii) and (iii), respectively. By (39), $W_r=xg_r^T$ for $r=2,\dots,R$. Hence, the equality

$$\lambda_{\max}\bigl(W_1^HW_1\bigr)=\lambda_{\max}\Bigl(\sum_{r=1}^{R}W_r^HW_r\Bigr)$$