On Avoiding Diverging Components in the Computation of the Best Low Rank Approximation of Higher-Order Tensors

Lieven De Lathauwer

Tech. Report 05-269, ESAT-SISTA, K.U.Leuven (Leuven, Belgium), 2005.

This report was written as a contribution to an e-mail discussion between Rasmus Bro, Lieven De Lathauwer, Richard Harshman and Lek-Heng Lim.

1. Orthogonality in one of the modes

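Consider the approximation of a tensor $\mathcal{A} \in \mathbb{C}^{I_1 \times I_2 \times \cdots \times I_N}$ by a rank-$R$ tensor $\hat{\mathcal{A}}$, given by

$$\hat{\mathcal{A}} = \sum_{r=1}^{R} \lambda_r \, u_r^{(1)} \circ u_r^{(2)} \circ \cdots \circ u_r^{(N)},$$

where $R \leqslant I_1$. Denote $X = (\lambda_1, \ldots, \lambda_R;\ u_1^{(1)}, \ldots, u_R^{(1)};\ \ldots;\ u_1^{(N)}, \ldots, u_R^{(N)})$.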

We have the following theorem.

Theorem 1 Under the condition that the vectors $u_r^{(1)}$, $r = 1, \ldots, R$, are mutually orthonormal, the function

$$f(X) = \Big\| \mathcal{A} - \sum_{r=1}^{R} \lambda_r \, u_r^{(1)} \circ u_r^{(2)} \circ \cdots \circ u_r^{(N)} \Big\|^2 \qquad (1)$$

attains its infimum.

Proof: Let $U^{(n)} = [u_1^{(n)} \ \ldots \ u_R^{(n)}]$, $n = 1, \ldots, N$, and let $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_R)$.

Matricizing (1), we obtain

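$$f(X) = \big\| A_{(1)} - U^{(1)} \cdot \Lambda \cdot (U^{(2)} \odot \cdots \odot U^{(N)})^T \big\|^2. \qquad (2)$$

Let $\tilde{U}^{(1)} = (U^{(1)}, U^{(1)\perp})$ be (square) unitary. For any choice of $U^{(1)}$, we have

$$f(X) = \Big\| \tilde{U}^{(1)H} \cdot A_{(1)} - \begin{pmatrix} I_{R \times R} \\ O_{(I_1 - R) \times R} \end{pmatrix} \cdot \Lambda \cdot (U^{(2)} \odot \cdots \odot U^{(N)})^T \Big\|^2. \qquad (3)$$

Define $\mathcal{B} = \mathcal{A} \times_1 \tilde{U}^{(1)H}$. Denote the $(I_2 \times I_3 \times \cdots \times I_N)$-slices of $\mathcal{B}$ by $\mathcal{B}_1, \ldots, \mathcal{B}_{I_1}$. Then

$$f(X) = \sum_{r=1}^{R} \big\| \mathcal{B}_r - \lambda_r \, u_r^{(2)} \circ u_r^{(3)} \circ \cdots \circ u_r^{(N)} \big\|^2 + c(U^{(1)}), \qquad (4)$$

in which $c(U^{(1)}) = \| \mathcal{A} \times_1 (U^{(1)\perp})^H \|^2$. We conclude that the problem reduces to a set of best rank-1 approximation problems, for which the infimum is attained.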
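As an illustrative check, not part of the report, the following Python/NumPy sketch verifies the decomposition in (4) numerically for a small real third-order tensor ($N = 3$). The dimensions, the random data and all function names (`rank1_sum`, `cost_direct`, `cost_slicewise`) are assumptions made for this example only.

```python
import numpy as np

def rank1_sum(lam, U1, U2, U3):
    # sum_r lam[r] * U1[:, r] o U2[:, r] o U3[:, r]
    return np.einsum("r,ir,jr,kr->ijk", lam, U1, U2, U3)

def cost_direct(A, lam, U1, U2, U3):
    # f(X) as in (1): squared Frobenius norm of the residual tensor
    return np.linalg.norm(A - rank1_sum(lam, U1, U2, U3)) ** 2

def cost_slicewise(A, lam, U1, U1_perp, U2, U3):
    # First R slices of B = A x_1 [U1, U1_perp]^H, each matched with one rank-1 term
    B = np.einsum("ir,ijk->rjk", U1.conj(), A)
    total = 0.0
    for r in range(U1.shape[1]):
        total += np.linalg.norm(B[r] - lam[r] * np.outer(U2[:, r], U3[:, r])) ** 2
    # Constant term c(U1): energy of A in the orthogonal complement of span(U1)
    c = np.linalg.norm(np.einsum("ir,ijk->rjk", U1_perp.conj(), A)) ** 2
    return total + c

rng = np.random.default_rng(0)
I1, I2, I3, R = 5, 4, 3, 2                      # assumed sizes, with R <= I1
A = rng.standard_normal((I1, I2, I3))

# Square orthogonal matrix split into U1 (orthonormal columns) and its complement
Q, _ = np.linalg.qr(rng.standard_normal((I1, I1)))
U1, U1_perp = Q[:, :R], Q[:, R:]

U2 = rng.standard_normal((I2, R))
U2 /= np.linalg.norm(U2, axis=0)
U3 = rng.standard_normal((I3, R))
U3 /= np.linalg.norm(U3, axis=0)
lam = rng.standard_normal(R)

f_direct = cost_direct(A, lam, U1, U2, U3)
f_slices = cost_slicewise(A, lam, U1, U1_perp, U2, U3)
print(f_direct, f_slices)   # the two values should agree up to rounding
```

The agreement of the two values only illustrates the algebraic identity for fixed $U^{(1)}$; it says nothing about the minimization over $U^{(1)}$ itself.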

2. Bounded condition number in one mode

In the previous section the condition number of $U^{(1)}$ was taken equal to one. This constraint can be relaxed. We have the following theorem.

Theorem 2 Let the vectors $u_r^{(n)}$, $r = 1, \ldots, R$, have unit norm. Under the condition that the condition number $\kappa(U^{(1)}) \leqslant k$, the function $f(X)$ attains its infimum.

Proof: We show that all level sets of the cost function are compact. Closed: cf. the note by Lek-Heng. Bounded: see below.

We assume that all vectors $u_r^{(n)}$, $r = 1, \ldots, R$, $n = 1, \ldots, N$, have unit length. Hence, we have to show that $\lambda_r \to \infty$ implies that $f(X) \to \infty$. Let $\lambda = (\lambda_1, \ldots, \lambda_R)$. We have

$$f(X) = \big\| \mathrm{vec}(\mathcal{A}) - (U^{(1)} \odot \cdots \odot U^{(N)}) \cdot \lambda \big\|^2. \qquad (5)$$

Reasoning by contradiction, we see that the condition number of $U^{(1)} \odot \cdots \odot U^{(N)}$ is bounded because the condition number of $U^{(1)}$ is bounded. This implies that $\|(U^{(1)} \odot \cdots \odot U^{(N)}) \cdot \lambda\| \to \infty$ whenever $\lambda_r \to \infty$. As a result, $f(X) \to \infty$ whenever $\lambda_r \to \infty$.
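The boundedness step can also be seen directly, by a route that is not the contradiction argument mentioned in the proof. Write $M$ for the column-wise Kronecker (Khatri-Rao) product in (5). Its Gram matrix $M^H M$ is the entrywise (Hadamard) product of the Gram matrices of the factors. The Gram matrices of modes $2, \ldots, N$ are positive semidefinite with unit diagonal, so the standard Hadamard-product eigenvalue inequality gives $\sigma_{\min}(M) \geqslant \sigma_{\min}(U^{(1)})$. Because $U^{(1)}$ has unit-norm columns, $\sigma_{\max}(U^{(1)}) \geqslant 1$, hence $\sigma_{\min}(U^{(1)}) \geqslant 1/k$ and $\|M \lambda\| \geqslant \|\lambda\| / k$. The Python/NumPy sketch below checks these inequalities numerically; the helper names (`khatri_rao`, `unit_columns`), the sizes and the nearly collinear test factors are assumptions for illustration.

```python
import numpy as np

def khatri_rao(*mats):
    # Column-wise Kronecker product of matrices that share the same number of columns
    out = mats[0]
    for M in mats[1:]:
        out = np.einsum("ir,jr->ijr", out, M).reshape(-1, out.shape[1])
    return out

def unit_columns(M):
    # Scale every column to unit Euclidean norm
    return M / np.linalg.norm(M, axis=0)

rng = np.random.default_rng(1)
R = 3
U1 = unit_columns(rng.standard_normal((6, R)))

# Modes 2 and 3 deliberately have nearly collinear columns (assumed worst-case-like data)
U2 = unit_columns(rng.standard_normal((5, 1)) + 1e-3 * rng.standard_normal((5, R)))
U3 = unit_columns(rng.standard_normal((4, 1)) + 1e-3 * rng.standard_normal((4, R)))

M = khatri_rao(U1, U2, U3)
s1 = np.linalg.svd(U1, compute_uv=False)   # singular values of U1, descending
sM = np.linalg.svd(M, compute_uv=False)    # singular values of the Khatri-Rao product
kappa = s1[0] / s1[-1]                     # condition number of U1

lam = 1e3 * rng.standard_normal(R)
growth = np.linalg.norm(M @ lam) / np.linalg.norm(lam)
print(sM[-1], s1[-1], 1.0 / kappa, growth)
```

Even though the factors in modes 2 and 3 are almost degenerate here, the smallest singular value of the product stays above that of $U^{(1)}$, which is the mechanism that keeps the level sets bounded.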

From a practical point of view, in an ALS algorithm, the constraint could be imposed by replacing the current estimate of $U^{(1)}$ by its best approximation with condition number at most $k$. This approximation is simply obtained by replacing the singular values that are more than $k$ times smaller than the dominant singular value $\sigma_1$, by $\sigma_1 / k$.
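The singular-value replacement described above can be sketched in Python/NumPy as follows. This is a minimal illustration under stated assumptions, not the author's implementation: the function name `clip_condition_number` and the test matrix are hypothetical, and how the step would be interleaved with the other ALS updates is left open.

```python
import numpy as np

def clip_condition_number(U, k):
    """Replace singular values of U below sigma_1 / k by sigma_1 / k."""
    W, s, Vh = np.linalg.svd(U, full_matrices=False)
    s_clipped = np.maximum(s, s[0] / k)    # s is sorted in descending order, so s[0] is sigma_1
    return (W * s_clipped) @ Vh            # scale the columns of W, then recombine

rng = np.random.default_rng(2)

# Hypothetical ill-conditioned estimate: the first two columns are nearly collinear
base = rng.standard_normal((6, 1))
U1 = np.hstack([base,
                base + 1e-6 * rng.standard_normal((6, 1)),
                rng.standard_normal((6, 1))])

k = 10.0
U1_clipped = clip_condition_number(U1, k)
cond_before = np.linalg.cond(U1)
cond_after = np.linalg.cond(U1_clipped)
print(cond_before, cond_after)             # cond_after should not exceed k, up to rounding
```

Note that the clipped matrix generally no longer has unit-norm columns; rescaling the columns afterwards can change the condition number again, a point the text does not address.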

3. Zero-correlation in one mode

One can also impose that the factors in one mode are uncorrelated. In that case, the matrix $U^{(1)}$ is of the form $\mathbf{1} \cdot m^T + Q \cdot \Omega$, in which $\mathbf{1}$ is a vector that contains only ones, $m$ contains the means of the different factors, $Q$ is column-wise orthogonal and $\Omega$ is diagonal. Even when some entries of $\Omega$ become big, this cannot lead to degeneracy, cf. Section 1. However, entries of $m$ may also become big and may mutually cancel. This means that degeneracy cannot be completely avoided. If it occurs, two or more columns of $U^{(1)}$ become proportional to $\mathbf{1}$. This is an event that happens with probability zero.
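To make the structure $\mathbf{1} \cdot m^T + Q \cdot \Omega$ concrete, the Python/NumPy sketch below builds such a matrix and checks that its columns have the prescribed means and zero mutual correlation. It assumes, as implied by $m$ holding the column means, that the columns of $Q$ are orthonormal and orthogonal to $\mathbf{1}$. The function names (`uncorrelated_factor`, `cosine_with_ones`) and all numerical values are hypothetical. The last lines illustrate, for large entries of $m$, how columns approach the direction of $\mathbf{1}$.

```python
import numpy as np

def uncorrelated_factor(n_rows, m, omega, rng):
    """Build 1 m^T + Q Omega with Q orthonormal and orthogonal to the all-ones vector."""
    m = np.asarray(m, dtype=float)
    omega = np.asarray(omega, dtype=float)
    R = m.size
    ones = np.ones((n_rows, 1))
    # QR of [1, random]: the first basis vector is proportional to 1,
    # the next R basis vectors are orthonormal and orthogonal to 1 (needs n_rows >= R + 1)
    basis, _ = np.linalg.qr(np.hstack([ones, rng.standard_normal((n_rows, R))]))
    Q = basis[:, 1:R + 1]
    return ones @ m[None, :] + Q * omega   # Q * omega scales column r of Q by omega[r]

def cosine_with_ones(u):
    # Absolute cosine of the angle between u and the all-ones vector
    ones = np.ones_like(u)
    return abs(ones @ u) / (np.linalg.norm(ones) * np.linalg.norm(u))

rng = np.random.default_rng(3)
m = [1.0, -2.0, 0.5]
U = uncorrelated_factor(8, m, [3.0, 100.0, 0.1], rng)
means = U.mean(axis=0)                     # should reproduce m
corr = np.corrcoef(U, rowvar=False)        # should be close to the identity matrix

# Large, mutually cancelling means: the first two columns become almost proportional to 1
U_big = uncorrelated_factor(8, [1e6, -1e6, 0.5], [3.0, 3.0, 0.1], rng)
cos0 = cosine_with_ones(U_big[:, 0])
cos1 = cosine_with_ones(U_big[:, 1])
print(means, cos0, cos1)
```

In this example the large entries of $\Omega$ leave the columns uncorrelated and well separated, while the large entries of $m$ push the first two columns towards the direction of $\mathbf{1}$, in line with the remark above.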