Singular values of tensors are Lagrange multipliers

Citation for published version (APA):

Graaf, de, J. (2011). Singular values of tensors are Lagrange multipliers. (CASA-report; Vol. 1102). Technische Universiteit Eindhoven.

Document status and date: Published: 01/01/2011

Document Version:

Publisher’s PDF, also known as Version of Record (includes final page, issue and volume numbers)


EINDHOVEN UNIVERSITY OF TECHNOLOGY

Department of Mathematics and Computer Science

CASA-Report 11-02 January 2011

Singular values of tensors are Lagrange multipliers

by J. de Graaf

Centre for Analysis, Scientific computing and Applications Department of Mathematics and Computer Science

Eindhoven University of Technology P.O. Box 513

5600 MB Eindhoven, The Netherlands ISSN: 0926-4507

Singular Values of Tensors are Lagrange Multipliers

J. de GRAAF

Abstract

This note has been inspired by the Master's Thesis of Femke van Belzen [B]. The Lagrange Multiplier Method is applied to construct a singular value decomposition for tensors in such a way that the number of zero terms in the 'Singular Value Expansion' is as large as possible.

The result is a bit disappointing in the sense that one would like to get 'a lot more' zeros in the $q$-dimensional number scheme of 'edge-length $M$', $q \ge 3$, which represents a $q$-tensor on a vector space of dimension $M$. It is shown that the maximum number of zeros one can get is $q \dim SO(M) = \tfrac{q}{2} M(M-1)$. This number is comparable in size with $M^q$ only if $q = 2$.

Eindhoven, June 2007.

1 Singular value decomposition for matrices

An argument from Linear Algebra

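Consider the matrix $A \in \mathbb{R}^{n\times m}$ as a mapping $A : \mathbb{R}^m \to \mathbb{R}^n$. There exist an orthogonal matrix $S \in \mathbb{R}^{m\times m}$ and a diagonal matrix $\Sigma \in \mathbb{R}^{m\times m}$ such that

$$S^\top A^\top A S = \Sigma^2 = \mathrm{diag}[\sigma_1^2, \sigma_2^2, \ldots, \sigma_r^2, 0, \ldots, 0], \tag{1.1}$$

with $\sigma_1 \ge \sigma_2 \ge \ldots \ge \sigma_r > 0$ and $r \le m$. The identity (1.1) says that $AS \in \mathbb{R}^{n\times m}$ has $r$ mutually orthogonal columns with respective norms $\sigma_1, \sigma_2, \ldots, \sigma_r$. Hence the columns of $AS$ with column index $> r$, if any, are $0$. Since the columns of $AS$ are columns in $\mathbb{R}^n$ we also have $r \le n$. So $r \le \min(n, m)$.

We now first introduce an orthogonal matrix $Q \in \mathbb{R}^{n\times n}$: for its $j$-th column, $1 \le j \le r$, we take the $j$-th column of $AS$, divided by $\sigma_j$. If it happens that $r < n$, the remaining $n - r$ columns are chosen such that $Q$ becomes orthogonal.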

Next we introduce $\Sigma' \in \mathbb{R}^{n\times m}$: The 'diagonal' of $\Sigma'$ has length $\min(n, m)$. The first $r$ numbers on this diagonal are put equal to $\sigma_1, \ldots, \sigma_r$. The remaining numbers, if any, are $0$. All the other entries of $\Sigma'$ are set to $0$ as well. Thus we arrive at

$$AS = Q\Sigma'. \tag{1.2}$$

The main result of this section now follows:

$$A = Q\Sigma' S^\top. \tag{1.3}$$

The columns of $Q = [q_1, \ldots, q_n]$ and $S = [s_1, \ldots, s_m]$ establish orthonormal bases in $\mathbb{R}^n$ and $\mathbb{R}^m$, respectively. Alternatively, the result (1.3) can be written

$$A = \sum_{j=1}^{r} \sigma_j\, q_j s_j^\top = \sum_{j=1}^{r} \sigma_j\, q_j \otimes s_j. \tag{1.4}$$
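The relation between (1.4) and (1.1) can be checked numerically on a small example. The sketch below is only an illustration: the vectors `q`, `s` and the values in `sigma` are hypothetical data chosen so that the arithmetic is easy to follow, and the helper names are not taken from the note.

```python
def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

# Hypothetical data: n = 3, m = 2, r = 2.
sigma = [5.0, 2.0]
q = [[0.6, 0.8, 0.0], [-0.8, 0.6, 0.0]]   # orthonormal q_1, q_2 in R^3
s = [[0.8, 0.6], [-0.6, 0.8]]             # orthonormal s_1, s_2 in R^2
n, m = 3, 2

# Build A = sum_j sigma_j q_j s_j^T as in (1.4).
A = [[sum(sigma[j] * q[j][i] * s[j][k] for j in range(2)) for k in range(m)]
     for i in range(n)]

# S has the s_j as columns, so (AS)^T (AS) = S^T A^T A S, the left side of (1.1).
S = transpose(s)
AS = matmul(A, S)
gram = matmul(transpose(AS), AS)

for row in gram:
    print([round(entry, 12) for entry in row])
```

Up to rounding, `gram` is the diagonal matrix with entries $\sigma_1^2 = 25$ and $\sigma_2^2 = 4$, in agreement with (1.1).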

An argument from Analysis

As a preliminary exercise for the great things to come in the next section I now want to derive the formulae (1.3)-(1.4) by means of the method of Lagrange Multipliers.

On $\mathbb{R}^m \times \mathbb{R}^n$ consider the bilinear function

$$\mathbb{R}^m \times \mathbb{R}^n \ni [x; y] \mapsto T(x, y) = y^\top A x = x^\top A^\top y \in \mathbb{R}. \tag{1.5}$$

We study restricted maxima of $T$ on the product of unit spheres

$$S^{m-1} \times S^{n-1} = \{[x; y] \in \mathbb{R}^m \times \mathbb{R}^n \mid |x| = 1,\ |y| = 1\} \subset \mathbb{R}^m \times \mathbb{R}^n, \tag{1.6}$$

which is a compact set, by means of the Lagrange multiplier method for finding conditional extrema. Tacitly assuming the standard inner product on $\mathbb{R}^n$ we write derivatives as columns. Note that $\partial_x T(x, y) = T(\cdot, y)$ and $\partial_y T(x, y) = T(x, \cdot)$.

A pair $[x_1; y_1]$ where $T$ takes its maximum value on $S^{m-1} \times S^{n-1}$ satisfies the system

$$T(\cdot, y_1) = A^\top y_1 = \alpha_1 x_1, \qquad T(x_1, \cdot) = A x_1 = \alpha_2 y_1, \tag{1.7}$$

with multipliers $\alpha_1, \alpha_2$. It immediately follows that $\alpha_1 = \alpha_2 = x_1^\top A^\top y_1 = \alpha$, say, and $AA^\top y_1 = \alpha^2 y_1$, $A^\top A x_1 = \alpha^2 x_1$. Write $\alpha = T_{11}$.

From (1.7) it follows that for any $\xi_1 \perp x_1$ we have $T(\xi_1, y_1) = 0$ and for any $\eta_1 \perp y_1$ we have $T(x_1, \eta_1) = 0$.
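One way to see the system (1.7) at work is to maximise $T$ alternately in $x$ and in $y$: for fixed $x$ the best unit vector $y$ is proportional to $Ax$, and for fixed $y$ the best $x$ is proportional to $A^\top y$. The sketch below is an assumption-laden illustration, not the method of the note: the matrix `A`, the starting vector and the number of sweeps are hypothetical choices, and the alternating scheme itself is just a convenient way to reach a solution of (1.7).

```python
import math

def mat_vec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def normalize(v):
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]

def top_singular_pair(A, sweeps=200):
    """Alternately maximise T(x, y) = y^T A x over y and over x on the unit spheres."""
    At = [list(col) for col in zip(*A)]
    x = normalize([1.0] * len(A[0]))      # arbitrary start (an assumption)
    for _ in range(sweeps):
        y = normalize(mat_vec(A, x))      # best y for fixed x
        x = normalize(mat_vec(At, y))     # best x for fixed y
    alpha = sum(yi * c for yi, c in zip(y, mat_vec(A, x)))
    return x, y, alpha

A = [[2.0, 0.0], [1.0, 2.0]]              # hypothetical 2 x 2 example
At = [list(col) for col in zip(*A)]
x1, y1, alpha = top_singular_pair(A)

# Residuals of the two equations in (1.7) with alpha_1 = alpha_2 = alpha.
res_x = [a - alpha * b for a, b in zip(mat_vec(At, y1), x1)]
res_y = [a - alpha * b for a, b in zip(mat_vec(A, x1), y1)]
print(alpha, max(abs(r) for r in res_x + res_y))
```

For this matrix $A^\top A$ has eigenvalues $(9 \pm \sqrt{17})/2$, so the multiplier found should agree with $\sqrt{(9 + \sqrt{17})/2}$, the largest singular value.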

For the 2nd step we look for extrema on the set

$$S^{m-1} \times S^{n-1} \cap \{x_1\}^\perp \cap \{y_1\}^\perp = \{[x; y] \in \mathbb{R}^m \times \mathbb{R}^n \mid |x| = 1,\ |y| = 1,\ x_1^\top x = 0,\ y_1^\top y = 0\}. \tag{1.8}$$

Because of compactness the extremum exists at $[x_2; y_2]$, say. It necessarily satisfies the Lagrangian conditions with multipliers $\beta_1, \beta_2, T_{12}, T_{21}$,

$$T(\cdot, y_2) = A^\top y_2 = \beta_1 x_2 + T_{12} x_1, \qquad T(x_2, \cdot) = A x_2 = \beta_2 y_2 + T_{21} y_1. \tag{1.9}$$

It immediately follows that $\beta_1 = \beta_2 = x_2^\top A^\top y_2 = \beta = T_{22}$, say. From (1.7) we find $T(x_2, y_1) = 0$ and $T(x_1, y_2) = 0$. Taking inner products in (1.9) with $x_1$ and $y_1$, and using the orthogonality constraints in (1.8), then shows that $T_{12} = T_{21} = 0$ and, finally, that $AA^\top y_2 = \beta^2 y_2$, $A^\top A x_2 = \beta^2 x_2$.

We proceed inductively. For the $k$-th step, $1 \le k \le \min(m, n)$, we look for extrema on the set

$$S^{m-1} \times S^{n-1} \cap \{x_1, \ldots, x_{k-1}\}^\perp \cap \{y_1, \ldots, y_{k-1}\}^\perp = \{[x; y] \in \mathbb{R}^m \times \mathbb{R}^n \mid |x| = 1,\ |y| = 1,\ x_1^\top x = \ldots = x_{k-1}^\top x = 0,\ y_1^\top y = \ldots = y_{k-1}^\top y = 0\}. \tag{1.10}$$

Because of compactness the extremum exists at $[x_k; y_k]$, say. It necessarily satisfies the Lagrangian conditions with multipliers $T_{1k}, \ldots, T_{(k-1)k}, T_{k1}, \ldots, T_{k(k-1)}$,

$$T(\cdot, y_k) = A^\top y_k = \lambda_k x_k + T_{1k} x_1 + \cdots + T_{(k-1)k} x_{k-1}, \qquad T(x_k, \cdot) = A x_k = \mu_k y_k + T_{k1} y_1 + \cdots + T_{k(k-1)} y_{k-1}. \tag{1.11}$$

As before $\lambda_k = \mu_k = x_k^\top A^\top y_k$, $AA^\top y_k = \lambda_k^2 y_k$, $A^\top A x_k = \lambda_k^2 x_k$ and also $T(x_\ell, y_k) = 0$ if $k \ne \ell$.

Our construction has to stop if $k$ reaches $\min(m, n)$! Suppose $m > n$, and extend the orthonormal set $\{x_1, \ldots, x_n\}$ to an orthonormal basis $\{x_1, \ldots, x_m\}$. Then for $\ell$, $n < \ell \le m$, and all $k$, $1 \le k \le n$, the 1st line in (1.11) tells us $T(x_\ell, y_k) = 0$.

In this way we find again the representation (1.4).
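The inductive construction can be imitated numerically by repeating an alternating maximisation on the orthogonal complements of the vectors already found. The following is a minimal sketch under stated assumptions: the matrix `A`, the start vector and the fixed number of sweeps are hypothetical, and the projected alternating iteration is one possible way to reach solutions of (1.11), not a procedure prescribed by the note.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def mat_vec(A, x):
    return [dot(row, x) for row in A]

def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [c / norm for c in v]

def project_out(v, basis):
    """Remove from v its components along the orthonormal vectors in basis."""
    for b in basis:
        c = dot(v, b)
        v = [vi - c * bi for vi, bi in zip(v, b)]
    return v

def successive_pairs(A, sweeps=300):
    """Step k maximises y^T A x subject to x, y being orthogonal to earlier x_i, y_i."""
    At = [list(col) for col in zip(*A)]
    n, m = len(A), len(A[0])
    xs, ys = [], []
    for _ in range(min(m, n)):
        x = normalize(project_out([1.0] * m, xs))   # arbitrary start (an assumption)
        for _ in range(sweeps):
            y = normalize(project_out(mat_vec(A, x), ys))
            x = normalize(project_out(mat_vec(At, y), xs))
        xs.append(x)
        ys.append(y)
    return xs, ys

A = [[2.0, 0.0, 0.0], [1.0, 2.0, 0.0], [0.0, 0.0, 1.0]]   # hypothetical example
xs, ys = successive_pairs(A)

# Tvals[l][k] = T(x_l, y_k) = y_k^T A x_l in the constructed bases.
Tvals = [[dot(ys[k], mat_vec(A, xs[l])) for k in range(3)] for l in range(3)]
for row in Tvals:
    print([round(entry, 10) for entry in row])
```

In this example the off-diagonal values $T(x_\ell, y_k)$, $k \ne \ell$, vanish up to rounding, and the diagonal carries the multipliers $\lambda_1 \ge \lambda_2 \ge \lambda_3$.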

Another way of finding the necessary zeros is the following: From (1.11) it follows that $T_{\ell k} = 0$ if $\ell > k$ and also $T_{\ell k} = 0$ if $k > \ell$. Therefore $T_{\ell k} \ne 0$ is only possible if $\ell = k$.

Finally from (1.11) it follows

$$T(x_\ell, y_k) = 0 \ \text{ if } k < \ell \le m \text{ and } 1 \le k \le n, \qquad T(x_k, y_r) = 0 \ \text{ if } k < r \le n \text{ and } 1 \le k \le n.$$

Counting the zeros:

$$(m-1) + \ldots + (m-n) + (n-1) + \ldots + 1 = \tfrac{1}{2} n(2m - n - 1) + \tfrac{1}{2} n(n-1) = n(m-1) = nm - n.$$

A satisfying result!
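The count can be cross-checked by listing the index pairs directly. In the sketch below the function name `zero_positions_matrix` is hypothetical; it collects the pairs $(\ell, k)$ for which $T(x_\ell, y_k) = 0$ according to the two conditions above, assuming $m \ge n$.

```python
def zero_positions_matrix(m, n):
    """Index pairs (l, k), 1 <= l <= m, 1 <= k <= n, with T(x_l, y_k) = 0; assumes m >= n."""
    zeros = set()
    for k in range(1, n + 1):
        zeros.update((l, k) for l in range(k + 1, m + 1))   # T(x_l, y_k) = 0, k < l <= m
        zeros.update((k, r) for r in range(k + 1, n + 1))   # T(x_k, y_r) = 0, k < r <= n
    return zeros

for m, n in [(2, 2), (4, 3), (7, 5)]:
    count = len(zero_positions_matrix(m, n))
    print(m, n, count, n * (m - 1))
```

The two printed numbers coincide for each pair, which is the statement that exactly the $n$ 'diagonal' entries of the $m \times n$ scheme can be nonzero.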

2 Singular value decomposition for covariant 3-tensors

On the Cartesian product $X \times Y \times Z$ we consider the tri-linear function

$$T(\cdot, \cdot, \cdot) : X \times Y \times Z \to \mathbb{R}. \tag{2.1}$$

We suppose the vector spaces to be finite dimensional, $\dim X = K$, $\dim Y = L$, $\dim Z = M$. Each of them is supplied with a positive definite inner product. The latter enables us to identify each of the vector spaces with its dual.

Our aim is to find orthonormal bases $\{x_k\}_{k=1}^{K} \subset X$, $\{y_\ell\}_{\ell=1}^{L} \subset Y$, $\{z_m\}_{m=1}^{M} \subset Z$, such that the expansion coefficients in

$$T = T^{k\ell m}\, x_k \otimes y_\ell \otimes z_m \tag{2.2}$$

contain as many zeros as possible.

We study restricted maxima of $T$ on the product of unit spheres

$$S_X \times S_Y \times S_Z = \{[x; y; z] \in X \times Y \times Z \mid |x| = 1,\ |y| = 1,\ |z| = 1\} \subset X \times Y \times Z, \tag{2.3}$$

which is a compact set, by means of the Lagrange multiplier method for finding conditional extrema. Because of the fixed inner products we consider partial derivatives as vectors:

$$\partial_x T(x, y, z) = T(\cdot, y, z) \in X, \qquad \partial_y T(x, y, z) = T(x, \cdot, z) \in Y, \qquad \partial_z T(x, y, z) = T(x, y, \cdot) \in Z.$$

With respect to any choice of orthonormal bases this corresponds to good old rituals like

$$\frac{\partial}{\partial x^j}\, T_{k\ell m}\, x^k y^\ell z^m = T_{j\ell m}\, y^\ell z^m, \qquad \text{with } T^{k\ell m} = T_{k\ell m}.$$

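Step 1: A triple $[x_1; y_1; z_1]$ where $T$ takes its maximum value on $S_X \times S_Y \times S_Z$ exists and necessarily satisfies the system

$$\begin{cases} T(\cdot, y_1, z_1) = \lambda_{11} x_1, \\ T(x_1, \cdot, z_1) = \lambda_{12} y_1, \\ T(x_1, y_1, \cdot) = \lambda_{13} z_1, \end{cases} \tag{2.4}$$

with multipliers $\lambda_{1j}$, $1 \le j \le 3$.

It immediately follows that

$$\lambda_{11} = \lambda_{12} = \lambda_{13} = T(x_1, y_1, z_1) = T_{111}. \tag{2.5}$$

Also note that

$$T(\xi_1, y_1, z_1) = 0, \quad T(x_1, \eta_1, z_1) = 0, \quad T(x_1, y_1, \zeta_1) = 0, \quad \text{whenever } \xi_1 \perp x_1,\ \eta_1 \perp y_1,\ \zeta_1 \perp z_1. \tag{2.6}$$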
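A solution of the system (2.4) can be approximated by cyclically replacing each of $x$, $y$, $z$ by the normalised partial derivative of $T$ with respect to it. The sketch below makes several assumptions that are not in the note: the tensor `T` is a hypothetical $2 \times 2 \times 2$ example built from two orthonormal vectors `u`, `v`, the start vectors are arbitrary, and the alternating iteration is only guaranteed to produce a critical point of $T$ on the product of spheres, which need not be the global maximum in general.

```python
import math

def normalize(v):
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]

def contract(T, x, y, z, free):
    """Return T(., y, z), T(x, ., z) or T(x, y, .) for free = 0, 1, 2."""
    dims = (len(T), len(T[0]), len(T[0][0]))
    out = [0.0] * dims[free]
    for i in range(dims[0]):
        for j in range(dims[1]):
            for k in range(dims[2]):
                w = [x[i], y[j], z[k]]
                w[free] = 1.0                      # the free slot is left open
                out[(i, j, k)[free]] += T[i][j][k] * w[0] * w[1] * w[2]
    return out

def critical_triple(T, x, y, z, sweeps=100):
    """Cyclic updates x <- T(., y, z)/|.|, y <- T(x, ., z)/|.|, z <- T(x, y, .)/|.|."""
    for _ in range(sweeps):
        x = normalize(contract(T, x, y, z, 0))
        y = normalize(contract(T, x, y, z, 1))
        z = normalize(contract(T, x, y, z, 2))
    lam = sum(c * xi for c, xi in zip(contract(T, x, y, z, 0), x))
    return x, y, z, lam

# Hypothetical tensor T = 2 u(x)u(x)u + v(x)v(x)v with orthonormal u, v.
u, v = [0.6, 0.8], [-0.8, 0.6]
T = [[[2.0 * u[i] * u[j] * u[k] + v[i] * v[j] * v[k] for k in range(2)]
      for j in range(2)] for i in range(2)]
x1, y1, z1, lam = critical_triple(T, [1.0, 0.0], [1.0, 0.0], [1.0, 0.0])

# Residuals of the three equations in (2.4) with a common multiplier lam.
residual = max(abs(g - lam * c)
               for free, vec in enumerate((x1, y1, z1))
               for g, c in zip(contract(T, x1, y1, z1, free), vec))
print(lam, x1, residual)
```

For this particular tensor the iteration settles at $x_1 = y_1 = z_1 = u$ with common multiplier $T_{111} = 2$, in line with (2.5).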

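Step 2: A triple $[x_2; y_2; z_2]$ where $T$ takes its maximum value on

$$S_X \times S_Y \times S_Z \cap \{x_1\}^\perp \cap \{y_1\}^\perp \cap \{z_1\}^\perp = \{[x; y; z] \mid |x| = 1,\ |y| = 1,\ |z| = 1,\ (x_1 \cdot x) = 0,\ (y_1 \cdot y) = 0,\ (z_1 \cdot z) = 0\} \subset X \times Y \times Z,$$

exists and necessarily satisfies the system

$$\begin{cases} T(\cdot, y_2, z_2) = \lambda_{21} x_2 + \mu_{122} x_1, \\ T(x_2, \cdot, z_2) = \lambda_{22} y_2 + \mu_{212} y_1, \\ T(x_2, y_2, \cdot) = \lambda_{23} z_2 + \mu_{221} z_1, \end{cases} \tag{2.7}$$

with multipliers $\lambda_{2j}$, $1 \le j \le 3$, and $\mu_{122}, \mu_{212}, \mu_{221}$.

It immediately follows that, adapting the notation,

$$\lambda_{21} = \lambda_{22} = \lambda_{23} = T(x_2, y_2, z_2) = T_{222} = \lambda_{222}. \tag{2.8}$$

Also note that

$$T(\xi_2, y_2, z_2) = 0, \quad T(x_2, \eta_2, z_2) = 0, \quad T(x_2, y_2, \zeta_2) = 0, \quad \text{whenever } \xi_2 \in \{x_1, x_2\}^\perp,\ \eta_2 \in \{y_1, y_2\}^\perp,\ \zeta_2 \in \{z_1, z_2\}^\perp. \tag{2.9}$$

Finally

$$T(x_1, y_2, z_2) = T_{122} = \mu_{122}, \quad T(x_2, y_1, z_2) = T_{212} = \mu_{212}, \quad T(x_2, y_2, z_1) = T_{221} = \mu_{221}. \tag{2.10}$$

We proceed inductively.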

Step $m$: Let $m \le \min\{K, L, M\}$. Suppose for convenience $\min\{K, L, M\} = M$. A triple $[x_m; y_m; z_m]$ where $T$ takes its maximum value on

$$S_X \times S_Y \times S_Z \cap \{x_1, \ldots, x_{m-1}\}^\perp \cap \{y_1, \ldots, y_{m-1}\}^\perp \cap \{z_1, \ldots, z_{m-1}\}^\perp = \{[x; y; z] \mid |x| = |y| = |z| = 1,\ (x_1 \cdot x) = \ldots = (x_{m-1} \cdot x) = 0,\ (y_1 \cdot y) = \ldots = (y_{m-1} \cdot y) = 0,\ (z_1 \cdot z) = \ldots = (z_{m-1} \cdot z) = 0\} \subset X \times Y \times Z,$$

exists and necessarily satisfies the system

$$1 \le m \le M: \quad \begin{cases} T(\cdot, y_m, z_m) = \lambda_{m1} x_m + \mu_{1mm} x_1 + \ldots + \mu_{(m-1)mm} x_{m-1}, \\ T(x_m, \cdot, z_m) = \lambda_{m2} y_m + \mu_{m1m} y_1 + \ldots + \mu_{m(m-1)m} y_{m-1}, \\ T(x_m, y_m, \cdot) = \lambda_{m3} z_m + \mu_{mm1} z_1 + \ldots + \mu_{mm(m-1)} z_{m-1}, \end{cases} \tag{2.11}$$

with multipliers $\lambda_{mj}$, $1 \le j \le 3$, and $\mu_{imm}, \mu_{mim}, \mu_{mmi}$, $1 \le i \le m - 1$.

It immediately follows that, adapting the notation,

$$\lambda_{m1} = \lambda_{m2} = \lambda_{m3} = T(x_m, y_m, z_m) = T_{mmm} = \lambda_{mmm}. \tag{2.12}$$

Also note that

$$T(\xi_m, y_m, z_m) = 0, \quad T(x_m, \eta_m, z_m) = 0, \quad T(x_m, y_m, \zeta_m) = 0, \quad \text{whenever } \xi_m \in \{x_1, \ldots, x_m\}^\perp,\ \eta_m \in \{y_1, \ldots, y_m\}^\perp,\ \zeta_m \in \{z_1, \ldots, z_m\}^\perp. \tag{2.13}$$

Finally

$$T(x_j, y_m, z_m) = T_{jmm} = \mu_{jmm}, \quad T(x_m, y_j, z_m) = T_{mjm} = \mu_{mjm}, \quad T(x_m, y_m, z_j) = T_{mmj} = \mu_{mmj}, \quad \text{for } 1 \le j \le m - 1. \tag{2.14}$$

Special Case: $K = L = M$. If we have reached $m = M$ we have found an orthonormal basis in each of the spaces $X, Y, Z$. From (2.13) we find that always $T_{jmm} = 0$, $T_{mjm} = 0$ and $T_{mmj} = 0$, whenever $j > m$. The number of zeros in the expansion (2.2) therefore is $3\big((M-1) + (M-2) + \cdots + 1\big) = \tfrac{3}{2} M(M-1)$. This number happens to be 3 times the dimension of the orthogonal group $SO(M)$, which parametrizes the set of orthonormal bases. In general no more zeros in the set of components $\{T_{ijk}\}$ are to be expected since $M^3$ numbers are needed to fix a 3-tensor with respect to prescribed bases.

The position of the zeros in the 3-dimensional array of numbers $\{T_{k\ell m}\}_{k,\ell,m=1}^{M}$ can be visualised as the union of 3 'wings', like the tail of an arrow:

$$T_{k\ell m} = 0 \quad \text{if} \quad [k\ \ell\ m] \in \big\{ [p+n\ \ p\ \ p],\ [p\ \ p+n\ \ p],\ [p\ \ p\ \ p+n] \ \big|\ p, n \in \mathbb{N},\ p + n \le M \big\}.$$

Question: Then, $\tfrac{3}{2} M(M-1)$ being 'negligible' with respect to $M^3$, what is the advantage of the representation (2.2) with respect to the just constructed orthonormal basis?! Maybe the fact that $SO(M)$ is a compact group is of some advantage!?
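For $K = L = M$ the whole construction, including the position of the 'wings', can be tried out on a small example. The sketch below is an illustration under explicit assumptions: the $3 \times 3 \times 3$ tensor `T` is hypothetical (a diagonal part plus a small deterministic perturbation), indices are 0-based, and each step is approximated by a projected cyclic iteration with a fixed number of sweeps. That iteration yields critical points of the constrained problem, which suffices for the zeros derived from (2.11) but does not certify a global maximum.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [c / norm for c in v]

def project_out(v, basis):
    """Remove from v its components along the orthonormal vectors in basis."""
    for b in basis:
        c = dot(v, b)
        v = [vi - c * bi for vi, bi in zip(v, b)]
    return v

def contract(T, vec, free):
    """Contract T with vec[0], vec[1], vec[2], leaving slot `free` open."""
    M = len(T)
    out = [0.0] * M
    for i in range(M):
        for j in range(M):
            for k in range(M):
                idx = (i, j, k)
                w = 1.0
                for f in range(3):
                    if f != free:
                        w *= vec[f][idx[f]]
                out[idx[free]] += T[i][j][k] * w
    return out

def build_bases(T, sweeps=300):
    """Steps 1..M: cyclic updates restricted to the complements of earlier vectors."""
    M = len(T)
    bases = [[], [], []]                       # x's, y's, z's found so far
    for _ in range(M):
        vec = [normalize(project_out([1.0] * M, b)) for b in bases]
        for _ in range(sweeps):
            for f in range(3):
                vec[f] = normalize(project_out(contract(T, vec, f), bases[f]))
        for f in range(3):
            bases[f].append(vec[f])
    return bases

M = 3
T = [[[(3.0 - i if i == j == k else 0.0) + 0.05 * math.sin(1 + i + 3 * j + 5 * k)
       for k in range(M)] for j in range(M)] for i in range(M)]
X, Y, Z = build_bases(T)

def component(a, b, c):
    """Component T(x_a, y_b, z_c) with respect to the constructed bases."""
    return dot(contract(T, [X[a], Y[b], Z[c]], 0), X[a])

wings = set()
for p in range(M):
    for n in range(1, M - p):
        wings |= {(p + n, p, p), (p, p + n, p), (p, p, p + n)}
largest = max(abs(component(*w)) for w in wings)
print(len(wings), 3 * M * (M - 1) // 2, largest)
```

The number of wing positions equals $\tfrac{3}{2} M(M-1)$, and the components at those positions vanish up to the accuracy of the iteration; the remaining components are in general nonzero.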

Special Case: $K > L > M$. After we have reached $m = M$, we continue inductively with the Lagrange multiplier procedure for

$$T(\cdot, \cdot, z_M) : X \times Y \times \{z_M\} \to \mathbb{R}, \tag{2.15}$$

restricted to the set

$$S_X \times S_Y \times \{z_M\} \cap \{x_1, \ldots, x_{\ell-1}\}^\perp \cap \{y_1, \ldots, y_{\ell-1}\}^\perp = \{[x; y; z_M] \mid |x| = |y| = 1,\ (x_1 \cdot x) = \ldots = (x_{\ell-1} \cdot x) = 0,\ (y_1 \cdot y) = \ldots = (y_{\ell-1} \cdot y) = 0\} \subset X \times Y \times Z,$$

and find the sets of equations for extensions of the orthonormal systems in $X$ and $Y$

$$M + 1 \le \ell \le L: \quad \begin{cases} T(\cdot, y_\ell, z_M) = \lambda_{\ell\ell M} x_\ell + \mu_{1\ell M} x_1 + \ldots + \mu_{(\ell-1)\ell M} x_{\ell-1}, \\ T(x_\ell, \cdot, z_M) = \lambda_{\ell\ell M} y_\ell + \mu_{\ell 1 M} y_1 + \ldots + \mu_{\ell(\ell-1)M} y_{\ell-1}. \end{cases} \tag{2.16}$$

In order to finish the construction of an orthonormal basis for $X$, we choose an extension of the orthonormal set $\{x_1, \ldots, x_L\} \subset X$ to an orthonormal basis $\{x_1, \ldots, x_K\} \subset X$.

We now count the zeros in the expansion (2.2) with respect to our orthonormal bases. From (2.11), (2.16) we gather

$$\begin{aligned}
T(x_k, y_m, z_m) &= 0 \quad \text{if } K \ge k > m \text{ and } 1 \le m \le M, \\
T(x_m, y_\ell, z_m) &= 0 \quad \text{if } L \ge \ell > m \text{ and } 1 \le m \le M, \\
T(x_m, y_m, z_j) &= 0 \quad \text{if } M \ge j > m \text{ and } 1 \le m \le M, \\
T(x_k, y_\ell, z_M) &= 0 \quad \text{if } K \ge k > \ell \text{ and } M + 1 \le \ell \le L, \\
T(x_\ell, y_j, z_M) &= 0 \quad \text{if } L \ge j > \ell \text{ and } M + 1 \le \ell \le L.
\end{aligned} \tag{2.17}$$

The corresponding amounts of zeros are, respectively,

$$\begin{aligned}
&(K-1) + (K-2) + \cdots + (K-M), \\
&(L-1) + (L-2) + \cdots + (L-M), \\
&(M-1) + (M-2) + \cdots + 1, \\
&\big(K-(M+1)\big) + \big(K-(M+2)\big) + \cdots + (K-L), \\
&\big(L-(M+1)\big) + \big(L-(M+2)\big) + \cdots + 1.
\end{aligned}$$

The total amount of zeros in the 'cube with sides $K$, $L$, $M$' then becomes

$$\tfrac{1}{2} L(2K - L - 1) + \tfrac{1}{2} L(L-1) + \tfrac{1}{2} M(M-1) = L(K-1) + \tfrac{1}{2} M(M-1). \tag{2.18}$$

Suggestions and Remarks

• I am not quite convinced that I caught all zeros in the latter case.

• A generalization to higher dimensions looks obvious. Apart from the bookkeeping! I have some suggestions for that.

• The graphical representation of the position of the zeros in the latter case forms a funny bird.

• Definition III.1 and Theorems III.2 and III.2 in Femke van Belzen's graduation paper should be compared with the bases constructed in the present note.
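The arithmetic behind (2.18) can be cross-checked by listing the index triples named in (2.17) and comparing their number with the closed form. This sketch only verifies the bookkeeping of (2.17) and (2.18); it cannot decide whether further zeros exist, which is the doubt raised in the first remark. The function name `zero_positions_tensor` is hypothetical.

```python
def zero_positions_tensor(K, L, M):
    """Index triples (k, l, m) listed in (2.17); assumes K >= L >= M >= 1."""
    zeros = set()
    for m in range(1, M + 1):
        zeros.update((k, m, m) for k in range(m + 1, K + 1))   # T(x_k, y_m, z_m) = 0
        zeros.update((m, l, m) for l in range(m + 1, L + 1))   # T(x_m, y_l, z_m) = 0
        zeros.update((m, m, j) for j in range(m + 1, M + 1))   # T(x_m, y_m, z_j) = 0
    for l in range(M + 1, L + 1):
        zeros.update((k, l, M) for k in range(l + 1, K + 1))   # T(x_k, y_l, z_M) = 0
        zeros.update((l, j, M) for j in range(l + 1, L + 1))   # T(x_l, y_j, z_M) = 0
    return zeros

def closed_form(K, L, M):
    """Right-hand side of (2.18)."""
    return L * (K - 1) + M * (M - 1) // 2

for K, L, M in [(3, 3, 3), (5, 4, 3), (6, 4, 2)]:
    print((K, L, M), len(zero_positions_tensor(K, L, M)), closed_form(K, L, M))
```

For $K = L = M$ the closed form reduces to $\tfrac{3}{2} M(M-1)$, the count found in the first special case.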

Reference

[B] Belzen, Femke van: Singular Value Decompositions and Low Rank Approximations of Multi-Linear Functionals. Master's Thesis, Dept. of Electrical Engineering, Eindhoven University of Technology, The Netherlands, 2007.

JdG, Eindhoven, June 2007.
