Real solving polynomial equations with semidefinite programming

(1)


Jean Bernard Lasserre - Monique Laurent - Philipp Rostalski

LAAS, Toulouse - CWI, Amsterdam - ETH, Zürich

LAW 2008


(2)

The problem

Given polynomials h₁, . . . , h_m ∈ R[x] = R[x1, . . . , xn]

• Compute all common real roots (assuming finitely many), i.e.

compute the real variety V^R(I) of the ideal I := (h₁, . . . , h_m)

• Find a basis of the real radical ideal I(V^R(I))

V^R(I) := {v ∈ Rⁿ | f (v) = 0 ∀f ∈ I}

I(V^R(I)) := {f ∈ R[x] | f(v) = 0 ∀v ∈ V^R(I)}

= {f ∈ R[x] | ∃m ∈ N, s_i ∈ R[x] : f^(2m) + Σ_i s_i² ∈ I}   [Real Nullstellensatz]
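A one-line worked instance of the Real Nullstellensatz certificate may help here. It uses the ideal I = (x₁² + x₂²) that appears later in the talk as Ex. 1; the choice f = x₁, m = 1, s₁ = x₂ is an illustration, not taken from the slides.

```latex
I = (x_1^2 + x_2^2): \qquad
\underbrace{x_1^{2}}_{f^{2m},\ f = x_1,\ m = 1} + \underbrace{x_2^{2}}_{s_1^2,\ s_1 = x_2} \in I
\;\Longrightarrow\; x_1 \in I(V_{\mathbb{R}}(I)),
\qquad \text{although } x_1 \notin I .
```

By symmetry x₂ is also in the real radical, which agrees with V^R(I) = {0}.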

(3)

Our contribution

1. A semidefinite characterization of I(V^R(I))

[as the kernel of some positive semidefinite moment matrix]

2. Assuming |V^R(I)| < ∞, an algorithm for finding:

• a generating set (border or Gröbner basis) of I(V^R(I))

• the real variety V^R(I)

Remarks about the method:

• real algebraic in nature: no complex roots computed

• works if V^R(I) is finite (even if V^C(I) is not)

• no preliminary Gröbner basis of I is needed

• numerical, based on semidefinite programming (SDP)

(4)

Plan of the talk

1. The moment-matrix method for V^R(I)

2. Adapt the moment-matrix method for V^C(I) [drop PSD]

3. Relate to the ‘prolongation-projection’ algorithm of Zhi and Reid for V^C(I)

4. Adapt the prolongation-projection algorithm for V^R(I)

[add PSD]

(5)

The complex case is well understood

Given an ideal I ⊆ R[x] with |V^C(I)| < ∞,

find the (complex) variety V^C(I) and the radical ideal I(V^C(I)).

Linear algebra in the finite dimensional space R[x]/I: need a linear basis of R[x]/I and a normal form algorithm.

V^C(I) can be computed e.g. with:

• Linear algebra methods: Eigenvalue method [Stetter-Möller, Stickelberger, Rouillier]

• Homotopy methods [Verschelde] . . .

Seidenberg [1974]: I(V^C(I)) = (I ∪ {q₁, . . . , q_n}), where

q_i is the square-free part of p_i, the monic generator of I ∩ R[x_i].

(6)

The eigenvalue method for |V^C(I)| < ∞, i.e. dim R[x]/I < ∞

Stickelberger theorem:

Let m_f be the ‘multiplication by f’ linear operator in R[x]/I.

1. The eigenvalues of m_f are {f(v) | v ∈ V^C(I)}.

2. The eigenvectors of m_f^T give the points v ∈ V^C(I):

M_f^T ζ_B,v = f(v) ζ_B,v ∀v ∈ V^C(I),

where M_f is the matrix of m_f in a basis B of R[x]/I and ζ_B,v := (b(v))_{b∈B}.

Moreover, when B is a set of monomials and 1 ∈ B, a border basis of I can be read directly from the multiplication matrices M_x1, . . . , M_xn.
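The eigenvalue method can be tried on a toy case. The sketch below is not from the talk: it assumes NumPy and a hypothetical ideal I = (x₁² − 3x₁ + 2, x₂ − x₁²) with roots (1, 1) and (2, 4), basis B = {1, x₁}, and multiplication matrices written down by hand. It reads each root off an eigenvector of M_x1^T, normalised so that the entry of b₁ = 1 equals 1.

```python
import numpy as np

# Hypothetical example ideal: I = (x1^2 - 3*x1 + 2, x2 - x1^2), basis B = {1, x1}.
# Column j of M_f holds the coordinates in B of f * b_j reduced modulo I.
M_x1 = np.array([[0.0, -2.0],   # x1*1 = x1,        x1*x1 = 3*x1 - 2
                 [1.0,  3.0]])
M_x2 = np.array([[-2.0, -6.0],  # x2*1 = 3*x1 - 2,  x2*x1 = 7*x1 - 6
                 [3.0,  7.0]])

def points_from_eigenvectors(M_first, M_all):
    """Recover the roots from eigenvectors of M_first^T (simple eigenvalues assumed)."""
    _, vecs = np.linalg.eig(M_first.T)
    points = []
    for zeta in vecs.T:
        zeta = zeta / zeta[0]  # now zeta = (b(v))_{b in B} with b_1(v) = 1
        # (M_f^T zeta)[0] = f(v) * zeta[0] = f(v), applied to f = x1 and f = x2
        points.append(tuple(float(np.real((M.T @ zeta)[0])) for M in M_all))
    return sorted(points)

roots = points_from_eigenvectors(M_x1, [M_x1, M_x2])
print(roots)
```

When M_x1 has repeated eigenvalues the eigenvectors are no longer unique, and a random linear combination of the M_xi would be a safer choice for M_first.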

(7)

Finding a linear basis B of R[x]/I and a basis G of the ideal I

• Typically: G is a Gröbner basis and B is the set of standard monomials for a given monomial ordering (e.g. via Buchberger’s algorithm)

• More generally: Assume B = {b₁ = 1, b₂, . . . , b_N} is a set of monomials with border ∂B := (x1B ∪ . . . ∪ xnB) \ B.

Write any border monomial as

x_i b_j = Σ_{k=1}^N a^(ij)_k b_k + g^(ij), with Σ_{k=1}^N a^(ij)_k b_k ∈ Span(B) and g^(ij) ∈ I.

Then: G := {g^(ij) | x_i b_j ∈ ∂B} is a (border) basis of I and carries the same information as the multiplication matrices M_x1, . . . , M_xn.

(8)

Counting real roots with the Hermite quadratic form

For f ∈ R[x]:

Hermite bilinear form: H_f : R[x]/I × R[x]/I → R, (g, h) ↦ Tr(M_{fgh})

Theorem: For f = 1

rank(H₁) = |V^C(I)|, Sign(H₁) = |V^R(I)|, Rad (H₁) = I(V^C(I))

• rank(H_f) = |{v ∈ V^C(I) | f(v) ≠ 0}|

• Sign(H_f)

= |{v ∈ V^R(I) | f (v) > 0}| − |{v ∈ V^R(I) | f (v) < 0}|
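The root-counting theorem for f = 1 can be checked numerically. The following sketch is an illustration only: it assumes NumPy and a hypothetical univariate ideal I = (x³ − x² + x − 1) with roots 1, i, −i, and builds H₁ in the monomial basis {1, x, x²} from traces of powers of the companion matrix.

```python
import numpy as np

# Hypothetical univariate example: I = (x^3 - x^2 + x - 1), roots 1, i, -i.
# Matrix of "multiplication by x" in the basis B = {1, x, x^2} of R[x]/I
# (last column: x * x^2 = x^3 = x^2 - x + 1 modulo I).
M_x = np.array([[0.0, 0.0,  1.0],
                [1.0, 0.0, -1.0],
                [0.0, 1.0,  1.0]])

def hermite_matrix(M, size):
    """H_1 in the monomial basis: entry (i, j) is Tr(M_x^(i+j))."""
    traces = [np.trace(np.linalg.matrix_power(M, k)) for k in range(2 * size - 1)]
    return np.array([[traces[i + j] for j in range(size)] for i in range(size)])

def rank_and_signature(H, tol=1e-9):
    """Rank and signature of a symmetric matrix from the signs of its eigenvalues."""
    eig = np.linalg.eigvalsh(H)
    pos, neg = int(np.sum(eig > tol)), int(np.sum(eig < -tol))
    return pos + neg, pos - neg

H1 = hermite_matrix(M_x, 3)
rank, signature = rank_and_signature(H1)
print(rank, signature)  # rank counts complex roots, signature counts real roots
```

The tolerance `tol` is an arbitrary choice for this small exact example; on noisy data the sign decision is as delicate as any numerical rank decision.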

(9)

To find V^R(I) and a basis of the real radical ideal I(V^R(I)) ...

... it suffices to have a linear basis B of R[x]/I(V^R(I)) and the multiplication matrices in R[x]/I(V^R(I))!

New tool: Moment matrices

For y ∈ R^(Nⁿ_2s): M_s(y) := (y_α+β)_{α,β ∈ Nⁿ_s},

where Nⁿ_s := {α ∈ Nⁿ | |α| = Σ_i α_i ≤ s} indexes the monomials x^α of degree ≤ s.

Motivation: For y = (v^α)_{α ∈ Nⁿ_2s} =: ζ_2s,v where v ∈ Rⁿ:

M_s(y) = ζ_s,v ζ_s,v^T ⪰ 0 and Ker M_s(y) ⊆ I(v).
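The motivating identity can be made concrete with a short sketch. It assumes NumPy; the helper names `exponents`, `zeta` and `moment_matrix` and the point v = (2, −1) are hypothetical choices for illustration. It builds M₁(y) for y = ζ_2,v and checks that the matrix is the rank-one product ζ_1,v ζ_1,v^T and that the polynomial x₁ − 2, which vanishes at v, lies in its kernel.

```python
import numpy as np

def exponents(n, d):
    """Exponent tuples alpha in N^n with |alpha| <= d."""
    if n == 1:
        return [(k,) for k in range(d + 1)]
    return [(k,) + rest for k in range(d + 1) for rest in exponents(n - 1, d - k)]

def zeta(v, d):
    """Moment vector of the point v: y_alpha = v^alpha for |alpha| <= d."""
    return {al: float(np.prod([vi ** ai for vi, ai in zip(v, al)]))
            for al in exponents(len(v), d)}

def moment_matrix(y, n, s):
    """M_s(y) = (y_{alpha+beta}) with |alpha|, |beta| <= s; also returns the index set."""
    idx = exponents(n, s)
    M = np.array([[y[tuple(a + b for a, b in zip(al, be))] for be in idx] for al in idx])
    return M, idx

v = (2.0, -1.0)                                   # hypothetical point in R^2
M, idx = moment_matrix(zeta(v, 2), n=2, s=1)
z = np.array([zeta(v, 1)[al] for al in idx])      # zeta_{1,v} in the same ordering

# Coefficient vector of p = x1 - 2, a polynomial vanishing at v.
coeffs = {(0, 0): -2.0, (1, 0): 1.0}
p = np.array([coeffs.get(al, 0.0) for al in idx])
print(M, M @ p)
```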

(10)

Real roots of I = (h₁, . . . , h_m) and PSD moment matrices

Lemma: For v ∈ V^R(I) and t ≥ D := max_j deg(h_j), the vector y = ζ_t,v = (v^α)_{|α|≤t} satisfies:

• the linear constraints (LC) [as v ∈ V^C(I)]: y^T vec(h_j x^α) = 0 ∀j = 1, . . . , m, ∀α s.t. |α| + deg(h_j) ≤ t

• the PSD constraint [as v ∈ Rⁿ]: M_⌊t/2⌋(y) ⪰ 0

Here vec(p) denotes the coefficient vector of the polynomial p.

Set: K_t := {y ∈ R^(Nⁿ_t) | (LC), M_⌊t/2⌋(y) ⪰ 0}

Obviously: K_t ⊇ cone(ζ_t,v | v ∈ V^R(I))

Theorem: ∃ t ≥ s ≥ D such that π_s(K_t) = cone(ζ_s,v | v ∈ V^R(I))

(11)

Semidefinite characterization of I(V^R(I))

Theorem 1: Let y be a generic element of K_t, i.e.

y lies in the relative interior of the cone K_t. Then (KerM_⌊t/2⌋(y)) ⊆ I(V^R(I))

with equality for t large enough.

• Geometric property of SDP:

y is generic ⇐⇒ rankM_⌊t/2⌋(y) is maximum

⇐⇒ KerM_⌊t/2⌋(y) ⊆ KerM_⌊t/2⌋(z) ∀z ∈ K_t

Thus: for v ∈ V^R(I), KerM_⌊t/2⌋(y) ⊆ KerM_⌊t/2⌋(ζ_t,v)⊆ I(v).

• Let {g₁, . . . , g_L} be a basis of I(V^R(I)). Real Nullstellensatz: g_l^(2m) + Σ_i s_i² = Σ_{j=1}^m u_j h_j.

This implies: g_l ∈ Ker M_⌊t/2⌋(y) for t large enough.

(12)

Stopping criterion when |V^R(I)| < ∞

Theorem 2: Let y be a generic element of K_t.

Assume one of the following two flatness conditions holds:

(F1) rankM_s(y) = rankM_s−1(y) for some D ≤ s ≤ ⌊t/2⌋

(Fd) rankM_s(y) = rankM_s−d(y) for some d = ⌈D/2⌉ ≤ s ≤ ⌊t/2⌋. Then:

• I(V^R(I)) = (KerM_s(y))

• Any basis B of the column space of M_s−1(y) is a basis of R[x]/I(V^R(I))

• The multiplication matrices can be constructed from M_s(y).

(13)

Sketch of proof: Assume rank M_s(y) = rank M_s−1(y).

• Thm [Curto-Fialkow 1996]: π_2s(y) has a flat extension ỹ ∈ R^(Nⁿ), i.e. such that rank M(ỹ) = rank M_s(y).

• Thm [La 2005]: As M(ỹ) ⪰ 0, (Ker M_s(y)) = Ker M(ỹ) is a real radical 0-dimensional ideal.

• I ⊆ (Ker M_s(y)) ⊆ I(V^R(I)), where the first inclusion follows from (LC) and the second holds because y is generic.

Thus: (Ker M_s(y)) = I(V^R(I)).

• B indexes a column basis of M_s−1(y) =⇒ B indexes a column basis of M(ỹ) =⇒ B is a basis of R[x]/Ker M(ỹ) = R[x]/I(V^R(I)).

Use linear dependencies in M_s(y) to construct the multiplication matrices.

(14)

The moment-matrix algorithm for V^R(I)

Input: h₁, . . . , h_m ∈ R[x]

Output: a basis B of R[x]/I(V^R(I)) and the multiplication matrices M_xi in R[x]/I(V^R(I))

Algorithm: For t ≥ D

Step 1: Compute a generic element y ∈ K_t.

Step 2: Check if (F1) or (Fd) holds.

If yes, return a column basis B of M_s−1(y) and M_xi = M_B⁻¹ P_i, where

• M_B := principal submatrix of M_s−1(y) indexed by B

• P_i := submatrix of M_s(y) with rows in B and columns in x_i B.

If no, go to Step 1 with t → t + 1.
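Step 2 can be sketched on a univariate toy case. The code below assumes NumPy and skips Step 1 entirely: instead of calling an SDP solver it uses the hypothetical moment vector y_k = 1^k + 2^k, which stands in for a generic y ∈ K_4 of an ideal with real roots {1, 2}. It checks (F1) for s = 2 and forms M_x = M_B⁻¹ P_x with B = {1, x}.

```python
import numpy as np

# Hypothetical input: moments y_k = 1^k + 2^k of a measure supported on {1, 2},
# standing in for a generic y in K_4 returned by an SDP solver (n = 1, s = 2).
y = np.array([1.0 ** k + 2.0 ** k for k in range(5)])

def hankel_moment_matrix(y, s):
    """Univariate M_s(y) = (y_{i+j}), 0 <= i, j <= s."""
    return np.array([[y[i + j] for j in range(s + 1)] for i in range(s + 1)])

M2, M1 = hankel_moment_matrix(y, 2), hankel_moment_matrix(y, 1)
flat = bool(np.linalg.matrix_rank(M2) == np.linalg.matrix_rank(M1))  # condition (F1)

# B = {1, x} indexes a column basis of M_1(y); x*B = {x, x^2}.
B, xB = [0, 1], [1, 2]
M_B = M2[np.ix_(B, B)]             # principal submatrix indexed by B
P_x = M2[np.ix_(B, xB)]            # rows in B, columns in x*B
M_x = np.linalg.solve(M_B, P_x)    # M_x = M_B^{-1} P_x
real_roots = np.sort(np.linalg.eigvals(M_x).real)
print(flat, M_x, real_roots)
```

In several variables the same two submatrices are taken for each x_i, and the roots are then read off as in the eigenvalue method.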

(15)

The algorithm terminates: (F1) holds for t large enough.

• For t ≥ t₀, Ker M_⌊t/2⌋(y) contains a Gröbner basis {g₁, . . . , g_L} of I(V^R(I)) for a total degree ordering.

• B := {b₁, . . . , b_N}: set of standard monomials, a basis of R[x]/I(V^R(I)).

Set: s := 1 + max_{b∈B} deg(b) and assume t ≥ t₀, ⌊t/2⌋ > s.

For |α| ≤ s, write x^α = Σ_{i=1}^N λ_i b_i + Σ_{l=1}^L u_l g_l, where Σ_i λ_i b_i has degree ≤ s − 1 and each u_l g_l has degree ≤ |α| ≤ s < ⌊t/2⌋.

Thus: x^α − Σ_{i=1}^N λ_i b_i ∈ Ker M_⌊t/2⌋(y). That is: rank M_s(y) = rank M_s−1(y).

(16)

Two small examples

Ex. 1: I = (h := x₁² + x₂²), V^R(I) = {0}, |V^C(I)| = ∞.

M₁(y) ⪰ 0, 0 = y^T vec(h) = y₂₀ + y₀₂ =⇒ y_α = 0 ∀α ≠ 0.

Any generic y ∈ K₂ is y = (y₀, 0, . . . , 0) with y₀ > 0.

Thus: (Ker M₁(y)) = (x1, x2) = I(V^R(I)).

Ex. 2: I = (h_i := x_i(x_i² + 1) | i = 1, . . . , n), V^R(I) = {0}, |V^C(I)| = 3ⁿ.

M₂(y) ⪰ 0, 0 = y^T vec(x_i h_i) = y_4ei + y_2ei ∀i =⇒ y_α = 0 ∀α ≠ 0.

Any generic y ∈ K₄ is y = (y₀, 0, . . . , 0) with y₀ > 0.

Thus: (Ker M₁(y)) = (x1, . . . , xn) = I(V^R(I)).
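The implication used in Ex. 1 can be spelled out. The derivation below is a filled-in reasoning step, not part of the slides; it only uses the standard facts that a PSD matrix has a nonnegative diagonal and that a zero diagonal entry forces the whole row and column to vanish. Rows and columns of M₁(y) are indexed by 1, x₁, x₂.

```latex
M_1(y) = \begin{pmatrix}
  y_{00} & y_{10} & y_{01} \\
  y_{10} & y_{20} & y_{11} \\
  y_{01} & y_{11} & y_{02}
\end{pmatrix} \succeq 0
\;\Longrightarrow\; y_{20} \ge 0,\ y_{02} \ge 0 .
\qquad
y_{20} + y_{02} = 0 \;\Longrightarrow\; y_{20} = y_{02} = 0
\;\Longrightarrow\; y_{10} = y_{01} = y_{11} = 0 .
```

Hence M₁(y) = diag(y₀₀, 0, 0), whose kernel is spanned by the coefficient vectors of x₁ and x₂. Ex. 2 follows the same pattern with y_4ei and y_2ei on the diagonal of M₂(y).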

(17)

Some algorithmic issues

How to find a generic y ∈ K_t, i.e. with rank M_⌊t/2⌋(y) maximum?

Solve the semidefinite program min_{y∈K_t} 1 with an SDP solver using the ‘extended self-dual embedding property’. Then the central path converges to a solution in the relative interior of the optimal face, i.e., to a generic point y ∈ K_t.

How to compute ranks of matrices?

We use the singular value decomposition (SVD), but this is a sensitive numerical issue ...

The method may work without (F1) or (Fd), namely when rank M_B(y) = rank M_B∪∂B(y) and the formal multiplication matrices commute.
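The sensitivity of the rank decision can be illustrated with a small sketch. It assumes NumPy; the relative threshold rule in `numerical_rank`, the noise level 1e-9 and the test matrix are hypothetical choices for illustration, not the procedure used in the talk.

```python
import numpy as np

def numerical_rank(M, rel_tol):
    """Count singular values above rel_tol times the largest one."""
    sv = np.linalg.svd(M, compute_uv=False)
    return int(np.sum(sv > rel_tol * sv[0])), sv

# Hypothetical data: the rank-2 Hankel moment matrix of the points {1, 2} in R,
# perturbed by symmetric noise of size 1e-9 to mimic limited solver accuracy.
y = np.array([1.0 ** k + 2.0 ** k for k in range(5)])
M2 = np.array([[y[i + j] for j in range(3)] for i in range(3)])
rng = np.random.default_rng(0)
noise = 1e-9 * rng.standard_normal(M2.shape)
M2_noisy = M2 + (noise + noise.T) / 2

rank_loose, sv = numerical_rank(M2_noisy, rel_tol=1e-6)    # threshold above the noise
rank_tight, _ = numerical_rank(M2_noisy, rel_tol=1e-14)    # threshold below the noise
print(sv, rank_loose, rank_tight)
```

With the looser threshold the noise is discarded and the rank is 2; a threshold below the noise level will typically count the perturbed zero singular value as well, so the flatness test depends on how the threshold compares with the solver accuracy.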

(18)

Extension of the moment-matrix algorithm to V^C(I)

Omit the PSD condition and work with the linear space:

K_t = {y ∈ R^(Nⁿ_t) | y^T vec(h_j x^α) = 0 ∀j, α with |α| + deg(h_j) ≤ t}

The same algorithm applies: For t ≥ D

• Pick generic y ∈ K_t, i.e. rankM_s(y) maximum ∀s ≤ ⌊t/2⌋

[choose y ∈ K_t randomly]

• Check if the flatness condition (F1) or (Fd) holds.

• If yes, find a basis of R[x]/J where J := (KerM_s(y)) satisfies I ⊆ J ⊆ I(V^C(I)) and thus V^C(J) = V^C(I).

• If not, iterate with t + 1.
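Without the PSD constraint, K_t is just a null space, so the steps above reduce to linear algebra. The sketch below assumes NumPy and a hypothetical univariate ideal I = (x² − 3x + 2) with t = 4; it builds the constraint matrix of (LC), picks a random element of its null space, and computes the ranks of M_s(y) for s = 0, 1, 2.

```python
import numpy as np

# Hypothetical univariate example: I = (h), h = x^2 - 3x + 2, D = 2, t = 4.
h = np.array([2.0, -3.0, 1.0])           # coefficients of 1, x, x^2
t, deg_h = 4, 2

# Rows of A are the coefficient vectors of h * x^a with a + deg(h) <= t,
# so K_t = {y : A y = 0} is the linear space cut out by (LC).
A = np.zeros((t - deg_h + 1, t + 1))
for a in range(t - deg_h + 1):
    A[a, a:a + deg_h + 1] = h

# Orthonormal basis of the null space of A, read off the SVD.
_, sv, Vt = np.linalg.svd(A)
null_basis = Vt[int(np.sum(sv > 1e-12)):].T      # columns span K_t

# A random element of K_t is generic with probability one.
rng = np.random.default_rng(1)
y = null_basis @ rng.standard_normal(null_basis.shape[1])

def hankel_moment_matrix(y, s):
    """Univariate M_s(y) = (y_{i+j}), 0 <= i, j <= s."""
    return np.array([[y[i + j] for j in range(s + 1)] for i in range(s + 1)])

ranks = [int(np.linalg.matrix_rank(hankel_moment_matrix(y, s), tol=1e-9))
         for s in range(3)]
print(null_basis.shape[1], ranks)
```

Here dim K_t equals the number of roots and the rank stabilises between s = 1 and s = 2, which is the flatness condition (F1) in this toy case; the absolute tolerance 1e-9 is an arbitrary choice.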

(19)

Find the ideal (KerM_s(y)) = I in the Gorenstein case

The inclusions I ⊆ (Ker M_s(y)) ⊆ I(V^C(I)) may be strict for any generic y.

Example: For I = (x₁², x₂², x₁x₂), V^C(I) = {0}, I(V^C(I)) = (x1, x2), dim R[x]/I = 3, dim R[x]/I(V^C(I)) = 1, while dim R[x]/(Ker M_s(y)) = 2 for any generic y!

Recall: The algebra A := R[x]/I is Gorenstein if there exists a non-degenerate bilinear form on A satisfying (f, gh) = (fg, h) ∀f, g, h ∈ A, i.e. if there exists y ∈ K_∞ with I = Ker M(y).

Hence: ∃y ∈ K_t s.t. rank M_s(y) = rank M_s−1(y) and I = (Ker M_s(y)) if and only if A is Gorenstein.
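The dimension count in the example I = (x₁², x₂², x₁x₂) can be checked directly. In the sketch below (NumPy assumed, random seed arbitrary) the constraints (LC) force y_α = 0 for 2 ≤ |α| ≤ t, so an element of K_t is determined by the three free entries y₀₀, y₁₀, y₀₁.

```python
import numpy as np

# For I = (x1^2, x2^2, x1*x2) every monomial of degree >= 2 is a multiple of a
# generator, so (LC) leaves only y_00, y_10, y_01 free.
rng = np.random.default_rng(2)
y00, y10, y01 = rng.standard_normal(3)    # random, hence generic, choice

# M_1(y) with rows and columns indexed by the monomials 1, x1, x2.
M1 = np.array([[y00, y10, y01],
               [y10, 0.0, 0.0],
               [y01, 0.0, 0.0]])

rank = int(np.linalg.matrix_rank(M1))
print(rank)
```

For s ≥ 2 the matrix M_s(y) only gains zero rows and columns, so its rank stays the same; this is the value dim R[x]/(Ker M_s(y)) = 2 quoted above, strictly between dim R[x]/I(V^C(I)) = 1 and dim R[x]/I = 3.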

(20)

Example: the moment-matrix algorithm for real/complex roots

I = (x₁² − 2x₁x₃ + 5, x₁x₂² + x₂x₃ + 1, 3x₂² − 8x₁x₃), D = 3, d = 2

Ranks of M_s(y) for generic y ∈ K_t (a dash marks entries with 2s > t, where M_s(y) is not defined).

Without PSD (8 complex roots):

        t = 2   3   4   5   6   7   8   9
s = 0       1   1   1   1   1   1   1   1
s = 1       4   4   4   4   4   4   4   4
s = 2       -   -   8   8   8   8   8   8
s = 3       -   -   -   -  11  10   9   8
s = 4       -   -   -   -   -   -  12  10

With PSD (extract 2 real roots):

        t = 2   3   4   5   6
s = 0       1   1   1   1   1
s = 1       4   4   4   2   2
s = 2       -   -   8   8   2
s = 3       -   -   -   -  10

(21)

8 complex / 2 real roots:

v₁ = (−1.101, −2.878, −2.821)

v₂ = (0.07665 + 2.243i, 0.461 + 0.497i, 0.0764 + 0.00834i)

v₃ = (0.07665 − 2.243i, 0.461 − 0.497i, 0.0764 − 0.00834i)

v₄ = (−0.081502 − 0.93107i, 2.350 + 0.0431i, −0.274 + 2.199i)

v₅ = (−0.081502 + 0.93107i, 2.350 − 0.0431i, −0.274 − 2.199i)

v₆ = (0.0725 + 2.237i, −0.466 − 0.464i, 0.0724 + 0.00210i)

v₇ = (0.0725 − 2.237i, −0.466 + 0.464i, 0.0724 − 0.00210i)

v₈ = (0.966, −2.813, 3.072)

(22)

Extracting real roots without (F1) or (Fd)

I = (5x₁⁹ − 6x₁⁵x₂ + x₁x₂⁴ + 2x₁x₃, −2x₁⁶x₂ + 2x₁²x₂³ + 2x₂x₃, x₁² + x₂² − 0.265625)

D = 9, d = 5, |V^R(I)| = 8, |V^C(I)| = 20

order t   rank sequence of M_s(y)      extract. order s   accuracy                comm. error
          (1 ≤ s ≤ ⌊t/2⌋)              MON/SVD            MON/SVD                 MON/SVD
10        1 4 8 16 25 34               -                  -                       -
12        1 3 9 15 22 26 32            -                  -                       -
14        1 3 8 10 12 16 20 24         3(3)/-(-)          0.12786/-               0.00019754/-
16        1 4 8 8 8 12 16 20 24        4(3)/3(3)          4.6789e-5/0.00013406    4.7073e-5/0.00075005

Quotient basis: B = {1, x₁, x₂, x₃, x₁², x₁x₂, x₁x₃, x₂x₃}; border basis G of size 10

Real solutions:

x^(1) = (−0.515, −0.000153, −0.0124)

x^(2) = (−0.502, 0.119, 0.0124)

x^(3) = (0.502, 0.119, 0.0124)

x^(4) = (0.515, −0.000185, −0.0125)

x^(5) = (0.262, 0.444, −0.0132)

x^(6) = (−2.07e-5, 0.515, −1.27e-6)

x^(7) = (−0.262, 0.444, −0.0132)

x^(8) = (−1.05e-5, −0.515, −7.56e-7)

(23)

Link with the elimination method of Zhi and Reid

Theorem: If (F1) holds, i.e. for some D ≤ s ≤ ⌊t/2⌋

rank M_s(y) = rank M_s−1(y) for generic y ∈ K_t, then dim π_2s(K_t) = dim π_2s−1(K_t) = dim π_2s(K_t+1).

Theorem (based on [Zhi-Reid 2004]): If for some D ≤ s ≤ t

(ZR) dim π_s(K_t) = dim π_s−1(K_t) = dim π_s(K_t+1)

then one can construct a basis of R[x]/I and the multiplication matrices in R[x]/I [and thus extract V^C(I)].

Hence: The Zhi-Reid criterion (ZR) may be satisfied earlier than the flatness criterion (F1).

(24)

Example: I = (x²₁ − 2x1x3 + 5, x1x²₂ + x2x3 + 1, 3x²₂ − 8x1x3)

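Ranks of M_s(y) for generic y ∈ K_t (no PSD; a dash marks entries with 2s > t):

        t = 2   3   4   5   6   7   8   9
s = 0       1   1   1   1   1   1   1   1
s = 1       4   4   4   4   4   4   4   4
s = 2       -   -   8   8   8   8   8   8
s = 3       -   -   -   -  11  10   9   8
s = 4       -   -   -   -   -   -  12  10

rank M_3(y) = rank M_2(y) for y ∈ K_9

Dimensions dim π_s(K_t) (a dash marks entries with s > t):

        t = 3   4   5   6   7   8   9
s = 1       4   4   4   4   4   4   4
s = 2       8   8   8   8   8   8   8
s = 3      11  10   9   8   8   8   8
s = 4       -  12  10   9   8   8   8
s = 5       -   -  12  10   9   8   8
s = 6       -   -   -  12  10   9   8
s = 7       -   -   -   -  12  10   9
s = 8       -   -   -   -   -  12  10

dim π_3(K_6) = dim π_2(K_6) = dim π_3(K_7)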
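The dimensions dim π_s(K_t) of the complex case can be recomputed with plain linear algebra. The sketch below is an illustration, not the Zhi-Reid implementation: it assumes NumPy floating-point ranks, hypothetical helper names, and the identity π_s(H_t^⊥) = (span(H_t) ∩ R[x]_s)^⊥, where R[x]_s denotes polynomials of degree ≤ s. It evaluates the three dimensions that enter the (ZR) test at s = 3, t = 6 for the ideal of this example.

```python
import numpy as np

def exponents(n, d):
    """Exponent tuples alpha in N^n with |alpha| <= d (empty list if d < 0)."""
    if n == 1:
        return [(k,) for k in range(d + 1)]
    return [(k,) + rest for k in range(d + 1) for rest in exponents(n - 1, d - k)]

# Generators of the example ideal, stored as {exponent tuple: coefficient}.
h1 = {(2, 0, 0): 1.0, (1, 0, 1): -2.0, (0, 0, 0): 5.0}   # x1^2 - 2*x1*x3 + 5
h2 = {(1, 2, 0): 1.0, (0, 1, 1): 1.0, (0, 0, 0): 1.0}    # x1*x2^2 + x2*x3 + 1
h3 = {(0, 2, 0): 3.0, (1, 0, 1): -8.0}                   # 3*x2^2 - 8*x1*x3
generators = [h1, h2, h3]

def shift_matrix(polys, t, n=3):
    """Rows: coefficient vectors of h_j * x^alpha with deg(h_j * x^alpha) <= t."""
    cols = exponents(n, t)
    index = {m: i for i, m in enumerate(cols)}
    rows = []
    for h in polys:
        deg_h = max(sum(e) for e in h)
        for alpha in exponents(n, t - deg_h):
            row = np.zeros(len(cols))
            for e, c in h.items():
                row[index[tuple(a + b for a, b in zip(e, alpha))]] = c
            rows.append(row)
    return np.array(rows), cols

def dim_projection(polys, s, t):
    """dim pi_s(K_t) for K_t = H_t^perp (no PSD constraint).

    The span of H_t meets the polynomials of degree <= s in a space of dimension
    rank(H_t) - rank(columns of H_t of degree > s); pi_s(K_t) is its complement.
    """
    H, cols = shift_matrix(polys, t)
    high = [i for i, m in enumerate(cols) if sum(m) > s]
    rank_high = int(np.linalg.matrix_rank(H[:, high])) if high else 0
    return (len(cols) - len(high)) - (int(np.linalg.matrix_rank(H)) - rank_high)

zr_dims = (dim_projection(generators, 3, 6),
           dim_projection(generators, 2, 6),
           dim_projection(generators, 3, 7))
print(zr_dims)
```

The three numbers are the quantities compared in (ZR); the table above lists them all as 8. Floating-point ranks are adequate for these small integer matrices, but exact arithmetic would be the safer choice for larger degrees.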

(25)

Extending the Zhi-Reid criterion to the real case

• In the complex case, K_t = H_t^⊥ where

H_t := {h_jx^α ∀j, α with deg(h_jx^α) ≤ t}.

• In the real case, K_t is a cone, contained in the linear space P_t^⊥, with the same dimension: dim K_t = dim P_t^⊥, where

P_t := H_t ∪ {f x^α | f ∈ KerM_⌊t/2⌋(y), deg(x^α) ≤ ⌊t/2⌋}

Theorem: If for some D ≤ s ≤ t

(ZR+) dim π_s(P_t^⊥) = dim π_s−1(P_t^⊥) = dim π_s((P_t ∪ ∂P_t)^⊥)

then one can construct a basis of J with I ⊆ J ⊆ I(V^R(I)) and thus extract V^R(I) = V^C(J) ∩ Rⁿ.

(26)

Link with the flatness criterion

Theorem: In the PSD case, the flatness criterion (F1):

rank M_s(y) = rank M_s−1(y) for generic y ∈ K_t

is equivalent to the stronger version of the (ZR) criterion:

(ZR++) dim π_s−1(P_t^⊥) = dim π_2s(P_t^⊥) = dim π_2s((P_t ∪ ∂P_t)^⊥),

in which case we find the real radical ideal J = I(V^R(I)).

Hence: the algorithm based on (ZR) may stop earlier than the moment-matrix algorithm, based on (F1).

Future work: Adapt other known efficient algorithms for complex roots to real roots by incorporating SDP conditions.