
Completely Positive and Copositive Matrices

Lucas Riedstra

June 28, 2020

Bachelor thesis Mathematics and Computer Science
Supervisors: dr. Jan Brandts, dr. Leen Torenvliet

Informatics Institute

Korteweg-de Vries Institute for Mathematics
Faculty of Sciences


Abstract

We study the completely positive and copositive matrices as dual cones in the space of symmetric matrices. We review the upper bound for the CP-rank of an n × n completely positive matrix given by Hannah and Laffey in [1] and give a simplification of their proof. This is followed by a treatment of optimization theory and Lagrange duality, where we compute the dual problem for the general conic linear problem. We consider several examples of problems that can be rewritten as optimization problems over the cone of completely positive matrices. Most importantly, following De Klerk and Pasechnik in [2], we show that the independent set number of a graph can be expressed as the solution of a linear optimization problem over the set of completely positive matrices. Our arguments differ subtly at some points from those used in [2].

The result of De Klerk and Pasechnik can be used to show that determining whether a matrix is completely positive and/or copositive is an NP-hard problem — this was shown by Dickinson and Gijben in [3]. Their proof is studied and the open problem of whether determining complete positivity is NP-complete is discussed. It is also shown that, based on a result of Murty and Kabadi in [4], the problem of determining copositivity of a rational matrix is co-NP-complete.

We analyze an algorithm introduced by Bundfuss and Dür in [5] that determines whether a matrix is copositive, and follow the proof in [5] to show that this algorithm terminates in finitely many steps for most matrices. Also, we give a Python implementation and discuss some implementation choices. Finally, we discuss an algorithm for determining whether a matrix is completely positive designed by Berman and Rothblum in [6] and give an implementation for a specific case in Mathematica.

Title: Completely Positive and Copositive Matrices
Author: Lucas Riedstra, lcriedstra@gmail.com, 11837322
Supervisors: dr. Jan Brandts, dr. Leen Torenvliet

Second graders: dr. Guus Regts, dr. Peter van Emde Boas
End date: June 28, 2020

Informatics Institute
University of Amsterdam
Science Park 904, 1098 XH Amsterdam
http://www.ivi.uva.nl

Korteweg-de Vries Institute for Mathematics
University of Amsterdam
Science Park 904, 1098 XH Amsterdam
http://www.kdvi.uva.nl


Contents

1. Introduction
   1.1. Motivation
   1.2. Overview and preliminaries
   1.3. Ethical aspects

2. Preliminaries
   2.1. Convexity, cones and dual cones
      2.1.1. Convex sets
      2.1.2. Convex cones, rays, dual cones
      2.1.3. Positive semi-definite matrices
   2.2. Graph theory
   2.3. Complexity theory
      2.3.1. Problems and encodings
      2.3.2. Algorithms, Turing machines, the class P
      2.3.3. The class NP
      2.3.4. NP-completeness and NP-hardness

3. The completely positive and copositive cones
   3.1. Definitions and elementary properties
      3.1.1. Completely positive matrices
      3.1.2. Copositive matrices
   3.2. Membership and maximal CP-rank
      3.2.1. Membership of DNN
      3.2.2. A geometric interpretation of complete positivity

4. Optimization and NP-hardness
   4.1. Conic programming and duality
      4.1.1. Terminology
      4.1.2. Lagrange duality
      4.1.3. Derivation of dual for LP
      4.1.4. Conic linear programming
   4.2. Optimizing over COPn and CPn
      4.2.1. The stability number as conic optimization problem
      4.2.2. Other optimization problems
   4.3. NP-hardness of membership problems
      4.3.1. Membership problems
      4.3.2. Membership of the completely positive cone
      4.3.3. Membership of the copositive cone
      4.3.4. NP-completeness of membership problems

5. Algorithms and implementations
   5.1. Membership of COPn
      5.1.2. Choice of partition
      5.1.3. Implementation and speed-up
   5.2. Membership of CPn

6. Conclusion

Bibliography

Populaire samenvatting

A. Code


1. Introduction

1.1. Motivation

In this section, we will explain the main questions that are treated and answered in this thesis.

First, we give the definitions of completely positive and copositive matrices:

1. A matrix A ∈ Rn×n is completely positive if there exists k ∈ N and an entrywise nonnegative matrix B ∈ Rk×n such that A = B⊤B.

2. A symmetric matrix A ∈ Rn×n is copositive if, for all entrywise nonnegative vectors x ∈ Rn, it holds that x⊤Ax ≥ 0.

The sets of completely positive and copositive matrices in Rn×n are denoted by CPn and COPn respectively. The concept of copositivity was first introduced in the field of linear algebra by Motzkin in 1952 [7], while the concept of complete positivity was introduced in the field of numerical analysis by Hall in 1962 [8]. For both classes of matrices, it seems ‘hard’ at first sight to verify whether a symmetric matrix belongs to that class or not. We will study the complexity of determining membership of these classes. In the last decades, interest in CPn and COPn has risen, mostly due to their applications in optimization problems [2, 9, 10]. The completely positive matrices also find applications in block designs and economic modelling [11].

Clearly, CPn and COPn are not subspaces of Rn×n. This leads to the following question, which we try to answer in the first chapters:

Question 1. What is the appropriate setting in which to consider the sets CPn and COPn, and how are these two sets related?

We will see that CPn and COPn are convex cones in the space of symmetric matrices. Afterward, we turn our attention specifically to CPn. If a matrix A is known to be completely positive, one can ask what the minimal k ∈ N is such that there exists a nonnegative B ∈ Rk×n with A = B⊤B. This minimal k is known as the CP-rank of A. This yields the following question:

Question 2. Is there an upper bound for the CP-rank of a matrix A ∈ CPn?

This question was solved in [1] by Hannah and Laffey in 1983, and we give a simplification of their proof. After that, we consider the applications of COPn and CPn in optimization theory. To answer this question, we first consider optimization theory in general, where the relationship between CPn and COPn appears again through the concept of Lagrange duality. Again this yields a more general question, namely the following:

Question 3. In what type of optimization problems do the cones CPn and COPn appear?

In particular, we consider one specific graph-theoretical problem, namely to find the independent set number of a graph, which can be reformulated as the solution of an optimization problem over CPn or COPn (this was shown in 2002 by De Klerk and Pasechnik in [2]). We give a proof which follows that of De Klerk and Pasechnik, but differs subtly at some points, making it simpler in the opinion of the author.

We also consider some other optimization problems which can be reformulated in the same setting. Following Dür in [9], we obtain the result that the standard quadratic problem (a hard optimization problem) can also be rewritten as a linear optimization problem over CPn or COPn. This is surprising, since linear problems are in general much easier than quadratic problems. We will see that the difficulty of the problem does not disappear, but is shifted to the feasible sets, since CPn and COPn are ‘difficult’ sets in some sense. Finding the independent set number of a graph is generally known to be an NP-hard problem [12], meaning that it is highly unlikely that an efficient algorithm exists. This shows that optimization over CPn or COPn is also NP-hard.

One can wonder if the problem of determining whether a matrix is completely positive or copositive is difficult as well. This is our next main question:

Question 4. What is the complexity class of the problem of determining whether a matrix is completely positive and/or copositive?

As it turns out, both problems are NP-hard. For COP, this was proved decades ago by Murty and Kabadi in [4] (1987). For CP, this has only recently been proved by Dickinson and Gijben in [3] (2014), although they do state that NP-hardness of this problem had “long been assumed”. So not only is optimization over CPn and COPn difficult, simply determining whether a matrix lies in CPn or COPn is already difficult. This also means that any advancement in solving these problems could be an advancement in the famous P = NP problem.

As with most NP-hard problems, the fact that they are hard does not mean that no algorithm exists to solve them. In the final chapter, we consider a specific algorithm designed quite recently by Bundfuss and Dür [5] (2008) to determine whether a matrix is copositive. We implement and analyse this algorithm and prove its correctness (here we also fix a minor error of the proof in [5]), which can be reformulated as the following question:

Question 5. Under what conditions can correctness of the algorithm by Bundfuss and Dür be proven, and is the algorithm viable to use?

The main goal of this text is to answer the five questions posed in this section in as much detail as possible. Furthermore, we try to motivate why CPn and COPn are interesting to study, and why it is beneficial to understand them. Since optimization is an important subfield of applied mathematics, placing CPn and COPn in the context of optimization problems is a large part of the motivation for considering these sets. But apart from optimization problems, these sets are interesting to consider from a linear-algebraic point of view and, in the case of CPn, from a geometric point of view.

1.2. Overview and preliminaries

We give a quick overview of the thesis: in chapter 2, we give both the mathematical preliminaries and an introduction to theoretical computer science and complexity classes. In chapter 3, we discuss the elementary properties of CPn and COPn and give an upper bound on the CP-rank of a matrix A ∈ CPn. In chapter 4, we place CPn and COPn in the context of optimization problems and use this theory to prove that membership of CPn and COPn are NP-hard problems. We also show that membership of COPn is a co-NP-complete problem. In chapter 5, we treat an algorithm to determine if a matrix is copositive, and we theoretically discuss an algorithm that determines if a matrix is completely positive.

The mathematical prerequisites are mostly basic, and while familiarity with theoretical computer science (complexity classes and Turing machines) is very helpful, the theory is built from the ground up in the text.

1.3. Ethical aspects

With any work of research, it is important to consider possible ethical consequences of the work, both positive and negative. However, the work done in this thesis is so theoretical and abstract that no direct ethical consequences can be found (as far as the author is aware).

We will see in the text that the problems of determining complete positivity and copositivity are both NP-hard, and so any advancement in these problems might lead to solving the well-known P = NP problem, which might have ethical ramifications. However, this is both highly unlikely and far-fetched.

We will also see that the sets CP and COP can play a role in certain optimization problems, and of course, many specific types of optimization have ethical aspects to consider. But in the opinion of the author, the types of optimization problems considered in this thesis are abstract and do not carry ethical consequences themselves.

We conclude that currently, there are no obvious ethical aspects to consider regarding the study of CP and COP.


2. Preliminaries

The following chapter aims to provide both the required mathematical and computer science background.

2.1. Convexity, cones and dual cones

We let Sn denote the set of symmetric matrices in Rn×n. The standard basis vectors in Rn are denoted e1, . . . , en, while e ∈ Rn is the all-ones vector. The support of a vector x ∈ Rn is defined as

$$\operatorname{supp} x := \{ i \mid x_i \ne 0 \}.$$

We will let Rn≥0 and Rn>0 denote the entrywise nonnegative and positive vectors in Rn respectively. The set Rn≥0 is also called the nonnegative orthant of Rn.

Most of the definitions and theorems in this section come from [11]. Unless explicitly stated otherwise, V will be a real, finite-dimensional vector space.

2.1.1. Convex sets

Convex sets are sets for which ‘the line between two points is contained in the set’. The following definition makes this precise:

Definition 2.1.1.

1. A set C ⊆ V is called convex if, for all x, y ∈ C and every λ ∈ [0, 1], we have λx + (1 − λ)y ∈ C.

2. For S ⊆ V , a convex combination of elements in S is an expression of the form $\sum_{i=1}^n \alpha_i s_i$, where all $\alpha_i \ge 0$ and $s_i \in S$, and where $\sum_{i=1}^n \alpha_i = 1$.

3. For S ⊆ V , the convex hull of S is the set of all convex combinations of elements in S. This set is denoted conv(S).


Figure 2.1.: The convex hull of the set S = {(0, 0), (1, 0), (0, 1)} ⊆ R2.

In the case V = Rn, the definition of convexity of C implies exactly that for all x, y ∈ C, the line segment between x and y must be contained in C. The following properties are easily proved:


Proposition 2.1.2. Let S ⊆ V . Then:

1. S is convex if and only if S contains every convex combination of elements in S.

2. If (Sα)α∈I are all convex, then ∩α∈I Sα is again convex.

3. conv(S) is the smallest convex set that contains S, that is,

$$\operatorname{conv}(S) = \bigcap_{\substack{S \subseteq T \\ T \text{ convex}}} T.$$

Definition 2.1.3. Let C ⊆ V be convex. Then v is called an extreme point of C if, for all λ ∈ [0, 1] and x, y ∈ C, we have

$$v = \lambda x + (1-\lambda)y \implies (\lambda = 0,\ y = v) \text{ or } (\lambda = 1,\ x = v),$$

that is, the only convex combination of elements in C that yields v is the trivial combination v = 1 · v.

Example 2.1.4. The extreme points of a triangle (or more generally, any convex polygon) in R2 are its vertices. The set of extreme points of the closed unit disk in R2 is the unit circle.

Besides convex sets, we also have convex functions. In the case f : R → R, these are functions for which ‘the line between two points on the graph lies completely above the graph’.

Definition 2.1.5. A function f : V → R is called convex if, for all x, y ∈ V and λ ∈ (0, 1), we have

$$f(\lambda x + (1-\lambda)y) \le \lambda f(x) + (1-\lambda)f(y),$$

and strictly convex if the inequality is strict whenever x ≠ y.

A function f is called (strictly) concave if −f is (strictly) convex.

Example 2.1.6. If L : V → V is linear and b ∈ V is fixed, then the affine function

$$f : V \to V,\quad v \mapsto L(v) + b$$

is both concave and convex.

It is immediate that for any convex function f and any convex combination $\sum_{i=1}^n \alpha_i x_i$ we have $f(\sum_{i=1}^n \alpha_i x_i) \le \sum_{i=1}^n \alpha_i f(x_i)$ — this can be proved by induction.

Consider a convex compact (i.e., closed and bounded) set C ⊆ V , and let f : C → R be convex and continuous. We know that f attains a maximum on C (since f is a continuous function on a compact set), and intuitively, it is clear that this maximum must also be attained at one of its extreme points. We will prove this by combining an important theorem with a small lemma.

Lemma 2.1.7. Let C ⊆ V be convex and compact, and let S ⊆ V be such that C = conv(S). Furthermore, let f : C → R be convex and continuous. Then the maximum of f on C is attained in S.

Proof. Let x ∈ C; then x is a convex combination $\sum_{i=1}^n \alpha_i s_i$ of points in S. Now we have

$$f(x) = f\Big(\sum_{i=1}^n \alpha_i s_i\Big) \le \sum_{i=1}^n \alpha_i f(s_i) \le \max_{1 \le i \le n} f(s_i) \sum_{i=1}^n \alpha_i = \max_{1 \le i \le n} f(s_i).$$

This shows that for any x ∈ C there exists s ∈ S such that f(s) ≥ f(x), and therefore the maximum of f must be attained in S.



Figure 2.2.: The cones K1 (blue) and K2 (orange).

The following theorem is a very intuitive but nontrivial result which we will need. A proof can be found in [13].

Theorem 2.1.8 (Krein-Milman). Let C be a compact convex set in V , and let E denote the set of extreme points of C. Then C = conv(E).

Note that the previous theorem implies that every nonempty compact convex set has at least one extreme point: this is not a triviality.

From the previous theorem and lemma, we immediately infer the following:

Corollary 2.1.9. Let C ⊆ V be compact and convex, and let f : C → R be convex and continuous. Then f attains its maximum at an extreme point of C.

Equivalently, if f : C → R is concave and continuous, it attains its minimum at an extreme point of C.

2.1.2. Convex cones, rays, dual cones

The second type of sets we will consider after convex sets are cones.

Definition 2.1.10. A nonempty set K ⊆ V is called a cone if, for every x ∈ K and α ≥ 0 we have αx ∈ K.

Example 2.1.11. The entire space V and any subspace of V are cones, and for all n ∈ N, the nonnegative orthant of Rn×n is a cone.

Letting arg(x) denote the angle between x and the (nonnegative) x-axis in R2, the following two sets are cones:

$$K_1 = \{ x \in \mathbb{R}^2 \mid \pi/6 \le \arg(x) \le \pi/3 \}, \qquad K_2 = \{ x \in \mathbb{R}^2 \mid -\pi/6 \le \arg(x) \le 2\pi/3 \}.$$

See fig. 2.2 for a picture.

A cone need not be convex. For instance, if we define K1, K2 as in the previous example, then (K2 \ K1) ∪ {0} is a nonconvex cone. We have the following easily proved characterization of convex cones:

Proposition 2.1.12. Let K ⊆ V . Then K is a convex cone if and only if for all α, β ≥ 0 and x, y ∈ K we have αx + βy ∈ K.


Definition 2.1.13. Let K ⊆ V be a cone.

1. K is called pointed if K ∩ (−K) = {0}.

2. K is called solid if it has nonempty interior in V .

3. K is called proper if it is closed, convex, pointed, and solid.

All of the mentioned properties are natural — note that being pointed simply means that K contains no lines through the origin. The cones K1 and K2 from example 2.1.11 are both proper cones, as is the nonnegative orthant of Rn. Any nonzero subspace of V is not pointed and therefore not a proper cone.

We will now define two notions, namely the extreme rays of a cone and the dual cone of a set, which will be very important in applications.

Definition 2.1.14. Let x ∈ V ; then the ray generated by x is

$$\operatorname{ray}(x) := \{ \alpha x \mid \alpha \ge 0 \}.$$

Clearly, ray(x) is the smallest cone that contains x. If dim(V ) > 1, then ray(x) has empty interior, so ray(x) is generally not a proper cone.

Definition 2.1.15. Let K be a convex cone and x ∈ K. If, for all y, z ∈ K, we have

$$y + z = x \implies y, z \in \operatorname{ray}(x),$$

then x is an extreme vector of K and ray(x) is an extreme ray of K.

For example, the reader can easily verify that the extreme rays of the convex cone R2≥0 are {(x, 0) | x ≥ 0} and {(0, y) | y ≥ 0}.

Now, we assume (V, ⟨·, ·⟩) is an inner product space. In the remaining part of this thesis, we will make use of two standard inner products:

• In the case V = Rn, we use the inner product $\langle x, y \rangle = x^\top y = \sum_{i=1}^n x_i y_i$;

• In the case V = Rn×n or V = Sn, we use the inner product $\langle A, B \rangle = \operatorname{Tr}(A^\top B) = \sum_{i,j=1}^n a_{ij} b_{ij}$, which reduces to ⟨A, B⟩ = Tr(AB) if V = Sn. This inner product is known as the Frobenius inner product.

Definition 2.1.16. Let S ⊆ V . Then we define the dual cone or simply the dual of S as

$$S^* := \{ x \in V \mid \langle x, y \rangle \ge 0 \text{ for every } y \in S \}.$$

If S = S∗, then S is called self-dual.

Example 2.1.17. In the case where V = Rn and ⟨·, ·⟩ is the standard inner product, we have that ⟨x, y⟩ ≥ 0 if and only if the angle between x and y is at most π/2. Therefore, for S ⊆ Rn, the set S∗ is the set of vectors that make an acute or right angle with all vectors in S.

One can easily check that the nonnegative orthant of Rn is self-dual. The cones K1 and K2 from example 2.1.11 are dual to each other.

We will need the following theorem to prove a basic property of duality — a proof can be found in [14].


Theorem 2.1.18 (Strict separation of a point and a closed convex set). Let C ⊆ V be closed and convex and b ∈ V \ C. Then there exist u ∈ V and c ∈ R such that ⟨u, x⟩ > c for all x ∈ C and ⟨u, b⟩ < c.

A simple corollary is the following:

Corollary 2.1.19. Let K ⊆ V be a closed convex cone and b ∈ V \ K. Then there exists u ∈ V such that ⟨u, x⟩ ≥ 0 for all x ∈ K and ⟨u, b⟩ < 0.

Proof. Applying the previous theorem yields u ∈ V and c ∈ R such that ⟨u, x⟩ > c for all x ∈ K and ⟨u, b⟩ < c. Since 0 ∈ K and ⟨u, 0⟩ = 0, it must hold that c < 0; in particular ⟨u, b⟩ < c < 0. Moreover, if we had ⟨u, x⟩ < 0 for some x ∈ K, then ⟨u, αx⟩ = α⟨u, x⟩ would drop below c for α > 0 large enough, even though αx ∈ K — a contradiction. Hence ⟨u, x⟩ ≥ 0 for all x ∈ K.

The set {v ∈ V | ⟨v, u⟩ = 0} is a hyperplane. This hyperplane separates V into two closed half-spaces, namely {v | ⟨v, u⟩ ≤ 0} and {v | ⟨v, u⟩ ≥ 0}. Intuitively, the separation theorem now states that if b ∉ K, then there exists a hyperplane that separates b from K.

The following theorem gives the most basic properties of duality:

Theorem 2.1.20. Let S, T ⊆ V .

(a) S∗ is a closed convex cone in V .

(b) If S ⊆ T , then T∗ ⊆ S∗.

(c) S ⊆ S∗∗.

(d) S = S∗∗ if and only if S is a closed convex cone.

Proof. (a) Let x, y ∈ S∗ and α, β ≥ 0. Then for all s ∈ S we have

$$\langle \alpha x + \beta y, s \rangle = \alpha \langle x, s \rangle + \beta \langle y, s \rangle \ge 0,$$

which shows that S∗ is a convex cone. To show that S∗ is closed: note that if (xn)n∈N is a sequence in S∗ converging to x, then for all s ∈ S we have

$$\langle x, s \rangle = \Big\langle \lim_{n\to\infty} x_n,\ s \Big\rangle = \lim_{n\to\infty} \langle x_n, s \rangle \ge 0,$$

by continuity of the inner product.

(b) This is clear from the definition.

(c) Suppose s ∈ S. Then for all x ∈ S∗ we have ⟨s, x⟩ ≥ 0 by definition, so s ∈ S∗∗. This shows S ⊆ S∗∗.

(d) “ =⇒ ”: If S = S∗∗, then S is a dual cone and therefore closed and convex by (a).

“ ⇐= ”: For a contradiction, suppose S is a closed convex cone and S ⊊ S∗∗. Then there exists b ∈ S∗∗ \ S, and by corollary 2.1.19 there exists u ∈ V such that ⟨u, b⟩ < 0 and ⟨u, x⟩ ≥ 0 for all x ∈ S. From the second condition it follows that u ∈ S∗, but if u ∈ S∗ and ⟨u, b⟩ < 0 then clearly b ∉ S∗∗, a contradiction.

In the following paragraphs, we will see more exciting examples of cones, namely cones of matrices in the vector space V = Sn ⊆ Rn×n.


2.1.3. Positive semi-definite matrices

We introduce positive semi-definite matrices and prove some elementary properties. We show that the set of positive semi-definite matrices is a proper self-dual cone.

Definition 2.1.21. Let A ∈ Sn. Then A is called positive semi-definite if x⊤Ax ≥ 0 for all x ∈ Rn, and positive definite if the inequality is strict for all x ≠ 0.

The set of positive semi-definite matrices in Rn×n is denoted by PSDn, and the set of positive definite matrices by PSDn+.

Note that any positive semi-definite matrix has only nonnegative diagonal entries, since for A ∈ PSDn we have $a_{ii} = e_i^\top A e_i \ge 0$ for i = 1, . . . , n. Analogously, a positive definite matrix has only positive diagonal entries. We define the notion of a Gram matrix.

Definition 2.1.22. Let v1, . . . , vn ∈ Rk. Then we define the Gram matrix G of v1, . . . , vn by $g_{ij} = \langle v_i, v_j \rangle$. This matrix is denoted Gram(v1, . . . , vn).

Example 2.1.23. The Gram matrix of e1, e2, e ∈ R2 is given by

$$\operatorname{Gram}(e_1, e_2, e) = \begin{pmatrix} 1 & 0 & 1 \\ 0 & 1 & 1 \\ 1 & 1 & 2 \end{pmatrix}.$$

The following (easily proved) proposition gives two alternative ways to look at Gram matrices:

Proposition 2.1.24. Let B ∈ Rk×n have columns v1, . . . , vn and rows $b_1^\top, \dots, b_k^\top$. Then

$$\operatorname{Gram}(v_1, \dots, v_n) = B^\top B = \sum_{i=1}^k b_i b_i^\top.$$
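As a quick numerical illustration of proposition 2.1.24 (a sketch of our own, not part of the thesis code), the three descriptions of the Gram matrix can be checked to agree using NumPy:

```python
import numpy as np

# Columns of B are v1 = e1, v2 = e2, v3 = e from example 2.1.23, so k = 2, n = 3.
B = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])

# (1) Entrywise definition: g_ij = <v_i, v_j>.
G_inner = np.array([[B[:, i] @ B[:, j] for j in range(3)] for i in range(3)])

# (2) Matrix product: Gram(v1, ..., vn) = B^T B.
G_btb = B.T @ B

# (3) Sum of rank-one terms b_i b_i^T over the rows of B.
G_rank1 = sum(np.outer(row, row) for row in B)

assert np.allclose(G_inner, G_btb) and np.allclose(G_btb, G_rank1)
print(G_btb)  # [[1. 0. 1.] [0. 1. 1.] [1. 1. 2.]], as in example 2.1.23
```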

The following theorem, together with the previous proposition, gives a characterization of positive semi-definite matrices:

Theorem 2.1.25. Let A ∈ Sn. Then the following are equivalent:

(a) A ∈ PSDn.

(b) All eigenvalues of A are nonnegative.

(c) There exists C ∈ Sn such that A = C2.

(d) There exist k ∈ N and vectors v1, . . . , vn ∈ Rk such that A = Gram(v1, . . . , vn).

Proof. (a) =⇒ (b): Suppose Ax = λx with x ≠ 0. Then we have 0 ≤ x⊤Ax = x⊤(λx) = λ⟨x, x⟩, which implies λ ≥ 0.

(b) =⇒ (c): Let A = QDQ⊤ be the spectral decomposition of A. Because all eigenvalues are nonnegative, we can take the entrywise square root of D and find

$$A = Q\sqrt{D}^2 Q^\top = (Q\sqrt{D}Q^\top)(Q\sqrt{D}Q^\top) = (Q\sqrt{D}Q^\top)^2,$$

where we used Q⊤Q = I.

(c) =⇒ (d): If C is symmetric then A = C2 = C⊤C, so A is the Gram matrix of the columns of C (cf. proposition 2.1.24).


(d) =⇒ (a): Let x ∈ Rn and suppose A = Gram(v1, . . . , vn); then

$$x^\top A x = \sum_{i=1}^n \sum_{j=1}^n a_{ij} x_i x_j = \sum_{i=1}^n \sum_{j=1}^n \langle v_i, v_j \rangle x_i x_j \overset{\star}{=} \Big\langle \sum_{i=1}^n x_i v_i,\ \sum_{j=1}^n x_j v_j \Big\rangle = \Big\| \sum_{i=1}^n x_i v_i \Big\|^2 \ge 0,$$

where ⋆ follows from the bilinearity of the inner product.
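The step (b) =⇒ (c) of the proof is constructive, and it is easy to mirror numerically. The following NumPy sketch (our own illustration; the function name is ours) computes the symmetric square root from the spectral decomposition:

```python
import numpy as np

def symmetric_sqrt(A, tol=1e-12):
    """Given a symmetric PSD matrix A, return a symmetric C with A = C @ C."""
    eigvals, Q = np.linalg.eigh(A)                 # spectral decomposition A = Q D Q^T
    if eigvals.min() < -tol:
        raise ValueError("A is not positive semi-definite")
    sqrt_D = np.diag(np.sqrt(np.clip(eigvals, 0.0, None)))
    return Q @ sqrt_D @ Q.T                        # C = Q sqrt(D) Q^T, symmetric

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])                         # eigenvalues 1 and 3, so A is PSD
C = symmetric_sqrt(A)
assert np.allclose(C, C.T) and np.allclose(C @ C, A)
```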

With analogous arguments as in the previous proof, the following theorem can be proved:

Theorem 2.1.26. Let A ∈ Sn. Then the following are equivalent:

(a) A ∈ PSDn+.

(b) All eigenvalues of A are positive.

(c) There exists an invertible C ∈ Sn such that A = C2.

(d) There exist k ∈ N and linearly independent vectors v1, . . . , vn ∈ Rk such that A = Gram(v1, . . . , vn).

From (b), we infer that PSDn+ is exactly the set of invertible matrices in PSDn.

The set of positive semidefinite matrices gives us our first example of a proper cone in a vector space other than Rn. To prove this, we first need a minor lemma:

Lemma 2.1.27. If A ∈ Sn satisfies x⊤Ax = 0 for all x ∈ Rn≥0, then A = 0.

Proof. Choosing x = ej we find x⊤Ax = ajj = 0 for all j. For i ≠ j, set $x = \frac{1}{\sqrt{2}}(e_i + e_j)$; then

$$0 = x^\top A x = \tfrac12(a_{ii} + 2a_{ij} + a_{jj}) = a_{ij},$$

so A = 0.

Note that the previous lemma does not hold in general if A is not symmetric: for example, the matrix

$$A = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}$$

satisfies x⊤Ax = 0 for all x ∈ R2.

Theorem 2.1.28. The set PSDn is a proper cone in Sn. Furthermore, PSDn is self-dual, and its interior equals PSDn+.

Proof. First, we prove PSDn∗ = PSDn; this in combination with theorem 2.1.20 will show that PSDn is a closed convex cone. We must show that PSDn is self-dual, or equivalently, that

Y ∈ PSDn ⇐⇒ Tr(XY ) ≥ 0 for all X ∈ PSDn.

• ‘ =⇒ ’: Suppose Y ∈ PSDn and let X ∈ PSDn. We know that X has a spectral decomposition $X = \sum_{i=1}^n \lambda_i q_i q_i^\top$ with λ1, . . . , λn all nonnegative. Now we have

$$\operatorname{Tr}(XY) = \operatorname{Tr}\Big(\Big(\sum_{i=1}^n \lambda_i q_i q_i^\top\Big) Y\Big) = \sum_{i=1}^n \lambda_i \operatorname{Tr}\big(q_i q_i^\top Y\big) \overset{\star}{=} \sum_{i=1}^n \lambda_i \big(q_i^\top Y q_i\big) \ge 0,$$

where ⋆ follows from the cyclic property of the trace.


• ‘ ⇐= ’: Suppose Y ∉ PSDn. Then there exists some x ∈ Rn such that x⊤Y x < 0. Now define X = xx⊤ ∈ PSDn; then

$$\operatorname{Tr}(XY) = \operatorname{Tr}\big(x x^\top Y\big) = x^\top Y x < 0,$$

so Y ∉ PSDn∗.

We conclude that PSDn is self-dual.

To prove that PSDn is pointed: suppose A and −A are both positive semi-definite. Then for all x ∈ Rn we have

$$0 \le x^\top A x = -x^\top(-A)x \le 0 \implies x^\top A x = 0,$$

so A = 0 by lemma 2.1.27.

Finally, to prove that PSDn is solid, we prove that Int PSDn = PSDn+.

• If A ∈ PSDn has only positive eigenvalues (so A ∈ PSDn+), then there exists a neighborhood of A in Sn consisting of matrices with only positive eigenvalues, and therefore A ∈ Int PSDn. This follows from the fact that the eigenvalues of a matrix are continuous as functions of its entries, which in turn follows from the fact that the roots of a polynomial are continuous functions of its coefficients (see, for example, [15]).

• If A ∈ PSDn has a zero eigenvalue, then for every ε > 0 the symmetric matrix A − εI has a negative eigenvalue, and therefore A ∉ Int PSDn.

We conclude Int PSDn = PSDn+, which finishes the proof.

This concludes the linear algebraic side of the preliminaries. We will also need some definitions from graph theory, which we discuss in the next section.

2.2. Graph theory

An undirected graph is a tuple G = (V, E), where V is a finite set of vertices (usually {1, . . . , n}) and E ⊆ {{u, v} ⊆ V | u ≠ v} is the set of edges. Since we will only consider undirected graphs in this paper, we will simply refer to them as graphs. Some basic terminology:

• A graph is called fully connected if it contains all possible edges, that is, E = {{u, v} | u, v ∈ V }.

• A graph is called totally disconnected if it contains no edges, so E = ∅.

• Given a subset V′ ⊆ V , the induced subgraph is the graph G′ = (V′, E′), where

$$E' = \{ \{u, v\} \in E \mid u, v \in V' \}.$$

Definition 2.2.1. Let G = (V, E) be a graph and V′ ⊆ V . Then V′ is called an independent set if the induced subgraph is totally disconnected. The stability number of G, denoted α(G), is the number of vertices in the largest independent set of G.

There is a very natural connection between graphs and linear algebra:

Definition 2.2.2. Let G = (V, E) be a graph. Then we define the adjacency matrix of G, denoted AG, by

$$(A_G)_{ij} = \begin{cases} 1 & \text{if } \{i, j\} \in E; \\ 0 & \text{otherwise.} \end{cases}$$
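To make these graph-theoretic notions concrete, the following Python sketch (our own; the helper names are ours, and vertices are numbered from 0 rather than 1) builds the adjacency matrix from an edge list and computes α(G) by brute force. The exponential loop over all vertex subsets already hints at the hardness results of chapter 4:

```python
import numpy as np
from itertools import combinations

def adjacency_matrix(n, edges):
    """Build A_G for a graph on vertices {0, ..., n-1} from an edge list."""
    A = np.zeros((n, n), dtype=int)
    for u, v in edges:
        A[u, v] = A[v, u] = 1   # undirected, so A_G is symmetric
    return A

def stability_number(n, edges):
    """alpha(G) by brute force: check every subset of vertices (exponential)."""
    edge_set = {frozenset(e) for e in edges}
    for size in range(n, 0, -1):                    # largest candidates first
        for subset in combinations(range(n), size):
            if all(frozenset(p) not in edge_set
                   for p in combinations(subset, 2)):
                return size                          # independent set found
    return 0

# 4-cycle 0-1-2-3-0: the largest independent set is {0, 2}, so alpha(G) = 2
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
print(adjacency_matrix(4, edges))
print(stability_number(4, edges))  # 2
```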


2.3. Complexity theory

In this thesis, the complexity of certain algorithms will be studied, and we will use terms such as polynomial time, NP-hard, and co-NP-complete. This section will mostly consist of introducing these terms (hopefully already familiar to the reader) and our notation. The treatment is based on that in [12]. The main topic that the reader may not have encountered before is Turing reductions and the corresponding oracle Turing machines.

2.3.1. Problems and encodings

Everybody has an intuitive notion of what a problem, a solution, or an algorithm is. However, it seems difficult to rigorously define these terms. For example, on a high level, an algorithm is best described by ‘a step-by-step procedure which solves a problem’, which is of course not a definition. Formal definitions are only possible when both an encoding scheme and a machine model are chosen. In this section, we will briefly describe this formalism, and we refer to [12] for the details.

The formalism is based on only considering decision problems, which are problems which only have ‘yes’ and ‘no’ as output. These problems are easily defined formally:

Definition 2.3.1. A decision problem Π consists of a set DΠ of instances and a set YΠ ⊆ DΠ of yes-instances.

Example 2.3.2. Let us consider a decision problem that we will use as an example throughout this section. The independent set problem asks: given a graph G = (V, E) and some k ∈ {1, . . . , |V |}, does G contain an independent set of size k?

Here, the set of instances DΠ is the set of all tuples (G, k) where G = (V, E) is a graph and k ≤ |V |, and the set of yes-instances YΠ is the subset of all tuples (G, k) where G contains an independent set of size k.

We will denote instances of this problem as (G, k) or ((V, E), k).

The next step is to translate a problem into something a machine can understand. This is called an encoding scheme.

Definition 2.3.3. Let Σ be a finite set of symbols (called an alphabet). Then Σ∗ is the set of all finite strings of symbols from Σ (including the empty string ε). A subset L ⊆ Σ∗ is called a language over Σ. The length of a string x is denoted by |x|.

Given a problem Π, an encoding scheme for Π is an injective function e : DΠ → Σ∗.

Example 2.3.4. Let us return again to the previous example of the independent set problem, using the alphabet Σ = {0, 1, (, ), −}, and let b(n) ∈ Σ∗ denote the binary representation of n ∈ N (as a string).

The parameters of the independent set problem are the graph G = (V, E) and the constant k. Without loss of generality we assume that the vertex set V is always given by {1, . . . , m} for some m ∈ N. Writing E = {{u1, v1}, . . . , {up, vp}}, a possible encoding scheme could be

e(G, k) := b(k)−b(m)−(b(u1)−b(v1)) · · · (b(up)−b(vp)).

Of course, many more encoding schemes are possible. For example, instead of specifying the number of vertices and then all edges, we could simply write out the rows of the adjacency matrix AG. Since any entry of AG is either 0 or 1 (and therefore an element of Σ), this yields a second encoding scheme, in which b(k) is followed by the m rows of AG written out one after the other.
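As a small illustration of the first scheme from example 2.3.4, here is a Python sketch (our own reading of the scheme; the thesis gives no code at this point):

```python
def b(n):
    """The binary representation of n as a string, e.g. b(5) = '101'."""
    return format(n, "b")

def encode(edges, m, k):
    """e(G, k) = b(k) - b(m) - (b(u1)-b(v1)) ... (b(up)-b(vp))."""
    return (b(k) + "-" + b(m) + "-"
            + "".join("(" + b(u) + "-" + b(v) + ")" for u, v in edges))

# The 4-cycle on vertices {1, 2, 3, 4} with k = 2:
print(encode([(1, 2), (2, 3), (3, 4), (4, 1)], m=4, k=2))
# 10-100-(1-10)(10-11)(11-100)(100-1)
```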


Now, given an encoding scheme, we can define the language associated with a problem:

Definition 2.3.5. Let Π be a problem and e an encoding scheme for that problem using alphabet Σ. Then we define the language associated with Π as

$$L[\Pi, e] := \{ x \in \Sigma^* \mid x = e(I) \text{ for some } I \in Y_\Pi \}.$$

While encoding schemes are necessary to formally discuss algorithms and complexity, using them is cumbersome, especially since there are so many possibilities. This is where Garey and Johnson in [12] give the notion of a reasonable encoding scheme for a problem Π: an encoding scheme which concisely describes the problem without adding any unnecessary information or padding the input. Of course, this is not a formal definition, so this is where we must appeal to our intuition. Garey and Johnson make the following observation:

Observation. Every concept in terms of languages is encoding-independent, so long as we restrict ourselves to reasonable encoding schemes.

However, we need some notion of ‘input length’, which is defined as follows:

Definition 2.3.6. A length function for a problem Π is a function LenΠ : DΠ → N which is polynomially related to any reasonable encoding scheme. That is, given a reasonable encoding scheme e, there exist polynomials p and q such that for all I ∈ DΠ we have

LenΠ(I) ≤ p(|e(I)|) and |e(I)| ≤ q(LenΠ(I)).

From this point, we will assume that every problem has a length function associated with it.

Example 2.3.7. The most natural length function for the independent set problem is simply ((V, E), k) ↦ |V |. It is easily checked that both encoding schemes from example 2.3.4 are polynomially related to this length function. Informally, we say that the input length of an instance of this problem is the number of vertices of G.

2.3.2. Algorithms, Turing machines, the class P

To define an algorithm, we need to fix a machine model — we use deterministic Turing machines. We will only give the definition and a short explanation: for details, the reader is referred to [12].

Definition 2.3.8. A deterministic one-tape Turing machine or DTM is an 8-tuple

M = (Q, q0, qY, qN, Γ, Σ, b, δ),

where:

• Q is a set of states and q0, qY, qN ∈ Q are the initial state, accepting state, and rejecting state respectively;

• Γ is a set of symbols, Σ ⊆ Γ a set of input symbols, and b ∈ Γ \ Σ a blank symbol;

• δ : (Q \ {qY, qN}) × Γ → Q × Γ × {L, R} is a transition function.

We imagine a two-way infinite tape made up of tape squares labeled · · · , −2, −1, 0, 1, 2, . . . , and a read-write head which is always scanning exactly one square. A computation on a DTM then proceeds as follows:


1. An input string x ∈ Σ∗ is placed in tape squares 1 through |x|, and all other squares contain the blank symbol b.

2. The read-write head starts at square 1, and the program starts in state q0.

3. In a step-by-step manner, the transition function δ(q, s) = (q′, s′, ∆) is computed, with q the current state and s the current symbol being scanned by the read-write head. The current state is then updated to q′, the read-write head replaces s by s′, and then moves one square left if ∆ = L and right if ∆ = R.

This happens until either the accepting state qY or the rejecting state qN is reached (which may never happen).

The number of steps that a computation takes is called the time of computation.

Definition 2.3.9. Let M be a DTM, x ∈ Σ∗, and let Π be a decision problem under encoding scheme e.

1. We say M halts on input x if M , on input x, reaches the accepting or rejecting state. If the accepting state is reached, we say that M accepts x.

2. The language recognized by M is defined as LM := {x ∈ Σ∗ | M accepts x}.

3. We say that M solves Π if LM = L[Π, e].

4. M is called algorithmic if M halts on all inputs x ∈ Σ∗.

5. If M is algorithmic, its time complexity function TM : N → N is given by mapping n to the maximum computation time of M, taken over all inputs of size n.

6. If M is algorithmic, it is called a polynomial time DTM if there exists a polynomial p such that TM(n) ≤ p(n) for all n ∈ N.

7. We define the class P as

   P := {L ⊆ Σ∗ | there exists a polynomial time DTM M such that L = LM},

   and we will say the problem Π belongs to P (denoted Π ∈ P) if L[Π, e] belongs to P for some reasonable encoding scheme e.

This is a formal definition of the class P. If we want to check if a problem is solvable in polynomial time, it would be cumbersome to have to construct a DTM which solves the problem. This motivates a new intuitive notion (again, following Garey and Johnson), namely that of a reasonable machine model. Examples of reasonable machine models are k-tape Turing machines and Random Access Machines, and the main idea is that a reasonable machine model has a polynomial bound on the amount of work that can be done in a single unit of time (for example, a k-tape Turing machine can do k ‘units of work’ in one unit of time).

Observation (Invariance thesis). Reasonable machine models can simulate each other with a polynomial overhead in time.

For a more detailed discussion on this thesis, see [16]. Its main implication is that we can choose any machine model we like to check if a problem is solvable in polynomial time.


2.3.3. The class NP

To define problems in NP, we will first give an intuitive description in terms of the independent set problem. Recall that this problem asks: given k ∈ N and a graph G, does G contain an independent set of size k? Suppose someone answers ‘yes’ to this question, and we ask them to prove their claim. Then they could provide us with a subset V′ ⊆ V , and we would only have to check (a) that |V′| = k and (b) that V′ is an independent set.

Remark. Here, we will start being informal and state the following claim: it is clear that this verification can be done in polynomial time. As discussed earlier, to prove this we would need to construct a DTM (or some other reasonable machine model) and show that it runs in polynomial time. However, we simply see that checking whether |V′| = k requires k ‘steps’, and checking whether V′ is an independent set requires checking all edges between elements of V′ — since there are $\binom{k}{2} = \frac{k(k-1)}{2} = \frac{k^2}{2} - \frac{k}{2}$ pairs that need to be checked and k ≤ |V |, this is a polynomial number of ‘steps’, even if each one of these ‘steps’ in fact requires 10, 100, or 1000 units of time on a DTM (so long as we can bound it by a polynomial).
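This verification procedure is easy to express in code. The sketch below (our own, with hypothetical names) performs exactly checks (a) and (b) on a claimed independent set, using a number of steps polynomial in |V |:

```python
from itertools import combinations

def verify_independent_set(edges, k, certificate):
    """Polynomial-time check that `certificate` proves the yes-instance:
    (a) it contains exactly k vertices, (b) no edge joins two of them."""
    edge_set = {frozenset(e) for e in edges}
    if len(set(certificate)) != k:                    # check (a)
        return False
    return all(frozenset(p) not in edge_set           # check (b)
               for p in combinations(certificate, 2))

edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
print(verify_independent_set(edges, 2, [0, 2]))  # True
print(verify_independent_set(edges, 2, [0, 1]))  # False: {0, 1} is an edge
```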

This gives the notion of polynomial time verifiability: while it may or may not be possible to find a solution in polynomial time, it is possible to verify in polynomial time that a claimed solution is indeed a solution. Another way to state this is that a yes-answer to a problem in NP must have a proof of polynomial length.

To formalize this, Garey and Johnson define the following nondeterministic Turing machine:

Definition 2.3.10. A nondeterministic Turing machine (NDTM) is a DTM M with the addition of a guessing module: preceding any computation, the guessing module, starting on square -1, either writes a randomly chosen symbol from Γ at its current position and then moves one position to the left, or halts. If the guessing module halts (which may not happen), then computation proceeds as described earlier.

Note that there are infinitely many possible computations on an NDTM, one for each possible guessed string. We say that M accepts x ∈ Σ∗ if there exists at least one computation for which M enters the accepting state qY, and again the language LM recognized by M is defined as the set of accepted strings.

The time to accept a string x ∈ LM is given by the minimal number of steps before qY is reached, taken over all possible guesses for which x is accepted. The time complexity function TM(n) is given by the maximum computation time over all x ∈ LM of length n (note that we only consider accepting computations). M is called polynomial time if TM(n) is bounded by a polynomial.

And this allows us to define the class NP:

Definition 2.3.11. We define the class NP as

NP := {L ⊆ Σ∗ | there exists a polynomial time NDTM M such that L = LM},

and we will say a problem Π belongs to NP if L[Π, e] belongs to NP for some reasonable encoding scheme e.

From the definition it immediately follows that the independent set problem belongs to NP. One of the most famous open problems in computer science is the question of whether P = NP — it is now widely believed that the answer is negative, even though no proof has been given yet.


Definition 2.3.12. Let Π be a decision problem with instances DΠ and yes-instances YΠ. Then we define the complement of Π as the problem Π∁ with instances DΠ and yes-instances DΠ \ YΠ.

Intuitively, Π∁ simply asks the question “does x not lie in YΠ?”. It is immediately clear that Π ∈ P =⇒ Π∁ ∈ P: given a polynomial-time algorithm that solves Π, we can simply reverse the output (in a DTM: interchange the states qY and qN) to obtain a polynomial-time algorithm that solves Π∁.

However, it is not clear why Π ∈ NP would imply that Π∁ ∈ NP: for some problems, it seems easy to verify yes-instances but difficult to verify no-instances.

Example 2.3.13. Consider the complement of the independent set problem: given G = (V, E) and k ≤ |V |, does G not contain an independent set of size k?

Intuitively, to be able to verify this, one would have to check all subsets V0 ⊆ V of size k and show that none of them are independent, but this is not doable in polynomial time.

This induces the following definition:

Definition 2.3.14. We define the class co-NP as the set of all languages L ⊆ Σ∗ whose complement Σ∗ \ L lies in NP.

A problem Π is said to belong to co-NP if L[Π, e] ∈ co-NP for some reasonable encoding scheme e.

Remark. Suppose Π is a decision problem with some reasonable encoding scheme e using alphabet Σ. Then e divides Σ∗ into three classes: encodings of yes-instances, encodings of no-instances, and strings that are not encodings of instances. In other words: the set of yes-instances of Π∁ does not equal the complement of L[Π, e] in Σ∗. However, for any reasonable encoding scheme, it must be verifiable in polynomial time whether a string is an encoding of an instance or not — therefore, the above definition is still sensible.

It is an open question whether NP = co-NP, but just as with P = NP, the answer is widely believed to be negative. Note that if NP 6= co-NP, this immediately implies P 6= NP, since the class P equals its own complement.

Another related unsolved question is whether P = NP ∩ co-NP.

2.3.4. NP-completeness and NP-hardness

It is clear that deterministic Turing machines can be used to compute functions in the following manner:

Definition 2.3.15. Let M be a DTM and f : Σ∗ → Γ∗. Then f is said to be computed by M if:

• M is algorithmic (M halts for all inputs);

• On input x ∈ Σ∗, the output y ∈ Γ∗ (defined by running M until it halts and then forming a string from the symbols on tape squares 1, 2, . . . , up to the last non-blank tape square) equals f(x).

Using this, we can define ‘polynomial transformations’:

Definition 2.3.16. Let L1 ⊆ Σ1∗, L2 ⊆ Σ2∗, and let f : Σ1∗ → Σ2∗ satisfy the following two properties:

1. f is computable in polynomial time;

2. For all x ∈ Σ1∗ it holds that x ∈ L1 ⇐⇒ f(x) ∈ L2.

Then f is called a polynomial transformation or many-one-reduction from L1 to L2. If a polynomial transformation from L1 to L2 exists, we write L1 ∝ L2.

If Π1 and Π2 are decision problems such that L[Π1, e1] ∝ L[Π2, e2] for reasonable encoding schemes e1 and e2, we say that Π1 is many-one-reducible to Π2.

Example 2.3.17. Let us consider an easy polynomial transformation. For a graph G = (V, E), a subset V′ ⊆ V is called a clique if the induced subgraph is fully connected. The clique problem asks: given k ≤ |V |, does G contain a clique of size k?

It is easily seen that V′ ⊆ V is an independent set if and only if V′ is a clique in the complement graph G∁ (formed by replacing all non-edges by edges and vice versa). Therefore, the independent set problem is many-one-reducible to the clique problem, since constructing G∁ from G can be done in polynomial time. By analogous reasoning, the clique problem is many-one-reducible to the independent set problem.
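In code, the transformation amounts to constructing the complement graph; the following sketch (ours) does so in time polynomial in the number of vertices:

```python
from itertools import combinations

def complement_graph(n, edges):
    """All non-edges of G on vertices {0, ..., n-1} become the edges of G-complement."""
    edge_set = {frozenset(e) for e in edges}
    return [(u, v) for u, v in combinations(range(n), 2)
            if frozenset((u, v)) not in edge_set]

# (G, k) is a yes-instance of the independent set problem iff
# (complement_graph(G), k) is a yes-instance of the clique problem.
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]   # 4-cycle
print(complement_graph(4, edges))           # [(0, 2), (1, 3)]
```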

Suppose L2 ∈ P and L1 ∝ L2. Then it is clear that L1 ∈ P as well, since for any x ∈ Σ1∗ we can first compute f(x) (in polynomial time) and then determine whether f(x) ∈ L2.

We state this as a lemma (together with its analogue for NP):

Lemma 2.3.18. Let L1, L2 be languages such that L1 ∝ L2. Then:

(a) L2 ∈ P =⇒ L1 ∈ P;

(b) L2 ∈ NP =⇒ L1 ∈ NP.

Definition 2.3.19. Two languages L1, L2 are called polynomially equivalent if L1 ∝ L2 and L2 ∝ L1.

Polynomial equivalence states that deciding whether x ∈ L1 is ‘equally hard’ as deciding whether x ∈ L2.

It turns out that there exist languages in NP that satisfy the property that any language in NP can be polynomially transformed to them. These languages define ‘the hardest problems in NP’:

Definition 2.3.20. Let L be a language in NP such that for all languages L′ ∈ NP it holds that L′ ∝ L. Then L is called NP-complete.

If Π is a decision problem such that L[Π, e] is NP-complete for some reasonable encoding scheme e, then Π is also called NP-complete.

A language (or the corresponding decision problem) is called co-NP-complete if its complement is NP-complete.

The following theorem will be very important in our studies. For a proof, we refer the reader to [12].

Theorem 2.3.21. The independent set problem is NP-complete.

So informally, we now know the NP-complete problems are ‘the hardest problems in NP’. Therefore, if we can show that a problem is harder than an NP-complete problem, then it is either NP-complete itself or lies outside NP — this is called an NP-hard problem. As it turns out, the notion of many-one-reducibility is, for some problems, not strong enough. This is where Turing reducibility comes into play.

Intuitively, a problem Π1 is Turing reducible to Π2 if Π1 can be solved in polynomial time given an oracle for Π2, where an oracle for Π2 can be viewed as a magical machine which solves any instance of Π2 in only one step. To formally define this, we need an oracle Turing machine. We will give a slightly different definition than the one given in [12] — ours is less general, but still equivalent.


Definition 2.3.22. An oracle Turing machine (OTM) is an 11-tuple

M = (Q, q0, qY, qN, qc, qoY, qoN, Γ, Σ, b, δ),

where:

• Q, q0, qY, qN, Γ, Σ, b are the same as for a DTM;

• qc, qoY, qoN ∈ Q are the oracle consultation state, oracle yes state, and oracle no state respectively;

• δ : (Q \ {qY, qN, qc}) × Γ → Q × Γ × Σ × {L, R} × {L, R} is a transition function.

We imagine a standard DTM which now also has a two-way infinite oracle tape and a corresponding (write-only) oracle head. A computation on an OTM depends on a specified oracle set D ⊆ Σ∗ and proceeds as follows:

1. As usual, an input string x ∈ Σ∗ is placed on the primary tape, all other squares contain the blank symbol b, and both heads start at square 1 of their tape. The program starts in state q0.

2. In a step-by-step manner, the following occurs:

• If the current state is qY or qN, the computation ends.

• If the current state is qc, let y ∈ Σ∗ be the string on oracle tape squares 1 through k, where k is the last square which does not contain a blank. In one step, the oracle tape is changed to contain only blanks, and the state is changed to qoY if y ∈ D and to qoN if y ∉ D.

• Otherwise, the computation occurs exactly like the computation on a DTM, but with two tapes. So δ(q, s1) = (q′, s1′, s2′, ∆1, ∆2) is computed, where q is the current state and s1 is the symbol scanned by the read-write head. The state is changed to q′, the read-write head replaces s1 by s1′ and moves in direction ∆1, and the oracle head writes s2′ over the current oracle tape symbol and moves in direction ∆2.

We write $M^D$ for the OTM M with specified oracle set D, and we call $M^D$ a polynomial time OTM if its time complexity function is bounded by a polynomial. We define $L_M^D$ as the set of strings that are accepted by $M^D$.

With the formal definition of an OTM, we can define Turing reductions:

Definition 2.3.23. Let L1, L2 ⊆ Σ∗ be two languages. A polynomial time Turing reduction from L1 to L2 is an OTM M with alphabet Σ and oracle set L2 such that $L_M^{L_2} = L_1$.

If a Turing reduction exists from L1 to L2, we write L1 ∝T L2.

Example 2.3.24. To obtain some intuition for OTMs, we will show that every decision problem Π is Turing reducible to its complement Π∁. To see this, let M be the following OTM:

1. The input x is written on the oracle tape and the oracle is consulted;

2. If the state after consultation is qoY, change the state to qN. If the state after consultation is qoN, change the state to qY.

Clearly, if M has oracle set YΠ, then LM = YΠ∁, which proves that M is a Turing reduction from Π to Π∁.

Example 2.3.25. It is clear that a many-one reduction is simply a special case of a Turing reduction: if x ∈ L1 ⇐⇒ f(x) ∈ L2, then L1 ∝T L2: given an input x, we need only compute f(x) and call the oracle once with input f(x) to determine whether f(x) ∈ L2, and this immediately shows whether or not x ∈ L1.

Definition 2.3.26. A language L ⊆ Σ∗ is called NP-hard if there exists an NP-complete language L′ such that L′ ∝T L.

A decision problem Π is called NP-hard if L[Π, e] is NP-hard for some reasonable encoding scheme e.

The previous two examples show that if L ⊆ Σ∗ is the complement of an NP-complete language, or if an NP-complete language is many-one-reducible to L, then L is NP-hard.


3. The completely positive and copositive cones

The following chapter introduces the completely positive and copositive matrices and treats some interesting questions regarding these classes of matrices.

3.1. Definitions and elementary properties

Most of the definitions and terminology in this section are taken from [11].

3.1.1. Completely positive matrices

A matrix A ∈ Rn×m is called nonnegative if all entries of A are nonnegative, and positive if all entries of A are positive. The sets of entrywise nonnegative and positive matrices in Rn×m are denoted by Rn×m≥0 and Rn×m>0 respectively. We also use the notation A ≥ 0 if A is nonnegative and A > 0 if A is positive.

Definition 3.1.1. A matrix A ∈ Rn×n is completely positive if there exists k ∈ N and a nonnegative matrix B ∈ Rk×n≥0 such that A = B⊤B. The set of completely positive matrices in Rn×n is denoted CPn.

If A ∈ CPn, the smallest k for which we can find a matrix B ∈ Rk×n≥0 with A = B⊤B is called the CP-rank of A.

Example 3.1.2. The all-ones matrix E ∈ Rn×n is completely positive, since it can be decomposed as E = ee⊤ (recall that e is the all-ones vector).

Sometimes the following characterization is more useful:

Proposition 3.1.3. Let A ∈ Sn. Then the following are equivalent:

(a) A ∈ CPn.

(b) There exist b1, . . . , bk ∈ Rn≥0 such that $A = \sum_{i=1}^k b_i b_i^\top$.

(c) There exist v1, . . . , vn ∈ Rk≥0 such that A = Gram(v1, . . . , vn).

Proof. This follows immediately from proposition 2.1.24.

For example, one property that immediately follows from this characterisation is that the sum of two completely positive matrices is again completely positive.

Determining whether a matrix is completely positive is hard in general (we will discuss this in the next chapter). There are, however, some clear necessary conditions for complete positivity. By theorem 2.1.25, it is clear that A ∈ PSDn, and furthermore, since A is the product of entrywise nonnegative matrices, A must be entrywise nonnegative. Matrices which satisfy the last two properties have their own name:

Definition 3.1.4. A matrix A ∈ Sn is called doubly nonnegative if A is nonnegative and positive semi-definite. The set of doubly nonnegative matrices in Rn×n is denoted DNNn.

The following proposition is now immediate:

Proposition 3.1.5. We have CPn ⊆ DNNn.

We give a well-known result, which we will not prove in this thesis. In [11], both an algebraic and a geometric proof are given.

Theorem 3.1.6. We have CPn = DNNn if and only if n ≤ 4. Furthermore, for n ≤ 4, the CP-rank of A ∈ CPn is at most n.

In this thesis, we will try to uncover properties of completely positive matrices and study the problem of determining whether or not a matrix is completely positive.

For much of our discussion, we will need the following important result:

Theorem 3.1.7. The set CPn is a closed convex cone in Rn×n. The extreme rays of CPn are given by matrices of the form xx⊤ with x ∈ Rn≥0.

Proof. It follows immediately from proposition 3.1.3 that CPn is a convex cone. To prove that it is closed, we must show that if (Ak)k∈N is a sequence in CPn which converges to A, then A ∈ CPn. Write

$$A_k = \operatorname{Gram}\big(v_1^{(k)}, \dots, v_n^{(k)}\big).$$

Note that the vectors $v_i^{(k)}$ need not be of the same size for different k. However, we will show in corollary 3.2.7 that we can find an upper bound N for the sizes of the $v_i^{(k)}$. By adding zeroes to all $v_i^{(k)}$ until they have size N, we may assume without loss of generality that $v_i^{(k)} \in \mathbb{R}^N$ for all i, k.

Now, for any j, the sequence $(v_j^{(k)})_{k\in\mathbb{N}}$ is bounded (since $\|v_j^{(k)}\|^2 \to a_{jj}$ for k → ∞), and therefore has a converging subsequence with limit vj ≥ 0. It follows that A = Gram(v1, . . . , vn), and therefore A ∈ CPn.

The fact that the extreme rays are given by xx⊤ for x ∈ Rn≥0 also follows immediately from proposition 3.1.3.

Remark. In the proof of this theorem in [11], the vectors $v_i^{(k)}$ are not appended with zeroes, and the fact that there exists an upper bound for their sizes is not used. As far as the author is aware, this is essential for the proof, and thus the proof in [11] is incorrect.

We will conclude this paragraph by proving a sufficient condition for complete positivity:

Definition 3.1.8. A matrix A ∈ Rn×n is called diagonally dominant if, for all i ∈ {1, . . . , n}, we have

$$|a_{ii}| \ge \sum_{j \ne i} |a_{ij}|,$$

and strictly diagonally dominant if the above inequality is strict for all i.

Theorem 3.1.9. If A ∈ Rn×n≥0 is symmetric and diagonally dominant, it is completely positive.

Proof. Define $a_i = a_{ii} - \sum_{j \ne i} a_{ij}$; then by the assumptions we have ai ≥ 0 for all i. Let $e_{i,j} := e_i + e_j$ and $F_{ij} := e_{i,j} e_{i,j}^\top$, so Fij has 1 in positions ii, ij, ji, jj and 0 everywhere else. Then we have

$$A = \sum_{i<j} a_{ij} F_{ij} + \operatorname{diag}(a_1, \dots, a_n).$$

Since all aij are nonnegative, all Fij are completely positive, and diag(a1, . . . , an) is clearly completely positive as well; hence A is a sum of completely positive matrices and therefore completely positive.


Note that the condition of being diagonally dominant is not a necessary condition for complete positivity: for n > 2, the all-ones matrix E = ee⊤ ∈ Sn is completely positive but not diagonally dominant.
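Since the proof of theorem 3.1.9 is constructive, it directly yields a factorization algorithm. The NumPy sketch below (our own illustration; the function name is ours) assembles a nonnegative B with A = B⊤B from the rank-one terms used in the proof:

```python
import numpy as np

def dd_cp_factorization(A, tol=1e-12):
    """For symmetric, entrywise nonnegative, diagonally dominant A, return a
    nonnegative B with A = B^T B, following the proof of theorem 3.1.9."""
    n = A.shape[0]
    rows = []
    for i in range(n):
        for j in range(i + 1, n):
            if A[i, j] > tol:            # a_ij * F_ij = (sqrt(a_ij) e_{i,j})(sqrt(a_ij) e_{i,j})^T
                row = np.zeros(n)
                row[i] = row[j] = np.sqrt(A[i, j])
                rows.append(row)
    a = np.diag(A) - (A.sum(axis=1) - np.diag(A))   # a_i = a_ii - sum_{j != i} a_ij >= 0
    for i in range(n):
        if a[i] > tol:                   # the diag(a_1, ..., a_n) part
            row = np.zeros(n)
            row[i] = np.sqrt(a[i])
            rows.append(row)
    return np.array(rows)

A = np.array([[3.0, 1.0, 1.0],
              [1.0, 2.0, 1.0],
              [1.0, 1.0, 2.0]])          # nonnegative, symmetric, diagonally dominant
B = dd_cp_factorization(A)
assert (B >= 0).all() and np.allclose(B.T @ B, A)
```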

3.1.2. Copositive matrices

We now consider the dual cone of CPn, called the set of copositive matrices.

Definition 3.1.10. Let A ∈ Sn. Then A is called copositive if x⊤Ax ≥ 0 for all x ∈ Rn≥0, and strictly copositive if the inequality is strict for all x ≠ 0.

The set of copositive matrices in Rn×n is denoted COPn, and the set of strictly copositive matrices by COPn+.

Clearly, a copositive (resp. strictly copositive) matrix has only nonnegative (resp. positive) diagonal entries.

Example 3.1.11. Consider a matrix $A = \begin{pmatrix} a & b \\ b & c \end{pmatrix} \in S_2$ with a, c ≥ 0. Then, for x = (x, y)⊤ ∈ R2≥0 we have

$$x^\top A x = \begin{pmatrix} x & y \end{pmatrix} \begin{pmatrix} a & b \\ b & c \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = ax^2 + 2bxy + cy^2 = (\sqrt{a}\,x - \sqrt{c}\,y)^2 + 2(b + \sqrt{ac})xy.$$

Since x, y ≥ 0, this shows b ≥ −√ac =⇒ A ∈ COP2. Conversely, suppose b < −√ac, and consider the vector $x = (1, \sqrt{a/c})^\top$. Then we find

$$x^\top A x = 2(b + \sqrt{ac})\sqrt{\tfrac{a}{c}} < 0,$$

so A ∉ COP2. Therefore, we have proven that

$$\begin{pmatrix} a & b \\ b & c \end{pmatrix} \in COP_2 \iff a, c \ge 0 \text{ and } b \ge -\sqrt{ac}.$$
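For 2 × 2 matrices, example 3.1.11 thus gives a complete and trivially cheap membership test; as a minimal Python sketch (ours):

```python
import math

def is_copositive_2x2(a, b, c):
    """Test whether [[a, b], [b, c]] is copositive, using example 3.1.11."""
    return a >= 0 and c >= 0 and b >= -math.sqrt(a * c)

print(is_copositive_2x2(1.0, -0.5, 1.0))  # True:  -0.5 >= -sqrt(1*1)
print(is_copositive_2x2(1.0, -2.0, 1.0))  # False: e.g. x = (1, 1) gives x^T A x = -2
```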

Note that any positive semi-definite matrix is copositive, as well as any symmetric nonnegative matrix. It also follows from the definition that the sum of two copositive matrices is copositive. Let us denote the set of symmetric nonnegative matrices in Rn×n by SNNn = Sn ∩ Rn×n≥0. We conclude the following:

Proposition 3.1.12. We have PSDn + SNNn ⊆ COPn.

The next theorem shows equality only holds for small n — the same n as in theorem 3.1.6.

Theorem 3.1.13 (Diananda [17]). We have COPn = PSDn + SNNn if and only if n ≤ 4.

The following lemma is an easy consequence of the definition of copositivity:

Lemma 3.1.14. Let A ∈ Sn, and let ‖·‖ be any norm on Rn. Then A is copositive if and only if for all x ∈ Rn≥0 with ‖x‖ = 1 we have x⊤Ax ≥ 0.

Analogously, A is strictly copositive if and only if for all x ∈ Rn≥0 with ‖x‖ = 1 we have x⊤Ax > 0.


Theorem 3.1.15. The set COPn is a proper cone in Sn. In fact, CPn∗ = COPn and COPn∗ = CPn.

Proof. We will first show that CPn∗ = COPn. We have

$$A \in CP_n^* \iff \operatorname{Tr}(AC) \ge 0 \text{ for all } C \in CP_n \iff \operatorname{Tr}\big(AB^\top B\big) \ge 0 \text{ for all } B \in \mathbb{R}^{k \times n}_{\ge 0},\ k \in \mathbb{N} \iff \operatorname{Tr}\big(BAB^\top\big) \ge 0 \text{ for all } B \in \mathbb{R}^{k \times n}_{\ge 0},\ k \in \mathbb{N}.$$

Letting $b_1^\top, \dots, b_k^\top$ denote the rows of B, we have $\operatorname{Tr}\big(BAB^\top\big) = \sum_{i=1}^k b_i^\top A b_i$, and from this we infer that

$$\operatorname{Tr}\big(BAB^\top\big) \ge 0 \text{ for all } B \in \mathbb{R}^{k \times n}_{\ge 0},\ k \in \mathbb{N} \iff b^\top A b \ge 0 \text{ for all } b \in \mathbb{R}^n_{\ge 0} \iff A \in COP_n.$$

This proves that CPn∗ = COPn, and therefore that COPn is a closed convex cone in Sn and that COPn∗ = CPn by theorem 2.1.20.

The proof that COPn is pointed is almost exactly the same as in the proof of theorem 2.1.28: if A and −A are both copositive, then x⊤Ax = 0 for all x ∈ Rn≥0, which implies A = 0 by lemma 2.1.27. Since PSDn ⊆ COPn, we know that COPn is solid, which completes the proof that COPn is a proper cone.

In contrast with the extreme rays of CPn, the extreme rays of COPn are not yet fully known.

We give one immediate application of the above theorem. Theorems 3.1.6 and 3.1.13 look very similar, and it is sensible to wonder if one can be proved from the other. In fact, the two theorems (while discovered independently) are equivalent, and this can be understood in terms of duality:

Proposition 3.1.16. For fixed $n \in \mathbb{N}$, we have $\mathrm{CP}_n = \mathrm{DNN}_n$ if and only if $\mathrm{COP}_n = \mathrm{PSD}_n + \mathrm{SNN}_n$.

Proof. It is easily seen that $\mathrm{DNN}_n$ is a closed convex cone, so by theorem 2.1.20 we have
$$\mathrm{CP}_n = \mathrm{DNN}_n \iff \mathrm{CP}_n^* = \mathrm{DNN}_n^*.$$
As we have just shown, $\mathrm{CP}_n^* = \mathrm{COP}_n$, and it is proved in [18] that $\mathrm{DNN}_n^* = \mathrm{PSD}_n + \mathrm{SNN}_n$. From this, the equivalence follows.

We end this subsection with the following chain of inclusions:
$$\mathrm{CP}_n \subseteq \mathrm{DNN}_n \subseteq \mathrm{PSD}_n \subseteq \mathrm{DNN}_n^* \subseteq \mathrm{COP}_n \subseteq S_n \subseteq \mathbb{R}^{n \times n},$$
where

• $\mathrm{CP}_n$ is a closed convex cone with dual $\mathrm{COP}_n$;

• $\mathrm{PSD}_n$ is a proper self-dual cone with interior $\mathrm{PSD}_n^+$;

3.2. Membership and maximal CP-rank

The first question we will treat is: how can we determine whether a matrix $A$ is completely positive? Since $\mathrm{CP}_n \subseteq \mathrm{DNN}_n$, it already helps to know whether the matrix in question is doubly nonnegative. But this of course raises the question: how do we check whether a matrix is doubly nonnegative? After answering this question, we prove an upper bound for the CP-rank.

3.2.1. Membership of $\mathrm{DNN}_n$

To verify whether a matrix $A \in S_n$ is doubly nonnegative, we must check both nonnegativity and positive semi-definiteness of $A$. Of course, nonnegativity of $A$ is trivial to verify. To verify that $A$ is positive semi-definite, we outline a two-step process. Recall that $A \in S_n$ is positive semi-definite if and only if all its eigenvalues are nonnegative (theorem 2.1.25).

Removing the kernel

Assume $A \in S_n$. Using Gaussian elimination, one can find a basis for $\ker(A)$, and this basis can be made orthonormal and extended to an orthonormal basis for $\mathbb{R}^n$ using the Gram-Schmidt algorithm. This can all be done efficiently (in polynomial time). Therefore, we can find an orthogonal matrix $Q \in \mathbb{R}^{n \times n}$, decomposable as $Q = \begin{pmatrix} Q_1 & Q_2 \end{pmatrix}$, such that the columns of $Q_2$ form an orthonormal basis of $\ker(A)$. In this case, we have both $AQ_2 = 0$ and $Q_2^\top A = (A^\top Q_2)^\top = (AQ_2)^\top = 0$.

Now computing the basis transformation $Q^\top A Q$, we obtain
$$Q^\top A Q = Q^\top A \begin{pmatrix} Q_1 & Q_2 \end{pmatrix} = \begin{pmatrix} Q_1^\top \\ Q_2^\top \end{pmatrix} \begin{pmatrix} AQ_1 & 0 \end{pmatrix} = \begin{pmatrix} Q_1^\top A Q_1 & 0 \\ 0 & 0 \end{pmatrix}.$$
It is clear that the eigenvalues of $Q_1^\top A Q_1$ are exactly the nonzero eigenvalues of $A$. Therefore, $A$ is positive semi-definite if and only if $Q_1^\top A Q_1$ is positive-definite.
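A sketch of this reduction in Python, using scipy's null_space for the kernel basis and a QR factorization in place of Gram-Schmidt (the function name and tolerance are our own choices):

```python
import numpy as np
from scipy.linalg import null_space

def restrict_to_range(A, tol=1e-10):
    """Compute Q1^T A Q1 as in the text: Q2 spans ker(A) and Q1 its
    orthogonal complement, so the result carries exactly the nonzero
    eigenvalues of A."""
    Q2 = null_space(A, rcond=tol)              # orthonormal basis of ker(A)
    n, k = A.shape[0], Q2.shape[1]
    # extend to an orthonormal basis of R^n: QR of [Q2 | I] keeps Q2 first
    Q, _ = np.linalg.qr(np.hstack([Q2, np.eye(n)]))
    Q1 = Q[:, k:]                              # columns span ker(A)^perp
    return Q1.T @ A @ Q1
```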

Determining positive-definiteness

By the previous step, we can assume without loss of generality that $A$ has no zero eigenvalues, and we want to determine whether $A$ is positive-definite. There are multiple efficient algorithms for this; we present one that relies on the following theorem:

Theorem 3.2.1 (Sylvester's criterion [19]). Let $A \in S_n$. Then $A$ is positive-definite if and only if for every $k \in \{1, \ldots, n\}$, the determinant of the $k \times k$ top-left block of $A$ is positive.

This theorem gives an efficient way to check whether a matrix is positive-definite, since a determinant can be computed in polynomial time using Gaussian elimination, and an $n \times n$ matrix has exactly $n$ leading principal minors.
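A direct translation of Sylvester's criterion into Python (illustration only: the determinants are computed in floating point here, whereas the polynomial-time claim concerns exact arithmetic):

```python
import numpy as np

def is_pd_sylvester(A):
    """Positive-definiteness test via Sylvester's criterion (Theorem 3.2.1):
    all n leading principal minors must be positive."""
    return all(np.linalg.det(A[:k, :k]) > 0 for k in range(1, A.shape[0] + 1))
```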

A different algorithm is the Cholesky-Banachiewicz algorithm: if $A$ is positive-definite, it produces, in polynomial time, an invertible lower-triangular matrix $L$ such that $A = LL^\top$ (called a Cholesky factorization), and if $A$ is not positive-definite, it aborts with an error, thereby detecting this.
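In practice, simply attempting the factorization is a convenient test; for instance, with numpy:

```python
import numpy as np

def is_pd_cholesky(A):
    """A symmetric matrix is positive-definite iff np.linalg.cholesky succeeds."""
    try:
        np.linalg.cholesky(A)       # raises LinAlgError if A is not PD
        return True
    except np.linalg.LinAlgError:
        return False
```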

Therefore, we conclude that it is possible to efficiently determine if a matrix is doubly nonnegative. We now turn to the question of complete positivity.


3.2.2. A geometric interpretation of complete positivity

Using elementary linear algebra, we can obtain a geometric interpretation of complete positivity.

Definition 3.2.2. Let $(V, \|\cdot\|_V)$ and $(W, \|\cdot\|_W)$ be normed linear spaces. Then a linear map $T\colon V \to W$ is called a (linear) isometry if $\|Tv\|_W = \|v\|_V$ for all $v \in V$.

Let $X = \{v_1, \ldots, v_n\} \subseteq V$ and $Y = \{w_1, \ldots, w_n\} \subseteq W$. We say that $X$ can be isometrically mapped to $Y$ if there exists an isometry $T\colon \operatorname{span}(X) \to W$ such that $Tv_i = w_i$ for $i = 1, \ldots, n$.

Recall the following fact from linear algebra:

Proposition 3.2.3. Let $(V, \langle\cdot,\cdot\rangle_V)$ and $(W, \langle\cdot,\cdot\rangle_W)$ be finite-dimensional inner product spaces with orthonormal bases $\Phi$ and $\Psi$ respectively, and let $T\colon V \to W$ be linear. Then the following are equivalent:

(a) $T$ is an isometry;

(b) for all $v, w \in V$ it holds that $\langle Tv, Tw\rangle_W = \langle v, w\rangle_V$;

(c) the matrix $Q$ that represents $T$ with respect to the bases $\Phi$ and $\Psi$ satisfies $Q^\top Q = I$;

(d) the matrix $Q$ that represents $T$ with respect to the bases $\Phi$ and $\Psi$ has orthonormal columns.

From the previous proposition, we obtain the following:

Proposition 3.2.4. Let $B \in \mathbb{R}^{k \times n}$ and $C \in \mathbb{R}^{\ell \times n}$. Then $B^\top B = C^\top C$ if and only if the columns of $B$ can be isometrically mapped to the columns of $C$.

Proof. Let $v_1, \ldots, v_n$ be the columns of $B$ and $w_1, \ldots, w_n$ the columns of $C$. Then we have
$$B^\top B = C^\top C \iff \operatorname{Gram}(v_1, \ldots, v_n) = \operatorname{Gram}(w_1, \ldots, w_n) \iff \langle v_i, v_j\rangle = \langle w_i, w_j\rangle \text{ for all } i, j.$$
This immediately shows that if $B^\top B \neq C^\top C$, no isometry can exist between the columns of $B$ and those of $C$. Furthermore, it is also clear that if we define $T\colon \operatorname{span}\{v_1, \ldots, v_n\} \to \mathbb{R}^\ell$ by declaring $Tv_i = w_i$ and extending $T$ linearly, then $T$ is an isometry by part (b) of the previous proposition and bilinearity of the inner product.

Assume that $A \in \mathrm{PSD}_n$. Because $A$ is positive semi-definite, we know that $A = C^\top C$ for some $C \in \mathbb{R}^{n \times n}$, and we want to find $k \in \mathbb{N}$ and $B \in \mathbb{R}^{k \times n}_{\geq 0}$ such that $A = B^\top B$. Now, proposition 3.2.4 shows that $A$ is completely positive if and only if we can find an isometry that maps the columns of $C$ into $\mathbb{R}^k_{\geq 0}$ for some $k$ (this is called isometrically embedding the columns of $C$ into $\mathbb{R}^k_{\geq 0}$). The CP-rank of $A$ is then the least $k$ for which this is possible.

Example 3.2.5. Suppose $A \in \mathrm{PSD}_2$, and write $A = C^2 = C^\top C$ for some $C \in S_2$ with columns $v_1, v_2$. Geometrically, it is immediately clear that $v_1, v_2$ can be isometrically embedded into $\mathbb{R}^2_{\geq 0}$ if and only if the angle between $v_1$ and $v_2$ is at most 90 degrees. This in turn occurs if and only if $\langle v_1, v_2\rangle \geq 0$, which happens if and only if $A = \operatorname{Gram}(v_1, v_2) \geq 0$, or equivalently, $A \in \mathrm{DNN}_2$.

We have thus shown the following: every $A \in \mathrm{DNN}_2$ is completely positive with a factorization $A = B^\top B$ where $B \in \mathbb{R}^{2 \times 2}_{\geq 0}$; in particular, $\mathrm{CP}_2 = \mathrm{DNN}_2$ and the CP-rank of such $A$ is at most 2.
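In the $2 \times 2$ case the embedding can even be read off from a Cholesky factorization: for a nonsingular $A \in \mathrm{DNN}_2$, the lower-triangular factor $L$ with $A = LL^\top$ automatically has nonnegative entries, so $B = L^\top$ is a nonnegative factor. A sketch (this observation and the example matrix are our own):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])            # in DNN_2: PSD and entrywise nonnegative
L = np.linalg.cholesky(A)             # A = L L^T with L lower triangular
B = L.T                               # nonnegative because A >= 0 entrywise and PSD
assert (B >= -1e-12).all() and np.allclose(B.T @ B, A)
```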

Based on the geometric interpretation, we can give an upper bound for the CP-rank of a matrix $A$. This relies mainly on the following theorem, originally found in [1]:

Theorem 3.2.6. Let $S = \{v_1, \ldots, v_\ell\}$ be a set of vectors in $\mathbb{R}^n$ which can be isometrically embedded into $\mathbb{R}^s_{\geq 0}$ for some $s \in \mathbb{N}$, and let $k = \dim \operatorname{span} S$. Then $S$ can be isometrically embedded into $\mathbb{R}^t_{\geq 0}$, where $t = \frac{1}{2}k(k+1)$.

Proof. Without loss of generality we may assume $n = k$: because $\dim \operatorname{span} S = k$, there exists an isometry which maps $S$ into $\mathbb{R}^k \times \{0\} \cong \mathbb{R}^k$.

Now, let $Q \in \mathbb{R}^{s \times k}$ be the matrix representing the isometry which maps $S$ into $\mathbb{R}^s_{\geq 0}$, so $Q^\top Q = I_k$ (or equivalently: $Q$ has orthonormal columns), and let $q_1^\top, \ldots, q_s^\top$ denote the rows of $Q$. Let $W = \{q_i q_i^\top \mid i = 1, \ldots, s\}$ and $d := \dim \operatorname{span} W$. We claim that, as long as $s > d$, we can remove a row of $Q$ and multiply the remaining rows by nonnegative numbers such that $Q$ remains an isometry. This shows that we can map $S$ isometrically into $\mathbb{R}^{s-1}_{\geq 0}$, and repeating this argument shows that we can map $S$ into $\mathbb{R}^d_{\geq 0}$.

To prove the claim, assume $s > d$ and write
$$I_k = Q^\top Q = \sum_{i=1}^s q_i q_i^\top.$$
Since $W$ spans a $d$-dimensional space and $s > d$, the matrices $q_i q_i^\top$ must be linearly dependent, so there exist $\alpha_1, \ldots, \alpha_s$, not all zero, such that $\sum_{i=1}^s \alpha_i q_i q_i^\top = 0$; after replacing $\alpha$ by $-\alpha$ if necessary, we may assume at least one of the $\alpha_i$ is positive. Now, for any $\beta \in \mathbb{R}$ we have
$$I_k = \sum_{i=1}^s (1 - \beta\alpha_i) q_i q_i^\top.$$
If we choose $\beta = \min_{1 \leq i \leq s} \left\{ \tfrac{1}{\alpha_i} \,\middle|\, \alpha_i > 0 \right\} = \tfrac{1}{\alpha_j}$ for some $j$, then clearly $1 - \beta\alpha_j = 0$ and $1 - \beta\alpha_i \geq 0$ for all $i$.

Therefore, if we remove the row $q_j^\top$ from $Q$ and replace the other $q_i^\top$ by $y_i^\top := \sqrt{1 - \beta\alpha_i}\, q_i^\top$, we find
$$\sum_{i=1,\, i \neq j}^s y_i y_i^\top = \sum_{i=1,\, i \neq j}^s (1 - \beta\alpha_i) q_i q_i^\top = I_k,$$
so the matrix $Y \in \mathbb{R}^{(s-1) \times k}$ whose rows are the $y_i^\top$ is an isometry which maps $S$ into $\mathbb{R}^{s-1}_{\geq 0}$.

We conclude that $S$ can be mapped isometrically into $\mathbb{R}^d_{\geq 0}$, and since $d \leq \dim(S_k) = \frac{1}{2}k(k+1)$, this completes the proof.
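The row-removal argument in this proof is effective and can be implemented directly. The sketch below (function name and tolerances are ours; rank decisions in floating point are approximate) repeats the reduction until the matrices $q_iq_i^\top$ are linearly independent, so at most $\frac{1}{2}k(k+1)$ rows remain:

```python
import numpy as np
from scipy.linalg import null_space

def reduce_embedding(Q, tol=1e-10):
    """Row-removal step from the proof of Theorem 3.2.6: given Q with
    orthonormal columns (Q^T Q = I_k), repeatedly drop a row and rescale
    the others while preserving Q^T Q = I_k (and nonnegativity, if Q >= 0),
    until the rank-one matrices q_i q_i^T are linearly independent."""
    while True:
        # columns of M are the vectorized rank-one matrices q_i q_i^T
        M = np.column_stack([np.outer(q, q).ravel() for q in Q])
        N = null_space(M, rcond=tol)
        if N.shape[1] == 0:                  # q_i q_i^T linearly independent
            return Q                         # at most k(k+1)/2 rows remain
        alpha = N[:, 0]
        if alpha.max() <= 0:                 # ensure some alpha_i is positive
            alpha = -alpha
        beta = 1.0 / alpha[alpha > tol].max()   # beta = min{1/alpha_i : alpha_i > 0}
        w = 1.0 - beta * alpha               # new weights: all >= 0, one of them 0
        Q = np.sqrt(np.clip(w, 0.0, None))[:, None] * Q
        Q = Q[w > tol]                       # remove the vanished row(s)
```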

Corollary 3.2.7. If $A \in \mathrm{CP}_n$ has rank $k$, then the CP-rank of $A$ is at most $\frac{1}{2}k(k+1)$. In particular, any $A \in \mathrm{CP}_n$ has CP-rank at most $\frac{1}{2}n(n+1)$.

Proof. Write $A = C^\top C$, where $C \in S_n$ is constructed as in the proof of theorem 2.1.25; then clearly $\operatorname{rank}(C) = \operatorname{rank}(A)$. The result now follows directly from the previous theorem.

The most important implication of this corollary is that there exists an upper bound for the CP-rank of an $n \times n$ matrix which is polynomial in $n$. Therefore, if there exists an algorithm which, for given $k \in \mathbb{N}$, can determine whether $A \in S_n$ has a factorization $A = B^\top B$ with $B \in \mathbb{R}^{k \times n}_{\geq 0}$, then the CP-rank of $A$ can be determined using only polynomially many calls to this algorithm.

4. Optimization and NP-hardness

In this chapter, we introduce linear and conic programming. We show that the stability number of a graph can be written as the solution to a conic program, and we will use this result to prove that determining membership of $\mathrm{CP}_n$ and of $\mathrm{COP}_n$ is NP-hard.

4.1. Conic programming and duality

We introduce optimization problems in general and define the Lagrangian dual. After the general treatment, we consider linear programs and conic linear programs.

4.1.1. Terminology

Given a set $X$ and functions $f\colon X \to \mathbb{R}$, $g\colon X \to \mathbb{R}^k$ and $h\colon X \to \mathbb{R}^\ell$, we define the standard optimization problem:
$$\begin{array}{ll} \underset{x \in X}{\text{minimize}} & f(x) \\ \text{subject to} & g(x) \leq 0, \\ & h(x) = 0. \end{array} \tag{4.1}$$
Note that we have $k$ inequality constraints ($g_1(x) \leq 0, \ldots, g_k(x) \leq 0$) and $\ell$ equality constraints ($h_1(x) = 0, \ldots, h_\ell(x) = 0$).

There is some standard terminology associated with such optimization problems:

1. The function $f$ is called the objective function.

2. The inequalities $g_i(x) \leq 0$, the equalities $h_i(x) = 0$ and the set restriction $x \in X$ are called constraints.

3. A value $x \in X$ which satisfies the constraints is called a feasible solution. If a feasible solution exists, the problem itself is called feasible.

4. The optimal value is defined as
$$\inf_{x \text{ feasible}} f(x) = \inf \{ f(x) \mid x \in X,\ g(x) \leq 0,\ h(x) = 0 \},$$
with the convention $\inf \emptyset = +\infty$.

5. If the optimal value is $-\infty$, the problem is called unbounded.

6. Let $\xi$ be the optimal value. A feasible solution $x_0 \in X$ which satisfies $f(x_0) = \xi$ is called an optimal solution. If an optimal solution exists, the problem is called solvable.

There are a few remarks to be made about this terminology:

• The standard optimization problem (4.1) is a minimization problem, while maximization problems are also abundant. The above definitions are easily adapted to maximization problems: the optimal value becomes a supremum, and the problem is called unbounded if the optimal value is $+\infty$. A small worked instance of (4.1) is given below.
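To make the terminology concrete, here is a toy instance of (4.1) with linear data, solved with scipy; the specific numbers are our own illustrative choice.

```python
from scipy.optimize import linprog

# Toy instance of (4.1): X = R^2, objective f(x) = x1 + 2*x2,
# one inequality constraint g(x) = 1 - x1 - x2 <= 0,
# one equality constraint  h(x) = x1 - x2 = 0.
res = linprog(c=[1, 2],
              A_ub=[[-1, -1]], b_ub=[-1],
              A_eq=[[1, -1]], b_eq=[0],
              bounds=[(None, None), (None, None)])
print(res.x, res.fun)   # optimal solution [0.5, 0.5], optimal value 1.5
```

Here the feasible solutions are the points on the line $x_1 = x_2$ with $x_1 + x_2 \geq 1$; the problem is solvable, with optimal solution $(0.5, 0.5)$ and optimal value $1.5$.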
