Methods from Statistical Physics Applied to Computational Problem Solving in AI

BSc Physics & Astronomy thesis

Candidate: Timothy Mans
Daily Supervisor: Daan van den Berg
Examiner: Marcel Vreeswijk
Credits: 15 EC

July 23, 2020

Abstract

This thesis aims to compare the performance of two physics-inspired algorithms, Simulated Annealing (SA) and Survey Inspired Decimation (SID), on hard random-3-SAT instances. I exploit the similarities of 3-SAT to the spin glass model. An Ising-spin formulation of 3-SAT is constructed and the cavity method from spin glass theory is applied. This uncovers properties of low energy states, from which the message passing algorithm SID is derived. A description of SA as applied to 3-SAT is given. I propose an experimental comparison between SID and SA for random-3-SAT over the clause density range given by {3.400, 3.424, 3.448, ..., 4.600}, and for problem sizes N = 125, N = 250, N = 500, and N = 1000. A description of the behaviour of SID and SA over the parameter space is given. SID is shown to be a superior algorithm for solving random-3-SAT instances compared to SA.


Contents

1 Introduction: Physics and AI
2 Theory
2.1 Complexity
2.2 The Satisfiability Problem
2.3 Instance-Hardness
2.4 3-SAT and Spin Glasses
2.5 From the Cavity Method to a Message Passing Algorithm
3 Algorithms
3.1 Warning Propagation
3.2 Survey Propagation
3.3 Survey Inspired Decimation
3.4 Simulated Annealing
4 Method
5 Results
6 Discussion

1 Introduction: Physics and AI

In recent years, a great number of core concepts from theoretical physics have found their way into the domain of information science, and more specifically, combinatorial problem solving. First, Simulated Annealing is an algorithm which takes key concepts such as crystalline lattice structure and free energy from condensed matter and thermal physics to find a global minimum of a function of many variables [1]. Second, a class of computationally hard problems called NP-complete problems has been found to exhibit phase transitions such as those found in condensed matter physics [2]. Third, from statistical physics, percolation models on random graphs provide a means to test the resilience of networks [3]. Inversely, the newly minted concept of Verlinde gravity is based on the entropy of quantum entangled bits of binary information [4]. And finally, one of the core concepts of the statistical physics of disordered systems, the spin glass, is nearly mathematically equivalent to a Boolean satisfiability problem [5].

This thesis aims to compare the performance of two physics-inspired algorithms, Simulated Annealing (SA) and Survey Inspired Decimation (SID), on hard random-3-SAT instances. The theory section touches on the complexity of computational problem solving, elaborating on complexity classifications. Next, the satisfiability problem is described and formalized, and instance hardness is discussed. 3-SAT is transferred to the realm of statistical physics through its similarity to spin glass models. Message passing algorithms are introduced through the theoretical application of the cavity method to 3-SAT. Several message passing algorithms and Simulated Annealing are explained. An experimental comparison between Simulated Annealing and Survey Inspired Decimation is described. Both algorithms are run on a set of 10200 random-3-SAT instances spanning a parameter space containing hard random-3-SAT problem instances. The behaviour of the algorithms over the chosen parameter space is described, and their performance compared.

2 Theory

2.1 Complexity

Decision problems—problems that can be answered by YES or NO—can be classified by complexity. Two such classes are P (Polynomial) and NP (Nondeterministic Polynomial). A problem L is in P if there exists an algorithm that solves any instance of L in polynomial time, meaning that its worst-case running time is bounded by a polynomial function of the problem size. A problem L is in NP if a candidate solution to any instance of L can be verified in polynomial time. Every problem in P is also a member of NP, for if a problem instance can be solved in polynomial time, verifying a solution is a matter of solving the instance again (in polynomial time) and comparing the solutions (also polynomial time). However, there exists a set of problems in NP for which no polynomial-time solving algorithms have been found as of yet; among them are the NP-complete problems [6]. If a problem L1 is reducible (sufficiently efficiently relative to the complexity of the problem) to a different problem L2, this implies L2 is at least as tough to solve as L1. The set of problems to which every problem in NP can be reduced in polynomial time is called NP-hard, and it is not constrained to just decision problems. NP-hard problems that are also members of NP themselves—which makes them mutually reducible—are said to be NP-complete, and they make up the hardest subset of problems in NP (see Fig. 1).

Figure 1: Euler diagram of computational complexity classes in the case P ≠ NP. NP-complete problems are in both NP and NP-hard by definition. Image adapted from [7].

A well known NP-complete problem is the satisfiability problem (SAT); in fact, it was the first problem to be classified as such [8]. A consequence of the mutual reducibility property of NP-completeness is that any instance of the myriad of NP-complete problems [6] can be reduced to a SAT instance and solved as such, which continues to be a successful method applied to a wide range of practical applications [9]. Another consequence is that if a polynomial solving algorithm for one NP-complete problem were found, this would imply P = NP and all problems in NP could be solved in polynomial time, but this is currently believed not to be the case [10].

2.2 The Satisfiability Problem

An instance of SAT is a Boolean expression: a combination of N Boolean variables x_i ∈ {0, 1} ≡ {F, T} (as per C), with i ∈ {1, ..., N}, in M constraints or clauses, combined using the Boolean operators (∧, ∨, ¬) and parentheses. The problem is to determine whether an assignment of truth values exists that satisfies the expression, i.e., makes the entire expression TRUE. It is beneficial to use some normalized form of propositional formula that restricts nesting, in order to keep SAT algorithms simple; this is why most SAT solvers take a propositional formula in a normal form as input. One such representation of a Boolean expression is the Conjunctive Normal Form (CNF), a conjunction of clauses, each of which is a disjunction of literals (CNF is an AND of ORs). A literal is a propositional variable or its negation. For example:

(x_1 ∨ x_2) ∧ (x̄_1 ∨ x_2 ∨ x̄_4) ∧ (x_3 ∨ x_4), (1)

where x̄ is the negation of the propositional variable x. Eq. 1 is a CNF-3-SAT instance containing three clauses and seven literals of four variables, where the 3 in CNF-3-SAT denotes that the maximum number of literals in any clause of the formula is 3. To satisfy a CNF formula every clause has to be TRUE; to accomplish this, at least one literal contained in every clause has to be TRUE. A (non-unique) solution to the above is x_1 := 1, x_2 := 1, x_3 := 1, x_4 := 1. A different normal form of Boolean expressions is the Disjunctive Normal Form (DNF), where expressions are disjunctions of clauses, each of which is a conjunction of literals (DNF is an OR of ANDs). For example,

(x_1 ∧ x_2) ∨ (x̄_1 ∧ x̄_4) ∨ (x_3 ∧ x_4). (2)

Eq. 2 is a DNF-2-SAT instance containing three clauses and six literals of four variables. To satisfy a DNF formula, only one clause has to be TRUE, but doing so requires all the literals contained in it to be TRUE. One (non-unique) solution to this formula is x_1 := 0, x_2 := 0, x_3 := 0, x_4 := 0. Solving DNF-SAT is trivial, because a clause is satisfiable if it does not contain both a propositional variable and its negation, which can easily be checked in linear time. Why, then, is CNF the most widely used form of propositional formula for SAT algorithms? After reduction of a given problem instance to a SAT instance, computing an equisatisfiable translation of the resulting Boolean expression to CNF takes linear time (L1 and L2 are equisatisfiable if and only if L1 is satisfiable whenever L2 is), whereas a similar transformation to DNF takes exponential time; CNF therefore has much wider practical application [11, 12]. A common naming convention for SAT instances is K-SAT, where K denotes the highest number of literals in any clause. The CNF-SAT and DNF-SAT examples above are thus 3-CNF-SAT and 2-DNF-SAT respectively. Though some restricted versions of SAT exist that are not NP-complete, such as DNF-SAT mentioned above and 2-CNF-SAT, random-K-CNF-SAT is NP-complete for K ≥ 3. Using either exactly K literals per clause, or K at most, does not affect complexity, and neither does prohibiting duplicate clauses or duplicate variables and literals within a clause. If clauses with 3 or more variables are permitted, the problem is NP-complete. This thesis will focus on random 3-CNF-SAT, requiring clauses containing exactly three literals of three distinct variables, randomly negated with probability 0.5.
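The clause rule above, that a CNF formula is satisfied exactly when every clause contains at least one TRUE literal, can be sketched in a few lines of Python. The signed-integer clause encoding (+i for x_i, −i for x̄_i, in the style of the DIMACS format) and the function name are this illustration's own, not something the thesis itself uses:

```python
def satisfies(clauses, assignment):
    """Check whether a truth assignment satisfies a CNF formula.

    clauses: list of clauses, each a list of signed ints (+i for x_i, -i for its negation).
    assignment: dict mapping variable index i to True/False.
    A clause is satisfied when at least one of its literals is TRUE."""
    return all(
        any(assignment[abs(lit)] == (lit > 0) for lit in clause)
        for clause in clauses
    )

# Eq. 1: (x1 v x2) ^ (~x1 v x2 v ~x4) ^ (x3 v x4)
cnf = [[1, 2], [-1, 2, -4], [3, 4]]
print(satisfies(cnf, {1: True, 2: True, 3: True, 4: True}))   # the solution given above: True
```

Evaluating an assignment is linear in the formula size; the hardness of CNF-SAT lies entirely in finding an assignment, not in checking one.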

3-SAT formulas can be represented by factor graphs, as shown in Fig. 2. N circular variable nodes are connected to M square clause nodes by edges, where a dashed edge symbolizes negation (x̄_i). The connectivity of a node is the number of edges connecting it to other nodes. In 3-SAT, clause nodes have a connectivity of 3 by definition, while the connectivity of variable nodes may vary.

Figure 2: (x̄_1 ∨ x_2 ∨ x_3) ∧ (x_1 ∨ x̄_2 ∨ x_3) ∧ (x_1 ∨ x_2 ∨ x̄_3) ∧ (x̄_1 ∨ x̄_2 ∨ x̄_3) as a factor graph. Circular variable nodes are connected to square clause nodes by edges, where a dashed edge symbolizes negation (x̄_i). Fig. 2a shows the unsolved 3-SAT instance; Fig. 2b shows the same instance satisfied by fixing the variables to the truth values x_1 := 1, x_2 := 0, x_3 := 1, with satisfying edges and satisfied clauses highlighted in green.

2.3 Instance-Hardness

The toughness of individual problem instances is called the instance hardness. The instance hardness of 3-SAT depends heavily on the clause density α = M/N, the ratio of the number of clauses to variables. So for which values of α are the problem instances the hardest? We define a lower bound α_lb and an upper bound α_ub such that, in the thermodynamic limit where N goes to infinity, the probability that a SAT instance is satisfiable obeys lim_{N→∞} P_N(α) = 1 for α < α_lb and lim_{N→∞} P_N(α) = 0 for α > α_ub. If α < α_lb the instance is said to be underconstrained: conflicts are rare enough that solutions are easily found, showing the instance is satisfiable. Inversely, if α > α_ub the instance is said to be overconstrained, and conflicts proliferate, making the instance unsatisfiable [5, 13]. These are the bounds of the satisfiability threshold. This threshold separates the satisfiable and unsatisfiable domains, and it is on this threshold that the instances are the hardest to solve. As N increases, the range between α_lb and α_ub tightens, and the transition from satisfiable to unsatisfiable resembles more and more a step function, similar to a phase transition as found in condensed matter physics [5]. The existence of a phase transition would imply a sharp boundary α_crit such that an instance is satisfiable for α < α_crit and unsatisfiable for α > α_crit. Therefore, the instances at this boundary should be the hardest of all. Because of 3-SAT's inherent difficulty, exactly locating the phase transition for higher K is an open problem [14, 15]. However, a satisfiability transition has been rigorously confirmed for the 2-SAT problem [16], and many hold the notion that phase transitions are in fact a property of NP-completeness [2]. Currently the bounds for K = 3 are constrained to α_lb = 3.42 and α_ub = 4.51, with α_crit ≈ 4.27 estimated [17, 18]. Another important boundary is α_cluster ≈ 3.92, below which the solution space essentially forms one big cluster, and above which it becomes fragmented into many [5]. The HARD-SAT phase is the region α_cluster < α < α_crit. This is of particular interest with regard to local search algorithms, as their success depends heavily on the architecture of the state space [19].
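The sharpening of this transition can be probed even at very small N by brute force: generate random instances at a given clause density and count how many are satisfiable. A minimal sketch, with all names this illustration's own; at N = 10 the transition is still very smooth, so only the trend across densities is meaningful:

```python
import itertools, random

def random_3sat(n_vars, n_clauses, rng):
    """Random 3-SAT: each clause joins 3 distinct variables, each negated with p = 0.5."""
    return [[v if rng.random() < 0.5 else -v
             for v in rng.sample(range(1, n_vars + 1), 3)]
            for _ in range(n_clauses)]

def is_satisfiable(clauses, n_vars):
    """Brute force over all 2^N assignments -- only viable for tiny N."""
    for bits in itertools.product([False, True], repeat=n_vars):
        assignment = dict(enumerate(bits, start=1))
        if all(any(assignment[abs(l)] == (l > 0) for l in c) for c in clauses):
            return True
    return False

rng = random.Random(0)
n, trials = 10, 15
sat_fraction = {}
for alpha in (2.0, 4.25, 6.0):   # well below, near, and well above the K = 3 threshold
    m = int(alpha * n)
    sat = sum(is_satisfiable(random_3sat(n, m, rng), n) for _ in range(trials))
    sat_fraction[alpha] = sat / trials
    print(f"alpha = {alpha}: {sat}/{trials} instances satisfiable")
```

Underconstrained instances (α = 2.0) come out almost all satisfiable and overconstrained ones (α = 6.0) mostly unsatisfiable, while the experiments in this thesis target the much larger N where the crossover is sharp.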

2.4 3-SAT and Spin Glasses

To transfer the 3-SAT problem to the realm of physics, we shall represent our variables as Ising spins σ_i ∈ {−1, 1}, corresponding to variables x_i ∈ {0, 1}, with i ∈ {1, ..., N}. To be able to represent a literal, or an edge in the graph representation, we introduce a set of coupling constants {J_i^a} ∈ {−1, 1} with a ∈ {1, ..., M}, one for each edge linking a clause node a with a variable node x_i, using J_i^a = −1 for a positive literal x_i (a solid line) and J_i^a = 1 for a negative literal x̄_i (a dashed line), so that J_i^a σ_i = −1 (the coupling constant is opposite to the spin) if and only if the literal is satisfying. Now we can define the energy of a clause node a as

E_a = \prod_{r=1}^{3} \frac{1 + J_r^a \sigma_{i_r}}{2}.

This way E_a = 0 if the clause is satisfied by at least one satisfying literal, and E_a = 1 otherwise. This enables us to write the Hamiltonian of a 3-SAT instance:

H = \sum_{a=1}^{M} \prod_{r=1}^{3} \frac{1 + J_r^a \sigma_{i_r}}{2}. (3)

The 3-SAT decision problem is now translated into figuring out whether there exists a configuration of spins with H = 0, given a set of couplings {J_i^a}. This formulation of 3-SAT is a close physical analogue of the Sherrington-Kirkpatrick (SK) model of a spin glass from statistical physics. Spin glasses are substances in which the interaction between spins can be ferromagnetic (J = 1) or antiferromagnetic (J = −1). In the SK model, Ising spins are situated on a lattice and spin interactions are unbounded in range. The Hamiltonian of the basic SK model is given by

H = -\sum_{i<j\le N} J_{ij}\, \sigma_i \sigma_j,

where σ_i and σ_j are Ising spins, J_ij is the magnetic coupling between spins, and N is the total number of spins.

There exist deep similarities between the two models. Finding states of minimal energy (ground states) in spin glasses is an NP-complete problem as well. Every spin interaction can be thought of as a clause, and as with 3-SAT, finding the ground state is analogous to finding a state without violated constraints. A triplet of spins {σ_1, σ_2, σ_3} is said to be frustrated if J_{1,2} · J_{2,3} · J_{1,3} = −1, which means it is impossible to find a zero-energy configuration of the triplet. Frustration in spin glasses causes an abundance of locally stable clusters of spin configurations, where a cluster is defined as a set of spin configurations of equal energy that are connected by single spin flips. Local stability means that the energy cannot be decreased by flipping a finite number of spins. For 3-SAT with α > α_cluster ≈ 3.92 we encounter this clustering as well, with the condition of local stability simplified to require that the energy cannot be decreased by any sequence of single spin flips. This is a problem for local search algorithms, as they will inevitably get trapped in these local minima, unable to surmount the energy barrier required to reach a lower energy state [20]. Fortunately, this problem has been the topic of extensive research in the context of spin glasses, and methods have been developed to analyze the properties of low energy states. One such method is the cavity method.
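Eq. 3 can be sanity-checked numerically: for any spin configuration the Hamiltonian should count exactly the unsatisfied clauses. A minimal Python sketch, where the signed-integer clause encoding and the function name are this illustration's own:

```python
def hamiltonian(clauses, spins):
    """Energy of a 3-SAT instance in the Ising formulation of eq. 3.

    clauses: list of clauses as lists of signed ints (+i for x_i, -i for x-bar_i);
    spins: dict i -> sigma_i in {-1, +1}, with sigma_i = +1 meaning x_i = 1.
    With J = -1 for a positive literal and J = +1 for a negated one, each clause
    contributes prod_r (1 + J_r * sigma_r)/2, i.e. 0 if satisfied and 1 if not."""
    energy = 0.0
    for clause in clauses:
        term = 1.0
        for lit in clause:
            J = -1 if lit > 0 else 1
            term *= (1 + J * spins[abs(lit)]) / 2
        energy += term
    return energy

# (x1 v x2 v x3) ^ (~x1 v ~x2 v ~x3): sigma = (+1, -1, +1) satisfies both clauses.
cnf = [[1, 2, 3], [-1, -2, -3]]
print(hamiltonian(cnf, {1: 1, 2: -1, 3: 1}))   # 0.0: H = 0 means satisfied
```

Flipping all three spins to +1 violates the second clause and raises the energy to 1, matching the clause-counting interpretation of H.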

2.5 From the Cavity Method to a Message Passing Algorithm

We generate a random 3-SAT graph of N spins, called G(N), where N approaches the thermodynamic limit lim_{N→∞}. For every unique triplet of spins, a clause connecting them exists with probability p_clause, chosen so that the average total number of clauses generated in the thermodynamic limit is

M = \lim_{N\to\infty} \binom{N}{3}\, p_{\mathrm{clause}} = \lim_{N\to\infty} \frac{N!}{3!\,(N-3)!} \cdot \frac{6\alpha}{N^2} = \alpha N, (4)

where α is the fixed clause density M/N. The connectivities k are then randomly distributed according to

\lim_{N\to\infty} \binom{(N-1)(N-2)/2}{k} \left(\frac{6\alpha}{N^2}\right)^{k} \left(1 - \frac{6\alpha}{N^2}\right)^{(N-1)(N-2)/2 - k} = \frac{(3\alpha)^k}{k!}\, e^{-3\alpha} \equiv f_{3\alpha}(k), (5)

the Poisson distribution of mean 3α [21].

Now that we have generated our random graph, the objective at hand is to identify properties of spin configurations associated with low energy states. Suppose we copy G(N) and expand it to G(N+1) by introducing a new spin σ_0 with a fixed value. Consistent with eq. 4, σ_0 is connected to each unique pair of spins {σ_i, σ_j} in G(N) with probability p_clause, hereby generating k new clauses in accordance with the Poisson distribution given in eq. 5. We call the 2k spins that newly connect to σ_0 the cavity spins {σ_a^1, σ_a^2}, with a = 1, ..., k. In order to define the energy of G(N), we use the fact that in the thermodynamic limit our cavity spins are, statistically speaking, far apart. When a ferromagnet or spin glass settles into one of the aforementioned locally stable clusters, correlations between distant spins tend to vanish [22]. Borrowing this clustering property allows us to assume there exist no correlations between the cavity spins themselves. For simplicity we (wrongly) assume there exists only a single locally stable cluster with a global energy minimum, enabling us to write the energy of G(N) as a function of the fixed cavity spins as follows:

E_{G(N)}(\{\sigma_a^1, \sigma_a^2\}) = A - \sum_{a=1}^{k} \left( h_{1\to a}\, \sigma_a^1 + h_{2\to a}\, \sigma_a^2 \right),

where A is the energy of G(N) independent of the cavity spins, and h_{1→a} and h_{2→a} are cavity fields; the notation i → a symbolizes that the field emanates from spin σ_i and acts upon interaction a. A cavity field h_{i→a} is the magnetic field on σ_i in the absence of interaction a, passed on to a. This is analogous to the influence of the constraints on variable x_i in the absence of clause a, passed on to a. We would now like to describe the energy of G(N+1) associated with adding σ_0, see Fig. 3. Regarding a single new clause

Figure 3: Part of the graph G(N+1) connected to the newly added spin σ_0, showing one of the k new clauses, a, and the cavity bias u_{a→0} calculated from the cavity fields h_{i→a} originating from two of the 2k cavity spins {σ_a^i}. The arrows symbolize the direction of the fields.

node a, this energy is a combination of the energy E_a(σ_0, σ_a^1, σ_a^2) of the clause itself and the link energies h_{1→a} σ_a^1 and h_{2→a} σ_a^2 of the corresponding cavity spins σ_a^1 and σ_a^2. For all k clauses this becomes

E_{G(N+1)}(\{\sigma_a^1, \sigma_a^2\}) = A + \sum_{a=1}^{k} \left[ E_a - \left( h_{1\to a}\, \sigma_a^1 + h_{2\to a}\, \sigma_a^2 \right) \right]. (6)

To minimize this energy with respect to the cavity spins σ_a^1 and σ_a^2, we define the free energy shift w_{a→0} and the cavity bias u_{a→0}, where {J_i^a} are the magnetic couplings or values of the edges, through

\min_{\sigma_1, \sigma_2} \left( E_a(\sigma_0, \sigma_1, \sigma_2) - (h_{1\to a}\, \sigma_1 + h_{2\to a}\, \sigma_2) \right) = -(w_{a\to 0} + \sigma_0\, u_{a\to 0}). (7)

The right-hand side of eq. 7 is now factored into σ_0-dependent and σ_0-independent parts. Applying eq. 7 to eq. 6 gives us the minimized energy of G(N+1) for a given σ_0:

E_{G(N+1)}(\sigma_0) = A - \sum_{a=1}^{k} w_{a\to 0} - \sigma_0 \sum_{a=1}^{k} u_{a\to 0}.

The second term can be written as −σ_0 h_0, where

h_0 \equiv \sum_{a=1}^{k} u_{a\to 0} = \sum_{a=1}^{k} u_a (8)

denotes the complete magnetic field acting on σ_0, called the local field of σ_0. The expression for the local field is used to define a set of self-consistent probability distributions. The probability distribution of local fields P(h) is given by averaging over f_{3α}(k):

P(h) = \sum_{k=0}^{\infty} f_{3\alpha}(k) \int \delta\!\left( h - \sum_{i=1}^{k} u_i \right) \prod_{i=1}^{k} du_i\, Q(u_i), (9)

where Q(u) is the probability distribution of the cavity biases, given by [5]

Q(u) = \mathbb{E}_J \int \delta(u - u_i)\, P(h_1)\, P(h_2)\, dh_1\, dh_2, (10)

where E_J denotes the expectation value over all magnetic couplings, and h_1 and h_2 are the input local fields, which in turn depend on a set of neighbouring cavity biases {u_i} following distribution Q(u). These self-consistent equations essentially describe an update mechanism that propagates messages over the entire graph, and they lie at the core of the Warning Propagation algorithm.

To derive the self-consistency equations eqs. 9 and 10, we have made the assumption that there exists a single locally stable cluster of spin configurations with a minimum energy. But it has been shown that there can exist many such states for frustrated systems [22]. This limits the usefulness of Warning Propagation to graphs that lack any inherent frustration, which requires that there be no conflicting clauses. Conflicting clauses are caused by loops, so this describes tree-like graphs. If we forgo this assumption, eq. 8 becomes

h_0^{\alpha} \equiv \sum_{a=1}^{k} u_{a\to 0}^{\alpha} = \sum_{a=1}^{k} u_a^{\alpha}.

Now we have a local field for σ_0 for each stable low energy state α. This transforms the self-consistency equations into distributions of distributions called surveys [5]:

P_0(h) = \int \exp\!\left( \mu \sum_{i=1}^{k} u_i \right) \delta\!\left( h - \sum_{i=1}^{k} u_i \right) \prod_{i=1}^{k} du_i\, Q_i(u_i), (11)

Q_i(u) = \int \exp\!\left( \mu\, w_{a\to 0} - |h_1| - |h_2| \right) \delta(u - u_i)\, dh_1\, dh_2\, P_i(h_1)\, P_i(h_2). (12)

This pair of self-consistency equations describes the update mechanism for the Survey Propagation algorithm, in the same way eqs. 9 and 10 described that of Warning Propagation.

3 Algorithms

3.1 Warning Propagation

The usefulness of Warning Propagation (WP) is limited to tree-like graphs, and random 3-SAT instances are very unlikely to be tree-like. However, WP is useful to understand as a primer on Survey Propagation. WP is a message passing algorithm used to disseminate messages u_{a→i} ∈ {0, 1} through a graph to warn those variables that have to satisfy a clause in order to achieve a satisfying assignment for the SAT instance; 0 and 1 represent a non-warning and a warning, respectively. A warning u_{a→i} = 1 is interpreted by a variable x_i as a demand to satisfy clause a. At t = 0, the messages u_{a→i} are randomly initialized for every edge with value 0 or 1 with equal probability. For t = 1 to t = t_max, the messages are sequentially updated using the Warning Propagation update subroutine (WP-Update), generating the set of messages {u_{a→i}(t)}. WP-Update takes an edge connecting clause a to variable x_i, and calculates its updated message u_{a→i} according to

u_{a\to i} = \prod_{j \in V(a)\setminus i} \theta\!\left( J_j^a\, h_{j\to a} \right), (13)

where j denotes a variable connected to clause a other than i, V(a) represents the collection of variables connected to clause a, and J_i^a is the magnetic coupling, or value, of the edge between a and x_i. The function θ(x) is given by

\theta(x) = \begin{cases} 0, & x \le 0 \\ 1, & x > 0. \end{cases}


The expression h_{j→a} denotes the cavity field, given by

h_{j\to a} = -\sum_{b \in V(j)\setminus a} J_j^b\, u_{b\to j},

where b denotes a clause connected to j other than a, and V(j) represents the collection of clauses attached to j. For example, updating u_{a→1} in Fig. 4 using eq. 13:

u_{a\to 1} = \theta\!\left( -J_2^a (J_2^b u_{b\to 2} + J_2^c u_{c\to 2}) \right) \cdot \theta\!\left( -J_3^a (J_3^d u_{d\to 3} + J_3^e u_{e\to 3}) \right) = \theta(1 \cdot 1 + (-1) \cdot 1) \cdot \theta\!\left( -((-1) \cdot 0 + (-1) \cdot 1) \right) = \theta(0) \cdot \theta(1) = 0,

which can be interpreted as: x_1 does not have to satisfy a, as a still has the possibility of being satisfied by x_2; therefore no warning is sent. If {u_{a→i}(t)} ≠ {u_{a→i}(t−1)}, WP performs another cycle, updating all

Figure 4: The part of a 3-SAT graph relevant to the calculation of u_{a→1} by the WP-Update subroutine. Circular variable nodes are connected to square clause nodes by edges, where a dashed edge symbolizes negation. The cavity fields {h_{i→a}} and cavity biases {u_{b→j}}, the values of which are shown along the edges, are given relative to the edge connecting clause node a to variable node 1.

messages. If {u_{a→i}(t)} = {u_{a→i}(t−1)}, WP has converged and the final set of messages at convergence, symbolized with a star, {u*_{a→i}} = {u_{a→i}(t)}, has been generated. An example of a final set of messages {u*_{a→i}} on an example graph is shown in Fig. 5. For tree-like graphs, convergence is guaranteed for t ≥ 1 + r/2,

Figure 5: An example of a final set of messages {u*_{a→i}} on the graph given by the formula (x̄_1) ∧ (x_1 ∨ x_2) ∧ (x̄_2 ∨ x_3 ∨ x_4) ∧ (x̄_4). Circular variable nodes are connected to square clause nodes by edges, where a dashed edge symbolizes negation. The leaf clauses a and d are the first to send warnings to their neighboring variables to satisfy them, as they have no other options. This prompts b to send a warning to x_2, because x_1 will not satisfy b. Finally, c sends a warning to x_3, since both x_2 and x_4 are warned to adopt values that do not satisfy c.

where r is the maximal shortest distance between any two leaf-nodes of the tree [13].
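The WP loop described above can be sketched compactly. The signed-integer clause encoding follows the earlier convention (J = −1 for a positive literal, J = +1 for a negative one), and all names are this illustration's own. Run on the tree-like formula of Fig. 5, it reproduces the warnings described in the caption:

```python
import random

def warning_propagation(clauses, t_max=100, seed=0):
    """Warning Propagation, eq. 13: iterate messages u_{a->i} in {0, 1} until a
    full sweep leaves the whole message set unchanged.
    Returns the converged messages {(a, i): u}, or None if t_max is exceeded."""
    rng = random.Random(seed)
    J = {(a, abs(lit)): (-1 if lit > 0 else 1)        # coupling J_i^a per edge
         for a, clause in enumerate(clauses) for lit in clause}
    V = {}                                            # V(j): clauses attached to j
    for (a, j) in J:
        V.setdefault(j, []).append(a)
    u = {edge: rng.randint(0, 1) for edge in J}       # random {0, 1} initialization

    def cavity_field(j, a):
        # h_{j->a} = - sum over clauses b != a attached to j of J_j^b * u_{b->j}
        return -sum(J[(b, j)] * u[(b, j)] for b in V[j] if b != a)

    for _ in range(t_max):
        old = dict(u)
        for (a, i) in u:                              # sequential update of every edge
            prod = 1
            for lit in clauses[a]:
                j = abs(lit)
                if j != i:
                    prod *= 1 if J[(a, j)] * cavity_field(j, a) > 0 else 0   # theta(x)
            u[(a, i)] = prod
        if u == old:                                  # a full sweep changed nothing
            return u
    return None

# The tree-like formula of Fig. 5: (~x1) ^ (x1 v x2) ^ (~x2 v x3 v x4) ^ (~x4)
u_star = warning_propagation([[-1], [1, 2], [-2, 3, 4], [-4]])
print(u_star)   # warnings (u = 1) on a->x1, b->x2, c->x3, d->x4
```

On a tree the leaf (here, unit) clauses emit warnings unconditionally, and the cascade settles into the unique fixed point regardless of the random initialization.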

3.2 Survey Propagation

Survey Propagation (SP) essentially uses the same message passing procedure as WP, but the messages sent are surveys, which can be interpreted as probabilities of warnings. SP was derived under the assumption that there exist many stable states, which makes it suitable for frustrated systems with local minima. Hence, SP is able to converge on graphs with loops. It can be utilized as a heuristic to solve random 3-SAT instances in combination with a decimation procedure. If SP is deployed on tree-like graphs, it reduces to WP and convergence is guaranteed. At t = 0, the messages η_{a→i} ∈ [0, 1] are randomly initialized for every edge; note that this is a continuous interval. For t = 1 to t = t_max, the messages are sequentially updated using the Survey Propagation update subroutine (SP-Update), generating {η_{a→i}(t)}. SP-Update takes an edge connecting clause a to variable x_i, and calculates its updated message η_{a→i} according to

\eta_{a\to i} = \prod_{j \in V(a)\setminus i} \left( \frac{\Pi^u_{j\to a}}{\Pi^u_{j\to a} + \Pi^s_{j\to a} + \Pi^0_{j\to a}} \right),

where superscript u stands for unsatisfying, s for satisfying, and 0 for neutral. Found from numerical solutions of the self-consistency equations eqs. 11 and 12 [5], Π^u_{j→a}, Π^s_{j→a}, and Π^0_{j→a} are given by

\Pi^u_{j\to a} = \left[ 1 - \prod_{b \in V^u_a(j)} (1 - \eta_{b\to j}) \right] \prod_{b \in V^s_a(j)} (1 - \eta_{b\to j}),

\Pi^s_{j\to a} = \left[ 1 - \prod_{b \in V^s_a(j)} (1 - \eta_{b\to j}) \right] \prod_{b \in V^u_a(j)} (1 - \eta_{b\to j}),

\Pi^0_{j\to a} = \prod_{b \in V(j)\setminus a} (1 - \eta_{b\to j}),

where V^s_a(j) is the set of clauses connected to variable j that tend to make j satisfy clause a, and V^u_a(j) is the set of clauses connected to j that tend to make j not satisfy a. When two clauses have a variable in common, one clause tends to make the variable satisfy the other clause if their couplings to this variable are equal, that is, J_i^a = J_i^b for clauses a and b connected to a variable x_i. These definitions of the sets V^s_a(j) and V^u_a(j) are visualized in Fig. 6. Π^u_{j→a} can be interpreted as the probability that j will not satisfy a. Therefore, the interpretation of the survey η_{a→i} is the probability that all j ∈ V(a)\i will not satisfy a. The denominator only serves to normalize the expression.

Figure 6: Clause a, variable x_1, and clause nodes belonging to either V^s_a(1) or V^u_a(1) relative to a and x_1, depending on the values of the edges. Solid and dashed edges correspond to coupling constants J = −1 and J = 1, and to literals x and x̄, respectively.

If |{η_{a→i}(t)} − {η_{a→i}(t−1)}| > ε, where ε is a specified precision value, SP performs another cycle, updating all surveys. If |{η_{a→i}(t)} − {η_{a→i}(t−1)}| ≤ ε, SP has converged, resulting in the final set of surveys, again symbolized with a star, {η*_{a→i}} = {η_{a→i}(t)}.
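The SP sweep can be sketched under the same conventions as before (signed-integer clauses, J = −1 for a positive literal; all names are this illustration's own, not the thesis implementation). As the text notes, on the tree of Fig. 5 the surveys reduce to the WP warnings:

```python
import random

def survey_propagation(clauses, t_max=1000, eps=1e-3, seed=0):
    """Survey Propagation: iterate surveys eta_{a->i} in [0, 1] until the largest
    change during one sweep is at most eps.  Clause b tends to make variable j
    satisfy clause a when their couplings to j are equal (J_j^b == J_j^a).
    Returns {(a, i): eta*} on convergence, or None if t_max is exceeded."""
    rng = random.Random(seed)
    sign = {(a, abs(l)): (-1 if l > 0 else 1)     # coupling per edge
            for a, cl in enumerate(clauses) for l in cl}
    V = {}                                        # V(j): clauses touching j
    for (a, j) in sign:
        V.setdefault(j, []).append(a)
    eta = {edge: rng.random() for edge in sign}   # random start in [0, 1]

    def pis(j, a):
        """Return (Pi_u, Pi_s, Pi_0) for variable j relative to clause a."""
        prod_s = prod_u = 1.0
        for b in V[j]:
            if b == a:
                continue
            if sign[(b, j)] == sign[(a, j)]:      # b in V_a^s(j)
                prod_s *= 1.0 - eta[(b, j)]
            else:                                 # b in V_a^u(j)
                prod_u *= 1.0 - eta[(b, j)]
        return (1.0 - prod_u) * prod_s, (1.0 - prod_s) * prod_u, prod_s * prod_u

    for _ in range(t_max):
        delta = 0.0
        for (a, i) in eta:                        # sequential sweep over all edges
            new = 1.0
            for l in clauses[a]:
                j = abs(l)
                if j == i:
                    continue
                pi_u, pi_s, pi_0 = pis(j, a)
                total = pi_u + pi_s + pi_0
                new *= pi_u / total if total > 0 else 0.0
            delta = max(delta, abs(new - eta[(a, i)]))
            eta[(a, i)] = new
        if delta <= eps:
            return eta
    return None

# On the tree of Fig. 5, SP reduces to WP: the surveys converge to the warnings.
eta_star = survey_propagation([[-1], [1, 2], [-2, 3, 4], [-4]])
print(eta_star)
```

On loopy graphs the surveys generally settle at values strictly between 0 and 1, which is what the decimation step of the next section exploits.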

3.3 Survey Inspired Decimation

Survey Inspired Decimation (SID) is a decimation procedure that fixes variables in a graph based on the final set of surveys {η*_{a→i}} generated by Survey Propagation. First, SID checks that the set of surveys is not trivial. A trivial set of surveys is one with η*_{a→i} = 0 for all η*_{a→i} ∈ {η*_{a→i}}. If all surveys are trivial, no more information can be extracted from the surveys about the variables' biases towards a value in a low energy state; if this is the case, one could employ, for instance, a local search algorithm to try to minimize the energy further. For each variable x_i, the biases W^+_i, W^−_i, and W^0_i are determined, which are measures of how biased variable x_i is towards value 1 or 0. The biases are given by

W^+_i = \frac{\Pi^+_i}{\Pi^+_i + \Pi^-_i + \Pi^0_i}, \qquad W^-_i = \frac{\Pi^-_i}{\Pi^+_i + \Pi^-_i + \Pi^0_i}, \qquad W^0_i = 1 - W^+_i - W^-_i,

where Π^+_i, Π^−_i, and Π^0_i are given by

\Pi^+_i = \left[ 1 - \prod_{a \in V_+(i)} (1 - \eta^*_{a\to i}) \right] \prod_{a \in V_-(i)} (1 - \eta^*_{a\to i}),

\Pi^-_i = \left[ 1 - \prod_{a \in V_-(i)} (1 - \eta^*_{a\to i}) \right] \prod_{a \in V_+(i)} (1 - \eta^*_{a\to i}),

\Pi^0_i = \prod_{a \in V(i)} (1 - \eta^*_{a\to i}),

where V_+(i) is the set of clause nodes connected to x_i by solid edges, and V_−(i) is the set of clause nodes connected to x_i by dashed edges. The interpretation of Π^+_i is the probability that at least one neighbor wants x_i to be TRUE and none want it to be FALSE (and the other way around for Π^−_i); the biases normalize these probabilities. The variables are sorted by their level of bias |W^+_i − W^−_i|, and the f · N most biased variables are fixed, where N is the total number of variables and f is a specified fraction of variables to be fixed after each cycle of SID. A variable x_i is fixed to x_i = 1 if W^+_i > W^−_i and to x_i = 0 if W^+_i < W^−_i. The graph can now be reduced by removing all satisfied clauses and fixed variables. For the resulting reduced graph the process is repeated: running SP, calculating the biases, fixing a fraction of the variables, and reducing the graph. This is done until the SAT instance is solved and a satisfying configuration of variables is returned, SP fails to converge, or a contradiction is found. A contradiction occurs when two surveys of the same value (within precision ε) but with opposing couplings converge on one variable. In the case of a contradiction the instance is most likely unsatisfiable, but the accuracy of this determination depends on the specified value of f. Since finding contradictions is not exact, it cannot be used to determine with certainty that a SAT instance is unsatisfiable. Fixing fewer variables at a time leads to better convergence of SP and fewer contradictions, so the performance of SID increases, solving a higher percentage of SAT instances at the cost of time [13].

3.4 Simulated Annealing

A different approach to solving K-SAT is the Simulated Annealing algorithm (SA), a probabilistic local search algorithm. A description of Simulated Annealing on 3-SAT will now be given. The energy of the system with N variables and M clauses is given by eq. 3. At t = 0, variable values are initialized at 0 or 1 with equal probability, producing configuration C. A single random variable is flipped, producing C′. If E_C′ − E_C < 0 the change is accepted, but if E_C′ − E_C > 0 the change is accepted probabilistically. In case E_C′ − E_C = 0, the change is accepted with probability 0.5. This is repeated until a satisfying configuration is found, or until t_max is reached, upon which SA returns the state associated with the lowest energy it has found. Probabilistically allowing moves toward higher energy states allows Simulated Annealing to escape local minima, which are closely associated with frustrated systems such as random 3-SAT. The probability of accepting a change toward a higher energy state, P(T), depends on a variable which we call the temperature. In general, P(T) decreases as T decreases, and T is related to t by a function called the cooling scheme. Consider the rapid quenching of a molten metal, freezing it in an amorphous state, contrasted with the stable crystalline structure of a regular piece of metal at low temperature. This analogy applies to the cooling function: if the system is 'cooled' too quickly, the configuration will freeze in a suboptimal state. That is why the temperature is lowered slowly, annealing the system into a low-energy "crystalline" state. In the original formulation of Simulated Annealing, the acceptance function is given by P(T, ε) = e^(−ε/T), where ε = E_C′ − E_C [1]. This resembles a Boltzmann factor, consistent with the analogy to physical systems. This author's implementation of SA uses a more "naïve" acceptance function, thereby avoiding the choice of an acceptance function strictly for the analogy's sake. The cooling scheme and probability function are given by

T(t) = (t_max − t)/t_max    and    P(T, ε) = T^ε,

respectively, where ε = E_C′ − E_C is the energy shift of a single flip, and T goes linearly from 1 to 0 over every input t_max, chosen so for simplicity. The ε-dependency is there to reduce the chance of big positive leaps in energy. This function generally accepts more changes than the original, hypothetically converging more slowly with a lower chance of getting stuck in local minima. A comparison of the acceptance functions is shown in fig 7.
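The procedure above can be condensed into a short Python sketch. This is a minimal illustration under the stated linear cooling scheme and naive acceptance function, not necessarily the author's exact implementation; clauses are tuples of signed variable indices, and the energy is recomputed from scratch after each flip, where a production implementation would update it incrementally.

```python
import random

def simulated_annealing(clauses, n_vars, t_max, seed=None):
    """SA for 3-SAT with cooling T(t) = (t_max - t)/t_max and naive
    acceptance P(T, eps) = T**eps for uphill moves (eps = E_C' - E_C)."""
    rng = random.Random(seed)
    # random initial configuration; index 0 is unused padding
    config = [rng.random() < 0.5 for _ in range(n_vars + 1)]

    def energy(cfg):
        # eq. 3: the number of unsatisfied clauses
        return sum(not any((lit > 0) == cfg[abs(lit)] for lit in c)
                   for c in clauses)

    e = energy(config)
    best, best_e = list(config), e
    for t in range(t_max):
        if e == 0:                      # satisfying configuration found
            break
        temp = (t_max - t) / t_max
        v = rng.randint(1, n_vars)
        config[v] = not config[v]       # propose a single-variable flip
        eps = energy(config) - e
        if eps < 0 or (eps == 0 and rng.random() < 0.5) \
                or (eps > 0 and rng.random() < temp ** eps):
            e += eps                    # accept the move
            if e < best_e:
                best, best_e = list(config), e
        else:
            config[v] = not config[v]   # reject: undo the flip
    return best, best_e
```

For the single clause (x1 ∨ x2 ∨ x3), any first accepted flip from the all-false configuration already reaches energy 0.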

4 Method

This section describes an experiment to test the performance of the physics-inspired algorithms Survey Inspired Decimation and Simulated Annealing on random 3-SAT. The versions of the algorithms used were written by this author. To validate SID, the algorithm was run on several handcrafted tree-like 3-SAT instances, designed to be either satisfiable or unsatisfiable. The instances were permutations of the example graph provided in [13]. Since SP reduces to WP on tree-like graphs, and because convergence is guaranteed for WP on tree-like graphs, the outcome of SID should be the same for multiple runs on the same instance. It was checked that SID produced a satisfying truth assignment for satisfiable instances


Figure 7: The acceptance function used in this experiment, P1(T, ε) = T^ε, and the acceptance function from the original formulation of SA, P0(T, ε) = e^(−ε/T) [1], shown for several values of ε, where T is the cooling function T(t) = (t_max − t)/t_max, and ε is the positive energy shift. An energy shift of 1 corresponds to one unsatisfied clause. The probability of accepting a change to the corresponding configuration decreases as the positive energy shift increases.

every run, and that it spotted a contradiction for unsatisfiable instances every run. Simulated Annealing was tested on the same handcrafted graphs to check that the algorithm only converged on satisfiable instances, and never on unsatisfiable instances. The solutions were verified by a separate checking module that counts the number of satisfied clauses, given the solution provided for a given problem.
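The checking module amounts to counting clauses that contain at least one satisfied literal. A minimal sketch (the function name and the clause representation as tuples of signed variable indices are conventions of this illustration, not taken from the author's code):

```python
def count_satisfied(clauses, assignment):
    """Count clauses satisfied by a truth assignment, where `assignment`
    maps a variable index to a boolean and a positive (negative) literal
    is satisfied when its variable is True (False)."""
    return sum(any((lit > 0) == assignment[abs(lit)] for lit in clause)
               for clause in clauses)
```

An instance is solved exactly when count_satisfied equals M, the total number of clauses.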

To test SID in a pure form, no additional local search algorithms were deployed when a trivial set of surveys η*_{a→i} was found, and the algorithm was stopped. The number of satisfied clauses in this case is very close to the number required to solve the instance, but the run is counted as unsuccessful nevertheless. The values for the accuracy parameter ε = 0.001, the maximum number of SP iterations for each cycle of SID t_max = 1000, and the fraction of fixed variables at each step f = 0.02 were taken from [13].

A set of 10200 random 3-SAT instances was generated: 2550 for each value in the geometric progression N = 125, N = 250, N = 500, N = 1000. For each value of N, 50 problem instances were generated per value of α ∈ {3.400, 3.424, 3.448, ..., 4.600}. This set spans the satisfiability threshold, where 3.42 < α < 4.506 for large N [17, 18], and includes the aforementioned HARD-SAT phase, where 3.921 < α < 4.267 [5]. Graphs were generated as follows. For a given N, M was calculated as M = αN. For each of the M clauses, 3 variables out of N were selected at random, each negated with probability 0.5. Clauses containing a tautology or a redundancy (a variable appearing more than once in a clause) were discarded, as were resulting 3-SAT instances in which any of the N required variables did not appear. The resulting graphs were stored.
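The generation procedure can be sketched as follows. One deviation from the text, flagged here: rather than generating clauses and discarding those with repeated variables, this sketch samples three distinct variables per clause directly, which has the same effect; instances in which some variable never appears are regenerated wholesale, as described. The function name is this sketch's own.

```python
import random

def random_3sat(n_vars, alpha, rng=None):
    """Generate a random 3-SAT instance with M = round(alpha * N) clauses
    of three distinct variables, each negated with probability 0.5."""
    rng = rng or random.Random()
    m = int(round(alpha * n_vars))
    while True:
        clauses = []
        for _ in range(m):
            # three distinct variables: no tautologies or repeats
            variables = rng.sample(range(1, n_vars + 1), 3)
            clauses.append(tuple(v if rng.random() < 0.5 else -v
                                 for v in variables))
        # reject instances that exclude any of the N required variables
        if {abs(lit) for c in clauses for lit in c} == set(range(1, n_vars + 1)):
            return clauses
```

For the α range used here (M well above N), the rejection loop almost never triggers more than once.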

For every generated 3-SAT instance, SID and SA were run sequentially. The running time of SID was timed with Python's time.clock(), and Simulated Annealing was then run on the same SAT instance for the exact running time of SID. A run of SID or SA was counted as successful if the algorithm converged and a satisfying configuration was provided. For both algorithms, the best variable configuration found for each instance was saved. Additional data was gathered: the number of satisfied clauses by SID, the number of satisfied clauses by SA, the total number of SP iterations per instance, the running time (equal for both algorithms), and the specific outcome of SID (successful, unconverged, contradiction found, or trivial messages found).

Multiprocessing was implemented to save time, running separate SAT instances simultaneously on multiple cores. Per value of N and α, the 50 individual instances were distributed over the cores, solved one at a time per core. Because of the way SID passes its running time to SA, the comparison between the two algorithms is independent of computing resources. The experiment was run on three machines, and on each machine the number of cores used by the experiment was set equal to the number available.

5 Results

The fraction of random-3-SAT instances solved by SID is shown to drop over the chosen range of α in Fig. 8. This is in contrast to the almost uniformly low fraction of instances solved by SA in Fig. 9, with only 20 out of 10200 successful runs, all occurring for N = 125. The total fraction of SAT instances solved by SID is f_solved,SID = 0.49, whereas this fraction for SA is only f_solved,SA = 0.0026. Considering that the satisfiability threshold, from lim_{N→∞} P_N(α) = 1 to lim_{N→∞} P_N(α) = 0, where P_N(α) is the probability of an instance being satisfiable, is given by 3.42 < α < 4.506, f_solved,SID = 0.49 is a very good result over the range 3.400 ≤ α ≤ 4.600. The instances solved by SA tend to have low values of α; the highest fraction of solved SAT instances is 0.06, at α = 3.6624. The performance of SA can further be illustrated by plotting the fraction of satisfied clauses at the best state found by SA. As shown in Fig. 10, SA is generally very close to a solution, but performs worse for higher values of N. With respect to the running time given by SID, the performance of SA seems to increase over α for N = 500 and N = 1000. The running time for SID,


which was consequently passed on as the allotted running time for SA, is shown in Fig. 11. The running time data is dependent on the specific machine that was used during the experiment.


Figure 8: The fraction of random-3-SAT instances solved by Survey Inspired Decimation (SID) over α, for different values of N. One dot represents the fraction of the 50 instances with the same values of N and α that were solved by SID. The curves were fitted with the function a tanh(−bx + c) + d. Notice how the fraction drops off more steeply over α for higher values of N.


Figure 9: The fraction of random-3-SAT instances solved by Simulated Annealing (SA) over α, for different values of N. One dot represents the fraction of the 50 instances with the same values of N and α that were solved by SA. The datapoints for N = 250 and N = 500 are hidden underneath the datapoints for N = 1000 at the bottom. Successful runs occurred only for N = 125, favoring low values of α.



Figure 10: The fraction of satisfied clauses at the best state found by Simulated Annealing (SA) over α, for different values of N. One dot represents the average fraction of satisfied clauses of the 50 instances with the same values of N and α. A value of 1 corresponds to a satisfied instance. The curves were fitted with the function ax^3 + bx^2 + cx + d. The fraction of satisfied clauses is lower for higher N, and increases over α with the allotted running time.


Figure 11: Average running time of SID, subsequently passed on as the allotted running time for SA, over α. One dot represents the average running time of the 50 instances with the same values of N and α. The time data depends on the machine that was used: N = 125 was run on machine A, N = 250 on machine B, and N = 500 on machine C. N = 1000 was run on a combination of machines A and C, using machine A for 3.400 ≤ α < 4.288 and machine C for α ≥ 4.288. The curves were fitted with the function ax^3 + bx^2 + cx + d. The discontinuity in the data for N = 1000 corresponds to the switch from machine A to the faster machine C; because of this discontinuity, the curve is fitted separately for each machine.


6 Discussion

As can be seen in Fig. 8, SID performs as expected over α, solving most SAT instances at α = 3.4, with the fraction of solved SAT instances dropping for increasing α. This is in accordance with the probability of the instances being satisfiable, considering the bounds of the satisfiability threshold in the high-N limit, α_lb = 3.42 and α_ub = 4.51 [17, 18]. The drop in solved SAT instances over α is much sharper for higher N, in accordance with the satisfiability threshold becoming sharper as N increases [5]. Looking at Fig. 9, SA seems to favor solving SAT instances for lower values of α, which is to be expected since a higher fraction of instances is satisfiable for lower values of α. However, this could also be an effect of the supposed proliferation of local minima for α ≥ α_cluster ≃ 3.92 [13, 5]. Because of the dearth of instances solved by SA over the parameter space, as shown in Fig. 9, it is hard to draw solid conclusions about the α dependence of SA based solely on this data. Fig. 10 sheds light on the behaviour of SA over the parameter space, showing values for M_SAT/M. From the N dependence of M_SAT/M and f_solved, it appears that SA, with the specific acceptance function and cooling scheme used, is not well suited to solving random-3-SAT instances for values of N chosen this high, in the time allotted by the convergence of SID. It would be interesting to see whether a different acceptance function would increase the number of SAT instances solved in this region of parameter space. The curves in Fig. 10 for N = 500 and N = 1000 seem to tend upwards over α. This at first glance mysterious property is explained by taking the allotted running time for SA into account. As can be seen in Fig. 11, SID takes more time to converge for higher values of α. This increase in running time is especially steep for the curves corresponding to N = 500 and N = 1000. Consequently, these running times are passed on to SA, which then gains much more running time for satisfying clauses, and this is reflected in Fig. 10. Correcting for these increases in running time, I expect a net decrease of M_SAT/M over α. Comparing M_SAT/M and f_solved for SA, the question arises how much the satisfiability of a SAT instance influences the maximum number of satisfiable clauses. We know that an unsatisfiable instance has at least one fewer satisfiable clause than a satisfiable one, but how does the actual abundance of unsatisfiable clauses scale with α and N? An in-depth instance-hardness analysis could shed light on this question. From Fig. 10, SA seems to get stuck in local minima for high N, and it is hard to say to what degree SA escapes local minima. Based on the abundance of SAT instances solved by SID, I conclude that SP is a much better heuristic for dealing with local minima, and that SID is the superior algorithm for solving random-3-SAT instances compared to SA.

7 Conclusion

The similarities of hard computational problems to challenges in statistical physics continue to be a source of inspiration for novel methods in computational problem solving. In this thesis, I aimed to compare the performance of SID and SA on random 3-SAT. I explained the nuances of complexity, defining the P, NP, NP-Hard and NP-Complete complexity classes. I introduced the NP-Complete Boolean satisfiability problem and explained its significance in the realm of hard computational problems. The concept of instance hardness was introduced, showing where the greatest challenges lie. The spin glass model was shown to be inherently similar to an Ising-spin formulation of the random-3-SAT Boolean satisfiability problem. I showed how the cavity method, first described to study the relaxation of spin glasses [22], is applied to 3-SAT to derive message passing algorithms, as described by [5, 13]. These message passing algorithms were then described in detail. Simulated Annealing, a different physics-inspired algorithm, was introduced, and the analogy to the physical concept of annealing, and how it helps to solve 3-SAT more efficiently, was explained. A novel experiment was proposed to compare the performance of Survey Inspired Decimation (SID) and Simulated Annealing (SA) on random 3-SAT. A set of 10200 random 3-SAT instances was generated, and the algorithms were run sequentially within the same running time, achieved by measuring the running time of SID and passing it on to SA as the allotted running time. In this way both algorithms were compared for N = 125, N = 250, N = 500 and N = 1000, over the range of α spanning the satisfiability threshold, α ∈ {3.400, 3.424, 3.448, ..., 4.600}. I have shown the behaviour of both algorithms over the parameter space. SID solved most SAT instances at α = 3.4, with the fraction of solved instances dropping for increasing α, in accordance with the literature regarding the satisfiability threshold [17, 18]. SID was shown to outperform SA, solving a higher fraction of SAT instances, with f_solved,SID = 0.49 and f_solved,SA = 0.0026. SA appeared to get trapped in local minima. It was shown that SA satisfied on average 98.8% of clauses, often coming close to solving an instance but rarely solving it. I concluded that the heuristic used by SID is superior to SA. The performance of SID on the hardest instances of 3-SAT shows that introducing concepts from physics into computational problem solving is a worthwhile pursuit.

References

[1] Scott Kirkpatrick, C. Daniel Gelatt, and Mario P. Vecchi. Optimization by simulated annealing. Science, 220(4598):671–680, 1983.

[2] Peter C. Cheeseman, Bob Kanefsky, and William M. Taylor. Where the really hard problems are. In IJCAI, volume 91, pages 331–340, 1991.

[3] Duncan S. Callaway, Mark E. J. Newman, Steven H. Strogatz, and Duncan J. Watts. Network robustness and fragility: Percolation on random graphs. Physical Review Letters, 85(25):5468, 2000.

[4] Erik Verlinde. On the origin of gravity and the laws of Newton. Journal of High Energy Physics, 2011(4):29, 2011.

[5] Marc Mézard and Riccardo Zecchina. Random K-satisfiability problem: From an analytic solution to an efficient algorithm. Physical Review E, 66(5):056126, 2002.

[6] Michael R. Garey and David S. Johnson. Computers and intractability: A guide to NP-completeness, 1979.

[7] Behnam. Euler diagram for P, NP, NP-Complete, and NP-Hard set of problems. Nov 2007. URL https://commons.wikimedia.org/wiki/File:P_np_np-complete_np-hard.svg.

[8] Stephen A. Cook. The complexity of theorem-proving procedures. In Proceedings of the Third Annual ACM Symposium on Theory of Computing, pages 151–158, 1971.

[9] Joao Marques-Silva. Practical applications of Boolean satisfiability. In 2008 9th International Workshop on Discrete Event Systems, pages 74–80. IEEE, 2008.

[10] Lance Fortnow. The status of the P versus NP problem. Communications of the ACM, 52(9):78–86, 2009.

[11] Grigori S. Tseitin. On the complexity of derivation in propositional calculus. In Automation of Reasoning, pages 466–483. Springer, 1983.

[12] Peter Bro Miltersen, Jaikumar Radhakrishnan, and Ingo Wegener. On converting CNF to DNF. Theoretical Computer Science, 347(1-2):325–335, 2005.

[13] Alfredo Braunstein, Marc Mézard, and Riccardo Zecchina. Survey propagation: An algorithm for satisfiability. Random Structures & Algorithms, 27(2):201–226, 2005.

[14] Rémi Monasson, Riccardo Zecchina, Scott Kirkpatrick, Bart Selman, and Lidror Troyansky. Determining computational complexity from characteristic ‘phase transitions’. Nature, 400(6740):133, 1999.

[15] Ke Xu and Wei Li. Exact phase transitions in random constraint satisfaction problems. Journal of Artificial Intelligence Research, 12:93–103, 2000.

[16] Andreas Goerdt. A threshold for unsatisfiability. Journal of Computer and System Sciences, 53(3):469–486, 1996.

[17] Olivier Dubois, Yacine Boufkhad, and Jacques Mandler. Typical random 3-SAT formulae and the satisfiability threshold. arXiv preprint cs/0211036, 2002.

[18] Alexis C. Kaporis, Lefteris M. Kirousis, and Efthimios G. Lalas. The probabilistic analysis of a greedy satisfiability algorithm. Random Structures & Algorithms, 28.

[19] Holger H. Hoos. SAT-encodings, search space structure, and local search performance. In IJCAI, volume 99, pages 296–303. Citeseer, 1999.

[20] Shu Tanaka, Ryo Tamura, and Bikas K. Chakrabarti. Quantum Spin Glasses, Annealing and Computation. Cambridge University Press, 2017.

[21] Béla Bollobás. Random Graphs. Number 73. Cambridge University Press, 2001.

[22] Marc Mézard, Giorgio Parisi, and Miguel Virasoro. Spin Glass Theory and Beyond: An Introduction to the Replica Method and Its Applications, volume 9. World Scientific Publishing Company, 1987.
