
Successes and setbacks in P vs NP

A study of fruitful proof methods in complexity theory and the barriers they face towards solving P vs NP

Marlou M. Gijzen

July 8, 2016

Bachelorthesis

Supervisor: prof. dr. Ronald de Wolf

Korteweg-de Vries Instituut voor Wiskunde


Abstract

There are several successful proof techniques in complexity theory, but each has its own barrier towards solving P vs NP. Diagonalization gave us the Hierarchy Theorems, but most diagonalization proofs relativize and cannot solve P vs NP. When considering circuits, we found a lot of lower bounds on complexity classes. But most of the proofs in circuit complexity are natural proofs, and those cannot show that P ≠ NP. Arithmetization turned out useful in interactive proofs, but it induces algebrizing results. Those cannot solve P vs NP either. However, there is slight hope for proving that P ≠ NP by showing upper bounds.

Title: Successes and setbacks in P vs NP

Author: Marlou M. Gijzen, marlou.gijzen@student.uva.nl, 6127901
Supervisor: prof. dr. Ronald de Wolf
Second grader: dr. Leen Torenvliet
End date: July 8, 2016

Korteweg-de Vries Instituut voor Wiskunde Universiteit van Amsterdam

Science Park 904, 1098 XH Amsterdam
http://www.science.uva.nl/math


Contents

1 Introduction

2 Preliminaries
2.1 Turing Machines
2.1.1 What are Turing machines?
2.1.2 How the Turing machine solves a problem
2.1.3 Universal and probabilistic Turing machines
2.1.4 Languages and oracle Turing machines
2.2 Computational complexity
2.2.1 Running time and measuring complexity
2.2.2 Classification according to complexity
2.2.3 NP-complete problems and reduction
2.2.4 Measuring complexity with Boolean circuits

3 Diagonalization and relativization
3.1 Diagonalization and complexity
3.1.1 How diagonalization found its way into computer science
3.1.2 What diagonalization is and what it's used for in complexity
3.1.3 The Time Hierarchy Theorem
3.1.4 Other diagonalization results
3.2 The barrier diagonalization imposes: relativization
3.2.1 Relativization and why diagonalization proofs relativize
3.2.2 Relativizing results cannot solve the P vs NP problem
3.2.3 Non-relativizing diagonalization

4 Circuits and natural proofs
4.1 Usage and successes
4.1.1 The size and depth of circuits
4.1.2 Parity is not in AC0
4.1.3 Majority is not in AC0 with parity gates
4.2 The barrier in circuit complexity: natural proofs
4.2.1 What are natural proofs?
4.2.2 Most proofs in circuit complexity naturalize
4.2.3 Natural proofs cannot solve P vs NP

5 Arithmetization and algebrization
5.1 The usage of arithmetization
5.1.1 What is arithmetization?
5.1.2 Rounds of interaction and deterministic interactive proofs
5.1.3 Probabilistic interactive proofs and the class IP
5.1.4 IP = PSPACE
5.2 The barrier arithmetization imposes: algebrization
5.2.1 What are algebrizing results?
5.2.2 Most results that use arithmetization algebrize
5.2.3 Algebrizing results cannot solve P vs NP
5.2.4 Other results that need non-algebrizing techniques

6 Evaluation of the current situation
6.1 Remarks
6.1.1 P vs NP could be independent of ZFC
6.1.2 The natural proofs barrier
6.1.3 The opinions of researchers about P vs NP
6.2 Other methods for proving P ≠ NP
6.2.1 Proof complexity
6.2.2 Autoreducibility
6.2.3 Showing lower bounds from upper bounds
6.3 Attacking NP-hard problems
6.3.1 Algorithms and approximation
6.3.2 Average-case and worst-case scenarios
6.3.3 Quantum computing

7 Conclusion


1 Introduction

The P versus NP problem has gone from an interesting problem related to logic to perhaps the most fundamental and important mathematical question of our time, whose importance only grows as computers become more powerful and widespread.

Lance Fortnow in [22]

In our daily lives, we are accustomed to using computers to solve a plethora of problems: calculating costs, determining the shortest route or even simply googling things we want to know. The time needed to compute solutions to problems like these varies with the problem itself.

In 1956 Gödel wrote a letter to von Neumann with the first mention of the P vs NP problem. He wondered to what extent we would be able to reduce the time it takes for a Turing machine to solve hard computational problems.

Turing machines describe our notion of computability. The Church-Turing thesis states that anything that is computable (in an intuitive sense) can be computed by a Turing machine. Different alternatives to distinguish between the computable and non-computable have arisen (Post systems, λ-calculus, combinatory logic, µ-recursive functions, Turing machines). These all turned out to be equivalent to each other; hence Church declared that they all represent what it means to be computable. This was before computers were even invented [33, Lec. 28]. Since then, we've started using Turing machines in the theory of problem solving.

Scientists began to categorise problems according to the duration of the computation. We call the field that studies the complexity (or hardness) of problems complexity theory. P is the collection of problems that we can solve quickly with a computer. To be more precise: P stands for Polynomial time, the problems that a Turing machine can solve in time that grows polynomially in the input size. Consider an algorithm that determines whether a given number is prime. For larger numbers, the algorithm takes more time to compute. For problems in P, the time an algorithm needs can be expressed as a polynomial. The variable in the polynomial is the size of the input (for primality, the number of digits of the given number).

NP stands for Non-deterministic Polynomial time and refers to problems for which a computer can quickly verify a solution. A computer, a deterministic machine, always performs the exact same computation for the same input; its actions for each situation are predetermined. However, a non-deterministic machine can perform different possible computations for the same input. These different computations are the different paths that lead towards possible solutions of the problem. Non-determinism enables a Turing machine to make guesses during the computation. This way, the machine can always find a path towards a correct solution, if such a path exists. We call this path a certificate. For problems in NP, this path leads the machine to the solution in polynomial time. If we give a deterministic machine this certificate, it is also able to go through it in polynomial time. This is why problems in NP can be verified quickly.

Hence the big question: does P equal NP? If we are able to verify a solution quickly, are we able to find one quickly too? Intuitively, the answer is no; verifying the solution to a sudoku puzzle is much easier than finding one yourself. Yet no-one has been able to prove that this is the case. In 2000, the Clay Mathematics Institute listed seven fundamental unsolved math problems known as the Millennium Problems. A prize of a million (US) dollars will be given to anyone with a solution. Among these problems is P vs NP: we want a proof that shows that P ≠ NP or P = NP.

These classes are of great interest to us, because P envisions the problems we can solve quickly, while NP contains many problems we would like to solve quickly [20]. The scheduling of timetables is one of them. But so is protein folding, which is important for biologists. If we had an efficient algorithm for finding the optimal folding of a protein, then we could understand and prevent a lot of diseases far better. Cryptography, which we use for sending information over the internet securely, relies on the hardness of NP problems. If P = NP, then we could easily decrypt messages sent over the internet and find passwords that are used on websites. But we would also be able to live in a world with optimal schedules in public transportation, better weather predictions and improved disease prevention. We could understand stock markets far better and retrace evolution [1, 20]. It seems almost obvious that P ≠ NP. Feynman, a famous physicist, found it hard to accept that P vs NP even was an open problem [2]. However, a proof that P ≠ NP could give us insight into why the problems in NP are hard to solve, and it might enable us to find efficient solutions to those problems in specific cases. But so far we have been unable to give such a proof.

The proof techniques that have been traditionally successful for other problems in complexity theory turned out to be insufficient for solving P vs NP. Each faces unique barriers and is thus unable to prove P ≠ NP or P = NP. This thesis studies these proof techniques and explores why P vs NP is so difficult to solve. In Chapter 3, diagonalization and the relativization barrier will be discussed. Chapter 4 is about circuit complexity and natural proofs. In Chapter 5 we will study arithmetization and algebrization. Chapter 6 will discuss some other proof techniques and the current situation of the P vs NP problem.


2 Preliminaries

We first need to discuss different kinds of Turing machines and how they work. With that we can explain how computational complexity is measured and categorised.

2.1 Turing Machines

2.1.1 What are Turing machines?

Alan Turing defined his automatic machine in 1936 [48]. This machine, which we now call a Turing machine, was meant as a "human computer": it has an unlimited supply of paper for its calculations and it has to follow fixed rules, supposedly written in a book. Whenever it's assigned a new job, the rules get altered [47]. More formally:

Definition 2.1 (Turing machine). A Turing machine is a tuple (Γ, Q, δ), where:

• Γ is a finite set of tape symbols, the alphabet.

• Q is a finite set of states.

• δ is a transition function: Q × Γ → Q × Γ × {L, R, S}.

The Turing machine has a tape, meant for reading and writing. A tape is a one-way infinite line of cells, each of which holds a single symbol of Γ. Γ generally contains a blank symbol, a start symbol and the numbers 1 and 0. The tape has a tape head that can read or write symbols on the tape. The machine can only look at one symbol at a time. It can remember those that it has already seen by altering the state it's in. The register holds the state of the machine, which determines the behaviour. There is a begin state, q_start, and an end state, q_halt, that tell the machine when to start and stop [7, Sec. 1.2], [48].

2.1.2 How the Turing machine solves a problem

The Turing machine is able to compute solutions to different problems. We have to formulate an instance of a problem and give it as input to a Turing machine with a suitable δ, starting in q_start. From then on the machine will try to calculate the solution step by step:

Definition 2.2 (Computational step [7, Sec. 1.2]). A single step is described as follows:

1. The machine reads the symbol under the tape head.

2. The symbol under the tape head is replaced with a symbol from Γ.

3. The state of the machine is replaced with a state from Q.

4. The head moves to the left, right or stays in its position (L, R or S).

The new state and symbol can be the same as the ones before. The transition function δ describes for every symbol and state which new symbol and new state should come in place and in which direction the tape head goes next. T_M(x) denotes the number of steps a TM M takes on input x.

If a machine is able to compute the solution, then it will eventually reach the end state with the solution written on the tape. We can consider a problem as a function f. A Turing machine takes an input x and, after taking several computational steps, it can come into the final state and output f(x). We use |x| to denote the length of a string x. The computation of a function is thus defined as follows:

Definition 2.3 (Computing a function [7, Def. 1.3]). Let f : {0, 1}∗ → {0, 1}∗ and let M be a TM (Turing machine). M computes f if for every x ∈ {0, 1}∗, when M starts with x written on its tape, it halts with f(x) on the tape.

A Turing machine can be defined to have more than one tape. Multi-tape Turing machines can compute the same functions as single-tape machines [7, Sec. 1.3]. Thus, without loss of generality we will consider machines with one tape only.
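The machinery of Definitions 2.1 and 2.2 can be sketched in code. The Python fragment below is a toy illustration, not from the thesis: the state names, the step budget and the example machine (whose δ flips every bit of its input and then halts) are our own choices.

```python
# A minimal single-tape Turing machine simulator, following the step of
# Definition 2.2: read the symbol, write a symbol, change state, move.

BLANK = "_"

def run_tm(delta, tape, q_start="q_start", q_halt="q_halt", max_steps=10_000):
    """Simulate the TM described by delta; return the tape on halting."""
    tape = list(tape)
    head, state, steps = 0, q_start, 0
    while state != q_halt:
        if steps >= max_steps:          # guard: the machine might never halt
            raise RuntimeError("step budget exhausted")
        symbol = tape[head] if head < len(tape) else BLANK
        state, written, move = delta[(state, symbol)]   # one computational step
        if head < len(tape):
            tape[head] = written
        else:
            tape.append(written)
        head += {"L": -1, "R": 1, "S": 0}[move]
        head = max(head, 0)             # the tape is one-way infinite
        steps += 1
    return "".join(tape).rstrip(BLANK)

# delta for a toy machine: scan right, flip 0 <-> 1, halt on the first blank.
flip_delta = {
    ("q_start", "0"): ("q_start", "1", "R"),
    ("q_start", "1"): ("q_start", "0", "R"),
    ("q_start", BLANK): ("q_halt", BLANK, "S"),
}

print(run_tm(flip_delta, "0110"))  # -> 1001
```

Note that δ is just a finite table, as in Definition 2.1: the whole behaviour of the machine is fixed by it.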

2.1.3 Universal and probabilistic Turing machines

We can represent a Turing machine as a binary string, using its description. For the description, a list of all inputs and outputs of the transition function will suffice, since that function fully determines the behaviour. We use Mα to denote the Turing machine represented by the string α and we use Mα(x) to denote the output that the machine gives on input x.

When proving results it will come in handy if we can make two assumptions about the representation scheme. Firstly, we want that every string represents some Turing machine. This can be done by mapping each string that is not a valid description to a trivial machine: one that immediately outputs 0. Secondly, we want that every machine is represented by infinitely many strings. We ensure this by saying that the description can end with an arbitrary number of 1's that are ignored [7, Sec. 1.4].

We can use the description to simulate the behaviour of a machine on some input x by another, universal, Turing machine. In order to separate the input and the description we use the symbol # from the alphabet of the universal machine.

Theorem 2.4 (Universal Turing machine [7, Th. 1.9]). Let T : N → N be a function. There exists a Turing machine U such that for all x, α ∈ {0, 1}∗, U(α#x) = Mα(x). If Mα halts on input x within O(T(|x|)) steps, then U(α#x) halts within C·T(|x|) log T(|x|) steps, where C is some number independent of x.

Hence we only need one machine to compute everything that is computable. A Turing machine can also use probability:

Definition 2.5 (Probabilistic Turing machine [7, Def. 7.1]). A probabilistic Turing machine (PTM) is a Turing machine with two transition functions δ0 and δ1. At every step, the TM applies δ0 or δ1, each with probability 1/2.

2.1.4 Languages and oracle Turing machines

For computation we'll only consider Boolean functions, f : {0, 1}∗ → {0, 1}. The problems associated with these functions are languages or decision problems: questions with a yes-or-no (1 or 0) answer. For a collection of Boolean functions fn : {0, 1}^n → {0, 1}, one for each n ∈ N, a language L is a set of binary strings: L = {x | fn(x) = 1 for some n}.

Definition 2.6 (Deciding a language). A machine M decides a language L if it computes the function f : {0, 1}∗ → {0, 1} where f(x) = 1 ⇔ x ∈ L.

An oracle machine is a kind of Turing machine that is able to immediately decide problems in the language used as the oracle. How this is done is not specified: the oracle works as a "black box", much like an "actual" oracle. If O is a language and M an oracle machine with access to O, we denote the output of M on an input x ∈ {0, 1}∗ by M^O(x).

Definition 2.7 (Oracle machines [7, Sec. 3.4]). An oracle Turing machine is a Turing machine that has a tape called the oracle tape. It also has the three states q_query, q_yes and q_no.

The oracle of the machine will start to work whenever the machine is in state q_query with y written on the oracle tape. In a single step the machine will then move into q_yes if y ∈ O and into q_no otherwise.
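The black-box character of Definition 2.7 can be illustrated by modelling the oracle as plain set membership: the machine sees only the yes/no answer to each query, never how it was decided. The example language and machine below are our own illustrative choices, not from the thesis.

```python
# Sketch of an oracle machine: each call to the oracle counts as a single
# step, regardless of how hard deciding membership in O really is.

def oracle_machine(x, oracle):
    """Decide whether the reverse of x belongs to the oracle language O,
    using one oracle query (write y on the oracle tape, get yes/no)."""
    query = x[::-1]              # the string y written on the oracle tape
    return query in oracle       # move into q_yes iff y is in O

O = {"001", "110"}               # a finite stand-in for a language O
print(oracle_machine("100", O))  # queries "001" -> True
print(oracle_machine("000", O))  # queries "000" -> False
```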

2.2 Computational complexity

2.2.1 Running time and measuring complexity

If we want to talk about the time it takes to solve a problem, then it is more convenient to count the number of steps a Turing machine takes than to measure the number of seconds that computing the function took: we don't want the time for completing a problem to depend on something other than the nature of the problem itself. The number of steps taken by the machine is dependent on the input size. So it makes sense to define the running time as a function of the input length.

Definition 2.8 (Running time [7, Def. 1.3]). Let f : {0, 1}∗ → {0, 1}∗ and T : N → N be some functions and let M be a TM. M computes f in T(n)-time if on every input x ∈ {0, 1}^n, M computes f while taking at most T(n) steps.

Exactly counting the number of steps a Turing machine takes is too precise. If we let a Turing machine work with a different numerical system, the computation of the same problem can be done in a different number of steps. We will therefore only consider the highest growing term in the running time function. For this we use the big-Oh notation [7, Def. 0.2].

Definition 2.9. Let f, g be two functions from N to N. We say that f = O(g) if there exist c, n0 ∈ N such that for all n > n0, f(n) ≤ c · g(n). We say that f = Ω(g) if g = O(f). We say that f = o(g) if lim_{n→∞} f(n)/g(n) = 0.


We use the term complexity to say something about the resources needed to compute a function. By looking at the complexity of problems we are able to compare them according to their difficulty. We can use the running time to measure the complexity, but we can also consider, for example, the amount of working storage used for the computation (space complexity) or the number of gates in a circuit that computes the function (circuit complexity). All of these are dependent on the input size and thus we represent them the same way.

2.2.2 Classification according to complexity

We classify languages according to their complexity. Before we introduce P, the class that started it all [14], we need the class DTIME:

Definition 2.10 (The class DTIME [7, Def. 1.12]). Let T : N → N be a function. A language L is in DTIME(T(n)) iff there exists a TM M such that M decides L and T_M(x) ≤ O(T(n)).

The class P consists of languages that a machine can solve in polynomial time [7, Sec. 1.6].

Definition 2.11 (The class P [7, Def. 1.13]). P = ∪_{c≥1} DTIME(n^c).

Examples of problems in P are determining whether a number is prime and deciding whether a graph has some maximum matching (a largest possible set of edges such that no two edges share a vertex). The class NP contains the languages for which a possible answer, the certificate, can be verified in polynomial time.

Definition 2.12 (The class NP [7, Def. 2.1]). L ∈ NP iff there exists a polynomial p : N → N and a TM M that runs in polynomial time such that ∀x ∈ {0, 1}∗: x ∈ L ⇔ ∃u ∈ {0, 1}^p(|x|) such that M(x#u) = 1.

We call the Turing machine M the verifier for L and u the certificate for x. A problem is thus in NP if there is a possible solution to the problem for which a Turing machine can verify that it is a correct solution in polynomial time. Consider the subset sum problem: given a list of numbers, is there a subset with sum equal to some C? The input will be the list of numbers, while the certificate is a subset of these numbers. The Turing machine can then check whether it adds up to C or not [7, Sec. 2.1]. Other examples of problems in NP are factoring and determining whether two graphs are isomorphic.
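The verifier of Definition 2.12 can be made concrete for the subset sum problem just described. The sketch below is our own illustration: the certificate u is encoded as a Python list, and checking it takes only one pass over the numbers, i.e. polynomial time.

```python
# A polynomial-time verifier M(x#u) for subset sum: x is (numbers, target),
# u is the candidate subset. Verifying is easy even though finding a good
# certificate may be hard.

def verify_subset_sum(numbers, target, certificate):
    """Accept iff the certificate is a sub-multiset of the input numbers
    whose elements sum to the target."""
    remaining = list(numbers)
    for v in certificate:
        if v not in remaining:      # the certificate must use input numbers
            return False
        remaining.remove(v)         # each occurrence may be used only once
    return sum(certificate) == target

print(verify_subset_sum([3, 7, 1, 8], 11, [3, 8]))   # -> True
print(verify_subset_sum([3, 7, 1, 8], 11, [7, 1]))   # -> False (sums to 8)
```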

We have that P ⊆ NP: if L ∈ P, the certificate can be an empty string and the machine can just solve the problem in polynomial time. The classes EXP and NEXP are the exponential-time equivalents of P and NP. If L ∈ NP, we can enumerate and check all possible certificates for the problem in exponential time, so NP ⊆ EXP [7].

For the complexity we can also look at the amount of memory a Turing machine uses:

Definition 2.13 (The class SPACE [7, Def. 4.1]). We say that a language L ∈ SPACE(T(n)) iff there is a TM that decides L on an input of length n using no more than O(T(n)) locations of its tape.

The class of problems which use polynomial space is the following:

Definition 2.14 (The class PSPACE [14]). PSPACE = ∪_{c>0} SPACE(n^c).


If L ∈ NP, we can go through all possible certificates using polynomial space: we can write one down, check it, erase it and write down the next one. Thus NP ⊆ PSPACE [7]. Since a machine that works with polynomial space can visit at most an exponential number of configurations, PSPACE ⊆ EXP. Thus: P ⊆ NP ⊆ PSPACE ⊆ EXP ⊆ NEXP. Whether P = PSPACE is another open problem in complexity theory.
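The write-one-down, check-it, erase-it argument for NP ⊆ PSPACE can be sketched in code. The toy verifier below is an illustrative stand-in for an arbitrary polynomial-time verifier; it is not from the thesis.

```python
# Deciding an NP language by brute force: exponential time, but the space
# used is only that of a single length-p certificate, because
# itertools.product yields the candidates one at a time.

from itertools import product

def decide_by_enumeration(verify, x, p):
    """Accept iff some certificate u in {0,1}^p makes verify(x, u) accept."""
    for bits in product("01", repeat=p):
        u = "".join(bits)            # "write one down"
        if verify(x, u):             # "check it"
            return True              # (and it gets erased on the next loop)
    return False

# Toy stand-in verifier: accept iff the certificate equals x reversed.
verify = lambda x, u: u == x[::-1]
print(decide_by_enumeration(verify, "110", 3))   # -> True (certificate "011")
```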

We can also define complexity classes with oracles. For a complexity class C and oracle O, C^O gives the class of languages decided by Turing machines that decide languages in C, only now with oracle access to O.

2.2.3 NP-complete problems and reduction

We can compare the difficulty of decision problems with reductions. If we translate a specific problem to another, we can find a solution by solving the other problem.

Definition 2.15 (Polynomial-time reducibility [7, Def. 2.7]). A language L is polynomial-time reducible to a language L′ (L ≤p L′) if there is a polynomial-time computable function g : {0, 1}∗ → {0, 1}∗ such that for every x ∈ {0, 1}∗, x ∈ L ⇔ g(x) ∈ L′.

With reductions we are able to compare languages based on how hard they are. If L ≤p L′, then L′ is at least as hard as L. If the reduction works both ways, the problems can be considered equally difficult, up to a polynomial slowdown. Since ≤p is transitive, we don't need a direct reduction.

If a problem L′ is at least as hard as any other problem in NP, so L ≤p L′ for every L ∈ NP, we call it NP-hard. If L′ is also in NP, then we call it NP-complete [7, Def. 2.7].

Boolean formulas consist of variables in {0, 1} and the logical operators ∧, ∨ and ¬. We say that such a formula is satisfiable if there exists an assignment that makes the formula true. Satisfiability, or SAT, is the language of all satisfiable Boolean formulas.

Theorem 2.16 (Cook-Levin Theorem [15, 35]). SAT is NP-complete.
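One half of SAT being in NP is easy to see in code: checking whether a given assignment satisfies a formula takes only one pass over it. The encoding below (formulas in conjunctive normal form, literals as signed integers) is our own convention, not notation from the thesis.

```python
# A polynomial-time check of a candidate satisfying assignment: the
# certificate for SAT is simply the assignment itself.

def evaluate_cnf(clauses, assignment):
    """clauses: list of clauses, each a list of literals (negative means
    negated). True iff the assignment (dict var -> bool) satisfies every
    clause, i.e. makes at least one literal per clause true."""
    return all(
        any(assignment[abs(lit)] == (lit > 0) for lit in clause)
        for clause in clauses
    )

# (x1 or not x2) and (x2 or x3)
clauses = [[1, -2], [2, 3]]
print(evaluate_cnf(clauses, {1: True, 2: False, 3: True}))   # -> True
print(evaluate_cnf(clauses, {1: False, 2: False, 3: False})) # -> False
```

The hard half of the Cook-Levin Theorem is the other direction: encoding an arbitrary NP computation as a formula.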

Besides SAT, there are many more NP-complete problems: every L ∈ NP such that SAT ≤p L. The previously mentioned subset sum problem is one of these. But so is the Travelling Salesman Problem, which asks whether there exists a path of some length that visits specific cities on a map (does a graph have a Hamiltonian path with weight smaller than or equal to some C?). The Clique Problem, of finding a maximum subset of pairwise adjacent vertices in a graph, is also NP-complete.

2.2.4 Measuring complexity with Boolean circuits

We can also use the Boolean circuit that computes a function for measuring the complexity. A Boolean circuit can be seen as a simplified model of the silicon chips in a modern computer. It is a diagram that shows how to derive an output from a binary input string.

Definition 2.17 (Boolean circuit [7, Sec. 6.1]). An n-input Boolean circuit is a directed acyclic graph with n vertices without incoming edges (the input) and one vertex with no outgoing edges (the output). All vertices except those for the input are gates, which use the logical operations ∨, ∧ and ¬. The fan-in or fan-out of a gate is the number of incoming or outgoing edges. The ∨ and ∧ gates have fan-in 2 and ¬ has fan-in 1. All gates have fan-out 1. If C is a Boolean circuit and x ∈ {0, 1}∗ an input, the output is denoted as C(x). This is the value of the output vertex. The size of the circuit, |C|, is the number of vertices it contains. The depth is the length of the longest path from an input vertex to the output vertex.

Below is an example of a circuit that computes ¬x1 ∧ (x2 ∨ x3). It has depth 2 and size 6. In layer one, we have the gates ¬ and ∨, and in layer two the output gate ∧.

Figure 2.1: Example of circuit
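The circuit of Figure 2.1 can be evaluated gate by gate in topological order, each gate reading the values on its incoming wires. The graph encoding below is our own illustrative choice; only the circuit ¬x1 ∧ (x2 ∨ x3) itself comes from the text.

```python
# Evaluating a Boolean circuit given as a list of gates in topological
# order; the value of the last gate is the circuit's output C(x).

from operator import and_, or_, not_

def eval_circuit(gates, inputs):
    """gates: list of (name, op, operand names); inputs: dict of input
    wire values. Returns the value of the final (output) gate."""
    wires = dict(inputs)
    for name, op, args in gates:
        wires[name] = op(*(wires[a] for a in args))
    return wires[gates[-1][0]]

figure_2_1 = [
    ("g1", not_, ("x1",)),          # layer one: the NOT gate
    ("g2", or_, ("x2", "x3")),      # layer one: the OR gate
    ("out", and_, ("g1", "g2")),    # layer two: the output AND gate
]
print(eval_circuit(figure_2_1, {"x1": False, "x2": True, "x3": False}))  # -> True
```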

We can classify languages according to the size and depth of the corresponding circuit. There are two types of circuit families: uniform and non-uniform.

Definition 2.18 ([41, Def. 8.13.1]). A circuit family C = {C1, C2, . . .} is a collection of circuits in which Cn has n inputs. A T(n)-time (or -space) uniform circuit family contains circuits for which there is a TM M such that on input n, M outputs a description of Cn in T(n) time (or space). Non-uniform circuit families are not restricted by this requirement.

With uniform computation the same Turing machine is used for all input sizes. Uniform circuits compute the same functions as Turing machines. Non-uniform computation allows the usage of a different algorithm for different input sizes. Non-uniform circuits can compute functions that are not computable by any Turing machine, such as the Halting Problem that will be discussed in the next chapter. Those circuits might seem an impractical choice for complexity, given the Church-Turing thesis. However, if we can prove lower bounds on the size of circuits without taking into account whether they are uniform or not, then we can also apply these bounds to uniform circuits and thus to other models such as Turing machines [41, Sec. 8.13]. A non-uniform computation can be represented as a family of Boolean circuits:

Definition 2.19 (Circuit families and language recognition [7, Def. 6.2]). Let T : N → N be a function. A T(n)-size circuit family is a sequence {Cn}_{n∈N} of Boolean circuits, where Cn has n inputs and ∀n, |Cn| ≤ T(n). For a language L, L ∈ SIZE(T(n)) iff there exists a T(n)-size circuit family {Cn} such that for all x ∈ {0, 1}^n, x ∈ L ⇔ Cn(x) = 1.

The class P contains precisely the languages that have a logspace-uniform polynomial-sized circuit family (a circuit family {Cn} is logspace-uniform if there is a function computable with logarithmic space that maps n to the description of Cn) [7, Sec. 6.2.1]. If we consider non-uniform circuits we obtain the following class:

Definition 2.20 (The class P/poly [7, Def. 6.5]). P/poly = ∪_{c} SIZE(n^c).

So P/poly contains languages that are decidable by polynomial-sized circuit families. We have that P ⊆ P/poly. We have now discussed enough basic theory to go into the main part of this thesis.


3 Diagonalization and relativization

Diagonalization can be used for separating complexity classes and it gave us the Time Hierarchy Theorem. However, most diagonalization proofs relativize and therefore cannot solve P vs NP.

3.1 Diagonalization and complexity

Cantor was the first to use the diagonalization argument. He showed the existence of uncountable sets. We call a set X uncountable iff there is no injective function from X to N. If X is countable, such a function exists. We can thus enumerate the elements in X by looking at the image of the injection. Cantor's proof goes as follows: suppose towards a contradiction that we could enumerate the set S of infinite sequences of binary digits. We can now create a new element by flipping the diagonal: changing each digit on the diagonal (so a 0 becomes a 1 and vice versa), see Figure 3.1. This new element, s, is another infinite sequence of binary digits, so s ∈ S. But s was not in our enumeration, since it differs from each element in at least one position. Therefore, such an enumeration isn't possible and S is uncountable.

s1 = 00000000 · · ·
s2 = 11111111 · · ·
s3 = 01010101 · · ·
s4 = 10101010 · · ·
s5 = 11001100 · · ·
s6 = 00110011 · · ·
s7 = 11100011 · · ·
s8 = 00011100 · · ·
. . .
s = 10110101 · · ·

Figure 3.1: The diagonalization argument
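The diagonal flip of Figure 3.1 can be carried out on the finite prefixes shown there. By construction, the flipped diagonal disagrees with the i-th row in its i-th digit, so it cannot equal any row of the table; the code below just mechanises that observation.

```python
# Cantor's diagonal flip on the first n rows, truncated to n digits each:
# digit i of the result is the flipped i-th digit of row i.

def flip_diagonal(rows):
    return "".join("1" if row[i] == "0" else "0" for i, row in enumerate(rows))

rows = ["00000000", "11111111", "01010101", "10101010",
        "11001100", "00110011", "11100011", "00011100"]
s = flip_diagonal(rows)
print(s)                                            # -> 10110101

# s differs from every row in at least one position, as the proof requires.
assert all(s[i] != rows[i][i] for i in range(len(rows)))
```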

3.1.1 How diagonalization found its way into computer science

Inspired by this method and by Gödel's incompleteness theorems, Turing brought the diagonalization method into computer science. He showed that there is no way of deciding whether a Turing machine is circular or circle-free. ("If a computing machine never writes down more than a finite number of symbols of the first kind [0 and 1], it will be called circular. Otherwise it is said to be circle-free") [48, Ch. 2]. In the present day we consider a slightly different notion, known as the Halting Problem. Davis stated it as the problem of determining whether a Turing machine will eventually halt on an input (i.e., produce an output instead of computing forever) [16, p. 70]. He formulated a function HALT: HALT(α#x) = 1 iff the Turing machine represented by the string α halts on input x after a finite number of steps, and HALT(α#x) = 0 otherwise [7, Ch. 1.5].

Theorem 3.1 (Davis 1958 [16, 7]). No TM can compute the language HALT.

To prove that this is an undecidable problem we use the same argument as Turing uses in [48, p. 246]: we define a machine that flips on the diagonal and we reach a contradiction when we let it compute itself. We use x_H to denote the description of a Turing machine H.

Proof sketch. Assume towards a contradiction that there is a TM M that computes HALT. We can now define a new TM H as follows: if M(x#x) = 1 then loop forever, else halt. Running H on input x_H causes a contradiction: if M(x_H#x_H) = 1, then H loops forever on input x_H. However, since M computes HALT, H has to halt on input x_H. If M(x_H#x_H) = 0, H halts, but also cannot halt after a finite number of steps. □

3.1.2 What diagonalization is and what it’s used for in complexity

Since the diagonalization method made its first appearance in complexity theory, it has successfully been used for separating complexity classes. The purpose of this method is showing the existence of a language that is in one of two complexity classes, but not in the other. It often relies on the fact that there's a correspondence between Turing machines and natural numbers. We previously mentioned that every binary string represents some Turing machine. Since a binary string is essentially a natural number, we can order the Turing machines accordingly. The sequence of all Turing machines T = T1, T2, . . . will thus start with the machine that is represented by the binary string of the natural number 1, the second one by 2, etc.

We will now give an idea of how diagonalization can separate two complexity classes. If we want to prove that A ⊄ B, we create some sequence M = M1, M2, . . ., such that for every language in class B there is a Turing machine in M that decides it. This can be done by simply taking the ordered sequence of all Turing machines and removing every one that does not decide a language in B. We then define a Turing machine N that decides a language as follows: on input 1 it outputs the opposite answer of M1(1), on input 2 it outputs the opposite of M2(2), etc. This is where we flip the diagonal. The language that N decides is different from all languages in B, since for every machine in M it differs in output on at least one input. If we were able to make sure that the language that N decides is in A, we get the desired result: a language which is in A, but not in B.

3.1.3 The Time Hierarchy Theorem

A well-known theorem that uses diagonalization to separate complexity classes is the Time Hierarchy Theorem. It tells us that with more computation time, Turing machines can decide more languages. We call a function T : N → N time (or space) constructible if T(n) ≥ n and there is a Turing machine that on input of some x outputs the binary representation of T(|x|) in time (or space) O(T(n)) [7, Sec. 1.3].

Time Hierarchy Theorem (Hartmanis, Stearns 1965 [27, Cor. 9.1]). If we have two time-constructible functions V and T satisfying T(n) log T(n) = o(V(n)), then

DTIME(T(n)) ⊊ DTIME(V(n)).

The theorem implies that P ≠ EXP. We will use the following corollary:

Corollary 3.1 (Hartmanis, Stearns [27, Cor. 2.7]). If T(n) ≥ n and x ∈ DTIME(T(n)), then ∀ε > 0 there exists a multi-tape TM that prints out the nth digit of x in (1 + ε)T(n) or fewer steps.

We can now give the proof of the Time Hierarchy Theorem, as in [27].

Proof. Assume that V and T are functions as in the theorem. We create a sequence M = M1, M2, . . . of TMs that compute all languages computable in time 2T. This can be done by adding an extra tape and a counter to all TMs. If a TM takes more than 2T(n) steps for a computation on an input x ∈ {0, 1}^n, a mark is added to the extra tape and we remove the TM from the sequence. Corollary 3.1 now implies that for all L ∈ DTIME(T(n)) there exists an Mi in M such that Mi decides L.

Let U be the universal TM from Theorem 2.4. With this we define a TM N , that on in-put (xM

i#i) outputs 1 − U (xMi#i). Here we flip the diagonal. Then N decides a

lan-guageL0 ∈ DTIME(T (n)), since it differs on at least one input from every language in/ DTIME(T (n)).

We know that U runs in CiT (n) log T (n) time. Suppose that N begins to simulate U afterDitime, for a constantDi. ThenN operates in time Di+ Ci· T (n) log T (n). But since T (n) log T (n) = o(V (n)), there exists an m0 ∈ N such that Di+Ci·T (m) log T (m) ≤ V (m)

for allm > m0. For them < m0 we could provideN0 with a table containingU (xM

m#m),

soN0 runs inV (n) time.

We have found a languageL0 ∈ DTIME(V (n)), but L0 ∈ DTIME(T (n)). This proves/

the theorem. 

3.1.4 Other diagonalization results

The diagonalization method has been used successfully for separating a lot more complexity classes. The Space Hierarchy Theorem, an analogue of the Time Hierarchy Theorem, concerns space-bounded complexity classes.

Definition 3.2 (The class L [14]). L = SPACE(log(n)).

Space Hierarchy Theorem (Stearns, Hartmanis, Lewis [7, 46]). If U and T are two space-constructible functions such that T(n) = o(U(n)), then SPACE(T(n)) ⊊ SPACE(U(n)).

The Space Hierarchy Theorem tells us that L 6= PSPACE. But diagonalization can be used for other results than separations. Ladner proved the following, using diagonalization:


Theorem 3.3 (Ladner 1975 [34]). If P 6= NP, then there exists a language L ∈ NP, L /∈ P that is not NP-complete.

So if P ≠ NP, there are problems besides the NP-complete problems that we cannot efficiently compute. We call the languages in NP \ P that are not NP-complete NP-intermediate languages.

3.2 The barrier diagonalization imposes: relativization

3.2.1 Relativization and why diagonalization proofs relativize

All diagonalization results depend upon certain attributes of Turing machines. The Time Hierarchy Theorem uses a representation of Turing machines by strings, as well as the ability of one Turing machine to simulate other machines. The oracle Turing machine is also governed by these attributes. This implies that the Time Hierarchy Theorem also applies to languages decided by oracle Turing machines [7, Ch. 3.4]. This is known as a relativizing result, since it holds relative to any oracle. If we had a relativizing result showing that C ≠ D for complexity classes C, D, then also C^O ≠ D^O for all oracles O. All the other diagonalization arguments that I previously mentioned also relativize, as well as many others.

3.2.2 Relativizing results cannot solve the P vs NP problem

The following theorem shows that relativizing results alone are insufficient to solve P vs NP. We will give the proof of the theorem according to [7, Ch. 3.4].

Theorem 3.4 (Baker, Gill, Solovay 1975 [9]). There exist oracles A, B such that P^A = NP^A and P^B ≠ NP^B.

Proof. For the oracle A we will take a PSPACE-complete language. Then PSPACE^A ⊆ PSPACE, since A doesn't give PSPACE any more power. Also, PSPACE ⊆ P^A, because we can reduce any L ∈ PSPACE to A. For the same reasons that P ⊆ NP ⊆ PSPACE, we have that P^A ⊆ NP^A ⊆ PSPACE^A. But then P^A = NP^A = PSPACE.

We will now construct a set B and a language L such that L ∈ NP^B, but L ∉ P^B.

We let L = {1^n | ∃x ∈ B : |x| = n}, so L(1^n) = 1 iff 1^n ∈ L and 0 otherwise. Let M1, M2, . . . be an enumeration of DTIME(2^n/10) oracle TMs.

We will create B = ∪_{i=1}^∞ B_i in stages. Let B1 = ∅. At stage i we will determine B_{i+1}. Let n_i be the smallest n that is bigger than the lengths of all strings in B_i. We run M_i^{B_i} on input 1^{n_i}. The behaviour of the oracle is as follows: if M_i^{B_i} queries a string x it has queried before, the oracle gives the same answer. Otherwise, it answers negatively and we decide that x ∉ B_{i+1}. We now want to make sure that M_i^{B_i} disagrees with L on at least one input.

If M_i^{B_i} accepts 1^{n_i}, we decide that all strings of length n_i are not in B_{i+1}, so L(1^{n_i}) = 0. If M_i^{B_i} rejects, we can find a string x ∈ {0,1}^{n_i} that M_i^{B_i} has not queried (since M_i^{B_i} can make at most 2^{n_i}/10 queries), and we add this x to B_{i+1}, so L(1^{n_i}) = 1. For small n we have poly(n) > 2^n/10, but this is not a problem, since every polynomial-time oracle TM appears infinitely often in the enumeration, so it is diagonalized against at some stage where n_i is large enough.

If we continue to do this at every step, thenL 6∈ PB. ButL ∈ NPB, since every TM in this class can non-deterministically guess a string and ask if it is inB [7, Ch. 3.4]. 

3.2.3 Non-relativizing diagonalization

We haven't formally defined diagonalization. We could define the diagonalization method as the process of flipping the diagonal. This is usually done by a universal Turing machine that uses the representations of Turing machines by strings as input. A proof that uses the diagonalization method with these techniques is relativizing.

Kozen defined the diagonal as a function that flips the outcome of languages on some index. In most diagonalization proofs, such as the proof of the Time Hierarchy Theorem in Section 3.1.3, this index is precisely on the diagonal line (as Cantor originally used it). But this is of course not necessary.

Furthermore, Kozen shows that for any language f that is not in a complexity class, there exists an index such that f is a diagonal. So if P ≠ NP is provable, then it is also provable by (non-relativizing) diagonalization [32].


4 Circuits and natural proofs

When studying the computation of a function we can also consider the Boolean circuit that computes it. Proving lower bounds was particularly successful for circuits with constant depth. But most lower bound proofs in circuit complexity are natural and cannot be used to separate P from NP.

4.1 Usage and successes

From a mathematical point of view, Boolean circuits are a lot easier than Turing machines. They also allow us to use nonuniform models of computation, where we can use a different algorithm for different input sizes. By looking at multiple methods of computation, we might increase our chances of solving P vs NP. Circuits also offer hope for circumventing relativizing techniques: by directly looking at the circuits that compute the function, we are no longer treating machines as black boxes, as we do with oracle machines.

4.1.1 The size and depth of circuits

Since non-uniform circuits can even compute undecidable languages, we need to restrict their size or depth to gain interesting results. By considering circuits of polynomial size, we could eventually prove that P ≠ NP: we know that P ⊆ P/poly (see Section 2.2.4), so a goal in circuit complexity is showing that NP ⊄ P/poly.

The first who gained results by considering the size of circuits was Shannon. He showed that there exist hard functions, which require large circuits for their computation:

Theorem 4.1 (Shannon 1949 [43]). Almost all Boolean functions on n variables can only be computed by Boolean circuits (with fan-in ≤ 2 and ∧, ∨, ¬ gates) of size Ω(2^n/n).

Circuit complexity also has its own, nonuniform, hierarchy theorem. It tells us that small circuits are able to compute fewer functions than large circuits:

Theorem 4.2 (Nonuniform Hierarchy Theorem [7, Th. 6.22]). For all functions T, T′ : N → N with 2^n/n > T′(n) > 10·T(n) > n, SIZE(T(n)) ⊊ SIZE(T′(n)).

Apart from size, we can also look at the depth of a circuit. Parallel computing allows us to execute different computations at the same time by using several processing devices. In this case, the depth of a circuit is a more adequate measure for the cost of a computation. There are several complexity classes depending on the depth of a circuit. One of them is NC:


Definition 4.3 (The class NC [14]). For every i, NC^i is the class of decision problems that can be decided by a uniform family of circuits {C_n}, where every C_n has polynomial size and O(log^i n) depth. The allowed gates are ∨, ∧ with fan-in 2 and ¬. NC = ∪_{i≥0} NC^i.

The languages in NC are actually the languages that have efficient parallel algorithms [7, Th. 6.27]. Suppose a circuit has depth d and k·d gates, arranged in d layers of k gates. We can let each of k processors of a parallel computer take on the role of a single gate. Then the computation of gate i in layer j can be performed by processor i at time j, so the running time of the parallel algorithm is given by the depth of the circuit.

Because we can simulate a parallel logarithmic-time algorithm with a (regular) polynomial-time one, NC ⊆ P. Another open question is whether P = NC, i.e., whether every efficient algorithm has an even faster parallel implementation [7, Sec. 6.7.2].
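The gate-per-processor simulation sketched above can be made concrete. Below is a minimal Python sketch (the circuit encoding is invented for the example): a layered circuit is evaluated one layer per time step, so the number of sequential steps equals the depth:

```python
# A layered circuit: each layer is a list of gates (op, indices into the
# previous layer's values). Gates within a layer are independent, so a
# parallel machine with one processor per gate does a layer per time step;
# the total parallel time then equals the circuit's depth.
def apply_gate(op, args):
    if op == "and":
        return int(all(args))
    if op == "or":
        return int(any(args))
    if op == "not":
        return 1 - args[0]
    raise ValueError(op)

def eval_layered(circuit, inputs):
    values, steps = list(inputs), 0
    for layer in circuit:
        values = [apply_gate(op, [values[i] for i in idx]) for op, idx in layer]
        steps += 1                      # one parallel time step per layer
    return values, steps

# Depth-2 circuit computing (x0 AND x1) OR (NOT x2):
circuit = [
    [("and", [0, 1]), ("not", [2])],    # layer 1
    [("or", [0, 1])],                   # layer 2
]
print(eval_layered(circuit, [1, 1, 0]))  # -> ([1], 2): output 1 after 2 steps
```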

We can also allow the gates to have unbounded fan-in, so the ∨ and ∧ gates can take more than two inputs. The corresponding complexity class is the following:

Definition 4.4 (The class AC [14]). For every i, AC^i is the class of decision problems that can be decided by a nonuniform family of circuits {C_n}, where C_n has polynomial size and O(log^i n) depth. The allowed gates are ∨, ∧ with unbounded fan-in and ¬. AC = ∪_{i≥0} AC^i.

We have that AC^i ⊆ NC^{i+1}, since every ∨ or ∧ gate with fan-in n can be simulated by a tree of depth log(n) with fan-in-2 gates. This implies that NC = AC [7, Sec. 6.7.1].

4.1.2 Parity is not in AC0

Researchers started to focus on the computational power of non-uniform circuits with a low depth. There was hope for a proof that a problem in NP needs circuits with restricted depth. Then, we might be able to gradually remove the restrictions and find superpolynomial-size lower bounds for NP, showing that P ≠ NP [49]. One of the first big successes was the result that circuits in AC0, which are of constant depth, cannot compute Parity.

Definition 4.5 (Parity [7, Th. 14.1]). L(x1, x2, . . . , xn) = Σ_{i=1}^n x_i (mod 2).

Theorem 4.6 (Furst et al., Ajtai [24, 5]). L ∉ AC0.

This result was proven using restrictions on functions. With restrictions, some of the inputs are fixed, while others remain variable. A restriction on a function induces another function on the unrestricted variables. If a circuit C computes a function f, then for a restriction ρ, C_ρ computes f_ρ. Putting restrictions on the parity function will still give the parity function, only on the unrestricted variables [24].

It can be shown that any circuit in AC0 can be made into a constant function by restricting some, but not all, of the input bits. However, Parity can only be made constant by restricting all input bits. So Parity cannot be computed by a circuit in AC0 [7, Sec. 14.1.1].
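This restriction argument is easy to check by brute force for small n. The sketch below (the helper names are our own) verifies that parity stays non-constant under every restriction that leaves a variable free, while an AND gate, a typical AC0 building block, is already made constant by fixing one bit:

```python
from itertools import combinations, product

def restrictions(n, fixed):
    """All restrictions fixing exactly `fixed` of the n variables."""
    for pos in combinations(range(n), fixed):
        for vals in product((0, 1), repeat=fixed):
            yield dict(zip(pos, vals))

def is_constant_under(f, n, rho):
    """Does f become constant once the variables in rho are fixed?"""
    free = [i for i in range(n) if i not in rho]
    outs = set()
    for bits in product((0, 1), repeat=len(free)):
        x = dict(rho)
        x.update(zip(free, bits))
        outs.add(f(tuple(x[i] for i in range(n))))
    return len(outs) == 1

n = 4
parity = lambda x: sum(x) % 2
conj = lambda x: int(all(x))     # AND: a typical AC0-style gate

# Parity survives every restriction leaving at least one variable free...
assert not any(is_constant_under(parity, n, rho)
               for fixed in range(n) for rho in restrictions(n, fixed))
# ...while AND is already made constant by fixing a single bit to 0.
assert is_constant_under(conj, n, {0: 0})
print("parity is non-constant under every partial restriction")
```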


4.1.3 Majority is not in AC0 with Parity gates

We can take this one step further. Razborov was able to show in [39] that the Majority function is not contained in AC0, even with Parity gates.

Definition 4.7 (Majority [31]). Maj_n(x1, x2, . . . , xn) = 1 iff Σ_{i=1}^n x_i ≥ n/2.

Instead of Majority, we'll consider the Threshold function:

Definition 4.8 (Threshold function). T^n_k(x1, x2, . . . , xn) = 1 if Σ_{i=1}^n x_i ≥ k, and 0 otherwise.

For k = n/2 the Threshold function is equal to Majority. We will prove the statement for the Threshold function with k = ⌈n/2 + √n/2 + 1/2⌉. The proof of the result that T^n_k ∉ AC0, outlined by Jukna in [31], consists of two parts. First, it is shown that functions decided by small circuits can be approximated by low-degree polynomials. Subsequently, it is proven that the Threshold function cannot be approximated by such polynomials.

In order to show the first part, we need the following lemma:

Lemma 4.9. Let f = Π_{i=1}^m f_i, where each f_i : F_2^n → F_2 is a polynomial of degree at most d over F_2. Then for all r ≥ 1, there exists a polynomial g : F_2^n → F_2 of degree at most dr that differs from f on at most 2^(n−r) inputs.

Proof. We will define g randomly and calculate an upper bound on the expected number of differences with f.

Let S1, . . . , Sr ⊆ {1, 2, . . . , m} be chosen uniformly at random. Then let g be defined as follows:

g = Π_{j=1}^r h_j, where h_j = 1 − Σ_{i∈S_j} (1 − f_i).

We will calculate the probability that f and g differ on some fixed input a ∈ {0,1}^n.

If f(a) = 1, then f_i(a) = 1 for all i, and thus g(a) = 1 and f(a) = g(a).

If f(a) = 0, then f_{i0}(a) = 0 for at least one i0, and g(a) = 1 iff h_j(a) = 1 for all j. We have that h_j(a) = 1 iff Σ_{i∈S_j} (1 − f_i(a)) = 0 over F_2. Condition on any fixed choice of S_j \ {i0}: since 1 − f_{i0}(a) = 1, exactly one of the two equally likely choices (i0 ∈ S_j or i0 ∉ S_j) makes this sum 0. Then Pr[h_j(a) = 1] ≤ 1/2.

Given f(a) = 0, the probability that g(a) = 1 becomes:

Pr[g(a) = 1] = Pr[∀j, h_j(a) = 1] ≤ 2^(−r),

because the events h_j(a) = 1 are independent for all j. So Pr[f(a) ≠ g(a)] ≤ 2^(−r).

For an input a ∈ {0,1}^n, let X_a be the indicator variable for the event g(a) ≠ f(a). Let X be the sum of all X_a. Then X is the number of inputs on which f and g differ. The expectation of X can be calculated as follows:

E[X] = Σ_a E[X_a] = Σ_a Pr[X_a = 1] = Σ_a Pr[g(a) ≠ f(a)] ≤ Σ_a 2^(−r) = 2^(n−r).


By an averaging argument (a probabilistic pigeonhole principle), if E[X] ≤ t then there exists a choice of the random sets for which the number of differences is at most t. That choice gives a function g that fits the requirements of the lemma. □
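The error bound of Lemma 4.9 can be checked empirically. The Python sketch below is a Monte Carlo experiment with invented stand-in factors f_i (conjunctions of two bits): it builds g from random subsets as in the proof and compares the average number of disagreements with the 2^(n−r) bound on the expectation:

```python
import itertools, random

random.seed(1)
n, m, r, trials = 8, 5, 3, 500

# Stand-ins for the low-degree factors f_i over F2: conjunctions of two bits.
pairs = [(random.randrange(n), random.randrange(n)) for _ in range(m)]
fs = [lambda x, a=a, b=b: x[a] & x[b] for (a, b) in pairs]

def f(x):                        # f = product of all the f_i
    out = 1
    for fi in fs:
        out &= fi(x)
    return out

def sample_g():
    # r uniformly random subsets S_j; h_j = 1 - sum_{i in S_j}(1 - f_i) over F2.
    Ss = [[i for i in range(m) if random.random() < 0.5] for _ in range(r)]
    def g(x):
        out = 1
        for S in Ss:
            out &= (1 - sum(1 - fs[i](x) for i in S)) % 2
        return out
    return g

inputs = list(itertools.product((0, 1), repeat=n))
errs = []
for _ in range(trials):
    g = sample_g()
    errs.append(sum(f(x) != g(x) for x in inputs))
mean_err = sum(errs) / trials

# The proof shows E[#errors] <= 2^(n-r); the empirical mean should respect it.
print("mean errors: %.1f, bound 2^(n-r) = %d" % (mean_err, 2 ** (n - r)))
assert mean_err <= 2 ** (n - r) * 2      # generous slack for sampling noise
```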

Now let f be a function computed by a circuit of depth c and size ℓ. We will approximate f at every gate in the circuit. The inputs of f are polynomials of degree 1. The circuit can contain four kinds of gates. For a polynomial p, ¬p = 1 − p, and the ⊕ gate takes the sum of its inputs; both operations are exact on polynomials. But the ∧ gate takes the product of its inputs, and the ∨ gate can be constructed from ∧ and ¬, so we need Lemma 4.9 for these gates. Since the circuit has ℓ gates, there will be at most ℓ approximations, so g will differ from f on at most ℓ·2^(n−r) inputs. The degree of the approximating polynomial is multiplied by at most r at each gate along a path from input to output, and each path contains at most c gates, so the final function g that approximates f will have degree at most r^c. Thus, a function f computed by a small circuit can be approximated by a low-degree polynomial.

For two vectors u and v, write u ≤ v iff u_i ≤ v_i for all i. Then:

Lemma 4.10. Let f = Π_{i∈S} x_i be a monomial of degree d = |S| ≤ n − 1. If a is a vector (a1, . . . , an) with a_i ∈ {0,1} and Σ_{i=1}^n a_i ≥ d + 1, then over F_2, Σ_{b≤a} f(b) = 0.

Proof. Since f has degree d, f(b) = 0 for all b ≤ a if there exists an i ∈ S such that a_i = 0, and then Σ_{b≤a} f(b) = 0. Otherwise a_i = 1 for all i ∈ S. The number of vectors b ≤ a with f(b) = 1 is then equal to 2^m, with m the number of indices i ∉ S for which a_i = 1. Since Σ_i a_i ≥ d + 1, we have m ≥ 1, so 2^m is an even number and Σ_{b≤a} f(b) = 0 over F_2. □
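Lemma 4.10 is also easy to verify exhaustively for small n. The sketch below (helper name is our own) checks every monomial Π_{i∈S} x_i and every vector a satisfying the hypothesis:

```python
from itertools import combinations, product

def subcube_sum(S, a):
    """Sum of the monomial prod_{i in S} b_i over all b <= a, taken over F2."""
    n = len(a)
    f = lambda b: int(all(b[i] for i in S))
    return sum(f(b) for b in product((0, 1), repeat=n)
               if all(bi <= ai for bi, ai in zip(b, a))) % 2

n = 6
for d in range(1, n):                        # monomial degree d = |S| <= n-1
    for S in combinations(range(n), d):
        for a in product((0, 1), repeat=n):
            if sum(a) >= d + 1:              # hypothesis of the lemma
                assert subcube_sum(S, a) == 0
print("Lemma 4.10 verified for all monomials with n =", n)
```

Note that the hypothesis is needed: when a has exactly d ones, the sum is 1.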

With this lemma we are able to show that the Threshold function cannot be approximated well by a low-degree polynomial:

Lemma 4.11. Let n/2 ≤ k ≤ n. Then every polynomial of degree d ≤ 2k − n − 1 over F_2 differs from T^n_k on at least (n choose k) inputs.

Proof. Let g be a polynomial of degree d ≤ 2k − n − 1 over F_2. Let U = {u | g(u) ≠ T^n_k(u)} and A = {(a1, . . . , an) | a_i ∈ {0,1}, Σ_{i=1}^n a_i = k}. We want to show that |U| ≥ (n choose k). For this we create a matrix M = (m_{a,u}): we index the rows and columns according to the members of A and U respectively, and m_{a,u} = 1 iff u ≤ a, and 0 otherwise. If it can be shown that the columns of M span the whole linear space F_2^(n choose k), then |U| ≥ |A| = (n choose k).

Let a ∈ A and U_a = {u ∈ U | m_{a,u} = 1}. If it can be shown that for every b ∈ A:

Σ_{u∈U_a} m_{b,u} = 1 if b = a, and 0 otherwise,

then we get the desired result: over all b ∈ A, this sum gives a vector with a 1 only at index a. So all unit vectors are in the column span of M, and the column span thus equals F_2^(n choose k).

Let a ∧ b be the vector with (a ∧ b)_i = 1 iff a_i = b_i = 1. Then the number of ones in a ∧ b is the number of indices where a and b both have a one. Since a and b both contain k ones, there are at least n − 2(n − k) = 2k − n of those indices. We have that d ≤ 2k − n − 1, so a ∧ b has at least d + 1 ones. Furthermore, we know that for every u ∈ U_a, u ≤ a. So m_{b,u} = 1 iff u ≤ a ∧ b. Then:

Σ_{u∈U_a} m_{b,u} = Σ_{u∈U, u≤a∧b} 1 = Σ_{x≤a∧b} (T^n_k(x) + g(x)) = Σ_{x≤a∧b} T^n_k(x) + Σ_{x≤a∧b} g(x).

By linearity and Lemma 4.10, applied to each monomial of g: Σ_{x≤a∧b} g(x) = 0. We have that Σ_{x≤a∧b} T^n_k(x) = 1 iff a = b (if a = b, the only x ≤ a with at least k ones is x = a itself; if a ≠ b, then a ∧ b has fewer than k ones, so T^n_k(x) = 0 for all x ≤ a ∧ b). This proves the lemma. □

With Lemma 4.9 and 4.11 we can prove the following theorem:

Theorem 4.12 (Razborov 1987 [39]). Every circuit with constant depth c, unbounded fan-in and ∧, ∨ and Parity gates that computes T^n_k with k = ⌈n/2 + √n/2 + 1/2⌉ has size ℓ ≥ 2^Ω(n^(1/(2c))).

Proof. Let f be a function computed by such a circuit of size ℓ, and let r = n^(1/(2c)). According to Lemma 4.9 there is a polynomial g of degree at most r^c = √n that approximates f, making at most ℓ·2^(n−r) mistakes. Lemma 4.11 tells us that a polynomial g of degree √n that approximates T^n_k makes at least (n choose k) = Ω(2^n/√n) mistakes. Since we must have ℓ·2^(n−r) ≥ (n choose k), it follows that ℓ ≥ 2^Ω(n^(1/(2c))). □

Suppose, towards a contradiction, that we had a polynomial-size constant-depth circuit with ∧, ∨ and Parity gates for Majority. We could add zeroes to the end of an input of n bits, such that it becomes a string of length N = 2k. Then Maj_N on the padded string equals T^N_{N/2} = T^n_k on the original input, so the latter function would also have such a circuit. But this contradicts the theorem. We can conclude that Majority is not in AC0 with Parity gates.
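The padding argument can likewise be checked directly; the sketch below verifies Maj_N(x 0^(N−n)) = T^n_k(x) for every x when n = 9:

```python
from itertools import product
from math import ceil, sqrt

n = 9
k = ceil(n / 2 + sqrt(n) / 2 + 1 / 2)   # threshold from Theorem 4.12: here k = 7
N = 2 * k                               # pad inputs up to length N = 2k

maj = lambda x: int(sum(x) >= len(x) / 2)
T = lambda x: int(sum(x) >= k)

# Majority on the zero-padded string computes exactly T^n_k on the original:
for x in product((0, 1), repeat=n):
    assert maj(x + (0,) * (N - n)) == T(x)
print("Maj_%d(x 0^%d) = T^%d_%d(x) for all x" % (N, N - n, n, k))
```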

4.2 The barrier in circuit complexity: natural proofs

A common proof strategy in circuit complexity can be described as follows: first, some property of Boolean functions is defined and it is shown that all functions in some complexity class do not possess that property. Then, it is shown that a function f does have that property. This implies that f is not in the complexity class. Both Theorem 4.6 and 4.12 were proven this way. But Razborov and Rudich argue in [40] that, under a plausible assumption, such lower bound techniques cannot be used to prove that P ≠ NP.

4.2.1 What are natural proofs?

Natural proofs use a certain property that Boolean functions can possess. This property can be identified with a collection of Boolean functions, {C_n | n ∈ ω}. Each C_n is in itself a collection of Boolean functions on n variables. A function f_n possesses this property iff f_n ∈ C_n (also written C_n(f_n) = 1).


Definition 4.13 (Natural proof [40]). The property that corresponds to a natural proof has three characteristics:

Constructiveness: The property C_n is computable in time polynomial in the truth table of f_n. The truth table has size 2^n, so there is an algorithm running in 2^O(n) time that computes C_n(f_n).

Largeness: A random Boolean function g_n has probability at least 1/n^k, for some fixed k > 0, to be in C_n.

We call C_n a natural property if it is constructive and large. A property is useful against P/poly if it satisfies the following:

Usefulness: Every sequence of functions f1, f2, . . . , fn, . . . with f_n ∈ C_n has super-polynomial circuit size. So g_n ∉ C_n for all {g_n} ∈ P/poly.

A property could also be useful against other complexity classes, with an obvious modification in the definition.

There is a motivation behind these characteristics. Constructiveness has an empirical justification. Proofs of circuit lower bounds often use combinatorial techniques, like the proof shown above. It turns out that almost all properties of Boolean functions in combinatorics are at worst exponential-time decidable [7, 40].

The largeness condition makes sense intuitively: we want a random function g_n to have a non-negligible chance of having the property C_n. In fact, a lower bound on the circuit complexity of one function implies a lower bound on the complexity of a lot more functions [40, 7]. So a property that only applies to a small number of functions cannot be used for proving lower bounds.

Finally, a property C_n has to be useful. Only then can we use it for separating complexity classes.

4.2.2 Most proofs in circuit complexity naturalize

In 1997 Razborov and Rudich showed that all circuit lower bound proofs that were known at the time naturalize, i.e., are natural proofs [40]. In order to show that a proof naturalizes, one has to find the property that is used and show that it is natural. The validation of the usefulness requirement is generally contained in the proof.

Parity ∉ AC0 naturalizes

Functions decided by circuits in AC0 become constant after fixing a number of inputs, but Parity can only become constant after fixing all variables. So the property that was used is C_n = {f_n : {0,1}^n → {0,1} | for all restrictions ρ on fewer than n variables, f_n^ρ is not constant}.

C_n has the largeness condition, since a random function isn't likely to become constant after a random restriction. It is also constructive: if we let k variables be unfixed, we can calculate C_n(f) for some f : {0,1}^n → {0,1} by listing all (n choose k)·2^(n−k) = 2^O(n) possible restrictions and checking, for each, whether the restricted function is constant.

Majority ∉ AC0 with Parity gates naturalizes

The proof of Theorem 4.12 given in Section 4.1.3 is a modification of the original proof of Razborov. Our variant gives a mapping of the Threshold function to a matrix; the rank of that matrix lower-bounds the minimal number of mistakes an approximating polynomial makes. Both proofs are similar, but Razborov's version gives a mapping M from all symmetric functions to matrices [39]. The property is C_n = {f_n : {0,1}^n → {0,1} | rank(M(f_n)) is large} [40].

In the proof, it is shown that all functions in AC0 with Parity gates can be approximated well by a low-degree polynomial. This means that the mapping M on functions in AC0 with Parity gates cannot give matrices with high rank. The property is thus useful against AC0 with Parity gates. The calculation of the rank can be done in time polynomial in the size of the truth table (constructiveness). And for at least 1/2 of all Boolean functions f_n the rank of the matrix M(f_n) is high (largeness) [40].

4.2.3 Natural proofs cannot solve P vs NP

In their paper, Razborov and Rudich showed that the property C_n of a natural proof can be used to break a pseudorandom function family [40]. We can construct pseudorandom function families from one-way functions. Therefore, if one-way functions exist, then natural proofs cannot show that P ≠ NP. The existence of one-way functions is a widely believed conjecture, and it implies that P ≠ NP [7, Sec. 9.2].

Definition 4.14 (T(|x|)-strong one-way function [7, Def. 9.4]). A polynomial-time computable function f : {0,1}* → {0,1}* is a T(|x|)-strong one-way function iff for every probabilistic T(|x|)-time algorithm A,

Pr[A(y) = x s.t. f(x) = y] < n^(−c)

for every c and sufficiently large n ∈ N, where y = f(x) and x ∈ {0,1}^n is chosen uniformly at random.

From any one-way function we can construct a pseudorandom generator. This is a function that creates a pseudorandom string of bits on input of some other string, called the seed [7, Sec. 9.2.3]. From this generator we can construct a pseudorandom family of functions:

Definition 4.15 (T(|k|)-secure pseudorandom function family [7, Def. 9.16]). Let {f_k}, k ∈ {0,1}*, be a family of functions such that f_k : {0,1}^|k| → {0,1}, and such that there is a polynomial-time algorithm that, given k ∈ {0,1}* and x ∈ {0,1}^|k|, computes f_k(x). This family is T(|k|)-secure pseudorandom iff for every probabilistic oracle Turing machine A that runs in time T(|k|):

|Pr[A^{f_k}(1^n) = 1] − Pr[A^g(1^n) = 1]| < n^(−c)

for every c and sufficiently large n ∈ N, where g : {0,1}^n → {0,1} and k ∈ {0,1}^n are chosen uniformly at random.

We will assume the existence of 2^(n^ε)-strong (subexponentially strong) one-way functions for some fixed ε > 0. From these, we can create a 2^(|k|^ε′)-secure pseudorandom function family {f_k} for another ε′ > 0 (ε, ε′ < 1) [7, 28].


We will now discuss why natural proofs cannot show P ≠ NP: it turns out that a natural property enables us to distinguish between oracle access to a pseudorandom function from such a family and oracle access to a uniformly random Boolean function. The following theorem is due to Razborov and Rudich [40]. The proof comes from [7, Th. 23.1].

Theorem 4.16. If subexponentially strong one-way functions exist, there is no lower-bound natural proof useful against P/poly.

Proof. We assume towards a contradiction that there exists a lower bound natural proof that uses some natural property C_n, useful against P/poly. There exists a c > 0 such that a uniformly random function on n variables has probability at least 1/n^c to be in C_n (largeness).

Let {f_k} be a 2^(|k|^ε)-secure pseudorandom function family. Let A be an algorithm with oracle access to a function h : {0,1}^m → {0,1}, and let n = m^(ε/2). The function h can be a uniformly random function, or f_k for some k ∈ {0,1}^m. Define a function g : {0,1}^n → {0,1} as g(x) = h(x0^(m−n)), so m − n zeroes are added to the input before it is sent to h. We let A compute C_n(g).

If h is f_k, then f_k can be computed in time polynomial in m, and thus also in time polynomial in n. This means that g can be computed in polynomial time, and thus by a polynomial-sized circuit. So C_n(g) = 0, since C_n is useful against P/poly.

If h is a uniformly random function, then g is also a uniformly random function, and thus C_n(g) = 1 with probability at least 1/n^c.

C_n(g) can be computed in 2^O(n) time (constructiveness). So A distinguishes between oracle access to f_k and oracle access to a uniformly random function in less than 2^(m^ε) time, which is a contradiction. □

Thus, natural proofs cannot separate NP from P/poly. In order to circumvent this barrier, proofs must violate either the largeness or the constructiveness condition.

4.2.4 Non-naturalizing results

That Majority is not in AC0 even with Parity gates was one of the last remarkable circuit lower bounds for quite some time. After 1987 there were no significant new general lower bounds in circuit complexity [19], until Ryan Williams was able to show that NEXP ⊄ AC0 with MOD_m gates in [49] (a MOD_m gate outputs 1 iff m divides the sum of its inputs, where m > 1 is an arbitrary constant). The result is non-relativizing and non-natural, and it has been called the "first baby step" towards solving P vs NP [18]. This will be discussed in Chapter 6.

There has been another noteworthy result: Buhrman et al. showed that MA_exp (the class of languages computable with a two-round public-coin interactive proof with an exponential-time verifier) does not have polynomial-sized circuits [11]. This is another result that is non-natural and non-relativizing, since it uses diagonalization and arithmetization respectively [4].


5 Arithmetization and algebrization

Arithmetization is used for representing Boolean functions by polynomials, and it is especially useful in interactive proofs. However, most lower bound proofs that use arithmetization algebrize and cannot solve P vs NP.

5.1 The usage of arithmetization

Arithmetization circumvents the relativization barrier. It has been used in circuit complexity: Razborov’s result that Majority is not inAC0used a representation of Boolean formulas by polynomials, as did the similar result of Smolensky [39, 45].

5.1.1 What is arithmetization?

Arithmetization is used for extending Boolean formulas to polynomials. The arithmetization of a Boolean formula to a polynomial is such that their outputs coincide on Boolean inputs. This way, we can use the polynomial instead of the formula.

Definition 5.1 (Arithmetization [30, B.2.2]). Arithmetization is the conversion of a Boolean formula ϕ : {0,1}^n → {0,1} to a multivariate polynomial ϕ̃ : F^n → F, for a finite field F. Let ϕ be a Boolean formula on the variables x1, . . . , xn, such that ϕ has no ∨ symbols. We can define ϕ̃ by induction on the structure of ϕ:

x_i ↦ x_i ∈ F[x1, . . . , xn]
¬φ ↦ 1 − φ̃
φ ∧ ψ ↦ φ̃ · ψ̃

Now, for all x ∈ {0,1}^n: ϕ(x) = ϕ̃(x).
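The inductive definition translates directly into code. Below is a minimal Python sketch (the tuple encoding of formulas is our own) that evaluates a formula and its arithmetization and checks that they agree on all Boolean inputs:

```python
from itertools import product

# Formulas as nested tuples: ("var", i), ("not", phi), ("and", phi, psi).
def eval_formula(phi, x):
    op = phi[0]
    if op == "var":
        return x[phi[1]]
    if op == "not":
        return 1 - eval_formula(phi[1], x)
    if op == "and":
        return eval_formula(phi[1], x) & eval_formula(phi[2], x)

def arith(phi, x, p):
    """Evaluate the arithmetization of phi at a point x in F_p^n."""
    op = phi[0]
    if op == "var":
        return x[phi[1]] % p
    if op == "not":
        return (1 - arith(phi[1], x, p)) % p
    if op == "and":
        return arith(phi[1], x, p) * arith(phi[2], x, p) % p

# phi = not(x0 and not(x1)), i.e. x0 -> x1, written without any "or":
phi = ("not", ("and", ("var", 0), ("not", ("var", 1))))
p = 101
for x in product((0, 1), repeat=2):
    assert eval_formula(phi, x) == arith(phi, x, p)
print("formula and arithmetization agree on all Boolean inputs")
```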

5.1.2 Rounds of interaction and deterministic interactive proofs

The first results where arithmetization played a major role in the proof concerned interactive proofs. These results were due to Lund et al. and Shamir, who showed that IP = PSPACE [42, 36] (of which more later), and Babai et al., who proved that MIP = NEXP [8]. The class of languages that have probabilistic interactive proofs is IP, while MIP contains languages that have protocols with multiple provers [14].

We will first consider deterministic interactive proofs, which consist of an interaction between a deterministic verifier and prover. The verifier asks the prover questions and verifies its responses. The interaction consists of several rounds:


Definition 5.2 (A k-round deterministic interaction [7, Def. 8.2]). Let f, g : {0,1}* → {0,1}* be functions and k ≥ 0 an integer. A k-round interaction on input x ∈ {0,1}* is defined as a sequence of strings a1, . . . , ak ∈ {0,1}*:

a1 = f(x)
a2 = g(x, a1)
. . .
a_(2i+1) = f(x, a1, . . . , a_(2i))   (2i < k)
a_(2i+2) = g(x, a1, . . . , a_(2i+1))   (2i + 1 < k)

The output of the interaction is defined as f(x, a1, . . . , ak).

In the end, the verifier is convinced of the statement the prover was trying to prove (f (x, a1, . . . , ak) = 1), or not. Languages decided by deterministic interactive proof systems correspond with the following class:

Definition 5.3 (The class dIP [7, Def. 8.3]). A language L ∈ dIP iff there is a k-round deterministic interaction between a polynomial-time TM V and a prover P, where k is at most a polynomial in the size of the input x ∈ {0,1}*, such that:

1. If x ∈ L, then there is a prover P : {0,1}* → {0,1}* such that V accepts.
2. If x ∉ L, then for all provers P : {0,1}* → {0,1}*, V rejects.

The prover has unlimited computational power, but it cannot make the verifier accept a false statement. Since the verifier and prover are deterministic, all questions and answers can be announced immediately: there is no need for more than one round of interaction. We have that NP = dIP: the existence of a certificate that can be verified in polynomial time corresponds with the existence of an interaction transcript. Namely: if such a certificate exists, there is a 1-round deterministic interaction where the verifier accepts. And if there is a transcript a1, . . . , ak of a k-round interaction, then (a1, . . . , ak) can serve as a certificate: on input x, a polynomial-time Turing machine V can check that V(x) = a1, V(x, a1, a2) = a3, . . . , V(x, a1, . . . , ak) = 1 [7, Lem. 8.4].

5.1.3 Probabilistic interactive proofs and the class IP

In probabilistic interactive proofs, true statements get rejected and false statements get accepted with at most a small probability. The verifier is now a probabilistic Turing machine. The verifier V and the prover P engage in a conversation, consisting of several rounds of interaction. Every other round, V generates a random string, the coin, that it uses in its verification. But P cannot see this coin. A formal definition of such an interaction is as follows:

Definition 5.4 (A k-round interaction with private coins [7, Sec 8.1.2]). Let f, g : {0,1}* → {0,1}* be functions, k ≥ 0 an integer, x ∈ {0,1}* an input and r ∈ {0,1}^m a random string generated by f. A k-round probabilistic interaction with private coins is defined as a sequence of strings a1, . . . , ak ∈ {0,1}*:

a1 = f(x, r)
a2 = g(x, a1)
. . .
a_(2i+1) = f(x, r, a1, . . . , a_(2i))   (2i < k)
a_(2i+2) = g(x, a1, . . . , a_(2i+1))   (2i + 1 < k)

The output of the interaction is defined as f(x, r, a1, . . . , ak).

Every other round, f uses the random string r. It can use the same part of r each time, but it can also divide r into disjoint sequences and use a different part of r every round. Since the interactions are random variables, the output is too. Because of the randomness in the interaction, the verifier can sometimes accept a false statement. We want correct statements to be accepted with high probability and wrong statements with low probability:

Definition 5.5 (The class IP [14, 7]). A language L ∈ IP iff there is a k-round private-coin interaction a1, . . . , ak between a probabilistic polynomial-time TM V and a prover P, where k is at most a polynomial in the size of the input x ∈ {0,1}*. Furthermore:

1. If x ∈ L, there is a P : {0,1}* → {0,1}* such that V accepts with probability at least 2/3.
2. If x ∉ L, for all P : {0,1}* → {0,1}* the probability that V accepts is at most 1/3.

5.1.4 IP = PSPACE

It was known for some time that IP ⊆ PSPACE [7, Sec. 8.1.1]. For a language L, we can compute the probability that a verifier accepts an input x using polynomial space, which tells us whether x ∈ L or not. Most researchers thought that IP ⊆ PSPACE would be a proper containment, since there are oracles relative to which this is the case. In 1992 Shamir [42], following up on Lund et al. [36], was able to show the opposite, using arithmetization.

Theorem 5.6 ([42, 36]). PSPACE ⊆ IP

In order to prove this, we will show that a PSPACE-complete language, TQBF, is in IP. This language consists of True Quantified Boolean Formulas; it can be understood as SAT for formulas with quantifiers.

The proof below gives an interaction protocol for TQBF; the presentation is inspired by [7, Sec. 8.3.3].

Proof. Let ψ = ∀x_1 ∃x_2 · · · ∀x_{n−1} ∃x_n ϕ(x_1, . . . , x_n) be a TQBF, with ϕ : {0, 1}^n → {0, 1}. We will construct a polynomial-time verifier V and a prover P, such that TQBF ∈ IP. The idea is to use arithmetization to let P convince V that ψ is true. Let ϕ̃ be the arithmetization of ϕ over a finite field F_p, for some p. We know that ψ is true if

Ψ = ∏_{b_1∈{0,1}} ∑_{b_2∈{0,1}} · · · ∏_{b_{n−1}∈{0,1}} ∑_{b_n∈{0,1}} ϕ̃(b_1, . . . , b_n) ≠ 0

for a right choice of p, on which more later. In order to convince V that Ψ ≠ 0, P will try to show V that Ψ = K for some K ∈ F_p. In the interaction, P will provide V with the univariate polynomial

h(x_1) = ∑_{b_2∈{0,1}} · · · ∏_{b_{n−1}∈{0,1}} ∑_{b_n∈{0,1}} ϕ̃(x_1, b_2, . . . , b_n).

V will then verify that h(0) · h(1) = K. More rounds will follow, so V can check that P is not cheating.
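As a toy illustration (not part of the formal development), we can compute Ψ and the honest prover's polynomial h for a tiny instance. The formula ψ = ∀x_1 ∃x_2 (x_1 ∨ x_2) and the field size p = 13 are hypothetical choices made only for this sketch.

```python
# Toy instance: psi = ∀x1 ∃x2 (x1 OR x2), arithmetized over F_p with p = 13.
P = 13

def phi_tilde(b1, b2):
    # Arithmetization of (x1 OR x2): OR(x, y) -> 1 - (1 - x)(1 - y)
    return (1 - (1 - b1) * (1 - b2)) % P

# Psi: the outer ∀ becomes a product over b1, the inner ∃ a sum over b2.
Psi = 1
for b1 in (0, 1):
    Psi = (Psi * sum(phi_tilde(b1, b2) for b2 in (0, 1))) % P

# The honest prover's univariate polynomial h(x1) = sum over b2.
def h(x1):
    return (phi_tilde(x1, 0) + phi_tilde(x1, 1)) % P

K = Psi                          # the prover claims Psi = K
print(h(0) * h(1) % P == K)      # the verifier's first check passes: True
print(Psi)                       # 2: nonzero, so psi is indeed true
```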

However, there are two problems. Because of the products, the degree of h can get exponentially big: if we had a TQBF with only universal quantifiers, the degree could get as high as 2^n. The value of Ψ can thus also be double-exponential. A polynomial-time verifier cannot read the binary string of a double-exponential number, nor can it receive the possibly 2^n coefficients of the polynomial.

We will thus work modulo some prime number p to lower the values involved in the interaction. The choice of p is such that Ψ gets a non-zero value and such that all values that arise in the interaction are small enough for V.

In order to reduce the degrees of the polynomials that will arise in the interaction, we use a linearization operator L_i. For a polynomial q:

L_i q(x_1, . . . , x_m) = (1 − x_i) · q(x_1, . . . , x_{i−1}, 0, x_{i+1}, . . . , x_m) + x_i · q(x_1, . . . , x_{i−1}, 1, x_{i+1}, . . . , x_m)

This way, x_i has power at most 1 in the expression L_i q, and q agrees with L_i q on inputs in {0, 1}. Then let d be the upper bound on the degree of all polynomials involved, which is known to the verifier.
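A minimal sketch of the linearization operator for a single variable (an illustration of my own, not from the protocol itself): linearizing the hypothetical polynomial q(x) = x^3 yields a degree-1 polynomial that agrees with q on {0, 1} but differs elsewhere.

```python
# Linearization in one variable (i = 1): L1 q(x) = (1 - x) * q(0) + x * q(1).
def linearize(q):
    return lambda x: (1 - x) * q(0) + x * q(1)

q = lambda x: x ** 3   # a degree-3 polynomial
lq = linearize(q)      # degree at most 1 after linearization

print([q(b) == lq(b) for b in (0, 1)])  # [True, True]: agreement on {0, 1}
print(q(5), lq(5))                      # 125 5: they differ off the cube
```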

For the interaction, we will also treat the quantifiers ∀x_i, ∃x_i as operators A_i, E_i:

A_i q(x_1, . . . , x_m) = q(x_1, . . . , x_{i−1}, 0, x_{i+1}, . . . , x_m) · q(x_1, . . . , x_{i−1}, 1, x_{i+1}, . . . , x_m)

E_i q(x_1, . . . , x_m) = q(x_1, . . . , x_{i−1}, 0, x_{i+1}, . . . , x_m) + q(x_1, . . . , x_{i−1}, 1, x_{i+1}, . . . , x_m)

The linearization operator should be applied for every free variable. So instead of Ψ, we will now use the expression

P = A_1 L_1 E_2 L_1 L_2 · · · A_{n−1} L_1 · · · L_{n−1} E_n L_1 · · · L_n ϕ̃(x_1, . . . , x_n)

and let P show that it is equal to K.
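The right-to-left application of these operators can also be sketched for a tiny instance. This is again an illustration of my own: the formula ψ = ∀x_1 ∃x_2 (x_1 ∨ x_2) (n = 2, so the expression is A_1 L_1 E_2 L_1 L_2 ϕ̃) and the field size p = 13 are hypothetical choices.

```python
# Toy evaluation of A1 L1 E2 L1 L2 phi_tilde for psi = ∀x1 ∃x2 (x1 OR x2).
P = 13

def phi_tilde(v):          # v maps a variable index to its value in F_p
    return (1 - (1 - v[1]) * (1 - v[2])) % P

def subst(q, i, b):        # fix variable i of q to the constant b
    return lambda v: q({**v, i: b})

def L(i, q):               # linearization: x_i gets power at most 1
    return lambda v: ((1 - v[i]) * subst(q, i, 0)(v)
                      + v[i] * subst(q, i, 1)(v)) % P

def A(i, q):               # ∀x_i as a product over x_i in {0, 1}
    return lambda v: (subst(q, i, 0)(v) * subst(q, i, 1)(v)) % P

def E(i, q):               # ∃x_i as a sum over x_i in {0, 1}
    return lambda v: (subst(q, i, 0)(v) + subst(q, i, 1)(v)) % P

expr = A(1, L(1, E(2, L(1, L(2, phi_tilde)))))
print(expr({}))            # 2: equal to Psi, nonzero, so psi is true
```

Since the operators only ever evaluate the linearized polynomials at Boolean points here, the result agrees with Ψ, as expected from the agreement of L_i q and q on {0, 1}.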

The protocol can be described in the following way. In the first round, V asks P to prove that ψ is true. In the second round, P says that he will show that P = K, for some K ∈ F_p. In the third round, V can verify the expression P and check that p is indeed a prime number.

Round 4 and 5: A_1

1. P provides a polynomial h_1(x_1) of degree at most d that is supposed to equal L_1 E_2 · · · L_n ϕ̃(x_1, . . . , x_n).
