Faster space-efficient algorithms for Subset Sum, k-Sum and related problems


arXiv:1612.02788v1 [cs.DS] 8 Dec 2016


1 Introduction

The Subset Sum problem and the closely related Knapsack problem are two of the most basic NP-complete problems. In the Subset Sum problem we are given integers $w_1, \ldots, w_n$ and an integer target $t$ and are asked to find a subset $X$ of indices such that $\sum_{i \in X} w_i = t$. In the Knapsack problem we are given integers $w_1, \ldots, w_n$ (weights), integers $v_1, \ldots, v_n$ (values) and an integer $t$ (weight budget) and want to find a subset $X$ maximizing $\sum_{i \in X} v_i$ subject to the constraint $\sum_{i \in X} w_i \le t$.

It is well known that both these problems can be solved in time $O^*(\min(t, 2^n))$, where we use the $O^*(\cdot)$ notation to suppress factors polynomial in the input size. In their landmark paper introducing the Meet-in-the-Middle approach, Horowitz and Sahni [14] solve both problems in $O^*(2^{n/2})$ time and $O^*(2^{n/2})$ space, substantially speeding up the trivial $O^*(2^n)$ time polynomial space algorithm. Later this was improved to $O^*(2^{n/2})$ time and $O^*(2^{n/4})$ space by Schroeppel and Shamir [33]. While these approaches have been extended to obtain improved tradeoffs in several special cases, particularly if the instances are random or “random-like” [2, 3, 4, 10, 15], the above algorithms are still the fastest known for general instances in the regimes of unbounded and polynomial space.

The idea in all these algorithms is to precompute and store an exponential number of intermediate quantities in a lookup table, and use this to speed up the overall algorithm. In particular, in the Meet-in-the-Middle approach for Subset Sum, one splits the $n$ numbers into two halves $L$ and $R$ and computes and stores the sums of all $2^{n/2}$ subsets of $L$, and similarly for $R$. Then for every partial sum $a$ of a subset of $L$ one looks up, using binary search, whether there is the partial sum $t - a$ for a subset of $R$. As these approaches inherently use exponential space, an important long-standing question (e.g. [2], [9], Open Problem 3b in [38], Open Problem 3.8b in [11]) has been to find an algorithm that uses polynomial space and runs in time $O^*(2^{(1-\varepsilon)n})$ for some $\varepsilon > 0$. We almost resolve this open problem.
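The Meet-in-the-Middle lookup described above can be illustrated by a minimal Python sketch. The function names `subset_sums` and `meet_in_the_middle` are hypothetical and chosen here for illustration; the sketch stores all partial sums of both halves and therefore uses exponential space, which is exactly the cost the remainder of the paper tries to avoid.

```python
from bisect import bisect_left

def subset_sums(items):
    """All (sum, chosen indices) pairs over subsets of `items` = [(index, weight), ...]."""
    sums = [(0, ())]
    for idx, w in items:
        sums += [(s + w, chosen + (idx,)) for s, chosen in sums]
    return sums

def meet_in_the_middle(weights, target):
    """Return a sorted list of indices whose weights sum to `target`, or None."""
    indexed = list(enumerate(weights))
    half = len(indexed) // 2
    left = subset_sums(indexed[:half])             # about 2^(n/2) partial sums of L
    right = sorted(subset_sums(indexed[half:]))    # about 2^(n/2) partial sums of R, sorted
    right_keys = [s for s, _ in right]
    for a, chosen_left in left:
        # Binary search for the complementary partial sum t - a among the sums of R.
        pos = bisect_left(right_keys, target - a)
        if pos < len(right_keys) and right_keys[pos] == target - a:
            return sorted(chosen_left + right[pos][1])
    return None
```

For instance, `meet_in_the_middle([3, 34, 4, 12, 5, 2], 9)` returns indices of a subset with sum 9, while a target of 30 has no solution for these weights.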

1.1 Our Results

Subset Sum, Knapsack and Binary Linear Programming: In this paper we give the first polynomial space randomized algorithm for Subset Sum, Knapsack and Binary Linear Programming (with few constraints) that improves over the trivial $O^*(2^n)$ time algorithm for worst-case inputs, under a mild technical assumption. Our main theorem reads as follows:

Theorem 1. There are Monte Carlo algorithms solving Subset Sum and Knapsack using $O^*(2^{0.86n})$ time and polynomial space. The algorithms assume random read-only access to random bits.

In the Binary Linear Programming problem we are given $v, a_1, \ldots, a_d \in \mathbb{Z}^n$ and $u_1, \ldots, u_d \in \mathbb{Z}$ and are asked to solve the following optimization problem:

minimize $\langle v, x \rangle$

subject to $\langle a_j, x \rangle \le u_j$, for $j = 1, \ldots, d$,

$x_i \in \{0, 1\}$, for $i = 1, \ldots, n$.

Using reductions from previous work, we obtain the following consequence of Theorem 1:

Corollary 2. There is a Monte Carlo algorithm solving Binary Linear Programming instances with maximum absolute integer value $m$ and $d$ constraints in time $O^*(2^{0.86n} (\log(mn)\, n)^{O(d)})$ and polynomial space. The algorithm assumes random read-only access to random bits.

Though read-only access to random bits is a rather uncommon assumption in algorithm design,¹ such an assumption arises naturally in space-bounded computation (where the working memory is much smaller than the number of random bits needed), and has received more attention in complexity theory (see e.g. [27, 28, 32]).

From existing techniques it follows that the assumption is weaker than assuming the existence of sufficiently strong pseudorandom generators. We elaborate more on this in Section 6 and refer the reader to [32] for more details on the different models for accessing input random bits.

List Disjointness: A key ingredient of Theorem 1 is a new algorithmic result for the List Disjointness problem. In this problem one is given (random read-only access to) two lists $x, y \in [m]^n$ with $n$ integers in range $[m]$ with $m \le \mathrm{poly}(n)$, and the goal is to determine if there exist indices $i, j$ such that $x_i = y_j$. This problem can be trivially solved by sorting using $\tilde{O}(n)$ time and $O(n)$ space, or in $\tilde{O}(n^2)$ running time and $O(\log n)$ space using exhaustive search. We show that improved running time bounds with low space can be obtained, if the lists do not have too many “repeated entries”.
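The two trivial baselines mentioned above can be sketched as follows. This is only an illustration of the two extremes of the space-time trade-off; the function names are hypothetical, and indices are 0-based rather than 1-based as in the text.

```python
def disjointness_by_sorting(x, y):
    """Sort both index lists by value and merge: near-linear time, O(n) extra space."""
    xs = sorted(range(len(x)), key=lambda i: x[i])
    ys = sorted(range(len(y)), key=lambda j: y[j])
    a = b = 0
    while a < len(xs) and b < len(ys):
        xv, yv = x[xs[a]], y[ys[b]]
        if xv == yv:
            return xs[a], ys[b]      # indices (i, j) with x[i] == y[j]
        if xv < yv:
            a += 1
        else:
            b += 1
    return None                      # the lists are disjoint

def disjointness_by_search(x, y):
    """Exhaustive search: quadratic time, only two counters of working memory."""
    for i, xv in enumerate(x):
        for j, yv in enumerate(y):
            if xv == yv:
                return i, j
    return None
```

Both functions return a witness pair when one exists, which matches the convention used later that YES-answers come with a witness.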

Theorem 3. There is a Monte Carlo algorithm that takes as input an instance $x, y \in [m]^n$ of List Disjointness with $m \le \mathrm{poly}(n)$, an integer $s$, and an upper bound $p$ on the sum of the squared frequencies of the entries in the lists, i.e. $\sum_{v=1}^{m} |x^{-1}(v)|^2 + |y^{-1}(v)|^2 \le p$, and solves the List Disjointness instance $x, y$ in time $\tilde{O}(n\sqrt{p/s})$ and $O(s \log n)$ space for any $s \le n^2/p$. The algorithm assumes random read-only access to a random function $h : [m] \to [n]$.

Note the required function $h$ can be simulated using polynomially many random bits. This result is most useful when $s = O(1)$, and it gives non-trivial bounds when the number of repetitions is not too large, i.e. we can set $p \ll n^2$. In particular, if $m = \Omega(n)$ and the lists $x$ and $y$ are “random-like” then we expect $p = O(n)$ and we get that List Disjointness can be solved in $\tilde{O}(n^{3/2})$ time and $O(\log n)$ space. Moreover, in this random-like setting, as $s$ increases to $n$, the running time of our algorithm smoothly goes to $\tilde{O}(n)$, matching the sorting based running time.

As List Disjointness arises as a fundamental subroutine in many algorithms (especially those based on the Meet-in-the-Middle approach) we expect Theorem 3 to be also useful for related problems other than the ones studied in this paper.

k-Sum: We illustrate that the random read-only access assumption is not needed when an instance has sufficient random properties itself. In particular, we apply the techniques behind Theorem 3 to solve random instances of the k-Sum problem. In this problem we are given $k$ lists $w^1, \ldots, w^k$ each with $n$ integers and an integer $t$ and the goal is to find some $s_i \in w^i$ for each $i$ such that $\sum_{i=1}^{k} s_i = t$. We solve random instances of this problem in the following precise sense:

Theorem 4. There is a randomized algorithm that given a constant $k \ge 2$, $k$ uniformly random vectors $w^1, \ldots, w^k \in_R [m]^n$, with $n \le m \le \mathrm{poly}(n)$ and $m$ being a multiple of $n$, and a target $t$ (that may be adversarially chosen depending on the lists) finds $s_i \in w^i$ satisfying $\sum_{i=1}^{k} s_i = t$ if they exist with constant probability using $\tilde{O}(n^{k-0.5})$ time and $O(\log n)$ space.

In particular, this implies an $\tilde{O}(n^{1.5})$ time algorithm using $O(\log n)$ space for the random 2-Sum problem (without any read-only random bits). Note that unlike algorithms for worst-case k-Sum, we cannot reduce k-Sum to ℓ-Sum on $n^{\lceil k/\ell \rceil}$-sized lists in the usual way as the produced lists will not be random. By applying Theorem 3 we also show how random instances of k-Sum for even constant $k$ can be solved in $\tilde{O}(n^{3k/2})$ time and $O(\log n)$ space assuming random read-only access to random bits. See Section 5 for more details.

¹ The only exception we are aware of is [6] where it is treated as a rather innocent assumption.

1.2 Our Main Techniques

Techniques behind Theorem 1: To obtain the algorithm for Subset Sum in Theorem 1, we apply the Meet-in-the-Middle approach from [14] as outlined above, but use Theorem 3 instead of using a lookup table as we cannot store the latter in memory. While we also cannot store the two lists of partial sums in memory, we can simulate random access to them as a fixed partial sum in these lists is easily computed in $O(n)$ time. As the running time in Theorem 3 depends on the parameter $p$, which we may need to set as large as $2^n$ in general, this does not directly give us an $O^*((2 - \varepsilon)^n)$ time polynomial space algorithm for $\varepsilon > 0$.

To obtain such an improvement, we use a win-win approach, similar to what was suggested by Austrin et al. [4]: if $p = O^*(2^{0.72n})$ is a valid upper bound for the sum of the squared frequencies of the entries in the lists, Theorem 3 already gives the promised running time. On the other hand if $p = O^*(2^{0.72n})$ is not a valid upper bound, then we exploit the particular additive combinatorics of subset sums to show that a different polynomial space algorithm works well for this Subset Sum instance. Note that if $p$ needs to be large, a large number of subsets need to have the same sum, which does not directly seem exploitable in general, let alone using only polynomial space. However, recently Austrin et al. [4] developed a technique to show that in this case the number of distinct sums $|w(2^{[n]})| := |\{\sum_{e \in X} w_e : X \in 2^{[n]}\}|$ must be substantially small, and this technique is almost directly applicable here. Observe that this connection is completely non-trivial. For example, a priori it is entirely possible that one particular subset sum occurs $2^{n-1}$ times (so half the subsets have the same sum), and the other $2^{n-1}$ subset sums are all distinct (and hence $|w(2^{[n]})| = 2^{n-1} + 1$). Nevertheless, Austrin et al. show that such connections can be obtained by relating the problem to bounding ‘Uniquely Decodable Code Pairs’ (see e.g. [5] and the references therein), a well-studied notion in information theory. In our setting, we show that if $p = O^*(2^{0.72n})$ is not a valid upper bound as required in Theorem 3, then $|w(2^{[n]})| \le O^*(2^{0.86n})$ and we can run an $O^*(|w(2^{[n]})|)$ time polynomial space algorithm (see e.g. [20, 4]).

To obtain the result for Knapsack (and for Binary Linear Programming) we use a reduction to Subset Sum previously presented by Nederlof et al. [26].

Techniques behind Theorem 3: At its core, the algorithm behind this theorem relies on space efficient cycle finding algorithms that, given access to a function $f : [n] \to [n]$, are able to find and sample pairs $i, j$ such that $f(i) = f(j)$ efficiently. To gain any non-trivial improvement over naive search with these algorithms, certain random-like properties of the function $f$ are required. Cycle finding algorithms are well-known in cryptography (a famous example is Pollard’s rho method for factorization) where these random-like properties are assumed to hold, but are less commonly used in the traditional worst case analysis of algorithms.

Nevertheless, cycle finding algorithms were previously already applied in a similar setting by Beame et al. [6]. To guarantee the random-like properties of $f$, Beame et al. use the idea of (essentially) ‘shuffling’ $f$ using the assumed random access to the read-only tape with random bits to ensure $f$ has the required properties. They use this to give a substantially improved space-time trade-off for the related Element Distinctness problem, in which one is given a list $x \in [m]^n$ and needs to determine whether all values occur at most once.

We combine this approach with a simple but powerful idea from cryptography (see e.g. [34], and for a non-rigorous application to random Subset Sum, [7]), which allows us to deal with two lists rather than one, thereby greatly improving the scope of the technique. Specifically, we define the function $f$ in such a way that $f(i) = f(j)$ implies that either $x_i = x_j$, $y_i = y_j$, $y_i = x_j$ or $x_i = y_j$. Note that we are interested in the latter two cases, and using the cycle finding algorithm we sample pairs of indices until we expect to encounter one of these cases if they exist. To allow for a space-time trade-off, we use the elegant extension of the cycle finding algorithm by Beame et al.

1.3 Related Previous Work

In this section we briefly outline related previous work, grouped by four of the research areas touched by our work.

Exact algorithms for NP-complete problems. The goal of solving NP-complete problems exactly using as few resources as possible has witnessed considerable attention in recent years. The Subset Sum problem and its variants have received a lot of attention in this area, but as mentioned above, nothing better than exhaustive search (for polynomial space) or the classic results of [14, 33] was known. Because of the lack of progress on both questions, several works investigated space-time trade-offs indicating how fast the problems can be solved in $2^{\alpha n}$ space for some $0 < \alpha < 1$ (see e.g. [2, 10]). A promising research line aimed at improving the running time of [14] in the worst case is found in [3, 4]. One of their results is that instances satisfying $|w(2^{[n]})| \ge 2^{0.997n}$ can be solved in $O^*(2^{0.49991n})$ time, and a curious byproduct is that every instance can either be solved faster than [14] or faster than the trivial polynomial space algorithm.

Exact algorithms for the Binary Linear Programming problem were previously studied by Impagliazzo et al. [16], and Williams [37]. Using the notation from Corollary 2, the algorithm of Impagliazzo et al. runs in time $O^*(2^{n(1-\mathrm{poly}(1/c))})$ if $d = cn$, and the algorithm of Williams runs in time $O^*(2^{n(1-\Omega(1/((\log \log m)(\log d)^5)))})$, and both algorithms use exponential space. Our algorithm only improves over brute-force search when $d = o(n/(\log(n \log(mn))))$, but improves over the previous algorithms in the sense that it uses only polynomial space.

Bounded Space Computation. In many applications, memory is considered to be a scarcer resource than computation time. Designing algorithms with limited space usage has led to extensive work in areas such as the study of complexity classes L and NL (see e.g. [1]) and pseudorandom generators that fool bounded space algorithms (see e.g. [27]). In particular, its study elicited some surprising non-trivial algorithms for very fundamental computational problems such as graph reachability. In this paper we add the List Disjointness problem to the list. Space-time trade-offs in many contexts have already been studied before (see e.g. [23, Chapter 7] or, for trade-offs in exact algorithms for NP-complete problems, [13, Chapter 10]).

Cryptography. In the area of cryptography the complexity of Subset Sum on random instances has received considerable attention [12, 22, 8], motivated by cryptographic schemes relying on the hardness of Subset Sum [17, 29]. In a recent breakthrough Howgrave-Graham and Joux [15] showed that random instances can be solved faster than the Meet-in-the-Middle approach, specifically in $O^*(2^{0.3113n})$ time. This was subsequently improved in [7] where also an $O^*(2^{0.72n})$ time polynomial space algorithm was claimed.³

k-Sum and List Disjointness. These are among the most fundamental algorithmic problems that arise as sub-routines in various problems. Most relevant is the paper by Beame et al. [6] which presents an algorithm for the Element Distinctness problem in which one is given a list $x \in [m]^n$ and needs to determine whether all values occur at most once, and our work heavily builds on this algorithm. Assuming the exponential time hypothesis, k-Sum cannot be solved in $n^{o(k)}$ time [30], and while the problem can be solved in $n^{\lceil k/2 \rceil}$ time via Meet-in-the-Middle it is sometimes even conjectured that k-Sum requires $n^{k/2 - o(1)}$ time. Before our work, no non-trivial algorithms using only polylogarithmic space were known. As List Disjointness and 2-Sum are (up to a translation of one of the sets) equivalent, our algorithm for List Disjointness also implies new space efficient algorithms for k-Sum for $k \ge 2$. Space-time trade-offs for k-Sum were studied by Wang [36], and more recently, by Lincoln et al. [24]. Our results improve these trade-offs for sufficiently small space on random instances or on worst-case instances assuming random access to polynomially many random bits.

We are not aware of previous work explicitly targeting random k-Sum, and would like to mention that it is currently not clear whether random k-Sum is easier than k-Sum in the sense that it can be solved faster given unbounded space.

Outline of this Paper

In Section 2 we introduce the notation used in this paper. In Section 3 we present our List Disjointness algorithm. In Section 4 we present our algorithms for the Subset Sum, Knapsack and Binary Linear Programming problems. In Section 5 we present our algorithms for k-Sum. Finally, in Section 6 we provide concluding remarks and possible directions for further research.

2 Notation

For sets $A, B$ we denote $A^B$ as the set of all vectors with entries from $A$ indexed with elements from $B$. We freely interchange elements from $A^B$ with functions $f : B \to A$ in the natural way, and let $f^{-1} : A \to 2^B$ denote the inverse of $f$, i.e. for $a \in A$ we let $f^{-1}(a)$ denote $\{b \in B : f(b) = a\}$. If $f : B \to B$ and $i \ge 0$ we let $f^i(s)$ denote $s$ if $i = 0$ and $f(f^{i-1}(s))$ otherwise. If $f : B \to A$ and $\mathcal{F} \subseteq 2^B$ we also denote $f(\mathcal{F})$ for $\{f(X) : X \in \mathcal{F}\}$. If $f : B \to \mathbb{Z}$ and $X \subseteq B$ we shorthand $f(X) = \sum_{e \in X} f(e)$. For vectors $x, y \in \mathbb{Z}^B$ we denote by $\langle x, y \rangle$ the inner product of $x$ and $y$, i.e. $\langle x, y \rangle = \sum_{i \in B} x_i y_i$. For integers $p, q, r$ we let $q \equiv_p r$ denote that $q$ is equal to $r$ modulo $p$. If $A$ is a set, $a \in_R A$ denotes that $a$ is sampled uniformly at random from the set $A$. We let $\tilde{O}(T)$ denote expressions of the type $O(T \cdot \mathrm{polylog}(T))$, while $O^*(T)$ denotes expressions of the type $O(T \cdot \mathrm{poly}(|x|))$ where $x$ is the problem instance at hand. For integers $i \le j$, $[i]$ denotes the set $\{1, \ldots, i\}$ and $[i, j]$ denotes $\{i, i + 1, \ldots, j\}$. If $G = (V, E)$ is a directed graph and $v \in V$, we let $N_G^-(v)$ denote the set of in-neighbors of $v$ and $N_G^+(v)$ denote the set of out-neighbors of $v$. The in-degree of $v$ is $|N_G^-(v)|$; the out-degree of $v$ is $|N_G^+(v)|$, and a graph is k-(in/out)-regular if all vertices have (in/out)-degree $k$. By Monte Carlo algorithms we mean randomized algorithms with only false negatives happening with constant probability. If the algorithms in this work claim an instance is a YES-instance they can also provide a witness. With random read-only access to a function we mean we can read off any of its evaluations in constant time.

³ While this algorithm served as an inspiration for the current work and the claim looks reasonable, we felt several arguments were missing in the write-up that seem to rely on implicit strong assumptions of sufficient randomness.

3 List Disjointness

In this section we prove Theorem 3, which we restate here for convenience.

Theorem 3 (restated). There is a Monte Carlo algorithm that takes as input an instance $x, y \in [m]^n$ of List Disjointness with $m \le \mathrm{poly}(n)$, an integer $s$, and an upper bound $p$ on the sum of the squared frequencies of the entries in the lists, i.e. $\sum_{v=1}^{m} |x^{-1}(v)|^2 + |y^{-1}(v)|^2 \le p$, and solves the List Disjointness instance $x, y$ in time $\tilde{O}(n\sqrt{p/s})$ and $O(s \log n)$ space for any $s \le n^2/p$. The algorithm assumes random read-only access to a random function $h : [m] \to [n]$.

Naturally, the set [m] above can be safely replaced with any m-sized set (and indeed we will apply this theorem in a setting where values can also be negative integers), but for convenience we stick to [m] in this section. We first recall and slightly modify the result from Beame et al. [6] in Subsection 3.1 and afterwards present our algorithm and proof in Subsection 3.2.

3.1 The Collide Function from Beame et al.

At its core, our algorithm for List Disjointness uses a space-efficient subroutine to find cycles in 1-out-regular directed graphs, i.e. graphs in which every vertex has out-degree exactly 1. If $G = (V, E)$ is such a graph, its edge relation can be captured by a function $f : V \to V$ where $f(u) = v$ if vertex $u$ has an outgoing edge to vertex $v$. We say that two vertices $u \ne v$ are colliding if $f(u) = f(v)$ and refer to $(u, v)$ as a colliding pair. For $k \in V$, denote by $f^*(k)$ the set of vertices reachable from $k$ in $G$, or more formally $f^*(k) = \{f^i(k) : i \ge 0\}$ (recall from Section 2 that $f^i(k)$ denotes the result of applying $f$ iteratively $i$ times to $k$). Since $V$ is finite, clearly there exists a unique smallest $i < j$ such that $f^i(k) = f^j(k)$. The classic and elegant Floyd’s cycle finding algorithm takes $k \in V$ as input and finds such $i$ and $j$, using read-only access to $f$, $\tilde{O}(|f^*(k)|)$ time and $O(\log n)$ working memory, where $n$ denotes $|V|$.⁴ See e.g. the textbook by Joux on cryptanalysis [19, Section 7.1.1] or Exercises 6 and 7 of Knuth’s ‘Art of Computer Programming’ [21] for more information. An extension of this algorithm allowing space-time trade-offs was recently proposed by Beame et al. [6] to find multiple colliding pairs in $G$ more efficiently than using independent runs of Floyd’s algorithm. Formally, given a sequence $K = (k_1, \ldots, k_s)$ of $s$ vertices in $G$, define $f^*(K) := \bigcup_{i=1}^{s} f^*(k_i)$ to be the set of vertices reachable in $G$ from $K$. Beame et al. show there is a deterministic algorithm that finds all colliding pairs in the graph $G[f^*(K)]$ in time $O(|f^*(K)| \log s \min\{s, \log n\})$ using only $O(s \log n)$ space. We will apply this but abort a run after a certain time in order to facilitate our analysis, and use the following definition to describe its behavior.

Definition 4.1. Given an integer $L$ and sequence $K = (k_1, \ldots, k_s)$, let $\ell$ be the greatest integer upper bounded by $s$ satisfying $|f^*(\{k_1, \ldots, k_\ell\})| \le L$. Define $f_L^*(K) = f^*(\{k_1, \ldots, k_\ell\})$.
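To make the single-start case concrete, the following Python sketch shows the textbook tortoise-and-hare form of Floyd’s cycle finding and how a colliding pair can be read off from its output. It is a sketch of the classic algorithm only, not of the extension by Beame et al. [6]; the function names `floyd` and `colliding_pair` are hypothetical.

```python
def floyd(f, k):
    """Return (mu, lam): tail length and cycle length of the walk k, f(k), f(f(k)), ...

    The smallest i < j with f^i(k) = f^j(k) are then i = mu and j = mu + lam.
    Only a constant number of vertices is stored at any time.
    """
    tortoise, hare = f(k), f(f(k))
    while tortoise != hare:                 # phase 1: meet somewhere on the cycle
        tortoise, hare = f(tortoise), f(f(hare))
    mu, tortoise = 0, k
    while tortoise != hare:                 # phase 2: locate the entry point of the cycle
        tortoise, hare = f(tortoise), f(hare)
        mu += 1
    lam, hare = 1, f(tortoise)
    while tortoise != hare:                 # phase 3: measure the cycle length
        hare = f(hare)
        lam += 1
    return mu, lam

def colliding_pair(f, k):
    """Return distinct u, v with f(u) == f(v) on the walk from k, or None if k lies on the cycle."""
    mu, lam = floyd(f, k)
    if mu == 0:
        return None
    u = k
    for _ in range(mu - 1):                 # u = f^(mu-1)(k), the last tail vertex
        u = f(u)
    v = u
    for _ in range(lam):                    # v = f^(mu+lam-1)(k), the last cycle vertex
        v = f(v)
    return u, v
```

With $f$ given by the table 1→4, 2→8, 3→4, 4→1, 5→4, 6→5, 7→2, 8→2 and start vertex 3, the walk is 3, 4, 1, 4, ..., so `floyd` returns `(1, 2)` and `colliding_pair` returns `(3, 1)`.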

⁴ Sometimes $O(1)$ memory usage is stated assuming the RAM model with word size $\log(|V|)$, but as $|V|$ may be exponential in the input size in our applications we avoid this assumption.

Lemma 3.1 (Implicit in [6]). There is a deterministic algorithm Collide(f, K, L) that given read-only access to $f : V \to V$ describing a 1-out-regular directed graph $G$ and a set $K$ of $s$ vertices returns (a superset of) the set of pairs $\{(v, N_G^-(v) \cap f_L^*(K)) : |N_G^-(v) \cap f_L^*(K)| > 1\}$ using $O(L \log s \min\{s, \log n\})$ time and $O(s \log n)$ space.

The algorithm of Beame et al. [6] can be thought of as follows. First pick the vertex $k_1$ and follow the path dictated by $f$, i.e. $f^1(k_1), f^2(k_1), \ldots$, until a colliding pair is found (or we revisit the start vertex $k_1$). Thus we stop at the first step $t$ such that there exists a $t' < t$ with $f^{t'}(k_1) = f^t(k_1)$. This colliding pair of vertices is reported (if we do not revisit the start vertex) and then a similar process is initiated from the next vertex, $k_2$, in $K$ until a colliding pair is found again. Notice that this time a colliding pair might occur because of two reasons: first because the path from $k_2$ intersects itself, and second because the path from $k_2$ intersects the path of $k_1$. We again report the found colliding pair and move on to the next vertex in $K$ and so on. To achieve the aforementioned time and space bounds, Beame et al. [6] use Floyd’s cycle finding algorithm combined with additional bookkeeping. By inspecting their algorithm it is easily seen that when aborting after $O(L \log s \min\{s, \log n\})$ time, all colliding pairs in $f_L^*(K)$ will be reported. Note that the output of Collide(f, K, L) can indeed be described in $O(s \log n)$ space as each vertex in $K$ either gives rise to a new $(v, X)$ pair in the output or adds one vertex to the set $X$ for some pair $(v, X)$.
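The output specified in Lemma 3.1 can be reproduced on small inputs by a simple reference implementation. The sketch below is not the algorithm of Beame et al. [6]: it stores every visited vertex and therefore uses linear rather than $O(s \log n)$ space. It only serves to make the set $f_L^*(K)$ of Definition 4.1 and the returned pairs concrete; the names `reachable_within` and `collide_reference` are hypothetical.

```python
def reachable_within(f, starts, limit):
    """Vertices of f*({k_1..k_l}) for the longest prefix of `starts` reaching at most `limit` vertices."""
    visited = set()
    for k in starts:
        new, v = [], k
        while v not in visited and v not in new:   # follow the walk until it hits a known vertex
            new.append(v)
            v = f(v)
        if len(visited) + len(new) > limit:
            break                                   # this start would exceed the budget L
        visited.update(new)
    return visited

def collide_reference(f, starts, limit):
    """Map each vertex v to its in-neighbours inside the reached set, when there are at least two."""
    groups = {}
    for u in reachable_within(f, starts, limit):
        groups.setdefault(f(u), set()).add(u)
    return {v: X for v, X in groups.items() if len(X) > 1}
```

On the instance of Figure 2, i.e. $z = (11, 7, 3, 8, 3, 4, 1, 1)$, $f(i) = (z_i \bmod 8) + 1$, $K = (3, 5, 7)$ and no length limit, this returns the pairs $(4, \{1, 3, 5\})$ and $(2, \{7, 8\})$ stated in the caption.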

3.2 The Algorithm for List Disjointness

Given a list $z \in [m]^n$, define the number of pseudo-solutions of $z$ as

$$p(z) = \sum_{v=1}^{m} |z^{-1}(v)|^2.$$

This is a measure of how many pairs of indices (i.e. ‘pseudo-solutions’) there are in which $z$ has the same value. Note that a similar notion of ‘pseudo-duplicates’ was used by Beame et al. [6]. Notice that $p(z) \ge n$ as $p(z) \ge \sum_{v} |z^{-1}(v)| = n$. The running time of our algorithm to find a common value of two lists $x, y$ will depend on the quantity

$$p(x, y) = p(x) + p(y).$$
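As a small illustration of these quantities, the following Python sketch computes $p(z)$ and $p(x, y)$ directly from the definitions by counting value frequencies. The function names are hypothetical.

```python
from collections import Counter

def pseudo_solutions(z):
    """p(z): the sum over all values v of the squared number of occurrences of v in z."""
    return sum(c * c for c in Counter(z).values())

def pseudo_solutions_pair(x, y):
    """p(x, y) = p(x) + p(y)."""
    return pseudo_solutions(x) + pseudo_solutions(y)
```

For example, the list $z = (11, 7, 3, 8, 3, 4, 1, 1)$ has two values occurring twice and four values occurring once, so $p(z) = 2 \cdot 4 + 4 \cdot 1 = 12 \ge n = 8$, while a list of $n$ distinct values attains the minimum $p(z) = n$.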

We refer to a solution as a pair of indices $i^*, j^*$ such that $x_{i^*} = y_{j^*}$. On a high level, our algorithm uses the procedure Collide to obtain samples from the union of the set of solutions and pseudo-solutions of $x$ and $y$. The quantity $p(x, y)$ is thus highly relevant as it indicates the number of samples required to guarantee that a solution will be found with good probability. We first present an algorithm for the following promise version of the List Disjointness problem, from which Theorem 3 is an easy consequence.

Lemma 3.2. There is a randomized algorithm that given two lists $x, y \in [m]^n$ with a common value given by $x_{i^*} = y_{j^*}$, and positive integers $p(x, y) \le p$ and $s$ satisfying $s \le n^2/p$, outputs the solution $(i^*, j^*)$ with constant probability using $\tilde{O}(n\sqrt{p/s})$ expected time and $O(s \log n)$ space. The algorithm assumes random read-only access to a random function $h : [m] \to [n]$.

Note that the algorithm from Lemma 3.2 can be easily turned into a decision algorithm for List Disjointness: run the algorithm provided by Lemma 3.2, and return “disjoint” if it is still running after $\tilde{O}(n\sqrt{p/s})$ time. If a solution exists, this algorithm will find a common element with constant probability by Markov’s inequality. Thus this lemma implies Theorem 3, and the remainder of this section is devoted to the proof of Lemma 3.2. To this end, we assume indices $i^*$ and $j^*$ with $x_{i^*} = y_{j^*}$ exist. The algorithm is listed in Figure 1.

Algorithm LD(x, y, s, p)

Assumes random access to a random function $h : [m] \to [n]$.

 1: for $i = 1, \ldots, n$ do

 2:     if $x_i = y_i$ then return $(i, i)$.

 3: Pick $r = (r_a, r_b)$ where $r_a \in_R \{0, 1\}^{\lceil \log_2(n+1) \rceil}$ and $r_b \in_R \{0, 1\}$.

 4: Define $z : [n] \to [m]$ as $z_i = x_i$ if $\langle r_a, \mathrm{bin}(i) \rangle \equiv_2 r_b$, and $z_i = y_i$ if $\langle r_a, \mathrm{bin}(i) \rangle \equiv_2 1 - r_b$. Here $\mathrm{bin}(i)$ denotes the binary expansion of $i$.

 5: Define $f : [n] \to [n]$ as $f(i) = h(z_i)$.

 6: $L \leftarrow \frac{1}{2} n \sqrt{s/p}$.

 7: while true do

 8:     Sample $K = (k_1, \ldots, k_s) \in_R [n]^s$.

 9:     $C \leftarrow$ Collide(f, K, L). Collide is described in Lemma 3.1.

10:     for $(v, X) \in C$ do

11:         if $i, j \in X$ and $\langle r_a, \mathrm{bin}(i) \rangle \not\equiv_2 \langle r_a, \mathrm{bin}(j) \rangle$ and $z_i = z_j$ then return $(i, j)$.

Figure 1: Applying the collision search technique for list disjointness.
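The listing in Figure 1 can be mirrored by the following Python sketch, which is meant for experimentation on small inputs only. It departs from Figure 1 in several places, all of which are assumptions made here: Collide is replaced by a linear-space stand-in (so the space bound of Lemma 3.2 is not met), the random function $h$ is sampled lazily into a dictionary instead of being read from a read-only tape, the loop of Line 7 is cut off after `max_iters` rounds, indices are 0-based, and $L$ is rounded up and kept at least 2 so that tiny inputs still produce walks.

```python
import math
import random

def collide_reference(f, starts, limit):
    """Linear-space stand-in for Collide(f, K, L); not the algorithm of [6]."""
    visited = set()
    for k in starts:
        new, v = [], k
        while v not in visited and v not in new:
            new.append(v)
            v = f(v)
        if len(visited) + len(new) > limit:
            break                                   # walk dropped: budget L exceeded
        visited.update(new)
    groups = {}
    for u in visited:
        groups.setdefault(f(u), set()).add(u)
    return [(v, X) for v, X in groups.items() if len(X) > 1]

def list_disjointness(x, y, s, p, max_iters=1000, rng=random):
    """Return 0-based (i, j) with x[i] == y[j], or None if no solution was found."""
    n = len(x)
    if n == 0:
        return None
    for i in range(n):                              # Lines 1-2: solutions with i* = j*
        if x[i] == y[i]:
            return i, i
    nbits = n.bit_length()                          # equals ceil(log2(n + 1))
    ra, rb = rng.getrandbits(nbits), rng.getrandbits(1)        # Line 3

    def parity(i):                                  # <r_a, bin(i)> mod 2 for the 1-based index i + 1
        return bin(ra & (i + 1)).count("1") % 2

    z = [x[i] if parity(i) == rb else y[i] for i in range(n)]  # Line 4
    h = {}                                          # lazily sampled random function h

    def f(i):                                       # Line 5: f(i) = h(z_i)
        if z[i] not in h:
            h[z[i]] = rng.randrange(n)
        return h[z[i]]

    limit = max(2, math.ceil(0.5 * n * math.sqrt(s / p)))      # Line 6, guarded for tiny n
    for _ in range(max_iters):                      # Line 7, truncated
        K = [rng.randrange(n) for _ in range(s)]    # Line 8
        for _v, X in collide_reference(f, K, limit):            # Lines 9-10
            for i in X:
                for j in X:
                    # Line 11: i reads from x, j reads from y, and the values agree.
                    if parity(i) == rb and parity(j) != rb and z[i] == z[j]:
                        return i, j
    return None
```

Any returned pair is a genuine solution, since $z_i = x_i$ and $z_j = y_j$ for the returned indices; a return value of `None` only means that no solution was found within the iteration budget, in line with the one-sided error of a Monte Carlo algorithm.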

The intuition behind this algorithm is as follows: at Line 11 we have that $i \ne j$ forms a colliding pair, i.e. $f(i) = f(j) = v$, and since $h$ is a random hash function we expect $z_i = z_j$. This implies that $(i, j)$ is either a pseudo-solution or a solution which is checked at Line 11 by the conditions $\langle r_a, \mathrm{bin}(i) \rangle \not\equiv_2 \langle r_a, \mathrm{bin}(j) \rangle$ and $z_i = z_j$.

Recall that $f$ is naturally seen as a directed graph with vertex set $[n]$ (see Figure 2 for an example), and therefore we also refer to the indices from $[n]$ as vertices. The main part of the proof of Lemma 3.2 is to lower bound the probability that the solution $(i^*, j^*)$ is found in a single iteration of the loop of Line 7. Let us consider such an iteration, and use $W_L(K)$ (mnemonic for ‘walk’) to denote the length $L$ prefix of the following sequence of vertices

$$(k_1, f(k_1), \ldots, f^{\ell_1}(k_1), k_2, f(k_2), \ldots, f^{\ell_2}(k_2), \ldots, k_s, f(k_s), \ldots, f^{\ell_s}(k_s)),$$

where $\ell_i$ is the smallest positive integer such that $z(f^{\ell_i}(k_i)) = z(f^{p}(k_j))$ and either $j = i$ and $p < \ell_i$ or $j < i$ and $p < \ell_j$. More intuitively stated, we continue with a next element of $K$ in this sequence whenever we reach an index $i$ for which we already encountered the value $z(i)$. Notice that $f^{\ell_i}(k_i)$ might either have the same $z$ value as a previous distinct vertex which means that these two vertices form a solution or a pseudo-solution, or is a repeated vertex which happens due to the hash function $h$ mapping different $z$ values to the same vertex.

[Figure 2: drawing of the directed graph on the vertices 1, ..., 8; image not reproduced.]

Figure 2: An example of the graph $G$ representing $f$ with $n = 8$, $L = \infty$, $z = (11, 7, 3, 8, 3, 4, 1, 1)$, $h(v) = (v \bmod 8) + 1$ (so $f(i) = (z_i \bmod 8) + 1$). If $K = (3, 5, 7)$ then $W_L(K) = (3, 4, 1, 4, 5, 7, 2, 8)$ (note 4 appears twice as $h$ maps both 11 and 3 to 4) and Collide(f, K, L) outputs $\{(4, \{1, 3, 5\}), (2, \{7, 8\})\}$.

Suppose that both $i^*, j^*$ occur in $W_L(K)$, $\langle r_a, \mathrm{bin}(i^*) \rangle \equiv_2 r_b$ and $\langle r_a, \mathrm{bin}(j^*) \rangle \equiv_2 1 - r_b$. Then $i^*, j^* \in f_L^*(K)$ as all vertices in $W_L(K)$ not in $f_L^*(K)$ are exactly the vertices of the path $k_\ell, f^1(k_\ell), \ldots$ that was aborted before it intersects either itself or other visited vertices, and thus the algorithm will return the solution $(i^*, j^*)$. Because of this, we now lower bound the probability of the event that both $i^*, j^*$ occur in $W_L(K)$. Call $r = (r_a, r_b)$ good if it simultaneously holds that $\langle r_a, \mathrm{bin}(i^*) \rangle \equiv_2 r_b$ and $\langle r_a, \mathrm{bin}(j^*) \rangle \equiv_2 1 - r_b$, i.e. $z_{i^*} = x_{i^*}$ and $z_{j^*} = y_{j^*}$. First we have the following easy observation.

Observation 4.1. If $i^* \ne j^*$, then $\Pr_r[r \text{ is good}] = 1/4$.

Indeed, this follows as $\Pr_r[\langle r_a, \mathrm{bin}(i^*) \rangle \not\equiv_2 \langle r_a, \mathrm{bin}(j^*) \rangle] = 1/2$ (which is observed by deferring the random choice $r_a(j)$ on a bit position $j$ where $\mathrm{bin}(i^*)$ and $\mathrm{bin}(j^*)$ differ) and conditioned on these inner products being of different parities, exactly one choice of $r_b$ leads to $r$ being good.
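The probability in Observation 4.1 can also be checked exhaustively for small $n$ by enumerating all choices of $r = (r_a, r_b)$. The sketch below does this with exact rational arithmetic; the helper names are hypothetical, and $r_a$ is encoded as an integer whose bits are the entries of the vector.

```python
from fractions import Fraction

def parity(a, b):
    """Inner product modulo 2 of the bit vectors encoded by the integers a and b."""
    return bin(a & b).count("1") % 2

def prob_good(n, i_star, j_star):
    """Exact probability over r = (r_a, r_b) that z reads x at i_star and y at j_star."""
    nbits = n.bit_length()                  # equals ceil(log2(n + 1))
    good = sum(
        1
        for ra in range(2 ** nbits)
        for rb in (0, 1)
        if parity(ra, i_star) == rb and parity(ra, j_star) == 1 - rb
    )
    return Fraction(good, 2 ** (nbits + 1))
```

For every pair of distinct indices the function returns exactly 1/4, and for $i^* = j^*$ it returns 0, which is why that case is handled separately at Line 2.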

We continue with lower bounding the probability that $i^*$ and $j^*$ occur in $W_L(K)$ given that $r$ is good. To do so, we start with an observation which allows us to replace the random string $W_L(K)$ by an easier to analyze random string. A similar connection was also used in [6, Proof of Theorem 2.1]. We use $[n]^L$ to denote the set of all strings of length $L$ over the alphabet $\{1, 2, \ldots, n\}$. For any string $\beta$ in $[n]^L$, a repetition with respect to $z$ is said to occur at position $i$ of $\beta$ if there exists $j < i$ such that $z(\beta(i)) = z(\beta(j))$. If clear from the context the ‘with respect to $z$’ part is omitted. An $s$th repetition is said to occur at position $i$ of $\beta$ if already $s - 1$ repetitions have occurred before index $i$ of $\beta$, and a repetition occurs at index $i$.

Lemma 3.3. Fix any value of $s$ and $z$. Let $\beta$ be a random string generated by cutting off a uniform sample from the set $[n]^L$ at the $s$th repetition with respect to $z$. Then strings $W_L(K)$ and $\beta$ are equally distributed. I.e. for every $\rho \in [n]^{L'}$ with $L' \le L$ we have $\Pr_{h,K}[W_L(K) = \rho] = \Pr_\beta[\beta = \rho]$.

Proof. Note we may assume that the $s$th repetition (with respect to $z$) occurred at position $L'$ in $\rho$, or $L' = L$ and at most $s$ repetitions have occurred in $\rho$, as otherwise $\Pr_{h,K}[W_L(K) = \rho] = \Pr_\beta[\beta = \rho] = 0$. Assuming $\rho$ has this property, $\Pr_\beta[\beta = \rho] = n^{-L'}$ as the only relevant event is that the first $L'$ locations of the uniform sample from $[n]^L$ used to construct $\beta$ match with $\rho$.

For notational convenience, let us denote $\alpha = W_L(K)$. Note that

$$\Pr_{h,K}[\alpha = \rho] = \prod_{i=1}^{L'} \Pr_{h,K}[\alpha_i = \rho_i \mid (\alpha_1, \ldots, \alpha_{i-1}) = (\rho_1, \ldots, \rho_{i-1})]. \quad (1)$$

Thus to finish the claim it suffices to show that, for every $i \le L'$, $\alpha_i$ is uniform over $[n]$ given $(\alpha_1, \ldots, \alpha_{i-1}) = (\rho_1, \ldots, \rho_{i-1})$. To this end, we distinguish whether $z(\alpha_{i-1})$ equals $z(\alpha_{i'})$ for some $i' < i - 1$. If so, it follows from the definition of $W_L(K) = \alpha$ that $i - 1 = \ell_j$ for some $j$ and $\alpha_i = k_{j+1}$ which is chosen uniformly at random from $[n]$ and independent from all previous outcomes. Otherwise, $\alpha_i = h(z(\alpha_{i-1}))$ and since $z(\alpha_{i-1})$ does not occur in the sequence $z(\alpha_1), \ldots, z(\alpha_{i-2})$ and $h$ is random, $h(z(\alpha_{i-1}))$ is uniform over $[n]$ and independent of all previous outcomes.

Thus, each term of the product in (1) equals $1/n$ and $\Pr_{h,K}[\alpha = \rho] = \Pr_\beta[\beta = \rho] = n^{-L'}$.

This implies that instead of analyzing the sequence $W_L(K)$ we can work with the above distribution of strings in $[n]^L$ which is easier to analyze, as we will do now:

Lemma 3.4. Fix any value of $s$ and $z$ with $p(z) \le p$. Let $\beta$ be a random string generated by cutting off a uniform sample from $[n]^L$ at the $s$th repetition. Then $\Pr_\beta[\beta \text{ contains } i^* \text{ and } j^*] \ge \Omega((L/n)^2)$.

Proof. Letting β be distributed as in the lemma statement, note that Pr β [β contains i ∗_{, j}∗ ] ≥ Pr ˜ β∈R[n]L

[ ˜β contains i∗, j∗_{∧ ˜}β has at most s repetitions],

because if the uniform sample from [n]L used to construct β contains i∗, j∗ and has at most s repetitions, β contains i∗ _{and j}∗_{. The latter displayed expression can be further lower bounded as}

≥ Pr

˜ β∈R[n]L

[ ˜β contains i∗, j∗ _{exactly once ∧ ˜}β has at most s repetitions] = Pr

˜ β∈R[n]L

[ ˜β contains i∗, j∗ exactly once] (2)

· Pr

˜ β∈R[n]L

[ ˜_{β has at most s repetitions| ˜}β contains i∗, j∗ exactly once]. (3) We may lower bound (2) as

Pr

˜ β∈R[n]L

[ ˜β contains i∗ and j∗ exactly once] = Pr

˜ β∈R[n]L [∃!ℓ1, ℓ2 ≤ L : ˜β(ℓ1) = i∗∧ ˜β(ℓ2) = j∗] = L(L − 1)(1/n)2(1 − 2/n)L−2 ≥ L(L − 1)(1/n)2(1 − 2/n)n= Ω((L/n)2). (4)

The second equality uses that for each $v \in [n]$, $\Pr[v = \tilde\beta(i)] = 1/n$ for each $i$, and the $\tilde\beta(i)$'s are independent. The second inequality uses that $L \le n$, which is implied by $s \le n^2/p$ and $p \ge n$.
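The closed form in (4) can be sanity-checked by exhaustive enumeration on tiny parameters. The following is a minimal sketch, not part of the original argument; the function names `exact_once_probability` and `formula_4` are hypothetical, and exact rational arithmetic is used to avoid rounding.

```python
from fractions import Fraction
from itertools import product


def exact_once_probability(n, L, i_star, j_star):
    """Brute force: probability that a uniform string from [n]^L contains
    i_star exactly once and j_star exactly once (i_star != j_star)."""
    hits = 0
    for s in product(range(1, n + 1), repeat=L):
        if s.count(i_star) == 1 and s.count(j_star) == 1:
            hits += 1
    return Fraction(hits, n ** L)


def formula_4(n, L):
    """The expression L(L-1)(1/n)^2 (1-2/n)^(L-2) appearing in (4)."""
    return L * (L - 1) * Fraction(1, n) ** 2 * (1 - Fraction(2, n)) ** (L - 2)
```

The enumeration is only feasible for very small $n$ and $L$, since it visits all $n^L$ strings.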

We continue by lower bounding (3). To this end, let $\beta'$ be the string obtained after removing the unique $\ell_1$ and $\ell_2$ indices from $\tilde\beta$ which contain $i^*$ and $j^*$ as values respectively. Note that $\beta'$ is uniformly distributed over $([n] \setminus \{i^*, j^*\})^{L-2}$, since we condition on $i^*$ and $j^*$ being contained exactly once in $\tilde\beta$. Let $q$ and $q'$ be the number of repetitions in $\tilde\beta$ and $\beta'$ respectively. Note that $q - q' = 1$ if $z(\beta'(k)) \ne z(i^*)$ for every $k \in [L-2]$, because of the repetition in $\tilde\beta$ occurring at $\ell_2$ (assuming $\ell_1 < \ell_2$ for simplicity). Otherwise, $q - q' = 2$ because of the repetition in $\tilde\beta$ occurring at $\ell_2$ and the repetition in $\tilde\beta$ occurring at $\ell_1$ if $z(i^*) = z(\tilde\beta(k))$ for $k < \ell_1$, or otherwise at $k$ if $z(i^*) = z(\tilde\beta(k))$ for $k \ge \ell_1$. Therefore we have that

\begin{align*}
E_{\tilde\beta \in_R [n]^L}[q] &= 1 + \Pr_{\beta'}[\exists k \in [L-2] : z(\beta'(k)) = z(i^*)] + E_{\beta'}[q'] \\
&\le 1 + (L-2)\frac{|z^{-1}(z(i^*))|}{n-2} + E_{\beta'}[q'] \\
&\le 1 + L\sqrt{p}/n + E_{\beta'}[q'] && \text{using } |z^{-1}(z(i^*))| \le \sqrt{p},\ \tfrac{L-2}{n-2} \le \tfrac{L}{n} \\
&\le 1 + s/2 + E_{\beta'}[q'] && \text{using } L = \tfrac{1}{2} n \sqrt{s/p},
\end{align*}


where $\beta' \in_R ([n] \setminus \{i^*, j^*\})^{L-2}$. Since the number of repetitions is always less than the number of collisions (i.e. unordered pairs of distinct indices in which the string has equal values), we have the upper bound

\begin{align*}
E_{\beta'}[q'] &\le \sum_{i<j} \Pr_{\beta'}[z(\beta'(i)) = z(\beta'(j))] \\
&\le \binom{L}{2} \left( \sum_{v=1}^{m} |x^{-1}(v)|^2 + |y^{-1}(v)|^2 + |x^{-1}(v)||y^{-1}(v)| \right) / (n-2)^2 \\
&\le \binom{L}{2} 2p/(n-2)^2 \le s/3,
\end{align*}

where we use the AM-GM inequality in the second inequality to obtain the upper bound $|x^{-1}(v)||y^{-1}(v)| \le |x^{-1}(v)|^2 + |y^{-1}(v)|^2$. In the last inequality we assume $n$ is sufficiently large and use $\binom{L}{2} \le L^2/2$. Thus $E_{\tilde\beta}[q] \le 1 + 5s/6$, and we can use Markov's inequality to upper bound $\Pr[q > s] = \Pr[q \ge s+1] \le \frac{1 + 5s/6}{s+1} \le 11/12$. This lower bounds (3) by $1/12$ and therefore the claim follows.
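The final Markov step relies on $\frac{1+5s/6}{s+1} \le 11/12$ for every integer $s \ge 1$, with equality at $s = 1$. A small sketch (the helper name `markov_bound` is hypothetical) confirms this on a range of values using exact rationals:

```python
from fractions import Fraction


def markov_bound(s):
    """The Markov upper bound (1 + 5s/6) / (s + 1) on Pr[q >= s + 1]."""
    return (1 + Fraction(5 * s, 6)) / (s + 1)
```

The bound is decreasing in $s$ and tends to $5/6$, so the constant $11/12$ is dictated by the case $s = 1$.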

Proof of Lemma 3.2. The case $i^* = j^*$ is checked at Line 2. So we may assume $i^* \ne j^*$. By Observation 4.1, r is good with probability 1/4. In the remainder of this proof, we fix such a good r (and thus z is also fixed as it is determined by r).

We first analyze the running time performance of a single iteration of the loop from Line 7. Line 8 takes O(L) time and O(s log n) space as s ≤ L, which follows from the requirement of s being at most $n^2/p$. For Line 9, recall that Collide(f, K, L) takes Õ(L) time and O(s log n) space. Lines 10 and 11 in the algorithm take Õ(L) time and O(s log n) space as in Line 11 we can iterate over all elements of X and quickly check whether a new element is in a solution by storing all considered elements of X in a lookup table. Thus a single iteration of the while loop from Line 7 uses Õ(L) time and O(s log n) space.

As Lines 1 to 6 clearly do not form any bottleneck, it remains to upper bound the expected number of iterations of the while loop before a solution is found.

To this end, recall that a solution $(i^*, j^*)$ is found if $i^*$ and $j^*$ are in $W_L(K)$. The probability that this happens is equal to the probability that $i^*, j^*$ are in a sequence $\beta$ generated by cutting off a uniform sample from the set $[n]^L$ at the $s$th repetition by Lemma 3.3. Applying Lemma 3.4 we can lower bound this probability with $\Omega((L/n)^2)$. This implies that

\[ E_{r,h}\big[\Pr_{K}[\mathrm{Collide}(f, K, L) \text{ detects } (i^*, j^*)]\big] \ge \Omega((L/n)^2), \]

and therefore the expected number of iterations of the loop at Line 7 until $(i^*, j^*)$ is found is $O(n^2/L^2)$. Since every iteration takes $\tilde O(L)$ time, the expected running time will be $\tilde O(n^2/L)$, which is $\tilde O(n\sqrt{p/s})$.

We would like to mention that in the proof of Lemma 3.4, upper bounding the number of repetitions by the number of collisions seems rather crude, but we found that tightening this step did not lead to a considerable improvement of an easy $\tilde O(n^2/s)$ time $O(s \log n)$ space algorithm for


4 Subset Sum, Knapsack and Binary Linear Programming

The main goal of this section is to prove Theorem 1, which we restate here for convenience.

Theorem 1 (restated). There are Monte Carlo algorithms solving Subset Sum and Knapsack using $O^*(2^{0.86n})$ time and polynomial space. The algorithms assume random read-only access to random bits.

We first focus on Subset Sum, and discuss Knapsack later. Let $w_1, \dots, w_n$ be the integers in the Subset Sum instance. Throughout this section we assume that $n$ is even, by simply defining $w_{n+1} = 0$ if $n$ is odd. Let $w$ be the weight vector $(w_1, \dots, w_n)$.

To prove the theorem for Subset Sum, we use a 'win-win' approach based on the number of all possible distinct sums $|w(2^{[n]})| = |\{\langle w, x \rangle : x \in \{0,1\}^n\}|$ generated by $w$. Specifically, if $|w(2^{[n]})|$ is sufficiently large, we prove that the number of pairs of subsets with the same sum, or more formally

\[ |\{(x, y) \in \{0,1\}^n \times \{0,1\}^n : \langle w, x \rangle = \langle w, y \rangle\}|, \]

cannot be too large (see Lemma 4.2 below for a more precise statement). This is done by showing a smoothness property of the distribution of subset sums. The proof of this smoothness property builds for a large part on ideas by Austrin et al. [4] in the context of exponential space algorithms for Subset Sum. Then we use the Meet-in-the-Middle approach with a random split $L, R$ of $[n]$, to reduce the Subset Sum instance to an instance of List Disjointness with $2^{n/2}$-dimensional vectors $x$ and $y$ whose entries are the sums of the integers from $w$ indexed by subsets $L$ and $R$, respectively. We apply Theorem 3 to solve this instance of List Disjointness. The crux is that, assuming $|w(2^{[n]})|$ to be large, the smoothness property implies we may set the parameter $p$ sufficiently small. On the other hand, if $|w(2^{[n]})|$ is sufficiently small, known techniques can be applied to solve the instance in $O^*(|w(2^{[n]})|)$ time.

The result for Knapsack and Binary Integer Programming is subsequently obtained via a reduction by Nederlof et al. [26].

On the smoothness of the distribution of subset sums. The required smoothness property is a quite direct consequence of the following more general bound, which is independent of the subset sum setting. Let us stress here that the complete proof of the smoothness property is quite similar to [5, Proposition 4.4], which in turn was inspired by previous work on bounding sizes of Uniquely Decodable Code Pairs (e.g. [35]). However, our presentation will be entirely self-contained.

Lemma 4.1. Let $d \le n$ be a positive integer. Let $A \subseteq \{0,1\}^n$ and $B \subseteq \{-1,0,1\}^n$ be collections satisfying: (i) $|\mathrm{supp}(b)| = d$ for every $b \in B$, i.e. any $b \in B$ has $d$ non-zero entries and, (ii) for every $a, a' \in A$ and $b, b' \in B$ the following holds:

\[ a + b = a' + b' \text{ implies that } (a, b) = (a', b'). \tag{5} \]

Then $|A||B| \le 2^n \binom{n}{\lceil d/2 \rceil} \mathrm{poly}(n)$.

Proof. Note we may assume $d$ and $n$ are even since we can otherwise increase $n$ to $n + 1$ and set $a_{n+1} = b_{n+1} = 1$ (if we want to increase $d$) or $a_{n+1} = b_{n+1} = 0$ (if we want to increase $n$) for every $b \in B$ and $a \in A$. The increases of $n$ and $d$ are subsumed by the $\mathrm{poly}(n)$ factor.

The idea of the proof will be to give a short encoding of the pairs $(a, b) \in A \times B$ using the above properties. We use the standard sumset notations $A + B = \{a + b : a \in A, b \in B\}$ and $A - B = \{a - b : a \in A, b \in B\}$, where the additions of the vectors are in $\mathbb{Z}^n$. Note that (5) implies that $|A + B| = |A||B| = |A - B|$, as $a - b = a' - b'$ implies $a + b' = a' + b$.

For a vector $a \in A$, recall that $a^{-1}(0)$ denotes the set of indices $i$ in $[n]$ with $a(i) = 0$. Similarly, define the sets $a^{-1}(1)$ and $b^{-1}(u)$ for $u \in \{-1, 0, 1\}$. For a pair $(a, b) \in A \times B$, let us define its signature $x(a, b)$ as the vector

\[ x(a, b) = x = (x_{0,-1}, x_{0,0}, x_{0,1}, x_{1,-1}, x_{1,0}, x_{1,1}) \in \mathbb{N}^6, \]

where $x_{u,v} = |a^{-1}(u) \cap b^{-1}(v)|$ for each $u \in \{0, 1\}$ and each $v \in \{-1, 0, 1\}$. Note that $x$ can take at most $(n+1)^6$ possible values.

For a fixed signature vector $x$, define $P_x$ as the set of all pairs $(a, b) \in A \times B$ with signature $x(a, b) = x$. So, if $(a, b) \in P_x$, $x$ indicates for each combination of possible values the number of indices in which this combination occurs in the pair $(a, b)$. For a vector $c \in \mathbb{Z}^n$, we let $\mathrm{ODD}(c) \subseteq [n]$ denote the set of indices $i$ such that $c_i$ is odd. Letting $x_{\mathrm{odd}} = x_{0,-1} + x_{0,1} + x_{1,0}$, note that for $(a, b) \in P_x$, we thus have that $|\mathrm{ODD}(a + b)| = x_{\mathrm{odd}}$.

To bound the number of pairs $(a, b)$, we will bound $P_x$ for each $x$. As $(a, b) \in A \times B$ is determined by $a + b$, it suffices to bound the number of sums $a + b$. Fix some $(a, b) \in P_x$. A crucial observation is that $a + b$ has entries from $\{-1, 0, 1, 2\}$, and a $2$ precisely occurs in the indices $a^{-1}(1) \cap b^{-1}(1)$ and a $-1$ precisely occurs at the indices in $a^{-1}(0) \cap b^{-1}(-1)$. So $a + b$ can be completely determined by specifying the triple

\[ \big(\mathrm{ODD}(a + b),\ a^{-1}(0) \cap b^{-1}(-1),\ a^{-1}(1) \cap b^{-1}(1)\big). \]

Thus, we may bound $P_x$ by counting the number of possibilities of such triples to obtain

\[ |P_x| \le \binom{n}{x_{\mathrm{odd}}} \binom{x_{\mathrm{odd}}}{x_{0,-1}} \binom{n - x_{\mathrm{odd}}}{x_{1,1}} = \frac{n!}{(x_{0,-1})!\,(x_{\mathrm{odd}} - x_{0,-1})!\,(x_{1,1})!\,(n - x_{\mathrm{odd}} - x_{1,1})!} = \binom{n}{x_{0,-1},\ x_{\mathrm{odd}} - x_{0,-1},\ x_{1,1},\ n - x_{\mathrm{odd}} - x_{1,1}}. \tag{6} \]

Similarly, since $(a, b) \in A \times B$ is also determined by $a - b$, which is in turn determined by the triple

\[ \big(\mathrm{ODD}(a + b),\ a^{-1}(0) \cap b^{-1}(1),\ a^{-1}(1) \cap b^{-1}(-1)\big), \]

we may also bound $P_x$ by counting the number of possibilities of these triples to obtain

\[ |P_x| \le \binom{n}{x_{\mathrm{odd}}} \binom{x_{\mathrm{odd}}}{x_{0,1}} \binom{n - x_{\mathrm{odd}}}{x_{1,-1}} = \binom{n}{x_{0,1},\ x_{\mathrm{odd}} - x_{0,1},\ x_{1,-1},\ n - x_{\mathrm{odd}} - x_{1,-1}}. \tag{7} \]

Note that (6) and (7) are equivalent modulo interchanging $x_{0,-1}$ with $x_{0,1}$ and $x_{1,1}$ with $x_{1,-1}$ (which is natural as $A - B = A + (-B)$). We know that $x_{0,-1} + x_{1,1} + x_{0,1} + x_{1,-1} = d$, and


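second case we obtain

\[ |P_x| \le \max_{d_1 + d_2 \le d/2} \binom{n}{d_1,\ x_{\mathrm{odd}} - d_1,\ d_2,\ n - x_{\mathrm{odd}} - d_2} = \max_{d' \le d/2}\ \max_{d_1 + d_2 = d'} \binom{n}{d_1,\ x_{\mathrm{odd}} - d_1,\ d_2,\ n - x_{\mathrm{odd}} - d_2}. \]

For any $u, v$ with $u + v = s$, the term $u!\,v!$ is minimized when $u = \lfloor s/2 \rfloor$ and $v = \lceil s/2 \rceil$. Applying this to the term above, once with $u = d_1$ and $v = d_2$ and once with $u = x_{\mathrm{odd}} - d_1$ and $v = n - x_{\mathrm{odd}} - d_2$, we obtain that

\begin{align*}
|P_x| &\le \max_{d' \le d/2} \binom{n}{\lceil d'/2 \rceil,\ \lfloor (n - d')/2 \rfloor,\ \lfloor d'/2 \rfloor,\ \lceil (n - d')/2 \rceil} \\
&\le \max_{d' \le d/2} 2^n \binom{n/2}{\lceil d'/2 \rceil} \binom{n/2}{\lfloor d'/2 \rfloor} \\
&\le \max_{d' \le d/2} 2^n \binom{n}{d'} \le 2^n \binom{n}{d/2}.
\end{align*}

The last step uses that $\binom{n}{u}\binom{n}{v} \le \binom{2n}{u+v}$ for any $u, v$ and that $d/2 \le n/2$, and hence the maximum is attained at $d' = d/2$. The lemma now follows directly from this bound as $|A||B| \le \sum_x |P_x| \le 2^n \binom{n}{d/2}(n+1)^6$.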

Now we use Lemma 4.1 to obtain the promised smoothness property of the distribution of subset sums.

Lemma 4.2. Let $w = (w_1, \dots, w_n)$ be integers and $d \le n$ be a positive integer. Denote $C_d = \{x \in \{-1, 0, 1\}^n : \langle w, x \rangle = 0 \wedge |\mathrm{supp}(x)| = d\}$. Then $|w(2^{[n]})| \cdot |C_d| \le 2^n \binom{n}{d/2} \mathrm{poly}(n)$.

Proof. Let $A \subseteq \{0,1\}^n$ be a set of vectors such that $\langle w, a \rangle = \langle w, a' \rangle$ implies $a = a'$ for every $a, a' \in A$. Note that such an $A$ satisfying $|A| = |w(2^{[n]})|$ can be found by picking one representative from the set $\{x \in \{0,1\}^n : \langle w, x \rangle = s\}$ for each $s \in w(2^{[n]})$. We apply Lemma 4.1 with $B = C_d$. It remains to show that this pair $(A, B)$ satisfies (5). To this end, note that if $a + b = a' + b'$, then

\[ \langle a, w \rangle = \langle a, w \rangle + \langle b, w \rangle = \langle a + b, w \rangle = \langle a' + b', w \rangle = \langle a', w \rangle + \langle b', w \rangle = \langle a', w \rangle, \]

using linearity of inner product and $\langle b, w \rangle = \langle b', w \rangle = 0$ by definition of $B$. Therefore, $a = a'$ by definition of $A$ and since $a + b = a' + b'$ it follows that $b = b'$.
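The construction in this proof can be replayed by brute force on a toy weight vector. The sketch below is illustrative only: the function names are hypothetical, and the enumeration over $\{0,1\}^n$ and $\{-1,0,1\}^n$ is exponential, so it is meant for very small $n$. It builds the representative set $A$, the set $C_d$, and checks property (5) in the equivalent form $|A + C_d| = |A| \cdot |C_d|$.

```python
from itertools import product


def inner(w, x):
    """Inner product <w, x> of two integer vectors."""
    return sum(wi * xi for wi, xi in zip(w, x))


def representatives(w):
    """One 0/1 vector per distinct subset sum: the set A from the proof."""
    reps = {}
    for a in product((0, 1), repeat=len(w)):
        reps.setdefault(inner(w, a), a)
    return list(reps.values())


def zero_sum_signed(w, d):
    """C_d: vectors in {-1,0,1}^n with <w, x> = 0 and exactly d non-zeros."""
    return [x for x in product((-1, 0, 1), repeat=len(w))
            if inner(w, x) == 0 and sum(1 for xi in x if xi != 0) == d]


def sumset_is_injective(A, B):
    """Property (5): the sum a + b determines the pair (a, b)."""
    sums = {tuple(ai + bi for ai, bi in zip(a, b)) for a in A for b in B}
    return len(sums) == len(A) * len(B)
```

For instance, with $w = (1, 2, 3, 4, 5, 6)$ the set $A$ has one vector for each of the 22 achievable sums $0, \dots, 21$.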

Now we are fully equipped to prove the first part of the main theorem of this section:

Proof of Theorem 1(a). The algorithm is as follows.


Algorithm SSS(w, t, h)
Assumes random access to a random function $h : [\sum_{i=1}^{n} w_i] \to \{0,1\}^{n/2}$.

1: Run a polynomial space, $O^*(2^{0.86n})$ time algorithm that assumes $|w(2^{[n]})| \le O^*(2^{0.86n})$ from e.g. [20, Theorem 1(a)] or [4].
2: Let $(L, R)$ be a random partition of $[n]$ with $|L| = n/2$ and $|R| = n/2$.
3: Let $x$ be the list $(\sum_{e \in X} w_e)_{X \subseteq L}$ of length $2^{n/2}$.
4: Let $y$ be the list $(t - \sum_{e \in Y} w_e)_{Y \subseteq R}$ of length $2^{n/2}$.
5: Run LD$(x, y, 1, O^*(2^{0.72n}))$ and cut off the running time after $O^*(2^{0.86n})$ time if a solution was still not found.
6: if so far no solution was found then return NO else return YES

Figure 3: Applying the collision search technique for list disjointness.
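Lines 2 to 4 of the algorithm can be illustrated with a short sketch. It materializes both lists and intersects them with a hash set, which uses exponential space; this is a deliberate simplification standing in for the low-space List Disjointness routine LD, not an implementation of it. The names `subset_sums`, `split_lists` and `has_common_value` are hypothetical.

```python
import random


def subset_sums(w, idx):
    """All 2^|idx| sums of sub-collections of w restricted to the indices idx."""
    sums = [0]
    for i in idx:
        sums += [s + w[i] for s in sums]
    return sums


def split_lists(w, t, rng):
    """Random balanced split (L, R) of the indices, then the lists x and y
    from Lines 3 and 4: x holds sums over L, y holds t minus sums over R."""
    n = len(w)
    perm = list(range(n))
    rng.shuffle(perm)
    left, right = perm[: n // 2], perm[n // 2:]
    x = subset_sums(w, left)
    y = [t - s for s in subset_sums(w, right)]
    return x, y


def has_common_value(x, y):
    """Stand-in for LD: a common value x(X) = y(Y) means w(X) + w(Y) = t.
    Uses a set, hence exponential space, unlike the routine in the paper."""
    return not set(x).isdisjoint(y)
```

Whatever split is drawn, a common value exists exactly when some subset of the weights sums to $t$, since every subset decomposes uniquely into its parts inside $L$ and $R$.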

Here, the algorithm for Line 1 is implemented by hashing all integers of the Subset Sum instance modulo a prime of order $O^*(2^{0.86n})$ and running the algorithm by Lokshtanov and Nederlof [25] (as already suggested and used in [20, 4]). By checking whether a correct solution has been found by self reduction and returning NO if not, we can assume the algorithm has no false positives and false negatives with constant probability assuming $|w(2^{[n]})| \le O^*(2^{0.86n})$.

We continue by analyzing this algorithm. First, it is clear that this algorithm runs in $O^*(2^{0.86n})$ time. For correctness, note that the algorithm never returns false positives as the algorithms invoked on Lines 1 and 5 also have this property. Thus it remains to upper bound the probability of false negatives. Suppose a solution exists. If $|w(2^{[n]})| \le O^*(2^{0.86n})$, Line 1 finds a solution with constant probability, so suppose this is not the case. As in Lemma 4.2, denote

\[ C_d = \{x \in \{-1, 0, 1\}^n : \langle w, x \rangle = 0 \wedge |\mathrm{supp}(x)| = d\}. \]

Then we know by Lemma 4.2 that $2^{0.86n} |C_d| \le O^*\big(2^n \binom{n}{\lceil d/2 \rceil}\big)$, so $|C_d| \le O^*\big(2^{0.14n} \binom{n}{\lceil d/2 \rceil}\big)$. Let $w_L, w_R$ denote the restrictions of the vector $w$ to all indices from $L$ and $R$ respectively. Let us further denote

\begin{align*}
P_L &= \{(x, y) \in \{0,1\}^L \times \{0,1\}^L : \langle w_L, x \rangle = \langle w_L, y \rangle\}, \\
P_R &= \{(x, y) \in \{0,1\}^R \times \{0,1\}^R : \langle w_R, x \rangle = \langle w_R, y \rangle\}, \\
p &= \max\{|P_L|, |P_R|\}.
\end{align*}

As $|P_L|$ and $|P_R|$ are exactly the number of pseudo-solutions of the List Disjointness instance $(x, y)$, it remains to show that $p \le O^*(2^{0.72n})$ with constant probability. In particular, we will show that $|P_L| \le O^*(2^{0.72n})$ with probability at least 3/4. As $|P_L|$ and $|P_R|$ are identically distributed, a union bound shows that $p \le O^*(2^{0.72n})$ with probability at least 1/2. This suffices for our purposes as then the running time of the List Disjointness algorithm is $O^*(2^{n/2}\sqrt{p}) = O^*(2^{0.86n})$.

Note that elements of $C_d$ may contribute to $P_L$, but only if their support is not split, i.e. the support is a subset of either $L$ or $R$. Let us introduce the following notation for elements of $C_d$ that contribute to $p$:

\[ C_d^L = \{x \in C_d : \mathrm{supp}(x) \subseteq L\}. \]

Suppose that $(x, y) \in P_L$. Then $\langle w_L, x - y \rangle = 0$ and therefore $x - y \in C_d^L$. Now suppose


such that $v = x - y$, as for each index $i \in L$ with $v_i = 1$ we have $x_i = 1$ and $y_i = 0$, for each index $i \in L$ with $v_i = -1$ we have $x_i = 0$ and $y_i = 1$, and for each $i \in L$ with $v_i = 0$, either $x_i = y_i = 0$ or $x_i = y_i = 1$. Therefore we have the upper bound

\[ |P_L| \le \sum_{d=0}^{n/2} |C_d^L|\, 2^{n/2 - d}. \tag{8} \]
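The factor $2^{n/2-d}$ in (8) counts the pairs $(x, y)$ of 0/1 vectors with a prescribed difference $v$ of support size $d$: the non-zero entries of $v$ force $x_i$ and $y_i$, and each zero entry leaves two choices. A brute-force sketch (the function name is hypothetical, and `m` plays the role of $n/2$) makes this concrete:

```python
from itertools import product


def count_pairs_with_difference(v):
    """Number of pairs (x, y) of 0/1 vectors of length len(v) with x - y = v."""
    m = len(v)
    return sum(
        1
        for x in product((0, 1), repeat=m)
        for y in product((0, 1), repeat=m)
        if all(xi - yi == vi for xi, yi, vi in zip(x, y, v))
    )
```

With $m = 5$ and two non-zero entries in $v$, the count is $2^{5-2} = 8$.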

If $v \in C_d$, for a random split $(L, R)$ as picked in Line 2 we see that

\[ \Pr[v \in C_d^L] = \Pr[\mathrm{supp}(v) \subseteq L] = \binom{n - d}{n/2 - d} \Big/ \binom{n}{n/2}. \tag{9} \]
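Equation (9) can be confirmed on small cases by enumerating every balanced split. This is a hedged sketch with hypothetical names; it assumes $n$ is even and $d \le n/2$, and uses exact fractions.

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def support_in_left_probability(n, support):
    """Brute force over all subsets L of size n/2: the fraction containing
    the given support (a collection of indices from range(n))."""
    target = set(support)
    total = hits = 0
    for left in combinations(range(n), n // 2):
        total += 1
        if target <= set(left):
            hits += 1
    return Fraction(hits, total)


def split_formula(n, d):
    """The right-hand side of (9); requires d <= n/2."""
    return Fraction(comb(n - d, n // 2 - d), comb(n, n // 2))
```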

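Now we can combine all the work to bound the expectation of $|P_L|$ over the random split as

\begin{align*}
E[|P_L|] &\le \sum_{d=0}^{n/2} E[|C_d^L|]\, 2^{n/2-d} && \text{using (8)} \\
&\le \sum_{d=0}^{n/2} \sum_{v \in C_d} \Pr[v \in C_d^L]\, 2^{n/2-d} \\
&= \sum_{d=0}^{n/2} |C_d| \binom{n-d}{n/2-d} \Big/ \binom{n}{n/2}\; 2^{n/2-d} && \text{using (9)} \\
&= O^*\left( \sum_{d=0}^{n/2} 2^{0.14n} \binom{n}{\lceil d/2 \rceil} \binom{n-d}{n/2-d} \Big/ \binom{n}{n/2}\; 2^{n/2-d} \right) && \text{by Lemma 4.2 and } |w(2^{[n]})| \ge 2^{0.86n} \\
&= O^*\left( 2^{-0.36n} \sum_{d=0}^{n/2} \binom{n}{\lceil d/2 \rceil} \binom{n-d}{n/2-d} \Big/ 2^{d} \right), && \text{using } \binom{n}{n/2} \ge 2^n/n.
\end{align*}

Omitting polynomial terms, taking logs, rewriting using $\log_2 \binom{b}{a} = b \cdot h(a/b)$ where $h(q) = -q \log_2 q - (1-q)\log_2(1-q)$ is the binary entropy function, and denoting $\delta = d/n$, this reduces to

\[ \left( -0.36 + \max_{0 \le \delta \le 1/2} \left[ h(\delta/2) + h\!\left(\frac{1/2 - \delta}{1 - \delta}\right)(1 - \delta) - \delta \right] \right) n. \]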

Note that here we are allowed to replace the summation by a max as we suppress factors polynomial in $n$. By a direct Mathematica computation this term is upper bounded by $0.72n$ (where the maximum is attained for $\delta \approx 0.0953$), which implies that $E[|P_L|] \le O^*(2^{0.72n})$ as required.
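The numerical maximization can be reproduced without Mathematica. The sketch below evaluates the exponent on a uniform grid over $\delta \in [0, 1/2]$; the grid search and the function names are assumptions of this illustration, not the computation used by the authors. Since the expression is concave in $\delta$, a fine grid locates the maximizer to within one grid step.

```python
from math import log2


def h(q):
    """Binary entropy function, with h(0) = h(1) = 0."""
    if q <= 0.0 or q >= 1.0:
        return 0.0
    return -q * log2(q) - (1 - q) * log2(1 - q)


def exponent(delta):
    """The coefficient of n: -0.36 + h(d/2) + h((1/2 - d)/(1 - d))(1 - d) - d."""
    return (-0.36 + h(delta / 2)
            + h((0.5 - delta) / (1 - delta)) * (1 - delta) - delta)


def maximize(steps=50000):
    """Grid search over delta in [0, 1/2]; returns (argmax, max value)."""
    best_delta, best_value = 0.0, exponent(0.0)
    for k in range(1, steps + 1):
        delta = 0.5 * k / steps
        value = exponent(delta)
        if value > best_value:
            best_delta, best_value = delta, value
    return best_delta, best_value
```

Calling `maximize()` returns a maximizer close to $0.0953$ and a maximum value slightly below $0.72$, in line with the bound stated above.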

Knapsack and Binary Linear Programming. Now Theorem 1(b) follows from Theorem 1(a) by the following reduction:

Lemma 4.3 ([26], Theorem 2). If there exists an algorithm that decides the Subset Sum problem in $O^*(t(n))$ time and $O^*(s(n))$ space then there exists an algorithm that decides the Knapsack problem in $O^*(t(n))$ time and $O^*(s(n))$ space.


We would like to remark that the techniques from Section 3 do not seem to be directly applicable to the knapsack problem, so the reduction of [26] seems necessary.

Using the methods behind the proof of Lemma 4.3, we can also obtain an algorithm for the Binary Linear Programming problem. In particular, we use the following result:

Lemma 4.4. Let $U$ be a set of cardinality $n$, let $\omega : U \to \{-N, \dots, N\}$ be a weight function, and let $l < u$ be integers. Then there is a polynomial-time algorithm that returns a set of pairs $\Omega = \{(\omega_1, t_1), \dots, (\omega_K, t_K)\}$ with $\omega_i : U \to \{-N, \dots, N\}$ and integers $t_1, \dots, t_K \in [-N, N]$ such that (1) $K$ is $O(n \lg(nN))$, and (2) for every set $X \subseteq U$ it holds that $\omega(X) \in [l, u]$ if and only if there exists an index $i$ such that $\omega_i(X) = t_i$.

This result is a small extension of Theorem 1 by Nederlof et al. [26] (assuming u − l ≤ nN), but the same proof from [26] works for this extension (the polynomial time algorithm is a direct recursive algorithm using rounding of integers, refer to [26] for details).

Corollary 2 (restated). There is a Monte Carlo algorithm solving Binary Integer Programming instances with maximum absolute integer value $m$ and $d$ constraints in time $O^*(2^{0.86n}(\log(mn)n)^{O(d)})$ and polynomial space. The algorithm assumes random read-only access to random bits.

Proof. Using binary search, we can reduce the optimization variant to a decision variant that asks for an $x \in \{0,1\}^n$ such that $\langle a^j, x \rangle \in [l_j, u_j]$ for $j = 1, \dots, d+1$. Using Lemma 4.4, we can reduce this problem to $O(\log(m)n)$ instances on the same number of variables and constraints but with $l_1 = u_1$. Using the same reduction on each single obtained instance for $j = 2, \dots, d$ we obtain $(O(\log(m)n))^d$ instances on $n$ variables and $m$ constraints with $l_j = u_j$ for every $j$, and these instances can be easily enumerated with linear delay and polynomial space. An instance satisfying $l_j = u_j$ for every $j$ can in turn be reduced to a Subset Sum problem on $n$ integers in a standard way by setting $w_i = \sum_{j=1}^{d} a_i^j B^j$ and $t = \sum_{j=1}^{d} l_j B^j$, where $B \ge mn$ is a power of two. This reduction clearly preserves whether the instance is a YES-instance as $B \ge mn$ prevents interaction between different segments of the bit-strings when integers are added.
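The final packing step can be sketched as follows. This illustration makes simplifying assumptions that are not in the text: all coefficients $a_i^j$ and targets $l_j$ are non-negative, $B$ is chosen strictly larger than $mn$, and each $l_j < B$, so that every constraint occupies its own base-$B$ digit without carries. The function names are hypothetical, and the brute-force solvers exist only to compare the two solution sets.

```python
from itertools import product


def pack_constraints(a, l, B):
    """Pack d equality constraints <a[j], x> = l[j] into one Subset Sum
    instance: w_i = sum_j a[j][i] * B^j and t = sum_j l[j] * B^j, j = 1..d."""
    d, n = len(a), len(a[0])
    w = [sum(a[j][i] * B ** (j + 1) for j in range(d)) for i in range(n)]
    t = sum(l[j] * B ** (j + 1) for j in range(d))
    return w, t


def solutions_direct(a, l):
    """All 0/1 vectors satisfying every equality constraint (brute force)."""
    n = len(a[0])
    return [x for x in product((0, 1), repeat=n)
            if all(sum(row[i] * x[i] for i in range(n)) == lj
                   for row, lj in zip(a, l))]


def solutions_packed(w, t):
    """All 0/1 vectors x with <w, x> = t (brute force)."""
    return [x for x in product((0, 1), repeat=len(w))
            if sum(wi * xi for wi, xi in zip(w, x)) == t]
```

For example, with `a = [[1, 2, 3], [2, 0, 1]]`, `l = [3, 2]` and `B = 16`, both solvers return only the vector `(1, 1, 0)`.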

5 Random k-Sum

We start by proving Theorem 4 which we restate here for convenience and afterwards present the faster algorithm mentioned in Subsection 1.1 that assumes random read-only access to random bits.

Theorem 4 (restated). There is a randomized algorithm that given a constant $k \ge 2$, $k$ uniformly random vectors $w^1, \dots, w^k \in_R [m]^n$, with $n \le m \le \mathrm{poly}(n)$ and $m$ being a multiple of $n$, and a target $t$ (that may be adversarially chosen depending on the lists) finds $s_i \in w^i$ satisfying $\sum_{i=1}^{k} s_i = t$ if they exist with constant probability using $\tilde O(n^{k-0.5})$ time and $O(\log n)$ space.

Proof. Let us first consider the case $k = 2$. We will apply the List Disjointness algorithm from Section 3 with $s = 1$ (so $K = \{k_1\}$), $x = w^1$, $y = w^2$ and $p = \Theta(n)$ (the latter is justified by an easy computation of $E[p(w^1, w^2)]$; see footnote 5). Recall that the List Disjointness algorithm assumes random read-only access to a random function $h : [m] \to [n]$, but here we do not use this assumption and show that we can leverage the input randomness to take

\[ h(v) = (v \bmod n) + 1. \]

Footnote 5. Fix $v \in [m]$. Let $X_i = 1$ if $w^1(i) = v$ and $0$ otherwise. Then $E[|(w^1)^{-1}(v)|^2] = \sum_i E[X_i] + 2\sum_{i<j} E[X_i X_j] = \frac{n}{m} + 2\binom{n}{2}\frac{1}{m^2}$. Thus $E[p(w^1, w^2)] = \sum_v E[|(w^1)^{-1}(v)|^2] + \sum_v E[|(w^2)^{-1}(v)|^2] = 2n + 4\binom{n}{2}\frac{1}{m} = \Theta(n)$.


Note that since $m$ is a multiple of $n$, we have that if $v \in_R [m]$, then both $h(v)$ and $h(t - v)$ are uniformly distributed over $[n]$. To analyze this adjusted algorithm, we reuse most of the proof of Lemma 3.2. To facilitate this, we show that within one run of the loop of Line 7 the following variant of Lemma 3.3 holds, where $W_L(\{k_1\})$ is as defined in Section 3 (but with the new function $h$ as defined above).
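The uniformity claim for $h(v) = (v \bmod n) + 1$ is easy to check by counting preimages. In the sketch below the helper names are hypothetical, and one detail is an assumption of this illustration: $t - v$ may be non-positive, in which case the residue is taken in $\{0, \dots, n-1\}$, which is what Python's `%` operator returns for a positive modulus.

```python
from collections import Counter


def h(v, n):
    """The hash h(v) = (v mod n) + 1, mapping integers to [n] = {1, ..., n}."""
    return (v % n) + 1


def preimage_counts(m, n, t=None):
    """Count how often each value of [n] is hit by h(v) (or by h(t - v) when
    a target t is given) as v ranges over [m] = {1, ..., m}."""
    if t is None:
        return Counter(h(v, n) for v in range(1, m + 1))
    return Counter(h(t - v, n) for v in range(1, m + 1))
```

When $n$ divides $m$, every value in $[n]$ is hit exactly $m/n$ times in both cases; when it does not, the counts differ, which is why the theorem requires $m$ to be a multiple of $n$.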

Lemma 5.1. Fix any value of integers $i^*, j^* \in [n]$ and $r$ such that $r$ is good. Let $x, y$ be random strings from $[m]^n$ such that $x(i^*) = y(j^*)$. Let $z' \in [m]^n$ be the vector with $z'(i) = i$ if $i \notin \{i^*, j^*\}$ and $z'(i) = n + 1$ otherwise. Let $\beta$ be a random string which is generated by cutting off a uniform sample from the set $[n]^L$ at the first repetition with respect to $z'$. Then for every string $\rho$ containing $i^*$ and $j^*$ it holds that $\Pr_{k_1, x, y}[W_L(\{k_1\}) = \rho] \ge \Pr_{\beta}[\beta = \rho]$.

Intuitively, this lemma states that random instances in which $i^*$ and $j^*$ form a solution behave similarly as the algorithm from Lemma 3.2 with the specific vector $z'$ in which $i^*, j^*$ are the only indices with common values.

Proof. We may assume $\rho(i) \ne \rho(j)$ for every $i \ne j$ as otherwise $\Pr_{\beta}[\beta = \rho] = 0$ since the only repetition with respect to $z'$ in $\rho$ must be formed by $i^* \ne j^*$. By the same argument we may also assume that the last entry of $\rho$ is $i^*$ or $j^*$. Denoting $L'$ for the length of $\rho$, note that $\Pr_{\beta}[\beta = \rho] \le n^{-L'}$ as a necessary condition for $\beta = \rho$ is that the first $L'$ locations of the infinite random string used to construct $\beta$ match $\rho$. For notational convenience, let us denote $\alpha = W_L(\{k_1\})$. We have that

\[ \Pr_{k_1, x, y}[\alpha = \rho] = \prod_{i=1}^{L'} \Pr_{k_1, x, y}\big[\alpha_i = \rho_i \mid (\alpha_1, \dots, \alpha_{i-1}) = (\rho_1, \dots, \rho_{i-1})\big]. \tag{10} \]

We see that $\Pr_{k_1}[\alpha_1 = \rho_1] = 1/n$ as $\alpha_1 = k_1$ and $k_1$ is uniformly distributed over $[n]$. For $1 < i \le L'$ we know $\alpha_{i-1} = \rho_{i-1} \ne \rho_j = \alpha_j$ for every $j < i - 1$, and $i^*$ and $j^*$ cannot both occur in $\alpha_1, \dots, \alpha_{i-1}$; thus both $x_{\alpha_{i-1}}$ and $y_{\alpha_{i-1}}$ are independent of $\alpha_1, \dots, \alpha_{i-1}$ and uniformly distributed over $[m]$, as $x, y$ are random strings with $x_{i^*} = y_{j^*}$. Depending on $r$ we either have that $\alpha_i = h(x_{\alpha_{i-1}})$ or $\alpha_i = h(y_{\alpha_{i-1}})$, but in both cases

\[ \Pr_{k_1, x, y}\big[\alpha_i = \rho_i \mid (\alpha_1, \dots, \alpha_{i-1}) = (\rho_1, \dots, \rho_{i-1})\big] = 1/n, \]

since $m$ is a multiple of $n$. Thus (10) equals $n^{-L'}$ and the lemma follows.

As a consequence of this lemma we have that

\[ \Pr_{k_1, x, y}[W_L(\{k_1\}) \text{ contains } i^*, j^* \mid r \text{ is good} \wedge x(i^*) = y(j^*)] \ge \Pr_{\beta}[\beta \text{ contains } i^*, j^*] \ge \Omega((L/n)^2), \tag{11} \]

where $\beta$ is distributed as in Lemma 5.1 and we use Lemma 3.4 with $s = 1$, $z = z'$ and $p = n + 1$ for the second inequality. Note the probability lower bounded in (11) is sufficient for our purposes, as we assume $i^*, j^*$ exist in the theorem statement and Observation 4.1 still holds. Using the proof of Lemma 3.2, the adjusted algorithm thus solves 2-Sum in time $\tilde O(n\sqrt{p}) = \tilde O(n^{1.5})$ and $O(\log n)$ space with constant probability.

For k-Sum with $k > 2$, we repeat the above for every tuple from the Cartesian product of the last $k - 2$ sets. This blows up running time by a factor of $n^{k-2}$ and space by an additive factor of $k \log n$. It is easy to see that this algorithm will still find a solution with constant probability if it exists since it only needs to do this for the correct guess of the integers from the last $k - 2$ sets.
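The reduction from k-Sum to 2-Sum by guessing the last $k - 2$ entries can be sketched as follows. The inner 2-Sum routine here is a plain set-based stand-in that uses linear space; it is an assumption of this illustration and not the $O(\log n)$-space collision-search routine analyzed above. The function names are hypothetical.

```python
from itertools import product


def two_sum(x, y, t):
    """Stand-in 2-Sum: find a in x and b in y with a + b = t, or None.
    Uses a set (linear space), unlike the low-space routine in the text."""
    seen = set(x)
    for b in y:
        if t - b in seen:
            return (t - b, b)
    return None


def k_sum(lists, t):
    """Enumerate every tuple from the last k - 2 lists and solve 2-Sum on the
    first two lists with the correspondingly reduced target."""
    first, second, rest = lists[0], lists[1], lists[2:]
    for guess in product(*rest):
        hit = two_sum(first, second, t - sum(guess))
        if hit is not None:
            return hit + guess
    return None
```

The outer loop runs over $n^{k-2}$ tuples, matching the stated blow-up in running time, and only the current tuple needs to be stored.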


The following result follows more directly from our techniques from Section 3.

Theorem 5. Let $k, s, m$ be integers such that $k$ is even, $k \ge 2$, $s \le n^{k/2}$ and $n^{k/2} \le m \le \mathrm{poly}(n)$. There is a randomized algorithm that given $k$ uniformly random vectors $w^1, \dots, w^k \in_R [m]^n$ and a target $t$ (that may be adversarially chosen depending on the lists) finds $s_i \in w^i$ satisfying $\sum_{i=1}^{k} s_i = t$ if it exists with constant probability using $\tilde O(n^{3k/4}/\sqrt{s})$ time and $O(s \log n)$ space. The algorithm assumes random read-only access to a random function $h : [km] \to [n^{k/2}]$.

Proof. We may assume $t \le km$ since otherwise the answer is trivially NO. Define two lists

\[ x = \left( \sum_{i=1}^{k/2} w^{i}_{s_i} \right)_{(s_1, \dots, s_{k/2}) \in [n]^{k/2}}, \qquad y = \left( t - \sum_{i=1}^{k/2} w^{k/2+i}_{s_i} \right)_{(s_1, \dots, s_{k/2}) \in [n]^{k/2}}. \]

Then for any value $v \in [m]$ and $v' \in \{t - m, \dots, t\}$ we have

\[ E[|x^{-1}(v)|^2] \le n^{k/2}/m + n^{k}/m^2, \qquad E[|y^{-1}(v')|^2] \le n^{k/2}/m + n^{k}/m^2. \]

Thus we see that $E[p(x, y)] \le O(n^{k/2})$, so we may set $p = O(n^{k/2})$ and it will be a correct upper bound on the number of pseudo-collisions with constant probability by Markov's inequality. The algorithm then follows from Theorem 4 as it runs in time $O(n^{3k/4}/\sqrt{s})$.

6 Further Research

Our work paves the way for several interesting future research directions that we now briefly outline. The random read-only access to random bits assumption: An important question is whether the random access read-only randomness assumption is really required to improve over exhaustive search. This assumption seems relatively mild. Specifically, note that the assumption is weaker than the assumption that non-uniform exponentially strong pseudorandom generators exist (see e.g. [18]). Specifically, if such pseudorandom generators exist they could be used in the seminal construction by Goldreich et al. (see e.g. [18, Proposition 3]) to build strong pseudorandom functions h : [m] → [n] required for the algorithm listed in Figure 1: when the algorithm fails to solve a particular instance of List Disjointness, this instance could be used as advice to build a distinguisher for the pseudorandom generator (this is similar to the proof of other derandomization results, see e.g. [1, Section 9.5.2 and 20.1]). Interestingly, Impagliazzo et al. [17] show that if Subset Sum is sufficiently hard in the average case setting, the Subset Sum function that computes the sum of a given subset is a good candidate for a pseudorandom function.

Beame et al. [6] also raise the question whether the random access read-only randomness assumption is really required and in particular ask whether polylog(n)-wise independent randomness might be useful to this end. Note that the involved algorithms only run as slow as exhaustive search if the function f has very specific properties, and perhaps additional specific insights in the List Disjointness instances arising in our Subset Sum application can be of further use to exclude or algorithmically exploit that an instance has these properties.

More generally, it is natural to ask whether the fact that our algorithm uses little space can be used to build sufficiently strong pseudorandom generators for it. As mentioned in the introduction, derandomization of small space algorithms has been well-studied in previous works, but to our knowledge almost all positive results concern algorithms that read random bits only once (and thus need to store their values). A quite different recent work, perhaps more relevant for our computational model, is provided by Raz [31], who shows that low space non-uniform algorithms cannot learn vectors based on inner product with random vectors.

Solving instances of List Disjointness with many pseudo-collisions faster: A natural question is whether our dependence on the number of pseudo-collisions is really needed. Though this number comes up naturally in our approach, it is somewhat counterintuitive that this number determines the complexity of an instance. It would be interesting to find lower bounds, even for restricted models of computation, showing that this dependence is needed, or show the contrary.

Other applications: Due to the basic nature of the List Disjointness problem, we expect there to be more applications of Theorem 3. Note that already the result of Beame et al. [6] implies space efficient algorithms for e.g. the Colinear problem (given n points in the plane, are 3 of them on a line?). In the area of exact algorithms for NP-complete problems, it is for example still open to solve MAX-2-SAT or MAX-CUT in $O^*((2 - \varepsilon)^n)$ time and polynomial space (see [13, Section 9.2]) or the Traveling Salesman problem in $O^*((4 - \varepsilon)^n)$ time and polynomial space (see [13, Section 10.1]) for $\varepsilon > 0$. Other applications might be to find space efficient algorithms to check whether two vertices in an n-vertex directed graph with maximum (out/in)-degree $d$ are of distance $k$ from each other: this can be done in $O(d^{\lceil k/2 \rceil})$ time and space or in $O(d^{k})$ time and $O(k \log n)$ space. Improving the latter significantly for $d = 2$ would generalize Theorem 1.

Finally, a natural direction of this research line would be to rigorously study the worst-case complexity of other cryptographic problems typically solved by cycle finding algorithms, e.g. the Discrete Logarithm.

References

[1] Sanjeev Arora and Boaz Barak. Computational Complexity - A Modern Approach. Cambridge University Press, 2009.

[2] Per Austrin, Petteri Kaski, Mikko Koivisto, and Jussi Määttä. Space-time tradeoffs for subset sum: An improved worst case algorithm. In ICALP, pages 45–56, 2013.

[3] Per Austrin, Petteri Kaski, Mikko Koivisto, and Jesper Nederlof. Subset sum in the absence of concentration. In Symposium on Theoretical Aspects of Computer Science, STACS, pages 48–61, 2015.

[4] Per Austrin, Petteri Kaski, Mikko Koivisto, and Jesper Nederlof. Dense subset sum may be the hardest. In Symposium on Theoretical Aspects of Computer Science, STACS, pages 13:1–13:14, 2016.

[5] Per Austrin, Petteri Kaski, Mikko Koivisto, and Jesper Nederlof. Sharper upper bounds for unbalanced uniquely decodable code pairs. In International Symposium on Information Theory, ISIT, pages 335–339, 2016.


[6] Paul Beame, Raphaël Clifford, and Widad Machmouchi. Element distinctness, frequency moments, and sliding windows. In Foundations of Computer Science (FOCS), pages 290–299, 2013.

[7] Anja Becker, Jean-Sébastien Coron, and Antoine Joux. Improved generic algorithms for hard knapsacks. In Advances in Cryptology - EUROCRYPT, pages 364–385, 2011.

[8] Matthijs J. Coster, Antoine Joux, Brian A. Lamacchia, Andrew M. Odlyzko, Claus-Peter Schnorr, and Jacques Stern. Improved low-density subset sum algorithms. Computational Complexity, 2:111–128, 1992.

[9] Marek Cygan, Fedor Fomin, Bart M.P. Jansen, Lukasz Kowalik, Daniel Lokshtanov, Daniel Marx, Marcin Pilipczuk, Michal Pilipczuk, and Saket Saurabh. Open problems for FPT school 2014 (link).

[10] Itai Dinur, Orr Dunkelman, Nathan Keller, and Adi Shamir. Efficient dissection of composite problems, with applications to cryptanalysis, knapsacks, and combinatorial search problems. In Advances in Cryptology - CRYPTO, pages 719–740, 2012.

[11] Rodney G. Downey, Michael R. Fellows, and Frank K. H. A. Dehne, editors. Parameterized and Exact Computation, IWPEC, 2004.

[12] Abraham Flaxman and Bartosz Przydatek. Solving medium-density subset sum problems in expected polynomial time. In Symposium on Theoretical Aspects of Computer Science STACS, volume 3404, pages 305–314, 2005.

[13] Fedor V. Fomin and Dieter Kratsch. Exact Exponential Algorithms. Texts in Theoretical Computer Science. An EATCS Series. Springer, 2010.

[14] Ellis Horowitz and Sartaj Sahni. Computing partitions with applications to the knapsack problem. J. ACM, 21(2):277–292, 1974.

[15] Nick Howgrave-Graham and Antoine Joux. New generic algorithms for hard knapsacks. In Advances in Cryptology - EUROCRYPT, pages 235–256, 2010.

[16] Russell Impagliazzo, Shachar Lovett, Ramamohan Paturi, and Stefan Schneider. 0-1 integer linear programming with a linear number of constraints. CoRR, abs/1401.5512, 2014.

[17] Russell Impagliazzo and Moni Naor. Efficient cryptographic schemes provably as secure as subset sum. J. Cryptology, 9(4):199–216, 1996.

[18] Abhishek Jain, Krzysztof Pietrzak, and Aris Tentes. Hardness preserving constructions of pseudorandom functions. In Theory of Cryptography - TCC, pages 369–382, 2012.

[19] Antoine Joux. Algorithmic Cryptanalysis. Chapman & Hall/CRC, 1st edition, 2009.

[20] Petteri Kaski, Mikko Koivisto, and Jesper Nederlof. Homomorphic hashing for sparse coefficient extraction. In Parameterized and Exact Computation - IPEC, pages 147–158, 2012.

[21] Donald E. Knuth. Seminumerical Algorithms, volume 2 of The Art of Computer Programming.
