average-case complexity under the uniform distribution.
If $a \ge 3/10$, the expected number of queries for step 2 is
$$\sum_{i=100}^{\log N} \Pr[\tilde{a}_1 \le 2/10, \ldots, \tilde{a}_{i-1} \le 2/10 \mid a \ge 3/10] \cdot ic \;\le\; \sum_{i=100}^{\log N} \Pr[\tilde{a}_{i-1} \le 2/10 \mid a \ge 3/10] \cdot ic \;\le\; \sum_{i=100}^{\log N} 2^{-(i-1)} \cdot ic \;\in\; O(1).$$
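The final sum is $O(1)$ because the geometric factor $2^{-(i-1)}$ overwhelms the linearly growing factor; a quick numeric check, with the constant $c$ set to 1 purely for illustration:

```python
# Numeric check that sum_{i >= 100} 2^{-(i-1)} * i * c is a tiny constant.
# The constant c is set to 1 here purely for illustration.
c = 1
total = sum(2.0 ** (-(i - 1)) * i * c for i in range(100, 10_000))
print(total)  # far below 1, so the sum is O(1)
```
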
The probability that step 4 is needed (given $a \ge 3/10$) is at most $2^{-(c \log N)/c} = 1/N$. This adds $\frac{1}{N} \cdot N = 1$ to the expected number of queries.
Under the uniform distribution, the probability of the event $a < 3/10$ is at most $2^{-c'N}$ for some constant $c'$. This case contributes at most $2^{-c'N}(N + (\log N)^2) \in o(1)$ to the expected number of queries. Thus in total the algorithm uses $O(1)$ queries on average, hence $R_2^{\mathrm{unif}}(f) \in O(1)$. Since $Q_2^{\mathrm{unif}}(f) \le R_2^{\mathrm{unif}}(f)$, we also have $Q_2^{\mathrm{unif}}(f) \in O(1)$.
Since a deterministic classical algorithm for f must be correct on every input x, it is easy to see that it must make at least $N/10$ queries on every input, hence $D^{\mathrm{unif}}(f) \ge N/10$. □
Accordingly, we can have huge gaps between $D^{\mathrm{unif}}(f)$ and $Q_2^{\mathrm{unif}}(f)$. However, this example tells us nothing about the gaps between quantum and classical bounded-error algorithms. In the next section we exhibit an f where $Q_2^{\mathrm{unif}}(f)$ is exponentially smaller than the classical bounded-error complexity $R_2^{\mathrm{unif}}(f)$.
5.4 Average-Case: Randomized vs. Quantum
5.4.1 The function
We use the following modification of Simon’s problem from Section 1.5:²

Modified Simon’s problem:
We are given $x = (x_1, \ldots, x_{2^n})$, with $x_i \in \{0,1\}^n$. We want to compute the Boolean function defined by: $f(x) = 1$ iff there is a non-zero $k \in \{0,1\}^n$ such that $x_{i \oplus k} = x_i$ for all $i \in \{0,1\}^n$.

Here we treat $i \in \{0,1\}^n$ both as an n-bit string and as a number between 1 and $2^n$, and $\oplus$ denotes bitwise XOR (addition modulo 2). Note that this function is total, unlike Simon’s original promise function. Formally, f is not a Boolean function because the variables $x_i$ are $\{0,1\}^n$-valued. However, we can replace
²The preprint [90] proves a related but incomparable result about another modification of Simon’s problem.
every variable $x_i$ by n Boolean variables, and then f becomes a Boolean function of $N = n2^n$ variables. The number of queries needed to compute the Boolean function is at least the number of queries needed to compute the function with $\{0,1\}^n$-valued variables (because we can simulate a query to the Boolean input-variables by means of a query to the $\{0,1\}^n$-valued input-variables, just ignoring the $n-1$ bits we are not interested in) and at most n times the number of queries to the $\{0,1\}^n$-valued input variables (because one $\{0,1\}^n$-valued query can be simulated using n Boolean queries). As the numbers of queries are so closely related, it does not make a big difference whether we use the $\{0,1\}^n$-valued input variables or the Boolean input variables. For simplicity we count queries to the $\{0,1\}^n$-valued input variables.
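For concreteness, here is a direct (exponential-time) classical evaluation of f on the $\{0,1\}^n$-valued input variables, with each $x_i$ encoded as an integer; the function name is ours:

```python
# A brute-force classical implementation of the modified Simon function f.
# The input x is a list of 2^n values, each an n-bit string encoded as an int;
# f(x) = 1 iff some non-zero k satisfies x[i ^ k] == x[i] for all i.
def f(x, n):
    for k in range(1, 2 ** n):          # all non-zero k in {0,1}^n
        if all(x[i ^ k] == x[i] for i in range(2 ** n)):
            return 1
    return 0

# Example with n = 2: here x_i = x_{i XOR 3} for all i, so f(x) = 1 with k = 3.
print(f([1, 2, 2, 1], 2))  # -> 1
print(f([1, 2, 3, 0], 2))  # -> 0
```
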
We are interested in the average-case complexity of this function. The main result is the following exponential gap, to be proven in the next sections:
5.4.1. Theorem (Ambainis & de Wolf [13]). For f as above, we have that $Q_2^{\mathrm{unif}}(f) \le 22n + 1$ and $R_2^{\mathrm{unif}}(f) \in \Omega(2^{n/2})$.
5.4.2 Quantum upper bound
Our quantum algorithm for f is similar to Simon’s. Start with the 2-register superposition $\sum_{i \in \{0,1\}^n} |i\rangle|\vec{0}\rangle$ (for convenience we ignore normalizing factors). Apply a query to obtain
$$\sum_{i \in \{0,1\}^n} |i\rangle|x_i\rangle.$$
Measuring the second register gives some j and collapses the first register to
$$\sum_{i : x_i = j} |i\rangle.$$
Applying a Hadamard transform to each qubit of the first register gives
$$\sum_{i' \in \{0,1\}^n} \Big( \sum_{i : x_i = j} (-1)^{(i,i')} \Big) |i'\rangle. \qquad (5.1)$$
If $f(x) = 1$ with hidden vector k, then the set $\{i : x_i = j\}$ is a union of pairs $\{i, i \oplus k\}$, so $|i'\rangle$ has non-zero amplitude only if $(k, i') = 0$. Hence if $f(x) = 1$, then measuring the final state gives some $i'$ orthogonal to the unknown k.
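For tiny n this subroutine can be mimicked classically by computing the post-measurement amplitudes by brute force. The sketch below (our own code, with hypothetical names such as `simon_sample`) reproduces its measurement statistics:

```python
import random

def simon_sample(x, n, rng):
    """Simulate one run of the subroutine: measure the second register,
    apply Hadamards to the first register, measure it, and return i'."""
    # Measuring the second register of sum_i |i>|x_i>: pick j with
    # probability |{i : x_i = j}| / 2^n.
    j = x[rng.randrange(2 ** n)]
    support = [i for i in range(2 ** n) if x[i] == j]
    # Unnormalized probability of outcome |i'>: square of the amplitude
    # sum_{i in support} (-1)^{(i, i')}.
    weights = []
    for ip in range(2 ** n):
        a = sum((-1) ** bin(i & ip).count("1") for i in support)
        weights.append(a * a)
    total = sum(weights)
    r = rng.uniform(0, total)
    acc = 0.0
    for ip, w in enumerate(weights):
        acc += w
        if r <= acc:
            return ip
    return 2 ** n - 1

x = [1, 2, 2, 1]  # n = 2, hidden k = 3
samples = [simon_sample(x, 2, random.Random(s)) for s in range(20)]
print(samples)  # every i' satisfies (k, i') = 0, i.e. i' is 0 or 3
```
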
To decide if $f(x) = 1$, we repeat the above process $m = 22n$ times. Let $i_1, \ldots, i_m \in \{0,1\}^n$ be the results of the m measurements. If $f(x) = 1$, there must be a non-zero k that is orthogonal to all $i_r$ ($r \in \{1, \ldots, m\}$). Compute the subspace $S \subseteq \{0,1\}^n$ that is generated by $i_1, \ldots, i_m$ (i.e., S is the set of binary vectors obtained by taking linear combinations of $i_1, \ldots, i_m$ over GF(2)).
If $S = \{0,1\}^n$, then the only k that is orthogonal to all $i_r$ is $k = 0^n$ (clearly $i_r \cdot 0^n = 0$ for all $i_r$), so then we know that $f(x) = 0$. If $S \neq \{0,1\}^n$, we just query all $2^n$ values $x_{0\ldots0}, \ldots, x_{1\ldots1}$ and then compute $f(x)$. Of course, this latter step is very expensive, but it is needed only rarely:
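Checking whether $i_1, \ldots, i_m$ generate all of $\{0,1\}^n$ amounts to a rank computation over GF(2); a minimal sketch (vectors encoded as integers, function name ours):

```python
def spans_everything(vectors, n):
    """Return True iff the given n-bit vectors (ints) span {0,1}^n over GF(2)."""
    basis = []  # echelon basis: each element has a distinct leading bit
    for v in vectors:
        for b in basis:
            v = min(v, v ^ b)  # reduce v by b if that lowers its leading bit
        if v:
            basis.append(v)
            basis.sort(reverse=True)
    return len(basis) == n

print(spans_everything([0b01, 0b10], 2))        # True: rank 2
print(spans_everything([0b01, 0b01, 0b00], 2))  # False: rank 1
```
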
5.4.2. Lemma. Assume that $x = (x_0, \ldots, x_{2^n-1})$ is chosen uniformly at random from $\{0,1\}^N$. Then, with probability at least $1 - 2^{-n}$, $f(x) = 0$ and the measured $i_1, \ldots, i_m$ generate $\{0,1\}^n$.
Proof. It can be shown by a small modification of [4, Theorem 5.1, p. 91] that with probability at least $1 - 2^{-c2^n}$ (for some constant $c > 0$), there are at least $2^n/8$ values j such that $x_i = j$ for exactly one $i \in \{0,1\}^n$ (and hence $f(x) = 0$). We assume that this is the case in the following.
If $i_1, \ldots, i_m$ generate a proper subspace of $\{0,1\}^n$, then there is a non-zero $k \in \{0,1\}^n$ that is orthogonal to this subspace. We estimate the probability that this happens. Consider some fixed non-zero vector $k \in \{0,1\}^n$. The probability that $i_1$ and k are orthogonal is at most $15/16$, as follows. With probability at least $1/8$, the measurement of the second register gives a j such that $x_i = j$ for a unique i. In this case, the measurement of the final superposition (5.1) gives a uniformly random $i'$. The probability that a uniformly random $i'$ has $(k, i') \neq 0$ is $1/2$. Therefore, the probability that $(k, i_1) = 0$ is at most $1 - \frac{1}{8} \cdot \frac{1}{2} = \frac{15}{16}$.
The vectors $i_1, \ldots, i_m$ are chosen independently. Therefore, the probability that k is orthogonal to each of them is at most $(15/16)^m = (15/16)^{22n} < 2^{-2n}$. There are $2^n - 1$ possible non-zero k, so the probability that there is a k that is orthogonal to each of $i_1, \ldots, i_m$ is $\le (2^n - 1)2^{-2n} < 2^{-n}$. □
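The step $(15/16)^{22n} < 2^{-2n}$ holds because it holds per block of 22 repetitions; a one-line numeric check:

```python
# (15/16)^{22n} < 2^{-2n} for every n >= 1, since it holds for n = 1:
ratio = (15 / 16) ** 22
print(ratio)            # about 0.242, below 1/4
assert ratio < 2 ** -2  # hence ((15/16)^22)^n < (2^-2)^n
```
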
Note that this algorithm is actually a zero-error algorithm: it always outputs the correct answer. Its expected number of queries on a uniformly random input is at most $m = 22n$ for generating $i_1, \ldots, i_m$ and at most $\frac{1}{2^n} \cdot 2^n = 1$ for querying all the $x_i$ if the first step does not give $i_1, \ldots, i_m$ that generate $\{0,1\}^n$. This completes the proof of the first part of Theorem 5.4.1. In contrast, in Section 5.4.4 we show that the worst-case zero-error quantum complexity of f is $\Omega(N)$, which is near-maximal.
5.4.3 Classical lower bound
Let $D_1$ be the uniform distribution over all inputs $x \in \{0,1\}^N$ and $D_2$ be the uniform distribution over all x for which there is a unique $k \neq 0$ such that $x_i = x_{i \oplus k}$ (and hence $f(x) = 1$). We say that an algorithm A distinguishes between $D_1$ and $D_2$ if the average probability that A outputs 0 is $\ge 2/3$ under $D_1$ and the average probability that A outputs 1 is $\ge 2/3$ under $D_2$.
5.4.3. Lemma. If there is a bounded-error algorithm A that computes f with $m = T_A^{\mathrm{unif}}$ queries on average, then there is an algorithm that distinguishes between $D_1$ and $D_2$ and uses $O(m)$ queries on all inputs.
Proof. Without loss of generality we assume A has error probability ≤ 1/10.
Under $D_1$, the probability that A outputs 1 is at most $1/10 + o(1)$ ($1/10$ is the maximum probability of error on an input with $f(x) = 0$ and $o(1)$ is the probability of getting an input with $f(x) = 1$), so the probability that A outputs 0 is at least $9/10 - o(1)$. We run A until it stops or makes $10m$ queries. The average probability (under $D_1$) that A does not stop before $10m$ queries is at most $1/10$, for otherwise the average number of queries would be more than $\frac{1}{10}(10m) = m$.
Therefore the probability under $D_1$ that A outputs 0 after at most $10m$ queries is at least $(9/10 - o(1)) - 1/10 = 4/5 - o(1)$. In contrast, the $D_2$-probability that A outputs 0 is $\le 1/10$, because $f(x) = 1$ for any input x from $D_2$. We can use this to distinguish $D_1$ from $D_2$. □
5.4.4. Lemma. A classical randomized algorithm A that makes $m \in o(2^{n/2})$ queries cannot distinguish between $D_1$ and $D_2$.
Proof. Suppose $m \in o(2^{n/2})$. For a random input from $D_1$, the probability that all answers to m queries are different is
$$1 \cdot \left(1 - \frac{1}{2^n}\right)\left(1 - \frac{2}{2^n}\right)\cdots\left(1 - \frac{m-1}{2^n}\right) \ge 1 - \frac{m^2}{2^n} = 1 - o(1).$$
For a random input from $D_2$, the probability that there is an i such that A queries both $x_i$ and $x_{i \oplus k}$ (k is the hidden vector) is $\le \binom{m}{2}\frac{1}{2^n - 1} = o(1)$, since for each pair of queried positions $i, j$ the probability that $i \oplus j = k$ is $\frac{1}{2^n - 1}$. If no pair $x_i, x_{i \oplus k}$ is queried, the probability that all answers are different is
$$1 \cdot \left(1 - \frac{2}{2^n}\right)\left(1 - \frac{4}{2^n}\right)\cdots\left(1 - \frac{2(m-1)}{2^n}\right) \ge 1 - \frac{m^2}{2^{n-1}} = 1 - o(1).$$
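These are standard birthday-style bounds; a quick numeric sanity check of the first product, with illustrative values $n = 30$ and $m = 1000 \ll 2^{n/2}$:

```python
# Compare the product (1 - 1/2^n)(1 - 2/2^n)...(1 - (m-1)/2^n)
# against its lower bound 1 - m^2/2^n, for illustrative n and m.
n, m = 30, 1000          # m = 1000 is well below 2^(n/2) = 32768
prod = 1.0
for i in range(1, m):
    prod *= 1 - i / 2 ** n
lower = 1 - m ** 2 / 2 ** n
print(prod, lower)       # both are 1 - o(1) in this regime
assert lower <= prod < 1
```
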
It is easy to see that all sequences of m different answers are equally likely.
Therefore, for both distributions $D_1$ and $D_2$, we get a uniformly random sequence of m different values with probability $1 - o(1)$ and something else with probability $o(1)$. Thus A cannot “see” the difference between $D_1$ and $D_2$ with sufficient probability to distinguish between them. □
The second part of Theorem 5.4.1 now follows: a classical algorithm that computes f with an average number of m queries can be used to distinguish between $D_1$ and $D_2$ with $O(m)$ queries (Lemma 5.4.3), but then $O(m) \in \Omega(2^{n/2})$ (Lemma 5.4.4).
5.4.4 Worst-case quantum complexity of f
For the sake of completeness, we will here show a lower bound of $\Omega(N)$ queries for the zero-error worst-case complexity $Q_0(f)$ of the function f on $N = n2^n$ binary variables defined in Section 5.4. (We count binary queries this time.) Consider a quantum algorithm that makes at most T queries and that, for every x, outputs either the correct output $f(x)$ or, with probability $\le 1/2$, outputs “don’t know”.
Consider the polynomial P which is the acceptance probability of our T-query algorithm for f. It has the following properties:

1. P has degree $d \le 2T$;
2. if $f(x) = 0$ then $P(x) = 0$;
3. if $f(x) = 1$ then $P(x) \in [1/2, 1]$.
We first show that only very few inputs $x \in \{0,1\}^N$ make $f(x) = 1$. The number of such 1-inputs for f is at most the number of ways to choose $k \in \{0,1\}^n - \{0^n\}$, times the number of ways to choose $2^n/2$ independent $x_i \in \{0,1\}^n$ (the other half of the values is then determined by k). This is $(2^n - 1) \cdot (2^n)^{2^n/2} < 2^{n(2^n/2+1)}$. Accordingly, the fraction of 1-inputs among all $2^N$ inputs x is $< 2^{n(2^n/2+1)}/2^{n2^n} = 2^{-n(2^n/2-1)}$. These x are exactly the x that make $P(x) \neq 0$.
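For $n = 2$ this count can be verified by brute force (our own check, not from the text): enumerate all $2^N = 256$ inputs and compare against the bound $2^{n(2^n/2+1)} = 64$.

```python
from itertools import product

# Count the 1-inputs of f for n = 2 and compare with the counting bound.
n = 2
ones = 0
for x in product(range(2 ** n), repeat=2 ** n):
    if any(all(x[i ^ k] == x[i] for i in range(2 ** n))
           for k in range(1, 2 ** n)):
        ones += 1
bound = 2 ** (n * (2 ** n // 2 + 1))   # 2^{n(2^n/2 + 1)} = 64 for n = 2
print(ones, bound)
assert ones < bound
```
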
However, the following result is known [148, 133]:
5.4.5. Lemma (Schwartz). If P is a non-constant N-variate multilinear polynomial of degree d, then
$$\frac{|\{x \in \{0,1\}^N \mid P(x) \neq 0\}|}{2^N} \ge 2^{-d}.$$
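The bound is tight: for instance $P(x) = x_1 x_2 \cdots x_d$ has degree d and is non-zero on exactly a $2^{-d}$ fraction of the inputs. A brute-force check for small N and d (our own example, not from the text):

```python
from itertools import product

# P(x) = x_1 * x_2 * ... * x_d: multilinear of degree d,
# non-zero exactly when the first d variables are all 1.
N, d = 6, 3

def P(x):
    out = 1
    for v in x[:d]:
        out *= v
    return out

nonzero = sum(1 for x in product((0, 1), repeat=N) if P(x) != 0)
print(nonzero / 2 ** N)              # equals 2^-d = 0.125
assert nonzero / 2 ** N == 2 ** -d   # Schwartz's bound holds with equality
```
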
This implies $d \ge n(2^n/2 - 1)$ and hence $T \ge d/2 \ge n(2^n/4 - 1/2) \approx N/4$.
Thus we have proved that the worst-case zero-error quantum complexity of f is near-maximal:
5.4.6. Theorem (Ambainis & de Wolf [13]). $Q_0(f) \in \Omega(N)$.