
Server scheduling to balance priorities, fairness, and average quality of service

Citation for published version (APA):

Bansal, N., & Pruhs, K. R. (2010). Server scheduling to balance priorities, fairness, and average quality of service. SIAM Journal on Computing, 39(7), 3311-3335. https://doi.org/10.1137/090772228

DOI: 10.1137/090772228

Document status and date: Published: 01/01/2010

Document Version:

Publisher’s PDF, also known as Version of Record (includes final page, issue and volume numbers)


General rights

Copyright and moral rights for the publications made accessible in the public portal are retained by the authors and/or other copyright owners and it is a condition of accessing publications that users recognise and abide by the legal requirements associated with these rights. • Users may download and print one copy of any publication from the public portal for the purpose of private study or research. • You may not further distribute the material or use it for any profit-making activity or commercial gain

• You may freely distribute the URL identifying the publication in the public portal.

If the publication is distributed under the terms of Article 25fa of the Dutch Copyright Act, indicated by the “Taverne” license above, please follow below link for the End User Agreement:

www.tue.nl/taverne Take down policy

If you believe that this document breaches copyright please contact us at: openaccess@tue.nl


SIAM J. COMPUT., Vol. 39, No. 7, pp. 3311–3335. © 2010 Society for Industrial and Applied Mathematics.

SERVER SCHEDULING TO BALANCE PRIORITIES, FAIRNESS,

AND AVERAGE QUALITY OF SERVICE

NIKHIL BANSAL AND KIRK R. PRUHS

Abstract. Often server systems do not implement the best known algorithms for optimizing average Quality of Service (QoS) out of concern that these algorithms may be insufficiently fair to individual jobs. The standard method for balancing average QoS and fairness is to optimize the ℓp norm, 1 < p < ∞. Thus we consider server scheduling strategies to optimize the ℓp norms of the standard QoS measures, flow and stretch. We first show that there is no n^{o(1)}-competitive online algorithm for the ℓp norms of either flow or stretch. We then show that the standard clairvoyant algorithms for optimizing average QoS, Shortest Job First (SJF) and Shortest Remaining Processing Time (SRPT), are scalable for the ℓp norms of flow and stretch. We then show that the standard nonclairvoyant algorithm for optimizing average QoS, Shortest Elapsed Time First (SETF), is also scalable for the ℓp norms of flow. We then show that the online algorithm Highest Density First (HDF) and the nonclairvoyant algorithm Weighted Shortest Elapsed Time First (WSETF) are scalable for the weighted ℓp norms of flow. These results suggest that the concern that these standard algorithms may unnecessarily starve jobs is unfounded. In contrast, we show that the Round Robin, or Processor Sharing, algorithm, which is sometimes adopted because of its seeming fairness properties, is not (1 + ε)-speed, n^{o(1)}-competitive for sufficiently small ε.

Key words. scheduling, resource augmentation, flow time, shortest elapsed time first, shortest

remaining processing time, multilevel feedback, shortest job first

AMS subject classifications. 68M20, 90B35

DOI. 10.1137/090772228

1. Introduction.

1.1. Motivation. When designing a scheduling strategy for servers in a client-server setting, there is commonly a conflict between the competing demands of fairness and average-case performance. For concreteness let us initially consider a web server serving static content. Each job i for the web server can be described by a release time ri, when a request arrives at the server, and a size or volume pi of the file requested. An online scheduling algorithm must determine the unique job to run or, equivalently, file to transmit, at each point in time. Generally servers, such as web servers, that must handle jobs of widely varying sizes must allow preemption, which is the suspension of the execution of one job and the later resumption of the execution of that job from the point of suspension, to avoid the possibility of one big job delaying many small jobs. For example, standard web servers schedule preemptively. The most commonly used Quality of Service (QoS) measure for a single job Ji is clearly the flow/response/waiting time Fi = Ci − ri, where Ci is the time when the server completes the job. The two most obvious QoS objectives for a schedule are then the average, or ℓ1 norm, of the response times, and the maximum, or ℓ∞ norm, of the response times. It is well known that the scheduling algorithm Shortest Remaining Processing Time (SRPT) is optimal for minimizing the total response time, and the scheduling algorithm First-Come-First-Served (FCFS) (or, equivalently, First-In-First-Out (FIFO)) is optimal for minimizing the maximum response time. Yet standard web servers do not use either SRPT or FCFS. For example, the Apache web server uses FCFS to determine the request that is allocated a free process slot and essentially leaves the scheduling of the processes to the underlying operating system, whose scheduling policy may be designed primarily to optimize average-case performance [1]. By reverse engineering standard server scheduling policies, such as the one used by Apache, the apparent goal of the designers was to try to balance the competing objectives of optimizing the worst case for fairness reasons and optimizing the average case for performance reasons.

∗Received by the editors September 28, 2009; accepted for publication (in revised form) May 12, 2010; published electronically July 29, 2010. http://www.siam.org/journals/sicomp/39-7/77222.html

IBM T. J. Watson Research Center, Yorktown Heights, NY 10598 (nikhil@us.ibm.com).

Computer Science Department, University of Pittsburgh, Pittsburgh, PA 15260 (kirk@cs.pitt.edu). This author's work was supported in part by a grant from the US Air Force, an IBM faculty award, and NSF grants CCR-0098752, ANIR-0123705, CNS-0325353, CCF-0448196, CCF-0514058, IIS-0534531, and CCF-0830558.

The standard way to compromise between optimizing for the average and optimizing for the worst case is to optimize the ℓp norm, generally for something like p = 2 or p = 3. For example, the standard way to fit a line to a collection of points is to pick the line with minimum least-squares (equivalently, ℓ2) distance to the points, and Knuth's TeX typesetting system uses the ℓ3 norm to determine line breaks [20, page 97]. This suggests optimizing the objective of the ℓp norm of the response times, (Σ_{i=1}^n Fi^p / n)^{1/p}, as a way to balance the competing demands of worst-case and average performance. The ℓp norm of response times considers the average in the sense that it takes into account the response times of all jobs, but, because x^p is a strictly convex function of x, the ℓp norm increases more significantly due to jobs with unusually high response times.
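As a quick numerical illustration of this point, the following Python sketch (with made-up flow-time values, not data from the paper) compares the ℓp norms of response times for two hypothetical schedules with the same average flow time:

```python
# Illustrative only: compare l_p norms of response times for two
# hypothetical schedules with the same average but different tails.
def lp_norm(flows, p):
    """( (1/n) * sum F_i^p )^(1/p), the l_p norm of response times."""
    n = len(flows)
    return (sum(f ** p for f in flows) / n) ** (1.0 / p)

balanced = [4, 4, 4, 4]        # every job waits 4 time units
skewed   = [1, 1, 1, 13]       # same total flow, one starved job

for p in (1, 2, 3):
    print(p, lp_norm(balanced, p), lp_norm(skewed, p))
# The l_1 norms (averages) match, but for p = 2 and p = 3 the skewed
# schedule is penalized for its one starved job.
```

The larger p is, the more the norm behaves like the maximum; p = 2 or p = 3 sits between average-case and worst-case objectives.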

The following passage from the standard Silberschatz and Galvin text Operating Systems Concepts [27] also argues for optimizing essentially the ℓ2 norm in time sharing systems as a way of reducing variability:

It is desirable to maximize CPU utilization and throughput, and to minimize turnaround time, waiting time and response time. In most cases, we optimize for the average measure. However, there are circumstances when it is desirable to optimize for the maximum and minimum values, rather than the average. For example, to guarantee that all users get good service. It has also been suggested, that for interactive systems (such as time sharing systems), it is more important to minimize the variance in the response time than it is to minimize the average response time. A system with reasonable and predictable response time may be considered more desirable than a system that is faster on the average, but is highly variable. However, little work has been done on CPU scheduling algorithms to minimize variance.

— Operating Systems Concepts, Silberschatz and Galvin.

The variance is the expected squared distance from the mean, (1/n) Σ_{i=1}^n (Fi − Z)^2, where Z = Σ_{i=1}^n Fi / n. The objective of minimizing the variance is not a good formal criterion, since one can achieve zero variance by scheduling the jobs so that all the jobs have exactly the same horrible quality of service. So probably the most reasonable alternative to variance would be to set Z = 0 and consider the objective Σ_{i=1}^n Fi^2, which is essentially equivalent to the ℓ2 norm of response times.
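This degeneracy is easy to exhibit numerically. In the sketch below (flow-time values invented for illustration), a uniformly terrible schedule has zero variance, while the sum-of-squares objective is not fooled:

```python
# Sketch of the text's point: zero variance is achievable with uniformly
# terrible service, whereas the sum of squared flow times is not fooled.
def variance(flows):
    z = sum(flows) / len(flows)
    return sum((f - z) ** 2 for f in flows) / len(flows)

def sum_sq(flows):
    return sum(f ** 2 for f in flows)

good  = [1, 2, 1, 2]       # small, slightly variable flow times
awful = [1000] * 4         # identical but terrible flow times

print(variance(awful), variance(good))   # 0.0 for the awful schedule
print(sum_sq(awful), sum_sq(good))       # sum of squares prefers `good`
```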

Thus our goal here is to report on a theoretical investigation into server scheduling with the objective of the ℓp norms of standard job QoS measures. To give an indication of the type of results that we obtain, let us return to our discussion of scheduling policies for web servers serving static content, with the objective being the ℓ2 norm of response times.


[Figure 1 appears here.]

Fig. 1. (a) Standard QoS curve. (b) The worst possible QoS curve of a (1 + ε)-speed, O(1)-competitive online algorithm.

We first give a negative result showing that optimizing for the ℓ2 norm of response times is much harder than optimizing for the ℓ1 and ℓ∞ norms. More precisely, we show that every scheduling algorithm has competitive ratio ω(1). An algorithm A is c-competitive for an objective F if

max_I F(A(I)) / F(Opt(I)) ≤ c,

where A(I) denotes the schedule that algorithm A produces on input I, and similarly Opt(I) denotes the optimal schedule for I. So for every scheduling algorithm A, there are instances I on which A produces a schedule whose ℓ2 norm of response times is unboundedly worse than that of the optimal schedule.

However, as is often the case in such lower bound constructions, the lower bound instances have a special adversarial structure. The load that these instances place on the server is near the capacity of the server, so that the scheduling algorithm does not have any spare time to recover from even minor scheduling misjudgments. We attempt to illustrate this phenomenon in Figure 1. QoS curves such as those in Figure 1(a) are ubiquitous in server systems. That is, there is a relatively modest degradation in schedule QoS as the load increases until one nears some threshold— this threshold is essentially the capacity of the system—after which any increase in the load precipitously degrades the QoS provided by the server. The concept of load is not so easy to formally define but generally reflects the number of users of the system. The online algorithm whose performance is pictured in Figure 1(b), while not optimal, would seem to have reasonable performance. However, this online algorithm would have a high competitive ratio. The value of the competitive ratio is at least the ratio of the values of the performance curves for online and optimal at a particular load. So in Figure 1(b) one can establish that the competitive ratio must be high by picking a load a bit less than the capacity of the system, where the performance of the online algorithm has degraded significantly, but the performance of the optimal algorithm has not yet degraded.

To address this issue, [18] introduced resource augmentation analysis, which compares the online algorithm against an optimal offline algorithm with a slower processor. More formally, in the context of a scheduling minimization problem with an objective function F, an algorithm A is s-speed, c-competitive if

max_I F(A_s(I)) / F(Opt_1(I)) ≤ c,

where the subscripts denote the speed of the processor that the algorithms use. Increasing the speed of a server by a factor of s is essentially equivalent to lowering the load on the server by a factor of s. A (1 + ε)-speed, O(1)-competitive algorithm is said to be scalable [25, 24]. More formally, an algorithm A is scalable if for all ε > 0, there exists a constant c (which may depend upon ε) such that for all inputs I,

F(A_{1+ε}(I)) / F(Opt_1(I)) ≤ c.

Such a scalable algorithm is O(1)-competitive on inputs I where Opt_1(I) is approximately Opt_{1+ε}(I). The loads in the usual performance curve shown in Figure 1(a) where Opt_1(I) is not approximately Opt_{1+ε}(I) are those points near or above the capacity of the system. Thus the performance curve of a scalable scheduling algorithm should be no worse than that shown in Figure 1(b); that is, the scheduling algorithm should scale reasonably well up to quite near the capacity of the system.

Among other results, we will show that the scheduling algorithm SRPT is a scalable algorithm for the objective of the ℓp norm of response times. This result suggests that perhaps the fear that SRPT will unnecessarily starve jobs is unfounded, since SRPT scales almost as well with load as the optimal algorithm for the ℓp norm of response times.

1.2. Our results. We analyze the performance of various standard scheduling algorithms for the objectives of the ℓp norms, 1 < p < ∞, of the two most common job QoS measures, response time and stretch. The stretch or slowdown of a job i is defined to be Fi/pi. If a job has stretch s, then it appears to the client that it received dedicated service from a speed 1/s processor. One motivation for considering stretch is that a human user may have some feeling for the size of a job. For example, in the setting of a web server, the user may have some knowledge about the size of the requested document (for example, the user may know that video documents are generally larger than text documents) and may be willing to tolerate a larger response time for larger documents.

We consider the following algorithms:

• SJF. Shortest Job First always runs the job i of minimum volume pi. SJF may preempt a job when a job of smaller volume arrives.

• SRPT. Shortest Remaining Processing Time always runs the job of minimum unfinished volume. So, for example, if a job i is two-thirds completed, then its unfinished volume is pi/3.

• RR. Round Robin shares the processor equally among all jobs.

• SETF. Shortest Elapsed Time First always runs the job that has been run the least so far.

• HDF. Highest Density First always runs the job of highest density, which is the weight of a job divided by the size of the job.

• WSETF. Among all jobs with the smallest norm, Weighted Shortest Elapsed Time First splits the processor proportionally to the weights of the jobs. The norm of a job is its finished volume divided by its weight.

All the algorithms except for WSETF, which we introduce here, are standard scheduling algorithms.
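As a rough illustration of how these priority rules differ, the sketch below implements SJF, SRPT, SETF, and HDF as priority keys in a toy unit-step, single-machine simulator. The Job fields and the discrete time model are our simplifications, not the paper's; RR and WSETF, which split the processor among jobs, do not fit this one-job-per-step model:

```python
from dataclasses import dataclass

@dataclass
class Job:
    release: int          # r_i: arrival time
    size: int             # p_i: total volume
    weight: float = 1.0   # w_i: importance (used only by HDF)
    done: float = 0.0     # x_i(t): work completed so far

    @property
    def remaining(self):
        return self.size - self.done

def simulate(jobs, policy):
    """Each unit time step, run the job minimizing policy(job).
    Returns the flow time of every job, in order of completion."""
    pending = sorted(jobs, key=lambda j: j.release)
    active, flows, t = [], [], 0
    while pending or active:
        while pending and pending[0].release <= t:
            active.append(pending.pop(0))
        if active:
            job = min(active, key=policy)   # highest-priority job
            job.done += 1
            if job.remaining <= 0:
                active.remove(job)
                flows.append(t + 1 - job.release)
        t += 1
    return flows

SJF  = lambda j: j.size              # smallest total volume first
SRPT = lambda j: j.remaining         # smallest unfinished volume first
SETF = lambda j: j.done              # least service received so far
HDF  = lambda j: -j.weight / j.size  # highest weight density first

def make_jobs():
    # hypothetical workload, our own example: (release, size)
    return [Job(0, 4), Job(1, 1), Job(2, 2)]

print(simulate(make_jobs(), SRPT))   # the two small jobs preempt the big one
```

Note that `simulate` mutates the jobs it is given, so a fresh list is built per run.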

We summarize our results in Table 1. We first consider the standard online algorithms aimed at average QoS, that is, SJF and SRPT. We show that the algorithms SJF and SRPT are scalable with respect to all ℓp norms of flow and stretch. Next we


Table 1
Resource augmentation results for unweighted and weighted ℓp norms, 1 < p < ∞.

Algorithm                          | Speed   | ℓp norm of flow     | ℓp norm of stretch                  | Section
General clairvoyant algorithm     | 1       | n^{Ω(1)}            | n^{Ω(1)}                            | 4
SJF                               | (1 + ε) | O(1/ε)              | O(1/ε)                              | 5
SRPT                              | (1 + ε) | O(1/ε)              | O(1/ε)                              | 6
SETF                              | (1 + ε) | O(1/ε^{2+2/p})      | O((1/ε^{3+1/p}) · lg^{1+1/p}(B/ε))  | 7
RR                                | (1 + ε) | Ω(n^{1−2/p})        | Ω(n)                                | 8
General nonclairvoyant algorithm  | (1 + ε) |                     | Ω(min{n, log B})                    | 4

Algorithm | Speed   | Weighted ℓp norm of flow | Section
HDF       | (1 + ε) | O(1/ε^2)                 | 9
WSETF     | (1 + ε) | O(1/ε^{2+2/p})           | 10

consider algorithms that do not know the service requirement of a job, which are often referred to as nonclairvoyant algorithms [22]. We show that the standard nonclairvoyant algorithm SETF is also scalable for all ℓp norms of flow. For the ℓp norm of stretch, we show that, given (1 + ε) speed-up, SETF is polylogarithmically competitive in B, the ratio of the length of the longest job to that of the shortest job. We also give an essentially matching lower bound. Note that all of the results assume that p is constant, so that multiplicative factors which are a function of p alone are absorbed into the constant in the asymptotic notation. Recall that we are primarily interested in p = 2 and p = 3.

Some server systems have mechanisms that allow the users or the system to specify that some jobs are more important than other jobs. For example, in the Unix operating system jobs have priorities, which can be set with the nice command. One natural way to formalize a scheduling objective in a setting where the jobs are of varying importance is to associate a weight wi with each job i, and then to weight the performance of each job accordingly in the objective. So, for example, the weighted ℓp norm of flow would be (Σ_{i=1}^n wi Fi^p)^{1/p}. We show that both HDF and the nonclairvoyant algorithm WSETF are scalable for the weighted ℓp norms of flow. An interesting aspect of our analysis of HDF and WSETF is that we first transform the problem on the weighted instance to a related problem on the unweighted instance.

These resource augmentation results argue that the concern that these standard scheduling algorithms aimed at optimizing average QoS might unnecessarily starve jobs is unfounded when the server is less than fully loaded.

It might be tempting to conclude that all reasonable algorithms should have such scaling guarantees. However, we show that this is not the case. More precisely, the standard Processor Sharing, or, equivalently, Round Robin, algorithm is not (1 + ε)-speed, O(1)-competitive for any ℓp norm of flow, 1 < p < ∞, for sufficiently small ε. This is perhaps surprising, since fairness is a commonly cited reason for adopting Processor Sharing [28].

2. Related results. The results in the literature that are closest in spirit to those here are found in a series of papers, including [4, 14, 16, 26], that argue that SRPT will not unnecessarily starve jobs any more than Processor Sharing, or, equivalently, RR, does under "normal" situations. In these papers, "normal" is defined as there being a Poisson distribution on release times, and processing times being independent samples from a heavy-tailed distribution. More precisely, these papers argue that every job should prefer SRPT to Processor Sharing under these circumstances. Experimental results supporting this hypothesis are also given. So informally our paper and these papers reach the same conclusion about the superiority of SRPT. But in a formal sense the results are incomparable.

The following results are known about online algorithms when the objective function is average flow time. The competitive ratio of every deterministic nonclairvoyant algorithm is Ω(n^{1/3}), and the competitive ratio of every randomized nonclairvoyant algorithm against an oblivious adversary is Ω(log n) [22]. Here n is the number of jobs in the instance. The randomized nonclairvoyant algorithm Randomized Multi-Level Feedback (RMLF), proposed in [19], is O(log n)-competitive against an oblivious adversary [5]. The online clairvoyant algorithm SRPT is optimal. The online clairvoyant algorithm SJF is scalable [6]. The nonclairvoyant algorithm SETF is scalable [18]. RR is also known to be O(1)-speed, O(1)-competitive for average flow time [15].

For minimizing average stretch, [23] shows that SRPT is 2-competitive. In the offline case, there is a polynomial-time approximation scheme for average stretch [7, 10]. In the nonclairvoyant case, the authors of [3] showed that any algorithm is Ω(B)-competitive (without speed-up), where B is the ratio of the maximum to minimum job size. They also showed that even with an O(1) times speed-up, any algorithm is Ω(min(n, log B))-competitive. They also gave a nonclairvoyant algorithm that is (1 + ε)-speed, O(poly(ε^{-1} · log B))-competitive.

HDF was shown in [6] to be (1 + ε)-speed, O(1/ε)-competitive for weighted flow time. This immediately implies a similar result for SJF for average flow time and for average stretch. Reference [2] gives an O(log W)-competitive algorithm for weighted flow time. Other results for weighted flow time can be found in [11, 10, 2].

Subsequent to this research, two papers [9, 12] extend our results to the multiprocessor setting. The paper [9] shows that HDF is scalable in the multiprocessor setting. The paper [12] shows that scalable nonclairvoyant randomized algorithms exist that have the immediate dispatch property; that is, they assign a job to a processor as soon as the job arrives.

3. Definitions and preliminaries. We assume a collection of jobs J = J1, . . . , Jn. For Ji, the release time is denoted by ri, the volume/size by pi, and the weight by wi. The density of a job Ji is wi/pi. Our analyses will require the technical restriction that the weights are positive integers.

A schedule specifies for each time a job run at that time. If a speed s processor works on a job Ji for an amount of time t, then the unfinished volume of Ji is reduced by s · t. Note that we allow preemption; that is, a job can be interrupted arbitrarily in time and resumed later from the point of interruption. We use pi(t) to denote the remaining/unfinished volume of job Ji at time t, and xi(t) = pi − pi(t) to denote the work performed on Ji by time t. The completion time Ci(S) of a job Ji in a schedule S is the first time t after ri when pi(t) = 0. The flow time of Ji in S is Fi(S) = Ci(S) − ri. The stretch Si of Ji in S is Fi(S)/pi. If the schedule S is understood, it may be dropped from the notation. An online algorithm does not know about job Ji until time ri, at which time it learns the weight of Ji. A clairvoyant algorithm A learns pi at time ri. A nonclairvoyant algorithm A learns the size of a job only when it meets its service requirement and leaves the system. In particular, at any time t the algorithm only knows a lower bound on pi equal to xi(t).

We use As(I) to denote the schedule output by the algorithm A, with a speed s processor, on the input I. For an objective function X, we use X(As(I)) to denote the value of the objective function X on the schedule As(I). On some occasions, particularly when the speed s is typographically complex, it is more convenient to use the notation X(A(I), s) instead. Some objective functions that we consider are F = Σ_i Fi, F^p = Σ_i Fi^p, S = Σ_i Si, S^p = Σ_i Si^p, WF = Σ_i wi Fi, and WF^p = Σ_i wi Fi^p. The ℓp norm of flow is (Σ_i Fi^p)^{1/p}, the ℓp norm of stretch is (Σ_i Si^p)^{1/p}, and the ℓp norm of weighted flow is (Σ_i wi Fi^p)^{1/p}. Thus an algorithm being c-competitive for objective F^p is equivalent to being c^{1/p}-competitive for the ℓp norm of flow, and a similar relationship holds for stretch and weighted flow. We use Opt to denote the optimal schedule for the objective function under consideration. So F^p(Opts(I)) denotes the value of the F^p objective function on the optimal schedule for this objective function. When the objective is understood from the context, we will drop it from the notation.

A job Ji is unfinished in a schedule S at time t if ri ≤ t ≤ Ci(S), and it has age t − ri at this time. We use U(S, t) to denote the collection of unfinished jobs at time t in the schedule S. For a schedule S, we use Age^p(S, t) to denote the sum over all jobs Ji ∈ U(S, t) of (t − ri)^{p−1}. For a schedule S, we use SAge^p(S, t) to denote the sum over all jobs Ji ∈ U(S, t) of (t − ri)^{p−1}/pi^p. Note that F^p(S) = p ∫_t Age^p(S, t) dt. That is, if at every time step t each unfinished job pays an amount equal to its age raised to the (p − 1)st power, then over the lifetime of any job, the total amount paid by that job is exactly (1/p) Fi^p. Thus, the total payment made by all the jobs is proportional to F^p. Thus to show that an algorithm A_{1+ε} is O(1)-competitive for the ℓp norm of flow, it is sufficient to show that at any time t the following local competitiveness condition holds:

(1) Age^p(U(A_{1+ε}, t), t) = O(1/ε^p) · Age^p(U(Opt_1, t), t).

Similarly, to show that an algorithm A_{1+ε} is O(1)-competitive for the ℓp norm of stretch, it is sufficient to show that at any time t the following local competitiveness condition holds:

(2) SAge^p(U(A_{1+ε}, t), t) = O(1/ε^p) · SAge^p(U(Opt_1, t), t).
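The payment identity F^p(S) = p ∫ Age^p(S, t) dt follows because ∫_0^{F} a^{p−1} da = F^p/p for each job. It can be sanity-checked numerically; in the sketch below the (release, completion) pairs are invented for illustration:

```python
# Numeric sanity check of the identity F^p(S) = p * integral of
# Age^p(S, t) dt, via a fine Riemann sum over made-up (r_i, C_i) pairs.
p = 3
jobs = [(0.0, 2.0), (1.0, 4.5), (3.0, 3.5)]   # (release, completion)

direct = sum((c - r) ** p for r, c in jobs)    # F^p(S) = sum of F_i^p

dt, t, integral = 1e-4, 0.0, 0.0
while t < 5.0:
    # Age^p(S, t): each unfinished job contributes (t - r_i)^(p-1)
    integral += sum((t - r) ** (p - 1) for r, c in jobs if r <= t <= c) * dt
    t += dt

print(direct, p * integral)   # agree up to discretization error
```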

Let V(u, S, α) denote the unfinished volume at time u in the schedule S of those jobs Jj that satisfy the conditions in the list α. So, for example, V(ri, Opt_1, rj ≤ ri, pj ≤ pi) is the amount of volume that Opt_1 has unfinished on jobs Jj that arrived no later than ri, are not larger in size than Ji, and that Opt_1 has not finished by time ri. Similarly, we let P(u, S, α) denote the aggregate sizes of the unfinished jobs Jj at time u in the schedule S that satisfy the conditions in the list α. Finally, let Q(α) denote the aggregate size of the jobs Jj that satisfy the conditions in the list α. Note that, for simplicity of notation, we always implicitly use j as the local index to the jobs being described in this notation.

An oblivious adversary must specify the complete input before an online algorithm begins execution [8].

4. General lower bounds. In this section we show that there are no online algorithms that are O(1)-competitive with respect to the ℓp norms of flow and stretch.


Theorem 1. Let p > 1. The competitive ratio, with respect to the ℓp norm of flow time, of any randomized algorithm A against an oblivious adversary is Ω(n^{(p−1)/(p(3p−1))}). The competitive ratio, with respect to the ℓp norm of stretch, of any randomized algorithm A against an oblivious adversary is Ω(n^{(p−1)/(3p^2)}).

Proof. We use Yao's minimax principle for online cost minimization problems [8] and lower bound the expected value of the ratio of the objective functions on A and Opt on an input distribution which we specify. The inputs are parameterized by integers L, α, and β in the following manner. A long job of size L arrives at t = 0. From time 0 until time L^α − 1, a job of size 1 arrives every unit of time. With probability 1/2 this is all of the input. With probability 1/2, L^{α+β} short jobs of length 1/L^β arrive every 1/L^β time units from time L^α until 2L^α − 1/L^β.

We first consider the F^p objective. In this case, α = (p+1)/(p−1) and β = 2. We now compute F^p(A) and F^p(Opt). First consider the case that A does not finish the long job by time L^α. Then with probability 1/2 the input contains no short jobs. Then F^p(A) is at least the flow of the long job raised to the pth power, which is at least L^{αp}. In this case the adversary could first process the long job and then process the unit jobs. Hence, F^p(Opt) = O(L^p + L^α · L^p) = O(L^{α+p}). The competitive ratio is then Ω(L^{αp−α−p}), which is Ω(L) by our choice of α.

Now consider the case that A finishes the long job by time L^α. Then with probability 1/2 the input contains short jobs. One strategy for the adversary is to finish all jobs, except for the long job, when they are released. Then F^p(Opt) = O(L^α · 1^p + L^{α+β} · (1/L^β)^p + L^{αp}). It is obvious that the dominant term is L^{αp}, and hence F^p(Opt) = O(L^{αp}). Now consider the subcase that A has at least L/2 unit jobs unfinished by time 3L^α/2. Since these unfinished unit jobs must have been delayed by at least L^α/2, F^p(A) = Ω(L · L^{αp}). Clearly in this subcase the competitive ratio is Ω(L). Alternatively, consider the subcase that A has at most L/2 unit jobs unfinished by time 3L^α/2. Then, as the total unfinished volume at time 3L^α/2 is L, A has at least L/2 unfinished volume in small jobs at time 3L^α/2. By the convexity of F^p when restricted to jobs of size 1/L^β (we can gift A the completion of the unit jobs), the optimal strategy for A from time 3L^α/2 onward is to delay each small job by the same amount. Thus A delays L^{α+β}/2 short jobs by at least L/2. Hence in this case, F^p(A) = Ω(L^{α+β} · L^p). This gives a competitive ratio of Ω(L^{α+β+p−αp}), which by the choice of β is Ω(L). As the total number of jobs in the instance is O(L^{α+β}), the competitive ratio of any online algorithm with respect to the measure F^p is Ω(L) = Ω(n^{1/(α+β)}) = Ω(n^{(p−1)/(3p−1)}), and hence Ω(n^{(p−1)/(p(3p−1))}) with respect to the ℓp norm of flow time.
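The exponent algebra in the flow-time case can be sanity-checked mechanically, with α = (p+1)/(p−1) and β = 2 as above; both exponents in the two subcases should reduce to 1, giving a ratio of Ω(L):

```python
# Check the exponent identities used in the F^p lower bound:
# alpha*p - alpha - p = 1 and alpha + beta + p - alpha*p = 1
# for alpha = (p+1)/(p-1), beta = 2, using exact rational arithmetic.
from fractions import Fraction

def exponents(p):
    p = Fraction(p)
    alpha, beta = (p + 1) / (p - 1), Fraction(2)
    return (alpha * p - alpha - p, alpha + beta + p - alpha * p)

for q in (2, 3, 4, 7):
    assert exponents(q) == (1, 1)
print("exponent checks pass for p = 2, 3, 4, 7")
```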

We now consider the S^p objective. In this case, α = (2p+1)/(p−1) and β = 1. We now compute S^p(A) and S^p(Opt). First consider the case that A does not finish the long job by time L^α. Then with probability 1/2 the input contains no short jobs. Then S^p(A) is at least the stretch of the long job raised to the pth power, which is at least (L^α/L)^p = L^{p(α−1)}. In this case the adversary could first process the long job and then process the unit jobs. Hence, S^p(Opt) = O(1 + L^α · L^p) = O(L^{α+p}). The competitive ratio is Ω(L^{αp−α−2p}), which is Ω(L) by our choice of α.

Now consider the case that A finishes the long job by time L^α. Then with probability 1/2 the input contains short jobs. One strategy for the adversary is to finish all jobs, except for the long job, when they are released. Then S^p(Opt) = O(L^α · 1^p + L^{α+β} · 1^p + (L^α/L)^p). Algebraic simplification shows that α + β ≤ αp − p for our choice of α and β. Hence the dominant term is L^{αp−p}, and S^p(Opt) = O(L^{αp−p}). Now consider the subcase that A has at least L/2 unit jobs unfinished at time

3L^α/2. Since these unfinished unit jobs must have been delayed by at least L^α/2, S^p(A) = Ω(L · L^{αp}). Clearly in this subcase the competitive ratio is Ω(L^{p+1}). Alternatively, consider the subcase that A has at most L/2 unit jobs unfinished by time 3L^α/2. Then A has at least L/2 unfinished volume in small jobs at time 3L^α/2. By the convexity of S^p when restricted to jobs of size 1/L^β, the optimal strategy for A from time 3L^α/2 onward always delays each small job by the same amount (we can gift A the completion of the unit jobs at time 3L^α/2). Thus A delays L^{α+β}/2 short jobs by at least L/2. Hence in this case, S^p(A) = Ω(L^{α+β} · L^{(β+1)p}). This gives a competitive ratio of Ω(L^{α+β+βp+2p−αp}). By the choice of α and β, this gives a competitive ratio of Ω(L^{2p+1}).

Thus the overall competitive ratio is at least Ω(L), and noting that n = O(L^{α+β}), the desired result follows.

It is easy to see that there is no O(1)-speed, O(1)-competitive nonclairvoyant algorithm for S^p, 1 < p < ∞, using the input instance from [17]. In this instance, there are n jobs with sizes 1, 2, 4, 8, . . . , B = 2^n, all of which arrive at time 0. By scheduling the jobs from shortest to longest, it is easy to see that each job has a stretch of at most 2, and hence S^p(Opt) = O(n). For any nonclairvoyant (possibly randomized) algorithm A with a speed s ≥ 1 processor, it is not hard to see that the job of size 2^i incurs a stretch of at least Ω(n − i)/s, as it is indistinguishable from jobs of size greater than 2^i until those jobs receive a service of 2^i; a formal argument can be found in [3]. Thus, for any nonclairvoyant algorithm A, S^p(A, s) = Ω(n^{p+1}/s^p), which implies a lower bound of Ω(n) on the competitive ratio of any nonclairvoyant algorithm, even with an O(1) speed, for all ℓp norms of stretch. Note that in the above instance B = O(2^n), and hence this also implies a lower bound of Ω(log B).
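The gap on this instance is easy to see by direct simulation. The sketch below computes stretches under shortest-first and under an idealized Round Robin (continuous equal processor sharing); both implementations are our own straightforward renderings, and they assume all jobs arrive at time 0 with distinct sizes:

```python
# Jobs of sizes 1, 2, 4, ..., 2^(n-1), all released at time 0.
# Shortest-first gives every job stretch < 2, while Round Robin gives
# the smallest job stretch n, matching the Omega(n) discussion above.
def stretches_shortest_first(sizes):
    t, out = 0, []
    for s in sorted(sizes):
        t += s                  # completion time under shortest-first
        out.append(t / s)       # stretch = flow / size (release time 0)
    return out

def stretches_round_robin(sizes):
    """Idealized RR: k remaining jobs each run at rate 1/k."""
    jobs = sorted(sizes)                  # assumes distinct sizes
    remaining, t, served, comp = list(jobs), 0.0, 0.0, {}
    while remaining:
        k = len(remaining)
        t += (remaining[0] - served) * k  # time until smallest job ends
        served = remaining.pop(0)         # work received by each survivor
        comp[served] = t
    return [comp[s] / s for s in jobs]

sizes = [2 ** i for i in range(10)]
print(max(stretches_shortest_first(sizes)))   # below 2
print(stretches_round_robin(sizes)[0])        # n = 10 for the size-1 job
```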

5. Analysis of SJF. This section is devoted to proving the following theorem.

Theorem 2. SJF is (1 + ε)-speed, O(1/ε)-competitive for the ℓp norms of flow time and stretch for p ≥ 1.

Recall that SJF always runs a job of minimal size. We will fix an input, some 1-speed processor schedule Opt_1 for this instance, and a time t, and then argue that SJF satisfies the local competitiveness conditions (1) and (2) at this time t.

Let D denote U(SJF_{1+ε}, t) \ U(Opt_1, t). An association scheme maps each Ji ∈ D to an εpi/(4(1 + ε)) amount of volume from V(t, Opt_1, rj ≤ t − (ε/(4(1 + ε)))(t − ri), pj ≤ pi) so that no amount of volume is mapped to by more than one job in D. We show in Lemma 3 that the existence of an association scheme implies that the local competitiveness conditions hold. Lemma 4 proves an invariant that we then use in Lemma 5 to show that an association scheme exists.

Lemma 3. If an association scheme exists, then the local competitiveness conditions (1) and (2) hold at time t.

Proof. Consider an arbitrary job Ji ∈ D. Suppose the association scheme maps Ji to vσ(j) volume from job Jσ(j) in Opt1 for j = 1, 2, …, |U(Opt1, t)|. The properties of the association scheme guarantee that Σ_j vσ(j) ≥ εpi/(4(1+ε)). Moreover, each job Jσ(j) with vσ(j) > 0 has size pσ(j) at most pi and age at least ε/(4(1+ε)) times the age of Ji. For each j = 1, …, |U(Opt1, t)|, we can view Ji as being associated with a vσ(j)/pσ(j) fraction of Jσ(j). Since Σ_j vσ(j) ≥ εpi/(4(1+ε)) and pσ(j) ≤ pi whenever vσ(j) > 0, this implies that Ji is associated with at least a

(3) Σ_{j : vσ(j) > 0} vσ(j)/pσ(j) ≥ Σ_{j : vσ(j) > 0} vσ(j)/pi ≥ (εpi/(4(1+ε))) · (1/pi) = ε/(4(1+ε))

fraction of jobs. Since each job Jσ(j) with vσ(j) > 0 has age at least ε/(4(1+ε)) times the age of Ji, the value of Agep for Jσ(j) is Ω(ε^{p−1}) times the value of Agep of Ji. The local competitiveness condition for flow (1) thus follows together with (3) and summing up over all jobs in D.

The argument for stretch is very similar. Instead of Agep above, we consider SAgep. The key observation is that the association scheme associates job Ji ∈ D with a volume of jobs of no larger size, and age at least Ω(ε) times that of Ji. Hence SAgep of each of these jobs is Ω(ε^{p−1}) times the value of SAgep of Ji.

Lemma 4. Given any arbitrary scheduling instance J, let Opt1 denote any schedule using a speed 1 processor. Then for all times u and for all values p, the following inequality holds:

V(u, Opt1, pj ≤ p) ≥ (ε/(1+ε)) P(u, SJF1+ε, pj ≤ p) + (1/(1+ε)) V(u, SJF1+ε, pj ≤ p).

Proof. The proof is by induction on the events that might happen at time u. First, suppose that SJF1+ε is working on some unfinished job Jj with pj ≤ p. Then the right side decreases at least as fast as the left side, since SJF1+ε has a (1+ε)-speed processor while the left side can decrease at a rate of at most 1. Moreover, in the event that job Jj finishes under SJF1+ε, this only helps the inequality (as Jj ceases to contribute to the first term on the right side). On the other hand, if there is no job Jj with size ≤ p under SJF1+ε, then the right side is zero and the inequality is trivially satisfied.

In the case that Opt1 finishes a job, this does not affect the inequality, as the left side varies continuously with time. The only remaining event is that a new job Jj with pj ≤ p arrives at time u. In this case, both sides increase by exactly pj (the right side by (ε/(1+ε))pj + (1/(1+ε))pj = pj), which implies the desired result.

Lemma 5. An association scheme exists.

Proof. We give a procedure for constructing an association scheme and prove its correctness. Let 1, …, k denote the indices of jobs in D such that p1 ≤ p2 ≤ ⋯ ≤ pk, and consider them in this order. Assume that we are considering job Ji. Let t′ denote the time t − ε(t − ri)/(4(1+ε)). Associate with Ji an εpi/(4(1+ε)) amount of volume taken arbitrarily from V(t, Opt1, rj ≤ t′, pj ≤ pi), provided that it has not been previously associated with a lower indexed job in D.

Consider some fixed job Ji ∈ D. To show that this procedure does not fail on job Ji, it clearly suffices to show that V(t, Opt1, pj ≤ pi, rj ≤ t′) contains enough volume to associate all the jobs J1, …, Ji; that is,

(4) V(t, Opt1, pj ≤ pi, rj ≤ t′) ≥ (ε/(4(1+ε))) Q(Jj ∈ D, pj ≤ pi).

We first upper bound the right side. Clearly, Q(Jj ∈ D, pj ≤ pi) can be written as Q(Jj ∈ D, rj ≤ ri, pj ≤ pi) + Q(Jj ∈ D, rj > ri, pj ≤ pi).

Since Opt1 has to finish all jobs in D that are released after ri by time t, the latter term can be upper bounded as

(5) (t − ri) ≥ Q(Jj ∈ D, rj > ri, pj ≤ pi).

To upper bound the first term, we note that any job unfinished at time t under SJF1+ε that arrived before time ri must also be unfinished at time ri. Thus,

(6) Q(Jj ∈ D, rj ≤ ri, pj ≤ pi) ≤ P(ri, SJF1+ε, pj ≤ pi).

Combining (5) and (6), we have

(7) Q(Jj ∈ D, pj ≤ pi) ≤ P(ri, SJF1+ε, pj ≤ pi) + (t − ri).

We now lower bound the left side of (4). Since Opt1 can do at most (t − ri) work during [ri, t], the unfinished volume at time t, composed of jobs that arrived by t′, must be at least the unfinished volume at ri, plus the new volume that arrived during (ri, t′], minus (t − ri). Applying this argument for jobs of size at most pi, we get

(8) V(t, Opt1, pj ≤ pi, rj ≤ t′) ≥ V(ri, Opt1, pj ≤ pi) + Q(rj ∈ (ri, t′], pj ≤ pi) − (t − ri).

By Lemma 4 with u = ri and p = pi, the first term on the right side of (8) can be bounded as

(9) V(ri, Opt1, pj ≤ pi) ≥ (ε/(1+ε)) P(ri, SJF1+ε, pj ≤ pi) + (1/(1+ε)) V(ri, SJF1+ε, pj ≤ pi).

Combining (8) and (9), we get that

(10) V(t, Opt1, pj ≤ pi, rj ≤ t′)
  ≥ (ε/(1+ε)) P(ri, SJF1+ε, pj ≤ pi) + (1/(1+ε)) V(ri, SJF1+ε, pj ≤ pi) + Q(rj ∈ (ri, t′], pj ≤ pi) − (t − ri)
  ≥ (ε/(4(1+ε))) P(ri, SJF1+ε, pj ≤ pi) + ((4+3ε)/(4(1+ε))) V(ri, SJF1+ε, pj ≤ pi) + Q(rj ∈ (ri, t′], pj ≤ pi) − (t − ri),

where the second inequality follows by noting that P(ri, SJF1+ε, pj ≤ pi) ≥ V(ri, SJF1+ε, pj ≤ pi).

Getting back to proving (4), it suffices to show that the right side of (10) exceeds ε/(4(1+ε)) times the right side of (7). By canceling the common terms, it suffices to show that

(11) ((4+3ε)/(4(1+ε))) V(ri, SJF1+ε, pj ≤ pi) + Q(rj ∈ (ri, t′], pj ≤ pi) ≥ ((4+5ε)/(4(1+ε))) (t − ri).

We now relate (t − ri) to the terms on the left side of this expression. As Ji is unfinished under SJF1+ε at time t, it must be the case that SJF1+ε was working on jobs of size ≤ pi during the entire interval [ri, t′]. These jobs of size ≤ pi that SJF1+ε worked on during [ri, t′] either had to arrive before ri or during (ri, t′], and it must be the case that

(12) V(ri, SJF1+ε, pj ≤ pi) + Q(rj ∈ (ri, t′], pj ≤ pi) ≥ (1+ε)(t′ − ri) = ((4+3ε)/4)(t − ri).

The equality above follows by our choice of t′ = t − ε(t − ri)/(4(1+ε)). Multiplying (12) by (4+5ε)/((1+ε)(4+3ε)), we obtain that

(13) ((4+5ε)/((1+ε)(4+3ε))) · (V(ri, SJF1+ε, pj ≤ pi) + Q(rj ∈ (ri, t′], pj ≤ pi)) ≥ ((4+5ε)/(4(1+ε))) (t − ri).

Now, (11) follows by observing that the right side of (13) is the same as that of (11), while the coefficient of each term on the left side of (11) is at least the corresponding coefficient on the left side of (13).
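The closing coefficient comparison reduces to the elementary inequalities (4+3ε)² ≥ 4(4+5ε) and (1+ε)(4+3ε) ≥ 4+5ε; here is our own quick numeric sanity check of it:

```python
# Our own numeric sanity check of the closing comparison between (11) and
# (13): for eps in (0, 1], the coefficients (4+3*eps)/(4*(1+eps)) and 1 on
# the left side of (11) dominate the common coefficient of (13).
for k in range(1, 101):
    eps = k / 100
    c13 = (4 + 5 * eps) / ((1 + eps) * (4 + 3 * eps))
    assert (4 + 3 * eps) / (4 * (1 + eps)) >= c13  # coefficient of V
    assert 1.0 >= c13                              # coefficient of Q
```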

6. Analysis of SRPT. This section is devoted to proving the following theorem.

Theorem 6. SRPT is (1+ε)-speed, O(1/ε)-competitive for the ℓp norms of flow time and stretch for p ≥ 1.

Recall that SRPT always runs a job with the least remaining unfinished volume. We will fix an input, some 1-speed processor schedule Opt1, and a time t, and then argue that SRPT satisfies the local competitiveness conditions (1) and (2) at this time t. The overall structure of the analysis of SRPT is similar to that of SJF, but the analysis is somewhat more involved because we need to handle the remaining volume under SRPT more carefully.
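As with SJF above, a minimal discrete-time sketch (ours, not from the paper) makes the policy concrete; the only change relative to the SJF sketch is the selection key.

```python
# A minimal discrete-time sketch (ours, not from the paper) of SRPT: the only
# change relative to SJF is the selection key -- least *remaining* volume
# rather than least original size.
def srpt_flow_times(jobs, speed=1.0, dt=0.25):
    remaining = {i: size for i, (_, size) in enumerate(jobs)}
    done, t = {}, 0.0
    while len(done) < len(jobs):
        alive = [i for i, (r, _) in enumerate(jobs) if r <= t and i not in done]
        if alive:
            j = min(alive, key=lambda i: remaining[i])  # least remaining work
            remaining[j] -= speed * dt
            if remaining[j] <= 1e-9:
                done[j] = t + dt - jobs[j][0]
        t += dt
    return [done[i] for i in range(len(jobs))]

# By the time the size-1 job arrives, the size-2 job has only 0.5 remaining,
# so SRPT (unlike SJF) lets the long job finish first:
flows = srpt_flow_times([(0.0, 2.0), (1.5, 1.0)])
```

Here the flows come out to [2.0, 1.5], illustrating why the analysis must track remaining volume rather than original size.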

We define D to be the collection of jobs Ji ∈ U(SRPT1+ε, t) \ U(Opt1, t) with the additional condition that the release time and size satisfy (t − ri) ≥ 8pi/ε. Intuitively, this additional condition guarantees that the jobs in D must be sufficiently old relative to their size. An association scheme maps each Ji ∈ D to an εpi/(4(1+ε)) amount of volume from V(t, Opt1, rj ≤ t − ε(t − ri)/(4(1+ε)), pj ≤ pi) so that no amount of volume is mapped to by more than one job.

We show in Lemma 7 that the existence of an association scheme implies that the local competitiveness conditions hold. Lemma 8 proves an invariant that we then use in Lemma 9 to show that an association scheme exists.

Lemma 7. If an association scheme exists, then the local competitiveness conditions hold at time t.

Proof. By the definition of D, any job Ji in U(SRPT1+ε, t) that does not lie in D either must lie in U(Opt1, t) or must have age no more than 8pi/ε. Thus, we can bound Fp(SRPT1+ε) as follows:

(14) Fp(SRPT1+ε) = p ∫_t Agep(SRPT1+ε, t) dt
(15)  = O( ∫_t Agep(D, t) dt + ∫_t Agep(Opt1, t) dt + Σ_{i=1}^{n} ∫_{x=0}^{8pi/ε} x^{p−1} dx )
(16)  = O( ∫_t Agep(D, t) dt + Fp(Opt1) + Σ_{i=1}^{n} (8pi/ε)^p )
(17)  = O( ∫_t Agep(D, t) dt + Fp(Opt1) + (1/ε^p) Fp(Opt1) )
(18)  = O( ∫_t Agep(D, t) dt + (1/ε^p) Fp(Opt1) )
(19)  = O( (1/ε^p) Fp(Opt1) ).

Line (14) is from the definition of Fp. Line (15) follows since ∫_{x=0}^{8pi/ε} x^{p−1} dx is the maximum contribution to ∫_t Agep(SRPT1+ε, t) dt that job Ji can make before it reaches age 8pi/ε. Line (16) follows from the definition of Fp(Opt1). Line (17) follows since the flow time of Ji is at least pi in Opt1. Line (19) follows by reasoning identical to that used in the proof of Theorem 2 to show that ∫_t Agep(D, t) dt = O((1/ε^p) Fp(Opt1)).
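The step from line (15) to line (16) uses the elementary identity p · ∫₀^T x^{p−1} dx = T^p, here with T = 8pi/ε; a quick numeric check of the identity (ours):

```python
# Our own check of the identity used between lines (15) and (16):
# p * integral_0^T x**(p-1) dx = T**p (midpoint-rule approximation).
def p_times_power_integral(T, p, steps=100000):
    h = T / steps
    return p * sum(((i + 0.5) * h) ** (p - 1) * h for i in range(steps))

T, p = 3.0, 3
approx = p_times_power_integral(T, p)
assert abs(approx - T ** p) < 1e-4 * T ** p
```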

The result for stretch also follows directly by observing that Ji is associated only with volume composed of jobs with size no more than pi, and the contribution of these jobs to stretch is at least as large.


Lemma 8. For all times u and values of p, and for any 1-speed processor schedule Opt1, the following inequality holds:

V(u, Opt1, pj ≤ p) ≥ (ε/(1+ε)) P(u, SRPT1+ε, pj ≤ p) + (1/(1+ε)) V(u, SRPT1+ε, pj ≤ p) − (1/(1+ε)) p.

Proof. The proof is by induction on the events that may happen at time u. First we note that the cases of job arrivals and of either SRPT1+ε or Opt1 finishing a job are quite easy and are handled exactly as in the proof of Lemma 4. Thus it suffices to consider the event when jobs are being worked on. We consider three cases.

First, if SRPT1+ε is running a job with size ≤ p, then since SRPT1+ε has a (1+ε)-speed processor, the right side of the inequality decreases at rate 1, which is at least as fast as the rate at which the left side can possibly decrease, and hence the inequality holds inductively.

Second, if SRPT1+ε is running a job with remaining volume > p, this must mean that there is no job alive of size ≤ p, and hence the right side of the inequality is −p/(1+ε), while the left side is nonnegative.

We now consider the final case, when SRPT1+ε is running a job Jl with size > p and remaining volume ≤ p. In fact, in this case we will show that the following stronger inequality holds:

(20) V(u, Opt1, pj ≤ p) ≥ (1/(1+ε)) P(u, SRPT1+ε, pj ≤ p) − (1/(1+ε)) p.

Note that this inequality is stronger than the inequality that we want to prove, since the size of a job is at least as large as its remaining volume. Let u′ ≤ u be the last time when SRPT1+ε was running Jl and its remaining volume just reached exactly p. Even if Opt1 was working on jobs of size at most p during the entire interval [u′, u], we have

(21) V(u, Opt1, pj ≤ p) ≥ V(u′, Opt1, pj ≤ p) + Q(pj ≤ p, rj ∈ [u′, u]) − (u − u′) ≥ Q(pj ≤ p, rj ∈ [u′, u]) − (u − u′).

Now consider the SRPT1+ε schedule. Note that, since SRPT1+ε was running Jl at time u′, no other jobs of size ≤ p were present under SRPT1+ε. Moreover, as Jl is still alive at time u, SRPT1+ε must have been working on jobs of remaining volume at most p during [u′, u]. Also, since Jl is being run at time u (by the nature of the SRPT scheduling algorithm), it must be that any job Jj of size ≤ p that arrived during [u′, u] either is already completed or has not been executed at all; i.e., pj(u) = pj. Using these observations, we have the following chain of equalities:

P(u, SRPT1+ε, pj ≤ p) = V(u, SRPT1+ε, pj ≤ p)
  = V(u′, SRPT1+ε, pj(u′) ≤ p) + Q(pj ≤ p, rj ∈ [u′, u]) − (1+ε)(u − u′)
  = p + Q(pj ≤ p, rj ∈ [u′, u]) − (1+ε)(u − u′),

which implies that

u − u′ = (1/(1+ε)) (p + Q(pj ≤ p, rj ∈ [u′, u]) − P(u, SRPT1+ε, pj ≤ p)).

Combining this with (21), we get that

V(u, Opt1, pj ≤ p) ≥ Q(pj ≤ p, rj ∈ [u′, u]) − (1/(1+ε)) (p + Q(pj ≤ p, rj ∈ [u′, u]) − P(u, SRPT1+ε, pj ≤ p))

or, equivalently,

V(u, Opt1, pj ≤ p) ≥ (ε/(1+ε)) Q(pj ≤ p, rj ∈ [u′, u]) − (1/(1+ε)) p + (1/(1+ε)) P(u, SRPT1+ε, pj ≤ p).

Since Q(pj ≤ p, rj ∈ [u′, u]) is nonnegative, we know that

V(u, Opt1, pj ≤ p) ≥ (1/(1+ε)) P(u, SRPT1+ε, pj ≤ p) − (1/(1+ε)) p.

This then implies (20).

Lemma 9. An association scheme exists.

Proof. To construct the association scheme, we consider the jobs in D in nondecreasing order of their sizes and associate job Ji with any εpi/(4(1+ε)) units of volume in V(t, Opt1, rj ≤ t′, pj ≤ pi) that have not been associated previously with a lower indexed job in D. This construction will clearly not fail on job Ji if V(t, Opt1, rj ≤ t′, pj ≤ pi) contains enough volume to associate all the jobs J1, …, Ji. Thus it is sufficient to prove

(22) V(t, Opt1, rj ≤ t′, pj ≤ pi) ≥ (ε/(4(1+ε))) Q(Jj ∈ D, pj ≤ pi).

By an argument identical to that used to obtain (7) (except that SJF1+ε is replaced by SRPT1+ε), we can upper bound the right side of (22) by

(23) Q(Jj ∈ D, pj ≤ pi) ≤ P(ri, SRPT1+ε, pj ≤ pi) + (t − ri).

By the argument analogous to that used for (8), we can lower bound the left side of (22) as

(24) V(t, Opt1, pj ≤ pi, rj ≤ t′) ≥ V(ri, Opt1, pj ≤ pi) + Q(rj ∈ (ri, t′], pj ≤ pi) − (t − ri).

Applying Lemma 8 with u = ri and p = pi, the first term on the right side of (24) can be bounded as

(25) V(ri, Opt1, pj ≤ pi) ≥ (ε/(1+ε)) P(ri, SRPT1+ε, pj ≤ pi) + (1/(1+ε)) V(ri, SRPT1+ε, pj ≤ pi) − (1/(1+ε)) pi.

Getting back to proving (22), it suffices to show that the right side of (24), with its first term bounded via (25), exceeds ε/(4(1+ε)) times the right side of (23). Thus by canceling the common terms, it is sufficient to prove

(26) ((4+3ε)/(4(1+ε))) V(ri, SRPT1+ε, pj ≤ pi) + Q(rj ∈ (ri, t′], pj ≤ pi) − (1/(1+ε)) pi ≥ ((4+5ε)/(4(1+ε))) (t − ri).

We now focus on computing a suitable upper bound on the term (t − ri). Since SRPT1+ε did not finish Ji by time t, during the time period (ri, t′] it must have been the case that SRPT1+ε was running only jobs with remaining volume at most pi; since SRPT1+ε has a (1+ε)-speed processor,

(27) (1+ε)(t′ − ri) ≤ V(ri, SRPT1+ε, pj(ri) ≤ pi) + Q(rj ∈ (ri, t′], pj ≤ pi).

By the nature of SRPT, at any time u and for any volume v, there can be at most one job that has size at least v and remaining volume less than v. By applying this argument with v = pi, we obtain that

(28) V(ri, SRPT1+ε, pj(ri) ≤ pi) ≤ V(ri, SRPT1+ε, pj ≤ pi) + pi.

By our choice of t′ = t − ε(t − ri)/(4(1+ε)), it follows that (1+ε)(t′ − ri) = ((4+3ε)/4)(t − ri). Together with (27) and (28), this implies

(29) ((4+3ε)/4)(t − ri) ≤ V(ri, SRPT1+ε, pj ≤ pi) + Q(rj ∈ (ri, t′], pj ≤ pi) + pi.

Since Ji ∈ D, the age (t − ri) of Ji is at least 8pi/ε or, equivalently, pi ≤ (ε/8)(t − ri). By subtracting (2ε/8)(t − ri) from the left side of (29) and subtracting 2pi from the right side of (29), we get that

(30) ((4+2ε)/4)(t − ri) ≤ V(ri, SRPT1+ε, pj ≤ pi) + Q(rj ∈ (ri, t′], pj ≤ pi) − pi.

Multiplying (30) by (4+5ε)/((1+ε)(4+2ε)), we obtain that

(31) ((4+5ε)/((1+ε)(4+2ε))) · (V(ri, SRPT1+ε, pj ≤ pi) + Q(rj ∈ (ri, t′], pj ≤ pi) − pi) ≥ ((4+5ε)/(4(1+ε))) (t − ri).

Now, (26) follows by observing that the right side of (31) is the same as that of (26), while the coefficient of each positive (respectively, negative) term on the left side of (26) is larger (respectively, smaller) than the corresponding coefficient on the left side of (31).
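As for the SJF case, the final coefficient comparison is elementary arithmetic; here is our own quick numeric sanity check of it:

```python
# Our own numeric sanity check of the closing comparison between (26) and
# (31): the positive coefficients of (26) are at least, and the magnitude of
# its negative coefficient at most, the common coefficient of (31).
for k in range(1, 101):
    eps = k / 100
    c31 = (4 + 5 * eps) / ((1 + eps) * (4 + 2 * eps))
    assert (4 + 3 * eps) / (4 * (1 + eps)) >= c31  # coefficient of V
    assert 1.0 >= c31                              # coefficient of Q
    assert 1.0 / (1 + eps) <= c31                  # coefficient of -p_i
```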

7. Analysis of SETF. We now consider the analysis of SETF. Recall that SETF always runs a job Jj that has been run the least, that is, for which xj(t) is minimal. We first show that SETF is a (1+ε)-speed, O(1/ε^{2+2/p})-competitive algorithm for minimizing the ℓp norm of flow time. We then show that SETF is (1+ε)-speed, O(log² B)-competitive for minimizing the ℓp norm of stretch, where B is the ratio of the maximum to minimum job size.
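To make the policy concrete, here is a minimal discrete-time sketch (ours, not from the paper) of SETF in the same style as the SJF and SRPT sketches above.

```python
# A minimal discrete-time sketch (ours, not from the paper) of SETF: the
# processor always runs the alive job with the least attained service x_j(t).
def setf_flow_times(jobs, speed=1.0, dt=0.25):
    attained = {i: 0.0 for i in range(len(jobs))}
    done, t = {}, 0.0
    while len(done) < len(jobs):
        alive = [i for i, (r, _) in enumerate(jobs) if r <= t and i not in done]
        if alive:
            j = min(alive, key=lambda i: attained[i])  # least service so far
            attained[j] += speed * dt
            if attained[j] >= jobs[j][1] - 1e-9:
                done[j] = t + dt - jobs[j][0]
        t += dt
    return [done[i] for i in range(len(jobs))]

# A late-arriving short job has attained service 0 and is run immediately,
# even though SETF never learns the job sizes (it is nonclairvoyant):
flows = setf_flow_times([(0.0, 2.0), (1.0, 0.5)])
```

On this instance the flows come out to [2.5, 0.5]: the fresh short job preempts the partially served long one.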

To analyze the performance of SETF1+ε on an input I, we perform a series of transformations, where at each stage we possibly require an additional speed-up of (1+ε). This sequence of transformations gives a fairly general way to relate the performance of SETF to the performance of SJF. We begin by defining an intermediate policy MLF and show that the performances of SETF and MLF are related (modulo a factor of (1+ε) in the speed-up). We then transform the input I slightly to an instance ˜I and show that the performance of MLF on ˜I is related to the performance of MLF on I. The crucial property of the instance ˜I is that the performance of MLF on ˜I can be viewed as the performance of SJF on another input instance L; this allows us to relate MLF(I) to SJF(L). The final step, and the heart of the analysis, is a general scheme that relates the performance of SJF on L to the performance of SJF on I. This gives us a way to relate the performances of SETF(I) and SJF(I). Finally, the results about the competitiveness of SETF for flow time and stretch follow from those about the competitiveness of SJF.

Our MLF is a variation of the standard Multilevel Feedback Queue algorithm [13, 28], where the quanta for the queues are set as a function of ε. Let ℓi = ε((1+ε)^i − 1), i ≥ 0, and let qi = ℓi+1 − ℓi = ε²(1+ε)^i, i ≥ 0. In MLF a job Jj is in level k at time t if ℓk ≤ xj(t) < ℓk+1. The number of levels L is O(log1+ε(B)). MLF maintains the invariant that it is always running the earliest arriving job in the smallest nonempty level. It is crucial to note that MLF is defined only for the purposes of analysis. One consequence (implicit in the choice of quanta above) is that we can assume without loss of generality that the smallest job size in the instance is exactly 1 and the largest job size is at most B. This assumption about the absolute sizes of jobs is made only so that we can fix the sizes of the quanta in MLF. In particular, SETF does not use this fact in any way.
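A small sketch (ours, assuming the ε-dependent quanta as reconstructed above) of the level structure:

```python
# Sketch (ours) of the MLF level structure: level i holds jobs whose attained
# service lies in [l(i), l(i+1)), with l(i) = eps*((1+eps)**i - 1) and
# quantum q(i) = l(i+1) - l(i) = eps**2 * (1+eps)**i.
def l(i, eps):
    return eps * ((1 + eps) ** i - 1)

def q(i, eps):
    return eps ** 2 * (1 + eps) ** i

def level_of(x, eps):
    # the level k with l(k) <= x < l(k+1), for attained service x >= 0
    k = 0
    while l(k + 1, eps) <= x:
        k += 1
    return k

eps = 0.5
assert all(abs(l(i + 1, eps) - l(i, eps) - q(i, eps)) < 1e-8 * (1 + q(i, eps))
           for i in range(30))
assert level_of(0.0, eps) == 0 and level_of(l(5, eps), eps) == 5
```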

Lemma 10. For any instance I and for any job Jj ∈ I, Jj completes in SETF1+ε before Jj completes in MLF1.

Proof. For a job with xj(t) ≤ z, let p≤z(j, t) = min(pj, z) − xj(t); that is, p≤z(j, t) is the amount of work that must be done on Jj until either Jj completes or Jj has been processed for z time units. Let U≤z(A, t) denote the set of unfinished jobs in algorithm A at time t that have less than z work done on them. Let V≤z(A, t) denote Σ_{Jj ∈ U≤z(A, t)} p≤z(j, t). Now, as SETF is always working on the job with the least work done, it is easy to see that for any time t and amount of work x, V≤x(SETFs, t) ≤ V≤x(As, t) for all algorithms A and all speeds s.

To reach a contradiction, suppose that there is some job Jj which completes earlier in MLF1 than in SETF1+ε. Clearly MLF1 must then finish the volume V≤pj(MLF1, rj) before time Cj(MLF1). Moreover, at time Cj(MLF1) all the jobs in U(MLF1, t) must be in a level k such that ℓk ≤ pj < ℓk+1 and hence have at least pj/(1+ε) amount of work done on them; otherwise Jj would not be run by MLF1 at time Cj(MLF1). Consider the schedule A1+ε that at all times t runs the job that MLF1 is running at time t (and idles if this job is already completed). Hence, A1+ε would have completed Jj, and all previously arriving jobs with processing time less than pj, by time Cj(MLF1), since A1+ε has a (1+ε)-speed processor. Hence, by the property of SETF from the previous paragraph, SETF1+ε completes Jj by time Cj(MLF1), which is our contradiction.

Lemma 10 implies that

(32) Fp(SETF(I), s) ≤ Fp(MLF(I), s/(1+ε)).

Thus we will focus henceforth on relating MLF and Opt. We begin by defining several instances that are derived from the original instance. Let the original instance be I. Let J be the instance obtained from I as follows. Consider a job Jj ∈ I and let i be the smallest integer such that pj + ε ≤ ε(1+ε)^i. The processing time of Jj in J is then ε(1+ε)^i. Let K be the instance obtained from J by decreasing each job size by ε. Thus, each job in K has size ℓk = ε((1+ε)^k − 1) for some k. Note that this ℓk is the same as the level boundary ℓk in the definition of MLF. Moreover, in this transformation from I to K, the size of a job does not decrease, and it increases by at most a factor of (1+ε)². This follows because in the worst case a job of size just above one rounding boundary is rounded up to the next. Similarly, in the transformation from I to J, the size of a job does not decrease, and it increases by a factor of at most (1+ε)². Since MLF has the property that increasing the length of a particular job will not decrease the completion time of any job, we can conclude that

(33) Fp(MLF(I), s/(1+ε)) ≤ Fp(MLF(K), s/(1+ε)).

Finally we create an instance L by replacing each job of size ε((1+ε)^k − 1) in K by k jobs of sizes q0, q1, …, qk−1, where qi = ℓi+1 − ℓi, as defined in the definition of MLF. Note that ℓk = q0 + q1 + ⋯ + qk−1. For a job Jj ∈ K, we denote the corresponding jobs in L by Jj,0, Jj,1, …, Jj,k−1, where job Jj,i has size qi.
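This replacement is size-preserving because the quanta telescope; a quick check of ours, using the reconstructed quanta:

```python
# Our own check that the construction of L is size-preserving: the quanta
# telescope, q(0) + ... + q(k-1) = l(k), so a job of size l(k) in K is split
# into k pieces of exactly the same total size.
eps = 0.25
l = lambda i: eps * ((1 + eps) ** i - 1)
q = lambda i: eps ** 2 * (1 + eps) ** i
for k in range(1, 40):
    total = sum(q(i) for i in range(k))
    assert abs(total - l(k)) < 1e-9 * (1 + l(k))
```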

A crucial observation about the execution of MLF on instance K is that it can be modeled by the execution of SJF on instance L. More precisely, for any time t and for any speed s, SJFs(L) is working on a job Jj,b ∈ L if and only if MLFs(K) is working on a job Jj ∈ K that is in level b at time t. In particular, this implies that the completion time of Jj in MLFs(K) is exactly the completion time of some job Jj,b ∈ SJFs(L). Hence,

(34) Fp(MLF(K), s/(1+ε)) ≤ Fp(SJF(L), s/(1+ε)).

By Theorem 2, we know that

(35) Fp(SJF(L), s/(1+ε)) = O(1/ε^p) Fp(Opt(L), s/(1+ε)²).

We need to relate the optimal schedule for L back to the optimal schedule for I. To do this we first relate L to J as follows. Let L(k) denote the instance obtained from J by multiplying each job size in J by ε/(1+ε)^k. Next, we remove from L(k) any job whose size is less than ε². We claim that L = L(1) ∪ L(2) ∪ ⋯. To see this, let us consider some job Jj ∈ J of size ε(1+ε)^i. Then, L(1) contains the corresponding job Jj(1) of size (ε/(1+ε)) · ε(1+ε)^i = ε²(1+ε)^{i−1} = qi−1. Similarly, L(2) contains the job Jj(2) of size qi−2, and so on. In particular, for Jj of size ε(1+ε)^i, the job Jj,k ∈ L is identical to the job Jj(i − k) ∈ L(i − k) for k = 0, …, i − 1. Summarizing, we have that the L(k)'s are geometrically scaled down copies of J and that L is exactly the union of these L(k)'s.

Our idea at an intuitive level is as follows: Given a good schedule of J, we first obtain a good schedule for L(k). This will be easy to do, as L(k) is a scaled down version of J. We will then superimpose the schedules for each L(k) to obtain a good schedule for L. This will give us a procedure for obtaining a good schedule for L given a good schedule for J. To put everything in place, we need the following straightforward lemma from [3], which relates J and L(k).

Lemma 11. Let J and L(k) be as defined above, let s ≥ 1 be some speed, and let x ≥ 1 be any real number. Then, for any job Ji ∈ J and the corresponding job Ji(k) ∈ L(k), the flow time and stretch under SJF are related as

F(Ji(k), SJF(L(k)), ε(1+ε)^{−k} · x · s) ≤ (1/x) F(Ji, SJF(J), s),
S(Ji(k), SJF(L(k)), ε(1+ε)^{−k} · x · s) ≤ ((1+ε)^k/(εx)) S(Ji, SJF(J), s).

Proof. We first show that for all jobs Jj ∈ J, we have that F(Jj, SJF(J), s) ≤ (1/s) F(Jj, SJF(J), 1). Let yj(x, s) denote the work done on job Jj, after x units of time since it arrived, under SJF using an s-speed processor. We will show the stronger invariant that, for all jobs Jj and all times t, it is the case that yj((t − rj)/s, s) ≥ yj(t − rj, 1), by using induction on the time t. Suppose that at time t, SJF1 works on job Jj. Then the invariant clearly continues to hold at time t for jobs besides Jj. Consider any job Ji smaller than job Jj that has been released by time t; by the definition of SJF, SJF1 has already completed Ji. Since the invariant for Ji holds at the time Ci < t at which SJF1 completes job Ji, we have that yi((Ci − ri)/s, s) ≥ yi(Ci − ri, 1) = pi. Therefore SJFs has completed job Ji by time ri + (Ci − ri)/s ≤ ri + (t − ri)/s ≤ t. Thus, by the definition of SJFs, at time t it either works on job Jj, and then the invariant for Jj is preserved, or it works on some bigger job, and then Jj is completed and the invariant for Jj holds trivially.

We now show the main result of the lemma. Since L(k) is obtained by scaling the jobs in J by ε/(1+ε)^k (and possibly dropping some jobs if they become smaller than ε²), the flow time of every job Jj(k) ∈ L(k) under SJF with a speed ε(1+ε)^{−k} processor is at most that of the corresponding job Jj ∈ J under SJF with a unit speed processor. Thus by the argument in the above paragraph, running SJF on L(k) with an x · ε(1+ε)^{−k}-speed processor yields a 1/x times smaller flow time for each job in L(k) than the corresponding job in J.

Finally, since the sizes of jobs in L(k) are scaled down by a factor of ε(1+ε)^{−k} relative to J, the result for stretch follows from the result for flow time.

We are now ready to show that a good schedule for L can be obtained from the SJF schedule for J using an additional speed-up of (1+2ε).

Lemma 12. Fp(Opt(L), 1+2ε) = O(1/ε²) Fp(SJF(J), 1).

Proof. Given the schedule SJF(J), we construct a schedule A for L as follows. For each k = 1, 2, …, A runs the jobs in L(k) with a speed sk = ε((1+ε)/(1+2ε))^k processor using the algorithm SJF. As L = ∪_{k≥1} L(k), this gives a schedule for L. Note that the total speed required by A is at most

Σ_{i=1}^∞ si = ε Σ_{i=1}^∞ ((1+ε)/(1+2ε))^i = ε · (1+ε)/ε = 1+ε ≤ 1+2ε.

By Lemma 11,

Fp(A(L(k)), sk) ≤ (ε(1+ε)^{−k}/sk)^p Fp(SJF(J), 1) = ((1+2ε)/(1+ε)²)^{kp} Fp(SJF(J), 1).

Hence,

Fp(A(L), 1+2ε)/Fp(SJF(J), 1) ≤ Σ_{i=1}^∞ ((1+2ε)/(1+ε)²)^{ip} ≤ Σ_{i=0}^∞ (1 − ε²/(1+ε)²)^i = O(1/ε²).

The proof then follows because Opt is at least as good as A.
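A numeric illustration of ours for the O(1/ε²) geometric sum at the end of the proof:

```python
# Our own numeric illustration of the O(1/eps**2) geometric sum in the
# proof: the ratio r = (1+2*eps)/(1+eps)**2 equals 1 - eps**2/(1+eps)**2,
# so sum_i r**i = 1/(1-r) = ((1+eps)/eps)**2 = Theta(1/eps**2).
for eps in (0.5, 0.1, 0.02):
    r = (1 + 2 * eps) / (1 + eps) ** 2
    closed = ((1 + eps) / eps) ** 2
    assert abs(1.0 / (1.0 - r) - closed) < 1e-6 * closed
```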


Since the jobs in J are at most (1+ε)² times as long as they are in I, we get that

(36) Fp(Opt(L), s/(1+ε)²) = O(1/ε²) · Fp(SJF(J), s/((1+2ε)(1+ε)²))
  = O(1/ε^{p+2}) · Fp(Opt(J), s/((1+2ε)(1+ε)³))
  = O(1/ε^{p+2}) · Fp(Opt(I), s/((1+2ε)(1+ε)^5)).

By stringing together the inequalities (32), (33), (34), (35), and (36), we find that

Fp(SETF(I), s) = O(1/ε^{2p+2}) · Fp(Opt(I), s/((1+2ε)(1+ε)^4)).

Hence, we conclude with the main theorem for the flow analysis of SETF.

Theorem 13. SETF is (1+ε)-speed, O(1/ε^{2+2/p})-competitive for the ℓp norm of flow time.

We now turn our attention to the stretch analysis of SETF. The analysis is similar to the analysis of flow, but we need a different choice of the speed-up factors sk. By Lemma 10, it follows that

(37) Sp(SETF(I), 1+ε) ≤ Sp(MLF(I), 1).

Similarly, since each job Ji ∈ I is at most (1+ε)² times smaller than the corresponding job in K and also has size no more than that of the corresponding job in J, we get that

(38) Sp(MLF(I), 1) ≤ (1+ε)^{2p} Sp(MLF(K), 1).

Combining (37) and (38), we get that

(39) Sp(SETF(I), 1+ε) ≤ (1+ε)^{2p} Sp(MLF(K), 1).

Consider the execution of MLF on K using a 1-speed processor and that of SJF on L, also using a 1-speed processor. By the correspondence between MLF on K and SJF on L, we know that for each job in K, say of size ε[(1+ε)^k − 1], its flow time is exactly equal to that of the corresponding job of size ε²(1+ε)^{k−1} ∈ L(1). Thus the ratio of the pth power of the stretch for these jobs in L(1) and K is (ε[(1+ε)^k − 1]/(ε²(1+ε)^{k−1}))^p, which is (((1+ε)/ε)(1 − (1+ε)^{−k}))^p. Now since ε[(1+ε)^k − 1] ≥ 1 for all valid job sizes in K, we have that (1+ε)^{−k} ≤ ε/(1+ε) and hence that (((1+ε)/ε)(1 − (1+ε)^{−k}))^p ≥ (ε(1+ε))^{−p}. Thus the ratio of the contributions to the objective Sp for L and K will be at least 1/(ε(1+ε))^p, which implies that

(40) Sp(MLF(K), 1) ≤ (ε(1+ε))^p Sp(SJF(L), 1).

Now, applying Theorem 2, it follows that

(41) Sp(SJF(L), 1+ε) = O(1/ε^p) Sp(Opt(L), 1).

Combining (40) and (41), we get that
