State-dependent Importance Sampling for a Slow-down Tandem Queue

(1)

State-dependent Importance Sampling

for a Slow-down Tandem Queue

∗

D.I. Miretskiy† _{W.R.W. Scheinhardt}‡ _{M.R.H. Mandjes}§

Abstract

In this paper we investigate an advanced variant of the classical (Jackson) tandem queue, viz. a two-node system with server slow-down. The slow-down mechanism has the primary objective to protect the downstream queue from frequent overflows, and it does so by reducing the service speed of the upstream queue as soon as the number of jobs in the downstream queue reaches some pre-specified threshold. To assess the efficacy of such a policy, techniques are needed for evaluating overflow metrics of the second queue. We focus on the estimation of the probability of the following rare event: overflow in the downstream queue before exhausting the system, starting from any given state in the state space.

Due to the rarity of the event under consideration, na¨ıve, direct Monte Carlo simula-tion is often infeasible. We therefore rely on the applicasimula-tion of importance sampling to obtain variance reduction. The principal contribution of this paper is that we construct an importance sampling scheme that is asymptotically efficient. In more detail, the pa-per addresses the following issues. (i) We rely on powerful heuristics to identify the exponential decay rate of the probability under consideration, and verify this result by applying sample-path large deviations techniques. (2) Immediately from these heuris-tics, we develop a proposal for a change of measure to be used in importance sampling. (3) We prove that the resulting algorithm is asymptotically efficient, which effectively means that the number of runs required to obtain an estimate with fixed precision grows subexponentially in the buffer size. We stress that our method to prove asymptotic effi-ciency is substantially shorter and more straightforward than those usually provided in the literature. Also our setting is more general than the situations analyzed so far, as we allow the process to start off at any state of the state space, and in addition we do not impose any conditions on the values of the arrival rate and service rates, as long as the underlying queueing system is stable.

∗

Part of this research has been funded by the Dutch BSIK/BRICKS project.

†

Corresponding author. Faculty of Electrical Engineering, Mathematics, and Computer Science, University of Twente, P.O. Box 217, 7500 AE Enschede, The Netherlands.

‡

Faculty of Electrical Engineering, Mathematics, and Computer Science, University of Twente, P.O. Box 217, 7500 AE Enschede, The Netherlands. WS is also with CWI, Amsterdam, The Netherlands.

§

Korteweg-de Vries Institute for Mathematics; The University of Amsterdam; Plantage Muidergracht 24; 1018 TV Amsterdam; The Netherlands. MM is also with CWI, Amsterdam, The Netherlands and EURANDOM, Eindhoven, The Netherlands; part of this work was done while he was at Stanford University, Stanford, CA 94305, US.

(2)

1 Introduction

There is a vast body of literature on Jacksonian networks, and in particular tandem queues. Being applicable in a broad range of domains, such as communication networks, manufac-turing, and logistics, they have been subject of intensive research for over fifty years. Owing to its special features, particularly the fact that its steady-state distribution is of product-form, various performance metrics could be analyzed explicitly.

Changing the model slightly, often means that the product-form is lost, and that the anal-ysis becomes cumbersome. One such a variant of the classical Jacksonian tandem queue is the the two-node system with server slow-down, also known as a system with backpressure. This mechanism is designed to offer the second (or: downstream) queue some sort of protec-tion against frequent overflows: as long as the number of jobs in the downstream queue is larger than some pre-specified threshold the server of the first (or: upstream) queue slows down, and it returns to its normal speed when the number of jobs in the second queue drops below the threshold again. For this model only partial results are available, see e.g. [20]. It is noted that the slow-down model is of significant practical interest, as a related mechanism has been proposed e.g. in the design of Metro Ethernet [12, 16].

Lacking explicit formulas for the queue’s steady-state distribution, several alternative approaches can be pursued. In this paper we highlight two such approaches: we asymp-totically characterize the probability of interest (when the buffer size of the downstream queue grows large, and the value of the threshold is scaled accordingly), but emphasis lies on the development of efficient simulation techniques, based on importance sampling (IS). It is noted that due to the rarity of the event under consideration, na¨ıve, direct Monte Carlo simulation is often infeasible. The idea of IS is to simulate the system under a different prob-ability distribution (often referred to as the ‘new measure’), under which the event of interest occurs more frequently. After correcting the simulation output by means of likelihood ra-tios, an unbiased estimate is obtained. We refer to e.g. [10] for an introduction to IS and its background.

The asymptotics that we present in this work rely on powerful heuristics developed in [15], that identify the exponential decay rate associated to the probability under considera-tion as the soluconsidera-tion of a, relatively easy, convex programming problem. The correctness of the heuristics is then proven by applying techniques known as sample-path large-deviations [6, 19]. Importantly, this procedure also reveals the so-called typical path to overflow: given that the rare event occurs, then with overwhelming probability this happens by a path ‘close to’ the typical path. Having this path at our disposal, the next step is to use this knowledge in designing IS algorithms.

When developing efficient IS schemes, various complications arise. The most important of these is that state-independent new measures, which often worked well in the case of a single queues [17, 18], usually fail for networks that are intrinsically more complex, see for instance [2, 9] for the case of the ordinary Jacksonian tandem model. It was concluded that

(3)

the class of state-independent IS is not sufficiently rich to construct asymptotically efficient new measures, explaining the increasing interest in state-dependent IS schemes. An early reference on such state-dependent schemes, in the context of a certain class of queueing networks, is [3], but an analytic proof of efficiency properties was lacking there. Later such proofs for related new measures were given in, e.g., [8].

Let us now consider this paper’s contributions in more detail. The primary goal is to an-alyze the probability of overflow in the downstream (‘protected’) queue, before the system idles, starting off from any given state. Special cases of this problem were already studied in [7, 13]. These papers exclusively consider the case of starting in the origin, which is substan-tially more straightforward to address. In [13] a ‘pseudo-state-dependent’ IS scheme was proposed for estimating the overflow in the second queue. Its asymptotic efficiency, for a limited set of initial parameters, was concluded, but just on the basis of empirical evidence. In [7] a provably asymptotically efficient new measure was proposed; as mentioned above, the analysis was restricted to the case that the system started empty, and in addition certain assumptions on the model parameters (i.e., arrival rate at the upstream queue, and service rates) were imposed. Our paper therefore generalizes [7, 13], in that all initial states are al-lowed, and that there are no assumptions on the values of the arrival rate and service rates, as long as the underlying queueing system is stable. An important additional contribution is that our proof of asymptotic efficiency is rather elementary and short, compared to that in [7].

It is mentioned that there are several technicalities related to the way the new measure should be constructed close to the axes, and close to the slow-down threshold. We consid-ered a similar technique for the ordinary two-node Jacksonian tandem model in [15]. The approach followed there, however, could not be directly applied in the current model. The main complication lies in the discontinuity of the transition structure along the slow-down threshold. As a consequence, the typical paths to overflow can have a rather complex struc-ture. In addition, the way the new measure should be constructed close to the threshold is non-trivial.

We like to stress that the focus on this paper lies on the analytic aspects of the problem, that is, the analysis of the decay rate and the proof of asymptotic efficiency. We decided to refrain from including numerical experiments in this paper, as these form a topic of research in their own right. This is due to the fact that it is still a rather nontrivial step from an asymp-totically efficient procedure, as the one presented here, to an actual, efficient implementation of the algorithm. It is noted that several aspects that we did not mention above play a crucial role: it matters for instance very much whether a new measure requires computation ‘on the fly’ of new transition rates, or whether they can be precomputed. These issues we plan to address in forthcoming work.

The paper in structured as follows. The basics of IS are recapitulated in Section 2.1, whereas a model description is given in Section 2.2. Then, after having heuristically identi-fied the shapes of the most probable path to overflow, we present in Section 3 IS schemes for

(4)

estimating the probability of interest; as a by-product we find the corresponding exponen-tial decay rate. The explanation of how sample-path large-deviations techniques are used to rigorize these findings, we leave to the Appendix. In Section 4 we slightly modify the IS schemes designed in Sections 3.4 – 3.6 and prove the asymptotic efficiency of the resulting scheme. We end the paper with some conclusions in Section 5.

Let us finish this introduction with a few words on the person in honor of whom this workshop has been organized, Reuven Rubinstein. Reuven has played a pivotal role at the interface of the simulation community and the applied probability. Relying on his unsur-passable intuition, he succeeded now and again to fuel the applied probabilists with new revolutionary ideas. We do recognize the crucial role ‘intuition’ plays in the design of ‘good’ simulation techniques – the present paper is very much in that spirit: it describes how heuris-tics can be transformed into provably efficient methods. We hope Reuven’s beautiful contri-butions to the area will continue after his retirement.

2 Preliminaries

In this section we present a short overview on the main concepts in importance sampling, and we introduce our model.

2.1 Importance sampling

As we mentioned in the introduction, estimating small probabilities through direct, na¨ıve, simulations is often infeasible, due to the rarity of the event involved; the simulation effort needed to obtain an estimate of given precision could be prohibitively large. We therefore have to use variance reduction techniques, and in this paper we focus on importance sam-pling (IS). IS performs simulations of the system under a new measure, which guarantees more frequent occurrence of the event of interest. After weighing the simulation output with the appropriate likelihood ratios (keeping track of the likelihood of the realization un-der the original measure with respect to the new measure), we obtain an unbiased estimator. The focus lies on state-dependent IS schemes, that is, schemes in which the new measure may depend on the current state of the system.

Let us now give a generic description of IS, as well as the concept of asymptotic effi-ciency. To this end, consider a family of rare events {AB}, in the probability space (Ω, F, P);

here B is the so-called ‘rarity parameter’. To estimate P(AB) via IS simulations one need to

generate samples under a new probability measure Q, with respect to which P is absolutely continuous. The probability P(AB) can now alternatively be expressed as

P(AB) = EQ[LI], (1)

(5)

derivative) of a realization (‘path’) ω: L = dP

dQ(ω). (2)

We see that we can sample under Q, say n times, obtain observations (L1, I1), . . . , (Ln, In),

and construct the unbiased estimator of P(AB) by n−1·Pni=1LiIi.

Clearly some alternative measures Q perform better than others, in terms of the variance of the resulting estimator. Therefore the following concept has been introduced.

Definition 2.1. The IS scheme for P(AB) is called asymptotically efficient if

lim inf

B→∞

log EQ_[L2_I]

log P(AB)

≥ 2. (3)

If, in addition, the probability of the {AB} decays exponentially in B, i.e.,

lim

B→∞

1

B log P(AB) ∈ (−∞, 0),

the definition of asymptotic efficiency reduces to lim sup B→∞ 1 Blog E Q_[L2_{I] ≤ 2 lim} B→∞ 1 Blog P(AB).

Notice that EQ_[L2_{I] = E[LI], so the above criterion can alternatively be written as}

lim sup

B→∞

1

Blog E[LI] ≤ 2 limB→∞

1

Blog P(AB). (4)

2.2 Slow-down tandem queue

In this subsection we describe the two-node slow-down network. At the first node (or: sta-tion) jobs arrive according to a Poisson process of rate λ. Each job receives service at the first station, after which it is routed to the second one. After receiving service at the second node, the job leaves the system. Service times at the second station have an exponential dis-tribution with parameter µ2. At the first node, however, the service speed depends on the

content of the second queue: normally, service times at the first station have an exponential distribution with parameter µ1, but if the number of jobs in the second queue exceeds some

pre-specified threshold (the ‘slow-down threshold’) than the parameter of the exponential distribution changes to µ+₁, where µ+₁ < µ1. When the system ‘stabilizes’, that is, the number

of jobs in the second queue drops again below the slow-down threshold, the service rate of the first station returns to its original value µ1.

For convenience we choose the parameters such that λ + µ1 + µ2 = 1, without loss of

generality (and thus λ + µ+₁ + µ2 < 1). The waiting rooms at both stations are assumed

to be infinitely large. Let Q(t) = {(Q1(t), Q2(t)), t ≥ 0} be the joint queue-length process,

which is regenerative if we impose that it is stable, see [14] for more insights into this issue. Our main interest is to estimate the probability of reaching some high level B in the second

(6)

queue before it returns to the origin, starting from any given state. Note that in our model the slow-down threshold scales with B, that is, the threshold has value θB in the remainder of this paper, for some θ ∈ (0, 1).

The queue-length process can also be recorded at jump epochs, and then it is described by the embedded discrete time Markov chain Qj = (Q1,j, Q2,j), where Qi,j is the number of

jobs in queue i after the j-th transition. We define the possible jump directions of the process Qj via vectors v0 = (1, 0), v1 = (−1, 1), and v2 = (0, −1), with corresponding jump rates λ,

µ1(or µ+1) and µ2respectively.

For convenience we will also consider the so-called scaled processes X(t) = Q(Bt)/B (in continuous time) and Xj = Qj/B (in discrete time). The advantage of these scalings

is the ‘invariance’ of the state space for any B. More specifically, our target probability is equivalent to the probability that the second component of either the scaled process Xj

or the scaled process X(t) reaches 1 before the process visits the origin. We introduce the following subsets of the state space, with x := (x1, x2):

D := {x : x1 > 0, 0 < x2 < θ}, ∂1 := {(0, x2) : x2> 0}, ∂2:= {(x1, 0) : x1 > 0},

D+ := {x : x1 > 0, θ ≤ x2 < 1}, ∂+1 := {(0, x2) : x2∈ [θ, 1)}, ∂θ:= {(x1, θ) : x1 > 0},

∂e:= {(x1, 1) : x1 > 0}.

The full state space is ¯D ∪ ¯D+_{, where ¯}_{D := D ∪ ∂}

θ∪ (∂1 \ ∂1+) ∪ ∂2 and ¯D+ := D+∪ ∂e∪

∂₁+∪ ∂θ. Note that the transition vkis impossible when queue k is empty, i.e., when Xj ∈ ∂k.

We modify the process Xj to deal with this by allowing some self-loop transitions in the

following way (see also Figure 1): for k = 1, 2,

P(Xj+1= Xj|Xj ∈ ∂k) = µk, P(Xj+1 = Xj|Xj ∈ ∂+1) = µ+1/(λ + µ+1 + µ2). (5)

Next, we introduce the stopping time τ_Bx, which is the first time that the process Xj hits

level 1, starting from state x = (x1, x2), without any visits to the origin:

τ_Bx = inf{k > 0 : Xk∈ ∂e, Xj 6= 0 for j = 1, . . . , k − 1}, (6)

and we define τ_Bx = ∞ if Xj hits the origin before ∂e. It will also be convenient to let IB(Ax)

be the indicator of the event τ_Bx < ∞ for the path Ax _{= (X}

j, j = 0, . . . : X0 = x). Thus we

can write the probability of our interest as

px_B := EIB(Ax) = P(τBx < ∞). (7)

3 State-dependent importance sampling

In order to find a ‘good’ new measure for IS simulations, the first step is usually to find the ‘most probable path to overflow’, i.e., the way in which overflow most probably occurs, conditional on its occurrence. In Subsections 3.1-3.3 we explain a method in which minimiz-ing certain ‘cost-functions’ leads to the most probable path and a good correspondminimiz-ing new

(7)

-6 X1 X2 1 θ 0 @@R ∂1 @@R ∂+₁ ∂2 ∂e ∂θ -λ ?_µ 2 @ @ Iµ1 -λ ?_µ 2 @ @ Iµ + 1 µ1 Y µ2 D D+ -λ @ @ I µ1 _λ -?µ2

Figure 1: State space and transition structure for the scaled process Xj.

measure, given by new (state-dependent) transition rates ˜λ(x), ˜µ1(x) and ˜µ2(x) below the

slow-down threshold and ˜λ+(x), ˜µ+₁(x), ˜µ+₂(x) above it. Also, the minimal cost itself will be shown to be the decay rate of px_Bas B → ∞, which will play a pivotal role in the asymptotic efficiency proofs later.

The results of the minimization procedure are presented in three different subsections, since they are different, depending on the parameters settings. In Subsection 3.4 we treat the case µ2 < µ+1 < µ1 in which the second server is always the bottleneck, Subsection 3.5 deals

with the case µ+₁ ≤ µ2< µ1in which either the first or the second server is the bottleneck, and

in Subsection 3.6 we describe the case µ+₁ < µ1 ≤ µ2, in which the first server is always the

bottleneck. Beforehand we would like to point out that the new measure mentioned above and denoted by tildes, is not exactly the same as the asymptotically efficient new measure that will be introduced in Section 4 (denoted by bars), although it is closely related.

3.1 Path to overflow

The typical path to overflow in the very special case that the origin is the starting state and θ ∈ {0, 1}, has already been identified in [1]. In that paper the time-reversed process is used to find the shape of the most probable path to overflow. In the more general setting that θ ∈ [0, 1], but again with the origin as the only starting state, the path to overflow was obtained in [13]. In this section we present a method similar to the one in [13] to find the optimal path starting from any state x ∈ ¯D ∪ ¯D+. The advantage of this method is that it provides us insight into the typical behavior conditional on observing the rare event

(8)

under consideration; our choice for the new measure (which we prove to be asymptotically efficient) will be inspired on it.

Before introducing our method we state a property that says that, when searching the typical path to overflow, we can restrict ourselves to a (small) subset of all feasible paths. We leave the proof that this typical path satisfies these restrictions for the Appendix.

Property 3.1. We only consider paths that satisfy the following:

(i) Each path is a concatenation of subpaths, which are straight lines on any of the subsets D, D+_{, ∂}

1 \ ∂1+, ∂1+ and ∂2, and the measure stays constant along each subpath, i.e.,

˜

λ(x) = ˜λ, ˜µ1(x) = ˜µ1, ˜µ2(x) = ˜µ2, ˜λ+(x) = ˜λ+, ˜µ+1(x) = ˜µ+1 and ˜µ+2(x) = ˜µ+2, for any

state x on the same subpath;

(ii) each path does not have more than two subpaths in each subset if µ+₁ < µ1 ≤ µ2, and

more than one subpath per subset otherwise.

With every path that satisfies Property 3.1 we associate a ‘cost’, the main idea being that the minimal cost of the path to overflow in the second queue starting from state x is the decay rate of the probability of interest (see Section 3.3). Our method is based on the family of cost functions I, defined by

I(˜λ | λ) = λ − ˜λ + ˜λ log˜λ

λ; (8)

see also [19], p. 14, 20. Note that the function (8) is convex and equals 0 at ˜λ = λ. Intuitively, the value I(˜λ | λ) is the cost we need to ‘pay’ per time unit to let a Poisson process with parameter λ behave like a Poisson process with parameter ˜λ.

In the following we will explain our cost method in some detail. More background can be found in Section 3.1 of [15] and in the Appendix of [13].

3.2 Example

As a leading example, we here consider a path consisting of two linear pieces, through the interior of the state space, staying away from the boundaries, from some state x to another state y, where x1 ≥ y1 and x2 < θ < y2 (the last condition meaning that the path crosses the

slow-down threshold). We focus on computing the typical path that connects x with y (and in particular the point where it crosses the threshold), and the corresponding new measure. To this end, we construct new measures (˜λ, ˜µ1, ˜µ2) and (˜λ+, ˜µ+1, ˜µ+2), such that ˜µ1 >

˜

µ2, ˜λ ≤ ˜µ1, ˜µ+1 > ˜µ+2 and ˜λ+ ≤ ˜µ+1. Under these measures our path consists of two

lin-ear subpaths and each of them has a constant north-west drift. In other words: below the slow-down threshold our path has a constant slope −α, with

α = µ˜1− ˜µ2 ˜ µ1− ˜λ

(9)

while above the threshold it has a constant slope −α+, with α+ = µ˜ + 1 − ˜µ+2 ˜ µ+₁ − ˜λ+. (10)

Below the slow-down threshold, the cost of this path is, per unit time,

I(˜λ, ˜µ1, ˜µ2) = I(˜λ | λ) + I(˜µ1 | µ1) + I(˜µ2| µ2). (11)

To find the cost per unit horizontal (vertical) distance, we need to divide this cost by the horizontal speed ˜µ1 − ˜λ (vertical speed ˜µ1 − ˜µ2). Similar expressions apply for the costs

per unit time and unit distance, when the process is above the slow-down threshold. Thus, minimizing the cost of any path that consists of two straight subpaths (one strictly below the threshold and one above it) from x to y in this case boils down to minimizing

(θ − x2) I(˜λ, ˜µ1, ˜µ2) ˜ µ1− ˜µ2 + (y2− θ) I(˜λ+, ˜µ+₁, ˜µ+₂) ˜ µ+₁ − ˜µ+₂ , (12)

over ˜λ, ˜µ1, ˜µ2, ˜λ+, ˜µ+1 and ˜µ+2, such that ˜λ ≤ ˜µ1, ˜µ1 > ˜µ2, ˜λ+ ≤ ˜µ+1 and ˜µ+1 > ˜µ+2 hold, as well

as κ(x) := x1− µ˜1− ˜λ ˜ µ1− ˜µ2 (θ − x2) = y1+µ˜ + 1 − ˜λ+ ˜ µ+₁ − ˜µ+₂ (y2− θ), (13)

where (κ(x), θ) is the state in which the optimal path crosses the slow-down threshold. One way to solve the minimization problem (12) is the following; from now on we focus on the ending state y = (0, 1), as this will later turn out to be the most likely point of entering ∂ein many situations. For each fixed crossing state (κ(x), θ), we can find the cost of the path

through that state. Then, we minimize this cost over all possible values of κ. Note that the optimal value κ(x) is a function of the starting state x. This property complicates the shape of the optimal paths significantly, as well as the analysis of the new measure.

The total cost of the bottom part of the optimal path, i.e., the subpath from x to (κ(x), θ) attains its minimum when the triplet (˜λ, ˜µ1, ˜µ2) is a solution to

               ˜ λ = ˜µ1+κ(x)−x_θ−x₂1(˜µ1− ˜µ2) ˜ λ + ˜µ1+ ˜µ2 = λ + µ1+ µ2 ˜ λ˜µ1µ˜2 = λµ1µ2 ˜ λ ≤ ˜µ1and ˜µ1 > ˜µ2 ˜ λ, ˜µ1, ˜µ2> 0 (14)

Similarly, the total cost of the subpath above the threshold from (κ(x), θ) to (0, 1) is minimal when (˜λ+_{, ˜}_µ+ 1, ˜µ+2) is a solution to                ˜ λ+ = ˜µ+₁ −κ(x)_1−θ(˜µ+₁ − ˜µ+₂) ˜ λ+_{+ ˜}_µ+ 1 + ˜µ+2 = λ + µ+1 + µ2 ˜ λ+µ˜+₁µ˜+₂ = λµ+₁µ2 ˜ λ+ _{≤ ˜}_µ+ 1 and ˜µ+1 > ˜µ+2 ˜ λ+, ˜µ+₁, ˜µ+₂ > 0 (15)

(10)

Notice also that if (˜λ, ˜µ1, ˜µ2) is the solution to (14) for some starting state x, it also minimizes

this system if we replace x by any state that belongs to the straight line between x and (κ(x), θ). Similarly, (˜λ+, ˜µ+₁, ˜µ+₂) which is the solution of (15) stays unchanged for the whole top part of the optimal path.

It will be useful to define the functions γ1 and γ2 as the cost of the subpaths (of the

optimal path to overflow) below and above the thresholds, i.e., γ1(x1, x2) := − (x1− κ(x)) log ˜ λ(x1, x2) λ − (θ − x2) log ˜ µ2(x1, x2) µ2 , (16) γ2(κ(x), θ) := −κ(x) log ˜ λ+_{(κ(x), θ)} λ − (1 − θ) log ˜ µ+₂(κ(x), θ) µ2 , (17)

where ˜λ and ˜µ2are given by the solution to (14), ˜λ+and ˜µ+2 by the solution to (15), and κ(x)

is given in (13). Then clearly the total cost of the path (x1, x2) → (κ(x), θ) → (0, 1) can be

expressed as γ1(x1, x2) + γ2(κ(x), θ).

3.3 Decay rate as minimal cost

Once we have considered all possible path types with their minimal cost, we can obtain the overall minimum cost, corresponding to the most probable path, and the corresponding (state-dependent) new measures (˜λ(x), ˜µ1(x), ˜µ2(x)) and (˜λ+(x), ˜µ+₁(x), ˜µ+₂(x)). Defining

γ(x) := overall minimal cost over all paths x → ∂e,

the following theorem states that this is in fact the exponential decay rate of the probability px_Bas B → ∞. It is based on a large deviation principle for the process X(t) (with a local rate function that is closely related to our cost function) that can be found in the Appendix. Theorem 3.2. The exponential decay rate of px_Bis equal to the minimal cost of overflow γ(x), i.e.,

lim B→∞ 1 B log p x B= −γ(x).

We now present the value of γ(x) (as well as the corresponding new measures) for the three cases mentioned above (that is, second server is bottleneck, ‘shifting bottleneck’, and first server is bottleneck).

3.4 Importance sampling scheme for µ2 < µ+1 < µ1

In this case, where the second queue is always the bottleneck, the new measure under which the path to overflow has minimal cost, in terms of the cost function (8), turns out to be given by (˜λ, ˜µ1, ˜µ2) =      (µ2, µ1, λ), if x ∈ A1, solution to (14), if x ∈ A2, (λ, µ1, µ2), if x ∈ A3, (18)

(11)

θ -6 x1 x2 1 0 α1θ + α+1(1 − θ) θ/α1+ (1 − θ)/α+1 A1 A2 A3 A+₃ A+₂ A+₁ r HHHj ?? r r

Figure 2: Partition of ¯D ∪ ¯D+and some optimal paths to overflow when µ2 < µ+1 < µ1.

and (˜λ+, ˜µ+₁, ˜µ+₂) =      (µ2, µ+1, λ), if x ∈ A+1, solution to (15), if x ∈ A+₂, (λ, µ+₁, µ2), if x ∈ A+3. (19)

Here the subsets Ai and A+_i , i = 1, 2, 3 form a partition of the state space ¯D ∪ ¯D+ as

depicted in Figure 2, where α1 := (µ1− µ2)/(µ1− λ) and α1+ := (µ+1 − µ2)/(µ+1 − λ). We

chose not give the precise definitions of the sets Aiand A+_i here, since they do not add much

to the understanding. For some starting states x, Figure 2 also shows the shape of the most probable path to ∂e.

Note that the new measure in the subsets A1 and A+1, i.e., interchanging λ and µ2, has

been earlier found in [17] for the problem of reaching a large total network population. Mea-sures similar to the ones in the other subsets were introduced in [15]. Also, we point out that the new measure is continuous in the state x, as can be verified by solving system (14) for x = (α1θ + α+₁(1 − θ), 0) and x = (θα1+ (1 − θ)/α+₁, 0), yielding the solutions in the first and

third lines of (18), respectively. A similar principle holds above the slow-down threshold as well.

The cost γ(x) of the optimal path, and hence the decay rate of px_Bcan be expressed as:

γ(x) =      (1 − x1− x2)γ, if x ∈ A1, γ1(x1, x2) + γ2(κ(x), θ), if x ∈ A2, 0, if x ∈ A3, (20) where γ := − log λ µ2 ,

(12)

θ -6 x1 x2 1 0 α1θ θ/α1+ (1 − θ)/α+2 B1 B2 B3 B+₃ B₂+ r r r ?

Figure 3: Partition of ¯D ∪ ¯D+and some optimal paths to overflow when µ+₁ ≤ µ2 < µ1.

is the decay rate of the path (0, 0) → (0, 1), γ1and γ2are as in (16) and (17), and κ(x) is given

in (13). Importantly, we treat only paths with starting state below the slow-down threshold; starting states above the threshold are substantially easier to deal with.

3.5 Importance sampling scheme for µ+₁ ≤ µ2 < µ1

In the case where the bottleneck may shift from the second to the first station due to the slowdown mechanism, the new measure under which the path to overflow has minimal cost, in terms of cost function (8), is given by

(˜λ, ˜µ1, ˜µ2) =      (µ2, µ1, λ), if x ∈ B1, solution to (14), if x ∈ B2, (λ, µ1, µ2), if x ∈ B3. (21) and (˜λ+, ˜µ+₁, ˜µ+₂) = ( solution to (15), if x ∈ B₂+, (λ, µ2, µ+₁), if x ∈ B₃+. (22) Here, the five subsets B1, B2, B3, B₂+and B₃+are shown in Figure 3, which is comparable

to Figure 2. The main difference is that there is no set B1+, and the constant α1+is replaced

by α+₂ := (µ2− µ+1)/(µ2− λ), while α1is the same as introduced in the previous subsection.

The decay rate γ(x) is now given by

γ(x) =      θ(1 − x1− x2)γ + (1 − θ) log(1/z+), if x ∈ B1, γ1(x1, x2) + γ2(κ(x), θ), if x ∈ B2, (1 − θ) log(µ2/µ+₁), if x ∈ B3. (23)

(13)

θ β -6 x1 x2 1 0 C1 C2 C3 C₃+ C₂+ β1 θ/α2+ (1 − θ)/α+2 r r r ?

Figure 4: Partition of ¯D ∪ ¯D+and some optimal paths to overflow when µ+₁ < µ1 ≤ µ2.

Here z+is the unique solution in (0, 1) of the equation λ + µ+₁ + µ2(1 − z+) = 2

s λµ+₁

z+ , (24)

which follows from system (14) by taking (x1, x2) = (0, 0). In fact, (1 − θ) log(1/z+) is the

cost of the vertical path (0, θ) → (0, 1) in the interior, satisfying ˜λ = ˜µ+₁. See also Equations (30) and (33) in [11] and [13] respectively, for more details.

3.6 Importance sampling scheme for µ+1 < µ1 ≤ µ2

When the first queue is always the bottleneck and θ is not too small (to be made more pre-cise at the end of this subsection), the new measure under which the path to overflow has minimal cost, in terms of cost function (8), is given by

(˜λ, ˜µ1, ˜µ2) =      (µ1, λ, µ2), if x ∈ C1, solution to (14), if x ∈ C2, (λ, µ2, µ1), if x ∈ C3. (25) and (˜λ+, ˜µ+₁, ˜µ+₂) = ( solution to (15), if x ∈ C2+, (λ, µ2, µ+1), if x ∈ C3+. (26)

The sets C1, C2, C3, C2+and C3+are shown in Figure 4, where α2:= (µ2−µ1)/(µ2−λ); the

constants β and β1 are given shortly. Interestingly, for the current case the behavior under

the new measure is entirely different on C1 and C2, and the measure is not continuous in

(14)

‘upwards’, which is γ1(x) + γ2(κ(x), θ), is equal to the cost of the path ‘to the right’, which

can be shown to be θγ + (1 − θ) log(1/q) − x1log(µ1/λ) with q being the unique solution in

(0, λµ+₁/µ2

1) of the equation

µ1µ2q2+ µ1(µ1− λ − µ+₁ − µ2)q + λµ+₁ = 0 (27)

(see also Proposition 14 in [13] for more background). Thus, the boundary between C1 and

C2is the zero level curve of the function

f (x) = θγ + (1 − θ) log(1/q) − x1log(µ1/λ) − γ1(x1, x2) − γ2(κ(x), θ).

The intersection point of this curve with the horizontal axis lies at (β1, 0) with

β1 := θ(µ2− µ1)/(µ2− λ) + (1 − θ)(λµ+1 − µ21q)/(λµ+1 − µ1µ2q2).

The intersection point (0, β) with the vertical axis follows as the unique solution to

f (0, β) = θγ + (1 − θ) log(1/q) − (θ − β) log(1/z) − (1 − θ) log(1/z+) = 0, (28) where z is the unique solution in (0, 1) of

1 − µ2z = 2r λµ1

z , (29)

which is the analogue of Equation (24), with µ+1 replaced by µ1.

The main result of this subsection is the decay rate, which is in this case given by:

γ(x) =      θγ + (1 − θ) log(1/q) − x1log(µ1/λ), if x ∈ C1, γ1(x1, x2) + γ2(κ(x), θ), if x ∈ C2, (θ − x2) log(µ2/µ1) + (1 − θ) log(µ2/µ+1), if x ∈ C3. (30)

We finally return to our assumption that θ should not be too small. In particular we used in the above that the threshold lies above the set C1, i.e., θ > β. When µ2is very large

compared to λ and µ1, the corresponding value of β may be rather large, because the cost of

‘upward’ paths will be much larger than the cost of paths ‘to the right’. However, for most ‘real-life’ cases, β is quite small; in the most interesting (heavily loaded) cases, β is still below 0.1. Of course we could also consider cases where β is larger than θ. This will lead to minor changes in the structure of C1, C2 and C2+, and the minimal cost γ(x) will then also change;

we chose to leave this special case out.

3.7 Properties of the new measures

We like to summarize some important properties of the new measures (˜λ(x), ˜µ1(x), ˜µ2(x))

and (˜λ+(x), ˜µ+₁(x), ˜µ+₂(x)) described in Sections 3.4–3.6 in the following proposition. The first two statements hold independently of the relation between the parameters λ, µ1, µ+1

and µ2, and show that the functions ˜λ(x), ˜λ+(x), ˜µ2(x) and ˜µ+₂(x) depend monotonically

on x. The last two statements give bounds which do depend on the relation between the parameters.

(15)

Proposition 3.3. For any x ∈ ¯D ∪ ¯D+_{the functions ˜}_{λ(x), ˜}_µ

1(x), ˜µ2(x) and ˜λ+(x), ˜µ+1(x), ˜µ+2(x),

as defined by either (18) and (19), or by (21) and (22), or by (25) and (26) satisfy the following: (i) ∂˜λ(x) ∂x1 ≤ 0, ∂ ˜µ2(x) ∂x1 ≥ 0, ∂˜λ(x) ∂x2 ≤ 0 and ∂ ˜µ2(x) ∂x2 ≥ 0 (if µ+₁ < µ1≤ µ2, then assume that x /∈ C1);

(ii) ∂˜λ +_(x) ∂x1 ≤ 0, ∂ ˜µ + 2(x) ∂x1 ≥ 0, ∂˜λ +_(x) ∂x2 ≤ 0 and ∂ ˜µ+₂(x) ∂x2 ≥ 0; (iii) ˜λ(x) ∈ [λ, µ2] and ˜µ2(x) ∈ [λ, µ2] if µ2 < µ1, and

˜

λ(x) ∈ [λ,pλµ1/z] and ˜µ2(x) ∈ [µ2z, µ1] ∪ {µ2} if µ2 ≥ µ1;

(iv) ˜λ+(x) ∈ [λ, µ2] and ˜µ₂+(x) ∈ [λ, µ2] if µ2< µ+₁ and

˜

λ+(x) ∈ [λ, q

λµ+₁/z+_{] and ˜}_µ+

2(x) ∈ [µ2z+, µ+1] if µ2 ≥ µ+1.

Here, as before, z+is defined by (24) and z by (29).

4 Asymptotic efficiency

For the special case in which the starting state is the origin, it is known from [13] that the new measures provided in Section 3 are not always asymptotically optimal. For example, in the simplest case, when µ2 < µ+1 < µ1, multiple visits of the process Q(t) to the horizontal

axis ∂2may lead to large likelihood ratios of particular sample paths under the new measure

(µ2, µ1, λ). This critically impacts the quality of the estimator. To avoid this behavior we

use a technique similar to what was proposed in [8] and also used in [15]. It is based on using a specific measure around ∂2, under which visits to ∂2 are harmless to the likelihood

ratio. Thus, in this section we will introduce new measures (indicated by bars) based on the measures from the previous section (indicated by tildes), and subsequently prove that these new measures are indeed asymptotically efficient. As in the previous section, we split the problem into three cases. In Section 4.1 we explain our method in detail for the situation in which the second server is always the bottleneck (µ2 < µ+₁ < µ1), and in Sections 4.2 and 4.3

we treat the other cases.

4.1 Asymptotic efficiency for µ2 < µ+1 < µ1

In this subsection we present a modification of the scheme constructed in Subsection 3.4 and prove its asymptotic optimality. At first we introduce the function W (x) for any point x = (x1, x2) of the state space. This function will give us expressions for the new measures,

denoted by bars, similar to how it was done in [8, 15]. In particular we will now find such a measure both below (¯λ(x), ¯µ1(x), ¯µ2(x)) and above (¯λ+(x), ¯µ+1(x), ¯µ+2(x)) the threshold.

(16)

For some small δ > 0, let us first introduce three auxiliary functions Wi(x), i = 1, 2, 3:

W1(x) := 2γ2(x)I{x∈ ¯D+_}+ 2 (γ₁(x) + γ₂(κ(x), θ)) I_{{x∈ ¯}_D}− δ,

W2(x) := 2γ(x1, δ/2γ) − δ, (31)

W3(x) := 2γ(0) − 3δ,

where γ(x), γ1(x) and γ2(κ(x), θ) are as in (20), see also (16) and (17); we recall that γ equals

− log(λ/µ2).

In the next step we introduce the minimum of these auxiliary functions: ¯

W (x) := W1(x) ∧ W2(x) ∧ W3(x). (32)

Since W2(x) = W1(x1, δ/2γ), it follows that this minimum is only attained by W2in a narrow

strip along the horizontal axis, namely when x2 ≤ δ/2γ, unless x is close to the origin, in

which case W3is the minimum. In all other states we simply have ¯W (x) = W1(x).

The last step is a mollification procedure, in which we define: W (x) := −ǫ log

3

X

i=1

e−Wi(x)/ǫ_. ₍₃₃₎

The resulting function W (x) is a ‘smoothed’ version of ¯W (x), except on the threshold, where W1 is not differentiable. The ‘smoothness’ of W (x) depends on the choice of the parameter

ǫ: the larger ǫ is chosen, the smoother the function W (x) is. On the other hand, as ǫ ↓ 0 we see that W (x) converges to the (non-smooth) function ¯W (x).

The parameters δ and ǫ depend on B, and in the sequel we will need the following con-ditions for their asymptotic behavior as B grows large, see [8]. For conciseness, we often suppress the index B.

Assumption 4.1. The parameters δ ≡ δB and ǫ ≡ ǫBare strictly positive and satisfy the following

limit conditions: as B → ∞, (i) ǫB→ 0, (ii) δB→ 0, (iii) BǫB→ ∞, (iv) ǫB/δB→ 0.

The following expression for the gradient of W (x) is immediate from (33), and will play an important role in the representation of the state-dependent, asymptotically efficient new measure: DW (x) = 3 X k=1 ρk(x)DWk(x), where ρk(x) := e−Wk(x)/ǫ P3 i=1e−Wi(x)/ǫ . (34)

Also, we have the following helpful property.

Proposition 4.2. The gradients of the functions Wi(x), i = 1, 2, 3 are given by:

DW1(x) = 2 log λ ˜ λ(x), log ˜ µ2(x) µ2 , if x ∈ ¯D, DW1(x) = 2 log λ ˜ λ+_(x), log ˜ µ+₂(x) µ2 , if x ∈ ¯D+, DW2(x) = 2 log λ ˜ λ(x1, δ/2γ) , 0 , DW3(x) = (0, 0).

(17)

Proof. It is clear that DW1(x) = −2γ(1, 1) if x ∈ A1∪ A+1 and DW1(x) = (0, 0) if x ∈ A3∪ A+3.

When x ∈ A2, DW1(x) seems to be more complicated:

1 2DW1(x) = Dγ(x) = log λ ˜ λ(x), log ˜ µ2(x) µ2 +∂γ(x) ∂κ(x) ∂κ(x) ∂x1 ,∂κ(x) ∂x2 −x1− κ(x) ˜ λ(x) ∂˜λ(x) ∂x1 ,∂˜λ(x) ∂x2 ! −θ − x2 ˜ µ2(x) ∂ ˜µ2(x) ∂x1 ,∂ ˜µ2(x) ∂x2 − κ(x) ˜ λ+_(x) ∂˜λ+(x) ∂x1 ,∂˜λ +_(x) ∂x2 ! − 1 − θ ˜ µ+₂(x) ∂ ˜µ+₂(x) ∂x1 ,∂ ˜µ + 2(x) ∂x2 . This gradient is more involved than its analog for the standard Jackson tandem (see Propo-sition 5.1 in [15]), not only because we now have two new measures (below and above the slow-down threshold), but also due to the strong dependence on x of the optimal path shape (and the optimal crossing state (κ(x), θ) in particular). Fortunately, the second term is zero because ∂γ(x)/∂κ(x) = 0 due to the fact that (κ(x), θ) is the optimal crossing state. Also, applying implicit differentiation one can find the partial derivatives of all ‘tilded’ variables (˜λ(x), etc.) and show that the vectors in the second and third lines sum up to zero.

The other statements (including the case when x ∈ A+₂) follow easily from the definitions of Wi(x), i = 1, . . . , 3.

Now we are ready to define the new measure, see also (27) and (29) in [15]: ¯ λ(x) := λe−hDW (x),v0i/2_eH(DW (x))/2_, _{if x ∈ ¯}_D, ¯ µi(x) := µie−hDW (x),vii/2eH(DW (x))/2, i = 1, 2, if x ∈ ¯D, ¯ λ+(x) := λ λ + µ+₁ + µ2 e−hDW (x),v0i/2_eH+(DW (x))/2_, _{if x ∈ ¯}_D+_, ¯ µ+₁(x) := µ + 1 λ + µ+₁ + µ2 e−hDW (x),v1i/2_eH+(DW (x))/2_, _{if x ∈ ¯}_D+_, ¯ µ+₂(x) := µ2 λ + µ+₁ + µ2 e−hDW (x),v2i/2_eH+(DW (x))/2_, _{if x ∈ ¯}_D+_. (35)

Note that the functions ˜λ(x), etc. from the previous section are transition rates, while the functions ¯λ(x), etc. are transition probabilities under the new measure (just as λ and λ/(λ + µ+₁ + µ2) are transition probabilities under the original measure when x ∈ ¯D resp. x ∈

¯

D+). The functions H(DW (x)) and H+(DW (x)) in the new measure (35) are known as Hamiltonians, which we use to enable the comparison with [4, 8, 15]; in fact they provide the normalization such that the new transition probabilities sum up to 1. More precisely,

H(DW (x)) := 2 loghλe−hDW (x),v0i/2_{+ µ}

1e−hDW (x),v1i/2+ µ2e−hDW (x),v2i/2 i−1 and H+(DW (x)) := 2 log " λe−hDW (x),v0i/2 λ + µ+₁ + µ2 +µ + 1e−hDW (x),v1i/2 λ + µ+₁ + µ2 +µ2e −hDW (x),v2i/2 λ + µ+₁ + µ2 #−1 .

(18)

Now that we defined the change of measure in (35), we are ready to prove that it is asymptotically efficient. We start with some lemmas that are similar to the ones in [4]. Lemma 4.3. The likelihood L(A) of a path A = (Xj, j = 0, . . . , σ) under the new measure (35)

satisfies log L(A) = B 2 σ−1 X j=0 hDW (Xj), Xj+1− Xji + 2 X k=1 1 2 σ−1 X j=0 hDW (Xj), vkiI{Xj = Xj+1∈ ∂k} (36) − 1 2 σ−1 X j=0 H(DW (Xj))I{Xj∈D}+ H +_{(DW (X} j))I{Xj∈D+} . Proof. The proof is analogous to the proof of Lemma 1 in [4].

Lemma 4.4. For any path A = (Xj, j = 0, ..., σ) under the new measure (35), the first term in (36)

satisfies B 2 σ−1 X j=0 hDW (Xj), Xj+1− Xji − B 2(W (Xσ) − W (X0)) ≤ C Bǫσ + C +_σ+_,

for sufficiently large Bǫ, where C and C+ are some positive constants and σ+ is the number of slow-down threshold crossings up to time σ.

Proof. Our argument is based on the representation W (x + y) = W (x) + hDW (x), yi +1

2y

T_{H(x)y + |y|}2_{r(x, y),}

where y := Xj+1−Xjis a one-step increment of the scaled process Xj, the matrix H(x) is the

Hessian matrix of the function W (x), and the function r(x, y) is such that lim|y|→0r(x, y) = 0,

except when x and x + y are separated by the slow-down threshold. In the latter case we can bound r(x, y) from above, uniform in x, as follows:

r(x, y) ≤ 2BC+,

where C+is some positive constant, based on a uniform upper bound on |DW (x) − DW (x+ y)|.

To end the proof, we refer to Lemma 5.5 in [15] for the following bound that holds when x and x + y are not separated by the slow-down threshold,

1 2y

T_{H(x)y + |y|}2_r(y)

≤ 2C B2_ǫ,

(19)

Lemma 4.5. For any x ∈ D we have H(DW (x)) ≥ 0, and for any x ∈ D+we have H+(DW (x)) ≥ 0.

Proof. For any x ∈ ¯D+we have H+(DW1(x)) = −2 log " λe− log(λ/˜λ+ ) λ + µ+₁ + µ2 + µ + 1e− log(˜µ + 2/µ2)+log(λ/˜λ + ) λ + µ+₁ + µ2 +µ2e log(˜µ+2/µ2) λ + µ+₁ + µ2 # = −2 log" ˜λ +_{+ ˜}_µ+ 1 + ˜µ+2 λ + µ+₁ + µ2 # = 0, H+(DW2(x)) = −2 log " λe− log(λ/˜λ) λ + µ+₁ + µ2 + µ + 1elog(λ/˜λ) λ + µ+₁ + µ2 + µ2 λ + µ+₁ + µ2 # = −2 log" ˜λ + µ + 1λ/˜λ + µ2 λ + µ+₁ + µ2 # ≥ 0,

where the last inequality is found by considering the convex function f (x) := (x + λµ+1/x +

µ2)/(λ+µ+₁+µ2); since f (λ) = f (µ+₁) = 1 it follows that f (x) < 1 for any x ∈ [λ, µ2] ⊂ [λ, µ+₁].

Finally we also have

H+(DW3(x)) = −2 log λ + µ + 1 + µ2 λ + µ+₁ + µ2 = 0.

Combining these bounds with representation (34) and keeping in mind the concavity of H+(x) (thanks to Proposition 3.2 in [8]) we obtain

H+(DW (x)) = H+ 3 X i=0 ρi(x)DW (x) ! ≥ 3 X i=0 ρi(x)H+(DWi(x)) ≥ 0, for any x ∈ ¯D+_.

The proof of the other statement is analogous, or follows from Lemma 5.3 in [15].

Lemma 4.6. Consider the slow-down network and recall the definition of τ_Bx in (6). For any sequence υBsuch that limB→∞υB = 0 the following limit holds:

lim B→∞ 1 B log E(e υBτBx|I B(Ax) = 1) = 0.

Proof. We define a new random variable τ which represents the same random period of time as τ_Bx, but for the case when θ = 0, i.e., for the two-node tandem Jackson network with parameters (λ, µ+1, µ2). It is clear that τBx ≤st τ . From Lemma 5.6 in [15] we know that for

the standard Jacksonian tandem network lim B→∞ 1 B log E(e υBτ_|I B(Ax) = 1) = 0,

(20)

Theorem 4.7. When µ2< µ+1 < µ1 and Assumption 4.1 holds, the new measure in (35), where the

function W is based on (20), is asymptotically optimal.

Proof. We will roughly follow the proof of Theorem 5.7 in [15], finding upper bounds on each of the three terms in Lemma 4.3.

To deal with the first term, we first bound W (x) from below. Upon combining the fact that W2(x) ≥ W1(x) − δ for any x ∈ ¯D ∪ ¯D+(this is shown in the same manner as in Thm.

5.7 of [15]; use (20)), with the monotonicity of γ(x) in both x1 and x2, and using definition

(33), it is found that

W (x) ≥ −ǫ log(e−W1(x)/ǫ_{+ e}(−W1(x)+δ)/ǫ_{+ e}−W3(x)/ǫ₎

≥ −ǫ log(3e(−2γ(x)+3δ)/ǫ) = 2γ(x) − ǫ log(3) − 3δ. (37)

Using the same technique we obtain an upper bound for W (Xτx B):

W (Xτx

B) ≤ −δ. (38)

Combining the inequalities (37)-(38) with Lemma 4.4 (take σ = τ_Bx), we now derive the following upper bound on the first term in Lemma 4.3:

B 2 τx B−1 X j=0 hDW (Xj), Xj+1− Xji ≤ B 2(−2γ(x) + η(B)) + C Bǫτ x B+ C+τBx,+, (39)

where C and C+ are some positive constants, η(B) is such that limB→∞η(B) = 0 (use

As-sumption 4.1), and τ_Bx,+is the number of slow-down threshold crossings up to time τ_Bx. Now let us bound the second term in Lemma 4.3. For any x ∈ ∂2we have hDW2(x), −v2i =

hDW3(x), −v2i = 0 and hDW1(x), −v2i = 2 log(˜µ2/µ2); applying (34) we arrive at

hDW (x), −v2i = 2 log ˜

µ2

ρ1(x) ≥ −2γρ1(x) ≥ −2γe−(W1(x)−W2(x))/ǫ, (40)

where the first inequality comes from the fact that ˜µ2 ≥ λ (see Proposition 3.3). It is also clear

that W1(x)−W2(x) = δ for any x ∈ A1∩∂2, where the functions W1(x) and W2(x) are defined

by (31). The second statement of Proposition 3.3 guarantees that the difference W1(x)−W2(x)

decreases to 0 as x goes from (α1, 0) to (α−11 , 0). From here we can immediately find 0 ≤

W1(x) − W2(x) ≤ δ, which implies:

hDW (x), −v2i ≥ −2γe−δ/ǫ.

Using the same technique and keeping Proposition 3.3 in mind, one can also show that hDW (x), −v1i ≥ −2γe−δ/ǫ,

for any x ∈ ∂1. Using these inequalities we can bound the second term in Lemma 4.3 from

above: 2 X k=1 1 2 τx B−1 X j=0 hDW (Xj), vkiI{Xj = Xj+1 ∈ ∂k} ≤ γe−δ/ǫτBx. (41)

(21)

Finally note that Lemma 4.5 provides a straightforward bound on the last term of the log-likelihood expression in Lemma 4.3:

H(DW (Xj))I{Xj∈D}+ H

+_{(DW (X}

j))I{Xj∈D+} ≥ 0. (42)

Upon combining (39), (41) and (42), we bound (36) in the following way: log(L(A)) ≤ −Bγ(x) + Bη(B) 2 + χ(B)τ x B+ C+τBx,+, where χ(B) := γe−δ/ǫ+ C ǫB. Now for any path Axwe have

1 Blog E [L(A x_)I B(Ax)] = 1 B log(E [L(A x_)|I B(Ax) = 1] P [IB(Ax) = 1]) ≤ 1 B log Ehe−Bγ(x)+Bη(B)+χ(B)τBx+C + τBx,+|I B(Ax) = 1 i px_B = −γ(x) + η(B) 2 + 1 B log E h eχ(B)τBx|I B(Ax) = 1 i +1 B log E h eC+τBx,+|I_B(Ax) = 1 i + 1 B log p x B.

Using that limB→∞τ_Bx,+/B = 0 a.s. when IB(Ax) = 1, and that limB→∞χ(B) = 0 (see

Assumption 4.1), and invoking Lemma 4.6 and Theorem 3.2, we conclude that lim B→∞ 1 B log E [L(A x_)I B(Ax)] ≤ −2γ(x) = 2 lim B→∞ 1 B log p x B.

In view of criterion (4), this completes the proof.

4.2 Asymptotic efficiency for µ+1 ≤ µ2 < µ1

Remarkably, we can use the same function W (x) for this case as in the previous subsection, see (33); also we define the new measures ¯λ(x), ¯µ1(x), ¯µ2(x) and ¯λ+(x), ¯µ+1(x), ¯µ+2(x) in

the same way, see (35).

One difference with the previous case is that Lemma 4.5 no longer holds. In fact the Hamiltonians can be negative now, but the next lemma shows that they vanish as B grows large; recall that due to Assumption 4.1 we have that 1/ǫ → ∞ and δ/ǫ → ∞ as B → ∞. Lemma 4.8. For any x ∈ D we have H(DW (x)) ≥ 0, and for any x ∈ D+we have

H+(DW (x)) ≥ −C⋆e−(θ−2γδ)γ/ǫ

(22)

Proof. Using the same technique as in the proof of Lemma 4.5, one can prove that the first statement holds, while for the second statement we find

H+(DW (x)) ≥ ρ2(0, θ)H+(DW2(0, x2)),

if x ∈ D+. The second factor on the right hand side is in fact a constant, since DW2does not

depend upon its second argument. Furthermore it is negative, and ρ2(0, θ) ≤

e−W2(0,θ)/ǫ

e−W1(0,θ)/ǫ ≤ e

−(θ−δ 2γ)γ/ǫ_,

so that the claim follows.

Theorem 4.9. When µ+₁ ≤ µ2 < µ1 and Assumption 4.1 holds, the new measure in (35), where the

function W is based on (23) and (17), is asymptotically optimal.

Proof. In order to prove this theorem we again bound all three terms of the log-likelihood ratio, see Lemma 4.3. For the first two terms we find exactly the same as in (39) and (41). As for the third term, this is now bounded in Lemma 4.8. Thus, we find again

log(L(A)) ≤ −Bγ(x) + Bη(B) 2 + χ(B)τ x B+ C+τBx,+, only now χ(B) = γe−δ/ǫ+ C ǫB + C⋆ 2 e −(θ−δ 2γ)γ/ǫ,

which also vanishes as B grows large. From now on we can follow the proof of Theorem 4.7 which leads us to the result.

4.3 Asymptotic efficiency for µ+1 < µ1 ≤ µ2

For the case when µ+1 < µ1 ≤ µ2we again use the function W (x) defined by (33), to describe

the IS scheme as in (35), which we can prove to be asymptotically efficient. It is important that in this case the function W2(x) plays a more important role than before. Because of the

structure and the cost of the optimal path to overflow we now have ¯

W (x) = W2(x), for any x ∈ C1∪ {x : x2 ≤ δ/2γ},

see also Figure 4. In the two previous subsections this was only valid on {x : x2 ≤ δ/2γ}. Theorem 4.10. When µ+₁ < µ1 ≤ µ2 and Assumption 4.1 holds, the new measure in (35), where

the function W is based on (30) and (17), is asymptotically optimal.

Proof. Not surprisingly, the proof of this theorem is almost the same as that of Theorem 4.9. Even Lemma 4.8 remains valid in this case. The only essential difference is the behavior of the function W (x) on the constraints ∂1and ∂2. This leads to a different bound on the second

(23)

term of the log-likelihood in Lemma 4.3. As in Theorems 4.7 and Theorem 4.9 we have for any x ∈ ∂2,

hDW (x), −v2i ≥ −2 log(1/z)ρ1(x),

but for ρ1(x) we now have by Theorem 5.8 in [15] that

ρ1(x) ≥ exp −2β log(1/z) ǫ ,

where β is unique solution to the equation f (0, β) = 0, see also (28). We conclude, that for any x ∈ ∂2, hDW (x), −v2i ≥ 2 log(z) exp −2β log(1/z) ǫ . For x ∈ ∂1we have, again due to Theorem 5.8 in [15]:

hDW (x), −v1i ≥ −2 log(µ1/λ)e−2δ/ǫ.

The rest of the proof can be done by mimicking the arguments used in Thm. 4.7 or Thm. 4.9.

5 Conclusions

This paper focused on constructing IS schemes for estimating the probability of a specific rare event: overflow in the second queue of the slow-down network before the system idles, starting from any given state. We proved asymptotic efficiency of the proposed new mea-sure. The analysis heavily relied on large-deviations argumentation.

One can look at this result from two different perspectives. On one hand, this is the continuation of our earlier work on rare-event simulation in a two-node Jacksonian tandem network, see [15]. On the other hand, this paper can be viewed as the generalization of already existed research on the slow-down network, see [7, 13]. We rigorized and further studied the empirical findings of [13]. Also, [7, 13] consider the restrictive case that the only possible starting state is the origin. In our paper we developed IS schemes for all three pos-sible cases (second queue bottleneck, ‘shifting bottleneck’, first queue bottleneck), unlike [7] that specializes to a specific ordering of the parameters: λ < µ+1 < µ2 ≤ µ1(which is covered

by our ‘shifting bottleneck’ case µ+₁ ≤ µ2 < µ1). In our schemes one may pick an arbitrary

starting state x. An important by-product of our analysis is a precise description of the typi-cal path to overflow (in the second queue), starting in an arbitrary state. Although our proofs use specific properties of the model at hand, we strongly feel that our methodology carries over to more general classes of queues.

As indicated in the introduction, a next challenge is to transform the methods presented in this paper into simulation programs. We stress that, even with an asymptotically efficient

(24)

new measure at our disposal, new questions come up: should we compute the new measure ‘on the fly’ (that is, while running the program), or precompute it (and store it)? Also, it may pay off to partition the state space into a small number of sets, and to approximate the state-dependent change of measure by new measures that are constant on these sets. A detailed simulation study, as well as extensive practical guidelines, will be presented in a forthcoming paper.

A

Appendix. Large deviations

The goal of this appendix is to establish the result that the cost of the optimal path to over-flow is equal to the exponential decay rate of our probability of interest, see Theorem 3.2. We also highlight a number of important and interesting large-deviations properties of the process X(t).

Let us consider any absolutely continuous function φ : [0, ∞) → ¯D ∪ ¯D+_{, representing a}

path associated with the scaled process X(t). Our first aim is to define a so-called local rate function ℓ(φ(t), ˙φ(t)), which depends both on the position at time t and on the time derivative (or speed vector) ˙φ(t) at time t. To do it, first we define four auxiliary functions Li(y), where

the argument y should be interpreted as a ‘speed vector’: Li(y) := sup ϑ (hϑ, yi − gi(ϑ)) , i = 1, . . . , 4, (43) and where g1(ϑ) := λ(eϑ1 − 1) + µ1(eϑ2−ϑ1 − 1) + µ2(e−ϑ2 − 1), g2(ϑ) := λ(eϑ1 − 1) + µ+1(eϑ2−ϑ1− 1) + µ2(e−ϑ2− 1), g3(ϑ) := λ(eϑ1 − 1) + µ2(e−ϑ2 − 1), g4(ϑ) := λ(eϑ1 − 1) + µ1(eϑ2−ϑ1 − 1);

cf. (5.5) in [19]. It is observed that g1(·) corresponds to D, g2(·) to D+, g3(·) to ∂1, and g4(·) to

∂2. Now we can define the local rate function ℓ as:

ℓ(φ(t), ˙φ(t)) :=                          L1( ˙φ(t)), if φ(t) ∈ D, L2( ˙φ(t)), if φ(t) ∈ D+∪ ∂e, [L1⊕ L3]( ˙φ(t)), if φ(t) ∈ ∂1\ ∂1+, [L2⊕ L3]( ˙φ(t)), if φ(t) ∈ ∂1+, [L1⊕ L2 ⊕ L3]( ˙φ(t)), if φ(t) = (0, θ), [L1⊕ L4]( ˙φ(t)), if φ(t) ∈ ∂2, [L1⊕ L2]( ˙φ(t)), if φ(t) ∈ ∂θ, (44)

where, for n ≥ 2, and y denoting a two-dimensional vector, [L1⊕ . . . ⊕ Ln](y) := inf ( _n X i=1 ρiLi(yi) : ρi ≥ 0, n X i=1 ρi= 1, n X i=1 ρiyi = y ) ,

(25)

is the inf-convolution of the functions L1, . . . , Ln, the infimum being taken over all values ρi

and vectors yi, i = 1, . . . , n, that satisfy the given conditions.

We now briefly explain why we use this inf-convolution on the boundaries of the state space. Assume that the scaled process X(t) follows a path φ(t) ∈ ∂1\ ∂1+, such that ∂φ2/∂t >

0 for t ∈ [0, T ]. Hence, the first and the second components of the vector y should be zero and strictly positive, respectively. It is clear that the original (unscaled) jump process Q(t) can only increase its second component when it is not on ∂1, since jumps of the v1 type are

not allowed on ∂1. Therefore the inf-convolution provides a ‘mixture’ of the functions L1

and L3, supposing that the process Q(t) spends a fraction of time ρ in the lower part of the

interior D and a fraction 1 − ρ on the vertical constraint. Note that ρ must be such that φ(t) has speed y with positive increment in the vertical direction and zero-increment in the horizontal direction, such that the scaled process X(t) remains in ∂1.

Now we are ready to state the following theorem.

Theorem A.1. The process X(t) satisfies a large deviations principle with rate function (44), i.e., lim B→∞− 1 Blog p x B= inf Z τ 0 ℓ(φ(t), ˙φ(t))dt,

where τ := inf{t > 0 : φ(t) ∈ ∂e, φ(s) 6= 0, s ∈ (0, t)} and the infimum is taken over all absolutely

continuous functions φ : [0, ∞) → ¯D ∪ ¯D+_{such that φ(0) = x and τ < ∞.}

Proof. We sketch the proof of this result, as it is reminiscent of results proven in [7]; see also the proof of Thm. 4.1 in [15]. The main arguments are taken from [5, 6].

We first introduce the process Z(t), which is an unconstrained version of X(t), that is, Z(t) is allowed to have negative values in both components. In addition we will assume that Z(0) = X(0) = x ∈ ¯D ∪ ¯D+. One can then use Theorems 3.2 and 3.4 of [6] to show that the map Γ : Z(t) → X(t) exists and Theorem 2.2 from the same paper to show that it is Lipschitz continuous. Γ is known as the Skorokhod map and the question whether it exists is referred as the Skorokhod problem; for more details see [6].

Since the map Γ is Lipschitz continuous and the process Z(t) satisfies a large deviation principle (see Theorem 7.2.3 of [5]), one can apply the contraction principle (see Theorem 2.13 of [19]) and conclude that the process of our interest, X(t), satisfies a large deviations prin-ciple with rate function ℓ(φ(t), ˙φ(t)) defined by (44).

To prove Thm. 3.2, we now recapitulate the main findings of Section 4 in [15]. Using the local rate function ℓ we can define the rate function of any path φ(t) = (φ1(t), φ2(t)) with

t ∈ [0, T ] for some T , as the integral of ℓ over time. At first let us mention the following property: for the paths that stay in one of the subsets D, D+, ∂1\ ∂₁+, ∂₁+, ∂2, the rate function

(44) is minimal when the path is straight, with constant speed vector; see Lemma 5 of [15], and p. 87 of [19].

Now we assume that φ(t) ∈ D, for t ∈ (0, T ) is a path between two states x and y. We know that the path φ(t) has minimal cost if the process X(t) moves along a straight line at

(26)

constant speed. We can define a corresponding new measure as follows: ˜ λ = λeϑ1_, ˜ µ1 = µ1eϑ2−ϑ1, (45) ˜ µ2 = µ2e−ϑ2,

where ϑ = (ϑ1, ϑ2) is the maximizer in (43) with i = 1. In fact this is exactly the same new

measure we would find using the cost minimization procedure from Section 3, due to the immediate equality

ℓ(φ(t), ˙φ(t)) = I(˜λ, ˜µ1, ˜µ2); (46)

see again [15]. This equality however, does not hold on the boundaries. Instead, when φ(t) stays on ∂1or ∂2for t ∈ [0, T ], we have

ℓ(φ(t), ˙φ(t)) ≤ I(˜λ, ˜µ1, ˜µ2),

where the new measure (˜λ, ˜µ1, ˜µ2) is again defined as in (45). However, for the optimal paths

we still have equality between local rate functions and cost functions on the boundaries, see Lemma 6 in [15]. Let, e.g., Φ1be the set of paths that travels a distance h > 0 along ∂1\ ∂+1 at

constant speed during a time σ, i.e.,

Φ1 = {φ(t) ⊂ ∂1\ ∂1+: φ(0) = (0, x⋆2), φ(σ) = (0, x⋆2+ h)},

for some x⋆₂. Then we have the following relation inf φ∈Φ1 Z σ 0 ℓ(φ(t), ˙φ(t))dt = h inf I(˜λ, ˜µ1, ˜µ2) ˜ µ1− ˜µ2 ,

where the second infimum is taken over all ˜λ, ˜µ1and ˜µ2 such that ˜λ < ˜µ1 and ˜µ1 > ˜µ2. We

have a similar situation for paths that follow ∂1+or the horizontal constraint ∂2.

Our last result, which is an analogue of Lemma 7 in [15], regulates the number of sub-paths of the optimal path to overflow.

Lemma A.2. The optimal path from any starting state x to the exit boundary ∂edoes not have more

than

(i) two subpaths in each subset, if µ+₁ < µ1 ≤ µ2, and

(ii) one subpath in each subset otherwise.

Finally, using all the results in this appendix, Thm. 3.2 follows; the proof is analogous to the proof of Thm. 8 in [15].

(27)

References

[1] V. Anantharam, P. Heidelberger, and P. Tsoucas. Analysis of rare events in continuous time Markov chains via time reversal and fluid approximation. IBM Research Report, REC 16280, 1990.

[2] P.T. de Boer. Analysis of state-independent importance-sampling measures for the two-node tandem queue. ACM Transactions on Modeling and Computer Simulation, 16(3):225– 250, 2006.

[3] P.T. de Boer, Victor F. Nicola, and Reuven Y. Rubinstein. Adaptive importance sampling simulation of queueing networks. In Proceedings of the 2000 Winter Simulation Conference (WSC’00), pages 646–655, Orlando, Florida, 2000.

[4] P.T. de Boer and W.R.W. Scheinhardt. Alternative proof with interpretations for a recent state-dependent importance sampling scheme. Queueing Systems: Theory and Applica-tions, 57(2-3):61–69, 2007.

[5] P. Dupuis and R.S. Ellis. A weak convergence approach to the theory of Large deviations. John Wiley & Sons, New York, 1997.

[6] P. Dupuis and H. Ishii. On Lipschitz continuity of the solution mapping to the Sko-rokhod problem, with applications. Stochastics and Stochatics Reports, 35:31–62, 1991. [7] P. Dupuis, K. Leder, and H. Wang. Large deviations and importance sampling for a

tan-dem network with slow-down. Queueing Systems: Theory and Applications, 57(2-3):71–83, 2007.

[8] P. Dupuis, A.D. Sezer, and H. Wang. Dynamic importance sampling for queueing net-works. Annals of Applied Probability, 17(4):1306–1346, 2007.

[9] P. Glasserman and S.-G. Kou. Analysis of an importance sampling estimator for tandem queues. ACM Transactions on Modeling and Computer Simulation, 1(5):22–42, 1995. [10] P. Heidelberger. Fast simulation of rare events in queueing and reliability models. ACM

Transactions on Modeling and Computer Simulation, 5(1):43–85, 1995.

[11] D.P. Kroese, W.R.W. Scheinhardt, and P.G. Taylor. Spectral properties of the tandem Jackson network, seen as quasy-birth-and-death process. Annals of Applied Probability, 14(4):2057–2089, 2004.

[12] R. Malhotra, M. Mandjes, W. Scheinhardt, and H. van den Berg. A feedback fluid queue with two congestion control thresholds. To appear in: Mathematical Methods in Opera-tions Research, 2008.

[13] D.I. Miretskiy, W.R.W. Scheinhardt, and M.R.H. Mandjes. Efficient simulation of a tan-dem queue with server slow-down. Simulation, 83(11):751–767, 2007.

(28)

[14] D.I. Miretskiy, W.R.W. Scheinhardt, and M.R.H. Mandjes. Tandem queue with server slow-down. ACM Sigmetrics Performance Evaluation Review, 35(3):51–52, 2007.

[15] D.I. Miretskiy, W.R.W. Scheinhardt, and M.R.H. Mandjes. State-dependent

im-portance sampling for a Jackson tandem network. Submitted, 2008. See

also Memorandum 1867, Dept. of Applied Mathematics, University of Twente, URL: http://eprints.eemcs.utwente.nl/12734/.

[16] W. Noureddine and F. Tobagi. Selective back-pressure in switched Ethernet LANs. In Global Telecommunications Conference, 2, 1999.

[17] S. Parekh and J. Walrand. A quick simulation method for excessive backlogs in net-works of queues. IEEE Transactions on Automatic Control, 34:54–66, 1989.

[18] J. S. Sadowsky. Large deviations theory and efficient simulation of excessive backlogs in a GI/GI/m queue. IEEE Transactions on Automatic Control, 36(12):1383–1394, 1991. [19] A. Shwartz and A. Weiss. Large deviations for performance analysis. Queues, communications

and computing. Chapman & Hall, London, UK, 1995.

[20] N. D. van Foreest, M.R.H. Mandjes, J.C.W. van Ommeren, and W.R.W. Scheinhardt. A tandem queue with server slow-down and blocking. Stochastic Models, 21(2-3):695–724, 2005.