
MSc Stochastics and Financial Mathematics

Master Thesis

The (Generalized) Adaptive Multilevel Splitting algorithm, and its applications

Author: Jeffrey Guijken
Supervisor: dr. S.G. Cox
Examination date: June 14, 2021


Abstract

The Adaptive Multilevel Splitting algorithm (AMS-algorithm) can be used to efficiently estimate rare event probabilities of the form P(X ≥ a), where X is a random variable with a continuous cumulative distribution function, for which we can sample from L(X|X > x) for all x ∈ R. In particular, we can consider the case that X = sup_{n∈N} X_n, where (X_n)_{n∈N} is a Markov chain on R. In this thesis, we give a proof of the well-known fact that the obtained AMS-estimator is unbiased, and we analyze its variance for the special case that k = 1. Furthermore, we show a central limit theorem for the AMS-estimator in this special case. In order to estimate rare event probabilities of the form P(X_n ∈ B for some n ∈ N) for multidimensional Markov chains (X_n)_{n∈N} on R^d, the AMS-algorithm can be generalized to the MMC-algorithm. With this MMC-algorithm, the expected time E[τ_B] that it takes a Markov chain to reach a set B can also be estimated. In order to prove that the resulting MMC-estimator for P(X_n ∈ B for some n ∈ N) is unbiased, we follow the general framework from [BGGLR16], called the GAMS-algorithm. For this GAMS-algorithm, based on [BGGLR16], we show unbiasedness of the GAMS-estimator under some assumptions, and we show that the MMC-algorithm can be written as a GAMS-algorithm and that it satisfies these assumptions. Furthermore, we discuss a central limit theorem, proven in [CDGR19], for the MMC-estimator, based on the theory of Fleming-Viot particle systems in [CDGR17]. Finally, the MMC-algorithm is tested in practical situations.

Title: The (Generalized) Adaptive Multilevel Splitting algorithm, and its applications
Author: Jeffrey Guijken, jeffrey.guijken@student.uva.nl, 11234946
Supervisor: dr. S.G. Cox
Second Examiner: prof. dr. M.R.H. Mandjes
Examination date: June 14, 2021

Korteweg-de Vries Institute for Mathematics, University of Amsterdam
Science Park 105-107, 1098 XG Amsterdam


Contents

Introduction

1 The AMS-algorithm in the idealized case
1.1 Motivation, setting and assumptions
1.2 Adaptive Multilevel Splitting algorithm
1.3 Unbiasedness of the AMS-algorithm
1.4 Proof of Theorem 1.5
1.5 The algorithm stops almost surely
1.6 A simple case, k = 1

2 A General Framework: the GAMS-algorithm
2.1 Setting and GAMS-algorithm
2.2 Six assumptions and unbiasedness result
2.3 Proof of Theorem 2.9

3 Application to Markov chains and estimated time problems
3.1 A GAMS-algorithm for Markov chains: the MMC-algorithm
3.2 A central limit theorem for the GAMS-estimator in the Markov setting
3.3 Rare event return time problems

4 Numerical analysis
4.1 A random walk on Z
4.2 A moving particle in R
4.3 A moving particle in R²
4.4 Modeling interest rates: the Ornstein-Uhlenbeck SDE

Conclusion

Populaire samenvatting (popular summary, in Dutch)


Introduction

A rare event is a probabilistic event that occurs with a very small probability. One could (naively) think that such rare events are not relevant, because they will most probably not happen. However, the theory of rare events is a crucial one, with applications in all sorts of disciplines. One can think of a banker, who is interested in the probability that his bank will go into default if he makes a certain deal. Although this probability is very small, it is crucial for the banker to know how small this probability exactly is: it can make a huge difference whether it is of order 10^{-8} or 10^{-15}. Another application can be found in physics, where at each time instant, particles can jump from one state to another with a very small probability, and where one is interested in how long it takes for a particle to make such a jump. Although the jump probability is very small for a single particle, the problem is still interesting, because very many particles are present.

In this theory of rare events, it is therefore important to estimate the probability of the rare event. Building an estimator that is unbiased and has a low variance is a must. As is well-known, it is not a good idea to estimate a rare event probability, say p := P(X > a), using the Monte Carlo approach, that is, by sampling N i.i.d. random variables X_1, . . . , X_N with law L(X) (the law of X), and taking the quantity

(1/N) Σ_{n=1}^{N} 1_{X_n > a}

as the estimator for p. The reason for this is that as p decreases, the squared relative error

Var((1/N) Σ_{n=1}^{N} 1_{X_n > a}) / p²

blows up very fast. Because of this, mathematicians have created many methods to reduce this relative error (see for example [AG07]). One of these methods is the so-called Adaptive Multilevel Splitting algorithm (AMS-algorithm), which will be the basis of this thesis, and which was first introduced by Frédéric Cérou and Arnaud Guyader in [CG07].
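The blow-up of the relative error is easy to see numerically. The following minimal sketch uses a toy setting of our own choosing (not from the thesis): X ~ Exp(1), so that p = P(X > a) = e^{-a} is known exactly, and the theoretical squared relative error (1 − p)/(Np) can be tabulated as a grows.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100_000  # number of Monte Carlo samples

# Illustrative choice: X ~ Exp(1), so p = P(X > a) = exp(-a) is known exactly.
for a in [2.0, 6.0, 10.0]:
    p = np.exp(-a)
    X = rng.exponential(size=N)
    p_hat = np.mean(X > a)  # naive Monte Carlo estimator
    # Theoretical squared relative error: (1 - p) / (N p) ~ 1 / (N p) for small p
    rel_err2 = (1 - p) / (N * p)
    print(f"a={a}: p={p:.2e}, p_hat={p_hat:.2e}, squared rel. error = {rel_err2:.2e}")
```

For a = 10 the event is so rare that the N samples typically contain almost no hits, and the squared relative error exceeds 0.2 even with N = 10^5 samples.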

The AMS-algorithm is based on a splitting technique. The idea is to rewrite the probability p = P(X > a) as

p = ∏_{ℓ=1}^{L} P(X > a_ℓ | X > a_{ℓ−1}),

where −∞ = a_0 < a_1 < · · · < a_{L−1} < a_L = a. These real numbers (a_ℓ)_{ℓ=0}^{L} are called the levels, and the idea of the AMS-algorithm is to define these levels adaptively during the algorithm, instead of defining them up front.

In particular, the AMS-algorithm does the following: first, it samples N i.i.d. random variables X_1^0, . . . , X_N^0 with law L(X), just as in the Monte Carlo approach. Then, the algorithm disables the k smallest of those random variables, and replaces them by k random variables with law L(X|X > Z_0), where Z_0 is the highest of the disabled random variables. This procedure of replacing is then repeated in each iteration (with resulting random variables X_1^q, . . . , X_N^q in iteration q), until at least N − k + 1 of the random variables in a certain iteration are higher than a. This final iteration is called Q_iter.

The AMS-estimator is then based on the random variables 1_{X_n^{Q_iter} > a} obtained from the last iteration, and is given by

p̂ = Σ_{n=1}^{N} G_n^{Q_iter} 1_{X_n^{Q_iter} > a},

where the weights G_n^{Q_iter} are defined adaptively during the algorithm, and can be thought of as the first L − 1 terms P(X > a_ℓ | X > a_{ℓ−1}) in the expression of the splitting step above. Note that the Monte Carlo estimator looks similar, but it has weights 1/N, which are typically larger than the weights G_n^{Q_iter} in the above expression.

In this thesis, the AMS-algorithm will be used for Markov chains in particular. Namely, given a Markov chain (X_t)_{t∈I}, where I is a countable index set, we are interested in the event that the Markov chain ever crosses some value a. In other words, if we take X = sup_{t∈I} X_t in the AMS-algorithm, then we can estimate this probability.

Let us now explain why we are interested in the event that a Markov chain ever reaches some value a. For this, we consider a physical setting. Suppose that a particle is moving in R, and let us assume that for each location x ∈ R where the particle can be, the particle has some potential f(x) when it is at x. The function f is assumed to be differentiable, and we assume also that there is a "potential well", i.e., f has a local minimum. We assume that the particle generally moves from an area with a high potential to an area with a low potential. In physics and chemistry, there are many situations that can be described in this way. A physical example is given by particles in the air, which move from areas with high pressure to areas with low pressure. Another example (from chemistry) is the folding of proteins. This is a process in which a protein chain transforms from a "flat" structure to a "folded" structure. This folding process happens via multiple substructures that lie between the flat structure and the folded structure. For each transition, a lot of energy is needed, and each such transition can therefore be seen as a transition of a particle from one potential well to another potential well, where the potential hill between the two wells is caused by the energy that is needed to go from the flatter structure to the more folded structure (see also [PKN11]).

The simplest way to model the location of a particle in a potential well is given by the ODE

dX_t = −∇f(X_t) dt,

where X_t is the location of the particle at time t. Note that in this model, the particle

moves to the minimum of the potential well, and will in particular never exit the potential well. However, there are many forces acting on this particle which are not included in this model. For example, one could think of the force that arises from the particle colliding with other particles. To compensate for this, we add a stochastic force term to the equation, in the form of a Brownian motion. This idea originates from [Kramers40], and the resulting dynamics are given by the following SDE:

dX_t = −∇f(X_t) dt + √(2ε) dW_t,


The problem in which we are interested is now the following: assume that the potential well is given by some interval (−∞, a*]. We are interested in the probability that the particle ever leaves the potential well, given that it starts there. One could think of the potential function being f(x) = −x³ + x, which has a local minimum in −√3/3, and the potential well can be taken as (−∞, √3/3]. See the figure below.

Figure 0.1: Potential function f(x) = −x³ + x, together with the potential well and crossing value a*.

Because the particle is tempted to stay within the well, the probability that we want to estimate is very small. Therefore, we want to perform the AMS-algorithm to estimate this probability. In order to do this, we can discretize the above SDE (for example, with an Euler-Maruyama scheme, see for example [CN14, p. 31]), which results in a Markov chain. Then, we perform the AMS-algorithm on this Markov chain, as described in the beginning of the introduction.
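The discretization step can be sketched as follows: a minimal Euler-Maruyama scheme for the SDE dX_t = −∇f(X_t) dt + √(2ε) dW_t with the example potential f(x) = −x³ + x. The step size, noise level ε and time horizon below are illustrative choices, not values from the thesis.

```python
import numpy as np

def f_prime(x):
    # derivative of the example potential f(x) = -x^3 + x
    return -3.0 * x**2 + 1.0

def euler_maruyama(x0, eps, dt, n_steps, rng):
    """One discretized path of dX = -f'(X) dt + sqrt(2*eps) dW."""
    x = np.empty(n_steps + 1)
    x[0] = x0
    noise = rng.normal(scale=np.sqrt(2.0 * eps * dt), size=n_steps)
    for n in range(n_steps):
        x[n + 1] = x[n] - f_prime(x[n]) * dt + noise[n]
    return x

rng = np.random.default_rng(1)
a_star = np.sqrt(3.0) / 3.0   # right edge of the potential well (-inf, a*]
# start in the local minimum of the well, at -sqrt(3)/3
path = euler_maruyama(x0=-a_star, eps=0.05, dt=1e-3, n_steps=10_000, rng=rng)
print("exited the well:", bool(np.any(path >= a_star)))
```

For small ε the exit is exactly the rare event described above: plain simulation of such paths almost never observes it, which is what motivates the splitting approach.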

Finally, let us comment on the generalized AMS-algorithm (GAMS-algorithm) that is introduced in this thesis, originating from [BGGLR16]. This GAMS-algorithm was created to make the above reasoning valid also for particles that move in a multidimensional space. It can also be used to analyze more complicated models for the particle. In particular, in the dynamics from [Kramers40] above, we did not include the acceleration of the particle. This acceleration can be included by adding an SDE for the velocity of the particle, thus obtaining a system of two SDEs. The GAMS-algorithm can then estimate the probability p using these more advanced dynamics.

Overview of the chapters

In Chapter 1, the AMS-algorithm is properly defined. We prove (based on [BLR15]) that the obtained AMS-estimator is unbiased, and for the case k = 1, we derive the variance of the estimator and give a central limit theorem for it.


In Chapter 2, we generalize the AMS-algorithm from Chapter 1 to obtain the GAMS-algorithm. We explain why such a generalized version is needed if we want to work in higher dimensions. Furthermore, under some assumptions, we prove that the obtained GAMS-estimator is unbiased, based on [BGGLR16].

In Chapter 3, two applications of the GAMS-algorithm are discussed. First, it is proved that multidimensional Markov chains satisfy the assumptions made in Chapter 2, so that if we plug a multidimensional Markov chain into the GAMS-algorithm, we obtain an unbiased estimator. The resulting algorithm is called the MMC-algorithm. After this, we discuss how the GAMS-algorithm can also be used to estimate the expected time it takes for a Markov chain (or: particle) to enter a certain set (or: leave a potential well).

Finally, in Chapter 4, we test the MMC-algorithm in four situations: a moving particle in R and in R², a simple random walk on Z, and the Ornstein-Uhlenbeck process. In the latter two situations, theoretical formulas exist, so that we can really assess how well the algorithm performs.


Notation

Definition 0.1. Given a random variable X : Ω → R, we write L(X) for the law (or: distribution) of X. If Y : Ω → R is another random variable, then L(X|Y) denotes the law of X, conditional on Y.

Definition 0.2. Given a finite set A = {X_1, . . . , X_n} of random variables, the ℓ-th order statistic of A is defined as the ℓ-th smallest element of A, and is denoted by ord_ℓ(A).

Definition 0.3. Given two σ-algebras G and H on a space Ω, the smallest σ-algebra on Ω containing the union G ∪ H is denoted by G ∨ H.

Definition 0.4. Given a filtration (F_z)_{z∈R}, and given an (F_z)_{z∈R}-stopping time Z, we define the stopped σ-algebra F_Z by F_Z := {F ∈ F_∞ : F ∩ {Z ≤ z} ∈ F_z for all z ∈ R}.


1 The AMS-algorithm in the idealized case

The Adaptive Multilevel Splitting algorithm is a tool which can be used to efficiently estimate rare event probabilities. The algorithm is widely applicable, but in order to use the basic version, one has to make some assumptions. In the current chapter, we discuss the basic algorithm, together with the assumptions that must hold in order to be able to use it. The chapter consists of roughly three parts. First, we introduce the algorithm in Sections 1.1 and 1.2, and we prove some technical lemmas in Sections 1.3 and 1.4 about the various processes appearing in the algorithm, in order to prove unbiasedness of the estimator obtained by the algorithm (the AMS-estimator). Most of the results we cover in these sections heavily rely on the theory in [BLR15]. After proving the unbiasedness result, we show in Section 1.5 that the algorithm is well-defined, in the sense that it stops after an almost surely finite number of iterations. Finally, we consider the specific case in which only one of the introduced random variables is resampled, the case k = 1 (the parameter k is introduced below). For this choice, we derive the asymptotic distribution of the estimator, and thereby show some results about the quality of the estimator. This is the content of Section 1.6.

1.1 Motivation, setting and assumptions

In this section, some motivation for the study of the AMS-algorithm is given. Furthermore, we state the quantity that we want to estimate, and give some assumptions on the involved random variable. The problem is as follows: given a non-negative random variable X with law µ, and real numbers a > b ≥ 0, calculate

p(a, b) := P(X ≥ a | X > b).

We will make two assumptions on the random variable X:

Assumption 1.1. X has a continuous cumulative distribution function F ;

Assumption 1.2. For all x ∈ [b, a], we can sample random variables with distribution L(X|X > x).

If the latter assumption holds, then we call this the idealized setting. A crucial example of a random variable X for which Assumption 1.2 holds, but for which we do not know the full distribution function (so that we cannot compute p(a, b) directly), is when X = sup_{n∈N} X_n is the supremum of a time-homogeneous Markov chain (X_n)_{n∈N}, of which we know the dynamics. We assume that (X_n)_{n∈N} takes values in Z, and that the sample paths are "continuous", in the sense that if X_n = i and X_m = j for some m > n, then for all k ∈ {i, . . . , j} (assuming j > i), there exists ℓ ∈ {n, . . . , m} such that X_ℓ = k. Calculating the distribution function will not be easy in this case, although we do know how to sample according to the conditional law L(X|X > x). Namely, given that the Markov chain has crossed x ∈ [b, a], we can take a Markov chain (Y_n)_{n∈N} starting in x + 1, and let it follow the given dynamics of (X_n)_{n∈N} after that. Then, by the memorylessness property of Markov chains and the fact that {X > x} = {X ≥ x + 1} = {X_n = x + 1 for some n ∈ N}, we know that the resulting supremum Y of (Y_n)_{n∈N} has law L(X|X > x). Note that the time-homogeneity of the Markov chain X is used here, and that we cannot follow this procedure for Markov chains that are not time-homogeneous.
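This restart construction is easy to sketch in code. The chain below is an illustrative choice of our own (a biased ±1 random walk on Z, whose paths are "continuous" in the above sense and whose supremum is a.s. finite since it drifts downwards); it is not a chain studied in the thesis. We sample from L(X|X > x) by starting a fresh chain in x + 1 and taking its supremum, and compare against the gambler's-ruin formula P(X > x + j | X > x) = (p/(1 − p))^j.

```python
import numpy as np

rng = np.random.default_rng(2)
p_up = 0.3  # upward step probability; p_up < 1/2, so sup_n X_n is finite a.s.

def sup_of_chain(start, rng, floor=-50):
    """Supremum of a biased +/-1 random walk on Z started at `start`.
    The walk drifts downwards; we stop once it falls below `floor`, by which
    point the running maximum has stabilized with overwhelming probability."""
    x, sup = start, start
    while x > floor:
        x += 1 if rng.random() < p_up else -1
        sup = max(sup, x)
    return sup

# Sampling from L(X | X > x): restart the chain in x + 1, take its supremum.
x, j = 3, 2
samples = np.array([sup_of_chain(x + 1, rng) for _ in range(5000)])

r = p_up / (1.0 - p_up)        # ruin ratio: P(ever move one level up) = r
emp = np.mean(samples > x + j)  # empirical P(X > x + j | X > x)
print(f"empirical {emp:.3f} vs theoretical {r**j:.3f}")
```

The agreement illustrates that the restarted chain's supremum indeed has the conditional law, exactly as the memorylessness argument above predicts.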

This setting of taking the supremum of a Markov chain will be a key application of the AMS-algorithm considered in this thesis. Note: throughout this chapter, we always assume that the Markov chains considered are discrete-time, time-homogeneous Markov chains.

With this application in mind, we want to emphasize that if the above Markov chain takes its values in R^d with d > 1 instead of R, then we cannot apply the AMS-algorithm straight away. In this case, we cannot take X as the supremum of the Markov chain anymore. However, we can define a function ξ : R^d → R, and take X to be the supremum of ξ(X_n). The problem with this approach is that in general, we cannot sample from the conditional law L(X|X > x) anymore, because multiple choices of y ∈ R^d lead to the same value of ξ(y), i.e., there are multiple crossing values. As a result, we must ensure that the function ξ is such that if y_1 and y_2 satisfy ξ(y_1) = ξ(y_2), then sup_{n∈N} ξ(Y_n^1) and sup_{n∈N} ξ(Y_n^2) have the same distribution, where (Y_n^i)_{n∈N} is a Markov chain with Y_1^i = y_i, following the dynamics of (X_n)_{n∈N}. However, for the case d > 1, only one such function ξ is known, namely the so-called committor function (note: for d = 1, the function ξ(x) = x suffices, but this is not the committor function!). It turns out that we can write the probability p(a, b) that we want to estimate in terms of this function, which would make the AMS-algorithm redundant if we knew the committor function. In Chapter 2, we will therefore generalize the AMS-algorithm, so that it is also useful in the situation in R^d. With this generalization, we can also study one-dimensional Markov chains that are not of the form assumed above (i.e., "continuous" on Z).

Now that we have given an example for which Assumptions 1.1 and 1.2 are satisfied, but for which we cannot calculate the quantity p(a, b) directly, let us give some motivation for why we would want to use the AMS-algorithm instead of other methods, such as a Monte Carlo sampler or an importance sampler.

First of all, the naive way to estimate the quantity p(a, b) is the well-known Monte Carlo approach. The idea is then to sample i.i.d. random variables (W_n)_{n=1}^{N} according to the law L(X|X > b), and to take (1/N) Σ_{n=1}^{N} 1_{W_n ≥ a} as the estimator. However, this method leads to an estimator with a very large relative error as p(a, b) → 0. Indeed, the squared relative error is given by

Var((1/N) Σ_{n=1}^{N} 1_{W_n ≥ a}) / p(a, b)² = p(a, b)(1 − p(a, b)) / (N p(a, b)²) = (1 − p(a, b)) / (N p(a, b)) ≈ 1 / (N p(a, b))

as p(a, b) → 0. We will show in Section 1.6 that if we choose the parameter k in the AMS-algorithm equal to 1, then the relative error of the corresponding AMS-estimator grows a lot slower than that of the Monte Carlo estimator as p(a, b) → 0.

A second way to estimate rare events is by using a change-of-measure technique known as importance sampling. The new measure makes the rare event more likely, and the relative error can decrease drastically (compared with the Monte Carlo approach). The problem with this approach is that we need to come up with such a new measure, and we need to be able to compute the Radon-Nikodym derivative of the new measure with respect to the old one. This turns out to be difficult in some cases, of which the Markov chain setting is an example.
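As a small illustration of the change-of-measure idea, here is a standard textbook example (our choice, not from the thesis): to estimate P(X ≥ a) for X ~ N(0, 1), shift the mean of the sampling distribution to a, so the rare event becomes typical, and weight each sample by the Radon-Nikodym derivative dN(0,1)/dN(a,1) evaluated in the sample, which equals exp(−aY + a²/2).

```python
import math
import numpy as np

rng = np.random.default_rng(3)
a, N = 6.0, 100_000

# Sample under the new measure N(a, 1), where {Y >= a} is no longer rare.
Y = rng.normal(loc=a, size=N)
# Radon-Nikodym derivative dN(0,1)/dN(a,1) at Y: phi(Y)/phi(Y - a).
weights = np.exp(-a * Y + a**2 / 2.0)
p_hat = np.mean(weights * (Y >= a))

# Exact Gaussian tail via erfc, for comparison.
p_exact = 0.5 * math.erfc(a / math.sqrt(2.0))
print(f"IS estimate {p_hat:.3e}, exact {p_exact:.3e}")
```

Here the estimator attains a relative error of about a percent for a probability of order 10^{-9}, where plain Monte Carlo with the same N would almost surely return 0. The difficulty stressed above is that such an explicit, computable change of measure is not available in the Markov chain setting.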

1.2 Adaptive Multilevel Splitting algorithm

In this section, the basic AMS-algorithm is given. After this, we discuss the role that the parameter k plays in the algorithm.

Before we provide the algorithm, let us quickly explain the idea behind it. The algorithm starts with a number n_rep of i.i.d. random variables with law L(X|X > b). These are the same random variables that one would use if a Monte Carlo approach were preferred. However, in this case, the smallest k of the random variables are disabled, and are replaced by random variables that are bigger than the disabled random variables. This is done by sampling according to the conditional law L(X|X > Z_q), where Z_q is the k-th smallest random variable (i.e., the highest random variable that was disabled). In doing so, we make sure that the random variables get bigger in each iteration step, and the algorithm stops if at least n_rep − k + 1 random variables are above the threshold a.

Estimating the probability p(a, b) is then done by using an estimator similar to the one in the Monte Carlo approach, only in this case there are some (stochastic) weights (1 − k/n_rep)^{Q_iter} in front of the indicators, where Q_iter is the total number of iterations performed by the algorithm.

In the algorithm, we will use the concept of the ℓ-th order statistic, which is the ℓ-th smallest value of a given finite set of random variables. Recall that µ is the law of X.

Algorithm 1.3 (AMS-algorithm).

"""
- Input: a > b ≥ 0, k ∈ N, measure µ on B(R) describing the law of X, n_rep ∈ N≥2, a way to sample from L(X|X > x).
- For a > b ≥ 0, estimates the (rare event) probability p(a, b) = P(X ≥ a|X > b).
- Output: Estimator p̂_{n_rep,k}(a, b) for p(a, b).
"""

Initialization step (q = 0):
1. Set Z_{−1} := b;
2. Let X_1^0, . . . , X_{n_rep}^0 be independent random variables with law L(X|X > b);
3. Let Z_0 := X_{(k)}^0 be the k-th order statistic of (X_n^0)_{1≤n≤n_rep}.

Iteration step (q ≥ 0):
If Z_q ≥ a: stop the algorithm, and set Q_iter = q.
Else, perform the following three steps:
1. Independently of all previous random variables, sample k i.i.d. random variables (χ_m^q)_{1≤m≤k} ∼ L(X|X > Z_q);
2. For n ∈ {1, . . . , n_rep}: if X_n^q > Z_q, set X_n^{q+1} := X_n^q; if X_n^q ≤ Z_q, set X_n^{q+1} := χ_ℓ^q, where ℓ ≤ k is such that X_n^q = X_{(ℓ)}^q is the ℓ-th order statistic of (X_{n'}^q)_{1≤n'≤n_rep};
3. Set Z_{q+1} := X_{(k)}^{q+1}, and set q = q + 1.

Computation step: Return

p̂_{n_rep,k}(a, b) = (1/n_rep) Σ_{n=1}^{n_rep} (1 − k/n_rep)^{Q_iter} 1_{X_n^{Q_iter} ≥ a}.


Figure 1.1: Visual representation of the AMS-algorithm for n_rep = 3 and k = 1. In each iteration, the smallest (red) random variable is disabled, and is given the label Z_q. A new random variable X is sampled, conditional on {X > Z_q}. The surviving random variables are displayed in green. If Z_q crosses the level a, the algorithm is stopped.
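Algorithm 1.3 can be sketched in a few lines of code, under the idealized Assumption 1.2. The function `sample_cond` below is our hypothetical interface for sampling from L(X|X > x) (its name and signature are not from the thesis), and the test case X ~ Exp(1) is an illustrative choice for which L(X|X > x) = x + Exp(1) by memorylessness and p(a, b) = e^{−(a−b)} is known exactly.

```python
import numpy as np

def ams(sample_cond, a, b, n_rep, k, rng, max_iter=10**6):
    """Sketch of Algorithm 1.3; sample_cond(x, size, rng) must return
    i.i.d. samples from L(X | X > x) (Assumption 1.2)."""
    X = sample_cond(b, n_rep, rng)   # initialization: law L(X | X > b)
    q = 0
    while q < max_iter:
        order = np.argsort(X)
        Z = X[order[k - 1]]          # Z_q: the k-th order statistic
        if Z >= a:                   # at least n_rep - k + 1 replicas >= a
            break
        X[order[:k]] = sample_cond(Z, k, rng)  # resample the k smallest
        q += 1
    # q equals Q_iter at this point; weight (1 - k/n_rep)^Q_iter
    return (1.0 - k / n_rep) ** q * np.mean(X >= a)

def sample_cond_exp(x, size, rng):
    # illustrative case X ~ Exp(1): L(X | X > x) = x + Exp(1)
    return x + rng.exponential(size=size)

rng = np.random.default_rng(4)
a, b = 10.0, 0.0
estimates = [ams(sample_cond_exp, a, b, n_rep=100, k=1, rng=rng)
             for _ in range(200)]
print(f"AMS mean {np.mean(estimates):.2e}, exact {np.exp(-(a - b)):.2e}")
```

Averaging independent runs of the estimator recovers the exact value e^{−10} ≈ 4.5·10^{−5} to within a few percent, consistent with the unbiasedness proved in Section 1.3.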

Let us now discuss the role of the parameter k that appears in the algorithm. This is the number of resamples one makes in each iteration step. The algorithm is stated for general k, and we will be able to prove that the estimator is unbiased for all k (see Section 1.3). However, we will only be able to prove some efficiency properties of the resulting estimator for the case k = 1 (see Section 1.6). In [BLR15], the variance of the estimator is also analyzed for general k, and it can be shown that in the idealized setting, taking k = 1 minimizes this variance.

The need for Assumption 1.2 is clearly visible in the algorithm; we need to be able to sample random variables with law L(X|X > x). In Section 1.1, we already saw that if X is the supremum of a specific type of Markov chain, this assumption is satisfied once the dynamics of the Markov chain are known. Let us now give another example in which the sampling from the conditional distribution is easy.

Assume that X follows an exponential distribution with rate λ. Then, by memorylessness,

L(X|X > x) = L(X + x) = Exp(λ) + x,

and we know how to generate from the latter distribution. Note however that this example really only shows that conditional sampling can be easy in some cases, and does not serve any practical purpose, since the distribution function of (X|X > x) is explicitly known in this case, and we can calculate p(a, b) directly.
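The memorylessness identity above is easy to check numerically; a minimal sketch with illustrative values of λ and x (our choices, not from the thesis):

```python
import numpy as np

rng = np.random.default_rng(6)
lam, x = 2.0, 1.5

# Left-hand side: sample X ~ Exp(lam) and keep only the samples with X > x.
X = rng.exponential(scale=1.0 / lam, size=2_000_000)
lhs = X[X > x]

# Right-hand side: direct sampling of x + Exp(lam).
rhs = x + rng.exponential(scale=1.0 / lam, size=lhs.size)

print(f"means: {lhs.mean():.4f} vs {rhs.mean():.4f} (theory: {x + 1/lam:.4f})")
```

Both empirical means agree with the theoretical conditional mean x + 1/λ.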

In the following, we will see that it is no coincidence that this exponential case shows up. This case will be crucial in proving some results we will state, and we will give some intuition about this now. The trick will be that if we perform the AMS-algorithm on some random variable X ∼ µ, we can apply a transformation Λ (which we will define in Section 1.3) to the obtained quantities ((X_n^q)_{n≤n_rep}, Z_q)_{q≤Q_iter}. The resulting random variables, which we will later denote by

((Y_n^q)_{n≤n_rep}, S_q)_{q≤Q_iter} := ((Λ(X_n^q))_{n≤n_rep}, Λ(Z_q))_{q≤Q_iter},

turn out to have the same law as their counterparts

((Y_n^{q,exp})_{n≤n_rep}, S_q^{exp})_{q≤Q_iter} := ((Λ_exp(X_n^{q,exp}))_{n≤n_rep}, Λ_exp(Z_q^{exp}))_{q≤Q_iter},

obtained when using an exponential random variable X^exp ∼ µ_exp = Exp(1) instead of X, provided the correct parameters a_exp and b_exp (as counterparts of a and b) are chosen in the algorithm. This will be the content of Theorem 1.9. As a result, deriving distributional properties of the transformed random variables can be done just for the case where X is exponentially distributed. We will show that the AMS-estimator only depends on these transformed random variables, and that it is enough to prove that the AMS-estimator is unbiased in the case that X is exponentially distributed, in order to prove the unbiasedness for a general random variable X. In Sections 1.3 and 1.4, we will give a proof of the unbiasedness of the AMS-estimator in the exponential case (Proposition 1.10), and we show that this is indeed enough to obtain unbiasedness in the general case (Theorem 1.4).

1.3 Unbiasedness of the AMS-algorithm

In this section, we will show that the AMS-estimator obtained from the AMS-algorithm (Algorithm 1.3) is unbiased. Let us now state this result, and delay its proof until after Proposition 1.10.

Theorem 1.4. Let ((X_n^q)_{n≤n_rep}, Z_q)_{q≤Q_iter} be the sequence obtained from Algorithm 1.3 with distribution µ, initial level b ≥ 0, stopping level a > b, n_rep > 0 and 1 ≤ k ≤ n_rep − 1. Assume that Assumptions 1.1 and 1.2 hold, and that p(a, b) > 0. Then the AMS-estimator p̂_{n_rep,k}(a, b) is an unbiased estimator for p(a, b), i.e.,

E[p̂_{n_rep,k}(a, b)] = µ[a, ∞) / µ[b, ∞) = p(a, b).

Now, we set up the main framework for proving Theorem 1.4. Recall that we want to estimate the probability p(a, b) = P(X ≥ a|X > b), and that we want to show that we can (without loss of generality) assume that the involved random variable X has an exponential distribution, in order to prove the unbiasedness of the AMS-estimator. For this, we will state Theorem 1.5, which is a first big step towards the proof of this claim. To prove this result, we want to, as already mentioned in the previous section, transform the active random variables X_n^q in iteration step q of the algorithm, together with the levels Z_q.


First, let us introduce the transformation Λ. Given the distribution function F of X, define

Λ(x) := − log(1 − F(x)). (1.1)

In the remainder of this chapter, we will denote the Λ-transformed quantities of the AMS-algorithm by Y_n^q := Λ(X_n^q) and S_q := Λ(Z_q).

Now that we are ready to state Theorem 1.5, let us first explain why this result is important. On the one hand, the theorem shows that the distribution of the Λ-transformed quantities Y_n^q and S_q depends on the law µ only through S_{−1} = Λ(b). In other words, if we compare the random variables Y_n^q and S_q for different choices of µ (say µ_1 and µ_2, with corresponding Λ's given by Λ_1 and Λ_2, and initial levels b_1 and b_2 in the AMS-algorithm), but we ensure that Λ_1(b_1) = Λ_2(b_2), then the quantities (Y_n^q, S_q) will have the same law. This will be explained in more detail after Proposition 1.6. The fact that the resulting random variables do not depend on the law µ of X will be a big step in proving that we can assume that X is an exponential random variable.

On the other hand, Theorem 1.5 will allow us to specify the distribution of the quantities S_q, which turns out to be useful for proving some properties of the AMS-estimator in the case k = 1, and for proving that the algorithm stops after an almost surely finite number of iterations (see Sections 1.5 and 1.6).

The upcoming result was already stated in [BLR15, Proposition 3.2]; we added some missing details to the proof. Because of its length, the proof is delayed until Section 1.4. In this result, we assume that there is no stopping criterion in the algorithm. By this, we mean that we remove the stopping step in the algorithm, so that the random variables (X_n^q)_{n≤n_rep} and Z_q are defined for each q ≥ 0.

Theorem 1.5. Let Y_n^q = Λ(X_n^q) and S_q = Λ(Z_q) be the Λ-transformed random variables, obtained by running the AMS-algorithm without stopping criterion (i.e., if we do not stop the algorithm when Z_q ≥ a). Then the following three properties hold for all q ≥ 0:

(i)_q The random variables (Y_n^q − S_{q−1})_{n≤n_rep} are i.i.d. Exp(1)-random variables;

(ii)_q The sequence (Y_n^q − S_{q−1})_{n≤n_rep} is independent of (S_r − S_{r−1})_{0≤r≤q−1};

(iii)_q The sequence (S_r − S_{r−1})_{0≤r≤q} is i.i.d.

Before stating the consequences of Theorem 1.5 in detail, let us derive a property of the transformation Λ. We make the following observation: if Y is distributed according to L(X|X > x), then Y has distribution function F_Y(y) = (F(y) − F(x))/(1 − F(x)), y ≥ x. As a result, we find that for y ≥ x:

Λ(y) − Λ(x) = − log((1 − F(y))/(1 − F(x))) = − log(1 − (F(y) − F(x))/(1 − F(x))) = − log(1 − F_Y(y)).


This leads to the following result:

Proposition 1.6. If Y ∼ L(X|X > x), then Λ(Y) − Λ(x) ∼ Exp(1).

Proof. Since F_Y(Y) follows a uniform distribution on [0, 1], we find that

P(Λ(Y) − Λ(x) ≥ t) = P(− log(1 − U) ≥ t) = P(U ≥ 1 − e^{−t}) = e^{−t},

where U ∼ Uniform([0, 1]).
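Proposition 1.6 can also be checked empirically. The sketch below uses an illustrative choice not taken from the thesis (X standard normal, with conditional sampling by simple rejection); the first two moments of Λ(Y) − Λ(x) should both be close to 1, as for an Exp(1) random variable.

```python
import math
import numpy as np

rng = np.random.default_rng(5)
x = 1.0

def Lam(t):
    # Lambda(t) = -log(1 - F(t)) for the standard normal cdf F,
    # written with erfc for numerical accuracy in the tail.
    return -math.log(0.5 * math.erfc(t / math.sqrt(2.0)))

# Sample from L(X | X > x) by rejection: keep only normals above x.
samples = []
while len(samples) < 20_000:
    z = rng.normal(size=5000)
    samples.extend(z[z > x].tolist())
Y = np.array(samples[:20_000])

E = np.array([Lam(y) for y in Y]) - Lam(x)  # should be approximately Exp(1)
print(f"mean {E.mean():.3f}, variance {E.var():.3f}  (both 1 for Exp(1))")
```

Note the use of `erfc` rather than `1 - F(t)`: for large t the naive expression suffers from cancellation, whereas the tail probability is computed directly here.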

Let us now look at the consequences that Theorem 1.5 has for the unbiasedness of the AMS-estimator. First of all, by Proposition 1.6, the random variables Y_n^0 − S_{−1} follow an Exp(1) distribution, because X_n^0 ∼ L(X|X > b). Taking the k-th order statistic, and recalling that S_0 = Y_{(k)}^0, we see that the law of S_0 − S_{−1} does not depend on the law µ or on the initial level b. Hence, by property (iii) of Theorem 1.5, the sequence (S_q − S_{q−1})_{q≥0} has the same law if the AMS-algorithm is run with law µ_1 and initial level b_1 as when using µ_2 and b_2. As a result, the sequence (S_q)_{q≥−1} has the same law in both cases, provided that S_{−1} is the same in both cases, i.e., provided that Λ_1(b_1) = Λ_2(b_2). Furthermore, by property (i) of Theorem 1.5, the differences Y_n^q − S_{q−1} do not depend on the choice of µ and b either. As a result, the sequence (Y_n^q, S_{q−1})_{n≤n_rep, q≥0} will have the same law if the AMS-algorithm is run under µ_1 with initial condition b_1 as when it is run under µ_2 with initial condition b_2.

Now, let us look at the case where µ_1 := µ = L(X) and µ_2 := µ_exp = Exp(1), and let us put superscripts "exp" on the random variables X, Y, Z, S in the latter case. Then Λ_2(y) = Λ_exp(y) = y, so that Λ_1(b_1) = Λ_2(Λ_1(b_1)). Therefore, we need to choose the initial levels as b_1 = b and b_2 = Λ_1(b) in order to have Λ_1(b_1) = Λ_2(b_2). Because Λ_2 is the identity, we get the following result (see also [BLR15, Proposition 3.2]). Again, we use the AMS-algorithm without stopping criterion (see also Theorem 1.5).

Corollary 1.7. Let µ be a probability distribution, and let µ_exp = Exp(1). Let (X_n^q, Z_{q−1})_{n≤n_rep, q≥0} be obtained by running Algorithm 1.3 (without stopping criterion) with law µ and initial level b, and let (X_n^{q,exp}, Z_{q−1}^{exp})_{n≤n_rep, q≥0} be obtained by running Algorithm 1.3 (without stopping criterion) with law µ_exp and initial level b_exp := Λ(b), where Λ is defined by equation (1.1). Then the sequences (Λ(X_n^q), Λ(Z_{q−1}))_{n≤n_rep, q≥0} and (X_n^{q,exp}, Z_{q−1}^{exp})_{n≤n_rep, q≥0} have the same law.

In order to replace the previous corollary by a statement about the AMS-algorithm that does contain a stopping criterion, we need to verify that the random variable Q_iter has the same law under both algorithms. This allows us to replace the subscripts "q ≥ 0" in Corollary 1.7 by "q ≤ Q_iter", which shows that the sequences obtained from the AMS-algorithm really have the same law.

In particular, let a be the stopping level for the AMS-algorithm under µ with initial level b, and let a_exp be the stopping level for the AMS-algorithm under µ_exp with initial level b_exp = Λ(b). If we want to insert stopping levels in Corollary 1.7, we need to make sure that

P(Z_q < a) = P(Z_q^exp < a_exp). (1.3)

Namely, if that is the case, and the AMS-algorithms from Corollary 1.7 are given the stopping levels a and a_exp respectively, then the law of the total number of iterations is the same for both algorithms, i.e.,

P(Q_iter > q) = P(Z_q < a) = P(Z_q^exp < a_exp) = P(Q_iter^exp > q)

for all q ≥ 0.

The next lemma shows that we can indeed choose the level aexp in such a way. Compare

this with Lemma 3.3 in [BLR15].

Lemma 1.8. Assume the setting of Corollary 1.7, and include stopping levels $a$ and $a^{\exp} := \Lambda(a)$ respectively in these algorithms. Then equation (1.3) is satisfied.

Proof. Recalling that $S_q = \Lambda(Z_q)$, we see by Corollary 1.7 that $\mathbb{P}(Z_q^{\exp} < a^{\exp}) = \mathbb{P}(S_q < a^{\exp})$. Furthermore, since $\Lambda$ is increasing, we have $\{S_q < a^{\exp}\} \subseteq \{Z_q < a\} \subseteq \{S_q \le a^{\exp}\}$. Since $S_q$ has a density (it is an order statistic of exponential random variables; see the discussion after Proposition 1.6), the sets $\{S_q < a^{\exp}\}$ and $\{S_q \le a^{\exp}\}$ are equal up to $\mathbb{P}$-null sets, and hence $\mathbb{P}(S_q < a^{\exp}) = \mathbb{P}(Z_q < a) = \mathbb{P}(S_q \le a^{\exp})$. It follows that $\mathbb{P}(Z_q < a) = \mathbb{P}(S_q < a^{\exp}) = \mathbb{P}(Z_q^{\exp} < a^{\exp})$.

Now that we know that $Q_{\mathrm{iter}}$ follows the same distribution whether the AMS-algorithm is run under $\mu$ with initial level $b$ and stopping level $a$, or under $\mu^{\exp}$ with initial level $b^{\exp} = \Lambda(b)$ and stopping level $a^{\exp} = \Lambda(a)$, we can combine this with Corollary 1.7 and obtain the following result. See also [BLR15, Corollary 3.4].

Theorem 1.9. Let $\mu$ be a probability distribution, and let $\mu^{\exp} = \mathrm{Exp}(1)$. Let $(X_n^q, Z_{q-1})_{n\le n_{\mathrm{rep}},\, q\le Q_{\mathrm{iter}}}$ be obtained by running Algorithm 1.3 with law $\mu$, initial level $b$ and stopping level $a$, and let $(X_n^{q;\exp}, Z_{q-1}^{\exp})_{n\le n_{\mathrm{rep}},\, q\le Q_{\mathrm{iter}}^{\exp}}$ be obtained by running Algorithm 1.3 with law $\mu^{\exp}$, initial level $b^{\exp} = \Lambda(b)$ and stopping level $a^{\exp} = \Lambda(a)$, where $\Lambda$ is defined by equation (1.1). Then the sequences $(\Lambda(X_n^q), \Lambda(Z_{q-1}))_{n\le n_{\mathrm{rep}},\, q\le Q_{\mathrm{iter}}}$ and $(X_n^{q;\exp}, Z_{q-1}^{\exp})_{n\le n_{\mathrm{rep}},\, q\le Q_{\mathrm{iter}}^{\exp}}$ have the same law.

Finally, we can elaborate on why this theorem is so important in proving the unbiasedness of the AMS-estimator. As mentioned before, Theorem 1.9 will allow us to conclude that the AMS-estimator is unbiased for a general $\mu$, provided that we have proven that it is unbiased for $\mu^{\exp} = \mathrm{Exp}(1)$. We will now show how this follows, which is the content of Theorem 4.1 in [BLR15]. We rely on the fact that we already know that the estimator is unbiased in case $\mu = \mu^{\exp} = \mathrm{Exp}(1)$:

Proposition 1.10. Let $((X_n^q)_{n\le n_{\mathrm{rep}}}, Z_q)_{q\le Q_{\mathrm{iter}}}$ be the sequence obtained from Algorithm 1.3 with distribution $\mu = \mu^{\exp} = \mathrm{Exp}(1)$, initial level $b \ge 0$, stopping level $a > b$, $n_{\mathrm{rep}} > 0$ and $1 \le k \le n_{\mathrm{rep}} - 1$. Assume that Assumptions 1.1 and 1.2 hold, and that $p(a, b) > 0$. Then the AMS-estimator $\hat{p}_{n_{\mathrm{rep}},k}(a, b)$ is an unbiased estimator for $\frac{\mu^{\exp}[a,\infty)}{\mu^{\exp}[b,\infty)}$, i.e.,
\[
\mathbb{E}[\hat{p}_{n_{\mathrm{rep}},k}(a, b)] = \frac{\mu^{\exp}[a,\infty)}{\mu^{\exp}[b,\infty)} = e^{-(a-b)}.
\]

The proof of Proposition 1.10 is delayed until after the proof of Theorem 1.4, which we will give now.

Proof of Theorem 1.4. Mimicking the proof of Lemma 1.8, we can show that $\{X_n^{Q_{\mathrm{iter}}} \ge a\}$ is equal to $\{Y_n^{Q_{\mathrm{iter}}} \ge \Lambda(a)\}$ up to $\mathbb{P}$-null sets (since $Y_n^q$ also has a density by Theorem 1.5). Letting $\tilde{p}_{n_{\mathrm{rep}},k}(a^{\exp}, b^{\exp})$ denote the AMS-estimator obtained under $\mu^{\exp} = \mathrm{Exp}(1)$, starting level $b^{\exp} = \Lambda(b)$ and stopping level $a^{\exp} = \Lambda(a)$, Theorem 1.9 then implies that
\[
\mathbb{E}[\hat{p}_{n_{\mathrm{rep}},k}(a, b)] = \mathbb{E}[\tilde{p}_{n_{\mathrm{rep}},k}(a^{\exp}, b^{\exp})].
\]
Invoking Proposition 1.10 and equation (1.2) gives us:
\[
\mathbb{E}[\hat{p}_{n_{\mathrm{rep}},k}(a, b)] = \mathbb{E}[\tilde{p}_{n_{\mathrm{rep}},k}(a^{\exp}, b^{\exp})] = e^{-a^{\exp}+b^{\exp}} = e^{-\Lambda(a)+\Lambda(b)} = e^{\log p(a,b)} = p(a, b). \qquad\square
\]
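As a sanity check on this unbiasedness result, the AMS-algorithm is easy to simulate for $\mu = \mathrm{Exp}(1)$: by memorylessness, sampling from $\mathcal{L}(X \mid X > z)$ amounts to drawing $z + \mathrm{Exp}(1)$. The following sketch (the helper name and parameter choices are ours, purely for illustration) takes $k = 1$ and empirically reproduces $\mathbb{E}[\hat{p}_{n_{\mathrm{rep}},k}(a, b)] = e^{-(a-b)}$.

```python
import math
import random

def ams_estimate(nrep, k, b, a, rng):
    """One run of Algorithm 1.3 for X ~ Exp(1); returns p-hat_{nrep,k}(a, b).
    By memorylessness, L(X | X > z) is z + Exp(1)."""
    xs = [b + rng.expovariate(1.0) for _ in range(nrep)]
    q = 0  # counts the number of iterations Q_iter
    while True:
        xs.sort()
        z = xs[k - 1]              # current level Z_q: the k-th order statistic
        if z >= a:
            break
        for i in range(k):         # resample the k lowest particles from L(X | X > z)
            xs[i] = z + rng.expovariate(1.0)
        q += 1
    frac = sum(x >= a for x in xs) / nrep
    return (1 - k / nrep) ** q * frac

rng = random.Random(0)
runs = [ams_estimate(10, 1, 0.0, 2.0, rng) for _ in range(20000)]
mean_est = sum(runs) / len(runs)   # should be close to exp(-2) ~ 0.135
```

With $k = 1$ the final fraction is always 1, so the estimator reduces to $(1 - 1/n_{\mathrm{rep}})^{Q_{\mathrm{iter}}}$, the representation used in Section 1.6.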

Let us end this section by giving a proof of Proposition 1.10. Some technical, but easy-to-prove, details are left out in the proof, and we refer the interested reader to [BLR15, Theorem 4.1] for all details. Note that $X \sim \mathrm{Exp}(1)$ has a density, which we will denote by $f$ in the upcoming proof. Let us write $F(s; t) = \mathbb{P}(X \le s \mid X > t)$ (with corresponding density $f(s; t)$) for the conditional distribution function of $X \sim \mathrm{Exp}(1)$, and recall that the sample $(X_n^0)_{n\le n_{\mathrm{rep}}}$ is an i.i.d. sample with law $\mathcal{L}(X \mid X > b)$.

Note: we can write down the conditional distribution function and density more explicitly in the case that $X$ follows an exponential distribution (in particular, $F(s; t) = F(s - t)$). However, we will not do this in the proof, because the proof can then also be used for all other random variables $X$ that admit a density (see also the remark after this proof).


Proof of Proposition 1.10. The proof relies on the functional equation in $q$ given by
\[
q(b) = \int_b^a \Big(1 - \frac{k}{n_{\mathrm{rep}}}\Big)\, q(y)\, f_{n_{\mathrm{rep}},k}(y; b)\, \mathrm{d}y + \theta_{n_{\mathrm{rep}},k}(a, b), \qquad b \in [0, a], \tag{1.4}
\]
where
\[
\theta_{n_{\mathrm{rep}},k}(a, b) = \sum_{\ell=0}^{k-1} \frac{n_{\mathrm{rep}} - \ell}{n_{\mathrm{rep}}}\, \mathbb{P}\Big( a \in \big[ X_{(\ell)}^0, X_{(\ell+1)}^0 \big) \Big),
\]
and where
\[
f_{n_{\mathrm{rep}},k}(y; b) = k \binom{n_{\mathrm{rep}}}{k} F(y; b)^{k-1} f(y; b) \big(1 - F(y; b)\big)^{n_{\mathrm{rep}}-k}
\]
is the density of $X_{(k)}^0$, with corresponding distribution function $F_{n_{\mathrm{rep}},k}(y; b)$ (see, for example, Equation (2.1.6) in [DN03]). Here, $q : [0, a] \to \mathbb{R}_{\ge 0}$ should be a bounded, measurable mapping.

The proof then consists of three steps:

1. If $p_1, p_2 : [0, a] \to \mathbb{R}_{\ge 0}$ are two solutions of the functional equation (1.4), then $p_1 = p_2$;

2. The mapping $b \mapsto \mathbb{E}[\hat{p}_{n_{\mathrm{rep}},k}(a, b)]$ is a solution of the functional equation (1.4);

3. The mapping $b \mapsto p(a, b)$ is a solution of the functional equation (1.4).

It follows immediately from these three assertions that Proposition 1.10 holds. Let us therefore show how these three steps are proven. Step 1 is easy to prove:
\begin{align*}
\|p_1 - p_2\|_\infty &= \sup_{b\in[0,a]} |p_1(b) - p_2(b)| = \Big(1 - \frac{k}{n_{\mathrm{rep}}}\Big) \sup_{b\in[0,a]} \Big| \int_b^a [p_1(y) - p_2(y)]\, f_{n_{\mathrm{rep}},k}(y; b)\, \mathrm{d}y \Big| \\
&\le \Big(1 - \frac{k}{n_{\mathrm{rep}}}\Big) \sup_{b\in[0,a]} \int_b^a |p_1(y) - p_2(y)|\, f_{n_{\mathrm{rep}},k}(y; b)\, \mathrm{d}y \\
&\le \Big(1 - \frac{k}{n_{\mathrm{rep}}}\Big) \|p_1 - p_2\|_\infty \sup_{b\in[0,a]} \int_b^a f_{n_{\mathrm{rep}},k}(y; b)\, \mathrm{d}y \\
&\le \Big(1 - \frac{k}{n_{\mathrm{rep}}}\Big) \|p_1 - p_2\|_\infty < \|p_1 - p_2\|_\infty, \qquad \text{if } \|p_1 - p_2\|_\infty \ne 0.
\end{align*}
From this, it follows that $\|p_1 - p_2\|_\infty = 0$, so that $p_1 = p_2$.

To prove step 2, the expectation in step 2 is split into two parts:
\[
\mathbb{E}[\hat{p}_{n_{\mathrm{rep}},k}(a, b)] = \mathbb{E}\big[ \hat{p}_{n_{\mathrm{rep}},k}(a, b) \mathbf{1}_{Q_{\mathrm{iter}}=0} \big] + \mathbb{E}\big[ \hat{p}_{n_{\mathrm{rep}},k}(a, b) \mathbf{1}_{Q_{\mathrm{iter}}>0} \big].
\]
The first expectation can be rewritten as $\theta_{n_{\mathrm{rep}},k}(a, b)$, by using that $\{Q_{\mathrm{iter}} = 0\} = \{X_{(k)}^0 > a\}$ and plugging in the definition of $\hat{p}_{n_{\mathrm{rep}},k}(a, b)$. The second expectation turns out to be equal to the first term in the functional equation with $q(y) = \mathbb{E}[\hat{p}_{n_{\mathrm{rep}},k}(a, y)]$; the computation is somewhat technical but straightforward (see [BLR15, Proposition 4.2]). Adding up those two expressions proves step 2.

For the last step, more work has to be done. First, by shifting around the order of the random variables $(X_n^0)_{n\le n_{\mathrm{rep}}}$, one can show that (see [BLR15, Lemma 4.3])
\[
\theta_{n_{\mathrm{rep}},k}(a, b) = p(a, b) \cdot \big( 1 - F_{n_{\mathrm{rep}}-1,k}(a; b) \big),
\]
where we recall that $F_{n,k}(a; b)$ is the distribution function of the $k$-th order statistic of $(X_m^0)_{m\le n}$. We omit the proof of this statement, as it is a straightforward computation. The statement in step 3 now follows:
\begin{align*}
&\int_b^a \Big(1 - \frac{k}{n_{\mathrm{rep}}}\Big)\, p(a, y)\, f_{n_{\mathrm{rep}},k}(y; b)\, \mathrm{d}y + \theta_{n_{\mathrm{rep}},k}(a, b) \\
&= \int_b^a \frac{n_{\mathrm{rep}} - k}{n_{\mathrm{rep}}}\, p(a, y)\, k \binom{n_{\mathrm{rep}}}{k} F(y; b)^{k-1} f(y; b) \big(1 - F(y; b)\big)^{n_{\mathrm{rep}}-k}\, \mathrm{d}y + \theta_{n_{\mathrm{rep}},k}(a, b) \\
&= \int_b^a p(a, y)\big(1 - F(y; b)\big)\, k \binom{n_{\mathrm{rep}}-1}{k} F(y; b)^{k-1} f(y; b) \big(1 - F(y; b)\big)^{n_{\mathrm{rep}}-k-1}\, \mathrm{d}y + \theta_{n_{\mathrm{rep}},k}(a, b) \\
&\overset{(*)}{=} \int_b^a p(a, b)\, f_{n_{\mathrm{rep}}-1,k}(y; b)\, \mathrm{d}y + \theta_{n_{\mathrm{rep}},k}(a, b) \\
&= p(a, b) \cdot F_{n_{\mathrm{rep}}-1,k}(a; b) + p(a, b) \cdot \big[ 1 - F_{n_{\mathrm{rep}}-1,k}(a; b) \big] \\
&= p(a, b),
\end{align*}
where we used in equality $(*)$ that $\mathbb{P}(X > a \mid X > y) \cdot \mathbb{P}(X > y \mid X > b) = \mathbb{P}(X > a \mid X > b)$ for $b \le y \le a$, i.e., that $p(a, y)(1 - F(y; b)) = p(a, b)$. $\square$

Finally, let us remark that the proof above only uses the fact that $X$ has a density; it does not use properties of the specific exponential distribution anywhere.
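The three steps above can also be checked numerically. For $k = 1$ and $X \sim \mathrm{Exp}(1)$, the boundary term reduces to $\theta_{n_{\mathrm{rep}},1}(a, b) = \mathbb{P}(X_{(1)}^0 > a) = e^{-n_{\mathrm{rep}}(a-b)}$, and a fixed-point iteration on a discretization of (1.4) should converge (by the step-1 contraction) to $p(a, b) = e^{-(a-b)}$. The following sketch is ours; the grid size, iteration count and trapezoid rule are arbitrary illustrative choices.

```python
import math

def fixed_point_14(nrep=5, a=1.0, m=201, iters=80):
    """Iterate the map defined by equation (1.4) for k = 1, X ~ Exp(1):
    q(b) <- int_b^a (1 - 1/nrep) q(y) nrep e^{-nrep(y-b)} dy + e^{-nrep(a-b)}.
    The map is a contraction with factor (1 - 1/nrep), so the iteration
    converges to the unique solution p(a, b) = e^{-(a-b)}."""
    h = a / (m - 1)
    xs = [i * h for i in range(m)]
    q = [0.0] * m
    for _ in range(iters):
        new = []
        for i, b in enumerate(xs):
            # trapezoid rule on the grid points y = xs[i], ..., xs[m-1]
            vals = [(1 - 1 / nrep) * q[j] * nrep * math.exp(-nrep * (xs[j] - b))
                    for j in range(i, m)]
            if len(vals) > 1:
                integral = h * (sum(vals) - 0.5 * (vals[0] + vals[-1]))
            else:
                integral = 0.0
            new.append(integral + math.exp(-nrep * (a - b)))
        q = new
    return xs, q

xs, q = fixed_point_14()
max_err = max(abs(qb - math.exp(-(1.0 - x))) for x, qb in zip(xs, q))
```

The contraction factor $(1 - 1/n_{\mathrm{rep}}) = 0.8$ makes 80 iterations more than enough; the remaining error is dominated by the trapezoid discretization.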

1.4 Proof of Theorem 1.5.

The goal of this section is to give the proof of Theorem 1.5, which we omitted in the previous section.

Proof of Theorem 1.5. The proof of this theorem is given by induction with respect to the level $q$.

Base step, $q = 0$. $(i)_0$ follows because $Y_n^0 - S_{-1}$ is equal to $\Lambda(X_n^0) - \Lambda(b)$; these are independent random variables by construction, which follow an exponential distribution with rate 1 by Proposition 1.6. The second assertion $(ii)_0$ is an empty assertion if $q = 0$, and the last assertion $(iii)_0$ is also trivial, because the sequence consists of only one random variable.

Induction step, $q \ge 0$. Assume that for all $r \le q$, the assertions $(i)_r$, $(ii)_r$ and $(iii)_r$ hold. First, let us prove that assertions $(i)_{q+1}$ and $(ii)_{q+1}$ hold. For given $y_1, \ldots, y_{n_{\mathrm{rep}}} \ge 0$ and $s_0, \ldots, s_q \ge 0$, define the joint probability appearing in $(i)_{q+1}$ and $(ii)_{q+1}$ as
\[
P(y_1, \ldots, y_{n_{\mathrm{rep}}}, s_0, \ldots, s_q) := \mathbb{P}\Big( Y_1^{q+1} - S_q > y_1, \ldots, Y_{n_{\mathrm{rep}}}^{q+1} - S_q > y_{n_{\mathrm{rep}}},\ S_q - S_{q-1} > s_q, \ldots, S_0 - S_{-1} > s_0 \Big).
\]
If we can show that this probability splits up in the correct way as the product of marginal probabilities, then we have shown the independence claimed in $(i)_{q+1}$ and $(ii)_{q+1}$. In particular, we have to show that
\[
P(y_1, \ldots, y_{n_{\mathrm{rep}}}, s_0, \ldots, s_q) = e^{-\sum_{n=1}^{n_{\mathrm{rep}}} y_n} \cdot \mathbb{P}\big( S_q - S_{q-1} > s_q, \ldots, S_0 - S_{-1} > s_0 \big). \tag{1.5}
\]

Let us write $I_\le$ for the set of indices $n \in \{1, \ldots, n_{\mathrm{rep}}\}$ with $X_n^q \le Z_q$, and set $I_> := \{1, \ldots, n_{\mathrm{rep}}\} \setminus I_\le$. Furthermore, define the set $\mathcal{I} := \{I \subseteq \{1, \ldots, n_{\mathrm{rep}}\} : \#I = k\}$, which is the collection of all possible sets that $I_\le$ can attain. For $\ell \le k$, let $n_\ell \in I_\le$ denote the index with $X_{n_\ell}^q = X_{(\ell)}^q$, just as in the algorithm. Then, by construction of the AMS-algorithm, we have
\[
X_{n_\ell}^{q+1} = \chi_\ell^{q+1} \quad \forall \ell \le k, \qquad X_n^{q+1} = X_n^q \quad \forall n \in I_>,
\]
so that
\[
Y_{n_\ell}^{q+1} = \Lambda\big( \chi_\ell^{q+1} \big) \quad \forall \ell \le k, \qquad Y_n^{q+1} = Y_n^q \quad \forall n \in I_>.
\]
As a result, we find that
\[
P(y_1, \ldots, y_{n_{\mathrm{rep}}}, s_0, \ldots, s_q) = \frac{1}{\#\mathcal{I}} \sum_{I\in\mathcal{I}} \mathbb{P}\Big( Y_n^q - S_q > y_n\ \forall n \in I_>;\ S_q - S_{q-1} > s_q, \ldots, S_0 - S_{-1} > s_0;\ \Lambda\big(\chi_\ell^{q+1}\big) - \Lambda(Z_q) > y_{n_\ell}\ \forall \ell \le k \,\Big|\, I_\le = I \Big).
\]

Writing the above probability as an expectation over the corresponding indicator, and then (using the tower rule) conditioning on the $\sigma$-algebra containing the information about all random variables $Y_n^q - S_q$ and $S_r - S_{r-1}$, $r \le q$, we can take the indicators belonging to the random variables in the first line of the previous expression outside the conditional expectation. Then, because the random variables $\Lambda(\chi_\ell^{q+1}) - \Lambda(Z_q)$ only depend on the information about $Z_q$ in the $\sigma$-algebra mentioned above (by construction of the algorithm), and because we know that they are, conditional on $Z_q$, an i.i.d. sequence of $\mathrm{Exp}(1)$ random variables, we obtain for

the quantity above, after applying the tower rule for conditional expectations again:
\begin{align*}
P(y_1, \ldots, y_{n_{\mathrm{rep}}}, s_0, \ldots, s_q) &= \frac{1}{\#\mathcal{I}} \sum_{I\in\mathcal{I}} \bigg[ \mathbb{P}\Big( Y_n^q - S_{q-1} - (S_q - S_{q-1}) > y_n\ \forall n \in I_>;\ S_q - S_{q-1} > s_q, \ldots, S_0 - S_{-1} > s_0 \,\Big|\, I_\le = I \Big) \cdot \exp\Big\{ -\sum_{\ell=1}^k y_{n_\ell} \Big\} \bigg] \\
&= \frac{1}{\#\mathcal{I}} \sum_{I\in\mathcal{I}} \bigg[ \mathbb{P}\Big( Y_n^q - S_{q-1} > (S_q - S_{q-1}) + y_n\ \forall n \in I_>;\ S_q - S_{q-1} > s_q \,\Big|\, I_\le = I \Big) \\
&\qquad\qquad \cdot\, \mathbb{P}\Big( S_{q-1} - S_{q-2} > s_{q-1}, \ldots, S_0 - S_{-1} > s_0 \,\Big|\, I_\le = I \Big) \cdot \exp\Big\{ -\sum_{n\in I_\le} y_n \Big\} \bigg],
\end{align*}

where the second equality follows from the independence relations in the induction hypotheses $(ii)_q$ and $(iii)_q$. Applying the tower rule again (conditioning on the $\sigma$-algebra generated by $S_q - S_{q-1}$) in the first probability, and using the exponential i.i.d. property of assertion $(i)_q$, we see that the above expression is equal to
\begin{align*}
&\frac{1}{\#\mathcal{I}} \sum_{I\in\mathcal{I}} \bigg[ \mathbb{E}\Big[ \prod_{n\in I_>} \exp\{ -y_n - (S_q - S_{q-1}) \} \mathbf{1}_{\{S_q - S_{q-1} > s_q\}} \,\Big|\, I_\le = I \Big] \\
&\qquad\qquad \cdot\, \mathbb{P}\Big( S_{q-1} - S_{q-2} > s_{q-1}, \ldots, S_0 - S_{-1} > s_0 \,\Big|\, I_\le = I \Big) \exp\Big\{ -\sum_{n\in I_\le} y_n \Big\} \bigg] \\
&= e^{-\sum_{n=1}^{n_{\mathrm{rep}}} y_n}\, \mathbb{P}\big( S_{q-1} - S_{q-2} > s_{q-1}, \ldots, S_0 - S_{-1} > s_0 \big)\, \mathbb{E}\Big[ e^{-(n_{\mathrm{rep}}-k)(S_q - S_{q-1})} \mathbf{1}_{\{S_q - S_{q-1} > s_q\}} \Big].
\end{align*}

Filling in $y_1 = \cdots = y_{n_{\mathrm{rep}}} = s_0 = \cdots = s_{q-1} = 0$, so that $P(y_1, \ldots, y_{n_{\mathrm{rep}}}, s_0, \ldots, s_q) = \mathbb{P}(S_q - S_{q-1} > s_q)$, we see that the first two factors in the above expression are equal to 1, while the last factor does not depend on these chosen values. Hence the last factor must be equal to $\mathbb{P}(S_q - S_{q-1} > s_q)$, for all $s_q \in \mathbb{R}$. As a result, we find that the expression above is equal to
\[
e^{-\sum_{n=1}^{n_{\mathrm{rep}}} y_n}\, \mathbb{P}\big( S_{q-1} - S_{q-2} > s_{q-1}, \ldots, S_0 - S_{-1} > s_0 \big)\, \mathbb{P}\big( S_q - S_{q-1} > s_q \big),
\]
which is equal to the expression in (1.5) by assertion $(iii)_q$, which is what we wanted to show.

Now that we have proven assertions $(i)_{q+1}$ and $(ii)_{q+1}$, assertion $(iii)_{q+1}$ will follow quickly. Because we already know that the random variables $S_r - S_{r-1}$ in $(iii)_{q+1}$ are i.i.d. for $r \le q$, we only have to show that $S_{q+1} - S_q$ is independent of that sequence and has the same law. Since $S_{q+1} - S_q$ is the $k$-th order statistic of the sequence $(Y_n^{q+1} - S_q)_{n\le n_{\mathrm{rep}}}$, assertion $(ii)_{q+1}$ gives the required independence. Furthermore, by $(i)_r$, $r \le q+1$, the law of $Y_n^r - S_{r-1}$ does not depend on $r$, so that the same holds for $S_{q+1} - S_q$ (take $r = q+1$). Hence, the sequence $(S_r - S_{r-1})_{0\le r\le q+1}$ is i.i.d., which completes the proof. $\square$
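The key distributional claim (i) of Theorem 1.5 is also easy to check by simulation. For $X \sim \mathrm{Exp}(1)$ and $b = 0$ we have $\Lambda(x) = x$, so $Y_n^q = X_n^q$ and $S_{q-1} = Z_{q-1}$; the residuals $X_n^q - Z_{q-1}$ should then be i.i.d. $\mathrm{Exp}(1)$ with mean and variance 1. The sketch below (helper name and parameters are our illustrative choices) takes $k = 1$.

```python
import random

def residuals_after(nrep, qmax, rng):
    """Run qmax AMS iterations (k = 1, X ~ Exp(1), b = 0, so Lambda = id)
    and return the residuals X_n^{qmax} - Z_{qmax-1}."""
    xs = [rng.expovariate(1.0) for _ in range(nrep)]
    z_prev = 0.0
    for _ in range(qmax):
        xs.sort()
        z_prev = xs[0]                           # level Z_{q-1} (k = 1: the minimum)
        xs[0] = z_prev + rng.expovariate(1.0)    # resample from L(X | X > Z_{q-1})
    return [x - z_prev for x in xs]

rng = random.Random(1)
sample = [r for _ in range(4000) for r in residuals_after(10, 3, rng)]
mean_r = sum(sample) / len(sample)
var_r = sum((r - mean_r) ** 2 for r in sample) / len(sample)
```

Both `mean_r` and `var_r` should be close to 1, in line with Theorem 1.5(i).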

1.5 The algorithm stops almost surely

In this section we will prove that the number of iterations $Q_{\mathrm{iter}}$ performed by the AMS-algorithm is almost surely finite. Just as in the previous sections, we will use the random variables $S_q$ for all $q \ge 0$, which are obtained by running the AMS-algorithm without stopping criterion. To prove the almost sure finiteness of $Q_{\mathrm{iter}}$, we will first derive an expression for the law of the random variables $S_q - S_{q-1}$, which are i.i.d. by Theorem 1.5.

Let us first introduce some notation. For a finite set of random variables $A$, let $\mathrm{ord}_k(A)$ denote the $k$-th order statistic of $A$. The following result was already briefly mentioned in [BLR15] (as a consequence of Proposition 3.2), without proof. Let us now give this statement, together with its proof.

Proposition 1.11. Let $(E_n)_{n\le n_{\mathrm{rep}}}$ be a sequence of independent $\mathrm{Exp}(1)$ random variables. Then
\[
S_q - S_{q-1} \overset{d}{=} S_0 - S_{-1} \overset{d}{=} \sum_{n=1}^k \frac{E_n}{n_{\mathrm{rep}} - n + 1}.
\]

Proof. The first equality in law was already proven in Theorem 1.5, so we can focus on the second equality. By Proposition 1.6 and the mutual independence of the random variables $X_n^0$, we have
\[
S_0 - S_{-1} = \Lambda\big( X_{(k)}^0 \big) - \Lambda(b) = \mathrm{ord}_k\Big( \big( \Lambda(X_n^0) - \Lambda(b) \big)_{1\le n\le n_{\mathrm{rep}}} \Big) \overset{d}{=} E_{(k)}.
\]
We will now show that the latter expression has the same law as $\sum_{n=1}^k \frac{E_n}{n_{\mathrm{rep}} - n + 1}$. Let $0 < a_1 \le x_1 \le a_2 \le x_2 \le \cdots \le a_{n_{\mathrm{rep}}} \le x_{n_{\mathrm{rep}}}$ be real numbers. Then,
\begin{align*}
\mathbb{P}\big( E_{(1)} \in [a_1, x_1], \ldots, E_{(n_{\mathrm{rep}})} \in [a_{n_{\mathrm{rep}}}, x_{n_{\mathrm{rep}}}] \big) &= \sum_{i_1, \ldots, i_{n_{\mathrm{rep}}}} \mathbb{P}\big( E_{(1)} \in [a_1, x_1], \ldots, E_{(n_{\mathrm{rep}})} \in [a_{n_{\mathrm{rep}}}, x_{n_{\mathrm{rep}}}];\ E_{i_1} \le \cdots \le E_{i_{n_{\mathrm{rep}}}} \big) \\
&= \sum_{i_1, \ldots, i_{n_{\mathrm{rep}}}} \mathbb{P}\big( E_{i_1} \in [a_1, x_1], \ldots, E_{i_{n_{\mathrm{rep}}}} \in [a_{n_{\mathrm{rep}}}, x_{n_{\mathrm{rep}}}] \big) \\
&= \sum_{i_1, \ldots, i_{n_{\mathrm{rep}}}} \mathbb{P}\big( E_{i_1} \in [a_1, x_1] \big) \cdots \mathbb{P}\big( E_{i_{n_{\mathrm{rep}}}} \in [a_{n_{\mathrm{rep}}}, x_{n_{\mathrm{rep}}}] \big) \\
&= n_{\mathrm{rep}}!\, \prod_{n=1}^{n_{\mathrm{rep}}} \big( e^{-a_n} - e^{-x_n} \big),
\end{align*}
where the sums range over all permutations $(i_1, \ldots, i_{n_{\mathrm{rep}}})$ of $(1, \ldots, n_{\mathrm{rep}})$.


Using that the above intervals generate the domain of the order statistics, we see that the joint probability density of the sequence $(E_{(1)}, \ldots, E_{(n_{\mathrm{rep}})})$ is given by
\[
\frac{\partial}{\partial x_1} \cdots \frac{\partial}{\partial x_{n_{\mathrm{rep}}}}\, n_{\mathrm{rep}}!\, \prod_{n=1}^{n_{\mathrm{rep}}} \big( e^{-a_n} - e^{-x_n} \big) = n_{\mathrm{rep}}! \cdot e^{-\sum_{n=1}^{n_{\mathrm{rep}}} x_n}.
\]
Now, let $g : \mathbb{R}^{n_{\mathrm{rep}}} \to \mathbb{R}^{n_{\mathrm{rep}}}$ be defined by $g(x_1, \ldots, x_{n_{\mathrm{rep}}}) = (x_1, x_2 - x_1, \ldots, x_{n_{\mathrm{rep}}} - x_{n_{\mathrm{rep}}-1})$. Then $g$ is invertible with $g^{-1}(y_1, \ldots, y_{n_{\mathrm{rep}}}) = (y_1, y_1 + y_2, \ldots, y_1 + \cdots + y_{n_{\mathrm{rep}}})$. Note that the Jacobian matrix of $g^{-1}$ has determinant equal to 1. Therefore, the joint density of the $g$-transformed random variables $(E_{(1)}, E_{(2)} - E_{(1)}, \ldots, E_{(n_{\mathrm{rep}})} - E_{(n_{\mathrm{rep}}-1)})$ is given by
\[
n_{\mathrm{rep}}! \cdot e^{-\sum_{n=1}^{n_{\mathrm{rep}}} g_n^{-1}(y)} = n_{\mathrm{rep}}! \cdot \exp\Big\{ -\sum_{n=1}^{n_{\mathrm{rep}}} \sum_{k=1}^n y_k \Big\} = n_{\mathrm{rep}} e^{-n_{\mathrm{rep}} y_1} \cdot (n_{\mathrm{rep}} - 1) e^{-(n_{\mathrm{rep}}-1) y_2} \cdots 1 \cdot e^{-y_{n_{\mathrm{rep}}}},
\]
which is the joint density of a sequence $(F_n)_{n\le n_{\mathrm{rep}}}$ of independent $\mathrm{Exp}(n_{\mathrm{rep}} - n + 1)$ random variables. In other words, the sequence $(E_{(1)}, E_{(2)} - E_{(1)}, \ldots, E_{(n_{\mathrm{rep}})} - E_{(n_{\mathrm{rep}}-1)})$ has the same distribution as $\big( \frac{E_n}{n_{\mathrm{rep}} - n + 1} \big)_{n\le n_{\mathrm{rep}}}$. Hence, the random variable $E_{(k)}$ satisfies
\[
S_0 - S_{-1} \overset{d}{=} E_{(k)} = \sum_{n=1}^k \big( E_{(n)} - E_{(n-1)} \big) \overset{d}{=} \sum_{n=1}^k \frac{E_n}{n_{\mathrm{rep}} - n + 1} \qquad (\text{with } E_{(0)} := 0). \qquad\square
\]
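The representation in Proposition 1.11 is easy to test numerically: both sides have mean $\sum_{n=1}^k \frac{1}{n_{\mathrm{rep}} - n + 1}$. A quick Monte Carlo sketch (parameter choices are ours, purely illustrative):

```python
import random

rng = random.Random(2)
nrep, k, runs = 8, 3, 50000

def kth_order_stat():
    """E_(k): the k-th order statistic of nrep i.i.d. Exp(1) variables."""
    es = sorted(rng.expovariate(1.0) for _ in range(nrep))
    return es[k - 1]

def scaled_sum():
    """Sum of independent E_n / (nrep - n + 1), n = 1, ..., k."""
    return sum(rng.expovariate(1.0) / (nrep - n + 1) for n in range(1, k + 1))

exact_mean = sum(1.0 / (nrep - n + 1) for n in range(1, k + 1))
mean_order = sum(kth_order_stat() for _ in range(runs)) / runs
mean_sum = sum(scaled_sum() for _ in range(runs)) / runs
```

Both empirical means should agree with `exact_mean` up to Monte Carlo noise; a full distributional comparison could be made with a two-sample test, but the means already illustrate the identity.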

The fact that the number of iterations $Q_{\mathrm{iter}}$ in the AMS-algorithm is almost surely finite is a consequence of this proposition. This result is also given in [BLR15, Proposition 3.6], although with the knowledge of Proposition 1.11, the result is easier for us to prove.

Corollary 1.12. $\mathbb{P}(Q_{\mathrm{iter}} < \infty) = 1$.

Proof. By the strong law of large numbers, we have that
\[
\frac{S_q}{q+1} = \frac{\sum_{r=0}^q (S_r - S_{r-1})}{q+1} + \frac{\Lambda(b)}{q+1} \xrightarrow[q\to\infty]{\text{a.s.}} \mathbb{E}[S_0 - S_{-1}] = \sum_{n=1}^k \frac{1}{n_{\mathrm{rep}} - n + 1} > 0,
\]
where we used the i.i.d. property from Theorem 1.5(iii) and the representation in Proposition 1.11. As a result, almost surely $S_q \to \infty$ as $q \to \infty$. Using (the proof of) Lemma 1.8, we have that $\mathbb{P}(Z_q \ge a) = \mathbb{P}(S_q > \Lambda(a))$, and we thus find that
\[
\mathbb{P}(Q_{\mathrm{iter}} < \infty) = \mathbb{P}(\exists q : Z_q \ge a) = \mathbb{P}(\exists q : S_q > \Lambda(a)) \ge \mathbb{P}\big( S_q \to \infty \text{ as } q \to \infty \big) = 1. \qquad\square
\]

1.6 A simple case, k = 1

Now that we have proven, for each value of $k$, that the AMS-estimator is unbiased and that $Q_{\mathrm{iter}}$ is almost surely finite, let us take a look at the special choice $k = 1$. First, we will prove that we can explicitly derive the distribution of the random variable $Q_{\mathrm{iter}}$ in this case. Using the dependence of the AMS-estimator on this quantity, we will derive the variance of the AMS-estimator in this case, and we will show that we can calculate the asymptotic distribution of the AMS-estimator as $n_{\mathrm{rep}}$ goes to infinity.

First, let us state the result concerning the law of $Q_{\mathrm{iter}}$ if $k = 1$. This is Proposition 3.7 in [BLR15], and we fixed some errors in its proof.

Proposition 1.13. Let $k = 1$. Then $Q_{\mathrm{iter}} \sim \mathrm{Poisson}(-n_{\mathrm{rep}} \log p(a, b))$.

Proof. Taking $k = 1$ in Proposition 1.11 and using Theorem 1.5, we see that $(S_q - S_{q-1})_{q\ge 0}$ is an i.i.d. sequence with common law $\mathrm{Exp}(n_{\mathrm{rep}})$. Define the stochastic process $(N_t)_{t\ge 0}$ by
\[
N_t = \sum_{q=0}^\infty \mathbf{1}_{S_q - S_{-1} \le t};
\]
this is a Poisson process with interarrival times $S_q - S_{q-1} \sim \mathrm{Exp}(n_{\mathrm{rep}})$. Therefore, we have that
\[
Q_{\mathrm{iter}} = \sum_{q=0}^\infty \mathbf{1}_{Z_q < a} \overset{\text{a.s.}}{=} \sum_{q=0}^\infty \mathbf{1}_{S_q \le \Lambda(a)} = \sum_{q=0}^\infty \mathbf{1}_{S_q - S_{-1} \le \Lambda(a) - \Lambda(b)} = N_{\Lambda(a) - \Lambda(b)} = N_{-\log p(a,b)},
\]
where we used (the proof of) Lemma 1.8 again, and equation (1.2) in the last equality. As a result, using that $(N_t)_{t\ge 0}$ is a Poisson process with rate $n_{\mathrm{rep}}$, we find that $Q_{\mathrm{iter}} \sim \mathrm{Poisson}(-n_{\mathrm{rep}} \log p(a, b))$. $\square$

We will now explain how to use the previous result to say something about the estimator $\hat{p}_{n_{\mathrm{rep}},1}(a, b)$. Note that for $k = 1$, all the random variables $X_n^{Q_{\mathrm{iter}}}$ in the final step of the algorithm are at least equal to $a$. Therefore, we have the representation
\[
\hat{p}_{n_{\mathrm{rep}},1}(a, b) = \Big( 1 - \frac{1}{n_{\mathrm{rep}}} \Big)^{Q_{\mathrm{iter}}}.
\]
Combining this representation with Proposition 1.13, the computation of the variance of the AMS-estimator is straightforward:

\begin{align*}
\mathrm{Var}(\hat{p}_{n_{\mathrm{rep}},1}(a, b)) &= -p(a,b)^2 + \mathbb{E}\big[ (\hat{p}_{n_{\mathrm{rep}},1}(a, b))^2 \big] \\
&= -p(a,b)^2 + \sum_{q=0}^\infty \Big( 1 - \frac{1}{n_{\mathrm{rep}}} \Big)^{2q} \frac{(-n_{\mathrm{rep}} \log p(a,b))^q\, e^{n_{\mathrm{rep}} \log p(a,b)}}{q!} \\
&= -p(a,b)^2 + p(a,b)^{n_{\mathrm{rep}}} \sum_{q=0}^\infty \frac{\Big( -\big( 1 - \frac{1}{n_{\mathrm{rep}}} \big)^2 n_{\mathrm{rep}} \log p(a,b) \Big)^q}{q!} \\
&= -p(a,b)^2 + p(a,b)^{n_{\mathrm{rep}}}\, e^{-n_{\mathrm{rep}} \big( 1 - \frac{1}{n_{\mathrm{rep}}} \big)^2 \log p(a,b)} \\
&= -p(a,b)^2 + p(a,b)^{n_{\mathrm{rep}}}\, p(a,b)^{-n_{\mathrm{rep}} \big( 1 - \frac{1}{n_{\mathrm{rep}}} \big)^2} \\
&= -p(a,b)^2 + p(a,b)^{n_{\mathrm{rep}} - n_{\mathrm{rep}} \big( 1 - \frac{1}{n_{\mathrm{rep}}} \big)^2} \\
&= p(a,b)^2 \Big[ p(a,b)^{-\frac{1}{n_{\mathrm{rep}}}} - 1 \Big].
\end{align*}
As a corollary, we find that
\begin{align*}
\mathrm{Var}(\hat{p}_{n_{\mathrm{rep}},1}(a, b)) &= p(a,b)^2 \Big( 1 - \frac{1}{n_{\mathrm{rep}}} \log p(a,b) + O\Big( \frac{1}{n_{\mathrm{rep}}^2} \Big) - 1 \Big) \\
&= p(a,b)^2 \cdot \Big( -\frac{1}{n_{\mathrm{rep}}} \log p(a,b) + O\Big( \frac{1}{n_{\mathrm{rep}}^2} \Big) \Big).
\end{align*}

The squared relative error $\frac{\mathrm{Var}(\hat{p}_{n_{\mathrm{rep}},1}(a,b))}{p(a,b)^2}$ is therefore equal to $-\frac{1}{n_{\mathrm{rep}}} \log p(a, b) + O\big( \frac{1}{n_{\mathrm{rep}}^2} \big)$. Comparing this with the squared relative error of the Monte Carlo method, which is $\frac{1 - p(a,b)}{n_{\mathrm{rep}}\, p(a,b)}$ (see Section 1.1), we see that the AMS-estimator performs (asymptotically) better than the Monte Carlo estimator, for all values of $p(a, b)$, because $\frac{1 - p(a,b)}{p(a,b)} \ge -\log p(a, b)$. Especially as $p(a, b)$ goes to 0, the difference between these squared errors becomes huge.
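This comparison can be made concrete. Multiplying both squared relative errors by $n_{\mathrm{rep}}$, AMS with $k = 1$ costs $-\log p$ against $(1 - p)/p$ for crude Monte Carlo, and the gap is driven by the elementary inequality $\log(1/p) \le 1/p - 1$. A small illustrative computation (the probed values of $p$ are arbitrary):

```python
import math

# per-particle (times nrep) squared relative errors, AMS (leading order) vs crude MC
for p in (1e-1, 1e-2, 1e-4, 1e-6):
    ams = -math.log(p)      # AMS, k = 1, leading order in 1/nrep
    mc = (1 - p) / p        # crude Monte Carlo
    assert ams <= mc
    # e.g. for p = 1e-6: ams ~ 13.8 versus mc ~ 999999
```

For $p = 10^{-6}$, reaching a 10% relative error would thus require roughly $10^8$ crude Monte Carlo samples, but only on the order of $10^3$ AMS particles.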

Next, before we derive the asymptotic distribution of the AMS-estimator, let us quickly look at two limits that will help us in deriving this distribution. We have
\[
\log\Big( 1 - \frac{1}{n_{\mathrm{rep}}} \Big) = -\frac{1}{n_{\mathrm{rep}}} + O\Big( \frac{1}{n_{\mathrm{rep}}^2} \Big) \qquad \text{as } n_{\mathrm{rep}} \to \infty,
\]
so that
\[
-n_{\mathrm{rep}} \log\Big( 1 - \frac{1}{n_{\mathrm{rep}}} \Big) \to 1 \qquad \text{as } n_{\mathrm{rep}} \to \infty; \tag{1.6}
\]
\[
\sqrt{n_{\mathrm{rep}}} \Big( n_{\mathrm{rep}} \log\Big( 1 - \frac{1}{n_{\mathrm{rep}}} \Big) + 1 \Big) \to 0 \qquad \text{as } n_{\mathrm{rep}} \to \infty. \tag{1.7}
\]

Proposition 1.14. $\sqrt{n_{\mathrm{rep}}}\, \big( \hat{p}_{n_{\mathrm{rep}},1}(a, b) - p(a, b) \big) \xrightarrow{d} \mathcal{N}\big( 0, -p(a,b)^2 \log p(a,b) \big)$ as $n_{\mathrm{rep}} \to \infty$.

Proof. Let $(W_n)_{n\ge 1}$ be a sequence of independent $\mathrm{Poisson}(-\log p(a,b))$ random variables. By the summation property of the Poisson distribution and by Proposition 1.13, we then have
\[
Q_{\mathrm{iter}} \overset{d}{=} \sum_{n=1}^{n_{\mathrm{rep}}} W_n.
\]
By the central limit theorem, we have that
\[
\sqrt{n_{\mathrm{rep}}} \Big( \frac{1}{n_{\mathrm{rep}}} \sum_{n=1}^{n_{\mathrm{rep}}} W_n + \log p(a,b) \Big) \xrightarrow{d} \mathcal{N}(0, -\log p(a,b)).
\]
Using Slutsky's lemma, we then find:
\begin{align*}
\sqrt{n_{\mathrm{rep}}}\, \big( -\log \hat{p}_{n_{\mathrm{rep}},1}(a, b) + \log p(a,b) \big) &= \sqrt{n_{\mathrm{rep}}} \Big( -Q_{\mathrm{iter}} \log\Big( 1 - \frac{1}{n_{\mathrm{rep}}} \Big) + \log p(a,b) \Big) \\
&\overset{d}{=} \sqrt{n_{\mathrm{rep}}} \Big( -\log\Big( 1 - \frac{1}{n_{\mathrm{rep}}} \Big) \sum_{n=1}^{n_{\mathrm{rep}}} W_n + \log p(a,b) \Big) \\
&= -n_{\mathrm{rep}} \log\Big( 1 - \frac{1}{n_{\mathrm{rep}}} \Big) \cdot \sqrt{n_{\mathrm{rep}}} \Big( \frac{1}{n_{\mathrm{rep}}} \sum_{n=1}^{n_{\mathrm{rep}}} W_n + \log p(a,b) \Big) \\
&\qquad + \sqrt{n_{\mathrm{rep}}} \Big( n_{\mathrm{rep}} \log\Big( 1 - \frac{1}{n_{\mathrm{rep}}} \Big) + 1 \Big) \log p(a,b) \\
&\xrightarrow{d} 1 \cdot \mathcal{N}(0, -\log p(a,b)) + 0 = \mathcal{N}(0, -\log p(a,b)),
\end{align*}
where we used the central limit theorem and equations (1.6) and (1.7) in the last line. Finally, by the delta method applied to $x \mapsto e^{-x}$, we find
\[
\sqrt{n_{\mathrm{rep}}}\, \big( \hat{p}_{n_{\mathrm{rep}},1}(a, b) - p(a, b) \big) \xrightarrow{d} \mathcal{N}\big( 0, -p(a,b)^2 \log p(a,b) \big). \qquad\square
\]
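Both Proposition 1.13 and the asymptotic variance $-p^2 \log p$ can be checked together by sampling $Q_{\mathrm{iter}}$ through its Poisson-process construction: count $\mathrm{Exp}(n_{\mathrm{rep}})$ interarrival times falling in $[0, -\log p]$, and form $\hat{p} = (1 - 1/n_{\mathrm{rep}})^{Q_{\mathrm{iter}}}$. The sketch below is ours; the parameters are illustrative.

```python
import math
import random

def sample_qiter(nrep, log_inv_p, rng):
    """Q_iter = number of Exp(nrep) interarrivals in [0, -log p],
    i.e., a Poisson(nrep * (-log p)) draw, as in Proposition 1.13."""
    t, count = 0.0, 0
    while True:
        t += rng.expovariate(nrep)
        if t > log_inv_p:
            return count
        count += 1

rng = random.Random(3)
nrep, p = 100, 0.2
runs = [(1 - 1 / nrep) ** sample_qiter(nrep, -math.log(p), rng)
        for _ in range(20000)]
mean_p = sum(runs) / len(runs)
var_p = sum((r - mean_p) ** 2 for r in runs) / len(runs)
scaled_var = nrep * var_p   # should approach -p^2 log p ~ 0.0644
```

The empirical mean should be close to $p$ (unbiasedness), and $n_{\mathrm{rep}} \cdot \mathrm{Var}(\hat{p})$ should be close to $-p^2 \log p$, with a small finite-$n_{\mathrm{rep}}$ bias given by the exact formula $p^2(p^{-1/n_{\mathrm{rep}}} - 1)$.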

2 A General Framework: the GAMS-algorithm

In the previous chapter, we introduced the AMS-algorithm in its basic form. There, we already discussed the difficulties that arise if we want to apply the algorithm to Markov chains on $\mathbb{R}^d$. These Markov chains are highly relevant: first, they generalize the setting of one-dimensional processes, where one should think of the moving particle in a potential well from the introduction, and second, they can also be used to compute probabilities of the form $\mathbb{P}(\sup_{n=1,\ldots,T} X_n \ge a)$ instead of the probabilities $\mathbb{P}(\sup_{n\in\mathbb{N}} X_n \ge a)$ from Chapter 1. Often, the probabilities from Chapter 1 of the form $\mathbb{P}(\sup_{n\in\mathbb{N}} X_n \ge a)$ are not so relevant, because they are equal to 0 or 1 in many cases. Also, the ability to compute $\mathbb{P}(\sup_{n=1,\ldots,T} X_n \ge a)$ enables us to estimate the expected time it takes a Markov chain to reach the value $a$; see Section 3.3 for details. In order to tackle the difficulties that arise if we want to apply the AMS-algorithm to multidimensional Markov chains, we introduce in this chapter a generalized version (GAMS) of the AMS-algorithm from Chapter 1.

This chapter consists of three parts. First, in Section 2.1, we give the formal setting of the Generalized Adaptive Multilevel Splitting algorithm (GAMS), which was first introduced in [BGGLR16]. In Section 2.2, we postulate six assumptions, under which this GAMS-algorithm yields an unbiased estimator for the problem discussed in Section 2.1. Later, in Chapter 3, we will see that these six assumptions are indeed satisfied in the multidimensional Markov chain setting. Finally, in Section 2.3, we will discuss how to prove the unbiasedness result of Section 2.2, and indicate where all six assumptions are needed. This is based on [BGGLR16, Section 3].

Later, in Section 3.1, we will see how this GAMS-algorithm can help us in dealing with multidimensional Markov chains. Namely, in Section 3.1, we propose the so-called MMC-algorithm for multidimensional Markov chains, which is a natural extension of Algorithm 1.3. Then, in the same section, we show that this MMC-algorithm can be interpreted as a GAMS-algorithm. Summarizing: in this chapter we prove that (under some assumptions) we obtain an unbiased estimator for some quantity (namely, E[ϕ(X)], see Section 2.1) by using the abstract GAMS-algorithm, and in Chapter 3, we will apply this theory, in order to show that the MMC-algorithm delivers an unbiased estimator for multidimensional Markov chain rare event probabilities.


2.1 Setting and GAMS-algorithm

In this section, we indicate which quantity we want to estimate, we set up the formal setting of the GAMS-algorithm, and we give the GAMS-algorithm and the corresponding GAMS-estimator. We rely on the GAMS-setting from [BGGLR16]. One should be aware that the GAMS-algorithm should be thought of as a framework into which many other algorithms fit, rather than as a concrete algorithm itself. In particular, the MMC-algorithm and the Fleming-Viot particle system from Sections 3.1 and 3.2 are examples of algorithms that fall into this GAMS-framework. More on this in Chapter 3.

First of all, we let $X : \Omega \to P$ be a random element, where $P$ is a Polish space (i.e., a separable and completely metrizable topological space) endowed with its Borel $\sigma$-algebra $B(P)$, and let $\varphi : P \to \mathbb{R}$ be a bounded, measurable function. The problem is as follows.

Problem 2.1. Calculate $\mathbb{E}[\varphi(X)]$.

Usually, the support $\mathrm{Supp}(\varphi)$ of the function $\varphi$ is contained in a set $C$, where $C \in B(P)$ is such that $\mathbb{P}(X \in C)$ is very small. Only then does it make sense to use a rare event algorithm to estimate $\mathbb{E}[\varphi(X)]$.

On the space $(P, B(P))$ we define a filtration, which we denote by $(\mathrm{filt}_z)_{z\in\mathbb{R}}$, and which satisfies $\mathrm{filt}_z \subseteq \mathrm{filt}_{z'} \subseteq B(P)$ for all $z < z'$. Given a random element $Y : \Omega \to P$, we define the corresponding filtration by
\[
\mathrm{filt}_z^Y = Y^{-1}(\mathrm{filt}_z).
\]
A typical $\sigma$-algebra $\mathrm{filt}_z$ in this filtration is chosen such that $\mathrm{filt}_z^Y$ contains the information about the random element $Y$ "up to the level $z \in \mathbb{R}$". For a concrete example, see Section 3.1. Furthermore, we define the space of finite sequences in $P$ by $P^{\mathrm{rep}} := \{ (x_n)_{n\in I} : x_n \in P \text{ for all } n \in I,\ I \subset \mathbb{N},\ |I| < \infty \}$. We define the $\sigma$-algebra $B(P^{\mathrm{rep}})$ on $P^{\mathrm{rep}}$ by setting $B(P^{\mathrm{rep}}) := \bigvee_{I\in\mathcal{I}} B(P)^{\otimes I}$, where $\mathcal{I}$ is the set of all possible $I$. On the space $(P^{\mathrm{rep}}, B(P^{\mathrm{rep}}))$, we can then define the filtration $(\mathrm{filt}_z^{\mathrm{rep}})_{z\in\mathbb{R}}$ by setting $\mathrm{filt}_z^{\mathrm{rep}} := \bigvee_{I\in\mathcal{I}} \mathrm{filt}_z^{\otimes I}$. Then, for a random element $\mathbf{X} = (X_n)_{n\in I} : \Omega \to P^{\mathrm{rep}}$, we define
\[
\mathrm{filt}_z^{\mathbf{X}} := \mathbf{X}^{-1}(\mathrm{filt}_z^{\mathrm{rep}}).
\]

In each iteration of the GAMS-algorithm, we want to be able to sample new random elements based on the random elements of the previous iteration and on some level $Z_q$. To this end, we define a collection of probability distributions $\{ \pi_z(x, \cdot) : z \in \mathbb{R}, x \in P \}$, where $\pi_z(x, \cdot)$ denotes the distribution of such a new random element in an iteration, based on the information of some element $x \in P$ up to a level $z \in \mathbb{R}$ (for a concrete example, see Section 3.1). For technical reasons, we assume that this collection of probability measures is a probability kernel, in the sense that for all $A \in B(P)$, the mapping $(z, x) \mapsto \pi_z(x, A)$ is $(B(\mathbb{R}) \otimes B(P))$-measurable. We assume that given $x \in P$ and $z \in \mathbb{R}$, we can sample according to this law $\pi_z(x, \cdot)$. We also assume that, by letting $z \to -\infty$, the law $\pi_z(x, \cdot)$ recovers the law of $X$ (see also Assumption 2.3), which will be denoted by $\mathbb{P}^X$.

Before stating the GAMS-algorithm, let us first explain the idea behind it. In the algorithm, some random variables appear that are said to be sampled (or: chosen) according to some "rule". These rules have to be given as input to the algorithm. In each iteration $q$, there are active random elements $(X_n^q)_{n\in I^q}$, for some index set $I^q \subset \mathbb{N}$. Note that in the AMS-algorithm of Chapter 1 (Algorithm 1.3), we would have $I^q = \{1, \ldots, n_{\mathrm{rep}}\}$ for all $q$, if we were to use this notation. Given these random elements in iteration $q$, the random elements $X_n^{q+1}$ of iteration $q+1$ are constructed as follows. First, the algorithm samples (according to the offspring rule) a number of offspring $B_n^{q+1} \in \mathbb{N}_0$ for each random element $X_n^q$ of iteration $q$. For example, if $B_n^{q+1} = 3$, then $X_n^q$ has 3 offspring in iteration $q+1$, of which one is $X_n^q$ itself ($X_n^{q+1} := X_n^q$), and of which two are its children. If $B_n^{q+1} = 0$, then $X_n^q$ is not present in iteration $q+1$, and we call the random element killed. The children of $X_n^q$ are generated by using the probability distribution $\pi_{Z_q}(X_n^q, \cdot)$, where $Z_q \in \mathbb{R}$ is the current level of the algorithm. This level is defined in an adaptive way during the algorithm, using the level rule. Each child (say with label $n$) in each iteration $q$ has a parent $P^q(n)$; this parent assignment is governed by the parent rule. Furthermore, in the algorithm, some stopping random variables $R_q$ taking values in $\{0, 1\}$ appear. These are such that if $R_q = 1$, then the algorithm is stopped at iteration $q$. The stopping rule describes how those stopping random variables are sampled.

Last, let us comment on the filtration $(\mathcal{F}_z^q)_{z\in\mathbb{R}}$ that is created in each iteration of the algorithm. Those filtrations will be defined in such a way that the sampling of new random elements in the algorithm, conditional on this filtration, is solely based on the random elements that were encountered before, so that the algorithm is well-defined. In particular, see Assumptions 2.5 and 2.6 on $R_q$ and $Z_q$.

Now, we can formulate the GAMS-algorithm, which was first introduced in [BGGLR16].

Algorithm 2.2 (GAMS-algorithm).
"""
-Input: $n_0 \in \mathbb{N}$, measure $\mathbb{P}^X : B(P) \to \mathbb{R}$, bounded, measurable $\varphi : P \to \mathbb{R}$, "way to sample from $\mathbb{P}^X$ and from $\pi_z(x, \cdot)$ for all $x \in P$, $z \in \mathbb{R}$", "rules to pick $Z_q$ (level rule), $R_q$ (stopping rule), $B_n^q$ (offspring rule) and $P^{q+1}(n)$ (parent rule)".
-For $X : \Omega \to P$ and bounded, measurable $\varphi : P \to \mathbb{R}$, estimates the expectation $\mathbb{E}[\varphi(X)]$ by partially resampling random elements from the previous iteration in the next one, based on their number of offspring $B_n^{q+1}$, and sampling the new random elements (children) by partially resampling the alive random elements according to one of the probability distributions $\pi_z(x, \cdot)$. Estimates the rare event expectation by a weighted average of the final random elements. In each iteration, the set of active random elements is $(X_n^q)_{n\in I^q}$.
-Output: Estimator $\hat{p}(\varphi)$ for $\mathbb{E}[\varphi(X)]$.
"""

Initialization step ($q = 0$):

1. Let $I^0 = \{1, \ldots, n_0\}$, and let $(X_n^0)_{n\in I^0}$ be an i.i.d. sample with law $\mathbb{P}^X$;

2. Set $G_n^0 = \frac{1}{n_0}$, for all $n \in I^0$;

3. Define the tuple $\mathbf{X}^0 = (G_n^0, X_n^0)_{n\in I^0}$, and set $\mathcal{F}_z^0 := \mathrm{filt}_z^{\mathbf{X}^0}$, $z \in \mathbb{R}$;

4. Pick an ($\mathbb{R}$-valued) level $Z_0$ according to the level rule, and set $\mathcal{F}^0 := \mathcal{F}_{Z_0}^0$ (cf. Definition 0.4).

Iteration step ($q \ge 0$):

Pick $R_q$ according to the stopping rule. If $R_q = 1$: stop the algorithm, and set $Q_{\mathrm{iter}} = q$. Else, perform the following ten steps:

1. Conditionally on $\mathcal{F}^q$, sample random variables $(B_n^{q+1})_{n\in I^q}$ according to the offspring rule;

2. Kill random elements with $B_n^{q+1} = 0$ (no offspring), and set accordingly $I_{\mathrm{killed}}^{q+1} = \{ n \in I^q : B_n^{q+1} = 0 \}$;

3. Let $K^{q+1} = \sum_{n\in I^q \setminus I_{\mathrm{killed}}^{q+1}} \big( B_n^{q+1} - 1 \big)$ be the number of new random elements, and let $I_{\mathrm{new}}^{q+1} := \{ \max\{I^q\} + 1, \ldots, \max\{I^q\} + K^{q+1} \}$ denote the set of $K^{q+1}$ new labels;

4. Define the index set $I^{q+1}$ for iteration $q+1$ by $I^{q+1} = I_{\mathrm{new}}^{q+1} \cup \big( I^q \setminus I_{\mathrm{killed}}^{q+1} \big)$; the labels for iteration $q+1$ are the new labels together with the surviving labels from iteration $q$;

5. For each new label $n \in I_{\mathrm{new}}^{q+1}$, choose a parent label $n' = P^{q+1}(n)$ from $I^q \setminus I_{\mathrm{killed}}^{q+1}$, according to the parent rule. Furthermore, for $n \in I^q \setminus I_{\mathrm{killed}}^{q+1}$, set $P^{q+1}(n) = n$. The parents have to be chosen such that $B_{n'}^{q+1}$ is the number of offspring that $n' \in I^q \setminus I_{\mathrm{killed}}^{q+1}$ has, i.e., $\#\{ n \in I^{q+1} : P^{q+1}(n) = n' \} = B_{n'}^{q+1}$;

6. If $n \in I^q \setminus I_{\mathrm{killed}}^{q+1}$, set $X_n^{q+1} = X_n^q$;

7. If $n \in I_{\mathrm{new}}^{q+1}$, let $X_n^{q+1}$ be sampled independently according to the law $\pi_{Z_q}\big( X_{P^{q+1}(n)}^q, \cdot \big)$, i.e., let it be a random element that is a partial resample of its parent until level $Z_q$;

8. For $n \in I^{q+1}$, define the weight $G_n^{q+1} = \big( \mathbb{E}\big[ B_{P^{q+1}(n)}^{q+1} \,\big|\, \mathcal{F}^q \big] \big)^{-1} \cdot G_{P^{q+1}(n)}^q$, and set $\mathbf{X}^{q+1} = (G_n^{q+1}, X_n^{q+1})_{n\in I^{q+1}}$;

9. Define the filtration $\mathcal{F}_z^{q+1} = \mathcal{F}^q \vee \sigma(P^{q+1}) \vee \mathrm{filt}_z^{\mathbf{X}^{q+1}}$, $z \in \mathbb{R}$;

10. Pick a new level $Z_{q+1} \ge Z_q$ according to the level rule, and set $\mathcal{F}^{q+1} := \mathcal{F}_{Z_{q+1}}^{q+1}$.

Computation step: Return
\[
\hat{p}(\varphi) := \sum_{n\in I^{Q_{\mathrm{iter}}}} G_n^{Q_{\mathrm{iter}}} \cdot \varphi\big( X_n^{Q_{\mathrm{iter}}} \big)
\]
as the estimator for $\mathbb{E}[\varphi(X)]$.

Figure 2.1: Visual representation of the GAMS-algorithm for $n_0 = 3$. In each iteration, offspring of the previous iteration is sampled. Blue lines indicate the copy (id.) of a random element of the previous iteration; orange lines indicate their children.
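The role of the weight rule in step 8 can be illustrated with a toy instance: if every active element independently gets $B_n^{q+1} \in \{0, 1, 2\}$ offspring with $\mathbb{E}[B_n^{q+1} \mid \mathcal{F}^q] = 1$, then the weights are passed on unchanged and $\sum_n G_n^q$ has constant expectation 1. This is exactly the martingale mechanism exploited in the unbiasedness proof of Section 2.3. The sketch below is ours and tracks only the weights; the specific offspring law is an arbitrary illustrative choice, not part of the GAMS-framework itself.

```python
import random

def total_weight_after(n0, qmax, rng):
    """Track the weights G_n^q under an offspring rule with E[B | F] = 1:
    B = 0, 1, 2 with probabilities 1/4, 1/2, 1/4. Step 8 then gives
    G_child = G_parent / E[B | F] = G_parent, so the total weight is a
    (critical Galton-Watson) martingale with mean 1."""
    gs = [1.0 / n0] * n0
    for _ in range(qmax):
        new = []
        for g in gs:
            b = rng.choices((0, 1, 2), weights=(1, 2, 1))[0]
            new.extend([g] * b)   # each of the b offspring inherits weight g
        gs = new
    return sum(gs)

rng = random.Random(4)
totals = [total_weight_after(50, 3, rng) for _ in range(20000)]
mean_total = sum(totals) / len(totals)   # E[sum of weights] stays 1
```

Individual runs fluctuate (some lineages die out, others branch), but the average total weight stays at 1, mirroring Claim 2.10 below for $\varphi \equiv 1$.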

2.2 Six assumptions and unbiasedness result

In this section we state six assumptions on the random variables in the GAMS-algorithm, under which the GAMS-estimator $\hat{p}(\varphi)$ is an unbiased estimator for $\mathbb{E}[\varphi(X)]$ (see Theorem 2.9). In Section 2.3, we will give a sketch of the proof of this result, and we will refer to [BGGLR16] for the remaining details.

In the upcoming sections, we will often have statements that hold for all $q \ge 0$ instead of for all $q \le Q_{\mathrm{iter}}$. One should interpret the random quantities (for example, $Z_q$) with $q > Q_{\mathrm{iter}}$ as being obtained from the GAMS-algorithm run without stopping criterion. Note that we also used this convention in Chapter 1.


Assumption 2.3. For all $x \in P$ and for all bounded, measurable mappings $\varphi : P \to \mathbb{R}$, the mapping $z \mapsto \int_{y\in P} \varphi(y)\, \pi_z(x, \mathrm{d}y)$ is right-continuous. Moreover,
\[
\lim_{z\to-\infty} \int_{y\in P} \varphi(y)\, \pi_z(x, \mathrm{d}y) = \int_{y\in P} \varphi(y)\, \mathbb{P}^X(\mathrm{d}y).
\]

Assumption 2.4. If $Y \sim \pi_z(x, \cdot)$ for some $z \in \mathbb{R}$, $x \in P$, and if $z' \ge z$ and $\varphi : P \to \mathbb{R}$ is bounded and measurable, then
\[
\mathbb{E}\big[ \varphi(Y) \mid \mathrm{filt}_{z'}^Y \big] = \int_{y\in P} \varphi(y)\, \pi_{z'}(Y, \mathrm{d}y).
\]
For a concrete example of a probability kernel $\{ \pi_z(x, \cdot) : z \in \mathbb{R}, x \in P \}$ satisfying Assumption 2.4, see Section 3.1.

Assumption 2.5. For all $q \le Q_{\mathrm{iter}}$, the stopping random variable $R_q$ is $\mathcal{F}^q$-measurable.

Assumption 2.6. For all $q \le Q_{\mathrm{iter}}$, the random level $Z_q$ is a stopping time with respect to the filtration $(\mathcal{F}_z^q)_{z\in\mathbb{R}}$.

Assumption 2.7. For all $q \le Q_{\mathrm{iter}}$ and $n \in I^q$, it holds that $\mathbb{E}\big[ B_n^{q+1} \mid \mathcal{F}^q \big] > 0$ a.s.

Assumption 2.8. $Q_{\mathrm{iter}}$ is almost surely finite, and either

• $Q_{\mathrm{iter}} \le M$ for some $M \in \mathbb{R}$, and $\mathbb{E}\big[ \sum_{n\in I^q} G_n^q \big] < \infty$ for all $q \ge 0$;

• or: for all $q \ge 0$, $\sum_{n\in I^q} G_n^q \le N$ a.s. for some $N \in \mathbb{R}$.

Note that Assumptions 2.6 and 2.7 are needed for steps 8 and 10 in the iteration step of the GAMS-algorithm to even make sense.

Now, we give the main result of this section. This is Theorem 3.2 in [BGGLR16].

Theorem 2.9. Let $\big((X^q_n, G^q_n)_{n \in I_q}\big)_{q \le Q_{\mathrm{iter}}}$ be the sequence of random elements that is obtained by the GAMS-algorithm, with input $n_0 > 0$, $\varphi : \mathcal{P} \to \mathbb{R}$ bounded and measurable, and initial law $\mathbb{P}^X$. Assume that Assumptions 2.3-2.8 hold. Then the GAMS-estimator $\hat p(\varphi)$ is an unbiased estimator for the quantity $\mathbb{E}[\varphi(X)]$ in Problem 2.1, i.e. $\mathbb{E}[\hat p(\varphi)] = \mathbb{E}[\varphi(X)]$.
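As a quick sanity check of the theorem (an illustration added here, not taken from [BGGLR16]): on the event $\{Q_{\mathrm{iter}} = 0\}$ the algorithm performs no splitting at all, and the GAMS-estimator reduces to plain Monte Carlo, which is unbiased for elementary reasons.

```latex
% Degenerate case $Q_{\mathrm{iter}} = 0$: no resampling occurs, the initial
% replicas $X^0_1, \ldots, X^0_{n_0}$ are i.i.d. with law $\mathbb{P}^X$, and
% the weights are $G^0_n = 1/n_0$, so
\[
\hat p(\varphi) = \sum_{n=1}^{n_0} \frac{1}{n_0}\, \varphi(X^0_n),
\qquad
\mathbb{E}\big[\hat p(\varphi)\big]
= \frac{1}{n_0} \sum_{n=1}^{n_0} \mathbb{E}\big[\varphi(X^0_n)\big]
= \mathbb{E}[\varphi(X)].
\]
```

The content of Theorem 2.9 is that this unbiasedness survives the adaptive levels, killing and branching of the general algorithm.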


2.3 Proof of Theorem 2.9

In this section, a sketch of the proof of Theorem 2.9 is given, based on the results in [BGGLR16]. We will give the main ideas, and we discuss the role that Assumptions 2.3, 2.4, 2.5 and 2.8 play in the proof. These assumptions are displayed in bold in the upcoming part.

First, let us define the intermediate GAMS-estimator $\hat p_q(\varphi)$ (compare with the true GAMS-estimator $\hat p(\varphi)$) by:
\[
\hat p_q(\varphi) := \sum_{n \in I_q} G^q_n \cdot \varphi(X^q_n).
\]
Note that $\hat p_{Q_{\mathrm{iter}}}(\varphi) = \hat p(\varphi)$. Then, define the stochastic process $(M_q(\varphi))_{q \ge 0}$ by
\[
M_q(\varphi) := \mathbb{E}\big[\hat p_q(\varphi) \,\big|\, \mathcal{F}_q\big].
\]
Note: this stochastic process is well-defined, in the sense that $\hat p_q(\varphi)$ is integrable, because by the boundedness of $\varphi$, we have $\mathbb{E}|\hat p_q(\varphi)| \le \|\varphi\|_\infty\, \mathbb{E}\big[\sum_{n \in I_q} G^q_n\big]$. Clearly, by Assumption 2.8, we have that $\mathbb{E}\big[\sum_{n \in I_q} G^q_n\big] < \infty$, so that $\mathbb{E}|\hat p_q(\varphi)| < \infty$.

The proof of Theorem 2.9 is based on two claims, whose proofs we will discuss later.

Claim 2.10. $(M_q(\varphi))_{q \ge 0}$ is a martingale with respect to the filtration $(\mathcal{F}_q)_{q \ge 0}$.

Claim 2.11. $Q_{\mathrm{iter}}$ is a stopping time with respect to the filtration $(\mathcal{F}_q)_{q \ge 0}$.

Let us assume that these two claims are true, and let us give the proof of Theorem 2.9. This is part of the proof of Theorem 3.2 in [BGGLR16]; we filled in some details.

Proof of Theorem 2.9. The goal is to apply the optional stopping theorem to the martingale $(M_q(\varphi))_{q \ge 0}$ with the stopping time $Q_{\mathrm{iter}}$. For this (see [RY94, Chapter 2, Theorem 3.2]), we need that either the stopping time is bounded, or the martingale is uniformly integrable. Indeed, Assumption 2.8 ensures that either the stopping time is bounded, or that $\sum_{n \in I_q} G^q_n \le N$ a.s., in which case
\[
|M_q(\varphi)|
\le \mathbb{E}\big[|\hat p_q(\varphi)| \,\big|\, \mathcal{F}_q\big]
\le N\, \mathbb{E}\Big[\max_{n \in I_q} |\varphi(X^q_n)| \,\Big|\, \mathcal{F}_q\Big]
\le N \|\varphi\|_\infty < \infty \quad \text{a.s.},
\]
so that in the latter case, the martingale is almost surely uniformly bounded, and thus uniformly integrable as well. In either case, the optional stopping theorem tells us that
\[
\mathbb{E}[\hat p(\varphi)]
= \mathbb{E}\big[\hat p_{Q_{\mathrm{iter}}}(\varphi)\big]
= \mathbb{E}\big[M_{Q_{\mathrm{iter}}}(\varphi)\big]
\stackrel{\mathrm{OST}}{=} M_0(\varphi)
= \mathbb{E}\Big[\sum_{n=1}^{n_0} \frac{1}{n_0}\, \varphi(X^0_n)\Big]
= \mathbb{E}[\varphi(X)].
\]
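The unbiasedness can also be checked numerically in the simplest setting of Chapter 1. The sketch below (our illustration, not from [BGGLR16]; the function name and parameters are ours) runs the idealized AMS algorithm with $k = 1$ for $p = \mathbb{P}(X > a)$ with $X \sim \mathrm{Exp}(1)$, where sampling from $\mathcal{L}(X \mid X > z)$ is exact by memorylessness, and averages the standard $k = 1$ estimator $(1 - 1/N)^{Q_{\mathrm{iter}}}$ over independent runs.

```python
import math
import random

def ams_last_particle(a, n_particles=50, rng=random):
    """Idealized AMS with k = 1 for p = P(X > a), X ~ Exp(1).

    Each iteration kills the lowest particle, at level z, and resamples it
    from L(X | X > z), which for Exp(1) is z + Exp(1) by memorylessness.
    """
    xs = [rng.expovariate(1.0) for _ in range(n_particles)]
    q = 0  # number of iterations performed (Q_iter)
    while min(xs) < a:
        i = xs.index(min(xs))
        z = xs[i]
        xs[i] = z + rng.expovariate(1.0)  # exact sample from L(X | X > z)
        q += 1
    return (1.0 - 1.0 / n_particles) ** q  # unbiased in this idealized setting

random.seed(42)
reps = 1000
est = sum(ams_last_particle(2.0) for _ in range(reps)) / reps
# compare est with the true value P(X > 2) = exp(-2)
```

Averaging over many independent runs, the empirical mean of the estimator matches $e^{-2} \approx 0.135$ up to Monte Carlo error, consistent with Theorem 2.9.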
