Input estimation in a discretely observed Lévy-driven storage system


MSc Mathematics
Master Thesis

Input estimation in a discretely observed Lévy-driven storage system

Author: Dennis Nieman
Supervisor: prof. dr. M.R.H. Mandjes
Daily supervisor: L. Ravner PhD
Examination date: September 15, 2020


Abstract

This thesis discusses the estimation of the input in a Lévy-driven stochastic storage model, based on observations of the corresponding workload process. The times of the observations are assumed to be on an equispaced grid. The main estimator of interest is based on an approximate moment equation, and uses observations timed at a random subset of the grid. We discuss consistency of this estimator, and give suggestions for its improvement based on the error in the approximation of the moment equation. We also substantiate the results and discuss related estimators with help of simulation examples.

Title: Input estimation in a discretely observed Lévy-driven storage system
Author: Dennis Nieman, niemandennis@gmail.com
Supervisor: prof. dr. M.R.H. Mandjes
Daily supervisor: L. Ravner PhD
Second Examiner: dr. J.L. Dorsman
Examination date: September 15, 2020

Korteweg-de Vries Institute for Mathematics
University of Amsterdam
Science Park 105-107, 1098 XG Amsterdam
http://kdvi.uva.nl


Contents

Introduction
1 Preliminaries and setup
  1.1 Weak convergence
  1.2 Reflected Lévy processes
  1.3 Model
2 A consistent estimator of ϕ
  2.1 Ergodic theorem
  2.2 Analysis of the probing process
  2.3 Probes see time averages
3 Accuracy of the estimation equation
  3.1 Error in the first approximation
  3.2 Error in the second approximation
4 Simulation analysis
  4.1 Infinite intensity case
  4.2 M/M/1 workload
5 Epilogue
  5.1 A missing step in the consistency proof
  5.2 Asymptotic analysis
  5.3 The moment equation approximation
Acknowledgements
Bibliography


Introduction

Due to their various applications in e.g. mathematical finance and queueing theory, reflected Lévy processes are a topic of interest in stochastics. They can, for example, be used to model interest rates, the water level in a dam or the demand in a telecommunication system. Statistical estimation of model primitives is essential to control such systems.

Over time, various estimation methods have been proposed for the more classical queueing models; there is a vast literature on estimation of queueing systems (see [2] for a bibliography). Often the goal is to estimate input characteristics such as the arrival rate or the distribution of the service times, or other quantities related to the queue such as the mean waiting time or the length of a busy period. This is done using observations of arrival and departure epochs or waiting times, or sometimes only lengths of busy and idle periods. All this applies to the study of the M/G/1 model, whose workload process is a reflected Lévy process (the reflection of a so-called Compound Poisson process with drift). But the reflected Lévy process model is significantly more versatile than the M/G/1 model: for example, it allows for the study of queue-like systems governed by more 'volatile' input, or of systems in heavy traffic.

The model primitive of interest in this thesis will be this driving process, that we will refer to as net input. More specifically, we will assume this net input is an increasing Lévy process, and estimate its Laplace exponent. Estimating the Laplace exponent is natural, since it is a vital object in the study of (reflected) Lévy processes, and moreover, the objects estimated in the ‘classical’ case, if defined, can be given in terms of the Laplace exponent.

We give a short overview of the literature. There is not a lot of queueing literature that uses, as we will do, (discrete observations of) the workload or ‘virtual waiting time’ process. The comparison is somewhat unfair, since quantities such as arrival and departure epochs are natural in most queueing models, whereas these notions are absent in our more general setting. In specific cases in classical queueing, the workload process can be retrieved from a given initial value and the arrival and departure times per package. This means that, in a way, sampling from the workload process means requiring less information.

Probably the most closely related paper is [8], describing a non-parametric approach for the M/G/1 queue. The authors use the empirical cumulative distribution function (ecdf) of the periodically sampled workload to estimate the traffic intensity and the service time distribution. Among the results are Central Limit Theorems for the ecdf of the workload and of the residual service time. Reference [4] discusses a maximum likelihood estimator based on the waiting times of the customers in a GI/G/1 queue, that is, observations from the workload process just prior to the arrivals. The main result here is also a CLT.

In contrast to the scarcity of research on estimation of reflected Lévy processes, there is substantial literature on estimation of Lévy processes themselves. Here, having discrete observations is a more common assumption; see, for example, [9], [5] and references therein.


No statistical methods had been proposed in the reflected Lévy process setting until the recent appearance of [17]. This article discusses an estimator of the Laplace exponent of the underlying input process based on observations of the reflected process at Poisson epochs – a method called 'Poisson probing'. This estimator, as the authors explain, is a considerable contribution to the queueing literature for various reasons. Firstly, the idea of estimating a Lévy process from its reflection is novel. Moreover, the setting is more general than traditional queueing models; it does not require that the process starts in stationarity, nor is it limited to estimating a characteristic quantity of the steady-state distribution of the system. Our estimator, being intimately related to the one in that paper, shares all these qualities, but there is also one significant improvement. A major disadvantage of Poisson probing is that the time between observations is exponentially distributed, whereas in many practical applications it is natural to have observations at equidistant points in time. The estimator proposed in this thesis will still use random probing, but the times between probes will have a discrete distribution, thereby overcoming the aforementioned issue. In summary, we will propose a way to estimate the Laplace exponent of a Lévy process from discrete observations of its reflection. The procedure is illustrated in the figure below.

Figure: Discrete probing of a storage process; the probes form a subset of the grid observations. In our model, only the grid observations are observed; the rest of the sample path is merely included for illustration.

We now elaborate on the contents of this thesis. In our setting, no explicit expression is available for the joint distribution of the observed variables. Moreover, we are trying to estimate a model primitive (the Laplace transform of the input) that is only indirectly related to the observed data (the reflected process associated with said input). These complications render impossible the application of standard statistical methods, such as maximum likelihood estimation. Motivated by the ideas in [17], we will define an estimator of the Laplace transform ϕ as the solution to an approximate conditional moment equation. The moment equation is solved by a function ϕ̂_n of the data, which will be the main estimator studied in this thesis.

Since ϕ̂_n is not constructed as a maximum likelihood estimator, and moreover the observations are dependent, it is more difficult to prove favourable features such as consistency and asymptotic normality. Yet, in Chapter 2, we will aim at a proof of consistency of ϕ̂_n. The proof hinges on an analysis of the probing process. We will use tools from probability theory, such as renewal theory to establish convergence of empirical process averages, and a discrete-time PASTA-like result to relate this to empirical averages of the process of probes.

We will analyse the estimator ϕ̂_n a bit more in Chapter 3. Here, we will assess the accuracy of the moment equation it is derived from. The error is in fact due to two approximations. We consider this error as a function of the distance Δ > 0 between points on the grid. We will show that, as Δ ↓ 0, the error in the second approximation vanishes. For the first approximation, the error vanishes even when divided by Δ, i.e. it is 'superlinear' in Δ. This result depends on a first-order (Taylor) approximation of the Laplace transform of the workload process and two crucial observations. The first is Lemma 3.2, which ensures

lim_{Δ↓0} ∫_0^∞ ξ e^{-ξt} f(t) (⌊t/Δ⌉ − t/Δ) dt = 0

under a condition on the function f (with ⌊·⌉ denoting the nearest-integer function). The second step is to show that this result can be applied to the derivative of the Laplace transform of the workload process.

In Chapter 4, results are substantiated by simulation examples. We will consider the effect of the distance between grid points, and also of the sampling rate of the probing process. We also briefly discuss two more estimators related to ϕ̂_n that are expected to reduce variance: one does not use random probing, and the other is a so-called resampling estimator that re-uses data.


1 Preliminaries and setup

This chapter is introductory. We settle some notation and state results that will be used throughout this thesis. Moreover, we state all the assumptions that make up the statistical model.

1.1 Weak convergence

We define C_b to be the set of all bounded, continuous functions [0, ∞) → R. If f ∈ C_b, we denote

‖f‖_∞ = sup_{x≥0} |f(x)|.

If X, X_1, X_2, ... are non-negative random variables such that

lim_{n→∞} Ef(X_n) = Ef(X)

for all f in C_b, we say that X_n converges weakly to X, and write X_n ⇒ X. Equivalently, if F, F_1, F_2, ... are the respective distribution functions of X, X_1, X_2, ..., then

lim_{n→∞} F_n(x) = F(x)

whenever F is continuous at x ≥ 0. A weak limit is unique in distribution, since the integrals

{ Ef(X) = ∫ f(x) P(X ∈ dx) : f ∈ C_b }

completely determine the law P(X ∈ ·) (see e.g. [6], Theorem 1.2).

1.2 Reflected Lévy processes

We explain what is meant by a reflected Lévy process. Many facts will be stated without proof or reference. All statements in this section and the next are justified in Chapter 2 of [7] and/or Chapter 4 of [16].

To understand intuitively the notion of a reflected Lévy process, we describe an analogue, the classical single-server queueing model. In this model, a server processes arriving packages, which are stored in a queue if the server is busy upon their arrival. The process is discrete, in the sense that inter-arrival times are strictly positive and thus the set of arrival times is countable. The arrival times and the service times (the ‘job sizes’) characterise the system. There is a notion of ‘workload’ at a fixed time t, which is the sum of all remaining service times of packages in the system (in queue and at the server) at time t. Equivalently, it is the time a package would spend in the queue upon arrival at time t.


Suppose that V_n denotes the workload of the queue just before the arrival of the n-th package (i.e. the time the n-th package spends in the queue). Let A_n denote the time between arrivals n and n + 1, and let B_n denote the service time of the n-th package. Then Y_n := B_n − A_n has the interpretation of the work that is added to the queue minus the work that can potentially be done by the server in the period from the n-th arrival until just before the (n + 1)-st arrival. As such, we refer to Y_n as the 'net input' to the queue. Moreover, the well-known Lindley recursion formula holds:

V_{n+1} = max{V_n + Y_n, 0}.

Iterating the formula, and denoting X_n = Σ_{i=1}^n Y_i, it follows that

V_n = X_n + max{V_0, − min_{0≤i≤n} X_i}.   (1.1)

In case the Y_i are i.i.d., as is the case in e.g. the GI/G/1 queueing model, the process {X_n}_{n∈N_0} can be thought of as a random walk.
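The Lindley recursion is easy to put to work numerically. The following Python sketch is illustrative and not part of the thesis: it assumes an M/M/1 queue with arrival rate lam = 1 and service rate mu = 1.5, both arbitrary choices.

```python
import random

def lindley_waiting_times(n, lam=1.0, mu=1.5, seed=1):
    """Waiting times V_1, ..., V_n of an M/M/1 queue via the Lindley
    recursion V_{n+1} = max{V_n + Y_n, 0}, where Y_n = B_n - A_n."""
    rng = random.Random(seed)
    v, path = 0.0, []
    for _ in range(n):
        y = rng.expovariate(mu) - rng.expovariate(lam)  # Y_n = B_n - A_n
        v = max(v + y, 0.0)
        path.append(v)
    return path

path = lindley_waiting_times(10_000)
print(sum(path) / len(path))  # long-run average waiting time
```

Under these assumed rates the queue is stable (λ < μ), and the empirical average settles near the M/M/1 mean waiting time λ/(μ(μ − λ)).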

Using Lévy processes, we model a similar system with 'instantaneous' arrival of work. A Lévy process is a continuous-time stochastic process {X(t)}_{t≥0} such that

– X(0) = 0;
– increments are stationary: X(t + s) − X(t) =_d X(s) for all s, t ≥ 0;
– increments are independent: X(t + s) − X(t) ⊥⊥ {X(u) : 0 ≤ u ≤ t} for all s, t ≥ 0;
– sample paths are a.s. càdlàg.

Lévy processes can be considered the continuous-time counterpart of a random walk (with i.i.d. increments); this is justified by noting that a random walk such as {X_n}_{n∈N_0} in the previous paragraph also satisfies X_0 = 0 and has stationary and independent increments (again assuming the Y_i are i.i.d.). The càdlàg assumption ensures that almost every sample path has only jump discontinuities.

In our queueing model, a Lévy process X = {X(t)}_{t≥0} plays the role of cumulative net input, describing the change in 'work' in the system. The corresponding workload process {V(t)}_{t≥0} is defined through

V(t) = X(t) + max{x, − inf_{0≤s≤t} X(s)},

the continuous-time counterpart of (1.1). Here V(0) = x ≥ 0 is the initial workload of the queue. Another justification of this construction is that the term L(t) := max{x, − inf_{0≤s≤t} X(s)} defines the unique right-continuous non-decreasing process such that

– {V(t) = X(t) + L(t)}_{t≥0} is a non-negative process with V(0) = x;
– L(t) is increasing only if V(t) = 0.

In the literature, V = {V(t)}_{t≥0} is also referred to as the regulated version of X (and {L(t)}_{t≥0} as the regulator process), or the reflection of X (at zero). We will refer to V both as 'workload process' and 'reflected process'.


Figure 1.1: Sample path of a Compound Poisson workload process with drift {X(t)}_{t≥0}, and its reflection at zero {V(t)}_{t≥0}.

We turn to an example of a Lévy process. Let {N(t)}_{t≥0} be a Poisson process of constant rate λ > 0, and let B_1, B_2, ... be a sequence of i.i.d. random variables. Fix d ∈ R. Then

X(t) = Σ_{n=1}^{N(t)} B_n + dt

defines a Lévy process called a Compound Poisson process with drift. In case d = −1 and the B_n are positive, the corresponding workload process describes the workload of an M/G/1 queue. Figure 1.1 shows a realisation of such a process and its reflection at zero.
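A path like the one in Figure 1.1 can be generated directly from the definition of the reflection. The Python sketch below is illustrative only; the parameter choices (rate-0.8 Compound Poisson input with mean-1 exponential jobs, initial level x0 = 2) are assumptions, and the running infimum is approximated over the grid rather than computed in continuous time.

```python
import random

def reflected_mg1_path(horizon=50.0, dt=0.01, lam=0.8, mean_job=1.0, x0=2.0, seed=7):
    """Grid sketch of Figure 1.1: net input X(t) = J(t) - t with Compound
    Poisson J (rate lam, Exp(1/mean_job) jobs), and its reflection
    V(t) = X(t) + max{x0, -inf_{0<=s<=t} X(s)} (infimum taken over the grid)."""
    rng = random.Random(seed)
    jumps, t = [], 0.0
    while True:                          # jump epochs and sizes of J
        t += rng.expovariate(lam)
        if t > horizon:
            break
        jumps.append((t, rng.expovariate(1.0 / mean_job)))
    X, V = [], []
    j, J, run_min = 0, 0.0, 0.0          # X(0) = 0
    for i in range(round(horizon / dt) + 1):
        s = i * dt
        while j < len(jumps) and jumps[j][0] <= s:
            J += jumps[j][1]             # arriving job jumps the input up
            j += 1
        x_val = J - s                    # X(s) = J(s) - s
        run_min = min(run_min, x_val)
        X.append(x_val)
        V.append(x_val + max(x0, -run_min))
    return X, V

X, V = reflected_mg1_path()
print(len(V), min(V))  # workload values on the grid stay non-negative
```

The grid approximation of the infimum can miss the true minimum between grid points by at most dt, which is why this is a sketch rather than an exact simulation.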

1.3 Model

Now that the notion of a reflected Lévy process is clear, we are ready to specify the statistical model we will be studying. We have to make some assumptions on the type of Lévy processes we are using. These assumptions, and also some properties that follow from them, are discussed in this section.

A Lévy process with increasing sample paths is called a subordinator. Throughout this thesis, we assume that the net input process X = {X(t)}_{t≥0} is the difference of a subordinator J = {J(t)}_{t≥0} (the input) and a unit-rate linear drift (the output). In this case, the reflection of X has the interpretation of a storage process. In analogy with the classical queueing model from the previous section, we can think of the increments of J as the Lévy counterpart of the job sizes B_n. The interpretation is particularly clear in the case of an M/G/1 workload process (a quintessential storage process) as in Figure 1.1, where the input is Compound Poisson:

J(t) = Σ_{n=1}^{N(t)} B_n.   (1.2)

Throughout, the net input process X is assumed to fulfil the stability condition EX(1) < 0. The stability condition ensures that X(t) almost surely drifts to −∞ and that the reflected process V has a weak limit. We denote by V a random variable with the limiting distribution, so that V(t) ⇒ V. The limiting distribution is stationary for V, meaning that if V(0) is distributed as V, then V(t) is also distributed as V for all t ≥ 0.

The absence of negative jumps implies that the Laplace transform α ↦ Ee^{-αX(1)} is finite for all α in [0, ∞). We define the Lévy exponent ϕ : [0, ∞) → R by

ϕ(α) = log Ee^{-αX(1)}.

Since ϕ is convex, ϕ(0) = 0, and ϕ'(0) = −EX(1) > 0, it follows that ϕ is strictly increasing, and hence invertible on [0, ∞). We let ψ denote the inverse of ϕ. The Lévy exponent ϕ and its inverse ψ play an important role in the analysis of reflected Lévy processes. For example, the Laplace transform of the workload after an exponential time can be computed explicitly in terms of these functions: if X is any spectrally positive Lévy process with EX(1) < 0 and (independently) T has exponential distribution with rate ξ > 0, then (see e.g. Theorem 4.1 in [7])

E_x e^{-αV(T)} = ∫_0^∞ ξ e^{-ξt} E_x e^{-αV(t)} dt = ξ/(ξ − ϕ(α)) ( e^{-αx} − (α/ψ(ξ)) e^{-ψ(ξ)x} ),   (1.3)

where we use the subscript x to emphasize that V(0) = x.

For any Lévy process X without negative jumps, the Lévy exponent characterises the distribution of X(t), as

Ee^{-αX(t)} = e^{ϕ(α)t}, α, t ≥ 0.

Moreover, for α ≥ 0, there is an explicit expression

ϕ(α) = log Ee^{-αX(1)} = dα + σ²α²/2 + ∫_{(0,∞)} ( e^{-αx} − 1 + αx 1(x ≤ 1) ) ν(dx),   (1.4)

for a unique triplet (d, σ, ν) such that d ∈ R, σ ≥ 0, and ν is a non-negative measure on (0, ∞) satisfying ∫_{(0,∞)} (1 ∧ x²) ν(dx) < ∞. The representation (1.4) is known as the classical Lévy–Khintchine formula. In our case, this representation is even simpler. The assumption that X is the difference of a subordinator and a linear drift implies that σ = 0 and that the sample paths of X are of bounded variation. As such,

∫_{(0,∞)} x 1(x ≤ 1) ν(dx) < ∞.

So (1.4) can be written as

ϕ(α) = cα + ∫_{(0,∞)} (e^{-αx} − 1) ν(dx)   (1.5)

for some c ∈ R. The unit-rate output assumption that we made entails that c = 1. Equivalently, X(t) ≡ J(t) − t, where J = {J(t)}_{t≥0} is a Lévy process with exponent

log Ee^{-αJ(1)} = ∫_{(0,∞)} (e^{-αx} − 1) ν(dx).


Take again T ∼ Exponential(ξ) for fixed ξ > 0, and assume T is independent of V. We state a formula for P_x(V(T) = 0) that will be used throughout. Using (1.3) and the fact that V(T) is non-negative, note that

P_x(V(T) = 0) = lim_{α→∞} E_x e^{-αV(T)} = (ξ/ψ(ξ)) e^{-ψ(ξ)x} lim_{α→∞} α/ϕ(α).   (1.6)

In our case, the Lévy exponent ϕ is given by (1.5) with c = 1, so the Monotone Convergence Theorem implies

lim_{α→∞} ϕ(α)/α = 1 + lim_{α→∞} ∫_{(0,∞)} (e^{-αx} − 1)/α ν(dx) = 1 + ∫_{(0,∞)} lim_{α→∞} (e^{-αx} − 1)/α ν(dx) = 1,   (1.7)

so (1.6) can be simplified to

P_x(V(T) = 0) = (ξ/ψ(ξ)) e^{-ψ(ξ)x}.   (1.8)

We go back to the Compound Poisson example (1.2) once more. Suppose B is a random variable with the same distribution as the i.i.d. jobs B_1, B_2, ..., and b(α) = Ee^{-αB} denotes its Laplace transform. Then it follows that

Ee^{-αJ(1)} = Σ_{n=0}^∞ E[exp(−α Σ_{i=1}^n B_i)] P(N(1) = n) = Σ_{n=0}^∞ b(α)^n e^{-λ} λ^n/n! = exp(λb(α) − λ),

that is,

log Ee^{-αJ(1)} = λb(α) − λ = λ ∫_0^∞ (e^{-αx} − 1) P(B ∈ dx),   (1.9)

so the Lévy jump measure is ν = λP(B ∈ ·). Conversely, if J is any subordinator with ν((0, ∞)) < ∞, then letting λ = ν((0, ∞)), it follows that

log Ee^{-αJ(1)} = λ ∫_{(0,∞)} (e^{-αx} − 1) (ν/λ)(dx),

so J is a Compound Poisson process with rate λ and jump size distribution ν/λ (which is a probability measure). Thus the input subordinator is a Compound Poisson process if and only if its Lévy jump measure ν is a finite measure.

Since the Lévy exponent is so vital to the study of (reflected) Lévy processes, and in particular to the Lévy-driven storage model, it is the model primitive we will be estimating. We assume not to have the entire process V at our disposal, but only the discrete observations

{V(iΔ) : i = 0, 1, 2, ...}

for a fixed Δ > 0.


2 A consistent estimator of ϕ

In this chapter, we describe an estimator for ϕ; the rest of the chapter is devoted to the proof of its consistency. Our estimator ϕ̂_n is based on one that was proposed in [17]. We set out the sampling scheme from this reference, and then explain how ours is derived from it.

Let T_1, T_2, ... be i.i.d. Exponential(ξ) random variables for some fixed ξ > 0, and let

S_i = Σ_{j=1}^i T_j, i ≥ 0.

As before, let V denote the workload process corresponding to the input described in the previous chapter. Abbreviate V_i = V(S_i) for each i. It follows from (1.3) that

E[e^{-αV_i} | V_{i−1}] = ξ/(ξ − ϕ(α)) ( e^{-αV_{i−1}} − (α/ψ(ξ)) e^{-ψ(ξ)V_{i−1}} ).   (2.1)

This justifies estimating ϕ(α) from the moment equation

(1/n) Σ_{i=1}^n e^{-αV_i} = (1/n) Σ_{i=1}^n ξ/(ξ − ϕ(α)) ( e^{-αV_{i−1}} − (α/ψ(ξ)) e^{-ψ(ξ)V_{i−1}} ).

Suppose that ψ_n is an estimator of ψ(ξ). The estimator for ϕ(α) is found by solving the above equation for ϕ(α), and substituting ψ_n for ψ(ξ). This yields

ϕ̂*_n(α; ψ_n) = ξ · ( e^{-αV_n} − e^{-αV_0} + (α/ψ_n) Σ_{i=1}^n e^{-ψ_n V_{i−1}} ) / Σ_{i=1}^n e^{-αV_i}.   (2.2)

The first thing we will do is adapt the method so as not to have to estimate ψ(ξ). It follows from (1.8) that (2.1) is equivalent to

E[e^{-αV_i} | V_{i−1}] = ξ/(ξ − ϕ(α)) e^{-αV_{i−1}} − α/(ξ − ϕ(α)) P(V_i = 0 | V_{i−1}),   (2.3)

which can also be written as

ϕ(α) E[e^{-αV_i} | V_{i−1}] = ξ ( E[e^{-αV_i} | V_{i−1}] − e^{-αV_{i−1}} ) + α P(V_i = 0 | V_{i−1}).

Taking expectations yields

ϕ(α) Ee^{-αV_i} = ξ E( e^{-αV_i} − e^{-αV_{i−1}} ) + α P(V_i = 0).

The corresponding empirical moment equation is

ϕ(α) (1/n) Σ_{i=1}^n e^{-αV_i} = ξ (1/n) Σ_{i=1}^n ( e^{-αV_i} − e^{-αV_{i−1}} ) + α (1/n) Σ_{i=1}^n 1{V_i = 0}.

Solving for ϕ(α), and recognising a telescoping series, it follows that ϕ(α) can be estimated by

ϕ̂_n(α) = ( n^{-1} ξ (e^{-αV_n} − e^{-αV_0}) + n^{-1} α Σ_{i=1}^n 1{V_i = 0} ) / ( n^{-1} Σ_{i=1}^n e^{-αV_i} ).   (2.4)
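The estimator (2.4) is straightforward to evaluate on simulated data. The Python sketch below is illustrative only and not the thesis's simulation setup: it assumes M/G/1-type input (rate-0.8 Poisson arrivals, Exp(1) jobs) so that the workload can be simulated exactly at the exponential probing epochs, and it takes ξ = 1 and α = 2 as arbitrary choices.

```python
import math
import random

def mg1_workload_at(times, lam=0.8, mu=1.0, x0=0.0, seed=3):
    """Exact M/G/1 workload V(t) (Poisson(lam) arrivals, Exp(mu) jobs,
    unit output rate) evaluated at the increasing epochs in `times`."""
    rng = random.Random(seed)
    out, v, t_prev = [], x0, 0.0
    next_arr = rng.expovariate(lam)
    for t in times:
        while next_arr <= t:
            v = max(v - (next_arr - t_prev), 0.0)  # drain at unit rate
            v += rng.expovariate(mu)               # arriving job adds work
            t_prev = next_arr
            next_arr += rng.expovariate(lam)
        v = max(v - (t - t_prev), 0.0)
        t_prev = t
        out.append(v)
    return out

def phi_hat(alpha, probes, xi):
    """The estimator (2.4); note that no estimate of psi(xi) is needed."""
    num = xi * (math.exp(-alpha * probes[-1]) - math.exp(-alpha * probes[0]))
    num += alpha * sum(v == 0.0 for v in probes[1:])
    den = sum(math.exp(-alpha * v) for v in probes[1:])
    return num / den

xi = 1.0
rng = random.Random(4)
S, s = [], 0.0
for _ in range(20_000):          # exponential probing epochs S_1 < S_2 < ...
    s += rng.expovariate(xi)
    S.append(s)
probes = [0.0] + mg1_workload_at(S)   # V_0 = V(0) = 0, then V_1, ..., V_n
est = phi_hat(2.0, probes, xi)
print(est)  # consistent for phi(2) = 2 + 0.8*(1/3 - 1) ≈ 1.47
```

Exact zeros of the simulated workload occur with positive probability, so the indicator term in (2.4) is non-trivial here.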


A comment on the difference between the two estimators ϕ̂*_n and ϕ̂_n is in order. It is important to note that the first is more widely applicable: as explained in [17], it can be used if the input process J is any Lévy process without negative jumps. By contrast, the estimator ϕ̂_n relies on (1.8), which is true only in case J is a subordinator. However, the following should be noted: in [17], it is shown that the consistency of ϕ̂*_n relies on the existence of a consistent estimator of ψ(ξ), and in that reference, the existence of such a consistent estimator is only established for subordinator input.

As mentioned in the introduction to this thesis, a disadvantage of the above setting is that the times between observations have a continuous distribution, whereas in practice observations are often equidistant in time. So the second adaptation we will make yields an estimator that is a function of the observations

{V(iΔ) : i = 0, 1, 2, ...}

for a fixed Δ > 0. We will approximate the procedure described above by rounding the exponential times to values on the grid. Letting ⌊x⌉ denote the integer closest to x, we set

T_j^Δ = Δ⌊T_j/Δ⌉, S_i^Δ = Σ_{j=1}^i T_j^Δ, V_i^Δ = V(S_i^Δ).

The observations V_i^Δ will be referred to as probes, and the observation times S_i^Δ constitute what we will refer to as the probing process. Note that the number Δ⌊t/Δ⌉ is the integer multiple of Δ closest to t ∈ R, i.e. the nearest point on the grid.

Consider the estimator

ϕ̂_n(α) = ( n^{-1} ξ (e^{-αV_n^Δ} − e^{-αV_0^Δ}) + n^{-1} α Σ_{i=1}^n 1{V_i^Δ = 0} ) / ( n^{-1} Σ_{i=1}^n e^{-αV_i^Δ} ).   (2.5)

This estimator is nothing but a discrete-time approximation of (2.4). We keep the multiplicative terms n^{-1} for the asymptotic analysis below.
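The rounding construction above is a one-liner in code. The Python sketch below is illustrative (the parameter values are arbitrary); it produces the grid indices S_i^Δ/Δ at which the probes are read off from the grid observations {V(iΔ)}.

```python
import random

def discrete_probe_indices(n, xi, delta, seed=11):
    """Round i.i.d. Exp(xi) gaps to the grid, T_j^D = delta * round(T_j/delta),
    and return the grid indices S_i^D / delta of the n probes.  Python's
    round() is nearest-integer (ties to even); ties have probability zero
    for exponential gaps, so this matches the rounding in the text a.s."""
    rng = random.Random(seed)
    idx, s = [], 0
    for _ in range(n):
        s += round(rng.expovariate(xi) / delta)  # a gap may round to 0
        idx.append(s)
    return idx

idx = discrete_probe_indices(5, xi=1.0, delta=0.1)
print(idx)  # non-decreasing grid indices; repeats occur when a gap rounds to 0
```

Given grid observations vgrid with vgrid[i] = V(iΔ), the probes are then simply [vgrid[i] for i in idx].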

A natural way to assess the estimator is to verify whether or not it is consistent. In our case, consistency holds point-wise in a strong sense:

Theorem 2.1. Let ξ, Δ > 0. The estimator ϕ̂_n(·) is point-wise (strongly) consistent for the Lévy exponent ϕ: for each α ≥ 0,

lim_{n→∞} ϕ̂_n(α) = ϕ(α) a.s.

Thus, our estimator approaches the true value as the sample size increases.

The following remark is appropriate. Even though this result is stated as a theorem, its proof, unfortunately, is incomplete. There is one missing step in the proof of Proposition 2.3. In the discussion chapter of this thesis, we discuss how it may be established. At this moment, we cannot exclude the necessity of an additional assumption regarding the input process J.

The proof of Theorem 2.1 is as follows. We analyse the numerator and the denominator in (2.5) separately. For the numerator, note that, as n → ∞,

n^{-1} ξ ( e^{-αV_n^Δ} − e^{-αV_0^Δ} ) → 0 a.s.,   (2.6)

since both exponentials are bounded by 1. We will show that the limit of the numerator is

α lim_{n→∞} (1/n) Σ_{i=1}^n 1{V_i^Δ = 0} = α E1{V = 0} = α P(V = 0) a.s.,   (2.7)

where V has the stationary limiting distribution of V. Quite similarly, the denominator has limit

lim_{n→∞} (1/n) Σ_{i=1}^n e^{-αV_i^Δ} = Ee^{-αV} a.s.   (2.8)

Both will follow from the following more general result:

Lemma 2.2 (Probes see time averages). Let f : [0, ∞) → R be a bounded measurable function. As n → ∞,

(1/n) Σ_{i=1}^n f(V_i^Δ) → Ef(V) a.s.   (2.9)

Note that (2.7) and (2.8) follow by applying Lemma 2.2 to the functions f(x) = 1{x = 0} and f(x) = e^{-αx}, respectively. The proof of Lemma 2.2 is quite lengthy and will be outlined and given later. We first complete the proof of Theorem 2.1.

Combining (2.5)–(2.8), we see that

ϕ̂_n(α) → α P(V = 0) / Ee^{-αV} a.s.

It is known (see e.g. [7]) that the limit of the denominator is

Ee^{-αV} = α ϕ'(0) / ϕ(α).

This identity is known as the Generalised Pollaczek–Khintchine formula. Recalling from (1.7) that ϕ(α)/α → 1 as α → ∞, it follows from V ≥ 0 that

P(V = 0) = lim_{α→∞} Ee^{-αV} = ϕ'(0),

hence

ϕ̂_n(α) → α P(V = 0) / Ee^{-αV} = α ϕ'(0) / ( α ϕ'(0)/ϕ(α) ) = ϕ(α) a.s.,

so the estimator is consistent. Now, all that remains is a proof of Lemma 2.2. This is done in the next sections. We outline the proof here.

Lemma 2.2 essentially follows from the equality

lim_{n→∞} (1/n) Σ_{i=1}^n f(V_i^Δ) = lim_{n→∞} (1/n) Σ_{i=1}^n f(V(iΔ)) a.s.   (2.10)

for bounded measurable f . The right-hand side we call the time average of the process f (V (i∆)). So, informally, the equality states that the limiting average of the process observed on the entire grid is the same as the limiting average of the probes.

In Section 2.1, we will show that the right-hand side of (2.10) equals Ef (V ), as follows from a renewal argument. Such a convergence statement is often called ergodicity. The aforementioned ‘missing step’ will be pointed out here.

In Section 2.3, we will conclude the proof of Lemma 2.2 by establishing the equality (2.10). To do so, we will need a number of results on the probing process. These results are all stated in Section 2.2.


2.1 Ergodic theorem

Proposition 2.3. Let f : [0, ∞) → R be a bounded measurable function. Then

lim_{n→∞} (1/n) Σ_{i=1}^n f(V(iΔ)) = Ef(V) a.s.

Proof. We only sketch the core argument here. We modify some arguments and use results from Chapters V and VI of [3]. For notational convenience, we assume Δ = 1. Define

U_0 = inf{k ∈ N : V(k) = 0}

and (inductively)

U_{j+1} = inf{k ∈ N : k > U_j, V(k) = 0}, j ∈ N_0.

Then the U_j are stopping times with respect to the discrete filtration whose i-th σ-algebra is

F_i := σ{V(0), V(1), ..., V(i)},   (2.11)

i.e. the smallest σ-algebra making V(0), ..., V(i) measurable. Also let

M_n = sup{j ∈ N_0 : U_j ≤ n},

so that U_{M_n} ≤ n < U_{M_n + 1}. We write

Σ_{i=1}^n f(V(i)) = [ Σ_{i=1}^{U_0} + Σ_{i=U_0+1}^{U_1} + ··· + Σ_{i=U_{M_n−1}+1}^{U_{M_n}} + Σ_{i=U_{M_n}+1}^{n} ] f(V(i))   (2.12)
= [ Σ_{i=1}^{U_0} + Σ_{i=U_{M_n}+1}^{n} ] f(V(i)) + Σ_{j=0}^{M_n−1} Σ_{i=U_j+1}^{U_{j+1}} f(V(i)).   (2.13)

We analyse the asymptotic behaviour of the two terms in (2.13). Defining Y_0 = U_0 and Y_{i+1} = U_{i+1} − U_i, we may write U_j = Σ_{i=0}^j Y_i. Using the (strong) Markov property of {V(t)}_{t≥0}, it can be shown that

– {V(U_j + i)}_{i∈N_0} is independent of {V(i) : 0 ≤ i ≤ U_j}, and hence of U_0, ..., U_j; and
– {V(U_j + i)}_{i∈N_0} is distributed as {V(i)}_{i∈N_0} conditional on V(0) = 0.

For this reason, we call V a regenerative process. This also implies that Y_{j+1} is independent of the process V up to time U_j, and hence of all the stopping times U_0, ..., U_j. Consequently, Y_0, ..., Y_j are independent, and in fact Y_1, ..., Y_j are even i.i.d. It follows that the sums in (2.12) are mutually independent. Also, the terms

Σ_{i=U_j+1}^{U_{j+1}} f(V(i)), 0 ≤ j ≤ M_n − 1,

are all identically distributed, with the distribution of

Σ_{i=1}^{Y_1} f(V(i)) given V(0) = 0.

By linearity, we may assume without loss of generality that 0 ≤ f ≤ 1. So

(1/n) [ Σ_{i=1}^{U_0} + Σ_{i=U_{M_n}+1}^{n} ] f(V(i)) ≤ (1/n) ( U_0 + n − U_{M_n} ),

which vanishes a.s., for both U_0 and n − U_{M_n} are a.s. finite. Thus, continuing from (2.13), the limit of (1/n) Σ_{i=1}^n f(V(i)) equals that of

(1/n) Σ_{j=0}^{M_n−1} Σ_{i=U_j+1}^{U_{j+1}} f(V(i)) = (M_n/n) · (1/M_n) Σ_{j=0}^{M_n−1} Σ_{i=U_j+1}^{U_{j+1}} f(V(i)).

The Elementary Renewal Theorem states that M_n/n → 1/EY_1 almost surely. This implies M_n → ∞, and thus, by the Strong Law of Large Numbers,

(1/M_n) Σ_{j=0}^{M_n−1} Σ_{i=U_j+1}^{U_{j+1}} f(V(i)) → E_0 [ Σ_{i=1}^{Y_1} f(V(i)) ] a.s.,

and this limit is known to equal EY_1 · Ef(V), so

(1/n) Σ_{i=1}^n f(V(i)) → (1/EY_1) · EY_1 · Ef(V) = Ef(V) a.s.   (2.14)

The same arguments can be used when Δ ≠ 1, concluding the proof.

The missing step in the proof is the following: we need the mean EY_1 to be finite. In words, the mean number of steps it takes the process V to return to 0 should be finite. Only then can we guarantee that M_n → ∞ and the consequent convergence in (2.14). As promised, we will comment on this in the discussion in Chapter 5.

2.2 Analysis of the probing process

To establish that probes see time averages, we need some more facts about the probing process. These are stated and proved here. The results in this section are very similar to what is known about the Bernoulli process. As such, the (admittedly tedious) proofs can be skipped by the reader who is familiar with this process.

We will again assume that Δ = 1. Firstly, letting p = p(ξ) = e^{-ξ/2}, we note that the common distribution of the inter-probe times T_i^Δ is given by

P(T_i^Δ = 0) = P(T_i ≤ 1/2) = 1 − e^{-ξ/2} = 1 − p,
P(T_i^Δ = k) = P(k − 1/2 < T_i ≤ k + 1/2) = e^{-ξ(k−1/2)} − e^{-ξ(k+1/2)} = p^{2k}(p^{-1} − p), k ∈ N.
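This pmf is easy to sanity-check numerically. The Python snippet below (with ξ = 0.7 an arbitrary illustrative value) compares it with the rounding probabilities of the Exponential(ξ) distribution and verifies that it sums to one.

```python
import math

xi = 0.7
p = math.exp(-xi / 2)

def pmf(k):
    """P(T_i^D = k) for Delta = 1: mass 1 - p at 0, p^{2k}(1/p - p) at k >= 1."""
    return 1 - p if k == 0 else p ** (2 * k) * (1 / p - p)

def pmf_direct(k):
    """Same probabilities obtained by rounding the Exp(xi) law directly:
    P(max(k - 1/2, 0) < T <= k + 1/2)."""
    lo = max(k - 0.5, 0.0)
    return math.exp(-xi * lo) - math.exp(-xi * (k + 0.5))

print(sum(pmf(k) for k in range(200)))                        # ≈ 1.0
print(max(abs(pmf(k) - pmf_direct(k)) for k in range(200)))   # ≈ 0.0
```

The total mass is exactly (1 − p) + (p^{-1} − p) Σ_{k≥1} p^{2k} = (1 − p) + p = 1, which the truncated sum reproduces up to a geometrically small tail.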

We now determine the distribution of S_j^Δ = Σ_{i=1}^j T_i^Δ.

Proposition 2.4. For each j in N_0 and k in N,

P(S_j^Δ = 0) = (1 − p)^j,
P(S_j^Δ = k) = Σ_{n=1}^{j∧k} (j choose n) p^n (1 − p)^{j−n} (k−1 choose n−1) p^{2(k−n)} (1 − p²)^n.   (2.15)

Proof. Independence of the T_i^Δ implies

P(S_j^Δ = 0) = P(T_1^Δ = ··· = T_j^Δ = 0) = (1 − p)^j.

To establish (2.15), we condition on the number of non-zero elements of the set {T_1^Δ, ..., T_j^Δ}. Let us call this number U(j), so

P(S_j^Δ = k) = Σ_{n=1}^{j∧k} P(U(j) = n) P(S_j^Δ = k | U(j) = n)
= Σ_{n=1}^{j∧k} P(U(j) = n) P(S_n^Δ = k | T_i^Δ > 0, i = 1, ..., n).

Since the T_i^Δ are i.i.d. and moreover non-zero with probability p, it follows that

P(U(j) = n) = (j choose n) p^n (1 − p)^{j−n}.

Note that for t ∈ N,

P(T_j^Δ = t | T_i^Δ > 0, i = 1, ..., n) = P(T_j^Δ = t | T_j^Δ > 0) = p^{2t}(p^{-1} − p)/p = (p²)^{t−1}(1 − p²),

so T_j^Δ conditioned on being positive has a geometric distribution with parameter p². The interpretation of T_j^Δ | T_j^Δ > 0 is 'the number of i.i.d. Bernoulli(p²) attempts until failure'. Thus we can interpret S_n^Δ = Σ_{i=1}^n T_i^Δ conditioned on {T_i^Δ > 0, i = 1, ..., n} as 'the number of i.i.d. Bernoulli(p²) attempts until n failures', which has a negative binomial distribution:

P(S_n^Δ = k | T_i^Δ > 0, i = 1, ..., n) = (k−1 choose n−1) p^{2(k−n)} (1 − p²)^n.

Thus (2.15) follows.

For k ∈ N, define

N(k) = sup{i ∈ N : S_i^Δ ≤ k}.

Then

N(k) ≥ n ⟺ S_n^Δ ≤ k.   (2.16)

As is true for the Bernoulli process, our process N = {N(k)}_{k∈N} has independent increments, due to the T_i^Δ being 'memoryless'.
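Before moving on, formula (2.15) can be verified numerically against a direct convolution of the pmf of T_i^Δ. In the Python check below, p = 0.6 is an arbitrary illustrative value (corresponding to ξ = −2 log 0.6).

```python
import math

p = 0.6  # p = e^{-xi/2} for some xi > 0; illustrative value

def prop_2_4(j, k):
    """P(S_j^D = k) from (2.15); k = 0 gives (1 - p)^j."""
    if k == 0:
        return (1 - p) ** j
    total = 0.0
    for n in range(1, min(j, k) + 1):
        total += (math.comb(j, n) * p ** n * (1 - p) ** (j - n)
                  * math.comb(k - 1, n - 1)
                  * p ** (2 * (k - n)) * (1 - p * p) ** n)
    return total

def by_convolution(j, kmax):
    """Convolve the pmf of T_i^D with itself j times as an independent check."""
    t = [1 - p] + [p ** (2 * k) * (1 / p - p) for k in range(1, kmax + 1)]
    dist = [1.0] + [0.0] * kmax          # point mass at 0
    for _ in range(j):
        dist = [sum(dist[i] * t[k - i] for i in range(k + 1))
                for k in range(kmax + 1)]
    return dist

conv = by_convolution(4, 30)
err = max(abs(prop_2_4(4, k) - conv[k]) for k in range(31))
print(err)  # ≈ 0.0
```

For a quick hand check, j = k = 1 in (2.15) gives p(1 − p²), which indeed equals P(T_1^Δ = 1) = p²(p^{-1} − p).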

Proposition 2.5. The process N has stationary and independent increments.

Proof. We will show that, given $i, n \in \mathbb{N}$, $k \in \mathbb{N}_0$ and $n(1) \le \dots \le n(k)$, the probability
\[
P(N(k+i) - N(k) \ge n \mid N(j) = n(j),\ 1 \le j \le k)
\]
depends neither on $k$, nor on any of the $n(j)$. We start by writing this probability as
\[
P(N(k+i) \ge n(k) + n \mid N(j) = n(j),\ 1 \le j \le k),
\]
which, due to (2.16), equals
\[
P(S^\Delta_{n(k)+n} \le k+i \mid N(j) = n(j),\ 1 \le j \le k).
\]
We condition on the value of $S^\Delta_{n(k)+1}$, which, due to $N(k) = n(k)$, necessarily exceeds $k$, and also satisfies $S^\Delta_{n(k)+1} \le S^\Delta_{n(k)+n} \le k+i$:
\[
P(S^\Delta_{n(k)+n} \le k+i \mid N(j) = n(j),\ 1 \le j \le k)
= \sum_{s=k+1}^{k+i} \Big[ P\big(S^\Delta_{n(k)+n} \le k+i \,\big|\, S^\Delta_{n(k)+1} = s,\ N(j) = n(j),\ 1 \le j \le k\big) \tag{2.17}
\]
\[
\qquad \cdot\, P\big(S^\Delta_{n(k)+1} = s \,\big|\, N(j) = n(j),\ 1 \le j \le k\big) \Big]. \tag{2.18}
\]
We now determine the probabilities on the last two lines separately. To do so, note that the conditioning event
\[
\{N(j) = n(j),\ 1 \le j \le k\}
\]
dictates the values of $S^\Delta_1, \dots, S^\Delta_{n(k)}$: by (2.16), the value of $S^\Delta_i$ is the unique $s$ that satisfies $N(s-1) < i \le N(s)$. Moreover, $N(k) = n(k)$ implies $S^\Delta_{n(k)+1} > k$. Conversely, given the values of $S^\Delta_1, \dots, S^\Delta_{n(k)}$ and the fact that $S^\Delta_{n(k)+1} > k$, it follows that $N(k) = n(k)$ and the other $N(j)$ are also determined. The first term (2.17) is
\[
P\big(S^\Delta_{n(k)+n} \le k+i \,\big|\, S^\Delta_{n(k)+1} = s,\ N(j) = n(j),\ 1 \le j \le k\big)
= P\big(S^\Delta_{n(k)+n} - S^\Delta_{n(k)+1} \le k+i-s \,\big|\, S^\Delta_{n(k)+1} = s,\ N(j) = n(j),\ 1 \le j \le k\big)
= P\Big(\sum_{j=n(k)+2}^{n(k)+n} T_j^\Delta \le k+i-s \,\Big|\, S^\Delta_{n(k)+1} = s,\ N(j) = n(j),\ 1 \le j \le k\Big).
\]

As the conditioning event depends only on the values of $T_j^\Delta$ for $j \le n(k)+1$, and the conditioned event is a statement about $T_j^\Delta$ for $j \ge n(k)+2$ (and these are independent), this term equals
\[
P\Big(\sum_{j=n(k)+2}^{n(k)+n} T_j^\Delta \le k+i-s\Big) = P(S^\Delta_{n-1} \le k+i-s).
\]
By the above discussion, the probability in (2.18) can also be written as
\[
P(S^\Delta_{n(k)+1} = s \mid S^\Delta_{n(k)+1} > k,\ S^\Delta_i = s_i,\ 1 \le i \le n(k)),
\]
where the $s_i$ are given in terms of $n(1), \dots, n(k)$. Since $s > k$, this probability equals
\[
\frac{P(S^\Delta_{n(k)+1} = s,\ S^\Delta_i = s_i,\ 1 \le i \le n(k))}{P(S^\Delta_{n(k)+1} > k,\ S^\Delta_i = s_i,\ 1 \le i \le n(k))}
= \frac{P(T^\Delta_{n(k)+1} = s - s_{n(k)},\ T^\Delta_{n(k)} = s_{n(k)} - s_{n(k)-1},\ \dots,\ T^\Delta_1 = s_1)}{P(T^\Delta_{n(k)+1} > k - s_{n(k)},\ T^\Delta_{n(k)} = s_{n(k)} - s_{n(k)-1},\ \dots,\ T^\Delta_1 = s_1)}
= \frac{P(T^\Delta_{n(k)+1} = s - s_{n(k)})}{P(T^\Delta_{n(k)+1} > k - s_{n(k)})}
= \frac{p^{2(s - s_{n(k)})}(p^{-1} - p)}{p^{2(k - s_{n(k)})+1}}
= p^{2(s-k)}(p^{-2} - 1),
\]
where we used independence of the $T_i^\Delta$ and
\[
P(T^\Delta_{n(k)+1} > u) = \sum_{v=u+1}^{\infty} (p^{-1} - p) p^{2v} = \frac{(p^{-1} - p)\, p^{2(u+1)}}{1 - p^2} = p^{2u+1}.
\]


Replacing the identities in (2.17) and (2.18), we see that
\[
P(N(k+i) - N(k) \ge n \mid N(j) = n(j),\ 1 \le j \le k) = P(S^\Delta_{n(k)+n} \le k+i \mid N(j) = n(j),\ 1 \le j \le k)
= \sum_{s=k+1}^{k+i} P(S^\Delta_{n-1} \le k+i-s)\, p^{2(s-k)}(p^{-2} - 1)
= \sum_{s=1}^{i} P(S^\Delta_{n-1} \le i-s)\, p^{2s}(p^{-2} - 1),
\]
which is independent of $k$ and $n(1), \dots, n(k)$, thus proving that $N$ has independent and stationary increments.

Using the distribution of $S_j^\Delta$, the distribution of the increments of $N$ can be made more explicit. We will only need this for one-step increments. It readily follows from the previous display that
\[
P(N(k+1) - N(k) \ge n) = P(S^\Delta_{n-1} = 0)(1 - p^2) = (1-p)^{n-1}(1 - p^2),
\]
and thus
\[
P(N(k+1) - N(k) = n) = P(N(k+1) - N(k) \ge n) - P(N(k+1) - N(k) \ge n+1) = p(1 - p^2)(1-p)^{n-1}, \tag{2.19}
\]
and, lastly,
\[
P(N(k+1) - N(k) = 0) = 1 - P(N(k+1) - N(k) \ge 1) = 1 - (1 - p^2) = p^2.
\]

Now we know everything that is needed to conclude that (2.10) holds. The proof is given in the next section.

2.3 Probes see time averages

Recall our definition from the previous section,
\[
N(k) = \sup\{i \in \mathbb{N} : S_i^\Delta \le k\}.
\]
Since $N(k) - N(k-1)$ counts the number of $S_i^\Delta$ that equal $k$, it follows that
\[
\sum_{i=1}^{N(k)} f(V_i^\Delta) = \sum_{i=1}^{k} f(V(i))\,(N(i) - N(i-1)). \tag{2.20}
\]

We analyse the term on the right. The Elementary Renewal Theorem shows that
\[
\lim_{k \to \infty} \frac{N(k)}{k} = \frac{1}{E T_1^\Delta} = \frac{1 - p^2}{p} =: d \quad \text{a.s.}, \tag{2.21}
\]
where $p = 1 - P(T_1^\Delta = 0) = e^{-\xi\Delta/2}$. Alternatively, this can be proved by noting that
\[
\frac{N(k)}{k} = k^{-1} N(0) + k^{-1} \sum_{i=1}^{k} \big( N(i) - N(i-1) \big),
\]
where $k^{-1} N(0) \to 0$ a.s., and the Strong Law of Large Numbers can be applied to the other term, since the $N(i) - N(i-1)$ are i.i.d. with mean
\[
E[N(k) - N(k-1)] = \sum_{n=1}^{\infty} n\, p(1 - p^2)(1-p)^{n-1} = p(1 - p^2)\frac{1}{p^2} = \frac{1 - p^2}{p} = d, \qquad k \in \mathbb{N}.
\]

We use an argument in the spirit of [13]. Define a filtration $\mathcal{F} := \{\mathcal{F}_k\}_{k \in \mathbb{N}}$ by letting each $\mathcal{F}_k$ be the smallest $\sigma$-algebra making the random variables
\[
V(0), \dots, V(k+1), N(1), \dots, N(k)
\]
measurable. Then, since the increments of $N = \{N(k)\}_{k \in \mathbb{N}}$ are independent, and moreover, $V$ and $N$ are independent,
\[
E[N(k+1) - N(k) \mid \mathcal{F}_k] = E[N(k+1) - N(k)] = d.
\]
This implies that $M(k) \equiv N(k) - dk$ defines an $\mathcal{F}$-martingale. Since $V$ is $\mathcal{F}$-predictable (i.e. every $V(k+1)$ is $\mathcal{F}_k$-measurable), it follows that
\[
W(k) \equiv \sum_{i=1}^{k} f(V(i))\,(M(i) - M(i-1))
\]
also defines an $\mathcal{F}$-martingale. Moreover,
\[
E W(k)^2 \le E \sum_{i=1}^{k} (M(i) - M(i-1))^2 = \sum_{i=1}^{k} E(M(i) - M(i-1))^2 = k\, E(M(1) - M(0))^2,
\]
and $E(M(1) - M(0))^2$ is a finite constant (as can be verified with help of the distribution of $N(1) - N(0)$; see (2.19)), so by a martingale LLN,
\[
W(k)/k \to 0 \quad \text{a.s.} \tag{2.22}
\]
Now note that (recalling (2.20))
\[
\frac{1}{N(k)} \sum_{i=1}^{N(k)} f(V_i^\Delta) = \frac{k}{N(k)} \cdot \frac{1}{k} \sum_{i=1}^{k} f(V(i))\,[N(i) - N(i-1)] = \frac{k}{N(k)} \Big[ \frac{W(k)}{k} + \frac{d}{k} \sum_{i=1}^{k} f(V(i)) \Big].
\]
So, by (2.21) and (2.22),
\[
\lim_{k \to \infty} \frac{1}{N(k)} \sum_{i=1}^{N(k)} f(V_i^\Delta) = \frac{1}{d} \Big[ 0 + d \lim_{k \to \infty} \frac{1}{k} \sum_{i=1}^{k} f(V(i)) \Big] = \lim_{k \to \infty} \frac{1}{k} \sum_{i=1}^{k} f(V(i)) \quad \text{a.s.}
\]
Since $N(k)/k \to d > 0$ a.s., it follows that $N(k)$ a.s. increases to infinity, so (2.10) follows. Under the proviso that the statement of the ergodic theorem (Proposition 2.3) is correct, this yields the statement of Lemma 2.2.


3 Accuracy of the estimation equation

The estimator $\hat{\varphi}_n$ that we proposed was (almost) shown to be consistent in the previous chapter. This is already informative. Another way to see how 'good' the estimator is would be to compute its expectation. Since $\hat{\varphi}_n$ is a complicated function of the observations $V_i^\Delta$, and very little is known explicitly about the distribution of $V(t)$, we do not expect this to be possible. But note that, just as the estimator $\hat{\varphi}_n$ in (2.4) is derived from (2.3), we can interpret our estimator $\hat{\varphi}_n$ defined in (2.5) as implicitly derived from the approximation
\[
E[e^{-\alpha V_i^\Delta} \mid V_{i-1}^\Delta] \approx \frac{\xi}{\xi - \varphi(\alpha)} e^{-\alpha V_{i-1}^\Delta} - \frac{\alpha}{\xi - \varphi(\alpha)} P(V_i^\Delta = 0 \mid V_{i-1}^\Delta). \tag{3.1}
\]
The goal in this chapter is to assess the accuracy of this approximation as $\Delta \downarrow 0$. The idea behind this is that an explicit expression for the error in terms of $\Delta$ may help improve the approximation procedure: from a more accurate moment equation, we might be able to derive a more accurate method of moments estimator. In a way, this chapter fails to achieve this: all the errors that we compute will turn out to be zero. Fortunately, as we will argue, this does not mean that the results are uninformative. Moreover, in the discussion in Chapter 5, we will propose a way to improve the results below.

Note that the approximation is two-fold: in more detail, we used
\[
E[e^{-\alpha V_i^\Delta} \mid V_{i-1}^\Delta = x] = E_x e^{-\alpha V_1^\Delta} = E_x e^{-\alpha V(\Delta \lfloor T/\Delta \rceil)} \approx E_x e^{-\alpha V(T)} \tag{3.2}
\]
\[
= \frac{\xi}{\xi - \varphi(\alpha)} e^{-\alpha x} - \frac{\alpha}{\xi - \varphi(\alpha)} P_x(V(T) = 0) \approx \frac{\xi}{\xi - \varphi(\alpha)} e^{-\alpha x} - \frac{\alpha}{\xi - \varphi(\alpha)} P_x(V(\Delta \lfloor T/\Delta \rceil) = 0) \tag{3.3}
\]
\[
= \frac{\xi}{\xi - \varphi(\alpha)} e^{-\alpha x} - \frac{\alpha}{\xi - \varphi(\alpha)} P(V_i^\Delta = 0 \mid V_{i-1}^\Delta = x),
\]
where $T$ is a generic random variable with the Exponential($\xi$) distribution. There are two sections, each of which is devoted to one of the approximations.

3.1 Error in the first approximation

In the current section, our goal is to assess the first approximation (3.2): we determine the difference of
\[
E_x e^{-\alpha V_1^\Delta} = E_x e^{-\alpha V(\Delta \lfloor T/\Delta \rceil)} = \int_0^\infty \xi e^{-\xi t}\, E_x e^{-\alpha V(\Delta \lfloor t/\Delta \rceil)}\, dt \tag{3.4}
\]
and
\[
E_x e^{-\alpha V_1} = E_x e^{-\alpha V(T)} = \int_0^\infty \xi e^{-\xi t}\, E_x e^{-\alpha V(t)}\, dt \tag{3.5}
\]
as $\Delta \downarrow 0$. From our endeavours in this section, it will readily follow that $h_x : t \mapsto E_x e^{-\alpha V(t)}$ is continuous almost everywhere on $[0, \infty)$, so that the Dominated Convergence Theorem implies that the term in (3.4) tends to the term in (3.5) as $\Delta$ decreases to 0. In fact, the following stronger result holds.

Theorem 3.1.
\[
\lim_{\Delta \downarrow 0} \frac{E_x e^{-\alpha V_1^\Delta} - E_x e^{-\alpha V_1}}{\Delta} = 0. \tag{3.6}
\]

A comment on this result. Had we found a non-zero number $c \in \mathbb{R}$ as limit, then we could have improved (3.1) and derived a new estimator, since we would have
\[
E_x e^{-\alpha V_1^\Delta} = E_x e^{-\alpha V_1} + c\Delta + o(\Delta).
\]
But Theorem 3.1 says that the linear coefficient $c$ is zero, and thus the approximation (3.2) is already quite good, at least up to a term linear in $\Delta$. Alternatively, we will say that the first-order error is zero.

Here is an outline of the proof of Theorem 3.1. We start by showing that
\[
h_x : t \mapsto E_x e^{-\alpha V(t)}
\]
is differentiable almost everywhere on $[0, \infty)$, so that
\[
h_x(\Delta \lfloor t/\Delta \rceil) = h_x(t) + h_x'(t)(\Delta \lfloor t/\Delta \rceil - t) + o(\Delta \lfloor t/\Delta \rceil - t) \quad \text{a.e.} \tag{3.7}
\]
Subsequently, we will show that
\[
\int_0^\infty \xi e^{-\xi t} h_x(\Delta \lfloor t/\Delta \rceil)\, dt = \int_0^\infty \xi e^{-\xi t} \big[ h_x(t) + h_x'(t)(\Delta \lfloor t/\Delta \rceil - t) \big]\, dt + o(\Delta), \tag{3.8}
\]
which is the same as
\[
E_x e^{-\alpha V_1^\Delta} = E_x e^{-\alpha V_1} + \int_0^\infty \xi e^{-\xi t} h_x'(t)(\Delta \lfloor t/\Delta \rceil - t)\, dt + o(\Delta). \tag{3.9}
\]
The main part of the proof is to show that the integral is $o(\Delta)$, after which (3.6) follows. This is done by applying the following result.

Lemma 3.2. Let $u \ge 0$, and suppose that $f : [u, \infty) \to [0, \infty)$ is bounded and uniformly continuous. Then
\[
\lim_{\Delta \downarrow 0} \int_u^\infty \xi e^{-\xi t} f(t) \big( \lfloor t/\Delta \rceil - t/\Delta \big)\, dt = 0.
\]
We will now justify all of the aforementioned facts and then we apply Lemma 3.2. The proof of Lemma 3.2 is given at the end.
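Lemma 3.2 can also be illustrated numerically (a sketch, not part of the proof; the choice $f(t) = 1/(1+t)$, the horizon `t_max` and the step counts are arbitrary): for a bounded, uniformly continuous $f$, the oscillating integral shrinks as $\Delta$ decreases.

```python
import math

xi = 1.0

def integral(delta, t_max=40.0, steps_per_delta=400):
    # Midpoint-rule approximation of the integral in Lemma 3.2.
    h = delta / steps_per_delta
    total, t = 0.0, h / 2
    while t < t_max:
        rounded = math.floor(t / delta + 0.5)  # round t/delta to the nearest integer
        total += xi * math.exp(-xi * t) * (1 / (1 + t)) * (rounded - t / delta) * h
        t += h
    return total

vals = [abs(integral(d)) for d in (0.5, 0.25, 0.125, 0.0625)]
```

The cancellation within each grid cell makes the integral of order $\Delta$, much smaller than the crude bound (3.19) suggests.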

Proof of Theorem 3.1. Firstly, we show that $h_x$ is differentiable so (3.7) holds. It follows from Equation (35) on page 117 of [16] that
\[
h_x(t) = E_x e^{-\alpha V(t)} = e^{\varphi(\alpha) t} \Big( e^{-\alpha x} - \alpha \int_0^t e^{-\varphi(\alpha) s} P_x(V(s) = 0)\, ds \Big).
\]
By the Lebesgue Differentiation Theorem, the integral is differentiable a.e. with derivative
\[
\frac{d}{dt} \int_0^t e^{-\varphi(\alpha) s} P_x(V(s) = 0)\, ds = e^{-\varphi(\alpha) t} P_x(V(t) = 0) \quad \text{a.e.}
\]
Consequently, $h_x$ is differentiable (a.e.) with derivative
\[
h_x'(t) = \varphi(\alpha) e^{\varphi(\alpha) t} \Big( e^{-\alpha x} - \alpha \int_0^t e^{-\varphi(\alpha) s} P_x(V(s) = 0)\, ds \Big) - \alpha e^{\varphi(\alpha) t} e^{-\varphi(\alpha) t} P_x(V(t) = 0) = \varphi(\alpha) h_x(t) - \alpha P_x(V(t) = 0) \quad \text{a.e.}
\]
Writing $p_x(t) = P_x(V(t) = 0)$, this simply reads
\[
h_x' = \varphi(\alpha) h_x - \alpha p_x. \tag{3.10}
\]
Henceforth, we shall, as in the above display, omit the statement 'almost everywhere'. The fact that some equations do not hold 'everywhere' does not compromise the results, and the precise reader is welcome to mentally add 'a.e.' wherever they see fit.

To establish (3.8), we prove the equivalent statement
\[
\lim_{\Delta \downarrow 0} \frac{1}{\Delta} \int_0^\infty \xi e^{-\xi t} \big[ h_x(\Delta \lfloor t/\Delta \rceil) - h_x(t) - h_x'(t)(\Delta \lfloor t/\Delta \rceil - t) \big]\, dt = 0. \tag{3.11}
\]
This is done with help of the Dominated Convergence Theorem. Note that $|\Delta \lfloor t/\Delta \rceil - t| \le \Delta/2$, whence
\[
\frac{\big| h_x(\Delta \lfloor t/\Delta \rceil) - h_x(t) - h_x'(t)(\Delta \lfloor t/\Delta \rceil - t) \big|}{\Delta} \le \frac{\big| h_x(\Delta \lfloor t/\Delta \rceil) - h_x(t) - h_x'(t)(\Delta \lfloor t/\Delta \rceil - t) \big|}{2\, |\Delta \lfloor t/\Delta \rceil - t|} = \frac{1}{2} \left| \frac{h_x(\Delta \lfloor t/\Delta \rceil) - h_x(t)}{\Delta \lfloor t/\Delta \rceil - t} - h_x'(t) \right|,
\]
which clearly vanishes as $\Delta \downarrow 0$. So all that remains to be shown is that the limit may be taken outside the integral. The Mean Value Theorem implies
\[
\frac{| h_x(\Delta \lfloor t/\Delta \rceil) - h_x(t) |}{|\Delta \lfloor t/\Delta \rceil - t|} \le \| h_x' \|_\infty,
\]
so we also have
\[
\frac{\big| h_x(\Delta \lfloor t/\Delta \rceil) - h_x(t) - h_x'(t)(\Delta \lfloor t/\Delta \rceil - t) \big|}{\Delta} \le \frac{| h_x(\Delta \lfloor t/\Delta \rceil) - h_x(t) |}{2\, |\Delta \lfloor t/\Delta \rceil - t|} + \frac{| h_x'(t) |}{2} \le \| h_x' \|_\infty.
\]
Note that $\| h_x' \|_\infty = \sup_{t \ge 0} | h_x'(t) |$ is finite by (3.10), since both $h_x$ and $p_x$ are bounded between 0 and 1. Thus, the Dominated Convergence Theorem yields (3.11) and hence (3.8).

We apply Lemma 3.2. We do not take $f = h_x'$, but recall (3.10); we carefully explain why we are allowed to take $f = h_x$ and $f = p_x$ separately.

First, we show that $h_x$ is uniformly continuous. Note that, by construction,
\[
|V(t+s) - V(t)| \le |X(t+s) - X(t)|
\]
for $t \ge 0$ and appropriate $s$. As a consequence, for a given $\varepsilon > 0$,
\[
P(|V(t+s) - V(t)| > \varepsilon) \le P(|X(t+s) - X(t)| > \varepsilon) = P(|X(|s|)| > \varepsilon),
\]
where we used stationary increments in the equality. The term on the right vanishes as $s \to 0$, since $X(s) \to X(0) = 0$ a.s. by right-continuity of sample paths. So
\[
\lim_{s \to 0} \sup_{t \ge 0} P(|V(t+s) - V(t)| > \varepsilon) = 0. \tag{3.12}
\]
As a consequence,
\[
|h_x(t) - h_x(t+s)| \le E_x \big| e^{-\alpha V(t)} - e^{-\alpha V(t+s)} \big| \le E_x 1\{|V(t+s) - V(t)| > \varepsilon\} + E_x \Big[ 1\{|V(t+s) - V(t)| \le \varepsilon\} \cdot \big| e^{-\alpha V(t)} - e^{-\alpha V(t+s)} \big| \Big] \le P(|V(t+s) - V(t)| > \varepsilon) + \alpha \varepsilon.
\]
By (3.12), we see that
\[
\lim_{s \to 0} \sup_{t \ge 0} |h_x(t) - h_x(t+s)| \le \alpha \varepsilon.
\]
Letting $\varepsilon \downarrow 0$ yields that $h_x$ is uniformly continuous. So Lemma 3.2 gives
\[
\lim_{\Delta \downarrow 0} \int_0^\infty \xi e^{-\xi t} h_x(t) \big( \lfloor t/\Delta \rceil - t/\Delta \big)\, dt = 0.
\]

As it turns out, $p_x$ is not necessarily uniformly continuous when $x \neq 0$ (see the discussion after Proposition 3.3). But we shall now prove that $p_0$ is uniformly continuous, and afterwards we will see that this implies
\[
\lim_{\Delta \downarrow 0} \int_0^\infty \xi e^{-\xi t} p_x(t) \big( \lfloor t/\Delta \rceil - t/\Delta \big)\, dt = 0 \tag{3.13}
\]
for all $x \ge 0$.

In case the input $J$ is a Compound Poisson process, the stronger statement that $p_0$ is Lipschitz continuous is noted in [1]. The argument for general $J$ is slightly more involved; we include it here. For $t \ge 0$, let $Z(t) = 1\{V(t) = 0\}$. Since the reflected process $V$ also has the stationary and independent increments properties, it follows that, for $0 = t_0 < t_1 < \dots < t_n$,
\[
P_0(Z(t_1) = \dots = Z(t_n) = 1) = P_0(V(t_1) = V(t_2) - V(t_1) = \dots = V(t_n) - V(t_{n-1}) = 0) = \prod_{i=1}^n P_0(V(t_i) - V(t_{i-1}) = 0) = \prod_{i=1}^n p_0(t_i - t_{i-1}).
\]
In the nomenclature of [10], this implies that $\{Z(t)\}_{t \ge 0}$ is a regenerative phenomenon, and $p_0$ is called a p-function. Then uniform continuity of $p_0$ follows by Theorem 2.3 in the same reference, provided that $p_0$ is a so-called standard p-function, meaning that it satisfies the additional requirement
\[
\lim_{t \downarrow 0} p_0(t) = \lim_{t \downarrow 0} P_0(V(t) = 0) = 1.
\]
We prove this now. Firstly, due to the assumptions on $J$, it follows by e.g. Lemma 4.11 in [11] that
\[
\lim_{s \downarrow 0} \frac{J(s)}{s} = 0 \quad \text{a.s.}
\]
This implies
\[
\lim_{n \to \infty} P\big( J(s)/s \le 1 \text{ for } 0 < s < 1/n \big) = P\Big( \bigcup_{n=1}^\infty \{ J(s)/s \le 1 \text{ for } 0 < s < 1/n \} \Big) = 1,
\]
since the sets $\{ J(s)/s \le 1 \text{ for } 0 < s < 1/n \}$ are growing with $n$. By Equation (22) in [18],
\[
p_0(t) = P_0(V(t) = 0) = P(J(s) \le s \text{ for } 0 \le s \le t).
\]
Combining the previous two displays, we see that $p_0(t) \to 1$ as $t \downarrow 0$. So $p_0$ is a standard p-function and therefore uniformly continuous.

Thus, we have (3.13) for $x = 0$. To prove it for general $x \ge 0$, we do the following. Define
\[
\tau_x := \inf\{t \ge 0 : V(t) = 0\}. \tag{3.14}
\]
Note that for $t < x$, we have
\[
V(t) \ge X(t) + x = J(t) - t + x \ge x - t > 0,
\]
so $p_x(t) = P_x(V(t) = 0) = 0$ and $P(\tau_x < t) = 0$. For $t \ge u \ge x$, the Strong Markov Property of $V$ (see [3]) implies
\[
P_x(V(t) = 0 \mid \tau_x = u) = P_x(V(t) - V(\tau_x) = 0 \mid \tau_x = u) = P_{V(\tau_x)}(V(t-u) = 0) = p_0(t-u),
\]
since $V(\tau_x) = 0$. Consequently,
\[
p_x(t) = P_x(V(t) = 0) = \int_x^t P_x(V(t) = 0 \mid \tau_x = u)\, P(\tau_x \in du) = \int_x^t p_0(t-u)\, P(\tau_x \in du), \tag{3.15}
\]
and
\[
\int_0^\infty \xi e^{-\xi t} p_x(t) \big( \lfloor t/\Delta \rceil - t/\Delta \big)\, dt = \int_x^\infty \int_x^t \xi e^{-\xi t} p_0(t-u) \big( \lfloor t/\Delta \rceil - t/\Delta \big)\, P(\tau_x \in du)\, dt.
\]
Note that for all $\Delta > 0$,
\[
\big| \xi e^{-\xi t} p_0(t-u) \big( \lfloor t/\Delta \rceil - t/\Delta \big) \big| \le \xi e^{-\xi t}. \tag{3.16}
\]
Thus, from Fubini's Theorem,
\[
\int_0^\infty \xi e^{-\xi t} p_x(t) \big( \lfloor t/\Delta \rceil - t/\Delta \big)\, dt = \int_x^\infty \int_u^\infty \xi e^{-\xi t} p_0(t-u) \big( \lfloor t/\Delta \rceil - t/\Delta \big)\, dt\, P(\tau_x \in du).
\]

Since $t \mapsto p_0(t-u)$ is uniformly continuous, the inner integral vanishes as $\Delta \downarrow 0$ by an application of Lemma 3.2. We use the Dominated Convergence Theorem to conclude that the limit may be taken outside the integral. Equation (3.16) implies that the convergence is dominated by
\[
\int_u^\infty \xi e^{-\xi t}\, dt = e^{-\xi u},
\]
which is integrable (with respect to the distribution of $\tau_x$), so indeed, it follows that
\[
\lim_{\Delta \downarrow 0} \int_0^\infty \xi e^{-\xi t} p_x(t) \big( \lfloor t/\Delta \rceil - t/\Delta \big)\, dt = 0.
\]

So, from $h_x' = \varphi(\alpha) h_x - \alpha p_x$, we see that
\[
\lim_{\Delta \downarrow 0} \int_0^\infty \xi e^{-\xi t} h_x'(t) \big( \lfloor t/\Delta \rceil - t/\Delta \big)\, dt = 0,
\]
or, equivalently,
\[
\int_0^\infty \xi e^{-\xi t} h_x'(t) (\Delta \lfloor t/\Delta \rceil - t)\, dt = o(\Delta),
\]
and combined with (3.9), we finally obtain
\[
E_x e^{-\alpha V(\Delta \lfloor T/\Delta \rceil)} = E_x e^{-\alpha V(T)} + o(\Delta).
\]
This is equivalent to the statement of Theorem 3.1.

Proof of Lemma 3.2. Fix $u \ge 0$ and let $f : [u, \infty) \to [0, \infty)$ be uniformly continuous and bounded. We split up the integral over a partition
\[
I_k = [k\Delta, (k+1)\Delta), \qquad k \in \mathbb{N}_0,
\]
so that
\[
\int_u^\infty e^{-\xi t} f(t) \big( \lfloor t/\Delta \rceil - t/\Delta \big)\, dt = \sum_{k=0}^\infty \int_{I_k} e^{-\xi t} f(t) \big( \lfloor t/\Delta \rceil - t/\Delta \big)\, dt.
\]
We think of $f$ as being zero on $[0, u)$. The value of the series is unambiguous since it converges absolutely, as can be seen from
\[
\Big| \int_{I_k} e^{-\xi t} f(t) \big( \lfloor t/\Delta \rceil - t/\Delta \big)\, dt \Big| \le \| f \|_\infty \Delta e^{-\xi k \Delta}.
\]
Define
\[
s_k = \sup_{t \in I_k} e^{-\xi t} f(t), \qquad i_k = \inf_{t \in I_k} e^{-\xi t} f(t).
\]
We use this to bound
\[
\int_{I_k} e^{-\xi t} f(t) \big( \lfloor t/\Delta \rceil - t/\Delta \big)\, dt = \int_{k\Delta}^{(k+1/2)\Delta} e^{-\xi t} f(t) (k - t/\Delta)\, dt + \int_{(k+1/2)\Delta}^{(k+1)\Delta} e^{-\xi t} f(t) (k + 1 - t/\Delta)\, dt \le i_k \int_{k\Delta}^{(k+1/2)\Delta} (k - t/\Delta)\, dt + s_k \int_{(k+1/2)\Delta}^{(k+1)\Delta} (k + 1 - t/\Delta)\, dt = \frac{\Delta}{8} (s_k - i_k).
\]
Similarly,
\[
\int_{I_k} e^{-\xi t} f(t) \big( \lfloor t/\Delta \rceil - t/\Delta \big)\, dt \ge \frac{\Delta}{8} (i_k - s_k),
\]
so we conclude that
\[
\Big| \int_{I_k} e^{-\xi t} f(t) \big( \lfloor t/\Delta \rceil - t/\Delta \big)\, dt \Big| \le \frac{\Delta}{8} (s_k - i_k) \le \Delta (s_k - i_k),
\]
hence
\[
\Big| \int_u^\infty e^{-\xi t} f(t) \big( \lfloor t/\Delta \rceil - t/\Delta \big)\, dt \Big| \tag{3.17}
\]
\[
\le \sum_{k=0}^\infty \Big| \int_{I_k} e^{-\xi t} f(t) \big( \lfloor t/\Delta \rceil - t/\Delta \big)\, dt \Big| \le \Delta \sum_{k=0}^\infty (s_k - i_k). \tag{3.18}
\]
We show that the upper bound in (3.18) vanishes as $\Delta \downarrow 0$. To this end, note that $s_k - i_k = 0$ if $k < u/\Delta - 1$, and $s_k - i_k = s_k \le \| f \|_\infty$ for the one $k^\star = k^\star(u, \Delta)$ satisfying $u/\Delta - 1 \le k^\star < u/\Delta$, so that for all $k$ not equal to $k^\star$ we have
\[
s_k - i_k = \sup_{t \in I_k} e^{-\xi t} f(t) - \inf_{t \in I_k} e^{-\xi t} f(t) \le e^{-\xi k \Delta} \sup_{t \in I_k} f(t) - e^{-\xi (k+1) \Delta} \inf_{t \in I_k} f(t) = \big( e^{-\xi k \Delta} - e^{-\xi (k+1) \Delta} \big) \sup_{t \in I_k} f(t) + e^{-\xi (k+1) \Delta} \Big( \sup_{t \in I_k} f(t) - \inf_{t \in I_k} f(t) \Big) \le e^{-\xi k \Delta} (1 - e^{-\xi \Delta}) \| f \|_\infty + e^{-\xi k \Delta} \sup_{\substack{s, t \ge u \\ |s-t| \le \Delta}} |f(s) - f(t)|.
\]
Therefore, using the power series formula $\sum_{k=0}^\infty e^{-\xi k \Delta} = (1 - e^{-\xi \Delta})^{-1}$, we see that the term in (3.18) is bounded by (taking into account the $k^\star$-th term)
\[
\Delta \Big[ (1 - e^{-\xi \Delta}) \| f \|_\infty + \sup_{\substack{s, t \ge u \\ |s-t| \le \Delta}} |f(s) - f(t)| \Big] (1 - e^{-\xi \Delta})^{-1} + \Delta \| f \|_\infty = 2 \Delta \| f \|_\infty + \frac{\Delta}{1 - e^{-\xi \Delta}} \sup_{\substack{s, t \ge u \\ |s-t| \le \Delta}} |f(s) - f(t)|. \tag{3.19}
\]
Recognising a derivative (or by l'Hôpital's rule), we see that
\[
\lim_{\Delta \downarrow 0} \frac{\Delta}{1 - e^{-\xi \Delta}} = \frac{1}{\xi}.
\]
Moreover, the assumption that $f$ is uniformly continuous is equivalent to
\[
\lim_{\Delta \downarrow 0} \sup_{\substack{s, t \ge u \\ |s-t| \le \Delta}} |f(s) - f(t)| = 0.
\]
Hence the upper bound in (3.19), and consequently the term in (3.17), vanish as $\Delta \downarrow 0$.

3.2 Error in the second approximation

We now discuss the second approximation (3.3). Letting as before $p_x(t) = P_x(V(t) = 0)$, this boils down to a comparison of
\[
P_x(V_1^\Delta = 0) = P_x(V(\Delta \lfloor T/\Delta \rceil) = 0) = \int_0^\infty \xi e^{-\xi t} p_x(\Delta \lfloor t/\Delta \rceil)\, dt \tag{3.20}
\]
and
\[
P_x(V_1 = 0) = P_x(V(T) = 0) = \int_0^\infty \xi e^{-\xi t} p_x(t)\, dt \tag{3.21}
\]
as $\Delta \downarrow 0$.

We would like to obtain a result like Theorem 3.1, i.e. a limit as $\Delta \downarrow 0$ of
\[
\frac{P_x(V_1^\Delta = 0) - P_x(V_1 = 0)}{\Delta}. \tag{3.22}
\]
In an approach following the previous section, we would need the derivative of the map $p_x : t \mapsto P_x(V(t) = 0)$. Under the hypothesis that $p_x'$ is bounded and uniformly continuous on $[x, \infty)$, it would readily follow that the term in (3.22) vanishes: we would simply replace $h_x$ by $p_x$ in the previous section.

Unfortunately, none of the available results (known to the author) about $p_x$ seem to imply its differentiability (on its support), not even under some additional assumptions. Even worse, recall that in order to apply Lemma 3.2, the derivative should actually be uniformly continuous. That the term in (3.22) converges without differentiability of $p_x$ is unlikely, as it can be written as
\[
\frac{1}{\Delta} \int_0^\infty \xi e^{-\xi t} \big( p_x(\Delta \lfloor t/\Delta \rceil) - p_x(t) \big)\, dt.
\]
In the discussion chapter at the end of this thesis, we will consider some known results that may help establish convergence of the difference (3.22) under certain conditions. What we will do now is show the weaker result that the difference between the probabilities (3.20) and (3.21) vanishes. This is still a justification of the approximation (3.3), albeit weaker than the justification of (3.2).

Proposition 3.3.
\[
\lim_{\Delta \downarrow 0} P_x(V_1^\Delta = 0) = P_x(V_1 = 0).
\]

Proof. Note that the result is true if $x = 0$ since $p_0$ is continuous (apply the Dominated Convergence Theorem in (3.20)). So we consider the case $x > 0$. We first show that $p_x$ is right-continuous. As we will see then, this implies the assertion by a change of variable in the integral in (3.20).

Recall from (3.15) that
\[
p_x(t) = \int_x^t p_0(t-u)\, P(\tau_x \in du), \qquad t \ge x,
\]
with $\tau_x$ as in (3.14), and $p_x(t) = 0$ for $t < x$. This identity yields, for $s > t \ge x$,
\[
|p_x(s) - p_x(t)| \le \int_{(t, s]} p_0(s-u)\, P(\tau_x \in du) + \int_x^t |p_0(s-u) - p_0(t-u)|\, P(\tau_x \in du).
\]
The left integral is bounded by $P(\tau_x \in (t, s])$, which vanishes as $s \downarrow t$. For the integral on the right, let $\varepsilon > 0$. As $p_0$ is uniformly continuous, we have $|p_0(s-u) - p_0(t-u)| < \varepsilon$ for $|s-t|$ small enough, and thus
\[
\int_x^t |p_0(s-u) - p_0(t-u)|\, P(\tau_x \in du) \le \varepsilon
\]
for $|s-t|$ small enough. This shows that $p_x$ is right-continuous.

Now we apply the promised change of variable. Using the convention that $\lfloor k + 1/2 \rceil = k$ when $k$ is an integer, it follows that $\lfloor x + 1/2 \rceil = \lceil x \rceil$ for all $x \in \mathbb{R}$, where $\lceil x \rceil$ is the smallest integer greater than or equal to $x$. Letting $t = s + \Delta/2$, so that $\lfloor t/\Delta \rceil = \lfloor s/\Delta + 1/2 \rceil = \lceil s/\Delta \rceil$, we have
\[
P_x(V_1^\Delta = 0) = \int_0^\infty \xi e^{-\xi t} p_x(\Delta \lfloor t/\Delta \rceil)\, dt = \int_{-\Delta/2}^\infty \xi e^{-\xi (s + \Delta/2)} p_x(\Delta \lceil s/\Delta \rceil)\, ds = e^{-\xi \Delta/2} \int_{-\Delta/2}^\infty \xi e^{-\xi s} p_x(\Delta \lceil s/\Delta \rceil)\, ds.
\]
Note that for $s \in [-\Delta/2, 0]$ we have $\Delta \lceil s/\Delta \rceil = 0$, and $p_x(0) = 0$ since $x > 0$; hence
\[
P_x(V_1^\Delta = 0) = e^{-\xi \Delta/2} \int_0^\infty \xi e^{-\xi s} p_x(\Delta \lceil s/\Delta \rceil)\, ds.
\]
Now, as $\Delta \downarrow 0$, we have $\Delta \lceil s/\Delta \rceil \downarrow s$, so by right-continuity of $p_x$, we find $p_x(\Delta \lceil s/\Delta \rceil) \to p_x(s)$. Hence an application of the Dominated Convergence Theorem yields
\[
\lim_{\Delta \downarrow 0} P_x(V_1^\Delta = 0) = \int_0^\infty \xi e^{-\xi s} p_x(s)\, ds = P_x(V_1 = 0).
\]

The proof may seem tedious, but continuity of $p_x$ on its support $[x, \infty)$ is not guaranteed. To see this, let $J(t)$ be a Poisson process of rate $\lambda > 0$. By [18],
\[
p_x(t) = P_x(V(t) = 0) = \int_0^{t-x} (1 - u/t)\, P(J(t) \in du) = \sum_{n=0}^{\lfloor t-x \rfloor} (1 - n/t)\, P(J(t) = n). \tag{3.23}
\]
Take e.g. $x = 1$ and recall that $P(J(t) = n) = (\lambda t)^n e^{-\lambda t}/n!$, so
\[
\lim_{t \uparrow 2} p_1(t) = \lim_{t \uparrow 2} P(J(t) = 0) = e^{-2\lambda},
\]
whereas
\[
\lim_{t \downarrow 2} p_1(t) = \lim_{t \downarrow 2} \big[ P(J(t) = 0) + (1 - 1/t) P(J(t) = 1) \big] = e^{-2\lambda} + \lambda e^{-2\lambda}.
\]
More generally, if $x > 0$, then $p_x$ has a jump discontinuity at each $t \ge x$ such that $t - x$ is an integer.

4 Simulation analysis

In this chapter, we substantiate some of the theoretical results with simulation examples. We will evaluate the performance of the estimator $\hat{\varphi}_n$ for two different input processes.

For the first input process, we will assume that the time horizon $T > 0$ and the size $\Delta$ of the grid are fixed. So there is a number $M$ such that the storage process $V$ can potentially be observed at the $M + 1$ times $0, \Delta, \dots, M\Delta$. This is convenient in applications where data is already available, and also in experiment design. As a consequence, the number of probes may vary, since the probing process $S_j^\Delta = \sum_{i=1}^j T_i^\Delta$ is stochastic, i.e. the maximal $j$ such that $S_j^\Delta \le M\Delta$ varies with each realisation.

With a fixed time horizon, the number of probes will increase if the rate $\xi$ increases. But the probability that the time $T_i^\Delta$ between two consecutive probes is zero increases along with it, and using the same observation doubly may be inefficient. With the second input process, we will discuss the effect of the sizes of both $\xi$ and $\Delta$. Here, in some situations, we will argue that the comparison is more fair if instead we fix the number of probes.

[Figure 4.1: Probing part of a storage process $\{V(t)\}_{t \ge 0}$. The plot shows a sample path against the grid $0, \Delta, 2\Delta, 3\Delta, \dots$, marking the probes $S_1^\Delta, \dots, S_9^\Delta$ and the non-probed observations on the grid. In our model, only the grid observations are available.]

In Figure 4.1, we give a visual interpretation of the probing procedure. The sample path is actually an (approximate) realisation of the process described in Section 4.1. As mentioned earlier, the time $T_i^\Delta$ between probes has positive probability of being zero, and so it is possible that there are $j$ for which $S_j^\Delta = S_{j+1}^\Delta$. This can be seen in this example, where the realisation of the probing process $\{S_j^\Delta\}_{j \in \mathbb{N}}$ is $\{\Delta, \Delta, 3\Delta, 5\Delta, 6\Delta, 8\Delta, 8\Delta, \dots\}$.
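The probing mechanism can be mimicked in a few lines (a sketch under the assumption, consistent with the description above, that the gaps $T_i^\Delta$ are Exponential($\xi$) variables rounded to the nearest grid point; seed and parameters are arbitrary):

```python
import math
import random

random.seed(7)
xi, delta = 1.0, 0.5

# Exponential inter-probe times rounded to the nearest multiple of delta;
# a rounded gap of 0 produces a repeated probe time.
gaps = [delta * math.floor(random.expovariate(xi) / delta + 0.5) for _ in range(12)]

probes = []
s = 0.0
for g in gaps:
    s += g
    probes.append(s)

nondecreasing = all(a <= b for a, b in zip(probes, probes[1:]))
```
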

Recall from Section 1 that a subordinator is a Compound Poisson process if and only if its Lévy jump measure is finite. We start by considering an input process that is not Compound Poisson. In the second section, we will analyse the Compound Poisson case.

4.1 Infinite intensity case

We simulate a storage process whose input is the sum of two independent subordinators $\{J_1(t)\}$ and $\{J_2(t)\}$. This is a nice opportunity to consider and describe two commonly used subordinators. We omit some details in what comes next; all relevant background information can be found in [7].

The first process is taken to be a Gamma process, which is defined through the jump measure
\[
\nu(dx) = \frac{\gamma}{x} e^{-\beta x}\, dx, \qquad x > 0,
\]
for fixed positive numbers $\beta, \gamma$. It can be shown that
\[
E e^{-\alpha J_1(t)} = \Big( \frac{\beta}{\beta + \alpha} \Big)^{\gamma t},
\]
which is to say $J_1(t)$ has a Gamma distribution with rate $\beta$ and shape parameter $\gamma t$.

The second process we take as an Inverse Gaussian process. This is defined as follows: let $\{X(t)\}$ be a Brownian Motion with drift $d > 0$ and variance $\sigma^2 > 0$. Then
\[
J_2(t) = \inf\{s \ge 0 : X(s) = t\}
\]
is a subordinator. It is known that $J_2(t)$ has an Inverse Gaussian distribution with mean $\mu = t/d$ and shape parameter $\lambda = t^2/\sigma^2$, and the Lévy exponent is
\[
\log E e^{-\alpha J_2(1)} = \frac{d}{\sigma^2} \Big( 1 - \sqrt{1 + 2\sigma^2 \alpha / d^2} \Big) = \frac{\lambda}{\mu} \Big( 1 - \sqrt{1 + 2\mu^2 \alpha / \lambda} \Big).
\]
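The two parametrisations of the Inverse Gaussian Lévy exponent displayed above should agree when $\mu = 1/d$ and $\lambda = 1/\sigma^2$ (i.e. at $t = 1$); a quick numerical consistency check (illustrative, with arbitrary $d$ and $\sigma^2$):

```python
import math

d, sigma2 = 2.5, 0.7
mu, lam = 1 / d, 1 / sigma2  # mean and shape parameters at t = 1

def exponent_d(a):
    # (d / sigma^2) * (1 - sqrt(1 + 2*sigma^2*a/d^2))
    return (d / sigma2) * (1 - math.sqrt(1 + 2 * sigma2 * a / d**2))

def exponent_mu(a):
    # (lam / mu) * (1 - sqrt(1 + 2*mu^2*a/lam))
    return (lam / mu) * (1 - math.sqrt(1 + 2 * mu**2 * a / lam))

max_gap = max(abs(exponent_d(a) - exponent_mu(a)) for a in (0.1, 1.0, 5.0, 20.0))
```
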

We take $(\gamma, \beta, \mu, \lambda) = (2, 5, 2/5, 2)$, so the Lévy exponent function of the net input process is
\[
\varphi(\alpha) = \alpha + 2 \log \frac{5}{5 + \alpha} + 5 \Big( 1 - \sqrt{1 + 4\alpha/25} \Big).
\]
One can verify that $E J_1(1) = E J_2(1) = 2/5$, so
\[
E X(1) = E J_1(1) + E J_2(1) - 1 < 0,
\]
hence the queue is stable.
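The stability claim can be verified numerically (an illustration, not from the thesis): differentiating the Lévy exponent at 0, assembled here from the general Gamma and Inverse Gaussian exponents with the chosen parameters, recovers $E X(1) = -\varphi'(0) = -1/5$.

```python
import math

gamma_, beta = 2.0, 5.0
mu, lam = 2.0 / 5.0, 2.0

def phi(a):
    # Lévy exponent of the net input X(t) = J1(t) + J2(t) - t.
    return (
        a
        + gamma_ * math.log(beta / (beta + a))
        + (lam / mu) * (1 - math.sqrt(1 + 2 * mu**2 * a / lam))
    )

h = 1e-6
EX1 = -(phi(h) - phi(0)) / h  # E X(1) = -phi'(0), by a forward difference
stable = EX1 < 0
```
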

We simulate on a discrete grid $G := \{k\delta : k \in \mathbb{N}_0\}$ for some fixed small $\delta > 0$ (not to be confused with $\Delta$). The reflected process is simulated approximately: we estimate inf
