MSc Stochastics and Financial Mathematics
Master Thesis
Nonparametric Density Estimation for
Univariate Current Status Data
Author: Catharina Elisabeth Graafland
Supervisor: dr. A.J. van Es
Examination date:
January 26, 2017
Korteweg-de Vries Institute for Mathematics
Abstract
We construct a density estimator and an estimator for the distribution function in the univariate current status model. The estimator is based on the relation of the problem to univariate uniform deconvolution. Some asymptotic properties are derived and simulations are presented.
AMS classification: primary 62G05, 62N01, 62G07; secondary 62G20
Keywords: Univariate current status, kernel density estimation.
Title: Nonparametric Density Estimation for Univariate Current Status Data
Author: Catharina Elisabeth Graafland, lisette.graafland@student.uva.nl, 10814086
Supervisor: dr. A.J. van Es
Second Examiner: prof. dr. J.H. van Zanten
Examination date: January 26, 2017
Korteweg-de Vries Institute for Mathematics
University of Amsterdam
Science Park 105-107, 1098 XG Amsterdam
http://kdvi.uva.nl
Contents
1 Introduction
2 Construction and results
3 Detailed derivation of the results
3.1 Univariate current status data
3.2 Inversion formulas
3.3 Construction of two estimators of the density function
3.4 Expectation and variance of the left and right estimator
3.5 Combination of the left and right estimator
3.6 Expectation and variance of the combined estimator
3.7 Asymptotic behaviour of the combined estimator
3.8 Derivation of the t-optimal estimator
3.9 Discussion on the optimal value of t
3.10 Asymptotic normality of the final estimator
3.11 Consistent estimator of the distribution function
4 Simulation
4.1 Illustration of the theory
4.1.1 Ti ∼ N(0.5, 0.3) conditioned on [0, 1]
4.1.2 Ti ∼ U[0, 1]
4.1.3 Ti ∼ Beta(2, 5)
4.1.4 Influence of q on the accuracy
4.1.5 Boundary correction
4.2 Sample expectations and sample variances
4.2.1 Xi ∼ Beta(2, 3) and Ti ∼ U[0, 1]
4.2.2 Xi ∼ Beta(2, 2) and Ti ∼ N(0.5, 0.3) conditioned on [0, 1]
4.3 Optimal value of t
4.4 Unknown distribution of Ti
5 Discussion
A Remaining proofs and lemmas
A.1 Lemma A.1 and its proof
A.2 Proof of Lemma 3.9
A.3 Proof of Lemma 3.12
A.4 Lyapunov Condition
Chapter 1
Introduction
In this thesis we study the Univariate Current Status Data problem (UCSD problem). In this problem the initial data in which we are interested are not available. However, other related data, so-called current status data, are available. Deriving a density estimator for the initial data from the current status data is the aim of this thesis.
The UCSD problem, or type I interval censoring problem, can be formulated in the following way. Let X1, . . . , Xn be unobservable variables of interest. Let T1, . . . , Tn be i.i.d. random variables with known density. For each Ti it is known whether the corresponding unobservable variable Xi is smaller or larger than Ti. Stated differently, writing ∆i = I[Xi≤Ti], the current status data (T1, ∆1), . . . , (Tn, ∆n) are observed.
The interpretation of the random variables T1, . . . , Tn as random time instants, serving as observation times, explains the term current status. However, the variables T1, . . . , Tn are not restricted to be time instants.
To illustrate the practical importance we consider below three examples of UCSD problems. Given a population of children one can be interested in the age of weaning; see for instance [Diamond and McDonald (1991)] and [Grummer-Strawn (1993)]. Weaning is the process of gradually introducing an infant to its adult diet and withdrawing the supply of its mother's milk. In this problem Xi is the age of an individual child when it starts the process of weaning, Ti is the age of an individual child in this population at observation, and ∆i is defined as ∆i = I[Xi≤Ti]. Hence ∆i = 1 if the child is already weaning at observation and ∆i = 0 if it takes nothing but its mother's milk.
This problem is worth analysing by current status data methods, not only when the data on Xi are missing, but also when they are available, because of the inaccurate character of the initial data.
Another problem using UCSD, formulated by Milton Friedman in [Stat. Research Group, Columbia University (1947), Ch. 11, p. 341], is about proximity fuzes. A proximity fuze is a device that detonates a munition's explosive material automatically when the distance to a target becomes smaller than a predetermined value. One can be interested in the maximum distance over which a proximity fuze operates. Here Xi is this maximum operating distance of an individual fuze and Ti is the nearest distance the proximity fuze reaches with respect to its target. ∆i is defined as ∆i = I[Xi≤Ti]. Hence ∆i = 1 if the fuze operates at a distance greater than the nearest distance relative to its target and ∆i = 0 if the fuze operates only at a distance that is smaller than the nearest distance it has reached relative to its target.
The third and last problem is about estimating the age of incidence of a non-fatal human disease, such as Hepatitis A, from a cross-sectional sample. This problem was mentioned by N. Keiding in [Keiding (1991)]. In this problem Xi is the age of incidence of a non-fatal human disease for an individual, and Ti is the age at which a diagnostic test is carried out on the individual. Recall ∆i = I[Xi≤Ti]. Hence ∆i = 1 if the individual was already ill at the moment of the diagnostic test and ∆i = 0 if the individual gets ill later1.
The above problems are relevant in different aspects of life and one can come up with many other examples. It is therefore not surprising that research on this problem has already been done. An estimator has been found in [Witte (2011)] and [Groeneboom, Jongbloed and Witte (2010)]. In this master thesis, however, a new density estimator for the UCSD problem will be constructed. This density estimator differs from the one in [Witte (2011)] and [Groeneboom, Jongbloed and Witte (2010)] in the way it is derived, but has similar asymptotic properties. The estimator in [Witte (2011)] and [Groeneboom, Jongbloed and Witte (2010)] is based on maximum likelihood estimation. The estimator in this thesis is constructed by kernel estimation methods after establishing a connection with the uniform deconvolution model. An advantage of this derivation is the possibility to extend the theory to bi- or multivariate current status data more naturally. This is a subject for further research. Extending the maximum likelihood estimator to the bivariate current status model is still a tedious problem, as mentioned in [Groeneboom (2013)].
We start this thesis in Chapter 2 with a summary of the main results and theorems. As mentioned above, the way the density estimator is constructed is a result in itself, and therefore a short explanation of the steps between the results and theorems is already given there. In Chapter 3 the derivation of the estimator and the theorems are presented again. In this chapter all results, theorems and lemmas are proven and explained in more depth. For readability some lemmas are omitted there and proven in Appendix A. Chapter 4 is devoted to simulation of the constructed estimator. In this chapter some problems beyond the theory in this thesis are simulated as well. In Chapter 5 the results of Chapter 3 are discussed and compared with existing theory on the subject. In this way one gets insight into subjects of further research, and some suggestions for such research are made.
1In this problem we assume that the members of our population are all infected during their lifetime and
that this infection causes illness. Data can be drawn from a population of recent and former patients. The age at which the first diagnostic test is performed can be taken as data points for Ti.
Chapter 2
Construction and results
The uniform deconvolution problem
Our first result will be Lemma 3.1 which demonstrates the relationship between UCSD and data of the uniform deconvolution problem. A short sketch of this problem is given below. First we describe the general deconvolution problem. Next we look at the special case of uniform deconvolution.
Let Y1, . . . , Yn be i.i.d. observations, where Yi = Xi + Zi and Xi and Zi are independent. Assume that the unobservable Xi have distribution function F and density f. Also assume that the unobservable random variables Zi have a known density k. Note that the density g of Yi is equal to the convolution of f and k, so g = k ∗ f where ∗ denotes convolution. So we have

g(x) = ∫_{−∞}^{∞} k(x − u) f(u) du.

The general deconvolution problem is the problem of estimating f or F from the observations Y1, . . . , Yn. In the uniform deconvolution problem we require Zi to be U[0, 1) distributed. In this case k(x) = I[0,1)(x) and hence we have

g(x) = ∫_{−∞}^{∞} I[0,1)(x − u) f(u) du = F(x) − F(x − 1). (2.1)
In the next item we present a similar equality between the distribution function F of the unobservable variables X1, . . . , Xn in the UCSD problem and the density of variables
V1, . . . , Vn, obtained from the observed UCSD.
Transformation of UCSD to uniform deconvolution data (Section 3.1)
In the UCSD problem our variables of interest are X1, . . . , Xn. The exact value of Xi is not measured. Instead of X1, . . . , Xn we observe the corresponding Ti and we observe ∆i = I[Xi≤Ti]. The observations are presented as (T1, ∆1), . . . , (Tn, ∆n).
In Section 3.1 we consider the following transformation of the points Ti,

Vi = Ti + 1, if ∆i = 0,
Vi = Ti, if ∆i = 1.
The first main lemma is stated below.
Lemma 3.1. Assume that the distribution of the variables Xi is concentrated on [0, 1]. Assume that the Ti have a density q. If Ti is supported on [0, 1], then the Vi have a density g, given by

g(v) = q̄(v)(F(v) − F(v − 1)), v ∈ [0, 2], (2.2)

with the function q̄ defined by

q̄(v) = q(v) + q(v − 1). (2.3)
This lemma reveals the connection between the UCSD problem and the uniform deconvolution model described above. As can be derived from equation (2.1), the UCSD problem is exactly a uniform deconvolution problem if the variables Ti are uniformly distributed on [0, 1], since then q ≡ 1 on [0, 1]. In this case Vi is equal in distribution to Xi + Zi, with Zi ∼ U[0, 1) independent of Xi.
The result in Lemma 3.1 is important as it serves as the start of our construction of the density estimator for the UCSD problem.
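The transformation and Lemma 3.1 can be illustrated with a small simulation. All distributional choices below (Xi ∼ Beta(2, 3), Ti ∼ U[0, 1], so that q ≡ 1 and q̄ ≡ 1 on the relevant intervals) are hypothetical and only serve to make the density relation g(v) = F(v) − F(v − 1) checkable numerically; they are not prescribed by the thesis.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Hypothetical setting: X_i ~ Beta(2, 3) on [0, 1], T_i ~ U[0, 1].
x = rng.beta(2.0, 3.0, size=n)        # unobservable variables of interest
t = rng.uniform(0.0, 1.0, size=n)     # observation "times"
delta = x <= t                        # current status Delta_i = 1{X_i <= T_i}
v = np.where(delta, t, t + 1.0)       # V_i = T_i if Delta_i = 1, else T_i + 1

def F(u):
    # Beta(2, 3) distribution function: F(u) = 6u^2 - 8u^3 + 3u^4 on [0, 1]
    return 6 * u**2 - 8 * u**3 + 3 * u**4

def emp_density(v0, width=0.1):
    # empirical density of V in a small bin around v0
    return np.mean(np.abs(v - v0) < width / 2) / width

# Lemma 3.1 with q == 1: g(v) = F(v) on [0, 1] and g(v) = 1 - F(v - 1) on [1, 2].
g_left, g_theory_left = emp_density(0.5), F(0.5)            # 0.6875
g_right, g_theory_right = emp_density(1.5), 1.0 - F(0.5)    # 0.3125
```

The fraction of ones among the ∆i also matches E ∆ = ∫ F(t)q(t) dt, which is 0.6 in this hypothetical setting.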
Two inversion formulas (Section 3.2)
Several methods are known to analyse the uniform deconvolution model. Our analysis of the transformed UCSD problem is inspired by the method in [van Es (2011)]. The first step in our approach is inversion of equation (2.2). Under appropriate assumptions on the functions F and q we derive two inversion formulas for both F and f. For x ∈ [0, 1] we have

F(x) = g(x)/q(x)  and  F(x) = 1 − g(x + 1)/q(x).

Taking derivatives1 we get

f(x) = d/dx (g(x)/q(x)) = g'(x)/q(x) − q'(x)g(x)/q(x)²

and

f(x) = d/dx (1 − g(x + 1)/q(x)) = −g'(x + 1)/q(x) + q'(x)g(x + 1)/q(x)².
The inversion formulas for F are derived from equation (2.2). Hence the two inversions yield the same F and f if g is of the form (2.2). However, for an arbitrary g that is not of the form (2.2), for example an estimator of g, the inversions will in general not coincide. They need not even yield distribution functions or densities.
Construction of two estimators of the density function (Section 3.3)
From the above equations we construct our first two density estimators for f. The construction is achieved by plugging in a standard kernel density estimator g_nh for the density g. We denote the estimators by f^-_nh and f^+_nh, respectively called the left and right estimator. For x ∈ [0, 1] we have

f^-_nh(x) = g'_nh(x)/q(x) − q'(x)g_nh(x)/q(x)² (2.4)

and

f^+_nh(x) = −g'_nh(x + 1)/q(x) + q'(x)g_nh(x + 1)/q(x)², (2.5)

where for x ∈ [0, 2],

g_nh(x) = (1/n) Σ_{i=1}^n (1/h) w((x − Vi)/h),

a standard kernel estimator with kernel function w and bandwidth h > 0.

Expectation and variance of the left and right estimator (Section 3.4)
In Section 3.4 we derive expansions for the expectation and variance of the left and right estimator given in equations (2.4) and (2.5). The results, without the full assumptions, are given in the following theorems. The only assumption stated here is the standard assumption on the sequence of bandwidths h = hn, namely that h → 0 and nh³ → ∞ as n → ∞.
For the left estimator we have

Theorem 3.4.
E f^-_nh(x) = f(x) + ½ h² ∫ v²w(v) dv b^-(x) + o(h²),
Var f^-_nh(x) = (1/q(x)) (1/(nh³)) F(x) ∫ w'(u)² du + o(1/(nh³)),

where

b^-(x) = g'''(x)/q(x) − q'(x)g''(x)/q(x)².
For the right estimator we have

Theorem 3.5.
E f^+_nh(x) = f(x) + ½ h² ∫ v²w(v) dv b^+(x) + o(h²),
Var f^+_nh(x) = (1/q(x)) (1/(nh³)) (1 − F(x)) ∫ w'(u)² du + o(1/(nh³)),

where

b^+(x) = −g'''(x + 1)/q(x) + q'(x)g''(x + 1)/q(x)².
Theorem 3.4 and Theorem 3.5, together with the standard assumptions on the sequence h = hn, ensure that f^-_nh and f^+_nh are both weakly consistent estimators of f.
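The plug-in construction above can be sketched in code. The setting is hypothetical (Xi ∼ Beta(2, 3), Ti ∼ U[0, 1], so q ≡ 1 and q' ≡ 0 on [0, 1]) and the biweight kernel w(u) = (15/16)(1 − u²)² is one concrete choice satisfying Condition W; none of these choices are prescribed by the thesis.

```python
import numpy as np

rng = np.random.default_rng(1)
n, h = 500_000, 0.12

# Hypothetical data: X_i ~ Beta(2, 3), T_i ~ U[0, 1]; transformed data V_i.
x = rng.beta(2.0, 3.0, size=n)
t = rng.uniform(0.0, 1.0, size=n)
v = np.where(x <= t, t, t + 1.0)

# Biweight kernel (symmetric C^1 density on [-1, 1]) and its derivative.
w  = lambda u: np.where(np.abs(u) <= 1, 15 / 16 * (1 - u**2) ** 2, 0.0)
dw = lambda u: np.where(np.abs(u) <= 1, -15 / 4 * u * (1 - u**2), 0.0)

def g_nh(y):          # kernel estimator of the density g of V
    return np.mean(w((y - v) / h)) / h

def dg_nh(y):         # its derivative g'_nh
    return np.mean(dw((y - v) / h)) / h**2

def f_left(y, q=1.0, dq=0.0):    # (2.4), with q == 1 and q' == 0 here
    return dg_nh(y) / q - dq * g_nh(y) / q**2

def f_right(y, q=1.0, dq=0.0):   # (2.5)
    return -dg_nh(y + 1.0) / q + dq * g_nh(y + 1.0) / q**2

# Both should estimate f(0.5) = 12 * 0.5 * (1 - 0.5)^2 = 1.5 for Beta(2, 3).
est_left, est_right = f_left(0.5), f_right(0.5)
```

With this sample size and bandwidth both estimates land close to f(0.5) = 1.5, with the right estimator somewhat less variable here since 1 − F(0.5) < F(0.5), in line with Theorems 3.4 and 3.5.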
Combination of the left and right estimator; Expectation, variance and asymptotic properties (Sections 3.5, 3.6 and 3.7)
In Section 3.4 we are led to the idea of a convex linear combination of f^-_nh and f^+_nh to obtain a new estimator with lower variance than both f^-_nh and f^+_nh. The new estimator is defined as

f^t_nh(x) = t f^-_nh(x) + (1 − t) f^+_nh(x),

for some fixed 0 ≤ t ≤ 1.
The expectation and variance of f^t_nh are stated without assumptions in the following theorem. The same standard assumptions on the sequence h = hn are imposed.

Theorem 3.6.
E f^t_nh(x) = f(x) + ½ h² ∫ v²w(v) dv (t b^-(x) + (1 − t) b^+(x)) + o(h²),
Var f^t_nh(x) = (1/q(x)) (1/(nh³)) (t² F(x) + (1 − t)² (1 − F(x))) ∫ w'(u)² du + o(1/(nh³)). (2.6)
In particular it turns out that for uniformly distributed variables Ti on [0, 1], i.e. q ≡ 1 on [0, 1], we have b^-(x) = b^+(x) = f''(x). In this case the expectation of f^t_nh does not depend on t. The fact that for a general density q the bias of f^t_nh depends on t is an essential difference from the uniform deconvolution problem.
Note that Theorem 3.6 shows that an MSE optimal bandwidth h is of order n^(−1/7), since it minimizes the sum of the squared bias and the variance, which are of order h⁴ and 1/(nh³) respectively.
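This rate can be checked numerically: minimizing a proxy asymptotic MSE of the form B²h⁴ + V/(nh³) over a grid recovers a minimizer of order n^(−1/7). The constants B² and V below are placeholders; their values do not affect the rate.

```python
import numpy as np

B2, V = 2.0, 5.0   # placeholder squared-bias and variance constants

def amse(h, n):
    # proxy asymptotic MSE: squared bias of order h^4 plus variance 1/(n h^3)
    return B2 * h**4 + V / (n * h**3)

def h_opt(n):
    # crude grid minimizer of amse(., n)
    hs = np.linspace(1e-3, 1.0, 200_000)
    return hs[np.argmin(amse(hs, n))]

# Closed form: h* = (3V / (4 B2 n))^(1/7), so multiplying n by 128 = 2^7
# should halve the optimal bandwidth.
ratio = h_opt(128 * 10_000) / h_opt(10_000)
```

The grid minimizer agrees with the closed-form h* = (3V/(4B²n))^(1/7), and the ratio is 1/2 up to grid resolution, confirming the n^(−1/7) order.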
Again, the standard assumptions on the bandwidths h = hn and the expansions (2.6) ensure that f^t_nh is a weakly consistent estimator of f. The next main theorem states asymptotic normality of f^t_nh.

Theorem 3.8. As n → ∞, h → 0 and nh³ → ∞, we have

√(nh³) (f^t_nh(x) − E f^t_nh(x)) →_D N(0, σ²_t)

with

σ²_t = (1/q(x)) (t² F(x) + (1 − t)² (1 − F(x))) ∫ w'(u)² du.
Derivation of the optimal estimator with respect to t (Section 3.8)
Note that, at this moment, we have found a consistent asymptotically normal estimator of f for every t with 0 ≤ t ≤ 1. In Section 3.8 we focus on finding an optimal t for f^t_nh. We define t* ∈ [0, 1] to be the value of t that minimizes the asymptotic mean squared error of f^t_nh with respect to t. For t* we find

t* = (1 − F(x) − nh⁷ C(w) b^+(x) q̃(x) q(x)) / (1 + nh⁷ C(w) q̃²(x) q(x)) + o(1),

where

q̃(x) = q'''(x)/q(x) − q'(x)q''(x)/q(x)²

and

C(w) = (1/4) (∫ v²w(v) dv)² / ∫ w'(u)² du.

Note that for q ≡ 1 on [0, 1] we have t* = 1 − F(x) + o(1).
Discussion on the optimal value of t (Section 3.9)
In Section 3.9 we discuss the value of t*. Note that t* depends on the bandwidth h. For an MSE optimal choice of h, i.e. h ∼ cn^(−1/7), c > 0, we have

t* ∼ (1 − F(x) − C(w) b^+(x) q̃(x) q(x)) / (1 + C(w) q̃²(x) q(x)).

For a super optimal choice of h, i.e. h ≫ n^(−1/7), we have t* ∼ b^+(x)/(b^+(x) − b^-(x)). For a sub optimal choice of h, i.e. h ≪ n^(−1/7), we have t* ∼ 1 − F(x).
Asymptotic normality of the final estimator (Section 3.10)
In all three cases described above, t* and hence f^{t*}_nh depend on F. Hence f^{t*}_nh is not applicable as an estimator of f. We call f^{t*}_nh a semi-estimator. In the case of (super) optimal h the constructed f^{t*}_nh is a semi-estimator as well: it depends on (F and) the higher derivatives of g through b^+(x).
In the sequel we take t = 1 − F̂n, where F̂n is an estimator of F on [0, 1] that is consistent in MSE. We define our final estimator f_nh as f^{1−F̂n}_nh. Hence we have

f_nh(x) = (1 − F̂n(x)) f^-_nh(x) + F̂n(x) f^+_nh(x).
The main theorem of this thesis proves asymptotic normality and gives expansions of the bias and variance of the final estimator for h = O(n^(−1/7)). It also shows that the final estimator is t-optimal for sub optimal h, i.e. h ≪ n^(−1/7).

Theorem 3.11. Assume that Condition W on w in Section 3.4 is satisfied, that f is twice continuously differentiable, that q is three times continuously differentiable on a neighbourhood of x, and that F̂n(x) is an estimator of F(x) with

E (F̂n(x) − F(x))² → 0. (2.7)

Then, as n → ∞ and h = O(n^(−1/7)), we have nh³ → ∞, h → 0, and for 0 < x < 1

√(nh³) (f_nh(x) − E f_nh(x)) →_D N(0, σ²),

with

σ² = F(x)(1 − F(x)) (1/q(x)) ∫ w'(u)² du.

Furthermore, if

E (F̂n(x) − F(x))² = o(nh⁷) (2.8)

then

E f_nh(x) = f(x) + ½ h² ∫ v²w(v) dv ((1 − F(x)) b^-(x) + F(x) b^+(x)) + o(h²).

If h ≪ n^(−1/7) then f_nh(x) is t-optimal and we have

√(nh³) (f_nh(x) − f(x)) →_D N(0, σ²), as n → ∞.

By t-optimal we mean that, given the sequence of bandwidths, the choice of t is the MSE optimal one.
Consistent estimator of the distribution function F (Section 3.11)
In Section 3.11 we define the estimator F^t_nh(x) of F(x) for x ∈ [0, 1] by

F^t_nh(x) = t F^-_nh(x) + (1 − t) F^+_nh(x),

with

F^-_nh(x) = g_nh(x)/q(x)  and  F^+_nh(x) = 1 − g_nh(x + 1)/q(x),

for some fixed 0 ≤ t ≤ 1.
The expansions of the expectation and the variance of F^t_nh are stated in the following theorem. The assumptions are the same as in Theorem 3.11.

Theorem 3.13. Under the assumptions on w, f and q in Theorem 3.11 we have

E F^t_nh(x) = F(x) + (1/q(x)) ½ h² (t g''(x) − (1 − t) g''(x + 1)) ∫ v²w(v) dv + o(h²) (2.9)

and

Var F^t_nh(x) = (1/q(x)) (1/(nh)) (t² F(x) + (1 − t)² (1 − F(x))) ∫ w(v)² dv + o(1/(nh)). (2.10)

With the help of Theorem 3.13 we prove in Section 3.11 that F^t_nh is consistent in MSE for all 0 ≤ t ≤ 1. Hence the estimator F^t_nh meets the requirement of Theorem 3.11.
In the simulations in Chapter 4 we choose for F̂n, introduced in Theorem 3.11, the estimator F^t_nh with t = 1/2. In other words, we define for x ∈ [0, 1],

F̂n(x) = ½ F^-_nh(x) + ½ F^+_nh(x).
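The whole plug-in construction of this chapter can be sketched end to end: estimate F by the average of the left and right distribution estimators, then combine the left and right density estimators with t = 1 − F̂n. The distributional choices (Xi ∼ Beta(2, 3), Ti ∼ U[0, 1], so q ≡ 1 and q' ≡ 0) and the biweight kernel are hypothetical illustrations.

```python
import numpy as np

rng = np.random.default_rng(2)
n, h = 500_000, 0.12

# Hypothetical current status data, transformed to V_i.
x = rng.beta(2.0, 3.0, size=n)
t = rng.uniform(0.0, 1.0, size=n)
v = np.where(x <= t, t, t + 1.0)

# Biweight kernel and its derivative (one choice satisfying Condition W).
w  = lambda u: np.where(np.abs(u) <= 1, 15 / 16 * (1 - u**2) ** 2, 0.0)
dw = lambda u: np.where(np.abs(u) <= 1, -15 / 4 * u * (1 - u**2), 0.0)

g_nh  = lambda y: np.mean(w((y - v) / h)) / h      # kernel estimator of g
dg_nh = lambda y: np.mean(dw((y - v) / h)) / h**2  # its derivative

def final_estimator(y):
    # q == 1, q' == 0 in this hypothetical setting
    F_left, F_right = g_nh(y), 1.0 - g_nh(y + 1.0)   # inversions (3.5), (3.6)
    Fhat = 0.5 * (F_left + F_right)                  # MSE-consistent plug-in
    f_left, f_right = dg_nh(y), -dg_nh(y + 1.0)      # left and right estimators
    return Fhat, (1.0 - Fhat) * f_left + Fhat * f_right

Fhat, fhat = final_estimator(0.5)
# For Beta(2, 3): F(0.5) = 0.6875 and f(0.5) = 1.5.
```

The distribution estimator converges at the faster 1/(nh) variance rate of Theorem 3.13, so its error is negligible next to that of the density estimator, as required by condition (2.7).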
Chapter 3
Detailed derivation of the results
3.1 Univariate current status data
We consider again the univariate current status data problem, or Type I interval censoring problem. In this estimation problem one observes at i.i.d. random time instants T1, . . . , Tn whether unobservable variables of interest X1, . . . , Xn are smaller or larger than the corresponding Ti. If we write ∆i = I[Xi≤Ti], the observations are (T1, ∆1), . . . , (Tn, ∆n). The exact value of Xi is not measured. Based on these data, under some regularity conditions, one can estimate the distribution of the Xi.
Note that the probabilities for ∆i, given the value of Ti, are given by

P(∆i = 0 | Ti = ti) = 1 − F(ti),
P(∆i = 1 | Ti = ti) = F(ti). (3.1)

Consider the following transformation of the points Ti,

Vi = Ti + 1, if ∆i = 0,
Vi = Ti, if ∆i = 1. (3.2)
Lemma 3.1. Assume that the distribution of the variables Xi is concentrated on [0, 1]. Assume that the Ti have a density q. If Ti is supported on [0, 1], then the Vi have a density g, given by

g(v) = q̄(v)(F(v) − F(v − 1)), v ∈ [0, 2], (3.3)

with the function q̄ defined by

q̄(v) = q(v) + q(v − 1). (3.4)
Proof
We omit the subscript i. For v ∈ [0, 1] we have

P(V ≤ v) = Σ_{k=0}^{1} P(V ≤ v, ∆ = k) = P(V ≤ v, ∆ = 1)
  = ∫_0^1 P(V ≤ v, ∆ = 1 | T = t) q(t) dt
  = ∫_0^1 P(T ≤ v, ∆ = 1 | T = t) q(t) dt
  = ∫_0^v P(∆ = 1 | T = t) q(t) dt
  = ∫_0^v F(t) q(t) dt.

By the support restriction on the distribution induced by F and q this confirms (3.3) for the given values of v.
Let us also check the claim on the interval [1, 2]. For u ∈ [0, 1] we have

P(1 ≤ V ≤ 1 + u) = P(1 ≤ V ≤ 1 + u, ∆ = 0) + P(1 ≤ V ≤ 1 + u, ∆ = 1)
  = P(1 ≤ V ≤ 1 + u, ∆ = 0)
  = ∫_0^1 P(1 ≤ V ≤ 1 + u, ∆ = 0 | T = t) q(t) dt
  = ∫_0^1 P(T ≤ u, ∆ = 0 | T = t) q(t) dt
  = ∫_0^u P(∆ = 0 | T = t) q(t) dt
  = ∫_0^u (1 − F(t)) q(t) dt
  = ∫_1^{1+u} (1 − F(t − 1)) q(t − 1) dt.

Again by the support restriction on the distribution induced by F and q this confirms (3.3) for the values of v in [1, 2]. 2
This lemma reveals a connection between the UCSD problem and the uniform deconvolution problem. The UCSD problem is equal to the uniform deconvolution problem if the variables Ti have a uniform distribution on [0, 1], since then q ≡ 1 on [0, 1]. In this case Vi is equal in distribution to Xi + Zi, with Zi ∼ U[0, 1) independent of Xi.
3.2 Inversion formulas
We deduce the following inversion formulas from Lemma 3.1.
Lemma 3.2. If g is of the form (3.3) and q is strictly positive on [0, 1], we have for x ∈ [0, 1]

F(x) = g(x)/q(x), (3.5)
F(x) = 1 − g(x + 1)/q(x). (3.6)

Furthermore, if we assume that F and q are differentiable, we have for 0 < x < 1

f(x) = g'(x)/q(x) − q'(x)g(x)/q(x)², (3.7)
f(x) = −g'(x + 1)/q(x) + q'(x)g(x + 1)/q(x)². (3.8)

Proof
Note that v ∈ [0, 1] implies F(v − 1) = 0 and q(v − 1) = 0. Hence for v ∈ [0, 1] equation (3.3) turns into g(v) = F(v)q(v). Rewriting this gives the first inversion formula (3.5).
Also, v ∈ [1, 2] implies F(v) = 1 and q(v) = 0. Hence for v ∈ [1, 2] equation (3.3) turns into g(v) = (1 − F(v − 1))q(v − 1). Rewriting this gives F(v − 1) = 1 − g(v)/q(v − 1). Substituting x for v − 1 gives the second inversion formula (3.6) for x ∈ [0, 1].
Differentiating these formulas yields the two formulas (3.7) and (3.8). 2

From now on we assume the density q to be differentiable and strictly positive on [0, 1]. We also assume that the distribution function F has a density f.
3.3 Construction of two estimators of the density function
Our aim is to construct an ‘optimal’1 estimator for the density function f of the unobservable variables of interest X1, . . . , Xn. In this section we start with the construction of two different estimators using the inversion formulas in Lemma 3.2.
The inversion formulas in Lemma 3.2 yield equal F(x) and f(x) if g is of the form (3.3). However, for an arbitrary g that is not of the form (3.3), for example an estimator of g, the inversions will in general not coincide. They need not even yield distribution functions or densities.
We now derive two different estimators of f(x) from (3.7) and (3.8) in the following way. For g(x) we substitute the kernel density estimator with differentiable kernel function w and bandwidth h > 0. For x ∈ [0, 2] this yields

g_nh(x) = (1/n) Σ_{i=1}^n (1/h) w((x − Vi)/h). (3.9)

For g'(x) we substitute the derivative of this kernel estimator,

g'_nh(x) = (1/n) Σ_{i=1}^n (1/h²) w'((x − Vi)/h).

In this way we derive two different estimators of f(x) from (3.7) and (3.8). We call the estimators respectively the left and right estimator, written f^-_nh and f^+_nh. We have

f^-_nh(x) = g'_nh(x)/q(x) − q'(x)g_nh(x)/q(x)² (3.10)

and

f^+_nh(x) = −g'_nh(x + 1)/q(x) + q'(x)g_nh(x + 1)/q(x)². (3.11)
Note that these estimators coincide with the estimators in the uniform deconvolution model as obtained in [van Es (2011)] when the Ti are taken to be uniform, i.e. q ≡ 1 on [0, 1].
Our next goal is to show that f^-_nh(x) and f^+_nh(x) are consistent estimators of f(x). We are therefore interested in the expectation and variance of these estimators.
3.4 Expectation and variance of the left and right estimator
In this section we derive expansions of the expectation and variance of the left and right estimator. In our derivation we need expansions of the expectation and variance of the kernel density estimator g_nh as well as of its derivative.
We first impose the following condition on the kernel function w in gnh.
Condition W
The function w is a continuously differentiable symmetric probability density function with support [-1,1].
The asymptotics for g_nh and its derivative are given in the following lemma.

Lemma 3.3. Assume that the density g is bounded or integrable. Assume also that g is twice differentiable with continuous and bounded g''. If the kernel function w satisfies Condition W, then we have

E g_nh(x) = g(x) + ½ h² g''(x) ∫ v²w(v) dv + o(h²),
Var g_nh(x) = (1/(nh)) g(x) ∫ w(u)² du + o(1/(nh)). (3.12)

If furthermore g is three times differentiable with continuous and bounded g''', then we have

E g'_nh(x) = g'(x) + ½ h² g'''(x) ∫ v²w(v) dv + o(h²),
Var g'_nh(x) = (1/(nh³)) g(x) ∫ w'(u)² du + o(1/(nh³)). (3.13)
Proof
The equations in (3.12) are standard asymptotic properties of the kernel estimator and the proof of these expansions can be found in [Silverman (1986)]. The proofs of the equations (3.13) are however not commonly known. We therefore prove (3.13) below. We start with the expansion of E g'_nh(x). In this expansion Lemma A.1 is used twice. Lemma A.1 and its proof are given in full detail in Section A.1. For E g'_nh(x) we have

E g'_nh(x) = E (1/n) Σ_{i=1}^n (1/h²) w'((x − Vi)/h)
  = (1/h²) E w'((x − V1)/h)
  = (1/h²) ∫ w'((x − t)/h) g(t) dt
  = (1/h²) ( [−h w((x − t)/h) g(t)]_{−∞}^{∞} + h ∫ w((x − t)/h) g'(t) dt )
  = (1/h) ∫ w((x − t)/h) g'(t) dt
  = E_{w,g'}(x, h)   (use Lemma A.1)
  = g'(x) ∫ w(−u) du + ½ h² g'''(x) ∫ u²w(u) du + o(h²)
  = g'(x) + ½ h² g'''(x) ∫ u²w(u) du + o(h²).

In the penultimate equation we used Lemma A.1² in Section A.1. In the ultimate equation we used that ∫ w(u) du = 1 and the fact that w is symmetric.
We continue the proof with the expansion of the variance of g'_nh(x) stated in (3.13). In this expansion Lemma A.1 is used again. Our first step is to rewrite E g'_nh(x)². We have

E g'_nh(x)² = E ( (1/n) Σ_{i=1}^n (1/h²) w'((x − Vi)/h) )²
  = E ( (1/(n²h⁴)) Σ_{i=1}^n w'((x − Vi)/h)² + (1/(n²h⁴)) Σ_{i≠j} w'((x − Vi)/h) w'((x − Vj)/h) )
  = (1/(nh⁴)) E w'((x − V1)/h)² + ((n − 1)/n) ( (1/h²) E w'((x − V1)/h) )²
  = (1/(nh³)) E (1/h) w'((x − V1)/h)² + ((n − 1)/n) ( (1/h²) E w'((x − V1)/h) )².

The variance of g'_nh(x) can now be written as follows.

Var g'_nh(x) = E g'_nh(x)² − (E g'_nh(x))²
  = (1/(nh³)) E (1/h) w'((x − V1)/h)² + ((n − 1)/n) ( (1/h²) E w'((x − V1)/h) )² − ( (1/h²) E w'((x − V1)/h) )²
  = (1/(nh³)) E (1/h) w'((x − V1)/h)² − (1/n) ( (1/h²) E w'((x − V1)/h) )².

We use Lemma A.1 to expand the first term³ and we note that the second term equals (1/n)(E g'_nh(x))².
²Note that the derivative of the density g and the kernel w satisfy Conditions C and K in Section A.1. Note that s = 2 because w is a symmetric density function. These considerations justify the use of Lemma A.1.
³Note that E (1/h) w'((x − V1)/h)² = E_{w'²,g}(x, h). The density g satisfies Condition C. The function w'² satisfies Condition K for s = 2 by the following arguments. First of all, sup_{−∞<u<∞} |w'(u)²| < ∞, because w'(u) is bounded. Secondly, ∫ u w'(u)² du = 0, because w'(u) is odd by the symmetry of w(u), so u w'(u)² is odd as well. Finally, ∫ u²w'(u)² du ≠ 0, because u² and w'(u)² are even and there exists an interval on which w'(u)² is strictly positive.
Together we have

Var g'_nh(x) = (1/(nh³)) ( g(x) ∫ w'(u)² du + ½ h² g''(x) ∫ u²w'(u)² du + o(h²) )
  − (1/n) ( g'(x) + ½ h² g'''(x) ∫ v²w(v) dv + o(h²) )²
  = (1/(nh³)) g(x) ∫ w'(u)² du + o(1/(nh³)). 2
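A small Monte Carlo experiment makes the leading term of the variance expansion (3.13) visible. The setting is a hypothetical illustration (Xi ∼ Beta(2, 3), Ti ∼ U[0, 1], biweight kernel, for which ∫ w'(u)² du = 15/7 and g(0.5) = F(0.5) = 0.6875); none of it is prescribed by the thesis.

```python
import numpy as np

rng = np.random.default_rng(3)
n, h, reps, x0 = 2_000, 0.2, 600, 0.5

# Derivative of the biweight kernel w(u) = (15/16)(1 - u^2)^2.
dw = lambda u: np.where(np.abs(u) <= 1, -15 / 4 * u * (1 - u**2), 0.0)

# Replicate g'_nh(x0) on independent samples of transformed data V_i.
ests = []
for _ in range(reps):
    x = rng.beta(2.0, 3.0, size=n)
    t = rng.uniform(0.0, 1.0, size=n)
    v = np.where(x <= t, t, t + 1.0)
    ests.append(np.mean(dw((x0 - v) / h)) / h**2)

sample_var = np.var(ests, ddof=1)
theory_var = 0.6875 * (15 / 7) / (n * h**3)   # g(x0) * int w'(u)^2 du / (n h^3)
ratio = sample_var / theory_var
```

The sample variance over the replications matches g(x0) ∫ w'(u)² du / (nh³) up to the o(1/(nh³)) correction terms and Monte Carlo noise.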
The results in Lemma 3.3 enable us to calculate E f^-_nh(x) and Var f^-_nh(x). The next theorem states the expansions of the expectation and variance of f^-_nh(x).
Theorem 3.4. Assume that the kernel function w satisfies Condition W, that the density f is twice continuously differentiable, and that the density q is three times continuously differentiable. Then we have for 0 < x < 1

E f^-_nh(x) = f(x) + ½ h² ∫ v²w(v) dv b^-(x) + o(h²), (3.14)

and

Var f^-_nh(x) = (1/q(x)) (1/(nh³)) F(x) ∫ w'(u)² du + o(1/(nh³)), (3.15)

where

b^-(x) = g'''(x)/q(x) − q'(x)g''(x)/q(x)². (3.16)

Proof
The assumptions on f and q imply that a function g of the form (3.3) is three times differentiable with continuous and bounded third derivative g''' on the intervals (0, 1) and (1, 2). Hence the assumptions of Lemma 3.3 are in particular satisfied on the interval (0, 1) for g of the form (3.3). With this in mind we first prove (3.14). We have

E f^-_nh(x) = E ( g'_nh(x)/q(x) − q'(x)g_nh(x)/q(x)² )   (by (3.10))
  = (1/q(x)) E g'_nh(x) − (q'(x)/q(x)²) E g_nh(x)
  = (1/q(x)) ( g'(x) + ½ h² g'''(x) ∫ v²w(v) dv + o(h²) ) − (q'(x)/q(x)²) ( g(x) + ½ h² g''(x) ∫ v²w(v) dv + o(h²) )
  = f(x) + ½ h² ∫ v²w(v) dv ( g'''(x)/q(x) − q'(x)g''(x)/q(x)² ) + o(h²).
We continue with the expression for Var f^-_nh(x). We have

Var f^-_nh(x) = Var ( g'_nh(x)/q(x) − q'(x)g_nh(x)/q(x)² )
  = (1/q(x)²) Var g'_nh(x) + (q'(x)²/q(x)⁴) Var g_nh(x) − 2 (1/q(x)) (q'(x)/q(x)²) Cov(g'_nh(x), g_nh(x)).

We consider the second and third term in Var f^-_nh(x) separately. Notice that Var g_nh(x) ≪ Var g'_nh(x), because 1/(nh) ≪ 1/(nh³). Hence the second term is negligible. By the Cauchy-Schwarz inequality we have

Cov(g'_nh(x), g_nh(x)) ≤ √(Var g'_nh(x)) √(Var g_nh(x)) ≪ √(Var g'_nh(x)) √(Var g'_nh(x)) = Var g'_nh(x).

Hence the third term is negligible. We may therefore conclude

Var f^-_nh(x) = (1/q(x)²) Var g'_nh(x) + o(Var g'_nh(x))
  = (1/q(x)²) (1/(nh³)) g(x) ∫ w'(u)² du + o(1/(nh³))
  = (1/q(x)) (1/(nh³)) F(x) ∫ w'(u)² du + o(1/(nh³)),

where in the last step we used that g(x) = F(x)q(x) for x ∈ [0, 1]. 2
We will now focus on f^+_nh(x). The next step is to find expansions of E f^+_nh(x) and Var f^+_nh(x). The results and their derivation are similar to those for f^-_nh(x).
Theorem 3.5. Assume that the kernel function w satisfies Condition W, that the density f is twice continuously differentiable, and that the density q is three times continuously differentiable. Then we have for 0 < x < 1

E f^+_nh(x) = f(x) + ½ h² ∫ v²w(v) dv b^+(x) + o(h²), (3.17)

and

Var f^+_nh(x) = (1/q(x)) (1/(nh³)) (1 − F(x)) ∫ w'(u)² du + o(1/(nh³)), (3.18)

where

b^+(x) = −g'''(x + 1)/q(x) + q'(x)g''(x + 1)/q(x)². (3.19)

Proof
By a similar reasoning as in the proof of Theorem 3.4 the assumptions of Lemma 3.3 are now in particular satisfied on the interval (1, 2) for g of the form (3.3). With this in mind we first prove equation (3.17). We have

E f^+_nh(x) = E ( −g'_nh(x + 1)/q(x) + q'(x)g_nh(x + 1)/q(x)² )   (by (3.11))
  = −(1/q(x)) E g'_nh(x + 1) + (q'(x)/q(x)²) E g_nh(x + 1)
  = −(1/q(x)) ( g'(x + 1) + ½ h² g'''(x + 1) ∫ v²w(v) dv + o(h²) ) + (q'(x)/q(x)²) ( g(x + 1) + ½ h² g''(x + 1) ∫ v²w(v) dv + o(h²) )
  = f(x) + ½ h² ∫ v²w(v) dv ( −g'''(x + 1)/q(x) + q'(x)g''(x + 1)/q(x)² ) + o(h²).
We continue with the expression for Var f^+_nh(x). We have

Var f^+_nh(x) = Var ( −g'_nh(x + 1)/q(x) + q'(x)g_nh(x + 1)/q(x)² )
  = (1/q(x)²) Var g'_nh(x + 1) + (q'(x)²/q(x)⁴) Var g_nh(x + 1) − 2 (1/q(x)) (q'(x)/q(x)²) Cov(g'_nh(x + 1), g_nh(x + 1)).

By the same reasoning as in the proof of Theorem 3.4 we have that Var g_nh(x + 1) ≪ Var g'_nh(x + 1). This result and the Cauchy-Schwarz inequality give Cov(g'_nh(x + 1), g_nh(x + 1)) ≪ Var g'_nh(x + 1). We may therefore conclude

Var f^+_nh(x) = (1/q(x)²) Var g'_nh(x + 1) + o(Var g'_nh(x + 1))
  = (1/q(x)²) (1/(nh³)) g(x + 1) ∫ w'(u)² du + o(1/(nh³))
  = (1/q(x)) (1/(nh³)) (1 − F(x)) ∫ w'(u)² du + o(1/(nh³)),

where in the last step we used that g(x + 1) = (1 − F(x))q(x) for x ∈ [0, 1]. 2
3.5 Combination of the left and right estimator
The left estimator has a smaller variance than the right estimator for small values of x, because of the factors F(x) and 1 − F(x); for large values of x the converse is true. To obtain an estimator with smaller variance than both the left and the right estimator, it makes sense to construct a convex linear combination of the two estimators. We define the combined estimator by

f^t_nh(x) = t f^-_nh(x) + (1 − t) f^+_nh(x),

for some fixed 0 ≤ t ≤ 1.
One could wonder whether the bias of this combined estimator will not be bigger than the bias of f^-_nh(x) and/or f^+_nh(x). By linearity of the bias, the bias of the new estimator for t ∈ (0, 1) lies somewhere between the bias of f^-_nh(x) and that of f^+_nh(x); the exact value depends on t. In Section 3.8 we will find an optimal value for t, which we call t*. The value t* is optimal in the sense that it minimizes the mean squared error of f^t_nh(x) with respect to t. It is possible that f^{t*}_nh(x) has a larger bias than f^-_nh(x) or f^+_nh(x), but it certainly has a lower or equal mean squared error.
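Ignoring the bias for a moment, the variance factor t²F(x) + (1 − t)²(1 − F(x)) that will appear in Theorem 3.6 is a parabola in t, and a quick numerical argmin confirms its minimizer t = 1 − F(x), the value that reappears in Section 3.9 for sub optimal bandwidths. F0 below is a placeholder value for F(x).

```python
import numpy as np

F0 = 0.6875   # placeholder value of F(x) at the point of interest

# Variance factor of the combined estimator as a function of t on [0, 1].
ts = np.linspace(0.0, 1.0, 1_000_001)
factor = ts**2 * F0 + (1 - ts) ** 2 * (1 - F0)

t_min = ts[np.argmin(factor)]   # analytic minimizer: t = 1 - F0
```

Setting the derivative 2tF − 2(1 − t)(1 − F) to zero gives t = 1 − F directly, so the grid search is only a sanity check.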
3.6 Expectation and variance of the combined estimator
We will derive the expectation and the variance of this combined estimator as well as its asymptotic behaviour. The next theorem states the expansions of the expectation and variance of f^t_nh.
Theorem 3.6. Assume that the kernel function $w$ satisfies Condition W, that the density $f$ is twice continuously differentiable, and that the density $q$ is three times continuously differentiable. Then we have for $0 < x < 1$
\[
E f^t_{nh}(x) = f(x) + \tfrac12 h^2 \int v^2 w(v)\,dv\,\big(t\,b^-(x) + (1-t)\,b^+(x)\big) + o(h^2) \tag{3.20}
\]
and
\[
\operatorname{Var} f^t_{nh}(x) = \frac{1}{q(x)}\,\frac{1}{nh^3}\,\big(t^2 F(x) + (1-t)^2(1-F(x))\big)\int w'(u)^2\,du + o\Big(\frac{1}{nh^3}\Big). \tag{3.21}
\]
Proof
We first prove (3.20). We apply Theorem 3.4 and Theorem 3.5. We have
\[
\begin{aligned}
E f^t_{nh}(x) &= E\big(t f^-_{nh}(x) + (1-t) f^+_{nh}(x)\big) = t\,E f^-_{nh}(x) + (1-t)\,E f^+_{nh}(x)\\
&= t\Big(f(x) + \tfrac12 h^2 \int v^2 w(v)\,dv\; b^-(x) + o(h^2)\Big) + (1-t)\Big(f(x) + \tfrac12 h^2 \int v^2 w(v)\,dv\; b^+(x) + o(h^2)\Big)\\
&= f(x) + \tfrac12 h^2 \int v^2 w(v)\,dv\,\big(t\,b^-(x) + (1-t)\,b^+(x)\big) + o(h^2).
\end{aligned}
\]
In order to prove (3.21) we use the following lemma. The proof of this lemma is given at the end of this section.
Lemma 3.7. Under the assumptions of Theorem 3.6 we have
\[
\operatorname{Cov}\big(f^-_{nh}(x),\, f^+_{nh}(x)\big) = o\Big(\frac{1}{nh^3}\Big).
\]
We continue with the expansion of $\operatorname{Var} f^t_{nh}(x)$. We have
\[
\begin{aligned}
\operatorname{Var} f^t_{nh}(x) &= \operatorname{Var}\big(t f^-_{nh}(x) + (1-t) f^+_{nh}(x)\big)\\
&= t^2\operatorname{Var} f^-_{nh}(x) + (1-t)^2\operatorname{Var} f^+_{nh}(x) + 2t(1-t)\operatorname{Cov}\big(f^-_{nh}(x), f^+_{nh}(x)\big)\\
&= t^2\,\frac{1}{q(x)}\,\frac{1}{nh^3}\,F(x)\int w'(u)^2\,du + (1-t)^2\,\frac{1}{q(x)}\,\frac{1}{nh^3}\,(1-F(x))\int w'(u)^2\,du + o\Big(\frac{1}{nh^3}\Big)\\
&= \frac{1}{q(x)}\,\frac{1}{nh^3}\,\big(t^2 F(x) + (1-t)^2(1-F(x))\big)\int w'(u)^2\,du + o\Big(\frac{1}{nh^3}\Big),
\end{aligned}
\]
which shows (3.21). $\Box$
Proof of Lemma 3.7 We have
\[
\operatorname{Cov}\big(f^-_{nh}(x), f^+_{nh}(x)\big) = \operatorname{Cov}\Big(\frac{1}{q(x)}g'_{nh}(x) - \frac{q'(x)}{q^2(x)}g_{nh}(x),\; -\frac{1}{q(x)}g'_{nh}(x+1) + \frac{q'(x)}{q^2(x)}g_{nh}(x+1)\Big),
\]
which we can rewrite as the sum of the four terms
\[
\begin{aligned}
(1)\;&\operatorname{Cov}\Big(\frac{1}{q(x)}g'_{nh}(x),\, -\frac{1}{q(x)}g'_{nh}(x+1)\Big), &
(2)\;&\operatorname{Cov}\Big(\frac{1}{q(x)}g'_{nh}(x),\, \frac{q'(x)}{q^2(x)}g_{nh}(x+1)\Big),\\
(3)\;&\operatorname{Cov}\Big(-\frac{q'(x)}{q^2(x)}g_{nh}(x),\, -\frac{1}{q(x)}g'_{nh}(x+1)\Big), &
(4)\;&\operatorname{Cov}\Big(-\frac{q'(x)}{q^2(x)}g_{nh}(x),\, \frac{q'(x)}{q^2(x)}g_{nh}(x+1)\Big).
\end{aligned}
\]
We work out the first term separately and show that the second, third and fourth terms are negligible compared to the first. We have
\[
(1) = \operatorname{Cov}\Big(\frac{1}{q(x)}g'_{nh}(x),\, -\frac{1}{q(x)}g'_{nh}(x+1)\Big) = -\frac{1}{q(x)^2}\Big(E\,g'_{nh}(x)\,g'_{nh}(x+1) - E\,g'_{nh}(x)\,E\,g'_{nh}(x+1)\Big).
\]
Now note that, for $n$ large enough,
\[
\begin{aligned}
E\,g'_{nh}(x)\,g'_{nh}(x+1) &= E\,\Big(\sum_{i=1}^n \frac{1}{nh^2}\,w'\Big(\frac{x-V_i}{h}\Big)\Big)\Big(\sum_{j=1}^n \frac{1}{nh^2}\,w'\Big(\frac{x+1-V_j}{h}\Big)\Big)\\
&= E\,\Big(\sum_{i=1}^n \frac{1}{n^2h^4}\,w'\Big(\frac{x-V_i}{h}\Big)w'\Big(\frac{x+1-V_i}{h}\Big) + \sum_{i\ne j} \frac{1}{n^2h^4}\,w'\Big(\frac{x-V_i}{h}\Big)w'\Big(\frac{x+1-V_j}{h}\Big)\Big)\\
&= \sum_{i\ne j} \frac{1}{n^2h^4}\,E\,w'\Big(\frac{x-V_i}{h}\Big)\,E\,w'\Big(\frac{x+1-V_j}{h}\Big) = \frac{n(n-1)}{n^2}\,E\,g'_{nh}(x)\,E\,g'_{nh}(x+1)\\
&= \Big(1-\frac1n\Big)\,E\,g'_{nh}(x)\,E\,g'_{nh}(x+1) = E\,g'_{nh}(x)\,E\,g'_{nh}(x+1) - \frac1n\,O(1)\\
&= E\,g'_{nh}(x)\,E\,g'_{nh}(x+1) - o\Big(\frac{1}{nh^3}\Big).
\end{aligned}
\]
We have used that $w'\big(\frac{x-V_i}{h}\big)\,w'\big(\frac{x+1-V_i}{h}\big) = 0$ for all $i$ provided that $h < 1/2$, which is true for $n$ large enough. We have also used that $V_i$ and $V_j$ are independent for all $i \ne j$, and the fact that
\[
E\,g'_{nh}(x)\,E\,g'_{nh}(x+1) = g'(x)\,g'(x+1) + O(h^2) = O(1).
\]
We may now conclude
\[
\operatorname{Cov}\Big(\frac{1}{q(x)}g'_{nh}(x),\, -\frac{1}{q(x)}g'_{nh}(x+1)\Big) = -\frac{1}{q(x)^2}\Big(E\,g'_{nh}(x)\,E\,g'_{nh}(x+1) - o\Big(\frac{1}{nh^3}\Big) - E\,g'_{nh}(x)\,E\,g'_{nh}(x+1)\Big) = o\Big(\frac{1}{nh^3}\Big),
\]
and therefore $(1) = o\big(\frac{1}{nh^3}\big)$.
With the Cauchy--Schwarz inequality it is easily seen that $(2) = o(\operatorname{Var} g'_{nh}(x))$ and hence $(2) = o\big(\frac{1}{nh^3}\big)$, that $(3) = o(\operatorname{Var} g'_{nh}(x+1))$ and hence $(3) = o\big(\frac{1}{nh^3}\big)$, and that the last term satisfies $(4) = o(\operatorname{Var} g'_{nh}(x))$ and hence also $(4) = o\big(\frac{1}{nh^3}\big)$. $\Box$
Note that for uniformly distributed variables $T_i$ on $[0,1]$, i.e. $q \equiv 1$ on $[0,1]$, we have $b^-(x) = b^+(x) = f''(x)$. In this case the expectation of $f^t_{nh}$ does not depend on $t$.
3.7 Asymptotic behaviour of the combined estimator
In this section we prove asymptotic normality of the combined estimator. Recall that, for some fixed $0 \le t \le 1$, we have
\[
f^t_{nh}(x) = t\,f^-_{nh}(x) + (1-t)\,f^+_{nh}(x),
\]
and note that asymptotic normality of $f^t_{nh}(x)$ also yields asymptotic normality of our former estimators $f^-_{nh}(x)$ and $f^+_{nh}(x)$, by taking $t$ equal to one and zero respectively.
Theorem 3.8. Assume that Condition W is satisfied and that $f$ is bounded on a neighbourhood of $x$. Then, as $n \to \infty$, $h \to 0$ and $nh^3 \to \infty$, we have for $0 < x < 1$
\[
\sqrt{nh^3}\,\big(f^t_{nh}(x) - E f^t_{nh}(x)\big) \xrightarrow{\;D\;} N(0, \sigma_t^2)
\qquad\text{with}\qquad
\sigma_t^2 = \frac{1}{q(x)}\,\big(t^2 F(x) + (1-t)^2(1-F(x))\big)\int w'(u)^2\,du.
\]
Proof
Write $f^t_{nh}(x)$ for $0 < x < 1$ as follows:
\[
\begin{aligned}
f^t_{nh}(x) &= t\,f^-_{nh}(x) + (1-t)\,f^+_{nh}(x)\\
&= t\Big(\frac{1}{q(x)}g'_{nh}(x) - \frac{q'(x)}{q^2(x)}g_{nh}(x)\Big) + (1-t)\Big(-\frac{1}{q(x)}g'_{nh}(x+1) + \frac{q'(x)}{q^2(x)}g_{nh}(x+1)\Big)\\
&= t\Big(\frac{1}{q(x)}\sum_{i=1}^n \frac{1}{nh^2}\,w'\Big(\frac{x-V_i}{h}\Big) - \frac{q'(x)}{q^2(x)}\sum_{i=1}^n \frac{1}{nh}\,w\Big(\frac{x-V_i}{h}\Big)\Big)\\
&\quad + (1-t)\Big(-\frac{1}{q(x)}\sum_{i=1}^n \frac{1}{nh^2}\,w'\Big(\frac{x+1-V_i}{h}\Big) + \frac{q'(x)}{q^2(x)}\sum_{i=1}^n \frac{1}{nh}\,w\Big(\frac{x+1-V_i}{h}\Big)\Big)\\
&= \frac{1}{n}\sum_{i=1}^n U_{ih}(x)
\end{aligned}
\]
with
\[
U_{ih}(x) = t\Big(\frac{1}{q(x)}\frac{1}{h^2}\,w'\Big(\frac{x-V_i}{h}\Big) - \frac{q'(x)}{q^2(x)}\frac{1}{h}\,w\Big(\frac{x-V_i}{h}\Big)\Big) + (1-t)\Big(-\frac{1}{q(x)}\frac{1}{h^2}\,w'\Big(\frac{x+1-V_i}{h}\Big) + \frac{q'(x)}{q^2(x)}\frac{1}{h}\,w\Big(\frac{x+1-V_i}{h}\Big)\Big).
\]
Note that
\[
E f^t_{nh}(x) = E\,\frac1n\sum_{i=1}^n U_{ih}(x) = \frac1n\,E\sum_{i=1}^n U_{ih}(x) = \frac1n\,n\,E\,U_{1h}(x) = E\,U_{1h}(x) \tag{3.22}
\]
and
\[
\operatorname{Var} f^t_{nh}(x) = \operatorname{Var}\frac1n\sum_{i=1}^n U_{ih}(x) = \frac{1}{n^2}\operatorname{Var}\sum_{i=1}^n U_{ih}(x) = \frac1n\operatorname{Var} U_{1h}(x). \tag{3.23}
\]
We need the following lemma to prove that $f^t_{nh}(x) - E f^t_{nh}(x)$ $\big(= \frac1n\sum_{i=1}^n (U_{ih}(x) - E\,U_{ih}(x))\big)$ is asymptotically normally distributed. The lemma enables us to show that the Lyapunov condition in the Central Limit Theorem is satisfied. The proof of this lemma can be found in Section A.2. The corresponding expression of the Lyapunov condition as stated below is derived in Section A.4.
Lemma 3.9. Under the assumptions of Theorem 3.8, we have for $m$ even and for $h \to 0$
\[
E\,U_{1h}(x)^m = \frac{1}{q(x)^{m-1}}\,\frac{1}{h^{2m-1}}\,\big(t^m F(x) + (1-t)^m(1-F(x))\big)\int w'(u)^m\,du + o\Big(\frac{1}{h^{2m-1}}\Big). \tag{3.24}
\]
We now check the Lyapunov condition with the help of Lemma 3.9. This means that for some $\delta > 0$ we have to check
\[
\frac{E\,|U_{1h}(x) - E\,U_{1h}(x)|^{2+\delta}}{n^{\delta/2}\,\big(\operatorname{Var} U_{1h}(x)\big)^{1+\delta/2}} \to 0.
\]
We use that $E\,U_{1h}(x) = E f^t_{nh}(x) = O(1)$ and $\operatorname{Var} U_{1h}(x) = O(1/h^3)$. Furthermore we use that $(a+b)^4 \le 2^3(a^4 + b^4)$. For $\delta = 2$ we have
\[
\frac{E\,|U_{1h}(x) - E\,U_{1h}(x)|^4}{n\,(\operatorname{Var} U_{1h}(x))^2}
\le \frac{2^3\big(E\,U_{1h}(x)^4 + (E\,U_{1h}(x))^4\big)}{n\,(\operatorname{Var} U_{1h}(x))^2}
\sim \frac{8\,\frac{1}{q(x)^3}\frac{1}{h^7}\big(t^4 F(x) + (1-t)^4(1-F(x))\big)\int w'(u)^4\,du + o\big(\frac{1}{h^7}\big)}{n\big(\frac{1}{h^3}\,c_2\big)^2}
\sim \frac{8c_1}{nh\,c_2^2} \to 0,
\]
since $nh \to \infty$. $\Box$
3.8 Derivation of the t-optimal estimator
We now define the `t-optimal' estimator to be the estimator that minimizes the mean squared error of $f^t_{nh}(x)$. Recall that the mean squared error of $f^t_{nh}(x)$ can be written as
\[
\operatorname{MSE} f^t_{nh}(x) = \operatorname{bias}^2 f^t_{nh}(x) + \operatorname{Var} f^t_{nh}(x).
\]
Note that $E f^t_{nh}(x)$, and hence $\operatorname{bias} f^t_{nh}(x)$, would be independent of $t$ and equal to $E f^-_{nh}(x)$ and $E f^+_{nh}(x)$ if $g''(x) = -g''(x+1)$ (implying $g'''(x) = -g'''(x+1)$). As we will see in Lemma 3.10 this equality holds when $q = \mathbf 1_{[0,1)}$, i.e. if the $T_i$ are uniformly distributed on $[0,1]$. Minimisation of the mean squared error over $t$ reduces in this case to minimisation of the variance over $t$. In general, however, $E f^t_{nh}(x)$ depends on $t$, and so does its bias.
Minimisation of the mean squared error of $f^t_{nh}$ over $t$ is the subject of this section. Our first step is to obtain a clear expression for the bias of $f^t_{nh}$. Our second step is then to combine this expression with the variance to construct the MSE. We differentiate $\operatorname{MSE} f^t_{nh}(x)$ with respect to $t$ and finally solve the equation $\frac{d}{dt}\operatorname{MSE} f^t_{nh}(x) = 0$ for $t$.
Lemma 3.10 below gives relations between $g(x)$, $g(x+1)$ and their first, second and third derivatives. We use this lemma in our analysis of the mean squared error of $f^t_{nh}$.
Lemma 3.10. If the density $g$ is three times differentiable we have for $0 < x < 1$
\[
g(x) + g(x+1) = q(x), \qquad g'(x) + g'(x+1) = q'(x), \qquad g''(x) + g''(x+1) = q''(x),
\]
and
\[
g'''(x) + g'''(x+1) = q'''(x).
\]
Proof
Summation of the inversion formulas (3.5) and (3.6) gives the first expression. Taking derivatives gives the next three equations. $\Box$
We define the function $\tilde q(x)$ as follows:
\[
\tilde q(x) = \frac{1}{q(x)}\,q'''(x) - \frac{q'(x)}{q^2(x)}\,q''(x). \tag{3.25}
\]
By equation (3.20) we find the following expressions for the bias of $f^t_{nh}$ and its derivative in terms of $\tilde q(x)$:
\[
\operatorname{bias} f^t_{nh}(x) = E f^t_{nh}(x) - f(x) = \tfrac12 h^2 \int v^2 w(v)\,dv\,\big(t\,b^-(x) + (1-t)\,b^+(x)\big) + o(h^2)
\]
and
\[
\frac{d}{dt}\operatorname{bias} f^t_{nh}(x) = \tfrac12 h^2 \int v^2 w(v)\,dv\,\tilde q(x) + o(h^2).
\]
This leads us to an expression for $\operatorname{MSE} f^t_{nh}(x)$ and $\frac{d}{dt}\operatorname{MSE} f^t_{nh}(x)$. We have
\[
\begin{aligned}
\operatorname{MSE} f^t_{nh}(x) &= \big(\operatorname{bias} f^t_{nh}(x)\big)^2 + \operatorname{Var} f^t_{nh}(x)\\
&= \Big(\tfrac12 h^2 \int v^2 w(v)\,dv\,\big(t\,b^-(x) + (1-t)\,b^+(x)\big) + o(h^2)\Big)^2\\
&\quad + \frac{1}{q(x)}\,\frac{1}{nh^3}\,\big(t^2 F(x) + (1-t)^2(1-F(x))\big)\int w'(u)^2\,du + o\Big(\frac{1}{nh^3}\Big).
\end{aligned}
\]
Differentiation yields
\[
\begin{aligned}
\frac{d}{dt}\operatorname{MSE} f^t_{nh}(x) &= 2\,\operatorname{bias} f^t_{nh}(x)\,\frac{d}{dt}\operatorname{bias} f^t_{nh}(x) + \frac{d}{dt}\operatorname{Var} f^t_{nh}(x)\\
&= 2\Big(\tfrac12 h^2 \int v^2 w(v)\,dv\,\big(t\,b^-(x) + (1-t)\,b^+(x)\big) + o(h^2)\Big)\Big(\tfrac12 h^2 \int v^2 w(v)\,dv\,\tilde q(x) + o(h^2)\Big)\\
&\quad + \frac{1}{q(x)}\,\frac{1}{nh^3}\,\big(2F(x) + 2t - 2\big)\int w'(u)^2\,du + o\Big(\frac{1}{nh^3}\Big)\\
&= \tfrac12 h^4\Big(\int v^2 w(v)\,dv\Big)^2\big(t\,b^-(x) + (1-t)\,b^+(x)\big)\tilde q(x) + o(h^4)\\
&\quad + \frac{1}{q(x)}\,\frac{1}{nh^3}\,\big(2F(x) + 2t - 2\big)\int w'(u)^2\,du + o\Big(\frac{1}{nh^3}\Big).
\end{aligned}
\]
We find the value of $t$ that minimizes$^4$ $\operatorname{MSE} f^t_{nh}(x)$ by solving $\frac{d}{dt}\operatorname{MSE} f^t_{nh}(x) = 0$ with respect to $t$. We will call this value $t^*$. Note that $t\,b^-(x) + (1-t)\,b^+(x) = b^+(x) + t\,\tilde q(x)$.

$^4$One can check that the second derivative of $\operatorname{MSE} f^t_{nh}(x)$ with respect to $t$ is positive, so $t^*$ is indeed a minimum.

We have
\[
\begin{aligned}
t^* &= -\,\frac{\frac{1}{q(x)}\frac{1}{nh^3}\,(2F(x)-2)\int w'(u)^2\,du + \tfrac12 h^4\big(\int v^2 w(v)\,dv\big)^2\,b^+(x)\,\tilde q(x) + o(h^4) + o\big(\frac{1}{nh^3}\big)}{2\,\frac{1}{q(x)}\frac{1}{nh^3}\int w'(u)^2\,du + \tfrac12 h^4\big(\int v^2 w(v)\,dv\big)^2\,\tilde q^2(x)}\\[4pt]
&= \frac{2\,\frac{1}{q(x)}\frac{1}{nh^3}\,(1-F(x))\int w'(u)^2\,du - \tfrac12 h^4\big(\int v^2 w(v)\,dv\big)^2\,b^+(x)\,\tilde q(x)}{2\,\frac{1}{q(x)}\frac{1}{nh^3}\int w'(u)^2\,du + \tfrac12 h^4\big(\int v^2 w(v)\,dv\big)^2\,\tilde q^2(x)} + o(1)\\[4pt]
&= \frac{(1-F(x)) - nh^7\,C(w)\,b^+(x)\,\tilde q(x)\,q(x)}{1 + nh^7\,C(w)\,\tilde q^2(x)\,q(x)} + o(1), \tag{3.26}
\end{aligned}
\]
where
\[
C(w) = \frac{\tfrac14\big(\int v^2 w(v)\,dv\big)^2}{\int w'(u)^2\,du}. \tag{3.27}
\]
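As an illustration, the constant $C(w)$ in (3.27) can be evaluated exactly for the biweight kernel used later in the simulations (4.1). The sketch below integrates the relevant polynomials by hand with exact fractions; the variable names are ours.

```python
from fractions import Fraction as Fr

# For the biweight kernel w(v) = (15/16)(1 - v^2)^2 on [-1, 1]:
#   ∫_{-1}^{1} (v^2 - 2v^4 + v^6) dv = 2/3 - 4/5 + 2/7
poly = Fr(2, 3) - Fr(4, 5) + Fr(2, 7)

# ∫ v^2 w(v) dv = (15/16) * poly = 1/7
mu2 = Fr(15, 16) * poly

# w'(v) = -(15/4) v (1 - v^2), so ∫ w'(u)^2 du = (225/16) * poly = 15/7
roughness = Fr(225, 16) * poly

# C(w) = (1/4) (∫ v^2 w)^2 / ∫ (w')^2, cf. (3.27)
C_w = Fr(1, 4) * mu2**2 / roughness
print(mu2, roughness, C_w)  # 1/7 15/7 1/420
```

So for the biweight kernel $C(w) = 1/420$.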
If $q \equiv 1$ on $[0,1]$, we have that $\tilde q(x)$ equals $0$ and hence $t^* = 1 - F(x) + o(1)$. This result is also found in [van Es (2011)]. In the next section we analyse the value of $t^*$ for general densities $q$.
3.9 Discussion on the optimal value of t
First of all we observe that $t^*$ depends on $F(x)$. Moreover, $t^*$ depends on $g''(x+1)$ and $g'''(x+1)$ through the factor $b^+(x)$. The resulting function $f^{t^*}_{nh}(x)$ is therefore, although theoretically correct, not useful in practice. We call $f^{t^*}_{nh}(x)$ the optimal semi-estimator and use it as a benchmark. In the next section we prove the existence of a genuine estimator that achieves, for a special choice of $h$, the same asymptotic value of the MSE as the optimal semi-estimator.
Secondly we observe that the value of $t^*$ still depends on the sample size $n$ and the bandwidth $h$. We discuss this impact below. One can check that the mean squared error of $f^t_{nh}$ is minimal with respect to $h$ when $h \sim cn^{-1/7}$, with $c > 0$ a constant. By this argument we distinguish three cases in our analysis of $t^*$. We consider $t^*$ for an optimal choice of $h$, i.e. $h \sim cn^{-1/7}$ with constant $c > 0$. We also consider $t^*$ for a sub and a super optimal choice$^5$ of $h$, i.e. $h \ll n^{-1/7}$ and $h \gg n^{-1/7}$.
In equation (3.26) two different types of terms arise: terms of order $O\big(\frac{1}{nh^3}\big)$ and terms of order $O(h^4)$. If we choose $h$ to be optimal, both are of order $O(n^{-4/7})$. In this case, with $h \sim cn^{-1/7}$,
\[
t^* \sim \frac{2\,\frac{1}{q(x)}\,(1-F(x))\int w'(u)^2\,du - \tfrac12\,c^7\big(\int v^2 w(v)\,dv\big)^2\,b^+(x)\,\tilde q(x)}{2\,\frac{1}{q(x)}\int w'(u)^2\,du + \tfrac12\,c^7\big(\int v^2 w(v)\,dv\big)^2\,\tilde q^2(x)} + o(1).
\]
$^5$Note the following restrictions on a sub and super optimal choice of $h$: the assumption $nh^3 \to \infty$ implies $h \gg n^{-1/3}$, and the requirement $h \to 0$ must still hold for a super optimal choice.
To get an idea of the behaviour of the optimal semi-estimator $f^{t^*}_{nh}$ for optimally chosen $h$, we first analyse the sub and super optimal choices of $h$.
If we choose $h$ to be sub optimal, the variance of $f^t_{nh}$ dominates the mean squared error; in this case $O(h^4) = o\big(\frac{1}{nh^3}\big)$. We then expect our optimal estimator to minimize its variance. Let us check this. For sub optimal $h \ll n^{-1/7}$ one finds $t^* = 1 - F(x) + o(1)$. Our optimal estimator in this case equals
\[
f^{t^*}_{nh}(x) = (1-F(x))\,f^-_{nh}(x) + F(x)\,f^+_{nh}(x) + o(1). \tag{3.28}
\]
For small values of $x$ the factor $1-F(x)$ is bigger than $F(x)$, and hence the contribution of $f^-_{nh}(x)$ is bigger than the contribution of $f^+_{nh}(x)$. For large $x$ the opposite is true. This is consistent with the theory above: for small $x$ the variance of $f^-_{nh}(x)$ is smaller than the variance of $f^+_{nh}(x)$, and for large $x$ the opposite is true.
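The role of $t$ in the variance can be checked numerically: for sub optimal $h$ the factor $t^2 F(x) + (1-t)^2(1-F(x))$ in (3.21) drives the mean squared error, and a simple grid search (an illustration with an arbitrary value $F(x) = 0.3$) recovers the minimizer $t = 1 - F(x)$.

```python
def var_factor(t, F):
    # the t-dependent factor in the variance expansion (3.21)
    return t**2 * F + (1 - t) ** 2 * (1 - F)

F = 0.3
grid = [i / 1000 for i in range(1001)]
t_star = min(grid, key=lambda t: var_factor(t, F))
print(t_star)  # 0.7, i.e. t* = 1 - F
```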
If we choose $h$ to be super optimal, the squared bias of $f^t_{nh}$ dominates the mean squared error; in this case $O\big(\frac{1}{nh^3}\big) = o(h^4)$. We then expect our optimal estimator to minimize its squared bias. For super optimal $h$ we find
\[
t^* = -\frac{b^+(x)}{\tilde q(x)} + o(1) = \frac{b^+(x)}{b^+(x) - b^-(x)} + o(1).
\]
Our optimal estimator in this case equals
\[
f^{t^*}_{nh}(x) = \frac{b^+(x)}{b^+(x) - b^-(x)}\,f^-_{nh}(x) + \frac{b^-(x)}{b^-(x) - b^+(x)}\,f^+_{nh}(x) + o(1). \tag{3.29}
\]
The contributions of $f^-_{nh}(x)$ and $f^+_{nh}(x)$ are in this case determined by their proportional difference in bias, as expected.
We have seen that the expression for $t^*$ in the case of an optimal $h$ is less simple than the expressions for sub and super optimal $h$. However, from the above analysis for sub and super optimal $h$ we can imagine its effect on the estimator: it balances the contributions of $f^-_{nh}(x)$ and $f^+_{nh}(x)$ to $f^t_{nh}(x)$, taking into account their individual contributions to both the bias and the variance.
In the sequel we prove asymptotic normality of a final estimator of the form $f^t_{nh}$ with $t = 1 - \hat F_n(x)$, where $\hat F_n(x)$ is an estimator for $F(x)$. This final estimator is only $t$-optimal in the sub optimal regime of $h$. It nevertheless makes sense to choose $h$ sub optimal. First of all, asymptotic normality states in this case that $f^{t^*}_{nh}$ approaches $f(x)$ accurately, because of the vanishing bias. A sub optimal choice of $h$ is for this reason common in the kernel density estimation literature. For instance, the aim of Bickel and Rosenblatt in [Bickel and Rosenblatt (1973)] is to obtain global measures of how good an estimate of $f(x)$ a kernel estimator $f_n(x)$ is. In particular they obtain asymptotic results for the maximum absolute deviation and the integrated quadratic deviation. With these measures one is able to set up simple confidence intervals, and one can use the measures as test statistics for the hypothesis $H\colon f = f_0$. These results are all obtained in the sub optimal regime of $h$.
Another advantage of a sub optimal choice over a (super) optimal choice of $h$ is illustrated in Section 4.3. As seen above, $t^*$ depends on $g(x)$, $g''(x)$ and $g'''(x)$. However, if $h$ is chosen sub optimal, $t^*$ only depends on $g(x)$. In practice we substitute the density estimators $g_{nh}(x)$, $g''_{nh}(x)$ and $g'''_{nh}(x)$ for these functions. The need for $g''_{nh}(x)$ and $g'''_{nh}(x)$ is a disadvantage. First of all, the variances of $g''_{nh}(x)$ and $g'''_{nh}(x)$ are of larger order. Secondly, boundary effects occur more often. And thirdly, the addition of the kernel estimators $g''_{nh}(x)$ and $g'''_{nh}(x)$ demands that the function $g$ be differentiable to higher order.
3.10 Asymptotic normality of the final estimator
Choosing $h$ sub optimal, we have found $t^* = 1 - F(x) + o(1)$. In the sequel we obtain asymptotic normality for $f^t_{nh}(x)$ with $t = 1 - \hat F_n(x)$, where $\hat F_n(x)$ is an estimator for $F(x)$ that is consistent in mean squared error. Note that $1 - \hat F_n(x)$ does not generally lie in $[0,1]$. We call this estimator our final estimator and write it as $f_{nh}(x)$. Hence we have
\[
f_{nh}(x) = (1-\hat F_n(x))\,f^-_{nh}(x) + \hat F_n(x)\,f^+_{nh}(x). \tag{3.30}
\]
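As an illustration, the final estimator (3.30) can be assembled from the kernel estimators of $g$ and $g'$. The helper names below are ours, and for $\hat F_n(x)$ we take the average of the two distribution estimators in (3.42), the choice used later in the simulations; two bandwidths are allowed, as discussed in Section 3.11.

```python
import numpy as np

def final_estimator(x, V, h1, h2, w, dw, q, dq):
    # f_nh(x) = (1 - F̂n(x)) f^-_{nh1}(x) + F̂n(x) f^+_{nh1}(x), cf. (3.30)
    g = lambda y, h: np.mean(w((y - V) / h)) / h        # g_{nh}(y)
    dg = lambda y, h: np.mean(dw((y - V) / h)) / h**2   # g'_{nh}(y)

    # distribution estimators (3.42) with bandwidth h2, averaged
    F_hat = 0.5 * (g(x, h2) / q(x) + 1 - g(x + 1, h2) / q(x))

    # left and right density estimators with bandwidth h1
    f_minus = dg(x, h1) / q(x) - dq(x) * g(x, h1) / q(x) ** 2
    f_plus = -dg(x + 1, h1) / q(x) + dq(x) * g(x + 1, h1) / q(x) ** 2

    return (1 - F_hat) * f_minus + F_hat * f_plus
```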
In the last main theorem below we prove asymptotic normality of $f_{nh}$ and we give expansions of the bias and variance for $h = O(n^{-1/7})$. We also show that $f_{nh}$ is $t$-optimal for sub optimal $h$, i.e. $h \ll n^{-1/7}$.
Theorem 3.11. Assume that Condition W is satisfied, that $f$ is twice continuously differentiable, that $q$ is three times continuously differentiable on a neighbourhood of $x$, and that $\hat F_n(x)$ is an estimator of $F(x)$ with
\[
E\,\big(\hat F_n(x) - F(x)\big)^2 \to 0. \tag{3.31}
\]
Then, as $n \to \infty$, $h \to 0$, $nh^3 \to \infty$ and $h = O(n^{-1/7})$, we have for $0 < x < 1$
\[
\sqrt{nh^3}\,\big(f_{nh}(x) - E f_{nh}(x)\big) \xrightarrow{\;D\;} N(0, \sigma^2),
\qquad\text{with}\qquad
\sigma^2 = F(x)(1-F(x))\,\frac{1}{q(x)}\int w'(u)^2\,du.
\]
Furthermore, if
\[
E\,\big(\hat F_n(x) - F(x)\big)^2 = o(nh^7) \tag{3.32}
\]
then
\[
E f_{nh}(x) = f(x) + \tfrac12 h^2 \int v^2 w(v)\,dv\,\big((1-F(x))\,b^-(x) + F(x)\,b^+(x)\big) + o(h^2).
\]
If $h \ll n^{-1/7}$ then $f_{nh}(x)$ is $t$-optimal and we have
\[
\sqrt{nh^3}\,\big(f_{nh}(x) - f(x)\big) \xrightarrow{\;D\;} N(0, \sigma^2),
\]
as $n \to \infty$.
Proof
Note that Theorem 3.8 proves asymptotic normality for the estimator $f^t_{nh}(x)$ with $t = 1 - F(x)$, because we have $1-F(x) \in [0,1]$. We use this fact to prove asymptotic normality for our final estimator $f_{nh}(x) = f^{1-\hat F_n(x)}_{nh}(x)$. Write
\[
f_{nh}(x) = (1-\hat F_n(x))\,f^-_{nh}(x) + \hat F_n(x)\,f^+_{nh}(x) = f^{1-F(x)}_{nh}(x) + R_{nh}(x), \tag{3.33}
\]
where
\[
R_{nh}(x) = \big(\hat F_n(x) - F(x)\big)\,S_{nh}(x) \qquad\text{and}\qquad S_{nh}(x) = f^+_{nh}(x) - f^-_{nh}(x). \tag{3.34}
\]
Note that
\[
\sqrt{nh^3}\,\big(f_{nh}(x) - E f_{nh}(x)\big) = \sqrt{nh^3}\,\big(f^{1-F(x)}_{nh}(x) - E f^{1-F(x)}_{nh}(x)\big) + \sqrt{nh^3}\,R_{nh}(x) - \sqrt{nh^3}\,E\,R_{nh}(x).
\]
We show in the sequel that $\sqrt{nh^3}\,R_{nh}(x)$ and $\sqrt{nh^3}\,E\,R_{nh}(x)$ converge to $0$ in distribution.
We first rewrite $S_{nh}(x)$. We have
\[
\begin{aligned}
S_{nh}(x) &= -\frac{1}{q(x)}g'_{nh}(x+1) + \frac{q'(x)}{q^2(x)}g_{nh}(x+1) - \frac{1}{q(x)}g'_{nh}(x) + \frac{q'(x)}{q^2(x)}g_{nh}(x)\\
&= \frac1n\sum_{i=1}^n\Big(-\frac{1}{q(x)}\frac{1}{h^2}\,w'\Big(\frac{x+1-V_i}{h}\Big) + \frac{q'(x)}{q^2(x)}\frac1h\,w\Big(\frac{x+1-V_i}{h}\Big) - \frac{1}{q(x)}\frac{1}{h^2}\,w'\Big(\frac{x-V_i}{h}\Big) + \frac{q'(x)}{q^2(x)}\frac1h\,w\Big(\frac{x-V_i}{h}\Big)\Big)\\
&= \frac1n\sum_{i=1}^n W_{ih}(x),
\end{aligned}
\]
where
\[
W_{ih}(x) = -\frac{1}{q(x)}\frac{1}{h^2}\,w'\Big(\frac{x+1-V_i}{h}\Big) + \frac{q'(x)}{q^2(x)}\frac1h\,w\Big(\frac{x+1-V_i}{h}\Big) - \frac{1}{q(x)}\frac{1}{h^2}\,w'\Big(\frac{x-V_i}{h}\Big) + \frac{q'(x)}{q^2(x)}\frac1h\,w\Big(\frac{x-V_i}{h}\Big).
\]
The next lemma establishes some properties of $S_{nh}(x)$. The proof of Lemma 3.12 can be found in Section A.3.
Lemma 3.12. Under the assumptions of Theorem 3.11 we have
\[
E\,S_{nh}(x) = \tfrac12 h^2 \int v^2 w(v)\,dv\,\big(b^+(x) - b^-(x)\big) + o(h^2), \tag{3.35}
\]
\[
E\,W_{ih}(x)^m = \frac{1}{q(x)^{m-1}}\,\frac{1}{h^{2m-1}}\int w'(u)^m\,du + o\Big(\frac{1}{h^{2m-1}}\Big), \tag{3.36}
\]
and
\[
\sqrt{nh^3}\,\big(S_{nh}(x) - E\,S_{nh}(x)\big) \xrightarrow{\;D\;} N\Big(0,\; \frac{1}{q(x)}\int w'(u)^2\,du\Big). \tag{3.37}
\]
We start by analysing the term $\sqrt{nh^3}\,R_{nh}(x)$ and prove that it converges to zero in distribution. We rewrite the term as
\[
\sqrt{nh^3}\,R_{nh}(x) = \sqrt{nh^3}\,\big(\hat F_n(x) - F(x)\big)\,S_{nh}(x) = \sqrt{nh^3}\,\big(\hat F_n(x) - F(x)\big)\big(S_{nh}(x) - E\,S_{nh}(x)\big) + \sqrt{nh^3}\,\big(\hat F_n(x) - F(x)\big)\,E\,S_{nh}(x).
\]
We estimate the first and second term in the last line separately. By condition (3.31) we have $\hat F_n(x) - F(x) \xrightarrow{P} 0$, and hence by Slutsky's Theorem and the result of Lemma 3.12 we may conclude
\[
\sqrt{nh^3}\,\big(\hat F_n(x) - F(x)\big)\big(S_{nh}(x) - E\,S_{nh}(x)\big) \xrightarrow{\;D\;} 0.
\]
Furthermore we have by Lemma 3.12 that $E\,S_{nh}(x)$ equals $O(h^2)$. Hence for $h = O(n^{-1/7})$ we have
\[
\sqrt{nh^3}\,E\,S_{nh}(x) = O\big(n^{1/2}h^{7/2}\big) = O\big(n^{1/2}(n^{-1/7})^{7/2}\big) = O(1). \tag{3.38}
\]
Using again condition (3.31) we conclude
\[
\sqrt{nh^3}\,\big(\hat F_n(x) - F(x)\big)\,E\,S_{nh}(x) \xrightarrow{\;P\;} 0. \tag{3.39}
\]
Together equations (3.38) and (3.39) ensure that $\sqrt{nh^3}\,R_{nh}(x) \xrightarrow{\;D\;} 0$.
We now analyse the term $\sqrt{nh^3}\,E\,R_{nh}(x)$ and prove that it converges to zero as well. By the Cauchy--Schwarz inequality we have
\[
E\,\sqrt{nh^3}\,|R_{nh}(x)| \le \sqrt{nh^3}\,\big(E\,(\hat F_n(x) - F(x))^2\big)^{1/2}\big(E\,S_{nh}(x)^2\big)^{1/2} = \big(E\,(\hat F_n(x) - F(x))^2\big)^{1/2}\,O(1) \to 0.
\]
Since $\sqrt{nh^3}\,R_{nh}(x)$ and $\sqrt{nh^3}\,E\,R_{nh}(x)$ both converge to zero in distribution, we may conclude that $\sqrt{nh^3}\,\big(f_{nh}(x) - E f_{nh}(x)\big)$ has the same asymptotic normal distribution as $\sqrt{nh^3}\,\big(f^{1-F(x)}_{nh}(x) - E f^{1-F(x)}_{nh}(x)\big)$, which proves the first statement of the theorem.
The second statement of the theorem is proven as follows. By equation (3.20) we have
\[
E f^{1-F(x)}_{nh}(x) = f(x) + \tfrac12 h^2 \int v^2 w(v)\,dv\,\big((1-F(x))\,b^-(x) + F(x)\,b^+(x)\big) + o(h^2). \tag{3.40}
\]
Furthermore we have
\[
E\,|R_{nh}(x)| \le \big(E\,(\hat F_n(x) - F(x))^2\big)^{1/2}\big(E\,S_{nh}(x)^2\big)^{1/2} = o\big(\sqrt{nh^7}\big)\,O\Big(\frac{1}{\sqrt{nh^3}}\Big) = o(h^2). \tag{3.41}
\]
Together equations (3.40) and (3.41) prove the second statement of the theorem.
Finally, by equation (3.21), we find that
\[
\operatorname{Var} f^{1-F(x)}_{nh}(x) = \frac{1}{q(x)}\,\frac{1}{nh^3}\,\big((1-F(x))^2 F(x) + F(x)^2(1-F(x))\big)\int w'(u)^2\,du + o\Big(\frac{1}{nh^3}\Big) = \frac{1}{q(x)}\,\frac{1}{nh^3}\,F(x)(1-F(x))\int w'(u)^2\,du + o\Big(\frac{1}{nh^3}\Big).
\]
Thus for sub optimal $h$ we have $\operatorname{Var} f_{nh}(x) \sim \operatorname{Var} f^{1-F(x)}_{nh}(x)$. Hence for sub optimal $h$ the estimator $f_{nh}(x)$ is $t$-optimal. The last statement follows from the fact that for $nh^3 \to \infty$ and $h \ll n^{-1/7}$ we have
\[
\sqrt{nh^3}\,\big(E f_{nh}(x) - f(x)\big) \to 0
\]
as $n \to \infty$. $\Box$
Note that, in contrast to Theorem 3.8, we already assume sufficient smoothness of the functions $f$ and $q$ to prove the first statement of asymptotic normality. In equation (3.38) this assumption ensures that the difference between $f^+_{nh}(x)$ and $f^-_{nh}(x)$, denoted by $S_{nh}(x)$, satisfies $E\,S_{nh}(x) = O(h^2)$. Together with the restriction $h = O(n^{-1/7})$ we obtain $\sqrt{nh^3}\,E\,S_{nh}(x) = O(1)$.
Note also that Lemma 3.12 reveals that the expectation of the difference between the left and right estimator, i.e. $E\,S_{nh}(x)$, depends only on the density $q$, as we have $b^+(x) - b^-(x) = -\tilde q(x)$.
3.11 Consistent estimator of the distribution function
Theorem 3.11 is based on the assumption that an estimator that is consistent in mean squared error exists for the distribution function F . In this section we show that such an estimator can be found easily. In fact there are many suitable estimators of F .
The construction of the estimator starts with the inversion formulas (3.5) and (3.6) obtained in Section 3.2. For $g$ we again substitute the kernel estimator $g_{nh}$, defined in (3.9), as we did in the construction of $f^-_{nh}$ and $f^+_{nh}$. This method leads to two different estimators of $F$ for $x \in [0,1]$,
\[
F^-_{nh}(x) = \frac{g_{nh}(x)}{q(x)} \qquad\text{and}\qquad F^+_{nh}(x) = 1 - \frac{g_{nh}(x+1)}{q(x)}. \tag{3.42}
\]
We define the estimator $F^t_{nh}(x)$ for $x \in [0,1]$ as
\[
F^t_{nh}(x) = t\,F^-_{nh}(x) + (1-t)\,F^+_{nh}(x) \tag{3.43}
\]
for some fixed $0 \le t \le 1$. In the sequel we show that the estimator $F^t_{nh}$ is consistent in mean squared error. This is proven with the help of expansions of the expectation and the variance of $F^t_{nh}$. These expansions are stated in the next theorem, whose proof can be found in Appendix A.5.
Theorem 3.13. Under the assumptions of Theorem 3.11, that is, if Condition W is satisfied, $f$ is twice continuously differentiable and $q$ is three times continuously differentiable on a neighbourhood of $x$, we have
\[
E\,F^t_{nh}(x) = F(x) + \frac{1}{q(x)}\,\tfrac12 h^2\,\big(t\,g''(x) - (1-t)\,g''(x+1)\big)\int v^2 w(v)\,dv + o(h^2) \tag{3.44}
\]
and
\[
\operatorname{Var} F^t_{nh}(x) = \frac{1}{q(x)}\,\frac{1}{nh}\,\big(t^2 F(x) + (1-t)^2(1-F(x))\big)\int w(v)^2\,dv + o\Big(\frac{1}{nh}\Big). \tag{3.45}
\]
With the help of Theorem 3.13 we are able to check that the estimator $F^t_{nh}$ is consistent in MSE. We have
\[
\begin{aligned}
E\,\big(F^t_{nh}(x) - F(x)\big)^2 &= \operatorname{bias}^2 F^t_{nh}(x) + \operatorname{Var} F^t_{nh}(x)\\
&= \Big(\frac{1}{q(x)}\,\tfrac12 h^2\,\big(t\,g''(x) - (1-t)\,g''(x+1)\big)\int v^2 w(v)\,dv + o(h^2)\Big)^2\\
&\quad + \frac{1}{q(x)}\,\frac{1}{nh}\,\big(t^2 F(x) + (1-t)^2(1-F(x))\big)\int w(v)^2\,dv + o\Big(\frac{1}{nh}\Big)\\
&= O(h^4) + O\Big(\frac{1}{nh}\Big) \to 0, \tag{3.46}
\end{aligned}
\]
as $h \to 0$ and $nh \to \infty$.
The second statement in Theorem 3.11 is based on the assumption that the estimator $\hat F_n(x)$ satisfies
\[
E\,\big(\hat F_n(x) - F(x)\big)^2 = o(nh^7).
\]
This assumption is stated in equation (3.32). Note that we can choose different bandwidths $h$ in $\hat F_n(x)$ and $f_{nh}(x)$ as long as we meet the requirements in Theorem 3.11. In the sequel we write $h_1$ for the bandwidth in $f_{nh}(x)$ and $h_2$ for the bandwidth in $\hat F_n(x)$. We show below in which way the estimator $F^t_{nh}(x)$ is able to satisfy equation (3.32) for respectively an optimal and a sub optimal choice of $h_1$.
An optimal choice of $h_1$, i.e. $h_1 = cn^{-1/7}$ with $c > 0$, reduces the assumption for $F^t_{nh}(x)$ to
\[
E\,\big(F^t_{nh}(x) - F(x)\big)^2 = o(1).
\]
By equation (3.46) this requirement is met by the estimator $F^t_{nh}(x)$ for all $h_2$ such that $h_2 \to 0$ and $nh_2 \to \infty$ as $n \to \infty$. In particular the requirement is met for the optimal choice of $h_2$ in $F^t_{nh}(x)$, given by $h_2^* = cn^{-1/5}$ with $c > 0$. The bandwidth $h_2^*$ is optimal in the sense that it minimizes the MSE in (3.46) with respect to $h_2$.
A sub optimal choice of $h_1$ requires the MSE of $F^t_{nh}(x)$ to be of smaller order. It is possible to keep the optimal choice $h_2^* \sim n^{-1/5}$ when $h_1$ is chosen sub optimal. However, in order to satisfy equation (3.32) one should not choose $h_1$ too small. Note that by equation (3.46) the MSE of $F^t_{nh}(x)$ with $h = h_2^*$ is of order $O(n^{-4/5})$. Hence to satisfy equation (3.32) one should always choose $h_1 \gg n^{-9/35}$. With this restriction $F^t_{nh}(x)$ satisfies equation (3.32) for sub optimal $h_1$.
Chapter 4
Simulation
4.1 Illustration of the theory
In this section we illustrate the theory in Chapter 3 with the help of simulated examples. We define below the functions and values as well as the technique of computation that we use in the simulation of the examples.
As a kernel function we use the biweight kernel
\[
w(x) = \frac{15}{16}\,(1-x^2)^2\,\mathbf 1_{[-1,1]}(x). \tag{4.1}
\]
We define $\hat F_n(x)$, proposed in Theorem 3.11, for fixed bandwidth $h$ as
\[
\hat F_n(x) = F^{1/2}_{nh}(x) = \tfrac12\big(F^-_{nh}(x) + F^+_{nh}(x)\big). \tag{4.2}
\]
As seen in Section 3.11, ˆFn(x) meets the requirements of Theorem 3.11.
We want to reduce the computational cost and we want to weigh each data point equally in our simulations. To achieve this we use a method similar to WARPing, described in [Härdle (1991), Sec 1.4]. A short explanation is given below.
We divide the length $l$ of the domain of our estimators into $M$ intervals. The width of each interval is called the binwidth $b$. We count for each $j \in \{1, \ldots, M\}$ how many observations $V_i$ are in the interval $b_j = [(j-1)b, jb]$. We call this number $n_{b_j}$. Hence we have
\[
n_{b_j} = \sum_{i=1}^n \mathbf 1[V_i \in b_j].
\]
We give the observations $V_i$ in the bin $b_j$ the average value of the bin, denoted by $V_{b_j}$. Hence we have
\[
V_{b_j} = \frac{(j-1)b + jb}{2} = \Big(j - \frac12\Big)b.
\]
With these considerations one can construct an estimator that approximates the initial estimator. We illustrate this for the estimator $g_{nh}$; all other kernel estimators in the simulations are approximated in the same way. The domain of $g_{nh}$ is $[0,2]$, hence $l = 2$. We choose $M = 400$, so that the binwidth is $b = 2/400 = 0.005$. The estimator $\hat g_{nh}$, given for $x \in [0,2]$ by
\[
\hat g_{nh}(x) = \frac1n\sum_{j=1}^M \frac1h\, n_{b_j}\, w\Big(\frac{x - V_{b_j}}{h}\Big), \tag{4.3}
\]
is an approximation of the estimator $g_{nh}$.
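The binned approximation (4.3) can be sketched as follows; the function and argument names are ours.

```python
import numpy as np

def binned_g_nh(x, V, h, w, l=2.0, M=400):
    # binned approximation ĝ_{nh}(x) of (4.3): the kernel is evaluated once per
    # bin midpoint V_{b_j}, weighted by the bin count n_{b_j}
    counts, edges = np.histogram(V, bins=M, range=(0.0, l))
    mids = (edges[:-1] + edges[1:]) / 2          # V_{b_j} = (j - 1/2) b, b = l/M
    return np.sum(counts * w((x - mids) / h)) / (len(V) * h)
```

With $M = 400$ bins on $[0,2]$ the binwidth is $b = 0.005$, so for the bandwidths used here $\hat g_{nh}$ differs from $g_{nh}$ only at the order of the binwidth.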
In Sections 4.1.1, 4.1.2 and 4.1.3 we simulate three samples of UCSD where $X_i \sim \mathrm{Beta}(2,2)$. For $T_i$ we examine respectively $T_i \sim N(0.5, 0.3)$ conditioned on $[0,1]$, $T_i \sim U[0,1]$ and $T_i \sim \mathrm{Beta}(2,5)$. The sample size is $n = 10000$ and the bandwidths are $h_1 = 0.22$ and $h_2 = 0.16$.
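The three simulated samples can be generated as follows. This is an illustrative sketch: we read $N(0.5, 0.3)$ as having standard deviation $0.3$ (an assumption on the notation), use rejection sampling for the conditioned normal, and take the observation $V_i = T_i + \mathbf 1\{X_i > T_i\}$, consistent with the inversion formulas $g(x) = q(x)F(x)$ and $g(x+1) = q(x)(1-F(x))$ used in (3.42).

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10000

X = rng.beta(2, 2, size=n)                 # hidden variables X_i ~ Beta(2, 2)

# examination times T_i for the three cases of Sections 4.1.1-4.1.3
draws = rng.normal(0.5, 0.3, size=8 * n)   # rejection sampling for N_{[0,1]}(0.5, 0.3)
T_cases = {
    "normal_cond": draws[(draws >= 0) & (draws <= 1)][:n],
    "uniform": rng.uniform(size=n),
    "beta": rng.beta(2, 5, size=n),
}

# current status data: delta_i = 1{X_i <= T_i}, observed V_i = T_i + (1 - delta_i)
samples = {name: T + (X > T) for name, T in T_cases.items()}
```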
In Figure 1 the true density f , the known density q and the true density g are plotted for each of the three different cases.
4.1.1 $T_i \sim N(0.5, 0.3)$ conditioned on $[0, 1]$
We start with the results for $T_i \sim N(0.5, 0.3)$ conditioned on $[0,1]$, which we will interchangeably denote by $T_i \sim N_{[0,1]}(0.5, 0.3)$.
The estimators $f^-_{nh}$, $f^+_{nh}$ as well as the final estimator $f_{nh}$ are given in Figure 3. We analyse Figures 3(a) and 3(b) with the help of Figure 2. In Figure 2(a) the functions $b^-(x)$ and $b^+(x)$ are drawn. Note that the image of $b^-(x)$ lies approximately between the values $-50$ and $50$ for $x$ in the interval $[0, 0.8]$. Note also that $b^-(x)$ changes sign around $x = 0.15$ and $x = 0.7$. Hence we would expect the bias of $f^-_{nh}(x)$ to be of equal magnitude on the interval $[0, 0.8]$ and to change sign around $x = 0.15$ and $x = 0.7$. This is what we see in Figure 3(a).
In Figure 1(a) and in Figure 2(b) the density $g(x)$ and its second and third derivatives are drawn. Note that $g'(x)$ and $g^{(3)}(x)$ are not continuous at $x = 1$, although this is demanded in Lemma 3.3. The kernel density estimator $g'_{nh}(x)$ is therefore unreliable around $x = 1$. Recall that $h_1 = 0.22$; hence for $x \in [0.78, 1]$ we do not expect the graph of $b^-(x)$ to be an accurate indication of the bias. In Figure 3(a) we see that the discontinuity causes a boundary effect on this interval.
Due to the decreasing shape of $g(x)$ when approaching $x = 0$, the expected boundary effect at $x = 0$ is small and is not noticeable in the graph of $f^-_{nh}$.
The graph of $f^+_{nh}(x)$ in Figure 3(b) can be analysed in the same way. Recall that $f^+_{nh}(x)$ depends on $g(x+1)$ and its derivatives. Hence the discontinuity of $g'(x)$ and $g^{(3)}(x)$ causes a boundary effect for $x \in [0, 0.22]$.
Due to the factors $1-F(x)$ and $F(x)$, the discontinuity of $g'(x)$ and $g^{(3)}(x)$ does not influence our final estimator $f_{nh}$. As seen in Figure 3(c), the boundary effects have disappeared and our final estimator approaches the density $f$ better.
The estimators $F^-_{nh}$, $F^+_{nh}$ and $\hat F_n = F^{1/2}_{nh}$ are given in Figure 4. The right tail of $F^-_{nh}$ is a
[Figure 1: the densities $g$, $q$ and $f$, with $X_i \sim \mathrm{Beta}(2,2)$ throughout: (a) $T_i \sim N(0.5, 0.3)$ conditioned on $[0,1]$; (b) $T_i \sim U[0,1]$; (c) $T_i \sim \mathrm{Beta}(2,5)$.]

Figure 2: The functions $b^-$, $b^+$ (panel (a)) and the density $g$ with its derivatives $g''$ and $g^{(3)}$ (panel (b)), for $X_i \sim \mathrm{Beta}(2,2)$ and $T_i \sim N_{[0,1]}(0.5, 0.3)$.

[Figure 3 panels: (a) $f^-_{nh}(x)$, (b) $f^+_{nh}(x)$, (c) $f_{nh}(x)$, each plotted with the true density $f(x)$.]

Figure 3: The estimators $f^-_{nh}$, $f^+$