MSc Stochastics and Financial Mathematics
Master Thesis
Nonparametric Density Estimation for
Univariate Current Status Data
Author: Catharina Elisabeth Graafland
Supervisor: dr. A.J. van Es
Examination date:
January 26, 2017
Korteweg-de Vries Institute for Mathematics
Abstract
We construct a density estimator and an estimator for the distribution function in the univariate current status model. The estimator is based on the relation of the problem to univariate uniform deconvolution. Some asymptotic properties are derived and simulations are presented.
AMS classification: primary 62G05, 62N01, 62G07; secondary 62G20
Keywords: Univariate current status, kernel density estimation.
Title: Nonparametric Density Estimation for Univariate Current Status Data
Author: Catharina Elisabeth Graafland, lisette.graafland@student.uva.nl, 10814086
Supervisor: dr. A.J. van Es
Second Examiner: prof. dr. J.H. van Zanten
Examination date: January 26, 2017
Korteweg-de Vries Institute for Mathematics
University of Amsterdam
Science Park 105-107, 1098 XG Amsterdam
http://kdvi.uva.nl
Contents
1 Introduction
2 Construction and results
3 Detailed derivation of the results
3.1 Univariate current status data
3.2 Inversion formulas
3.3 Construction of two estimators of the density function
3.4 Expectation and variance of the left and right estimator
3.5 Combination of the left and right estimator
3.6 Expectation and variance of the combined estimator
3.7 Asymptotic behaviour of the combined estimator
3.8 Derivation of the t-optimal estimator
3.9 Discussion on the optimal value of t
3.10 Asymptotic normality of the final estimator
3.11 Consistent estimator of the distribution function
4 Simulation
4.1 Illustration of the theory
4.1.1 Ti ∼ N(0.5, 0.3) conditioned on [0, 1]
4.1.2 Ti ∼ U[0, 1]
4.1.3 Ti ∼ Beta(2, 5)
4.1.4 Influence of q on the accuracy
4.1.5 Boundary correction
4.2 Sample expectations and sample variances
4.2.1 Xi ∼ Beta(2, 3) and Ti ∼ U[0, 1]
4.2.2 Xi ∼ Beta(2, 2) and Ti ∼ N(0.5, 0.3) conditioned on [0, 1]
4.3 Optimal value of t
4.4 Unknown distribution of Ti
5 Discussion
A Remaining proofs and lemmas
A.1 Lemma A.1 and its proof
A.2 Proof of Lemma 3.9
A.3 Proof of Lemma 3.12
A.4 Lyapunov Condition
Chapter 1
Introduction
In this thesis we study the Univariate Current Status Data problem (UCSD problem). In this problem the initial data in which we are interested are not available. However, other related data, so-called current status data, are available. Deriving a density estimator for the initial data from the current status data is the aim of this thesis.
The UCSD problem, or type I interval censoring problem, can be formulated in the following way. Let X1, . . . , Xn be unobservable variables of interest. Let T1, . . . , Tn be i.i.d. random variables with known density. For each Ti it is known whether the corresponding unobservable variable Xi is smaller or larger than Ti. Stated differently, writing ∆i = I[Xi≤Ti], the current status data (T1, ∆1), . . . , (Tn, ∆n) are observed.
The interpretation of the random variables T1, . . . , Tn as random time instants, serving as observation times, explains the term current status. However, the variables T1, . . . , Tn are not restricted to be time instants.
To illustrate the practical importance we consider below three examples of UCSD problems. Given a population of children one can be interested in the age of weaning; see for instance [Diamond and McDonald (1991)] and [Grummer-Strawn (1993)]. Weaning is the process of gradually introducing an infant to its adult diet and withdrawing the supply of its mother's milk. In this problem Xi is the age of an individual child when it starts the process of weaning, Ti is the age of an individual child in this population at observation, and ∆i is defined as ∆i = I[Xi≤Ti]. Hence ∆i = 1 if the child is already weaning at observation and ∆i = 0 if it takes nothing but its mother's milk.
This problem is worth analysing by current status data methods, not only when the data on Xi are missing, but also when they are available, because of the inaccurate character of the initial data.
Another problem using UCSD, formulated by Milton Friedman in [Stat. Research Group, Columbia University (1947), Ch. 11, p. 341], is about proximity fuzes. A proximity fuze is a device that detonates a munition's explosive material automatically when the distance to a target becomes smaller than a predetermined value. One can be interested in the maximum distance over which a proximity fuze operates. Here Xi is this maximum operating distance of an individual fuze and Ti is the nearest distance the proximity fuze reaches with respect to its target. ∆i is defined as ∆i = I[Xi≤Ti]. Hence ∆i = 1 if the fuze operates at a distance greater than the nearest distance relative to its target and ∆i = 0 if the fuze operates only at a distance that is smaller than the nearest distance it has reached relative to its target.
The third and last problem is about estimating the age of incidence of a non-fatal human disease, such as Hepatitis A, from a cross-sectional sample. This problem was mentioned by N. Keiding in [Keiding (1991)]. In this problem Xi is the age of incidence of a non-fatal human disease for an individual, and Ti is the age at which a diagnostic test is carried out on the individual. Recall ∆i = I[Xi≤Ti]. Hence ∆i = 1 if the individual was already ill at the moment of the diagnostic test and ∆i = 0 if the individual gets ill later1.
The above problems are relevant in different aspects of life and one can come up with many other examples. It is therefore not surprising that research on this problem has already been done. An estimator has been found in [Witte (2011)] and [Groeneboom, Jongbloed and Witte (2010)]. In this master thesis, however, a new density estimator for the UCSD problem will be constructed. This density estimator differs from the one in [Witte (2011)] and [Groeneboom, Jongbloed and Witte (2010)] in the way it is derived, but has similar asymptotic properties. The estimator in [Witte (2011)] and [Groeneboom, Jongbloed and Witte (2010)] is based on maximum likelihood estimation. The estimator in this thesis is constructed by kernel estimation methods after establishing a connection with the uniform deconvolution model. An advantage of this derivation is the possibility to extend the theory to bi- or multivariate current status data more naturally. This is a subject for further research. Extending the maximum likelihood estimator to the bivariate current status model is still a tedious problem, as mentioned in [Groeneboom (2013)].
We start this thesis in Chapter 2 with a summary of the main results and theorems. As mentioned above, the way the density estimator is constructed is a result in itself, and therefore a short explanation of the steps between the results and theorems is already given there. In Chapter 3 the derivation of the estimator and the theorems are presented again. In this chapter all results, theorems and lemmas are proven and explained in more depth. For readability some lemmas are omitted there and proven in Appendix A. Chapter 4 is devoted to simulation of the constructed estimator. In this chapter some problems beyond the theory in this thesis are simulated as well. In Chapter 5 the results of Chapter 3 are discussed and compared with existing theory on the subject. In this way one gets insight into subjects of further research, and some suggestions for such research are made.
1In this problem we assume that the members of our population are all infected during their lifetime and
that this infection causes illness. Data can be drawn from a population of recent and former patients. The age at which the first diagnostic test is performed can be taken as data points for Ti.
Chapter 2
Construction and results
The uniform deconvolution problem
Our first result will be Lemma 3.1 which demonstrates the relationship between UCSD and data of the uniform deconvolution problem. A short sketch of this problem is given below. First we describe the general deconvolution problem. Next we look at the special case of uniform deconvolution.
Let Y1, . . . , Yn be i.i.d. observations, where Yi = Xi + Zi and Xi and Zi are independent. Assume that the unobservable Xi have distribution function F and density f. Also assume that the unobservable random variables Zi have a known density k. Note that the density g of Yi is equal to the convolution of f and k, so g = k ∗ f where ∗ denotes convolution. So we have

g(x) = ∫_{−∞}^{∞} k(x − u) f(u) du.

The general deconvolution problem is the problem of estimating f or F from the observations Y1, . . . , Yn. In the uniform deconvolution problem we require Zi to be U[0, 1) distributed. In this case k(x) = I[0,1)(x) and hence we have

g(x) = ∫_{−∞}^{∞} I[0,1)(x − u) f(u) du = F(x) − F(x − 1). (2.1)
In the next item we present a similar equality between the distribution function F of the unobservable variables X1, . . . , Xn in the UCSD problem and the density of variables
V1, . . . , Vn, obtained from the observed UCSD.
Transformation of UCSD to uniform deconvolution data (Section 3.1)
In the UCSD problem our variables of interest are X1, . . . , Xn. The exact value of Xi is not measured. Instead of X1, . . . , Xn we observe the corresponding Ti and we observe ∆i = I[Xi≤Ti]. The observations are presented as (T1, ∆1), . . . , (Tn, ∆n).
In Section 3.1 we consider the following transformation of the points Ti,

Vi = Ti + 1, if ∆i = 0,
Vi = Ti, if ∆i = 1.
The first main lemma is stated below.
Lemma 3.1. Assume that the distribution of the variables Xi is concentrated on [0, 1]. Assume that the Ti have a density q. If Ti is supported on [0, 1], then the Vi have a density g, given by

g(v) = q̄(v)(F(v) − F(v − 1)), v ∈ [0, 2], (2.2)

with the function q̄ defined by

q̄(v) = q(v) + q(v − 1). (2.3)
This lemma reveals the connection between the UCSD problem and the uniform deconvolution model described above. As can be derived from equation (2.1), the UCSD problem is exactly a uniform deconvolution problem if the variables Ti are uniformly distributed on [0, 1], since then q ≡ 1 on [0, 1]. In this case Vi is equal in distribution to Xi + Zi, with Zi ∼ U[0, 1) independent of Xi.
The result in Lemma 3.1 is important as it serves as the start of our construction of the density estimator for the UCSD problem.
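The transformation and Lemma 3.1 can be illustrated with a small simulation. All distributional choices below (Xi ∼ Beta(2, 3), Ti ∼ U[0, 1], so that q ≡ 1 and q̄ ≡ 1 on the relevant intervals) are hypothetical and only serve to make the density relation g(v) = F(v) − F(v − 1) checkable numerically; they are not prescribed by the thesis.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Hypothetical setting: X_i ~ Beta(2, 3) on [0, 1], T_i ~ U[0, 1].
x = rng.beta(2.0, 3.0, size=n)        # unobservable variables of interest
t = rng.uniform(0.0, 1.0, size=n)     # observation "times"
delta = x <= t                        # current status Delta_i = 1{X_i <= T_i}
v = np.where(delta, t, t + 1.0)       # V_i = T_i if Delta_i = 1, else T_i + 1

def F(u):
    # Beta(2, 3) distribution function: F(u) = 6u^2 - 8u^3 + 3u^4 on [0, 1]
    return 6 * u**2 - 8 * u**3 + 3 * u**4

def emp_density(v0, width=0.1):
    # empirical density of V in a small bin around v0
    return np.mean(np.abs(v - v0) < width / 2) / width

# Lemma 3.1 with q == 1: g(v) = F(v) on [0, 1] and g(v) = 1 - F(v - 1) on [1, 2].
g_left, g_theory_left = emp_density(0.5), F(0.5)            # 0.6875
g_right, g_theory_right = emp_density(1.5), 1.0 - F(0.5)    # 0.3125
```

The fraction of ones among the ∆i also matches E ∆ = ∫ F(t)q(t) dt, which is 0.6 in this hypothetical setting.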
Two inversion formulas (Section 3.2)
Several methods are known to analyse the uniform deconvolution model. Our analysis of the transformed UCSD problem is inspired by the method in [van Es (2011)]. The first step in our approach is inversion of equation (2.2). Under appropriate assumptions on the functions F and q we derive two inversion formulas for both F and f. For x ∈ [0, 1] we have

F(x) = g(x)/q(x)  and  F(x) = 1 − g(x + 1)/q(x).

Taking derivatives1 we get

f(x) = d/dx (g(x)/q(x)) = g'(x)/q(x) − q'(x)g(x)/q(x)²

and

f(x) = d/dx (1 − g(x + 1)/q(x)) = −g'(x + 1)/q(x) + q'(x)g(x + 1)/q(x)².
The inversion formulas for F are derived from equation (2.2). Hence the two inversions yield the same F and f if g is of the form (2.2). However, for an arbitrary g that is not of the form (2.2), for example an estimator of g, the inversions will in general not coincide. They need not even yield distribution functions or densities.
Construction of two estimators of the density function (Section 3.3)
From the above equations we construct our first two density estimators for f. The construction is achieved by plugging in a standard kernel density estimator g_nh for the density g. We denote the estimators by f^-_nh and f^+_nh, respectively called the left and right estimator. For x ∈ [0, 1] we have

f^-_nh(x) = g'_nh(x)/q(x) − q'(x)g_nh(x)/q(x)² (2.4)

and

f^+_nh(x) = −g'_nh(x + 1)/q(x) + q'(x)g_nh(x + 1)/q(x)², (2.5)

where for x ∈ [0, 2],

g_nh(x) = (1/n) Σ_{i=1}^n (1/h) w((x − Vi)/h),

a standard kernel estimator with kernel function w and bandwidth h > 0.

Expectation and variance of the left and right estimator (Section 3.4)
In Section 3.4 we derive expansions for the expectation and variance of the left and right estimator given in equations (2.4) and (2.5). The results, without the full assumptions, are given in the following theorems. The only assumption stated here is the standard assumption on the sequence of bandwidths h = hn, namely that h → 0 and nh³ → ∞ as n → ∞.
For the left estimator we have

Theorem 3.4.
E f^-_nh(x) = f(x) + ½ h² ∫ v²w(v) dv b^-(x) + o(h²),
Var f^-_nh(x) = (1/q(x)) (1/(nh³)) F(x) ∫ w'(u)² du + o(1/(nh³)),

where

b^-(x) = g'''(x)/q(x) − q'(x)g''(x)/q(x)².
For the right estimator we have

Theorem 3.5.
E f^+_nh(x) = f(x) + ½ h² ∫ v²w(v) dv b^+(x) + o(h²),
Var f^+_nh(x) = (1/q(x)) (1/(nh³)) (1 − F(x)) ∫ w'(u)² du + o(1/(nh³)),

where

b^+(x) = −g'''(x + 1)/q(x) + q'(x)g''(x + 1)/q(x)².
Theorem 3.4 and Theorem 3.5, together with the standard assumptions on the sequence h = hn, ensure that f^-_nh and f^+_nh are both weakly consistent estimators of f.
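The plug-in construction above can be sketched in code. The setting is hypothetical (Xi ∼ Beta(2, 3), Ti ∼ U[0, 1], so q ≡ 1 and q' ≡ 0 on [0, 1]) and the biweight kernel w(u) = (15/16)(1 − u²)² is one concrete choice satisfying Condition W; none of these choices are prescribed by the thesis.

```python
import numpy as np

rng = np.random.default_rng(1)
n, h = 500_000, 0.12

# Hypothetical data: X_i ~ Beta(2, 3), T_i ~ U[0, 1]; transformed data V_i.
x = rng.beta(2.0, 3.0, size=n)
t = rng.uniform(0.0, 1.0, size=n)
v = np.where(x <= t, t, t + 1.0)

# Biweight kernel (symmetric C^1 density on [-1, 1]) and its derivative.
w  = lambda u: np.where(np.abs(u) <= 1, 15 / 16 * (1 - u**2) ** 2, 0.0)
dw = lambda u: np.where(np.abs(u) <= 1, -15 / 4 * u * (1 - u**2), 0.0)

def g_nh(y):          # kernel estimator of the density g of V
    return np.mean(w((y - v) / h)) / h

def dg_nh(y):         # its derivative g'_nh
    return np.mean(dw((y - v) / h)) / h**2

def f_left(y, q=1.0, dq=0.0):    # (2.4), with q == 1 and q' == 0 here
    return dg_nh(y) / q - dq * g_nh(y) / q**2

def f_right(y, q=1.0, dq=0.0):   # (2.5)
    return -dg_nh(y + 1.0) / q + dq * g_nh(y + 1.0) / q**2

# Both should estimate f(0.5) = 12 * 0.5 * (1 - 0.5)^2 = 1.5 for Beta(2, 3).
est_left, est_right = f_left(0.5), f_right(0.5)
```

With this sample size and bandwidth both estimates land close to f(0.5) = 1.5, with the right estimator somewhat less variable here since 1 − F(0.5) < F(0.5), in line with Theorems 3.4 and 3.5.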
Combination of the left and right estimator; Expectation, variance and asymptotic properties (Sections 3.5, 3.6 and 3.7)
In Section 3.4 we are led to the idea of a convex linear combination of f^-_nh and f^+_nh to obtain a new estimator with lower variance than both f^-_nh and f^+_nh. The new estimator is defined as

f^t_nh(x) = t f^-_nh(x) + (1 − t) f^+_nh(x),

for some fixed 0 ≤ t ≤ 1.
The expectation and variance of f^t_nh are stated without assumptions in the following theorem. The same standard assumptions on the sequence h = hn are imposed.

Theorem 3.6.
E f^t_nh(x) = f(x) + ½ h² ∫ v²w(v) dv (t b^-(x) + (1 − t) b^+(x)) + o(h²),
Var f^t_nh(x) = (1/q(x)) (1/(nh³)) (t² F(x) + (1 − t)² (1 − F(x))) ∫ w'(u)² du + o(1/(nh³)). (2.6)
In particular it turns out that for uniformly distributed variables Ti on [0, 1], i.e. q ≡ 1 on [0, 1], we have b^-(x) = b^+(x) = f''(x). In this case the expectation of f^t_nh does not depend on t. The fact that for a general density q the bias of f^t_nh depends on t is an essential difference from the uniform deconvolution problem.
Note that Theorem 3.6 shows that an MSE optimal bandwidth h is of order n^(−1/7), since it minimizes the sum of the squared bias and the variance, which are of order h⁴ and 1/(nh³) respectively.
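This rate can be checked numerically: minimizing a proxy asymptotic MSE of the form B²h⁴ + V/(nh³) over a grid recovers a minimizer of order n^(−1/7). The constants B² and V below are placeholders; their values do not affect the rate.

```python
import numpy as np

B2, V = 2.0, 5.0   # placeholder squared-bias and variance constants

def amse(h, n):
    # proxy asymptotic MSE: squared bias of order h^4 plus variance 1/(n h^3)
    return B2 * h**4 + V / (n * h**3)

def h_opt(n):
    # crude grid minimizer of amse(., n)
    hs = np.linspace(1e-3, 1.0, 200_000)
    return hs[np.argmin(amse(hs, n))]

# Closed form: h* = (3V / (4 B2 n))^(1/7), so multiplying n by 128 = 2^7
# should halve the optimal bandwidth.
ratio = h_opt(128 * 10_000) / h_opt(10_000)
```

The grid minimizer agrees with the closed-form h* = (3V/(4B²n))^(1/7), and the ratio is 1/2 up to grid resolution, confirming the n^(−1/7) order.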
Again, the standard assumptions on the bandwidths h = hn and the expansions (2.6) ensure that f^t_nh is a weakly consistent estimator of f. The next main theorem states asymptotic normality of f^t_nh.

Theorem 3.8. As n → ∞, h → 0 and nh³ → ∞, we have

√(nh³) (f^t_nh(x) − E f^t_nh(x)) →_D N(0, σ²_t)

with

σ²_t = (1/q(x)) (t² F(x) + (1 − t)² (1 − F(x))) ∫ w'(u)² du.
Derivation of the optimal estimator with respect to t (Section 3.8)
Note that, at this moment, we have found a consistent asymptotically normal estimator of f for every t with 0 ≤ t ≤ 1. In Section 3.8 we focus on finding an optimal t for f^t_nh. We define t* ∈ [0, 1] to be the value of t that minimizes the asymptotic mean squared error of f^t_nh with respect to t. For t* we find

t* = (1 − F(x) − nh⁷ C(w) b^+(x) q̃(x) q(x)) / (1 + nh⁷ C(w) q̃²(x) q(x)) + o(1),

where

q̃(x) = q'''(x)/q(x) − q'(x)q''(x)/q(x)²

and

C(w) = (1/4) (∫ v²w(v) dv)² / ∫ w'(u)² du.

Note that for q ≡ 1 on [0, 1] we have t* = 1 − F(x) + o(1).
Discussion on the optimal value of t (Section 3.9)
In Section 3.9 we discuss the value of t*. Note that t* depends on the bandwidth h. For an MSE optimal choice of h, i.e. h ∼ cn^(−1/7), c > 0, we have

t* ∼ (1 − F(x) − C(w) b^+(x) q̃(x) q(x)) / (1 + C(w) q̃²(x) q(x)).

For a super optimal choice of h, i.e. h ≫ n^(−1/7), we have t* ∼ b^+(x)/(b^+(x) − b^-(x)). For a sub optimal choice of h, i.e. h ≪ n^(−1/7), we have t* ∼ 1 − F(x).
Asymptotic normality of the final estimator (Section 3.10)
In all three cases described above, t* and hence f^{t*}_nh depend on F. Hence f^{t*}_nh is not applicable as an estimator of f. We call f^{t*}_nh a semi-estimator. In the case of (super) optimal h the constructed f^{t*}_nh is a semi-estimator as well: it depends on (F and) the higher derivatives of g through b^+(x).
In the sequel we take t = 1 − F̂n, where F̂n is an estimator of F on [0, 1] that is consistent in MSE. We define our final estimator f_nh as f^{1−F̂n}_nh. Hence we have

f_nh(x) = (1 − F̂n(x)) f^-_nh(x) + F̂n(x) f^+_nh(x).
The main theorem of this thesis proves asymptotic normality and gives expansions of the bias and variance of the final estimator for h = O(n^(−1/7)). It also shows that the final estimator is t-optimal for sub optimal h, i.e. h ≪ n^(−1/7).

Theorem 3.11. Assume that Condition W on w in Section 3.4 is satisfied, that f is twice continuously differentiable, that q is three times continuously differentiable on a neighbourhood of x, and that F̂n(x) is an estimator of F(x) with

E (F̂n(x) − F(x))² → 0. (2.7)

Then, as n → ∞ and h = O(n^(−1/7)), we have nh³ → ∞, h → 0, and for 0 < x < 1

√(nh³) (f_nh(x) − E f_nh(x)) →_D N(0, σ²),

with

σ² = F(x)(1 − F(x)) (1/q(x)) ∫ w'(u)² du.

Furthermore, if

E (F̂n(x) − F(x))² = o(nh⁷) (2.8)

then

E f_nh(x) = f(x) + ½ h² ∫ v²w(v) dv ((1 − F(x)) b^-(x) + F(x) b^+(x)) + o(h²).

If h ≪ n^(−1/7) then f_nh(x) is t-optimal and we have

√(nh³) (f_nh(x) − f(x)) →_D N(0, σ²), as n → ∞.

By t-optimal we mean that, given the sequence of bandwidths, the choice of t is the MSE optimal one.
Consistent estimator of the distribution function F (Section 3.11)
In Section 3.11 we define the estimator F^t_nh(x) of F(x) for x ∈ [0, 1] by

F^t_nh(x) = t F^-_nh(x) + (1 − t) F^+_nh(x),

with

F^-_nh(x) = g_nh(x)/q(x)  and  F^+_nh(x) = 1 − g_nh(x + 1)/q(x),

for some fixed 0 ≤ t ≤ 1.
The expansions of the expectation and the variance of F^t_nh are stated in the following theorem. The assumptions are the same as in Theorem 3.11.

Theorem 3.13. Under the assumptions on w, f and q in Theorem 3.11 we have

E F^t_nh(x) = F(x) + (1/q(x)) ½ h² (t g''(x) − (1 − t) g''(x + 1)) ∫ v²w(v) dv + o(h²) (2.9)

and

Var F^t_nh(x) = (1/q(x)) (1/(nh)) (t² F(x) + (1 − t)² (1 − F(x))) ∫ w(v)² dv + o(1/(nh)). (2.10)

With the help of Theorem 3.13 we prove in Section 3.11 that F^t_nh is consistent in MSE for all 0 ≤ t ≤ 1. Hence the estimator F^t_nh meets the requirement of Theorem 3.11.
In the simulations in Chapter 4 we choose for F̂n, introduced in Theorem 3.11, the estimator F^t_nh with t = 1/2. In other words, we define for x ∈ [0, 1],

F̂n(x) = ½ F^-_nh(x) + ½ F^+_nh(x).
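The whole plug-in construction of this chapter can be sketched end to end: estimate F by the average of the left and right distribution estimators, then combine the left and right density estimators with t = 1 − F̂n. The distributional choices (Xi ∼ Beta(2, 3), Ti ∼ U[0, 1], so q ≡ 1 and q' ≡ 0) and the biweight kernel are hypothetical illustrations.

```python
import numpy as np

rng = np.random.default_rng(2)
n, h = 500_000, 0.12

# Hypothetical current status data, transformed to V_i.
x = rng.beta(2.0, 3.0, size=n)
t = rng.uniform(0.0, 1.0, size=n)
v = np.where(x <= t, t, t + 1.0)

# Biweight kernel and its derivative (one choice satisfying Condition W).
w  = lambda u: np.where(np.abs(u) <= 1, 15 / 16 * (1 - u**2) ** 2, 0.0)
dw = lambda u: np.where(np.abs(u) <= 1, -15 / 4 * u * (1 - u**2), 0.0)

g_nh  = lambda y: np.mean(w((y - v) / h)) / h      # kernel estimator of g
dg_nh = lambda y: np.mean(dw((y - v) / h)) / h**2  # its derivative

def final_estimator(y):
    # q == 1, q' == 0 in this hypothetical setting
    F_left, F_right = g_nh(y), 1.0 - g_nh(y + 1.0)   # inversions (3.5), (3.6)
    Fhat = 0.5 * (F_left + F_right)                  # MSE-consistent plug-in
    f_left, f_right = dg_nh(y), -dg_nh(y + 1.0)      # left and right estimators
    return Fhat, (1.0 - Fhat) * f_left + Fhat * f_right

Fhat, fhat = final_estimator(0.5)
# For Beta(2, 3): F(0.5) = 0.6875 and f(0.5) = 1.5.
```

The distribution estimator converges at the faster 1/(nh) variance rate of Theorem 3.13, so its error is negligible next to that of the density estimator, as required by condition (2.7).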
Chapter 3
Detailed derivation of the results
3.1 Univariate current status data
We consider again the univariate current status data problem, or Type I interval censoring problem. In this estimation problem one observes at i.i.d. random time instants T1, . . . , Tn whether unobservable variables of interest X1, . . . , Xn are smaller or larger than the corresponding Ti. If we write ∆i = I[Xi≤Ti], the observations are (T1, ∆1), . . . , (Tn, ∆n). The exact value of Xi is not measured. Based on these data, under some regularity conditions, one can estimate the distribution of the Xi.
Note that the probabilities for ∆i, given the value of Ti, are given by

P(∆i = 0 | Ti = ti) = 1 − F(ti),
P(∆i = 1 | Ti = ti) = F(ti). (3.1)

Consider the following transformation of the points Ti,

Vi = Ti + 1, if ∆i = 0,
Vi = Ti, if ∆i = 1. (3.2)
Lemma 3.1. Assume that the distribution of the variables Xi is concentrated on [0, 1]. Assume that the Ti have a density q. If Ti is supported on [0, 1], then the Vi have a density g, given by

g(v) = q̄(v)(F(v) − F(v − 1)), v ∈ [0, 2], (3.3)

with the function q̄ defined by

q̄(v) = q(v) + q(v − 1). (3.4)
Proof
We omit the subscript i. For v ∈ [0, 1] we have

P(V ≤ v) = Σ_{k=0}^{1} P(V ≤ v, ∆ = k) = P(V ≤ v, ∆ = 1)
  = ∫_0^1 P(V ≤ v, ∆ = 1 | T = t) q(t) dt
  = ∫_0^1 P(T ≤ v, ∆ = 1 | T = t) q(t) dt
  = ∫_0^v P(∆ = 1 | T = t) q(t) dt
  = ∫_0^v F(t) q(t) dt.

By the support restriction on the distribution induced by F and q this confirms (3.3) for the given values of v.
Let us also check the claim on the interval [1, 2]. For u ∈ [0, 1] we have

P(1 ≤ V ≤ 1 + u) = P(1 ≤ V ≤ 1 + u, ∆ = 0) + P(1 ≤ V ≤ 1 + u, ∆ = 1)
  = P(1 ≤ V ≤ 1 + u, ∆ = 0)
  = ∫_0^1 P(1 ≤ V ≤ 1 + u, ∆ = 0 | T = t) q(t) dt
  = ∫_0^1 P(T ≤ u, ∆ = 0 | T = t) q(t) dt
  = ∫_0^u P(∆ = 0 | T = t) q(t) dt
  = ∫_0^u (1 − F(t)) q(t) dt
  = ∫_1^{1+u} (1 − F(t − 1)) q(t − 1) dt.

Again by the support restriction on the distribution induced by F and q this confirms (3.3) for the values of v in [1, 2]. 2
This lemma reveals a connection between the UCSD problem and the uniform deconvolution problem. The UCSD problem is equal to the uniform deconvolution problem if the variables Ti have a uniform distribution on [0, 1], since then q ≡ 1 on [0, 1]. In this case Vi is equal in distribution to Xi + Zi, with Zi ∼ U[0, 1) independent of Xi.
3.2 Inversion formulas
We deduce the following inversion formulas from Lemma 3.1.
Lemma 3.2. If g is of the form (3.3) and q is strictly positive on [0, 1], we have for x ∈ [0, 1]

F(x) = g(x)/q(x), (3.5)
F(x) = 1 − g(x + 1)/q(x). (3.6)

Furthermore, if we assume that F and q are differentiable, we have for 0 < x < 1

f(x) = g'(x)/q(x) − q'(x)g(x)/q(x)², (3.7)
f(x) = −g'(x + 1)/q(x) + q'(x)g(x + 1)/q(x)². (3.8)

Proof
Note that v ∈ [0, 1] implies F(v − 1) = 0 and q(v − 1) = 0. Hence for v ∈ [0, 1] equation (3.3) turns into g(v) = F(v)q(v). Rewriting this gives the first inversion formula (3.5).
Also, v ∈ [1, 2] implies F(v) = 1 and q(v) = 0. Hence for v ∈ [1, 2] equation (3.3) turns into g(v) = (1 − F(v − 1))q(v − 1). Rewriting this gives F(v − 1) = 1 − g(v)/q(v − 1). Substituting x for v − 1 gives the second inversion formula (3.6) for x ∈ [0, 1].
Differentiating these formulas yields the two formulas (3.7) and (3.8). 2

From now on we assume the density q to be differentiable and strictly positive on [0, 1]. We also assume that the distribution function F has a density f.
3.3 Construction of two estimators of the density function
Our aim is to construct an ‘optimal’1 estimator for the density function f of the unobservable variables of interest X1, . . . , Xn. In this section we start with the construction of two different estimators using the inversion formulas in Lemma 3.2.
The inversion formulas in Lemma 3.2 yield equal F(x) and f(x) if g is of the form (3.3). However, for an arbitrary g that is not of the form (3.3), for example an estimator of g, the inversions will in general not coincide. They need not even yield distribution functions or densities.
We now derive two different estimators of f(x) from (3.7) and (3.8) in the following way. For g(x) we substitute the kernel density estimator with differentiable kernel function w and bandwidth h > 0. For x ∈ [0, 2] this yields

g_nh(x) = (1/n) Σ_{i=1}^n (1/h) w((x − Vi)/h). (3.9)

For g'(x) we substitute the derivative of this kernel estimator,

g'_nh(x) = (1/n) Σ_{i=1}^n (1/h²) w'((x − Vi)/h).

In this way we derive two different estimators of f(x) from (3.7) and (3.8). We call the estimators respectively the left and right estimator, written f^-_nh and f^+_nh. We have

f^-_nh(x) = g'_nh(x)/q(x) − q'(x)g_nh(x)/q(x)² (3.10)

and

f^+_nh(x) = −g'_nh(x + 1)/q(x) + q'(x)g_nh(x + 1)/q(x)². (3.11)
Note that these estimators coincide with the estimators in the uniform deconvolution model as obtained in [van Es (2011)] when the Ti are taken to be uniform, i.e. q ≡ 1 on [0, 1].
Our next goal is to show that f^-_nh(x) and f^+_nh(x) are consistent estimators of f(x). We are therefore interested in the expectation and variance of these estimators.
3.4 Expectation and variance of the left and right estimator
In this section we derive expansions of the expectation and variance of the left and right estimator. In our derivation we need expansions of the expectation and variance of the kernel density estimator g_nh as well as of its derivative.
We first impose the following condition on the kernel function w in gnh.
Condition W
The function w is a continuously differentiable symmetric probability density function with support [-1,1].
The asymptotics for g_nh and its derivative are given in the following lemma.

Lemma 3.3. Assume that the density g is bounded or integrable. Assume also that g is twice differentiable with continuous and bounded g''. If the kernel function w satisfies Condition W, then we have

E g_nh(x) = g(x) + ½ h² g''(x) ∫ v²w(v) dv + o(h²),
Var g_nh(x) = (1/(nh)) g(x) ∫ w(u)² du + o(1/(nh)). (3.12)

If furthermore g is three times differentiable with continuous and bounded g''', then we have

E g'_nh(x) = g'(x) + ½ h² g'''(x) ∫ v²w(v) dv + o(h²),
Var g'_nh(x) = (1/(nh³)) g(x) ∫ w'(u)² du + o(1/(nh³)). (3.13)
Proof
The equations in (3.12) are standard asymptotic properties of the kernel estimator and the proof of these expansions can be found in [Silverman (1986)]. The proofs of the equations (3.13) are however not commonly known. We therefore prove (3.13) below. We start with the expansion of E g'_nh(x). In this expansion Lemma A.1 is used twice. Lemma A.1 and its proof are given in full detail in Section A.1. For E g'_nh(x) we have

E g'_nh(x) = E (1/n) Σ_{i=1}^n (1/h²) w'((x − Vi)/h)
  = (1/h²) E w'((x − V1)/h)
  = (1/h²) ∫ w'((x − t)/h) g(t) dt
  = (1/h²) ( [−h w((x − t)/h) g(t)]_{−∞}^{∞} + h ∫ w((x − t)/h) g'(t) dt )
  = (1/h) ∫ w((x − t)/h) g'(t) dt
  = E_{w,g'}(x, h)   (use Lemma A.1)
  = g'(x) ∫ w(−u) du + ½ h² g'''(x) ∫ u²w(u) du + o(h²)
  = g'(x) + ½ h² g'''(x) ∫ u²w(u) du + o(h²).

In the penultimate equation we used Lemma A.1² in Section A.1. In the ultimate equation we used that ∫ w(u) du = 1 and the fact that w is symmetric.
We continue the proof with the expansion of the variance of g'_nh(x) stated in (3.13). In this expansion Lemma A.1 is used again. Our first step is to rewrite E g'_nh(x)². We have

E g'_nh(x)² = E ( (1/n) Σ_{i=1}^n (1/h²) w'((x − Vi)/h) )²
  = E ( (1/(n²h⁴)) Σ_{i=1}^n w'((x − Vi)/h)² + (1/(n²h⁴)) Σ_{i≠j} w'((x − Vi)/h) w'((x − Vj)/h) )
  = (1/(nh⁴)) E w'((x − V1)/h)² + ((n − 1)/n) ( (1/h²) E w'((x − V1)/h) )²
  = (1/(nh³)) E (1/h) w'((x − V1)/h)² + ((n − 1)/n) ( (1/h²) E w'((x − V1)/h) )².

The variance of g'_nh(x) can now be written as follows.

Var g'_nh(x) = E g'_nh(x)² − (E g'_nh(x))²
  = (1/(nh³)) E (1/h) w'((x − V1)/h)² + ((n − 1)/n) ( (1/h²) E w'((x − V1)/h) )² − ( (1/h²) E w'((x − V1)/h) )²
  = (1/(nh³)) E (1/h) w'((x − V1)/h)² − (1/n) ( (1/h²) E w'((x − V1)/h) )².

We use Lemma A.1 to expand the first term³ and we note that the second term equals (1/n)(E g'_nh(x))².
²Note that the derivative of the density g and the kernel w satisfy Conditions C and K in Section A.1. Note that s = 2 because w is a symmetric density function. These considerations justify the use of Lemma A.1.
³Note that E (1/h) w'((x − V1)/h)² = E_{w'²,g}(x, h). The density g satisfies Condition C. The function w'² satisfies Condition K for s = 2 by the following arguments. First of all, sup_{−∞<u<∞} |w'(u)²| < ∞, because w'(u) is bounded. Secondly, ∫ u w'(u)² du = 0, because w'(u) is odd by the symmetry of w(u), so u w'(u)² is odd as well. Finally, ∫ u²w'(u)² du ≠ 0, because u² and w'(u)² are even and there exists an interval on which w'(u)² is strictly positive.
Together we have

Var g'_nh(x) = (1/(nh³)) ( g(x) ∫ w'(u)² du + ½ h² g''(x) ∫ u²w'(u)² du + o(h²) )
  − (1/n) ( g'(x) + ½ h² g'''(x) ∫ v²w(v) dv + o(h²) )²
  = (1/(nh³)) g(x) ∫ w'(u)² du + o(1/(nh³)). 2
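A small Monte Carlo experiment makes the leading term of the variance expansion (3.13) visible. The setting is a hypothetical illustration (Xi ∼ Beta(2, 3), Ti ∼ U[0, 1], biweight kernel, for which ∫ w'(u)² du = 15/7 and g(0.5) = F(0.5) = 0.6875); none of it is prescribed by the thesis.

```python
import numpy as np

rng = np.random.default_rng(3)
n, h, reps, x0 = 2_000, 0.2, 600, 0.5

# Derivative of the biweight kernel w(u) = (15/16)(1 - u^2)^2.
dw = lambda u: np.where(np.abs(u) <= 1, -15 / 4 * u * (1 - u**2), 0.0)

# Replicate g'_nh(x0) on independent samples of transformed data V_i.
ests = []
for _ in range(reps):
    x = rng.beta(2.0, 3.0, size=n)
    t = rng.uniform(0.0, 1.0, size=n)
    v = np.where(x <= t, t, t + 1.0)
    ests.append(np.mean(dw((x0 - v) / h)) / h**2)

sample_var = np.var(ests, ddof=1)
theory_var = 0.6875 * (15 / 7) / (n * h**3)   # g(x0) * int w'(u)^2 du / (n h^3)
ratio = sample_var / theory_var
```

The sample variance over the replications matches g(x0) ∫ w'(u)² du / (nh³) up to the o(1/(nh³)) correction terms and Monte Carlo noise.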
The results in Lemma 3.3 enable us to calculate E f^-_nh(x) and Var f^-_nh(x). The next theorem states the expansions of the expectation and variance of f^-_nh(x).
Theorem 3.4. Assume that the kernel function w satisfies Condition W, that the density f is twice continuously differentiable, and that the density q is three times continuously differentiable. Then we have for 0 < x < 1

E f^-_nh(x) = f(x) + ½ h² ∫ v²w(v) dv b^-(x) + o(h²), (3.14)

and

Var f^-_nh(x) = (1/q(x)) (1/(nh³)) F(x) ∫ w'(u)² du + o(1/(nh³)), (3.15)

where

b^-(x) = g'''(x)/q(x) − q'(x)g''(x)/q(x)². (3.16)

Proof
The assumptions on f and q imply that a function g of the form (3.3) is three times differentiable with continuous and bounded third derivative g''' on the intervals (0, 1) and (1, 2). Hence the assumptions of Lemma 3.3 are in particular satisfied on the interval (0, 1) for g of the form (3.3). With this in mind we first prove (3.14). We have

E f^-_nh(x) = E ( g'_nh(x)/q(x) − q'(x)g_nh(x)/q(x)² )   (by (3.10))
  = (1/q(x)) E g'_nh(x) − (q'(x)/q(x)²) E g_nh(x)
  = (1/q(x)) ( g'(x) + ½ h² g'''(x) ∫ v²w(v) dv + o(h²) ) − (q'(x)/q(x)²) ( g(x) + ½ h² g''(x) ∫ v²w(v) dv + o(h²) )
  = f(x) + ½ h² ∫ v²w(v) dv ( g'''(x)/q(x) − q'(x)g''(x)/q(x)² ) + o(h²).
We continue with the expression for Var f^-_nh(x). We have

Var f^-_nh(x) = Var ( g'_nh(x)/q(x) − q'(x)g_nh(x)/q(x)² )
  = (1/q(x)²) Var g'_nh(x) + (q'(x)²/q(x)⁴) Var g_nh(x) − 2 (1/q(x)) (q'(x)/q(x)²) Cov(g'_nh(x), g_nh(x)).

We consider the second and third term in Var f^-_nh(x) separately. Notice that Var g_nh(x) ≪ Var g'_nh(x), because 1/(nh) ≪ 1/(nh³). Hence the second term is negligible. By the Cauchy-Schwarz inequality we have

Cov(g'_nh(x), g_nh(x)) ≤ √(Var g'_nh(x)) √(Var g_nh(x)) ≪ √(Var g'_nh(x)) √(Var g'_nh(x)) = Var g'_nh(x).

Hence the third term is negligible. We may therefore conclude

Var f^-_nh(x) = (1/q(x)²) Var g'_nh(x) + o(Var g'_nh(x))
  = (1/q(x)²) (1/(nh³)) g(x) ∫ w'(u)² du + o(1/(nh³))
  = (1/q(x)) (1/(nh³)) F(x) ∫ w'(u)² du + o(1/(nh³)),

where in the last step we used that g(x) = F(x)q(x) for x ∈ [0, 1]. 2
We will now focus on f^+_nh(x). The next step is to find expansions of E f^+_nh(x) and Var f^+_nh(x). The results and their derivation are similar to those for f^-_nh(x).
Theorem 3.5. Assume that the kernel function w satisfies Condition W, that the density f is twice continuously differentiable, and that the density q is three times continuously differentiable. Then we have for 0 < x < 1

E f^+_nh(x) = f(x) + ½ h² ∫ v²w(v) dv b^+(x) + o(h²), (3.17)

and

Var f^+_nh(x) = (1/q(x)) (1/(nh³)) (1 − F(x)) ∫ w'(u)² du + o(1/(nh³)), (3.18)

where

b^+(x) = −g'''(x + 1)/q(x) + q'(x)g''(x + 1)/q(x)². (3.19)

Proof
By a similar reasoning as in the proof of Theorem 3.4 the assumptions of Lemma 3.3 are now in particular satisfied on the interval (1, 2) for g of the form (3.3). With this in mind we first prove equation (3.17). We have

E f^+_nh(x) = E ( −g'_nh(x + 1)/q(x) + q'(x)g_nh(x + 1)/q(x)² )   (by (3.11))
  = −(1/q(x)) E g'_nh(x + 1) + (q'(x)/q(x)²) E g_nh(x + 1)
  = −(1/q(x)) ( g'(x + 1) + ½ h² g'''(x + 1) ∫ v²w(v) dv + o(h²) ) + (q'(x)/q(x)²) ( g(x + 1) + ½ h² g''(x + 1) ∫ v²w(v) dv + o(h²) )
  = f(x) + ½ h² ∫ v²w(v) dv ( −g'''(x + 1)/q(x) + q'(x)g''(x + 1)/q(x)² ) + o(h²).
We continue with the expression for Var f^+_nh(x). We have

Var f^+_nh(x) = Var ( −g'_nh(x + 1)/q(x) + q'(x)g_nh(x + 1)/q(x)² )
  = (1/q(x)²) Var g'_nh(x + 1) + (q'(x)²/q(x)⁴) Var g_nh(x + 1) − 2 (1/q(x)) (q'(x)/q(x)²) Cov(g'_nh(x + 1), g_nh(x + 1)).

By the same reasoning as in the proof of Theorem 3.4 we have that Var g_nh(x + 1) ≪ Var g'_nh(x + 1). This result and the Cauchy-Schwarz inequality give Cov(g'_nh(x + 1), g_nh(x + 1)) ≪ Var g'_nh(x + 1). We may therefore conclude

Var f^+_nh(x) = (1/q(x)²) Var g'_nh(x + 1) + o(Var g'_nh(x + 1))
  = (1/q(x)²) (1/(nh³)) g(x + 1) ∫ w'(u)² du + o(1/(nh³))
  = (1/q(x)) (1/(nh³)) (1 − F(x)) ∫ w'(u)² du + o(1/(nh³)),

where in the last step we used that g(x + 1) = (1 − F(x))q(x) for x ∈ [0, 1]. 2
3.5 Combination of the left and right estimator
The left estimator has a smaller variance than the right estimator for small values of x, because of the factors F(x) and 1 − F(x); for large values of x the converse is true. To obtain an estimator with smaller variance than both the left and the right estimator, it makes sense to construct a convex linear combination of the two estimators. We define the combined estimator by

f^t_nh(x) = t f^-_nh(x) + (1 − t) f^+_nh(x),

for some fixed 0 ≤ t ≤ 1.
One could wonder whether the bias of this combined estimator will not be bigger than the bias of f^-_nh(x) and/or f^+_nh(x). By linearity of the bias, the bias of the new estimator for t ∈ (0, 1) lies somewhere between the bias of f^-_nh(x) and that of f^+_nh(x); the exact value depends on t. In Section 3.8 we will find an optimal value for t, which we call t*. The value t* is optimal in the sense that it minimizes the mean squared error of f^t_nh(x) with respect to t. It is possible that f^{t*}_nh(x) has a larger bias than f^-_nh(x) or f^+_nh(x), but it certainly has a lower or equal mean squared error.
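Ignoring the bias for a moment, the variance factor t²F(x) + (1 − t)²(1 − F(x)) that will appear in Theorem 3.6 is a parabola in t, and a quick numerical argmin confirms its minimizer t = 1 − F(x), the value that reappears in Section 3.9 for sub optimal bandwidths. F0 below is a placeholder value for F(x).

```python
import numpy as np

F0 = 0.6875   # placeholder value of F(x) at the point of interest

# Variance factor of the combined estimator as a function of t on [0, 1].
ts = np.linspace(0.0, 1.0, 1_000_001)
factor = ts**2 * F0 + (1 - ts) ** 2 * (1 - F0)

t_min = ts[np.argmin(factor)]   # analytic minimizer: t = 1 - F0
```

Setting the derivative 2tF − 2(1 − t)(1 − F) to zero gives t = 1 − F directly, so the grid search is only a sanity check.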
3.6 Expectation and variance of the combined estimator
We will derive the expectation and the variance of this combined estimator as well as its asymptotic behaviour. The next theorem states the expansions of the expectation and variance of f^t_nh.
Theorem 3.6. Assume that the kernel function $w$ satisfies Condition W, that the density $f$ is twice continuously differentiable, and that the density $q$ is three times continuously differentiable. Then we have for $0 < x < 1$
\[
E f^t_{nh}(x) = f(x) + \tfrac12 h^2 \int v^2 w(v)\,dv\,\big(t\,b^-(x) + (1-t)\,b^+(x)\big) + o(h^2) \tag{3.20}
\]
and
\[
\operatorname{Var} f^t_{nh}(x) = \frac{1}{q(x)}\,\frac{1}{nh^3}\,\big(t^2 F(x) + (1-t)^2(1-F(x))\big)\int w'(u)^2\,du + o\Big(\frac{1}{nh^3}\Big). \tag{3.21}
\]
Proof
We first prove (3.20). We apply Theorem 3.4 and Theorem 3.5. We have
\[
\begin{aligned}
E f^t_{nh}(x) &= E\big(t f^-_{nh}(x) + (1-t) f^+_{nh}(x)\big) = t\,E f^-_{nh}(x) + (1-t)\,E f^+_{nh}(x)\\
&= t\Big(f(x) + \tfrac12 h^2 \int v^2 w(v)\,dv\; b^-(x) + o(h^2)\Big) + (1-t)\Big(f(x) + \tfrac12 h^2 \int v^2 w(v)\,dv\; b^+(x) + o(h^2)\Big)\\
&= f(x) + \tfrac12 h^2 \int v^2 w(v)\,dv\,\big(t\,b^-(x) + (1-t)\,b^+(x)\big) + o(h^2).
\end{aligned}
\]
In order to prove (3.21) we use the following lemma. The proof of this lemma is given at the end of this section.
Lemma 3.7. Under the assumptions of Theorem 3.6 we have
\[
\operatorname{Cov}\big(f^-_{nh}(x),\, f^+_{nh}(x)\big) = o\Big(\frac{1}{nh^3}\Big).
\]
We continue with the expansion of $\operatorname{Var} f^t_{nh}(x)$. We have
\[
\begin{aligned}
\operatorname{Var} f^t_{nh}(x) &= \operatorname{Var}\big(t f^-_{nh}(x) + (1-t) f^+_{nh}(x)\big)\\
&= t^2\operatorname{Var} f^-_{nh}(x) + (1-t)^2\operatorname{Var} f^+_{nh}(x) + 2t(1-t)\operatorname{Cov}\big(f^-_{nh}(x), f^+_{nh}(x)\big)\\
&= t^2\,\frac{1}{q(x)}\,\frac{1}{nh^3}\,F(x)\int w'(u)^2\,du + (1-t)^2\,\frac{1}{q(x)}\,\frac{1}{nh^3}\,(1-F(x))\int w'(u)^2\,du + o\Big(\frac{1}{nh^3}\Big)\\
&= \frac{1}{q(x)}\,\frac{1}{nh^3}\,\big(t^2 F(x) + (1-t)^2(1-F(x))\big)\int w'(u)^2\,du + o\Big(\frac{1}{nh^3}\Big),
\end{aligned}
\]
which shows (3.21). $\Box$
Proof of Lemma 3.7 We have
\[
\operatorname{Cov}\big(f^-_{nh}(x), f^+_{nh}(x)\big) = \operatorname{Cov}\Big(\frac{1}{q(x)}g'_{nh}(x) - \frac{q'(x)}{q^2(x)}g_{nh}(x),\; -\frac{1}{q(x)}g'_{nh}(x+1) + \frac{q'(x)}{q^2(x)}g_{nh}(x+1)\Big),
\]
which we can rewrite as the sum of the four terms
\[
\begin{aligned}
(1)\;&\operatorname{Cov}\Big(\frac{1}{q(x)}g'_{nh}(x),\, -\frac{1}{q(x)}g'_{nh}(x+1)\Big), &
(2)\;&\operatorname{Cov}\Big(\frac{1}{q(x)}g'_{nh}(x),\, \frac{q'(x)}{q^2(x)}g_{nh}(x+1)\Big),\\
(3)\;&\operatorname{Cov}\Big(-\frac{q'(x)}{q^2(x)}g_{nh}(x),\, -\frac{1}{q(x)}g'_{nh}(x+1)\Big), &
(4)\;&\operatorname{Cov}\Big(-\frac{q'(x)}{q^2(x)}g_{nh}(x),\, \frac{q'(x)}{q^2(x)}g_{nh}(x+1)\Big).
\end{aligned}
\]
We work out the first term separately and show that the second, third and fourth terms are negligible compared to the first. We have
\[
(1) = \operatorname{Cov}\Big(\frac{1}{q(x)}g'_{nh}(x),\, -\frac{1}{q(x)}g'_{nh}(x+1)\Big) = -\frac{1}{q(x)^2}\Big(E\,g'_{nh}(x)\,g'_{nh}(x+1) - E\,g'_{nh}(x)\,E\,g'_{nh}(x+1)\Big).
\]
Now note that, for $n$ large enough,
\[
\begin{aligned}
E\,g'_{nh}(x)\,g'_{nh}(x+1) &= E\,\Big(\sum_{i=1}^n \frac{1}{nh^2}\,w'\Big(\frac{x-V_i}{h}\Big)\Big)\Big(\sum_{j=1}^n \frac{1}{nh^2}\,w'\Big(\frac{x+1-V_j}{h}\Big)\Big)\\
&= E\,\Big(\sum_{i=1}^n \frac{1}{n^2h^4}\,w'\Big(\frac{x-V_i}{h}\Big)w'\Big(\frac{x+1-V_i}{h}\Big) + \sum_{i\ne j} \frac{1}{n^2h^4}\,w'\Big(\frac{x-V_i}{h}\Big)w'\Big(\frac{x+1-V_j}{h}\Big)\Big)\\
&= \sum_{i\ne j} \frac{1}{n^2h^4}\,E\,w'\Big(\frac{x-V_i}{h}\Big)\,E\,w'\Big(\frac{x+1-V_j}{h}\Big) = \frac{n(n-1)}{n^2}\,E\,g'_{nh}(x)\,E\,g'_{nh}(x+1)\\
&= \Big(1-\frac1n\Big)\,E\,g'_{nh}(x)\,E\,g'_{nh}(x+1) = E\,g'_{nh}(x)\,E\,g'_{nh}(x+1) - \frac1n\,O(1)\\
&= E\,g'_{nh}(x)\,E\,g'_{nh}(x+1) - o\Big(\frac{1}{nh^3}\Big).
\end{aligned}
\]
We have used that $w'\big(\frac{x-V_i}{h}\big)\,w'\big(\frac{x+1-V_i}{h}\big) = 0$ for all $i$ provided that $h < 1/2$, which is true for $n$ large enough. We have also used that $V_i$ and $V_j$ are independent for all $i \ne j$, and the fact that
\[
E\,g'_{nh}(x)\,E\,g'_{nh}(x+1) = g'(x)\,g'(x+1) + O(h^2) = O(1).
\]
We may now conclude
\[
\operatorname{Cov}\Big(\frac{1}{q(x)}g'_{nh}(x),\, -\frac{1}{q(x)}g'_{nh}(x+1)\Big) = -\frac{1}{q(x)^2}\Big(E\,g'_{nh}(x)\,E\,g'_{nh}(x+1) - o\Big(\frac{1}{nh^3}\Big) - E\,g'_{nh}(x)\,E\,g'_{nh}(x+1)\Big) = o\Big(\frac{1}{nh^3}\Big),
\]
and therefore $(1) = o\big(\frac{1}{nh^3}\big)$.
With the Cauchy--Schwarz inequality it is easily seen that $(2) = o(\operatorname{Var} g'_{nh}(x))$ and hence $(2) = o\big(\frac{1}{nh^3}\big)$, that $(3) = o(\operatorname{Var} g'_{nh}(x+1))$ and hence $(3) = o\big(\frac{1}{nh^3}\big)$, and that the last term satisfies $(4) = o(\operatorname{Var} g'_{nh}(x))$ and hence also $(4) = o\big(\frac{1}{nh^3}\big)$. $\Box$
Note that for uniformly distributed variables $T_i$ on $[0,1]$, i.e. $q \equiv 1$ on $[0,1]$, we have $b^-(x) = b^+(x) = f''(x)$. In this case the expectation of $f^t_{nh}$ does not depend on $t$.
3.7 Asymptotic behaviour of the combined estimator
In this section we prove asymptotic normality of the combined estimator. Recall that, for some fixed $0 \le t \le 1$, we have
\[
f^t_{nh}(x) = t\,f^-_{nh}(x) + (1-t)\,f^+_{nh}(x),
\]
and note that asymptotic normality of $f^t_{nh}(x)$ also yields asymptotic normality of our former estimators $f^-_{nh}(x)$ and $f^+_{nh}(x)$, by taking $t$ equal to one and zero respectively.
Theorem 3.8. Assume that Condition W is satisfied and that $f$ is bounded on a neighbourhood of $x$. Then, as $n \to \infty$, $h \to 0$ and $nh^3 \to \infty$, we have for $0 < x < 1$
\[
\sqrt{nh^3}\,\big(f^t_{nh}(x) - E f^t_{nh}(x)\big) \xrightarrow{\;D\;} N(0, \sigma_t^2)
\qquad\text{with}\qquad
\sigma_t^2 = \frac{1}{q(x)}\,\big(t^2 F(x) + (1-t)^2(1-F(x))\big)\int w'(u)^2\,du.
\]
Proof
Write $f^t_{nh}(x)$ for $0 < x < 1$ as follows:
\[
\begin{aligned}
f^t_{nh}(x) &= t\,f^-_{nh}(x) + (1-t)\,f^+_{nh}(x)\\
&= t\Big(\frac{1}{q(x)}g'_{nh}(x) - \frac{q'(x)}{q^2(x)}g_{nh}(x)\Big) + (1-t)\Big(-\frac{1}{q(x)}g'_{nh}(x+1) + \frac{q'(x)}{q^2(x)}g_{nh}(x+1)\Big)\\
&= t\Big(\frac{1}{q(x)}\sum_{i=1}^n \frac{1}{nh^2}\,w'\Big(\frac{x-V_i}{h}\Big) - \frac{q'(x)}{q^2(x)}\sum_{i=1}^n \frac{1}{nh}\,w\Big(\frac{x-V_i}{h}\Big)\Big)\\
&\quad + (1-t)\Big(-\frac{1}{q(x)}\sum_{i=1}^n \frac{1}{nh^2}\,w'\Big(\frac{x+1-V_i}{h}\Big) + \frac{q'(x)}{q^2(x)}\sum_{i=1}^n \frac{1}{nh}\,w\Big(\frac{x+1-V_i}{h}\Big)\Big)\\
&= \frac{1}{n}\sum_{i=1}^n U_{ih}(x)
\end{aligned}
\]
with
\[
U_{ih}(x) = t\Big(\frac{1}{q(x)}\frac{1}{h^2}\,w'\Big(\frac{x-V_i}{h}\Big) - \frac{q'(x)}{q^2(x)}\frac{1}{h}\,w\Big(\frac{x-V_i}{h}\Big)\Big) + (1-t)\Big(-\frac{1}{q(x)}\frac{1}{h^2}\,w'\Big(\frac{x+1-V_i}{h}\Big) + \frac{q'(x)}{q^2(x)}\frac{1}{h}\,w\Big(\frac{x+1-V_i}{h}\Big)\Big).
\]
Note that
\[
E f^t_{nh}(x) = E\,\frac1n\sum_{i=1}^n U_{ih}(x) = \frac1n\,E\sum_{i=1}^n U_{ih}(x) = \frac1n\,n\,E\,U_{1h}(x) = E\,U_{1h}(x) \tag{3.22}
\]
and
\[
\operatorname{Var} f^t_{nh}(x) = \operatorname{Var}\frac1n\sum_{i=1}^n U_{ih}(x) = \frac{1}{n^2}\operatorname{Var}\sum_{i=1}^n U_{ih}(x) = \frac1n\operatorname{Var} U_{1h}(x). \tag{3.23}
\]
We need the following lemma to prove that $f^t_{nh}(x) - E f^t_{nh}(x)$ $\big(= \frac1n\sum_{i=1}^n (U_{ih}(x) - E\,U_{ih}(x))\big)$ is asymptotically normally distributed. The lemma enables us to show that the Lyapunov condition in the Central Limit Theorem is satisfied. The proof of this lemma can be found in Section A.2. The corresponding expression of the Lyapunov condition as stated below is derived in Section A.4.
Lemma 3.9. Under the assumptions of Theorem 3.8, we have for $m$ even and for $h \to 0$
\[
E\,U_{1h}(x)^m = \frac{1}{q(x)^{m-1}}\,\frac{1}{h^{2m-1}}\,\big(t^m F(x) + (1-t)^m(1-F(x))\big)\int w'(u)^m\,du + o\Big(\frac{1}{h^{2m-1}}\Big). \tag{3.24}
\]
We now check the Lyapunov condition with the help of Lemma 3.9. This means that for some $\delta > 0$ we have to check
\[
\frac{E\,|U_{1h}(x) - E\,U_{1h}(x)|^{2+\delta}}{n^{\delta/2}\,\big(\operatorname{Var} U_{1h}(x)\big)^{1+\delta/2}} \to 0.
\]
We use that $E\,U_{1h}(x) = E f^t_{nh}(x) = O(1)$ and $\operatorname{Var} U_{1h}(x) = O(1/h^3)$. Furthermore we use that $(a+b)^4 \le 2^3(a^4 + b^4)$. For $\delta = 2$ we have
\[
\frac{E\,|U_{1h}(x) - E\,U_{1h}(x)|^4}{n\,(\operatorname{Var} U_{1h}(x))^2}
\le \frac{2^3\big(E\,U_{1h}(x)^4 + (E\,U_{1h}(x))^4\big)}{n\,(\operatorname{Var} U_{1h}(x))^2}
\sim \frac{8\,\frac{1}{q(x)^3}\frac{1}{h^7}\big(t^4 F(x) + (1-t)^4(1-F(x))\big)\int w'(u)^4\,du + o\big(\frac{1}{h^7}\big)}{n\big(\frac{1}{h^3}\,c_2\big)^2}
\sim \frac{8c_1}{nh\,c_2^2} \to 0,
\]
since $nh \to \infty$. $\Box$
3.8 Derivation of the t-optimal estimator
We now define the `t-optimal' estimator to be the estimator that minimizes the mean squared error of $f^t_{nh}(x)$. Recall that the mean squared error of $f^t_{nh}(x)$ can be written as
\[
\operatorname{MSE} f^t_{nh}(x) = \operatorname{bias}^2 f^t_{nh}(x) + \operatorname{Var} f^t_{nh}(x).
\]
Note that $E f^t_{nh}(x)$, and hence $\operatorname{bias} f^t_{nh}(x)$, would be independent of $t$ and equal to $E f^-_{nh}(x)$ and $E f^+_{nh}(x)$ if $g''(x) = -g''(x+1)$ (implying $g'''(x) = -g'''(x+1)$). As we will see in Lemma 3.10 this equality holds when $q = \mathbf 1_{[0,1)}$, i.e. if the $T_i$ are uniformly distributed on $[0,1]$. Minimisation of the mean squared error over $t$ reduces in this case to minimisation of the variance over $t$. In general, however, $E f^t_{nh}(x)$ depends on $t$, and so does its bias.
Minimisation of the mean squared error of $f^t_{nh}$ over $t$ is the subject of this section. Our first step is to obtain a clear expression for the bias of $f^t_{nh}$. Our second step is then to combine this expression with the variance to construct the MSE. We differentiate $\operatorname{MSE} f^t_{nh}(x)$ with respect to $t$ and finally solve the equation $\frac{d}{dt}\operatorname{MSE} f^t_{nh}(x) = 0$ for $t$.
Lemma 3.10 below gives relations between $g(x)$, $g(x+1)$ and their first, second and third derivatives. We use this lemma in our analysis of the mean squared error of $f^t_{nh}$.
Lemma 3.10. If the density $g$ is three times differentiable we have for $0 < x < 1$
\[
g(x) + g(x+1) = q(x), \qquad g'(x) + g'(x+1) = q'(x), \qquad g''(x) + g''(x+1) = q''(x),
\]
and
\[
g'''(x) + g'''(x+1) = q'''(x).
\]
Proof
Summation of the inversion formulas (3.5) and (3.6) gives the first expression. Taking derivatives gives the next three equations. $\Box$
We define the function $\tilde q(x)$ as follows:
\[
\tilde q(x) = \frac{1}{q(x)}\,q'''(x) - \frac{q'(x)}{q^2(x)}\,q''(x). \tag{3.25}
\]
By equation (3.20) we find the following expressions for the bias of $f^t_{nh}$ and its derivative in terms of $\tilde q(x)$:
\[
\operatorname{bias} f^t_{nh}(x) = E f^t_{nh}(x) - f(x) = \tfrac12 h^2 \int v^2 w(v)\,dv\,\big(t\,b^-(x) + (1-t)\,b^+(x)\big) + o(h^2)
\]
and
\[
\frac{d}{dt}\operatorname{bias} f^t_{nh}(x) = \tfrac12 h^2 \int v^2 w(v)\,dv\,\tilde q(x) + o(h^2).
\]
This leads us to an expression for $\operatorname{MSE} f^t_{nh}(x)$ and $\frac{d}{dt}\operatorname{MSE} f^t_{nh}(x)$. We have
\[
\begin{aligned}
\operatorname{MSE} f^t_{nh}(x) &= \big(\operatorname{bias} f^t_{nh}(x)\big)^2 + \operatorname{Var} f^t_{nh}(x)\\
&= \Big(\tfrac12 h^2 \int v^2 w(v)\,dv\,\big(t\,b^-(x) + (1-t)\,b^+(x)\big) + o(h^2)\Big)^2\\
&\quad + \frac{1}{q(x)}\,\frac{1}{nh^3}\,\big(t^2 F(x) + (1-t)^2(1-F(x))\big)\int w'(u)^2\,du + o\Big(\frac{1}{nh^3}\Big).
\end{aligned}
\]
Differentiation yields
\[
\begin{aligned}
\frac{d}{dt}\operatorname{MSE} f^t_{nh}(x) &= 2\,\operatorname{bias} f^t_{nh}(x)\,\frac{d}{dt}\operatorname{bias} f^t_{nh}(x) + \frac{d}{dt}\operatorname{Var} f^t_{nh}(x)\\
&= 2\Big(\tfrac12 h^2 \int v^2 w(v)\,dv\,\big(t\,b^-(x) + (1-t)\,b^+(x)\big) + o(h^2)\Big)\Big(\tfrac12 h^2 \int v^2 w(v)\,dv\,\tilde q(x) + o(h^2)\Big)\\
&\quad + \frac{1}{q(x)}\,\frac{1}{nh^3}\,\big(2F(x) + 2t - 2\big)\int w'(u)^2\,du + o\Big(\frac{1}{nh^3}\Big)\\
&= \tfrac12 h^4\Big(\int v^2 w(v)\,dv\Big)^2\big(t\,b^-(x) + (1-t)\,b^+(x)\big)\tilde q(x) + o(h^4)\\
&\quad + \frac{1}{q(x)}\,\frac{1}{nh^3}\,\big(2F(x) + 2t - 2\big)\int w'(u)^2\,du + o\Big(\frac{1}{nh^3}\Big).
\end{aligned}
\]
We find the value of $t$ that minimizes$^4$ $\operatorname{MSE} f^t_{nh}(x)$ by solving $\frac{d}{dt}\operatorname{MSE} f^t_{nh}(x) = 0$ with respect to $t$. We will call this value $t^*$. Note that $t\,b^-(x) + (1-t)\,b^+(x) = b^+(x) + t\,\tilde q(x)$.

$^4$One can check that the second derivative of $\operatorname{MSE} f^t_{nh}(x)$ with respect to $t$ is positive, so $t^*$ is indeed a minimum.

We have
\[
\begin{aligned}
t^* &= -\,\frac{\frac{1}{q(x)}\frac{1}{nh^3}\,(2F(x)-2)\int w'(u)^2\,du + \tfrac12 h^4\big(\int v^2 w(v)\,dv\big)^2\,b^+(x)\,\tilde q(x) + o(h^4) + o\big(\frac{1}{nh^3}\big)}{2\,\frac{1}{q(x)}\frac{1}{nh^3}\int w'(u)^2\,du + \tfrac12 h^4\big(\int v^2 w(v)\,dv\big)^2\,\tilde q^2(x)}\\[4pt]
&= \frac{2\,\frac{1}{q(x)}\frac{1}{nh^3}\,(1-F(x))\int w'(u)^2\,du - \tfrac12 h^4\big(\int v^2 w(v)\,dv\big)^2\,b^+(x)\,\tilde q(x)}{2\,\frac{1}{q(x)}\frac{1}{nh^3}\int w'(u)^2\,du + \tfrac12 h^4\big(\int v^2 w(v)\,dv\big)^2\,\tilde q^2(x)} + o(1)\\[4pt]
&= \frac{(1-F(x)) - nh^7\,C(w)\,b^+(x)\,\tilde q(x)\,q(x)}{1 + nh^7\,C(w)\,\tilde q^2(x)\,q(x)} + o(1), \tag{3.26}
\end{aligned}
\]
where
\[
C(w) = \frac{\tfrac14\big(\int v^2 w(v)\,dv\big)^2}{\int w'(u)^2\,du}. \tag{3.27}
\]
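As an illustration, the constant $C(w)$ in (3.27) can be evaluated exactly for the biweight kernel used later in the simulations (4.1). The sketch below integrates the relevant polynomials by hand with exact fractions; the variable names are ours.

```python
from fractions import Fraction as Fr

# For the biweight kernel w(v) = (15/16)(1 - v^2)^2 on [-1, 1]:
#   ∫_{-1}^{1} (v^2 - 2v^4 + v^6) dv = 2/3 - 4/5 + 2/7
poly = Fr(2, 3) - Fr(4, 5) + Fr(2, 7)

# ∫ v^2 w(v) dv = (15/16) * poly = 1/7
mu2 = Fr(15, 16) * poly

# w'(v) = -(15/4) v (1 - v^2), so ∫ w'(u)^2 du = (225/16) * poly = 15/7
roughness = Fr(225, 16) * poly

# C(w) = (1/4) (∫ v^2 w)^2 / ∫ (w')^2, cf. (3.27)
C_w = Fr(1, 4) * mu2**2 / roughness
print(mu2, roughness, C_w)  # 1/7 15/7 1/420
```

So for the biweight kernel $C(w) = 1/420$.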
If $q \equiv 1$ on $[0,1]$, we have that $\tilde q(x)$ equals $0$ and hence $t^* = 1 - F(x) + o(1)$. This result is also found in [van Es (2011)]. In the next section we analyse the value of $t^*$ for general densities $q$.
3.9 Discussion on the optimal value of t
First of all we observe that $t^*$ depends on $F(x)$. Moreover, $t^*$ depends on $g''(x+1)$ and $g'''(x+1)$ through the factor $b^+(x)$. The resulting function $f^{t^*}_{nh}(x)$ is therefore, although theoretically correct, not useful in practice. We call $f^{t^*}_{nh}(x)$ the optimal semi-estimator and use it as a benchmark. In the next section we prove the existence of a genuine estimator that achieves, for a special choice of $h$, the same asymptotic value of the MSE as the optimal semi-estimator.
Secondly we observe that the value of $t^*$ still depends on the sample size $n$ and the bandwidth $h$. We discuss this impact below. One can check that the mean squared error of $f^t_{nh}$ is minimal with respect to $h$ when $h \sim cn^{-1/7}$, with $c > 0$ a constant. By this argument we distinguish three cases in our analysis of $t^*$. We consider $t^*$ for an optimal choice of $h$, i.e. $h \sim cn^{-1/7}$ with constant $c > 0$. We also consider $t^*$ for a sub and a super optimal choice$^5$ of $h$, i.e. $h \ll n^{-1/7}$ and $h \gg n^{-1/7}$.
In equation (3.26) two different types of terms arise: terms of order $O\big(\frac{1}{nh^3}\big)$ and terms of order $O(h^4)$. If we choose $h$ to be optimal, both are of order $O(n^{-4/7})$. In this case, with $h \sim cn^{-1/7}$,
\[
t^* \sim \frac{2\,\frac{1}{q(x)}\,(1-F(x))\int w'(u)^2\,du - \tfrac12\,c^7\big(\int v^2 w(v)\,dv\big)^2\,b^+(x)\,\tilde q(x)}{2\,\frac{1}{q(x)}\int w'(u)^2\,du + \tfrac12\,c^7\big(\int v^2 w(v)\,dv\big)^2\,\tilde q^2(x)} + o(1).
\]
$^5$Note the following restrictions on a sub and super optimal choice of $h$: the assumption $nh^3 \to \infty$ implies $h \gg n^{-1/3}$, and the requirement $h \to 0$ must still hold for a super optimal choice.
To get an idea of the behaviour of the optimal semi-estimator $f^{t^*}_{nh}$ for optimally chosen $h$, we first analyse the sub and super optimal choices of $h$.
If we choose $h$ to be sub optimal, the variance of $f^t_{nh}$ dominates the mean squared error; in this case $O(h^4) = o\big(\frac{1}{nh^3}\big)$. We then expect our optimal estimator to minimize its variance. Let us check this. For sub optimal $h \ll n^{-1/7}$ one finds $t^* = 1 - F(x) + o(1)$. Our optimal estimator in this case equals
\[
f^{t^*}_{nh}(x) = (1-F(x))\,f^-_{nh}(x) + F(x)\,f^+_{nh}(x) + o(1). \tag{3.28}
\]
For small values of $x$ the factor $1-F(x)$ is bigger than $F(x)$, and hence the contribution of $f^-_{nh}(x)$ is bigger than the contribution of $f^+_{nh}(x)$. For large $x$ the opposite is true. This is consistent with the theory above: for small $x$ the variance of $f^-_{nh}(x)$ is smaller than the variance of $f^+_{nh}(x)$, and for large $x$ the opposite is true.
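The role of $t$ in the variance can be checked numerically: for sub optimal $h$ the factor $t^2 F(x) + (1-t)^2(1-F(x))$ in (3.21) drives the mean squared error, and a simple grid search (an illustration with an arbitrary value $F(x) = 0.3$) recovers the minimizer $t = 1 - F(x)$.

```python
def var_factor(t, F):
    # the t-dependent factor in the variance expansion (3.21)
    return t**2 * F + (1 - t) ** 2 * (1 - F)

F = 0.3
grid = [i / 1000 for i in range(1001)]
t_star = min(grid, key=lambda t: var_factor(t, F))
print(t_star)  # 0.7, i.e. t* = 1 - F
```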
If we choose $h$ to be super optimal, the squared bias of $f^t_{nh}$ dominates the mean squared error; in this case $O\big(\frac{1}{nh^3}\big) = o(h^4)$. We then expect our optimal estimator to minimize its squared bias. For super optimal $h$ we find
\[
t^* = -\frac{b^+(x)}{\tilde q(x)} + o(1) = \frac{b^+(x)}{b^+(x) - b^-(x)} + o(1).
\]
Our optimal estimator in this case equals
\[
f^{t^*}_{nh}(x) = \frac{b^+(x)}{b^+(x) - b^-(x)}\,f^-_{nh}(x) + \frac{b^-(x)}{b^-(x) - b^+(x)}\,f^+_{nh}(x) + o(1). \tag{3.29}
\]
The contributions of $f^-_{nh}(x)$ and $f^+_{nh}(x)$ are in this case determined by their proportional difference in bias, as expected.
We have seen that the expression for $t^*$ in the case of an optimal $h$ is less simple than the expressions for sub and super optimal $h$. However, from the above analysis for sub and super optimal $h$ we can imagine its effect on the estimator: it balances the contributions of $f^-_{nh}(x)$ and $f^+_{nh}(x)$ to $f^t_{nh}(x)$, taking into account their individual contributions to both the bias and the variance.
In the sequel we prove asymptotic normality of a final estimator of the form $f^t_{nh}$ with $t = 1 - \hat F_n(x)$, where $\hat F_n(x)$ is an estimator for $F(x)$. This final estimator is only $t$-optimal in the sub optimal regime of $h$. It nevertheless makes sense to choose $h$ sub optimal. First of all, asymptotic normality states in this case that $f^{t^*}_{nh}$ approaches $f(x)$ accurately, because of the vanishing bias. A sub optimal choice of $h$ is for this reason common in the kernel density estimation literature. For instance, the aim of Bickel and Rosenblatt in [Bickel and Rosenblatt (1973)] is to obtain global measures of how good an estimate of $f(x)$ a kernel estimator $f_n(x)$ is. In particular they obtain asymptotic results for the maximum absolute deviation and the integrated quadratic deviation. With these measures one is able to set up simple confidence intervals, and one can use the measures as test statistics for the hypothesis $H\colon f = f_0$. These results are all obtained in the sub optimal regime of $h$.
Another advantage of a sub optimal choice over a (super) optimal choice of $h$ is illustrated in Section 4.3. As seen above, $t^*$ depends on $g(x)$, $g''(x)$ and $g'''(x)$. However, if $h$ is chosen sub optimal, $t^*$ only depends on $g(x)$. In practice we substitute the density estimators $g_{nh}(x)$, $g''_{nh}(x)$ and $g'''_{nh}(x)$ for these functions. The need for $g''_{nh}(x)$ and $g'''_{nh}(x)$ is a disadvantage. First of all, the variances of $g''_{nh}(x)$ and $g'''_{nh}(x)$ are of larger order. Secondly, boundary effects occur more often. And thirdly, the addition of the kernel estimators $g''_{nh}(x)$ and $g'''_{nh}(x)$ demands that the function $g$ be differentiable to higher order.
3.10 Asymptotic normality of the final estimator
Choosing $h$ sub optimal, we have found $t^* = 1 - F(x) + o(1)$. In the sequel we obtain asymptotic normality for $f^t_{nh}(x)$ with $t = 1 - \hat F_n(x)$, where $\hat F_n(x)$ is an estimator for $F(x)$ that is consistent in mean squared error. Note that $1 - \hat F_n(x)$ does not generally lie in $[0,1]$. We call this estimator our final estimator and write it as $f_{nh}(x)$. Hence we have
\[
f_{nh}(x) = (1-\hat F_n(x))\,f^-_{nh}(x) + \hat F_n(x)\,f^+_{nh}(x). \tag{3.30}
\]
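As an illustration, the final estimator (3.30) can be assembled from the kernel estimators of $g$ and $g'$. The helper names below are ours, and for $\hat F_n(x)$ we take the average of the two distribution estimators in (3.42), the choice used later in the simulations; two bandwidths are allowed, as discussed in Section 3.11.

```python
import numpy as np

def final_estimator(x, V, h1, h2, w, dw, q, dq):
    # f_nh(x) = (1 - F̂n(x)) f^-_{nh1}(x) + F̂n(x) f^+_{nh1}(x), cf. (3.30)
    g = lambda y, h: np.mean(w((y - V) / h)) / h        # g_{nh}(y)
    dg = lambda y, h: np.mean(dw((y - V) / h)) / h**2   # g'_{nh}(y)

    # distribution estimators (3.42) with bandwidth h2, averaged
    F_hat = 0.5 * (g(x, h2) / q(x) + 1 - g(x + 1, h2) / q(x))

    # left and right density estimators with bandwidth h1
    f_minus = dg(x, h1) / q(x) - dq(x) * g(x, h1) / q(x) ** 2
    f_plus = -dg(x + 1, h1) / q(x) + dq(x) * g(x + 1, h1) / q(x) ** 2

    return (1 - F_hat) * f_minus + F_hat * f_plus
```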
In the last main theorem below we prove asymptotic normality of $f_{nh}$ and we give expansions of the bias and variance for $h = O(n^{-1/7})$. We also show that $f_{nh}$ is $t$-optimal for sub optimal $h$, i.e. $h \ll n^{-1/7}$.
Theorem 3.11. Assume that Condition W is satisfied, that $f$ is twice continuously differentiable, that $q$ is three times continuously differentiable on a neighbourhood of $x$, and that $\hat F_n(x)$ is an estimator of $F(x)$ with
\[
E\,\big(\hat F_n(x) - F(x)\big)^2 \to 0. \tag{3.31}
\]
Then, as $n \to \infty$, $h \to 0$, $nh^3 \to \infty$ and $h = O(n^{-1/7})$, we have for $0 < x < 1$
\[
\sqrt{nh^3}\,\big(f_{nh}(x) - E f_{nh}(x)\big) \xrightarrow{\;D\;} N(0, \sigma^2),
\qquad\text{with}\qquad
\sigma^2 = F(x)(1-F(x))\,\frac{1}{q(x)}\int w'(u)^2\,du.
\]
Furthermore, if
\[
E\,\big(\hat F_n(x) - F(x)\big)^2 = o(nh^7) \tag{3.32}
\]
then
\[
E f_{nh}(x) = f(x) + \tfrac12 h^2 \int v^2 w(v)\,dv\,\big((1-F(x))\,b^-(x) + F(x)\,b^+(x)\big) + o(h^2).
\]
If $h \ll n^{-1/7}$ then $f_{nh}(x)$ is $t$-optimal and we have
\[
\sqrt{nh^3}\,\big(f_{nh}(x) - f(x)\big) \xrightarrow{\;D\;} N(0, \sigma^2),
\]
as $n \to \infty$.
Proof
Note that Theorem 3.8 proves asymptotic normality for the estimator $f^t_{nh}(x)$ with $t = 1 - F(x)$, because we have $1-F(x) \in [0,1]$. We use this fact to prove asymptotic normality for our final estimator $f_{nh}(x) = f^{1-\hat F_n(x)}_{nh}(x)$. Write
\[
f_{nh}(x) = (1-\hat F_n(x))\,f^-_{nh}(x) + \hat F_n(x)\,f^+_{nh}(x) = f^{1-F(x)}_{nh}(x) + R_{nh}(x), \tag{3.33}
\]
where
\[
R_{nh}(x) = \big(\hat F_n(x) - F(x)\big)\,S_{nh}(x) \qquad\text{and}\qquad S_{nh}(x) = f^+_{nh}(x) - f^-_{nh}(x). \tag{3.34}
\]
Note that
\[
\sqrt{nh^3}\,\big(f_{nh}(x) - E f_{nh}(x)\big) = \sqrt{nh^3}\,\big(f^{1-F(x)}_{nh}(x) - E f^{1-F(x)}_{nh}(x)\big) + \sqrt{nh^3}\,R_{nh}(x) - \sqrt{nh^3}\,E\,R_{nh}(x).
\]
We show in the sequel that $\sqrt{nh^3}\,R_{nh}(x)$ and $\sqrt{nh^3}\,E\,R_{nh}(x)$ converge to $0$ in distribution.
We first rewrite $S_{nh}(x)$. We have
\[
\begin{aligned}
S_{nh}(x) &= -\frac{1}{q(x)}g'_{nh}(x+1) + \frac{q'(x)}{q^2(x)}g_{nh}(x+1) - \frac{1}{q(x)}g'_{nh}(x) + \frac{q'(x)}{q^2(x)}g_{nh}(x)\\
&= \frac1n\sum_{i=1}^n\Big(-\frac{1}{q(x)}\frac{1}{h^2}\,w'\Big(\frac{x+1-V_i}{h}\Big) + \frac{q'(x)}{q^2(x)}\frac1h\,w\Big(\frac{x+1-V_i}{h}\Big) - \frac{1}{q(x)}\frac{1}{h^2}\,w'\Big(\frac{x-V_i}{h}\Big) + \frac{q'(x)}{q^2(x)}\frac1h\,w\Big(\frac{x-V_i}{h}\Big)\Big)\\
&= \frac1n\sum_{i=1}^n W_{ih}(x),
\end{aligned}
\]
where
\[
W_{ih}(x) = -\frac{1}{q(x)}\frac{1}{h^2}\,w'\Big(\frac{x+1-V_i}{h}\Big) + \frac{q'(x)}{q^2(x)}\frac1h\,w\Big(\frac{x+1-V_i}{h}\Big) - \frac{1}{q(x)}\frac{1}{h^2}\,w'\Big(\frac{x-V_i}{h}\Big) + \frac{q'(x)}{q^2(x)}\frac1h\,w\Big(\frac{x-V_i}{h}\Big).
\]
The next lemma establishes some properties of $S_{nh}(x)$. The proof of Lemma 3.12 can be found in Section A.3.
Lemma 3.12. Under the assumptions of Theorem 3.11 we have
\[
E\,S_{nh}(x) = \tfrac12 h^2 \int v^2 w(v)\,dv\,\big(b^+(x) - b^-(x)\big) + o(h^2), \tag{3.35}
\]
\[
E\,W_{ih}(x)^m = \frac{1}{q(x)^{m-1}}\,\frac{1}{h^{2m-1}}\int w'(u)^m\,du + o\Big(\frac{1}{h^{2m-1}}\Big), \tag{3.36}
\]
and
\[
\sqrt{nh^3}\,\big(S_{nh}(x) - E\,S_{nh}(x)\big) \xrightarrow{\;D\;} N\Big(0,\; \frac{1}{q(x)}\int w'(u)^2\,du\Big). \tag{3.37}
\]
We start by analysing the term $\sqrt{nh^3}\,R_{nh}(x)$ and prove that it converges to zero in distribution. We rewrite the term as
\[
\sqrt{nh^3}\,R_{nh}(x) = \sqrt{nh^3}\,\big(\hat F_n(x) - F(x)\big)\,S_{nh}(x) = \sqrt{nh^3}\,\big(\hat F_n(x) - F(x)\big)\big(S_{nh}(x) - E\,S_{nh}(x)\big) + \sqrt{nh^3}\,\big(\hat F_n(x) - F(x)\big)\,E\,S_{nh}(x).
\]
We estimate the first and second term in the last line separately. By condition (3.31) we have $\hat F_n(x) - F(x) \xrightarrow{P} 0$, and hence by Slutsky's Theorem and the result of Lemma 3.12 we may conclude
\[
\sqrt{nh^3}\,\big(\hat F_n(x) - F(x)\big)\big(S_{nh}(x) - E\,S_{nh}(x)\big) \xrightarrow{\;D\;} 0.
\]
Furthermore we have by Lemma 3.12 that $E\,S_{nh}(x)$ equals $O(h^2)$. Hence for $h = O(n^{-1/7})$ we have
\[
\sqrt{nh^3}\,E\,S_{nh}(x) = O\big(n^{1/2}h^{7/2}\big) = O\big(n^{1/2}(n^{-1/7})^{7/2}\big) = O(1). \tag{3.38}
\]
Using again condition (3.31) we conclude
\[
\sqrt{nh^3}\,\big(\hat F_n(x) - F(x)\big)\,E\,S_{nh}(x) \xrightarrow{\;P\;} 0. \tag{3.39}
\]
Together equations (3.38) and (3.39) ensure that $\sqrt{nh^3}\,R_{nh}(x) \xrightarrow{\;D\;} 0$.
We now analyse the term $\sqrt{nh^3}\,E\,R_{nh}(x)$ and prove that it converges to zero as well. By the Cauchy--Schwarz inequality we have
\[
E\,\sqrt{nh^3}\,|R_{nh}(x)| \le \sqrt{nh^3}\,\big(E\,(\hat F_n(x) - F(x))^2\big)^{1/2}\big(E\,S_{nh}(x)^2\big)^{1/2} = \big(E\,(\hat F_n(x) - F(x))^2\big)^{1/2}\,O(1) \to 0.
\]
Since $\sqrt{nh^3}\,R_{nh}(x)$ and $\sqrt{nh^3}\,E\,R_{nh}(x)$ both converge to zero in distribution, we may conclude that $\sqrt{nh^3}\,\big(f_{nh}(x) - E f_{nh}(x)\big)$ has the same asymptotic normal distribution as $\sqrt{nh^3}\,\big(f^{1-F(x)}_{nh}(x) - E f^{1-F(x)}_{nh}(x)\big)$, which proves the first statement of the theorem.
The second statement of the theorem is proven as follows. By equation (3.20) we have
\[
E f^{1-F(x)}_{nh}(x) = f(x) + \tfrac12 h^2 \int v^2 w(v)\,dv\,\big((1-F(x))\,b^-(x) + F(x)\,b^+(x)\big) + o(h^2). \tag{3.40}
\]
Furthermore we have
\[
E\,|R_{nh}(x)| \le \big(E\,(\hat F_n(x) - F(x))^2\big)^{1/2}\big(E\,S_{nh}(x)^2\big)^{1/2} = o\big(\sqrt{nh^7}\big)\,O\Big(\frac{1}{\sqrt{nh^3}}\Big) = o(h^2). \tag{3.41}
\]
Together equations (3.40) and (3.41) prove the second statement of the theorem.
Finally, by equation (3.21), we find that
\[
\operatorname{Var} f^{1-F(x)}_{nh}(x) = \frac{1}{q(x)}\,\frac{1}{nh^3}\,\big((1-F(x))^2 F(x) + F(x)^2(1-F(x))\big)\int w'(u)^2\,du + o\Big(\frac{1}{nh^3}\Big) = \frac{1}{q(x)}\,\frac{1}{nh^3}\,F(x)(1-F(x))\int w'(u)^2\,du + o\Big(\frac{1}{nh^3}\Big).
\]
Thus for sub optimal $h$ we have $\operatorname{Var} f_{nh}(x) \sim \operatorname{Var} f^{1-F(x)}_{nh}(x)$. Hence for sub optimal $h$ the estimator $f_{nh}(x)$ is $t$-optimal. The last statement follows from the fact that for $nh^3 \to \infty$ and $h \ll n^{-1/7}$ we have
\[
\sqrt{nh^3}\,\big(E f_{nh}(x) - f(x)\big) \to 0
\]
as $n \to \infty$. $\Box$
Note that, in contrast to Theorem 3.8, we already assume sufficient smoothness of the functions $f$ and $q$ to prove the first statement of asymptotic normality. In equation (3.38) this assumption ensures that the difference between $f^+_{nh}(x)$ and $f^-_{nh}(x)$, denoted by $S_{nh}(x)$, satisfies $E\,S_{nh}(x) = O(h^2)$. Together with the restriction $h = O(n^{-1/7})$ we obtain $\sqrt{nh^3}\,E\,S_{nh}(x) = O(1)$.
Note also that Lemma 3.12 reveals that the expectation of the difference between the left and right estimator, i.e. $E\,S_{nh}(x)$, depends only on the density $q$, as we have $b^+(x) - b^-(x) = -\tilde q(x)$.
3.11 Consistent estimator of the distribution function
Theorem 3.11 is based on the assumption that an estimator that is consistent in mean squared error exists for the distribution function F . In this section we show that such an estimator can be found easily. In fact there are many suitable estimators of F .
The construction of the estimator starts with the inversion formulas (3.5) and (3.6) obtained in Section 3.2. For $g$ we again substitute the kernel estimator $g_{nh}$, defined in (3.9), as we did in the construction of $f^-_{nh}$ and $f^+_{nh}$. This method leads to two different estimators of $F$ for $x \in [0,1]$,
\[
F^-_{nh}(x) = \frac{g_{nh}(x)}{q(x)} \qquad\text{and}\qquad F^+_{nh}(x) = 1 - \frac{g_{nh}(x+1)}{q(x)}. \tag{3.42}
\]
We define the estimator $F^t_{nh}(x)$ for $x \in [0,1]$ as
\[
F^t_{nh}(x) = t\,F^-_{nh}(x) + (1-t)\,F^+_{nh}(x) \tag{3.43}
\]
for some fixed $0 \le t \le 1$. In the sequel we show that the estimator $F^t_{nh}$ is consistent in mean squared error. This is proven with the help of expansions of the expectation and the variance of $F^t_{nh}$. These expansions are stated in the next theorem, whose proof can be found in Appendix A.5.
Theorem 3.13. Under the assumptions of Theorem 3.11, that is, if Condition W is satisfied, $f$ is twice continuously differentiable and $q$ is three times continuously differentiable on a neighbourhood of $x$, we have
\[
E\,F^t_{nh}(x) = F(x) + \frac{1}{q(x)}\,\tfrac12 h^2\,\big(t\,g''(x) - (1-t)\,g''(x+1)\big)\int v^2 w(v)\,dv + o(h^2) \tag{3.44}
\]
and
\[
\operatorname{Var} F^t_{nh}(x) = \frac{1}{q(x)}\,\frac{1}{nh}\,\big(t^2 F(x) + (1-t)^2(1-F(x))\big)\int w(v)^2\,dv + o\Big(\frac{1}{nh}\Big). \tag{3.45}
\]
With the help of Theorem 3.13 we are able to check that the estimator $F^t_{nh}$ is consistent in MSE. We have
\[
\begin{aligned}
E\,\big(F^t_{nh}(x) - F(x)\big)^2 &= \operatorname{bias}^2 F^t_{nh}(x) + \operatorname{Var} F^t_{nh}(x)\\
&= \Big(\frac{1}{q(x)}\,\tfrac12 h^2\,\big(t\,g''(x) - (1-t)\,g''(x+1)\big)\int v^2 w(v)\,dv + o(h^2)\Big)^2\\
&\quad + \frac{1}{q(x)}\,\frac{1}{nh}\,\big(t^2 F(x) + (1-t)^2(1-F(x))\big)\int w(v)^2\,dv + o\Big(\frac{1}{nh}\Big)\\
&= O(h^4) + O\Big(\frac{1}{nh}\Big) \to 0, \tag{3.46}
\end{aligned}
\]
as $h \to 0$ and $nh \to \infty$.
The second statement in Theorem 3.11 is based on the assumption that the estimator $\hat F_n(x)$ satisfies
\[
E\,\big(\hat F_n(x) - F(x)\big)^2 = o(nh^7).
\]
This assumption is stated in equation (3.32). Note that we can choose different bandwidths $h$ in $\hat F_n(x)$ and $f_{nh}(x)$ as long as we meet the requirements in Theorem 3.11. In the sequel we write $h_1$ for the bandwidth in $f_{nh}(x)$ and $h_2$ for the bandwidth in $\hat F_n(x)$. We show below in which way the estimator $F^t_{nh}(x)$ is able to satisfy equation (3.32) for respectively an optimal and a sub optimal choice of $h_1$.
An optimal choice of $h_1$, i.e. $h_1 = cn^{-1/7}$ with $c > 0$, reduces the assumption for $F^t_{nh}(x)$ to
\[
E\,\big(F^t_{nh}(x) - F(x)\big)^2 = o(1).
\]
By equation (3.46) this requirement is met by the estimator $F^t_{nh}(x)$ for all $h_2$ such that $h_2 \to 0$ and $nh_2 \to \infty$ as $n \to \infty$. In particular the requirement is met for the optimal choice of $h_2$ in $F^t_{nh}(x)$, given by $h_2^* = cn^{-1/5}$ with $c > 0$. The bandwidth $h_2^*$ is optimal in the sense that it minimizes the MSE in (3.46) with respect to $h_2$.
A sub optimal choice of $h_1$ requires the MSE of $F^t_{nh}(x)$ to be of smaller order. It is possible to keep the optimal choice $h_2^* \sim n^{-1/5}$ when $h_1$ is chosen sub optimal. However, in order to satisfy equation (3.32) one should not choose $h_1$ too small. Note that by equation (3.46) the MSE of $F^t_{nh}(x)$ with $h = h_2^*$ is of order $O(n^{-4/5})$. Hence to satisfy equation (3.32) one should always choose $h_1 \gg n^{-9/35}$. With this restriction $F^t_{nh}(x)$ satisfies equation (3.32) for sub optimal $h_1$.
Chapter 4
Simulation
4.1 Illustration of the theory
In this section we illustrate the theory in Chapter 3 with the help of simulated examples. We define below the functions and values as well as the technique of computation that we use in the simulation of the examples.
As a kernel function we use the biweight kernel
\[
w(x) = \frac{15}{16}\,(1-x^2)^2\,\mathbf 1_{[-1,1]}(x). \tag{4.1}
\]
We define $\hat F_n(x)$, proposed in Theorem 3.11, for fixed bandwidth $h$ as
\[
\hat F_n(x) = F^{1/2}_{nh}(x) = \tfrac12\big(F^-_{nh}(x) + F^+_{nh}(x)\big). \tag{4.2}
\]
As seen in Section 3.11, ˆFn(x) meets the requirements of Theorem 3.11.
We want to reduce the computational cost and we want to weigh each data point equally in our simulations. To achieve this we use a method similar to WARPing, described in [Härdle (1991), Sec 1.4]. A short explanation is given below.
We divide the length $l$ of the domain of our estimators into $M$ intervals. The width of each interval is called the binwidth $b$. We count for each $j \in \{1, \ldots, M\}$ how many observations $V_i$ are in the interval $b_j = [(j-1)b, jb]$. We call this number $n_{b_j}$. Hence we have
\[
n_{b_j} = \sum_{i=1}^n \mathbf 1[V_i \in b_j].
\]
We give the observations $V_i$ in the bin $b_j$ the average value of the bin, denoted by $V_{b_j}$. Hence we have
\[
V_{b_j} = \frac{(j-1)b + jb}{2} = \Big(j - \frac12\Big)b.
\]
With these considerations one can construct an estimator that approximates the initial estimator. We illustrate this for the estimator $g_{nh}$; all other kernel estimators in the simulations are approximated in the same way. The domain of $g_{nh}$ is $[0,2]$, hence $l = 2$. We choose $M = 400$, so that the binwidth is $b = 2/400 = 0.005$. The estimator $\hat g_{nh}$, given for $x \in [0,2]$ by
\[
\hat g_{nh}(x) = \frac1n\sum_{j=1}^M \frac1h\, n_{b_j}\, w\Big(\frac{x - V_{b_j}}{h}\Big), \tag{4.3}
\]
is an approximation of the estimator $g_{nh}$.
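The binned approximation (4.3) can be sketched as follows; the function and argument names are ours.

```python
import numpy as np

def binned_g_nh(x, V, h, w, l=2.0, M=400):
    # binned approximation ĝ_{nh}(x) of (4.3): the kernel is evaluated once per
    # bin midpoint V_{b_j}, weighted by the bin count n_{b_j}
    counts, edges = np.histogram(V, bins=M, range=(0.0, l))
    mids = (edges[:-1] + edges[1:]) / 2          # V_{b_j} = (j - 1/2) b, b = l/M
    return np.sum(counts * w((x - mids) / h)) / (len(V) * h)
```

With $M = 400$ bins on $[0,2]$ the binwidth is $b = 0.005$, so for the bandwidths used here $\hat g_{nh}$ differs from $g_{nh}$ only at the order of the binwidth.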
In Sections 4.1.1, 4.1.2 and 4.1.3 we simulate three samples of UCSD where $X_i \sim \mathrm{Beta}(2,2)$. For $T_i$ we examine respectively $T_i \sim N(0.5, 0.3)$ conditioned on $[0,1]$, $T_i \sim U[0,1]$ and $T_i \sim \mathrm{Beta}(2,5)$. The sample size is $n = 10000$ and the bandwidths are $h_1 = 0.22$ and $h_2 = 0.16$.
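The three simulated samples can be generated as follows. This is an illustrative sketch: we read $N(0.5, 0.3)$ as having standard deviation $0.3$ (an assumption on the notation), use rejection sampling for the conditioned normal, and take the observation $V_i = T_i + \mathbf 1\{X_i > T_i\}$, consistent with the inversion formulas $g(x) = q(x)F(x)$ and $g(x+1) = q(x)(1-F(x))$ used in (3.42).

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10000

X = rng.beta(2, 2, size=n)                 # hidden variables X_i ~ Beta(2, 2)

# examination times T_i for the three cases of Sections 4.1.1-4.1.3
draws = rng.normal(0.5, 0.3, size=8 * n)   # rejection sampling for N_{[0,1]}(0.5, 0.3)
T_cases = {
    "normal_cond": draws[(draws >= 0) & (draws <= 1)][:n],
    "uniform": rng.uniform(size=n),
    "beta": rng.beta(2, 5, size=n),
}

# current status data: delta_i = 1{X_i <= T_i}, observed V_i = T_i + (1 - delta_i)
samples = {name: T + (X > T) for name, T in T_cases.items()}
```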
In Figure 1 the true density f , the known density q and the true density g are plotted for each of the three different cases.
4.1.1 $T_i \sim N(0.5, 0.3)$ conditioned on $[0, 1]$
We start with the results for $T_i \sim N(0.5, 0.3)$ conditioned on $[0,1]$, which we will interchangeably denote by $T_i \sim N_{[0,1]}(0.5, 0.3)$.
The estimators $f^-_{nh}$, $f^+_{nh}$ as well as the final estimator $f_{nh}$ are given in Figure 3. We analyse Figures 3(a) and 3(b) with the help of Figure 2. In Figure 2(a) the functions $b^-(x)$ and $b^+(x)$ are drawn. Note that the image of $b^-(x)$ lies approximately between the values $-50$ and $50$ for $x$ in the interval $[0, 0.8]$. Note also that $b^-(x)$ changes sign around $x = 0.15$ and $x = 0.7$. Hence we would expect the bias of $f^-_{nh}(x)$ to be of equal magnitude on the interval $[0, 0.8]$ and to change sign around $x = 0.15$ and $x = 0.7$. This is what we see in Figure 3(a).
In Figure 1(a) and in Figure 2(b) the density $g(x)$ and its second and third derivatives are drawn. Note that $g'(x)$ and $g^{(3)}(x)$ are not continuous at $x = 1$, although this is demanded in Lemma 3.3. The kernel density estimator $g'_{nh}(x)$ is therefore unreliable around $x = 1$. Recall that $h_1 = 0.22$; hence for $x \in [0.78, 1]$ we do not expect the graph of $b^-(x)$ to be an accurate indication of the bias. In Figure 3(a) we see that the discontinuity causes a boundary effect on this interval.
Due to the decreasing shape of $g(x)$ when approaching $x = 0$, the expected boundary effect at $x = 0$ is small and is not noticeable in the graph of $f^-_{nh}$.
The graph of $f^+_{nh}(x)$ in Figure 3(b) can be analysed in the same way. Recall that $f^+_{nh}(x)$ depends on $g(x+1)$ and its derivatives. Hence the discontinuity of $g'(x)$ and $g^{(3)}(x)$ causes a boundary effect for $x \in [0, 0.22]$.
Due to the factors $1-F(x)$ and $F(x)$, the discontinuity of $g'(x)$ and $g^{(3)}(x)$ does not influence our final estimator $f_{nh}$. As seen in Figure 3(c), the boundary effects have disappeared and our final estimator approaches the density $f$ better.
The estimators $F^-_{nh}$, $F^+_{nh}$ and $\hat F_n = F^{1/2}_{nh}$ are given in Figure 4. The right tail of $F^-_{nh}$ is a
[Figure 1: the densities $g$, $q$ and $f$, with $X_i \sim \mathrm{Beta}(2,2)$ throughout: (a) $T_i \sim N(0.5, 0.3)$ conditioned on $[0,1]$; (b) $T_i \sim U[0,1]$; (c) $T_i \sim \mathrm{Beta}(2,5)$.]

Figure 2: The functions $b^-$, $b^+$ (panel (a)) and the density $g$ with its derivatives $g''$ and $g^{(3)}$ (panel (b)), for $X_i \sim \mathrm{Beta}(2,2)$ and $T_i \sim N_{[0,1]}(0.5, 0.3)$.

[Figure 3 panels: (a) $f^-_{nh}(x)$, (b) $f^+_{nh}(x)$, (c) $f_{nh}(x)$, each plotted with the true density $f(x)$.]

Figure 3: The estimators $f^-_{nh}$, $f^+$