• No results found

Simultaneous inference for Berkson errors-in-variables regression under fixed design

N/A
N/A
Protected

Academic year: 2021

Share "Simultaneous inference for Berkson errors-in-variables regression under fixed design"

Copied!
38
0
0

Bezig met laden.... (Bekijk nu de volledige tekst)

Hele tekst

(1)

Simultaneous inference for Berkson

errors-in-variables regression under fixed design

Katharina Proksch

1

, Nicolai Bissantz

2

and Hajo Holzmann

3

1University of Twente, The Netherlands

Department of Applied Mathematics

2Fakult¨at f¨ur Mathematik

Ruhr-Universit¨at Bochum, Germany

3Fachbereich Mathematik und Informatik

Philipps-Universit¨at Marburg, Germany September 3, 2020

Abstract

In various applications of regression analysis, in addition to errors in the dependent observations also errors in the predictor variables play a substantial role and need to be in-corporated in the statistical modeling process. In this paper we consider a nonparametric measurement error model of Berkson type with fixed design regressors and centered ran-dom errors, which is in contrast to much existing work in which the predictors are taken as random observations with random noise. Based on an estimator that takes the error in the predictor into account and on a suitable Gaussian approximation, we derive finite sample bounds on the coverage error of uniform confidence bands, where we circumvent the use of extreme-value theory and rather rely on recent results on anti-concentration of Gaussian processes. In a simulation study we investigate the performance of the uniform confidence sets for finite samples.

Keywords: Berkson errors-in-variables; deconvolution; Gaussian approximation; uniform con-fidence bands

1

Introduction

In mean regression problems a predictor variable X, either a fixed design point or a random observation, is used to explain a response variable Y in terms of the conditional mean re-gression function g(x) = E[Y |X = x]. The case of a random covariate occurs when both X and Y are measured during an experiment and the case of fixed design corresponds to situations in which covariates can be set by the experimenter such as a machine setting, say, in a physical or engineering experiment. Writing ε = Y − E[Y |X] gives the standard form of the non-parametric regression model Y = g(X) + ε, that is, the response is observed with an additional error but the predictor can be set or measured error-free. In many experimen-tal settings this is not a suitable model assumption since either the predictor can also not be

1

Corresponding author: Dr. Katharina Proksch, Department of Applied Mathematics, University of Twente, Enschede, The Netherlands, Email: k.proksch@utwente.nl

(2)

measured precisely, or since the presumed setting of the predictor does not correspond exactly to its actual value. There are subtle differences between these two cases, which we illustrate by the example of drill core measurements of the content of climate gases in the polar ice. Assume that the content of climate gas Y at the bottom of a drill hole is quantified. The depth of the drill hole X is measured independently with error ∆ giving the observation W . A corresponding regression model is of the following form

Y = g(X) + ε, W = X + ∆, (1) where W , ∆ and ε are independent, ∆ and ε are centered, and observations of (Y, W ) are available. This model is often referred to as classical errors-in-variables model. A change in the experimental set-up might require a change in the model that is imposed. Assume that in our drill core experiment we fix specific depths w at which the drill core is to be analyzed. However, due to imprecisions of the instrument we cannot accurately fix the desired value of w, rather the true (but unknown) depth where the measurement is acquired is w + ∆. In this case a corresponding model, referred to as Berkson errors-in-variables model (Berkson, 1950), is of the form

Y = g(w + ∆) + ε, (2)

where ∆ and ε are independent and centered, w is set by the experimenter and Y is ob-served. In this paper we construct uniform confidence bands in the non-parametric Berkson errors-in-variables model with fixed design (2). In particular, we provide finite sample bounds on the coverage error of these bands. We also address the question how to choose the grid when approximating the supremum of a Gaussian process on [0, 1]. For Berkson-type mea-surement errors, a fixed design as considered in the present paper seems to be of particular relevance in experimentation in physics and engineering. Instead of using the classical ap-proach based on results from extreme-value theory (Bickel and Rosenblatt, 1973), we propose a multiplier bootstrap procedure and construct asymptotic uniform confidence regions by using anti-concentration properties of Gaussian processes which were recently derived by Chernozhukov et al. (2014).

For an early, related contribution see Neumann and Polzehl (1998), who develop the wild bootstrap originally proposed by Wu (1986) to construct confidence bands in a nonpara-metric heteroscedastic regression model with irregular design. While their method could potentially also be adopted in our setting, we preferred to work with the multiplier bootstrap which allows for a more transparent analysis.

There is a vast literature on errors-in-variables models, where most of the earlier work is focused on parametric models (Berkson, 1950; Anderson, 1984; Stefanski, 1985; Fuller, 1987). A more recent overview of different models and methods can be found in the monograph by Carroll et al. (2006). In a non-parametric regression context, Fan and Truong (1993) consider the classical errors-in-variables setting (1), construct a kernel-type deconvolution estimator and investigate its asymptotic performance with respect to weighted Lp-losses and L∞-loss

and show rate-optimality for both ordinary smooth and super smooth known distributions of errors ∆. The case of Berkson errors-in-variables with random design is treated, e. g., in Delaigle et al. (2006), who also assume a known error distribution, Wang (2004), who assumes a parametric form of the error density, and Schennach (2013), whose method relies on the availability of an instrumental variable instead of the full knowledge of the error distribution. Furthermore, Delaigle et al. (2008) consider the case in which the error-distribution is un-known but repeated measurements are available to estimate the error distribution. A mixture

(3)

of both types of errors-in-variables is considered in Carroll et al. (2007) and the estimation of the observation-error variance is studied in Delaigle and Hall (2011).

However, in the aforementioned papers the focus is on estimation techniques and the investi-gation of theoretical as well as numerical performance of the estimators under consideration. In the non-parametric setting only very little can be found about the construction of statis-tical tests or confidence statements. Model checks in the Berkson measurement error model are developed in Koul and Song (2008, 2009), who construct goodness-of-fit tests for a para-metric point hypothesis based on an empirical process approach and on a minimum-distance principle for estimating the regression function, respectively. The construction of confidence statements seems to be discussed only for classical errors in variables models with random design in Delaigle et al. (2015), who focus on pointwise confidence bands based on bootstrap methods and in Kato and Sasaki (2019), who provide uniform confidence bands.

This paper is organized as follows. In Section 2 we discuss the mathematical details of our model and describe non-parametric methods for estimating the regression function in the fixed design Berkson model. In Section 3 we state the main theoretical results and in particular discuss the construction of confidence bands in Section 3.2, where we also discuss the choice of the bandwidth. The numerical performance of the proposed confidence bands is investigated in Section 4. Section 5 outlines an extension to error densities for which the Fourier transform is allowed to oscillate. Some auxiliary lemmas are stated in Section 6. Technical proofs of the main results from Section 3 are provided in Section 7, while details and proofs for the extension in Section 5 along with some additional technical details are given in the Appendix, Section A. In the following, for a function f , which is bounded on some given interval [a, b], we denote by kf k = kf k[a,b] = supx∈[a,b]|f (x)| its supremum norm. The Lp-norm of f over

all of R is denoted by kf kp. Further, for w ∈ R we set hwi := (1 + w2) 1 2.

2

The Berkson errors-in-variables model with fixed design

The Berkson errors-in-variables model with fixed design that we shall consider is given by Yj = g(wj + ∆j) + εj, (3)

where wj = j/(n an), j = −n, . . . , n, are the design points on a regular grid, an is a design

parameter that satisfies an → 0, nan → ∞, and ∆j and εj are unobserved, centered,

inde-pendent and identically distributed errors for which Var[ε1] = σ2 > 0 and E|ε1|M < ∞ for

some M > 2. The density f∆ of the errors ∆j is assumed to be known. For ease of notation,

we consider an equally spaced grid of design points here. However, this somewhat restrictive assumption can be relaxed to more general designs with a mild technical effort, as we elab-orate in the Appendix, Section A.2. For random design Berkson errors-in-variables models, Delaigle et al. (2006) point out that identification of g on a given interval requires an infinitely supported design density if the error density is of infinite support. This corresponds to our assumption that asymptotically, the fixed design exhausts the whole real line, which is assured by the requirements on the design parameter an. Meister (2010) considers the particular case

of normally distributed errors ∆ and bounded design density, where a reconstruction of g is possible by using an analytic extension. If we define γ as the convolution of g and f∆(−·),

that is,

γ(w) = Z

R

(4)

then E[Yj] = γ(wj), and the calibrated regression model (Carroll et al., 2006) associated with

(3) is given by

Yj = γ(wj) + ηj, ηj = g(wj+ ∆j) − γ(wj) + εj. (4)

Here the errors ηj are independent and centered as well but no longer identically distributed

since their variances ν2(wj) = E[ηj2] depend on the design points. To be precise, we have that

ν2(wj) =

Z

g(wj+ δ) − γ(wj)

2

f∆(δ) d δ + σ2 ≥ σ2 > 0. (5)

This reveals the increased variability due to the errors in the predictors. The following

Figure 1: Alleged data points (wj, Yj), actual data points (wj+∆j, Yj), a comparison between

g (dashed line) and γ (solid line) and a comparison between σ2 (dashed line) and ν2 (solid line) (clockwise from upper left to lower left).

considerations show that ignoring the errors in variables can lead to misinterpretations of the data at hand. To illustrate, in the setting of simulation Section 4, scenario 2, Figure 1 (upper left panel) shows the alleged data points (wj, Yj), that is, the observations at the incorrect,

presumed positions, for a sample of size n = 100. In addition to the usual variation introduced by the errors εi in y-direction this display shows a variation in x-direction introduced by the

errors in the wj. The upper right panel shows the actual but unobserved data points (wj+

∆j, Yj) that only contain the variation in the y-direction. Ignoring the errors-in-variables leads

to estimating γ instead of g, which introduces a systematic error. The functions γ (solid line) and g (dashed line) are both shown in the lower right panel of Figure 1. The corresponding variance function is shown in the lower left panel of Figure 1 (solid line) in comparison to

(5)

the constant variance σ2 (dotted line). Apparently, there is a close connection between the calibrated model (4) and the classical deconvolution regression model as considered in Birke et al. (2010) and Proksch et al. (2015) in univariate and multivariate settings, respectively. In contrast to the calibrated regression model (4), in both works an i.i.d. error structure is assumed. Also, our theory provides finite sample bounds and is derived under weaker assumptions, requiring different techniques of proof. In particular, the previous, asymptotic, results are derived under a stronger assumption on the convolution function f∆.To estimate

g, we estimate the Fourier transform of γ, Φγ(t) = Z eitwγ(w) dw, by Φbγ(t) = 1 nan n X j=−n Yjeitwj.

An estimator for g is then given by ˆ gn(x; h) = 1 2π Z R e−itxΦk(ht) ˆ Φγ(t) Φf∆(−t) dt. (6)

Here h > 0 is a smoothing parameter called the bandwidth, and Φkis the Fourier transform of

a bandlimited kernel function k that satisfies Assumption 2 below. Notice that both Φγ and

Φf∆ tend to zero as |t| → ∞ such that estimation of Φγ in (6) introduces instabilities for large

values of |t|. Since the kernel k is bandlimited, the function Φk is compactly supported and

the factor Φk(ht) discards large values of t, therefore serving as regularization. The estimator

can be rewritten in kernel form as follows: ˆ gn(x; h) = 1 nanh n X j=−n YjK  wj− x h ; h  ,

where the deconvolution kernel K(·; h) is given by K(w; h) = 1 2π Z R e−itw Φk(t) Φf∆(−t/h) dt. (7)

3

Theory

By Wm(R) we denote the Sobolev spaces Wm(R) = {g | kΦ

g(·) h · imk2 < ∞}, m > 0, where

we recall that hwi := (1 + w2)12 for w ∈ R. We shall require the following assumptions.

Assumption 1. The functions g and f∆ satisfy

(i) g ∈ Wm(R) ∩ Lr(R) for all r ≤ M and for some m > 5/2, (ii) f∆ is a bounded, continuous, square-integrable density,

(iii) Φg∗f∆ = Φg· Φf∆ ∈ W

s(R) for some s > 1/2.

Assumption 1 (i) stated above is a smoothness assumption on the function g. In Lemma 1 in Section 6.1 we list the properties of g that are frequently used throughout this paper and that are implied by this assumption. In particular, by Sobolev embedding, m > 5/2 implies that the function g is twice continuously differentiable, which is used in the proof of Lemma 5.

(6)

Assumption 2. Let Φk ∈ C2(R) be symmetric, Φk(t) ≡ 1 for all t ∈ [−D, D], 0 < D < 1,

|Φk(t)| ≤ 1 and Φk(t) = 0, |t| > 1.

In contrast to kernel-estimators in a classical non-parametric regression context, the kernel K(·; h), defined in (7), depends on the bandwidth h and hence on the sample size via the factor 1/Φf∆(−t/h). For this reason, the asymptotic behavior of K(·; h) is determined by the

properties of the Fourier transform of the error-density f∆. The following assumption on Φf∆

is standard in the non-parametric deconvolution context (see, e.g., Kato and Sasaki, 2019; Schmidt-Hieber et al., 2013) and will be relaxed in Section 5 below.

Assumption 3. Assume that Φf∆(t) 6= 0 for all t ∈ R and that there exist constants β > 0

and 0 < c < C, 0 < CS such that

chti−β ≤ |Φf∆(t)| ≤ Chti −β and Φ (1) f∆(t) ≤ CShti−β−1. (S) A standard example of a density that satisfies Assumption 3 is the Laplace density with parameter a > 0,

f∆,0(a; x) = a2e−a|x| with Φf∆,0(a; t) = ht/ai

−2. (8)

In this case we find β = 2, C = a2∨ 1, c = a2∧ 1 and C

S = 2/a2∨ 2a2.

Remark 1. Our asymptotic theory cannot accommodate the case of exponential decay of the Fourier transform of the density f∆, as the asymptotic behaviour of the estimators in the

supersmooth case and the ordinary smooth case differs drastically. While for the ordinary smooth case considered here ˆg(x) and ˆg(y) are asymptotically independent if x 6= y, convolu-tion with a supersmooth distribuconvolu-tion is no longer local and causes dependencies throughout the domain. This leads to different properties of the suprema supx∈[0,1]|ˆg(x; h) − E[ˆg(x; h)]|,

which play a crucial role in the construction of our confidence bands. In particular, the asymptotics strongly depend on the exact decay of the characteristic function Φf∆ and needs

a treatment on a case to case basis (more details on the latter issue can be found in van Es and Gugushvili (2008))).

3.1 Simultaneous inference

Our main goal is to derive a method to conduct uniform inference on the regression function g, which is based on a Gaussian approximation to the maximal deviation of ˆgn from g.

We consider the usual decomposition of the difference g(x) − ˆgn(x; h) into deterministic and

stochastic parts, that is

g(x) − ˆgn(x; h) = g(x) − E[ˆgn(x; h)] + E[ˆgn(x; h)] − ˆgn(x; h), where ˆ gn(x; h) − E[ˆgn(x; h)] = 1 nanh n X j=−n ηjK  wj− x h ; h  . (9)

If the bias, the rate of convergence of which is given in Lemma 5 in Section 6, is taken care of by choosing an undersmoothing bandwidth h, the stochastic term (9) in the above decomposition dominates.

(7)

Theorem 1 below is the basic ingredient for the construction of the confidence statements under Assumption 3. It guarantees that the random sum (9) can be approximated by a distribution free Gaussian version, uniformly with respect to x ∈ [0, 1], that is, a weighted sum of independent, normally distributed random variables such that the required quantiles can be estimated from this approximation. In the following assumption conditions on the bandwidth and the design parameter are listed which will be needed for the theoretical results. Assumption 4. (i) ln(n)nM2−1/(anh) + h an + ln(n) 2h + ln(n)a n+na 1 nh1+2β = o(1), (ii) pnanh2m+2β+ p na2s+1n h2+ 1/ √ nanh2 = o(1/pln(n)).

The following example is a short version of a lengthy discussion given in the Appendix, Section A.3. More details can be found there.

Example 1. In a typical setting, the conditions listed in Assumption 4 are satisfied if h is the rate optimal bandwidth of classical deconvolution problems. As an example, consider the case of a function g ∈ Wm(R), m > 5/2, of bounded support, f∆ as in (8) and E[ε41] ≤ ∞. Then

β = 2 and Assumption 1 (iii) holds for any s > 0 such that an can be chosen of order n−ε

for ε arbitrarily small. The rate optimal bandwidth in the classical deconvolution problem is of order n−1/(2(mβ)) (Fan, 1991). With the choices of an = n−ε and h = n−1/(2(mβ)), ε

sufficiently small and s sufficiently large, Assumption 4 (i) and (ii) reduce to the requirements 1/(nanh1+2β) = o(1) and nanh2m+2βln(n) = o(1), respectively. These are met for small ε > 0

since 1/(nanh1+2β) ≥ n−1+ε+5/9 and nanh2m+2βln(n) = ln(n)n−ε.

The first term in Assumption 4 (i) stems from the Gaussian approximation and becomes less restrictive if the number of existing moments of the errors εi increases. The last term in

(i) guarantees that the variance of the estimator tends to zero. The terms in between are only weak requirements and are needed for the estimation of certain integrals. Assumption 4 (ii) guarantees that the bias is negligible under Assumption 3. The first term guarantees undersmoothing, the second term stems from the fact that only observations from the finite grid [−1/an, 1/an] are available, while the third term accounts for the discretization bias. It

is no additional restriction if β > 1/2. For a given interval [a, b], recall that kf k = kf k[a,b]

denotes the supremum norm of a bounded function on [a, b].

Theorem 1. Let Assumptions 2 - 3 and 4 (i) be satisfied. For some given interval [a, b] of interest, let ˆνn be a nonparametric estimator of the standard deviation in model (4) such that

ˆ ν > σ/2 and P  1ˆνν1 [a,b]> n2/M √ nanh  = o(1). (10)

(i) There exists a sequence of independent standard normally distributed random variables (Zn)n∈Z such that for

Dn(x) := √ nanhhβ ˆ ν(x) gˆn(x; h) − E[ˆgn(x; h)], Gn(x) := hβ √ nanh n X j=−n ZjK w j−x h ; h  , (11)

(8)

we have that for all α ∈ (0, 1)

P kDnk ≤ qkGnk(α) − α

≤ rn,1, (12) where qkGnk(α) is the α-quantile of kGnk and for some constant C > 0

rn,1= P  ν1ˆ −1ν > n 2/M √ nanh  + C  1 n+ n2/M√ln(n)3) √ nanh  .

(ii) If, in addition, Assumption 4 (ii) and Assumption 1 are satisfied, E[ˆgn(x; h)] in (11)

can be replaced by g(x) with an additional error term of order rn,2 =

p nanh2m+2β+ p na2s+1n h2+ 1/ √ nanh2.

In particular, Theorem 1 implies that limn→∞P kDnk ≤ qkGnk(α) = α for all α ∈ (0, 1).

Re-garding assumption (10), properties of variance estimators in a heteroscedastic non-parametric regression model are discussed in Wang et al. (2008).

The following theorem is concerned with suitable grid widths of discrete grids Xn,m ⊂ [a, b]

such that the maximum over [a, b] and the maximum over Xn,m behave asymptotically

equiv-alently.

Theorem 2. For some given interval [a, b] of interest, let Xn,m ⊂ [a, b] a grid of points

a = x0,n≤ x1,n≤ . . . ≤ xm,n = b. Let kf kXn,m := maxx∈Xn,m|f (x)|. If the grid is sufficiently

fine, i.e.,

|Xn,m| := max

1≤i≤m|xi,m− xi−1,m| ≤

h1/2 na1/2n

, then, under the assumptions of Theorem 1, The following holds.

(i) For all α ∈ (0, 1) P kDnk ≤ qkGnkXn,m(α) − α ≤ rn,1(1 + o(1)). (13) (ii) If, in addition, Assumption 4 (ii) and Assumption 1 are satisfied, E[ˆgn(x; h)] in (13)

can be replaced by g(x) with an additional error term of order rn,2 =

p nanh2m+2β+ p na2s+1n h2+ 1/ √ nanh2.

3.2 Construction of the confidence sets and bandwidth choice

In this section we present an algorithm which can be used to construct uniform confidence sets based on Theorem 1. Let Gn(x) be the statistic defined in (11). In order to obtain

quantiles that guarantee uniform coverage of a confidence band, generate M times kGnkXn,m,

where |Xn,m| = o(h3/2a1/2n / ln(n)) (see Theorem 2), that is, calculate ˆνn(x) for x ∈ Xn,m,

generate M times 2n + 1 realizations of independent, standard normally distributed random variables Z1,j, . . . , Z2n+1,j, j = 1, . . . , M. Calculate Mn,j := maxx∈Xn,m|Gn,j(x)|. Estimate

the (1 − α)-quantile of kGnk from Mn,1, . . . , Mn,M and denote the estimated quantile by

ˆ

qkGnkXn,m(1 − α). From Theorem 1 we obtain the confidence band

ˆ gn(x; h) ± ˆqkGnkXn,m(1 − α) ˆ νn(x) √ nanh1/2+β , x ∈ [a, b]. (14)

(9)

Remark 2. Given a suitable estimator for the variance ν2, Theorem 1 and Example 1 imply that, typically, the coverage error of the above bands will be of order n2/Mpln(n)/√nanh +

p

nanh2m+2β. The first term is determined by the accuracy of the Gaussian approximation

and will be negligible if the distribution of the errors εi possesses sufficiently many moments,

while the second term is of order √an if the optimal bandwidth of classical deconvolution

problems is used. This shows that, in contrast to confidence bands based on asymptotic quantiles, the coverage error typically decays polynomially in n.

Remark 3. In nonparametric regression without errors-in-variables the widths of uniform confidence bands are of order pln(n)/√nh (see, e.g., Neumann and Polzehl, 1998). Our bands (14) are wider by the factor 1/(anhβ) which is due to the ill-posedness (β) and the,

possibly slow, decay of γ (expressed in terms of an).

For the choice of the bandwidth, Gin´e and Nickl (2010) (see also (Chernozhukov et al., 2014)) convincingly demonstrated how to use Lepski’s method to adapt to unknown smoothness when constructing confidence bands. In our framework, choose an exponential grid of band-widths hk = 2−k for k ∈ {kl, . . . , ku}, with kl, ku ∈ N being such that 2−ku ' 1/n and

2−kl ' (log n)/(na n)

1/(β+ ¯m)

and where ¯m corresponds to the maximal degree of smooth-ness to which one intends to adapt. Then for a sufficiently large constant CL> 0 choose the

index k according to ˆ k = mink ∈ {kl, . . . , ku} | kˆg(·; hk) − ˆg(·; hl)k ≤ CL  log n n anh1+2 βl 1/2 ∀ k ≤ l ≤ ku ,

and choose an undersmoothing bandwidth according as ˆh = hˆk/ log n. A result in

anal-ogy to Gin´e and Nickl (2010) would imply that under an additional self-similarity con-dition on the regression function g, using ˆh in (14) produces confidence bands of width

log n/(n an)

m−1/2β+m

(log n)β+1/2 if g has smoothness m. Technicalities in our setting would be even more involved due to the truncated exhaustive design involving the parameter an.

Therefore, we refrain from going into the technical details. In the subsequent simulations we use a simplified bandwidth selection rule which, however, resembles the Lepski method.

4

Simulations

n = 100 n = 100 n = 750 n = 750 σ = σδ= 0.1 σ = σδ = 0.05 σ = σδ = 0.1 σ = σδ= 0.05

ga 0.25 0.24 0.21 0.12

gb 0.20 0.22 0.22 0.11

Table 1: Regularization parameter used in the subsequent simulations. See text for details on its selection.

In this section we investigate the numerical performance of our proposed methods in finite samples. We consider several different computational scenarios. As regression functions we

(10)

n = 100 n = 100 n = 750 n = 750 σ = σδ= 0.1 σ = σδ = 0.05 σ = σδ = 0.1 σ = σδ= 0.05

ga 5.8% 7.2% 5.1% 5.6%

gb 1.8% 5.3% 5.0% 5.0%

Table 2: Simulated rejection probabilities for bootstrap confidence bands. n = 100 n = 100 n = 750 n = 750

σ = σδ= 0.1 σ = σδ = 0.05 σ = σδ = 0.1 σ = σδ= 0.05

ga 0.44 0.16 0.21 0.14

gb 0.86 0.28 0.24 0.22

Table 3: Average width of bootstrap confidence bands.

consider

ga(x) = (1 − 4(x − 0.1)2)5I[0,1](2|x − 0.1|),

and

gb(x) = (1 − 4(x + 0.4)2)5I[0,1](2|x + 0.4|) + (1 − 4(x − 0.3)2)5I[0,1](2|x − 0.3|).

For the error distribution f∆ we chose two densities of a Laplace distribution as defined in

(8) with a = 0.1√

2 and a = 0.05

2, i.e. standard deviations σδ = 0.1 and σδ = 0.05, respectively.

Finally, an= 2/3 in all simulations discussed below. Our estimation is based on an application

of the Fast Fourier transform implemented in python/scipy. The integration used a damped version of a spectral cut off with cut-off function I(ω) = 1 − exp(−(ω·h)1 2) in spectral space.

Construction of the confidence bands requires the selection of a regularization parameter for the estimator ˆg. In our simulations, we have chosen this parameter by a visual inspection of a sequence of estimates for the regularization parameter, covering a range from over- to under-smoothing, see Figure 2. We chose the minimal regularization parameter for which the estimates do not change systematically in overall amplitude, but appear to only exhibit additional random fluctuations at smaller values of the parameter. In the case shown here, we chose a regularization parameter of 0.27. The same procedure was followed for other combinations of n, σ, σδ and signal ga resp. gb) and the results can be found in Table 1.

This regularization parameter was then kept fixed for each combination of n, σ, σδand signal

g ∈ {ga, gb}. Figures 3 and 4 show four random examples each for estimates of ga and gb,

respectively, together with the associated confidence bands from 250 bootstrap simulations. Solid lines represent the true signal ga and gb and dashed lines the estimates ˆgntogether with

their associated confidence bands. Again, in both cases, n = 100, σ = 0.1 and σδ = 0.1.

Next, we discuss the practical performance of the bootstrap confidence bands in more detail for the first scenario, where the model is correctly specified and the errors in the predictors are taken into account as well. The results are shown in Tables 2 and 3 for the simulated rejection probabilities (one minus the coverage probability) at a nominal value of 5% and for the (average) width of the confidence bands. In all cases, we performed simulations based on 500 random samples of data and nominal rejection probability 5% (i.e. confidence bands with nominal coverage probability of 95%). For each of these data samples, we repeated 250

(11)

Figure 2: Sequence of estimates for increasing regularization parameter from a random sample of observations of signal gb with n = 100 and σ = σδ= 0.1.

times the following scenario. First, we determined the width of the confidence bands from 250 bootstrap simulations and second, we evaluated whether the confidence bands cover the true signal everywhere in an interval of interest. The numbers shown in the table give the percentage of rejections, i.e. of where the confidence bands do not overlap the true signal everywhere in such an interval. Here, the intervals of interest are chosen as an interval where the respective signal is significantly different from 0. The intention of this is that in many practical applications the data analyst is particularly interested in those parts of the signal. Here, we chose the interval [−0.7, 0.6] as ’interval of interest’ for ga and gb. From the tables

we conclude that the method performs well, particularly for n = 750, where the confidence bands are substantially less wide.

5

Extensions

The following assumption is less restrictive than Assumption 3, (S).

Assumption 5. Assume that Φf∆(t) 6= 0 for all t ∈ R and that there exist constants β > 0

and 0 < c < C, 0 < CW such that

chti−β ≤ |Φf(t)| ≤ Chti−β and Φ

(1) f∆(t)

≤ CWhti−β. (W) An example for a density that satisfies Assumption 5 but not Assumption 3 is given by the

(12)

Figure 3: True signal (solid line), observable signal (dash-dotted line) and estimates and associated confidence bands (dashed lines) from four random samples for ga for n = 100 (top)

and n = 750 (bottom) and σ = σδ = 0.1.

mixture f∆,1(1; x) = λ 2f∆,0(1; x − µ) + λ 2f∆,0(1; x + µ) + (1 − λ)f∆,0(1; x), (15) where λ ∈ (0, 1/2) and µ 6= 0, and f∆,0 is the Laplace density defined in (8). We find

Φf∆,1(t) = (1 − λ + λ cos(µt))hti −2

, which yields β = 2, c = 1 − 2λ and CW = λµ + 4.

Technically, Assumption 3, (S) allows for sharper estimates of the tails of the deconvolution kernel (7) than does Assumption 4, (W), see Lemma 4 in Section 6. In this case we have to proceed differently as the approximation via a distribution free process such as Gn can no

longer be guaranteed and we can only find a suitable Gaussian approximation depending on the standard deviation ν.

Roughly speaking, we approximate Dn(x) in (11) by the process

e Gn(x) = p nanh1+2β h ˜νn(x) X j ˜ νn(ωj) ZjK  wj− x h ; h  , (16)

for a variance estimator ˜νn on growing intervals |x| ≤ n an(1 − δ) for some δ > 0. We

then replace the quantiles involving Gn in (12), (13) and in (14) by (the conditional quantiles

given the sample) of eGn. Our theoretical developments involve a sample splitting, hence are

(13)

Figure 4: True signal (solid line), observable signal (dash-dotted line) and estimates and associated confidence bands (dashed lines) from four random samples for gb for n = 100 (top)

and n = 750 (bottom) and σ = σδ = 0.1.

We have also simulated a version of the bootstrap for the extended model. However, as simulations show, the results are clearly not as good as for the more restrictive assumptions on f∆. We have used f∆,1(x) = λ 2 · f∆,0(a; x − 0.3) + (1 − λ) · f∆,0(a; x) + λ 2 · f∆,0(a; x + 0.3), with f∆,0 again the Laplace density defined in (8), a = 0.05/

2 and λ = 0.2. For the signal ga in Section 4 we find confidence band widths of 0.686 and 0.462 for n = 100 and n = 750,

respectively, at simulated coverage probabilities of 6.3% and 4.5% and bandwidths of 0.59 and 0.32, for σ = σδ= 0.1.

Acknowledgements

HH gratefully acknowledges financial support form the DFG, grant Ho 3260/5-1. NB ac-knowledges support by the Bundesministerium f¨ur Bildung und Forschung through the project “MED4D: Dynamic medical imaging: Modeling and analysis of medical data for improved diagnosis, supervision and drug development”. KP gratefully acknowledges financial support by the DFG through subproject A07 of CRC 755.

6

Auxiliary Lemmas

(14)

6.1 Properties of the regression function g, the variance function ν and the convolution kernel K

Assumption 1 stated above is basically a smoothness assumption on the function g. In the following lemma we list the properties of g that are frequently used throughout this paper and that are implied by Assumption 1.

Lemma 1. Let Assumption 1 hold.

(i) The function g is twice continuously differentiable.

(ii) The function g has uniformly bounded derivatives: kg(j)k∞< ∞, j ≤ 2.

Given Assumption 1 (ii), the properties of the function g given in Lemma 1 are transferred to the convolution γ = g ∗ f∆. This is made precise in the following lemma.

Lemma 2. Let Assumption 1 hold.

(i) The function γ = g ∗ f∆ is twice continuously differentiable with derivatives γ(j) =

g(j)∗ f. (ii) γ ∈ Wm(R).

(iii) The function γ has uniformly bounded derivatives: kγ(j)k∞< ∞, j ≤ 2.

Furthermore, the variance function ν2, defined in (5), is a function that depends on f∆, γ and

g. The following lemma lists the properties of ν2, which are implied by the previous Lemmas 1 and 2, and that are frequently used throughout this paper.

Lemma 3. Let Assumption 1 hold.

(i) The variance function ν2 is uniformly bounded and bounded away from zero.

(ii) The variance function ν2 is twice continuously differentiable with uniformly bounded derivatives.

For the tails of the kernel, we have the following estimate. Lemma 4. For any a > 1 and x ∈ [0, 1] we have

Z {|z|>a}  K z − x h ; h 2 dz ≤ C 2a a2− x2 · ( h−2β, if Ass. 4, (W) holds, h−2β+2, if Ass. 3, (S) holds.

Lemma 5. Let Assumptions 1 and 2 be satisfied. Further assume that h/an→ 0 as n → ∞.

(i) Then for the bias, we have that

sup x∈[0,1] Eˆgn(x; h) − g(x) = O  hm−12 + 1 nanhβ+ 3 2  + ( o as+1/2n h1−β, Ass. 3, (S), o as+1/2n h−β, Ass. 4, (W).

(ii) a) For the variance if Assumption 5, (W) holds and nanh1+β → ∞, then we have that

σ2 2Cπ(1 + O(an)) ≤ nanh 1+2βVar[ˆg n(x; h)] ≤ 2βsupx∈Rν2(x) cπ . (ii) b) If actually Assumption 3, (S) holds and nanh1+β → ∞, then

ν2(x) Cπ (1 + O(an)) ≤ nanh 1+2βVar[ˆg n(x; h)] ≤ ν2(x) cπ (1 + O (h/an)). Here c, C and β are the constants from Assumption 5 respectively 3.

(15)

6.2 Maxima of Gaussian processes

Let {Xt| t ∈ T } be a Gaussian process and ρ be a semi-metric on T . The packing number

D(T, δ, ρ) is the maximum number of points in T with distance ρ strictly larger than δ > 0. Similarly to the packing numbers, the covering numbers N (T, δ, ρ) are defined as the number of closed ρ-balls of radius δ, needed to cover T . Let further dXdenote the standard deviation semi-metric on T , that is,

dX(s, t) =  E|Xt− Xs|2  1 2 for s, t ∈ T.

In the following, we drop the subscript if it is clear which process induces the pseudo-metric d.

Lemma 6. There exist constants CE, CEb ∈ (0, ∞) such that

(i) N (T, δ, dGn) ≤ D(T, δ, dGn) ≤ CE h3/2a1/2 n δ . (ii) N (T, δ, d GKb n) ≤ D(T, δ, dGKb n) ≤ C b E h3/2a1/2 n δ , where GKb

n is defined as Gn with K replaced

by bK, where bK(z; h) = zK(z; h).

Lemma 7. Let (Xn,1(t), t ∈ T ) and (Xn,2(t), t ∈ T ) be almost surely bounded, centered

Gaus-sian processes on a compact index set T and suppose that for any fixed n ∈ N diamdXn,1(T ) >

Dn> 0. If

dXn,1(s, t) ≤ dXn,2(s, t) ∀ s, t ∈ T and EkXn,2k = o(1/

p ln(n)), we have that E [kXn,1k] ≤ 2E [kXn,2k] and hence kXn,1k = oP(1/ p ln(n)).

7

Proofs of Theorems 1 and 2

In the following, the letter C denotes a generic, positive constant, whose value may vary form line to line. The abbreviations Rn and eRn, possibly with additional subscripts, are used to

denote remainder terms and their definition may vary from proof to proof.

Proof of Theorem 1. We first prove assertion (i). Let ρn := n2/Mln(n)/

√ nanh and notice that P kDnk ≤ qkGnk(α) ≤ P kGnk ≤ qkGnk(α) + ρn + P kDnk − kGnk > ρn  ≤ α + P qkGnk(α) ≤ kGnk ≤ qkGnk(α) + ρn + P kDnk − kGnk > ρn, since the distribution of kGnk is absolutely continuous. Analogously, it holds

P kDnk ≤ qkGnk(α) ≥ α − P qkGnk(α) − ρn≤ kGnk ≤ qkGnk(α)  − P kDnk − kGnk > ρn,

(16)

and therefore P kDnk ≤ qkGnk(α) − α ≤ sup x∈R P (|kGnk − x| ≤ ρn) + P kGnk − kDnk > ρn . The first term on the right hand side of the inequality is the concentration function of the random variable kGnk, which can be estimated by Theorem 2.1 of Chernozhukov et al. (2014).

This gives P kDnk ≤ qkGnk(α) − α ≤ 4ρn(E[kGnk] + 1) + P kGnk − kDnk > ρn . By Lemma 6 we have N ([0, 1], δ, dGn) ≤ CE/(h 3/2a1/2

n δ), which allows to estimate the

expec-tation E[kGnk] as follows.

E[kGnk] ≤ C Z diamd Gn([0,1]) 0 v u u tln CE h3/2a1/2 n δ ! dδ ≤ Cpln(n). This yields P kDnk ≤ qkGnk(α) − α ≤ C p ln(n)ρn+ P kGnk − kDnk > ρn . We now estimate the term P kGnk − kDnk

> ρn in several steps. With the definition Gn,0(x) := hβ ν(x)√nanh n X j=−n ν(wj)ZjK  wj− x h ; h  , (17) we find P kGnk − kDnk > ρn ≤ P (kGn− Dnk > ρn) ≤ PkGn,0− Dnk > ρn 2  + PkGn− Gn,0k > ρn 2  , and thus P kGnk − kDnk > ρn ≤ P kνGn,0− νnDnk > σρ8n  + P 1νν1ˆ kνGn,0k > ρ4n + P kGn− Gn,0k > ρ2n =: Rn,1+ Rn,2+ Rn,3.

Consider first term Rn,2. Let κ > 0 be a constant and n sufficiently large such that

κ/pln(n) < 1. Then Rn,2≤ P  1νν1ˆ > κ ρn 4 √ ln(n)  + PkνGn,0k > √ ln n κ  =: Rn,2,1+ Rn,2,2.

The term Rn,2,1 is controlled by assumption and the term Rn,2,2 can be estimated by Borell’s

inequality. To this end, denote by d the pseudo distance induced by the process νGn,0. It

holds that E h sup x∈[0,1] ν(x)Gn,0(x) i ≤ E[kνGn,0k] ≤ C Z diam([0,1]) 0 p ln (N (δ, [0, 1], d)) dδ ≤ C Z diam([0,1]) 0 r ln C h32a 1 2 nδ  dδ,

(17)

where the last estimate follows by an application of Lemma 6. By a change of variables, using that for any a ≤ 1

1 a Z a 0 p− ln(x) dx ≤p− ln(a) + 1 p−2 ln(a) ≤ Cp− ln(a), we obtain E h sup x∈[0,1] ν(x)Gn,0(x) i ≤ E[kνGn,0k] ≤ C p ln(n). (18) Next, Rn,2,2 ≤ 2P sup x∈[0,1] (νGn,0)(x) > √ ln n 2κ ! = P sup x∈[0,1] (νGn,0)(x) − E  sup x∈[0,1] (νGn,0)(x) > √ ln n 2κ − E  sup x∈[0,1] (νGn,0)(x)  ! ≤ P sup x∈[0,1](νG n,0)(x) − E  sup x∈[0,1](νG n,0)(x) > √ ln n 4κ ! ,

for sufficiently small κ such that E h supx∈[0,1]ν(x)Gn,0(x) i < √ ln n 4κ . An application of Borell’s inequality yields Rn,2,2≤ exp  ln(n) 32κ2σ2 [0,1]  ,

where σ[0,1]2 := supx∈[0,1]Var[ν(x)Gn,0(x)] is a bounded quantity by Lemma 6. For sufficiently

small κ, this yields the estimate Rn,2≤ P  1νν1ˆ > κ ρn 4√ln(n)  +C n.

Next, we estimate the term Rn,1, i.e., we consider the approximation of Dn by a suitable

Gaussian process. To this end, consider the standardized random variables ξj := ηj/ν(wj)

and write ˆ νn(x)Dn(x) = hβ √ nanh n X j=−n ξjν(wj)K  wj− x h ; h  = h β √ nanh  ξ0ν(w0)K  −x h; h  + n X j=1 ξjν(wj)K  wj− x h ; h  + −1 X j=−n ξjν(wj)K  wj − x h ; h  =: D0n(x) + D+n(x) + D − n(x), (19)

(18)

j-th partial sum Sj :=Pjν=1ξν, set S0 ≡ 0 and write n X j=1 ξjν(wj)K  wj− x h ; h  = Snν(wn)K  wn− x h ; h  − n−1 X j=0 Sj  ν(wj+1)K  wj+1− x h ; h  − ν(wj)K wj− x h ; h  = Snν(wn)K  wn− x h ; h  − n−1 X j=1 Sj Z [wj,wj+1] d dz  ν(z)K z − x h ; h   dz.

By assumption, there exists a constant M > 2 such that E[|ε1|M] < ∞. By Lemma 2, γ is

uniformly bounded, which implies E[|ηj|M] ≤ M for some M > 0 and all j. By Corollary

4, §5 in Sakhanenko (1991) there exist iid standard normally distributed random variables Z1, . . . , Zn such that, for W (j) :=

Pn

j=1Zj the following estimate holds for any positive

constant C: P  max 1≤j≤n|Sj− W (j)| > n2/M 2C  ≤ n X j=1 E[|ξj|M]  C n2/M M + P max 1≤j≤nξj > n2/M C ! . (20) Therefore, kˆνnD+n − νG+n,0k ≤ max1≤j≤n|Sj− W (j)| σ√nanh  sup x∈[0,1] ν(wn)hβ K wn− x h ; h  + sup x∈[0,1] Z [w1,wn] d dz  hβν(z)K z − x h ; h  dz  ,

where G+n,0 is defined in analogy to D+n in (19), with ξj replaced by Zj. For n sufficiently

large, we have an< 1/2 and thus, for x ∈ [0, 1] we have that (wn− x)/h ∈ [1/(2anh), 1/anh]

and thus ν(wn)hβ K wn− x h ; h  ≤ ν(wn)hβ sup u>1/(2anh) |K (u; h)| ≤ ν(wn)hβ sup u>1/(2anh)

|2anhuK (u; h)| ≤ 2anhβ+1ν(wn)k · K (·; h) k∞≤ Canh,

by (21). Next, sup x∈[0,1] Z [w1,wn] d dz  hβν(z)K z − x h ; h  dz ≤ Chβ sup x∈[0,1] Z [w1,wn] K z − x h ; h  + 1 h K0 z − x h ; h  dz ≤ Chβ sup x∈[0,1] Z 1 anh −1 h h |K (u; h)| + K0(u; h) dz ≤ C ln(n) by (S3) in the appendix. This yields

kˆνnD+n − νGn+k ≤ kˆνnDn− νGnk ≤ C ln(n)

max1≤j≤n|Sj− W (j)|

√ nanh

(19)

Hence, Rn,1≤ P  C ln(n)max1≤j≤n√ |Sj − W (j)| nanh > ρn 2  ≤ P  max 1≤j≤n|Sj− W (j)| > n2/M 2C  .

Since 0 < σ < ν(wj), E[|ξj|M] ≤ M/σM for all 1 ≤ j ≤ n, we have Rn,2≤ C/n by (20).

Last, we need to estimate the term Rn,3. We have

Rn,3= P kGn− Gn,0k > ρ4n ≤ P  k eRnk > ρ4n  , where e Rn(x) := hβ √ nanh n X j=−n Zj(ν(wj) − ν(x))K  wj− x h ; h  .

Using that by Lemma 3 |ν(wj) − ν(x)| ≤ C|wj− x| = hC|wj − x|/h, we find that

N[0, 1], δ, d e Rn  ≤ √C anhδ .

Furthermore, there exist positive constantsbc and bC such that

b c h 2 nanh ≤ sup x∈[0,1] VarhRen(x) i ≤ bC h 2 nanh .

By Theorem 4.1.2 in Adler and Taylor (2007), there exists a universal constant K such that, for all u > 2 q b Cnah2 nh, P sup x∈[0,1] e R(x) ≥ u ! ≤ Kun √ anh b ch2 · Ψ u q b Cnanhh2 ! ,

where Ψ denotes the tail function of the standard normal distribution. Setting u = ρn/8

yields, for sufficiently large n,

P sup x∈[0,1] e R(x) ≥ ρn 8 ! ≤ Kun √ anh bch 2 · Ψ u q b Cnanhh2 ! ≤ Kn 1 2+ 2 Mln(n) 8bch2 · Ψ  ln(n)nM2 8h √ b C  ≤ Cn exp(−n4/M/h) ≤ C n. Therefore, Rn,3≤ C/n, which concludes the proof of assertion (i).

Assertion (ii) is again an immediate consequence of Lemma 5. Proof of Theorem 2. On the one hand,

P 

kDnk ≤ qkGnkXn,m(α)



(20)

by Theorem 1. On the other hand, P  kDnk ≤ qkGnkXn,m(α)  ≥ α − C n− P  qkGnkXn,m(α) − ρn≤ kGnk ≤ qkGnkXn,m(α)  . Note that (23) implies

|s − t| ≤ |Xn,m| ⇒ dGn(s, t) ≤ |Xn,m|hβkK(1)(·; h)k∞ h3/2a1/2 n ≤ C|Xn,m| h3/2a1/2 n . Hence, kGnkXn,m ≤ kGnk ≤ kGnkXn,m+ sup s,t:|s−t|≤|Xn,m| |Gn(s) − Gn(t)| ≤ kGnkXn,m+ sup s,t:dGn(s,t)≤C |Xn,m| h3/2a1/2n |Gn(s) − Gn(t)| =: kGnkXn,m+ τn. This yields P  kDnk ≤ qkGnkXn,m(α)  ≥ α − C n− P  qkGnkXn,m(α) − ρn− τn≤ kGnkXn,m ≤ qkGnkXn,m(α)  . By Corollary 2.2.8 in van der Vaart and Wellner (1996) and Lemma 6, we find

E[τn] ≤ C Z C |Xn,m| h3/2a1/2n 0 q ln N ([0, 1], dGn, η) dη ≤ C |Xn,m| √ − ln(|Xn,m|) h3/2a1/2 n .

Since |Xn,m| ≤ h1/2/na1/2n , we have that |Xn,m|

− ln(|Xn,m|) h3/2a1/2

n

= o(ρ2n) and therefore, by Markov’s inequality, P (τn≥ ρn) ≤ C |Xn,m| √ − ln(|Xn,m|) h3/2a1/2 n 1 ρn = o(ρn). This yields P  kDnk ≤ qkGnkXn,m(α)  ≥ α − C n− P  qkGnkXn,m(α) − 2ρn≤ kGnkXn,m ≤ qkGnkXn,m(α)  − o(ρn) ≥ α − C n− supx∈RP kGnkXn,m ∈ [x − 2ρn, x + 2ρn] − o(ρn) ≥ α − C 1 n− ρn ,

where we applied Theorem 2.1 in Chernozhukov et al. (2014). Claim 1 of this theorem now follows. Claim 2 is an immediate consequence of Lemma 5.

(21)

8

Proofs of the auxiliary lemmas

Proof of Lemma 1. Assertion (i) is a direct consequence of Sobolev’s Lemma. (ii) By an application of the Hausdorff-Young inequality we obtain

dj dxjg ≤ 1 2π Φdj dxjg 1, j = 0, 1, 2.

Fourier transformation converts differentiation into multiplication, that is, Φdj dxjg 1= k(·) jΦ gk1, j = 0, 1, 2.

Since g ∈ Wm(R) for m > 5/2 by Assumption 1 it follows by an application of the Cauchy-Schwarz inequality that k(·)jΦgk1 < ∞ for j = 0, 1, 2 and the assertion follows.

Proof of Lemma 2. Assertion (i) follows from Proposition 8.10 in Folland (1984) since f∆ is

a density and is hence integrable.

Assertion (ii) is a direct consequence of Assumption 1 and the convolution theorem: Φγ = Φg∗f∆(−·) = Φg· Φf∆,

since Φf∆ is bounded.

Assertion (iii) follows in the same manner as the second claim of Lemma 1. Proof of Lemma 3. (i) Recall from definition (5) that

ν2(z) = Z

g(z + δ) − γ(z)2f∆(δ) d δ + σ2, σ2 > 0.

Hence, it follows from Lemma 1 (ii) and Lemma 2 (iii) that

0 < σ2 ≤ ν2(z) ≤ σ2+ 2(kgk2∞+ kγk2∞) < ∞.

(ii) By the first assertions of Lemma 1 and Lemma 2, the functions g and γ are twice contin-uously differentiable and f∆ is continuous. This yields for j = 1, 2

dj dzjν 2(z) = Z j ∂zj  g(z + δ) − γ(z)2f∆(δ)  d δ.

Since by Lemma 1 and Lemma 2 the derivatives of g and γ are uniformly bounded and f∆ is

a probability density, we find for j = 1, 2 dj dzjν 2(z) ≤ sup z∈R ∂j ∂zj g(z + δ) − γ(z) 2 .

Proof of Lemma 4. From (7), we deduce for w ∈ R iwK(w; h) = 1 2π Z R e−itw d dt  Φk(t) Φf∆(−t/h)  dt = 1 2π Z R e−itw   Φ(1)k (t) Φf∆(−t/h) + Φk(t) · Φ (1) f∆(−t/h) h(Φf∆(−t/h)) 2  dt.

(22)

Hence, sup w∈R |wK(w; h)| ≤ 1 2πc Φ (1) k · h β + Φk·Φ (1) f∆(−·/h) ch · h 2β 1 = ( O(h−β−1), (W), O(h−β), (S). (21)

In particular, for all w ∈ R\{0},

|K(w; h)| ≤ C |w| ·

(

h−β−1, (W),

h−β, (S). (22)

Now, let a > 1. Then Z {|z|>a}  K z − x h ; h 2 dz ≤ C Z {|z|>a}  h z − x 2 dz · ( h−2β−2, (W) h−2β, (S) ≤ C 2a a2− x2 ( h−2β, (W) h−2β+2, (S) .

Proof of Lemma 5. The proof of Lemma 5 is straightforward but tedious. We therefore omit the proof here and defer it to the appendix.

Proof of Lemma 6. dGn(s, t) 2 = E|Gn(s) − Gn(t)|2 = h2β nanh n X j=−n K wj− s h ; h  − Kwj − t h ; h  2 ≤ h 2βkK(1)(·; h)k2 ∞ anh  s − t h 2 ≤ Ch 2β anh t Φk(t) Φ(−t/h) 2 1  s − t h 2 , (23)

where the last estimate follows by the Hausdorff-Young inequality and definition (7). There-fore, by Assumption 3, there exists a constant CE such that dGn(s, t) ≤ CE|s − t|/(a

1 2 nh

3 2).

Now, consider the equidistant grid Gn,δ:=  tj = ja 1 2 n CEδ, j = 1, . . . ,  CE a 1 2 nh 3 2δ  ⊂ [0, 1]

and note that for each s ∈ [0, 1] there exists a tj ∈ Gn,δ such that |s − tj| ≤ a1/2n h3/2δ/(2CE),

which implies dGn(s, tj) ≤ δ/2. Therefore, the closed dGn-balls with centers tj ∈ Gn,δ and

radius δ/2 cover the space [0, 1], i.e.,

N ([0, 1], δ/2, dGn) ≤ CE h32a 1 2 nδ .

The relationship N ([0, 1], δ, dGn) ≤ D([0, 1], δ, dGn) ≤ N ([0, 1], δ/2, dGn) now yields the first

claim of the lemma. Using that, by Assumption 3, k bK(1)(·, h)k∞≤ k · K(1)(·, h)k∞+ kK(·, h)k∞ ≤ C  d dt  tΦk(t) Φf∆(−t/h)  1 +  tΦk(t) Φf∆(−t/h)  1  ≤ Ch−β, the second claim follows along the lines of the first claim.

(23)

References

Adler, R. J. and Taylor, J. E. (2007). Random Fields and Geometry. Springer, New York. Anderson, T. W. (1984). Estimating linear statistical relationships. Ann. Statist., 12(1):1–45. Berkson, J. (1950). Are there two regressions? Journal of the American Statistical

Associa-tion, 45:164–180.

Bickel, P. J. and Rosenblatt, M. (1973). On some global measures of the deviations of density function estimates. Annals of Statistics, 1:1071–1095.

Birke, M., Bissantz, N., and Holzmann, H. (2010). Confidence bands for inverse regression models. Inverse Problems, 26:115020.

Carroll, R. J., Delaigle, A., and Hall, P. (2007). Non-parametric regression estimation from data contaminated by a mixture of berkson and classical errors. J. R. Stat. Soc. Ser. B Stat. Methodol., 69:859–878.

Carroll, R. J., Ruppert, D., Ste ski, L. A., and Crainiceanu, C. M. (2006). Measurement error in nonlinear models, volume 105 of Monographs on Statistics and Applied Probability. Chapman & Hall/CRC, Boca Raton, FL, second edition. A modern perspective.

Chernozhukov, V., Chetverikov, D., and Kato, K. (2014). Anti-concentration and honest adaptive confidence bands. Annals of Statistics, 42:1564–1597.

Delaigle, A. and Hall, P. (2011). Estimation of observation-error variance in errors-in-variables regression. Statist. Sinica, 21:103–1063.

Delaigle, A., Hall, P., and Jamshidi, F. (2015). Confidence bands in non-parametric errors-in-variables regression. J. R. Stat. Soc. Ser. B. Stat. Methodol., 77:149–169.

Delaigle, A., Hall, P., and Meister, A. (2008). On deconvolution with repeated measurements. Ann. Statist., 36(2):665–685.

Delaigle, A., Hall, P., and Qiu, P. (2006). Nonparametric methods for solving the berkson errors-in-variables problem. J. R. Stat. Soc. Ser. B Stat. Methodol., 68:201–220.

Fan, J. (1991). On the optimal rates of convergence for nonparametric deconvolution prob-lems. The Annals of Statistics, 19(3):1257–1272.

Fan, J. and Truong, Y. K. (1993). Nonparametric regression with errors in variables. Ann. Statist., 21:1900–1925.

Folland, G. B. (1984). Real Analysis - Modern Techniques and their Applications. Wiley, New York.

Fuller, W. A. (1987). Measurement error models. Wiley Series in Probability and Mathe-matical Statistics: Probability and MatheMathe-matical Statistics. John Wiley & Sons, Inc., New York.

Gin´e, E. and Nickl, R. (2010). Confidence bands in density estimation. The Annals of Statistics, 38(2):1122–1170.

(24)

Kato, K. and Sasaki, Y. (2019). Uniform confidence bands for nonparametric errors-in-variables regression. Journal of Econometrics, 213(2):516–555.

Koul, H. L. and Song, W. (2008). Regression model checking with berkson measurement errors. J. Statist. Plann. Inference, 138:1615–1628.

Koul, H. L. and Song, W. (2009). Minimum distance regression model checking with berkson measurement errors. Ann. Statist., 37:132–156.

Meister, A. (2010). Nonparametric berkson regression under normal measurement error and bounded design. J. Multivariate Anal., 101:1179–1189.

Neumann, M. H. and Polzehl, J. (1998). Simultaneous bootstrap confidence bands in non-parametric regression. Journal of Nonnon-parametric Statistics, 9:307–333.

Proksch, K., Bissantz, N., and Dette, H. (2015). Confidence bands for multivariate and time dependent inverse regression models. Bernoulli, 21:144–175.

Sakhanenko, A. I. (1991). On the accuracy of normal approximation in the invariance prin-ciple. Siberian Advances in Mathematics, 1:58–91.

Schennach, S. M. (2013). Regressions with berkson errors in covariates – a nonparametric approach. The Annals of Statistics, 41:1642–1668.

Schmidt-Hieber, J., Munk, A., D¨umbgen, L., et al. (2013). Multiscale methods for shape constraints in deconvolution: Confidence statements for qualitative features. The Annals of Statistics, 41(3):1299–1328.

Stefanski, L. A. (1985). The effects of measurement error on parameter estimation. Biometrika, 72(3):583–592.

van der Vaart, A. and Wellner, J. (1996). Weak convergence and empirical processes. With applications to statistics. Springer, New York.

van Es, B. and Gugushvili, S. (2008). Weak convergence of the supremum distance for supersmooth kernel deconvolution. Statist. Probab. Lett., 78(17):2932–2938.

Wang, L. (2004). Estimation of nonlinear models with berkson measurement errors. Ann. Statist., 32:2559–2579.

Wang, L., Brown, L. D., Cai, T. T., and Levine, M. (2008). Effect of mean on variance function estimation in nonparametric regression. Ann. Statist., 36(2):646–664.

Wu, C.-F. J. (1986). Jackknife, bootstrap and other resampling methods in regression analysis. The Annals of Statistics, 14(4):1261–1295.

(25)

A

Appendix: Proofs of technical results in the main paper

A.1 Proof of Lemma 5

(i) We have that

E [ˆgn(x; h)] = 1 nanh n X j=−n γ(wj)K  wj − x h ; h  = 1 h n X j=−n Z wj+nan1 wj γ(wj)K  wj − x h ; h  dz = Z an1 +nan1 − 1 an γ(z)K z − x h ; h  dz + Rn,1(x) + Rn,2(x), where Rn,1(x) = 1 h n X j=−n Z wj+nan1 wj d du  γ(u)K u − x h ; h  u=z (wj − z) dz and Rn,2(x) = 1 2 h n X j=−n Z wj+nan1 wj d2 du2  γ(u)K u − x h ; h  u= ˜wj(z) (wj− z)2dz. Then, Rn,1(x) = 1 h n X j=−n Z wj+nan1 wj γ0(z)K z − x h ; h  (wj − z) dz − 1 h2 n X j=−n Z wj+nan1 wj γ(z)K0 z − x h ; h  (wj− z) dz =: Rn,1,1(x) + Rn,1,2(x). Now, nanh|Rn,1,1(x)| ≤ Z an1 − 1 an γ0(z)K z − x h ; h  dz ≤ kγ0k2 K · − x h ; h  2 = O  1 h−1 2+β  . Analogously, |Rn,1,2(x)| = O 1 nanh 3 2+β ! .

(26)

Furthermore |Rn,2(x)| ≤  1 n2a2 nh n X j=−n Z wj+nan1 wj γ00( ˜wj(z))K  ˜wj(z) − x h ; h  dz + 2 n2a2 nh2 n X j=−n Z wj+nan1 wj γ0( ˜wj(z))K0  ˜wj(z) − x h ; h  dz  + 1 n2a2 nh3 n X j=−n Z wj+nan1 wj γ( ˜wj(z))K00  ˜wj(z) − x h ; h  dz =: [Rn,2,1(x)] + Rn,2,2(x). Then, Rn,2,1(x) = O  1 n2a3 nh2+β + 1 n2a3 nh2+β  = o  1 nanh 3 2+β 

, since h/an→ 0. By Assumption

1 (iii) F γ ∈ Ws, s > 12, therefore γ ∈ L1(R) and hence Rn,2,2≤ C h3+β(na n)2  kγk1+ 1 na2 n  = O  1 n2a2 nh3+β + 1 n3a4 nh3+β  = O 1 nanh 3 2+β 1 nanh 3 2 + 1 n2a3 nh 3 2+β !! = o 1 nanh 3 2+β ! ,

where we used again that h/an→ 0. Hence, in total we find

E[ˆgn(x; h)] = 1 h Z an1 − 1 an γ(z)Kz − x h ; h  dz + O 1 nanh 3 2+β  .

Next, we enlarge the domain of integration and estimate the remainder as follows. By the Cauchy-Schwarz inequality we obtain

Z |z|> 1 an γ(z)K z − x h ; h  dz ≤ Z |z|> 1 an |γ(z)|2dz !12 × Z |z|> 1 an K z − x h ; h  2 dz !12 . By Assumption 1 (iii) Z |z|> 1 an |γ(z)|2dz ≤ Z |z|> 1 an (1 + z2)s (1 +a12 n) s|γ(z)| 2dz ≤ Ca2s n. By Lemma 4 Z {|z|>an1 }  K z − x h ; h 2 dz ≤ Can ( h−2β, (W ) h−2β+2, (S) ) = O(ankK(·; h)k22).

(27)

Hence, E[ˆgn(x; h)] = 1 h Z γ(z)K z − x h ; h  dz + o  1 nanh 3 2+β  +    Oas+1/2n h1−β  , Ass. 3, (S), O  as+1/2n h−β  , Ass. 4, (W). Furthermore, by Plancherel’s equality and the convolution theorem,

1 h Z γ(z)Kz − x h ; h  dz = 1 h Z γ(z) eKz − x h ; h  dz = 1 2πh Z Φγ(z)ΦK(e ·−x h ;h)(z) dz = 1 2π Z exp(ixz)Φγ(z)ΦK(·;h)e (zh) dz = 1 2π Z exp(ixz)Φf∆(z)Φg(z) Φk(hz) Φf∆(−z) dz = 1 2π Z exp(ixz)Φg(−z)Φk(hz) dz. Hence, 1 h Z γ(z)K z − x h ; h  dz = 1 2π Z exp(ixz)Φg(−z) dz + 1 2π Z exp(ixz)Φg(−z) (Φk(hz) − 1) dz = g(x) + 1 2π Z exp(ixz)Φg(z) (Φk(hz) − 1) dz = g(x) + Rn(x). Finally, |Rn(x)| ≤ C Z |z|>D/h |Φg(z)| dz ≤ Z |z|>D/h  1 1 + z2 m dz !1 2 × Z |z|>D/h |hzi2m|Φg(z)|2dz !1 2 ,

which yields the estimate Rn(x) = O(hm− 1 2).

(ii) In the situation of both, (ii)a) and (ii)b), we have Varˆgn(x; h) = 1 n2a2 nh2 n X j=−n ν2(wj) Kwj− x h ; h  2 = 1 nanh2 Z 1 an+ 1 nan −1 an ν2(z)  K z − x h ; h 2 dz + Rn(x), where Rn(x) = 1 nanh2 n X j=−n Z wj+nan1 wj  ν2(wj)  Kwj− x h ; h 2 − ν2(z)  Kz − x h ; h 2 dz.

(28)

Then Rn(x) = 1 nanh2 n X j=−n Z wj+nan1 wj ν2(wj)  Kwj− x h ; h 2 −  Kz − x h ; h 2 dz + 1 nanh2 n X j=−n Z wj+nan1 wj  Kz − x h ; h 2 ν2(w j) − ν2(z) dz =: Rn,1(x) + Rn,2(x).

By uniform Lipschitz continuity of ν2 (see Lemma 3 (ii)), it is immediate that |Rn,2(x)| ≤ C n2a2 nh2 Z  K z − x h ; h 2 ≤ C n2a2 nh kK(·, h)k2 = O  1 n2a2 nh1+2β  .

Next, we consider the term Rn,1 for which we will use a Taylor expansion of K2(·; h). To this

end, notice first from (7) that for any l K(l)(w; h) = (−1) l 2π Z e−itwΦ k(t) · tl Φf(−t/h) dt,

where the functions Fl : t 7→ Φk(t) · tl is uniformly bounded by 1 and twice continuously

differentiable by Assumption 2 for any l ∈ N. It follows that K2(·; h) is smooth with integrable derivatives of all orders l ∈ N, since

(K2)(l)(w; h) = (K · K)(l)(w; h) = l X k=0  l k  K(l−k)(w; h)K(k)(w; h), by the general Leibniz rule. This yields

Z (K 2)(l)(w; h) dw ≤ l X k=0  l k  Z |K(l−k)(w; h)|2dw 12 × Z |K(k)(w; h)|2dw 12 ≤ C h2β, (S1)

by Lemma 4 and the previous discussion. Let M ∈ N be such that M ≥ 2β− 1. It follows that |Rn,1(x)| ≤ C nanh2 n X j=−n Z wj+nan1 wj "M X l=1 (K2)(l)z − x h ; h   1 nanh l# dz +  1 nanh M +1 · 1 anh2β ! . By (S1), we deduce |Rn,1(x)| ≤ C nanh1+2β 1 nanh + 1 han  1 nanh M +1! ≤ C nanh1+2β   1 nanh + 1 nanh1+ 2 M +1 !M +1 .

(29)

Since M ≥β2 − 1, we finally obtain |Rn,1(x)| = o  1 nanh1+2β  . An application of Plancherel’s theorem and Assumption 5 give

1 πCh2β ≤ kK(·; h)k 2 2 = 1 2π Φk Φf∆(·/h) 2 2 ≤ 1 πc  1 + 1 h2 β . (S2)

Now, if (S) holds, an application of Lemma 4 yields

sup x∈[0,1] Z ∞ 1 an  K z − x h ; h 2 dz = O anh 2 h2β  = O anh2kK(·; h)k22 . Thus, Z 1 an+ 1 nan −an1 ν2(z)  Kz − x h ; h 2 dz = hν2(x)kK(·; h)k22(1 + O(han)) + Rn,3(x), where |Rn,3(x)| := Z 1 an+ 1 nan − 1 an ν2(z) − ν2(x)  K z − x h ; h 2 dz ≤ Ch Z 2 anh − 2 anh ν2(zh − x) − ν2(x) K(z; h) 2 dz ≤ Ch Z 2 anh − 2 anh zh K(z; h) 2 dz. By (24) we have |z · K z; h| ≤ C/hβ and |Rn,3(x)| ≤ Ch2−β Z 2 anh − 2 anh K(z; h) dz = O  h2ln(n) h2β  ,

since, by (24) and (25) in the proof of Lemma 4 and (S), Z 2 anh − 2 anh K z; h dz ≤ C hβ + Z 1≤|z|≤anh2 K z; h dz ≤ C hβ 1 + Z 1≤|z|≤anh2 1 |z|dz ! ≤ C hβ (1 + ln(2/anh)) = O(ln(n)/h β). (S3)

Assertion (ii)b) now follows. (ii)a) Under (W) Z 1 an+ 1 nan − 1 an ν2(z)  Kz − x h ; h 2 dz ≤ sup y∈R ν2(y)h Z |K(z; h)|2dz,

(30)

and the second inequality of (ii)a) follows by (S2). Furthermore, Z 1 an+ 1 nan − 1 an ν2(z)  K z − x h ; h 2 dz ≥ σ2 Z 1 an+ 1 nan − 1 an  K z − x h ; h 2 dz ≥ hσ 2 2 kK(·; h)k 2 2,

for sufficiently large n by Lemma 4. Now, the first inequality of (ii)a) follows by (S2), which concludes the proof of this lemma.

A.2 Extensions to non-equidistant design

For ease of notation, we considered an equally spaced design in the main document. However, this can be relaxed to more general designs. In this section, we restate the main results (Theorem 1, Theorem 2, Lemma 5) and adjust their proofs to the case where the design is generated by a known positive design density fD,n on [0, ∞) as follows

j n + 1 =

Z wj

0

fD,n(z) dz, j = 1, . . . , n,

and wj = −w−j. Note that, given the latter definition, we have fD,n(z) = (n+1)/nanI[0,1/an](z).

Furthermore, we require the following regularity assumptions. Assumption 6.

(i) The density fD,n is continuously differentiable, fD,n∈ C1(supp(fD,n)).

(ii) There exist constants cD and CD such that cDan≤ fD,n ≤ CDan.

(iii) The derivative fD,n0 is uniformly bounded, |fD,n0 | ≤ anCD0.

Regarding our estimator, we need to make the following adjustment to accommodate the more general design

ˆ gn(x; h) = 1 nh n X j=−n Yj fD,n(wj) K wj− x h ; h  .

This yields the following adjusted Lemma 5 and adjusted proof.

Lemma 8. Let Assumptions 1 and 2 be satisfied. Further assume that h/an→ 0 as n → ∞.

(i) Then for bias, we have that

sup x∈[0,1] Eˆgn(x; h) − g(x) = O  hm−12 + 1 nanhβ+ 3 2  + ( o as+1/2n h1−β, Ass. 3, (S), o as+1/2n h−β, Ass. 4, (W).

(ii) a) For the variance if Assumption 5, (W) holds and nanh1+β → ∞, then we have that

1 CD · σ 2 2Cπ(1 + O(an)) ≤ nanh 1+2βVar[ˆg n(x; h)] ≤ 2βsupx∈Rν2(x) cπ · 1 cD .


(ii) b) If actually Assumption 3, (S) holds and $n a_n h^{1+\beta} \to \infty$, then
\[
\frac{a_n}{f_{D,n}(x)} \cdot \frac{\nu^2(x)}{C \pi} \big( 1 + O(a_n) \big) \le n a_n h^{1+2\beta} \operatorname{Var}[\hat{g}_n(x; h)] \le \frac{\nu^2(x)}{c \pi} \big( 1 + O(h/a_n) \big) \cdot \frac{a_n}{f_{D,n}(x)}.
\]

Here $c$, $C$ and $\beta$ are the constants from Assumptions 5 and 3, respectively.

Proof of Lemma 8. (i) We have that

\[
\mathbb{E}[\hat{g}_n(x; h)] = \frac{1}{n h} \sum_{j=-n}^{n} \frac{\gamma(w_j)}{f_{D,n}(w_j)} K\Big( \frac{w_j - x}{h}; h \Big) = \frac{1}{n h} \sum_{j=-n}^{n-1} \int_{w_j}^{w_{j+1}} \frac{\gamma(w_j)}{f_{D,n}(w_j)(w_{j+1} - w_j)} K\Big( \frac{w_j - x}{h}; h \Big) \, dz.
\]
Next, observe that
\[
f_{D,n}(w_j)(w_{j+1} - w_j) = F_{D,n}(w_{j+1}) - F_{D,n}(w_j) - \frac{1}{2} f_{D,n}'(w_j^*)(w_{j+1} - w_j)^2 = \frac{1}{n+1} - \frac{1}{2} f_{D,n}'(w_j^*)(w_{j+1} - w_j)^2,
\]
where $F_{D,n}$ is the primitive of $f_{D,n}$. This yields
\[
\Big| f_{D,n}(w_j)(w_{j+1} - w_j) - \frac{1}{n} \Big| \le \frac{1}{n^2} + \frac{C_D'}{2} \cdot \frac{1}{n^2 a_n^2 c_D} \quad \text{and thus} \quad \frac{1}{f_{D,n}(w_j)(w_{j+1} - w_j)} = n \Big( 1 + O\Big( \frac{1}{n a_n^2} \Big) \Big). \tag{S4}
\]

Replacing $1/\big(f_{D,n}(w_j)(w_{j+1} - w_j)\big)$ by the latter estimate now yields
\[
\mathbb{E}[\hat{g}_n(x; h)] = \frac{1}{h} \int_{-\frac{1}{a_n}}^{\frac{1}{a_n}} \gamma(z) K\Big( \frac{z - x}{h}; h \Big) \, dz \, \Big( 1 + O\Big( \frac{1}{n a_n^2} \Big) \Big) + R_{n,1}(x) + R_{n,2}(x),
\]
where
\[
R_{n,1}(x) = \frac{1}{h} \sum_{j=-n}^{n-1} \int_{w_j}^{w_{j+1}} \frac{d}{du} \Big( \gamma(u) K\Big( \frac{u - x}{h}; h \Big) \Big) \Big|_{u=z} (w_j - z) \, dz
\]
and
\[
R_{n,2}(x) = \frac{1}{2h} \sum_{j=-n}^{n-1} \int_{w_j}^{w_{j+1}} \frac{d^2}{du^2} \Big( \gamma(u) K\Big( \frac{u - x}{h}; h \Big) \Big) \Big|_{u=\tilde{w}_j(z)} (w_j - z)^2 \, dz.
\]

For $z \in [w_j, w_{j+1}]$, the integrands of $R_{n,1}$ and $R_{n,2}$ can be bounded by means of Assumption 6. Therefore, the rest of the proof of claim (i) follows along the lines of the proof of Lemma 5 (i).

(ii) In the situation of both (ii)a) and (ii)b), we have
\[
\operatorname{Var}[\hat{g}_n(x; h)] = \frac{1}{n^2 h^2} \sum_{j=-n}^{n} \frac{\nu^2(w_j)}{f_{D,n}(w_j)^2} \Big| K\Big( \frac{w_j - x}{h}; h \Big) \Big|^2 = \frac{1}{n h^2} \int_{-\frac{1}{a_n}}^{\frac{1}{a_n}} \frac{\nu^2(z)}{f_{D,n}(z)} \Big| K\Big( \frac{z - x}{h}; h \Big) \Big|^2 \, dz \cdot \Big( 1 + O\Big( \frac{1}{n a_n^2} \Big) \Big) + R_n(x),
\]
where
\[
R_n(x) = \frac{1}{n h^2} \sum_{j=-n}^{n-1} \int_{w_j}^{w_{j+1}} \bigg( \frac{\nu^2(w_j)}{n f_{D,n}(w_j)^2 (w_{j+1} - w_j)} \Big| K\Big( \frac{w_j - x}{h}; h \Big) \Big|^2 - \frac{\nu^2(z)}{f_{D,n}(z)} \Big| K\Big( \frac{z - x}{h}; h \Big) \Big|^2 \bigg) dz.
\]
Then
\[
R_n(x) = \frac{1}{n h^2} \sum_{j=-n}^{n-1} \int_{w_j}^{w_{j+1}} \frac{\nu^2(w_j)}{n f_{D,n}(w_j)^2 (w_{j+1} - w_j)} \bigg( \Big| K\Big( \frac{w_j - x}{h}; h \Big) \Big|^2 - \Big| K\Big( \frac{z - x}{h}; h \Big) \Big|^2 \bigg) dz + \frac{1}{n h^2} \sum_{j=-n}^{n-1} \int_{w_j}^{w_{j+1}} \Big| K\Big( \frac{z - x}{h}; h \Big) \Big|^2 \bigg( \frac{\nu^2(w_j)}{n f_{D,n}(w_j)^2 (w_{j+1} - w_j)} - \frac{\nu^2(z)}{f_{D,n}(z)} \bigg) dz.
\]
Using (S4), we further obtain
\[
R_n(x) = \frac{1}{n h^2} \sum_{j=-n}^{n-1} \int_{w_j}^{w_{j+1}} \frac{\nu^2(w_j)}{f_{D,n}(w_j)} \bigg( \Big| K\Big( \frac{w_j - x}{h}; h \Big) \Big|^2 - \Big| K\Big( \frac{z - x}{h}; h \Big) \Big|^2 \bigg) dz \cdot \Big( 1 + O\Big( \frac{1}{n a_n^2} \Big) \Big) + \frac{1}{n h^2} \sum_{j=-n}^{n-1} \int_{w_j}^{w_{j+1}} \Big| K\Big( \frac{z - x}{h}; h \Big) \Big|^2 \bigg( \frac{\nu^2(w_j)}{f_{D,n}(w_j)} - \frac{\nu^2(z)}{f_{D,n}(z)} \bigg) dz + \frac{C}{n^2 a_n h^2} \sum_{j=-n}^{n-1} \int_{w_j}^{w_{j+1}} \Big| K\Big( \frac{z - x}{h}; h \Big) \Big|^2 \frac{\nu^2(w_j)}{f_{D,n}(w_j)} \, dz =: R_{n,1}(x) + R_{n,2}(x) + R_{n,3}(x).
\]
It holds that
\[
|R_{n,3}(x)| \le \frac{C}{n^2 a_n^2 h} \| K(\cdot\,; h) \|_2^2 = O\Big( \frac{1}{n^2 a_n^2 h^{2\beta + 1}} \Big).
\]
Furthermore,
\[
R_{n,2}(x) = \frac{1}{n h^2} \sum_{j=-n}^{n-1} \int_{w_j}^{w_{j+1}} \Big| K\Big( \frac{z - x}{h}; h \Big) \Big|^2 \nu^2(z) \frac{f_{D,n}(z) - f_{D,n}(w_j)}{f_{D,n}(z) f_{D,n}(w_j)} \, dz + \frac{1}{n h^2} \sum_{j=-n}^{n-1} \int_{w_j}^{w_{j+1}} \Big| K\Big( \frac{z - x}{h}; h \Big) \Big|^2 \frac{\nu^2(w_j) - \nu^2(z)}{f_{D,n}(w_j)} \, dz.
\]


By uniform Lipschitz continuity of $\nu^2$ (see Lemma 3 (ii)) and of $f_{D,n}$ (by Assumption 6), it is immediate that
\[
|R_{n,2}(x)| \le \frac{C}{n^2 a_n^2 h^2} \int \Big| K\Big( \frac{z - x}{h}; h \Big) \Big|^2 dz \le \frac{C}{n^2 a_n^2 h} \| K(\cdot\,; h) \|_2^2 = O\Big( \frac{1}{n^2 a_n^2 h^{1+2\beta}} \Big).
\]

Using again that $|w_j - z| \le |w_j - w_{j+1}| \le 1/(n a_n c_D)$, the rest of the proof of claim (ii) follows along the lines of the proof of Lemma 5 (ii).

A.3 Role of the hyperparameters

In this section, we discuss the setting presented in Example 1 of the main document in more detail, in order to shed some light on the role of the parameters $a_n$ and $h$, as well as on the assumptions made for our theoretical considerations. In particular, we show that, in a typical setting, the conditions listed in Assumption 4 are satisfied if $h$ is the rate-optimal bandwidth. As an example, we consider the case of a function $g \in W^m(\mathbb{R})$, $m > 5/2$, of bounded support, $[-1, 1]$, say, $f_\Delta$ as in Definition (8) in the main document with $a = 1$, i.e.,
\[
f_\Delta(x) = \tfrac{1}{2} e^{-|x|} \quad \text{with} \quad \Phi_{f_\Delta}(t) = \langle t \rangle^{-2},
\]
and $\mathbb{E}[\varepsilon_1^4] < \infty$, i.e., $M = 4$. Here, the parameter $\beta$, which gives the degree of ill-posedness of the problem and which is defined in Assumption 3 in the main document, is given by $\beta = 2$.
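This value of $\beta$ is readily checked numerically: the following small computation (our own illustration) approximates $\Phi_{f_\Delta}(t) = \int \frac{1}{2}e^{-|x|}e^{\mathrm{i}tx}\,dx$ on a grid and compares it with $(1+t^2)^{-1} = \langle t\rangle^{-2}$, the polynomial decay of order $2$ that places the problem in Assumption 3 with $\beta = 2$.

```python
import numpy as np

# Grid approximation of the Laplace characteristic function; the imaginary
# part vanishes by symmetry, so the cosine transform suffices.
x = np.linspace(-40.0, 40.0, 200_001)
dx = x[1] - x[0]
f_Delta = 0.5 * np.exp(-np.abs(x))
for t in [0.0, 0.5, 2.0, 10.0]:
    phi = (f_Delta * np.cos(t * x)).sum() * dx   # Riemann sum approximation
    print(f"t = {t:5.1f}:  numeric = {phi:.6f},  <t>^-2 = {1.0 / (1.0 + t*t):.6f}")
```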

In this example, Assumption 4 (i) becomes
\[
\ln(n)\, n^{-\frac{1}{2}} a_n h + \frac{h}{a_n} + \ln(n)^2 h + \ln(n) a_n + \frac{1}{n a_n h^5} = o(1).
\]
Given that $h/a_n = o(1)$, the last term asymptotically dominates the first one, such that Assumption 4 (i) reduces to
\[
\frac{h}{a_n} + \ln(n)^2 h + \ln(n) a_n + \frac{1}{n a_n h^5} = o(1). \tag{S5}
\]

In density deconvolution, where the target density $g$ is contained in a Hölder ball such that $|g^{(r)}(x) - g^{(r)}(x + \delta)| \le B \delta^a$ for some radius $B > 0$, some positive integer $r$ and some $a \in [0, 1)$, the rate-optimal bandwidth is of order $n^{-1/(2(r + a + \beta) + 1)}$. According to our Assumption 1 (i), we have $g \in W^m$. Due to the embedding $W^m(\mathbb{R}) \subset C^{r+a}(\mathbb{R})$, we can replace $r + a$ in the above bandwidth by $m - 1/2$ in terms of our parametrization of the smoothness of the target function. This yields $h \simeq n^{-1/(2(m + \beta))}$. With this choice of $h$, given that $1/(n a_n h^5) = o(1)$ by Assumption 4 (i), Assumption 4 (ii) becomes
\[
\sqrt{a_n} + \sqrt{n^{(m + \beta - 1)/(m + \beta)} a_n^{2s + 1}} = o\big( 1/\sqrt{\ln(n)} \big). \tag{S6}
\]

Next, we consider the design parameter $a_n$. This parameter ensures that, asymptotically, observations on the whole real line are available. This is necessary since the function $\gamma$ will typically have unbounded support, even if the function $g$ itself has bounded support, as is the case in this example. Condition (S6) can only be satisfied if $a_n = n^{-\varepsilon}$, where $\varepsilon$ can


only be as small as the parameter $s$ (see Assumption 1 (iii)) allows. Assumption 1 (iii) in the main document is an assumption on the decay of $\gamma$. As a rule of thumb, this assumption is met if both functions $g$ and $f_\Delta$ decay sufficiently fast. To give some more intuition, we now provide some computations for our specific example, for which the assumption is met for any $s$. We find
\[
\int \langle z \rangle^{2s} (\gamma(z))^2 \, dz = \int \langle z \rangle^{2s} \Big( \int g(t) f_\Delta(t - z) \, dt \Big)^2 dz = \frac{1}{4} \int \langle z \rangle^{2s} \Big( \int g(t) \exp(-|t - z|) \, dt \Big)^2 dz.
\]
Since $g$ is supported on $[-1, 1]$ by assumption, we now split the outer integral into integrals over different regions, such that, on each region, all values of $t$ contributing to the inner integral lie entirely on one side of $z$. The remaining term is bounded by a constant:
\[
\int \langle z \rangle^{2s} (\gamma(z))^2 \, dz \le C + \frac{1}{4} \int_{|z| > 1} \langle z \rangle^{2s} \Big( \int_{|t| \le 1} g(t) \exp(-|t - z|) \, dt \Big)^2 dz.
\]
If $z > 1$ in the outer integral, $\exp(-|t - z|) = \exp(-z + t)$ and thus
\[
\int_{z > 1} \langle z \rangle^{2s}\, e^{-2z} \Big( \int_{|t| \le 1} g(t) \exp(t) \, dt \Big)^2 dz < \infty
\]
for any $s > 0$. Analogously,
\[
\int_{z < -1} \langle z \rangle^{2s}\, e^{2z} \Big( \int_{|t| \le 1} g(t) \exp(-t) \, dt \Big)^2 dz < \infty
\]

for any $s > 0$. Therefore, in the setting of this example, Assumption 1 (iii) holds for any $s$, such that $a_n$ can be chosen of order $n^{-\varepsilon}$ for $\varepsilon$ arbitrarily small. In this case, the remaining conditions in (S5) and (S6) become $1/(n a_n h^5) = o(1)$ and $\ln(n)\, a_n = o(1)$, respectively. Using $a_n = n^{-\varepsilon}$ and $h = n^{-1/(2(m + \beta))} \ge n^{-1/9}$, we find that Assumption 4 holds for any $\varepsilon < 4/9$.
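These rate calculations are easy to check numerically. The sketch below evaluates the terms of (S5) along growing $n$ for the illustrative choices $m = 3$ (so $\beta = 2$ and $h = n^{-1/10}$) and $\varepsilon = 0.05$; both values are our own examples. Every term tends to zero as $n$ grows, though the logarithmic terms do so very slowly at practical sample sizes.

```python
import numpy as np

# Terms of (S5) for the example of Section A.3; m = 3 > 5/2 and eps = 0.05
# are illustrative choices (beta = 2 for the Laplace error density).
m, beta, eps = 3.0, 2.0, 0.05
for n in [1e4, 1e6, 1e8, 1e10]:
    h = n ** (-1.0 / (2.0 * (m + beta)))   # rate-optimal bandwidth n^{-1/10}
    a_n = n ** (-eps)
    terms = [h / a_n, np.log(n) ** 2 * h, np.log(n) * a_n, 1.0 / (n * a_n * h ** 5)]
    print(f"n = {n:.0e}: " + ", ".join(f"{t:.3g}" for t in terms))
```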

A.4 Extensions: Details

Our theoretical developments for the procedure in Section 5 in the main document actually involve a sample splitting. To this end, let $(d_n)_{n \in \mathbb{N}}$ be a sequence of natural numbers with $d_n \to \infty$, $d_n = o(n)$ and $1/d_n = o(1/\ln(n)^2)$, and let $J_n := \{-n, \ldots, n\} \setminus \{-n + k \cdot d_n \mid 1 \le k \le 2n/d_n\}$, i.e., we remove every $d_n$-th data point from our original sample. Now, set $\mathcal{Y}_1 := \{Y_j \mid j \in J_n\}$ as well as $\mathcal{Y}_2 := \{Y_j \mid j \in \{-n, \ldots, n\} \setminus J_n\}$. This way, the asymptotic properties of the estimator based on the main part of the sample, $\mathcal{Y}_1$, remain the same. Let further, for $j \in J_n$, $M_j$ denote the distance between $w_j$ and its left neighbor, that is, $M_j = 1/(n a_n)$ if $j - 1 \in J_n$ and $M_j = 2/(n a_n)$ otherwise.

else. Define the estimator ˜gn, based on Y1 by

˜ gn(x; h) = 1 h X j∈Jn Yj Mj K  wj− x h ; h  .
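In code, the index bookkeeping and the Riemann-sum weighting of $\tilde g_n$ might look as follows; this is a sketch assuming the equidistant design $w_j = j/(n a_n)$ and reusing the illustrative `deconv_kernel` from Section A.2.

```python
import numpy as np

def split_indices(n, d_n):
    # J_n = {-n, ..., n} \ {-n + k*d_n : 1 <= k <= 2n/d_n}; the held-out
    # indices form the Y_2 part used for variance estimation.
    held_out = {-n + k * d_n for k in range(1, 2 * n // d_n + 1)}
    J_n = [j for j in range(-n, n + 1) if j not in held_out]
    return J_n, sorted(held_out)

def g_tilde(x, Y, n, a_n, h, J_n):
    # g~_n(x; h) = (1/h) * sum_{j in J_n} Y_j * M_j * K((w_j - x)/h; h),
    # with M_j = 1/(n a_n) if j - 1 in J_n and M_j = 2/(n a_n) otherwise;
    # Y has length 2n + 1 and is ordered as j = -n, ..., n.
    Jset = set(J_n)
    j_arr = np.array(J_n)
    M = np.where([(j - 1) in Jset for j in J_n], 1.0, 2.0) / (n * a_n)
    w = j_arr / (n * a_n)                      # equidistant design points
    return np.sum(Y[j_arr + n] * M * deconv_kernel((w - x) / h, h)) / h
```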


Theorem 3. Let Assumptions 2, 4 (i) and 5 be satisfied. Let further $\tilde{\nu}_n$ be a nonparametric estimator of the standard deviation in model (4), based on $\mathcal{Y}_2$, such that for some sequence of positive numbers $b_n \to 0$ with $a_n/b_n = o(1/\ln(n))$ we have that
\[
\mathbb{E}\Big[ \sup_{|j| \le b_n n} | \tilde{\nu}_n(w_j) - \nu(w_j) | \Big] = o\big( 1/(\ln(n) \ln\ln(n)) \big) \quad \text{and} \quad \tilde{\nu}_n > \tilde{\sigma} > 0 \tag{S7}
\]
for some constant $\tilde{\sigma} > 0$.

(i) There exists a sequence of independent standard normally distributed random variables $(Z_n)_{n \in \mathbb{Z}}$, independent of $\tilde{\nu}_n$, such that for
\[
\widetilde{D}_n(x) = \frac{\sqrt{n a_n h^{1+2\beta}}}{\tilde{\nu}_n(x)} \big( \tilde{g}_n(x; h) - \mathbb{E}[\tilde{g}_n(x; h)] \big), \qquad \widetilde{G}_n(x) = \frac{\sqrt{n a_n h^{1+2\beta}}}{h\, \tilde{\nu}_n(x)} \sum_{j \in J_n,\, |j| \le n b_n} \tilde{\nu}_n(w_j) M_j Z_j K\Big( \frac{w_j - x}{h}; h \Big), \tag{S8}
\]
we have that
\[
\forall\, \alpha \in (0,1): \quad \lim_{n \to \infty} \mathbb{P}\big( \| \widetilde{D}_n \| \le q_{\| \widetilde{G}_n \|}(\alpha) \big) = \alpha,
\]
where $q_{\| \widetilde{G}_n \|}(\alpha)$ denotes the $\alpha$-quantile of $\| \widetilde{G}_n \|$.

(ii) If, in addition, $\sqrt{n a_n h^{2m + 2\beta}} + \sqrt{n a_n^{2s+1}} + 1/\sqrt{n a_n h^2} = o\big( 1/\sqrt{\ln(n)} \big)$, and if Assumption 1 is satisfied, $\mathbb{E}[\tilde{g}_n(x; h)]$ in (11) can be replaced by $g(x)$.
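Part (i) suggests a Monte Carlo construction of the critical value: conditionally on $\tilde\nu_n$, draw fresh standard normal multipliers, recompute the supremum of $|\widetilde G_n|$ over a grid on $[0,1]$, and take the empirical $\alpha$-quantile. The sketch below does this, again reusing the illustrative `deconv_kernel` from Section A.2; the grid size, the number of replications and the sup-norm over $[0,1]$ as $\|\cdot\|$ are our own choices.

```python
import numpy as np

def critical_value(nu_tilde, w, M, h, beta, n, a_n, alpha=0.95, B=1000):
    # Empirical alpha-quantile of sup_x |G~_n(x)| over a grid on [0, 1],
    # conditionally on the variance estimator nu_tilde. The arrays w and M
    # hold the design points w_j and spacings M_j for j in J_n, |j| <= n*b_n.
    xs = np.linspace(0.0, 1.0, 101)
    scale = np.sqrt(n * a_n * h ** (1 + 2 * beta)) / h
    Kmat = np.array([deconv_kernel((w - x) / h, h) for x in xs])  # (101, |J|)
    weights = nu_tilde(w) * M                                     # nu~_n(w_j) M_j
    sups = np.empty(B)
    for b in range(B):
        Z = np.random.standard_normal(len(w))   # fresh multipliers per draw
        G = scale * (Kmat @ (weights * Z)) / nu_tilde(xs)
        sups[b] = np.max(np.abs(G))
    return np.quantile(sups, alpha)
```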

Proof of Theorem 3. We require that
\[
\| \widetilde{D}_n - \widetilde{G}_n \| = o_P\big( 1/\sqrt{\ln(n)} \big), \tag{Step 1}
\]
as well as
\[
\mathbb{E}\| \widetilde{G}_n \| = O_P\big( \sqrt{\ln(n)} \big). \tag{Step 2}
\]

Step 1 a: Gaussian Approximation

Lemmas 9 and 10 are in preparation for the Gaussian approximation, where the target process $\widetilde{G}_n$ is first approximated by the processes $\widetilde{G}_{n,0}^{b_n}$ and $\widetilde{G}_{n,0}$.

Lemma 9. We shall show that
\[
\| \nu\, \widetilde{G}_{n,0}^{b_n} - \tilde{\nu}_n \widetilde{G}_n \| = o_P\big( 1/\sqrt{\ln(n)} \big), \tag{24}
\]
where
\[
\widetilde{G}_{n,0}^{b_n} := \frac{\sqrt{n a_n h^{1+2\beta}}}{h\, \nu(x)} \sum_{j \in J_n,\, |j| \le n b_n} \nu(w_j) M_j Z_j K\Big( \frac{w_j - x}{h}; h \Big).
\]
