
Quantile Regression with ℓ1-regularization and Gaussian Kernels

Lei Shi1,2, Xiaolin Huang1, Zheng Tian2 and Johan A.K. Suykens1

1 Department of Electrical Engineering, KU Leuven, ESAT-SCD-SISTA, B-3001 Leuven, Belgium

2 Shanghai Key Laboratory for Contemporary Applied Mathematics, School of Mathematical Sciences, Fudan University

Shanghai 200433, P. R. China

Abstract

The quantile regression problem is considered by learning schemes based on ℓ1-regularization and Gaussian kernels. The purpose of this paper is to present concentration estimates for the algorithms. Our analysis shows that the convergence behavior of ℓ1-quantile regression with Gaussian kernels is almost the same as that of the RKHS-based learning schemes. Furthermore, the previous analysis for kernel-based quantile regression usually requires that the output sample values are uniformly bounded, which excludes the common case with Gaussian noise. The error analysis presented in this paper gives satisfactory convergence rates even for unbounded sampling processes. In addition, numerical experiments are provided which support the theoretical results.

Key words and phrases. Learning theory, Quantile regression, ℓ1-regularization, Gaussian kernels, Unbounded sampling processes, Concentration estimate for error analysis

AMS Subject Classification Numbers: 68T05, 62J02

†The corresponding author is Lei Shi. Email addresses: leishi@fudan.edu.cn (L. Shi), huangxl06@mails.tsinghua.edu.cn (X. Huang), jerry.tianzheng@gmail.com (Z. Tian) and johan.suykens@esat.kuleuven.be (J. Suykens).


1 Introduction

In this paper, under the framework of learning theory, we study ℓ1-regularized quantile regression with Gaussian kernels. Let X be a compact subset of Rn and Y ⊂ R. The goal of quantile regression is to estimate the conditional quantile of a Borel probability measure ρ on Z := X × Y. Denote by ρ(·|x) the conditional distribution of ρ at x ∈ X. The conditional τ-quantile is a set-valued function defined by

\[ F_\rho^\tau(x) = \left\{ t \in \mathbb{R} : \rho((-\infty, t] \mid x) \ge \tau \ \text{ and } \ \rho([t, \infty) \mid x) \ge 1 - \tau \right\}, \quad x \in X, \tag{1.1} \]

where τ ∈ (0, 1) is a fixed constant specifying the desired quantile level. We suppose that Fρτ(x) consists of singletons, i.e., there exists an fρτ : X → R, called the conditional τ-quantile function, such that Fρτ(x) = {fρτ(x)} for x ∈ X. In the setting of learning theory, the distribution ρ is unknown. All we have in hand is a sample set z = {(xi, yi)}_{i=1}^m ∈ Z^m, which is assumed to be independently distributed according to ρ. We additionally suppose that for some constant Mτ ≥ 1,

\[ |f_\rho^\tau(x)| \le M_\tau \quad \text{for almost every } x \in X \text{ with respect to } \rho_X, \tag{1.2} \]

where ρX denotes the marginal distribution of ρ on X. Throughout the paper, we will use these assumptions without any further reference. We aim to approximate fρτ from the sample z through learning algorithms.

The classical least-squares regression models the relationship between an input x ∈ X and the conditional mean of a response variable y ∈ Y given x, which describes the centrality of the conditional distribution. In contrast, quantile regression can provide richer information about the conditional distribution of the response variable, such as stretching or compressing tails, so it is particularly useful in applications where lower and upper quantiles, or all quantiles, are of interest. Over recent years, quantile regression has become a popular statistical method in various research fields, such as reference charts in medicine [12], survival analysis [16], economics [15] and so on. For example, in financial risk management, the value at risk (VAR) is an important measure for quantifying daily risks, which is defined directly through extreme quantiles of risk measures [43]. As the interest here focuses on a particular quantile interval of the response, it is appropriate to adopt quantile regression for VAR modeling. Another example comes from environmental studies, where upper quantiles of pollution levels are critical from a public health perspective. In addition, relative to least-squares regression, quantile regression estimates are more robust against outliers in the response measurements. For more practical applications and attractive features of quantile regression, one may see the book [17] and references therein.


Due to its wide applications in data analysis, quantile regression has attracted much attention in the machine learning community and has been investigated in the literature (e.g., [30, 27, 42, 9]). Define the τ-pinball loss Lτ : R → R+ as

\[ L_\tau(u) = \begin{cases} (1-\tau)\,u, & \text{if } u > 0, \\ -\tau u, & \text{if } u \le 0. \end{cases} \]
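For concreteness, the pinball loss admits a direct vectorized implementation. The snippet below is a minimal illustration of our own (not code from the paper), using the equivalent form Lτ(u) = max{(1−τ)u, −τu}.

```python
import numpy as np

def pinball_loss(u, tau):
    """Pinball (check) loss L_tau(u) = (1 - tau) * u if u > 0, else -tau * u.

    The two linear pieces cross at u = 0, so the loss equals their maximum.
    """
    u = np.asarray(u, dtype=float)
    return np.maximum((1.0 - tau) * u, -tau * u)

# e.g. residuals f(x_i) - y_i of some estimator:
print(pinball_loss([-2.0, 0.0, 1.5], tau=0.25))  # approx. [0.5, 0.0, 1.125]
```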

One can see [17] that the loss function Lτ can be used to model the target function, i.e., the conditional τ-quantile function fρτ minimizes the generalization error

\[ \mathcal{E}^\tau(f) = \int_{X \times Y} L_\tau(f(x) - y) \, d\rho \tag{1.3} \]

over all measurable functions f : X → R. Based on this observation, learning algorithms produce estimators of fρτ by minimizing the empirical risk \( \frac{1}{m}\sum_{i=1}^m L_\tau(f(x_i) - y_i) \), or a penalized version of it, when i.i.d. samples {(xi, yi)}_{i=1}^m are given. In kernel-based learning, this minimization process usually takes place in a hypothesis space (a subset of continuous functions on X) generated by a kernel function K : X × X → R. A popular choice is the Gaussian kernel with width σ > 0, given by

\[ K_\sigma(x, y) = \exp\left\{ -\frac{\|x - y\|^2}{\sigma^2} \right\}. \]
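As a small illustrative sketch (our own, not part of the paper), the Gaussian kernel matrix on a sample can be computed as follows; the scaling exp(−‖x−y‖²/σ²) mirrors the definition above.

```python
import numpy as np

def gaussian_kernel(X1, X2, sigma):
    """Kernel matrix K[i, j] = exp(-||X1[i] - X2[j]||^2 / sigma^2)."""
    X1, X2 = np.atleast_2d(X1), np.atleast_2d(X2)
    sq_dists = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / sigma**2)

# e.g. a 5 x 5 kernel matrix on five random points in R^2:
X = np.random.default_rng(0).normal(size=(5, 2))
K = gaussian_kernel(X, X, sigma=1.0)
```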

The width σ is usually treated as a free parameter in the training process and can be chosen in a data-dependent way, e.g., by cross-validation. This adjustable parameter plays a major role in the performance of the kernel and should be carefully tuned to the problem at hand. A small σ will lead to over-fitting, and the resulting predictive model will be highly sensitive to noise in the sample data. Conversely, a large σ will make the learning algorithm perform unsatisfactorily and under-fitting will occur. In the machine learning community, choosing the width σ is related to the model selection problem, which adjusts the capacity or complexity of the models to the available amount of training data in order to avoid either under-fitting or over-fitting. This motivates theoretical studies on the convergence behavior of algorithms with Gaussian kernels (e.g. [24, 41]). In particular, [42, 9] consider approximating fρτ by a solution of the optimization scheme

\[ \arg\min_{f \in \mathcal{H}_\sigma} \left\{ \frac{1}{m} \sum_{i=1}^m L_\tau(f(x_i) - y_i) + \lambda \|f\|_\sigma^2 \right\}, \tag{1.4} \]

where (Hσ, ‖·‖σ) is the Reproducing Kernel Hilbert Space (RKHS) [1] induced by Kσ. The positive constant λ is another tunable parameter and is called the regularization parameter.
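The following sketch is our own illustration (not code from the paper): by the Representer Theorem mentioned below, the minimizer of (1.4) can be written as f = Σ_i c_i Kσ(·, xi) with ‖f‖²σ = cᵀKc, where K = (Kσ(xi, xj)) is the kernel matrix on the sample, so (1.4) becomes a finite-dimensional convex program that a generic solver such as cvxpy can handle.

```python
import numpy as np
import cvxpy as cp

def rkhs_quantile_regression(K, y, tau, lam):
    """Solve (1.4) in coefficient form: minimize over c in R^m
    (1/m) * sum_i L_tau((K @ c)[i] - y[i]) + lam * c^T K c."""
    m = len(y)
    c = cp.Variable(m)
    r = K @ c - y
    pinball = cp.sum((1 - tau) * cp.pos(r) + tau * cp.pos(-r)) / m
    # a tiny ridge keeps the Gaussian kernel matrix numerically PSD
    reg = cp.quad_form(c, K + 1e-8 * np.eye(m))
    cp.Problem(cp.Minimize(pinball + lam * reg)).solve()
    return c.value  # coefficients of f = sum_i c_i K_sigma(., x_i)
```

In practice both σ and λ would be chosen by a grid search minimizing the held-out pinball loss, in line with the cross-validation remark above.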

Due to the Representer Theorem [36], the solution of algorithm (1.4) belongs to the data-dependent hypothesis space

\[ \mathcal{H}_{z,\sigma} = \left\{ \sum_{i=1}^m \alpha_i K_\sigma(x, x_i) : \alpha_i \in \mathbb{R} \right\}. \]


The basis functions {Kσ(·, xi)}_{i=1}^m are referred to as features generated by the input data {xi}_{i=1}^m and the feature map x ↦ Kσ(·, x), which is well defined from X into Hσ [8].

In this paper, in order to pursue sparsity and achieve feature selection in Hz,σ, we estimate fρτ by an ℓ1-regularized learning algorithm. The algorithm is defined as the solution f̂zτ = f_{z,λ,σ}^τ to the following minimization problem:

\[ \hat{f}_z^\tau = \arg\min_{f \in \mathcal{H}_{z,\sigma}} \left\{ \frac{1}{m} \sum_{i=1}^m L_\tau(f(x_i) - y_i) + \lambda \Omega(f) \right\}, \tag{1.5} \]

where the regularization term is given by

\[ \Omega(f) = \sum_{i=1}^m |\alpha_i| \quad \text{for } f = \sum_{i=1}^m \alpha_i K_\sigma(x, x_i) \in \mathcal{H}_{z,\sigma}, \]

i.e., the ℓ1-norm of the coefficients in the kernel expansion of f ∈ Hz,σ. The positive definiteness of Kσ ensures that the expansion of f ∈ Hz,σ is unique, so the regularization term Ω is well-defined as a functional on Hz,σ. The ℓ1-regularization term not only shrinks the coefficients in the kernel expansion toward zero but also forces some of them to be exactly zero when λ is sufficiently large. The latter property brings sparsity into the expression of the output function f̂zτ. As is well known, RKHS-based regularization, which is essentially a squared penalty, has the disadvantage that even though some features Kσ(x, xi) may not contribute much to the overall solution, they still appear in the kernel expansion. Therefore, in situations where there are many irrelevant noise features, ℓ1-regularization may outperform RKHS-based regularization and offer more compact predictive models. As in algorithm (1.4), the parameters λ and σ are both freely chosen, which provides adaptivity of the algorithm.

The scheme with ℓ1-regularization is often related to the LASSO algorithm [31] in the linear regression model, and there have been extensive studies on the error analysis of the ℓ1-estimator for linear least-squares regression and linear quantile regression in statistics (e.g. see [2, 45]). In kernel-based learning, ℓ1-regularization was first introduced to design the linear programming support vector machine (e.g. [19, 33, 4]). Recently, a number of papers have begun to study the learning behavior of ℓ1-regularized least-squares regression with a fixed kernel function (e.g. see [28, 23]). The ℓ1-regularization is a very important regularization form, as it is robust to irrelevant features and also serves as a methodology for feature selection. In particular, ℓ1-regularized quantile regression has excellent computational properties. Since the loss function and the regularization term are both piecewise linear, the learning algorithm (1.5) is essentially a linear programming problem and thus can be efficiently solved by existing codes for large scale problems.
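To make the linear programming reformulation concrete, the following sketch (an illustration under our own variable names, not code from the paper) splits each coefficient αi = ai − bi and each residual f(xi) − yi = ui − vi into nonnegative parts and solves (1.5) with scipy.optimize.linprog.

```python
import numpy as np
from scipy.optimize import linprog

def l1_kernel_quantile_regression(X, y, tau, lam, sigma):
    """Solve (1.5) as a linear program.

    Decision vector z = [a, b, u, v] >= 0 with alpha = a - b and
    K @ alpha - y = u - v, so the pinball loss equals (1 - tau) * u + tau * v.
    """
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    y = np.asarray(y, dtype=float)
    m = len(y)
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    K = np.exp(-sq_dists / sigma**2)
    cost = np.concatenate([
        lam * np.ones(m),              # l1 penalty on a
        lam * np.ones(m),              # l1 penalty on b
        (1.0 - tau) / m * np.ones(m),  # pinball weight on positive residuals u
        tau / m * np.ones(m),          # pinball weight on negative residuals v
    ])
    A_eq = np.hstack([K, -K, -np.eye(m), np.eye(m)])  # K(a - b) - u + v = y
    res = linprog(cost, A_eq=A_eq, b_eq=y, bounds=(0, None), method="highs")
    a, b = res.x[:m], res.x[m:2 * m]
    return a - b  # sparse coefficients alpha_i of f = sum_i alpha_i K_sigma(., x_i)
```

At a solution, ui and vi cannot both be positive for the same index (otherwise the objective could be lowered), so (1−τ)ui + τvi indeed equals Lτ(f(xi) − yi), and likewise ai + bi = |αi|.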


As linear combinations expressed in terms of a Gaussian kernel Kσ and the input data {xi}_{i=1}^m, functions from the space Hz,σ are often used for scattered data interpolation in computer aided geometric design (CAGD) and approximation theory [35]. Functions of this form also have wide application in radial basis function networks [20]. Additionally, by taking αi = 1/m or αi = yi/m for i = 1, …, m, with a suitably chosen σ = σ(m), the formula \( \sum_{i=1}^m \alpha_i K_\sigma(x, x_i) \) can also be used to estimate the density function of ρX and the conditional mean of ρ [44]. In the present scenario, the parameters {αi}_{i=1}^m are obtained by solving a convex optimization problem in Rm, which is induced from learning algorithms such as (1.4) and (1.5). Recall that the target function fρτ gives the smallest generalization error over all possible solutions. The performance of the algorithm (1.5) is measured by the excess generalization error Eτ(f̂zτ) − Eτ(fρτ). For any σ > 0, when X is a compact subset of Rn, the linear span of the function set {Kσ(x, t) | t ∈ X} is dense in the space of continuous functions on X [26, 18]. We thus can expect that the learning scheme (1.5) is consistent, i.e., as the sample size m increases, the excess generalization error will tend to zero with high probability.

Up to now, kernel-based quantile regression has mainly focused on estimating fρτ by regularization schemes in an RKHS, and the consistency of those algorithms is well understood thanks to the literature [27, 42, 9]. All such theoretical results are stated under the boundedness assumption for the output, i.e., for some constant M > 0, |y| ≤ M almost surely. However, the regularization algorithm (1.5) is essentially different from its counterpart in an RKHS, as the minimization procedure is carried out directly in Hz,σ, which varies with the samples.

The sample-dependent nature of the hypothesis space causes technical difficulties in the analysis [39]. The consistency of this kind of algorithm is still an open question, and our paper is devoted to solving this problem. Specifically, we investigate how the output function f̂zτ given in (1.5) approximates the quantile regression function fρτ with suitably chosen λ = λ(m) and σ = σ(m) as m → ∞. We show that the learning ability of algorithm (1.5) is almost the same as that of the RKHS-based algorithm (1.4). It is also worth noting that consistency of the algorithm only implies that the estimator f̂zτ is close to the target function fρτ in a rather weak sense. To obtain a strong convergence result, under some mild conditions, we apply a so-called self-calibration inequality [25] to bound the function approximation error in a weighted Lr-space by the excess generalization error (see Proposition 2). Our error bounds are obtained under the weaker assumption that for some constants M ≥ 1 and c > 0,

\[ \int_Y |y|^\ell \, d\rho(y \mid x) \le c\, \ell!\, M^\ell, \quad \forall \ell \in \mathbb{N}, \ x \in X. \tag{1.6} \]

Note that the boundedness assumption excludes Gaussian noise while assumption (1.6) covers it. This assumption is well known in probability theory and was introduced in learning theory in [34, 11].
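As a short sketch of our own (not from the paper) of why Gaussian noise satisfies (1.6): suppose y = f(x) + ε with ε ∼ N(0, σ_x²), sup_x |f(x)| ≤ B₀ and sup_x σ_x ≤ σ₀, where B₀ and σ₀ are bounds we introduce only for this illustration. Then

\[
\int_Y |y|^\ell \, d\rho(y \mid x)
= \mathbb{E}\,|f(x) + \varepsilon|^\ell
\le 2^{\ell-1}\left( |f(x)|^\ell + \mathbb{E}\,|\varepsilon|^\ell \right)
\le 2^{\ell-1}\left( B_0^\ell + \sigma_0^\ell\, \ell! \right)
\le \ell!\, \big(2 \max\{B_0, \sigma_0, 1\}\big)^\ell,
\]

so (1.6) holds with c = 1 and M = 2 max{B₀, σ₀, 1}; here we used the elementary bounds (a + b)^ℓ ≤ 2^{ℓ−1}(a^ℓ + b^ℓ) and E|ε|^ℓ ≤ σ_x^ℓ (ℓ−1)!! ≤ σ_x^ℓ ℓ!.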

In the rest of this paper, we first present the main results in Section 2. After that, we set up the framework of the convergence analysis in Section 3 and prove the main theorems in Section 4. In Section 5, the results of numerical experiments are given to support the theoretical findings. We conclude the paper in Section 6 with some future topics related to our work.

2 Main Results

In order to illustrate our convergence analysis, we first state the definition of the projection operator introduced in [7].

Definition 1. For B > 0, the projection operator πB on R is defined as

\[ \pi_B(t) = \begin{cases} -B & \text{if } t < -B, \\ t & \text{if } -B \le t \le B, \\ B & \text{if } t > B. \end{cases} \tag{2.1} \]

The projection of a function f : X → R is defined by πB(f)(x) = πB(f(x)), ∀x ∈ X.
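In code, the projection operator is simply a pointwise clipping of predicted values; a one-line sketch of our own:

```python
import numpy as np

def project(f_values, B):
    """pi_B applied pointwise: clip predictions to the interval [-B, B]."""
    return np.clip(f_values, -B, B)
```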

Let ν be a Borel measure on X (or Rn). For p ∈ (0, ∞], the weighted Lp-space with the norm

\[ \|f\|_{L^p_\nu} = \left( \int_X |f(x)|^p \, d\nu \right)^{1/p} \]

is denoted by L^p_ν. When the subscript is omitted, the notation Lp refers to the Lp-space with respect to the Lebesgue measure. Since the target function fρτ takes values in [−Mτ, Mτ] almost surely, it is natural to measure the approximation ability of f̂zτ by the distance ‖π_{Mτ}(f̂zτ) − fρτ‖_{L^r_{ρX}}. Here the index r > 0 depends on the pair (ρ, τ) and takes the value r = pq/(p+1) when the following noise condition on ρ is satisfied.

Definition 2. Let p∈ (0, ∞] and q ∈ [1, ∞). A distribution ρ on X × R is said to have a τ−quantile of p−average type q if for almost every x ∈ X with respect to ρX, there exist a τ−quantile t ∈ R and constants 0 < ax≤ 1, bx > 0 such that for each s∈ [0, ax],

\[ \rho((t - s, t) \mid x) \ge b_x s^{q-1} \quad \text{and} \quad \rho((t, t + s) \mid x) \ge b_x s^{q-1}, \tag{2.2} \]

and that the function on X taking the value (b_x a_x^{q−1})^{−1} at x ∈ X lies in L^p_{ρX}.

Condition (2.2) ensures the uniqueness of the conditional τ−quantile function fρτ and the singleton assumption on Fρτ. For more details and examples about this definition, one may see [27] and references therein.


Denote by Hs(Rn) the Sobolev space [22] with index s > 0. For p ∈ (0, ∞] and q ∈ (1, ∞), we set

\[ \theta = \min\left\{ \frac{2}{q},\ \frac{p}{p+1} \right\} \in (0, 1]. \tag{2.3} \]

Our main results are stated as follows.

Theorem 1. Suppose that assumption (1.2) holds with Mτ ≥ 1, that ρ has a τ-quantile of p-average type q with some p ∈ (0, ∞] and q ∈ [1, ∞), and that ρ satisfies assumption (1.6). Assume that for some s > 0, fρτ is the restriction of some f̃ρτ ∈ Hs(Rn) ∩ L∞(Rn) onto X, and that the density function h = dρX/dx exists and lies in L2(X). Take σ = m^{−α} with 0 < α < 1/(2(n+1)) and λ = m^{−β} with β > (n+s)α. Then with r = pq/(p+1), for any 0 < ϵ < Θ/q and 0 < δ < 1, with confidence 1 − δ, we have

\[ \left\| \pi_{M_\tau}(\hat{f}_z^\tau) - f_\rho^\tau \right\|_{L^r_{\rho_X}} \le C^{\epsilon}_{X,\rho,\alpha,\beta} \left( \log\frac{5}{\delta} \right)^{1/q} m^{\epsilon - \Theta/q}, \tag{2.4} \]

where C^{ϵ}_{X,ρ,α,β} is a constant independent of m or δ and

\[ \Theta = \min\left\{ \frac{1 - 2(n+1)\alpha}{2-\theta},\ \beta - (n+s)\alpha,\ \alpha s \right\}. \tag{2.5} \]

Let α = 1/(2(n+1)+(2−θ)s) and β = (n+2s)/(2(n+1)+(2−θ)s); then the convergence rate given by (2.4) is O(m^{ϵ − s/(q(2(n+1)+(2−θ)s))}) with an arbitrarily small (but fixed) ϵ > 0. Recall that, under the boundedness assumption for y, the convergence rate of algorithm (1.4) presented in [42] is O(m^{−s/(q(2(n+1)+(2−θ)s))}). Actually, when y is bounded, a tiny modification of our proof yields the same learning rate. An improved bound can be achieved if ρX is supported in the closed unit ball of Rn.
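As a short check of this parameter choice (our own algebra, based on (2.5)): writing α = 1/(2(n+1)+(2−θ)s) and β = (n+2s)α, the three terms in (2.5) coincide,

\[
\frac{1 - 2(n+1)\alpha}{2-\theta} = \frac{(2-\theta)s\,\alpha}{2-\theta} = s\alpha,
\qquad
\beta - (n+s)\alpha = (n+2s)\alpha - (n+s)\alpha = s\alpha,
\]

so Θ = sα = s/(2(n+1)+(2−θ)s) and the exponent in (2.4) becomes ϵ − Θ/q = ϵ − s/(q(2(n+1)+(2−θ)s)).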

Theorem 2. If X is contained in the closed unit ball of Rn, under the same assumptions as Theorem 1, let σ = m^{−α} with 0 < α < 1/n, λ = m^{−β} with β > (n+s)α and r = pq/(p+1). Then for any 0 < ϵ < Θ′/q and 0 < δ < 1, with confidence 1 − δ, there holds

\[ \left\| \pi_{M_\tau}(\hat{f}_z^\tau) - f_\rho^\tau \right\|_{L^r_{\rho_X}} \le \widetilde{C}^{\epsilon}_{X,\rho,\alpha,\beta} \left( \log\frac{5}{\delta} \right)^{1/q} m^{\epsilon - \Theta'/q}, \tag{2.6} \]

where C̃^{ϵ}_{X,ρ,α,β} is a constant independent of m or δ and

\[ \Theta' = \min\left\{ \frac{1 - n\alpha}{2-\theta},\ \beta - (n+s)\alpha,\ \alpha s \right\}. \tag{2.7} \]

In Theorem 2, we further set α = 1/(n+(2−θ)s) and β = (n+2s)/(n+(2−θ)s), and the convergence rate given by (2.6) is O(m^{ϵ − s/(q(n+(2−θ)s))}). This rate is exactly the same as that of algorithm (1.4) obtained in [9] for bounded outputs y. Based on these observations, we claim that the approximation ability of algorithm (1.5) is comparable with that of the RKHS-based algorithm (1.4). Next, we give an example to illustrate our main results.


Proposition 1. Let X be a compact subset of Rn with Lipschitz boundary and let ρX be the uniform distribution on X. For x ∈ X, let the conditional distribution ρ(·|x) be a normal distribution with mean fρ(x) and variance σx². Suppose ϑ1 := sup_{x∈X} |fρ(x)| < ∞, ϑ2 := sup_{x∈X} σx ≤ 1 and fρ ∈ Hs(X) with s > n/2. Let σ = m^{−1/(2(n+1)+s)}, λ = m^{−(n+2s)/(2(n+1)+s)} and let f̂z^{1/2} be given by algorithm (1.5) with τ = 1/2. Then for 0 < ϵ < s/(2s+4(n+1)) and 0 < δ < 1, with confidence 1 − δ, there holds

\[ \left\| \pi_{\vartheta_1}(\hat{f}_z^{1/2}) - f_\rho \right\|_{L^2_{\rho_X}} \le c_\epsilon \left( \log\frac{5}{\delta} \right)^{1/2} m^{\epsilon - \frac{s}{2s+4(n+1)}}, \tag{2.8} \]

where c_ϵ > 0 is a constant independent of m or δ. Furthermore, if X is contained in the unit ball of Rn, take σ = m^{−1/(n+s)} and λ = m^{−(n+2s)/(n+s)}; then for 0 < ϵ < s/(2s+2n), with confidence 1 − δ, there holds

\[ \left\| \pi_{\vartheta_1}(\hat{f}_z^{1/2}) - f_\rho \right\|_{L^2_{\rho_X}} \le \tilde{c}_\epsilon \left( \log\frac{5}{\delta} \right)^{1/2} m^{\epsilon - \frac{s}{2s+2n}}, \tag{2.9} \]

where c̃_ϵ > 0 is a constant independent of m or δ.
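Below is a minimal simulation sketch of the sampling model in Proposition 1 (our own illustration with a hypothetical target f_rho and noise level, not an experiment from the paper); it generates data with Gaussian conditional distributions and estimates the L²_{ρX} error of any predictor by Monte Carlo.

```python
import numpy as np

rng = np.random.default_rng(0)

def f_rho(X):
    """Hypothetical smooth target (the conditional median when tau = 1/2)."""
    return np.sin(2 * np.pi * X).sum(axis=-1)

def sample(m, n=1, noise_sd=0.3):
    """X uniform on [0, 1]^n and y | x ~ N(f_rho(x), noise_sd^2)."""
    X = rng.uniform(0.0, 1.0, size=(m, n))
    y = f_rho(X) + noise_sd * rng.normal(size=m)
    return X, y

def l2_rho_error(predict, n=1, n_mc=100_000):
    """Monte Carlo estimate of || predict - f_rho ||_{L^2_{rho_X}}."""
    Xt = rng.uniform(0.0, 1.0, size=(n_mc, n))
    return float(np.sqrt(np.mean((predict(Xt) - f_rho(Xt)) ** 2)))

# e.g. the error of the trivial zero predictor as a baseline:
X, y = sample(m=200)
print(l2_rho_error(lambda X: np.zeros(len(X))))
```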

Remark 1. Although we evaluate the approximation ability of the estimator ˆfzτ by its projection πMτ( ˆfzτ), the error bounds still hold true for πB( ˆfzτ) with some properly chosen B := B(m) ≥ Mτ. From the proofs of the main results, one can see that B will tend to infinity as the sample size increases.

Actually, when the kernel function is pre-given, since the pinball loss Lτ is Lipschitz continuous, one may derive learning rates for kernel-based quantile regression with ℓ1-regularization within the framework of our previous work [28]. However, besides the uniform boundedness assumption, that approach also requires the marginal distribution ρX to satisfy a regularity condition (see Definition 1 in [28]), which guarantees that the sampled data have a certain density in X. Moreover, the analysis in [28] cannot lead to satisfactory results for non-smooth kernel functions. The approach of this paper is applicable to investigating the learning behavior of ℓ1-regularized quantile regression with a fixed Mercer kernel and yields fast learning rates even for rough kernels. It should also be pointed out that, as the kernel width σ needs to be tuned in the present scheme, the previous analysis methods that are available for the fixed-kernel case cannot be directly applied to our setting.

When q = 2 and the conditional τ-quantile function fρτ is smooth enough (meaning that the parameter s is large enough), the learning rates presented above can be arbitrarily close to O(m^{−(p+1)/(2(p+2))}). However, if one estimates fρτ by the same scheme associated with a fixed Mercer kernel, similar convergence rates can be achieved under a regularity condition that fρτ lies in the range of powers of an integral operator LK : L²_{ρX} → L²_{ρX} defined by

\[ L_K(f)(x) = \int_X K(x, y) f(y) \, d\rho_X(y). \]

Specifically, when applying the same algorithm with a single fixed Gaussian kernel, the same convergence behavior for approximating fρτ may actually require the very restrictive condition fρτ ∈ C∞. Furthermore, the results of [29] indicate that the approximation ability of a Gaussian kernel with a fixed width is limited: one cannot expect to obtain polynomial decay rates for target functions of Sobolev smoothness.

3 Framework of Convergence Analysis

In this section, we establish the framework of convergence analysis for algorithm (1.5).

Given f : X → R, recall the generalization error Eτ(f) defined by (1.3); correspondingly, the excess generalization error is given by Eτ(f) − Eτ(fρτ). Beyond the consistency of the algorithm, one may be more concerned with how well the obtained estimator approximates fρτ in some function space. We thus need the following inequality, which plays an important role in our mathematical analysis.

Proposition 2. Suppose that assumption (1.2) holds with Mτ ≥ 1 and that ρ has a τ-quantile of p-average type q. Then for any f : X → [−B, B] with B > 0, we have

\[ \|f - f_\rho^\tau\|_{L^r_{\rho_X}} \le c_\rho \max\{B, M_\tau\}^{1-1/q} \left\{ \mathcal{E}^\tau(f) - \mathcal{E}^\tau(f_\rho^\tau) \right\}^{1/q}, \tag{3.1} \]

where r = pq/(p+1) and \( c_\rho = 2^{1-1/q} q^{1/q} \left\| \{(b_x a_x^{q-1})^{-1}\}_{x \in X} \right\|^{1/q}_{L^p_{\rho_X}} \).
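For orientation, a hedged specialization of our own (based on the parameter values matching the Gaussian-noise setting of Proposition 1): taking q = 2 and p = ∞ gives r = pq/(p+1) = 2, and (3.1) reduces to

\[ \|f - f_\rho^\tau\|_{L^2_{\rho_X}} \le c_\rho \max\{B, M_\tau\}^{1/2} \left\{ \mathcal{E}^\tau(f) - \mathcal{E}^\tau(f_\rho^\tau) \right\}^{1/2}, \]

i.e., the L²_{ρX} distance is controlled by the square root of the excess generalization error, which is how the bounds of Proposition 1 follow from estimates on Eτ(πB(f̂zτ)) − Eτ(fρτ).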

This proposition can be proved following the same idea as in [27], and we defer the proof to the Appendix for completeness. For least-squares regression, the excess generalization error is exactly the squared distance in the space L²_{ρX}(X), due to the strong convexity of the loss function (e.g., see Proposition 1.8 in [8]). However, as the pinball loss is not strictly convex, the established inequality (3.1) is non-trivial and a noise condition on the distribution ρ is needed to derive the result.

By Proposition 2, in order to estimate the error ‖πB(f̂zτ) − fρτ‖ in the L^r_{ρX}-space, we only need to bound Eτ(πB(f̂zτ)) − Eτ(fρτ). This will be done by an error decomposition which has been developed in the literature for RKHS-based regularization schemes (e.g. [8, 26]). A technical difficulty in our setting is that the centers xi of the basis functions in Hz,σ are determined by the sample z and cannot be chosen freely.

One might consider regularization schemes in the infinite-dimensional space of all linear combinations of {Kσ(x, t) | t ∈ X}. But due to the lack of a Representer Theorem, the minimization over such a space cannot be reduced to a convex optimization problem in a finite-dimensional space like (1.5).


In this paper, we shall overcome this difficulty by a stepping stone method [37]. We use ˆfz,γτ to denote the solution of algorithm (1.4) with a regularization parameter γ, i.e.,

\[ \hat{f}_{z,\gamma}^\tau = \arg\min_{f \in \mathcal{H}_\sigma} \left\{ \frac{1}{m} \sum_{i=1}^m L_\tau(f(x_i) - y_i) + \gamma \|f\|_\sigma^2 \right\}. \tag{3.2} \]

Note that f̂z,γτ belongs to Hz,σ and is a reasonable estimator for fρτ. We then expect that f̂z,γτ can play a stepping-stone role in the analysis of algorithm (1.5), establishing a close relation between f̂zτ and fρτ. To this end, we need to estimate Ω(f̂z,γτ), the ℓ1-norm of the coefficients in the kernel expansion of f̂z,γτ.

Lemma 1. For every γ > 0, the function ˆfz,γτ defined by (3.2) satisfies

\[ \Omega(\hat{f}_{z,\gamma}^\tau) \le \frac{1}{2\gamma m} \sum_{i=1}^m L_\tau\big(\hat{f}_{z,\gamma}^\tau(x_i) - y_i\big) + 1 + \frac{1}{2} \|\hat{f}_{z,\gamma}^\tau\|_\sigma^2. \tag{3.3} \]

Proof. Setting C = 1/(2γm) and introducing the slack variables ξi and ξ̃i, we can restate the optimization problem (3.2) as

\[
\begin{aligned}
\underset{f \in \mathcal{H}_\sigma,\ \xi_i \in \mathbb{R},\ \tilde{\xi}_i \in \mathbb{R}}{\text{minimize}} \quad & \frac{1}{2}\|f\|_\sigma^2 + C \sum_{i=1}^m \left\{ (1-\tau)\xi_i + \tau \tilde{\xi}_i \right\} \\
\text{subject to} \quad & f(x_i) - y_i \le \xi_i, \quad y_i - f(x_i) \le \tilde{\xi}_i, \\
& \xi_i \ge 0, \quad \tilde{\xi}_i \ge 0, \quad \text{for all } i = 1, \dots, m.
\end{aligned}
\tag{3.4}
\]

The Lagrangian L associated with problem (3.4) is given by

\[
\mathcal{L}(f, \xi, \tilde{\xi}, \alpha, \tilde{\alpha}, \beta, \tilde{\beta}) = \frac{1}{2}\|f\|_\sigma^2 + C \sum_{i=1}^m \left\{ (1-\tau)\xi_i + \tau \tilde{\xi}_i \right\} + \sum_{i=1}^m \alpha_i \big( f(x_i) - y_i - \xi_i \big) + \sum_{i=1}^m \tilde{\alpha}_i \big( y_i - f(x_i) - \tilde{\xi}_i \big) - \sum_{i=1}^m \beta_i \xi_i - \sum_{i=1}^m \tilde{\beta}_i \tilde{\xi}_i.
\]

Denoting the inner product of Hσ by ⟨·, ·⟩σ, for any f ∈ Hσ we have ‖f‖²σ = ⟨f, f⟩σ, and the reproducing property of Hσ [1] ensures that f(xi) = ⟨f, Kσ(·, xi)⟩σ. Considering L as a functional from Hσ to R, the Fréchet derivative of L at f ∈ Hσ is written as ∂L/∂Hσ(f). We hence have

\[ \frac{\partial \mathcal{L}}{\partial \mathcal{H}_\sigma}(f) = f + \sum_{i=1}^m \alpha_i K_\sigma(x, x_i) - \sum_{i=1}^m \tilde{\alpha}_i K_\sigma(x, x_i), \quad \forall f \in \mathcal{H}_\sigma.
\]

In order to derive the dual problem of (3.4), we first set

\[
\begin{aligned}
\frac{\partial \mathcal{L}}{\partial \mathcal{H}_\sigma}(f) = 0 \ &\Rightarrow\ f + \sum_{i=1}^m \alpha_i K_\sigma(x, x_i) - \sum_{i=1}^m \tilde{\alpha}_i K_\sigma(x, x_i) = 0, \\
\frac{\partial \mathcal{L}}{\partial \xi_i} = 0 \ &\Rightarrow\ C(1-\tau) - \alpha_i - \beta_i = 0, \quad i = 1, \dots, m, \\
\frac{\partial \mathcal{L}}{\partial \tilde{\xi}_i} = 0 \ &\Rightarrow\ C\tau - \tilde{\alpha}_i - \tilde{\beta}_i = 0, \quad i = 1, \dots, m.
\end{aligned}
\]
