
Katholieke Universiteit Leuven

Departement Elektrotechniek ESAT-SISTA/TR 2004-228

A class of template splines¹

Diana M. Sima² Sabine Van Huffel²

July 2005

Accepted for publication in Computational Statistics & Data Analysis.

¹ This report is available by anonymous ftp from ftp.esat.kuleuven.ac.be in the directory pub/sista/dsima/abstracts/04-228.html

² ESAT-SISTA, K.U. Leuven, Kasteelpark Arenberg 10, B-3001 Leuven-Heverlee, Belgium, (diana.sima, sabine.vanhuffel@esat.kuleuven.ac.be). This work was supported by Research Council KUL: GOA-Mefisto 666, IDO/99/003 and IDO/02/009 (Predictive computer models for medical classification problems using patient data and expert knowledge), several PhD/postdoc & fellow grants; Flemish Government: FWO: PhD/postdoc grants, projects, G.0078.01 (structured matrices), G.0407.02 (support vector machines), G.0269.02 (magnetic resonance spectroscopic imaging), G.0270.02 (nonlinear Lp approximation), research communities (ICCoS, ANMMM); IWT: PhD Grants; Belgian Federal Science Policy Office: IUAP P5-22 ("Dynamical Systems and Control: Computation, Identification & Modelling"); EU: PDT-COIL, BIOPATTERN, eTUMOUR.


Abstract

A common frame of template splines that unifies the definitions of various spline families, such as smoothing, regression or penalized splines, is considered. The nonlinear nonparametric regression problem that defines the template splines can be reduced, for a large class of Hilbert spaces, to a parameterized regularized linear least squares problem, which leads to an important computational advantage. Particular applications of template splines include the commonly used types of splines, as well as other atypical formulations. In particular, this extension allows an easy incorporation of additional constraints, which is generally not possible in the context of classical spline families.

Keywords: nonparametric regression, reproducing kernel Hilbert space, splines.


A class of template splines

Diana M. Sima∗, Sabine Van Huffel

ESAT-SCD, Department of Electrical Engineering, Katholieke Universiteit Leuven, Kasteelpark Arenberg 10, 3001 Leuven, Belgium.

Abstract

A common frame of template splines that unifies the definitions of various spline families, such as smoothing, regression or penalized splines, is considered. The nonlinear nonparametric regression problem that defines the template splines can be reduced, for a large class of Hilbert spaces, to a parameterized regularized linear least squares problem, which leads to an important computational advantage. Particular applications of template splines include the commonly used types of splines, as well as other atypical formulations. In particular, this extension allows an easy incorporation of additional constraints, which is generally not possible in the context of classical spline families.

Key words: nonparametric regression, reproducing kernel Hilbert space, splines
1991 MSC: 62G08, 65D07

1 Introduction

We consider nonlinear nonparametric models of the form $y_i = f(t_i) + \epsilon_i$, where $y_i$ denotes a real-valued measured observation, $t_i$ is a regression abscissa from a certain given bounded real interval $I$, $\epsilon_i$ is a zero-mean additive noise term, and the function $f : I \to \mathbb{R}$ is an unknown nonlinearity to be determined. The problem of estimating the unknown nonlinearity $f$, given measured data $(t_1, y_1), \dots, (t_m, y_m)$, is a nonparametric regression problem. The range of applications for this type of problem is very wide: from density function estimation to econometric prediction, from biology to sociology.

∗ Corresponding author. Tel.: +32-16-32-11-43; fax: +32-16-32-19-70. Email addresses: diana.sima@esat.kuleuven.ac.be (Diana M. Sima), sabine.vanhuffel@esat.kuleuven.ac.be (Sabine Van Huffel).


The literature on nonparametric regression is also quite rich (see the monographs authored by Eubank (1999); Green and Silverman (1994); Härdle (1990); Wahba (1990)). But one tool that is almost always mentioned in connection with nonparametric regression problems is splines.

In this paper, we design a new and more general formulation, which encompasses many of the commonly used types of splines. We partly motivate this work by the need to link the family of penalized splines (Eilers and Marx, 1997) to the smoothing splines family (Wahba, 1990), whose theory is much more rigorous. In a recent article, Hall and Opsomer (2005) prove some basic statistical properties of consistency and expressions for mean squared errors for penalized splines. The extension that we propose is a super-family for both the smoothing splines and the penalized splines, among others. Good statistical properties that hold for smoothing splines, such as the optimality of model selection via generalized cross validation, still hold for the template spline extension. Moreover, with this general formulation we are able to solve not only smoothing problems, but also various constrained smoothing problems.

In section 2, we define template splines by generalizing the approach of Wahba (1990), which embedded the smoothing splines into the theory of reproducing kernel Hilbert spaces. We allow more freedom in choosing some essential elements of the template splines (compared to the smoothing splines definition), and this fact will play a role in widening the splines family.

In section 3 we show that, by allowing these generalizations, we do not overly restrict the computational properties: the template spline solution will still be computable via a linear parameterization, by solving a regularized linear least squares minimization.

Derivations of the well-known types of splines from the more general template spline formulation, and some illustrative examples, are detailed in section 4.

A word about notation: we shall use bold-face symbols for vectors; for example, $\mathbf{y}$ is the vector $(y_1, \dots, y_m)^\top$, and $\mathbf{t}$ is the vector $(t_1, \dots, t_m)^\top$.

2 Template splines on reproducing kernel Hilbert spaces

Given the measured data $(t_1, y_1), \dots, (t_m, y_m)$, the goal is to fit these data using a simple nonparametric model $y_i = f(t_i) + \epsilon_i$. Depending on the application, we have to define more specifically some properties with which a good $f$ should comply. The first step is to choose a function space to which our search for a good $f$ can be restricted. A general framework for function spaces is described next; it is a framework that has attractive theoretical properties, but which also proves to be advantageous from a computational point of view, as will be shown in Section 3.

2.1 Reproducing kernel Hilbert spaces

The theory of reproducing kernel Hilbert spaces is a functional analysis tool that gives a sound foundation to the use of splines and other concepts in curve fitting, function estimation, model description or model building applications (Wahba, 2003).

A reproducing kernel Hilbert space (RKHS) $\mathcal{H}$ is a linear function space, with an embedded inner product $\langle \cdot, \cdot \rangle_{\mathcal{H}}$ and an induced norm $\|\cdot\|_{\mathcal{H}}$. It is a Hilbert space on some domain $I$, thus it is complete with respect to the induced distance topology, but the most important property of a RKHS is that all pointwise evaluations are bounded linear functionals. That is, for every $t \in I$ the linear functional $L_t$ defined by the relation $L_t f = f(t)$ is bounded ($|L_t f| \le \mathrm{const}_t \|f\|_{\mathcal{H}}$). As a consequence of Riesz's representation theorem (which says that every bounded linear functional has a representer in the Hilbert space), for each $t \in I$ there exists an element $\eta_t \in \mathcal{H}$ such that
$$ f(t) = \langle \eta_t, f \rangle_{\mathcal{H}}, \quad \forall f \in \mathcal{H}. $$
In other words, each $t \in I$ has a corresponding $\eta_t \in \mathcal{H}$ that "describes" all possible values of the functions in $\mathcal{H}$ at $t$ through the inner product of $\mathcal{H}$. The function $K : I \times I \to \mathbb{R}$ defined as $K(s,t) := \langle \eta_s, \eta_t \rangle_{\mathcal{H}}$ for all $s, t \in I$ is the reproducing kernel for $\mathcal{H}$. It is symmetric, positive definite, and satisfies $\langle K(s,\cdot), K(t,\cdot) \rangle_{\mathcal{H}} = K(s,t)$.

There is a one-to-one relationship between the set of reproducing kernel Hilbert function spaces defined on an interval $I$ and the set of positive definite functions defined on $I \times I$. Moreover, under general circumstances (e.g., continuity and square-integrability), any positive definite function $K$ admits an eigenfunction-eigenvalue decomposition
$$ K(s,t) = \sum_{\nu=1}^{\infty} \sigma_\nu \Phi_\nu(s) \Phi_\nu(t), $$
where $\sigma_1 \ge \sigma_2 \ge \cdots \ge 0$, with $\sum_{\nu=1}^{\infty} \sigma_\nu^2 < \infty$, are the eigenvalues of $K$, and $\Phi_1, \Phi_2, \dots$, an orthonormal sequence of square-integrable functions on $I$, are called the eigenfunctions of $K$. The functions $\Phi_1, \Phi_2, \dots$ form a Hilbert basis for the RKHS that corresponds to the kernel $K$.

These properties imply good news from the practical point of view. Choosing the RKHS $\mathcal{H}$ can be replaced either by choosing a desired kernel $K$ or by choosing a sequence of orthonormal functions that act as a basis or a generator family for $\mathcal{H}$. For this reason, in practical applications, the theoretical formulation of $\mathcal{H}$ is often omitted; typically, it is replaced by the definition of a certain sequence of generators for the "splines", such as truncated polynomials, piecewise polynomials or B-spline bases (de Boor, 1978).

2.2 Unconstrained smoothing

When a space $\mathcal{H}$ is specified, we can already formulate the following least squares problem:

Definition 1 (The $\mathcal{H}$-smoothing problem) Solving the least squares problem
$$ \min_{f \in \mathcal{H}} \sum_{i=1}^{m} (y_i - f(t_i))^2 \qquad (1) $$
for the function $f$ is called the $\mathcal{H}$-smoothing problem.

Via $\mathcal{H}$-smoothing, the measured data $\mathbf{y} \in \mathbb{R}^m$ is orthogonally projected onto the space $\mathcal{H}(\mathbf{t})$, which is the vector space of the discretizations of all the functions in $\mathcal{H}$ at the abscissas $t_1, \dots, t_m$. In other words, $\mathcal{H}$-smoothing finds the function $\hat{f}$ in $\mathcal{H}$ that best approximates the data in $\mathbf{y}$ at the given abscissas $\mathbf{t}$. For values of $t \in I$ that are not among the elements of $\mathbf{t}$, $\hat{f}$ can be used as a predictor.
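To make the projection view concrete, here is a minimal numerical sketch (our own illustration, not from the paper) of $\mathcal{H}$-smoothing when $\mathcal{H}$ is a small, hypothetical finite-dimensional space of polynomials; the basis choice, data and noise level are illustrative assumptions only.

```python
import numpy as np

# Hypothetical finite-dimensional H spanned by {1, t, t^2, t^3} on I = [0, 1].
def basis(t):
    return np.vander(t, N=4, increasing=True)   # columns: 1, t, t^2, t^3

rng = np.random.default_rng(0)
t = np.sort(rng.uniform(0.0, 1.0, 30))                       # abscissas t_1..t_m
y = np.sin(2 * np.pi * t) + 0.1 * rng.standard_normal(30)    # noisy observations y_i

X = basis(t)                                    # discretization H(t) of the space at t
coef, *_ = np.linalg.lstsq(X, y, rcond=None)    # orthogonal projection of y onto H(t)
f_hat = lambda s: basis(np.atleast_1d(s)) @ coef
print(f_hat(np.array([0.25, 0.5])))             # f_hat predicts at new abscissas
```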

2.3 Constrained smoothing and template splines

To impose additional constraints, such as smoothness, monotonicity, convexity, or equality or inequality relations, on the estimated function $f$, it is possible in some cases to revise the definition of the space $\mathcal{H}$ such that any function in $\mathcal{H}$ satisfies the extra constraints. However, the design of $\mathcal{H}$ in this case can become tedious. An alternative method is based on the projection framework for constrained smoothing developed by Mammen et al. (2001); basically, the additional constraints are imposed in a sequential manner: "smooth, then constrain."

Here, we propose a one-step solution that is feasible and efficient for a wide class of constraints that can be written as (semi)norm conditions.

Let $\mathcal{P}$ denote a linear operator from $\mathcal{H}$ to $\mathcal{H}'$, where $\mathcal{H}'$ is a certain normed space.

Definition 2 (The $\mathcal{P}$-constrained $\mathcal{H}$-smoothing problem) Solving the constrained least squares problem
$$ \min_{f \in \mathcal{H}} \sum_{i=1}^{m} (y_i - f(t_i))^2, \quad \text{such that } \|\mathcal{P}f\|_{\mathcal{H}'} \le \delta \text{ for a fixed } \delta \ge 0, \qquad (2) $$
for the function $f$ is called the $\mathcal{P}$-constrained $\mathcal{H}$-smoothing problem.

The value $\|\mathcal{P}f\|_{\mathcal{H}'}$ is generally a measure of the quality of an arbitrary $f$ with respect to the desired constraints, or to the prior knowledge on properties that good estimators $f$ should satisfy. Sometimes, if no such prior knowledge is available, $\|\mathcal{P}f\|_{\mathcal{H}'}$ could also be a measure of parsimony for the model $f$.

Using a Lagrange multiplier argument, we arrive at the following:

Definition 3 (The template spline problem) The following minimization is the template spline minimization problem:
$$ \min_{f \in \mathcal{H}} \frac{1}{m} \sum_{i=1}^{m} (y_i - f(t_i))^2 + \lambda \|\mathcal{P}f\|_{\mathcal{H}'}^2. \qquad (3) $$

This formulation is equivalent to the $\mathcal{P}$-constrained $\mathcal{H}$-smoothing problem, for a certain relation between $\lambda \ge 0$ and $\delta \ge 0$. However, $\lambda$ or $\delta$ are usually not exactly known in practice. The penalty formulation of the template spline problem (3) shows that a trade-off between closeness of the model $f$ to the observations and "quality" of the model (via the $\mathcal{P}$-constraint) can be ensured by choosing an appropriate $\lambda \ge 0$. In Section 3.2 we focus on the generalized cross validation (GCV) criterion, which can be used for choosing the so-called smoothing (or penalty, or regularization) parameter $\lambda$, given the measured data.

Remark 4 (More general formulations) Sometimes in the spline literature, the function evaluation operator that appears in the sum of squares of problems (1, 2, 3), $f \mapsto f(t_i)$, is replaced by a more general bounded linear functional $L_i$, i.e., $f \mapsto L_i f$. All results in this paper are straightforwardly generalizable to this setting.

Moreover, if the zero-mean noise term $\epsilon_i$ is assumed to have a known covariance matrix other than the identity matrix, the least squares term in all minimization problems (1, 2, 3) can easily be transformed into a weighted least squares formulation, using the inverse of the noise covariance matrix as the weight matrix.


3 Computing template splines

3.1 Transformation to a linear least squares problem

The following theorem says that, when $\mathcal{H}$ is a RKHS and $\mathcal{P}$ has some general properties (see below), the solution of the template spline problem (3) has a representation in terms of the reproducing kernel and/or its eigenfunctions, and it can be easily computed via linear least squares or linear equations. In particular, problem (1) can also be efficiently solved as a linear least squares problem.

Let $\mathcal{H}_{\mathcal{P}}$ denote the null space of $\mathcal{P}$ in $\mathcal{H}$ (that is, $\mathcal{H}_{\mathcal{P}} := \ker \mathcal{P} := \{f \in \mathcal{H} : \mathcal{P}f = 0\}$), and assume it is finite dimensional, of dimension $d$. For instance, if $\mathcal{P}$ is the $d$th order derivative operator on some space $\mathcal{H}$ of (infinitely) continuously differentiable functions, then $\ker \mathcal{P}$ is spanned by the finite basis $\{1, t, t^2, \dots, t^{d-1}\}$.

Assume that $\mathcal{P}$ preserves orthogonality relations from $\mathcal{H}$ to $\mathcal{H}'$: thus, if $\langle f, g \rangle_{\mathcal{H}} = 0$, then $\langle \mathcal{P}f, \mathcal{P}g \rangle_{\mathcal{H}'} = 0$.

Theorem 5 (generalized from Wahba (1990), Th. 1.3.1) The solution of the template spline problem (3), for a fixed $\lambda \ge 0$, has a closed-form expression given by
$$ \hat{f}_\lambda := \sum_{k=1}^{d} a_k \phi_k + \sum_{i=1}^{m} b_i \mu_i, \qquad (4) $$
where

• $\phi_1, \dots, \phi_d$ are a basis for $\mathcal{H}_{\mathcal{P}}$ ($d := \dim \mathcal{H}_{\mathcal{P}}$),

• $\mu_i = K(t_i, \cdot)$, for $i = 1, \dots, m$,

• the coefficients $a := (a_1, \dots, a_d)^\top$ and $b := (b_1, \dots, b_m)^\top$ are given by
$$ a = \left[ A^\top A - A^\top B (B^2 + m\lambda C)^{-1} B A \right]^{-1} A^\top \left[ I_m - B (B^2 + m\lambda C)^{-1} B \right] y, $$
$$ b = (B^2 + m\lambda C)^{-1} B (y - A a), \qquad (5) $$
where the matrices $A$ ($m \times d$), $B$ ($m \times m$) and $C$ ($m \times m$) are computed as
$$ A = \{\phi_k(t_l)\}_{l=1,\dots,m;\ k=1,\dots,d}, \quad B = \{K(t_i, t_j)\}_{i,j=1,\dots,m} = \{\langle \mu_i, \mu_j \rangle_{\mathcal{H}}\}_{i,j=1,\dots,m}, \quad C = \{\langle \mathcal{P}\mu_i, \mathcal{P}\mu_j \rangle_{\mathcal{H}'}\}_{i,j=1,\dots,m}. \qquad (6) $$


Proof. Since the set $\{\phi_1, \dots, \phi_d\}$ is a basis for $\mathcal{H}_{\mathcal{P}}$, any $f \in \mathcal{H}$ can be written as $f = \sum_{k=1}^{d} a_k \phi_k + f_\perp$, for some $a_1, \dots, a_d \in \mathbb{R}$ and an $f_\perp \in \mathcal{H}_{\mathcal{P}}^\perp$. It will be shown that $f_\perp$ can be further decomposed such that the optimal solution of the minimization (3) has a finite decomposition in $\mathcal{H}$. To this end, write $f_\perp = \sum_{i=1}^{m} b_i \mu_i + \rho$, where $\mu_i := K(t_i, \cdot)$ and $\rho \in \mathcal{H}$ is orthogonal to $\phi_1, \dots, \phi_d$ and to $\mu_1, \dots, \mu_m$.

The least squares objective to be minimized becomes:
$$ F(f) := \frac{1}{m} \sum_{i=1}^{m} (y_i - f(t_i))^2 + \lambda \|\mathcal{P}f\|_{\mathcal{H}'}^2 $$
$$ = \frac{1}{m} \sum_{i=1}^{m} \Big( y_i - \sum_{k=1}^{d} a_k \phi_k(t_i) - \sum_{j=1}^{m} b_j \mu_j(t_i) - \rho(t_i) \Big)^2 + \lambda \Big\| \mathcal{P} \Big( \sum_{k=1}^{d} a_k \phi_k + \sum_{j=1}^{m} b_j \mu_j + \rho \Big) \Big\|_{\mathcal{H}'}^2 $$
$$ = \frac{1}{m} \|y - Aa - Bb\|_2^2 + \lambda b^\top C b + \lambda \|\mathcal{P}\rho\|_{\mathcal{H}'}^2, $$
where $A$, $B$ and $C$ are given in (6). We used the facts that $\rho(t_i) = \langle \rho, K(t_i, \cdot) \rangle_{\mathcal{H}} = \langle \rho, \mu_i \rangle_{\mathcal{H}} = 0$ from the choice of $\rho$ orthogonal to $\mu_1, \dots, \mu_m$, that $\phi_1, \dots, \phi_d$ are in the null space of $\mathcal{P}$, and that $\rho$ being orthogonal to $\sum_{j=1}^{m} b_j \mu_j$ in $\mathcal{H}$ implies $\langle \sum_{j=1}^{m} b_j \mathcal{P}\mu_j, \mathcal{P}\rho \rangle_{\mathcal{H}'} = 0$, by the "preservation of orthogonality" property of $\mathcal{P}$.

For $f$ to be a minimizer of $F$, we must have $\|\mathcal{P}\rho\|_{\mathcal{H}'} = 0$. This implies $\rho \in \ker \mathcal{P}$, which contradicts the choice of $\rho$ orthogonal to the basis of $\ker \mathcal{P}$; thus, the only possibility is $\rho = 0$.

The normal system of equations for the linear least squares problem that we obtained is:
$$ \begin{cases} A^\top A\, a + A^\top B\, b = A^\top y, \\ B^\top A\, a + (B^\top B + m\lambda C)\, b = B^\top y. \end{cases} \qquad (7) $$
Since $B$ is symmetric positive definite, the closed-form expressions (5) are easily obtained. □

In summary, the theorem gives us a simple way to express the minimization (3) as a problem that is linearly parameterized by the vectors $a$ and $b$; more precisely, as the regularized linear least squares problem
$$ \min_{a \in \mathbb{R}^d,\ b \in \mathbb{R}^m} \frac{1}{m} \|y - Aa - Bb\|_2^2 + \lambda b^\top C b. \qquad (8) $$


3.2 Data driven spline fitting using generalized cross validation

In practice, the smoothing parameter $\lambda$ will not be known a priori; the methods for choosing a good $\lambda$ often rely on optimizing some criterion and involve many evaluations of spline solutions (from their coefficients $a$, $b$) for various values of $\lambda$. For this reason, it is required to have efficient algorithms for the computation of the coefficients $a$ and $b$, rather than using the explicit expressions in (5).

The regularized least squares problem of minimizing
$$ \frac{1}{m} \|y - Aa - Bb\|_2^2 + \lambda b^\top C b = \frac{1}{m} \left\| \begin{bmatrix} y \\ 0 \end{bmatrix} - \begin{bmatrix} A & B \\ 0 & \sqrt{m\lambda}\, D \end{bmatrix} \begin{bmatrix} a \\ b \end{bmatrix} \right\|_2^2, $$
where $D$ is a Cholesky factor of the symmetric positive definite matrix $C$ ($D^\top D = C$), can be efficiently solved if we compute in advance the generalized singular value decomposition (GSVD) (Golub and Van Loan, 1996) of the matrix pair $\left( \begin{bmatrix} A & B \end{bmatrix}, \begin{bmatrix} 0 & D \end{bmatrix} \right)$.
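As a concrete illustration, the stacked formulation above can be handed directly to a dense least squares routine; the following sketch (our own, with a hypothetical function name) assumes the matrices $A$, $B$, $C$ from (6) are available and skips the GSVD preprocessing in favor of a plain solve.

```python
import numpy as np

# Hypothetical helper; A, B, C are the matrices from (6), y the data vector.
def template_spline_coefficients(A, B, C, y, lam):
    """Solve min (1/m)||y - A a - B b||_2^2 + lam * b' C b via the stacked form."""
    m, d = A.shape
    D = np.linalg.cholesky(C).T                  # upper factor with D' D = C
    top = np.hstack([A, B])                                       # [A  B]
    bottom = np.hstack([np.zeros((m, d)), np.sqrt(m * lam) * D])  # [0  sqrt(m lam) D]
    M = np.vstack([top, bottom])
    rhs = np.concatenate([y, np.zeros(m)])
    ab, *_ = np.linalg.lstsq(M, rhs, rcond=None)
    return ab[:d], ab[d:]                        # the coefficient vectors a and b
```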

Many methods for choosing an "optimal" regularization parameter $\lambda$ can be employed in the context of template splines. A recent simulation study (Lee, 2003) compares several parameter selection procedures (leave-one-out cross validation, generalized cross validation, Mallows' $C_p$, Akaike's information criterion, and risk estimation methods); the conclusion is that no single method is the best overall.

Here we present the application of generalized cross validation (Craven and Wahba, 1979) to template splines. The GCV criterion aims at minimizing the function
$$ G(\lambda) := \frac{\|y - A a_\lambda - B b_\lambda\|_2^2}{\left[ \mathrm{Tr}(I_m - H(\lambda)) \right]^2}, $$
where $(a_\lambda, b_\lambda)$ denote the optimal solution vectors that are obtained from solving problem (8) holding the regularization parameter at the fixed value $\lambda$, and $H(\lambda)$ is the influence matrix (or hat matrix, or smoother matrix), defined as the $m \times m$ matrix that makes the equality $H(\lambda) y = A a_\lambda + B b_\lambda$ hold. Its formula reads:
$$ H(\lambda) = \begin{bmatrix} A & B \end{bmatrix} \begin{bmatrix} A^\top A & A^\top B \\ B^\top A & B^\top B + m\lambda C \end{bmatrix}^{-1} \begin{bmatrix} A^\top \\ B^\top \end{bmatrix}. $$
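A direct (dense) evaluation of $G(\lambda)$ can be sketched as follows; this is our own illustration using the hat-matrix formula above, without the GSVD speedups discussed next.

```python
import numpy as np

# Hypothetical helper; evaluates G(lambda) densely from A, B, C and y.
def gcv(A, B, C, y, lam):
    m, d = A.shape
    AB = np.hstack([A, B])                       # [A  B]
    K = AB.T @ AB                                # [[A'A, A'B], [B'A, B'B]]
    K[d:, d:] += m * lam * C                     # add m*lam*C to the B'B block
    H = AB @ np.linalg.solve(K, AB.T)            # influence matrix H(lambda)
    resid = y - H @ y                            # y - A a_lam - B b_lam
    return (resid @ resid) / np.trace(np.eye(m) - H) ** 2

# Coarse grid search for a GCV-optimal lambda (illustrative only):
# lam_best = min(10.0 ** np.arange(-6, 3), key=lambda l: gcv(A, B, C, y, l))
```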

GCV was originally introduced for problems such as ridge regression (Golub et al., 1979) and smoothing splines (Craven and Wahba, 1979), which belong to the same category of regularized linear least squares problems as the template splines do. For these problems, it is possible to show that the GCV criterion is a predictive mean squared error criterion. In the context of template splines this means that, under reasonable assumptions, the values chosen for $\lambda$ by minimizing the GCV function converge, as the sample size grows to infinity, to the optimal $\lambda$ that corresponds to the spline model closest to the true underlying model.

For efficient computations, the GSVD of the matrix pair $\left( \begin{bmatrix} A & B \end{bmatrix}, \begin{bmatrix} 0 & D \end{bmatrix} \right)$ can be used: in this case, the matrix that must be inverted in the formula of $H(\lambda)$ becomes diagonal, and thus computing this inverse is very fast for any $\lambda$ (see Hansen (1997)). Interesting techniques for making the optimization of the GCV function fast for large-scale problems are described by Golub and von Matt (1997), who use a randomization method to compute the trace of a large matrix.

4 Examples

In this section, we exemplify the described theory by first showing that some classical spline families belong to our generalization. Afterwards, we design some examples that are atypical for the classical splines, but which can nevertheless be successfully tackled by template splines.

4.1 Smoothing, regression and penalized splines as template splines

Smoothing splines and penalized splines start from the same idea of using a weighted penalty term in order to obtain a desired degree of smoothness of the reconstructed nonparametric model. Regression splines rely on a different technique to obtain a smoother or a rougher approximation: tuning the number of basis splines, as explained in §4.1.2.

Despite their similarities, penalized splines cannot be seen as a subset of smoothing splines (or vice versa), mainly because their penalty terms are of a totally different nature.

But with template splines we generalize these approaches, by allowing the operator $\mathcal{P}$ to be more general than just an orthogonal projection (as for smoothing splines) and to take values in other normed spaces (such as $\mathbb{R}^n$, for penalized splines). At the same time, the good properties of smoothing splines, such as the possibility of using GCV for choosing the penalty parameter, are retained.


4.1.1 Smoothing splines as template splines

Smoothing splines (Wahba, 1990) are a subset of the template splines family, obtained when the space $\mathcal{H}'$ is the same as the space $\mathcal{H}$ and the operator $\mathcal{P}$ is an orthogonal projection.

A classical example of a smoothing spline is the natural spline. The space $\mathcal{H}$ is defined as the Sobolev space
$$ \mathcal{H} := \{f : [0,1] \to \mathbb{R} \ : \ f, f', \dots, f^{(d-1)} \text{ are absolutely continuous and } f^{(d)} \in L_2\}, $$
where $f^{(k)}$ denotes the $k$th derivative and $L_2$ denotes the space of square integrable functions. $\mathcal{H}$ is endowed with the squared norm
$$ \|f\|_{\mathcal{H}}^2 := \sum_{k=1}^{d} \left[ f^{(k-1)}(0) \right]^2 + \int_0^1 \left[ f^{(d)}(u) \right]^2 du. $$

The corresponding reproducing kernel is
$$ K(s,t) := \sum_{k=1}^{d} \phi_k(s) \phi_k(t) + \int_0^1 G_d(s,u)\, G_d(t,u)\, du, $$
where each $\phi_k$ (for $k = 1, \dots, d$) denotes the monomial $\phi_k(t) = \frac{t^{k-1}}{(k-1)!}$, and $G_d$ is the Green function $G_d(t,u) = \frac{(t-u)_+^{d-1}}{(d-1)!}$, with $(x)_+ = x$ for $x \ge 0$ and $0$ otherwise.

The operator $\mathcal{P}$ is simply defined as the orthogonal projection onto the subspace
$$ \{f \in \mathcal{H} : f(0) = f(1) = f'(0) = f'(1) = \cdots = f^{(d-1)}(0) = f^{(d-1)}(1) = 0\}. $$

The optimal natural spline solution solves the problem:
$$ \min_{f \in \mathcal{H}} \frac{1}{m} \sum_{i=1}^{m} (y_i - f(t_i))^2 + \lambda \int_0^1 \left[ f^{(d)}(u) \right]^2 du. $$
Thus, the natural spline reconstruction from noisy measurements aims at $d$th-order smoothness, with no boundary conditions.
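To make the kernel above concrete, here is a small numerical sketch (our own illustration, not from the paper) that evaluates $K(s,t)$ by simple quadrature; the choice $d = 2$ and the number of quadrature nodes are arbitrary assumptions.

```python
import numpy as np
from math import factorial

# Evaluates one kernel entry K(s, t) by midpoint quadrature on [0, 1].
def sobolev_kernel(s, t, d=2, n_quad=2000):
    phi = lambda x, k: x ** (k - 1) / factorial(k - 1)            # phi_k, k = 1..d
    G = lambda x, u: np.maximum(x - u, 0.0) ** (d - 1) / factorial(d - 1)
    poly_part = sum(phi(s, k) * phi(t, k) for k in range(1, d + 1))
    u = (np.arange(n_quad) + 0.5) / n_quad                        # midpoints in [0, 1]
    return poly_part + np.mean(G(s, u) * G(t, u))                 # + integral term

print(sobolev_kernel(0.3, 0.7))   # the matrix B in (6) collects K(t_i, t_j)
```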

The restriction of having $\mathcal{H}'$ equal to $\mathcal{H}$ can be exploited in the smoothing spline solution derivation (Wahba, 1990) to yield a computational advantage. The smoothing spline solution can still be obtained by solving a regularized linear least squares problem (details in (Wahba, 1990, Chapter 1)). But $\mathcal{H}' = \mathcal{H}$ implies that for smoothing splines we can redefine the matrices that appear in our Theorem 5 and have $B = C$, which gives a simpler numerical solution for the smoothing spline, since a QR decomposition can be used instead of a GSVD.


4.1.2 Regression splines as template splines

A regression spline (Friedman, 1991) is a piecewise polynomial function that is smooth at the joining points (knots) up to the degree of the polynomials minus one. Typical polynomial functions that are used with regression splines are truncated polynomials or B-splines (de Boor, 1978).

The space $\mathcal{H}$ can be defined as the span of the considered polynomial set. An example is the span of the truncated polynomials of degree $q$, $\mathrm{span}\{t, t^2, \dots, t^q, (t-\tau_1)_+^q, (t-\tau_2)_+^q, \dots, (t-\tau_N)_+^q\}$, where $\tau_1, \dots, \tau_N$ are $N$ given knots, and $(x)_+$ denotes, as before, $x$ or $0$, depending on whether $x$ is nonnegative or negative.

Another example is the span of a B-spline basis. The widely used B-splines are defined recursively such that, at the given knots $\tau_1, \dots, \tau_N$, each of the functions in the B-spline basis set is $q$-times continuously differentiable. The recursion formula reads:
$$ v_i^{\{1\}}(t) = \begin{cases} 1, & \text{if } \tau_i \le t \le \tau_{i+1}, \\ 0, & \text{otherwise}, \end{cases} \qquad v_i^{\{k\}}(t) = \frac{v_i^{\{k-1\}}(t)\,(t - \tau_i)}{\tau_{i+k-1} - \tau_i} + \frac{v_{i+1}^{\{k-1\}}(t)\,(\tau_{i+k} - t)}{\tau_{i+k} - \tau_{i+1}}, \qquad (9) $$
and the $q$-order basis that defines $\mathcal{H}$ as $\mathrm{span}\{v_1^{\{q\}}, \dots, v_{N-q}^{\{q\}}\}$ thus contains $N - q$ functions.
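Recursion (9) translates almost line by line into code; the following sketch is our own transcription (not a library routine), evaluating the basis functions on a grid.

```python
import numpy as np

# Direct transcription of recursion (9); indices are 0-based here,
# while the paper counts knots from 1.
def bspline(i, k, t, tau):
    if k == 1:
        return np.where((tau[i] <= t) & (t <= tau[i + 1]), 1.0, 0.0)
    left = bspline(i, k - 1, t, tau) * (t - tau[i]) / (tau[i + k - 1] - tau[i])
    right = bspline(i + 1, k - 1, t, tau) * (tau[i + k] - t) / (tau[i + k] - tau[i + 1])
    return left + right

tau = np.linspace(0.0, 1.0, 11)      # N = 11 equally spaced knots
t = np.linspace(0.0, 1.0, 200)
q = 4
V = np.column_stack([bspline(i, q, t, tau) for i in range(len(tau) - q)])
# V has N - q = 7 columns: the design matrix of the regression spline basis.
```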

For fitting with regression splines, we can set the operator $\mathcal{P}$ as the identically zero mapping (and $\mathcal{H}' = \{0\}$). Thus the penalty part disappears from problem (3), and we are left with a simple least squares problem.

Sometimes, the number of knots $N$ and the knot positions must be optimized before performing the regression. When we consider equally spaced knots (i.e., uniform regression splines), only the integer parameter $N$ has to be chosen. We therefore consider the number of knots $N$ as the hyperparameter playing the role of $\lambda$, and generalized cross validation (or another typical method for model order selection) can then be used to choose an appropriate value for $N$. The parameter $N$ controls in this case the bias-variance trade-off: a large $N$ means that the measured $\mathbf{y}$ can be modeled with more fidelity, and a smaller $N$ means that the reconstructed spline is smoother.

4.1.3 Penalized splines as template splines

The simple smoothing method of penalized splines was introduced by Eilers and Marx (1997) (and, in very similar terms, by Ruppert and Carroll (1996)). It is based on (uniform) regression splines, but with an added penalty term, weighted by a certain regularization parameter $\lambda$. An interesting detail is found in (Ruppert, 2002), namely that the number of knots can be set to an arbitrary value (a number large enough, but always less than the number of regression points); the technique of penalized splines controls the degree of smoothness only through the parameter $\lambda$.

In formal notation, let $v_1, \dots, v_n$ denote a basis family of truncated polynomials or of B-splines of degree $q$. Then, given the measurements $y_1, \dots, y_m$, a function $f$ can be modeled as a linear combination of $v_1, \dots, v_n$, i.e., $f = \sum_{k=1}^{n} c_k v_k$, by minimizing the penalized least squares function
$$ \min_{c \in \mathbb{R}^n} \frac{1}{m} \sum_{i=1}^{m} \left( y_i - \sum_{k=1}^{n} c_k v_k(t_i) \right)^2 + \lambda \|\Delta c\|^2, \qquad (10) $$
where $\Delta$ is a finite difference operator (of a certain chosen order) and $\lambda$ is a positive scalar parameter that controls the degree of smoothness.

From the formulation (10) it is clear that penalized splines are also a particular example of template splines. We define the space $\mathcal{H}$ as the span of the function family $\{v_1, \dots, v_n\}$, and the space $\mathcal{H}'$ as the $n$-dimensional Euclidean space $\mathbb{R}^n$. The operator $\mathcal{P}$ can be easily set using the canonical decomposition of any element $f$ in $\mathcal{H}$, as $\mathcal{P}(f) = \mathcal{P}\left( \sum_{k=1}^{n} c_k v_k \right) = \Delta c$, where $\Delta$ is the same matrix approximation of a finite difference operator that is used in the penalized splines problem (10).
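A compact sketch of (10) with a second-order difference penalty follows; it is our own illustration (not the authors' code) and assumes a B-spline design matrix $V$ at the data abscissas, for instance built as in the previous sketch.

```python
import numpy as np

# Hypothetical helper; V is the (m, n) basis design matrix, y the measurements.
def penalized_spline_fit(V, y, lam, order=2):
    """Solve min (1/m)||y - V c||_2^2 + lam ||Delta c||^2 for c, as in (10)."""
    m, n = V.shape
    Delta = np.diff(np.eye(n), order, axis=0)    # finite difference operator Delta
    M = np.vstack([V, np.sqrt(m * lam) * Delta]) # stacked least squares, cf. Section 3.2
    rhs = np.concatenate([y, np.zeros(n - order)])
    c, *_ = np.linalg.lstsq(M, rhs, rcond=None)
    return c
```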

4.2 Other applications of template splines

4.2.1 Smoothing in transformed spaces

We consider the abscissa interval $I$ as the "time axis" (and therefore the measurements $y_1, \dots, y_m$ are given in the "time domain"), but we are interested in imposing smoothness on the Fourier transform of the solution $f$.

Let $\mathcal{F}$ be the Fourier transform and $\mathcal{F}^{-1}$ denote the inverse Fourier transform. Consider a RKHS $\mathcal{H}$ of functions defined on the time domain $I$. Then the image space is $\mathcal{H}' := \mathcal{F}\mathcal{H} = \{\mathcal{F}(f) : f \in \mathcal{H}\}$. We can measure smoothness in $\mathcal{H}'$ with the aid of a projection $\mathcal{Q}$ (see, for instance, the natural spline type of smoothing in section 4.1.1); i.e., the norm $\|\mathcal{Q}g\|_{\mathcal{H}'}$ (for any $g$ in $\mathcal{H}'$) is the measure that we use for smoothness. It is then possible to define the mapping $\mathcal{P} : \mathcal{H} \to \mathcal{H}'$ by the composition
$$ \mathcal{P}(f) = \mathcal{Q}(\mathcal{F}(f)). $$
Solving the template spline problem (3) in this case is equivalent to solving
$$ \min_{g \in \mathcal{H}'} \frac{1}{m} \sum_{i=1}^{m} \left( y_i - (\mathcal{F}^{-1} g)(t_i) \right)^2 + \lambda \|\mathcal{Q}g\|_{\mathcal{H}'}^2, $$
which is a template spline problem in the space $\mathcal{H}'$.
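A fully discrete analogue (our own construction, not from the paper) conveys the idea: expand $f$ on a basis, map its grid values through a DFT, and penalize second differences of the transform. Splitting real and imaginary parts keeps everything in one real least squares problem. All names and the grid choice are assumptions.

```python
import numpy as np

# Hypothetical discrete analogue: V is the basis at the data abscissas,
# Vgrid the same basis on a uniform grid of g points where the DFT is taken.
def fourier_smoothed_fit(V, Vgrid, y, lam):
    m, _ = V.shape
    g = Vgrid.shape[0]
    F = np.fft.fft(np.eye(g), axis=0)            # DFT matrix acting on grid values
    D2 = np.diff(np.eye(g), 2, axis=0)           # second differences over frequencies
    T = D2 @ F @ Vgrid                           # c -> roughness of the transform
    # Split the complex penalty into real and imaginary blocks: one real LS problem.
    M = np.vstack([V, np.sqrt(m * lam) * T.real, np.sqrt(m * lam) * T.imag])
    rhs = np.concatenate([y, np.zeros(2 * (g - 2))])
    c, *_ = np.linalg.lstsq(M, rhs, rcond=None)
    return c
```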

4.2.2 Constrained smoothing

Classical smoothing splines or regression splines are designed to smooth scattered data in order to obtain meaningful approximations. However, additional constraints are usually hard to impose (see, for instance, (Kuijt and van Damme, 2001), where linear programming methods are used for several constrained smoothing problems). With template splines, constraints can be imposed through the penalty term. Tuning the penalty coefficient allows one to slightly break the constraints, which is advantageous for identifying situations when the data does not obey the constraints.

In this paragraph, we give two numerical examples from an application where template splines are designed for constrained smoothing problems. In the first example, we show that imposing a symmetry constraint helps in finding a reliable model for noisy data that comes from a symmetric function. The second example illustrates that, when a wrong constraint is used, the GCV value for $\lambda$ gives a model that clearly breaks the constraint.

Example 6 We consider several scalar functions on an interval $[-\alpha, \alpha]$ that are symmetric with respect to the vertical axis through zero. We generate discretizations and add Gaussian noise. We then compute from the noisy data two models, both based on a B-spline basis: a penalized spline, where a second order derivative operator is used to obtain smoothness, and a template spline, where the penalty contains the second order derivative for smoothness, but also a term for symmetry, which is governed by a "V-shaped" matrix $[I\ {-J}]$, where $I$ is the identity matrix and $J$ is its mirror reflection. We choose the penalty coefficient by GCV in both models, but we observe that using the symmetry constraint gives models that have a smaller mean squared error from the original simulated functions than when no constraint is imposed. See Figure 1 for an illustration.
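A sketch of how such a combined penalty can be assembled follows; this is our own reading of the construction in Example 6, and the relative weighting of the smoothness and symmetry terms is an assumption.

```python
import numpy as np

# Hypothetical assembly of the combined penalty in Example 6; assumes a
# symmetric knot layout, so a symmetric fit <-> palindromic coefficients.
def symmetry_template_penalty(n, order=2):
    D = np.diff(np.eye(n), order, axis=0)        # smoothness: second differences
    I = np.eye(n // 2)
    J = I[::-1, :]                               # mirror reflection of the identity
    if n % 2 == 0:
        S = np.hstack([I, -J])                   # [I  -J]: rows are c_k - c_{n-1-k}
    else:
        S = np.hstack([I, np.zeros((n // 2, 1)), -J])
    return np.vstack([D, S])                     # stacked matrix whose rows define P
```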

Example 7 We now consider as original function a function that is not symmetric with respect to the vertical axis through zero, e.g., the arctangent function. We nevertheless compute a template spline with a symmetry penalty. As illustrated in Figure 2, the fit is symmetric but does not reflect the data when we choose a penalty parameter $\lambda$ that is quite large. On the other hand, the GCV criterion chooses a $\lambda$ that corresponds to a spline reconstruction that is reasonable for the given data, and ignores the inappropriate symmetry constraint.


Fig. 1. In all plots, the thin solid (green) line is the simulated symmetric function, respectively $x^2$, $\cos(x)$ and $\mathrm{sinc}(x)$; the dots are the noise-corrupted data. The top row of plots shows as thick (red) lines the penalized spline reconstructions, while in the bottom row these thick lines represent the template splines with the symmetry constraint.

Fig. 2. (Panel titles: $\lambda = 100$ on the left, $\lambda = 0.03$ on the right.) On the left, the spline corresponds to a very large value of the penalty parameter, thus the symmetry is strongly imposed, yielding an almost zero solution. On the right, the penalty parameter is computed with GCV, and the symmetry constraint is completely ignored, but the data is reasonably fitted.

5 Conclusions

The template spline introduced in this paper generalizes some of the most used families of splines from the literature. An issue that we find important is that, due to this common framework, the family of penalized splines can be put in its own right beside the classical family of smoothing splines.

Template spline solutions are quite easily computable, since a (regularized) linear least squares optimization can be solved instead.

Template splines can constitute a solid basis for nonparametric regression problems where the “data acquisition space” is different from the space where interesting properties appear, or where strong or weak constraints should be imposed.

Acknowledgements

Diana Sima is a research assistant and Dr. Sabine Van Huffel is a full professor at the Katholieke Universiteit Leuven, Belgium. Research supported by Research Council KUL: GOA-AMBioRICS, several PhD/postdoc & fellow grants; Flemish Government: FWO: PhD/postdoc grants, projects, G.0407.02 (support vector machines), G.0269.02 (magnetic resonance spectroscopic imaging), G.0270.02 (nonlinear Lp approximation), G.0360.05 (EEG, Epileptic), research communities (ICCoS, ANMMM); IWT: PhD Grants; Belgian Federal Science Policy Office: IUAP P5/22 ('Dynamical Systems and Control: Computation, Identification and Modelling'); EU: PDT-COIL, BIOPATTERN, ETUMOUR.

References

Craven, P., Wahba, G., 1979. Smoothing noisy data with spline functions: estimating the correct degree of smoothing by the method of generalized cross-validation. Numerische Mathematik 31, 377–403.

de Boor, C., 1978. A Practical Guide to Splines. Springer, Berlin.

Eilers, P. H. C., Marx, B. D., 1997. Flexible smoothing with B-splines and penalties. Statistical Science 11 (2), 89–121.

Eubank, R., 1999. Nonparametric Regression and Spline Smoothing. Marcel Dekker, New York.

Friedman, J. H., 1991. Multivariate adaptive regression splines (with discussion). Annals of Statistics 19 (1).

Golub, G. H., Heath, M., Wahba, G., 1979. Generalized cross-validation as a method for choosing a good ridge parameter. Technometrics 21, 215–223.

Golub, G. H., Van Loan, C. F., 1996. Matrix Computations, 3rd Edition. Johns Hopkins University Press, Baltimore, Maryland.

Golub, G. H., von Matt, U., 1997. Generalized cross-validation for large scale problems. In: Van Huffel, S. (Ed.), Recent Advances in Total Least Squares Techniques and Errors-in-Variables Modeling. SIAM, pp. 139–148.


Green, P. J., Silverman, B. W., 1994. Nonparametric Regression and Generalized Linear Models: A Roughness Penalty Approach. Chapman and Hall, London.

Hall, P., Opsomer, J., 2005. Theory for penalised spline regression. Biometrika 92, 105–118.

Hansen, P. C., 1997. Rank-Deficient and Discrete Ill-Posed Problems. Numerical Aspects of Linear Inversion. SIAM, Philadelphia.

Härdle, W., 1990. Applied Nonparametric Regression. Econometric Society Monographs. Cambridge University Press.

Kuijt, F., van Damme, R., 2001. A linear approach to shape preserving spline approximation. Advances in Computational Mathematics 14, 25–48.

Lee, T. C. M., 2003. Smoothing parameter selection for smoothing splines: A simulation study. Computational Statistics and Data Analysis 42 (1–2), 139–148.

Mammen, E., Marron, J., Turlach, B., Wand, M., 2001. A general projection framework for constrained smoothing. Statistical Science 16 (3), 232–248.

Ruppert, D., 2002. Selecting the number of knots for penalized splines. Journal of Computational and Graphical Statistics 11, 735–757.

Ruppert, D., Carroll, R., August 1996. A simple roughness penalty approach to regression spline estimation. Tech. Rep. 1167, School of OR&IE, Cornell University.

Wahba, G., 1990. Spline Models for Observational Data. CBMS-NSF Regional Conference Series in Applied Mathematics 59. SIAM.

Wahba, G., August 2003. An introduction to reproducing kernel Hilbert spaces and why are they so useful. In: Proceedings of the 13th IFAC Symposium on System Identification (SYSID 2003). Rotterdam.
