by Peilin Shi
B.Sc., Harbin Institute of Technology, 1982
M.Sc., Harbin Institute of Technology, 1989
A Dissertation Submitted in Partial Fulfillment of the Requirements for the Degree of
DOCTOR OF PHILOSOPHY
in the Department of Mathematics and Statistics
We accept this dissertation as conforming to the required standard
Dr. J. J. Ye, Supervisor (Department of Mathematics and Statistics)
____________________________________________________________
Dr. J. Zhou, Co-Supervisor (Department of Mathematics and Statistics)
Dr. F. N. Diacu, Departmental Member (Department of Mathematics and Statistics)
Dr. N. Roy, Outside Member (Department of Economics)
Dr. D. P. Wiens, External Examiner (Department of Mathematical and Statistical Sciences, University of Alberta)
(c) Peilin Shi, 2002 University of Victoria
All rights reserved. This dissertation may not be reproduced in whole or in part, by photocopy or other means, without the permission of the author.
ABSTRACT
Minimax robust designs are studied for regression models with possibly misspecified response functions. These designs, which minimize the maximum of the mean squared error matrix, can control both the bias caused by model misspecification and the desired efficiency through a single parameter. Using nonsmooth optimization techniques, we derive the minimax designs analytically for misspecified regression models. This extends the results in Heo, Schmuland and Wiens (2001). Several examples are discussed for approximately polynomial regression.
Examiners:
Dr. J. J. Ye, Supervisor (Department of Mathematics and Statistics)
Dr. J. Zhou, Co-Supervisor (Department of Mathematics and Statistics)
Dr. F. N. Diacu, Departmental Member (Department of Mathematics and Statistics)
Dr. N. Roy, Outside Member (Department of Economics)
Dr. D. P. Wiens, External Examiner (Department of Mathematical and Statistical Sciences, University of Alberta)
Abstract
Table of Contents
List of Tables
List of Figures
Acknowledgements
1 Introduction
  1.1 Classical Optimal Designs
    1.1.1 General Concepts
    1.1.2 Examples
    1.1.3 Classical Optimal Designs
  1.2 Minimax Robust Designs
    1.2.1 The Reasons We Study Minimax Robust Designs
    1.2.2 Minimax Robust Designs
  1.3 Optimization and Nonsmooth Analysis
  1.4 Summary

2 Nonsmooth Optimization
  2.1 Examples in Nonsmooth Analysis and Optimization
  2.2 Basic Concepts of Nonsmooth Analysis
    2.2.1 Classical Derivatives and Generalized Gradients
    2.2.2 Some Basic Calculus of Generalized Gradients
    2.2.3 General Definitions and Properties about Matrix Derivatives
  2.3 The Generalized Gradient of the Largest Eigenvalue of a Matrix-Valued Function
  2.4 The Normal Cone of the Abstract Constraint Set
  2.5 Lagrange Multiplier Rule

3 Minimax Robust Designs for Misspecified Regression Models
  3.1 Minimax Designs
  3.2 Some Fundamental Calculations
  3.3 Main Analytical Results
  3.4 Examples
  3.5 Huber's Minimax Design for Simple Linear Regression

4 Existence of Solutions
  4.1 Existence of Solutions on Ω_M
  4.2 Solution in Ω_M

5 Computational Results
  5.1 The Sequential Unconstrained Minimization Technique (SUMT)
  5.2 Normalizing Density Function m
  5.3 Two Methods for Solving Unconstrained Nonlinear Programming
    5.3.1 Cyclic Coordinate Method
    5.3.2 BFGS Method
  5.4 Algorithms
  5.5 Computational Results for z^T(x) = (1, x)
  5.6 Computational Results for z^T(x) = (x, x^2)
  5.7 Computational Results for z^T(x) = (1, x, x^2)
List of Tables

3.1 Exact coefficient values of the density function m(x) = (ax^2 + b)^+.
5.1 Huber's exact coefficient values ([H]) of the density function m(x) = (ax^2 + b)^+, and numerical coefficient values ([F], [C]) of the density function m(x) = … with UBSFM, where [F] are the results using the fixed starting point (1, 1, 1) and [C] those with starting points close to [H].
5.2 Some coefficient values of the density function m(x) = … for the loss function L_D with UBSFM.
5.3 Some coefficient values of the density function m(x) = … for the loss function L_Q with UBSFM.
5.4 Heo, Schmuland and Wiens' exact coefficient values ([H]) of the density function m(x) = … and our numerical coefficient values ([O]) of the density function m(x) = … with UBSFM, where KR denotes the kinds of results.
5.5 … for the loss function L_Q with PBSFM.
5.6 Numerical values of the eigenvalues ρ₀, ρ₁, ρ₂ for various ν at m(x) = … for the loss function L_Q with PBSFM.
5.7 Numerical values of the eigenvalues ρ₀, ρ₁ for various ν at [H]: m(x) = (a₀x^2 + b₀)^+ and [O]: m(x) = … for the loss function L_Q with UBSFM, where [H] denotes Huber's results and [O] denotes our results in the column of the kinds of results ([KR]).
5.8 Numerical coefficient values of the density function m(x) = … for the loss function L_A with PBSFM.
5.9 Numerical coefficient values of the density function m(x) = … for the loss function L_D with PBSFM.
List of Figures

2.1 Two lines fitted to six data points (indicated by dots).
2.2 f_Q(x) = max{1 − x, 1 + x} on [−1, 1] (solid line).
2.3 The function y = |x| and its generalized gradient [−1, 1] at x = 0.
5.1 Q-optimal minimax densities m(x) = (ax^2 + b)^+ ([H]: solid lines) and m(x) = … with UBSFM ([F]: dotted lines) for approximately linear regression: (a) ν = 0.1; (b) ν = 1.
5.2 Q-optimal minimax densities m(x) = (ax^2 + b)^+ ([H]: solid lines) and m(x) = … with UBSFM ([F]: dotted lines) for approximately linear regression: (c) ν = 6.48; (d) ν = 10.
5.3 Q-optimal minimax densities m(x) = (ax^2 + b)^+ ([H]: solid lines) and m(x) = … with PBSFM ([F]: dotted lines) for approximately linear regression: (e) ν = 100; (f) ν = 1000.
5.4 Q-optimal minimax densities m(x) = … ([H]: solid lines, Heo, Schmuland and Wiens' results) and m(x) = … with UBSFM ([O]: dotted lines, our results) for approximately linear regression: (a) ν = 1.000; (b) ν = 10.00.
5.5 Q-optimal minimax densities m(x) = … ([H]: solid lines, Heo, Schmuland and Wiens' results) and m(x) = … with UBSFM ([O]: dotted lines, our results) for approximately linear regression: (c) ν = 100.0; (d) ν = 1000.
5.6 Q-optimal minimax densities m(x) = … with PBSFM (solid lines) and m(x) = a(x^4 + β₁x^2 + β₂)^+ (dotted lines), for the quadratic model: (a) ν = 0.1; (b) ν = 1.0.
5.7 Q-optimal minimax densities m(x) = … with PBSFM, for the quadratic model: (c) ν = 5.0; (d) ν = 10.0.
5.8 Q-optimal minimax densities m(x) = … with PBSFM, for the quadratic model: (e) ν = 20.0; (f) ν = 50.0.
5.9 Q-optimal minimax densities m(x) = … with PBSFM (solid lines) and m(x) = a(x^4 + β₁x^2 + β₂)^+ (dotted lines), for the quadratic model: (g) ν = 75.0; (h) ν = 100.0.
5.10 A-optimal minimax densities m(x) = …: (a) ν = 0.0150; (b) ν = 0.0165.
5.11 A-optimal minimax densities m(x) = …: (c) ν = 0.0175; (d) ν = 0.0200.
5.12 A-optimal minimax densities m(x) = …: (e) ν = 0.0300; (f) ν = 0.0500.
5.13 A-optimal minimax densities m(x) = …: (g) ν = 0.0800; (h) ν = 1.0000.
5.14 A-optimal minimax densities m(x) = …: (i) ν = 10.000; (j) ν = 20.000.
5.15 D-optimal minimax densities m(x) = …: (a) ν = 0.1; (b) ν = 1.
5.16 D-optimal minimax densities m(x) = …: (c) ν = 10; (d) ν = 20.
5.17 D-optimal minimax densities m(x) = …: (e) ν = 50; (f) ν = 100.
5.18 Q-optimal minimax densities m(x) = … with UBSFM (solid lines) and PBSFM (dotted lines): (a) ν = 5.0; (b) ν = 10.0.
Acknowledgements

I am deeply indebted to my supervisor, Dr. J. J. Ye, for introducing me to nonsmooth analysis and optimization, and for her countless efforts on my behalf throughout my graduate studies. I would also like to express my sincere gratitude to my co-supervisor, Dr. J. Zhou, for her tremendous help in the area of minimax robust design. This dissertation would not have been completed without their help and incredible patience.

I especially wish to thank the members of my examining committee, Dr. F. N. Diacu, Dr. N. Roy and Dr. D. P. Wiens, for their careful consideration and helpful suggestions.

No words can describe my eternal gratitude to my parents and my parents-in-law. They are the pinnacle of parenthood, who have given my family everything without ever asking anything in return.

Finally, I would like to thank my wonderful wife, Xiaoqi Sun, and my lovely daughter, Shirly Xiaomeng Shi, for their love, support and encouragement.
Introduction

1.1 Classical Optimal Designs

1.1.1 General Concepts
Scientists and engineers often need to assess the effects of certain controllable conditions on the experiments they conduct. To obtain the most information about these effects, the experiment under consideration must be designed properly. A set of possible conditions is called the factor space or design space. In the factor space, a specification of the number of observations to be made is called a design (Arthanari and Dodge, 1981).

Before choosing a design, we need to have some information about the relationship between the controllable conditions and the outcome. The outcome is usually called the response. The response variable can be related to the controllables by some functional relationship, which is assumed to hold. When random errors are incorporated into this relationship, we have a statistical model.
The choice of both the model and the design influences the conclusions drawn from the experiment. Thus problems of optimally choosing the model and the design are important. Also, methods of analyzing the data are required for estimating the unknown parameters in the model. In this dissertation we focus on the choice of designs such that accurate estimates can be obtained. In particular, we apply techniques of mathematical programming to solve the related optimization problems in regression design. The following examples give a brief introduction to regression designs.
1.1.2 Examples

Example 1.1
We assume that the controllable variable x and the response variable y are related according to the simple linear regression model. That is, there exist parameters θ₀, θ₁ and σ² such that for any fixed value of the independent variable x, the response variable y is related to x through the model equation

    y = \theta_0 + \theta_1 x + \varepsilon.

The quantity ε in the model equation is a random error variable with mean 0 and variance σ². Suppose n observations (x₁, y₁), …, (xₙ, yₙ) are made during the experiment, from which the model parameters and the true regression line itself can be estimated. These observations are assumed to have been obtained independently of one another. That is, y_i is the observed value of a random variable Y_i, where Y_i = θ₀ + θ₁x_i + ε_i and the errors ε₁, …, εₙ are independent random variables. Independence of Y₁, …, Yₙ follows from the independence of the ε_i's.
The least squares estimates (LSE) of θ₀ and θ₁, denoted by θ̂₀ and θ̂₁, are given by

    \hat\theta_0 = \bar y - \hat\theta_1 \bar x, \qquad
    \hat\theta_1 = \frac{\sum_{i=1}^{n} (x_i - \bar x)(y_i - \bar y)}{\sum_{i=1}^{n} (x_i - \bar x)^2},

which minimize the function

    LSE(\theta_0, \theta_1) = \sum_{i=1}^{n} \left[ y_i - (\theta_0 + \theta_1 x_i) \right]^2.
One of the classical regression design problems is to choose the x_i's to find the "best" estimate θ̂ = (θ̂₀, θ̂₁)ᵀ. "Best" can be interpreted through many different measures. If the estimates are unbiased, one such measure is the integrated mean squared error (IMSE) (let S = [−1, 1] here):

    IMSE = \frac{1}{2} \int_{-1}^{1} E(\hat\theta_0 + \hat\theta_1 x - \theta_0 - \theta_1 x)^2 \, dx
         = \frac{1}{2} \int_{-1}^{1} \left[ \frac{\sigma^2}{n}
           + \frac{(\bar x^2 + x^2)\sigma^2 - 2 x \bar x \, \sigma^2}{\sum_{i=1}^{n} (x_i - \bar x)^2} \right] dx
         = \sigma^2 \left[ \frac{1}{n} + \frac{\tfrac{1}{3} + \bar x^2}{\sum_{i=1}^{n} (x_i - \bar x)^2} \right].

To minimize the IMSE over the x_i's, the x_i's ought to be spread out as much as possible, for instance half of them at the lowest x-value and the other half at the highest x-value. This example demonstrates that we can improve the quality of our estimates by planning the locations of the x-values rather than just choosing them haphazardly. ■
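To make the point concrete, the IMSE formula derived above can be evaluated for candidate designs. The sketch below (an illustration, not code from the dissertation; the two designs are arbitrary examples, taking S = [−1, 1]) compares a design with half the points at each endpoint against an equally spaced one.

```python
# A minimal sketch: IMSE = sigma^2 * (1/n + (1/3 + xbar^2) / Sxx) on S = [-1, 1],
# evaluated for two candidate designs of ten points each.

def imse(xs, sigma2=1.0):
    """IMSE of the fitted least squares line for design points xs on [-1, 1]."""
    n = len(xs)
    xbar = sum(xs) / n
    sxx = sum((x - xbar) ** 2 for x in xs)          # sum of (x_i - xbar)^2
    return sigma2 * (1.0 / n + (1.0 / 3.0 + xbar ** 2) / sxx)

endpoints = [-1.0] * 5 + [1.0] * 5                   # half at each extreme
equispaced = [-1.0 + 2.0 * i / 9 for i in range(10)] # evenly spread

print(imse(endpoints))   # smaller: points spread to the extremes
print(imse(equispaced))  # larger
```

Spreading the points to the extremes maximizes Σ(x_i − x̄)², which is exactly the quantity in the denominator of the IMSE.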
Example 1.2

Consider a multiple regression model, which can be written in matrix notation as y = Xθ + ε, where

    y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}, \quad
    X = \begin{pmatrix} 1 & x_{11} & \cdots & x_{1,p-1} \\ 1 & x_{21} & \cdots & x_{2,p-1} \\ \vdots & \vdots & & \vdots \\ 1 & x_{n1} & \cdots & x_{n,p-1} \end{pmatrix}, \quad
    \theta = \begin{pmatrix} \theta_0 \\ \theta_1 \\ \vdots \\ \theta_{p-1} \end{pmatrix}, \quad
    \varepsilon = \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{pmatrix},

and the errors ε_i are uncorrelated with mean 0 and variance σ². The least squares estimator of θ is

    \hat\theta = (X^T X)^{-1} X^T y,

with the covariance matrix

    Cov(\hat\theta) = E\left\{ [\hat\theta - E(\hat\theta)][\hat\theta - E(\hat\theta)]^T \right\} = \sigma^2 (X^T X)^{-1}.
The classical regression problem is to choose the design points x_i, i = 1, …, n, in some optimal manner, which is equivalent to choosing the discrete measure ξ on S, where

    \xi(x) := \frac{1}{n} \sum_{i=1}^{n} \delta_{x_i}(x) \qquad (1.1)

and δ_{x_i} denotes the indicator function of x_i, i.e.,

    \delta_{x_i}(x) := \begin{cases} 1 & \text{if } x = x_i, \\ 0 & \text{otherwise.} \end{cases}

An optimal design can be considered to minimize some scalar function of the covariance matrix, 𝓛(Cov(θ̂)). Examples of such functions include 𝓛(·) = trace(·) and 𝓛(·) = det(·), where trace(A) and det(A) denote the trace and the determinant of a matrix A, respectively. ■
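The closed forms θ̂ = (XᵀX)⁻¹Xᵀy and Cov(θ̂) = σ²(XᵀX)⁻¹ are easy to check directly in the two-parameter case. The following sketch (an illustration only, not from the dissertation) writes the 2 × 2 inverse out by hand for an intercept-and-slope model.

```python
# A minimal sketch of the least squares estimator and its covariance for
# y = theta_0 + theta_1 x + eps, using theta_hat = (X^T X)^{-1} X^T y.

def fit_line(xs, ys, sigma2=1.0):
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    det = n * sxx - sx * sx                      # det(X^T X)
    # (X^T X)^{-1}, written out for the 2x2 case
    inv = [[sxx / det, -sx / det], [-sx / det, n / det]]
    theta0 = inv[0][0] * sy + inv[0][1] * sxy    # intercept estimate
    theta1 = inv[1][0] * sy + inv[1][1] * sxy    # slope estimate
    cov = [[sigma2 * inv[i][j] for j in range(2)] for i in range(2)]
    return (theta0, theta1), cov

# Data lying exactly on y = 1 + 2x, so the fit recovers the line exactly.
(theta0, theta1), cov = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(theta0, theta1)
```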
In general, we can consider a linear regression model with regressors z₁(x), …, z_p(x). In this case, for design points x_i, we define

    Z = \begin{pmatrix} z^T(x_1) \\ z^T(x_2) \\ \vdots \\ z^T(x_n) \end{pmatrix}
      = \begin{pmatrix} z_1(x_1) & z_2(x_1) & \cdots & z_p(x_1) \\ z_1(x_2) & z_2(x_2) & \cdots & z_p(x_2) \\ \vdots & \vdots & & \vdots \\ z_1(x_n) & z_2(x_n) & \cdots & z_p(x_n) \end{pmatrix};

thus,

    \mathcal{L}\big(Cov(\hat\theta)\big) = \mathcal{L}\big(\sigma^2 (Z^T Z)^{-1}\big)
      = \mathcal{L}\left( \frac{\sigma^2}{n} \left( \int_S z(x) z^T(x) \, \xi(dx) \right)^{-1} \right). \qquad (1.2)

1.1.3 Classical Optimal Designs
The classical optimal design problem with multiple variables includes the following three parts:

P₁: Specifying a Regression Model:

    Y = z^T(x)\theta + \varepsilon, \quad z(x) \in R^p, \quad \theta \in R^p, \quad x \in S \subset R^q, \qquad (1.3)

where S is a given design space and z(x) = (z₁(x), z₂(x), …, z_p(x))ᵀ is a given function of x. For instance, if zᵀ(x) = (1, xᵀ), then (1.3) is the usual multiple linear regression model; if x ∈ R and zᵀ(x) = (1, x, …, x^{p−1}), then (1.3) is a polynomial regression model.

P₂: Making Model Assumptions:

    A₁: the fitted model is exactly correct;        (1.4)
    A₂: the errors ε_i are uncorrelated.

P₃: Estimating θ and Choosing an Optimality Criterion:

    \min \; \mathcal{L}\big(Cov(\hat\theta)\big) \quad \text{s.t.} \quad x_i \in S, \; i = 1, \ldots, n,

where θ̂ is an estimate (not necessarily the least squares estimate) in (1.3) and 𝓛 is a loss function, such as the trace, the determinant, the largest eigenvalue or the integration of the covariance matrix, which give the A-, D-, E- and Q-optimality criteria, respectively.

Such design problems have been studied extensively in the literature, and many optimal designs have been obtained for various linear models and loss functions; see Fedorov (1972) and Pukelsheim (1993).
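The loss functions named above are straightforward to evaluate for a small covariance matrix. The sketch below (an assumed illustration; the matrix C is a placeholder, not a design from the text) computes the A-criterion (trace), the D-criterion (determinant) and the E-criterion (largest eigenvalue) for a symmetric 2 × 2 matrix.

```python
# A minimal sketch of the A-, D- and E-optimality criteria applied to a
# symmetric 2x2 covariance matrix C.

import math

def a_crit(c):
    return c[0][0] + c[1][1]                      # trace

def d_crit(c):
    return c[0][0] * c[1][1] - c[0][1] * c[1][0]  # determinant

def e_crit(c):
    # largest eigenvalue of a symmetric 2x2 matrix, in closed form
    tr, det = a_crit(c), d_crit(c)
    return tr / 2 + math.sqrt(tr * tr / 4 - det)

C = [[0.7, -0.3], [-0.3, 0.2]]  # placeholder covariance matrix
print(a_crit(C), d_crit(C), e_crit(C))
```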
1.2 Minimax Robust Designs

1.2.1 The Reasons We Study Minimax Robust Designs
Classical optimal design theory focuses on the attainment of some form of minimum variance property, assuming the fitted model to be exactly correct and the errors to be uncorrelated. Under violations of these model assumptions, the classical optimal designs do not perform well and yield poor estimates.
Case 1: The model assumption A₁ is violated.

This means that model (1.3) is not exactly correct, so the least squares estimator is biased. Box and Draper (1959) found that very small departures from A₁ can eliminate any supposed gains arising from the use of a design which minimizes variance alone; see Wiens (1992). Huber (1981) also stated that "deviations from linearity that are too small to be detected are already large enough to tip the balance away from the 'optimal' designs, which assume exact linearity and put observations on the extreme points of the observable range, toward the 'naive' ones which distribute the observations more or less evenly over the entire design space."

Furthermore, with the classical optimal designs, the model adequacy test cannot be carried out, since the observations are only on the extreme points of the design space, and no information is available in the interior of S.
Case 2: The model assumption A₂ is violated.

In this case the errors are correlated, so the model departs from the assumption of uncorrelated errors. The exact correlation structure of the errors is usually unknown; thus Cov(θ̂) is unknown, and the classical optimal designs derived under A₂ may no longer perform well.
1.2.2 Minimax Robust Designs
As stated in the last subsection, there is a need to study optimal designs under possible small violations of model assumptions A₁ and/or A₂. These designs are called robust designs. In general, robust designs are those which are not sensitive to small departures from model assumptions. The minimax robust design is one kind of robust design. More discussion of the need to study robust designs can be found in Wiens (1990, 1992, 1994) and Heo, Schmuland and Wiens (1999, 2001).
In practice, the relationship between a response variable y and independent variables x is only approximately modelled. This often results in the violation of A₁. The violation of A₂ can be caused by, for example, serial or spatial correlation or repeated measures. With respect to many kinds of model violations (departures), various robust designs can be studied. In this dissertation, we focus on the violation of A₁, and construct the corresponding robust designs.
As we mentioned above, model (1.3) is usually an approximation of a true (unknown) model, say,

    E(Y \mid x) = z^T(x)\theta + f(x), \qquad (1.5)

where the function f is unknown but "small". Due to the unknown nature of f, estimation must be based on the observations {(Y_i, x_i)}_{i=1}^n. The least squares estimates θ̂ of θ and Ŷ = zᵀ(x)θ̂ of E[Y|x] are possibly biased because of the misspecified response. In this situation, regression designs can play an important role in choosing optimal design points x ∈ S that yield estimates θ̂ and Ŷ which remain relatively efficient while suffering as little as possible from the bias engendered by the model misspecification.
Robust regression designs have been studied for numerous response functions z(x) and various classes of functions f. Examples are Box and Draper (1959), Stigler (1971), Sacks and Ylvisaker (1984), Wiens (1990, 1992), Dette and Wong (1996), and Liu and Wiens (1997). Recently, Heo, Schmuland and Wiens (2001) investigated minimax designs when the function f belongs to a certain class of functions (defined in (3.3)). Their study indicates that the associated minimax problems are difficult to solve analytically, even if zᵀ(x) = (1, x, x²), a quadratic regression model. This leads to the investigation in this dissertation. Using nonsmooth optimization theory, we are able to find the solution for the minimax problems proposed in Heo, Schmuland and Wiens (2001).
1.3 Optimization and Nonsmooth Analysis
The origins of analytic optimization lie in the classical calculus of variations and are intertwined with the development of the calculus. For this reason, the subject long retained the smoothness (in particular, differentiability) hypotheses that were made at its inception. Attempts to weaken these smoothness requirements have a long history. In practical applications of optimization, we often encounter situations where the objective function to be minimized or maximized and/or the constraint functions are not necessarily differentiable. Nonsmooth analysis, which refers to differential analysis in the absence of differentiability, is without a doubt a very important tool for such problems. It is only in the last two decades that the subject has grown rapidly and come to play a role in functional analysis, optimization, differential equations, control theory and, increasingly, in analysis generally.
1.4 Summary
In this dissertation, we continue the study in Heo, Schmuland and Wiens (2001) to derive minimax designs. Using the Lagrange multiplier rule of nonsmooth optimization theory, we obtain minimax designs analytically for the misspecified regression model (1.5). The density of the minimax design has the analytical form

    m(x) = \left( \frac{z^T(x) B z(x)}{z^T(x) D z(x)} - d \right)^+

almost everywhere on S = [−1/2, 1/2], for suitable matrices B and D and a constant d, where (c)⁺ := max(0, c). This extends the designs with zᵀ(x) = (1, x) in Huber (1981) and zᵀ(x) = (x, x²) in Heo, Schmuland and Wiens (1999).
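A density of this truncated-ratio type is easy to evaluate pointwise once B, D and d are available. The sketch below is hedged: the matrices B and D and the constant d are arbitrary placeholders chosen only to illustrate the (c)⁺ = max(0, c) truncation, and are not the dissertation's actual solution; zᵀ(x) = (1, x) is assumed.

```python
# A hedged sketch: evaluating m(x) = ( z^T B z / z^T D z - d )^+ for z = (1, x).
# B, D and d are placeholders, NOT the minimax-design solution.

def quad_form(m, z):
    """z^T M z for a 2x2 matrix M and a length-2 vector z."""
    return sum(z[i] * m[i][j] * z[j] for i in range(2) for j in range(2))

def density(x, B, D, d):
    z = (1.0, x)
    return max(0.0, quad_form(B, z) / quad_form(D, z) - d)

B = [[1.0, 0.0], [0.0, 3.0]]   # placeholder
D = [[1.0, 0.0], [0.0, 1.0]]   # placeholder (identity)
d = 0.5
print(density(0.0, B, D, d))   # smaller near the centre
print(density(1.0, B, D, d))   # larger away from the centre
```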
In Chapter 2, we give a brief review of nonsmooth optimization theory and some results, including the definition and properties of the generalized gradient, the generalized gradient of the largest eigenvalue of a matrix-valued function, properties of the abstract constraint set, the nonsmooth Lagrange multiplier rule, and some examples. In Chapter 3 we introduce the minimax design problems and study the related nonsmooth optimization problems. Analytic solutions are derived and illustrated through examples. In Chapter 4, we prove the existence of solutions with the restriction that the density is bounded pointwise by a positive constant. In Chapter 5, computational techniques for determining minimax density functions are addressed, and detailed algorithms are presented. In particular, the sequential unconstrained minimization technique, the cyclic coordinate method and the BFGS algorithm (Broyden, 1970a, 1970b; Fletcher, 1970; Goldfarb, 1970; Shanno and Kettler, 1970) are applied to compute the minimax density functions. Several examples are given and compared with the existing designs. In Chapter 6, we summarize our results and point out possible directions for further research.
Nonsmooth Optimization

This chapter contains the preliminaries and results on nonsmooth optimization that will be used later in this dissertation.

2.1 Examples in Nonsmooth Analysis and Optimization
It is interesting that in Clarke's well-known book on nonsmooth analysis, the following example on linear regression was given as the first example (Clarke, 1983) to illustrate the need for nonsmooth analysis and optimization.
Example 2.1

"This first example is familiar to anyone who has had to prepare a laboratory report for a physics or chemistry class. Suppose that a set of observed data points (x₀, y₀), …, (x_N, y_N) in the x-y plane is given, and consider the problem of determining the straight line in the x-y plane that best fits the data. Assuming that the given data points do not all lie on a certain line (any lab instructor would be suspicious if they did), the notion of "best" must be defined, and any choice is arbitrary. For a given line y = kx + b, the error e_i at the ith data point (x_i, y_i) is defined to be |kx_i + b − y_i|. A common definition of the best approximating line requires that the slope k and the intercept b minimize the total squared error Σ_{i=0}^{N} e_i² over all k and b. On the face of it, it seems at least as natural to ask instead that the total error Σ_{i=0}^{N} e_i be minimized. The characteristics of the resulting solution certainly differ. In Figure 2.1, for example, the solid line represents the 'least total error' solution, and the dashed line represents the 'least total square error' solution. Note that the former ignores the anomalous data point, which presumably corresponds to a gross measurement error. The least squares solution, in contrast, is greatly affected by that point. One or the other of these solutions may be preferable; the point we wish to make is that the function Σ_{i=0}^{N} e_i is nondifferentiable as a function of k and b. Thus the usual methods for minimizing differentiable functions would be inapplicable to this function, and different methods would have to be used. Of course, the reason that the least squares definition is the common one is that it leads to the minimization of a smooth function of k and b." ■
Example 2.2

A function which arises naturally as a criterion in engineering and statistical design problems is

    f_A(x) = \text{the largest eigenvalue of } A(x),

where A(x) = (a_{ij}(x)) is a symmetric matrix-valued function. The function is not differentiable in general even if all a_{ij}(x) are continuously
differentiable functions of x. For example, let

    M(x) = \begin{pmatrix} 2 + x^2 & 0 \\ 0 & 1 \end{pmatrix}
    \quad \text{and} \quad
    Q(x) = \begin{pmatrix} 1 & x \\ x & 1 \end{pmatrix}

for x ∈ [−1, 1]. Then

    \min_{x \in [-1, 1]} f_M(x)

is a smooth optimization problem, since the function f_M(x) = 2 + x² is continuously differentiable on [−1, 1]. But

    \min_{x \in [-1, 1]} f_Q(x)

is a nonsmooth optimization problem, since the function

    f_Q(x) = \max\{1 - x, \; 1 + x\} = 1 + |x|

is nondifferentiable at x = 0 (Figure 2.2). We shall return to this example in Section 2.3.

Figure 2.1: Two lines fitted to six data points (indicated by dots).
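The kink in a largest-eigenvalue function can be observed numerically. The sketch below is an assumed illustration in the spirit of Example 2.2 (the matrix A(x) = [[1, x], [x, 1]] is our choice, not necessarily the one in the text): its entries are smooth, its largest eigenvalue is 1 + |x|, and the one-sided difference quotients at x = 0 disagree.

```python
# A sketch (assumed example): the largest eigenvalue of the smooth symmetric
# matrix A(x) = [[1, x], [x, 1]] is 1 + |x|, nondifferentiable at x = 0.

import math

def largest_eig(x):
    # eigenvalues of [[1, x], [x, 1]] are 1 + x and 1 - x;
    # closed form via trace/determinant of a symmetric 2x2 matrix
    tr, det = 2.0, 1.0 - x * x
    return tr / 2 + math.sqrt(tr * tr / 4 - det)   # equals 1 + |x|

h = 1e-6
right = (largest_eig(0 + h) - largest_eig(0)) / h   # one-sided slope from the right
left = (largest_eig(0) - largest_eig(0 - h)) / h    # one-sided slope from the left
print(right, left)                                  # +1 and -1: a kink at x = 0
```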
2.2 Basic Concepts of Nonsmooth Analysis

Throughout this section, X is a real Banach space.

2.2.1 Classical Derivatives and Generalized Gradients

Before defining the generalized gradient, we review some facts about classical derivatives.
Definition 2.3 [Gâteaux Derivative and Fréchet Derivative]

Let f : X → R. The usual (one-sided) directional derivative of f at x in the direction v is

    f'(x; v) := \lim_{t \to 0^+} \frac{f(x + tv) - f(x)}{t}, \qquad (2.1)

when this limit exists. We say that f is Gâteaux differentiable at x provided the limit in (2.1) exists for all v ∈ X, and there exists an element of the space X* of continuous linear functionals on X, denoted f'(x), that satisfies

    f'(x; v) = \langle f'(x), v \rangle. \qquad (2.2)

If (2.2) holds at a point x, and the convergence in (2.1) is uniform with respect to v in bounded subsets of X, we say that f is Fréchet differentiable at x, and call f'(x) the Fréchet derivative at x.

We say that f is continuously differentiable (C¹) at x provided that on a neighborhood of x the Gâteaux derivative exists and is continuous as a mapping from X into X*.
From Examples 2.1 and 2.2, we see that the above classical derivatives are not useful where nonsmoothness arises. The generalized gradient, introduced by F. Clarke in 1973, is a replacement for the classical derivatives when a function is not smooth. It is defined for the class of functions called Lipschitz continuous functions, which includes the functions encountered in Examples 2.1 and 2.2.

Definition 2.4 [Lipschitz Condition]

Let C be a nonempty subset of a normed linear space X. A function f : C → R is said to be Lipschitz (of rank L) on C if for some nonnegative scalar L one has

    |f(y_1) - f(y_2)| \le L \|y_1 - y_2\|, \qquad \forall y_1, y_2 \in C.

We say that f is Lipschitz of rank L near x if C = x + δB for some δ > 0, where B signifies the open unit ball.

It follows from the mean value theorem that if a function f is continuously differentiable at x then it is Lipschitz near x. Other well-known Lipschitz continuous functions include the class of all convex functions bounded above.
We are now ready to define the generalized directional derivative and the generalized gradient.

Definition 2.5 [Generalized Directional Derivative]

Let f : X → R be Lipschitz near x. For any vector v in X, the (Clarke) generalized directional derivative of f at x in the direction v is defined by

    f^\circ(x; v) := \limsup_{y \to x, \; t \to 0^+} \frac{f(y + tv) - f(y)}{t}.

Note that, unlike the usual directional derivative (2.1), the generalized directional derivative involves only the upper limit of the difference quotient, which is bounded above by L‖v‖ in light of the Lipschitz condition.
Definition 2.6 [Generalized Gradient] (p. 27, Clarke, 1983)

Let f : X → R be Lipschitz near a given point x ∈ X. The generalized gradient of f at x, denoted ∂f(x), is the subset of X* given by

    \partial f(x) = \{ \zeta \in X^* : f^\circ(x; v) \ge \langle \zeta, v \rangle \text{ for all } v \in X \}.

Proposition 2.7 (Proposition 2.2.4, Clarke, 1983)

If f is continuously differentiable at x, then ∂f(x) = {f'(x)}.
Proposition 2.8 (Proposition 2.2.7, Clarke, 1983)

When f : X → R is convex and Lipschitz near x, ∂f(x) coincides with the subdifferential at x in the sense of convex analysis, i.e.,

    \partial f(x) = \{ \zeta \in X^* : f(y) - f(x) \ge \langle \zeta, y - x \rangle \text{ for all } y \in X \},

and f°(x; v) coincides with the directional derivative, i.e.,

    f^\circ(x; v) = f'(x; v) := \lim_{t \to 0^+} \frac{f(x + tv) - f(x)}{t}.
Example 2.9

Let X = R and f(x) = |x|. Then f is Lipschitz of rank 1. Since f(x) = x if x > 0 and f(x) = −x if x < 0, by Proposition 2.7 we have ∂f(x) = {1} if x > 0 and ∂f(x) = {−1} if x < 0. For the case x = 0, we find

    f^\circ(0; v) = \limsup_{y \to 0, \; t \to 0^+} \frac{|y + tv| - |y|}{t} = |v|.

This means that ∂f(0) consists of those ζ that satisfy f°(0; v) = |v| ≥ ζv for all v, which yields ∂f(0) = [−1, 1] (Figure 2.3). Finally, we have

    \partial f(x) = \begin{cases} \{1\} & \text{if } x > 0 \\ [-1, 1] & \text{if } x = 0 \\ \{-1\} & \text{if } x < 0. \end{cases} \quad ■
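The value f°(0; v) = |v| found above can be mimicked numerically by maximizing the difference quotient over small y and t > 0. The sketch below is only an illustration (the sampling grid is an arbitrary choice), but it reproduces |v|.

```python
# A sketch: estimating the generalized directional derivative f°(0; v) of
# f(x) = |x| as the largest sampled difference quotient (|y + t v| - |y|) / t.

def f_circ(v, n=200):
    best = float("-inf")
    for i in range(-n, n + 1):
        y = i * 1e-4                      # y ranging over a small neighborhood of 0
        for t in (1e-3, 1e-4, 1e-5):      # a few small step sizes t > 0
            q = (abs(y + t * v) - abs(y)) / t
            best = max(best, q)
    return best

print(f_circ(1.0))    # approaches |1| = 1
print(f_circ(-2.0))   # approaches |-2| = 2
```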
2.2.2 Some Basic Calculus of Generalized Gradients

We now gather some formulas that facilitate the calculation of ∂f when f is synthesized from simpler functionals through linear combinations, composition, and so on.

Definition 2.10 (Definition 2.3.4, Clarke, 1983)

f is said to be regular at x provided

(i) for all v, the usual one-sided derivative f'(x; v) (see (2.1)) exists;

(ii) for all v, f'(x; v) = f°(x; v).

Figure 2.3: The function y = |x| and its generalized gradient [−1, 1] at x = 0.
Example 2.11 (Proposition 2.3.6, Clarke, 1983)

All convex functions and all C¹ functions are regular. ■

Proposition 2.12 [Linear Combination Rules] (Corollaries 2 and 3 of Proposition 2.3.3, Clarke, 1983)

Let f_i : X → R (i = 1, …, n) be Lipschitz near x, and let λ_i (i = 1, …, n) be scalars. Then f := Σ_{i=1}^{n} λ_i f_i is Lipschitz near x, and one has

    \partial f(x) \subset \sum_{i=1}^{n} \lambda_i \, \partial f_i(x),

with equality if each f_i is regular at x and each λ_i is nonnegative.
Proposition 2.13 [Product and Quotient Rules] (Propositions 2.3.13 and 2.3.14, Clarke, 1983)

Let f and g : X → R be Lipschitz near x. Then the product fg is Lipschitz near x and one has

    \partial (fg)(x) \subset f(x) \, \partial g(x) + g(x) \, \partial f(x).

If in addition f(x) ≥ 0, g(x) ≥ 0 and f, g are both regular at x, then equality holds and fg is regular at x. Moreover, suppose g(x) ≠ 0; then the quotient f/g is Lipschitz near x and one has

    \partial \left( \frac{f}{g} \right)(x) \subset \frac{g(x) \, \partial f(x) - f(x) \, \partial g(x)}{g^2(x)}.

If in addition f(x) ≥ 0, g(x) > 0 and f and −g are both regular at x, then equality holds and f/g is regular at x.

Proposition 2.14 [Chain Rule] (Theorem 2.3.10, Clarke, 1983)

Let F : X → Rᵐ be continuously differentiable near x, and let g : Rᵐ → R be Lipschitz near F(x). Then the function G := g ∘ F is Lipschitz near x, and one has

    \partial G(x) \subset \partial g(F(x)) \circ F'(x) = F'(x)^T \partial g(F(x)).

Equality holds if g (or −g) is regular at F(x), in which case G (or −G) is also regular at x.
Proposition 2.15 [Extrema] (Proposition 2.3.2 of Clarke (1983) and Proposition 2.8)

If f attains a local minimum or maximum at x, then x is a stationary point of f, i.e., 0 ∈ ∂f(x). Moreover, if f is a convex function, then any stationary point must be a minimum of f.
Example 2.16

Consider G(k, b) = |2k + b − 2|. Let F(k, b) = 2k + b − 2 and g(y) = |y|. Then G(k, b) = g(F(k, b)). Here F'(k, b) = (2, 1)ᵀ, and

    \partial g(y) = \begin{cases} \{1\} & \text{if } y > 0 \\ [-1, 1] & \text{if } y = 0 \\ \{-1\} & \text{if } y < 0, \end{cases}

by Example 2.9. Since g is convex, hence regular, it follows from Proposition 2.14 that

    \partial G(k, b) = \begin{cases}
      \left\{ (2, 1)^T \right\} & \text{if } 2k + b - 2 > 0 \\
      \left\{ \lambda (2, 1)^T : -1 \le \lambda \le 1 \right\} & \text{if } 2k + b - 2 = 0 \\
      \left\{ -(2, 1)^T \right\} & \text{if } 2k + b - 2 < 0. \end{cases} \quad ■

Example 2.17
Now armed with some of the propositions given above, we can prove that k = 1 and b = 0 give the "least total error" solution for Example 2.1. For (x_i, y_i) = (i, i), i = 0, 1, 2, 3, 4, and (x_5, y_5) = (5, 0), the problem

    \min_{k, b} \sum_{i=0}^{5} |k x_i + b - y_i|

becomes

    \min_{k, b} G(k, b),

where G(k, b) = |5k + b| + Σ_{i=0}^{4} |ik + b − i|. By Propositions 2.12, 2.15 and Example 2.16, (k, b) is a solution to the problem if and only if

    0 \in \partial G(k, b) = \partial |5k + b| + \sum_{i=0}^{4} \partial |ik + b - i|,

which gives

    \lambda_5 \binom{5}{1} + \sum_{i=0}^{4} \lambda_i \binom{i}{1} = \binom{0}{0}, \qquad (2.3)

or in extended form,

    \lambda_1 + 2\lambda_2 + 3\lambda_3 + 4\lambda_4 + 5\lambda_5 = 0 \qquad (2.4)

and

    \lambda_0 + \lambda_1 + \lambda_2 + \lambda_3 + \lambda_4 + \lambda_5 = 0, \qquad (2.5)

where

    \lambda_5 = \begin{cases} 1 & \text{if } 5k + b > 0 \\ \lambda \in [-1, 1] & \text{if } 5k + b = 0 \\ -1 & \text{if } 5k + b < 0, \end{cases}
    \qquad
    \lambda_i = \begin{cases} 1 & \text{if } ik + b - i > 0 \\ \lambda \in [-1, 1] & \text{if } ik + b - i = 0 \\ -1 & \text{if } ik + b - i < 0 \end{cases}

for i = 0, 1, 2, 3, 4. It is easy to see that taking λ_i = −1 or 1 for all i = 0, 1, 2, 3, 4, 5 does not solve equation (2.4). Therefore, at least one of the equations ik + b − i = 0 (i = 0, 1, 2, 3, 4) or 5k + b = 0 must hold.
In the first case, ik + b — i = 0 for z = 0 implies th at 6 = 0 and hence for
i = 1,2,3 ,4 ,
/ 1 if t > 1 r 1 i( k > I \ - 1 if fc < 1, \ A e [-1 , 1] if 1.
Substituting Aj and Ag into (2.5) and (2.4), we have Ao = —5 for A: > 1 and
As = 2 for A: < 1, which are both impossible. It is obvious th a t when 6 = 0
and A: = 1, we can solve (2.3), th at is, there are Aj € [—1, 1] for i = 0,1 ,2 ,3 ,4
such th a t As = 1, and Aj satisfy (2.3), for example, (Ao, A%, Ag, A3, A^, As) =
f l — - — - — - — - 11
2’ 2’ 2’ 2’
In the second case, zA: + 6 — t = 0 for z = 1 implies th at 6 = 1 — A: and hence
{
- { k - 1) if z = 0 0 if z = 1 ( z - l ) ( t - l ) if z = 2 ,3 ,4 . Thus, we have r - 1 if A: > 1 r 1 if A: > 1 1 if A: < 1, ~ \ A 6 [-1 , 1] if A: < 1 and , . if *: > 1 • if i < 1 for i = 2,3 ,4 .Hence, we have Ai = —3 for fc > 1 from (2.5), and Ai = ^ , > 5 = j for
A- < 1 from (2.4) and (2.5) which are contradictory to |Ai| < 1, jAs] < 1. In this
case, we have the same conclusion k = l , b = 0. In other cases, the proof and
conclusion are same, and omitted. ■
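The conclusion of Example 2.17 can also be confirmed numerically; the brute-force check below (a sketch, not part of the dissertation) evaluates the total absolute error on a grid around (k, b) = (1, 0):

```python
import itertools

def total_abs_error(k, b):
    # Data of Example 2.17: (x_i, y_i) = (i, i) for i = 0..4 and (x_5, y_5) = (5, 0).
    pts = [(i, i) for i in range(5)] + [(5, 0)]
    return sum(abs(k * x + b - y) for x, y in pts)

best = total_abs_error(1, 0)   # = |5| + 0 + ... + 0 = 5
# A coarse grid search over nearby (k, b) pairs never beats (1, 0).
grid = [i / 10 for i in range(-30, 31)]
assert all(total_abs_error(k, b) >= best - 1e-12
           for k, b in itertools.product(grid, grid))
```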
2.2.3 General Definitions and Properties of the Matrix Derivative
We recall some definitions and properties of matrix functions which will be used in the following sections (Lancaster and Tismenetsky, 1985; Rodgers, 1980). Let A(t) = (a_ij(t))_{m×n}, where a_ij : R → R, i = 1, ..., m, j = 1, ..., n. Then

(a) [kA(t)]′ = kA′(t);
(b) [A(t) + B(t)]′ = A′(t) + B′(t), where B(t) = (b_ij(t))_{m×n};
(c) [A(t)B(t)]′ = A′(t)B(t) + A(t)B′(t), where B(t) is an n×m matrix;
(d) [A(f(t))]′ = A′(u)f′(t), where u = f(t) : R → R;
(e) [A^{-1}(t)]′ = −A^{-1}(t)A′(t)A^{-1}(t);
(f) [x^T(t)A(t)x(t)]′ = x^T(t)A′(t)x(t) + 2x^T(t)A(t)x′(t), where x(t) = (x_1(t), ..., x_n(t))^T and A(t) = (a_ij(t))_{n×n} is symmetric;
(g) ∫ A_{m×n}(t) dt = ( ∫ a_ij(t) dt )_{m×n}.
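Rule (e), the derivative of a matrix inverse, can be verified by finite differences; the matrix function A(t) below is a hypothetical example, invertible near t = 1:

```python
import numpy as np

def A(t):
    # A hypothetical 2x2 matrix function, invertible near t = 1.
    return np.array([[2.0 + t, t**2], [np.sin(t), 3.0]])

def A_prime(t):
    # Entrywise derivative of A(t).
    return np.array([[1.0, 2 * t], [np.cos(t), 0.0]])

t, h = 1.0, 1e-6
# Central finite-difference approximation of [A^{-1}(t)]'.
fd = (np.linalg.inv(A(t + h)) - np.linalg.inv(A(t - h))) / (2 * h)
# Rule (e): [A^{-1}]' = -A^{-1} A' A^{-1}.
exact = -np.linalg.inv(A(t)) @ A_prime(t) @ np.linalg.inv(A(t))
assert np.allclose(fd, exact, atol=1e-6)
```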
2.3 The Generalized Gradient of the Largest Eigenvalue of a Matrix-Valued Function
Proposition 2.18 [Danskin's Theorem] (Problem 9.13, p.99 of Clarke, Ledyaev, Stern and Wolenski, 1998)
Let a continuous function 𝒢 : X × M → R be given, where M is a compact metric space. Suppose that for a neighborhood U of a given point x ∈ X, the derivative 𝒢_x(x′, u) exists and is continuous (jointly) for (x′, u) ∈ U × M. We set

    H(x′) := max_{u∈M} 𝒢(x′, u).

Then H is Lipschitz near x, and one has

    ∂H(x) = co{ 𝒢_x(x, u) : u ∈ M(x) },

where M(x) := {u ∈ M : 𝒢(x, u) = H(x)} and co denotes the convex hull.
Example 2.19
Let us revisit Example 2.2. We find that

    ∂f_Q(0) = co{ (2x)|_{x=0}, 1 } = co{0, 1} = [0, 1]. ∎
Let D : R^{p²} → R^{p×p} be the p×p matrix-valued function that arranges a ∈ R^{p²} into a matrix D(a). We call a nonzero vector w = (w_1, w_2, ..., w_p)^T ∈ R^p an eigenvector of D(a) corresponding to the real eigenvalue λ (if it exists), i.e.,

    D(a)w = λw,  for w ≠ 0.

Assume that all eigenvalues of D(a) are real, and denote

    J(a) := the largest eigenvalue of D(a)
          = λ_max(D(a))
          = max_{||w||₂=1} (D(a)w)^T w.
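For a symmetric matrix the Rayleigh-quotient characterization of J(a) above is exact and easy to check numerically; a sketch with a random symmetric D (illustrative, not from the text):

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4))
D = (M + M.T) / 2                     # symmetric, so all eigenvalues are real

lam_max = np.linalg.eigvalsh(D)[-1]   # largest eigenvalue

# Rayleigh quotients over random unit vectors never exceed lambda_max ...
W = rng.standard_normal((4, 20000))
W /= np.linalg.norm(W, axis=0)
rayleigh = np.einsum('ij,ij->j', W, D @ W).max()
assert rayleigh <= lam_max + 1e-12
# ... and the top eigenvector attains lambda_max exactly.
w = np.linalg.eigh(D)[1][:, -1]
assert np.isclose(w @ D @ w, lam_max)
```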
Theorem 2.20 [The Generalized Gradient of J with Respect to m]
Suppose that the entries a_ij (i, j = 1, ..., p) of D(a) are all continuously differentiable functionals of m ∈ L²(S), and that for each m the eigenvalues of the matrix D = (a_ij(m))_{p×p} are all real. Then J_m := J(a(m)) is Lipschitz near m, and the generalized gradient of J_m at m is

    ∂J_m = co{ (D′_m w)^T w : w ∈ M(a(m)) },

where D′_m denotes the Fréchet derivative of D with respect to m and M(a(m)) is the set of maximizing w in the definition of J.
Proof.
By Proposition 2.18 (Danskin's theorem), J(a) = max_{||w||₂=1} (D(a)w)^T w is Lipschitz and

    ∂J(a) = co{ ∂/∂a [(D(a)w)^T w] : w ∈ M(a) } = co{ ∂/∂a [ Σ_{i,j=1}^{p} a_ij w_i w_j ] : w ∈ M(a) },

where M(a) is the set of maximizing w in J(a) = max_{||w||₂=1} (D(a)w)^T w. By Proposition 2.14, the chain rule, we have

    ∂J(a(m)) ⊆ ∂J(a) ∘ a′(m) = a′(m)^T ∂J(a)
             = co{ [(a_11, ..., a_1p, ..., a_p1, ..., a_pp)′_m]^T ∂/∂a [ Σ_{i,j=1}^{p} a_ij w_i w_j ] : w ∈ M(a(m)) }
             = co{ (D′_m w)^T w : w ∈ M(a(m)) }. ∎

2.4 The Normal Cone of the Abstract Constraint Set
Definition 2.21 (p.52, Clarke, 1983)
Let C be convex. The normal cone to C at x, denoted N_C(x), is the subset of X* given by {ζ ∈ X* : ⟨ζ, y − x⟩ ≤ 0 for all y ∈ C}.

Example 2.22
Let C = [−1, 1] ⊂ R. Then

    N_C(x) = [0, ∞)    if x = 1,
             {0}       if |x| < 1,
             (−∞, 0]   if x = −1.

Example 2.23
For C = R²₊ = {(x, y) : x ≥ 0, y ≥ 0}, C is convex. As a result,

    N_{R²₊}(x, y) = (−∞, 0] × (−∞, 0]   if x = 0, y = 0,
                    (−∞, 0] × {0}       if x = 0, y > 0,
                    {0} × (−∞, 0]       if x > 0, y = 0,
                    {0} × {0}           if x > 0, y > 0.
Now let

    Ω := { m ∈ L²(S) : m(x) ≥ 0 almost everywhere on S }

be the constraint set which we will use in our problem. We give the following expression for the normal cone of Ω.
Lemma 2.24
Let m_0 ∈ Ω. For any

    ζ ∈ N_Ω(m_0) = { ζ ∈ L²(S) : ⟨ζ, m − m_0⟩ ≤ 0 for all m ∈ Ω },

we have
• ζ(x) ≤ 0 almost everywhere on S;
• ζ(x) = 0 almost everywhere on E = {x ∈ S : m_0(x) > 0}.
Proof.
(a) If m_0(x) > 0 almost everywhere on S, then for any given measurable subset E ⊆ S there exists a function m_E ∈ L²(S) such that

    m_E(x) = 0   if x ∈ S and m_0(x) = 0,
    m_E(x) = 0   if x ∈ E and m_0(x) > 0,
    m_E(x) > 0   if x ∈ S\E and m_0(x) > 0,

and m(x) := m_0(x) ± m_E(x) ≥ 0 almost everywhere on S (take m_E bounded by m_0 where m_0 > 0). Therefore ±⟨ζ, m_E⟩ ≤ 0, that is,

    ⟨ζ, m_E⟩ = ∫_{{x∈S : m_0(x)>0} ∩ (S\E)} ζ(x) m_E(x) dx = 0.

If there existed a measurable set E ⊆ S such that the measure of S\E is not zero and ζ(x) > 0 (or ζ(x) < 0) for all x ∈ S\E, then ⟨ζ, m_E⟩ = ∫ ζ(x) m_E(x) dx ≠ 0, which is impossible. This implies that ζ(x) = 0 almost everywhere on S.

(b) If m_0(x) = 0 almost everywhere on S, then for all m ∈ L²(S) such that m(x) ≥ 0 for all x ∈ S,

    0 ≥ ⟨ζ, m − m_0⟩ = ∫_S ζ(x) m(x) dx,

which implies that ζ(x) ≤ 0 almost everywhere.

(c) If there are two sets P and Q such that P = {x ∈ S : m_0(x) > 0}, Q = {x ∈ S : m_0(x) = 0}, P ∪ Q = S, and the measures of both P and Q are not zero, then, in a manner similar to (a) and (b), we have ζ(x) = 0 almost everywhere on P and ζ(x) ≤ 0 almost everywhere on Q, which gives the required result. ∎
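A finite-dimensional analogue of Lemma 2.24 (Ω replaced by the nonnegative orthant of R^n, integrals by sums) can be checked directly; this is a sketch for intuition, not part of the dissertation's proof:

```python
import numpy as np

rng = np.random.default_rng(1)

# Discrete analogue: Omega = {m in R^n : m >= 0}. For m0 >= 0 the lemma says
# the normal cone is N(m0) = {zeta : zeta <= 0, zeta_i = 0 wherever m0_i > 0}.
m0 = np.array([0.0, 0.5, 0.0, 1.2])
zeta = np.array([-3.0, 0.0, -1.0, 0.0])    # satisfies both conditions

# <zeta, m - m0> <= 0 holds for every feasible m >= 0.
for _ in range(1000):
    m = np.abs(rng.standard_normal(4))
    assert zeta @ (m - m0) <= 1e-12

# A zeta violating the lemma (positive where m0 = 0) fails the inequality.
bad = np.array([1.0, 0.0, 0.0, 0.0])
m = m0.copy(); m[0] = 1.0                  # feasible perturbation
assert bad @ (m - m0) > 0
```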
2.5 Lagrange Multiplier Rule

The following nonsmooth Lagrange multiplier rule is a special case of Clarke's result.

Proposition 2.25 [Lagrange Multiplier Rule] (Theorem 6.1.1, p.228, Clarke, 1983)
Let X be a Banach space. Suppose that C ⊆ X is closed and convex, the function G is Lipschitz near any given point of C, and the functions h_j are continuously differentiable for j = 1, ..., k. Consider the optimization problem

    P : min G(x),  s.t. h_j(x) = 0,  x ∈ C,  for j = 1, ..., k.

Let x solve P. Then there exist real numbers λ̄ ≥ 0 and μ_j, not all zero, such that

    0 ∈ λ̄ ∂G(x) + Σ_{j=1}^{k} μ_j h′_j(x) + N_C(x),

where h′_j, ∂ and N_C(x) denote the Fréchet derivative of h_j, the Clarke generalized gradient and the normal cone of C, respectively.

Note that the normal cone is involved in the above Lagrange multiplier rule due to the presence of the abstract constraint C.
Example 2.26
We consider the problems of minimizing or maximizing the function f_Q(x) = max{1 + x², 1 + x} over C = [−1, 1], given in Example 2.2. For the problem

    max f_Q,  or equivalently  min −f_Q,

we know that the optimal solutions are x = −1 and x = 1. Although f′_Q does not vanish at these points, the Lagrange multiplier rule holds because of the normal cone term: we have

    0 ∈ −∂f_Q(x) + N_{[−1,1]}(x) = [−2, −1] + [0, ∞)   if x = 1,
                                   {2} + (−∞, 0]       if x = −1.

It is obvious that x = 0 solves the problem min f_Q(x), and it satisfies the Lagrange multiplier rule

    0 ∈ ∂f_Q(x) + N_{[−1,1]}(x) = [0, 1] + {0}. ∎
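The extrema claimed in Example 2.26 can be confirmed on a grid (assuming, as in Example 2.2, f_Q(x) = max{1 + x², 1 + x}):

```python
import numpy as np

f = lambda x: np.maximum(1 + x**2, 1 + x)
x = np.linspace(-1, 1, 200001)
vals = f(x)

# Maximum 2 is attained at the endpoints x = -1 and x = 1.
assert np.isclose(vals.max(), 2.0)
assert np.isclose(abs(x[vals.argmax()]), 1.0)

# Minimum 1 is attained at x = 0, where the subdifferential
# [0, 1] contains 0 (stationarity, cf. Example 2.19).
assert np.isclose(vals.min(), 1.0)
assert np.isclose(x[vals.argmin()], 0.0)
```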
Minimax Robust Designs for Misspecified Regression Models
3.1 Minimax Designs

In this chapter, we focus on deriving the minimax design for regression model (1.5) with one independent variable x. Without loss of generality, the design space is assumed to be S = [−1/2, 1/2]. Regressors z^T(x) = (z_1(x), z_2(x), ..., z_p(x)) are given, where the z_j(x) are continuous functions of x for j = 1, ..., p. Thus, the fitted regression model is

    E(Y|x) = z^T(x)θ,                                          (3.1)

and the true model is

    E(Y|x) = z^T(x)θ + f(x),                                   (3.2)

where

    θ ∈ R^p,  x ∈ S = [−1/2, 1/2] ⊂ R,

and

    f ∈ F := { f : ∫_S z(x) f(x) dx = 0,  ∫_S f²(x) dx ≤ η² },   (3.3)

where η² is a constant. The first condition in F says that f and z are orthogonal, which implies that the parameter θ is uniquely defined in model (3.2) under the condition that

    ∫_S z(x) z^T(x) dx
is non-singular. The second condition assumes that f is small.

Assuming additive errors, we fit the following model based on n observations:

    Y_j = z^T(x_j)θ + ε_j,  j = 1, ..., n,                     (3.4)

where the errors ε_j are uncorrelated with mean 0 and variance σ². From (1.2), we have

    Z = ( z_1(x_1)  z_2(x_1)  ⋯  z_p(x_1)
          z_1(x_2)  z_2(x_2)  ⋯  z_p(x_2)
            ⋮          ⋮            ⋮
          z_1(x_n)  z_2(x_n)  ⋯  z_p(x_n) ),

and the LSE of θ is

    θ̂ = (Z^T Z)^{-1} Z^T Y.

The covariance matrix of θ̂ is given by

    Cov(θ̂) = E{ [θ̂ − E(θ̂)][θ̂ − E(θ̂)]^T } = σ² (Z^T Z)^{-1}.

To study the properties of the LSE θ̂ of θ, we introduce the following notation. Let ξ(x) be the design measure for x. Define

    E(θ̂ − θ) = (Z^T Z)^{-1} Z^T (Zθ + f) − θ
              = (Z^T Z)^{-1} Z^T f
              = (n A_ξ)^{-1} n b(f, ξ)
              = A_ξ^{-1} b(f, ξ),

where f = (f(x_1), ..., f(x_n))^T. The mean squared error matrix is

    MSE(f, ξ) = E[ (θ̂ − θ)(θ̂ − θ)^T ]
              = Cov(θ̂) + [E(θ̂ − θ)][E(θ̂ − θ)]^T                       (3.5)
              = (σ²/n) A_ξ^{-1} + A_ξ^{-1} b(f, ξ) b^T(f, ξ) A_ξ^{-1}.   (3.6)
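The bias and MSE formulas above can be checked by simulation; the sketch below uses illustrative choices (n, the contamination f, θ, σ² are not from the text), with z(x) = (1, x):

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical setup: z(x) = (1, x), n design points on S = [-1/2, 1/2].
n = 50
xs = np.linspace(-0.5, 0.5, n)
Z = np.column_stack([np.ones(n), xs])
f = 0.1 * xs**3                     # a small contamination f(x)
theta = np.array([1.0, 2.0])
sigma2 = 0.25

# Bias of the LSE: E(theta_hat) - theta = (Z^T Z)^{-1} Z^T f.
bias = np.linalg.solve(Z.T @ Z, Z.T @ f)

# MSE matrix as in (3.5)-(3.6): covariance plus (bias)(bias)^T.
mse = sigma2 * np.linalg.inv(Z.T @ Z) + np.outer(bias, bias)

# Monte Carlo check of the bias formula.
reps = 20000
eps = rng.standard_normal((reps, n)) * np.sqrt(sigma2)
Y = Z @ theta + f + eps             # each row is one simulated sample
theta_hat = np.linalg.solve(Z.T @ Z, Z.T @ Y.T).T
assert np.allclose(theta_hat.mean(axis=0) - theta, bias, atol=0.02)
```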
We consider three kinds of loss functions, L_Q, L_D and L_A, which represent the integrated MSE of the fitted response Ŷ(x), the determinant of the MSE matrix, and the trace of the MSE matrix, respectively. Their explicit descriptions (Fedorov, 1972; Studden, 1977; Heo, Schmuland and Wiens, 2001) are

    L_Q(f, ξ) = ∫_S E[ (Ŷ(x) − E(Y|x))² ] dx
              = ∫_S E[ (z^T(x)θ̂ − (z^T(x)θ + f(x)))² ] dx
              = ∫_S E[ (z^T(x)(θ̂ − θ) − f(x))² ] dx
              = ∫_S z^T(x) MSE(f, ξ) z(x) dx − 2 ∫_S f(x) z^T(x) A_ξ^{-1} b(f, ξ) dx + ∫_S f²(x) dx
              = trace(MSE(f, ξ) A_0) + ∫_S f²(x) dx
              = (σ²/n) trace(A_ξ^{-1} A_0) + b^T(f, ξ) A_ξ^{-1} A_0 A_ξ^{-1} b(f, ξ) + ∫_S f²(x) dx,   (3.7)

where A_0 = ∫_S z(x) z^T(x) dx and the cross term vanishes because ∫_S z(x) f(x) dx = 0 by (3.3);

    L_D(f, ξ) = det(MSE(f, ξ));                                        (3.8)

    L_A(f, ξ) = trace(MSE(f, ξ)) = (σ²/n) trace(A_ξ^{-1}) + b^T(f, ξ) A_ξ^{-2} b(f, ξ).   (3.9)
We aim to construct designs minimizing the maximum (over f ∈ F) value of a loss. Heo, Schmuland and Wiens (2001) gave the following important Propositions 3.1 and 3.2.

Proposition 3.1
Suppose that ||z(x)|| is bounded in x on S and that for each a ≠ 0 the set {x : a^T z(x) = 0} has Lebesgue measure zero. If sup_F L(f, ξ) is finite, then ξ is absolutely continuous with respect to Lebesgue measure, with a density m satisfying ∫_S ||z(x)||² m²(x) dx < ∞.

Define the matrices

    K_ξ := ∫_S z(x) z^T(x) m²(x) dx,   H_ξ := A_ξ A_0^{-1} A_ξ,   G_ξ := K_ξ − H_ξ.

Proposition 3.2
Let ξ be as in Proposition 3.1. Denote by λ_max(A) the largest eigenvalue of a matrix A. Then

    ML_Q(ξ) := max_{f∈F} L_Q(f, ξ) = η² [ ν trace(A_ξ^{-1} A_0) + λ_max(K_ξ H_ξ^{-1}) ],            (3.10)
    ML_D(ξ) := max_{f∈F} L_D(f, ξ) = (σ²/n)^p det(A_ξ^{-1}) [ 1 + ν^{-1} λ_max(G_ξ A_ξ^{-1}) ],     (3.11)
    ML_A(ξ) := max_{f∈F} L_A(f, ξ) = η² [ ν trace(A_ξ^{-1}) + λ_max(G_ξ A_ξ^{-2}) ],               (3.12)

where ν = σ²/(n η²). Thus the density m(x) of a Q-, D- or A-optimal (minimax) design must minimize the right-hand side of (3.10), (3.11) or (3.12), respectively.
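The right-hand side of (3.10) can be evaluated numerically for a candidate design density. A minimal sketch (all constants illustrative; z(x) = (1, x)): for the uniform density m ≡ 1 we have A_ξ = A_0 = K_ξ, so K_ξH_ξ^{-1} = I and the expression reduces to η²(2ν + 1).

```python
import numpy as np

N = 100000
x = -0.5 + (np.arange(N) + 0.5) / N       # midpoint grid on S = [-1/2, 1/2]
z = np.stack([np.ones(N), x])             # z(x) = (1, x)^T

def mat(weight):
    # int_S z(x) z(x)^T weight(x) dx by the midpoint rule (dx = 1/N).
    return (z * weight) @ z.T / N

m = np.ones(N)                            # uniform density, integrates to 1
A0, A, K = mat(np.ones(N)), mat(m), mat(m**2)
H = A @ np.linalg.inv(A0) @ A             # H_xi = A_xi A_0^{-1} A_xi
lam = np.linalg.eigvals(K @ np.linalg.inv(H)).real.max()

eta2, nu = 1.0, 0.5                       # illustrative values of eta^2, nu
MLQ = eta2 * (nu * np.trace(np.linalg.solve(A, A0)) + lam)

# Uniform density: A = A0 = K, H = A0, K H^{-1} = I, trace(A^{-1} A0) = p = 2.
assert np.isclose(lam, 1.0)
assert np.isclose(MLQ, nu * 2 + 1)
```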
Remark: The eigenvalues of the matrices K_ξ H_ξ^{-1}, G_ξ A_ξ^{-1} and G_ξ A_ξ^{-2} are all real. In fact, if G_ξ is nonsingular, Wiens (1992) shows that

    ML_Q(ξ) = (σ²/n) tr[A_ξ^{-1} A_0] + η² λ_max[ I + G_ξ^{1/2} A_ξ^{-1} A_0 A_ξ^{-1} G_ξ^{1/2} ],

where I is an identity matrix. Noting that the matrix I + G_ξ^{1/2} A_ξ^{-1} A_0 A_ξ^{-1} G_ξ^{1/2} is symmetric, since G_ξ, A_ξ, A_0 are all symmetric, we know that its eigenvalues are all real. Suppose λ is an eigenvalue of I + G_ξ^{1/2} A_ξ^{-1} A_0 A_ξ^{-1} G_ξ^{1/2} = I + G_ξ^{1/2} H_ξ^{-1} G_ξ^{1/2}; then it satisfies

    | I + G_ξ^{1/2} H_ξ^{-1} G_ξ^{1/2} − λI | = 0.

Multiplying both sides of the previous equation by G_ξ^{1/2} (on the left and on the right, inside the determinant), we get

    | G_ξ + G_ξ H_ξ^{-1} G_ξ − λ G_ξ | = 0.

Thus

    | I + H_ξ^{-1} G_ξ − λI | = 0,

since G_ξ is nonsingular. Substituting G_ξ by K_ξ − H_ξ in the above equation, we have

    | K_ξ H_ξ^{-1} − λI | = 0.

Therefore λ is an eigenvalue of the matrix K_ξ H_ξ^{-1}. Since each step of the above argument can be reversed, the converse is also true: the matrices I + G_ξ^{1/2} A_ξ^{-1} A_0 A_ξ^{-1} G_ξ^{1/2} and K_ξ H_ξ^{-1} have the same eigenvalues, which are all real. Similarly, the eigenvalues of the matrices G_ξ A_ξ^{-1} and G_ξ A_ξ^{-2} are all real. If G_ξ is singular, Heo, Schmuland and Wiens (2001) proved the same result.
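The similarity argument in the Remark can be checked numerically. In the sketch below (illustrative density m(x) = 1 + x, z(x) = (1, x)), the matrices I + G_ξ^{1/2} A_ξ^{-1} A_0 A_ξ^{-1} G_ξ^{1/2} and K_ξ H_ξ^{-1} turn out to share the same spectrum; the identity eig(AB) = eig(BA) makes this hold even when G_ξ is singular, as it happens to be here.

```python
import numpy as np

N = 200000
x = -0.5 + (np.arange(N) + 0.5) / N
z = np.stack([np.ones(N), x])
m = 1.0 + x                                 # positive, asymmetric density

mat = lambda w: (z * w) @ z.T / N           # int z z^T w dx, midpoint rule
A0, A, K = mat(np.ones(N)), mat(m), mat(m**2)
H = A @ np.linalg.inv(A0) @ A               # H_xi = A_xi A_0^{-1} A_xi
G = K - H                                   # G_xi = K_xi - H_xi

# G is symmetric positive semidefinite, so G^{1/2} exists.
vals, vecs = np.linalg.eigh(G)
assert vals.min() > -1e-8
Ghalf = vecs @ np.diag(np.sqrt(np.clip(vals, 0, None))) @ vecs.T

M1 = np.eye(2) + Ghalf @ np.linalg.inv(A) @ A0 @ np.linalg.inv(A) @ Ghalf
M2 = K @ np.linalg.inv(H)
e1 = np.sort(np.linalg.eigvalsh(M1))
e2 = np.sort(np.linalg.eigvals(M2).real)
assert np.allclose(e1, e2, atol=1e-6)       # same (real) spectra
```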
We now consider the following problem to find a minimax design:

    min ML_Q(ξ)  (or ML_D(ξ), or ML_A(ξ))
    s.t. (1, m)_{L²} = ∫_S m(x) dx = 1,  m ∈ Ω,                  (3.13)

where (·, ·)_{L²} denotes the L²-inner product and Ω = {m ∈ L²(S) : m(x) ≥ 0 almost everywhere on S}. It is obvious that Ω is convex and closed. This is a nonsmooth optimization problem, since the largest-eigenvalue term in the objective is not a differentiable function. Thanks to the development of nonsmooth analysis, we can apply the nonsmooth Lagrange multiplier rule (Proposition 2.25), which is a special case of Clarke's result, to derive important results on minimax designs.
3.2 Some Fundamental Calculations

In this section, we present some fundamental calculations to be used in the next section. We first find the derivative of the functional trace(A_ξ^{-1} A_0) at m. From Definition 2.3, the limit

    lim_{t→0} (A_{ξ+th} − A_ξ)/t = ∫_S z(x) z^T(x) h(x) dx

for all h ∈ L²(S) implies that A′_ξ(x) = (z_i(x) z_j(x))_{p×p}, that is,

    A′_ξ(x) = z(x) z^T(x).

By the operation rules defined in Section 2.2, we derive

    (trace(A_ξ^{-1} A_0))′(x) = trace( (A_ξ^{-1})′ A_0 ) = −trace( A_ξ^{-1} A′_ξ A_ξ^{-1} A_0 )
                              = −z^T(x) (A_ξ^{-1} A_0 A_ξ^{-1}) z(x).            (3.14)

Similarly, K′_ξ(x) = 2 z(x) z^T(x) m(x), and

    (K_ξ H_ξ^{-1})′(x) = K′_ξ(x) H_ξ^{-1} + K_ξ (H_ξ^{-1})′(x)
        = K′_ξ(x) H_ξ^{-1} − K_ξ H_ξ^{-1} (A_ξ A_0^{-1} A_ξ)′(x) H_ξ^{-1}
        = K′_ξ(x) H_ξ^{-1} − K_ξ H_ξ^{-1} ( A′_ξ(x) A_0^{-1} A_ξ + A_ξ A_0^{-1} A′_ξ(x) ) H_ξ^{-1}
        = 2 z(x) z^T(x) H_ξ^{-1} m(x) − K_ξ H_ξ^{-1} z(x) z^T(x) A_0^{-1} A_ξ H_ξ^{-1}
          − K_ξ H_ξ^{-1} A_ξ A_0^{-1} z(x) z^T(x) H_ξ^{-1}
        = z(x) z^T(x) M_1 m(x) + M_2 z(x) z^T(x) M_3,                            (3.15)

where we write the result symbolically with constant matrices M_1, M_2 and M_3, since each term has the form of a constant matrix times z(x)z^T(x) times a constant matrix. For a constant vector w,

    ( (K_ξ H_ξ^{-1})′(x) w )^T w = ( (z(x) z^T(x) M_1 m(x) + M_2 z(x) z^T(x) M_3) w )^T w.
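Equation (3.14) can be verified by a finite-difference check of the directional (Gâteaux) derivative of m ↦ trace(A_ξ^{-1} A_0); the density m, the direction h, and z(x) = (1, x) below are illustrative choices:

```python
import numpy as np

# Check of (3.14): the derivative of m -> trace(A_xi^{-1} A_0) in direction h
# is  -int_S z(x)^T (A^{-1} A_0 A^{-1}) z(x) h(x) dx.
N = 100000
x = -0.5 + (np.arange(N) + 0.5) / N
z = np.stack([np.ones(N), x])
mat = lambda w: (z * w) @ z.T / N          # int z z^T w dx, midpoint rule

m = 1.0 + x                                # a positive density on S
h = np.cos(3 * x)                          # an arbitrary direction in L^2(S)
A0 = mat(np.ones(N))

def obj(w):
    return np.trace(np.linalg.solve(mat(w), A0))   # trace(A_w^{-1} A_0)

t = 1e-6
fd = (obj(m + t * h) - obj(m - t * h)) / (2 * t)   # central difference

A = mat(m)
C = np.linalg.inv(A) @ A0 @ np.linalg.inv(A)
exact = -np.sum(np.einsum('in,ij,jn->n', z, C, z) * h) / N
assert np.isclose(fd, exact, rtol=1e-4)
```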
3.3 Main Analytical Results

The following theorems give the main analytical results for minimax designs.

Theorem 3.3
Suppose that for any nonzero constant matrix D the set {x : z^T(x) D z(x) = 0} has Lebesgue measure zero. If m solves problem (3.13), then m has the form

    m(x) = ( (z^T(x) B z(x) + d) / (z^T(x) D z(x)) )⁺            (3.16)

almost everywhere on S = [−1/2, 1/2], for suitable constant matrices B and D and a constant d, where a⁺ := max(a, 0).

Remark: It is easy to see that if z^T(x) = (1, x, ..., x^{p−1}), then the set {x : z^T(x) D z(x) = 0} has Lebesgue measure zero for any nonzero constant matrix D, since z^T(x) D z(x) is then a nonzero polynomial in x.

Here we only give the proof of Theorem 3.3 for the loss function L_Q. The proofs for the other two loss functions L_D and L_A are similar and therefore omitted.
Proof of Theorem 3.3:
Let m solve problem (3.13) for the loss function L_Q. The objective function (3.10) is locally Lipschitz near m, since trace(A_ξ^{-1} A_0) and λ_max(K_ξ H_ξ^{-1}) are locally Lipschitz near m by Theorem 2.20. From Proposition 2.25 (the Lagrange multiplier rule), there exist λ̄ ≥ 0 and μ ∈ R, not both zero, such that

    0 ∈ λ̄ ∂[ η² ( ν trace(A_ξ^{-1} A_0) + λ_max(K_ξ H_ξ^{-1}) ) ] + μ ∂[ (1, m)_{L²} − 1 ] + N_Ω(m).   (3.17)

Let us now prove that λ̄ ≠ 0. To the contrary, suppose that λ̄ = 0. Since ∂[ μ((1, m)_{L²} − 1) ] = {μ}, (3.17) becomes

    0 ∈ {μ} + N_Ω(m),

where μ ≠ 0. There exists ζ ∈ N_Ω(m) such that μ + ζ(x) = 0 for all x ∈ S; thus ζ(x) ≡ −μ, and ζ(x) < 0 for all x ∈ S by Lemma 2.24. By the definition of N_Ω(m), we have

    0 ≥ (ζ, m_1 − m)_{L²} = ∫_S ζ(x)(m_1(x) − m(x)) dx = −μ ∫_S (m_1(x) − m(x)) dx

for all m_1 ∈ Ω, that is, ∫_S m_1(x) dx ≥ ∫_S m(x) dx = 1 for all m_1 ∈ Ω. In particular, taking m_1(x) ≡ 0 gives 0 ≥ 1, which is impossible. Hence λ̄ ≠ 0.

Without loss of generality, assume that λ̄ = 1. By Propositions 2.7 and 2.12, (3.17) becomes

    0 ∈ { η² ν [trace(A_ξ^{-1} A_0)]′ } + η² ∂[ λ_max(K_ξ H_ξ^{-1}) ] + {μ} + N_Ω(m).   (3.18)

From (3.14) and Theorem 2.20, (3.18) implies that

    0 ∈ η² co{ ( (K_ξ H_ξ^{-1})′ w )^T w : w ∈ M(m) } − η² ν z^T(x) (A_ξ^{-1} A_0 A_ξ^{-1}) z(x) + {μ} + N_Ω(m).   (3.19)

By the definition of the convex hull, there exist a positive integer N, scalars λ_i ≥ 0 for i = 1, ..., N with Σ_{i=1}^{N} λ_i = 1, vectors w_i ∈ M(m) ⊂ R^p, and ζ ∈ N_Ω(m) such that

    0 = η² [ λ_1 ( (K_ξ H_ξ^{-1})′(x) w_1 )^T w_1 + ⋯ + λ_N ( (K_ξ H_ξ^{-1})′(x) w_N )^T w_N ]
        − η² ν z^T(x) (A_ξ^{-1} A_0 A_ξ^{-1}) z(x) + μ + ζ(x).

From (3.15),

    0 = η² [ λ_1 w_1^T ( z(x) z^T(x) M_1 m(x) + M_2 z(x) z^T(x) M_3 ) w_1 + ⋯
             + λ_N w_N^T ( z(x) z^T(x) M_1 m(x) + M_2 z(x) z^T(x) M_3 ) w_N ]
        − η² ν z^T(x) (A_ξ^{-1} A_0 A_ξ^{-1}) z(x) + μ + ζ(x).

By Lemma 2.24, ζ(x) = 0 almost everywhere on {x ∈ S : m(x) > 0}, so the identity above holds with ζ deleted there. Consequently, there exist constant matrices B and D and a constant d such that

    z^T(x) D z(x) m(x) − z^T(x) B z(x) − d = 0

almost everywhere on {x ∈ S : m(x) > 0}. Since for any nonzero constant matrix D the set {x : z^T(x) D z(x) = 0} has Lebesgue measure zero, m has the form (3.16), which gives the required result. ∎
We know that in the case of D-optimality the optimal design is provably symmetric (Wiens, 1992, 1993). For Q- and A-optimality symmetry has not been proven, but is certainly plausible. On the other hand, in practice many scientists and engineers are interested only in symmetric optimal designs. So we consider symmetric density functions, and modify problem (3.13) as follows:

    min ML_Q(ξ)  (or ML_D(ξ), or ML_A(ξ))
    s.t. (1, m)_{L²} := ∫_S m(x) dx = 1,                              (3.20)
         ||m − m⁻||²_{L²} := ∫_S [m(x) − m(−x)]² dx = 0,  m ∈ Ω,

where m⁻(x) := m(−x). The solution to problem (3.20) is given in the following theorem.
Theorem 3.4
If a symmetric density m solves problem (3.20), then m has the same form as the one in Theorem 3.3.

Proof:
We first calculate the derivative of ||m − m⁻||²_{L²} with respect to m. By Definition 2.3, for any h ∈ L²(S),

    lim_{t→0} (1/t) { ∫_S [ (m(x) + th(x)) − (m(−x) + th(−x)) ]² dx − ∫_S [m(x) − m(−x)]² dx }
    = lim_{t→0} (1/t) { ∫_S [ (m(x) − m(−x)) + t(h(x) − h(−x)) ]² dx − ∫_S [m(x) − m(−x)]² dx }
    = lim_{t→0} (1/t) { 2t ∫_S (m(x) − m(−x))(h(x) − h(−x)) dx + t² ∫_S [h(x) − h(−x)]² dx }
    = 2 ∫_S (m(x) − m(−x))(h(x) − h(−x)) dx
    = 2 ∫_S (m(x) − m(−x)) h(x) dx − 2 ∫_S (m(x) − m(−x)) h(−x) dx
    = 2 ∫_S (m(x) − m(−x)) h(x) dx + 2 ∫_S (m(x) − m(−x)) h(x) dx     [substituting x → −x in the second integral]
    = 4 (m − m⁻, h)_{L²},

that is,

    [ ∫_S (m(x) − m(−x))² dx ]′_m (x) = 4 [m(x) − m(−x)]

for all x ∈ S. This implies that if m is a symmetric solution, then, in a manner similar to the proof of Theorem 3.3, we have

    z^T(x) D z(x) m(x) − z^T(x) B z(x) − d + 4 μ_1 [m(x) − m⁻(x)] + ζ(x) = 0,

where ζ is the same as in Theorem 3.3. Hence

    z^T(x) D z(x) m(x) − z^T(x) B z(x) − d = 0

almost everywhere on E = {x ∈ S : m(x) > 0}, since m − m⁻ = 0 when m is symmetric and ζ = 0 almost everywhere on E. In conclusion, if m is a symmetric solution, then m has the same form as the one in Theorem 3.3. ∎
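The Gâteaux derivative computed in the proof of Theorem 3.4 can be checked by finite differences on a grid (illustrative m and h):

```python
import numpy as np

# Check: d/dt || (m + t h) - (m + t h)^- ||^2 at t = 0 equals 4 <m - m^-, h>,
# where m^-(x) := m(-x).
N = 100000
x = -0.5 + (np.arange(N) + 0.5) / N       # midpoint grid, symmetric about 0
flip = lambda g: g[::-1]                  # g(-x) on this symmetric grid

m = 1.0 + x + x**2                        # not symmetric
h = np.sin(2 * x) + 0.3                   # arbitrary direction

penalty = lambda g: np.sum((g - flip(g))**2) / N   # ||g - g^-||^2

t = 1e-6
fd = (penalty(m + t * h) - penalty(m - t * h)) / (2 * t)
exact = 4 * np.sum((m - flip(m)) * h) / N
assert np.isclose(fd, exact, rtol=1e-6)
```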
3.4 Examples

Example 3.5
Consider the approximately linear regression model

    y = θ_0 + θ_1 x + f(x) + ε,

with z^T(x) = (1, x). For the loss function L_Q, the minimax design was discussed in Huber (1975, 1981) (see also the next section), and the minimax density function is given by

    m(x) = (a x² + b)⁺                                             (3.21)

for x ∈ S. If we suppose that m(x) is symmetric, then from Theorem 3.4 the minimax density has the form

    m(x) = ( (a_1 x² + a_0) / (b_1 x² + b_0) )⁺,

which includes the solution (3.21). In Section 3.5 we obtain the exact values of a and b in (3.21). ∎
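A small numerical sketch of the normalization in Example 3.5: given an illustrative value of a (not the minimax value, which Section 3.5 derives), bisection finds the b that makes (ax² + b)⁺ integrate to 1 over S.

```python
import numpy as np

N = 200000
x = -0.5 + (np.arange(N) + 0.5) / N           # midpoint grid on S = [-1/2, 1/2]

def mass(a, b):
    m = np.clip(a * x**2 + b, 0.0, None)      # positive part (.)^+
    return m.sum() / N                        # int_S m(x) dx

a = 24.0                                      # illustrative slope parameter
lo, hi = -30.0, 30.0                          # mass(a, .) is nondecreasing in b
for _ in range(100):                          # bisection on b
    mid = (lo + hi) / 2
    lo, hi = (mid, hi) if mass(a, mid) < 1.0 else (lo, mid)
b = (lo + hi) / 2

assert abs(mass(a, b) - 1.0) < 1e-6
# Analytically, (24 x^2 - 3/2)^+ has unit mass (its support is |x| >= 1/4).
assert abs(b - (-1.5)) < 1e-3
```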
Example 3.6
Consider the quadratic model with no intercept,

    y = θ_0 x + θ_1 x² + f(x) + ε,

i.e., z_1(x) = x and z_2(x) = x². Theorem 3.4 implies that the symmetric minimax density has the form

    m(x) = ( (a_2 x⁴ + a_1 x² + a_0) / (b_2 x⁴ + b_1 x²) )⁺. ∎

Example 3.7
For the quadratic model with intercept,

    y = θ_0 + θ_1 x + θ_2 x² + f(x) + ε,

the symmetric minimax density has the form

    m(x) = ( (a_2 x⁴ + a_1 x² + a_0) / (b_2 x⁴ + b_1 x² + b_0) )⁺. ∎
Example 3.8
Consider the general polynomial regression model

    y = θ_0 + θ_1 x + ⋯ + θ_{p−1} x^{p−1} + f(x) + ε.

From the proof of Theorem 3.3, the minimax density m(x) satisfies

    ( Σ_{i=0}^{2p−2} b_i x^i ) m(x) − Σ_{i=0}^{2p−2} a_i x^i = 0

almost everywhere on {x ∈ S : m(x) > 0}, for constants a_i and b_i. For the symmetric minimax density we need a_i = b_i = 0 for i = 1, 3, ..., 2p−3. Thus

    m(x) = ( (a_{2p−2} x^{2p−2} + a_{2p−4} x^{2p−4} + ⋯ + a_2 x² + a_0) / (b_{2p−2} x^{2p−2} + b_{2p−4} x^{2p−4} + ⋯ + b_2 x² + b_0) )⁺.