by Peilin Shi
B.Sc., Harbin Institute of Technology, 1982
M.Sc., Harbin Institute of Technology, 1989
A Dissertation Submitted in Partial Fulfillment of the Requirements for the Degree of
DOCTOR OF PHILOSOPHY
in the Department of Mathematics and Statistics
We accept this dissertation as conforming to the required standard
Dr. J. J. Ye, Supervisor (Department of Mathematics and Statistics)
____________________________________________________________
Dr. J. Zhou, Co-Supervisor (Department of Mathematics and Statistics)
Dr. F. N. Diacu, Departmental Member (Department of Mathematics and Statistics)
Dr. N. Roy, Outside Member (Department of Economics)
Dr. D. P. Wiens, External Examiner (Department of Mathematical and Statistical Sciences, University of Alberta)
(c) Peilin Shi, 2002 University of Victoria
All rights reserved. This dissertation may not be reproduced in whole or in part, by photocopy or other means, without the permission of the author.
ABSTRACT
Minimax robust designs are studied for regression models with possibly misspecified response functions. These designs, which minimize the maximum of the mean squared error matrix, can control both the bias caused by model misspecification and the desired efficiency through a single parameter. Using nonsmooth optimization techniques, we derive the minimax designs analytically for misspecified regression models. This extends the results in Heo, Schmuland and Wiens (2001). Several examples are discussed for approximately polynomial regression.
Examiners:
Dr. J. J. Ye, Supervisor (Department of Mathematics and Statistics)
Dr. J. Zhou, Co-Supervisor (Department of Mathematics and Statistics)
Dr. F. N. Diacu, Departmental Member (Department of Mathematics and Statistics)
Dr. N. Roy, Outside Member (Department of Economics)
Dr. D. P. Wiens, External Examiner (Department of Mathematical and Statistical Sciences, University of Alberta)
Abstract
Table of Contents
List of Tables
List of Figures
Acknowledgements
1 Introduction
  1.1 Classical Optimal Designs
    1.1.1 General Concepts
    1.1.2 Examples
    1.1.3 Classical Optimal Designs
  1.2 Minimax Robust Designs
    1.2.1 The Reasons We Study Minimax Robust Designs
    1.2.2 Minimax Robust Designs
  1.3 Optimization and Nonsmooth Analysis
  1.4 Summary

2 Nonsmooth Optimization
  2.1 Examples in Nonsmooth Analysis and Optimization
  2.2 Basic Concepts of Nonsmooth Analysis
    2.2.1 Classical Derivatives and Generalized Gradients
    2.2.2 Some Basic Calculus of Generalized Gradients
    2.2.3 General Definitions and Properties about Matrix Derivatives
  2.3 The Generalized Gradient of the Largest Eigenvalue of a Matrix-Valued Function
  2.4 The Normal Cone of the Abstract Constraint Set
  2.5 Lagrange Multiplier Rule

3 Minimax Robust Designs for Misspecified Regression Models
  3.1 Minimax Designs
  3.2 Some Fundamental Calculations
  3.3 Main Analytical Results
  3.4 Examples
  3.5 Huber's Minimax Design for Simple Linear Regression

4 Existence of Solutions
  4.1 Existence of Solutions on Ω_M
  4.2 Solution in Ω_M

5 Computational Results
  5.1 The Sequential Unconstrained Minimization Technique (SUMT)
  5.2 Normalizing Density Function m
  5.3 Two Methods for Solving Unconstrained Nonlinear Programming
    5.3.1 Cyclic Coordinate Method
    5.3.2 BFGS Method
  5.4 Algorithms
  5.5 Computational Results for z^T(x) = (1, x)
  5.6 Computational Results for z^T(x) = (x, x^2)
  5.7 Computational Results for z^T(x) = (1, x, x^2)
List of Tables

3.1 Exact coefficient values of the density function m(x) = (ax^2 + b)^+.
5.1 Huber's exact coefficient values ([H]) of the density function m(x) = (ax^2 + b)^+, and numerical coefficient values ([F], [C]) of the density function m(x) = … with UBSFM, where [F] are the results using the fixed starting point (1, 1, 1) and [C] those with starting points close to [H].
5.2 Some coefficient values of the density function m(x) = … for the loss function L_D with UBSFM.
5.3 Some coefficient values of the density function m(x) = … for the loss function L_Q with UBSFM.
5.4 Heo, Schmuland and Wiens' exact coefficient values ([H]) of the density function m(x) = … and our numerical coefficient values ([O]) of the density function m(x) = … with UBSFM, where KR denotes the kinds of results.
5.5 … for the loss function L_Q with PBSFM.
5.6 Numerical values of the eigenvalues ρ₀, ρ₁, ρ₂ for various ν at m(x) = … for the loss function L_Q with PBSFM.
5.7 Numerical values of the eigenvalues ρ₀, ρ₁ for various ν at [H]: m(x) = (a₀x^2 + b₀)^+ and [O]: m(x) = … for the loss function L_Q with UBSFM, where [H] denotes Huber's results and [O] denotes our results in the column of the kinds of results ([KR]).
5.8 Numerical coefficient values of the density function m(x) = … for the loss function L_A with PBSFM.
5.9 Numerical coefficient values of the density function m(x) = … for the loss function L_D with PBSFM.
List of Figures

2.1 Two lines fitted to six data points (indicated by dots).
2.2 f_Q(x) = max{1 − x, 1 + x} on [−1, 1] (solid line).
2.3 The function y = |x| and its generalized gradient [−1, 1] at x = 0.
5.1 Q-optimal minimax densities m(x) = (ax^2 + b)^+ ([H]: solid lines) and m(x) = … with UBSFM ([F]: dotted lines) for approximately linear regression: (a) ν = 0.1; (b) ν = 1.
5.2 Q-optimal minimax densities m(x) = (ax^2 + b)^+ ([H]: solid lines) and m(x) = … with UBSFM ([F]: dotted lines) for approximately linear regression: (c) ν = 6.48; (d) ν = 10.
5.3 Q-optimal minimax densities m(x) = (ax^2 + b)^+ ([H]: solid lines) and m(x) = … with PBSFM ([F]: dotted lines) for approximately linear regression: (e) ν = 100; (f) ν = 1000.
5.4 Q-optimal minimax densities m(x) = … ([H]: solid lines, Heo, Schmuland and Wiens' results) and m(x) = … with UBSFM ([O]: dotted lines, our results) for approximately linear regression: (a) ν = 1.000; (b) ν = 10.00.
5.5 Q-optimal minimax densities m(x) = … ([H]: solid lines, Heo, Schmuland and Wiens' results) and m(x) = … with UBSFM ([O]: dotted lines, our results) for approximately linear regression: (c) ν = 100.0; (d) ν = 1000.
5.6 Q-optimal minimax densities m(x) = … with PBSFM (solid lines) and m(x) = a(x^4 + β₁x^2 + β₂)^+ (dotted lines), for the quadratic model: (a) ν = 0.1; (b) ν = 1.0.
5.7 Q-optimal minimax densities m(x) = … with PBSFM, for the quadratic model: (c) ν = 5.0; (d) ν = 10.0.
5.8 Q-optimal minimax densities m(x) = … with PBSFM, for the quadratic model: (e) ν = 20.0; (f) ν = 50.0.
5.9 Q-optimal minimax densities m(x) = … with PBSFM (solid lines) and m(x) = a(x^4 + β₁x^2 + β₂)^+ (dotted lines), for the quadratic model: (g) ν = 75.0; (h) ν = 100.0.
5.10 A-optimal minimax densities m(x) = …: (a) ν = 0.0150; (b) ν = 0.0165.
5.11 A-optimal minimax densities m(x) = …: (c) ν = 0.0175; (d) ν = 0.0200.
5.12 A-optimal minimax densities m(x) = …: (e) ν = 0.0300; (f) ν = 0.0500.
5.13 A-optimal minimax densities m(x) = …: (g) ν = 0.0800; (h) ν = 1.0000.
5.14 A-optimal minimax densities m(x) = …: (i) ν = 10.000; (j) ν = 20.000.
5.15 D-optimal minimax densities m(x) = …: (a) ν = 0.1; (b) ν = 1.
5.16 D-optimal minimax densities m(x) = …: (c) ν = 10; (d) ν = 20.
5.17 D-optimal minimax densities m(x) = …: (e) ν = 50; (f) ν = 100.
5.18 Q-optimal minimax densities m(x) = … with UBSFM (solid lines) and PBSFM (dotted lines): (a) ν = 5.0; (b) ν = 10.0.
Acknowledgements

I am deeply indebted to my supervisor, Dr. J. J. Ye, for introducing me to nonsmooth analysis and optimization, and for her countless efforts on my behalf throughout my graduate studies. I would also like to express my sincere gratitude to my co-supervisor, Dr. J. Zhou, for her tremendous help in the area of minimax robust design. This dissertation would not have been completed without their help and incredible patience.

I especially wish to thank the members of my examining committee, Dr. F. N. Diacu, Dr. N. Roy and Dr. D. P. Wiens, for their careful consideration and helpful suggestions.

No words can describe my eternal gratitude to my parents and my parents-in-law. They are the pinnacle of parenthood, who have given my family everything without ever asking anything in return.

Finally, I would like to thank my wonderful wife, Xiaoqi Sun, and my lovely daughter, Shirly Xiaomeng Shi, for their love, support and encouragement.
Introduction

1.1 Classical Optimal Designs

1.1.1 General Concepts
Scientists and engineers often need to assess the effects of certain controllable conditions on the experiments they conduct. To obtain the most information about these effects, the experiment under consideration must be designed properly. A set of possible conditions is called the factor space or design space. In the factor space, a specification of the number of observations to be made is called a design (Arthanari and Dodge, 1981).

Before choosing a design, we need to have some information about the relationship between the controllable conditions and the outcome. The outcome is usually called the response. The response variable can be related to the controllables by some functional relationship, which is assumed to hold. When random errors are incorporated into this relationship, we have a statistical model.
The choice of both the model and the design influences the conclusions drawn from the experiment. Thus problems of optimally choosing the model and the design are important. Also, methods of analyzing the data are required for estimating the unknown parameters in the model. In this dissertation we focus on the choice of designs such that accurate estimates can be obtained. In particular, we apply techniques of mathematical programming to solve the related optimization problems in regression design. The following examples give a brief introduction to regression designs.
1.1.2 Examples

Example 1.1
We assume that the controllable variable x and the response variable y are related according to the simple linear regression model. That is, there exist parameters θ₀, θ₁ and σ² such that for any fixed value of the independent variable x, the response variable y is related to x through the model equation

    y = \theta_0 + \theta_1 x + \varepsilon.

The quantity ε in the model equation is a random error variable with mean 0 and variance σ². Suppose n observations (x₁, y₁), …, (xₙ, yₙ) are made during the experiment, from which the model parameters and the true regression line itself can be estimated. These observations are assumed to have been obtained independently of one another. That is, y_i is the observed value of a random variable Y_i, where Y_i = θ₀ + θ₁x_i + ε_i and the errors ε₁, …, εₙ are independent random variables. Independence of Y₁, …, Yₙ follows from the independence of the ε_i's.
The least squares estimates (LSE) of θ₀ and θ₁, denoted by θ̂₀ and θ̂₁, are given by

    \hat\theta_0 = \bar y - \hat\theta_1 \bar x, \qquad
    \hat\theta_1 = \frac{\sum_{i=1}^{n} (x_i - \bar x)(y_i - \bar y)}{\sum_{i=1}^{n} (x_i - \bar x)^2},

which minimize the function

    LSE(\theta_0, \theta_1) = \sum_{i=1}^{n} \left[ y_i - (\theta_0 + \theta_1 x_i) \right]^2.
One of the classical regression design problems is to choose the x_i's to find the "best" estimate θ̂ = (θ̂₀, θ̂₁)ᵀ. "Best" can be interpreted through many different measures. If the estimates are unbiased, one such measure is the integrated mean squared error (IMSE) (let S = [−1, 1] here):

    IMSE = \frac{1}{2} \int_{-1}^{1} E(\hat\theta_0 + \hat\theta_1 x - \theta_0 - \theta_1 x)^2 \, dx
         = \frac{1}{2} \int_{-1}^{1} \left[ \frac{\sigma^2}{n}
           + \frac{(\bar x^2 + x^2)\sigma^2 - 2 x \bar x \, \sigma^2}{\sum_{i=1}^{n} (x_i - \bar x)^2} \right] dx
         = \sigma^2 \left[ \frac{1}{n} + \frac{\tfrac{1}{3} + \bar x^2}{\sum_{i=1}^{n} (x_i - \bar x)^2} \right].

To minimize the IMSE over the x_i's, the x_i's ought to be spread out as much as possible, for instance half of them at the lowest x-value and the other half at the highest x-value. This example demonstrates that we can improve the quality of our estimates by planning the locations of the x-values rather than just choosing them haphazardly. ■
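To make the point concrete, the IMSE formula derived above can be evaluated for candidate designs. The sketch below (an illustration, not code from the dissertation; the two designs are arbitrary examples, taking S = [−1, 1]) compares a design with half the points at each endpoint against an equally spaced one.

```python
# A minimal sketch: IMSE = sigma^2 * (1/n + (1/3 + xbar^2) / Sxx) on S = [-1, 1],
# evaluated for two candidate designs of ten points each.

def imse(xs, sigma2=1.0):
    """IMSE of the fitted least squares line for design points xs on [-1, 1]."""
    n = len(xs)
    xbar = sum(xs) / n
    sxx = sum((x - xbar) ** 2 for x in xs)          # sum of (x_i - xbar)^2
    return sigma2 * (1.0 / n + (1.0 / 3.0 + xbar ** 2) / sxx)

endpoints = [-1.0] * 5 + [1.0] * 5                   # half at each extreme
equispaced = [-1.0 + 2.0 * i / 9 for i in range(10)] # evenly spread

print(imse(endpoints))   # smaller: points spread to the extremes
print(imse(equispaced))  # larger
```

Spreading the points to the extremes maximizes Σ(x_i − x̄)², which is exactly the quantity in the denominator of the IMSE.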
Example 1.2

Consider a multiple regression model, which can be written in matrix notation as y = Xθ + ε, where

    y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}, \quad
    X = \begin{pmatrix} 1 & x_{11} & \cdots & x_{1,p-1} \\ 1 & x_{21} & \cdots & x_{2,p-1} \\ \vdots & \vdots & & \vdots \\ 1 & x_{n1} & \cdots & x_{n,p-1} \end{pmatrix}, \quad
    \theta = \begin{pmatrix} \theta_0 \\ \theta_1 \\ \vdots \\ \theta_{p-1} \end{pmatrix}, \quad
    \varepsilon = \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{pmatrix},

and the errors ε_i are uncorrelated with mean 0 and variance σ². The least squares estimator of θ is

    \hat\theta = (X^T X)^{-1} X^T y,

with the covariance matrix

    Cov(\hat\theta) = E\left\{ [\hat\theta - E(\hat\theta)][\hat\theta - E(\hat\theta)]^T \right\} = \sigma^2 (X^T X)^{-1}.
The classical regression problem is to choose the design points x_i, i = 1, …, n, in some optimal manner, which is equivalent to choosing the discrete measure ξ on S, where

    \xi(x) := \frac{1}{n} \sum_{i=1}^{n} \delta_{x_i}(x) \qquad (1.1)

and δ_{x_i} denotes the indicator function of x_i, i.e.,

    \delta_{x_i}(x) := \begin{cases} 1 & \text{if } x = x_i, \\ 0 & \text{otherwise.} \end{cases}

An optimal design can be considered to minimize some scalar function of the covariance matrix, 𝓛(Cov(θ̂)). Examples of such functions include 𝓛(·) = trace(·) and 𝓛(·) = det(·), where trace(A) and det(A) denote the trace and the determinant of a matrix A, respectively. ■
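The closed forms θ̂ = (XᵀX)⁻¹Xᵀy and Cov(θ̂) = σ²(XᵀX)⁻¹ are easy to check directly in the two-parameter case. The following sketch (an illustration only, not from the dissertation) writes the 2 × 2 inverse out by hand for an intercept-and-slope model.

```python
# A minimal sketch of the least squares estimator and its covariance for
# y = theta_0 + theta_1 x + eps, using theta_hat = (X^T X)^{-1} X^T y.

def fit_line(xs, ys, sigma2=1.0):
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    det = n * sxx - sx * sx                      # det(X^T X)
    # (X^T X)^{-1}, written out for the 2x2 case
    inv = [[sxx / det, -sx / det], [-sx / det, n / det]]
    theta0 = inv[0][0] * sy + inv[0][1] * sxy    # intercept estimate
    theta1 = inv[1][0] * sy + inv[1][1] * sxy    # slope estimate
    cov = [[sigma2 * inv[i][j] for j in range(2)] for i in range(2)]
    return (theta0, theta1), cov

# Data lying exactly on y = 1 + 2x, so the fit recovers the line exactly.
(theta0, theta1), cov = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(theta0, theta1)
```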
In general, we can consider a linear regression model with regressors z₁(x), …, z_p(x). In this case, for design points x_i, we define

    Z = \begin{pmatrix} z^T(x_1) \\ z^T(x_2) \\ \vdots \\ z^T(x_n) \end{pmatrix}
      = \begin{pmatrix} z_1(x_1) & z_2(x_1) & \cdots & z_p(x_1) \\ z_1(x_2) & z_2(x_2) & \cdots & z_p(x_2) \\ \vdots & \vdots & & \vdots \\ z_1(x_n) & z_2(x_n) & \cdots & z_p(x_n) \end{pmatrix};

thus,

    \mathcal{L}\big(Cov(\hat\theta)\big) = \mathcal{L}\big(\sigma^2 (Z^T Z)^{-1}\big)
      = \mathcal{L}\left( \frac{\sigma^2}{n} \left( \int_S z(x) z^T(x) \, \xi(dx) \right)^{-1} \right). \qquad (1.2)

1.1.3 Classical Optimal Designs
The classical optimal design problem with multiple variables includes the following three parts:

P₁: Specifying a Regression Model:

    Y = z^T(x)\theta + \varepsilon, \quad z(x) \in R^p, \quad \theta \in R^p, \quad x \in S \subset R^q, \qquad (1.3)

where S is a given design space and z(x) = (z₁(x), z₂(x), …, z_p(x))ᵀ is a given function of x. For instance, if zᵀ(x) = (1, xᵀ), then (1.3) is the usual multiple linear regression model; if x ∈ R and zᵀ(x) = (1, x, …, x^{p−1}), then (1.3) is a polynomial regression model.

P₂: Making Model Assumptions:

    A₁: the fitted model is exactly correct;        (1.4)
    A₂: the errors ε_i are uncorrelated.

P₃: Estimating θ and Choosing an Optimality Criterion:

    \min \; \mathcal{L}\big(Cov(\hat\theta)\big) \quad \text{s.t.} \quad x_i \in S, \; i = 1, \ldots, n,

where θ̂ is an estimate (not necessarily the least squares estimate) in (1.3) and 𝓛 is a loss function, such as the trace, the determinant, the largest eigenvalue or the integration of the covariance matrix, which give the A-, D-, E- and Q-optimality criteria, respectively.

Such design problems have been studied extensively in the literature, and many optimal designs have been obtained for various linear models and loss functions; see Fedorov (1972) and Pukelsheim (1993).
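The loss functions named above are straightforward to evaluate for a small covariance matrix. The sketch below (an assumed illustration; the matrix C is a placeholder, not a design from the text) computes the A-criterion (trace), the D-criterion (determinant) and the E-criterion (largest eigenvalue) for a symmetric 2 × 2 matrix.

```python
# A minimal sketch of the A-, D- and E-optimality criteria applied to a
# symmetric 2x2 covariance matrix C.

import math

def a_crit(c):
    return c[0][0] + c[1][1]                      # trace

def d_crit(c):
    return c[0][0] * c[1][1] - c[0][1] * c[1][0]  # determinant

def e_crit(c):
    # largest eigenvalue of a symmetric 2x2 matrix, in closed form
    tr, det = a_crit(c), d_crit(c)
    return tr / 2 + math.sqrt(tr * tr / 4 - det)

C = [[0.7, -0.3], [-0.3, 0.2]]  # placeholder covariance matrix
print(a_crit(C), d_crit(C), e_crit(C))
```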
1.2 Minimax Robust Designs

1.2.1 The Reasons We Study Minimax Robust Designs
Classical optimal design theory focuses on the attainment of some form of minimum variance property, assuming the fitted model to be exactly correct and the errors to be uncorrelated. Under violations of these model assumptions, the classical optimal designs do not perform well and yield poor estimates.
Case 1: The model assumption A₁ is violated.

This means that model (1.3) is not exactly correct, so the least squares estimator is biased. Box and Draper (1959) found that very small departures from A₁ can eliminate any supposed gains arising from the use of a design which minimizes variance alone; see Wiens (1992). Huber (1981) also stated that "deviations from linearity that are too small to be detected are already large enough to tip the balance away from the 'optimal' designs, which assume exact linearity and put observations on the extreme points of the observable range, toward the 'naive' ones which distribute the observations more or less evenly over the entire design space."

Furthermore, with the classical optimal designs, the model adequacy test cannot be carried out, since the observations are only on the extreme points of the design space, and no information is available in the interior of S.
Case 2: The model assumption A₂ is violated.

In this case the errors are correlated, so the model departs from the assumption of uncorrelated errors. The exact correlation structure of the errors is usually unknown; thus Cov(θ̂) is unknown, and the classical optimal designs derived under A₂ may no longer perform well.
1.2.2 Minimax Robust Designs
As stated in the last subsection, there is a need to study optimal designs under possible small violations of model assumptions A₁ and/or A₂. These designs are called robust designs. In general, robust designs are those which are not sensitive to small departures from model assumptions. The minimax robust design is one kind of robust design. More discussion of the need to study robust designs can be found in Wiens (1990, 1992, 1994) and Heo, Schmuland and Wiens (1999, 2001).
In practice, the relationship between a response variable y and independent variables x is only approximately modelled. This often results in the violation of A₁. The violation of A₂ can be caused by, for example, serial or spatial correlation or repeated measures. With respect to many kinds of model violations (departures), various robust designs can be studied. In this dissertation, we focus on the violation of A₁, and construct the corresponding robust designs.
As we mentioned above, model (1.3) is usually an approximation of a true (unknown) model, say,

    E(Y \mid x) = z^T(x)\theta + f(x), \qquad (1.5)

where the function f is unknown but "small". Due to the unknown nature of f, estimation must be based on the observations {(Y_i, x_i)}_{i=1}^n. The least squares estimates θ̂ of θ and Ŷ = zᵀ(x)θ̂ of E[Y|x] are possibly biased because of the misspecified response. In this situation, regression designs can play an important role in choosing optimal design points x ∈ S that yield estimates θ̂ and Ŷ which remain relatively efficient while suffering as little as possible from the bias engendered by the model misspecification.
Robust regression designs have been studied for numerous response functions z(x) and various classes of functions f. Examples are Box and Draper (1959), Stigler (1971), Sacks and Ylvisaker (1984), Wiens (1990, 1992), Dette and Wong (1996), and Liu and Wiens (1997). Recently, Heo, Schmuland and Wiens (2001) investigated minimax designs when the function f belongs to a certain class of functions (defined in (3.3)). Their study indicates that the associated minimax problems are difficult to solve analytically, even if zᵀ(x) = (1, x, x²), a quadratic regression model. This leads to the investigation in this dissertation. Using nonsmooth optimization theory, we are able to find the solution for the minimax problems proposed in Heo, Schmuland and Wiens (2001).
1.3 Optimization and Nonsmooth Analysis
The origins of analytic optimization lie in the classical calculus of variations and are intertwined with the development of the calculus. For this reason, the subject long retained the smoothness (in particular, differentiability) hypotheses that were made at its inception. Attempts to weaken these smoothness requirements have a long history. In practical applications of optimization, we often encounter situations where the objective function to be minimized or maximized and/or the constraint functions are not necessarily differentiable. Nonsmooth analysis, which refers to differential analysis in the absence of differentiability, is without a doubt a very important tool for such problems. It is only in the last two decades that the subject has grown rapidly and come to play a role in functional analysis, optimization, differential equations, control theory and, increasingly, in analysis generally.
1.4 Summary
In this dissertation, we continue the study in Heo, Schmuland and Wiens (2001) to derive minimax designs. Using the Lagrange multiplier rule of nonsmooth optimization theory, we obtain minimax designs analytically for the misspecified regression model (1.5). The density of the minimax design has the analytical form

    m(x) = \left( \frac{z^T(x) B z(x)}{z^T(x) D z(x)} - d \right)^+

almost everywhere on S = [−1/2, 1/2], for suitable matrices B and D and a constant d, where (c)⁺ := max(0, c). This extends the designs with zᵀ(x) = (1, x) in Huber (1981) and zᵀ(x) = (x, x²) in Heo, Schmuland and Wiens (1999).
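A density of this truncated-ratio type is easy to evaluate pointwise once B, D and d are available. The sketch below is hedged: the matrices B and D and the constant d are arbitrary placeholders chosen only to illustrate the (c)⁺ = max(0, c) truncation, and are not the dissertation's actual solution; zᵀ(x) = (1, x) is assumed.

```python
# A hedged sketch: evaluating m(x) = ( z^T B z / z^T D z - d )^+ for z = (1, x).
# B, D and d are placeholders, NOT the minimax-design solution.

def quad_form(m, z):
    """z^T M z for a 2x2 matrix M and a length-2 vector z."""
    return sum(z[i] * m[i][j] * z[j] for i in range(2) for j in range(2))

def density(x, B, D, d):
    z = (1.0, x)
    return max(0.0, quad_form(B, z) / quad_form(D, z) - d)

B = [[1.0, 0.0], [0.0, 3.0]]   # placeholder
D = [[1.0, 0.0], [0.0, 1.0]]   # placeholder (identity)
d = 0.5
print(density(0.0, B, D, d))   # smaller near the centre
print(density(1.0, B, D, d))   # larger away from the centre
```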
In Chapter 2, we give a brief review of nonsmooth optimization theory and some results, including the definition and properties of the generalized gradient, the generalized gradient of the largest eigenvalue of a matrix-valued function, properties of the abstract constraint set, the nonsmooth Lagrange multiplier rule, and some examples. In Chapter 3 we introduce the minimax design problems and study the related nonsmooth optimization problems. Analytic solutions are derived and illustrated through examples. In Chapter 4, we prove the existence of solutions with the restriction that the density is bounded pointwise by a positive constant. In Chapter 5, computational techniques for determining minimax density functions are addressed, and detailed algorithms are presented. In particular, the sequential unconstrained minimization technique, the cyclic coordinate method and the BFGS algorithm (Broyden, 1970a, 1970b; Fletcher, 1970; Goldfarb, 1970; Shanno and Kettler, 1970) are applied to compute the minimax density functions. Several examples are given and compared with the existing designs. In Chapter 6, we summarize our results and point out possible directions for further research.
Nonsmooth Optimization

This chapter contains the preliminaries and results on nonsmooth optimization that will be used later in this dissertation.

2.1 Examples in Nonsmooth Analysis and Optimization
It is interesting that in Clarke's well-known book on nonsmooth analysis, the following example on linear regression was given as the first example (Clarke, 1983) to illustrate the need for nonsmooth analysis and optimization.
Example 2.1

"This first example is familiar to anyone who has had to prepare a laboratory report for a physics or chemistry class. Suppose that a set of observed data points (x₀, y₀), …, (x_N, y_N) in the x-y plane is given, and consider the problem of determining the straight line in the x-y plane that best fits the data. Assuming that the given data points do not all lie on a certain line (any lab instructor would be suspicious if they did), the notion of "best" must be defined, and any choice is arbitrary. For a given line y = kx + b, the error e_i at the ith data point (x_i, y_i) is defined to be |kx_i + b − y_i|. A common definition of the best approximating line requires that the slope k and the intercept b minimize the total squared error Σ_{i=0}^{N} e_i² over all k and b. On the face of it, it seems at least as natural to ask instead that the total error Σ_{i=0}^{N} e_i be minimized. The characteristics of the resulting solution certainly differ. In Figure 2.1, for example, the solid line represents the 'least total error' solution, and the dashed line represents the 'least total square error' solution. Note that the former ignores the anomalous data point, which presumably corresponds to a gross measurement error. The least squares solution, in contrast, is greatly affected by that point. One or the other of these solutions may be preferable; the point we wish to make is that the function Σ_{i=0}^{N} e_i is nondifferentiable as a function of k and b. Thus the usual methods for minimizing differentiable functions would be inapplicable to this function, and different methods would have to be used. Of course, the reason that the least squares definition is the common one is that it leads to the minimization of a smooth function of k and b." ■
Example 2.2

A function which arises naturally as a criterion in engineering and statistical design problems is

    f_A(x) = \text{the largest eigenvalue of } A(x),

where A(x) = (a_{ij}(x)) is a symmetric matrix-valued function. The function is not differentiable in general even if all a_{ij}(x) are continuously
differentiable functions of x. For example, let

    M(x) = \begin{pmatrix} 2 + x^2 & 0 \\ 0 & 1 \end{pmatrix}
    \quad \text{and} \quad
    Q(x) = \begin{pmatrix} 1 & x \\ x & 1 \end{pmatrix}

for x ∈ [−1, 1]. Then

    \min_{x \in [-1, 1]} f_M(x)

is a smooth optimization problem, since the function f_M(x) = 2 + x² is continuously differentiable on [−1, 1]. But

    \min_{x \in [-1, 1]} f_Q(x)

is a nonsmooth optimization problem, since the function

    f_Q(x) = \max\{1 - x, \; 1 + x\} = 1 + |x|

is nondifferentiable at x = 0 (Figure 2.2). We shall return to this example in Section 2.3.

Figure 2.1: Two lines fitted to six data points (indicated by dots).
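The kink in a largest-eigenvalue function can be observed numerically. The sketch below is an assumed illustration in the spirit of Example 2.2 (the matrix A(x) = [[1, x], [x, 1]] is our choice, not necessarily the one in the text): its entries are smooth, its largest eigenvalue is 1 + |x|, and the one-sided difference quotients at x = 0 disagree.

```python
# A sketch (assumed example): the largest eigenvalue of the smooth symmetric
# matrix A(x) = [[1, x], [x, 1]] is 1 + |x|, nondifferentiable at x = 0.

import math

def largest_eig(x):
    # eigenvalues of [[1, x], [x, 1]] are 1 + x and 1 - x;
    # closed form via trace/determinant of a symmetric 2x2 matrix
    tr, det = 2.0, 1.0 - x * x
    return tr / 2 + math.sqrt(tr * tr / 4 - det)   # equals 1 + |x|

h = 1e-6
right = (largest_eig(0 + h) - largest_eig(0)) / h   # one-sided slope from the right
left = (largest_eig(0) - largest_eig(0 - h)) / h    # one-sided slope from the left
print(right, left)                                  # +1 and -1: a kink at x = 0
```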
2.2 Basic Concepts of Nonsmooth Analysis

Throughout this section, X is a real Banach space.

2.2.1 Classical Derivatives and Generalized Gradients

Before defining the generalized gradient, we review some facts about classical derivatives.
Definition 2.3 [Gâteaux Derivative and Fréchet Derivative]

Let f : X → R. The usual (one-sided) directional derivative of f at x in the direction v is

    f'(x; v) := \lim_{t \to 0^+} \frac{f(x + tv) - f(x)}{t}, \qquad (2.1)

when this limit exists. We say that f is Gâteaux differentiable at x provided the limit in (2.1) exists for all v ∈ X, and there exists an element of the space X* of continuous linear functionals on X, denoted f'(x), that satisfies

    f'(x; v) = \langle f'(x), v \rangle. \qquad (2.2)

If (2.2) holds at a point x, and the convergence in (2.1) is uniform with respect to v in bounded subsets of X, we say that f is Fréchet differentiable at x, and call f'(x) the Fréchet derivative at x.

We say that f is continuously differentiable (C¹) at x provided that on a neighborhood of x the Gâteaux derivative exists and is continuous as a mapping from X into X*.
From Examples 2.1 and 2.2, we see that the above classical derivatives are not useful where nonsmoothness arises. The generalized gradient, introduced by F. Clarke in 1973, is a replacement for the classical derivatives when a function is not smooth. It is defined for the class of functions called Lipschitz continuous functions, which includes the functions encountered in Examples 2.1 and 2.2.

Definition 2.4 [Lipschitz Condition]

Let C be a nonempty subset of a normed linear space X. A function f : C → R is said to be Lipschitz (of rank L) on C if for some nonnegative scalar L one has

    |f(y_1) - f(y_2)| \le L \|y_1 - y_2\|, \qquad \forall y_1, y_2 \in C.

We say that f is Lipschitz of rank L near x if C = x + δB for some δ > 0, where B signifies the open unit ball.

It follows from the mean value theorem that if a function f is continuously differentiable at x then it is Lipschitz near x. Other well-known Lipschitz continuous functions include the class of all convex functions bounded above.
We are now ready to define the generalized directional derivative and the generalized gradient.

Definition 2.5 [Generalized Directional Derivative]

Let f : X → R be Lipschitz near x. For any vector v in X, the (Clarke) generalized directional derivative of f at x in the direction v is defined by

    f^\circ(x; v) := \limsup_{y \to x, \; t \to 0^+} \frac{f(y + tv) - f(y)}{t}.

Note that, unlike the usual directional derivative (2.1), the generalized directional derivative involves only the upper limit of the difference quotient, which is bounded above by L‖v‖ in light of the Lipschitz condition.
Definition 2.6 [Generalized Gradient] (p. 27, Clarke, 1983)

Let f : X → R be Lipschitz near a given point x ∈ X. The generalized gradient of f at x, denoted ∂f(x), is the subset of X* given by

    \partial f(x) = \{ \zeta \in X^* : f^\circ(x; v) \ge \langle \zeta, v \rangle \text{ for all } v \in X \}.

Proposition 2.7 (Proposition 2.2.4, Clarke, 1983)

If f is continuously differentiable at x, then ∂f(x) = {f'(x)}.
Proposition 2.8 (Proposition 2.2.7, Clarke, 1983)

When f : X → R is convex and Lipschitz near x, ∂f(x) coincides with the subdifferential at x in the sense of convex analysis, i.e.,

    \partial f(x) = \{ \zeta \in X^* : f(y) - f(x) \ge \langle \zeta, y - x \rangle \text{ for all } y \in X \},

and f°(x; v) coincides with the directional derivative, i.e.,

    f^\circ(x; v) = f'(x; v) := \lim_{t \to 0^+} \frac{f(x + tv) - f(x)}{t}.
Example 2.9

Let X = R and f(x) = |x|. Then f is Lipschitz of rank 1. Since f(x) = x if x > 0 and f(x) = −x if x < 0, by Proposition 2.7 we have ∂f(x) = {1} if x > 0 and ∂f(x) = {−1} if x < 0. For the case x = 0, we find

    f^\circ(0; v) = \limsup_{y \to 0, \; t \to 0^+} \frac{|y + tv| - |y|}{t} = |v|.

This means that ∂f(0) consists of those ζ that satisfy f°(0; v) = |v| ≥ ζv for all v, which yields ∂f(0) = [−1, 1] (Figure 2.3). Finally, we have

    \partial f(x) = \begin{cases} \{1\} & \text{if } x > 0 \\ [-1, 1] & \text{if } x = 0 \\ \{-1\} & \text{if } x < 0. \end{cases} \quad ■
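The value f°(0; v) = |v| found above can be mimicked numerically by maximizing the difference quotient over small y and t > 0. The sketch below is only an illustration (the sampling grid is an arbitrary choice), but it reproduces |v|.

```python
# A sketch: estimating the generalized directional derivative f°(0; v) of
# f(x) = |x| as the largest sampled difference quotient (|y + t v| - |y|) / t.

def f_circ(v, n=200):
    best = float("-inf")
    for i in range(-n, n + 1):
        y = i * 1e-4                      # y ranging over a small neighborhood of 0
        for t in (1e-3, 1e-4, 1e-5):      # a few small step sizes t > 0
            q = (abs(y + t * v) - abs(y)) / t
            best = max(best, q)
    return best

print(f_circ(1.0))    # approaches |1| = 1
print(f_circ(-2.0))   # approaches |-2| = 2
```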
2.2.2 Some Basic Calculus of Generalized Gradients

We now gather some formulas that facilitate the calculation of ∂f when f is synthesized from simpler functionals through linear combinations, composition, and so on.

Definition 2.10 (Definition 2.3.4, Clarke, 1983)

f is said to be regular at x provided

(i) for all v, the usual one-sided derivative f'(x; v) (see (2.1)) exists;

(ii) for all v, f'(x; v) = f°(x; v).

Figure 2.3: The function y = |x| and its generalized gradient [−1, 1] at x = 0.
Example 2.11 (Proposition 2.3.6, Clarke, 1983)

All convex functions and all C¹ functions are regular. ■

Proposition 2.12 [Linear Combination Rules] (Corollaries 2 and 3 of Proposition 2.3.3, Clarke, 1983)

Let f_i : X → R (i = 1, …, n) be Lipschitz near x, and let λ_i (i = 1, …, n) be scalars. Then f := Σ_{i=1}^{n} λ_i f_i is Lipschitz near x, and one has

    \partial f(x) \subset \sum_{i=1}^{n} \lambda_i \, \partial f_i(x),

with equality if each f_i is regular at x and each λ_i is nonnegative.
Proposition 2.13 [Product and Quotient Rules] (Propositions 2.3.13 and 2.3.14, Clarke, 1983)

Let f and g : X → R be Lipschitz near x. Then the product fg is Lipschitz near x and one has

    \partial (fg)(x) \subset f(x) \, \partial g(x) + g(x) \, \partial f(x).

If in addition f(x) ≥ 0, g(x) ≥ 0 and f, g are both regular at x, then equality holds and fg is regular at x. Moreover, suppose g(x) ≠ 0; then the quotient f/g is Lipschitz near x and one has

    \partial \left( \frac{f}{g} \right)(x) \subset \frac{g(x) \, \partial f(x) - f(x) \, \partial g(x)}{g^2(x)}.

If in addition f(x) ≥ 0, g(x) > 0 and f and −g are both regular at x, then equality holds and f/g is regular at x.

Proposition 2.14 [Chain Rule] (Theorem 2.3.10, Clarke, 1983)

Let F : X → Rᵐ be continuously differentiable near x, and let g : Rᵐ → R be Lipschitz near F(x). Then the function G := g ∘ F is Lipschitz near x, and one has

    \partial G(x) \subset \partial g(F(x)) \circ F'(x) = F'(x)^T \partial g(F(x)).

Equality holds if g (or −g) is regular at F(x), in which case G (or −G) is also regular at x.
Proposition 2.15 [Extrema] (Proposition 2.3.2 of Clarke (1983) and Proposition 2.8)

If f attains a local minimum or maximum at x, then x is a stationary point of f, i.e., 0 ∈ ∂f(x). Moreover, if f is a convex function, then any stationary point must be a minimum of f.
Example 2.16

Consider G(k, b) = |2k + b − 2|. Let F(k, b) = 2k + b − 2 and g(y) = |y|. Then G(k, b) = g(F(k, b)). Here F'(k, b) = (2, 1)ᵀ, and

    \partial g(y) = \begin{cases} \{1\} & \text{if } y > 0 \\ [-1, 1] & \text{if } y = 0 \\ \{-1\} & \text{if } y < 0, \end{cases}

by Example 2.9. Since g is convex, hence regular, it follows from Proposition 2.14 that

    \partial G(k, b) = \begin{cases}
      \left\{ (2, 1)^T \right\} & \text{if } 2k + b - 2 > 0 \\
      \left\{ \lambda (2, 1)^T : -1 \le \lambda \le 1 \right\} & \text{if } 2k + b - 2 = 0 \\
      \left\{ -(2, 1)^T \right\} & \text{if } 2k + b - 2 < 0. \end{cases} \quad ■

Example 2.17
Now armed with some of the propositions given above, we can prove that k = 1 and b = 0 give the "least total error" solution for Example 2.1. For (x_i, y_i) = (i, i), i = 0, 1, 2, 3, 4, and (x_5, y_5) = (5, 0), the problem

    \min_{k, b} \sum_{i=0}^{5} |k x_i + b - y_i|

becomes

    \min_{k, b} G(k, b),

where G(k, b) = |5k + b| + Σ_{i=0}^{4} |ik + b − i|. By Propositions 2.12, 2.15 and Example 2.16, (k, b) is a solution to the problem if and only if

    0 \in \partial G(k, b) = \partial |5k + b| + \sum_{i=0}^{4} \partial |ik + b - i|,

which gives

    \lambda_5 \binom{5}{1} + \sum_{i=0}^{4} \lambda_i \binom{i}{1} = \binom{0}{0}, \qquad (2.3)

or in extended form,

    \lambda_1 + 2\lambda_2 + 3\lambda_3 + 4\lambda_4 + 5\lambda_5 = 0 \qquad (2.4)

and

    \lambda_0 + \lambda_1 + \lambda_2 + \lambda_3 + \lambda_4 + \lambda_5 = 0, \qquad (2.5)

where

    \lambda_5 = \begin{cases} 1 & \text{if } 5k + b > 0 \\ \lambda \in [-1, 1] & \text{if } 5k + b = 0 \\ -1 & \text{if } 5k + b < 0, \end{cases}
    \qquad
    \lambda_i = \begin{cases} 1 & \text{if } ik + b - i > 0 \\ \lambda \in [-1, 1] & \text{if } ik + b - i = 0 \\ -1 & \text{if } ik + b - i < 0 \end{cases}

for i = 0, 1, 2, 3, 4. It is easy to see that taking λ_i = −1 or 1 for all i = 0, 1, 2, 3, 4, 5 does not solve equation (2.4). Therefore, at least one of the equations ik + b − i = 0 (i = 0, 1, 2, 3, 4) or 5k + b = 0 must hold.
In the first case, ik + b — i = 0 for z = 0 implies th at 6 = 0 and hence for
i = 1,2,3 ,4 ,
/ 1 if t > 1 r 1 i( k > I \ - 1 if fc < 1, \ A e [-1 , 1] if 1.
Substituting Aj and Ag into (2.5) and (2.4), we have Ao = —5 for A: > 1 and
As = 2 for A: < 1, which are both impossible. It is obvious th a t when 6 = 0
and A: = 1, we can solve (2.3), th at is, there are Aj € [—1, 1] for i = 0,1 ,2 ,3 ,4
such th a t As = 1, and Aj satisfy (2.3), for example, (Ao, A%, Ag, A3, A^, As) =
f l — - — - — - — - 11
2’ 2’ 2’ 2’
In the second case, zA: + 6 — t = 0 for z = 1 implies th at 6 = 1 — A: and hence
{
- { k - 1) if z = 0 0 if z = 1 ( z - l ) ( t - l ) if z = 2 ,3 ,4 . Thus, we have r - 1 if A: > 1 r 1 if A: > 1 1 if A: < 1, ~ \ A 6 [-1 , 1] if A: < 1 and , . if *: > 1 • if i < 1 for i = 2,3 ,4 .Hence, we have Ai = —3 for fc > 1 from (2.5), and Ai = ^ , > 5 = j for
A- < 1 from (2.4) and (2.5) which are contradictory to |Ai| < 1, jAs] < 1. In this
case, we have the same conclusion k = l , b = 0. In other cases, the proof and
conclusion are same, and omitted. ■
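The conclusion of Example 2.17 can also be confirmed numerically; the brute-force check below (a sketch, not part of the dissertation) evaluates the total absolute error on a grid around (k, b) = (1, 0):

```python
import itertools

def total_abs_error(k, b):
    # Data of Example 2.17: (x_i, y_i) = (i, i) for i = 0..4 and (x_5, y_5) = (5, 0).
    pts = [(i, i) for i in range(5)] + [(5, 0)]
    return sum(abs(k * x + b - y) for x, y in pts)

best = total_abs_error(1, 0)   # = |5| + 0 + ... + 0 = 5
# A coarse grid search over nearby (k, b) pairs never beats (1, 0).
grid = [i / 10 for i in range(-30, 31)]
assert all(total_abs_error(k, b) >= best - 1e-12
           for k, b in itertools.product(grid, grid))
```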
2.2.3 General Definitions and Properties of the Matrix Derivative
We recall some definitions and properties of matrix functions which will be used in the following sections (Lancaster and Tismenetsky, 1985; Rodgers, 1980). Let A(t) = (a_ij(t))_{m×n}, where a_ij : R → R, i = 1, ..., m, j = 1, ..., n. Then

(a) [kA(t)]′ = kA′(t);
(b) [A(t) + B(t)]′ = A′(t) + B′(t), where B(t) = (b_ij(t))_{m×n};
(c) [A(t)B(t)]′ = A′(t)B(t) + A(t)B′(t), where B(t) is an n×m matrix;
(d) [A(f(t))]′ = A′(u)f′(t), where u = f(t) : R → R;
(e) [A^{-1}(t)]′ = −A^{-1}(t)A′(t)A^{-1}(t);
(f) [x^T(t)A(t)x(t)]′ = x^T(t)A′(t)x(t) + 2x^T(t)A(t)x′(t), where x(t) = (x_1(t), ..., x_n(t))^T and A(t) = (a_ij(t))_{n×n} is symmetric;
(g) ∫ A_{m×n}(t) dt = ( ∫ a_ij(t) dt )_{m×n}.
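Rule (e), the derivative of a matrix inverse, can be verified by finite differences; the matrix function A(t) below is a hypothetical example, invertible near t = 1:

```python
import numpy as np

def A(t):
    # A hypothetical 2x2 matrix function, invertible near t = 1.
    return np.array([[2.0 + t, t**2], [np.sin(t), 3.0]])

def A_prime(t):
    # Entrywise derivative of A(t).
    return np.array([[1.0, 2 * t], [np.cos(t), 0.0]])

t, h = 1.0, 1e-6
# Central finite-difference approximation of [A^{-1}(t)]'.
fd = (np.linalg.inv(A(t + h)) - np.linalg.inv(A(t - h))) / (2 * h)
# Rule (e): [A^{-1}]' = -A^{-1} A' A^{-1}.
exact = -np.linalg.inv(A(t)) @ A_prime(t) @ np.linalg.inv(A(t))
assert np.allclose(fd, exact, atol=1e-6)
```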
2.3 The Generalized Gradient of the Largest Eigenvalue of a Matrix-Valued Function
Proposition 2.18 [Danskin's Theorem] (Problem 9.13, p.99 of Clarke, Ledyaev, Stern and Wolenski, 1998)
Let a continuous function 𝒢 : X × M → R be given, where M is a compact metric space. Suppose that for a neighborhood U of a given point x ∈ X, the derivative 𝒢_x(x′, u) exists and is continuous (jointly) for (x′, u) ∈ U × M. We set

    H(x′) := max_{u∈M} 𝒢(x′, u).

Then H is Lipschitz near x, and one has

    ∂H(x) = co{ 𝒢_x(x, u) : u ∈ M(x) },

where M(x) := {u ∈ M : 𝒢(x, u) = H(x)} and co denotes the convex hull.
Example 2.19
Let us revisit Example 2.2. We find that

    ∂f_Q(0) = co{ (2x)|_{x=0}, 1 } = co{0, 1} = [0, 1]. ∎
Let D : R^{p²} → R^{p×p} be the p×p matrix-valued function that arranges a ∈ R^{p²} into a matrix D(a). We call a nonzero vector w = (w_1, w_2, ..., w_p)^T ∈ R^p an eigenvector of D(a) corresponding to the real eigenvalue λ (if it exists), i.e.,

    D(a)w = λw,  for w ≠ 0.

Assume that all eigenvalues of D(a) are real, and denote

    J(a) := the largest eigenvalue of D(a)
          = λ_max(D(a))
          = max_{||w||₂=1} (D(a)w)^T w.
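For a symmetric matrix the Rayleigh-quotient characterization of J(a) above is exact and easy to check numerically; a sketch with a random symmetric D (illustrative, not from the text):

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4))
D = (M + M.T) / 2                     # symmetric, so all eigenvalues are real

lam_max = np.linalg.eigvalsh(D)[-1]   # largest eigenvalue

# Rayleigh quotients over random unit vectors never exceed lambda_max ...
W = rng.standard_normal((4, 20000))
W /= np.linalg.norm(W, axis=0)
rayleigh = np.einsum('ij,ij->j', W, D @ W).max()
assert rayleigh <= lam_max + 1e-12
# ... and the top eigenvector attains lambda_max exactly.
w = np.linalg.eigh(D)[1][:, -1]
assert np.isclose(w @ D @ w, lam_max)
```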
Theorem 2.20 [The Generalized Gradient of J with Respect to m]
Suppose that the entries a_ij (i, j = 1, ..., p) of D(a) are all continuously differentiable functionals of m ∈ L²(S), and that for each m the eigenvalues of the matrix D = (a_ij(m))_{p×p} are all real. Then J_m := J(a(m)) is Lipschitz near m, and the generalized gradient of J_m at m is

    ∂J_m = co{ (D′_m w)^T w : w ∈ M(a(m)) },

where D′_m denotes the Fréchet derivative of D with respect to m and M(a(m)) is the set of maximizing w in the definition of J.
Proof.
By Proposition 2.18 (Danskin's theorem), J(a) = max_{||w||₂=1} (D(a)w)^T w is Lipschitz and

    ∂J(a) = co{ ∂/∂a [(D(a)w)^T w] : w ∈ M(a) } = co{ ∂/∂a [ Σ_{i,j=1}^{p} a_ij w_i w_j ] : w ∈ M(a) },

where M(a) is the set of maximizing w in J(a) = max_{||w||₂=1} (D(a)w)^T w. By Proposition 2.14, the chain rule, we have

    ∂J(a(m)) ⊆ ∂J(a) ∘ a′(m) = a′(m)^T ∂J(a)
             = co{ [(a_11, ..., a_1p, ..., a_p1, ..., a_pp)′_m]^T ∂/∂a [ Σ_{i,j=1}^{p} a_ij w_i w_j ] : w ∈ M(a(m)) }
             = co{ (D′_m w)^T w : w ∈ M(a(m)) }. ∎

2.4 The Normal Cone of the Abstract Constraint Set
Definition 2.21 (p.52, Clarke, 1983)
Let C be convex. The normal cone to C at x, denoted N_C(x), is the subset of X* given by {ζ ∈ X* : ⟨ζ, y − x⟩ ≤ 0 for all y ∈ C}.

Example 2.22
Let C = [−1, 1] ⊂ R. Then

    N_C(x) = [0, ∞)    if x = 1,
             {0}       if |x| < 1,
             (−∞, 0]   if x = −1.

Example 2.23
For C = R²₊ = {(x, y) : x ≥ 0, y ≥ 0}, C is convex. As a result,

    N_{R²₊}(x, y) = (−∞, 0] × (−∞, 0]   if x = 0, y = 0,
                    (−∞, 0] × {0}       if x = 0, y > 0,
                    {0} × (−∞, 0]       if x > 0, y = 0,
                    {0} × {0}           if x > 0, y > 0.
Now let

    Ω := { m ∈ L²(S) : m(x) ≥ 0 almost everywhere on S }

be the constraint set which we will use in our problem. We give the following expression for the normal cone of Ω.
Lemma 2.24
Let m_0 ∈ Ω. For any

    ζ ∈ N_Ω(m_0) = { ζ ∈ L²(S) : ⟨ζ, m − m_0⟩ ≤ 0 for all m ∈ Ω },

we have
• ζ(x) ≤ 0 almost everywhere on S;
• ζ(x) = 0 almost everywhere on E = {x ∈ S : m_0(x) > 0}.
Proof.
(a) If m_0(x) > 0 almost everywhere on S, then for any given measurable subset E ⊆ S there exists a function m_E ∈ L²(S) such that

    m_E(x) = 0   if x ∈ S and m_0(x) = 0,
    m_E(x) = 0   if x ∈ E and m_0(x) > 0,
    m_E(x) > 0   if x ∈ S\E and m_0(x) > 0,

and m(x) := m_0(x) ± m_E(x) ≥ 0 almost everywhere on S (take m_E bounded by m_0 where m_0 > 0). Therefore ±⟨ζ, m_E⟩ ≤ 0, that is,

    ⟨ζ, m_E⟩ = ∫_{{x∈S : m_0(x)>0} ∩ (S\E)} ζ(x) m_E(x) dx = 0.

If there existed a measurable set E ⊆ S such that the measure of S\E is not zero and ζ(x) > 0 (or ζ(x) < 0) for all x ∈ S\E, then ⟨ζ, m_E⟩ = ∫ ζ(x) m_E(x) dx ≠ 0, which is impossible. This implies that ζ(x) = 0 almost everywhere on S.

(b) If m_0(x) = 0 almost everywhere on S, then for all m ∈ L²(S) such that m(x) ≥ 0 for all x ∈ S,

    0 ≥ ⟨ζ, m − m_0⟩ = ∫_S ζ(x) m(x) dx,

which implies that ζ(x) ≤ 0 almost everywhere.

(c) If there are two sets P and Q such that P = {x ∈ S : m_0(x) > 0}, Q = {x ∈ S : m_0(x) = 0}, P ∪ Q = S, and the measures of both P and Q are not zero, then, in a manner similar to (a) and (b), we have ζ(x) = 0 almost everywhere on P and ζ(x) ≤ 0 almost everywhere on Q, which gives the required result. ∎
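A finite-dimensional analogue of Lemma 2.24 (Ω replaced by the nonnegative orthant of R^n, integrals by sums) can be checked directly; this is a sketch for intuition, not part of the dissertation's proof:

```python
import numpy as np

rng = np.random.default_rng(1)

# Discrete analogue: Omega = {m in R^n : m >= 0}. For m0 >= 0 the lemma says
# the normal cone is N(m0) = {zeta : zeta <= 0, zeta_i = 0 wherever m0_i > 0}.
m0 = np.array([0.0, 0.5, 0.0, 1.2])
zeta = np.array([-3.0, 0.0, -1.0, 0.0])    # satisfies both conditions

# <zeta, m - m0> <= 0 holds for every feasible m >= 0.
for _ in range(1000):
    m = np.abs(rng.standard_normal(4))
    assert zeta @ (m - m0) <= 1e-12

# A zeta violating the lemma (positive where m0 = 0) fails the inequality.
bad = np.array([1.0, 0.0, 0.0, 0.0])
m = m0.copy(); m[0] = 1.0                  # feasible perturbation
assert bad @ (m - m0) > 0
```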
2.5 Lagrange Multiplier Rule

The following nonsmooth Lagrange multiplier rule is a special case of Clarke's result.

Proposition 2.25 [Lagrange Multiplier Rule] (Theorem 6.1.1, p.228, Clarke, 1983)
Let X be a Banach space. Suppose that C ⊆ X is closed and convex, the function G is Lipschitz near any given point of C, and the functions h_j are continuously differentiable for j = 1, ..., k. Consider the optimization problem

    P : min G(x),  s.t. h_j(x) = 0,  x ∈ C,  for j = 1, ..., k.

Let x solve P. Then there exist real numbers λ̄ ≥ 0 and μ_j, not all zero, such that

    0 ∈ λ̄ ∂G(x) + Σ_{j=1}^{k} μ_j h′_j(x) + N_C(x),

where h′_j, ∂ and N_C(x) denote the Fréchet derivative of h_j, the Clarke generalized gradient and the normal cone of C, respectively.

Note that the normal cone is involved in the above Lagrange multiplier rule due to the presence of the abstract constraint C.
Example 2.26
We consider the problems of minimizing or maximizing the function f_Q(x) = max{1 + x², 1 + x} over C = [−1, 1], given in Example 2.2. For the problem

    max f_Q,  or equivalently  min −f_Q,

we know that the optimal solutions are x = −1 and x = 1. Although f′_Q does not vanish at these points, the Lagrange multiplier rule holds because of the normal cone term: we have

    0 ∈ −∂f_Q(x) + N_{[−1,1]}(x) = [−2, −1] + [0, ∞)   if x = 1,
                                   {2} + (−∞, 0]       if x = −1.

It is obvious that x = 0 solves the problem min f_Q(x), and it satisfies the Lagrange multiplier rule

    0 ∈ ∂f_Q(x) + N_{[−1,1]}(x) = [0, 1] + {0}. ∎
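The extrema claimed in Example 2.26 can be confirmed on a grid (assuming, as in Example 2.2, f_Q(x) = max{1 + x², 1 + x}):

```python
import numpy as np

f = lambda x: np.maximum(1 + x**2, 1 + x)
x = np.linspace(-1, 1, 200001)
vals = f(x)

# Maximum 2 is attained at the endpoints x = -1 and x = 1.
assert np.isclose(vals.max(), 2.0)
assert np.isclose(abs(x[vals.argmax()]), 1.0)

# Minimum 1 is attained at x = 0, where the subdifferential
# [0, 1] contains 0 (stationarity, cf. Example 2.19).
assert np.isclose(vals.min(), 1.0)
assert np.isclose(x[vals.argmin()], 0.0)
```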
Minimax Robust Designs for Misspecified Regression Models
3.1 Minimax Designs

In this chapter, we focus on deriving the minimax design for regression model (1.5) with one independent variable x. Without loss of generality, the design space is assumed to be S = [−1/2, 1/2]. Regressors z^T(x) = (z_1(x), z_2(x), ..., z_p(x)) are given, where the z_j(x) are continuous functions of x for j = 1, ..., p. Thus, the fitted regression model is

    E(Y|x) = z^T(x)θ,                                          (3.1)

and the true model is

    E(Y|x) = z^T(x)θ + f(x),                                   (3.2)

where

    θ ∈ R^p,  x ∈ S = [−1/2, 1/2] ⊂ R,

and

    f ∈ F := { f : ∫_S z(x) f(x) dx = 0,  ∫_S f²(x) dx ≤ η² },   (3.3)

where η² is a constant. The first condition in F says that f and z are orthogonal, which implies that the parameter θ is uniquely defined in model (3.2) under the condition that

    ∫_S z(x) z^T(x) dx
is non-singular. The second condition assumes that f is small.

Assuming additive errors, we fit the following model based on n observations:

    Y_j = z^T(x_j)θ + ε_j,  j = 1, ..., n,                     (3.4)

where the errors ε_j are uncorrelated with mean 0 and variance σ². From (1.2), we have

    Z = ( z_1(x_1)  z_2(x_1)  ⋯  z_p(x_1)
          z_1(x_2)  z_2(x_2)  ⋯  z_p(x_2)
            ⋮          ⋮            ⋮
          z_1(x_n)  z_2(x_n)  ⋯  z_p(x_n) ),

and the LSE of θ is

    θ̂ = (Z^T Z)^{-1} Z^T Y.

The covariance matrix of θ̂ is given by

    Cov(θ̂) = E{ [θ̂ − E(θ̂)][θ̂ − E(θ̂)]^T } = σ² (Z^T Z)^{-1}.

To study the properties of the LSE θ̂ of θ, we introduce the following notation. Let ξ(x) be the design measure for x. Define

    E(θ̂ − θ) = (Z^T Z)^{-1} Z^T (Zθ + f) − θ
              = (Z^T Z)^{-1} Z^T f
              = (n A_ξ)^{-1} n b(f, ξ)
              = A_ξ^{-1} b(f, ξ),

where f = (f(x_1), ..., f(x_n))^T. The mean squared error matrix is

    MSE(f, ξ) = E[ (θ̂ − θ)(θ̂ − θ)^T ]
              = Cov(θ̂) + [E(θ̂ − θ)][E(θ̂ − θ)]^T                       (3.5)
              = (σ²/n) A_ξ^{-1} + A_ξ^{-1} b(f, ξ) b^T(f, ξ) A_ξ^{-1}.   (3.6)
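The bias and MSE formulas above can be checked by simulation; the sketch below uses illustrative choices (n, the contamination f, θ, σ² are not from the text), with z(x) = (1, x):

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical setup: z(x) = (1, x), n design points on S = [-1/2, 1/2].
n = 50
xs = np.linspace(-0.5, 0.5, n)
Z = np.column_stack([np.ones(n), xs])
f = 0.1 * xs**3                     # a small contamination f(x)
theta = np.array([1.0, 2.0])
sigma2 = 0.25

# Bias of the LSE: E(theta_hat) - theta = (Z^T Z)^{-1} Z^T f.
bias = np.linalg.solve(Z.T @ Z, Z.T @ f)

# MSE matrix as in (3.5)-(3.6): covariance plus (bias)(bias)^T.
mse = sigma2 * np.linalg.inv(Z.T @ Z) + np.outer(bias, bias)

# Monte Carlo check of the bias formula.
reps = 20000
eps = rng.standard_normal((reps, n)) * np.sqrt(sigma2)
Y = Z @ theta + f + eps             # each row is one simulated sample
theta_hat = np.linalg.solve(Z.T @ Z, Z.T @ Y.T).T
assert np.allclose(theta_hat.mean(axis=0) - theta, bias, atol=0.02)
```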
We consider three kinds of loss functions, L_Q, L_D and L_A, which represent the integrated MSE of the fitted response Ŷ(x), the determinant of the MSE matrix, and the trace of the MSE matrix, respectively. Their explicit descriptions (Fedorov, 1972; Studden, 1977; Heo, Schmuland and Wiens, 2001) are

    L_Q(f, ξ) = ∫_S E[ (Ŷ(x) − E(Y|x))² ] dx
              = ∫_S E[ (z^T(x)θ̂ − (z^T(x)θ + f(x)))² ] dx
              = ∫_S E[ (z^T(x)(θ̂ − θ) − f(x))² ] dx
              = ∫_S z^T(x) MSE(f, ξ) z(x) dx − 2 ∫_S f(x) z^T(x) A_ξ^{-1} b(f, ξ) dx + ∫_S f²(x) dx
              = trace(MSE(f, ξ) A_0) + ∫_S f²(x) dx
              = (σ²/n) trace(A_ξ^{-1} A_0) + b^T(f, ξ) A_ξ^{-1} A_0 A_ξ^{-1} b(f, ξ) + ∫_S f²(x) dx,   (3.7)

where A_0 = ∫_S z(x) z^T(x) dx and the cross term vanishes because ∫_S z(x) f(x) dx = 0 by (3.3);

    L_D(f, ξ) = det(MSE(f, ξ));                                        (3.8)

    L_A(f, ξ) = trace(MSE(f, ξ)) = (σ²/n) trace(A_ξ^{-1}) + b^T(f, ξ) A_ξ^{-2} b(f, ξ).   (3.9)
We aim to construct designs minimizing the maximum (over f ∈ F) value of a loss. Heo, Schmuland and Wiens (2001) gave the following important Propositions 3.1 and 3.2.

Proposition 3.1
Suppose that ||z(x)|| is bounded in x on S and that for each a ≠ 0 the set {x : a^T z(x) = 0} has Lebesgue measure zero. If sup_F L(f, ξ) is finite, then ξ is absolutely continuous with respect to Lebesgue measure, with a density m satisfying ∫_S ||z(x)||² m²(x) dx < ∞.

Define the matrices

    K_ξ := ∫_S z(x) z^T(x) m²(x) dx,   H_ξ := A_ξ A_0^{-1} A_ξ,   G_ξ := K_ξ − H_ξ.

Proposition 3.2
Let ξ be as in Proposition 3.1. Denote by λ_max(A) the largest eigenvalue of a matrix A. Then

    ML_Q(ξ) := max_{f∈F} L_Q(f, ξ) = η² [ ν trace(A_ξ^{-1} A_0) + λ_max(K_ξ H_ξ^{-1}) ],            (3.10)
    ML_D(ξ) := max_{f∈F} L_D(f, ξ) = (σ²/n)^p det(A_ξ^{-1}) [ 1 + ν^{-1} λ_max(G_ξ A_ξ^{-1}) ],     (3.11)
    ML_A(ξ) := max_{f∈F} L_A(f, ξ) = η² [ ν trace(A_ξ^{-1}) + λ_max(G_ξ A_ξ^{-2}) ],               (3.12)

where ν = σ²/(n η²). Thus the density m(x) of a Q-, D- or A-optimal (minimax) design must minimize the right-hand side of (3.10), (3.11) or (3.12), respectively.
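The right-hand side of (3.10) can be evaluated numerically for a candidate design density. A minimal sketch (all constants illustrative; z(x) = (1, x)): for the uniform density m ≡ 1 we have A_ξ = A_0 = K_ξ, so K_ξH_ξ^{-1} = I and the expression reduces to η²(2ν + 1).

```python
import numpy as np

N = 100000
x = -0.5 + (np.arange(N) + 0.5) / N       # midpoint grid on S = [-1/2, 1/2]
z = np.stack([np.ones(N), x])             # z(x) = (1, x)^T

def mat(weight):
    # int_S z(x) z(x)^T weight(x) dx by the midpoint rule (dx = 1/N).
    return (z * weight) @ z.T / N

m = np.ones(N)                            # uniform density, integrates to 1
A0, A, K = mat(np.ones(N)), mat(m), mat(m**2)
H = A @ np.linalg.inv(A0) @ A             # H_xi = A_xi A_0^{-1} A_xi
lam = np.linalg.eigvals(K @ np.linalg.inv(H)).real.max()

eta2, nu = 1.0, 0.5                       # illustrative values of eta^2, nu
MLQ = eta2 * (nu * np.trace(np.linalg.solve(A, A0)) + lam)

# Uniform density: A = A0 = K, H = A0, K H^{-1} = I, trace(A^{-1} A0) = p = 2.
assert np.isclose(lam, 1.0)
assert np.isclose(MLQ, nu * 2 + 1)
```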
Remark: The eigenvalues of the matrices K_ξ H_ξ^{-1}, G_ξ A_ξ^{-1} and G_ξ A_ξ^{-2} are all real. In fact, if G_ξ is nonsingular, Wiens (1992) shows that

    ML_Q(ξ) = (σ²/n) tr[A_ξ^{-1} A_0] + η² λ_max[ I + G_ξ^{1/2} A_ξ^{-1} A_0 A_ξ^{-1} G_ξ^{1/2} ],

where I is an identity matrix. Noting that the matrix I + G_ξ^{1/2} A_ξ^{-1} A_0 A_ξ^{-1} G_ξ^{1/2} is symmetric, since G_ξ, A_ξ, A_0 are all symmetric, we know that its eigenvalues are all real. Suppose λ is an eigenvalue of I + G_ξ^{1/2} A_ξ^{-1} A_0 A_ξ^{-1} G_ξ^{1/2} = I + G_ξ^{1/2} H_ξ^{-1} G_ξ^{1/2}; then it satisfies

    | I + G_ξ^{1/2} H_ξ^{-1} G_ξ^{1/2} − λI | = 0.

Multiplying both sides of the previous equation by G_ξ^{1/2} (on the left and on the right, inside the determinant), we get

    | G_ξ + G_ξ H_ξ^{-1} G_ξ − λ G_ξ | = 0.

Thus

    | I + H_ξ^{-1} G_ξ − λI | = 0,

since G_ξ is nonsingular. Substituting G_ξ by K_ξ − H_ξ in the above equation, we have

    | K_ξ H_ξ^{-1} − λI | = 0.

Therefore λ is an eigenvalue of the matrix K_ξ H_ξ^{-1}. Since each step of the above argument can be reversed, the converse is also true: the matrices I + G_ξ^{1/2} A_ξ^{-1} A_0 A_ξ^{-1} G_ξ^{1/2} and K_ξ H_ξ^{-1} have the same eigenvalues, which are all real. Similarly, the eigenvalues of the matrices G_ξ A_ξ^{-1} and G_ξ A_ξ^{-2} are all real. If G_ξ is singular, Heo, Schmuland and Wiens (2001) proved the same result.
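The similarity argument in the Remark can be checked numerically. In the sketch below (illustrative density m(x) = 1 + x, z(x) = (1, x)), the matrices I + G_ξ^{1/2} A_ξ^{-1} A_0 A_ξ^{-1} G_ξ^{1/2} and K_ξ H_ξ^{-1} turn out to share the same spectrum; the identity eig(AB) = eig(BA) makes this hold even when G_ξ is singular, as it happens to be here.

```python
import numpy as np

N = 200000
x = -0.5 + (np.arange(N) + 0.5) / N
z = np.stack([np.ones(N), x])
m = 1.0 + x                                 # positive, asymmetric density

mat = lambda w: (z * w) @ z.T / N           # int z z^T w dx, midpoint rule
A0, A, K = mat(np.ones(N)), mat(m), mat(m**2)
H = A @ np.linalg.inv(A0) @ A               # H_xi = A_xi A_0^{-1} A_xi
G = K - H                                   # G_xi = K_xi - H_xi

# G is symmetric positive semidefinite, so G^{1/2} exists.
vals, vecs = np.linalg.eigh(G)
assert vals.min() > -1e-8
Ghalf = vecs @ np.diag(np.sqrt(np.clip(vals, 0, None))) @ vecs.T

M1 = np.eye(2) + Ghalf @ np.linalg.inv(A) @ A0 @ np.linalg.inv(A) @ Ghalf
M2 = K @ np.linalg.inv(H)
e1 = np.sort(np.linalg.eigvalsh(M1))
e2 = np.sort(np.linalg.eigvals(M2).real)
assert np.allclose(e1, e2, atol=1e-6)       # same (real) spectra
```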
We now consider the following problem to find a minimax design:

    min ML_Q(ξ)  (or ML_D(ξ), or ML_A(ξ))
    s.t. (1, m)_{L²} = ∫_S m(x) dx = 1,  m ∈ Ω,                  (3.13)

where (·, ·)_{L²} denotes the L²-inner product and Ω = {m ∈ L²(S) : m(x) ≥ 0 almost everywhere on S}. It is obvious that Ω is convex and closed. This is a nonsmooth optimization problem, since the largest-eigenvalue term in the objective is not a differentiable function. Thanks to the development of nonsmooth analysis, we can apply the nonsmooth Lagrange multiplier rule (Proposition 2.25), which is a special case of Clarke's result, to derive important results on minimax designs.
3.2 Some Fundamental Calculations

In this section, we present some fundamental calculations to be used in the next section. We first find the derivative of the functional trace(A_ξ^{-1} A_0) at m. From Definition 2.3, the limit

    lim_{t→0} (A_{ξ+th} − A_ξ)/t = ∫_S z(x) z^T(x) h(x) dx

for all h ∈ L²(S) implies that A′_ξ(x) = (z_i(x) z_j(x))_{p×p}, that is,

    A′_ξ(x) = z(x) z^T(x).

By the operation rules defined in Section 2.2, we derive

    (trace(A_ξ^{-1} A_0))′(x) = trace( (A_ξ^{-1})′ A_0 ) = −trace( A_ξ^{-1} A′_ξ A_ξ^{-1} A_0 )
                              = −z^T(x) (A_ξ^{-1} A_0 A_ξ^{-1}) z(x).            (3.14)

Similarly, K′_ξ(x) = 2 z(x) z^T(x) m(x), and

    (K_ξ H_ξ^{-1})′(x) = K′_ξ(x) H_ξ^{-1} + K_ξ (H_ξ^{-1})′(x)
        = K′_ξ(x) H_ξ^{-1} − K_ξ H_ξ^{-1} (A_ξ A_0^{-1} A_ξ)′(x) H_ξ^{-1}
        = K′_ξ(x) H_ξ^{-1} − K_ξ H_ξ^{-1} ( A′_ξ(x) A_0^{-1} A_ξ + A_ξ A_0^{-1} A′_ξ(x) ) H_ξ^{-1}
        = 2 z(x) z^T(x) H_ξ^{-1} m(x) − K_ξ H_ξ^{-1} z(x) z^T(x) A_0^{-1} A_ξ H_ξ^{-1}
          − K_ξ H_ξ^{-1} A_ξ A_0^{-1} z(x) z^T(x) H_ξ^{-1}
        = z(x) z^T(x) M_1 m(x) + M_2 z(x) z^T(x) M_3,                            (3.15)

where we write the result symbolically with constant matrices M_1, M_2 and M_3, since each term has the form of a constant matrix times z(x)z^T(x) times a constant matrix. For a constant vector w,

    ( (K_ξ H_ξ^{-1})′(x) w )^T w = ( (z(x) z^T(x) M_1 m(x) + M_2 z(x) z^T(x) M_3) w )^T w.
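Equation (3.14) can be verified by a finite-difference check of the directional (Gâteaux) derivative of m ↦ trace(A_ξ^{-1} A_0); the density m, the direction h, and z(x) = (1, x) below are illustrative choices:

```python
import numpy as np

# Check of (3.14): the derivative of m -> trace(A_xi^{-1} A_0) in direction h
# is  -int_S z(x)^T (A^{-1} A_0 A^{-1}) z(x) h(x) dx.
N = 100000
x = -0.5 + (np.arange(N) + 0.5) / N
z = np.stack([np.ones(N), x])
mat = lambda w: (z * w) @ z.T / N          # int z z^T w dx, midpoint rule

m = 1.0 + x                                # a positive density on S
h = np.cos(3 * x)                          # an arbitrary direction in L^2(S)
A0 = mat(np.ones(N))

def obj(w):
    return np.trace(np.linalg.solve(mat(w), A0))   # trace(A_w^{-1} A_0)

t = 1e-6
fd = (obj(m + t * h) - obj(m - t * h)) / (2 * t)   # central difference

A = mat(m)
C = np.linalg.inv(A) @ A0 @ np.linalg.inv(A)
exact = -np.sum(np.einsum('in,ij,jn->n', z, C, z) * h) / N
assert np.isclose(fd, exact, rtol=1e-4)
```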
3.3 Main Analytical Results

The following theorems give the main analytical results for minimax designs.

Theorem 3.3
Suppose that for any nonzero constant matrix D the set {x : z^T(x) D z(x) = 0} has Lebesgue measure zero. If m solves problem (3.13), then m has the form

    m(x) = ( (z^T(x) B z(x) + d) / (z^T(x) D z(x)) )⁺            (3.16)

almost everywhere on S = [−1/2, 1/2], for suitable constant matrices B and D and a constant d, where a⁺ := max(a, 0).

Remark: It is easy to see that if z^T(x) = (1, x, ..., x^{p−1}), then the set {x : z^T(x) D z(x) = 0} has Lebesgue measure zero for any nonzero constant matrix D, since z^T(x) D z(x) is then a nonzero polynomial in x.

Here we only give the proof of Theorem 3.3 for the loss function L_Q. The proofs for the other two loss functions L_D and L_A are similar and therefore omitted.
Proof of Theorem 3.3:
Let m solve problem (3.13) for the loss function L_Q. The objective function (3.10) is locally Lipschitz near m, since trace(A_ξ^{-1} A_0) and λ_max(K_ξ H_ξ^{-1}) are locally Lipschitz near m by Theorem 2.20. From Proposition 2.25 (the Lagrange multiplier rule), there exist λ̄ ≥ 0 and μ ∈ R, not both zero, such that

    0 ∈ λ̄ ∂[ η² ( ν trace(A_ξ^{-1} A_0) + λ_max(K_ξ H_ξ^{-1}) ) ] + μ ∂[ (1, m)_{L²} − 1 ] + N_Ω(m).   (3.17)

Let us now prove that λ̄ ≠ 0. To the contrary, suppose that λ̄ = 0. Since ∂[ μ((1, m)_{L²} − 1) ] = {μ}, (3.17) becomes

    0 ∈ {μ} + N_Ω(m),

where μ ≠ 0. There exists ζ ∈ N_Ω(m) such that μ + ζ(x) = 0 for all x ∈ S; thus ζ(x) ≡ −μ, and ζ(x) < 0 for all x ∈ S by Lemma 2.24. By the definition of N_Ω(m), we have

    0 ≥ (ζ, m_1 − m)_{L²} = ∫_S ζ(x)(m_1(x) − m(x)) dx = −μ ∫_S (m_1(x) − m(x)) dx

for all m_1 ∈ Ω, that is, ∫_S m_1(x) dx ≥ ∫_S m(x) dx = 1 for all m_1 ∈ Ω. In particular, taking m_1(x) ≡ 0 gives 0 ≥ 1, which is impossible. Hence λ̄ ≠ 0.

Without loss of generality, assume that λ̄ = 1. By Propositions 2.7 and 2.12, (3.17) becomes

    0 ∈ { η² ν [trace(A_ξ^{-1} A_0)]′ } + η² ∂[ λ_max(K_ξ H_ξ^{-1}) ] + {μ} + N_Ω(m).   (3.18)

From (3.14) and Theorem 2.20, (3.18) implies that

    0 ∈ η² co{ ( (K_ξ H_ξ^{-1})′ w )^T w : w ∈ M(m) } − η² ν z^T(x) (A_ξ^{-1} A_0 A_ξ^{-1}) z(x) + {μ} + N_Ω(m).   (3.19)

By the definition of the convex hull, there exist a positive integer N, scalars λ_i ≥ 0 for i = 1, ..., N with Σ_{i=1}^{N} λ_i = 1, vectors w_i ∈ M(m) ⊂ R^p, and ζ ∈ N_Ω(m) such that

    0 = η² [ λ_1 ( (K_ξ H_ξ^{-1})′(x) w_1 )^T w_1 + ⋯ + λ_N ( (K_ξ H_ξ^{-1})′(x) w_N )^T w_N ]
        − η² ν z^T(x) (A_ξ^{-1} A_0 A_ξ^{-1}) z(x) + μ + ζ(x).

From (3.15),

    0 = η² [ λ_1 w_1^T ( z(x) z^T(x) M_1 m(x) + M_2 z(x) z^T(x) M_3 ) w_1 + ⋯
             + λ_N w_N^T ( z(x) z^T(x) M_1 m(x) + M_2 z(x) z^T(x) M_3 ) w_N ]
        − η² ν z^T(x) (A_ξ^{-1} A_0 A_ξ^{-1}) z(x) + μ + ζ(x).

By Lemma 2.24, ζ(x) = 0 almost everywhere on {x ∈ S : m(x) > 0}, so the identity above holds with ζ deleted there. Consequently, there exist constant matrices B and D and a constant d such that

    z^T(x) D z(x) m(x) − z^T(x) B z(x) − d = 0

almost everywhere on {x ∈ S : m(x) > 0}. Since for any nonzero constant matrix D the set {x : z^T(x) D z(x) = 0} has Lebesgue measure zero, m has the form (3.16), which gives the required result. ∎
We know that in the case of D-optimality the optimal design is provably symmetric (Wiens, 1992, 1993). For Q- and A-optimality symmetry has not been proven, but is certainly plausible. On the other hand, in practice many scientists and engineers are interested only in symmetric optimal designs. So we consider symmetric density functions, and modify problem (3.13) as follows:

    min ML_Q(ξ)  (or ML_D(ξ), or ML_A(ξ))
    s.t. (1, m)_{L²} := ∫_S m(x) dx = 1,                              (3.20)
         ||m − m⁻||²_{L²} := ∫_S [m(x) − m(−x)]² dx = 0,  m ∈ Ω,

where m⁻(x) := m(−x). The solution to problem (3.20) is given in the following theorem.
Theorem 3.4
If a symmetric density m solves problem (3.20), then m has the same form as the one in Theorem 3.3.

Proof:
We first calculate the derivative of ||m − m⁻||²_{L²} with respect to m. By Definition 2.3, for any h ∈ L²(S),

    lim_{t→0} (1/t) { ∫_S [ (m(x) + th(x)) − (m(−x) + th(−x)) ]² dx − ∫_S [m(x) − m(−x)]² dx }
    = lim_{t→0} (1/t) { ∫_S [ (m(x) − m(−x)) + t(h(x) − h(−x)) ]² dx − ∫_S [m(x) − m(−x)]² dx }
    = lim_{t→0} (1/t) { 2t ∫_S (m(x) − m(−x))(h(x) − h(−x)) dx + t² ∫_S [h(x) − h(−x)]² dx }
    = 2 ∫_S (m(x) − m(−x))(h(x) − h(−x)) dx
    = 2 ∫_S (m(x) − m(−x)) h(x) dx − 2 ∫_S (m(x) − m(−x)) h(−x) dx
    = 2 ∫_S (m(x) − m(−x)) h(x) dx + 2 ∫_S (m(x) − m(−x)) h(x) dx     [substituting x → −x in the second integral]
    = 4 (m − m⁻, h)_{L²},

that is,

    [ ∫_S (m(x) − m(−x))² dx ]′_m (x) = 4 [m(x) − m(−x)]

for all x ∈ S. This implies that if m is a symmetric solution, then, in a manner similar to the proof of Theorem 3.3, we have

    z^T(x) D z(x) m(x) − z^T(x) B z(x) − d + 4 μ_1 [m(x) − m⁻(x)] + ζ(x) = 0,

where ζ is the same as in Theorem 3.3. Hence

    z^T(x) D z(x) m(x) − z^T(x) B z(x) − d = 0

almost everywhere on E = {x ∈ S : m(x) > 0}, since m − m⁻ = 0 when m is symmetric and ζ = 0 almost everywhere on E. In conclusion, if m is a symmetric solution, then m has the same form as the one in Theorem 3.3. ∎
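The Gâteaux derivative computed in the proof of Theorem 3.4 can be checked by finite differences on a grid (illustrative m and h):

```python
import numpy as np

# Check: d/dt || (m + t h) - (m + t h)^- ||^2 at t = 0 equals 4 <m - m^-, h>,
# where m^-(x) := m(-x).
N = 100000
x = -0.5 + (np.arange(N) + 0.5) / N       # midpoint grid, symmetric about 0
flip = lambda g: g[::-1]                  # g(-x) on this symmetric grid

m = 1.0 + x + x**2                        # not symmetric
h = np.sin(2 * x) + 0.3                   # arbitrary direction

penalty = lambda g: np.sum((g - flip(g))**2) / N   # ||g - g^-||^2

t = 1e-6
fd = (penalty(m + t * h) - penalty(m - t * h)) / (2 * t)
exact = 4 * np.sum((m - flip(m)) * h) / N
assert np.isclose(fd, exact, rtol=1e-6)
```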
3.4 Examples

Example 3.5
Consider the approximately linear regression model

    y = θ_0 + θ_1 x + f(x) + ε,

with z^T(x) = (1, x). For the loss function L_Q, the minimax design was discussed in Huber (1975, 1981) (see also the next section), and the minimax density function is given by

    m(x) = (a x² + b)⁺                                             (3.21)

for x ∈ S. If we suppose that m(x) is symmetric, then from Theorem 3.4 the minimax density has the form

    m(x) = ( (a_1 x² + a_0) / (b_1 x² + b_0) )⁺,

which includes the solution (3.21). In Section 3.5 we obtain the exact values of a and b in (3.21). ∎
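A small numerical sketch of the normalization in Example 3.5: given an illustrative value of a (not the minimax value, which Section 3.5 derives), bisection finds the b that makes (ax² + b)⁺ integrate to 1 over S.

```python
import numpy as np

N = 200000
x = -0.5 + (np.arange(N) + 0.5) / N           # midpoint grid on S = [-1/2, 1/2]

def mass(a, b):
    m = np.clip(a * x**2 + b, 0.0, None)      # positive part (.)^+
    return m.sum() / N                        # int_S m(x) dx

a = 24.0                                      # illustrative slope parameter
lo, hi = -30.0, 30.0                          # mass(a, .) is nondecreasing in b
for _ in range(100):                          # bisection on b
    mid = (lo + hi) / 2
    lo, hi = (mid, hi) if mass(a, mid) < 1.0 else (lo, mid)
b = (lo + hi) / 2

assert abs(mass(a, b) - 1.0) < 1e-6
# Analytically, (24 x^2 - 3/2)^+ has unit mass (its support is |x| >= 1/4).
assert abs(b - (-1.5)) < 1e-3
```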
Example 3.6
Consider the quadratic model with no intercept,

    y = θ_0 x + θ_1 x² + f(x) + ε,

i.e., z_1(x) = x and z_2(x) = x². Theorem 3.4 implies that the symmetric minimax density has the form

    m(x) = ( (a_2 x⁴ + a_1 x² + a_0) / (b_2 x⁴ + b_1 x²) )⁺. ∎

Example 3.7
For the quadratic model with intercept,

    y = θ_0 + θ_1 x + θ_2 x² + f(x) + ε,

the symmetric minimax density has the form

    m(x) = ( (a_2 x⁴ + a_1 x² + a_0) / (b_2 x⁴ + b_1 x² + b_0) )⁺. ∎
Example 3.8
Consider the general polynomial regression model

    y = θ_0 + θ_1 x + ⋯ + θ_{p−1} x^{p−1} + f(x) + ε.

From the proof of Theorem 3.3, the minimax density m(x) satisfies

    ( Σ_{i=0}^{2p−2} b_i x^i ) m(x) − Σ_{i=0}^{2p−2} a_i x^i = 0

almost everywhere on {x ∈ S : m(x) > 0}, for constants a_i and b_i. For the symmetric minimax density we need a_i = b_i = 0 for i = 1, 3, ..., 2p−3. Thus

    m(x) = ( (a_{2p−2} x^{2p−2} + a_{2p−4} x^{2p−4} + ⋯ + a_2 x² + a_0) / (b_{2p−2} x^{2p−2} + b_{2p−4} x^{2p−4} + ⋯ + b_2 x² + b_0) )⁺.