Publisher: Taylor & Francis
Communications in Statistics - Theory and Methods
Publication details, including instructions for authors and subscription information: http://www.tandfonline.com/loi/lsta20
More on local influence in principal components analysis
Baibing Li (a) & Bart De Moor (a)
(a) ESAT-SISTA, Dept. of Electrical Engineering, Katholieke Universiteit Leuven, Kardinaal Mercierlaan 94, B-3001 Leuven, Belgium
Published online: 27 Jun 2007.
To cite this article: Baibing Li & Bart De Moor (1999) More on local influence in principal components analysis, Communications in Statistics - Theory and Methods, 28:10, 2487-2495, DOI: 10.1080/03610929908832432
To link to this article: http://dx.doi.org/10.1080/03610929908832432
Baibing Li and Bart De Moor
Key words: local influence; perturbation scheme; principal components analysis.
2488 LI AND DE MOOR
Consider an independent and identically distributed sample $x_1, \ldots, x_n \in \mathbb{R}^p$ and its sample covariance matrix $S$. The purpose of this note is to suggest other detection indexes for local influence on the $p$ distinct eigenvalues $\lambda_j$ ($j = 1, \ldots, p$) of $S$. The indexes are produced by modifying the perturbation scheme of Shi (1997). Extensions of this approach to the eigenvectors of $S$ are straightforward.

It is known that principal components analysis is quite sensitive to outliers and influential cases (Huber, 1981; Critchley, 1985; Shi, 1993). To avoid obtaining misleading results, identification of such cases is necessary. For this purpose, Critchley (1985) considered global influence analysis. Recently, another approach, local influence, was investigated by Shi (1997) as an extension of the likelihood approach to local influence analysis (Cook, 1986).
For a sample $x_1, \ldots, x_n \in \mathbb{R}^p$ from a population with known mean $\mu$ and unknown covariance matrix, Shi (1997) considered the following perturbation scheme

$$x_i(\omega) = \omega_i (x_i - \mu) \quad \text{for } i = 1, \ldots, n \qquad (1)$$

with perturbation vector $\omega = [\omega_1, \ldots, \omega_n]^T$ and $\omega_i = 1 + \varepsilon h_i$ ($i = 1, \ldots, n$), $h = [h_1, \ldots, h_n]^T$ and $\|h\|_2 = 1$. When the population mean is unknown, Shi (1997) replaced $\mu$ by its sample version $\bar{x} = (x_1 + \cdots + x_n)/n$ and considered $\bar{x}$ as perturbation-free. The generalized local influence functions of $\lambda_j$ are given by Shi (1997) based on the perturbation scheme (1):

$$GIF_S(\lambda_j; h) = \frac{2}{n} \sum_{i=1}^{n} h_i y_{ij}^2,$$

where $y_{ij} = (x_i - \bar{x})^T a_j$ and $a_j$ is an eigenvector associated with the eigenvalue $\lambda_j$ ($j = 1, \ldots, p$). Consequently, the direction $h_{max}(\lambda_j)$ which maximizes $[GIF_S(\lambda_j; h)]^2$ satisfies

$$h_{max}(\lambda_j) \propto [y_{1j}^2, \ldots, y_{nj}^2]^T.$$

This gives an index $I_S(x_i; \lambda_j)$ used to identify influential cases by plots of $I_S(x_i; \lambda_j)$ against case number:

$$I_S(x_i; \lambda_j) = y_{ij}^2, \quad i = 1, \ldots, n \text{ and } j = 1, \ldots, p.$$
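As an illustration, the index $I_S(x_i; \lambda_j) = y_{ij}^2$ can be computed in a few lines. The following is a minimal sketch, not the authors' implementation: the function name `shi_index`, the use of divisor $n$ in the sample covariance, and the toy data are all assumptions made here.

```python
import numpy as np

def shi_index(X, j=0):
    """Return I_S(x_i; lambda_j) = y_ij^2 for every case i.

    X : (n, p) data matrix, one case per row.
    j : 0-based rank of the eigenvalue, 0 being the largest.
    """
    X = np.asarray(X, dtype=float)
    xbar = X.mean(axis=0)
    Xc = X - xbar                          # centred cases x_i - xbar
    S = Xc.T @ Xc / X.shape[0]             # sample covariance, divisor n (assumption)
    eigvals, eigvecs = np.linalg.eigh(S)   # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]      # indices in descending order
    a_j = eigvecs[:, order[j]]             # unit eigenvector for lambda_j
    y = Xc @ a_j                           # y_ij = (x_i - xbar)^T a_j
    return y ** 2

# Hypothetical toy data, not taken from the paper.
X_demo = np.array([[2.0, 1.0], [3.0, 0.5], [2.5, 0.8], [1.5, 2.5], [3.5, 0.4]])
I_S = shi_index(X_demo)
```

Plotting `I_S` against the case number gives the kind of index plot described above. With this normalization the average of the index over all cases equals $\lambda_j$, which is a convenient sanity check.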
In this note, for a sample $x_1, \ldots, x_n \in \mathbb{R}^p$ from a population with both unknown population mean $\mu$ and unknown covariance matrix, we suggest the following perturbation scheme

$$x_i(\omega) = \omega_i x_i \quad \text{for } i = 1, \ldots, n. \qquad (2)$$

The main difference between the two perturbation schemes (1) and (2) is that the perturbation characterized by (2) may influence any of the sample moments, while the scheme (1) with the replacement of $\mu$ by $\bar{x}$ is a perturbation-free scheme for sample means (i.e. it is assumed that perturbation of each observation case does not affect sample means). We believe that when no prior information is available, the perturbation scheme (2) is more reasonable since, in general, minor changes of cases may influence sample means as well as other sample moments.
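To make the contrast concrete, the perturbed sample moments under the two schemes can be written out as below. This is a sketch under stated assumptions: the covariance divisor is taken to be $n$, and the symbols $\bar{x}(\omega)$, $S_{(1)}(\omega)$ and $S_{(2)}(\omega)$ are notation introduced here for illustration, not the authors'.

```latex
% Scheme (1) with \mu replaced by \bar{x}: the sample mean is held fixed.
S_{(1)}(\omega) = \frac{1}{n} \sum_{i=1}^{n} \omega_i^2 (x_i - \bar{x})(x_i - \bar{x})^T .

% Scheme (2): the sample mean moves with the perturbation.
\bar{x}(\omega) = \frac{1}{n} \sum_{i=1}^{n} \omega_i x_i , \qquad
S_{(2)}(\omega) = \frac{1}{n} \sum_{i=1}^{n} \omega_i^2 x_i x_i^T
                  - \bar{x}(\omega)\, \bar{x}(\omega)^T .
```

At $\omega = [1, \ldots, 1]^T$ both expressions reduce to the unperturbed $S$. They differ only in how they respond to a change in $\omega$, which is where the sample mean enters under scheme (2).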
Similar to Shi's approach, letting $\omega_i = 1 + \varepsilon h_i$ in (2), we consider the sample covariance matrix from the perturbed data (2) and its eigenvalues $\lambda_j(\omega)$. Local influence functions of $\lambda_j$ are then given by $[\partial \lambda_j(\omega)/\partial \omega]^T h$ at $\omega = [1, \ldots, 1]^T$, that is

$$GIF_M(\lambda_j; h) = GIF_S(\lambda_j; h) + \frac{2}{n} v_j \sum_{i=1}^{n} h_i y_{ij}$$

with $v_j = \bar{x}^T a_j$. Consequently, $h_{max}(\lambda_j)$ obtained by maximizing $[GIF_M(\lambda_j; h)]^2$ can be used for detecting locally influential cases. Since

$$h_{max}(\lambda_j) \propto [y_{1j}^2 + v_j y_{1j}, \ldots, y_{nj}^2 + v_j y_{nj}]^T,$$

it leads to a detection index for eigenvalue $\lambda_j$ as follows:

$$I_M(x_i; \lambda_j) = y_{ij}^2 + v_j y_{ij}, \quad i = 1, \ldots, n \text{ and } j = 1, \ldots, p.$$
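The index $I_M$ can be checked numerically against its definition as a derivative of the eigenvalue. The sketch below is illustrative only and rests on assumptions not stated in the text: the covariance uses divisor $n$, only the largest eigenvalue $\lambda_1$ is treated, and the function names and simulated data are hypothetical. Under those assumptions, scaling case $i$ alone by $(1 + \varepsilon)$ should change $\lambda_1$ at rate $(2/n)\, I_M(x_i; \lambda_1)$.

```python
import numpy as np

def largest_eig(X):
    """Largest eigenvalue of the divisor-n sample covariance of X."""
    Xc = X - X.mean(axis=0)
    return np.linalg.eigvalsh(Xc.T @ Xc / X.shape[0])[-1]

def modified_index(X):
    """I_M(x_i; lambda_1) = y_i1^2 + v_1 * y_i1 for every case i."""
    xbar = X.mean(axis=0)
    Xc = X - xbar
    _, vecs = np.linalg.eigh(Xc.T @ Xc / X.shape[0])
    a1 = vecs[:, -1]          # eigenvector of the largest eigenvalue
    y = Xc @ a1               # y_i1 = (x_i - xbar)^T a_1
    v = xbar @ a1             # v_1 = xbar^T a_1
    return y ** 2 + v * y

def numerical_slope(X, i, eps=1e-6):
    """Central difference of lambda_1 when only case i is scaled by (1 + eps)."""
    Xp, Xm = X.copy(), X.copy()
    Xp[i] *= 1.0 + eps
    Xm[i] *= 1.0 - eps
    return (largest_eig(Xp) - largest_eig(Xm)) / (2.0 * eps)

# Hypothetical simulated data, not taken from the paper.
rng = np.random.default_rng(0)
X_demo = rng.normal(loc=[3.0, 1.0], scale=[0.6, 0.4], size=(15, 2))
I_M = modified_index(X_demo)
slopes = np.array([numerical_slope(X_demo, i) for i in range(len(X_demo))])
```

Comparing `slopes` with `2 / n * I_M` gives a direct check of the formula. Note also that $y_{ij}^2 + v_j y_{ij} = y_{ij}\,(a_j^T x_i)$, so the index is small whenever either the centred or the uncentred projection of a case on $a_j$ is close to zero.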
It is interesting to compare $I_M(x_i; \lambda_j)$ and $I_S(x_i; \lambda_j)$. $I_M(x_i; \lambda_j) = y_{ij}^2 + v_j y_{ij}$ consists of both linear and quadratic terms of $y_{ij}$ associated with the centralized observation $(x_i - \bar{x})$, while $I_S(x_i; \lambda_j)$ has only a quadratic term. $I_M(x_i; \lambda_j)$ reflects two types of effects of minor changes of cases: one on the sample mean and another on the sample second moment about the origin. The combined effect depends on their relative magnitudes and signs. In general, for a small linear term $v_j y_{ij}$, $I_M(x_i; \lambda_j)$ and $I_S(x_i; \lambda_j)$ give similar detection results, while for a large linear term their results may be quite different. It is clear that the difference between $I_M(x_i; \lambda_j)$ and $I_S(x_i; \lambda_j)$ originates from the perturbation schemes (i.e., whether sample means are perturbation-free).

In this section, we give two examples to illustrate the detection index $I_M(x_i; \lambda_j)$ developed above. We first discuss a two-dimensional problem so as to have a graphical view of the scatter of the observation cases.

EXAMPLE 1. Artificial data with 15 cases and 2 variables are given in TABLE 1. FIG. 1 gives a scatter plot of the data. For the largest eigenvalue $\lambda_1$ of the data, the only outlier, case 10 at the position (1.5, 2.5), is globally influential by Critchley's global influence function. FIG. 2 gives the plot of $I_M(x_i; \lambda_1)$ and $I_S(x_i; \lambda_1)$ against case number $i$ for local influence analysis. Obviously, from $I_S(x_i; \lambda_1)$, case 10 is a locally influential case, while it is not by $I_M(x_i; \lambda_1)$.
In order to gain insight into this problem, we start out from the fundamental ideas of local influence and, for minor changes of a case, consider the impacts on the largest eigenvalue $\lambda_1$. Specifically, for an increment $\varepsilon$ of case $i$ such that $x_i(\varepsilon) = (1 + \varepsilon) x_i$, we directly compute the largest eigenvalue $\lambda_1^{(i)}(\varepsilon)$ and its relative error

$$e_i(\varepsilon) = |\lambda_1 - \lambda_1^{(i)}(\varepsilon)| / |\lambda_1|.$$

Similar treatment was adopted by Cook (1986). FIG. 3 gives the plot of relative errors against the perturbation increment $\varepsilon$ for cases 5, 10 and 11. It can be seen that case 5 has a much stronger impact than cases 10 and 11. Moreover, cases 10 and 11 have almost equal effects. These observations agree with what $I_M(x_i; \lambda_1)$ indicates.

TABLE 1. A set of artificial data

Case No.   1     2     3     4     5     6     7     8
x1         2.1   2.11  2.9   2.7   3.6   3.0   2.9   2.2
x2         1.0   1.3   1.1   0.4   0.5   0.5   0.9   0.6

Case No.   9     10    11    12    13    14    15
x1         2.1   1.5   3.2   3.3   3.3   3.5   2.8
x2         0.9   2.5   0.8   1.2   0.4   0.9   0.8

FIG. 1. Artificial data: scatter plot.

FIG. 3. Relative error versus perturbation increment $\varepsilon$: case 5 (-.-), 10 (-) and 11 (...).
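The direct computation of the relative error $e_i(\varepsilon)$ can be sketched as follows. This is a hedged illustration rather than the authors' code: the helper names, the divisor-$n$ covariance, the toy data and the grid of $\varepsilon$ values are all assumptions, since the text does not state the range of increments used.

```python
import numpy as np

def largest_eig(X):
    """Largest eigenvalue of the divisor-n sample covariance of X."""
    Xc = X - X.mean(axis=0)
    return np.linalg.eigvalsh(Xc.T @ Xc / X.shape[0])[-1]

def relative_error(X, i, eps):
    """e_i(eps) = |lambda_1 - lambda_1^(i)(eps)| / |lambda_1|."""
    X = np.asarray(X, dtype=float)
    lam = largest_eig(X)
    Xp = X.copy()
    Xp[i] = (1.0 + eps) * Xp[i]     # perturb case i only: x_i(eps) = (1 + eps) x_i
    return abs(lam - largest_eig(Xp)) / abs(lam)

# Hypothetical toy data and increment grid, not taken from the paper.
X_demo = np.array([[2.0, 1.0], [3.0, 0.5], [2.5, 0.8], [1.5, 2.5], [3.5, 0.4]])
eps_grid = np.linspace(-0.1, 0.1, 21)
curves = {i: [relative_error(X_demo, i, e) for e in eps_grid]
          for i in range(len(X_demo))}
```

Each entry of `curves` is one line of a plot like FIG. 3. A case whose curve rises steeply away from $\varepsilon = 0$ is one to which $\lambda_1$ is locally sensitive.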
From the above analysis we conclude that minor changes of case 10 do not have strong impacts on $\lambda_1$ or, equivalently, $\lambda_1$ is not sensitive to minor changes of case 10. Further analysis of $I_M(x_i; \lambda_1)$ shows that for case 10, its first and second order effects $v_1 y_{10,1}$ and $y_{10,1}^2$ have almost the same (relatively large) absolute values but opposite signs, which leads their impacts to cancel out.

Somewhat interesting is that, by similar analysis, we can see that case 10 is a globally as well as a locally influential case if it is moved to (5.5, 1). Next, we consider a more complicated practical example.
FIG. 4. Detection indexes $I_S(x_i; \lambda_1)$ (-) and $I_M(x_i; \lambda_1)$ (- -) against case number.

FIG. 5. Relative errors versus perturbation $\varepsilon$: case 4 (-.-), 11 (-) and 13 (...).
EXAMPLE 2. Kendall's soil composition data (1975, Table 2.1).
Kendall (1975) investigated a set of soil composition data. There are 20 observations and 4 variables including silt content, clay content, organic matter, and acidity on the pH scale. This set of data was also investigated by Critchley (1985) and Shi (1997).
The authors are grateful for the reviewers' valuable suggestions on the earlier version of this paper. The second author is a senior Research Associate with the FWO-Flanders. This work was supported by Concerted Research Action GOA-MIPS, the FWO project G.0256.97: Numerical Algorithms for Subspace System Identification, the FWO Research Communities: ICCOS and Advanced Numerical Methods for Mathematical Modeling, and the Belgian Government: Interuniversity Attraction Poles (IUAP P4-02 and P4-24).
Cook, R. D. (1986). "Assessment of local influence," J. R. Statist. Soc. B 48, 113-169.
Critchley, F. (1985). "Influence on principal components analysis," Biometrika 72, 627-636.
Huber, P. (1981). Robust Statistics. New York: Wiley.
Kendall, M. G. (1975). Multivariate Analysis. London: Griffin.
Shi, L. (1997). "Local influence in principal components analysis," Biometrika 84, 175-186.