

Publisher: Taylor & Francis


Communications in Statistics - Theory and Methods

Publication details, including instructions for authors and subscription information: http://www.tandfonline.com/loi/lsta20

More on local influence in principal components analysis

Baibing Li & Bart De Moor

ESAT-SISTA, Dept. of Electrical Engineering, Katholieke Universiteit Leuven, Kardinaal Mercierlaan, Leuven, 945 300, Belgium

Published online: 27 Jun 2007.

To cite this article: Baibing Li & Bart De Moor (1999) More on local influence in principal components analysis, Communications in Statistics - Theory and Methods, 28:10, 2487-2495, DOI: 10.1080/03610929908832432

To link to this article: http://dx.doi.org/10.1080/03610929908832432


Baibing Li and Bart De Moor

Key words: local influence; perturbation scheme; principal components analysis.

Consider an independent and identically distributed sample $x_1, \ldots, x_n \in \mathbb{R}^p$ and its sample covariance matrix $S$. The purpose of this note is to suggest other detection indexes for local influence on the $p$ distinct eigenvalues $\lambda_j$ ($j = 1, \ldots, p$) of $S$. The indexes are produced by modifying the perturbation scheme of Shi (1997). Extensions of this approach to the eigenvectors of $S$ are straightforward.

It is known that principal components analysis is quite sensitive to outliers and influential cases (Huber, 1981; Critchley, 1985; Shi, 1993). To avoid obtaining misleading results, identification of such cases is necessary. For this purpose, Critchley (1985) considered global influence analysis. Recently, another approach, local influence, was investigated by Shi (1997) as an extension of the likelihood approach for local influence analysis (Cook, 1986).

For a sample $x_1, \ldots, x_n \in \mathbb{R}^p$ from a population with known mean $\mu$ and unknown covariance matrix, Shi (1997) considered the following perturbation scheme

$$x_i(\omega) = \omega_i (x_i - \mu), \quad i = 1, \ldots, n, \tag{1}$$

with perturbation vector $\omega = [\omega_1, \ldots, \omega_n]^T$ and $\omega_i = 1 + \varepsilon h_i$ ($i = 1, \ldots, n$), $h = [h_1, \ldots, h_n]^T$ and $\|h\|_2 = 1$. When the population mean is unknown, Shi (1997) replaced $\mu$ by its sample version $\bar{x} = (x_1 + \cdots + x_n)/n$ and considered $\bar{x}$ as perturbation-free.

The generalized local influence functions of $\lambda_j$ are given by Shi (1997) based on the perturbation scheme (1):

$$GIF_S(\lambda_j; h) = \frac{2}{n} \sum_{i=1}^{n} h_i y_{ij}^2,$$

where $y_{ij} = (x_i - \bar{x})^T a_j$ and $a_j$ is an eigenvector associated with the eigenvalue $\lambda_j$ ($j = 1, \ldots, p$).

Consequently, $h_{\max}(\lambda_j)$, which maximizes $[GIF_S(\lambda_j; h)]^2$, satisfies

$$h_{\max}(\lambda_j) \propto [y_{1j}^2, \ldots, y_{nj}^2]^T.$$

This gives an index $I_S(x_i; \lambda_j)$ used to identify influential cases by plots of $I_S(x_i; \lambda_j)$ against case number:

$$I_S(x_i; \lambda_j) = y_{ij}^2, \quad i = 1, \ldots, n \text{ and } j = 1, \ldots, p.$$

In this note, for a sample $x_1, \ldots, x_n \in \mathbb{R}^p$ from a population with both unknown population mean $\mu$ and unknown covariance matrix, we suggest the following perturbation scheme

$$x_i(\omega) = \omega_i x_i, \quad i = 1, \ldots, n. \tag{2}$$

The main difference between the two perturbation schemes (1) and (2) is that the perturbation characterized by (2) may influence any of the sample moments, while the scheme (1) with the replacement of $\mu$ by $\bar{x}$ is a perturbation-free scheme for sample means (i.e. it is assumed that perturbation of each observation case does not affect sample means). We believe that when no prior information is available, the perturbation scheme (2) is more reasonable since, in general, minor changes of cases may influence sample means as well as other sample moments.
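The contrast between the two schemes can be made concrete with a short numerical sketch. The function names `cov_scheme1` and `cov_scheme2`, the divisor $n$ in the covariance, and the four data points are illustrative assumptions and are not taken from the paper. In particular, reading "perturbation-free" as "the covariance is formed about the unperturbed $\bar{x}$" is our interpretation of scheme (1).

```python
import numpy as np

def cov_scheme1(X, omega):
    """Scheme (1) with mu replaced by the unperturbed sample mean (assumed reading)."""
    n = X.shape[0]
    xbar = X.mean(axis=0)                # held fixed: treated as perturbation-free
    Z = omega[:, None] * (X - xbar)      # x_i(omega) = omega_i (x_i - xbar)
    return Z.T @ Z / n                   # second moment about the fixed mean

def cov_scheme2(X, omega):
    """Scheme (2): perturb the raw cases, then re-centre with the perturbed mean."""
    n = X.shape[0]
    Z = omega[:, None] * X               # x_i(omega) = omega_i x_i
    Zc = Z - Z.mean(axis=0)              # the sample mean moves with omega
    return Zc.T @ Zc / n

# Made-up illustration values (not the paper's data).
X = np.array([[2.0, 1.0], [3.0, 0.5], [1.5, 2.5], [3.5, 0.8]])
omega0 = np.ones(4)                      # no perturbation
omega = omega0.copy()
omega[2] += 0.05                         # slightly inflate the third case

S1, S2 = cov_scheme1(X, omega), cov_scheme2(X, omega)
mean_shift = (omega[:, None] * X).mean(axis=0) - X.mean(axis=0)
```

With `omega0` both functions return the ordinary sample covariance matrix. Once a weight departs from one, `mean_shift` is non-zero under scheme (2), and the two perturbed covariance matrices no longer coincide.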

Similar to Shi's approach, letting $\omega_i = 1 + \varepsilon h_i$ in (2), we consider the sample covariance matrix from the perturbed data (2) and its eigenvalues $\lambda_j(\omega)$. Local influence functions of $\lambda_j$ are then given by $[\partial \lambda_j(\omega)/\partial \omega]^T h$ at $\omega = [1, \ldots, 1]^T$, that is

$$GIF_M(\lambda_j; h) = GIF_S(\lambda_j; h) + \frac{2}{n}\, \nu_j \sum_{i=1}^{n} h_i y_{ij},$$

with $\nu_j = \bar{x}^T a_j$.

Consequently, $h_{\max}(\lambda_j)$ obtained by maximizing $[GIF_M(\lambda_j; h)]^2$ can be used for detecting locally influential cases. Since

$$h_{\max}(\lambda_j) \propto [y_{1j}^2 + \nu_j y_{1j}, \ldots, y_{nj}^2 + \nu_j y_{nj}]^T,$$

it leads to a detection index for eigenvalue $\lambda_j$ as follows:

$$I_M(x_i; \lambda_j) = y_{ij}^2 + \nu_j y_{ij}, \quad i = 1, \ldots, n \text{ and } j = 1, \ldots, p.$$
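The linear term can be traced back to the shift of the sample mean. The following is a sketch of the first-order calculation, not the authors' own derivation. It assumes the covariance divisor is $n$, that $\lambda_j$ is a simple eigenvalue, and that $a_j$ has unit norm.

```latex
\begin{align*}
x_i(\omega) &= (1 + \varepsilon h_i)\, x_i, \qquad
\bar{x}(\omega) = \bar{x} + \frac{\varepsilon}{n} \sum_{k=1}^{n} h_k x_k, \\
\left.\frac{\partial S(\omega)}{\partial \varepsilon}\right|_{\varepsilon = 0}
  &= \frac{1}{n} \sum_{i=1}^{n} h_i
     \left[ x_i (x_i - \bar{x})^T + (x_i - \bar{x})\, x_i^T \right], \\
\left.\frac{\partial \lambda_j(\omega)}{\partial \varepsilon}\right|_{\varepsilon = 0}
  &= a_j^T \left.\frac{\partial S(\omega)}{\partial \varepsilon}\right|_{\varepsilon = 0} a_j
   = \frac{2}{n} \sum_{i=1}^{n} h_i\, (a_j^T x_i)\, y_{ij}
   = \frac{2}{n} \sum_{i=1}^{n} h_i \left( y_{ij}^2 + \nu_j y_{ij} \right).
\end{align*}
```

The second line uses $\sum_i (x_i - \bar{x}) = 0$, which removes the terms containing the shifted mean. The last step uses $a_j^T x_i = y_{ij} + \nu_j$. Taking $h$ as the $i$-th unit vector shows that $(2/n)\, I_M(x_i; \lambda_j)$ is the first-order sensitivity of $\lambda_j$ to rescaling case $i$ alone.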


It is interesting to compare $I_M(x_i; \lambda_j)$ and $I_S(x_i; \lambda_j)$. $I_M(x_i; \lambda_j) = y_{ij}^2 + \nu_j y_{ij}$ consists of both linear and quadratic terms of $y_{ij}$ associated with the centralized observation $(x_i - \bar{x})$, while $I_S(x_i; \lambda_j)$ has only a quadratic term. $I_M(x_i; \lambda_j)$ reflects two types of effects of minor changes of cases: one for the sample mean and another for the sample second moment about the origin. The added effect depends on their relative magnitudes and signs. In general, for a small linear term $\nu_j y_{ij}$, $I_M(x_i; \lambda_j)$ and $I_S(x_i; \lambda_j)$ give similar detection results, while for a large linear term, their results may be quite different. It is clear that the difference between $I_M(x_i; \lambda_j)$ and $I_S(x_i; \lambda_j)$ originates from the perturbation schemes (i.e., whether sample means are perturbation-free).

In this section, we give two examples to illustrate the detection index $I_M(x_i; \lambda_j)$ developed above. We first discuss a two-dimensional problem to have a graphical view of the scatter of observation cases.

EXAMPLE 1. Artificial data with 15 cases and 2 variables given by TABLE 1. FIG. 1 gives a scatter plot for the data. For the largest eigenvalue $\lambda_1$ of the data, the only outlier, case 10 at the position (1.5, 2.5), is globally influential by Critchley's global influence function. FIG. 2 gives the plot of $I_M(x_i; \lambda_1)$ and $I_S(x_i; \lambda_1)$ against case number $i$ for local influence analysis. Obviously, from $I_S(x_i; \lambda_1)$, case 10 is a locally influential case, while it is not by $I_M(x_i; \lambda_1)$.
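As a rough check of this comparison, both indexes can be computed directly from their definitions. The sketch below is not the authors' code: the function name `detection_indexes` and the covariance divisor $n$ are assumptions, and the data are transcribed from TABLE 1 as they appear in this copy (including the first coordinate of case 2, printed as 2.11).

```python
import numpy as np

# TABLE 1 values as they appear in this copy of the paper.
X1 = [2.1, 2.11, 2.9, 2.7, 3.6, 3.0, 2.9, 2.2, 2.1, 1.5, 3.2, 3.3, 3.3, 3.5, 2.8]
X2 = [1.0, 1.3, 1.1, 0.4, 0.5, 0.5, 0.9, 0.6, 0.9, 2.5, 0.8, 1.2, 0.4, 0.9, 0.8]
X = np.column_stack([X1, X2])

def detection_indexes(X):
    """Return (eigenvalues, I_S, I_M); columns follow descending eigenvalues."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    xbar = X.mean(axis=0)
    Xc = X - xbar
    S = Xc.T @ Xc / n                     # divisor n is an assumption
    lam, A = np.linalg.eigh(S)            # ascending eigenvalues, orthonormal columns
    order = np.argsort(lam)[::-1]         # reorder so that lambda_1 comes first
    lam, A = lam[order], A[:, order]
    Y = Xc @ A                            # y_ij = (x_i - xbar)^T a_j
    nu = xbar @ A                         # nu_j = xbar^T a_j
    I_S = Y ** 2                          # quadratic term only
    I_M = Y ** 2 + nu * Y                 # quadratic plus linear term
    return lam, I_S, I_M

lam, I_S, I_M = detection_indexes(X)
for case in (5, 10, 11):
    print(case, round(I_S[case - 1, 0], 3), round(I_M[case - 1, 0], 3))
```

Flipping the sign of an eigenvector changes the sign of both $y_{ij}$ and $\nu_j$, so neither index depends on the sign convention returned by the eigensolver.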

TABLE 1. A set of artificial data

| Case No. | x1 | x2 |
|---|---|---|
| 1 | 2.1 | 1.0 |
| 2 | 2.11 | 1.3 |
| 3 | 2.9 | 1.1 |
| 4 | 2.7 | 0.4 |
| 5 | 3.6 | 0.5 |
| 6 | 3.0 | 0.5 |
| 7 | 2.9 | 0.9 |
| 8 | 2.2 | 0.6 |
| 9 | 2.1 | 0.9 |
| 10 | 1.5 | 2.5 |
| 11 | 3.2 | 0.8 |
| 12 | 3.3 | 1.2 |
| 13 | 3.3 | 0.4 |
| 14 | 3.5 | 0.9 |
| 15 | 2.8 | 0.8 |

FIG. 1. Artificial data: scatter plot.

FIG. 3. Relative error versus perturbation increment $\varepsilon$: case 5 (- -), 10 (-) and 11 (...).

In order to gain insight into this problem, we start out from the fundamental ideas of local influence and, for minor changes of a case, consider the impact on the largest eigenvalue $\lambda_1$. Specifically, for an increment $\varepsilon$ of case $i$ such that $x_i(\varepsilon) = (1 + \varepsilon) x_i$, we directly compute the largest eigenvalue $\lambda_1^{(i)}(\varepsilon)$ and its relative error $|\lambda_1 - \lambda_1^{(i)}(\varepsilon)| / \lambda_1$. Similar treatment was adopted by Cook (1986). FIG. 3 gives the plot of relative errors against the perturbation increment $\varepsilon$ for cases 5, 10 and 11. It can be seen that case 5 has a much stronger impact than cases 10 and 11. Moreover, cases 10 and 11 have almost equal effects. These observations agree with what $I_M(x_i; \lambda_1)$ indicates.

From the above analysis we conclude that minor changes of case 10 do not have strong impacts on $\lambda_1$ or, equivalently, $\lambda_1$ is not sensitive to minor changes of case 10.
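The direct computation described above is simple to reproduce. The sketch below is illustrative only: the function names, the covariance divisor $n$ and the grid of increments are assumptions (the paper does not state the range of $\varepsilon$ used in FIG. 3), and the data are again the TABLE 1 values as printed in this copy.

```python
import numpy as np

# TABLE 1 values as they appear in this copy of the paper.
X1 = [2.1, 2.11, 2.9, 2.7, 3.6, 3.0, 2.9, 2.2, 2.1, 1.5, 3.2, 3.3, 3.3, 3.5, 2.8]
X2 = [1.0, 1.3, 1.1, 0.4, 0.5, 0.5, 0.9, 0.6, 0.9, 2.5, 0.8, 1.2, 0.4, 0.9, 0.8]
X = np.column_stack([X1, X2])

def largest_eigenvalue(X):
    """Largest eigenvalue of the sample covariance matrix (divisor n assumed)."""
    Xc = X - X.mean(axis=0)
    S = Xc.T @ Xc / X.shape[0]
    return np.linalg.eigvalsh(S)[-1]      # eigvalsh returns ascending eigenvalues

def relative_error(X, i, eps):
    """|lambda_1 - lambda_1^(i)(eps)| / lambda_1 after rescaling case i (0-based)."""
    X = np.asarray(X, dtype=float)
    lam1 = largest_eigenvalue(X)
    Xp = X.copy()
    Xp[i] = (1.0 + eps) * Xp[i]           # x_i(eps) = (1 + eps) x_i
    return abs(lam1 - largest_eigenvalue(Xp)) / lam1

eps_grid = np.linspace(-0.1, 0.1, 21)     # assumed range of increments
curves = {case: [relative_error(X, case - 1, e) for e in eps_grid]
          for case in (5, 10, 11)}
```

Plotting each list in `curves` against `eps_grid` gives curves of the kind described for FIG. 3. For small $\varepsilon$ the relative error of case $i$ behaves like $|\varepsilon| \, (2/n) \, |I_M(x_i; \lambda_1)| / \lambda_1$, which ties this direct computation to the detection index.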

Further analysis on $I_M(x_i; \lambda_1)$ shows that for case 10, its first and second order effects $\nu_1 y_{10,1}$ and $y_{10,1}^2$ have almost the same (relatively large) absolute values but opposite signs, which leads their impacts to cancel out.

Somewhat interesting is that, by similar analysis, we can see that case 10 is a globally as well as a locally influential case if it is moved to (5.5, 1).

Next, we consider a more complicated practical example.

FIG. 4. Detection indexes $I_S$ (-) and $I_M$ (- -) against case number.

FIG. 5. Relative errors versus perturbation increment $\varepsilon$: case 4 (- -.), 11 (-) and 13 (...).

EXAMPLE 2. Kendall's soil composition data (1975, Table 2.1).

Kendall (1975) investigated a set of soil composition data. There are 20 observations and 4 variables including silt content, clay content, organic matter, and acidity on the pH scale. This set of data was also investigated by Critchley (1985) and Shi (1997).

The authors are grateful for the reviewers' valuable suggestions on the earlier version of this paper. The second author is a senior Research Associate with the FWO-Flanders. This work was supported by Concerted Research Action GOA-MIPS, the FWO project G.0256.97: Numerical Algorithms for Subspace System Identification, the FWO Research Communities: ICCOS and Advanced Numerical Methods for Mathematical Modeling, and the Belgian Government: Interuniversity Attraction Poles (IUAP P4-02 and P4-24).

Cook, R. D. (1986). "Assessment of local influence," J. R. Statist. Soc. B 48, 133-169.

Critchley, F. (1985). "Influence in principal components analysis," Biometrika 72, 627-636.

Huber, P. (1981). Robust Statistics. New York: Wiley.

Kendall, M. G. (1975). Multivariate Analysis. London: Griffin.

Shi, L. (1997). "Local influence in principal components analysis," Biometrika 84, 175-186.

Received January, 1998; Revised March, 1999.
