
A simple scheme for the analysis of variance


Citation for published version (APA):

Bosch, A. J. (1976). A simple scheme for the analysis of variance. (EUT report. WSK, Dept. of Mathematics and Computing Science; Vol. 76-WSK-06). Technische Hogeschool Eindhoven.

Document status and date: Published: 01/01/1976

Document Version:

Publisher’s PDF, also known as Version of Record (includes final page, issue and volume numbers)


by A.J. Bosch

Department of Mathematics (Onderafdeling der Wiskunde)

T.H.-Report 76-WSK-06

Abstract

In 1963-65 the author developed a simple and unified algorithm for the analysis of variance of mixed models for balanced data (orthogonal classifications). Although rules for setting up analysis of variance tables have been given in many places (for example Hicks [2], Scheffé [3] and Searle [4]), their interpretation in complex cases still presents difficulties for the uninitiated. Presentation of these earlier results, which did not reach publication except as an internal memorandum of the Department of Mathematics, therefore seems appropriate.

As soon as the model has been stated in our notation, the calculation of degrees of freedom, sums of squares and expected values of mean squares follows easily. The procedure applies to any combination of any number of crossed and nested classifications. Moreover, the notation forms a good starting point for writing a computer program.

1. Introduction and summary

In accordance with the greater part of the authoritative literature on this subject (cf. Cornfield and Tukey [1] and also Scheffé [3]) we apply the usual side conditions to random interactions when at least one of the factors involved is fixed. Recently (cf. Searle [4]) arguments have been put forward against this practice. In section 3 of the Appendix it is shown how our procedure can be modified when these side conditions are dropped.

The F-tests suggested by the expected mean squares column of the analysis of variance table are the correct ones under normality and supplementary symmetry conditions.

2. Definitions and notation

As usual we write

$$x_{i.} = \sum_j x_{ij}; \quad \sum_i x_{i.}^2 = \sum_i \Big(\sum_j x_{ij}\Big)^2; \quad x_{..} = \sum_{i,j} x_{ij}; \quad x_{..}^2 = \Big(\sum_{i,j} x_{ij}\Big)^2 \quad \text{etc.}$$

For factor $A_p$, $p = 1, 2, \ldots$:

population size: $N_p$

number of levels in sample: $n_p$, with $n_{pq} = n_p n_q$

total number of observations: $n$

correction term: $S_0 = x_{....}^2 / n$

crude sum of squares: $S_p = n_p \sum_k x_{..k..}^2 / n$ ($k$ = $p$-th subscript); $S_{12} = n_{12} \sum_{i,j} x_{ij..}^2 / n$ etc.

corrected sum of squares: $S_{p*} = S_p - S_0$

degrees of freedom: $n_{p*} = n_p - n_0 = n_p - 1$

mean square: $MS_{p*} = S_{p*} / n_{p*}$
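The definitions above can be sketched in code. The following is a minimal illustration, not the author's program: the function name `crude_ss`, the dictionary layout of the observations and the 2 x 3 numbers are all assumptions made for this example.

```python
from itertools import product

def crude_ss(x, shape, keep):
    """Crude sum of squares S_keep = n_keep * sum(marginal totals^2) / n.

    x     : dict mapping index tuples (i, j, ...) to observations
    shape : number of levels per factor, e.g. (2, 3)
    keep  : positions of the subscripts that are NOT summed out
    """
    n = 1
    for levels in shape:
        n *= levels
    # Marginal totals: sum over every subscript that is not kept.
    totals = {}
    for idx, value in x.items():
        key = tuple(idx[p] for p in keep)
        totals[key] = totals.get(key, 0.0) + value
    n_keep = 1
    for p in keep:
        n_keep *= shape[p]
    return n_keep * sum(t * t for t in totals.values()) / n

# Illustrative 2 x 3 layout (made-up numbers, not taken from the report).
shape = (2, 3)
values = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
x = {(i, j): values[i][j] for i, j in product(range(2), range(3))}

S0 = crude_ss(x, shape, keep=())        # correction term x..^2 / n
S1 = crude_ss(x, shape, keep=(0,))      # crude SS for the first factor
S12 = crude_ss(x, shape, keep=(0, 1))   # sum of all squared observations
S1_star = S1 - S0                       # corrected SS between levels of A1
```

With one observation per cell, `S12` reduces to the plain sum of squared observations, which is a convenient sanity check on the scaling factor $n_{12}/n$.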

3. The procedure

a. Sources of variation

A subscript provided with an asterisk (*) indicates a "between" comparison; a subscript without asterisk always indicates a "within" comparison. Thus $A_{1*}$ refers to "between the levels of factor $A_1$"; $A_{1*2*}$ refers to the "interaction between factors $A_1$ and $A_2$"; $A_{12*}$ refers to "between the levels of factor $A_2$ within factor $A_1$"; $A_{12*3*}$ refers to the "interaction between factors $A_2$ and $A_3$, within factor $A_1$"; $A_{1234*}$ indicates, in the case that "factor" $A_4$ refers to "replications within cells", the source "between replications within cells (of the $A_1A_2A_3$ classification)", etc.

The advantage of this notation is that it lends itself to symbolic calculations. To this effect, we define $A_{i*} = A_i - A_0$, and $A_0$ as "identity", i.e. $A_i A_0 = A_i$ and $A_0 A_0 = A_0$. If we further agree that the symbol $A_{1*2*}$ may also be written as a symbolic product,

$$A_{1*2*} = A_{1*}A_{2*}, \quad A_{12} = A_1 A_2, \quad A_{12*} = A_1 A_{2*} \quad \text{etc.,}$$

then we obtain:

$$A_{12} = A_1 A_2 = (A_0 + A_{1*})(A_0 + A_{2*}) = A_0 + A_{1*} + A_{2*} + A_{1*2*}.$$

For example, if factor $A_2$ is nested within factor $A_1$, the subscript 2 can never occur without the subscript 1 (without asterisk); thus $A_{2*}$ and $A_{1*2*}$ do not exist separately. These two symbols are then combined:

$$A_{2*} + A_{1*2*} = (A_0 + A_{1*})A_{2*} = A_1 A_{2*} = A_{12*},$$

which denotes the source "between the levels of $A_2$ within $A_1$".

Similarly, we have

$$A_{2*3*} + A_{1*2*3*} = A_1 A_{2*} A_{3*} = A_{12*3*}.$$

If $A_3$ is nested within $A_2$ and $A_2$ is nested within $A_1$, then the only meaningful symbols are $A_{1*}$, $A_{12*}$, $A_{123*}$. The relation

$$A_{123} = A_0 + A_{1*} + A_{2*} + A_{1*2*} + A_{3*} + A_{1*3*} + A_{2*3*} + A_{1*2*3*}$$

now becomes:

$$A_{123} = A_0 + A_{1*} + A_{12*} + A_{123*}.$$

Likewise, if $A_4$ represents the factor "replications" within cells, only $A_{1234*}$ can appear, since $A_4$ is nested in all other factors.
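The nesting rule can be turned into a small enumeration. This is a sketch under two assumptions that are not spelled out in the report: a factor never interacts with a factor it is nested in, and `nested_in` lists every factor a given factor is nested within (including indirect ones). The names `sources` and `symbol` are hypothetical.

```python
from itertools import combinations

def sources(factors, nested_in):
    """List the meaningful sources as (within, between) subscript tuples.

    factors   : factor labels in order, e.g. [1, 2, 3, 4]
    nested_in : dict factor -> set of ALL factors it is nested within
    """
    result = []
    for size in range(1, len(factors) + 1):
        for between in combinations(factors, size):
            # Skip combinations that pair a factor with one it is nested in.
            if any(nested_in.get(f, set()) & set(between) for f in between):
                continue
            # Unstarred subscripts: everything the starred factors are nested in.
            within = set()
            for f in between:
                within |= nested_in.get(f, set())
            result.append((tuple(sorted(within)), between))
    return result

def symbol(within, between):
    """Render e.g. ((1,), (2, 3)) as 'A12*3*'."""
    marks = {f: "" for f in within}
    marks.update({f: "*" for f in between})
    return "A" + "".join(f"{f}{marks[f]}" for f in sorted(marks))

# A2 nested in A1, A4 (replications) nested in all other factors.
nesting = {2: {1}, 4: {1, 2, 3}}
names = [symbol(w, b) for w, b in sources([1, 2, 3, 4], nesting)]

# Fully nested case: A3 within A2 within A1.
chain = [symbol(w, b) for w, b in sources([1, 2, 3], {2: {1}, 3: {1, 2}})]
```

For the first design the list contains the six sources other than $A_0$ that appear in the model equation of section 3c; for the chain it contains only $A_{1*}$, $A_{12*}$ and $A_{123*}$.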

b. The SS, df and MS

The formulae for computing the SS (corrected sums of squares) can be based on the same symbolic calculus. Thus we have:

$$S_{1*2*} = S_{1*}S_{2*} = (S_1 - S_0)(S_2 - S_0) = S_1S_2 - S_1 - S_2 + S_0 = S_{12} - S_1 - S_2 + S_0.$$

Similarly:

$$S_{12*} = S_1(S_2 - S_0) = S_{12} - S_1$$

and

$$S_{12*3*} = S_1(S_2 - S_0)(S_3 - S_0) = S_{123} - S_{12} - S_{13} + S_1, \quad \text{etc.}$$

The crude sums of squares on the right-hand side of these equations are all properly defined. $S_1S_2$ is replaced by $S_{12}$ etc., as we never actually multiply sums of squares.
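The symbolic expansion is mechanical enough to automate. The sketch below expands $S_{\text{within}} \prod (S_p - S_0)$ into signed crude terms; the function names `expand` and `as_formula` and the tuple representation of subscripts are assumptions made for illustration.

```python
from itertools import combinations

def expand(within, between):
    """Expand S_within * prod(S_p - S_0 for starred p) into signed crude terms.

    Returns a dict mapping a tuple of subscripts (one crude S) to +1 or -1;
    the empty tuple stands for S_0.
    """
    terms = {}
    for size in range(len(between) + 1):
        for chosen in combinations(between, size):
            # Each starred factor left out contributes a factor -S_0.
            sign = (-1) ** (len(between) - size)
            key = tuple(sorted(set(within) | set(chosen)))
            terms[key] = terms.get(key, 0) + sign
    return terms

def as_formula(terms):
    """Render the expansion, longest subscript first, e.g. 'S12 - S1 - S2 + S0'."""
    parts = []
    for key in sorted(terms, key=lambda k: (-len(k), k)):
        name = "S" + ("".join(map(str, key)) or "0")
        parts.append(("+ " if terms[key] > 0 else "- ") + name)
    return " ".join(parts).lstrip("+ ")

interaction = as_formula(expand((), (1, 2)))        # S_{1*2*}
nested = as_formula(expand((1,), (2,)))             # S_{12*}
nested_inter = as_formula(expand((1,), (2, 3)))     # S_{12*3*}
```

The same dictionary can later be evaluated against a table of crude sums of squares, so the expansion only has to be derived once per source.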

Similarly we have for the degrees of freedom, df:

for $S_{1*2*}$: $n_{1*2*} = n_{1*}n_{2*} = (n_1 - 1)(n_2 - 1)$

for $S_{12*}$: $n_{12*} = n_1 n_{2*} = n_1(n_2 - 1)$, etc.

Furthermore we have of course for the mean squares: MS = SS/df. Thus for example $MS_{1*2*} = S_{1*2*}/n_{1*2*}$ etc.
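As a small check of the df rule, the sketch below evaluates $n_{\text{within}} \prod (n_p - 1)$ for the level counts of the worked example in section 4 ($n_1 = 3$, $n_2 = 3$, $n_3 = 2$, $n_4 = 2$). The function name and the `(within, between)` tuples are illustrative assumptions.

```python
from math import prod

def degrees_of_freedom(within, between, levels):
    """Product of n_p over unstarred subscripts times (n_p - 1) over starred ones."""
    return prod(levels[p] for p in within) * prod(levels[p] - 1 for p in between)

levels = {1: 3, 2: 3, 3: 2, 4: 2}

# (within, between) pairs for A1*, A3*, A12*, A1*3*, A12*3*, A1234*.
design = [((), (1,)), ((), (3,)), ((1,), (2,)),
          ((), (1, 3)), ((1,), (2, 3)), ((1, 2, 3), (4,))]

df = [degrees_of_freedom(w, b, levels) for w, b in design]
total_df = sum(df)
n = prod(levels.values())
```

The degrees of freedom of all sources add up to $n - 1$, which mirrors the partition $n = n_0 + n_{1*} + \ldots$ with $n_0 = 1$.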

In addition to providing the proper expressions for SS and df for each source of variation, the symbolic notation also tells us which SS must be calculated. For example, if $A_2$ is nested within $A_1$, then $A_{2*}$ does not occur, but $A_{12*}$ does; consequently we compute $S_{12}$, but not $S_2$.

For the total (replicates included):

$$n = n_{1234} = n_1 n_2 n_3 n_4, \qquad n_* = n - n_0$$

$$A = A_{1234}, \qquad A_* = A - A_0$$

$$S = S_{1234} = \sum x_{ijkl}^2, \qquad S_* = S - S_0$$

c. The model equation

We take as example the model in section 4c:

$$a = \alpha_0 + \alpha_{1*} + \alpha_{3*} + a_{12*} + \alpha_{1*3*} + a_{12*3*} + a_{1234*}$$

The $\alpha$'s represent fixed-level effects, the $a$'s are random variables. $\alpha_0$ is the overall average, $a_{1234*}$ the error term. We assume that an interaction should be represented by a random variable if any factor in this interaction is represented by a random variable, although it is formally possible for this rule to fail. But models with random interaction between parametric factors are quite often appropriate (see also appendix 3).

Properly speaking we have to write the model in the following way:

$$a_{ijkl} = \alpha_0 + (\alpha_{1*})_i + (\alpha_{3*})_k + (a_{12*})_{ij} + (\alpha_{1*3*})_{ik} + (a_{12*3*})_{ijk} + (a_{1234*})_{ijkl}$$

But only the side restrictions necessitate writing the model in this way. For this model these side conditions are given in appendix 2.

d. The EMS

The notation EMS is used to denote the expected value of a mean square. Our problem is to determine which variances occur in the expressions EMS and to find their coefficients.

The scheme proposed in this paper is largely self-explanatory. We will only add a few comments.

It is well known that for a sample of size $n_p$ taken from a population of size $N_p$, $n$ times the variance of the sample mean is:

$$n \operatorname{var} \bar{x} = (1 - n_p/N_p)\,\frac{n}{n_p}\,\sigma_p^2,$$

which in our notation becomes $c_{p*}\sigma_{p*}^2$. Here $c_{p*} = 1 - n_p/N_p$ is the finite population correction, and $c_{p*}\sigma_{p*}^2$ is $n$ times the variance of the average effect of factor $A_p$.

If $N_p = \infty$, then $c_{p*} = 1$ and $A_p$ is a random factor. If $N_p = n_p$, i.e. if the sample comprises the entire population, then $c_{p*} = 0$ and $A_p$ is a fixed factor.

Again we take as example the model in section 4c. We introduce the corresponding correction vector

$$c = (c_0, c_{1*}, c_{3*}, c_{12*}, c_{1*3*}, c_{12*3*}, c_{1234*})$$

and variance vector

$$\sigma^2 = (\sigma_0^2, \sigma_{1*}^2, \sigma_{3*}^2, \sigma_{12*}^2, \sigma_{1*3*}^2, \sigma_{12*3*}^2, \sigma_{1234*}^2).$$

We define $c_{p*q*} := c_{p*}c_{q*}$, $c_{pq*} := c_p c_{q*}$ etc., with $c_p = 1$ for all $p$. In the variance vector each component is taken with its multiplier:

$$\sigma_{12*}^2 := n\,\sigma_{12*}^2/n_{12} = n_{34}\,\sigma_{12*}^2; \quad \sigma_{1*3*}^2 := n_{24}\,\sigma_{1*3*}^2 \quad \text{etc.}; \quad \sigma_0^2 := n\,\alpha_0^2$$

(the subscripts of $n$ are those not occurring in $\sigma^2$).
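The two ingredients of the EMS, the finite population correction and the multiplier attached to each variance, can be sketched as follows. The function names are hypothetical, and the level counts are those of the worked example in section 4 ($n_1 = 3$, $n_2 = 3$, $n_3 = 2$, $n_4 = 2$).

```python
import math
from math import prod

def correction(n_p, N_p):
    """Finite population correction c_p* = 1 - n_p / N_p."""
    return 1.0 - n_p / N_p

def multiplier(subscripts, levels):
    """n divided by the n_p of the subscripts occurring in the variance symbol."""
    n = prod(levels.values())
    return n // prod(levels[p] for p in subscripts)

levels = {1: 3, 2: 3, 3: 2, 4: 2}

c_random = correction(3, math.inf)  # infinite population: random factor
c_fixed = correction(3, 3)          # sample is the whole population: fixed factor

m_12 = multiplier((1, 2), levels)        # n_34, multiplier of sigma^2_{12*}
m_13 = multiplier((1, 3), levels)        # n_24, multiplier of sigma^2_{1*3*}
m_1234 = multiplier((1, 2, 3, 4), levels)
```

Intermediate cases with $n_p < N_p < \infty$ give a correction strictly between 0 and 1, which is why the report treats fixed and random factors as the two ends of one scale.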

Further we define for each effect the complement vector $\bar{c}$, for example $\bar{c}_{pq*}$, which means: cancel in the vector $c$ the subscripts $p$ and $q$, and put a 0 if a component does not contain both subscripts.

So for this model:

$$\bar{c}_{12*} = (0, 0, 0, 1, 0, c_{3*}, c_{34*})$$

$$\bar{c}_{3*} = (0, 0, 1, 0, c_{1*}, c_{12*}, c_{124*}) \quad \text{etc.}$$

Now we can give in formula for each MS the corresponding EMS:

$$EMS_{pq*} = \bar{c}_{pq*}\,\sigma^2 \quad \text{(the inner product of both vectors).}$$

In practice one writes in the heading the $\sigma^2$-vector (omitting the first component) and in each row the corresponding complement vector $\bar{c}$.
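The complement-vector rule can be sketched as a short routine. This is an illustrative reading of the rule, not the author's program: sources are written as hypothetical `(within, between)` tuples, `c_star` holds $c_{p*}$ per factor (0 for fixed, 1 for random, as in the design of section 4), and $c_p = 1$ for unstarred subscripts.

```python
from math import prod

def ems_coefficients(target, sources, levels, c_star):
    """Coefficient of each variance component in the EMS of `target`."""
    n = prod(levels.values())
    t_all = set(target[0]) | set(target[1])
    coeffs = {}
    for within, between in sources:
        subscripts = set(within) | set(between)
        if not t_all <= subscripts:
            continue  # component lacks a target subscript: complement entry is 0
        # Cancel the target subscripts; the remaining starred ones keep c_p*.
        c_bar = prod(c_star[p] for p in between if p not in t_all)
        # Multiplier: n_q over the subscripts NOT occurring in the component.
        weight = n // prod(levels[p] for p in subscripts)
        if c_bar:
            coeffs[(within, between)] = c_bar * weight
    return coeffs

levels = {1: 3, 2: 3, 3: 2, 4: 2}
c_star = {1: 0, 2: 1, 3: 0, 4: 1}   # batches and methods fixed, others random
A1, A3, A12 = ((), (1,)), ((), (3,)), ((1,), (2,))
A13, A123, A1234 = ((), (1, 3)), ((1,), (2, 3)), ((1, 2, 3), (4,))
design = [A1, A3, A12, A13, A123, A1234]

ems = {src: ems_coefficients(src, design, levels, c_star) for src in design}
```

For instance, `ems[A1]` maps $A_{1*}$ to 12, $A_{12*}$ to 4 and $A_{1234*}$ to 1, i.e. the EMS of batches is $\sigma_{1234*}^2 + 4\sigma_{12*}^2 + 12\sigma_{1*}^2$.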

Summarizing: if we insert $A$, $S$, $n$, $c$, $\sigma^2$ or $\bar{c}\,\sigma^2$ in the model, we obtain:

model: $a = \alpha_0 + \alpha_{1*} + \alpha_{3*} + a_{12*} + \alpha_{1*3*} + a_{12*3*} + a_{1234*}$

source: $A = A_0 + A_{1*} + A_{3*} + A_{12*} + A_{1*3*} + A_{12*3*} + A_{1234*}$

SS: $S = S_0 + S_{1*} + S_{3*} + S_{12*} + S_{1*3*} + S_{12*3*} + S_{1234*}$

df: $n = n_0 + n_{1*} + n_{3*} + n_{12*} + n_{1*3*} + n_{12*3*} + n_{1234*}$

correction vector: $c = (c_0, c_{1*}, c_{3*}, c_{12*}, c_{1*3*}, c_{12*3*}, c_{1234*})$

variance vector: $\sigma^2 = (\sigma_0^2, \sigma_{1*}^2, \sigma_{3*}^2, \sigma_{12*}^2, \sigma_{1*3*}^2, \sigma_{12*3*}^2, \sigma_{1234*}^2)$

EMS vector: $EMS = (\bar{c}_0, \bar{c}_{1*}, \bar{c}_{3*}, \bar{c}_{12*}, \bar{c}_{1*3*}, \bar{c}_{12*3*}, \bar{c}_{1234*})\,\sigma^2$

4. Worked-out example: a nested-factorial experiment with replications, mixed model

Consider three batches of material, from each of which a sample of three specimens is taken. An analyst performs duplicate determinations on each specimen by each of two methods of analysis. The following results were obtained:

b) Design

| Factor | Symbol | Type | Correction | Levels |
|---|---|---|---|---|
| batches | $A_1$ | fixed | $c_{1*} = 0$ | $n_1 = 3$ |
| specimens | $A_{2(1)}$ ¹) | random | $c_{2*} = 1$ | $n_2 = 3$ |
| methods | $A_3$ | fixed | $c_{3*} = 0$ | $n_3 = 2$ |
| duplicates | $A_{4(123)}$ | random | $c_{4*} = 1$ | $n_4 = 2$ |

¹) $A_{2(1)}$ means that factor $A_2$ is nested within $A_1$, $A_{4(123)}$ that $A_4$ is nested in all other factors. In this case, $n_2$ is the number of levels of factor $A_2$ within each level of $A_1$. The number of levels of factor A …

c) Model and partitioning

model: $a = \alpha_0 + \alpha_{1*} + \alpha_{3*} + a_{12*} + \alpha_{1*3*} + a_{12*3*} + a_{1234*}$

sources: $A = A_0 + A_{1*} + A_{3*} + A_{12*} + A_{1*3*} + A_{12*3*} + A_{1234*}$

SS: $S = S_0 + S_{1*} + S_{3*} + S_{12*} + S_{1*3*} + S_{12*3*} + S_{1234*}$

df: $n = n_0 + n_{1*} + n_{3*} + n_{12*} + n_{1*3*} + n_{12*3*} + n_{1234*}$

EMS: $\sigma^2(\bar{c}_0, \bar{c}_{1*}, \bar{c}_{3*}, \bar{c}_{12*}, \bar{c}_{1*3*}, \bar{c}_{12*3*}, \bar{c}_{1234*})$

d) Computations

$$S_0 = n_0\,x_{....}^2/n = 1 \cdot (696.0)^2/36 = 13456.00$$

$$S_1 = n_1 \sum x_{i...}^2/n = 3\{(241.5)^2 + \ldots + (221.9)^2\}/36 = 13472.05$$

$$S_3 = n_3 \sum x_{..k.}^2/n = 2\{(424.6)^2 + (271.4)^2\}/36 = 14107.95$$

$$S_{12} = n_{12} \sum x_{ij..}^2/n = 9\{(74.7)^2 + \ldots + (73.1)^2\}/36 = 13511.31$$

$$S_{13} = n_{13} \sum x_{i.k.}^2/n = 6\{(146.1)^2 + \ldots + (84.0)^2\}/36 = 14125.19$$

$$S_{123} = n_{123} \sum x_{ijk.}^2/n = 18\{(44.3)^2 + \ldots + (27.8)^2\}/36 = 14175.17$$

$$S = n \sum x_{ijkl}^2/n = 36\{(20.2)^2 + \ldots + (15.1)^2\}/36 = 14216.76$$

$$S_{1*} = S_1 - S_0 = 16.05$$

$$S_{3*} = S_3 - S_0 = 651.95$$

$$S_{12*} = S_{12} - S_1 = 39.26$$

$$S_{1*3*} = S_{13} - S_1 - S_3 + S_0 = 1.19$$

$$S_{12*3*} = S_{123} - S_{12} - S_{13} + S_1 = 10.72$$

$$S_* = S - S_0 = 760.76$$
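The arithmetic of the computations can be re-checked from the crude sums of squares quoted in the report. The sketch below only re-applies the symbolic expansions; the dictionary keys are an ad hoc naming, and $S_{1234*} = S - S_{123}$ is obtained from the same calculus with $A_4$ nested in all other factors.

```python
# Crude sums of squares as quoted in the computations of the report.
crude = {
    "0": 13456.00, "1": 13472.05, "3": 14107.95, "12": 13511.31,
    "13": 14125.19, "123": 14175.17, "1234": 14216.76,
}

# Corrected sums of squares via the symbolic expansions.
corrected = {
    "1*": crude["1"] - crude["0"],
    "3*": crude["3"] - crude["0"],
    "12*": crude["12"] - crude["1"],
    "1*3*": crude["13"] - crude["1"] - crude["3"] + crude["0"],
    "12*3*": crude["123"] - crude["12"] - crude["13"] + crude["1"],
    "1234*": crude["1234"] - crude["123"],
}

total = crude["1234"] - crude["0"]           # S_* = S - S_0
partition_gap = sum(corrected.values()) - total
```

The corrected sums of squares telescope to the total, so `partition_gap` vanishes up to floating-point rounding; this is the partition $S = S_0 + S_{1*} + \ldots + S_{1234*}$ in numerical form.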

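Each method has two rows, one for each duplicate determination. A question mark stands for a digit that is illegible in the source.

| specimen | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|
| batch | 1 | 1 | 1 | 2 | 2 | 2 | 3 | 3 | 3 |
| method I | 20.2 | 26.2 | 24.? | 22.0 | 22.6 | 22.9 | 23.1 | 23.7 | 23.5 |
| method I | 24.1 | 26.9 | 23.? | 23.5 | 24.6 | 25.0 | 22.9 | 22.9 | 21.8 |
| method II | 16.2 | 18.0 | 15.4 | 16.1 | 14.0 | 13.7 | 16.1 | 12.2 | 12.7 |
| method II | 14.2 | 19.1 | 12.5 | 1?.? | ?8.1 | 16.0 | ? | 13.3 | 15.1 |

e) Analysis

| Source | Symbol | SS | df | EMS |
|---|---|---|---|---|
| between batches | $A_{1*}$ | 16.05 | 2 | $\sigma_e^2 + 4\sigma_{12*}^2 + 12\sigma_{1*}^2$ |
| between methods | $A_{3*}$ | 651.95 | 1 | $\sigma_e^2 + 2\sigma_{12*3*}^2 + 18\sigma_{3*}^2$ |
| between specimens within batches | $A_{12*}$ | 39.26 | 6 | $\sigma_e^2 + 4\sigma_{12*}^2$ |
| batches x methods | $A_{1*3*}$ | 1.19 | 2 | $\sigma_e^2 + 2\sigma_{12*3*}^2 + 6\sigma_{1*3*}^2$ |
| specimens x methods within batches | $A_{12*3*}$ | 10.72 | 6 | $\sigma_e^2 + 2\sigma_{12*3*}^2$ |
| between duplicates within cells | $A_{1234*}$ | 41.59 | 18 | $\sigma_e^2$ |
| Total | $A_*$ | 760.76 | 35 | |

Here $\sigma_{1234*}^2 = \sigma_e^2$.

The EMS column follows from the scheme below, with $c_{1*} = c_{3*} = 0$ and $c_{2*} = c_{4*} = 1$:

| | $\sigma_{1*}^2$ | $\sigma_{3*}^2$ | $\sigma_{12*}^2$ | $\sigma_{1*3*}^2$ | $\sigma_{12*3*}^2$ | $\sigma_{1234*}^2$ |
|---|---|---|---|---|---|---|
| multiplier | 12 | 18 | 4 | 6 | 2 | 1 |
| $\bar{c}_{1*}$ | 1 | 0 | $c_{2*} = 1$ | $c_{3*} = 0$ | $c_{2*3*} = 0$ | $c_{234*} = 1$ |
| $\bar{c}_{3*}$ | 0 | 1 | 0 | $c_{1*} = 0$ | $c_{12*} = 1$ | $c_{124*} = 1$ |
| $\bar{c}_{12*}$ | 0 | 0 | 1 | 0 | $c_{3*} = 0$ | $c_{34*} = 1$ |
| $\bar{c}_{1*3*}$ | 0 | 0 | 0 | 1 | $c_{2*} = 1$ | $c_{24*} = 1$ |
| $\bar{c}_{12*3*}$ | 0 | 0 | 0 | 0 | 1 | $c_{4*} = 1$ |
| $\bar{c}_{1234*}$ | 0 | 0 | 0 | 0 | 0 | 1 |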

Appendix

1. One missing value

Suppose one (and only one) observation is missing. What will be the missing plot value? When we look in the literature, we only find a formula for the missing plot value in a two-way classification,

$$y = \frac{rR + cC - S}{(r-1)(c-1)},$$

and sometimes for a Latin square (see Hicks [2]),

$$y = \frac{n(R + C + T) - 2S}{(n-1)(n-2)}.$$

In our notation, however, we can give the formula for the missing plot value $y$ directly from the model equation.

Let the model be, for example, the one with sources $A_0$, $A_{1*}$, $A_{12*}$, $A_{3*}$, $A_{1*3*}$, $A_{12*3*}$, and let $x_{ijk}$ be missing. We use the same analogue:

$$x = x_0 + x_{1*} + x_{12*} + x_{3*} + x_{1*3*} + x_{12*3*}$$

where $x_{p*} = x_p - x_0$; $x_p = n_p x_{..k..}/n$ ($k$ = $p$-th subscript); $x_0 = n_0 x_{...}/n$; $x_{12} = n_{12} x_{ij.}/n$ etc. Again the same symbolic product applies: $x_{1*2*} = x_{12} - x_1 - x_2 + x_0$ etc. Then

$$y = -\frac{n\,x_{12*3*}}{n_{12*3*}} \quad \text{with } x_{123} = 0.$$

The model $a = \alpha_0 + \alpha_{1*} + \alpha_{2*}$ gives:

$$y = \frac{n(x_1 + x_2 - x_0)}{n_{1*2*}} = \frac{n_1 x_{i.} + n_2 x_{.j} - x_{..}}{n_{1*2*}} \quad \text{or} \quad \frac{rR + cC - S}{(r-1)(c-1)}$$

in the known notation.

In the case of a Latin square we have: $a = \alpha_0 + \alpha_{1*} + \alpha_{2*} + \alpha_{3*} + $ residual. The d.f. of the residual is $(n_1 - 1)(n_1 - 2)$. Thus

$$y = \frac{n(x_0 + x_{1*} + x_{2*} + x_{3*})}{(n_1 - 1)(n_1 - 2)} = \frac{n_1(x_{i..} + x_{.j.} + x_{..k}) - 2x_{...}}{(n_1 - 1)(n_1 - 2)}$$

or, in usual notation,

$$y = \frac{n(R + C + T) - 2S}{(n-1)(n-2)}.$$
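The two classical formulas can be checked on data that follow an additive model exactly, where the estimate must reproduce the removed value. The sketch below is illustrative only: the function names and both small tables are made up, the missing cell is marked with `None`, and all totals are taken over the available observations.

```python
def missing_two_way(table, i, j):
    """y = (r*R + c*C - S) / ((r-1)(c-1)) for the missing cell (i, j)."""
    r, c = len(table), len(table[0])
    row_total = sum(v for v in table[i] if v is not None)
    col_total = sum(row[j] for row in table if row[j] is not None)
    grand_total = sum(v for row in table for v in row if v is not None)
    return (r * row_total + c * col_total - grand_total) / ((r - 1) * (c - 1))

def missing_latin_square(square, treatment, i, j):
    """y = (n*(R + C + T) - 2*S) / ((n-1)(n-2)) for the missing cell (i, j)."""
    n = len(square)
    cells = [(a, b) for a in range(n) for b in range(n) if square[a][b] is not None]
    R = sum(square[a][b] for a, b in cells if a == i)
    C = sum(square[a][b] for a, b in cells if b == j)
    T = sum(square[a][b] for a, b in cells if treatment[a][b] == treatment[i][j])
    S = sum(square[a][b] for a, b in cells)
    return (n * (R + C + T) - 2 * S) / ((n - 1) * (n - 2))

# Additive 3 x 2 table (row effect + column effect); the true value is 21.
table = [[10.0, 20.0], [11.0, None], [13.0, 23.0]]
y_two_way = missing_two_way(table, 1, 1)

# Additive 3 x 3 Latin square: row + column + treatment effect; true value 0.
treatment = [[(a + b) % 3 for b in range(3)] for a in range(3)]
full = [[a + 10 * b + 100 * treatment[a][b] for b in range(3)] for a in range(3)]
square = [row[:] for row in full]
square[0][0] = None
y_latin = missing_latin_square(square, treatment, 0, 0)
```

Because both data sets have zero residuals, the estimates coincide with the removed observations; with real data the formula returns the value that makes the residual of the missing cell zero.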

2. The side conditions

For the model in section 3c these conditions are (see also Scheffé [3] p. 275):

$$(\alpha_{1*})_{.} = 0; \quad (\alpha_{3*})_{.} = 0; \quad (\alpha_{1*3*})_{.j} = (\alpha_{1*3*})_{i.} = 0 \quad \text{for all } i, j;$$

$$(a_{12*3*})_{ij.} = 0 \quad \text{for all } i, j.$$

A dot denotes summation. Summation is only over subscripts with asterisks, and over the entire population if finite, e.g. $(\alpha_{1*})_{.} = \sum_{i=1}^{N_1} (\alpha_{1*})_i$.

$(a_{12*})_{ij}$ are independent $(0, \sigma_{12*}^2)$ variables for all $i, j$.

$(a_{1234*})_{ijkl}$ are independent $(0, \sigma_{1234*}^2)$ variables for all $i, j, k, l$.

For the fixed effects, $\sigma_{1*}^2$, $\sigma_{3*}^2$ and $\sigma_{1*3*}^2$ are defined over the population levels. ¹)

¹) Some authors use a modified symbol such as $[\sigma^2]$ to indicate that this is not a true variance. I prefer to reduce the number of symbols if not confusing. In this case one …

3. Independent interactions

The calculation of the EMS in section 3d applies when it is assumed that mixed interactions (in which some factors are random and others fixed) are represented by random variables which are not independent, but sum to zero with respect to each parametric factor. We saw $(a_{12*3*})_{ij.} = 0$. If the $(a_{12*3*})_{ijk}$ are mutually independent random variables, we denote this in our model by underlining this variable: $\underline{a}_{12*3*}$.

For this case the method of calculation of the EMS needs only slight modification: the corresponding correction $c_{12*3*}$ loses its asterisks and becomes $c_{123}$.

References

[1] J. Cornfield and J.W. Tukey (1956), Average values of mean squares in factorials, Annals of Mathematical Statistics 27, pp. 907-949.

[2] Ch.R. Hicks (1964), Fundamental concepts in the design of experiments, Holt, Rinehart and Winston, New York, pp. 153-174.

[3] H. Scheffé (1959), The analysis of variance, Wiley, New York, pp. 261-288.

[4] S.R. Searle (1971), Linear models, Wiley, New York, pp. 400-404.
