A simple scheme for the analysis of variance


Citation for published version (APA):

Bosch, A. J. (1976). A simple scheme for the analysis of variance. (EUT report. WSK, Dept. of Mathematics and Computing Science; Vol. 76-WSK-06). Technische Hogeschool Eindhoven.

Document status and date: Published: 01/01/1976

Document Version:

Publisher’s PDF, also known as Version of Record (includes final page, issue and volume numbers)


ONDERAFDELING DER WISKUNDE
DEPARTMENT OF MATHEMATICS

A simple scheme for the analysis of variance

by

A.J. Bosch

T.H.-Report 76-WSK-06

Abstract

In 1963-65 the author developed a simple and unified algorithm for the analysis of variance of mixed models for balanced data (orthogonal classifications). Although rules for setting up analysis of variance tables have been given in many places (for example Hicks [2], Scheffé [3] and Searle [4]), their interpretation in complex cases still presents difficulties for the uninitiated. Presentation of these earlier results, which, except for an internal memorandum of the Department of Mathematics, did not reach publication, therefore seems appropriate.

As soon as the model has been stated in our notation, the calculation of degrees of freedom, sums of squares and expected values of mean squares follows easily. The procedure applies to any combination of any number of crossed and nested classifications. Moreover, the notation forms a good starting point for writing a computer program.

1. Introduction and summary

In accordance with the greater part of the authoritative literature on this subject (cf. Cornfield and Tukey [1] and also Scheffé [3]) we apply the usual side conditions to random interactions when at least one of the factors involved is fixed. Recently (cf. Searle [4]) arguments have been put forward against this practice. In section 3 of the Appendix it is shown how our procedure can be modified when these side conditions are dropped.

The F-tests suggested by the expected mean squares column of the analysis of variance table are the correct ones under normality and supplementary symmetry conditions.

2. Definitions and notation

As usual we write

$$x_{i\cdot} = \sum_j x_{ij}; \qquad \sum_i x_{i\cdot}^2 = \sum_i \Bigl(\sum_j x_{ij}\Bigr)^2; \qquad x_{\cdot\cdot} = \sum_{i,j} x_{ij}; \qquad x_{\cdot\cdot}^2 = \Bigl(\sum_{i,j} x_{ij}\Bigr)^2, \text{ etc.}$$

For factor $A_p$, $p = 1, 2, \ldots$:

population size: $N_p$

number of levels in sample: $n_p$

total number of observations: $n$

correction term: $S_0 = x_{\cdot\cdot\cdot\cdot}^2 / n$

crude sum of squares: $S_p = n_p \sum_k x_{\cdot\cdot k\cdot\cdot}^2 / n$ ($k$ = $p$-th subscript)

with $n_{pq} = n_p n_q$: $S_{12} = n_{12} \sum_{i,j} x_{ij\cdot\cdot}^2 / n$, etc.

corrected sum of squares: $S_{p*} = S_p - S_0$

degrees of freedom: $n_{p*} = n_p - n_0 = n_p - 1$

mean square: $MS_{p*} = S_{p*} / n_{p*}$
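The definitions above can be tried out on a small data set. The following is a minimal sketch, not the author's program: the function name `crude_ss`, the dictionary layout of the data and the 2 x 3 toy observations are all assumptions made for illustration. It computes a crude sum of squares as (number of cells) times (sum of squared cell totals) divided by $n$, for balanced data only.

```python
def crude_ss(data, keep):
    """Crude sum of squares for the factors at positions `keep` (0-based).

    `data` maps index tuples (i, j, ...) to observations (balanced layout).
    Returns n_T * sum(cell totals ** 2) / n, with n_T the number of cells
    formed by the kept factors; keep=() gives the correction term S_0.
    """
    n = len(data)
    totals = {}
    for index, x in data.items():
        key = tuple(index[p] for p in keep)
        totals[key] = totals.get(key, 0.0) + x
    n_cells = len(totals)
    return n_cells * sum(t * t for t in totals.values()) / n


# Hypothetical 2 x 3 layout with one observation per cell (not from the report).
toy = {(0, 0): 1.0, (0, 1): 2.0, (0, 2): 3.0,
       (1, 0): 4.0, (1, 1): 5.0, (1, 2): 6.0}

S0 = crude_ss(toy, ())          # correction term
S1 = crude_ss(toy, (0,))        # crude SS for the first factor
S12 = crude_ss(toy, (0, 1))     # crude SS for the cells
S1_star = S1 - S0               # corrected SS, "between levels of A1"
n1_star = 2 - 1                 # degrees of freedom n_1 - 1
MS1_star = S1_star / n1_star    # mean square
```

For this toy layout the corrected sum of squares agrees with the familiar "sum of squared row totals over row size, minus correction term" computation.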

3. The procedure

a. Sources of variation

A subscript provided with an asterisk (*) indicates a "between" comparison; a subscript without asterisk always indicates a "within" comparison.

Thus $A_{1*}$ refers to "between the levels of factor $A_1$"; $A_{1*2*}$ refers to the "interaction between factors $A_1$ and $A_2$"; $A_{12*}$ refers to "between the levels of factor $A_2$ within factor $A_1$"; $A_{12*3*}$ refers to the "interaction between factors $A_2$ and $A_3$, within factor $A_1$"; $A_{1234*}$ indicates, in the case that "factor" $A_4$ refers to "replications within cells", the source "between replications within cells (of the $A_1A_2A_3$ classification)", etc.

The advantage of this notation is that it lends itself to symbolic calculations. To this effect, we define $A_{i*} = A_i - A_0$, with $A_0$ as "identity", i.e. $A_iA_0 = A_i$ and $A_0^2 = A_0$. If we further agree that the symbol $A_{1*2*}$ may also be written as a symbolic product, $A_{1*2*} = A_{1*}A_{2*}$, $A_{12} = A_1A_2$, $A_{12*} = A_1A_{2*}$ etc., then we obtain:

$$A_{12} = A_1A_2 = (A_0 + A_{1*})(A_0 + A_{2*}) = A_0 + A_{1*} + A_{2*} + A_{1*2*}.$$

For example, if factor $A_2$ is nested within factor $A_1$, the subscript 2 can never occur without the subscript 1 (without asterisk); thus $A_{2*}$ and $A_{1*2*}$ do not exist separately. These two symbols are then combined:

$$A_{2*} + A_{1*2*} = (A_0 + A_{1*})A_{2*} = A_1A_{2*} = A_{12*},$$

which denotes the source "between the levels of $A_2$ within $A_1$".

Similarly, we have

$$A_{2*3*} + A_{1*2*3*} = A_1A_{2*}A_{3*} = A_{12*3*}.$$

If $A_3$ is nested within $A_2$ and $A_2$ is nested within $A_1$, then the only meaningful symbols are $A_{1*}$, $A_{12*}$, $A_{123*}$. The relation

$$A_{123} = A_0 + A_{1*} + \underbrace{A_{2*} + A_{1*2*}}_{A_{12*}} + \underbrace{A_{3*} + A_{1*3*} + A_{2*3*} + A_{1*2*3*}}_{A_{123*}}$$

now becomes:

$$A_{123} = A_0 + A_{1*} + A_{12*} + A_{123*}.$$

Likewise, if $A_4$ represents the factor "replications" within cells, only $A_{1234*}$ can appear, since $A_4$ is nested in all other factors.
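The rules of this subsection can be turned into a short enumeration of all meaningful source symbols. This is a sketch under stated assumptions, not the author's program: the function name `sources` and the `nested_in` mapping are hypothetical, factor numbers are assumed to be single digits, and each factor's set must list every factor it is nested within. A set of starred factors is accepted only if none of them is nested within another starred factor; the factors they are nested in appear as subscripts without asterisk.

```python
from itertools import combinations


def sources(nested_in):
    """List source symbols such as '12*3*' for a balanced design.

    `nested_in` maps each factor number to the set of factors it is nested
    within (an empty set for a crossed factor).
    """
    factors = sorted(nested_in)
    result = []
    for r in range(1, len(factors) + 1):
        for starred in combinations(factors, r):
            outer = set().union(*(nested_in[f] for f in starred))
            if outer & set(starred):
                continue  # a starred factor may not contain another starred one
            subs = sorted(outer | set(starred))
            result.append("".join(f"{f}*" if f in starred else str(f)
                                  for f in subs))
    return result


# Design of the worked example: A2 nested in A1, A4 nested in all others.
design = {1: set(), 2: {1}, 3: set(), 4: {1, 2, 3}}
example_sources = sources(design)
```

For two crossed factors the function returns the main effects and the interaction; for the fully nested case $A_3$ in $A_2$ in $A_1$ it returns only `1*`, `12*` and `123*`, as stated in the text.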

b. The SS, df and MS

The formulae for computing the SS (corrected sums of squares) can be based on the same symbolic calculus. Thus we have:

$$S_{1*2*} = S_{1*}S_{2*} = (S_1 - S_0)(S_2 - S_0) = S_1S_2 - S_1 - S_2 + S_0 = S_{12} - S_1 - S_2 + S_0.$$

Similarly:

$$S_{12*} = S_1(S_2 - S_0) = S_{12} - S_1$$

and

$$S_{12*3*} = S_1(S_2 - S_0)(S_3 - S_0) = S_{123} - S_{12} - S_{13} + S_1, \text{ etc.}$$

The crude sums of squares on the right-hand side of these equations are all properly defined. $S_1S_2$ is replaced by $S_{12}$ etc., as we never multiply sums of squares.
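The symbolic product can be expanded mechanically. The sketch below is illustrative only; `parse` and `expand` are hypothetical helper names and single-digit factor numbers are assumed. Each starred subscript contributes a factor $(S_p - S_0)$, each plain subscript a factor $S_p$, and products of crude sums are read as a single crude sum with the combined subscripts.

```python
from itertools import combinations


def parse(symbol):
    """Split '12*3*' into plain subscripts (1,) and starred subscripts (2, 3)."""
    plain, starred = [], []
    for pos, ch in enumerate(symbol):
        if ch == "*":
            continue
        is_starred = symbol[pos + 1:pos + 2] == "*"
        (starred if is_starred else plain).append(int(ch))
    return tuple(plain), tuple(starred)


def expand(symbol):
    """Return {subscripts of crude SS: sign} for the corrected SS `symbol`.

    The empty tuple () stands for the correction term S_0.
    """
    plain, starred = parse(symbol)
    terms = {}
    for r in range(len(starred) + 1):
        for chosen in combinations(starred, r):
            key = tuple(sorted(plain + chosen))
            # every starred subscript left out contributes a factor -S_0
            terms[key] = (-1) ** (len(starred) - r)
    return terms


interaction = expand("1*2*")      # S_12 - S_1 - S_2 + S_0
nested = expand("12*")            # S_12 - S_1
nested_interaction = expand("12*3*")
```

The same expansion applies to any symbol of the notation, which is what makes it a convenient starting point for a program.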

Similarly we have for the degrees of freedom, df:

$$n_{1*2*} = n_{1*}n_{2*} = (n_1 - 1)(n_2 - 1),$$

$$n_{12*} = n_1n_{2*} = n_1(n_2 - 1), \text{ etc.}$$

Furthermore we have of course for the mean squares: MS = SS/df. Thus for example $MS_{1*2*} = S_{1*2*} / n_{1*2*}$ etc.
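The degrees of freedom follow the same product rule and are easy to compute from a symbol. A minimal sketch follows; the function name `degrees_of_freedom` is hypothetical, and the level counts are those of the worked example in section 4 ($n_1 = 3$, $n_2 = 3$, $n_3 = 2$, $n_4 = 2$).

```python
def degrees_of_freedom(symbol, levels):
    """Product of n_p for plain subscripts and (n_p - 1) for starred ones."""
    result = 1
    for pos, ch in enumerate(symbol):
        if ch == "*":
            continue
        n_p = levels[int(ch)]
        starred = symbol[pos + 1:pos + 2] == "*"
        result *= (n_p - 1) if starred else n_p
    return result


levels = {1: 3, 2: 3, 3: 2, 4: 2}
symbols = ["1*", "3*", "12*", "1*3*", "12*3*", "1234*"]
df = {s: degrees_of_freedom(s, levels) for s in symbols}

# The component df add up to n - 1, with n = 3 * 3 * 2 * 2 = 36.
total_df = sum(df.values())
```

A mean square is then simply the corrected sum of squares divided by the value returned here.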

In addition to providing the proper expressions for SS and df for each source of variation, the symbolic notation also tells us which SS must be calculated. For example, if $A_2$ is nested within $A_1$, then $A_{2*}$ does not occur, but $A_{12*}$ does; consequently we compute $S_{12}$, but not $S_2$.

Total (replicates included):

$$n = n_{1234} = n_1n_2n_3n_4, \qquad n_* = n - n_0;$$

$$A = A_{1234}, \qquad A_* = A - A_0;$$

$$S = S_{1234} = \sum x_{ijkl}^2, \qquad S_* = S - S_0.$$

c. The model equation

We take as example the model in section 4c:

$$a = \alpha_0 + \alpha_{1*} + \alpha_{3*} + a_{12*} + \alpha_{1*3*} + a_{12*3*} + a_{1234*}.$$

The $\alpha$'s represent fixed-level effects, the $a$'s are random variables. $\alpha_0$ is the overall average, $a_{1234*}$ the error term. We assume that an interaction should be represented by a random variable if any factor in this interaction is represented by a random variable, although it is formally possible for this rule to fail. But models with random interaction between parametric factors are quite often appropriate (see also appendix 3).

Properly speaking we have to write the model in the following way:

$$a_{ijkl} = \alpha_0 + (\alpha_{1*})_i + (\alpha_{3*})_k + (a_{12*})_{ij} + (\alpha_{1*3*})_{ik} + (a_{12*3*})_{ijk} + (a_{1234*})_{ijkl}.$$

But only the side restrictions necessitate writing the model in this way. For this model these side conditions are given in appendix 2.

d. The EMS

The notation EMS is used to denote the expected value of a mean square. Our problem is to determine which variances occur in the expressions EMS and to find their coefficients.

The scheme proposed in this paper is largely self-explanatory. We will only add a few comments.

It is well known that for a sample of size $n_p$ taken from a population of size $N_p$, $n$ times the variance of the sample mean is

$$n \operatorname{var} \bar{x} = \Bigl(1 - \frac{n_p}{N_p}\Bigr) \frac{n\sigma_p^2}{n_p},$$

which in our notation becomes $c_{p*}\bar\sigma^2_{p*}$. Here $c_{p*} = 1 - n_p/N_p$ is the finite population correction, and $\bar\sigma^2_{p*} = n\sigma_p^2/n_p$ is $n$ times the variance of the average effect of factor $A_p$.

If $N_p = \infty$ then $c_{p*} = 1$ and $A_p$ is a random factor. If $N_p = n_p$, i.e. if the sample comprises the entire population, then $c_{p*} = 0$ and $A_p$ is a fixed factor.
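The finite population correction can be checked by brute force on a tiny population. The sketch below is an illustration under assumptions: the population values are made up, and the population variance is taken with divisor $N_p - 1$, the convention under which the classical sampling formula holds. It enumerates every sample of size $n_p$ drawn without replacement and compares the variance of the sample mean with $(1 - n_p/N_p)\,\sigma_p^2/n_p$.

```python
from itertools import combinations
from math import isclose
from statistics import mean, pvariance, variance

population = [1.0, 2.0, 3.0, 4.0, 5.0]   # hypothetical finite population
n_p, N_p = 2, len(population)

# All samples of size n_p are equally likely, so the variance of the sample
# mean is the plain (population) variance of the list of sample means.
sample_means = [mean(s) for s in combinations(population, n_p)]
var_of_mean = pvariance(sample_means)

c_star = 1 - n_p / N_p                            # finite population correction
predicted = c_star * variance(population) / n_p   # variance() divides by N_p - 1

agrees = isclose(var_of_mean, predicted)
```

Setting `n_p` equal to `N_p` makes `c_star` zero (a fixed factor), while a very large population drives it towards one (a random factor).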

p p

Again we take as example the o.odel in section 4c.We introduce the corresponding correction vector c:= (co' c₁* ' c₃* ' c₁₂-'/:' c₁*3*' C

12*3*' c1234*) and variance

. 2: ( 2 " 2 2' 2' 2 2 2

voc~or 0 = 0 0 0 0 0 0 0 )

0 ' 1 * ' 3

*'

i 2 * ' 1 *3 * ' 1 27<-3

*'

1 23 4 * •

We define c := C C • C

*

=

c c oX- etc.

- p*q* p* q*' pq p q

2 2 / 2 " 2 2

°

12

*:=

n012 * n12 = 1134° 12*; °1-)('3* =: n

2401*3*

(the subscripts of n are those not occurring in

with c for all p.

p

"2 2 etc.; a == rJ.Ci •

20 0

(8)

Further we define for each effect the complement vector c, for example cpq* what means: cancel in the vector C the subscripts p and q and put an 0 if a

component does not contain both subscripts.

So is for this model: c

12*:: (0,0,0,1 ,0,c3*, C34*)

c

₃* :: (0,0,1,0,0 *,0 *' c *) eto.

1 12 124

How we can give in formula for each :1'.1S the corresponding i~I.:S:

- 2

EMS :: c

*

0

pq* pq (the inner product of both vectors) •

2 (

In practice one write in the heading the 0 -vector omittinG the first

com-pon\?!/.t) and in each ro,:; 'Le Gorresponding cO::lplementvector c.
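The complement-vector rule lends itself to a direct implementation. What follows is a sketch rather than the author's program: the names `subscripts` and `ems` are hypothetical, factor numbers are assumed to be single digits, and the level counts and corrections are those of the worked example in section 4 (batches and methods fixed, specimens and duplicates random). For each component of the model the code cancels the subscripts of the effect, multiplies the corrections $c_{p*}$ of the starred subscripts that remain, and applies the multiplier $n$ with the subscripts not occurring in the component.

```python
from math import prod


def subscripts(symbol):
    """Map '12*3*' to {1: False, 2: True, 3: True} (True means starred)."""
    out = {}
    for pos, ch in enumerate(symbol):
        if ch != "*":
            out[int(ch)] = symbol[pos + 1:pos + 2] == "*"
    return out


def ems(effect, components, levels, c_star):
    """Coefficient of sigma^2 of each component in the EMS of `effect`."""
    eff = subscripts(effect)
    coeffs = {}
    for comp in components:
        sub = subscripts(comp)
        if not set(eff) <= set(sub):
            continue                      # complement entry is 0
        # complement entry: product of c_p* over the remaining starred subscripts
        cbar = prod(c_star[p] for p, starred in sub.items()
                    if starred and p not in eff)
        # multiplier n with the subscripts that do not occur in the component
        mult = prod(n for p, n in levels.items() if p not in sub)
        if cbar * mult:
            coeffs[comp] = cbar * mult
    return coeffs


components = ["1*", "3*", "12*", "1*3*", "12*3*", "1234*"]
levels = {1: 3, 2: 3, 3: 2, 4: 2}
c_star = {1: 0, 2: 1, 3: 0, 4: 1}       # 0 = fixed factor, 1 = random factor

ems_batches = ems("1*", components, levels, c_star)
ems_methods = ems("3*", components, levels, c_star)
ems_specimens = ems("12*", components, levels, c_star)
```

The returned dictionaries list only the non-zero terms; for instance `ems_specimens` contains the coefficient 4 for the specimen component and 1 for the error term.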

Summarizing: if we insert $A$, $S$, $n$, $c$, $\bar\sigma^2$ or $\bar c$ in the model, we obtain:

model: $a = \alpha_0 + \alpha_{1*} + \alpha_{3*} + a_{12*} + \alpha_{1*3*} + a_{12*3*} + a_{1234*}$

source: $A = A_0 + A_{1*} + A_{3*} + A_{12*} + A_{1*3*} + A_{12*3*} + A_{1234*}$

SS: $S = S_0 + S_{1*} + S_{3*} + S_{12*} + S_{1*3*} + S_{12*3*} + S_{1234*}$

df: $n = n_0 + n_{1*} + n_{3*} + n_{12*} + n_{1*3*} + n_{12*3*} + n_{1234*}$

correction vector: $c = (c_0, c_{1*}, c_{3*}, c_{12*}, c_{1*3*}, c_{12*3*}, c_{1234*})$

variance vector: $\bar\sigma^2 = (\bar\sigma_0^2, \bar\sigma_{1*}^2, \bar\sigma_{3*}^2, \bar\sigma_{12*}^2, \bar\sigma_{1*3*}^2, \bar\sigma_{12*3*}^2, \bar\sigma_{1234*}^2)$

EMS vector: $\mathrm{EMS} = (\bar c_0, \bar c_{1*}, \bar c_{3*}, \bar c_{12*}, \bar c_{1*3*}, \bar c_{12*3*}, \bar c_{1234*}) \cdot \bar\sigma^2$

4. Worked-out example: a nested-factorial experiment with replications, mixed model

Consider three batches of material, from each of which a sample of three specimens is taken. An analyst performs duplicate determinations on each specimen by each of two methods of analysis. The results obtained are shown in the table of observations below.

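b) Design

| factor | symbol | type | correction | number of levels |
|---|---|---|---|---|
| batches | $A_1$ | fixed | $c_{1*} = 0$ | $n_1 = 3$ |
| specimens | $A_{2(1)}$ 1) | random | $c_{2*} = 1$ | $n_2 = 3$ |
| methods | $A_3$ | fixed | $c_{3*} = 0$ | $n_3 = 2$ |
| duplicates | $A_{4(123)}$ | random | $c_{4*} = 1$ | $n_4 = 2$ |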

c) Model and partitioning

model: $a = \alpha_0 + \alpha_{1*} + \alpha_{3*} + a_{12*} + \alpha_{1*3*} + a_{12*3*} + a_{1234*}$

sources: $A = A_0 + A_{1*} + A_{3*} + A_{12*} + A_{1*3*} + A_{12*3*} + A_{1234*}$

SS: $S = S_0 + S_{1*} + S_{3*} + S_{12*} + S_{1*3*} + S_{12*3*} + S_{1234*}$

df: $n = n_0 + n_{1*} + n_{3*} + n_{12*} + n_{1*3*} + n_{12*3*} + n_{1234*}$

EMS: $\mathrm{EMS} = \bar\sigma^2 \cdot (\bar c_0, \bar c_{1*}, \bar c_{3*}, \bar c_{12*}, \bar c_{1*3*}, \bar c_{12*3*}, \bar c_{1234*})$

d) Computations

$S_0 = x_{\cdot\cdot\cdot\cdot}^2 / n = (696.0)^2 / 36 = 13456.00$

$S_1 = n_1 \sum_i x_{i\cdot\cdot\cdot}^2 / n = 3\{(241.5)^2 + \ldots + (221.9)^2\} / 36 = 13472.05$

$S_3 = n_3 \sum_k x_{\cdot\cdot k\cdot}^2 / n = 2\{(424.6)^2 + (271.4)^2\} / 36 = 14107.95$

$S_{12} = n_{12} \sum_{i,j} x_{ij\cdot\cdot}^2 / n = 9\{(74.7)^2 + \ldots + (73.1)^2\} / 36 = 13511.31$

$S_{13} = n_{13} \sum_{i,k} x_{i\cdot k\cdot}^2 / n = 6\{(146.1)^2 + \ldots + (84.0)^2\} / 36 = 14125.19$

$S_{123} = n_{123} \sum_{i,j,k} x_{ijk\cdot}^2 / n = 18\{(44.3)^2 + \ldots + (27.8)^2\} / 36 = 14175.17$

$S = n \sum x_{ijkl}^2 / n = 36\{(20.2)^2 + \ldots + (15.1)^2\} / 36 = 14216.76$

$S_{1*} = S_1 - S_0 = 16.05$

$S_{3*} = S_3 - S_0 = 651.95$

$S_{12*} = S_{12} - S_1 = 39.26$

$S_{1*3*} = S_{13} - S_1 - S_3 + S_0 = 1.19$

$S_{12*3*} = S_{123} - S_{12} - S_{13} + S_1 = 10.72$

$S_* = S - S_0 = 760.76$

1) $A_{2(1)}$ means that factor $A_2$ is nested within $A_1$; $A_{4(123)}$ that $A_4$ is nested in all other factors. In this case, $n_2$ is the number of levels of factor $A_2$ within each level of $A_1$. The number of levels of factor A
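The arithmetic of the computations can be re-traced from the printed crude sums of squares. The short check below is only a sketch: the dictionary keys are an assumed encoding of the subscripts, and the input numbers are the rounded values printed in the report, so the results agree to two decimals rather than exactly.

```python
# Crude sums of squares as printed in d) Computations.
S = {"0": 13456.00, "1": 13472.05, "3": 14107.95, "12": 13511.31,
     "13": 14125.19, "123": 14175.17, "1234": 14216.76}

# Corrected sums of squares obtained from the symbolic products.
corrected = {
    "1*":    S["1"] - S["0"],
    "3*":    S["3"] - S["0"],
    "12*":   S["12"] - S["1"],
    "1*3*":  S["13"] - S["1"] - S["3"] + S["0"],
    "12*3*": S["123"] - S["12"] - S["13"] + S["1"],
    "1234*": S["1234"] - S["123"],
}
total = S["1234"] - S["0"]

for name, value in corrected.items():
    print(f"S_{name:<6} = {value:8.2f}")
print(f"S_*      = {total:8.2f}")

# The six components partition the corrected total sum of squares.
partition_ok = abs(sum(corrected.values()) - total) < 1e-6
```

The last line confirms that the six corrected sums of squares add up to the corrected total, as the partitioning in c) requires.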

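Observations (two duplicate determinations per specimen and method; "?" marks a digit that is illegible in the source):

| batch | 1 | 1 | 1 | 2 | 2 | 2 | 3 | 3 | 3 |
|---|---|---|---|---|---|---|---|---|---|
| specimen | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
| method I | 20.2 | 26.2 | 24.? | 22.0 | 22.6 | 22.9 | 23.1 | 23.7 | 23.5 |
| | 24.1 | 26.9 | 23.? | 23.5 | 24.6 | 25.0 | 22.9 | 22.9 | 21.8 |
| method II | 16.2 | 18.0 | 15.4 | 16.1 | 14.0 | 13.7 | 16.1 | 12.2 | 12.7 |
| | 14.2 | 19.1 | 12.5 | 1?.? | ?8.1 | 1?.? | 16.0 | 13.3 | 15.1 |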

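e) Analysis

| Source | Symbol | SS | df | EMS |
|---|---|---|---|---|
| between batches | $A_{1*}$ | 16.05 | 2 | $\sigma_e^2 + 4\sigma_{12*}^2 + 12\sigma_{1*}^2$ |
| between methods | $A_{3*}$ | 651.95 | 1 | $\sigma_e^2 + 2\sigma_{12*3*}^2 + 18\sigma_{3*}^2$ |
| between specimens within batches | $A_{12*}$ | 39.26 | 6 | $\sigma_e^2 + 4\sigma_{12*}^2$ |
| batches x methods | $A_{1*3*}$ | 1.19 | 2 | $\sigma_e^2 + 2\sigma_{12*3*}^2 + 6\sigma_{1*3*}^2$ |
| specimens x methods within batches | $A_{12*3*}$ | 10.72 | 6 | $\sigma_e^2 + 2\sigma_{12*3*}^2$ |
| between duplicates within cells | $A_{1234*}$ | 41.59 | 18 | $\sigma_e^2$ |
| Total | $A_*$ | 760.76 | 35 | |

Here $\sigma_{1234*}^2 = \sigma_e^2$. The EMS column follows from the complement vectors, written under the $\bar\sigma^2$-vector as heading:

| MS | $\bar\sigma_{1*}^2$ | $\bar\sigma_{3*}^2$ | $\bar\sigma_{12*}^2$ | $\bar\sigma_{1*3*}^2$ | $\bar\sigma_{12*3*}^2$ | $\bar\sigma_{1234*}^2$ |
|---|---|---|---|---|---|---|
| $1*$ | 1 | 0 | $c_{2*}$ | $c_{3*}$ | $c_{2*3*}$ | 1 |
| $3*$ | 0 | 1 | 0 | $c_{1*}$ | $c_{12*}$ | 1 |
| $12*$ | 0 | 0 | 1 | 0 | $c_{3*}$ | 1 |
| $1*3*$ | 0 | 0 | 0 | 1 | $c_{2*}$ | 1 |
| $12*3*$ | 0 | 0 | 0 | 0 | 1 | 1 |
| $1234*$ | 0 | 0 | 0 | 0 | 0 | 1 |

with $c_{1*} = c_{3*} = 0$; $c_{2*} = c_{4*} = 1$.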
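The introduction notes that the F-tests are suggested by the expected mean squares column. One way to read that suggestion off mechanically is sketched below; this is an assumption about how the table is meant to be used, not a procedure spelled out in the report. The name `denominator` is hypothetical, the degrees of freedom are derived from the rule of section 3b with $n_1 = 3$, $n_2 = 3$, $n_3 = 2$, $n_4 = 2$, and the EMS coefficients are those of the analysis table. For each effect the code looks for another source whose EMS equals the effect's EMS with its own term removed.

```python
ss = {"1*": 16.05, "3*": 651.95, "12*": 39.26,
      "1*3*": 1.19, "12*3*": 10.72, "1234*": 41.59}
df = {"1*": 2, "3*": 1, "12*": 6, "1*3*": 2, "12*3*": 6, "1234*": 18}

# EMS coefficients per source: variance component -> coefficient.
ems = {
    "1*":    {"1*": 12, "12*": 4, "1234*": 1},
    "3*":    {"3*": 18, "12*3*": 2, "1234*": 1},
    "12*":   {"12*": 4, "1234*": 1},
    "1*3*":  {"1*3*": 6, "12*3*": 2, "1234*": 1},
    "12*3*": {"12*3*": 2, "1234*": 1},
    "1234*": {"1234*": 1},
}

ms = {name: ss[name] / df[name] for name in ss}


def denominator(effect):
    """Source whose EMS equals the EMS of `effect` without the effect's own term."""
    target = {k: v for k, v in ems[effect].items() if k != effect}
    for other, terms in ems.items():
        if other != effect and terms == target:
            return other
    return None


f_ratio = {}
for effect in ems:
    den = denominator(effect)
    if den is not None:
        f_ratio[effect] = ms[effect] / ms[den]
```

With these inputs, batches are tested against specimens within batches, methods and batches x methods against specimens x methods within batches, and the two specimen terms against the duplicates.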

Appendix

1. One missing value

Suppose one (and only one) observation is missing. What will be the missing plot value? When we look in the literature, we only find a formula for the missing plot value in a two-way classification,

$$y = \frac{rR + cC - S}{(r-1)(c-1)},$$

and sometimes for a Latin square (see Hicks [2]),

$$y = \frac{n(R + C + T) - 2S}{(n-1)(n-2)}.$$

In our notation, however, we can give the formula for the missing plot value $y$ directly from the model equation.

Let the model be for example

$$a = \alpha_0 + \alpha_{1*} + a_{12*} + \alpha_{3*} + \alpha_{1*3*} + a_{12*3*},$$

and let $x_{ijk}$ be missing. We write analogously:

$$x = x_0 + x_{1*} + x_{12*} + x_{3*} + x_{1*3*} + x_{12*3*},$$

where $x_{p*} = x_p - x_0$; $x_p = n_p x_{\cdot\cdot k\cdot\cdot} / n$ ($k$ = $p$-th subscript); $x_0 = x_{\cdot\cdot\cdot} / n$; $x_{12} = n_{12} x_{ij\cdot} / n$ etc. Again the same symbolic product applies: $x_{1*2*} = x_{12} - x_1 - x_2 + x_0$ etc. Then

$$y = -\frac{n\, x_{12*3*}}{n_{12*3*}} \quad \text{with } x_{123} = 0.$$

The model $a = \alpha_0 + \alpha_{1*} + \alpha_{2*} + \text{residual}$ gives:

$$y = -\frac{n\, x_{1*2*}}{n_{1*2*}} = -\frac{n(-x_1 - x_2 + x_0)}{n_{1*2*}} = \frac{n_1 x_{i\cdot} + n_2 x_{\cdot j} - x_{\cdot\cdot}}{n_{1*2*}},$$

or $(rR + cC - S) / ((r-1)(c-1))$ in the known notation.

In the case of a Latin square we have: $a = \alpha_0 + \alpha_{1*} + \alpha_{2*} + \alpha_{3*} + \text{residual}$. The d.f. of the residual is $(n_1 - 1)(n_1 - 2)$. Thus

$$y = \frac{n(x_0 + x_{1*} + x_{2*} + x_{3*})}{(n_1-1)(n_1-2)} = \frac{n_1(x_{i\cdot\cdot} + x_{\cdot j\cdot} + x_{\cdot\cdot k}) - 2x_{\cdot\cdot\cdot}}{(n_1-1)(n_1-2)},$$

or, in the usual notation,

$$y = \frac{n(R + C + T) - 2S}{(n-1)(n-2)}.$$
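For the two-way classification the symbolic recipe can be checked against the classical formula. The sketch below is illustrative: the function name `missing_value_two_way` and the 3 x 3 table are made up, and the missing cell is treated as zero when forming the totals, which corresponds to setting $x_{12} = 0$ in the text.

```python
from math import isclose


def missing_value_two_way(table, i, j):
    """Estimate the missing cell (i, j) as y = -n * x_{1*2*} / n_{1*2*}.

    `table` is a list of rows; the missing entry is given as None and is
    counted as 0 in all totals (the convention x_12 = 0).
    """
    n1, n2 = len(table), len(table[0])
    n = n1 * n2
    filled = [[0.0 if v is None else v for v in row] for row in table]
    row_total = sum(filled[i])
    col_total = sum(row[j] for row in filled)
    grand_total = sum(sum(row) for row in filled)
    x0 = grand_total / n
    x1 = n1 * row_total / n
    x2 = n2 * col_total / n
    x12 = 0.0
    x_res = x12 - x1 - x2 + x0            # symbolic product x_{1*2*}
    n_res = (n1 - 1) * (n2 - 1)           # residual degrees of freedom
    return -n * x_res / n_res


# Hypothetical additive 3 x 3 table with the centre value removed.
table = [[1.0, 2.0, 3.0],
         [4.0, None, 6.0],
         [7.0, 8.0, 9.0]]
y = missing_value_two_way(table, 1, 1)

# Classical formula (rR + cC - S) / ((r - 1)(c - 1)) with R = C = 10, S = 40.
classical = (3 * 10.0 + 3 * 10.0 - 40.0) / ((3 - 1) * (3 - 1))
```

Because the toy table is exactly additive, both expressions recover the removed value.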

2. The side conditions

For the model in section 3c these conditions are (see also Scheffé [3], p. 275):

$$(\alpha_{1*})_{\cdot} = 0; \qquad (\alpha_{3*})_{\cdot} = 0; \qquad (\alpha_{1*3*})_{\cdot j} = (\alpha_{1*3*})_{i\cdot} = 0 \text{ for all } i, j;$$

$$(a_{12*3*})_{ij\cdot} = 0 \text{ for all } i, j.$$

Summation is only over subscripts with asterisks, and over the entire population if finite.

The $(a_{12*})_{ij}$ are independent $(0, \sigma_{12*}^2)$ variables for all $i, j$; the $(a_{1234*})_{ijkl}$ are independent $(0, \sigma_{1234*}^2)$ variables for all $i, j, k, l$.

[The defining formulas for $\sigma_{1*}^2$, $\sigma_{1*3*}^2$ and $\sigma_{3*}^2$ 1) are illegible in the source.]

1) Some authors use another symbol or write $[\sigma^2]$ to indicate that this is not a true variance. I prefer to reduce the number of symbols if not confusing. In this case one

3. Independent interactions

The calculation of the EMS in section 3d applies when it is assumed that mixed interactions (in which some factors are random and others fixed) are represented by random variables which are not independent, but sum to zero with respect to each parametric factor. We saw $(a_{12*3*})_{ij\cdot} = 0$. If the $(a_{12*3*})_{ijk}$ are mutually independent random variables, we denote this in our model by underlining this variable: $\underline{a}_{12*3*}$.

For this case the method of calculation of the EMS needs only slight modification: the corresponding correction $c_{12*3*}$ loses its asterisks and becomes $c_{123}$.
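The effect of this modification can be seen by recomputing the EMS with one component marked as independent. The code is a sketch under assumptions: `subscripts`, `ems` and the `independent` argument are hypothetical names, the level counts and corrections are those of the worked example, and a component listed in `independent` has all its corrections read without asterisks, so that its complement entry is 1 whenever it contains the subscripts of the effect.

```python
from math import prod


def subscripts(symbol):
    """Map '12*3*' to {1: False, 2: True, 3: True} (True means starred)."""
    return {int(ch): symbol[pos + 1:pos + 2] == "*"
            for pos, ch in enumerate(symbol) if ch != "*"}


def ems(effect, components, levels, c_star, independent=()):
    """EMS coefficients; components in `independent` lose their asterisks."""
    eff = subscripts(effect)
    coeffs = {}
    for comp in components:
        sub = subscripts(comp)
        if not set(eff) <= set(sub):
            continue
        if comp in independent:
            cbar = 1      # c_{12*3*} is read as c_{123}, and every c_p = 1
        else:
            cbar = prod(c_star[p] for p, starred in sub.items()
                        if starred and p not in eff)
        mult = prod(n for p, n in levels.items() if p not in sub)
        if cbar * mult:
            coeffs[comp] = cbar * mult
    return coeffs


components = ["1*", "3*", "12*", "1*3*", "12*3*", "1234*"]
levels = {1: 3, 2: 3, 3: 2, 4: 2}
c_star = {1: 0, 2: 1, 3: 0, 4: 1}

restricted = ems("12*", components, levels, c_star)
unrestricted = ems("12*", components, levels, c_star, independent=("12*3*",))
```

With the side conditions dropped, the specimen x method component enters the EMS of specimens within batches with coefficient 2, which changes the appropriate denominator for that test.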

References

[1] J. Cornfield and J.W. Tukey (1956), Average values of mean squares in factorials, Annals of Mathematical Statistics 27, pp. 907-949.

[2] Ch.R. Hicks (1964), Fundamental concepts in the design of experiments, Holt, Rinehart and Winston, New York, pp. 153-174.

[3] H. Scheffé (1959), The analysis of variance, Wiley, New York, pp. 261-288.

[4] S.R. Searle (1971), Linear models, Wiley, New York, pp. 400-404.