Citation for published version (APA):
Bosch, A. J. (1976). A simple scheme for the analysis of variance. (EUT report. WSK, Dept. of Mathematics and Computing Science; Vol. 76-WSK-06). Technische Hogeschool Eindhoven.
Document status and date: Published: 01/01/1976
Document Version:
Publisher’s PDF, also known as Version of Record (includes final page, issue and volume numbers)
ONDERAFDELING DER WISKUNDE
DEPARTMENT OF MATHEMATICS
A simple scheme for the analysis of variance
by
A.J. Bosch
T.H.-Report 76-WSK-06
Abstract
In 1963-65 the author developed a simple and unified algorithm for the analysis of variance of mixed models for balanced data (orthogonal classifications). Although rules for setting up analysis of variance tables have been given at many places (for example Hicks [2], Scheffé [3] and Searle [4]), their interpretation in complex cases still presents difficulties for the uninitiated. Presentation of these former results which, except for an internal memorandum of the Department of Mathematics, did not reach publication, seems therefore appropriate.
As soon as the model has been stated in our notation, the calculation of degrees of freedom, sums of squares and expected values of mean squares follows easily. The procedure applies to any combination of any number of crossed and nested classifications. Moreover, the notation forms a good starting-point for writing a computer program.
1. Introduction and summary
In accordance with the greater part of the authoritative literature on this subject (cf. Cornfield and Tukey [1] and also Scheffé [3]) we apply the usual side conditions to random interactions when at least one of the factors involved is fixed. Recently (cf. Searle [4]) arguments have been put forward against this practice. In section 3 of the Appendix it is shown how our procedure can be modified when these side conditions are dropped.
The F-tests as suggested by the expected mean squares column of the analysis of variance table are the correct ones under normality and supplementary symmetry conditions.
2. Definitions and notation
As usual we write

$$x_{i\cdot} = \sum_j x_{ij}; \qquad \sum_i x_{i\cdot}^2 = \sum_i \Bigl(\sum_j x_{ij}\Bigr)^2; \qquad x_{\cdot\cdot} = \sum_{i,j} x_{ij}; \qquad x_{\cdot\cdot}^2 = \Bigl(\sum_{i,j} x_{ij}\Bigr)^2 \quad \text{etc.}$$

For factor $A_p$, $p = 1, 2, \ldots$:

population size: $N_p$
number of levels in sample: $n_p$, with $n_{pq} = n_p n_q$ etc.
total number of observations: $n$
correction term: $S_0 = x_{\cdot\cdot\cdot\cdot}^2 / n$
crude sum of squares: $S_p = n_p \sum_k x_{\cdot\cdot k \cdot\cdot}^2 / n$ ($k$ = $p$-th subscript), $S_{12} = n_{12} \sum_{i,j} x_{ij\cdot\cdot}^2 / n$, etc.
corrected sum of squares: $S_{p*} = S_p - S_0$
degrees of freedom: $n_{p*} = n_p - n_0 = n_p - 1$
mean square: $MS_{p*} = S_{p*} / n_{p*}$

3. The procedure

a. Sources of variation

A subscript provided with an asterisk (*) indicates a "between" comparison; a subscript without asterisk always indicates a "within" comparison.
Thus $A_{1*}$ refers to "between the levels of factor $A_1$"; $A_{1*2*}$ refers to the "interaction between factor $A_1$ and $A_2$"; $A_{12*}$ refers to "between the levels of factor $A_2$ within factor $A_1$"; $A_{12*3*}$ refers to the "interaction between factors $A_2$ and $A_3$, within factor $A_1$"; $A_{1234*}$ indicates, in the case that "factor" $A_4$ refers to "replications within cells", the source "between replications within cells (of the $A_1 A_2 A_3$ classification)", etc.
The advantage of this notation is that it lends itself to symbolic calculations. To this effect, we define $A_{i*} := A_i - A_0$, and $A_0$ as "identity", i.e. $A_i A_0 = A_i$ and $A_0 A_0 = A_0$. If we further agree that the symbol $A_{1*2*}$ may also be written as a symbolic product,

$$A_{1*2*} = A_{1*} A_{2*}, \qquad A_{12} = A_1 A_2, \qquad A_{12*} = A_1 A_{2*} \quad \text{etc.},$$

then we obtain:

$$A_{12} = A_1 A_2 = (A_0 + A_{1*})(A_0 + A_{2*}) = A_0 + A_{1*} + A_{2*} + A_{1*2*}.$$
For example, if factor $A_2$ is nested within factor $A_1$, the subscript 2 can never occur without the subscript 1 (without asterisk); thus $A_{2*}$ and $A_{1*2*}$ do not exist separately. These two symbols are then combined:

$$A_{2*} + A_{1*2*} = (A_0 + A_{1*}) A_{2*} = A_1 A_{2*} = A_{12*},$$

which denotes the source "between the levels of $A_2$ within $A_1$".
Similarly, for a third factor $A_3$ crossed with both, we have

$$A_{2*3*} + A_{1*2*3*} = A_1 A_{2*} A_{3*} = A_{12*3*}.$$
If $A_3$ is nested within $A_2$ and $A_2$ is nested within $A_1$, then the only meaningful symbols are $A_{1*}$, $A_{12*}$, $A_{123*}$. The relation

$$A_{123} = A_0 + A_{1*} + \underbrace{A_{2*} + A_{1*2*}}_{A_{12*}} + \underbrace{A_{3*} + A_{1*3*} + A_{2*3*} + A_{1*2*3*}}_{A_{123*}}$$

now becomes:

$$A_{123} = A_0 + A_{1*} + A_{12*} + A_{123*}.$$
Likewise, if $A_4$ represents the factor "replications" within cells, only $A_{1234*}$ can appear, since $A_4$ is nested in all other factors.
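The nesting rule described above can be sketched in a few lines of Python. This is an illustration only and not part of the original report: the names `sources`, `label` and `nested_in` are hypothetical, and `nested_in` is assumed to list, for every factor, all factors it is nested within (directly or indirectly). The sketch lets a set of factors carry asterisks only if none of them is nested within another one of the set; the factors they are nested in then appear without asterisk.

```python
from itertools import combinations


def sources(nested_in):
    """List the meaningful sources of variation for a set of factors.

    nested_in maps each factor to the set of factors it is nested within
    (an empty set for a factor that is not nested).
    Each source is returned as (unstarred, starred), two sorted tuples.
    """
    factors = sorted(nested_in)
    result = []
    for size in range(1, len(factors) + 1):
        for starred in combinations(factors, size):
            # A starred factor may not be nested within another starred factor.
            if any(nested_in[p] & set(starred) for p in starred):
                continue
            # Every factor that a starred factor is nested in appears unstarred.
            unstarred = set().union(*(nested_in[p] for p in starred))
            result.append((tuple(sorted(unstarred)), starred))
    return result


def label(source):
    """Render a source in the A-notation of the text, e.g. 'A12*3*'."""
    unstarred, starred = source
    parts = [(p, "") for p in unstarred] + [(p, "*") for p in starred]
    return "A" + "".join(f"{p}{mark}" for p, mark in sorted(parts))


if __name__ == "__main__":
    # A_2 nested in A_1, A_3 crossed, A_4 (replications) nested in all others.
    design = {1: set(), 2: {1}, 3: set(), 4: {1, 2, 3}}
    print([label(s) for s in sources(design)])
```

For the fully nested case `{1: set(), 2: {1}, 3: {1, 2}}` the sketch returns only $A_{1*}$, $A_{12*}$ and $A_{123*}$, in agreement with the text.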
b. The SS, df and MS
The formulae for computing the SS (corrected sums of squares) can be based on the same symbolic calculus. Thus we have:

$$S_{1*2*} = S_{1*} S_{2*} = (S_1 - S_0)(S_2 - S_0) = S_1 S_2 - S_1 - S_2 + S_0 = S_{12} - S_1 - S_2 + S_0.$$

Similarly:

$$S_{12*} = S_1 (S_2 - S_0) = S_{12} - S_1$$

and

$$S_{12*3*} = S_1 (S_2 - S_0)(S_3 - S_0) = S_{123} - S_{12} - S_{13} + S_1, \quad \text{etc.}$$

The crude sums of squares on the right hand side in these equations are all properly defined. $S_1 S_2$ is replaced by $S_{12}$ etc., as we never multiply sums of squares.
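The symbolic expansion above is mechanical enough to sketch in code. The fragment below is a minimal illustration, not the author's program; the function name `expand` and the representation of a source as two tuples (unstarred and starred subscripts) are assumptions of this sketch. Each starred subscript contributes either $S_p$ or $-S_0$, and products of crude sums are merged into a single crude sum with the combined subscripts.

```python
from itertools import combinations


def expand(unstarred, starred):
    """Expand a corrected sum of squares into signed crude sums of squares.

    Returns a dict mapping a tuple of subscripts (the empty tuple stands
    for S_0) to its sign, following S_{p*} = S_p - S_0 and S_p S_q = S_pq.
    """
    terms = {}
    for size in range(len(starred) + 1):
        for kept in combinations(starred, size):
            # Each starred factor contributes either S_p (kept) or -S_0 (dropped).
            sign = (-1) ** (len(starred) - size)
            terms[tuple(sorted(set(unstarred) | set(kept)))] = sign
    return terms


if __name__ == "__main__":
    # S_{12*3*} = S_123 - S_12 - S_13 + S_1
    print(expand((1,), (2, 3)))
    # S_{1*2*} = S_12 - S_1 - S_2 + S_0
    print(expand((), (1, 2)))
```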
Similarly we have for the degrees of freedom, df:

$$n_{1*2*} = n_{1*} n_{2*} = (n_1 - 1)(n_2 - 1), \qquad n_{12*} = n_1 n_{2*} = n_1 (n_2 - 1), \quad \text{etc.}$$

Furthermore we have of course for the mean squares: MS = SS / df. Thus for example $MS_{1*2*} = S_{1*2*} / n_{1*2*}$ etc.
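The rule for the degrees of freedom translates directly into a product. The following sketch assumes the same hypothetical representation of a source as (unstarred, starred) subscripts; the function name `degrees_of_freedom` is not from the report. The levels used in the demonstration are those of the worked example in section 4 ($n_1 = 3$, $n_2 = 3$, $n_3 = 2$, $n_4 = 2$).

```python
from math import prod


def degrees_of_freedom(levels, unstarred, starred):
    """df of a source: n_p for unstarred subscripts, (n_p - 1) for starred ones."""
    return prod(levels[p] for p in unstarred) * prod(levels[p] - 1 for p in starred)


# Levels of the worked example in section 4.
LEVELS = {1: 3, 2: 3, 3: 2, 4: 2}

# Sources of that example as (unstarred, starred) subscripts.
MODEL = {
    "1*": ((), (1,)),
    "3*": ((), (3,)),
    "12*": ((1,), (2,)),
    "1*3*": ((), (1, 3)),
    "12*3*": ((1,), (2, 3)),
    "1234*": ((1, 2, 3), (4,)),
}

if __name__ == "__main__":
    table = {name: degrees_of_freedom(LEVELS, *src) for name, src in MODEL.items()}
    print(table)
    # The df of all sources add up to n - 1.
    print(sum(table.values()) == prod(LEVELS.values()) - 1)
```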
In addition to providing the proper expressions for SS and df for each source of variation, the symbolic notation also tells us which SS must be calculated. For example, if $A_2$ is nested within $A_1$, then $A_{2*}$ does not occur, but $A_{12*}$ does; consequently we compute $S_{12}$, but not $S_2$.

For the total (replicates included) we have:

$$n = n_{1234} = n_1 n_2 n_3 n_4, \qquad n_* = n - n_0,$$
$$A = A_{1234}, \qquad A_* = A - A_0,$$
$$S = S_{1234} = \sum x_{ijkl}^2, \qquad S_* = S - S_0.$$

c. The model equation
We take as example the model in section 4c:

$$a = \alpha_0 + \alpha_{1*} + \alpha_{3*} + a_{12*} + \alpha_{1*3*} + a_{12*3*} + a_{1234*}.$$

The $\alpha$'s represent fixed-level effects, the $a$'s are random variables. $\alpha_0$ is the overall average, $a_{1234*}$ the error term. We assume that an interaction should be represented by a random variable if any factor in this interaction is represented by a random variable, although it is formally possible for this rule to fail. But models with random interaction between parametric factors are quite often appropriate (see also appendix 3).
Properly speaking we have to write the model in the following way:

$$a_{ijkl} = \alpha_0 + (\alpha_{1*})_i + (\alpha_{3*})_k + (a_{12*})_{ij} + (\alpha_{1*3*})_{ik} + (a_{12*3*})_{ijk} + (a_{1234*})_{ijkl}.$$

But only the side restrictions necessitate writing the model in this way. For this model these side conditions are given in appendix 2.
The notation EMS is used to denote the expected value of a mean square. Our problem is to determine which variances occur in the expressions EMS and to find their coefficients.
The scheme proposed in this paper is largely self-explanatory. We will only add a few comments.
It is well-known that for a sample of size $n_p$ taken from a population of size $N_p$, $n_p$ times the variance of the sample mean is:

$$n_p \operatorname{var} \bar{x} = (1 - n_p / N_p)\, \sigma_p^2,$$

which in our notation becomes $c_{p*} \sigma_{p*}^2$. Here $c_{p*} = 1 - n_p / N_p$ is the finite population correction, and $c_{p*} \sigma_{p*}^2$ is $n_p$ times the variance of the average effect of factor $A_p$.
If $N_p = \infty$ then $c_{p*} = 1$ and $A_p$ is a random factor. If $N_p = n_p$, i.e. if the sample comprises the entire population, then $c_{p*} = 0$ and $A_p$ is a fixed factor.
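The two limiting cases can be checked numerically. The helper below is a small sketch under the assumption that the correction is $c_{p*} = 1 - n_p / N_p$ as stated above; the function name `correction` and the use of `math.inf` for an infinite population are choices made for this illustration.

```python
import math


def correction(n_levels, population_size=math.inf):
    """Finite population correction c_{p*} = 1 - n_p / N_p."""
    if population_size < n_levels:
        raise ValueError("the sample cannot be larger than the population")
    return 1.0 - n_levels / population_size


if __name__ == "__main__":
    print(correction(3))       # infinite population: random factor
    print(correction(3, 3))    # sample equals population: fixed factor
    print(correction(3, 12))   # intermediate case
```

Values strictly between 0 and 1 correspond to a sample drawn from a finite population that is larger than the sample.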
Again we take as example the model in section 4c. We introduce the corresponding correction vector

$$c := (c_0, c_{1*}, c_{3*}, c_{12*}, c_{1*3*}, c_{12*3*}, c_{1234*})$$

and variance vector

$$\tilde{\sigma}^2 := (\tilde{\sigma}^2_0, \tilde{\sigma}^2_{1*}, \tilde{\sigma}^2_{3*}, \tilde{\sigma}^2_{12*}, \tilde{\sigma}^2_{1*3*}, \tilde{\sigma}^2_{12*3*}, \tilde{\sigma}^2_{1234*}).$$

We define $c_{p*q*} := c_{p*} c_{q*}$, $c_{pq*} := c_p c_{q*}$ etc., with $c_p = 1$ for all $p$, and

$$\tilde{\sigma}^2_{12*} := n \sigma^2_{12*} / n_{12} = n_{34} \sigma^2_{12*}; \qquad \tilde{\sigma}^2_{1*3*} := n_{24} \sigma^2_{1*3*}$$

(the subscripts of $n$ are those not occurring in $\sigma^2$) etc.; $\tilde{\sigma}^2_0 := n \alpha_0^2$.
Further we define for each effect the complement vector $\bar{c}$, for example $\bar{c}_{pq*}$, which means: cancel in the vector $c$ the subscripts $p$ and $q$ and put a 0 if a component does not contain both subscripts. So for this model:

$$\bar{c}_{12*} = (0, 0, 0, 1, 0, c_{3*}, c_{34*})$$
$$\bar{c}_{3*} = (0, 0, 1, 0, c_{1*}, c_{12*}, c_{124*}) \quad \text{etc.}$$
Now we can give in formula for each MS the corresponding EMS:

$$EMS_{pq*} = \bar{c}_{pq*} \cdot \tilde{\sigma}^2$$

(the inner product of both vectors).
In practice one writes in the heading the $\tilde{\sigma}^2$-vector (omitting the first component) and in each row the corresponding complement vector $\bar{c}$.
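The inner-product rule can be sketched as follows for the design of the worked example in section 4 (batches and methods fixed, specimens and duplicates random). This is an illustrative reading of the rule, not the author's program: the names `LEVELS`, `C_STAR`, `SOURCES` and `ems` are hypothetical, a correction without asterisk is taken to equal 1, and each variance component is multiplied by $n$ with the subscripts that do not occur in it.

```python
from math import prod

# Design of the worked example: levels n_p and corrections c_{p*}.
LEVELS = {1: 3, 2: 3, 3: 2, 4: 2}
C_STAR = {1: 0, 2: 1, 3: 0, 4: 1}   # 0 = fixed factor, 1 = random factor

# Sources as (unstarred subscripts, starred subscripts).
SOURCES = {
    "1*": ((), (1,)),
    "3*": ((), (3,)),
    "12*": ((1,), (2,)),
    "1*3*": ((), (1, 3)),
    "12*3*": ((1,), (2, 3)),
    "1234*": ((1, 2, 3), (4,)),
}


def ems(row):
    """Coefficients of the sigma^2 components in the EMS of source `row`."""
    row_subs = set(SOURCES[row][0]) | set(SOURCES[row][1])
    coefficients = {}
    for col, (unstarred, starred) in SOURCES.items():
        col_subs = set(unstarred) | set(starred)
        if not row_subs <= col_subs:
            continue  # the complement vector has a 0 in this position
        # Cancel the row's subscripts; unstarred leftovers count as c_p = 1.
        c_bar = prod(C_STAR[p] for p in starred if p not in row_subs)
        # Multiplier n with the subscripts that do not occur in the component.
        n_mult = prod(n for p, n in LEVELS.items() if p not in col_subs)
        if c_bar * n_mult:
            coefficients[col] = c_bar * n_mult
    return coefficients


if __name__ == "__main__":
    for name in SOURCES:
        print(name, ems(name))
```

For the row $A_{12*}$ the sketch yields the coefficient 4 for $\sigma^2_{12*}$ and 1 for $\sigma^2_{1234*}$, since the component $c_{3*}$ of the complement vector vanishes for a fixed factor $A_3$.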
Resuming: if we insert $A$, $S$, $n$, $c$, $\tilde{\sigma}^2$ or $\bar{c}$ in the model, we obtain:

model: $a = \alpha_0 + \alpha_{1*} + \alpha_{3*} + a_{12*} + \alpha_{1*3*} + a_{12*3*} + a_{1234*}$
source: $A = A_0 + A_{1*} + A_{3*} + A_{12*} + A_{1*3*} + A_{12*3*} + A_{1234*}$
SS: $S = S_0 + S_{1*} + S_{3*} + S_{12*} + S_{1*3*} + S_{12*3*} + S_{1234*}$
df: $n = n_0 + n_{1*} + n_{3*} + n_{12*} + n_{1*3*} + n_{12*3*} + n_{1234*}$
correction vector: $c = (c_0, c_{1*}, c_{3*}, c_{12*}, c_{1*3*}, c_{12*3*}, c_{1234*})$
variance vector: $\tilde{\sigma}^2 = (\tilde{\sigma}^2_0, \tilde{\sigma}^2_{1*}, \tilde{\sigma}^2_{3*}, \tilde{\sigma}^2_{12*}, \tilde{\sigma}^2_{1*3*}, \tilde{\sigma}^2_{12*3*}, \tilde{\sigma}^2_{1234*})$
EMS vector: $EMS = (\bar{c}_0, \bar{c}_{1*}, \bar{c}_{3*}, \bar{c}_{12*}, \bar{c}_{1*3*}, \bar{c}_{12*3*}, \bar{c}_{1234*})\, \tilde{\sigma}^2$

4. Worked-out example: a nested-factorial experiment with replications, mixed model
Consider three batches of materials, from each of which a sample of three specimens is taken. An analyst performs duplicate determinations on each specimen by each of two methods of analysis. The following results were obtained:
b) Design

batches      $A_1$           fixed    $c_{1*} = 0$    $n_1 = 3$
specimens    $A_{2(1)}$ 1)   random   $c_{2*} = 1$    $n_2 = 3$
methods      $A_3$           fixed    $c_{3*} = 0$    $n_3 = 2$
duplicates   $A_{4(123)}$    random   $c_{4*} = 1$    $n_4 = 2$
1) $A_{2(1)}$ means that factor $A_2$ is nested within $A_1$, $A_{4(123)}$ that $A_4$ is nested in all other factors. In this case, $n_2$ is the number of levels of factor $A_2$ within each level of $A_1$. The number of levels of factor A …

c) Model and partitioning

model: $a = \alpha_0 + \alpha_{1*} + \alpha_{3*} + a_{12*} + \alpha_{1*3*} + a_{12*3*} + a_{1234*}$
sources: $A = A_0 + A_{1*} + A_{3*} + A_{12*} + A_{1*3*} + A_{12*3*} + A_{1234*}$
SS: $S = S_0 + S_{1*} + S_{3*} + S_{12*} + S_{1*3*} + S_{12*3*} + S_{1234*}$
df: $n = n_0 + n_{1*} + n_{3*} + n_{12*} + n_{1*3*} + n_{12*3*} + n_{1234*}$
EMS: $EMS = \tilde{\sigma}^2 (\bar{c}_0, \bar{c}_{1*}, \bar{c}_{3*}, \bar{c}_{12*}, \bar{c}_{1*3*}, \bar{c}_{12*3*}, \bar{c}_{1234*})$

d) Computations

$S_0 = n_0\, x^2_{\cdot\cdot\cdot\cdot} / n = 1 \{(696.0)^2\} / 36 = 13456.00$
$S_1 = n_1 \sum x^2_{i\cdot\cdot\cdot} / n = 3 \{(241.5)^2 + \ldots + (221.9)^2\} / 36 = 13472.05$
$S_3 = n_3 \sum x^2_{\cdot\cdot k\cdot} / n = 2 \{(424.6)^2 + (271.4)^2\} / 36 = 14107.95$
$S_{12} = n_{12} \sum x^2_{ij\cdot\cdot} / n = 9 \{(74.7)^2 + \ldots + (73.1)^2\} / 36 = 13511.31$
$S_{13} = n_{13} \sum x^2_{i\cdot k\cdot} / n = 6 \{(146.1)^2 + \ldots + (84.0)^2\} / 36 = 14125.19$
$S_{123} = n_{123} \sum x^2_{ijk\cdot} / n = 18 \{(44.3)^2 + \ldots + (27.8)^2\} / 36 = 14175.17$
$S = n \sum x^2_{ijkl} / n = 36 \{(20.2)^2 + \ldots + (15.1)^2\} / 36 = 14216.76$

$S_{1*} = S_1 - S_0 = 16.05$
$S_{3*} = S_3 - S_0 = 651.95$
$S_{12*} = S_{12} - S_1 = 39.26$
$S_{1*3*} = S_{13} - S_1 - S_3 + S_0 = 1.19$
$S_{12*3*} = S_{123} - S_{12} - S_{13} + S_1 = 10.72$
$S_* = S - S_0 = 760.76$
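The step from crude to corrected sums of squares in d) Computations can be reproduced with a short script. The sketch below takes the crude sums of squares and the numbers of levels exactly as printed in the example; the names `CRUDE`, `LEVELS`, `SOURCES` and `anova_rows` are hypothetical, and the mean squares it prints are simply SS / df computed from those figures, not values copied from the report.

```python
from math import prod

# Crude sums of squares from d) Computations; the empty tuple stands for S_0.
CRUDE = {
    (): 13456.00,
    (1,): 13472.05,
    (3,): 14107.95,
    (1, 2): 13511.31,
    (1, 3): 14125.19,
    (1, 2, 3): 14175.17,
    (1, 2, 3, 4): 14216.76,
}
LEVELS = {1: 3, 2: 3, 3: 2, 4: 2}

# name: (signed crude sums, unstarred subscripts, starred subscripts)
SOURCES = {
    "1*": ({(1,): 1, (): -1}, (), (1,)),
    "3*": ({(3,): 1, (): -1}, (), (3,)),
    "12*": ({(1, 2): 1, (1,): -1}, (1,), (2,)),
    "1*3*": ({(1, 3): 1, (1,): -1, (3,): -1, (): 1}, (), (1, 3)),
    "12*3*": ({(1, 2, 3): 1, (1, 2): -1, (1, 3): -1, (1,): 1}, (1,), (2, 3)),
    "1234*": ({(1, 2, 3, 4): 1, (1, 2, 3): -1}, (1, 2, 3), (4,)),
}


def anova_rows():
    """Return {source: (SS, df, MS)} for every source of the model."""
    rows = {}
    for name, (signs, unstarred, starred) in SOURCES.items():
        ss = sum(sign * CRUDE[subs] for subs, sign in signs.items())
        df = prod(LEVELS[p] for p in unstarred) * prod(LEVELS[p] - 1 for p in starred)
        rows[name] = (ss, df, ss / df)
    return rows


if __name__ == "__main__":
    for name, (ss, df, ms) in anova_rows().items():
        print(f"A{name:<6} SS = {ss:8.2f}  df = {df:2d}  MS = {ms:8.3f}")
```

The corrected sums of squares of all sources add up to $S_* = S - S_0$, and the degrees of freedom to $n - 1 = 35$, which gives a quick check on the arithmetic.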
batches               1                      2                      3
specimens        1     2     3          4     5     6          7     8     9
method I        20.2  26.2  24.…       22.0  22.6  22.9       23.1  23.7  23.5
                24.1  26.9  23.…       23.5  24.6  25.0       22.9  22.9  21.8
method II       16.2  18.0  15.4       16.1  14.0  13.7       16.1  12.2  12.7
                14.2  19.1  12.5        …    18.1   …         16.0  13.3  15.1
e) Analysis

Source                                  Symbol          SS       df   EMS
between batches                         $A_{1*}$        16.05     2   $\sigma_e^2 + 4\sigma^2_{12*} + 12\sigma^2_{1*}$
between methods                         $A_{3*}$       651.95     1   $\sigma_e^2 + 2\sigma^2_{12*3*} + 18\sigma^2_{3*}$
between specimens within batches        $A_{12*}$       39.26     6   $\sigma_e^2 + 4\sigma^2_{12*}$
batches x methods                       $A_{1*3*}$       1.19     2   $\sigma_e^2 + 2\sigma^2_{12*3*} + 6\sigma^2_{1*3*}$
specimens x methods within batches      $A_{12*3*}$     10.72     6   $\sigma_e^2 + 2\sigma^2_{12*3*}$
between duplicates within cells         $A_{1234*}$     41.59    18   $\sigma_e^2$
Total                                   $A_*$          760.76    35

Here $c_{1*} = c_{3*} = 0$; $c_{2*} = c_{4*} = 1$; $\sigma^2_{1234*} = \sigma_e^2$. The complement vectors, written under the heading of the $\tilde{\sigma}^2$-vector, are:

               $\tilde{\sigma}^2_{1*}$   $\tilde{\sigma}^2_{3*}$   $\tilde{\sigma}^2_{12*}$   $\tilde{\sigma}^2_{1*3*}$   $\tilde{\sigma}^2_{12*3*}$   $\tilde{\sigma}^2_{1234*}$
$A_{1*}$         1          0          $c_{2*}$     $c_{3*}$     $c_{2*3*}$    1
$A_{3*}$         0          1          0            $c_{1*}$     $c_{12*}$     1
$A_{12*}$        0          0          1            0            $c_{3*}$      1
$A_{1*3*}$       0          0          0            1            $c_{2*}$      1
$A_{12*3*}$      0          0          0            0            1             1
$A_{1234*}$      0          0          0            0            0             1

Appendix
1. One missing value
Suppose one (and only one) observation is missing. What will be the missing plot value? When we look in the literature, we only find a formula for the missing plot value in a two-way classification,

$$y = \frac{rR + cC - S}{(r-1)(c-1)},$$

and sometimes for a latin square (see Hicks [2]),

$$y = \frac{n(R + C + T) - 2S}{(n-1)(n-2)}.$$

In our notation however we can give directly from the model equation the formula for the missing plot value $y$.
Let the model be for example:

$$a = \alpha_0 + \alpha_{1*} + \alpha_{12*} + \alpha_{3*} + \alpha_{1*3*} + \text{residual};$$

$x_{ijk}$ is missing. By the same analogy we write

$$x = x_0 + x_{1*} + x_{12*} + x_{3*} + x_{1*3*} + x_{12*3*},$$

where: $x_{p*} = x_p - x_0$; $x_p = n_p x_{\cdot\cdot k \cdot\cdot} / n$ ($k$ = $p$-th subscript); $x_0 = n_0 x_{\cdot\cdot\cdot} / n = x_{\cdot\cdot\cdot} / n$; $x_{12} = n_{12} x_{ij\cdot} / n$ etc. Again the same symbolic product applies: $x_{1*2*} = x_{12} - x_1 - x_2 + x_0$ etc. Then

$$y = -\frac{n\, x_{12*3*}}{n_{12*3*}} \quad \text{with } x_{123} = 0.$$
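The model $a = \alpha_0 + \alpha_{1*} + \alpha_{2*} + \text{residual}$ gives:

$$y = -\frac{n\, x_{1*2*}}{n_{1*2*}} = \frac{n (x_1 + x_2 - x_0)}{n_{1*2*}} = \frac{n_1 x_{i\cdot} + n_2 x_{\cdot j} - x_{\cdot\cdot}}{n_{1*2*}}, \quad \text{or} \quad \frac{rR + cC - S}{(r-1)(c-1)}$$

in the known notation.

In the case of a latin square we have: $a = \alpha_0 + \alpha_{1*} + \alpha_{2*} + \alpha_{3*} + \text{residual}$. The d.f. of the residual is $(n_1 - 1)(n_1 - 2)$. Thus

$$y = \frac{n (x_0 + x_{1*} + x_{2*} + x_{3*})}{(n_1 - 1)(n_1 - 2)} = \frac{n_1 (x_{i\cdot\cdot} + x_{\cdot j\cdot} + x_{\cdot\cdot k}) - 2 x_{\cdot\cdot\cdot}}{(n_1 - 1)(n_1 - 2)},$$

or in usual notation

$$y = \frac{n (R + C + T) - 2S}{(n-1)(n-2)}.$$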
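The two-way formula is easy to try out. The sketch below implements $y = (rR + cC - S)/((r-1)(c-1))$ for a table with one missing cell; the function name `missing_value_two_way`, the use of `None` to mark the gap and the small additive table are all made up for this illustration and do not come from the report.

```python
def missing_value_two_way(table, i, j):
    """Estimate the single missing cell (i, j) of an r x c table.

    The missing cell is marked by None; R, C and S are the row total,
    column total and grand total of the available observations.
    """
    r, c = len(table), len(table[0])

    def known(value):
        # The missing cell contributes nothing to the totals.
        return 0.0 if value is None else value

    R = sum(known(v) for v in table[i])
    C = sum(known(row[j]) for row in table)
    S = sum(known(v) for row in table for v in row)
    return (r * R + c * C - S) / ((r - 1) * (c - 1))


if __name__ == "__main__":
    # Hypothetical additive data x_ij = (i + 1) + j with the centre cell missing.
    table = [
        [1.0, 2.0, 3.0],
        [2.0, None, 4.0],
        [3.0, 4.0, 5.0],
    ]
    print(missing_value_two_way(table, 1, 1))
```

For purely additive data the estimate reproduces the value that the additive pattern would place in the empty cell, which makes such a table a convenient test case.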
2. The side conditions

For the model in section 3c these conditions are (see also Scheffé [3] p. 275):

$$\sum_{i=1}^{N_1} (\alpha_{1*})_i = (\alpha_{1*})_{\cdot} = 0; \qquad (\alpha_{3*})_{\cdot} = 0;$$
$$(\alpha_{1*3*})_{\cdot k} = (\alpha_{1*3*})_{i\cdot} = 0 \quad \text{for all } i, k;$$
$$(a_{12*3*})_{ij\cdot} = 0 \quad \text{for all } i, j.$$

Only summation over subscripts with asterisks and over the entire population if finite.

$(a_{12*})_{ij}$ are independent $(0, \sigma^2_{12*})$ variables for all $i, j$;
$(a_{1234*})_{ijkl}$ are independent $(0, \sigma^2_{1234*})$ variables for all $i, j, k, l$.

For the fixed effects $\alpha_{1*}$, $\alpha_{3*}$ and $\alpha_{1*3*}$ we likewise write $\sigma^2_{1*}$, $\sigma^2_{3*}$ and $\sigma^2_{1*3*}$ 1).

1) Some authors write $\kappa^2$ or $[\sigma^2]$ to indicate that this is not a true variance. I prefer to reduce the number of symbols if not confusing. In this case one …
3. Independent interactions
The calculation of the EMS in section 3d applies when it is assumed that mixed interactions (in which some factors are random and others fixed) are represented by random variables which are not independent, but sum to zero with respect to each parametric factor. We saw $(a_{12*3*})_{ij\cdot} = 0$. If the $(a_{12*3*})_{ijk}$ are mutually independent random variables, we denote this in our model by underlining this variable: $\underline{a}_{12*3*}$.
For this case the method of calculation of the EMS needs only slight modification: the corresponding correction $c_{12*3*}$ loses its asterisks and becomes $c_{123}$.
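As an illustration of this modification, consider the row $A_{12*}$ of the worked example in section 4. The following lines are a sketch derived from the rule just stated, not a result quoted from the report, and they assume that a correction without asterisk equals 1. Cancelling the subscripts 1 and 2 in $c_{123}$ leaves $c_3 = 1$ where the restricted model had $c_{3*} = 0$, so the interaction component now enters the expected mean square:

```latex
\bar{c}_{12*} = (0, 0, 0, 1, 0, c_3, c_{34*}) = (0, 0, 0, 1, 0, 1, 1),
\qquad
EMS_{12*} = \sigma_e^2 + 2\sigma^2_{12*3*} + 4\sigma^2_{12*}.
```

Under the side conditions of appendix 2 the same row gives $\sigma_e^2 + 4\sigma^2_{12*}$, so the two assumptions lead to different denominators in the F-tests.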
References
[1] J. Cornfield and J.W. Tukey (1956), Average values of mean squares in factorials, Annals of Mathematical Statistics (27), pp. 907-949.
[2] Ch.R. Hicks (1964), Fundamental concepts in the design of experiments, Holt, Rinehart and Winston, New York, pp. 153-174.
[3] H. Scheffé (1959), The analysis of variance, Wiley, New York, pp. 261-288.
[4] S.R. Searle (1971), Linear models, Wiley, New York, pp. 400-404.