
Bayesian model selection for constrained multivariate normal linear models



BAYESIAN MODEL SELECTION FOR

CONSTRAINED MULTIVARIATE NORMAL

LINEAR MODELS

Joris Mulder


BAYESIAN MODEL SELECTION FOR

CONSTRAINED MULTIVARIATE NORMAL

LINEAR MODELS

BAYESIAANSE MODELSELECTIE VOOR GERESTRICTEERDE

MULTIVARIAAT NORMALE LINEAIRE MODELLEN

(with a summary in Dutch)

DISSERTATION

to obtain the degree of doctor at Utrecht University, on the authority of the rector magnificus, prof. dr. J. C. Stoof, pursuant to the decision of the Doctorate Board, to be defended in public on Friday 3 December 2010 at 12.45 hours

by

Joris Mulder

born on 21 December 1981 in Enschede


Co-promotores: Dr. ir. G. J. A. Fox, Dr. I. G. Klugkist

The research in this dissertation was financially supported by a VICI research grant from the Netherlands Organisation for Scientific Research (NWO-VICI-453-05-002) to H. J. A. Hoijtink.


Prof. dr. P. A. L. de Boeck Prof. dr. P. G. M. van der Heijden Prof. dr. G. K. J. Maris

Prof. dr. J. K. Vermunt

Mulder, Joris

Bayesian Model Selection for Constrained Multivariate Normal Linear Models

Dissertation, Utrecht University, Utrecht. - With references. - With a summary in Dutch.

ISBN: 978-90-393-5396-7

Printed by ZuidamUithof Drukkerijen, Utrecht, the Netherlands. First print 2010. Cover design by Joris Mulder

Copyright © J. Mulder, 2010. All Rights Reserved.

Neither this book nor any part may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, microfilming, and recording, or by any information storage and retrieval system, without written permission of the author. Alle rechten voorbehouden. Niets uit deze uitgave mag worden verveelvuldigd, in enige vorm of op enige wijze, zonder voorafgaande schriftelijke toestemming van de auteur.


To my mother Jos

and in memory of my father Marcel


Acknowledgments

I would like to use this part of my dissertation to thank some people who have been important to me during these last four years as a PhD student. First, I would like to thank my promotor and daily supervisor Herbert Hoijtink. Thank you for your guidance, your support, and also for providing a pleasant working environment. I always enjoyed our meetings at the office and our talks outside the office. I would also like to thank Irene Klugkist for her supervision, especially in the beginning of my PhD project. I hope we stay in touch and work on more papers about inequality constraints, Bayes factors, prior distributions, and other Bayesian stuff.

In the last two years of my PhD, I also worked with Jean-Paul Fox from OMD (University of Twente). I would like to thank you for your guidance. Your passion for the job really inspired me. After we finish our first paper, I hope we shall continue working together in the future.

In the spring of 2010, I visited John Boscardin at the department of Epidemiology and Biostatistics at UCSF in San Francisco. I really enjoyed my time in San Francisco and I would like to thank you for that. Hopefully we shall continue working on our ideas and write some papers in the near future.

Some of the chapters in this thesis have been written in close collaboration with fellow researchers: other PhD students, a master's student, and a professor. I would like to thank you all for your contributions. In particular I would like to thank Marjolein Kammers and Rens van de Schoot for your contributions and enthusiasm.

Also I would like to thank all the people from M&T at Utrecht University for making me feel very welcome in the department. I always enjoyed the times when we got together during birthdays, borrels, and (sometimes) small dinners. In particular I would like to thank my friend Carel, with whom I shared so many good times at the office and out of the office. I remember sharing work ideas, the singing (such as “Aap-mens”, “Paardenvijg waar was gij al die tijd?”, and of course “Als ik later groot ben dan lust ik wel een aardappel”), and just hanging around together. Thanks dude!

My friends have always been very important to me. This has been no different while writing my dissertation. I would like to thank you all for the drinks we enjoyed, the talks about Life, the Universe, and Everything, and for your support at the moments I needed it. My good old friends from Enschede, my mates from Applied Mathematics, my new friends from Utrecht, especially my buds in my band ‘Spark to Fire’, and my friends that do not belong in one of the above categories: you all know you mean a great deal to me!

I would especially like to thank Floryt. I know a lot has changed recently, but this does not change the fact that you have been very important to me these last few years. Thank you for your love, your support, and all the good times we shared together!

Finally, I would like to thank my brother Ivo for being the coolest brother out there (no joke!) and my mother Jos for being the best mom possible.

During the past four years as a PhD student, I have changed from a mathematician who was interested in statistics into a Bayesian statistician. I remember Wim van der Linden warning me (in an ironic way) about becoming a ‘Bayesian’ when I told him I was planning to start as a PhD student at Utrecht University with Herbert Hoijtink as my promotor. In a way, Wim was right, because I have become really interested in the Bayesian way of using statistics. On the other hand, there may be some interesting colleagues at my new job at MTO at Tilburg University with whom I might work on some non-Bayesian papers. Time will tell.

Utrecht, October 2010 Joris Mulder


Contents

Acknowledgments . . . ix

List of Tables . . . xv

List of Figures . . . xvii

1 Introduction . . . 1

1.1 Motivating Example . . . 1

1.2 Selection Rules . . . 2

1.3 The Bayes Factor . . . 4

1.4 Outline of the Dissertation . . . 5

Part I Statistical Methodology

2 Equality and Inequality Constrained Multivariate Linear Models: Objective Model Selection Using Constrained Posterior Priors . . . 11

2.1 Introduction . . . 11

2.2 Model and Motivating Examples . . . 14

2.2.1 The Multivariate Normal Linear Model . . . 14

2.2.2 Examples of Constrained Model Selection . . . 15

2.3 The Bayes Factor Based on Encompassing Priors . . . 16

2.4 Complexity of Inequality Constrained Models . . . 18

2.5 Constrained Posterior Priors . . . 20

2.5.1 The Restricted Maximum Likelihood Estimator β̂_Rl . . . 21

2.5.2 The Covariance Matrix Σ̂_Rβ,l . . . 22

2.5.3 The Constrained Posterior Prior of Σ . . . 25

2.5.4 Properties of Constrained Posterior Priors . . . 26

2.5.5 Constrained Posterior Priors for Equality Constrained Models . . . 28

2.5.6 The Geometric IBF based on Constrained Posterior Priors . . . 28

2.6 Computing the Geometric IBF Using Gibbs Sampling . . . 30

2.6.1 Geometric IBFs for Inequality Constrained Models . . . 30

2.6.2 Geometric IBFs for Equality Constrained Models . . . 31


2.7.1 Multivariate One-Sided Testing . . . 33

2.7.2 Multivariate Analysis of Variance . . . 33

2.7.3 Multivariate Regression . . . 36

2.8 Summary . . . 37

3 Default Bayes Factors for Type B Testing Problems . . . 39

3.1 Introduction . . . 39

3.2 Partial Bayes Factors . . . 42

3.2.1 Application to the Type B Testing Problem . . . 43

3.2.2 Transforming Posterior Priors . . . 44

3.3 Fractional Bayes Factors . . . 46

3.3.1 Application to the Type B Testing Problem . . . 47

3.3.2 A Modified Fractional Marginal Likelihood for Inequality Constrained Models . . . 48

3.4 Comparison of BF*b,tt0 and B*tt0(Y−l|Yl) . . . 49

3.4.1 The Degree of Precision ct⁻¹ . . . 49

3.4.2 A Loss of Support for B*tt0(Y−l|Yl) . . . 51

3.5 Simulation Study . . . 55

3.6 Application: Repeated Hamilton Depression Scores . . . 58

3.7 Summary . . . 59

4 Bayesian Tests on Variance Components in a Compound Symmetry Covariance Matrix . . . 61

4.1 Introduction . . . 61

4.2 Testing Compound Symmetry . . . 63

4.2.1 Prior Specification . . . 64

4.2.2 Posterior Computation: Sampling of the Mean and Variance Components . . . 65

4.2.3 Illustration . . . 67

4.3 Testing an Invariant Compound Symmetry Structure . . . 69

4.3.1 Bayes Factors for Inequality Constrained Models . . . 70

4.3.2 Bayes Factors for Equality Constrained Models . . . 72

4.4 Empirical Data Example: 1982 HSB Survey Data . . . 76

4.5 Summary . . . 79

4.A Derivation of the Bayes Factor With a Multivariate Normal Prior and Posterior . . . 79

4.B Derivation of the Bayes Factor With a Multivariate Student t Prior and Posterior . . . 80

4.C Derivation of the Marginal Posterior of Στ . . . 81

Part II Software and Applications

5 BIEMS: A Fortran 90 Program for Calculating Bayes Factors for Inequality and Equality Constrained Models . . . 87

5.1 Introduction . . . 87

5.2 Constrained Model Specification . . . 88

5.2.1 The Multivariate Normal Linear Model . . . 88

5.2.3 Equality and Inequality Constrained Multivariate Normal Linear Models . . . 90

5.3 Default Prior Specification . . . 91

5.3.1 Linear Restrictions for the Constrained Posterior Priors . . . 91

5.3.2 Hyper-Parameters of the Constrained Posterior Priors . . . 92

5.3.3 Conjugate Expected-Constrained Posterior Prior and Posterior . . . 95

5.3.4 Issues with Minimal Training Samples . . . 96

5.4 Calculating Bayes Factors . . . 96

5.4.1 Estimating ft and ct for an Inequality Constrained Model . . . 97

5.4.2 Sampling Error of the Estimate of the Bayes Factor . . . 99

5.4.3 Equality Constrained Models . . . 100

5.5 Short Examples . . . 102

5.5.1 2 by 3 ANOVA . . . 103

5.5.2 Regression 1: Inequality Constrained Models . . . 103

5.5.3 Regression 2: Variable Selection . . . 104

5.5.4 MANCOVA . . . 105

5.5.5 Repeated Measurements . . . 106

5.6 Conclusions . . . 107

5.A BIEMS User Manual . . . 107

5.B User Manual for the BIEMS Data Generator . . . 112

5.C Analyses of Empirical Studies . . . 115

6 Bayesian Model Selection of Informative Hypotheses for Repeated Measurements . . . 119

6.1 Introduction . . . 119

6.2 Illustration . . . 121

6.2.1 Satisfaction with Peer Relations . . . 121

6.2.2 One Within Factor . . . 121

6.2.3 One Within and One Between Factor . . . 123

6.2.4 Mixed Design Controlling for a Time-Invariant Covariate . . . 125

6.2.5 Mixed Design Controlling for a Time-Varying Covariate . . . 126

6.3 Model Selection Criteria for Informative Hypotheses . . . 127

6.4 Objective Model Selection Using the Bayes Factor . . . 128

6.4.1 Introduction . . . 128

6.4.2 Combining Model Fit and Model Complexity . . . 130

6.4.3 A Prior Based on Training Data . . . 133

6.4.4 Estimating the Bayes Factor Using a Gibbs Sampler . . . 137

6.5 Simulation Study . . . 138

6.5.1 Simulation 1 . . . 138

6.5.2 Simulation 2 . . . 139

6.5.3 Simulation 3 . . . 140

6.6 Illustration (Continued) . . . 141

6.6.1 One Within Factor . . . 141

6.6.2 One Within and One Between Factor . . . 143

6.6.3 Mixed Design Controlling for a Time-Invariant Covariate . . . 143

6.6.4 Mixed Design Controlling for a Time-Varying Covariate . . . 144

6.6.5 Concluding Remarks on the Illustration . . . 144

6.7 Summary . . . 145


6.B Constrained Posterior Prior Based on a Minimal Training Sample . . . 146

6.C Conjugate Expected-Constrained Posterior Prior . . . 149

6.D Gibbs Sampler for CECPP and Posterior Distribution . . . 150

6.E Gibbs Sampler for Constrained Models Containing Equalities . . . 151

7 The Weight of Representing the Body: Addressing the Potentially Indefinite Number of Body Representations in Healthy Individuals . . . 153

7.1 Introduction . . . 153

7.2 Challenges in the Study of Body Representation . . . 154

7.3 A New Approach . . . 157

7.4 Two Rubber Hand Illusion Experiments as a Technical Illustration . . . 157

7.4.1 Bayesian Model Selection: Example 1 . . . 158

7.4.2 Bayesian Model Selection: Example 2 . . . 161

7.5 Conclusion: The Weight of Representing the Body . . . 165

Discussion and Remaining Issues . . . 167

References . . . 171

Index . . . 177

Samenvatting (Summary in Dutch) . . . 179


List of Tables

2.1 Model complexity ct of different inequality constrained models Mt. . . 20

2.2 Sample means and standard errors of the dependent variables (y1, y2). . . 33

2.3 Sample means and standard errors sorted by group j = 1, . . . , 4. . . 36

2.4 Means and standard errors of the standardized regression coefficients. . . 37

3.1 Sample means and sample covariance matrix of the Hamilton depression scores. . . 58

6.1 Model complexity ct, model fit ft of Posteriors 1 and 2 (Figure 6.5b), and Bayes factors Bt0 of two hypothetical data sets. . . 134

6.2 Bayes factors Bt0 of four constrained models Mt, t = 1, . . . , 4, versus the unconstrained model M0 of simulation 3. . . 141

6.3 Sample Means (M) and Standard Deviations (SD) for Perceived Satisfaction, Marital Conflict and Generalized Anxiety Disorder at Five Waves for Boys (n = 196) and Girls (n = 144) . . . 142

6.4 Constrained models for the one within factor model and corresponding Bayes factors Bt0 versus the unconstrained model M0 . . . 142

6.5 Bayes factors of constrained models versus the unconstrained model M0 for the one within factor and one between factor model without covariate (Analysis 2), with time-invariant covariate (Analysis 3), and with time-varying covariate (Analysis 4). . . 144

7.1 Illustrative data of five participants. . . 158

7.2 Overview of the real dataset, showing the RHI-dependent location error in centimeters (cm) for each perceptual response (data previously published in Kammers et al. (2009a)). Note. l = left, r = right. . . 163

7.3 Bayes factors between the constrained models M1, M2, and M3. . . 164


List of Figures

1.1 Graphical representation of the three constrained models M1: µ = 0 (dot), M2: µ > 0 (squares), and M3: µ2 > µ1 (light grey). . . 3

2.1 The arithmetic IBF (dotted line), the Bayes factor based on the intrinsic prior (solid line), and the geometric IBF using constrained posterior priors (dashed line) of M1 versus M0 for samples generated from a N(µ, 1) distribution with different means. . . 13

2.2 Geometric IBFs B10^GI (triangle, M1: µ = 0) and B20^GI (circle, M2: µ > 0) of data sets with different mean vectors (µ1, µ2) = (µ, µ) with µ = −2, . . . , 2, sorted by correlation ρ = 0, 0.5 and sample size n = 10, 36. . . 34

2.3 Geometric IBFs B10^GI (triangle, M1: µ1 = . . . = µ4) and B20^GI (circle, M2: µ1 < . . . < µ4) of data sets with different means, sorted by correlation ρ = 0, 0.5 and sample size n = 40, 400. . . 35

2.4 Geometric IBFs B10^GI (triangle), B20^GI (circle), and B30^GI (square) of data sets with different regression coefficients, sorted by sample size n = 25, 60 and correlation ρ = 0, 0.5. . . 37

3.1 (a) Transforming the posterior prior π0 (solid ellipses) to the line µ1 = µ2, resulting in the constrained posterior prior π0* (dashed circles) with equal means and larger variances. (b) Relocating the boundary µ1 = µ2 of the inequality constrained space of model M1: µ1 > µ2 to the line µ1 − ȳ1 = µ2 − ȳ2, where the fraction of the likelihood, g^b, is localized. . . 45

3.2 Default Bayes factors of the inequality constrained models M1 (left-hand panel) and M2 (right-hand panel) against the unconstrained model. Each row represents a different covariance matrix Σk, for k = 1, 2, 3, given in (3.21). (Note. medIBF is the standard median intrinsic Bayes factor. FBF is the standard fractional Bayes factor. The * refers to the proposed alternatives.) . . . 56

3.3 Natural logarithm of the default Bayes factors of the type B testing problem of model M1 versus model M2. Each plot represents a different covariance matrix Σk, for k = 1, 2, 3, given in (3.21). The standard fractional Bayes factor and standard intrinsic Bayes factor were omitted due to the poor results for nested models in Figure 3.2. . . 57


4.1 (a) Allowed parameter space of (τ, σ²) in the compound symmetry structure Σ_nj^CS1 = σ²((1 − τ)I_nj + τJ_nj), and (b) in the compound symmetry structure Σ_nj^CS2 = σ²I_nj + τJ_nj. . . 64

4.2 Posterior sample densities of τ for the data sets of Box and Tiao (1973, p. 256–257). The solid line was obtained using Metropolis–Hastings on the complete likelihood in (4.8), and the dashed line was obtained using the likelihood on the group means in (4.12) when sampling τ. . . 68

4.3 Allowed parameter space of M3: τ1 < τ2 and M4: τ1 > τ2 with (τ1*, τ2*) = (−1/(n_max,1 − 1), −1/(n_max,2 − 1)). If n_max,1 < n_max,2 and assuming a uniform distribution over (−1/(n_max,1 − 1), 1) × (−1/(n_max,2 − 1), 1), the probability mass in the constrained space of M3 is larger than the probability mass in the space of M4, resulting in a Bayes factor biased towards model M4. . . 72

4.4 Cross-section of the prior density (a) and posterior density (b) of (τ1, τ2) on the line τ1 = τ2. For the prior, the surface of the cross-section is .12 and for the posterior .17, so that B10 = .17/.12 = 1.42. . . 76

4.5 Prior density (a) and approximated posterior density (b) of (τP, τC) for the HSB data. The surface of the cross-section through the line τP = τC (grey area) was equal to √2 · .97 = 1.37 and 0 for the prior and posterior, respectively. . . 78

5.1 Contours of different priors: π1 represents uniform priors for (µ1, µ2), π2 represents identical normal priors for (µ1, µ2), and π3 represents normal priors with var(µ1) = 4 var(µ2). The complexity of the inequality constrained model M1: µ1 > 2µ2 > 0 corresponds to the relative size of the grey areas. Hence, c11 = .0625, c21 = .0738, and c31 = .125. . . 94

5.2 Regression line of the sample size per two inequality constraints versus the standard error of the Bayes factor. The dots are the actual measurements of the standard errors. The Bayes factor of this data set is equal to 2.0e6. . . 100

5.3 Draws from the unconstrained posterior of (µ1, µ2) (black), and subsequent constrained posteriors under |µ1 − µ2| < δa, with δa+1 = δa/2 and δ0 = .9. The solid line is µ1 = µ2. The data were generated for two groups of size 100 with means µ = (0, 1.4)′ and error variance σ² = 1. . . 101

5.4 Prior (dash-dotted line) and posterior (solid line) sample density plots of a data set of size N = 30 from N(0, 1) (left panel) and a data set of size N = 15 from N(.7, 1) (right panel). Using the Savage–Dickey density ratio, the Bayes factors of M1: µ = 0 versus M0: µ ∈ ℝ¹ equal B10 ≈ 2.1/.4 = 5.25 (left panel) and B10 ≈ .07/.29 = .24 (right panel). . . 102

6.1 Hypotheses H1, H2 and H3 for the One Within Factor Design . . . 122

6.2 Hypotheses M1, . . . , M4 in (a) to (d), respectively, for the One Within and One Between Factor Design . . . 123

6.3 Comparing boys' and girls' satisfaction averages after correction for


6.4 Graphical representation of the inequality constrained regions of the models M1: µ1 < µ2, M2: µ1 < µ2 < µ3, and M3: 0 < µ2 − µ1 < µ3 − µ2 in comparison to the complete unconstrained space of (µ1, µ2, µ3). When assuming a uniform prior over the complete space, the prior probability masses in agreement with models M1, M2, and M3 are equal to 1/2, 1/6, and 1/12, respectively. . . 132

6.5 (a) 95% iso-density sphere of the multivariate normal prior with equal means and variances. (b) 95% iso-density spheres of two hypothetical posteriors. The grey areas can be interpreted as the probability masses located in the inequality constrained space of model M2. . . 133

6.6 Representation of the posterior prior and the constrained posterior prior under the restriction µ1 = µ2 = µ3. Note: PP = posterior prior, CPP = constrained posterior prior. . . 136

6.7 Sketch of the empirical expected-constrained posterior prior (EECPP) based on four different constrained posterior priors (grey spheres) and the corresponding conjugate expected-constrained posterior prior (CECPP) for (µ1, µ2, µ3) under the restriction µ1 = µ2 = µ3. . . 137

6.8 Trajectory A-B-C-D-E-F of the measurement means (µ1, µ2, µ3) that were used to generate data. Note that M3 ⊂ M2 ⊂ M1 ⊂ M0. . . 139

6.9 Bayes factors Bt0 of constrained models Mt, t = 1, . . . , 4 (solid, dashed, dotted, dash-dotted line, respectively), versus the unconstrained model M0 for data sets generated with measurement means on the trajectory A-B-C-D-E-F in Figure 6.8. . . 139

6.10 Bayes factors Bt0 of constrained models Mt, t = 1, . . . , 4 (solid, dashed, dotted, dash-dotted line, respectively), versus the unconstrained model M0 for data sets generated with adjusted means on the trajectory A-B-C-D-E-F in Figure 6.8, with a standard normal covariate and slope αp = .5, for p = 1, 2, 3. . . 140

7.1 Sketch of contour plots of prior and posterior densities based on the data of Table 7.1. The complete square can be interpreted as the unconstrained space of M0 and the grey area as the inequality constrained space of M1: µ1 > µ2. The proportion of the prior satisfying µ1 > µ2 is 0.5. The proportion of the posterior satisfying µ1 > µ2 is 0.97. Note that the prior distribution is broad and vague, whilst the posterior distribution is narrower and centred on the means in the empirical data. . . 160

7.2 Set-up of the experiment that was described in Kammers et al. (2009a). . . 161


1 Introduction

One of the main goals of statistical science is to provide tools that can be used for testing hypotheses based on a sample from a population of interest. This dissertation explores how the Bayes factor, a Bayesian model selection criterion, can be used to test hypotheses containing equality and inequality constraints on the model parameters. When using equality constraints as well as inequality constraints, a researcher is able to formulate hypotheses that correspond to very specific expectations. For this reason, these hypotheses are also referred to as informative hypotheses (Hoijtink et al., 2008). The focus shall be on the multivariate normal linear model, which includes models that are commonly used in the applied sciences such as (M)AN(C)OVA, (multivariate) regression, and repeated measurements. The following example illustrates a typical model selection problem that will be addressed throughout this dissertation.

1.1 Motivating Example

Interest is in the influence of a medical intervention on the cell counts of two cell types. The counts of both cell types are obtained for N subjects before and after the intervention. For both cell types, the recorded counts before the intervention are subtracted from the counts after the intervention. This results in two cell count differences (yi1, yi2), for cell types I and II, respectively, for i = 1, . . . , N. When the distribution of the cell count differences (yi1, yi2) has an elliptic shape, the data can be approximated using a bivariate normal model given by

$$\begin{bmatrix} y_{i1} \\ y_{i2} \end{bmatrix} \sim N\!\left( \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix}, \begin{bmatrix} \sigma_1^2 & \sigma_{12} \\ \sigma_{12} & \sigma_2^2 \end{bmatrix} \right), \quad (1.1)$$

where µ = (µ1, µ2)′ are the means of the cell count differences and the 2 × 2 matrix is the error covariance matrix, denoted by Σ.

Three expectations are formulated by medical experts. The first states that the intervention has no effect on the cell counts. The second states that the intervention has a positive effect on the cell counts. The third states that the influence of the intervention is larger for cell type II in comparison to cell type I; more specifically, the number of cells after the intervention minus the cell counts before the intervention is larger for cell type II than for cell type I. Based on model (1.1), these expectations can be formulated in the following three equality and inequality constrained models,

M1: µ1 = 0, µ2 = 0
M2: µ1 > 0, µ2 > 0        (1.2)
M3: µ2 > µ1.

Note that the nuisance covariance matrix Σ is omitted here to simplify the notation. In Figure 1.1, a graphical representation is given of the parameter spaces of (µ1, µ2) under the three models. Subsequently, the goal is to determine which model receives the most evidence from the data.

This example illustrates the type of model selection problem that is addressed in this dissertation. First, the models contain equality and inequality constraints between the model parameters. These constraints can be seen as a mathematical translation of the expectations stated by applied researchers. Second, the models of interest can be nested, partly nested, or nonnested. Third, the models are all nested in an unconstrained model, which is denoted by M0. In the example above, M0: (µ1, µ2)′ ∈ ℝ². Fourth, the set of models under investigation, {M1, . . . , MT}, may contain more than two models, i.e., T may be larger than 2.

1.2 Selection Rules

The most commonly used statistical tool for hypothesis testing in the applied sciences is the p-value. Loosely formulated, the p-value can be interpreted as the probability that the observed data (or more extreme data) would be generated under the null hypothesis. A small p-value therefore indicates that such data are unlikely to be observed under the null hypothesis. Note here that the p-value actually depends on data that were not observed. A famous quote by Jeffreys (1963, p. 385) about this observation states: “What the use of p implies, therefore, is that a hypothesis that may be true may be rejected because it has not predicted observable results that have not occurred”. In this dissertation we shall not add to the debate on the use and misuse of p-values for hypothesis testing. Regarding the selection problem in (1.2), we consider the following argument for not using p-values. Because of the difficulty of obtaining p-values under a null hypothesis that contains inequality constraints, the null hypothesis is generally formulated with equality constraints between the parameters of interest. In (1.2) the null hypothesis can be formulated as H0: µ1 = 0, µ2 = 0 (if the interest is in the change in the counts of both cell types)


Fig. 1.1. Graphical representation of the three constrained models M1: µ = 0 (dot), M2: µ > 0 (squares), and M3: µ2 > µ1 (light grey).

or H0: µ1 = µ2 (if the interest is in the difference between the changes in the counts of the two cell types). The standard alternative hypothesis is equal to the complement of this null hypothesis. In their recent book on constrained statistical inference, Silvapulle and Sen (2005) discuss how p-values can be obtained for testing an equality constrained null hypothesis against an inequality constrained alternative, and an inequality constrained null hypothesis against an unconstrained alternative. However, p-values cannot be obtained between (nonnested) inequality constrained hypotheses (or models), such as M2: µ1 > 0, µ2 > 0 against M3: µ1 < µ2. Furthermore, the use of p-values may result in conflicting conclusions when more than two models are under evaluation. For these reasons, we do not further explore the use of p-values as a selection rule for (1.2).

An alternative to p-values when choosing the best of a set of (nonnested) models is formed by the class of information criteria, such as Akaike's information criterion (Akaike, 1973), the Bayesian or Schwarz information criterion (Schwarz, 1978), and the deviance information criterion (Spiegelhalter et al., 2002). These criteria balance fit against complexity when determining which model is best given the data at hand. Generally, the measure of fit is based on the likelihood of the data, and the complexity of a model is related to the number of free parameters in the model. For this reason, if two models fit the data equally well, the simplest (or smallest) model (with the fewest parameters) is preferred. This means that these selection criteria work as an “Ockham's razor”. In the case of inequality constrained models, however, the number of free parameters may not be a good measure of model complexity. For instance, is the number of free parameters of the inequality constrained model M3: µ2 > µ1 equal to the number of free parameters in the unconstrained model M0: µ1, µ2? As an alternative, Anraku (1999) proposed an AIC with an adjusted penalty for model complexity for models with simple order restrictions. In Chapter 6, we briefly return to the use of information criteria for selecting between inequality constrained models.


1.3 The Bayes Factor

In the Bayesian approach, the information we have about the parameters is modeled through probability distributions. First, a probability distribution needs to be specified before observing the data. This distribution is known as the prior distribution, which reflects the information we have before observing the data. It is possible to specify prior distributions that contain information, for instance, based on the results of previous research. On the other hand, prior distributions can be specified that do not contain any information about the values that the parameters can take. Such priors are referred to as noninformative. In this dissertation, priors will be used that are relatively uninformative. The likelihood function contains the information that is present in the data. Application of Bayes' theorem yields the posterior distribution of the model parameters, which combines the information in the prior and the likelihood, i.e.,

$$\pi_t(\theta_t \mid y) = \frac{g_t(y \mid \theta_t)\,\pi_t(\theta_t)}{\int g_t(y \mid \theta_t)\,\pi_t(\theta_t)\,d\theta_t}, \quad (1.3)$$

where πt(θt) denotes the prior distribution of the model parameters θt under model Mt, gt(y|θt) is the likelihood function of the data y, and πt(θt|y) is the posterior distribution under model Mt. The denominator in (1.3) is also referred to as the marginal likelihood of the data under model Mt, denoted by mt(y). The posterior distribution reflects the information we have about the parameters after observing the data. Some useful references on Bayesian statistics for data analysis are, for instance, Zellner (1997), O'Hagan and Forster (2004), Gelman et al. (2005), and Press (2005).
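As a concrete instance of the update in (1.3), the sketch below computes the posterior of µ in a bivariate normal model like (1.1), under the simplifying assumption that the error covariance Σ is known and a normal prior is placed on µ; the normal-normal conjugate formulas then give the posterior in closed form. The function name and settings are illustrative, not taken from the dissertation.

```python
import numpy as np

def posterior_mu(y, Sigma, mu0, Sigma0):
    """Posterior of mu for y_i ~ N(mu, Sigma) with prior mu ~ N(mu0, Sigma0).

    Standard normal-normal conjugate update with known error covariance Sigma:
    the posterior precision is the sum of prior and data precisions, and the
    posterior mean is the precision-weighted average of mu0 and the sample mean.
    """
    n = y.shape[0]
    ybar = y.mean(axis=0)
    prior_prec = np.linalg.inv(Sigma0)
    data_prec = n * np.linalg.inv(Sigma)
    post_cov = np.linalg.inv(prior_prec + data_prec)
    post_mean = post_cov @ (prior_prec @ mu0 + data_prec @ ybar)
    return post_mean, post_cov

# Illustration: with a very vague prior the posterior centres on the sample mean.
rng = np.random.default_rng(0)
y = rng.multivariate_normal([0.5, 1.0], np.eye(2), size=200)
mean, cov = posterior_mu(y, np.eye(2), np.zeros(2), np.eye(2) * 1e6)
```

With a vague prior (large Sigma0), the data precision dominates, so the posterior mean approaches the sample mean and the posterior covariance approaches Σ/n.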

In this dissertation, the Bayes factor will be used as the selection criterion between equality and inequality constrained models. This measure dates back to Jeffreys (1963). For more recent work on the use of Bayes factors for model selection, see, for instance, Kass and Raftery (1995), Berger and Pericchi (1996), and Hoijtink et al. (2008). The Bayes factor of M1 against M2 is defined as the ratio of the marginal likelihoods of models M1 and M2, i.e.,

$$B_{12} = \frac{m_1(y)}{m_2(y)} = \frac{\int g_1(y \mid \theta_1)\,\pi_1(\theta_1)\,d\theta_1}{\int g_2(y \mid \theta_2)\,\pi_2(\theta_2)\,d\theta_2}. \quad (1.4)$$

Hence, the Bayes factor can be seen as a likelihood ratio weighted according to the prior densities. The outcome of the Bayes factor can be interpreted as the amount of evidence in the data in favor of model M1 against M2. For instance, if B12 = 5, this means that model M1 is five times more likely than model M2. Some consistency properties of the Bayes factor are:

1. B11 = 1;
2. B12 ≥ 0;
3. B12 = B21⁻¹;
4. B12 = B10/B20.

An important aspect in the calculation of Bayes factors between the constrained models is the incorporation of the unconstrained model M0. Under very general conditions, the Bayes factor of an inequality constrained model Mt against the unconstrained model M0 can be expressed as the ratio of the posterior and prior probabilities that the inequality constraints hold. These probabilities can be computed relatively easily, which is generally not the case for the marginal likelihoods mt(y) in (1.4). Besides this computational advantage, these probabilities are also insightful when determining which model is most supported by the data. First, the posterior probability serves as a measure of model fit: when the posterior probability that the inequality constraints hold is large, this means that the inequality constraints are supported by the data, and vice versa. Second, the prior probability serves as a measure of model complexity: when the prior probability that the inequality constraints hold is small (assuming a relatively noninformative proper prior distribution), this means that the inequality constrained space of Mt is relatively small. This can be interpreted as a small model, which corresponds to a precisely formulated expectation using inequality constraints. For equality constrained models, a similar result applies, as will be shown in Chapter 4. Chapters 2, 3, and 6 describe in more detail how proper priors can be constructed that result in Bayes factors with an effective balance between fit and complexity for models containing equality and/or inequality constraints. This is based, among others, on the work by O'Hagan (1995) and Berger and Pericchi (1996).
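The ratio just described can be estimated by simple Monte Carlo: sample from the unconstrained prior and posterior, and count how often the inequality constraints hold. The sketch below does this for M3: µ2 > µ1, using an illustrative normal approximation to the posterior and a vague proper prior; the numbers are hypothetical, and the priors actually used in this dissertation are constructed in Chapters 2, 3, and 6.

```python
import numpy as np

rng = np.random.default_rng(7)
holds = lambda m: m[:, 1] > m[:, 0]          # constraint of M3: mu2 > mu1

# Illustrative unconstrained posterior and vague proper prior for (mu1, mu2).
post = rng.multivariate_normal([0.3, 0.8], np.eye(2) * 0.01, size=100_000)
prior = rng.multivariate_normal([0.0, 0.0], np.eye(2) * 10.0, size=100_000)

f_t = holds(post).mean()    # model fit: posterior probability constraint holds
c_t = holds(prior).mean()   # model complexity: prior probability (about 1/2)
B_t0 = f_t / c_t            # Bayes factor of M3 against the unconstrained M0
```

In this illustration nearly all posterior mass satisfies µ2 > µ1 while only about half of the prior mass does, so B_t0 comes out close to 2: the constraint is well supported by the data, but it is also not a very precise (small) model.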

Based on the Bayes factors of each constrained model against the unconstrained model, i.e., Bt0 for t = 1, . . . , T, the Bayes factors between the constrained models can be obtained using the fourth property stated above. Furthermore, the Bayes factors Bt0 directly show whether the constrained models are supported to any degree by the data, i.e., when Bt0 > 1. Finally, when specifying that each model under investigation is equally likely a priori (i.e., P(Mt) = P(Mt') for all t, t' = 1, . . . , T), it is easy to obtain posterior model probabilities using

P(Mt|y) = Bt0 / Σ_{t'=1}^{T} Bt'0.    (1.5)

Hence, the posterior model probability of model Mt is a normalization of the Bayes factor Bt0, and therefore, the relative support by the data is identical for each pair of models whether one uses posterior model probabilities or Bayes factors. Expression (1.5) also shows that posterior model probabilities depend on the number of models under investigation T: P(Mt|y) is smaller when additional models are included in the set {M1, . . . , MT}. Although posterior model probabilities are not probabilities by definition because the triangular inequality does not hold, the outcomes are often interpreted as posterior probabilities that models are true under the assumption that the true model is present in the set of models under investigation. Because of this intuitive interpretation, posterior model probabilities are often used in papers that are published in applied journals, such as in Chapter 7. In the rest of this dissertation, however, the focus shall be on Bayes factors.
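As a minimal sketch of how (1.5) operates, the helper below (our own naming, not from the dissertation) normalizes a set of hypothetical Bayes factors Bt0 into posterior model probabilities, assuming equal prior model probabilities:

```python
# Sketch of Eq. (1.5): posterior model probabilities from the Bayes
# factors B_t0 of each constrained model M_t against the unconstrained
# model M_0, under equal prior model probabilities. The Bayes factor
# values below are hypothetical illustrations.

def posterior_model_probs(bayes_factors_vs_unconstrained):
    """Normalize Bayes factors B_t0 into posterior model probabilities."""
    total = sum(bayes_factors_vs_unconstrained)
    return [b / total for b in bayes_factors_vs_unconstrained]

# Hypothetical Bayes factors B_10, B_20, B_30 against M_0:
b = [4.0, 1.0, 0.5]
probs = posterior_model_probs(b)
print(probs)  # M_1 receives 4.0 / 5.5 of the posterior mass
```

Note that adding a fourth model to the list lowers every P(Mt|y), which is exactly the dependence on T mentioned above, while the ratios between the probabilities (the Bayes factors) stay unchanged.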

1.4 Outline of the Dissertation

This dissertation is divided into two parts: Part I focuses on the statistical methodology for calculating Bayes factors, and Part II describes statistical software that can be used to obtain Bayes factors as well as applications in which Bayes factors are used as a selection criterion for a set of constrained models of interest. The dissertation concludes with a discussion and some remaining issues.

PART I: Statistical Methodology

In Bayesian model selection, a well-known problem is that standard noninformative prior distributions cannot be used to obtain a sensible outcome of the Bayes factor because these priors are improper. The use of a small part of the data, i.e., a training sample, to obtain a proper posterior prior distribution has become a popular method to resolve


this issue and seems to result in reasonable outcomes of default Bayes factors, such as the intrinsic Bayes factor or a Bayes factor based on the empirical expected-posterior prior.

In Chapter 2, it will be illustrated that such default methods may not result in sensible outcomes when evaluating inequality constrained models that are supported by the data. To resolve this issue, a default method is proposed for constructing so-called constrained posterior priors, which are inspired by the symmetrical intrinsic priors discussed by Berger and Mortera (1999) for a simple inequality constrained model selection problem. The resulting Bayes factors can be called “balanced” because model complexity of inequality constrained models is incorporated according to a specific definition that is presented in this paper.

In Chapter 3, we explore the use of the partial Bayes factor and the fractional Bayes factor for comparing nested inequality constrained models and for type B testing problems, where the interest is in selecting between an inequality constrained model and its complement. For an inequality constrained model Mt against the unconstrained model M0, it is shown that the partial Bayes factor and the fractional Bayes factor can be expressed as a ratio of the posterior probability and the prior probability that the constraints of Mt hold. These expressions show that the selection criteria are consistent for type B testing problems but not when comparing nested inequality constrained models. For this reason, alternatives are proposed that result in consistent selection behavior both when testing nested inequality constrained models and for type B testing problems. In general, the proposed alternative for the fractional Bayes factor performs better than the alternative for the partial Bayes factor. However, for small effect sizes and certain covariance structures in the data, the alternative for the partial Bayes factor performs better. As an illustration, the methodology is applied to repeated Hamilton depression scores to test whether a decrease of depression scores occurred.

Chapter 4 explores how Bayes factors can be used to test dependency structures. In practice, complex dependency structures are often conditionally modeled, where random effects parameters are used to specify the natural heterogeneity in the population. Besides the common parameters, independent group-specific parameters are introduced with a common distribution function, usually normal, with a common mean and variance parameter. In this conditional modeling framework, the random effects parameters as well as the variance components need to be estimated to make inferences about the dependency structure. When interest is focused on the dependency structure, inferences can be made from a complex covariance matrix using a marginal modeling approach. In this marginal modeling framework, assumptions about conditional independence and random effects distributions are not required. Furthermore, testing covariance parameters is not a boundary problem in the marginal framework, whereas it is in the conditional framework.

Therefore, Bayesian tests are proposed on covariance parameter(s) in the marginal model, where observations are multivariate normally distributed. The outcome of the test can be used in model building to justify any conditional independence assumptions in the conditional modeling framework. In our approach, innovative proper prior distributions are introduced for the variance components such that the positive definiteness of the (compound symmetry) covariance matrix is ensured. An analytical form of the posterior density of the covariance parameter is derived that enables the computation of the Bayes factor and the posterior probability of the null hypothesis. A generalization is made to Bayes factors for testing heterogeneous covariance structures, where equality or inequality constrained hypotheses of covariance parameters are considered. As an illustration, the Bayes factor is used for testing (in)equalities between intraclass correlations of different group types, such as private, public, or Catholic schools, using the 1982 High School and


Beyond Survey data.

PART II: Software and Applications

Chapter 5 describes BIEMS (Bayesian Inequality and Equality constrained Model Selection), a Fortran 90 program that can be used for calculating Bayes factors of multivariate normal linear models with equality and/or inequality constraints between the mean parameters versus the unconstrained model. The prior that is used under the unconstrained model is the conjugate expected-constrained posterior prior, and the prior under a constrained model is proportional to the unconstrained prior truncated to the constrained space. This results in Bayes factors that appropriately balance model fit and complexity for a broad class of constrained models. When the set of equality and/or inequality constraints in the model represents a hypothesis that applied researchers have in, for instance, (M)AN(C)OVA, (multivariate) regression, or repeated measurements, the obtained Bayes factor can be used to determine how much evidence is provided by the data in favor of the hypothesis in comparison to the unconstrained model. If several hypotheses are under investigation, the Bayes factors between the constrained models can be calculated from the Bayes factors obtained with BIEMS, from which it can be concluded which hypothesis receives most support from the data.

The software package can be found on http://tinyurl.com/informativehypotheses. The package contains the software program BIEMS, including a user manual (which is presented in Appendix 5.A), and a software program (with manual) that can be used for generating data under the multivariate normal linear model. By analyzing data that they generated themselves, users are able to learn more about the selection behavior of Bayes factors for informative hypotheses. A user-friendly interface is also under construction that enables users who are less familiar with the technical details of statistical models to evaluate hypotheses using the Bayes factor.

Chapter 6 focuses on the analysis of repeated measurements data. When analyzing such data, researchers often have expectations about the relations between the measurement means. These expectations can often be formulated using equality and inequality constraints between (i) the measurement means over time, (ii) the measurement means between groups, (iii) the means adjusted for time-invariant covariates, and (iv) the means adjusted for time-varying covariates. The result is a set of informative hypotheses. In this chapter, the Bayes factor is used to determine which hypothesis receives most support from the data. A pivotal element in the Bayesian framework is the specification of the prior. To avoid subjective prior specification, training data in combination with restrictions on the measurement means are used to obtain so-called constrained posterior priors. A simulation study and an empirical example from developmental psychology show that this prior results in Bayes factors with desirable properties.

Chapter 7 describes a problem in experimental psychology about the characteristics and number of body representations in the brain. We examine the main problems that are encountered when trying to dissociate multiple body representations in healthy individuals with the use of bodily illusions. Traditionally, task-dependent bodily illusion effects have been taken as evidence for dissociable underlying body representations. Although this reasoning holds well when the dissociation is made between different types of tasks that are closely linked to different body representations, it becomes problematic when found within the same response task (i.e., within the same type of representation). Hence, this experimental approach to investigating body representations runs the risk of identifying as many different body representations as there are significantly different experimental


outputs. Here, we discuss and illustrate a different approach to this pluralism by shifting the focus towards investigating task-dependency of illusion outputs in combination with the type of multisensory input. Finally, we present two examples of behavioural bodily illusion experiments and apply Bayesian model selection to illustrate how this different approach of dissociating and classifying multiple body representations can be applied.


Part I


2

Equality and Inequality Constrained Multivariate Linear Models: Objective Model Selection Using Constrained Posterior Priors¹

2.1 Introduction

¹ This chapter is published as Mulder, J., Hoijtink, H., & Klugkist, I. (2010). Equality and inequality constrained multivariate linear models: Objective model selection using constrained posterior priors. Journal of Statistical Planning and Inference, 140, 887–906.

When analyzing multivariate data, researchers often have competing theories concerning the relationships between the model parameters. For instance, in multivariate linear models, a researcher may expect a specific ordering of the group means, expect that certain covariates have a larger effect on the dependent variable than other covariates, or expect certain means to be equal. These theories can be translated into models with specific inequality and equality constraints on the means and regression coefficients. The researcher is then interested in which model fits the data best. In this paper, a new method is proposed


to construct a default prior distribution for inequality and equality constrained models such that sensible outcomes of the Bayes factor are obtained. The resulting Bayes factor has a naturally incorporated “Ockham’s razor”. This means that it implicitly balances model fit and model complexity. For an overview of the properties and possibilities of the Bayes factor see, for instance, Kass and Raftery (1995) and Lavine and Schervish (1999). A useful reference on Bayesian model selection applied in the social sciences is the edited book by Hoijtink, Klugkist, and Boelen (2008).

A well-known problem with the Bayes factor is that its outcome depends on arbitrarily chosen constants when using improper, noninformative priors. This result dates back to Jeffreys (1961). A common method to overcome this problem is to combine a noninformative prior with a small part of the data, referred to as a training sample, to obtain a proper posterior prior. The remaining part of the data is then used for model comparison. The resulting Bayes factor is referred to as a partial Bayes factor. Although it is generally advised to use a minimal training sample (e.g., Berger & Pericchi, 1996a; O'Hagan, 1995), which is the smallest possible subset of the data rendering a proper posterior prior, the idea of training data was originally proposed by Lempers (1971), who used half of the data as training sample. The use of training data has been extensively investigated in the last decade (e.g., Pérez & Berger, 2002; Berger & Pericchi, 2004). A popular Bayes factor based on minimal training samples is the intrinsic Bayes factor (IBF) (Berger & Pericchi, 1996a, 1996b), which is the average over all partial Bayes factors based on all possible minimal training samples. A Bayes factor that is obtained without specifying ad hoc or subjective priors, such as the IBF, is referred to as a default Bayes factor.

However, direct application of the posterior prior approach may not give satisfactory results in the case of inequality constrained models. As an example, we determine the IBF for the following model selection problem, which was discussed by Berger and Pericchi (1996) in their Example 2. Consider the iid sample y = (y1, . . . , yn) that either comes from the inequality constrained model M1: yi ~ N(µ, 1) with µ < 0 or from the unconstrained model M0: yi ~ N(µ, 1) with µ ∈ R^1. Note here that the same symbol is used for µ under each model to simplify the notation. Berger and Pericchi (1996) suggest using different symbols for the model parameter under each model because µ may have different priors under each model in the region µ < 0. The outcome of the arithmetic IBF B10^AI, which is the arithmetic mean of all partial Bayes factors, for random samples of size n = 30 generated from a N(µ, 1) distribution with µ = −5, . . . , 3 is displayed in Figure 2.1 (dotted line). The arithmetic IBF approximates an actual Bayes factor based on a so-called intrinsic prior for µ under each model. These intrinsic priors are given by π1^IP(µ) = I_{µ<0}(µ) and π0^IP(µ) = Φ(−µ/√2), under model M1 and model M0, respectively, with I_{·}(µ) the indicator function and Φ(·) the standard normal cdf (Berger & Pericchi, 1996a, p. 119). Note here that the intrinsic prior of µ under the unconstrained model M0 is not symmetrical around µ = 0. The Bayes factor based on these intrinsic priors, B10^IP, was calculated and displayed in Figure 2.1 (solid line). The figure shows that these Bayes factors are not satisfactory when the mean µ is located in the inequality constrained model M1 because these Bayes factors do not prefer M1 over M0.

In another example by Berger and Mortera (1999) (their Example 1), the following models were under evaluation: M1 : µ = 0 versus M2 : µ < 0 versus M3 : µ > 0. The intrinsic prior of µ under model M2 and model M3 was chosen proportional to an encompassing intrinsic prior π0^IP under the unconstrained model M0 : µ ∈ R^1, i.e., πt^IP(µ) = π0^IP(µ) I_{µ∈Mt}(µ)/ct, for t = 2 or 3, with ct a normalizing constant.



Fig. 2.1. The arithmetic IBF (dotted line), the Bayes factor based on the intrinsic prior (solid line), and the geometric IBF using constrained posterior priors (dashed line) of M1 versus M0 for samples generated from a N (µ, 1) distribution with different means.

This encompassing intrinsic prior is symmetrical around µ = 0. Furthermore, the outcomes of these Bayes factors were satisfactory in the sense that the Bayes factor went to infinity in favor of the correct model as n → ∞.

In this paper, this idea of a default symmetric prior under the unconstrained model is generalized to the multivariate normal linear model. We shall refer to these priors as constrained posterior priors, which will be presented later in this paper. It will be shown that the methodology results in a Bayes factor that appropriately balances model fit and model complexity for a broad class of equality and inequality constrained models. In Figure 2.1 (dashed line), the geometric IBF based on constrained posterior priors is displayed for Example 2 discussed above. The figure shows that the least complex model M1 is preferred over the more complex model M0 by a factor of two when the fit of M1 and M0 is approximately equal. As will be shown, this factor of two is a direct result of the measure of complexity of an inequality constrained model that is introduced in this paper. The paper is organized as follows. Section 2 describes the multivariate normal linear model. Furthermore, three motivating examples are discussed with competing theories that can be translated into inequality and equality constrained models. In Section 3, the Bayes factor for a constrained model versus the unconstrained model is derived using a prior under the constrained model that is proportional to the prior under the unconstrained model. Section 4 provides a definition of model complexity for inequality constrained models. In Section 5, we present the constrained posterior prior (CPP), which results in a Bayes factor that takes the complexity of an inequality constrained model into account according to this definition. Furthermore, we discuss how to obtain an intrinsic Bayes factor based on all CPPs. Section 6 elaborates how to compute the Bayes factor using a Gibbs sampler. A simulation study and the calculation of the Bayes factor for the examples introduced in Section 2.2 are presented in Section 7. We end this paper with a short discussion.


2.2 Model and Motivating Examples

In this section, we describe three applications of the multivariate normal linear model. In each example, researchers have different theories about the relationships between the model parameters. By translating these theories into (in)equality constrained models, the problem of choosing the most appropriate theory becomes equivalent to selecting the best model. Before describing these examples, the multivariate normal linear model and the general system of model (in)equalities will be discussed.

2.2.1 The Multivariate Normal Linear Model

The multivariate normal linear model is given by

yi = Mdi + Axi + εi.    (2.1)

In this model, a P-variate dependent variable yi, i = 1, . . . , N, is explained by a K-variate explanatory variable xi, a vector di of size J containing group membership, and εi, a multivariate Gaussian error with εi ~ N(0, Σ). Furthermore, M is a P × J matrix of group means, where the (p, j)th element µpj denotes the mean (or intercept) of the pth dependent variable for the jth group, and A is a P × K matrix of regression coefficients, where the (p, k)th element αpk denotes the coefficient of the kth covariate for the pth dependent variable. The group indicator di is coded as follows:

dij= 1 if the ith observation belongs to group j 0 otherwise.

For N observations, the model can be written as

Y = XB + E,    (2.2)

with Y the N × P matrix with rows yi', X the N × (J + K) design matrix with rows (di', xi'), B the (J + K) × P coefficient matrix obtained by stacking M' on top of A', and E the N × P error matrix with rows εi'.

Throughout this paper, the unknown model parameters are denoted by Θ = {B, Σ}. The likelihood of the multivariate normal linear model is given by

g(Y|X, B, Σ) ∝ |Σ|^{-N/2} exp{-(1/2) tr[Σ^{-1}(Y − XB)'(Y − XB)]}
= exp{-(1/2)(β − β̂)'[Σ ⊗ (X'X)^{-1}]^{-1}(β − β̂)} · |Σ|^{-N/2} exp{-(1/2) tr(Σ^{-1}S)},    (2.3)

with β a vector of length D = P(J + K), which is the vectorization of B, that is, β = vec(B), β̂ the maximum likelihood estimator of β, and

S = (Y − XB̂)'(Y − XB̂).    (2.4)

Expression (2.3) shows that the likelihood is proportional to a normal-inverse-Wishart distribution.
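The quantities in (2.2)–(2.4) can be computed directly with ordinary least squares. The sketch below uses simulated data with illustrative dimensions; all variable names are ours, and the column-stacking order used for vec(B) is one possible convention.

```python
import numpy as np

# Minimal sketch of the quantities in (2.2)-(2.4): the ML estimate
# B_hat = (X'X)^{-1} X'Y, its vectorization beta_hat = vec(B_hat), and
# the residual matrix S = (Y - X B_hat)'(Y - X B_hat). Data are
# simulated; dimensions and names are illustrative assumptions.

rng = np.random.default_rng(1)
N, P = 50, 2                      # observations, dependent variables
J, K = 3, 1                       # groups, covariates

# Dummy matrix assigning each observation to one of J groups:
dummy = np.kron(np.eye(J), np.ones((N // J + 1, 1)))[:N]
x = rng.normal(size=(N, K))       # covariate values
X = np.hstack([dummy, x])         # N x (J + K) design matrix

B_true = rng.normal(size=(J + K, P))
Y = X @ B_true + rng.normal(scale=0.5, size=(N, P))

B_hat = np.linalg.solve(X.T @ X, X.T @ Y)   # (J + K) x P ML estimate
beta_hat = B_hat.flatten(order="F")         # vec(B_hat), length P(J + K)
S = (Y - X @ B_hat).T @ (Y - X @ B_hat)     # P x P residual matrix
print(B_hat.shape, beta_hat.shape, S.shape)
```

The normal equations X'(Y − XB̂) = 0 provide a quick sanity check that B̂ is indeed the least-squares (here also ML) solution.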


Consider an inequality constrained model Mt. The system of Lt inequality constraints of this model can be written as

Rtβ > rt, (2.5)

where Rt is an Lt × D matrix containing inequality coefficients and rt is a vector of length Lt containing constants. This system of inequalities is the multivariate equivalent of the univariate system considered by, for instance, Wolak (1987). Approximately equal parameters can also be modeled using (2.5):

|µ1 − µ2| < δ  ⇔  µ1 − µ2 > −δ and µ2 − µ1 > −δ,

where δ is the maximal distance between two parameters. Furthermore, models can be constructed using equality constraints:

Qtβ = qt, (2.6)

where Qt is an Ht × D matrix containing equality coefficients and qt is a vector of length Ht containing constants.

When combining (2.5) and (2.6), we obtain a constrained model with both inequalities and equalities. The system of constraints of model Mt is then denoted by

[Rt ; Qt] β ≥ [rt ; qt]  ⇔  R̃t β ≥ r̃t.    (2.7)
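Constraint systems of the form (2.5) are straightforward to encode as matrices. The sketch below (helper names are ours, not from the paper) builds Rt and rt for the ordering µ1 > µ2 > µ3 and for the "approximately equal" constraint |µ1 − µ2| < δ, and checks whether a given β satisfies them.

```python
import numpy as np

# Sketch of the inequality system (2.5), R beta > r, for D = 3 mean
# parameters beta = (mu1, mu2, mu3). Helper names are our own.

# Ordering mu1 > mu2 > mu3 as two rows of R beta > r:
R = np.array([[1.0, -1.0, 0.0],      # mu1 - mu2 > 0
              [0.0, 1.0, -1.0]])     # mu2 - mu3 > 0
r = np.zeros(2)

# "Approximately equal" |mu1 - mu2| < delta as two rows (see text):
delta = 0.1
R_approx = np.array([[1.0, -1.0, 0.0],    # mu1 - mu2 > -delta
                     [-1.0, 1.0, 0.0]])   # mu2 - mu1 > -delta
r_approx = np.array([-delta, -delta])

def satisfies(R, beta, r):
    """Check whether R beta > r holds componentwise."""
    return bool(np.all(R @ beta > r))

beta = np.array([3.0, 2.0, 1.0])
print(satisfies(R, beta, r))                # ordering holds
print(satisfies(R_approx, beta, r_approx))  # |3 - 2| < 0.1 fails
```

Equality rows (2.6) can be appended to R in the same way, which is exactly the stacked system (2.7).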

2.2.2 Examples of Constrained Model Selection

Multivariate One-Sided Testing: Cell Counts of HIV-Positive Newborn Infants

In an experiment described by Sleasman et al. (1999), researchers were interested in whether ritonavir therapy has a positive effect on the immunoreconstruction of 36 HIV-positive newborn infants. As in Larocque and Labarre (2004), we limit the analysis to the cell counts of CD45RA T and CD45RO T cells, which were measured at birth and after 24 weeks of therapy. The research question is equivalent to a one-sided testing problem:

• M1 : µ = 0

• M2 : µ > 0

where µ is a 2-dimensional vector containing the mean differences between the cell counts (after 24 weeks minus at birth) of the two cell types (the data are printed in Larocque & Labarre, 2004). Note that M1 assumes both means to be equal to zero, while M2 assumes both means to be larger than zero.

Multivariate Analysis of Variance: Does Vinylidene Fluoride Cause Liver Damage?

Vinylidene fluoride is suspected of causing liver damage. To investigate this, 4 groups of 10 male Fischer-344 rats received different dosages of this substance by inhalation. Three serum enzyme levels (SDH, SGOT, and SGPT), which often indicate liver damage, were


measured for each rat. The data, printed in Silvapulle and Sen (2005, p. 10), appeared in a report of Litton Bionetics Inc. (1984).

The data can be analyzed with a multivariate analysis of variance model. Let µj denote the 3-dimensional mean vector with serum levels of the jth group, j = 1, . . . , 4, where the dosage of vinylidene fluoride increases from group 1 to 4. The following two models are of interest:

• M1: µ1= µ2= µ3= µ4 • M2: µ1< µ2< µ3< µ4

The first model assumes that vinylidene fluoride has no influence on the three serum levels. The second model assumes that each serum level increases when increasing the dosage of vinylidene fluoride.

Multivariate Regression: Influence of Chemical Components on Cigarette Burn and Sugar Percentage

Anderson and Bancroft (1952, p. 205) provided a data set of chemical components of 25 tobacco leaf samples, consisting of six explanatory variables: nitrogen (X1), chlorine (X2), potassium (X3), phosphorus (X4), calcium (X5), and magnesium (X6), and two dependent variables: rate of cigarette burn in inches per 1,000 seconds (Y1) and per cent sugar in the leaf (Y2). We consider the following inequality and equality constrained models (inspired by the results of Bedrick & Tsai (1994), who also analyzed this data set) on the standardized regression coefficients αkp with k = 1, . . . , 6 and p = 1, 2:

• M1: α11= . . . = α61, α12= . . . = α62

• M2: 0 < (α11, α31, −α41, α51, −α61) < −α21, and 0 < (α12, α32, α42, α52, −α62) < α22,

• M3: 0 = α11= α41= α51= α61< α31< −α21, 0 = α32= α42= α52= α62< −α12< α22.

Model M1 assumes all standardized regression coefficients corresponding to a dependent variable to be equal, model M2 assumes that chlorine dominates the other explanatory variables in predicting the two dependent variables and that all standardized regression coefficients are either smaller than zero or larger than zero, and model M3 assumes that each dependent variable is explained by only two specific explanatory variables where chlorine is dominant over the other explanatory variable.

2.3 The Bayes Factor Based on Encompassing Priors

The Bayes factor of the constrained model Mt against the constrained model Mt', denoted by Btt', is defined as

Btt' = mt(Y)/mt'(Y) = ∫ g(Y|Θ) πt(Θ) dΘ / ∫ g(Y|Θ) πt'(Θ) dΘ,    (2.8)

with mt(Y) the marginal density of Y under model Mt, πt(Θ) the prior density under model Mt, and g(Y|Θ) the likelihood in (2.3) with X omitted because we assume X to


be fixed throughout this paper. The Bayes factor Btt' can be interpreted as the amount of evidence the data provide in favor of model Mt against model Mt'. For a thorough

discussion of the Bayes factor, see, for instance, Kass and Raftery (1995).

The (in)equality constrained models discussed in the previous section are all nested in the unconstrained model M0, which does not assume any constraints on the model parameters Θ. For this reason, we can specify a so-called encompassing prior π0 for Θ under the unconstrained model M0 (Berger & Mortera, 1999; Klugkist & Hoijtink, 2007). The prior distribution for Θ under a constrained model Mt is then proportional to the encompassing prior in the subspace of Θ allowed by Mt and zero elsewhere, that is,

πt(Θ) = c −1

t π0(Θ) if Θ ∈ Mt

0 otherwise, (2.9)

where ct is a normalizing constant given by ct=

Z

Θ∈Mt

π0(Θ)∂Θ. (2.10)

The posterior density of Θ under model Mt can be obtained using Bayes' law and the truncated prior density in (2.9), and is proportional to the posterior under the unconstrained model, denoted by π0(Θ|Y), in the subspace of Θ satisfying the (in)equality constraints of Mt, that is,

πt(Θ|Y) = ft^{-1} π0(Θ|Y) if Θ ∈ Mt, and 0 otherwise,    (2.11)

where ft is a normalizing constant for the posterior density given by

ft = ∫_{Θ∈Mt} π0(Θ|Y) dΘ.    (2.12)

When using the expression of the marginal likelihood originally proposed by Chib (1995), together with (2.9) and (2.11), the Bayes factor Bt0 can be expressed as

Bt0 = mt(Y)/m0(Y) = [g(Y|Θ*) πt(Θ*)/πt(Θ*|Y)] / [g(Y|Θ*) π0(Θ*)/π0(Θ*|Y)]
    = [πt(Θ*)/πt(Θ*|Y)] · [π0(Θ*|Y)/π0(Θ*)], with Θ* ∈ Mt,
    = [ct^{-1} π0(Θ*)/(ft^{-1} π0(Θ*|Y))] · [π0(Θ*|Y)/π0(Θ*)] = ft/ct,    (2.13)

with ft and ct as in (2.12) and (2.10), respectively. Hence, the Bayes factor of an inequality constrained model Mt against the unconstrained model M0 is equal to the posterior probability mass that satisfies the inequality constraints divided by the prior probability mass that satisfies the inequality constraints. For an equality constrained model, the expression in (2.13) is equal to the Savage-Dickey density ratio (Dickey, 1971).


The numerator ft can be interpreted as a measure of model fit: when more constraints are supported by the data, more posterior probability mass will be located in the constrained region, so that ft will be larger. Note that for equality constraints, the model fit is large when the posterior is concentrated on the line(s) implied by the equalities on the parameters. The denominator ct can be interpreted as a measure of the complexity of an inequality constrained model. We will come back to this in the next section.
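The identity Bt0 = ft/ct in (2.13) suggests a simple Monte Carlo scheme: draw from the prior and the posterior under M0 and estimate ft and ct as the proportions of draws satisfying the constraints. The sketch below assumes, purely for illustration, a standard normal prior and a known normal posterior for two means; it is not the prior construction proposed later in this chapter.

```python
import numpy as np

# Monte Carlo sketch of B_t0 = f_t / c_t in (2.13) for the inequality
# constrained model M_t: mu1 > mu2. Illustrative assumption: a N(0, I)
# encompassing prior and a N((1, 0), 0.1*I) posterior for (mu1, mu2).
# f_t and c_t are the posterior and prior probabilities that mu1 > mu2,
# estimated by the proportion of draws satisfying the constraint.

rng = np.random.default_rng(7)
M = 200_000

prior = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(M, 2))
post = rng.normal(loc=[1.0, 0.0], scale=np.sqrt(0.1), size=(M, 2))

c_t = np.mean(prior[:, 0] > prior[:, 1])   # prior mass in mu1 > mu2 (~0.5)
f_t = np.mean(post[:, 0] > post[:, 1])     # posterior mass in mu1 > mu2
B_t0 = f_t / c_t
print(round(c_t, 3), round(f_t, 3), round(B_t0, 2))
```

Because the prior here is symmetric in (µ1, µ2), the prior mass ct is close to 1/2, so the Bayes factor approaches its upper bound of 2 when the data strongly support the ordering.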

The Bayes factor Btt' between two constrained models Mt and Mt' can be determined using

Btt' = Bt0 / Bt'0.    (2.14)

Let us briefly return to Example 2 from Berger and Pericchi (1996) discussed in the introduction, where the data y either come from the inequality constrained model M1 : yi ~ N(µ, 1) with µ < 0 or from the unconstrained model M0 : yi ~ N(µ, 1) with µ ∈ R^1. A minimal training sample is any single observation yi (i = 1, . . . , N). The resulting posterior prior is given by π0(µ|yi) = N(yi, 1), and the corresponding posterior is π0(µ|y) = N(ȳ, n^{-1}), with ȳ the sample mean. Furthermore, we take the truncated posterior prior under M0 for µ < 0 as the posterior prior under model M1. Now, if the true mean µ → −∞, the partial Bayes factor corresponding to this training sample is equal to

B10(yi) = f1/c1 = ∫_{µ<0} π0(µ|y) dµ / ∫_{µ<0} π0(µ|yi) dµ = Φ(−√n ȳ)/Φ(−yi) → 1/1 = 1 for all yi.

Consequently, the IBF, which is the average partial Bayes factor based on all possible training samples, also converges to 1 as was observed in Figure 2.1.
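The limiting behavior of this partial Bayes factor can be checked numerically. The sketch below evaluates B10(yi) = Φ(−√n ȳ)/Φ(−yi) for increasingly negative means, taking yi = ȳ as a typical training observation; the values are illustrative.

```python
from math import erf, sqrt

# Numerical sketch of the partial Bayes factor in the text:
# B_10(y_i) = Phi(-sqrt(n) * ybar) / Phi(-y_i), which tends to
# 1/1 = 1 as the mean drifts to -infinity. Values are illustrative.

def Phi(x):
    """Standard normal cdf via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def partial_bf(ybar, y_i, n):
    return Phi(-sqrt(n) * ybar) / Phi(-y_i)

n = 30
for ybar in (-0.5, -2.0, -5.0):
    y_i = ybar                     # a typical training observation
    print(ybar, round(partial_bf(ybar, y_i, n), 4))
```

The printed values shrink toward 1 as ȳ becomes more negative, i.e., exactly when the data support M1 most strongly, which is the undesirable behavior discussed in the text.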

This example illustrates the problem that arises when using standard posterior priors in model selection between inequality constrained models. The main issue is that the model complexity measure c1 depends on the underlying value of µ, so that c1 varies across data sets although the same inequality constrained model is evaluated. More specifically, the value of c1 converges to 1 as the agreement between the data and the inequality constrained model increases. As a consequence, the Bayes factor goes to 1 because f1 also converges to 1 in this setting. This is undesirable because the least complex model, i.e., the inequality constrained model M1, should be preferred over the more complex unconstrained model M0 when the fit of both models is equal. For this reason, a posterior prior must be constructed that results in a value of c1 that is independent of the data and reflects the complexity of the inequality constrained model. This is, for instance, the case for the symmetrical intrinsic priors in Berger and Mortera (1999) in their Example 1. Before explaining how a default symmetrical prior can be constructed for the general linear model, we address what value the complexity measure c1 should have for inequality constrained models.

2.4 Complexity of Inequality Constrained Models

For a prior π0, the Bayes factor Bt0 of an inequality constrained model Mt against the unconstrained model M0 has an upper bound ct^{-1} given by

Bt0 = ft/ct ≤ ct^{-1}.    (2.15)

When Mt is the true model, i.e., when the inequalities of the inequality constrained model Mt are supported by the data, Bt0 → ct^{-1} as the sample size N → ∞. Further note that when two inequality constrained models Mt and Mt' fit the data equally well, i.e., ft = ft', the Bayes factor should be largest for the least complex model, which is the model with the smallest value for ct.

A popular operationalization of model complexity in classical model selection criteria, e.g., the AIC (Akaike, 1973) or BIC (Schwarz, 1978), is the number of free parameters. For inequality constrained models, however, the number of free parameters is not directly clear. For example, the number of free parameters of the unconstrained model M0: µ1, µ2 is two, but the number of free parameters of the inequality constrained model M1: µ1 > µ2 is unclear.

For the Bayes factor Bt0 in (2.13), model complexity is measured through ct, which cannot be interpreted as a number of parameters because 0 ≤ ct ≤ 1. Instead, we define the model complexity ct as the ratio of the size of the domain of Mt to the size of the domain of M0. This idea is formalized in the following definition.

Definition 2.1. Model complexity ct of an inequality constrained model Mt is defined as the limit of the probability mass located in the inequality constrained space of Mt using the D-variate uniform prior on β given by

πU(β; −ℓ, ℓ) = (2ℓ)^{-D} if β ∈ (−ℓ, ℓ)^D, and 0 elsewhere,    (2.16)

as the bound ℓ → ∞. Hence, model complexity is defined as

ct = lim_{ℓ→∞} ∫_{β∈Mt∩(−ℓ,ℓ)^D} πU(β; −ℓ, ℓ) dβ.    (2.17)

The following examples illustrate how the complexity ct of inequality constrained models can be determined.

Example 2.1. The complexity c1 of model M1 : β < r, with r constant, is calculated with the noninformative proper prior πU(β; −ℓ, ℓ) in (2.16) according to

\[
c_1 = \lim_{\ell \to \infty} \int_{\beta = -\ell}^{r} (2\ell)^{-1}\, \partial\beta = \lim_{\ell \to \infty} \frac{r + \ell}{2\ell} = \frac{1}{2}, \quad \forall r \in \mathbb{R}.
\]

Hence, the complexity of the inequality constrained model in Example 2 from Berger and Pericchi (1996) is equal to c1 = 1/2.

Example 2.2. Model complexity c2 of the inequality constrained model M2 : β1 > β2 > β3 can be determined as follows:

\[
c_2 = \lim_{\ell \to \infty} \int_{\beta_1 = -\ell}^{\ell} \int_{\beta_2 = -\ell}^{\beta_1} \int_{\beta_3 = -\ell}^{\beta_2} (2\ell)^{-3}\, \partial\beta_3\, \partial\beta_2\, \partial\beta_1 = \lim_{\ell \to \infty} \frac{\tfrac{4}{3}\ell^3}{(2\ell)^3} = \frac{1}{6}.
\]

The value of c2 is equal to the probability that the three parameters β1, β2, and β3 are ordered according to M2 when every ordering is assumed to be equally likely a priori.


Table 2.1. Model complexity ct of different inequality constrained models Mt.

    model Mt               ct
    β1 > 0                 1/2
    β1 < β2, β3 > 2        1/4
    β1 < β2 < β3           1/6
    β1 < (β2, β3)          1/3
    (β1 + β2)/2 < β3       1/2
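The complexities in Table 2.1 can be verified numerically: draw β uniformly on (−ℓ, ℓ)^D for a large bound ℓ and record the fraction of draws that satisfy the inequalities of Mt. The sketch below is illustrative and not part of the original text; the function and constraint names are our own. It checks three rows of the table:

```python
import numpy as np

def mc_complexity(constraint, D, ell=10.0, n=200_000, seed=1):
    """Monte Carlo estimate of c_t: the probability that a draw from the
    D-variate uniform prior on (-ell, ell)^D satisfies the constraint."""
    rng = np.random.default_rng(seed)
    beta = rng.uniform(-ell, ell, size=(n, D))
    return constraint(beta).mean()

# M: beta1 > 0              -> c = 1/2
c_half = mc_complexity(lambda b: b[:, 0] > 0, D=1)
# M: beta1 < beta2 < beta3  -> c = 1/6
c_sixth = mc_complexity(lambda b: (b[:, 0] < b[:, 1]) & (b[:, 1] < b[:, 2]), D=3)
# M: beta1 < (beta2, beta3) -> c = 1/3
c_third = mc_complexity(lambda b: (b[:, 0] < b[:, 1]) & (b[:, 0] < b[:, 2]), D=3)
```

For order constraints the estimate is free of the bound ℓ; for constraints involving constants, such as β3 > 2, the tabled value 1/4 is only attained in the limit ℓ → ∞.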

Model complexities of various inequality constrained models can be found in Table 2.1. The following theorem states a sufficient and necessary condition for a prior under M0 that results in a Bayes factor in which the model complexity of an inequality constrained model is incorporated according to Definition 2.1. Berger and Pericchi (1996) call such a Bayes factor "balanced" or "on equal footing".

Theorem 2.1. The Bayes factor Bt0 between an inequality constrained model Mt and the unconstrained model M0 incorporates the complexity of model Mt according to Definition 2.1 if and only if

\[
\int_{\beta \in M_t} \pi_0(\beta)\, \partial\beta = c_t
\tag{2.18}
\]

and the prior πt under model Mt is truncated from π0 as in (2.9).

Proof. This is a direct consequence of substituting (2.17) into (2.13). □

Hence, in order to obtain a Bayes factor that is balanced, a prior under the unconstrained model M0 must be used that satisfies (2.18). Note that this was the case for the encompassing intrinsic prior from Example 1 of Berger and Mortera (1999), which was symmetrical around µ = 0, but not for the intrinsic prior of Example 2 of Berger and Pericchi (1996), which was not symmetrical around µ = 0. In the following section, a general method is proposed to construct a symmetrical default prior under the unconstrained model M0 that satisfies Theorem 2.1.
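Theorem 2.1 suggests a simple sampling recipe: under a prior that places mass ct on the constrained region, Bt0 = ft/ct is the ratio of posterior to prior probability of the constraint. A minimal sketch for M1 : µ1 > µ2 follows; the data values, prior scale, and the normal posterior approximation are illustrative assumptions, not from the text:

```python
import numpy as np

rng = np.random.default_rng(0)
n_draws = 200_000

# Symmetric (exchangeable) prior on (mu1, mu2): places mass c1 = 1/2 on M1,
# so it satisfies (2.18)
prior = rng.normal(0.0, 10.0, size=(n_draws, 2))
c1 = np.mean(prior[:, 0] > prior[:, 1])

# Hypothetical data: two group means with known unit variance, n = 25 per
# group, so the (flat-prior) posterior of each mean is roughly N(ybar_j, 1/n)
ybar = np.array([1.2, 0.4])
post = rng.normal(ybar, 1.0 / np.sqrt(25), size=(n_draws, 2))
f1 = np.mean(post[:, 0] > post[:, 1])

B10 = f1 / c1  # bounded above by 1/c1 = 2, reached as the fit becomes perfect
```

As the data support M1 more strongly, f1 → 1 and B10 approaches its upper bound c1⁻¹ = 2, in line with (2.13).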

2.5 Constrained Posterior Priors

A training sample Yl is a subset of the data set Y that results in a unique proper posterior prior when a noninformative improper prior is used. For the multivariate normal linear model (2.1), the posterior prior π(β, Σ|Yl) for a training sample Yl is proportional to the product of the likelihood of Yl and the noninformative, improper Jeffreys prior π^N(β, Σ) ∝ |Σ|^{−(P+1)/2}. Hence,

\[
\pi(\beta, \Sigma \mid Y_l) \propto g(Y_l \mid \beta, \Sigma)\, \pi^N(\beta, \Sigma) \propto \pi(\beta \mid \Sigma, Y_l)\, \pi(\Sigma \mid Y_l) = N\big(\hat{\beta}_l,\, \Sigma \otimes (X_l' X_l)^{-1}\big)\, \text{inv-W}(S_l, \nu_l),
\tag{2.19}
\]

where Xl corresponds to the observations in the training sample Yl, β̂l is the maximum likelihood (ML) estimate of β given Yl, Sl is the corresponding sums of squares given in (2.4), and νl = Nl + P − J − K + 1 is the number of degrees of freedom for a training sample of size Nl (see, for instance, Press, 2005, p. 252).
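The hyperparameters of the posterior prior (2.19) follow directly from standard multivariate regression output for the training sample. A sketch, assuming J and K are supplied as defined for model (2.1); the function name is our own:

```python
import numpy as np

def posterior_prior_params(Yl, Xl, J, K):
    """Hyperparameters of the posterior prior (2.19) for a training sample:
    ML estimate of B, residual sums-of-squares matrix Sl, and df nu_l."""
    Nl, P = Yl.shape
    B_hat = np.linalg.lstsq(Xl, Yl, rcond=None)[0]  # ML estimate of B
    resid = Yl - Xl @ B_hat
    Sl = resid.T @ resid                            # P x P sums of squares
    nu_l = Nl + P - J - K + 1                       # degrees of freedom
    return B_hat, Sl, nu_l
```

Given these quantities, β|Σ, Yl is normal around vec(B_hat) and Σ|Yl is inverse-Wishart with scale Sl and νl degrees of freedom, as in (2.19).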

As was illustrated at the end of the previous section, the posterior prior (2.19) generally does not satisfy (2.18) and therefore generally does not take model complexity into account according to Definition 2.1. For this reason, we propose a posterior prior based not only on training data but also on the set of linear restrictions:

\[
\begin{bmatrix} R_t \\ Q_t \end{bmatrix} \beta = \begin{bmatrix} r_t \\ q_t \end{bmatrix}
\;\Leftrightarrow\; \tilde{R}_t \beta = \tilde{r}_t,
\tag{2.20}
\]

which combines the model inequalities (2.5) and model equalities (2.6) of Mt, with the inequalities replaced by equalities. This constrained posterior prior (CPP) is defined by

\[
\pi^C(\beta, \Sigma \mid Y_l) \equiv \pi^C(\beta \mid Y_l)\, \pi^C(\Sigma \mid Y_l) = N\big(\hat{\beta}^R_l,\, \hat{\Sigma}^R_{\beta,l}\big)\, \text{inv-W}\big(\Lambda^R_l, \nu^R_l\big),
\tag{2.21}
\]

where β̂^R_l is the restricted maximum likelihood (RML) estimator of β under the restrictions in (2.20), Σ̂^R_{β,l} is a D × D diagonal covariance matrix, Λ^R_l is a P × P diagonal scale matrix, and ν^R_l is the number of degrees of freedom, all based on the training sample Yl. Note that in the CPP, β and Σ are modeled independently, which is not the case in (2.19). If π^C(β|Yl) depended on Σ, the conditional posterior of Σ|β, Y, Yl would not belong to a common family of probability distributions. As a consequence, the computation of the Bayes factor using the Gibbs sampler (discussed in Section 6) would require a Metropolis-Hastings step. This complication can be avoided by modeling β and Σ independently in the prior, so that the resulting conditional posterior Σ|β, Y, Yl has an inverse-Wishart distribution.

At the end of this section, we will show that the training-data-based CPP satisfies (2.18) for a broad class of inequality constrained models. First, we elaborate on how to obtain the hyperparameters of the CPP.

2.5.1 The Restricted Maximum Likelihood Estimator β̂^R_l

The restricted maximum likelihood (RML) estimator for a training sample Yl, denoted by β̂^R_l, is obtained by applying restricted regression (see, for instance, Judge et al., 1980, p. 56), in which the model parameters are estimated under linear restrictions such as those in (2.20).

In order to use restricted regression for the multivariate normal linear model for a training sample Yl, the multivariate model (2.2) needs to be rewritten in a univariate form:

\[
\begin{aligned}
Y_l &= X_l B + E \\
\Rightarrow\; \text{vec}(Y_l) &= \text{vec}(X_l B) + \text{vec}(E) \\
\text{vec}(Y_l) &= (I_P \otimes X_l)\beta + \epsilon \\
\text{vec}(Y_l) &= \tilde{X}_l \beta + \epsilon,
\end{aligned}
\tag{2.22}
\]

where ε ∼ N(0, Σ ⊗ I_{Nl}), with Nl denoting the training sample size (addressed below), and vec(·) denotes the vectorization of a matrix. Under the linear restrictions (2.20), the RML estimate of β for a training sample Yl is given by
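This step can be sketched with the standard restricted least squares solution from the restricted regression literature (Judge et al., 1980): the unrestricted estimate plus a correction that projects it onto the set satisfying R̃β = r̃. The code below is an illustration under that assumption, with hypothetical variable names:

```python
import numpy as np

def rml_estimate(X, y, R, r):
    """Restricted least squares: estimate of beta subject to R @ beta = r."""
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ y                 # unrestricted LS estimate
    A = R @ XtX_inv @ R.T
    adj = XtX_inv @ R.T @ np.linalg.solve(A, r - R @ beta_hat)
    return beta_hat + adj                        # satisfies R @ beta = r
```

For example, the equality restriction β1 = β2 obtained from the inequality β1 < β2 corresponds to R = [[1, −1, 0]] and r = [0].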
