
Informed effect-size priors for two-group comparisons in psychological research: evidence from elicitation and simulation


Academic year: 2021



Dimitris Katsimpokis

Program group of Psychological Methods

Department of Psychology

University of Amsterdam

Supervisors: Quentin Gronau and Professor Dr. Eric-Jan Wagenmakers

November 2017

1 Introduction

In recent years, a lively debate has arisen in psychology (and in the quantitative sciences in general) about what constitutes good statistical practice. After a long period in which Fisherian null hypothesis significance testing was common practice (for an overview see, e.g., Anderson et al. (2000); Nickerson (2000)), the field has seen the emergence of many proposals and frameworks, all of which endeavor to overcome the weaknesses of the Fisherian paradigm.

A long-standing alternative for hypothesis testing is Bayesian statistics (e.g., Fisher (1937); Lindley (2000); Wagenmakers (2007)). Bayesian statistics builds upon Bayes' rule, which quantifies the relationship between the observed data, obtained through an experimental procedure, and the prior beliefs regarding the distribution of data points, held before the latter have actually been observed. Bayes' rule formulates this relationship with the following equation (for an introduction see Etz and Vandekerckhove (2017)):

$$P(H \mid X) = \frac{P(H)\,P(X \mid H)}{P(X)}, \tag{1}$$

where P(X|H) indicates the likelihood of the observed data given a hypothesis, P(H) the prior probability of that hypothesis before observing the data, and P(X) the marginal probability, namely that of observing a given outcome in our data. The result of Bayes' rule is the posterior probability (i.e., the left-hand side of equation (1)), which quantifies the probability of our hypothesis after observing the data. Bayes' rule thus provides a direct way to evaluate how probable our hypotheses are in the light of the collected evidence. This is a very important contribution of the Bayesian framework, since the question of how probable our hypotheses are given the data is presumably the central question of all empirical research.

∗ The study has been pre-registered as an AsPredicted entry with registry number 4769 (url: https://aspredicted.org/mg85s.pdf). Code and data for the paper can be found at the Open Science Framework (url: osf.io/vqszj). I would like to wholeheartedly thank the experts who participated in the present study. Special thanks go to Professor Dr. Eric-Jan Wagenmakers and Quentin Gronau for their suggestions and comments on an earlier draft and for several illuminating discussions we had during the project.

The marginal probability in the denominator of equation (1) can be thought of as the probability of the data under every level or value of the hypothesis, weighted by the prior at that level or value. As a result, a common reformulation of equation (1) expresses the marginal likelihood as the sum of such quantities:

$$P(H \mid X) = \frac{P(H)\,P(X \mid H)}{\sum_{k=1}^{K} P(H_k)\,P(X \mid H_k)}. \tag{2}$$
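As a minimal sketch of equation (2), the following Python snippet updates the probabilities of a discrete set of hypotheses. The function name `posterior` and the two-hypothesis numbers are hypothetical and serve only as an illustration; they are not taken from the study.

```python
def posterior(priors, likelihoods):
    """Discrete Bayes rule: return P(H_k | X) for every hypothesis H_k."""
    # Numerator of equation (2) for each hypothesis: P(H_k) * P(X | H_k)
    joint = [p * l for p, l in zip(priors, likelihoods)]
    # Denominator of equation (2): the marginal probability of the data
    marginal = sum(joint)
    return [j / marginal for j in joint]


# Hypothetical example: two hypotheses with equal prior probability
priors = [0.5, 0.5]
likelihoods = [0.02, 0.08]  # assumed values of P(X | H_1) and P(X | H_2)
post = posterior(priors, likelihoods)
print(post)
```

Because the second hypothesis makes the data four times as likely and the priors are equal, its posterior probability is four times that of the first.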

Equation (2) is appropriate for updating the probabilities of a discrete set of hypotheses. For a continuous set of values, however, the Bayesian framework expresses probabilities as distributions over the possible values. In the numerator, the prior is expressed as a prior density function, while the likelihood is formulated as the likelihood function of the data given a hypothesis. The denominator, on the other hand, becomes an integral rather than a discrete sum. As a result, equation (2) is transformed into a continuous version of Bayes' rule as follows:

$$P(H \mid X) = \frac{P(H)\,P(X \mid H)}{\int_{H} P(H)\,P(X \mid H)\,dH}. \tag{3}$$

The rule maintains exactly the same properties as with its discrete counterpart and can be generalized in multiple dimensions based on the parameters of the likelihood function and prior density. In the continuous version, the denominator plays the role of a normalizing constant, which guarantees that the area under the density of the posterior distribution equals one.
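The role of the denominator as a normalizing constant can be sketched with a grid approximation of equation (3). The example below is an illustration under assumed inputs: a uniform prior on (0, 1) and a binomial-type likelihood for 7 successes in 10 trials, neither of which comes from the study. The helper name `grid_posterior` is likewise hypothetical.

```python
def grid_posterior(prior_pdf, likelihood, grid):
    """Approximate the continuous Bayes rule on an equally spaced grid."""
    width = grid[1] - grid[0]
    # Numerator of equation (3) evaluated at every grid point
    unnormalized = [prior_pdf(h) * likelihood(h) for h in grid]
    # Denominator of equation (3): midpoint-rule approximation of the integral
    norm_const = sum(unnormalized) * width
    return [u / norm_const for u in unnormalized], width


n_points = 1000
grid = [(i + 0.5) / n_points for i in range(n_points)]  # midpoints on (0, 1)


def prior_pdf(h):
    return 1.0  # assumed uniform prior density on (0, 1)


def likelihood(h):
    return h ** 7 * (1 - h) ** 3  # assumed data: 7 successes in 10 trials


density, width = grid_posterior(prior_pdf, likelihood, grid)
area = sum(density) * width  # area under the posterior density
post_mean = sum(h * d for h, d in zip(grid, density)) * width
print(area, post_mean)
```

Dividing by the approximated integral is what makes the area under the posterior density equal to one, whatever the scale of the likelihood.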

To quantify the relative strength of the observed data in favor of one hypothesis over another, the Bayesian framework provides a measure based on the probability of the data given each hypothesis. This measure, called the Bayes factor (BF), is a likelihood ratio based on the numerators of Bayes' rule for two different hypotheses (e.g., Etz and Vandekerckhove (2017); Kass and Raftery (1995); Ly et al. (2016); Morey and Rouder (2011)). Since the numerator of the rule weights a likelihood function by a prior distribution, a BF is defined as the corresponding ratio:

$$BF = \frac{P(X \mid H_0)}{P(X \mid H_1)}, \tag{4}$$

where P(X|H0) and P(X|H1) are the likelihoods of the data given hypotheses 0 and 1, respectively.

Because the BF is a ratio, it equals 1 when both probabilities are equal. It exceeds 1 when the numerator is larger than the denominator, and its value indicates how many times larger the numerator is. Conversely, when the denominator is larger, the BF falls below 1 and its reciprocal indicates how many times larger the denominator is.


Specifying the hypotheses in equation (4) means defining what a null and an alternative hypothesis look like in a Bayesian setting. As in null hypothesis significance testing, the null hypothesis fixes a parameter of interest (usually the mean) to a specific value. The alternative hypothesis, H1, is set up as a range of values that would be plausible if the null hypothesis were wrong. However, probability distributions usually depend on more than one parameter; the additional parameters, called nuisance parameters, must also be included in the calculation of the BF. The latter can thus be written as follows:

$$BF = \frac{\int_{\zeta} P(X \mid \theta = \theta_0, \zeta, H_0)\,P(\zeta \mid H_0)\,d\zeta}{\int_{\theta}\int_{\zeta} P(X \mid \theta, \zeta, H_1)\,P(\theta, \zeta \mid H_1)\,d\zeta\,d\theta}, \tag{5}$$

where the numerator is the likelihood function weighted by the prior under the null hypothesis (H0), the denominator is the same quantity under the alternative hypothesis (H1), and ζ is the nuisance parameter (or a vector of parameters if the distribution requires more than one). θ is the parameter of interest; it is fixed to a particular value under the null hypothesis and left free to vary under the alternative hypothesis. Thus the BF can be seen as the relative strength of the likelihood of the data weighted by the prior probability.
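The structure of this ratio can be sketched numerically under a deliberate simplification that is not the model used in this paper: the observed effect size is assumed to be normally distributed around the true effect size with a known standard error, so that no nuisance parameter has to be integrated out. All numbers and function names below are hypothetical.

```python
import math


def normal_pdf(x, mean, sd):
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))


def bf01_numeric(d_obs, se, prior_pdf, lo=-5.0, hi=5.0, n=20000):
    """BF of a point null (delta = 0) against a prior on delta."""
    # Numerator: likelihood of the data at the fixed null value delta = 0
    numerator = normal_pdf(d_obs, 0.0, se)
    # Denominator: likelihood averaged over the prior (midpoint rule)
    width = (hi - lo) / n
    denominator = 0.0
    for i in range(n):
        delta = lo + (i + 0.5) * width
        denominator += normal_pdf(d_obs, delta, se) * prior_pdf(delta)
    denominator *= width
    return numerator / denominator


# Hypothetical data summary and a hypothetical informed prior N(0.5, 0.1)
d_obs, se = 0.5, 0.2
prior_mean, prior_sd = 0.5, 0.1


def informed_prior(delta):
    return normal_pdf(delta, prior_mean, prior_sd)


bf01 = bf01_numeric(d_obs, se, informed_prior)
# For a normal likelihood and a normal prior the integral has a closed form
closed = normal_pdf(d_obs, 0.0, se) / normal_pdf(
    d_obs, prior_mean, math.sqrt(se ** 2 + prior_sd ** 2)
)
print(bf01, closed)
```

In this assumed setting the numerical and closed-form values should agree closely, and a BF below 1 indicates that the data are more likely under the alternative hypothesis.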

Unlike the Fisherian paradigm, in which the alternative hypothesis is not specified and hypothesis testing always evaluates a null hypothesis of no effect, Bayes' rule requires the explicit specification of the prior beliefs of the person who performs the test. The prior plays an important role in the estimation of the posterior distribution as well as the BF, since the likelihood function is weighted by the prior distribution at every parameter value. This weighting procedure restricts (but does not fully determine) the shape of the posterior to a certain interval of more or less probable values.

In the context of the t-test, for example, the null hypothesis states that the means of the groups are equal, while the alternative hypothesis assumes that the means are not equal and goes on to make predictions about the magnitude of the difference. Previous Bayesian implementations of the t-test have used the effect size, not the mean, as the quantity of interest (e.g., Rouder et al. (2009)). Effect sizes were introduced (Cohen (1992); Sawilowsky (2003)) to provide a standardized measure of the magnitude of the effect in two-group comparisons. Usually defined as δ = (µ1 − µ2)/σ, they quantify the standardized mean difference of two groups in the population (though sample-based formulas exist). As a result, the null hypothesis is formulated as an effect size of magnitude 0, whereas the alternative hypothesis relaxes this strong expectation by centering the prior around 0 yet allowing the effect size to vary. A commonly used prior distribution for the alternative hypothesis is the Cauchy prior (Figure 1; see also Morey et al. (2015)).
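The two quantities in this paragraph can be written down in a few lines. The sketch below computes the population effect size and the density of a Cauchy prior; the default scale of 1/√2 is the value reported in the caption of Figure 1, while the function names and the example means are hypothetical.

```python
import math


def effect_size(mu1, mu2, sigma):
    """Standardized mean difference of two groups in the population."""
    return (mu1 - mu2) / sigma


def cauchy_pdf(delta, location=0.0, scale=1 / math.sqrt(2)):
    """Density of a Cauchy prior on the effect size delta."""
    z = (delta - location) / scale
    return 1.0 / (math.pi * scale * (1.0 + z * z))


# Hypothetical population values: means of 105 and 100, common SD of 10
delta = effect_size(105.0, 100.0, 10.0)
print(delta)                 # standardized mean difference
print(cauchy_pdf(0.0))       # prior density at the null value
print(cauchy_pdf(delta))     # prior density at the example effect size
```

The density is highest at zero and symmetric around it, which is what centering the prior around 0 while allowing the effect size to vary amounts to.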

An immediate question pertains to the choice of priors. Since Bayesian statistics requires a compromise between the prior distribution and the likelihood function, the way prior distributions are set directly affects both the inferred strength of evidence (i.e., BFs) and the posterior distributions, given that both incorporate the prior. Several studies have shown that prior specification matters for Bayesian inference and estimation (e.g., Vanpaemel (2010); Lehermeier et al. (2013)).

Figure 1: The Cauchy prior used in psychological research with location equal to zero and scale equal to 1/√2.

A number of researchers have argued in favor of default priors, which remain uninformative with respect to any expectation regarding the data (Fraser et al. (2010); Berger et al. (2006); Ghosh (2011); though default priors are not necessarily completely uninformative, as we will see below in the case of the t-test). Default priors seem necessary in situations where the experimenter lacks any prior knowledge about the phenomenon under investigation. For instance, the experimenter may be conducting a purely exploratory study that has never been done before, and would thus be completely ignorant of the potential outcomes of the experiment. Because default priors remain uninformative about any expectation regarding the data, they have the smallest possible effect on the posterior distribution or the resulting BF (Berger et al. (2006)).

However, when prior knowledge is available, a way of incorporating it into hypothesis testing seems necessary (e.g., Bousquet (2006); Goldstein et al. (2006)). In the early stages of exploring a phenomenon, for example, the researcher has reasons to keep the influence of the prior on BFs or posterior distributions to a minimum. However, as experiments on the phenomenon accumulate evidence, the researcher has good reasons to include the accumulated knowledge in any future study of that phenomenon. In this case, the prior represents the empirically informed expectations of the researcher and their field. A relevant context for empirically informed priors is that of replication. If a number of experiments point to a specific relation between the variables under investigation, then any future replication should carry strong expectations about what would and would not be a likely outcome (Verhagen and Wagenmakers (2014)).

Outside the domain of replication, however, only a few studies have tackled the problem of how experts' priors can be elicited and then used to construct informed prior distributions. Some studies attempted to infer experts' prior distributions with quantile-based methods, in which the experts were asked questions about certain quantiles of the cumulative prior distribution so that the corresponding density function could be inferred (see Albert et al. (2012); Dey et al. (2007), for instance).

Gronau et al. (2017) was one of the first studies to attempt to elicit the actual prior density from an expert working in the field of social psychology. The goal of the study was to compare and contrast the effects of the use of the elicited prior on inference as opposed to default priors.

Gronau et al. (2017) made a number of observations regarding the relationship between default and informed priors. In particular, when the effect size falls in the region hypothesized by the prior, the BF based on the informed prior provides more evidence for an effect than the BF based on the default prior. Because the effect size falls within the expected region, the informed prior also pulls the posterior closer to it.

It seems reasonable to believe that a prior elicited from only one expert cannot sufficiently capture the prior expectations of social psychology, let alone of psychology as a whole. First, this particular expert may have held an unusually extreme set of expectations about effect sizes in their field. In the same vein, one may expect that interpretations of the magnitude of an effect size vary from one subfield of psychology to another. In applied psychology, for example, where sample sizes are usually larger than in other subfields (see, e.g., Marszalek et al. (2011)), small-to-medium effect sizes can be revealed more easily, since a large sample size is a prerequisite for finding them. In research on atypical development, on the other hand, sample sizes are usually small because of the rarity of a disorder or the practical obstacles that accompany such studies; in these cases, effect sizes of the order of δ = 0.1 are difficult to detect. All in all, what experts consider a small-to-medium effect size in one field may be a medium-to-large effect size in another.

In addition, Gronau et al. (2017) elicited a prior from a social psychology expert with a particular experimental manipulation in mind. However, a more general understanding of informed priors for families of effect sizes is important for a deeper picture of when informed priors contribute meaningfully to inference. Small-to-medium effect sizes are of particular importance because they are the most difficult to establish. They require a sufficient sample size and, unlike large effects, which are easily detectable, they can be mistakenly taken as evidence for the alternative hypothesis.

The goal of the present paper, therefore, is to extend the work of Gronau et al. (2017) in three respects. First, the paper aims to present a novel way of eliciting prior distributions from experts working in subfields of psychology. Second, it aims to study the extent to which the elicited priors for small-to-medium effect sizes are similar across domains of psychological research and, by extension, the role of the psychological subfield in the interpretation of effect size magnitude. Third, the paper aims to contrast the elicited priors with default ones, in order to investigate which are more informative about the data under various empirical scenarios.

2 Methods

2.1 Participants

Six experts, ranging from postdoctoral researchers to full professors, working at the University of Amsterdam were selected to participate in the study. Two worked in cognitive neuroscience, two in social psychology, and the remaining two in developmental psychology.

2.2 Materials and methods

To elicit prior distributions, the roulette method of the MATCH uncertainty elicitation tool was used (Gore (1987); Morris et al. (2014)). The roulette method provides experts with a 10 × 10 grid of equally sized bins. The x axis showed effect size values ranging from 0 to 1. By stacking chips over a certain range of values, participants indicated, through the proportion of chips, how much probability they assigned to that range. Unlike previously used quantile-based methods (e.g., those in Albert et al. (2012)), the roulette method lets experts provide their distribution of beliefs themselves, rather than having it inferred at a later point. An example from the version of the method used is presented in Figure 2.
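How stacks of chips translate into a discrete distribution can be sketched as follows. The chip counts are hypothetical and do not correspond to any expert in the study; the layout (10 columns over the range 0 to 1) follows the description above, and the function name is an assumption made for illustration.

```python
def chips_to_probabilities(chips, lower=0.0, upper=1.0):
    """Turn chip counts per bin into (left edge, right edge, probability)."""
    total = sum(chips)
    if total == 0:
        raise ValueError("at least one chip must be placed")
    width = (upper - lower) / len(chips)
    bins = []
    for i, count in enumerate(chips):
        left = lower + i * width
        # The proportion of chips in a bin is the probability of that range
        bins.append((left, left + width, count / total))
    return bins


# Hypothetical allocation of 20 chips over 10 bins of width 0.1
chips = [0, 0, 1, 3, 6, 7, 2, 1, 0, 0]
bins = chips_to_probabilities(chips)
for left, right, prob in bins:
    print(f"{left:.1f} to {right:.1f}: {prob:.2f}")
```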

Figure 2: An example of the roulette method of the MATCH elicitation toolkit used in the experiment.

The size of the grid and the limits of the x axis constrain the elicitation procedure; the chosen values appeared to be an optimal compromise in the face of the trade-off between bin size and axis limit. One might argue that the upper limit of the x axis is too small (experts might place small-to-medium effect sizes beyond δ = 1) or that the number of bins (i.e., 100) is not enough for the elicited priors to be representative of experts' beliefs. However, such criticism does not take this trade-off into account. With the number of bins fixed, the wider the x axis, the larger each bin becomes and thus the more condensed the elicited distribution, since each bin now encompasses a larger range of possible effect size values. One might, again, argue that the number of bins should then increase as well. Yet the more bins there are, the narrower the range of effect sizes each bin captures, and the more uncertain experts become about their expectations. For example, experts may have some prior expectations regarding effect sizes within the range 0.3 < δ < 0.4, but they will probably be uncertain about the interval 0.35 < δ < 0.4, and very uncertain about 0.38 < δ < 0.4. Such uncertainty, however, does not reflect uncertainty in the experts' beliefs, since they are able to make predictions regarding 0.3 < δ < 0.4. Thus, a 10 × 10 grid and an effect size range of (0, 1) appeared to give experts the degrees of freedom they need to express their beliefs without being restricted by the experimental design, while at the same time reducing their uncertainty about intervals that are too small. Furthermore, given that the effect size literature interprets small-to-medium effect sizes as lying within the range 0.2 < δ < 0.4 (e.g., Cohen (1992); Sawilowsky (2003)), an upper limit of δ = 1 was considered reasonable.

The distributions elicited with the MATCH roulette method are discrete, since they represent proportions, not densities, over intervals of effect sizes. To transform the discrete probabilities into continuous ones, the MATCH roulette method fits a variety of distributions to the elicited data and reports the estimated parameters. We used the estimated parameters to show participants the estimated density of the prior distributions in an application we built in the R programming language using the shiny package (RStudio, Inc (2013); R Team (2014); for an example see Figure 3). The shiny application showed a panel with the estimated density and a vertical panel on the left, through which the parameters of the prior distributions could be manipulated.
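The step from discrete stacks to a continuous density can be illustrated with a simple moment-based fit. This is a stand-in chosen for brevity and is not claimed to be the fitting procedure that MATCH itself uses; the chip counts and the function name are hypothetical.

```python
import math


def moment_fit(chips, lower=0.0, upper=1.0):
    """Estimate a Gaussian mean and SD from chip counts via bin midpoints."""
    total = sum(chips)
    width = (upper - lower) / len(chips)
    midpoints = [lower + (i + 0.5) * width for i in range(len(chips))]
    # Weighted mean and variance, with chip counts as weights
    mean = sum(m * c for m, c in zip(midpoints, chips)) / total
    variance = sum(c * (m - mean) ** 2 for m, c in zip(midpoints, chips)) / total
    return mean, math.sqrt(variance)


# Hypothetical allocation of 20 chips over 10 bins of width 0.1
chips = [0, 0, 1, 3, 6, 7, 2, 1, 0, 0]
mean, sd = moment_fit(chips)
print(round(mean, 3), round(sd, 3))
```

The resulting mean and SD would then serve as starting values that an expert can adjust, in the spirit of the interactive adjustment described above.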

2.3 Elicitation procedure

The elicitation procedure took the form of an interview. Experts were invited to participate after a brief introduction to its topic, namely an elicitation procedure concerning their expectations of small-to-medium effect sizes in their field of study. The interviews took place at the experts' offices or, alternatively, in meeting rooms. The interviews were one-on-one between the interviewer, a master's student, and the expert.

At the beginning of the interview, experts were again introduced to its goal, namely to elicit a distribution of possible values of small-to-medium effect sizes in their particular field of study. Experts were asked to think about small-to-medium effect sizes in the context of the t-test, bearing in mind Cohen's d as a measure of effect size for two-group comparisons. They were briefly reminded of the definition of an effect size, namely that it measures standardized mean differences. It was underscored that they should abstract away from sampling variability, as they should think about general small-to-medium effect sizes existing at the population level. Experts were further told that they could change their mind as many times as they wished during the course of the interview, and that there were no correct or incorrect answers, since the elicitation procedure targets their own beliefs.

Figure 3: An example of a transformation of discrete stacks from the roulette method to continuous densities, built through the shiny application.

Subsequently, the interviewer asked the following questions to help experts focus on the topic of the interview:

• Imagine what general small-to-medium effect sizes in your field would look like. Which effect size value would you expect to be the most probable one?

• Which range of values would you consider possible?

Meanwhile, the interviewer wrote down the numbers. The interviewer then asked experts to keep these estimates in mind and introduced them to the MATCH roulette method.

The latter was explained in detail, and experts were then asked to draw a distribution depicting their prior beliefs regarding small-to-medium effect sizes in their particular field of study by stacking chips on top of one another. There was no time limit for completing the task. After placing their distribution on the grid, they were asked whether they would like to change it in any way. Once they did not wish to make further changes, experts were told that no further changes could be made to the discrete version of their distribution.

Next, experts were told that a continuous version of their elicited distribution was needed for the purpose of Bayesian inference, and they were introduced to the shiny application. The best parameter estimates that the MATCH roulette method obtained for the elicited discrete distributions were used to fit two prior distributions, a Gaussian and a scaled-shifted t-distribution. In the shiny application, the best parameter estimates of the Gaussian distribution were used to produce the corresponding Gaussian density function. Subsequently, experts were asked whether the fitted midpoint of the Gaussian distribution (i.e., its mean) best represented their beliefs about small-to-medium effect sizes in their field. If experts thought it did not, they decreased or increased the midpoint. Once they did not intend to make any further changes, the same procedure was followed for the spread of the Gaussian distribution.

After the Gaussian prior, the same procedure was followed with the t-distribution. Experts were made aware of the parameters of the t-distribution, namely its midpoint and spread, and were further introduced to its third parameter, the degrees of freedom. As with the Gaussian distribution, experts were asked whether each parameter best represented their beliefs about small-to-medium effect sizes in their field and, if it did not, they changed it accordingly.

Gaussian and t-priors were selected because they exhibit a desirable property. Unlike beta or gamma priors, Gaussian and t-distributions are defined on the whole real line and are thus able to assign density to negative effect sizes as well as to arbitrarily large ones (however improbable they are).

3 Results

3.1 Elicited priors

The experts that participated in the elicitation study were asked about their expectation when it comes to small-to-medium effect sizes in their particular field of psychology, when the alternative hypothesis is true.

The elicited parameters are given in Table 1. An immediate observation is that the priors of three out of the six experts exhibit similar parameters. In particular, Experts 2-4 hypothesize that, if the alternative hypothesis is true, a small-to-medium effect size is likely to be found around δ = 0.6.¹ In addition, the type of distribution does not seem to induce major changes in the parameters, since the means of the t-distributions for Experts 2-4 also lie between δ = 0.55 and δ = 0.60 (Table 1). The negligible discrepancies between the means of the Gaussian and the t-distributions can be attributed to the low degrees of freedom parameter, which in two of the three cases was set to three. t-distributions are approximately Gaussian when the degrees of freedom parameter is very large. A similar observation can be made regarding the spread of the prior distributions, which is typically around SDδ = 0.11 and generalizes across experts and types of distribution with negligible discrepancies, as in the case of the means. For Gaussian distributions, where the 68-95-99.7 rule applies, approximately 95% of the density lies within 2 SD below and above the mean. Given that the grand mean of the means of Experts 2-4, for example, is 0.579189 and the respective grand SD equals 0.124196, approximately 95% of the density of the Gaussian distribution for these experts is found within the interval (0.330797, 0.827581), which we can consider to be the maximum range of possible values of δ. The interval is approximately the same for the t-distribution, depending on the value of the degrees of freedom.

¹ Experts are denoted with an initial capital letter and a corresponding number to indicate the participant of
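The interval reported above follows directly from the grand mean and grand SD given in the text. The short sketch below reproduces the arithmetic and also computes the exact Gaussian mass within 2 SD of the mean; the function name is hypothetical.

```python
import math


def two_sd_interval(mean, sd):
    """Interval spanning 2 SD below and above the mean."""
    return mean - 2 * sd, mean + 2 * sd


# Grand mean and grand SD of Experts 2-4 as reported in the text
grand_mean, grand_sd = 0.579189, 0.124196
lower, upper = two_sd_interval(grand_mean, grand_sd)

# Exact share of a Gaussian density lying within 2 SD of its mean
mass = math.erf(2 / math.sqrt(2))

print(round(lower, 6), round(upper, 6), round(mass, 4))
```

The exact mass within 2 SD is slightly above 95%, which is why the interval is described as containing approximately 95% of the density.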

Elicited priors

| Expert | Field of study | Gaussian mean | Gaussian SD | t mean | t SD | t DF |
|---|---|---|---|---|---|---|
| Expert 1 | Social psychology | 0.100 | 0.121 | 0.100 | 0.120 | 3 |
| Expert 2 | Social psychology | 0.550 | 0.102 | 0.550 | 0.081 | 3 |
| Expert 3 | Cognitive neuroscience | 0.599 | 0.136 | 0.600 | 0.107 | 13 |
| Expert 4 | Cognitive neuroscience | 0.587 | 0.134 | 0.585 | 0.107 | 3 |
| Expert 5 | Developmental psychology | 0.404 | 0.121 | 0.405 | 0.116 | 13 |
| Expert 6 | Developmental psychology | 0.317 | 0.083 | 0.310 | 0.075 | 9 |

Table 1: Elicited parameters of the Gaussian and t-priors per expert and field of study. Parameters are rounded to three decimal places. 'SD' and 'DF' stand for 'standard deviation' and 'degrees of freedom', respectively.

As mentioned in section 2.3, at the beginning of the interview each Expert was asked two questions about what they consider the most probable value and the range of values of small-to-medium effect sizes in their particular field of study. Each Expert's answers are given in Table 2. One important observation concerns the initial effect size estimates of the social psychology experts as opposed to those of the cognitive neuroscientists. Despite explicitly reporting higher estimates in the initial question period of the interview, both social psychology experts provided more conservative estimates in the roulette method. Their conservativeness concerned only the midpoint of their effect size expectations and not the spread of the elicited distributions, which is similar across experts.

One way to account for these discrepancies between the initial answers of the social psychologists and their elicited priors is incomplete knowledge of effect size measures for two-group comparisons. During the interview, Expert 2 reported high uncertainty regarding her beliefs, repeating multiple times that she did not feel sure about the answers she was providing. Expert 1 also changed his mind during the interview and reported that he is more familiar with effect size measures at the level of factors in ANOVA than with those for two-group comparisons. Experts 3-6, on the other hand, appeared to be more knowledgeable about effect size measures, as they reported that they already knew Cohen's d and how it is defined. Because of their high degree of certainty, Experts 3-6 gave prior estimates in the roulette method similar to the ones they had reported initially.

| Expert | Field of study | Most probable δ | Most probable range of δ |
|---|---|---|---|
| Expert 1 | Social psychology | 0.3 | 0.2 – 0.4 |
| Expert 2 | Social psychology | 0.8 | 0.5 – 0.9 |
| Expert 3 | Cognitive neuroscience | 0.6 | 0.4 – 0.8 |
| Expert 4 | Cognitive neuroscience | 0.5 | 0.4 – 1.0 |
| Expert 5 | Developmental psychology | 0.5 | 0.3 – 0.6 |
| Expert 6 | Developmental psychology | 0.3 | 0.1 – 0.5 |

Table 2: Experts' description of general small-to-medium effect sizes in their subfields of psychology during the initial question period of the elicitation procedure. δ indicates effect size in the population.

3.2 Elicited Prior Distributions

The resulting priors of all Experts are plotted in Figure 4. Two observations are important here. First, no substantial difference is observed between the family of Gaussian distributions and that of t-distributions. With the exception of the t-distributions with 13 degrees of freedom, the t-distributions assign more probability to the tails and more probability to the center than the Gaussian family. In spite of this, the two families of prior distributions appear similar. A second observation concerns differences across Experts. As discussed in section 3.1, the prior distributions of Expert 1 deviate markedly from the general picture presented in Figure 4. The rest of the Experts exhibited similar prior distributions, with the small exception of Experts 5-6, whose priors are shifted somewhat towards zero compared to those of Experts 2-4.
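The comparison between the two families can be checked numerically for a single expert. The sketch below evaluates both densities with the parameters of Expert 2 from Table 1, under the assumption (not stated in the text) that the 'SD' column of the t-prior is used as the scale parameter of a shifted and scaled t-distribution. Function names are hypothetical.

```python
import math


def normal_pdf(x, mean, sd):
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))


def shifted_scaled_t_pdf(x, location, scale, df):
    """Density of a t-distribution shifted by location and stretched by scale."""
    z = (x - location) / scale
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log1p(z * z / df)) / scale


# Expert 2 (Table 1): Gaussian(0.550, 0.102); t with 0.550, 0.081, DF = 3
# Assumption: the tabulated t 'SD' is treated as the scale parameter
center, tail = 0.55, 0.95
gauss_center = normal_pdf(center, 0.550, 0.102)
t_center = shifted_scaled_t_pdf(center, 0.550, 0.081, 3)
gauss_tail = normal_pdf(tail, 0.550, 0.102)
t_tail = shifted_scaled_t_pdf(tail, 0.550, 0.081, 3)

print(gauss_center, t_center)  # density at the midpoint
print(gauss_tail, t_tail)      # density far from the midpoint
```

Under this assumption the t-prior is higher than the Gaussian both at the midpoint and far from it, in line with the description above.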

The picture that emerges from Figure 4 allows us to make several predictions about the informativeness of the elicited priors as compared to the default Cauchy prior. It is worth remembering that default priors relax the assumption of the null hypothesis, which postulates that δ = 0 in the population, and assume that the alternative hypothesis is distributed around δ = 0. Given that the distributions of Figure 4 constitute expected effect size values in the case of the alternative

Figure 4: Elicited prior distributions of the Experts for Gaussian and t-distributions, one panel per Expert (Expert 1 to Expert 6); x axis: effect size (delta) from 0.0 to 1.0; y axis: density from 0 to 5. δ indicates effect size in the population.

If the alternative hypothesis is true, the BFs resulting from all five elicited priors would carry more evidence in favor of the alternative hypothesis than those from the default Cauchy prior. This is because the elicited priors assign more density to the interval (0, 1). If the likelihood is very far from what the priors predict (e.g., δ > 2), the informativeness relationship may reverse. However, such effect sizes are too large to be considered small-to-medium, and researchers are unlikely to confuse them with small-to-medium effects.

4 Comparing informed to default priors

4.1 Two-group comparisons based on the Wetzels et al. (2011) dataset

As we observed in section 3, there is noticeable variation in the elicited prior distributions of experts. Two explanations were given to account for this variation. First, the latter was interpreted as an indication of different expectations regarding small-to-medium effect sizes, determined by the particular subfield of psychology. The second explanation had to do with the potential lack of knowledge regarding effect sizes in general.

Yet, if we set the interpretation of these differences aside, a number of questions arise that concern the practical implications of such differences. First, one question pertains to whether the discrepancies among priors matter for hypothesis testing. It may be the case that albeit present, these discrepancies are marginal. If, on the other hand, they are not, a further question concerns which of them result in BFs providing more evidence for small-to-medium effect sizes as compared to the Cauchy prior in case the alternative hypothesis is true. Another related question pertains to the rest of the effect sizes. Though it is our goal in the present article to address the case of small-to-medium effect sizes, using informed priors for such effect sizes should not, theoretically, be in conflict with unexpected results. In case the effects are larger or smaller than encoded by the prior, the important question is whether informed priors still bear more evidence for the alternative hypothesis than the Cauchy prior.

In the present section, we will attempt to answer these questions by applying the elicited priors to two-group comparisons of a variety of effect sizes in the dataset of Wetzels et al. (2011). Wetzels et al. (2011) collected information (sample size, degrees of freedom and t-value) from 855 t-tests published in 252 articles in the 2007 volumes of two major psychology journals (i.e., Psychonomic Bulletin & Review and Journal of Experimental Psychology: Learning, Memory, and Cognition). The tests included paired and unpaired versions and covered a variety of effect sizes, ranging from small to very large.

We used the analytical solutions of Gronau et al. (2017) to obtain the resulting BFs, plugging the parameters of the priors (as estimated from the elicited distributions) and of the likelihood (stemming from the t-test information of the Gronau et al. (2017) dataset) into the solutions of the integrals of the likelihood times the prior (for details see Gronau et al. (2017)). For the default priors we used the Cauchy prior with scale parameter equal to 1/√2 for the alternative hypothesis and mean equal to zero. Because Gaussian and t-priors are defined on the whole real line, we followed Gronau et al. (2017) in using a version of the elicited priors truncated at zero, excluding negative values. Though effect sizes can be negative, depending on the order of the means in the numerator of the equation, we focused on the absolute value of these mean differences; in other words, we considered the strength and not the direction of the effect.
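As a reading aid, the quantity described above can be written out explicitly. This is a sketch in notation introduced here for illustration, not a formula quoted from Gronau et al. (2017): t is the observed t-value, ν the degrees of freedom, n_eff the effective sample size, T_ν(t | λ) the density of a noncentral t-distribution with noncentrality parameter λ, and π(δ) the untruncated prior on the effect size.

```latex
\begin{align}
\pi_{+}(\delta) &= \frac{\pi(\delta)\,\mathbb{1}(\delta > 0)}
                        {\int_{0}^{\infty} \pi(\delta')\,\mathrm{d}\delta'}, \\[4pt]
\mathrm{BF}_{+0} &= \frac{\int_{0}^{\infty} T_{\nu}\!\left(t \mid \sqrt{n_{\mathrm{eff}}}\,\delta\right)
                          \pi_{+}(\delta)\,\mathrm{d}\delta}
                         {T_{\nu}(t \mid 0)}.
\end{align}
```

Under the usual conventions, n_eff = n and ν = n − 1 for a one-sample or paired test, and n_eff = n₁n₂/(n₁ + n₂) and ν = n₁ + n₂ − 2 for two independent groups. The numerator is the marginal likelihood under the truncated prior, and the denominator is the ordinary central t density under the null hypothesis.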

We derived BFs for both the Gaussian and t-priors for each Expert separately. This way we can investigate whether the choice of the underlying distribution or the field of study or personal beliefs make an essential contribution to the resulting BFs. We are further able to observe whether particular discrepancies among Experts matter for the resulting BFs.

For comparing informed and default BFs, we used the subset of 593 significant t-tests used in the Gronau et al. (2017) dataset. The effect sizes were divided into four groups following Cohen (1992) and Sawilowsky (2003) in their classification of small (δ = 0.2), medium (δ = 0.5), large (δ = 0.8), and very large (δ = 1.2) effects. The results are shown in Figure 5.
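The grouping can be sketched in a few lines of Python. The thresholds are the anchors named in the text, read as lower bin edges in the way the legend of Figure 5 suggests (0.2 < d < 0.5, 0.5 < d < 0.8, 0.8 < d < 1.2, d > 1.2). The function name, the extra "below small" label, and the treatment of values that fall exactly on an edge are assumptions made for this illustration.

```python
import numpy as np

# Lower edges of the four groups, taken from the anchors named in the text.
BIN_EDGES = np.array([0.2, 0.5, 0.8, 1.2])
# "below small" is a hypothetical catch-all label for |d| < 0.2.
LABELS = ["below small", "small", "medium", "large", "very large"]


def classify_effect_sizes(d):
    """Assign each effect size to a group based on its absolute value."""
    magnitude = np.abs(np.asarray(d, dtype=float))
    # np.digitize returns i such that BIN_EDGES[i-1] <= x < BIN_EDGES[i],
    # so a value exactly on an edge falls into the higher group (an assumption).
    indices = np.digitize(magnitude, BIN_EDGES)
    return [LABELS[i] for i in indices]
```

For instance, `classify_effect_sizes([0.3, -0.6])` groups the first value as small and the second, after taking its absolute value, as medium.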

To begin with the t-priors (Figure 5), the first observation we can make is that informed priors provide more evidence for small-to-medium effects across the board; an expected result given that the elicited priors targeted this family of effects. However, an unexpected pattern of results emerges when we take into account the discrepancies among Experts. BFs of Expert 1, who elicited a prior distribution below the mode of small-to-medium effect sizes that we defined as δ = 0.35, were more informative for the alternative hypothesis only for small effects. The priors of Experts 2-4, on the other hand, were located above δ = 0.35 and yielded BFs that are generally much more informative for the alternative hypothesis, not only for small effects but also for medium and large ones. Only very large effects, and a few exceptions among the small effects, are better captured by the Cauchy prior. The BFs derived from Experts 5-6 yield a pattern of results that can be considered "middle ground" with respect to the BFs based on Experts 2-4 and Expert 1. Since the priors of Experts 5-6 assigned more density to values around δ = 0.4 and δ = 0.3 respectively, the resulting informed BFs provide more evidence for small and medium effects than for large effects, where informed BFs are generally equal to those based on the Cauchy prior. For very large effects, nevertheless, default BFs remain more informative.

The fact that the Cauchy prior still provides more evidence for very large effects as compared to the informed priors can be accounted for by the Cauchy prior's spread (cf. Figure 1). Although the spread of a Cauchy distribution is not formally defined (it has no finite variance), with a scale parameter equal to 1/√2 it is wider than the elicited prior distributions. Thus, it follows that the Cauchy prior assigns more density to very large effects than either the Gaussian or the t-distribution. This is the reason why default BFs still provide more evidence for the alternative hypothesis for very large effect sizes.
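The difference in tail weight can be made concrete with a small numerical check. The sketch below compares the prior mass above δ = 1.2 under the default Cauchy prior (scale 1/√2, truncated at zero) with that under a t-prior truncated at zero. The t-prior parameters (location 0.35, scale 0.10, 3 degrees of freedom) are hypothetical values chosen for illustration and are not one of the elicited priors.

```python
import numpy as np
from scipy import stats


def tail_mass_truncated_cauchy(threshold, scale=1 / np.sqrt(2)):
    """P(delta > threshold) for a zero-centred Cauchy truncated to delta > 0.

    Assumes threshold >= 0; truncation at zero doubles the density on the
    positive half-line.
    """
    return 2.0 * stats.cauchy.sf(threshold, loc=0.0, scale=scale)


def tail_mass_truncated_t(threshold, loc, scale, df):
    """P(delta > threshold) for a location-scale t prior truncated to delta > 0.

    Assumes threshold >= 0; the denominator renormalises after truncation.
    """
    mass_above_zero = stats.t.sf(0.0, df, loc=loc, scale=scale)
    return stats.t.sf(threshold, df, loc=loc, scale=scale) / mass_above_zero


cauchy_tail = tail_mass_truncated_cauchy(1.2)
# Hypothetical informed prior, for illustration only.
informed_tail = tail_mass_truncated_t(1.2, loc=0.35, scale=0.10, df=3)
print(f"Cauchy mass above 1.2:   {cauchy_tail:.4f}")
print(f"Informed mass above 1.2: {informed_tail:.4f}")
```

Under these assumptions the truncated Cauchy prior places roughly a third of its mass above δ = 1.2, whereas the illustrative informed prior places well under one percent there, which is in line with the explanation given above.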

The choice of the underlying prior distribution, t or Gaussian, makes no essential difference with respect to hypothesis testing, because the same patterns of results emerge with Gaussian priors (see Figure A.1 in the appendix). Therefore, we will make use only of the elicited t-priors (they have one more parameter than Gaussian ones), since the two families provide essentially very similar results.

[Figure 5. Six panels, one per Expert (Expert 1 to Expert 6). Both axes show BF+0 on a logarithmic scale running from 1 to 1e+43. Points are grouped by effect size: 0.2 < d < 0.5, 0.5 < d < 0.8, 0.8 < d < 1.2, and d > 1.2.]

Figure 5: Comparison of BFs of 537 significant t-tests based on the elicited t-priors and the default Cauchy prior. The resulting BFs are presented according to the Cohen (1992) and Sawilowsky (2003) classification of effect sizes.

4.2

A simulation-based analysis

The Wetzels et al. (2011) dataset that we analyzed involves varying effect sizes and sample sizes. We therefore undertook a more systematic analysis of the comparison of informed to default priors as a function of the respective sample size. We proceeded by generating 1000 t-values from a uniform distribution U(a = 2, b = 15). Then we calculated default BFs, and informed ones based on the six elicited priors, while keeping the sample size fixed in three different sampling scenarios: n = 18, n = 40, and n = 136. These sample sizes were picked so as to reflect a variety of sampling situations that are likely to be found in psychological research. A recent survey by Marszalek et al. (2011) of four psychological journals from 2006 (i.e., Journal of Abnormal Psychology, Journal of Applied Psychology, Journal of Experimental Psychology: Human Perception and Performance, and Developmental Psychology) found that n = 18, n = 40, and n = 136 represented the first quartile, the median and the third quartile of sample sizes, based on 690 articles across different subfields of psychology.² Thus the selected sample sizes for our simulation analysis reflect plausible empirical situations of psychological research.³
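The simulation procedure can be sketched as follows. Several choices here are assumptions made for illustration and are not taken from the study: the Bayes factors are obtained by numerical integration rather than by the analytical solutions of Gronau et al. (2017), n is treated as the sample size of a one-sample (paired) design so that df = n − 1 and the noncentrality parameter is δ√n, the integration window is a heuristic, and the informed prior uses hypothetical parameters (location 0.35, scale 0.10, 3 degrees of freedom) rather than any of the six elicited priors. All function names are likewise hypothetical.

```python
import numpy as np
from scipy import integrate, stats


def truncated_cauchy_pdf(delta, scale=1 / np.sqrt(2)):
    """Zero-centred Cauchy prior restricted to delta > 0 and renormalised."""
    return 2.0 * stats.cauchy.pdf(delta, loc=0.0, scale=scale)


def truncated_t_pdf(delta, loc, scale, df):
    """Location-scale t prior restricted to delta > 0 and renormalised."""
    mass_above_zero = stats.t.sf(0.0, df, loc=loc, scale=scale)
    return stats.t.pdf(delta, df, loc=loc, scale=scale) / mass_above_zero


def example_informed_prior(delta):
    # Hypothetical parameters for illustration; not one of the elicited priors.
    return truncated_t_pdf(delta, loc=0.35, scale=0.10, df=3)


def bf_plus0(t_value, n, prior_pdf):
    """One-sided Bayes factor BF+0, assuming df = n - 1 and ncp = delta * sqrt(n)."""
    df = n - 1
    delta_hat = t_value / np.sqrt(n)
    # Heuristic window: a rough width of the likelihood as a function of delta.
    width = np.sqrt((1.0 + t_value**2 / (2.0 * df)) / n)
    upper = max(delta_hat, 0.0) + 12.0 * width

    def integrand(delta):
        like = stats.nct.pdf(t_value, df, delta * np.sqrt(n))
        # Far in the tail, floating-point overflow can yield non-finite values;
        # the density there is negligible, so it is treated as zero.
        like = like if np.isfinite(like) else 0.0
        return like * prior_pdf(delta)

    # Tell the integrator where the likelihood peaks, when the peak is in range.
    points = [delta_hat] if 0.0 < delta_hat < upper else None
    marginal_h1, _ = integrate.quad(integrand, 0.0, upper, points=points)
    return marginal_h1 / stats.t.pdf(t_value, df)


def simulate_bayes_factors(n, prior_pdf, n_sim=1000, seed=2017):
    """Draw t-values from U(2, 15); return arrays of default and informed BF+0."""
    rng = np.random.default_rng(seed)
    t_values = rng.uniform(2.0, 15.0, size=n_sim)
    default = np.array([bf_plus0(t, n, truncated_cauchy_pdf) for t in t_values])
    informed = np.array([bf_plus0(t, n, prior_pdf) for t in t_values])
    return default, informed
```

Calling `simulate_bayes_factors(n, example_informed_prior)` once for each of n = 18, 40 and 136 gives pairs of Bayes factors that can be plotted against each other on logarithmic axes; points above the identity line then correspond to t-values for which the informed prior yields the larger BF.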

The results of the simulation are presented in Figure 6. The results show that informed priors generally produce BFs providing more evidence for the alternative hypothesis than default ones do. Specifically, the elicited priors of Experts 2-4 were found to be more informative for the alternative hypothesis irrespective of the sample size accompanying the t-value. This pattern was reversed only for very large effects, where BFs > 100, and it is the result of the strong predictions made by the informed prior. It is noteworthy that, relative to the priors of Experts 2-4, default BFs were more informative for small effects with n = 136. Since small effects are closer to the predictions of the Cauchy prior, larger sample sizes make it easier for default BFs to be informative. As we observed in the Wetzels et al. (2011) dataset, the prior of Expert 1 required a large sample size (n = 136) to be more informative for the alternative hypothesis. Generally, the prior of Expert 1 proved worse than default priors in supporting the alternative hypothesis even with the median sample size (n = 40). On the other hand, the priors of Experts 5-6 presented a kind of "middle ground" between that of Expert 1 and those of Experts 2-4, as was the case for the results of the Wetzels et al. (2011) dataset. The priors of Experts 5-6 required median or large sample sizes to better support the evidence in favor of the alternative hypothesis. Yet, unlike the priors of Experts 2-4, with a small sample size (i.e., n = 18) default priors provided more evidence for the alternative hypothesis.

The results of the simulation-based analysis further showed that having a larger sample size is generally desirable. However, for most of the priors, the median sample size (n = 40) was found sufficient to maintain the informativeness of informed priors against default ones. Informed priors closer to δ = 0 required a larger sample size in order to be more informative for the alternative hypothesis.

² Marszalek et al. (2011) presented their 2006 results along with additional surveys over a time period spanning the past 30 years. The latter are not taken into consideration here because they are too dated to bear on current psychological research.

³ Though the sample sizes selected for the simulation analysis are based on 2006 journals, the qualitative picture presented in Marszalek et al. (2011) is taken to be indicative of the current state of affairs with respect to sample sizes in psychology, since it is reinforced by more recent surveys (see e.g., Szucs and Ioannidis (2017)).

[Figure 6. Six panels, one per Expert (Expert 1 to Expert 6). Both axes show BF+0 on a logarithmic scale running from 1 to 1e+43. Points are grouped by sampling scenario: n = 18, n = 40, and n = 136.]

Figure 6: Comparison of 1000 BFs based on the elicited t-priors and the default Cauchy prior per sampling scenario (i.e., n = 18, n = 40 and n = 136 respectively). BFs are presented according to each sampling scenario. The line indicates the identity function.

5

Discussion

The results presented in section 4 allow us to make the case for the use of informed priors in hypothesis testing. As pointed out by earlier research (e.g., see Bousquet (2006); Goldstein et al. (2006)), informed priors make it possible to incorporate established knowledge into the way researchers test empirical hypotheses. In light of the replication crisis in psychology (e.g., for reviews on the topic see Earp and Trafimow (2015); Maxwell et al. (2015)), it is implausible for researchers to assume that the null hypothesis is true even if they replicate a previously conducted experimental design or empirical finding. Informed priors are able to encode the accumulated body of knowledge and use it in statistical analysis.

In the case of small-to-medium effect sizes, the results presented in section 4 support the necessity of involving informed priors in hypothesis testing. Our analysis indicated that although Experts tend to overestimate small-to-medium effects, such overestimation, when encoded as prior knowledge, has desirable outcomes. Specifically, when effect sizes are a priori hypothesized to be distributed around δ = 0.6, the data strongly support the alternative hypothesis not only for small-to-medium effects but also for medium and large effects. Thus, even if researchers wrongly take a medium or large effect in the population to be small-to-medium, the informed priors will still support the truth of the alternative hypothesis. Default priors become more informative than informed ones only for very large effects or when the sample size is not sufficiently large. However, because very large effects are easy to find even with a small sample, both default and informed priors provide overwhelming evidence in favor of the alternative hypothesis in that case, even though default priors are the more informative of the two. All in all, when the alternative hypothesis is true, informed priors assist hypothesis testing so that a range of effect sizes are established as statistically significant.

Our results point out that Gaussian and t-distributions are interchangeable for prior elicitation because they make essentially identical predictions for the prior distributions. On practical grounds, we made use of the t-distribution because it provides more parameters than the Gaussian one (scale, location and degrees of freedom). It is noteworthy, however, that these families of distributions are characterized by symmetry and thus cannot express skewed shapes of distributions. Future research will accommodate this fact by incorporating distributional forms of priors of non-symmetric shape.

Our study also presented some practical insights on the way we can elicit prior knowledge from experts of various fields of psychological research. Though the elicitation procedure was carefully designed, future research will attempt to improve it in essential ways. For example, our operationalization did not include counterbalancing of presentation items. Future research might take this aspect of the experimental design into account, and it may show whether it makes any essential contribution to the elicitation procedure. In addition, future studies may employ a larger sample than the one employed in the present study, as well as draw from more subfields of psychological research, so that future inquiry provides an even broader picture of the beliefs of psychologists regarding small-to-medium effect sizes.
