
Using Bayesian Hypothesis Testing to Explicit Researchers’ Beliefs in Education

Bachelor Thesis

Steven Kuijper 100060 words

Supervisors: Prof. dr. M. S. Merry & dr. T. D. Jorgensen 14-6-2018


Table of contents

Title page

Abstract

Introduction

1. Why are values and beliefs important?

1.1 Beliefs

1.2 Values

1.3 How unexamined beliefs can influence evidential assessment

2. The problem of frequentist NHST

2.1 Null-hypothesis significance testing

2.2 Objectivity in frequentist statistics

2.3 Sesame Street: applying frequentist method

3. The alternative: Bayesian hypothesis testing

3.1 Bayes theorem and the Bayes factor

3.2 The importance of prior probabilities

3.3 The role of prior beliefs

3.4 Epistemology of Bayesian hypothesis testing

3.5 Revisiting Sesame Street: the Bayesian analysis

4. Conclusion


Abstract

Applied Bayesian statistics is becoming a more popular analysis technique within the social sciences and is proving to be a valuable alternative to classic frequentist statistical analyses. This thesis focuses on the relative merits of explicating and incorporating prior beliefs, which is essential in Bayesian hypothesis testing. First of all, the need to make prior beliefs explicit will be justified philosophically by emphasizing the epistemic values of educational researchers. Being transparent about prior assumptions, open to critique from other researchers, and committed to reporting findings honestly results in stronger warrant for their research. Moreover, statistical and methodological considerations are also explored. The interpretation of a frequentist null-hypothesis significance test can be influenced by prior beliefs, which is problematic because standard hypothesis testing neither quantifies beliefs nor how strong an effect they have on evidential interpretation. Bayesian hypothesis testing does quantify beliefs and dictates how researchers should update their prior beliefs after seeing the evidence, and is therefore proposed as a viable alternative to frequentist hypothesis testing.

Keywords: Bayesian statistics, frequentist statistics, hypothesis testing, epistemology, beliefs, values.


Suppose a researcher would like to assess whether a newly developed intervention is successful at increasing motivation in physics education. In accordance with traditional quantitative research, the researcher collects data, conducts a null-hypothesis significance test and, after seeing a p-value of less than 0.05, concludes that the intervention is successful. He might even claim that the intervention is now evidence-based. Now consider that the researcher had, for whatever reason, a strong belief that the intervention would not be successful: would the evidence be more or less compelling than if both outcomes had seemed equally possible? I will argue in this thesis that evidence is more compelling relative to prior beliefs if it is consistent with those beliefs and predictions. Goldstein (2006) captures the basis of this argument in an elegant sentence: “We may argue that a scientific case is ‘proven’ if the evidence should be convincing given any reasonable assignment of prior beliefs” (Goldstein, 2006, p. 410).

This statement, however, poses a problem for classic quantitative research within education science. Within the ‘frequentist’ framework, in which the majority of statistical analyses (p-values, confidence intervals and significance tests) are conducted and interpreted, it is impossible to quantify and incorporate prior beliefs. The elegance of working within this framework is that it appears to be highly ‘objective’, because prior beliefs, which might be wrong, form no part of the analysis. I shall argue that this is a misconception: incorporating prior beliefs can be essential to accurately interpreting the available evidence, and doing so is moreover necessary and justified, because researchers in education find themselves in a highly complex field in which their beliefs and values influence their research.

I shall argue that adopting a Bayesian framework when conducting education research ‘ticks all the boxes’. Bayesian analysis, as opposed to frequentist analysis, does incorporate prior beliefs and assesses contradictory evidence relative to these beliefs. It measures the weight of evidence and assigns probabilities to hypotheses. It therefore serves as a suitable candidate for an alternative statistical analysis.

It might be an appealing thought that the process of scientific inquiry is purely neutral, without bias, beliefs and values of the researcher conducting the inquiry; however, this positivistic paradigm whereby inquiry is conducted in a detached and neutral manner is an outdated, foundationalist philosophy. It would be naive to argue that research is not in any way influenced by personal characteristics such as beliefs and values of the researcher. In the post-positivist paradigm, these things are taken into account. In fact, within this paradigm researchers’ values such as motivation and devotion to their research subject are an integral part and are to be acknowledged (Henderson, 2011).

Because of the well-established and defined post-positivistic paradigm, it has become widely accepted that values and beliefs influence research. I will argue that the way in which quantitative results are currently being analyzed and interpreted can be problematic and that adopting a Bayesian perspective fits better in the post-positivistic tradition.

To argue this standpoint I will first carefully define values and beliefs, elaborate on the role they play in educational research, and explain why it is justified to be explicit about those beliefs and values. Next I will highlight some problems with classic frequentist null-hypothesis significance testing (hereafter referred to as NHST) in relation to values, beliefs and the objectivity of scientific inquiry. Furthermore, the relative merits of adopting a Bayesian framework to incorporate beliefs will be explained, and finally, using examples, I will illustrate what a Bayesian analysis can offer over a frequentist analysis.

1. Why are values and beliefs important?

1.1 Beliefs


Scientific inquiry is a process of theorizing, philosophizing, researching, interpreting and reporting. Because of the cumulative character of this process, in which researchers build new knowledge upon previous research, it is essential that it takes place in a transparent manner. However, researchers, as part of a scientific community, unavoidably hold certain beliefs, values and dispositions, which might be favorable (e.g., honesty) or unfavorable (e.g., implicit bias) in their research. Either way, it is unavoidable that these features play a role in researchers’ inquiries. To give a simple example, it is unimaginable that the physicists in the Manhattan Project during the Second World War were completely value-free (Howe, 1985). The aims of the research were value-laden in the sense that the research was driven by personal and political motives. A somewhat crude example, but it clearly shows how values can inform “objective” research.

Unlike logical positivism, post-positivism acknowledges the existence of beliefs and values in research. However, this makes matters more challenging. It would be simpler to work within the logical positivism paradigm and claim science to be completely neutral, i.e., ‘value-free’, and therefore not having to address challenging issues such as bias (in any form), because they simply play no role. Thus, the challenge is figuring out what role these beliefs and values ought to play. I argue that one solution is to make prior beliefs and assumptions transparent and explicit. To do so would enable fellow researchers and other stakeholders to be critical of the research questions, methodologies and/or interpretations of results. If these are not made explicit the possibility that the research being conducted is compromised by flawed assumptions increases substantially.


In order to argue why the need for making prior beliefs and assumptions explicit is justified, understanding the role of researchers within the scientific community is essential. Elgin (2013) provides a solid basis for understanding this role. In her paper she describes researchers as epistemic agents who are part of an active epistemic community, similar to the scientific communities that Thomas Kuhn describes in his Structure of Scientific Revolutions (Kuhn, 1962), that accumulates knowledge, devises methods and sets standards for what is acceptable. Whereas Kuhn focuses on how paradigm shifts can take place within epistemic communities, Elgin focuses on the agency of individual researchers within the community. Epistemic agents have, according to Elgin, a responsibility to be aware of their own influence on their research and of how their values and beliefs unwittingly guide their data collection, analysis and interpretation. Making these explicit opens them to careful examination, which in turn can lead to further scientific progress.

Epistemic values. Elgin categorizes intellectual traits and specific epistemic tendencies, such as an interest in truth and objectivity, an openness to criticism, and a strong moral ethic to report findings honestly, as epistemic values. These epistemic traits and values of researchers and scientists, whom Elgin calls epistemic agents, are central to the argument, for these traits make it more likely that research findings have stronger warrant.

Non-epistemic values. A lack of epistemic virtues, such as dishonesty in reporting findings and a lack of transparency about assumptions, has negative consequences for the warrant of research. A well-known example of non-epistemic values at work is the case of the Dutch psychology professor Diederik Stapel, who based his conclusions on fabricated data. The exact reason why Stapel forged his data is unclear; however, it is not unimaginable that values such as prestige and status guided his decisions and thus his research. Researchers therefore carry the responsibility, not only to themselves but also to the epistemic community, to act in an epistemically responsible manner. This also played a role in the case of Stapel: several doctoral students who had based their research on Stapel’s data had to retract their publications because they were based on forged data. This emphasizes the negative role that non-epistemic values can play.

1.3 How unexamined beliefs can influence evidential assessment

Aside from the philosophical justifications for why beliefs should be made explicit, empirical evidence also supports this claim (Koehler, 1993; Lord, Ross & Lepper, 1979). Koehler (1993) investigated if and how prior beliefs can influence judgements of scientific evidence. He tested his hypothesis in two experiments, one of which I shall describe to make my point. In his first experiment, he provided graduate students with background information on a (fictionalized) biological topic; the students received either a version with highly detailed information and compelling argumentation, or a version with little information and weak argumentation. Using this strategy he primed the students either with strong prior beliefs (highly informative and compelling information) or with no strong prior beliefs (little information and no compelling argumentation). Next, the students were given a (fictionalized) research report on the topic of the information they had just read. There were two types of research reports: one whose findings conformed to the information the students had just read and one whose results contradicted it. Their task was to assess the quality of the evidence being presented.

The result was surprising: students who had been primed with a strong prior belief judged the quality of a research report whose evidence contradicted their belief to be lower than that of reports confirming their prior beliefs. This agreement effect (that confirming findings are more often deemed of higher quality than contradictory findings) was especially present among the students who had been primed with strong beliefs. Even though the effect was weaker among students who had not been primed with strong beliefs, it was nevertheless present.

This phenomenon is commonly referred to as belief polarization (Lord, Ross & Lepper, 1979; Kelly, 2008; Jern, Chang & Kemp, 2014; Cook & Lewandowsky, 2015). Two persons can judge the quality of the same evidence differently based on their prior beliefs. Kelly (2008) provides a detailed account of the mechanism at play in belief polarization. Suppose a researcher believes a certain hypothesis H to be true. To test the hypothesis the researcher conducts an inquiry and gathers evidence E1. If E1 provides support for H, then his beliefs are confirmed. If, however, he is then presented with contradicting evidence E2, he is most likely to dismiss it as misleading. If the evidence presents itself in the opposite order (E2 then E1), the researcher finds his beliefs contradicted, forms a justified belief that H is wrong, and therefore dismisses E1 when it arrives after E2. Thus the temporal order in which evidence presents itself, combined with the prior beliefs of the researcher, influences the way in which researchers assess evidence and results in a polarization of beliefs. It would therefore seem that epistemic agents have a tendency to assess evidence irrationally because of their prior beliefs.

Education sciences. Although this is problematic for any scientific field, it is especially problematic in the education sciences. Researchers within this field develop new teaching strategies, learning interventions or anti-bullying interventions. If they conduct research into the effectiveness of their programs, interventions, etc., they often will try to prove their program to be effective. This is not surprising: education researchers strive to improve current educational settings, therefore they unavoidably have strong beliefs in the effectiveness of the interventions they propose, otherwise they would not have developed them in such a manner.


As a result, the way in which researchers assess the quality of their own conclusions will be influenced by the beliefs they already possess, as shown by the effect of belief polarization.

What can be concluded, then, is that education researchers have strong beliefs in the hypotheses they propose (and hence weak beliefs in alternative explanations), which has implications for the manner in which they will analyze their data and the conclusions that can be drawn. It is important to state this here; however, these implications will be discussed in greater detail in section 2.

Dogmas. Another field in which belief polarization manifests itself is policy making. Dogmas are unshakable beliefs, not to be questioned and ‘obviously’ right. Certain dogmas in education policy making can therefore lead policy makers to disregard evidence that refutes their ideas and ideals, in the same way as scientists who encounter evidence contradicting their prior beliefs. A very concrete example of belief polarization in policy making is the intensive government attention to early childhood education in the Netherlands. From the year 2020 the Dutch government will invest 486 million euros to implement these interventions, because the idea exists that they are effective in stimulating cognitive development. According to Driessen (2017) these ideas are based on very positive results of early childhood education programs in the USA. However, from multiple meta-analyses of early childhood education programs in the Netherlands (Driessen, 2017; Fukkink, Jilink & Oostdam, 2017; Blok, Fukkink, Gebhardt & Leseman, 2005), the most recent published in 2017, it becomes clear that these early childhood interventions are in fact not effective. Driessen (2017) concludes that the effect sizes as computed by researchers in the Netherlands were very small (i.e., Cohen’s d < 0.3). Based on the current evidence available, this would not seem to be a justified investment. Even though that might be the case, the government continues to invest in early education without questioning its current course, apparently assuming that these programs will prove effective in the Netherlands as well if implemented correctly. It is therefore easy to see from this example how certain dogmas in education policy, full of prior beliefs, can influence the judgement and interpretation of evidence.

This problem, whereby scientists and non-scientists alike are influenced by their beliefs, is apparently unavoidable. No matter what good intentions researchers might have, research will always be guided and interpreted to some degree by personal beliefs. Research can be harmed by incorrectly adjusting and updating one’s beliefs in the same way that non-epistemic values can lead to poor-quality research. It is therefore imperative that an alternative method be adopted that addresses these objections.

2. The problem of frequentist NHST

In the previous section I identified that prior beliefs exist and that they pose a problem for the assessment of evidence and of research quality. It would therefore seem reasonable to use statistical methods that take these beliefs into account. However, commonly used statistical hypothesis testing procedures do not consider prior beliefs. They do not tell researchers how the evidence relates to their prior beliefs, and consequently researchers have no sense of how strongly or weakly their beliefs are affirmed by the available evidence.

The discussion about beliefs affirms the idea that social science can never be truly objective. The practice of current quantitative research, however, opposes this statement. Research data are frequently analyzed with ‘objective’ methods and statistical techniques, which raises the question whether these objective analyses are a justified manner of testing hypotheses when researchers hold strong beliefs toward one or more of those hypotheses.

In this section, I will illustrate the problems of frequentist statistics regarding objectivity and the disregard of prior information. First, it will be argued from a historical perspective why these seemingly objective analyses are problematic. Next, a short description of the frequentist null-hypothesis significance testing procedure will be given and finally, using an empirical example, the criticisms of frequentist analysis will be explained.

2.1 Null hypothesis significance testing

Null hypothesis significance testing (NHST) was first introduced by Sir Ronald Fisher (Fisher, 1922) and later further developed by Neyman and Pearson. The basic concept is to formulate two hypotheses, the null and the alternative hypothesis, conduct an experiment, and calculate the probability of finding a result that differs from the null hypothesis at least as much as the observed data do, assuming the null hypothesis is true: a p-value. If this probability is very small, say 0.05 (5 percent) or less, we conclude that the data are inconsistent with the null hypothesis, which should therefore be rejected.

Wagenmakers (2007) illustrates the set-up and analysis procedure within the Fisherian paradigm with the following example. Say I were to give you 12 true-or-false questions and you answered 9 out of 12 correctly. Suppose the researcher would like to know whether your performance could be explained by random guessing [or whether you actually studied for the test]. He first sets up a null hypothesis, which states that you guessed randomly. Next, we attach values to our hypotheses: under the null hypothesis of random guessing between ‘true’ and ‘false’, the probability of answering any question correctly is ½. We then test our finding, 9 out of 12 questions correct, against this value of one half. Using a binomial model and plugging in the values, the probability of getting 9 or more out of 12 questions correct under the assumption that you were guessing is 0.073 (the probability of exactly 9 correct is 0.054). It thus seems reasonable to doubt that you guessed the questions, because answering as many as 75% correctly would be unlikely if you were only guessing (with a 50% chance of guessing correctly). Moreover, if the computed probability falls below some arbitrary threshold, conventionally 0.05, the null hypothesis is rejected. Seeing as the p-value in this example is 0.073, you would not reject the null hypothesis.
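The binomial calculation in this toy example can be reproduced in a few lines of Python. This sketch is not part of the original analysis; it only shows the distinction between the probability of exactly 9 correct answers and the one-sided p-value (9 or more correct):

```python
from math import comb

n, k = 12, 9     # 12 true-or-false questions, 9 answered correctly
p_guess = 0.5    # probability of a correct answer under random guessing

def binom_pmf(successes: int, trials: int, p: float) -> float:
    """Probability of exactly `successes` correct answers in `trials` questions."""
    return comb(trials, successes) * p**successes * (1 - p)**(trials - successes)

# Probability of exactly 9 correct under guessing
point = binom_pmf(k, n, p_guess)

# One-sided p-value: probability of 9 *or more* correct under guessing
p_value = sum(binom_pmf(i, n, p_guess) for i in range(k, n + 1))

print(f"P(X = 9)  = {point:.3f}")    # 0.054
print(f"P(X >= 9) = {p_value:.3f}")  # 0.073
```

Both quantities exceed the conventional 0.05 threshold, so the null hypothesis of guessing is not rejected either way.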

Although the aim of this thesis is not to dive into the philosophical aspects of p-values, the toy example above provides a basic understanding of how a NHST is performed. Obviously this is merely one of many applications of frequentist statistics, and there are far more complex and interesting analyses than the above. All of them are, however, based on the premise that prior assumptions and beliefs form no part of the analysis, making it seem as if the analysis is conducted in a completely objective manner.

2.2 Objectivity in frequentist statistics

Before delving into the notions of objectivity and subjectivity in frequentist statistics, it is imperative that these concepts be defined. Subjectivity can be defined as the personal beliefs of an individual scientist, in contrast to beliefs that are generally shared in that field. If beliefs are widely accepted and used as common knowledge, they are generally found to be objective, says Blyth (1972). In contrast, being objective means discarding your personal beliefs and preconceptions and basing your judgements purely on experimental evidence. Blyth captures the notion of what it means to be objective in the following quote: “Your subjective beliefs might be wrong (…) the measure of a good scientist is his reliability—the degree to which others in his field will agree with his findings. Even one failure (…) can ruin a scientist’s reputation” (Blyth, 1972, p. 20).

This distinction by Blyth between objective and subjective is merely used as an instrument for understanding the frequentist approach and philosophy of hypothesis testing, when in fact true objectivity is unobtainable. Humans individually interpret an ‘objective’ reality that exists outside of our ‘subjective’ selves. The objective approach as described by Blyth would imply describing reality without any subjective influence. By definition of human subjectivity this is impossible, because humans interpret and process an objective reality and inescapably make it subjective in the process. Truly objective empirical research, therefore, does not exist.

Frequentist statistics, however, locates probabilities in an objective world that exists independently of the observer (Gelman, 2017). This method thus draws a sharp distinction between subjective and objective and implies that it is possible to describe an objective reality without subjective interpretation. Frequentist methods, which involve no personal beliefs, would therefore seem an elegant solution to overcome the influence of those beliefs.

Unfortunately, the following toy example from Blyth (1972) illustrates that even a simple frequentist experiment can be influenced by subjective conditions. Suppose you toss a coin and would like to know whether it is fair. You toss the coin (say a one-euro coin) by placing its edge on a plate of glass and spinning it with a sharp blow of the right index finger. Suppose we find a proportion of heads of around ½, meaning we find an equal number of heads and tails. To us this would be an objective result, seeing as we have done nothing to influence the behavior of the coin. Now suppose the experiment is replicated by another researcher and they find a different proportion of heads and tails, say heads ¼ of the time. Without questioning the prior conditions, we would be inclined to conclude that the coin is not fair and favors tails over heads. But what if the surface on which the coin spins is not perfectly horizontal? Then we might conclude that the ¼ is due to the tilted surface and has nothing to do with the fairness of the coin. If we had known the prior circumstances in advance, that knowledge would have led us to different conclusions. So there is a sense of subjectivity even in these simple experiments.
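A small simulation, not part of the thesis, makes this point concrete. The bias of 0.25 attributed to the tilted surface and the number of spins are my own assumptions; the point is only that an analyst who does not know about the tilt would blame the coin:

```python
import random
from math import comb

random.seed(1)  # fixed seed for reproducibility

# Hypothetical set-up: the coin itself is fair, but a tilted glass
# surface biases every spin toward tails (assumed chance of heads: 0.25).
p_heads_on_tilted_surface = 0.25
n_spins = 100

heads = sum(random.random() < p_heads_on_tilted_surface for _ in range(n_spins))

# One-sided binomial p-value under the null hypothesis "the coin is fair"
# (p = 0.5): the probability of observing this few heads or fewer.
p_value = sum(comb(n_spins, k) * 0.5**n_spins for k in range(heads + 1))

print(f"heads: {heads}/{n_spins}, one-sided p-value: {p_value:.2g}")
```

Unaware of the tilt, the analyst rejects fairness and blames the coin; knowing the prior circumstances, the same data point to the surface instead.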


Gelman (2017) makes an even stronger claim: researchers cannot, in any manner, escape subjectivity in statistical analysis. The way in which most researchers use statistics, which often means significance testing with p-values, merely appears objective. The conventional significance level of 0.05 seems like an objective threshold, when in fact personal decision making plays an important part: where to set this arbitrary threshold, for example, is a personal choice. Moreover, the main purpose of using statistics is as a tool for making decisions (reject, not reject, accept, etc.). Because decision making is itself subjective, to claim that it is perfectly objective is misleading and untrue.

2.3 Sesame Street: applying frequentist method

To illustrate my points on the issues of beliefs, values and the objectivity of frequentist analysis, I will apply the frequentist paradigm to an empirical research setting. The children’s television series Sesame Street was introduced to stimulate the cognitive development of disadvantaged children in the United States. To assess the impact of the series on children in various regions (rural and non-rural), data were collected and analyzed to inspect whether children had actually progressed in their development. The dataset, provided with Applied Multivariate Statistics for the Social Sciences and containing N = 240 subjects, will be used as the example dataset on which the analyses are performed.

The administered test consisted of six parts, with questions about, for instance, numbers, forms and knowledge of the body. Background variables, such as the neighborhood in which the children live and whether they watched the series at home or at school, were also included. Originally, the dataset was used to investigate whether economically disadvantaged children would catch up with economically advantaged children. For the purpose of this analysis, however, I will focus on the question whether disadvantaged children benefitted more from Sesame Street in literacy development than advantaged children.

First of all, it is essential to have a theoretical basis for our hypothesis, which is the first step in every analysis. The home literacy environment (HLE) is a broad term describing literacy-related activities, resources, interactions and experiences (Hamilton, Hayiou-Thomas, Hulme & Snowling, 2016). Moreover, HLE has been found to predict literacy development in childhood, meaning that engaging in literacy-related activities has a positive effect on literacy development (Bus, van IJzendoorn & Pellegrini, 1995; Burgess, Hecht & Lonigan, 2011; Niklas & Schneider, 2014). What is more, from Niklas and Schneider (2013) it becomes apparent that socio-economic status is an important factor in the prevalence of HLE in families. We thus base our hypothesis on the positive relation between HLE and literacy development and on the difference in prevalence of HLE between low and high socio-economic families.

From this theoretical basis we can now formulate our hypothesis: disadvantaged children (from families with a low socio-economic status) will benefit more from Sesame Street than advantaged children. The next step is to choose a statistical method to investigate the hypothesis, which in this case would be an independent-samples t test. It tests whether two groups (advantaged and disadvantaged) differ in some parameter, here the difference between pre- and post-test scores (before and after watching Sesame Street). We expect this difference to be larger for disadvantaged children, because that would indicate that these children benefitted more from Sesame Street than advantaged children. Next, we need


to specify the hypotheses we are going to test. Let µdiff denote the mean difference between pre- and post-test scores, and let the subscripts A and D denote advantaged and disadvantaged children. The null hypothesis H0: µdiff,A = µdiff,D states that the differences in pre- and post-test scores are equal in both groups; the alternative hypothesis H1: µdiff,A ≠ µdiff,D states that the difference in the disadvantaged group is not equal to the difference in the advantaged group. Furthermore, I set the significance level at 0.05.

After using statistical software to compute the p-value for the t test described above, we find that the difference scores of the advantaged children (M = 10.99, SD = 10.92) compared to the disadvantaged children (M = 10.41, SD = 11.57) did not differ significantly, t(238) = 0.395, p = 0.693, indicating that disadvantaged children did not benefit more from Sesame Street than advantaged children in literacy development. That is not to say that disadvantaged children did not benefit at all, but they did not benefit significantly more than advantaged children. Such a result is often seen as disappointing by education researchers; after all, the intervention was designed for a specific reason: to stimulate literacy development in disadvantaged children.
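The reported result can be approximately reconstructed from the summary statistics alone. The sketch below is not from the thesis: the group sizes are not reported, so two equal groups of 120 are assumed here, and the p-value uses a normal approximation to the t distribution, which is adequate at 238 degrees of freedom; both assumptions mean the output only approximates the reported t(238) = 0.395, p = 0.693.

```python
from math import sqrt
from statistics import NormalDist

# Summary statistics reported in the text; group sizes are assumed equal
m_adv, sd_adv, n_adv = 10.99, 10.92, 120  # advantaged children
m_dis, sd_dis, n_dis = 10.41, 11.57, 120  # disadvantaged children

# Pooled-variance independent-samples t test
df = n_adv + n_dis - 2
pooled_var = ((n_adv - 1) * sd_adv**2 + (n_dis - 1) * sd_dis**2) / df
se = sqrt(pooled_var * (1 / n_adv + 1 / n_dis))
t = (m_adv - m_dis) / se

# Two-sided p-value via a normal approximation (df = 238 is large)
p = 2 * (1 - NormalDist().cdf(abs(t)))

print(f"t({df}) = {t:.3f}, p = {p:.3f}")
```

The observed mean difference of 0.58 points is small relative to a standard error of about 1.45, which is why the test is nowhere near significance.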

Philosophical considerations. Let us now explore the kinds of conclusions we can draw from this result and assess whether they are justified. First of all, as I have illustrated, applying this frequentist method of statistical analysis makes the p-value seem an objective quantity, when in fact subjective choices have been made. Consider our values and beliefs prior to the investigation. In the same manner as the development of the atomic bomb in the Manhattan Project was value-laden because the research was driven by personal and political factors, our investigation was also value-laden. Consider the values of the researchers: more often than not, they are trying to prove that their interventions are effective. Even when other researchers investigate an intervention they did not create, it is unlikely that they are completely neutral in the sense of not predicting outcomes; as humans we have a tendency to make assumptions and predict outcomes. If a researcher, on the other hand, is skeptical about the effectiveness of Sesame Street, he could set out to disprove the intervention. This happens consciously and fits well within the Popperian tradition of falsifying theories. Trying to prove theories and interventions effective, by contrast, happens outside the conscious awareness of most empirical researchers. Even though good researchers will conform to the Popperian method of trying to falsify their theories, it is reasonable to assume that a certain degree of confirmation bias exists, because research into the effectiveness of one’s own intervention is value-laden. The researchers who designed Sesame Street did so for a reason: to help disadvantaged children. They did not design it to be ineffective, so they do not assume prior to the investigation that Sesame Street does not work. Bias in one way or the other is thus very subtle.

That research is value-laden is not necessarily problematic; I have demonstrated that epistemic values should influence research, for example in the manner in which we predict outcomes. In our example, we were determined to prove that Sesame Street is effective. As a result, we have primed ourselves, by studying the empirical literature, with the idea that Sesame Street should be effective. Within the post-positivistic paradigm this is not problematic, because that paradigm acknowledges the existence of values and beliefs in research, and these might in fact have motivated us to produce high-quality research. Epistemically it is therefore perfectly reasonable to hold a certain belief prior to the investigation and to base hypotheses on it.

Methodological considerations. However, accepting that premise has consequences for how likely we deem our hypotheses to be true. In our analysis the null hypothesis predicted no difference in the average difference scores between the two groups, and the alternative predicted a difference. Because we have primed ourselves with the idea that Sesame Street should be effective, we actually deem the alternative hypothesis more likely to be true than the null hypothesis. As I have demonstrated, such a belief toward a specific hypothesis affects our judgment of the evidence. In our example we found no statistically significant effect; belief polarization might therefore lead us to conclude that the research design was flawed, or that the test was not valid or reliable. In short, an enormous number of alternative explanations could account for the non-significant p-value. If, on the other hand, we had found a significant effect, we would not be inclined to seek alternative explanations for the data and would have accepted the result as an indication that Sesame Street is effective for literacy development. Now suppose a skeptical researcher replicates the study and also finds a non-significant result; the researcher then confirms his prior belief that the intervention does not work. If, however, he does find a significant result, he might seek different explanations to account for it. Consequently, our prior beliefs influence the interpretation of the evidence. The conclusions researchers base on the supposedly objective frequentist NHST method are not objective at all.

The problem with this ‘objective’ frequentist analysis is that, although we deem the alternative hypothesis more likely to be true than the null, we actually treat both hypotheses as equally probable by not incorporating prior information. As I have just shown in the example analysis, the interpretation and assessment of statistical evidence can consequently differ depending on prior beliefs. Because of the equal treatment of both hypotheses, the NHST procedure does not tell us (or quantify) how the evidence relates to our prior beliefs. As we have seen in the belief polarization effect, humans tend to make unwarranted decisions after seeing different types of evidence, leading to unreasonable conclusions. Making those beliefs explicit and incorporating them into the analysis, so as to assess how the evidence should be evaluated in relation to prior beliefs, would be a more rational approach to decision making.

In conclusion: the non-significant p-value we found in the example analysis, the result of an ‘objective’ procedure, can lead to different subjective interpretations depending on prior beliefs. Not only is it philosophically justified to incorporate prior beliefs, it is also methodologically justified. Our beliefs influence the interpretation of evidence, which can lead to belief polarization. We therefore need a different analysis approach, one that estimates how the evidence relates to our beliefs, in order to make more informative decisions.

3. The alternative: Bayesian hypothesis testing

Recall the example of the coin-tossing experiment to assess whether a coin is fair. The setup was quite simple: we would count the number of heads and tails after spinning the coin a number of times, and make a decision on the basis of the proportion of tails. Suppose the researcher again finds a proportion of ¼, meaning that after spinning the coin a number of times, he observes three times as many heads as tails. In the classic frequentist tradition we would analyze the data and conclude that it is not a fair coin, because heads occurs so much more frequently than tails. Now suppose we had reason to assume that the coin was not fair before we even began the experiment. For instance, we might have insight into the production process of the coin and know that it is probably biased toward heads. There is no absolute certainty that it is biased, so we conduct the experiment to either confirm or falsify our belief. Regardless of prior beliefs, we observed a proportion of ¼ tails, that is, a result favoring heads.

Obviously we are not surprised by the result, because we believed the coin was unfair. However, now suppose we had a strong belief that the coin was fair before we began the experiment, and again we found the same result. Our belief would then be falsified, for we expected the proportion of heads and tails to be ½. These examples show that our interpretation of experimental evidence depends on our prior beliefs. We assess evidence in relation to what we believed in the first place, and we then update our beliefs.

Moreover, what follows from the phenomenon of belief polarization is that individuals are sometimes irrational decision makers. When confronted with evidence that conflicts with their beliefs, they sometimes irrationally dismiss the evidence as an anomaly or as the product of a low-quality research design. I have already shown that current statistical techniques do not consider prior information and therefore fail to support reasonable decisions. This section will focus on a different statistical methodology: Bayesian hypothesis testing. Bayesian statistics dictates precisely how researchers should update their prior beliefs given the available evidence, preventing researchers from making irrational decisions. First, I shall elaborate on some technical details for a better understanding of this methodology. Next, I will illustrate what a Bayesian analysis can offer over a frequentist one in terms of integrating prior knowledge, and the conclusions we can draw from that. Finally, I will apply the Bayesian methodology to the same dataset and research question as the empirical example in section 2.

3.1 Bayes theorem and the Bayes factor

The process of explicating prior beliefs and updating those beliefs by acquiring new evidence is the core philosophy of Bayesian statistics. Its basis is Bayes' theorem (equation 1), published in 1763 by Thomas Bayes. The theorem concerns the conditional probabilities of certain events given prior knowledge of conditions that might be related to those events, which has proven especially useful in hypothesis testing, where we have both evidence and certain prior knowledge. Although this thesis need not go into the mathematical details, a basic understanding of the Bayes factor equation is essential. Equation 2 is an extension of Bayes' theorem and is used to compare two competing hypotheses (H1 and H0). Equation 3 expresses equation 2 in words rather than in mathematical notation.


Equation 1: P(A|B) = P(B|A) P(A) / P(B)

Equation 2: p(H1 | Evidence) / p(H0 | Evidence) = [p(H1) / p(H0)] × [p(Evidence | H1) / p(Evidence | H0)]

Equation 3: posterior odds = prior odds × likelihood ratio of evidence

To illustrate how these equations work, let us again use the coin example. The first ingredient is two hypotheses. Our first hypothesis (H0) is that the coin is fair; our second (H1) is that the coin is biased. The first step in any Bayesian analysis is to specify our prior beliefs in the form of prior odds (eq. 2). We have to assign probabilities to our first hypothesis (the coin is fair) and to our second hypothesis (the coin is biased). Now suppose, for the sake of argument, we have a strong belief that the coin is biased, and we think, for whatever reason, that it is two times as likely that the coin is biased as that it is not. Our prior odds would then be equal to two (i.e., "2:1 odds in favor of bias"). We then conduct the experiment and calculate the likelihood ratio. The likelihood ratio specifies under which hypothesis the evidence is more likely (and thus which hypothesis better accounts for the evidence). If we find a proportion of ½ (an equal number of heads and tails), that supports H0: the coin is fair. If we find a proportion of ¼, however, that supports H1. If we multiply our prior odds by the likelihood ratio (the Bayes factor), we obtain the posterior odds. These posterior odds indicate which hypothesis is more probable after seeing the evidence, while the Bayes factor itself tells us how much more likely the evidence is under one hypothesis than under the other.

We use evidence (the Bayes factor) to update our beliefs (prior probabilities) and acquire new beliefs (posterior probabilities). In a very basic manner, this is how Bayesian analyses work, and it is a completely different approach from frequentist statistics, especially because prior probabilities (or beliefs) can be, and usually are, subjective. This is why Bayesian analysis is often labeled subjective.
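The update rule just described can be sketched in a few lines of Python. The concrete bias value p = 0.75 for H1 is an illustrative assumption (the text only says the coin is biased toward heads), as are the 20 spins with 15 heads:

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of observing k heads in n spins with heads-probability p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Observed data: 20 spins, 15 heads (the 3:1 ratio from the text)
n, k = 20, 15

# Likelihoods of the data under each hypothesis. H0 is the fair coin;
# for H1 we need a concrete biased value, so p = 0.75 is an assumption.
lik_h0 = binom_pmf(k, n, 0.50)
lik_h1 = binom_pmf(k, n, 0.75)

# Bayes factor = likelihood ratio of the evidence (right-hand term of eq. 2)
bayes_factor = lik_h1 / lik_h0

# Prior odds of 2:1 in favor of bias, as in the text (middle term of eq. 2)
prior_odds = 2.0

# Posterior odds = prior odds x Bayes factor (eq. 3)
posterior_odds = prior_odds * bayes_factor
```

With these choices the data are roughly 14 times more likely under H1 than under H0, so even a skeptic who started at even odds should shift substantially toward the bias hypothesis.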

3.2 The importance of prior probabilities

The key difference that distinguishes frequentist from Bayesian statistics is the

incorporation of prior probabilities. I have already tried to illustrate how prior knowledge affects the interpretation of evidence in the simple coin example. However, to give additional weight to my argument as to why the specification of prior beliefs and assumptions is of absolute importance, I will provide another example, as given by Dawid (2005). Although the example does not relate to education, it perfectly illustrates how prior probabilities play a role in the assessment of evidence.

Sally Clark's first child died suddenly in 1996, and in 1998 her second child died similarly. After the first child had died, the infant's death was attributed to sudden infant death syndrome. However, Sally Clark was prosecuted after the death of her second child and was found guilty of murdering both her children. The prosecutor argued that the chance of two consecutive children dying of sudden infant death syndrome was about one in 700,000. Seeing as that chance was extremely unlikely (a frequentist claim), she was convicted and sent to prison. In other words, the prosecution claimed that the prior probability of two children dying from sudden infant death syndrome was one in 700,000.


At first glance it would seem reasonable to assume that, with chances so unlikely, she must have killed her children. However, the calculation the prosecutor presented was incomplete. To make a claim about prior odds, two probabilities are needed (eq. 1): the prior probability of the first hypothesis (Sally Clark murdered her children) and the prior probability of the second hypothesis (the babies died of sudden infant death syndrome). The prosecutor mentioned only the probability of the second hypothesis. According to the statistics, the probability of a mother murdering two consecutive children was about one in 73 million, far smaller than the probability of her babies dying of sudden infant death syndrome. A computation of the prior odds yields a value of 104. In other words, based on prior information it is 104 times more likely that her babies died of sudden infant death syndrome than that Clark killed them. In 2003 there was a second appeal, and with this new information in consideration, Clark was released from prison.

Of course, this example has no relation to educational science; however, it clearly indicates how prior assumptions can influence the way in which decisions are made. By frequentist standards, the figure of one in 700,000 would have been sufficient to convict Clark, but a proper assessment of the true prior circumstances led to a different conclusion. In education, where researchers test new teaching methods and organizational interventions, it is extremely important to assess prior beliefs, for they can influence the manner in which the gathered evidence is interpreted.

3.3 The Role of Prior Beliefs

From the previous example it becomes clear that carefully identifying prior beliefs is an essential part of an investigation, because they can influence conclusions. How these prior beliefs should be quantified, however, is not always as straightforward as in the previous example, where a precise probability was available. The first step in identifying prior probabilities is assessing how much information we believe we have and how accurate we believe that information to be (van de Schoot, Kaplan, Denissen, Asendorpf, Neyer, & van Aken, 2014). For example, prior information can come from meta-analyses and previous surveys. These sources are regarded as objective, in the sense that the origin of the priors can be verified. For some "subjective Bayesians", however, prior information can come from any source, ranging from objective sources to subjective ones, such as certain preconceptions about the likelihood of our hypotheses. Once we have established what we base a prior probability distribution on, we must also specify how informative our prior information is. For example, some population parameters are well established and can therefore be encoded as an informative prior. An uninformative or weakly informative prior is the opposite; researchers can opt for these if they are unsure about the informative value of their priors.

Whether a prior distribution is informative or uninformative has an effect on the posterior distribution. A weakly informative prior distribution has very little effect on the posterior distribution. This seems reasonable: if we have very little prior information, it should have a minimal effect on our interpretation, and the posterior distribution is then based mainly on the collected data. In contrast, an informative prior distribution has a greater effect on the posterior distribution. An assessment of how informative the priors are is therefore essential.
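A minimal illustration of this effect uses a conjugate beta-binomial model for a coin's heads-probability. The Beta(1, 1) and Beta(50, 50) priors and the 15-heads-in-20-spins data are hypothetical choices for illustration, not figures from the text:

```python
# Conjugate beta-binomial update: a Beta(a, b) prior combined with
# binomial data becomes a Beta(a + heads, b + tails) posterior.
heads, tails = 15, 5  # hypothetical data: 20 spins, 15 heads

def posterior_mean(a, b):
    """Mean of the Beta(a + heads, b + tails) posterior."""
    return (a + heads) / (a + heads + b + tails)

# Weakly informative prior Beta(1, 1): the posterior follows the data closely
weak_mean = posterior_mean(1, 1)      # about 0.73, near the observed 0.75

# Informative prior Beta(50, 50), centered on fairness: the posterior is
# pulled strongly back toward 0.5
strong_mean = posterior_mean(50, 50)  # about 0.54
```

The same data move the weakly informed analyst almost all the way to the sample proportion, while the strongly informed analyst barely budges from fairness, which is exactly the informativeness effect described above.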

The entire process of identifying prior distributions and assessing how informative they are involves personal choices. Researchers individually have to decide what they base their prior distribution on and from what source. This can be especially difficult when there is little prior information to utilize, for example when a certain population parameter has not yet been researched. The researcher then has two options: either use a non-informative prior, which has little effect on the posterior distribution, or base a prior probability on the researcher's own belief. There is, for instance, a web-based tool called the MATCH Uncertainty Elicitation Tool that lets researchers create a prior distribution on the basis of intuitive ideas (http://optics.eee.nottingham.ac.uk/match/uncertainty.php). See Morris, Oakley, and Crowe (2014) for a detailed overview of MATCH.

3.4 Epistemology of Bayesian hypothesis testing

Recall the term epistemic agency, by which Elgin (2013) explained that researchers are epistemic agents within an epistemic community. From this Elgin concluded that researchers should pursue certain epistemic values, because doing so contributes to high-quality research; honesty, open-mindedness, and attentiveness to evidence are a few of the epistemic virtues mentioned. The elegance of applying a Bayesian hypothesis test is that it fits very well with these epistemic considerations. Consider, for example, the Bayes factor equation: to compute the posterior odds, you multiply the likelihood ratio (the Bayes factor) by the prior odds. These two terms in the equation differ in character: one is subjective, the other objective. The prior odds are the subjective element, for it depends on the researcher how they are defined; each researcher is free to define a prior distribution on the basis of either empirical data or a strong belief. The objective part, in the sense that the observed evidence can be verified by an epistemic community, is the likelihood ratio, because it provides researchers with an estimate of how likely the evidence is under a certain hypothesis, independently of prior assumptions. Because of this objective connotation, there can be no discussion about the interpretation of the likelihood ratio: the computation is exactly the same regardless of who performs it. The choice of prior, however, can be critiqued, as a result of the subjective choice researchers have in making it. For some researchers, subjective priors are not useful because they have too much effect on the posterior distributions (Bandyopadhyay, Brittan, & Taper, 2017). In principle this is true: choosing a prior and its informativeness directly affects the influence the evidence has on our beliefs.

However, this is a strength rather than a weakness. The mere fact that prior beliefs and assumptions can be critiqued is a direct example of honest scientific discourse. Disagreeing with chosen priors is therefore perfectly reasonable. The only way researchers can become aware of each other's assumptions and beliefs is to make them explicit. This is a benefit not only for researchers but also for the scientific community reading the research reports: the community is now able to check what assumptions a certain finding is based on. Belief polarization is then less likely to occur, because researchers not only have the means to form an objective sense of which hypothesis is more likely given the data, but can also observe what the likelihood of the hypotheses is given certain prior assumptions. The researcher thereby becomes more transparent, which can be classified as an epistemic virtue.

Thus, aside from the purely technical advantages of Bayesian hypothesis testing, it also has epistemic advantages. The most important is transparency, which not only results in rational belief updating but also makes the conclusions verifiable or falsifiable by an epistemic community. Bayesian hypothesis testing is therefore a justified alternative to the frequentist method.

3.5 Revisiting Sesame Street: the Bayesian analysis

I have outlined the technical as well as philosophical advantages of using a Bayesian hypothesis test. Let us revisit the empirical example of Sesame Street and the analysis we conducted into the literacy development of advantaged and disadvantaged children. I will analyze the same research question with the same hypotheses; however, instead of a frequentist analysis, a Bayesian analysis will be applied. The first step in any Bayesian analysis is choosing a prior distribution, and as I have tried to demonstrate, this is no easy task. There are, however, tools that can provide researchers with the means to elicit prior distributions based on their intuitive ideas. The web-based tool MATCH will be utilized in this example to elicit a prior distribution.

Prior elicitation. Recall that I based the hypothesis that disadvantaged children will benefit more from Sesame Street than advantaged children on the scientific literature. The next step is to assign a prior probability distribution that incorporates this information, thereby making our beliefs explicit. MATCH offers a few options for eliciting distributions; for this analysis the quartile method is the most appropriate. This method allows us to input quartile parameters such as the median, the lower quartile, and the upper quartile.

The next step is to input quartile parameters for the standardized effect size that we believe to be associated with the effect Sesame Street produces. A standardized effect size such as Cohen's d expresses a difference in standard-deviation units; by convention, 0.2 counts as a small effect, 0.5 as a medium effect, and 0.8 as a large effect. From the literature on which we based our hypothesis we believe there to be a positive effect; however, we did not have quantitative data on which to base our prior. We must therefore be careful with our prior distribution. Conservatively, we believe Sesame Street will have a medium effect on literacy development in disadvantaged compared to advantaged children: we set the median of the effect size at 0.5, the lower quartile at 0.25, and the upper at 0.75. MATCH generated a normal distribution based on our input and computed the location of the normal curve to be µ = 0.5 (the mean) with σ = 0.309 (the standard deviation, or spread around the mean).
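As a rough stand-in for what MATCH does, the quartile judgments can be converted to a normal prior by matching the interquartile range. Note that this simple conversion yields σ ≈ 0.37, whereas MATCH's own fitting procedure reported σ = 0.309, so the sketch approximates rather than reproduces the tool:

```python
from statistics import NormalDist

# Quartile judgments about the standardized effect size (from the text)
lower_q, median, upper_q = 0.25, 0.50, 0.75

# For a normal distribution the quartiles lie at mu +/- z * sigma, where z is
# the 75th-percentile standard-normal quantile (about 0.674)
z = NormalDist().inv_cdf(0.75)

mu = median                          # the median of a normal is its mean
sigma = (upper_q - lower_q) / (2 * z)  # interquartile range = 2 * z * sigma
```

Tools such as MATCH typically fit the distribution by least squares over all elicited points rather than by this exact quartile identity, which is one reason the reported σ differs.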

Bayes factor. Now that the prior distribution based on our belief is specified, we can compute the Bayes factor. The statistical software JASP was used to perform a Bayesian independent-samples t-test (as described in Gronau, Ly, and Wagenmakers (2017)). The prior was defined by inputting the distribution parameters computed by MATCH. JASP computed two Bayes factors, BF10 and BF01, with the subscripts 1 and 0 referring to the alternative hypothesis (there is an effect) and the null hypothesis (there is no effect). BF10 quantifies the support the data provide for the alternative hypothesis relative to the null, and was computed to be 0.174. BF01 conversely quantifies the support for the null hypothesis relative to the alternative, and was computed to be 5.737. Inspecting these Bayes factors makes it apparent that the null hypothesis is more likely given our data than our alternative hypothesis.
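To see roughly where these numbers come from, the Bayes factor for this design can be approximated in a few lines. The sketch below assumes equal group sizes (120 per group, summing to the sample of N = 240) and replaces JASP's exact noncentral-t computation with a normal approximation to the likelihood, so it lands near, but not exactly on, the reported BF10 = 0.174:

```python
from math import sqrt
from statistics import NormalDist

# Frequentist result from the text: t(238) = 0.395 with N = 240
t_obs = 0.395
n1 = n2 = 120                      # equal group sizes are an assumption
n_eff = n1 * n2 / (n1 + n2)        # effective n for a two-sample design

# Observed standardized effect size and its approximate standard error
d_obs = t_obs / sqrt(n_eff)
se = 1 / sqrt(n_eff)

# Elicited prior on the effect size delta: N(0.5, 0.309), as computed by MATCH
mu, sigma = 0.5, 0.309

# Under a normal approximation, the marginal likelihood of d_obs under H1 is
# N(mu, sigma^2 + se^2); under H0 (delta = 0) it is simply N(0, se^2)
m1 = NormalDist(mu, sqrt(sigma**2 + se**2)).pdf(d_obs)
m0 = NormalDist(0.0, se).pdf(d_obs)

bf10 = m1 / m0   # roughly 0.17, close to JASP's 0.174
bf01 = 1 / bf10  # roughly 6: moderate evidence for the null
```

The approximation already shows why the data favor the null: the observed effect (about 0.05) sits near zero and far from the 0.5 the prior predicted.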

This can also be inspected visually. Figure 1 shows both the prior and the posterior distributions. Our prior was centered on a predicted effect size of 0.5. After combining it with the data, the posterior distribution has shifted toward an effect size of zero, a clear indication that our belief is not confirmed. In fact, looking at the Bayes factors, it is actually disputed, because the hypothesis of no effect is 5.737 times more likely than the hypothesis of an effect.

Figure 1: Prior and posterior distributions

Not only can we now confidently conclude that our belief is incorrect, we have also quantified how much the data disconfirm our belief by comparing the Bayes factors. This too can be inspected visually. Figure 2 plots the Bayes factor and the degree to which it constitutes evidence for either hypothesis. First, note that the Bayes factor is influenced by the size of our sample (n): the Bayes factor changed as the sample grew. If we then look at n = 240, our sample size, we see that the computed Bayes factor provides moderate evidence for the null hypothesis that disadvantaged children do not benefit more from Sesame Street than advantaged children. Thus, in comparison to our beliefs, we get a sense of how wrong we might be and of how we should rationally revise our beliefs in light of the evidence in the data.

Figure 2: Bayes factor plot

Comparison. Applying the frequentist method and the Bayesian method, we reach the same conclusion: Sesame Street does not benefit disadvantaged children more than advantaged children. The knowledge claims we can draw from the two analyses, however, are different. Whereas the frequentist analysis provided only a binary decision (reject or fail to reject), the Bayesian method allowed us to rationally inspect how the evidence relates to our beliefs, and therefore to get a sense of how wrong or right we are. Moreover, the Bayesian analysis allows other researchers to replicate the analysis with different prior beliefs against which they can compare the evidence. This is not possible within the frequentist method: after the first frequentist analysis, I illustrated how the interpretation of a non-significant result can differ depending on one's beliefs, which could in turn lead to irrational decision making based on ‘gut feelings’. That is prevented in a Bayesian analysis, because the evidence can be investigated under different assumptions and researchers can see how the evidence relates to those assumptions.

What is more, in our example the p-value in the frequentist analysis happened to be non-significant, and indeed the Bayesian analysis concluded that the null hypothesis was more probable. This is not always the case: Wetzels et al. (2011) analyzed 855 studies in which a frequentist t-test had been applied and concluded that, for 70 percent of the datasets in which the p-value fell between 0.01 and 0.05, the Bayes factor indicated that the evidence was merely anecdotal (not very compelling). Thus the Bayes factor provides a measure of the strength of evidence that the p-value does not provide. This again affirms the merits of Bayesian hypothesis testing over frequentist hypothesis testing.

4. Conclusion

In the introduction of this thesis, I quoted Goldstein: "We may argue that a scientific case is 'proven' if the evidence should be convincing given any reasonable assignment of prior beliefs" (Goldstein, 2006, p. 410). I have tried to show that this premise holds by identifying the various ways in which prior information can, in some cases irrationally, influence our assessment of evidence and the conclusions researchers draw from it. My argument as to why Bayesian statistical techniques are more appropriate than frequentist methods, and why Bayesian methods should be applied more frequently, was threefold: philosophical, empirical, and methodological.


First, I explored philosophical and epistemological justifications for making prior beliefs explicit. Researchers, as part of an epistemic community, have an obligation to make their beliefs explicit in order to make their research transparent, for assumptions and beliefs can influence the manner in which researchers assess evidence. If beliefs and assumptions are made explicit, they can be critiqued. Only then can researchers become aware of their bias toward proving their beliefs correct.

Second, I explored empirical evidence of the belief polarization effect, which provides an explanation of how and why the assessment of evidence and of research quality can be influenced by prior beliefs. From the belief polarization effect it was concluded that, when confronted with contradictory evidence, researchers can act as irrational decision makers, and that the process of updating their beliefs does not always follow logic.

Third, I highlighted the weaknesses of frequentist analysis by conducting an example analysis on the Sesame Street dataset. The conclusion was that the interpretation of an ‘objective’ p-value can in fact be very subjective, because prior beliefs influence our assessment of the evidence. The frequentist method has no way to quantify and incorporate beliefs. Therefore, finally, a Bayesian analysis technique was proposed as an alternative. Bayesian statistics allows for the incorporation of prior beliefs, and the Bayes factor allows researchers to rationally update those beliefs after seeing the data. This has philosophical as well as methodological advantages over the frequentist method.

The elegance of Bayesian hypothesis testing is that it fits very well within the post-positivistic paradigm of education research. Beliefs can no longer remain implicit; they have to be made explicit in order to make justified and reasonable decisions. This has the advantage that researchers in an epistemic community can critique prior assumptions and can also evaluate the evidence in relation to their own beliefs, something that is not possible in frequentist hypothesis testing. Applying a Bayesian hypothesis test quantifies, and mathematically dictates, how researchers should update their beliefs after seeing the evidence. Hence (irrational) belief polarization is less likely to occur. Moreover, by computing the Bayes factor, researchers are provided with a measure of the strength of their evidence, which was not possible in frequentist significance testing. Thus the philosophical, empirical, and methodological advantages of Bayesian hypothesis testing over frequentist hypothesis testing should be sufficient to consider it a viable alternative.

References

Blok, H., Fukkink, R. G., Gebhardt, E. C., & Leseman, P. P. (2005). The relevance of delivery mode and other programme characteristics for the effectiveness of early childhood

intervention. International Journal of Behavioral Development, 29(1), 35-47. Blyth, C. R. (1972). Subjective vs. objective methods in statistics. The American

Statistician, 26(3), 20-22.

Cook, J., & Lewandowsky, S. (2016). Rational irrationality: Modeling climate change belief polarization using Bayesian networks. Topics in cognitive science, 8(1), 160-179.

Dawid, P. (2005). Probability and proof. In T. J. Anderson, D. A. Schum, & W. L. Twining (Eds.), Analysis of evidence (2nd ed., pp. 1-88). Retrieved from

www.cambridge.org/us/download_file/203379/

Driessen, G. (2017). Early Childhood Education Intervention Programs in the Netherlands: Still Searching for Empirical Evidence. Education Sciences, 8(1), 3.

Elgin, C. Z. (2013). Epistemic agency. School Field, 11(2), 135-152.

Fisher, R. A. (1922). On the mathematical foundations of theoretical statistics. Phil. Trans. R. Soc. Lond. A, 222(594-604), 309-368.


Fukkink, R., Jilink, L., & Oostdam, R. (2017). A meta-analysis of the impact of early childhood interventions on the development of children in the Netherlands: an

inconvenient truth?. European Early Childhood Education Research Journal, 25(5), 656-666.

Gelman, A., & Hennig, C. (2017). Beyond subjective and objective in statistics. Journal of the Royal Statistical Society: Series A (Statistics in Society), 180(4), 967-1033.

Goldstein, M. (2006). Subjective Bayesian analysis: principles and practice. Bayesian analysis, 1(3), 403-420.

Gronau, Q. F., Ly, A., & Wagenmakers, E. J. (2017). Informed Bayesian t-tests. arXiv preprint arXiv:1704.02479.

Henderson, K. A. (2011). Post-positivism and the pragmatics of leisure research. Leisure Sciences, 33(4), 341-346.

Howe, K. R. (1985). Two dogmas of educational research. Educational researcher, 14(8), 10-18.

Jern, A., Chang, K. M. K., & Kemp, C. (2014). Belief polarization is not always irrational. Psychological review, 121(2), 206.

Kelly, T. (2008). Disagreement, dogmatism, and belief polarization. The Journal of Philosophy, 105(10), 611-633.

Koehler, J. J. (1993). The influence of prior beliefs on scientific judgments of evidence quality. Organizational behavior and human decision processes, 56(1), 28-55.

Leseman, P. P., & Jong, P. F. (1998). Home literacy: Opportunity, instruction, cooperation and social‐emotional quality predicting early reading achievement. Reading Research Quarterly, 33(3), 294-318.


Lord, C. G., Ross, L., & Lepper, M. R. (1979). Biased assimilation and attitude polarization: The effects of prior theories on subsequently considered evidence. Journal of personality and social psychology, 37(11), 2098.

Morris, D. E., Oakley, J. E., & Crowe, J. A. (2014). A web-based tool for eliciting probability distributions from experts. Environmental Modelling & Software, 52, 1-4.

Niklas, F., & Schneider, W. (2013). Home literacy environment and the beginning of reading and spelling. Contemporary Educational Psychology, 38(1), 40-50.

Kuhn, T. S. (1962). The structure of scientific revolutions. Retrieved from https://books.google.nl/books/about/The_Structure_of_Scientific_Revolutions.html?id=3eP5Y_OOuzwC&redir_esc=y

Van de Schoot, R., Kaplan, D., Denissen, J., Asendorpf, J. B., Neyer, F. J., & Van Aken, M. A. G. (2014). A gentle introduction to Bayesian analysis: Applications to developmental research. Child Development, 85(3), 842-860.

Stevens, J. (2009). Applied multivariate statistics for the social sciences. Retrieved from https://books.google.nl/books/about/Applied_Multivariate_Statistics_for_the.html?id=QMGjsqLQlmUC&redir_esc=y

Van Steensel, R. (2006). Relations between socio‐cultural factors, the home literacy environment and children's literacy development in the first years of primary education. Journal of Research in Reading, 29(4), 367-382.

Wagenmakers, E. J. (2007). A practical solution to the pervasive problems of p values. Psychonomic bulletin & review, 14(5), 779-804.

Wetzels, R., Matzke, D., Lee, M. D., Rouder, J. N., Iverson, G. J., & Wagenmakers, E. J. (2011). Statistical evidence in experimental psychology: An empirical comparison using 855 t tests. Perspectives on Psychological Science, 6(3), 291-298.
